in common.
To accomplish that without penalizing the non-profiling dispatcher,
we do the stats gathering *after* the jitted code returns to the
dispatcher. For that to work properly, we need to stash away the
instruction address before entering the jitted code so we can use
it later. (See also VEX r2208.)
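A minimal C sketch of the mechanism, assuming hypothetical helper
names (run_translation, fast_cache_index, keep_dispatching); the real
loop is hand-written assembly:

    /* Hypothetical stand-ins for the real assembly-level pieces. */
    extern unsigned long run_translation(unsigned long guest_IP);
    extern unsigned fast_cache_index(unsigned long guest_IP);
    extern int keep_dispatching(void);

    static unsigned long long exec_counts[1 << 15];  /* assumed size */

    void profiled_dispatch_loop(unsigned long guest_IP)
    {
       while (keep_dispatching()) {
          /* Stash the instruction address before entering the jitted
             code ... */
          unsigned long stashed_IP = guest_IP;
          guest_IP = run_translation(guest_IP);
          /* ... and gather the stats only after it returns, so the
             non-profiling loop carries none of this. */
          exec_counts[fast_cache_index(stashed_IP)]++;
       }
    }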
Two other tweaks are included here:
(1) For the non-profiling dispatcher it is not necessary to update
the LR on each iteration: the jitted code cannot modify the LR during
an iteration, because it needs it intact at the very end when it
returns. So we move this step out of the core loop.
(2) Move loading the address of VG_(tt_fast) past testing for a changed
guest state pointer.
git-svn-id: svn://svn.valgrind.org/valgrind/trunk@12044
so as to avoid a GSP-changed check in the common case. See vex r2155.
(amd64-darwin and x86-darwin are now temporarily unbuildable.)
git-svn-id: svn://svn.valgrind.org/valgrind/trunk@11786
use test-based detection of GSP changes.
Saves one load per SB.
dispatch-amd64-linux.S:
ditto
dispatch-amd64-linux.S:
use movabsq to get &VG_(tt_fast) into a register,
instead of an rsp-relative load from a constant pool.
Saves a second load per SB.
git-svn-id: svn://svn.valgrind.org/valgrind/trunk@11785
the fact that all {VG,VEX}_TRC_ values have their lowest bit set. All
other targets can benefit from this trick too.
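A hypothetical C illustration of why an always-set low bit helps: an
odd TRC value can be told apart from any word-aligned code address
with a single bit test and no load from memory:

    #include <stdint.h>

    /* Sketch only: TRC values are odd; aligned addresses are even. */
    static inline int looks_like_trc(uintptr_t v)
    {
       return (v & 1) != 0;
    }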
git-svn-id: svn://svn.valgrind.org/valgrind/trunk@11781
make some attempt to schedule for Cortex-A8. Improves overall IPC
for the none tool running perf/bz2.c (compiled with "-O") from 0.879
to 0.925.
git-svn-id: svn://svn.valgrind.org/valgrind/trunk@11780
"insn_address >> 1". The former is appropriate for ARM code, where
all insns are 4-sized and 4-aligned, but not for Thumb code, where the
minimum size and alignment is 2. The old scheme happened to work for
Thumb (indeed, any hash function would), but caused huge amounts of
conflict misses in the fast cache for some programs.
The change has been observed to reduce conflict misses by up to 100
times, and in some cases, improves performance significantly for Thumb
code. Performance of ARM code is unchanged or possibly a bit worse.
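A C sketch of the two index computations, assuming the previous
scheme was the 4-byte shift implied by the ARM alignment remark, and
an assumed table geometry:

    #define FAST_CACHE_MASK ((1 << 15) - 1)   /* assumed geometry */

    /* Previous scheme: fine for 4-aligned ARM insns, but Thumb blocks
       that start 2 bytes apart alias to the same entry. */
    static inline unsigned fast_index_prev(unsigned long insn_address)
    {
       return (unsigned)((insn_address >> 2) & FAST_CACHE_MASK);
    }

    /* New scheme: shifting by only 1 keeps 2-aligned Thumb blocks in
       distinct entries. */
    static inline unsigned fast_index_new(unsigned long insn_address)
    {
       return (unsigned)((insn_address >> 1) & FAST_CACHE_MASK);
    }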
git-svn-id: svn://svn.valgrind.org/valgrind/trunk@11716
the changes to do with reading and using ELF and DWARF3 info.
This breaks all targets except amd64-linux and x86-linux.
git-svn-id: svn://svn.valgrind.org/valgrind/trunk@10982
following improvements:
- Arch/OS/platform-specific files are now included/excluded via the
preprocessor, rather than via the build system; a sketch of this
convention appears at the end of this entry. This is more consistent
(we use the preprocessor for small arch/OS/platform-specific chunks
within files) and makes the build system much simpler, as the sources
for all programs are the same on all platforms.
- Vast amounts of cut+paste Makefile.am code have been factored out. If a
new platform is implemented, you need to add 11 extra Makefile.am lines.
Previously it was over 100 lines.
- Vex has been autotoolised. Dependency checking now works in Vex (no more
incomplete builds). Parallel builds now also work. --with-vex no longer
works; it's of little use and a pain to support. VEX/Makefile is still in
the Vex repository and gets overwritten at configure-time; it should
probably be renamed Makefile-gcc to avoid possible problems, such as
accidentally committing a generated Makefile. There's a bunch of hacky
copying to deal with the fact that autotools don't handle same-named files
in different directories. Julian plans to rename the files to avoid this
problem.
- Various small Makefile.am things have been made more standard automake
style, e.g. the use of pkginclude/pkglib prefixes instead of rolling our
own.
- The existing five top-level Makefile.am include files have been
consolidated into three.
- Most Makefile.am files now are structured more clearly, with comment
headers separating sections, declarations relating to the same things next
to each other, better spacing and layout, etc.
- Removed the unused exp-ptrcheck/tests/x86 directory.
- Renamed some XML files.
- Factored out some duplicated dSYM handling code.
- Split auxprogs/ into auxprogs/ and mpi/, which allowed the resulting
Makefile.am files to be much more standard.
- Cleaned up m_coredump by merging a bunch of files that had been
overzealously separated.
The net result is 630 fewer lines of Makefile.am code, or 897 if you exclude
the added Makefile.vex.am, or 997 once the hacky file copying for Vex is
removed. And the build system is much simpler.
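As a sketch of the first point in the list above: the whole body of a
platform-specific source file is guarded, so the same source list can
be compiled on every platform and the file simply contributes nothing
elsewhere. The VGP_* macro follows the existing naming convention;
the function is hypothetical:

    #if defined(VGP_amd64_linux)

    /* Hypothetical amd64-linux-only code; on every other platform
       this translation unit compiles to nothing. */
    void setup_amd64_linux_details(void)
    {
       /* ... */
    }

    #endif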
git-svn-id: svn://svn.valgrind.org/valgrind/trunk@10364
I tried using 'svn merge' to do the merge but it did a terrible job and
there were bazillions of conflicts. So instead I just took the diff between
the branch and trunk at r10155, applied the diff to the trunk, 'svn add'ed
the added files (no files needed to be 'svn remove'd) and committed.
git-svn-id: svn://svn.valgrind.org/valgrind/trunk@10156
more cache friendly. This changes the mechanism from being a table of
pointers to (guest address, translated code) pairs to being a table of
pairs (guest address, pointer to translated code). The effect ranges
from zero up to about 20% performance improvement on memcheck, the
biggest effects being seen for programs which jump around a large
number of blocks of code and whose data set does not fit in L2.
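A C sketch of the before/after layouts (the entry and table names and
the table size are assumptions):

    #include <stdint.h>

    #define FAST_CACHE_ENTRIES (1 << 15)   /* assumed size */

    typedef struct {
       uintptr_t guest_addr;   /* guest code address */
       void*     host_code;    /* its translation */
    } FastCacheEntry;

    /* Before: a table of pointers to pairs stored elsewhere; a probe
       needs two dependent loads touching two cache lines. */
    static FastCacheEntry* tt_fast_before[FAST_CACHE_ENTRIES];

    /* After: the pairs live inline in the table, so the common hit
       case touches a single cache line. */
    static FastCacheEntry tt_fast_after[FAST_CACHE_ENTRIES];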
git-svn-id: svn://svn.valgrind.org/valgrind/trunk@6582
Use 'ctr' rather than 'lr' for indirect jumps, so as not to trash the
branch predictor(s) for returns from generated code. Makes a big
difference on ppc970 (and POWER4).
git-svn-id: svn://svn.valgrind.org/valgrind/trunk@6291
with self hosting. Without this, the symbol has
size 0 and type NOT, and is ignored by the symbol loader.
git-svn-id: svn://svn.valgrind.org/valgrind/trunk@5872
to the extent needed to make ppc32 work.
* As a result, remove the replacements for glibc's floor/ceil fns on
ppc32/64, since vex can now correctly simulate the real ones.
git-svn-id: svn://svn.valgrind.org/valgrind/trunk@5605
branch hereby becomes inactive. This currently breaks everything
except x86; fixes for amd64/ppc32 to follow.
git-svn-id: svn://svn.valgrind.org/valgrind/trunk@5520
perf/tinycc.
- run_thread_for_a_while: just clear this thread's reservation when
starting, not all of them.
- use a different fast-cache hashing function for ppc32/64 than for
x86/amd64. This allows the former to use all the fast-cache entries
rather than just 1/4 of them.
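A C sketch of the split, under assumed names: ppc32/64 instructions
are fixed-size and 4-byte aligned, so an unshifted index has its two
low bits always zero and reaches only a quarter of the entries:

    #define FAST_CACHE_MASK ((1 << 15) - 1)   /* assumed geometry */

    /* ppc32/64: fold out the two always-zero low address bits so
       every entry is reachable. */
    static inline unsigned fast_index_ppc(unsigned long insn_addr)
    {
       return (unsigned)((insn_addr >> 2) & FAST_CACHE_MASK);
    }

    /* x86/amd64: insns can start at any byte, so no shift is used. */
    static inline unsigned fast_index_x86(unsigned long insn_addr)
    {
       return (unsigned)(insn_addr & FAST_CACHE_MASK);
    }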
git-svn-id: svn://svn.valgrind.org/valgrind/trunk@5441