66-illegal-instr
When translation encounters an illegal instruction, emit a call to an
illegal-instruction handler rather than giving up altogether. Some programs
check for CPU capabilities by actually trying them out, so we want to
match a dumb Pentium's behaviour a little better.
It still prints the message, so it won't hide actual illegal or
mis-parsed instructions. I was hoping this might make the Nvidia
drivers realize they're running on a pre-MMX P5, but apparently they
just won't take that as an answer. It does make the virtual CPU
behave a little more like a real CPU though.
git-svn-id: svn://svn.valgrind.org/valgrind/trunk@1370
71-linux-2.5
There doesn't seem to be any problem supporting Linux 2.5 (and one
presumes 2.6 when it appears).
git-svn-id: svn://svn.valgrind.org/valgrind/trunk@1369
75-simple-jle
Another pattern to test for Jle/Jnle. The observation is that EFLAGS
looks like this:
----O---+SZ------
with Z in bit 6, S in 7 and O in 11. Therefore RORL $7, %eflags will
result in:
Z-------+--------+--------+---O---S
Since parity is only computed on the lower 8 bits, testing on P will
determine whether O==S, and since Z is in the MSB, it can be tested
with S.
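To make the bit arithmetic concrete, here is a small self-contained C
check of the claim; it is purely illustrative (the real code emits an
x86 ror/test/jcc sequence, not anything like this):

   /* Verify that rotating the simulated flags right by 7 and testing
      sign + parity of the masked result agrees with the definition of
      Jle (taken iff Z, or S != O).  Bit positions are as above. */
   #include <assert.h>
   #include <stdio.h>

   #define ZF_BIT  6
   #define SF_BIT  7
   #define OF_BIT 11

   /* The x86 P flag: set iff the low 8 bits contain an even number of 1s. */
   static int parity_low8 ( unsigned int x )
   {
      int n = 0;
      for (int i = 0; i < 8; i++) n += (x >> i) & 1;
      return (n & 1) == 0;
   }

   static int jle_direct ( unsigned int eflags )
   {
      int z = (eflags >> ZF_BIT) & 1;
      int s = (eflags >> SF_BIT) & 1;
      int o = (eflags >> OF_BIT) & 1;
      return z || (s != o);
   }

   static int jle_via_ror7 ( unsigned int eflags )
   {
      unsigned int r = (eflags >> 7) | (eflags << 25);  /* rorl $7 */
      unsigned int m = r & 0x80000011u;     /* keep Z (bit 31), O (bit 4), S (bit 0) */
      int z_as_sign = (m >> 31) & 1;        /* Z is now the MSB, so it shows up as S */
      int o_eq_s    = parity_low8(m);       /* P is set iff O == S */
      return z_as_sign || !o_eq_s;
   }

   int main ( void )
   {
      for (unsigned int z = 0; z <= 1; z++)
       for (unsigned int s = 0; s <= 1; s++)
        for (unsigned int o = 0; o <= 1; o++) {
           unsigned int fl = (z << ZF_BIT) | (s << SF_BIT) | (o << OF_BIT);
           assert( jle_direct(fl) == jle_via_ror7(fl) );
        }
      printf("rotate trick agrees with Jle for all flag settings\n");
      return 0;
   }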
git-svn-id: svn://svn.valgrind.org/valgrind/trunk@1366
72-jump
Add some codegen infrastructure to make it easier to generate local
jumps. If you want to generate a local backwards jump, use
VG_(init_target)(&tgt) to initialize the target descriptor, then
VG_(emit_target_back)(&tgt) just before emitting the target
instruction. Then, when emitting the delta for the jump, call
VG_(emit_delta)(&tgt).
Forward jumps are analogous, except that you call VG_(emit_delta)()
then VG_(emit_target_forward)().
The new emit function, VG_(emit_jcondshort_target)(), takes a target
pointer rather than a delta.
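Since the exact signatures aren't given here, the following is a
miniature, self-contained C model of the forward/backward backpatching
pattern rather than the real emitter code; every name in it is
illustrative:

   #include <assert.h>
   #include <stdio.h>

   static unsigned char code[64];     /* stand-in for the emitted-code buffer */
   static int           used = 0;

   typedef struct { int is_defined; int where; } Target;   /* target descriptor */

   static void init_target ( Target* t )  { t->is_defined = 0; }

   /* Backwards jump: note where the target instruction starts. */
   static void emit_target_back ( Target* t )
   {  t->is_defined = 1;  t->where = used;  }

   /* Emit the 1-byte delta of a short jump; for a forwards jump the
      delta isn't known yet, so leave a hole and remember where it is. */
   static void emit_delta ( Target* t )
   {
      if (t->is_defined) {
         int delta = t->where - (used + 1);
         code[used++] = (unsigned char)delta;
      } else {
         t->where = used;
         code[used++] = 0;
      }
   }

   /* Forwards jump: the target is being emitted now, so backpatch the hole. */
   static void emit_target_forward ( Target* t )
   {
      assert(!t->is_defined);
      code[t->where] = (unsigned char)(used - (t->where + 1));
   }

   int main ( void )
   {
      Target t;
      init_target(&t);
      emit_delta(&t);            /* the delta part of what the jcc emitter would emit */
      code[used++] = 0x90;       /* one skipped instruction (nop) */
      emit_target_forward(&t);   /* jump now lands just after the nop */
      printf("patched delta = %d\n", code[0]);
      return 0;
   }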
git-svn-id: svn://svn.valgrind.org/valgrind/trunk@1364
69-simple-jlo, which takes account of the fact that the P flag is set
only from the lowest 8 bits of the result -- the problem that caused
the original version of this patch not to work correctly.
Also fixes a call to new_emit.
69-simple-jlo
For Jlo and Jnlo, which test S == O or S != O: when generating the
special test sequences that don't require the simulated flags to be
loaded into the real flags, emit a test followed by a parity test to
see whether the two bits are equal (even parity) or not equal (odd
parity).
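As a tiny illustration of the low-byte point fixed above -- P only
reflects bits 7..0 of the tested value, so both flag bits must end up
in that byte -- here is a self-contained check; the layout and mask
are illustrative, not the emitted sequence:

   #include <stdio.h>

   /* The x86 P flag: set iff the low 8 bits of the result have even parity. */
   static int p_flag ( unsigned int result )
   {
      int n = 0;
      for (int i = 0; i < 8; i++) n += (result >> i) & 1;
      return (n & 1) == 0;
   }

   int main ( void )
   {
      /* Two bits placed inside the low byte: P == 1 exactly when they are equal. */
      for (unsigned int s = 0; s <= 1; s++)
         for (unsigned int o = 0; o <= 1; o++)
            printf("S=%u O=%u  P=%d\n", s, o, p_flag((o << 4) | s));

      /* A bit above bit 7 (like O in its native position, bit 11) never
         reaches P -- which is what broke the first version of this patch. */
      printf("P with only bit 11 set: %d\n", p_flag(1u << 11));
      return 0;
   }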
git-svn-id: svn://svn.valgrind.org/valgrind/trunk@1363
Make file_err() not abort the current process; recover and keep
going instead. This fixes a problem running OpenOffice on cachegrind.
git-svn-id: svn://svn.valgrind.org/valgrind/trunk@1362
69-simple-jlo
For Jlo and Jnlo, which test S == O or S != O: when generating the
special test sequences that don't require the simulated flags to be
loaded into the real flags, emit a test followed by a parity test to
see whether the two bits are equal (even parity) or not equal (odd
parity).
git-svn-id: svn://svn.valgrind.org/valgrind/trunk@1357
pushf/popf is catastrophically expensive on most target CPUs; this is
certainly true for P3 and Athlon, and I assume (but have not checked)
also for P4.
git-svn-id: svn://svn.valgrind.org/valgrind/trunk@1351
65-fix-ldt
Fix LDT handling in threaded programs. do__apply_in_new_thread() was
failing to set up the child thread's LDT inherited from the parent,
and was triggering an assert in VG_(save_thread_state)() when trying
to copy the parent's thread state to the child.
git-svn-id: svn://svn.valgrind.org/valgrind/trunk@1348
- D flag is separated from the rest (OSZCAP)
- Minimise transfers between real and simulated %eflags since these
are very expensive.
61-special-d
Make the D flag special. Store it separately in the baseblock rather
than in EFLAGS. This is because its usage is almost completely unlike
that of the other flags, and mashing them all together just makes
maintaining eflags hard.
62-lazy-eflags
Implements lazy eflags save and restore. Helps a lot.
Hopefully more documentation to follow.
git-svn-id: svn://svn.valgrind.org/valgrind/trunk@1346
track of the current %EIP value and write it to memory at an INCEIP.
Uses JeremyF's idea of only writing the lowest 8 bits if the upper 24
are unchanged since the previous write. [might this cause problems
to do with write combining on high-performance CPUs? To be checked
out.]
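In C terms the per-INCEIP decision is roughly the sketch below; it is
illustrative only, standing in for the emitted movb-to-low-byte versus
movl-imm32 store:

   #include <stdint.h>
   #include <stdio.h>
   #include <string.h>

   static uint32_t eip_slot;     /* stand-in for the EIP field in the baseblock */

   static void write_eip ( uint32_t last_written, uint32_t new_eip )
   {
      if ((new_eip & 0xFFFFFF00u) == (last_written & 0xFFFFFF00u)) {
         uint8_t low = (uint8_t)new_eip;
         memcpy(&eip_slot, &low, 1);      /* byte store (low byte, little-endian) */
      } else {
         eip_slot = new_eip;              /* full 32-bit store */
      }
   }

   int main ( void )
   {
      write_eip(0x00000000u, 0x08048120u);   /* upper 24 bits differ: word write */
      write_eip(0x08048120u, 0x08048125u);   /* upper 24 bits unchanged: byte write */
      printf("EIP slot = 0x%08x\n", (unsigned int)eip_slot);
      return 0;
   }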
On a simple program running a small inner loop, this gets about 2/3
the benefits of removing INCEIPs altogether, compared with the add-insn
scheme.
I tried a much more complex scheme too, in which we do analysis to
remove as many INCEIPs as possible if it is possible to show that
there will be no EIP reads in between them. This seemed to make
almost no improvement on real programs (kate, xedit), while adding
code and slowing down the code generator, so I don't think it's worth
the hassle.
git-svn-id: svn://svn.valgrind.org/valgrind/trunk@1343
56-chained-accounting
Fix accounting for chained blocks, by only counting real unchain
events, rather than the unchains used to establish the initial call to
VG_(patch_me) at the jump site.
Also a minor cleanup of the jump delta calculation in synth_jcond_lit.
git-svn-id: svn://svn.valgrind.org/valgrind/trunk@1340
50-fast-cond
Implement Julian's idea for fast conditional jumps. Rather than fully
restoring the eflags register with an expensive push-popf pair, just
test the flag bits directly out of the base block. Faster, and smaller
code too!
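In effect each conditional exit becomes a single test of the relevant
bit(s) in the baseblock's eflags slot plus a jcc. A C-level
illustration (only the Z case shown, names invented):

   #include <stdio.h>

   #define ZF_BIT 6

   struct BaseBlock { unsigned int sim_eflags; };   /* just the slice we need */

   /* Fast path: one read + bit test of the in-memory flags, no pushf/popf. */
   static int jz_taken ( const struct BaseBlock* bb )
   {
      return (bb->sim_eflags >> ZF_BIT) & 1;
   }

   int main ( void )
   {
      struct BaseBlock bb = { 1u << ZF_BIT };
      printf("Jz taken: %d\n", jz_taken(&bb));
      return 0;
   }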
git-svn-id: svn://svn.valgrind.org/valgrind/trunk@1339
46-fix-writeable_or_erring-proto
Prototype fix for wait_for_fd_to_be_writable_or_erring(). (bugfix for
43-nonblock-readwritev)
git-svn-id: svn://svn.valgrind.org/valgrind/trunk@1337
translation chaining patch.
47-chained-bb
This implements basic-block chaining. Rather than always going through
the dispatch loop, a BB may jump directly to a successor BB if it is
present in the translation cache.
When the BB's code is first generated, the jumps to the successor BBs
are filled with undefined instructions. When the BB is inserted into
the translation cache, the undefined instructions are replaced with a
call to VG_(patch_me). When VG_(patch_me) is called, it looks up the
desired target address in the fast translation cache. If present, it
backpatches the call to patch_me with a jump to the translated target
BB. If the fast lookup fails, it falls back into the normal dispatch
loop.
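The control flow of VG_(patch_me), modelled in self-contained C (a
sketch only: the real thing rewrites x86 call/jmp bytes at the jump
site, and the fast-cache details differ):

   #include <stdio.h>

   #define FAST_CACHE_SIZE 1024

   typedef void* Translation;

   static Translation fast_cache[FAST_CACHE_SIZE];   /* stand-in for the fast lookup */
   static int         dummy_target_bb;               /* stands in for a translated BB */

   /* Called via the "call patch_me" placeholder planted at a BB's exit. */
   static Translation patch_me ( unsigned long target_eip, Translation* jump_site )
   {
      Translation t = fast_cache[target_eip % FAST_CACHE_SIZE];
      if (t != NULL) {
         *jump_site = t;      /* backpatch: future executions jump straight there */
         return t;
      }
      return NULL;            /* miss: fall back into the normal dispatch loop */
   }

   int main ( void )
   {
      unsigned long eip  = 0x8048000;
      Translation   site = NULL;          /* the jump slot inside the calling BB */

      printf("before translation: %p\n", patch_me(eip, &site));
      fast_cache[eip % FAST_CACHE_SIZE] = &dummy_target_bb;   /* target gets translated */
      printf("after translation : %p, site patched to %p\n",
             patch_me(eip, &site), site);
      return 0;
   }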
When the parts of the translation cache are discarded, all translations
are unchained, so as to ensure we don't have direct jumps to code which
has been thrown away.
This optimisation only affects direct jumps; indirect jumps
(including returns) still go through the dispatch loop. The -v stats
indicate a worst-case rate of about 16% of jumps having to go via the
slow mechanism. This will be a combination of function returns and
genuine indirect jumps.
Certain parts of the dispatch loop's actions have to be moved into
each basic block; namely: updating the virtual EIP and keeping track
of the basic block counter.
At present, basic block chaining seems to improve performance by up to
25% with --skin=none. Gains for skins adding more instrumentation
will be correspondingly smaller.
There is a command line option: --chain-bb=yes|no (defaults to yes).
git-svn-id: svn://svn.valgrind.org/valgrind/trunk@1336
could potentially cause hard-to-find code generation bugs):
00-lazy-fp
This patch implements lazy FPU state save and restore, which improves
the performance of FPU-intensive code by a factor of 15 or so. [when
running without any instrumentation, that is.]
git-svn-id: svn://svn.valgrind.org/valgrind/trunk@1335
This commit adds stats gathering / printing (use -v -v), and selects
the sector size by asking skins, via
VG_(details).avg_translation_sizeB, for the average size of their
translations.
git-svn-id: svn://svn.valgrind.org/valgrind/trunk@1334
translation chaining. The old LRU system has gone, since it required
marking each translation each time it was used -- simulating a
reference bit. This is unacceptably expensive.
New scheme uses FIFO discard. TC is split into a variable number of
parts (currently 8). When all 8 parts are full, the oldest is
discarded and reused for allocation. This somewhat guards against
discarding recently-made translations and performs well in practice.
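A minimal model of the discard policy; the sector count comes from
this message, everything else is illustrative:

   #include <stdio.h>

   #define N_SECTORS 8           /* the count is from this message; names are made up */

   static int current_sector = -1;

   /* Called at startup and whenever the current sector fills up.  Once
      all sectors have been used, this wraps and reuses the oldest one,
      whose translations would first be discarded (and unchained). */
   static int pick_next_sector ( void )
   {
      current_sector = (current_sector + 1) % N_SECTORS;
      return current_sector;
   }

   int main ( void )
   {
      for (int i = 0; i < 10; i++)
         printf("filling sector %d\n", pick_next_sector());
      return 0;                   /* sectors 0..7, then 0 and 1 again (FIFO reuse) */
   }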
TT entries are simplified: the orig and trans size fields are now
stored in the TC, not in the TT. The TC entries are "self
describing", so it is possible to scan forwards through the TC entries
and rebuild the TT from them. TC entries are now word-aligned.
VG_(tt_fast) entries now point to TC entries, not TT entries.
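A sketch of what "self describing" buys: each entry carries its own
small header, so a sector can be walked linearly to rebuild the TT.
Field names and widths below are guesses, not the real layout:

   #include <stdint.h>

   typedef struct {
      uint32_t orig_addr;      /* guest address of the original code        */
      uint16_t orig_size;      /* size of the original code (now kept here) */
      uint16_t trans_size;     /* size of the translation that follows      */
      uint8_t  trans_code[];   /* the translation itself; entries word-aligned */
   } TCEntry;

   /* Rebuilding the TT is then just a forward walk over a sector:
         for each entry e:  add (e->orig_addr, e) to the TT;
         advance by the word-aligned (header size + e->trans_size)
      and the fast-lookup table maps guest addresses to these TCEntry
      pointers rather than to TT entries. */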
The main dispatch loop is now 2 insns shorter since there's no need to
mark the current epoch on each TT entry as it is used. For that
matter, there's no longer any need for the notion of a current epoch
anyway.
It's all a great deal simpler than the old scheme, and it seems
significantly faster too.
git-svn-id: svn://svn.valgrind.org/valgrind/trunk@1333
installation tree (`pwd`/Inst) back to the build tree since it is a
lot easier to edit them in the installation tree. Use with care!
git-svn-id: svn://svn.valgrind.org/valgrind/trunk@1330