- removed various things that are no longer used
- made (module-)local some things that were global
- improved the formatting in places
Removed about 160 lines of code, and non-trivially reduced the number
of global entities.
git-svn-id: svn://svn.valgrind.org/valgrind/trunk@2541
This patch adds translation tests for most of the basic x86 instructions and
fixes a few missing/broken instructions so that they work properly.
git-svn-id: svn://svn.valgrind.org/valgrind/trunk@2242
Patch to improve SSE/SSE2 support
This patch should implement most of the missing SSE/SSE2 opcodes. About
the only ones it doesn't do are the MASKMOVxxx ones, as they are quite
horrible and involve an implicit reference to EDI, so I need to think
about them a bit more.
The patch also includes a set of tests for the MMX/SSE/SSE2 opcodes to
validate that they have the same effect under valgrind as they do when
run normally. In one or two cases this wasn't actually the case even
for some of the implemented opcodes, so I fixed those as well ;-)
git-svn-id: svn://svn.valgrind.org/valgrind/trunk@2202
I added a test case and cleaned up vg_dispatch.S while I was at it.
CCMAIL: 69529-done@bugs.kde.org
git-svn-id: svn://svn.valgrind.org/valgrind/trunk@2129
--branchpred=yes. I'm interested to know if these make a significant
difference for anyone - I see a small speed increase on the Pentium M.
git-svn-id: svn://svn.valgrind.org/valgrind/trunk@2126
Valgrind's dependency on the dynamic linker for getting started, and
instead takes things into its own hands.
This checkin doesn't add much in the way of new functionality, but it
is the basis for all future work on Valgrind. It allows us much more
flexibility in implementation, as well as increasing the reliability
of Valgrind by protecting it more from its clients.
This patch requires some changes to tools to update them to the changes
in the tool API, but they are straightforward. See the posting "Heads
up: Full Virtualization" on valgrind-developers for a more complete
description of this change and its effects on you.
git-svn-id: svn://svn.valgrind.org/valgrind/trunk@2118
Scientific Library (gsl-1.4) compiled with Intel Icc 7.1 20030307Z '-g
-O -xW'. I think this gives pretty good coverage of SSE/SSE2 floating
point instructions, or at least the subset emitted by Icc. So far
tested on memcheck and nulgrind; addrcheck and cachesim still testing.
MERGE TO STABLE
git-svn-id: svn://svn.valgrind.org/valgrind/trunk@1955
can treat it like add and generate partially-defined results of multiply
with partially defined arguments. It may also speed things up a bit,
if they use lots of multiplies.
This change only deals with signed "new-style" multiplies. Note that the x86
has two quite different kinds of multiply instructions: the "old-style"
signed and unsigned multiply, which uses fixed registers (eax:edx) and
generates a result twice the size of the arguments, and the newer signed
multiply, which takes general addressing modes. It seems that gcc always
(almost always?) generates the new signed multiply instructions, except
for byte-sized multiplies.
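To illustrate the distinction (a sketch written for this note, not code
from the patch), here is the C-level view of the two forms as gcc
typically emits them on x86:

   /* Sketch: the two x86 multiply forms as gcc typically emits them. */
   int new_style ( int a, int b )
   {
      return a * b;             /* two-operand imul: general registers
                                   and addressing modes */
   }

   long long old_style ( int a, int b )
   {
      return (long long)a * b;  /* widening multiply: one-operand imul,
                                   64-bit result in edx:eax */
   }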
git-svn-id: svn://svn.valgrind.org/valgrind/trunk@1925
to inline. This is needed to get a warning-free compilation on 3.3.1.
It seems we had "inline" on some pretty huge functions in places.
Also it appears gcc-3.3.1 won't inline a function call in a tail call
position, reasonably enough. I assume in that case it prefers to
create a tailcall to the callee, rather than inlining it.
MERGE TO STABLE
git-svn-id: svn://svn.valgrind.org/valgrind/trunk@1907
a way consistent with the position of the register field in the
instruction. In Intel encoding parlance, the G register is in bits
5,4,3 and the E register is in bits 2,1,0, and so we adopt this scheme
consistently. Considering how much confusion this has caused me in
this recent bout of SSE hacking, consistent renaming can only be a
good thing. It makes it a lot easier to figure out if parts of the
SSE handling machinery are correct, or not.
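Concretely, the extraction now looks like this (a sketch; the helper
names are illustrative):

   /* Sketch: pulling the G and E register fields out of a ModRM byte,
      using the bit positions described above. */
   static int gregOfRM ( unsigned char mod_reg_rm )
   {
      return (mod_reg_rm >> 3) & 7;   /* G register: bits 5,4,3 */
   }
   static int eregOfRM ( unsigned char mod_reg_rm )
   {
      return mod_reg_rm & 7;          /* E register: bits 2,1,0 */
   }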
git-svn-id: svn://svn.valgrind.org/valgrind/trunk@1698
run Qt-3.1 as built with "icc -xW" (P4 code generation). Hopefully by
now I've worked through most SSE/SSE2 conceptual nasties, and it's
mostly a question of filling in the gaps.
I think I might have created a bug of some kind with SSE3g_RegWr. My
current test app segfaults unless I run with --optimise=no, which makes
me think I've written something erroneous in the UInstr predicates
controlling optimisation. I don't know what though.
git-svn-id: svn://svn.valgrind.org/valgrind/trunk@1676
generally rationalise it as the structure of Intel's encoding scheme
becomes more apparent. Add a bunch more instructions.
git-svn-id: svn://svn.valgrind.org/valgrind/trunk@1657
work, when running on a P4 with an NVidia Vanta card and using
NVidia-supplied libGL.so.1.0.3123. Surprisingly this seems to require
only a minimal set of instructions. So far this is only with
--skin=none.
git-svn-id: svn://svn.valgrind.org/valgrind/trunk@1652
Removed the SK_(written_shadow_regs_values)() function. Instead, skins that
use shadow regs can track the `post_regs_write_init' event, and set the shadow
regs from within it. This is much more flexible, since it allows each shadow
register to be set to a separate value if necessary. It also matches the new
shadow-reg-change events described below.
In the core, there were some places where the shadow regs were changed, and
skins had no way of knowing about it, which was a problem for some of them.
So I added a bunch of new events to notify skins about these:
post_reg_write_syscall_return
post_reg_write_deliver_signal
post_reg_write_pthread_return
post_reg_write_clientreq_return
post_reg_write_clientcall_return
Any skin that uses shadow regs should almost certainly track these events. The
post_reg_write_clientcall_return allows a skin to tailor the shadow reg of the
return value of a CLIENTCALL'd function appropriately; this is especially
useful when replacing malloc() et al.
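For example, a skin might react roughly like this (a sketch only; the
handler signature is an assumption, and 0x0 as the "all defined" shadow
value is Memcheck's V-bit convention):

   /* Sketch: a skin's handler for one of the new events. */
   static void sk_post_reg_write_clientcall_return ( ThreadId tid, UInt reg )
   {
      /* The CLIENTCALL'd function's return value is now known, so mark
         its shadow as fully defined. */
      VG_(set_thread_shadow_archreg)(tid, reg, 0x0);
   }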
Defined some macros that should be used *whenever the core changes the value of
a shadow register*:
SET_SYSCALL_RETVAL
SET_SIGNAL_EDX (maybe should be SET_SIGNAL_RETVAL? ... not sure)
SET_SIGNAL_ESP
SET_CLREQ_RETVAL
SET_CLCALL_RETVAL
SET_PTHREQ_ESP
SET_PTHREQ_RETVAL
These replace all the old SET_EAX and SET_EDX macros, and are added in a few
places where the shadow-reg update was missing.
Added shadow registers to the machine state saved/restored when signal handlers
are pushed/popped (they were missing).
Added skin-callable functions VG_(set_return_from_syscall_shadow)() and
VG_(get_exit_status_shadow)(), which usefully abstract away which
registers the results are in.
Also, poll() changes %ebx (its first argument) sometimes; I don't know why.
So we notify skins about that too (with the `post_reg_write_syscall_return'
event, which isn't ideal, I guess...)
git-svn-id: svn://svn.valgrind.org/valgrind/trunk@1642
(prerelease) (SuSE Linux)") seems to complain about signed-vs-unsigned
comparisons, when -Wall is on. This commit fixes (most of) those
complaints.
git-svn-id: svn://svn.valgrind.org/valgrind/trunk@1638
to include the SSE/SSE2 architectural state. Automagically detect
at startup, in vg_startup.S, whether or not this is a SSE-enabled
CPU and act accordingly. All subsequent FPU/SSE state transfers
between the simulated and real machine are then done either with
fsave/frstor (as before) or fxsave/fxrstor (the SSE equivalents).
Fragile and fiddly: (1) the SSE state needs to be stored on a 16-byte
boundary, and (2) certain bits in the saved MXCSR reg in a state
written by fxsave need to be ANDed out before we can safely restore
using fxrstor.
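A minimal sketch of both constraints, assuming GCC-style inline
assembly (the MXCSR mask value below is a placeholder, not the value
actually used in vg_startup.S):

   /* Sketch: 512-byte fxsave image, 16-byte aligned as fxsave requires. */
   static unsigned char sse_state[512] __attribute__((aligned(16)));

   static void save_sse_state ( void )
   {
      __asm__ __volatile__("fxsave %0" : "=m" (sse_state));
      /* MXCSR lives at byte offset 24 of the fxsave image; reserved
         bits must be ANDed out or a later fxrstor can fault.  0xffbf
         is a placeholder mask. */
      *(unsigned int*)(sse_state + 24) &= 0xffbf;
   }

   static void restore_sse_state ( void )
   {
      __asm__ __volatile__("fxrstor %0" : : "m" (sse_state));
   }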
It does appear to work. I'd appreciate people trying it out on
various CPUs to establish whether the SSE / not-SSE check works
right, and/or anything else is broken.
Unfortunately makes some programs run significantly slower.
I don't know why. Perhaps due to copying around more processor
state than there was before (SSE state is 512 bytes, FPU state
was only 108). I will look into this.
git-svn-id: svn://svn.valgrind.org/valgrind/trunk@1574
overview
-----------------------------------------------------------------------------
This commit introduces an optimisation that speeds up Memcheck by roughly
-3% to 28%, and Addrcheck by 1% to 36%, at least for the SPEC2000 benchmarks
on my 1400MHz Athlon.
Basic idea: the handling of A/V bit updates on %esp-adjustments was quite
sub-optimal -- for each "PUT ESP", a function was called that computed the
delta from the old and new ESPs, and then called a looping function to deal
with it.
Improvements:
1. most of the time, the delta can be seen from the code. So there's no need
to compute it.
2. when the delta is known, we can directly call a skin function to handle it.
3. we can specialise for certain common cases (eg. +/- 4, 8, 12, 16, 32),
including having unrolled loops for these.
This slightly bloats UCode because of setting up args for the call, and for
updating ESP in code (previously was done in the called C function). Eg. for
`date' the code expansion ratio goes from 14.2 --> 14.6. But it's much faster.
Note that skins don't have to use the specialised cases; they can just
define the ordinary case if they want; the specialised cases are only used
if present.
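In skin terms, opting into a specialised case might look roughly like
this (a sketch; the helper name is hypothetical and the registration
form via the VG_(track) struct is assumed):

   /* Sketch: an Addrcheck-style specialised hook for a 4-byte push. */
   static void ac_new_mem_stack_4 ( Addr new_ESP )
   {
      make_aligned_word_accessible(new_ESP);   /* hypothetical helper */
   }

   /* ... and during skin initialisation (form assumed): */
   VG_(track).new_mem_stack_4 = ac_new_mem_stack_4;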
-----------------------------------------------------------------------------
details
-----------------------------------------------------------------------------
Removed addrcheck/ac_common.c, put its (minimal) contents in ac_main.c.
Updated the major interface version, because this change isn't binary
compatible with the old core/skin interface.
Removed the hooks {new,die}_mem_stack_aligned, replaced with the better
{new,die}_mem_stack_{4,8,12,16,32}. Still have the generic {die,new}_mem_stack
hooks. These are called directly from UCode, thanks to a new pass that occurs
between instrumentation and register allocation (but only if the skin uses
these stack-adjustment hooks). VG_(unknown_esp_update)() is called from UCode
for the generic case; it determines if it's a stack switch, and calls the
generic {new,die}_mem_stack hooks accordingly. This meant
synth_handle_esp_assignment() could be removed.
The new %esp-delta computation phase is in vg_translate.c.
In Memcheck and Addrcheck, added functions for updating the A and V bits of a
single aligned word and a single aligned doubleword. These are called from the
specialised functions new_mem_stack_4, etc. The functions for the old
hooks new_mem_stack_aligned and die_mem_stack_aligned could be removed.
In mc_common.h, added a big macro containing the definitions of new_mem_stack_4
et al. It's ``instantiated'' separately by Memcheck and Addrcheck. The macro
is a bit klugey, but I did it that way because speed is vital for these
functions, so eg. a function pointer would have slowed things down.
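The shape of that macro is roughly as follows (an assumed sketch, not
the real mc_common.h contents):

   /* Sketch: define the specialised hooks once, parameterised on each
      skin's word-update primitive, so each call compiles to
      straight-line code instead of going through a function pointer. */
   #define STACK_HOOKS(update_aligned_word)                    \
      void SK_(new_mem_stack_4) ( Addr new_ESP )               \
      {                                                        \
         update_aligned_word(new_ESP);                         \
      }                                                        \
      /* ... likewise new_mem_stack_8, _12, _16, _32 and the   \
         die_mem_stack_* variants ... */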
Updated the built-in profiling events appropriately for the changes (removed
one old event, added a new one; finding their names is left as an exercise for
the reader).
Fixed memory event profiling in {Addr,Mem}check, which had rotted.
A few other minor things.
git-svn-id: svn://svn.valgrind.org/valgrind/trunk@1510
the same as that for FPU instructions. That is, regard the MMX state
(which is the same as the FPU state) opaquely, and every time we
need to do a MMX instruction, move the simulated MMX state into the
real CPU, do the instruction, and move it back. JeremyF's optimisation
to minimise FPU saves/restores applies automatically here.
So, this scheme is simple. It will cause memcheck to complain bitterly
if uninitialised data is copied through the MMX registers, in the same
way that memcheck complains if you move uninit data through the FPU
registers. Whether this turns out to be a problem remains to be seen.
Most instructions are done, and doing the rest is easy enough, I just
need people to send test cases so I can do them on demand.
(Core) UCode has been extended with 7 new uinstrs:
   MMX1 MMX2 MMX3
      -- 1/2/3 byte MMX insns with no references to integer regs or
         memory; these are copied exactly to the output stream.
   MMX_MemRd MMX_MemWr
      -- 2 byte MMX insns which read/write memory and therefore need
         to have an address register patched in at code generation
         time.  These are the analogues of FPU_R / FPU_W.
   MMX_RegRd MMX_RegWr
      -- These have no analogues in FPU land.  They hold 2 byte insns
         which move data to/from a normal integer register (%eax etc),
         so this has to be made explicit so that (1) a suitable int
         reg can be patched in at codegen time, and (2) memcheck can
         do suitable magic with the V bits going into/out of the MMX
         regs.
Nulgrind (ok, this is a nop, but still ...) and AddrCheck's
instrumenters have been extended to cover these new UInstrs. All
others (cachesim, memcheck, lackey, helgrind; did I forget any?)
abort when they see any of them. This may be overkill but at least
it ensures we don't forget to implement it in those skins.
[A bad thing would be that some skin silently passes along
MMX uinstrs because of a default: case, when it should actually
do something with them.]
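In instrumenter terms the guard is simple (a sketch; the pass-through
helper name is assumed):

   /* Sketch: make MMX uinstrs impossible to miss in an instrumenter. */
   switch (u_in->opcode) {
      /* ... the cases this skin actually instruments ... */
      case MMX1:      case MMX2:      case MMX3:
      case MMX_MemRd: case MMX_MemWr:
      case MMX_RegRd: case MMX_RegWr:
         VG_(skin_panic)("MMX uinstrs not handled by this skin");
         break;
      default:
         VG_(copy_UInstr)(cb, u_in);   /* assumed pass-through helper */
         break;
   }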
If this works out well, I propose to backport this to 2_0_BRANCH.
git-svn-id: svn://svn.valgrind.org/valgrind/trunk@1483
VG_(handle_esp_assignment)() was needed by a skin (and thus whether to register
it in the baseBlock) was different to that used when determining whether to
call it in code generation... so an attempt could be made to call it
without it having been registered.
Fixed this by consistifying the conditions, using a function
VG_(need_to_handle_esp_assignment)() that is used in both places. The bug
hadn't been found previously because no existing skin exercised the mismatched
conditions in conflicting ways.
Also took VG_(track).post_mem_write out of consideration because it's no longer
important (due to a change in how stack switching is detected).
----
Improved the error message for when a helper can't be found in the baseBlock --
now looks up the debug info to tell you the name of the not-found function.
----
Increased the number of noncompact helpers allowed from 8 to 24.
----
Removed a magic number that was hardcoded all over the place, introducing
VG_MAX_REGS_USED for the size of the arrays needed by VG_(get_reg_usage)().
----
Also added these functions:
VG_(get_archreg)()
VG_(get_thread_archreg)()
VG_(get_thread_shadow_archreg)()
VG_(set_thread_shadow_archreg)()
which can be useful for skins.
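A hedged usage sketch (the argument order and the R_EAX constant are
assumptions):

   /* Sketch: a skin reads %eax and, if it now knows the value is
      meaningful, marks its shadow as fully defined (0x0 in Memcheck's
      V-bit convention). */
   UInt eax = VG_(get_thread_archreg)(tid, R_EAX);
   if (eax != 0)
      VG_(set_thread_shadow_archreg)(tid, R_EAX, 0x0);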
git-svn-id: svn://svn.valgrind.org/valgrind/trunk@1419
VG_(get_shadow_archreg), VG_(set_shadow_archreg), VG_(shadow_archreg_address).
Curiously, the only way skins could previously access them was with
VG_(shadow_reg_offset), which wasn't very flexible.
git-svn-id: svn://svn.valgrind.org/valgrind/trunk@1412
with similar functions, and made it visible to skins (useful).
Also bumped up the skin interface minor version number due to this change; this
bumping will cover any other binary-compatible changes between now and the next
release (after 1.9.3).
git-svn-id: svn://svn.valgrind.org/valgrind/trunk@1410
These assumed that ROR sets the P and Z flags, when in fact it
sets neither. Add an extra OR insn to really set those flags.
git-svn-id: svn://svn.valgrind.org/valgrind/trunk@1397
As of now it is correct, following several hours' study.
- Rename upd_cc parameters to simd_flags since that's what they
really mean: does this insn interact at all with %EFLAGS
(the simulated flags)?
- Have a convention that calls to new_emit which specify
FlagsEmpty for both the def and use sets should pass False
as the simd_flags parameter; this seems more logical than
saying True. From partial evaluation of new_emit with
these args one can see it does nothing under such circumstances,
as one would hope.
- Add an alternative, unused implementation of new_emit in
which the state space is explicitly enumerated. Instructive.
--------------------------------------------------------------
git-svn-id: svn://svn.valgrind.org/valgrind/trunk@1396
(Int)False == (Int)FlagsEmpty, but still.
Whilst hunting (completely unsuccessfully) for some bug causing
MySQL to malfunction with some skins (memcheck), or with most
skins when --single-step=yes.
git-svn-id: svn://svn.valgrind.org/valgrind/trunk@1395
75-simple-jle
Another pattern to test for Jle/Jnle. The observation is that EFLAGS
looks like this:
----O--+SZ------
with Z in bit 6, S in 7 and O in 11. Therefore RORL $7, %eflags will
result in:
Z------+-------+-------+---O---S
Since parity is only computed on the lower 8 bits, testing P will
determine whether O == S; and since Z is now in the MSB, it can be
tested via the sign flag.
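The trick can be checked with a small pure-C model (a sketch written
for this note, not code from the patch; __builtin_parity is a GCC
builtin):

   #include <stdint.h>
   #include <stdio.h>

   /* Bit positions as above: Z = bit 6, S = bit 7, O = bit 11. */
   static uint32_t ror32 ( uint32_t x, int n )
   {
      return (x >> n) | (x << (32 - n));
   }

   int main ( void )
   {
      int z, s, o;
      for (z = 0; z <= 1; z++)
       for (s = 0; s <= 1; s++)
        for (o = 0; o <= 1; o++) {
           uint32_t r = ror32((z << 6) | (s << 7) | (o << 11), 7);
           /* After ROR $7: S is in bit 0, O in bit 4, Z in bit 31. */
           int o_ne_s = __builtin_parity(r & 0xff); /* odd <=> O != S */
           int z_set  = (r >> 31) & 1;              /* Z is the sign bit */
           printf("Z=%d S=%d O=%d -> Jle taken: %d\n",
                  z, s, o, z_set || o_ne_s);        /* ZF=1 or SF != OF */
        }
      return 0;
   }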
git-svn-id: svn://svn.valgrind.org/valgrind/trunk@1366
72-jump
Add some codegen infrastructure to make it easier to generate local
jumps. If you want to generate a local backwards jump, use
VG_(init_target)(&tgt) to initialize the target descriptor, then
VG_(emit_target_back)(&tgt) just before emitting the target
instruction. Then, when emitting the delta for the jump, call
VG_(emit_delta)(&tgt).
Forward jumps are analogous, except that you call VG_(emit_delta)()
then VG_(emit_target_forward)().
The new emit function, VG_(emit_jcondshort_target)() takes a target
pointer rather than a delta.
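A hedged usage sketch of the forward-jump case (the descriptor type
name and the condition constant are assumptions):

   TargetDesc tgt;                             /* type name assumed */
   VG_(init_target)(&tgt);
   VG_(emit_jcondshort_target)(CondZ, &tgt);   /* short jz, target TBD */
   /* ... emit the code skipped when the branch is taken ... */
   VG_(emit_target_forward)(&tgt);             /* the branch lands here */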
git-svn-id: svn://svn.valgrind.org/valgrind/trunk@1364
69-simple-jlo, which takes account of the fact that the P flag is set
only from the lowest 8 bits of the result; overlooking this caused the
original version of this patch not to work right.
Also fixes a call to new_emit.
69-simple-jlo
For Jlo and Jnlo, which test S == O or S != O: when generating special
test sequences which don't require the simulated flags to be in the real
flags, generate a test followed by a parity test to see whether the two
bits are equal (even parity) or not equal (odd parity).
git-svn-id: svn://svn.valgrind.org/valgrind/trunk@1363