Update the list of architectures to differentiate between the n32 and n64
ABIs for mips64 when defining the fast cache macros in
coregrind/pub_core_transtab_asm.h.
Also amend the VG_(disp_cp_xindir) function in
coregrind/m_dispatch/dispatch-mips64-linux.S to use word-sized loads in the
case of the n32 ABI, since the FastCacheSet structure members are now 4
bytes in size for mips64 n32.
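For illustration, a minimal sketch (hypothetical names, not Valgrind's exact
declarations) of why the load width matters under n32:

    /* Under the n32 ABI, pointers are 4 bytes even though the CPU
       registers are 64 bits wide, so Addr-sized members must be fetched
       with a 32-bit load (lw) rather than a 64-bit load (ld). */
    #include <stdint.h>

    typedef uintptr_t Addr;     /* 4 bytes on mips64 n32, 8 bytes on n64 */

    typedef struct {
       Addr guest;              /* guest address, used as the cache tag  */
       Addr host;               /* corresponding JITted host address     */
    } FastCacheSetEntry;        /* illustrative layout only              */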
Patch by Stefan Maksimovic.
Necessary changes to support nanoMIPS on Linux.
Part 2/4 - Coregrind changes
Patch by Aleksandar Rikalo, Dimitrije Nikolic, Tamara Vlahovic and
Aleksandra Karadzic.
Related KDE issue: #400872.
The results of the floating point instructions vmaddfp, vnmsubfp,
vaddfp, vsubfp, vmaxfp, vminfp, vrefp, vrsqrtefp, vcmpeqfp,
vcmpgefp, vcmpgtfp are controlled by the setting of the NJ bit in
the VSCR register. If VSCR[NJ] = 0, then denormalized values are
handled as specified by Java and the IEEE standard. If the bit is
a 1, then the denormalized element in the vector is replaced with
a zero.
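As a rough scalar illustration of the NJ semantics (a hedged sketch;
flush_if_nj is a made-up helper, not Valgrind's code):

    #include <math.h>

    /* When NJ = 1, a denormalized (subnormal) element is replaced with a
       zero of the same sign; when NJ = 0 it is used as IEEE specifies. */
    static float flush_if_nj(float x, int nj_bit)
    {
       if (nj_bit && fpclassify(x) == FP_SUBNORMAL)
          return copysignf(0.0f, x);
       return x;
    }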
Valgrind was not properly handling the denormalized case for these
instructions. This patch fixes the issue.
https://bugs.kde.org/show_bug.cgi?id=406256
Sync VEX/LICENSE.GPL with the top-level COPYING file. We used 3 different
addresses for writing to the FSF to receive a copy of the GPL. Replace
all the different variants with the URL <http://www.gnu.org/licenses/>.
The following files might still have some slightly different (L)GPL
copyright notice because they were derived from other programs:
- files under coregrind/m_demangle which come from libiberty:
cplus-dem.c, d-demangle.c, demangle.h, rust-demangle.c,
safe-ctype.c and safe-ctype.h
- coregrind/m_demangle/dyn-string.[hc] derived from GCC.
- coregrind/m_demangle/ansidecl.h derived from glibc.
- VEX files for FMA derived from glibc:
host_generic_maddf.h and host_generic_maddf.c
- files under coregrind/m_debuginfo derived from LZO:
lzoconf.h, lzodefs.h, minilzo-inl.c and minilzo.h
- files under coregrind/m_gdbserver derived from GDB:
gdb/signals.h, inferiors.c, regcache.c, regcache.h,
regdef.h, remote-utils.c, server.c, server.h, signals.c,
target.c, target.h and utils.c
Plus the following test files:
- none/tests/ppc32/testVMX.c derived from testVMX.
- ppc tests derived from QEMU: jm-insns.c, ppc64_helpers.h
and test_isa_3_0.c
- tests derived from bzip2 (with embedded GPL text in code):
hackedbz2.c, origin5-bz2.c, varinfo6.c
- tests derived from glibc: str_tester.c, pth_atfork1.c
- test derived from GCC libgomp: tc17_sembar.c
- performance tests derived from bzip2 or tinycc (with embedded GPL
text in code): bz2.c, test_input_for_tinycc.c and tinycc.c
Implementation for x86-solaris and amd64-solaris. This completes the
implementations for all targets. Note these two are untested because I don't
have any way to test them.
[This commit contains an implementation for all targets except amd64-solaris
and x86-solaris, which will be completed shortly.]
In the baseline simulator, jumps to guest code addresses that are not known at
JIT time have to be looked up in a guest->host mapping table. That means:
indirect branches, indirect calls and most commonly, returns. Since there are
huge numbers of these (often 10+ million/second) the mapping mechanism needs
to be extremely cheap.
Currently, this is implemented using a direct-mapped cache, VG_(tt_fast), with
2^15 (guest_addr, host_addr) pairs. This is queried in handwritten assembly
in VG_(disp_cp_xindir) in dispatch-<arch>-<os>.S. If there is a miss in the
cache then we fall back out to C land, and do a slow lookup using
VG_(search_transtab).
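In outline, the direct-mapped scheme works like this (a simplified sketch
with illustrative names, not the exact declarations):

    #define FAST_BITS 15
    #define FAST_SIZE (1UL << FAST_BITS)             /* 2^15 entries */
    #define FAST_MASK (FAST_SIZE - 1)

    typedef struct { unsigned long guest; unsigned long host; } FastEntry;
    static FastEntry tt_fast[FAST_SIZE];

    static unsigned long fast_lookup(unsigned long guest_addr)
    {
       FastEntry* e = &tt_fast[guest_addr & FAST_MASK]; /* low bits index */
       if (e->guest == guest_addr)
          return e->host;      /* hit: continue in JITted code            */
       return 0;               /* miss: fall back to VG_(search_transtab) */
    }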
Given that the size of the translation table(s) in recent years has expanded
significantly in order to keep pace with increasing application sizes, two bad
things have happened: (1) the cost of a miss in the fast cache has risen
significantly, and (2) the miss rate on the fast cache has also increased
significantly. This means that large (~ one-million-basic-blocks-JITted)
applications that run for a long time end up spending a lot of time in
VG_(search_transtab).
The proposed fix is to increase the associativity of the fast cache from 1
(direct mapped) to 4. Simulations of various cache configurations, using
indirect-branch traces from a large application, show that this is the best
of the configurations tried. In an extreme case with 5.7 billion indirect
branches:
* The increase of associativity from 1 way to 4 way, whilst keeping the
overall cache size the same (32k guest/host pairs), reduces the miss rate by
around a factor of 3, from 4.02% to 1.30%.
* The use of a slightly better hash function than merely slicing off the
bottom 15 bits of the address reduces the miss rate further, from 1.30% to
0.53%.
Overall the VG_(tt_fast) miss rate is almost unchanged on small workloads, but
reduced by a factor of up to almost 8 on large workloads.
By implementing each (4-entry) cache set using a move-to-front scheme in the
case of hits in ways 1, 2 or 3, the vast majority of hits can be made to
happen in way 0. Hence the cost of having this extra associativity is almost
zero in the case of a hit. The improved hash function costs an extra 2 ALU
operations (a shift and an xor), but overall this seems performance neutral
to a win.
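A hedged sketch of the combined scheme (4-way sets, move-to-front on hit,
shift-and-xor hash); constants and names are illustrative:

    #define SET_BITS 13
    #define N_SETS   (1UL << SET_BITS)  /* 8192 sets x 4 ways = 32k pairs */

    typedef struct {
       unsigned long guest[4];
       unsigned long host[4];
    } FastCacheSet;
    static FastCacheSet sets[N_SETS];

    static unsigned long set_of(unsigned long ga)
    {
       return (ga ^ (ga >> SET_BITS)) & (N_SETS - 1);  /* shift + xor */
    }

    static unsigned long lookup(unsigned long ga)
    {
       FastCacheSet* s = &sets[set_of(ga)];
       for (int i = 0; i < 4; i++) {
          if (s->guest[i] == ga) {
             unsigned long ha = s->host[i];
             /* move-to-front: shuffle the hit up to way 0 so the next
                lookup of this address hits on the first comparison */
             for (int j = i; j > 0; j--) {
                s->guest[j] = s->guest[j-1];
                s->host[j]  = s->host[j-1];
             }
             s->guest[0] = ga;
             s->host[0]  = ha;
             return ha;
          }
       }
       return 0;   /* miss: fall back to the slow lookup */
    }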
Adding MIPS N32 ABI support.
BZ issue - #345763.
Contributed and maintained by multiple people over the years:
Crestez Dan Leonard, Maran Pakkirisamy, Dimitrije Nikolic,
Aleksandar Rikalo, Tamara Vlahovic.
Replace use of daddi/addi with daddiu/addiu.
This is more R6-friendly, and we actually want to use the instructions
that do not raise an integer overflow exception.
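For example (GCC inline asm for mips64; a minimal sketch):

    /* daddiu wraps on overflow and never traps; daddi would raise an
       integer overflow exception and was removed in MIPS R6 anyway. */
    long add8(long x)
    {
       long r;
       __asm__("daddiu %0, %1, 8" : "=r"(r) : "r"(x));
       return r;
    }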
Patch by Vicente Olivert Riera.
Related issue - BZ#356112.
git-svn-id: svn://svn.valgrind.org/valgrind/trunk@16018
Explanation by Matthias Schwarzott:
The linker will request an executable stack as soon as at least one
object file that is linked in wants an executable stack.
And the absence of the
  .section .note.GNU-stack,"",@progbits
directive is enough to tell the linker that an executable stack is needed.
So even an empty asm file must at least contain this statement so as not
to force an executable stack on the whole executable.
* Define a helper macro MARK_STACK_NO_EXEC that disables the
executable stack.
* Instantiate this macro unconditionally at the end of each asm file.
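A hedged sketch of what such a macro can look like (the exact Valgrind
definition may differ):

    /* Expanded at the end of every .S file; emits the note section that
       tells the linker no executable stack is required. */
    #if defined(__linux__)
    #  define MARK_STACK_NO_EXEC .section .note.GNU-stack,"",@progbits
    #else
    #  define MARK_STACK_NO_EXEC /* no-op on non-Linux targets */
    #endif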
Patch by Matthias Schwarzott <zzam@gentoo.org>.
git-svn-id: svn://svn.valgrind.org/valgrind/trunk@15692
Valgrind aspects, to match vex r3124.
See bug 339778 - Linux/TileGx platform support to Valgrind
git-svn-id: svn://svn.valgrind.org/valgrind/trunk@15080
This is the second of three patches to add PPC64 LE support. The other two
patches can be found in Bugzillas 334384 and 334836.
POWER PC, add the functional Little Endian support, patch 2
The IBM POWER processor now supports both Big Endian and Little Endian.
The ABI for Little Endian also changes: specifically, the function
descriptor is no longer used, the stack size changed, and the way the
TOC is accessed changed. Functions now have a local and a global entry
point. Register r2 contains the TOC for local calls and register r12
contains the TOC for global calls. This patch makes the functional
changes to the Valgrind tool. The patch makes the changes needed for the
none/tests/ppc32 and none/tests/ppc64 Makefile.am. A number of the
ppc specific tests have Endian dependencies that are not fixed in
this patch. They are fixed in the next patch.
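As a hedged illustration of the dual entry points (generic ELFv2 assembly,
not Valgrind's actual dispatcher code):

    my_func:
            addis 2,12,(.TOC.-my_func)@ha  # global entry: r12 holds the
            addi  2,2,(.TOC.-my_func)@l    # entry address; derive the TOC
            .localentry my_func,.-my_func  # local calls enter here, with
                                           # r2 (the TOC) already valid
            blr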
Per Julian's comments, renamed coregrind/m_dispatch/dispatch-ppc64-linux.S
to coregrind/m_dispatch/dispatch-ppc64be-linux.S and created a new file for
LE, coregrind/m_dispatch/dispatch-ppc64le-linux.S. The same was done for
coregrind/m_syswrap/syscall-ppc-linux.S.
Signed-off-by: Carl Love <carll@us.ibm.com>
git-svn-id: svn://svn.valgrind.org/valgrind/trunk@14239
This is the first of three patches to add PPC64 LE support. The other two
patches can be found in Bugzillas 334834 and 334836. The commit does not
have a VEX commit associated with it.
POWER PC, add initial Little Endian support
The IBM POWER processor now supports both Big Endian and Little Endian.
This patch renames the #defines with the name ppc64 to ppc64be for the BE
specific code. This patch adds the Little Endian #define ppc64le for the
LE specific code.
Additionally, a few functions are renamed to remove BE from the name if the
function is used by BE and LE. Functions that are BE specific have BE put
in the name.
The goal of this patch is to make sure #defines, function names and
variables consistently use PPC64/ppc64 when referring to both BE and LE,
PPC64BE/ppc64be when specific to BE, and PPC64LE/ppc64le when specific to
LE. The patch does not break the code for PPC64 Big Endian.
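A hedged sketch of the resulting convention (the guard names shown here
follow Valgrind's VGP_* pattern; illustrative only):

    #if defined(VGP_ppc64be_linux) || defined(VGP_ppc64le_linux)
       /* code common to both PPC64 endiannesses */
    #endif
    #if defined(VGP_ppc64be_linux)
       /* Big Endian only, e.g. function descriptor handling */
    #endif
    #if defined(VGP_ppc64le_linux)
       /* Little Endian only */
    #endif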
The test files memcheck/tests/atomic_incs.c and
tests/power_insn_available.c are also updated to the new #define
definitions for PPC64 BE.
Signed-off-by: Carl Love <carll@us.ibm.com>
git-svn-id: svn://svn.valgrind.org/valgrind/trunk@14238
Necessary changes to Valgrind to support MIPS64LE on Linux.
Minor cleanup/style changes embedded in the patch as well.
The change corresponds to r2687 in VEX.
Patch written by Dejan Jevtic and Petar Jovanovic.
More information about this issue:
https://bugs.kde.org/show_bug.cgi?id=313267
git-svn-id: svn://svn.valgrind.org/valgrind/trunk@13292
Reserve space for frame header in disp_run_translations, as some optimizations
may decide to use it. This should fix issue #307141.
Related link:
https://bugs.kde.org/show_bug.cgi?id=307141
git-svn-id: svn://svn.valgrind.org/valgrind/trunk@13080
Count XIndirs in such a way that the fast-path through VG_(disp_cp_xindir)
only has to increment a 32 bit counter, saving memory bandwidth on 32 bit
platforms compared to a 64-bit increment. The overall numbers of XIndirs
can still be 64 bit though.
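A hedged sketch of the idea (names are illustrative):

    static unsigned int       n_xindirs_32;  /* bumped by the asm fast path */
    static unsigned long long n_xindirs;     /* 64-bit running total        */

    /* Called often enough that the 32-bit counter cannot wrap in between. */
    static void fold_xindir_stats(void)
    {
       n_xindirs    += n_xindirs_32;
       n_xindirs_32  = 0;
    }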
git-svn-id: svn://svn.valgrind.org/valgrind/trunk@12527
Rework the dispatcher so that the profiling and non-profiling variants
have as much code as possible in common.
To accomplish that without penalizing the non-profiling dispatcher
we do the stats gathering *after* the jitted code returns to the
dispatcher. For that to work properly, we need to stash away the
instruction address before entering the jitted code so we can use
it later. (See also VEX r2208).
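In C-like terms the scheme is roughly this (hypothetical names; a sketch,
not the actual dispatcher):

    extern void run_translation(unsigned long ip);  /* enters JITted code */
    static unsigned long      guest_ip;             /* guest program ctr  */
    static unsigned long long counts[1UL << 15];    /* per-block counters */

    static void dispatch_loop(void)
    {
       for (;;) {
          unsigned long entry_ip = guest_ip;  /* stash before entering    */
          run_translation(entry_ip);          /* JITted code updates      */
                                              /* guest_ip as it runs      */
          counts[(entry_ip >> 2) & 0x7fff]++; /* stats *after* it returns */
       }
    }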
Two other tweaks are included here:
(1) For the non-profiling dispatcher it is not necessary to update
the LR in each iteration. Quite obviously the jitted code cannot
modify the LR in its iteration because it needs it at the very end
when it returns. So we move this step out of the core loop.
(2) Move loading the address of VG_(tt_fast) past testing for a changed
guest state pointer.
git-svn-id: svn://svn.valgrind.org/valgrind/trunk@12044
Restructure the dispatcher so as to avoid a GSP-changed check in the common
case. See vex r2155.
(amd64-darwin and x86-darwin are now temporarily unbuildable.)
git-svn-id: svn://svn.valgrind.org/valgrind/trunk@11786