New Iops are defined:
Iop_Scale2_32Fx4, Iop_Scale2_64Fx2,
Iop_Log2_32Fx4, Iop_Log2_64Fx2,
Iop_F32x4_2toQ16x8, Iop_F64x2_2toQ32x4,
Iop_PackOddLanes8x16, Iop_PackEvenLanes8x16,
Iop_PackOddLanes16x8, Iop_PackEvenLanes16x8,
Iop_PackOddLanes32x4, Iop_PackEvenLanes32x4.
Contributed by:
Tamara Vlahovic, Aleksandar Rikalo and Aleksandra Karadzic.
Related BZ issue - #382563.
glibc doesn't guarantee anything about setrlimit with a NULL limit argument.
It could just crash (if it needs to adjust the limit) or might silently
succeed (as newer glibc versions do). Just remove the extra check.
See also the "setrlimit change to prlimit change in behavior" thread:
https://sourceware.org/ml/libc-alpha/2017-10/threads.html#00830
glibc ld.so has an optimization when resolving a symbol that checks
whether or not the upper 128 bits of the ymm registers are zero. If
so it uses "cheaper" instructions to save/restore them using the xmm
registers. If those upper 128 bits contain undefined values, memcheck
will issue a "Conditional jump or move depends on uninitialised value(s)"
warning whenever it tries to resolve a symbol.
This triggers in our sh-mem-vecxxx test cases. Suppress the warning
by default.
https://bugs.kde.org/show_bug.cgi?id=385868
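A default suppression of roughly this shape silences it (the entry name and
frames here are illustrative; the actual entry in Valgrind's default
suppressions may differ):

```
{
   glibc-ld.so-resolve-ymm-check
   Memcheck:Cond
   obj:*/ld-*.so
}
```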
Use MIPSRH_Reg to get a MIPSRH for Iop_Max32U. Without it, under specific
circumstances, the generated code may blow up and exceed Valgrind's
instruction buffer due to multiple calls to iselWordExpr_R through
iselWordExpr_RH.
Issue discovered while testing Valgrind on Android.
Patch by Tamara Vlahovic.
While handling Iex_ITE, do not use the same virtual register for the
input and output.
Issue discovered while testing Valgrind on Android.
Patch by Tamara Vlahovic.
Reg<->Reg MOV coalescing status is now a part of the HRegUsage.
This allows register allocation to query it twice without incurring
a performance penalty. This in turn allows better tracking of
vreg<->vreg MOV coalescing, so that all vregs in the coalesce chain
get the effective |dead_before| of the last vreg.
A small performance improvement has been observed because even spilled
vregs can now be coalesced (previously only assigned ones).
If the native compiler can build Valgrind for mips32 (o32) on a native
mips64 system, it should do so.
This change adds a second architecture for MIPS, in a similar way to how
it was previously done for amd64 and ppc64.
The implementations of vpermr, xxperm and xxpermr violate this by
using a mask of 0x1F. Fix the code and the corresponding comments
to meet the definition of Iop_Perm8x16. Use Iop_Dup8x16 to generate
the vector value for the subtraction.
Bugzilla 385334.
The patch was in my git tree along with the patch I intended to apply.
I didn't realize it was there, so git applied both patches. Still
investigating the vperm change to see if it is really needed.
The ISA says:
Let the source vector be the concatenation of the
contents of VR[VRA] followed by the contents of
VR[VRB].
For each integer value i from 0 to 15, do the following.
Let index be the value specified by bits 3:7 of byte
element i of VR[VRC].
So, the index value is 5-bits wide ([3:7]), not 4-bits wide.
The current implementation will generate a lot of Iops. The number
of generated Iops can lead to Valgrind running out of temporary space.
See bugzilla https://bugs.kde.org/show_bug.cgi?id=385208 as an example
of the issue. Using Iop_Perm8x16 reduces the number of Iops significantly.
Bugzilla 385210
The current xxperm instruction implementation generates a huge
number of Iops to explicitly do the permutation. The code
was changed to use Iop_Perm8x16, which is much more efficient,
so temporary memory doesn't get exhausted.
Bugzilla 385208
The function calculates the floating point condition code values
and stores them into the floating point condition code register.
The function is used by a number of instructions. The calculation
generates a lot of Iops as it must check the operands for NaN, SNaN,
zero, dnorm, norm and infinity. The large number of Iops exhausts
temporary memory.
We never intended to ignore all changes from the top level down in
/include or /cachegrind. Instead, allow the filetype-specific .gitignore
patterns to match the contents of these two folders.
Also, don't ignore changes to include/valgrind.h as it exists in the repository
and should be tracked for any changes developers might make.
Changes tested by running a forced git clean and then a full rebuild. No
stray build artifacts were tracked erroneously by git after these changes.
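As an illustration (these exact patterns are hypothetical, not necessarily
the ones committed), the intent is roughly:

```
# filetype patterns still match inside include/ and cachegrind/
*.o
*.a
# include/valgrind.h exists in the repository; never ignore it
!include/valgrind.h
```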
Set correct values from Linux kernel.
See ./arch/mips/include/uapi/asm/sockios.h
This issue is covered by newly introduced memcheck test mips32/bad_sioc.
Helper calls always trash all caller-saved registers. By listing the
callee-saved registers first, the VEX register allocator (both v2 and v3)
is more likely to pick them and does not need to spill as much before
helper calls.
clang has been reasonably good at standards compliance for a while now, and
the Apple-shipped clang-variant in Xcode remains fairly close to upstream.
Let's assume that the Apple-shipped clang-variant is sufficient for
building valgrind, provided it is above a minimum version of 5.1.
massif/tests/mmapunmap on ppc now indicates a below main function.
Note: this ppc64-specific file is needed because the valgrind stack
unwinder does not properly unwind in main.
At the mmap syscall, gdb backtrace gives:
Breakpoint 3, 0x00000000041dbae0 in .__GI_mmap () from /lib64/libc.so.6
(gdb) bt
while the valgrind stack trace gives:
Thread 1: status = VgTs_Runnable (lwpid 64207)
==64207== at 0x41DBAE0: mmap (in /usr/lib64/libc-2.17.so)
==64207== by 0x10000833: f (mmapunmap.c:9)
==64207== by 0x40E6BEB: (below main) (in /usr/lib64/libc-2.17.so)
client stack range: [0x1FFEFF0000 0x1FFF00FFFF] client SP: 0x1FFF00ECE0
valgrind stack top usage: 15632 of 1048576
We can have stacktraces such as:
==41840== by 0x10000927: a1 (deep.c:27)
==41840== by 0x1000096F: main (deep.c:35)
==41840== by 0x4126BEB: generic_start_main.isra.0 (in /usr/lib64/libc-2.17.so)
==41840== by 0x4126E13: __libc_start_main (in /usr/lib64/libc-2.17.so)
So, add generic_start_main.isra.0 as a below main function.
This fixes the test massif/tests/deep-D
Currently, --ignore-fn is only matched against the top IP entries that
have a fnname. With this change, we first search for the first IP that
has a fnname.
This allows, for example, ignoring the allocation for a stacktrace such as:
0x1 0x2 0x3 fn_to_ignore otherfn
This is then used in the massif C++ tests new-cpp and overloaded-new to
ignore the libstdc++ allocation, similar to:
==10754== 72,704 bytes in 1 blocks are still reachable in loss record 10 of 10
==10754== at 0x4C2BBCD: malloc (vg_replace_malloc.c:299)
==10754== by 0x4EC39BF: ??? (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.22)
==10754== by 0x400F8A9: call_init.part.0 (dl-init.c:72)
==10754== by 0x400F9BA: call_init (dl-init.c:30)
==10754== by 0x400F9BA: _dl_init (dl-init.c:120)
==10754== by 0x4000C59: ??? (in /lib/x86_64-linux-gnu/ld-2.24.so)
The bug itself was solved in 3.12 by the addition of __gnu_cxx::__freeres
to libstdc++ and by having valgrind call it before exit.
However, depending on the version of libstdc++, the test leak_cpp_interior
was giving different results.
This commit adds some filtering specific to the test, so that it no longer
depends on the absolute number of bytes leaked, and adds a suppression
entry to ignore the memory allocated by libstdc++.
This allows having only 2 .exp files, instead of 4 (or worse, if we had
to handle yet more .exp files depending on the libstdc++ version).
gdbserver_tests/hgtls is failing on a number of platforms,
as it looks like static TLS handling is now needed.
So, implement static TLS for a few more platforms.
The formulas that are platform dependent are somewhat wild guesses
obtained by trial and error.
Note that arm/arm64/ppc32 are not (yet) done.
The below commit introduced a regression on ppc32:
commit 00d4667295a821fef9eb198abcb0c942dffb6045
Author: Ivo Raisr <ivosh@ivosh.net>
Date: Wed Sep 6 08:10:36 2017 +0200
Reorder allocatable registers for AMD64, X86, and PPC so that the callee saved are listed first.
Helper calls always trash all caller saved registers. By listing the callee saved
first then VEX register allocator (both v2 and v3) is more likely to pick them
and does not need to spill that much before helper calls.
Investigation/fix done by Ivo.
The compiler may optimize out the call to cbrt. Change the test to prevent
that. Otherwise, the test does not exercise the desired code path for cbrt
and just prints a precalculated value.
Helper calls always trash all caller-saved registers. By listing the
callee-saved registers first, the VEX register allocator (both v2 and v3)
is more likely to pick them and does not need to spill as much before
helper calls.
The code handling array bounds is not ready to accept a reference
to something else (it is not very clear what this reference could be):
the code only expects the value of a bound directly.
So, it was using the reference (i.e. an offset somewhere in the debug
info) as the value of the bound.
This then gave huge bounds for some arrays, causing an overlap
in the stack variable handling code in exp-sgcheck.
Such references seem to be used sometimes for variable-sized arrays
allocated on the stack.
Fix (or rather bypass) the problem by not considering that we have
a usable array bound when a reference is given.
Keep track of whether the bound real register has recently been reloaded
from a virtual register and whether this real reg is still equal to that
spill slot.
This avoids unnecessarily spilling that vreg later, when this rreg needs
to be reserved, usually as a caller-saved register for a helper call.
Fixes BZ#384526.
.. so that the code it creates runs in approximately half the time it did
before. This is in support of making the cost of expensive (exactly)
integer EQ/NE as low as possible, since the day will soon come when we'll
need to enable this by default.