For most SIMD operations that happen on 64-bit values (as would arise from MMX
instructions, for example, such as Add16x4, CmpEQ32x2, etc), generate code
that performs the operation using SSE/SSE2 instructions on values in the low
halves of XMM registers. This is much more efficient than the previous scheme
of calling out to helper functions written in C. There are still a few SIMD64
operations done via helpers, though.
.. by adding support for MOVQ xmm/ireg and using that to implement 64HLtoV128,
4x64toV256 and their inverses. This reduces the number of instructions,
removes the use of memory as an intermediary, and avoids store-forwarding
stalls.
pshufb mm/xmm/ymm rearranges byte lanes in vector registers. It's fairly
widely used, but we generated terrible code for it. With this patch, we just
generate, at the back end, pshufb plus a bit of masking, which is a great
improvement.
* changes set_AV_CR6 so that it does scalar comparisons against zero,
rather than sometimes against an all-ones word. This is something
that Memcheck can instrument exactly.
* in Memcheck, requests expensive instrumentation of Iop_Cmp{EQ,NE}64
by default on ppc64le.
https://bugs.kde.org/show_bug.cgi?id=386945#c62
This makes it possible for memcheck to know which part of the 128bit
vector is defined, even if the load is partly beyond an addressable block.
Partially resolves bug 386945.
On powerpc partial unaligned loads of vectors from partially invalid
addresses are OK and could be generated by our translation of lxvd2x.
Adjust partial_load memcheck tests to allow partial loads of 16 byte
vectors on powerpc64.
Part of resolving bug #386945.
This makes it possible for memcheck to know which part of the 128bit
vector is defined, even if the load is partly beyond an addressable block.
Partially resolves bug 386945.
On powerpc partial unaligned loads of words from partially invalid
addresses are OK and could be generated by our translation of ldbrx.
Adjust partial_load memcheck tests to allow partial loads of words
on powerpc64.
Part of resolving bug #386945.
This makes it possible for memcheck to analyse the new gcc strcmp
inlined code correctly even if the ldbrx load is partly beyond an
addressable block.
Partially resolves bug 386945.
This happens when processing openssl aes_v8_set_encrypt_key
(aesv8-armx.S:133). The noteTmpUsesIn () function is new since
PR387664 Memcheck: make expensive-definedness-checks be the default.
It didn't handle Iex_VECRET which is used in the arm64 crypto
instruction dirty handlers.
The sys_ptrace post didn't mark the thread as being in traceme mode.
This occassionally would make the memcheck/tests/linux/getregset.vgtest
testcase fail. With this patch it reliably passes.
Wait for children to finish before terminating the main process.
This fixes occasional failures of the following tests:
drd/tests/fork-parallel (stderr)
drd/tests/fork-serial (stderr)
In final_tidyup we setup the guest to call the freeres_wrapper, which
will (possibly) call __gnu_cxx::__freeres() and/or __libc_freeres().
In a couple of cases (ppc64be, ppc64le and mips32) this involves setting
up one or more helper registers. Since we setup these guest registers
we should make sure to mark them as fully defined. Otherwise we might
see spurious warnings about undefined value usage if the guest register
happened to not be fully defined before.
This fixes PR402006.
Because it's very useful. As part of this, the "percentage of events
annotated" numbers at the bottom of the output is changed to "events
annotated" so that --show-percs doesn't compute a percentage of a
percentage.
Example output lines:
```
4,967,137,442 (100.0%) PROGRAM TOTALS
4,543 (25.23%) 17,566 ( 0.43%) 47,993 ( 0.92%) /build/glibc-OTsEL5/glibc-2.27/elf/dl-lookup.c
1 ( 0.01%) 2,000,001 (49.29%) 3,000,004 (57.36%) for (int i = 0; i < 1000000; i++) {
```
The commit also adds some much-needed tests for cg_annotate and
callgrind_annotate.
glibc 2.28 filters out some bad signal numbers and returns
Invalid argument instead of passing such bad signal numbers
the kernel sigaction syscall. So we won't see such bad signal
numbers and won't print "bad signal number" ourselves.
Add a new memcheck/tests/sigkill.stderr.exp-glibc-2.28 to catch
this case.
The mfvscr and vor instructions in jm-insns.c had a "=vr" constraint.
This should have been an "=v" constraint. This resolved assembler
warnings and the testcase failing on ppc64le with gcc 8.2 and
binutils 2.30.
This adds a configuration file ".dir-locals.el" for Emacs to the topmost
directory of the Valgrind source tree, and another such file to the
directory drd/tests. These files contain per-directory local Emacs
variables.
The following settings are performed:
* The base C style is set to "Linux", indentation is set to 3 columns
per level, the use of tabs for indentation is disabled, and the fill
column is set to 80.
* The source files in drd/tests use 2 instead of 3 columns per indentation
level.
Add test cases for the z13 vector FP support. Bring s390-opcodes.csv
up-to-date, reflecting that the z13 vector instructions are now supported.
Also remove the non-support disclaimer for the vector facility from
README.s390.
The patch was contributed by Vadim Barkov, with some clean-up and minor
adjustments by Andreas Arnez.
This adds support for the z/Architecture vector FP instructions that were
introduced with z13.
The patch was contributed by Vadim Barkov, with some clean-up and minor
adjustments by Andreas Arnez.
This generalises the existing spec rules for W of 32 bits:
W <u 0---(N-1)---0 1 0---0 or
(that is, B/NB after SUBL, where dep2 has the above form), to also cover
W <=u 0---(N-1)---0 0 1---1
(that is, BE/NBE after SUBL, where dept2 has the specified form).
Patch from Nicolas B. Pierron (nicolas.b.pierron@nbp.name).
- The option --xtree-leak=yes (to output leak result in xtree format)
automatically activates the option --show-leak-kinds=all,
as xtree visualisation tools such as kcachegrind can in any case
select what kind of leak to visualise.
This patch addresses the following:
* Fix the implementation of LOCGHI. Previously Valgrind performed 32-bit
sign extension instead of 64-bit sign extension on the immediate value.
* Advertise VXRS in HWCAP. If no VXRS are advertised, but the program
uses vector registers, this could cause problems with a glibc built with
"-march=z13".
This pertains to bug 386945.
VEX/priv/guest_ppc_toIR.c:
gen_POPCOUNT: use Iop_PopCount{32,64} where possible.
gen_vpopcntd_mode32: use Iop_PopCount32.
for cntlz{w,d}, use Iop_CtzNat{32,64}.
gen_byterev32: use Iop_Reverse8sIn32_x1 instead of lengthy sequence.
verbose_Clz32: remove (was unused anyway).
memcheck/mc_translate.c:
Add mkRight{32,64} as right-travelling analogues to mkLeft{32,64}.
doCmpORD: for the cases of a signed comparison against zero, compute
definedness of the 3 result bits (lt,gt,eq) separately, and, for the lt and eq
bits, do it exactly accurately.
expensiveCountTrailingZeroes: no functional change. Re-analyse/verify and add
comments.
expensiveCountLeadingZeroes: add. Very similar to
expensiveCountTrailingZeroes.
Add some comments to mark unary ops which are self-shadowing.
Route Iop_Ctz{,Nat}{32,64} through expensiveCountTrailingZeroes.
Route Iop_Clz{,Nat}{32,64} through expensiveCountLeadingZeroes.
Add instrumentation for Iop_PopCount{32,64} and Iop_Reverse8sIn32_x1.
memcheck/tests/vbit-test/irops.c
Add dummy new entries for all new IROps, just enough to make it compile and
run.
VEX/priv/host_ppc_defs.c, VEX/priv/host_ppc_defs.h:
Dont emit cnttz{w,d}. We may need them on a target which doesn't support
them. Instead we can generate a fairly reasonable alternative sequence with
cntlz{w,d} instead.
Add support for emitting popcnt{w,d}.
VEX/priv/host_ppc_isel.c
Add support for: Iop_ClzNat32 Iop_ClzNat64
Redo support for: Iop_Ctz{32,64} and their Nat equivalents, so as to not use
cnttz{w,d}, as mentioned above.
Add support for: Iop_PopCount64 Iop_PopCount32 Iop_Reverse8sIn32_x1
This is part of the fix for bug 386945. It adds the following IROps, plus
their supporting type- and printing- fragments:
Iop_Reverse8sIn32_x1: 32-bit byteswap. A fancy name, but it is consistent
with naming for the other swapping IROps that already exist.
Iop_PopCount64, Iop_PopCount32: population count
Iop_ClzNat64, Iop_ClzNat32, Iop_CtzNat64, Iop_CtzNat32: counting leading and
trailing zeroes, with "natural" (Nat) semantics for a zero input, meaning, in
the case of zero input, return the number of bits in the word. These
functionally overlap with the existing Iop_Clz64, Iop_Clz32, Iop_Ctz64,
Iop_Ctz32. The existing operations are undefined in case of a zero input.
Adding these new variants avoids the complexity of having to change the
declared semantics of the existing operations. Instead they are deprecated
but still available for use.