There will be a lot more to come.
On amd64 Linux:
In faultstatus, GCC was seeing the division by zero and emitting a ud2 opcode.
In wrap3, a pair of mutually recursive functions was being inlined.
When forced not to inline them, GCC merged them into a single function:
it cannot see that the client requests have different behaviour.
Leaving it in place for 11 (which is now EOL) and 12 - not
worth the complexity for them. Improve comment for suppression.
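For illustration, a minimal sketch of the wrap3 pattern (simplified;
not the actual test source):

    /* The two functions are structurally identical, so GCC can merge
       them into a single self-recursive function even when inlining
       is disabled -- it cannot know that the Valgrind function
       wrappers attached to them behave differently. */
    #include <stdio.h>

    __attribute__((noinline)) static int fact1 ( int n );
    __attribute__((noinline)) static int fact2 ( int n );

    static int fact1 ( int n ) { return n <= 1 ? 1 : n * fact2(n-1); }
    static int fact2 ( int n ) { return n <= 1 ? 1 : n * fact1(n-1); }

    int main ( void )
    {
       printf("%d\n", fact1(5));
       return 0;
    }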
Also add a pointer to the illumos source web page for lwp_mutex_unlock
in case the syswrap ever needs improving.
I tried to test drd/tests/pth_mutex_signal on Solaris
(you never know) but encountered a missing syscall
wrapper. So this adds a very basic wrapper for lwp_mutex_unlock.
Also update a Solaris expected file that I missed amongst the FreeBSD changes.
This was broken by commit 75e3ef0f3 "readdwarf3: Skip units without
addresses when looking for inlined functions". Specifically by this
part: "Also use skip_DIE instead of read_DIE when not parsing
(skipping) children"
rustc puts concrete function instances in namespaces (which is
allowed in DWARF, since there is no strict separation between type
declarations and program scope entries in a DIE tree). The inline
parser didn't expect this and so skipped any DIE under a namespace
entry. This wasn't an issue before, because "skipping" a DIE tree was
done by reading it, so it wasn't actually skipped. But now that we
really skip the DIE (sub)tree (which is faster than actually parsing
it), some entries were missed in the rustc case.
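To make the shape of the bug concrete, here is a self-contained toy
model of the walk (tags as strings, everything else simplified; the
real code in readdwarf3.c is considerably more involved):

    #include <stdio.h>
    #include <string.h>

    /* Toy DIE tree: each node has a tag, a child list and siblings. */
    typedef struct DIE {
       const char *tag;
       struct DIE *child, *sibling;
    } DIE;

    static void find_inlined ( const DIE* d )
    {
       for (; d != NULL; d = d->sibling) {
          if (strcmp(d->tag, "namespace") == 0            /* the fix */
              || strcmp(d->tag, "subprogram") == 0
              || strcmp(d->tag, "inlined_subroutine") == 0) {
             /* "read_DIE": parse this entry and recurse */
             printf("reading DW_TAG_%s\n", d->tag);
             find_inlined(d->child);
          }
          /* else "skip_DIE": the whole subtree is skipped, so without
             the namespace case above, anything rustc nested inside a
             namespace would be missed */
       }
    }

    int main ( void )
    {
       DIE fn = { "subprogram", NULL, NULL };
       DIE ns = { "namespace",  &fn,  NULL };
       find_inlined(&ns);
       return 0;
    }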
https://bugs.kde.org/show_bug.cgi?id=445668
The name malloc-leaks-cxx-stl-string-classes-debug was confusing:
the suppressed allocation is not a leak, and is not part of the STL,
strings, classes or debug code. Rename it to
libstdcxx-emergency-eh-alloc-pool to indicate it is part of the
emergency exception handling memory pool.
Note that the suppression is only needed for some test cases; normally
the pool is cleaned up as part of cxx_freeres.
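For reference, a suppression stanza of this kind looks as follows
(the frames below are illustrative, not the exact ones shipped):

    {
       libstdcxx-emergency-eh-alloc-pool
       Memcheck:Leak
       fun:malloc
       ...
       obj:*/libstdc++.so*
    }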
Add amd64 spec rules for the following conditions after shifts:
S after SHRQ
Z after SHLQ
NZ after SHLQ
Z after SHLL
S after SHLL
The lack of at least one of these was observed to cause occasional false
positives in Memcheck.
Also add commented-out cases so as to complete the set of 12 rules
{Z,NZ,S,NS} after {SHRQ,SHLQ,SHLL}. Those remain commented out
because I have so far not found any use cases for them.
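Each rule folds a condition-after-shift query into a direct test on
the shift result. A sketch of one of them, in the style of
guest_amd64_spechelper() (surrounding plumbing omitted):

    /* Z after SHLQ --> test whether the 64-bit result (cc_dep1)
       is zero */
    if (isU64(cc_op, AMD64G_CC_OP_SHLQ) && isU64(cond, AMD64CondZ)) {
       return unop(Iop_1Uto64,
                   binop(Iop_CmpEQ64, cc_dep1, mkU64(0)));
    }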
This time with debuginfo removed.
Also update the vgtest files for a couple of massif tests
(and also the expected files, because of the command line change).
Not yet tested these two with debuginfo installed.
For the arm64 front end, none of the atomic instructions have address
alignment checks included in their IR. They all should. The effect of
missing alignment checks in the IR is that, since this IR will in most cases
be translated back to atomic instructions in the back end, we will get
alignment traps (SIGBUS) on the host side and not on the guest side, which is
(very) incorrect behaviour of the simulation.
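A sketch of the kind of check each such instruction needs, modelled
on similar helpers in other front ends (names illustrative):

    /* Emit IR to raise SIGBUS on the guest side if the address is
       not 8-aligned, before the atomic access itself happens. */
    static void gen_SIGBUS_if_not_8_aligned ( IRTemp addr )
    {
       stmt( IRStmt_Exit(
                binop(Iop_CmpNE64,
                      binop(Iop_And64, mkexpr(addr), mkU64(7)),
                      mkU64(0)),
                Ijk_SigBUS,
                IRConst_U64(guest_PC_curr_instr),
                OFFB_PC ));
    }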
The problem is that the testcase-specific suppression has stacks
that are too specific. This causes breakage with different versions
of GCC and libstdc++. The suppression only needs to mask the memory
pool used for standard I/O.
There are several suppression stanzas, so future tweaks may still be
necessary.
Clang uses CMOV for ternary operators, which does not immediately
trigger an error. Using a double free and a new/free mismatch instead
poses no problem with clang while still exercising the demangling.
Also update .gitignore
This is unfortunately a big and complex patch, to implement LD{,A}XP and
ST{,L}XP. These were omitted from the original AArch64 v8.0 implementation
for unknown reasons.
(Background) the patch is made significantly more complex because for AArch64
we actually have two implementations of the underlying
Load-Linked/Store-Conditional (LL/SC) machinery: a "primary" implementation,
which translates LL/SC more or less directly into IR and re-emits them at the
back end, and a "fallback" implementation that implements LL/SC "manually", by
taking advantage of the fact that V serialises thread execution, so we can
"implement" LL/SC by simulating a reservation using fields LLSC_* in the guest
state, and invalidating the reservation at every thread switch.
(Background) the fallback scheme is needed because the primary scheme is in
violation of the ARMv8 semantics in that it can (easily) introduce extra
memory references between the LL and SC, hence on some hardware causing the
reservation to always fail and so the simulated program to wind up looping
forever.
For these instructions, big picture:
* for the primary implementation, we take advantage of the fact that
IRStmt_LLSC allows I128-typed transactions to be represented. Hence we bundle
up the two 64-bit data elements into an I128 (or vice versa) and present a
single I128-typed IRStmt_LLSC in the IR. In the backend, those are
re-emitted as LDXP/STXP respectively. For LL/SC on 32-bit register pairs,
that bundling produces a single 64-bit item, and so the existing LL/SC
backend machinery handles it. The effect is that a doubleword 32-bit LL/SC
in the front end translates into a single 64-bit LL/SC in the back end.
Overall, though, the implementation is straightforward.
* for the fallback implementation, it is necessary to extend the guest state
field `guest_LLSC_DATA` to represent a 128-bit transaction, by splitting it
into _DATA_LO64 and _DATA_HI64. Then, the implementation is an exact
analogue of the fallback implementation for single-word LL/SC. It takes
advantage of the fact that the backend already supports 128-bit CAS, as
fixed in bug 445354. As with the primary implementation, doubleword 32-bit
LL/SC is bundled into a single 64-bit transaction.
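Concretely, the fallback machinery tracks the reservation in guest
state fields along these lines (a sketch of the relevant
VexGuestARM64State part):

    ULong guest_LLSC_SIZE;       /* 0 == no transaction, else the
                                    transaction size in bytes */
    ULong guest_LLSC_ADDR;       /* address of the transaction */
    ULong guest_LLSC_DATA_LO64;  /* bits 63:0 of the loaded data */
    ULong guest_LLSC_DATA_HI64;  /* bits 127:64, used only by the
                                    128-bit (LDXP/STXP) cases */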
Detailed changes:
* new arm64 guest state fields LLSC_DATA_LO64/LLSC_DATA_HI64 to replace
guest_LLSC_DATA
* (ridealong fix) arm64 front end: a fix to a minor and harmless decoding bug
for the single-word LDX/STX case.
* arm64 front end: IR generation for LD{,A}XP/ST{,L}XP: tedious and
longwinded, but per comments above, an exact(ish) analogue of the singleword
case
* arm64 backend: new insns ARM64Instr_LdrEXP / ARM64Instr_StrEXP to wrap up 2
x 64 exclusive loads/stores. Per comments above, there's no need to handle
the 2 x 32 case.
* arm64 isel: translate I128-typed IRStmt_LLSC into the above two insns
* arm64 isel: some auxiliary bits and pieces needed to handle I128 values;
this is standard doubleword isel stuff
* arm64 isel: (ridealong fix): Ist_CAS: check for endianness of the CAS!
* arm64 isel: (ridealong) a couple of formatting fixes
* IR infrastructure: add support for I128 constants, done the same as V128
constants
* memcheck: handle shadow loads and stores for I128 values
* testcase: memcheck/tests/atomic_incs.c: on arm64, also test 128-bit atomic
addition, to check we really have atomicity right
* testcase: new test none/tests/arm64/ldxp_stxp.c, tests operation but not
atomicity. (Smoke test).
The sequence of instructions emitted by the arm64 backend for doubleword
compare-and-swap is incorrect. This could lead to incorrect simulation of the
ARMv8.1 atomic instructions (CASP, at least). It also causes failures in the
upcoming fix for v8.0 support for LD{,A}XP/ST{,L}XP in bug 444399, at least
when running with the fallback LL/SC implementation
(`--sim-hints=fallback-llsc`, or as autoselected at startup). In the worst
case it can cause segfaulting in the generated code, because it could jump
backwards unexpectedly far.
The problem is the sequence emitted for ARM64in_CASP:
* the jump offsets are incorrect, both for `bne out` (x 2) and `cbnz w1, loop`.
* using w1 to hold the success indication of the stxp instruction trashes the
previous value in x1. But the value in x1 is an output of ARM64in_CASP,
hence one of the two output registers is corrupted. That confuses any code
downstream that wants to inspect those values to find out whether or not the
transaction succeeded.
The fixes are to
* fix the branch offsets
* use a different register to hold the stxp success indication. w3 is a
convenient choice.
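For reference, the intended shape of the emitted sequence (register
assignments illustrative; the key points are the corrected branch
targets and the use of w3, not w1, for the stxp status):

    loop:
       ldxp  x0, x1, [x2]      // load current pair into the outputs
       cmp   x0, x4            // compare low half with expected
       b.ne  out
       cmp   x1, x5            // compare high half with expected
       b.ne  out
       stxp  w3, x6, x7, [x2]  // try to store the new pair
       cbnz  w3, loop          // reservation lost: retry
    out: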
The demangle-rust.vgtest would fail because the demangle-rust binary
wasn't built by default. Add it to check_PROGRAMS and define
demangle_rust_SOURCES to make sure it is always built.
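Roughly, in Makefile.am terms (a sketch; the source file name is an
assumption):

    check_PROGRAMS += demangle-rust
    demangle_rust_SOURCES = demangle-rust.c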
It's currently broken due to a silly test that prevents the v0
demangling code from even running.
The commit also adds a test, to avoid such problems in the future.
The problem was that 'struct sigframe' has both a uContext struct
member and a puContext pointer to that struct. And puContext wasn't
being initialized to point to uContext.
It seems that the pthread sigreturn code uses puContext on i386.
amd64, with register arguments, didn't have this problem.
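In outline (a sketch; the real struct sigframe has more fields and a
platform-specific layout):

    struct sigframe {
       /* ... */
       ucontext_t  uContext;   /* the saved context itself */
       ucontext_t *puContext;  /* what the sigreturn path follows */
    };

    /* the missing step when building the frame: */
    frame->puContext = &frame->uContext;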
Because of what looks like a copy/paste issue, the new F16 Iops
were being tested on the wrong architectures. They are only implemented
on arm64, so this patch only enables them for arm64.
https://bugs.kde.org/show_bug.cgi?id=444831
Contributed by Will Schmidt <will_schmidt@vnet.ibm.com>
This includes updates and adjustments as suggested by Carl.
Add tests that exercise PCRelative instructions.
These instructions are encoded with R==1, which indicates that
the memory accessed by the instruction is at a location
relative to the currently executing instruction.
These tests are built using the -Wl,-Ttext and -Wl,-Tbss
options to ensure the location of the target array is at a
location with a specific offset from the currently
executing instruction.
The write instructions are aimed at a large buffer in
the bss section, which is checked for updates at the
completion of each test.
In order to ensure consistent output across assorted
systems, the tests have been padded with ori and nop instructions
and .align directives.
Detailed changes:
* Makefile.am: Add test_isa_3_1_R1_RT and test_isa_3_1_R1_XT tests.
* isa_3_1_helpers.h: Add identify_instruction_by_func_name() helper function
to indicate if the test is for R==1.
Add helpers to initialize and print changes to the pcrelative_write_target
array.
Add #define to help pad code with a series of eyecatcher ORI instructions.
* test_isa_3_1_R1_RT.c: New test.
* test_isa_3_1_R1_XT.c: New test.
* test_isa_3_1_R1_RT.stdout.exp: New expected output.
* test_isa_3_1_R1_XT.stdout.exp: New expected output.
* test_isa_3_1_R1_RT.stderr.exp: New expected output.
* test_isa_3_1_R1_XT.stderr.exp: New expected output.
* test_isa_3_1_R1_RT.vgtest: New test handler.
* test_isa_3_1_R1_XT.vgtest: New test handler.
* test_isa_3_1_common.c: Add indicators (updates_byte, updates_halfword,
updates_word) to control the output from the R==1 tests.
Add a helper that checks for "_R1" to indicate whether an instruction is
coded with R==1 (see the sketch below).
Add init and print helpers for the pcrelative_write_target array.
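The R==1 check can be as simple as a substring test on the generated
function names (a sketch; the real logic lives in
identify_instruction_by_func_name()):

    #include <string.h>

    /* Nonzero if this test function exercises the R==1
       (PC-relative) form of an instruction. */
    static int is_R1_form ( const char* func_name )
    {
       return strstr(func_name, "_R1") != NULL;
    }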
The pstq instruction with R==1 was not using the correct effective address.
The EA_hi and EA_lo should have been based on the value of EA as calculated
by the function calculate_prefix_EA. Unfortunately, the EA_hi and EA_lo
addresses were still using the previous code (not PC-relative) to calculate
the address from the contents of RA plus the offset.
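In other words (a sketch; byte order handling and the rest of the
pstq implementation omitted):

    /* Both doublewords of the quadword must be addressed from the EA
       produced by calculate_prefix_EA (PC-relative when R==1), not
       from RA plus the offset. */
    EA    = calculate_prefix_EA( /* ... */ );
    EA_hi = EA;        /* doubleword holding one half */
    EA_lo = EA + 8;    /* doubleword holding the other half */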
On some systems the gdbserver_tests would fail because the filter
for the optimized hwcaps subdir didn't match: the library file is
named slightly differently, with the version number before .so
instead of after, for example /lib64/glibc-hwcaps/power9/libc-2.28.so.
Add one extra filter for this pattern.
The lxsibzx instruction was doing a 64-bit load, initializing
additional bytes in the register that should not have been initialized.
The memcheck/tests/linux/dlclose_leak test detected the issue. Code
generation with -mcpu=power9 uses lxsibzx and stxsibx where previously
the lbz and stb instructions were generated.
The same issue was noted and fixed with the lxsihzx instruction. The
memcheck/tests/linux/badrw test now passes as well.
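The fix amounts to loading exactly one byte and zero-extending it,
along these lines (guest_ppc_toIR.c style; details illustrative):

    /* lxsibzx: load a single byte, zero-extend it to 64 bits and
       place it in the upper doubleword of VSR[XT] */
    assign( byte, load( Ity_I8, mkexpr( EA ) ) );
    putVSReg( XT, binop( Iop_64HLtoV128,
                         unop( Iop_8Uto64, mkexpr( byte ) ),
                         mkU64( 0 ) ) );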
https://bugs.kde.org/show_bug.cgi?id=444571
In s390_irgen_EXRL, the offset is zero-extended instead of sign-extended,
typically causing Valgrind to crash when a negative offset occurs.
Fix this with a new helper function that calculates a "relative long"
address from a 32-bit offset. Replace other calculations of "relative
long" addresses by invocations of this function as well. And for
consistency, do the same with "relative" (short) addresses.
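The semantics of the helper, as a standalone sketch (the offset is a
signed count of halfwords, so it must be sign-extended before
scaling; the function name is illustrative):

    #include <stdint.h>

    /* "relative long" address: instruction address plus twice the
       sign-extended 32-bit immediate */
    static uint64_t addr_relative_long ( uint64_t guest_IA, uint32_t i2 )
    {
       return guest_IA + (uint64_t)((int64_t)(int32_t)i2 * 2);
    }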
Currently the version is updated in 3 places, configure.ac,
include/valgrind.h and docs/xml/vg-entities.xml. This goes wrong from
time to time. So only define the version (and release date) once in
configure.ac and update both other places at configure time.
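Sketch of the mechanism, in standard autoconf terms (the version
number is a placeholder):

    dnl configure.ac: state the version exactly once
    AC_INIT([Valgrind], [3.19.0])
    dnl ...
    AC_CONFIG_FILES([include/valgrind.h docs/xml/vg-entities.xml])

include/valgrind.h.in and docs/xml/vg-entities.xml.in then refer to
@PACKAGE_VERSION@ instead of hard-coding the version.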
Depending on architecture glibc has various functions that set things
up to call "main". glibc 2.34 added __libc_start_call_main (at least
on ppc64le and s390x). Other names recognized are __libc_start_main,
generic_start_main and variants of those.
This fixes the massif/tests/deep-D and massif/tests/mmapunmap on ppc64le.
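The recognition amounts to a name check along these lines
(simplified; the real matching also covers variants of these names):

    #include <string.h>

    static int is_start_main_fn ( const char* fnname )
    {
       return strcmp(fnname, "__libc_start_main") == 0
           || strcmp(fnname, "__libc_start_call_main") == 0 /* glibc 2.34 */
           || strcmp(fnname, "generic_start_main") == 0;
    }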