On other arches stpcpy () is intercepted for both libc.so and ld.so.
But not on arm64, where it is only intercepted for libc.so.
This can cause memcheck warnings about the use of stpcpy () in ld.so
when called through dlopen (), because ld.so contains its own copy of
that function.
Fix by introducing VG_Z_LD_LINUX_AARCH64_SO_1 (the encoded name of
ld.so on arm64) and using that in vg_replace_strmem.c to intercept
stpcpy.
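A sketch of the change in vg_replace_strmem.c (the surrounding entries are
quoted from memory and may differ; only the aarch64 line is new):

    STPCPY(VG_Z_LIBC_SONAME,          stpcpy)
    STPCPY(VG_Z_LD_LINUX_SO_2,        stpcpy)
    STPCPY(VG_Z_LD_LINUX_X86_64_SO_2, stpcpy)
    STPCPY(VG_Z_LD_LINUX_AARCH64_SO_1, stpcpy)  /* new: ld.so on arm64 */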
https://bugs.kde.org/show_bug.cgi?id=407307
Based on a patch from Łukasz Marek.
Note that this function differs from:
*(T*)VG_(indexXA)(arr, index) = new_value;
as the function will mark the array as unsorted.
Note that this function is currently unused in the valgrind code base,
but it is useful for tools outside of the valgrind tree.
This commit thoroughly overhauls DHAT, moving it out of the
"experimental" ghetto. It makes moderate changes to DHAT itself,
including dumping profiling data to a JSON format output file. It also
implements a new data viewer (as a web app, in dhat/dh_view.html).
The main benefits over the old DHAT are as follows.
- The separation of data collection and presentation means you can run a
program once under DHAT and then sort the data in various ways. Also,
full data is in the output file, and the viewer chooses what to omit.
- The data can be sorted in more ways than previously. Some of these
sorts involve useful filters such as "short-lived" and "zero reads or
zero writes".
- The tree structure view avoids the need to choose stack trace depth.
This avoids both the problem of not enough depth (when records that
should be distinct are combined, and may not contain enough
information to be actionable) and the problem of too much depth (when
records that should be combined are separated, making them seem less
important than they really are).
- Byte and block measures are shown with a percentage relative to the
global count, which helps gauge relative significance of different
parts of the profile.
- Byte and block measures are also shown with an allocation rate
(bytes and blocks per million instructions), which enables comparisons
across multiple profiles, even if those profiles represent different
workloads.
- Both global and per-node measurements are taken at the global heap
peak ("At t-gmax"), which gives Massif-like insight into the point of
peak memory use.
- The final/lifetimes stats are a bit more useful than the old deaths
stats. (E.g. the old deaths stats didn't take into account lifetimes
of unfreed blocks.)
- The handling of realloc() has changed. The sequence `p = malloc(100);
realloc(p, 200);` now increases the total block count by 2 and the
total byte count by 300. Previously it increased them by 1 and 200.
The new handling is a more operational view that better reflects the
effect of allocations on performance. It makes a significant
difference in the results, giving paths involving reallocation (e.g.
repeated pushing to a growing vector) more prominence.
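As a concrete illustration of the new accounting (the counter names in the
comments are illustrative, not real DHAT identifiers):

    p = malloc(100);      /* blocks += 1, bytes += 100          */
    p = realloc(p, 200);  /* treated as a fresh 200-byte block: */
                          /* blocks += 1, bytes += 200          */
    /* New totals: 2 blocks / 300 bytes.  Old scheme: 1 / 200.  */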
Other things of note:
- There is now testing, both regression tests that run within the
standard test suite, and viewer-specific tests that cannot run within
the standard test suite. The latter are run by loading
dh_view.html?test=1 in a web browser.
- The commit puts all tool lists in Makefiles (and similar files) in the
following consistent order: memcheck, cachegrind, callgrind, helgrind,
drd, massif, dhat, lackey, none; exp-sgcheck, exp-bbv.
- A lot of fields in dh_main.c have been given more descriptive names.
Those names now match those used in dh_view.js.
[This commit contains an implementation for all targets except amd64-solaris
and x86-solaris, which will be completed shortly.]
In the baseline simulator, jumps to guest code addresses that are not known at
JIT time have to be looked up in a guest->host mapping table. That means:
indirect branches, indirect calls and most commonly, returns. Since there are
huge numbers of these (often 10+ million/second) the mapping mechanism needs
to be extremely cheap.
Currently, this is implemented using a direct-mapped cache, VG_(tt_fast), with
2^15 (guest_addr, host_addr) pairs. This is queried in handwritten assembly
in VG_(disp_cp_xindir) in dispatch-<arch>-<os>.S. If there is a miss in the
cache then we fall back out to C land, and do a slow lookup using
VG_(search_transtab).
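In outline, the lookup is equivalent to this C sketch (the real code is the
handwritten assembly mentioned above, and the field names are assumptions):

    typedef struct { Addr guest; Addr host; } FastCacheEntry;
    static FastCacheEntry VG_(tt_fast)[1 << 15];    /* 2^15 pairs */

    UWord slot = guest_addr & ((1 << 15) - 1);      /* bottom 15 bits */
    if (VG_(tt_fast)[slot].guest == guest_addr) {
       /* hit: jump directly to the cached host address */
    } else {
       /* miss: fall back to C and VG_(search_transtab) */
    }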
Given that the size of the translation table(s) in recent years has expanded
significantly in order to keep pace with increasing application sizes, two bad
things have happened: (1) the cost of a miss in the fast cache has risen
significantly, and (2) the miss rate on the fast cache has also increased
significantly. This means that large (~ one-million-basic-blocks-JITted)
applications that run for a long time end up spending a lot of time in
VG_(search_transtab).
The proposed fix is to increase associativity of the fast cache, from 1
(direct mapped) to 4. Simulations of various cache configurations using
indirect-branch traces from a large application show that this is the best of
the various configurations tried. In an extreme case with 5.7 billion indirect
branches:
* The increase of associativity from 1 way to 4 way, whilst keeping the
overall cache size the same (32k guest/host pairs), reduces the miss rate by
around a factor of 3, from 4.02% to 1.30%.
* The use of a slightly better hash function than merely slicing off the
bottom 15 bits of the address reduces the miss rate further, from 1.30% to
0.53%.
Overall the VG_(tt_fast) miss rate is almost unchanged on small workloads, but
reduced by a factor of up to almost 8 on large workloads.
By implementing each (4-entry) cache set using a move-to-front scheme in the
case of hits in ways 1, 2 or 3, the vast majority of hits can be made to
happen in way 0. Hence the cost of having this extra associativity is almost
zero in the case of a hit. The improved hash function costs two extra ALU
operations (a shift and an xor), but overall this seems to range from
performance-neutral to a small win.
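A sketch of the resulting scheme (names and the exact shift amount are
assumptions; the principle is as described above):

    typedef struct {
       Addr guest[4];   /* way 0 holds the most-recently-hit entry */
       Addr host [4];
    } FastCacheSet;
    static FastCacheSet VG_(tt_fast)[1 << 13];  /* 8192 sets x 4 ways */

    static inline UWord set_no ( Addr ga ) {
       /* one extra shift and xor compared to plain masking */
       return (ga ^ (ga >> 13)) & ((1 << 13) - 1);
    }

    /* On a hit in way 1, 2 or 3, the entry is moved to way 0 and the
       displaced entries shuffled down, so nearly all hits land in way 0. */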
PTRACE_GET_THREAD_AREA is not handled by the amd64 Linux syswrap, which leads
to false positive errors in 64-bit programs ptrace-ing 32-bit processes.
For example, the error below was wrongly reported on GDB:
==25377== Conditional jump or move depends on uninitialised value(s)
==25377== at 0x8A1D7EC: td_thr_get_info (td_thr_get_info.c:35)
==25377== by 0x526819: thread_from_lwp(thread_info*, ptid_t) (linux-thread-db.c:417)
==25377== by 0x5281D4: thread_db_notice_clone(ptid_t, ptid_t) (linux-thread-db.c:442)
==25377== by 0x51773B: linux_handle_extended_wait(lwp_info*, int) (linux-nat.c:2027)
....
==25377== Uninitialised value was created by a stack allocation
==25377== at 0x69A360: x86_linux_get_thread_area(int, void*, unsigned int*) (x86-linux-nat.c:278)
Fix this by implementing PTRACE_GET|SET_THREAD_AREA on amd64.
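A sketch of the new handling in the amd64 syswrap (using the PRE_MEM_*/
POST_MEM_* idiom of the existing ptrace wrapper; the exact argument
plumbing is an assumption):

    /* PRE: GET writes a user_desc into the caller-supplied buffer,
       SET reads one from it. */
    case VKI_PTRACE_GET_THREAD_AREA:
       PRE_MEM_WRITE("ptrace(get_thread_area)", ARG4,
                     sizeof(struct vki_user_desc));
       break;
    case VKI_PTRACE_SET_THREAD_AREA:
       PRE_MEM_READ("ptrace(set_thread_area)", ARG4,
                    sizeof(struct vki_user_desc));
       break;

    /* POST: mark the GET destination as initialised. */
    case VKI_PTRACE_GET_THREAD_AREA:
       POST_MEM_WRITE(ARG4, sizeof(struct vki_user_desc));
       break;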
When code uses utimensat with UTIME_NOW or UTIME_OMIT, valgrind memcheck
would generate a warning. But as the utimensat manpage says:
If the tv_nsec field of one of the timespec structures has the special
value UTIME_NOW, then the corresponding file timestamp is set to the
current time. If the tv_nsec field of one of the timespec structures
has the special value UTIME_OMIT, then the corresponding file timestamp
is left unchanged. In both of these cases, the value of the
corresponding tv_sec field is ignored.
So ignore the timespec tv_sec when tv_nsec is set to UTIME_NOW or
UTIME_OMIT.
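A sketch of the adjusted PRE check (the vki constant names are assumptions):

    const struct vki_timespec* ts = (const struct vki_timespec*)ARG3;
    for (int i = 0; i < 2; i++) {
       /* tv_nsec is always inspected by the kernel ... */
       PRE_MEM_READ("utimensat(times[i].tv_nsec)",
                    (Addr)&ts[i].tv_nsec, sizeof(ts[i].tv_nsec));
       /* ... but tv_sec only when tv_nsec is a real nanosecond count. */
       if (ts[i].tv_nsec != VKI_UTIME_NOW && ts[i].tv_nsec != VKI_UTIME_OMIT)
          PRE_MEM_READ("utimensat(times[i].tv_sec)",
                       (Addr)&ts[i].tv_sec, sizeof(ts[i].tv_sec));
    }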
Support for the bpf system call was added in a previous commit, but
did not include tracking for file descriptors handled by the call.
Add checks and tracking for file descriptors. Check in PRE() wrapper
that all file descriptors (pointing to objects such as eBPF programs or
maps, cgroups, or raw tracepoints) used by the system call are valid,
then add tracking in POST() wrapper for newly produced file descriptors.
As the file descriptors are not always processed in the same way by the
bpf call, add to the header file some additional definitions from bpf.h
that are necessary to sort out under what conditions descriptors should
be checked in the PRE() helper.
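The pattern follows the existing fd-tracking machinery; roughly (which
attr field holds an fd depends on the subcommand):

    /* PRE: reject invalid fds passed in, e.g. a map fd. */
    if (!ML_(fd_allowed)(attr->map_fd, "bpf", tid, False))
       SET_STATUS_Failure(VKI_EBADF);

    /* POST: track fds produced by the call, e.g. by BPF_MAP_CREATE. */
    if (!ML_(fd_allowed)(RES, "bpf", tid, True)) {
       VG_(close)(RES);
       SET_STATUS_Failure(VKI_EMFILE);
    } else if (VG_(clo_track_fds)) {
       ML_(record_fd_open_nameless)(tid, RES);
    }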
Fixes: 388786 - Support bpf syscall in amd64 Linux
Add support for bpf() Linux-specific system call on amd64 platform. The
bpf() syscall is used to handle eBPF objects (programs and maps), and
can be used for a number of operations. It takes three arguments:
- "cmd" is an integer encoding a subcommand to run. Available subcommand
include loading a new program, creating a map or updating its entries,
retrieving information about an eBPF object, and may others.
- "attr" is a pointer to an object of type union bpf_attr. This object
converts to a struct related to selected subcommand, and embeds the
various parameters used with this subcommand. Some of those parameters
are read by the kernel (example for an eBPF map lookup: the key of the
entry to lookup), others are written into (the value retrieved from
the map lookup).
- "attr_size" is the size of the object pointed by "attr".
Since the action performed by the kernel, and the way "attr" attributes
are processed, depend on the subcommand in use, the PRE() and POST()
wrappers need to make the distinction as well. For each subcommand, mark
the attributes that are read or written.
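Roughly, the PRE wrapper has this shape (vki names assumed; key_size and
value_size are obtained as described in the next paragraph):

    union vki_bpf_attr* attr = (union vki_bpf_attr*)ARG2;
    switch (ARG1 /* cmd */) {
       case VKI_BPF_MAP_LOOKUP_ELEM:
          /* kernel reads the key, writes the looked-up value */
          PRE_MEM_READ ("bpf(attr->key)",   (Addr)attr->key,   key_size);
          PRE_MEM_WRITE("bpf(attr->value)", (Addr)attr->value, value_size);
          break;
       /* ... one case per subcommand ... */
    }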
For some map operations, the only way to infer the size of the memory
areas used for read or write operations seems to involve reading
from /proc/<pid>/fdinfo/<fd> in order to retrieve the size of keys
and values for this map.
The definitions of union bpf_attr and of other eBPF-related elements
required for adequately performing the checks were added to the Linux
header file.
Processing related to file descriptors is added in a follow-up patch.
This patch makes sure that the process running under valgrind only sees
the AES, PMULL, SHA1, SHA2, CRC32, FP, and ASIMD features in the auxv
AT_HWCAP entry.
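A sketch of the filtering at the point where the client's auxv is built
(the VKI_HWCAP_* names are assumptions):

    case AT_HWCAP:
       /* expose only the bits the arm64 backend understands */
       auxv->u.a_val &= (VKI_HWCAP_AES   | VKI_HWCAP_PMULL |
                         VKI_HWCAP_SHA1  | VKI_HWCAP_SHA2  |
                         VKI_HWCAP_CRC32 | VKI_HWCAP_FP    |
                         VKI_HWCAP_ASIMD);
       break;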
https://bugs.kde.org/show_bug.cgi?id=381556
Adding MIPS N32 ABI support.
BZ issue - #345763.
Contributed and maintained by multiple people over the years:
Crestez Dan Leonard, Maran Pakkirisamy, Dimitrije Nikolic,
Aleksandar Rikalo, Tamara Vlahovic.
Use RegWord type in mips64.
Part of the changes required for MIPS N32 ABI support.
BZ issue - #345763.
Contributed by:
Dimitrije Nikolic, Aleksandar Rikalo and Tamara Vlahovic.
The modified test none/tests/sem crashes with a SEGV when valgrind is compiled
with LTO on various amd64 platforms (debian/gcc 6.3, RHEL7/gcc 6.4,
Ubuntu/gcc 7.2).
The problem is that the vki_semid_ds buf is not what is expected by the kernel:
the kernel expects a bigger structure, vki_semid64_ds (at least on
these platforms).
Getting the sem_nsems seems to work by chance, as sem_nsems is at
the same offset in both vki_semid_ds and vki_semid64_ds.
However, e.g. the ctime was not set properly after syscall return,
and the 2 words after sem_nsems were set to 0 by the kernel, causing
the SEGV, as a spilled register became 0.
The fix consists in using the 64-bit version for __NR_semctl.
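Illustratively (a sketch of the mechanism, not the actual patch):

    struct vki_semid_ds   buf;    /* too small: the kernel writes a    */
                                  /* vki_semid64_ds, zeroing the two   */
                                  /* words after sem_nsems; under LTO  */
                                  /* those held a spilled register     */
    struct vki_semid64_ds buf64;  /* the fix: pass the 64-bit layout   */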
Tested on debian/amd64 and s390x.
Shingled magnetic recording drives support a command set called ZBC
(Zoned Block Commands). Two new ioctls have been added to the Linux
kernel to support such drives, namely VKI_BLKREPORTZONE and
VKI_BLKRESETZONE. Add support to Valgrind for these ioctls.
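A sketch of the corresponding ioctl wrapper entries (the vki struct names
mirror the kernel's blkzoned.h and are assumptions):

    case VKI_BLKREPORTZONE:
       PRE_MEM_WRITE("ioctl(BLKREPORTZONE)", ARG3,
                     sizeof(struct vki_blk_zone_report));
       break;
    case VKI_BLKRESETZONE:
       PRE_MEM_READ("ioctl(BLKRESETZONE)", ARG3,
                    sizeof(struct vki_blk_zone_range));
       break;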
This patch implements the flag --delta-stacktrace=yes/no.
With yes, the full-history stack traces are computed by changing just the
last frame of the previously captured trace whenever no call/return
instruction was executed since that capture.
This can speed up helgrind by up to 25%.
This flag is currently set to yes only on Linux x86 and amd64, as some
platform-dependent validation of the heuristics used is needed before
setting the default to yes on a platform. See function check_cached_rcec_ok
in libhb_core.c for more details about how to validate/check the behaviour
on a new platform.
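The idea, in a sketch (names are hypothetical; see check_cached_rcec_ok
for the real logic):

    if (executed_call_or_return_since_last_capture) {
       full_unwind(ips, n_frames);   /* slow path: walk the whole stack */
    } else {
       /* the stack shape is unchanged: only the innermost PC differs */
       ips[0] = current_instruction_pointer;
    }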
Set the correct values from the Linux kernel.
See ./arch/mips/include/uapi/asm/sockios.h
This issue is covered by the newly introduced memcheck test mips32/bad_sioc.
g++ 4.4.7 doesn't accept union field initializers:
In file included from ../../include/pub_tool_vki.h:50,
from valgrind_cpp_test.cpp:13:
../../include/vki/vki-linux.h: In function ‘vki_cmsghdr* __vki_cmsg_nxthdr(void*, __vki_kernel_size_t, vki_cmsghdr*)’:
../../include/vki/vki-linux.h:673: error: expected primary-expression before ‘.’ token
Assign the value after declaration, which works for any g++ version.
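For illustration (not the exact vki-linux.h code):

    union U { int i; void* p; };

    union U a = { .i = 0 };   /* designated initializer: rejected by g++ 4.4 */

    union U b;                /* portable: declare, then assign */
    b.i = 0;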
git-svn-id: svn://svn.valgrind.org/valgrind/trunk@16437
- VG_MINIMAL_SETJMP and VG_MINIMAL_LONGJMP for VGP_mips64_linux are defined.
- The implementation of VG_MINIMAL_SETJMP and VG_MINIMAL_LONGJMP for mips32 is
improved by also saving and restoring the FP registers.
This should unbreak the mips64/clang build.
Patch by Aleksandar Rikalo.
git-svn-id: svn://svn.valgrind.org/valgrind/trunk@16378