memfd_secret is a new syscall in linux 5.14. memfd_secret() is
disabled by default and a command-line option needs to be added to
enable it at boot time.
$ cat /proc/cmdline
[...] secretmem.enable=y
https://bugs.kde.org/451878https://lwn.net/Articles/865256/
Found this by testing the Solaris execx (the bits that are
Linux-cmpatible) test. That was giving
--28286-- VALGRIND INTERNAL ERROR: Valgrind received a signal 11 (SIGSEGV) - exiting
--28286-- si_code=2; Faulting address: 0x4A0095A; sp: 0x1002ca9c88
valgrind: the 'impossible' happened:
Killed by fatal signal
host stacktrace:
==28286== at 0x5803DE54: vgPlain_strcpy (m_libcbase.c:309)
==28286== by 0x5810A9B3: vgSysWrap_linux_sys_execveat_before (syswrap-linux.c:13310)
==28286== by 0x580953C9: vgPlain_client_syscall (syswrap-main.c:2234)
It's a mistake to copy the path obtained with VG_(resolve_filename) to
the client ARG2, it's unlikely to have space for the path.
Instead just copy the pointer.
For BPF_RAW_TRACEPOINT_OPEN attr->raw_tracepoint.name may be NULL.
Otherwise it should point to a valid (max 128 char) string. Only
raw_tracepoint.prog_fd needs to be set.
https://bugs.kde.org/show_bug.cgi?id=451626
On s390x Linux platforms the sys_ipc semtimedop call has four instead of
five parameters, where the timeout is passed in the third instead of the
fifth.
Reflect this difference in the handling of VKI_SEMTIMEDOP.
In POST(sys_io_uring_setup) we tried to use record_fd_open_with_given_name
with ARG1 as name. But ARG1 isn't a char pointer. So this might crash with
--track-fds=yes. Since no (file) name is associated with the fd returned by
io_uring_setup use record_fd_open_nameless instead.
https://bugs.kde.org/show_bug.cgi?id=449838
Implement BPF_MAP_LOOKUP_AND_DELETE_ELEM (command 21) and
BPF_MAP_FREEZE (command 22) and produce a WARNING instead of a fatal
error for unrecognized BPF commands.
https://bugs.kde.org/show_bug.cgi?id=426148
This is a system call introduced in Linux 5.9.
It's typically used to bulk-close file descriptors that a process inherited
without having desired so and doesn't want to pass them to its offspring
for security reasons. For this reason the sensible upper limit value tends
to be unknown and the users prefer to stay on the safe side by setting it
high.
This is a bit peculiar because, if unfiltered, the syscall could end up
closing descriptors Valgrind uses for its purposes, ending in no end of
mayhem and suffering.
This patch adjusts the upper bounds to a safe value and then skips over
the descriptor Valgrind uses by potentially calling the real system call
with sub-ranges that are safe to close.
The call can fail on negative ranges and bad flags -- we're dealing with
the first condition ourselves while letting the real call fail on bad
flags.
https://bugs.kde.org/show_bug.cgi?id=439090
This is a followup to 2a7d3ae76, in the case rust code runs against a
glibc that supports statx but a kernel that doesn't, in which case glibc
falls back to fstatat.
https://bugs.kde.org/show_bug.cgi?id=433641
Some programs open /proc/self/exe to read some data. Currently valgrind
supports following the /proc/self/exe link (to the original binary, so you
could then open that), but directly opening /proc/self/exe will open the
valgrind tool, not the executable file itself.
Add ML_(handle_self_exe_open) which dups VG_(cl_exec_fd) if the file
to open is /proc/self/exe or /proc/<pid>/exe. And do the same for openat.
https://bugs.kde.org/show_bug.cgi?id=140178
The shmctl syscall on amd64, arm64 and riscv (but we don't have a port
for that last one) always use IPC_64. Explicitly pass it to the generic
PRE/POST handlers so they select the correct (64bit) data structures on
those architectures.
https://bugzilla.redhat.com/show_bug.cgi?id=1909548
faccessat2 is a new syscall in linux 5.8 and will be used by glibc 2.33.
faccessat2 is simply faccessat with a new flag argument. It has
a common number across all linux arches.
https://bugs.kde.org/427787
m_syswrap/syswrap-linux.c:3716:10: warning: format '%ld' expects argument of type 'long int', but argument 4 has type 'RegWord' {aka 'long unsigned int'} [-Wformat=]
struct vki_epoll_event is packed on x86_64, but not on other 64bit
arches. This means that on 64bit arches there can be padding in the
epoll_event struct. Seperately the data field is only used by user
space (which might not set the data field if it doesn't need to).
Only check the events field on epoll_ctl. But assume both events
and data are both written to by epoll_[p]wait (exclude padding).
https://bugs.kde.org/show_bug.cgi?id=422623
The only "special" thing about these syscalls is that the given
struct sched_attr determines its own size for future expansion.
Original fix by "ISHIKAWA,chiaki" <ishikawa@yk.rim.or.jp>
https://bugs.kde.org/show_bug.cgi?id=369029
I've tested this on amd64 and arm but I'm enabling it on all
arches since the syscall should work identically on all of them.
This was requested by users for a long time (almost 5 years) and
in fact, some programs (like libvirt) use namespaces and fork off
to enter other namespaces. Lack of implementation means valgrind
can't be used with these programs (or their configuration must be
changed to not use namespaces, which defeats the purpose).
Without knowing it, I've converged to same patch as mentioned in
bugs below.
https://bugs.kde.org/show_bug.cgi?id=343099https://bugs.kde.org/show_bug.cgi?id=368923https://bugs.kde.org/show_bug.cgi?id=369031
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
- The release 3.15 introduced a backward incompatible change for
some suppression entries related to preadv and pwritev syscalls.
When reading a suppression entry using the unsupported 3.14 format,
valgrind will now produce a warning to say the suppression entry will not
work, and suggest the needed change.
For example, in 3.14, the extra line was:
pwritev(vector[...])
while in 3.15, it became e.g.
pwritev(vector[2])
3 possible fixes were discussed:
* revert the 3.15 change to go back to 3.14 format.
This is ugly because valgrind 3.16 would be incompatible
with the supp entries for 3.15.
* make the suppression matching logic consider that ... is a wildcard
equivalent to a *.
This is ugly because the suppression matching logic/functionality
is already very complex, and ... would mean 2 different things
in a suppression entry: wildcard in the extra line, and whatever
nr of stackframes in the backtrace portion of the supp entry.
* keep the 3.15 format, and accept the incompatibility with 3.14 and before.
This is ugly as valgrind 3.16 and above are still incompatible with 3.14
and before.
The third option was deemed the less ugly, in particular because it was possible
to detect the incompatible unsupported supp entry and produce a warning.
So, now, valgrind reports a warning when such an entry is detected, giving
e.g. a behaviour such as:
==21717== WARNING: pwritev(vector[...]) is an obsolete suppression line not supported in valgrind 3.15 or later.
==21717== You should replace [...] by a specific index such as [0] or [1] or [2] or similar
==21717==
....
==21717== Syscall param pwritev(vector[1]) points to unaddressable byte(s)
==21717== at 0x495B65A: pwritev (pwritev64.c:30)
==21717== by 0x1096C5: main (sys-preadv_pwritev.c:69)
==21717== Address 0xffffffffffffffff is not stack'd, malloc'd or (recently) free'd
So, we can hope that users having incompatible entries will easily understand
the problem of the supp entry not matching anymore.
In future releases of valgrind, we must take care to:
* never change the extra string produced for an error, unless *really* necessary
* minimise as much as possible 'variable' information generated dynamically
in error extra string. Such extra information can be reported in the rest
of the error message (like the address above for example).
The user can use e.g. GDB + vgdb to examine in details the offending
data or parameter values or uninitialised bytes or ...
A comment is added in pub_tool_errormgr.h to remind tool developers of the above.
This patch adds sycall wrappers for the following syscalls which
use a 64bit time_t on 32bit arches: gettime64, settime64,
clock_getres_time64, clock_nanosleep_time64, timer_gettime64,
timer_settime64, timerfd_gettime64, timerfd_settime64,
utimensat_time64, pselect6_time64, ppoll_time64, recvmmsg_time64,
mq_timedsend_time64, mq_timedreceive_time64, semtimedop_time64,
rt_sigtimedwait_time64, futex_time64 and sched_rr_get_interval_time64.
Still missing are clock_adjtime64 and io_pgetevents_time64.
For the more complicated syscalls futex[_time64], pselect6[_time64]
and ppoll[_time64] there are shared pre and/or post helper functions.
Other functions just have their own PRE and POST handler.
Note that the vki_timespec64 struct really is the struct as used by
by glibc (it internally translates a 32bit timespec struct to a 64bit
timespec64 struct before passing it to any of the time64 syscalls).
The kernel uses a 64-bit signed int, but is ignoring the upper 32 bits
of the tv_nsec field. It does always write the full struct though.
So avoid checking the padding is only needed for PRE_MEM_READ.
There are two helper pre_read_timespec64 and pre_read_itimerspec64
to check the new structs.
https://bugs.kde.org/show_bug.cgi?id=416753
The CLONE_VFORK flag causes the parent to suspend until the child
exits or execs so without the memory sharing CLONE_VM would give
this is really closer to fork but we convert vfork to fork by
removing CLONE_VM anyway so there is no reason not to allow this.
Fixes BZ#417906
- Reset syscall return register (a0) in clone_new_thread()
- Use "syscall[32]" asm idiom instead of "syscall" with immediate parameter
in ML_ (call_on_new_stack_0_1)()
- Optimize stack usage in ML_ (call_on_new_stack_0_1)()
- Code refactor of ML_ (call_on_new_stack_0_1)()
It partially fixes all tests which use clone system call, e.g. none/tests/pth_atfork1.
Patch by Aleksandar Rikalo.
Necessary changes to support nanoMIPS on Linux.
Part 3/4 - Coregrind and tools changes
Patch by Aleksandar Rikalo, Dimitrije Nikolic, Tamara Vlahovic,
Nikola Milutinovic and Aleksandra Karadzic.
Related KDE issue: #400872.
Necessary changes to support nanoMIPS on Linux.
Part 2/4 - Coregrind changes
Patch by Aleksandar Rikalo, Dimitrije Nikolic, Tamara Vlahovic and
Aleksandra Karadzic.
Related KDE issue: #400872.
Support for amd64, x86 - 64 and 32 bit, arm64, ppc64, ppc64le,
s390x, mips64. This should work identically on all
arches, tested on x86 32bit and 64bit one, but enabled on all.
Refactor the code to be reusable between old/new syscalls. Resolve TODO
items in the code. Add the testcase for the preadv2/pwritev2 and also
add the (similar) testcase for the older preadv/pwritev syscalls.
Trying to test handling an uninitialized flag argument for the v2 syscalls
does not work because the flag always comes out as defined zero.
Turns out glibc does this deliberately on 64bit architectures because
the kernel does actually have a low_offset and high_offset argument, but
ignores the high_offset/assumes it is zero.
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=601cc11d054ae4b5e9b5babec3d8e4667a2cb9b5https://bugs.kde.org/408414
This implements minimal support for the pkey_alloc, pkey_free and
pkey_mprotect syscalls. pkey_alloc will simply indicate that pkeys
are not supported. pkey_free always fails. pkey_mprotect works just
like mprotect if the special pkey -1 is provided.
https://bugs.kde.org/show_bug.cgi?id=408091
Sync VEX/LICENSE.GPL with top-level COPYING file. We used 3 different
addresses for writing to the FSF to receive a copy of the GPL. Replace
all different variants with an URL <http://www.gnu.org/licenses/>.
The following files might still have some slightly different (L)GPL
copyright notice because they were derived from other programs:
- files under coregrind/m_demangle which come from libiberty:
cplus-dem.c, d-demangle.c, demangle.h, rust-demangle.c,
safe-ctype.c and safe-ctype.h
- coregrind/m_demangle/dyn-string.[hc] derived from GCC.
- coregrind/m_demangle/ansidecl.h derived from glibc.
- VEX files for FMA detived from glibc:
host_generic_maddf.h and host_generic_maddf.c
- files under coregrin/m_debuginfo derived from LZO:
lzoconf.h, lzodefs.h, minilzo-inl.c and minilzo.h
- files under coregrind/m_gdbserver detived from GDB:
gdb/signals.h, inferiors.c, regcache.c, regcache.h,
regdef.h, remote-utils.c, server.c, server.h, signals.c,
target.c, target.h and utils.c
Plus the following test files:
- none/tests/ppc32/testVMX.c derived from testVMX.
- ppc tests derived from QEMU: jm-insns.c, ppc64_helpers.h
and test_isa_3_0.c
- tests derived from bzip2 (with embedded GPL text in code):
hackedbz2.c, origin5-bz2.c, varinfo6.c
- tests detived from glibc: str_tester.c, pth_atfork1.c
- test detived from GCC libgomp: tc17_sembar.c
- performance tests derived from bzip2 or tinycc (with embedded GPL
text in code): bz2.c, test_input_for_tinycc.c and tinycc.c
GCC 7 instroduced -Wimplicit-fallthrough
https://developers.redhat.com/blog/2017/03/10/wimplicit-fallthrough-in-gcc-7/
It caught a couple of bugs, but it does need a bit of extra comments to
explain when a switch case statement fall-through is deliberate. Luckily
with -Wimplicit-fallthrough=2 various existing comments already do that.
I have fixed the bugs, but adding explicit break statements where
necessary and added comments where the fall-through was correct.
https://bugs.kde.org/show_bug.cgi?id=405430
When code uses utimensat with UTIME_NOW or UTIME_OMIT valgrind memcheck
would generate a warning. But as the utimensat manpage says:
If the tv_nsec field of one of the timespec structures has the special
value UTIME_NOW, then the corresponding file timestamp is set to the
current time. If the tv_nsec field of one of the timespec structures
has the special value UTIME_OMIT, then the corresponding file timestamp
is left unchanged. In both of these cases, the value of the corre‐
sponding tv_sec field is ignored.
So ignore the timespec tv_sec when tv_nsec is set to UTIME_NOW or
UTIME_OMIT.
Support for the bpf system call was added in a previous commit, but
did not include tracking for file descriptors handled by the call.
Add checks and tracking for file descriptors. Check in PRE() wrapper
that all file descriptors (pointing to object such as eBPF programs or
maps, cgroups, or raw tracepoints) used by the system call are valid,
then add tracking in POST() wrapper for newly produced file descriptors.
As the file descriptors are not always processed in the same way by the
bpf call, add to the header file some additional definitions from bpf.h
that are necessary to sort out under what conditions descriptors should
be checked in the PRE() helper.
Fixes: 388786 - Support bpf syscall in amd64 Linux
Add support for bpf() Linux-specific system call on amd64 platform. The
bpf() syscall is used to handle eBPF objects (programs and maps), and
can be used for a number of operations. It takes three arguments:
- "cmd" is an integer encoding a subcommand to run. Available subcommand
include loading a new program, creating a map or updating its entries,
retrieving information about an eBPF object, and may others.
- "attr" is a pointer to an object of type union bpf_attr. This object
converts to a struct related to selected subcommand, and embeds the
various parameters used with this subcommand. Some of those parameters
are read by the kernel (example for an eBPF map lookup: the key of the
entry to lookup), others are written into (the value retrieved from
the map lookup).
- "attr_size" is the size of the object pointed by "attr".
Since the action performed by the kernel, and the way "attr" attributes
are processed depends on the subcommand in use, the PRE() and POST()
wrappers need to make the distinction as well. For each subcommand, mark
the attributes that are read or written.
For some map operations, the only way to infer the size of the memory
areas used for read or write operations seems to involve reading
from /proc/<pid>/fdinfo/<fd> in order to retrieve the size of keys
and values for this map.
The definitions of union bpf_attr and of other eBPF-related elements
required for adequately performing the checks were added to the Linux
header file.
Processing related to file descriptors is added in a follow-up patch.
The sys_prctl wrapper with PR_SET_NAME option reads an ASCII string passed
as its second argument. This string is supposed to be shorter than a given
limit. As the actual length of the string is unknown, the PRE() wrapper
performs a number of checks on it, including, in worst case, trying to
dereference it byte by byte.
To avoid re-implementing all this logic for other wrappers that could
need it, get the string processing out of the wrapper and move it to a
static function. Note that passing tid as an argument to the function is
required for macros PRE_MEM_RASCIIZ and PRE_MEM_READ to work properly.