This is a system call introduced in Linux 5.9.
It's typically used to bulk-close file descriptors that a process inherited
without having desired so and doesn't want to pass them to its offspring
for security reasons. For this reason the sensible upper limit value tends
to be unknown and the users prefer to stay on the safe side by setting it
high.
This is a bit peculiar because, if unfiltered, the syscall could end up
closing descriptors Valgrind uses for its purposes, ending in no end of
mayhem and suffering.
This patch adjusts the upper bounds to a safe value and then skips over
the descriptor Valgrind uses by potentially calling the real system call
with sub-ranges that are safe to close.
The call can fail on negative ranges and bad flags -- we're dealing with
the first condition ourselves while letting the real call fail on bad
flags.
https://bugs.kde.org/show_bug.cgi?id=439090
glibc 2.34 will try to use clone3 first before falling back to
the clone syscall. So implement clone3 as sys_ni_syscall which
simply return ENOSYS without producing a warning.
https://bugs.kde.org/show_bug.cgi?id=439590
io_uring syscalls only work on x86/amd64, but they can be enabled on
all arches. Based on a patch by Nathan Ringo <nathan@remexre.xyz>.
https://bugs.kde.org/show_bug.cgi?id=423361
faccessat2 is a new syscall in linux 5.8 and will be used by glibc 2.33.
faccessat2 is simply faccessat with a new flag argument. It has
a common number across all linux arches.
https://bugs.kde.org/427787
The only "special" thing about these syscalls is that the given
struct sched_attr determines its own size for future expansion.
Original fix by "ISHIKAWA,chiaki" <ishikawa@yk.rim.or.jp>
https://bugs.kde.org/show_bug.cgi?id=369029
I've tested this on amd64 and arm but I'm enabling it on all
arches since the syscall should work identically on all of them.
This was requested by users for a long time (almost 5 years) and
in fact, some programs (like libvirt) use namespaces and fork off
to enter other namespaces. Lack of implementation means valgrind
can't be used with these programs (or their configuration must be
changed to not use namespaces, which defeats the purpose).
Without knowing it, I've converged to same patch as mentioned in
bugs below.
https://bugs.kde.org/show_bug.cgi?id=343099https://bugs.kde.org/show_bug.cgi?id=368923https://bugs.kde.org/show_bug.cgi?id=369031
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
- Restore guest sigmask in VG_(sigframe_destroy)
- Use "syscall[32]" asm idiom instead of "syscall" with immediate parameter
in VG_(nanomips_linux_SUBST_FOR_rt_sigreturn)
- Call ML_(fixup_guest_state_to_restart_syscall) from PRE(sys_rt_sigreturn)
- Tiny code refactor of sigframe-nanomips-linux.c
This fixes none/tests/thread-exits.
- Reset syscall return register (a0) in clone_new_thread()
- Use "syscall[32]" asm idiom instead of "syscall" with immediate parameter
in ML_ (call_on_new_stack_0_1)()
- Optimize stack usage in ML_ (call_on_new_stack_0_1)()
- Code refactor of ML_ (call_on_new_stack_0_1)()
It partially fixes all tests which use clone system call, e.g. none/tests/pth_atfork1.
Patch by Aleksandar Rikalo.
Necessary changes to support nanoMIPS on Linux.
Part 2/4 - Coregrind changes
Patch by Aleksandar Rikalo, Dimitrije Nikolic, Tamara Vlahovic and
Aleksandra Karadzic.
Related KDE issue: #400872.