Implementation for x86-solaris and amd64-solaris. This completes the
implementations for all targets. Note these two are untested because I don't
have any way to test them.
[This commit contains an implementation for all targets except amd64-solaris
and x86-solaris, which will be completed shortly.]
In the baseline simulator, jumps to guest code addresses that are not known at
JIT time have to be looked up in a guest->host mapping table. That means:
indirect branches, indirect calls and most commonly, returns. Since there are
huge numbers of these (often 10+ million/second) the mapping mechanism needs
to be extremely cheap.
Currently, this is implemented using a direct-mapped cache, VG_(tt_fast), with
2^15 (guest_addr, host_addr) pairs. This is queried in handwritten assembly
in VG_(disp_cp_xindir) in dispatch-<arch>-<os>.S. If there is a miss in the
cache then we fall back out to C land, and do a slow lookup using
VG_(search_transtab).
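For orientation, the fast path conceptually behaves like the sketch below (my
own naming and layout, not the actual VG_(tt_fast) data structure or the
handwritten assembly):
```
/* Illustrative sketch only.  The real lookup lives in handwritten assembly
   in dispatch-<arch>-<os>.S and falls back to VG_(search_transtab). */
#define FAST_CACHE_BITS 15
#define FAST_CACHE_SIZE (1 << FAST_CACHE_BITS)       /* 2^15 pairs */

typedef struct { unsigned long guest; unsigned long host; } FastCacheEntry;
static FastCacheEntry fast_cache[FAST_CACHE_SIZE];

/* Stand-in for the slow VG_(search_transtab) path. */
extern unsigned long slow_lookup(unsigned long guest_addr);

static unsigned long lookup_host_addr(unsigned long guest_addr)
{
   /* Direct mapped: the entry index is just the low 15 bits of the address. */
   FastCacheEntry* e = &fast_cache[guest_addr & (FAST_CACHE_SIZE - 1)];
   if (e->guest == guest_addr)
      return e->host;                /* hit: jump straight to the host code */
   return slow_lookup(guest_addr);   /* miss: slow lookup, then refill      */
}
```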
Given that the size of the translation table(s) in recent years has expanded
significantly in order to keep pace with increasing application sizes, two bad
things have happened: (1) the cost of a miss in the fast cache has risen
significantly, and (2) the miss rate on the fast cache has also increased
significantly. This means that large (~ one-million-basic-blocks-JITted)
applications that run for a long time end up spending a lot of time in
VG_(search_transtab).
The proposed fix is to increase the associativity of the fast cache from 1
(direct mapped) to 4.  Simulations of various cache configurations, driven by
indirect-branch traces from a large application, show that a 4-way arrangement
is the best of those tried.  In an extreme case with 5.7 billion indirect
branches:
* The increase of associativity from 1 way to 4 way, whilst keeping the
overall cache size the same (32k guest/host pairs), reduces the miss rate by
around a factor of 3, from 4.02% to 1.30%.
* The use of a slightly better hash function than merely slicing off the
bottom 15 bits of the address reduces the miss rate further, from 1.30% to
0.53%.
Overall the VG_(tt_fast) miss rate is almost unchanged on small workloads, but
reduced by a factor of up to almost 8 on large workloads.
By implementing each (4-entry) cache set using a move-to-front scheme in the
case of hits in ways 1, 2 or 3, the vast majority of hits can be made to
happen in way 0. Hence the cost of having this extra associativity is almost
zero in the case of a hit.  The improved hash function costs an extra two ALU
operations (a shift and an xor), but overall this seems to range from
performance-neutral to a small win.
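A hedged sketch of the new arrangement (again my own naming, and a plausible
shift-and-xor hash rather than necessarily the exact one used), showing the
4-way sets, the move-to-front on hits in ways 1-3, and the extra two ALU
operations in the index computation:
```
/* Illustrative sketch only; not the actual VG_(tt_fast) layout or hash. */
#define N_SETS (1 << 13)             /* 8192 sets x 4 ways = 32k pairs */
#define N_WAYS 4

typedef struct { unsigned long guest; unsigned long host; } Entry;
static Entry sets[N_SETS][N_WAYS];

extern unsigned long slow_lookup(unsigned long guest_addr);

static inline unsigned hash_addr(unsigned long a)
{
   /* One extra shift and xor compared to just slicing off the low bits. */
   return (unsigned)((a ^ (a >> 13)) & (N_SETS - 1));
}

static unsigned long lookup_host_addr(unsigned long guest_addr)
{
   Entry* set = sets[hash_addr(guest_addr)];
   for (int way = 0; way < N_WAYS; way++) {
      if (set[way].guest == guest_addr) {
         unsigned long host = set[way].host;
         if (way != 0) {
            /* Move-to-front: keep the hot entry in way 0, so that almost
               all hits are satisfied by the very first comparison. */
            Entry hit = set[way];
            for (int k = way; k > 0; k--) set[k] = set[k-1];
            set[0] = hit;
         }
         return host;
      }
   }
   return slow_lookup(guest_addr);   /* miss: VG_(search_transtab) path */
}
```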
The wrong bit number was used when checking for the vector facility. This
can result in a fatal emulation error: "Encountered an instruction that
requires the vector facility. That facility is not available on this
host."
In most cases the wrong facility bit happened to be set as well, so nothing
bad happened.  But when running Valgrind within a QEMU/KVM guest, that bit
was not (always?) set and the emulation error occurred.
This fix simply corrects the vector facility bit number, changing it from
128 to 129.
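For context, STFLE facility bits are numbered starting from the most
significant bit of the facility list, so a check of bit 128 tests a different
flag than bit 129.  A minimal sketch of such a check (my own helper, not the
Valgrind code):
```
#include <stdint.h>

/* Illustrative only: facility bit 0 is the MSB of the first doubleword
   returned by STFLE. */
static int facility_bit_set(const uint64_t* facility_list, unsigned bitno)
{
   return (facility_list[bitno / 64] >> (63 - (bitno % 64))) & 1;
}

/* The vector facility is bit 129, not 128. */
#define S390_FACILITY_VX 129
```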
PTRACE_GET_THREAD_AREA is not handled by the amd64 Linux syswrap, which leads
to false positive errors in 64-bit programs ptrace-ing 32-bit processes.
For example, the following error was wrongly reported when running GDB:
==25377== Conditional jump or move depends on uninitialised value(s)
==25377== at 0x8A1D7EC: td_thr_get_info (td_thr_get_info.c:35)
==25377== by 0x526819: thread_from_lwp(thread_info*, ptid_t) (linux-thread-db.c:417)
==25377== by 0x5281D4: thread_db_notice_clone(ptid_t, ptid_t) (linux-thread-db.c:442)
==25377== by 0x51773B: linux_handle_extended_wait(lwp_info*, int) (linux-nat.c:2027)
....
==25377== Uninitialised value was created by a stack allocation
==25377== at 0x69A360: x86_linux_get_thread_area(int, void*, unsigned int*) (x86-linux-nat.c:278)
Fix this by implementing PTRACE_GET|SET_THREAD_AREA on amd64.
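For illustration, the failing scenario is roughly the following kind of call
made by the 64-bit tracer (as in GDB's x86_linux_get_thread_area); before the
fix, Memcheck never learned that the kernel had filled in the output buffer.
The helper name here is mine:
```
#include <errno.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <asm/ldt.h>                  /* struct user_desc */

/* Illustrative only: a 64-bit tracer fetching a TLS descriptor from a
   32-bit tracee.  The kernel writes *desc; without a syswrap handler for
   PTRACE_GET_THREAD_AREA, Memcheck did not mark it as defined, so later
   reads of it were flagged as uninitialised. */
static int get_tracee_thread_area(pid_t pid, int idx, struct user_desc* desc)
{
   errno = 0;
   if (ptrace(PTRACE_GET_THREAD_AREA, pid, (void*)(long)idx, desc) != 0)
      return -errno;
   return 0;
}
```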
addex uses OV as carry-in and carry-out, whereas for all other instructions
OV is the signed-overflow flag, and instructions like adde use CA as the
carry.
Replace set_XER_OV_OV32 with set_XER_OV_OV32_ADDEX, which calls
calculate_XER_CA_64 and calculate_XER_CA_32 with OV as input, and sets OV
and OV32.
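To make the intended semantics concrete, here is a hedged sketch (my own
helper, not VEX code) of the carry computation that addex effectively
performs, using OV as both carry-in and carry-out, analogous to what
calculate_XER_CA_64 computes for adde-style additions:
```
#include <stdint.h>

/* Illustrative only: 64-bit addex with OV as carry-in/carry-out. */
static uint64_t addex64(uint64_t ra, uint64_t rb, unsigned* ov)
{
   uint64_t sum = ra + rb + *ov;
   /* Unsigned carry out of bit 63. */
   *ov = (sum < ra) || (*ov && sum == ra);
   return sum;
}
```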
Enable test_addex in none/tests/ppc64/test_isa_3_0.c and update
the expected output. test_addex would fail to match the expected
output before this patch.
A few .exp files (not tested on amd64) have to be changed to
have the messages in the new order:
Use --track-origins=yes to see where uninitialised values come from
For lists of detected and suppressed errors, rerun with: -s
This option makes it possible to list the detected errors and the used
suppressions without increasing the verbosity.
Increasing the verbosity also activates a lot of messages that are often
not very useful to the user.  So this option lets the list of errors and
the used suppressions be shown independently of the verbosity.
Note that if a high verbosity is selected, the behaviour is unchanged.  In
other words, when specifying -v, the list of detected errors and the used
suppressions are still shown, even if --show-error-list=yes and -s are not
used.
Each tool that produces errors had identical code to produce this message.
Factor the production of the message out into m_main.c.
This prepares for a specific option to show the list of detected errors
and the count of suppressed errors.
This has a (small) visible effect on the output of memcheck:
Instead of producing
For counts of detected and suppressed errors, rerun with: -v
Use --track-origins=yes to see where uninitialised values come from
memcheck now produces:
Use --track-origins=yes to see where uninitialised values come from
For counts of detected and suppressed errors, rerun with: -v
i.e. the track-origins and error-counts messages are swapped.
For most SIMD operations that happen on 64-bit values (as would arise from MMX
instructions, for example, such as Add16x4, CmpEQ32x2, etc), generate code
that performs the operation using SSE/SSE2 instructions on values in the low
halves of XMM registers. This is much more efficient than the previous scheme
of calling out to helper functions written in C. There are still a few SIMD64
operations done via helpers, though.
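The strategy can be illustrated with intrinsics (a sketch of the idea only,
not the code the back end emits): a 64-bit Add16x4 becomes a single 128-bit
paddw operating on the low halves of XMM registers.
```
#include <stdint.h>
#include <emmintrin.h>                /* SSE2 */

/* Illustrative only: an Iop_Add16x4-style operation on 64-bit values,
   done in the low half of XMM registers instead of calling a C helper. */
static uint64_t add16x4(uint64_t a, uint64_t b)
{
   __m128i va = _mm_cvtsi64_si128((long long)a);   /* movq a -> xmm */
   __m128i vb = _mm_cvtsi64_si128((long long)b);
   __m128i r  = _mm_add_epi16(va, vb);             /* paddw */
   return (uint64_t)_mm_cvtsi128_si64(r);          /* movq xmm -> result */
}
```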
.. by adding support for MOVQ xmm/ireg and using that to implement 64HLtoV128,
4x64toV256 and their inverses. This reduces the number of instructions,
removes the use of memory as an intermediary, and avoids store-forwarding
stalls.
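In intrinsics terms (again only an illustration of the pattern, not the back
end's exact output), 64HLtoV128 can now be formed entirely in registers via
MOVQ-style moves plus an unpack, with no trip through memory:
```
#include <stdint.h>
#include <emmintrin.h>

/* Illustrative only: combine two 64-bit halves into a 128-bit vector
   without a store/load through memory (no store-forwarding stall). */
static __m128i v128_from_halves(uint64_t hi, uint64_t lo)
{
   __m128i vlo = _mm_cvtsi64_si128((long long)lo);  /* movq ireg -> xmm */
   __m128i vhi = _mm_cvtsi64_si128((long long)hi);
   return _mm_unpacklo_epi64(vlo, vhi);   /* lo in lane 0, hi in lane 1 */
}
```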
pshufb mm/xmm/ymm rearranges byte lanes in vector registers. It's fairly
widely used, but we generated terrible code for it. With this patch, we just
generate, at the back end, pshufb plus a bit of masking, which is a great
improvement.
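For reference, the byte-lane rearrangement that pshufb performs looks like
this with intrinsics (an illustration of the operation itself, not the back
end's exact emitted sequence):
```
#include <tmmintrin.h>                /* SSSE3: pshufb */

/* Illustrative only: reverse the 16 bytes of a vector with one pshufb.
   Each control byte selects a source lane; a control byte with its top
   bit set would zero that lane instead. */
static __m128i reverse_bytes(__m128i v)
{
   const __m128i ctrl = _mm_set_epi8(0, 1, 2,  3,  4,  5,  6,  7,
                                     8, 9, 10, 11, 12, 13, 14, 15);
   return _mm_shuffle_epi8(v, ctrl);               /* pshufb */
}
```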
* changes set_AV_CR6 so that it does scalar comparisons against zero,
rather than sometimes against an all-ones word. This is something
that Memcheck can instrument exactly.
* in Memcheck, requests expensive instrumentation of Iop_Cmp{EQ,NE}64
by default on ppc64le.
https://bugs.kde.org/show_bug.cgi?id=386945#c62
This makes it possible for memcheck to know which part of the 128-bit
vector is defined, even if the load is partly beyond an addressable block.
Partially resolves bug 386945.
On powerpc partial unaligned loads of vectors from partially invalid
addresses are OK and could be generated by our translation of lxvd2x.
Adjust partial_load memcheck tests to allow partial loads of 16 byte
vectors on powerpc64.
Part of resolving bug #386945.
On powerpc partial unaligned loads of words from partially invalid
addresses are OK and could be generated by our translation of ldbrx.
Adjust partial_load memcheck tests to allow partial loads of words
on powerpc64.
Part of resolving bug #386945.
This makes it possible for memcheck to analyse the new gcc strcmp
inlined code correctly even if the ldbrx load is partly beyond an
addressable block.
Partially resolves bug 386945.
This happens when processing openssl aes_v8_set_encrypt_key
(aesv8-armx.S:133). The noteTmpUsesIn() function is new since
PR387664 "Memcheck: make expensive-definedness-checks be the default".
It didn't handle Iex_VECRET, which is used in the dirty helpers for the
arm64 crypto instructions.
The sys_ptrace post didn't mark the thread as being in traceme mode.
This occasionally made the memcheck/tests/linux/getregset.vgtest
testcase fail. With this patch it passes reliably.
Wait for children to finish before terminating the main process.
This fixes occasional failures of the following tests:
drd/tests/fork-parallel (stderr)
drd/tests/fork-serial (stderr)
In final_tidyup we set up the guest to call the freeres_wrapper, which
will (possibly) call __gnu_cxx::__freeres() and/or __libc_freeres().
In a few cases (ppc64be, ppc64le and mips32) this involves setting
up one or more helper registers. Since we set up these guest registers
ourselves, we should mark them as fully defined. Otherwise we might
see spurious warnings about undefined value usage if the guest register
happened to not be fully defined before.
This fixes PR402006.
Because it's very useful. As part of this, the "percentage of events
annotated" numbers at the bottom of the output are changed to "events
annotated" so that --show-percs doesn't compute a percentage of a
percentage.
Example output lines:
```
4,967,137,442 (100.0%) PROGRAM TOTALS
4,543 (25.23%) 17,566 ( 0.43%) 47,993 ( 0.92%) /build/glibc-OTsEL5/glibc-2.27/elf/dl-lookup.c
1 ( 0.01%) 2,000,001 (49.29%) 3,000,004 (57.36%) for (int i = 0; i < 1000000; i++) {
```
The commit also adds some much-needed tests for cg_annotate and
callgrind_annotate.
glibc 2.28 filters out some bad signal numbers and returns
"Invalid argument" instead of passing such bad signal numbers on to
the kernel sigaction syscall. So we won't see such bad signal
numbers and won't print "bad signal number" ourselves.
Add a new memcheck/tests/sigkill.stderr.exp-glibc-2.28 to catch
this case.
The mfvscr and vor instructions in jm-insns.c had a "=vr" constraint.
This should have been an "=v" constraint. Fixing it resolves assembler
warnings and the testcase failure on ppc64le with gcc 8.2 and
binutils 2.30.
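For illustration (a hedged sketch, not the jm-insns.c code): the output of
such instructions needs the Altivec/VSX register constraint "v"; a constraint
string like "vr" lists two alternatives, a vector register or a
general-purpose register, which is not what these instructions want.
```
#include <altivec.h>

/* Illustrative only: copy a vector with vor, forcing both operands into
   vector registers via the "v" constraint (rather than "vr"). */
static vector unsigned int copy_vec(vector unsigned int vs)
{
   vector unsigned int vd;
   __asm__ ("vor %0,%1,%1" : "=v" (vd) : "v" (vs));
   return vd;
}
```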
This adds a configuration file ".dir-locals.el" for Emacs to the topmost
directory of the Valgrind source tree, and another such file to the
directory drd/tests. These files contain per-directory local Emacs
variables.
The following settings are performed:
* The base C style is set to "Linux", indentation is set to 3 columns
per level, the use of tabs for indentation is disabled, and the fill
column is set to 80.
* The source files in drd/tests use 2 instead of 3 columns per indentation
level.