Commit Graph

16247 Commits

Author SHA1 Message Date
Rhys Kidd
b06c2c7e23 config: remove unrequired AC_HEADER_STDC
Autoconf says:
"This macro is obsolescent, as current systems have conforming
header files. New programs need not use this macro".

Was previously required to ensure the system has C header files conforming
to ANSI C89 (ISO C90). Specifically, this macro checks for stdlib.h,
stdarg.h, string.h, and float.h.

This autoconf option was used to provide conditional fallback support
via defined STDC_HEADERS.

Valgrind does not use this conditional fallback support, so the macro
is both obsolete and unused. Let's drop it.

Signed-off-by: Rhys Kidd <rhyskidd@gmail.com>
2019-03-11 22:49:37 +11:00
Julian Seward
dffe3a2d1b Add a 3_14_BUGSTATUS.txt file and add to it all bugs reported since 3.14 was released.
More precisely, these are the bugs that remain after triage, so some have been removed.
2019-03-10 11:11:16 +01:00
Julian Seward
4ee1dd2778 bb_to_IR(): increase assertion limits on the maximum size of self-checking translations. n-i-bz. 2019-03-09 17:58:11 +01:00
Petar Jovanovic
3217459c72 modify massif/tests/mmapunmap.vgtest to comply with glibc change
The change in the glibc version (2.27 -> 2.28) results in one additional
function call being present in the backtrace for mips64, which pushes the
line to be checked out of bounds.

Changed the post line in mmapunmap.vgtest to work around this.

This fixes massif/tests/mmapunmap failure on mips64.

Patch by Stefan Maksimovic.
2019-03-04 19:26:37 +01:00
Mark Wielaard
7f74ba249e Bug 405079 - unhandled ppc64le-linux syscall: 131 (quotactl)
quotactl is really a "generic" linux syscall that just happened to not
have been hooked up for ppc64le. Add it to syswrap-ppc64-linux.c.
2019-03-04 17:22:56 +01:00
Julian Seward
6bcb493b03 Adjust the built-in profiler so that it can try to count host insns as well as guest insns. n-i-bz. 2019-02-26 09:57:57 +01:00
Julian Seward
85545d9d25 Fix another format string signedness warning, arm64-linux only. n-i-bz. 2019-02-25 11:48:43 +01:00
Mark Wielaard
256cf43c5e memcheck powerpc subfe x, x, x initializes x to 0 or -1 based on CA
GCC might use subfe x, x, x to initialize x to 0 or -1, based on
whether the carry flag is set. This happens in some cases when g++
compiles resetting a unique_ptr. The "trick" used by the compiler is
that it can AND a pointer with the register x (now 0x0 or 0xffffffff)
to set something to NULL or to the given pointer.

subfe is implemented as rD = (logical NOT of rA) + rB + XER[CA].
If we instead implement it as rD = rB - rA - (XER[CA] ^ 1),
then memcheck can see that rB and rA cancel each other out if they
are the same.
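
The equivalence of the two formulations can be checked with a small C
sketch. It only models the 64-bit arithmetic and is not Valgrind code; the
function names are hypothetical, and `ca` stands for XER[CA] (0 or 1).

```c
#include <stdint.h>

/* Original formulation: rD = ~rA + rB + CA (hypothetical helper name). */
uint64_t subfe_via_not(uint64_t ra, uint64_t rb, unsigned ca)
{
    return ~ra + rb + (uint64_t)(ca & 1);
}

/* Rewritten formulation: rD = rB - rA - (CA ^ 1).
   Since ~rA == -rA - 1 (mod 2^64), both functions return the same value. */
uint64_t subfe_via_sub(uint64_t ra, uint64_t rb, unsigned ca)
{
    return rb - ra - (uint64_t)((ca & 1) ^ 1);
}
```

With rA == rB, both helpers return 0 when CA is 1 and all-ones when CA is 0.
In the second form the difference rB - rA is explicit, so the result can be
seen to depend only on CA.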

https://bugs.kde.org/show_bug.cgi?id=404054
2019-02-21 17:21:53 +01:00
Carl Love
de7fc1a059 Fix missed changes from Rename some int<->fp conversion IROps patch
The previous commit 6b16f0e2a0 dated
Sat Jan 26 17:38:01 2019 by Julian Seward <jseward@acm.org> renamed some of
the int<->fp conversion Iops to add a trailing _DEP.  The patch missed
renaming two of the Iops.  This patch renames the missed Iops.
2019-02-05 10:19:01 -06:00
Julian Seward
e125eb3931 Make the DHAT viewer components be copied into the distribution tarball. Followup to 441bfc5f51 (dhat overhaul). 2019-02-03 10:31:15 +01:00
Julian Seward
15ac949bef Make the DHAT viewer components be copied into the install tree. Followup to 441bfc5f51 (dhat overhaul). 2019-02-03 10:06:36 +01:00
Julian Seward
cad6b8a984 Fix "make post-regtest-checks" after 441bfc5f51 (dhat overhaul). 2019-02-02 16:10:50 +01:00
Julian Seward
7094b51f0a Another -Wformat-signedness fix that was missed in dee1c5ac84. 2019-02-02 14:22:43 +01:00
Julian Seward
cadbb5d441 Enable -Wformat-signedness, if the compiler supports it. 2019-02-02 14:20:49 +01:00
Julian Seward
dee1c5ac84 Fix format string warnings from gcc9. No functional change (I think!) 2019-02-02 14:06:51 +01:00
Nicholas Nethercote
7e5fc882e9 Remove reference to non-existent *.post.exp files in dhat/tests/. 2019-02-02 07:41:02 +11:00
Nicholas Nethercote
f71002f1b5 Add missing stuff for a DHAT test. 2019-02-01 15:08:31 +11:00
Nicholas Nethercote
441bfc5f51 Overhaul DHAT.
This commit thoroughly overhauls DHAT, moving it out of the
"experimental" ghetto. It makes moderate changes to DHAT itself,
including dumping profiling data to a JSON format output file. It also
implements a new data viewer (as a web app, in dhat/dh_view.html).

The main benefits over the old DHAT are as follows.

- The separation of data collection and presentation means you can run a
  program once under DHAT and then sort the data in various ways. Also,
  full data is in the output file, and the viewer chooses what to omit.

- The data can be sorted in more ways than previously. Some of these
  sorts involve useful filters such as "short-lived" and "zero reads or
  zero writes".

- The tree structure view avoids the need to choose stack trace depth.
  This avoids both the problem of not enough depth (when records that
  should be distinct are combined, and may not contain enough
  information to be actionable) and the problem of too much depth (when
  records that should be combined are separated, making them seem less
  important than they really are).

- Byte and block measures are shown with a percentage relative to the
  global count, which helps gauge relative significance of different
  parts of the profile.

- Byte and block measures are also shown with an allocation rate
  (bytes and blocks per million instructions), which enables comparisons
  across multiple profiles, even if those profiles represent different
  workloads.

- Both global and per-node measurements are taken at the global heap
  peak ("At t-gmax"), which gives Massif-like insight into the point of
  peak memory use.

- The final/lifetimes stats are a bit more useful than the old deaths
  stats. (E.g. the old deaths stats didn't take into account lifetimes
  of unfreed blocks.)

- The handling of realloc() has changed. The sequence `p = malloc(100);
  realloc(p, 200);` now increases the total block count by 2 and the
  total byte count by 300. Previously it increased them by 1 and 200.
  The new handling is a more operational view that better reflects the
  effect of allocations on performance. It makes a significant
  difference in the results, giving paths involving reallocation (e.g.
  repeated pushing to a growing vector) more prominence.
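
The new realloc() accounting described in the last item above can be
sketched in C. This is a minimal model, not DHAT's actual data structures:
`AllocTotals`, `count_malloc` and `count_realloc` are hypothetical names.

```c
#include <stddef.h>

/* Hypothetical running totals kept by the profiler model. */
typedef struct {
    unsigned long long total_blocks;
    unsigned long long total_bytes;
} AllocTotals;

/* A malloc counts as one new block of `size` bytes. */
void count_malloc(AllocTotals *t, size_t size)
{
    t->total_blocks += 1;
    t->total_bytes  += size;
}

/* Under the new handling, a realloc is counted like a fresh allocation
   of `new_size` bytes: one more block, `new_size` more bytes. */
void count_realloc(AllocTotals *t, size_t new_size)
{
    t->total_blocks += 1;
    t->total_bytes  += new_size;
}
```

Starting from zeroed totals, `count_malloc(&t, 100)` followed by
`count_realloc(&t, 200)` leaves 2 blocks and 300 bytes, matching the figures
given above.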

Other things of note:

- There is now testing, both regression tests that run within the
  standard test suite, and viewer-specific tests that cannot run within
  the standard test suite. The latter are run by loading
  dh_view.html?test=1 in a web browser.

- The commit puts all tool lists in Makefiles (and similar files) in the
  following consistent order: memcheck, cachegrind, callgrind, helgrind,
  drd, massif, dhat, lackey, none; exp-sgcheck, exp-bbv.

- A lot of fields in dh_main.c have been given more descriptive names.
  Those names now match those used in dh_view.js.
2019-02-01 14:54:34 +11:00
Julian Seward
b19f6882cf s390 back end: s390_isel_vec_expr_wrk: fix some enum type confusion. n-i-bz.
In s390_isel_vec_expr_wrk() there have been some assignments of enum-typed
values to variables of different enum types.  This fixes it.  It also adds a
few initialisations to variables of type HReg for safety against the
possibility of them being used uninitialised.  No functional change.  Tested
by Andreas Arnez.
2019-01-31 07:56:26 +01:00
Rhys Kidd
5cd48eed00 memcheck,macos: Fix vbit-test building on macOS x86 architectures. n-i-bz.
Secondary architectures on macOS are generally x86, which requires additional
LDFLAGS to be set to avoid linker errors.

apple clang (clang-800.0.42.1) error:
  ld: illegal text-relocation to '___stderrp' in /usr/lib/libSystem.dylib from '_main'
      in vbit_test_sec-main.o for architecture i386

Fixes: 49ca185 ("Also test memcheck/tests/vbit-test on any secondary arch.")
Signed-off-by: Rhys Kidd <rhyskidd@gmail.com>
2019-01-29 01:34:27 -05:00
Philippe Waroquiers
e911f75ee3 Fix callgrind_annotate Use of uninitialized value in numeric gt (>)
When a callgrind dump file contains no event (at all I think),
then callgrind_annotate can produce the below error messages:
Ir sysCount sysTime  file:function
--------------------------------------------------------------------------------
Use of uninitialized value in numeric gt (>) at ../trunk_untouched/Inst/bin/callgrind_annotate line 957.
Use of uninitialized value in numeric gt (>) at ../trunk_untouched/Inst/bin/callgrind_annotate line 957.
Use of uninitialized value in numeric gt (>) at ../trunk_untouched/Inst/bin/callgrind_annotate line 957.
 .        .       .  /build/glibc-yWQXbR/glibc-2.24/csu/../csu/libc-start.c:(below main) [/lib/x86_64-linux-gnu/libc-2.24.so]
Use of uninitialized value in numeric gt (>) at ../trunk_untouched/Inst/bin/callgrind_annotate line 957.
Use of uninitialized value in numeric gt (>) at ../trunk_untouched/Inst/bin/callgrind_annotate line 957.
Use of uninitialized value in numeric gt (>) at ../trunk_untouched/Inst/bin/callgrind_annotate line 957.
 .        .       .  /build/glibc-yWQXbR/glibc-2.24/elf/../sysdeps/x86_64/dl-trampoline.h:_dl_runtime_resolve_xsave [/lib/x86_64-linux-gnu/ld-2.24.so]
Use of uninitialized value in numeric gt (>) at ../trunk_untouched/Inst/bin/callgrind_annotate line 957.
.....

The above can be produced by:
  run sleep 100 under callgrind.
  take some callgrind dumps after the startup.
  ./Inst/bin/callgrind_annotate --threshold=1  callgrind.out.31377.2

Check that the value is defined before doing the comparison.

Note: callgrind_annotate shows functions which have undefined costs
for all events (and I guess it would also show functions that have zero
costs for all events).
Maybe it would be better not to show such functions at all, rather than
show them with all '.'.
2019-01-27 13:12:42 +01:00
Philippe Waroquiers
f57661926b Fix callgrind_annotate --threshold=100 does not print all functions. 2019-01-27 12:36:54 +01:00
Philippe Waroquiers
423c754049 Update callgrind_annotate documentation.
Clarify the meaning of the threshold argument.
Document the per event thresholds that can be given as part
of the --sort option.
2019-01-27 12:32:32 +01:00
Philippe Waroquiers
50f76c756a Fix callgrind_annotate non-deterministic order for equal total
Patch by Matthias Schwarzott
2019-01-27 11:15:30 +01:00
Philippe Waroquiers
52713e29c7 Sort the bug entries by bug number, add an entry for a fixed bug. 2019-01-27 11:04:01 +01:00
Julian Seward
3e94579a5a Enable warning flag -Wenum-conversion if the compiler supports it.
This picks up some enum type confusion, and so looks useful.  Unfortunately
only Clang seems to have it; gcc doesn't.
2019-01-26 18:19:50 +01:00
Julian Seward
130ac30533 s390 front end: remove unused function 'put_gpr_int'. n-i-bz. 2019-01-26 18:18:28 +01:00
Julian Seward
2656009e6f amd64 pipeline: generate a much better translation for PMADDUBSW.
This seems pretty common in some codecs, and the existing translation
was somewhat longwinded.
2019-01-26 18:00:41 +01:00
Julian Seward
6b16f0e2a0 Rename some int<->fp conversion IROps for consistency. No functional change. n-i-bz.
2018-Dec-27: some of int<->fp conversion operations have been renamed so as to
have a trailing _DEP, meaning "deprecated".  This is because they don't
specify a rounding mode to be used for the conversion and so are
underspecified.  Their use should be replaced with equivalents that do specify
a rounding mode, either as a first argument or using a suffix on the name,
that indicates the rounding mode to use.
2019-01-26 17:38:01 +01:00
Julian Seward
a05a920edc VG_(discard_translations): try to avoid invalidating the entire VG_(tt_fast) cache. n-i-bz.
It is very commonly the case that a call to VG_(discard_translations) results
in the discarding of exactly one superblock.  In such cases, it's much cheaper
to find and invalidate the VG_(tt_fast) cache entry associated with the block,
than it is to invalidate the entire cache, because

(1) invalidating the fast cache is expensive, and

(2) repopulating the fast cache after invalidation is even more expensive.

For QEMU, which intensively invalidates individual translations (presumably
due to patching them), this reduces the fast-cache miss rate from circa one in
33 lookups to around one in 130 lookups.
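
The difference between the two invalidation paths can be illustrated with a
simplified model. This is a sketch under stated assumptions, not the
VG_(tt_fast) code: the model is direct-mapped for brevity, and the size, the
`EMPTY_GUEST` marker and all names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define FAST_SLOTS  32768u        /* assumed size: 2^15 pairs */
#define EMPTY_GUEST UINT64_MAX    /* assumed marker, never a real guest address */

typedef struct { uint64_t guest; uint64_t host; } FastPair;
static FastPair fast_model[FAST_SLOTS];

static size_t slot_of(uint64_t guest)
{
    return (size_t)(guest & (FAST_SLOTS - 1));
}

/* Expensive path: wipe every slot; later lookups must repopulate them. */
void model_invalidate_all(void)
{
    for (size_t i = 0; i < FAST_SLOTS; i++) {
        fast_model[i].guest = EMPTY_GUEST;
        fast_model[i].host  = 0;
    }
}

void model_insert(uint64_t guest, uint64_t host)
{
    FastPair *p = &fast_model[slot_of(guest)];
    p->guest = guest;
    p->host  = host;
}

/* Returns 1 and stores the host address on a hit, 0 on a miss. */
int model_lookup(uint64_t guest, uint64_t *host)
{
    const FastPair *p = &fast_model[slot_of(guest)];
    if (p->guest != guest) return 0;
    *host = p->host;
    return 1;
}

/* Cheap path: clear only the slot that can hold `guest`.
   Returns 1 if an entry was removed, 0 if it was not cached. */
int model_invalidate_one(uint64_t guest)
{
    FastPair *p = &fast_model[slot_of(guest)];
    if (p->guest != guest) return 0;
    p->guest = EMPTY_GUEST;
    p->host  = 0;
    return 1;
}
```

After `model_invalidate_one()`, every other cached pair is still a hit, so
only the discarded translation pays for a slow lookup.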
2019-01-25 12:06:37 +01:00
Julian Seward
f4072abf6b Update. 2019-01-25 09:31:19 +01:00
Julian Seward
f96d131ce2 Bug 402781 - Redo the cache used to process indirect branch targets.
Implementation for x86-solaris and amd64-solaris.  This completes the
implementations for all targets.  Note these two are untested because I don't
have any way to test them.
2019-01-25 09:27:23 +01:00
Julian Seward
50bb127b1d Bug 402781 - Redo the cache used to process indirect branch targets.
[This commit contains an implementation for all targets except amd64-solaris
and x86-solaris, which will be completed shortly.]

In the baseline simulator, jumps to guest code addresses that are not known at
JIT time have to be looked up in a guest->host mapping table.  That means:
indirect branches, indirect calls and most commonly, returns.  Since there are
huge numbers of these (often 10+ million/second) the mapping mechanism needs
to be extremely cheap.

Currently, this is implemented using a direct-mapped cache, VG_(tt_fast), with
2^15 (guest_addr, host_addr) pairs.  This is queried in handwritten assembly
in VG_(disp_cp_xindir) in dispatch-<arch>-<os>.S.  If there is a miss in the
cache then we fall back out to C land, and do a slow lookup using
VG_(search_transtab).

Given that the size of the translation table(s) in recent years has expanded
significantly in order to keep pace with increasing application sizes, two bad
things have happened: (1) the cost of a miss in the fast cache has risen
significantly, and (2) the miss rate on the fast cache has also increased
significantly.  This means that large (~ one-million-basic-blocks-JITted)
applications that run for a long time end up spending a lot of time in
VG_(search_transtab).

The proposed fix is to increase associativity of the fast cache, from 1
(direct mapped) to 4.  Simulations of various cache configurations using
indirect-branch traces from a large application show that this is the best of
the configurations tried.  In an extreme case with 5.7 billion indirect
branches:

* The increase of associativity from 1 way to 4 way, whilst keeping the
  overall cache size the same (32k guest/host pairs), reduces the miss rate by
  around a factor of 3, from 4.02% to 1.30%.

* The use of a slightly better hash function than merely slicing off the
  bottom 15 bits of the address, reduces the miss rate further, from 1.30% to
  0.53%.

Overall the VG_(tt_fast) miss rate is almost unchanged on small workloads, but
reduced by a factor of up to almost 8 on large workloads.

By implementing each (4-entry) cache set using a move-to-front scheme in the
case of hits in ways 1, 2 or 3, the vast majority of hits can be made to
happen in way 0.  Hence the cost of having this extra associativity is almost
zero in the case of a hit.  The improved hash function costs an extra 2 ALU
shots (a shift and an xor) but overall this seems performance neutral to a
win.
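
A minimal C sketch of a 4-way set with move-to-front on hit follows. It is
an illustration under assumptions, not the handwritten assembly in the
dispatchers: all names are hypothetical, the hash is only an example of "a
shift and an xor" (the real function is not given here), and the exact
promotion policy used by Valgrind may differ.

```c
#include <stddef.h>
#include <stdint.h>

#define MTF_WAYS  4
#define MTF_SETS  8192u           /* 32k pairs / 4 ways */
#define MTF_EMPTY UINT64_MAX      /* assumed "no entry" marker */

typedef struct { uint64_t guest; uint64_t host; } MtfPair;
typedef struct { MtfPair way[MTF_WAYS]; } MtfSet;
static MtfSet mtf_cache[MTF_SETS];

/* Assumed hash: one shift and one xor, then a mask. */
static size_t mtf_hash(uint64_t guest)
{
    return (size_t)((guest ^ (guest >> 13)) & (MTF_SETS - 1));
}

void mtf_clear(void)
{
    for (size_t s = 0; s < MTF_SETS; s++)
        for (int w = 0; w < MTF_WAYS; w++) {
            mtf_cache[s].way[w].guest = MTF_EMPTY;
            mtf_cache[s].way[w].host  = 0;
        }
}

/* Called after a miss: install at way 0, shifting older entries down.
   The entry previously in the last way is dropped. */
void mtf_insert(uint64_t guest, uint64_t host)
{
    MtfSet *s = &mtf_cache[mtf_hash(guest)];
    for (int w = MTF_WAYS - 1; w > 0; w--)
        s->way[w] = s->way[w - 1];
    s->way[0].guest = guest;
    s->way[0].host  = host;
}

/* Returns the way in which the hit occurred (0..3), or -1 on a miss.
   A hit in way 1, 2 or 3 is moved to way 0, so a repeated lookup of the
   same address hits in way 0. */
int mtf_lookup(uint64_t guest, uint64_t *host)
{
    MtfSet *s = &mtf_cache[mtf_hash(guest)];
    for (int w = 0; w < MTF_WAYS; w++) {
        if (s->way[w].guest == guest) {
            MtfPair hit = s->way[w];
            for (int k = w; k > 0; k--)
                s->way[k] = s->way[k - 1];
            s->way[0] = hit;
            *host = hit.host;
            return w;
        }
    }
    return -1;
}
```

Only the way-0 comparison sits on the common path; the shuffling work is
done only for hits in the later ways.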
2019-01-25 09:14:56 +01:00
Andreas Arnez
467c7c4c96 Bug 403552 s390x: Fix vector facility bit number
The wrong bit number was used when checking for the vector facility.  This
can result in a fatal emulation error: "Encountered an instruction that
requires the vector facility.  That facility is not available on this
host."

In many cases the wrong facility bit was set as well, hence
nothing bad happened.  But when running Valgrind within a Qemu/KVM guest,
the wrong bit was not (always?) set and the emulation error occurred.

This fix simply corrects the vector facility bit number, changing it from
128 to 129.
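
As an illustration of why an off-by-one bit number matters, here is a small
C sketch of testing a bit in a facility list. It assumes the list is held as
an array of 64-bit doublewords with bits numbered from the left (bit 0 is
the most significant bit of the first doubleword); `facility_bit_set` is a
hypothetical helper, not the function Valgrind uses.

```c
#include <stdint.h>

/* Returns 1 if facility bit `bit` is set in `list`, else 0.
   Bits are numbered from the left: bit 0 is the MSB of list[0]. */
int facility_bit_set(const uint64_t *list, unsigned n_dwords, unsigned bit)
{
    unsigned dw = bit / 64;
    if (dw >= n_dwords)
        return 0;                       /* bit lies beyond the stored list */
    return (int)((list[dw] >> (63 - bit % 64)) & 1);
}
```

Bits 128 and 129 are adjacent bits of the third doubleword, so testing 128
gives the right answer only when both happen to be set.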
2019-01-24 11:11:51 +01:00
Philippe Waroquiers
d7d8231750 Fix false positive 'Conditional jump or move' on amd64 64 bits ptracing 32 bits.
PTRACE_GET_THREAD_AREA is not handled by the amd64 linux syswrap, which leads
to false positive errors in 64-bit programs ptrace-ing 32-bit processes.

For example, the below error was wrongly reported on GDB:
==25377== Conditional jump or move depends on uninitialised value(s)
==25377==    at 0x8A1D7EC: td_thr_get_info (td_thr_get_info.c:35)
==25377==    by 0x526819: thread_from_lwp(thread_info*, ptid_t) (linux-thread-db.c:417)
==25377==    by 0x5281D4: thread_db_notice_clone(ptid_t, ptid_t) (linux-thread-db.c:442)
==25377==    by 0x51773B: linux_handle_extended_wait(lwp_info*, int) (linux-nat.c:2027)
....
==25377==  Uninitialised value was created by a stack allocation
==25377==    at 0x69A360: x86_linux_get_thread_area(int, void*, unsigned int*) (x86-linux-nat.c:278)

Fix this by implementing PTRACE_GET|SET_THREAD_AREA on amd64.
2019-01-12 15:35:59 +01:00
Mark Wielaard
3528f84037 readdwarf3.c (parse_type_DIE): Accept DW_TAG_subrange_type with DW_AT_count
GCC9 generates a subrange_type with a lower_bound and count, but no
upper_bound attribute. This simply means the upper bound is lower
plus count.
2019-01-11 21:52:58 +01:00
Mark Wielaard
c512949082 Bug 402480 Do not use %esp in clobber list.
This is the same fix as for amd64-linux, but now for x86-linux.
2019-01-11 20:00:21 +01:00
Mark Wielaard
2c1f016e63 Bug 402519 - POWER 3.0 addex instruction incorrectly implemented
addex uses OV as carry in and carry out. For all other instructions
OV is the signed overflow flag. And instructions like adde use CA
as carry.

Replace set_XER_OV_OV32 with set_XER_OV_OV32_ADDEX, which will
call calculate_XER_CA_64 and calculate_XER_CA_32, but with OV
as input, and sets OV and OV32.
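
The carry behaviour described above can be modelled in plain C. This is a
sketch of the semantics as stated in this message (OV is the carry-in, and
OV and OV32 receive the 64-bit and 32-bit carry-outs); it is not the IR that
Valgrind generates, and `AddexResult` and `addex_model` are hypothetical
names.

```c
#include <stdint.h>

typedef struct {
    uint64_t rt;     /* result register */
    unsigned ov;     /* carry out of the 64-bit addition */
    unsigned ov32;   /* carry out of the low 32 bits */
} AddexResult;

/* Model: rt = ra + rb + OV, with OV used as carry-in. */
AddexResult addex_model(uint64_t ra, uint64_t rb, unsigned ov_in)
{
    AddexResult r;
    uint64_t cin     = ov_in & 1;
    uint64_t partial = ra + rb;

    r.rt = partial + cin;
    /* A 64-bit carry occurred if either addition step wrapped around. */
    r.ov = (partial < ra) || (r.rt < partial);
    /* Carry out of the low 32 bits: add the low words in 64-bit
       arithmetic and inspect bit 32 of the sum. */
    uint64_t low = (ra & 0xffffffffu) + (rb & 0xffffffffu) + cin;
    r.ov32 = (unsigned)(low >> 32) & 1;
    return r;
}
```

For example, adding 0 to an all-ones register with OV set wraps to 0 and
sets both OV and OV32.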

Enable test_addex in none/tests/ppc64/test_isa_3_0.c and update
the expected output. test_addex would fail to match the expected
output before this patch.
2018-12-31 22:26:31 +01:00
Philippe Waroquiers
9966fa6b69 Add memcheck/tests/vbit-test/vbit-test-sec in .gitignore 2018-12-29 10:23:35 +01:00
Philippe Waroquiers
ed1c1ef744 Some more .exp changes following --show-error-list new option
A few .exp files (not tested on amd64) have to be changed to
have the messages in the new order:
  Use --track-origins=yes to see where uninitialised values come from
  For lists of detected and suppressed errors, rerun with: -s
2018-12-29 10:20:33 +01:00
Philippe Waroquiers
4962900a13 Fix the name of the option in the FIXED BUGS section 2018-12-29 00:25:34 +01:00
Philippe Waroquiers
f3a3eadf36 Document new options --show-error-list=no|yes and -s in NEWS 2018-12-29 00:16:46 +01:00
Philippe Waroquiers
9efc7e80f2 Document the new options --show-error-list and -s 2018-12-28 19:33:06 +01:00
Philippe Waroquiers
cfae4f70a6 Modify .exp files following the new error message.
Change:
For counts of detected and suppressed errors, rerun with: -v
to
For lists of detected and suppressed errors, rerun with: -s
2018-12-28 19:33:00 +01:00
Philippe Waroquiers
d680f66465 Implement option --show-error-list=no|yes -s
This option lists the detected errors and shows the used
suppressions without increasing the verbosity.
Increasing the verbosity also activates a lot of messages that
are often not very useful for the user.
So, this option makes it possible to see the list of errors and used
suppressions independently of the verbosity.

Note that if a high verbosity is selected, the behaviour is unchanged.
In other words, when specifying -v, the list of detected errors
and the used suppressions are still shown, even if
--show-error-list=yes and -s are not used.
2018-12-28 19:32:53 +01:00
Philippe Waroquiers
36bf7c0647 Factorize producing the 'For counts of detected and suppressed errors' msg
Each tool producing errors had identical code to produce this msg.
Factorize the production of the message in m_main.c

This prepares the work to have a specific option to show the list
of detected errors and the count of suppressed errors.

This has a (small) visible effect on the output of memcheck:
Instead of producing
  For counts of detected and suppressed errors, rerun with: -v
  Use --track-origins=yes to see where uninitialised values come from
memcheck now produces:
  Use --track-origins=yes to see where uninitialised values come from
  For counts of detected and suppressed errors, rerun with: -v

i.e. the order of the track-origins message and the error-counts message is swapped.
2018-12-23 23:45:33 +01:00
Mark Wielaard
39f0abfc92 Add vbit-test-sec.vgtest and vbit-test-sec.stderr.exp to EXTRA_DIST. 2018-12-23 23:42:27 +01:00
Mark Wielaard
087979e467 Mention 402481 as fixed in NEWS. 2018-12-23 23:11:42 +01:00
Khem Raj
022f5af61b tests/amd64: Do not clobber %rsp register
This is now seen with the gcc-9.0 compiler, following a fix that the GCC
community made recently:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=52813

Signed-off-by: Khem Raj <raj.khem@gmail.com>
2018-12-23 23:09:28 +01:00
Mark Wielaard
49ca1853fc Also test memcheck/tests/vbit-test on any secondary arch.
If we are building a secondary arch then also build and run the
memcheck vbit-test for that architecture.
2018-12-23 22:20:44 +01:00