The previous commit (cceed053ce876560b9a7512125dd93c7fa059778) broke the build
for the MIPS architecture.
Update the code in VG_(get_StackTrace_wrk) to reflect the changes made in
the previous commit.
This patch implements the flag --delta-stacktrace=yes/no.
Yes means that full-history stack traces are computed by
changing just the last (innermost) frame when no call/return
instruction was executed since the previous capture.
This can speed up Helgrind by up to 25%.
This flag is currently set to yes only on Linux x86 and amd64, as some
platform-dependent validation of the heuristics used is needed before
setting the default to yes on a platform. See the function check_cached_rcec_ok
in libhb_core.c for more details about how to validate/check the behaviour
on a new platform.
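As a rough sketch of the idea (hypothetical names, not the actual code):
   /* Delta stack traces: if the thread executed no call/return since the
      previous capture, only the innermost frame can have changed. */
   if (!thr->executed_call_or_return) {
      ips[0] = current_pc(thr);                 /* patch the last frame */
   } else {
      n_ips = full_unwind(thr, ips, N_IPS_MAX); /* unwind from scratch */
   }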
* main is more likely to be an outermost frame than an innermost
frame, so searching from the outermost frame will find it more quickly.
* Also, in case the stack trace contains the main function twice, this
ensures we remove only the functions below the outermost main.
Having two mains in a stack trace does not happen normally.
However, this prepares for a future commit that improves
the outer/inner setup: the outer will append the inner guest stack trace,
and the inner stack trace sometimes already contains main.
Searching for main from the outermost frame keeps the interesting
part of the stack trace (see the sketch below).
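A minimal sketch of the outermost-first search (assuming ips[0] is the
innermost frame; is_main_fn is a hypothetical helper):
   Int i;
   for (i = n_ips - 1; i >= 0; i--) {
      if (is_main_fn(ips[i])) { /* does ips[i] belong to main? */
         n_ips = i + 1;         /* drop the frames below the outermost main */
         break;
      }
   }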
git-svn-id: svn://svn.valgrind.org/valgrind/trunk@16135
This is the same kind of problem as explained and fixed in revision 15720:
in some cases, unwinding always retrieves the same pc/sp/bp.
The fix for 64 bits is similar: stop unwinding if the previous sp is >= the new sp.
git-svn-id: svn://svn.valgrind.org/valgrind/trunk@15794
On RHEL7 x86 (32 bits), the Valgrind unwinder cannot properly unwind
the stack just after a thread creation: the unwinder always retrieves
the same pc/sp/bp.
See below for an example.
As a consequence, some stack traces are bigger than
needed (i.e. they always fill up the ips array). If
--merge-recursive-frames is given, the unwinder enters an
infinite loop (as identical frames are merged, and the ips array
is never filled in).
This patch adds an additional exit condition: after unwinding
a frame, if the previous sp is >= the new sp, then unwinding stops.
The patch has been tested on Debian 8/x86 and RHEL7/x86.
0x0417db67 <+55>: mov 0x18(%esp),%ebx
0x0417db6b <+59>: mov 0x28(%esp),%edi
0x0417db6f <+63>: mov $0x78,%eax
0x0417db74 <+68>: mov %ebx,(%ecx)
0x0417db76 <+70>: int $0x80
=> 0x0417db78 <+72>: pop %edi
0x0417db79 <+73>: pop %esi
0x0417db7a <+74>: pop %ebx
0x0417db7b <+75>: test %eax,%eax
Valgrind stacktrace gives:
==21261== at 0x417DB78: clone (clone.S:110)
==21261== by 0x424702F: ???
==21261== by 0x424702F: ???
==21261== by 0x424702F: ???
==21261== by 0x424702F: ???
==21261== by 0x424702F: ???
==21261== by 0x424702F: ???
==21261== by 0x424702F: ???
...
(till the array of ips is full)
while gdb stacktrace gives:
(gdb) bt
#0 clone () at ../sysdeps/unix/sysv/linux/i386/clone.S:110
#1 0x00000000 in ?? ()
(gdb) p $pc
$2 = (void (*)()) 0x417db78 <clone+72>
(gdb)
With the fix, valgrind gives:
==21261== at 0x417DB78: clone (clone.S:110)
==21261== by 0x424702F: ???
which looks more reasonable.
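A minimal sketch of the added exit condition in the unwind loop
(variable names illustrative):
   /* after unwinding one frame: */
   if (prev_sp >= uregs.xsp)  /* sp must strictly increase towards the  */
      break;                  /* outer frames; otherwise stop unwinding */
   prev_sp = uregs.xsp;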
git-svn-id: svn://svn.valgrind.org/valgrind/trunk@15729
When following frame-pointer chains (via EBP), don't continue if EBP doesn't contain
a 4-aligned value. A misaligned EBP is almost certainly invalid --
hence, no loss in unwind capability here -- and the misaligned access
causes gcc 5.1 ubsan alignment checks to fail. So avoid them.
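The added check is of this shape (a sketch in the spirit of the x86 unwinder):
   /* following the %ebp chain: a misaligned EBP cannot be a valid
      frame pointer, so stop before dereferencing it */
   if (0 != (uregs.xbp & 3))
      break;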
git-svn-id: svn://svn.valgrind.org/valgrind/trunk@15400
The unwinder should be able to access the redzone.
So, when computing fp_min, subtract the redzone.
Currently, only amd64 and ppc64 have a non-zero redzone.
Regtested on amd64 and ppc64le, no regression.
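A sketch of the changed computation (VG_STACK_REDZONE_SZB is the
platform's redzone size, 0 on most platforms):
   /* the guest may keep data below sp, in the redzone, so let the
      unwinder look that far down as well */
   fp_min = sp - VG_STACK_REDZONE_SZB;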
git-svn-id: svn://svn.valgrind.org/valgrind/trunk@15309
For example, perf/memrw is improved by 2% to 3% with this patch.
The unwinding code on x86 tries to unwind using
either the %ebp chain or CFI unwinding.
If these 2 techniques fail, it then tries to unwind
using FPO (PDB) debug info.
However, unless running wine or similar, there will never be
such FPO/PDB info.
The function VG_(use_FPO_info) is thus called for nothing
at each 'end of stack'. This function scans all the loaded debug info
to find one that has some FPO, only to find nothing.
With this patch, the unwind code on x86 will only call VG_(use_FPO_info) if
some FPO/PDB info was loaded.
The fact that FPO/PDB info was loaded is cached and updated similarly to
the CFI cache: each time new debug info is loaded, the cached value is
refreshed using the debuginfo generation.
The patch also changes the name of VG_(CF_info_generation)
to VG_(debuginfo_generation), as this generation is changed for
any kind of load or unload of debug info, not only for CFI-based debug
info.
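A sketch of the caching scheme (names illustrative;
VG_(debuginfo_generation) is the renamed query):
   static UInt fpo_cache_gen    = 0;      /* generation at last refresh */
   static Bool FPO_info_present = False;  /* any FPO/PDB info loaded?   */

   if (fpo_cache_gen != VG_(debuginfo_generation)()) {
      fpo_cache_gen    = VG_(debuginfo_generation)();
      FPO_info_present = scan_debuginfos_for_fpo(); /* hypothetical helper */
   }
   /* the x86 unwinder then calls VG_(use_FPO_info) only if
      FPO_info_present is True */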
git-svn-id: svn://svn.valgrind.org/valgrind/trunk@15293
Update ifdefs around the bogus-LR-value-handling code to allow ppc64le to
behave as ppc64 (BE) does.
This fixes the overlap test case, where the stack unwinding code was
otherwise coming up with bad instruction pointers.
This patch fixes Valgrind bugzilla 347322.
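The ifdef change is of this shape (a sketch; the exact guard may differ):
   #if defined(VGP_ppc64be_linux) || defined(VGP_ppc64le_linux)
      /* bogus-LR-value handling, previously enabled for BE only */
   #endif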
git-svn-id: svn://svn.valgrind.org/valgrind/trunk@15183
Add TileGx platform support: Valgrind aspects, to match vex r3124.
See bug 339778 - Linux/TileGx platform support to Valgrind
git-svn-id: svn://svn.valgrind.org/valgrind/trunk@15080
bz#344560
- Also fixes memcheck/tests/badpoll test on OS X
- The problem occurs because the guest stack seen in a system call pre- or
post-function happens not to have a correct topmost stack frame, as Darwin
system call stubs do not start with the usual function prologue.
- New regression test case added.
- Thanks to Greg Banks for research, patch and test case.
Before:
== 587 tests, 240 stderr failures, 22 stdout failures, 0 stderrB failures, 0 stdoutB failures, 31 post failures ==
After:
== 588 tests, 239 stderr failures, 22 stdout failures, 0 stderrB failures, 0 stdoutB failures, 31 post failures ==
git-svn-id: svn://svn.valgrind.org/valgrind/trunk@14985
Changes VG_(describe_IP) to return the untruncated result in a statically
allocated local buffer. Fix call sites and update two .exp files that had
truncated names.
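Callers must therefore copy the result if they need it across calls;
a sketch (assuming the new signature, which also takes an optional
InlIPCursor argument):
   const HChar* where = VG_(describe_IP)(ip, NULL);
   /* 'where' points into a static buffer owned by VG_(describe_IP) and
      is overwritten by the next call: strdup it to keep it around */
   HChar* saved = VG_(strdup)("mytool.describe_ip", where);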
git-svn-id: svn://svn.valgrind.org/valgrind/trunk@14685
This patch changes the interface and behaviour of VG_(demangle) and
VG_(maybe_Z_demangle). Instead of copying the demangled name into a
fixed sized buffer that is passed in from the caller (HChar *buf, Int n_buf),
the demangling functions will now return a pointer to the full-length
demangled name (HChar **result). It is the caller's responsibility to
make a copy if needed.
This change in function parameters ripples upward
- first: to get_sym_name
- then to the convenience wrappers
- VG_(get_fnname)
- VG_(get_fnname_w_offset)
- VG_(get_fnname_if_entry)
- VG_(get_fnname_raw)
- VG_(get_fnname_no_cxx_demangle)
- VG_(get_datasym_and_offset)
The changes in foComplete then force the arguments of
- VG_(get_objname) to be changed as well
There are some issues regarding the ownership and persistence of
character strings to consider.
In general, the returned character string is owned by "somebody else",
which means the caller must not free it. Also, the caller must not
modify the returned string as it possibly points to read only memory.
Additionally, the returned string is not necessarily persistent. Here are
the scenarios:
- the returned string is a demangled function name in which case the
memory holding the string will be freed when the demangler is called again.
- the returned string hangs off of a DebugInfo structure in which case
it will be freed when the DebugInfo is discarded
- the returned string hangs off of a segment in the address space manager
in which case it may be overwritten when the segment is merged with
another segment
So the rule of thumb here is: if in doubt, strdup the string.
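For example, with the new-style VG_(get_fnname) (a sketch; the
cost-centre string is arbitrary):
   const HChar* fnname;
   if (VG_(get_fnname)(ip, &fnname)) {
      /* fnname is owned elsewhere: do not free or modify it, and
         copy it if it must survive later debuginfo activity */
      HChar* copy = VG_(strdup)("mytool.fnname", fnname);
      /* ... use 'copy' ... */
   }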
git-svn-id: svn://svn.valgrind.org/valgrind/trunk@14664
In various places, there were differing assumptions about the 'end'
boundary (highest address): not included, included,
the highest addressable word, or the highest addressable byte.
This was e.g. very visible when doing:
./vg-in-place -d -d ./helgrind/tests/tc01_simple_race|&grep regi
giving
--24040:2:stacks register 0xBEDB4000-0xBEDB4FFF as stack 0
--24040:2:stacks register 0x402C000-0x4A2C000 as stack 1
showing that the main stack end was (on x86) not the highest word
but the highest byte, while for thread 1, the registered end
was a byte not part of the stack.
The attached patch ensures that the stack bounds semantics are documented and
consistent. Also, some of the stack handling code is factorised.
The convention that the patch ensures and documents is:
start is the lowest addressable byte, end is the highest addressable byte.
(The words 'min' and 'max' have been kept where already used, as this wording
is consistent with the new semantics of start/end.)
In various debug logs, brackets [ and ] are used to make clear that
both bounds are included.
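In code terms, the documented convention is (sketch):
   Addr start = lowest_addressable_byte;   /* included                  */
   Addr end   = start + stack_size - 1;    /* highest addressable byte, */
                                           /* also included             */
   Bool on_stack = addr >= start && addr <= end;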
The code to guess and register the client stack was duplicated
in all the platform specific syswrap-<plat>-<os>.c files.
The code has been factorised into syswrap-generic.c
The patch has been regression tested on
x86, amd64, ppc32/64, s390x.
It has been compiled and one test run on arm64.
Not compiled/not tested on darwin, android, mips32/64, arm
In more detail, the patch does the following:
coregrind/pub_core_aspacemgr.h
include/valgrind.h
include/pub_tool_machine.h
coregrind/pub_core_scheduler.h
coregrind/pub_core_stacks.h
- document start/end semantic in various functions
also in pub_tool_machine.h:
- replaces the unclear 'bottommost address' by 'lowest address'
(unclear, as the stack bottom is, or at least can be interpreted as,
the 'functional' bottom of the stack, which is the highest
address for a stack growing downwards).
coregrind/pub_core_initimg.h
replace unclear clstack_top by clstack_end
coregrind/m_main.c
updated to clstack_end
coregrind/pub_core_threadstate.h
renamed client_stack_highest_word to client_stack_highest_byte
coregrind/m_scheduler/scheduler.c
computes client_stack_highest_byte as the highest addressable byte
Update comments in call to VG_(show_sched_status)
coregrind/m_machine.c
coregrind/m_stacktrace.c
updated to client_stack_highest_byte, and switched
stack_lowest/highest_word to stack_lowest/highest_byte accordingly
coregrind/m_stacks.c
clarify the semantics of start/end,
added a comment to indicate why we invert start/end in the register call
(note that the code in find_stack_by_addr was already assuming that
end was included, as the checks were doing e.g.
sp >= i->start && sp <= i->end)
coregrind/pub_core_clientstate.h
coregrind/m_clientstate.c
renames Addr VG_(clstk_base) to Addr VG_(clstk_start_base)
(start to indicate it is the lowest address, base suffix kept
to indicate it is the initial lowest address).
coregrind/m_initimg/initimg-darwin.c
updated to VG_(clstk_start_base)
replace unclear iicii.clstack_top by iicii.clstack_end
updated the clstack_max_size computation now that both bounds are included.
coregrind/m_initimg/initimg-linux.c
updated to VG_(clstk_start_base)
updated the VG_(clstk_end) computation now that both bounds are included.
replace unclear iicii.clstack_top by iicii.clstack_end
coregrind/pub_core_aspacemgr.h
extern Addr VG_(am_startup): clarify the semantics of the returned value
coregrind/m_aspacemgr/aspacemgr-linux.c
removed a copy of a comment that was already in pub_core_aspacemgr.h
(avoid double maintenance)
renamed unclear suggested_clstack_top to suggested_clstack_end
(note that here, it looks like suggested_clstack_top was already
the last addressable byte)
* factorisation of the stack guessing and registration causes
mechanical changes in the following files:
coregrind/m_syswrap/syswrap-ppc64-linux.c
coregrind/m_syswrap/syswrap-x86-darwin.c
coregrind/m_syswrap/syswrap-amd64-linux.c
coregrind/m_syswrap/syswrap-arm-linux.c
coregrind/m_syswrap/syswrap-generic.c
coregrind/m_syswrap/syswrap-mips64-linux.c
coregrind/m_syswrap/syswrap-ppc32-linux.c
coregrind/m_syswrap/syswrap-amd64-darwin.c
coregrind/m_syswrap/syswrap-mips32-linux.c
coregrind/m_syswrap/priv_syswrap-generic.h
coregrind/m_syswrap/syswrap-x86-linux.c
coregrind/m_syswrap/syswrap-s390x-linux.c
coregrind/m_syswrap/syswrap-darwin.c
coregrind/m_syswrap/syswrap-arm64-linux.c
Some files to look at in more detail:
syswrap-darwin.c : the handling of sysctl(kern.usrstack) looked
buggy to me, and has probably been made correct by the fact that
VG_(clstk_end) is now the last addressable byte. However, I am unsure
about this, as I could not find any documentation about
sysctl(kern.usrstack). I only found several occurrences on the web,
showing that the result of this is page aligned, which I guess
means it must be 1 + the last addressable byte.
syswrap-x86-darwin.c and syswrap-amd64-darwin.c
I suspect the code that was computing client_stack_highest_word
was wrong, and the patch makes it correct.
syswrap-mips64-linux.c
not sure what to do for this code. This is the only code
that was guessing the stack differently from the others.
Kept (almost) untouched. To be discussed with the mips maintainers.
coregrind/pub_core_libcassert.h
coregrind/m_libcassert.c
* void VG_(show_sched_status):
renamed Bool valgrind_stack_usage to Bool stack_usage;
if stack_usage, shows both the valgrind stack usage and
the client stack boundaries
coregrind/m_scheduler/scheduler.c
coregrind/m_gdbserver/server.c
coregrind/m_gdbserver/remote-utils.c
Updated comments in callers to VG_(show_sched_status)
git-svn-id: svn://svn.valgrind.org/valgrind/trunk@14392
This is one of three patches needed to add PPC64 LE support. The other two
patches can be found in Bugzillas 334834 and 334836. The commit does not have
a VEX commit associated with it.
POWER PC, add initial Little Endian support
The IBM POWER processor now supports both Big Endian and Little Endian.
This patch renames the #defines with the name ppc64 to ppc64be for the
BE-specific code, and adds the Little Endian #define ppc64le for the
LE-specific code.
Additionally, a few functions are renamed to remove BE from the name if the
function is used by both BE and LE. Functions that are BE specific have BE put
in the name.
The goal of this patch is to make sure #defines, function names and
variables consistently use PPC64/ppc64 if it refers to both BE and LE,
PPC64BE/ppc64be if it is specific to BE, and PPC64LE/ppc64le if it is LE
specific. The patch does not break the code for PPC64 Big Endian.
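For instance, under the new convention (a sketch; actual code uses the
corresponding arch/platform define spellings):
   #if defined(VGA_ppc64be) || defined(VGA_ppc64le)
      /* code shared by PPC64 BE and LE */
   #endif
   #if defined(VGA_ppc64be)
      /* BE-specific code */
   #endif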
The test files memcheck/tests/atomic_incs.c and tests/power_insn_available.c
are also updated to the new #define definition for PPC64 BE.
Signed-off-by: Carl Love <carll@us.ibm.com>
git-svn-id: svn://svn.valgrind.org/valgrind/trunk@14238
Read the DWARF inlined-call debug info and use it to produce stack traces
showing inlined function calls.
See 278972 "valgrind stacktraces and suppression do not handle inlined function call debuginfo".
Reading the inlined dwarf call info is activated using the new clo
--read-inline-info=yes.
The default is currently no, but an objective is to optimise performance
and memory usage in order to possibly set it on by default
(see the discussion about performance below).
Basically, the patch provides the following pieces:
1. Implement a new dwarf3 reader that reads the inlined call info
2. Some performance improvements done for this new parser, and
on some common code between the new parser and the var info parser.
3. Use the parsed inlined info to produce stacktrace showing inlined calls
4. Use the parsed inlined info in the suppression matching and suppression generation
5. and of course, some reg tests
1. new dwarf3 reader:
---------------------
Two options were possible: add the reading of the inlined info
to the current var-info dwarf reader, or add a second reader.
The second approach was preferred, for the following reasons:
the var-info reader is slow, memory hungry and quite complex;
having a separate parsing phase for the inlined information
is simpler/faster when just reading the inlined info.
Possibly, a single parser would be faster when using both
--read-var-info=yes and --read-inline-info=yes.
However, var-info being extremely memory/cpu hungry, it is unlikely
to be used often, and having a separate parsing phase for inlined info
in any case makes little difference.
(--read-var-info=yes is also now less interesting thanks to commit
r13991, which provides a fast and low-memory "reasonable" location
for an address.)
The inlined-info parser reads the dwarf info and makes calls
to ML_(addInlInfo) (declared in priv_storage.h).
2. performance optimisations
----------------------------
* The abbrev cache has been improved in revision r14035.
* The new parser skips the uninteresting DIEs
(the var-info parser has no logic to skip uninteresting DIEs).
* Some other minor perf optimisations here and there.
In total now, on a big executable, 15 seconds of CPU are needed to
create the inlined info (on my slow x86 Pentium).
With regards to memory, in the dinfo arena:
with inlined info: 172281856/121085952 max/curr mmap'd
without : 157892608/106721280 max/curr mmap'd.
So, basically, the inlined information costs about 15Mb of memory for
my big executable. (Compared to the first version of the patch, this
already uses less memory, thanks to the strpool DedupPoolAlloc.)
The needed memory can probably be decreased somewhat more.
3. produce better stack traces
------------------------------
VG_(describe_IP) has a new argument, InlIPCursor *iipc, which allows
inlined function calls to be described by making repeated calls
to describe_IP. See pub_tool_debuginfo.h for a description.
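A minimal sketch of the intended use, assuming the InlIPCursor API
declared in pub_tool_debuginfo.h (VG_(new_IIPC), VG_(next_IIPC),
VG_(delete_IIPC)):
   HChar buf[256];
   InlIPCursor* iipc = VG_(new_IIPC)(ip);
   do {
      /* each iteration describes one (possibly inlined) call at ip */
      VG_(printf)("%s\n", VG_(describe_IP)(ip, buf, 256, iipc));
   } while (VG_(next_IIPC)(iipc)); /* advance to the next inlined call */
   VG_(delete_IIPC)(iipc);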
4. suppression generation and matching
--------------------------------------
* suppression generation now also uses an InlIPCursor *iipc
to generate a line for each inlined fn call.
* suppression matching: to allow suppression matching to
match one IP to several function calls in a suppression entry,
the 'inputCompleter' object (which lazily generates
function or object names for a stacktrace when matching
an error with a suppression) has been generalised a little bit
more, to also lazily generate the input sequence.
VG_(generic_match) has been updated so as to be more generic
with respect to the input completer: when an input
completer is provided, VG_(generic_match) no longer needs
to produce/compute any input itself: this is all delegated
to the input completer.
5. various regtests
-------------------
to test stack traces with inlined calls, and suppressions
of (some of) these errors using inlined fn calls matching.
Work still to do:
-----------------
* improve parsing performance
* improve the memory overhead.
* handling the directory names for the files of the inlined function calls is not yet done
(probably implies refactoring some code).
* see if the m_errormgr.c *offsets arrays cannot be managed via an xarray.
git-svn-id: svn://svn.valgrind.org/valgrind/trunk@14036
Necessary changes to Valgrind to support MIPS64LE on Linux.
Minor cleanup/style changes are embedded in the patch as well.
The change corresponds to r2687 in VEX.
Patch written by Dejan Jevtic and Petar Jovanovic.
More information about this issue:
https://bugs.kde.org/show_bug.cgi?id=313267
git-svn-id: svn://svn.valgrind.org/valgrind/trunk@13292
gcc reports a warning:
m_stacktrace.c:183: warning: ‘xip_verified’ may be used uninitialized in this function
This warning is a false positive:
xip_verified is assigned in the following branch:
if (UNLIKELY(xip_verif >= CFUNWIND)) {
if (xip_verif == CFUNWIND) {
...
} else {
<<<< here xip_verified is initialised >>>>
}
}
xip_verified is then used only if xip_verif > CFUNWIND.
Assign a rubbish value to xip_verified to silence gcc.
(??? there are GCC pragmas that can be used to
disable a warning only on a specific line e.g.
something like:
#pragma GCC diagnostic ignored "-Wuninitialized"
Addr xip_verified; // xip for which we have calculated fpverif_uregs
#pragma GCC diagnostic warning "-Wuninitialized"
instead of
Addr xip_verified = 0; // xip for which we have calculated fpverif_uregs
// 0 assigned to silence false positive -Wuninitialized warning
but the #pragma technique is not currently used.
So, the bypass of assigning a rubbish value is used.)
git-svn-id: svn://svn.valgrind.org/valgrind/trunk@13282
* other platforms (e.g. amd64) first try to unwind
with cfi info, then with the fp chain.
* fp unwinding, when code is compiled without a frame pointer, can
fail and give incomplete stack traces (often terminating
with a random program counter, causing a huge number of
recorded stack traces).
This patch improves unwinding on x86 as follows:
* the first time an IP is unwound, the unwind is done both with
the CFI technique and with the fp technique.
If the results are identical, the IP is inserted in a cache of
'fp unwindable' IPs.
* subsequent unwinds of the same IP are then done directly,
either with fp unwind or with cfi, depending on the
cached result of the check done during the first unwind.
The cache is needed so as to avoid cfi unwinding as much as possible,
as it is significantly slower than fp unwinding.
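A sketch of the resulting per-IP dispatch (cache and helper names are
illustrative):
   switch (fp_CF_verif_cache_lookup(uregs.xip)) { /* hypothetical lookup */
      case FPUNWIND: /* fp chain matched CFI: use the fast fp unwind */
         unwind_one_frame_with_fp_chain(&uregs);
         break;
      case CFUNWIND: /* fp chain disagreed with CFI: must use CFI */
         VG_(use_CF_info)(&uregs, fp_min, fp_max);
         break;
      default:       /* first sight of this IP: do both, compare, cache */
         verify_fp_against_CFI_and_cache(&uregs);
         break;
   }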
git-svn-id: svn://svn.valgrind.org/valgrind/trunk@13280
In big applications, some recursive algorithms have created
hundreds of thousands of stacktraces, taking a lot of memory.
Option --merge-recursive-frames=<number> tells Valgrind to
detect and merge (collapse) recursive calls when recording stack traces.
The value is changeable using the monitor command
'v.set merge-recursive-frames'.
Also, this provides a new client request, VALGRIND_MONITOR_COMMAND,
allowing a gdbsrv monitor command to be executed from the client
program.
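For example, a client program can change the setting at runtime
(a sketch; the macro's return value indicates whether the command
was recognised):
   #include <valgrind/valgrind.h>

   /* ask gdbsrv to merge up to 3 recursive frames from now on,
      equivalent to starting with --merge-recursive-frames=3 */
   VALGRIND_MONITOR_COMMAND("v.set merge-recursive-frames 3");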
git-svn-id: svn://svn.valgrind.org/valgrind/trunk@13246
If VG_(use_CF_info) fails to find the next frame using the loaded debug
symbols, it will still change the data in uregs. Thus, we need to save a
copy of uregs before calling VG_(use_CF_info), and restore uregs if the
call returns wrong data.
This fixes drd/tests/tc04_free_lock on MIPS.
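A sketch of the fix (looks_bogus is a hypothetical validity check on the
registers returned by the CFI unwind):
   D3UnwindRegs uregs_copy = uregs;  /* save before the CFI attempt */
   if (!VG_(use_CF_info)(&uregs, fp_min, fp_max) || looks_bogus(&uregs))
      uregs = uregs_copy;            /* restore the clobbered registers */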
git-svn-id: svn://svn.valgrind.org/valgrind/trunk@12962