The name is not necessarily found in the abstract origin, it can be
in a referred to specification.
If both a name and a DW_AT_specification is found in the abstract origin,
the name will have priority over the name of the specification.
(unclear if that can happen)
git-svn-id: svn://svn.valgrind.org/valgrind/trunk@14076
the inlined info of a big executable.
On a slow pentium, reading the inline info now takes 5.5 seconds.
The optimisation consists in having per dw3 abbreviation a structure
allowing to skip efficiently the non interesting DIEs (i.e. the DIEs
the parse_inl_DIE is not interested in).
Mostly, the idea is to avoid calling the image abstraction, and replace
this by just advancing the cursor (i.e. addition rather than a bunch
of function calls to read the data).
git-svn-id: svn://svn.valgrind.org/valgrind/trunk@14075
* add a trace_DIE function
* use it to trace a bad DIE
and to trace all DIEs that are (maybe) read
(due to the "avoid read twice" optimisation, the tracing was not
so easy to read anymore => add an explicit trace_DIE call at the beginning
of read_DIE)
git-svn-id: svn://svn.valgrind.org/valgrind/trunk@14050
by a DIE parser
Instead of pre-reading the DIE, first let the parser(s) possibly
parse the DIE. Read (to skip) the DIE data if no parser has parsed it.
OTherwise, just jump to the end of the DIE as established by the parser
that has read the DIE.
This slightly improves the reading of inlined info.
git-svn-id: svn://svn.valgrind.org/valgrind/trunk@14049
Wrong place for the assertion for the inlparser
+ move the "zero the parsers" out of the "if VG_(clo*)" conditions
git-svn-id: svn://svn.valgrind.org/valgrind/trunk@14044
of DIEs when one or more parsers will read them also)
+ add the name of the parser in the barf output.
git-svn-id: svn://svn.valgrind.org/valgrind/trunk@14041
showing inlined function calls.
See 278972 valgrind stacktraces and suppression do not handle inlined function call debuginfo
Reading the inlined dwarf call info is activated using the new clo
--read-inline-info=yes
Default is currently no but an objective is to optimise the performance
and memory in order to possibly set it on by default.
(see below discussion about performances).
Basically, the patch provides the following pieces:
1. Implement a new dwarf3 reader that reads the inlined call info
2. Some performance improvements done for this new parser, and
on some common code between the new parser and the var info parser.
3. Use the parsed inlined info to produce stacktrace showing inlined calls
4. Use the parsed inlined info in the suppression matching and suppression generation
5. and of course, some reg tests
1. new dwarf3 reader:
---------------------
Two options were possible: add the reading of the inlined info
in the current var info dwarf reader, or add a 2nd reader.
The 2nd approach was preferred, for the following reasons:
The var info reader is slow, memory hungry and quite complex.
Having a separate parsing phase for the inlined information
is simpler/faster when just reading the inlined info.
Possibly, a single parser would be faster when using both
--read-var-info=yes and --read-inline-info=yes.
However, var-info being extremely memory/cpu hungry, it is unlikely
to be used often, and having a separate parsing for inlined info
does in any case make not much difference.
(--read-var-info=yes is also now less interesting thanks to commit
r13991, which provides a fast and low memory "reasonable" location
for an address).
The inlined info parser reads the dwarf info to make calls
to priv_storage.h ML_(addInlInfo).
2. performance optimisations
----------------------------
* the abbrev cache has been improved in revision r14035.
* The new parser skips the non interesting DIEs
(the var-info parser has no logic to skip uninteresting DIEs).
* Some other minor perf optimisation here and there.
In total now, on a big executable, 15 seconds CPU are needed to
create the inlined info (on my slow x86 pentium).
With regards to memory, the dinfo arena:
with inlined info: 172281856/121085952 max/curr mmap'd
without : 157892608/106721280 max/curr mmap'd,
So, basically, inlined information costs about 15Mb of memory for
my big executable (compared to first version of the patch, this is
already using less memory, thanks to the strpool deduppoolalloc.
The needed memory can probably be decreased somewhat more.
3. produce better stack traces
------------------------------
VG_(describe_IP) has a new argument InlIPCursor *iipc which allows
to describe inlined function calls by doing repetitive calls
to describe_IP. See pub_tool_debuginfo.h for a description.
4. suppression generation and matching
--------------------------------------
* suppression generation now also uses an InlIPCursor *iipc
to generate a line for each inlined fn call.
* suppression matching: to allow suppression matching to
match one IP to several function calls in a suppression entry,
the 'inputCompleter' object (that allows to lazily generate
function or object names for a stacktrace when matching
an error with a suppression) has been generalised a little bit
more to also lazily generate the input sequence.
VG_(generic_match) has been updated so as to be more generic
with respect to the input completer : when providing an
input completer, VG_(generic_match) does not need anymore
to produce/compute any input itself : this is all delegated
to the input completer.
5. various regtests
-------------------
to test stack traces with inlined calls, and suppressions
of (some of) these errors using inlined fn calls matching.
Work still to do:
-----------------
* improve parsing performance
* improve the memory overhead.
* handling the directory name for files of the inlined function calls is not yet done.
(probably implies to refactor some code)
* see if m_errormgr.c *offsets arrays cannot be managed via xarray
git-svn-id: svn://svn.valgrind.org/valgrind/trunk@14036
For each DIE, the dwarf3 reader must know which data elements to read.
These elements are described by an abbreviation.
Re-reading these abbreviations for each DIE is costly as
the location of the needed abbreviation is found by scanning the full
abbv section, which is very costly.
(A small cache of 32 abbv offsets in the abbv section somewhat decreases
the cost, but reading the abbvs is still a hot spot, in particular for
big debug informations).
This patch:
* adds an hash table of parsed abbreviations
* all abbreviations for a CU are read in one single scan of the abbv
section, when the CU header is read
So, with the patch, the di image is not accessed anymore for reading the abbvs
after the CU header parsing.
On a big executable, --read-var-info=yes user cpu changes from
trunk: 320 seconds
to
abbv cache: 270 seconds
This further improves on a previous (not committed) abbv cache that
was just caching up to 513 entries in the abbv pos cache and populating
the cache with an initial scan. The user cpu for this version was 285 seconds.
NB: this is some work in anticipation of a following patch that
will add reading dwarf3 inlined information, with the hope to make
this reading fast enough to activate it by default.
Note: on the examples I looked at, all abbreviations were numbered starting
from 1, with no holes. If that would always be the case, then one could use
an xarray of parsed abbreviations rather than an hash table. However,
I found nothing in the dwarf standard that guarantees that abbreviations
are numbered from 1. So, the hash table.
git-svn-id: svn://svn.valgrind.org/valgrind/trunk@14035
It is possible that a debug info contains no string (and so strpool
is never allocated).
A protection to avoid accessing strpool was already necessary
in ML_(canonicaliseTables) :
if (di->strpool)
VG_(freezeDedupPA) (di->strpool);
So, if a similar debug info is released, we need the same protection
to avoid accessing a NULL strpool.
Detect by Julian on arm64, but not (at least easily) reproduced on amd64.
git-svn-id: svn://svn.valgrind.org/valgrind/trunk@14033
include/pub_tool_deduppoolalloc.h
coregrind/pub_core_deduppoolalloc.h
coregrind/m_deduppoolalloc.c
and uses it (currently only) for the strings in m_debuginfo/storage.c
The idea is that such ddup pool allocator will also be used for other
highly duplicated information (e.g. the DiCFSI information), where
significant gains can also be achieved.
The dedup pool for strings also decreases significantly the memory
needed by the read inline information (patch still to be committed,
see bug 278972).
When testing with a big executable (tacot_process),
this reduces the size of the dinfo arena from
trunk: 158941184/109760512 max/curr mmap'd, 156775944/107882728 max/curr,
to
ddup: 157892608/106614784 max/curr mmap'd, 156362160/101414712 max/curr
(so 3Mb less mmap-ed once debug info is read, 1Mb less mmap-ed in peak,
6Mb less allocated once debug info is read).
This is all gained due to the string which changes from:
trunk: 17,434,704 in 266: di.storage.addStr.1
to
ddup: 10,966,608 in 750: di.storage.addStr.1
(6.5Mb less memory used by strings)
The gain in mmap-ed memory is smaller due to fragmentation.
Probably one could decrease the fragmentation by using bigger
size for the dedup pool, but then we would lose memory on the last
allocated pool (and for small libraries, we often do not use much
of a big pool block).
Solution might be to increase the pool size but have a "shrink_block"
operation. To be looked at in the future.
In terms of performance, startup of a big executable (on an old pentium)
is not influenced significantly (something like 0.1 seconds on 15 seconds
startup for a big executable, on a slow pentium).
The dedup pool uses a hash table. The hash function used currently
is the VG_(adler32) check sum. It is reported (and visible also here)
that this checksum is not a very good hash function (many collisions).
To have statistics about collisions, use --stats -v -v -v
As an example of the collisions, on the strings in debug info of memcheck tool on x86,
one obtain:
--4789-- dedupPA:di.storage.addStr.1 9983 allocs (8174 uniq) 11 pools (4820 bytes free in last pool)
--4789-- nr occurences of chains of len N, N-plicated keys, N-plicated elts
--4789-- N: 0 : nr chain 6975, nr keys 0, nr elts 0
--4789-- N: 1 : nr chain 3670, nr keys 6410, nr elts 8174
--4789-- N: 2 : nr chain 1070, nr keys 226, nr elts 0
--4789-- N: 3 : nr chain 304, nr keys 100, nr elts 0
--4789-- N: 4 : nr chain 104, nr keys 84, nr elts 0
--4789-- N: 5 : nr chain 72, nr keys 42, nr elts 0
--4789-- N: 6 : nr chain 44, nr keys 34, nr elts 0
--4789-- N: 7 : nr chain 18, nr keys 13, nr elts 0
--4789-- N: 8 : nr chain 17, nr keys 8, nr elts 0
--4789-- N: 9 : nr chain 4, nr keys 6, nr elts 0
--4789-- N:10 : nr chain 9, nr keys 4, nr elts 0
--4789-- N:11 : nr chain 1, nr keys 0, nr elts 0
--4789-- N:13 : nr chain 1, nr keys 1, nr elts 0
--4789-- total nr of unique chains: 12289, keys 6928, elts 8174
which shows that on 8174 different strings, we have only 6410 strings which have
a unique hash value. As other examples, N:13 line shows we have 13 strings
mapping to the same key. N:14 line shows we have 4 groups of 10 strings mapping to the
same key, etc.
So, adler32 is definitely a bad hash function.
Trials have been done with another hash function, giving a much lower
collision rate. So, a better (but still fast) hash function would probably
be beneficial. To be looked at ...
git-svn-id: svn://svn.valgrind.org/valgrind/trunk@14029
-ffunction-sections -fdata-sections and the linker option
-Wl,--gc-sections, --read-var-info=yes gives the following:
valgrind: m_debuginfo/d3basics.c:973 (vgModuleLocal_evaluate_GX): Assertion 'aMax == ~(Addr)0' failed.
host stacktrace:
==18521== at 0x38057C54: show_sched_status_wrk (m_libcassert.c:308)
==18521== by 0x38057F50: report_and_quit (m_libcassert.c:367)
==18521== by 0x38058151: vgPlain_assert_fail (m_libcassert.c:432)
==18521== by 0x3813F084: vgModuleLocal_evaluate_GX (d3basics.c:973)
==18521== by 0x38098300: data_address_is_in_var (debuginfo.c:2769)
==18521== by 0x38099E26: vgPlain_get_data_description (debuginfo.c:3298)
...
The problem is that -Wl,--gc-sections eliminates the unused functions
but keeps some debug info for the functions or their compilation units.
The dwarf entry has low and high pc, but both are equal to 0.
The dwarf reader of Valgrind is confused by this, as the varstack becomes
empty, while it should not. This then causes local (eliminated) variables
to be put in the global scope, leading afterwards to evaluation errors
when describing any other variables.
The fix is to also push something on the varstack when
a CU that has low and high pc given but with 0 value.
This is similar to the varstack_push done for a CU that has
no low pc, no high pc and no range.
Despite considerable effort to make a small reproducer, the problem
could only be produced with a big executable.
After the fix, everything was working properly.
The wrong behaviour for dwarf entries produce the following trace:
<2><2ff291a>: Abbrev Number: 23 (DW_TAG_formal_parameter)
DW_AT_name : AET
DW_AT_decl_file : 1
DW_AT_decl_line : 243
DW_AT_type : <2ff2811>
DW_AT_location : 18288554
Recording this variable, with 1 PC range(s)
....
<2ff291a> addVar: level 0: AET :: EdgeTableEntry*
Loc=GX(final){[0x0,0x8]=50,[0x9,0x1d]=53,[0x1e,0x26]=51,[0x27,0x29]=53,[0x2a,0x2f]=51,[0x44,0x4a]=53,[0x4d,0x5e]=51,[0x5f,0x62]=53}
FrB=none
declared at: gdkpolyreg-generic.c:243
ACQUIRE for range(s) [0x0,0xffffffff]
The AET is a formal parameter of a function, but is wrongly added
at level 0, with a PC range covering the full space. It has a Loc GX
which uses non biased program counters (e.g. 0x0,0x8).
This dwarf entry will require a FrB (and registers when evaluating)
but no such things are available (or given) when evaluating a variable
in the global scope.
The fix is to handle compilation units with lo and hi pc == 0x0
similarly to a CU that has no lo and hi pc.
With this fix, valgrind --read-var-info=yes could properly
handle a big application with plenty of eliminated functions.
git-svn-id: svn://svn.valgrind.org/valgrind/trunk@13941
main(int argc)
{
typedef
struct {
int before_name;
char name[argc];
int after_name;
}
namet;
namet n;
}
compiled with gcc 4.7.4, the trunk --read-var-info=yes gives:
parse_type_DIE: confused by:
<2><51>: DW_TAG_structure_type
DW_AT_decl_file : 1
DW_AT_decl_line : 4
DW_AT_sibling : <83>
This is because that dwarf entry defines a struct with no size.
This happens when the struct has a VLA array in the middle
of a struct. This is a C gcc extension, and is a standard
feature of Ada.
The proper solution would be to have the size calculated at runtime,
using the gnat extensions or dwarf entries (to be generated by
the compiler).
The patch fixes this problem by defining the size of such structure
as 1 byte.
Another approach tried was to put the max possible size.
This had the disadvantage that any address on the stack was seen
as belonging to this variable.
This allows the description to work for the 1st byte of the variable
but cannot properly describe the 2nd and following bytes :
(gdb) p &n
$9 = (namet *) 0xbefbc070
(gdb) mo c d 0xbefbc070
Address 0xBEFBC070 len 1 not defined:
Uninitialised value at 0xBEFBC070
==1396== Location 0xbefbc070 is 0 bytes inside n.before_name,
==1396== declared at crec.c:10, in frame #0 of thread 1
(gdb) mo c d 0xbefbc071
Address 0xBEFBC071 len 1 not defined:
Uninitialised value at 0xBEFBC071
==1396== Address 0xbefbc071 is on thread 1's stack
(gdb)
A possible refinement would be to use a huge size but have the
logic of variable description understanding this and describing
all between this var and hte next var on the stack as being
in the VLA variable.
In the meantime, the size 1 avoids --read-var-info=yes to fail.
Also, the 'goto bad_DIE' have been replaced by a macro
goto_bad_DIE that ensures the line nr at which the bad DIE has
been detected is reported in the error msg.
This makes it easier to understand what is the problem.
git-svn-id: svn://svn.valgrind.org/valgrind/trunk@13938
have DW_AT_signature attribute. That wasn't the case in DWARF version 3.
From DWARF version 4:
If the complete declaration of a type has been placed in a separate type unit,
an incomplete declaration of that type in the compilation unit may provide the
unique 64-bit signature of the type using a DW_AT_signature attribute.
This patch adds an extra field in TyStOrUn structure (typeR). This field is
reference to other TyEnt that is placed in separate type unit. Because of the new
field in TyStOrUn structure we need to add an extra case in parse_type_DIE
that will put the right reference to other TyEnt and an extra case in
ML_(describe_type) that will describe type when the ty->Te.TyStOrUn.typeR field
is used.
This patch is resolving the problem with memcheck/tests/dw4 test when it's
compiled with compiler that will emit DW_AT_signature under the DW_TAG_structure_type.
git-svn-id: svn://svn.valgrind.org/valgrind/trunk@13891
We already accepted DW_TAG_typedef without a name for Ada. But g++ for
OpenMP can also emit such nameless DW_TAG_typedefs. Just accept them.
Also fix up anonymous enum and typedef printing in tytypes.c.
git-svn-id: svn://svn.valgrind.org/valgrind/trunk@13718
Bug #327837. The buildid from the .gnu_debugaltlink section was parsed
incorrectly (from the wrong offset). Causing the debug alt file not to
be found.
git-svn-id: svn://svn.valgrind.org/valgrind/trunk@13715
GCC allows incomplete enums as GNU extension.
http://gcc.gnu.org/onlinedocs/gcc/Incomplete-Enums.html
These are marked as DW_AT_declaration and won't have a size.
They can only be used in declaration or as pointer types.
You can't allocate variables or storage using such an enum type.
So don't require a size for such enum types.
git-svn-id: svn://svn.valgrind.org/valgrind/trunk@13433
exceeds the allowable range. With this change, it should be
essentially impossible to crash V by feeding it invalid ELF or Dwarf.
git-svn-id: svn://svn.valgrind.org/valgrind/branches/DISRV@13432
* add a new flag --allow-mismatched-debuginfo to override the
CRC32/build-id checks, if needed
* tidy up logic for finding files on the --extra-debuginfo-path
and at the --debuginfo-server
* don't assert if connection to the debuginfo server is lost;
instead print a reasonable message and quit.
git-svn-id: svn://svn.valgrind.org/valgrind/branches/DISRV@13431
* Addition of a function to compute size of buffer needed for VG_(mkstemp)
* Use it to dimension buffers for all VG_(mkstemp) calls.
git-svn-id: svn://svn.valgrind.org/valgrind/trunk@13409
Reading header length and values in external line info was incorrect at
some places as it used offsets based on dw64 that came from .debug_info.
Instead, offsets should be calculated based on is64 from .debug_line.
This issue surfaced in MIPS64 port, and it was discussed at:
https://bugs.kde.org/show_bug.cgi?id=313267#c20
git-svn-id: svn://svn.valgrind.org/valgrind/trunk@13373
Bug #305513. We should only read the first DIE of a compilation unit.
Each compilation unit header is followed by a single DW_TAG_compile_unit
(or DW_TAG_partial_unit, but those aren't important here) and its children.
There is no reason to read any of the children at this point. If the first
DIE isn't a DW_TAG_compile_unit we are done, none of the child DIEs will
provide any useful information.
git-svn-id: svn://svn.valgrind.org/valgrind/trunk@13369
Bug #305513 contained a patch for some extra robustness checks. But
the real cause of crashing in the read_unitinfo_dwarf2 DWARF reader
seemed to have been this issue where DWARF version 2 DWZ partial_units
were read and DW_FORM_ref_addr had an unexpected size. This combination
is rare. DWARF version 4 is the current default version of GCC.
git-svn-id: svn://svn.valgrind.org/valgrind/trunk@13367
Necessary changes to Valgrind to support MIPS64LE on Linux.
Minor cleanup/style changes embedded in the patch as well.
The change corresponds to r2687 in VEX.
Patch written by Dejan Jevtic and Petar Jovanovic.
More information about this issue:
https://bugs.kde.org/show_bug.cgi?id=313267
git-svn-id: svn://svn.valgrind.org/valgrind/trunk@13292
* other platforms (e.g. amd64) are first trying to unwind
with cfi info, then with the fp chain.
* fp unwind when code is compiled without frame pointer can
fail and give incomplete stack traces (often terminating
with a random program counter, causing a huge amount of
recorded stack traces).
This patch improves unwinding on x86 by:
* first time an IP is unwound, do the unwind both with
CFI technique and with fp technique.
If results are identical, IP is inserted in a cache of
'fp unwindable' IP
* following unwind of the same IP are then done directly
either with fp unwind or with cfi, depending on the
cached result of the check done during first unwind.
The cache is needed so as to avoid as much as possible cfi unwind,
as this is significantly slower than fp unwind.
git-svn-id: svn://svn.valgrind.org/valgrind/trunk@13280
This change is based on rumours/legends/oral transmission of experience/...
that prime nrs are good to use for hash table size :).
If someone has a (short) explanation about why this is useful,
that will be welcome.
git-svn-id: svn://svn.valgrind.org/valgrind/trunk@13237