The test adds a block of xori instructions for use with the PC-relative tests.
The registers used by the xori instructions need to be saved and restored;
otherwise the register changes can affect the execution of the for loops
in the test, because registers are randomly changed. The issue occurs when
GCC is optimizing and inlining the test functions.
Some Linux distributions create a symlink from a custom directory such as
lib64xxx to the standard system lib64 directory.
https://bugs.kde.org/show_bug.cgi?id=467839
Contributed-by: JojoR <rjiejie@gmail.com>
Executing vgdb --multi makes vgdb talk the gdb extended-remote
protocol. This means that the gdb run command is supported and
vgdb will start up the program under valgrind, so you don't need
to run gdb and valgrind from different terminals. vgdb also stays
connected to gdb after valgrind exits, so you can easily rerun the
program with the same breakpoints in place.
vgdb now implements a minimal gdbserver that just recognizes
a few extended-remote protocol packets. Once it starts up valgrind,
it sets up noack and qSupported, then forwards packets between gdb
and the valgrind gdbserver. After valgrind shuts down, it resumes
handling gdb packets itself.
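For readers unfamiliar with the packets involved, here is a minimal Python sketch of how gdb remote protocol packets are framed (`$payload#checksum`, where the checksum is the sum of the payload bytes modulo 256, written as two hex digits). This is an illustration only, not vgdb's code (vgdb is written in C): the function names `frame_packet` and `parse_packet` are hypothetical, and the sketch ignores escaping, run-length encoding and acknowledgements.

```python
def frame_packet(payload: str) -> bytes:
    """Wrap a payload as $<payload>#<two-hex-digit checksum>."""
    data = payload.encode("ascii")
    checksum = sum(data) % 256  # modulo-256 sum of the payload bytes
    return b"$" + data + b"#" + f"{checksum:02x}".encode("ascii")


def parse_packet(raw: bytes) -> str:
    """Check the framing and checksum of one packet and return its payload."""
    if len(raw) < 4 or not raw.startswith(b"$") or raw[-3:-2] != b"#":
        raise ValueError("malformed packet")
    data = raw[1:-3]
    received = int(raw[-2:], 16)
    if sum(data) % 256 != received:
        raise ValueError("bad checksum")
    return data.decode("ascii")


if __name__ == "__main__":
    print(frame_packet("OK"))
    print(parse_packet(frame_packet("QStartNoAckMode")))
```

A forwarder in the style described above would only need to understand this framing for the few packets it answers itself; everything else can be passed through unchanged.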
https://bugs.kde.org/show_bug.cgi?id=434057
Co-authored-by: Mark Wielaard <mark@klomp.org>
Most notably, the "Function summary" section, which printed one CC for each
`file:function` combination, has been replaced by two sections, "File:function
summary" and "Function:file summary".
These new sections both feature "deep CCs", which have an "outer CC" for the
file (or function), and one or more "inner CCs" for the paired functions (or
files).
Here is a file:function example, which helps show which files have a lot of
events, even if those events are spread across a lot of functions.
```
> 12,427,830 (5.4%, 26.3%) /home/njn/moz/gecko-dev/js/src/ds/LifoAlloc.h:
6,107,862 (2.7%) js::frontend::ParseNodeVerifier::visit(js::frontend::ParseNode*)
3,685,203 (1.6%) js::detail::BumpChunk::setBump(unsigned char*)
1,640,591 (0.7%) js::LifoAlloc::alloc(unsigned long)
711,008 (0.3%) js::detail::BumpChunk::assertInvariants()
```
And here is a function:file example, which shows how heavy inlining can result
in a machine code function being derived from source code from multiple files:
```
> 1,343,736 (0.6%, 35.6%) js::gc::TenuredCell::isMarkedGray() const:
651,108 (0.3%) /home/njn/moz/gecko-dev/js/src/d64/dist/include/js/HeapAPI.h
292,672 (0.1%) /home/njn/moz/gecko-dev/js/src/gc/Cell.h
254,854 (0.1%) /home/njn/moz/gecko-dev/js/src/gc/Heap.h
```
Previously these patterns were very hard to find, and it was easy to overlook a
hot piece of code because its counts were spread across multiple non-adjacent
entries. I have already found these changes very useful for profiling Rust
code.
Also, cumulative percentages on the outer CCs (e.g. the 26.3% and 35.6% in the
examples above) tell you what fraction of all events are covered by the entries
so far, something I've wanted for a long time.
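As a rough illustration of the deep CC idea and the cumulative percentages, here is a small Python sketch. It is not the code in `cg_annotate`: the function name `file_function_summary`, the single-event counts, and the tuple layout are all assumptions made for the example.

```python
from collections import defaultdict


def file_function_summary(counts):
    """Group (file, function) -> count into outer (file) and inner (function) entries.

    Returns a list of (file, outer_count, percent, cumulative_percent, inner),
    sorted by outer_count descending, where inner is a list of
    (function, count) pairs sorted the same way.
    """
    total = sum(counts.values())
    by_file = defaultdict(dict)
    for (file, func), n in counts.items():
        by_file[file][func] = n

    rows = []
    running = 0  # events covered by the entries so far
    outer_sorted = sorted(
        by_file.items(), key=lambda kv: sum(kv[1].values()), reverse=True
    )
    for file, funcs in outer_sorted:
        outer = sum(funcs.values())
        running += outer
        percent = 100 * outer / total if total else 0.0
        cumulative = 100 * running / total if total else 0.0
        inner = sorted(funcs.items(), key=lambda kv: kv[1], reverse=True)
        rows.append((file, outer, percent, cumulative, inner))
    return rows


if __name__ == "__main__":
    example = {("a.h", "f"): 60, ("a.h", "g"): 20, ("b.c", "f"): 20}
    for row in file_function_summary(example):
        print(row)
```

Swapping the key order to `(function, file)` gives the corresponding function:file view.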
Some other, related changes:
- Column event headers are now padded with `_`, e.g. `Ir__________`. This makes
the column/event mapping clearer.
- The "Cachegrind profile" section is now called "Metadata", which is
shorter and clearer.
- A few minor test tweaks, beyond those required for the output changes.
- I converted some doc comments to normal comments. Not standard Python, but
nicer to read, and there are no public APIs here.
- Roughly 2x speedups to `cg_annotate` and smaller improvements for `cg_diff`
  and `cg_merge`, due to the following.
  - Change the `Cc` class to a type alias for `list[int]`, to avoid the class
    overhead (sigh).
  - Process event count lines in a single split, instead of a regex
    match + split.
  - Add the `add_cc_to_ccs` function, which does multiple CC additions in a
    single function call.
  - Better handling of dicts while reading input, minimizing lookups.
  - Pre-computing the missing CC string for each CcPrinter, instead of
    regenerating it each time.
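
The first three speedups can be sketched roughly as follows. This is an illustration under assumptions, not the actual code: the signature of `add_cc_to_ccs`, the helper name `parse_count_line`, and the zero-padding of omitted trailing counts are guesses made for the example.

```python
# A CC is just a list of event counts; no wrapper class is needed.
Cc = list[int]


def parse_count_line(line: str, n_events: int) -> tuple[int, Cc]:
    """Parse '<line_num> <count> <count> ...' using a single split.

    Assumption: counts omitted at the end of the line are treated as zero.
    """
    fields = line.split()
    line_num = int(fields[0])
    cc = [int(x) for x in fields[1:]]
    cc.extend([0] * (n_events - len(cc)))
    return line_num, cc


def add_cc_to_ccs(a_cc: Cc, *ccs: Cc) -> None:
    """Add a_cc into each CC in ccs, in place, with one function call."""
    for i, n in enumerate(a_cc):
        for cc in ccs:
            cc[i] += n


if __name__ == "__main__":
    _, line_cc = parse_count_line("12 3 4", 3)
    total_cc, file_cc = [0, 0, 0], [1, 1, 1]
    add_cc_to_ccs(line_cc, total_cc, file_cc)
    print(total_cc, file_cc)
```

Updating several accumulators per call trades a little generality for fewer Python-level function calls, which is where much of the per-line cost tends to be.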
All for clang, and mostly Apple clang.
There are still numerous deprecation warnings on macOS 10.13
(sem* functions, syscall, sbrk, i386, PIE, OSSpinLock, swapcontext, getcontext).
- Move it to `auxprogs/`, alongside `pybuild.sh`.
- Disable the annoying design lints, instead of just modifying the
values (which often requires modifying them again later).
Provide the user with a hint of what caused an out of memory error.
Also explain that some memory policies, such as SELinux deny_execmem,
might cause Permission denied errors.
Add an err argument to out_of_memory_NORETURN, and change
am_shadow_alloc to return a SysRes (all three callers were already
checking for errors and calling out_of_memory_NORETURN).
Users shouldn't ever see this, but it's useful to distinguish this
malformed data file case from the missing symbol case (which is still
shown as `???`).
`cg_merge` is currently written in C, but `cg_annotate` and `cg_diff` are
written in Python. It's better to have them all in the same language.
The good news is that the Python code is 4.5x shorter than the C code.
The bad news is that the Python code is roughly 3x slower than the C
code. But `cg_merge` isn't used that often, so I think it's a reasonable
trade-off.
For all the same reasons I rewrote `cg_annotate` in Python.
The commit also moves the Python "build" steps into
`auxprogs/pybuild.sh`, for easy sharing.
Finally, it very slightly tweaks the whitespace in the output of
`cg_annotate`.