For all the changes I've made recently. And also various other changes
that occurred over the past 20 years that didn't previously make it into
the docs.
Also, this change de-emphasises the cache and branch simulation aspect,
because they're no longer that useful. Instead it emphasises the
precision and reproducibility of instruction count profiling.
By not configuring the caches in that case. This requires moving a few
assertions around, because they currently assume that the caches are
configured.
And deprecate the use of `cg_diff` and `cg_merge`.
Because `cg_annotate` can do a better job, even annotating source files
when doing diffs in some cases.
The user requests merging by passing multiple cgout files to
`cg_annotate`, and diffing by passing two cgout files to `cg_annotate`
along with `--diff`.
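Conceptually, merging adds the CCs of matching entries across cgout files, and diffing subtracts one file's CCs from the other's. The following is a minimal Python sketch of that idea, assuming a hypothetical in-memory form of a cgout file (a dict from `(file, function, line)` to a list of event counts). It is not the actual `cg_annotate` code, and dropping unchanged entries in `diff` is a choice of this sketch only.

```python
from collections import defaultdict

# Hypothetical in-memory form of one cgout file: maps (file, function, line)
# to a CC, i.e. a list of event counts in a fixed event order such as [Ir].
Cc = list[int]
Profile = dict[tuple[str, str, int], Cc]


def merge(profiles: list[Profile], num_events: int) -> Profile:
    """Sum the CCs of several profiles, key by key."""
    merged: Profile = defaultdict(lambda: [0] * num_events)
    for profile in profiles:
        for key, cc in profile.items():
            acc = merged[key]
            for i, count in enumerate(cc):
                acc[i] += count
    return dict(merged)


def diff(before: Profile, after: Profile, num_events: int) -> Profile:
    """Subtract `before` from `after`; negative counts mean fewer events."""
    result: Profile = {}
    for key in before.keys() | after.keys():
        a = after.get(key, [0] * num_events)
        b = before.get(key, [0] * num_events)
        delta = [x - y for x, y in zip(a, b)]
        if any(delta):  # this sketch drops entries that did not change
            result[key] = delta
    return result
```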
Most notably, the "Function summary" section, which printed one CC for each
`file:function` combination, has been replaced by two sections, "File:function
summary" and "Function:file summary".
These new sections both feature "deep CCs", which have an "outer CC" for the
file (or function), and one or more "inner CCs" for the paired functions (or
files).
Here is a file:function example, which helps show which files have a lot of
events, even if those events are spread across a lot of functions.
```
> 12,427,830 (5.4%, 26.3%) /home/njn/moz/gecko-dev/js/src/ds/LifoAlloc.h:
6,107,862 (2.7%) js::frontend::ParseNodeVerifier::visit(js::frontend::ParseNode*)
3,685,203 (1.6%) js::detail::BumpChunk::setBump(unsigned char*)
1,640,591 (0.7%) js::LifoAlloc::alloc(unsigned long)
711,008 (0.3%) js::detail::BumpChunk::assertInvariants()
```
And here is a function:file example, which shows how heavy inlining can result
in a machine code function being derived from source code from multiple files:
```
> 1,343,736 (0.6%, 35.6%) js::gc::TenuredCell::isMarkedGray() const:
651,108 (0.3%) /home/njn/moz/gecko-dev/js/src/d64/dist/include/js/HeapAPI.h
292,672 (0.1%) /home/njn/moz/gecko-dev/js/src/gc/Cell.h
254,854 (0.1%) /home/njn/moz/gecko-dev/js/src/gc/Heap.h
```
Previously these patterns were very hard to find, and it was easy to overlook a
hot piece of code because its counts were spread across multiple non-adjacent
entries. I have already found these changes very useful for profiling Rust
code.
Also, cumulative percentages on the outer CCs (e.g. the 26.3% and 35.6% in the
example) tell you what fraction of all events are covered by the entries so
far, something I've wanted for a long time.
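The grouping behind a file:function summary, including the cumulative percentage on each outer CC, can be sketched as follows. This is a minimal illustration assuming flat single-event input (a dict from `(file, function)` to a count); the function name and the exact line layout are hypothetical and not taken from `cg_annotate`.

```python
from collections import defaultdict


def file_function_summary(counts: dict[tuple[str, str], int]) -> list[str]:
    """Build file:function summary lines from flat (file, function) -> count data.

    Each file gets an outer entry (its total, its share of all events, and
    the cumulative share so far), followed by inner entries for its
    functions, largest first. Assumes the grand total is non-zero.
    """
    total = sum(counts.values())
    by_file: dict[str, dict[str, int]] = defaultdict(dict)
    for (file, func), n in counts.items():
        by_file[file][func] = n

    # Sort files by their outer (summed) count, largest first.
    outer = sorted(by_file.items(), key=lambda kv: sum(kv[1].values()), reverse=True)

    lines = []
    cumulative = 0
    for file, funcs in outer:
        file_total = sum(funcs.values())
        cumulative += file_total
        lines.append(
            f"> {file_total:,} ({file_total / total:.1%}, {cumulative / total:.1%}) {file}:"
        )
        for func, n in sorted(funcs.items(), key=lambda kv: kv[1], reverse=True):
            lines.append(f"  {n:,} ({n / total:.1%}) {func}")
    return lines
```

The function:file view falls out of the same code by swapping the key order first, e.g. `file_function_summary({(fn, f): n for (f, fn), n in counts.items()})`.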
Some other, related changes:
- Column event headers are now padded with `_`, e.g. `Ir__________`. This makes
the column/event mapping clearer.
- The "Cachegrind profile" section is now called "Metadata", which is
shorter and clearer.
- A few minor test tweaks, beyond those required for the output changes.
- I converted some doc comments to normal comments. Not standard Python, but
nicer to read, and there are no public APIs here.
- Roughly 2x speedups to `cg_annotate` and smaller improvements for `cg_diff`
and `cg_merge`, due to the following.
- Change the `Cc` class to a type alias for `list[int]`, to avoid the class
overhead (sigh).
- Process event count lines in a single split, instead of a regex
match + split.
- Add the `add_cc_to_ccs` function, which does multiple CC additions in a
single function call.
- Handle dicts better while reading input, minimizing lookups.
- Precompute the missing-CC string for each `CcPrinter`, instead of
regenerating it each time.
- Move it to `auxprogs/`, alongside `pybuild.sh`.
- Disable the annoying design lints, instead of just modifying the
values (which often requires modifying them again later).
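The `Cc` type alias and the `add_cc_to_ccs` function mentioned in the speedup list above might look roughly like this. It is a minimal sketch: the signature of `add_cc_to_ccs` here (one source CC added in place into any number of destination CCs) is an assumption, not necessarily the real one.

```python
# A CC is just a list of event counts (e.g. [Ir, I1mr, ILmr]). A plain type
# alias avoids the per-object overhead a small wrapper class would add on
# hot paths.
Cc = list[int]


def add_cc_to_ccs(a_cc: Cc, *b_ccs: Cc) -> None:
    """Add a_cc in place into every CC in b_ccs (hypothetical signature).

    For example, one parsed line's CC can be added to the line, function,
    file and total CCs with a single call instead of several.
    """
    for b_cc in b_ccs:
        for i, a in enumerate(a_cc):
            b_cc[i] += a
```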
Users shouldn't ever see this, but it's useful to distinguish this
malformed data file case from the missing symbol case (which is still
shown as `???`).
It's currently written in C, but `cg_annotate` and `cg_diff` are written in
Python. It's better to have them all in the same language.
The good news is that the Python code is 4.5x shorter than the C code.
The bad news is that the Python code is roughly 3x slower than the C
code. But `cg_merge` isn't used that often, so I think it's a reasonable
trade-off.
For all the same reasons I rewrote `cg_annotate` in Python.
The commit also moves the Python "build" steps into
`auxprogs/pybuild.sh`, for easy sharing.
Finally, it very slightly tweaks the whitespace in the output of
`cg_annotate`.
- Every section now has a heading with the long `----` lines above and
below.
- Event names are always shown below that heading, rather than within
it.
- Each unreadable file now gets its own section, much like files that
lack any data.
Currently their width is mostly hard-wired in a quick and dirty fashion.
This commit does them properly, so:
- all columns are always the right width, even ones with really large
percentages
- things like `( 1.00%)` are now `(1.00%)`
- any percentages that would involve a division by zero now show as
`(n/a)` rather than `( 0.00%)`
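Computing column widths from the data, rather than hard-wiring them, can be sketched like this. The helper names are hypothetical and the exact spacing of the real output may differ; the point is that widths come from the widest entry and that a zero total yields `(n/a)`.

```python
def format_perc(count: int, total: int) -> str:
    """Format count as a percentage of total, e.g. '(1.00%)'.

    A zero total is reported as '(n/a)' instead of a misleading 0.00%.
    """
    if total == 0:
        return "(n/a)"
    return f"({count * 100 / total:.2f}%)"


def format_column(counts: list[int], total: int) -> list[str]:
    """Right-align counts and percentages to the widest entry in the column,
    so even very large percentages never overflow a fixed width."""
    counts_s = [f"{c:,}" for c in counts]
    percs_s = [format_perc(c, total) for c in counts]
    cw = max(map(len, counts_s))
    pw = max(map(len, percs_s))
    return [f"{c:>{cw}} {p:>{pw}}" for c, p in zip(counts_s, percs_s)]
```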
Perl was a reasonable choice for `cg_annotate` in 2002, but not in 2023.
Also, the existing structure of the code is not good. These two things
make it hard to modify `cg_annotate` in any significant way.
Benefits of the change:
- Now written in a language that is (a) nice, and (b) not moribund.
- Easier to maintain, due to (a) the above-mentioned better language, (b)
better code structure, and (c) better language tooling, such as
formatters, type checkers, and linters.
- The new version is a little shorter.
- It runs about 2x faster.
- Argument handling is more standard. E.g. things like `--context 2`,
`--auto`, `--no-auto` are supported. (The old forms that require `=`
are still supported, though the `=yes`/`=no` forms are deprecated.)
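Standard argument handling of this kind is what Python's `argparse` provides out of the box. The sketch below is an illustration only: the defaults are arbitrary, and `normalize_legacy` is a hypothetical shim showing one way the deprecated `=yes`/`=no` forms could keep working, not how `cg_annotate` actually does it.

```python
import argparse

parser = argparse.ArgumentParser(prog="cg_annotate-sketch")
# argparse accepts both `--context 2` and `--context=2`. Default is arbitrary.
parser.add_argument("--context", type=int, default=8, metavar="N")
# BooleanOptionalAction (Python 3.9+) generates both --auto and --no-auto.
parser.add_argument("--auto", action=argparse.BooleanOptionalAction, default=True)
parser.add_argument("cgout_files", nargs="+")


def normalize_legacy(argv: list[str]) -> list[str]:
    """Hypothetical shim: map deprecated --auto=yes/--auto=no to new flags."""
    mapping = {"--auto=yes": "--auto", "--auto=no": "--no-auto"}
    return [mapping.get(arg, arg) for arg in argv]
```

Usage: `parser.parse_args(normalize_legacy(sys.argv[1:]))`.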
The behaviour and output of the new version are identical for typical
uses, but there are some very minor changes for edge cases, which nobody
is likely to notice. For example:
- The file format is slightly changed: I removed support for '.'
counts, which had the same meaning as '0'. This was a feature that
Cachegrind never used, and the old script handled it inconsistently.
- The new version will abort on a malformed data line. The old version
would just print a warning and continue.
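Parsing an event count line with a single split, and aborting on anything malformed (including the old `.` counts), can be sketched as below. The function names are hypothetical, and treating missing trailing counts as zero is an assumption of this sketch.

```python
import sys


def parse_count_line(line: str, num_events: int) -> tuple[int, list[int]]:
    """Parse '<line_num> <count> <count> ...' with a single split.

    Missing trailing counts are assumed to be 0. Anything that is not an
    integer, including the old '.' shorthand for 0, raises ValueError.
    """
    fields = line.split()
    if not fields or len(fields) > num_events + 1:
        raise ValueError(f"malformed data line: {line!r}")
    try:
        line_num, *counts = map(int, fields)
    except ValueError:
        raise ValueError(f"malformed data line: {line!r}") from None
    counts.extend([0] * (num_events - len(counts)))
    return line_num, counts


def parse_or_die(line: str, num_events: int) -> tuple[int, list[int]]:
    """Abort, rather than warn and continue, on a malformed line."""
    try:
        return parse_count_line(line, num_events)
    except ValueError as e:
        sys.exit(f"error: {e}")
```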
The commit also adds a new test `ann3` that tests many parts of
`cg_annotate` that weren't tested previously, and tweaks the existing
`ann2` test.
Both a.c and cgout-test are checked into the repository and
used in testcases. Make sure cgout-test is newer than a.c
before running the post script to prevent warnings like:
@@ WARNING @@ WARNING @@ WARNING @@ WARNING @@ WARNING @@ WARNING @@
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@ Source file 'a.c' is more recent than input file '../../cachegrind/tests/cgout-test'.
@ Annotations may not be correct.
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
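The warning is presumably triggered by comparing modification times, so one way to make sure the checked-in output is newer than its source is to bump its mtime, much like `touch` would. A minimal sketch; the helper name `ensure_newer` is hypothetical.

```python
import os
import time


def ensure_newer(output: str, source: str) -> None:
    """Bump output's mtime so it is strictly newer than source's."""
    src_mtime = os.stat(source).st_mtime
    if os.stat(output).st_mtime <= src_mtime:
        new = max(time.time(), src_mtime + 1)
        os.utime(output, (new, new))
```

Usage: `ensure_newer("cgout-test", "a.c")`.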
When running `cg_annotate` on files produced with `cg_diff`, it's common
to get multiple occurrences of this pair of errors:
```
Use of uninitialized value $pairs[0] in numeric lt (<) at
/home/njn/grind/ws1/cachegrind/cg_annotate line 848.
Use of uninitialized value $high in numeric lt (<) at
/home/njn/grind/ws1/cachegrind/cg_annotate line 859.
```
This is because `cg_annotate` wasn't properly handling the case where no
source code lines have annotations, which never happens in the normal
case but does happen in `cg_diff` output.
Happily, it turns out that the warnings were harmless, the fix is
trivial, and it doesn't change the output at all.
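The general fix is to make code that works on "the annotated lines of a file" cope with there being none. As a hypothetical illustration (not the actual Perl code), here is a helper that merges context windows around annotated line numbers and returns an empty result for a file with no annotated lines, instead of indexing a first element that does not exist.

```python
def context_ranges(annotated_lines: list[int], context: int) -> list[tuple[int, int]]:
    """Merge [n - context, n + context] windows around annotated line numbers.

    cg_diff output can contain source files with no annotated lines, so an
    empty input must yield an empty result.
    """
    if not annotated_lines:  # explicit early return for clarity
        return []
    ranges: list[tuple[int, int]] = []
    for n in sorted(annotated_lines):
        lo, hi = max(1, n - context), n + context
        if ranges and lo <= ranges[-1][1] + 1:
            # Overlapping or adjacent window: extend the previous range.
            ranges[-1] = (ranges[-1][0], max(ranges[-1][1], hi))
        else:
            ranges.append((lo, hi))
    return ranges
```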
Rust v0 symbols can have `#` chars in them, things like this:
```
core::panic::unwind_safe::AssertUnwindSafe<<proc_macro::bridge::server::Dispatcher<proc_macro::bridge::server::MarkedTypes<rustc_expand::proc_macro_server::Rustc>> as proc_macro::bridge::server::DispatcherTrait>::dispatch::{closure#14}>, ()>
```
`cg_diff` currently messes these up in two ways.
- It treats anything after a `#` in the input file as a comment. In
comparison, `cg_annotate` only treats a `#` as starting a comment at
the start of a line.
- It uses `#` to temporarily join file names and function names while
processing.
This commit adjusts the parsing to fix the first problem, and changes
the joiner sequence to `###` to fix the second problem.
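Joining a file name and a function name with a separator only round-trips if the separator cannot appear where the split looks for it. A minimal Python sketch of the `###` joiner (helper names hypothetical), splitting at the first occurrence so that `#` and even `###` inside the function name survive, on the assumption that file names never contain `###`:

```python
JOINER = "###"  # a single "#" collides with Rust names like {closure#14}


def join_key(file: str, func: str) -> str:
    return f"{file}{JOINER}{func}"


def split_key(key: str) -> tuple[str, str]:
    # Split at the first joiner only: assumes file names never contain "###".
    file, func = key.split(JOINER, 1)
    return file, func
```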
Files in the root directory
Several Makefile.am files that have dependencies on FreeBSD autoconf
variables. Included a few new filter files to act as placeholders
to create new freebsd subdirectories.
Updated NEWS with the FreeBSD bugzilla items plus a couple of other
items fixed indirectly.
manpages-index.xml exists just to make it easy to get at each individual
man page with xsltproc. It wasn't a complete docbookx xml file. Now that
it is, we can validate it with xmllint. It doesn't fully validate, but we
are close.
This makes the rule for xmllint simpler, since it doesn't need to
override the DTD to validate against. It also helps other tools
trying to process the docbookx xml files.
Necessary changes to support nanoMIPS on Linux.
Part 3/4 - Coregrind and tools changes
Patch by Aleksandar Rikalo, Dimitrije Nikolic, Tamara Vlahovic,
Nikola Milutinovic and Aleksandra Karadzic.
Related KDE issue: #400872.
C++ function names can contain substrings like "{lambda()#1}". But
callgrind_annotate and cg_annotate interpret the '#'-character as a
comment marker anywhere on each input line, and thus truncate such names
there.
On the other hand, the documentation in docs/cl-format.xml states:
Everywhere, comments on own lines starting with '#' are allowed.
This seems to imply that a comment line must start with '#' in the first
column. Thus skip exactly such lines in the input file and don't handle
'#' as a comment marker anywhere else.
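The rule described above (a comment is a line whose first column is `#`, and `#` anywhere else is ordinary text) can be sketched in Python as a simple line filter. The function name is hypothetical; the old behaviour corresponds to truncating each line with something like `line.split("#", 1)[0]`.

```python
def data_lines(lines):
    """Yield non-comment lines, without trailing newlines.

    Only a '#' in the first column starts a comment, so names such as
    "{lambda()#1}" survive intact.
    """
    for line in lines:
        if line.startswith("#"):
            continue
        yield line.rstrip("\n")
```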
Signed-off-by: Philippe Waroquiers <philippe.waroquiers@skynet.be>
Sync VEX/LICENSE.GPL with top-level COPYING file. We used 3 different
addresses for writing to the FSF to receive a copy of the GPL. Replace
all the different variants with a URL <http://www.gnu.org/licenses/>.
The following files might still have some slightly different (L)GPL
copyright notice because they were derived from other programs:
- files under coregrind/m_demangle which come from libiberty:
cplus-dem.c, d-demangle.c, demangle.h, rust-demangle.c,
safe-ctype.c and safe-ctype.h
- coregrind/m_demangle/dyn-string.[hc] derived from GCC.
- coregrind/m_demangle/ansidecl.h derived from glibc.
- VEX files for FMA derived from glibc:
host_generic_maddf.h and host_generic_maddf.c
- files under coregrind/m_debuginfo derived from LZO:
lzoconf.h, lzodefs.h, minilzo-inl.c and minilzo.h
- files under coregrind/m_gdbserver derived from GDB:
gdb/signals.h, inferiors.c, regcache.c, regcache.h,
regdef.h, remote-utils.c, server.c, server.h, signals.c,
target.c, target.h and utils.c
Plus the following test files:
- none/tests/ppc32/testVMX.c derived from testVMX.
- ppc tests derived from QEMU: jm-insns.c, ppc64_helpers.h
and test_isa_3_0.c
- tests derived from bzip2 (with embedded GPL text in code):
hackedbz2.c, origin5-bz2.c, varinfo6.c
- tests derived from glibc: str_tester.c, pth_atfork1.c
- test derived from GCC libgomp: tc17_sembar.c
- performance tests derived from bzip2 or tinycc (with embedded GPL
text in code): bz2.c, test_input_for_tinycc.c and tinycc.c
cachegrind/callgrind fails ann[12] tests because of missing a.c
These testcases fail because the dist tar is missing the a.c
(auto-annotated) source file. Fix by adding it to EXTRA_DIST.
https://bugs.kde.org/show_bug.cgi?id=406352
Because it's very useful. As part of this, the "percentage of events
annotated" numbers at the bottom of the output are changed to "events
annotated" so that --show-percs doesn't compute a percentage of a
percentage.
Example output lines:
```
4,967,137,442 (100.0%) PROGRAM TOTALS
4,543 (25.23%) 17,566 ( 0.43%) 47,993 ( 0.92%) /build/glibc-OTsEL5/glibc-2.27/elf/dl-lookup.c
1 ( 0.01%) 2,000,001 (49.29%) 3,000,004 (57.36%) for (int i = 0; i < 1000000; i++) {
```
The commit also adds some much-needed tests for cg_annotate and
callgrind_annotate.
On the majority of architectures, the size of long matches the register width.
On mips n32, however, long is 32 bits while registers are 64 bits.
Valgrind is written with the assumption that the size of long matches the
register width. This is why both UWord for Valgrind and HWord for VEX
match the size of long, an assumption that does not hold on the mips n32 ABI.
Introduce a RegWord type that matches the size of registers.
Part of the changes required for BZ issue - #345763.
Contributed by:
Tamara Vlahovic and Dimitrije Nikolic.