diff --git a/cachegrind/docs/cg-manual.xml b/cachegrind/docs/cg-manual.xml
index 92fe08682..35d6a412e 100644
--- a/cachegrind/docs/cg-manual.xml
+++ b/cachegrind/docs/cg-manual.xml
@@ -5,167 +5,117 @@
 
 <!-- Referenced from both the manual and manpage -->
 <chapter id="&vg-cg-manual-id;" xreflabel="&vg-cg-manual-label;">
-<title>Cachegrind: a cache and branch-prediction profiler</title>
+<title>Cachegrind: a high-precision tracing profiler</title>
 
-<para>To use this tool, you must specify
-<option>--tool=cachegrind</option> on the
-Valgrind command line.</para>
+<para>
+To use this tool, specify <option>--tool=cachegrind</option> on the Valgrind
+command line.
+</para>
 
 <sect1 id="cg-manual.overview" xreflabel="Overview">
 <title>Overview</title>
 
-<para>Cachegrind simulates how your program interacts with a machine's cache
-hierarchy and (optionally) branch predictor.  It simulates a machine with
-independent first-level instruction and data caches (I1 and D1), backed by a
-unified second-level cache (L2).  This exactly matches the configuration of
-many modern machines.</para>
-
-<para>However, some modern machines have three or four levels of cache.  For these
-machines (in the cases where Cachegrind can auto-detect the cache
-configuration) Cachegrind simulates the first-level and last-level caches.
-The reason for this choice is that the last-level cache has the most influence on
-runtime, as it masks accesses to main memory.  Furthermore, the L1 caches
-often have low associativity, so simulating them can detect cases where the
-code interacts badly with this cache (eg. traversing a matrix column-wise
-with the row length being a power of 2).</para>
-
-<para>Therefore, Cachegrind always refers to the I1, D1 and LL (last-level)
-caches.</para>
-
 <para>
-Cachegrind gathers the following statistics (abbreviations used for each statistic
-is given in parentheses):</para>
+Cachegrind is a high-precision tracing profiler. It runs slowly, but collects
+precise and reproducible profiling data. It can merge and diff data from
+different runs. To expand on these characteristics:
+</para>
+
 <itemizedlist>
   <listitem>
-    <para>I cache reads (<computeroutput>Ir</computeroutput>,
-    which equals the number of instructions executed),
-    I1 cache read misses (<computeroutput>I1mr</computeroutput>) and
-    LL cache instruction read misses (<computeroutput>ILmr</computeroutput>).
+    <para>
+    <emphasis>Precise.</emphasis> Cachegrind measures the exact number of
+    instructions executed by your program, not an approximation. Furthermore,
+    it presents the gathered data at the file, function, and line level. This
+    is different to many other profilers that measure approximate execution
+    time, using sampling, and only at the function level.
     </para>
   </listitem>
+
   <listitem>
-    <para>D cache reads (<computeroutput>Dr</computeroutput>, which
-    equals the number of memory reads),
-    D1 cache read misses (<computeroutput>D1mr</computeroutput>), and
-    LL cache data read misses (<computeroutput>DLmr</computeroutput>).
-    </para>
-  </listitem>
-  <listitem>
-    <para>D cache writes (<computeroutput>Dw</computeroutput>, which equals
-    the number of memory writes),
-    D1 cache write misses (<computeroutput>D1mw</computeroutput>), and
-    LL cache data write misses (<computeroutput>DLmw</computeroutput>).
-    </para>
-  </listitem>
-  <listitem>
-    <para>Conditional branches executed (<computeroutput>Bc</computeroutput>) and
-    conditional branches mispredicted (<computeroutput>Bcm</computeroutput>).
-    </para>
-  </listitem>
-  <listitem>
-    <para>Indirect branches executed (<computeroutput>Bi</computeroutput>) and
-    indirect branches mispredicted (<computeroutput>Bim</computeroutput>).
+    <para>
+    <emphasis>Reproducible.</emphasis> In general, execution time is a better
+    metric than instruction counts because it's what users perceive. However,
+    execution time often has high variability. When running the exact same
+    program on the exact same input multiple times, execution time might vary
+    by several percent. Furthermore, small changes in a program can change its
+    memory layout and have even larger effects on runtime. In contrast,
+    instruction counts are highly reproducible; for some programs they are
+    perfectly reproducible. This means the effects of small changes in a
+    program can be measured with high precision.
     </para>
   </listitem>
 </itemizedlist>
 
-<para>Note that D1 total accesses is given by
-<computeroutput>D1mr</computeroutput> +
-<computeroutput>D1mw</computeroutput>, and that LL total
-accesses is given by <computeroutput>ILmr</computeroutput> +
-<computeroutput>DLmr</computeroutput> +
-<computeroutput>DLmw</computeroutput>.
+<para>
+For these reasons, Cachegrind is an excellent complement to time-based profilers.
 </para>
 
-<para>These statistics are presented for the entire program and for each
-function in the program.  You can also annotate each line of source code in
-the program with the counts that were caused directly by it.</para>
+<para>
+Cachegrind can annotate programs written in any language, so long as debug info
+is present to map machine code back to the original source code. Cachegrind has
+been used successfully on programs written in C, C++, Rust, and assembly.
+</para>
 
-<para>On a modern machine, an L1 miss will typically cost
-around 10 cycles, an LL miss can cost as much as 200
-cycles, and a mispredicted branch costs in the region of 10
-to 30 cycles.  Detailed cache and branch profiling can be very useful
-for understanding how your program interacts with the machine and thus how
-to make it faster.</para>
-
-<para>Also, since one instruction cache read is performed per
-instruction executed, you can find out how many instructions are
-executed per line, which can be useful for traditional profiling.</para>
+<para>
+Cachegrind can also simulate how your program interacts with a machine's cache
+hierarchy and branch predictor. This simulation was the original motivation for
+the tool, hence its name. However, the simulations are basic and unlikely to
+reflect the behaviour of a modern machine. For this reason they are off by
+default. If you really want cache and branch information, a profiler like
+<computeroutput>perf</computeroutput> that accesses hardware counters is a
+better choice.
+</para>
 
 </sect1>
 
 
-
 <sect1 id="cg-manual.profile"
-       xreflabel="Using Cachegrind, cg_annotate and cg_merge">
-<title>Using Cachegrind, cg_annotate and cg_merge</title>
+       xreflabel="Using Cachegrind and cg_annotate">
+<title>Using Cachegrind and cg_annotate</title>
 
-<para>First off, as for normal Valgrind use, you probably want to
-compile with debugging info (the
-<option>-g</option> option).  But by contrast with
-normal Valgrind use, you probably do want to turn
-optimisation on, since you should profile your program as it will
-be normally run.</para>
+<para>
+First, as for normal Valgrind use, you should compile with debugging info (the
+<option>-g</option> option in most compilers). But by contrast with normal
+Valgrind use, you probably do want to turn optimisation on, since you should
+profile your program as it will be normally run.
+</para>
 
-<para>Then, you need to run Cachegrind itself to gather the profiling
-information, and then run cg_annotate to get a detailed presentation of that
-information.  As an optional intermediate step, you can use cg_merge to sum
-together the outputs of multiple Cachegrind runs into a single file which
-you then use as the input for cg_annotate.  Alternatively, you can use
-cg_diff to difference the outputs of two Cachegrind runs into a single file
-which you then use as the input for cg_annotate.</para>
+<para>
+Second, run Cachegrind itself to gather the profiling data.
+</para>
+
+<para>
+Third, run cg_annotate to get a detailed presentation of that data. cg_annotate
+can combine the results of multiple Cachegrind output files. It can also
+perform a diff between two Cachegrind output files.
+</para>
 
 
 <sect2 id="cg-manual.running-cachegrind" xreflabel="Running Cachegrind">
 <title>Running Cachegrind</title>
 
-<para>To run Cachegrind on a program <filename>prog</filename>, run:</para>
+<para>
+To run Cachegrind on a program <filename>prog</filename>, run:
 <screen><![CDATA[
 valgrind --tool=cachegrind prog
 ]]></screen>
-
-<para>The program will execute (slowly).  Upon completion,
-summary statistics that look like this will be printed:</para>
-
-<programlisting><![CDATA[
-==31751== I   refs:      27,742,716
-==31751== I1  misses:           276
-==31751== LLi misses:           275
-==31751== I1  miss rate:        0.0%
-==31751== LLi miss rate:        0.0%
-==31751== 
-==31751== D   refs:      15,430,290  (10,955,517 rd + 4,474,773 wr)
-==31751== D1  misses:        41,185  (    21,905 rd +    19,280 wr)
-==31751== LLd misses:        23,085  (     3,987 rd +    19,098 wr)
-==31751== D1  miss rate:        0.2% (       0.1%   +       0.4%)
-==31751== LLd miss rate:        0.1% (       0.0%   +       0.4%)
-==31751== 
-==31751== LL misses:         23,360  (     4,262 rd +    19,098 wr)
-==31751== LL miss rate:         0.0% (       0.0%   +       0.4%)]]></programlisting>
-
-<para>Cache accesses for instruction fetches are summarised
-first, giving the number of fetches made (this is the number of
-instructions executed, which can be useful to know in its own
-right), the number of I1 misses, and the number of LL instruction
-(<computeroutput>LLi</computeroutput>) misses.</para>
-
-<para>Cache accesses for data follow. The information is similar
-to that of the instruction fetches, except that the values are
-also shown split between reads and writes (note each row's
-<computeroutput>rd</computeroutput> and
-<computeroutput>wr</computeroutput> values add up to the row's
-total).</para>
-
-<para>Combined instruction and data figures for the LL cache
-follow that.  Note that the LL miss rate is computed relative to the total
-number of memory accesses, not the number of L1 misses.  I.e.  it is
-<computeroutput>(ILmr + DLmr + DLmw) / (Ir + Dr + Dw)</computeroutput>
-not
-<computeroutput>(ILmr + DLmr + DLmw) / (I1mr + D1mr + D1mw)</computeroutput>
 </para>
 
-<para>Branch prediction statistics are not collected by default.
-To do so, add the option <option>--branch-sim=yes</option>.</para>
+<para>
+The program will execute (slowly). Upon completion, summary statistics that
+look like this will be printed:
+</para>
+
+<programlisting><![CDATA[
+==17942== I refs:          8,195,070
+]]></programlisting>
+
+<para>
+<computeroutput>I refs</computeroutput> is short for "Instruction cache
+references", and its count is equivalent to "instructions executed". If you
+enable the cache and/or branch simulation, additional counts will be shown.
+</para>
 
 </sect2>
 
@@ -173,660 +123,744 @@ To do so, add the option <option>--branch-sim=yes</option>.</para>
 <sect2 id="cg-manual.outputfile" xreflabel="Output File">
 <title>Output File</title>
 
-<para>As well as printing summary information, Cachegrind also writes
-more detailed profiling information to a file.  By default this file is named
-<filename>cachegrind.out.&lt;pid&gt;</filename> (where
-<filename>&lt;pid&gt;</filename> is the program's process ID), but its name
-can be changed with the <option>--cachegrind-out-file</option> option.  This
-file is human-readable, but is intended to be interpreted by the
-accompanying program cg_annotate, described in the next section.</para>
+<para>
+Cachegrind also writes more detailed profiling data to a file. By default this
+Cachegrind output file is named <filename>cachegrind.out.&lt;pid&gt;</filename>
+(where <filename>&lt;pid&gt;</filename> is the program's process ID), but its
+name can be changed with the <option>--cachegrind-out-file</option> option.
+This file is human-readable, but is intended to be interpreted by the
+accompanying program cg_annotate, described in the next section.
+</para>
 
-<para>The default <computeroutput>.&lt;pid&gt;</computeroutput> suffix
-on the output file name serves two purposes.  Firstly, it means you 
-don't have to rename old log files that you don't want to overwrite.  
-Secondly, and more importantly, it allows correct profiling with the
-<option>--trace-children=yes</option> option of
-programs that spawn child processes.</para>
-
-<para>The output file can be big, many megabytes for large applications
-built with full debugging information.</para>
+<para>
+The default <computeroutput>.&lt;pid&gt;</computeroutput> suffix on the output
+file name serves two purposes. First, it means existing Cachegrind output files
+aren't immediately overwritten. Second, and more importantly, it allows correct
+profiling with the <option>--trace-children=yes</option> option of programs
+that spawn child processes.
+</para>
 
 </sect2>
 
-
   
 <sect2 id="cg-manual.running-cg_annotate" xreflabel="Running cg_annotate">
 <title>Running cg_annotate</title>
 
-<para>Before using cg_annotate,
-it is worth widening your window to be at least 120-characters
-wide if possible, as the output lines can be quite long.</para>
-
-<para>To get a function-by-function summary, run:</para>
+<para>
+Before using cg_annotate, it is worth widening your window to be at least 120
+characters wide if possible, because the output lines can be quite long.
+</para>
 
+<para>
+Then run:
 <screen>cg_annotate &lt;filename&gt;</screen>
-
-<para>on a Cachegrind output file.</para>
+on a Cachegrind output file.
+</para>
 
 </sect2>
 
+<!--
+To produce the sample data, I did the following. Note that the single hyphens in
+the valgrind command should be double hyphens, but XML doesn't allow double
+hyphens in comments.
 
-<sect2 id="cg-manual.the-output-preamble" xreflabel="The Output Preamble">
-<title>The Output Preamble</title>
+  gcc -g -O concord.c -o concord
+  valgrind -tool=cachegrind -cachegrind-out-file=concord.cgout ./concord ../cg_main.c
+  (to exit, type `q` and hit enter)
+  python ../cg_annotate concord.cgout > concord.cgann
 
-<para>The first part of the output looks like this:</para>
+concord.c is a small C program I wrote at university. It's a good size for an example.
+-->
+
+<sect2 id="cg-manual.the-metadata" xreflabel="The Metadata Section">
+<title>The Metadata Section</title>
+
+<para>
+The first part of the output looks like this:
+</para>
 
 <programlisting><![CDATA[
 --------------------------------------------------------------------------------
-I1 cache:              65536 B, 64 B, 2-way associative
-D1 cache:              65536 B, 64 B, 2-way associative
-LL cache:              262144 B, 64 B, 8-way associative
-Command:               concord vg_to_ucode.c
-Events recorded:       Ir I1mr ILmr Dr D1mr DLmr Dw D1mw DLmw
-Events shown:          Ir I1mr ILmr Dr D1mr DLmr Dw D1mw DLmw
-Event sort order:      Ir I1mr ILmr Dr D1mr DLmr Dw D1mw DLmw
-Threshold:             99%
-Chosen for annotation:
-Auto-annotation:       off
+-- Metadata
+--------------------------------------------------------------------------------
+Invocation:       ../cg_annotate concord.cgout
+Command:          ./concord ../cg_main.c
+Events recorded:  Ir
+Events shown:     Ir
+Event sort order: Ir
+Threshold:        0.1%
+Annotation:       on
 ]]></programlisting>
 
-
-<para>This is a summary of the annotation options:</para>
+<para>
+It summarizes how cg_annotate and the profiled program were run.
+</para>
                     
 <itemizedlist>
-
   <listitem>
-    <para>I1 cache, D1 cache, LL cache: cache configuration.  So
-    you know the configuration with which these results were
-    obtained.</para>
+    <para>
+    Invocation: the command line used to produce this output.
+    </para>
   </listitem>
 
   <listitem>
-    <para>Command: the command line invocation of the program
-      under examination.</para>
+    <para>
+    Command: the command line used to run the profiled program.
+    </para>
   </listitem>
 
   <listitem>
-   <para>Events recorded: which events were recorded.</para>
-
- </listitem>
-
- <listitem>
-   <para>Events shown: the events shown, which is a subset of the events
-   gathered.  This can be adjusted with the
-   <option>--show</option> option.</para>
+    <para>
+    Events recorded: which events were recorded. By default, this is
+    <computeroutput>Ir</computeroutput>. More events will be recorded if cache
+    and/or branch simulation is enabled.
+    </para>
   </listitem>
 
   <listitem>
-    <para>Event sort order: the sort order in which functions are
-    shown.  For example, in this case the functions are sorted
-    from highest <computeroutput>Ir</computeroutput> counts to
-    lowest.  If two functions have identical
-    <computeroutput>Ir</computeroutput> counts, they will then be
-    sorted by <computeroutput>I1mr</computeroutput> counts, and
-    so on.  This order can be adjusted with the
-    <option>--sort</option> option.</para>
-
-    <para>Note that this dictates the order the functions appear.
-    It is <emphasis>not</emphasis> the order in which the columns
-    appear; that is dictated by the "events shown" line (and can
-    be changed with the <option>--show</option>
-    option).</para>
+    <para>
+    Events shown: the events shown, which is a subset of the events gathered.
+    This can be adjusted with the <option>--show</option> option.
+    </para>
   </listitem>
 
   <listitem>
-    <para>Threshold: cg_annotate
-    by default omits functions that cause very low counts
-    to avoid drowning you in information.  In this case,
-    cg_annotate shows summaries the functions that account for
-    99% of the <computeroutput>Ir</computeroutput> counts;
-    <computeroutput>Ir</computeroutput> is chosen as the
-    threshold event since it is the primary sort event.  The
-    threshold can be adjusted with the
-    <option>--threshold</option>
-    option.</para>
+    <para>
+    Event sort order: the sort order used for the subsequent sections. For
+    example, in this case those sections are sorted from highest
+    <computeroutput>Ir</computeroutput> counts to lowest. If there are multiple
+    events, one will be the primary sort event, and then there can be a
+    secondary sort event, tertiary sort event, etc., though more than one is
+    rarely needed. This order can be adjusted with the <option>--sort</option>
+    option. Note that this does <emphasis>not</emphasis> specify the order in
+    which the columns appear. That is specified by the "events shown" line (and
+    can be changed with the <option>--show</option> option).
+    </para>
   </listitem>
 
   <listitem>
-    <para>Chosen for annotation: names of files specified
-    manually for annotation; in this case none.</para>
+    <para>
+    Threshold: cg_annotate by default omits files and functions with very low
+    counts to keep the output size reasonable. By default cg_annotate only
+    shows files and functions that account for at least 0.1% of the primary
+    sort event. The threshold can be adjusted with the
+    <option>--threshold</option> option.
+    </para>
   </listitem>
 
   <listitem>
-    <para>Auto-annotation: whether auto-annotation was requested
-    via the <option>--auto=yes</option>
-    option. In this case no.</para>
+    <para>
+    Annotation: whether source file annotation is enabled. Controlled with the
+    <option>--annotate</option> option.
+    </para>
   </listitem>
 
 </itemizedlist>
 
+<para>
+If cache simulation is enabled, details of the cache parameters will be shown
+above the "Invocation" line.
+</para>
+
 </sect2>
 
 
 <sect2 id="cg-manual.the-global"
-       xreflabel="The Global and Function-level Counts">
-<title>The Global and Function-level Counts</title>
-
-<para>Then follows summary statistics for the whole
-program:</para>
-  
-<programlisting><![CDATA[
---------------------------------------------------------------------------------
-Ir         I1mr ILmr Dr         D1mr   DLmr  Dw        D1mw   DLmw
---------------------------------------------------------------------------------
-27,742,716  276  275 10,955,517 21,905 3,987 4,474,773 19,280 19,098  PROGRAM TOTALS]]></programlisting>
+       xreflabel="Global, File, and Function-level Counts">
+<title>Global, File, and Function-level Counts</title>
 
 <para>
-These are similar to the summary provided when Cachegrind finishes running.
+Next comes the summary for the whole program:
+</para>
+  
+<programlisting><![CDATA[
+--------------------------------------------------------------------------------
+-- Summary
+--------------------------------------------------------------------------------
+Ir________________ 
+
+8,195,070 (100.0%)  PROGRAM TOTALS
+]]></programlisting>
+
+<para>
+The <computeroutput>Ir</computeroutput> column label is suffixed with
+underscores to show the bounds of the columns underneath.
 </para>
 
-<para>Then comes function-by-function statistics:</para>
+<para>
+Then come the file:function counts. Here is the first part of that section:
+</para>
 
 <programlisting><![CDATA[
 --------------------------------------------------------------------------------
-Ir        I1mr ILmr Dr        D1mr  DLmr  Dw        D1mw   DLmw    file:function
+-- File:function summary
 --------------------------------------------------------------------------------
-8,821,482    5    5 2,242,702 1,621    73 1,794,230      0      0  getc.c:_IO_getc
-5,222,023    4    4 2,276,334    16    12   875,959      1      1  concord.c:get_word
-2,649,248    2    2 1,344,810 7,326 1,385         .      .      .  vg_main.c:strcmp
-2,521,927    2    2   591,215     0     0   179,398      0      0  concord.c:hash
-2,242,740    2    2 1,046,612   568    22   448,548      0      0  ctype.c:tolower
-1,496,937    4    4   630,874 9,000 1,400   279,388      0      0  concord.c:insert
-  897,991   51   51   897,831    95    30        62      1      1  ???:???
-  598,068    1    1   299,034     0     0   149,517      0      0  ../sysdeps/generic/lockfile.c:__flockfile
-  598,068    0    0   299,034     0     0   149,517      0      0  ../sysdeps/generic/lockfile.c:__funlockfile
-  598,024    4    4   213,580    35    16   149,506      0      0  vg_clientmalloc.c:malloc
-  446,587    1    1   215,973 2,167   430   129,948 14,057 13,957  concord.c:add_existing
-  341,760    2    2   128,160     0     0   128,160      0      0  vg_clientmalloc.c:vg_trap_here_WRAPPER
-  320,782    4    4   150,711   276     0    56,027     53     53  concord.c:init_hash_table
-  298,998    1    1   106,785     0     0    64,071      1      1  concord.c:create
-  149,518    0    0   149,516     0     0         1      0      0  ???:tolower@@GLIBC_2.0
-  149,518    0    0   149,516     0     0         1      0      0  ???:fgetc@@GLIBC_2.0
-   95,983    4    4    38,031     0     0    34,409  3,152  3,150  concord.c:new_word_node
-   85,440    0    0    42,720     0     0    21,360      0      0  vg_clientmalloc.c:vg_bogus_epilogue]]></programlisting>
+  Ir______________________  file:function
 
-<para>Each function
-is identified by a
-<computeroutput>file_name:function_name</computeroutput> pair. If
-a column contains only a dot it means the function never performs
-that event (e.g. the third row shows that
-<computeroutput>strcmp()</computeroutput> contains no
-instructions that write to memory). The name
-<computeroutput>???</computeroutput> is used if the file name
-and/or function name could not be determined from debugging
-information. If most of the entries have the form
-<computeroutput>???:???</computeroutput> the program probably
-wasn't compiled with <option>-g</option>.</para>
+< 3,078,746 (37.6%, 37.6%)  /home/njn/grind/ws1/cachegrind/concord.c:
+  1,630,232 (19.9%)           get_word
+    630,918  (7.7%)           hash
+    461,095  (5.6%)           insert
+    130,560  (1.6%)           add_existing
+     91,014  (1.1%)           init_hash_table
+     88,056  (1.1%)           create
+     46,676  (0.6%)           new_word_node
 
-<para>It is worth noting that functions will come both from
-the profiled program (e.g. <filename>concord.c</filename>)
-and from libraries (e.g. <filename>getc.c</filename>)</para>
+< 1,746,038 (21.3%, 58.9%)  ./malloc/./malloc/malloc.c:
+  1,285,938 (15.7%)           _int_malloc
+    458,225  (5.6%)           malloc
+
+< 1,107,550 (13.5%, 72.4%)  ./libio/./libio/getc.c:getc
+
+<   551,071  (6.7%, 79.1%)  ./string/../sysdeps/x86_64/multiarch/strcmp-avx2.S:__strcmp_avx2
+
+<   521,228  (6.4%, 85.5%)  ./ctype/../include/ctype.h:
+    260,616  (3.2%)           __ctype_tolower_loc
+    260,612  (3.2%)           __ctype_b_loc
+
+<   468,163  (5.7%, 91.2%)  ???:
+    468,151  (5.7%)           ???
+
+<   456,071  (5.6%, 96.8%)  /usr/include/ctype.h:get_word
+
+]]></programlisting>
+
+<para>
+Each entry covers one file, and one or more functions within that file. If
+there is only one significant function within a file, as in the third entry,
+the file and function are shown on the same line separated by a colon. If there
+are multiple significant functions within a file, as in the first entry, each
+function gets its own line.
+</para>
+
+<para>
+This example involves a small C program, and shows a combination of code from
+the program itself (including functions like <function>get_word</function> and
+<function>hash</function> in the file <filename>concord.c</filename>) as well
+as code from system libraries, such as functions like
+<function>malloc</function> and <function>getc</function>.
+</para>
+
+<para>
+Each entry is preceded by a <computeroutput>&lt;</computeroutput>, which can
+be useful when navigating through the output in an editor, or grepping through
+results.
+</para>
+
+<para>
+The first percentage in each column indicates the proportion of the total event
+count that is covered by this line. The second percentage, which only shows on
+the first line of each entry, shows the cumulative percentage of all the entries
+up to and including this one. The entries shown here account for 96.8% of the
+instructions executed by the program.
+</para>
+
+<para>
+The name <computeroutput>???</computeroutput> is used if the file name and/or
+function name could not be determined from debugging information. If
+<filename>???</filename> filenames dominate, the program probably wasn't
+compiled with <option>-g</option>. If <function>???</function> function names
+dominate, the program may have had symbols stripped.
+</para>
+
+<para>
+After that come the function:file counts. Here is the first part of that section:
+</para>
+
+<programlisting><![CDATA[
+--------------------------------------------------------------------------------
+-- Function:file summary
+--------------------------------------------------------------------------------
+  Ir______________________  function:file
+
+> 2,086,303 (25.5%, 25.5%)  get_word:
+  1,630,232 (19.9%)           /home/njn/grind/ws1/cachegrind/concord.c
+    456,071  (5.6%)           /usr/include/ctype.h
+
+> 1,285,938 (15.7%, 41.1%)  _int_malloc:./malloc/./malloc/malloc.c
+
+> 1,107,550 (13.5%, 54.7%)  getc:./libio/./libio/getc.c
+
+>   630,918  (7.7%, 62.4%)  hash:/home/njn/grind/ws1/cachegrind/concord.c
+
+>   551,071  (6.7%, 69.1%)  __strcmp_avx2:./string/../sysdeps/x86_64/multiarch/strcmp-avx2.S
+
+>   480,248  (5.9%, 74.9%)  malloc:
+    458,225  (5.6%)           ./malloc/./malloc/malloc.c
+     22,023  (0.3%)           ./malloc/./malloc/arena.c
+
+>   468,151  (5.7%, 80.7%)  ???:???
+
+>   461,095  (5.6%, 86.3%)  insert:/home/njn/grind/ws1/cachegrind/concord.c
+]]></programlisting>
+
+<para>
+This is similar to the previous section, but is grouped by functions first and
+files second. Also, the entry markers are <computeroutput>&gt;</computeroutput>
+instead of <computeroutput>&lt;</computeroutput>.
+</para>
+
+<para>
+You might wonder why this section is needed, and how it differs from the
+previous section. The answer is inlining. In this example there are two entries
+demonstrating a function whose code is effectively spread across more than one
+file: <function>get_word</function> and <function>malloc</function>. Here is an
+example from profiling the Rust compiler, a much larger program that uses
+inlining more:
+</para>
+
+<programlisting><![CDATA[
+>  30,469,230 (1.3%, 11.1%)  <rustc_middle::ty::context::CtxtInterners>::intern_ty:
+   10,269,220 (0.5%)           /home/njn/.cargo/registry/src/github.com-1ecc6299db9ec823/hashbrown-0.12.3/src/raw/mod.rs
+    7,696,827 (0.3%)           /home/njn/dev/rust0/compiler/rustc_middle/src/ty/context.rs
+    3,858,099 (0.2%)           /home/njn/dev/rust0/library/core/src/cell.rs
+]]></programlisting>
+
+<para>
+In this case the compiled function <function>intern_ty</function> includes code
+from three different source files, due to inlining. These should be examined
+together. Older versions of cg_annotate presented this entry as three separate
+file:function entries, which would typically be intermixed with all the other
+entries, making it hard to see that they are all really part of the same
+function.
+</para>
 
 </sect2>
 
 
-<sect2 id="cg-manual.line-by-line" xreflabel="Line-by-line Counts">
-<title>Line-by-line Counts</title>
+<sect2 id="cg-manual.line-by-line" xreflabel="Per-line Counts">
+<title>Per-line Counts</title>
 
-<para>By default, all source code annotation is also shown. (Filenames to be
-annotated can also by specified manually as arguments to cg_annotate, but this
-is rarely needed.) For example, the output from running <filename>cg_annotate
-&lt;filename&gt; </filename> for our example produces the same output as above
-followed by an annotated version of <filename>concord.c</filename>, a section
-of which looks like:</para>
+<para>
+By default, a source file is annotated if it contains at least one function
+that meets the significance threshold. This can be disabled with the
+<option>--annotate</option> option.
+</para>
+
+<para>
+To continue the previous example, here is part of the annotation of the file
+<filename>concord.c</filename>:
+</para>
 
 <programlisting><![CDATA[
 --------------------------------------------------------------------------------
--- Auto-annotated source: concord.c
+-- Annotated source file: /home/njn/grind/ws1/cachegrind/docs/concord.c
 --------------------------------------------------------------------------------
-Ir        I1mr ILmr Dr      D1mr  DLmr  Dw      D1mw   DLmw
+Ir____________
 
-        .    .    .       .     .     .       .      .      .  void init_hash_table(char *file_name, Word_Node *table[])
-        3    1    1       .     .     .       1      0      0  {
-        .    .    .       .     .     .       .      .      .      FILE *file_ptr;
-        .    .    .       .     .     .       .      .      .      Word_Info *data;
-        1    0    0       .     .     .       1      1      1      int line = 1, i;
-        .    .    .       .     .     .       .      .      .
-        5    0    0       .     .     .       3      0      0      data = (Word_Info *) create(sizeof(Word_Info));
-        .    .    .       .     .     .       .      .      .
-    4,991    0    0   1,995     0     0     998      0      0      for (i = 0; i < TABLE_SIZE; i++)
-    3,988    1    1   1,994     0     0     997     53     52          table[i] = NULL;
-        .    .    .       .     .     .       .      .      .
-        .    .    .       .     .     .       .      .      .      /* Open file, check it. */
-        6    0    0       1     0     0       4      0      0      file_ptr = fopen(file_name, "r");
-        2    0    0       1     0     0       .      .      .      if (!(file_ptr)) {
-        .    .    .       .     .     .       .      .      .          fprintf(stderr, "Couldn't open '%s'.\n", file_name);
-        1    1    1       .     .     .       .      .      .          exit(EXIT_FAILURE);
-        .    .    .       .     .     .       .      .      .      }
-        .    .    .       .     .     .       .      .      .
-  165,062    1    1  73,360     0     0  91,700      0      0      while ((line = get_word(data, line, file_ptr)) != EOF)
-  146,712    0    0  73,356     0     0  73,356      0      0          insert(data->;word, data->line, table);
-        .    .    .       .     .     .       .      .      .
-        4    0    0       1     0     0       2      0      0      free(data);
-        4    0    0       1     0     0       2      0      0      fclose(file_ptr);
-        3    0    0       2     0     0       .      .      .  }]]></programlisting>
+      .         /* Function builds the hash table from the given file. */  
+      .         void init_hash_table(char *file_name, Word_Node *table[])  
+      8 (0.0%)  {                                                          
+      .             FILE *file_ptr;                                        
+      .             Word_Info *data;                                       
+      2 (0.0%)      int line = 1, i;                                       
+      .                                                                    
+      .             /* Structure used when reading in words and line numbers. */
+      3 (0.0%)      data = (Word_Info *) create(sizeof(Word_Info));        
+      .                                                                    
+      .             /* Initialise entire table to NULL. */                 
+  2,993 (0.0%)      for (i = 0; i < TABLE_SIZE; i++)                       
+    997 (0.0%)          table[i] = NULL;                                   
+      .                                                                    
+      .             /* Open file, check it. */                             
+      4 (0.0%)      file_ptr = fopen(file_name, "r");                      
+      2 (0.0%)      if (!(file_ptr)) {                                     
+      .                 fprintf(stderr, "Couldn't open '%s'.\n", file_name);
+      .                 exit(EXIT_FAILURE);                                
+      .             }                                                      
+      .                                                                    
+      .             /*  'Get' the words and lines one at a time from the file, and insert them
+      .             ** into the table one at a time. */                    
+ 55,363 (0.7%)      while ((line = get_word(data, line, file_ptr)) != EOF) 
+ 31,632 (0.4%)          insert(data->word, data->line, table);             
+      .                                                                    
+      2 (0.0%)      free(data);                                            
+      2 (0.0%)      fclose(file_ptr);                                      
+      6 (0.0%)  }  
+]]></programlisting>
 
-<para>(Although column widths are automatically minimised, a wide
-terminal is clearly useful.)</para>
-  
-<para>Each source file is clearly marked
-(<computeroutput>User-annotated source</computeroutput>) as
-having been chosen manually for annotation.  If the file was
-found in one of the directories specified with the
-<option>-I</option>/<option>--include</option> option, the directory
-and file are both given.</para>
+<para>
+Each executed line is annotated with its event counts. Other lines are
+annotated with a dot. This may be because they contain no executable code, or
+they contain executable code but were never executed.
+</para>
 
-<para>Each line is annotated with its event counts.  Events not
-applicable for a line are represented by a dot.  This is useful
-for distinguishing between an event which cannot happen, and one
-which can but did not.</para>
+<para>
+You can easily tell if a function is inlined from this output. If it is not
+inlined, it will have event counts on the lines containing the opening and
+closing braces. If it is inlined, it will not have event counts on those lines.
+In the example above, <function>init_hash_table</function> does have counts,
+so you can tell it is not inlined.
+</para>
 
-<para>Sometimes only a small section of a source file is
-executed.  To minimise uninteresting output, Cachegrind only shows
-annotated lines and lines within a small distance of annotated
-lines.  Gaps are marked with the line numbers so you know which
-part of a file the shown code comes from, eg:</para>
+<para>
+Note again that inlining can lead to surprising results. If a function
+<function>f</function> is always inlined, then in the file:function and
+function:file sections its counts will be attributed to the functions it is
+inlined into, rather than to <function>f</function> itself. However, if you
+look at the line-by-line annotations for <function>f</function> you'll see the
+counts that belong to <function>f</function>. So it's worth looking for large
+counts/percentages in the line-by-line annotations.
+</para>
+
+<para>
+Sometimes only a small section of a source file is executed. To minimise
+uninteresting output, Cachegrind only shows annotated lines and lines within a
+small distance of annotated lines. Gaps are marked with line numbers, for
+example:
+</para>
 
 <programlisting><![CDATA[
-(figures and code for line 704)
--- line 704 ----------------------------------------
--- line 878 ----------------------------------------
-(figures and code for line 878)]]></programlisting>
+(counts and code for line 704)
+-- line 704 ----------------------------------------
+-- line 878 ----------------------------------------
+(counts and code for line 878)
+]]></programlisting>
 
-<para>The amount of context to show around annotated lines is
-controlled by the <option>--context</option>
-option.</para>
+<para>
+The number of lines of context shown around annotated lines is controlled by
+the <option>--context</option> option.
+</para>
 
-<para>Automatic annotation is enabled by default.
-cg_annotate will automatically annotate every source file it can
-find that is mentioned in the function-by-function summary.
-Therefore, the files chosen for auto-annotation are affected by
-the <option>--sort</option> and
-<option>--threshold</option> options.  Each
-source file is clearly marked (<computeroutput>Auto-annotated
-source</computeroutput>) as being chosen automatically.  Any
-files that could not be found are mentioned at the end of the
-output, eg:</para>
+<para>
+Any significant source files that could not be found are shown like this:
+</para>
 
 <programlisting><![CDATA[
-------------------------------------------------------------------
-The following files chosen for auto-annotation could not be found:
-------------------------------------------------------------------
-  getc.c
-  ctype.c
-  ../sysdeps/generic/lockfile.c]]></programlisting>
+--------------------------------------------------------------------------------
+-- Annotated source file: ./malloc/./malloc/malloc.c                       
+--------------------------------------------------------------------------------
+Unannotated because one or more of these original files are unreadable:    
+- ./malloc/./malloc/malloc.c 
+]]></programlisting>
 
-<para>This is quite common for library files, since libraries are
-usually compiled with debugging information, but the source files
-are often not present on a system.  If a file is chosen for
-annotation both manually and automatically, it
-is marked as <computeroutput>User-annotated
-source</computeroutput>. Use the
-<option>-I</option>/<option>--include</option> option to tell Valgrind where
-to look for source files if the filenames found from the debugging
-information aren't specific enough.</para>
+<para>
+This is common for library files, because libraries are usually compiled with
+debugging information but the source files are rarely present on a system.
+</para>
 
-<para> Beware that auto-annotation can produce a lot of output if your program
-is large.</para>
+<para>
+Cachegrind relies heavily on accurate debug info. Sometimes compilers map a
+particular compiled instruction to line number 0, where the 0 represents
+"unknown" or "none". This is annoying but does happen in practice. cg_annotate
+prints these in the following way:
+</para>
+
+<programlisting><![CDATA[
+--------------------------------------------------------------------------------
+-- Annotated source file: /home/njn/dev/rust0/compiler/rustc_borrowck/src/lib.rs
+--------------------------------------------------------------------------------
+Ir______________
+
+1,046,746 (0.0%)  <unknown (line 0)>
+]]></programlisting>
+
+<para>
+Finally, when annotation is performed, the output ends with a summary of how
+many counts were annotated and unannotated, and why. For example:
+</para>
+
+<programlisting><![CDATA[
+--------------------------------------------------------------------------------
+-- Annotation summary
+--------------------------------------------------------------------------------
+Ir_______________ 
+
+3,534,817 (43.1%)    annotated: files known & above threshold & readable, line numbers known
+        0            annotated: files known & above threshold & readable, line numbers unknown
+        0          unannotated: files known & above threshold & two or more non-identical
+4,132,126 (50.4%)  unannotated: files known & above threshold & unreadable 
+   59,950  (0.7%)  unannotated: files known & below threshold
+  468,163  (5.7%)  unannotated: files unknown
+]]></programlisting>
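The percentages in the summary above can be reproduced with a short sketch. It assumes that each percentage is the row's count divided by the sum of all rows shown; the row labels are shortened here for illustration and are not cg_annotate's own strings.

```python
# Ir counts copied from the annotation summary above.
# The dictionary keys are abbreviated labels chosen for this sketch.
rows = {
    "annotated, line numbers known": 3_534_817,
    "annotated, line numbers unknown": 0,
    "unannotated, non-identical files": 0,
    "unannotated, unreadable": 4_132_126,
    "unannotated, below threshold": 59_950,
    "unannotated, files unknown": 468_163,
}

# Assumption: the percentages are relative to the sum of these rows.
total = sum(rows.values())


def percent(count, total):
    """Format count as a percentage of total with one decimal place."""
    return f"{100 * count / total:.1f}%"


summary = {label: percent(count, total) for label, count in rows.items()}
```

Under that assumption, the unreadable-file row accounts for about half of all counts, which is why missing library sources are worth noticing in this summary.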
 
 </sect2>
 
 
-<sect2 id="cg-manual.assembler" xreflabel="Annotating Assembly Code Programs">
-<title>Annotating Assembly Code Programs</title>
-
-<para>Valgrind can annotate assembly code programs too, or annotate
-the assembly code generated for your C program.  Sometimes this is
-useful for understanding what is really happening when an
-interesting line of C code is translated into multiple
-instructions.</para>
-
-<para>To do this, you just need to assemble your
-<computeroutput>.s</computeroutput> files with assembly-level debug
-information.  You can use compile with the <option>-S</option> to compile C/C++
-programs to assembly code, and then assemble the assembly code files with
-<option>-g</option> to achieve this.  You can then profile and annotate the
-assembly code source files in the same way as C/C++ source files.</para>
-
-</sect2>
-
 <sect2 id="cg-manual.forkingprograms" xreflabel="Forking Programs">
 <title>Forking Programs</title>
-<para>If your program forks, the child will inherit all the profiling data that
-has been gathered for the parent.</para>
 
-<para>If the output file format string (controlled by
-<option>--cachegrind-out-file</option>) does not contain <option>%p</option>,
-then the outputs from the parent and child will be intermingled in a single
-output file, which will almost certainly make it unreadable by
-cg_annotate.</para>
+<para>
+If your program forks, the child will inherit all the profiling data that
+has been gathered for the parent.
+</para>
+
+<para>
+If the output file name (controlled by <option>--cachegrind-out-file</option>)
+does not contain <option>%p</option>, then the outputs from the parent and
+child will be intermingled in a single output file, which will almost certainly
+make it unreadable by cg_annotate.
+</para>
+
 </sect2>
 
 
 <sect2 id="cg-manual.annopts.warnings" xreflabel="cg_annotate Warnings">
 <title>cg_annotate Warnings</title>
 
-<para>There are a couple of situations in which
-cg_annotate issues warnings.</para>
+<para>
+There are two situations in which cg_annotate prints warnings.
+</para>
 
 <itemizedlist>
   <listitem>
-    <para>If a source file is more recent than the
-    <filename>cachegrind.out.&lt;pid&gt;</filename> file.
-    This is because the information in
-    <filename>cachegrind.out.&lt;pid&gt;</filename> is only
-    recorded with line numbers, so if the line numbers change at
-    all in the source (e.g.  lines added, deleted, swapped), any
-    annotations will be incorrect.</para>
+    <para>
+    If a source file is more recent than the Cachegrind output file. This is
+    because the information in the Cachegrind output file is only recorded with
+    line numbers, so if the line numbers change at all in the source (e.g.
+    lines added, deleted, swapped), any annotations will be incorrect.
+    </para>
   </listitem>
   <listitem>
-    <para>If information is recorded about line numbers past the
-    end of a file.  This can be caused by the above problem,
-    i.e. shortening the source file while using an old
-    <filename>cachegrind.out.&lt;pid&gt;</filename> file.  If
-    this happens, the figures for the bogus lines are printed
-    anyway (clearly marked as bogus) in case they are
-    important.</para>
+    <para>
+    If information is recorded about line numbers past the end of a file. This
+    can be caused by the above problem, e.g. shortening the source file while
+    using an old Cachegrind output file. If this happens, the figures for the
+    bogus lines are printed anyway (and clearly marked as bogus) in case they
+    are important.
+    </para>
   </listitem>
 </itemizedlist>
 
 </sect2>
 
 
-
-<sect2 id="cg-manual.annopts.things-to-watch-out-for"
-       xreflabel="Unusual Annotation Cases">
-<title>Unusual Annotation Cases</title>
-
-<para>Some odd things that can occur during annotation:</para>
-
-<itemizedlist>
-  <listitem>
-    <para>If annotating at the assembler level, you might see
-    something like this:</para>
-<programlisting><![CDATA[
-      1    0    0  .    .    .  .    .    .          leal -12(%ebp),%eax
-      1    0    0  .    .    .  1    0    0          movl %eax,84(%ebx)
-      2    0    0  0    0    0  1    0    0          movl $1,-20(%ebp)
-      .    .    .  .    .    .  .    .    .          .align 4,0x90
-      1    0    0  .    .    .  .    .    .          movl $.LnrB,%eax
-      1    0    0  .    .    .  1    0    0          movl %eax,-16(%ebp)]]></programlisting>
-
-    <para>How can the third instruction be executed twice when
-    the others are executed only once?  As it turns out, it
-    isn't.  Here's a dump of the executable, using
-    <computeroutput>objdump -d</computeroutput>:</para>
-<programlisting><![CDATA[
-      8048f25:       8d 45 f4                lea    0xfffffff4(%ebp),%eax
-      8048f28:       89 43 54                mov    %eax,0x54(%ebx)
-      8048f2b:       c7 45 ec 01 00 00 00    movl   $0x1,0xffffffec(%ebp)
-      8048f32:       89 f6                   mov    %esi,%esi
-      8048f34:       b8 08 8b 07 08          mov    $0x8078b08,%eax
-      8048f39:       89 45 f0                mov    %eax,0xfffffff0(%ebp)]]></programlisting>
-
-    <para>Notice the extra <computeroutput>mov
-    %esi,%esi</computeroutput> instruction.  Where did this come
-    from?  The GNU assembler inserted it to serve as the two
-    bytes of padding needed to align the <computeroutput>movl
-    $.LnrB,%eax</computeroutput> instruction on a four-byte
-    boundary, but pretended it didn't exist when adding debug
-    information.  Thus when Valgrind reads the debug info it
-    thinks that the <computeroutput>movl
-    $0x1,0xffffffec(%ebp)</computeroutput> instruction covers the
-    address range 0x8048f2b--0x804833 by itself, and attributes
-    the counts for the <computeroutput>mov
-    %esi,%esi</computeroutput> to it.</para>
-  </listitem>
-
-  <!--
-  I think this isn't true any more, not since cost centres were moved from
-  being associated with instruction addresses to being associated with
-  source line numbers.
-  <listitem>
-    <para>Inlined functions can cause strange results in the
-    function-by-function summary.  If a function
-    <computeroutput>inline_me()</computeroutput> is defined in
-    <filename>foo.h</filename> and inlined in the functions
-    <computeroutput>f1()</computeroutput>,
-    <computeroutput>f2()</computeroutput> and
-    <computeroutput>f3()</computeroutput> in
-    <filename>bar.c</filename>, there will not be a
-    <computeroutput>foo.h:inline_me()</computeroutput> function
-    entry.  Instead, there will be separate function entries for
-    each inlining site, i.e.
-    <computeroutput>foo.h:f1()</computeroutput>,
-    <computeroutput>foo.h:f2()</computeroutput> and
-    <computeroutput>foo.h:f3()</computeroutput>.  To find the
-    total counts for
-    <computeroutput>foo.h:inline_me()</computeroutput>, add up
-    the counts from each entry.</para>
-
-    <para>The reason for this is that although the debug info
-    output by GCC indicates the switch from
-    <filename>bar.c</filename> to <filename>foo.h</filename>, it
-    doesn't indicate the name of the function in
-    <filename>foo.h</filename>, so Valgrind keeps using the old
-    one.</para>
-  </listitem>
-  -->
-
-  <listitem>
-    <para>Sometimes, the same filename might be represented with
-    a relative name and with an absolute name in different parts
-    of the debug info, eg:
-    <filename>/home/user/proj/proj.h</filename> and
-    <filename>../proj.h</filename>.  In this case, if you use
-    auto-annotation, the file will be annotated twice with the
-    counts split between the two.</para>
-  </listitem>
-
-  <listitem>
-    <para>If you compile some files with
-    <option>-g</option> and some without, some
-    events that take place in a file without debug info could be
-    attributed to the last line of a file with debug info
-    (whichever one gets placed before the non-debug-info file in
-    the executable).</para>
-  </listitem>
-
-</itemizedlist>
-
-<para>These cases should be rare.</para>
-
-</sect2>
-
-
 <sect2 id="cg-manual.cg_merge" xreflabel="cg_merge">
-<title>Merging Profiles with cg_merge</title>
+<title>Merging Cachegrind Output Files</title>
 
 <para>
-cg_merge is a simple program which
-reads multiple profile files, as created by Cachegrind, merges them
-together, and writes the results into another file in the same format.
-You can then examine the merged results using
-<computeroutput>cg_annotate &lt;filename&gt;</computeroutput>, as
-described above.  The merging functionality might be useful if you
-want to aggregate costs over multiple runs of the same program, or
-from a single parallel run with multiple instances of the same
-program.</para>
+cg_annotate can merge data from multiple Cachegrind output files in a single
+run. (There is also a program called cg_merge that can merge multiple
+Cachegrind output files into a single Cachegrind output file, but it is now
+deprecated because cg_annotate's merging does a better job.)
+</para>
 
 <para>
-cg_merge is invoked as follows:
+Use it as follows:
 </para>
 
 <programlisting><![CDATA[
-cg_merge -o outputfile file1 file2 file3 ...]]></programlisting>
+cg_annotate file1 file2 file3 ...
+]]></programlisting>
 
 <para>
-It reads and checks <computeroutput>file1</computeroutput>, then read
-and checks <computeroutput>file2</computeroutput> and merges it into
-the running totals, then the same with
-<computeroutput>file3</computeroutput>, etc.  The final results are
-written to <computeroutput>outputfile</computeroutput>, or to standard
-out if no output file is specified.</para>
+cg_annotate computes the sum of these files (effectively
+<filename>file1</filename> + <filename>file2</filename> +
+<filename>file3</filename>), and then produces output as usual that shows the
+summed counts.
+</para>
 
 <para>
-Costs are summed on a per-function, per-line and per-instruction
-basis.  Because of this, the order in which the input files does not
-matter, although you should take care to only mention each file once,
-since any file mentioned twice will be added in twice.</para>
-
-<para>
-cg_merge does not attempt to check
-that the input files come from runs of the same executable.  It will
-happily merge together profile files from completely unrelated
-programs.  It does however check that the
-<computeroutput>Events:</computeroutput> lines of all the inputs are
-identical, so as to ensure that the addition of costs makes sense.
-For example, it would be nonsensical for it to add a number indicating
-D1 read references to a number from a different file indicating LL
-write misses.</para>
-
-<para>
-A number of other syntax and sanity checks are done whilst reading the
-inputs.  cg_merge will stop and
-attempt to print a helpful error message if any of the input files
-fail these checks.</para>
+The most common merging scenario is aggregating costs over multiple runs of the
+same program, possibly on different inputs.
+</para>
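The summing described above can be pictured with a minimal sketch. It is not cg_annotate's implementation and does not parse the Cachegrind output file format; each profile is assumed to be a plain dictionary from a hypothetical (file, function) key to an instruction count.

```python
from collections import Counter


def merge_counts(profiles):
    """Sum per-location counts across several profiles.

    Each profile maps a (file, function) key to a count. This in-memory
    shape is an assumption made for illustration only.
    """
    total = Counter()
    for profile in profiles:
        for key, count in profile.items():
            total[key] += count
    return dict(total)


# Two illustrative runs of the same program on different inputs.
run1 = {("prog.c", "main"): 100, ("prog.c", "f"): 40}
run2 = {("prog.c", "main"): 120, ("prog.c", "g"): 5}
merged = merge_counts([run1, run2])
```

A location that appears in only one run keeps its count unchanged, so the order of the input files does not affect the result.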
 
 </sect2>
 
 
 <sect2 id="cg-manual.cg_diff" xreflabel="cg_diff">
-<title>Differencing Profiles with cg_diff</title>
+<title>Differencing Cachegrind Output Files</title>
 
 <para>
-cg_diff is a simple program which
-reads two profile files, as created by Cachegrind, finds the difference
-between them, and writes the results into another file in the same format.
-You can then examine the merged results using
-<computeroutput>cg_annotate &lt;filename&gt;</computeroutput>, as
-described above.  This is very useful if you want to measure how a change to
-a program affected its performance.
+cg_annotate can diff data from two Cachegrind output files in a single run.
+(There is also a program called cg_diff that can diff two Cachegrind output
+files into a single Cachegrind output file, but it is now deprecated because
+cg_annotate's differencing does a better job.)
 </para>
 
 <para>
-cg_diff is invoked as follows:
+Use it as follows:
 </para>
 
 <programlisting><![CDATA[
-cg_diff file1 file2]]></programlisting>
+cg_annotate --diff file1 file2
+]]></programlisting>
 
 <para>
-It reads and checks <computeroutput>file1</computeroutput>, then read
-and checks <computeroutput>file2</computeroutput>, then computes the
-difference (effectively <computeroutput>file1</computeroutput> -
-<computeroutput>file2</computeroutput>).  The final results are written to
-standard output.</para>
+cg_annotate computes the difference between these two files (effectively
+<filename>file2</filename> - <filename>file1</filename>), and then
+produces output as usual that shows the count differences. Note that many of
+the counts may be negative; this indicates that the counts for the relevant
+file/function/line are smaller in the second version than those in the first
+version.
+</para>
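The direction of the subtraction, and why negative counts appear, can be shown with a small sketch. The dictionaries and the "file:function" keys are hypothetical stand-ins for profile data; this is not how cg_annotate stores or reads it.

```python
def diff_counts(first, second):
    """Return second - first for every key present in either profile.

    A key missing from one profile is treated as having a count of zero.
    """
    keys = set(first) | set(second)
    return {key: second.get(key, 0) - first.get(key, 0) for key in keys}


# Illustrative counts: f got cheaper, g disappeared, h is new.
old = {"prog.c:f": 100, "prog.c:g": 50}
new = {"prog.c:f": 80, "prog.c:h": 10}
delta = diff_counts(old, new)
```

Here `delta["prog.c:f"]` is negative because the second profile has the smaller count for that function.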
 
 <para>
-Costs are summed on a per-function basis.  Per-line costs are not summed,
-because doing so is too difficult.  For example, consider differencing two
-profiles, one from a single-file program A, and one from the same program A
-where a single blank line was inserted at the top of the file.  Every single
-per-line count has changed.  In comparison, the per-function counts have not
-changed.  The per-function count differences are still very useful for
-determining differences between programs.  Note that because the result is
-the difference of two profiles, many of the counts will be negative;  this
-indicates that the counts for the relevant function are fewer in the second
-version than those in the first version.</para>
+The simplest common scenario is comparing two Cachegrind output files that came
+from the same program, but on different inputs. cg_annotate will do a good job
+on this without assistance.
+</para>
 
 <para>
-cg_diff does not attempt to check
-that the input files come from runs of the same executable.  It will
-happily merge together profile files from completely unrelated
-programs.  It does however check that the
-<computeroutput>Events:</computeroutput> lines of all the inputs are
-identical, so as to ensure that the addition of costs makes sense.
-For example, it would be nonsensical for it to add a number indicating
-D1 read references to a number from a different file indicating LL
-write misses.</para>
+A more complex scenario is if you want to compare Cachegrind output files from
+two slightly different versions of a program that you have sitting
+side-by-side, running on the same input. For example, you might have
+<filename>version1/prog.c</filename> and <filename>version2/prog.c</filename>.
+A straight comparison of the two would not be useful. Because functions are
+always paired with filenames, a function <function>f</function> would be listed
+as <filename>version1/prog.c:f</filename> for the first version but
+<filename>version2/prog.c:f</filename> for the second version.
+</para>
 
 <para>
-A number of other syntax and sanity checks are done whilst reading the
-inputs.  cg_diff will stop and
-attempt to print a helpful error message if any of the input files
-fail these checks.</para>
-
-<para>
-Sometimes you will want to compare Cachegrind profiles of two versions of a
-program that you have sitting side-by-side.  For example, you might have
-<computeroutput>version1/prog.c</computeroutput> and
-<computeroutput>version2/prog.c</computeroutput>, where the second is
-slightly different to the first.  A straight comparison of the two will not
-be useful -- because functions are qualified with filenames, a function
-<function>f</function> will be listed as
-<computeroutput>version1/prog.c:f</computeroutput> for the first version but
-<computeroutput>version2/prog.c:f</computeroutput> for the second
-version.</para>
-
-<para>
-When this happens, you can use the <option>--mod-filename</option> option.
-Its argument is a Perl search-and-replace expression that will be applied
-to all the filenames in both Cachegrind output files.  It can be used to
-remove minor differences in filenames.  For example, the option
-<option>--mod-filename='s/version[0-9]/versionN/'</option> will suffice for
-this case.</para>
+In this case, use the <option>--mod-filename</option> option. Its argument is a
+search-and-replace expression that will be applied to all the filenames in both
+Cachegrind output files. It can be used to remove minor differences in
+filenames. For example, the option
+<option>--mod-filename='s/version[0-9]/versionN/'</option> will suffice for the
+above example.
+</para>
 
 <para>
 Similarly, sometimes compilers auto-generate certain functions and give them
-randomized names.  For example, GCC sometimes auto-generates functions with
-names like <function>T.1234</function>, and the suffixes vary from build to
-build.  You can use the <option>--mod-funcname</option> option to remove
-small differences like these;  it works in the same way as
-<option>--mod-filename</option>.</para>
+randomized names like <function>T.1234</function> where the suffixes vary from
+build to build. You can use the <option>--mod-funcname</option> option to
+remove small differences like these; it works in the same way as
+<option>--mod-filename</option>.
+</para>
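The effect of such a search-and-replace expression can be approximated in Python with `re.sub`. This is only a sketch of the idea: the `version[0-9]` pattern comes from the example above, while the `T\.[0-9]+` pattern for function names is a hypothetical illustration, and `count=1` is used on the assumption that an `s/.../.../` expression without a global flag replaces only the first match.

```python
import re


def normalize(name, pattern, replacement):
    """Replace the first match of pattern in name with replacement."""
    return re.sub(pattern, replacement, name, count=1)


# Filenames from two side-by-side versions map to the same key.
file_a = normalize("version1/prog.c", r"version[0-9]", "versionN")
file_b = normalize("version2/prog.c", r"version[0-9]", "versionN")

# Hypothetical pattern for auto-generated function names such as T.1234.
func_a = normalize("T.1234", r"T\.[0-9]+", "T.N")
func_b = normalize("T.5678", r"T\.[0-9]+", "T.N")
```

Once both names normalize to the same string, their counts can be paired up and compared.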
+
+<para>
+When <option>--mod-filename</option> is used to compare two different versions
+of the same program, cg_annotate will not annotate any file that is different
+between the two versions, because the per-line counts are not reliable in such
+a case. For example, suppose <filename>version2/prog.c</filename> is the
+same as <filename>version1/prog.c</filename> except with an extra blank line at
+the top of the file. Every single per-line count will have changed. In
+comparison, the per-file and per-function counts have not changed, and are
+still very useful for determining differences between programs. You might think
+that this means every interesting file will be left unannotated, but again
+inlining means that files that are identical in the two versions can have
+different counts on many lines.
+</para>
+
 
 </sect2>
 
+<sect2 id="cg-manual.cache-branch-sim" xreflabel="cache-branch-sim">
+<title>Cache and Branch Simulation</title>
+
+<para>
+Cachegrind can simulate how your program interacts with a machine's cache
+hierarchy and/or branch predictor.
+
+The cache simulation models a machine with independent first-level instruction
+and data caches (I1 and D1), backed by a unified second-level cache (L2).
+However, some modern machines have three or four levels of cache. For these
+machines (in the cases where Cachegrind can auto-detect the cache
+configuration) Cachegrind simulates the first-level and last-level caches.
+Therefore, Cachegrind always refers to the I1, D1 and LL (last-level) caches.
+</para>
+
+<para>
+When simulating the cache, with <option>--cache-sim=yes</option>, Cachegrind
+gathers the following statistics:
+</para>
+
+<itemizedlist>
+  <listitem>
+    <para>
+    I cache reads (<computeroutput>Ir</computeroutput>, which equals the number
+    of instructions executed), I1 cache read misses
+    (<computeroutput>I1mr</computeroutput>) and LL cache instruction read
+    misses (<computeroutput>ILmr</computeroutput>).
+    </para>
+  </listitem>
+  <listitem>
+    <para>
+    D cache reads (<computeroutput>Dr</computeroutput>, which equals the number
+    of memory reads), D1 cache read misses
+    (<computeroutput>D1mr</computeroutput>), and LL cache data read misses
+    (<computeroutput>DLmr</computeroutput>).
+    </para>
+  </listitem>
+  <listitem>
+    <para>
+    D cache writes (<computeroutput>Dw</computeroutput>, which equals the
+    number of memory writes), D1 cache write misses
+    (<computeroutput>D1mw</computeroutput>), and LL cache data write misses
+    (<computeroutput>DLmw</computeroutput>).
+    </para>
+  </listitem>
+</itemizedlist>
+
+<para>
+Note that D1 total misses is given by <computeroutput>D1mr</computeroutput> +
+<computeroutput>D1mw</computeroutput>, and that LL total misses is given by
+<computeroutput>ILmr</computeroutput> + <computeroutput>DLmr</computeroutput> +
+<computeroutput>DLmw</computeroutput>.
+</para>
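The event names listed above can be combined into totals and miss rates. The following sketch uses made-up counts, not output from a real run, and the `miss_rate` helper is a hypothetical name introduced here.

```python
def miss_rate(misses, accesses):
    """Return misses / accesses, or 0.0 when there were no accesses."""
    return misses / accesses if accesses else 0.0


# Illustrative counts only; a real run would supply these.
counts = {
    "Ir": 1000, "I1mr": 10, "ILmr": 2,
    "Dr": 400, "D1mr": 20, "DLmr": 4,
    "Dw": 100, "D1mw": 5, "DLmw": 1,
}

# D1 misses are read misses plus write misses.
d1_misses = counts["D1mr"] + counts["D1mw"]
# LL misses combine instruction, data read and data write misses.
ll_misses = counts["ILmr"] + counts["DLmr"] + counts["DLmw"]

# D1 accesses are all memory reads plus all memory writes.
d1_rate = miss_rate(d1_misses, counts["Dr"] + counts["Dw"])
# I1 accesses equal the number of instructions executed.
i1_rate = miss_rate(counts["I1mr"], counts["Ir"])
```

Rates like these are often easier to compare across runs than raw miss counts, because they are not affected by how long the program ran.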
+
+<para>
+When simulating the branch predictor, with <option>--branch-sim=yes</option>,
+Cachegrind gathers the following statistics:
+</para>
+
+<itemizedlist>
+  <listitem>
+    <para>
+    Conditional branches executed (<computeroutput>Bc</computeroutput>) and
+    conditional branches mispredicted (<computeroutput>Bcm</computeroutput>).
+    </para>
+  </listitem>
+  <listitem>
+    <para>
+    Indirect branches executed (<computeroutput>Bi</computeroutput>) and
+    indirect branches mispredicted (<computeroutput>Bim</computeroutput>).
+    </para>
+  </listitem>
+</itemizedlist>
+
+<para>
+When cache and/or branch simulation is enabled, cg_annotate will print multiple
+counts per line of output. For example:
+</para>
+
+<programlisting><![CDATA[
+  Ir______________________ Bc____________________ Bcm__________________ Bi____________________ Bim______________  function:file
+
+>     8,547  (0.1%, 99.4%)     936  (0.1%, 99.1%)    177  (0.3%, 96.7%)      59  (0.0%, 99.9%) 38 (19.4%, 66.3%)  strcmp:
+      8,503  (0.1%)            928  (0.1%)           175  (0.3%)             59  (0.0%)        38 (19.4%)           ./string/../sysdeps/x86_64/multiarch/../multiarch/strcmp-sse2.S
+]]></programlisting>
+
+</sect2>
 
 </sect1>
 
 
-
 <sect1 id="cg-manual.cgopts" xreflabel="Cachegrind Command-line Options">
 <title>Cachegrind Command-line Options</title>
 
 <!-- start of xi:include in the manpage -->
-<para>Cachegrind-specific options are:</para>
+<para>
+Cachegrind-specific options are:
+</para>
 
 <variablelist id="cg.opts.list">
 
-  <varlistentry id="cg.opt.I1" xreflabel="--I1">
+  <varlistentry id="opt.cachegrind-out-file" xreflabel="--cachegrind-out-file">
     <term>
-      <option><![CDATA[--I1=<size>,<associativity>,<line size> ]]></option>
+      <option><![CDATA[--cachegrind-out-file=<file> ]]></option>
     </term>
     <listitem>
-      <para>Specify the size, associativity and line size of the level 1
-      instruction cache.  </para>
-    </listitem>
-  </varlistentry>
-
-  <varlistentry id="cg.opt.D1" xreflabel="--D1">
-    <term>
-      <option><![CDATA[--D1=<size>,<associativity>,<line size> ]]></option>
-    </term>
-    <listitem>
-      <para>Specify the size, associativity and line size of the level 1
-      data cache.</para>
-    </listitem>
-  </varlistentry>
-
-  <varlistentry id="cg.opt.LL" xreflabel="--LL">
-    <term>
-      <option><![CDATA[--LL=<size>,<associativity>,<line size> ]]></option>
-    </term>
-    <listitem>
-      <para>Specify the size, associativity and line size of the last-level
-      cache.</para>
+      <para>
+      Write the Cachegrind output file to <filename>file</filename> rather than
+      to the default output file,
+      <filename>cachegrind.out.&lt;pid&gt;</filename>. The <option>%p</option>
+      and <option>%q</option> format specifiers can be used to embed the
+      process ID and/or the contents of an environment variable in the name, as
+      is the case for the core option
+      <option><link linkend="opt.log-file">--log-file</link></option>.
+      </para>
     </listitem>
   </varlistentry>
 
   <varlistentry id="opt.cache-sim" xreflabel="--cache-sim">
     <term>
-      <option><![CDATA[--cache-sim=no|yes [yes] ]]></option>
+      <option><![CDATA[--cache-sim=no|yes [no] ]]></option>
     </term>
     <listitem>
-      <para>Enables or disables collection of cache access and miss
-            counts.</para>
+      <para>
+      Enables or disables collection of cache access and miss counts.
+      </para>
     </listitem>
   </varlistentry>
 
@@ -835,29 +869,45 @@ small differences like these;  it works in the same way as
       <option><![CDATA[--branch-sim=no|yes [no] ]]></option>
     </term>
     <listitem>
-      <para>Enables or disables collection of branch instruction and
-            misprediction counts.  By default this is disabled as it
-            slows Cachegrind down by approximately 25%.  Note that you
-            cannot specify <option>--cache-sim=no</option>
-            and <option>--branch-sim=no</option>
-            together, as that would leave Cachegrind with no
-            information to collect.</para>
+      <para>
+      Enables or disables collection of branch instruction and
+      misprediction counts.
+      </para>
     </listitem>
   </varlistentry>
 
-  <varlistentry id="opt.cachegrind-out-file" xreflabel="--cachegrind-out-file">
+  <varlistentry id="cg.opt.I1" xreflabel="--I1">
     <term>
-      <option><![CDATA[--cachegrind-out-file=<file> ]]></option>
+      <option><![CDATA[--I1=<size>,<associativity>,<line size> ]]></option>
     </term>
     <listitem>
-      <para>Write the profile data to 
-            <computeroutput>file</computeroutput> rather than to the default
-            output file,
-            <filename>cachegrind.out.&lt;pid&gt;</filename>.  The
-            <option>%p</option> and <option>%q</option> format specifiers
-            can be used to embed the process ID and/or the contents of an
-            environment variable in the name, as is the case for the core
-            option <option><link linkend="opt.log-file">--log-file</link></option>.
+      <para>
+      Specify the size, associativity and line size of the level 1 instruction
+      cache. Only useful with <option>--cache-sim=yes</option>.
+      </para>
+    </listitem>
+  </varlistentry>
+
+  <varlistentry id="cg.opt.D1" xreflabel="--D1">
+    <term>
+      <option><![CDATA[--D1=<size>,<associativity>,<line size> ]]></option>
+    </term>
+    <listitem>
+      <para>
+      Specify the size, associativity and line size of the level 1 data cache.
+      Only useful with <option>--cache-sim=yes</option>.
+      </para>
+    </listitem>
+  </varlistentry>
+
+  <varlistentry id="cg.opt.LL" xreflabel="--LL">
+    <term>
+      <option><![CDATA[--LL=<size>,<associativity>,<line size> ]]></option>
+    </term>
+    <listitem>
+      <para>
+      Specify the size, associativity and line size of the last-level cache.
+      Only useful with <option>--cache-sim=yes</option>.
       </para>
     </listitem>
   </varlistentry>
@@ -895,29 +945,65 @@ small differences like these;  it works in the same way as
 
   <varlistentry>
     <term>
-      <option><![CDATA[--show=A,B,C [default: all, using order in
-      cachegrind.out.<pid>] ]]></option>
+      <option><![CDATA[--diff ]]></option>
     </term>
     <listitem>
-      <para>Specifies which events to show (and the column
-      order). Default is to use all present in the
-      <filename>cachegrind.out.&lt;pid&gt;</filename> file (and
-      use the order in the file).  Useful if you want to concentrate on, for
-      example, I cache misses (<option>--show=I1mr,ILmr</option>), or data
-      read misses (<option>--show=D1mr,DLmr</option>), or LL data misses
-      (<option>--show=DLmr,DLmw</option>).  Best used in conjunction with
-      <option>--sort</option>.</para>
+      <para>Diff two Cachegrind output files.</para>
     </listitem>
   </varlistentry>
 
   <varlistentry>
     <term>
-      <option><![CDATA[--sort=A,B,C [default: order in
-      cachegrind.out.<pid>] ]]></option>
+      <option><![CDATA[--mod-filename <regex> [default: none]]]></option>
     </term>
     <listitem>
-      <para>Specifies the events upon which the sorting of the
-      function-by-function entries will be based.</para>
+      <para>
+      Specifies an <option>s/old/new/</option> search-and-replace expression
+      that is applied to all filenames. Useful when differencing, for removing
+      minor differences in paths between two different versions of a program
+      that are sitting in different directories. An <option>i</option> suffix
+      makes the regex case-insensitive, and a <option>g</option> suffix makes
+      it match multiple times.
+      </para>
+    </listitem>
+  </varlistentry>
+
+  <varlistentry>
+    <term>
+      <option><![CDATA[--mod-funcname <regex> [default: none]]]></option>
+    </term>
+    <listitem>
+      <para>
+      Like <option>--mod-filename</option>, but for function names. Useful for
+      removing minor differences in the randomized names of functions
+      auto-generated by some compilers.
+      </para>
+    </listitem>
+  </varlistentry>
+
+  <varlistentry>
+    <term>
+      <option><![CDATA[--show=A,B,C [default: all, using order in
+      the Cachegrind output file] ]]></option>
+    </term>
+    <listitem>
+      <para>
+      Specifies which events to show (and the column order). Default is to use
+      all present in the Cachegrind output file (and use the order in the
+      file). Best used in conjunction with <option>--sort</option>.
+      </para>
+    </listitem>
+  </varlistentry>
+
+  <varlistentry>
+    <term>
+      <option><![CDATA[--sort=A,B,C [default: order in the Cachegrind output file] ]]></option>
+    </term>
+    <listitem>
+      <para>
+      Specifies the events upon which the sorting of the file:function and
+      function:file entries will be based.
+      </para>
     </listitem>
   </varlistentry>
 
@@ -926,18 +1012,12 @@ small differences like these;  it works in the same way as
       <option><![CDATA[--threshold=X [default: 0.1%] ]]></option>
     </term>
     <listitem>
-      <para>Sets the threshold for the function-by-function
-      summary.  A function is shown if it accounts for more than X%
-      of the counts for the primary sort event.  If auto-annotating, also
-      affects which files are annotated.</para>
-        
-      <para>Note: thresholds can be set for more than one of the
-      events by appending any events for the
-      <option>--sort</option> option with a colon
-      and a number (no spaces, though).  E.g. if you want to see
-      each function that covers more than 1% of LL read misses or 1% of LL
-      write misses, use this option:</para>
-      <para><option>--sort=DLmr:1,DLmw:1</option></para>
+      <para>
+      Sets the significance threshold for the file:function and function:file
+      sections. A file or function is shown if it accounts for more than X% of
+      the counts for the primary sort event. If annotating source files, this
+      also affects which files are annotated.
+      </para>
     </listitem>
   </varlistentry>
 
@@ -946,20 +1026,21 @@ small differences like these;  it works in the same way as
       <option><![CDATA[--show-percs, --no-show-percs, --show-percs=<no|yes> [default: yes] ]]></option>
     </term>
     <listitem>
-      <para>When enabled, a percentage is printed next to all event counts.
-      This helps gauge the relative importance of each function and line.
+      <para>
+      When enabled, a percentage is printed next to all event counts. This
+      helps gauge the relative importance of each function and line.
       </para>
     </listitem>
   </varlistentry>
 
   <varlistentry>
     <term>
-      <option><![CDATA[--auto, --no-auto, --auto=<no|yes> [default: yes] ]]></option>
+      <option><![CDATA[--annotate, --no-annotate, --annotate=<no|yes> [default: yes] ]]></option>
     </term>
     <listitem>
-      <para>When enabled, automatically annotates every file that
-      is mentioned in the function-by-function summary that can be
-      found.  Also gives a list of those that couldn't be found.</para>
+      <para>
+      Enables or disables source file annotation.
+      </para>
     </listitem>
   </varlistentry>
 
@@ -968,21 +1049,10 @@ small differences like these;  it works in the same way as
       <option><![CDATA[--context=N [default: 8] ]]></option>
     </term>
     <listitem>
-      <para>Print N lines of context before and after each
-      annotated line.  Avoids printing large sections of source
-      files that were not executed.  Use a large number
-      (e.g. 100000) to show all source lines.</para>
-    </listitem>
-  </varlistentry>
-
-  <varlistentry>
-    <term>
-      <option><![CDATA[-I<dir> --include=<dir> [default: none] ]]></option>
-    </term>
-    <listitem>
-      <para>Adds a directory to the list in which to search for
-      files.  Multiple <option>-I</option>/<option>--include</option>
-      options can be given to add multiple directories.</para>
+      <para>
+      The number of lines of context to show before and after each annotated
+      line. Use a large number (e.g. 100000) to show all source lines.
+      </para>
     </listitem>
   </varlistentry>
 
@@ -995,6 +1065,8 @@ small differences like these;  it works in the same way as
 <sect1 id="cg-manual.mergeopts" xreflabel="cg_merge Command-line Options">
 <title>cg_merge Command-line Options</title>
 
+<para>Although cg_merge is deprecated, its options are listed here for completeness.</para>
+
 <!-- start of xi:include in the manpage -->
 <variablelist id="cg_merge.opts.list">
 
@@ -1003,8 +1075,9 @@ small differences like these;  it works in the same way as
       <option><![CDATA[-o outfile]]></option>
     </term>
     <listitem>
-      <para>Write the profile data to <computeroutput>outfile</computeroutput>
-            rather than to standard output.
+      <para>
+      Write the output to <computeroutput>outfile</computeroutput>
+      instead of standard output.
       </para>
     </listitem>
   </varlistentry>
@@ -1018,6 +1091,8 @@ small differences like these;  it works in the same way as
 <sect1 id="cg-manual.diffopts" xreflabel="cg_diff Command-line Options">
 <title>cg_diff Command-line Options</title>
 
+<para>Although cg_diff is deprecated, its options are listed here for completeness.</para>
+
 <!-- start of xi:include in the manpage -->
 <variablelist id="cg_diff.opts.list">
 
@@ -1044,10 +1119,10 @@ small differences like these;  it works in the same way as
       <option><![CDATA[--mod-filename=<expr> [default: none]]]></option>
     </term>
     <listitem>
-      <para>Specifies a Perl search-and-replace expression that is applied
-      to all filenames.  Useful for removing minor differences in paths
-      between two different versions of a program that are sitting in
-      different directories.</para>
+      <para>
+      Specifies an <option>s/old/new/</option> search-and-replace expression
+      that is applied to all filenames.
+      </para>
     </listitem>
   </varlistentry>
 
@@ -1056,9 +1131,9 @@ small differences like these;  it works in the same way as
       <option><![CDATA[--mod-funcname=<expr> [default: none]]]></option>
     </term>
     <listitem>
-      <para>Like <option>--mod-filename</option>, but for filenames.
-      Useful for removing minor differences in randomized names of
-      auto-generated functions generated by some compilers.</para>
+      <para>
+      Like <option>--mod-filename</option>, but for function names.
+      </para>
     </listitem>
   </varlistentry>
 
@@ -1068,99 +1143,6 @@ small differences like these;  it works in the same way as
 </sect1>
 
 
-
-
-<sect1 id="cg-manual.acting-on"
-       xreflabel="Acting on Cachegrind's Information">
-<title>Acting on Cachegrind's Information</title>
-<para>
-Cachegrind gives you lots of information, but acting on that information
-isn't always easy.  Here are some rules of thumb that we have found to be
-useful.</para>
-
-<para>
-First of all, the global hit/miss counts and miss rates are not that useful.
-If you have multiple programs or multiple runs of a program, comparing the
-numbers might identify if any are outliers and worthy of closer
-investigation.  Otherwise, they're not enough to act on.</para>
-
-<para>
-The function-by-function counts are more useful to look at, as they pinpoint
-which functions are causing large numbers of counts.  However, beware that
-inlining can make these counts misleading.  If a function
-<function>f</function> is always inlined, counts will be attributed to the
-functions it is inlined into, rather than itself.  However, if you look at
-the line-by-line annotations for <function>f</function> you'll see the
-counts that belong to <function>f</function>.  (This is hard to avoid, it's
-how the debug info is structured.)  So it's worth looking for large numbers
-in the line-by-line annotations.</para>
-
-<para>
-The line-by-line source code annotations are much more useful.  In our
-experience, the best place to start is by looking at the
-<computeroutput>Ir</computeroutput> numbers.  They simply measure how many
-instructions were executed for each line, and don't include any cache
-information, but they can still be very useful for identifying
-bottlenecks.</para>
-
-<para>
-After that, we have found that LL misses are typically a much bigger source
-of slow-downs than L1 misses.  So it's worth looking for any snippets of
-code with high <computeroutput>DLmr</computeroutput> or
-<computeroutput>DLmw</computeroutput> counts.  (You can use
-<option>--show=DLmr
---sort=DLmr</option> with cg_annotate to focus just on
-<literal>DLmr</literal> counts, for example.) If you find any, it's still
-not always easy to work out how to improve things.  You need to have a
-reasonable understanding of how caches work, the principles of locality, and
-your program's data access patterns.  Improving things may require
-redesigning a data structure, for example.</para>
-
-<para>
-Looking at the <computeroutput>Bcm</computeroutput> and
-<computeroutput>Bim</computeroutput> misses can also be helpful.
-In particular, <computeroutput>Bim</computeroutput> misses are often caused
-by <literal>switch</literal> statements, and in some cases these
-<literal>switch</literal> statements can be replaced with table-driven code.
-For example, you might replace code like this:</para>
-
-<programlisting><![CDATA[
-enum E { A, B, C };
-enum E e;
-int i;
-...
-switch (e)
-{
-    case A: i += 1; break;
-    case B: i += 2; break;
-    case C: i += 3; break;
-}
-]]></programlisting>
-
-<para>with code like this:</para>
-
-<programlisting><![CDATA[
-enum E { A, B, C };
-enum E e;
-int table[] = { 1, 2, 3 };
-int i;
-...
-i += table[e];
-]]></programlisting>
-
-<para>
-This is obviously a contrived example, but the basic principle applies in a
-wide variety of situations.</para>
-
-<para>
-In short, Cachegrind can tell you where some of the bottlenecks in your code
-are, but it can't tell you how to fix them.  You have to work that out for
-yourself.  But at least you have the information!
-</para>
-
-</sect1>
-
-
 <sect1 id="cg-manual.sim-details"
        xreflabel="Simulation Details">
 <title>Simulation Details</title>
@@ -1172,8 +1154,9 @@ use Cachegrind, but may be of interest to some people.
 <sect2 id="cache-sim" xreflabel="Cache Simulation Specifics">
 <title>Cache Simulation Specifics</title>
 
-<para>Specific characteristics of the cache simulation are as
-follows:</para>
+<para>
+The cache simulation approximates the hardware of an AMD Athlon CPU circa 2002.
+Its specific characteristics are as follows:</para>
 
 <itemizedlist>
 
@@ -1271,11 +1254,11 @@ need to specify it with the
 
 </itemizedlist>
 
-<para>If you are interested in simulating a cache with different
-properties, it is not particularly hard to write your own cache
-simulator, or to modify the existing ones in
-<computeroutput>cg_sim.c</computeroutput>. We'd be
-interested to hear from anyone who does.</para>
+<para>
+If you are interested in simulating a cache with different properties, it is
+not particularly hard to write your own cache simulator, or to modify the
+existing ones in <computeroutput>cg_sim.c</computeroutput>.
+</para>
 
 </sect2>
 
@@ -1324,19 +1307,38 @@ Architecture: A Quantitative Approach", 4th edition (2007), Section
 <sect2 id="cg-manual.annopts.accuracy" xreflabel="Accuracy">
 <title>Accuracy</title>
 
-<para>Valgrind's cache profiling has a number of
-shortcomings:</para>
+<para>
+Cachegrind's instruction counting has one shortcoming on x86/amd64:
+</para>
 
 <itemizedlist>
   <listitem>
-    <para>It doesn't account for kernel activity -- the effect of system
-    calls on the cache and branch predictor contents is ignored.</para>
+    <para>
+    When a <function>REP</function>-prefixed instruction executes, each
+    iteration is counted separately. In contrast, hardware counters count each
+    such instruction just once, no matter how many times it iterates. It is
+    arguable that Cachegrind's behaviour is more useful.
+    </para>
+  </listitem>
+</itemizedlist>
+
+<para>
+Cachegrind's cache profiling has a number of shortcomings:
+</para>
+
+<itemizedlist>
+  <listitem>
+    <para>
+    It doesn't account for kernel activity. The effect of system calls on the
+    cache and branch predictor contents is ignored.
+    </para>
   </listitem>
 
   <listitem>
-    <para>It doesn't account for other process activity.
-    This is probably desirable when considering a single
-    program.</para>
+    <para>
+    It doesn't account for other process activity. This is arguably desirable
+    when considering a single program.
+    </para>
   </listitem>
 
   <listitem>
@@ -1360,15 +1362,15 @@ shortcomings:</para>
   </listitem>
 
   <listitem>
-    <para>The x86/amd64 instructions <computeroutput>bts</computeroutput>,
+    <para>
+    The x86/amd64 instructions <computeroutput>bts</computeroutput>,
     <computeroutput>btr</computeroutput> and
-    <computeroutput>btc</computeroutput> will incorrectly be
-    counted as doing a data read if both the arguments are
-    registers, eg:</para>
-<programlisting><![CDATA[
+    <computeroutput>btc</computeroutput> will incorrectly be counted as doing a
+    data read if both the arguments are registers, e.g.:
+    <programlisting><![CDATA[
     btsl %eax, %edx]]></programlisting>
-
-    <para>This should only happen rarely.</para>
+    This should only happen rarely.
+    </para>
   </listitem>
 
   <listitem>
@@ -1387,13 +1389,12 @@ file names, can perturb the results.  Variations will be small, but
 don't expect perfectly repeatable results if your program changes at
 all.</para>
 
-<para>More recent GNU/Linux distributions do address space
-randomisation, in which identical runs of the same program have their
-shared libraries loaded at different locations, as a security measure.
-This also perturbs the results.</para>
-
-<para>While these factors mean you shouldn't trust the results to
-be super-accurate, they should be close enough to be useful.</para>
+<para>
+Many Linux distributions perform address space layout randomisation (ASLR), in
+which identical runs of the same program have their shared libraries loaded at
+different locations, as a security measure. This also perturbs the
+results.
+</para>
 
 </sect2>
 
diff --git a/cachegrind/docs/cg_annotate-manpage.xml b/cachegrind/docs/cg_annotate-manpage.xml
index 5790eb060..e4239c8a2 100644
--- a/cachegrind/docs/cg_annotate-manpage.xml
+++ b/cachegrind/docs/cg_annotate-manpage.xml
@@ -30,8 +30,9 @@
 <refsect1 id="cg_annotate-description">
 <title>Description</title>
 
-<para><command>cg_annotate</command> takes an output file produced by the
-Valgrind tool Cachegrind and prints the information in an easy-to-read form.
+<para>
+<command>cg_annotate</command> takes one or more Cachegrind output files and
+prints data about the profiled program in an easy-to-read form.
 </para>
 
 </refsect1>
diff --git a/cachegrind/docs/cg_diff-manpage.xml b/cachegrind/docs/cg_diff-manpage.xml
index daffdfbbb..fe14d14c6 100644
--- a/cachegrind/docs/cg_diff-manpage.xml
+++ b/cachegrind/docs/cg_diff-manpage.xml
@@ -14,7 +14,7 @@
 
 <refnamediv>
   <refname>cg_diff</refname>
-  <refpurpose>compares two Cachegrind output files</refpurpose>
+  <refpurpose>(deprecated) diffs two Cachegrind output files</refpurpose>
 </refnamediv>
 
 <refsynopsisdiv>
@@ -30,9 +30,10 @@
 <refsect1 id="cg_diff-description">
 <title>Description</title>
 
-<para><command>cg_diff</command> takes two output files produced by the
-Valgrind tool Cachegrind, computes the difference and prints the result
-in the same format that Cachegrinds outputs.
+<para>
+<command>cg_diff</command> diffs two Cachegrind output files into a single
+Cachegrind output file. It is deprecated because <command>cg_annotate</command>
+can now do much the same thing, but better.
 </para>
 
 </refsect1>
diff --git a/cachegrind/docs/cg_merge-manpage.xml b/cachegrind/docs/cg_merge-manpage.xml
index e4e97310e..48aef4d77 100644
--- a/cachegrind/docs/cg_merge-manpage.xml
+++ b/cachegrind/docs/cg_merge-manpage.xml
@@ -14,7 +14,7 @@
 
 <refnamediv>
   <refname>cg_merge</refname>
-  <refpurpose>merges multiple Cachegrind output files into one</refpurpose>
+  <refpurpose>(deprecated) merges multiple Cachegrind output files into one</refpurpose>
 </refnamediv>
 
 <refsynopsisdiv>
@@ -29,8 +29,10 @@
 <refsect1 id="cg_merge-description">
 <title>Description</title>
 
-<para><command>cg_merge</command> sums together the outputs of multiple
-Cachegrind runs into a single output file.
+<para>
+<command>cg_merge</command> sums together multiple Cachegrind output files into
+a single Cachegrind output file. It is deprecated because
+<command>cg_annotate</command> can now do much the same thing, but better.
 </para>
 
 </refsect1>
diff --git a/cachegrind/docs/concord.c b/cachegrind/docs/concord.c
new file mode 100644
index 000000000..7ebdbea65
--- /dev/null
+++ b/cachegrind/docs/concord.c
@@ -0,0 +1,532 @@
+/********************************************************************************
+**  Program: concord.c
+**  By: Nick Nethercote, 36448.  Any code taken from elsewhere as noted.
+**  For: 433-253 assignment 3.
+**  
+**  Program description:  This program is a tool for finding specific 
+**  occurrences of words in a text;  it can count the number of times a single
+**  word appears, or list the lines that a word, or multiple words, all appear
+**  on.  See the project specification for more detail.
+**  	The primary data structure used is a static hash table, of fixed size.
+**  Any collisions of words hashing to the same position in the table are
+**  dealt with via separate chaining.  Also, for each word, there is a 
+**  subsidiary linked list containing the line numbers that the word appears on.
+**  Thus there are linked lists within linked lists.
+**  	I have implemented the entire program within one file, partly because
+**  there isn't a great deal of code, and partly because I haven't yet done
+**  433-252, and thus don't know a great deal about .h files, makefiles, etc.
+*/
+
+#include <stdio.h>
+#include <ctype.h>
+#include <stdlib.h>
+#include <string.h>
+
+#define TRUE 1
+#define FALSE 0
+#define MAX_WORD_LENGTH 100 
+#define DUMMY_WORD_LENGTH 2
+#define TABLE_SIZE 997 
+#define BEFORE_WORD 1
+#define IN_WORD 2
+#define AFTER_WORD 3
+#define HASH_CONSTANT 256
+#define ARGS_NUMBER 1
+
+typedef struct word_node Word_Node;
+typedef struct line_node Line_Node;
+typedef struct word_info Word_Info;
+typedef struct arg_node Arg_Node;
+
+/*  Linked list node for storing each word */
+struct word_node {
+    char *word;		    /* The actual word */
+    int number;		    /* The number of occurrences */	
+    Line_Node *line_list;   /* Points to the linked list of line numbers */
+    Line_Node *last_line;   /* Points to the last line node, for easy append */
+    Word_Node *next_word;   /* Next node in list */
+};
+
+/*  Subsidiary linked list node for storing line numbers */
+struct line_node {
+    int line;
+    Line_Node *next_line;
+};
+
+/*  Structure used when reading each word, and its line number, from file. */
+struct word_info {
+    char word[MAX_WORD_LENGTH];
+    int line;
+};
+
+/*  Linked list node used for holding multiple arguments from the program's
+**  internal command line.  Also, can point to a list of line numbers;  this
+**  is used when displaying line numbers.  
+*/ 
+struct arg_node {
+    char *word;
+    Line_Node *line_list;
+    Arg_Node *next_arg;
+};
+
+int        hash(char *word);
+void      *create(int mem_size);
+void       init_hash_table(char *file_name, Word_Node *table[]);
+int        get_word(Word_Info *data, int line, FILE *file_ptr);
+void       insert(char *inword, int in_line, Word_Node *table[]);
+Word_Node *new_word_node(char *inword, int in_line);
+Line_Node *add_existing(Line_Node *curr, int in_line);
+void       interact(Word_Node *table[]);
+Arg_Node  *place_args_in_list(char command[]);
+Arg_Node  *append(char *word, Arg_Node *head);
+void       count(Arg_Node *head, Word_Node *table[]);
+void       list_lines(Arg_Node *head, Word_Node *table[]);
+void       intersection(Arg_Node *head);
+void       intersect_array(int master[], int size, Arg_Node *arg_head);
+void       kill_arg_list(Arg_Node *head);
+
+int main(int argc, char *argv[])
+{
+    /* The actual hash table, a fixed-size array of pointers to word nodes */
+    Word_Node *table[TABLE_SIZE];
+
+    /* Checking command line input for one file name */
+    if (argc != ARGS_NUMBER + 1) {
+	fprintf(stderr, "%s requires %d argument\n", argv[0], ARGS_NUMBER); 
+	exit(EXIT_FAILURE);
+    }
+
+    init_hash_table(argv[1], table);
+    interact(table);
+
+    /* Nb:  I am not freeing the dynamic memory in the hash table, having been
+    ** told this is not necessary. */
+    return 0;
+}
+
+/* General dynamic allocation function that allocates and then checks. */
+void *create(int mem_size)
+{
+    void *dyn_block;
+
+    dyn_block = malloc(mem_size);
+    if (!(dyn_block)) {
+        fprintf(stderr, "Couldn't allocate enough memory to continue.\n");
+        exit(EXIT_FAILURE);
+    }
+
+    return dyn_block;
+}
+
+/* Function returns a hash value on a word.  Almost identical to the hash
+** function presented in Sedgewick.
+*/
+int hash(char *word)
+{
+    int hash_value = 0;
+
+    for ( ; *word; word++)
+        hash_value = (HASH_CONSTANT * hash_value + *word) % TABLE_SIZE;
+
+    return hash_value;
+}
+
+/* Function builds the hash table from the given file. */
+void init_hash_table(char *file_name, Word_Node *table[])
+{
+    FILE *file_ptr;
+    Word_Info *data;
+    int line = 1, i;
+
+    /* Structure used when reading in words and line numbers. */
+    data = (Word_Info *) create(sizeof(Word_Info));
+
+    /* Initialise entire table to NULL. */
+    for (i = 0; i < TABLE_SIZE; i++)
+        table[i] = NULL;
+
+    /* Open file, check it. */
+    file_ptr = fopen(file_name, "r");
+    if (!(file_ptr)) {
+        fprintf(stderr, "Couldn't open '%s'.\n", file_name);
+        exit(EXIT_FAILURE);
+    }
+
+    /*  'Get' the words and lines one at a time from the file, and insert them
+    ** into the table one at a time. */
+    while ((line = get_word(data, line, file_ptr)) != EOF)
+        insert(data->word, data->line, table);
+
+    free(data);
+    fclose(file_ptr);
+}
+
+/* Function reads the next word, and its line number, and places them in the
+** structure 'data', via a pointer.
+*/
+int get_word(Word_Info *data, int line, FILE *file_ptr)
+{
+    int index = 0, pos = BEFORE_WORD;
+
+    /* Only alphabetic characters are read, apostrophes are ignored, and other
+    ** characters are considered separators.  'pos' helps keep track whether
+    ** the current file position is inside a word or between words.
+    */
+    while ((data->word[index] = tolower(fgetc(file_ptr))) != EOF) {
+        if (data->word[index] == '\n')
+            line++;
+        if (islower(data->word[index])) {
+            if (pos == BEFORE_WORD) {
+                pos = IN_WORD;
+                data->line = line;
+            }
+            index++;
+        }
+        else if ((pos == IN_WORD) && (data->word[index] != '\'')) {
+            break;
+        }
+    }
+    /* Signals end of file has been reached. */
+    if (data->word[index] == EOF)
+        line = EOF;
+
+    /* Adding the null character. */
+    data->word[index] = '\0';
+
+    return line;
+}
+
+/* Function inserts a word and its line number into the hash table. */
+void insert(char *inword, int in_line, Word_Node *table[])
+{
+    int position = hash(inword);
+    Word_Node *curr, *prev = NULL;
+    char dummy_word[DUMMY_WORD_LENGTH] = "A";
+
+    /* The case where that hash position hasn't been used before; a new word
+    ** node is created. 
+    */
+    if (table[position] == NULL)
+        table[position] = new_word_node(dummy_word, 0);
+    curr = table[position];
+
+    /* Traverses that position's list of words until the current word is found
+    ** (i.e. it's come up before) or the list end is reached (i.e. it's the
+    ** first occurrence of the word).
+    */
+    while ((curr != NULL) && (strcmp(inword, curr->word) > 0)) {
+        prev = curr;
+        curr = curr->next_word;
+    }
+
+    /* If the word hasn't appeared before, it's inserted alphabetically into
+    ** the list.
+    */
+    if ((curr == NULL) || (strcmp(curr->word, inword) != 0)) {
+        prev->next_word = new_word_node(inword, in_line);
+        prev->next_word->next_word = curr;
+    }
+    /* Otherwise, the word count is incremented, and the line number is added
+    ** to the existing list.
+    */
+    else {
+        (curr->number)++;
+        curr->last_line = add_existing(curr->last_line, in_line);
+    }
+}
+
+/* Function creates a new node for when a word is inserted for the first time.
+*/
+Word_Node *new_word_node(char *inword, int in_line)
+{
+    Word_Node *new;
+
+    new = (Word_Node *) create(sizeof(Word_Node));
+    new->word = (char *) create(sizeof(char) * (strlen(inword) + 1));
+    new->word = strcpy(new->word, inword);
+    /* The word count is set to 1, as this is the first occurrence! */
+    new->number = 1;
+    new->next_word = NULL;
+    /* One line number node is added. */
+    new->line_list = (Line_Node *) create(sizeof(Line_Node));
+    new->line_list->line = in_line;
+    new->line_list->next_line = NULL;
+    new->last_line = new->line_list;
+
+    return new;
+}
+
+/* Function adds a line number to the line number list of a word that has
+** already been inserted at least once.  The pointer 'last_line', part of
+** the word node structure, allows easy appending to the list.
+*/
+Line_Node *add_existing(Line_Node *last_line, int in_line)
+{
+    /* Check to see if that line has already occurred - multiple occurrences on
+    ** the one line are only recorded once.  (Nb:  They are counted twice, but
+    ** only listed once.)
+    */
+    if (last_line->line != in_line) {
+        last_line->next_line = (Line_Node *) create(sizeof(Line_Node));
+        last_line = last_line->next_line;
+        last_line->line = in_line;
+        last_line->next_line = NULL;
+    }
+
+    return last_line;
+}
+
+/*  Function controls the interactive command line part of the program. */
+void interact(Word_Node *table[])
+{
+    char args[MAX_WORD_LENGTH];     /* Array to hold command line */
+    Arg_Node *arg_list = NULL;      /* List that holds processed arguments */ 
+    int not_quitted = TRUE;         /* Quit flag */
+
+    /* The prompt (?) is displayed.  Commands are read into an array, and then
+    ** individual arguments are placed into a linked list for easy use. 
+    ** The first argument (actually the command) is looked at to determine
+    ** what action should be performed.  'arg_list->next_arg' is passed to
+    ** count() and list_lines(), because the actual 'c' or 'l' is not needed
+    ** by them.  Lastly, the argument linked list is freed, by 'kill_arg_list'.  
+    */ 
+    do {
+        printf("?");		     
+        fgets(args, MAX_WORD_LENGTH - 1, stdin);
+        arg_list = place_args_in_list(args);
+        if (arg_list) {
+            if (strcmp(arg_list->word, "c") == 0)
+		count(arg_list->next_arg, table);
+            else if (strcmp(arg_list->word, "l") == 0)
+               	list_lines(arg_list->next_arg, table); 
+            else if (strcmp(arg_list->word, "q") == 0) {
+               	printf("Quitting concord\n");
+		not_quitted = FALSE;
+	    }
+            else
+               	printf("Not a valid command.\n");
+	    kill_arg_list(arg_list);
+        }
+    } while (not_quitted);	/* Quits on flag */
+}
+
+/* Function takes an array containing a command line, and parses it, placing
+** actual word into a linked list.
+*/
+Arg_Node *place_args_in_list(char command[])
+{
+    int index1 = 0, index2 = 0, pos = BEFORE_WORD;
+    char token[MAX_WORD_LENGTH], c;
+    Arg_Node *head = NULL;
+
+    /* Non alphabetic characters are discarded.  Alphabetic characters are
+    ** copied into the array 'token'.  Once the current word has been copied
+    ** into 'token', 'append' is called, copying 'token' to a new node in the
+    ** linked list.
+    */
+    while (command[index1] != '\0') {
+        c = tolower(command[index1++]);
+        if (islower(c)) {
+            token[index2++] = c;
+            pos = IN_WORD;
+        }
+        else if (c == '\'')
+            token[index2] = c;
+        else if (pos == IN_WORD) {
+            pos = BEFORE_WORD;
+            token[index2] = '\0';
+            head = append(token, head);
+            index2 = 0;
+        }
+    }
+
+    return head;
+}
+
+/* Function takes a word, and appends a new node containing that word to the
+** list.
+*/
+Arg_Node *append(char *word, Arg_Node *head)
+{
+    Arg_Node *curr = head,
+             *new = (Arg_Node *) create(sizeof(Arg_Node));
+
+    new->word = (char *) create(sizeof(char) * (strlen(word) + 1));
+    strcpy(new->word, word);
+    new->line_list = NULL;
+    new->next_arg = NULL;
+
+    if (head == NULL)
+        return new;
+
+    while (curr->next_arg != NULL)
+        curr = curr->next_arg;
+    curr->next_arg = new;
+
+    return head;
+}
+
+
+/* Function displays the number of times a word has occurred. */
+void count(Arg_Node *arg_list, Word_Node *table[])
+{
+    int hash_pos = 0;		/* Only initialised to avoid gnuc warnings */
+    Word_Node *curr_word = NULL;  
+
+    /* Checking for the right number of arguments (one). */
+    if (arg_list) {
+        if (arg_list->next_arg != NULL) {
+	    printf("c requires only one argument\n");
+	    return;
+	}
+        hash_pos = hash(arg_list->word);
+    }
+    else
+	return;    
+
+    /* Finds if the supplied word is in the table, first by hashing to its
+    ** would-be position, and then traversing the list of words.  If present,
+    ** its count is displayed; otherwise '0' is printed.
+    */
+    if (table[hash_pos]) {
+	curr_word = table[hash_pos]->next_word;
+	while ((curr_word != NULL) &&
+	       (strcmp(arg_list->word, curr_word->word) != 0)) 
+	    curr_word = curr_word->next_word; 	
+        if (curr_word) 
+  	    printf("%d\n", curr_word->number);
+        else
+	    printf("0\n");
+    }
+    else
+	printf("0\n");
+}
+
+/* Function that takes each node in the argument list, and directs a pointer
+** to that word's list of lines, which are present in the hash table. 
+*/
+void list_lines(Arg_Node *arg_head, Word_Node *table[])
+{
+    int hash_pos = 0;		/* Only initialised to avoid gnuc warnings */
+    Word_Node *curr_word;
+    Arg_Node *curr_arg = arg_head;
+
+    /* For each word in the list of arguments, the word is looked for in the 
+    ** hash table.  Each argument node has a pointer, and if the word is there,
+    ** that pointer is set to point at that word's list of line numbers. 
+    */ 
+    while (curr_arg != NULL) {
+        hash_pos = hash(curr_arg->word);
+        if (table[hash_pos]) {
+            curr_word = table[hash_pos]->next_word;   /* Gets past dummy node */
+            while (curr_word != NULL && 
+		   strcmp(curr_arg->word, curr_word->word) != 0) 
+	        curr_word = curr_word->next_word;
+            if (curr_word) 
+	        curr_arg->line_list = curr_word->line_list;
+        }
+        curr_arg = curr_arg->next_arg;
+    }
+    /* An intersection is then performed, to determine which lines, if any, 
+    ** all the arguments appear on.
+    */
+    if (arg_head)
+        intersection(arg_head); 
+}
+
+/*  Function takes a list of line lists, and finds the lines that are common
+**  to each line list, by using a comparison array.
+*/
+void intersection(Arg_Node *arg_head)
+{
+    Line_Node *curr_line;
+    int *master, n = 0, index = 0, output = FALSE;
+ 
+    /* Find size of first list, for creating master array */
+    curr_line = arg_head->line_list;
+    while (curr_line) {
+        n++;
+        curr_line = curr_line->next_line;
+    }
+
+    /* The master comparison array is created. */ 
+    master = (int *) create(sizeof(int) * n);
+    curr_line = arg_head->line_list;
+ 
+    /*  Copy first list into master array */
+    while (curr_line) {
+        *(master + index++) = curr_line->line; 
+	curr_line = curr_line->next_line;
+    }
+
+    /* Perform the actual intersection. */
+    intersect_array(master, n, arg_head->next_arg);
+
+    /* Print the line numbers left in the processed array; the lines that
+    ** remain contain all the words specified in the command.
+    */
+    for (index = 0; index < n; index++)
+	if (*(master + index) != 0) { 
+	    printf("%d ", *(master + index));
+	    output = TRUE; 
+	}
+    /* 'Output' merely prevents an unnecessary newline when 'l' returns no 
+    ** answer. 
+    */
+    if (output)
+        printf("\n");
+
+    /* Deallocate dynamic memory for master array */
+    free(master);
+}
+
+/* Function takes the master array of line numbers, which is built from the
+** first word's line list in 'intersection'.  It then moves through the
+** remaining argument list.  For each word, each line number in master is
+** compared to the line numbers in that word's line list.  If there is no
+** match, then that position in the array is set to 0, because that line is
+** no longer in contention as an answer.
+*/
+void intersect_array(int master[], int size, Arg_Node *arg_head)
+{
+    int index = 0;
+    Line_Node *curr_line;
+
+    while (arg_head) {
+        index = 0;
+        curr_line = arg_head->line_list;
+    /* For each line in the list, any number less than that in the array will
+    ** be set to zero.  Any number equal to that in the list will remain.
+    ** This loop depends on the fact that both the line list, and the master 
+    ** array, are sorted. */ 
+        while (curr_line) {
+            while (index < size && *(master + index) < curr_line->line)
+                *(master + index++) = 0;
+            while (index < size && *(master + index) <= curr_line->line)
+                index++;
+            curr_line = curr_line->next_line;
+        }
+    /* Once the list of lines has been traversed, any array positions that 
+    ** haven't been examined are set to zero, as they are no longer in 
+    ** contention. 
+    */
+        for ( ; index < size; index++)
+            *(master + index) = 0;
+
+        arg_head = arg_head->next_arg;
+    }
+}
+
+/*  Function to free dynamic memory used by the arguments linked list. */
+void kill_arg_list(Arg_Node *head)
+{
+    Arg_Node *temp;
+
+    while (head != NULL) {
+        temp = head;
+        head = head->next_arg;
+        free(temp->word);
+        free(temp);
+    }
+}
+
diff --git a/cachegrind/docs/concord.cgann b/cachegrind/docs/concord.cgann
new file mode 100644
index 000000000..930e4dc7b
--- /dev/null
+++ b/cachegrind/docs/concord.cgann
@@ -0,0 +1,560 @@
+--------------------------------------------------------------------------------
+-- Metadata
+--------------------------------------------------------------------------------
+Invocation:       ../cg_annotate concord.cgout
+Command:          ./concord ../cg_main.c
+Events recorded:  Ir
+Events shown:     Ir
+Event sort order: Ir
+Threshold:        0.1%
+Annotation:       on
+
+--------------------------------------------------------------------------------
+-- Summary
+--------------------------------------------------------------------------------
+Ir________________ 
+
+8,195,056 (100.0%)  PROGRAM TOTALS
+
+--------------------------------------------------------------------------------
+-- File:function summary
+--------------------------------------------------------------------------------
+  Ir______________________  file:function
+
+< 3,078,746 (37.6%, 37.6%)  /home/njn/grind/ws1/cachegrind/docs/concord.c:
+  1,630,232 (19.9%)           get_word
+    630,918  (7.7%)           hash
+    461,095  (5.6%)           insert
+    130,560  (1.6%)           add_existing
+     91,014  (1.1%)           init_hash_table
+     88,056  (1.1%)           create
+     46,676  (0.6%)           new_word_node
+
+< 1,746,038 (21.3%, 58.9%)  ./malloc/./malloc/malloc.c:
+  1,285,938 (15.7%)           _int_malloc
+    458,225  (5.6%)           malloc
+
+< 1,107,550 (13.5%, 72.4%)  ./libio/./libio/getc.c:getc
+
+<   551,071  (6.7%, 79.1%)  ./string/../sysdeps/x86_64/multiarch/strcmp-avx2.S:__strcmp_avx2
+
+<   521,228  (6.4%, 85.5%)  ./ctype/../include/ctype.h:
+    260,616  (3.2%)           __ctype_tolower_loc
+    260,612  (3.2%)           __ctype_b_loc
+
+<   468,163  (5.7%, 91.2%)  ???:
+    468,151  (5.7%)           ???
+
+<   456,071  (5.6%, 96.8%)  /usr/include/ctype.h:get_word
+
+<    48,344  (0.6%, 97.3%)  ./string/../sysdeps/x86_64/multiarch/strcpy-avx2.S:__strcpy_avx2
+
+<    40,776  (0.5%, 97.8%)  ./elf/./elf/dl-lookup.c:
+     25,623  (0.3%)           do_lookup_x
+      9,515  (0.1%)           _dl_lookup_symbol_x
+
+<    37,412  (0.5%, 98.3%)  ./elf/./elf/dl-tunables.c:
+     36,500  (0.4%)           __GI___tunables_init
+
+<    23,366  (0.3%, 98.6%)  ./string/../sysdeps/x86_64/multiarch/strlen-avx2.S:__strlen_avx2
+
+<    22,107  (0.3%, 98.9%)  ./malloc/./malloc/arena.c:
+     22,023  (0.3%)           malloc
+
+<    16,539  (0.2%, 99.1%)  ./elf/./elf/dl-reloc.c:_dl_relocate_object
+
+<     9,160  (0.1%, 99.2%)  ./elf/../sysdeps/generic/dl-new-hash.h:_dl_lookup_symbol_x
+
+<     8,535  (0.1%, 99.3%)  ./string/../sysdeps/x86_64/multiarch/../multiarch/strcmp-sse2.S:
+      8,503  (0.1%)           strcmp
+
+--------------------------------------------------------------------------------
+-- Function:file summary
+--------------------------------------------------------------------------------
+  Ir______________________  function:file
+
+> 2,086,303 (25.5%, 25.5%)  get_word:
+  1,630,232 (19.9%)           /home/njn/grind/ws1/cachegrind/docs/concord.c
+    456,071  (5.6%)           /usr/include/ctype.h
+
+> 1,285,938 (15.7%, 41.1%)  _int_malloc:./malloc/./malloc/malloc.c
+
+> 1,107,550 (13.5%, 54.7%)  getc:./libio/./libio/getc.c
+
+>   630,918  (7.7%, 62.4%)  hash:/home/njn/grind/ws1/cachegrind/docs/concord.c
+
+>   551,071  (6.7%, 69.1%)  __strcmp_avx2:./string/../sysdeps/x86_64/multiarch/strcmp-avx2.S
+
+>   480,248  (5.9%, 74.9%)  malloc:
+    458,225  (5.6%)           ./malloc/./malloc/malloc.c
+     22,023  (0.3%)           ./malloc/./malloc/arena.c
+
+>   468,151  (5.7%, 80.7%)  ???:???
+
+>   461,095  (5.6%, 86.3%)  insert:/home/njn/grind/ws1/cachegrind/docs/concord.c
+
+>   260,616  (3.2%, 89.5%)  __ctype_tolower_loc:./ctype/../include/ctype.h
+
+>   260,612  (3.2%, 92.6%)  __ctype_b_loc:./ctype/../include/ctype.h
+
+>   130,560  (1.6%, 94.2%)  add_existing:/home/njn/grind/ws1/cachegrind/docs/concord.c
+
+>    91,014  (1.1%, 95.4%)  init_hash_table:/home/njn/grind/ws1/cachegrind/docs/concord.c
+
+>    88,056  (1.1%, 96.4%)  create:/home/njn/grind/ws1/cachegrind/docs/concord.c
+
+>    50,010  (0.6%, 97.0%)  new_word_node:
+     46,676  (0.6%)           /home/njn/grind/ws1/cachegrind/docs/concord.c
+
+>    48,344  (0.6%, 97.6%)  __strcpy_avx2:./string/../sysdeps/x86_64/multiarch/strcpy-avx2.S
+
+>    42,906  (0.5%, 98.1%)  __GI___tunables_init:
+     36,500  (0.4%)           ./elf/./elf/dl-tunables.c
+
+>    26,514  (0.3%, 98.5%)  do_lookup_x:
+     25,623  (0.3%)           ./elf/./elf/dl-lookup.c
+
+>    25,642  (0.3%, 98.8%)  _dl_relocate_object:
+     16,539  (0.2%)           ./elf/./elf/dl-reloc.c
+
+>    23,366  (0.3%, 99.1%)  __strlen_avx2:./string/../sysdeps/x86_64/multiarch/strlen-avx2.S
+
+>    18,675  (0.2%, 99.3%)  _dl_lookup_symbol_x:
+      9,515  (0.1%)           ./elf/./elf/dl-lookup.c
+      9,160  (0.1%)           ./elf/../sysdeps/generic/dl-new-hash.h
+
+>     8,547  (0.1%, 99.4%)  strcmp:
+      8,503  (0.1%)           ./string/../sysdeps/x86_64/multiarch/../multiarch/strcmp-sse2.S
+
+--------------------------------------------------------------------------------
+-- Annotated source file: ./ctype/../include/ctype.h
+--------------------------------------------------------------------------------
+Unannotated because one or more of these original files are unreadable:
+- ./ctype/../include/ctype.h
+
+--------------------------------------------------------------------------------
+-- Annotated source file: ./elf/../sysdeps/generic/dl-new-hash.h
+--------------------------------------------------------------------------------
+Unannotated because one or more of these original files are unreadable:
+- ./elf/../sysdeps/generic/dl-new-hash.h
+
+--------------------------------------------------------------------------------
+-- Annotated source file: ./elf/./elf/dl-lookup.c
+--------------------------------------------------------------------------------
+Unannotated because one or more of these original files are unreadable:
+- ./elf/./elf/dl-lookup.c
+
+--------------------------------------------------------------------------------
+-- Annotated source file: ./elf/./elf/dl-reloc.c
+--------------------------------------------------------------------------------
+Unannotated because one or more of these original files are unreadable:
+- ./elf/./elf/dl-reloc.c
+
+--------------------------------------------------------------------------------
+-- Annotated source file: ./elf/./elf/dl-tunables.c
+--------------------------------------------------------------------------------
+Unannotated because one or more of these original files are unreadable:
+- ./elf/./elf/dl-tunables.c
+
+--------------------------------------------------------------------------------
+-- Annotated source file: ./libio/./libio/getc.c
+--------------------------------------------------------------------------------
+Unannotated because one or more of these original files are unreadable:
+- ./libio/./libio/getc.c
+
+--------------------------------------------------------------------------------
+-- Annotated source file: ./malloc/./malloc/arena.c
+--------------------------------------------------------------------------------
+Unannotated because one or more of these original files are unreadable:
+- ./malloc/./malloc/arena.c
+
+--------------------------------------------------------------------------------
+-- Annotated source file: ./malloc/./malloc/malloc.c
+--------------------------------------------------------------------------------
+Unannotated because one or more of these original files are unreadable:
+- ./malloc/./malloc/malloc.c
+
+--------------------------------------------------------------------------------
+-- Annotated source file: ./string/../sysdeps/x86_64/multiarch/../multiarch/strcmp-sse2.S
+--------------------------------------------------------------------------------
+Unannotated because one or more of these original files are unreadable:
+- ./string/../sysdeps/x86_64/multiarch/../multiarch/strcmp-sse2.S
+
+--------------------------------------------------------------------------------
+-- Annotated source file: ./string/../sysdeps/x86_64/multiarch/strcmp-avx2.S
+--------------------------------------------------------------------------------
+Unannotated because one or more of these original files are unreadable:
+- ./string/../sysdeps/x86_64/multiarch/strcmp-avx2.S
+
+--------------------------------------------------------------------------------
+-- Annotated source file: ./string/../sysdeps/x86_64/multiarch/strcpy-avx2.S
+--------------------------------------------------------------------------------
+Unannotated because one or more of these original files are unreadable:
+- ./string/../sysdeps/x86_64/multiarch/strcpy-avx2.S
+
+--------------------------------------------------------------------------------
+-- Annotated source file: ./string/../sysdeps/x86_64/multiarch/strlen-avx2.S
+--------------------------------------------------------------------------------
+Unannotated because one or more of these original files are unreadable:
+- ./string/../sysdeps/x86_64/multiarch/strlen-avx2.S
+
+--------------------------------------------------------------------------------
+-- Annotated source file: /home/njn/grind/ws1/cachegrind/docs/concord.c
+--------------------------------------------------------------------------------
+Ir____________ 
+
+-- line 81 ----------------------------------------
+      .         Arg_Node  *append(char *word, Arg_Node *head);
+      .         void       count(Arg_Node *head, Word_Node *table[]);
+      .         void       list_lines(Arg_Node *head, Word_Node *table[]);
+      .         void       intersection(Arg_Node *head);
+      .         void       intersect_array(int master[], int size, Arg_Node *arg_head);
+      .         void       kill_arg_list(Arg_Node *head);
+      .         
+      .         int main(int argc, char *argv[])
+      8 (0.0%)  {
+      .             /* The actual hash table, a fixed-size array of pointers to word nodes */
+      .             Word_Node *table[TABLE_SIZE];
+      .         
+      .             /* Checking command line input for one file name */
+      2 (0.0%)      if (argc != ARGS_NUMBER + 1) {
+      .         	fprintf(stderr, "%s requires %d argument\n", argv[0], ARGS_NUMBER); 
+      .         	exit(EXIT_FAILURE);
+      .             }
+      .         
+      4 (0.0%)      init_hash_table(argv[1], table);
+      2 (0.0%)      interact(table);
+      .         
+      .             /* Nb:  I am not freeing the dynamic memory in the hash table, having been
+      .             ** told this is not necessary. */
+      .             return 0;
+      7 (0.0%)  }
+      .         
+      .         /* General dynamic allocation function that allocates and then checks. */
+      .         void *create(int mem_size)
+ 22,014 (0.3%)  {
+      .             void *dyn_block;
+      .         
+ 22,014 (0.3%)      dyn_block = malloc(mem_size);
+ 22,014 (0.3%)      if (!(dyn_block)) {
+      .                 fprintf(stderr, "Couldn't allocate enough memory to continue.\n");
+      .                 exit(EXIT_FAILURE);
+      .             }
+      .         
+      .             return dyn_block;
+ 22,014 (0.3%)  }
+      .         
+      .         /* Function returns a hash value on a word.  Almost identical to the hash
+      .         ** function presented in Sedgewick.
+      .         */
+      .         int hash(char *word)
+  7,908 (0.1%)  {
+  7,908 (0.1%)      int hash_value = 0;
+      .         
+161,292 (2.0%)      for ( ; *word; word++)
+453,810 (5.5%)          hash_value = (HASH_CONSTANT * hash_value + *word) % TABLE_SIZE;
+      .         
+      .             return hash_value;
+      .         }
+      .         
+      .         /* Function builds the hash table from the given file. */
+      .         void init_hash_table(char *file_name, Word_Node *table[])
+      8 (0.0%)  {
+      .             FILE *file_ptr;
+      .             Word_Info *data;
+      2 (0.0%)      int line = 1, i;
+      .         
+      .             /* Structure used when reading in words and line numbers. */
+      3 (0.0%)      data = (Word_Info *) create(sizeof(Word_Info));
+      .         
+      .             /* Initialise entire table to NULL. */
+  2,993 (0.0%)      for (i = 0; i < TABLE_SIZE; i++)
+    997 (0.0%)          table[i] = NULL;
+      .         
+      .             /* Open file, check it. */
+      4 (0.0%)      file_ptr = fopen(file_name, "r");
+      2 (0.0%)      if (!(file_ptr)) {
+      .                 fprintf(stderr, "Couldn't open '%s'.\n", file_name);
+      .                 exit(EXIT_FAILURE);
+      .             }
+      .         
+      .             /*  'Get' the words and lines one at a time from the file, and insert them
+      .             ** into the table one at a time. */
+ 55,363 (0.7%)      while ((line = get_word(data, line, file_ptr)) != EOF)
+ 31,632 (0.4%)          insert(data->word, data->line, table);
+      .         
+      2 (0.0%)      free(data);
+      2 (0.0%)      fclose(file_ptr);
+      6 (0.0%)  }
+      .         
+      .         /* Function reads the next word, and it's line number, and places them in the 
+      .         ** structure 'data', via a pointer.
+      .         */
+      .         int get_word(Word_Info *data, int line, FILE *file_ptr)
+ 86,999 (1.1%)  {
+ 15,818 (0.2%)      int index = 0, pos = BEFORE_WORD;
+      .         
+      .             /* Only alphabetic characters are read, apostrophes are ignored, and other
+      .             ** characters are considered separators.  'pos' helps keep track whether
+      .             ** the current file position is inside a word or between words.
+      .             */
+529,133 (6.5%)      while ((data->word[index] = tolower(fgetc(file_ptr))) != EOF) {
+      .                 if (data->word[index] == '\n')
+260,608 (3.2%)              line++;
+390,912 (4.8%)          if (islower(data->word[index])) {
+ 64,830 (0.8%)              if (pos == BEFORE_WORD) {
+ 15,816 (0.2%)                  pos = IN_WORD;
+  7,908 (0.1%)                  data->line = line;
+      .                     }
+ 32,415 (0.4%)              index++;
+      .                 }
+146,702 (1.8%)          else if ((pos == IN_WORD) && (data->word[index] != '\'')) {
+      .                     break;
+      .                 }
+      .             }
+      .             /* Signals end of file has been reached. */
+      .             if (data->word[index] == EOF)
+      1 (0.0%)          line = EOF;
+      .         
+      .             /* Adding the null character. */
+ 15,818 (0.2%)      data->word[index] = '\0';
+      .         
+      .             return line;
+ 63,272 (0.8%)  }
+      .         
+      .         /* Function inserts a word and it's line number into the hash table. */
+      .         void insert(char *inword, int in_line, Word_Node *table[])
+102,804 (1.3%)  {
+  7,908 (0.1%)      int position = hash(inword);
+      .             Word_Node *curr, *prev = NULL;
+  7,908 (0.1%)      char dummy_word[DUMMY_WORD_LENGTH] = "A";
+      .         
+      .             /* The case where that hash position hasn't been used before; a new word
+      .             ** node is created. 
+      .             */
+ 31,632 (0.4%)      if (table[position] == NULL)
+  3,185 (0.0%)          table[position] = new_word_node(dummy_word, 0);
+  7,908 (0.1%)      curr = table[position];
+      .         
+      .             /* Traverses that position's list of words until the current word is found
+      .             ** (i.e. it's come up before) or the list end is reached (i.e. it's the
+      .             ** first occurrence of the word).
+      .             */
+118,384 (1.4%)      while ((curr != NULL) && (strcmp(inword, curr->word) > 0)) {
+      .                 prev = curr;
+ 28,366 (0.3%)          curr = curr->next_word;
+      .             }
+      .         
+      .             /* If the word hasn't appeared before, it's inserted alphabetically into
+      .             ** the list.
+      .             */
+ 35,410 (0.4%)      if ((curr == NULL) || (strcmp(curr->word, inword) != 0)) {
+  4,120 (0.1%)          prev->next_word = new_word_node(inword, in_line);
+  1,030 (0.0%)          prev->next_word->next_word = curr;
+      .             }
+      .             /* Otherwise, the word count is incremented, and the line number is added
+      .             ** to the existing list.
+      .             */
+      .             else {
+  6,878 (0.1%)          (curr->number)++;
+ 27,512 (0.3%)          curr->last_line = add_existing(curr->last_line, in_line);
+      .             }
+ 78,050 (1.0%)  }
+      .         
+      .         /* Function creates a new node for when a word is inserted for the first time.
+      .         */
+      .         Word_Node *new_word_node(char *inword, int in_line)
+ 10,002 (0.1%)  {
+      .             Word_Node *new;
+      .         
+  5,001 (0.1%)      new = (Word_Node *) create(sizeof(Word_Node));
+  8,335 (0.1%)      new->word = (char *) create(sizeof(char) * (strlen(inword) + 1));
+  1,667 (0.0%)      new->word = strcpy(new->word, inword);
+      .             /* The word count is set to 1, as this is the first occurrence! */
+  1,667 (0.0%)      new->number = 1;
+  1,667 (0.0%)      new->next_word = NULL;
+      .             /* One line number node is added. */
+  5,001 (0.1%)      new->line_list = (Line_Node *) create(sizeof(Line_Node));
+  1,667 (0.0%)      new->line_list->line = in_line;
+  1,667 (0.0%)      new->line_list->next_line = NULL;
+  1,667 (0.0%)      new->last_line = new->line_list;
+      .         
+      .             return new;
+  8,335 (0.1%)  }
+      .         
+      .         /* Function adds a line number to the line number list of a word that has
+      .         ** already been inserted at least once.  The pointer 'last_line', part of
+      .         ** the word node structure, allows easy appending to the list.
+      .         */
+      .         Line_Node *add_existing(Line_Node *last_line, int in_line)
+ 34,390 (0.4%)  {
+      .             /* Check to see if that line has already occurred - multiple occurrences on
+      .             ** the one line are only recorded once.  (Nb:  They are counted twice, but
+      .             ** only listed once.)
+      .             */
+ 13,756 (0.2%)      if (last_line->line != in_line) {
+ 18,009 (0.2%)          last_line->next_line = (Line_Node *) create(sizeof(Line_Node));
+ 12,006 (0.1%)          last_line = last_line->next_line;
+  6,003 (0.1%)          last_line->line = in_line;
+  6,003 (0.1%)          last_line->next_line = NULL;
+      .             }
+      .         
+      .             return last_line;
+ 40,393 (0.5%)  }
+      .         
+      .         /*  Function controls the interactive command line part of the program. */
+      .         void interact(Word_Node *table[])
+     12 (0.0%)  {
+      .             char args[MAX_WORD_LENGTH];     /* Array to hold command line */
+      .             Arg_Node *arg_list = NULL;      /* List that holds processed arguments */ 
+      .             int not_quitted = TRUE;         /* Quit flag */
+      .         
+      .             /* The prompt (?) is displayed.  Commands are read into an array, and then
+      .             ** individual arguments are placed into a linked list for easy use. 
+      .             ** The first argument (actually the command) is looked at to determine
+      .             ** what action should be performed.  'arg_list->next_arg' is passed to
+      .             ** count() and list_lines(), because the actual 'c' or 'l' is not needed
+      .             ** by them.  Lastly, the argument linked list is freed, by 'kill_arg_list'.  
+      .             */ 
+      .             do {
+      .                 printf("?");		     
+      .                 fgets(args, MAX_WORD_LENGTH - 1, stdin);
+      3 (0.0%)          arg_list = place_args_in_list(args);
+      2 (0.0%)          if (arg_list) {
+      7 (0.0%)              if (strcmp(arg_list->word, "c") == 0)
+      .         		count(arg_list->next_arg, table);
+      6 (0.0%)              else if (strcmp(arg_list->word, "l") == 0)
+      .                        	list_lines(arg_list->next_arg, table); 
+      8 (0.0%)              else if (strcmp(arg_list->word, "q") == 0) {
+      .                        	printf("Quitting concord\n");
+      1 (0.0%)  		not_quitted = FALSE;
+      .         	    }
+      .                     else
+      .                        	printf("Not a valid command.\n");
+      2 (0.0%)  	    kill_arg_list(arg_list);
+      .                 }
+      2 (0.0%)      } while (not_quitted);	/* Quits on flag */
+     11 (0.0%)  }
+      .         
+      .         /* Function takes an array containing a command line, and parses it, placing
+      .         ** actual word into a linked list.
+      .         */
+      .         Arg_Node *place_args_in_list(char command[])
+     10 (0.0%)  {
+      2 (0.0%)      int index1 = 0, index2 = 0, pos = BEFORE_WORD;
+      .             char token[MAX_WORD_LENGTH], c;
+      1 (0.0%)      Arg_Node *head = NULL;
+      .         
+      .             /* Non alphabetic characters are discarded.  Alphabetic characters are
+      .             ** copied into the array 'token'.  Once the current word has been copied
+      .             ** into 'token', 'append' is called, copying 'token' to a new node in the
+      .             ** linked list.
+      .             */
+     12 (0.0%)      while (command[index1] != '\0') {
+      8 (0.0%)          c = tolower(command[index1++]);
+     11 (0.0%)          if (islower(c)) {
+      3 (0.0%)              token[index2++] = c;
+      4 (0.0%)              pos = IN_WORD;
+      .                 }
+      2 (0.0%)          else if (c == '\'')
+      .                     token[index2] = c;
+      2 (0.0%)          else if (pos == IN_WORD) {
+      1 (0.0%)              pos = BEFORE_WORD;
+      2 (0.0%)              token[index2] = '\0';
+      4 (0.0%)              head = append(token, head);
+      2 (0.0%)              index2 = 0;
+      .                 }
+      .             }
+      .         
+      .             return head;
+     11 (0.0%)  }
+      .         
+      .         /* Function takes a word, and appends a new node containing that word to the
+      .         ** list.
+      .         */
+      .         Arg_Node *append(char *word, Arg_Node *head)
+      6 (0.0%)  {
+      .             Arg_Node *curr = head,
+      3 (0.0%)               *new = (Arg_Node *) create(sizeof(Arg_Node));
+      .         
+      6 (0.0%)      new->word = (char *) create(sizeof(char) * (strlen(word) + 1));
+      .             strcpy(new->word, word);
+      1 (0.0%)      new->line_list = NULL;
+      1 (0.0%)      new->next_arg = NULL;
+      .         
+      2 (0.0%)      if (head == NULL)
+      .                 return new;
+      .         
+      .             while (curr->next_arg != NULL)
+      .                 curr = curr->next_arg;
+      .             curr->next_arg = new;
+      .         
+      .             return head;
+      5 (0.0%)  }
+      .         
+      .         
+      .         /* Function displays the number of times a word has occurred. */
+      .         void count(Arg_Node *arg_list, Word_Node *table[])
+      .         {
+      .             int hash_pos = 0;		/* Only initialised to avoid gnuc warnings */
+      .             Word_Node *curr_word = NULL;  
+      .         
+-- line 375 ----------------------------------------
+-- line 514 ----------------------------------------
+      .                     *(master + index) = 0;
+      .         
+      .                 arg_head = arg_head->next_arg;
+      .             }
+      .         }
+      .         
+      .         /*  Function to free dynamic memory used by the arguments linked list. */
+      .         void kill_arg_list(Arg_Node *head)
+      5 (0.0%)  {
+      .             Arg_Node *temp;
+      .         
+      4 (0.0%)      while (head != NULL) {
+      .                 temp = head;
+      2 (0.0%)          head = head->next_arg;
+      2 (0.0%)          free(temp->word);
+      2 (0.0%)          free(temp);
+      .             }
+      4 (0.0%)  }
+      .         
+
+--------------------------------------------------------------------------------
+-- Annotated source file: /usr/include/ctype.h
+--------------------------------------------------------------------------------
+Ir____________ 
+
+-- line 201 ----------------------------------------
+      .         #   define isblank(c)	__isctype((c), _ISblank)
+      .         #  endif
+      .         # endif
+      .         
+      .         # ifdef __USE_EXTERN_INLINES
+      .         __extern_inline int
+      .         __NTH (tolower (int __c))
+      .         {
+456,071 (5.6%)    return __c >= -128 && __c < 256 ? (*__ctype_tolower_loc ())[__c] : __c;
+      .         }
+      .         
+      .         __extern_inline int
+      .         __NTH (toupper (int __c))
+      .         {
+      .           return __c >= -128 && __c < 256 ? (*__ctype_toupper_loc ())[__c] : __c;
+      .         }
+      .         # endif
+-- line 217 ----------------------------------------
+
+--------------------------------------------------------------------------------
+-- Annotation summary
+--------------------------------------------------------------------------------
+Ir_______________ 
+
+3,534,817 (43.1%)    annotated: files known & above threshold & readable, line numbers known
+        0            annotated: files known & above threshold & readable, line numbers unknown
+        0          unannotated: files known & above threshold & two or more non-identical
+4,132,126 (50.4%)  unannotated: files known & above threshold & unreadable 
+   59,950  (0.7%)  unannotated: files known & below threshold
+  468,163  (5.7%)  unannotated: files unknown
+
diff --git a/cachegrind/docs/concord.cgout b/cachegrind/docs/concord.cgout
new file mode 100644
index 000000000..e14054df5
--- /dev/null
+++ b/cachegrind/docs/concord.cgout
@@ -0,0 +1,5573 @@
+cmd: ./concord ../cg_main.c
+events: Ir
+fl=./csu/../csu/libc-start.c
+fn=__libc_start_main@@GLIBC_2.34
+128 2
+134 3
+135 6
+138 1
+139 2
+142 3
+143 2
+144 11
+145 4
+242 12
+332 3
+333 3
+358 6
+361 2
+364 2
+371 2
+373 2
+381 4
+fl=./csu/../sysdeps/nptl/libc_start_call_main.h
+fn=(below main)
+29 9
+44 3
+46 2
+51 2
+52 2
+55 2
+58 6
+74 2
+fl=./csu/./csu/init-first.c
+fn=_init_first
+46 4
+51 2
+55 5
+61 1
+62 1
+63 2
+71 2
+72 2
+fl=./ctype/../include/ctype.h
+fn=__ctype_b_loc
+40 65242
+41 130484
+42 65242
+fn=__ctype_tolower_loc
+52 65243
+53 130486
+54 65243
+fl=./ctype/./ctype/ctype-info.c
+fn=__ctype_init
+29 1
+31 7
+33 4
+35 4
+36 1
+fl=./elf/../bits/stdlib-bsearch.h
+fn=intel_check_word.constprop.0
+27 102
+28 51
+29 792
+31 972
+32 370
+37 126
+fl=./elf/../elf/dl-tls.c
+fn=_dl_add_to_slotinfo
+1015 8
+1021 1
+1024 1
+1025 1
+1029 3
+1063 2
+1066 3
+1067 3
+1070 6
+fn=_dl_allocate_tls_init
+528 10
+529 2
+533 2
+535 1
+536 1
+539 2
+542 3
+554 1
+559 18
+565 5
+568 3
+569 2
+575 3
+576 2
+578 9
+581 2
+582 2
+585 2
+586 4
+588 2
+589 2
+598 1
+606 2
+608 8
+614 1
+620 2
+623 2
+626 9
+fn=_dl_allocate_tls_storage
+370 1
+374 1
+375 1
+376 2
+379 1
+385 2
+422 4
+424 1
+435 1
+436 1
+437 2
+445 6
+446 1
+450 305
+469 1
+475 5
+fn=_dl_assign_tls_modid
+131 2
+134 2
+147 1
+186 1
+188 1
+191 1
+192 2
+fn=_dl_count_modids
+197 1
+199 2
+200 1
+215 2
+fn=_dl_determine_tlsoffset
+221 8
+222 1
+223 1
+224 1
+227 3
+230 2
+264 1
+266 10
+268 5
+270 3
+271 2
+273 2
+275 5
+286 1
+291 9
+293 2
+294 1
+306 1
+307 7
+309 2
+359 1
+360 8
+fn=_dl_tls_static_surplus_init
+83 3
+97 6
+101 6
+102 5
+108 3
+110 4
+113 1
+115 1
+117 3
+118 5
+fl=./elf/../include/list.h
+fn=__tls_init_tp
+43 2
+44 2
+45 1
+47 1
+fl=./elf/../include/rtld-malloc.h
+fn=__minimal_calloc
+56 8
+fn=_dl_allocate_tls_storage
+44 2
+56 1
+fn=_dl_check_map_versions
+44 9
+fn=_dl_find_object_init
+56 2
+fn=_dl_important_hwcaps
+56 3
+fn=_dl_init_paths
+56 5
+fn=_dl_map_object_deps
+56 6
+fn=_dl_new_object
+44 9
+56 6
+fn=fillin_rpath.isra.0
+50 4
+51 2
+56 4
+fn=init_tls
+44 2
+fl=./elf/../include/scratch_buffer.h
+fn=_dl_map_object_deps
+77 2
+78 1
+85 3
+131 8
+fl=./elf/../misc/sbrk.c
+fn=sbrk
+37 5
+40 1
+58 2
+62 2
+78 5
+fl=./elf/../nptl/nptl-stack.h
+fn=__libc_early_init
+58 8
+fl=./elf/../sysdeps/generic/_itoa.h
+fn=_dl_check_map_versions
+76 6
+fl=./elf/../sysdeps/generic/dl-cache.h
+fn=_dl_load_cache_lookup
+125 5
+194 1
+195 5
+fl=./elf/../sysdeps/generic/dl-debug.h
+fn=dl_main
+29 4
+30 1
+fl=./elf/../sysdeps/generic/dl-hash.h
+fn=__rtld_malloc_init_real
+43 1
+44 1
+45 19
+48 18
+62 24
+67 1
+72 1
+fl=./elf/../sysdeps/generic/dl-new-hash.h
+fn=_dl_lookup_symbol_x
+69 106
+70 212
+74 783
+77 1566
+80 727
+81 1454
+87 50
+88 50
+89 100
+90 50
+99 677
+101 677
+102 1354
+104 677
+105 677
+fl=./elf/../sysdeps/generic/dl-protected.h
+fn=do_lookup_x
+29 396
+fl=./elf/../sysdeps/generic/ldsodefs.h
+fn=_dl_relocate_object
+80 27
+138 321
+fn=_dl_start
+80 6
+fn=do_lookup_x
+137 198
+138 297
+fl=./elf/../sysdeps/nptl/dl-mutex.c
+fn=__rtld_mutex_init
+30 4
+37 1
+40 7
+44 2
+45 9
+47 6
+51 2
+52 7
+53 4
+fl=./elf/../sysdeps/nptl/dl-tls_init_tp.c
+fn=__tls_init_tp
+68 3
+69 1
+72 1
+75 4
+76 2
+77 1
+82 2
+87 4
+90 1
+93 3
+106 4
+123 3
+130 2
+131 3
+fn=__tls_pre_init_tp
+53 3
+56 10
+61 2
+62 1
+64 1
+fn=rtld_mutex_dummy
+42 8
+44 16
+fl=./elf/../sysdeps/nptl/pthread_early_init.h
+fn=__libc_early_init
+33 6
+34 1
+38 5
+45 2
+46 1
+52 7
+53 1
+54 1
+57 1
+fl=./elf/../sysdeps/posix/dl-fileid.h
+fn=_dl_map_object_from_fd
+37 10
+40 8
+49 16
+fl=./elf/../sysdeps/unix/sysv/linux/brk.c
+fn=brk
+36 1
+37 1
+38 2
+44 1
+45 1
+fl=./elf/../sysdeps/unix/sysv/linux/brk_call.h
+fn=brk
+24 2
+fl=./elf/../sysdeps/unix/sysv/linux/dl-osinfo.h
+fn=dl_main
+39 2
+52 1
+fl=./elf/../sysdeps/unix/sysv/linux/dl-parse_auxv.h
+fn=_dl_sysdep_parse_arguments
+32 2
+33 1
+34 1
+41 85
+42 40
+43 40
+45 1
+46 2
+47 3
+48 2
+49 2
+50 2
+51 2
+52 2
+54 2
+fl=./elf/../sysdeps/unix/sysv/linux/dl-sysdep.c
+fn=_dl_sysdep_parse_arguments
+78 5
+79 2
+80 2
+81 3
+82 82
+83 167
+86 2
+90 56
+93 2
+94 2
+95 2
+96 5
+fn=_dl_sysdep_start
+102 4
+103 1
+106 2
+110 2
+113 1
+115 2
+122 3
+123 2
+125 5
+137 3
+140 5
+143 4
+fn=_dl_sysdep_start_cleanup
+147 1
+148 1
+fl=./elf/../sysdeps/unix/sysv/linux/dl-vdso-setup.h
+fn=dl_main
+30 1
+33 1
+36 1
+39 1
+45 1
+fl=./elf/../sysdeps/unix/sysv/linux/dl-vdso.h
+fn=dl_main
+40 2
+41 5
+55 2
+fl=./elf/../sysdeps/unix/sysv/linux/rseq-internal.h
+fn=__tls_init_tp
+32 3
+34 7
+37 2
+40 1
+fl=./elf/../sysdeps/unix/sysv/linux/x86/cpu-features.c
+fn=init_cpu_features.constprop.0
+27 6
+fl=./elf/../sysdeps/unix/sysv/linux/x86/dl-minsigstacksize.h
+fn=get_common_indices.constprop.0
+24 2
+27 2
+28 2
+65 3
+72 1
+74 1
+fl=./elf/../sysdeps/x86/cpu-features.c
+fn=get_common_indices.constprop.0
+325 1
+329 4
+332 5
+336 1
+337 4
+338 4
+339 4
+340 3
+341 2
+348 2
+350 9
+355 8
+362 2
+363 7
+369 2
+384 2
+fn=init_cpu_features.constprop.0
+303 3
+304 2
+305 6
+310 2
+311 6
+316 2
+317 7
+399 7
+402 1
+403 1
+404 1
+415 3
+418 6
+419 1
+422 2
+424 6
+429 1
+431 3
+433 5
+434 15
+489 3
+503 1
+513 9
+555 1
+556 1
+564 2
+571 2
+573 1
+577 2
+583 2
+585 1
+687 3
+688 1
+691 2
+692 1
+698 1
+699 1
+700 1
+701 2
+706 4
+710 3
+723 2
+724 4
+770 1
+771 2
+775 3
+792 2
+793 2
+795 3
+796 2
+798 3
+799 2
+802 2
+817 4
+819 4
+829 2
+874 8
+fn=update_active.constprop.0
+43 7
+47 1
+55 3
+56 3
+57 1
+65 4
+66 1
+74 4
+75 1
+76 3
+80 4
+83 3
+84 1
+87 3
+88 2
+92 4
+93 3
+94 3
+95 2
+97 4
+98 3
+100 2
+101 4
+104 4
+105 4
+109 2
+113 2
+115 4
+119 2
+121 1
+124 2
+126 2
+130 1
+131 1
+134 4
+140 4
+142 4
+144 3
+149 4
+198 3
+210 2
+211 1
+214 4
+218 2
+222 3
+223 2
+225 1
+226 1
+229 2
+231 1
+234 2
+285 3
+289 3
+296 1
+297 8
+fl=./elf/../sysdeps/x86/dl-cacheinfo.h
+fn=get_common_cache_info.constprop.0
+481 9
+496 1
+497 1
+498 2
+499 1
+505 1
+507 6
+510 4
+521 1
+526 3
+530 2
+535 7
+538 32
+545 8
+548 26
+553 2
+557 3
+558 2
+562 6
+566 3
+569 4
+570 1
+575 5
+581 4
+589 5
+591 2
+593 4
+595 13
+597 1
+598 1
+599 9
+601 4
+604 2
+609 1
+611 5
+612 3
+613 1
+616 2
+619 2
+626 1
+628 4
+629 1
+634 2
+639 3
+640 1
+641 2
+642 3
+681 4
+682 5
+686 2
+693 1
+694 1
+695 7
+fn=handle_intel.constprop.0
+250 96
+255 24
+263 12
+264 12
+266 4
+272 48
+279 24
+280 60
+284 60
+286 24
+289 60
+291 24
+294 5
+296 2
+299 5
+301 2
+305 3
+310 96
+fn=init_cpu_features.constprop.0
+706 1
+708 1
+722 2
+724 4
+725 3
+726 4
+729 3
+731 3
+732 1
+734 3
+736 3
+737 1
+739 3
+741 3
+744 3
+746 3
+748 3
+750 4
+840 1
+841 2
+842 2
+843 2
+844 2
+845 2
+846 1
+847 1
+848 1
+849 2
+850 1
+851 1
+863 10
+873 2
+874 1
+881 12
+898 2
+907 7
+909 2
+912 5
+914 2
+917 5
+919 2
+922 5
+923 3
+929 6
+932 10
+933 8
+940 9
+942 10
+944 9
+958 3
+960 2
+961 1
+962 1
+963 1
+964 2
+965 1
+fn=intel_check_word.constprop.0
+110 690
+113 345
+119 104
+123 78
+125 15
+129 156
+131 154
+133 63
+135 126
+143 126
+152 12
+155 138
+158 99
+162 64
+164 186
+167 120
+168 72
+169 60
+170 48
+172 11
+174 22
+176 12
+177 16
+178 12
+179 20
+180 14
+181 12
+183 8
+184 16
+187 21
+194 204
+202 175
+216 51
+246 104
+fl=./elf/../sysdeps/x86/dl-cet.c
+fn=_dl_cet_check
+41 2
+42 1
+44 3
+45 1
+48 4
+49 2
+51 1
+62 2
+252 8
+254 8
+fl=./elf/../sysdeps/x86/dl-get-cpu-features.c
+fn=__x86_cpu_features
+39 3
+71 3
+fn=_dl_x86_init_cpu_features
+37 1
+39 3
+41 1
+fl=./elf/../sysdeps/x86/dl-hwcap.h
+fn=_dl_important_hwcaps
+57 4
+fl=./elf/../sysdeps/x86/dl-procinfo.h
+fn=_dl_load_cache_lookup
+39 2
+42 7
+fl=./elf/../sysdeps/x86/dl-prop.h
+fn=_dl_map_object_from_fd
+95 16
+100 4
+101 2
+102 2
+103 2
+105 18
+108 6
+109 4
+110 4
+114 6
+118 6
+121 6
+122 4
+126 2
+127 10
+131 6
+132 3
+135 6
+138 3
+139 12
+144 12
+151 6
+164 6
+167 11
+185 6
+187 8
+192 2
+193 8
+198 16
+200 8
+201 2
+202 2
+203 4
+212 6
+213 2
+fn=dl_main
+36 2
+37 15
+40 11
+49 22
+53 17
+71 4
+95 8
+100 2
+101 1
+102 1
+103 1
+105 10
+108 3
+109 2
+110 2
+114 3
+118 3
+121 3
+122 2
+126 1
+127 5
+131 4
+132 2
+135 4
+138 2
+139 8
+144 8
+151 4
+164 4
+165 2
+167 2
+170 4
+185 3
+187 4
+192 1
+193 4
+198 9
+200 4
+201 1
+202 1
+203 2
+212 5
+213 2
+fl=./elf/../sysdeps/x86/get-isa-level.h
+fn=_dl_hwcaps_subdirs_active
+30 2
+31 3
+32 3
+36 3
+39 4
+40 4
+45 4
+48 2
+51 5
+53 4
+54 2
+55 2
+58 2
+61 2
+fn=update_active.constprop.0
+28 1
+30 2
+31 2
+32 3
+36 3
+38 1
+39 2
+40 4
+45 4
+47 1
+48 2
+51 5
+53 4
+54 2
+55 2
+57 1
+58 2
+61 2
+fl=./elf/../sysdeps/x86_64/dl-hwcaps-subdirs.c
+fn=_dl_hwcaps_subdirs_active
+29 1
+38 3
+43 1
+48 1
+52 1
+fl=./elf/../sysdeps/x86_64/dl-machine.h
+fn=_dl_fixup
+220 2
+fn=_dl_relocate_object
+72 12
+78 4
+82 3
+88 1
+98 2
+118 2
+121 2
+123 4
+253 246
+260 246
+264 246
+273 947
+276 450
+277 240
+278 241
+281 6
+282 9
+293 6
+300 12
+303 738
+384 51
+386 68
+390 51
+399 34
+407 210
+444 6
+448 8
+449 8
+450 10
+451 4
+460 4
+461 6
+462 2
+463 2
+482 18
+483 6
+487 12
+488 12
+498 106
+502 28
+505 42
+506 14
+532 78
+534 39
+535 39
+fn=_dl_start
+46 1
+72 1
+264 21
+273 59
+276 14
+277 28
+303 21
+307 6
+fn=_dl_sysdep_start
+206 1
+fl=./elf/../sysdeps/x86_64/dl-runtime.h
+fn=_dl_fixup
+28 2
+fl=./elf/../sysdeps/x86_64/dl-trampoline.h
+fn=_dl_runtime_resolve_xsave
+71 2
+76 2
+79 2
+81 2
+91 2
+97 2
+98 2
+99 2
+100 2
+101 2
+102 2
+103 2
+107 2
+108 2
+111 2
+112 2
+114 2
+115 2
+116 2
+117 2
+118 2
+119 2
+121 2
+128 2
+129 2
+130 2
+131 2
+136 2
+137 2
+138 2
+140 2
+141 2
+142 2
+143 2
+144 2
+145 2
+146 2
+148 2
+150 2
+154 2
+156 2
+fl=./elf/./dl-find_object.h
+fn=_dl_find_object_from_map
+95 8
+96 8
+97 4
+103 20
+104 119
+105 82
+107 12
+112 4
+fl=./elf/./dl-hwcaps.h
+fn=_dl_important_hwcaps
+54 15
+56 6
+57 1
+88 10
+89 9
+90 5
+fl=./elf/./dl-load.h
+fn=_dl_map_object_from_fd
+91 16
+92 6
+94 16
+95 4
+96 6
+97 4
+99 4
+fl=./elf/./dl-main.h
+fn=dl_main
+112 5
+fl=./elf/./dl-map-segments.h
+fn=_dl_map_object_from_fd
+28 6
+29 20
+101 2
+102 6
+105 6
+106 4
+108 4
+125 2
+127 2
+135 48
+137 24
+139 54
+140 12
+149 32
+155 4
+156 2
+157 10
+158 4
+165 4
+168 4
+176 14
+182 4
+186 8
+189 2
+194 8
+fl=./elf/./dl-sym-post.h
+fn=lookup_malloc_symbol
+41 8
+53 12
+fl=./elf/./dl-tunable-types.h
+fn=__GI___tunable_set_val
+90 10
+fl=./elf/./elf/dl-audit.c
+fn=_dl_audit_activity_map
+29 6
+30 1
+31 3
+37 6
+fn=_dl_audit_activity_nsid
+41 18
+45 12
+46 3
+47 12
+51 18
+fn=_dl_audit_objclose
+97 4
+98 16
+fn=_dl_audit_objopen
+77 2
+78 8
+fn=_dl_audit_preinit
+118 1
+119 4
+fl=./elf/./elf/dl-cache.c
+fn=_dl_cache_libcmp
+367 30
+368 122
+370 162
+372 58
+378 6
+379 6
+380 52
+382 52
+384 4
+390 104
+392 104
+393 24
+396 44
+397 44
+400 16
+fn=_dl_load_cache_lookup
+194 16
+208 1
+212 7
+218 10
+219 2
+221 8
+228 26
+230 40
+231 16
+235 24
+239 24
+240 16
+248 4
+250 2
+253 2
+255 6
+257 1
+267 4
+271 3
+272 6
+277 4
+278 3
+294 2
+303 2
+306 2
+311 11
+312 6
+340 2
+352 4
+356 7
+357 4
+359 3
+413 10
+415 2
+418 3
+421 5
+429 5
+430 8
+433 6
+434 1
+441 1
+442 2
+492 3
+495 3
+510 2
+522 17
+523 4
+524 1
+525 8
+fn=_dl_unload_cache
+534 2
+535 4
+537 2
+538 1
+542 1
+544 2
+fl=./elf/./elf/dl-call-libc-early-init.c
+fn=_dl_call_libc_early_init
+27 4
+29 2
+33 7
+37 2
+39 4
+40 2
+41 3
+fl=./elf/./elf/dl-debug.c
+fn=_dl_debug_initialize
+56 2
+57 3
+59 2
+63 3
+64 1
+85 3
+90 3
+91 2
+92 1
+95 3
+96 6
+99 2
+107 2
+fn=_dl_debug_state
+116 2
+117 2
+fn=_dl_debug_update
+38 3
+40 6
+41 3
+44 6
+48 3
+fl=./elf/./elf/dl-deps.c
+fn=_dl_map_object_deps
+128 6
+129 5
+130 6
+132 3
+136 8
+143 13
+144 20
+160 8
+161 1
+164 1
+182 2
+183 6
+184 1
+188 7
+190 12
+193 4
+197 12
+198 25
+201 8
+205 4
+208 16
+210 8
+215 2
+216 2
+217 4
+218 4
+221 224
+222 104
+228 24
+230 2
+232 8
+233 4
+242 2
+244 4
+249 14
+252 2
+253 2
+254 2
+255 4
+257 2
+259 10
+263 6
+264 3
+266 150
+417 8
+419 8
+422 6
+423 8
+429 2
+430 10
+431 10
+434 2
+435 4
+439 12
+441 4
+442 17
+448 6
+449 1
+452 4
+463 4
+465 3
+470 4
+471 1
+472 5
+474 16
+478 4
+483 8
+485 9
+490 4
+494 2
+495 3
+532 2
+533 2
+547 7
+552 9
+553 4
+557 1
+559 2
+560 1
+561 2
+568 3
+571 2
+574 8
+fn=openaux
+61 6
+64 22
+65 2
+68 2
+69 4
+fl=./elf/./elf/dl-environ.c
+fn=_dl_next_ld_env_entry
+28 3
+29 3
+32 250
+34 164
+35 20
+37 2
+40 4
+42 2
+45 80
+49 1
+fl=./elf/./elf/dl-error-skeleton.c
+fn=_dl_catch_error
+225 10
+227 2
+228 2
+229 2
+230 2
+232 5
+fn=_dl_catch_exception
+175 15
+178 6
+180 6
+185 3
+199 3
+200 6
+203 6
+206 18
+208 9
+209 6
+210 12
+219 6
+fn=_dl_receive_error
+238 6
+239 1
+240 1
+243 1
+244 1
+246 1
+248 1
+249 1
+250 4
+fl=./elf/./elf/dl-find_object.c
+fn=_dl_find_object_init
+561 4
+564 1
+566 2
+567 3
+579 2
+580 3
+582 2
+585 2
+590 1
+591 1
+594 3
+596 2
+599 6
+601 1
+604 4
+fn=_dlfo_process_initial
+474 4
+475 2
+477 2
+478 4
+504 24
+505 22
+506 8
+508 28
+511 24
+513 18
+515 6
+516 6
+517 12
+529 2
+531 8
+fn=_dlfo_sort_mappings
+536 1
+537 2
+540 12
+544 4
+545 19
+546 19
+553 8
+554 10
+555 4
+557 1
+fl=./elf/./elf/dl-fini.c
+fn=_dl_fini
+31 11
+48 1
+51 13
+54 2
+56 1
+59 2
+61 6
+66 1
+68 3
+73 17
+78 16
+80 8
+82 8
+84 8
+85 4
+86 4
+90 4
+92 6
+93 6
+99 5
+108 2
+113 18
+115 4
+117 12
+120 8
+123 12
+124 6
+127 4
+138 8
+139 2
+140 4
+141 20
+142 4
+146 6
+147 6
+153 8
+158 4
+162 5
+168 6
+170 1
+174 2
+181 8
+fl=./elf/./elf/dl-hwcaps.c
+fn=_dl_important_hwcaps
+55 16
+57 2
+58 6
+59 2
+80 16
+82 2
+83 16
+87 2
+89 6
+90 2
+91 2
+103 4
+105 3
+108 1
+111 2
+115 4
+128 12
+130 10
+131 4
+132 4
+133 2
+136 5
+145 20
+146 10
+148 1
+154 2
+158 7
+159 3
+160 4
+164 1
+165 3
+166 2
+175 14
+176 6
+178 2
+179 4
+204 2
+205 4
+215 2
+216 4
+220 18
+221 6
+222 4
+225 3
+228 20
+229 1
+230 6
+231 4
+233 5
+234 2
+235 8
+236 1
+238 3
+240 7
+241 2
+242 1
+245 6
+246 1
+249 2
+253 2
+254 3
+257 6
+258 3
+260 1
+261 11
+262 4
+263 3
+273 4
+277 9
+284 4
+285 2
+292 2
+293 3
+306 4
+317 2
+330 2
+333 2
+340 7
+343 2
+346 20
+349 14
+350 23
+351 5
+354 10
+356 15
+361 34
+362 8
+366 24
+369 32
+370 64
+371 36
+373 8
+376 7
+377 2
+378 10
+380 6
+381 2
+383 1
+384 7
+391 18
+392 4
+394 8
+397 4
+403 9
+fl=./elf/./elf/dl-hwcaps_split.c
+fn=_dl_hwcaps_split
+25 4
+26 3
+27 1
+47 4
+fn=_dl_hwcaps_split_masked
+26 44
+30 24
+34 78
+35 12
+36 24
+41 36
+42 18
+43 12
+45 6
+51 77
+55 5
+56 18
+57 27
+58 24
+60 6
+62 66
+67 12
+fl=./elf/./elf/dl-init.c
+fn=_dl_init
+31 30
+33 26
+77 12
+78 1
+79 1
+82 3
+85 2
+90 2
+115 1
+116 17
+117 8
+123 8
+fn=call_init.part.0
+26 36
+39 12
+42 12
+43 2
+47 6
+55 9
+56 6
+59 3
+60 6
+66 6
+68 4
+69 22
+70 12
+72 24
+fl=./elf/./elf/dl-load.c
+fn=_dl_dst_count
+238 12
+241 4
+244 4
+245 2
+264 14
+fn=_dl_init_paths
+706 13
+720 4
+725 1
+727 3
+734 3
+735 2
+739 4
+741 4
+747 1
+748 1
+756 9
+758 8
+759 4
+761 5
+762 7
+763 7
+766 9
+767 9
+768 18
+770 29
+776 1
+777 1
+782 1
+785 3
+787 4
+789 3
+806 1
+808 3
+822 2
+825 5
+827 23
+831 2
+832 84
+833 20
+834 40
+836 1
+837 1
+838 2
+844 6
+847 3
+853 1
+857 8
+fn=_dl_map_object
+682 10
+692 5
+696 7
+1971 33
+1973 2
+1979 6
+1980 12
+1983 42
+1988 49
+1990 35
+1994 12
+1995 18
+1998 10
+1999 6
+2000 8
+2017 14
+2018 6
+2026 6
+2041 2
+2043 10
+2047 4
+2049 2
+2056 4
+2060 1
+2061 1
+2065 3
+2066 1
+2080 2
+2081 7
+2082 1
+2091 2
+2106 2
+2107 16
+2113 5
+2114 1
+2122 1
+2136 2
+2138 3
+2142 3
+2144 2
+2148 2
+2156 3
+2179 8
+2183 3
+2184 1
+2201 2
+2207 1
+2208 4
+2209 2
+2210 3
+2214 8
+2217 3
+2229 8
+2274 4
+2275 34
+2277 27
+fn=_dl_map_object_from_fd
+944 28
+954 6
+961 8
+999 29
+1000 10
+1018 4
+1046 8
+1056 4
+1063 16
+1064 4
+1075 4
+1076 6
+1077 4
+1079 6
+1080 8
+1081 2
+1084 17
+1085 6
+1096 2
+1098 2
+1100 44
+1101 2
+1102 2
+1103 2
+1104 2
+1110 87
+1111 213
+1117 4
+1124 5
+1125 6
+1126 20
+1131 2
+1132 1
+1137 62
+1145 32
+1146 52
+1147 40
+1148 16
+1149 16
+1151 48
+1153 16
+1166 16
+1167 30
+1172 8
+1173 82
+1183 8
+1186 3
+1190 4
+1191 1
+1195 5
+1196 2
+1199 1
+1204 4
+1219 4
+1220 2
+1223 8
+1228 14
+1238 38
+1239 16
+1245 4
+1255 8
+1262 14
+1278 8
+1279 6
+1285 6
+1286 4
+1288 4
+1298 6
+1317 6
+1319 12
+1371 6
+1372 2
+1378 70
+1379 117
+1385 33
+1390 8
+1403 4
+1405 8
+1407 4
+1423 4
+1427 4
+1428 4
+1445 4
+1446 1
+1449 4
+1454 4
+1464 4
+1471 16
+1472 6
+1473 9
+1474 3
+1475 5
+1481 4
+1482 4
+1487 2
+1494 6
+1497 8
+1516 4
+1520 14
+1521 8
+1525 18
+fn=_dl_process_pt_gnu_property
+868 6
+869 6
+870 3
+876 6
+882 15
+885 9
+886 6
+887 6
+930 3
+fn=expand_dynamic_string_token
+241 6
+244 4
+385 18
+399 4
+410 14
+fn=fillin_rpath.isra.0
+468 16
+472 1
+474 18
+477 1
+478 1
+481 4
+483 4
+487 2
+492 3
+493 2
+500 5
+505 4
+509 33
+510 18
+528 8
+532 8
+534 4
+538 2
+539 4
+540 8
+541 2
+543 4
+549 8
+550 56
+551 18
+553 4
+554 6
+555 2
+561 4
+562 2
+565 6
+571 1
+574 12
+fn=open_path
+1819 11
+1820 2
+1823 1
+1824 2
+1826 2
+1829 2
+1831 22
+1834 6
+1838 4
+1843 12
+1850 10
+1851 78
+1854 60
+1859 220
+1862 60
+1868 60
+1869 4
+1871 160
+1873 80
+1875 20
+1880 20
+1881 70
+1887 40
+1889 62
+1890 2
+1892 37
+1899 40
+1901 20
+1940 14
+1945 2
+1947 8
+1950 4
+1964 9
+fn=open_verify.constprop.0
+1578 264
+1610 66
+1615 40
+1629 110
+1631 44
+1639 2
+1640 2
+1645 16
+1647 4
+1649 4
+1651 4
+1657 4
+1674 29
+1755 4
+1760 4
+1766 8
+1772 4
+1778 8
+1779 8
+1783 16
+1784 5
+1805 198
+fl=./elf/./elf/dl-lookup-direct.c
+fn=_dl_lookup_direct
+32 15
+33 9
+35 6
+36 6
+48 9
+51 6
+53 18
+57 6
+58 21
+59 15
+74 36
+76 6
+78 21
+79 6
+81 12
+84 25
+86 12
+93 6
+115 3
+116 27
+fl=./elf/./elf/dl-lookup.c
+fn=_dl_lookup_symbol_x
+756 1272
+758 212
+759 424
+762 106
+766 406
+768 1166
+769 212
+771 212
+775 247
+776 1802
+781 530
+783 42
+800 106
+801 21
+804 212
+805 578
+812 99
+840 417
+854 297
+855 1
+857 198
+858 1
+859 954
+fn=check_match
+71 1313
+74 707
+79 103
+87 495
+90 444
+94 99
+95 198
+97 194
+116 194
+117 776
+118 485
+145 4
+147 12
+148 8
+163 606
+fn=do_lookup_x
+172 40
+348 1484
+349 106
+354 106
+355 106
+359 660
+362 660
+366 884
+370 656
+374 656
+380 984
+384 328
+385 328
+388 1640
+389 656
+392 328
+393 656
+395 328
+396 212
+397 656
+399 742
+400 1312
+403 2296
+406 208
+407 312
+408 208
+410 208
+413 1165
+415 303
+416 2558
+417 202
+420 303
+423 406
+437 891
+454 10
+460 198
+467 760
+471 24
+483 198
+484 99
+485 198
+502 693
+505 7
+506 848
+fl=./elf/./elf/dl-minimal-malloc.c
+fn=__minimal_calloc
+78 40
+82 8
+85 8
+86 24
+fn=__minimal_free
+95 2
+97 8
+fn=__minimal_malloc
+35 125
+36 75
+41 3
+42 2
+43 2
+47 99
+50 167
+55 8
+56 8
+58 2
+59 16
+61 4
+63 4
+65 6
+68 25
+69 27
+71 100
+fl=./elf/./elf/dl-minimal.c
+fn=__rtld_malloc_init_real
+76 8
+86 2
+87 1
+89 1
+91 6
+92 5
+93 5
+94 4
+99 1
+100 1
+101 1
+102 1
+103 7
+fn=__rtld_malloc_init_stubs
+42 1
+43 2
+44 2
+45 2
+46 2
+47 1
+fn=lookup_malloc_symbol
+61 28
+63 4
+64 28
+68 28
+69 28
+71 4
+72 20
+fn=strsep
+239 3
+242 9
+244 3
+245 6
+249 126
+254 78
+256 152
+260 21
+267 2
+271 6
+fl=./elf/./elf/dl-misc.c
+fn=_dl_name_match_p
+68 84
+69 56
+70 3
+72 14
+74 58
+75 90
+80 15
+82 11
+83 56
+fn=_dl_sysdep_read_whole_file
+36 9
+39 3
+40 2
+42 6
+44 2
+47 2
+49 8
+60 3
+63 7
+fl=./elf/./elf/dl-object.c
+fn=_dl_add_to_namespace_list
+31 18
+33 9
+35 26
+38 20
+40 2
+42 2
+45 2
+46 9
+47 6
+48 6
+50 6
+51 9
+fn=_dl_new_object
+59 42
+62 6
+64 2
+66 3
+67 2
+71 6
+77 2
+80 4
+83 11
+87 4
+92 2
+95 6
+98 3
+99 3
+100 6
+103 6
+104 21
+106 3
+119 10
+123 2
+125 1
+127 21
+130 6
+131 6
+132 3
+136 3
+139 58
+141 19
+149 6
+150 3
+153 1
+155 18
+157 6
+160 8
+164 8
+168 12
+176 1
+179 6
+189 9
+191 10
+195 6
+200 8
+208 2
+212 2
+245 8
+250 84
+251 84
+253 4
+256 2
+259 2
+263 27
+fl=./elf/./elf/dl-reloc.c
+fn=_dl_relocate_object
+170 492
+171 214
+174 621
+175 49
+177 7
+178 21
+182 510
+183 100
+184 100
+186 826
+187 94
+188 564
+190 706
+193 100
+194 700
+196 7
+207 52
+217 6
+218 12
+221 4
+223 8
+225 4
+226 12
+245 8
+251 16
+252 8
+253 1
+255 8
+262 16
+301 11129
+304 24
+328 4
+331 8
+348 12
+350 32
+356 24
+359 8
+363 8
+364 24
+fl=./elf/./elf/dl-runtime.c
+fn=_dl_fixup
+46 16
+48 10
+49 6
+54 2
+55 8
+56 14
+58 4
+63 4
+67 4
+71 6
+75 6
+76 8
+77 6
+84 2
+85 6
+95 22
+99 10
+109 20
+124 8
+133 6
+159 6
+163 12
+fl=./elf/./elf/dl-setup_hash.c
+fn=_dl_setup_hash
+25 8
+28 12
+31 12
+32 8
+33 4
+34 4
+36 12
+37 4
+38 12
+40 4
+41 8
+43 4
+45 12
+50 4
+fl=./elf/./elf/dl-sort-maps.c
+fn=_dl_sort_maps
+145 40
+186 2
+187 4
+188 34
+189 16
+216 34
+221 4
+223 2
+224 10
+226 26
+228 8
+231 24
+249 8
+262 2
+269 6
+282 8
+304 26
+312 6
+316 18
+fn=_dl_sort_maps_init
+295 2
+296 4
+297 1
+298 3
+299 2
+fn=dfs_traversal.part.0
+140 64
+145 20
+148 8
+150 24
+152 60
+161 28
+176 24
+177 8
+178 48
+fl=./elf/./elf/dl-tunables.c
+fn=__GI___tunable_set_val
+102 45
+111 5
+112 20
+113 25
+116 10
+119 10
+123 25
+130 10
+131 10
+134 30
+135 5
+136 5
+137 5
+157 25
+161 5
+fn=__GI___tunables_init
+71 415
+74 82
+77 6594
+81 164
+86 492
+270 3
+279 9
+298 164
+302 11480
+304 5
+308 15826
+320 328
+354 8
+fn=__tunable_get_val
+402 33
+405 357
+414 43
+424 23
+425 23
+431 165
+433 33
+fl=./elf/./elf/dl-tunables.h
+fn=__GI___tunable_set_val
+121 10
+122 5
+131 5
+fn=__GI___tunables_init
+140 2744
+141 3524
+fl=./elf/./elf/dl-version.c
+fn=_dl_check_all_versions
+392 6
+394 2
+396 17
+397 4
+398 36
+401 7
+fn=_dl_check_map_versions
+36 24
+37 5
+38 35
+56 32
+64 24
+70 24
+86 8
+87 16
+89 8
+94 192
+108 136
+110 8
+113 56
+120 180
+124 60
+155 36
+156 4
+164 6
+170 12
+172 16
+174 4
+175 4
+177 8
+180 6
+184 12
+200 6
+204 4
+208 6
+213 6
+215 6
+217 24
+218 73
+221 16
+224 32
+228 4
+231 32
+234 24
+239 6
+243 6
+257 15
+260 4
+263 190
+266 140
+270 44
+274 9
+279 3
+280 12
+281 9
+291 3
+294 17
+296 6
+299 8
+303 6
+306 24
+308 32
+310 40
+311 16
+312 24
+313 24
+316 24
+321 6
+324 6
+334 6
+337 4
+341 44
+343 92
+347 44
+348 220
+349 132
+350 44
+353 138
+357 44
+365 4
+366 4
+367 4
+368 2
+370 4
+372 108
+373 52
+375 2
+376 7
+387 36
+fl=./elf/./elf/do-rel.h
+fn=_dl_relocate_object
+49 24
+50 24
+51 4
+53 48
+72 4
+73 6
+80 24
+83 166
+85 212
+87 117
+96 5
+97 128
+98 78
+114 21
+123 34
+124 12
+126 21
+129 20
+131 491
+133 595
+134 476
+135 357
+136 476
+138 357
+140 10
+143 2
+147 117
+150 368
+151 66
+163 18
+164 28
+165 6
+168 6
+170 16
+171 2
+172 4
+179 35
+181 42
+182 12
+184 18
+192 6
+195 18
+209 6
+fn=_dl_start
+49 6
+50 4
+51 1
+53 10
+56 4
+61 25
+63 21
+64 14
+65 7
+fl=./elf/./elf/get-dynamic-info.h
+fn=_dl_map_object_from_fd
+39 4
+43 2
+45 178
+49 86
+54 38
+55 25
+56 10
+58 2
+59 10
+62 8
+64 8
+68 43
+72 8
+84 7
+85 8
+86 8
+87 8
+88 7
+89 7
+90 7
+91 8
+103 16
+110 7
+115 2
+122 2
+123 6
+129 6
+130 3
+147 6
+152 2
+154 2
+156 2
+158 2
+162 6
+164 3
+165 2
+174 2
+175 3
+180 2
+184 4
+fl=./elf/./elf/libc_early_init.c
+fn=__libc_early_init
+33 7
+35 1
+38 2
+41 2
+47 4
+49 2
+fl=./elf/./elf/rtld.c
+fn=_dl_start
+84 8
+460 2
+478 3
+479 1
+480 2
+482 2
+491 1
+497 4
+499 2
+520 16
+546 1
+549 2
+550 1
+566 142
+568 1
+581 1
+588 9
+fn=dl_main
+84 17
+91 12
+92 2
+100 9
+196 2
+198 1
+223 3
+224 5
+225 4
+295 3
+301 1
+302 1
+843 1
+845 1
+854 1
+856 1
+861 1
+1121 1
+1122 1
+1124 4
+1127 2
+1129 2
+1131 1
+1132 1
+1147 1
+1150 46
+1151 117
+1155 3
+1160 3
+1161 9
+1170 2
+1171 2
+1173 1
+1180 3
+1198 2
+1206 8
+1207 16
+1208 8
+1209 1
+1211 12
+1212 14
+1216 8
+1217 8
+1218 4
+1219 10
+1220 1
+1224 12
+1252 2
+1256 4
+1258 1
+1262 26
+1263 61
+1269 17
+1275 4
+1278 3
+1280 3
+1282 3
+1357 13
+1362 1
+1368 1
+1384 3
+1385 4
+1654 10
+1656 2
+1657 3
+1658 2
+1659 3
+1663 2
+1664 2
+1689 3
+1690 5
+1691 3
+1692 3
+1695 1
+1697 1
+1698 1
+1700 2
+1701 2
+1705 2
+1707 2
+1715 4
+1722 2
+1725 2
+1750 4
+1752 1
+1757 2
+1760 4
+1761 4
+1762 3
+1763 1
+1764 1
+1765 1
+1776 2
+1777 2
+1779 2
+1781 1
+1782 3
+1787 5
+1788 5
+1790 4
+1796 2
+1805 2
+1809 1
+1810 2
+1834 2
+1840 1
+1841 1
+1842 1
+1846 3
+1851 3
+1853 1
+1855 2
+1859 5
+1864 2
+1880 6
+1956 4
+1960 19
+1961 1
+1964 3
+1965 1
+1966 2
+1967 2
+1972 3
+1982 9
+1988 19
+1989 12
+1992 3
+1993 2
+1994 1
+1996 14
+1997 6
+2001 2
+2007 1
+2009 3
+2010 2
+2012 2
+2014 3
+2016 3
+2017 1
+2018 2
+2030 2
+2031 2
+2032 3
+2043 5
+2044 4
+2045 4
+2055 2
+2056 2
+2057 4
+2059 2
+2064 3
+2263 3
+2267 3
+2273 2
+2276 2
+2284 2
+2295 3
+2298 1
+2303 1
+2304 21
+2306 8
+2311 8
+2313 10
+2315 1
+2316 1
+2319 4
+2321 8
+2322 24
+2326 10
+2327 3
+2336 2
+2340 4
+2342 1
+2349 4
+2352 2
+2362 3
+2364 2
+2374 1
+2379 3
+2382 1
+2388 1
+2389 5
+2397 3
+2404 1
+2408 3
+2413 3
+2414 1
+2415 1
+2416 1
+2420 1
+2425 8
+2558 2
+2560 1
+2564 6
+2566 18
+2568 2
+2570 107
+2571 19
+2579 13
+2600 2
+2607 4
+2609 2
+2610 1
+2655 3
+2656 5
+2658 2
+2659 2
+2660 1
+2697 4
+2722 2
+fn=handle_preload_list
+182 3
+812 1
+816 2
+817 1
+818 1
+820 1
+822 6
+823 3
+831 3
+875 11
+876 2
+880 7
+883 4
+884 3
+886 4
+887 2
+893 1
+895 2
+897 2
+898 1
+901 9
+fn=init_tls
+737 3
+739 2
+743 3
+749 2
+752 1
+753 1
+754 1
+758 1
+760 1
+764 2
+765 2
+766 12
+767 4
+768 8
+772 3
+774 1
+776 3
+779 2
+782 1
+789 2
+790 2
+796 2
+799 8
+802 1
+803 1
+806 5
+fn=map_doit
+644 3
+646 4
+647 6
+649 2
+fn=version_check_doit
+677 3
+679 6
+683 2
+fl=./elf/./get-dynamic-info.h
+fn=_dl_start
+45 83
+49 40
+54 17
+55 11
+56 5
+58 1
+59 5
+62 4
+64 4
+68 21
+72 3
+84 4
+85 4
+86 4
+87 4
+88 5
+89 4
+90 4
+91 4
+103 10
+110 3
+115 2
+122 1
+123 3
+129 3
+130 3
+133 2
+134 3
+139 3
+142 3
+fn=dl_main
+39 4
+43 1
+45 107
+49 52
+54 25
+55 17
+56 5
+58 1
+59 5
+62 4
+64 4
+68 26
+72 7
+84 3
+85 5
+86 5
+87 5
+88 4
+89 5
+90 5
+91 5
+103 9
+110 3
+115 2
+122 3
+123 4
+129 3
+147 4
+152 2
+154 2
+156 2
+158 2
+159 2
+162 4
+164 3
+165 2
+174 2
+175 3
+180 2
+181 3
+184 3
+fl=./elf/./setup-vdso.h
+fn=dl_main
+24 2
+fl=./io/../sysdeps/unix/sysv/linux/access.c
+fn=access
+25 1
+27 7
+31 1
+fl=./io/../sysdeps/unix/sysv/linux/close_nocancel.c
+fn=__GI___close_nocancel
+25 3
+26 12
+27 3
+fn=__close_nocancel
+25 1
+26 4
+27 1
+fl=./io/../sysdeps/unix/sysv/linux/fstat64.c
+fn=fstat
+29 12
+30 12
+35 18
+fl=./io/../sysdeps/unix/sysv/linux/fstatat64.c
+fn=fstatat
+99 32
+154 32
+168 18
+169 57
+170 7
+fl=./io/../sysdeps/unix/sysv/linux/open64.c
+fn=open
+30 10
+33 7
+41 9
+43 7
+fl=./io/../sysdeps/unix/sysv/linux/open64_nocancel.c
+fn=__open_nocancel
+28 46
+31 161
+39 221
+41 23
+fl=./io/../sysdeps/unix/sysv/linux/pread64_nocancel.c
+fn=__pread64_nocancel
+25 4
+26 8
+27 2
+fl=./io/../sysdeps/unix/sysv/linux/read.c
+fn=read
+25 18
+26 108
+27 18
+fl=./io/../sysdeps/unix/sysv/linux/read_nocancel.c
+fn=__read_nocancel
+25 2
+26 8
+27 2
+fl=./io/../sysdeps/unix/sysv/linux/stat64.c
+fn=stat
+28 20
+29 40
+fl=./io/../sysdeps/unix/sysv/linux/write.c
+fn=write
+25 2
+26 12
+27 2
+fl=./libio/../include/sys/sysmacros.h
+fn=_IO_file_doallocate
+47 12
+fl=./libio/./libio/filedoalloc.c
+fn=_IO_file_doallocate
+78 27
+84 27
+86 12
+89 4
+91 4
+94 4
+97 12
+101 9
+102 6
+104 12
+105 3
+106 24
+fl=./libio/./libio/fileops.c
+fn=_IO_do_write@@GLIBC_2.2.5
+423 20
+425 14
+426 16
+433 4
+440 8
+443 2
+448 10
+449 10
+451 6
+452 4
+453 2
+454 4
+455 8
+fn=_IO_file_close
+1164 1
+1167 2
+fn=_IO_file_close_it@@GLIBC_2.2.5
+128 4
+130 3
+133 2
+134 3
+137 1
+139 2
+142 7
+145 3
+153 5
+154 4
+157 2
+158 1
+159 1
+160 1
+162 2
+163 5
+fn=_IO_file_finish@@GLIBC_2.2.5
+168 5
+169 2
+175 3
+176 3
+fn=_IO_file_fopen@@GLIBC_2.2.5
+213 11
+214 2
+222 2
+224 6
+227 1
+228 1
+247 8
+280 5
+283 2
+286 4
+287 2
+356 12
+fn=_IO_file_open
+182 9
+184 2
+185 1
+188 2
+189 2
+191 1
+192 6
+195 3
+205 2
+206 1
+207 4
+fn=_IO_file_overflow@@GLIBC_2.2.5
+731 15
+732 12
+739 13
+742 2
+744 2
+745 6
+754 3
+763 3
+764 1
+765 1
+766 1
+767 1
+768 3
+770 3
+771 5
+772 2
+774 6
+775 2
+776 3
+777 4
+780 6
+781 6
+782 8
+783 4
+784 3
+786 2
+787 11
+fn=_IO_file_read
+1130 18
+1131 18
+1132 18
+1133 54
+fn=_IO_file_setbuf@@GLIBC_2.2.5
+381 6
+382 6
+385 2
+387 6
+389 4
+390 4
+fn=_IO_file_stat
+1146 3
+1147 6
+fn=_IO_file_sync@@GLIBC_2.2.5
+792 10
+797 8
+799 2
+800 4
+811 4
+815 8
+fn=_IO_file_underflow@@GLIBC_2.2.5
+461 162
+465 54
+466 2
+468 36
+474 54
+477 36
+480 6
+485 4
+489 39
+496 16
+498 40
+500 7
+505 36
+511 108
+516 90
+518 36
+521 7
+525 17
+531 1
+534 51
+536 34
+537 144
+fn=_IO_file_write@@GLIBC_2.2.5
+1173 14
+1174 4
+1175 12
+1179 6
+1180 6
+1181 4
+1186 2
+1187 2
+1189 6
+1190 6
+1193 12
+fn=_IO_file_xsputn@@GLIBC_2.2.5
+1197 8
+1203 4
+1204 1
+1210 4
+1212 4
+1213 2
+1216 35
+1218 48
+1228 1
+1233 1
+1235 5
+1236 1
+1237 2
+1239 2
+1267 9
+fn=_IO_new_file_init_internal
+106 3
+110 1
+111 1
+113 1
+114 1
+115 2
+fl=./libio/./libio/genops.c
+fn=_IO_cleanup
+786 6
+787 11
+790 12
+799 9
+801 6
+807 26
+815 10
+817 4
+819 8
+820 2
+824 10
+826 6
+830 10
+831 16
+838 3
+842 9
+843 2
+863 11
+866 3
+878 12
+fn=_IO_default_finish
+54 2
+601 3
+603 3
+609 3
+612 3
+624 2
+fn=_IO_default_setbuf
+330 10
+332 2
+333 2
+337 4
+452 18
+453 10
+455 8
+457 6
+458 4
+462 2
+466 8
+467 2
+468 12
+fn=_IO_default_uflow
+361 90
+362 54
+363 36
+365 68
+366 72
+fn=_IO_doallocbuf
+343 12
+344 6
+346 12
+347 15
+350 12
+fn=_IO_flush_all_lockp
+686 12
+687 1
+691 6
+692 12
+695 12
+697 3
+698 3
+701 21
+709 6
+711 3
+715 9
+716 2
+720 12
+fn=_IO_link_in
+87 20
+88 8
+90 2
+92 7
+93 12
+94 1
+95 13
+97 2
+98 1
+100 13
+101 1
+102 7
+103 2
+106 21
+fn=_IO_no_init
+563 10
+564 1
+565 1
+566 2
+568 1
+572 6
+579 1
+581 1
+587 1
+588 6
+fn=_IO_old_init
+531 1
+532 2
+534 6
+539 7
+544 1
+550 2
+555 3
+556 2
+558 1
+fn=_IO_setb
+329 40
+330 18
+331 1
+332 4
+333 4
+335 26
+338 24
+fn=_IO_switch_to_get_mode
+164 90
+165 54
+168 54
+172 36
+173 36
+176 18
+178 36
+180 36
+181 18
+182 72
+fn=_IO_un_link
+53 2
+54 4
+82 2
+fn=_IO_unsave_markers
+960 3
+962 2
+967 3
+969 2
+fn=__GI__IO_un_link.part.0
+52 9
+58 6
+59 12
+60 1
+61 14
+63 3
+65 2
+66 2
+74 3
+76 13
+77 1
+78 9
+79 2
+82 9
+fn=__overflow
+199 6
+201 6
+202 1
+203 6
+204 4
+fn=__uflow
+299 90
+300 89
+305 54
+308 72
+310 36
+316 36
+321 54
+323 72
+324 54
+fl=./libio/./libio/getc.c
+fn=getc
+34 326210
+37 130484
+38 391418
+43 260951
+fl=./libio/./libio/iofclose.c
+fn=fclose@@GLIBC_2.2.5
+34 5
+48 3
+49 1
+51 15
+52 3
+53 3
+57 4
+58 3
+71 2
+76 5
+fl=./libio/./libio/iofgets.c
+fn=fgets
+32 6
+37 4
+39 3
+47 14
+51 3
+52 2
+53 7
+56 5
+57 1
+60 1
+63 4
+66 7
+fl=./libio/./libio/iofopen.c
+fn=fopen@@GLIBC_2.2.5
+37 2
+65 2
+67 3
+70 2
+72 7
+73 2
+74 2
+75 7
+85 8
+87 7
+fl=./libio/./libio/iogetline.c
+fn=_IO_getline
+33 1
+34 2
+fn=_IO_getline_info
+49 14
+51 2
+53 3
+54 2
+55 6
+57 6
+58 4
+60 2
+61 2
+67 2
+77 2
+78 1
+83 1
+85 6
+86 2
+88 3
+89 2
+90 3
+92 1
+94 2
+96 5
+97 2
+98 1
+107 8
+fl=./libio/./libio/ioputs.c
+fn=puts
+33 8
+35 2
+36 15
+38 2
+39 4
+40 6
+41 8
+42 4
+46 7
+fl=./libio/./libio/libioP.h
+fn=_IO_cleanup
+940 4
+942 4
+943 4
+fn=_IO_default_setbuf
+940 6
+942 4
+943 4
+fn=_IO_default_uflow
+940 54
+942 36
+943 36
+fn=_IO_do_write@@GLIBC_2.2.5
+940 8
+942 4
+943 4
+fn=_IO_doallocbuf
+940 9
+942 6
+943 6
+fn=_IO_file_close_it@@GLIBC_2.2.5
+940 3
+942 2
+943 2
+fn=_IO_file_doallocate
+940 9
+942 6
+943 6
+fn=_IO_file_underflow@@GLIBC_2.2.5
+883 2
+884 9
+940 40
+942 74
+943 38
+fn=__overflow
+940 6
+942 4
+943 4
+fn=__uflow
+940 54
+942 36
+943 36
+fn=fclose@@GLIBC_2.2.5
+855 4
+856 4
+862 2
+883 5
+884 7
+940 3
+942 2
+943 2
+fn=fgets
+883 3
+884 9
+fn=putchar
+883 2
+884 10
+fn=puts
+883 2
+884 10
+940 3
+942 2
+943 2
+fl=./libio/./libio/putchar.c
+fn=putchar
+25 7
+27 15
+28 7
+31 6
+fl=./libio/./libio/vtables.c
+fn=check_stdfiles_vtables
+83 1
+84 4
+85 3
+86 3
+88 1
+fl=./malloc/./malloc/arena.c
+fn=free
+162 15
+fn=malloc
+162 22026
+315 1
+fn=ptmalloc_init.part.0
+313 6
+318 1
+343 2
+347 3
+352 4
+353 4
+354 4
+355 4
+356 4
+357 4
+358 4
+360 4
+361 4
+362 4
+365 4
+366 4
+367 2
+430 7
+fl=./malloc/./malloc/malloc.c
+fn=_int_free
+2006 3
+3175 8
+3177 16
+3178 4
+3179 8
+4417 55
+4427 15
+4433 25
+4434 10
+4438 20
+4445 10
+4446 35
+4449 4
+4455 12
+4475 20
+4478 4
+4489 2
+4565 2
+4571 5
+4574 1
+4578 3
+4581 4
+4583 3
+4586 3
+4589 2
+4590 2
+4591 2
+4597 2
+4606 2
+4611 2
+4615 1
+4623 1
+4624 4
+4625 2
+4627 1
+4629 2
+4631 2
+4634 1
+4635 1
+4637 3
+4638 2
+4668 2
+4688 3
+4698 55
+fn=_int_malloc
+1338 22028
+1357 44056
+1999 33042
+3766 99126
+3807 22028
+3834 26270
+3836 57570
+3839 12423
+3841 55051
+3897 10
+3899 6
+3900 22022
+3902 33033
+3959 20
+3960 9
+3978 11014
+3979 33042
+3980 55068
+3984 11014
+3989 4
+3990 44076
+3992 4
+3993 8
+3994 4
+3996 8
+3997 12
+3999 4
+4000 16
+4002 16
+4004 8
+4005 8
+4007 8
+4018 16
+4019 4
+4020 2
+4021 1
+4024 1
+4025 1
+4026 1
+4027 3
+4028 3
+4029 2
+4031 2
+4035 8
+4037 3
+4038 1
+4041 1
+4049 3
+4050 3
+4054 6
+4083 6
+4091 21
+4092 12
+4093 3
+4096 6
+4140 15
+4143 18
+4144 3
+4145 3
+4146 3
+4147 3
+4152 6
+4153 6
+4162 9
+4168 6
+4179 22026
+4181 9
+4184 9
+4252 11013
+4253 22026
+4254 22026
+4255 22026
+4256 33039
+4261 22026
+4265 88090
+4268 121112
+4270 12
+4271 6
+4275 42
+4277 18
+4279 36
+4283 3
+4286 6
+4295 6
+4298 6
+4300 9
+4303 6
+4306 9
+4316 3
+4321 12
+4322 6
+4325 3
+4326 3
+4327 3
+4330 6
+4331 1
+4332 6
+4334 6
+4337 24
+4339 9
+4340 3
+4343 3
+4365 11010
+4366 22020
+4368 22020
+4371 33033
+4373 11007
+4374 11007
+4375 11007
+4376 88056
+4378 22014
+4381 11007
+4388 9
+4403 12
+4404 6
+4409 99126
+fn=free
+3346 20
+3350 10
+3358 10
+3360 5
+3362 15
+3379 15
+3385 10
+3388 5
+3389 20
+fn=malloc
+1338 22026
+1357 53651
+3235 4
+3281 66078
+3288 22026
+3298 19198
+3300 44052
+3303 22026
+3304 4
+3305 33036
+3313 22026
+3315 44052
+3316 55065
+3341 55065
+fn=ptmalloc_init.part.0
+1960 385
+1963 381
+1971 1
+1972 1
+1974 1
+3156 7
+fn=sysmalloc
+2022 16
+2542 27
+2562 12
+2563 3
+2573 6
+2574 6
+2600 3
+2601 9
+2602 3
+2604 6
+2611 29
+2617 9
+2620 9
+2681 6
+2690 12
+2703 9
+2707 3
+2711 21
+2719 6
+2721 15
+2722 15
+2724 3
+2727 6
+2758 6
+2759 2
+2760 15
+2766 10
+2767 8
+2769 4
+2809 3
+2831 2
+2832 7
+2834 2
+2835 5
+2847 4
+2889 1
+2890 4
+2891 5
+2902 2
+2935 6
+2936 3
+2940 2
+2941 6
+2944 6
+2946 3
+2947 3
+2948 3
+2949 24
+2950 6
+2952 3
+2958 33
+fn=tcache_init.part.0
+3229 3
+3238 8
+3239 4
+3240 2
+3248 4
+3255 2
+3257 2
+3258 88
+3261 4
+fn=unlink_chunk.constprop.0
+1620 3
+1622 15
+1625 6
+1626 6
+1628 12
+1631 3
+1632 3
+1633 15
+1635 6
+1636 9
+1639 6
+1653 3
+1654 3
+1657 6
+fl=./malloc/./malloc/morecore.c
+fn=__glibc_morecore
+25 8
+26 8
+29 4
+30 8
+34 8
+fl=./malloc/./malloc/scratch_buffer_set_array_size.c
+fn=__libc_scratch_buffer_set_array_size
+30 20
+34 2
+35 4
+45 4
+46 2
+63 8
+fl=./misc/../sysdeps/unix/syscall-template.S
+fn=mprotect
+117 20
+122 4
+fn=munmap
+117 5
+122 1
+fl=./misc/../sysdeps/unix/sysv/linux/brk.c
+fn=brk
+36 4
+37 8
+38 8
+44 4
+45 4
+fl=./misc/../sysdeps/unix/sysv/linux/brk_call.h
+fn=brk
+24 8
+fl=./misc/../sysdeps/unix/sysv/linux/mmap64.c
+fn=mmap
+47 24
+50 24
+58 48
+60 12
+fl=./misc/./misc/init-misc.c
+fn=__init_misc
+30 5
+31 5
+33 3
+37 5
+38 3
+40 4
+fl=./misc/./misc/sbrk.c
+fn=sbrk
+37 20
+40 8
+43 8
+58 8
+59 4
+62 8
+63 1
+66 12
+74 9
+78 20
+fl=./nptl/../sysdeps/unix/sysv/linux/x86/elision-conf.c
+fn=__lll_elision_init
+96 6
+101 5
+103 4
+105 4
+107 4
+109 4
+113 3
+114 1
+115 6
+fl=./nptl/./nptl/libc-cleanup.c
+fn=__libc_cleanup_pop_restore
+54 4
+55 4
+57 8
+59 4
+60 12
+71 4
+fn=__libc_cleanup_push_defer
+24 4
+25 4
+27 8
+29 8
+32 8
+46 12
+48 4
+49 4
+fl=./nptl/./nptl/pthread_mutex_conf.c
+fn=__pthread_tunables_init
+50 6
+51 5
+53 4
+55 6
+fl=./nptl/./nptl/pthread_mutex_lock.c
+fn=pthread_mutex_lock@@GLIBC_2.2.5
+44 1
+45 7
+46 1
+77 3
+80 3
+82 1
+84 2
+88 2
+97 2
+108 4
+112 1
+115 2
+124 1
+130 3
+131 2
+179 1
+182 1
+184 1
+187 1
+190 3
+fl=./nptl/./nptl/pthread_mutex_unlock.c
+fn=pthread_mutex_unlock@@GLIBC_2.2.5
+39 1
+40 4
+41 2
+51 3
+52 2
+57 2
+62 1
+65 1
+70 1
+72 1
+74 2
+80 4
+84 3
+87 3
+367 2
+369 2
+fl=./posix/../malloc/dynarray-skeleton.c
+fn=__unregister_atfork
+243 2
+fl=./posix/../sysdeps/unix/sysv/linux/_exit.c
+fn=_Exit
+27 2
+30 3
+31 2
+fl=./posix/./posix/register-atfork.c
+fn=__unregister_atfork
+71 4
+82 6
+83 8
+109 8
+110 4
+fl=./resource/../sysdeps/unix/sysv/linux/getrlimit64.c
+fn=getrlimit
+38 2
+39 7
+40 1
+fl=./setjmp/../sysdeps/x86_64/bsd-_setjmp.S
+fn=_setjmp
+28 1
+30 1
+32 1
+fl=./setjmp/../sysdeps/x86_64/setjmp.S
+fn=__sigsetjmp
+30 4
+32 4
+41 4
+42 8
+43 4
+47 4
+48 4
+49 4
+50 4
+51 4
+53 8
+55 4
+56 4
+57 4
+59 8
+61 4
+66 1
+67 1
+69 3
+72 3
+73 3
+80 3
+81 3
+84 1
+fl=./setjmp/./setjmp/sigjmp.c
+fn=__sigjmp_save
+28 3
+29 1
+30 2
+34 3
+fl=./stdlib/../sysdeps/unix/sysv/linux/getrandom.c
+fn=getrandom
+28 1
+29 6
+30 1
+fl=./stdlib/./stdlib/cxa_atexit.c
+fn=__cxa_atexit
+41 2
+43 8
+44 2
+46 2
+53 3
+55 1
+56 1
+57 1
+58 1
+59 4
+60 1
+69 6
+71 6
+fn=__new_exitfn
+82 5
+83 1
+88 2
+93 9
+95 4
+103 1
+124 1
+125 1
+136 1
+138 1
+139 1
+143 5
+fl=./stdlib/./stdlib/cxa_finalize.c
+fn=__cxa_finalize
+30 18
+33 12
+36 12
+40 12
+94 14
+98 12
+105 4
+106 4
+107 8
+108 16
+fl=./stdlib/./stdlib/cxa_thread_atexit_impl.c
+fn=__call_tls_dtors
+149 4
+150 4
+168 4
+fl=./stdlib/./stdlib/exit.c
+fn=__run_exit_handlers
+40 11
+45 2
+46 2
+48 5
+56 1
+58 3
+62 1
+66 6
+68 2
+71 7
+98 1
+105 2
+106 1
+107 1
+109 2
+112 4
+113 3
+114 4
+124 2
+125 2
+131 4
+133 2
+134 9
+136 2
+fn=exit
+142 4
+143 4
+fl=./string/../include/rtld-malloc.h
+fn=strdup
+56 6
+fl=./string/../string/strcspn.c
+fn=strcspn
+32 4
+33 3
+34 2
+39 6
+40 4
+41 4
+42 4
+44 1
+47 9
+48 6
+51 4
+52 4
+53 4
+54 4
+56 3
+61 17
+62 34
+63 34
+64 34
+65 17
+67 102
+69 1
+70 4
+71 3
+fl=./string/../string/strstr.c
+fn=__GI_strstr
+77 11
+82 3
+84 5
+85 2
+129 2
+161 12
+fl=./string/../sysdeps/x86/cacheinfo.c
+fn=__x86_cacheinfo
+86 3
+fl=./string/../sysdeps/x86/cacheinfo.h
+fn=__x86_cacheinfo
+61 1
+64 3
+66 3
+67 1
+73 3
+75 3
+76 1
+80 2
+82 2
+83 2
+84 2
+86 2
+fl=./string/../sysdeps/x86_64/multiarch/../multiarch/memcmp-sse2.S
+fn=bcmp
+59 1
+71 1
+72 1
+94 1
+95 1
+140 1
+141 1
+143 1
+144 1
+146 1
+147 1
+149 1
+150 1
+197 1
+198 1
+199 1
+200 1
+201 1
+202 1
+fl=./string/../sysdeps/x86_64/multiarch/../multiarch/memset-vec-unaligned-erms.S
+fn=memset
+140 7
+141 35
+146 7
+147 7
+150 6
+151 6
+189 4
+190 4
+192 4
+252 6
+253 6
+265 6
+268 6
+269 6
+281 2
+282 2
+295 2
+296 2
+298 2
+301 1
+304 2
+307 29
+308 29
+309 29
+310 29
+311 29
+312 29
+313 29
+316 2
+317 2
+318 2
+319 2
+324 2
+358 1
+360 1
+361 1
+400 1
+401 1
+403 1
+fl=./string/../sysdeps/x86_64/multiarch/../multiarch/strchr-sse2.S
+fn=index
+32 15
+33 15
+34 15
+35 15
+36 15
+37 15
+38 15
+39 15
+40 15
+41 15
+42 15
+43 15
+44 15
+45 15
+46 15
+47 15
+48 15
+49 15
+50 12
+54 12
+55 12
+56 12
+57 12
+59 12
+63 3
+64 3
+65 3
+66 3
+67 3
+68 3
+69 3
+70 3
+71 3
+72 3
+73 3
+74 3
+75 3
+76 3
+77 3
+78 3
+79 3
+80 3
+81 3
+82 3
+83 3
+84 3
+85 3
+86 5
+91 1
+92 1
+95 1
+96 1
+97 1
+98 1
+99 1
+100 1
+101 1
+102 1
+103 1
+104 1
+105 1
+106 1
+107 1
+108 1
+109 1
+110 1
+111 1
+112 1
+114 1
+115 1
+117 1
+118 1
+119 1
+120 1
+121 1
+122 1
+123 1
+124 1
+126 1
+127 1
+128 1
+129 1
+130 1
+131 1
+132 1
+133 1
+134 1
+135 1
+138 3
+142 3
+143 3
+144 3
+145 3
+147 3
+fl=./string/../sysdeps/x86_64/multiarch/../multiarch/strcmp-sse2.S
+fn=strcmp
+98 189
+131 189
+132 189
+134 189
+135 189
+156 189
+157 189
+158 109
+159 109
+160 78
+161 78
+162 78
+163 78
+180 78
+181 78
+182 78
+183 78
+184 78
+185 78
+186 78
+191 8
+192 24
+201 119
+202 119
+203 119
+204 119
+205 119
+206 119
+207 119
+208 119
+209 59
+210 36
+211 36
+212 36
+214 59
+215 59
+216 59
+217 59
+218 59
+219 59
+229 60
+230 60
+231 60
+233 60
+239 60
+240 60
+241 60
+242 60
+243 60
+248 60
+250 58
+251 58
+252 116
+260 58
+261 58
+264 58
+265 58
+266 58
+267 58
+268 58
+269 58
+300 2
+301 2
+302 2
+303 2
+304 2
+306 2
+307 2
+308 2
+309 2
+310 2
+311 2
+312 2
+313 2
+316 2
+317 2
+318 2
+324 2
+325 2
+326 4
+330 2
+331 2
+334 2
+335 2
+336 2
+338 2
+339 2
+340 2
+344 2
+345 2
+346 2
+347 2
+348 2
+349 2
+424 1
+425 1
+426 1
+427 1
+428 1
+430 1
+431 1
+432 1
+433 1
+434 1
+435 1
+436 1
+437 1
+440 1
+441 1
+442 1
+448 1
+449 1
+450 2
+454 1
+455 1
+458 1
+459 1
+460 1
+462 1
+463 1
+464 1
+468 1
+469 1
+470 1
+471 1
+472 1
+473 1
+661 1
+662 1
+663 1
+664 1
+665 1
+667 1
+668 1
+669 1
+670 1
+671 1
+672 1
+673 1
+674 1
+678 1
+679 1
+680 1
+686 1
+687 1
+688 2
+692 1
+693 1
+696 1
+697 1
+698 1
+700 1
+701 1
+702 1
+706 1
+707 1
+708 1
+709 1
+710 1
+711 1
+780 2
+781 2
+782 2
+783 2
+784 2
+786 2
+787 2
+788 2
+789 2
+790 2
+791 2
+792 2
+793 2
+797 2
+798 2
+799 2
+805 2
+806 2
+807 4
+811 2
+812 2
+815 2
+816 2
+817 2
+819 2
+820 2
+821 2
+825 2
+826 2
+827 2
+828 2
+829 2
+830 2
+837 1
+838 1
+840 1
+841 1
+843 1
+844 1
+845 1
+847 1
+848 1
+849 1
+853 1
+854 1
+855 1
+856 1
+857 1
+858 1
+899 1
+900 1
+901 1
+902 1
+903 1
+905 1
+906 1
+907 1
+908 1
+909 1
+910 1
+911 1
+1137 4
+1138 4
+1139 4
+1140 4
+1141 4
+1143 4
+1144 4
+1145 4
+1146 4
+1147 4
+1148 4
+1149 4
+1256 4
+1257 4
+1258 4
+1259 4
+1260 4
+1262 4
+1263 4
+1264 4
+1265 4
+1266 4
+1267 4
+1268 4
+1269 2
+1273 2
+1274 2
+1275 2
+1281 2
+1282 2
+1283 4
+1287 2
+1288 2
+1291 2
+1292 2
+1293 2
+1295 2
+1296 2
+1297 2
+1301 2
+1302 2
+1303 2
+1304 2
+1305 2
+1306 2
+1375 2
+1376 2
+1377 2
+1378 2
+1379 2
+1381 2
+1382 2
+1383 2
+1384 2
+1385 2
+1386 2
+1387 2
+1388 2
+1392 2
+1393 2
+1394 2
+1400 2
+1401 2
+1402 4
+1406 2
+1407 2
+1410 2
+1411 2
+1412 2
+1414 2
+1415 2
+1416 2
+1420 2
+1421 2
+1422 2
+1423 2
+1424 2
+1425 2
+1432 1
+1433 1
+1435 1
+1436 1
+1438 1
+1439 1
+1440 1
+1442 1
+1443 1
+1444 1
+1448 1
+1449 1
+1450 1
+1451 1
+1452 1
+1453 1
+1494 3
+1495 3
+1496 3
+1497 3
+1498 3
+1500 3
+1501 3
+1502 3
+1503 3
+1504 3
+1505 3
+1506 3
+1507 1
+1511 1
+1512 1
+1513 1
+1519 1
+1520 1
+1521 2
+1525 1
+1526 1
+1529 1
+1530 1
+1531 1
+1533 1
+1534 1
+1535 1
+1539 1
+1540 1
+1541 1
+1542 1
+1543 1
+1544 1
+1613 23
+1614 23
+1615 23
+1616 23
+1617 23
+1619 23
+1620 23
+1621 23
+1622 23
+1623 23
+1624 23
+1625 23
+1626 21
+1630 21
+1631 21
+1632 21
+1638 21
+1639 21
+1640 42
+1644 21
+1645 21
+1648 21
+1649 21
+1650 21
+1652 21
+1653 21
+1654 21
+1658 21
+1659 21
+1660 21
+1661 21
+1662 21
+1663 21
+1732 8
+1733 8
+1734 8
+1735 8
+1736 8
+1738 8
+1739 8
+1740 8
+1741 8
+1742 8
+1743 8
+1744 8
+1745 8
+1749 8
+1750 8
+1751 8
+1757 8
+1758 8
+1759 16
+1763 8
+1764 8
+1767 8
+1768 8
+1769 8
+1771 8
+1772 8
+1773 8
+1777 8
+1778 8
+1779 8
+1780 8
+1781 8
+1782 8
+1851 5
+1852 5
+1853 5
+1854 5
+1855 5
+1857 5
+1858 5
+1859 5
+1860 5
+1861 5
+1862 5
+1863 5
+1864 4
+1868 4
+1869 4
+1870 4
+1876 4
+1877 4
+1878 8
+1882 4
+1883 4
+1886 4
+1887 4
+1888 4
+1890 4
+1891 4
+1892 4
+1896 4
+1897 4
+1898 4
+1899 4
+1900 4
+1901 4
+1970 3
+1971 3
+1972 3
+1973 3
+1974 3
+1976 3
+1977 3
+1978 3
+1979 3
+1980 3
+1981 3
+1982 3
+2093 102
+2095 119
+2096 119
+2097 119
+2098 119
+2099 72
+2104 189
+2110 189
+2111 189
+2119 189
+2120 189
+fn=strncmp
+98 1
+125 1
+126 1
+127 1
+128 1
+129 1
+131 1
+132 1
+134 1
+135 1
+156 1
+157 1
+158 1
+159 1
+160 1
+161 1
+162 1
+163 1
+180 1
+181 1
+182 1
+183 1
+184 1
+185 1
+186 1
+2104 1
+2107 1
+2108 1
+2110 1
+2111 1
+2119 1
+2120 1
+fl=./string/../sysdeps/x86_64/multiarch/../multiarch/strlen-sse2.S
+fn=strlen
+57 16
+101 16
+102 16
+103 16
+104 16
+105 16
+106 16
+107 16
+109 16
+111 16
+142 16
+143 16
+144 16
+145 16
+146 16
+147 7
+149 7
+153 9
+154 9
+155 9
+156 9
+157 9
+158 9
+159 9
+160 9
+161 9
+162 9
+163 9
+164 9
+169 66
+243 3
+244 3
+245 3
+246 3
+247 3
+248 3
+249 3
+250 3
+266 3
+268 3
+269 42
+271 3
+272 3
+273 3
+275 3
+fl=./string/../sysdeps/x86_64/multiarch/ifunc-avx2.h
+fn=memrchr
+38 2
+40 5
+41 2
+42 3
+45 2
+46 2
+49 3
+52 6
+fn=rindex
+38 2
+40 5
+41 2
+42 3
+45 2
+46 2
+49 3
+52 6
+fn=strchrnul
+38 2
+40 5
+41 2
+42 3
+45 2
+46 2
+49 3
+52 6
+fn=strlen
+38 4
+40 10
+41 4
+42 6
+45 4
+46 4
+49 6
+52 12
+fn=strnlen
+38 4
+40 10
+41 4
+42 6
+45 4
+46 4
+49 6
+52 12
+fl=./string/../sysdeps/x86_64/multiarch/ifunc-evex.h
+fn=memchr
+37 2
+38 5
+39 3
+42 2
+43 2
+45 2
+51 3
+54 4
+fn=rawmemchr
+37 2
+38 5
+39 3
+42 2
+43 2
+45 2
+51 3
+54 4
+fl=./string/../sysdeps/x86_64/multiarch/ifunc-memcmp.h
+fn=bcmp
+34 5
+35 2
+36 2
+37 3
+40 2
+41 2
+44 3
+47 4
+fl=./string/../sysdeps/x86_64/multiarch/ifunc-memmove.h
+fn=memcpy@@GLIBC_2.14
+56 2
+57 3
+60 3
+61 2
+74 2
+77 2
+85 2
+93 2
+96 5
+fn=memmove
+56 2
+57 3
+60 3
+61 2
+74 2
+77 2
+85 2
+93 2
+96 5
+fn=mempcpy
+56 2
+57 3
+60 3
+61 2
+74 2
+77 2
+85 2
+93 2
+96 5
+fl=./string/../sysdeps/x86_64/multiarch/ifunc-memset.h
+fn=memset
+54 5
+57 3
+58 2
+64 2
+73 2
+75 2
+85 2
+93 2
+96 5
+fl=./string/../sysdeps/x86_64/multiarch/ifunc-sse4_2.h
+fn=strcspn
+36 5
+fn=strpbrk
+36 5
+fn=strspn
+36 5
+fl=./string/../sysdeps/x86_64/multiarch/ifunc-strcasecmp.h
+fn=strcasecmp
+36 2
+37 4
+38 3
+41 2
+42 2
+45 3
+48 5
+fn=strcasecmp_l
+36 2
+37 4
+38 3
+41 2
+42 2
+45 3
+48 5
+fn=strncasecmp
+36 2
+37 4
+38 3
+41 2
+42 2
+45 3
+48 5
+fn=strncasecmp_l
+36 2
+37 4
+38 3
+41 2
+42 2
+45 3
+48 5
+fl=./string/../sysdeps/x86_64/multiarch/ifunc-strcpy.h
+fn=stpcpy
+38 4
+39 3
+42 2
+43 2
+46 3
+49 3
+fn=strcat
+38 4
+39 3
+42 2
+43 2
+46 3
+49 3
+fn=strcpy
+38 8
+39 6
+42 4
+43 4
+46 6
+49 6
+fl=./string/../sysdeps/x86_64/multiarch/ifunc-strncpy.h
+fn=stpncpy
+35 5
+36 3
+39 2
+40 2
+43 3
+46 4
+fn=strncpy
+35 5
+36 3
+39 2
+40 2
+43 3
+46 4
+fl=./string/../sysdeps/x86_64/multiarch/memchr-avx2.S
+fn=__memchr_avx2
+61 1
+68 1
+70 1
+73 1
+74 1
+76 1
+77 1
+78 1
+79 1
+82 1
+83 1
+86 1
+87 1
+101 1
+106 1
+113 1
+114 1
+115 1
+116 1
+fl=./string/../sysdeps/x86_64/multiarch/memchr.c
+fn=memchr
+29 2
+fl=./string/../sysdeps/x86_64/multiarch/memcmp.c
+fn=bcmp
+29 4
+fl=./string/../sysdeps/x86_64/multiarch/memcpy.c
+fn=memcpy@@GLIBC_2.14
+29 1
+fl=./string/../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S
+fn=__memcpy_avx_unaligned_erms
+263 1
+264 1
+270 2
+271 2
+313 2
+314 2
+316 1
+317 1
+319 1
+323 1
+324 1
+325 1
+326 1
+327 1
+331 1
+333 1
+350 1
+351 1
+352 1
+353 1
+355 1
+fn=__mempcpy_avx_unaligned_erms
+250 1
+251 1
+252 1
+253 1
+fn=memcpy
+218 18
+219 18
+225 72
+226 72
+228 23
+229 23
+230 23
+232 19
+233 19
+234 19
+236 19
+316 49
+317 49
+319 14
+323 14
+324 7
+325 7
+326 3
+327 3
+328 2
+329 2
+331 3
+333 7
+339 7
+340 7
+341 7
+342 7
+343 7
+372 35
+373 35
+374 35
+375 35
+376 35
+400 4
+401 4
+403 4
+404 4
+405 4
+407 4
+408 4
+409 4
+410 4
+411 4
+412 4
+413 4
+414 4
+415 4
+416 4
+417 4
+418 4
+419 4
+420 4
+421 4
+fn=mempcpy
+205 54
+206 54
+207 54
+208 54
+fl=./string/../sysdeps/x86_64/multiarch/memmove.c
+fn=memmove
+29 1
+fl=./string/../sysdeps/x86_64/multiarch/mempcpy.c
+fn=mempcpy
+33 1
+fl=./string/../sysdeps/x86_64/multiarch/memrchr.c
+fn=memrchr
+29 3
+fl=./string/../sysdeps/x86_64/multiarch/memset.c
+fn=memset
+29 1
+fl=./string/../sysdeps/x86_64/multiarch/rawmemchr.c
+fn=rawmemchr
+31 2
+fl=./string/../sysdeps/x86_64/multiarch/stpcpy.c
+fn=stpcpy
+33 3
+fl=./string/../sysdeps/x86_64/multiarch/stpncpy.c
+fn=stpncpy
+31 3
+fl=./string/../sysdeps/x86_64/multiarch/strcasecmp.c
+fn=strcasecmp
+31 3
+fl=./string/../sysdeps/x86_64/multiarch/strcasecmp_l.c
+fn=strcasecmp_l
+31 3
+fl=./string/../sysdeps/x86_64/multiarch/strcat.c
+fn=strcat
+29 3
+fl=./string/../sysdeps/x86_64/multiarch/strchr-avx2.S
+fn=__strchr_avx2
+53 1
+55 1
+56 1
+57 1
+58 1
+59 1
+62 1
+63 1
+67 1
+68 1
+69 1
+70 1
+71 1
+72 1
+73 1
+74 1
+77 1
+85 1
+93 1
+94 2
+fl=./string/../sysdeps/x86_64/multiarch/strchr.c
+fn=index
+42 2
+43 4
+44 3
+47 2
+48 2
+51 3
+54 3
+65 3
+fl=./string/../sysdeps/x86_64/multiarch/strchrnul.c
+fn=strchrnul
+31 3
+fl=./string/../sysdeps/x86_64/multiarch/strcmp-avx2.S
+fn=__strcmp_avx2
+206 24179
+249 24179
+274 24179
+275 24179
+276 24179
+278 24179
+279 24179
+283 23917
+287 23917
+289 23917
+291 23917
+294 23917
+303 23917
+304 23917
+308 23917
+318 23917
+319 23917
+322 23917
+326 47834
+953 24179
+954 24179
+957 1799
+958 1799
+959 1799
+960 1799
+962 1799
+965 1799
+966 1799
+970 525
+971 525
+980 38
+984 38
+985 70
+993 234
+994 234
+995 234
+996 234
+997 234
+998 234
+1000 234
+1035 262
+1036 262
+1047 262
+1048 262
+1051 262
+1052 262
+1053 262
+1056 524
+1076 1274
+1077 1274
+1080 224
+1081 224
+1082 224
+1083 224
+1089 224
+1091 224
+1094 224
+1095 224
+1100 28
+1101 28
+1103 28
+1104 28
+1105 28
+1106 28
+1107 28
+1108 28
+1109 28
+fl=./string/../sysdeps/x86_64/multiarch/strcmp.c
+fn=strcmp
+47 4
+48 8
+49 6
+52 4
+53 4
+56 6
+59 6
+79 6
+fl=./string/../sysdeps/x86_64/multiarch/strcpy-avx2.S
+fn=__strcpy_avx2
+56 1668
+62 1668
+64 1668
+69 1668
+71 1668
+72 1668
+73 1668
+75 637
+76 637
+78 637
+79 637
+80 637
+94 637
+95 637
+307 1031
+308 1031
+309 1031
+310 1031
+320 1031
+321 1031
+352 637
+354 1668
+356 1668
+357 1668
+358 1668
+359 1668
+360 1665
+361 1665
+362 1414
+363 1414
+364 828
+365 828
+366 718
+367 718
+368 656
+552 656
+553 656
+562 1312
+566 62
+567 62
+568 62
+577 124
+581 110
+582 110
+591 220
+595 586
+596 586
+597 586
+598 586
+608 1172
+612 251
+613 251
+614 251
+615 251
+625 502
+629 3
+630 3
+631 3
+632 3
+642 6
+fl=./string/../sysdeps/x86_64/multiarch/strcpy.c
+fn=strcpy
+29 6
+fl=./string/../sysdeps/x86_64/multiarch/strcspn.c
+fn=strcspn
+29 2
+fl=./string/../sysdeps/x86_64/multiarch/strlen-avx2.S
+fn=__strlen_avx2
+52 1669
+65 1669
+66 1669
+67 1669
+70 1669
+72 1669
+73 1669
+76 1669
+77 1669
+85 1669
+86 1669
+87 1669
+92 3338
+fl=./string/../sysdeps/x86_64/multiarch/strlen.c
+fn=strlen
+29 6
+fl=./string/../sysdeps/x86_64/multiarch/strncase.c
+fn=strncasecmp
+31 3
+fl=./string/../sysdeps/x86_64/multiarch/strncase_l.c
+fn=strncasecmp_l
+31 3
+fl=./string/../sysdeps/x86_64/multiarch/strncmp.c
+fn=strncmp
+43 2
+44 4
+45 3
+48 2
+49 2
+52 3
+55 5
+67 3
+fl=./string/../sysdeps/x86_64/multiarch/strncpy.c
+fn=strncpy
+29 3
+fl=./string/../sysdeps/x86_64/multiarch/strnlen.c
+fn=strnlen
+31 6
+fl=./string/../sysdeps/x86_64/multiarch/strpbrk.c
+fn=strpbrk
+29 2
+fl=./string/../sysdeps/x86_64/multiarch/strrchr-avx2.S
+fn=__strrchr_avx2
+53 1
+54 1
+55 1
+57 1
+58 1
+62 1
+63 1
+64 1
+67 1
+69 1
+70 1
+71 1
+72 1
+75 1
+76 1
+78 1
+79 1
+80 1
+81 1
+82 1
+91 2
+fl=./string/../sysdeps/x86_64/multiarch/strrchr.c
+fn=rindex
+28 3
+fl=./string/../sysdeps/x86_64/multiarch/strspn.c
+fn=strspn
+29 2
+fl=./string/./string/strdup.c
+fn=strdup
+40 15
+41 6
+44 6
+47 12
+48 9
+fl=./wcsmbs/../sysdeps/x86_64/multiarch/ifunc-avx2.h
+fn=wcschr
+38 4
+40 10
+41 4
+42 6
+45 4
+46 4
+49 6
+52 12
+fn=wcscmp
+38 2
+40 5
+41 2
+42 3
+45 2
+46 2
+49 3
+52 6
+fl=./wcsmbs/../sysdeps/x86_64/multiarch/ifunc-evex.h
+fn=wmemchr
+37 4
+38 10
+39 6
+42 4
+43 4
+45 4
+51 6
+54 8
+fl=./wcsmbs/../sysdeps/x86_64/multiarch/ifunc-memcmp.h
+fn=wmemcmp
+34 5
+35 2
+36 2
+37 3
+40 2
+41 2
+44 3
+47 4
+fl=./wcsmbs/../sysdeps/x86_64/multiarch/ifunc-wcslen.h
+fn=wcslen
+40 2
+41 4
+42 3
+45 2
+46 2
+49 3
+52 3
+fn=wcsnlen
+40 2
+41 4
+42 3
+45 2
+46 2
+49 3
+52 3
+fl=./wcsmbs/../sysdeps/x86_64/multiarch/ifunc-wmemset.h
+fn=wmemset
+36 10
+37 6
+fl=./wcsmbs/../sysdeps/x86_64/multiarch/wcschr.c
+fn=wcschr
+31 6
+fl=./wcsmbs/../sysdeps/x86_64/multiarch/wcscmp.c
+fn=wcscmp
+30 3
+fl=./wcsmbs/../sysdeps/x86_64/multiarch/wcscpy.c
+fn=wcscpy
+38 5
+44 2
+fl=./wcsmbs/../sysdeps/x86_64/multiarch/wcslen.c
+fn=wcslen
+29 3
+fl=./wcsmbs/../sysdeps/x86_64/multiarch/wcsnlen.c
+fn=wcsnlen
+30 3
+fl=./wcsmbs/../sysdeps/x86_64/multiarch/wmemchr.c
+fn=wmemchr
+31 4
+fl=./wcsmbs/../sysdeps/x86_64/multiarch/wmemcmp.c
+fn=wmemcmp
+29 4
+fl=./wcsmbs/../sysdeps/x86_64/multiarch/wmemset.c
+fn=wmemset
+31 4
+fl=/home/njn/grind/ws1/cachegrind/docs/concord.c
+fn=add_existing
+264 34400
+269 13760
+270 18015
+271 12010
+272 6005
+273 6005
+277 40405
+fn=append
+350 6
+352 3
+354 6
+356 1
+357 1
+359 2
+367 5
+fn=create
+109 22018
+112 22018
+113 22018
+119 22018
+fn=get_word
+168 87021
+169 15822
+175 529847
+177 260964
+178 391446
+179 64844
+180 15820
+181 7910
+183 32422
+185 147034
+191 1
+194 15822
+197 63288
+fn=hash
+125 7910
+126 7910
+128 161328
+129 453908
+fn=init_hash_table
+136 8
+139 2
+142 3
+145 2993
+146 997
+149 4
+150 2
+157 55377
+158 31640
+160 2
+161 2
+162 6
+fn=insert
+201 102830
+202 7910
+204 7910
+209 31640
+210 3185
+211 7910
+217 118410
+219 28372
+225 35420
+226 4120
+227 1030
+233 6880
+234 27520
+236 78070
+fn=interact
+281 12
+296 3
+297 2
+298 7
+300 6
+302 8
+304 1
+308 2
+310 2
+311 11
+fn=kill_arg_list
+522 5
+525 4
+527 2
+528 2
+529 2
+531 4
+fn=main
+89 8
+94 2
+99 4
+100 2
+105 7
+fn=new_word_node
+241 10002
+244 5001
+245 8335
+246 1667
+248 1667
+249 1667
+251 5001
+252 1667
+253 1667
+254 1667
+257 8335
+fn=place_args_in_list
+317 10
+318 2
+320 1
+327 12
+328 8
+329 11
+330 3
+331 4
+333 2
+335 2
+336 1
+337 2
+338 4
+339 2
+344 11
+fl=/usr/include/ctype.h
+fn=get_word
+209 456694
+fl=/usr/include/x86_64-linux-gnu/bits/stdio2.h
+fn=interact
+86 4
+213 5
+fl=/usr/include/x86_64-linux-gnu/bits/string_fortified.h
+fn=append
+79 2
+fn=new_word_node
+79 3334
+fl=???
+fn=(below main)
+0 12
+fn=???
+0 468701
+summary: 8201333