Delete all the old documentation ...

git-svn-id: svn://svn.valgrind.org/valgrind/trunk@1283
This commit is contained in:
Julian Seward 2002-11-11 00:11:22 +00:00
parent 4623a5d36c
commit 50040b9ebc
19 changed files with 0 additions and 10803 deletions


@@ -1,10 +0,0 @@
<html>
<head>
<title>AddrCheck</title>
</head>
<body>
(no docs yet, sorry)
</body>
</html>


@@ -1,26 +0,0 @@
<!doctype html public "-//w3c//dtd html 4.0 transitional//en">
<html>
<head>
<meta http-equiv="Content-Type"
content="text/html; charset=iso-8859-1">
<meta http-equiv="Content-Language" content="en-gb">
<meta name="generator"
content="Mozilla/4.76 (X11; U; Linux 2.4.1-0.1.9 i586) [Netscape]">
<meta name="author" content="Julian Seward <jseward@acm.org>">
<meta name="description" content="say what this prog does">
<meta name="keywords" content="Valgrind, memory checker, x86, GPL">
<title>Valgrind's user manual</title>
</head>
<frameset cols="150,*">
<frame name="nav" target="main" src="nav.html">
<frame name="main" src="manual.html" scrolling="auto">
<noframes>
<body>
<p>This page uses frames, but your browser doesn't support them.</p>
</body>
</noframes>
</frameset>
</html>


@@ -1,752 +0,0 @@
<html>
<head>
<style type="text/css">
body { background-color: #ffffff;
color: #000000;
font-family: Times, Helvetica, Arial;
font-size: 14pt}
h4 { margin-bottom: 0.3em}
code { color: #000000;
font-family: Courier;
font-size: 13pt }
pre { color: #000000;
font-family: Courier;
font-size: 13pt }
a:link { color: #0000C0;
text-decoration: none; }
a:visited { color: #0000C0;
text-decoration: none; }
a:active { color: #0000C0;
text-decoration: none; }
</style>
<title>Cachegrind</title>
</head>
<body bgcolor="#ffffff">
<a name="title">&nbsp;</a>
<h1 align=center>Cachegrind, version 1.0.0</h1>
<center>This manual was last updated on 20020726</center>
<p>
<center>
<a href="mailto:jseward@acm.org">jseward@acm.org</a><br>
Copyright &copy; 2000-2002 Julian Seward
<p>
Cachegrind is licensed under the GNU General Public License,
version 2<br>
An open-source cache profiler for Linux-x86 executables.
</center>
<p>
<hr width="100%">
<a name="contents"></a>
<h2>Contents of this manual</h2>
<h4>1&nbsp; <a href="#cache">How to use Cachegrind</a></h4>
<h4>2&nbsp; <a href="techdocs.html">How Cachegrind works</a></h4>
<hr width="100%">
<a name="cache"></a>
<h2>1&nbsp; Cache profiling</h2>
Cachegrind is a tool for doing cache simulations and annotating your source
line-by-line with the number of cache misses. In particular, it records:
<ul>
<li>L1 instruction cache reads and misses;
<li>L1 data cache reads and read misses, writes and write misses;
<li>L2 unified cache reads and read misses, writes and write misses.
</ul>
On a modern x86 machine, an L1 miss will typically cost around 10 cycles,
and an L2 miss can cost as much as 200 cycles. Detailed cache profiling can be
very useful for improving the performance of your program.<p>
Also, since one instruction cache read is performed per instruction executed,
you can find out how many instructions are executed per line, which can be
useful for traditional profiling and test coverage.<p>
Any feedback, bug-fixes, suggestions, etc, welcome.
<h3>1.1&nbsp; Overview</h3>
First off, as for normal Valgrind use, you probably want to compile with
debugging info (the <code>-g</code> flag). But by contrast with normal
Valgrind use, you probably <b>do</b> want to turn optimisation on, since you
should profile your program as it will be normally run.
The two steps are:
<ol>
<li>Run your program with <code>valgrind --skin=cachegrind</code> in front of
the normal command line invocation. When the program finishes,
Valgrind will print summary cache statistics. It also collects
line-by-line information in a file
<code>cachegrind.out.<i>pid</i></code>, where <code><i>pid</i></code>
is the program's process id.
<p>
This step should be done every time you want to collect
information about a new program, a changed program, or about the
same program with different input.
</li>
<p>
<li>Generate a function-by-function summary, and possibly annotate
    source files with 'cg_annotate'. Source files to annotate can be
    specified manually on the command line, or
    "interesting" source files can be annotated automatically with
the <code>--auto=yes</code> option. You can annotate C/C++
files or assembly language files equally easily.
<p>
    This step can be performed as many times as you like for each
    Step 1. You may want to do multiple annotations showing
different information each time.<p>
</li>
</ol>
The steps are described in detail in the following sections.<p>
<h3>1.2&nbsp; Cache simulation specifics</h3>
Cachegrind simulates a machine with a split L1 cache and a unified
L2 cache. This configuration is used for all (modern) x86-based machines we
are aware of. Old Cyrix CPUs had a unified I and D L1 cache, but they are
ancient history now.<p>
The more specific characteristics of the simulation are as follows.
<ul>
<li>Write-allocate: when a write miss occurs, the block written to
is brought into the D1 cache. Most modern caches have this
property.</li><p>
<li>Bit-selection hash function: the line(s) in the cache to which a
memory block maps is chosen by the middle bits M--(M+N-1) of the
byte address, where:
<ul>
<li>&nbsp;line size = 2^M bytes&nbsp;</li>
    <li>(cache size / line size) = 2^N</li>
</ul> </li><p>
<li>Inclusive L2 cache: the L2 cache replicates all the entries of
the L1 cache. This is standard on Pentium chips, but AMD
Athlons use an exclusive L2 cache that only holds blocks evicted
from L1. Ditto AMD Durons and most modern VIAs.</li><p>
</ul>
The cache configuration simulated (cache size, associativity and line size) is
determined automagically using the CPUID instruction. If you have an old
machine that (a) doesn't support the CPUID instruction, or (b) supports it in
an early incarnation that doesn't give any cache information, then Cachegrind
will fall back to using a default configuration (that of a model 3/4 Athlon).
Cachegrind will tell you if this happens. You can manually specify one, two or
all three levels (I1/D1/L2) of the cache from the command line using the
<code>--I1</code>, <code>--D1</code> and <code>--L2</code> options.<p>
Other noteworthy behaviour:
<ul>
<li>References that straddle two cache lines are treated as follows:
<ul>
<li>If both blocks hit --&gt; counted as one hit</li>
<li>If one block hits, the other misses --&gt; counted as one miss</li>
<li>If both blocks miss --&gt; counted as one miss (not two)</li>
</ul><p></li>
<li>Instructions that modify a memory location (eg. <code>inc</code> and
<code>dec</code>) are counted as doing just a read, ie. a single data
reference. This may seem strange, but since the write can never cause a
miss (the read guarantees the block is in the cache) it's not very
interesting.<p>
Thus it measures not the number of times the data cache is accessed, but
the number of times a data cache miss could occur.<p>
</li>
</ul>
If you are interested in simulating a cache with different properties, it is
not particularly hard to write your own cache simulator, or to modify the
existing ones in <code>vg_cachesim_I1.c</code>, <code>vg_cachesim_D1.c</code>,
<code>vg_cachesim_L2.c</code> and <code>vg_cachesim_gen.c</code>. We'd be
interested to hear from anyone who does.
<a name="profile"></a>
<h3>1.3&nbsp; Profiling programs</h3>
Cache profiling is enabled by using the <code>--skin=cachegrind</code>
option to the <code>valgrind</code> shell script. To gather cache profiling
information about the program <code>ls -l</code>, type:
<blockquote><code>valgrind --skin=cachegrind ls -l</code></blockquote>
The program will execute (slowly). Upon completion, summary statistics
that look like this will be printed:
<pre>
==31751== I refs: 27,742,716
==31751== I1 misses: 276
==31751== L2 misses: 275
==31751== I1 miss rate: 0.0%
==31751== L2i miss rate: 0.0%
==31751==
==31751== D refs: 15,430,290 (10,955,517 rd + 4,474,773 wr)
==31751== D1 misses: 41,185 ( 21,905 rd + 19,280 wr)
==31751== L2 misses: 23,085 ( 3,987 rd + 19,098 wr)
==31751== D1 miss rate: 0.2% ( 0.1% + 0.4%)
==31751== L2d miss rate: 0.1% ( 0.0% + 0.4%)
==31751==
==31751== L2 misses: 23,360 ( 4,262 rd + 19,098 wr)
==31751== L2 miss rate: 0.0% ( 0.0% + 0.4%)
</pre>
Cache accesses for instruction fetches are summarised first, giving the
number of fetches made (this is the number of instructions executed, which
can be useful to know in its own right), the number of I1 misses, and the
number of L2 instruction (<code>L2i</code>) misses.<p>
Cache accesses for data follow. The information is similar to that of the
instruction fetches, except that the values are also shown split between reads
and writes (note each row's <code>rd</code> and <code>wr</code> values add up
to the row's total).<p>
Combined instruction and data figures for the L2 cache follow that.<p>
<h3>1.4&nbsp; Output file</h3>
As well as printing summary information, Cachegrind also writes
line-by-line cache profiling information to a file named
<code>cachegrind.out.<i>pid</i></code>. This file is human-readable, but is
best interpreted by the accompanying program <code>cg_annotate</code>,
described in the next section.
<p>
Things to note about the <code>cachegrind.out.<i>pid</i></code> file:
<ul>
<li>It is written every time <code>valgrind --skin=cachegrind</code>
is run, and will overwrite any existing
<code>cachegrind.out.<i>pid</i></code> in the current directory (but
that won't happen very often because it takes some time for process ids
to be recycled).</li>
<p>
<li>It can be huge: <code>ls -l</code> generates a file of about
350KB. Browsing a few files and web pages with a Konqueror
built with full debugging information generates a file
of around 15 MB.</li>
</ul>
Note that older versions of Cachegrind used a log file named
<code>cachegrind.out</code> (i.e. no <code><i>.pid</i></code> suffix).
The suffix serves two purposes. Firstly, it means you don't have to rename old
log files that you don't want to overwrite. Secondly, and more importantly,
it allows correct profiling with the <code>--trace-children=yes</code> option
of programs that spawn child processes.
<a name="profileflags"></a>
<h3>1.5&nbsp; Cachegrind options</h3>
Cachegrind accepts all the options that Valgrind does, although some of them
(ones related to memory checking) don't do anything when cache profiling.<p>
The interesting cache-simulation specific options are:
<ul>
<li><code>--I1=&lt;size&gt;,&lt;associativity&gt;,&lt;line_size&gt;</code><br>
<code>--D1=&lt;size&gt;,&lt;associativity&gt;,&lt;line_size&gt;</code><br>
<code>--L2=&lt;size&gt;,&lt;associativity&gt;,&lt;line_size&gt;</code><p>
[default: uses CPUID for automagic cache configuration]<p>
Manually specifies the I1/D1/L2 cache configuration, where
<code>size</code> and <code>line_size</code> are measured in bytes. The
three items must be comma-separated, but with no spaces, eg:
<blockquote>
      <code>valgrind --skin=cachegrind --I1=65536,2,64</code>
</blockquote>
You can specify one, two or three of the I1/D1/L2 caches. Any level not
manually specified will be simulated using the configuration found in the
normal way (via the CPUID instruction, or failing that, via defaults).
</ul>
<a name="annotate"></a>
<h3>1.6&nbsp; Annotating C/C++ programs</h3>
Before using <code>cg_annotate</code>, it is worth widening your
window to be at least 120-characters wide if possible, as the output
lines can be quite long.
<p>
To get a function-by-function summary, run <code>cg_annotate
--<i>pid</i></code> in a directory containing a
<code>cachegrind.out.<i>pid</i></code> file. The <code>--<i>pid</i></code>
is required so that <code>cg_annotate</code> knows which log file to use when
several are present.
<p>
The output looks like this:
<pre>
--------------------------------------------------------------------------------
I1 cache: 65536 B, 64 B, 2-way associative
D1 cache: 65536 B, 64 B, 2-way associative
L2 cache: 262144 B, 64 B, 8-way associative
Command: concord vg_to_ucode.c
Events recorded: Ir I1mr I2mr Dr D1mr D2mr Dw D1mw D2mw
Events shown: Ir I1mr I2mr Dr D1mr D2mr Dw D1mw D2mw
Event sort order: Ir I1mr I2mr Dr D1mr D2mr Dw D1mw D2mw
Threshold: 99%
Chosen for annotation:
Auto-annotation: on
--------------------------------------------------------------------------------
Ir I1mr I2mr Dr D1mr D2mr Dw D1mw D2mw
--------------------------------------------------------------------------------
27,742,716 276 275 10,955,517 21,905 3,987 4,474,773 19,280 19,098 PROGRAM TOTALS
--------------------------------------------------------------------------------
Ir I1mr I2mr Dr D1mr D2mr Dw D1mw D2mw file:function
--------------------------------------------------------------------------------
8,821,482 5 5 2,242,702 1,621 73 1,794,230 0 0 getc.c:_IO_getc
5,222,023 4 4 2,276,334 16 12 875,959 1 1 concord.c:get_word
2,649,248 2 2 1,344,810 7,326 1,385 . . . vg_main.c:strcmp
2,521,927 2 2 591,215 0 0 179,398 0 0 concord.c:hash
2,242,740 2 2 1,046,612 568 22 448,548 0 0 ctype.c:tolower
1,496,937 4 4 630,874 9,000 1,400 279,388 0 0 concord.c:insert
897,991 51 51 897,831 95 30 62 1 1 ???:???
598,068 1 1 299,034 0 0 149,517 0 0 ../sysdeps/generic/lockfile.c:__flockfile
598,068 0 0 299,034 0 0 149,517 0 0 ../sysdeps/generic/lockfile.c:__funlockfile
598,024 4 4 213,580 35 16 149,506 0 0 vg_clientmalloc.c:malloc
446,587 1 1 215,973 2,167 430 129,948 14,057 13,957 concord.c:add_existing
341,760 2 2 128,160 0 0 128,160 0 0 vg_clientmalloc.c:vg_trap_here_WRAPPER
320,782 4 4 150,711 276 0 56,027 53 53 concord.c:init_hash_table
298,998 1 1 106,785 0 0 64,071 1 1 concord.c:create
149,518 0 0 149,516 0 0 1 0 0 ???:tolower@@GLIBC_2.0
149,518 0 0 149,516 0 0 1 0 0 ???:fgetc@@GLIBC_2.0
95,983 4 4 38,031 0 0 34,409 3,152 3,150 concord.c:new_word_node
85,440 0 0 42,720 0 0 21,360 0 0 vg_clientmalloc.c:vg_bogus_epilogue
</pre>
First up is a summary of the annotation options:
<ul>
<li>I1 cache, D1 cache, L2 cache: cache configuration. So you know the
configuration with which these results were obtained.</li><p>
<li>Command: the command line invocation of the program under
examination.</li><p>
<li>Events recorded: event abbreviations are:<p>
<ul>
<li><code>Ir </code>: I cache reads (ie. instructions executed)</li>
<li><code>I1mr</code>: I1 cache read misses</li>
<li><code>I2mr</code>: L2 cache instruction read misses</li>
<li><code>Dr </code>: D cache reads (ie. memory reads)</li>
<li><code>D1mr</code>: D1 cache read misses</li>
<li><code>D2mr</code>: L2 cache data read misses</li>
<li><code>Dw </code>: D cache writes (ie. memory writes)</li>
<li><code>D1mw</code>: D1 cache write misses</li>
<li><code>D2mw</code>: L2 cache data write misses</li>
</ul><p>
      Note that D1 total misses is given by <code>D1mr</code> +
      <code>D1mw</code>, and that L2 total misses is given by
      <code>I2mr</code> + <code>D2mr</code> + <code>D2mw</code>.</li><p>
<li>Events shown: the events shown (a subset of events gathered). This can
be adjusted with the <code>--show</code> option.</li><p>
<li>Event sort order: the sort order in which functions are shown. For
example, in this case the functions are sorted from highest
<code>Ir</code> counts to lowest. If two functions have identical
<code>Ir</code> counts, they will then be sorted by <code>I1mr</code>
counts, and so on. This order can be adjusted with the
<code>--sort</code> option.<p>
Note that this dictates the order the functions appear. It is <b>not</b>
the order in which the columns appear; that is dictated by the "events
shown" line (and can be changed with the <code>--show</code> option).
</li><p>
<li>Threshold: <code>cg_annotate</code> by default omits functions
that cause very low numbers of misses to avoid drowning you in
      information. In this case, cg_annotate shows summaries for the
      functions that account for 99% of the <code>Ir</code> counts;
<code>Ir</code> is chosen as the threshold event since it is the
primary sort event. The threshold can be adjusted with the
<code>--threshold</code> option.</li><p>
<li>Chosen for annotation: names of files specified manually for annotation;
in this case none.</li><p>
<li>Auto-annotation: whether auto-annotation was requested via the
<code>--auto=yes</code> option. In this case no.</li><p>
</ul>
Then follows summary statistics for the whole program. These are similar
to the summary provided when running <code>valgrind --skin=cachegrind</code>.<p>
Then follows function-by-function statistics. Each function is
identified by a <code>file_name:function_name</code> pair. If a column
contains only a dot it means the function never performs
that event (eg. the third row shows that <code>strcmp()</code>
contains no instructions that write to memory). The name
<code>???</code> is used if the file name and/or function name
could not be determined from debugging information. If most of the
entries have the form <code>???:???</code> the program probably wasn't
compiled with <code>-g</code>. If any code was invalidated (either due to
self-modifying code or unloading of shared objects) its counts are aggregated
into a single cost centre written as <code>(discarded):(discarded)</code>.<p>
It is worth noting that functions will come from three types of source files:
<ol>
<li> From the profiled program (<code>concord.c</code> in this example).</li>
<li>From libraries (eg. <code>getc.c</code>)</li>
<li>From Valgrind's implementation of some libc functions (eg.
<code>vg_clientmalloc.c:malloc</code>). These are recognisable because
the filename begins with <code>vg_</code>, and is probably one of
<code>vg_main.c</code>, <code>vg_clientmalloc.c</code> or
<code>vg_mylibc.c</code>.
</li>
</ol>
There are two ways to annotate source files -- by choosing them
manually, or with the <code>--auto=yes</code> option. To do it
manually, just specify the filenames as arguments to
<code>cg_annotate</code>. For example, the output from running
<code>cg_annotate concord.c</code> for our example produces the same
output as above followed by an annotated version of
<code>concord.c</code>, a section of which looks like:
<pre>
--------------------------------------------------------------------------------
-- User-annotated source: concord.c
--------------------------------------------------------------------------------
Ir I1mr I2mr Dr D1mr D2mr Dw D1mw D2mw
[snip]
. . . . . . . . . void init_hash_table(char *file_name, Word_Node *table[])
3 1 1 . . . 1 0 0 {
. . . . . . . . . FILE *file_ptr;
. . . . . . . . . Word_Info *data;
1 0 0 . . . 1 1 1 int line = 1, i;
. . . . . . . . .
5 0 0 . . . 3 0 0 data = (Word_Info *) create(sizeof(Word_Info));
. . . . . . . . .
4,991 0 0 1,995 0 0 998 0 0 for (i = 0; i < TABLE_SIZE; i++)
3,988 1 1 1,994 0 0 997 53 52 table[i] = NULL;
. . . . . . . . .
. . . . . . . . . /* Open file, check it. */
6 0 0 1 0 0 4 0 0 file_ptr = fopen(file_name, "r");
2 0 0 1 0 0 . . . if (!(file_ptr)) {
. . . . . . . . . fprintf(stderr, "Couldn't open '%s'.\n", file_name);
1 1 1 . . . . . . exit(EXIT_FAILURE);
. . . . . . . . . }
. . . . . . . . .
165,062 1 1 73,360 0 0 91,700 0 0 while ((line = get_word(data, line, file_ptr)) != EOF)
146,712 0 0 73,356 0 0 73,356 0 0 insert(data->word, data->line, table);
. . . . . . . . .
4 0 0 1 0 0 2 0 0 free(data);
4 0 0 1 0 0 2 0 0 fclose(file_ptr);
3 0 0 2 0 0 . . . }
</pre>
(Although column widths are automatically minimised, a wide terminal is clearly
useful.)<p>
Each source file is clearly marked (<code>User-annotated source</code>) as
having been chosen manually for annotation. If the file was found in one of
the directories specified with the <code>-I</code>/<code>--include</code>
option, the directory and file are both given.<p>
Each line is annotated with its event counts. Events not applicable for a line
are represented by a `.'; this is useful for distinguishing between an event
which cannot happen, and one which can but did not.<p>
Sometimes only a small section of a source file is executed. To minimise
uninteresting output, Valgrind only shows annotated lines and lines within a
small distance of annotated lines. Gaps are marked with the line numbers so
you know which part of a file the shown code comes from, eg:
<pre>
(figures and code for line 704)
-- line 704 ----------------------------------------
-- line 878 ----------------------------------------
(figures and code for line 878)
</pre>
The amount of context to show around annotated lines is controlled by the
<code>--context</code> option.<p>
To get automatic annotation, run <code>cg_annotate --auto=yes</code>.
cg_annotate will automatically annotate every source file it can find that is
mentioned in the function-by-function summary. Therefore, the files chosen for
auto-annotation are affected by the <code>--sort</code> and
<code>--threshold</code> options. Each source file is clearly marked
(<code>Auto-annotated source</code>) as being chosen automatically. Any files
that could not be found are mentioned at the end of the output, eg:
<pre>
--------------------------------------------------------------------------------
The following files chosen for auto-annotation could not be found:
--------------------------------------------------------------------------------
getc.c
ctype.c
../sysdeps/generic/lockfile.c
</pre>
This is quite common for library files, since libraries are usually compiled
with debugging information, but the source files are often not present on a
system. If a file is chosen for annotation <b>both</b> manually and
automatically, it is marked as <code>User-annotated source</code>.
Use the <code>-I/--include</code> option to tell Valgrind where to look for
source files if the filenames found from the debugging information aren't
specific enough.
Beware that cg_annotate can take some time to digest large
<code>cachegrind.out.<i>pid</i></code> files, e.g. 30 seconds or more. Also
beware that auto-annotation can produce a lot of output if your program is
large!
<h3>1.7&nbsp; Annotating assembler programs</h3>
Valgrind can annotate assembler programs too, or annotate the
assembler generated for your C program. Sometimes this is useful for
understanding what is really happening when an interesting line of C
code is translated into multiple instructions.<p>
To do this, you just need to assemble your <code>.s</code> files with
assembler-level debug information. gcc doesn't do this, but you can
use the GNU assembler with the <code>--gstabs</code> option to
generate object files with this information, eg:
<blockquote><code>as --gstabs foo.s</code></blockquote>
You can then profile and annotate source files in the same way as for C/C++
programs.
<h3>1.8&nbsp; <code>cg_annotate</code> options</h3>
<ul>
<li><code>--<i>pid</i></code></li><p>
Indicates which <code>cachegrind.out.<i>pid</i></code> file to read.
Not actually an option -- it is required.
<li><code>-h, --help</code></li><p>
<li><code>-v, --version</code><p>
Help and version, as usual.</li>
<li><code>--sort=A,B,C</code> [default: order in
<code>cachegrind.out.<i>pid</i></code>]<p>
Specifies the events upon which the sorting of the function-by-function
entries will be based. Useful if you want to concentrate on eg. I cache
misses (<code>--sort=I1mr,I2mr</code>), or D cache misses
(<code>--sort=D1mr,D2mr</code>), or L2 misses
(<code>--sort=D2mr,I2mr</code>).</li><p>
<li><code>--show=A,B,C</code> [default: all, using order in
<code>cachegrind.out.<i>pid</i></code>]<p>
Specifies which events to show (and the column order). Default is to use
all present in the <code>cachegrind.out.<i>pid</i></code> file (and use
the order in the file).</li><p>
<li><code>--threshold=X</code> [default: 99%] <p>
Sets the threshold for the function-by-function summary. Functions are
shown that account for more than X% of the primary sort event. If
auto-annotating, also affects which files are annotated.
    Note: thresholds can be set for more than one of the events by appending
    a colon and a number (no spaces) to any of the events given to the
    <code>--sort</code> option. E.g. to see the functions that cover
    99% of L2 read misses and 99% of L2 write misses, use this option:
<blockquote><code>--sort=D2mr:99,D2mw:99</code></blockquote>
</li><p>
<li><code>--auto=no</code> [default]<br>
<code>--auto=yes</code> <p>
When enabled, automatically annotates every file that is mentioned in the
function-by-function summary that can be found. Also gives a list of
those that couldn't be found.
<li><code>--context=N</code> [default: 8]<p>
Print N lines of context before and after each annotated line. Avoids
printing large sections of source files that were not executed. Use a
large number (eg. 10,000) to show all source lines.
</li><p>
<li><code>-I=&lt;dir&gt;, --include=&lt;dir&gt;</code>
[default: empty string]<p>
Adds a directory to the list in which to search for files. Multiple
-I/--include options can be given to add multiple directories.
</ul>
<h3>1.9&nbsp; Warnings</h3>
There are a couple of situations in which cg_annotate issues warnings.
<ul>
<li>If a source file is more recent than the
<code>cachegrind.out.<i>pid</i></code> file. This is because the
information in <code>cachegrind.out.<i>pid</i></code> is only recorded
with line numbers, so if the line numbers change at all in the source
(eg. lines added, deleted, swapped), any annotations will be
incorrect.<p>
<li>If information is recorded about line numbers past the end of a file.
This can be caused by the above problem, ie. shortening the source file
while using an old <code>cachegrind.out.<i>pid</i></code> file. If this
happens, the figures for the bogus lines are printed anyway (clearly
marked as bogus) in case they are important.</li><p>
</ul>
<h3>1.10&nbsp; Things to watch out for</h3>
Some odd things that can occur during annotation:
<ul>
<li>If annotating at the assembler level, you might see something like this:
<pre>
1 0 0 . . . . . . leal -12(%ebp),%eax
1 0 0 . . . 1 0 0 movl %eax,84(%ebx)
2 0 0 0 0 0 1 0 0 movl $1,-20(%ebp)
. . . . . . . . . .align 4,0x90
1 0 0 . . . . . . movl $.LnrB,%eax
1 0 0 . . . 1 0 0 movl %eax,-16(%ebp)
</pre>
How can the third instruction be executed twice when the others are
executed only once? As it turns out, it isn't. Here's a dump of the
executable, using <code>objdump -d</code>:
<pre>
8048f25: 8d 45 f4 lea 0xfffffff4(%ebp),%eax
8048f28: 89 43 54 mov %eax,0x54(%ebx)
8048f2b: c7 45 ec 01 00 00 00 movl $0x1,0xffffffec(%ebp)
8048f32: 89 f6 mov %esi,%esi
8048f34: b8 08 8b 07 08 mov $0x8078b08,%eax
8048f39: 89 45 f0 mov %eax,0xfffffff0(%ebp)
</pre>
Notice the extra <code>mov %esi,%esi</code> instruction. Where did this
come from? The GNU assembler inserted it to serve as the two bytes of
padding needed to align the <code>movl $.LnrB,%eax</code> instruction on
a four-byte boundary, but pretended it didn't exist when adding debug
information. Thus when Valgrind reads the debug info it thinks that the
<code>movl $0x1,0xffffffec(%ebp)</code> instruction covers the address
    range 0x8048f2b--0x8048f33 by itself, and attributes the counts for the
<code>mov %esi,%esi</code> to it.<p>
</li>
<li>Inlined functions can cause strange results in the function-by-function
summary. If a function <code>inline_me()</code> is defined in
<code>foo.h</code> and inlined in the functions <code>f1()</code>,
<code>f2()</code> and <code>f3()</code> in <code>bar.c</code>, there will
not be a <code>foo.h:inline_me()</code> function entry. Instead, there
will be separate function entries for each inlining site, ie.
<code>foo.h:f1()</code>, <code>foo.h:f2()</code> and
<code>foo.h:f3()</code>. To find the total counts for
<code>foo.h:inline_me()</code>, add up the counts from each entry.<p>
The reason for this is that although the debug info output by gcc
indicates the switch from <code>bar.c</code> to <code>foo.h</code>, it
doesn't indicate the name of the function in <code>foo.h</code>, so
Valgrind keeps using the old one.<p>
<li>Sometimes, the same filename might be represented with a relative name
and with an absolute name in different parts of the debug info, eg:
<code>/home/user/proj/proj.h</code> and <code>../proj.h</code>. In this
case, if you use auto-annotation, the file will be annotated twice with
the counts split between the two.<p>
</li>
<li>Files with more than 65,535 lines cause difficulties for the stabs debug
info reader. This is because the line number in the <code>struct
nlist</code> defined in <code>a.out.h</code> under Linux is only a 16-bit
value. Valgrind can handle some files with more than 65,535 lines
correctly by making some guesses to identify line number overflows. But
some cases are beyond it, in which case you'll get a warning message
explaining that annotations for the file might be incorrect.<p>
</li>
<li>If you compile some files with <code>-g</code> and some without, some
events that take place in a file without debug info could be attributed
to the last line of a file with debug info (whichever one gets placed
before the non-debug-info file in the executable).<p>
</li>
</ul>
This list looks long, but these cases should be fairly rare.<p>
Note: stabs is not an easy format to read. If you come across bizarre
annotations that look like they might be caused by a bug in the stabs reader,
please let us know.<p>
<h3>1.11&nbsp; Accuracy</h3>
Valgrind's cache profiling has a number of shortcomings:
<ul>
<li>It doesn't account for kernel activity -- the effect of system calls on
the cache contents is ignored.</li><p>
<li>It doesn't account for other process activity (although this is probably
desirable when considering a single program).</li><p>
<li>It doesn't account for virtual-to-physical address mappings; hence the
entire simulation is not a true representation of what's happening in the
cache.</li><p>
<li>It doesn't account for cache misses not visible at the instruction level,
eg. those arising from TLB misses, or speculative execution.</li><p>
<li>Valgrind's custom <code>malloc()</code> will allocate memory in different
ways to the standard <code>malloc()</code>, which could warp the results.
</li><p>
<li>Valgrind's custom threads implementation will schedule threads
differently to the standard one. This too could warp the results for
threaded programs.
</li><p>
<li>The instructions <code>bts</code>, <code>btr</code> and <code>btc</code>
will incorrectly be counted as doing a data read if both the arguments
are registers, eg:
<blockquote><code>btsl %eax, %edx</code></blockquote>
This should only happen rarely.
</li><p>
<li>FPU instructions with data sizes of 28 and 108 bytes (e.g.
<code>fsave</code>) are treated as though they only access 16 bytes.
These instructions seem to be rare so hopefully this won't affect
accuracy much.
</li><p>
</ul>
Another thing worth noting is that results are very sensitive. Changing the
size of the <code>valgrind.so</code> file, the size of the program being
profiled, or even the length of its name can perturb the results. Variations
will be small, but don't expect perfectly repeatable results if your program
changes at all.<p>
While these factors mean you shouldn't trust the results to be super-accurate,
hopefully they should be close enough to be useful.<p>
<h3>1.12&nbsp; Todo</h3>
<ul>
<li>Program start-up/shut-down calls a lot of functions that aren't
interesting and just complicate the output. Would be nice to exclude
these somehow.</li>
<p>
</ul>
<hr width="100%">
</body>
</html>


@@ -1,35 +0,0 @@
<html>
<head>
<title>Valgrind</title>
<base target="main">
<style type="text/css">
body { background-color: #ffffff;
color: #000000;
font-family: Times, Helvetica, Arial;
font-size: 14pt}
h4 { margin-bottom: 0.3em}
code { color: #000000;
font-family: Courier;
font-size: 13pt }
pre { color: #000000;
font-family: Courier;
font-size: 13pt }
a:link { color: #0000C0;
text-decoration: none; }
a:visited { color: #0000C0;
text-decoration: none; }
a:active { color: #0000C0;
text-decoration: none; }
</style>
</head>
<body>
<br>
<a href="manual.html#contents"><b>Contents of this manual</b></a><br>
<a href="manual.html#cache">1 <b>How to use Cachegrind</b></a><br>
<p>
<a href="techdocs.html">2 <b>How Cachegrind works</b></a><br>
</body>
</html>


@ -1,461 +0,0 @@
<html>
<head>
<style type="text/css">
body { background-color: #ffffff;
color: #000000;
font-family: Times, Helvetica, Arial;
font-size: 14pt}
h4 { margin-bottom: 0.3em}
code { color: #000000;
font-family: Courier;
font-size: 13pt }
pre { color: #000000;
font-family: Courier;
font-size: 13pt }
a:link { color: #0000C0;
text-decoration: none; }
a:visited { color: #0000C0;
text-decoration: none; }
a:active { color: #0000C0;
text-decoration: none; }
</style>
<title>The design and implementation of Valgrind</title>
</head>
<body bgcolor="#ffffff">
<a name="title">&nbsp;</a>
<h1 align=center>How Cachegrind works</h1>
<center>
Detailed technical notes for hackers, maintainers and the
overly-curious<br>
These notes pertain to snapshot 20020306<br>
<p>
<a href="mailto:jseward@acm.org">jseward@acm.org</a><br>
<a href="http://developer.kde.org/~sewardj">http://developer.kde.org/~sewardj</a><br>
Copyright &copy; 2000-2002 Julian Seward
<p>
Valgrind is licensed under the GNU General Public License,
version 2<br>
An open-source tool for finding memory-management problems in
x86 GNU/Linux executables.
</center>
<p>
<hr width="100%">
<h2>Cache profiling</h2>
Valgrind is a very nice platform for doing cache profiling and other kinds of
simulation, because it converts horrible x86 instructions into nice clean
RISC-like UCode. For example, for cache profiling we are interested in
instructions that read and write memory; in UCode there are only four
instructions that do this: <code>LOAD</code>, <code>STORE</code>,
<code>FPU_R</code> and <code>FPU_W</code>. By contrast, because of the x86
addressing modes, almost every instruction can read or write memory.<p>
Most of the cache profiling machinery is in the file
<code>vg_cachesim.c</code>.<p>
These notes are a somewhat haphazard guide to how Valgrind's cache profiling
works.<p>
<h3>Cost centres</h3>
Valgrind gathers cache profiling about every instruction executed,
individually. Each instruction has a <b>cost centre</b> associated with it.
There are two kinds of cost centre: one for instructions that don't reference
memory (<code>iCC</code>), and one for instructions that do
(<code>idCC</code>):
<pre>
typedef struct _CC {
ULong a;
ULong m1;
ULong m2;
} CC;
typedef struct _iCC {
/* word 1 */
UChar tag;
UChar instr_size;
/* words 2+ */
Addr instr_addr;
CC I;
} iCC;
typedef struct _idCC {
/* word 1 */
UChar tag;
UChar instr_size;
UChar data_size;
/* words 2+ */
Addr instr_addr;
CC I;
CC D;
} idCC;
</pre>
Each <code>CC</code> has three fields <code>a</code>, <code>m1</code>,
<code>m2</code> for recording references, level 1 misses and level 2 misses.
Each of these is a 64-bit <code>ULong</code> -- the numbers can get very large,
i.e. greater than the 4.2 billion allowed by a 32-bit unsigned int.<p>
An <code>iCC</code> has one <code>CC</code> for instruction cache accesses. An
<code>idCC</code> has two: one for instruction cache accesses, and one for data
cache accesses.<p>
The <code>iCC</code> and <code>idCC</code> structs also store unchanging
information about the instruction:
<ul>
<li>An instruction-type identification tag (explained below)</li><p>
<li>Instruction size</li><p>
<li>Data reference size (<code>idCC</code> only)</li><p>
<li>Instruction address</li><p>
</ul>
Note that the data address is not one of the fields for <code>idCC</code>. This
is because for many memory-referencing instructions the data address can change
each time they're executed (e.g. if they use register-offset addressing). We
have to give this item to the cache simulation in a different way (see the
Instrumentation section below). Some memory-referencing instructions do always
reference the same address, but we don't try to treat them specially, in order
to keep things simple.<p>
Also note that there is only room for recording info about one data cache
access in an <code>idCC</code>. So what about instructions that do a read then
a write, such as:
<blockquote><code>incl (%esi)</code></blockquote>
In a write-allocate cache, as simulated by Valgrind, the write cannot miss,
since it immediately follows the read which will drag the block into the cache
if it's not already there. So the write access isn't really interesting, and
Valgrind doesn't record it. This means that Valgrind doesn't measure
memory references, but rather memory references that could miss in the cache.
This behaviour is the same as that used by the AMD Athlon hardware counters.
It also has the benefit of simplifying the implementation -- instructions that
read and write memory can be treated like instructions that read memory.<p>
<h3>Storing cost-centres</h3>
Cost centres are stored in a way that makes them very cheap to lookup, which is
important since one is looked up for every original x86 instruction
executed.<p>
Valgrind does JIT translations at the basic block level, and cost centres are
also set up and stored at the basic block level. By doing things carefully, we
store all the cost centres for a basic block in a contiguous array, and lookup
comes almost for free.<p>
Consider this part of a basic block (for exposition purposes, pretend it's an
entire basic block):
<pre>
movl $0x0,%eax
movl $0x99, -4(%ebp)
</pre>
The translation to UCode looks like this:
<pre>
MOVL $0x0, t20
PUTL t20, %EAX
INCEIPo $5
LEA1L -4(t4), t14
MOVL $0x99, t18
STL t18, (t14)
INCEIPo $7
</pre>
The first step is to allocate the cost centres. This requires a preliminary
pass to count how many x86 instructions were in the basic block, and their
types (and thus sizes). UCode translations for single x86 instructions are
delimited by the <code>INCEIPo</code> instruction, the argument of which gives
the byte size of the instruction (note that lazy INCEIP updating is turned off
to allow this).<p>
We can tell if an x86 instruction references memory by looking for
<code>LDL</code> and <code>STL</code> UCode instructions, and thus what kind of
cost centre is required. From this we can determine how many cost centres we
need for the basic block, and their sizes. We can then allocate them in a
single array.<p>
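As a rough sketch of this sizing calculation: the struct layouts below mirror the <code>iCC</code>/<code>idCC</code> definitions shown earlier, but <code>count_CC_bytes()</code> is a name invented here for illustration, not Valgrind's actual code.

```c
#include <stddef.h>

/* Illustrative sketch only: the layouts mirror the iCC/idCC structs
   shown above; count_CC_bytes() is an invented name. */
typedef unsigned long long ULong;
typedef unsigned char      UChar;
typedef unsigned int       Addr;        /* 32-bit address on x86 */

typedef struct { ULong a, m1, m2; } CC;

typedef struct {
    UChar tag;
    UChar instr_size;
    Addr  instr_addr;                   /* compiler pads before this */
    CC    I;
} iCC;

typedef struct {
    UChar tag;
    UChar instr_size;
    UChar data_size;
    Addr  instr_addr;
    CC    I;
    CC    D;
} idCC;

/* The preliminary pass tells us how many instructions of each kind the
   basic block contains; the size of the contiguous array follows. */
size_t count_CC_bytes(int n_non_mem, int n_mem)
{
    return (size_t)n_non_mem * sizeof(iCC)
         + (size_t)n_mem     * sizeof(idCC);
}
```

For the two-instruction example above, the array size is simply <code>count_CC_bytes(1, 1)</code>.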
Consider the example code above. After the preliminary pass, we know we need
two cost centres, one <code>iCC</code> and one <code>idCC</code>. So we
allocate an array to store these which looks like this:
<pre>
|(uninit)| tag (1 byte)
|(uninit)| instr_size (1 byte)
|(uninit)| (padding) (2 bytes)
|(uninit)| instr_addr (4 bytes)
|(uninit)| I.a (8 bytes)
|(uninit)| I.m1 (8 bytes)
|(uninit)| I.m2 (8 bytes)
|(uninit)| tag (1 byte)
|(uninit)| instr_size (1 byte)
|(uninit)| data_size (1 byte)
|(uninit)| (padding) (1 byte)
|(uninit)| instr_addr (4 bytes)
|(uninit)| I.a (8 bytes)
|(uninit)| I.m1 (8 bytes)
|(uninit)| I.m2 (8 bytes)
|(uninit)| D.a (8 bytes)
|(uninit)| D.m1 (8 bytes)
|(uninit)| D.m2 (8 bytes)
</pre>
(We can see now why we need tags to distinguish between the two types of cost
centres.)<p>
We also record the size of the array. We look up the debug info of the first
instruction in the basic block, and then stick the array into a table indexed
by filename and function name. This makes it easy to dump the information
quickly to file at the end.<p>
<h3>Instrumentation</h3>
The instrumentation pass has two main jobs:
<ol>
<li>Fill in the gaps in the allocated cost centres.</li><p>
<li>Add UCode to call the cache simulator for each instruction.</li><p>
</ol>
The instrumentation pass steps through the UCode and the cost centres in
tandem. As each original x86 instruction's UCode is processed, the appropriate
gaps in the instruction's cost centre are filled in, for example:
<pre>
|INSTR_CC| tag (1 byte)
|5 | instr_size (1 byte)
|(uninit)| (padding) (2 bytes)
|i_addr1 | instr_addr (4 bytes)
|0 | I.a (8 bytes)
|0 | I.m1 (8 bytes)
|0 | I.m2 (8 bytes)
|WRITE_CC| tag (1 byte)
|7 | instr_size (1 byte)
|4 | data_size (1 byte)
|(uninit)| (padding) (1 byte)
|i_addr2 | instr_addr (4 bytes)
|0 | I.a (8 bytes)
|0 | I.m1 (8 bytes)
|0 | I.m2 (8 bytes)
|0 | D.a (8 bytes)
|0 | D.m1 (8 bytes)
|0 | D.m2 (8 bytes)
</pre>
(Note that this step is not performed if a basic block is re-translated; see
<a href="#retranslations">here</a> for more information.)<p>
GCC inserts padding before the <code>instr_addr</code> field so that it is word
aligned.<p>
The instrumentation added to call the cache simulation function looks like this
(instrumentation is indented to distinguish it from the original UCode):
<pre>
MOVL $0x0, t20
PUTL t20, %EAX
PUSHL %eax
PUSHL %ecx
PUSHL %edx
MOVL $0x4091F8A4, t46 # address of 1st CC
PUSHL t46
CALLMo $0x12 # first cachesim function
CLEARo $0x4
POPL %edx
POPL %ecx
POPL %eax
INCEIPo $5
LEA1L -4(t4), t14
MOVL $0x99, t18
MOVL t14, t42
STL t18, (t14)
PUSHL %eax
PUSHL %ecx
PUSHL %edx
PUSHL t42
MOVL $0x4091F8C4, t44 # address of 2nd CC
PUSHL t44
CALLMo $0x13 # second cachesim function
CLEARo $0x8
POPL %edx
POPL %ecx
POPL %eax
INCEIPo $7
</pre>
Consider the first instruction's UCode. Each call is surrounded by three
<code>PUSHL</code> and <code>POPL</code> instructions to save and restore the
caller-save registers. Then the address of the instruction's cost centre is
pushed onto the stack, to be the first argument to the cache simulation
function. The address is known at this point because we are doing a
simultaneous pass through the cost centre array. This means the cost centre
lookup for each instruction is almost free (just the cost of pushing an
argument for a function call). Then the call to the cache simulation function
for non-memory-reference instructions is made (note that the
<code>CALLMo</code> UInstruction takes an offset into a table of predefined
functions; it is not an absolute address), and the single argument is
<code>CLEAR</code>ed from the stack.<p>
The second instruction's UCode is similar. The only difference is that, as
mentioned before, we have to pass the address of the data item referenced to
the cache simulation function too. This explains the <code>MOVL t14,
t42</code> and <code>PUSHL t42</code> UInstructions. (Note that the seemingly
redundant <code>MOV</code>ing will probably be optimised away during register
allocation.)<p>
Note that instead of storing unchanging information about each instruction
(instruction size, data size, etc) in its cost centre, we could have passed in
these arguments to the simulation function. But this would slow the calls down
(two or three extra arguments pushed onto the stack). Also it would bloat the
UCode instrumentation by amounts similar to the space required for them in the
cost centre; bloated UCode would also fill the translation cache more quickly,
requiring more translations for large programs and slowing them down more.<p>
<a name="retranslations"></a>
<h3>Handling basic block retranslations</h3>
The above description ignores one complication. Valgrind has a limited size
cache for basic block translations; if it fills up, old translations are
discarded. If a discarded basic block is executed again, it must be
re-translated.<p>
However, we can't use this approach for profiling -- we can't throw away cost
centres for instructions in the middle of execution! So when a basic block is
translated, we first look for its cost centre array in the hash table. If
there is no cost centre array, it must be the first translation, so we proceed
as described above. But if there is a cost centre array already, it must be a
retranslation. In this case, we skip the cost centre allocation and
initialisation steps, but still do the UCode instrumentation step.<p>
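The first-translation-versus-retranslation decision amounts to a hash-table lookup, along these lines. Everything here (the <code>BBCC</code> struct, the table size, the function names) is invented for illustration and is not Valgrind's actual code:

```c
#include <stdlib.h>

/* Sketch of the first-translation vs. retranslation decision; the
   BBCC struct, table and names are hypothetical, not Valgrind's. */
#define N_BUCKETS 4096

typedef struct BBCC_ {
    unsigned int  orig_addr;    /* address of the basic block   */
    void         *cc_array;     /* contiguous cost-centre array */
    struct BBCC_ *next;
} BBCC;

static BBCC *BBCC_table[N_BUCKETS];

static unsigned int bb_hash(unsigned int addr)
{
    return (addr >> 2) % N_BUCKETS;
}

/* Return the record for a basic block: reuse it on retranslation,
   allocate it (cost centres still to be initialised) the first time. */
BBCC *get_BBCC(unsigned int orig_addr, int *is_retranslation)
{
    unsigned int h = bb_hash(orig_addr);
    BBCC *b;
    for (b = BBCC_table[h]; b != NULL; b = b->next)
        if (b->orig_addr == orig_addr) {
            *is_retranslation = 1;  /* skip alloc/init, re-instrument */
            return b;
        }
    b = malloc(sizeof(BBCC));
    b->orig_addr = orig_addr;
    b->cc_array  = NULL;            /* filled in by the init step */
    b->next      = BBCC_table[h];
    BBCC_table[h] = b;
    *is_retranslation = 0;
    return b;
}
```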
<h3>The cache simulation</h3>
The cache simulation is fairly straightforward. It just tracks which memory
blocks are in the cache at the moment (it doesn't track the contents, since
that is irrelevant).<p>
The interface to the simulation is quite clean. The functions called from the
UCode contain calls to the simulation functions in the files
<code>vg_cachesim_{I1,D1,L2}.c</code>; these calls are inlined so that only
one function call is done per simulated x86 instruction. The file
<code>vg_cachesim.c</code> simply <code>#include</code>s the three files
containing the simulation, which makes plugging in new cache simulations
very easy -- you just replace the three files and recompile.<p>
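To make the idea concrete, here is a minimal sketch in the same spirit: a direct-mapped cache that records only which blocks are resident, counting accesses and misses. The geometry and names are illustrative, not those used by <code>vg_cachesim_*.c</code>:

```c
#include <string.h>

/* Minimal flavour of the real simulation: a direct-mapped cache that
   tracks only which blocks are resident (tags, never contents). */
#define LINE_SIZE 64                    /* bytes per line        */
#define N_LINES   256                   /* 256 x 64 = 16KB cache */

typedef struct {
    unsigned int       tag[N_LINES];
    int                valid[N_LINES];
    unsigned long long accesses, misses;
} Cache;

void cache_init(Cache *c) { memset(c, 0, sizeof *c); }

/* Simulate one reference to address 'a'; a miss drags the block in,
   which is why an immediately following write to it cannot miss. */
void cache_ref(Cache *c, unsigned int a)
{
    unsigned int block = a / LINE_SIZE;
    unsigned int line  = block % N_LINES;
    c->accesses++;
    if (!c->valid[line] || c->tag[line] != block) {
        c->misses++;                    /* not resident: fetch it */
        c->valid[line] = 1;
        c->tag[line]   = block;
    }
}
```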
<h3>Output</h3>
Output is fairly straightforward, basically printing the cost centre for every
instruction, grouped by files and functions. Total counts (eg. total cache
accesses, total L1 misses) are calculated when traversing this structure rather
than during execution, to save time; the cache simulation functions are called
so often that even one or two extra adds can make a sizeable difference.<p>
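The deferred totalling amounts to nothing more than summing the stored cost centres once, at dump time. A sketch (the <code>CC</code> layout matches the struct shown earlier; <code>CC_add()</code> is an invented name):

```c
/* Sketch of dump-time totalling: walk every stored cost centre once
   and sum it into a running total, instead of updating totals on
   every simulated reference. */
typedef unsigned long long ULong;
typedef struct { ULong a, m1, m2; } CC;

void CC_add(CC *total, const CC *cc)
{
    total->a  += cc->a;                 /* accesses       */
    total->m1 += cc->m1;                /* level-1 misses */
    total->m2 += cc->m2;                /* level-2 misses */
}
```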
The output file (which cg_annotate later reads as its input) has the following format:
<pre>
file ::= desc_line* cmd_line events_line data_line+ summary_line
desc_line ::= "desc:" ws? non_nl_string
cmd_line ::= "cmd:" ws? cmd
events_line ::= "events:" ws? (event ws)+
data_line ::= file_line | fn_line | count_line
file_line ::= ("fl=" | "fi=" | "fe=") filename
fn_line ::= "fn=" fn_name
count_line ::= line_num ws? (count ws)+
summary_line ::= "summary:" ws? (count ws)+
count ::= num | "."
</pre>
Where:
<ul>
<li><code>non_nl_string</code> is any string not containing a newline.</li><p>
<li><code>cmd</code> is a command-line invocation.</li><p>
<li><code>event</code> is the name of a counted event.</li><p>
<li><code>filename</code> and <code>fn_name</code> can be anything.</li><p>
<li><code>num</code> and <code>line_num</code> are decimal numbers.</li><p>
<li><code>ws</code> is whitespace.</li><p>
<li><code>nl</code> is a newline.</li><p>
</ul>
The contents of the "desc:" lines are printed out at the top of the summary.
This is a generic way of providing simulation-specific information, e.g. for
giving the cache configuration for cache simulation.<p>
Counts can be "." to represent "N/A", e.g. the number of write misses for an
instruction that doesn't write to memory.<p>
The number of counts in each <code>count_line</code> and the
<code>summary_line</code> should not exceed the number of events in the
<code>events_line</code>. If the number in a <code>count_line</code> is less,
cg_annotate treats the missing counts as though they were "." entries.<p>
A <code>file_line</code> changes the current file name. A <code>fn_line</code>
changes the current function name. A <code>count_line</code> contains counts
that pertain to the current filename/fn_name. An "fl=" <code>file_line</code>
and a <code>fn_line</code> must appear before any <code>count_line</code>s to
give the context of the first <code>count_line</code>s.<p>
Each <code>file_line</code> should be immediately followed by a
<code>fn_line</code>. "fi=" <code>file_line</code>s are used to switch
filenames for inlined functions; "fe=" <code>file_line</code>s are similar, but
are put at the end of a basic block in which the file name hasn't been switched
back to the original file name. ("fi" and "fe" lines behave the same; they are
only distinguished to help debugging.)<p>
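Putting the grammar together, a small hypothetical <code>cachegrind.out</code> fragment (all values made up) might look like this:

```
desc: I1 cache: 16384 B, 32 B, 4-way associative
cmd: ./myprog
events: Ir I1mr Dr D1mr
fl=hello.c
fn=main
5 1 0 . .
6 2 1 1 1
summary: 3 1 1 1
```

Here the instruction at line 5 performs no data references, so its data counts are "." entries, and the summary line gives the column totals.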
<h3>Summary of performance features</h3>
Quite a lot of work has gone into making the profiling as fast as possible.
This is a summary of the important features:
<ul>
<li>The basic block-level cost centre storage allows almost free cost centre
lookup.</li><p>
<li>Only one function call is made per instruction simulated; even this
accounts for a sizeable percentage of execution time, but it seems
unavoidable if we want flexibility in the cache simulator.</li><p>
<li>Unchanging information about an instruction is stored in its cost centre,
avoiding unnecessary argument pushing, and minimising UCode
instrumentation bloat.</li><p>
<li>Summary counts are calculated at the end, rather than during
execution.</li><p>
<li>The <code>cachegrind.out</code> output files can contain huge amounts of
information; file format was carefully chosen to minimise file
sizes.</li><p>
</ul>
<h3>Annotation</h3>
Annotation is done by cg_annotate. It is a fairly straightforward Perl script
that slurps up all the cost centres, and then runs through all the chosen
source files, printing out cost centres with them. It too has been carefully
optimised.
<h3>Similar work, extensions</h3>
It would be relatively straightforward to do other simulations and obtain
line-by-line information about interesting events. A good example would be
branch prediction -- all branches could be instrumented to interact with a
branch prediction simulator, using very similar techniques to those described
above.<p>
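For instance, the branch-prediction extension could attach a cost centre like the following to every branch, driving a classic two-bit saturating counter. This is purely a sketch of the suggested extension, not existing Valgrind code; all names are invented:

```c
/* Sketch of the suggested branch-prediction extension: a per-branch
   cost centre driving a two-bit saturating counter. */
typedef struct {
    int                state;   /* 0..3: strongly-not-taken .. strongly-taken */
    unsigned long long predictions, mispredictions;
} BranchCC;

/* Called once per executed branch with the actual outcome, much as the
   cache simulation is called once per memory reference. */
void branch_ref(BranchCC *b, int taken)
{
    int predicted_taken = (b->state >= 2);
    b->predictions++;
    if (predicted_taken != taken)
        b->mispredictions++;
    if (taken  && b->state < 3) b->state++;   /* saturate towards taken     */
    if (!taken && b->state > 0) b->state--;   /* saturate towards not-taken */
}
```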
In particular, cg_annotate would not need to change -- the file format is such
that it is not specific to the cache simulation, but could be used for any kind
of line-by-line information. The only part of cg_annotate that is specific to
the cache simulation is the name of the input file
(<code>cachegrind.out</code>), although it would be very simple to add an
option to control this.<p>
</body>
</html>


@ -1,66 +0,0 @@
<html>
<head>
<style type="text/css">
body { background-color: #ffffff;
color: #000000;
font-family: Times, Helvetica, Arial;
font-size: 14pt}
h4 { margin-bottom: 0.3em}
code { color: #000000;
font-family: Courier;
font-size: 13pt }
pre { color: #000000;
font-family: Courier;
font-size: 13pt }
a:link { color: #0000C0;
text-decoration: none; }
a:visited { color: #0000C0;
text-decoration: none; }
a:active { color: #0000C0;
text-decoration: none; }
</style>
<title>CoreCheck</title>
</head>
<body bgcolor="#ffffff">
<a name="title"></a>
<h1 align=center>CoreCheck</h1>
<center>This manual was last updated on 2002-10-03</center>
<p>
<center>
<a href="mailto:njn25@cam.ac.uk">njn25@cam.ac.uk</a><br>
Copyright &copy; 2000-2002 Nicholas Nethercote
<p>
CoreCheck is licensed under the GNU General Public License,
version 2<br>
CoreCheck is a Valgrind skin that does very basic error checking.
</center>
<p>
<h2>1&nbsp; CoreCheck</h2>
CoreCheck is a very simple skin for Valgrind. It adds no instrumentation to
the program's code, and only reports the few kinds of errors detected by
Valgrind's core. It is mainly of use for Valgrind's developers for debugging
and regression testing.
<p>
The errors detected are those found by the core when
<code>VG_(needs).core_errors</code> is set. These include:
<ul>
<li>Pthread API errors (many; e.g. unlocking a non-locked mutex)<p>
<li>Silly arguments to <code>malloc()</code> et al (e.g. a negative size)<p>
<li>Invalid file descriptors passed to the blocking syscalls <code>read()</code>
    and <code>write()</code><p>
<li>Bad signal numbers passed to <code>sigaction()</code><p>
<li>Attempts to install a signal handler for <code>SIGKILL</code> or
    <code>SIGSTOP</code> <p>
</ul>
<hr width="100%">
</body>
</html>


@ -1,26 +0,0 @@
<!doctype html public "-//w3c//dtd html 4.0 transitional//en">
<html>
<head>
<meta http-equiv="Content-Type"
content="text/html; charset=iso-8859-1">
<meta http-equiv="Content-Language" content="en-gb">
<meta name="generator"
content="Mozilla/4.76 (X11; U; Linux 2.4.1-0.1.9 i586) [Netscape]">
<meta name="author" content="Julian Seward <jseward@acm.org>">
<meta name="description" content="say what this prog does">
<meta name="keywords" content="Valgrind, memory checker, x86, GPL">
<title>Valgrind's user manual</title>
</head>
<frameset cols="150,*">
<frame name="nav" target="main" src="nav.html">
<frame name="main" src="manual.html" scrolling="auto">
<noframes>
<body>
<p>This page uses frames, but your browser doesn't support them.</p>
</body>
</noframes>
</frameset>
</html>

File diff suppressed because it is too large


@ -1,72 +0,0 @@
<html>
<head>
<title>Valgrind</title>
<base target="main">
<style type="text/css">
body { background-color: #ffffff;
color: #000000;
font-family: Times, Helvetica, Arial;
font-size: 14pt}
h4 { margin-bottom: 0.3em}
code { color: #000000;
font-family: Courier;
font-size: 13pt }
pre { color: #000000;
font-family: Courier;
font-size: 13pt }
a:link { color: #0000C0;
text-decoration: none; }
a:visited { color: #0000C0;
text-decoration: none; }
a:active { color: #0000C0;
text-decoration: none; }
</style>
</head>
<body>
<br>
<a href="manual.html#contents"><b>Contents of this manual</b></a><br>
<a href="manual.html#intro">1 Introduction</a><br>
<a href="manual.html#whatfor">1.1 What Valgrind is for</a><br>
<a href="manual.html#whatdoes">1.2 What it does with
your program</a>
<p>
<a href="manual.html#howtouse">2 <b>How to use it, and how to
make sense of the results</b></a><br>
<a href="manual.html#starta">2.1 Getting started</a><br>
<a href="manual.html#comment">2.2 The commentary</a><br>
<a href="manual.html#report">2.3 Reporting of errors</a><br>
<a href="manual.html#suppress">2.4 Suppressing errors</a><br>
<a href="manual.html#flags">2.5 Command-line flags</a><br>
<a href="manual.html#errormsgs">2.6 Explanation of error messages</a><br>
<a href="manual.html#suppfiles">2.7 Writing suppressions files</a><br>
<a href="manual.html#clientreq">2.8 The Client Request mechanism</a><br>
<a href="manual.html#pthreads">2.9 Support for POSIX pthreads</a><br>
<a href="manual.html#install">2.10 Building and installing</a><br>
<a href="manual.html#problems">2.11 If you have problems</a>
<p>
<a href="manual.html#machine">3 <b>Details of the checking machinery</b></a><br>
<a href="manual.html#vvalue">3.1 Valid-value (V) bits</a><br>
<a href="manual.html#vaddress">3.2 Valid-address (A) bits</a><br>
<a href="manual.html#together">3.3 Putting it all together</a><br>
<a href="manual.html#signals">3.4 Signals</a><br>
<a href="manual.html#leaks">3.5 Memory leak detection</a>
<p>
<a href="manual.html#limits">4 <b>Limitations</b></a><br>
<p>
<a href="manual.html#howitworks">5 <b>How it works -- a rough overview</b></a><br>
<a href="manual.html#startb">5.1 Getting started</a><br>
<a href="manual.html#engine">5.2 The translation/instrumentation engine</a><br>
<a href="manual.html#track">5.3 Tracking the status of memory</a><br>
<a href="manual.html#sys_calls">5.4 System calls</a><br>
<a href="manual.html#sys_signals">5.5 Signals</a>
<p>
<a href="manual.html#example">6 <b>An example</b></a><br>
<p>
<a href="manual.html#cache">7 <b>Cache profiling</b></a><br>
<p>
<a href="techdocs.html">8 <b>The design and implementation of Valgrind</b></a><br>
</body>
</html>


@ -1,687 +0,0 @@
<html>
<head>
<style type="text/css">
body { background-color: #ffffff;
color: #000000;
font-family: Times, Helvetica, Arial;
font-size: 14pt}
h4 { margin-bottom: 0.3em}
code { color: #000000;
font-family: Courier;
font-size: 13pt }
pre { color: #000000;
font-family: Courier;
font-size: 13pt }
a:link { color: #0000C0;
text-decoration: none; }
a:visited { color: #0000C0;
text-decoration: none; }
a:active { color: #0000C0;
text-decoration: none; }
</style>
<title>Valgrind</title>
</head>
<body bgcolor="#ffffff">
<a name="title">&nbsp;</a>
<h1 align=center>Valgrind Skins</h1>
<center>
A guide to writing new skins for Valgrind<br>
This guide was last updated on 20020926
</center>
<p>
<center>
<a href="mailto:njn25@cam.ac.uk">njn25@cam.ac.uk</a><br>
Nick Nethercote, October 2002
<p>
Valgrind is licensed under the GNU General Public License,
version 2<br>
An open-source tool for supervising execution of Linux-x86 executables.
</center>
<p>
<hr width="100%">
<a name="contents"></a>
<h2>Contents of this manual</h2>
<h4>1&nbsp; <a href="#intro">Introduction</a></h4>
1.1&nbsp; <a href="#supexec">Supervised Execution</a><br>
1.2&nbsp; <a href="#skins">Skins</a><br>
1.3&nbsp; <a href="#execspaces">Execution Spaces</a><br>
<h4>2&nbsp; <a href="#writingaskin">Writing a Skin</a></h4>
2.1&nbsp; <a href="#whywriteaskin">Why write a skin?</a><br>
2.2&nbsp; <a href="#howskinswork">How skins work</a><br>
2.3&nbsp; <a href="#gettingcode">Getting the code</a><br>
2.4&nbsp; <a href="#gettingstarted">Getting started</a><br>
2.5&nbsp; <a href="#writingcode">Writing the code</a><br>
2.6&nbsp; <a href="#init">Initialisation</a><br>
2.7&nbsp; <a href="#instr">Instrumentation</a><br>
2.8&nbsp; <a href="#fini">Finalisation</a><br>
2.9&nbsp; <a href="#otherimportantinfo">Other important information</a><br>
2.10&nbsp; <a href="#wordsofadvice">Words of advice</a><br>
<h4>3&nbsp; <a href="#advancedtopics">Advanced Topics</a></h4>
3.1&nbsp; <a href="#suppressions">Suppressions</a><br>
3.2&nbsp; <a href="#documentation">Documentation</a><br>
3.3&nbsp; <a href="#regressiontests">Regression tests</a><br>
3.4&nbsp; <a href="#profiling">Profiling</a><br>
3.5&nbsp; <a href="#othermakefilehackery">Other makefile hackery</a><br>
3.6&nbsp; <a href="#interfaceversions">Core/skin interface versions</a><br>
<h4>4&nbsp; <a href="#finalwords">Final Words</a></h4>
<hr width="100%">
<a name="intro"></a>
<h2>1&nbsp; Introduction</h2>
<a name="supexec"></a>
<h3>1.1&nbsp; Supervised Execution</h3>
Valgrind provides a generic infrastructure for supervising the execution of
programs. This is done by providing a way to instrument programs in very
precise ways, making it relatively easy to support activities such as dynamic
error detection and profiling.<p>
Although writing a skin is not easy, and requires learning quite a few things
about Valgrind, it is much easier than instrumenting a program from scratch
yourself.
<a name="skins"></a>
<h3>1.2&nbsp; Skins</h3>
The key idea behind Valgrind's architecture is the division between its
``core'' and ``skins''.
<p>
The core provides the common low-level infrastructure to support program
instrumentation, including the x86-to-x86 JIT compiler, low-level memory
manager, signal handling and a scheduler (for pthreads). It also provides
certain services that are useful to some but not all skins, such as support
for error recording and suppression.
<p>
But the core leaves certain operations undefined, which must be filled in by skins.
Most notably, skins define how program code should be instrumented. They can
also define certain variables to indicate to the core that they would like to
use certain services, or be notified when certain interesting events occur.
<p>
Each skin that is written defines a new program supervision tool. Writing a
new tool just requires writing a new skin. The core takes care of all the hard
work.
<p>
<a name="execspaces"></a>
<h3>1.3&nbsp; Execution Spaces</h3>
An important concept to understand before writing a skin is that there are
three spaces in which program code executes:
<ol>
<li>User space: this covers most of the program's execution. The skin is
given the code and can instrument it any way it likes, providing (more or
less) total control over the code.<p>
Code executed in user space includes all the program code, almost all of
the C library (including things like the dynamic linker), and almost
all parts of all other libraries.
</li><p>
<li>Core space: a small proportion of the program's execution takes place
entirely within Valgrind's core. This includes:<p>
<ul>
<li>Dynamic memory management (<code>malloc()</code> etc.)</li>
<li>Pthread operations and scheduling</li>
<li>Signal handling</li>
</ul><p>
A skin has no control over these operations; it never ``sees'' the code
doing this work and thus cannot instrument it. However, the core
provides hooks so a skin can be notified when certain interesting events
happen, for example when dynamic memory is allocated or freed, the
stack pointer is changed, or a pthread mutex is locked, etc.<p>
Note that these hooks only notify skins of events relevant to user
space. For example, when the core allocates some memory for its own use,
the skin is not notified of this, because it's not directly part of the
supervised program's execution.
</li><p>
<li>Kernel space: execution in the kernel. Two kinds:<p>
<ol>
<li>System calls: can't be directly observed by either the skin or the
core. But the core does have some idea of what happens to the
arguments, and it provides hooks for a skin to wrap system calls.
</li><p>
<li>Other: all other kernel activity (e.g. process scheduling) is
totally opaque and irrelevant to the program.
</li><p>
</ol>
</li><p>
It should be noted that a skin only has direct control over code executed in
user space. This is the vast majority of code executed, but it is not
absolutely all of it, so any profiling information recorded by a skin won't
be totally accurate.
</ol>
<a name="writingaskin"></a>
<h2>2&nbsp; Writing a Skin</h2>
<a name="whywriteaskin"></a>
<h3>2.1&nbsp; Why write a skin?</h3>
Before you write a skin, you should have some idea of what it should do. What
is it you want to know about your programs of interest? Consider some existing
skins:
<ul>
<li>memcheck: among other things, performs fine-grained validity and
    addressability checks of every memory reference performed by the program
</li><p>
<li>addrcheck: performs lighter-weight addressability checks of every memory
    reference performed by the program</li><p>
<li>cachegrind: tracks every instruction and memory reference to simulate
instruction and data caches, tracking cache accesses and misses that
occur on every line in the program</li><p>
<li>helgrind: tracks every memory access and mutex lock/unlock to determine
if a program contains any data races</li><p>
<li>lackey: does simple counting of various things: the number of calls to a
    particular function (<code>_dl_runtime_resolve()</code>); the number of
    basic blocks, x86 instructions, and UCode instructions executed; the number
    of branches executed and the proportion of those which were taken.</li><p>
</ul>
These examples give a reasonable idea of what kinds of things Valgrind can be
used for. The instrumentation can range from very lightweight (e.g. counting
the number of times a particular function is called) to very intrusive (e.g.
memcheck's memory checking).
<a name="howskinswork"></a>
<h3>2.2&nbsp; How skins work</h3>
Skins must define various functions for instrumenting programs that are called
by Valgrind's core, yet they must be implemented in such a way that they can be
written and compiled without touching Valgrind's core. This is important,
because one of our aims is to allow people to write and distribute their own
skins that can be plugged into Valgrind's core easily.<p>
This is achieved by packaging each skin into a separate shared object which is
then loaded ahead of the core shared object <code>valgrind.so</code>, using the
dynamic linker's <code>LD_PRELOAD</code> variable. Any functions defined in
the skin that share a name with a function defined in the core (such as
the instrumentation function <code>SK_(instrument)()</code>) override the
core's definition. Thus the core can call the necessary skin functions.<p>
This magic is all done for you; the shared object used is chosen with the
<code>--skin</code> option to the <code>valgrind</code> startup script. The
default skin used is <code>memcheck</code>, Valgrind's original memory checker.
<a name="gettingcode"></a>
<h3>2.3&nbsp; Getting the code</h3>
To write your own skin, you'll need to check out a copy of Valgrind from the
CVS repository, rather than using a packaged distribution. This is because it
contains several extra files needed for writing skins.<p>
To check out the code from the CVS repository, first login:
<blockquote><code>
cvs -d:pserver:anonymous@cvs.valgrind.sourceforge.net:/cvsroot/valgrind login
</code></blockquote>
Then checkout the code. To get a copy of the current development version
(recommended for the brave only):
<blockquote><code>
cvs -z3 -d:pserver:anonymous@cvs.valgrind.sourceforge.net:/cvsroot/valgrind co valgrind
</code></blockquote>
To get a copy of the stable released branch:
<blockquote><code>
cvs -z3 -d:pserver:anonymous@cvs.valgrind.sourceforge.net:/cvsroot/valgrind co -r <i>TAG</i> valgrind
</code></blockquote>
where <code><i>TAG</i></code> has the form <code>VALGRIND_X_Y_Z</code> for
version X.Y.Z.
<a name="gettingstarted"></a>
<h3>2.4&nbsp; Getting started</h3>
Valgrind uses GNU <code>automake</code> and <code>autoconf</code> for the
creation of Makefiles and configuration. But don't worry, these instructions
should be enough to get you started even if you know nothing about those
tools.<p>
In what follows, all filenames are relative to Valgrind's top-level directory
<code>valgrind/</code>.
<ol>
<li>Choose a name for the skin, and an abbreviation that can be used as a
short prefix. We'll use <code>foobar</code> and <code>fb</code> as an
example.
</li><p>
<li>Make a new directory <code>foobar/</code> which will hold the skin.
</li><p>
<li>Copy <code>example/Makefile.am</code> into <code>foobar/</code>.
Edit it by replacing all occurrences of the string
``<code>example</code>'' with ``<code>foobar</code>'' and the one
occurrence of the string ``<code>ex_</code>'' with ``<code>fb_</code>''.
It might be worth trying to understand this file, at least a little; you
might have to do more complicated things with it later on. In
particular, the name of the <code>vgskin_foobar_so_SOURCES</code> variable
determines the name of the skin's shared object, which determines what
name must be passed to the <code>--skin</code> option to use the skin.
</li><p>
<li>Copy <code>example/ex_main.c</code> into
<code>foobar/</code>, renaming it as <code>fb_main.c</code>.
Edit it by changing the five lines in <code>SK_(pre_clo_init)()</code>
to something appropriate for the skin. These fields are used in the
startup message, except for <code>bug_reports_to</code> which is used
if a skin assertion fails.
</li><p>
<li>Edit <code>Makefile.am</code>, adding the new directory
<code>foobar</code> to the <code>SUBDIRS</code> variable.
</li><p>
<li>Edit <code>configure.in</code>, adding <code>foobar/Makefile</code> to the
<code>AC_OUTPUT</code> list.
</li><p>
<li>Run:
<pre>
autogen.sh
./configure --prefix=`pwd`/inst
make install</pre>
It should automake, configure and compile without errors, putting copies
of the skin's shared object <code>vgskin_foobar.so</code> in
<code>foobar/</code> and
<code>inst/lib/valgrind/</code>.
</li><p>
<li>You can test it with a command like
<pre>
inst/bin/valgrind --skin=foobar date</pre>
(almost any program should work; <code>date</code> is just an example).
The output should be something like this:
<pre>
==738== foobar-0.0.1, a foobarring tool for x86-linux.
==738== Copyright (C) 2002, and GNU GPL'd, by J. Random Hacker.
==738== Built with valgrind-1.1.0, a program execution monitor.
==738== Copyright (C) 2000-2002, and GNU GPL'd, by Julian Seward.
==738== Estimated CPU clock rate is 1400 MHz
==738== For more details, rerun with: -v
==738==
Wed Sep 25 10:31:54 BST 2002
==738==</pre>
The skin does nothing except run the program uninstrumented.
</li><p>
</ol>
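As a point of reference, the heart of the edited <code>foobar/Makefile.am</code>
is the sources variable mentioned in step 3; the rest of the file is boilerplate
copied from <code>example/Makefile.am</code>. A sketch:
<pre>
## foobar/Makefile.am (fragment; the surrounding boilerplate comes
## from example/Makefile.am)
vgskin_foobar_so_SOURCES = fb_main.c
</pre><p>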
These steps don't have to be followed exactly - you can choose different names
for your source files, and use a different <code>--prefix</code> for
<code>./configure</code>.<p>
Now that we've set up, built and tested the simplest possible skin, on to the
interesting stuff...
<a name="writingcode"></a>
<h3>2.5&nbsp; Writing the code</h3>
A skin must define at least these four functions:
<pre>
SK_(pre_clo_init)()
SK_(post_clo_init)()
SK_(instrument)()
SK_(fini)()
</pre>
Also, it must use the macro <code>VG_DETERMINE_INTERFACE_VERSION</code>
exactly once in its source code. If it doesn't, you will get a link error
involving <code>VG_(skin_interface_major_version)</code>. This macro is
used to ensure the core/skin interface used by the core and a plugged-in
skin are binary compatible.
In addition, if a skin wants to use some of the optional services provided by
the core, it may have to define other functions.
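To make this concrete, here is a sketch of the skeleton of a do-nothing skin,
modelled on <code>example/ex_main.c</code>. The exact signatures are declared in
<code>include/vg_skin.h</code> and should be checked there; treat this as an
outline rather than guaranteed-compilable code:
<pre>
#include "vg_skin.h"

VG_DETERMINE_INTERFACE_VERSION   /* must appear exactly once */

void SK_(pre_clo_init)(VgDetails* details, VgNeeds* needs,
                       VgTrackEvents* track)
{
   /* fill in the details, needs and trackable events here */
}

void SK_(post_clo_init)(void)
{
   /* only needed for skins that process command line options */
}

UCodeBlock* SK_(instrument)(UCodeBlock* cb, Addr orig_addr)
{
   return cb;   /* no instrumentation: run the code unmodified */
}

void SK_(fini)(void)
{
   /* present final results here */
}
</pre><p>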
<a name="init"></a>
<h3>2.6&nbsp; Initialisation</h3>
Most of the initialisation should be done in <code>SK_(pre_clo_init)()</code>.
Only use <code>SK_(post_clo_init)()</code> if a skin provides command line
options and must do some initialisation after option processing takes place
(``<code>clo</code>'' stands for ``command line options'').<p>
The first argument to <code>SK_(pre_clo_init)()</code> must be initialised with
various ``details'' for a skin. These are all compulsory except for
<code>version</code>. They are used when constructing the startup message,
except for <code>bug_reports_to</code>, which is used if <code>VG_(skin_panic)()</code> is
ever called, or a skin assertion fails.<p>
The second argument to <code>SK_(pre_clo_init)()</code> must be initialised with
the ``needs'' for a skin. They are mostly booleans, and can be left untouched
(they default to <code>False</code>). They determine whether a skin can do
various things such as: record, report and suppress errors; process command
line options; wrap system calls; record extra information about malloc'd
blocks, etc.<p>
For example, if a skin wants the core's help in recording and reporting errors,
it must set the <code>skin_errors</code> need to <code>True</code>, and then
provide definitions of six functions for comparing errors, printing out errors,
reading suppressions from a suppressions file, etc. While writing these
functions requires some work, it's much less than doing error handling from
scratch because the core is doing most of the work. See the type
<code>VgNeeds</code> in <code>include/vg_skin.h</code> for full details of all
the needs.<p>
The third argument to <code>SK_(pre_clo_init)()</code> must be initialised to
indicate which core events the skin wants to be notified about. These
include things such as blocks of memory being malloc'd, the stack pointer
changing, a mutex being locked, etc. If a skin wants to know about this,
it should set the relevant pointer in the structure to point to a function,
which will be called when that event happens.<p>
For example, if the skin wants to be notified when a new block of memory is
malloc'd, it should set the <code>new_mem_heap</code> function pointer, and the
assigned function will be called each time this happens. See the type
<code>VgTrackEvents</code> in <code>include/vg_skin.h</code> for full details
of all the trackable events.<p>
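For instance, a skin wanting error handling and malloc notification might do
something like the following. This is a hedged sketch: the field names shown
are indicative, the authoritative lists are the <code>VgNeeds</code> and
<code>VgTrackEvents</code> types in <code>include/vg_skin.h</code>, and
<code>fb_new_mem_heap</code> is a hypothetical skin function:
<pre>
void SK_(pre_clo_init)(VgDetails* details, VgNeeds* needs,
                       VgTrackEvents* track)
{
   /* second argument: request the core's error machinery */
   needs->skin_errors  = True;

   /* third argument: be told about each malloc'd block */
   track->new_mem_heap = fb_new_mem_heap;
}
</pre><p>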
<a name="instr"></a>
<h3>2.7&nbsp; Instrumentation</h3>
<code>SK_(instrument)()</code> is the interesting one. It allows you to
instrument <i>UCode</i>, which is Valgrind's RISC-like intermediate language.
UCode is described in the <a href="techdocs.html">technical docs</a>.
The easiest way to instrument UCode is to insert calls to C functions when
interesting things happen. See the skin ``lackey''
(<code>lackey/lk_main.c</code>) for a simple example of this, or
Cachegrind (<code>cachegrind/cg_main.c</code>) for a more complex
example.<p>
A much more complicated way to instrument UCode, albeit one that might result
in faster instrumented programs, is to extend UCode with new UCode
instructions. This is recommended for advanced Valgrind hackers only! See the
``memcheck'' skin for an example.
<a name="fini"></a>
<h3>2.8&nbsp; Finalisation</h3>
This is where you can present the final results, such as a summary of the
information collected. Any log files should be written out at this point.
<a name="otherimportantinfo"></a>
<h3>2.9&nbsp; Other important information</h3>
Please note that the core/skin split infrastructure is all very new, and not
very well documented. Here are some important points, but there are
undoubtedly many others that I should note but haven't thought of.<p>
The file <code>include/vg_skin.h</code> contains all the types,
macros, functions, etc. that a skin should (hopefully) need, and is the only
<code>.h</code> file a skin should need to <code>#include</code>.<p>
In particular, you probably shouldn't use anything from the C library (there
are deep reasons for this, trust us). Valgrind provides an implementation of a
reasonable subset of the C library, details of which are in
<code>vg_skin.h</code>.<p>
Similarly, when writing a skin, you shouldn't need to look at any of the code
in Valgrind's core, although it can sometimes be useful for understanding
something.<p>
<code>vg_skin.h</code> has a reasonable amount of documentation in it that
should hopefully be enough to get you going. But ultimately, the skins
distributed (memcheck, addrcheck, cachegrind, lackey, etc.) are probably the
best documentation of all, for the moment.<p>
Note that the <code>VG_</code> and <code>SK_</code> macros are used heavily.
These just prepend longer strings in front of names to avoid potential
namespace clashes. We strongly recommend using the <code>SK_</code> macro
for any global functions and variables in your skin.<p>
<a name="wordsofadvice"></a>
<h3>2.10&nbsp; Words of Advice</h3>
Writing and debugging skins is not trivial. Here are some suggestions for
solving common problems.<p>
If you are getting segmentation faults in C functions used by your skin, the
usual GDB command:
<blockquote><code>gdb <i>prog</i> core</code></blockquote>
usually gives the location of the segmentation fault.<p>
If you want to debug C functions used by your skin, you can attach GDB to
Valgrind with some effort:
<ul>
    <li>Enable the following code in <code>coregrind/vg_main.c</code> by
        changing <code>if (0)</code> into <code>if (1)</code>:
<pre>
   /* Hook to delay things long enough so we can get the pid and
      attach GDB in another shell. */
   if (0) {
      Int p, q;
      for (p = 0; p < 50000; p++)
         for (q = 0; q < 50000; q++) ;
   }</pre>
        and rebuild Valgrind.
    </li><p>
<li>Then run:
<blockquote><code>valgrind <i>prog</i></code></blockquote>
Valgrind starts the program, printing its process id, and then delays for
a few seconds (you may have to change the loop bounds to get a suitable
delay).</li><p>
<li>In a second shell run:
<blockquote><code>gdb <i>prog</i> <i>pid</i></code></blockquote></li><p>
</ul>
GDB may be able to give you useful information. Note that by default
most of the system is built with <code>-fomit-frame-pointer</code>,
and you'll need to get rid of this to extract useful tracebacks from
GDB.<p>
If you just want to know whether a program point has been reached, using the
<code>OINK</code> macro (in <code>include/vg_skin.h</code>) can be easier than
using GDB.<p>
If you are having problems with your UCode instrumentation, it's likely that
GDB won't be able to help at all. In this case, Valgrind's
<code>--trace-codegen</code> option is invaluable for observing the results of
instrumentation.<p>
The other debugging command line options can be useful too (run <code>valgrind
-h</code> for the list).<p>
<a name="advancedtopics"></a>
<h2>3&nbsp; Advanced Topics</h2>
Once a skin becomes more complicated, there are some extra things you may
want/need to do.
<a name="suppressions"></a>
<h3>3.1&nbsp; Suppressions</h3>
If your skin reports errors and you want to suppress some common ones, you can
add suppressions to the suppression files. The relevant files are
<code>valgrind/*.supp</code>; the final suppression file is aggregated from
these files by combining the relevant <code>.supp</code> files depending on the
versions of Linux, X and glibc on a system.
<a name="documentation"></a>
<h3>3.2&nbsp; Documentation</h3>
If you are feeling conscientious and want to write some HTML documentation for
your skin, follow these steps (using <code>foobar</code> as the example skin
name again):
<ol>
<li>Make a directory <code>foobar/docs/</code>.
</li><p>
<li>Edit <code>foobar/Makefile.am</code>, adding <code>docs</code> to
the <code>SUBDIRS</code> variable.
</li><p>
<li>Edit <code>configure.in</code>, adding
<code>foobar/docs/Makefile</code> to the <code>AC_OUTPUT</code> list.
</li><p>
<li>Write <code>foobar/docs/Makefile.am</code>. Use
<code>memcheck/docs/Makefile.am</code> as an example.
    </li><p>
<li>Write the documentation; the top-level file should be called
<code>foobar/docs/index.html</code>.
</li><p>
<li>(optional) Add a link in the main documentation index
<code>docs/index.html</code> to
        <code>foobar/docs/index.html</code>.
</li><p>
</ol>
<a name="regressiontests"></a>
<h3>3.3&nbsp; Regression tests</h3>
Valgrind has some support for regression tests. If you want to write
regression tests for your skin:
<ol>
<li>Make a directory <code>foobar/tests/</code>.
</li><p>
<li>Edit <code>foobar/Makefile.am</code>, adding <code>tests</code> to
the <code>SUBDIRS</code> variable.
</li><p>
<li>Edit <code>configure.in</code>, adding
<code>foobar/tests/Makefile</code> to the <code>AC_OUTPUT</code> list.
</li><p>
<li>Write <code>foobar/tests/Makefile.am</code>. Use
<code>memcheck/tests/Makefile.am</code> as an example.
</li><p>
<li>Write the tests, <code>.vgtest</code> test description files,
<code>.stdout.exp</code> and <code>.stderr.exp</code> expected output
files. (Note that Valgrind's output goes to stderr.) Some details
on writing and running tests are given in the comments at the top of the
testing script <code>tests/vg_regtest</code>.
</li><p>
<li>Write a filter for stderr results <code>foobar/tests/filter_stderr</code>.
It can call the existing filters in <code>tests/</code>. See
<code>memcheck/tests/filter_stderr</code> for an example; in particular
note the <code>$dir</code> trick that ensures the filter works correctly
from any directory.
</li><p>
</ol>
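As an illustration, a minimal test might consist of a description file like the
following. The key names shown here are assumptions; the comments at the top of
<code>tests/vg_regtest</code> are the authoritative reference:
<pre>
# foobar/tests/simple.vgtest (hypothetical example)
prog: simple
vgopts: --skin=foobar
</pre>
together with <code>simple.stdout.exp</code> and <code>simple.stderr.exp</code>
files holding the expected output.<p>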
<a name="profiling"></a>
<h3>3.4&nbsp; Profiling</h3>
To do simple tick-based profiling of a skin, include the line
<blockquote><code>#include "vg_profile.c"</code></blockquote>
in the skin somewhere, and rebuild (you may have to <code>make clean</code>
first). Then run Valgrind with the <code>--profile=yes</code> option.<p>
The profiler is stack-based; you can register a profiling event with
<code>VGP_(register_profile_event)()</code> and then use the
<code>VGP_PUSHCC</code> and <code>VGP_POPCC</code> macros to record time spent
doing certain things. New profiling event numbers must not overlap with the
core profiling event numbers. See <code>include/vg_skin.h</code> for details
and the ``memcheck'' skin for an example.
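In outline, the usage looks something like this. This is only a sketch: the
event-number constant and the registration signature should be checked against
<code>include/vg_skin.h</code>, and <code>VgpSkinOp</code> and
<code>do_expensive_thing()</code> are hypothetical names:
<pre>
/* once, at initialisation time */
VGP_(register_profile_event)(VgpSkinOp, "expensive-op");

/* around the code to be measured */
VGP_PUSHCC(VgpSkinOp);
do_expensive_thing();     /* time spent here is charged to the event */
VGP_POPCC(VgpSkinOp);
</pre><p>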
<a name="othermakefilehackery"></a>
<h3>3.5&nbsp; Other makefile hackery</h3>
If you add any directories under <code>valgrind/foobar/</code>, you will
need to add an appropriate <code>Makefile.am</code> to it, and add a
corresponding entry to the <code>AC_OUTPUT</code> list in
<code>valgrind/configure.in</code>.<p>
If you add any scripts to your skin (see Cachegrind for an example) you need to
add them to the <code>bin_SCRIPTS</code> variable in
<code>valgrind/foobar/Makefile.am</code>.<p>
<a name="interfaceversions"></a>
<h3>3.6&nbsp; Core/skin interface versions</h3>
In order to allow for the core/skin interface to evolve over time, Valgrind
uses a basic interface versioning system. All a skin has to do is use the
<code>VG_DETERMINE_INTERFACE_VERSION</code> macro exactly once in its code.
If not, a link error will occur when the skin is built.
<p>
The interface version number has the form X.Y. Changes in Y indicate binary
compatible changes. Changes in X indicate binary incompatible changes. If
the core and skin have the same major version number X, they should work
together. If X doesn't match, Valgrind will abort execution with an
explanation of the problem.
<p>
This approach was chosen so that if the interface changes in the future,
old skins won't work and the reason will be clearly explained, instead of
possibly crashing mysteriously. We have attempted to minimise the potential
for binary incompatible changes by means such as minimising the use of naked
structs in the interface.
<a name="finalwords"></a>
<h2>4&nbsp; Final Words</h2>
This whole core/skin business is very new and experimental, and under active
development.<p>
The first consequence of this is that the core/skin interface is quite
immature. It will almost certainly change in the future; we have no intention
of freezing it and then regretting the inevitable stupidities. Hopefully most
of the future changes will be to add new features, hooks, functions, etc,
rather than to change old ones, which should cause a minimum of trouble for
existing skins, and we've put some effort into future-proofing the interface
to avoid binary incompatibility. But we can't guarantee anything. The
versioning system should catch any incompatibilities. Just something to be
aware of.<p>
The second consequence of this is that we'd love to hear your feedback about
it:
<ul>
<li>If you love it or hate it</li><p>
<li>If you find bugs</li><p>
<li>If you write a skin</li><p>
<li>If you have suggestions for new features, needs, trackable events,
functions</li><p>
<li>If you have suggestions for making skins easier to write
</li><p>
<li>If you have suggestions for improving this documentation </li><p>
<li>If you don't understand something</li><p>
</ul>
or anything else!<p>
Happy programming.

File diff suppressed because it is too large

View File

@ -1,44 +0,0 @@
<html>
<head>
<title>Valgrind</title>
<base target="main">
<style type="text/css">
body { background-color: #ffffff;
color: #000000;
font-family: Times, Helvetica, Arial;
font-size: 14pt}
h4 { margin-bottom: 0.3em}
code { color: #000000;
font-family: Courier;
font-size: 13pt }
pre { color: #000000;
font-family: Courier;
font-size: 13pt }
a:link { color: #0000C0;
text-decoration: none; }
a:visited { color: #0000C0;
text-decoration: none; }
a:active { color: #0000C0;
text-decoration: none; }
</style>
</head>
<body>
<h2>Documentation Contents</h2>
<h3>Valgrind's core</h3>
<a href="../coregrind/docs/index.html"><b>Core</b></a><br>
<h3>Distributed skins</h3>
<a href="../memcheck/docs/index.html"> <b>MemCheck </b></a><br>
<a href="../addrcheck/docs/index.html"> <b>AddrCheck </b></a><br>
<a href="../cachegrind/docs/index.html"><b>Cachegrind</b></a><br>
<a href="../none/docs/index.html"> <b>Nulgrind </b></a><br>
<a href="../lackey/docs/index.html"> <b>Lackey </b></a><br>
<a href="../corecheck/docs/index.html"> <b>CoreCheck </b></a><br>
<a href="../helgrind/docs/index.html"> <b>Helgrind </b></a><br>
<h3>About skins</h3>
<a href="../coregrind/docs/skins.html"><b>How to write a skin</b></a><br>
</body>
</html>

View File

@ -1,80 +0,0 @@
<html>
<head>
<style type="text/css">
body { background-color: #ffffff;
color: #000000;
font-family: Times, Helvetica, Arial;
font-size: 14pt}
h4 { margin-bottom: 0.3em}
code { color: #000000;
font-family: Courier;
font-size: 13pt }
pre { color: #000000;
font-family: Courier;
font-size: 13pt }
a:link { color: #0000C0;
text-decoration: none; }
a:visited { color: #0000C0;
text-decoration: none; }
a:active { color: #0000C0;
text-decoration: none; }
</style>
<title>Helgrind</title>
</head>
<body bgcolor="#ffffff">
<a name="title"></a>
<h1 align=center>Helgrind</h1>
<center>This manual was last updated on 2002-10-03</center>
<p>
<center>
<a href="mailto:njn25@cam.ac.uk">njn25@cam.ac.uk</a><br>
Copyright &copy; 2000-2002 Nicholas Nethercote
<p>
Helgrind is licensed under the GNU General Public License,
version 2<br>
Helgrind is a Valgrind skin for detecting data races in threaded programs.
</center>
<p>
<h2>1&nbsp; Helgrind</h2>
Helgrind is a Valgrind skin for detecting data races in C and C++ programs
that use the Pthreads library.
<p>
It uses the Eraser algorithm described in
<blockquote>
Eraser: A Dynamic Data Race Detector for Multithreaded Programs<br>
Stefan Savage, Michael Burrows, Greg Nelson, Patrick Sobalvarro and
Thomas Anderson<br>
ACM Transactions on Computer Systems, 15(4):391-411<br>
November 1997.
</blockquote>
It is unfortunately in a rather mangy state and probably doesn't work at all.
We include it partly because it may serve as a useful example skin, and partly
in case anybody is inspired to improve it and get it working.
<p>
If you are inspired, we'd love to hear from you. And if you are successful,
you might like to include some improvements to the basic Eraser algorithm
described in Section 4.2 of
<blockquote>
Runtime Checking of Multithreaded Applications with Visual Threads<br>
Jerry J. Harrow, Jr.<br>
Proceedings of the 7th International SPIN Workshop on Model Checking of
Software<br>
Stanford, California, USA<br>
August 2000<br>
LNCS 1885, pp331--342<br>
K. Havelund, J. Penix, and W. Visser, editors.<br>
</blockquote>
<hr width="100%">
</body>
</html>

View File

@ -1,68 +0,0 @@
<html>
<head>
<style type="text/css">
body { background-color: #ffffff;
color: #000000;
font-family: Times, Helvetica, Arial;
font-size: 14pt}
h4 { margin-bottom: 0.3em}
code { color: #000000;
font-family: Courier;
font-size: 13pt }
pre { color: #000000;
font-family: Courier;
font-size: 13pt }
a:link { color: #0000C0;
text-decoration: none; }
a:visited { color: #0000C0;
text-decoration: none; }
a:active { color: #0000C0;
text-decoration: none; }
</style>
<title>Lackey</title>
</head>
<body bgcolor="#ffffff">
<a name="title"></a>
<h1 align=center>Lackey</h1>
<center>This manual was last updated on 2002-10-03</center>
<p>
<center>
<a href="mailto:njn25@cam.ac.uk">njn25@cam.ac.uk</a><br>
Copyright &copy; 2000-2002 Nicholas Nethercote
<p>
Lackey is licensed under the GNU General Public License,
version 2<br>
Lackey is an example Valgrind skin that does some very basic program
measurement.
</center>
<p>
<h2>1&nbsp; Lackey</h2>
Lackey is a simple Valgrind skin that does some basic program measurement.
It adds quite a lot of simple instrumentation to the program's code. It is
primarily intended to be of use as an example skin.
<p>
It measures three things:
<ol>
<li>The number of calls to <code>_dl_runtime_resolve()</code>, the function
in glibc's dynamic linker that resolves function lookups into shared
objects.<p>
<li>The number of UCode instructions (UCode is Valgrind's RISC-like
intermediate language), x86 instructions, and basic blocks executed by the
program, and some ratios between the three counts.<p>
<li>The number of conditional branches encountered and the proportion of those
taken.<p>
</ol>
<hr width="100%">
</body>
</html>

View File

@ -1,26 +0,0 @@
<!doctype html public "-//w3c//dtd html 4.0 transitional//en">
<html>
<head>
<meta http-equiv="Content-Type"
content="text/html; charset=iso-8859-1">
<meta http-equiv="Content-Language" content="en-gb">
<meta name="generator"
content="Mozilla/4.76 (X11; U; Linux 2.4.1-0.1.9 i586) [Netscape]">
<meta name="author" content="Julian Seward <jseward@acm.org>">
<meta name="description" content="say what this prog does">
<meta name="keywords" content="Valgrind, memory checker, x86, GPL">
<title>Valgrind's user manual</title>
</head>
<frameset cols="150,*">
<frame name="nav" target="main" src="nav.html">
<frame name="main" src="manual.html" scrolling="auto">
<noframes>
<body>
<p>This page uses frames, but your browser doesn't support them.</p>
</body>
</noframes>
</frameset>
</html>

File diff suppressed because it is too large

View File

@ -1,72 +0,0 @@
<html>
<head>
<title>Valgrind</title>
<base target="main">
<style type="text/css">
body { background-color: #ffffff;
color: #000000;
font-family: Times, Helvetica, Arial;
font-size: 14pt}
h4 { margin-bottom: 0.3em}
code { color: #000000;
font-family: Courier;
font-size: 13pt }
pre { color: #000000;
font-family: Courier;
font-size: 13pt }
a:link { color: #0000C0;
text-decoration: none; }
a:visited { color: #0000C0;
text-decoration: none; }
a:active { color: #0000C0;
text-decoration: none; }
</style>
</head>
<body>
<br>
<a href="manual.html#contents"><b>Contents of this manual</b></a><br>
<a href="manual.html#intro">1 Introduction</a><br>
<a href="manual.html#whatfor">1.1 What Valgrind is for</a><br>
<a href="manual.html#whatdoes">1.2 What it does with
your program</a>
<p>
<a href="manual.html#howtouse">2 <b>How to use it, and how to
make sense of the results</b></a><br>
<a href="manual.html#starta">2.1 Getting started</a><br>
<a href="manual.html#comment">2.2 The commentary</a><br>
<a href="manual.html#report">2.3 Reporting of errors</a><br>
<a href="manual.html#suppress">2.4 Suppressing errors</a><br>
<a href="manual.html#flags">2.5 Command-line flags</a><br>
<a href="manual.html#errormsgs">2.6 Explanation of error messages</a><br>
<a href="manual.html#suppfiles">2.7 Writing suppressions files</a><br>
<a href="manual.html#clientreq">2.8 The Client Request mechanism</a><br>
<a href="manual.html#pthreads">2.9 Support for POSIX pthreads</a><br>
<a href="manual.html#install">2.10 Building and installing</a><br>
<a href="manual.html#problems">2.11 If you have problems</a>
<p>
<a href="manual.html#machine">3 <b>Details of the checking machinery</b></a><br>
<a href="manual.html#vvalue">3.1 Valid-value (V) bits</a><br>
<a href="manual.html#vaddress">3.2 Valid-address (A) bits</a><br>
<a href="manual.html#together">3.3 Putting it all together</a><br>
<a href="manual.html#signals">3.4 Signals</a><br>
<a href="manual.html#leaks">3.5 Memory leak detection</a>
<p>
<a href="manual.html#limits">4 <b>Limitations</b></a><br>
<p>
<a href="manual.html#howitworks">5 <b>How it works -- a rough overview</b></a><br>
<a href="manual.html#startb">5.1 Getting started</a><br>
<a href="manual.html#engine">5.2 The translation/instrumentation engine</a><br>
<a href="manual.html#track">5.3 Tracking the status of memory</a><br>
<a href="manual.html#sys_calls">5.4 System calls</a><br>
<a href="manual.html#sys_signals">5.5 Signals</a>
<p>
<a href="manual.html#example">6 <b>An example</b></a><br>
<p>
<a href="manual.html#cache">7 <b>Cache profiling</b></a>
<p>
<a href="techdocs.html">8 <b>The design and implementation of Valgrind</b></a><br>
</body>
</html>

File diff suppressed because it is too large

View File

@ -1,57 +0,0 @@
<html>
<head>
<style type="text/css">
body { background-color: #ffffff;
color: #000000;
font-family: Times, Helvetica, Arial;
font-size: 14pt}
h4 { margin-bottom: 0.3em}
code { color: #000000;
font-family: Courier;
font-size: 13pt }
pre { color: #000000;
font-family: Courier;
font-size: 13pt }
a:link { color: #0000C0;
text-decoration: none; }
a:visited { color: #0000C0;
text-decoration: none; }
a:active { color: #0000C0;
text-decoration: none; }
</style>
<title>Nulgrind</title>
</head>
<body bgcolor="#ffffff">
<a name="title"></a>
<h1 align=center>Nulgrind</h1>
<center>This manual was last updated on 2002-10-02</center>
<p>
<center>
<a href="mailto:njn25@cam.ac.uk">njn25@cam.ac.uk</a><br>
Copyright &copy; 2000-2002 Nicholas Nethercote
<p>
Nulgrind is licensed under the GNU General Public License,
version 2<br>
Nulgrind is a Valgrind skin that does not do very much at all.
</center>
<p>
<h2>1&nbsp; Nulgrind</h2>
Nulgrind is the minimal skin for Valgrind. It does no initialisation or
finalisation, and adds no instrumentation to the program's code. It is mainly
of use for Valgrind's developers for debugging and regression testing.
<p>
Nonetheless you can run programs with Nulgrind. They will run roughly 5-10
times more slowly than normal, for no useful effect. Note that you need to use
the option <code>--skin=none</code> to run Nulgrind (i.e. not
<code>--skin=nulgrind</code>).
<hr width="100%">
</body>
</html>