mirror of
https://github.com/Zenithsiz/ftmemsim-valgrind.git
synced 2026-02-03 18:13:01 +00:00
Delete all the old documentation ...
git-svn-id: svn://svn.valgrind.org/valgrind/trunk@1283
parent 4623a5d36c
commit 50040b9ebc
@@ -1,10 +0,0 @@
<html>
<head>
<title>AddrCheck</title>
</head>

<body>
(no docs yet, sorry)
</body>
</html>
@@ -1,26 +0,0 @@
<!doctype html public "-//w3c//dtd html 4.0 transitional//en">
<html>

<head>
<meta http-equiv="Content-Type"
content="text/html; charset=iso-8859-1">
<meta http-equiv="Content-Language" content="en-gb">
<meta name="generator"
content="Mozilla/4.76 (X11; U; Linux 2.4.1-0.1.9 i586) [Netscape]">
<meta name="author" content="Julian Seward <jseward@acm.org>">
<meta name="description" content="say what this prog does">
<meta name="keywords" content="Valgrind, memory checker, x86, GPL">
<title>Valgrind's user manual</title>
</head>

<frameset cols="150,*">
<frame name="nav" target="main" src="nav.html">
<frame name="main" src="manual.html" scrolling="auto">
<noframes>
<body>
<p>This page uses frames, but your browser doesn't support them.</p>
</body>
</noframes>
</frameset>

</html>
@@ -1,752 +0,0 @@
<html>
<head>
<style type="text/css">
body { background-color: #ffffff;
color: #000000;
font-family: Times, Helvetica, Arial;
font-size: 14pt}
h4 { margin-bottom: 0.3em}
code { color: #000000;
font-family: Courier;
font-size: 13pt }
pre { color: #000000;
font-family: Courier;
font-size: 13pt }
a:link { color: #0000C0;
text-decoration: none; }
a:visited { color: #0000C0;
text-decoration: none; }
a:active { color: #0000C0;
text-decoration: none; }
</style>
<title>Cachegrind</title>
</head>
<body bgcolor="#ffffff">

<a name="title"> </a>
<h1 align=center>Cachegrind, version 1.0.0</h1>
<center>This manual was last updated on 20020726</center>
<p>

<center>
<a href="mailto:jseward@acm.org">jseward@acm.org</a><br>
Copyright © 2000-2002 Julian Seward
<p>
Cachegrind is licensed under the GNU General Public License,
version 2<br>
An open-source cache profiler for Linux-x86 executables.
</center>

<p>

<hr width="100%">
<a name="contents"></a>
<h2>Contents of this manual</h2>

<h4>1 <a href="#cache">How to use Cachegrind</a></h4>

<h4>2 <a href="techdocs.html">How Cachegrind works</a></h4>

<hr width="100%">
<a name="cache"></a>
<h2>1 Cache profiling</h2>
Cachegrind is a tool for doing cache simulations and annotating your source
line-by-line with the number of cache misses. In particular, it records:
<ul>
<li>L1 instruction cache reads and misses;
<li>L1 data cache reads and read misses, writes and write misses;
<li>L2 unified cache reads and read misses, writes and write misses.
</ul>
On a modern x86 machine, an L1 miss will typically cost around 10 cycles,
and an L2 miss can cost as much as 200 cycles. Detailed cache profiling can be
very useful for improving the performance of your program.<p>

Also, since one instruction cache read is performed per instruction executed,
you can find out how many instructions are executed per line, which can be
useful for traditional profiling and test coverage.<p>

Any feedback, bug-fixes, suggestions, etc., are welcome.
<h3>1.1 Overview</h3>
First off, as for normal Valgrind use, you probably want to compile with
debugging info (the <code>-g</code> flag). But by contrast with normal
Valgrind use, you probably <b>do</b> want to turn optimisation on, since you
should profile your program as it is normally run.<p>

The two steps are:
<ol>
<li>Run your program with <code>valgrind --skin=cachegrind</code> in front of
the normal command line invocation. When the program finishes,
Valgrind will print summary cache statistics. It also collects
line-by-line information in a file
<code>cachegrind.out.<i>pid</i></code>, where <code><i>pid</i></code>
is the program's process id.
<p>
This step should be done every time you want to collect
information about a new program, a changed program, or about the
same program with different input.
</li>
<p>
<li>Generate a function-by-function summary, and possibly annotate
source files with <code>cg_annotate</code>. Source files to annotate can be
specified manually on the command line, or
"interesting" source files can be annotated automatically with
the <code>--auto=yes</code> option. You can annotate C/C++
files or assembly language files equally easily.
<p>
This step can be performed as many times as you like for each
Step 1. You may want to do multiple annotations showing
different information each time.<p>
</li>
</ol>

The steps are described in detail in the following sections.<p>
<h3>1.2 Cache simulation specifics</h3>

Cachegrind uses a simulation for a machine with a split L1 cache and a unified
L2 cache. This configuration is used for all (modern) x86-based machines we
are aware of. Old Cyrix CPUs had a unified I and D L1 cache, but they are
ancient history now.<p>

The more specific characteristics of the simulation are as follows.

<ul>
<li>Write-allocate: when a write miss occurs, the block written to
is brought into the D1 cache. Most modern caches have this
property.</li><p>

<li>Bit-selection hash function: the set of line(s) in the cache to which a
memory block maps is chosen by the middle bits M--(M+N-1) of the
byte address, where:
<ul>
<li>line size = 2<sup>M</sup> bytes</li>
<li>(cache size / line size / associativity) = 2<sup>N</sup> sets</li>
</ul> </li><p>

<li>Inclusive L2 cache: the L2 cache replicates all the entries of
the L1 cache. This is standard on Pentium chips, but AMD
Athlons use an exclusive L2 cache that only holds blocks evicted
from L1. AMD Durons and most modern VIAs do the same.</li><p>
</ul>
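The bit-selection rule can be sketched in C as follows. This is an illustrative sketch, not Cachegrind's own code: the helper names are hypothetical, and it assumes that the line size and the number of sets are powers of two, with the number of sets taken to be cache size / line size / associativity.<p>

```c
#include <assert.h>
#include <stdint.h>

/* log2 of x, where x must be a power of two (hypothetical helper). */
static unsigned log2_exact(uint32_t x)
{
    unsigned n = 0;
    assert(x != 0 && (x & (x - 1)) == 0);
    while (x > 1) {
        x >>= 1;
        n++;
    }
    return n;
}

/* Set index for a byte address: bits M..(M+N-1), where
   line_size = 2^M bytes and size / line_size / assoc = 2^N sets. */
static uint32_t cache_set_index(uint32_t addr, uint32_t size,
                                uint32_t assoc, uint32_t line_size)
{
    unsigned m = log2_exact(line_size);
    unsigned n = log2_exact(size / line_size / assoc);
    return (addr >> m) & ((UINT32_C(1) << n) - 1);
}
```

For the 65536 B, 2-way, 64 B-line D1 cache shown in the <code>cg_annotate</code> example output, M = 6 and N = 9, so <code>cache_set_index(0x8048f2b, 65536, 2, 64)</code> selects set 60.<p>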
The cache configuration simulated (cache size, associativity and line size) is
determined automagically using the CPUID instruction. If you have an old
machine that (a) doesn't support the CPUID instruction, or (b) supports it in
an early incarnation that doesn't give any cache information, then Cachegrind
will fall back to using a default configuration (that of a model 3/4 Athlon).
Cachegrind will tell you if this happens. You can manually specify one, two or
all three levels (I1/D1/L2) of the cache from the command line using the
<code>--I1</code>, <code>--D1</code> and <code>--L2</code> options.<p>

Other noteworthy behaviour:

<ul>
<li>References that straddle two cache lines are treated as follows:
<ul>
<li>If both blocks hit --> counted as one hit</li>
<li>If one block hits, the other misses --> counted as one miss</li>
<li>If both blocks miss --> counted as one miss (not two)</li>
</ul><p></li>

<li>Instructions that modify a memory location (eg. <code>inc</code> and
<code>dec</code>) are counted as doing just a read, ie. a single data
reference. This may seem strange, but since the write can never cause a
miss (the read guarantees the block is in the cache) it's not very
interesting.<p>

Thus it measures not the number of times the data cache is accessed, but
the number of times a data cache miss could occur.<p>
</li>
</ul>
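To make the hit/miss rules above concrete, here is a toy C model of a single cache level. It is a hypothetical sketch, not the code in the <code>vg_cachesim_*.c</code> files: it ignores the L2 level, inclusion and the read/write split, and LRU replacement is an assumption of the sketch. It does apply write-allocate, the straddling rule, and the rule that a read-modify-write instruction counts as one data reference.<p>

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define MAX_SETS 1024
#define MAX_WAYS 16

/* A toy set-associative cache with LRU replacement (hypothetical). */
typedef struct {
    uint32_t sets, ways, line_bits;       /* line size = 2^line_bits bytes */
    uint32_t tag[MAX_SETS][MAX_WAYS];     /* tag[s][0] is most recently used */
    uint32_t used[MAX_SETS];              /* number of valid ways in set s */
    unsigned long refs, misses;
} Cache;

static void cache_init(Cache *c, uint32_t sets, uint32_t ways, uint32_t line_bits)
{
    assert(sets > 0 && sets <= MAX_SETS && ways > 0 && ways <= MAX_WAYS);
    memset(c, 0, sizeof *c);
    c->sets = sets;
    c->ways = ways;
    c->line_bits = line_bits;
}

/* Look up one memory block. On a miss the block is always brought in,
   for writes as well as reads (write-allocate), evicting the LRU way
   if the set is full. Returns true on a hit. */
static bool access_block(Cache *c, uint32_t block)
{
    uint32_t s = block % c->sets, t = block / c->sets;
    uint32_t *set = c->tag[s];
    uint32_t i, n = c->used[s];

    for (i = 0; i < n; i++) {
        if (set[i] == t) {                          /* hit: move to front */
            memmove(&set[1], &set[0], i * sizeof set[0]);
            set[0] = t;
            return true;
        }
    }
    if (n < c->ways)
        c->used[s] = ++n;                           /* use a free way */
    memmove(&set[1], &set[0], (n - 1) * sizeof set[0]); /* drops LRU if full */
    set[0] = t;
    return false;
}

/* One data reference of `size` bytes at `addr`. A reference straddling two
   lines is counted once, as a miss if either block misses. An instruction
   such as inc, which reads and then writes the same location, would be
   recorded as a single call to this function. */
static void data_ref(Cache *c, uint32_t addr, uint32_t size)
{
    uint32_t first = addr >> c->line_bits;
    uint32_t last = (addr + size - 1) >> c->line_bits;
    bool hit = access_block(c, first);

    if (last != first)
        hit = access_block(c, last) && hit;         /* always touch both */
    c->refs++;
    if (!hit)
        c->misses++;
}
```

For example, with 64-byte lines a 4-byte reference at address 62 touches blocks 0 and 1; if only block 0 is already cached, it is recorded as one reference and one miss.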
If you are interested in simulating a cache with different properties, it is
not particularly hard to write your own cache simulator, or to modify the
existing ones in <code>vg_cachesim_I1.c</code>, <code>vg_cachesim_D1.c</code>,
<code>vg_cachesim_L2.c</code> and <code>vg_cachesim_gen.c</code>. We'd be
interested to hear from anyone who does.

<a name="profile"></a>
<h3>1.3 Profiling programs</h3>

Cache profiling is enabled by using the <code>--skin=cachegrind</code>
option to the <code>valgrind</code> shell script. To gather cache profiling
information about the program <code>ls -l</code>, type:

<blockquote><code>valgrind --skin=cachegrind ls -l</code></blockquote>

The program will execute (slowly). Upon completion, summary statistics
that look like this will be printed:

<pre>
==31751== I refs: 27,742,716
==31751== I1 misses: 276
==31751== L2 misses: 275
==31751== I1 miss rate: 0.0%
==31751== L2i miss rate: 0.0%
==31751==
==31751== D refs: 15,430,290 (10,955,517 rd + 4,474,773 wr)
==31751== D1 misses: 41,185 ( 21,905 rd + 19,280 wr)
==31751== L2 misses: 23,085 ( 3,987 rd + 19,098 wr)
==31751== D1 miss rate: 0.2% ( 0.1% + 0.4%)
==31751== L2d miss rate: 0.1% ( 0.0% + 0.4%)
==31751==
==31751== L2 misses: 23,360 ( 4,262 rd + 19,098 wr)
==31751== L2 miss rate: 0.0% ( 0.0% + 0.4%)
</pre>

Cache accesses for instruction fetches are summarised first, giving the
number of fetches made (this is the number of instructions executed, which
can be useful to know in its own right), the number of I1 misses, and the
number of L2 instruction (<code>L2i</code>) misses.<p>

Cache accesses for data follow. The information is similar to that of the
instruction fetches, except that the values are also shown split between reads
and writes (note each row's <code>rd</code> and <code>wr</code> values add up
to the row's total).<p>

Combined instruction and data figures for the L2 cache follow that.<p>
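The relationships between the summary rows can be checked with a little arithmetic. The C sketch below is purely illustrative (the names are made up for this example): it copies the figures from the sample summary and recomputes the row totals, a miss rate, and the combined L2 figures. It treats the combined L2 read misses as the L2i misses plus the L2d read misses, which the sample bears out (275 + 3,987 = 4,262).<p>

```c
#include <assert.h>

/* A pair of read and write figures, as in the "( ... rd + ... wr)" columns. */
typedef struct {
    unsigned long rd, wr;
} RdWr;

/* Figures copied from the sample summary above. */
static const unsigned long l2i_misses = 275UL;   /* "L2 misses" in the I block */
static const RdWr d_refs     = { 10955517UL, 4474773UL };
static const RdWr d1_misses  = { 21905UL, 19280UL };
static const RdWr l2d_misses = { 3987UL, 19098UL };

/* A row's total is its rd value plus its wr value. */
static unsigned long total(RdWr x)
{
    return x.rd + x.wr;
}

/* Miss rate as a percentage of references. */
static double miss_rate(unsigned long misses, unsigned long refs)
{
    assert(refs != 0);
    return 100.0 * (double)misses / (double)refs;
}

/* Combined L2 figures: instruction misses count as reads, so the
   combined read column adds the L2i misses to the L2d read misses. */
static unsigned long l2_total_misses(void)
{
    return l2i_misses + total(l2d_misses);
}

static unsigned long l2_read_misses(void)
{
    return l2i_misses + l2d_misses.rd;
}
```

With these numbers <code>total(d_refs)</code> is 15,430,290 and <code>l2_total_misses()</code> is 23,360, matching the printed rows.<p>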
<h3>1.4 Output file</h3>

As well as printing summary information, Cachegrind also writes
line-by-line cache profiling information to a file named
<code>cachegrind.out.<i>pid</i></code>. This file is human-readable, but is
best interpreted by the accompanying program <code>cg_annotate</code>,
described in the next section.
<p>
Things to note about the <code>cachegrind.out.<i>pid</i></code> file:
<ul>
<li>It is written every time <code>valgrind --skin=cachegrind</code>
is run, and will overwrite any existing
<code>cachegrind.out.<i>pid</i></code> in the current directory (but
that won't happen very often because it takes some time for process ids
to be recycled).</li>
<p>
<li>It can be huge: <code>ls -l</code> generates a file of about
350 KB. Browsing a few files and web pages with a Konqueror
built with full debugging information generates a file
of around 15 MB.</li>
</ul>

Note that older versions of Cachegrind used a log file named
<code>cachegrind.out</code> (i.e. no <code><i>.pid</i></code> suffix).
The suffix serves two purposes. Firstly, it means you don't have to rename old
log files that you don't want to overwrite. Secondly, and more importantly,
it means that programs which spawn child processes can be profiled correctly
with the <code>--trace-children=yes</code> option, since each process writes
to its own file.
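The per-process file name is simple to construct. The helper below is a hypothetical C sketch, not part of Valgrind, that a wrapper program could use to locate the profile written for a process whose id it knows, for example a child started under <code>--trace-children=yes</code>.<p>

```c
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Write "cachegrind.out.<pid>" into buf (hypothetical helper).
   Returns 0 on success, or -1 if the name does not fit in len bytes. */
static int cachegrind_out_name(pid_t pid, char *buf, size_t len)
{
    int n;
    assert(buf != NULL && len > 0);
    n = snprintf(buf, len, "cachegrind.out.%ld", (long)pid);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

For instance, <code>cachegrind_out_name(31751, name, sizeof name)</code> yields <code>cachegrind.out.31751</code>, the file that the summary shown earlier would accompany; <code>getpid()</code> from <code>&lt;unistd.h&gt;</code> supplies the id of the calling process.<p>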
<a name="profileflags"></a>
<h3>1.5 Cachegrind options</h3>
Cachegrind accepts all the options that Valgrind does, although some of them
(ones related to memory checking) don't do anything when cache profiling.<p>

The interesting cache-simulation specific options are:

<ul>
<li><code>--I1=&lt;size&gt;,&lt;associativity&gt;,&lt;line_size&gt;</code><br>
<code>--D1=&lt;size&gt;,&lt;associativity&gt;,&lt;line_size&gt;</code><br>
<code>--L2=&lt;size&gt;,&lt;associativity&gt;,&lt;line_size&gt;</code><p>
[default: uses CPUID for automagic cache configuration]<p>

Manually specifies the I1/D1/L2 cache configuration, where
<code>size</code> and <code>line_size</code> are measured in bytes. The
three items must be comma-separated, but with no spaces, eg:

<blockquote>
<code>valgrind --skin=cachegrind --I1=65536,2,64</code>
</blockquote>

You can specify one, two or three of the I1/D1/L2 caches. Any level not
manually specified will be simulated using the configuration found in the
normal way (via the CPUID instruction, or failing that, via defaults).</li>
</ul>
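The value syntax for these options can be illustrated with a small C parser. This is a hypothetical sketch of the <code>&lt;size&gt;,&lt;associativity&gt;,&lt;line_size&gt;</code> format only, not Valgrind's own option handling; it rejects spaces and trailing characters, but does not check any further constraints that a simulator might impose on the values.<p>

```c
#include <assert.h>
#include <stdio.h>

typedef struct {
    int size, assoc, line_size;
} CacheConfig;

/* Parse "<size>,<associativity>,<line_size>", e.g. "65536,2,64".
   Returns 0 on success, -1 on malformed input (hypothetical helper). */
static int parse_cache_opt(const char *s, CacheConfig *out)
{
    int consumed = 0;
    assert(s != NULL && out != NULL);

    /* %d would silently skip spaces, so reject them explicitly. */
    for (const char *p = s; *p; p++)
        if (*p == ' ')
            return -1;

    /* %n records how much was read, so trailing junk can be detected. */
    if (sscanf(s, "%d,%d,%d%n", &out->size, &out->assoc,
               &out->line_size, &consumed) != 3 || s[consumed] != '\0')
        return -1;
    if (out->size <= 0 || out->assoc <= 0 || out->line_size <= 0)
        return -1;
    return 0;
}
```

Given <code>"65536,2,64"</code> it fills in a 65536-byte, 2-way cache with 64-byte lines; <code>"65536, 2,64"</code> and <code>"65536,2"</code> are both rejected.<p>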
<a name="annotate"></a>
<h3>1.6 Annotating C/C++ programs</h3>

Before using <code>cg_annotate</code>, it is worth widening your
window to be at least 120 characters wide if possible, as the output
lines can be quite long.
<p>
To get a function-by-function summary, run <code>cg_annotate
--<i>pid</i></code> in a directory containing a
<code>cachegrind.out.<i>pid</i></code> file. The <code>--<i>pid</i></code>
is required so that <code>cg_annotate</code> knows which log file to use when
several are present.
<p>
The output looks like this:

<pre>
--------------------------------------------------------------------------------
I1 cache: 65536 B, 64 B, 2-way associative
D1 cache: 65536 B, 64 B, 2-way associative
L2 cache: 262144 B, 64 B, 8-way associative
Command: concord vg_to_ucode.c
Events recorded: Ir I1mr I2mr Dr D1mr D2mr Dw D1mw D2mw
Events shown: Ir I1mr I2mr Dr D1mr D2mr Dw D1mw D2mw
Event sort order: Ir I1mr I2mr Dr D1mr D2mr Dw D1mw D2mw
Threshold: 99%
Chosen for annotation:
Auto-annotation: on

--------------------------------------------------------------------------------
        Ir I1mr I2mr         Dr   D1mr  D2mr        Dw   D1mw   D2mw
--------------------------------------------------------------------------------
27,742,716  276  275 10,955,517 21,905 3,987 4,474,773 19,280 19,098  PROGRAM TOTALS

--------------------------------------------------------------------------------
        Ir I1mr I2mr         Dr   D1mr  D2mr        Dw   D1mw   D2mw  file:function
--------------------------------------------------------------------------------
 8,821,482    5    5  2,242,702  1,621    73 1,794,230      0      0  getc.c:_IO_getc
 5,222,023    4    4  2,276,334     16    12   875,959      1      1  concord.c:get_word
 2,649,248    2    2  1,344,810  7,326 1,385         .      .      .  vg_main.c:strcmp
 2,521,927    2    2    591,215      0     0   179,398      0      0  concord.c:hash
 2,242,740    2    2  1,046,612    568    22   448,548      0      0  ctype.c:tolower
 1,496,937    4    4    630,874  9,000 1,400   279,388      0      0  concord.c:insert
   897,991   51   51    897,831     95    30        62      1      1  ???:???
   598,068    1    1    299,034      0     0   149,517      0      0  ../sysdeps/generic/lockfile.c:__flockfile
   598,068    0    0    299,034      0     0   149,517      0      0  ../sysdeps/generic/lockfile.c:__funlockfile
   598,024    4    4    213,580     35    16   149,506      0      0  vg_clientmalloc.c:malloc
   446,587    1    1    215,973  2,167   430   129,948 14,057 13,957  concord.c:add_existing
   341,760    2    2    128,160      0     0   128,160      0      0  vg_clientmalloc.c:vg_trap_here_WRAPPER
   320,782    4    4    150,711    276     0    56,027     53     53  concord.c:init_hash_table
   298,998    1    1    106,785      0     0    64,071      1      1  concord.c:create
   149,518    0    0    149,516      0     0         1      0      0  ???:tolower@@GLIBC_2.0
   149,518    0    0    149,516      0     0         1      0      0  ???:fgetc@@GLIBC_2.0
    95,983    4    4     38,031      0     0    34,409  3,152  3,150  concord.c:new_word_node
    85,440    0    0     42,720      0     0    21,360      0      0  vg_clientmalloc.c:vg_bogus_epilogue
</pre>
First up is a summary of the annotation options:

<ul>
<li>I1 cache, D1 cache, L2 cache: cache configuration, so you know the
configuration with which these results were obtained.</li><p>

<li>Command: the command line invocation of the program under
examination.</li><p>

<li>Events recorded: event abbreviations are:<p>
<ul>
<li><code>Ir </code>: I cache reads (ie. instructions executed)</li>
<li><code>I1mr</code>: I1 cache read misses</li>
<li><code>I2mr</code>: L2 cache instruction read misses</li>
<li><code>Dr </code>: D cache reads (ie. memory reads)</li>
<li><code>D1mr</code>: D1 cache read misses</li>
<li><code>D2mr</code>: L2 cache data read misses</li>
<li><code>Dw </code>: D cache writes (ie. memory writes)</li>
<li><code>D1mw</code>: D1 cache write misses</li>
<li><code>D2mw</code>: L2 cache data write misses</li>
</ul><p>
Note that total D1 accesses are given by <code>Dr</code> +
<code>Dw</code>, total D1 misses by <code>D1mr</code> +
<code>D1mw</code>, and total L2 misses by
<code>I2mr</code> + <code>D2mr</code> + <code>D2mw</code>.</li><p>

<li>Events shown: the events shown (a subset of events gathered). This can
be adjusted with the <code>--show</code> option.</li><p>

<li>Event sort order: the sort order in which functions are shown. For
example, in this case the functions are sorted from highest
<code>Ir</code> counts to lowest. If two functions have identical
<code>Ir</code> counts, they will then be sorted by <code>I1mr</code>
counts, and so on. This order can be adjusted with the
<code>--sort</code> option.<p>

Note that this dictates the order the functions appear. It is <b>not</b>
the order in which the columns appear; that is dictated by the "events
shown" line (and can be changed with the <code>--show</code> option).
</li><p>

<li>Threshold: <code>cg_annotate</code> by default omits functions
that cause very low numbers of misses to avoid drowning you in
information. In this case, <code>cg_annotate</code> shows summaries of the
functions that account for 99% of the <code>Ir</code> counts;
<code>Ir</code> is chosen as the threshold event since it is the
primary sort event. The threshold can be adjusted with the
<code>--threshold</code> option.</li><p>

<li>Chosen for annotation: names of files specified manually for annotation;
in this case none.</li><p>

<li>Auto-annotation: whether auto-annotation was requested via the
<code>--auto=yes</code> option. In this case no.</li><p>
</ul>
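The derived totals and the multi-key sort order described above can be sketched in C. The types and names here are hypothetical and this is not <code>cg_annotate</code>'s code; a <code>.</code> in the output is treated as a count of 0. The comparator looks at the first sort event and only consults the next one to break a tie, which is consistent with <code>__flockfile</code> (one <code>I1mr</code>) being listed above <code>__funlockfile</code> (none) in the example, although both have 598,068 <code>Ir</code>.<p>

```c
#include <assert.h>
#include <stdlib.h>

/* Event columns in the order recorded in the example file. */
enum { IR, I1MR, I2MR, DR, D1MR, D2MR, DW, D1MW, D2MW, N_EVENTS };

typedef struct {
    const char *name;                 /* "file:function" */
    unsigned long ev[N_EVENTS];       /* a "." is stored as 0 */
} CostCentre;

static unsigned long d1_accesses(const CostCentre *c) { return c->ev[DR] + c->ev[DW]; }
static unsigned long d1_misses(const CostCentre *c)   { return c->ev[D1MR] + c->ev[D1MW]; }
static unsigned long l2_misses(const CostCentre *c)
{
    return c->ev[I2MR] + c->ev[D2MR] + c->ev[D2MW];
}

/* Event sort order Ir, I1mr, I2mr, ...: highest first, with later
   events consulted only to break ties. */
static const int sort_order[] = { IR, I1MR, I2MR, DR, D1MR, D2MR, DW, D1MW, D2MW };

static int by_sort_order(const void *a, const void *b)
{
    const CostCentre *x = a, *y = b;
    for (size_t i = 0; i < sizeof sort_order / sizeof sort_order[0]; i++) {
        unsigned long ex = x->ev[sort_order[i]], ey = y->ev[sort_order[i]];
        if (ex != ey)
            return ex > ey ? -1 : 1;
    }
    return 0;
}
```

Applied to the PROGRAM TOTALS row, <code>d1_misses</code> gives 41,185 and <code>l2_misses</code> gives 23,360, the same figures as the <code>D1 misses</code> and combined <code>L2 misses</code> lines of the Valgrind summary.<p>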
Next come summary statistics for the whole program. These are similar
to the summary provided when running <code>valgrind --skin=cachegrind</code>.<p>

Then come the function-by-function statistics. Each function is
identified by a <code>file_name:function_name</code> pair. If a column
contains only a dot it means the function never performs
that event (eg. the third row shows that <code>strcmp()</code>
contains no instructions that write to memory). The name
<code>???</code> is used if the file name and/or function name
could not be determined from debugging information. If most of the
entries have the form <code>???:???</code> the program probably wasn't
compiled with <code>-g</code>. If any code was invalidated (either due to
self-modifying code or unloading of shared objects) its counts are aggregated
into a single cost centre written as <code>(discarded):(discarded)</code>.<p>

It is worth noting that functions will come from three types of source files:
<ol>
<li>From the profiled program (<code>concord.c</code> in this example).</li>
<li>From libraries (eg. <code>getc.c</code>)</li>
<li>From Valgrind's implementation of some libc functions (eg.
<code>vg_clientmalloc.c:malloc</code>). These are recognisable because
the filename begins with <code>vg_</code>, and is probably one of
<code>vg_main.c</code>, <code>vg_clientmalloc.c</code> or
<code>vg_mylibc.c</code>.
</li>
</ol>
There are two ways to annotate source files -- by choosing them
manually, or with the <code>--auto=yes</code> option. To do it
manually, just specify the filenames as arguments to
<code>cg_annotate</code>. For example, running
<code>cg_annotate --<i>pid</i> concord.c</code> for our example produces the same
output as above followed by an annotated version of
<code>concord.c</code>, a section of which looks like:

<pre>
--------------------------------------------------------------------------------
-- User-annotated source: concord.c
--------------------------------------------------------------------------------
Ir I1mr I2mr Dr D1mr D2mr Dw D1mw D2mw

[snip]

. . . . . . . . . void init_hash_table(char *file_name, Word_Node *table[])
3 1 1 . . . 1 0 0 {
. . . . . . . . . FILE *file_ptr;
. . . . . . . . . Word_Info *data;
1 0 0 . . . 1 1 1 int line = 1, i;
. . . . . . . . .
5 0 0 . . . 3 0 0 data = (Word_Info *) create(sizeof(Word_Info));
. . . . . . . . .
4,991 0 0 1,995 0 0 998 0 0 for (i = 0; i < TABLE_SIZE; i++)
3,988 1 1 1,994 0 0 997 53 52 table[i] = NULL;
. . . . . . . . .
. . . . . . . . . /* Open file, check it. */
6 0 0 1 0 0 4 0 0 file_ptr = fopen(file_name, "r");
2 0 0 1 0 0 . . . if (!(file_ptr)) {
. . . . . . . . . fprintf(stderr, "Couldn't open '%s'.\n", file_name);
1 1 1 . . . . . . exit(EXIT_FAILURE);
. . . . . . . . . }
. . . . . . . . .
165,062 1 1 73,360 0 0 91,700 0 0 while ((line = get_word(data, line, file_ptr)) != EOF)
146,712 0 0 73,356 0 0 73,356 0 0 insert(data->word, data->line, table);
. . . . . . . . .
4 0 0 1 0 0 2 0 0 free(data);
4 0 0 1 0 0 2 0 0 fclose(file_ptr);
3 0 0 2 0 0 . . . }
</pre>
(Although column widths are automatically minimised, a wide terminal is clearly
useful.)<p>

Each source file is clearly marked (<code>User-annotated source</code>) as
having been chosen manually for annotation. If the file was found in one of
the directories specified with the <code>-I</code>/<code>--include</code>
option, the directory and file are both given.<p>

Each line is annotated with its event counts. Events not applicable for a line
are represented by a dot (<code>.</code>); this is useful for distinguishing
between an event which cannot happen, and one which can but did not.<p>

Sometimes only a small section of a source file is executed. To minimise
uninteresting output, <code>cg_annotate</code> only shows annotated lines and
lines within a small distance of annotated lines. Gaps are marked with the line
numbers so you know which part of a file the shown code comes from, eg:

<pre>
(figures and code for line 704)
-- line 704 ----------------------------------------
-- line 878 ----------------------------------------
(figures and code for line 878)
</pre>

The amount of context to show around annotated lines is controlled by the
<code>--context</code> option.<p>
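The context rule can be sketched in C. This is a hypothetical illustration loosely modelled on the example output above, not <code>cg_annotate</code>'s implementation: a line is printed if it lies within <code>context</code> lines of a line with counts, and each gap is marked by the last line shown before it and the first line shown after it. How the real tool marks a gap at the very start or end of a file is not described here, so the single leading marker is a guess.<p>

```c
#include <assert.h>
#include <stdio.h>

/* counted[i] is nonzero if source line i+1 has event counts. Lines within
   `context` lines of a counted line are printed; each gap is marked by
   the last line shown before it and the first line shown after it.
   Returns the number of "-- line N" markers written. */
static int print_with_context(const int *counted, int n_lines, int context,
                              FILE *out)
{
    int markers = 0, last_shown = 0;      /* line numbers are 1-based */
    assert(counted != NULL && out != NULL && context >= 0);

    for (int line = 1; line <= n_lines; line++) {
        int show = 0;
        for (int j = line - context; j <= line + context && !show; j++)
            show = j >= 1 && j <= n_lines && counted[j - 1];
        if (!show)
            continue;
        if (line != last_shown + 1) {      /* a gap precedes this line */
            if (last_shown > 0) {
                fprintf(out, "-- line %d ----------------------------------------\n",
                        last_shown);
                markers++;
            }
            fprintf(out, "-- line %d ----------------------------------------\n",
                    line);
            markers++;
        }
        fprintf(out, "(figures and code for line %d)\n", line);
        last_shown = line;
    }
    return markers;
}
```

With counts on lines 3 and 15 of a 20-line file and a context of 2, lines 1 to 5 and 13 to 17 are printed, with one pair of markers (line 5, then line 13) marking the gap between them.<p>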
To get automatic annotation, run <code>cg_annotate --<i>pid</i> --auto=yes</code>.
<code>cg_annotate</code> will automatically annotate every source file it can find that is
mentioned in the function-by-function summary. Therefore, the files chosen for
auto-annotation are affected by the <code>--sort</code> and
<code>--threshold</code> options. Each source file is clearly marked
(<code>Auto-annotated source</code>) as being chosen automatically. Any files
that could not be found are mentioned at the end of the output, eg:

<pre>
--------------------------------------------------------------------------------
The following files chosen for auto-annotation could not be found:
--------------------------------------------------------------------------------
getc.c
ctype.c
../sysdeps/generic/lockfile.c
</pre>

This is quite common for library files, since libraries are usually compiled
with debugging information, but the source files are often not present on a
system. If a file is chosen for annotation <b>both</b> manually and
automatically, it is marked as <code>User-annotated source</code>.<p>

Use the <code>-I/--include</code> option to tell <code>cg_annotate</code> where to look for
source files if the filenames found from the debugging information aren't
specific enough.<p>

Beware that <code>cg_annotate</code> can take some time to digest large
<code>cachegrind.out.<i>pid</i></code> files, e.g. 30 seconds or more. Also
beware that auto-annotation can produce a lot of output if your program is
large!
<h3>1.7 Annotating assembler programs</h3>

Valgrind can annotate assembler programs too, or annotate the
assembler generated for your C program. Sometimes this is useful for
understanding what is really happening when an interesting line of C
code is translated into multiple instructions.<p>

To do this, you just need to assemble your <code>.s</code> files with
assembler-level debug information. gcc doesn't do this, but you can
use the GNU assembler with the <code>--gstabs</code> option to
generate object files with this information, eg:

<blockquote><code>as --gstabs foo.s</code></blockquote>

You can then profile and annotate source files in the same way as for C/C++
programs.
<h3>1.8 <code>cg_annotate</code> options</h3>
<ul>
<li><code>--<i>pid</i></code><p>
Indicates which <code>cachegrind.out.<i>pid</i></code> file to read.
Not actually an option -- it is required.</li><p>

<li><code>-h, --help</code><br>
<code>-v, --version</code><p>
Help and version, as usual.</li><p>

<li><code>--sort=A,B,C</code> [default: order in
<code>cachegrind.out.<i>pid</i></code>]<p>
Specifies the events upon which the sorting of the function-by-function
entries will be based. Useful if you want to concentrate on eg. I cache
misses (<code>--sort=I1mr,I2mr</code>), or D cache misses
(<code>--sort=D1mr,D2mr</code>), or L2 misses
(<code>--sort=D2mr,I2mr</code>).</li><p>

<li><code>--show=A,B,C</code> [default: all, using order in
<code>cachegrind.out.<i>pid</i></code>]<p>
Specifies which events to show (and the column order). Default is to use
all present in the <code>cachegrind.out.<i>pid</i></code> file (and use
the order in the file).</li><p>

<li><code>--threshold=X</code> [default: 99%] <p>
Sets the threshold for the function-by-function summary. Functions are
shown that account for more than X% of the primary sort event. If
auto-annotating, also affects which files are annotated.<p>

Note: thresholds can be set for more than one of the events by appending a
colon and a number (no spaces, though) to any of the events given to the
<code>--sort</code> option. E.g. if you want to see the functions that cover
99% of L2 read misses and 99% of L2 write misses, use this option:

<blockquote><code>--sort=D2mr:99,D2mw:99</code></blockquote>
</li><p>

<li><code>--auto=no</code> [default]<br>
<code>--auto=yes</code> <p>
When enabled, automatically annotates every file that is mentioned in the
function-by-function summary that can be found. Also gives a list of
those that couldn't be found.</li><p>

<li><code>--context=N</code> [default: 8]<p>
Print N lines of context before and after each annotated line. Avoids
printing large sections of source files that were not executed. Use a
large number (eg. 10,000) to show all source lines.
</li><p>

<li><code>-I=&lt;dir&gt;, --include=&lt;dir&gt;</code>
[default: empty string]<p>
Adds a directory to the list in which to search for files. Multiple
-I/--include options can be given to add multiple directories.</li>
</ul>
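One way to read the threshold rule is as a cumulative cut-off: walk the functions in sort order and stop once those shown so far cover X% of the primary sort event. The C sketch below implements that reading. It is an assumption about the exact rule, not a description of <code>cg_annotate</code>'s code, and the function name is hypothetical.<p>

```c
#include <assert.h>
#include <stddef.h>

/* counts[] holds the primary sort event per function, already sorted from
   highest to lowest. Returns how many leading functions to show so that
   together they cover at least threshold_pct percent of total. */
static size_t functions_to_show(const unsigned long *counts, size_t n,
                                unsigned long total, double threshold_pct)
{
    double covered = 0.0;
    size_t i;
    assert(total > 0);
    for (i = 0; i < n && 100.0 * covered / (double)total < threshold_pct; i++)
        covered += (double)counts[i];
    return i;
}
```

For counts of 50, 30, 15, 4 and 1 out of 100, a 99% threshold shows the first four functions and an 80% threshold the first two.<p>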
<h3>1.9 Warnings</h3>
There are a couple of situations in which <code>cg_annotate</code> issues warnings.

<ul>
<li>If a source file is more recent than the
<code>cachegrind.out.<i>pid</i></code> file. This is because the
information in <code>cachegrind.out.<i>pid</i></code> is only recorded
with line numbers, so if the line numbers change at all in the source
(eg. lines added, deleted, swapped), any annotations will be
incorrect.</li><p>

<li>If information is recorded about line numbers past the end of a file.
This can be caused by the above problem, ie. shortening the source file
while using an old <code>cachegrind.out.<i>pid</i></code> file. If this
happens, the figures for the bogus lines are printed anyway (clearly
marked as bogus) in case they are important.</li><p>
</ul>
<h3>1.10 Things to watch out for</h3>
Some odd things that can occur during annotation:

<ul>
<li>If annotating at the assembler level, you might see something like this:

<pre>
1 0 0 . . . . . . leal -12(%ebp),%eax
1 0 0 . . . 1 0 0 movl %eax,84(%ebx)
2 0 0 0 0 0 1 0 0 movl $1,-20(%ebp)
. . . . . . . . . .align 4,0x90
1 0 0 . . . . . . movl $.LnrB,%eax
1 0 0 . . . 1 0 0 movl %eax,-16(%ebp)
</pre>

How can the third instruction be executed twice when the others are
executed only once? As it turns out, it isn't. Here's a dump of the
executable, using <code>objdump -d</code>:

<pre>
8048f25: 8d 45 f4 lea 0xfffffff4(%ebp),%eax
8048f28: 89 43 54 mov %eax,0x54(%ebx)
8048f2b: c7 45 ec 01 00 00 00 movl $0x1,0xffffffec(%ebp)
8048f32: 89 f6 mov %esi,%esi
8048f34: b8 08 8b 07 08 mov $0x8078b08,%eax
8048f39: 89 45 f0 mov %eax,0xfffffff0(%ebp)
</pre>

Notice the extra <code>mov %esi,%esi</code> instruction. Where did this
come from? The GNU assembler inserted it to serve as the two bytes of
padding needed to align the <code>movl $.LnrB,%eax</code> instruction on
a four-byte boundary, but pretended it didn't exist when adding debug
information. Thus when Valgrind reads the debug info it thinks that the
<code>movl $0x1,0xffffffec(%ebp)</code> instruction covers the address
range 0x8048f2b--0x8048f33 by itself, and attributes the counts for the
<code>mov %esi,%esi</code> to it.<p>
</li>
<li>Inlined functions can cause strange results in the function-by-function
summary. If a function <code>inline_me()</code> is defined in
<code>foo.h</code> and inlined in the functions <code>f1()</code>,
<code>f2()</code> and <code>f3()</code> in <code>bar.c</code>, there will
not be a <code>foo.h:inline_me()</code> function entry. Instead, there
will be separate function entries for each inlining site, ie.
<code>foo.h:f1()</code>, <code>foo.h:f2()</code> and
<code>foo.h:f3()</code>. To find the total counts for
<code>foo.h:inline_me()</code>, add up the counts from each entry.<p>

The reason for this is that although the debug info output by gcc
indicates the switch from <code>bar.c</code> to <code>foo.h</code>, it
doesn't indicate the name of the function in <code>foo.h</code>, so
Valgrind keeps using the old one.<p>
</li>

<li>Sometimes, the same filename might be represented with a relative name
and with an absolute name in different parts of the debug info, eg:
<code>/home/user/proj/proj.h</code> and <code>../proj.h</code>. In this
case, if you use auto-annotation, the file will be annotated twice with
the counts split between the two.<p>
</li>

<li>Files with more than 65,535 lines cause difficulties for the stabs debug
info reader. This is because the line number in the <code>struct
nlist</code> defined in <code>a.out.h</code> under Linux is only a 16-bit
value. Valgrind can handle some files with more than 65,535 lines
correctly by making some guesses to identify line number overflows. But
some cases are beyond it, in which case you'll get a warning message
explaining that annotations for the file might be incorrect.<p>
</li>

<li>If you compile some files with <code>-g</code> and some without, some
events that take place in a file without debug info could be attributed
to the last line of a file with debug info (whichever one gets placed
before the non-debug-info file in the executable).<p>
</li>
</ul>
|
||||
|
||||
This list looks long, but these cases should be fairly rare.<p>
|
||||
|
||||
Note: stabs is not an easy format to read. If you come across bizarre
|
||||
annotations that look like they might be caused by a bug in the stabs reader,
|
||||
please let us know.<p>
|
||||
|
||||
|
||||
<h3>1.11 Accuracy</h3>
|
||||
Valgrind's cache profiling has a number of shortcomings:
|
||||
|
||||
<ul>
|
||||
<li>It doesn't account for kernel activity -- the effect of system calls on
|
||||
the cache contents is ignored.</li><p>
|
||||
|
||||
<li>It doesn't account for other process activity (although this is probably
|
||||
desirable when considering a single program).</li><p>
|
||||
|
||||
<li>It doesn't account for virtual-to-physical address mappings; hence the
|
||||
entire simulation is not a true representation of what's happening in the
|
||||
cache.</li><p>
|
||||
|
||||
<li>It doesn't account for cache misses not visible at the instruction level,
|
||||
eg. those arising from TLB misses, or speculative execution.</li><p>
|
||||
|
||||
<li>Valgrind's custom <code>malloc()</code> will allocate memory in different
|
||||
ways to the standard <code>malloc()</code>, which could warp the results.
|
||||
</li><p>
|
||||
|
||||
<li>Valgrind's custom threads implementation will schedule threads
|
||||
differently to the standard one. This too could warp the results for
|
||||
threaded programs.
|
||||
</li><p>
|
||||
|
||||
<li>The instructions <code>bts</code>, <code>btr</code> and <code>btc</code>
|
||||
will incorrectly be counted as doing a data read if both the arguments
|
||||
are registers, eg:
|
||||
|
||||
<blockquote><code>btsl %eax, %edx</code></blockquote>
|
||||
|
||||
This should only happen rarely.
|
||||
</li><p>
|
||||
|
||||
<li>FPU instructions with data sizes of 28 and 108 bytes (e.g.
|
||||
<code>fsave</code>) are treated as though they only access 16 bytes.
|
||||
These instructions seem to be rare so hopefully this won't affect
|
||||
accuracy much.
|
||||
</li><p>
|
||||
</ul>
|
||||
|
||||
Another thing worth noting is that results are very sensitive. Changing the
|
||||
size of the <code>valgrind.so</code> file, the size of the program being
|
||||
profiled, or even the length of its name can perturb the results. Variations
|
||||
will be small, but don't expect perfectly repeatable results if your program
|
||||
changes at all.<p>
|
||||
|
||||
While these factors mean you shouldn't trust the results to be super-accurate,
|
||||
hopefully they should be close enough to be useful.<p>
|
||||
|
||||
|
||||
<h3>1.12 Todo</h3>
|
||||
<ul>
|
||||
<li>Program start-up/shut-down calls a lot of functions that aren't
|
||||
interesting and just complicate the output. It would be nice to exclude
|
||||
these somehow.</li>
|
||||
<p>
|
||||
</ul>
|
||||
<hr width="100%">
|
||||
</body>
|
||||
</html>
|
||||
|
||||
@ -1,35 +0,0 @@
|
||||
<html>
|
||||
<head>
|
||||
<title>Valgrind</title>
|
||||
<base target="main">
|
||||
<style type="text/css">
|
||||
|
||||
body { background-color: #ffffff;
|
||||
color: #000000;
|
||||
font-family: Times, Helvetica, Arial;
|
||||
font-size: 14pt}
|
||||
h4 { margin-bottom: 0.3em}
|
||||
code { color: #000000;
|
||||
font-family: Courier;
|
||||
font-size: 13pt }
|
||||
pre { color: #000000;
|
||||
font-family: Courier;
|
||||
font-size: 13pt }
|
||||
a:link { color: #0000C0;
|
||||
text-decoration: none; }
|
||||
a:visited { color: #0000C0;
|
||||
text-decoration: none; }
|
||||
a:active { color: #0000C0;
|
||||
text-decoration: none; }
|
||||
</style>
|
||||
</head>
|
||||
|
||||
<body>
|
||||
<br>
|
||||
<a href="manual.html#contents"><b>Contents of this manual</b></a><br>
|
||||
<a href="manual.html#cache">1 <b>How to use Cachegrind</b></a><br>
|
||||
<p>
|
||||
<a href="techdocs.html">2 <b>How Cachegrind works</b></a><br>
|
||||
|
||||
</body>
|
||||
</html>
|
||||
@ -1,461 +0,0 @@
|
||||
<html>
|
||||
<head>
|
||||
<style type="text/css">
|
||||
body { background-color: #ffffff;
|
||||
color: #000000;
|
||||
font-family: Times, Helvetica, Arial;
|
||||
font-size: 14pt}
|
||||
h4 { margin-bottom: 0.3em}
|
||||
code { color: #000000;
|
||||
font-family: Courier;
|
||||
font-size: 13pt }
|
||||
pre { color: #000000;
|
||||
font-family: Courier;
|
||||
font-size: 13pt }
|
||||
a:link { color: #0000C0;
|
||||
text-decoration: none; }
|
||||
a:visited { color: #0000C0;
|
||||
text-decoration: none; }
|
||||
a:active { color: #0000C0;
|
||||
text-decoration: none; }
|
||||
</style>
|
||||
<title>How Cachegrind works</title>
|
||||
</head>
|
||||
|
||||
<body bgcolor="#ffffff">
|
||||
|
||||
<a name="title"> </a>
|
||||
<h1 align=center>How Cachegrind works</h1>
|
||||
|
||||
<center>
|
||||
Detailed technical notes for hackers, maintainers and the
|
||||
overly-curious<br>
|
||||
These notes pertain to snapshot 20020306<br>
|
||||
<p>
|
||||
<a href="mailto:jseward@acm.org">jseward@acm.org</a><br>
|
||||
<a href="http://developer.kde.org/~sewardj">http://developer.kde.org/~sewardj</a><br>
|
||||
Copyright © 2000-2002 Julian Seward
|
||||
<p>
|
||||
Valgrind is licensed under the GNU General Public License,
|
||||
version 2<br>
|
||||
An open-source tool for finding memory-management problems in
|
||||
x86 GNU/Linux executables.
|
||||
</center>
|
||||
|
||||
<p>
|
||||
|
||||
|
||||
|
||||
|
||||
<hr width="100%">
|
||||
|
||||
<h2>Cache profiling</h2>
|
||||
Valgrind is a very nice platform for doing cache profiling and other kinds of
|
||||
simulation, because it converts horrible x86 instructions into nice clean
|
||||
RISC-like UCode. For example, for cache profiling we are interested in
|
||||
instructions that read and write memory; in UCode there are only four
|
||||
instructions that do this: <code>LOAD</code>, <code>STORE</code>,
|
||||
<code>FPU_R</code> and <code>FPU_W</code>. By contrast, because of the x86
|
||||
addressing modes, almost every instruction can read or write memory.<p>
|
||||
|
||||
Most of the cache profiling machinery is in the file
|
||||
<code>vg_cachesim.c</code>.<p>
|
||||
|
||||
These notes are a somewhat haphazard guide to how Valgrind's cache profiling
|
||||
works.<p>
|
||||
|
||||
<h3>Cost centres</h3>
|
||||
Valgrind gathers cache profiling information about every instruction executed,
|
||||
individually. Each instruction has a <b>cost centre</b> associated with it.
|
||||
There are two kinds of cost centre: one for instructions that don't reference
|
||||
memory (<code>iCC</code>), and one for instructions that do
|
||||
(<code>idCC</code>):
|
||||
|
||||
<pre>
|
||||
typedef struct _CC {
|
||||
ULong a;
|
||||
ULong m1;
|
||||
ULong m2;
|
||||
} CC;
|
||||
|
||||
typedef struct _iCC {
|
||||
/* word 1 */
|
||||
UChar tag;
|
||||
UChar instr_size;
|
||||
|
||||
/* words 2+ */
|
||||
Addr instr_addr;
|
||||
CC I;
|
||||
} iCC;
|
||||
|
||||
typedef struct _idCC {
|
||||
/* word 1 */
|
||||
UChar tag;
|
||||
UChar instr_size;
|
||||
UChar data_size;
|
||||
|
||||
/* words 2+ */
|
||||
Addr instr_addr;
|
||||
CC I;
|
||||
CC D;
|
||||
} idCC;
|
||||
</pre>
|
||||
|
||||
Each <code>CC</code> has three fields <code>a</code>, <code>m1</code>,
|
||||
<code>m2</code> for recording references, level 1 misses and level 2 misses.
|
||||
Each of these is a 64-bit <code>ULong</code> -- the numbers can get very large,
|
||||
ie. greater than the maximum of about 4.3 billion allowed by a 32-bit unsigned int.<p>
|
||||
|
||||
An <code>iCC</code> has one <code>CC</code> for instruction cache accesses. An
|
||||
<code>idCC</code> has two, one for instruction cache accesses, and one for data
|
||||
cache accesses.<p>
|
||||
|
||||
The <code>iCC</code> and <code>idCC</code> structs also store unchanging
|
||||
information about the instruction:
|
||||
<ul>
|
||||
<li>An instruction-type identification tag (explained below)</li><p>
|
||||
<li>Instruction size</li><p>
|
||||
<li>Data reference size (<code>idCC</code> only)</li><p>
|
||||
<li>Instruction address</li><p>
|
||||
</ul>
|
||||
|
||||
Note that data address is not one of the fields for <code>idCC</code>. This is
|
||||
because for many memory-referencing instructions the data address can change
|
||||
each time it's executed (eg. if it uses register-offset addressing). We have
|
||||
to give this item to the cache simulation in a different way (see
|
||||
Instrumentation section below). Some memory-referencing instructions do always
|
||||
reference the same address, but we don't try to treat them specially in order to
|
||||
keep things simple.<p>
|
||||
|
||||
Also note that there is only room for recording info about one data cache
|
||||
access in an <code>idCC</code>. So what about instructions that do a read then
|
||||
a write, such as:
|
||||
|
||||
<blockquote><code>inc (%esi)</code></blockquote>
|
||||
|
||||
In a write-allocate cache, as simulated by Valgrind, the write cannot miss,
|
||||
since it immediately follows the read which will drag the block into the cache
|
||||
if it's not already there. So the write access isn't really interesting, and
|
||||
Valgrind doesn't record it. This means that Valgrind doesn't measure
|
||||
memory references, but rather memory references that could miss in the cache.
|
||||
This behaviour is the same as that used by the AMD Athlon hardware counters.
|
||||
It also has the benefit of simplifying the implementation -- instructions that
|
||||
read and write memory can be treated like instructions that read memory.<p>
|
||||
|
||||
<h3>Storing cost-centres</h3>
|
||||
Cost centres are stored in a way that makes them very cheap to look up, which is
|
||||
important since one is looked up for every original x86 instruction
|
||||
executed.<p>
|
||||
|
||||
Valgrind does JIT translations at the basic block level, and cost centres are
|
||||
also set up and stored at the basic block level. By doing things carefully, we
|
||||
store all the cost centres for a basic block in a contiguous array, and lookup
|
||||
comes almost for free.<p>
|
||||
|
||||
Consider this part of a basic block (for exposition purposes, pretend it's an
|
||||
entire basic block):
|
||||
|
||||
<pre>
|
||||
movl $0x0,%eax
|
||||
movl $0x99, -4(%ebp)
|
||||
</pre>
|
||||
|
||||
The translation to UCode looks like this:
|
||||
|
||||
<pre>
|
||||
MOVL $0x0, t20
|
||||
PUTL t20, %EAX
|
||||
INCEIPo $5
|
||||
|
||||
LEA1L -4(t4), t14
|
||||
MOVL $0x99, t18
|
||||
STL t18, (t14)
|
||||
INCEIPo $7
|
||||
</pre>
|
||||
|
||||
The first step is to allocate the cost centres. This requires a preliminary
|
||||
pass to count how many x86 instructions were in the basic block, and their
|
||||
types (and thus sizes). UCode translations for single x86 instructions are
|
||||
delimited by the <code>INCEIPo</code> instruction, the argument of which gives
|
||||
the byte size of the instruction (note that lazy INCEIP updating is turned off
|
||||
to allow this).<p>
|
||||
|
||||
We can tell if an x86 instruction references memory by looking for
|
||||
<code>LDL</code> and <code>STL</code> UCode instructions, and thus what kind of
|
||||
cost centre is required. From this we can determine how many cost centres we
|
||||
need for the basic block, and their sizes. We can then allocate them in a
|
||||
single array.<p>
|
||||
|
||||
Consider the example code above. After the preliminary pass, we know we need
|
||||
two cost centres, one <code>iCC</code> and one <code>idCC</code>. So we
|
||||
allocate an array to store these which looks like this:
|
||||
|
||||
<pre>
|
||||
|(uninit)| tag (1 byte)
|
||||
|(uninit)| instr_size (1 byte)
|
||||
|(uninit)| (padding) (2 bytes)
|
||||
|(uninit)| instr_addr (4 bytes)
|
||||
|(uninit)| I.a (8 bytes)
|
||||
|(uninit)| I.m1 (8 bytes)
|
||||
|(uninit)| I.m2 (8 bytes)
|
||||
|
||||
|(uninit)| tag (1 byte)
|
||||
|(uninit)| instr_size (1 byte)
|
||||
|(uninit)| data_size (1 byte)
|
||||
|(uninit)| (padding) (1 byte)
|
||||
|(uninit)| instr_addr (4 bytes)
|
||||
|(uninit)| I.a (8 bytes)
|
||||
|(uninit)| I.m1 (8 bytes)
|
||||
|(uninit)| I.m2 (8 bytes)
|
||||
|(uninit)| D.a (8 bytes)
|
||||
|(uninit)| D.m1 (8 bytes)
|
||||
|(uninit)| D.m2 (8 bytes)
|
||||
</pre>
|
||||
|
||||
(We can see now why we need tags to distinguish between the two types of cost
|
||||
centres.)<p>
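The allocation and the role of the tags can be sketched in C as follows. The struct layouts follow the diagram above, assuming 32-bit addresses. <code>READ_CC</code>, the numeric tag values and the helper names are hypothetical. For brevity the sketch writes each tag at allocation time, whereas the text fills the tags in later, during instrumentation.<p>

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

typedef unsigned char UChar;
typedef uint32_t      Addr;    /* assumption: 32-bit x86 addresses */
typedef uint64_t      ULong;

typedef struct { ULong a, m1, m2; } CC;

/* Hypothetical tag values; the text names INSTR_CC and WRITE_CC. */
enum { INSTR_CC = 1, READ_CC, WRITE_CC };

typedef struct { UChar tag, instr_size;            Addr instr_addr; CC I;    } iCC;
typedef struct { UChar tag, instr_size, data_size; Addr instr_addr; CC I, D; } idCC;

/* The tag, always the first byte, determines the size of each cost centre. */
static size_t cc_size(UChar tag)
{
    return tag == INSTR_CC ? sizeof(iCC) : sizeof(idCC);
}

/* Allocate one zeroed, contiguous array for a basic block, given the tags
   found by the preliminary pass. */
static UChar *alloc_bb_ccs(const UChar *tags, int n, size_t *size_out)
{
    size_t total = 0, off = 0;
    for (int i = 0; i < n; i++)
        total += cc_size(tags[i]);
    UChar *arr = calloc(1, total);
    assert(arr != NULL);
    for (int i = 0; i < n; i++) {
        arr[off] = tags[i];          /* tag is at offset 0 of both structs */
        off += cc_size(tags[i]);
    }
    *size_out = total;
    return arr;
}

/* Walk the array: without the tags there is no way to know how far to step. */
static int count_ccs(const UChar *arr, size_t size)
{
    int n = 0;
    for (size_t off = 0; off < size; off += cc_size(arr[off]))
        n++;
    return n;
}
```

On common x86 ABIs these structs come out at 32 and 56 bytes, matching the byte counts in the diagram.<p>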
|
||||
|
||||
We also record the size of the array. We look up the debug info of the first
|
||||
instruction in the basic block, and then stick the array into a table indexed
|
||||
by filename and function name. This makes it easy to dump the information
|
||||
quickly to file at the end.<p>
|
||||
|
||||
<h3>Instrumentation</h3>
|
||||
The instrumentation pass has two main jobs:
|
||||
|
||||
<ol>
|
||||
<li>Fill in the gaps in the allocated cost centres.</li><p>
|
||||
<li>Add UCode to call the cache simulator for each instruction.</li><p>
|
||||
</ol>
|
||||
|
||||
The instrumentation pass steps through the UCode and the cost centres in
|
||||
tandem. As each original x86 instruction's UCode is processed, the appropriate
|
||||
gaps in the instruction's cost centre are filled in, for example:
|
||||
|
||||
<pre>
|
||||
|INSTR_CC| tag (1 byte)
|
||||
|5 | instr_size (1 byte)
|
||||
|(uninit)| (padding) (2 bytes)
|
||||
|i_addr1 | instr_addr (4 bytes)
|
||||
|0 | I.a (8 bytes)
|
||||
|0 | I.m1 (8 bytes)
|
||||
|0 | I.m2 (8 bytes)
|
||||
|
||||
|WRITE_CC| tag (1 byte)
|
||||
|7 | instr_size (1 byte)
|
||||
|4 | data_size (1 byte)
|
||||
|(uninit)| (padding) (1 byte)
|
||||
|i_addr2 | instr_addr (4 bytes)
|
||||
|0 | I.a (8 bytes)
|
||||
|0 | I.m1 (8 bytes)
|
||||
|0 | I.m2 (8 bytes)
|
||||
|0 | D.a (8 bytes)
|
||||
|0 | D.m1 (8 bytes)
|
||||
|0 | D.m2 (8 bytes)
|
||||
</pre>
|
||||
|
||||
(Note that this step is not performed if a basic block is re-translated; see
|
||||
<a href="#retranslations">here</a> for more information.)<p>
|
||||
|
||||
GCC inserts padding before the <code>instr_addr</code> field so that it is word
|
||||
aligned.<p>
|
||||
|
||||
The instrumentation added to call the cache simulation function looks like this
|
||||
(instrumentation is indented to distinguish it from the original UCode):
|
||||
|
||||
<pre>
|
||||
MOVL $0x0, t20
|
||||
PUTL t20, %EAX
|
||||
PUSHL %eax
|
||||
PUSHL %ecx
|
||||
PUSHL %edx
|
||||
MOVL $0x4091F8A4, t46 # address of 1st CC
|
||||
PUSHL t46
|
||||
CALLMo $0x12 # first cachesim function
|
||||
CLEARo $0x4
|
||||
POPL %edx
|
||||
POPL %ecx
|
||||
POPL %eax
|
||||
INCEIPo $5
|
||||
|
||||
LEA1L -4(t4), t14
|
||||
MOVL $0x99, t18
|
||||
MOVL t14, t42
|
||||
STL t18, (t14)
|
||||
PUSHL %eax
|
||||
PUSHL %ecx
|
||||
PUSHL %edx
|
||||
PUSHL t42
|
||||
MOVL $0x4091F8C4, t44 # address of 2nd CC
|
||||
PUSHL t44
|
||||
CALLMo $0x13 # second cachesim function
|
||||
CLEARo $0x8
|
||||
POPL %edx
|
||||
POPL %ecx
|
||||
POPL %eax
|
||||
INCEIPo $7
|
||||
</pre>
|
||||
|
||||
Consider the first instruction's UCode. Each call is surrounded by three
|
||||
<code>PUSHL</code> and <code>POPL</code> instructions to save and restore the
|
||||
caller-save registers. Then the address of the instruction's cost centre is
|
||||
pushed onto the stack, to be the first argument to the cache simulation
|
||||
function. The address is known at this point because we are doing a
|
||||
simultaneous pass through the cost centre array. This means the cost centre
|
||||
lookup for each instruction is almost free (just the cost of pushing an
|
||||
argument for a function call). Then the call to the cache simulation function
|
||||
for non-memory-reference instructions is made (note that the
|
||||
<code>CALLMo</code> UInstruction takes an offset into a table of predefined
|
||||
functions; it is not an absolute address), and the single argument is
|
||||
<code>CLEAR</code>ed from the stack.<p>
|
||||
|
||||
The second instruction's UCode is similar. The only difference is that, as
|
||||
mentioned before, we have to pass the address of the data item referenced to
|
||||
the cache simulation function too. This explains the <code>MOVL t14,
|
||||
t42</code> and <code>PUSHL t42</code> UInstructions. (Note that the seemingly
|
||||
redundant <code>MOV</code>ing will probably be optimised away during register
|
||||
allocation.)<p>
|
||||
|
||||
Note that instead of storing unchanging information about each instruction
|
||||
(instruction size, data size, etc) in its cost centre, we could have passed in
|
||||
these arguments to the simulation function. But this would slow the calls down
|
||||
(two or three extra arguments pushed onto the stack). Also it would bloat the
|
||||
UCode instrumentation by amounts similar to the space required for them in the
|
||||
cost centre; bloated UCode would also fill the translation cache more quickly,
|
||||
requiring more translations for large programs and slowing them down more.<p>
|
||||
|
||||
<a name="retranslations"></a>
|
||||
<h3>Handling basic block retranslations</h3>
|
||||
The above description ignores one complication. Valgrind has a limited size
|
||||
cache for basic block translations; if it fills up, old translations are
|
||||
discarded. If a discarded basic block is executed again, it must be
|
||||
re-translated.<p>
|
||||
|
||||
However, we can't use this approach for profiling -- we can't throw away cost
|
||||
centres for instructions in the middle of execution! So when a basic block is
|
||||
translated, we first look for its cost centre array in the hash table. If
|
||||
there is no cost centre array, it must be the first translation, so we proceed
|
||||
as described above. But if there is a cost centre array already, it must be a
|
||||
retranslation. In this case, we skip the cost centre allocation and
|
||||
initialisation steps, but still do the UCode instrumentation step.<p>
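The retranslation check can be sketched in C as a chained hash table lookup. Keying the table by the basic block's original address is an assumption made for this sketch; the text above says the table is organised by file and function name for output. <code>BBCC</code>, <code>get_bbcc</code> and the table size are hypothetical.<p>

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

typedef uint32_t Addr;   /* assumption: 32-bit x86 addresses */

/* Hypothetical per-basic-block record that survives retranslation. */
typedef struct BBCC {
    Addr           orig_addr;   /* address of the block's first x86 instruction */
    size_t         array_size;  /* size in bytes of the cost-centre array */
    unsigned char *array;       /* the cost centres themselves */
    struct BBCC   *next;        /* hash-chain link */
} BBCC;

#define N_BUCKETS 4999          /* assumption: arbitrary prime table size */
static BBCC *bbcc_table[N_BUCKETS];

/* Called at translation time. On the first translation, allocate and zero a
   new array (*is_new = 1); on a retranslation, return the existing array
   untouched (*is_new = 0) so that the counts gathered so far survive. */
static BBCC *get_bbcc(Addr orig_addr, size_t array_size, int *is_new)
{
    size_t h = orig_addr % N_BUCKETS;
    for (BBCC *b = bbcc_table[h]; b != NULL; b = b->next) {
        if (b->orig_addr == orig_addr) {
            *is_new = 0;
            return b;
        }
    }
    BBCC *b = malloc(sizeof *b);
    assert(b != NULL && array_size > 0);
    b->orig_addr  = orig_addr;
    b->array_size = array_size;
    b->array      = calloc(1, array_size);
    assert(b->array != NULL);
    b->next       = bbcc_table[h];
    bbcc_table[h] = b;
    *is_new = 1;
    return b;
}
```

The caller would run the allocation and initialisation steps only when <code>*is_new</code> is 1, and the UCode instrumentation step in both cases.<p>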
|
||||
|
||||
<h3>The cache simulation</h3>
|
||||
The cache simulation is fairly straightforward. It just tracks which memory
|
||||
blocks are in the cache at the moment (it doesn't track the contents, since
|
||||
that is irrelevant).<p>
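A set-associative cache that tracks only which blocks are present can be written in a few dozen lines of C. This sketch uses LRU replacement and stores whole block numbers rather than separate tags, which is equivalent within a set. It is an illustration under those assumptions, not the code in <code>vg_cachesim_{I1,D1,L2}.c</code>; the names <code>Cache</code>, <code>cache_init</code> and <code>cache_access</code> are hypothetical.<p>

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Set-associative cache with LRU replacement that tracks only which blocks
   are present, never their contents. */
typedef struct {
    int       assoc;      /* ways per set */
    int       sets;       /* number of sets, a power of two */
    int       line_bits;  /* log2 of the line size in bytes */
    uint32_t *blocks;     /* sets * assoc block numbers; slot 0 of a set is MRU */
} Cache;

#define EMPTY UINT32_MAX  /* never a real block number when line_bits >= 1 */

static void cache_init(Cache *c, int size, int assoc, int line_size)
{
    assert(line_size > 1 && (line_size & (line_size - 1)) == 0);
    c->assoc = assoc;
    c->sets  = size / (assoc * line_size);
    assert(c->sets > 0 && (c->sets & (c->sets - 1)) == 0);
    for (c->line_bits = 0; (1 << c->line_bits) < line_size; c->line_bits++)
        ;
    c->blocks = malloc((size_t)c->sets * assoc * sizeof *c->blocks);
    assert(c->blocks != NULL);
    for (int i = 0; i < c->sets * assoc; i++)
        c->blocks[i] = EMPTY;
}

/* Returns 1 on a hit, 0 on a miss. Either way the block ends up as the most
   recently used entry of its set; a miss evicts the least recently used one. */
static int cache_access(Cache *c, uint32_t addr)
{
    uint32_t  block = addr >> c->line_bits;
    uint32_t *set   = &c->blocks[(size_t)(block & (uint32_t)(c->sets - 1)) * c->assoc];
    int i = 0;
    while (i < c->assoc && set[i] != block)
        i++;
    int hit = i < c->assoc;
    if (!hit)
        i = c->assoc - 1;        /* victim: the LRU slot */
    for (; i > 0; i--)
        set[i] = set[i - 1];     /* shift more recent entries down */
    set[0] = block;
    return hit;
}
```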
|
||||
|
||||
The interface to the simulation is quite clean. The functions called from the
|
||||
UCode contain calls to the simulation functions in the files
|
||||
<code>vg_cachesim_{I1,D1,L2}.c</code>; these calls are inlined so that only
|
||||
one function call is done per simulated x86 instruction. The file
|
||||
<code>vg_cachesim.c</code> simply <code>#include</code>s the three files
|
||||
containing the simulation, which makes plugging in new cache simulations
|
||||
very easy -- you just replace the three files and recompile.<p>
|
||||
|
||||
<h3>Output</h3>
|
||||
Output is fairly straightforward, basically printing the cost centre for every
|
||||
instruction, grouped by files and functions. Total counts (eg. total cache
|
||||
accesses, total L1 misses) are calculated when traversing this structure rather
|
||||
than during execution, to save time; the cache simulation functions are called
|
||||
so often that even one or two extra adds can make a sizeable difference.<p>
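Computing totals after execution amounts to one pass over the cost centres, as in this C sketch. <code>sum_ccs</code> is a hypothetical name; the point is that the additions happen once at the end instead of on every simulated access.<p>

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct { uint64_t a, m1, m2; } CC;

/* Compute summary totals once, after execution, by walking the cost centres,
   instead of updating running totals inside the hot simulation functions. */
static CC sum_ccs(const CC *ccs, size_t n)
{
    CC total = { 0, 0, 0 };
    assert(ccs != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        total.a  += ccs[i].a;
        total.m1 += ccs[i].m1;
        total.m2 += ccs[i].m2;
    }
    return total;
}
```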
|
||||
|
||||
The output file, which is also the input file for cg_annotate, has the following format:
|
||||
|
||||
<pre>
|
||||
file ::= desc_line* cmd_line events_line data_line+ summary_line
|
||||
desc_line ::= "desc:" ws? non_nl_string
|
||||
cmd_line ::= "cmd:" ws? cmd
|
||||
events_line ::= "events:" ws? (event ws)+
|
||||
data_line ::= file_line | fn_line | count_line
|
||||
file_line ::= ("fl=" | "fi=" | "fe=") filename
|
||||
fn_line ::= "fn=" fn_name
|
||||
count_line ::= line_num ws? (count ws)+
|
||||
summary_line ::= "summary:" ws? (count ws)+
|
||||
count ::= num | "."
|
||||
</pre>
|
||||
|
||||
Where:
|
||||
|
||||
<ul>
|
||||
<li><code>non_nl_string</code> is any string not containing a newline.</li><p>
|
||||
<li><code>cmd</code> is a command line invocation.</li><p>
|
||||
<li><code>filename</code> and <code>fn_name</code> can be anything.</li><p>
|
||||
<li><code>num</code> and <code>line_num</code> are decimal numbers.</li><p>
|
||||
<li><code>ws</code> is whitespace.</li><p>
|
||||
<li><code>nl</code> is a newline.</li><p>
|
||||
</ul>
|
||||
|
||||
The contents of the "desc:" lines are printed out at the top of the summary.
|
||||
This is a generic way of providing simulation-specific information, eg. for
|
||||
giving the cache configuration for cache simulation.<p>
|
||||
|
||||
Counts can be "." to represent "N/A", eg. the number of write misses for an
|
||||
instruction that doesn't write to memory.<p>
|
||||
|
||||
The number of counts in each <code>count_line</code> and the
|
||||
<code>summary_line</code> should not exceed the number of events in the
|
||||
<code>events_line</code>. If the number in each <code>count_line</code> is less,
|
||||
cg_annotate treats those missing as though they were a "." entry. <p>
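The <code>count_line</code> rules above can be sketched as a small C parser. cg_annotate itself is a Perl script, so this is only an illustration of the grammar. The <code>NA</code> marker and the name <code>parse_count_line</code> are hypothetical.<p>

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>

#define NA (-1LL)   /* hypothetical in-memory marker for a "." count */

/* Parse a count_line such as "42 10 . 3". Stores the line number and exactly
   n_events counts, filling any missing trailing counts with NA, as though
   they were ".". Returns the number of counts present on the line, or -1 if
   the line is malformed or has more counts than events. */
static int parse_count_line(const char *s, long *line_num,
                            long long *counts, int n_events)
{
    char *end;
    int n = 0;

    assert(s != NULL && line_num != NULL && counts != NULL);
    *line_num = strtol(s, &end, 10);
    if (end == s)
        return -1;                        /* no line number */
    for (s = end; ; ) {
        while (isspace((unsigned char)*s))
            s++;
        if (*s == '\0')
            break;
        if (n == n_events)
            return -1;                    /* more counts than events */
        if (*s == '.') {
            counts[n++] = NA;
            s++;
        } else {
            counts[n++] = strtoll(s, &end, 10);
            if (end == s)
                return -1;                /* not a number */
            s = end;
        }
    }
    for (int i = n; i < n_events; i++)
        counts[i] = NA;                   /* missing counts behave like "." */
    return n;
}
```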
|
||||
|
||||
A <code>file_line</code> changes the current file name. A <code>fn_line</code>
|
||||
changes the current function name. A <code>count_line</code> contains counts
|
||||
that pertain to the current filename/fn_name. A "fl=" <code>file_line</code>
|
||||
and a <code>fn_line</code> must appear before any <code>count_line</code>s to
|
||||
give the context of the first <code>count_line</code>s.<p>
|
||||
|
||||
Each <code>file_line</code> should be immediately followed by a
|
||||
<code>fn_line</code>. "fi=" <code>file_lines</code> are used to switch
|
||||
filenames for inlined functions; "fe=" <code>file_lines</code> are similar, but
|
||||
are put at the end of a basic block in which the file name hasn't been switched
|
||||
back to the original file name. (fi and fe lines behave the same; they are
|
||||
only distinguished to help debugging.)<p>
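A writer that follows these context rules only needs to remember the current file and function. This C sketch emits "fl=" and "fn=" lines only when the context changes, and always follows a <code>file_line</code> with a <code>fn_line</code>. The names <code>CgWriter</code> and <code>cg_write_count</code>, the fixed buffer sizes, and the use of negative values to mean "." are all assumptions.<p>

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical writer state: the file and function of the last count_line. */
typedef struct {
    FILE *out;
    char  file[256];
    char  fn[256];
} CgWriter;

/* Emit one count_line, preceded by "fl=" and/or "fn=" lines only when the
   context changes. Negative counts are written as "." (an assumed encoding). */
static void cg_write_count(CgWriter *w, const char *file, const char *fn,
                           long line, const long long *counts, int n)
{
    assert(w != NULL && file != NULL && fn != NULL);
    if (strcmp(w->file, file) != 0) {
        fprintf(w->out, "fl=%s\n", file);
        snprintf(w->file, sizeof w->file, "%s", file);
        w->fn[0] = '\0';    /* force a fn_line straight after each file_line */
    }
    if (strcmp(w->fn, fn) != 0) {
        fprintf(w->out, "fn=%s\n", fn);
        snprintf(w->fn, sizeof w->fn, "%s", fn);
    }
    fprintf(w->out, "%ld", line);
    for (int i = 0; i < n; i++) {
        if (counts[i] < 0)
            fputs(" .", w->out);
        else
            fprintf(w->out, " %lld", counts[i]);
    }
    fputc('\n', w->out);
}
```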
|
||||
|
||||
|
||||
<h3>Summary of performance features</h3>
|
||||
Quite a lot of work has gone into making the profiling as fast as possible.
|
||||
This is a summary of the important features:
|
||||
|
||||
<ul>
|
||||
<li>The basic block-level cost centre storage allows almost free cost centre
|
||||
lookup.</li><p>
|
||||
|
||||
<li>Only one function call is made per instruction simulated; even this
|
||||
accounts for a sizeable percentage of execution time, but it seems
|
||||
unavoidable if we want flexibility in the cache simulator.</li><p>
|
||||
|
||||
<li>Unchanging information about an instruction is stored in its cost centre,
|
||||
avoiding unnecessary argument pushing, and minimising UCode
|
||||
instrumentation bloat.</li><p>
|
||||
|
||||
<li>Summary counts are calculated at the end, rather than during
|
||||
execution.</li><p>
|
||||
|
||||
<li>The <code>cachegrind.out</code> output files can contain huge amounts of
|
||||
information; file format was carefully chosen to minimise file
|
||||
sizes.</li><p>
|
||||
</ul>
|
||||
|
||||
|
||||
<h3>Annotation</h3>
|
||||
Annotation is done by cg_annotate. It is a fairly straightforward Perl script
|
||||
that slurps up all the cost centres, and then runs through all the chosen
|
||||
source files, printing out cost centres with them. It too has been carefully
|
||||
optimised.
|
||||
|
||||
|
||||
<h3>Similar work, extensions</h3>
|
||||
It would be relatively straightforward to do other simulations and obtain
|
||||
line-by-line information about interesting events. A good example would be
|
||||
branch prediction -- all branches could be instrumented to interact with a
|
||||
branch prediction simulator, using very similar techniques to those described
|
||||
above.<p>
|
||||
|
||||
In particular, cg_annotate would not need to change -- the file format is such
|
||||
that it is not specific to the cache simulation, but could be used for any kind
|
||||
of line-by-line information. The only part of cg_annotate that is specific to
|
||||
the cache simulation is the name of the input file
|
||||
(<code>cachegrind.out</code>), although it would be very simple to add an
|
||||
option to control this.<p>
|
||||
|
||||
</body>
|
||||
</html>
|
||||
@ -1,66 +0,0 @@
|
||||
<html>
|
||||
<head>
|
||||
<style type="text/css">
|
||||
body { background-color: #ffffff;
|
||||
color: #000000;
|
||||
font-family: Times, Helvetica, Arial;
|
||||
font-size: 14pt}
|
||||
h4 { margin-bottom: 0.3em}
|
||||
code { color: #000000;
|
||||
font-family: Courier;
|
||||
font-size: 13pt }
|
||||
pre { color: #000000;
|
||||
font-family: Courier;
|
||||
font-size: 13pt }
|
||||
a:link { color: #0000C0;
|
||||
text-decoration: none; }
|
||||
a:visited { color: #0000C0;
|
||||
text-decoration: none; }
|
||||
a:active { color: #0000C0;
|
||||
text-decoration: none; }
|
||||
</style>
|
||||
<title>CoreCheck</title>
|
||||
</head>
|
||||
|
||||
<body bgcolor="#ffffff">
|
||||
|
||||
<a name="title"></a>
|
||||
<h1 align=center>CoreCheck</h1>
|
||||
<center>This manual was last updated on 2002-10-03</center>
|
||||
<p>
|
||||
|
||||
<center>
|
||||
<a href="mailto:njn25@cam.ac.uk">njn25@cam.ac.uk</a><br>
|
||||
Copyright © 2000-2002 Nicholas Nethercote
|
||||
<p>
|
||||
CoreCheck is licensed under the GNU General Public License,
|
||||
version 2<br>
|
||||
CoreCheck is a Valgrind skin that does very basic error checking.
|
||||
</center>
|
||||
|
||||
<p>
|
||||
|
||||
<h2>1 CoreCheck</h2>
|
||||
|
||||
CoreCheck is a very simple skin for Valgrind. It adds no instrumentation to
|
||||
the program's code, and only reports the few kinds of errors detected by
|
||||
Valgrind's core. It is mainly of use for Valgrind's developers for debugging
|
||||
and regression testing.
|
||||
<p>
|
||||
The errors detected are those found by the core when
|
||||
<code>VG_(needs).core_errors</code> is set. These include:
|
||||
|
||||
<ul>
|
||||
<li>Pthread API errors (many; eg. unlocking a non-locked mutex)<p>
|
||||
<li>Silly arguments to <code>malloc()</code> et al (eg. negative size)<p>
|
||||
<li>Invalid file descriptors to blocking syscalls <code>read()</code> and
|
||||
<code>write()</code><p>
|
||||
<li>Bad signal numbers passed to <code>sigaction()</code><p>
|
||||
<li>Attempts to install signal handler for <code>SIGKILL</code> or
|
||||
<code>SIGSTOP</code> <p>
|
||||
</ul>
|
||||
|
||||
<hr width="100%">
|
||||
</body>
|
||||
</html>
|
||||
|
||||
@ -1,26 +0,0 @@
|
||||
<!doctype html public "-//w3c//dtd html 4.0 transitional//en">
|
||||
<html>
|
||||
|
||||
<head>
|
||||
<meta http-equiv="Content-Type"
|
||||
content="text/html; charset=iso-8859-1">
|
||||
<meta http-equiv="Content-Language" content="en-gb">
|
||||
<meta name="generator"
|
||||
content="Mozilla/4.76 (X11; U; Linux 2.4.1-0.1.9 i586) [Netscape]">
|
||||
<meta name="author" content="Julian Seward <jseward@acm.org>">
|
||||
<meta name="description" content="say what this prog does">
|
||||
<meta name="keywords" content="Valgrind, memory checker, x86, GPL">
|
||||
<title>Valgrind's user manual</title>
|
||||
</head>
|
||||
|
||||
<frameset cols="150,*">
|
||||
<frame name="nav" target="main" src="nav.html">
|
||||
<frame name="main" src="manual.html" scrolling="auto">
|
||||
<noframes>
|
||||
<body>
|
||||
<p>This page uses frames, but your browser doesn't support them.</p>
|
||||
</body>
|
||||
</noframes>
|
||||
</frameset>
|
||||
|
||||
</html>
|
||||
@ -1,72 +0,0 @@
|
||||
<html>
|
||||
<head>
|
||||
<title>Valgrind</title>
|
||||
<base target="main">
|
||||
<style type="text/css">
|
||||
|
||||
body { background-color: #ffffff;
|
||||
color: #000000;
|
||||
font-family: Times, Helvetica, Arial;
|
||||
font-size: 14pt}
|
||||
h4 { margin-bottom: 0.3em}
|
||||
code { color: #000000;
|
||||
font-family: Courier;
|
||||
font-size: 13pt }
|
||||
pre { color: #000000;
|
||||
font-family: Courier;
|
||||
font-size: 13pt }
|
||||
a:link { color: #0000C0;
|
||||
text-decoration: none; }
|
||||
a:visited { color: #0000C0;
|
||||
text-decoration: none; }
|
||||
a:active { color: #0000C0;
|
||||
text-decoration: none; }
|
||||
</style>
|
||||
</head>
|
||||
|
||||
<body>
|
||||
<br>
|
||||
<a href="manual.html#contents"><b>Contents of this manual</b></a><br>
|
||||
<a href="manual.html#intro">1 Introduction</a><br>
|
||||
<a href="manual.html#whatfor">1.1 What Valgrind is for</a><br>
|
||||
<a href="manual.html#whatdoes">1.2 What it does with
|
||||
your program</a>
|
||||
<p>
|
||||
<a href="manual.html#howtouse">2 <b>How to use it, and how to
|
||||
make sense of the results</b></a><br>
|
||||
<a href="manual.html#starta">2.1 Getting started</a><br>
|
||||
<a href="manual.html#comment">2.2 The commentary</a><br>
|
||||
<a href="manual.html#report">2.3 Reporting of errors</a><br>
|
||||
<a href="manual.html#suppress">2.4 Suppressing errors</a><br>
|
||||
<a href="manual.html#flags">2.5 Command-line flags</a><br>
|
||||
<a href="manual.html#errormsgs">2.6 Explanation of error messages</a><br>
|
||||
<a href="manual.html#suppfiles">2.7 Writing suppressions files</a><br>
|
||||
<a href="manual.html#clientreq">2.8 The Client Request mechanism</a><br>
|
||||
<a href="manual.html#pthreads">2.9 Support for POSIX pthreads</a><br>
|
||||
<a href="manual.html#install">2.10 Building and installing</a><br>
|
||||
<a href="manual.html#problems">2.11 If you have problems</a>
|
||||
<p>
|
||||
<a href="manual.html#machine">3 <b>Details of the checking machinery</b></a><br>
|
||||
<a href="manual.html#vvalue">3.1 Valid-value (V) bits</a><br>
|
||||
<a href="manual.html#vaddress">3.2 Valid-address (A) bits</a><br>
|
||||
<a href="manual.html#together">3.3 Putting it all together</a><br>
|
||||
<a href="manual.html#signals">3.4 Signals</a><br>
|
||||
<a href="manual.html#leaks">3.5 Memory leak detection</a>
|
||||
<p>
|
||||
<a href="manual.html#limits">4 <b>Limitations</b></a><br>
|
||||
<p>
|
||||
<a href="manual.html#howitworks">5 <b>How it works -- a rough overview</b></a><br>
|
||||
<a href="manual.html#startb">5.1 Getting started</a><br>
|
||||
<a href="manual.html#engine">5.2 The translation/instrumentation engine</a><br>
|
||||
<a href="manual.html#track">5.3 Tracking the status of memory</a><br>
|
||||
<a href="manual.html#sys_calls">5.4 System calls</a><br>
|
||||
<a href="manual.html#sys_signals">5.5 Signals</a>
|
||||
<p>
|
||||
<a href="manual.html#example">6 <b>An example</b></a><br>
|
||||
<p>
|
||||
<a href="manual.html#cache">7 <b>Cache profiling</b></a><br>
|
||||
<p>
|
||||
<a href="techdocs.html">8 <b>The design and implementation of Valgrind</b></a><br>
|
||||
|
||||
</body>
|
||||
</html>
@ -1,687 +0,0 @@
<html>
<head>
<style type="text/css">
  body { background-color: #ffffff;
         color: #000000;
         font-family: Times, Helvetica, Arial;
         font-size: 14pt}
  h4   { margin-bottom: 0.3em}
  code { color: #000000;
         font-family: Courier;
         font-size: 13pt }
  pre  { color: #000000;
         font-family: Courier;
         font-size: 13pt }
  a:link    { color: #0000C0;
              text-decoration: none; }
  a:visited { color: #0000C0;
              text-decoration: none; }
  a:active  { color: #0000C0;
              text-decoration: none; }
</style>
<title>Valgrind</title>
</head>

<body bgcolor="#ffffff">

<a name="title"> </a>
<h1 align=center>Valgrind Skins</h1>
<center>
  A guide to writing new skins for Valgrind<br>
  This guide was last updated on 20020926
</center>
<p>

<center>
<a href="mailto:njn25@cam.ac.uk">njn25@cam.ac.uk</a><br>
Nick Nethercote, October 2002
<p>
Valgrind is licensed under the GNU General Public License,
version 2<br>
An open-source tool for supervising execution of Linux-x86 executables.
</center>

<p>

<hr width="100%">
<a name="contents"></a>
<h2>Contents of this manual</h2>

<h4>1 <a href="#intro">Introduction</a></h4>
    1.1 <a href="#supexec">Supervised Execution</a><br>
    1.2 <a href="#skins">Skins</a><br>
    1.3 <a href="#execspaces">Execution Spaces</a><br>

<h4>2 <a href="#writingaskin">Writing a Skin</a></h4>
    2.1 <a href="#whywriteaskin">Why write a skin?</a><br>
    2.2 <a href="#howskinswork">How skins work</a><br>
    2.3 <a href="#gettingcode">Getting the code</a><br>
    2.4 <a href="#gettingstarted">Getting started</a><br>
    2.5 <a href="#writingcode">Writing the code</a><br>
    2.6 <a href="#init">Initialisation</a><br>
    2.7 <a href="#instr">Instrumentation</a><br>
    2.8 <a href="#fini">Finalisation</a><br>
    2.9 <a href="#otherimportantinfo">Other important information</a><br>
    2.10 <a href="#wordsofadvice">Words of advice</a><br>

<h4>3 <a href="#advancedtopics">Advanced Topics</a></h4>
    3.1 <a href="#suppressions">Suppressions</a><br>
    3.2 <a href="#documentation">Documentation</a><br>
    3.3 <a href="#regressiontests">Regression tests</a><br>
    3.4 <a href="#profiling">Profiling</a><br>
    3.5 <a href="#othermakefilehackery">Other makefile hackery</a><br>
    3.6 <a href="#interfaceversions">Core/skin interface versions</a><br>

<h4>4 <a href="#finalwords">Final Words</a></h4>

<hr width="100%">

<a name="intro"></a>
<h2>1 Introduction</h2>

<a name="supexec"></a>
<h3>1.1 Supervised Execution</h3>

Valgrind provides a generic infrastructure for supervising the execution of
programs. It does this by providing a way to instrument programs in very
precise ways, which makes it relatively easy to support activities such as
dynamic error detection and profiling.<p>

Although writing a skin is not easy and requires learning quite a few things
about Valgrind, it is much easier than instrumenting a program from scratch
yourself.

<a name="skins"></a>
<h3>1.2 Skins</h3>
The key idea behind Valgrind's architecture is the division between its
``core'' and ``skins''.
<p>
The core provides the common low-level infrastructure to support program
instrumentation, including the x86-to-x86 JIT compiler, low-level memory
manager, signal handling and a scheduler (for pthreads). It also provides
certain services that are useful to some but not all skins, such as support
for error recording and suppression.
<p>
But the core leaves certain operations undefined, and skins must fill them in.
Most notably, skins define how program code should be instrumented. They can
also set certain variables to tell the core that they would like to use
certain services, or to be notified when certain interesting events occur.
<p>
Each skin that is written defines a new program supervision tool. Writing a
new tool just requires writing a new skin; the core takes care of all the hard
work.

<p>

<a name="execspaces"></a>
<h3>1.3 Execution Spaces</h3>
An important concept to understand before writing a skin is that there are
three spaces in which program code executes:

<ol>
  <li>User space: this covers most of the program's execution. The skin is
      given the code and can instrument it any way it likes, giving it (more
      or less) total control over the code.<p>

      Code executed in user space includes all the program code, almost all of
      the C library (including things like the dynamic linker), and almost
      all parts of all other libraries.
  </li><p>

  <li>Core space: a small proportion of the program's execution takes place
      entirely within Valgrind's core. This includes:<p>

      <ul>
      <li>Dynamic memory management (<code>malloc()</code> etc.)</li>

      <li>Pthread operations and scheduling</li>

      <li>Signal handling</li>
      </ul><p>

      A skin has no control over these operations; it never ``sees'' the code
      doing this work and thus cannot instrument it. However, the core
      provides hooks so a skin can be notified when certain interesting events
      happen, for example when dynamic memory is allocated or freed, the
      stack pointer is changed, or a pthread mutex is locked.<p>

      Note that these hooks only notify skins of events relevant to user
      space. For example, when the core allocates some memory for its own use,
      the skin is not notified of this, because it's not directly part of the
      supervised program's execution.
  </li><p>

  <li>Kernel space: execution in the kernel. There are two kinds:<p>

      <ol>
      <li>System calls: these can't be directly observed by either the skin or
          the core. But the core does have some idea of what happens to the
          arguments, and it provides hooks for a skin to wrap system calls.
      </li><p>

      <li>Other: all other kernel activity (e.g. process scheduling) is
          totally opaque and irrelevant to the program.
      </li><p>
      </ol>
  </li><p>
</ol>

It should be noted that a skin only has direct control over code executed in
user space. This is the vast majority of code executed, but not absolutely
all of it, so any profiling information recorded by a skin won't be totally
accurate.

<a name="writingaskin"></a>
<h2>2 Writing a Skin</h2>

<a name="whywriteaskin"></a>
<h3>2.1 Why write a skin?</h3>

Before you write a skin, you should have some idea of what it should do. What
is it you want to know about your programs of interest? Consider some existing
skins:

<ul>
<li>memcheck: among other things, performs fine-grained validity and
    addressability checks of every memory reference performed by the program
    </li><p>

<li>addrcheck: performs lighter-weight addressability checks of every memory
    reference performed by the program</li><p>

<li>cachegrind: tracks every instruction and memory reference to simulate
    instruction and data caches, tracking cache accesses and misses that
    occur on every line in the program</li><p>

<li>helgrind: tracks every memory access and mutex lock/unlock to determine
    if a program contains any data races</li><p>

<li>lackey: does simple counting of various things: the number of calls to a
    particular function (<code>_dl_runtime_resolve()</code>); the number of
    basic blocks, x86 instructions and UCode instructions executed; the number
    of branches executed and the proportion of those which were taken.</li><p>
</ul>

These examples give a reasonable idea of what kinds of things Valgrind can be
used for. The instrumentation can range from very lightweight (e.g. counting
the number of times a particular function is called) to very intrusive (e.g.
memcheck's memory checking).

<a name="howskinswork"></a>
<h3>2.2 How skins work</h3>

Skins must define various functions for instrumenting programs. These
functions are called by Valgrind's core, yet they must be implemented in such
a way that they can be written and compiled without touching Valgrind's core.
This is important, because one of our aims is to allow people to write and
distribute their own skins that can be plugged into Valgrind's core easily.<p>

This is achieved by packaging each skin into a separate shared object which is
then loaded ahead of the core shared object <code>valgrind.so</code>, using the
dynamic linker's <code>LD_PRELOAD</code> variable. Any functions defined in
the skin that share a name with a function defined in the core (such as
the instrumentation function <code>SK_(instrument)()</code>) override the
core's definition. Thus the core can call the necessary skin functions.<p>

This magic is all done for you; the shared object used is chosen with the
<code>--skin</code> option to the <code>valgrind</code> startup script. The
default skin used is <code>memcheck</code>, Valgrind's original memory checker.

<a name="gettingcode"></a>
<h3>2.3 Getting the code</h3>

To write your own skin, you'll need to check out a copy of Valgrind from the
CVS repository, rather than using a packaged distribution. This is because the
repository contains several extra files needed for writing skins.<p>

To check out the code from the CVS repository, first log in:
<blockquote><code>
cvs -d:pserver:anonymous@cvs.valgrind.sourceforge.net:/cvsroot/valgrind login
</code></blockquote>

Then check out the code. To get a copy of the current development version
(recommended for the brave only):
<blockquote><code>
cvs -z3 -d:pserver:anonymous@cvs.valgrind.sourceforge.net:/cvsroot/valgrind co valgrind
</code></blockquote>

To get a copy of the stable released branch:
<blockquote><code>
cvs -z3 -d:pserver:anonymous@cvs.valgrind.sourceforge.net:/cvsroot/valgrind co -r <i>TAG</i> valgrind
</code></blockquote>

where <code><i>TAG</i></code> has the form <code>VALGRIND_X_Y_Z</code> for
version X.Y.Z.

<a name="gettingstarted"></a>
<h3>2.4 Getting started</h3>

Valgrind uses GNU <code>automake</code> and <code>autoconf</code> for the
creation of Makefiles and configuration. But don't worry, these instructions
should be enough to get you started even if you know nothing about those
tools.<p>

In what follows, all filenames are relative to Valgrind's top-level directory
<code>valgrind/</code>.

<ol>
<li>Choose a name for the skin, and an abbreviation that can be used as a
    short prefix. We'll use <code>foobar</code> and <code>fb</code> as an
    example.
</li><p>

<li>Make a new directory <code>foobar/</code> which will hold the skin.
</li><p>

<li>Copy <code>example/Makefile.am</code> into <code>foobar/</code>.
    Edit it by replacing all occurrences of the string
    ``<code>example</code>'' with ``<code>foobar</code>'' and the one
    occurrence of the string ``<code>ex_</code>'' with ``<code>fb_</code>''.
    It might be worth trying to understand this file, at least a little; you
    might have to do more complicated things with it later on. In
    particular, the name of the <code>vgskin_foobar_so_SOURCES</code> variable
    determines the name of the skin's shared object, which determines what
    name must be passed to the <code>--skin</code> option to use the skin.
</li><p>

<li>Copy <code>example/ex_main.c</code> into
    <code>foobar/</code>, renaming it as <code>fb_main.c</code>.
    Edit it by changing the five lines in <code>SK_(pre_clo_init)()</code>
    to something appropriate for the skin. These fields are used in the
    startup message, except for <code>bug_reports_to</code> which is used
    if a skin assertion fails.
</li><p>

<li>Edit <code>Makefile.am</code>, adding the new directory
    <code>foobar</code> to the <code>SUBDIRS</code> variable.
</li><p>

<li>Edit <code>configure.in</code>, adding <code>foobar/Makefile</code> to the
    <code>AC_OUTPUT</code> list.
</li><p>

<li>Run:
<pre>
./autogen.sh
./configure --prefix=`pwd`/inst
make install</pre>

    It should automake, configure and compile without errors, putting copies
    of the skin's shared object <code>vgskin_foobar.so</code> in
    <code>foobar/</code> and
    <code>inst/lib/valgrind/</code>.
</li><p>

<li>You can test it with a command like
<pre>
inst/bin/valgrind --skin=foobar date</pre>

    (almost any program should work; <code>date</code> is just an example).
    The output should be something like this:
<pre>
==738== foobar-0.0.1, a foobarring tool for x86-linux.
==738== Copyright (C) 2002, and GNU GPL'd, by J. Random Hacker.
==738== Built with valgrind-1.1.0, a program execution monitor.
==738== Copyright (C) 2000-2002, and GNU GPL'd, by Julian Seward.
==738== Estimated CPU clock rate is 1400 MHz
==738== For more details, rerun with: -v
==738==
Wed Sep 25 10:31:54 BST 2002
==738==</pre>

    The skin does nothing except run the program uninstrumented.
</li><p>
</ol>

These steps don't have to be followed exactly; you can choose different names
for your source files, and use a different <code>--prefix</code> for
<code>./configure</code>.<p>

Now that we've set up, built and tested the simplest possible skin, on to the
interesting stuff...

<a name="writingcode"></a>
<h3>2.5 Writing the code</h3>

A skin must define at least these four functions:
<pre>
    SK_(pre_clo_init)()
    SK_(post_clo_init)()
    SK_(instrument)()
    SK_(fini)()
</pre>

Also, it must use the macro <code>VG_DETERMINE_INTERFACE_VERSION</code>
exactly once in its source code. If it doesn't, you will get a link error
involving <code>VG_(skin_interface_major_version)</code>. This macro is
used to ensure that the core and a plugged-in skin were built against
binary-compatible versions of the core/skin interface.<p>

In addition, if a skin wants to use some of the optional services provided by
the core, it may have to define other functions.

<a name="init"></a>
<h3>2.6 Initialisation</h3>

Most of the initialisation should be done in <code>SK_(pre_clo_init)()</code>.
Only use <code>SK_(post_clo_init)()</code> if a skin provides command line
options and must do some initialisation after option processing takes place
(``<code>clo</code>'' stands for ``command line options'').<p>

The first argument to <code>SK_(pre_clo_init)()</code> must be initialised with
various ``details'' for a skin. These are all compulsory except for
<code>version</code>. They are used when constructing the startup message,
except for <code>bug_reports_to</code> which is used if
<code>VG_(skin_panic)()</code> is ever called, or a skin assertion fails.<p>

The second argument to <code>SK_(pre_clo_init)()</code> must be initialised with
the ``needs'' for a skin. They are mostly booleans, and can be left untouched
(they default to <code>False</code>). They determine whether a skin can do
various things such as: record, report and suppress errors; process command
line options; wrap system calls; record extra information about malloc'd
blocks, etc.<p>

For example, if a skin wants the core's help in recording and reporting errors,
it must set the <code>skin_errors</code> need to <code>True</code>, and then
provide definitions of six functions for comparing errors, printing out errors,
reading suppressions from a suppressions file, etc. While writing these
functions requires some work, it's much less than doing error handling from
scratch because the core is doing most of the work. See the type
<code>VgNeeds</code> in <code>include/vg_skin.h</code> for full details of all
the needs.<p>
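To make the division of labour concrete, here is a minimal, self-contained C
sketch of how a core might use a skin-supplied comparison function to merge
duplicate errors. It is a hypothetical model, not Valgrind's real interface:
the <code>Error</code> and <code>ErrorTable</code> types, <code>record_error()</code>
and <code>my_eq_error()</code> are all made up for illustration. The real six
functions and their signatures are described in <code>include/vg_skin.h</code>.<p>

```c
#include <assert.h>

/* Hypothetical error record; NOT Valgrind's real error type. */
typedef struct {
   int           kind;   /* skin-defined error kind */
   unsigned long addr;   /* code address where the error occurred */
   int           count;  /* how many times this error has been seen */
} Error;

/* Skin-supplied predicate: are these two errors "the same"? */
typedef int (*EqErrorFn)(const Error* e1, const Error* e2);

#define MAX_ERRORS 100

/* Core-side table of the distinct errors seen so far. */
typedef struct {
   Error     errors[MAX_ERRORS];
   int       n_errors;
   EqErrorFn eq;         /* provided by the skin */
} ErrorTable;

/* Core side: record an error, merging it into an existing entry if the
   skin's predicate says they match. Returns 1 if the error is new. */
static int record_error(ErrorTable* t, int kind, unsigned long addr)
{
   Error e = { kind, addr, 1 };
   for (int i = 0; i < t->n_errors; i++) {
      if (t->eq(&t->errors[i], &e)) {
         t->errors[i].count++;
         return 0;
      }
   }
   assert(t->n_errors < MAX_ERRORS);
   t->errors[t->n_errors++] = e;
   return 1;
}

/* Skin side: two errors match if both kind and address match. */
static int my_eq_error(const Error* e1, const Error* e2)
{
   return e1->kind == e2->kind && e1->addr == e2->addr;
}
```

In this sketch the skin only decides when two errors are ``the same''. Storing
and counting them is left to the core, which in the real core also handles
reporting and suppression.<p>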

The third argument to <code>SK_(pre_clo_init)()</code> must be initialised to
indicate which events in the core the skin wants to be notified about. These
include things such as blocks of memory being malloc'd, the stack pointer
changing, a mutex being locked, etc. If a skin wants to know about any of
these, it should set the relevant pointer in the structure to point to a
function, which will be called when that event happens.<p>

For example, if the skin wants to be notified when a new block of memory is
malloc'd, it should set the <code>new_mem_heap</code> function pointer, and the
assigned function will be called each time this happens. See the type
<code>VgTrackEvents</code> in <code>include/vg_skin.h</code> for full details
of all the trackable events.<p>
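The needs and event-tracking pattern described above can be modelled in a few
lines of standalone C. This is a hypothetical sketch, not the real
<code>VgNeeds</code>/<code>VgTrackEvents</code> definitions. The cut-down
<code>Needs</code> and <code>TrackEvents</code> types, the
<code>die_mem_heap</code> field and all function names are assumptions. The
sketch shows three things: the core defaults every need to <code>False</code>
and every event pointer to <code>NULL</code>; the skin sets only
<code>new_mem_heap</code>; and the core calls that function only if it was set.<p>

```c
#include <stddef.h>

typedef int           Bool;
typedef unsigned long Addr;
#define True  1
#define False 0

/* Hypothetical, heavily cut-down versions of the needs and
   track-events structures passed to SK_(pre_clo_init)(). */
typedef struct {
   Bool skin_errors;            /* wants the core's help with errors */
   Bool command_line_options;   /* wants to process its own options */
} Needs;

typedef struct {
   void (*new_mem_heap)(Addr a, size_t len);  /* block malloc'd */
   void (*die_mem_heap)(Addr a, size_t len);  /* block freed (assumed name) */
} TrackEvents;

/* ---- skin side ---- */
static size_t bytes_allocated = 0;

static void my_new_mem_heap(Addr a, size_t len)
{
   (void)a;                     /* this skin only cares about sizes */
   bytes_allocated += len;
}

static void skin_pre_clo_init(Needs* needs, TrackEvents* track)
{
   (void)needs;                 /* leave every need at its default, False */
   track->new_mem_heap = my_new_mem_heap;
}

/* ---- core side ---- */
static void core_init_skin(Needs* needs, TrackEvents* track)
{
   /* Defaults: no needs, no events tracked. */
   needs->skin_errors          = False;
   needs->command_line_options = False;
   track->new_mem_heap         = NULL;
   track->die_mem_heap         = NULL;
   skin_pre_clo_init(needs, track);
}

/* Called by the core after it has allocated a block for the client. */
static void core_after_malloc(const TrackEvents* track, Addr a, size_t len)
{
   if (track->new_mem_heap != NULL)
      track->new_mem_heap(a, len);
}
```

Because the skin never sets <code>die_mem_heap</code>, frees would simply go
unreported to it in this model.<p>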

<a name="instr"></a>
<h3>2.7 Instrumentation</h3>

<code>SK_(instrument)()</code> is the interesting one. It allows you to
instrument <i>UCode</i>, which is Valgrind's RISC-like intermediate language.
UCode is described in the <a href="techdocs.html">technical docs</a>.<p>

The easiest way to instrument UCode is to insert calls to C functions when
interesting things happen. See the skin ``lackey''
(<code>lackey/lk_main.c</code>) for a simple example of this, or
Cachegrind (<code>cachegrind/cg_main.c</code>) for a more complex
example.<p>

A much more complicated way to instrument UCode, albeit one that might result
in faster instrumented programs, is to extend UCode with new UCode
instructions. This is recommended for advanced Valgrind hackers only! See the
``memcheck'' skin for an example.
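The ``insert calls to C functions'' approach can be illustrated without any of
Valgrind's machinery. The following standalone C sketch uses a made-up
four-opcode IR in place of UCode. The <code>Opcode</code>, <code>Instr</code>
and <code>Block</code> types and every function here are hypothetical. The
instrumentation pass copies a block and inserts a call to a counting helper
before each jump, roughly in the spirit of lackey's branch counting.
<code>fini()</code> plays the role of <code>SK_(fini)()</code>. Because this is
an ordinary program it uses <code>printf()</code>; a real skin would use the
C-library subset provided through <code>vg_skin.h</code> instead.<p>

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical mini-IR standing in for UCode. */
typedef enum { OP_MOV, OP_ADD, OP_JMP, OP_CCALL } Opcode;

typedef struct {
   Opcode op;
   void (*helper)(void);   /* only used by OP_CCALL */
} Instr;

#define MAX_INSTRS 64

typedef struct {
   Instr instrs[MAX_INSTRS];
   int   n;
} Block;

static long n_jumps = 0;

/* Helper called at run time by the instrumented code. */
static void count_jump(void)
{
   n_jumps++;
}

static void emit(Block* b, Instr i)
{
   assert(b->n < MAX_INSTRS);
   b->instrs[b->n++] = i;
}

/* Instrumentation pass: copy `in` to `out`, inserting a call to
   count_jump() immediately before every jump. */
static void instrument(const Block* in, Block* out)
{
   out->n = 0;
   for (int k = 0; k < in->n; k++) {
      if (in->instrs[k].op == OP_JMP)
         emit(out, (Instr){ OP_CCALL, count_jump });
      emit(out, in->instrs[k]);
   }
}

/* Toy "execution": only the inserted helper calls have any effect here. */
static void run(const Block* b)
{
   for (int k = 0; k < b->n; k++)
      if (b->instrs[k].op == OP_CCALL)
         b->instrs[k].helper();
}

/* Finalisation: report what was collected. */
static void fini(void)
{
   printf("jumps executed: %ld\n", n_jumps);
}
```

Instrumentation rewrites each block once, and the inserted calls then run every
time the block executes. So the counter measures jumps executed, not the number
of jump instructions in the code.<p>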

<a name="fini"></a>
<h3>2.8 Finalisation</h3>

This is where you can present the final results, such as a summary of the
information collected. Any log files should be written out at this point.

<a name="otherimportantinfo"></a>
<h3>2.9 Other important information</h3>

Please note that the core/skin split infrastructure is all very new, and not
very well documented. Here are some important points, but there are
undoubtedly many others that I should note but haven't thought of.<p>

The file <code>include/vg_skin.h</code> contains all the types,
macros, functions, etc. that a skin should (hopefully) need, and is the only
<code>.h</code> file a skin should need to <code>#include</code>.<p>

In particular, you probably shouldn't use anything from the C library (there
are deep reasons for this, trust us). Valgrind provides an implementation of a
reasonable subset of the C library, details of which are in
<code>vg_skin.h</code>.<p>

Similarly, when writing a skin, you shouldn't need to look at any of the code
in Valgrind's core, although it can sometimes help you understand
something.<p>

<code>vg_skin.h</code> has a reasonable amount of documentation in it that
should hopefully be enough to get you going. But ultimately, the skins
distributed (memcheck, addrcheck, cachegrind, lackey, etc.) are probably the
best documentation of all, for the moment.<p>

Note that the <code>VG_</code> and <code>SK_</code> macros are used heavily.
These just prepend longer strings in front of names to avoid potential
namespace clashes. We strongly recommend using the <code>SK_</code> macro
for any global functions and variables in your skin.<p>
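The prefixing trick is ordinary C preprocessor token pasting. The sketch below
defines two look-alike macros, <code>MY_VG_</code> and <code>MY_SK_</code>,
with the made-up prefixes <code>vgCore_</code> and <code>vgSkin_</code>. The
real prefixes used by <code>VG_</code> and <code>SK_</code> are defined in
<code>include/vg_skin.h</code> and may differ. Because the two prefixes differ,
a core variable and a skin variable can share the base name
<code>n_calls</code> without clashing.<p>

```c
/* Hypothetical stand-ins for the VG_ and SK_ macros; the real prefixes
   live in include/vg_skin.h and may differ from these. */
#define PASTE(a, b)    a##b
#define MY_VG_(name)   PASTE(vgCore_, name)
#define MY_SK_(name)   PASTE(vgSkin_, name)

/* A "core" global and a "skin" global with the same base name.
   After expansion they are vgCore_n_calls and vgSkin_n_calls. */
int MY_VG_(n_calls) = 100;
int MY_SK_(n_calls) = 0;

/* A skin-global function; its real symbol name is vgSkin_bump. */
int MY_SK_(bump)(void)
{
   return ++MY_SK_(n_calls);
}
```

Code can refer to the function either through the macro,
<code>MY_SK_(bump)()</code>, or by its expanded name, <code>vgSkin_bump()</code>.
Using the macro keeps the prefix in one place.<p>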

<a name="wordsofadvice"></a>
<h3>2.10 Words of Advice</h3>

Writing and debugging skins is not trivial. Here are some suggestions for
solving common problems.<p>

If you are getting segmentation faults in C functions used by your skin, the
usual GDB command:
<blockquote><code>gdb <i>prog</i> core</code></blockquote>
should give the location of the segmentation fault.<p>

If you want to debug C functions used by your skin, you can attach GDB to
Valgrind with some effort:
<ul>
<li>Enable the following code in <code>coregrind/vg_main.c</code> by
    changing <code>if (0)</code> into <code>if (1)</code>:
<pre>
   /* Hook to delay things long enough so we can get the pid and
      attach GDB in another shell. */
   if (0) {
      Int p, q;
      for (p = 0; p &lt; 50000; p++)
         for (q = 0; q &lt; 50000; q++) ;
   }
</pre>
    and rebuild Valgrind.
</li><p>

<li>Then run:
<blockquote><code>valgrind <i>prog</i></code></blockquote>

    Valgrind starts the program, printing its process id, and then delays for
    a few seconds (you may have to change the loop bounds to get a suitable
    delay).</li><p>

<li>In a second shell run:

<blockquote><code>gdb <i>prog</i> <i>pid</i></code></blockquote></li><p>
</ul>

GDB may be able to give you useful information. Note that by default
most of the system is built with <code>-fomit-frame-pointer</code>,
and you'll need to get rid of this to extract useful tracebacks from
GDB.<p>

If you just want to know whether a program point has been reached, using the
<code>OINK</code> macro (in <code>include/vg_skin.h</code>) can be easier than
using GDB.<p>

If you are having problems with your UCode instrumentation, it's likely that
GDB won't be able to help at all. In this case, Valgrind's
<code>--trace-codegen</code> option is invaluable for observing the results of
instrumentation.<p>

The other debugging command line options can be useful too (run <code>valgrind
-h</code> for the list).<p>

<a name="advancedtopics"></a>
<h2>3 Advanced Topics</h2>

Once a skin becomes more complicated, there are some extra things you may
want or need to do.

<a name="suppressions"></a>
<h3>3.1 Suppressions</h3>

If your skin reports errors and you want to suppress some common ones, you can
add suppressions to the suppression files. The relevant files are
<code>valgrind/*.supp</code>. The final suppression file is built by combining
whichever of these <code>.supp</code> files are relevant to the versions of
Linux, X and glibc on a system.

<a name="documentation"></a>
<h3>3.2 Documentation</h3>

If you are feeling conscientious and want to write some HTML documentation for
your skin, follow these steps (using <code>foobar</code> as the example skin
name again):

<ol>
<li>Make a directory <code>foobar/docs/</code>.
</li><p>

<li>Edit <code>foobar/Makefile.am</code>, adding <code>docs</code> to
    the <code>SUBDIRS</code> variable.
</li><p>

<li>Edit <code>configure.in</code>, adding
    <code>foobar/docs/Makefile</code> to the <code>AC_OUTPUT</code> list.
</li><p>

<li>Write <code>foobar/docs/Makefile.am</code>. Use
    <code>memcheck/docs/Makefile.am</code> as an example.
</li><p>

<li>Write the documentation; the top-level file should be called
    <code>foobar/docs/index.html</code>.
</li><p>

<li>(optional) Add a link in the main documentation index
    <code>docs/index.html</code> to
    <code>foobar/docs/index.html</code>.
</li><p>
</ol>

<a name="regressiontests"></a>
<h3>3.3 Regression tests</h3>

Valgrind has some support for regression tests. If you want to write
regression tests for your skin:

<ol>
<li>Make a directory <code>foobar/tests/</code>.
</li><p>

<li>Edit <code>foobar/Makefile.am</code>, adding <code>tests</code> to
    the <code>SUBDIRS</code> variable.
</li><p>

<li>Edit <code>configure.in</code>, adding
    <code>foobar/tests/Makefile</code> to the <code>AC_OUTPUT</code> list.
</li><p>

<li>Write <code>foobar/tests/Makefile.am</code>. Use
    <code>memcheck/tests/Makefile.am</code> as an example.
</li><p>

<li>Write the test programs, their <code>.vgtest</code> test description
    files, and their <code>.stdout.exp</code> and <code>.stderr.exp</code>
    expected output files. (Note that Valgrind's output goes to stderr.) Some
    details on writing and running tests are given in the comments at the top
    of the testing script <code>tests/vg_regtest</code>.
</li><p>

<li>Write a filter for stderr results, <code>foobar/tests/filter_stderr</code>.
    It can call the existing filters in <code>tests/</code>. See
    <code>memcheck/tests/filter_stderr</code> for an example; in particular
    note the <code>$dir</code> trick that ensures the filter works correctly
    from any directory.
</li><p>
</ol>

<a name="profiling"></a>
<h3>3.4 Profiling</h3>

To do simple tick-based profiling of a skin, include the line
<blockquote><code>
#include "vg_profile.c"
</code></blockquote>
in the skin somewhere, and rebuild (you may have to <code>make clean</code>
first). Then run Valgrind with the <code>--profile=yes</code> option.<p>

The profiler is stack-based; you can register a profiling event with
<code>VGP_(register_profile_event)()</code> and then use the
<code>VGP_PUSHCC</code> and <code>VGP_POPCC</code> macros to record time spent
doing certain things. New profiling event numbers must not overlap with the
core profiling event numbers. See <code>include/vg_skin.h</code> for details
and the ``memcheck'' skin for an example.
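To make ``stack-based'' concrete, the following standalone C sketch models how
such a profiler can attribute timer ticks. Cost centres are pushed and popped
around pieces of work, and each tick is charged to whatever is on top of the
stack. Everything here is hypothetical: <code>push_cc()</code>,
<code>pop_cc()</code>, <code>on_tick()</code> and the event numbers stand in
for <code>VGP_PUSHCC</code>, <code>VGP_POPCC</code> and registered events, and
Valgrind's own profiler may work differently. The skin's event numbers start at
50 to illustrate keeping them clear of the core's; check the core's actual
range in <code>include/vg_skin.h</code>.<p>

```c
#include <assert.h>

/* Hypothetical event numbers: 0 means "no event active". The skin's
   numbers start at 50 on the assumption that the core uses lower ones. */
enum { EV_NONE = 0, EV_INSTRUMENT = 50, EV_HELPER = 51, N_EVENTS = 64 };

#define CC_STACK_MAX 32

static int  cc_stack[CC_STACK_MAX];  /* stack of active cost centres */
static int  cc_sp = 0;               /* number of entries on the stack */
static long ticks[N_EVENTS];         /* ticks charged to each event */

/* Start charging time to event `ev`, nested inside any current one. */
static void push_cc(int ev)
{
   assert(cc_sp < CC_STACK_MAX);
   cc_stack[cc_sp++] = ev;
}

/* Stop charging time to `ev`; pushes and pops must nest properly. */
static void pop_cc(int ev)
{
   assert(cc_sp > 0 && cc_stack[cc_sp - 1] == ev);
   cc_sp--;
}

/* Called on every timer tick: charge it to the innermost active event. */
static void on_tick(void)
{
   int ev = (cc_sp > 0) ? cc_stack[cc_sp - 1] : EV_NONE;
   ticks[ev]++;
}
```

In this sketch a tick taken while <code>EV_HELPER</code> is active is charged
only to <code>EV_HELPER</code>, not to the enclosing <code>EV_INSTRUMENT</code>.
Each event therefore accumulates only the time spent directly in it.<p>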

<a name="othermakefilehackery"></a>
<h3>3.5 Other makefile hackery</h3>

If you add any directories under <code>valgrind/foobar/</code>, you will
need to add an appropriate <code>Makefile.am</code> to it, and add a
corresponding entry to the <code>AC_OUTPUT</code> list in
<code>valgrind/configure.in</code>.<p>

If you add any scripts to your skin (see Cachegrind for an example) you need to
add them to the <code>bin_SCRIPTS</code> variable in
<code>valgrind/foobar/Makefile.am</code>.<p>

<a name="interfaceversions"></a>
<h3>3.6 Core/skin interface versions</h3>

In order to allow for the core/skin interface to evolve over time, Valgrind
uses a basic interface versioning system. All a skin has to do is use the
<code>VG_DETERMINE_INTERFACE_VERSION</code> macro exactly once in its code.
If it doesn't, a link error will occur when the skin is built.
<p>
The interface version number has the form X.Y. Changes in Y indicate binary
compatible changes. Changes in X indicate binary incompatible changes. If
the core and skin have the same major version number X they should work
together. If X doesn't match, Valgrind will abort execution with an
explanation of the problem.
<p>
This approach was chosen so that if the interface changes in the future,
old skins won't work and the reason will be clearly explained, instead of
possibly crashing mysteriously. We have attempted to minimise the potential
for binary incompatible changes by means such as minimising the use of naked
structs in the interface.
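The compatibility rule just described amounts to comparing major numbers. The
standalone C sketch below encodes that rule. The <code>IfaceVersion</code> type,
both function names and the wording of the message are hypothetical. The real
check, and the way the version numbers reach the core via
<code>VG_DETERMINE_INTERFACE_VERSION</code>, are defined by Valgrind itself.<p>

```c
#include <stdio.h>

/* Hypothetical representation of an interface version X.Y. */
typedef struct {
   int major;   /* X: changes here are binary incompatible */
   int minor;   /* Y: changes here are binary compatible */
} IfaceVersion;

/* Per the rule above, only the major numbers have to agree. */
static int versions_compatible(IfaceVersion core, IfaceVersion skin)
{
   return core.major == skin.major;
}

/* Core-side check: explain a mismatch instead of crashing mysteriously
   later. Returns 1 if the skin can be used, 0 if the core should abort. */
static int check_skin_version(IfaceVersion core, IfaceVersion skin)
{
   if (!versions_compatible(core, skin)) {
      fprintf(stderr,
              "skin was built for interface %d.%d, but the core provides "
              "%d.%d;\nthe major versions differ, so they are not "
              "binary compatible.\n",
              skin.major, skin.minor, core.major, core.minor);
      return 0;
   }
   return 1;
}
```

With a core at version 1.2, a skin built for 1.0 passes this check. A skin
built for 2.0 is rejected, and the message explains why.<p>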

<a name="finalwords"></a>
<h2>4 Final Words</h2>

This whole core/skin business is very new and experimental, and under active
development.<p>

The first consequence of this is that the core/skin interface is quite
immature. It will almost certainly change in the future; we have no intention
of freezing it and then regretting the inevitable stupidities. Hopefully most
of the future changes will be to add new features, hooks, functions, etc.,
rather than to change old ones, which should cause a minimum of trouble for
existing skins. We've also put some effort into future-proofing the interface
to avoid binary incompatibility, but we can't guarantee anything. The
versioning system should catch any incompatibilities. Just something to be
aware of.<p>

The second consequence of this is that we'd love to hear your feedback about
it:

<ul>
<li>If you love it or hate it</li><p>
<li>If you find bugs</li><p>
<li>If you write a skin</li><p>
<li>If you have suggestions for new features, needs, trackable events,
    functions</li><p>
<li>If you have suggestions for making skins easier to write
    </li><p>
<li>If you have suggestions for improving this documentation </li><p>
<li>If you don't understand something</li><p>
</ul>

or anything else!<p>

Happy programming.
@ -1,44 +0,0 @@
<html>
<head>
<title>Valgrind</title>
<base target="main">
<style type="text/css">
  body { background-color: #ffffff;
         color: #000000;
         font-family: Times, Helvetica, Arial;
         font-size: 14pt}
  h4   { margin-bottom: 0.3em}
  code { color: #000000;
         font-family: Courier;
         font-size: 13pt }
  pre  { color: #000000;
         font-family: Courier;
         font-size: 13pt }
  a:link    { color: #0000C0;
              text-decoration: none; }
  a:visited { color: #0000C0;
              text-decoration: none; }
  a:active  { color: #0000C0;
              text-decoration: none; }
</style>
</head>

<body>
<h2>Documentation Contents</h2>
<h3>Valgrind's core</h3>
<a href="../coregrind/docs/index.html"><b>Core</b></a><br>

<h3>Distributed skins</h3>
<a href="../memcheck/docs/index.html"><b>MemCheck</b></a><br>
<a href="../addrcheck/docs/index.html"><b>AddrCheck</b></a><br>
<a href="../cachegrind/docs/index.html"><b>Cachegrind</b></a><br>
<a href="../none/docs/index.html"><b>Nulgrind</b></a><br>
<a href="../lackey/docs/index.html"><b>Lackey</b></a><br>
<a href="../corecheck/docs/index.html"><b>CoreCheck</b></a><br>
<a href="../helgrind/docs/index.html"><b>Helgrind</b></a><br>

<h3>About skins</h3>
<a href="../coregrind/docs/skins.html"><b>How to write a skin</b></a><br>
</body>
</html>
|
||||
@ -1,80 +0,0 @@
<html>
<head>
<style type="text/css">
body { background-color: #ffffff;
color: #000000;
font-family: Times, Helvetica, Arial;
font-size: 14pt}
h4 { margin-bottom: 0.3em}
code { color: #000000;
font-family: Courier;
font-size: 13pt }
pre { color: #000000;
font-family: Courier;
font-size: 13pt }
a:link { color: #0000C0;
text-decoration: none; }
a:visited { color: #0000C0;
text-decoration: none; }
a:active { color: #0000C0;
text-decoration: none; }
</style>
<title>Helgrind</title>
</head>

<body bgcolor="#ffffff">

<a name="title"></a>
<h1 align=center>Helgrind</h1>
<center>This manual was last updated on 2002-10-03</center>
<p>

<center>
<a href="mailto:njn25@cam.ac.uk">njn25@cam.ac.uk</a><br>
Copyright © 2000-2002 Nicholas Nethercote
<p>
Helgrind is licensed under the GNU General Public License,
version 2<br>
Helgrind is a Valgrind skin for detecting data races in threaded programs.
</center>

<p>

<h2>1 Helgrind</h2>

Helgrind is a Valgrind skin for detecting data races in C and C++ programs
that use the Pthreads library.
<p>
It uses the Eraser algorithm described in
<blockquote>
Eraser: A Dynamic Data Race Detector for Multithreaded Programs<br>
Stefan Savage, Michael Burrows, Greg Nelson, Patrick Sobalvarro and
Thomas Anderson<br>
ACM Transactions on Computer Systems, 15(4):391-411<br>
November 1997.
</blockquote>

Unfortunately, it is in a rather mangy state and probably doesn't work at all.
We include it partly because it may serve as a useful example skin, and partly
in case anybody is inspired to improve it and get it working.
<p>
If you are inspired, we'd love to hear from you. And if you are successful,
you might like to include some improvements to the basic Eraser algorithm
described in Section 4.2 of

<blockquote>
Runtime Checking of Multithreaded Applications with Visual Threads<br>
Jerry J. Harrow, Jr.<br>
Proceedings of the 7th International SPIN Workshop on Model Checking of
Software<br>
Stanford, California, USA<br>
August 2000<br>
LNCS 1885, pp. 331-342<br>
K. Havelund, J. Penix, and W. Visser, editors.<br>
</blockquote>

<hr width="100%">
</body>
</html>
@ -1,68 +0,0 @@
<html>
<head>
<style type="text/css">
body { background-color: #ffffff;
color: #000000;
font-family: Times, Helvetica, Arial;
font-size: 14pt}
h4 { margin-bottom: 0.3em}
code { color: #000000;
font-family: Courier;
font-size: 13pt }
pre { color: #000000;
font-family: Courier;
font-size: 13pt }
a:link { color: #0000C0;
text-decoration: none; }
a:visited { color: #0000C0;
text-decoration: none; }
a:active { color: #0000C0;
text-decoration: none; }
</style>
<title>Lackey</title>
</head>

<body bgcolor="#ffffff">

<a name="title"></a>
<h1 align=center>Lackey</h1>
<center>This manual was last updated on 2002-10-03</center>
<p>

<center>
<a href="mailto:njn25@cam.ac.uk">njn25@cam.ac.uk</a><br>
Copyright © 2000-2002 Nicholas Nethercote
<p>
Lackey is licensed under the GNU General Public License,
version 2<br>
Lackey is an example Valgrind skin that does some very basic program
measurement.
</center>

<p>

<h2>1 Lackey</h2>

Lackey is a simple Valgrind skin that does some basic program measurement.
It adds quite a lot of simple instrumentation to the program's code. It is
primarily intended as an example skin.
<p>
It measures three things:

<ol>
<li>The number of calls to <code>_dl_runtime_resolve()</code>, the function
in glibc's dynamic linker that resolves function lookups into shared
objects.<p>

<li>The number of UCode instructions (UCode is Valgrind's RISC-like
intermediate language), x86 instructions, and basic blocks executed by the
program, and some ratios between the three counts.<p>

<li>The number of conditional branches encountered and the proportion of those
taken.<p>
</ol>
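<p>As a rough illustration of how such measurements fit together, the hypothetical C below (not Lackey's source) keeps one counter per quantity in the list above; instrumented code would bump the counters as the program runs, and the ratios and the taken proportion are computed once at exit. The hook names, the struct, and the particular ratios printed (UCode instructions per x86 instruction, x86 instructions per basic block) are assumptions made for this sketch.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical counters of the kind described above; not Lackey's code. */
typedef struct {
    unsigned long long dlrr_calls;    /* calls to _dl_runtime_resolve() */
    unsigned long long bbs;           /* basic blocks executed */
    unsigned long long x86_instrs;    /* x86 instructions executed */
    unsigned long long ucode_instrs;  /* UCode instructions executed */
    unsigned long long cond_branches; /* conditional branches encountered */
    unsigned long long cond_taken;    /* ... and how many of them were taken */
} counts_t;

/* Hypothetical hooks that instrumented code would call. */
void on_dl_runtime_resolve(counts_t *c) { c->dlrr_calls++; }
void on_bb_entry(counts_t *c)           { c->bbs++; }

/* One executed x86 instruction that was translated into n_ucode UCode
   instructions. */
void on_x86_instr(counts_t *c, unsigned n_ucode)
{
    c->x86_instrs++;
    c->ucode_instrs += n_ucode;
}

void on_cond_branch(counts_t *c, int taken)
{
    c->cond_branches++;
    if (taken)
        c->cond_taken++;
}

/* num / den, or 0.0 when den is zero. */
double ratio(unsigned long long num, unsigned long long den)
{
    return den == 0 ? 0.0 : (double)num / (double)den;
}

/* Summary printed at exit; the choice of ratios is an assumption. */
void print_summary(const counts_t *c)
{
    assert(c != NULL);
    printf("_dl_runtime_resolve calls:  %llu\n", c->dlrr_calls);
    printf("UCode instrs per x86 instr: %.2f\n",
           ratio(c->ucode_instrs, c->x86_instrs));
    printf("x86 instrs per basic block: %.2f\n",
           ratio(c->x86_instrs, c->bbs));
    printf("conditional branches taken: %.1f%%\n",
           100.0 * ratio(c->cond_taken, c->cond_branches));
}
```

Deferring the divisions to exit keeps each hook down to one or two increments, which matters because hooks like these would run on every instruction and branch the program executes.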

<hr width="100%">
</body>
</html>
@ -1,26 +0,0 @@
<!doctype html public "-//w3c//dtd html 4.0 transitional//en">
<html>

<head>
<meta http-equiv="Content-Type"
content="text/html; charset=iso-8859-1">
<meta http-equiv="Content-Language" content="en-gb">
<meta name="generator"
content="Mozilla/4.76 (X11; U; Linux 2.4.1-0.1.9 i586) [Netscape]">
<meta name="author" content="Julian Seward <jseward@acm.org>">
<meta name="description" content="say what this prog does">
<meta name="keywords" content="Valgrind, memory checker, x86, GPL">
<title>Valgrind's user manual</title>
</head>

<frameset cols="150,*">
<frame name="nav" target="main" src="nav.html">
<frame name="main" src="manual.html" scrolling="auto">
<noframes>
<body>
<p>This page uses frames, but your browser doesn't support them.</p>
</body>
</noframes>
</frameset>

</html>
File diff suppressed because it is too large
@ -1,72 +0,0 @@
<html>
<head>
<title>Valgrind</title>
<base target="main">
<style type="text/css">
body { background-color: #ffffff;
color: #000000;
font-family: Times, Helvetica, Arial;
font-size: 14pt}
h4 { margin-bottom: 0.3em}
code { color: #000000;
font-family: Courier;
font-size: 13pt }
pre { color: #000000;
font-family: Courier;
font-size: 13pt }
a:link { color: #0000C0;
text-decoration: none; }
a:visited { color: #0000C0;
text-decoration: none; }
a:active { color: #0000C0;
text-decoration: none; }
</style>
</head>

<body>
<br>
<a href="manual.html#contents"><b>Contents of this manual</b></a><br>
<a href="manual.html#intro">1 Introduction</a><br>
<a href="manual.html#whatfor">1.1 What Valgrind is for</a><br>
<a href="manual.html#whatdoes">1.2 What it does with
your program</a>
<p>
<a href="manual.html#howtouse">2 <b>How to use it, and how to
make sense of the results</b></a><br>
<a href="manual.html#starta">2.1 Getting started</a><br>
<a href="manual.html#comment">2.2 The commentary</a><br>
<a href="manual.html#report">2.3 Reporting of errors</a><br>
<a href="manual.html#suppress">2.4 Suppressing errors</a><br>
<a href="manual.html#flags">2.5 Command-line flags</a><br>
<a href="manual.html#errormsgs">2.6 Explanation of error messages</a><br>
<a href="manual.html#suppfiles">2.7 Writing suppressions files</a><br>
<a href="manual.html#clientreq">2.8 The Client Request mechanism</a><br>
<a href="manual.html#pthreads">2.9 Support for POSIX pthreads</a><br>
<a href="manual.html#install">2.10 Building and installing</a><br>
<a href="manual.html#problems">2.11 If you have problems</a>
<p>
<a href="manual.html#machine">3 <b>Details of the checking machinery</b></a><br>
<a href="manual.html#vvalue">3.1 Valid-value (V) bits</a><br>
<a href="manual.html#vaddress">3.2 Valid-address (A) bits</a><br>
<a href="manual.html#together">3.3 Putting it all together</a><br>
<a href="manual.html#signals">3.4 Signals</a><br>
<a href="manual.html#leaks">3.5 Memory leak detection</a>
<p>
<a href="manual.html#limits">4 <b>Limitations</b></a><br>
<p>
<a href="manual.html#howitworks">5 <b>How it works -- a rough overview</b></a><br>
<a href="manual.html#startb">5.1 Getting started</a><br>
<a href="manual.html#engine">5.2 The translation/instrumentation engine</a><br>
<a href="manual.html#track">5.3 Tracking the status of memory</a><br>
<a href="manual.html#sys_calls">5.4 System calls</a><br>
<a href="manual.html#sys_signals">5.5 Signals</a>
<p>
<a href="manual.html#example">6 <b>An example</b></a><br>
<p>
<a href="manual.html#cache">7 <b>Cache profiling</b></a>
<p>
<a href="techdocs.html">8 <b>The design and implementation of Valgrind</b></a><br>

</body>
</html>
File diff suppressed because it is too large
@ -1,57 +0,0 @@
<html>
<head>
<style type="text/css">
body { background-color: #ffffff;
color: #000000;
font-family: Times, Helvetica, Arial;
font-size: 14pt}
h4 { margin-bottom: 0.3em}
code { color: #000000;
font-family: Courier;
font-size: 13pt }
pre { color: #000000;
font-family: Courier;
font-size: 13pt }
a:link { color: #0000C0;
text-decoration: none; }
a:visited { color: #0000C0;
text-decoration: none; }
a:active { color: #0000C0;
text-decoration: none; }
</style>
<title>Nulgrind</title>
</head>

<body bgcolor="#ffffff">

<a name="title"></a>
<h1 align=center>Nulgrind</h1>
<center>This manual was last updated on 2002-10-02</center>
<p>

<center>
<a href="mailto:njn25@cam.ac.uk">njn25@cam.ac.uk</a><br>
Copyright © 2000-2002 Nicholas Nethercote
<p>
Nulgrind is licensed under the GNU General Public License,
version 2<br>
Nulgrind is a Valgrind skin that does not do very much at all.
</center>

<p>

<h2>1 Nulgrind</h2>

Nulgrind is the minimal skin for Valgrind. It does no initialisation or
finalisation, and adds no instrumentation to the program's code. It is mainly
of use to Valgrind's developers for debugging and regression testing.
<p>
Nonetheless you can run programs with Nulgrind. They will run roughly 5-10
times more slowly than normal, for no useful effect. Note that you need to use
the option <code>--skin=none</code> to run Nulgrind (i.e. not
<code>--skin=nulgrind</code>).

<hr width="100%">
</body>
</html>