<html>
|
|
<head>
|
|
<style type="text/css">
body { background-color: #ffffff;
       color: #000000;
       font-family: Times, Helvetica, Arial;
       font-size: 14pt}
h4 { margin-bottom: 0.3em}
code { color: #000000;
       font-family: Courier;
       font-size: 13pt }
pre { color: #000000;
      font-family: Courier;
      font-size: 13pt }
a:link { color: #0000C0;
         text-decoration: none; }
a:visited { color: #0000C0;
            text-decoration: none; }
a:active { color: #0000C0;
           text-decoration: none; }
</style>
|
|
</head>
|
|
|
|
<body bgcolor="#ffffff">
|
|
|
|
<a name="title"> </a>
|
|
<h1 align=center>Valgrind, snapshot 20020522</h1>
|
|
<center>This manual received a major update on 20020501</center>
<center>This manual received a minor update on 20020522</center>
|
|
<p>
|
|
|
|
<center>
|
|
<a href="mailto:jseward@acm.org">jseward@acm.org</a><br>
|
|
Copyright © 2000-2002 Julian Seward
|
|
<p>
|
|
Valgrind is licensed under the GNU General Public License,
|
|
version 2<br>
|
|
An open-source tool for finding memory-management problems in
|
|
Linux-x86 executables.
|
|
</center>
|
|
|
|
<p>
|
|
|
|
<hr width="100%">
|
|
<a name="contents"></a>
|
|
<h2>Contents of this manual</h2>
|
|
|
|
<h4>1 <a href="#intro">Introduction</a></h4>
|
|
1.1 <a href="#whatfor">What Valgrind is for</a><br>
|
|
1.2 <a href="#whatdoes">What it does with your program</a>
|
|
|
|
<h4>2 <a href="#howtouse">How to use it, and how to make sense
|
|
of the results</a></h4>
|
|
2.1 <a href="#starta">Getting started</a><br>
|
|
2.2 <a href="#comment">The commentary</a><br>
|
|
2.3 <a href="#report">Reporting of errors</a><br>
|
|
2.4 <a href="#suppress">Suppressing errors</a><br>
|
|
2.5 <a href="#flags">Command-line flags</a><br>
|
|
2.6 <a href="#errormsgs">Explanation of error messages</a><br>
|
|
2.7 <a href="#suppfiles">Writing suppressions files</a><br>
|
|
2.8 <a href="#clientreq">The Client Request mechanism</a><br>
|
|
2.9 <a href="#pthreads">Support for POSIX pthreads</a><br>
|
|
2.10 <a href="#install">Building and installing</a><br>
|
|
2.11 <a href="#problems">If you have problems</a><br>
|
|
|
|
<h4>3 <a href="#machine">Details of the checking machinery</a></h4>
|
|
3.1 <a href="#vvalue">Valid-value (V) bits</a><br>
|
|
3.2 <a href="#vaddress">Valid-address (A) bits</a><br>
|
|
3.3 <a href="#together">Putting it all together</a><br>
|
|
3.4 <a href="#signals">Signals</a><br>
|
|
3.5 <a href="#leaks">Memory leak detection</a><br>
|
|
|
|
<h4>4 <a href="#limits">Limitations</a></h4>
|
|
|
|
<h4>5 <a href="#howitworks">How it works -- a rough overview</a></h4>
|
|
5.1 <a href="#startb">Getting started</a><br>
|
|
5.2 <a href="#engine">The translation/instrumentation engine</a><br>
|
|
5.3 <a href="#track">Tracking the status of memory</a><br>
|
|
5.4 <a href="#sys_calls">System calls</a><br>
|
|
5.5 <a href="#sys_signals">Signals</a><br>
|
|
|
|
<h4>6 <a href="#example">An example</a></h4>
|
|
|
|
<h4>7 <a href="#cache">Cache profiling</a></h4>
|
|
|
|
<h4>8 <a href="techdocs.html">The design and implementation of Valgrind</a></h4>
|
|
|
|
<hr width="100%">
|
|
|
|
<a name="intro"></a>
|
|
<h2>1 Introduction</h2>
|
|
|
|
<a name="whatfor"></a>
|
|
<h3>1.1 What Valgrind is for</h3>
|
|
|
|
Valgrind is a tool to help you find memory-management problems in your
|
|
programs. When a program is run under Valgrind's supervision, all
|
|
reads and writes of memory are checked, and calls to
|
|
malloc/new/free/delete are intercepted. As a result, Valgrind can
|
|
detect problems such as:
|
|
<ul>
|
|
<li>Use of uninitialised memory</li>
|
|
<li>Reading/writing memory after it has been free'd</li>
|
|
<li>Reading/writing off the end of malloc'd blocks</li>
|
|
<li>Reading/writing inappropriate areas on the stack</li>
|
|
<li>Memory leaks -- where pointers to malloc'd blocks are lost
|
|
forever</li>
|
|
<li>Mismatched use of malloc/new/new [] vs free/delete/delete []</li>
|
|
</ul>
|
|
|
|
Problems like these can be difficult to find by other means, often
|
|
lying undetected for long periods, then causing occasional,
|
|
difficult-to-diagnose crashes.
|
|
|
|
<p>
|
|
Valgrind is closely tied to details of the CPU, the operating system and,
to a lesser extent, the compiler and basic C libraries. This makes it
|
|
difficult to make it portable, so I have chosen at the outset to
|
|
concentrate on what I believe to be a widely used platform: Red Hat
|
|
Linux 7.2, on x86s. Valgrind uses the standard Unix
|
|
<code>./configure</code>, <code>make</code>, <code>make install</code>
|
|
mechanism, and I have attempted to ensure that it works on machines
|
|
with kernel 2.2 or 2.4 and glibc 2.1.X or 2.2.X. This should cover
|
|
the vast majority of modern Linux installations.
|
|
|
|
|
|
<p>
|
|
Valgrind is licensed under the GNU General Public License, version
|
|
2. Read the file LICENSE in the source distribution for details.
|
|
|
|
<a name="whatdoes"></a>
|
|
<h3>1.2 What it does with your program</h3>
|
|
|
|
Valgrind is designed to be as non-intrusive as possible. It works
|
|
directly with existing executables. You don't need to recompile,
|
|
relink, or otherwise modify, the program to be checked. Simply place
|
|
the word <code>valgrind</code> at the start of the command line
|
|
normally used to run the program. So, for example, if you want to run
|
|
the command <code>ls -l</code> on Valgrind, simply issue the
|
|
command: <code>valgrind ls -l</code>.
|
|
|
|
<p>Valgrind takes control of your program before it starts. Debugging
|
|
information is read from the executable and associated libraries, so
|
|
that error messages can be phrased in terms of source code
|
|
locations. Your program is then run on a synthetic x86 CPU which
|
|
checks every memory access. All detected errors are written to a
|
|
log. When the program finishes, Valgrind searches for and reports on
|
|
leaked memory.
|
|
|
|
<p>You can run pretty much any dynamically linked ELF x86 executable
|
|
using Valgrind. Programs run 25 to 50 times slower, and take a lot
|
|
more memory, than they usually would. It works well enough to run
|
|
large programs. For example, the Konqueror web browser from the KDE
|
|
Desktop Environment, version 3.0, runs slowly but usably on Valgrind.
|
|
|
|
<p>Valgrind simulates every single instruction your program executes.
|
|
Because of this, it finds errors not only in your application but also
|
|
in all supporting dynamically-linked (<code>.so</code>-format)
|
|
libraries, including the GNU C library, the X client libraries, Qt, if
|
|
you work with KDE, and so on. That often includes libraries, for
|
|
example the GNU C library, which contain memory access violations, but
|
|
which you cannot or do not want to fix.
|
|
|
|
<p>Rather than swamping you with errors in which you are not
|
|
interested, Valgrind allows you to selectively suppress errors, by
|
|
recording them in a suppressions file which is read when Valgrind
|
|
starts up. The build mechanism attempts to select suppressions which
|
|
give reasonable behaviour for the libc and XFree86 versions detected
|
|
on your machine.
|
|
|
|
|
|
<p><a href="#example">Section 6</a> shows an example of use.
|
|
<p>
|
|
<hr width="100%">
|
|
|
|
<a name="howtouse"></a>
|
|
<h2>2 How to use it, and how to make sense of the results</h2>
|
|
|
|
<a name="starta"></a>
|
|
<h3>2.1 Getting started</h3>
|
|
|
|
First off, consider whether it might be beneficial to recompile your
|
|
application and supporting libraries with optimisation disabled and
|
|
debugging info enabled (the <code>-g</code> flag). You don't have to
|
|
do this, but doing so helps Valgrind produce more accurate and less
|
|
confusing error reports. Chances are you're set up like this already,
|
|
if you intended to debug your program with GNU gdb, or some other
|
|
debugger.
|
|
|
|
<p>
|
|
A plausible compromise is to use <code>-g -O</code>.
|
|
Optimisation levels above <code>-O</code> have been observed, on very
|
|
rare occasions, to cause gcc to generate code which fools Valgrind's
|
|
error tracking machinery into wrongly reporting uninitialised value
|
|
errors. <code>-O</code> gets you the vast majority of the benefits of
|
|
higher optimisation levels anyway, so you don't lose much there.
|
|
|
|
<p>
|
|
Note that as of 1 May 2002 Valgrind does not understand the DWARF
|
|
debugging format, which is unfortunate since the upcoming gcc-3.1 uses
|
|
it by default. Valgrind only knows about the older "stabs" format.
|
|
If you use gcc-3.1 or above, you can still ask for stabs-format debug
|
|
info by passing <code>-gstabs</code> to gcc.
|
|
|
|
<p>
|
|
Then just run your application, but place the word
|
|
<code>valgrind</code> in front of your usual command-line invocation.
|
|
Note that you should run the real (machine-code) executable here. If
|
|
your application is started by, for example, a shell or perl script,
|
|
you'll need to modify it to invoke Valgrind on the real executables.
|
|
Running such scripts directly under Valgrind will result in you
|
|
getting error reports pertaining to <code>/bin/sh</code>,
|
|
<code>/usr/bin/perl</code>, or whatever interpreter you're using.
|
|
This almost certainly isn't what you want and can be confusing.
|
|
|
|
<a name="comment"></a>
|
|
<h3>2.2 The commentary</h3>
|
|
|
|
Valgrind writes a commentary, detailing error reports and other
|
|
significant events. The commentary goes to standard error by
|
|
default. This may interfere with your program, so you can ask for it
|
|
to be directed elsewhere.
|
|
|
|
<p>All lines in the commentary are of the following form:<br>
|
|
<pre>
==12345== some-message-from-Valgrind
</pre>
|
|
<p>The <code>12345</code> is the process ID. This scheme makes it easy
|
|
to distinguish program output from Valgrind commentary, and also easy
|
|
to differentiate commentaries from different processes which have
|
|
become merged together, for whatever reason.
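<p>As an illustration of why the fixed prefix is convenient, the following C
sketch separates commentary lines from program output in a merged log. The
helper name <code>commentary_pid</code> is hypothetical and not part of
Valgrind; it assumes only the <code>==pid== message</code> layout shown above.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper, not part of Valgrind. Returns 1 if `line` starts
   with the "==<pid>== " prefix and stores the process ID in *pid;
   returns 0 for anything else (ordinary program output). */
int commentary_pid(const char *line, long *pid)
{
    assert(line != NULL && pid != NULL);

    /* The prefix opens with two '=' characters. */
    if (line[0] != '=' || line[1] != '=')
        return 0;

    /* At least one decimal digit must follow. */
    const char *digits = line + 2;
    if (!isdigit((unsigned char)digits[0]))
        return 0;

    char *end;
    long value = strtol(digits, &end, 10);

    /* The digits must be followed by "== " to complete the prefix. */
    if (end[0] != '=' || end[1] != '=' || end[2] != ' ')
        return 0;

    *pid = value;
    return 1;
}
```

<p>Called on <code>==12345== some-message-from-Valgrind</code>, the helper
returns 1 and stores 12345. Any line without the prefix returns 0, so a
log-splitting tool could route commentary per process ID and pass everything
else through as program output.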
|
|
|
|
<p>By default, Valgrind writes only essential messages to the commentary,
|
|
so as to avoid flooding you with information of secondary importance.
|
|
If you want more information about what is happening, re-run, passing
|
|
the <code>-v</code> flag to Valgrind.
|
|
|
|
|
|
<a name="report"></a>
|
|
<h3>2.3 Reporting of errors</h3>
|
|
|
|
When Valgrind detects something bad happening in the program, an error
|
|
message is written to the commentary. For example:<br>
|
|
<pre>
==25832== Invalid read of size 4
==25832==    at 0x8048724: BandMatrix::ReSize(int, int, int) (bogon.cpp:45)
==25832==    by 0x80487AF: main (bogon.cpp:66)
==25832==    by 0x40371E5E: __libc_start_main (libc-start.c:129)
==25832==    by 0x80485D1: (within /home/sewardj/newmat10/bogon)
==25832==    Address 0xBFFFF74C is not stack'd, malloc'd or free'd
</pre>
|
|
|
|
<p>This message says that the program did an illegal 4-byte read of
|
|
address 0xBFFFF74C, which, as far as it can tell, is not a valid stack
|
|
address, nor corresponds to any currently malloc'd or free'd blocks.
|
|
The read is happening at line 45 of <code>bogon.cpp</code>, called
|
|
from line 66 of the same file, etc. For errors associated with an
|
|
identified malloc'd/free'd block, for example reading free'd memory,
|
|
Valgrind reports not only the location where the error happened, but
|
|
also where the associated block was malloc'd/free'd.
|
|
|
|
<p>Valgrind remembers all error reports. When an error is detected,
|
|
it is compared against old reports, to see if it is a duplicate. If
|
|
so, the error is noted, but no further commentary is emitted. This
|
|
avoids you being swamped with bazillions of duplicate error reports.
|
|
|
|
<p>If you want to know how many times each error occurred, run with
|
|
the <code>-v</code> option. When execution finishes, all the reports
|
|
are printed out, along with, and sorted by, their occurrence counts.
|
|
This makes it easy to see which errors have occurred most frequently.
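<p>The duplicate check can be pictured with a small C sketch. This is not
Valgrind's implementation: the type and function names are made up for
illustration, the table size is arbitrary, and keying on the error kind plus
the top three code addresses is an assumption based on the description of
<code>--num-callers</code> in <a href="#flags">section 2.5</a>, which says
that only the top three function locations are compared.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define KEY_FRAMES  3    /* frames compared when looking for duplicates */
#define MAX_REPORTS 64   /* arbitrary capacity chosen for this sketch */

/* One remembered report: its kind, the top three code addresses of its
   call stack, and how many times it has occurred. */
struct report {
    const char *kind;
    unsigned long frames[KEY_FRAMES];
    unsigned count;
};

struct report_table {
    struct report reports[MAX_REPORTS];
    size_t used;
};

/* Records one detected error. Returns 1 if it is new (it would be printed),
   0 if it duplicates an earlier report (only the count is bumped), and -1
   if the sketch's table is full. */
int record_error(struct report_table *t, const char *kind,
                 const unsigned long frames[KEY_FRAMES])
{
    assert(t != NULL && kind != NULL && frames != NULL);

    for (size_t i = 0; i < t->used; i++) {
        struct report *r = &t->reports[i];
        if (strcmp(r->kind, kind) == 0 &&
            memcmp(r->frames, frames, sizeof r->frames) == 0) {
            r->count++;
            return 0;
        }
    }
    if (t->used == MAX_REPORTS)
        return -1;

    struct report *r = &t->reports[t->used++];
    r->kind = kind;
    memcpy(r->frames, frames, sizeof r->frames);
    r->count = 1;
    return 1;
}
```

<p>The per-report <code>count</code> field is what a <code>-v</code> style
summary would sort on when execution finishes. The linear scan also hints at
why duplicate detection becomes costly when a program produces very many
distinct errors.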
|
|
|
|
<p>Errors are reported before the associated operation actually
|
|
happens. For example, if your program decides to read from address
|
|
zero, Valgrind will emit a message to this effect, and the program
|
|
will then duly die with a segmentation fault.
|
|
|
|
<p>In general, you should try and fix errors in the order that they
|
|
are reported. Not doing so can be confusing. For example, a program
|
|
which copies uninitialised values to several memory locations, and
|
|
later uses them, will generate several error messages. The first such
|
|
error message may well give the most direct clue to the root cause of
|
|
the problem.
|
|
|
|
<p>The process of detecting duplicate errors is quite an expensive
|
|
one and can become a significant performance overhead if your program
|
|
generates huge quantities of errors. To avoid serious problems here,
|
|
Valgrind will simply stop collecting errors after 300 different errors
|
|
have been seen, or 30000 errors in total have been seen. In this
|
|
situation you might as well stop your program and fix it, because
|
|
Valgrind won't tell you anything else useful after this. Note that
|
|
the 300/30000 limits apply after suppressed errors are removed. These
|
|
limits are defined in <code>vg_include.h</code> and can be increased
|
|
if necessary.
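<p>The cut-off rule can be written down as a short C sketch. The macro and
function names below are hypothetical (the text above says only that the
limits live in <code>vg_include.h</code>, not what they are called), and the
counters are assumed to be updated after suppressed errors have already been
filtered out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Limits quoted in the text; the names are invented for this sketch. */
#define MAX_DISTINCT_ERRORS 300
#define MAX_TOTAL_ERRORS    30000

struct error_counts {
    unsigned distinct;  /* different errors seen, after suppression */
    unsigned total;     /* all errors seen, after suppression */
};

/* Returns true while errors should still be collected. */
bool still_collecting(const struct error_counts *c)
{
    assert(c != NULL);
    return c->distinct < MAX_DISTINCT_ERRORS && c->total < MAX_TOTAL_ERRORS;
}

/* Counts one unsuppressed error; `is_new` says whether it differs from
   every earlier report. Returns false if a limit had already been reached,
   in which case the error is ignored. */
bool count_error(struct error_counts *c, bool is_new)
{
    if (!still_collecting(c))
        return false;
    c->total++;
    if (is_new)
        c->distinct++;
    return true;
}
```

<p>Once either counter reaches its limit, <code>count_error</code> refuses
further errors, which mirrors the advice above: at that point it is more
productive to fix the program than to keep running it.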
|
|
|
|
<a name="suppress"></a>
|
|
<h3>2.4 Suppressing errors</h3>
|
|
|
|
Valgrind detects numerous problems in the base libraries, such as the
|
|
GNU C library, and the XFree86 client libraries, which come
|
|
pre-installed on your GNU/Linux system. You can't easily fix these,
|
|
but you don't want to see these errors (and yes, there are many!) So
|
|
Valgrind reads a list of errors to suppress at startup.
|
|
A default suppression file is cooked up by the
|
|
<code>./configure</code> script.
|
|
|
|
<p>You can modify and add to the suppressions file at your leisure,
|
|
or, better, write your own. Multiple suppression files are allowed.
|
|
This is useful if part of your project contains errors you can't or
|
|
don't want to fix, yet you don't want to continuously be reminded of
|
|
them.
|
|
|
|
<p>Each error to be suppressed is described very specifically, to
|
|
minimise the possibility that a suppression-directive inadvertently
|
|
suppresses a bunch of similar errors which you did want to see. The
|
|
suppression mechanism is designed to allow precise yet flexible
|
|
specification of errors to suppress.
|
|
|
|
<p>If you use the <code>-v</code> flag, at the end of execution, Valgrind
|
|
prints out one line for each used suppression, giving its name and the
|
|
number of times it got used. Here's the suppressions used by a run of
|
|
<code>ls -l</code>:
|
|
<pre>
--27579-- supp: 1 socketcall.connect(serv_addr)/__libc_connect/__nscd_getgrgid_r
--27579-- supp: 1 socketcall.connect(serv_addr)/__libc_connect/__nscd_getpwuid_r
--27579-- supp: 6 strrchr/_dl_map_object_from_fd/_dl_map_object
</pre>
|
|
|
|
<a name="flags"></a>
|
|
<h3>2.5 Command-line flags</h3>
|
|
|
|
You invoke Valgrind like this:
|
|
<pre>
valgrind [options-for-Valgrind] your-prog [options for your-prog]
</pre>
|
|
|
|
<p>Note that Valgrind also reads options from the environment variable
|
|
<code>$VALGRIND</code>, and processes them before the command-line
|
|
options.
|
|
|
|
<p>Valgrind's default settings succeed in giving reasonable behaviour
|
|
in most cases. Available options, in no particular order, are as
|
|
follows:
|
|
<ul>
|
|
<li><code>--help</code></li><br>
|
|
|
|
<li><code>--version</code><br>
|
|
<p>The usual deal.</li><br><p>
|
|
|
|
<li><code>-v --verbose</code><br>
|
|
<p>Be more verbose. Gives extra information on various aspects
|
|
of your program, such as: the shared objects loaded, the
|
|
suppressions used, the progress of the instrumentation engine,
|
|
and warnings about unusual behaviour.
|
|
</li><br><p>
|
|
|
|
<li><code>-q --quiet</code><br>
|
|
<p>Run silently, and only print error messages. Useful if you
|
|
are running regression tests or have some other automated test
|
|
machinery.
|
|
</li><br><p>
|
|
|
|
<li><code>--demangle=no</code><br>
|
|
<code>--demangle=yes</code> [the default]
|
|
<p>Disable/enable automatic demangling (decoding) of C++ names.
|
|
Enabled by default. When enabled, Valgrind will attempt to
|
|
translate encoded C++ procedure names back to something
|
|
approaching the original. The demangler handles symbols mangled
|
|
by g++ versions 2.X and 3.X.
|
|
|
|
<p>An important fact about demangling is that function
|
|
names mentioned in suppressions files should be in their mangled
|
|
form. Valgrind does not demangle function names when searching
|
|
for applicable suppressions, because to do otherwise would make
|
|
suppressions file contents dependent on the state of Valgrind's
|
|
demangling machinery, and would also be slow and pointless.
|
|
</li><br><p>
|
|
|
|
<li><code>--num-callers=&lt;number&gt;</code> [default=4]<br>
|
|
<p>By default, Valgrind shows four levels of function call names
|
|
to help you identify program locations. You can change that
|
|
number with this option. This can help in determining the
|
|
program's location in deeply-nested call chains. Note that errors
|
|
are merged as duplicates using only the top three function locations (the
|
|
place in the current function, and that of its two immediate
|
|
callers). So this doesn't affect the total number of errors
|
|
reported.
|
|
<p>
|
|
The maximum value for this is 50. Note that higher settings
|
|
will make Valgrind run a bit more slowly and take a bit more
|
|
memory, but can be useful when working with programs with
|
|
deeply-nested call chains.
|
|
</li><br><p>
|
|
|
|
<li><code>--gdb-attach=no</code> [the default]<br>
|
|
<code>--gdb-attach=yes</code>
|
|
<p>When enabled, Valgrind will pause after every error shown,
|
|
and print the line
|
|
<br>
|
|
<code>---- Attach to GDB ? --- [Return/N/n/Y/y/C/c] ----</code>
|
|
<p>
|
|
Pressing <code>Ret</code>, or <code>N</code> <code>Ret</code>
|
|
or <code>n</code> <code>Ret</code>, causes Valgrind not to
|
|
start GDB for this error.
|
|
<p>
|
|
<code>Y</code> <code>Ret</code>
|
|
or <code>y</code> <code>Ret</code> causes Valgrind to
|
|
start GDB, for the program at this point. When you have
|
|
finished with GDB, quit from it, and the program will continue.
|
|
Trying to continue from inside GDB doesn't work.
|
|
<p>
|
|
<code>C</code> <code>Ret</code>
|
|
or <code>c</code> <code>Ret</code> causes Valgrind not to
|
|
start GDB, and not to ask again.
|
|
<p>
|
|
<code>--gdb-attach=yes</code> conflicts with
|
|
<code>--trace-children=yes</code>. You can't use them together.
|
|
Valgrind refuses to start up in this situation. 1 May 2002:
|
|
this is a historical relic which could be easily fixed if it
|
|
gets in your way. Mail me and complain if this is a problem for
|
|
you. </li><br><p>
|
|
|
|
<li><code>--partial-loads-ok=yes</code> [the default]<br>
|
|
<code>--partial-loads-ok=no</code>
|
|
<p>Controls how Valgrind handles word (4-byte) loads from
|
|
addresses for which some bytes are addressable and others
|
|
are not. When <code>yes</code> (the default), such loads
|
|
do not elicit an address error. Instead, the loaded V bytes
|
|
corresponding to the illegal addresses indicate undefined, and
|
|
those corresponding to legal addresses are loaded from shadow
|
|
memory, as usual.
|
|
<p>
|
|
When <code>no</code>, loads from partially
|
|
invalid addresses are treated the same as loads from completely
|
|
invalid addresses: an illegal-address error is issued,
|
|
and the resulting V bytes indicate valid data.
|
|
</li><br><p>
|
|
|
|
<li><code>--sloppy-malloc=no</code> [the default]<br>
|
|
<code>--sloppy-malloc=yes</code>
|
|
<p>When enabled, all requests for malloc/calloc are rounded up
|
|
to a whole number of machine words -- in other words, made
|
|
divisible by 4. For example, a request for 17 bytes of space
|
|
would result in a 20-byte area being made available. This works
|
|
around bugs in sloppy libraries which assume that they can
|
|
safely rely on malloc/calloc requests being rounded up in this
|
|
fashion. Without the workaround, these libraries tend to
|
|
generate large numbers of errors when they access the ends of
|
|
these areas.
|
|
<p>
|
|
Valgrind snapshots dated 17 Feb 2002 and later are
|
|
cleverer about this problem, and you should no longer need to
|
|
use this flag. To put it bluntly, if you do need to use this
|
|
flag, your program violates the ANSI C semantics defined for
|
|
<code>malloc</code> and <code>free</code>, even if it appears to
|
|
work correctly, and you should fix it, at least if you hope for
|
|
maximum portability.
|
|
</li><br><p>
|
|
|
|
<li><code>--trace-children=no</code> [the default]<br>
|
|
<code>--trace-children=yes</code>
|
|
<p>When enabled, Valgrind will trace into child processes. This
|
|
is confusing and usually not what you want, so is disabled by
|
|
default. As of 1 May 2002, tracing into a child process from a
|
|
parent which uses <code>libpthread.so</code> is probably broken
|
|
and is likely to cause breakage. Please report any such
|
|
problems to me. </li><br><p>
|
|
|
|
<li><code>--freelist-vol=&lt;number&gt;</code> [default: 1000000]
|
|
<p>When the client program releases memory using free (in C) or
|
|
delete (C++), that memory is not immediately made available for
|
|
re-allocation. Instead it is marked inaccessible and placed in
|
|
a queue of freed blocks. The purpose is to delay the point at
|
|
which freed-up memory comes back into circulation. This
|
|
increases the chance that Valgrind will be able to detect
|
|
invalid accesses to blocks for some significant period of time
|
|
after they have been freed.
|
|
<p>
|
|
This flag specifies the maximum total size, in bytes, of the
|
|
blocks in the queue. The default value is one million bytes.
|
|
Increasing this increases the total amount of memory used by
|
|
Valgrind but may detect invalid uses of freed blocks which would
|
|
otherwise go undetected.</li><br><p>
|
|
|
|
<li><code>--logfile-fd=&lt;number&gt;</code> [default: 2, stderr]
|
|
<p>Specifies the file descriptor on which Valgrind communicates
|
|
all of its messages. The default, 2, is the standard error
|
|
channel. This may interfere with the client's own use of
|
|
stderr. To dump Valgrind's commentary in a file without using
|
|
stderr, something like the following works well (sh/bash
|
|
syntax):<br>
|
|
<code>
|
|
valgrind --logfile-fd=9 my_prog 9> logfile</code><br>
|
|
That is: tell Valgrind to send all output to file descriptor 9,
|
|
and ask the shell to route file descriptor 9 to "logfile".
|
|
</li><br><p>
|
|
|
|
<li><code>--suppressions=&lt;filename&gt;</code>
|
|
[default: $PREFIX/lib/valgrind/default.supp]
|
|
<p>Specifies an extra
|
|
file from which to read descriptions of errors to suppress. You
|
|
may use as many extra suppressions files as you
|
|
like.</li><br><p>
|
|
|
|
<li><code>--leak-check=no</code> [default]<br>
|
|
<code>--leak-check=yes</code>
|
|
<p>When enabled, search for memory leaks when the client program
|
|
finishes. A memory leak means a malloc'd block, which has not
|
|
yet been free'd, but to which no pointer can be found. Such a
|
|
block can never be free'd by the program, since no pointer to it
|
|
exists. Leak checking is disabled by default because it tends
|
|
to generate dozens of error messages. </li><br><p>
|
|
|
|
<li><code>--show-reachable=no</code> [default]<br>
|
|
<code>--show-reachable=yes</code>
|
|
<p>When disabled, the memory leak detector only shows blocks for
|
|
which it cannot find a pointer to at all, or it can only find a
|
|
pointer to the middle of. These blocks are prime candidates for
|
|
memory leaks. When enabled, the leak detector also reports on
|
|
blocks which it could find a pointer to. Your program could, at
|
|
least in principle, have freed such blocks before exit.
|
|
Contrast this to blocks for which no pointer, or only an
|
|
interior pointer could be found: they are more likely to
|
|
indicate memory leaks, because you do not actually have a
|
|
pointer to the start of the block which you can hand to
|
|
<code>free</code>, even if you wanted to. </li><br><p>
|
|
|
|
<li><code>--leak-resolution=low</code> [default]<br>
|
|
<code>--leak-resolution=med</code> <br>
|
|
<code>--leak-resolution=high</code>
|
|
<p>When doing leak checking, determines how willing Valgrind is
|
|
to consider different backtraces to be the same. When set to
|
|
<code>low</code>, the default, only the first two entries need
|
|
match. When <code>med</code>, four entries have to match. When
|
|
<code>high</code>, all entries need to match.
|
|
<p>
|
|
For hardcore leak debugging, you probably want to use
|
|
<code>--leak-resolution=high</code> together with
|
|
<code>--num-callers=40</code> or some such large number. Note
|
|
however that this can give an overwhelming amount of
|
|
information, which is why the defaults are 4 callers and
|
|
low-resolution matching.
|
|
<p>
|
|
Note that the <code>--leak-resolution=</code> setting does not
|
|
affect Valgrind's ability to find leaks. It only changes how
|
|
the results are presented.
|
|
</li><br><p>
|
|
|
|
<li><code>--workaround-gcc296-bugs=no</code> [default]<br>
|
|
<code>--workaround-gcc296-bugs=yes</code> <p>When enabled,
|
|
assume that reads and writes some small distance below the stack
|
|
pointer <code>%esp</code> are due to bugs in gcc 2.96, and do
|
|
not report them. The "small distance" is 256 bytes by default.
|
|
Note that gcc 2.96 is the default compiler on some popular Linux
|
|
distributions (RedHat 7.X, Mandrake) and so you may well need to
|
|
use this flag. Do not use it if you do not have to, as it can
|
|
cause real errors to be overlooked. A better option is to use a
|
|
gcc/g++ which works properly; 2.95.3 seems to be a good choice.
|
|
<p>
|
|
Unfortunately (27 Feb 02) it looks like g++ 3.0.4 is similarly
|
|
buggy, so you may need to issue this flag if you use 3.0.4. A
|
|
while later (early Apr 02) this is confirmed as a scheduling bug
|
|
in g++-3.0.4.
|
|
</li><br><p>
|
|
|
|
<li><code>--cachesim=no</code> [default]<br>
|
|
<code>--cachesim=yes</code> <p>When enabled, turns off memory
|
|
checking, and turns on cache profiling. Cache profiling is
|
|
described in detail in <a href="#cache">Section 7</a>.
|
|
</li><br><p>
|
|
|
|
<li><code>--weird-hacks=hack1,hack2,...</code>
|
|
Pass miscellaneous hints to Valgrind which slightly modify the
|
|
simulated behaviour in nonstandard or dangerous ways, possibly
|
|
to help the simulation of strange features. By default no hacks
|
|
are enabled. Use with caution! Currently known hacks are:
|
|
<p>
|
|
<ul>
|
|
<li><code>ioctl-VTIME</code> Use this if you have a program
|
|
which sets readable file descriptors to have a timeout by
|
|
doing <code>ioctl</code> on them with a
|
|
<code>TCSETA</code>-style command <b>and</b> a non-zero
|
|
<code>VTIME</code> timeout value. This is considered
|
|
potentially dangerous and therefore is not engaged by
|
|
default, because it is (remotely) conceivable that it could
|
|
cause threads doing <code>read</code> to incorrectly block
|
|
the entire process.
|
|
<p>
|
|
You probably want to try this one if you have a program
|
|
which unexpectedly blocks in a <code>read</code> from a file
|
|
descriptor which you know to have been messed with by
|
|
<code>ioctl</code>. This could happen, for example, if the
|
|
descriptor is used to read input from some kind of screen
|
|
handling library.
|
|
<p>
|
|
To find out if your program is blocking unexpectedly in the
|
|
<code>read</code> system call, run with the
|
|
<code>--trace-syscalls=yes</code> flag.
|
|
</ul>
|
|
|
|
</li><p>
|
|
</ul>
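<p>The two allocator-related flags above, <code>--sloppy-malloc</code> and
<code>--freelist-vol</code>, can be sketched in C as follows. This is a model
of the behaviour described in the list, not Valgrind's code: the names, the
fixed queue capacity and the policy of releasing the oldest blocks first are
all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define WORD_SIZE   4     /* x86 machine word, as stated for --sloppy-malloc */
#define QUEUE_SLOTS 128   /* arbitrary capacity chosen for this sketch */

/* --sloppy-malloc=yes: round a request up to a whole number of words. */
size_t sloppy_round_up(size_t nbytes)
{
    return (nbytes + WORD_SIZE - 1) / WORD_SIZE * WORD_SIZE;
}

/* --freelist-vol: freed blocks wait in a queue before being reused. */
struct freed_queue {
    size_t sizes[QUEUE_SLOTS]; /* sizes of queued blocks, oldest at head */
    size_t head;               /* index of the oldest queued block */
    size_t count;              /* number of queued blocks */
    size_t volume;             /* total bytes currently queued */
    size_t max_volume;         /* the --freelist-vol value */
};

/* Removes the oldest block from the queue, making it reusable. */
static void release_oldest(struct freed_queue *q)
{
    q->volume -= q->sizes[q->head];
    q->head = (q->head + 1) % QUEUE_SLOTS;
    q->count--;
}

/* Queues a freshly freed block of `nbytes` bytes, then releases the oldest
   blocks until the queued volume fits within max_volume. Returns how many
   blocks were released for reuse. */
size_t queue_freed_block(struct freed_queue *q, size_t nbytes)
{
    assert(q != NULL);
    size_t released = 0;

    if (q->count == QUEUE_SLOTS) {   /* capacity limit of this sketch only */
        release_oldest(q);
        released++;
    }
    q->sizes[(q->head + q->count) % QUEUE_SLOTS] = nbytes;
    q->count++;
    q->volume += nbytes;

    while (q->volume > q->max_volume && q->count > 0) {
        release_oldest(q);
        released++;
    }
    return released;
}
```

<p>With the rounding helper, the 17-byte request from the text becomes 20
bytes. In the queue model, a larger <code>max_volume</code> keeps freed blocks
inaccessible for longer, which is the trade-off the flag description gives:
more memory used in exchange for a better chance of catching accesses to
freed blocks.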
|
|
|
|
There are also some options for debugging Valgrind itself. You
|
|
shouldn't need to use them in the normal run of things. Nevertheless:
|
|
|
|
<ul>
|
|
|
|
<li><code>--single-step=no</code> [default]<br>
|
|
<code>--single-step=yes</code>
|
|
<p>When enabled, each x86 insn is translated separately into
|
|
instrumented code. When disabled, translation is done on a
|
|
per-basic-block basis, giving much better translations.</li><br>
|
|
<p>
|
|
|
|
<li><code>--optimise=no</code><br>
|
|
<code>--optimise=yes</code> [default]
|
|
<p>When enabled, various improvements are applied to the
|
|
intermediate code, mainly aimed at allowing the simulated CPU's
|
|
registers to be cached in the real CPU's registers over several
|
|
simulated instructions.</li><br>
|
|
<p>
|
|
|
|
<li><code>--instrument=no</code><br>
|
|
<code>--instrument=yes</code> [default]
|
|
<p>When disabled, the translations don't actually contain any
|
|
instrumentation.</li><br>
|
|
<p>
|
|
|
|
<li><code>--cleanup=no</code><br>
|
|
<code>--cleanup=yes</code> [default]
|
|
<p>When enabled, various improvements are applied to the
|
|
post-instrumented intermediate code, aimed at removing redundant
|
|
value checks.</li><br>
|
|
<p>
|
|
|
|
<li><code>--trace-syscalls=no</code> [default]<br>
|
|
<code>--trace-syscalls=yes</code>
|
|
<p>Enable/disable tracing of system call intercepts.</li><br>
|
|
<p>
|
|
|
|
<li><code>--trace-signals=no</code> [default]<br>
|
|
<code>--trace-signals=yes</code>
|
|
<p>Enable/disable tracing of signal handling.</li><br>
|
|
<p>
|
|
|
|
<li><code>--trace-sched=no</code> [default]<br>
|
|
<code>--trace-sched=yes</code>
|
|
<p>Enable/disable tracing of thread scheduling events.</li><br>
|
|
<p>
|
|
|
|
<li><code>--trace-pthread=none</code> [default]<br>
|
|
<code>--trace-pthread=some</code> <br>
|
|
<code>--trace-pthread=all</code>
|
|
<p>Specifies amount of trace detail for pthread-related events.</li><br>
|
|
<p>
|
|
|
|
<li><code>--trace-symtab=no</code> [default]<br>
|
|
<code>--trace-symtab=yes</code>
|
|
<p>Enable/disable tracing of symbol table reading.</li><br>
|
|
<p>
|
|
|
|
<li><code>--trace-malloc=no</code> [default]<br>
|
|
<code>--trace-malloc=yes</code>
|
|
<p>Enable/disable tracing of malloc/free (et al) intercepts.
|
|
</li><br>
|
|
<p>
|
|
|
|
<li><code>--stop-after=&lt;number&gt;</code>
|
|
[default: infinity, more or less]
|
|
<p>After &lt;number&gt; basic blocks have been executed, shut down
|
|
Valgrind and switch back to running the client on the real CPU.
|
|
</li><br>
|
|
<p>
|
|
|
|
<li><code>--dump-error=&lt;number&gt;</code> [default: inactive]
|
|
<p>After the program has exited, show gory details of the
|
|
translation of the basic block containing the &lt;number&gt;'th
|
|
error context. When used with <code>--single-step=yes</code>,
|
|
can show the exact x86 instruction causing an error. This is
|
|
all fairly dodgy and doesn't work at all if threads are
|
|
involved.</li><br>
|
|
<p>
|
|
</ul>
|
|
|
|
|
|
<a name="errormsgs"></a>
|
|
<h3>2.6 Explanation of error messages</h3>
|
|
|
|
Despite considerable sophistication under the hood, Valgrind can only
|
|
really detect two kinds of errors: use of illegal addresses, and use
|
|
of undefined values. Nevertheless, this is enough to help you
|
|
discover all sorts of memory-management nasties in your code. This
|
|
section presents a quick summary of what error messages mean. The
|
|
precise behaviour of the error-checking machinery is described in
|
|
<a href="#machine">Section 3</a>.
|
|
|
|
|
|
<h4>2.6.1 Illegal read / Illegal write errors</h4>
|
|
For example:
|
|
<pre>
Invalid read of size 4
   at 0x40F6BBCC: (within /usr/lib/libpng.so.2.1.0.9)
   by 0x40F6B804: (within /usr/lib/libpng.so.2.1.0.9)
   by 0x40B07FF4: read_png_image__FP8QImageIO (kernel/qpngio.cpp:326)
   by 0x40AC751B: QImageIO::read() (kernel/qimage.cpp:3621)
   Address 0xBFFFF0E0 is not stack'd, malloc'd or free'd
</pre>
|
|
|
|
<p>This happens when your program reads or writes memory at a place
|
|
which Valgrind reckons it shouldn't. In this example, the program did
|
|
a 4-byte read at address 0xBFFFF0E0, somewhere within the
|
|
system-supplied library libpng.so.2.1.0.9, which was called from
|
|
somewhere else in the same library, called from line 326 of
|
|
qpngio.cpp, and so on.
|
|
|
|
<p>Valgrind tries to establish what the illegal address might relate
|
|
to, since that's often useful. So, if it points into a block of
|
|
memory which has already been freed, you'll be informed of this, and
|
|
also where the block was free'd at. Likewise, if it should turn out
|
|
to be just off the end of a malloc'd block, a common result of
|
|
off-by-one-errors in array subscripting, you'll be informed of this
|
|
fact, and also where the block was malloc'd.
|
|
|
|
<p>In this example, Valgrind can't identify the address. Actually the
|
|
address is on the stack, but, for some reason, this is not a valid
|
|
stack address -- it is below the stack pointer, %esp, and that isn't
|
|
allowed. In this particular case it's probably caused by gcc
|
|
generating invalid code, a known bug in various flavours of gcc.
|
|
|
|
<p>Note that Valgrind only tells you that your program is about to
|
|
access memory at an illegal address. It can't stop the access from
|
|
happening. So, if your program makes an access which normally would
|
|
result in a segmentation fault, your program will still suffer the same
|
|
fate -- but you will get a message from Valgrind immediately prior to
|
|
this. In this particular example, reading junk on the stack is
|
|
non-fatal, and the program stays alive.
|
|
|
|
|
|
<h4>2.6.2 Use of uninitialised values</h4>
|
|
For example:
|
|
<pre>
|
|
Conditional jump or move depends on uninitialised value(s)
|
|
at 0x402DFA94: _IO_vfprintf (_itoa.h:49)
|
|
by 0x402E8476: _IO_printf (printf.c:36)
|
|
by 0x8048472: main (tests/manuel1.c:8)
|
|
by 0x402A6E5E: __libc_start_main (libc-start.c:129)
|
|
</pre>
|
|
|
|
<p>An uninitialised-value use error is reported when your program uses
|
|
a value which hasn't been initialised -- in other words, is undefined.
|
|
Here, the undefined value is used somewhere inside the printf()
|
|
machinery of the C library. This error was reported when running the
|
|
following small program:
|
|
<pre>
|
|
#include <stdio.h>

int main(void)
{
  int x;   /* deliberately left uninitialised */
  printf ("x = %d\n", x);
  return 0;
}
|
|
</pre>
|
|
|
|
<p>It is important to understand that your program can copy around
|
|
junk (uninitialised) data to its heart's content. Valgrind observes
|
|
this and keeps track of the data, but does not complain. A complaint
|
|
is issued only when your program attempts to make use of uninitialised
|
|
data. In this example, x is uninitialised. Valgrind observes the
|
|
value being passed to _IO_printf and thence to _IO_vfprintf, but makes
|
|
no comment. However, _IO_vfprintf has to examine the value of x so it
|
|
can turn it into the corresponding ASCII string, and it is at this
|
|
point that Valgrind complains.
|
|
|
|
<p>Sources of uninitialised data tend to be:
|
|
<ul>
|
|
<li>Local variables in procedures which have not been initialised,
|
|
as in the example above.</li><br><p>
|
|
|
|
<li>The contents of malloc'd blocks, before you write something
|
|
there. In C++, the new operator is a wrapper round malloc, so
|
|
if you create an object with new, its fields will be
|
|
uninitialised until you fill them in, which is only Right and
|
|
Proper.</li>
|
|
</ul>
|
|
|
|
|
|
|
|
<h4>2.6.3 Illegal frees</h4>
|
|
For example:
|
|
<pre>
|
|
Invalid free()
|
|
at 0x4004FFDF: free (ut_clientmalloc.c:577)
|
|
by 0x80484C7: main (tests/doublefree.c:10)
|
|
by 0x402A6E5E: __libc_start_main (libc-start.c:129)
|
|
by 0x80483B1: (within tests/doublefree)
|
|
Address 0x3807F7B4 is 0 bytes inside a block of size 177 free'd
|
|
at 0x4004FFDF: free (ut_clientmalloc.c:577)
|
|
by 0x80484C7: main (tests/doublefree.c:10)
|
|
by 0x402A6E5E: __libc_start_main (libc-start.c:129)
|
|
by 0x80483B1: (within tests/doublefree)
|
|
</pre>
|
|
<p>Valgrind keeps track of the blocks allocated by your program with
malloc/new, so it knows exactly whether the argument to
free/delete is legitimate. Here, this test program has
freed the same block twice. As with the illegal read/write errors,
Valgrind attempts to make sense of the address free'd. If, as
here, the address is one which has previously been freed, you will
be told that -- making duplicate frees of the same block easy to spot.
|
|
|
|
|
|
<h4>2.6.4 When a block is freed with an inappropriate
|
|
deallocation function</h4>
|
|
In the following example, a block allocated with <code>new[]</code>
|
|
has wrongly been deallocated with <code>free</code>:
|
|
<pre>
|
|
Mismatched free() / delete / delete []
|
|
at 0x40043249: free (vg_clientfuncs.c:171)
|
|
by 0x4102BB4E: QGArray::~QGArray(void) (tools/qgarray.cpp:149)
|
|
by 0x4C261C41: PptDoc::~PptDoc(void) (include/qmemarray.h:60)
|
|
by 0x4C261F0E: PptXml::~PptXml(void) (pptxml.cc:44)
|
|
Address 0x4BB292A8 is 0 bytes inside a block of size 64 alloc'd
|
|
at 0x4004318C: __builtin_vec_new (vg_clientfuncs.c:152)
|
|
by 0x4C21BC15: KLaola::readSBStream(int) const (klaola.cc:314)
|
|
by 0x4C21C155: KLaola::stream(KLaola::OLENode const *) (klaola.cc:416)
|
|
by 0x4C21788F: OLEFilter::convert(QCString const &) (olefilter.cc:272)
|
|
</pre>
|
|
The following was told to me by the KDE 3 developers. I didn't know
|
|
any of it myself. They also implemented the check itself.
|
|
<p>
|
|
In C++ it's important to deallocate memory in a way compatible with
|
|
how it was allocated. The deal is:
|
|
<ul>
|
|
<li>If allocated with <code>malloc</code>, <code>calloc</code>,
|
|
<code>realloc</code>, <code>valloc</code> or
|
|
<code>memalign</code>, you must deallocate with <code>free</code>.
|
|
<li>If allocated with <code>new[]</code>, you must deallocate with
|
|
<code>delete[]</code>.
|
|
<li>If allocated with <code>new</code>, you must deallocate with
|
|
<code>delete</code>.
|
|
</ul>
|
|
The worst thing is that on Linux apparently it doesn't matter if you
|
|
do muddle these up, and it all seems to work ok, but the same program
|
|
may then crash on a different platform, Solaris for example. So it's
|
|
best to fix it properly. According to the KDE folks "it's amazing how
|
|
many C++ programmers don't know this".
|
|
<p>
|
|
Pascal Massimino adds the following clarification:
|
|
<code>delete[]</code> must be paired with
<code>new[]</code> because the compiler stores the size of the array
and the pointer-to-member to the destructor of the array's content
just before the pointer actually returned. This implies a
variable-sized overhead in what's returned by <code>new</code> or
<code>new[]</code>. It is rather surprising how robust compilers [Ed:
runtime-support libraries?] are to mismatches between
<code>new</code>/<code>delete</code> and
<code>new[]</code>/<code>delete[]</code>.
|
|
|
|
|
|
<h4>2.6.5 Passing system call parameters with inadequate
|
|
read/write permissions</h4>
|
|
|
|
Valgrind checks all parameters to system calls. If a system call
|
|
needs to read from a buffer provided by your program, Valgrind checks
|
|
that the entire buffer is addressable and has valid data, i.e., it is
|
|
readable. And if the system call needs to write to a user-supplied
|
|
buffer, Valgrind checks that the buffer is addressable. After the
|
|
system call, Valgrind updates its administrative information to
|
|
precisely reflect any changes in memory permissions caused by the
|
|
system call.
|
|
|
|
<p>Here's an example of a system call with an invalid parameter:
|
|
<pre>
|
|
#include <stdlib.h>
|
|
#include <unistd.h>
|
|
int main( void )
|
|
{
|
|
char* arr = malloc(10);
|
|
(void) write( 1 /* stdout */, arr, 10 );
|
|
return 0;
|
|
}
|
|
</pre>
|
|
|
|
<p>You get this complaint ...
|
|
<pre>
|
|
Syscall param write(buf) contains uninitialised or unaddressable byte(s)
|
|
at 0x4035E072: __libc_write
|
|
by 0x402A6E5E: __libc_start_main (libc-start.c:129)
|
|
by 0x80483B1: (within tests/badwrite)
|
|
by <bogus frame pointer> ???
|
|
Address 0x3807E6D0 is 0 bytes inside a block of size 10 alloc'd
|
|
at 0x4004FEE6: malloc (ut_clientmalloc.c:539)
|
|
by 0x80484A0: main (tests/badwrite.c:6)
|
|
by 0x402A6E5E: __libc_start_main (libc-start.c:129)
|
|
by 0x80483B1: (within tests/badwrite)
|
|
</pre>
|
|
|
|
<p>... because the program has tried to write uninitialised junk from
|
|
the malloc'd block to the standard output.
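<p>One way to avoid this complaint is to give every byte of the
buffer a defined value before handing it to <code>write</code>. The
following is only a minimal sketch of that idea; the helper name
<code>write_filled</code> is made up for illustration and is not part
of Valgrind or the C library.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: allocate n bytes, set every one of them to
   `fill`, write them to fd, and return whatever write() returned
   (or -1 if the allocation failed). */
ssize_t write_filled(int fd, size_t n, char fill)
{
    char *arr;
    ssize_t written;

    assert(n > 0);
    arr = malloc(n);
    if (arr == NULL)
        return -1;

    /* Every byte now has a defined value, so the whole buffer is
       both addressable and readable when write() inspects it. */
    memset(arr, fill, n);

    written = write(fd, arr, n);
    free(arr);
    return written;
}
```

<p>Calling <code>write_filled(1, 10, 'x')</code> would send ten
defined bytes to the standard output. Using <code>calloc</code>
instead of <code>malloc</code> plus <code>memset</code> would serve
equally well when zeroes are acceptable.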
|
|
|
|
|
|
<h4>2.6.6 Warning messages you might see</h4>
|
|
|
|
Most of these only appear if you run in verbose mode (enabled by
|
|
<code>-v</code>):
|
|
<ul>
|
|
<li> <code>More than 50 errors detected. Subsequent errors
|
|
will still be recorded, but in less detail than before.</code>
|
|
<br>
|
|
After 50 different errors have been shown, Valgrind becomes
|
|
more conservative about collecting them. It then requires only
|
|
the program counters in the top two stack frames to match when
|
|
deciding whether or not two errors are really the same one.
|
|
Prior to this point, the PCs in the top four frames are required
|
|
to match. This hack has the effect of slowing down the
|
|
appearance of new errors after the first 50. The 50 constant can
|
|
be changed by recompiling Valgrind.
|
|
<p>
|
|
<li> <code>More than 300 errors detected. I'm not reporting any more.
|
|
Final error counts may be inaccurate. Go fix your
|
|
program!</code>
|
|
<br>
|
|
After 300 different errors have been detected, Valgrind ignores
|
|
any more. It seems unlikely that collecting even more different
|
|
ones would be of practical help to anybody, and it avoids the
|
|
danger that Valgrind spends more and more of its time comparing
|
|
new errors against an ever-growing collection. As above, the 300
|
|
number is a compile-time constant.
|
|
<p>
|
|
<li> <code>Warning: client exiting by calling exit(<number>).
|
|
Bye!</code>
|
|
<br>
|
|
Your program has called the <code>exit</code> system call, which
|
|
will immediately terminate the process. You'll get no exit-time
|
|
error summaries or leak checks. Note that this is not the same
|
|
as your program calling the ANSI C function <code>exit()</code>
|
|
-- that causes a normal, controlled shutdown of Valgrind.
|
|
<p>
|
|
<li> <code>Warning: client switching stacks?</code>
|
|
<br>
|
|
Valgrind spotted such a large change in the stack pointer, %esp,
|
|
that it guesses the client is switching to a different stack.
|
|
At this point it makes a kludgey guess where the base of the new
|
|
stack is, and sets memory permissions accordingly. You may get
|
|
many bogus error messages following this, if Valgrind guesses
|
|
wrong. At the moment "large change" is defined as a change of
|
|
more than 2000000 in the value of the %esp (stack pointer)
|
|
register.
|
|
<p>
|
|
<li> <code>Warning: client attempted to close Valgrind's logfile fd <number>
|
|
</code>
|
|
<br>
|
|
Valgrind doesn't allow the client
|
|
to close the logfile, because you'd never see any diagnostic
|
|
information after that point. If you see this message,
|
|
you may want to use the <code>--logfile-fd=<number></code>
|
|
option to specify a different logfile file-descriptor number.
|
|
<p>
|
|
<li> <code>Warning: noted but unhandled ioctl <number></code>
|
|
<br>
|
|
Valgrind observed a call to one of the vast family of
|
|
<code>ioctl</code> system calls, but did not modify its
|
|
memory status info (because I have not yet got round to it).
|
|
The call will still have gone through, but you may get spurious
|
|
errors after this as a result of the non-update of the memory info.
|
|
<p>
|
|
<li> <code>Warning: unblocking signal <number> due to
|
|
sigprocmask</code>
|
|
<br>
|
|
Really just a diagnostic from the signal simulation machinery.
|
|
This message will appear if your program handles a signal by
|
|
first <code>longjmp</code>ing out of the signal handler,
|
|
and then unblocking the signal with <code>sigprocmask</code>
|
|
-- a standard signal-handling idiom.
|
|
<p>
|
|
<li> <code>Warning: bad signal number <number> in __NR_sigaction.</code>
|
|
<br>
|
|
Probably indicates a bug in the signal simulation machinery.
|
|
<p>
|
|
<li> <code>Warning: set address range perms: large range <number></code>
|
|
<br>
|
|
Diagnostic message, mostly for my benefit, to do with memory
|
|
permissions.
|
|
</ul>
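<p>The duplicate-detection rule behind the "More than 50 errors"
message can be pictured roughly as follows. This is a toy sketch, not
Valgrind's actual code: the structure, the function name and the exact
boundary condition (fewer than 50 versus 50 or fewer) are all
assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define FRAMES_RECORDED 4    /* program counters kept per error (assumed) */
#define STRICT_LIMIT    50   /* the "50" from the first warning above */

/* Hypothetical record of where an error happened. */
struct error_context {
    unsigned long pc[FRAMES_RECORDED];   /* pc[0] is the innermost frame */
};

/* Returns 1 if the two errors count as "the same one", else 0.
   Before STRICT_LIMIT errors have been shown, the top four PCs must
   match; afterwards only the top two are compared. */
int same_error(const struct error_context *a,
               const struct error_context *b,
               size_t errors_shown)
{
    size_t depth = (errors_shown < STRICT_LIMIT) ? 4 : 2;
    size_t i;

    assert(a != NULL && b != NULL);
    for (i = 0; i < depth; i++)
        if (a->pc[i] != b->pc[i])
            return 0;
    return 1;
}
```

<p>Two errors that differ only in their third or fourth frame are thus
kept apart early on, but merged once the looser two-frame comparison
kicks in, which is why new errors appear more slowly after that point.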
|
|
|
|
|
|
<a name="suppfiles"></a>
|
|
<h3>2.7 Writing suppressions files</h3>
|
|
|
|
A suppression file describes a bunch of errors which, for one reason
|
|
or another, you don't want Valgrind to tell you about. Usually the
|
|
reason is that the system libraries are buggy but unfixable, at least
|
|
within the scope of the current debugging session. Multiple
|
|
suppressions files are allowed. By default, Valgrind uses
|
|
<code>$PREFIX/lib/valgrind/default.supp</code>.
|
|
|
|
<p>
|
|
You can ask to add suppressions from another file, by specifying
|
|
<code>--suppressions=/path/to/file.supp</code>.
|
|
|
|
<p>Each suppression has the following components:<br>
|
|
<ul>
|
|
|
|
<li>Its name. This merely gives a handy name to the suppression, by
|
|
which it is referred to in the summary of used suppressions
|
|
printed out when a program finishes. It's not important what
|
|
the name is; any identifying string will do.
|
|
<p>
|
|
|
|
<li>The nature of the error to suppress. Either:
|
|
<code>Value1</code>,
|
|
<code>Value2</code>,
|
|
<code>Value4</code> or
|
|
<code>Value8</code>,
|
|
meaning an uninitialised-value error when
|
|
using a value of 1, 2, 4 or 8 bytes.
|
|
Or
|
|
<code>Cond</code> (or its old name, <code>Value0</code>),
|
|
meaning use of an uninitialised CPU condition code. Or:
|
|
<code>Addr1</code>,
|
|
<code>Addr2</code>,
|
|
<code>Addr4</code> or
|
|
<code>Addr8</code>, meaning an invalid address during a
|
|
memory access of 1, 2, 4 or 8 bytes respectively. Or
|
|
<code>Param</code>,
|
|
meaning an invalid system call parameter error. Or
|
|
<code>Free</code>, meaning an invalid or mismatching free.</li><br>
|
|
<p>
|
|
|
|
<li>The "immediate location" specification. For Value and Addr
errors, this is either the name of the function in which the error
occurred, or, failing that, the full path of the .so file
containing the error location. For Param errors, it is the name of
the offending system call parameter. For Free errors, it is the
name of the function doing the freeing (eg, <code>free</code>,
<code>__builtin_vec_delete</code>, etc)</li><br>
|
|
<p>
|
|
|
|
<li>The caller of the above "immediate location". Again, either a
|
|
function or shared-object name.</li><br>
|
|
<p>
|
|
|
|
<li>Optionally, one or two extra calling-function or object names,
|
|
for greater precision.</li>
|
|
</ul>
|
|
|
|
<p>
|
|
Locations may be either names of shared objects or wildcards matching
|
|
function names. They begin <code>obj:</code> and <code>fun:</code>
|
|
respectively. Function and object names to match against may use the
|
|
wildcard characters <code>*</code> and <code>?</code>.
|
|
|
|
A suppression only suppresses an error when the error matches all the
|
|
details in the suppression. Here's an example:
|
|
<pre>
|
|
{
|
|
__gconv_transform_ascii_internal/__mbrtowc/mbtowc
|
|
Value4
|
|
fun:__gconv_transform_ascii_internal
|
|
fun:__mbr*towc
fun:mbtowc
}
</pre>

<p>What it means is: suppress a use-of-uninitialised-value error, when
the data size is 4, when it occurs in the function
<code>__gconv_transform_ascii_internal</code>, when that is called
from any function whose name matches <code>__mbr*towc</code>,
when that is called from
<code>mbtowc</code>. It doesn't apply under any other circumstances.
The string by which this suppression is identified to the user is
<code>__gconv_transform_ascii_internal/__mbrtowc/mbtowc</code>.
|
|
|
|
<p>Another example:
|
|
<pre>
|
|
{
|
|
libX11.so.6.2/libX11.so.6.2/libXaw.so.7.0
|
|
Value4
|
|
obj:/usr/X11R6/lib/libX11.so.6.2
|
|
obj:/usr/X11R6/lib/libX11.so.6.2
|
|
obj:/usr/X11R6/lib/libXaw.so.7.0
|
|
}
|
|
</pre>
|
|
|
|
<p>Suppress any size 4 uninitialised-value error which occurs anywhere
|
|
in <code>libX11.so.6.2</code>, when called from anywhere in the same
|
|
library, when called from anywhere in <code>libXaw.so.7.0</code>. The
|
|
inexact specification of locations is regrettable, but is about all
|
|
you can hope for, given that the X11 libraries shipped with Red Hat
|
|
7.2 have had their symbol tables removed.
|
|
|
|
<p>Note -- since the above two examples did not make it clear -- that
|
|
you can freely mix the <code>obj:</code> and <code>fun:</code>
|
|
styles of description within a single suppression record.
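<p>To get a feel for how the <code>*</code> and <code>?</code>
wildcards in <code>fun:</code> and <code>obj:</code> lines behave,
here is a small matcher. It assumes the usual shell-like meaning
(<code>*</code> matches any run of characters, including none, and
<code>?</code> matches exactly one character); this manual does not
spell out the precise rules, so treat it as a sketch rather than a
description of Valgrind's own matcher. The name
<code>wild_match</code> is invented for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Returns 1 if `str` matches `pat`, else 0.
   Assumed semantics: '*' matches zero or more characters,
   '?' matches exactly one character, anything else matches itself. */
int wild_match(const char *pat, const char *str)
{
    assert(pat != NULL && str != NULL);

    if (*pat == '\0')
        return *str == '\0';

    if (*pat == '*') {
        /* Either '*' matches nothing, or it swallows one more character. */
        return wild_match(pat + 1, str)
            || (*str != '\0' && wild_match(pat, str + 1));
    }

    if (*str != '\0' && (*pat == '?' || *pat == *str))
        return wild_match(pat + 1, str + 1);

    return 0;
}
```

<p>Under these assumptions, <code>__gconv_transform_*</code> matches
<code>__gconv_transform_ascii_internal</code>, and
<code>mb?owc</code> matches <code>mbtowc</code> but not
<code>mbrtowc</code>.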
|
|
|
|
|
|
<a name="clientreq"></a>
|
|
<h3>2.8 The Client Request mechanism</h3>
|
|
|
|
Valgrind has a trapdoor mechanism via which the client program can
|
|
pass all manner of requests and queries to Valgrind. Internally, this
|
|
is used extensively to make malloc, free, signals, threads, etc, work,
|
|
although you don't see that.
|
|
<p>
|
|
For your convenience, a subset of these so-called client requests is
|
|
provided to allow you to tell Valgrind facts about the behaviour of
|
|
your program, and conversely to make queries. In particular, your
|
|
program can tell Valgrind about changes in memory range permissions
|
|
that Valgrind would not otherwise know about, and so allows clients to
|
|
get Valgrind to do arbitrary custom checks.
|
|
<p>
|
|
Clients need to include the header file <code>valgrind.h</code> to
|
|
make this work. The macros therein have the magical property that
|
|
they generate code in-line which Valgrind can spot. However, the code
|
|
does nothing when not run on Valgrind, so you are not forced to run
|
|
your program on Valgrind just because you use the macros in this file.
|
|
Also, you are not required to link your program with any extra
|
|
supporting libraries.
|
|
<p>
|
|
A brief description of the available macros:
|
|
<ul>
|
|
<li><code>VALGRIND_MAKE_NOACCESS</code>,
|
|
<code>VALGRIND_MAKE_WRITABLE</code> and
|
|
<code>VALGRIND_MAKE_READABLE</code>. These mark address
|
|
ranges as completely inaccessible, accessible but containing
|
|
undefined data, and accessible and containing defined data,
|
|
respectively. Subsequent errors may have their faulting
|
|
addresses described in terms of these blocks. Returns a
|
|
"block handle". Returns zero when not run on Valgrind.
|
|
<p>
|
|
<li><code>VALGRIND_DISCARD</code>: At some point you may want
|
|
Valgrind to stop reporting errors in terms of the blocks
|
|
defined by the previous three macros. To do this, the above
|
|
macros return a small-integer "block handle". You can pass
|
|
this block handle to <code>VALGRIND_DISCARD</code>. After
|
|
doing so, Valgrind will no longer be able to relate
|
|
addressing errors to the user-defined block associated with
|
|
the handle. The permissions settings associated with the
|
|
handle remain in place; this just affects how errors are
|
|
reported, not whether they are reported. Returns 1 for an
|
|
invalid handle and 0 for a valid handle (although passing
|
|
invalid handles is harmless). Always returns 0 when not run
|
|
on Valgrind.
|
|
<p>
|
|
<li><code>VALGRIND_CHECK_NOACCESS</code>,
|
|
<code>VALGRIND_CHECK_WRITABLE</code> and
|
|
<code>VALGRIND_CHECK_READABLE</code>: check immediately
|
|
whether or not the given address range has the relevant
|
|
property, and if not, print an error message. Also, for the
|
|
convenience of the client, returns zero if the relevant
|
|
property holds; otherwise, the returned value is the address
|
|
of the first byte for which the property is not true.
|
|
Always returns 0 when not run on Valgrind.
|
|
<p>
|
|
<li><code>VALGRIND_CHECK_DEFINED</code>: a quick and easy way
to find out whether Valgrind thinks a particular variable
(lvalue, to be precise) is addressable and defined. Prints
an error message if not. Returns no value.
|
|
<p>
|
|
<li><code>VALGRIND_MAKE_NOACCESS_STACK</code>: a highly
|
|
experimental feature. Similarly to
|
|
<code>VALGRIND_MAKE_NOACCESS</code>, this marks an address
|
|
range as inaccessible, so that subsequent accesses to an
address in the range give an error. However, this macro
does not return a block handle. Instead, all annotations
created like this are reviewed at each client
<code>ret</code> (subroutine return) instruction, and those
which now define an address range below the client's stack
pointer register (<code>%esp</code>) are automatically
deleted.
|
|
<p>
|
|
In other words, this macro allows the client to tell
|
|
Valgrind about red-zones on its own stack. Valgrind
|
|
automatically discards this information when the stack
|
|
retreats past such blocks. Beware: hacky and flaky, and
|
|
probably interacts badly with the new pthread support.
|
|
<p>
|
|
<li><code>RUNNING_ON_VALGRIND</code>: returns 1 if running on
|
|
Valgrind, 0 if running on the real CPU.
|
|
<p>
|
|
<li><code>VALGRIND_DO_LEAK_CHECK</code>: run the memory leak detector
|
|
right now. Returns no value. I guess this could be used to
|
|
incrementally check for leaks between arbitrary places in the
|
|
program's execution. Warning: not properly tested!
|
|
<p>
|
|
<li><code>VALGRIND_DISCARD_TRANSLATIONS</code>: discard translations
|
|
of code in the specified address range. Useful if you are
|
|
debugging a JITter or some other dynamic code generation system.
|
|
After this call, attempts to execute code in the invalidated
|
|
address range will cause Valgrind to make new translations of that
|
|
code, which is probably the semantics you want. Note that this is
|
|
implemented naively, and involves checking all 200191 entries in
|
|
the translation table to see if any of them overlap the specified
|
|
address range. So try not to call it often, or performance will
|
|
nosedive. Note that you can be clever about this: you only need
|
|
to call it when an area which previously contained code is
|
|
overwritten with new code. You can choose to write code into
|
|
fresh memory, and just call this occasionally to discard large
|
|
chunks of old code all at once.
|
|
<p>
|
|
Warning: minimally tested. Also, doesn't interact well with the
|
|
cache simulator.
|
|
</ul>
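<p>As a rough illustration of how a client might use the permission
macros, here is a tiny fixed-size pool allocator. Several things in it
are assumptions rather than facts taken from this manual: that the
macros take an (address, length) pair, the build flag
<code>HAVE_VALGRIND_H</code>, and all the <code>pool_*</code> names.
When the header is not available, the sketch substitutes stand-ins
that return 0, which is what the real macros are documented to do when
not run on Valgrind.

```c
#include <assert.h>
#include <stddef.h>

/* HAVE_VALGRIND_H is a hypothetical build flag, not something
   Valgrind defines for you. */
#ifdef HAVE_VALGRIND_H
#include <valgrind.h>
#else
/* Stand-ins returning 0, as the real macros do off Valgrind.
   The (address, length) argument order is an assumption. */
#define VALGRIND_MAKE_NOACCESS(addr, len) ((void)(addr), (void)(len), 0)
#define VALGRIND_MAKE_WRITABLE(addr, len) ((void)(addr), (void)(len), 0)
#define VALGRIND_DISCARD(handle)          ((void)(handle), 0)
#endif

#define POOL_SIZE 256

/* A hypothetical bump allocator, for illustration only. */
struct pool {
    unsigned char bytes[POOL_SIZE];
    size_t used;     /* bytes handed out so far */
    int handle;      /* block handle returned for the whole buffer */
};

void pool_init(struct pool *p)
{
    assert(p != NULL);
    p->used = 0;
    /* Nothing has been handed out yet: the whole buffer is off limits. */
    p->handle = VALGRIND_MAKE_NOACCESS(p->bytes, POOL_SIZE);
}

void *pool_alloc(struct pool *p, size_t n)
{
    void *result;

    assert(p != NULL);
    if (n == 0 || n > POOL_SIZE - p->used)
        return NULL;
    result = p->bytes + p->used;
    p->used += n;
    /* The caller may now write here, but the contents are undefined. */
    (void) VALGRIND_MAKE_WRITABLE(result, n);
    return result;
}

void pool_reset(struct pool *p)
{
    assert(p != NULL);
    /* Stop describing errors in terms of the old block, then start over. */
    (void) VALGRIND_DISCARD(p->handle);
    pool_init(p);
}
```

<p>With something like this in place, a read of pool memory that was
never handed out, or a use of a block's contents before they are
written, should show up as an ordinary Valgrind error, even though the
pool never calls <code>malloc</code>.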
|
|
<p>
|
|
|
|
|
|
<a name="pthreads"></a>
|
|
<h3>2.9 Support for POSIX Pthreads</h3>
|
|
|
|
As of late April 02, Valgrind supports programs which use POSIX
|
|
pthreads. Doing this has proved technically challenging and is still
|
|
in progress, but it works well enough for significant threaded
applications to run.
|
|
<p>
|
|
It works as follows: threaded apps are (dynamically) linked against
|
|
<code>libpthread.so</code>. Usually this is the one installed with
|
|
your Linux distribution. Valgrind, however, supplies its own
|
|
<code>libpthread.so</code> and automatically connects your program to
|
|
it instead.
|
|
<p>
|
|
The fake <code>libpthread.so</code> and Valgrind cooperate to
|
|
implement a user-space pthreads package. This approach avoids the
|
|
horrible implementation problems of implementing a truly
|
|
multiprocessor version of Valgrind, but it does mean that threaded
|
|
apps run only on one CPU, even if you have a multiprocessor machine.
|
|
<p>
|
|
Valgrind schedules your threads in a round-robin fashion, with all
|
|
threads having equal priority. It switches threads every 20000 basic
|
|
blocks (typically around 120000 x86 instructions), which means you'll
|
|
get a much finer interleaving of thread executions than when run
|
|
natively. This in itself may cause your program to behave differently
|
|
if you have some kind of concurrency, critical race, locking, or
|
|
similar, bugs.
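<p>The scheduling policy just described amounts to something like the
following toy model. It is a sketch of round-robin with a fixed
timeslice measured in basic blocks, not Valgrind's scheduler; the
structure and function names are invented, and it ignores blocked or
finished threads entirely.

```c
#include <assert.h>
#include <stddef.h>

#define TIMESLICE_BBS 20000UL   /* basic blocks per timeslice */

/* Hypothetical scheduler state. */
struct scheduler {
    int n_threads;            /* all threads have equal priority */
    int current;              /* index of the running thread */
    unsigned long bbs_left;   /* basic blocks left in this timeslice */
};

void sched_init(struct scheduler *s, int n_threads)
{
    assert(s != NULL && n_threads > 0);
    s->n_threads = n_threads;
    s->current = 0;
    s->bbs_left = TIMESLICE_BBS;
}

/* Call once after each executed basic block.
   Returns the index of the thread that runs the next block. */
int sched_after_block(struct scheduler *s)
{
    assert(s != NULL);
    if (--s->bbs_left == 0) {
        /* Timeslice used up: move on to the next thread in turn. */
        s->current = (s->current + 1) % s->n_threads;
        s->bbs_left = TIMESLICE_BBS;
    }
    return s->current;
}
```

<p>The point to take away is that the switch happens after a fixed
amount of executed code rather than after a fixed amount of wall-clock
time, so the interleaving of threads is regular and fine-grained.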
|
|
<p>
|
|
The current (18 May 02) state of pthread support is as follows. Please
|
|
note that things are advancing rapidly, so the situation may have
|
|
improved by the time you read this -- check the web site for further
|
|
updates.
|
|
<ul>
|
|
<li>Mutexes, condition variables, thread-specific data,
|
|
<code>pthread_once</code> and basic semaphore functions
|
|
(<code>sem_*</code>) currently work.
|
|
<p>
|
|
<li>Various attribute-like calls are handled but ignored.
|
|
You get a warning message.
|
|
<p>
|
|
<li>The main big omission is proper cleanup support for cancellation.
|
|
<code>pthread_cancel</code> works, but instantly nukes the target
|
|
thread without giving it any chance to clean up. Also, when a
|
|
thread exits, it does not run any cleanup handlers.
|
|
<p>
|
|
<li>Other omissions are: the detachedness state of threads is ignored.
|
|
This means detached threads hang around and clog up scheduler
|
|
slots forever when they finish. Calls for reader-writer locks
have dummy stubs with no functionality right now. You get a
|
|
warning message.
|
|
<p>
|
|
<li>Currently the following syscalls are thread-safe (nonblocking):
|
|
<code>write</code> <code>read</code> <code>nanosleep</code>
|
|
<code>sleep</code> <code>select</code> and <code>poll</code>.
|
|
<p>
|
|
<li>The POSIX requirement that each thread have its own
|
|
signal-blocking mask is now implemented.
|
|
<code>pthread_sigmask</code>, <code>pthread_kill</code>,
|
|
<code>pthread_sigwait</code> and <code>raise</code> should all now
|
|
work as POSIX requires.
|
|
</ul>
|
|
|
|
|
|
As of 18 May 02, the following threaded programs now work fine on my
|
|
RedHat 7.2 box: Opera 6.0Beta2, KNode in KDE 3.0, Mozilla-0.9.2.1 and
|
|
Galeon-0.11.3, both as supplied with RedHat 7.2. Also Mozilla 1.0RC2.
|
|
|
|
|
|
<a name="install"></a>
|
|
<h3>2.10 Building and installing</h3>
|
|
|
|
We now use the standard Unix <code>./configure</code>,
|
|
<code>make</code>, <code>make install</code> mechanism, and I have
|
|
attempted to ensure that it works on machines with kernel 2.2 or 2.4
|
|
and glibc 2.1.X or 2.2.X. I don't think there is much else to say.
|
|
There are no options apart from the usual <code>--prefix</code> that
|
|
you should give to <code>./configure</code>.
|
|
<p>
|
|
Let me know if you have build problems.
|
|
|
|
|
|
|
|
<a name="problems"></a>
|
|
<h3>2.11 If you have problems</h3>
|
|
Mail me (<a href="mailto:jseward@acm.org">jseward@acm.org</a>).
|
|
|
|
<p>See <a href="#limits">Section 4</a> for the known limitations of
|
|
Valgrind, and for a list of programs which are known not to work on
|
|
it.
|
|
|
|
<p>The translator/instrumentor has a lot of assertions in it. They
|
|
are permanently enabled, and I have no plans to disable them. If one
|
|
of these breaks, please mail me!
|
|
|
|
<p>If you get an assertion failure on the expression
|
|
<code>chunkSane(ch)</code> in <code>vg_free()</code> in
|
|
<code>vg_malloc.c</code>, this may have happened because your program
|
|
wrote off the end of a malloc'd block, or before its beginning.
|
|
Valgrind should have emitted a proper message to that effect before
|
|
dying in this way. This is a known problem which I should fix.
|
|
<p>
|
|
|
|
<hr width="100%">
|
|
|
|
<a name="machine"></a>
|
|
<h2>3 Details of the checking machinery</h2>
|
|
|
|
Read this section if you want to know, in detail, exactly what and how
|
|
Valgrind is checking.
|
|
|
|
<a name="vvalue"></a>
|
|
<h3>3.1 Valid-value (V) bits</h3>
|
|
|
|
It is simplest to think of Valgrind implementing a synthetic Intel x86
|
|
CPU which is identical to a real CPU, except for one crucial detail.
|
|
Every bit (literally) of data processed, stored and handled by the
|
|
real CPU has, in the synthetic CPU, an associated "valid-value" bit,
|
|
which says whether or not the accompanying bit has a legitimate value.
|
|
In the discussions which follow, this bit is referred to as the V
|
|
(valid-value) bit.
|
|
|
|
<p>Each byte in the system therefore has 8 V bits which follow
|
|
it wherever it goes. For example, when the CPU loads a word-size item
|
|
(4 bytes) from memory, it also loads the corresponding 32 V bits from
|
|
a bitmap which stores the V bits for the process' entire address
|
|
space. If the CPU should later write the whole or some part of that
|
|
value to memory at a different address, the relevant V bits will be
|
|
stored back in the V-bit bitmap.
|
|
|
|
<p>In short, each bit in the system has an associated V bit, which
|
|
follows it around everywhere, even inside the CPU. Yes, the CPU's
|
|
(integer and <code>%eflags</code>) registers have their own V bit
|
|
vectors.
|
|
|
|
<p>Copying values around does not cause Valgrind to check for, or
|
|
report on, errors. However, when a value is used in a way which might
|
|
conceivably affect the outcome of your program's computation, the
|
|
associated V bits are immediately checked. If any of these indicate
|
|
that the value is undefined, an error is reported.
|
|
|
|
<p>Here's an (admittedly nonsensical) example:
|
|
<pre>
|
|
int i, j;
|
|
int a[10], b[10];
|
|
for (i = 0; i < 10; i++) {
|
|
j = a[i];
|
|
b[i] = j;
|
|
}
|
|
</pre>
|
|
|
|
<p>Valgrind emits no complaints about this, since it merely copies
|
|
uninitialised values from <code>a[]</code> into <code>b[]</code>, and
|
|
doesn't use them in any way. However, if the loop is changed to
|
|
<pre>
|
|
for (i = 0; i < 10; i++) {
|
|
j += a[i];
|
|
}
|
|
if (j == 77)
|
|
printf("hello there\n");
|
|
</pre>
|
|
then Valgrind will complain, at the <code>if</code>, that the
|
|
condition depends on uninitialised values.
|
|
|
|
<p>Most low level operations, such as adds, cause Valgrind to
|
|
use the V bits for the operands to calculate the V bits for the
|
|
result. Even if the result is partially or wholly undefined,
|
|
it does not complain.
|
|
|
|
<p>Checks on definedness only occur in two places: when a value is
|
|
used to generate a memory address, and where a control flow decision
needs to be made. Also, when a system call is detected, Valgrind
|
|
checks definedness of parameters as required.
|
|
|
|
<p>If a check should detect undefinedness, an error message is
|
|
issued. The resulting value is subsequently regarded as well-defined.
|
|
To do otherwise would give long chains of error messages. In effect,
|
|
we say that undefined values are non-infectious.
|
|
|
|
<p>This sounds overcomplicated. Why not just check all reads from
|
|
memory, and complain if an undefined value is loaded into a CPU register?
|
|
Well, that doesn't work well, because perfectly legitimate C programs routinely
|
|
copy uninitialised values around in memory, and we don't want endless complaints
|
|
about that. Here's the canonical example. Consider a struct
|
|
like this:
|
|
<pre>
|
|
struct S { int x; char c; };
|
|
struct S s1, s2;
|
|
s1.x = 42;
|
|
s1.c = 'z';
|
|
s2 = s1;
|
|
</pre>
|
|
|
|
<p>The question to ask is: how large is <code>struct S</code>, in
|
|
bytes? An int is 4 bytes and a char one byte, so perhaps a struct S
|
|
occupies 5 bytes? Wrong. All (non-toy) compilers I know of will
|
|
round the size of <code>struct S</code> up to a whole number of words,
|
|
in this case 8 bytes. Not doing this forces compilers to generate
|
|
truly appalling code for subscripting arrays of <code>struct
|
|
S</code>'s.
|
|
|
|
<p>So s1 occupies 8 bytes, yet only 5 of them will be initialised.
|
|
For the assignment <code>s2 = s1</code>, gcc generates code to copy
|
|
all 8 bytes wholesale into <code>s2</code> without regard for their
|
|
meaning. If Valgrind simply checked values as they came out of
|
|
memory, it would yelp every time a structure assignment like this
|
|
happened. So the more complicated semantics described above is
|
|
necessary. This allows gcc to copy <code>s1</code> into
|
|
<code>s2</code> any way it likes, and a warning will only be emitted
|
|
if the uninitialised values are later used.
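<p>If you want to see the padding for yourself, something like the
following will do. It is a small sketch; the helper names are invented,
and the exact sizes depend on your compiler and platform (on the x86
systems discussed here I would expect 3 bytes of padding).

```c
#include <assert.h>
#include <stddef.h>

struct S { int x; char c; };

/* Number of bytes in a struct S that belong to neither member. */
size_t s_padding_bytes(void)
{
    return sizeof(struct S) - sizeof(int) - sizeof(char);
}

/* Member-by-member initialisation: the padding bytes are never
   written, so they stay undefined. */
void s_init(struct S *s)
{
    assert(s != NULL);
    s->x = 42;
    s->c = 'z';
}
```

<p>After <code>s_init(&amp;s1)</code>, an assignment such as
<code>s2 = s1</code> may copy those untouched padding bytes along with
the two members, which is exactly the harmless copying of undefined
data that Valgrind has to tolerate.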
|
|
|
|
<p>One final twist to this story. The above scheme allows garbage to
|
|
pass through the CPU's integer registers without complaint. It does
|
|
this by giving the integer registers V tags, passing these around in
|
|
the expected way. This is complicated and computationally expensive to
|
|
do, but is necessary. Valgrind is more simplistic about
|
|
floating-point loads and stores. In particular, V bits for data read
|
|
as a result of floating-point loads are checked at the load
|
|
instruction. So if your program uses the floating-point registers to
|
|
do memory-to-memory copies, you will get complaints about
|
|
uninitialised values. Fortunately, I have not yet encountered a
|
|
program which (ab)uses the floating-point registers in this way.
|
|
|
|
<a name="vaddress"></a>
|
|
<h3>3.2 Valid-address (A) bits</h3>
|
|
|
|
Notice that the previous section describes how the validity of values
|
|
is established and maintained without having to say whether the
|
|
program does or does not have the right to access any particular
|
|
memory location. We now consider the latter issue.
|
|
|
|
<p>As described above, every bit in memory or in the CPU has an
|
|
associated valid-value (V) bit. In addition, all bytes in memory, but
|
|
not in the CPU, have an associated valid-address (A) bit. This
|
|
indicates whether or not the program can legitimately read or write
|
|
that location.  It does not give any indication of the validity of the
|
|
data at that location -- that's the job of the V bits -- only whether
|
|
or not the location may be accessed.
|
|
|
|
<p>Every time your program reads or writes memory, Valgrind checks the
|
|
A bits associated with the address. If any of them indicate an
|
|
invalid address, an error is emitted. Note that the reads and writes
|
|
themselves do not change the A bits, only consult them.
|
|
|
|
<p>So how do the A bits get set/cleared? Like this:
|
|
|
|
<ul>
|
|
<li>When the program starts, all the global data areas are marked as
|
|
accessible.</li><br>
|
|
<p>
|
|
|
|
<li>When the program does malloc/new, the A bits for exactly the
|
|
area allocated, and not a byte more, are marked as accessible.
|
|
Upon freeing the area the A bits are changed to indicate
|
|
inaccessibility.</li><br>
|
|
<p>
|
|
|
|
<li>When the stack pointer register (%esp) moves up or down, A bits
|
|
are set. The rule is that the area from %esp up to the base of
|
|
the stack is marked as accessible, and below %esp is
|
|
inaccessible. (If that sounds illogical, bear in mind that the
|
|
stack grows down, not up, on almost all Unix systems, including
|
|
GNU/Linux.) Tracking %esp like this has the useful side-effect
|
|
that the section of stack used by a function for local variables
|
|
etc is automatically marked accessible on function entry and
|
|
inaccessible on exit.</li><br>
|
|
<p>
|
|
|
|
<li>When doing system calls, A bits are changed appropriately. For
|
|
example, mmap() magically makes files appear in the process's
|
|
address space, so the A bits must be updated if mmap()
|
|
succeeds.</li><br>
|
|
<p>
|
|
|
|
<li>Optionally, your program can tell Valgrind about such changes
|
|
explicitly, using the client request mechanism described above.
|
|
</ul>
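<p>The rules above can be pictured with a toy model. This is only a
sketch under simplifying assumptions: a 256-byte "address space", one
whole byte of storage per A bit, and invented names
(<code>toy_make_accessible</code> and friends). Valgrind's real data
structures are different.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_MEM_SIZE 256   /* size of the toy address space, in bytes */

static unsigned char a_bits[TOY_MEM_SIZE];   /* 1 = accessible, 0 = not */

/* Mark [addr, addr+len) accessible, as after a successful malloc. */
void toy_make_accessible(size_t addr, size_t len)
{
    assert(addr <= TOY_MEM_SIZE && len <= TOY_MEM_SIZE - addr);
    memset(a_bits + addr, 1, len);
}

/* Mark [addr, addr+len) inaccessible, as after free. */
void toy_make_inaccessible(size_t addr, size_t len)
{
    assert(addr <= TOY_MEM_SIZE && len <= TOY_MEM_SIZE - addr);
    memset(a_bits + addr, 0, len);
}

/* Return 1 if every byte of the access is accessible, else 0.
   The A bits are only consulted here, never changed. */
int toy_access_ok(size_t addr, size_t len)
{
    size_t i;
    if (addr > TOY_MEM_SIZE || len > TOY_MEM_SIZE - addr)
        return 0;
    for (i = 0; i < len; i++)
        if (!a_bits[addr + i])
            return 0;
    return 1;
}
```

<p>Marking 10 bytes at address 16 and then checking an 11-byte access
at the same address fails, which is the "not a byte more" rule in
miniature.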
|
|
|
|
|
|
<a name="together"></a>
|
|
<h3>3.3 Putting it all together</h3>
|
|
Valgrind's checking machinery can be summarised as follows:
|
|
|
|
<ul>
|
|
<li>Each byte in memory has 8 associated V (valid-value) bits,
|
|
saying whether or not the byte has a defined value, and a single
|
|
A (valid-address) bit, saying whether or not the program
|
|
currently has the right to read/write that address.</li><br>
|
|
<p>
|
|
|
|
<li>When memory is read or written, the relevant A bits are
|
|
consulted. If they indicate an invalid address, Valgrind emits
|
|
an Invalid read or Invalid write error.</li><br>
|
|
<p>
|
|
|
|
<li>When memory is read into the CPU's integer registers, the
|
|
relevant V bits are fetched from memory and stored in the
|
|
simulated CPU. They are not consulted.</li><br>
|
|
<p>
|
|
|
|
<li>When an integer register is written out to memory, the V bits
|
|
for that register are written back to memory too.</li><br>
|
|
<p>
|
|
|
|
<li>When memory is read into the CPU's floating point registers, the
|
|
relevant V bits are read from memory and they are immediately
|
|
checked. If any are invalid, an uninitialised value error is
|
|
emitted. This precludes using the floating-point registers to
|
|
copy possibly-uninitialised memory, but simplifies Valgrind in
|
|
that it does not have to track the validity status of the
|
|
floating-point registers.</li><br>
|
|
<p>
|
|
|
|
<li>As a result, when a floating-point register is written to
|
|
memory, the associated V bits are set to indicate a valid
|
|
value.</li><br>
|
|
<p>
|
|
|
|
<li>When values in integer CPU registers are used to generate a
|
|
memory address, or to determine the outcome of a conditional
|
|
branch, the V bits for those values are checked, and an error
|
|
emitted if any of them are undefined.</li><br>
|
|
<p>
|
|
|
|
<li>When values in integer CPU registers are used for any other
|
|
purpose, Valgrind computes the V bits for the result, but does
|
|
not check them.</li><br>
|
|
<p>
|
|
|
|
<li>Once the V bits for a value in the CPU have been checked, they
|
|
are then set to indicate validity. This avoids long chains of
|
|
errors.</li><br>
|
|
<p>
|
|
|
|
<li>When values are loaded from memory, Valgrind checks the A bits
|
|
for that location and issues an illegal-address warning if
|
|
needed. In that case, the V bits loaded are forced to indicate
|
|
Valid, despite the location being invalid.
|
|
<p>
|
|
This apparently strange choice reduces the amount of confusing
|
|
information presented to the user. It avoids the
|
|
unpleasant phenomenon in which memory is read from a place which
|
|
is both unaddressible and contains invalid values, and, as a
|
|
result, you get not only an invalid-address (read/write) error,
|
|
but also a potentially large set of uninitialised-value errors,
|
|
one for every time the value is used.
|
|
<p>
|
|
There is a hazy boundary case to do with multi-byte loads from
|
|
addresses which are partially valid and partially invalid. See
|
|
    the description of the flag <code>--partial-loads-ok</code> for details.
|
|
</li><br>
|
|
</ul>
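<p>The integer-register rules in this summary (copy V bits on load and
store, check only on use, then mark valid) can be sketched as a toy
model. Everything here is hypothetical: the names, the 16-word memory,
and the convention that a 1 bit in a V tag means "undefined" are choices
made for this sketch only.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_WORDS 16

/* A register value together with its V tag.  In this toy, a 1 bit in
   v means "the corresponding bit of val is undefined". */
typedef struct { uint32_t val; uint32_t v; } toy_reg;

static uint32_t mem_val[TOY_WORDS];   /* toy memory                  */
static uint32_t mem_v[TOY_WORDS];     /* shadow V bits for each word */
static unsigned toy_errors;           /* errors reported so far      */

/* Start with all of memory undefined and no errors. */
void toy_reset(void)
{
    unsigned i;
    for (i = 0; i < TOY_WORDS; i++) { mem_val[i] = 0; mem_v[i] = 0xFFFFFFFFu; }
    toy_errors = 0;
}

/* A constant is fully defined. */
toy_reg toy_const(uint32_t value)
{
    toy_reg r; r.val = value; r.v = 0; return r;
}

/* Load: V bits travel with the value and are NOT checked. */
toy_reg toy_load(unsigned idx)
{
    toy_reg r;
    assert(idx < TOY_WORDS);
    r.val = mem_val[idx]; r.v = mem_v[idx];
    return r;
}

/* Store: the register's V bits are written back too. */
void toy_store(unsigned idx, toy_reg r)
{
    assert(idx < TOY_WORDS);
    mem_val[idx] = r.val; mem_v[idx] = r.v;
}

/* Conditional branch: this is a "use", so the V bits are checked.
   After reporting, the value is marked valid to avoid error chains. */
int toy_branch_if_nonzero(toy_reg *r)
{
    if (r->v != 0) { toy_errors++; r->v = 0; }
    return r->val != 0;
}

unsigned toy_error_count(void) { return toy_errors; }
```

<p>Copying an undefined word with <code>toy_store(1, toy_load(0))</code>
reports nothing; only a later <code>toy_branch_if_nonzero</code> on that
value counts an error, and only once.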
|
|
|
|
Valgrind intercepts calls to malloc, calloc, realloc, valloc,
|
|
memalign, free, new and delete. The behaviour you get is:
|
|
|
|
<ul>
|
|
|
|
<li>malloc/new: the returned memory is marked as addressible but not
|
|
having valid values. This means you have to write on it before
|
|
you can read it.</li><br>
|
|
<p>
|
|
|
|
<li>calloc: returned memory is marked both addressible and valid,
|
|
since calloc() clears the area to zero.</li><br>
|
|
<p>
|
|
|
|
<li>realloc: if the new size is larger than the old, the new section
|
|
is addressible but invalid, as with malloc.</li><br>
|
|
<p>
|
|
|
|
<li>If the new size is smaller, the dropped-off section is marked as
|
|
unaddressible. You may only pass to realloc a pointer
|
|
previously issued to you by malloc/calloc/new/realloc.</li><br>
|
|
<p>
|
|
|
|
<li>free/delete: you may only pass to free a pointer previously
|
|
issued to you by malloc/calloc/new/realloc, or the value
|
|
NULL. Otherwise, Valgrind complains. If the pointer is indeed
|
|
valid, Valgrind marks the entire area it points at as
|
|
unaddressible, and places the block in the freed-blocks-queue.
|
|
The aim is to defer as long as possible reallocation of this
|
|
block. Until that happens, all attempts to access it will
|
|
elicit an invalid-address error, as you would hope.</li><br>
|
|
</ul>
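<p>As an illustration of code that stays within these rules, here is a
small sketch. The function names are invented for the example; only the
standard <code>calloc</code>, <code>realloc</code> and <code>free</code>
calls are real.

```c
#include <assert.h>
#include <stdlib.h>

/* calloc'd memory is both addressible and valid: safe to read at once. */
int *make_counts(size_t n)
{
    assert(n > 0);
    return calloc(n, sizeof(int));
}

/* Grow the array.  The new tail is addressible but has no valid values
   yet, so write it before anything reads it. */
int *grow_counts(int *counts, size_t old_n, size_t new_n)
{
    int *bigger;
    size_t i;
    assert(new_n > 0 && new_n >= old_n);
    bigger = realloc(counts, new_n * sizeof *bigger);
    if (bigger == NULL)
        return NULL;              /* the original block is still valid */
    for (i = old_n; i < new_n; i++)
        bigger[i] = 0;
    return bigger;
}

/* Reads every element, so every element must have been written. */
long sum_counts(const int *counts, size_t n)
{
    long total = 0;
    size_t i;
    for (i = 0; i < n; i++)
        total += counts[i];
    return total;
}
```

<p>If the initialising loop in <code>grow_counts</code> were removed,
one would expect a later branch on the result of
<code>sum_counts</code> to draw an uninitialised-value complaint, and
any access to the block after <code>free</code> to draw an
invalid-address error.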
|
|
|
|
|
|
|
|
<a name="signals"></a>
|
|
<h3>3.4 Signals</h3>
|
|
|
|
Valgrind provides suitable handling of signals, so, provided you stick
|
|
to POSIX stuff, you should be ok. Basic sigaction() and sigprocmask()
|
|
are handled. Signal handlers may return in the normal way or do
|
|
longjmp(); both should work ok. As specified by POSIX, a signal is
|
|
blocked in its own handler. Default actions for signals should work
|
|
as before. Etc, etc.
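<p>For reference, this is the kind of plain POSIX usage the paragraph
above has in mind. It is a sketch only: the function
<code>handler_runs_for</code> is a made-up name, and the restriction to
the two user signals is a choice made for the example.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>

static volatile sig_atomic_t got_signal;   /* set by the handler */

static void on_signal(int signo)
{
    (void)signo;
    got_signal = 1;
}

/* Install a handler with sigaction(), raise the signal, restore the old
   disposition, and report whether the handler ran.
   Returns 1 if it ran, 0 if not, -1 if sigaction() failed. */
int handler_runs_for(int signo)
{
    struct sigaction sa, old;

    assert(signo == SIGUSR1 || signo == SIGUSR2);

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_signal;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    if (sigaction(signo, &sa, &old) != 0)
        return -1;

    got_signal = 0;
    raise(signo);   /* the handler returns before raise() does */

    sigaction(signo, &old, NULL);
    return got_signal ? 1 : 0;
}
```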
|
|
|
|
<p>Under the hood, dealing with signals is a real pain, and Valgrind's
|
|
simulation leaves much to be desired. If your program does
|
|
way-strange stuff with signals, bad things may happen. If so, let me
|
|
know. I don't promise to fix it, but I'd at least like to be aware of
|
|
it.
|
|
|
|
|
|
<a name="leaks"></a>
|
|
<h3>3.5 Memory leak detection</h3>
|
|
|
|
Valgrind keeps track of all memory blocks issued in response to calls
|
|
to malloc/calloc/realloc/new. So when the program exits, it knows
|
|
which blocks are still outstanding -- have not been returned, in other
|
|
words. Ideally, you want your program to have no blocks still in use
|
|
at exit. But many programs do.
|
|
|
|
<p>For each such block, Valgrind scans the entire address space of the
|
|
process, looking for pointers to the block. One of three situations
|
|
may result:
|
|
|
|
<ul>
|
|
<li>A pointer to the start of the block is found. This usually
|
|
indicates programming sloppiness; since the block is still
|
|
    pointed at, the programmer could, at least in principle, have free'd
|
|
it before program exit.</li><br>
|
|
<p>
|
|
|
|
<li>A pointer to the interior of the block is found. The pointer
|
|
might originally have pointed to the start and have been moved
|
|
along, or it might be entirely unrelated. Valgrind deems such a
|
|
block as "dubious", that is, possibly leaked,
|
|
because it's unclear whether or
|
|
not a pointer to it still exists.</li><br>
|
|
<p>
|
|
|
|
<li>The worst outcome is that no pointer to the block can be found.
|
|
The block is classified as "leaked", because the
|
|
programmer could not possibly have free'd it at program exit,
|
|
since no pointer to it exists. This might be a symptom of
|
|
having lost the pointer at some earlier point in the
|
|
program.</li>
|
|
</ul>
|
|
|
|
Valgrind reports summaries about leaked and dubious blocks.
|
|
For each such block, it will also tell you where the block was
|
|
allocated. This should help you figure out why the pointer to it has
|
|
been lost. In general, you should attempt to ensure your programs do
|
|
not have any leaked or dubious blocks at exit.
|
|
|
|
<p>The precise area of memory in which Valgrind searches for pointers
|
|
is: all naturally-aligned 4-byte words for which all A bits indicate
|
|
addressibility and all V bits indicate that the stored value is
|
|
actually valid.
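<p>The three-way classification can be sketched as a scan over
candidate words. This is a toy, not Valgrind's leak checker: the names
are invented, and the caller is assumed to have already filtered the
words down to those that are aligned, addressible and valid.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum toy_leak_kind { TOY_REACHABLE, TOY_DUBIOUS, TOY_LEAKED };

/* Classify the block [start, start+size) by scanning nwords candidate
   pointer values. */
enum toy_leak_kind toy_classify(uintptr_t start, size_t size,
                                const uintptr_t *words, size_t nwords)
{
    enum toy_leak_kind kind = TOY_LEAKED;   /* until a pointer turns up */
    size_t i;

    assert(size > 0);
    for (i = 0; i < nwords; i++) {
        if (words[i] == start)
            return TOY_REACHABLE;           /* pointer to the start */
        if (words[i] > start && words[i] - start < size)
            kind = TOY_DUBIOUS;             /* interior pointer: keep looking */
    }
    return kind;
}
```

<p>A start pointer wins over any interior pointer, so the scan can stop
as soon as one is found.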
|
|
|
|
<p><hr width="100%">
|
|
|
|
|
|
<a name="limits"></a>
|
|
<h2>4 Limitations</h2>
|
|
|
|
The following list of limitations seems depressingly long. However,
|
|
most programs actually work fine.
|
|
|
|
<p>Valgrind will run x86-GNU/Linux ELF dynamically linked binaries, on
|
|
a kernel 2.2.X or 2.4.X system, subject to the following constraints:
|
|
|
|
<ul>
|
|
<li>No MMX, SSE, SSE2, 3DNow instructions. If the translator
|
|
encounters these, Valgrind will simply give up. It may be
|
|
possible to add support for them at a later time. Intel added a
|
|
few instructions such as "cmov" to the integer instruction set
|
|
on Pentium and later processors, and these are supported.
|
|
Nevertheless it's safest to think of Valgrind as implementing
|
|
the 486 instruction set.</li><br>
|
|
<p>
|
|
|
|
<li>Pthreads support is improving, but there are still significant
|
|
limitations in that department. See the section above on
|
|
Pthreads. Note that your program must be dynamically linked
|
|
against <code>libpthread.so</code>, so that Valgrind can
|
|
substitute its own implementation at program startup time. If
|
|
you're statically linked against it, things will fail
|
|
badly.</li><br>
|
|
<p>
|
|
|
|
<li>Valgrind assumes that the floating point registers are not used
|
|
as intermediaries in memory-to-memory copies, so it immediately
|
|
checks V bits in floating-point loads/stores. If you want to
|
|
write code which copies around possibly-uninitialised values,
|
|
you must ensure these travel through the integer registers, not
|
|
the FPU.</li><br>
|
|
<p>
|
|
|
|
<li>If your program does its own memory management, rather than
|
|
using malloc/new/free/delete, it should still work, but
|
|
Valgrind's error checking won't be so effective.</li><br>
|
|
<p>
|
|
|
|
<li>Valgrind's signal simulation is not as robust as it could be.
|
|
Basic POSIX-compliant sigaction and sigprocmask functionality is
|
|
supplied, but it's conceivable that things could go badly awry
|
|
    if you do weird things with signals.  Workaround: don't.
|
|
Programs that do non-POSIX signal tricks are in any case
|
|
inherently unportable, so should be avoided if
|
|
possible.</li><br>
|
|
<p>
|
|
|
|
<li>Programs which switch stacks are not well handled. Valgrind
|
|
does have support for this, but I don't have great faith in it.
|
|
It's difficult -- there's no cast-iron way to decide whether a
|
|
large change in %esp is as a result of the program switching
|
|
stacks, or merely allocating a large object temporarily on the
|
|
current stack -- yet Valgrind needs to handle the two situations
|
|
differently. 1 May 02: this probably interacts badly with the
|
|
new pthread support. I haven't checked properly.</li><br>
|
|
<p>
|
|
|
|
<li>x86 instructions, and system calls, have been implemented on
|
|
demand. So it's possible, although unlikely, that a program
|
|
will fall over with a message to that effect. If this happens,
|
|
please mail me ALL the details printed out, so I can try and
|
|
implement the missing feature.</li><br>
|
|
<p>
|
|
|
|
<li>x86 floating point works correctly, but floating-point code may
|
|
run even more slowly than integer code, due to my simplistic
|
|
approach to FPU emulation.</li><br>
|
|
<p>
|
|
|
|
<li>You can't Valgrind-ize statically linked binaries. Valgrind
|
|
relies on the dynamic-link mechanism to gain control at
|
|
startup.</li><br>
|
|
<p>
|
|
|
|
<li>Memory consumption of your program is majorly increased whilst
|
|
running under Valgrind. This is due to the large amount of
|
|
    administrative information maintained behind the scenes.  Another
|
|
cause is that Valgrind dynamically translates the original
|
|
executable. Translated, instrumented code is 14-16 times larger
|
|
than the original (!) so you can easily end up with 30+ MB of
|
|
translations when running (eg) a web browser.
|
|
</li>
|
|
</ul>
|
|
|
|
|
|
Programs which are known not to work are:
|
|
|
|
<ul>
|
|
<li>emacs starts up but immediately concludes it is out of memory
|
|
    and aborts.  Emacs has its own memory-management scheme, but I
|
|
don't understand why this should interact so badly with
|
|
Valgrind. Emacs works fine if you build it to use the standard
|
|
malloc/free routines.</li><br>
|
|
<p>
|
|
</ul>
|
|
|
|
|
|
<p><hr width="100%">
|
|
|
|
|
|
<a name="howitworks"></a>
|
|
<h2>5 How it works -- a rough overview</h2>
|
|
Some gory details, for those with a passion for gory details. You
|
|
don't need to read this section if all you want to do is use Valgrind.
|
|
|
|
<a name="startb"></a>
|
|
<h3>5.1 Getting started</h3>
|
|
|
|
Valgrind is compiled into a shared object, valgrind.so. The shell
|
|
script valgrind sets the LD_PRELOAD environment variable to point to
|
|
valgrind.so. This causes the .so to be loaded as an extra library to
|
|
any subsequently executed dynamically-linked ELF binary, viz, the
|
|
program you want to debug.
|
|
|
|
<p>The dynamic linker allows each .so in the process image to have an
|
|
initialisation function which is run before main(). It also allows
|
|
each .so to have a finalisation function run after main() exits.
|
|
|
|
<p>When valgrind.so's initialisation function is called by the dynamic
|
|
linker, the synthetic CPU starts up.  The real CPU remains locked
|
|
in valgrind.so for the entire rest of the program, but the synthetic
|
|
CPU returns from the initialisation function. Startup of the program
|
|
now continues as usual -- the dynamic linker calls all the other .so's
|
|
initialisation routines, and eventually runs main(). This all runs on
|
|
the synthetic CPU, not the real one, but the client program cannot
|
|
tell the difference.
|
|
|
|
<p>Eventually main() exits, so the synthetic CPU calls valgrind.so's
|
|
finalisation function. Valgrind detects this, and uses it as its cue
|
|
to exit. It prints summaries of all errors detected, possibly checks
|
|
for memory leaks, and then exits the finalisation routine, but now on
|
|
the real CPU. The synthetic CPU has now lost control -- permanently
|
|
-- so the program exits back to the OS on the real CPU, just as it
|
|
would have done anyway.
|
|
|
|
<p>On entry, Valgrind switches stacks, so it runs on its own stack.
|
|
On exit, it switches back. This means that the client program
|
|
continues to run on its own stack, so we can switch back and forth
|
|
between running it on the simulated and real CPUs without difficulty.
|
|
This was an important design decision, because it makes it easy (well,
|
|
significantly less difficult) to debug the synthetic CPU.
|
|
|
|
|
|
<a name="engine"></a>
|
|
<h3>5.2 The translation/instrumentation engine</h3>
|
|
|
|
Valgrind does not directly run any of the original program's code. Only
|
|
instrumented translations are run. Valgrind maintains a translation
|
|
table, which allows it to find the translation quickly for any branch
|
|
target (code address). If no translation has yet been made, the
|
|
translator - a just-in-time translator - is summoned. This makes an
|
|
instrumented translation, which is added to the collection of
|
|
translations. Subsequent jumps to that address will use this
|
|
translation.
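<p>The find-or-translate step might be pictured like this. It is a
sketch under stated assumptions: a fixed 64-slot table with linear
probing, integer IDs standing in for translated code, and a stub in
place of the JIT. None of the names come from the Valgrind sources.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TT_SIZE 64   /* number of slots; must be a power of two */

typedef struct {
    uint32_t orig_addr;       /* code address in the original program */
    int      translation_id;  /* stands in for the translated code    */
    int      in_use;
} tt_entry;

static tt_entry tt[TT_SIZE];
static int      tt_next_id = 1;
static unsigned tt_made;      /* how many times the "JIT" was summoned */

/* Stub for the translator: a real one would emit instrumented code. */
static int toy_translate(uint32_t orig_addr)
{
    (void)orig_addr;
    tt_made++;
    return tt_next_id++;
}

/* Return the translation for orig_addr, making one on the first visit. */
int tt_find_or_translate(uint32_t orig_addr)
{
    size_t i, slot = orig_addr & (TT_SIZE - 1);

    for (i = 0; i < TT_SIZE; i++) {
        tt_entry *e = &tt[(slot + i) & (TT_SIZE - 1)];
        if (e->in_use && e->orig_addr == orig_addr)
            return e->translation_id;            /* already translated */
        if (!e->in_use) {
            e->orig_addr = orig_addr;
            e->translation_id = toy_translate(orig_addr);
            e->in_use = 1;
            return e->translation_id;
        }
    }
    assert(0 && "toy translation table is full");
    return -1;
}

unsigned tt_translation_count(void) { return tt_made; }
```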
|
|
|
|
<p>Valgrind no longer directly supports detection of self-modifying
|
|
code. Such checking is expensive, and in practice (fortunately)
|
|
almost no applications need it. However, to help people who are
|
|
debugging dynamic code generation systems, there is a Client Request
|
|
(basically a macro you can put in your program) which directs Valgrind
|
|
to discard translations in a given address range. So Valgrind can
|
|
still work in this situation provided the client tells it when
|
|
code has become out-of-date and needs to be retranslated.
|
|
|
|
<p>The JITter translates basic blocks -- blocks of straight-line-code
|
|
-- as single entities. To minimise the considerable difficulties of
|
|
dealing with the x86 instruction set, x86 instructions are first
|
|
translated to a RISC-like intermediate code, similar to sparc code,
|
|
but with an infinite number of virtual integer registers. Initially
|
|
each insn is translated separately, and there is no attempt at
|
|
instrumentation.
|
|
|
|
<p>The intermediate code is improved, mostly so as to try and cache
|
|
the simulated machine's registers in the real machine's registers over
|
|
several simulated instructions. This is often very effective. Also,
|
|
we try to remove redundant updates of the simulated machine's
|
|
condition-code register.
|
|
|
|
<p>The intermediate code is then instrumented, giving more
|
|
intermediate code. There are a few extra intermediate-code operations
|
|
to support instrumentation; it is all refreshingly simple. After
|
|
instrumentation there is a cleanup pass to remove redundant value
|
|
checks.
|
|
|
|
<p>This gives instrumented intermediate code which mentions arbitrary
|
|
numbers of virtual registers. A linear-scan register allocator is
|
|
used to assign real registers and possibly generate spill code. All
|
|
of this is still phrased in terms of the intermediate code. This
|
|
machinery is inspired by the work of Reuben Thomas (MITE).
|
|
|
|
<p>Then, and only then, is the final x86 code emitted. The
|
|
intermediate code is carefully designed so that x86 code can be
|
|
generated from it without need for spare registers or other
|
|
inconveniences.
|
|
|
|
<p>The translations are managed using a traditional LRU-based caching
|
|
scheme. The translation cache has a default size of about 14MB.
|
|
|
|
<a name="track"></a>
|
|
|
|
<h3>5.3 Tracking the status of memory</h3> Each byte in the
|
|
process' address space has nine bits associated with it: one A bit and
|
|
eight V bits. The A and V bits for each byte are stored using a
|
|
sparse array, which flexibly and efficiently covers arbitrary parts of
|
|
the 32-bit address space without imposing significant space or
|
|
performance overheads for the parts of the address space never
|
|
visited. The scheme used, and speedup hacks, are described in detail
|
|
at the top of the source file vg_memory.c, so you should read that for
|
|
the gory details.
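<p>To give a flavour of what a sparse array over a 32-bit address space
can look like, here is a two-level sketch. It is an assumption-laden
toy, not the scheme in vg_memory.c: the 64KB chunk size, the
byte-per-A-bit storage and all the names are invented for the example.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define SEC_BITS 16
#define SEC_SIZE (1u << SEC_BITS)          /* bytes covered by one chunk */
#define PRI_SIZE (1u << (32 - SEC_BITS))   /* chunks in a 32-bit space   */

typedef struct {
    uint8_t abit[SEC_SIZE];    /* one A bit per byte (stored as a byte) */
    uint8_t vbyte[SEC_SIZE];   /* eight V bits per byte                 */
} sec_map;

static sec_map *primary[PRI_SIZE];   /* NULL = chunk never touched */

/* Find the chunk for addr, optionally allocating it on first touch. */
static sec_map *sec_for(uint32_t addr, int create)
{
    sec_map **slot = &primary[addr >> SEC_BITS];
    if (*slot == NULL && create) {
        *slot = calloc(1, sizeof **slot);   /* zeroed: nothing accessible */
        if (*slot == NULL)
            abort();
    }
    return *slot;
}

/* Record the A bit and V bits for one byte of the address space. */
void shadow_set(uint32_t addr, uint8_t abit, uint8_t vbyte)
{
    sec_map *sm = sec_for(addr, 1);
    assert(abit <= 1);
    sm->abit[addr & (SEC_SIZE - 1)] = abit;
    sm->vbyte[addr & (SEC_SIZE - 1)] = vbyte;
}

/* Untouched chunks read as inaccessible and cost no memory. */
uint8_t shadow_abit(uint32_t addr)
{
    const sec_map *sm = sec_for(addr, 0);
    return sm ? sm->abit[addr & (SEC_SIZE - 1)] : 0;
}

/* Count allocated chunks, to show how sparse the map stays. */
unsigned shadow_chunks_in_use(void)
{
    unsigned i, n = 0;
    for (i = 0; i < PRI_SIZE; i++)
        if (primary[i] != NULL)
            n++;
    return n;
}
```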
|
|
|
|
<a name="sys_calls"></a>
|
|
|
|
<h3>5.4 System calls</h3>
|
|
All system calls are intercepted. The memory status map is consulted
|
|
before and updated after each call. It's all rather tiresome. See
|
|
vg_syscall_mem.c for details.
|
|
|
|
<a name="sys_signals"></a>
|
|
|
|
<h3>5.5 Signals</h3>
|
|
All system calls to sigaction() and sigprocmask() are intercepted. If
|
|
the client program is trying to set a signal handler, Valgrind makes a
|
|
note of the handler address and which signal it is for. Valgrind then
|
|
arranges for the same signal to be delivered to its own handler.
|
|
|
|
<p>When such a signal arrives, Valgrind's own handler catches it, and
|
|
notes the fact. At a convenient safe point in execution, Valgrind
|
|
builds a signal delivery frame on the client's stack and runs its
|
|
handler. If the handler longjmp()s, there is nothing more to be said.
|
|
If the handler returns, Valgrind notices this, zaps the delivery
|
|
frame, and carries on where it left off before delivering the signal.
|
|
|
|
<p>The purpose of this nonsense is that setting signal handlers
|
|
essentially amounts to giving callback addresses to the Linux kernel.
|
|
We can't allow this to happen, because if it did, signal handlers
|
|
would run on the real CPU, not the simulated one. This means the
|
|
checking machinery would not operate during the handler run, and,
|
|
worse, memory permissions maps would not be updated, which could cause
|
|
spurious error reports once the handler had returned.
|
|
|
|
<p>An even worse thing would happen if the signal handler longjmp'd
|
|
rather than returned: Valgrind would completely lose control of the
|
|
client program.
|
|
|
|
<p>Upshot: we can't allow the client to install signal handlers
|
|
directly. Instead, Valgrind must catch, on behalf of the client, any
|
|
signal the client asks to catch, and must deliver it to the client on
|
|
the simulated CPU, not the real one. This involves considerable
|
|
gruesome fakery; see vg_signals.c for details.
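<p>The note-now, deliver-later idea can be modelled without any real
signals. The sketch below is purely illustrative: the names, the
32-entry tables and the sample client handler are all invented, and the
real machinery in vg_signals.c also has to build a delivery frame on the
client's stack, which this toy skips entirely.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_NSIG 32

typedef void (*toy_handler)(int);

static toy_handler client_handler[TOY_NSIG];  /* what the client asked for */
static int         pending[TOY_NSIG];         /* noted, not yet delivered  */
static int         client_calls;              /* runs of the sample handler */

/* The client's sigaction() is intercepted: only record the handler. */
void toy_intercept_sigaction(int signo, toy_handler h)
{
    assert(signo > 0 && signo < TOY_NSIG);
    client_handler[signo] = h;
}

/* Runs on the real CPU when the signal arrives: just take a note. */
void toy_host_handler(int signo)
{
    assert(signo > 0 && signo < TOY_NSIG);
    if (client_handler[signo] != NULL)
        pending[signo] = 1;
}

/* Called at a safe point: run the client's handlers for noted signals
   (in the real thing, on the simulated CPU).  Returns how many ran. */
int toy_deliver_pending(void)
{
    int signo, delivered = 0;
    for (signo = 1; signo < TOY_NSIG; signo++) {
        if (pending[signo]) {
            pending[signo] = 0;
            client_handler[signo](signo);
            delivered++;
        }
    }
    return delivered;
}

/* A sample client handler, so the sketch can be exercised. */
void toy_sample_client_handler(int signo) { (void)signo; client_calls++; }
int  toy_client_calls_seen(void) { return client_calls; }
```

<p>The client's handler does not run when the signal arrives, only at
the next call to <code>toy_deliver_pending</code>.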
|
|
<p>
|
|
|
|
<hr width="100%">
|
|
|
|
<a name="example"></a>
|
|
<h2>6 Example</h2>
|
|
This is the log for a run of a small program. The program is in fact
|
|
correct, and the reported error is as the result of a potentially serious
|
|
code generation bug in GNU g++ (snapshot 20010527).
|
|
<pre>
|
|
sewardj@phoenix:~/newmat10$
|
|
~/Valgrind-6/valgrind -v ./bogon
|
|
==25832== Valgrind 0.10, a memory error detector for x86 RedHat 7.1.
|
|
==25832== Copyright (C) 2000-2001, and GNU GPL'd, by Julian Seward.
|
|
==25832== Startup, with flags:
|
|
==25832== --suppressions=/home/sewardj/Valgrind/redhat71.supp
|
|
==25832== reading syms from /lib/ld-linux.so.2
|
|
==25832== reading syms from /lib/libc.so.6
|
|
==25832== reading syms from /mnt/pima/jrs/Inst/lib/libgcc_s.so.0
|
|
==25832== reading syms from /lib/libm.so.6
|
|
==25832== reading syms from /mnt/pima/jrs/Inst/lib/libstdc++.so.3
|
|
==25832== reading syms from /home/sewardj/Valgrind/valgrind.so
|
|
==25832== reading syms from /proc/self/exe
|
|
==25832== loaded 5950 symbols, 142333 line number locations
|
|
==25832==
|
|
==25832== Invalid read of size 4
|
|
==25832== at 0x8048724: _ZN10BandMatrix6ReSizeEiii (bogon.cpp:45)
|
|
==25832== by 0x80487AF: main (bogon.cpp:66)
|
|
==25832== by 0x40371E5E: __libc_start_main (libc-start.c:129)
|
|
==25832== by 0x80485D1: (within /home/sewardj/newmat10/bogon)
|
|
==25832== Address 0xBFFFF74C is not stack'd, malloc'd or free'd
|
|
==25832==
|
|
==25832== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)
|
|
==25832== malloc/free: in use at exit: 0 bytes in 0 blocks.
|
|
==25832== malloc/free: 0 allocs, 0 frees, 0 bytes allocated.
|
|
==25832== For a detailed leak analysis, rerun with: --leak-check=yes
|
|
==25832==
|
|
==25832== exiting, did 1881 basic blocks, 0 misses.
|
|
==25832== 223 translations, 3626 bytes in, 56801 bytes out.
|
|
</pre>
|
|
<p>The GCC folks fixed this about a week before gcc-3.0 shipped.
|
|
<hr width="100%">
|
|
<p>
|
|
|
|
|
|
|
|
<a name="cache"></a>
|
|
<h2>7 Cache profiling</h2>
|
|
As well as memory debugging, Valgrind also allows you to do cache simulations
|
|
and annotate your source line-by-line with the number of cache misses. In
|
|
particular, it records:
|
|
<ul>
|
|
<li>L1 instruction cache reads and misses;
|
|
<li>L1 data cache reads and read misses, writes and write misses;
|
|
<li>L2 unified cache reads and read misses, writes and write misses.
|
|
</ul>
|
|
On a modern x86 machine, an L1 miss will typically cost around 10 cycles,
|
|
and an L2 miss can cost as much as 200 cycles. Detailed cache profiling can be
|
|
very useful for improving the performance of your program.<p>
|
|
|
|
Also, since one instruction cache read is performed per instruction executed,
|
|
you can find out how many instructions are executed per line, which can be
|
|
useful for optimisation and test coverage.<p>
|
|
|
|
Please note that this is an experimental feature. Any feedback, bug-fixes,
|
|
suggestions, etc, welcome.
|
|
|
|
|
|
<h3>7.1 Overview</h3>
|
|
First off, as for normal Valgrind use, you probably want to turn on debugging
|
|
info (the <code>-g</code> flag). But by contrast with normal Valgrind use, you
|
|
probably <b>do</b> want to turn optimisation on, since you should profile your
|
|
program as it will be normally run.
|
|
|
|
The three steps are:
|
|
<ol>
|
|
<li>Generate a cache simulator for your machine's cache
|
|
configuration with the supplied <code>vg_cachegen</code>
|
|
program, and recompile Valgrind with <code>make install</code>.
|
|
<p>
|
|
The default settings are for an AMD Athlon, and you will get
|
|
useful information with the defaults, so you can skip this step
|
|
if you want. Nevertheless, for accurate cache profiles you will
|
|
    need to use <code>vg_cachegen</code> to customise
|
|
<code>cachegrind</code> for your system.
|
|
<p>
|
|
This step only needs to be done once, unless you are interested
|
|
in simulating different cache configurations (eg. first
|
|
concentrating on instruction cache misses, then on data cache
|
|
misses).
|
|
</li>
|
|
<p>
|
|
<li>Run your program with <code>cachegrind</code> in front of the
|
|
normal command line invocation. When the program finishes,
|
|
Valgrind will print summary cache statistics. It also collects
|
|
line-by-line information in a file <code>cachegrind.out</code>.
|
|
<p>
|
|
This step should be done every time you want to collect
|
|
information about a new program, a changed program, or about the
|
|
same program with different input.
|
|
</li>
|
|
<p>
|
|
<li>Generate a function-by-function summary, and possibly annotate
|
|
source files with 'vg_annotate'. Source files to annotate can be
|
|
    specified manually on the command line, or
|
|
"interesting" source files can be annotated automatically with
|
|
the <code>--auto=yes</code> option. You can annotate C/C++
|
|
files or assembly language files equally easily.</li>
|
|
<p>
|
|
This step can be performed as many times as you like for each
|
|
Step 2. You may want to do multiple annotations showing
|
|
different information each time.<p>
|
|
</ol>
|
|
|
|
The steps are described in detail in the following sections.<p>
|
|
|
|
|
|
<a name="generate"></a>
|
|
<h3>7.3 Generating a cache simulator</h3>
|
|
|
|
Although Valgrind comes with a pre-generated cache simulator, it most
|
|
likely won't match the cache configuration of your machine, so you
|
|
should generate a new simulator.<p>
|
|
|
|
You need to generate three files, one for each of the I1, D1 and L2
|
|
caches. For each cache, you need to know the:
|
|
<ul>
|
|
<li>Cache size (bytes);
|
|
<li>Line size (bytes);
|
|
<li>Associativity.
|
|
</ul>
|
|
|
|
vg_cachegen takes three options:
|
|
<ul>
|
|
<li><code>--I1=size,line_size,associativity</code>
|
|
<li><code>--D1=size,line_size,associativity</code>
|
|
<li><code>--L2=size,line_size,associativity</code>
|
|
</ul>
|
|
|
|
You can specify one, two or all three caches per invocation of
|
|
vg_cachegen. It checks that the configuration is sensible before
|
|
generating the simulators; to see the allowed values, run
|
|
<code>vg_cachegen -h</code>.<p>
|
|
|
|
An example invocation would be:
|
|
|
|
<blockquote><code>
|
|
vg_cachegen --I1=65536,64,2 --D1=65536,64,2 --L2=262144,64,8
|
|
</code></blockquote>
|
|
|
|
This simulates a machine with a 128KB split L1 2-way associative
|
|
cache, and a 256KB unified 8-way associative L2 cache. Both caches
|
|
have 64B lines.<p>
|
|
|
|
If you don't know your cache configuration, you'll have to find it
|
|
out. (Ideally <code>vg_cachegen</code> could auto-identify your cache
|
|
configuration using the CPUID instruction, which could be done
|
|
automatically during installation, and this whole step could be
|
|
skipped.)<p>
|
|
|
|
|
|
<h3>7.4 Cache simulation specifics</h3>
|
|
|
|
<code>vg_cachegen</code> only generates simulations for a machine with
|
|
a split L1 cache and a unified L2 cache. This configuration is used
|
|
for all (modern) x86-based machines we are aware of. Old Cyrix CPUs
|
|
had a unified I and D L1 cache, but they are ancient history now.<p>
|
|
|
|
The more specific characteristics of the simulation are as follows.
|
|
|
|
<ul>
|
|
<li>Write-allocate: when a write miss occurs, the block written to
|
|
is brought into the D1 cache. Most modern caches have this
|
|
property.</li><p>
|
|
|
|
<li>Bit-selection hash function: the line(s) in the cache to which a
|
|
memory block maps is chosen by the middle bits M--(M+N-1) of the
|
|
byte address, where:
|
|
<ul>
|
|
<li> line size = 2^M bytes </li>
|
|
      <li>(cache size / line size) = 2^N lines</li>
|
|
</ul> </li><p>
|
|
|
|
<li>Inclusive L2 cache: the L2 cache replicates all the entries of
|
|
the L1 cache. This is standard on Pentium chips, but AMD
|
|
Athlons use an exclusive L2 cache that only holds blocks evicted
|
|
from L1. Ditto AMD Durons and most modern VIAs.</li><p>
|
|
</ul>
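<p>Bit selection is easy to show in code. The sketch below is not taken
from the generated simulators. It assumes the number of sets is cache
size / (line size × associativity), which reduces to the formula above
for a direct-mapped cache; <code>cache_set_index</code> is a made-up
name.

```c
#include <assert.h>
#include <stdint.h>

static int is_pow2(uint32_t x) { return x != 0 && (x & (x - 1)) == 0; }

/* Pick the cache set for a byte address by bit selection: drop the
   low bits that index within a line, then keep the next log2(sets)
   bits. */
uint32_t cache_set_index(uint32_t addr, uint32_t cache_size,
                         uint32_t line_size, uint32_t assoc)
{
    uint32_t sets;

    assert(is_pow2(line_size) && assoc > 0);
    assert(cache_size % (line_size * assoc) == 0);
    sets = cache_size / (line_size * assoc);
    assert(is_pow2(sets));

    return (addr / line_size) & (sets - 1);
}
```

<p>For the example configuration <code>65536,64,2</code> this gives 512
sets, so bits 6 to 14 of the address choose the set.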
|
|
|
|
Other noteworthy behaviour:
|
|
|
|
<ul>
|
|
<li>References that straddle two cache lines are treated as follows:</li>
|
|
<ul>
|
|
<li>If both blocks hit --> counted as one hit</li>
|
|
<li>If one block hits, the other misses --> counted as one miss</li>
|
|
<li>If both blocks miss --> counted as one miss (not two)</li>
|
|
</ul><p>
|
|
|
|
<li>Instructions that modify a memory location (eg. <code>inc</code> and
|
|
<code>dec</code>) are counted as doing just a read, ie. a single data
|
|
reference. This may seem strange, but since the write can never cause a
|
|
miss (the read guarantees the block is in the cache) it's not very
|
|
interesting.<p>
|
|
|
|
Thus it measures not the number of times the data cache is accessed, but
|
|
the number of times a data cache miss could occur.<p>
|
|
</li>
|
|
</ul>
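<p>The straddling rule amounts to a few lines of arithmetic. This is a
sketch with invented names, assuming an access never spans more than two
lines.

```c
#include <assert.h>
#include <stdint.h>

/* Return 1 if the access [addr, addr+size) touches two cache lines. */
int straddles_two_lines(uint32_t addr, uint32_t size, uint32_t line_size)
{
    uint64_t last;
    assert(size > 0 && size <= line_size);
    last = (uint64_t)addr + size - 1;      /* address of the final byte */
    return addr / line_size != last / line_size;
}

/* Combine the outcomes for the two lines of a straddling reference:
   two hits count as a hit, anything else as exactly one miss. */
int counted_misses(int first_hit, int second_hit)
{
    return (first_hit && second_hit) ? 0 : 1;
}
```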
|
|
|
|
If you are interested in simulating a cache with different properties, it is
|
|
not particularly hard to write your own cache simulator, or to modify existing
|
|
ones in <code>vg_cachesim_I1.c</code>, <code>vg_cachesim_D1.c</code> and
<code>vg_cachesim_L2.c</code>.  We'd be interested to hear from anyone who
|
|
does.
|
|
|
|
|
|
<a name="profile"></a>
|
|
<h3>7.5 Profiling programs</h3>
|
|
|
|
Cache profiling is enabled by using the <code>--cachesim=yes</code>
|
|
option to the <code>valgrind</code> shell script. Alternatively, it
|
|
is probably more convenient to use the <code>cachegrind</code> script.
|
|
This automatically turns off Valgrind's memory checking functions,
|
|
since the cache simulation is slow enough already, and you probably
|
|
don't want to do both at once.
|
|
<p>
|
|
To gather cache profiling information about the program <code>ls
|
|
-l</code>, type:
|
|
|
|
<blockquote><code>cachegrind ls -l</code></blockquote>
|
|
|
|
The program will execute (slowly). Upon completion, summary statistics
|
|
that look like this will be printed:
|
|
|
|
<pre>
==31751== I   refs:      27,742,716
==31751== I1  misses:           276
==31751== L2  misses:           275
==31751== I1  miss rate:        0.0%
==31751== L2i miss rate:        0.0%
==31751==
==31751== D   refs:      15,430,290  (10,955,517 rd + 4,474,773 wr)
==31751== D1  misses:        41,185  (    21,905 rd +    19,280 wr)
==31751== L2  misses:        23,085  (     3,987 rd +    19,098 wr)
==31751== D1  miss rate:        0.2% (       0.1%   +       0.4%  )
==31751== L2d miss rate:        0.1% (       0.0%   +       0.4%  )
==31751==
==31751== L2 misses:         23,360  (     4,262 rd +    19,098 wr)
==31751== L2 miss rate:         0.0% (       0.0%   +       0.4%  )
</pre>
|
|
|
|
Cache accesses for instruction fetches are summarised first, giving the
|
|
number of fetches made (this is the number of instructions executed, which
|
|
can be useful to know in its own right), the number of I1 misses, and the
|
|
number of L2 instruction (<code>L2i</code>) misses.<p>
|
|
|
|
Cache accesses for data follow. The information is similar to that of the
|
|
instruction fetches, except that the values are also shown split between reads
|
|
and writes (note each row's <code>rd</code> and <code>wr</code> values add up
|
|
to the row's total).<p>
|
|
|
|
Combined instruction and data figures for the L2 cache follow that.<p>
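<p>
As a rough check on the summary, each miss rate is just misses divided by
references. The sketch below recomputes the D1 figures from the sample
output. The helper name <code>miss_rate</code> is made up for this example,
and how Valgrind rounds the printed percentages is not covered here.

```python
def miss_rate(misses, refs):
    """Miss rate as a percentage; 0.0 when there were no references."""
    return 100.0 * misses / refs if refs else 0.0


# Figures copied from the "D refs" and "D1 misses" lines of the sample.
d_refs = {"total": 15_430_290, "rd": 10_955_517, "wr": 4_474_773}
d1_misses = {"total": 41_185, "rd": 21_905, "wr": 19_280}

for kind in ("total", "rd", "wr"):
    print(kind, round(miss_rate(d1_misses[kind], d_refs[kind]), 3))
```

The unrounded values land close to the 0.2%, 0.1% and 0.4% shown in the
<code>D1 miss rate</code> line.<p>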
|
|
|
|
|
|
<h3>7.6 Output file</h3>
|
|
|
|
As well as printing summary information, Cachegrind also writes
|
|
line-by-line cache profiling information to a file named
|
|
<code>cachegrind.out</code>. This file is human-readable, but is best
|
|
interpreted by the accompanying program <code>vg_annotate</code>,
|
|
described in the next section.
|
|
<p>
|
|
Things to note about the <code>cachegrind.out</code> file:
|
|
<ul>
|
|
<li>It is written every time <code>valgrind --cachesim=yes</code> or
|
|
<code>cachegrind</code> is run, and will overwrite any existing
|
|
<code>cachegrind.out</code> in the current directory.</li>
|
|
<p>
|
|
<li>It can be huge: <code>ls -l</code> generates a file of about
|
|
350KB. Browsing a few files and web pages with a Konqueror
|
|
built with full debugging information generates a file
|
|
of around 15 MB.</li>
|
|
</ul>
|
|
|
|
|
|
<a name="annotate"></a>
|
|
<h3>7.7 Annotating C/C++ programs</h3>
|
|
|
|
Before using <code>vg_annotate</code>, it is worth widening your
|
|
window to be at least 120 characters wide if possible, as the output
|
|
lines can be quite long.
|
|
<p>
|
|
To get a function-by-function summary, run <code>vg_annotate</code> in
|
|
a directory containing a <code>cachegrind.out</code> file. The output
|
|
looks like this:
|
|
|
|
<pre>
--------------------------------------------------------------------------------
I1 cache:              65536 B, 64 B, 2-way associative
D1 cache:              65536 B, 64 B, 2-way associative
L2 cache:              262144 B, 64 B, 8-way associative
Command:               concord vg_to_ucode.c
Events recorded:       Ir I1mr I2mr Dr D1mr D2mr Dw D1mw D2mw
Events shown:          Ir I1mr I2mr Dr D1mr D2mr Dw D1mw D2mw
Event sort order:      Ir I1mr I2mr Dr D1mr D2mr Dw D1mw D2mw
Threshold:             99%
Chosen for annotation:
Auto-annotation:       on

--------------------------------------------------------------------------------
        Ir I1mr I2mr         Dr   D1mr  D2mr        Dw   D1mw   D2mw
--------------------------------------------------------------------------------
27,742,716  276  275 10,955,517 21,905 3,987 4,474,773 19,280 19,098  PROGRAM TOTALS

--------------------------------------------------------------------------------
        Ir I1mr I2mr         Dr   D1mr  D2mr        Dw   D1mw   D2mw  file:function
--------------------------------------------------------------------------------
 8,821,482    5    5  2,242,702  1,621    73 1,794,230      0      0  getc.c:_IO_getc
 5,222,023    4    4  2,276,334     16    12   875,959      1      1  concord.c:get_word
 2,649,248    2    2  1,344,810  7,326 1,385         .      .      .  vg_main.c:strcmp
 2,521,927    2    2    591,215      0     0   179,398      0      0  concord.c:hash
 2,242,740    2    2  1,046,612    568    22   448,548      0      0  ctype.c:tolower
 1,496,937    4    4    630,874  9,000 1,400   279,388      0      0  concord.c:insert
   897,991   51   51    897,831     95    30        62      1      1  ???:???
   598,068    1    1    299,034      0     0   149,517      0      0  ../sysdeps/generic/lockfile.c:__flockfile
   598,068    0    0    299,034      0     0   149,517      0      0  ../sysdeps/generic/lockfile.c:__funlockfile
   598,024    4    4    213,580     35    16   149,506      0      0  vg_clientmalloc.c:malloc
   446,587    1    1    215,973  2,167   430   129,948 14,057 13,957  concord.c:add_existing
   341,760    2    2    128,160      0     0   128,160      0      0  vg_clientmalloc.c:vg_trap_here_WRAPPER
   320,782    4    4    150,711    276     0    56,027     53     53  concord.c:init_hash_table
   298,998    1    1    106,785      0     0    64,071      1      1  concord.c:create
   149,518    0    0    149,516      0     0         1      0      0  ???:tolower@@GLIBC_2.0
   149,518    0    0    149,516      0     0         1      0      0  ???:fgetc@@GLIBC_2.0
    95,983    4    4     38,031      0     0    34,409  3,152  3,150  concord.c:new_word_node
    85,440    0    0     42,720      0     0    21,360      0      0  vg_clientmalloc.c:vg_bogus_epilogue
</pre>
|
|
|
|
First up is a summary of the annotation options:
|
|
|
|
<ul>
|
|
<li>I1 cache, D1 cache, L2 cache: cache configuration. So you know the
|
|
configuration with which these results were obtained.</li><p>
|
|
|
|
<li>Command: the command line invocation of the program under
|
|
examination.</li><p>
|
|
|
|
<li>Events recorded: event abbreviations are:<p>
|
|
<ul>
|
|
<li><code>Ir </code>: I cache reads (ie. instructions executed)</li>
|
|
<li><code>I1mr</code>: I1 cache read misses</li>
|
|
<li><code>I2mr</code>: L2 cache instruction read misses</li>
|
|
<li><code>Dr </code>: D cache reads (ie. memory reads)</li>
|
|
<li><code>D1mr</code>: D1 cache read misses</li>
|
|
<li><code>D2mr</code>: L2 cache data read misses</li>
|
|
<li><code>Dw </code>: D cache writes (ie. memory writes)</li>
|
|
<li><code>D1mw</code>: D1 cache write misses</li>
|
|
<li><code>D2mw</code>: L2 cache data write misses</li>
|
|
</ul><p>
|
|
Note that total D1 misses are given by <code>D1mr</code> +
<code>D1mw</code>, and that total L2 misses are given by
<code>I2mr</code> + <code>D2mr</code> + <code>D2mw</code>.</li><p>
|
|
|
|
<li>Events shown: the events shown (a subset of events gathered). This can
|
|
be adjusted with the <code>--show</code> option.</li><p>
|
|
|
|
<li>Event sort order: the sort order in which functions are shown. For
|
|
example, in this case the functions are sorted from highest
|
|
<code>Ir</code> counts to lowest. If two functions have identical
|
|
<code>Ir</code> counts, they will then be sorted by <code>I1mr</code>
|
|
counts, and so on. This order can be adjusted with the
|
|
<code>--sort</code> option.<p>
|
|
|
|
Note that this dictates the order the functions appear. It is <b>not</b>
|
|
the order in which the columns appear; that is dictated by the "events
|
|
shown" line (and can be changed with the <code>--show</code> option).
|
|
</li><p>
|
|
|
|
<li>Threshold: <code>vg_annotate</code> by default omits functions
|
|
that cause very low numbers of misses to avoid drowning you in
|
|
information. In this case, vg_annotate shows summaries of the
|
|
functions that account for 99% of the <code>Ir</code> counts;
|
|
<code>Ir</code> is chosen as the threshold event since it is the
|
|
primary sort event. The threshold can be adjusted with the
|
|
<code>--threshold</code> option.</li><p>
|
|
|
|
<li>Chosen for annotation: names of files specified manually for annotation;
|
|
in this case none.</li><p>
|
|
|
|
<li>Auto-annotation: whether auto-annotation was requested via the
<code>--auto=yes</code> option (<code>on</code> in the sample above).</li><p>
|
|
</ul>
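<p>
The sort order and threshold described above can be sketched as follows. This
illustrates the idea only and is not <code>vg_annotate</code>'s
implementation: the function <code>select_functions</code> and the dictionary
layout are assumptions, and the exact cut-off rule at the threshold boundary
is a guess.

```python
def select_functions(rows, sort_events, threshold_pct):
    """Sort rows by the given events (highest first) and keep the leading
    rows until they cover `threshold_pct` percent of the primary event.

    `rows` maps a "file:function" name to a dict of event counts.
    """
    primary = sort_events[0]
    ordered = sorted(
        rows.items(),
        key=lambda item: tuple(item[1][e] for e in sort_events),
        reverse=True,
    )
    total = sum(counts[primary] for counts in rows.values())
    shown, covered = [], 0
    for name, counts in ordered:
        # Stop once the rows already shown reach the threshold.
        if total and 100.0 * covered / total >= threshold_pct:
            break
        shown.append(name)
        covered += counts[primary]
    return shown
```

Functions with equal counts for the first event are ordered by the second
event, and so on, which mirrors the tie-breaking described for the sort order.<p>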
|
|
|
|
Then follow summary statistics for the whole program. These are similar
|
|
to the summary provided when running <code>valgrind --cachesim=yes</code>.<p>
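The whole-program totals can be cross-checked against the earlier summary by
adding up the per-event miss counts. A small sketch using the
<code>PROGRAM TOTALS</code> row from the sample (the dictionary is just a
convenient representation chosen for this example):

```python
# PROGRAM TOTALS row from the sample output, keyed by event name.
totals = {
    "Ir": 27_742_716, "I1mr": 276, "I2mr": 275,
    "Dr": 10_955_517, "D1mr": 21_905, "D2mr": 3_987,
    "Dw": 4_474_773, "D1mw": 19_280, "D2mw": 19_098,
}

d1_total_misses = totals["D1mr"] + totals["D1mw"]
l2_total_misses = totals["I2mr"] + totals["D2mr"] + totals["D2mw"]

print(d1_total_misses)  # compare with the "D1 misses" line of the summary
print(l2_total_misses)  # compare with the combined "L2 misses" line
```
<p>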
|
|
|
|
Then follow function-by-function statistics. Each function is
|
|
identified by a <code>file_name:function_name</code> pair. If a column
|
|
contains only a dot it means the function never performs
|
|
that event (eg. the third row shows that <code>strcmp()</code>
|
|
contains no instructions that write to memory). The name
|
|
<code>???</code> is used if the file name and/or function name
|
|
could not be determined from debugging information. If most of the
|
|
entries have the form <code>???:???</code> the program probably wasn't
|
|
compiled with <code>-g</code>. <p>
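If you want to post-process the function-by-function rows yourself, a row can
be split as sketched below. The helper <code>parse_row</code> is hypothetical
and assumes the default nine event columns followed by the
<code>file:function</code> name; a dot is kept distinct from zero by mapping
it to <code>None</code>.

```python
EVENTS = "Ir I1mr I2mr Dr D1mr D2mr Dw D1mw D2mw".split()


def parse_row(line, events=EVENTS):
    """Split one summary row into (name, counts).

    A lone '.' means the function never performs that event, which is
    represented here as None rather than 0.
    """
    fields = line.strip().split(None, len(events))
    name = fields[len(events)]
    counts = {
        event: None if field == "." else int(field.replace(",", ""))
        for event, field in zip(events, fields)
    }
    return name, counts
```
<p>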
|
|
|
|
It is worth noting that functions will come from three types of source files:
|
|
<ol>
|
|
<li> From the profiled program (<code>concord.c</code> in this example).</li>
|
|
<li>From libraries (eg. <code>getc.c</code>)</li>
|
|
<li>From Valgrind's implementation of some libc functions (eg.
|
|
<code>vg_clientmalloc.c:malloc</code>). These are recognisable because
|
|
the filename begins with <code>vg_</code>, and is probably one of
|
|
<code>vg_main.c</code>, <code>vg_clientmalloc.c</code> or
|
|
<code>vg_mylibc.c</code>.
|
|
</li>
|
|
</ol>
|
|
|
|
There are two ways to annotate source files -- by choosing them
|
|
manually, or with the <code>--auto=yes</code> option. To do it
|
|
manually, just specify the filenames as arguments to
|
|
<code>vg_annotate</code>. For example, the output from running
|
|
<code>vg_annotate concord.c</code> for our example produces the same
|
|
output as above followed by an annotated version of
|
|
<code>concord.c</code>, a section of which looks like:
|
|
|
|
<pre>
--------------------------------------------------------------------------------
-- User-annotated source: concord.c
--------------------------------------------------------------------------------
     Ir I1mr I2mr     Dr D1mr D2mr     Dw D1mw D2mw

[snip]

      .    .    .      .    .    .      .    .    .  void init_hash_table(char *file_name, Word_Node *table[])
      3    1    1      .    .    .      1    0    0  {
      .    .    .      .    .    .      .    .    .      FILE *file_ptr;
      .    .    .      .    .    .      .    .    .      Word_Info *data;
      1    0    0      .    .    .      1    1    1      int line = 1, i;
      .    .    .      .    .    .      .    .    .
      5    0    0      .    .    .      3    0    0      data = (Word_Info *) create(sizeof(Word_Info));
      .    .    .      .    .    .      .    .    .
  4,991    0    0  1,995    0    0    998    0    0      for (i = 0; i < TABLE_SIZE; i++)
  3,988    1    1  1,994    0    0    997   53   52          table[i] = NULL;
      .    .    .      .    .    .      .    .    .
      .    .    .      .    .    .      .    .    .      /* Open file, check it. */
      6    0    0      1    0    0      4    0    0      file_ptr = fopen(file_name, "r");
      2    0    0      1    0    0      .    .    .      if (!(file_ptr)) {
      .    .    .      .    .    .      .    .    .          fprintf(stderr, "Couldn't open '%s'.\n", file_name);
      1    1    1      .    .    .      .    .    .          exit(EXIT_FAILURE);
      .    .    .      .    .    .      .    .    .      }
      .    .    .      .    .    .      .    .    .
165,062    1    1 73,360    0    0 91,700    0    0      while ((line = get_word(data, line, file_ptr)) != EOF)
146,712    0    0 73,356    0    0 73,356    0    0          insert(data->word, data->line, table);
      .    .    .      .    .    .      .    .    .
      4    0    0      1    0    0      2    0    0      free(data);
      4    0    0      1    0    0      2    0    0      fclose(file_ptr);
      3    0    0      2    0    0      .    .    .  }
</pre>
|
|
|
|
(Although column widths are automatically minimised, a wide terminal is clearly
|
|
useful.)<p>
|
|
|
|
Each source file is clearly marked (<code>User-annotated source</code>) as
|
|
having been chosen manually for annotation. If the file was found in one of
|
|
the directories specified with the <code>-I</code>/<code>--include</code>
|
|
option, the directory and file are both given.<p>
|
|
|
|
Each line is annotated with its event counts. Events not applicable for a line
|
|
are represented by a `.'; this is useful for distinguishing between an event
|
|
which cannot happen, and one which can but did not.<p>
|
|
|
|
Sometimes only a small section of a source file is executed. To minimise
|
|
uninteresting output, Valgrind only shows annotated lines and lines within a
|
|
small distance of annotated lines. Gaps are marked with the line numbers so
|
|
you know which part of a file the shown code comes from, eg:
|
|
|
|
<pre>
(figures and code for line 704)
-- line 704 ----------------------------------------
-- line 878 ----------------------------------------
(figures and code for line 878)
</pre>
|
|
|
|
The amount of context to show around annotated lines is controlled by the
|
|
<code>--context</code> option.<p>
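The way context lines lead to gaps can be sketched like this. The function
<code>visible_ranges</code> and its merging rule are assumptions made for
illustration, and the line numbers in the usage note are chosen so that the
result lines up with the gap markers in the example above.

```python
def visible_ranges(annotated, context, last_line):
    """Return merged (start, end) line ranges to print.

    Each annotated line is shown with `context` lines on either side;
    overlapping or touching ranges are merged so each gap is reported once.
    """
    ranges = []
    for line in sorted(annotated):
        start = max(1, line - context)
        end = min(last_line, line + context)
        if ranges and start <= ranges[-1][1] + 1:
            # Extend the previous range instead of starting a new one.
            ranges[-1] = (ranges[-1][0], max(ranges[-1][1], end))
        else:
            ranges.append((start, end))
    return ranges
```

For example, <code>visible_ranges([700, 882], 4, 1000)</code> returns
<code>[(696, 704), (878, 886)]</code>, so lines 705 to 877 form the gap.<p>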
|
|
|
|
To get automatic annotation, run <code>vg_annotate --auto=yes</code>.
|
|
vg_annotate will automatically annotate every source file it can find that is
|
|
mentioned in the function-by-function summary. Therefore, the files chosen for
|
|
auto-annotation are affected by the <code>--sort</code> and
|
|
<code>--threshold</code> options. Each source file is clearly marked
|
|
(<code>Auto-annotated source</code>) as being chosen automatically. Any files
|
|
that could not be found are mentioned at the end of the output, eg:
|
|
|
|
<pre>
--------------------------------------------------------------------------------
The following files chosen for auto-annotation could not be found:
--------------------------------------------------------------------------------
  getc.c
  ctype.c
  ../sysdeps/generic/lockfile.c
</pre>
|
|
|
|
This is quite common for library files, since libraries are usually compiled
|
|
with debugging information, but the source files are often not present on a
|
|
system. If a file is chosen for annotation <b>both</b> manually and
|
|
automatically, it is marked as <code>User-annotated source</code>.
|
|
|
|
Use the <code>-I/--include</code> option to tell Valgrind where to look for
|
|
source files if the filenames found from the debugging information aren't
|
|
specific enough.
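<p>
The effect of the include directories can be pictured as a simple search. In
this sketch the function name <code>find_source</code> and the search order
(name as given first, then each directory in turn) are assumptions, not a
description of vg_annotate's actual lookup.

```python
import os


def find_source(filename, include_dirs=()):
    """Return a path to `filename`, or None if it cannot be found.

    The name from the debug info is tried as given first, then relative
    to each include directory in order (an assumed search order).
    """
    for directory in ("", *include_dirs):
        candidate = os.path.join(directory, filename)
        if os.path.isfile(candidate):
            return candidate
    return None
```

A file for which this returns <code>None</code> corresponds to an entry in
the "could not be found" list shown above.<p>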
|
|
|
|
Beware that vg_annotate can take some time to digest large
|
|
<code>cachegrind.out</code> files, eg. 30 seconds or more. Also beware that
|
|
auto-annotation can produce a lot of output if your program is large!
|
|
|
|
|
|
<h3>7.8 Annotating assembler programs</h3>
|
|
|
|
Valgrind can annotate assembler programs too, or annotate the
|
|
assembler generated for your C program. Sometimes this is useful for
|
|
understanding what is really happening when an interesting line of C
|
|
code is translated into multiple instructions.<p>
|
|
|
|
To do this, you just need to assemble your <code>.s</code> files with
|
|
assembler-level debug information. gcc doesn't do this, but you can
|
|
use the GNU assembler with the <code>--gstabs</code> option to
|
|
generate object files with this information, eg:
|
|
|
|
<blockquote><code>as --gstabs foo.s</code></blockquote>
|
|
|
|
You can then profile and annotate source files in the same way as for C/C++
|
|
programs.
|
|
|
|
|
|
<h3>7.9 <code>vg_annotate</code> options</h3>
|
|
<ul>
|
|
<li><code>-h, --help</code></li><p>
|
|
<li><code>-v, --version</code><p>
|
|
|
|
Help and version, as usual.</li>
|
|
|
|
<li><code>--sort=A,B,C</code> [default: order in
|
|
<code>cachegrind.out</code>]<p>
|
|
Specifies the events upon which the sorting of the function-by-function
|
|
entries will be based. Useful if you want to concentrate on eg. I cache
|
|
misses (<code>--sort=I1mr,I2mr</code>), or D cache misses
|
|
(<code>--sort=D1mr,D2mr</code>), or L2 misses
|
|
(<code>--sort=D2mr,I2mr</code>).</li><p>
|
|
|
|
<li><code>--show=A,B,C</code> [default: all, using order in
|
|
<code>cachegrind.out</code>]<p>
|
|
Specifies which events to show (and the column order). Default is to use
|
|
all present in the <code>cachegrind.out</code> file (and use the order in
|
|
the file).</li><p>
|
|
|
|
<li><code>--threshold=X</code> [default: 99%] <p>
|
|
Sets the threshold for the function-by-function summary. Functions are
|
|
shown that account for more than X% of the primary sort event. If
|
|
auto-annotating, also affects which files are annotated.
|
|
|
|
Note: thresholds can be set for more than one event by following any
event named in the <code>--sort</code> option with a colon and a number
(no spaces, though). E.g. if you want to see the functions that cover
99% of L2 read misses and 99% of L2 write misses, use this option:
|
|
|
|
<blockquote><code>--sort=D2mr:99,D2mw:99</code></blockquote>
|
|
</li><p>
|
|
|
|
<li><code>--auto=no</code> [default]<br>
|
|
<code>--auto=yes</code> <p>
|
|
When enabled, automatically annotates every file that is mentioned in the
|
|
function-by-function summary that can be found. Also gives a list of
|
|
those that couldn't be found.</li><p>
|
|
|
|
<li><code>--context=N</code> [default: 8]<p>
|
|
Print N lines of context before and after each annotated line. Avoids
|
|
printing large sections of source files that were not executed. Use a
|
|
large number (eg. 10,000) to show all source lines.
|
|
</li><p>
|
|
|
|
<li><code>-I=<dir>, --include=<dir></code>
|
|
[default: empty string]<p>
|
|
Adds a directory to the list in which to search for files. Multiple
|
|
-I/--include options can be given to add multiple directories.</li>
|
|
</ul>
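<p>
The per-event threshold syntax for <code>--sort</code> is easy to take apart.
The parser below is a sketch only: <code>parse_sort_spec</code> is a made-up
name, and representing an event that has no <code>:N</code> suffix with
<code>None</code> is a choice made for this example.

```python
def parse_sort_spec(spec):
    """Parse the value of --sort, e.g. "D2mr:99,D2mw:99".

    Returns a list of (event, threshold) pairs; the threshold is None
    for an event written without a ":N" suffix.
    """
    pairs = []
    for item in spec.split(","):
        event, sep, number = item.partition(":")
        pairs.append((event, float(number) if sep else None))
    return pairs
```

So <code>parse_sort_spec("D2mr:99,D2mw:99")</code> gives
<code>[("D2mr", 99.0), ("D2mw", 99.0)]</code>.<p>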
|
|
|
|
|
|
<h3>7.10 Warnings</h3>
|
|
There are a couple of situations in which vg_annotate issues warnings.
|
|
|
|
<ul>
|
|
<li>If a source file is more recent than the <code>cachegrind.out</code>
|
|
file. This is because the information in <code>cachegrind.out</code> is
|
|
only recorded with line numbers, so if the line numbers change at all in
|
|
the source (eg. lines added, deleted, swapped), any annotations will be
|
|
incorrect.</li><p>
|
|
|
|
<li>If information is recorded about line numbers past the end of a file.
|
|
This can be caused by the above problem, ie. shortening the source file
|
|
while using an old <code>cachegrind.out</code> file. If this happens,
|
|
the figures for the bogus lines are printed anyway (clearly marked as
|
|
bogus) in case they are important.</li><p>
|
|
</ul>
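<p>
Both checks are simple to picture. The sketch below is not vg_annotate's
code: the function <code>annotation_warnings</code>, its arguments and the
warning wording are invented for illustration.

```python
import os


def annotation_warnings(source_path, profile_path, recorded_lines):
    """Return warning strings for the two situations described above."""
    warnings = []
    # Situation 1: the source was modified after the profile was written.
    if os.path.getmtime(source_path) > os.path.getmtime(profile_path):
        warnings.append("source file is newer than the profile data")
    # Situation 2: counts were recorded for lines beyond the end of file.
    with open(source_path) as src:
        line_count = sum(1 for _ in src)
    bogus = sorted(n for n in recorded_lines if n > line_count)
    if bogus:
        warnings.append(f"line numbers past end of file: {bogus}")
    return warnings
```
<p>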
|
|
|
|
|
|
<h3>7.10 Things to watch out for</h3>
|
|
Some odd things that can occur during annotation:
|
|
|
|
<ul>
|
|
<li>If annotating at the assembler level, you might see something like this:
|
|
|
|
<pre>
1  0  0  .  .  .  .  .  .    leal -12(%ebp),%eax
1  0  0  .  .  .  1  0  0    movl %eax,84(%ebx)
2  0  0  0  0  0  1  0  0    movl $1,-20(%ebp)
.  .  .  .  .  .  .  .  .    .align 4,0x90
1  0  0  .  .  .  .  .  .    movl $.LnrB,%eax
1  0  0  .  .  .  1  0  0    movl %eax,-16(%ebp)
</pre>
|
|
|
|
How can the third instruction be executed twice when the others are
|
|
executed only once? As it turns out, it isn't. Here's a dump of the
|
|
executable, from objdump:
|
|
|
|
<pre>
 8048f25:  8d 45 f4              lea    0xfffffff4(%ebp),%eax
 8048f28:  89 43 54              mov    %eax,0x54(%ebx)
 8048f2b:  c7 45 ec 01 00 00 00  movl   $0x1,0xffffffec(%ebp)
 8048f32:  89 f6                 mov    %esi,%esi
 8048f34:  b8 08 8b 07 08        mov    $0x8078b08,%eax
 8048f39:  89 45 f0              mov    %eax,0xfffffff0(%ebp)
</pre>
|
|
|
|
Notice the extra <code>mov %esi,%esi</code> instruction. Where did this
|
|
come from? The GNU assembler inserted it to serve as the two bytes of
|
|
padding needed to align the <code>movl $.LnrB,%eax</code> instruction on
|
|
a four-byte boundary, but pretended it didn't exist when adding debug
|
|
information. Thus when Valgrind reads the debug info it thinks that the
|
|
<code>movl $0x1,0xffffffec(%ebp)</code> instruction covers the address
|
|
range 0x8048f2b--0x8048f33 by itself, and attributes the counts for the
|
|
<code>mov %esi,%esi</code> to it.<p>
|
|
</li>
|
|
|
|
<li>Inlined functions can cause strange results in the function-by-function
|
|
summary. If a function <code>inline_me()</code> is defined in
|
|
<code>foo.h</code> and inlined in the functions <code>f1()</code>,
|
|
<code>f2()</code> and <code>f3()</code> in <code>bar.c</code>, there will
|
|
not be a <code>foo.h:inline_me()</code> function entry. Instead, there
|
|
will be separate function entries for each inlining site, ie.
|
|
<code>foo.h:f1()</code>, <code>foo.h:f2()</code> and
|
|
<code>foo.h:f3()</code>. To find the total counts for
|
|
<code>foo.h:inline_me()</code>, add up the counts from each entry.<p>
|
|
|
|
The reason for this is that although the debug info output by gcc
|
|
indicates the switch from <code>bar.c</code> to <code>foo.h</code>, it
|
|
doesn't indicate the name of the function in <code>foo.h</code>, so
|
|
Valgrind keeps using the old one.<p>
</li>
|
|
|
|
<li>Sometimes, the same filename might be represented with a relative name
|
|
and with an absolute name in different parts of the debug info, eg:
|
|
<code>/home/user/proj/proj.h</code> and <code>../proj.h</code>. In this
|
|
case, if you use auto-annotation, the file will be annotated twice with
|
|
the counts split between the two.<p>
|
|
</li>
|
|
|
|
<li>Files with more than 65,535 lines cause difficulties for the stabs debug
|
|
info reader. This is because the line number in the <code>struct
|
|
nlist</code> defined in <code>a.out.h</code> under Linux is only a 16-bit
|
|
number. Valgrind can handle some files with more than 65,535 lines
|
|
correctly by making some guesses to identify line number overflows. But
|
|
some cases are beyond it, in which case you'll get a warning message
|
|
explaining that annotations for the file might be incorrect.<p>
|
|
</li>
|
|
|
|
<li>If you compile some files with <code>-g</code> and some without, some
|
|
events that take place in a file without debug info could be attributed
|
|
to the last line of a file with debug info (whichever one gets placed
|
|
before the non-debug-info file in the executable).<p>
|
|
</li>
|
|
</ul>
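<p>
The padding example in the first item comes down to how addresses are mapped
back to source lines. A minimal sketch, assuming the debug info is a sorted
table of (start address, line) entries: the addresses are taken from the
objdump listing above, while the line numbers 1 to 6 and the names
<code>LINE_TABLE</code> and <code>line_for</code> are invented for this
example.

```python
import bisect

# (start address, source line) pairs as they might appear in debug info.
# The padding instruction at 0x8048f32 has no entry of its own.
LINE_TABLE = [
    (0x8048F25, 1),  # leal -12(%ebp),%eax
    (0x8048F28, 2),  # movl %eax,84(%ebx)
    (0x8048F2B, 3),  # movl $1,-20(%ebp)
    (0x8048F34, 5),  # movl $.LnrB,%eax (line 4 is the .align directive)
    (0x8048F39, 6),  # movl %eax,-16(%ebp)
]
STARTS = [start for start, _ in LINE_TABLE]


def line_for(address):
    """Attribute an address to the last entry starting at or before it,
    which is how an instruction missing from the table gets absorbed."""
    index = bisect.bisect_right(STARTS, address) - 1
    return LINE_TABLE[index][1] if index >= 0 else None
```

Here <code>line_for(0x8048F32)</code> returns 3, so the padding instruction's
execution is added to the count for <code>movl $1,-20(%ebp)</code>.<p>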
|
|
|
|
This list looks long, but these cases should be fairly rare.<p>
|
|
|
|
Note: stabs is not an easy format to read. If you come across bizarre
|
|
annotations that look like they might be caused by a bug in the stabs reader,
|
|
please let us know.<p>
|
|
|
|
|
|
<h3>7.11 Accuracy</h3>
|
|
Valgrind's cache profiling has a number of shortcomings:
|
|
|
|
<ul>
|
|
<li>It doesn't account for kernel activity -- the effect of system calls on
|
|
the cache contents is ignored.</li><p>
|
|
|
|
<li>It doesn't account for other process activity (although this is probably
|
|
desirable when considering a single program).</li><p>
|
|
|
|
<li>It doesn't account for virtual-to-physical address mappings; hence the
|
|
entire simulation is not a true representation of what's happening in the
|
|
cache.</li><p>
|
|
|
|
<li>It doesn't account for cache misses not visible at the instruction level,
|
|
eg. those arising from TLB misses, or speculative execution.</li><p>
|
|
|
|
<li>Valgrind's custom <code>malloc()</code> will allocate memory in different
|
|
ways to the standard <code>malloc()</code>, which could warp the results.
|
|
</li><p>
|
|
|
|
<li>The instructions <code>bts</code>, <code>btr</code> and <code>btc</code>
|
|
will incorrectly be counted as doing a data read if both the arguments
|
|
are registers, eg:
|
|
|
|
<blockquote><code>btsl %eax, %edx</code></blockquote>
|
|
|
|
This should only happen rarely.</li>
|
|
</ul>
|
|
|
|
Another thing worth noting is that results are very sensitive. Changing the
|
|
size of the <code>valgrind.so</code> file, the size of the program being
|
|
profiled, or even the length of its name can perturb the results. Variations
|
|
will be small, but don't expect perfectly repeatable results if your program
|
|
changes at all.<p>
|
|
|
|
While these factors mean you shouldn't trust the results to be super-accurate,
|
|
hopefully they should be close enough to be useful.<p>
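One way to see why small layout changes matter is to look at how an address
picks a cache set. The sketch below uses the usual textbook mapping for a
set-associative cache; it is an assumption that the simulator behaves this
way, and <code>set_index</code> is a name made up for this example. The
defaults mirror the sample D1 configuration shown earlier (65536 B, 64 B
lines, 2-way).

```python
def set_index(address, cache_size=65536, line_size=64, associativity=2):
    """Set that `address` maps to in a set-associative cache.

    With the default figures there are 65536 / (64 * 2) = 512 sets.
    """
    num_sets = cache_size // (line_size * associativity)
    return (address // line_size) % num_sets
```

Moving a data item from address <code>0x8000 + 48</code> to
<code>0x8000 + 64</code>, a shift of only 16 bytes, moves it from set 0 to
set 1, so it then competes with a different group of blocks.<p>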
|
|
|
|
|
|
<h3>7.12 Todo</h3>
|
|
<ul>
|
|
<li>Use CPUID instruction to auto-identify cache configuration during
|
|
installation. This would save the user from having to know their cache
|
|
configuration and using vg_cachegen.</li>
|
|
<p>
|
|
<li>Program start-up/shut-down calls a lot of functions that aren't
|
|
interesting and just complicate the output. Would be nice to exclude
|
|
these somehow.</li>
|
|
<p>
|
|
</ul>
|
|
<hr width="100%">
|
|
</body>
|
|
</html>
|
|
|