line options. This commit changes them to all <option>. Also make consistent how options with multiple names (eg. -h --help) are shown. Also, remove section describing --help and --version in Callgrind's chapter; these aren't necessary and are presumably a hangover from when Callgrind was a separate tool. git-svn-id: svn://svn.valgrind.org/valgrind/trunk@10659
<?xml version="1.0"?> <!-- -*- sgml -*- -->
<!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN"
  "http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd">

<chapter id="mc-manual" xreflabel="Memcheck: a memory error detector">
<title>Memcheck: a memory error detector</title>

<para>To use this tool, you may specify <option>--tool=memcheck</option>
on the Valgrind command line. You don't have to, though, since Memcheck
is the default tool.</para>

<sect1 id="mc-manual.overview" xreflabel="Overview">
<title>Overview</title>

<para>Memcheck is a memory error detector. It can detect the following
problems that are common in C and C++ programs.</para>

<itemizedlist>
<listitem>
<para>Accessing memory you shouldn't, e.g. overrunning and underrunning
heap blocks, overrunning the top of the stack, and accessing memory after
it has been freed.</para>
</listitem>

<listitem>
<para>Using undefined values, i.e. values that have not been initialised,
or that have been derived from other undefined values.</para>
</listitem>

<listitem>
<para>Incorrect freeing of heap memory, such as double-freeing heap
blocks, or mismatched use of
<function>malloc</function>/<computeroutput>new</computeroutput>/<computeroutput>new[]</computeroutput>
versus
<function>free</function>/<computeroutput>delete</computeroutput>/<computeroutput>delete[]</computeroutput></para>
</listitem>

<listitem>
<para>Overlapping <computeroutput>src</computeroutput> and
<computeroutput>dst</computeroutput> pointers in
<computeroutput>memcpy()</computeroutput> and related
functions.</para>
</listitem>

<listitem>
<para>Memory leaks.</para>
</listitem>
</itemizedlist>

<para>Problems like these can be difficult to find by other means,
often remaining undetected for long periods, then causing occasional,
difficult-to-diagnose crashes.</para>

</sect1>

<sect1 id="mc-manual.flags"
       xreflabel="Command-line flags specific to Memcheck">
<title>Command-line flags specific to Memcheck</title>

<!-- start of xi:include in the manpage -->
<variablelist id="mc.opts.list">

<varlistentry id="opt.undef-value-errors" xreflabel="--undef-value-errors">
<term>
<option><![CDATA[--undef-value-errors=<yes|no> [default: yes] ]]></option>
</term>
<listitem>
<para>Controls whether Memcheck reports
uses of undefined values. Set this to
<varname>no</varname> if you don't want to see undefined value
errors. It also has the side effect of speeding up
Memcheck somewhat.
</para>
</listitem>
</varlistentry>

<varlistentry id="opt.track-origins" xreflabel="--track-origins">
<term>
<option><![CDATA[--track-origins=<yes|no> [default: no] ]]></option>
</term>
<listitem>
<para>Controls whether Memcheck tracks
the origin of uninitialised values. By default, it does not,
which means that although it can tell you that an
uninitialised value is being used in a dangerous way, it
cannot tell you where the uninitialised value came from. This
often makes it difficult to track down the root problem.
</para>
<para>When set
to <varname>yes</varname>, Memcheck keeps
track of the origins of all uninitialised values. Then, when
an uninitialised value error is
reported, Memcheck will try to show the
origin of the value. An origin can be one of the following
four places: a heap block, a stack allocation, a client
request, or miscellaneous other sources (e.g. a call
to <varname>brk</varname>).
</para>
<para>For uninitialised values originating from a heap
block, Memcheck shows where the block was
allocated. For uninitialised values originating from a stack
allocation, Memcheck can tell you which
function allocated the value, but no more than that -- typically
it shows you the source location of the opening brace of the
function. So you should carefully check that all of the
function's local variables are initialised properly.
</para>
<para>Performance overhead: origin tracking is expensive. It
halves Memcheck's speed and increases
memory use by a minimum of 100MB, and possibly more.
Nevertheless it can drastically reduce the effort required to
identify the root cause of uninitialised value errors, and so
is often a programmer productivity win, despite running
more slowly.
</para>
<para>Accuracy: Memcheck tracks origins
quite accurately. To avoid very large space and time
overheads, some approximations are made. It is possible,
although unlikely, that Memcheck will report an incorrect origin, or
not be able to identify any origin.
</para>
<para>Note that the combination
<option>--track-origins=yes</option>
and <option>--undef-value-errors=no</option> is
nonsensical. Memcheck checks for and
rejects this combination at startup.
</para>
<para>Origin tracking is a new feature, introduced in Valgrind
version 3.4.0.
</para>
</listitem>
</varlistentry>

<varlistentry id="opt.leak-check" xreflabel="--leak-check">
<term>
<option><![CDATA[--leak-check=<no|summary|yes|full> [default: summary] ]]></option>
</term>
<listitem>
<para>When enabled, search for memory leaks when the client
program finishes. If set to <varname>summary</varname>, it says how
many leaks occurred. If set to <varname>full</varname> or
<varname>yes</varname>, it also gives details of each individual
leak.</para>
</listitem>
</varlistentry>

<varlistentry id="opt.show-reachable" xreflabel="--show-reachable">
<term>
<option><![CDATA[--show-reachable=<yes|no> [default: no] ]]></option>
</term>
<listitem>
<para>When disabled, the memory leak detector only shows "definitely
lost" and "possibly lost" blocks. When enabled, the leak detector also
shows "reachable" and "indirectly lost" blocks. (In other words, it
shows all blocks, except suppressed ones, so
<option>--show-all</option> would be a better name for
it.)</para>
</listitem>
</varlistentry>

<varlistentry id="opt.leak-resolution" xreflabel="--leak-resolution">
<term>
<option><![CDATA[--leak-resolution=<low|med|high> [default: high] ]]></option>
</term>
<listitem>
<para>When doing leak checking, determines how willing
Memcheck is to consider different backtraces to
be the same. When set to <varname>low</varname>, only the first
two entries need match. When <varname>med</varname>, four entries
have to match. When <varname>high</varname>, all entries need to
match; this is consistent with how merging occurs for other kinds of
errors.</para>

<para>For hardcore leak debugging, you probably want to use
<option>--leak-resolution=high</option> together with
<option>--num-callers=40</option> or some such large number.
</para>

<para>Note that the <option>--leak-resolution=</option> setting
does not affect Memcheck's ability to find
leaks. It only changes how the results are presented.</para>
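<para>The matching rule for the three settings can be sketched in C as
follows. This is only an illustration under stated assumptions, not
Memcheck's own code: the names <computeroutput>leak_resolution</computeroutput>
and <computeroutput>same_backtrace</computeroutput> are hypothetical, a
backtrace is modelled as an array of return addresses with the innermost
frame first, and two traces of different length are assumed to differ if
one of them ends inside the compared prefix.</para>
<programlisting><![CDATA[
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names; they mirror the three --leak-resolution settings. */
enum leak_resolution { RES_LOW, RES_MED, RES_HIGH };

/* Number of leading backtrace entries compared; SIZE_MAX means all. */
static size_t entries_compared(enum leak_resolution res)
{
    switch (res) {
    case RES_LOW: return 2;
    case RES_MED: return 4;
    default:      return SIZE_MAX;
    }
}

/* Returns true if two backtraces count as "the same" at the given
   resolution.  Assumption: entry 0 is the innermost frame. */
bool same_backtrace(const uintptr_t *a, size_t na,
                    const uintptr_t *b, size_t nb,
                    enum leak_resolution res)
{
    size_t limit = entries_compared(res);
    size_t ca = na < limit ? na : limit;
    size_t cb = nb < limit ? nb : limit;

    if (ca != cb)
        return false;  /* one trace ends inside the compared prefix */
    for (size_t i = 0; i < ca; i++)
        if (a[i] != b[i])
            return false;
    return true;
}
]]></programlisting>
<para>Under this model, two traces that differ only in their third entry
are merged at <varname>low</varname> but kept apart at
<varname>med</varname> and <varname>high</varname>.</para>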
</listitem>
</varlistentry>

<varlistentry id="opt.freelist-vol" xreflabel="--freelist-vol">
<term>
<option><![CDATA[--freelist-vol=<number> [default: 10000000] ]]></option>
</term>
<listitem>
<para>When the client program releases memory using
<function>free</function> (in <literal>C</literal>) or
<computeroutput>delete</computeroutput>
(<literal>C++</literal>), that memory is not immediately made
available for re-allocation. Instead, it is marked inaccessible
and placed in a queue of freed blocks. The purpose is to defer as
long as possible the point at which freed-up memory comes back
into circulation. This increases the chance that
Memcheck will be able to detect invalid
accesses to blocks for some significant period of time after they
have been freed.</para>

<para>This flag specifies the maximum total size, in bytes, of the
blocks in the queue. The default value is ten million bytes.
Increasing this increases the total amount of memory used by
Memcheck but may detect invalid uses of freed
blocks which would otherwise go undetected.</para>
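<para>The effect of the size cap can be pictured with a small model of
such a queue. This is a sketch under assumptions, not Memcheck's
implementation: all names are hypothetical, the queue has an arbitrary
fixed number of slots, and the oldest block is assumed to be the first
one returned to circulation when the cap is exceeded.</para>
<programlisting><![CDATA[
#include <stddef.h>

#define FQ_SLOTS 64  /* arbitrary capacity for this sketch */

struct freed_queue {
    size_t sizes[FQ_SLOTS]; /* sizes of deferred blocks (ring buffer) */
    size_t head;            /* index of the oldest entry */
    size_t count;           /* number of queued blocks */
    size_t total;           /* total bytes currently queued */
    size_t max_total;       /* cap, playing the role of --freelist-vol */
};

void fq_init(struct freed_queue *q, size_t max_total)
{
    q->head = 0;
    q->count = 0;
    q->total = 0;
    q->max_total = max_total;
}

/* Drop the oldest block from the queue; it becomes reusable again. */
static void fq_release_oldest(struct freed_queue *q)
{
    q->total -= q->sizes[q->head];
    q->head = (q->head + 1) % FQ_SLOTS;
    q->count--;
}

/* Queue a freed block of the given size.  Returns how many queued
   blocks were released to stay within the slot and byte limits. */
size_t fq_defer_free(struct freed_queue *q, size_t size)
{
    size_t released = 0;

    if (q->count == FQ_SLOTS) {  /* ring buffer full */
        fq_release_oldest(q);
        released++;
    }
    q->sizes[(q->head + q->count) % FQ_SLOTS] = size;
    q->count++;
    q->total += size;

    while (q->count > 0 && q->total > q->max_total) {
        fq_release_oldest(q);
        released++;
    }
    return released;
}
]]></programlisting>
<para>In this model a larger cap keeps freed blocks quarantined for
longer, which is the trade-off between memory use and detection that the
flag controls.</para>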
</listitem>
</varlistentry>

<varlistentry id="opt.workaround-gcc296-bugs" xreflabel="--workaround-gcc296-bugs">
<term>
<option><![CDATA[--workaround-gcc296-bugs=<yes|no> [default: no] ]]></option>
</term>
<listitem>
<para>When enabled, Memcheck assumes that reads and writes some small
distance below the stack pointer are due to bugs in gcc 2.96, and
does not report them. The "small distance" is 256 bytes by
default. Note that gcc 2.96 is the default compiler on some ancient
Linux distributions (RedHat 7.X) and so you may need to use this
flag. Do not use it if you do not have to, as it can cause real
errors to be overlooked. A better alternative is to use a more
recent gcc/g++ in which this bug is fixed.</para>

<para>You may also need to use this flag when working with
gcc/g++ 3.X or 4.X on 32-bit PowerPC Linux. This is because
gcc/g++ generates code which occasionally accesses below the
stack pointer, particularly for floating-point to/from integer
conversions. This is in violation of the 32-bit PowerPC ELF
specification, which makes no provision for locations below the
stack pointer to be accessible.</para>
</listitem>
</varlistentry>

<varlistentry id="opt.partial-loads-ok" xreflabel="--partial-loads-ok">
<term>
<option><![CDATA[--partial-loads-ok=<yes|no> [default: no] ]]></option>
</term>
<listitem>
<para>Controls how Memcheck handles word-sized,
word-aligned loads from addresses for which some bytes are
addressable and others are not. When <varname>yes</varname>, such
loads do not produce an address error. Instead, loaded bytes
originating from illegal addresses are marked as uninitialised, and
those corresponding to legal addresses are handled in the normal
way.</para>

<para>When <varname>no</varname>, loads from partially invalid
addresses are treated the same as loads from completely invalid
addresses: an illegal-address error is issued, and the resulting
bytes are marked as initialised.</para>

<para>Note that code that behaves in this way is in violation of
the ISO C/C++ standards, and should be considered broken. If
at all possible, such code should be fixed. This flag should be
used only as a last resort.</para>
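<para>The two behaviours described above can be summarised as a small
decision model. It is a sketch only: the names are hypothetical, a word
is assumed to be four bytes, and bit masks stand in for Memcheck's
per-byte bookkeeping.</para>
<programlisting><![CDATA[
#include <stdbool.h>
#include <stdint.h>

/* Outcome of a modelled 4-byte load.  Bit i of defined_mask is set
   when byte i of the loaded word is treated as initialised. */
struct load_outcome {
    bool address_error;
    uint8_t defined_mask;
};

/* addressable_mask: bit i set when byte i may legally be accessed.
   stored_defined_mask: bit i set when byte i in memory is initialised.
   Only the low four bits of each mask are used. */
struct load_outcome model_word_load(uint8_t addressable_mask,
                                    uint8_t stored_defined_mask,
                                    bool partial_loads_ok)
{
    struct load_outcome out;

    addressable_mask &= 0x0F;
    stored_defined_mask &= 0x0F;

    if (addressable_mask == 0x0F) {
        /* Fully addressable: no address error, definedness as stored. */
        out.address_error = false;
        out.defined_mask = stored_defined_mask;
    } else if (partial_loads_ok && addressable_mask != 0) {
        /* Partially addressable with --partial-loads-ok=yes: no error;
           bytes from illegal addresses become uninitialised. */
        out.address_error = false;
        out.defined_mask = stored_defined_mask & addressable_mask;
    } else {
        /* Handled like a completely invalid load: an address error is
           reported and the result is marked as initialised. */
        out.address_error = true;
        out.defined_mask = 0x0F;
    }
    return out;
}
]]></programlisting>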
</listitem>
</varlistentry>

<varlistentry id="opt.malloc-fill" xreflabel="--malloc-fill">
<term>
<option><![CDATA[--malloc-fill=<hexnumber> ]]></option>
</term>
<listitem>
<para>Fills blocks allocated
by <computeroutput>malloc</computeroutput>,
<computeroutput>new</computeroutput>, etc, but not
by <computeroutput>calloc</computeroutput>, with the specified
byte. This can be useful when trying to shake out obscure
memory corruption problems. The allocated area is still
regarded by Memcheck as undefined -- this flag only affects its
contents.
</para>
</listitem>
</varlistentry>

<varlistentry id="opt.free-fill" xreflabel="--free-fill">
<term>
<option><![CDATA[--free-fill=<hexnumber> ]]></option>
</term>
<listitem>
<para>Fills blocks freed
by <computeroutput>free</computeroutput>,
<computeroutput>delete</computeroutput>, etc, with the
specified byte. This can be useful when trying to shake out
obscure memory corruption problems. The freed area is still
regarded by Memcheck as not valid for access -- this flag only
affects its contents.
</para>
</listitem>
</varlistentry>

</variablelist>
<!-- end of xi:include in the manpage -->

</sect1>

<sect1 id="mc-manual.errormsgs"
       xreflabel="Explanation of error messages from Memcheck">
<title>Explanation of error messages from Memcheck</title>

<para>Despite considerable sophistication under the hood, Memcheck can
only really detect two kinds of errors: use of illegal addresses, and
use of undefined values. Nevertheless, this is enough to help you
discover all sorts of memory-management problems in your code. This
section presents a quick summary of what error messages mean. The
precise behaviour of the error-checking machinery is described in
<xref linkend="mc-manual.machine"/>.</para>

<sect2 id="mc-manual.badrw"
       xreflabel="Illegal read / Illegal write errors">
<title>Illegal read / Illegal write errors</title>

<para>For example:</para>
<programlisting><![CDATA[
Invalid read of size 4
   at 0x40F6BBCC: (within /usr/lib/libpng.so.2.1.0.9)
   by 0x40F6B804: (within /usr/lib/libpng.so.2.1.0.9)
   by 0x40B07FF4: read_png_image(QImageIO *) (kernel/qpngio.cpp:326)
   by 0x40AC751B: QImageIO::read() (kernel/qimage.cpp:3621)
 Address 0xBFFFF0E0 is not stack'd, malloc'd or free'd
]]></programlisting>

<para>This happens when your program reads or writes memory at a place
which Memcheck reckons it shouldn't. In this example, the program did a
4-byte read at address 0xBFFFF0E0, somewhere within the system-supplied
library libpng.so.2.1.0.9, which was called from somewhere else in the
same library, called from line 326 of <filename>qpngio.cpp</filename>,
and so on.</para>

<para>Memcheck tries to establish what the illegal address might relate
to, since that's often useful. So, if it points into a block of memory
which has already been freed, you'll be informed of this, and also where
the block was free'd at. Likewise, if it should turn out to be just off
the end of a malloc'd block, a common result of off-by-one-errors in
array subscripting, you'll be informed of this fact, and also where the
block was malloc'd.</para>
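<para>Relating an address to a nearby heap block amounts to simple
offset arithmetic, sketched below. The names are hypothetical and this
is not Memcheck's code; it only illustrates how an address can be
described as some number of bytes before, inside, or after a block of a
given size.</para>
<programlisting><![CDATA[
#include <stddef.h>
#include <stdint.h>

enum addr_relation { ADDR_BEFORE, ADDR_INSIDE, ADDR_AFTER };

struct addr_description {
    enum addr_relation relation;
    size_t distance;  /* bytes before the start, into the block,
                         or past the end of the block */
};

/* Describe where addr lies relative to the block that starts at
   block_start and is block_size bytes long. */
struct addr_description describe_addr(uintptr_t addr,
                                      uintptr_t block_start,
                                      size_t block_size)
{
    struct addr_description d;

    if (addr < block_start) {
        d.relation = ADDR_BEFORE;
        d.distance = (size_t)(block_start - addr);
    } else if (addr - block_start < block_size) {
        d.relation = ADDR_INSIDE;
        d.distance = (size_t)(addr - block_start);
    } else {
        d.relation = ADDR_AFTER;
        d.distance = (size_t)(addr - block_start) - block_size;
    }
    return d;
}
]]></programlisting>
<para>An off-by-one array access, for instance, lands zero bytes after
the block under this model.</para>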

<para>In this example, Memcheck can't identify the address. Actually
the address is on the stack, but, for some reason, this is not a valid
stack address -- it is below the stack pointer and that isn't allowed.
In this particular case it's probably caused by gcc generating invalid
code, a known bug in some ancient versions of gcc.</para>

<para>Note that Memcheck only tells you that your program is about to
access memory at an illegal address. It can't stop the access from
happening. So, if your program makes an access which normally would
result in a segmentation fault, your program will still suffer the same
fate -- but you will get a message from Memcheck immediately prior to
this. In this particular example, reading junk on the stack is
non-fatal, and the program stays alive.</para>

</sect2>

<sect2 id="mc-manual.uninitvals"
       xreflabel="Use of uninitialised values">
<title>Use of uninitialised values</title>

<para>For example:</para>
<programlisting><![CDATA[
Conditional jump or move depends on uninitialised value(s)
   at 0x402DFA94: _IO_vfprintf (_itoa.h:49)
   by 0x402E8476: _IO_printf (printf.c:36)
   by 0x8048472: main (tests/manuel1.c:8)
]]></programlisting>

<para>An uninitialised-value use error is reported when your program
uses a value which hasn't been initialised -- in other words, is
undefined. Here, the undefined value is used somewhere inside the
printf() machinery of the C library. This error was reported when
running the following small program:</para>
<programlisting><![CDATA[
#include <stdio.h>

int main()
{
  int x;  /* deliberately left uninitialised */
  printf ("x = %d\n", x);
  return 0;
}]]></programlisting>

<para>It is important to understand that your program can copy around
junk (uninitialised) data as much as it likes. Memcheck observes this
and keeps track of the data, but does not complain. A complaint is
issued only when your program attempts to make use of uninitialised
data. In this example, x is uninitialised. Memcheck observes the value
being passed to <literal>_IO_printf</literal> and thence to
<literal>_IO_vfprintf</literal>, but makes no comment. However,
<literal>_IO_vfprintf</literal> has to examine the value of
x so it can turn it into the
corresponding ASCII string, and it is at this point that Memcheck
complains.</para>

<para>Sources of uninitialised data tend to be:</para>
<itemizedlist>
<listitem>
<para>Local variables in procedures which have not been initialised,
as in the example above.</para>
</listitem>
<listitem>
<para>The contents of malloc'd blocks, before you write something
there. In C++, the new operator is a wrapper round
<function>malloc</function>, so if you create an object with new,
its fields will be uninitialised until you (or the constructor)
fill them in.</para>
</listitem>
</itemizedlist>

<para>To see information on the sources of uninitialised data in your
program, use the <option>--track-origins=yes</option> flag. This
makes Memcheck run more slowly, but can make it much easier to track down
the root causes of uninitialised value errors.</para>

</sect2>

<sect2 id="mc-manual.badfrees" xreflabel="Illegal frees">
<title>Illegal frees</title>

<para>For example:</para>
<programlisting><![CDATA[
Invalid free()
   at 0x4004FFDF: free (vg_clientmalloc.c:577)
   by 0x80484C7: main (tests/doublefree.c:10)
 Address 0x3807F7B4 is 0 bytes inside a block of size 177 free'd
   at 0x4004FFDF: free (vg_clientmalloc.c:577)
   by 0x80484C7: main (tests/doublefree.c:10)
]]></programlisting>

<para>Memcheck keeps track of the blocks allocated by your program
with <function>malloc</function>/<computeroutput>new</computeroutput>,
so it knows exactly whether the argument to
<function>free</function>/<computeroutput>delete</computeroutput> is
legitimate. Here, this test program has freed the same block
twice. As with the illegal read/write errors, Memcheck attempts to
make sense of the address free'd. If, as here, the address is one
which has previously been freed, you will be told that -- making
duplicate frees of the same block easy to spot.</para>

</sect2>

<sect2 id="mc-manual.rudefn"
       xreflabel="When a block is freed with an inappropriate deallocation function">
<title>When a block is freed with an inappropriate deallocation
function</title>

<para>In the following example, a block allocated with
<function>new[]</function> has wrongly been deallocated with
<function>free</function>:</para>
<programlisting><![CDATA[
Mismatched free() / delete / delete []
   at 0x40043249: free (vg_clientfuncs.c:171)
   by 0x4102BB4E: QGArray::~QGArray(void) (tools/qgarray.cpp:149)
   by 0x4C261C41: PptDoc::~PptDoc(void) (include/qmemarray.h:60)
   by 0x4C261F0E: PptXml::~PptXml(void) (pptxml.cc:44)
 Address 0x4BB292A8 is 0 bytes inside a block of size 64 alloc'd
   at 0x4004318C: operator new[](unsigned int) (vg_clientfuncs.c:152)
   by 0x4C21BC15: KLaola::readSBStream(int) const (klaola.cc:314)
   by 0x4C21C155: KLaola::stream(KLaola::OLENode const *) (klaola.cc:416)
   by 0x4C21788F: OLEFilter::convert(QCString const &) (olefilter.cc:272)
]]></programlisting>

<para>In <literal>C++</literal> it's important to deallocate memory in a
way compatible with how it was allocated. The deal is:</para>
<itemizedlist>
<listitem>
<para>If allocated with
<function>malloc</function>,
<function>calloc</function>,
<function>realloc</function>,
<function>valloc</function> or
<function>memalign</function>, you must
deallocate with <function>free</function>.</para>
</listitem>
<listitem>
<para>If allocated with <function>new[]</function>, you must
deallocate with <function>delete[]</function>.</para>
</listitem>
<listitem>
<para>If allocated with <function>new</function>, you must deallocate
with <function>delete</function>.</para>
</listitem>
</itemizedlist>

<para>The worst thing is that on Linux apparently it doesn't matter if
you do mix these up, but the same program may then crash on a
different platform, Solaris for example. So it's best to fix it
properly. According to the KDE folks "it's amazing how many C++
programmers don't know this".</para>

<para>The reason behind the requirement is as follows. In some C++
implementations, <function>delete[]</function> must be used for
objects allocated by <function>new[]</function> because the compiler
stores the size of the array and the pointer-to-member to the
destructor of the array's content just before the pointer actually
returned. This implies a variable-sized overhead in what's returned
by <function>new</function> or <function>new[]</function>.</para>

</sect2>

<sect2 id="mc-manual.badperm"
       xreflabel="Passing system call parameters with inadequate read/write permissions">
<title>Passing system call parameters with inadequate read/write
permissions</title>

<para>Memcheck checks all parameters to system calls:
<itemizedlist>
<listitem>
<para>It checks all the direct parameters themselves.</para>
</listitem>
<listitem>
<para>Also, if a system call needs to read from a buffer provided by
your program, Memcheck checks that the entire buffer is addressable
and has valid data, i.e. it is readable.</para>
</listitem>
<listitem>
<para>Also, if the system call needs to write to a user-supplied
buffer, Memcheck checks that the buffer is addressable.</para>
</listitem>
</itemizedlist>
</para>

<para>After the system call, Memcheck updates its tracked information to
precisely reflect any changes in memory permissions caused by the system
call.</para>

<para>Here's an example of two system calls with invalid parameters:</para>
<programlisting><![CDATA[
#include <stdlib.h>
#include <unistd.h>
int main( void )
{
  char* arr  = malloc(10);
  int*  arr2 = malloc(sizeof(int));
  write( 1 /* stdout */, arr, 10 );
  exit(arr2[0]);
}
]]></programlisting>

<para>You get these complaints ...</para>
<programlisting><![CDATA[
Syscall param write(buf) points to uninitialised byte(s)
   at 0x25A48723: __write_nocancel (in /lib/tls/libc-2.3.3.so)
   by 0x259AFAD3: __libc_start_main (in /lib/tls/libc-2.3.3.so)
   by 0x8048348: (within /auto/homes/njn25/grind/head4/a.out)
 Address 0x25AB8028 is 0 bytes inside a block of size 10 alloc'd
   at 0x259852B0: malloc (vg_replace_malloc.c:130)
   by 0x80483F1: main (a.c:5)

Syscall param exit(error_code) contains uninitialised byte(s)
   at 0x25A21B44: __GI__exit (in /lib/tls/libc-2.3.3.so)
   by 0x8048426: main (a.c:8)
]]></programlisting>

<para>... because the program has (a) tried to write uninitialised junk
from the malloc'd block to the standard output, and (b) passed an
uninitialised value to <function>exit</function>. Note that the first
error refers to the memory pointed to by
<computeroutput>buf</computeroutput> (not
<computeroutput>buf</computeroutput> itself), but the second error
refers directly to <computeroutput>exit</computeroutput>'s argument
<computeroutput>arr2[0]</computeroutput>.</para>

</sect2>

<sect2 id="mc-manual.overlap"
       xreflabel="Overlapping source and destination blocks">
<title>Overlapping source and destination blocks</title>

<para>The following C library functions copy some data from one
memory block to another (or something similar):
<function>memcpy()</function>,
<function>strcpy()</function>,
<function>strncpy()</function>,
<function>strcat()</function>,
<function>strncat()</function>.
The blocks pointed to by their <computeroutput>src</computeroutput> and
<computeroutput>dst</computeroutput> pointers aren't allowed to overlap.
Memcheck checks for this.</para>

<para>For example:</para>
<programlisting><![CDATA[
==27492== Source and destination overlap in memcpy(0xbffff294, 0xbffff280, 21)
==27492==    at 0x40026CDC: memcpy (mc_replace_strmem.c:71)
==27492==    by 0x804865A: main (overlap.c:40)
]]></programlisting>

<para>You don't want the two blocks to overlap because one of them could
get partially overwritten by the copying.</para>
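<para>For a fixed-length copy such as <function>memcpy()</function>,
the overlap condition reduces to comparing the distance between the two
pointers with the length. The helper below is a hypothetical sketch of
that test, not Memcheck's own check; it does not cover the string
functions, where the length also depends on the terminating zero
byte.</para>
<programlisting><![CDATA[
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Returns true if the n-byte ranges starting at dst and src share at
   least one byte.  Addresses are compared as integers so that the two
   pointers need not point into the same object. */
bool ranges_overlap(const void *dst, const void *src, size_t n)
{
    uintptr_t d = (uintptr_t)dst;
    uintptr_t s = (uintptr_t)src;

    if (n == 0)
        return false;  /* empty ranges cannot overlap */
    return d < s ? (s - d) < n : (d - s) < n;
}
]]></programlisting>
<para>In the report above, the two addresses are 20 bytes apart and 21
bytes are copied, so the ranges share one byte. When a copy between
overlapping ranges is really intended, the standard C function
<function>memmove()</function> is defined to handle it.</para>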

<para>You might think that Memcheck is being overly pedantic reporting
this in the case where <computeroutput>dst</computeroutput> is less than
<computeroutput>src</computeroutput>. For example, the obvious way to
implement <function>memcpy()</function> is by copying from the first
byte to the last. However, the optimisation guides of some
architectures recommend copying from the last byte down to the first.
Also, some implementations of <function>memcpy()</function> zero
<computeroutput>dst</computeroutput> before copying, because zeroing the
destination's cache line(s) can improve performance.</para>

<para>In addition, for many of these functions, the POSIX standards
have wording along the lines "If copying takes place between objects
that overlap, the behavior is undefined." Hence overlapping copies
violate the standard.</para>

<para>The moral of the story is: if you want to write truly portable
code, don't make any assumptions about the language
implementation.</para>

</sect2>

<sect2 id="mc-manual.leaks" xreflabel="Memory leak detection">
<title>Memory leak detection</title>

<para>Memcheck keeps track of all memory blocks issued in response to
calls to
<function>malloc</function>/<function>calloc</function>/<function>realloc</function>/<computeroutput>new</computeroutput>.
So when the program exits, it knows which blocks have not been freed.
</para>

<para>If <option>--leak-check</option> is set appropriately, for each
remaining block, Memcheck determines if the block is reachable from pointers
within the root-set. The root-set consists of (a) general purpose registers
of all threads, and (b) initialised, aligned, pointer-sized data words in
accessible client memory, including stacks.</para>

<para>There are two ways a block can be reached. The first is with a
"start-pointer", i.e. a pointer to the start of the block. The second is with
an "interior-pointer", i.e. a pointer to the middle of the block. An
interior-pointer can arise in three ways that we know of:</para>

<itemizedlist>
<listitem>
<para>The pointer might have originally been a start-pointer and have been
moved along deliberately (or not deliberately) by the program.</para>
</listitem>

<listitem>
<para>It might be a random junk value in memory, entirely unrelated, just
a coincidence.</para>
</listitem>

<listitem>
<para>It might be a pointer to an array of C++ objects (which possess
destructors) allocated with <computeroutput>new[]</computeroutput>. In
this case, some compilers store a "magic cookie" containing the array
length at the start of the allocated block, and return a pointer to just
past that magic cookie, i.e. an interior-pointer.
See <ulink url="http://theory.uwinnipeg.ca/gnu/gcc/gxxint_14.html">this
page</ulink> for more information.</para>
</listitem>
</itemizedlist>

<para>With that in mind, consider the nine possible cases described by the
following figure.</para>

<programlisting><![CDATA[
     Pointer chain            AAA Category    BBB Category
     -------------            ------------    ------------
(1)  RRR ------------> BBB                    DR
(2)  RRR ---> AAA ---> BBB    DR              IR
(3)  RRR               BBB                    DL
(4)  RRR      AAA ---> BBB    DL              IL
(5)  RRR ------?-----> BBB                    (y)DR, (n)DL
(6)  RRR ---> AAA -?-> BBB    DR              (y)IR, (n)DL
(7)  RRR -?-> AAA ---> BBB    (y)DR, (n)DL    (y)IR, (n)IL
(8)  RRR -?-> AAA -?-> BBB    (y)DR, (n)DL    (y,y)IR, (n,y)IL, (_,n)DL
(9)  RRR      AAA -?-> BBB    DL              (y)IL, (n)DL

Pointer chain legend:
- RRR: a root set node or DR block
- AAA, BBB: heap blocks
- --->: a start-pointer
- -?->: an interior-pointer

Category legend:
- DR: Directly reachable
- IR: Indirectly reachable
- DL: Directly lost
- IL: Indirectly lost
- (y)XY: it's XY if the interior-pointer is a real pointer
- (n)XY: it's XY if the interior-pointer is not a real pointer
- (_)XY: it's XY in either case
]]></programlisting>

<para>Every possible case can be reduced to one of the above nine. Memcheck
merges some of these cases in its output, resulting in the following four
categories.</para>

<itemizedlist>

<listitem>
<para>"Still reachable". This covers cases 1 and 2 (for the BBB blocks)
above. A start-pointer or chain of start-pointers to the block is
found. Since the block is still pointed at, the programmer could, at
least in principle, have freed it before program exit. Because these
are very common and arguably not a problem, Memcheck won't report such
blocks individually unless <option>--show-reachable=yes</option> is
specified.</para>
</listitem>

<listitem>
<para>"Definitely lost". This covers case 3 (for the BBB blocks) above.
This means that no pointer to the block can be found. The block is
classified as "lost", because the programmer could not possibly have
freed it at program exit, since no pointer to it exists. This is likely
a symptom of having lost the pointer at some earlier point in the
program. Such cases should be fixed by the programmer.</para>
</listitem>

<listitem>
<para>"Indirectly lost". This covers cases 4 and 9 (for the BBB blocks)
above. This means that the block is lost, not because there are no
pointers to it, but rather because all the blocks that point to it are
themselves lost. For example, if you have a binary tree and the root
node is lost, all its children nodes will be indirectly lost. Because
the problem will disappear if the definitely lost block that caused the
indirect leak is fixed, Memcheck won't report such blocks individually
unless <option>--show-reachable=yes</option> is specified.</para>
</listitem>

<listitem>
<para>"Possibly lost". This covers cases 5--8 (for the BBB blocks)
above. This means that a chain of one or more pointers to the block has
been found, but at least one of the pointers is an interior-pointer.
This could just be a random value in memory that happens to point into a
block, and so you shouldn't consider this ok unless you know you have
interior-pointers.</para>
</listitem>

</itemizedlist>
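<para>The mapping from the nine cases in the figure to the four reported
categories, as stated in the list above for the BBB blocks, can be
written down as a small lookup. The enum and function names are
hypothetical and serve only to restate the list.</para>
<programlisting><![CDATA[
/* The four categories Memcheck reports, plus a marker for case
   numbers outside the figure. */
enum leak_category {
    STILL_REACHABLE,
    DEFINITELY_LOST,
    INDIRECTLY_LOST,
    POSSIBLY_LOST,
    UNKNOWN_CASE
};

/* Category reported for the BBB block in cases (1) to (9). */
enum leak_category reported_category(int case_number)
{
    switch (case_number) {
    case 1: case 2:
        return STILL_REACHABLE;
    case 3:
        return DEFINITELY_LOST;
    case 4: case 9:
        return INDIRECTLY_LOST;
    case 5: case 6: case 7: case 8:
        return POSSIBLY_LOST;
    default:
        return UNKNOWN_CASE;
    }
}

/* Label in the style of the leak summary lines. */
const char *category_name(enum leak_category c)
{
    switch (c) {
    case STILL_REACHABLE: return "still reachable";
    case DEFINITELY_LOST: return "definitely lost";
    case INDIRECTLY_LOST: return "indirectly lost";
    case POSSIBLY_LOST:   return "possibly lost";
    default:              return "unknown";
    }
}
]]></programlisting>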
|
|
|
|
<para>(Note: This mapping of the nine possible cases onto four categories is
|
|
not necessarily the best way that leaks could be reported; in particular,
|
|
interior-pointers are treated inconsistently. It is possible the
|
|
categorisation may be improved in the future.)</para>
|
|
|
|
<para>Furthermore, if a suppression exists for a block, it will be reported
as "suppressed" no matter which of the above four categories it belongs
to.</para>
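<para>As a minimal sketch of how these categories arise, consider the
following C fragment.  The names <function>make_list</function>,
<function>leak_demo</function> and <varname>still_reachable</varname> are
illustrative only and are not taken from Valgrind's test suite.  The code
keeps one block reachable through a global pointer and discards the only
pointer to the head of a three-node list.  Under the classification above
one would expect the head to be reported as definitely lost and the other
two nodes as indirectly lost, although the exact report can depend on the
compiler and optimisation level.</para>

<programlisting><![CDATA[
#include <assert.h>
#include <stdlib.h>

struct node {
    struct node *next;
    int value;
};

/* A global start-pointer: whatever it points at stays reachable at exit. */
static struct node *still_reachable;

/* Build a singly linked list of n nodes.  On allocation failure the
   nodes built so far are returned. */
static struct node *make_list(int n)
{
    struct node *head = NULL;

    assert(n >= 0);
    for (int i = 0; i < n; i++) {
        struct node *p = malloc(sizeof *p);
        if (p == NULL)
            break;
        p->value = i;
        p->next = head;
        head = p;
    }
    return head;
}

/* Deliberately leaks a three-node list and returns how many nodes were
   allocated for it (3 unless malloc failed). */
int leak_demo(void)
{
    int count = 0;
    struct node *list;

    still_reachable = make_list(1);   /* never freed, but still pointed at */

    list = make_list(3);
    for (struct node *p = list; p != NULL; p = p->next)
        count++;

    list = NULL;   /* head: no pointer left; the other nodes are only
                      pointed at by blocks that are themselves lost */
    return count;
}
]]></programlisting>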
|
|
|
|
|
|
<para>The following is an example leak summary.</para>
|
|
|
|
<programlisting><![CDATA[
|
|
LEAK SUMMARY:
|
|
definitely lost: 48 bytes in 3 blocks.
|
|
indirectly lost: 32 bytes in 2 blocks.
|
|
possibly lost: 96 bytes in 6 blocks.
|
|
still reachable: 64 bytes in 4 blocks.
|
|
suppressed: 0 bytes in 0 blocks.
|
|
]]></programlisting>
|
|
|
|
<para>If <option>--leak-check=full</option> is specified,
|
|
Memcheck will give details for each definitely lost or possibly lost block,
|
|
including where it was allocated. (Actually, it merges results for all
|
|
blocks that have the same category and sufficiently similar stack traces
|
|
into a single "loss record". The
|
|
<option>--leak-resolution</option> option lets you control the
|
|
meaning of "sufficiently similar".) It cannot tell you when or how or why
|
|
the pointer to a leaked block was lost; you have to work that out for
|
|
yourself. In general, you should attempt to ensure your programs do not
|
|
have any definitely lost or possibly lost blocks at exit.</para>
|
|
|
|
<para>For example:</para>
|
|
<programlisting><![CDATA[
|
|
8 bytes in 1 blocks are definitely lost in loss record 1 of 14
|
|
at 0x........: malloc (vg_replace_malloc.c:...)
|
|
by 0x........: mk (leak-tree.c:11)
|
|
by 0x........: main (leak-tree.c:39)
|
|
|
|
88 (8 direct, 80 indirect) bytes in 1 blocks are definitely lost in loss record 13 of 14
|
|
at 0x........: malloc (vg_replace_malloc.c:...)
|
|
by 0x........: mk (leak-tree.c:11)
|
|
by 0x........: main (leak-tree.c:25)
|
|
]]></programlisting>
|
|
|
|
<para>The first message describes a simple case of a single 8 byte block
|
|
that has been definitely lost. The second case mentions another 8 byte
|
|
block that has been definitely lost; the difference is that a further 80
|
|
bytes in other blocks are indirectly lost because of this lost block.
|
|
The loss records are not presented in any notable order, so the loss record
|
|
numbers aren't particularly meaningful.</para>
|
|
|
|
<para>If you specify <option>--show-reachable=yes</option>,
|
|
reachable and indirectly lost blocks will also be shown, as the following
|
|
two examples show.</para>
|
|
|
|
<programlisting><![CDATA[
|
|
64 bytes in 4 blocks are still reachable in loss record 2 of 4
|
|
at 0x........: malloc (vg_replace_malloc.c:177)
|
|
by 0x........: mk (leak-cases.c:52)
|
|
by 0x........: main (leak-cases.c:74)
|
|
|
|
32 bytes in 2 blocks are indirectly lost in loss record 1 of 4
|
|
at 0x........: malloc (vg_replace_malloc.c:177)
|
|
by 0x........: mk (leak-cases.c:52)
|
|
by 0x........: main (leak-cases.c:80)
|
|
]]></programlisting>
|
|
|
|
</sect2>
|
|
|
|
</sect1>
|
|
|
|
|
|
|
|
<sect1 id="mc-manual.suppfiles" xreflabel="Writing suppression files">
|
|
<title>Writing suppression files</title>
|
|
|
|
<para>The basic suppression format is described in
|
|
<xref linkend="manual-core.suppress"/>.</para>
|
|
|
|
<para>The suppression-type (second) line should have the form:</para>
|
|
<programlisting><![CDATA[
|
|
Memcheck:suppression_type]]></programlisting>
|
|
|
|
<para>The Memcheck suppression types are as follows:</para>
|
|
|
|
<itemizedlist>
|
|
<listitem>
|
|
<para><varname>Value1</varname>,
|
|
<varname>Value2</varname>,
|
|
<varname>Value4</varname>,
|
|
<varname>Value8</varname>,
|
|
<varname>Value16</varname>,
|
|
meaning an uninitialised-value error when
|
|
using a value of 1, 2, 4, 8 or 16 bytes.</para>
|
|
</listitem>
|
|
|
|
<listitem>
|
|
<para><varname>Cond</varname> (or its old
|
|
name, <varname>Value0</varname>), meaning use
|
|
of an uninitialised CPU condition code.</para>
|
|
</listitem>
|
|
|
|
<listitem>
|
|
<para><varname>Addr1</varname>,
|
|
<varname>Addr2</varname>,
|
|
<varname>Addr4</varname>,
|
|
<varname>Addr8</varname>,
|
|
<varname>Addr16</varname>,
|
|
meaning an invalid address during a
|
|
memory access of 1, 2, 4, 8 or 16 bytes respectively.</para>
|
|
</listitem>
|
|
|
|
<listitem>
|
|
<para><varname>Jump</varname>, meaning a
|
|
jump to an unaddressable location error.</para>
|
|
</listitem>
|
|
|
|
<listitem>
|
|
<para><varname>Param</varname>, meaning an
|
|
invalid system call parameter error.</para>
|
|
</listitem>
|
|
|
|
<listitem>
|
|
<para><varname>Free</varname>, meaning an
|
|
invalid or mismatching free.</para>
|
|
</listitem>
|
|
|
|
<listitem>
|
|
<para><varname>Overlap</varname>, meaning a
|
|
<computeroutput>src</computeroutput> /
|
|
<computeroutput>dst</computeroutput> overlap in
|
|
<function>memcpy()</function> or a similar function.</para>
|
|
</listitem>
|
|
|
|
<listitem>
|
|
<para><varname>Leak</varname>, meaning
|
|
a memory leak.</para>
|
|
</listitem>
|
|
|
|
</itemizedlist>
|
|
|
|
<para><computeroutput>Param</computeroutput> errors have an extra
|
|
information line at this point, which is the name of the offending
|
|
system call parameter. No other error kinds have this extra
|
|
line.</para>
|
|
|
|
<para>The first line of the calling context: for Value and Addr errors,
|
|
it is either the name of the function in which the error occurred, or,
|
|
failing that, the full path of the .so file or executable containing the
|
|
error location.  For Free errors, it is the name of the function doing the
freeing (e.g. <function>free</function>,
<function>__builtin_vec_delete</function>, etc).  For Overlap errors, it is
the name of the function with the overlapping arguments (e.g.
|
|
<function>memcpy()</function>, <function>strcpy()</function>,
|
|
etc).</para>
|
|
|
|
<para>Lastly, there's the rest of the calling context.</para>
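<para>Putting these pieces together, complete entries might look like the
following sketch.  The suppression names on the first line of each entry
are arbitrary labels.  The function names in the calling contexts are
placeholders: <function>mk</function> and <function>main</function> are
borrowed from the leak report shown earlier, and
<function>send_packet</function> is purely hypothetical.  In practice you
would use whatever appears in the error you want to suppress.  The second
entry shows the extra line that <computeroutput>Param</computeroutput>
errors carry.</para>

<programlisting><![CDATA[
{
   example-leak-in-mk
   Memcheck:Leak
   fun:malloc
   fun:mk
   fun:main
}
{
   example-write-param
   Memcheck:Param
   write(buf)
   fun:write
   fun:send_packet
}
]]></programlisting>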
|
|
|
|
</sect1>
|
|
|
|
|
|
|
|
<sect1 id="mc-manual.machine"
|
|
xreflabel="Details of Memcheck's checking machinery">
|
|
<title>Details of Memcheck's checking machinery</title>
|
|
|
|
<para>Read this section if you want to know, in detail, exactly
|
|
what and how Memcheck is checking.</para>
|
|
|
|
|
|
<sect2 id="mc-manual.value" xreflabel="Valid-value (V) bit">
|
|
<title>Valid-value (V) bits</title>
|
|
|
|
<para>It is simplest to think of Memcheck as implementing a synthetic CPU
|
|
which is identical to a real CPU, except for one crucial detail. Every
|
|
bit (literally) of data processed, stored and handled by the real CPU
|
|
has, in the synthetic CPU, an associated "valid-value" bit, which says
|
|
whether or not the accompanying bit has a legitimate value. In the
|
|
discussions which follow, this bit is referred to as the V (valid-value)
|
|
bit.</para>
|
|
|
|
<para>Each byte in the system therefore has 8 V bits which follow it
|
|
wherever it goes. For example, when the CPU loads a word-size item (4
|
|
bytes) from memory, it also loads the corresponding 32 V bits from a
|
|
bitmap which stores the V bits for the process' entire address space.
|
|
If the CPU should later write the whole or some part of that value to
|
|
memory at a different address, the relevant V bits will be stored back
|
|
in the V-bit bitmap.</para>
|
|
|
|
<para>In short, each bit in the system has an associated V bit, which
|
|
follows it around everywhere, even inside the CPU. Yes, all the CPU's
|
|
registers (integer, floating point, vector and condition registers) have
|
|
their own V bit vectors.</para>
|
|
|
|
<para>Copying values around does not cause Memcheck to check for, or
|
|
report on, errors. However, when a value is used in a way which might
|
|
conceivably affect the outcome of your program's computation, the
|
|
associated V bits are immediately checked. If any of these indicate
|
|
that the value is undefined, an error is reported.</para>
|
|
|
|
<para>Here's an (admittedly nonsensical) example:</para>
|
|
<programlisting><![CDATA[
|
|
int i, j;
|
|
int a[10], b[10];
|
|
for ( i = 0; i < 10; i++ ) {
|
|
j = a[i];
|
|
b[i] = j;
|
|
}]]></programlisting>
|
|
|
|
<para>Memcheck emits no complaints about this, since it merely copies
|
|
uninitialised values from <varname>a[]</varname> into
|
|
<varname>b[]</varname>, and doesn't use them in a way which could
|
|
affect the behaviour of the program. However, if
|
|
the loop is changed to:</para>
|
|
<programlisting><![CDATA[
|
|
for ( i = 0; i < 10; i++ ) {
|
|
j += a[i];
|
|
}
|
|
if ( j == 77 )
|
|
printf("hello there\n");
|
|
]]></programlisting>
|
|
|
|
<para>then Memcheck will complain, at the
|
|
<computeroutput>if</computeroutput>, that the condition depends on
|
|
uninitialised values.  Note that it <emphasis>doesn't</emphasis> complain
|
|
at the <varname>j += a[i];</varname>, since at that point the
|
|
undefinedness is not "observable". It's only when a decision has to be
|
|
made as to whether or not to do the <function>printf</function> -- an
|
|
observable action of your program -- that Memcheck complains.</para>
|
|
|
|
<para>Most low level operations, such as adds, cause Memcheck to use the
|
|
V bits for the operands to calculate the V bits for the result. Even if
|
|
the result is partially or wholly undefined, it does not
|
|
complain.</para>
|
|
|
|
<para>Checks on definedness only occur in three places: when a value is
used to generate a memory address, when a control flow decision needs to
be made, and when a system call is detected, at which point Memcheck
checks the definedness of its parameters as required.</para>
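<para>The three places can be illustrated with one small function each.
This is only a sketch: the function names are made up for the example, and
it assumes the compiler emits a real conditional branch for the
<computeroutput>if</computeroutput>.  If a caller passed an uninitialised
<varname>idx</varname>, <varname>j</varname>, or buffer contents, these are
the points at which Memcheck would be expected to check the V bits; merely
passing the values along to the functions would not trigger a
check.</para>

<programlisting><![CDATA[
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/* 1. Address generation: idx is used to compute the address table + idx. */
int lookup(const int *table, size_t idx)
{
    assert(table != NULL);
    return table[idx];
}

/* 2. Control flow: the comparison decides which way the branch goes. */
const char *greet(int j)
{
    if (j == 77)
        return "hello there";
    return "goodbye";
}

/* 3. System call: fd, buf and n are parameters of write(), and the
   n bytes at buf are read by the kernel. */
ssize_t emit(int fd, const char *buf, size_t n)
{
    return write(fd, buf, n);
}
]]></programlisting>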
|
|
|
|
<para>If a check should detect undefinedness, an error message is
|
|
issued. The resulting value is subsequently regarded as well-defined.
|
|
To do otherwise would give long chains of error messages. In other
|
|
words, once Memcheck reports an undefined value error, it tries to
|
|
avoid reporting further errors derived from that same undefined
|
|
value.</para>
|
|
|
|
<para>This sounds overcomplicated. Why not just check all reads from
|
|
memory, and complain if an undefined value is loaded into a CPU
|
|
register? Well, that doesn't work well, because perfectly legitimate C
|
|
programs routinely copy uninitialised values around in memory, and we
|
|
don't want endless complaints about that. Here's the canonical example.
|
|
Consider a struct like this:</para>
|
|
<programlisting><![CDATA[
|
|
struct S { int x; char c; };
|
|
struct S s1, s2;
|
|
s1.x = 42;
|
|
s1.c = 'z';
|
|
s2 = s1;
|
|
]]></programlisting>
|
|
|
|
<para>The question to ask is: how large is <varname>struct S</varname>,
|
|
in bytes? An <varname>int</varname> is 4 bytes and a
|
|
<varname>char</varname> one byte, so perhaps a <varname>struct
|
|
S</varname> occupies 5 bytes? Wrong. All non-toy compilers we know
|
|
of will round the size of <varname>struct S</varname> up to a whole
|
|
number of words, in this case 8 bytes. Not doing this forces compilers
|
|
to generate truly appalling code for accessing arrays of
|
|
<varname>struct S</varname>'s on some architectures.</para>
|
|
|
|
<para>So <varname>s1</varname> occupies 8 bytes, yet only 5 of them will
|
|
be initialised. For the assignment <varname>s2 = s1</varname>, gcc
|
|
generates code to copy all 8 bytes wholesale into <varname>s2</varname>
|
|
without regard for their meaning. If Memcheck simply checked values as
|
|
they came out of memory, it would yelp every time a structure assignment
|
|
like this happened. So the more complicated behaviour described above
|
|
is necessary. This allows <literal>gcc</literal> to copy
|
|
<varname>s1</varname> into <varname>s2</varname> any way it likes, and a
|
|
warning will only be emitted if the uninitialised values are later
|
|
used.</para>
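<para>The amount of padding can be inspected directly.  The following
sketch (the helper name <function>padding_bytes</function> is made up for
the example) computes how many bytes of <varname>struct S</varname> belong
to no member.  On a platform with a 4-byte <varname>int</varname> one
would expect the result to be 3, but the layout is chosen by the compiler,
so treat that figure as typical rather than guaranteed.</para>

<programlisting><![CDATA[
#include <assert.h>
#include <stddef.h>

struct S { int x; char c; };

/* Number of bytes in struct S that follow its last member. */
size_t padding_bytes(void)
{
    /* c is the last member, so everything after it is tail padding. */
    size_t end_of_c = offsetof(struct S, c) + sizeof(char);

    assert(end_of_c <= sizeof(struct S));
    return sizeof(struct S) - end_of_c;
}
]]></programlisting>

<para>These trailing bytes are never written by the assignments to
<varname>s1.x</varname> and <varname>s1.c</varname>, which is why a
wholesale copy of the struct moves undefined bytes around.</para>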
|
|
|
|
</sect2>
|
|
|
|
|
|
<sect2 id="mc-manual.vaddress" xreflabel=" Valid-address (A) bits">
|
|
<title>Valid-address (A) bits</title>
|
|
|
|
<para>Notice that the previous subsection describes how the validity of
|
|
values is established and maintained without having to say whether the
|
|
program does or does not have the right to access any particular memory
|
|
location. We now consider the latter question.</para>
|
|
|
|
<para>As described above, every bit in memory or in the CPU has an
|
|
associated valid-value (V) bit. In addition, all bytes in memory, but
|
|
not in the CPU, have an associated valid-address (A) bit. This
|
|
indicates whether or not the program can legitimately read or write that
|
|
location.  It does not give any indication of the validity of the data
|
|
at that location -- that's the job of the V bits -- only whether or not
|
|
the location may be accessed.</para>
|
|
|
|
<para>Every time your program reads or writes memory, Memcheck checks
|
|
the A bits associated with the address. If any of them indicate an
|
|
invalid address, an error is emitted. Note that the reads and writes
|
|
themselves do not change the A bits, only consult them.</para>
|
|
|
|
<para>So how do the A bits get set/cleared? Like this:</para>
|
|
|
|
<itemizedlist>
|
|
<listitem>
|
|
<para>When the program starts, all the global data areas are
|
|
marked as accessible.</para>
|
|
</listitem>
|
|
|
|
<listitem>
|
|
<para>When the program does
|
|
<function>malloc</function>/<computeroutput>new</computeroutput>,
|
|
the A bits for exactly the area allocated, and not a byte more,
|
|
are marked as accessible. Upon freeing the area the A bits are
|
|
changed to indicate inaccessibility.</para>
|
|
</listitem>
|
|
|
|
<listitem>
|
|
<para>When the stack pointer register (<literal>SP</literal>) moves
|
|
up or down, A bits are set. The rule is that the area from
|
|
<literal>SP</literal> up to the base of the stack is marked as
|
|
accessible, and below <literal>SP</literal> is inaccessible. (If
|
|
that sounds illogical, bear in mind that the stack grows down, not
|
|
up, on almost all Unix systems, including GNU/Linux.) Tracking
|
|
<literal>SP</literal> like this has the useful side-effect that the
|
|
section of stack used by a function for local variables etc is
|
|
automatically marked accessible on function entry and inaccessible
|
|
on exit.</para>
|
|
</listitem>
|
|
|
|
<listitem>
|
|
<para>When doing system calls, A bits are changed appropriately.
|
|
For example, <literal>mmap</literal>
|
|
magically makes files appear in the process'
|
|
address space, so the A bits must be updated if <literal>mmap</literal>
|
|
succeeds.</para>
|
|
</listitem>
|
|
|
|
<listitem>
|
|
<para>Optionally, your program can tell Memcheck about such changes
explicitly, using the client request mechanism described in
<xref linkend="mc-manual.clientreqs"/>.</para>
|
|
</listitem>
|
|
|
|
</itemizedlist>
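<para>For heap blocks, the rules above can be sketched as follows.  The
function name <function>touch_block</function> is made up for the example.
The two accesses that Memcheck would be expected to report are left as
comments so that the code itself stays well defined.</para>

<programlisting><![CDATA[
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Allocate n bytes, touch only the bytes that are accessible, and
   return the value of the last one (or -1 if malloc fails). */
int touch_block(size_t n)
{
    char *p;
    int last;

    assert(n > 0);
    p = malloc(n);          /* p[0] .. p[n-1] become accessible */
    if (p == NULL)
        return -1;

    memset(p, 0x2a, n);     /* stays within the block: no complaint */
    /* p[n] = 0;               one byte past the end: invalid write */
    last = p[n - 1];

    free(p);                /* the whole block becomes inaccessible */
    /* last = p[0];            block already freed: invalid read */
    return last;
}
]]></programlisting>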
|
|
|
|
</sect2>
|
|
|
|
|
|
<sect2 id="mc-manual.together" xreflabel="Putting it all together">
|
|
<title>Putting it all together</title>
|
|
|
|
<para>Memcheck's checking machinery can be summarised as
|
|
follows:</para>
|
|
|
|
<itemizedlist>
|
|
<listitem>
|
|
<para>Each byte in memory has 8 associated V (valid-value) bits,
|
|
saying whether or not the byte has a defined value, and a single A
|
|
(valid-address) bit, saying whether or not the program currently has
|
|
the right to read/write that address.</para>
|
|
</listitem>
|
|
|
|
<listitem>
|
|
<para>When memory is read or written, the relevant A bits are
|
|
consulted. If they indicate an invalid address, Memcheck emits an
|
|
Invalid read or Invalid write error.</para>
|
|
</listitem>
|
|
|
|
<listitem>
|
|
<para>When memory is read into the CPU's registers, the relevant V
|
|
bits are fetched from memory and stored in the simulated CPU. They
|
|
are not consulted.</para>
|
|
</listitem>
|
|
|
|
<listitem>
|
|
<para>When a register is written out to memory, the V bits for that
|
|
register are written back to memory too.</para>
|
|
</listitem>
|
|
|
|
<listitem>
|
|
<para>When values in CPU registers are used to generate a memory
|
|
address, or to determine the outcome of a conditional branch, the V
|
|
bits for those values are checked, and an error emitted if any of
|
|
them are undefined.</para>
|
|
</listitem>
|
|
|
|
<listitem>
|
|
<para>When values in CPU registers are used for any other purpose,
|
|
Memcheck computes the V bits for the result, but does not check
|
|
them.</para>
|
|
</listitem>
|
|
|
|
<listitem>
|
|
<para>Once the V bits for a value in the CPU have been checked, they
|
|
are then set to indicate validity. This avoids long chains of
|
|
errors.</para>
|
|
</listitem>
|
|
|
|
<listitem>
|
|
<para>When values are loaded from memory, Memcheck checks the A bits
|
|
for that location and issues an illegal-address warning if needed.
|
|
In that case, the V bits loaded are forced to indicate Valid,
|
|
despite the location being invalid.</para>
|
|
|
|
<para>This apparently strange choice reduces the amount of confusing
|
|
information presented to the user. It avoids the unpleasant
|
|
phenomenon in which memory is read from a place which is both
|
|
unaddressable and contains invalid values, and, as a result, you get
|
|
not only an invalid-address (read/write) error, but also a
|
|
potentially large set of uninitialised-value errors, one for every
|
|
time the value is used.</para>
|
|
|
|
<para>There is a hazy boundary case to do with multi-byte loads from
|
|
addresses which are partially valid and partially invalid. See
|
|
the description of the <option>--partial-loads-ok</option> option for details.
|
|
</para>
|
|
</listitem>
|
|
|
|
</itemizedlist>
|
|
|
|
|
|
<para>Memcheck intercepts calls to <function>malloc</function>,
|
|
<function>calloc</function>, <function>realloc</function>,
|
|
<function>valloc</function>, <function>memalign</function>,
|
|
<function>free</function>, <computeroutput>new</computeroutput>,
|
|
<computeroutput>new[]</computeroutput>,
|
|
<computeroutput>delete</computeroutput> and
|
|
<computeroutput>delete[]</computeroutput>. The behaviour you get
|
|
is:</para>
|
|
|
|
<itemizedlist>
|
|
|
|
<listitem>
|
|
<para><function>malloc</function>/<computeroutput>new</computeroutput>/<computeroutput>new[]</computeroutput>:
|
|
the returned memory is marked as addressable but not having valid
|
|
values. This means you have to write to it before you can read
|
|
it.</para>
|
|
</listitem>
|
|
|
|
<listitem>
|
|
<para><function>calloc</function>: returned memory is marked both
|
|
addressable and valid, since <function>calloc</function> clears
|
|
the area to zero.</para>
|
|
</listitem>
|
|
|
|
<listitem>
|
|
<para><function>realloc</function>: if the new size is larger than
|
|
the old, the new section is addressable but invalid, as with
|
|
<function>malloc</function>.  If the new size is smaller, the dropped-off section is
|
|
marked as unaddressable. You may only pass to
|
|
<function>realloc</function> a pointer previously issued to you by
|
|
<function>malloc</function>/<function>calloc</function>/<function>realloc</function>.</para>
|
|
</listitem>
|
|
|
|
<listitem>
|
|
<para><function>free</function>/<computeroutput>delete</computeroutput>/<computeroutput>delete[]</computeroutput>:
|
|
you may only pass to these functions a pointer previously issued
|
|
to you by the corresponding allocation function. Otherwise,
|
|
Memcheck complains. If the pointer is indeed valid, Memcheck
|
|
marks the entire area it points at as unaddressable, and places
|
|
the block in the freed-blocks-queue. The aim is to defer as long
|
|
as possible reallocation of this block. Until that happens, all
|
|
attempts to access it will elicit an invalid-address error, as you
|
|
would hope.</para>
|
|
</listitem>
|
|
|
|
</itemizedlist>
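<para>A short sketch of how these rules play out in practice is given
below.  The function name <function>grow_and_sum</function> is made up for
the example.  The block starts out defined because it comes from
<function>calloc</function>, the section added by
<function>realloc</function> is written before it is read, and the final
sum therefore depends only on defined values.</para>

<programlisting><![CDATA[
#include <stdlib.h>

/* Grow a zeroed array from 4 to 8 ints and return the sum of all
   elements, or -1 if an allocation fails. */
int grow_and_sum(void)
{
    int sum = 0;
    int *v, *w;

    v = calloc(4, sizeof *v);        /* addressable and defined (all zero) */
    if (v == NULL)
        return -1;

    w = realloc(v, 8 * sizeof *w);   /* w[4] .. w[7]: addressable, undefined */
    if (w == NULL) {
        free(v);
        return -1;
    }

    for (int i = 4; i < 8; i++)
        w[i] = i;                    /* define the new section before reading */

    for (int i = 0; i < 8; i++)
        sum += w[i];

    free(w);                         /* the block becomes unaddressable */
    return sum;
}
]]></programlisting>

<para>Removing the loop that writes <varname>w[4]</varname> to
<varname>w[7]</varname> would leave the sum dependent on undefined values,
which Memcheck would be expected to report once the result is used in a
way that affects the program's behaviour.</para>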
|
|
|
|
</sect2>
|
|
</sect1>
|
|
|
|
|
|
|
|
<sect1 id="mc-manual.clientreqs" xreflabel="Client requests">
|
|
<title>Client Requests</title>
|
|
|
|
<para>The following client requests are defined in
|
|
<filename>memcheck.h</filename>.
|
|
See <filename>memcheck.h</filename> for exact details of their
|
|
arguments.</para>
|
|
|
|
<itemizedlist>
|
|
|
|
<listitem>
|
|
<para><varname>VALGRIND_MAKE_MEM_NOACCESS</varname>,
|
|
<varname>VALGRIND_MAKE_MEM_UNDEFINED</varname> and
|
|
<varname>VALGRIND_MAKE_MEM_DEFINED</varname>.
|
|
These mark address ranges as completely inaccessible,
|
|
accessible but containing undefined data, and accessible and
|
|
containing defined data, respectively. Subsequent errors may
|
|
have their faulting addresses described in terms of these
|
|
blocks. Returns a "block handle". Returns zero when not run
|
|
on Valgrind.</para>
|
|
</listitem>
|
|
|
|
<listitem>
|
|
<para><varname>VALGRIND_MAKE_MEM_DEFINED_IF_ADDRESSABLE</varname>.
|
|
This is just like <varname>VALGRIND_MAKE_MEM_DEFINED</varname> but only
|
|
affects those bytes that are already addressable.</para>
|
|
</listitem>
|
|
|
|
<listitem>
|
|
<para><varname>VALGRIND_DISCARD</varname>: At some point you may
|
|
want Valgrind to stop reporting errors in terms of the blocks
|
|
defined by the previous three macros. To do this, the above macros
|
|
return a small-integer "block handle". You can pass this block
|
|
handle to <varname>VALGRIND_DISCARD</varname>. After doing so,
|
|
Valgrind will no longer be able to relate addressing errors to the
|
|
user-defined block associated with the handle. The permissions
|
|
settings associated with the handle remain in place; this just
|
|
affects how errors are reported, not whether they are reported.
|
|
Returns 1 for an invalid handle and 0 for a valid handle (although
|
|
passing invalid handles is harmless). Always returns 0 when not run
|
|
on Valgrind.</para>
|
|
</listitem>
|
|
|
|
<listitem>
|
|
<para><varname>VALGRIND_CHECK_MEM_IS_ADDRESSABLE</varname> and
|
|
<varname>VALGRIND_CHECK_MEM_IS_DEFINED</varname>: check immediately
|
|
whether or not the given address range has the relevant property,
|
|
and if not, print an error message. Also, for the convenience of
|
|
the client, returns zero if the relevant property holds; otherwise,
|
|
the returned value is the address of the first byte for which the
|
|
property is not true. Always returns 0 when not run on
|
|
Valgrind.</para>
|
|
</listitem>
|
|
|
|
<listitem>
|
|
<para><varname>VALGRIND_CHECK_VALUE_IS_DEFINED</varname>: a quick and easy
|
|
way to find out whether Valgrind thinks a particular value
|
|
(lvalue, to be precise) is addressable and defined. Prints an error
|
|
message if not. It has no return value.</para>
|
|
</listitem>
|
|
|
|
<listitem>
|
|
<para><varname>VALGRIND_DO_LEAK_CHECK</varname>: does a full memory leak
|
|
check (like <option>--leak-check=full</option>) right now.
|
|
This is useful for incrementally checking for leaks between arbitrary
|
|
places in the program's execution. It has no return value.</para>
|
|
</listitem>
|
|
|
|
<listitem>
|
|
<para><varname>VALGRIND_DO_QUICK_LEAK_CHECK</varname>: like
|
|
<varname>VALGRIND_DO_LEAK_CHECK</varname>, except it produces only a leak
|
|
summary (like <option>--leak-check=summary</option>).
|
|
It has no return value.</para>
|
|
</listitem>
|
|
|
|
<listitem>
|
|
<para><varname>VALGRIND_COUNT_LEAKS</varname>: fills in the four
|
|
arguments with the number of bytes of memory found by the previous
|
|
leak check to be leaked (i.e. the sum of direct leaks and indirect leaks),
|
|
dubious (possibly lost), reachable and suppressed.  This is useful in test harness code,
|
|
after calling <varname>VALGRIND_DO_LEAK_CHECK</varname> or
|
|
<varname>VALGRIND_DO_QUICK_LEAK_CHECK</varname>.</para>
|
|
</listitem>
|
|
|
|
<listitem>
|
|
<para><varname>VALGRIND_COUNT_LEAK_BLOCKS</varname>: identical to
|
|
<varname>VALGRIND_COUNT_LEAKS</varname> except that it returns the
|
|
number of blocks rather than the number of bytes in each
|
|
category.</para>
|
|
</listitem>
|
|
|
|
<listitem>
|
|
<para><varname>VALGRIND_GET_VBITS</varname> and
|
|
<varname>VALGRIND_SET_VBITS</varname>: allow you to get and set the
|
|
V (validity) bits for an address range. You should probably only
|
|
set V bits that you have got with
|
|
<varname>VALGRIND_GET_VBITS</varname>. Only for those who really
|
|
know what they are doing.</para>
|
|
</listitem>
|
|
|
|
</itemizedlist>
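<para>As a hedged sketch of how some of these requests might be used from
test harness code, consider the two helpers below.  The helper names are
made up for the example, and the sketch assumes the header is installed as
<filename>valgrind/memcheck.h</filename>.  The fallback definitions are
not part of Valgrind; they only allow the code to build where the header
is unavailable.</para>

<programlisting><![CDATA[
#include <stddef.h>

#if defined(__has_include)
#  if __has_include(<valgrind/memcheck.h>)
#    include <valgrind/memcheck.h>
#  endif
#endif

#ifndef VALGRIND_COUNT_LEAKS
/* Fallbacks (an assumption of this sketch, not part of Valgrind) so that
   the code still builds where the Valgrind headers are not installed. */
#  define VALGRIND_DO_LEAK_CHECK do { } while (0)
#  define VALGRIND_COUNT_LEAKS(leaked, dubious, reachable, suppressed) \
      do { (leaked) = (dubious) = (reachable) = (suppressed) = 0; } while (0)
#  define VALGRIND_CHECK_MEM_IS_DEFINED(addr, len) \
      ((void)(addr), (void)(len), 0)
#endif

/* Run a full leak check now and return the number of leaked bytes
   (directly plus indirectly lost).  Returns 0 when not run on Valgrind. */
unsigned long leaked_bytes_now(void)
{
    unsigned long leaked = 0, dubious = 0, reachable = 0, suppressed = 0;

    VALGRIND_DO_LEAK_CHECK;
    VALGRIND_COUNT_LEAKS(leaked, dubious, reachable, suppressed);
    (void)dubious; (void)reachable; (void)suppressed;
    return leaked;
}

/* Returns 1 if Memcheck considers all len bytes at buf addressable and
   defined (or if not run on Valgrind), 0 otherwise. */
int buffer_is_defined(const void *buf, size_t len)
{
    return VALGRIND_CHECK_MEM_IS_DEFINED(buf, len) == 0;
}
]]></programlisting>

<para>A test could call <function>leaked_bytes_now</function> before and
after the code under test and compare the two values.</para>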
|
|
|
|
</sect1>
|
|
|
|
|
|
|
|
|
|
<sect1 id="mc-manual.mempools" xreflabel="Memory pools">
|
|
<title>Memory Pools: describing and working with custom allocators</title>
|
|
|
|
<para>Some programs use custom memory allocators, often for performance
|
|
reasons. Left to itself, Memcheck is unable to "understand" the
|
|
behaviour of custom allocation schemes and so may miss errors and
|
|
leaks in your program. What this section describes is a way to give
|
|
Memcheck enough of a description of your custom allocator that it can
|
|
make at least some sense of what is happening.</para>
|
|
|
|
<para>There are many different sorts of custom allocator, so Memcheck
|
|
attempts to reason about them using a loose, abstract model. We
|
|
use the following terminology when describing custom allocation
|
|
systems:</para>
|
|
|
|
<itemizedlist>
|
|
<listitem>
|
|
<para>Custom allocation involves a set of independent "memory pools".
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>Memcheck's notion of a memory pool consists of a single "anchor
|
|
address" and a set of non-overlapping "chunks" associated with the
|
|
anchor address.</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>Typically a pool's anchor address is the address of a
|
|
book-keeping "header" structure.</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>Typically the pool's chunks are drawn from a contiguous
|
|
"superblock" acquired through the system
|
|
<function>malloc()</function> or
|
|
<function>mmap()</function>.</para>
|
|
</listitem>
|
|
|
|
</itemizedlist>
|
|
|
|
<para>Keep in mind that the last two points above say "typically": the
|
|
Valgrind mempool client request API is intentionally vague about the
|
|
exact structure of a mempool. There is no specific mention made of
|
|
headers or superblocks. Nevertheless, the following picture may help
|
|
elucidate the intention of the terms in the API:</para>
|
|
|
|
<programlisting><![CDATA[
"pool"
(anchor address)
|
v
+--------+---+
| header | o |
+--------+-|-+
           |
           v                  superblock
           +------+---+--------------+---+------------------+
           |      |rzB|  allocation  |rzB|                  |
           +------+---+--------------+---+------------------+
                      ^              ^
                      |              |
                    "addr"     "addr"+"size"
]]></programlisting>
|
|
|
|
<para>
|
|
Note that the header and the superblock may be contiguous or
|
|
discontiguous, and there may be multiple superblocks associated with a
|
|
single header; such variations are opaque to Memcheck. The API
|
|
only requires that your allocation scheme can present sensible values
|
|
of "pool", "addr" and "size".</para>
|
|
|
|
<para>
|
|
Typically, before making client requests related to mempools, a client
|
|
program will have allocated such a header and superblock for its
|
|
mempool, and marked the superblock NOACCESS using the
|
|
<varname>VALGRIND_MAKE_MEM_NOACCESS</varname> client request.</para>
|
|
|
|
<para>
|
|
When dealing with mempools, the goal is to maintain a particular
|
|
invariant condition: that Memcheck believes the unallocated portions
|
|
of the pool's superblock (including redzones) are NOACCESS. To
|
|
maintain this invariant, the client program must ensure that the
|
|
superblock starts out in that state; Memcheck cannot make it so, since
|
|
Memcheck never explicitly learns about the superblock of a pool, only
|
|
the allocated chunks within the pool.</para>
|
|
|
|
<para>
|
|
Once the header and superblock for a pool are established and properly
|
|
marked, there are a number of client requests programs can use to
|
|
inform Memcheck about changes to the state of a mempool:</para>
|
|
|
|
<itemizedlist>
|
|
|
|
<listitem>
|
|
<para>
|
|
<varname>VALGRIND_CREATE_MEMPOOL(pool, rzB, is_zeroed)</varname>:
|
|
This request registers the address "pool" as the anchor address
|
|
for a memory pool. It also provides a size "rzB", specifying how
|
|
large the redzones placed around chunks allocated from the pool
|
|
should be. Finally, it provides an "is_zeroed" flag that specifies
|
|
whether the pool's chunks are zeroed (more precisely: defined)
|
|
when allocated.
|
|
</para>
|
|
<para>
|
|
Upon completion of this request, no chunks are associated with the
|
|
pool. The request simply tells Memcheck that the pool exists, so that
|
|
subsequent calls can refer to it as a pool.
|
|
</para>
|
|
</listitem>
|
|
|
|
<listitem>
|
|
<para><varname>VALGRIND_DESTROY_MEMPOOL(pool)</varname>:
|
|
This request tells Memcheck that a pool is being torn down. Memcheck
|
|
then removes all records of chunks associated with the pool, as well
|
|
as its record of the pool's existence. While destroying its records of
|
|
a mempool, Memcheck resets the redzones of any live chunks in the pool
|
|
to NOACCESS.
|
|
</para>
|
|
</listitem>
|
|
|
|
<listitem>
|
|
<para><varname>VALGRIND_MEMPOOL_ALLOC(pool, addr, size)</varname>:
|
|
This request informs Memcheck that a "size"-byte chunk has been
|
|
allocated at "addr", and associates the chunk with the specified
|
|
"pool". If the pool was created with nonzero "rzB" redzones, Memcheck
|
|
will mark the "rzB" bytes before and after the chunk as NOACCESS. If
|
|
the pool was created with the "is_zeroed" flag set, Memcheck will mark
|
|
the chunk as DEFINED, otherwise Memcheck will mark the chunk as
|
|
UNDEFINED.
|
|
</para>
|
|
</listitem>
|
|
|
|
<listitem>
|
|
<para><varname>VALGRIND_MEMPOOL_FREE(pool, addr)</varname>:
|
|
This request informs Memcheck that the chunk at "addr" should no
|
|
longer be considered allocated. Memcheck will mark the chunk
|
|
associated with "addr" as NOACCESS, and delete its record of the
|
|
chunk's existence.
|
|
</para>
|
|
</listitem>
|
|
|
|
<listitem>
|
|
<para><varname>VALGRIND_MEMPOOL_TRIM(pool, addr, size)</varname>:
|
|
This request "trims" the chunks associated with "pool". The request
|
|
only operates on chunks associated with "pool". Trimming is formally
|
|
defined as:</para>
|
|
<itemizedlist>
|
|
<listitem>
|
|
<para> All chunks entirely inside the range [addr,addr+size) are
|
|
preserved.</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>All chunks entirely outside the range [addr,addr+size) are
|
|
discarded, as though <varname>VALGRIND_MEMPOOL_FREE</varname>
|
|
was called on them. </para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>All other chunks must intersect with the range
|
|
[addr,addr+size); areas outside the intersection are marked as
|
|
NOACCESS, as though they had been independently freed with
|
|
<varname>VALGRIND_MEMPOOL_FREE</varname>.</para>
|
|
</listitem>
|
|
</itemizedlist>
|
|
<para>This is a somewhat rare request, but can be useful in
|
|
implementing the type of mass-free operations common in custom
|
|
LIFO allocators.</para>
|
|
</listitem>
|
|
|
|
<listitem>
|
|
<para><varname>VALGRIND_MOVE_MEMPOOL(poolA, poolB)</varname>: This
|
|
request informs Memcheck that the pool previously anchored at
|
|
address "poolA" has moved to anchor address "poolB". This is a
|
|
rare request, typically only needed if you
|
|
<function>realloc()</function> the header of a mempool.</para>
|
|
<para>No memory-status bits are altered by this request.</para>
|
|
</listitem>
|
|
|
|
<listitem>
|
|
<para>
|
|
<varname>VALGRIND_MEMPOOL_CHANGE(pool, addrA, addrB,
|
|
size)</varname>: This request informs Memcheck that the chunk
|
|
previously allocated at address "addrA" within "pool" has been
|
|
moved and/or resized, and should be changed to cover the region
|
|
[addrB,addrB+size). This is a rare request, typically only needed
|
|
if you <function>realloc()</function> a superblock or wish to
|
|
extend a chunk without changing its memory-status bits.
|
|
</para>
|
|
<para>No memory-status bits are altered by this request.
|
|
</para>
|
|
</listitem>
|
|
|
|
<listitem>
|
|
<para><varname>VALGRIND_MEMPOOL_EXISTS(pool)</varname>:
|
|
This request informs the caller whether or not Memcheck is currently
|
|
tracking a mempool at anchor address "pool". It evaluates to 1 when
|
|
there is a mempool associated with that address, 0 otherwise. This is a
|
|
rare request, only useful in circumstances when client code might have
|
|
lost track of the set of active mempools.
|
|
</para>
|
|
</listitem>
|
|
|
|
</itemizedlist>
|
|
|
|
</sect1>
<sect1 id="mc-manual.mpiwrap" xreflabel="MPI Wrappers">
|
|
<title>Debugging MPI Parallel Programs with Valgrind</title>
|
|
|
|
<para> Valgrind supports debugging of distributed-memory applications
|
|
which use the MPI message passing standard. This support consists of a
|
|
library of wrapper functions for the
|
|
<computeroutput>PMPI_*</computeroutput> interface. When incorporated
|
|
into the application's address space, either by direct linking or by
|
|
<computeroutput>LD_PRELOAD</computeroutput>, the wrappers intercept
|
|
calls to <computeroutput>PMPI_Send</computeroutput>,
|
|
<computeroutput>PMPI_Recv</computeroutput>, etc. They then
|
|
use client requests to inform Valgrind of memory state changes caused
|
|
by the function being wrapped. This reduces the number of false
|
|
positives that Memcheck otherwise typically reports for MPI
|
|
applications.</para>
|
|
|
|
<para>The wrappers also take the opportunity to carefully check
|
|
size and definedness of buffers passed as arguments to MPI functions, hence
|
|
detecting errors such as passing undefined data to
|
|
<computeroutput>PMPI_Send</computeroutput>, or receiving data into a
|
|
buffer which is too small.</para>
|
|
|
|
<para>Unlike most of the rest of Valgrind, the wrapper library is subject to a
|
|
BSD-style license, so you can link it into any code base you like.
|
|
See the top of <computeroutput>auxprogs/libmpiwrap.c</computeroutput>
|
|
for license details.</para>
|
|
|
|
|
|
<sect2 id="mc-manual.mpiwrap.build" xreflabel="Building MPI Wrappers">
|
|
<title>Building and installing the wrappers</title>
|
|
|
|
<para> The wrapper library will be built automatically if possible.
|
|
Valgrind's configure script will look for a suitable
|
|
<computeroutput>mpicc</computeroutput> to build it with. This must be
|
|
the same <computeroutput>mpicc</computeroutput> you use to build the
|
|
MPI application you want to debug. By default, Valgrind tries
|
|
<computeroutput>mpicc</computeroutput>, but you can specify a
|
|
different one by using the configure-time flag
|
|
<option>--with-mpicc=</option>. Currently the
|
|
wrappers are only buildable with
|
|
<computeroutput>mpicc</computeroutput>s which are based on GNU
|
|
<computeroutput>gcc</computeroutput> or Intel's
|
|
<computeroutput>icc</computeroutput>.</para>
|
|
|
|
<para>Check that the configure script prints a line like this:</para>
|
|
|
|
<programlisting><![CDATA[
|
|
checking for usable MPI2-compliant mpicc and mpi.h... yes, mpicc
|
|
]]></programlisting>
|
|
|
|
<para>If it says <computeroutput>... no</computeroutput>, your
|
|
<computeroutput>mpicc</computeroutput> has failed to compile and link
|
|
a test MPI2 program.</para>
|
|
|
|
<para>If the configure test succeeds, continue in the usual way with
|
|
<computeroutput>make</computeroutput> and <computeroutput>make
|
|
install</computeroutput>. The final install tree should then contain
|
|
<computeroutput>libmpiwrap-&lt;platform&gt;.so</computeroutput>.
|
|
</para>
|
|
|
|
<para>Compile a test MPI program (e.g. an MPI hello-world) and try
|
|
this:</para>
|
|
|
|
<programlisting><![CDATA[
|
|
LD_PRELOAD=$prefix/lib/valgrind/libmpiwrap-<platform>.so \
|
|
mpirun [args] $prefix/bin/valgrind ./hello
|
|
]]></programlisting>
|
|
|
|
<para>You should see something similar to the following:</para>
|
|
|
|
<programlisting><![CDATA[
|
|
valgrind MPI wrappers 31901: Active for pid 31901
|
|
valgrind MPI wrappers 31901: Try MPIWRAP_DEBUG=help for possible options
|
|
]]></programlisting>
|
|
|
|
<para>repeated for every process in the group. If you do not see
|
|
these, there is a build or installation problem of some kind.</para>
|
|
|
|
<para> The MPI functions to be wrapped are assumed to be in an ELF
|
|
shared object with soname matching
|
|
<computeroutput>libmpi.so*</computeroutput>. This is known to be
|
|
correct at least for Open MPI and Quadrics MPI, and can easily be
|
|
changed if required.</para>
|
|
</sect2>
|
|
|
|
|
|
<sect2 id="mc-manual.mpiwrap.gettingstarted"
|
|
xreflabel="Getting started with MPI Wrappers">
|
|
<title>Getting started</title>
|
|
|
|
<para>Compile your MPI application as usual, taking care to link it
|
|
using the same <computeroutput>mpicc</computeroutput> that your
|
|
Valgrind build was configured with.</para>
|
|
|
|
<para>
|
|
Use the following basic scheme to run your application on Valgrind with
|
|
the wrappers engaged:</para>
|
|
|
|
<programlisting><![CDATA[
|
|
MPIWRAP_DEBUG=[wrapper-args] \
|
|
LD_PRELOAD=$prefix/lib/valgrind/libmpiwrap-<platform>.so \
|
|
mpirun [mpirun-args] \
|
|
$prefix/bin/valgrind [valgrind-args] \
|
|
[application] [app-args]
|
|
]]></programlisting>
|
|
|
|
<para>As an alternative to
|
|
<computeroutput>LD_PRELOAD</computeroutput>ing
|
|
<computeroutput>libmpiwrap-&lt;platform&gt;.so</computeroutput>, you can
|
|
simply link it to your application if desired. This should not disturb
|
|
native behaviour of your application in any way.</para>
|
|
</sect2>
|
|
|
|
|
|
<sect2 id="mc-manual.mpiwrap.controlling"
|
|
xreflabel="Controlling the MPI Wrappers">
|
|
<title>Controlling the wrapper library</title>
|
|
|
|
<para>Environment variable
|
|
<computeroutput>MPIWRAP_DEBUG</computeroutput> is consulted at
|
|
startup. The default behaviour is to print a starting banner</para>
|
|
|
|
<programlisting><![CDATA[
|
|
valgrind MPI wrappers 16386: Active for pid 16386
|
|
valgrind MPI wrappers 16386: Try MPIWRAP_DEBUG=help for possible options
|
|
]]></programlisting>
|
|
|
|
<para> and then be relatively quiet.</para>
|
|
|
|
<para>You can give a list of comma-separated options in
|
|
<computeroutput>MPIWRAP_DEBUG</computeroutput>. These are</para>
|
|
|
|
<itemizedlist>
|
|
<listitem>
|
|
<para><computeroutput>verbose</computeroutput>:
|
|
show entries/exits of all wrappers. Also show extra
|
|
debugging info, such as the status of outstanding
|
|
<computeroutput>MPI_Request</computeroutput>s resulting
|
|
from uncompleted <computeroutput>MPI_Irecv</computeroutput>s.</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para><computeroutput>quiet</computeroutput>:
|
|
opposite of <computeroutput>verbose</computeroutput>, only print
|
|
anything when the wrappers want
|
|
to report a detected programming error, or in case of catastrophic
|
|
failure of the wrappers.</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para><computeroutput>warn</computeroutput>:
|
|
by default, functions which lack proper wrappers
|
|
are not commented on, just silently
|
|
ignored. This option causes a warning to be printed for each unwrapped
|
|
function used, up to a maximum of three warnings per function.</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para><computeroutput>strict</computeroutput>:
|
|
print an error message and abort the program if
|
|
a function lacking a wrapper is used.</para>
|
|
</listitem>
|
|
</itemizedlist>
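<para>As an illustration of what a comma-separated option list means here,
the following minimal sketch parses such a string into a set of flags. It
is not taken from the wrapper library, whose actual parsing may differ;
<computeroutput>wrapper_opts</computeroutput>,
<computeroutput>has_opt</computeroutput> and
<computeroutput>parse_wrapper_opts</computeroutput> are hypothetical
names.</para>

<programlisting><![CDATA[
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical flag set mirroring the four documented options. */
typedef struct { bool verbose, quiet, warn, strict; } wrapper_opts;

/* True if `name` appears as a whole item of the comma-separated `list`.
   A NULL or empty list contains nothing. */
static bool has_opt(const char *list, const char *name)
{
    assert(name != NULL);
    size_t n = strlen(name);

    while (list != NULL && *list != '\0') {
        const char *comma = strchr(list, ',');
        size_t len = comma ? (size_t)(comma - list) : strlen(list);
        if (len == n && strncmp(list, name, n) == 0)
            return true;
        list = comma ? comma + 1 : NULL;   /* advance to the next item */
    }
    return false;
}

/* `value` is the content of the environment variable, or NULL if unset. */
wrapper_opts parse_wrapper_opts(const char *value)
{
    wrapper_opts o;
    o.verbose = has_opt(value, "verbose");
    o.quiet   = has_opt(value, "quiet");
    o.warn    = has_opt(value, "warn");
    o.strict  = has_opt(value, "strict");
    return o;
}
]]></programlisting>

<para>A caller would typically pass
<computeroutput>getenv("MPIWRAP_DEBUG")</computeroutput>; with the value
<computeroutput>warn,quiet</computeroutput> the sketch sets only the
<computeroutput>warn</computeroutput> and
<computeroutput>quiet</computeroutput> flags.</para>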
|
|
|
|
<para> If you want to use Valgrind's XML output facility
|
|
(<option>--xml=yes</option>), you should pass
|
|
<computeroutput>quiet</computeroutput> in
|
|
<computeroutput>MPIWRAP_DEBUG</computeroutput> so as to get rid of any
|
|
extraneous printing from the wrappers.</para>
|
|
|
|
</sect2>
|
|
|
|
|
|
<sect2 id="mc-manual.mpiwrap.limitations"
|
|
xreflabel="Abilities and Limitations of MPI Wrappers">
|
|
<title>Abilities and limitations</title>
|
|
|
|
<sect3 id="mc-manual.mpiwrap.limitations.functions"
|
|
xreflabel="Functions">
|
|
<title>Functions</title>
|
|
|
|
<para>All MPI2 functions except
|
|
<computeroutput>MPI_Wtick</computeroutput>,
|
|
<computeroutput>MPI_Wtime</computeroutput> and
|
|
<computeroutput>MPI_Pcontrol</computeroutput> have wrappers. The
|
|
first two are not wrapped because they return a
|
|
<computeroutput>double</computeroutput>, and Valgrind's
|
|
function-wrap mechanism cannot handle that (it could easily enough be
|
|
extended to). <computeroutput>MPI_Pcontrol</computeroutput> cannot be
|
|
wrapped as it has variable arity:
|
|
<computeroutput>int MPI_Pcontrol(const int level, ...)</computeroutput></para>
|
|
|
|
<para>Most functions are wrapped with a default wrapper which does
|
|
nothing except complain or abort if it is called, depending on
|
|
settings in <computeroutput>MPIWRAP_DEBUG</computeroutput> listed
|
|
above. The following functions have "real", do-something-useful
|
|
wrappers:</para>
|
|
|
|
<programlisting><![CDATA[
|
|
PMPI_Send PMPI_Bsend PMPI_Ssend PMPI_Rsend
|
|
|
|
PMPI_Recv PMPI_Get_count
|
|
|
|
PMPI_Isend PMPI_Ibsend PMPI_Issend PMPI_Irsend
|
|
|
|
PMPI_Irecv
|
|
PMPI_Wait PMPI_Waitall
|
|
PMPI_Test PMPI_Testall
|
|
|
|
PMPI_Iprobe PMPI_Probe
|
|
|
|
PMPI_Cancel
|
|
|
|
PMPI_Sendrecv
|
|
|
|
PMPI_Type_commit PMPI_Type_free
|
|
|
|
PMPI_Pack PMPI_Unpack
|
|
|
|
PMPI_Bcast PMPI_Gather PMPI_Scatter PMPI_Alltoall
|
|
PMPI_Reduce PMPI_Allreduce PMPI_Op_create
|
|
|
|
PMPI_Comm_create PMPI_Comm_dup PMPI_Comm_free PMPI_Comm_rank PMPI_Comm_size
|
|
|
|
PMPI_Error_string
|
|
PMPI_Init PMPI_Initialized PMPI_Finalize
|
|
]]></programlisting>
|
|
|
|
<para> A few functions such as
|
|
<computeroutput>PMPI_Address</computeroutput> are listed as
|
|
<computeroutput>HAS_NO_WRAPPER</computeroutput>. They have no wrapper
|
|
at all as there is nothing worth checking, and giving a no-op wrapper
|
|
would reduce performance for no reason.</para>
|
|
|
|
<para> Note that the wrapper library can itself generate large
|
|
numbers of calls to the MPI implementation, especially when walking
|
|
complex types. The most common functions called are
|
|
<computeroutput>PMPI_Extent</computeroutput>,
|
|
<computeroutput>PMPI_Type_get_envelope</computeroutput>,
|
|
<computeroutput>PMPI_Type_get_contents</computeroutput>, and
|
|
<computeroutput>PMPI_Type_free</computeroutput>. </para>
|
|
</sect3>
|
|
|
|
<sect3 id="mc-manual.mpiwrap.limitations.types"
|
|
xreflabel="Types">
|
|
<title>Types</title>
|
|
|
|
<para> MPI-1.1 structured types are supported, and walked exactly.
|
|
The currently supported combiners are
|
|
<computeroutput>MPI_COMBINER_NAMED</computeroutput>,
|
|
<computeroutput>MPI_COMBINER_CONTIGUOUS</computeroutput>,
|
|
<computeroutput>MPI_COMBINER_VECTOR</computeroutput>,
|
|
<computeroutput>MPI_COMBINER_HVECTOR</computeroutput>,
|
|
<computeroutput>MPI_COMBINER_INDEXED</computeroutput>,
|
|
<computeroutput>MPI_COMBINER_HINDEXED</computeroutput> and
|
|
<computeroutput>MPI_COMBINER_STRUCT</computeroutput>. This should
|
|
cover all MPI-1.1 types. The mechanism (function
|
|
<computeroutput>walk_type</computeroutput>) should extend easily to
|
|
cover MPI2 combiners.</para>
|
|
|
|
<para>MPI defines some named structured types
|
|
(<computeroutput>MPI_FLOAT_INT</computeroutput>,
|
|
<computeroutput>MPI_DOUBLE_INT</computeroutput>,
|
|
<computeroutput>MPI_LONG_INT</computeroutput>,
|
|
<computeroutput>MPI_2INT</computeroutput>,
|
|
<computeroutput>MPI_SHORT_INT</computeroutput>,
|
|
<computeroutput>MPI_LONG_DOUBLE_INT</computeroutput>) which are pairs
|
|
of some basic type and a C <computeroutput>int</computeroutput>.
|
|
Unfortunately the MPI specification makes it impossible to look inside
|
|
these types and see where the fields are. Therefore these wrappers
|
|
assume the types are laid out as <computeroutput>struct { float val;
|
|
int loc; }</computeroutput> (for
|
|
<computeroutput>MPI_FLOAT_INT</computeroutput>), etc, and act
|
|
accordingly. This appears to be correct at least for Open MPI 1.0.2
|
|
and for Quadrics MPI.</para>
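<para>The assumed layout can be made concrete with
<computeroutput>offsetof</computeroutput>. The sketch below is an
illustration only: the struct definitions restate the assumption in the
text, and <computeroutput>field_range</computeroutput>,
<computeroutput>FIELD</computeroutput> and
<computeroutput>padding_bytes</computeroutput> are hypothetical names.
Knowing where the fields lie presumably matters because padding bytes
around them are typically never written, so checking the whole struct for
definedness could produce spurious complaints.</para>

<programlisting><![CDATA[
#include <assert.h>
#include <stddef.h>

/* Layouts assumed for the named pair types, as described in the text. */
typedef struct { float  val; int loc; } float_int;   /* MPI_FLOAT_INT  */
typedef struct { double val; int loc; } double_int;  /* MPI_DOUBLE_INT */

/* Byte range [off, off+len) occupied by one field. */
typedef struct { size_t off, len; } field_range;

/* Offset and size of field `f` within struct type `T`. */
#define FIELD(T, f) ((field_range){ offsetof(T, f), sizeof(((T *)0)->f) })

/* Bytes of the struct that belong to neither field, i.e. padding. */
size_t padding_bytes(size_t total, field_range val, field_range loc)
{
    assert(val.off + val.len <= loc.off && loc.off + loc.len <= total);
    return total - val.len - loc.len;
}
]]></programlisting>

<para>On many 64-bit platforms
<computeroutput>double_int</computeroutput> has trailing padding after
<computeroutput>loc</computeroutput>, whereas
<computeroutput>float_int</computeroutput> often has none; the exact
figures depend on the ABI.</para>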
|
|
|
|
<para>If <computeroutput>strict</computeroutput> is an option specified
|
|
in <computeroutput>MPIWRAP_DEBUG</computeroutput>, the application
|
|
will abort if an unhandled type is encountered. Otherwise, the
|
|
application will print a warning message and continue.</para>
|
|
|
|
<para>Some effort is made to mark/check memory ranges corresponding to
|
|
arrays of values in a single pass. This is important for performance
|
|
since asking Valgrind to mark/check any range, no matter how small,
|
|
carries quite a large constant cost. This optimisation is applied to
|
|
arrays of primitive types (<computeroutput>double</computeroutput>,
|
|
<computeroutput>float</computeroutput>,
|
|
<computeroutput>int</computeroutput>,
|
|
<computeroutput>long</computeroutput>, <computeroutput>long
|
|
long</computeroutput>, <computeroutput>short</computeroutput>,
|
|
<computeroutput>char</computeroutput>, and <computeroutput>long
|
|
double</computeroutput> on platforms where <computeroutput>sizeof(long
|
|
double) == 8</computeroutput>). For arrays of all other types, the
|
|
wrappers handle each element individually and so there can be a very
|
|
large performance cost.</para>
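<para>The difference between the two strategies can be sketched as
follows. This is a simplified model rather than the library's code:
<computeroutput>range_fn</computeroutput> stands in for whatever
mark/check request is issued, and
<computeroutput>visit_array</computeroutput> and
<computeroutput>count_bytes</computeroutput> are hypothetical names.</para>

<programlisting><![CDATA[
#include <assert.h>
#include <stddef.h>

/* Stand-in for one mark/check request on the range [base, base+len). */
typedef void (*range_fn)(void *ctx, const char *base, size_t len);

/* Visit an array of `count` elements of `elem_size` bytes each.
   Returns the number of requests issued. */
size_t visit_array(const char *base, size_t count, size_t elem_size,
                   int is_primitive, range_fn fn, void *ctx)
{
    assert(fn != NULL);

    if (is_primitive) {
        /* Contiguous primitive elements: one request covers everything. */
        fn(ctx, base, count * elem_size);
        return 1;
    }
    /* Any other type: one request per element, so the constant
       per-request cost is paid `count` times. */
    for (size_t i = 0; i < count; i++)
        fn(ctx, base + i * elem_size, elem_size);
    return count;
}

/* Example callback: accumulate the number of bytes covered. */
void count_bytes(void *ctx, const char *base, size_t len)
{
    (void)base;
    *(size_t *)ctx += len;
}
]]></programlisting>

<para>Both paths cover the same bytes; only the number of requests, and
hence the accumulated constant cost, differs.</para>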
|
|
|
|
</sect3>
|
|
|
|
</sect2>
|
|
|
|
|
|
<sect2 id="mc-manual.mpiwrap.writingwrappers"
|
|
xreflabel="Writing new MPI Wrappers">
|
|
<title>Writing new wrappers</title>
|
|
|
|
<para>
|
|
For the most part the wrappers are straightforward. The only
|
|
significant complexity arises with nonblocking receives.</para>
|
|
|
|
<para>The issue is that <computeroutput>MPI_Irecv</computeroutput>
|
|
specifies the recv buffer and returns immediately, giving a handle
|
|
(<computeroutput>MPI_Request</computeroutput>) for the transaction.
|
|
Later the user will have to poll for completion with
|
|
<computeroutput>MPI_Wait</computeroutput> etc, and when the
|
|
transaction completes successfully, the wrappers have to paint the
|
|
recv buffer. But the recv buffer details are not presented to
|
|
<computeroutput>MPI_Wait</computeroutput> -- only the handle is. The
|
|
library therefore maintains a shadow table which associates
|
|
uncompleted <computeroutput>MPI_Request</computeroutput>s with the
|
|
corresponding buffer address/count/type. When an operation completes,
|
|
the table is searched for the associated address/count/type info, and
|
|
memory is marked accordingly.</para>
|
|
|
|
<para>Access to the table is guarded by a (POSIX pthreads) lock, so as
|
|
to make the library thread-safe.</para>
|
|
|
|
<para>The table is allocated with
|
|
<computeroutput>malloc</computeroutput> and never
|
|
<computeroutput>free</computeroutput>d, so it will show up in leak
|
|
checks.</para>
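<para>A minimal sketch of such a shadow table is shown below. It is not
the code in <computeroutput>libmpiwrap.c</computeroutput>: plain
<computeroutput>int</computeroutput> handles stand in for
<computeroutput>MPI_Request</computeroutput> and the MPI datatype so that
the example needs no MPI headers, and all names are hypothetical. Like the
table described above, it is guarded by a pthreads lock and its storage is
never freed.</para>

<programlisting><![CDATA[
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

typedef int request_handle;   /* stand-in for MPI_Request  */
typedef int datatype_handle;  /* stand-in for MPI_Datatype */

typedef struct {
    bool            in_use;
    request_handle  key;
    void           *buf;
    int             count;
    datatype_handle type;
} shadow_entry;

static shadow_entry   *table      = NULL;
static size_t          table_cap  = 0;
static pthread_mutex_t table_lock = PTHREAD_MUTEX_INITIALIZER;

/* Record an outstanding nonblocking receive.
   Returns false if the table could not be grown. */
bool shadow_add(request_handle req, void *buf, int count,
                datatype_handle type)
{
    bool ok = true;
    pthread_mutex_lock(&table_lock);

    size_t i;
    for (i = 0; i < table_cap; i++)       /* look for a free slot */
        if (!table[i].in_use)
            break;

    if (i == table_cap) {                 /* none free: grow the table */
        size_t new_cap = table_cap ? 2 * table_cap : 8;
        shadow_entry *t = realloc(table, new_cap * sizeof *t);
        if (t != NULL) {
            for (size_t j = table_cap; j < new_cap; j++)
                t[j].in_use = false;
            table = t;
            table_cap = new_cap;
        } else {
            ok = false;
        }
    }
    if (ok)
        table[i] = (shadow_entry){ true, req, buf, count, type };

    pthread_mutex_unlock(&table_lock);
    return ok;
}

/* On completion: look up the request, copy its details to *out and
   release the slot. Returns false if the request is unknown. */
bool shadow_take(request_handle req, shadow_entry *out)
{
    assert(out != NULL);
    bool found = false;
    pthread_mutex_lock(&table_lock);
    for (size_t i = 0; i < table_cap; i++) {
        if (table[i].in_use && table[i].key == req) {
            *out = table[i];
            table[i].in_use = false;
            found = true;
            break;
        }
    }
    pthread_mutex_unlock(&table_lock);
    return found;
}
]]></programlisting>

<para>In this model, a wrapper for a nonblocking receive would call
<computeroutput>shadow_add</computeroutput> after the real call returns,
and a wrapper for a completion function would call
<computeroutput>shadow_take</computeroutput> and then mark the recorded
buffer as defined.</para>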
|
|
|
|
<para>Writing new wrappers should be fairly easy. The source file is
|
|
<computeroutput>auxprogs/libmpiwrap.c</computeroutput>. If possible,
|
|
find an existing wrapper for a function of similar behaviour to the
|
|
one you want to wrap, and use it as a starting point. The wrappers
|
|
are organised in sections in the same order as the MPI 1.1 spec, to
|
|
aid navigation. When adding a wrapper, remember to comment out the
|
|
definition of the default wrapper in the long list of defaults at the
|
|
bottom of the file (do not remove it, just comment it out).</para>
|
|
</sect2>
|
|
|
|
<sect2 id="mc-manual.mpiwrap.whattoexpect"
|
|
xreflabel="What to expect with MPI Wrappers">
|
|
<title>What to expect when using the wrappers</title>
|
|
|
|
<para>The wrappers should reduce Memcheck's false-error rate on MPI
|
|
applications. Because the wrapping is done at the MPI interface,
|
|
there will still potentially be a large number of errors reported in
|
|
the MPI implementation below the interface. The best you can do is
|
|
try to suppress them.</para>
|
|
|
|
<para>You may also find that the input-side (buffer
|
|
length/definedness) checks find errors in your MPI use, for example
|
|
passing too short a buffer to
|
|
<computeroutput>MPI_Recv</computeroutput>.</para>
|
|
|
|
<para>Functions which are not wrapped may increase the false
|
|
error rate. A possible approach is to run with
|
|
<computeroutput>MPIWRAP_DEBUG</computeroutput> containing
|
|
<computeroutput>warn</computeroutput>. This will show you functions
|
|
which lack proper wrappers but which are nevertheless used. You can
|
|
then write wrappers for them.
|
|
</para>
|
|
|
|
<para>A known source of potential false errors is the
|
|
<computeroutput>PMPI_Reduce</computeroutput> family of functions, when
|
|
using a custom (user-defined) reduction function. In a reduction
|
|
operation, each node notionally sends data to a "central point" which
|
|
uses the specified reduction function to merge the data items into a
|
|
single item. Hence, in general, data is passed between nodes and fed
|
|
to the reduction function, but the wrapper library cannot mark the
|
|
transferred data as initialised before it is handed to the reduction
|
|
function, because all that happens "inside" the
|
|
<computeroutput>PMPI_Reduce</computeroutput> call. As a result you
|
|
may see false positives reported in your reduction function.</para>
|
|
|
|
</sect2>
|
|
|
|
</sect1>
</chapter>
|