mirror of
https://github.com/Zenithsiz/ftmemsim-valgrind.git
synced 2026-02-03 18:13:01 +00:00
Update documents in preparation for 3.3.0, and restructure them
somewhat to move less relevant material out of the way to some extent.

The main changes are:

* Update date and version info
* Mention other tools in the quick-start guide
* Document --child-silent-after-fork
* Rearrange order of sections in the Valgrind Core chapter, to move
  advanced stuff (client requests) to the end, and compact stuff
  relevant to the majority of users towards the front
* Move MPI debugging stuff from the Core manual (a nonsensical place
  for it) to the Memcheck chapter
* Update the manual's introductory chapter a bit
* Connect up new tech docs summary page, and disconnect old and very
  out of date valgrind/memcheck tech docs
* Add section tags to the Cachegrind manual, to stop xsltproc
  complaining about their absence

git-svn-id: svn://svn.valgrind.org/valgrind/trunk@7199
parent 595181679a
commit 9101880b1f
@ -6,8 +6,9 @@ dynamic-translation framework.
|
||||
|
||||
Jeremy Fitzhardinge, jeremy@valgrind.org
|
||||
|
||||
Jeremy wrote Helgrind and totally overhauled low-level syscall/signal
|
||||
and address space layout stuff, among many other improvements.
|
||||
Jeremy wrote Helgrind (in the 2.X line) and totally overhauled
|
||||
low-level syscall/signal and address space layout stuff, among many
|
||||
other improvements.
|
||||
|
||||
Tom Hughes, tom@valgrind.org
|
||||
|
||||
|
||||
5
AUTHORS
@ -2,8 +2,9 @@
|
||||
Cerion Armour-Brown worked on PowerPC instruction set support using
|
||||
the Vex dynamic-translation framework.
|
||||
|
||||
Jeremy Fitzhardinge wrote Helgrind and totally overhauled low-level
|
||||
syscall/signal and address space layout stuff, among many other things.
|
||||
Jeremy Fitzhardinge wrote Helgrind (in the 2.X line) and totally
|
||||
overhauled low-level syscall/signal and address space layout stuff,
|
||||
among many other things.
|
||||
|
||||
Tom Hughes did a vast number of bug fixes, and helped out with support
|
||||
for more recent Linux/glibc versions.
|
||||
|
||||
@ -937,7 +937,7 @@ way as for C/C++ programs.</para>
|
||||
|
||||
|
||||
|
||||
<sect2>
|
||||
<sect2 id="cg-manual.annopts.warnings" xreflabel="Warnings">
|
||||
<title>Warnings</title>
|
||||
|
||||
<para>There are a couple of situations in which
|
||||
@ -969,7 +969,8 @@ warnings.</para>
|
||||
|
||||
|
||||
|
||||
<sect2>
|
||||
<sect2 id="cg-manual.annopts.things-to-watch-out-for"
|
||||
xreflabel="Things to watch out for">
|
||||
<title>Things to watch out for</title>
|
||||
|
||||
<para>Some odd things that can occur during annotation:</para>
|
||||
@ -1084,7 +1085,7 @@ rare.</para>
|
||||
|
||||
|
||||
|
||||
<sect2>
|
||||
<sect2 id="cg-manual.annopts.accuracy" xreflabel="Accuracy">
|
||||
<title>Accuracy</title>
|
||||
|
||||
<para>Valgrind's cache profiling has a number of
|
||||
@ -1221,7 +1222,8 @@ fail these checks.</para>
|
||||
</sect1>
|
||||
|
||||
|
||||
<sect1>
|
||||
<sect1 id="cg-manual.acting-on"
|
||||
xreflabel="Acting on Cachegrind's information">
|
||||
<title>Acting on Cachegrind's information</title>
|
||||
<para>
|
||||
So, you've managed to profile your program with Cachegrind. Now what?
|
||||
@ -1260,14 +1262,16 @@ yourself. But at least you have the information!
|
||||
|
||||
</sect1>
|
||||
|
||||
<sect1>
|
||||
<sect1 id="cg-manual.impl-details"
|
||||
xreflabel="Implementation details">
|
||||
<title>Implementation details</title>
|
||||
<para>
|
||||
This section talks about details you don't need to know about in order to
|
||||
use Cachegrind, but may be of interest to some people.
|
||||
</para>
|
||||
|
||||
<sect2>
|
||||
<sect2 id="cg-manual.impl-details.how-cg-works"
|
||||
xreflabel="How Cachegrind works">
|
||||
<title>How Cachegrind works</title>
|
||||
<para>The best reference for understanding how Cachegrind works is chapter 3 of
|
||||
"Dynamic Binary Analysis and Instrumentation", by Nicholas Nethercote. It
|
||||
@ -1275,7 +1279,8 @@ is available on the <ulink url="&vg-pubs;">Valgrind publications
|
||||
page</ulink>.</para>
|
||||
</sect2>
|
||||
|
||||
<sect2>
|
||||
<sect2 id="cg-manual.impl-details.file-format"
|
||||
xreflabel="Cachegrind output file format">
|
||||
<title>Cachegrind output file format</title>
|
||||
<para>The file format is fairly straightforward, basically giving the
|
||||
cost centre for every line, grouped by files and
|
||||
|
||||
@ -7,5 +7,6 @@ EXTRA_DIST = \
|
||||
manual-writing-tools.xml\
|
||||
quick-start-guide.xml \
|
||||
tech-docs.xml \
|
||||
new-tech-docs.xml \
|
||||
vg-entities.xml \
|
||||
xml_help.txt
|
||||
|
||||
File diff suppressed because it is too large
@ -11,7 +11,7 @@
|
||||
<para>Valgrind is a suite of simulation-based debugging and profiling
|
||||
tools for programs running on Linux (x86, amd64, ppc32 and ppc64).
|
||||
The system consists of a core, which provides a synthetic CPU in
|
||||
software, and a series of tools, each of which performs some kind of
|
||||
software, and a set of tools, each of which performs some kind of
|
||||
debugging, profiling, or similar task. The architecture is modular,
|
||||
so that new tools can be created easily and without disturbing the
|
||||
existing structure.</para>
|
||||
@ -106,6 +106,30 @@ summary, these are:</para>
|
||||
paging needed.</para>
|
||||
</listitem>
|
||||
|
||||
<listitem>
|
||||
<para><command>Helgrind</command> detects synchronisation errors
|
||||
in programs that use the POSIX pthreads threading primitives. It
|
||||
detects the following three classes of errors:</para>
|
||||
|
||||
<itemizedlist>
|
||||
<listitem>
|
||||
<para>Misuses of the POSIX pthreads API.</para>
|
||||
</listitem>
|
||||
<listitem>
|
||||
<para>Potential deadlocks arising from lock ordering
|
||||
problems.</para>
|
||||
</listitem>
|
||||
<listitem>
|
||||
<para>Data races -- accessing memory without adequate locking.</para>
|
||||
</listitem>
|
||||
</itemizedlist>
|
||||
|
||||
<para>Problems like these often result in unreproducible,
|
||||
timing-dependent crashes, deadlocks and other misbehaviour, and
|
||||
can be difficult to find by other means.</para>
|
||||
|
||||
</listitem>
|
||||
|
||||
</orderedlist>
|
||||
|
||||
|
||||
@ -119,19 +143,22 @@ integer and floating point operations your program does.</para>
|
||||
|
||||
<para>Valgrind is closely tied to details of the CPU and operating
|
||||
system, and to a lesser extent, the compiler and basic C libraries.
|
||||
Nonetheless, as of version 3.2.0 it supports several platforms:
|
||||
Nonetheless, as of version 3.3.0 it supports several platforms:
|
||||
x86/Linux (mature), amd64/Linux (maturing), ppc32/Linux and
|
||||
ppc64/Linux (less mature but work well). Valgrind uses the standard Unix
|
||||
ppc64/Linux (less mature but work well). There is also experimental
|
||||
support for ppc32/AIX5 and ppc64/AIX5 (AIX 5.2 and 5.3 only).
|
||||
Valgrind uses the standard Unix
|
||||
<computeroutput>./configure</computeroutput>,
|
||||
<computeroutput>make</computeroutput>, <computeroutput>make
|
||||
install</computeroutput> mechanism, and we have attempted to ensure that
|
||||
it works on machines with Linux kernel 2.4.X or 2.6.X and glibc
|
||||
2.2.X to 2.5.X.</para>
|
||||
2.2.X to 2.7.X.</para>
|
||||
|
||||
<para>Valgrind is licensed under the <xref linkend="license.gpl"/>,
|
||||
version 2. The <computeroutput>valgrind/*.h</computeroutput> headers
|
||||
that you may wish to include in your code (eg.
|
||||
<filename>valgrind.h</filename>, <filename>memcheck.h</filename>) are
|
||||
<filename>valgrind.h</filename>, <filename>memcheck.h</filename>,
|
||||
<filename>helgrind.h</filename>) are
|
||||
distributed under a BSD-style license, so you may include them in your
|
||||
code without worrying about license conflicts. Some of the PThreads
|
||||
test cases, <filename>pth_*.c</filename>, are taken from "Pthreads
|
||||
@ -139,6 +166,13 @@ Programming" by Bradford Nichols, Dick Buttlar & Jacqueline Proulx
|
||||
Farrell, ISBN 1-56592-115-1, published by O'Reilly & Associates,
|
||||
Inc.</para>
|
||||
|
||||
<para>If you contribute code to Valgrind, please ensure your
|
||||
contributions are licensed as "GPLv2, or (at your option) any later
|
||||
version." This is so as to allow the possibility of easily upgrading
|
||||
the license to GPLv3 in future. If you want to modify code in the VEX
|
||||
subdirectory, please also see VEX/HACKING.README.</para>
|
||||
|
||||
|
||||
</sect1>
|
||||
|
||||
|
||||
@ -158,11 +192,15 @@ want to run the Memcheck tool. The final chapter explains how to write a
|
||||
new tool.</para>
|
||||
|
||||
<para>Be aware that the core understands some command line flags, and
|
||||
the tools have their own flags which they know about. This means there
|
||||
is no central place describing all the flags that are accepted -- you
|
||||
have to read the flags documentation both for
|
||||
the tools have their own flags which they know about. This means
|
||||
there is no central place describing all the flags that are
|
||||
accepted -- you have to read the flags documentation both for
|
||||
<xref linkend="manual-core"/> and for the tool you want to use.</para>
|
||||
|
||||
<para>The manual is quite big and complex. If you are looking for a
|
||||
quick getting-started guide, have a look at
|
||||
<xref linkend="quick-start"/>.</para>
|
||||
|
||||
</sect1>
|
||||
|
||||
</chapter>
|
||||
|
||||
@ -32,24 +32,64 @@ memory errors such as:</para>
|
||||
|
||||
<itemizedlist>
|
||||
<listitem>
|
||||
<para>touching memory you shouldn't (eg. overrunning heap block
|
||||
boundaries);</para>
|
||||
<para>Touching memory you shouldn't (eg. overrunning heap block
|
||||
boundaries, or reading/writing freed memory).</para>
|
||||
</listitem>
|
||||
<listitem>
|
||||
<para>using values before they have been initialized;</para>
|
||||
<para>Using values before they have been initialized.</para>
|
||||
</listitem>
|
||||
<listitem>
|
||||
<para>incorrect freeing of memory, such as double-freeing heap
|
||||
blocks;</para>
|
||||
<para>Incorrect freeing of memory, such as double-freeing heap
|
||||
blocks.</para>
|
||||
</listitem>
|
||||
<listitem>
|
||||
<para>memory leaks.</para>
|
||||
<para>Memory leaks.</para>
|
||||
</listitem>
|
||||
</itemizedlist>
|
||||
|
||||
<para>Memcheck is only one of the tools in the Valgrind suite.
|
||||
Other tools you may find useful are:</para>
|
||||
|
||||
<itemizedlist>
|
||||
<listitem>
|
||||
<para>Cachegrind: a profiling tool which produces detailed data on
|
||||
cache (miss) and branch (misprediction) events. Statistics are
|
||||
gathered for the entire program, for each function, for each line
|
||||
of code, and even for each instruction, if you need that level of
|
||||
detail.</para>
|
||||
</listitem>
|
||||
<listitem>
|
||||
<para>Callgrind: a heavyweight profiling tool similar to
|
||||
Cachegrind, but which also shows cost relationships across
|
||||
function calls. Information gathered by Callgrind can be viewed
|
||||
using the KCachegrind GUI. KCachegrind is not part of the
|
||||
Valgrind suite - it is part of the KDE Desktop Environment.</para>
|
||||
</listitem>
|
||||
<listitem>
|
||||
<para>Massif: a space profiling tool. It allows you to explore
|
||||
in detail which parts of your program allocate memory.</para>
|
||||
</listitem>
|
||||
<listitem>
|
||||
<para>Helgrind: a debugging tool for threaded programs. Helgrind
|
||||
looks for various kinds of synchronisation errors in code that uses
|
||||
the POSIX PThreads API.</para>
|
||||
</listitem>
|
||||
<listitem>
|
||||
<para>In addition, there are a number of "experimental" tools in
|
||||
the codebase. They can be distinguished by the "exp-" prefix on
|
||||
their names. Experimental tools are not subject to the same
|
||||
quality control standards that apply to our production-grade tools
|
||||
(Memcheck, Cachegrind, Callgrind, Massif and Helgrind).</para>
|
||||
</listitem>
|
||||
</itemizedlist>
|
||||
|
||||
<para>The rest of this guide discusses only the Memcheck tool. For
|
||||
full documentation on the other tools, see the Valgrind User
|
||||
Manual.</para>
|
||||
|
||||
<para>What follows is the minimum information you need to start
|
||||
detecting memory errors in your program with Memcheck. Note that this
|
||||
guide applies to Valgrind version 2.4.0 and later. Some of the
|
||||
guide applies to Valgrind version 3.3.0 and later. Some of the
|
||||
information is not quite right for earlier versions.</para>
|
||||
|
||||
</sect1>
|
||||
@ -162,8 +202,9 @@ Things to notice:
|
||||
</listitem>
|
||||
</itemizedlist>
|
||||
|
||||
It's worth fixing errors in the order they are reported, as later errors
|
||||
can be caused by earlier errors.</para>
|
||||
It's worth fixing errors in the order they are reported, as later
|
||||
errors can be caused by earlier errors. Failing to do this is a
|
||||
common cause of difficulty with Memcheck.</para>
|
||||
|
||||
<para>Memory leak messages look like this:
|
||||
|
||||
@ -219,6 +260,15 @@ that are allocated statically or on the stack. But it should detect many
|
||||
errors that could crash your program (eg. cause a segmentation
|
||||
fault).</para>
|
||||
|
||||
<para>Try to make your program so clean that Memcheck reports no
|
||||
errors. Once you achieve this state, it is much easier to see when
|
||||
changes to the program cause Memcheck to report new errors.
|
||||
Experience from several years of Memcheck use shows that it is
|
||||
possible to make even huge programs run Memcheck-clean. For example,
|
||||
large parts of KDE 3.5.X, and recent versions of OpenOffice.org
|
||||
(2.3.0) are Memcheck-clean, or very close to it.</para>
|
||||
|
||||
|
||||
</sect1>
|
||||
|
||||
|
||||
|
||||
@ -17,11 +17,14 @@
|
||||
</legalnotice>
|
||||
</bookinfo>
|
||||
|
||||
<xi:include href="../../memcheck/docs/mc-tech-docs.xml" parse="xml"
|
||||
<!-- <xi:include href="../../memcheck/docs/mc-tech-docs.xml" parse="xml"
|
||||
xmlns:xi="http://www.w3.org/2001/XInclude" />
|
||||
<xi:include href="../../callgrind/docs/cl-format.xml" parse="xml"
|
||||
-->
|
||||
<xi:include href="new-tech-docs.xml" parse="xml"
|
||||
xmlns:xi="http://www.w3.org/2001/XInclude" />
|
||||
<xi:include href="manual-writing-tools.xml" parse="xml"
|
||||
xmlns:xi="http://www.w3.org/2001/XInclude" />
|
||||
<xi:include href="../../callgrind/docs/cl-format.xml" parse="xml"
|
||||
xmlns:xi="http://www.w3.org/2001/XInclude" />
|
||||
|
||||
</book>
|
||||
|
||||
@ -2,13 +2,13 @@
|
||||
<!ENTITY vg-url "http://www.valgrind.org/">
|
||||
<!ENTITY vg-jemail "julian@valgrind.org">
|
||||
<!ENTITY vg-vemail "valgrind@valgrind.org">
|
||||
<!ENTITY vg-lifespan "2000-2006">
|
||||
<!ENTITY vg-lifespan "2000-2007">
|
||||
<!ENTITY vg-users-list "http://lists.sourceforge.net/lists/listinfo/valgrind-users">
|
||||
|
||||
<!-- valgrind release + version stuff -->
|
||||
<!ENTITY rel-type "Release">
|
||||
<!ENTITY rel-version "3.2.0">
|
||||
<!ENTITY rel-date "7 June 2006">
|
||||
<!ENTITY rel-version "3.3.0">
|
||||
<!ENTITY rel-date "7 December 2007">
|
||||
|
||||
<!-- where the docs are installed -->
|
||||
<!ENTITY vg-doc-path "/usr/share/doc/valgrind/html/index.html">
|
||||
|
||||
@ -1287,6 +1287,393 @@ inform Memcheck about changes to the state of a mempool:</para>
|
||||
|
||||
</itemizedlist>
|
||||
|
||||
</sect1>
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
<sect1 id="mc-manual.mpiwrap" xreflabel="MPI Wrappers">
|
||||
<title>Debugging MPI Parallel Programs with Valgrind</title>
|
||||
|
||||
<para> Valgrind supports debugging of distributed-memory applications
|
||||
which use the MPI message passing standard. This support consists of a
|
||||
library of wrapper functions for the
|
||||
<computeroutput>PMPI_*</computeroutput> interface. When incorporated
|
||||
into the application's address space, either by direct linking or by
|
||||
<computeroutput>LD_PRELOAD</computeroutput>, the wrappers intercept
|
||||
calls to <computeroutput>PMPI_Send</computeroutput>,
|
||||
<computeroutput>PMPI_Recv</computeroutput>, etc. They then
|
||||
use client requests to inform Valgrind of memory state changes caused
|
||||
by the function being wrapped. This reduces the number of false
|
||||
positives that Memcheck otherwise typically reports for MPI
|
||||
applications.</para>
|
||||
|
||||
<para>The wrappers also take the opportunity to carefully check
|
||||
size and definedness of buffers passed as arguments to MPI functions, hence
|
||||
detecting errors such as passing undefined data to
|
||||
<computeroutput>PMPI_Send</computeroutput>, or receiving data into a
|
||||
buffer which is too small.</para>
|
||||
|
||||
<para>Unlike most of the rest of Valgrind, the wrapper library is subject to a
|
||||
BSD-style license, so you can link it into any code base you like.
|
||||
See the top of <computeroutput>auxprogs/libmpiwrap.c</computeroutput>
|
||||
for license details.</para>
|
||||
|
||||
|
||||
<sect2 id="mc-manual.mpiwrap.build" xreflabel="Building MPI Wrappers">
|
||||
<title>Building and installing the wrappers</title>
|
||||
|
||||
<para> The wrapper library will be built automatically if possible.
|
||||
Valgrind's configure script will look for a suitable
|
||||
<computeroutput>mpicc</computeroutput> to build it with. This must be
|
||||
the same <computeroutput>mpicc</computeroutput> you use to build the
|
||||
MPI application you want to debug. By default, Valgrind tries
|
||||
<computeroutput>mpicc</computeroutput>, but you can specify a
|
||||
different one by using the configure-time flag
|
||||
<computeroutput>--with-mpicc=</computeroutput>. Currently the
|
||||
wrappers are only buildable with
|
||||
<computeroutput>mpicc</computeroutput>s which are based on GNU
|
||||
<computeroutput>gcc</computeroutput> or Intel's
|
||||
<computeroutput>icc</computeroutput>.</para>
|
||||
|
||||
<para>Check that the configure script prints a line like this:</para>
|
||||
|
||||
<programlisting><![CDATA[
|
||||
checking for usable MPI2-compliant mpicc and mpi.h... yes, mpicc
|
||||
]]></programlisting>
|
||||
|
||||
<para>If it says <computeroutput>... no</computeroutput>, your
|
||||
<computeroutput>mpicc</computeroutput> has failed to compile and link
|
||||
a test MPI2 program.</para>
|
||||
|
||||
<para>If the configure test succeeds, continue in the usual way with
|
||||
<computeroutput>make</computeroutput> and <computeroutput>make
|
||||
install</computeroutput>. The final install tree should then contain
|
||||
<computeroutput>libmpiwrap.so</computeroutput>.
|
||||
</para>
|
||||
|
||||
<para>Compile up a test MPI program (eg, MPI hello-world) and try
|
||||
this:</para>
|
||||
|
||||
<programlisting><![CDATA[
|
||||
LD_PRELOAD=$prefix/lib/valgrind/<platform>/libmpiwrap.so \
|
||||
mpirun [args] $prefix/bin/valgrind ./hello
|
||||
]]></programlisting>
|
||||
|
||||
<para>You should see something similar to the following:</para>
|
||||
|
||||
<programlisting><![CDATA[
|
||||
valgrind MPI wrappers 31901: Active for pid 31901
|
||||
valgrind MPI wrappers 31901: Try MPIWRAP_DEBUG=help for possible options
|
||||
]]></programlisting>
|
||||
|
||||
<para>repeated for every process in the group. If you do not see
|
||||
these, there is a build/installation problem of some kind.</para>
|
||||
|
||||
<para> The MPI functions to be wrapped are assumed to be in an ELF
|
||||
shared object with soname matching
|
||||
<computeroutput>libmpi.so*</computeroutput>. This is known to be
|
||||
correct at least for Open MPI and Quadrics MPI, and can easily be
|
||||
changed if required.</para>
|
||||
</sect2>
|
||||
|
||||
|
||||
<sect2 id="mc-manual.mpiwrap.gettingstarted"
|
||||
xreflabel="Getting started with MPI Wrappers">
|
||||
<title>Getting started</title>
|
||||
|
||||
<para>Compile your MPI application as usual, taking care to link it
|
||||
using the same <computeroutput>mpicc</computeroutput> that your
|
||||
Valgrind build was configured with.</para>
|
||||
|
||||
<para>
|
||||
Use the following basic scheme to run your application on Valgrind with
|
||||
the wrappers engaged:</para>
|
||||
|
||||
<programlisting><![CDATA[
|
||||
MPIWRAP_DEBUG=[wrapper-args] \
|
||||
LD_PRELOAD=$prefix/lib/valgrind/<platform>/libmpiwrap.so \
|
||||
mpirun [mpirun-args] \
|
||||
$prefix/bin/valgrind [valgrind-args] \
|
||||
[application] [app-args]
|
||||
]]></programlisting>
|
||||
|
||||
<para>As an alternative to
|
||||
<computeroutput>LD_PRELOAD</computeroutput>ing
|
||||
<computeroutput>libmpiwrap.so</computeroutput>, you can simply link it
|
||||
to your application if desired. This should not disturb native
|
||||
behaviour of your application in any way.</para>
|
||||
</sect2>
|
||||
|
||||
|
||||
<sect2 id="mc-manual.mpiwrap.controlling"
|
||||
xreflabel="Controlling the MPI Wrappers">
|
||||
<title>Controlling the wrapper library</title>
|
||||
|
||||
<para>Environment variable
|
||||
<computeroutput>MPIWRAP_DEBUG</computeroutput> is consulted at
|
||||
startup. The default behaviour is to print a starting banner</para>
|
||||
|
||||
<programlisting><![CDATA[
|
||||
valgrind MPI wrappers 16386: Active for pid 16386
|
||||
valgrind MPI wrappers 16386: Try MPIWRAP_DEBUG=help for possible options
|
||||
]]></programlisting>
|
||||
|
||||
<para> and then be relatively quiet.</para>
|
||||
|
||||
<para>You can give a list of comma-separated options in
|
||||
<computeroutput>MPIWRAP_DEBUG</computeroutput>. These are</para>
|
||||
|
||||
<itemizedlist>
|
||||
<listitem>
|
||||
<para><computeroutput>verbose</computeroutput>:
|
||||
show entries/exits of all wrappers. Also show extra
|
||||
debugging info, such as the status of outstanding
|
||||
<computeroutput>MPI_Request</computeroutput>s resulting
|
||||
from uncompleted <computeroutput>MPI_Irecv</computeroutput>s.</para>
|
||||
</listitem>
|
||||
<listitem>
|
||||
<para><computeroutput>quiet</computeroutput>:
|
||||
opposite of <computeroutput>verbose</computeroutput>, only print
|
||||
anything when the wrappers want
|
||||
to report a detected programming error, or in case of catastrophic
|
||||
failure of the wrappers.</para>
|
||||
</listitem>
|
||||
<listitem>
|
||||
<para><computeroutput>warn</computeroutput>:
|
||||
by default, functions which lack proper wrappers
|
||||
are not commented on, just silently
|
||||
ignored. This causes a warning to be printed for each unwrapped
|
||||
function used, up to a maximum of three warnings per function.</para>
|
||||
</listitem>
|
||||
<listitem>
|
||||
<para><computeroutput>strict</computeroutput>:
|
||||
print an error message and abort the program if
|
||||
a function lacking a wrapper is used.</para>
|
||||
</listitem>
|
||||
</itemizedlist>
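The option list above is a plain comma-separated string, so handling it can be sketched in a few lines of C. This is only an illustration: the `WrapOpts` structure and the `has_option` and `parse_wrap_opts` helpers are hypothetical names, and the actual parsing in `auxprogs/libmpiwrap.c` may well differ (for example in how it treats unknown words).

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical flags, one per documented keyword. These are not the
   variables used by the real wrapper library. */
typedef struct {
    bool verbose;
    bool quiet;
    bool warn;
    bool strict;
} WrapOpts;

/* Return true if the comma-separated `list` contains `word` as a
   complete item (so "verbosely" does not match "verbose"). */
static bool has_option(const char *list, const char *word)
{
    assert(list != NULL && word != NULL);
    size_t wlen = strlen(word);
    const char *p = list;
    while (*p != '\0') {
        const char *end = strchr(p, ',');
        size_t len = end ? (size_t)(end - p) : strlen(p);
        if (len == wlen && strncmp(p, word, wlen) == 0)
            return true;
        if (end == NULL)
            break;
        p = end + 1;   /* step past the comma to the next item */
    }
    return false;
}

/* Turn the value of the environment variable (possibly NULL when it is
   unset) into a set of flags. Unknown words are ignored here. */
static WrapOpts parse_wrap_opts(const char *value)
{
    WrapOpts o = { false, false, false, false };
    if (value == NULL)
        return o;
    o.verbose = has_option(value, "verbose");
    o.quiet   = has_option(value, "quiet");
    o.warn    = has_option(value, "warn");
    o.strict  = has_option(value, "strict");
    return o;
}
```

In a real program the argument would typically come from `getenv("MPIWRAP_DEBUG")`, which returns NULL when the variable is not set.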
|
||||
|
||||
<para> If you want to use Valgrind's XML output facility
|
||||
(<computeroutput>--xml=yes</computeroutput>), you should pass
|
||||
<computeroutput>quiet</computeroutput> in
|
||||
<computeroutput>MPIWRAP_DEBUG</computeroutput> so as to get rid of any
|
||||
extraneous printing from the wrappers.</para>
|
||||
|
||||
</sect2>
|
||||
|
||||
|
||||
<sect2 id="mc-manual.mpiwrap.limitations"
|
||||
xreflabel="Abilities and Limitations of MPI Wrappers">
|
||||
<title>Abilities and limitations</title>
|
||||
|
||||
<sect3 id="mc-manual.mpiwrap.limitations.functions"
|
||||
xreflabel="Functions">
|
||||
<title>Functions</title>
|
||||
|
||||
<para>All MPI2 functions except
|
||||
<computeroutput>MPI_Wtick</computeroutput>,
|
||||
<computeroutput>MPI_Wtime</computeroutput> and
|
||||
<computeroutput>MPI_Pcontrol</computeroutput> have wrappers. The
|
||||
first two are not wrapped because they return a
|
||||
<computeroutput>double</computeroutput>, and Valgrind's
|
||||
function-wrap mechanism cannot handle that (it could easily enough be
|
||||
extended to). <computeroutput>MPI_Pcontrol</computeroutput> cannot be
|
||||
wrapped as it has variable arity:
|
||||
<computeroutput>int MPI_Pcontrol(const int level, ...)</computeroutput></para>
|
||||
|
||||
<para>Most functions are wrapped with a default wrapper which does
|
||||
nothing except complain or abort if it is called, depending on
|
||||
settings in <computeroutput>MPIWRAP_DEBUG</computeroutput> listed
|
||||
above. The following functions have "real", do-something-useful
|
||||
wrappers:</para>
|
||||
|
||||
<programlisting><![CDATA[
|
||||
PMPI_Send PMPI_Bsend PMPI_Ssend PMPI_Rsend
|
||||
|
||||
PMPI_Recv PMPI_Get_count
|
||||
|
||||
PMPI_Isend PMPI_Ibsend PMPI_Issend PMPI_Irsend
|
||||
|
||||
PMPI_Irecv
|
||||
PMPI_Wait PMPI_Waitall
|
||||
PMPI_Test PMPI_Testall
|
||||
|
||||
PMPI_Iprobe PMPI_Probe
|
||||
|
||||
PMPI_Cancel
|
||||
|
||||
PMPI_Sendrecv
|
||||
|
||||
PMPI_Type_commit PMPI_Type_free
|
||||
|
||||
PMPI_Pack PMPI_Unpack
|
||||
|
||||
PMPI_Bcast PMPI_Gather PMPI_Scatter PMPI_Alltoall
|
||||
PMPI_Reduce PMPI_Allreduce PMPI_Op_create
|
||||
|
||||
PMPI_Comm_create PMPI_Comm_dup PMPI_Comm_free PMPI_Comm_rank PMPI_Comm_size
|
||||
|
||||
PMPI_Error_string
|
||||
PMPI_Init PMPI_Initialized PMPI_Finalize
|
||||
]]></programlisting>
|
||||
|
||||
<para> A few functions such as
|
||||
<computeroutput>PMPI_Address</computeroutput> are listed as
|
||||
<computeroutput>HAS_NO_WRAPPER</computeroutput>. They have no wrapper
|
||||
at all as there is nothing worth checking, and giving a no-op wrapper
|
||||
would reduce performance for no reason.</para>
|
||||
|
||||
<para> Note that the wrapper library can itself generate large
|
||||
numbers of calls to the MPI implementation, especially when walking
|
||||
complex types. The most common functions called are
|
||||
<computeroutput>PMPI_Extent</computeroutput>,
|
||||
<computeroutput>PMPI_Type_get_envelope</computeroutput>,
|
||||
<computeroutput>PMPI_Type_get_contents</computeroutput>, and
|
||||
<computeroutput>PMPI_Type_free</computeroutput>. </para>
|
||||
</sect3>
|
||||
|
||||
<sect3 id="mc-manual.mpiwrap.limitations.types"
|
||||
xreflabel="Types">
|
||||
<title>Types</title>
|
||||
|
||||
<para> MPI-1.1 structured types are supported, and walked exactly.
|
||||
The currently supported combiners are
|
||||
<computeroutput>MPI_COMBINER_NAMED</computeroutput>,
|
||||
<computeroutput>MPI_COMBINER_CONTIGUOUS</computeroutput>,
|
||||
<computeroutput>MPI_COMBINER_VECTOR</computeroutput>,
|
||||
<computeroutput>MPI_COMBINER_HVECTOR</computeroutput>,
|
||||
<computeroutput>MPI_COMBINER_INDEXED</computeroutput>,
|
||||
<computeroutput>MPI_COMBINER_HINDEXED</computeroutput> and
|
||||
<computeroutput>MPI_COMBINER_STRUCT</computeroutput>. This should
|
||||
cover all MPI-1.1 types. The mechanism (function
|
||||
<computeroutput>walk_type</computeroutput>) should extend easily to
|
||||
cover MPI2 combiners.</para>
|
||||
|
||||
<para>MPI defines some named structured types
|
||||
(<computeroutput>MPI_FLOAT_INT</computeroutput>,
|
||||
<computeroutput>MPI_DOUBLE_INT</computeroutput>,
|
||||
<computeroutput>MPI_LONG_INT</computeroutput>,
|
||||
<computeroutput>MPI_2INT</computeroutput>,
|
||||
<computeroutput>MPI_SHORT_INT</computeroutput>,
|
||||
<computeroutput>MPI_LONG_DOUBLE_INT</computeroutput>) which are pairs
|
||||
of some basic type and a C <computeroutput>int</computeroutput>.
|
||||
Unfortunately the MPI specification makes it impossible to look inside
|
||||
these types and see where the fields are. Therefore these wrappers
|
||||
assume the types are laid out as <computeroutput>struct { float val;
|
||||
int loc; }</computeroutput> (for
|
||||
<computeroutput>MPI_FLOAT_INT</computeroutput>), etc, and act
|
||||
accordingly. This appears to be correct at least for Open MPI 1.0.2
|
||||
and for Quadrics MPI.</para>
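The layout assumption described above can be made concrete with `offsetof`. The sketch below is not taken from the wrapper library; `FloatInt`, `FieldRange` and `float_int_fields` are hypothetical names used only to show which byte ranges of an assumed `MPI_FLOAT_INT` element would be treated as data, as opposed to compiler-inserted padding.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed C layout of one MPI_FLOAT_INT element, as stated in the text. */
typedef struct { float val; int loc; } FloatInt;

/* A byte range inside one element: where a field starts and how long it is. */
typedef struct { size_t offset; size_t length; } FieldRange;

/* Fill `out` with the two field ranges of a FloatInt element. Any bytes
   of the struct outside these ranges are padding and would not be
   checked or marked. */
static void float_int_fields(FieldRange out[2])
{
    assert(out != NULL);
    out[0].offset = offsetof(FloatInt, val);
    out[0].length = sizeof(float);
    out[1].offset = offsetof(FloatInt, loc);
    out[1].length = sizeof(int);
}
```

If an MPI implementation laid these types out differently, ranges computed this way would point at the wrong bytes, which is why the text names the implementations the assumption was checked against.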
|
||||
|
||||
<para>If <computeroutput>strict</computeroutput> is an option specified
|
||||
in <computeroutput>MPIWRAP_DEBUG</computeroutput>, the application
|
||||
will abort if an unhandled type is encountered. Otherwise, the
|
||||
application will print a warning message and continue.</para>
|
||||
|
||||
<para>Some effort is made to mark/check memory ranges corresponding to
|
||||
arrays of values in a single pass. This is important for performance
|
||||
since asking Valgrind to mark/check any range, no matter how small,
|
||||
carries quite a large constant cost. This optimisation is applied to
|
||||
arrays of primitive types (<computeroutput>double</computeroutput>,
|
||||
<computeroutput>float</computeroutput>,
|
||||
<computeroutput>int</computeroutput>,
|
||||
<computeroutput>long</computeroutput>, <computeroutput>long
|
||||
long</computeroutput>, <computeroutput>short</computeroutput>,
|
||||
<computeroutput>char</computeroutput>, and <computeroutput>long
|
||||
double</computeroutput> on platforms where <computeroutput>sizeof(long
|
||||
double) == 8</computeroutput>). For arrays of all other types, the
|
||||
wrappers handle each element individually and so there can be a very
|
||||
large performance cost.</para>
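To illustrate why the single-pass treatment matters, the following sketch counts how many mark/check operations each strategy issues for an array of `double`. The function `mark_range` is a hypothetical stand-in that only counts calls; in the real library the corresponding step would be a Valgrind client request, which is not shown here.

```c
#include <assert.h>
#include <stddef.h>

/* Number of mark/check operations issued so far. */
static unsigned long n_mark_calls = 0;

/* Hypothetical stand-in for a mark/check request. It only counts calls,
   since the point is the fixed cost paid per call. */
static void mark_range(const void *addr, size_t len)
{
    (void)addr;
    (void)len;
    n_mark_calls++;
}

/* Element-by-element handling: one request per array element. */
static void mark_array_per_element(const double *a, size_t n)
{
    assert(a != NULL);
    for (size_t i = 0; i < n; i++)
        mark_range(&a[i], sizeof a[i]);
}

/* Single-pass handling: one request covering the whole contiguous array. */
static void mark_array_single_pass(const double *a, size_t n)
{
    assert(a != NULL);
    mark_range(a, n * sizeof a[0]);
}
```

For an array of n elements the first strategy pays the constant per-request cost n times and the second pays it once, which matches the performance difference the text describes.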
|
||||
|
||||
</sect3>
|
||||
|
||||
</sect2>
|
||||
|
||||
|
||||
<sect2 id="mc-manual.mpiwrap.writingwrappers"
|
||||
xreflabel="Writing new MPI Wrappers">
|
||||
<title>Writing new wrappers</title>
|
||||
|
||||
<para>
|
||||
For the most part the wrappers are straightforward. The only
|
||||
significant complexity arises with nonblocking receives.</para>
|
||||
|
||||
<para>The issue is that <computeroutput>MPI_Irecv</computeroutput>
|
||||
states the recv buffer and returns immediately, giving a handle
|
||||
(<computeroutput>MPI_Request</computeroutput>) for the transaction.
|
||||
Later the user will have to poll for completion with
|
||||
<computeroutput>MPI_Wait</computeroutput> etc, and when the
|
||||
transaction completes successfully, the wrappers have to paint the
|
||||
recv buffer. But the recv buffer details are not presented to
|
||||
<computeroutput>MPI_Wait</computeroutput> -- only the handle is. The
|
||||
library therefore maintains a shadow table which associates
|
||||
uncompleted <computeroutput>MPI_Request</computeroutput>s with the
|
||||
corresponding buffer address/count/type. When an operation completes,
|
||||
the table is searched for the associated address/count/type info, and
|
||||
memory is marked accordingly.</para>
|
||||
|
||||
<para>Access to the table is guarded by a (POSIX pthreads) lock, so as
|
||||
to make the library thread-safe.</para>
|
||||
|
||||
<para>The table is allocated with
|
||||
<computeroutput>malloc</computeroutput> and never
|
||||
<computeroutput>free</computeroutput>d, so it will show up in leak
|
||||
checks.</para>
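The shadow table described above might look roughly like the following. This is a minimal sketch under stated assumptions, not the code in `auxprogs/libmpiwrap.c`: `ReqHandle` and `TypeHandle` are integer stand-ins for `MPI_Request` and `MPI_Datatype` so that it compiles without `mpi.h`, and the names `ShadowRequest`, `add_shadow_request` and `take_shadow_request` are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

/* Stand-ins for MPI_Request and MPI_Datatype (assumption for this sketch). */
typedef int ReqHandle;
typedef int TypeHandle;

/* One outstanding nonblocking receive: the handle plus the buffer details
   that the completion call does not receive. */
typedef struct {
    bool       in_use;
    ReqHandle  key;
    void      *buf;
    int        count;
    TypeHandle datatype;
} ShadowRequest;

static ShadowRequest  *sreq_table = NULL;  /* malloc'd, never freed */
static size_t          sreq_size  = 0;
static pthread_mutex_t sreq_lock  = PTHREAD_MUTEX_INITIALIZER;

/* Called when a nonblocking receive is posted. Returns false if the
   table could not be grown. */
static bool add_shadow_request(ReqHandle key, void *buf, int count,
                               TypeHandle datatype)
{
    bool ok = true;
    size_t i;
    pthread_mutex_lock(&sreq_lock);
    for (i = 0; i < sreq_size; i++)
        if (!sreq_table[i].in_use)
            break;
    if (i == sreq_size) {                  /* no free slot: grow the table */
        size_t new_size = sreq_size == 0 ? 8 : 2 * sreq_size;
        ShadowRequest *t = realloc(sreq_table, new_size * sizeof *t);
        if (t == NULL) {
            ok = false;
        } else {
            for (size_t j = sreq_size; j < new_size; j++)
                t[j].in_use = false;
            sreq_table = t;
            sreq_size  = new_size;
        }
    }
    if (ok)
        sreq_table[i] = (ShadowRequest){ true, key, buf, count, datatype };
    pthread_mutex_unlock(&sreq_lock);
    return ok;
}

/* Called when a request completes. Copies the entry to `out`, releases
   the slot and returns true, or returns false if the handle is unknown. */
static bool take_shadow_request(ReqHandle key, ShadowRequest *out)
{
    bool found = false;
    assert(out != NULL);
    pthread_mutex_lock(&sreq_lock);
    for (size_t i = 0; i < sreq_size; i++) {
        if (sreq_table[i].in_use && sreq_table[i].key == key) {
            *out = sreq_table[i];
            sreq_table[i].in_use = false;
            found = true;
            break;
        }
    }
    pthread_mutex_unlock(&sreq_lock);
    return found;
}
```

A wrapper for the receive-posting call would invoke `add_shadow_request`, and a wrapper for the completion call would invoke `take_shadow_request` and then mark the recorded buffer as defined. Because the table is never freed in this sketch either, it would show up in a leak check just as the text says.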
|
||||
|
||||
<para>Writing new wrappers should be fairly easy. The source file is
|
||||
<computeroutput>auxprogs/libmpiwrap.c</computeroutput>. If possible,
|
||||
find an existing wrapper for a function of similar behaviour to the
|
||||
one you want to wrap, and use it as a starting point. The wrappers
|
||||
are organised in sections in the same order as the MPI 1.1 spec, to
|
||||
aid navigation. When adding a wrapper, remember to comment out the
|
||||
definition of the default wrapper in the long list of defaults at the
|
||||
bottom of the file (do not remove it, just comment it out).</para>
|
||||
</sect2>
|
||||
|
||||
<sect2 id="mc-manual.mpiwrap.whattoexpect"
|
||||
xreflabel="What to expect with MPI Wrappers">
|
||||
<title>What to expect when using the wrappers</title>
|
||||
|
||||
<para>The wrappers should reduce Memcheck's false-error rate on MPI
|
||||
applications. Because the wrapping is done at the MPI interface,
|
||||
there will still potentially be a large number of errors reported in
|
||||
the MPI implementation below the interface. The best you can do is
|
||||
try to suppress them.</para>
|
||||
|
||||
<para>You may also find that the input-side (buffer
|
||||
length/definedness) checks find errors in your MPI use, for example
|
||||
passing too short a buffer to
|
||||
<computeroutput>MPI_Recv</computeroutput>.</para>
|
||||
|
||||
<para>Functions which are not wrapped may increase the false
|
||||
error rate. A possible approach is to run with
|
||||
<computeroutput>MPIWRAP_DEBUG</computeroutput> containing
|
||||
<computeroutput>warn</computeroutput>. This will show you functions
|
||||
which lack proper wrappers but which are nevertheless used. You can
|
||||
then write wrappers for them.
|
||||
</para>
|
||||
|
||||
<para>A known source of potential false errors is the
|
||||
<computeroutput>PMPI_Reduce</computeroutput> family of functions, when
|
||||
using a custom (user-defined) reduction function. In a reduction
|
||||
operation, each node notionally sends data to a "central point" which
|
||||
uses the specified reduction function to merge the data items into a
|
||||
single item. Hence, in general, data is passed between nodes and fed
|
||||
to the reduction function, but the wrapper library cannot mark the
|
||||
transferred data as initialised before it is handed to the reduction
|
||||
function, because all that happens "inside" the
|
||||
<computeroutput>PMPI_Reduce</computeroutput> call. As a result you
|
||||
may see false positives reported in your reduction function.</para>
|
||||
|
||||
</sect2>
|
||||
|
||||
</sect1>
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
</chapter>
|
||||
|
||||