Update documents in preparation for 3.3.0, and restructure them

somewhat to move less relevant material out of the way.
The main changes are:

* Update date and version info

* Mention other tools in the quick-start guide

* Document --child-silent-after-fork

* Rearrange order of sections in the Valgrind Core chapter, to move
  advanced stuff (client requests) to the end, and compact stuff
  relevant to the majority of users towards the front

* Move MPI debugging stuff from the Core manual (a nonsensical place
  for it) to the Memcheck chapter

* Update the manual's introductory chapter a bit

* Connect up new tech docs summary page, and disconnect old and
  very out of date valgrind/memcheck tech docs

* Add section tags to the Cachegrind manual, to stop xsltproc
  complaining about their absence



git-svn-id: svn://svn.valgrind.org/valgrind/trunk@7199
Julian Seward 2007-11-22 01:21:56 +00:00
parent 595181679a
commit 9101880b1f
10 changed files with 1054 additions and 910 deletions


@ -6,8 +6,9 @@ dynamic-translation framework.
Jeremy Fitzhardinge, jeremy@valgrind.org
Jeremy wrote Helgrind and totally overhauled low-level syscall/signal
and address space layout stuff, among many other improvements.
Jeremy wrote Helgrind (in the 2.X line) and totally overhauled
low-level syscall/signal and address space layout stuff, among many
other improvements.
Tom Hughes, tom@valgrind.org


@ -2,8 +2,9 @@
Cerion Armour-Brown worked on PowerPC instruction set support using
the Vex dynamic-translation framework.
Jeremy Fitzhardinge wrote Helgrind and totally overhauled low-level
syscall/signal and address space layout stuff, among many other things.
Jeremy Fitzhardinge wrote Helgrind (in the 2.X line) and totally
overhauled low-level syscall/signal and address space layout stuff,
among many other things.
Tom Hughes did a vast number of bug fixes, and helped out with support
for more recent Linux/glibc versions.


@ -937,7 +937,7 @@ way as for C/C++ programs.</para>
<sect2>
<sect2 id="cg-manual.annopts.warnings" xreflabel="Warnings">
<title>Warnings</title>
<para>There are a couple of situations in which
@ -969,7 +969,8 @@ warnings.</para>
<sect2>
<sect2 id="cg-manual.annopts.things-to-watch-out-for"
xreflabel="Things to watch out for">
<title>Things to watch out for</title>
<para>Some odd things that can occur during annotation:</para>
@ -1084,7 +1085,7 @@ rare.</para>
<sect2>
<sect2 id="cg-manual.annopts.accuracy" xreflabel="Accuracy">
<title>Accuracy</title>
<para>Valgrind's cache profiling has a number of
@ -1221,7 +1222,8 @@ fail these checks.</para>
</sect1>
<sect1>
<sect1 id="cg-manual.acting-on"
xreflabel="Acting on Cachegrind's information">
<title>Acting on Cachegrind's information</title>
<para>
So, you've managed to profile your program with Cachegrind. Now what?
@ -1260,14 +1262,16 @@ yourself. But at least you have the information!
</sect1>
<sect1>
<sect1 id="cg-manual.impl-details"
xreflabel="Implementation details">
<title>Implementation details</title>
<para>
This section talks about details you don't need to know about in order to
use Cachegrind, but may be of interest to some people.
</para>
<sect2>
<sect2 id="cg-manual.impl-details.how-cg-works"
xreflabel="How Cachegrind works">
<title>How Cachegrind works</title>
<para>The best reference for understanding how Cachegrind works is chapter 3 of
"Dynamic Binary Analysis and Instrumentation", by Nicholas Nethercote. It
@ -1275,7 +1279,8 @@ is available on the <ulink url="&vg-pubs;">Valgrind publications
page</ulink>.</para>
</sect2>
<sect2>
<sect2 id="cg-manual.impl-details.file-format"
xreflabel="Cachegrind output file format">
<title>Cachegrind output file format</title>
<para>The file format is fairly straightforward, basically giving the
cost centre for every line, grouped by files and


@ -7,5 +7,6 @@ EXTRA_DIST = \
manual-writing-tools.xml\
quick-start-guide.xml \
tech-docs.xml \
new-tech-docs.xml \
vg-entities.xml \
xml_help.txt

File diff suppressed because it is too large


@ -11,7 +11,7 @@
<para>Valgrind is a suite of simulation-based debugging and profiling
tools for programs running on Linux (x86, amd64, ppc32 and ppc64).
The system consists of a core, which provides a synthetic CPU in
software, and a series of tools, each of which performs some kind of
software, and a set of tools, each of which performs some kind of
debugging, profiling, or similar task. The architecture is modular,
so that new tools can be created easily and without disturbing the
existing structure.</para>
@ -106,6 +106,30 @@ summary, these are:</para>
paging needed.</para>
</listitem>
<listitem>
<para><command>Helgrind</command> detects synchronisation errors
in programs that use the POSIX pthreads threading primitives. It
detects the following three classes of errors:</para>
<itemizedlist>
<listitem>
<para>Misuses of the POSIX pthreads API.</para>
</listitem>
<listitem>
<para>Potential deadlocks arising from lock ordering
problems.</para>
</listitem>
<listitem>
<para>Data races -- accessing memory without adequate locking.</para>
</listitem>
</itemizedlist>
<para>Problems like these often result in unreproducible,
timing-dependent crashes, deadlocks and other misbehaviour, and
can be difficult to find by other means.</para>
</listitem>
</orderedlist>
@ -119,19 +143,22 @@ integer and floating point operations your program does.</para>
<para>Valgrind is closely tied to details of the CPU and operating
system, and to a lesser extent, the compiler and basic C libraries.
Nonetheless, as of version 3.2.0 it supports several platforms:
Nonetheless, as of version 3.3.0 it supports several platforms:
x86/Linux (mature), amd64/Linux (maturing), ppc32/Linux and
ppc64/Linux (less mature but work well). Valgrind uses the standard Unix
ppc64/Linux (less mature but work well). There is also experimental
support for ppc32/AIX5 and ppc64/AIX5 (AIX 5.2 and 5.3 only).
Valgrind uses the standard Unix
<computeroutput>./configure</computeroutput>,
<computeroutput>make</computeroutput>, <computeroutput>make
install</computeroutput> mechanism, and we have attempted to ensure that
it works on machines with Linux kernel 2.4.X or 2.6.X and glibc
2.2.X to 2.5.X.</para>
2.2.X to 2.7.X.</para>
<para>Valgrind is licensed under the <xref linkend="license.gpl"/>,
version 2. The <computeroutput>valgrind/*.h</computeroutput> headers
that you may wish to include in your code (eg.
<filename>valgrind.h</filename>, <filename>memcheck.h</filename>) are
<filename>valgrind.h</filename>, <filename>memcheck.h</filename>,
<filename>helgrind.h</filename>) are
distributed under a BSD-style license, so you may include them in your
code without worrying about license conflicts. Some of the PThreads
test cases, <filename>pth_*.c</filename>, are taken from "Pthreads
@ -139,6 +166,13 @@ Programming" by Bradford Nichols, Dick Buttlar &amp; Jacqueline Proulx
Farrell, ISBN 1-56592-115-1, published by O'Reilly &amp; Associates,
Inc.</para>
<para>If you contribute code to Valgrind, please ensure your
contributions are licensed as "GPLv2, or (at your option) any later
version." This is so as to allow the possibility of easily upgrading
the license to GPLv3 in future. If you want to modify code in the VEX
subdirectory, please also see VEX/HACKING.README.</para>
</sect1>
@ -158,11 +192,15 @@ want to run the Memcheck tool. The final chapter explains how to write a
new tool.</para>
<para>Be aware that the core understands some command line flags, and
the tools have their own flags which they know about. This means there
is no central place describing all the flags that are accepted -- you
have to read the flags documentation both for
the tools have their own flags which they know about. This means
there is no central place describing all the flags that are
accepted -- you have to read the flags documentation both for
<xref linkend="manual-core"/> and for the tool you want to use.</para>
<para>The manual is quite big and complex. If you are looking for a
quick getting-started guide, have a look at
<xref linkend="quick-start"/>.</para>
</sect1>
</chapter>


@ -32,24 +32,64 @@ memory errors such as:</para>
<itemizedlist>
<listitem>
<para>touching memory you shouldn't (eg. overrunning heap block
boundaries);</para>
<para>Touching memory you shouldn't (eg. overrunning heap block
boundaries, or reading/writing freed memory).</para>
</listitem>
<listitem>
<para>using values before they have been initialized;</para>
<para>Using values before they have been initialized.</para>
</listitem>
<listitem>
<para>incorrect freeing of memory, such as double-freeing heap
blocks;</para>
<para>Incorrect freeing of memory, such as double-freeing heap
blocks.</para>
</listitem>
<listitem>
<para>memory leaks.</para>
<para>Memory leaks.</para>
</listitem>
</itemizedlist>
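<para>As a quick (hypothetical) illustration, the following small C
program contains one instance of each of the error classes listed
above; running it under Memcheck (for example with
<computeroutput>valgrind --leak-check=yes ./a.out</computeroutput>)
reports each of them:</para>
<programlisting><![CDATA[
#include <stdlib.h>

int main(void)
{
   int *a = malloc(10 * sizeof(int));
   int  x = a[2];           /* use of an uninitialised value      */
   a[10] = 0;               /* heap block overrun (invalid write) */
   free(a);
   free(a);                 /* incorrect free (double free)       */
   a = malloc(sizeof(int)); /* never freed: a memory leak         */
   return x;                /* uninitialised exit status          */
}
]]></programlisting>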
<para>Memcheck is only one of the tools in the Valgrind suite.
Other tools you may find useful are:</para>
<itemizedlist>
<listitem>
<para>Cachegrind: a profiling tool which produces detailed data on
cache (miss) and branch (misprediction) events. Statistics are
gathered for the entire program, for each function, for each line
of code, and even for each instruction, if you need that level of
detail.</para>
</listitem>
<listitem>
<para>Callgrind: a heavyweight profiling tool similar to
Cachegrind, but which also shows cost relationships across
function calls. Information gathered by Callgrind can be viewed
using the KCachegrind GUI. KCachegrind is not part of the
Valgrind suite -- it is part of the KDE Desktop Environment.</para>
</listitem>
<listitem>
<para>Massif: a space profiling tool. It allows you to explore
in detail which parts of your program allocate memory.</para>
</listitem>
<listitem>
<para>Helgrind: a debugging tool for threaded programs. Helgrind
looks for various kinds of synchronisation errors in code that uses
the POSIX PThreads API.</para>
</listitem>
<listitem>
<para>In addition, there are a number of "experimental" tools in
the codebase. They can be distinguished by the "exp-" prefix on
their names. Experimental tools are not subject to the same
quality control standards that apply to our production-grade tools
(Memcheck, Cachegrind, Callgrind, Massif and Helgrind).</para>
</listitem>
</itemizedlist>
<para>The rest of this guide discusses only the Memcheck tool. For
full documentation on the other tools, see the Valgrind User
Manual.</para>
<para>What follows is the minimum information you need to start
detecting memory errors in your program with Memcheck. Note that this
guide applies to Valgrind version 2.4.0 and later. Some of the
guide applies to Valgrind version 3.3.0 and later. Some of the
information is not quite right for earlier versions.</para>
</sect1>
@ -162,8 +202,9 @@ Things to notice:
</listitem>
</itemizedlist>
It's worth fixing errors in the order they are reported, as later errors
can be caused by earlier errors.</para>
It's worth fixing errors in the order they are reported, as later
errors can be caused by earlier errors. Failing to do this is a
common cause of difficulty with Memcheck.</para>
<para>Memory leak messages look like this:
@ -219,6 +260,15 @@ that are allocated statically or on the stack. But it should detect many
errors that could crash your program (eg. cause a segmentation
fault).</para>
<para>Try to make your program so clean that Memcheck reports no
errors. Once you achieve this state, it is much easier to see when
changes to the program cause Memcheck to report new errors.
Experience from several years of Memcheck use shows that it is
possible to make even huge programs run Memcheck-clean. For example,
large parts of KDE 3.5.X, and recent versions of OpenOffice.org
(2.3.0) are Memcheck-clean, or very close to it.</para>
</sect1>


@ -17,11 +17,14 @@
</legalnotice>
</bookinfo>
<xi:include href="../../memcheck/docs/mc-tech-docs.xml" parse="xml"
<!-- <xi:include href="../../memcheck/docs/mc-tech-docs.xml" parse="xml"
xmlns:xi="http://www.w3.org/2001/XInclude" />
<xi:include href="../../callgrind/docs/cl-format.xml" parse="xml"
-->
<xi:include href="new-tech-docs.xml" parse="xml"
xmlns:xi="http://www.w3.org/2001/XInclude" />
<xi:include href="manual-writing-tools.xml" parse="xml"
xmlns:xi="http://www.w3.org/2001/XInclude" />
<xi:include href="../../callgrind/docs/cl-format.xml" parse="xml"
xmlns:xi="http://www.w3.org/2001/XInclude" />
</book>


@ -2,13 +2,13 @@
<!ENTITY vg-url "http://www.valgrind.org/">
<!ENTITY vg-jemail "julian@valgrind.org">
<!ENTITY vg-vemail "valgrind@valgrind.org">
<!ENTITY vg-lifespan "2000-2006">
<!ENTITY vg-lifespan "2000-2007">
<!ENTITY vg-users-list "http://lists.sourceforge.net/lists/listinfo/valgrind-users">
<!-- valgrind release + version stuff -->
<!ENTITY rel-type "Release">
<!ENTITY rel-version "3.2.0">
<!ENTITY rel-date "7 June 2006">
<!ENTITY rel-version "3.3.0">
<!ENTITY rel-date "7 December 2007">
<!-- where the docs are installed -->
<!ENTITY vg-doc-path "/usr/share/doc/valgrind/html/index.html">


@ -1287,6 +1287,393 @@ inform Memcheck about changes to the state of a mempool:</para>
</itemizedlist>
</sect1>
<sect1 id="mc-manual.mpiwrap" xreflabel="MPI Wrappers">
<title>Debugging MPI Parallel Programs with Valgrind</title>
<para> Valgrind supports debugging of distributed-memory applications
which use the MPI message passing standard. This support consists of a
library of wrapper functions for the
<computeroutput>PMPI_*</computeroutput> interface. When incorporated
into the application's address space, either by direct linking or by
<computeroutput>LD_PRELOAD</computeroutput>, the wrappers intercept
calls to <computeroutput>PMPI_Send</computeroutput>,
<computeroutput>PMPI_Recv</computeroutput>, etc. They then
use client requests to inform Valgrind of memory state changes caused
by the function being wrapped. This reduces the number of false
positives that Memcheck otherwise typically reports for MPI
applications.</para>
<para>The wrappers also take the opportunity to carefully check
size and definedness of buffers passed as arguments to MPI functions, hence
detecting errors such as passing undefined data to
<computeroutput>PMPI_Send</computeroutput>, or receiving data into a
buffer which is too small.</para>
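<para>As a rough sketch of the idea (this is not how
<computeroutput>auxprogs/libmpiwrap.c</computeroutput> is actually
structured; the real wrappers intercept the
<computeroutput>PMPI_*</computeroutput> functions via Valgrind's
function-wrapping machinery and walk arbitrary datatypes), a receive
wrapper written against the MPI profiling interface could look
something like this:</para>
<programlisting><![CDATA[
/* Illustrative sketch only: a receive wrapper using Memcheck client
   requests.  Assumes a contiguous datatype for simplicity. */
#include <mpi.h>
#include <valgrind/memcheck.h>

int MPI_Recv(void *buf, int count, MPI_Datatype datatype, int source,
             int tag, MPI_Comm comm, MPI_Status *status)
{
   int nbytes, rc;
   PMPI_Type_size(datatype, &nbytes);
   nbytes *= count;
   /* The receive buffer must at least be addressable beforehand. */
   VALGRIND_CHECK_MEM_IS_ADDRESSABLE(buf, nbytes);
   rc = PMPI_Recv(buf, count, datatype, source, tag, comm, status);
   if (rc == MPI_SUCCESS)
      /* Tell Memcheck the received bytes are now initialised. */
      VALGRIND_MAKE_MEM_DEFINED(buf, nbytes);
   return rc;
}
]]></programlisting>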
<para>Unlike most of the rest of Valgrind, the wrapper library is subject to a
BSD-style license, so you can link it into any code base you like.
See the top of <computeroutput>auxprogs/libmpiwrap.c</computeroutput>
for license details.</para>
<sect2 id="mc-manual.mpiwrap.build" xreflabel="Building MPI Wrappers">
<title>Building and installing the wrappers</title>
<para> The wrapper library will be built automatically if possible.
Valgrind's configure script will look for a suitable
<computeroutput>mpicc</computeroutput> to build it with. This must be
the same <computeroutput>mpicc</computeroutput> you use to build the
MPI application you want to debug. By default, Valgrind tries
<computeroutput>mpicc</computeroutput>, but you can specify a
different one by using the configure-time flag
<computeroutput>--with-mpicc=</computeroutput>. Currently the
wrappers are only buildable with
<computeroutput>mpicc</computeroutput>s which are based on GNU
<computeroutput>gcc</computeroutput> or Intel's
<computeroutput>icc</computeroutput>.</para>
<para>Check that the configure script prints a line like this:</para>
<programlisting><![CDATA[
checking for usable MPI2-compliant mpicc and mpi.h... yes, mpicc
]]></programlisting>
<para>If it says <computeroutput>... no</computeroutput>, your
<computeroutput>mpicc</computeroutput> has failed to compile and link
a test MPI2 program.</para>
<para>If the configure test succeeds, continue in the usual way with
<computeroutput>make</computeroutput> and <computeroutput>make
install</computeroutput>. The final install tree should then contain
<computeroutput>libmpiwrap.so</computeroutput>.
</para>
<para>Compile up a test MPI program (eg, MPI hello-world) and try
this:</para>
<programlisting><![CDATA[
LD_PRELOAD=$prefix/lib/valgrind/<platform>/libmpiwrap.so \
mpirun [args] $prefix/bin/valgrind ./hello
]]></programlisting>
<para>You should see something similar to the following:</para>
<programlisting><![CDATA[
valgrind MPI wrappers 31901: Active for pid 31901
valgrind MPI wrappers 31901: Try MPIWRAP_DEBUG=help for possible options
]]></programlisting>
<para>repeated for every process in the group. If you do not see
these, there is a build/installation problem of some kind.</para>
<para> The MPI functions to be wrapped are assumed to be in an ELF
shared object with soname matching
<computeroutput>libmpi.so*</computeroutput>. This is known to be
correct at least for Open MPI and Quadrics MPI, and can easily be
changed if required.</para>
</sect2>
<sect2 id="mc-manual.mpiwrap.gettingstarted"
xreflabel="Getting started with MPI Wrappers">
<title>Getting started</title>
<para>Compile your MPI application as usual, taking care to link it
using the same <computeroutput>mpicc</computeroutput> that your
Valgrind build was configured with.</para>
<para>
Use the following basic scheme to run your application on Valgrind with
the wrappers engaged:</para>
<programlisting><![CDATA[
MPIWRAP_DEBUG=[wrapper-args] \
LD_PRELOAD=$prefix/lib/valgrind/<platform>/libmpiwrap.so \
mpirun [mpirun-args] \
$prefix/bin/valgrind [valgrind-args] \
[application] [app-args]
]]></programlisting>
<para>As an alternative to
<computeroutput>LD_PRELOAD</computeroutput>ing
<computeroutput>libmpiwrap.so</computeroutput>, you can simply link it
to your application if desired. This should not disturb native
behaviour of your application in any way.</para>
</sect2>
<sect2 id="mc-manual.mpiwrap.controlling"
xreflabel="Controlling the MPI Wrappers">
<title>Controlling the wrapper library</title>
<para>Environment variable
<computeroutput>MPIWRAP_DEBUG</computeroutput> is consulted at
startup. The default behaviour is to print a starting banner</para>
<programlisting><![CDATA[
valgrind MPI wrappers 16386: Active for pid 16386
valgrind MPI wrappers 16386: Try MPIWRAP_DEBUG=help for possible options
]]></programlisting>
<para> and then be relatively quiet.</para>
<para>You can give a list of comma-separated options in
<computeroutput>MPIWRAP_DEBUG</computeroutput>. These are</para>
<itemizedlist>
<listitem>
<para><computeroutput>verbose</computeroutput>:
show entries/exits of all wrappers. Also show extra
debugging info, such as the status of outstanding
<computeroutput>MPI_Request</computeroutput>s resulting
from uncompleted <computeroutput>MPI_Irecv</computeroutput>s.</para>
</listitem>
<listitem>
<para><computeroutput>quiet</computeroutput>:
opposite of <computeroutput>verbose</computeroutput>, only print
anything when the wrappers want
to report a detected programming error, or in case of catastrophic
failure of the wrappers.</para>
</listitem>
<listitem>
<para><computeroutput>warn</computeroutput>:
by default, functions which lack proper wrappers
are not commented on, just silently
ignored. This causes a warning to be printed for each unwrapped
function used, up to a maximum of three warnings per function.</para>
</listitem>
<listitem>
<para><computeroutput>strict</computeroutput>:
print an error message and abort the program if
a function lacking a wrapper is used.</para>
</listitem>
</itemizedlist>
<para> If you want to use Valgrind's XML output facility
(<computeroutput>--xml=yes</computeroutput>), you should pass
<computeroutput>quiet</computeroutput> in
<computeroutput>MPIWRAP_DEBUG</computeroutput> so as to get rid of any
extraneous printing from the wrappers.</para>
</sect2>
<sect2 id="mc-manual.mpiwrap.limitations"
xreflabel="Abilities and Limitations of MPI Wrappers">
<title>Abilities and limitations</title>
<sect3 id="mc-manual.mpiwrap.limitations.functions"
xreflabel="Functions">
<title>Functions</title>
<para>All MPI2 functions except
<computeroutput>MPI_Wtick</computeroutput>,
<computeroutput>MPI_Wtime</computeroutput> and
<computeroutput>MPI_Pcontrol</computeroutput> have wrappers. The
first two are not wrapped because they return a
<computeroutput>double</computeroutput>, and Valgrind's
function-wrap mechanism cannot handle that (it could easily enough be
extended to). <computeroutput>MPI_Pcontrol</computeroutput> cannot be
wrapped as it has variable arity:
<computeroutput>int MPI_Pcontrol(const int level, ...)</computeroutput></para>
<para>Most functions are wrapped with a default wrapper which does
nothing except complain or abort if it is called, depending on
settings in <computeroutput>MPIWRAP_DEBUG</computeroutput> listed
above. The following functions have "real", do-something-useful
wrappers:</para>
<programlisting><![CDATA[
PMPI_Send PMPI_Bsend PMPI_Ssend PMPI_Rsend
PMPI_Recv PMPI_Get_count
PMPI_Isend PMPI_Ibsend PMPI_Issend PMPI_Irsend
PMPI_Irecv
PMPI_Wait PMPI_Waitall
PMPI_Test PMPI_Testall
PMPI_Iprobe PMPI_Probe
PMPI_Cancel
PMPI_Sendrecv
PMPI_Type_commit PMPI_Type_free
PMPI_Pack PMPI_Unpack
PMPI_Bcast PMPI_Gather PMPI_Scatter PMPI_Alltoall
PMPI_Reduce PMPI_Allreduce PMPI_Op_create
PMPI_Comm_create PMPI_Comm_dup PMPI_Comm_free PMPI_Comm_rank PMPI_Comm_size
PMPI_Error_string
PMPI_Init PMPI_Initialized PMPI_Finalize
]]></programlisting>
<para> A few functions such as
<computeroutput>PMPI_Address</computeroutput> are listed as
<computeroutput>HAS_NO_WRAPPER</computeroutput>. They have no wrapper
at all as there is nothing worth checking, and giving a no-op wrapper
would reduce performance for no reason.</para>
<para> Note that the wrapper library can itself generate large
numbers of calls to the MPI implementation, especially when walking
complex types. The most common functions called are
<computeroutput>PMPI_Extent</computeroutput>,
<computeroutput>PMPI_Type_get_envelope</computeroutput>,
<computeroutput>PMPI_Type_get_contents</computeroutput>, and
<computeroutput>PMPI_Type_free</computeroutput>. </para>
</sect3>
<sect3 id="mc-manual.mpiwrap.limitations.types"
xreflabel="Types">
<title>Types</title>
<para> MPI-1.1 structured types are supported, and walked exactly.
The currently supported combiners are
<computeroutput>MPI_COMBINER_NAMED</computeroutput>,
<computeroutput>MPI_COMBINER_CONTIGUOUS</computeroutput>,
<computeroutput>MPI_COMBINER_VECTOR</computeroutput>,
<computeroutput>MPI_COMBINER_HVECTOR</computeroutput>,
<computeroutput>MPI_COMBINER_INDEXED</computeroutput>,
<computeroutput>MPI_COMBINER_HINDEXED</computeroutput> and
<computeroutput>MPI_COMBINER_STRUCT</computeroutput>. This should
cover all MPI-1.1 types. The mechanism (function
<computeroutput>walk_type</computeroutput>) should extend easily to
cover MPI2 combiners.</para>
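<para>For instance, in a (hypothetical) fragment such as the
following, which builds a derived type with the MPI-1.1
<computeroutput>MPI_Type_struct</computeroutput> constructor, the
wrappers walk each element and check its
<computeroutput>int</computeroutput> and
<computeroutput>double</computeroutput> fields individually:</para>
<programlisting><![CDATA[
#include <stddef.h>
#include <mpi.h>

typedef struct { int id; double val; } Item;

/* Describe Item to MPI using the MPI-1.1 struct constructor
   (combiner MPI_COMBINER_STRUCT). */
MPI_Datatype make_item_type(void)
{
   int          blocklens[2] = { 1, 1 };
   MPI_Aint     displs[2]    = { offsetof(Item, id), offsetof(Item, val) };
   MPI_Datatype types[2]     = { MPI_INT, MPI_DOUBLE };
   MPI_Datatype item_type;
   MPI_Type_struct(2, blocklens, displs, types, &item_type);
   MPI_Type_commit(&item_type);
   return item_type;
}
]]></programlisting>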
<para>MPI defines some named structured types
(<computeroutput>MPI_FLOAT_INT</computeroutput>,
<computeroutput>MPI_DOUBLE_INT</computeroutput>,
<computeroutput>MPI_LONG_INT</computeroutput>,
<computeroutput>MPI_2INT</computeroutput>,
<computeroutput>MPI_SHORT_INT</computeroutput>,
<computeroutput>MPI_LONG_DOUBLE_INT</computeroutput>) which are pairs
of some basic type and a C <computeroutput>int</computeroutput>.
Unfortunately the MPI specification makes it impossible to look inside
these types and see where the fields are. Therefore these wrappers
assume the types are laid out as <computeroutput>struct { float val;
int loc; }</computeroutput> (for
<computeroutput>MPI_FLOAT_INT</computeroutput>), etc, and act
accordingly. This appears to be correct at least for Open MPI 1.0.2
and for Quadrics MPI.</para>
<para>If <computeroutput>strict</computeroutput> is an option specified
in <computeroutput>MPIWRAP_DEBUG</computeroutput>, the application
will abort if an unhandled type is encountered. Otherwise, the
application will print a warning message and continue.</para>
<para>Some effort is made to mark/check memory ranges corresponding to
arrays of values in a single pass. This is important for performance
since asking Valgrind to mark/check any range, no matter how small,
carries quite a large constant cost. This optimisation is applied to
arrays of primitive types (<computeroutput>double</computeroutput>,
<computeroutput>float</computeroutput>,
<computeroutput>int</computeroutput>,
<computeroutput>long</computeroutput>, <computeroutput>long
long</computeroutput>, <computeroutput>short</computeroutput>,
<computeroutput>char</computeroutput>, and <computeroutput>long
double</computeroutput> on platforms where <computeroutput>sizeof(long
double) == 8</computeroutput>). For arrays of all other types, the
wrappers handle each element individually and so there can be a very
large performance cost.</para>
</sect3>
</sect2>
<sect2 id="mc-manual.mpiwrap.writingwrappers"
xreflabel="Writing new MPI Wrappers">
<title>Writing new wrappers</title>
<para>
For the most part the wrappers are straightforward. The only
significant complexity arises with nonblocking receives.</para>
<para>The issue is that <computeroutput>MPI_Irecv</computeroutput>
specifies the recv buffer and returns immediately, giving a handle
(<computeroutput>MPI_Request</computeroutput>) for the transaction.
Later the user will have to poll for completion with
<computeroutput>MPI_Wait</computeroutput> etc, and when the
transaction completes successfully, the wrappers have to paint the
recv buffer. But the recv buffer details are not presented to
<computeroutput>MPI_Wait</computeroutput> -- only the handle is. The
library therefore maintains a shadow table which associates
uncompleted <computeroutput>MPI_Request</computeroutput>s with the
corresponding buffer address/count/type. When an operation completes,
the table is searched for the associated address/count/type info, and
memory is marked accordingly.</para>
<para>Access to the table is guarded by a (POSIX pthreads) lock, so as
to make the library thread-safe.</para>
<para>The table is allocated with
<computeroutput>malloc</computeroutput> and never
<computeroutput>free</computeroutput>d, so it will show up in leak
checks.</para>
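<para>Schematically (and much simplified compared to the real code),
the bookkeeping amounts to something like this:</para>
<programlisting><![CDATA[
/* Simplified sketch of the shadow table idea; the real table is
   malloc'd, grows on demand, and handles Waitall/Testall etc. */
#include <pthread.h>
#include <mpi.h>
#include <valgrind/memcheck.h>

typedef struct {
   MPI_Request  key;     /* outstanding request             */
   void        *buf;     /* recv buffer given to MPI_Irecv  */
   int          count;
   MPI_Datatype ty;
   int          in_use;
} ShadowEnt;

static ShadowEnt       table[1000];
static pthread_mutex_t table_lk = PTHREAD_MUTEX_INITIALIZER;

/* Called from the MPI_Irecv wrapper: remember where the data will land. */
static void shadow_add(MPI_Request r, void *buf, int count, MPI_Datatype ty)
{
   pthread_mutex_lock(&table_lk);
   for (int i = 0; i < 1000; i++)
      if (!table[i].in_use) {
         ShadowEnt e = { r, buf, count, ty, 1 };
         table[i] = e;
         break;
      }
   pthread_mutex_unlock(&table_lk);
}

/* Called from the MPI_Wait wrapper etc: if the completed request is
   known, mark its buffer as initialised and retire the entry
   (contiguous types only, for brevity). */
static void shadow_complete(MPI_Request r)
{
   pthread_mutex_lock(&table_lk);
   for (int i = 0; i < 1000; i++)
      if (table[i].in_use && table[i].key == r) {
         int nbytes;
         PMPI_Type_size(table[i].ty, &nbytes);
         VALGRIND_MAKE_MEM_DEFINED(table[i].buf, nbytes * table[i].count);
         table[i].in_use = 0;
         break;
      }
   pthread_mutex_unlock(&table_lk);
}
]]></programlisting>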
<para>Writing new wrappers should be fairly easy. The source file is
<computeroutput>auxprogs/libmpiwrap.c</computeroutput>. If possible,
find an existing wrapper for a function of similar behaviour to the
one you want to wrap, and use it as a starting point. The wrappers
are organised in sections in the same order as the MPI 1.1 spec, to
aid navigation. When adding a wrapper, remember to comment out the
definition of the default wrapper in the long list of defaults at the
bottom of the file (do not remove it, just comment it out).</para>
</sect2>
<sect2 id="mc-manual.mpiwrap.whattoexpect"
xreflabel="What to expect with MPI Wrappers">
<title>What to expect when using the wrappers</title>
<para>The wrappers should reduce Memcheck's false-error rate on MPI
applications. Because the wrapping is done at the MPI interface,
there will still potentially be a large number of errors reported in
the MPI implementation below the interface. The best you can do is
try to suppress them.</para>
<para>You may also find that the input-side (buffer
length/definedness) checks find errors in your MPI use, for example
passing too short a buffer to
<computeroutput>MPI_Recv</computeroutput>.</para>
<para>Functions which are not wrapped may increase the false
error rate. A possible approach is to run with
<computeroutput>MPIWRAP_DEBUG</computeroutput> containing
<computeroutput>warn</computeroutput>. This will show you functions
which lack proper wrappers but which are nevertheless used. You can
then write wrappers for them.
</para>
<para>A known source of potential false errors is the
<computeroutput>PMPI_Reduce</computeroutput> family of functions, when
using a custom (user-defined) reduction function. In a reduction
operation, each node notionally sends data to a "central point" which
uses the specified reduction function to merge the data items into a
single item. Hence, in general, data is passed between nodes and fed
to the reduction function, but the wrapper library cannot mark the
transferred data as initialised before it is handed to the reduction
function, because all that happens "inside" the
<computeroutput>PMPI_Reduce</computeroutput> call. As a result you
may see false positives reported in your reduction function.</para>
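<para>For example, with a (hypothetical) user-defined reduction
function like the one below, the data arriving in
<computeroutput>invec</computeroutput> has been transferred "inside"
<computeroutput>PMPI_Reduce</computeroutput>, so the wrappers cannot
mark it as initialised first, and Memcheck may complain about the
comparison even though the program is correct:</para>
<programlisting><![CDATA[
#include <mpi.h>

/* Element-wise maximum of ints, for use with MPI_Op_create.  The
   comparison reads invec, which the wrappers never get a chance to
   mark as defined, so it may be flagged as a use of uninitialised
   data (a false positive). */
static void max_fn(void *invec, void *inoutvec, int *len,
                   MPI_Datatype *dtype)
{
   int *in = invec, *inout = inoutvec;
   for (int i = 0; i < *len; i++)
      if (in[i] > inout[i])
         inout[i] = in[i];
}

/* Typical usage:
      MPI_Op op;
      MPI_Op_create(max_fn, 1, &op);
      MPI_Reduce(sendbuf, recvbuf, n, MPI_INT, op, 0, MPI_COMM_WORLD);
*/
]]></programlisting>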
</sect2>
</sect1>
</chapter>