Update documents in preparation for 3.3.0, and restructure them

somewhat to move less relevant material out of the way to some extent.
The main changes are:

* Update date and version info

* Mention other tools in the quick-start guide

* Document --child-silent-after-fork

* Rearrange order of sections in the Valgrind Core chapter, to move
  advanced stuff (client requests) to the end, and compact stuff
  relevant to the majority of users towards the front

* Move MPI debugging stuff from the Core manual (a nonsensical place
  for it) to the Memcheck chapter

* Update the manual's introductory chapter a bit

* Connect up new tech docs summary page, and disconnect old and
  very out of date valgrind/memcheck tech docs

* Add section tags to the Cachegrind manual, to stop xsltproc
  complaining about their absence

git-svn-id: svn://svn.valgrind.org/valgrind/trunk@7199
2026-02-03 18:13:01 +00:00 · 2007-11-22 01:21:56 +00:00 · 9101880b1f
commit 9101880b1f
parent 595181679a
10 changed files with 1054 additions and 910 deletions
--- a/5
+++ b/5
@@ -6,8 +6,9 @@ dynamic-translation framework.

 Jeremy Fitzhardinge, jeremy@valgrind.org

-Jeremy wrote Helgrind and totally overhauled low-level syscall/signal
-and address space layout stuff, among many other improvements.
+Jeremy wrote Helgrind (in the 2.X line) and totally overhauled
+low-level syscall/signal and address space layout stuff, among many
+other improvements.

 Tom Hughes, tom@valgrind.org

--- a/5
+++ b/5
@@ -2,8 +2,9 @@
 Cerion Armour-Brown worked on PowerPC instruction set support using
 the Vex dynamic-translation framework.

-Jeremy Fitzhardinge wrote Helgrind and totally overhauled low-level
-syscall/signal and address space layout stuff, among many other things.
+Jeremy Fitzhardinge wrote Helgrind (in the 2.X line) and totally
+overhauled low-level syscall/signal and address space layout stuff,
+among many other things.

 Tom Hughes did a vast number of bug fixes, and helped out with support
 for more recent Linux/glibc versions.
--- a/cachegrind/docs/cg-manual.xml
+++ b/cachegrind/docs/cg-manual.xml
@@ -937,7 +937,7 @@ way as for C/C++ programs.</para>
  


-<sect2>
+<sect2 id="cg-manual.annopts.warnings" xreflabel="Warnings">
 <title>Warnings</title>

 <para>There are a couple of situations in which
@@ -969,7 +969,8 @@ warnings.</para>



-<sect2>
+<sect2 id="cg-manual.annopts.things-to-watch-out-for"
+       xreflabel="Things to watch out for">
 <title>Things to watch out for</title>

 <para>Some odd things that can occur during annotation:</para>
@@ -1084,7 +1085,7 @@ rare.</para>



-<sect2>
+<sect2 id="cg-manual.annopts.accuracy" xreflabel="Accuracy">
 <title>Accuracy</title>

 <para>Valgrind's cache profiling has a number of
@@ -1221,7 +1222,8 @@ fail these checks.</para>
 </sect1>


-<sect1>
+<sect1 id="cg-manual.acting-on"
+       xreflabel="Acting on Cachegrind's information">
 <title>Acting on Cachegrind's information</title>
 <para>
 So, you've managed to profile your program with Cachegrind.  Now what?
@@ -1260,14 +1262,16 @@ yourself.  But at least you have the information!

 </sect1>

-<sect1>
+<sect1 id="cg-manual.impl-details"
+       xreflabel="Implementation details">
 <title>Implementation details</title>
 <para>
 This section talks about details you don't need to know about in order to
 use Cachegrind, but may be of interest to some people.
 </para>

-<sect2>
+<sect2 id="cg-manual.impl-details.how-cg-works"
+       xreflabel="How Cachegrind works">
 <title>How Cachegrind works</title>
 <para>The best reference for understanding how Cachegrind works is chapter 3 of
 "Dynamic Binary Analysis and Instrumentation", by Nicholas Nethercote.  It
@@ -1275,7 +1279,8 @@ is available on the <ulink url="&vg-pubs;">Valgrind publications
 page</ulink>.</para>
 </sect2>

-<sect2>
+<sect2 id="cg-manual.impl-details.file-format"
+       xreflabel="Cachegrind output file format">
 <title>Cachegrind output file format</title>
 <para>The file format is fairly straightforward, basically giving the
 cost centre for every line, grouped by files and
--- a/docs/xml/Makefile.am
+++ b/docs/xml/Makefile.am
@@ -7,5 +7,6 @@ EXTRA_DIST =  \
 	manual-writing-tools.xml\
 	quick-start-guide.xml	\
 	tech-docs.xml 		\
+	new-tech-docs.xml 	\
 	vg-entities.xml 	\
 	xml_help.txt
--- a/docs/xml/manual-core.xml
+++ b/docs/xml/manual-core.xml
--- a/docs/xml/manual-intro.xml
+++ b/docs/xml/manual-intro.xml
@@ -11,7 +11,7 @@
 <para>Valgrind is a suite of simulation-based debugging and profiling
 tools for programs running on Linux (x86, amd64, ppc32 and ppc64).
 The system consists of a core, which provides a synthetic CPU in
-software, and a series of tools, each of which performs some kind of
+software, and a set of tools, each of which performs some kind of
 debugging, profiling, or similar task.  The architecture is modular,
 so that new tools can be created easily and without disturbing the
 existing structure.</para>
@@ -106,6 +106,30 @@ summary, these are:</para>
     paging needed.</para>
   </listitem>

+   <listitem>
+     <para><command>Helgrind</command> detects synchronisation errors
+     in programs that use the POSIX pthreads threading primitives.  It
+     detects the following three classes of errors:</para>
+
+     <itemizedlist>
+      <listitem>
+        <para>Misuses of the POSIX pthreads API.</para>
+      </listitem>
+      <listitem>
+        <para>Potential deadlocks arising from lock ordering
+        problems.</para>
+      </listitem>
+      <listitem>
+       <para>Data races -- accessing memory without adequate locking.</para>
+      </listitem>
+    </itemizedlist>
+
+    <para>Problems like these often result in unreproducible,
+    timing-dependent crashes, deadlocks and other misbehaviour, and
+    can be difficult to find by other means.</para>
+
+   </listitem>
+
 </orderedlist>
  

@@ -119,19 +143,22 @@ integer and floating point operations your program does.</para>

 <para>Valgrind is closely tied to details of the CPU and operating
 system, and to a lesser extent, the compiler and basic C libraries.
-Nonetheless, as of version 3.2.0 it supports several platforms:
+Nonetheless, as of version 3.3.0 it supports several platforms:
 x86/Linux (mature), amd64/Linux (maturing), ppc32/Linux and
-ppc64/Linux (less mature but work well).  Valgrind uses the standard Unix
+ppc64/Linux (less mature but work well).  There is also experimental
+support for ppc32/AIX5 and ppc64/AIX5 (AIX 5.2 and 5.3 only).
+Valgrind uses the standard Unix
 <computeroutput>./configure</computeroutput>,
 <computeroutput>make</computeroutput>, <computeroutput>make
 install</computeroutput> mechanism, and we have attempted to ensure that
 it works on machines with Linux kernel 2.4.X or 2.6.X and glibc
-2.2.X to 2.5.X.</para>
+2.2.X to 2.7.X.</para>

 <para>Valgrind is licensed under the <xref linkend="license.gpl"/>,
 version 2.  The <computeroutput>valgrind/*.h</computeroutput> headers
 that you may wish to include in your code (eg.
-<filename>valgrind.h</filename>, <filename>memcheck.h</filename>) are
+<filename>valgrind.h</filename>, <filename>memcheck.h</filename>,
+<filename>helgrind.h</filename>) are
 distributed under a BSD-style license, so you may include them in your
 code without worrying about license conflicts.  Some of the PThreads
 test cases, <filename>pth_*.c</filename>, are taken from "Pthreads
@@ -139,6 +166,13 @@ Programming" by Bradford Nichols, Dick Buttlar &amp; Jacqueline Proulx
 Farrell, ISBN 1-56592-115-1, published by O'Reilly &amp; Associates,
 Inc.</para>

+<para>If you contribute code to Valgrind, please ensure your
+contributions are licensed as "GPLv2, or (at your option) any later
+version."  This is so as to allow the possibility of easily upgrading
+the license to GPLv3 in future.  If you want to modify code in the VEX
+subdirectory, please also see VEX/HACKING.README.</para>
+
+
 </sect1>


@@ -158,11 +192,15 @@ want to run the Memcheck tool.  The final chapter explains how to write a
 new tool.</para>

 <para>Be aware that the core understands some command line flags, and
-the tools have their own flags which they know about.  This means there
-is no central place describing all the flags that are accepted -- you
-have to read the flags documentation both for 
+the tools have their own flags which they know about.  This means
+there is no central place describing all the flags that are
+accepted -- you have to read the flags documentation both for
 <xref linkend="manual-core"/> and for the tool you want to use.</para>

+<para>The manual is quite big and complex.  If you are looking for a
+quick getting-started guide, have a look at
+<xref linkend="quick-start"/>.</para>
+
 </sect1>

 </chapter>
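
Note on the Helgrind entry added to manual-intro.xml above: the "data races" class (accessing memory without adequate locking) can be illustrated with a minimal C sketch. This is not code from the Valgrind tree; the names `worker`, `run_two_workers`, `counter` and `counter_lock` are hypothetical. The version shown holds a mutex around every access to the shared counter. Removing the lock/unlock pair would leave two threads updating `counter` without synchronisation, which is the kind of access the new text describes.

```c
#include <pthread.h>
#include <stddef.h>

#define ITERATIONS 100000

/* Shared state (hypothetical names, for illustration only). */
static long counter = 0;
static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;

/* Each worker increments the shared counter while holding the lock.
   Without the lock/unlock pair, the increment would be an
   unsynchronised read-modify-write on shared memory: a data race. */
static void *worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < ITERATIONS; i++) {
        pthread_mutex_lock(&counter_lock);
        counter++;
        pthread_mutex_unlock(&counter_lock);
    }
    return NULL;
}

/* Runs two workers to completion and returns the final counter value,
   or -1 if a thread could not be created or joined. */
long run_two_workers(void)
{
    pthread_t a, b;

    counter = 0;
    if (pthread_create(&a, NULL, worker, NULL) != 0)
        return -1;
    if (pthread_create(&b, NULL, worker, NULL) != 0) {
        pthread_join(a, NULL);
        return -1;
    }
    if (pthread_join(a, NULL) != 0 || pthread_join(b, NULL) != 0)
        return -1;
    return counter;
}
```

With the lock in place, every increment is serialised, so the result is deterministic. Calling `run_two_workers()` from a `main` function and linking with the platform's pthreads library should be enough to try it.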
--- a/docs/xml/quick-start-guide.xml
+++ b/docs/xml/quick-start-guide.xml
@@ -32,24 +32,64 @@ memory errors such as:</para>

 <itemizedlist>
  <listitem>
-    <para>touching memory you shouldn't (eg. overrunning heap block
-    boundaries);</para>
+    <para>Touching memory you shouldn't (eg. overrunning heap block
+    boundaries, or reading/writing freed memory).</para>
  </listitem>
  <listitem>
-    <para>using values before they have been initialized;</para>
+    <para>Using values before they have been initialized.</para>
  </listitem>
  <listitem>
-    <para>incorrect freeing of memory, such as double-freeing heap
-    blocks;</para>
+    <para>Incorrect freeing of memory, such as double-freeing heap
+    blocks.</para>
  </listitem>
  <listitem>
-    <para>memory leaks.</para>
+    <para>Memory leaks.</para>
  </listitem>
 </itemizedlist>

+<para>Memcheck is only one of the tools in the Valgrind suite.
+Other tools you may find useful are:</para>
+
+<itemizedlist>
+  <listitem>
+    <para>Cachegrind: a profiling tool which produces detailed data on
+    cache (miss) and branch (misprediction) events.  Statistics are
+    gathered for the entire program, for each function, for each line
+    of code, and even for each instruction, if you need that level of
+    detail.</para>
+  </listitem>
+  <listitem>
+    <para>Callgrind: a heavyweight profiling tool similar to
+    Cachegrind, but which also shows cost relationships across
+    function calls.  Information gathered by Callgrind can be viewed
+    using the KCachegrind GUI.  KCachegrind is not part of the
+    Valgrind suite - it is part of the KDE Desktop Environment.</para>
+  </listitem>
+  <listitem>
+    <para>Massif: a space profiling tool.  It allows you to explore
+    in detail which parts of your program allocate memory.</para>
+  </listitem>
+  <listitem>
+    <para>Helgrind: a debugging tool for threaded programs.  Helgrind
+    looks for various kinds of synchronisation errors in code that uses
+    the POSIX PThreads API.</para>
+  </listitem>
+  <listitem>
+    <para>In addition, there are a number of "experimental" tools in
+    the codebase.  They can be distinguished by the "exp-" prefix on
+    their names.  Experimental tools are not subject to the same
+    quality control standards that apply to our production-grade tools
+    (Memcheck, Cachegrind, Callgrind, Massif and Helgrind).</para>
+  </listitem>
+</itemizedlist>
+
+<para>The rest of this guide discusses only the Memcheck tool.  For
+full documentation on the other tools, see the Valgrind User
+Manual.</para>
+
 <para>What follows is the minimum information you need to start
 detecting memory errors in your program with Memcheck.  Note that this
-guide applies to Valgrind version 2.4.0 and later.  Some of the
+guide applies to Valgrind version 3.3.0 and later.  Some of the
 information is not quite right for earlier versions.</para>

 </sect1>
@@ -162,8 +202,9 @@ Things to notice:
  </listitem>
 </itemizedlist>

-It's worth fixing errors in the order they are reported, as later errors
-can be caused by earlier errors.</para>
+It's worth fixing errors in the order they are reported, as later
+errors can be caused by earlier errors.  Failing to do this is a
+common cause of difficulty with Memcheck.</para>

 <para>Memory leak messages look like this:

@@ -219,6 +260,15 @@ that are allocated statically or on the stack.  But it should detect many
 errors that could crash your program (eg. cause a segmentation
 fault).</para>

+<para>Try to make your program so clean that Memcheck reports no
+errors.  Once you achieve this state, it is much easier to see when
+changes to the program cause Memcheck to report new errors.
+Experience from several years of Memcheck use shows that it is
+possible to make even huge programs run Memcheck-clean.  For example,
+large parts of KDE 3.5.X, and recent versions of OpenOffice.org
+(2.3.0) are Memcheck-clean, or very close to it.</para>
+
+
 </sect1>


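
Note on the quick-start list of Memcheck error classes revised above: the four classes (touching memory you shouldn't, using uninitialised values, incorrect freeing, leaks) can be sketched as one small C function with a selectable fault. This is an illustrative assumption, not an example from the guide; `run_demo` and the `demo_mode` values are hypothetical names. Only `DEMO_CLEAN` is well-defined C. The other modes deliberately contain the bug named in their comment and exist only to be run under Memcheck.

```c
#include <stdlib.h>

/* Hypothetical fault selector; only DEMO_CLEAN is free of bugs. */
enum demo_mode {
    DEMO_CLEAN,        /* no errors expected */
    DEMO_OVERRUN,      /* reads one element past the heap block */
    DEMO_UNINIT,       /* uses values that were never initialised */
    DEMO_DOUBLE_FREE,  /* frees the same heap block twice */
    DEMO_LEAK          /* never frees the heap block */
};

/* Sums a small heap array, injecting the fault selected by 'mode'.
   Returns the sum (10 in the clean case), or -1 if malloc fails. */
int run_demo(enum demo_mode mode)
{
    int n = 4;
    int *block = malloc(n * sizeof *block);
    if (block == NULL)
        return -1;

    /* Skipping this loop leaves the block's contents undefined. */
    if (mode != DEMO_UNINIT) {
        for (int i = 0; i < n; i++)
            block[i] = i + 1;
    }

    /* In DEMO_OVERRUN the loop also reads block[n], past the end. */
    int last = (mode == DEMO_OVERRUN) ? n : n - 1;
    int sum = 0;
    for (int i = 0; i <= last; i++)
        sum += block[i];

    /* Making a decision based on 'sum' uses its value. */
    int result = (sum > 0) ? sum : 0;

    if (mode != DEMO_LEAK)
        free(block);
    if (mode == DEMO_DOUBLE_FREE)
        free(block);   /* deliberate second free of the same block */

    return result;
}
```

To try it, call `run_demo` from a `main` that picks the mode (for instance from a command-line argument), compile with `-g`, and run the program under `valgrind --leak-check=yes`. The compiler may warn about the deliberate double free.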
--- a/docs/xml/tech-docs.xml
+++ b/docs/xml/tech-docs.xml
@@ -17,11 +17,14 @@
  </legalnotice>
 </bookinfo>

-  <xi:include href="../../memcheck/docs/mc-tech-docs.xml" parse="xml"  
+<!--  <xi:include href="../../memcheck/docs/mc-tech-docs.xml" parse="xml"  
      xmlns:xi="http://www.w3.org/2001/XInclude" />
-  <xi:include href="../../callgrind/docs/cl-format.xml" parse="xml"  
+-->
+  <xi:include href="new-tech-docs.xml" parse="xml"  
      xmlns:xi="http://www.w3.org/2001/XInclude" />
  <xi:include href="manual-writing-tools.xml" parse="xml"  
      xmlns:xi="http://www.w3.org/2001/XInclude" />
+  <xi:include href="../../callgrind/docs/cl-format.xml" parse="xml"  
+      xmlns:xi="http://www.w3.org/2001/XInclude" />

 </book>
--- a/docs/xml/vg-entities.xml
+++ b/docs/xml/vg-entities.xml
@@ -2,13 +2,13 @@
 <!ENTITY vg-url        "http://www.valgrind.org/">
 <!ENTITY vg-jemail     "julian@valgrind.org">
 <!ENTITY vg-vemail     "valgrind@valgrind.org">
-<!ENTITY vg-lifespan   "2000-2006">
+<!ENTITY vg-lifespan   "2000-2007">
 <!ENTITY vg-users-list "http://lists.sourceforge.net/lists/listinfo/valgrind-users">

 <!-- valgrind release + version stuff -->
 <!ENTITY rel-type    "Release">
-<!ENTITY rel-version "3.2.0">
-<!ENTITY rel-date    "7 June 2006">
+<!ENTITY rel-version "3.3.0">
+<!ENTITY rel-date    "7 December 2007">

 <!-- where the docs are installed -->
 <!ENTITY vg-doc-path  "/usr/share/doc/valgrind/html/index.html">
--- a/memcheck/docs/mc-manual.xml
+++ b/memcheck/docs/mc-manual.xml
@@ -1287,6 +1287,393 @@ inform Memcheck about changes to the state of a mempool:</para>

 </itemizedlist>

+</sect1>
+
+
+
+
+
+
+
+<sect1 id="mc-manual.mpiwrap" xreflabel="MPI Wrappers">
+<title>Debugging MPI Parallel Programs with Valgrind</title>
+
+<para> Valgrind supports debugging of distributed-memory applications
+which use the MPI message passing standard.  This support consists of a
+library of wrapper functions for the
+<computeroutput>PMPI_*</computeroutput> interface.  When incorporated
+into the application's address space, either by direct linking or by
+<computeroutput>LD_PRELOAD</computeroutput>, the wrappers intercept
+calls to <computeroutput>PMPI_Send</computeroutput>,
+<computeroutput>PMPI_Recv</computeroutput>, etc.  They then
+use client requests to inform Valgrind of memory state changes caused
+by the function being wrapped.  This reduces the number of false
+positives that Memcheck otherwise typically reports for MPI
+applications.</para>
+
+<para>The wrappers also take the opportunity to carefully check
+size and definedness of buffers passed as arguments to MPI functions, hence
+detecting errors such as passing undefined data to
+<computeroutput>PMPI_Send</computeroutput>, or receiving data into a
+buffer which is too small.</para>
+
+<para>Unlike most of the rest of Valgrind, the wrapper library is subject to a
+BSD-style license, so you can link it into any code base you like.
+See the top of <computeroutput>auxprogs/libmpiwrap.c</computeroutput>
+for license details.</para>
+
+
+<sect2 id="mc-manual.mpiwrap.build" xreflabel="Building MPI Wrappers">
+<title>Building and installing the wrappers</title>
+
+<para> The wrapper library will be built automatically if possible.
+Valgrind's configure script will look for a suitable
+<computeroutput>mpicc</computeroutput> to build it with.  This must be
+the same <computeroutput>mpicc</computeroutput> you use to build the
+MPI application you want to debug.  By default, Valgrind tries
+<computeroutput>mpicc</computeroutput>, but you can specify a
+different one by using the configure-time flag
+<computeroutput>--with-mpicc=</computeroutput>.  Currently the
+wrappers are only buildable with
+<computeroutput>mpicc</computeroutput>s which are based on GNU
+<computeroutput>gcc</computeroutput> or Intel's
+<computeroutput>icc</computeroutput>.</para>
+
+<para>Check that the configure script prints a line like this:</para>
+
+<programlisting><![CDATA[
+checking for usable MPI2-compliant mpicc and mpi.h... yes, mpicc
+]]></programlisting>
+
+<para>If it says <computeroutput>... no</computeroutput>, your
+<computeroutput>mpicc</computeroutput> has failed to compile and link
+a test MPI2 program.</para>
+
+<para>If the configure test succeeds, continue in the usual way with
+<computeroutput>make</computeroutput> and <computeroutput>make
+install</computeroutput>.  The final install tree should then contain
+<computeroutput>libmpiwrap.so</computeroutput>.
+</para>
+
+<para>Compile up a test MPI program (eg, MPI hello-world) and try
+this:</para>
+
+<programlisting><![CDATA[
+LD_PRELOAD=$prefix/lib/valgrind/<platform>/libmpiwrap.so   \
+           mpirun [args] $prefix/bin/valgrind ./hello
+]]></programlisting>
+
+<para>You should see something similar to the following:</para>
+
+<programlisting><![CDATA[
+valgrind MPI wrappers 31901: Active for pid 31901
+valgrind MPI wrappers 31901: Try MPIWRAP_DEBUG=help for possible options
+]]></programlisting>
+
+<para>repeated for every process in the group.  If you do not see
+these, there is a build/installation problem of some kind.</para>
+
+<para> The MPI functions to be wrapped are assumed to be in an ELF
+shared object with soname matching
+<computeroutput>libmpi.so*</computeroutput>.  This is known to be
+correct at least for Open MPI and Quadrics MPI, and can easily be
+changed if required.</para> 
+</sect2>
+
+
+<sect2 id="mc-manual.mpiwrap.gettingstarted" 
+       xreflabel="Getting started with MPI Wrappers">
+<title>Getting started</title>
+
+<para>Compile your MPI application as usual, taking care to link it
+using the same <computeroutput>mpicc</computeroutput> that your
+Valgrind build was configured with.</para>
+
+<para>
+Use the following basic scheme to run your application on Valgrind with
+the wrappers engaged:</para>
+
+<programlisting><![CDATA[
+MPIWRAP_DEBUG=[wrapper-args]                                  \
+   LD_PRELOAD=$prefix/lib/valgrind/<platform>/libmpiwrap.so   \
+   mpirun [mpirun-args]                                       \
+   $prefix/bin/valgrind [valgrind-args]                       \
+   [application] [app-args]
+]]></programlisting>
+
+<para>As an alternative to
+<computeroutput>LD_PRELOAD</computeroutput>ing
+<computeroutput>libmpiwrap.so</computeroutput>, you can simply link it
+to your application if desired.  This should not disturb native
+behaviour of your application in any way.</para>
+</sect2>
+
+
+<sect2 id="mc-manual.mpiwrap.controlling" 
+       xreflabel="Controlling the MPI Wrappers">
+<title>Controlling the wrapper library</title>
+
+<para>Environment variable
+<computeroutput>MPIWRAP_DEBUG</computeroutput> is consulted at
+startup.  The default behaviour is to print a starting banner</para>
+
+<programlisting><![CDATA[
+valgrind MPI wrappers 16386: Active for pid 16386
+valgrind MPI wrappers 16386: Try MPIWRAP_DEBUG=help for possible options
+]]></programlisting>
+
+<para> and then be relatively quiet.</para>
+
+<para>You can give a list of comma-separated options in
+<computeroutput>MPIWRAP_DEBUG</computeroutput>.  These are</para>
+
+<itemizedlist>
+  <listitem>
+    <para><computeroutput>verbose</computeroutput>:
+    show entries/exits of all wrappers.  Also show extra
+    debugging info, such as the status of outstanding 
+    <computeroutput>MPI_Request</computeroutput>s resulting
+    from uncompleted <computeroutput>MPI_Irecv</computeroutput>s.</para>
+  </listitem>
+  <listitem>
+    <para><computeroutput>quiet</computeroutput>: 
+    opposite of <computeroutput>verbose</computeroutput>, only print 
+    anything when the wrappers want
+    to report a detected programming error, or in case of catastrophic
+    failure of the wrappers.</para>
+  </listitem>
+  <listitem>
+    <para><computeroutput>warn</computeroutput>: 
+    by default, functions which lack proper wrappers
+    are not commented on, just silently
+    ignored.  This causes a warning to be printed for each unwrapped
+    function used, up to a maximum of three warnings per function.</para>
+  </listitem>
+  <listitem>
+    <para><computeroutput>strict</computeroutput>: 
+    print an error message and abort the program if 
+    a function lacking a wrapper is used.</para>
+  </listitem>
+</itemizedlist>
+
+<para> If you want to use Valgrind's XML output facility
+(<computeroutput>--xml=yes</computeroutput>), you should pass
+<computeroutput>quiet</computeroutput> in
+<computeroutput>MPIWRAP_DEBUG</computeroutput> so as to get rid of any
+extraneous printing from the wrappers.</para>
+
+</sect2>
+
+
+<sect2 id="mc-manual.mpiwrap.limitations" 
+       xreflabel="Abilities and Limitations of MPI Wrappers">
+<title>Abilities and limitations</title>
+
+<sect3 id="mc-manual.mpiwrap.limitations.functions" 
+       xreflabel="Functions">
+<title>Functions</title>
+
+<para>All MPI2 functions except
+<computeroutput>MPI_Wtick</computeroutput>,
+<computeroutput>MPI_Wtime</computeroutput> and
+<computeroutput>MPI_Pcontrol</computeroutput> have wrappers.  The
+first two are not wrapped because they return a 
+<computeroutput>double</computeroutput>, and Valgrind's
+function-wrap mechanism cannot handle that (it could easily enough be
+extended to).  <computeroutput>MPI_Pcontrol</computeroutput> cannot be
+wrapped as it has variable arity: 
+<computeroutput>int MPI_Pcontrol(const int level, ...)</computeroutput></para>
+
+<para>Most functions are wrapped with a default wrapper which does
+nothing except complain or abort if it is called, depending on
+settings in <computeroutput>MPIWRAP_DEBUG</computeroutput> listed
+above.  The following functions have "real", do-something-useful
+wrappers:</para>
+
+<programlisting><![CDATA[
+PMPI_Send PMPI_Bsend PMPI_Ssend PMPI_Rsend
+
+PMPI_Recv PMPI_Get_count
+
+PMPI_Isend PMPI_Ibsend PMPI_Issend PMPI_Irsend
+
+PMPI_Irecv
+PMPI_Wait PMPI_Waitall
+PMPI_Test PMPI_Testall
+
+PMPI_Iprobe PMPI_Probe
+
+PMPI_Cancel
+
+PMPI_Sendrecv
+
+PMPI_Type_commit PMPI_Type_free
+
+PMPI_Pack PMPI_Unpack
+
+PMPI_Bcast PMPI_Gather PMPI_Scatter PMPI_Alltoall
+PMPI_Reduce PMPI_Allreduce PMPI_Op_create
+
+PMPI_Comm_create PMPI_Comm_dup PMPI_Comm_free PMPI_Comm_rank PMPI_Comm_size
+
+PMPI_Error_string
+PMPI_Init PMPI_Initialized PMPI_Finalize
+]]></programlisting>
+
+<para> A few functions such as
+<computeroutput>PMPI_Address</computeroutput> are listed as
+<computeroutput>HAS_NO_WRAPPER</computeroutput>.  They have no wrapper
+at all as there is nothing worth checking, and giving a no-op wrapper
+would reduce performance for no reason.</para>
+
+<para> Note that the wrapper library can itself generate large
+numbers of calls to the MPI implementation, especially when walking
+complex types.  The most common functions called are
+<computeroutput>PMPI_Extent</computeroutput>,
+<computeroutput>PMPI_Type_get_envelope</computeroutput>,
+<computeroutput>PMPI_Type_get_contents</computeroutput>, and
+<computeroutput>PMPI_Type_free</computeroutput>.  </para>
+</sect3>
+
+<sect3 id="mc-manual.mpiwrap.limitations.types" 
+       xreflabel="Types">
+<title>Types</title>
+
+<para> MPI-1.1 structured types are supported, and walked exactly.
+The currently supported combiners are
+<computeroutput>MPI_COMBINER_NAMED</computeroutput>,
+<computeroutput>MPI_COMBINER_CONTIGUOUS</computeroutput>,
+<computeroutput>MPI_COMBINER_VECTOR</computeroutput>,
+<computeroutput>MPI_COMBINER_HVECTOR</computeroutput>,
+<computeroutput>MPI_COMBINER_INDEXED</computeroutput>,
+<computeroutput>MPI_COMBINER_HINDEXED</computeroutput> and
+<computeroutput>MPI_COMBINER_STRUCT</computeroutput>.  This should
+cover all MPI-1.1 types.  The mechanism (function
+<computeroutput>walk_type</computeroutput>) should extend easily to
+cover MPI2 combiners.</para>
+
+<para>MPI defines some named structured types
+(<computeroutput>MPI_FLOAT_INT</computeroutput>,
+<computeroutput>MPI_DOUBLE_INT</computeroutput>,
+<computeroutput>MPI_LONG_INT</computeroutput>,
+<computeroutput>MPI_2INT</computeroutput>,
+<computeroutput>MPI_SHORT_INT</computeroutput>,
+<computeroutput>MPI_LONG_DOUBLE_INT</computeroutput>) which are pairs
+of some basic type and a C <computeroutput>int</computeroutput>.
+Unfortunately the MPI specification makes it impossible to look inside
+these types and see where the fields are.  Therefore these wrappers
+assume the types are laid out as <computeroutput>struct { float val;
+int loc; }</computeroutput> (for
+<computeroutput>MPI_FLOAT_INT</computeroutput>), etc, and act
+accordingly.  This appears to be correct at least for Open MPI 1.0.2
+and for Quadrics MPI.</para>
+
+<para>If <computeroutput>strict</computeroutput> is an option specified 
+in <computeroutput>MPIWRAP_DEBUG</computeroutput>, the application
+will abort if an unhandled type is encountered.  Otherwise, the 
+application will print a warning message and continue.</para>
+
+<para>Some effort is made to mark/check memory ranges corresponding to
+arrays of values in a single pass.  This is important for performance
+since asking Valgrind to mark/check any range, no matter how small,
+carries quite a large constant cost.  This optimisation is applied to
+arrays of primitive types (<computeroutput>double</computeroutput>,
+<computeroutput>float</computeroutput>,
+<computeroutput>int</computeroutput>,
+<computeroutput>long</computeroutput>, <computeroutput>long
+long</computeroutput>, <computeroutput>short</computeroutput>,
+<computeroutput>char</computeroutput>, and <computeroutput>long
+double</computeroutput> on platforms where <computeroutput>sizeof(long
+double) == 8</computeroutput>).  For arrays of all other types, the
+wrappers handle each element individually and so there can be a very
+large performance cost.</para>
+
+</sect3>
+
+</sect2>
+
+
+<sect2 id="mc-manual.mpiwrap.writingwrappers" 
+       xreflabel="Writing new MPI Wrappers">
+<title>Writing new wrappers</title>
+
+<para>
+For the most part the wrappers are straightforward.  The only
+significant complexity arises with nonblocking receives.</para>
+
+<para>The issue is that <computeroutput>MPI_Irecv</computeroutput>
+states the recv buffer and returns immediately, giving a handle
+(<computeroutput>MPI_Request</computeroutput>) for the transaction.
+Later the user will have to poll for completion with
+<computeroutput>MPI_Wait</computeroutput> etc, and when the
+transaction completes successfully, the wrappers have to paint the
+recv buffer.  But the recv buffer details are not presented to
+<computeroutput>MPI_Wait</computeroutput> -- only the handle is.  The
+library therefore maintains a shadow table which associates
+uncompleted <computeroutput>MPI_Request</computeroutput>s with the
+corresponding buffer address/count/type.  When an operation completes,
+the table is searched for the associated address/count/type info, and
+memory is marked accordingly.</para>
+
+<para>Access to the table is guarded by a (POSIX pthreads) lock, so as
+to make the library thread-safe.</para>
+
+<para>The table is allocated with
+<computeroutput>malloc</computeroutput> and never
+<computeroutput>free</computeroutput>d, so it will show up in leak
+checks.</para>
+
+<para>Writing new wrappers should be fairly easy.  The source file is
+<computeroutput>auxprogs/libmpiwrap.c</computeroutput>.  If possible,
+find an existing wrapper for a function of similar behaviour to the
+one you want to wrap, and use it as a starting point.  The wrappers
+are organised in sections in the same order as the MPI 1.1 spec, to
+aid navigation.  When adding a wrapper, remember to comment out the
+definition of the default wrapper in the long list of defaults at the
+bottom of the file (do not remove it, just comment it out).</para>
+</sect2>
+
+<sect2 id="mc-manual.mpiwrap.whattoexpect" 
+       xreflabel="What to expect with MPI Wrappers">
+<title>What to expect when using the wrappers</title>
+
+<para>The wrappers should reduce Memcheck's false-error rate on MPI
+applications.  Because the wrapping is done at the MPI interface,
+there will still potentially be a large number of errors reported in
+the MPI implementation below the interface.  The best you can do is
+try to suppress them.</para>
+
+<para>You may also find that the input-side (buffer
+length/definedness) checks find errors in your MPI use, for example
+passing too short a buffer to
+<computeroutput>MPI_Recv</computeroutput>.</para>
+
+<para>Functions which are not wrapped may increase the false
+error rate.  A possible approach is to run with
+<computeroutput>MPIWRAP_DEBUG</computeroutput> containing
+<computeroutput>warn</computeroutput>.  This will show you functions
+which lack proper wrappers but which are nevertheless used.  You can
+then write wrappers for them.
+</para>
+
+<para>A known source of potential false errors is the
+<computeroutput>PMPI_Reduce</computeroutput> family of functions, when
+using a custom (user-defined) reduction function.  In a reduction
+operation, each node notionally sends data to a "central point" which
+uses the specified reduction function to merge the data items into a
+single item.  Hence, in general, data is passed between nodes and fed
+to the reduction function, but the wrapper library cannot mark the
+transferred data as initialised before it is handed to the reduction
+function, because all that happens "inside" the
+<computeroutput>PMPI_Reduce</computeroutput> call.  As a result you
+may see false positives reported in your reduction function.</para>
+
+</sect2>

 </sect1>
+
+
+
+
+
 </chapter>
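
Note on the "Writing new wrappers" section added to mc-manual.xml above: the shadow table it describes (uncompleted requests mapped to buffer address/count/type, guarded by a pthreads lock) might look roughly like the following C sketch. This is not the code in auxprogs/libmpiwrap.c. Every name here (`shadow_entry`, `shadow_add`, `shadow_take`) is hypothetical, plain `long` and `int` stand in for `MPI_Request` and `MPI_Datatype` so the sketch builds without an MPI installation, and growing the table with `realloc` is an assumption of the sketch.

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical record for one outstanding nonblocking receive. */
typedef struct {
    long  request;   /* stand-in for an MPI_Request handle */
    void *buf;       /* receive buffer address */
    int   count;     /* element count */
    int   datatype;  /* stand-in for an MPI_Datatype */
    int   in_use;    /* nonzero while the request is uncompleted */
} shadow_entry;

static shadow_entry *table = NULL;
static size_t table_size = 0;
static pthread_mutex_t table_lock = PTHREAD_MUTEX_INITIALIZER;

/* Record an uncompleted request (what an Irecv wrapper would do).
   Returns 0 on success, -1 if the table could not be grown. */
int shadow_add(long request, void *buf, int count, int datatype)
{
    int rc = 0;
    size_t i;

    pthread_mutex_lock(&table_lock);
    for (i = 0; i < table_size; i++)
        if (!table[i].in_use)
            break;
    if (i == table_size) {
        /* No free slot: grow the table.  It is never freed. */
        size_t new_size = table_size ? 2 * table_size : 8;
        shadow_entry *grown = realloc(table, new_size * sizeof *grown);
        if (grown == NULL) {
            rc = -1;
        } else {
            for (size_t j = table_size; j < new_size; j++)
                grown[j].in_use = 0;
            table = grown;
            table_size = new_size;
        }
    }
    if (rc == 0) {
        table[i].request  = request;
        table[i].buf      = buf;
        table[i].count    = count;
        table[i].datatype = datatype;
        table[i].in_use   = 1;
    }
    pthread_mutex_unlock(&table_lock);
    return rc;
}

/* On completion (what a Wait/Test wrapper would do): look up the
   request, copy its details to *out and release the slot.
   Returns 1 if the request was found, 0 otherwise. */
int shadow_take(long request, shadow_entry *out)
{
    int found = 0;

    pthread_mutex_lock(&table_lock);
    for (size_t i = 0; i < table_size; i++) {
        if (table[i].in_use && table[i].request == request) {
            *out = table[i];
            table[i].in_use = 0;
            found = 1;
            break;
        }
    }
    pthread_mutex_unlock(&table_lock);
    return found;
}
```

A wrapper following this scheme would call `shadow_add` when the nonblocking receive is posted and `shadow_take` once the request completes, then mark the recorded buffer range as defined. A linear search keeps the sketch short; it says nothing about how the real library searches its table.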