diff --git a/addrcheck/docs/ac-manual.xml b/addrcheck/docs/ac-manual.xml index ff3b4ffdd..8ba188e17 100644 --- a/addrcheck/docs/ac-manual.xml +++ b/addrcheck/docs/ac-manual.xml @@ -2,25 +2,24 @@ + Addrcheck: a lightweight memory checker To use this tool, you must specify ---tool=addrcheck on the Valgrind -command line. + on the Valgrind command line. -Note: Addrcheck does not work in Valgrind 3.1.0. We may -reinstate it in later releases. +Note: Addrcheck does not work in Valgrind 3.1.0. We may reinstate +it in later releases. Kinds of bugs that Addrcheck can find -Addrcheck is a simplified version of the Memcheck tool -described in Section 3. It is identical in every way to -Memcheck, except for one important detail: it does not do the -undefined-value checks that Memcheck does. This means Addrcheck -is faster than Memcheck, and uses less memory. -Addrcheck can detect the following errors: +Addrcheck is a simplified version of the Memcheck tool described +in Section 3. It is identical in every way to Memcheck, except for one +important detail: it does not do the undefined-value checks that +Memcheck does. This means Addrcheck is faster than Memcheck, and uses +less memory. Addrcheck can detect the following errors: @@ -33,7 +32,7 @@ Addrcheck can detect the following errors: Reading/writing inappropriate areas on the stack - Memory leaks -- where pointers to malloc'd blocks are lost + Memory leaks - where pointers to malloc'd blocks are lost forever @@ -48,61 +47,56 @@ Addrcheck can detect the following errors: -Rather than duplicate much of the Memcheck docs here, -users of Addrcheck are advised to read . -Some important points: +Rather than duplicate much of the Memcheck docs here, users of +Addrcheck are advised to read . Some +important points: Addrcheck is exactly like Memcheck, except that all the - value-definedness tracking machinery has been removed. - Therefore, the Memcheck documentation which discusses - definedess ("V-bits") is irrelevant. The stuff on - addressibility ("A-bits") is still relevant. + value-definedness tracking machinery has been removed. Therefore, + the Memcheck documentation which discusses definedess ("V-bits") is + irrelevant. The stuff on addressibility ("A-bits") is still + relevant. - Addrcheck accepts the same command-line flags as - Memcheck, with the exception of ... (to be filled in). + Addrcheck accepts the same command-line flags as Memcheck, + with the exception of ... (to be filled in). Like Memcheck, Addrcheck will do memory leak checking - (internally, the same code does leak checking for both - tools). The only difference is how the two tools decide - which memory locations to consider when searching for - pointers to blocks. Memcheck will only consider 4-byte - aligned locations which are validly addressible and which - hold defined values. Addrcheck does not track definedness - and so cannot apply the last, "defined value", - criteria. + (internally, the same code does leak checking for both tools). The + only difference is how the two tools decide which memory locations + to consider when searching for pointers to blocks. Memcheck will + only consider 4-byte aligned locations which are validly addressible + and which hold defined values. Addrcheck does not track definedness + and so cannot apply the last, "defined value", criteria. - The result is that Addrcheck's leak checker may - "discover" pointers to blocks that Memcheck would not. 
So it - is possible that Memcheck could (correctly) conclude that a - block is leaked, yet Addrcheck would not conclude - that. + The result is that Addrcheck's leak checker may "discover" + pointers to blocks that Memcheck would not. So it is possible that + Memcheck could (correctly) conclude that a block is leaked, yet + Addrcheck would not conclude that. - Whether or not this has any effect in practice is - unknown. I suspect not, but that is mere speculation at this - stage. + Whether or not this has any effect in practice is unknown. I + suspect not, but that is mere speculation at this stage. -Addrcheck is, therefore, a fine-grained address checker. -All it really does is check each memory reference to say whether -or not that location may validly be addressed. Addrcheck has a -memory overhead of one bit per byte of used address space. In -contrast, Memcheck has an overhead of nine bits per byte. +Addrcheck is, therefore, a fine-grained address checker. All it +really does is check each memory reference to say whether or not that +location may validly be addressed. Addrcheck has a memory overhead of +one bit per byte of used address space. In contrast, Memcheck has an +overhead of nine bits per byte. -Addrcheck is quite pleasant to use. It's faster than -Memcheck, and the lack of valid-value checks has another side -effect: the errors it does report are relatively easy to track -down, compared to the tedious and often confusing search -sometimes needed to find the cause of uninitialised-value errors -reported by Memcheck. +Addrcheck is quite pleasant to use. It's faster than Memcheck, +and the lack of valid-value checks has another side effect: the errors +it does report are relatively easy to track down, compared to the +tedious and often confusing search sometimes needed to find the cause of +uninitialised-value errors reported by Memcheck. diff --git a/cachegrind/docs/cg-manual.xml b/cachegrind/docs/cg-manual.xml index 5b3f47968..2e49b147d 100644 --- a/cachegrind/docs/cg-manual.xml +++ b/cachegrind/docs/cg-manual.xml @@ -2,6 +2,7 @@ + Cachegrind: a cache profiler @@ -302,28 +303,54 @@ programs that spawn child processes. Cachegrind options + +Manually specifies the I1/D1/L2 cache +configuration, where size and +line_size are measured in bytes. The three items +must be comma-separated, but with no spaces, eg: + valgrind --tool=cachegrind --I1=65535,2,64 + +You can specify one, two or three of the I1/D1/L2 caches. Any level not +manually specified will be simulated using the configuration found in +the normal way (via the CPUID instruction for automagic cache +configuration, or failing that, via defaults). + Cache-simulation specific options are: -,, ---D1=,, ---L2=,, + -[default: uses CPUID for automagic cache configuration]]]> + + + + + + Specify the size, associativity and line size of the level 1 + instruction cache. + + -Manually specifies the I1/D1/L2 cache configuration, where -size and -line_size are measured in bytes. -The three items must be comma-separated, but with no spaces, -eg: + + + + + + Specify the size, associativity and line size of the level 1 + data cache. + + - + + + + + + Specify the size, associativity and line size of the level 2 + cache. + + -You can specify one, two or three of the I1/D1/L2 caches. -Any level not manually specified will be simulated using the -configuration found in the normal way (via the CPUID instruction, -or failing that, via defaults). + + @@ -338,10 +365,10 @@ wide if possible, as the output lines can be quite long. 
To get a function-by-function summary, run cg_annotate --pid in a directory -containing a cachegrind.out.pid -file. The --pid is required so that -cg_annotate knows which log file -to use when several are present. +containing a cachegrind.out.pid file. The +--pid is required so that +cg_annotate knows which log file to use +when several are present. The output looks like this: @@ -501,8 +528,7 @@ Ir I1mr I2mr Dr D1mr D2mr Dw D1mw D2mw file:function Then follows summary statistics for the whole program. These are similar to the summary provided when running -valgrind ---tool=cachegrind. +valgrind --tool=cachegrind. Then follows function-by-function statistics. Each function is identified by a diff --git a/docs/xml/manual-core.xml b/docs/xml/manual-core.xml index c8257206d..2db084a8a 100644 --- a/docs/xml/manual-core.xml +++ b/docs/xml/manual-core.xml @@ -559,9 +559,10 @@ categories. Basic Options -These options work with all tools. + +These options work with all tools. - + @@ -743,6 +744,7 @@ categories. + @@ -750,10 +752,11 @@ categories. Error-related options -These options are used by all tools that can report -errors, e.g. Memcheck, but not Cachegrind. + +These options are used by all tools +that can report errors, e.g. Memcheck, but not Cachegrind. - + @@ -1011,6 +1014,7 @@ errors, e.g. Memcheck, but not Cachegrind. + @@ -1018,11 +1022,12 @@ errors, e.g. Memcheck, but not Cachegrind. <computeroutput>malloc()</computeroutput>-related Options -For tools that use their own version of + +For tools that use their own version of malloc() (e.g. Memcheck and Addrcheck), the following options apply. - + @@ -1039,6 +1044,7 @@ Addrcheck), the following options apply. + @@ -1046,11 +1052,12 @@ Addrcheck), the following options apply. Uncommon Options -These options apply to all tools, as they affect certain obscure -workings of the Valgrind core. Most people won't need to use -these. + +These options apply to all tools, as they +affect certain obscure workings of the Valgrind core. Most people won't +need to use these. - + @@ -1161,6 +1168,7 @@ these. + @@ -1168,10 +1176,12 @@ these. Debugging Valgrind Options -There are also some options for debugging Valgrind itself. -You shouldn't need to use them in the normal run of things. If you -wish to see the list, use the --help-debug -option. + +There are also some options for debugging +Valgrind itself. You shouldn't need to use them in the normal run of +things. If you wish to see the list, use the + option. + diff --git a/docs/xml/valgrind-manpage.xml b/docs/xml/valgrind-manpage.xml index 4ea3ac074..e86ebb37d 100644 --- a/docs/xml/valgrind-manpage.xml +++ b/docs/xml/valgrind-manpage.xml @@ -112,9 +112,10 @@ leaks. Basic Options -These options work with all tools. + - @@ -124,10 +125,10 @@ leaks. Error-Related Options -These options are used by all tools that can report errors, -e.g. Memcheck, but not Cachegrind. + - @@ -137,11 +138,10 @@ e.g. Memcheck, but not Cachegrind. malloc()-related Options -For tools that use their own version of -malloc() (e.g. Memcheck and Addrcheck), the -following options apply. + - @@ -151,11 +151,11 @@ following options apply. Uncommon Options -These options apply to all tools, as they affect certain obscure -workings of the Valgrind core. Most people won't need to use -these. - + + @@ -165,9 +165,8 @@ these. Debugging Valgrind Options -There are also some options for debugging Valgrind itself. You -shouldn't need to use them in the normal run of things. If you wish to -see the list, use the option. 
+ @@ -176,7 +175,8 @@ see the list, use the option. Memcheck Options - @@ -186,10 +186,12 @@ see the list, use the option. Cachegrind Options - - @@ -199,7 +201,8 @@ see the list, use the option. Massif Options - @@ -209,7 +212,8 @@ see the list, use the option. Helgrind Options - @@ -219,7 +223,8 @@ see the list, use the option. Lackey Options - diff --git a/helgrind/docs/hg-manual.xml b/helgrind/docs/hg-manual.xml index efea0124c..3f4cabb5c 100644 --- a/helgrind/docs/hg-manual.xml +++ b/helgrind/docs/hg-manual.xml @@ -1,19 +1,24 @@ + "http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd"> + Helgrind: a data-race detector -Helgrind is a Valgrind tool for detecting data races in C -and C++ programs that use the Pthreads library. +To use this tool, you must specify +--tool=helgrind on the Valgrind +command line. Note: Helgrind does not work in Valgrind 3.1.0. We hope to reinstate in version 3.2.0. -To use this tool, you must specify ---tool=helgrind on the Valgrind -command line. + + +Data Races + +Helgrind is a valgrind tool for detecting data races in C and C++ +programs that use the Pthreads library. It uses the Eraser algorithm described in: @@ -36,6 +41,12 @@ command line. + + + + +What Helgrind Does + Basically what Helgrind does is to look for memory locations which are accessed by more than one thread. For each such location, Helgrind records which of the program's @@ -55,6 +66,41 @@ can both access the same variable without holding a lock. There's a lot of other sophistication in Helgrind, aimed at reducing the number of false reports, and at producing useful error reports. We hope to have more documentation one -day... +day ... + + + + + + +Helgrind Options + +Helgrind-specific options are: + + + + + + + + + + Assume thread stacks are used privately. + + + + + + + + + Show location of last word access on error. + + + + + + + diff --git a/lackey/docs/lk-manual.xml b/lackey/docs/lk-manual.xml index d97100641..6bec5ea6a 100644 --- a/lackey/docs/lk-manual.xml +++ b/lackey/docs/lk-manual.xml @@ -10,10 +10,14 @@ --tool=lackey on the Valgrind command line. -Lackey is a simple Valgrind tool that does some basic -program measurement. It adds quite a lot of simple -instrumentation to the program's code. It is primarily intended -to be of use as an example tool. + + +Overview + +Lackey is a simple valgrind tool that does some basic program +measurement. It adds quite a lot of simple instrumentation to the +program's code. It is primarily intended to be of use as an example +tool. It measures and reports: @@ -84,4 +88,38 @@ sophisticated job of the instrumentation, but that would undermine its role as a simple example tool. Hence we have chosen not to do so. + + + + +Lackey Options + +Lackey-specific options are: + + + + + + + + + + Count calls to <name>. + + + + + + + + + Count loads, stores and alu ops. + + + + + + + + diff --git a/massif/docs/ms-manual.xml b/massif/docs/ms-manual.xml index e9e0eb99f..e177a58ed 100644 --- a/massif/docs/ms-manual.xml +++ b/massif/docs/ms-manual.xml @@ -1,6 +1,7 @@ + "http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd"> + Massif: a heap profiler @@ -9,6 +10,7 @@ --tool=massif on the Valgrind command line. + Heap profiling @@ -393,73 +395,93 @@ please let us know. - -Massif options + +Massif Options Massif-specific options are: - + + - - --heap=no - --heap=yes [default] - When enabled, profile heap usage in detail. Without - it, the massif.pid.txt or - massif.pid.html will be very - short. - + + + + + + When enabled, profile heap usage in detail. 
Without it, the + massif.pid.txt or + massif.pid.html will be very short. + + - - --heap-admin=n - [default: 8] - The number of admin bytes per block to use. This can - only be an estimate of the average, since it may vary. The - allocator used by glibc - requires somewhere between 4--15 bytes per block, depending - on various factors. It also requires admin space for freed - blocks, although Massif does not count this. - + + + + + + The number of admin bytes per block to use. This can only + be an estimate of the average, since it may vary. The allocator + used by glibc requires somewhere + between 4 to 15 bytes per block, depending on various factors. It + also requires admin space for freed blocks, although + massif does not count this. + + - - --stacks=no - --stacks=yes [default] - When enabled, include stack(s) in the profile. - Threaded programs can have multiple stacks. - + + + + + + When enabled, include stack(s) in the profile. Threaded + programs can have multiple stacks. + + - - --depth=n - [default: 3] - Depth of call chains to present in the detailed heap - information. Increasing it will give more information, but - Massif will run the program more slowly, using more memory, - and produce a bigger .txt / - .hp file. - + + + + + + Depth of call chains to present in the detailed heap + information. Increasing it will give more information, but + massif will run the program more slowly, + using more memory, and produce a bigger + massif.pid.txt or + massif.pid.hp file. + + - - --alloc-fn=name - Specify a function that allocates memory. This is - useful for functions that are wrappers to - malloc(), which can fill up - the context information uselessly (and give very - uninformative bands on the graph). Functions specified will - be ignored in contexts, i.e. treated as though they were - malloc(). This option can - be specified multiple times on the command line, to name - multiple functions. - + + + + + + Specify a function that allocates memory. This is useful + for functions that are wrappers to malloc(), + which can fill up the context information uselessly (and give very + uninformative bands on the graph). Functions specified will be + ignored in contexts, i.e. treated as though they were + malloc(). This option can be specified + multiple times on the command line, to name multiple + functions. + + - - --format=text [default] - --format=html - Produce the detailed heap information in text or HTML - format. The file suffix used will be either - .txt or - .html. - + + + + + + Produce the detailed heap information in text or HTML + format. The file suffix used will be either + .txt or .html. + + - + + + diff --git a/memcheck/docs/mc-manual.xml b/memcheck/docs/mc-manual.xml index ae761485d..1d6f8dacb 100644 --- a/memcheck/docs/mc-manual.xml +++ b/memcheck/docs/mc-manual.xml @@ -1,24 +1,24 @@ + "http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd"> + Memcheck: a heavyweight memory checker -To use this tool, you may specify ---tool=memcheck on the Valgrind -command line. You don't have to, though, since Memcheck is the default -tool. +To use this tool, you may specify +on the Valgrind command line. You don't have to, though, since Memcheck +is the default tool. Kinds of bugs that Memcheck can find -Memcheck is Valgrind's heavyweight memory checking -tool. All reads and writes of memory are checked, and -calls to malloc/new/free/delete are intercepted. As a result, -Memcheck can detect the following problems: +Memcheck is Valgrind's heavyweight memory checking tool. 
All +reads and writes of memory are checked, and calls to +malloc/new/free/delete are intercepted. As a result, Memcheck can detect +the following problems: @@ -34,7 +34,7 @@ Memcheck can detect the following problems: Reading/writing inappropriate areas on the stack - Memory leaks -- where pointers to malloc'd blocks are + Memory leaks - where pointers to malloc'd blocks are lost forever @@ -44,7 +44,7 @@ Memcheck can detect the following problems: Overlapping src and dst pointers in - memcpy() and related + memcpy() and related functions @@ -57,122 +57,137 @@ Memcheck can detect the following problems: xreflabel="Command-line flags specific to Memcheck"> Command-line flags specific to Memcheck - - - --leak-check=no - --leak-check=summary [default] - --leak-check=full - When enabled, search for memory leaks when the client - program finishes. A memory leak means a malloc'd block, - which has not yet been free'd, but to which no pointer can be - found. Such a block can never be free'd by the program, - since no pointer to it exists. If set to - summary, it says how many leaks occurred. - If set to all, it gives details of each - individual leak. + + - + + + + + + When enabled, search for memory leaks when the client + program finishes. A memory leak means a malloc'd block, which has + not yet been free'd, but to which no pointer can be found. Such a + block can never be free'd by the program, since no pointer to it + exists. If set to summary, it says how many + leaks occurred. If set to full or + yes, it gives details of each individual + leak. + + - - --show-reachable=no - [default] - --show-reachable=yes - When disabled, the memory leak detector only shows - blocks for which it cannot find a pointer to at all, or it - can only find a pointer to the middle of. These blocks are - prime candidates for memory leaks. When enabled, the leak - detector also reports on blocks which it could find a pointer - to. Your program could, at least in principle, have freed - such blocks before exit. Contrast this to blocks for which - no pointer, or only an interior pointer could be found: they - are more likely to indicate memory leaks, because you do not - actually have a pointer to the start of the block which you - can hand to free, even if - you wanted to. - + + + + + + When disabled, the memory leak detector only shows blocks + for which it cannot find a pointer to at all, or it can only find + a pointer to the middle of. These blocks are prime candidates for + memory leaks. When enabled, the leak detector also reports on + blocks which it could find a pointer to. Your program could, at + least in principle, have freed such blocks before exit. Contrast + this to blocks for which no pointer, or only an interior pointer + could be found: they are more likely to indicate memory leaks, + because you do not actually have a pointer to the start of the + block which you can hand to free, even if you + wanted to. + + - - --leak-resolution=low - [default] - --leak-resolution=med - --leak-resolution=high - When doing leak checking, determines how willing - Memcheck is to consider different backtraces to be the same. - When set to low, the - default, only the first two entries need match. When - med, four entries have to - match. When high, all - entries need to match. - For hardcore leak debugging, you probably want to use - --leak-resolution=high - together with - --num-callers=40 or some - such large number. 
Note however that this can give an - overwhelming amount of information, which is why the defaults - are 4 callers and low-resolution matching. - Note that the - --leak-resolution= setting - does not affect Memcheck's ability to find leaks. It only - changes how the results are presented. - + + + + + + When doing leak checking, determines how willing + memcheck is to consider different backtraces to + be the same. When set to low, only the first + two entries need match. When med, four entries + have to match. When high, all entries need to + match. - - --freelist-vol=<number> - [default: 5000000] - When the client program releases memory using free (in - C) or delete (C++), - that memory is not immediately made available for - re-allocation. Instead it is marked inaccessible and placed - in a queue of freed blocks. The purpose is to defer - as long as possible the point at which freed-up memory comes back - into circulation. This increases the chance that Memcheck will be - able to detect invalid accesses to blocks for some significant period - of time after they have been freed. - This flag specifies the maximum total size, in bytes, - of the blocks in the queue. The default value is five million - bytes. Increasing this increases the total amount of memory - used by Memcheck but may detect invalid uses of freed blocks - which would otherwise go undetected. - + For hardcore leak debugging, you probably want to use + together with + or some such large number. Note + however that this can give an overwhelming amount of information, + which is why the defaults are 4 callers and low-resolution + matching. - - --workaround-gcc296-bugs=no - [default] - --workaround-gcc296-bugs=yes - When enabled, assume that reads and writes some small - distance below the stack pointer are due to bugs in gcc - 2.96, and does not report them. The "small distance" is 256 - bytes by default. Note that gcc 2.96 is the default compiler - on some older Linux distributions (RedHat 7.X) and so you may need to use - this flag. Do not use it if you do not have to, as it can cause real errors - to be overlooked. A better alternative is to use a more recent gcc/g++ in - which this bug is fixed. - + Note that the setting + does not affect memcheck's ability to find + leaks. It only changes how the results are presented. + + - - --partial-loads-ok=yes - - --partial-loads-ok=no[default] - - Controls how Memcheck handles word-sized, word-aligned loads from - addresses for which some bytes are addressible and others are - not. When yes, - such loads do not elicit an address error. - Instead, the loaded V bytes corresponding to the illegal - addresses indicate Undefined, and those corresponding to - legal addresses are loaded from shadow memory, as usual. - When no(the default), - loads from partially invalid addresses are treated the same as - loads from completely invalid addresses: an illegal-address error - is issued, and the resulting V bytes indicate valid data. - Note that code that behaves in this way is in violation of - the the ISO C/C++ standards, and should be considered broken. - If at all possible, such code should be fixed. This flag should - be used only as a last resort. - - + + + + + + When the client program releases memory using + free (in C) or delete + (C++), that memory is not immediately made + available for re-allocation. Instead, it is marked inaccessible + and placed in a queue of freed blocks. 
The purpose is to defer as + long as possible the point at which freed-up memory comes back + into circulation. This increases the chance that + memcheck will be able to detect invalid + accesses to blocks for some significant period of time after they + have been freed. + + This flag specifies the maximum total size, in bytes, of the + blocks in the queue. The default value is five million bytes. + Increasing this increases the total amount of memory used by + memcheck but may detect invalid uses of freed + blocks which would otherwise go undetected. + + + + + + + + + When enabled, assume that reads and writes some small + distance below the stack pointer are due to bugs in gcc 2.96, and + does not report them. The "small distance" is 256 bytes by + default. Note that gcc 2.96 is the default compiler on some older + Linux distributions (RedHat 7.X) and so you may need to use this + flag. Do not use it if you do not have to, as it can cause real + errors to be overlooked. A better alternative is to use a more + recent gcc/g++ in which this bug is fixed. + + + + + + + + + Controls how memcheck handles word-sized, + word-aligned loads from addresses for which some bytes are + addressible and others are not. When yes, such + loads do not elicit an address error. Instead, the loaded V bytes + corresponding to the illegal addresses indicate Undefined, and + those corresponding to legal addresses are loaded from shadow + memory, as usual. + + When no, loads from partially invalid + addresses are treated the same as loads from completely invalid + addresses: an illegal-address error is issued, and the resulting V + bytes indicate valid data. + + Note that code that behaves in this way is in violation of + the the ISO C/C++ standards, and should be considered broken. If + at all possible, such code should be fixed. This flag should be + used only as a last resort. + + + + + - @@ -180,14 +195,13 @@ Memcheck can detect the following problems: xreflabel="Explanation of error messages from Memcheck"> Explanation of error messages from Memcheck -Despite considerable sophistication under the hood, -Memcheck can only really detect two kinds of errors: use of -illegal addresses, and use of undefined values. Nevertheless, -this is enough to help you discover all sorts of -memory-management nasties in your code. This section presents a -quick summary of what error messages mean. The precise behaviour -of the error-checking machinery is described in . +Despite considerable sophistication under the hood, Memcheck can +only really detect two kinds of errors: use of illegal addresses, and +use of undefined values. Nevertheless, this is enough to help you +discover all sorts of memory-management nasties in your code. This +section presents a quick summary of what error messages mean. The +precise behaviour of the error-checking machinery is described in +. -This happens when your program reads or writes memory at a -place which Memcheck reckons it shouldn't. In this example, the -program did a 4-byte read at address 0xBFFFF0E0, somewhere within -the system-supplied library libpng.so.2.1.0.9, which was called -from somewhere else in the same library, called from line 326 of -qpngio.cpp, and so on. +This happens when your program reads or writes memory at a place +which Memcheck reckons it shouldn't. 
In this example, the program did a +4-byte read at address 0xBFFFF0E0, somewhere within the system-supplied +library libpng.so.2.1.0.9, which was called from somewhere else in the +same library, called from line 326 of qpngio.cpp, +and so on. -Memcheck tries to establish what the illegal address might -relate to, since that's often useful. So, if it points into a -block of memory which has already been freed, you'll be informed -of this, and also where the block was free'd at. Likewise, if it -should turn out to be just off the end of a malloc'd block, a -common result of off-by-one-errors in array subscripting, you'll -be informed of this fact, and also where the block was -malloc'd. +Memcheck tries to establish what the illegal address might relate +to, since that's often useful. So, if it points into a block of memory +which has already been freed, you'll be informed of this, and also where +the block was free'd at. Likewise, if it should turn out to be just off +the end of a malloc'd block, a common result of off-by-one errors in +array subscripting, you'll be informed of this fact, and also where the +block was malloc'd. -In this example, Memcheck can't identify the address. -Actually the address is on the stack, but, for some reason, this -is not a valid stack address -- it is below the stack pointer -and that isn't allowed. In this -particular case it's probably caused by gcc generating invalid +In this example, Memcheck can't identify the address. Actually +the address is on the stack, but, for some reason, this is not a valid +stack address -- it is below the stack pointer and that isn't allowed. +In this particular case it's probably caused by gcc generating invalid code, a known bug in some ancient versions of gcc. -Note that Memcheck only tells you that your program is -about to access memory at an illegal address. It can't stop the -access from happening. So, if your program makes an access which -normally would result in a segmentation fault, you program will -still suffer the same fate -- but you will get a message from -Memcheck immediately prior to this. In this particular example, -reading junk on the stack is non-fatal, and the program stays -alive. +Note that Memcheck only tells you that your program is about to +access memory at an illegal address. It can't stop the access from +happening. So, if your program makes an access which normally would +result in a segmentation fault, your program will still suffer the same +fate -- but you will get a message from Memcheck immediately prior to +this. In this particular example, reading junk on the stack is +non-fatal, and the program stays alive. @@ -252,11 +263,11 @@ Conditional jump or move depends on uninitialised value(s) by 0x8048472: main (tests/manuel1.c:8) ]]> -An uninitialised-value use error is reported when your -program uses a value which hasn't been initialised -- in other -words, is undefined. Here, the undefined value is used somewhere -inside the printf() machinery of the C library. This error was -reported when running the following small program: +An uninitialised-value use error is reported when your program +uses a value which hasn't been initialised -- in other words, is +undefined. Here, the undefined value is used somewhere inside the +printf() machinery of the C library. This error was reported when +running the following small program: -It is important to understand that your program can copy -around junk (uninitialised) data as much as it likes.
-Memcheck observes this and keeps track of the data, but does not -complain. A complaint is issued only when your program attempts -to make use of uninitialised data. In this example, x is -uninitialised. Memcheck observes the value being passed to -_IO_printf and thence to +It is important to understand that your program can copy around +junk (uninitialised) data as much as it likes. Memcheck observes this +and keeps track of the data, but does not complain. A complaint is +issued only when your program attempts to make use of uninitialised +data. In this example, x is uninitialised. Memcheck observes the value +being passed to _IO_printf and thence to _IO_vfprintf, but makes no comment. However, -_IO_vfprintf has to examine the value of x so it can turn it into -the corresponding ASCII string, and it is at this point that -Memcheck complains. +_IO_vfprintf has to examine the value of x so it can turn it into the +corresponding ASCII string, and it is at this point that Memcheck +complains. Sources of uninitialised data tend to be: - Local variables in procedures which have not been - initialised, as in the example above. + Local variables in procedures which have not been initialised, + as in the example above. - The contents of malloc'd blocks, before you write - something there. In C++, the new operator is a wrapper round - malloc, so if you create an object with new, its fields will - be uninitialised until you (or the constructor) fill them in, - which is only Right and Proper. + The contents of malloc'd blocks, before you write something + there. In C++, the new operator is a wrapper round malloc, so if + you create an object with new, its fields will be uninitialised + until you (or the constructor) fill them in, which is only Right and + Proper. @@ -308,14 +318,13 @@ Invalid free() by 0x80484C7: main (tests/doublefree.c:10) ]]> -Memcheck keeps track of the blocks allocated by your -program with malloc/new, so it can know exactly whether or not -the argument to free/delete is legitimate or not. Here, this -test program has freed the same block twice. As with the illegal -read/write errors, Memcheck attempts to make sense of the address -free'd. If, as here, the address is one which has previously -been freed, you wil be told that -- making duplicate frees of the -same block easy to spot. +Memcheck keeps track of the blocks allocated by your program with +malloc/new, so it can know exactly whether or not the argument to +free/delete is legitimate. Here, this test program has freed the +same block twice. As with the illegal read/write errors, Memcheck +attempts to make sense of the address free'd. If, as here, the address +is one which has previously been freed, you will be told that -- making +duplicate frees of the same block easy to spot. @@ -327,8 +336,8 @@ function"> function In the following example, a block allocated with -new[] has wrongly been -deallocated with free: +new[] has wrongly been deallocated with +free: -In C++ it's important to deallocate -memory in a way compatible with how it was allocated. The deal -is: +In C++ it's important to deallocate memory in a +way compatible with how it was allocated. The deal is: If allocated with - malloc, - calloc, - realloc, - valloc or - memalign, you must - deallocate with free. + malloc, + calloc, + realloc, + valloc or + memalign, you must + deallocate with free. - If allocated with - new[], you must deallocate - with delete[]. + If allocated with new[], you must + deallocate with delete[].
- If allocated with new, - you must deallocate with - delete. + If allocated with new, you must deallocate + with delete. -The worst thing is that on Linux apparently it doesn't -matter if you do muddle these up, and it all seems to work ok, -but the same program may then crash on a different platform, -Solaris for example. So it's best to fix it properly. According -to the KDE folks "it's amazing how many C++ programmers don't -know this". +The worst thing is that on Linux apparently it doesn't matter if +you do muddle these up, and it all seems to work ok, but the same +program may then crash on a different platform, Solaris for example. So +it's best to fix it properly. According to the KDE folks "it's amazing +how many C++ programmers don't know this". Pascal Massimino adds the following clarification: -delete[] must be used for -objects allocated by new[] because -the compiler stores the size of the array and the -pointer-to-member to the destructor of the array's content just -before the pointer actually returned. This implies a -variable-sized overhead in what's returned by -new or -new[]. +delete[] must be used for objects allocated by +new[] because the compiler stores the size of the +array and the pointer-to-member to the destructor of the array's content +just before the pointer actually returned. This implies a +variable-sized overhead in what's returned by new +or new[]. + @@ -395,19 +399,24 @@ permissions Memcheck checks all parameters to system calls: - It checks all the direct parameters - themselves. - Also, if a system call needs to read from a buffer provided - by your program, Memcheck checks that the entire buffer is addressible and - has valid data, ie, it is readable. - Also, if the system call needs to write to a user-supplied - buffer, Memcheck checks that the buffer is addressible. + + It checks all the direct parameters themselves. + + + Also, if a system call needs to read from a buffer provided by + your program, Memcheck checks that the entire buffer is addressible + and has valid data, ie, it is readable. + + + Also, if the system call needs to write to a user-supplied + buffer, Memcheck checks that the buffer is addressible. + After the system call, Memcheck updates its tracked information to -precisely reflect any changes in memory permissions caused by the system call. - +precisely reflect any changes in memory permissions caused by the system +call. Here's an example of two system calls with invalid parameters: -... because the program has (a) tried to write uninitialised junk from -the malloc'd block to the standard output, and (b) passed an uninitialised -value to exit. Note that the first error -refers to the memory pointed to by buf (not -buf itself), but the second error refers to -the argument error_code itself. +... because the program has (a) tried to write uninitialised junk +from the malloc'd block to the standard output, and (b) passed an +uninitialised value to exit. Note that the first +error refers to the memory pointed to by +buf (not +buf itself), but the second error +refers to the argument error_code +itself. @@ -453,15 +464,14 @@ the argument error_code itself. The following C library functions copy some data from one memory block to another (or something similar): -memcpy(), -strcpy(), -strncpy(), -strcat(), -strncat(). -The blocks pointed to by their -src and -dst pointers aren't allowed to -overlap. Memcheck checks for this. +memcpy(), +strcpy(), +strncpy(), +strcat(), +strncat(). 
+The blocks pointed to by their src and +dst pointers aren't allowed to overlap. +Memcheck checks for this. For example: ==27492== ]]> -You don't want the two blocks to overlap because one of -them could get partially trashed by the copying. +You don't want the two blocks to overlap because one of them could +get partially trashed by the copying. You might think that Memcheck is being overly pedantic reporting -this in the case where dst is less -than src. For example, the obvious way -to implement memcpy() is by copying -from the first byte to the last. However, the optimisation guides of -some architectures recommend copying from the last byte down to the first. -Also, some implementations of memcpy() -zero dst before copying, because zeroing -the destination's cache line(s) can improve performance. +this in the case where dst is less than +src. For example, the obvious way to +implement memcpy() is by copying from the first +byte to the last. However, the optimisation guides of some +architectures recommend copying from the last byte down to the first. +Also, some implementations of memcpy() zero +dst before copying, because zeroing the +destination's cache line(s) can improve performance. -The moral of the story is: if you want to write truly portable code, -don't make any assumptions about the language implementation. +The moral of the story is: if you want to write truly portable +code, don't make any assumptions about the language +implementation. @@ -493,54 +504,51 @@ don't make any assumptions about the language implementation. Memory leak detection -Memcheck keeps track of all memory blocks issued in -response to calls to malloc/calloc/realloc/new. So when the -program exits, it knows which blocks have not been freed. +Memcheck keeps track of all memory blocks issued in response to +calls to malloc/calloc/realloc/new. So when the program exits, it knows +which blocks have not been freed. -If --leak-check is set -appropriately, for each remaining block, Memcheck scans the entire -address space of the process, looking for pointers to the block. -Each block fits into one of the three following categories. +If is set appropriately, for each +remaining block, Memcheck scans the entire address space of the process, +looking for pointers to the block. Each block fits into one of the +three following categories. - Still reachable: A pointer to the start - of the block is found. This usually indicates programming - sloppiness. Since the block is still pointed at, the - programmer could, at least in principle, free it before - program exit. Because these are very common and arguably + Still reachable: A pointer to the start of the block is found. + This usually indicates programming sloppiness. Since the block is + still pointed at, the programmer could, at least in principle, free + it before program exit. Because these are very common and arguably not a problem, Memcheck won't report such blocks unless - --show-reachable=yes is - specified. + is specified. - Possibly lost, or "dubious": A pointer to the - interior of the block is found. The pointer might originally - have pointed to the start and have been moved along, or it - might be entirely unrelated. Memcheck deems such a block as - "dubious", because it's unclear whether or not a pointer to it - still exists. + Possibly lost, or "dubious": A pointer to the interior of the + block is found. The pointer might originally have pointed to the + start and have been moved along, or it might be entirely unrelated. 
+ Memcheck deems such a block as "dubious", because it's unclear + whether or not a pointer to it still exists. - Definitely lost, or "leaked": The worst - outcome is that no pointer to the block can be found. The - block is classified as "leaked", because the programmer could - not possibly have freed it at program exit, since no pointer - to it exists. This is likely a symptom of having lost the - pointer at some earlier point in the program. + Definitely lost, or "leaked": The worst outcome is that no + pointer to the block can be found. The block is classified as + "leaked", because the programmer could not possibly have freed it at + program exit, since no pointer to it exists. This is likely a + symptom of having lost the pointer at some earlier point in the + program. -For each block mentioned, Memcheck will also tell you where -the block was allocated. It cannot tell you how or why the -pointer to a leaked block has been lost; you have to work that -out for yourself. In general, you should attempt to ensure your -programs do not have any leaked or dubious blocks at exit. +For each block mentioned, Memcheck will also tell you where the +block was allocated. It cannot tell you how or why the pointer to a +leaked block has been lost; you have to work that out for yourself. In +general, you should attempt to ensure your programs do not have any +leaked or dubious blocks at exit. For example: by 0x........: main (leak-tree.c:25) ]]> -The first message describes a simple case of a single 8 byte -block that has been definitely lost. The second case -mentions both "direct" and "indirect" leaks. The distinction is -that a direct leak is a block which has no pointers to it. An -indirect leak is a block which is only pointed to by other leaked -blocks. Both kinds of leak are bad. +The first message describes a simple case of a single 8 byte block +that has been definitely lost. The second case mentions both "direct" +and "indirect" leaks. The distinction is that a direct leak is a block +which has no pointers to it. An indirect leak is a block which is only +pointed to by other leaked blocks. Both kinds of leak are bad. -The precise area of memory in which Memcheck searches for -pointers is: all naturally-aligned machine-word-sized words for which all A -bits indicate addressibility and all V bits indicated that the -stored value is actually valid. +The precise area of memory in which Memcheck searches for pointers +is: all naturally-aligned machine-word-sized words for which all A bits +indicate addressibility and all V bits indicated that the stored value +is actually valid. @@ -592,70 +599,68 @@ Memcheck,Addrcheck:suppression_type]]> - Value1, - Value2, - Value4, - Value8, - Value16, + Value1, + Value2, + Value4, + Value8, + Value16, meaning an uninitialised-value error when using a value of 1, 2, 4, 8 or 16 bytes. - Or: Cond (or its old - name, Value0), meaning use + Or: Cond (or its old + name, Value0), meaning use of an uninitialised CPU condition code. - Or: Addr1, - Addr2, - Addr4, - Addr8, - Addr16, + Or: Addr1, + Addr2, + Addr4, + Addr8, + Addr16, meaning an invalid address during a memory access of 1, 2, 4, 8 or 16 bytes respectively. - Or: Param, meaning an + Or: Param, meaning an invalid system call parameter error. - Or: Free, meaning an + Or: Free, meaning an invalid or mismatching free. - Or: Overlap, meaning a + Or: Overlap, meaning a src / dst overlap in - memcpy() or a similar - function. + memcpy() or a similar function. 
- Or: Leak, meaning + Or: Leak, meaning a memory leak. -The extra information line: for Param errors, is the name -of the offending system call parameter. No other error kinds -have this extra line. +The extra information line: for Param errors, is the name of the +offending system call parameter. No other error kinds have this extra +line. -The first line of the calling context: for Value and Addr -errors, it is either the name of the function in which the error -occurred, or, failing that, the full path of the .so file or -executable containing the error location. For Free errors, is -the name of the function doing the freeing (eg, -free, -__builtin_vec_delete, etc). For -Overlap errors, is the name of the function with the overlapping -arguments (eg. memcpy(), -strcpy(), etc). +The first line of the calling context: for Value and Addr errors, +it is either the name of the function in which the error occurred, or, +failing that, the full path of the .so file or executable containing the +error location. For Free errors, is the name of the function doing the +freeing (eg, free, +__builtin_vec_delete, etc). For Overlap errors, is +the name of the function with the overlapping arguments (eg. +memcpy(), strcpy(), +etc). Lastly, there's the rest of the calling context. @@ -674,35 +679,32 @@ what and how Memcheck is checking. Valid-value (V) bits -It is simplest to think of Memcheck implementing a -synthetic CPU which is identical to a real CPU, except -for one crucial detail. Every bit (literally) of data processed, -stored and handled by the real CPU has, in the synthetic CPU, an -associated "valid-value" bit, which says whether or not the -accompanying bit has a legitimate value. In the discussions -which follow, this bit is referred to as the V (valid-value) +It is simplest to think of Memcheck implementing a synthetic CPU +which is identical to a real CPU, except for one crucial detail. Every +bit (literally) of data processed, stored and handled by the real CPU +has, in the synthetic CPU, an associated "valid-value" bit, which says +whether or not the accompanying bit has a legitimate value. In the +discussions which follow, this bit is referred to as the V (valid-value) bit. -Each byte in the system therefore has a 8 V bits which -follow it wherever it goes. For example, when the CPU loads a -word-size item (4 bytes) from memory, it also loads the -corresponding 32 V bits from a bitmap which stores the V bits for -the process' entire address space. If the CPU should later write -the whole or some part of that value to memory at a different -address, the relevant V bits will be stored back in the V-bit -bitmap. +Each byte in the system therefore has a 8 V bits which follow it +wherever it goes. For example, when the CPU loads a word-size item (4 +bytes) from memory, it also loads the corresponding 32 V bits from a +bitmap which stores the V bits for the process' entire address space. +If the CPU should later write the whole or some part of that value to +memory at a different address, the relevant V bits will be stored back +in the V-bit bitmap. -In short, each bit in the system has an associated V bit, -which follows it around everywhere, even inside the CPU. Yes, -all the CPU's registers (integer, floating point, vector and condition -registers) have their own V bit vectors. +In short, each bit in the system has an associated V bit, which +follows it around everywhere, even inside the CPU. 
Yes, all the CPU's +registers (integer, floating point, vector and condition registers) have +their own V bit vectors. -Copying values around does not cause Memcheck to check for, -or report on, errors. However, when a value is used in a way -which might conceivably affect the outcome of your program's -computation, the associated V bits are immediately checked. If -any of these indicate that the value is undefined, an error is -reported. +Copying values around does not cause Memcheck to check for, or +report on, errors. However, when a value is used in a way which might +conceivably affect the outcome of your program's computation, the +associated V bits are immediately checked. If any of these indicate +that the value is undefined, an error is reported. Here's an (admittedly nonsensical) example: -Memcheck emits no complaints about this, since it merely -copies uninitialised values from -a[] into -b[], and doesn't use them in any -way. However, if the loop is changed to: +Memcheck emits no complaints about this, since it merely copies +uninitialised values from a[] into +b[], and doesn't use them in any way. However, if +the loop is changed to: then Valgrind will complain, at the -if, that the condition depends -on uninitialised values. Note that it doesn't -complain at the j += a[i];, -since at that point the undefinedness is not "observable". It's -only when a decision has to be made as to whether or not to do -the printf -- an observable -action of your program -- that Memcheck complains. +if, that the condition depends on +uninitialised values. Note that it doesn't complain +at the j += a[i];, since at that point the +undefinedness is not "observable". It's only when a decision has to be +made as to whether or not to do the printf -- an +observable action of your program -- that Memcheck complains. -Most low level operations, such as adds, cause Memcheck to -use the V bits for the operands to calculate the V bits for the result. -Even if the result is partially or wholly undefined, it does not +Most low level operations, such as adds, cause Memcheck to use the +V bits for the operands to calculate the V bits for the result. Even if +the result is partially or wholly undefined, it does not complain. -Checks on definedness only occur in three places: when a -value is used to generate a memory address, when control -flow decision needs to be made, and when a system call is -detected, Valgrind checks definedness of parameters as -required. +Checks on definedness only occur in three places: when a value is +used to generate a memory address, when control flow decision needs to +be made, and when a system call is detected, Valgrind checks definedness +of parameters as required. If a check should detect undefinedness, an error message is -issued. The resulting value is subsequently regarded as -well-defined. To do otherwise would give long chains of error -messages. In effect, we say that undefined values are -non-infectious. +issued. The resulting value is subsequently regarded as well-defined. +To do otherwise would give long chains of error messages. In effect, we +say that undefined values are non-infectious. -This sounds overcomplicated. Why not just check all reads -from memory, and complain if an undefined value is loaded into a -CPU register? Well, that doesn't work well, because perfectly -legitimate C programs routinely copy uninitialised values around -in memory, and we don't want endless complaints about that. -Here's the canonical example. 
Consider a struct like -this: +This sounds overcomplicated. Why not just check all reads from +memory, and complain if an undefined value is loaded into a CPU +register? Well, that doesn't work well, because perfectly legitimate C +programs routinely copy uninitialised values around in memory, and we +don't want endless complaints about that. Here's the canonical example. +Consider a struct like this: -The question to ask is: how large is struct -S, in bytes? An -int is 4 bytes and a -char one byte, so perhaps a -struct S occupies 5 bytes? -Wrong. All (non-toy) compilers we know of will round the size of -struct S up to a whole number of -words, in this case 8 bytes. Not doing this forces compilers to -generate truly appalling code for subscripting arrays of -struct S's. +The question to ask is: how large is struct S, +in bytes? An int is 4 bytes and a +char one byte, so perhaps a struct +S occupies 5 bytes? Wrong. All (non-toy) compilers we know +of will round the size of struct S up to a whole +number of words, in this case 8 bytes. Not doing this forces compilers +to generate truly appalling code for subscripting arrays of +struct S's. -So s1 occupies 8 bytes, -yet only 5 of them will be initialised. For the assignment -s2 = s1, gcc generates code to -copy all 8 bytes wholesale into -s2 without regard for their -meaning. If Memcheck simply checked values as they came out of -memory, it would yelp every time a structure assignment like this -happened. So the more complicated semantics described above is -necessary. This allows gcc to copy -s1 into -s2 any way it likes, and a -warning will only be emitted if the uninitialised values are -later used. +So s1 occupies 8 bytes, yet only 5 of them will +be initialised. For the assignment s2 = s1, gcc +generates code to copy all 8 bytes wholesale into s2 +without regard for their meaning. If Memcheck simply checked values as +they came out of memory, it would yelp every time a structure assignment +like this happened. So the more complicated semantics described above +is necessary. This allows gcc to copy +s1 into s2 any way it likes, and a +warning will only be emitted if the uninitialised values are later +used. @@ -798,27 +790,23 @@ later used. Valid-address (A) bits -Notice that the previous subsection describes how the -validity of values is established and maintained without having -to say whether the program does or does not have the right to -access any particular memory location. We now consider the -latter issue. +Notice that the previous subsection describes how the validity of +values is established and maintained without having to say whether the +program does or does not have the right to access any particular memory +location. We now consider the latter issue. -As described above, every bit in memory or in the CPU has -an associated valid-value (V) bit. In -addition, all bytes in memory, but not in the CPU, have an -associated valid-address (A) bit. This -indicates whether or not the program can legitimately read or -write that location. It does not give any indication of the -validity or the data at that location -- that's the job of the -V bits -- only whether or not the location may -be accessed. +As described above, every bit in memory or in the CPU has an +associated valid-value (V) bit. In addition, all bytes in memory, but +not in the CPU, have an associated valid-address (A) bit. This +indicates whether or not the program can legitimately read or write that +location. 
It does not give any indication of the validity or the data +at that location -- that's the job of the V bits -- only whether or not +the location may be accessed. -Every time your program reads or writes memory, Memcheck -checks the A bits associated with the address. -If any of them indicate an invalid address, an error is emitted. -Note that the reads and writes themselves do not change the A -bits, only consult them. +Every time your program reads or writes memory, Memcheck checks +the A bits associated with the address. If any of them indicate an +invalid address, an error is emitted. Note that the reads and writes +themselves do not change the A bits, only consult them. So how do the A bits get set/cleared? Like this: @@ -829,38 +817,36 @@ bits, only consult them. - When the program does malloc/new, the A bits for - exactly the area allocated, and not a byte more, are marked - as accessible. Upon freeing the area the A bits are changed - to indicate inaccessibility. + When the program does malloc/new, the A bits for exactly the + area allocated, and not a byte more, are marked as accessible. Upon + freeing the area the A bits are changed to indicate + inaccessibility. - When the stack pointer register - (SP) moves up or down, - A bits are set. The rule is that the area - from SP up to the base of the stack is - marked as accessible, and below SP is - inaccessible. (If that sounds illogical, bear in mind that - the stack grows down, not up, on almost all Unix systems, - including GNU/Linux.) Tracking SP like - this has the useful side-effect that the section of stack - used by a function for local variables etc is automatically - marked accessible on function entry and inaccessible on - exit. + When the stack pointer register (SP) moves + up or down, A bits are set. The rule is that the area from + SP up to the base of the stack is marked as + accessible, and below SP is inaccessible. (If + that sounds illogical, bear in mind that the stack grows down, not + up, on almost all Unix systems, including GNU/Linux.) Tracking + SP like this has the useful side-effect that the + section of stack used by a function for local variables etc is + automatically marked accessible on function entry and inaccessible + on exit. - When doing system calls, A bits are changed - appropriately. For example, mmap() magically makes files - appear in the process' address space, so the A bits must be - updated if mmap() succeeds. + When doing system calls, A bits are changed appropriately. + For example, mmap() magically makes files appear in the process' + address space, so the A bits must be updated if mmap() + succeeds. - Optionally, your program can tell Valgrind about such - changes explicitly, using the client request mechanism - described above. + Optionally, your program can tell Valgrind about such changes + explicitly, using the client request mechanism described + above. @@ -876,118 +862,108 @@ follows: - Each byte in memory has 8 associated - V (valid-value) bits, saying whether or - not the byte has a defined value, and a single - A (valid-address) bit, saying whether or - not the program currently has the right to read/write that - address. + Each byte in memory has 8 associated V (valid-value) bits, + saying whether or not the byte has a defined value, and a single A + (valid-address) bit, saying whether or not the program currently has + the right to read/write that address. - When memory is read or written, the relevant - A bits are consulted. 
If they indicate an
-      invalid address, Valgrind emits an Invalid read or Invalid
-      write error.
+      When memory is read or written, the relevant A bits are
+      consulted.  If they indicate an invalid address, Valgrind emits an
+      Invalid read or Invalid write error.

-      When memory is read into the CPU's registers,
-      the relevant V bits are fetched from
-      memory and stored in the simulated CPU.  They are not
-      consulted.
+      When memory is read into the CPU's registers, the relevant V
+      bits are fetched from memory and stored in the simulated CPU.  They
+      are not consulted.

-      When a register is written out to memory, the
-      V bits for that register are written back
-      to memory too.
+      When a register is written out to memory, the V bits for that
+      register are written back to memory too.

-      When values in CPU registers are used to
-      generate a memory address, or to determine the outcome of a
-      conditional branch, the V bits for those
-      values are checked, and an error emitted if any of them are
-      undefined.
+      When values in CPU registers are used to generate a memory
+      address, or to determine the outcome of a conditional branch, the V
+      bits for those values are checked, and an error emitted if any of
+      them are undefined.

-      When values in CPU registers are used for any
-      other purpose, Valgrind computes the V bits for the result,
-      but does not check them.
+      When values in CPU registers are used for any other purpose,
+      Valgrind computes the V bits for the result, but does not check
+      them.

-      One the V bits for a value in the
-      CPU have been checked, they are then set to indicate
-      validity.  This avoids long chains of errors.
+      Once the V bits for a value in the CPU have been checked, they
+      are then set to indicate validity.  This avoids long chains of
+      errors.

-      When values are loaded from memory, valgrind checks the
-      A bits for that location and issues an illegal-address
-      warning if needed.  In that case, the V bits loaded are
-      forced to indicate Valid, despite the location being invalid.
-      This apparently strange choice reduces the amount of
-      confusing information presented to the user.  It avoids the
-      unpleasant phenomenon in which memory is read from a place
-      which is both unaddressible and contains invalid values, and,
-      as a result, you get not only an invalid-address (read/write)
-      error, but also a potentially large set of
-      uninitialised-value errors, one for every time the value is
-      used.
-      There is a hazy boundary case to do with multi-byte
-      loads from addresses which are partially valid and partially
-      invalid.  See details of the flag
-      --partial-loads-ok for
-      details.
+      When values are loaded from memory, Valgrind checks the A bits
+      for that location and issues an illegal-address warning if needed.
+      In that case, the V bits loaded are forced to indicate Valid,
+      despite the location being invalid.
+
+      This apparently strange choice reduces the amount of confusing
+      information presented to the user.  It avoids the unpleasant
+      phenomenon in which memory is read from a place which is both
+      unaddressible and contains invalid values, and, as a result, you get
+      not only an invalid-address (read/write) error, but also a
+      potentially large set of uninitialised-value errors, one for every
+      time the value is used.
+
+      There is a hazy boundary case to do with multi-byte loads from
+      addresses which are partially valid and partially invalid.  See the
+      description of the --partial-loads-ok flag for details.
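To make the rules above concrete, here is a small sketch.  It is not
taken from the Valgrind sources or documentation; the file name and
variable names are invented for illustration, but under the semantics
just described it should draw both kinds of complaint:

   /* sketch.c -- illustrative only */
   #include <stdlib.h>

   int main ( void )
   {
      int* p = malloc(4 * sizeof(int)); /* 16 bytes: addressable, but V bits undefined */
      int  j = p[0];    /* undefined value copied into j: no complaint yet           */
      if (j == 42)      /* V bits consulted at the branch: uninitialised-value error */
         return 1;
      j = p[4];         /* one word past the block: A bits give an invalid read      */
      free(p);
      return 0;
   }

The load of p[0] itself passes silently; only the later use of the
copied value in the conditional is reported, which is exactly the
behaviour the struct-copy example earlier is designed to allow.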
+ -Memcheck intercepts calls to malloc, calloc, realloc, -valloc, memalign, free, new, new[], delete and delete[]. -The behaviour you get +Memcheck intercepts calls to malloc, calloc, realloc, valloc, +memalign, free, new, new[], delete and delete[]. The behaviour you get is: - malloc/new/new[]: the returned memory is marked as - addressible but not having valid values. This means you have - to write on it before you can read it. + malloc/new/new[]: the returned memory is marked as addressible + but not having valid values. This means you have to write on it + before you can read it. - calloc: returned memory is marked both addressible and - valid, since calloc() clears the area to zero. + calloc: returned memory is marked both addressible and valid, + since calloc() clears the area to zero. - realloc: if the new size is larger than the old, the - new section is addressible but invalid, as with - malloc. + realloc: if the new size is larger than the old, the new + section is addressible but invalid, as with malloc. - If the new size is smaller, the dropped-off section is - marked as unaddressible. You may only pass to realloc a - pointer previously issued to you by malloc/calloc/realloc. + If the new size is smaller, the dropped-off section is marked + as unaddressible. You may only pass to realloc a pointer previously + issued to you by malloc/calloc/realloc. - free/delete/delete[]: you may only pass to these - functions a pointer - previously issued to you by the corresponding allocation function. - Otherwise, Valgrind complains. If the pointer is - indeed valid, Valgrind marks the entire area it points at as - unaddressible, and places the block in the - freed-blocks-queue. The aim is to defer as long as possible - reallocation of this block. Until that happens, all attempts - to access it will elicit an invalid-address error, as you - would hope. + free/delete/delete[]: you may only pass to these functions a + pointer previously issued to you by the corresponding allocation + function. Otherwise, Valgrind complains. If the pointer is indeed + valid, Valgrind marks the entire area it points at as unaddressible, + and places the block in the freed-blocks-queue. The aim is to defer + as long as possible reallocation of this block. Until that happens, + all attempts to access it will elicit an invalid-address error, as + you would hope. @@ -1008,9 +984,9 @@ arguments. - VALGRIND_MAKE_NOACCESS, - VALGRIND_MAKE_WRITABLE and - VALGRIND_MAKE_READABLE. + VALGRIND_MAKE_NOACCESS, + VALGRIND_MAKE_WRITABLE and + VALGRIND_MAKE_READABLE. These mark address ranges as completely inaccessible, accessible but containing undefined data, and accessible and containing defined data, respectively. Subsequent errors may @@ -1020,66 +996,61 @@ arguments. - VALGRIND_DISCARD: At - some point you may want Valgrind to stop reporting errors in - terms of the blocks defined by the previous three macros. To - do this, the above macros return a small-integer "block - handle". You can pass this block handle to - VALGRIND_DISCARD. After - doing so, Valgrind will no longer be able to relate - addressing errors to the user-defined block associated with - the handle. The permissions settings associated with the - handle remain in place; this just affects how errors are - reported, not whether they are reported. Returns 1 for an - invalid handle and 0 for a valid handle (although passing - invalid handles is harmless). 
Always returns 0 when not run + VALGRIND_DISCARD: At some point you may + want Valgrind to stop reporting errors in terms of the blocks + defined by the previous three macros. To do this, the above macros + return a small-integer "block handle". You can pass this block + handle to VALGRIND_DISCARD. After doing so, + Valgrind will no longer be able to relate addressing errors to the + user-defined block associated with the handle. The permissions + settings associated with the handle remain in place; this just + affects how errors are reported, not whether they are reported. + Returns 1 for an invalid handle and 0 for a valid handle (although + passing invalid handles is harmless). Always returns 0 when not run on Valgrind. - VALGRIND_CHECK_WRITABLE - and VALGRIND_CHECK_READABLE: - check immediately whether or not the given address range has - the relevant property, and if not, print an error message. - Also, for the convenience of the client, returns zero if the - relevant property holds; otherwise, the returned value is the - address of the first byte for which the property is not true. - Always returns 0 when not run on Valgrind. + VALGRIND_CHECK_WRITABLE and + VALGRIND_CHECK_READABLE: check immediately + whether or not the given address range has the relevant property, + and if not, print an error message. Also, for the convenience of + the client, returns zero if the relevant property holds; otherwise, + the returned value is the address of the first byte for which the + property is not true. Always returns 0 when not run on + Valgrind. - VALGRIND_CHECK_DEFINED: - a quick and easy way to find out whether Valgrind thinks a - particular variable (lvalue, to be precise) is addressible - and defined. Prints an error message if not. Returns no - value. + VALGRIND_CHECK_DEFINED: a quick and easy + way to find out whether Valgrind thinks a particular variable + (lvalue, to be precise) is addressible and defined. Prints an error + message if not. Returns no value. - VALGRIND_DO_LEAK_CHECK: - run the memory leak detector right now. Returns no value. I - guess this could be used to incrementally check for leaks - between arbitrary places in the program's execution. - Warning: not properly tested! + VALGRIND_DO_LEAK_CHECK: run the memory leak + detector right now. Returns no value. I guess this could be used + to incrementally check for leaks between arbitrary places in the + program's execution. Warning: not properly tested! - VALGRIND_COUNT_LEAKS: - fills in the four arguments with the number of bytes of - memory found by the previous leak check to be leaked, - dubious, reachable and suppressed. Again, useful in test - harness code, after calling - VALGRIND_DO_LEAK_CHECK. + VALGRIND_COUNT_LEAKS: fills in the four + arguments with the number of bytes of memory found by the previous + leak check to be leaked, dubious, reachable and suppressed. Again, + useful in test harness code, after calling + VALGRIND_DO_LEAK_CHECK. - VALGRIND_GET_VBITS and - VALGRIND_SET_VBITS: allow - you to get and set the V (validity) bits for an address - range. You should probably only set V bits that you have got - with VALGRIND_GET_VBITS. - Only for those who really know what they are doing. Note: currently - disabled in Valgrind 3.1.0. + VALGRIND_GET_VBITS and + VALGRIND_SET_VBITS: allow you to get and set the + V (validity) bits for an address range. You should probably only + set V bits that you have got with + VALGRIND_GET_VBITS. Only for those who really + know what they are doing. 
Note: currently disabled in Valgrind + 3.1.0. diff --git a/memcheck/docs/mc-tech-docs.xml b/memcheck/docs/mc-tech-docs.xml index 31a23c969..cebe61326 100644 --- a/memcheck/docs/mc-tech-docs.xml +++ b/memcheck/docs/mc-tech-docs.xml @@ -1,6 +1,7 @@ + "http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd"> + @@ -12,66 +13,61 @@ Introduction -This document contains a detailed, highly-technical -description of the internals of Valgrind. This is not the user -manual; if you are an end-user of Valgrind, you do not want to -read this. Conversely, if you really are a hacker-type and want -to know how it works, I assume that you have read the user manual -thoroughly. +This document contains a detailed, highly-technical description of +the internals of Valgrind. This is not the user manual; if you are an +end-user of Valgrind, you do not want to read this. Conversely, if you +really are a hacker-type and want to know how it works, I assume that +you have read the user manual thoroughly. -You may need to read this document several times, and -carefully. Some important things, I only say once. +You may need to read this document several times, and carefully. +Some important things, I only say once. -[Note: this document is now very old, and a lot of its contents are out -of date, and misleading.] +[Note: this document is now very old, and a lot of its contents +are out of date, and misleading.] History -Valgrind came into public view in late Feb 2002. However, -it has been under contemplation for a very long time, perhaps -seriously for about five years. Somewhat over two years ago, I -started working on the x86 code generator for the Glasgow Haskell -Compiler (http://www.haskell.org/ghc), gaining familiarity with -x86 internals on the way. I then did Cacheprof, -gaining further x86 experience. Some -time around Feb 2000 I started experimenting with a user-space -x86 interpreter for x86-Linux. This worked, but it was clear -that a JIT-based scheme would be necessary to give reasonable -performance for Valgrind. Design work for the JITter started in -earnest in Oct 2000, and by early 2001 I had an x86-to-x86 -dynamic translator which could run quite large programs. This -translator was in a sense pointless, since it did not do any -instrumentation or checking. +Valgrind came into public view in late Feb 2002. However, it has +been under contemplation for a very long time, perhaps seriously for +about five years. Somewhat over two years ago, I started working on the +x86 code generator for the Glasgow Haskell Compiler +(http://www.haskell.org/ghc), gaining familiarity with x86 internals on +the way. I then did Cacheprof, gaining further x86 experience. Some +time around Feb 2000 I started experimenting with a user-space x86 +interpreter for x86-Linux. This worked, but it was clear that a +JIT-based scheme would be necessary to give reasonable performance for +Valgrind. Design work for the JITter started in earnest in Oct 2000, +and by early 2001 I had an x86-to-x86 dynamic translator which could run +quite large programs. This translator was in a sense pointless, since +it did not do any instrumentation or checking. -Most of the rest of 2001 was taken up designing and -implementing the instrumentation scheme. The main difficulty, -which consumed a lot of effort, was to design a scheme which did -not generate large numbers of false uninitialised-value warnings. 
-By late 2001 a satisfactory scheme had been arrived at, and I
-started to test it on ever-larger programs, with an eventual eye
-to making it work well enough so that it was helpful to folks
-debugging the upcoming version 3 of KDE.  I've used KDE since
-before version 1.0, and wanted to Valgrind to be an indirect
-contribution to the KDE 3 development effort.  At the start of
-Feb 02 the kde-core-devel crew started using it, and gave a huge
-amount of helpful feedback and patches in the space of three
-weeks.  Snapshot 20020306 is the result.
+Most of the rest of 2001 was taken up designing and implementing
+the instrumentation scheme.  The main difficulty, which consumed a lot
+of effort, was to design a scheme which did not generate large numbers
+of false uninitialised-value warnings.  By late 2001 a satisfactory
+scheme had been arrived at, and I started to test it on ever-larger
+programs, with an eventual eye to making it work well enough so that it
+was helpful to folks debugging the upcoming version 3 of KDE.  I've used
+KDE since before version 1.0, and wanted Valgrind to be an indirect
+contribution to the KDE 3 development effort.  At the start of Feb 02
+the kde-core-devel crew started using it, and gave a huge amount of
+helpful feedback and patches in the space of three weeks.  Snapshot
+20020306 is the result.

-In the best Unix tradition, or perhaps in the spirit of
-Fred Brooks' depressing-but-completely-accurate epitaph "build
-one to throw away; you will anyway", much of Valgrind is a second
-or third rendition of the initial idea.  The instrumentation
-machinery (vg_translate.c,
-vg_memory.c) and core CPU simulation
-(vg_to_ucode.c,
-vg_from_ucode.c) have had three redesigns
-and rewrites; the register allocator, low-level memory manager
+In the best Unix tradition, or perhaps in the spirit of Fred
+Brooks' depressing-but-completely-accurate epitaph "build one to throw
+away; you will anyway", much of Valgrind is a second or third rendition
+of the initial idea.  The instrumentation machinery
+(vg_translate.c, vg_memory.c)
+and core CPU simulation (vg_to_ucode.c,
+vg_from_ucode.c) have had three redesigns and
+rewrites; the register allocator, low-level memory manager
 (vg_malloc2.c) and symbol table reader
-(vg_symtab2.c) are on the second rewrite.
-In a sense, this document serves to record some of the knowledge
-gained as a result.
+(vg_symtab2.c) are on the second rewrite.  In a
+sense, this document serves to record some of the knowledge gained as a
+result.

@@ -84,11 +80,11 @@ gained as a result.
 valgrinq.so, of which more later.

The valgrind shell script adds
valgrind.so to the
-LD_PRELOAD list of extra
-libraries to be loaded with any dynamically linked library.  This
-is a standard trick, one which I assume the
-LD_PRELOAD mechanism was
-developed to support.
+LD_PRELOAD list of extra libraries to
+be loaded with any dynamically linked library.  This is a standard
+trick, one which I assume the
+LD_PRELOAD mechanism was developed to
+support.

valgrind.so is linked with the
-z initfirst flag, which
@@ -101,7 +97,7 @@ return from this initialisation function.
So the normal startup actions,
orchestrated by the dynamic linker
ld.so, continue as usual, except on
the synthetic CPU, not the real one.  Eventually
-main is run and returns, and
+main is run and returns, and
then the finalisation code of the shared objects is run,
presumably in inverse order to which they were initialised.
Remember, this is still all happening on the simulated CPU.
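The preloading trick itself is easy to demonstrate in isolation.  The
following stand-alone sketch is not Valgrind's startup code (which relies
on the -z initfirst mechanism rather than an ordinary ELF constructor,
and whose initialiser only returns on the real CPU once the whole client
run has finished); the file name and function name are invented:

   /* preload.c -- build with: gcc -shared -fPIC -o preload.so preload.c
      run with:   LD_PRELOAD=./preload.so /bin/ls                        */
   #include <stdio.h>

   __attribute__((constructor))
   static void grab_control_early ( void )
   {
      /* valgrind.so's initialiser does its equivalent of this and then,
         instead of returning straight away, starts running everything
         on the synthetic CPU */
      fprintf(stderr, "preload.so: initialiser ran before main()\n");
   }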
@@ -111,14 +107,14 @@ CPU, prints any error summaries and/or does leak detection, and returns from the initialisation code on the real CPU. At this point, in effect the real and synthetic CPUs have merged back into one, Valgrind has lost control of the program, and the -program finally exit()s back to +program finally exit()s back to the kernel in the usual way. The normal course of activity, once Valgrind has started up, is as follows. Valgrind never runs any part of your program (usually referred to as the "client"), not a single byte of it, directly. Instead it uses function -VG_(translate) to translate +VG_(translate) to translate basic blocks (BBs, straight-line sequences of code) into instrumented translations, and those are run instead. The translations are stored in the translation cache (TC), @@ -130,7 +126,7 @@ direct-map cache for fast lookups in TT; it usually achieves a hit rate of around 98% and facilitates an orig-to-trans lookup in 4 x86 insns, which is not bad. -Function VG_(dispatch) in +Function VG_(dispatch) in vg_dispatch.S is the heart of the JIT dispatcher. Once a translated code address has been found, it is executed simply by an x86 call @@ -141,19 +137,19 @@ does a ret, taking it back to the dispatch loop, with, interestingly, zero branch mispredictions. The address requested in %eax is looked up first in -VG_(tt_fast), and, if not found, +VG_(tt_fast), and, if not found, by calling C helper -VG_(search_transtab). If there +VG_(search_transtab). If there is still no translation available, -VG_(dispatch) exits back to the +VG_(dispatch) exits back to the top-level C dispatcher -VG_(toploop), which arranges for -VG_(translate) to make a new +VG_(toploop), which arranges for +VG_(translate) to make a new translation. All fairly unsurprising, really. There are various complexities described below. The translator, orchestrated by -VG_(translate), is complicated +VG_(translate), is complicated but entirely self-contained. It is described in great detail in subsequent sections. Translations are stored in TC, with TT tracking administrative information. The translations are @@ -168,7 +164,7 @@ new translations is expensive, so it is worth having a large TC to minimise the (capacity) miss rate. The dispatcher, -VG_(dispatch), receives hints +VG_(dispatch), receives hints from the translations which allow it to cheaply spot all control transfers corresponding to x86 call and @@ -178,24 +174,24 @@ this in order to spot some special events: Calls to - VG_(shutdown). This is + VG_(shutdown). This is Valgrind's cue to exit. NOTE: actually this is done a different way; it should be cleaned up. Returns of system call handlers, to the return address - VG_(signalreturn_bogusRA). + VG_(signalreturn_bogusRA). The signal simulator needs to know when a signal handler is returning, so we spot jumps (returns) to this address. - Calls to vg_trap_here. - All malloc, - free, etc calls that the + Calls to vg_trap_here. + All malloc, + free, etc calls that the client program makes are eventually routed to a call to - vg_trap_here, and Valgrind + vg_trap_here, and Valgrind does its own special thing with these calls. In effect this provides a trapdoor, by which Valgrind can intercept certain calls on the simulated CPU, run the call as it sees fit @@ -207,24 +203,24 @@ this in order to spot some special events: Valgrind intercepts the client's -malloc, -free, etc, calls, so that it can +malloc, +free, etc, calls, so that it can store additional information. 
Each block
-malloc'd by the client gives
rise to a shadow block in which Valgrind stores the call stack at
-the time of the malloc call.
-When the client calls free,
Valgrind tries to find the shadow block corresponding to the
-address passed to free, and
emits an error message if none can be found.  If it is found, the
block is placed on the freed blocks queue
vg_freed_list, it is marked as
inaccessible, and its shadow block now records the call stack at
-the time of the free call.
Keeping free'd blocks in this
queue allows Valgrind to spot all (presumably invalid) accesses to
them.  However, once the volume of blocks in the free queue
-exceeds VG_(clo_freelist_vol),
blocks are finally removed from the queue.

Keeping track of A and
@@ -236,7 +232,7 @@
in a way which is reasonably fast and reasonably space
efficient.  The 4G address space is divided up into 64K sections,
each covering 64Kb of address space.  Given a 32-bit address, the
top 16 bits are used to select one of the 65536 entries in
-VG_(primary_map).  The resulting
"secondary" (SecMap) holds A and V bits for the 64k of address
space chunk corresponding to the lower 16 bits of the
address.
@@ -257,7 +253,7 @@
How can you figure out where in your simulator the bug is?
Valgrind's answer is: cheat.  Valgrind is designed so that it is
possible to switch back to running the client program on the real
CPU at any point.  Using the
---stop-after= flag, you can ask
+--stop-after= flag, you can ask
Valgrind to run just some number of basic blocks, and then run the
rest of the way on the real CPU.  If you are searching for a bug
in the simulated CPU, you can use this to do a binary search,
@@ -271,7 +267,7 @@
regardless of whether it is running on the real or simulated
CPU.  This means that Valgrind can't do pointer swizzling -- well,
no great loss -- and it can't run on the same stack as the client
-- again, no great loss.  Valgrind operates on its own stack,
-VG_(stack), which it switches to
+VG_(stack), which it switches to
at startup, temporarily switching back to the client's stack when
doing system calls for the client.

@@ -299,8 +295,8 @@
transition inside a sighandler and still have things working, but
in practice that's not much of a restriction.

Valgrind's implementation of
-malloc,
-free, etc, (in
+malloc,
+free, etc, (in
vg_clientmalloc.c, not the low-level stuff
in vg_malloc2.c) is somewhat complicated by
the need to handle switching back at arbitrary points.  It does
@@ -341,7 +337,7 @@
result:

    Aside from the assertions, valgrind contains various sets
    of internal sanity checks, which get run at varying
    frequencies during normal operation.
-    VG_(do_sanity_checks) runs
+    VG_(do_sanity_checks) runs
    every 1000 basic blocks, which means 500 to 2000
    times/second for typical machines at present.  It checks that
    Valgrind hasn't overrun its private stack, and does some
    simple checks
@@ -359,7 +355,7 @@
result:

    The symbol table reader(s): various checks to ensure
    uniqueness of mappings; see
-    VG_(read_symbols) for a
+    VG_(read_symbols) for a
    start.  Is permanently engaged.

@@ -381,9 +377,9 @@
result:

    The JITter parses x86 basic blocks into sequences of
    UCode instructions.  It then sanity checks each one
-    with VG_(saneUInstr) and
+    with VG_(saneUInstr) and
    sanity checks the sequence as a whole with
-    VG_(saneUCodeBlock).
This stuff is engaged by
    default, and has caught some way-obscure bugs in the
    simulated CPU machinery in its time.

    The system call wrapper does
-    VG_(first_and_last_secondaries_look_plausible)
+    VG_(first_and_last_secondaries_look_plausible)
    after every syscall; this is known to pick up bugs in the
    syscall wrappers.  Engaged by default.

    The main dispatch loop, in
-    VG_(dispatch), checks
+    VG_(dispatch), checks
    that translations do not set
    %ebp to any value different
    from
@@ -455,8 +451,8 @@
result:

    valgrind.so | grep " T ",
    which shows you all the globally exported text symbols.
    They should all have an approved prefix, except for those
    like
-    malloc,
-    free, etc, which we
+    malloc,
+    free, etc, which we
    deliberately want to shadow and take precedence over the
    same names exported from
    glibc.so, so that valgrind can
    intercept those calls easily.  Similarly,
@@ -905,24 +901,24 @@
stages, coordinated by
transformation passes, all on straight-line blocks of UCode (type
UCodeBlock).  Steps 2 and 4 are
optimisation passes and can be disabled for debugging purposes,
-with --optimise=no and
---cleanup=no respectively.
+with --optimise=no and
+--cleanup=no respectively.

Valgrind can also run in a no-instrumentation mode, given
---instrument=no.  This is useful
+--instrument=no.  This is useful
for debugging the JITter quickly without having to deal with the
complexity of the instrumentation mechanism too.  In this mode,
steps 3 and 4 are omitted.

These flags combine, so that
---instrument=no together with
---optimise=no means only steps
+--instrument=no together with
+--optimise=no means only steps
1, 5 and 6 are used.

---single-step=yes causes each
+--single-step=yes causes each
x86 instruction to be treated as a single basic block.  The
translations are terrible but this is sometimes
instructive.

-The --stop-after=N flag
+The --stop-after=N flag
switches back to the real CPU after
N basic blocks.  It also re-JITs
the final basic block executed and prints the debugging info

diff --git a/none/docs/nl-manual.xml b/none/docs/nl-manual.xml
index 384773ec0..2cd06e465 100644
--- a/none/docs/nl-manual.xml
+++ b/none/docs/nl-manual.xml
@@ -2,9 +2,10 @@
+

-Nulgrind: the ``null'' tool
+Nulgrind: the "null" tool
 A tool that does not very much at all

 Nulgrind is the minimal tool for Valgrind.  It does no
@@ -12,11 +13,10 @@ initialisation or finalisation, and adds no instrumentation to the
 program's code.  It is mainly of use for Valgrind's developers for
 debugging and regression testing.

-Nonetheless you can run programs with Nulgrind.  They will
-run roughly 5 times more slowly than normal, for no useful
-effect.  Note that you need to use the option
---tool=none to run Nulgrind
-(ie. not --tool=nulgrind).
+Nonetheless you can run programs with Nulgrind.  They will run
+roughly 5 times more slowly than normal, for no useful effect.  Note
+that you need to use the option --tool=none to run
+Nulgrind (ie. not --tool=nulgrind).
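For example, to run some program under Nulgrind (./myprog here is just a
placeholder; any command will do):

   valgrind --tool=none ./myprog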