diff --git a/cachegrind/docs/cg-manual.xml b/cachegrind/docs/cg-manual.xml
index 1b37a50ea..c1377b63a 100644
--- a/cachegrind/docs/cg-manual.xml
+++ b/cachegrind/docs/cg-manual.xml
@@ -8,7 +8,7 @@
 <title>Cachegrind: a cache and branch-prediction profiler</title>
 
 <para>To use this tool, you must specify
-<computeroutput>--tool=cachegrind</computeroutput> on the
+<option>--tool=cachegrind</option> on the
 Valgrind command line.</para>
 
 <sect1 id="cg-manual.overview" xreflabel="Overview">
@@ -55,7 +55,7 @@ instruction executed, you can find out how many instructions are
 executed per line, which can be useful for traditional profiling.</para>
 
 <para>Branch profiling is not enabled by default.  To use it, you must
-additionally specify <computeroutput>--branch-sim=yes</computeroutput>
+additionally specify <option>--branch-sim=yes</option>
 on the command line.</para>
 
 
@@ -64,7 +64,7 @@ on the command line.</para>
 
 <para>First off, as for normal Valgrind use, you probably want to
 compile with debugging info (the
-<computeroutput>-g</computeroutput> flag).  But by contrast with
+<option>-g</option> flag).  But by contrast with
 normal Valgrind use, you probably <command>do</command> want to turn
 optimisation on, since you should profile your program as it will
 be normally run.</para>
@@ -83,7 +83,7 @@ be normally run.</para>
 
     <para>Branch prediction statistics are not collected by default.
     To do so, add the flag
-    <computeroutput>--branch-sim=yes</computeroutput>.
+    <option>--branch-sim=yes</option>.
     </para>
 
     <para>This step should be done every time you want to collect
@@ -98,7 +98,7 @@ be normally run.</para>
     files to annotate can be specified manually, or manually on
     the command line, or "interesting" source files can be
     annotated automatically with the
-    <computeroutput>--auto=yes</computeroutput> option.  You can
+    <option>--auto=yes</option> option.  You can
     annotate C/C++ files or assembly language files equally
     easily.</para>
 
@@ -175,9 +175,9 @@ Cachegrind will fall back to using a default configuration (that
 of a model 3/4 Athlon).  Cachegrind will tell you if this
 happens.  You can manually specify one, two or all three levels
 (I1/D1/L2) of the cache from the command line using the
-<computeroutput>--I1</computeroutput>,
-<computeroutput>--D1</computeroutput> and
-<computeroutput>--L2</computeroutput> options.
+<option>--I1</option>,
+<option>--D1</option> and
+<option>--L2</option> options.
 For cache parameters to be valid for simulation, the number
 of sets (with associativity being the number of cache lines in
 each set) has to be a power of two.</para>
@@ -186,9 +186,9 @@ each set) has to be a power of two.</para>
 Cachegrind cannot automatically 
 determine the cache configuration, so you will 
 need to specify it with the
-<computeroutput>--I1</computeroutput>,
-<computeroutput>--D1</computeroutput> and
-<computeroutput>--L2</computeroutput> options.</para>
+<option>--I1</option>,
+<option>--D1</option> and
+<option>--L2</option> options.</para>
 
 
 <para>Other noteworthy behaviour:</para>
@@ -356,7 +356,7 @@ file:</para>
   <listitem>
     <para>To use an output file name other than the default
     <computeroutput>cachegrind.out</computeroutput>,
-    use the <computeroutput>--cachegrind-out-file</computeroutput>
+    use the <option>--cachegrind-out-file</option>
     switch.</para>
   </listitem>
   <listitem>
@@ -371,7 +371,7 @@ file:</para>
 on the output file name serves two purposes.  Firstly, it means you 
 don't have to rename old log files that you don't want to overwrite.  
 Secondly, and more importantly, it allows correct profiling with the
-<computeroutput>--trace-children=yes</computeroutput> option of
+<option>--trace-children=yes</option> option of
 programs that spawn child processes.</para>
 
 </sect2>
@@ -465,8 +465,8 @@ configuration, or failing that, via defaults).</para>
       <para>Enables or disables collection of branch instruction and
             misprediction counts.  By default this is disabled as it
             slows Cachegrind down by approximately 25%.  Note that you
-            cannot specify <computeroutput>--cache-sim=no</computeroutput>
-            and <computeroutput>--branch-sim=no</computeroutput>
+            cannot specify <option>--cache-sim=no</option>
+            and <option>--branch-sim=no</option>
             together, as that would leave Cachegrind with no
             information to collect.</para>
     </listitem>
@@ -615,7 +615,7 @@ Ir        I1mr I2mr Dr        D1mr  D2mr  Dw        D1mw   D2mw    file:function
  <listitem>
    <para>Events shown: the events shown, which is a subset of the events
    gathered.  This can be adjusted with the
-   <computeroutput>--show</computeroutput> option.</para>
+   <option>--show</option> option.</para>
   </listitem>
 
   <listitem>
@@ -626,12 +626,12 @@ Ir        I1mr I2mr Dr        D1mr  D2mr  Dw        D1mw   D2mw    file:function
     <computeroutput>Ir</computeroutput> counts, they will then be
     sorted by <computeroutput>I1mr</computeroutput> counts, and
     so on.  This order can be adjusted with the
-    <computeroutput>--sort</computeroutput> option.</para>
+    <option>--sort</option> option.</para>
 
     <para>Note that this dictates the order the functions appear.
     It is <command>not</command> the order in which the columns
     appear; that is dictated by the "events shown" line (and can
-    be changed with the <computeroutput>--show</computeroutput>
+    be changed with the <option>--show</option>
     option).</para>
   </listitem>
 
@@ -644,7 +644,7 @@ Ir        I1mr I2mr Dr        D1mr  D2mr  Dw        D1mw   D2mw    file:function
     <computeroutput>Ir</computeroutput> is chosen as the
     threshold event since it is the primary sort event.  The
     threshold can be adjusted with the
-    <computeroutput>--threshold</computeroutput>
+    <option>--threshold</option>
     option.</para>
   </listitem>
 
@@ -655,7 +655,7 @@ Ir        I1mr I2mr Dr        D1mr  D2mr  Dw        D1mw   D2mw    file:function
 
   <listitem>
     <para>Auto-annotation: whether auto-annotation was requested
-    via the <computeroutput>--auto=yes</computeroutput>
+    via the <option>--auto=yes</option>
     option. In this case no.</para>
   </listitem>
 
@@ -676,7 +676,7 @@ instructions that write to memory). The name
 and/or function name could not be determined from debugging
 information. If most of the entries have the form
 <computeroutput>???:???</computeroutput> the program probably
-wasn't compiled with <computeroutput>-g</computeroutput>.  If any
+wasn't compiled with <option>-g</option>.  If any
 code was invalidated (either due to self-modifying code or
 unloading of shared objects) its counts are aggregated into a
 single cost centre written as
@@ -688,7 +688,7 @@ and from libraries (eg. <filename>getc.c</filename>)</para>
 
 <para>There are two ways to annotate source files -- by choosing
 them manually, or with the
-<computeroutput>--auto=yes</computeroutput> option. To do it
+<option>--auto=yes</option> option. To do it
 manually, just specify the filenames as additional arguments to
 cg_annotate. For example, the
 output from running <filename>cg_annotate &lt;filename&gt;
@@ -736,7 +736,7 @@ terminal is clearly useful.)</para>
 (<computeroutput>User-annotated source</computeroutput>) as
 having been chosen manually for annotation.  If the file was
 found in one of the directories specified with the
-<computeroutput>-I / --include</computeroutput> option, the directory
+<option>-I</option>/<option>--include</option> option, the directory
 and file are both given.</para>
 
 <para>Each line is annotated with its event counts.  Events not
@@ -757,7 +757,7 @@ part of a file the shown code comes from, eg:</para>
 (figures and code for line 878)]]></programlisting>
 
 <para>The amount of context to show around annotated lines is
-controlled by the <computeroutput>--context</computeroutput>
+controlled by the <option>--context</option>
 option.</para>
 
 <para>To get automatic annotation, run
@@ -765,8 +765,8 @@ option.</para>
 cg_annotate will automatically annotate every source file it can
 find that is mentioned in the function-by-function summary.
 Therefore, the files chosen for auto-annotation are affected by
-the <computeroutput>--sort</computeroutput> and
-<computeroutput>--threshold</computeroutput> options.  Each
+the <option>--sort</option> and
+<option>--threshold</option> options.  Each
 source file is clearly marked (<computeroutput>Auto-annotated
 source</computeroutput>) as being chosen automatically.  Any
 files that could not be found are mentioned at the end of the
@@ -785,9 +785,9 @@ usually compiled with debugging information, but the source files
 are often not present on a system.  If a file is chosen for
 annotation <command>both</command> manually and automatically, it
 is marked as <computeroutput>User-annotated
-source</computeroutput>. Use the <computeroutput>-I /
---include</computeroutput> option to tell Valgrind where to look
-for source files if the filenames found from the debugging
+source</computeroutput>. Use the
+<option>-I</option>/<option>--include</option> option to tell Valgrind where
+to look for source files if the filenames found from the debugging
 information aren't specific enough.</para>
 
 <para>Beware that cg_annotate can take some time to digest large
@@ -839,27 +839,25 @@ cg_annotate.</para>
 <itemizedlist>
 
   <listitem>
-    <para><computeroutput>-h, --help</computeroutput></para>
-    <para><computeroutput>-v, --version</computeroutput></para>
+    <para><option>-h</option>, <option>--help</option></para>
+    <para><option>-v</option>, <option>--version</option></para>
     <para>Help and version, as usual.</para>
   </listitem>
 
   <listitem id="sort">
-    <para><computeroutput>--sort=A,B,C</computeroutput> [default:
+    <para><option>--sort=A,B,C</option> [default:
     order in
     <computeroutput>cachegrind.out.&lt;pid&gt;</computeroutput>]</para>
     <para>Specifies the events upon which the sorting of the
     function-by-function entries will be based.  Useful if you
     want to concentrate on eg. I cache misses
-    (<computeroutput>--sort=I1mr,I2mr</computeroutput>), or D
-    cache misses
-    (<computeroutput>--sort=D1mr,D2mr</computeroutput>), or L2
-    misses
-    (<computeroutput>--sort=D2mr,I2mr</computeroutput>).</para>
+    (<option>--sort=I1mr,I2mr</option>), or D cache misses
+    (<option>--sort=D1mr,D2mr</option>), or L2 misses
+    (<option>--sort=D2mr,I2mr</option>).</para>
   </listitem>
 
   <listitem id="show">
-    <para><computeroutput>--show=A,B,C</computeroutput> [default:
+    <para><option>--show=A,B,C</option> [default:
     all, using order in
     <computeroutput>cachegrind.out.&lt;pid&gt;</computeroutput>]</para>
     <para>Specifies which events to show (and the column
@@ -869,7 +867,7 @@ cg_annotate.</para>
   </listitem>
 
   <listitem id="threshold">
-    <para><computeroutput>--threshold=X</computeroutput>
+    <para><option>--threshold=X</option>
     [default: 99%]</para>
     <para>Sets the threshold for the function-by-function
     summary.  Functions are shown that account for more than X%
@@ -878,24 +876,23 @@ cg_annotate.</para>
       
     <para>Note: thresholds can be set for more than one of the
     events by appending any events for the
-    <computeroutput>--sort</computeroutput> option with a colon
+    <option>--sort</option> option with a colon
     and a number (no spaces, though).  E.g. if you want to see
     the functions that cover 99% of L2 read misses and 99% of L2
     write misses, use this option:</para>
-    <para><computeroutput>--sort=D2mr:99,D2mw:99</computeroutput></para>
+    <para><option>--sort=D2mr:99,D2mw:99</option></para>
   </listitem>
 
   <listitem id="auto">
-    <para><computeroutput>--auto=no</computeroutput> [default]</para>
-    <para><computeroutput>--auto=yes</computeroutput></para>
+    <para><option>--auto=no</option> [default]</para>
+    <para><option>--auto=yes</option></para>
     <para>When enabled, automatically annotates every file that
     is mentioned in the function-by-function summary that can be
     found.  Also gives a list of those that couldn't be found.</para>
   </listitem>
 
   <listitem id="context">
-    <para><computeroutput>--context=N</computeroutput> [default:
-    8]</para>
+    <para><option>--context=N</option> [default: 8]</para>
     <para>Print N lines of context before and after each
     annotated line.  Avoids printing large sections of source
     files that were not executed.  Use a large number
@@ -903,9 +900,8 @@ cg_annotate.</para>
   </listitem>
 
   <listitem id="include">
-    <para><computeroutput>-I&lt;dir&gt;,
-      --include=&lt;dir&gt;</computeroutput> [default: empty
-      string]</para>
+    <para><option>-I&lt;dir&gt;</option>, <option>--include=&lt;dir&gt;</option>
+        [default: empty string]</para>
     <para>Adds a directory to the list in which to search for
     files.  Multiple -I/--include options can be given to add
     multiple directories.</para>
@@ -1046,7 +1042,7 @@ cg_annotate issues warnings.</para>
 
   <listitem>
     <para>If you compile some files with
-    <computeroutput>-g</computeroutput> and some without, some
+    <option>-g</option> and some without, some
     events that take place in a file without debug info could be
     attributed to the last line of a file with debug info
     (whichever one gets placed before the non-debug-info file in
diff --git a/callgrind/docs/cl-manual.xml b/callgrind/docs/cl-manual.xml
index 2f3d9f65d..43104bd36 100644
--- a/callgrind/docs/cl-manual.xml
+++ b/callgrind/docs/cl-manual.xml
@@ -8,7 +8,7 @@
 
 
 <para>To use this tool, you must specify
-<computeroutput>--tool=callgrind</computeroutput> on the
+<option>--tool=callgrind</option> on the
 Valgrind command line.</para>
 
 <sect1 id="cl-manual.use" xreflabel="Overview">
@@ -61,7 +61,7 @@ of the profiling, two command line tools are provided:</para>
 </variablelist>
 
 <para>To use Callgrind, you must specify 
-<computeroutput>--tool=callgrind</computeroutput> on the Valgrind 
+<option>--tool=callgrind</option> on the Valgrind 
 command line.</para>
 
   <sect2 id="cl-manual.functionality" xreflabel="Functionality">
@@ -498,8 +498,7 @@ callgrind.out.<emphasis>pid</emphasis>.<emphasis>part</emphasis>-<emphasis>threa
 <title>Command line option reference</title>
 
 <para>
-In the following, options are grouped into classes, in the same order as
-the output of <computeroutput>callgrind --help</computeroutput>.
+In the following, options are grouped into classes.
 </para>
 <para>
 Some options allow the specification of a function/symbol name, such as
@@ -513,30 +512,6 @@ shell. This feature is important especially for C++, as without wildcard
 usage, the function would have to be specified in full extent, including
 parameter signature. </para>
 
-<sect2 id="cl-manual.options.misc" 
-       xreflabel="Miscellaneous options">
-<title>Miscellaneous options</title>
-
-<variablelist id="cl.opts.list.misc">
-
-  <varlistentry>
-    <term><option>--help</option></term>
-    <listitem>
-      <para>Show summary of options. This is a short version of this
-      manual section.</para>
-    </listitem>
-  </varlistentry>
-
-  <varlistentry>
-    <term><option>--version</option></term>
-    <listitem>
-      <para>Show version of callgrind.</para>
-    </listitem>
-  </varlistentry>
-
-</variablelist>
-</sect2>
-
 <sect2 id="cl-manual.options.creation" 
        xreflabel="Dump creation options">
 <title>Dump creation options</title>
@@ -750,9 +725,9 @@ Also see <xref linkend="cl-manual.limits"/>.</para>
       option <xref linkend="opt.toggle-collect"/>.  If you use this flag, 
       collection
       state should be switched off at the beginning.  Note that the
-      specification of <computeroutput>--toggle-collect</computeroutput>
+      specification of <option>--toggle-collect</option>
       implicitly sets
-      <computeroutput>--collect-state=no</computeroutput>.</para>
+      <option>--collect-state=no</option>.</para>
       <para>Collection state can be toggled also by inserting the client request
       <computeroutput><xref linkend="cr.toggle-collect"/>;</computeroutput>
       at the needed code positions.</para>
diff --git a/callgrind/docs/man-annotate.xml b/callgrind/docs/man-annotate.xml
index 529ec78b3..ddb9a6091 100644
--- a/callgrind/docs/man-annotate.xml
+++ b/callgrind/docs/man-annotate.xml
@@ -48,7 +48,7 @@ included below.</para>
 <variablelist remap="TP">
 
   <varlistentry>
-    <term><option>-h, --help</option></term>
+    <term><option>-h</option>, <option>--help</option></term>
     <listitem>
       <para>Show summary of options.</para>
     </listitem>
diff --git a/callgrind/docs/man-control.xml b/callgrind/docs/man-control.xml
index ca3edde6e..119e17786 100644
--- a/callgrind/docs/man-control.xml
+++ b/callgrind/docs/man-control.xml
@@ -49,7 +49,7 @@ included below.</para>
 <variablelist remap="TP">
 
   <varlistentry>
-    <term><option>-h, --help</option></term>
+    <term><option>-h</option>, <option>--help</option></term>
     <listitem>
       <para>Show summary of options.</para>
     </listitem>
diff --git a/docs/xml/manual-core-adv.xml b/docs/xml/manual-core-adv.xml
index fcbe03768..021019689 100644
--- a/docs/xml/manual-core-adv.xml
+++ b/docs/xml/manual-core-adv.xml
@@ -56,8 +56,8 @@ program with any extra supporting libraries.</para>
 on x86, amd64, ppc32 and ppc64, the overhead is 6 simple integer instructions
 and is probably undetectable except in tight loops.
 However, if you really wish to compile out the client requests, you can
-compile with <computeroutput>-DNVALGRIND</computeroutput> (analogous to
-<computeroutput>-DNDEBUG</computeroutput>'s effect on
+compile with <option>-DNVALGRIND</option> (analogous to
+<option>-DNDEBUG</option>'s effect on
 <computeroutput>assert()</computeroutput>).
 </para>
 
@@ -103,7 +103,7 @@ tool-specific macros).</para>
     once.</para>
     <para>
     Alternatively, for transparent self-modifying-code support,
-    use<computeroutput>--smc-check=all</computeroutput>, or run
+    use <option>--smc-check=all</option>, or run
     on ppc32/Linux or ppc64/Linux.
     </para>
    </listitem>
@@ -504,7 +504,7 @@ will honour only the first one.</para>
 
 <para>Figuring out what's going on given the dynamic nature of wrapping
 can be difficult.  The 
-<computeroutput>--trace-redir=yes</computeroutput> flag makes 
+<option>--trace-redir=yes</option> flag makes 
 this possible
 by showing the complete state of the redirection subsystem after
 every
@@ -536,10 +536,10 @@ sections.  The active binding set is (conceptually) recomputed from
 the specifications, and all known symbol names, following any change
 to the specification set.</para>
 
-<para><computeroutput>--trace-redir=yes</computeroutput> shows the contents 
+<para><option>--trace-redir=yes</option> shows the contents 
 of both sets following any such event.</para>
 
-<para><computeroutput>-v</computeroutput> prints a line of text each 
+<para><option>-v</option> prints a line of text each 
 time an active specification is used for the first time.</para>
 
 <para>Hence for maximum debugging effectiveness you will need to use both
@@ -555,7 +555,7 @@ However, to make the implementation more robust, the two kinds
 of interception (wrapping vs replacement) are treated differently.
 </para>
 
-<para><computeroutput>--trace-redir=yes</computeroutput> shows 
+<para><option>--trace-redir=yes</option> shows 
 specifications and bindings for both
 replacement and wrapper functions.  To differentiate the 
 two, replacement bindings are printed using 
diff --git a/docs/xml/manual-core.xml b/docs/xml/manual-core.xml
index 584304333..e9e2c7690 100644
--- a/docs/xml/manual-core.xml
+++ b/docs/xml/manual-core.xml
@@ -113,20 +113,20 @@ already, if you intended to debug your program with GNU gdb, or some
 other debugger.</para>
 
 <para>If you are planning to use Memcheck: On rare
-occasions, compiler optimisations (at <computeroutput>-O2</computeroutput>
-and above, and sometimes <computeroutput>-O1</computeroutput>) have been
+occasions, compiler optimisations (at <option>-O2</option>
+and above, and sometimes <option>-O1</option>) have been
 observed to generate code which fools Memcheck into wrongly reporting
 uninitialised value errors, or missing uninitialised value errors.  We have
 looked in detail into fixing this, and unfortunately the result is that
 doing so would give a further significant slowdown in what is already a slow
 tool.  So the best solution is to turn off optimisation altogether.  Since
 this often makes things unmanageably slow, a reasonable compromise is to use
-<computeroutput>-O</computeroutput>.  This gets you the majority of the
+<option>-O</option>.  This gets you the majority of the
 benefits of higher optimisation levels whilst keeping relatively small the
 chances of false positives or false negatives from Memcheck.  Also, you
-should compile your code with <computeroutput>-Wall</computeroutput> because
+should compile your code with <option>-Wall</option> because
 it can identify some or all of the problems that Valgrind can miss at the
-higher optimisation levels.  (Using <computeroutput>-Wall</computeroutput>
+higher optimisation levels.  (Using <option>-Wall</option>
 is also a good idea in general.)  All other tools (as far as we know) are
 unaffected by optimisation level.</para>
 
@@ -631,7 +631,7 @@ categories.</para>
   </varlistentry>
 
   <varlistentry id="opt.quiet" xreflabel="--quiet">
-    <term><option>-q --quiet</option></term>
+    <term><option>-q</option>, <option>--quiet</option></term>
     <listitem>
       <para>Run silently, and only print error messages. Useful if you
       are running regression tests or have some other automated test
@@ -640,7 +640,7 @@ categories.</para>
   </varlistentry>
 
   <varlistentry id="opt.verbose" xreflabel="--verbose">
-    <term><option>-v --verbose</option></term>
+    <term><option>-v</option>, <option>--verbose</option></term>
     <listitem>
       <para>Be more verbose. Gives extra information on various aspects
       of your program, such as: the shared objects loaded, the
@@ -1525,7 +1525,7 @@ following entry in <literal>~/.valgrindrc</literal>:</para>
 run.  Without the <computeroutput>memcheck:</computeroutput>
 part, this will cause problems if you select other tools that
 don't understand
-<computeroutput>--leak-check=yes</computeroutput>.</para>
+<option>--leak-check=yes</option>.</para>
 
 </sect2>
 
@@ -1589,7 +1589,7 @@ able to cope with any POSIX-compliant use of signals.</para>
 <para>If you're using signals in clever ways (for example, catching
 SIGSEGV, modifying page state and restarting the instruction), you're
 probably relying on precise exceptions.  In this case, you will need
-to use <computeroutput>--vex-iropt-precise-memory-exns=yes</computeroutput>.
+to use <option>--vex-iropt-precise-memory-exns=yes</option>.
 </para>
 
 <para>If your program dies as a result of a fatal core-dumping signal,
@@ -1961,7 +1961,7 @@ shipped.</para>
 <title>Warning Messages You Might See</title>
 
 <para>Most of these only appear if you run in verbose mode
-(enabled by <computeroutput>-v</computeroutput>):</para>
+(enabled by <option>-v</option>):</para>
 
  <itemizedlist>
 
diff --git a/docs/xml/quick-start-guide.xml b/docs/xml/quick-start-guide.xml
index 465df3d10..8035216a4 100644
--- a/docs/xml/quick-start-guide.xml
+++ b/docs/xml/quick-start-guide.xml
@@ -44,13 +44,13 @@ documentation of Memcheck and the other tools, please read the User Manual.
 
 <para>Compile your program with <option>-g</option> to include debugging
 information so that Memcheck's error messages include exact line
-numbers.  Using <computeroutput>-O0</computeroutput> is also a good
+numbers.  Using <option>-O0</option> is also a good
 idea, if you can tolerate the slowdown.  With
-<computeroutput>-O1</computeroutput> line numbers in error messages can
+<option>-O1</option> line numbers in error messages can
 be inaccurate, although generally speaking running Memcheck on code compiled
-at <computeroutput>-O1</computeroutput> works fairly well.
+at <option>-O1</option> works fairly well.
 Use of
-<computeroutput>-O2</computeroutput> and above is not recommended as
+<option>-O2</option> and above is not recommended as
 Memcheck occasionally reports uninitialised-value errors which don't
 really exist.</para>
 
diff --git a/docs/xml/xml_help.txt b/docs/xml/xml_help.txt
index ae71a6f98..7290d3eb6 100644
--- a/docs/xml/xml_help.txt
+++ b/docs/xml/xml_help.txt
@@ -17,8 +17,12 @@ xml to html markup transformations:
 
 <programlisting> --> <pre class="programlisting">
 <screen>         --> <pre class="screen">
-<computeroutput> --> <tt class="computeroutput">
-<literal>        --> <tt>
+<option>         --> <code class="option">
+<filename>       --> <code class="filename">
+<function>       --> <code class="function">
+<literal>        --> <code class="literal">
+<varname>        --> <code class="varname">
+<computeroutput> --> <code class="computeroutput">
 <emphasis>       --> <i>
 <command>        --> <b class="command">
 <blockquote>     --> <div class="blockquote">
diff --git a/drd/docs/drd-manual.xml b/drd/docs/drd-manual.xml
index 18c1fb939..3052b4f0d 100644
--- a/drd/docs/drd-manual.xml
+++ b/drd/docs/drd-manual.xml
@@ -8,7 +8,7 @@
   <title>DRD: a thread error detector</title>
 
 <para>To use this tool, you must specify
-<computeroutput>--tool=drd</computeroutput>
+<option>--tool=drd</option>
 on the Valgrind command line.</para>
 
 
@@ -653,7 +653,7 @@ The above report has the following meaning:
       displayed. For dynamically allocated data the allocation call
       stack is shown. For static variables and stack variables the
       allocation context is only shown when the option
-      <computeroutput>--read-var-info=yes</computeroutput> has been
+      <option>--read-var-info=yes</option> has been
       specified. Otherwise DRD will print <computeroutput>Allocation
       context: unknown</computeroutput>.
     </para>
diff --git a/exp-bbv/docs/bbv-manual.xml b/exp-bbv/docs/bbv-manual.xml
index 60c533271..afa994399 100644
--- a/exp-bbv/docs/bbv-manual.xml
+++ b/exp-bbv/docs/bbv-manual.xml
@@ -6,7 +6,7 @@
   <title>BBV: an experimental basic block vector generation tool</title>
 
 <para>To use this tool, you must specify
-<computeroutput>--tool=exp-bbv</computeroutput> on the Valgrind
+<option>--tool=exp-bbv</option> on the Valgrind
 command line.</para>
 
 <sect1 id="bbv-manual.overview" xreflabel="Overview">
@@ -202,7 +202,7 @@ command line.</para>
 <para>  
   The Basic Block Vector is dumped at fixed intervals.  This
   is commonly done every 100 million instructions; the 
-  <computeroutput>--interval-size</computeroutput> option can be 
+  <option>--interval-size</option> option can be 
   used to change this.
 </para>
 
@@ -252,7 +252,7 @@ T:18:45 :12:135353 :56:78 314:4324263]]></programlisting>
    BBV vectors will be different than those generated by other tools.
    In practice this does not seem to affect the accuracy of the
    SimPoint results.  We do internally force the
-   <computeroutput>--vex-guest-chase-thresh=0</computeroutput>
+   <option>--vex-guest-chase-thresh=0</option>
    option to Valgrind which forces a more basic-block like
    behavior.
 </para>
diff --git a/exp-ptrcheck/docs/pc-manual.xml b/exp-ptrcheck/docs/pc-manual.xml
index 3ce5ca0bd..abfeb29bd 100644
--- a/exp-ptrcheck/docs/pc-manual.xml
+++ b/exp-ptrcheck/docs/pc-manual.xml
@@ -9,7 +9,7 @@
   <title>Ptrcheck: an experimental heap, stack &amp; global array overrun detector</title>
 
 <para>To use this tool, you must specify
-<computeroutput>--tool=exp-ptrcheck</computeroutput> on the Valgrind
+<option>--tool=exp-ptrcheck</option> on the Valgrind
 command line.</para>
 
 
@@ -161,7 +161,7 @@ possibly be a valid pointer.</para>
 <title>How Ptrcheck Works: Stack and Global Checks</title>
 
 <para>When a source file is compiled
-with <computeroutput>-g</computeroutput>, the compiler attaches DWARF3
+with <option>-g</option>, the compiler attaches DWARF3
 debugging information which describes the location of all stack and
 global arrays in the file.</para>
 
diff --git a/helgrind/docs/hg-manual.xml b/helgrind/docs/hg-manual.xml
index bd7d5650e..f73d05a46 100644
--- a/helgrind/docs/hg-manual.xml
+++ b/helgrind/docs/hg-manual.xml
@@ -8,7 +8,7 @@
   <title>Helgrind: a thread error detector</title>
 
 <para>To use this tool, you must specify
-<computeroutput>--tool=helgrind</computeroutput> on the Valgrind
+<option>--tool=helgrind</option> on the Valgrind
 command line.</para>
 
 
diff --git a/lackey/docs/lk-manual.xml b/lackey/docs/lk-manual.xml
index 9a10425a6..633668b45 100644
--- a/lackey/docs/lk-manual.xml
+++ b/lackey/docs/lk-manual.xml
@@ -7,7 +7,7 @@
 <title>Lackey: an example tool</title>
 
 <para>To use this tool, you must specify
-<computeroutput>--tool=lackey</computeroutput> on the Valgrind
+<option>--tool=lackey</option> on the Valgrind
 command line.</para>
 
 
@@ -26,7 +26,7 @@ over performance.</para>
 
  <listitem>
   <para>When command line option
-  <computeroutput>--basic-counts=yes</computeroutput> is specified,
+  <option>--basic-counts=yes</option> is specified,
   it prints the following statistics and information about the execution of
   the client program:</para>
 
@@ -38,7 +38,7 @@ over performance.</para>
     function in glibc's dynamic linker that resolves function
     references to shared objects.</para>
     <para>You can change the name of the function tracked with command line
-    option <computeroutput>--fnname=&lt;name&gt;</computeroutput>.</para>
+    option <option>--fnname=&lt;name&gt;</option>.</para>
    </listitem>
 
    <listitem>
@@ -72,7 +72,7 @@ over performance.</para>
 
  <listitem>
   <para>When command line option
-  <computeroutput>--detailed-counts=yes</computeroutput> is
+  <option>--detailed-counts=yes</option> is
   specified, a table is printed with counts of loads, stores and ALU
   operations for various types of operands.</para>
 
@@ -82,7 +82,7 @@ over performance.</para>
 
  <listitem>
   <para>When command line option
-  <computeroutput>--trace-mem=yes</computeroutput> is
+  <option>--trace-mem=yes</option> is
   specified, it prints out the size and address of almost every load and
   store made by the program.  See the comments at the top of the file
   <computeroutput>lackey/lk_main.c</computeroutput> for details about
@@ -92,7 +92,7 @@ over performance.</para>
 
  <listitem>
   <para>When command line option
-  <computeroutput>--trace-superblocks=yes</computeroutput> is
+  <option>--trace-superblocks=yes</option> is
   specified, it prints out the address of every superblock 
   (extended basic block) executed by the program.  This is
   primarily of interest to Valgrind developers.  See the comments at 
@@ -104,14 +104,14 @@ over performance.</para>
 </orderedlist>
 
 <para>Note that Lackey runs quite slowly, especially when
-<computeroutput>--detailed-counts=yes</computeroutput> is specified.
+<option>--detailed-counts=yes</option> is specified.
 It could be made to run a lot faster by doing a slightly more
 sophisticated job of the instrumentation, but that would undermine
 its role as a simple example tool.  Hence we have chosen not to do
 so.</para>
 
-<para>Note also that <computeroutput>--trace-mem=yes</computeroutput>
-and <computeroutput>--trace-superblocks=yes</computeroutput> create
+<para>Note also that <option>--trace-mem=yes</option>
+and <option>--trace-superblocks=yes</option> create
 immense amounts of output.  If you are saving the output in a file,
 you can eat up tens of gigabytes of disk space very quickly.
 As a result of printing out so much stuff, they also cause the program
diff --git a/massif/docs/ms-manual.xml b/massif/docs/ms-manual.xml
index 4fc94593b..b59db46b9 100644
--- a/massif/docs/ms-manual.xml
+++ b/massif/docs/ms-manual.xml
@@ -8,7 +8,7 @@
   <title>Massif: a heap profiler</title>
 
 <para>To use this tool, you must specify
-<computeroutput>--tool=massif</computeroutput> on the Valgrind
+<option>--tool=massif</option> on the Valgrind
 command line.</para>
 
 <sect1 id="ms-manual.overview" xreflabel="Overview">
@@ -54,7 +54,7 @@ which parts of your program are responsible for allocating the heap memory.
 
 
 <para>First off, as for the other Valgrind tools, you should compile with
-debugging info (the <computeroutput>-g</computeroutput> flag).  It shouldn't
+debugging info (the <option>-g</option> flag).  It shouldn't
 matter much what optimisation level you compile your program with, as this
 is unlikely to affect the heap memory usage.</para>
 
@@ -188,7 +188,7 @@ For very short-run programs such as the example, most of the executed
 instructions involve the loading and dynamic linking of the program.  The
 execution of <computeroutput>main</computeroutput> (and thus the heap
 allocations) only occur at the very end.  For a short-running program like
-this, we can use the <computeroutput>--time-unit=B</computeroutput> option
+this, we can use the <option>--time-unit=B</option> option
 to specify that we want the time unit to instead be the number of bytes
 allocated/deallocated on the heap and stack(s).</para>
 
@@ -232,7 +232,7 @@ taking snapshots for every heap allocation/deallocation, but as a program
 runs for longer, it takes snapshots less frequently.  It also discards older
 snapshots as the program goes on;  when it reaches the maximum number of
 snapshots (100 by default, although changeable with the
-<computeroutput>--max-snapshots</computeroutput> option) half of them are
+<option>--max-snapshots</option> option) half of them are
 deleted.  This means that a reasonable number of snapshots are always
 maintained.</para>
 
@@ -246,7 +246,7 @@ shortly.  Detailed snapshots are represented in the graph by bars consisting
 of '@' characters.  The text at the bottom show that 3 detailed
 snapshots were taken for this program (snapshots 9, 14 and 24).  By default,
 every 10th snapshot is detailed, although this can be changed via the
-<computeroutput>--detailed-freq</computeroutput> option.</para>
+<option>--detailed-freq</option> option.</para>
 
 <para>Finally, there is at most one <emphasis>peak</emphasis> snapshot.  The
 peak snapshot is a detailed snapshot, and records the point where memory
@@ -260,7 +260,7 @@ at every allocation, i.e. it is <emphasis>not</emphasis> just the peak among
 the regular snapshots.  However, recording the true peak is expensive, and
 so by default Massif records a peak whose size is within 1% of the size of
 the true peak.  See the description of the
-<computeroutput>--peak-inaccuracy</computeroutput> option below for more
+<option>--peak-inaccuracy</option> option below for more
 details.</para>
 
 <para>The following graph is from an execution of Konqueror, the KDE web
@@ -331,7 +331,7 @@ a small amount of information is recorded for each one:</para>
 
   <listitem><para>The time it was taken. In this case, the time unit is
   bytes, due to the use of
-  <computeroutput>--time-unit=B</computeroutput>.</para></listitem>
+  <option>--time-unit=B</option>.</para></listitem>
 
   <listitem><para>The total memory consumption at that point.</para></listitem>
 
@@ -347,14 +347,14 @@ a small amount of information is recorded for each one:</para>
   The exact number of administrative bytes depends on the details of the
   allocator.  By default Massif assumes 8 bytes per block, as can be seen
   from the example, but this number can be changed via the
-  <computeroutput>--heap-admin</computeroutput> option.</para>
+  <option>--heap-admin</option> option.</para>
 
   <para>Second, allocators often round up the number of bytes asked for to a
   larger number.  By default, if N bytes are asked for, Massif rounds N up
   to the nearest multiple of 8 that is equal to or greater than N.  This is
   typical behaviour for allocators, and is required to ensure that elements
   within the block are suitably aligned.  The rounding size can be changed
-  with the <computeroutput>--alignment</computeroutput> option, although it
+  with the <option>--alignment</option> option, although it
   cannot be less than 8, and must be a power of two.</para></listitem>
 
   <listitem><para>The size of the stack(s).  By default, stack profiling is
@@ -379,7 +379,7 @@ functions, and so all 9,000 useful bytes (which is 99.21% of all allocated
 bytes) go through them.  But how were <function>malloc</function> and new
 called?  At this point, every allocation so far has been due to line 21
 inside <function>main</function>, hence the second line in the tree.  The
-<computeroutput>-></computeroutput> indicates that main (line 20) called
+<computeroutput>-></computeroutput> indicates that main (line 20) called
 <function>malloc</function>.</para>
 
 <para>Let's see what the subsequent output shows happened next:</para>
@@ -491,7 +491,7 @@ only prints the details for code locations responsible for more than 1%.
 The entries that do not meet this threshold are aggregated.  This avoids
 filling up the output with large numbers of unimportant entries.  The
 thresholds can be changed with the
-<computeroutput>--threshold</computeroutput> option that both Massif and
+<option>--threshold</option> option that both Massif and
 ms_print support.</para>
 
 </sect2>
@@ -617,7 +617,7 @@ operator new[](unsigned long, std::nothrow_t const&)
     <listitem>
       <para>Any direct heap allocation (i.e. a call to
       <function>malloc</function>, <function>new</function>, etc, or a call
-      to a function name in a <computeroutput>--alloc-fn</computeroutput>
+      to a function name in a <option>--alloc-fn</option>
       option) that occurs in a function specified by this option will be
       ignored.  This is mostly useful for testing purposes.  This option can
       be specified multiple times on the command line, to name multiple
@@ -632,7 +632,7 @@ operator new[](unsigned long, std::nothrow_t const&)
       </para>
       
       <para>Note that overloaded C++ names must be written in full, as for
-      <computeroutput>--alloc-fn</computeroutput> above.
+      <option>--alloc-fn</option> above.
       </para>
       </listitem>
   </varlistentry>
@@ -685,7 +685,7 @@ operator new[](unsigned long, std::nothrow_t const&)
     </term>
     <listitem>
       <para>Frequency of detailed snapshots.  With
-      <computeroutput>--detailed-freq=1</computeroutput>, every snapshot is
+      <option>--detailed-freq=1</option>, every snapshot is
       detailed.</para>
     </listitem>
   </varlistentry>
@@ -741,14 +741,14 @@ operator new[](unsigned long, std::nothrow_t const&)
 <itemizedlist>
 
   <listitem>
-    <para><computeroutput>-h, --help</computeroutput></para>
-    <para><computeroutput>-v, --version</computeroutput></para>
+    <para><option>-h --help</option></para>
+    <para><option>-v --version</option></para>
     <para>Help and version, as usual.</para>
   </listitem>
 
   <listitem>
     <para><option><![CDATA[--threshold=<m.n>]]></option> [default: 1.0]</para>
-    <para>Same as Massif's <computeroutput>--threshold</computeroutput>, but
+    <para>Same as Massif's <option>--threshold</option>, but
     applied after profiling rather than during.</para>
   </listitem>
 
diff --git a/memcheck/docs/mc-manual.xml b/memcheck/docs/mc-manual.xml
index 8014ce4e0..7bf574185 100644
--- a/memcheck/docs/mc-manual.xml
+++ b/memcheck/docs/mc-manual.xml
@@ -157,7 +157,7 @@ difficult-to-diagnose crashes.</para>
       lost" and "possibly lost" blocks.  When enabled, the leak detector also
       shows "reachable" and "indirectly lost" blocks.  (In other words, it
       shows all blocks, except suppressed ones, so
-      <computeroutput>--show-all</computeroutput> would be a better name for
+      <option>--show-all</option> would be a better name for
       it.)</para>
     </listitem>
   </varlistentry>
@@ -764,12 +764,12 @@ LEAK SUMMARY:
         suppressed: 0 bytes in 0 blocks.
 ]]></programlisting>
 
-<para>If <computeroutput>--leak-check=full</computeroutput> is specified,
+<para>If <option>--leak-check=full</option> is specified,
 Memcheck will give details for each definitely lost or possibly lost block,
 including where it was allocated.  (Actually, it merges results for all
 blocks that have the same category and sufficiently similar stack traces
 into a single "loss record".  The
-<computeroutput>--leak-resolution</computeroutput> lets you control the
+<option>--leak-resolution</option> lets you control the
 meaning of "sufficiently similar".)  It cannot tell you when or how or why
 the pointer to a leaked block was lost; you have to work that out for
 yourself.  In general, you should attempt to ensure your programs do not
@@ -795,7 +795,7 @@ bytes in other blocks are indirectly lost because of this lost block.
 The loss records are not presented in any notable order, so the loss record
 numbers aren't particularly meaningful.</para>
 
-<para>If you specify <computeroutput>--show-reachable=yes</computeroutput>,
+<para>If you specify <option>--show-reachable=yes</option>,
 reachable and indirectly lost blocks will also be shown, as the following
 two examples show.</para>
 
@@ -1289,7 +1289,7 @@ arguments.</para>
 
   <listitem>
     <para><varname>VALGRIND_DO_LEAK_CHECK</varname>: does a full memory leak
-    check (like <computeroutput>--leak-check=full</computeroutput> right now.
+    check (like <option>--leak-check=full</option>) right now.
     This is useful for incrementally checking for leaks between arbitrary
     places in the program's execution.  It has no return value.</para>
   </listitem>
@@ -1297,7 +1297,7 @@ arguments.</para>
   <listitem>
     <para><varname>VALGRIND_DO_QUICK_LEAK_CHECK</varname>: like
     <varname>VALGRIND_DO_LEAK_CHECK</varname>, except it produces only a leak
-    summary (like <computeroutput>--leak-check=summary</computeroutput>).
+    summary (like <option>--leak-check=summary</option>).
     It has no return value.</para>
   </listitem>
 
@@ -1580,7 +1580,7 @@ the same <computeroutput>mpicc</computeroutput> you use to build the
 MPI application you want to debug.  By default, Valgrind tries
 <computeroutput>mpicc</computeroutput>, but you can specify a
 different one by using the configure-time flag
-<computeroutput>--with-mpicc=</computeroutput>.  Currently the
+<option>--with-mpicc=</option>.  Currently the
 wrappers are only buildable with
 <computeroutput>mpicc</computeroutput>s which are based on GNU
 <computeroutput>gcc</computeroutput> or Intel's
@@ -1704,7 +1704,7 @@ valgrind MPI wrappers 16386: Try MPIWRAP_DEBUG=help for possible options
 </itemizedlist>
 
 <para> If you want to use Valgrind's XML output facility
-(<computeroutput>--xml=yes</computeroutput>), you should pass
+(<option>--xml=yes</option>), you should pass
 <computeroutput>quiet</computeroutput> in
 <computeroutput>MPIWRAP_DEBUG</computeroutput> so as to get rid of any
 extraneous printing from the wrappers.</para>
diff --git a/memcheck/docs/mc-tech-docs.xml b/memcheck/docs/mc-tech-docs.xml
index 33146cecf..1de368e63 100644
--- a/memcheck/docs/mc-tech-docs.xml
+++ b/memcheck/docs/mc-tech-docs.xml
@@ -87,7 +87,7 @@ trick, one which I assume the
 support.</para>
 
 <para><filename>valgrind.so</filename> is linked with the
-<computeroutput>-z initfirst</computeroutput> flag, which
+<option>-z initfirst</option> flag, which
 requests that its initialisation code is run before that of any
 other object in the executable image.  When this happens,
 valgrind gains control.  The real CPU becomes "trapped" in
@@ -489,8 +489,8 @@ result:</para>
     entirely.</para>
 
     <para>To find out which glibc symbols are used by Valgrind,
-    reinstate the link flags <computeroutput>-nostdlib
-    -Wl,-no-undefined</computeroutput>.  This causes linking to
+    reinstate the link flags <option>-nostdlib
+    -Wl,-no-undefined</option>.  This causes linking to
     fail, but will tell you what you depend on.  I have mostly,
     but not entirely, got rid of the glibc dependencies; what
     remains is, IMO, fairly harmless.  AFAIK the current
diff --git a/none/docs/nl-manual.xml b/none/docs/nl-manual.xml
index 547f70248..53758f214 100644
--- a/none/docs/nl-manual.xml
+++ b/none/docs/nl-manual.xml
@@ -8,7 +8,7 @@
 <title>Nulgrind: the minimal Valgrind tool</title>
 
 <para>To use this tool, you must specify
-<computeroutput>--tool=none</computeroutput> on the Valgrind
+<option>--tool=none</option> on the Valgrind
 command line.</para>
 
 <sect1 id="ms-manual.overview" xreflabel="Overview">