mirror of
https://github.com/Zenithsiz/ftmemsim-valgrind.git
synced 2026-02-04 02:18:37 +00:00
488 lines
18 KiB
XML
488 lines
18 KiB
XML
<?xml version="1.0"?> <!-- -*- sgml -*- -->
|
|
<!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN"
|
|
"http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd">
|
|
|
|
|
|
<chapter id="ms-manual" xreflabel="Massif: a heap profiler">
|
|
<title>Massif: a heap profiler</title>
|
|
|
|
<para>To use this tool, you must specify
|
|
<computeroutput>--tool=massif</computeroutput> on the Valgrind
|
|
command line.</para>
|
|
|
|
|
|
<sect1 id="ms-manual.spaceprof" xreflabel="Heap profiling">
|
|
<title>Heap profiling</title>
|
|
|
|
<para>Massif is a heap profiler, i.e. it measures how much heap
|
|
memory programs use. In particular, it can give you information
|
|
about:</para>
|
|
|
|
<itemizedlist>
|
|
<listitem><para>Heap blocks;</para></listitem>
|
|
<listitem><para>Heap administration blocks;</para></listitem>
|
|
<listitem><para>Stack sizes.</para></listitem>
|
|
</itemizedlist>
|
|
|
|
<para>Heap profiling is useful to help you reduce the amount of
|
|
memory your program uses. On modern machines with virtual
|
|
memory, this provides the following benefits:</para>
|
|
|
|
<itemizedlist>
|
|
<listitem><para>It can speed up your program -- a smaller
|
|
program will interact better with your machine's caches and
|
|
avoid paging.</para></listitem>
|
|
|
|
<listitem><para>If your program uses lots of memory, it will
|
|
reduce the chance that it exhausts your machine's swap
|
|
space.</para></listitem>
|
|
</itemizedlist>
|
|
|
|
<para>Also, there are certain space leaks that aren't detected by
|
|
traditional leak-checkers, such as Memcheck's. That's because
|
|
the memory isn't ever actually lost -- a pointer remains to it --
|
|
but it's not in use. Programs that have leaks like this can
|
|
unnecessarily increase the amount of memory they are using over
|
|
time.</para>
|
|
|
|
|
|
|
|
<sect2 id="ms-manual.heapprof"
|
|
xreflabel="Why Use a Heap Profiler?">
|
|
<title>Why Use a Heap Profiler?</title>
|
|
|
|
<para>Everybody knows how useful time profilers are for speeding
|
|
up programs. They are particularly useful because people are
|
|
notoriously bad at predicting where are the bottlenecks in their
|
|
programs.</para>
|
|
|
|
<para>But the story is different for heap profilers. Some
|
|
programming languages, particularly lazy functional languages
|
|
like <ulink url="http://www.haskell.org">Haskell</ulink>, have
|
|
quite sophisticated heap profilers. But there are few tools as
|
|
powerful for profiling C and C++ programs.</para>
|
|
|
|
<para>Why is this? Maybe it's because C and C++ programmers must
|
|
think that they know where the memory is being allocated. After
|
|
all, you can see all the calls to
|
|
<computeroutput>malloc()</computeroutput> and
|
|
<computeroutput>new</computeroutput> and
|
|
<computeroutput>new[]</computeroutput>, right? But, in a big
|
|
program, do you really know which heap allocations are being
|
|
executed, how many times, and how large each allocation is? Can
|
|
you give even a vague estimate of the memory footprint for your
|
|
program? Do you know this for all the libraries your program
|
|
uses? What about administration bytes required by the heap
|
|
allocator to track heap blocks -- have you thought about them?
|
|
What about the stack? If you are unsure about any of these
|
|
things, maybe you should think about heap profiling.</para>
|
|
|
|
<para>Massif can tell you these things.</para>
|
|
|
|
<para>Or maybe it's because it's relatively easy to add basic
|
|
heap profiling functionality into a program, to tell you how many
|
|
bytes you have allocated for certain objects, or similar. But
|
|
this information might only be simple like total counts for the
|
|
whole program's execution. What about space usage at different
|
|
points in the program's execution, for example? And
|
|
reimplementing heap profiling code for each project is a
|
|
pain.</para>
|
|
|
|
<para>Massif can save you this effort.</para>
|
|
|
|
</sect2>
|
|
|
|
</sect1>
|
|
|
|
|
|
|
|
<sect1 id="ms-manual.using" xreflabel="Using Massif">
|
|
<title>Using Massif</title>
|
|
|
|
|
|
<sect2 id="ms-manual.overview" xreflabel="Overview">
|
|
<title>Overview</title>
|
|
|
|
<para>First off, as for normal Valgrind use, you probably want to
|
|
compile with debugging info (the
|
|
<computeroutput>-g</computeroutput> flag). But, as opposed to
|
|
Memcheck, you probably <command>do</command> want to turn
|
|
optimisation on, since you should profile your program as it will
|
|
be normally run.</para>
|
|
|
|
<para>Then, run your program with <computeroutput>valgrind
|
|
--tool=massif</computeroutput> in front of the normal command
|
|
line invocation. When the program finishes, Massif will print
|
|
summary space statistics. It also creates a graph representing
|
|
the program's heap usage in a file called
|
|
<filename>massif.pid.ps</filename>, which can be read by any
|
|
PostScript viewer, such as Ghostview.</para>
|
|
|
|
<para>It also puts detailed information about heap consumption in
|
|
a file <filename>massif.pid.txt</filename> (text format) or
|
|
<filename>massif.pid.html</filename> (HTML format), where
|
|
<emphasis>pid</emphasis> is the program's process id.</para>
|
|
|
|
</sect2>
|
|
|
|
|
|
<sect2 id="ms-manual.basicresults" xreflabel="Basic Results of Profiling">
|
|
<title>Basic Results of Profiling</title>
|
|
|
|
<para>To gather heap profiling information about the program
|
|
<computeroutput>prog</computeroutput>, type:</para>
|
|
<screen><![CDATA[
|
|
% valgrind --tool=massif prog]]></screen>
|
|
|
|
<para>The program will execute (slowly). Upon completion,
|
|
summary statistics that look like this will be printed:</para>
|
|
<programlisting><![CDATA[
|
|
==27519== Total spacetime: 2,258,106 ms.B
|
|
==27519== heap: 24.0%
|
|
==27519== heap admin: 2.2%
|
|
==27519== stack(s): 73.7%]]></programlisting>
|
|
|
|
<para>All measurements are done in
|
|
<emphasis>spacetime</emphasis>, i.e. space (in bytes) multiplied
|
|
by time (in milliseconds). Note that because Massif slows a
|
|
program down a lot, the actual spacetime figure is fairly
|
|
meaningless; it's the relative values that are
|
|
interesting.</para>
|
|
|
|
<para>Which entries you see in the breakdown depends on the
|
|
command line options given. The above example measures all the
|
|
possible parts of memory:</para>
|
|
|
|
<itemizedlist>
|
|
<listitem><para>Heap: number of words allocated on the heap, via
|
|
<computeroutput>malloc()</computeroutput>,
|
|
<computeroutput>new</computeroutput> and
|
|
<computeroutput>new[]</computeroutput>.</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>Heap admin: each heap block allocated requires some
|
|
administration data, which lets the allocator track certain
|
|
things about the block. It is easy to forget about this, and
|
|
if your program allocates lots of small blocks, it can add
|
|
up. This value is an estimate of the space required for this
|
|
administration data.</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>Stack(s): the spacetime used by the programs' stack(s).
|
|
(Threaded programs can have multiple stacks.) This includes
|
|
signal handler stacks.</para>
|
|
</listitem>
|
|
</itemizedlist>
|
|
|
|
</sect2>
|
|
|
|
|
|
<sect2 id="ms-manual.graphs" xreflabel="Spacetime Graphs">
|
|
<title>Spacetime Graphs</title>
|
|
|
|
<para>As well as printing summary information, Massif also
|
|
creates a file representing a spacetime graph,
|
|
<filename>massif.pid.hp</filename>. It will produce a file
|
|
called <filename>massif.pid.ps</filename>, which can be viewed in
|
|
a PostScript viewer.</para>
|
|
|
|
<para>Massif uses a program called
|
|
<computeroutput>hp2ps</computeroutput> to convert the raw data
|
|
into the PostScript graph. It's distributed with Massif, but
|
|
came originally from the
|
|
<ulink url="http://www.haskell.org/ghc/">Glasgow Haskell
|
|
Compiler</ulink>. You shouldn't need to worry about this at all.
|
|
However, if the graph creation fails for any reason, Massif will
|
|
tell you, and will leave behind a file named
|
|
<filename>massif.pid.hp</filename>, containing the raw heap
|
|
profiling data.</para>
|
|
|
|
<para>Here's an example graph:</para>
|
|
<mediaobject id="spacetime-graph">
|
|
<imageobject>
|
|
<imagedata fileref="images/massif-graph-sm.png" format="PNG"/>
|
|
</imageobject>
|
|
<textobject>
|
|
<phrase>Spacetime Graph</phrase>
|
|
</textobject>
|
|
</mediaobject>
|
|
|
|
<para>The graph is broken into several bands. Most bands
|
|
represent a single line of your program that does some heap
|
|
allocation; each such band represents all the allocations and
|
|
deallocations done from that line. Up to twenty bands are shown;
|
|
less significant allocation sites are merged into "other" and/or
|
|
"OTHER" bands. The accompanying text/HTML file produced by
|
|
Massif has more detail about these heap allocation bands. Then
|
|
there are single bands for the stack(s) and heap admin
|
|
bytes.</para>
|
|
|
|
<formalpara>
|
|
<title>Note:</title>
|
|
<para>it's the height of a band that's important. Don't let the
|
|
ups and downs caused by other bands confuse you. For example,
|
|
the <computeroutput>read_alias_file</computeroutput> band in the
|
|
example has the same height all the time it's in existence.</para>
|
|
</formalpara>
|
|
|
|
<para>The triangles on the x-axis show each point at which a
|
|
memory census was taken. These aren't necessarily evenly spread;
|
|
Massif only takes a census when memory is allocated or
|
|
deallocated. The time on the x-axis is wallclock time, which is
|
|
not ideal because you can get different graphs for different
|
|
executions of the same program, due to random OS delays. But
|
|
it's not too bad, and it becomes less of a problem the longer a
|
|
program runs.</para>
|
|
|
|
<para>Massif takes censuses at an appropriate timescale; censuses
|
|
take place less frequently as the program runs for longer. There
|
|
is no point having more than 100-200 censuses on a single
|
|
graph.</para>
|
|
|
|
<para>The graphs give a good overview of where your program's
|
|
space use comes from, and how that varies over time. The
|
|
accompanying text/HTML file gives a lot more information about
|
|
heap use.</para>
|
|
|
|
</sect2>
|
|
|
|
</sect1>
|
|
|
|
|
|
|
|
<sect1 id="ms-manual.heapdetails"
|
|
xreflabel="Details of Heap Allocations">
|
|
<title>Details of Heap Allocations</title>
|
|
|
|
<para>The text/HTML file contains information to help interpret
|
|
the heap bands of the graph. It also contains a lot of extra
|
|
information about heap allocations that you don't see in the
|
|
graph.</para>
|
|
|
|
|
|
<para>Here's part of the information that accompanies the above
|
|
graph.</para>
|
|
|
|
<blockquote>
|
|
<literallayout>== 0 ===========================</literallayout>
|
|
|
|
<para>Heap allocation functions accounted for 50.8% of measured
|
|
spacetime</para>
|
|
|
|
<para>Called from:</para>
|
|
<itemizedlist>
|
|
<listitem id="a401767D1"><para>
|
|
<ulink url="#b401767D1">22.1%</ulink>: 0x401767D0:
|
|
_nl_intern_locale_data (in /lib/i686/libc-2.3.2.so)</para>
|
|
</listitem>
|
|
<listitem id="a4017C394"><para>
|
|
<ulink url="#b4017C394">8.6%</ulink>: 0x4017C393:
|
|
read_alias_file (in /lib/i686/libc-2.3.2.so)</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>... ... <emphasis>(several entries omitted)</emphasis></para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>and 6 other insignificant places</para>
|
|
</listitem>
|
|
</itemizedlist>
|
|
</blockquote>
|
|
|
|
<para>The first part shows the total spacetime due to heap
|
|
allocations, and the places in the program where most memory was
|
|
allocated (Nb: if this program had been compiled with
|
|
<computeroutput>-g</computeroutput>, actual line numbers would be
|
|
given). These places are sorted, from most significant to least,
|
|
and correspond to the bands seen in the graph. Insignificant
|
|
sites (accounting for less than 0.5% of total spacetime) are
|
|
omitted.</para>
|
|
|
|
<para>That alone can be useful, but often isn't enough. What if
|
|
one of these functions was called from several different places
|
|
in the program? Which one of these is responsible for most of
|
|
the memory used? For
|
|
<computeroutput>_nl_intern_locale_data()</computeroutput>, this
|
|
question is answered by clicking on the
|
|
<ulink url="#b401767D1">22.1%</ulink> link, which takes us to the
|
|
following part of the file:</para>
|
|
|
|
<blockquote id="b401767D1">
|
|
<literallayout>== 1 ===========================</literallayout>
|
|
|
|
<para>Context accounted for <ulink url="#a401767D1">22.1%</ulink>
|
|
of measured spacetime</para>
|
|
|
|
<para><computeroutput> 0x401767D0: _nl_intern_locale_data (in
|
|
/lib/i686/libc-2.3.2.so)</computeroutput></para>
|
|
|
|
<para>Called from:</para>
|
|
<itemizedlist>
|
|
<listitem id="a40176F96"><para>
|
|
<ulink url="#b40176F96">22.1%</ulink>: 0x40176F95:
|
|
_nl_load_locale_from_archive (in
|
|
/lib/i686/libc-2.3.2.so)</para>
|
|
</listitem>
|
|
</itemizedlist>
|
|
</blockquote>
|
|
|
|
<para>At this level, we can see all the places from which
|
|
<computeroutput>_nl_load_locale_from_archive()</computeroutput>
|
|
was called such that it allocated memory at 0x401767D0. (We can
|
|
click on the top <ulink url="#a40176F96">22.1%</ulink> link to go back
|
|
to the parent entry.) At this level, we have moved beyond the
|
|
information presented in the graph. In this case, it is only
|
|
called from one place. We can again follow the link for more
|
|
detail, moving to the following part of the file.</para>
|
|
|
|
<blockquote>
|
|
<literallayout>== 2 ===========================</literallayout>
|
|
<para id="b40176F96">
|
|
Context accounted for <ulink url="#a40176F96">22.1%</ulink> of
|
|
measured spacetime</para>
|
|
|
|
<para><computeroutput> 0x401767D0: _nl_intern_locale_data (in
|
|
/lib/i686/libc-2.3.2.so)</computeroutput> <computeroutput>
|
|
0x40176F95: _nl_load_locale_from_archive (in
|
|
/lib/i686/libc-2.3.2.so)</computeroutput></para>
|
|
|
|
<para>Called from:</para>
|
|
<itemizedlist>
|
|
<listitem id="a40176185">
|
|
<para>22.1%: 0x40176184: _nl_find_locale (in
|
|
/lib/i686/libc-2.3.2.so)</para>
|
|
</listitem>
|
|
</itemizedlist>
|
|
</blockquote>
|
|
|
|
<para>In this way we can dig deeper into the call stack, to work
|
|
out exactly what sequence of calls led to some memory being
|
|
allocated. At this point, with a call depth of 3, the
|
|
information runs out (thus the address of the child entry,
|
|
0x40176184, isn't a link). We could rerun the program with a
|
|
greater <computeroutput>--depth</computeroutput> value if we
|
|
wanted more information.</para>
|
|
|
|
<para>Sometimes you will get a code location like this:</para>
|
|
<programlisting><![CDATA[
|
|
30.8% : 0xFFFFFFFF: ???]]></programlisting>
|
|
|
|
<para>The code address isn't really 0xFFFFFFFF -- that's
|
|
impossible. This is what Massif does when it can't work out what
|
|
the real code address is.</para>
|
|
|
|
<para>Massif produces this information in a plain text file by
|
|
default, or HTML with the
|
|
<computeroutput>--format=html</computeroutput> option. The plain
|
|
text version obviously doesn't have the links, but a similar
|
|
effect can be achieved by searching on the code addresses. (In
|
|
Vim, the '*' and '#' searches are ideal for this.)</para>
|
|
|
|
|
|
<sect2 id="ms-manual.accuracy" xreflabel="Accuracy">
|
|
<title>Accuracy</title>
|
|
|
|
<para>The information should be pretty accurate. Some
|
|
approximations made might cause some allocation contexts to be
|
|
attributed with less memory than they actually allocated, but the
|
|
amounts should be miniscule.</para>
|
|
|
|
<para>The heap admin spacetime figure is an approximation, as
|
|
described above. If anyone knows how to improve its accuracy,
|
|
please let us know.</para>
|
|
|
|
</sect2>
|
|
|
|
</sect1>
|
|
|
|
|
|
<sect1 id="ms-manual.options" xreflabel="Massif Options">
|
|
<title>Massif Options</title>
|
|
|
|
<para>Massif-specific options are:</para>
|
|
|
|
<!-- start of xi:include in the manpage -->
|
|
<variablelist id="ms.opts.list">
|
|
|
|
<varlistentry id="opt.heap" xreflabel="--heap">
|
|
<term>
|
|
<option><![CDATA[--heap=<yes|no> [default: yes] ]]></option>
|
|
</term>
|
|
<listitem>
|
|
<para>When enabled, profile heap usage in detail. Without it, the
|
|
<filename>massif.pid.txt</filename> or
|
|
<filename>massif.pid.html</filename> will be very short.</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
<varlistentry id="opt.heap-admin" xreflabel="--heap-admin">
|
|
<term>
|
|
<option><![CDATA[--heap-admin=<number> [default: 8] ]]></option>
|
|
</term>
|
|
<listitem>
|
|
<para>The number of admin bytes per block to use. This can only
|
|
be an estimate of the average, since it may vary. The allocator
|
|
used by <computeroutput>glibc</computeroutput> requires somewhere
|
|
between 4 to 15 bytes per block, depending on various factors. It
|
|
also requires admin space for freed blocks, although
|
|
<constant>massif</constant> does not count this.</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
<varlistentry id="opt.stacks" xreflabel="--stacks">
|
|
<term>
|
|
<option><![CDATA[--stacks=<yes|no> [default: yes] ]]></option>
|
|
</term>
|
|
<listitem>
|
|
<para>When enabled, include stack(s) in the profile. Threaded
|
|
programs can have multiple stacks.</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
<varlistentry id="opt.depth" xreflabel="--depth">
|
|
<term>
|
|
<option><![CDATA[--depth=<number> [default: 3] ]]></option>
|
|
</term>
|
|
<listitem>
|
|
<para>Depth of call chains to present in the detailed heap
|
|
information. Increasing it will give more information, but
|
|
<constant>massif</constant> will run the program more slowly,
|
|
using more memory, and produce a bigger
|
|
<filename>massif.pid.txt</filename> or
|
|
<filename>massif.pid.hp</filename> file.</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
<varlistentry id="opt.alloc-fn" xreflabel="--alloc-fn">
|
|
<term>
|
|
<option><![CDATA[--alloc-fn=<name> ]]></option>
|
|
</term>
|
|
<listitem>
|
|
<para>Specify a function that allocates memory. This is useful
|
|
for functions that are wrappers to <function>malloc()</function>,
|
|
which can fill up the context information uselessly (and give very
|
|
uninformative bands on the graph). Functions specified will be
|
|
ignored in contexts, i.e. treated as though they were
|
|
<function>malloc()</function>. This option can be specified
|
|
multiple times on the command line, to name multiple
|
|
functions.</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
<varlistentry id="opt.format" xreflabel="--format">
|
|
<term>
|
|
<option><![CDATA[--format=<text|html> [default: text] ]]></option>
|
|
</term>
|
|
<listitem>
|
|
<para>Produce the detailed heap information in text or HTML
|
|
format. The file suffix used will be either
|
|
<filename>.txt</filename> or <filename>.html</filename>.</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
</variablelist>
|
|
<!-- end of xi:include in the manpage -->
|
|
|
|
</sect1>
|
|
|
|
</chapter>
|