<html>
<head>
<title>Massif: a heap profiler</title>
</head>

<body>
<a name="ms-top"></a>
<h2>7 <b>Massif</b>: a heap profiler</h2>

To use this tool, you must specify <code>--tool=massif</code>
on the Valgrind command line.

<a name="spaceprof"></a>
<h3>7.1 Heap profiling</h3>
Massif is a heap profiler, i.e. it measures how much heap memory programs use.
In particular, it can give you information about:
<ul>
<li>Heap blocks;
<li>Heap administration blocks;
<li>Stack sizes.
</ul>

Heap profiling is useful to help you reduce the amount of memory your program
uses. On modern machines with virtual memory, this provides the following
benefits:
<ul>
<li>It can speed up your program -- a smaller program will interact better
    with your machine's caches, avoid paging, and so on.

<li>If your program uses lots of memory, it will reduce the chance that it
    exhausts your machine's swap space.
</ul>

Also, there are certain space leaks that aren't detected by traditional
leak-checkers, such as Memcheck's. That's because the memory isn't ever
actually lost -- a pointer remains to it -- but it's not in use. Programs
that have leaks like this can unnecessarily increase the amount of memory
they are using over time.
<p>


<a name="whyuse_heapprof"></a>
<h3>7.2 Why Use a Heap Profiler?</h3>

Everybody knows how useful time profilers are for speeding up programs. They
are particularly useful because people are notoriously bad at predicting where
the bottlenecks in their programs are.
<p>
But the story is different for heap profilers. Some programming languages,
particularly lazy functional languages like <a
href="http://www.haskell.org">Haskell</a>, have quite sophisticated heap
profilers. But there are few tools as powerful for profiling C and C++
programs.
<p>
Why is this? Maybe it's because C and C++ programmers think that
they know where the memory is being allocated. After all, you can see all the
calls to <code>malloc()</code> and <code>new</code> and <code>new[]</code>,
right? But, in a big program, do you really know which heap allocations are
being executed, how many times, and how large each allocation is? Can you give
even a vague estimate of the memory footprint for your program? Do you know
this for all the libraries your program uses? What about administration bytes
required by the heap allocator to track heap blocks -- have you thought about
them? What about the stack? If you are unsure about any of these things,
maybe you should think about heap profiling.
<p>
Massif can tell you these things.
<p>
Or maybe it's because it's relatively easy to add basic heap profiling
functionality into a program, to tell you how many bytes you have allocated for
certain objects, or similar. But this information is often limited to simple
figures, such as total counts for the whole program's execution. What about
space usage at different points in the program's execution, for example? And
reimplementing heap profiling code for each project is a pain.
<p>
Massif can save you this effort.
<p>


<a name="overview"></a>
<h3>7.3 Overview</h3>
First off, as for normal Valgrind use, you probably want to compile with
debugging info (the <code>-g</code> flag). But, as opposed to Memcheck,
you probably <b>do</b> want to turn optimisation on, since you should profile
your program as it will be normally run.
<p>
Then, run your program with <code>valgrind --tool=massif</code> in front of the
normal command line invocation. When the program finishes, Massif will print
summary space statistics. It also creates a graph representing the program's
heap usage in a file called <code>massif.<i>pid</i>.ps</code>, which can
be read by any PostScript viewer, such as Ghostview.
<p>
It also puts detailed information about heap consumption in a file called
<code>massif.<i>pid</i>.txt</code> (text format) or
<code>massif.<i>pid</i>.html</code> (HTML format), where
<code><i>pid</i></code> is the program's process id.
<p>


<a name="basicresults"></a>
<h3>7.4 Basic Results of Profiling</h3>

To gather heap profiling information about the program <code>prog</code>,
type:
<p>
<blockquote>
<code>valgrind --tool=massif prog</code>
</blockquote>
<p>
The program will execute (slowly). Upon completion, summary statistics
that look like this will be printed:

<pre>
==27519== Total spacetime: 2,258,106 ms.B
==27519== heap: 24.0%
==27519== heap admin: 2.2%
==27519== stack(s): 73.7%
</pre>

All measurements are done in <i>spacetime</i>, i.e. space (in bytes) multiplied
by time (in milliseconds). Note that because Massif slows a program down a
lot, the actual spacetime figure is fairly meaningless; it's the relative
values that are interesting.
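<p>
To make the definition concrete, here is a minimal Python sketch of how a
spacetime total and a percentage breakdown could be computed from a series of
memory samples. It is not Massif's actual code: the function names, the sample
data, and the assumption that each sample's size holds until the next sample
are all illustrative.

```python
def spacetime(samples):
    """Approximate spacetime (ms.B) from (time_ms, size_bytes) samples.

    Assumption for this sketch: each sample's size stays constant until
    the next sample is taken, so the total is a simple sum of rectangles.
    """
    total = 0
    for (t0, size), (t1, _) in zip(samples, samples[1:]):
        total += size * (t1 - t0)
    return total


def breakdown(parts):
    """Turn absolute spacetime figures into percentages of their sum."""
    total = sum(parts.values())
    return {name: 100.0 * value / total for name, value in parts.items()}


# Hypothetical samples: the heap grows and shrinks, the stack stays flat.
heap = spacetime([(0, 0), (10, 4000), (30, 1000), (50, 0)])
stack = spacetime([(0, 2000), (50, 2000)])
shares = breakdown({"heap": heap, "stack(s)": stack})
print(heap, stack, shares)
```

If the same program ran twice as slowly, both totals would double, but the
percentages would stay the same, which is why only the relative values matter.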
<p>
Which entries you see in the breakdown depends on the command line options
given. The above example measures all the possible parts of memory:
<ul>
<li>Heap: number of words allocated on the heap, via <code>malloc()</code>,
    <code>new</code> and <code>new[]</code>.
    <p>
<li>Heap admin: each heap block allocated requires some administration data,
    which lets the allocator track certain things about the block. It is easy
    to forget about this, and if your program allocates lots of small blocks,
    it can add up. This value is an estimate of the space required for this
    administration data.
    <p>
<li>Stack(s): the spacetime used by the program's stack(s). (Threaded programs
    can have multiple stacks.) This includes signal handler stacks.
    <p>
</ul>
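<p>
The point about small blocks can be checked with a little arithmetic. The
Python sketch below assumes a flat per-block cost of 8 bytes, which is the
default for Massif's <code>--heap-admin</code> option; the function names are
made up for this illustration, and real allocators vary.

```python
def heap_admin_bytes(num_blocks, admin_per_block=8):
    """Estimated admin bytes for num_blocks live heap blocks."""
    return num_blocks * admin_per_block


def admin_share(block_size, admin_per_block=8):
    """Percentage of each block's total footprint that is admin data."""
    return 100.0 * admin_per_block / (block_size + admin_per_block)


# A million live blocks cost about 8 MB of admin data under this estimate.
print(heap_admin_bytes(1_000_000))
# For 16-byte blocks, a third of the footprint is overhead ...
print(round(admin_share(16), 1))
# ... whereas for 1 KiB blocks it is well under one percent.
print(round(admin_share(1024), 2))
```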
<p>


<a name="graphs"></a>
<h3>7.5 Spacetime Graphs</h3>
As well as printing summary information, Massif also creates a file
representing a spacetime graph, <code>massif.<i>pid</i>.ps</code>, which can be
viewed in a PostScript viewer.
<p>
Massif uses a program called <code>hp2ps</code> to convert the raw data into
the PostScript graph. It's distributed with Massif, but came originally
from the <a href="http://haskell.cs.yale.edu/ghc/">Glasgow Haskell
Compiler</a>. You shouldn't need to worry about this at all. However, if
the graph creation fails for any reason, Massif will tell you, and will leave
behind a file named <code>massif.<i>pid</i>.hp</code>, containing the raw
heap profiling data.
<p>
Here's an example graph:<br>
<img src="date.gif" alt="spacetime graph">
<p>
The graph is broken into several bands. Most bands represent a single line of
your program that does some heap allocation; each such band represents all
the allocations and deallocations done from that line. Up to twenty bands are
shown; less significant allocation sites are merged into "other" and/or "OTHER"
bands. The accompanying text/HTML file produced by Massif has more detail
about these heap allocation bands. Then there are single bands for the
stack(s) and heap admin bytes.
<p>
Note: it's the height of a band that's important. Don't let the ups and downs
caused by other bands confuse you. For example, the
<code>read_alias_file</code> band in the example has the same height all the
time it's in existence.
<p>
The triangles on the x-axis show each point at which a memory census was taken.
These aren't necessarily evenly spread; Massif only takes a census when
memory is allocated or deallocated. The time on the x-axis is wallclock
time, which is not ideal because you can get different graphs for different
executions of the same program, due to random OS delays. But it's not too
bad, and it becomes less of a problem the longer a program runs.
<p>
Massif takes censuses at an appropriate timescale; censuses take place less
frequently as the program runs for longer. There is no point having more
than 100-200 censuses on a single graph.
<p>
The graphs give a good overview of where your program's space use comes from,
and how that varies over time. The accompanying text/HTML file gives a lot
more information about heap use.

<a name="detailsofheap"></a>
<h3>7.6 Details of Heap Allocations</h3>

The text/HTML file contains information to help interpret the heap bands of the
graph. It also contains a lot of extra information about heap allocations that
you don't see in the graph.
<p>
Here's part of the information that accompanies the above graph.

<hr>
== 0 ===========================<br>
Heap allocation functions accounted for 50.8% of measured spacetime<br>
<p>
Called from:
<ul>
<li><a name="a401767D1"></a><a href="#b401767D1">22.1%</a>: 0x401767D0: _nl_intern_locale_data (in /lib/i686/libc-2.3.2.so)
<li><a name="a4017C394"></a><a href="#b4017C394"> 8.6%</a>: 0x4017C393: read_alias_file (in /lib/i686/libc-2.3.2.so)

<li><i>(several entries omitted)</i>

<li>and 6 other insignificant places</li>
</ul>
<hr>
The first part shows the total spacetime due to heap allocations, and the
places in the program where most memory was allocated (nb: if this program had
been compiled with <code>-g</code>, actual line numbers would be given). These
places are sorted, from most significant to least, and correspond to the bands
seen in the graph. Insignificant sites (accounting for less than 0.5% of total
spacetime) are omitted.
<p>
That alone can be useful, but often isn't enough. What if one of these
functions was called from several different places in the program? Which one
of these is responsible for most of the memory used? For
<code>_nl_intern_locale_data()</code>, this question is answered by clicking on
the <a href="#b401767D1">22.1%</a> link, which takes us to the following part
of the file.

<hr>
<p>== 1 ===========================<br>
<a name="b401767D1"></a>Context accounted for <a href="#a401767D1">22.1%</a> of measured spacetime<br>
 0x401767D0: _nl_intern_locale_data (in /lib/i686/libc-2.3.2.so)<br>
<p>
Called from:
<ul>
<li><a name="a40176F96"></a><a href="#b40176F96">22.1%</a>: 0x40176F95: _nl_load_locale_from_archive (in /lib/i686/libc-2.3.2.so)
</ul>
<hr>

At this level, we can see all the places from which
<code>_nl_intern_locale_data()</code> was called such that it allocated
memory at 0x401767D0. (We can click on the top <a href="#a40176F96">22.1%</a>
link to go back to the parent entry.) At this level, we have moved beyond the
information presented in the graph. In this case, it is only called from one
place, <code>_nl_load_locale_from_archive()</code>. We can again follow the
link for more detail, moving to the following part of the file.

<hr>
<p>== 2 ===========================<br>
<a name="b40176F96"></a>Context accounted for <a href="#a40176F96">22.1%</a> of measured spacetime<br>
 0x401767D0: _nl_intern_locale_data (in /lib/i686/libc-2.3.2.so)<br>
 0x40176F95: _nl_load_locale_from_archive (in /lib/i686/libc-2.3.2.so)<br>
<p>
Called from:
<ul>
<li><a name="a40176185"></a>22.1%: 0x40176184: _nl_find_locale (in /lib/i686/libc-2.3.2.so)
</ul>
<hr>

In this way we can dig deeper into the call stack, to work out exactly what
sequence of calls led to some memory being allocated. At this point, with a
call depth of 3, the information runs out (thus the percentage for the child
entry at 0x40176184 isn't a link). We could rerun the program with a greater
<code>--depth</code> value if we wanted more information.
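<p>
The way these nested contexts add up can be sketched in a few lines of Python.
This only illustrates the idea of depth-limited call-chain aggregation; it is
not how Massif is implemented. The function <code>contexts_by_depth</code>,
the spacetime figures, and the callers <code>hypothetical_caller</code> and
<code>other_caller</code> are invented for the example.

```python
from collections import defaultdict


def contexts_by_depth(allocations, depth=3):
    """Sum spacetime for every call-chain prefix up to `depth` frames.

    Each allocation is (chain, spacetime), where chain lists function
    names starting with the function that called the allocator.
    """
    totals = defaultdict(int)
    for chain, amount in allocations:
        for d in range(1, min(depth, len(chain)) + 1):
            totals[tuple(chain[:d])] += amount
    return dict(totals)


allocations = [
    (["_nl_intern_locale_data", "_nl_load_locale_from_archive",
      "_nl_find_locale", "hypothetical_caller"], 20),
    (["_nl_intern_locale_data", "other_caller"], 10),
]
totals = contexts_by_depth(allocations, depth=3)
for context, amount in sorted(totals.items(), key=lambda kv: -kv[1]):
    print(amount, " <- ".join(context))
```

With <code>depth=3</code>, the fourth frame never appears in any context, just
as the listing above stops at <code>_nl_find_locale</code>. Raising the depth
creates more, longer contexts, which is why it costs more time and memory.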
<p>
Sometimes you will get a code location like this:
<ul>
<li>30.8% : 0xFFFFFFFF: ???
</ul>
The code address isn't really 0xFFFFFFFF -- that's impossible. This is what
Massif does when it can't work out what the real code address is.
<p>
Massif produces this information in a plain text file by default, or HTML with
the <code>--format=html</code> option. The plain text version obviously
doesn't have the links, but a similar effect can be achieved by searching on
the code addresses. (In Vim, the '*' and '#' searches are ideal for this.)


<a name="massifoptions"></a>
<h3>7.7 Massif options</h3>

Massif-specific options are:

<ul>
<li><code>--heap=no</code><br>
    <code>--heap=yes</code> [default]<br>
    When enabled, profile heap usage in detail. Without it, the
    <code>massif.<i>pid</i>.txt</code> or
    <code>massif.<i>pid</i>.html</code> will be very short.
    <p>
<li><code>--heap-admin=<i>n</i></code> [default: 8]<br>
    The number of admin bytes per block to use. This can only be an
    estimate of the average, since it may vary. The allocator used by
    <code>glibc</code> requires somewhere between 4 and 15 bytes per block,
    depending on various factors. It also requires admin space for freed
    blocks, although Massif does not count this.
    <p>
<li><code>--stacks=no</code><br>
    <code>--stacks=yes</code> [default]<br>
    When enabled, include stack(s) in the profile. Threaded programs can
    have multiple stacks.
    <p>
<li><code>--depth=<i>n</i></code> [default: 3]<br>
    Depth of call chains to present in the detailed heap information.
    Increasing it will give more information, but Massif will run the program
    more slowly, using more memory, and produce a bigger
    <code>.txt</code>/<code>.hp</code> file.
    <p>
<li><code>--alloc-fn=<i>name</i></code><br>
    Specify a function that allocates memory. This is useful for functions
    that are wrappers to <code>malloc()</code>, which can fill up the context
    information uselessly (and give very uninformative bands on the graph).
    Functions specified will be ignored in contexts, i.e. treated as though
    they were <code>malloc()</code>. This option can be specified multiple
    times on the command line, to name multiple functions.
    <p>
<li><code>--format=text</code> [default]<br>
    <code>--format=html</code><br>
    Produce the detailed heap information in text or HTML format. The file
    suffix used will be either <code>.txt</code> or <code>.html</code>.
    <p>
</ul>
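<p>
The effect of <code>--alloc-fn</code> on allocation contexts can be pictured
with a small Python sketch. It is only a model of the behaviour described
above, not Massif's code; <code>attribute_site</code>, <code>xmalloc</code>
and <code>parse_line</code> are hypothetical names chosen for the example.

```python
def attribute_site(stack, alloc_fns=("malloc",)):
    """Drop leading allocator frames from a call stack.

    `stack` lists function names, innermost frame first. Frames named in
    `alloc_fns` are skipped, so the allocation is charged to the first
    caller that is not itself treated as an allocation function.
    """
    skip = set(alloc_fns)
    i = 0
    while i < len(stack) and stack[i] in skip:
        i += 1
    return stack[i:]


stack = ["malloc", "xmalloc", "parse_line", "main"]

# By default, every allocation made through the wrapper is charged to it.
print(attribute_site(stack))
# Naming the wrapper as an allocation function exposes the real callers.
print(attribute_site(stack, alloc_fns=("malloc", "xmalloc")))
```

In the first case every call site collapses into a single uninformative
<code>xmalloc</code> band; in the second, each real caller gets its own
context.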

<a name="accuracy"></a>
<h3>7.8 Accuracy</h3>
The information should be pretty accurate. Some approximations made might
cause some allocation contexts to be attributed with less memory than they
actually allocated, but the amounts should be minuscule.
<p>
The heap admin spacetime figure is an approximation, as described above. If
anyone knows how to improve its accuracy, please let us know.

</body>
</html>