Overhaul DHAT.

This commit thoroughly overhauls DHAT, moving it out of the
"experimental" ghetto. It makes moderate changes to DHAT itself,
including dumping profiling data to a JSON format output file. It also
implements a new data viewer (as a web app, in dhat/dh_view.html).

The main benefits over the old DHAT are as follows.

- The separation of data collection and presentation means you can run a
  program once under DHAT and then sort the data in various ways. Also,
  full data is in the output file, and the viewer chooses what to omit.

- The data can be sorted in more ways than previously. Some of these
  sorts involve useful filters such as "short-lived" and "zero reads or
  zero writes".

- The tree structure view avoids the need to choose stack trace depth.
  This avoids both the problem of not enough depth (when records that
  should be distinct are combined, and may not contain enough
  information to be actionable) and the problem of too much depth (when
  records that should be combined are separated, making them seem less
  important than they really are).

- Byte and block measures are shown with a percentage relative to the
  global count, which helps gauge relative significance of different
  parts of the profile.

- Byte and block measures are also shown with an allocation rate
  (bytes and blocks per million instructions), which enables comparisons
  across multiple profiles, even if those profiles represent different
  workloads.

- Both global and per-node measurements are taken at the global heap
  peak ("At t-gmax"), which gives Massif-like insight into the point of
  peak memory use.

- The final/lifetimes stats are a bit more useful than the old deaths
  stats. (E.g. the old deaths stats didn't take into account lifetimes
  of unfreed blocks.)

- The handling of realloc() has changed. The sequence `p = malloc(100);
  realloc(p, 200);` now increases the total block count by 2 and the
  total byte count by 300. Previously it increased them by 1 and 200.
  The new handling is a more operational view that better reflects the
  effect of allocations on performance. It makes a significant
  difference in the results, giving paths involving reallocation (e.g.
  repeated pushing to a growing vector) more prominence.
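
As a rough illustration of the realloc() accounting described above, the
following JavaScript sketch tallies blocks and bytes under both schemes. The
event format and the function names (tallyOld, tallyNew) are hypothetical,
invented for this example; they are not taken from dh_main.c, and the old
scheme is modelled only closely enough to reproduce the totals quoted above.

```javascript
// Hypothetical event trace for: p = malloc(100); realloc(p, 200);
const trace = [
  { op: "malloc", size: 100 },
  { op: "realloc", oldSize: 100, newSize: 200 },
];

// New accounting: a realloc() is counted as a fresh allocation of newSize.
function tallyNew(events) {
  let blocks = 0;
  let bytes = 0;
  for (const e of events) {
    blocks += 1;
    bytes += e.op === "malloc" ? e.size : e.newSize;
  }
  return { blocks, bytes };
}

// Old accounting (assumed model): a realloc() adds no block and adjusts the
// byte total by the size delta.
function tallyOld(events) {
  let blocks = 0;
  let bytes = 0;
  for (const e of events) {
    if (e.op === "malloc") {
      blocks += 1;
      bytes += e.size;
    } else {
      bytes += e.newSize - e.oldSize;
    }
  }
  return { blocks, bytes };
}

console.log(tallyOld(trace)); // { blocks: 1, bytes: 200 }
console.log(tallyNew(trace)); // { blocks: 2, bytes: 300 }
```

Under the new scheme every step of a growing vector adds a block and its full
size, which is why reallocation-heavy paths gain prominence.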

Other things of note:

- There is now testing, both regression tests that run within the
  standard test suite, and viewer-specific tests that cannot run within
  the standard test suite. The latter are run by loading
  dh_view.html?test=1 in a web browser.

- The commit puts all tool lists in Makefiles (and similar files) in the
  following consistent order: memcheck, cachegrind, callgrind, helgrind,
  drd, massif, dhat, lackey, none; exp-sgcheck, exp-bbv.

- A lot of fields in dh_main.c have been given more descriptive names.
  Those names now match those used in dh_view.js.
Nicholas Nethercote 2018-10-04 11:00:22 +10:00
parent b19f6882cf
commit 441bfc5f51
45 changed files with 5737 additions and 864 deletions

44
.gitignore vendored

@ -246,6 +246,34 @@
/coregrind/m_ume/.deps
/coregrind/m_ume/.dirstamp
# /dhat/
/dhat/*.dSYM
/dhat/.deps
/dhat/dhat-*-darwin
/dhat/dhat-*-linux
/dhat/dhat-*-solaris
/dhat/Makefile
/dhat/Makefile.in
/dhat/vgpreload_dhat-*-linux.so
/dhat/vgpreload_dhat-*-darwin.so
/dhat/vgpreload_dhat-*-solaris.so
# /dhat/tests/
/dhat/tests/Makefile
/dhat/tests/Makefile.in
/dhat/tests/*.dSYM
/dhat/tests/*.so
/dhat/tests/*.stderr.diff*
/dhat/tests/*.stderr.out
/dhat/tests/*.stdout.diff*
/dhat/tests/*.stdout.out
/dhat/tests/.deps
/dhat/tests/acc
/dhat/tests/basic
/dhat/tests/big
/dhat/tests/empty
/dhat/tests/single
# /docs/
/docs/FAQ.txt
/docs/html
@ -496,22 +524,6 @@
/exp-bbv/tests/x86-linux/Makefile
/exp-bbv/tests/x86-linux/Makefile.in
# /exp-dhat/
/exp-dhat/*.dSYM
/exp-dhat/.deps
/exp-dhat/exp-dhat-*-darwin
/exp-dhat/exp-dhat-*-linux
/exp-dhat/exp-dhat-*-solaris
/exp-dhat/Makefile
/exp-dhat/Makefile.in
/exp-dhat/vgpreload_exp-dhat-*-linux.so
/exp-dhat/vgpreload_exp-dhat-*-darwin.so
/exp-dhat/vgpreload_exp-dhat-*-solaris.so
# /exp-dhat/tests/
/exp-dhat/tests/Makefile
/exp-dhat/tests/Makefile.in
# /exp-sgcheck/
/exp-sgcheck/*.dSYM
/exp-sgcheck/.deps


@ -6,15 +6,15 @@ include $(top_srcdir)/Makefile.all.am
TOOLS = memcheck \
cachegrind \
callgrind \
massif \
lackey \
none \
helgrind \
drd
drd \
massif \
dhat \
lackey \
none
EXP_TOOLS = exp-sgcheck \
exp-bbv \
exp-dhat
exp-bbv
# Put docs last because building the HTML is slow and we want to get
# everything else working before we try it.

16
NEWS

@ -20,6 +20,20 @@ support for X86/macOS 10.13, AMD64/macOS 10.13.
* ==================== TOOL CHANGES ====================
* DHAT:
- DHAT has been thoroughly overhauled and improved. As a result, it has been
promoted from an experimental tool to a regular tool. Run it with
--tool=dhat instead of --tool=exp-dhat.
- DHAT now prints only minimal data when the program ends, instead writing
the bulk of the profiling data to a file. As a result, the --show-top-n and
--sort-by options have been removed.
- Data files can be viewed with the new viewer, dh_view.html.
- See the documentation for more details.
* Cachegrind:
- cg_annotate has a new option, --show-percs, which prints percentages next
@ -94,6 +108,8 @@ n-i-bz Fix callgrind_annotate non deterministic order for equal total
n-i-bz callgrind_annotate --threshold=100 does not print all functions.
n-i-bz callgrind_annotate Use of uninitialized value in numeric gt (>)
Release 3.14.0 (9 October 2018)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~


@ -4636,9 +4636,14 @@ AC_CONFIG_FILES([
callgrind/tests/Makefile
helgrind/Makefile
helgrind/tests/Makefile
drd/Makefile
drd/scripts/download-and-build-splash2
drd/tests/Makefile
massif/Makefile
massif/tests/Makefile
massif/ms_print
dhat/Makefile
dhat/tests/Makefile
lackey/Makefile
lackey/tests/Makefile
none/Makefile
@ -4664,9 +4669,6 @@ AC_CONFIG_FILES([
none/tests/x86-solaris/Makefile
exp-sgcheck/Makefile
exp-sgcheck/tests/Makefile
drd/Makefile
drd/scripts/download-and-build-splash2
drd/tests/Makefile
exp-bbv/Makefile
exp-bbv/tests/Makefile
exp-bbv/tests/x86/Makefile
@ -4674,8 +4676,6 @@ AC_CONFIG_FILES([
exp-bbv/tests/amd64-linux/Makefile
exp-bbv/tests/ppc32-linux/Makefile
exp-bbv/tests/arm-linux/Makefile
exp-dhat/Makefile
exp-dhat/tests/Makefile
shared/Makefile
solaris/Makefile
])


@ -7,7 +7,7 @@
This file is part of Valgrind, a dynamic binary instrumentation
framework.
Copyright (C) 2010-2017 Mozilla Inc
Copyright (C) 2010-2017 Mozilla Foundation
This program is free software; you can redistribute it and/or
modify it under the terms of the GNU General Public License as


@ -1454,7 +1454,7 @@ Int valgrind_main ( Int argc, HChar **argv, HChar **envp )
|| 0 == VG_(strcmp)(VG_(clo_toolname), "helgrind")
|| 0 == VG_(strcmp)(VG_(clo_toolname), "drd")
|| 0 == VG_(strcmp)(VG_(clo_toolname), "massif")
|| 0 == VG_(strcmp)(VG_(clo_toolname), "exp-dhat")) {
|| 0 == VG_(strcmp)(VG_(clo_toolname), "dhat")) {
/* Change the default setting. Later on (just below)
main_process_cmd_line_options should pick up any
user-supplied setting for it and will override the default


@ -7,7 +7,7 @@
This file is part of Valgrind, a dynamic binary instrumentation
framework.
Copyright (C) 2010-2017 Mozilla Inc
Copyright (C) 2010-2017 Mozilla Foundation
This program is free software; you can redistribute it and/or
modify it under the terms of the GNU General Public License as


@ -11,89 +11,89 @@ EXTRA_DIST = docs/dh-manual.xml
#bin_SCRIPTS = dh_print
#----------------------------------------------------------------------------
# exp_dhat-<platform>
# dhat-<platform>
#----------------------------------------------------------------------------
noinst_PROGRAMS = exp-dhat-@VGCONF_ARCH_PRI@-@VGCONF_OS@
noinst_PROGRAMS = dhat-@VGCONF_ARCH_PRI@-@VGCONF_OS@
if VGCONF_HAVE_PLATFORM_SEC
noinst_PROGRAMS += exp-dhat-@VGCONF_ARCH_SEC@-@VGCONF_OS@
noinst_PROGRAMS += dhat-@VGCONF_ARCH_SEC@-@VGCONF_OS@
endif
EXP_DHAT_SOURCES_COMMON = dh_main.c
exp_dhat_@VGCONF_ARCH_PRI@_@VGCONF_OS@_SOURCES = \
dhat_@VGCONF_ARCH_PRI@_@VGCONF_OS@_SOURCES = \
$(EXP_DHAT_SOURCES_COMMON)
exp_dhat_@VGCONF_ARCH_PRI@_@VGCONF_OS@_CPPFLAGS = \
dhat_@VGCONF_ARCH_PRI@_@VGCONF_OS@_CPPFLAGS = \
$(AM_CPPFLAGS_@VGCONF_PLATFORM_PRI_CAPS@)
exp_dhat_@VGCONF_ARCH_PRI@_@VGCONF_OS@_CFLAGS = $(LTO_CFLAGS) \
dhat_@VGCONF_ARCH_PRI@_@VGCONF_OS@_CFLAGS = $(LTO_CFLAGS) \
$(AM_CFLAGS_@VGCONF_PLATFORM_PRI_CAPS@)
exp_dhat_@VGCONF_ARCH_PRI@_@VGCONF_OS@_DEPENDENCIES = \
dhat_@VGCONF_ARCH_PRI@_@VGCONF_OS@_DEPENDENCIES = \
$(TOOL_DEPENDENCIES_@VGCONF_PLATFORM_PRI_CAPS@)
exp_dhat_@VGCONF_ARCH_PRI@_@VGCONF_OS@_LDADD = \
dhat_@VGCONF_ARCH_PRI@_@VGCONF_OS@_LDADD = \
$(TOOL_LDADD_@VGCONF_PLATFORM_PRI_CAPS@)
exp_dhat_@VGCONF_ARCH_PRI@_@VGCONF_OS@_LDFLAGS = \
dhat_@VGCONF_ARCH_PRI@_@VGCONF_OS@_LDFLAGS = \
$(TOOL_LDFLAGS_@VGCONF_PLATFORM_PRI_CAPS@)
exp_dhat_@VGCONF_ARCH_PRI@_@VGCONF_OS@_LINK = \
dhat_@VGCONF_ARCH_PRI@_@VGCONF_OS@_LINK = \
$(top_builddir)/coregrind/link_tool_exe_@VGCONF_OS@ \
@VALT_LOAD_ADDRESS_PRI@ \
$(LINK) \
$(exp_dhat_@VGCONF_ARCH_PRI@_@VGCONF_OS@_CFLAGS) \
$(exp_dhat_@VGCONF_ARCH_PRI@_@VGCONF_OS@_LDFLAGS)
$(dhat_@VGCONF_ARCH_PRI@_@VGCONF_OS@_CFLAGS) \
$(dhat_@VGCONF_ARCH_PRI@_@VGCONF_OS@_LDFLAGS)
if VGCONF_HAVE_PLATFORM_SEC
exp_dhat_@VGCONF_ARCH_SEC@_@VGCONF_OS@_SOURCES = \
dhat_@VGCONF_ARCH_SEC@_@VGCONF_OS@_SOURCES = \
$(EXP_DHAT_SOURCES_COMMON)
exp_dhat_@VGCONF_ARCH_SEC@_@VGCONF_OS@_CPPFLAGS = \
dhat_@VGCONF_ARCH_SEC@_@VGCONF_OS@_CPPFLAGS = \
$(AM_CPPFLAGS_@VGCONF_PLATFORM_SEC_CAPS@)
exp_dhat_@VGCONF_ARCH_SEC@_@VGCONF_OS@_CFLAGS = $(LTO_CFLAGS) \
dhat_@VGCONF_ARCH_SEC@_@VGCONF_OS@_CFLAGS = $(LTO_CFLAGS) \
$(AM_CFLAGS_@VGCONF_PLATFORM_SEC_CAPS@)
exp_dhat_@VGCONF_ARCH_SEC@_@VGCONF_OS@_DEPENDENCIES = \
dhat_@VGCONF_ARCH_SEC@_@VGCONF_OS@_DEPENDENCIES = \
$(TOOL_DEPENDENCIES_@VGCONF_PLATFORM_SEC_CAPS@)
exp_dhat_@VGCONF_ARCH_SEC@_@VGCONF_OS@_LDADD = \
dhat_@VGCONF_ARCH_SEC@_@VGCONF_OS@_LDADD = \
$(TOOL_LDADD_@VGCONF_PLATFORM_SEC_CAPS@)
exp_dhat_@VGCONF_ARCH_SEC@_@VGCONF_OS@_LDFLAGS = \
dhat_@VGCONF_ARCH_SEC@_@VGCONF_OS@_LDFLAGS = \
$(TOOL_LDFLAGS_@VGCONF_PLATFORM_SEC_CAPS@)
exp_dhat_@VGCONF_ARCH_SEC@_@VGCONF_OS@_LINK = \
dhat_@VGCONF_ARCH_SEC@_@VGCONF_OS@_LINK = \
$(top_builddir)/coregrind/link_tool_exe_@VGCONF_OS@ \
@VALT_LOAD_ADDRESS_SEC@ \
$(LINK) \
$(exp_dhat_@VGCONF_ARCH_SEC@_@VGCONF_OS@_CFLAGS) \
$(exp_dhat_@VGCONF_ARCH_SEC@_@VGCONF_OS@_LDFLAGS)
$(dhat_@VGCONF_ARCH_SEC@_@VGCONF_OS@_CFLAGS) \
$(dhat_@VGCONF_ARCH_SEC@_@VGCONF_OS@_LDFLAGS)
endif
#----------------------------------------------------------------------------
# vgpreload_exp_dhat-<platform>.so
# vgpreload_dhat-<platform>.so
#----------------------------------------------------------------------------
noinst_PROGRAMS += vgpreload_exp-dhat-@VGCONF_ARCH_PRI@-@VGCONF_OS@.so
noinst_PROGRAMS += vgpreload_dhat-@VGCONF_ARCH_PRI@-@VGCONF_OS@.so
if VGCONF_HAVE_PLATFORM_SEC
noinst_PROGRAMS += vgpreload_exp-dhat-@VGCONF_ARCH_SEC@-@VGCONF_OS@.so
noinst_PROGRAMS += vgpreload_dhat-@VGCONF_ARCH_SEC@-@VGCONF_OS@.so
endif
if VGCONF_OS_IS_DARWIN
noinst_DSYMS = $(noinst_PROGRAMS)
endif
vgpreload_exp_dhat_@VGCONF_ARCH_PRI@_@VGCONF_OS@_so_SOURCES =
vgpreload_exp_dhat_@VGCONF_ARCH_PRI@_@VGCONF_OS@_so_CPPFLAGS = \
vgpreload_dhat_@VGCONF_ARCH_PRI@_@VGCONF_OS@_so_SOURCES =
vgpreload_dhat_@VGCONF_ARCH_PRI@_@VGCONF_OS@_so_CPPFLAGS = \
$(AM_CPPFLAGS_@VGCONF_PLATFORM_PRI_CAPS@)
vgpreload_exp_dhat_@VGCONF_ARCH_PRI@_@VGCONF_OS@_so_CFLAGS = \
vgpreload_dhat_@VGCONF_ARCH_PRI@_@VGCONF_OS@_so_CFLAGS = \
$(AM_CFLAGS_PSO_@VGCONF_PLATFORM_PRI_CAPS@)
vgpreload_exp_dhat_@VGCONF_ARCH_PRI@_@VGCONF_OS@_so_DEPENDENCIES = \
vgpreload_dhat_@VGCONF_ARCH_PRI@_@VGCONF_OS@_so_DEPENDENCIES = \
$(LIBREPLACEMALLOC_@VGCONF_PLATFORM_PRI_CAPS@)
vgpreload_exp_dhat_@VGCONF_ARCH_PRI@_@VGCONF_OS@_so_LDFLAGS = \
vgpreload_dhat_@VGCONF_ARCH_PRI@_@VGCONF_OS@_so_LDFLAGS = \
$(PRELOAD_LDFLAGS_@VGCONF_PLATFORM_PRI_CAPS@) \
$(LIBREPLACEMALLOC_LDFLAGS_@VGCONF_PLATFORM_PRI_CAPS@)
if VGCONF_HAVE_PLATFORM_SEC
vgpreload_exp_dhat_@VGCONF_ARCH_SEC@_@VGCONF_OS@_so_SOURCES =
vgpreload_exp_dhat_@VGCONF_ARCH_SEC@_@VGCONF_OS@_so_CPPFLAGS = \
vgpreload_dhat_@VGCONF_ARCH_SEC@_@VGCONF_OS@_so_SOURCES =
vgpreload_dhat_@VGCONF_ARCH_SEC@_@VGCONF_OS@_so_CPPFLAGS = \
$(AM_CPPFLAGS_@VGCONF_PLATFORM_SEC_CAPS@)
vgpreload_exp_dhat_@VGCONF_ARCH_SEC@_@VGCONF_OS@_so_CFLAGS = \
vgpreload_dhat_@VGCONF_ARCH_SEC@_@VGCONF_OS@_so_CFLAGS = \
$(AM_CFLAGS_PSO_@VGCONF_PLATFORM_SEC_CAPS@)
vgpreload_exp_dhat_@VGCONF_ARCH_SEC@_@VGCONF_OS@_so_DEPENDENCIES = \
vgpreload_dhat_@VGCONF_ARCH_SEC@_@VGCONF_OS@_so_DEPENDENCIES = \
$(LIBREPLACEMALLOC_@VGCONF_PLATFORM_SEC_CAPS@)
vgpreload_exp_dhat_@VGCONF_ARCH_SEC@_@VGCONF_OS@_so_LDFLAGS = \
vgpreload_dhat_@VGCONF_ARCH_SEC@_@VGCONF_OS@_so_LDFLAGS = \
$(PRELOAD_LDFLAGS_@VGCONF_PLATFORM_SEC_CAPS@) \
$(LIBREPLACEMALLOC_LDFLAGS_@VGCONF_PLATFORM_SEC_CAPS@)
endif

File diff suppressed because it is too large

2553
dhat/dh_test.js Normal file

File diff suppressed because it is too large

130
dhat/dh_view.css Normal file

@ -0,0 +1,130 @@
/*--------------------------------------------------------------------*/
/*--- DHAT: a Dynamic Heap Analysis Tool dh_view.css ---*/
/*--------------------------------------------------------------------*/
/*
This file is part of DHAT, a Valgrind tool for profiling the
heap usage of programs.
Copyright (C) 2018 Mozilla Foundation
This program is free software; you can redistribute it and/or
modify it under the terms of the GNU General Public License as
published by the Free Software Foundation; either version 2 of the
License, or (at your option) any later version.
This program is distributed in the hope that it will be useful, but
WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
General Public License for more details.
You should have received a copy of the GNU General Public License
along with this program; if not, write to the Free Software
Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA
02111-1307, USA.
The GNU General Public License is contained in the file COPYING.
*/
html {
background: #cfcfcf; /* pale grey */
}
.section {
border-radius: 10px;
background-color: white;
padding: 1em;
margin: 0.5em 0;
}
div.header {
font-weight: bold;
display: inline-block;
margin: 0 1.5em 0 0;
border-radius: 10px;
padding: 0.5em;
background-color: #cfcfcf; /* pale grey */
-moz-user-select: none;
-webkit-user-select: none;
-ms-user-select: none;
user-select: none;
}
.hidden {
display: none;
}
.error {
color: red;
}
.invocation {
background-color: #bfd7d7; /* pale blue-grey */
}
.times {
background-color: #efdfbf; /* pale brown */
}
.arrow, .treeline {
background-color: white;
}
.internal {
cursor: pointer;
}
/* increasingly pale shades of green */
.leaf.lt100 { background-color: #7fff7f; }
.leaf.lt32 { background-color: #8fff8f; }
.leaf.lt16 { background-color: #9fff9f; }
.leaf.lt8 { background-color: #afffaf; }
.leaf.lt4 { background-color: #bfffbf; }
.leaf.lt2 { background-color: #cfffcf; }
.leaf.lt1 { background-color: #dfffdf; }
.leaf.insig { background-color: #efffef; }
/* increasingly pale shades of yellow */
.collapsed.lt100 { background-color: #ffff7f; }
.collapsed.lt32 { background-color: #ffff8f; }
.collapsed.lt16 { background-color: #ffff9f; }
.collapsed.lt8 { background-color: #ffffaf; }
.collapsed.lt4 { background-color: #ffffbf; }
.collapsed.lt2 { background-color: #ffffcf; }
.collapsed.lt1 { background-color: #ffffdf; }
.collapsed.insig { background-color: #ffffef; }
/* increasingly pale shades of blue */
.expanded.lt100 { background-color: #7f7fff; }
.expanded.lt32 { background-color: #8f8fff; }
.expanded.lt16 { background-color: #9f9fff; }
.expanded.lt8 { background-color: #afafff; }
.expanded.lt4 { background-color: #bfbfff; }
.expanded.lt2 { background-color: #cfcfff; }
.expanded.lt1 { background-color: #dfdfff; }
.expanded.insig { background-color: #efefff; }
.bold {
font-weight: bold;
}
.threshold {
background-color: #dfdfdf; /* pale grey */
}
.noselect {
-moz-user-select: none;
-webkit-user-select: none;
-ms-user-select: none;
user-select: none;
}
.legend, .timings {
font-size: 80%;
padding: 0 1em;
}
.debug {
font-size: 80%;
}

10
dhat/dh_view.html Normal file

@ -0,0 +1,10 @@
<!DOCTYPE html>
<html>
<head>
<meta charset="UTF-8">
<link rel="stylesheet" href="dh_view.css">
<script src="dh_view.js"></script>
</head>
<body onload="onLoad()"></body>
</html>

1445
dhat/dh_view.js Normal file

File diff suppressed because it is too large

654
dhat/docs/dh-manual.xml Normal file

@ -0,0 +1,654 @@
<?xml version="1.0"?> <!-- -*- sgml -*- -->
<!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN"
"http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd"
[ <!ENTITY % vg-entities SYSTEM "../../docs/xml/vg-entities.xml"> %vg-entities; ]>
<chapter id="dh-manual"
xreflabel="DHAT: a dynamic heap analysis tool">
<title>DHAT: a dynamic heap analysis tool</title>
<para>To use this tool, you must specify
<option>--tool=dhat</option> on the Valgrind command line.</para>
<sect1 id="dh-manual.overview" xreflabel="Overview">
<title>Overview</title>
<para>DHAT is a tool for examining how programs use their heap
allocations.</para>
<para>It tracks the allocated blocks, and inspects every memory access
to find which block, if any, it is to. It presents, on an allocation point
basis, information about these blocks such as sizes, lifetimes, numbers of
reads and writes, and read and write patterns.</para>
<para>Using this information it is possible to identify allocation points with
the following characteristics:</para>
<itemizedlist>
<listitem><para>potential process-lifetime leaks: blocks allocated
by the point just accumulate, and are freed only at the end of the
run.</para></listitem>
<listitem><para>excessive turnover: points which chew through a lot
of heap, even if it is not held onto for very long</para></listitem>
<listitem><para>excessively transient: points which allocate very
short lived blocks</para></listitem>
<listitem><para>useless or underused allocations: blocks which are
allocated but not completely filled in, or are filled in but not
subsequently read.</para></listitem>
<listitem><para>blocks with inefficient layout -- areas never
accessed, or with hot fields scattered throughout the
block.</para></listitem>
</itemizedlist>
<para>As with the Massif heap profiler, DHAT measures program progress
by counting instructions, and so presents all age/time related figures
as instruction counts. This sounds a little odd at first, but it
makes runs repeatable in a way which is not possible if CPU time is
used.</para>
</sect1>
<sect1 id="dh-manual.profile" xreflabel="Using DHAT">
<title>Using DHAT</title>
<para>First off, as for normal Valgrind use, you probably want to compile with
debugging info (the <option>-g</option> option). But by contrast with normal
Valgrind use, you probably do want to turn optimisation on, since you should
profile your program as it will be normally run.</para>
<para>Second, you need to run your program under DHAT to gather the profiling
information.</para>
<para>Finally, you need to use DHAT's viewer (in a web browser) to get a
detailed presentation of that information.</para>
<sect2 id="dh-manual.running-DHAT" xreflabel="Running DHAT">
<title>Running DHAT</title>
<para>To run DHAT on a program <filename>prog</filename>, run:</para>
<screen><![CDATA[
valgrind --tool=dhat prog
]]></screen>
<para>The program will execute (slowly). Upon completion, summary statistics
that look like this will be printed:</para>
<programlisting><![CDATA[
==11514== Total: 823,849,731 bytes in 3,929,133 blocks
==11514== At t-gmax: 133,485,082 bytes in 436,521 blocks
==11514== At t-end: 258,002 bytes in 2,129 blocks
==11514== Reads: 2,807,182,810 bytes
==11514== Writes: 1,149,617,086 bytes
]]></programlisting>
<para>The first line shows how many heap blocks and bytes were allocated over
the entire execution.</para>
<para>The second line shows how many heap blocks and bytes were alive at
<computeroutput>t-gmax</computeroutput>, i.e. the time when the heap size
reached its global maximum (as measured in bytes).</para>
<para>The third line shows how many heap blocks and bytes were alive at
<computeroutput>t-end</computeroutput>, i.e. the end of execution. In other
words, how many blocks and bytes were not explicitly freed. </para>
<para>The fourth and fifth lines show how many bytes within heap blocks were
read and written during the entire execution. </para>
<para>These lines are moderately interesting at best. More useful information
can be seen with DHAT's viewer.</para>
</sect2>
<sect2 id="dh-manual.outputfile" xreflabel="Output File">
<title>Output File</title>
<para>As well as printing summary information, DHAT also writes more detailed
profiling information to a file. By default this file is named
<filename>dhat.out.&lt;pid&gt;</filename> (where
<filename>&lt;pid&gt;</filename> is the program's process ID), but its name can
be changed with the <option>--dhat-out-file</option> option. This file is JSON,
and intended to be viewed by DHAT's viewer, which is described in the next
section.</para>
<para>The default <computeroutput>.&lt;pid&gt;</computeroutput> suffix on the
output file name serves two purposes. Firstly, it means you don't have to
rename old log files that you don't want to overwrite. Secondly, and more
importantly, it allows correct profiling with the
<option>--trace-children=yes</option> option of programs that spawn child
processes.</para>
<para>The output file can be big, many megabytes for large applications
built with full debugging information.</para>
</sect2>
</sect1>
<sect1 id="dh-manual.viewer" xreflabel="DHAT's viewer">
<title>DHAT's Viewer</title>
<para>DHAT's viewer can be run in a web browser by loading the file
<computeroutput>dh_view.html</computeroutput>. Use the "Load" button to choose
a DHAT output file to view.</para>
<sect2><title>The Output Header</title>
<para>The first part of the output shows the program command and process ID.
For example:</para>
<programlisting><![CDATA[
Invocation {
Command: /home/njn/moz/rust0/build/x86_64-unknown-linux-gnu/stage2/bin/rustc --crate-name tuple_stress src/main.rs
PID: 18816
}
]]></programlisting>
<para>The second part of the output shows the
<computeroutput>t-gmax</computeroutput> and
<computeroutput>t-end</computeroutput> values again. For example:</para>
<programlisting><![CDATA[
Times {
t-gmax: 8,138,210,673 instrs (86.92% of program duration)
t-end: 9,362,544,994 instrs
}
]]></programlisting>
</sect2>
<sect2><title>The AP Tree</title>
<para>The third part of the output is the largest and most interesting part,
showing the allocation point (AP) tree.</para>
<sect3><title>Structure</title>
<para>The following image shows a screenshot of part of an AP tree. The font is very
small because this screenshot is intended to demonstrate the high-level
structure of the tree rather than the details within the text.</para>
<graphic fileref="images/dh-tree.png" scalefit="1"/>
<para>Like any tree, it has a root node, leaf nodes, and non-leaf nodes. The
structure of the tree is shown by the lines connecting nodes. Child nodes are
beneath their parent and indented one level.</para>
<para>The sub-trees beneath a non-leaf node can be collapsed or expanded by
clicking on the node. It is useful to collapse sub-trees that you aren't
interested in.</para>
<para>Colours are meaningful, and are intended to ease tree navigation, but the
information they represent is also present within the text. (This means that
colour-blind users are not denied any information.)</para>
<para>Each leaf node is coloured green. Each non-leaf node is coloured blue
and has a down arrow (<computeroutput></computeroutput>) next to it when
its sub-tree is expanded. Each non-leaf node is coloured yellow and has a
left arrow (<computeroutput></computeroutput>) next to it when its sub-tree
is collapsed.</para>
<para>The shade of green, blue or yellow used for a node indicates its
significance. Darker shades represent greater significance (in terms of bytes
or blocks).</para>
<para>Note that the entire output is text, even the arrows and lines connecting
nodes. This means you can copy and paste any part of the output easily into an
email, bug report, etc.</para>
</sect3>
<sect3><title>The Root Node</title>
<para>The root node looks like this:</para>
<programlisting><![CDATA[
AP 1/1 (25 children) {
Total: 1,355,253,987 bytes (100%, 67,454.81/Minstr) in 5,943,417 blocks (100%, 295.82/Minstr), avg size 228.03 bytes, avg lifetime 3,134,692,250.67 instrs (15.6% of program duration)
At t-gmax: 423,930,307 bytes (100%) in 1,575,682 blocks (100%), avg size 269.05 bytes
At t-end: 258,002 bytes (100%) in 2,129 blocks (100%), avg size 121.18 bytes
Reads: 5,478,606,988 bytes (100%, 272,685.7/Minstr), 4.04/byte
Writes: 2,040,294,800 bytes (100%, 101,551.22/Minstr), 1.51/byte
Allocated at {
#0: [root]
}
}
]]></programlisting>
<para>The root node covers the entire execution. The information is a superset
of the information shown when DHAT ran, adding details such as allocation
rates, average block sizes, block lifetimes, and read and write ratios. The
next example will explain these in more detail.</para>
</sect3>
<sect3><title>Interior Nodes</title>
<para>AP nodes further down the tree show information about a subset of
allocations. For example:</para>
<programlisting><![CDATA[
AP 1.1/25 (2 children) {
Total: 54,533,440 bytes (4.02%, 2,714.28/Minstr) in 458,839 blocks (7.72%, 22.84/Minstr), avg size 118.85 bytes, avg lifetime 1,127,259,403.64 instrs (5.61% of program duration)
At t-gmax: 0 bytes (0%) in 0 blocks (0%), avg size 0 bytes
At t-end: 0 bytes (0%) in 0 blocks (0%), avg size 0 bytes
Reads: 15,993,012 bytes (0.29%, 796.02/Minstr), 0.29/byte
Writes: 20,974,752 bytes (1.03%, 1,043.97/Minstr), 0.38/byte
Allocated at {
#1: 0x95CACC9: alloc (alloc.rs:72)
#2: 0x95CACC9: alloc (alloc.rs:148)
#3: 0x95CACC9: reserve_internal<syntax::tokenstream::TokenStream,alloc::alloc::Global> (raw_vec.rs:669)
#4: 0x95CACC9: reserve<syntax::tokenstream::TokenStream,alloc::alloc::Global> (raw_vec.rs:492)
#5: 0x95CACC9: reserve<syntax::tokenstream::TokenStream> (vec.rs:460)
#6: 0x95CACC9: push<syntax::tokenstream::TokenStream> (vec.rs:989)
#7: 0x95CACC9: parse_token_trees_until_close_delim (tokentrees.rs:27)
#8: 0x95CACC9: syntax::parse::lexer::tokentrees::<impl syntax::parse::lexer::StringReader<'a>>::parse_token_tree (tokentrees.rs:81)
}
}
]]></programlisting>
<para>The first line indicates the node's position in the tree. The
<computeroutput>1.1</computeroutput> is a unique identifier for the node and
also says that it is the first child of node <computeroutput>1</computeroutput>
(which is the root). The <computeroutput>/25</computeroutput> says that it is
one of 25 children, i.e. it has 24 siblings. The <computeroutput>(2
children)</computeroutput> says that this node has two children of its
own.</para>
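<para>The labelling scheme just described can be decoded mechanically. The
following JavaScript sketch is illustrative only: the function
<computeroutput>parseApLabel</computeroutput> and the field names it returns
are hypothetical and are not part of DHAT's viewer.</para>
<programlisting><![CDATA[
```javascript
// Split an AP label such as "1.1/25" into its parts.
function parseApLabel(label) {
  const [id, siblingGroup] = label.split("/");
  const path = id.split(".").map(Number);
  return {
    id,                                     // unique identifier, e.g. "1.1"
    depth: path.length - 1,                 // 0 for the root
    parentId: path.length > 1 ? path.slice(0, -1).join(".") : null,
    indexAmongSiblings: path[path.length - 1],
    siblingGroupSize: Number(siblingGroup), // the node itself plus its siblings
  };
}

const node = parseApLabel("1.1/25");
console.log(node.parentId, node.indexAmongSiblings, node.siblingGroupSize); // 1 1 25
console.log(parseApLabel("1/1").depth); // 0
```
]]></programlisting>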
<para>Allocations are aggregated by their allocation stack trace. The
<computeroutput>Allocated at</computeroutput> section shows the allocation
stack trace that is shared by all the blocks covered by this node.</para>
<para>The <computeroutput>Total</computeroutput> line shows that this node
accounts for 4.02% of all bytes allocated during execution, and 7.72% of all
blocks. These percentages are useful for comparing the significance of
different nodes within a single profile; an AP that accounts for 10% of bytes
allocated is likely to be more interesting than one that accounts for
2%.</para>
<para>The <computeroutput>Total</computeroutput> line also shows allocation
rates, measured in bytes and blocks per million instructions. These rates are
useful for comparing the significance of nodes across profiles made with
different workloads.</para>
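<para>The percentages and rates on the <computeroutput>Total</computeroutput>
line are simple ratios. The following JavaScript sketch shows one way to
compute them; the helper names <computeroutput>percentOf</computeroutput> and
<computeroutput>perMinstr</computeroutput> are hypothetical, and the
instruction count in the last line is made up for illustration because the
listings above do not state the run's total.</para>
<programlisting><![CDATA[
```javascript
// Share of a global total, as a percentage.
function percentOf(part, whole) {
  return whole === 0 ? 0 : (part / whole) * 100;
}

// Rate per million instructions ("/Minstr").
function perMinstr(count, totalInstrs) {
  return totalInstrs === 0 ? 0 : count / (totalInstrs / 1e6);
}

// Byte and block totals taken from the root and AP 1.1 listings above.
const bytePct = percentOf(54533440, 1355253987);
const blockPct = percentOf(458839, 5943417);
console.log(bytePct.toFixed(2) + "%");  // 4.02%
console.log(blockPct.toFixed(2) + "%"); // 7.72%

// Hypothetical run: 3,000,000 bytes allocated over 2,000,000 instructions.
console.log(perMinstr(3000000, 2000000)); // 1500000
```
]]></programlisting>
<para>Because the rate is normalised by instruction count rather than by a
global total, it stays comparable between runs of different length.</para>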
<para>Finally, the <computeroutput>Total</computeroutput> line shows the
average size and lifetimes of these blocks.</para>
<para>The <computeroutput>At t-gmax</computeroutput> line shows that no
blocks from this AP were alive when the global heap peak occurred. In other
words, these blocks do not contribute at all to the global heap peak.</para>
<para>The <computeroutput>At t-end</computeroutput> line shows that no blocks
from this AP were alive at shutdown. In other words, all those blocks were
explicitly freed before termination.</para>
<para>The <computeroutput>Reads</computeroutput> line shows how many bytes were
read within this AP's blocks, the fraction this represents of all heap reads,
and the read rate. Finally, it shows the read ratio, which is the number of
reads per byte. In this case the number is 0.29, which is quite low -- if no
byte was read twice, then only 29% of the allocated bytes were read, which
means that at least 71% of the bytes were never read!</para>
<para>The <computeroutput>Writes</computeroutput> line is similar to the
<computeroutput>Reads</computeroutput> line. In this case, at most 38% of the
bytes are ever written, and at least 62% of the bytes were never written.
</para>
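<para>The reasoning behind the "at least 71%" and "at least 62%" figures can
be written out as a small calculation. This JavaScript sketch is illustrative
only; <computeroutput>accessRatio</computeroutput> and
<computeroutput>minNeverAccessed</computeroutput> are hypothetical helper
names, and the input numbers come from the AP 1.1 listing above.</para>
<programlisting><![CDATA[
```javascript
// Accesses per allocated byte, i.e. the "N/byte" figure on a Reads or Writes line.
function accessRatio(accessedBytes, allocatedBytes) {
  return allocatedBytes === 0 ? 0 : accessedBytes / allocatedBytes;
}

// Lower bound on the fraction of allocated bytes never accessed.
// If every access touched a distinct byte, `ratio` of the bytes were touched,
// so at least 1 - ratio were not. Ratios of 1 or more give no information.
function minNeverAccessed(ratio) {
  return Math.max(0, 1 - ratio);
}

const allocated = 54533440; // Total bytes for AP 1.1
const readRatio = accessRatio(15993012, allocated);
const writeRatio = accessRatio(20974752, allocated);
console.log(readRatio.toFixed(2), minNeverAccessed(readRatio).toFixed(2));   // 0.29 0.71
console.log(writeRatio.toFixed(2), minNeverAccessed(writeRatio).toFixed(2)); // 0.38 0.62
```
]]></programlisting>
<para>The bound is only a lower bound: if some bytes are accessed repeatedly,
the untouched fraction is larger still.</para>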
<para>The <computeroutput>Reads</computeroutput> and
<computeroutput>Writes</computeroutput> measurements suggest that the blocks
are being under-utilised and might be worth optimizing. Having said that, this
kind of under-utilisation is common in data structures that grow, such as
vectors and hash tables, and isn't always fixable. </para>
</sect3>
<sect3><title>Leaf Nodes</title>
<para>This is a leaf node:</para>
<programlisting><![CDATA[
AP 1.1.1.1/2 {
Total: 31,460,928 bytes (2.32%, 1,565.9/Minstr) in 262,171 blocks (4.41%, 13.05/Minstr), avg size 120 bytes, avg lifetime 986,406,885.05 instrs (4.91% of program duration)
Max: 16,779,136 bytes in 65,543 blocks, avg size 256 bytes
At t-gmax: 0 bytes (0%) in 0 blocks (0%), avg size 0 bytes
At t-end: 0 bytes (0%) in 0 blocks (0%), avg size 0 bytes
Reads: 5,964,704 bytes (0.11%, 296.88/Minstr), 0.19/byte
Writes: 10,487,200 bytes (0.51%, 521.98/Minstr), 0.33/byte
Allocated at {
^1: 0x95CACC9: alloc (alloc.rs:72)
^2: 0x95CACC9: alloc (alloc.rs:148)
^3: 0x95CACC9: reserve_internal<syntax::tokenstream::TokenStream,alloc::alloc::Global> (raw_vec.rs:669)
^4: 0x95CACC9: reserve<syntax::tokenstream::TokenStream,alloc::alloc::Global> (raw_vec.rs:492)
^5: 0x95CACC9: reserve<syntax::tokenstream::TokenStream> (vec.rs:460)
^6: 0x95CACC9: push<syntax::tokenstream::TokenStream> (vec.rs:989)
^7: 0x95CACC9: parse_token_trees_until_close_delim (tokentrees.rs:27)
^8: 0x95CACC9: syntax::parse::lexer::tokentrees::<impl syntax::parse::lexer::StringReader<'a>>::parse_token_tree (tokentrees.rs:81)
^9: 0x95CAC39: parse_token_trees_until_close_delim (tokentrees.rs:26)
^10: 0x95CAC39: syntax::parse::lexer::tokentrees::<impl syntax::parse::lexer::StringReader<'a>>::parse_token_tree (tokentrees.rs:81)
#11: 0x95CAC39: parse_token_trees_until_close_delim (tokentrees.rs:26)
#12: 0x95CAC39: syntax::parse::lexer::tokentrees::<impl syntax::parse::lexer::StringReader<'a>>::parse_token_tree (tokentrees.rs:81)
}
}
]]></programlisting>
<para>The <computeroutput>1.1.1.1/2</computeroutput> indicates that this node
is a great-grandchild of the root; is the first grandchild of the node in the
previous example; and has no children.</para>
<para>Leaf nodes contain an additional <computeroutput>Max</computeroutput>
line, indicating the peak memory use for the blocks covered by this AP. (This
peak may have occurred at a time other than
<computeroutput>t-gmax</computeroutput>.) In this case, 31,460,928 bytes were
allocated from this AP, but the maximum size alive at once was 16,779,136
bytes.</para>
<para>Stack frames that begin with a <computeroutput>^</computeroutput> rather
than a <computeroutput>#</computeroutput> are copied from ancestor nodes.
(In this example, the first 8 frames are identical to those from the node in
the previous example.) These frames could be found by tracing back through
ancestor nodes, but that can be annoying, which is why they are duplicated.
This also means that each node makes complete sense on its own.</para>
</sect3>
<sect3><title>Access Counts</title>
<para>If all blocks covered by an AP node have the same size, an additional
<computeroutput>Accesses</computeroutput> field will be present. It indicates
how the reads and writes within these blocks were distributed. For
example:</para>
<programlisting><![CDATA[
Total: 8,388,672 bytes (0.62%, 417.53/Minstr) in 262,146 blocks (4.41%, 13.05/Minstr), avg size 32 bytes, avg lifetime 16,726,078,401.51 instrs (83.25% of program duration)
At t-gmax: 8,388,672 bytes (1.98%) in 262,146 blocks (16.64%), avg size 32 bytes
At t-end: 0 bytes (0%) in 0 blocks (0%), avg size 0 bytes
Reads: 9,109,682 bytes (0.17%, 453.41/Minstr), 1.09/byte
Writes: 7,340,088 bytes (0.36%, 365.34/Minstr), 0.88/byte
Accesses: {
[ 0] 65547 7 8 4 65529 〃 〃 〃 16 〃 〃 〃 12 〃 〃 〃 〃 〃 〃 〃 〃 〃 〃 〃 65542 〃 〃 〃 - - - -
}
]]></programlisting>
<para>Every block covered by this AP was 32 bytes. Within all of those blocks,
byte 0 was accessed (read or written) 65,547 times, byte 1 was accessed 7
times, byte 2 was accessed 8 times, and so on.</para>
<para>The ditto symbol (<computeroutput>〃</computeroutput>) means "same access
count as the previous byte".</para>
<para>A dash (<computeroutput>-</computeroutput>) means "zero". (It is used
instead of <computeroutput>0</computeroutput> because it makes unaccessed
regions more easily identifiable.)</para>
<para>The infinity symbol (<computeroutput>∞</computeroutput>, not present in
this example) means "exceeded the maximum tracked count".</para>
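<para>As a rough illustration of the notation above, the following C sketch
keeps one saturating counter per byte and formats the counters with the same
symbols. It is not DHAT's actual code: the names
<computeroutput>record_access</computeroutput> and
<computeroutput>render_counts</computeroutput> are hypothetical, the 0xffff cap
is taken from the comment in <filename>dhat/tests/acc.c</filename>, and the
treatment of repeated saturated counts is an assumption.</para>
<programlisting><![CDATA[
```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Assumed cap on a per-byte counter (see the comment in acc.c). */
#define MAX_COUNT 0xffffu

/* Count one access of `len` bytes starting at `offset` within a block. */
static void record_access(uint16_t *counts, size_t block_size,
                          size_t offset, size_t len)
{
    assert(offset <= block_size && len <= block_size - offset);
    for (size_t i = offset; i < offset + len; i++) {
        if (counts[i] < MAX_COUNT)
            counts[i]++;            /* saturate instead of wrapping */
    }
}

/* Format `n` counters into `out` (UTF-8), separated by spaces.
   Returns `out`; the output is cut short if the buffer is too small. */
static char *render_counts(const uint16_t *counts, size_t n,
                           char *out, size_t out_size)
{
    size_t used = 0;
    assert(out_size > 0);
    out[0] = '\0';
    for (size_t i = 0; i < n; i++) {
        char cell[16];
        if (counts[i] == 0)
            strcpy(cell, "-");                 /* never accessed */
        else if (i > 0 && counts[i] == counts[i - 1])
            strcpy(cell, "\xE3\x80\x83");      /* ditto mark, U+3003 */
        else if (counts[i] == MAX_COUNT)
            strcpy(cell, "\xE2\x88\x9E");      /* infinity, U+221E */
        else
            snprintf(cell, sizeof cell, "%u", (unsigned)counts[i]);

        int w = snprintf(out + used, out_size - used, "%s%s",
                         i > 0 ? " " : "", cell);
        if (w < 0 || (size_t)w >= out_size - used)
            break;                             /* buffer full */
        used += (size_t)w;
    }
    return out;
}
```
]]></programlisting>
<para>For a 4-byte block that is accessed once as a whole and once more at byte
0, the counters are 2, 1, 1, 1, and this sketch would render them as
<computeroutput>2 1 〃 〃</computeroutput>.</para>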
<para>Block layout can often be inferred from counts. For example, these blocks
probably have four separate byte-sized fields, followed by a four-byte field,
and so on.</para>
<para>Access counts can be useful for identifying data alignment holes or other
layout inefficiencies.</para>
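<para>The kind of alignment hole that access counts can reveal is easy to
reproduce. The following C sketch uses two hypothetical structs (they are not
taken from any program discussed here) and assumes a typical 64-bit ABI in
which <computeroutput>uint64_t</computeroutput> is 8-byte aligned. On such a
platform the first layout should contain padding that would show up as
never-accessed bytes, while the reordered layout has none.</para>
<programlisting><![CDATA[
```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record with a 4-byte field placed before an 8-byte field. */
struct padded {
    uint64_t id;
    uint32_t flags;   /* likely followed by a 4-byte hole */
    uint64_t value;
    uint32_t count;   /* likely followed by 4 bytes of tail padding */
};

/* Same fields, widest first, so the two 4-byte fields pair up. */
struct reordered {
    uint64_t id;
    uint64_t value;
    uint32_t flags;
    uint32_t count;
};

/* Total size of the four fields used by both structs. */
enum { FIELD_BYTES = 2 * sizeof(uint64_t) + 2 * sizeof(uint32_t) };

/* Bytes of a struct that are not occupied by any field. */
static size_t padding_bytes(size_t struct_size, size_t field_bytes)
{
    assert(struct_size >= field_bytes);
    return struct_size - field_bytes;
}

/* Size of the hole between a field ending at prev_end and the next field. */
static size_t hole_before(size_t prev_end, size_t next_offset)
{
    assert(next_offset >= prev_end);
    return next_offset - prev_end;
}
```
]]></programlisting>
<para>Under the stated ABI assumption,
<computeroutput>sizeof(struct padded)</computeroutput> would be 32 and
<computeroutput>sizeof(struct reordered)</computeroutput> would be 24, and
<computeroutput>hole_before(offsetof(struct padded, flags) + sizeof(uint32_t),
offsetof(struct padded, value))</computeroutput> would report the 4-byte hole
at offsets 12 to 15.</para>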
</sect3>
<sect3><title>Aggregate Nodes</title>
<para>The AP tree is very large and many nodes represent tiny numbers of blocks
and bytes. Therefore, DHAT's viewer aggregates insignificant nodes like
this:</para>
<programlisting><![CDATA[
AP 1.14.2/2 {
Total: 5,175 blocks (0.09%, 0.26/Minstr)
Allocated at {
[5 insignificant]
}
}
]]></programlisting>
<para>Much of the detail is stripped away, leaving only basic measurements,
along with an indication of how many nodes were aggregated together (5 in this
case).</para>
</sect3>
</sect2>
<sect2><title>The Output Footer</title>
<para>Below the AP tree is a line like this:</para>
<programlisting><![CDATA[
AP significance threshold: total >= 59,434.17 blocks (1%)
]]></programlisting>
<para>It shows the criterion used to determine whether an AP node is
significant. All nodes that don't satisfy this criterion are aggregated. This
line is occasionally useful if you don't understand why an AP node has been
aggregated. The exact threshold depends on the sort metric (see below).</para>
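<para>The check behind this line can be pictured as a simple comparison. The
sketch below is an illustration in C rather than the viewer's own code (the
viewer is a web app). The function names are hypothetical, and it assumes the
threshold is a fixed fraction of the global total for the chosen sort metric,
which is what the 1% in the example suggests.</para>
<programlisting><![CDATA[
```c
#include <assert.h>
#include <stdbool.h>

/* Threshold = a fraction of the global total for the current sort metric. */
static double significance_threshold(double global_total, double fraction)
{
    assert(fraction >= 0.0 && fraction <= 1.0);
    return global_total * fraction;
}

/* A node that meets the threshold is shown in full; otherwise the viewer
   would fold it into an "insignificant" aggregate node. */
static bool is_significant(double node_total, double global_total,
                           double fraction)
{
    return node_total >= significance_threshold(global_total, fraction);
}
```
]]></programlisting>
<para>If the 59,434.17 figure is 1% of the global total, that total is
5,943,417 blocks. Under that reading, a node with 262,146 blocks passes the
check and a node with 5,175 blocks does not.</para>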
<para>Finally, the bottom of the page shows a legend that explains some of the
terms, abbreviations and symbols used in the output.</para>
</sect2>
<sect2><title>Sort Metrics</title>
<para>The order in which sub-trees are sorted can be changed via the "Sort
metric" drop-down menu at the top of DHAT's viewer. Different sort metrics can
be useful for finding different things. Some sort metrics also incorporate some
filtering, so that only nodes meeting a particular criterion are shown.</para>
<!-- start of xi:include in the manpage -->
<variablelist>
<varlistentry>
<term>Total (bytes)</term>
<listitem><para>The total number of bytes allocated during the execution.
Highly useful for evaluating heap churn, though not quite as useful as
"Total (blocks)".
</para></listitem>
</varlistentry>
<varlistentry>
<term>Total (blocks)</term>
<listitem><para>The total number of blocks allocated during the execution.
Highly useful for evaluating heap churn; reducing the number of calls to
the allocator can significantly speed up a program. This is the default
sort metric.
</para></listitem>
</varlistentry>
<varlistentry>
<term>Total (blocks), tiny</term>
<listitem><para>Like "Total (blocks)", but shows only very small blocks.
Moderately useful, because such blocks are often easy to avoid allocating.
</para></listitem>
</varlistentry>
<varlistentry>
<term>Total (blocks), short-lived</term>
<listitem><para>Like "Total (blocks)", but shows only very short-lived
blocks. Moderately useful, because such blocks are often easy to avoid
allocating.
</para></listitem>
</varlistentry>
<varlistentry>
<term>Total (bytes), zero reads or zero writes</term>
<listitem><para>Like "Total (bytes)", but shows only blocks that are
never read or never written to (or both). Highly useful, because such
blocks indicate poor use of memory and are often easy to avoid allocating.
For example, sometimes a block is allocated and written to but then only
read if a condition C is true; in that case, it may be possible to delay
creating the block until condition C is true. Alternatively, sometimes
blocks are created and never used; such blocks are trivial to remove.
</para></listitem>
</varlistentry>
<varlistentry>
<term>Total (blocks), zero reads or zero writes</term>
<listitem><para>Like "Total (bytes), zero reads or zero writes" but for
blocks. Highly useful.
</para></listitem>
</varlistentry>
<varlistentry>
<term>Total (bytes), low-access</term>
<listitem><para>Like "Total (bytes)", but shows only blocks that have low
numbers of reads or low numbers of writes (or both). Moderately useful,
because such blocks indicate poor use of memory.
</para></listitem>
</varlistentry>
<varlistentry>
<term>Total (blocks), low-access</term>
<listitem><para>Like "Total (bytes), low-access", but for blocks.
</para></listitem>
</varlistentry>
<varlistentry>
<term>At t-gmax (bytes)</term>
<listitem><para>This shows the breakdown of memory at the point of peak
heap memory usage. Highly useful for reducing peak memory usage.
</para></listitem>
</varlistentry>
<varlistentry>
<term>At t-end (bytes)</term>
<listitem><para>This shows the breakdown of memory at program termination.
Highly useful for identifying process-lifetime leaks.
</para></listitem>
</varlistentry>
<varlistentry>
<term>Reads (bytes)</term>
<listitem><para>The number of bytes read within heap blocks. Occasionally
useful.
</para></listitem>
</varlistentry>
<varlistentry>
<term>Reads (bytes), high-access</term>
<listitem><para>Like "Reads (bytes)", but only shows blocks with high read
ratios. Occasionally useful for identifying hot areas of memory.
</para></listitem>
</varlistentry>
<varlistentry>
<term>Writes (bytes)</term>
<listitem><para>Like "Reads (bytes)", but for writes. Occasionally useful.
</para></listitem>
</varlistentry>
<varlistentry>
<term>Writes (bytes), high-access</term>
<listitem><para>Like "Reads (bytes), high-access", but for writes.
Occasionally useful.
</para></listitem>
</varlistentry>
</variablelist>
<para>The values within a node that represent the chosen sort metric are shown
in bold, so they stand out.</para>
<para>Here is part of an AP node found with "Total (blocks), tiny", showing
blocks with an average size of only 8.67 bytes:</para>
<programlisting><![CDATA[
Total: 3,407,848 bytes (0.25%, 169.62/Minstr) in 393,214 blocks (6.62%, 19.57/Minstr), avg size 8.67 bytes, avg lifetime 1,167,795,629.1 instrs (5.81% of program duration)
]]></programlisting>
<para>Here is part of an AP node found with "Total (blocks), short-lived",
showing blocks with an average lifetime of only 181.75 instructions:</para>
<programlisting><![CDATA[
Total: 23,068,584 bytes (1.7%, 1,148.19/Minstr) in 262,143 blocks (4.41%, 13.05/Minstr), avg size 88 bytes, avg lifetime 181.75 instrs (0% of program duration)
]]></programlisting>
<para>Here is an example of an AP identified with "Total (blocks), zero reads
or zero writes", showing blocks that are allocated but never touched:</para>
<programlisting><![CDATA[
Total: 7,339,920 bytes (0.54%, 365.33/Minstr) in 262,140 blocks (4.41%, 13.05/Minstr), avg size 28 bytes, avg lifetime 1,141,103,997.69 instrs (5.68% of program duration)
Max: 3,669,960 bytes in 131,070 blocks, avg size 28 bytes
At t-gmax: 3,336,400 bytes (0.79%) in 119,157 blocks (7.56%), avg size 28 bytes
At t-end: 0 bytes (0%) in 0 blocks (0%), avg size 0 bytes
Reads: 0 bytes (0%, 0/Minstr), 0/byte
Writes: 0 bytes (0%, 0/Minstr), 0/byte
]]></programlisting>
<para>All the blocks identified by these APs are good candidates for
optimization.</para>
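<para>The "zero reads or zero writes" case described under the sort metrics,
where a block is written but only read if a condition holds, can be illustrated
with a small C sketch. All names here are hypothetical, and the allocation
counter exists only so that the difference is observable without a
profiler.</para>
<programlisting><![CDATA[
```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

static unsigned alloc_calls;   /* counts heap allocations made below */

static char *copy_string(const char *s)
{
    size_t n = strlen(s) + 1;
    char *p = malloc(n);
    if (p != NULL) {
        alloc_calls++;
        memcpy(p, s, n);        /* the block is written here */
    }
    return p;
}

/* Eager version: the copy is always made, but read only if `verbose`.
   When `verbose` is false the block has writes and zero reads. */
static size_t length_if_verbose_eager(const char *msg, bool verbose)
{
    char *copy = copy_string(msg);
    size_t len = 0;
    if (copy != NULL && verbose)
        len = strlen(copy);     /* the only read of the block */
    free(copy);
    return len;
}

/* Lazy version: the block is created only once the condition is known
   to be true, so the never-read allocation disappears. */
static size_t length_if_verbose_lazy(const char *msg, bool verbose)
{
    if (!verbose)
        return 0;
    char *copy = copy_string(msg);
    size_t len = (copy != NULL) ? strlen(copy) : 0;
    free(copy);
    return len;
}
```
]]></programlisting>
<para>Both versions return the same result, but with
<computeroutput>verbose</computeroutput> false only the eager one allocates,
which is the pattern that this sort metric is meant to expose.</para>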
</sect2>
</sect1>
<sect1 id="dh-manual.options" xreflabel="DHAT Command-line Options">
<title>DHAT Command-line Options</title>
<para>DHAT-specific command-line options are:</para>
<!-- start of xi:include in the manpage -->
<variablelist id="dh.opts.list">
<varlistentry id="opt.dhat-out-file" xreflabel="--dhat-out-file">
<term>
<option><![CDATA[--dhat-out-file=<file> ]]></option>
</term>
<listitem>
<para>Write the profile data to
<computeroutput>file</computeroutput> rather than to the default
output file,
<filename>dhat.out.&lt;pid&gt;</filename>. The
<option>%p</option> and <option>%q</option> format specifiers
can be used to embed the process ID and/or the contents of an
environment variable in the name, as is the case for the core
option <option><xref linkend="opt.log-file"/></option>.
</para>
</listitem>
</varlistentry>
</variablelist>
<para>Note that stacks by default have 12 frames. This may be more than
necessary, in which case the <option>--num-callers</option> flag can be used to
reduce the number, which may make DHAT run slightly faster.
</para>
<!-- end of xi:include in the manpage -->
</sect1>
</chapter>

23
dhat/tests/Makefile.am Normal file
@ -0,0 +1,23 @@
include $(top_srcdir)/Makefile.tool-tests.am
dist_noinst_SCRIPTS = filter_stderr
EXTRA_DIST = \
acc.post.exp acc.stderr.exp acc.vgtest \
basic.post.exp basic.stderr.exp basic.vgtest \
big.post.exp big.stderr.exp big.vgtest \
empty.post.exp empty.stderr.exp empty.vgtest \
single.post.exp single.stderr.exp single.vgtest
check_PROGRAMS = \
acc \
basic \
big \
empty \
sig \
single
AM_CFLAGS += $(AM_FLAG_M3264_PRI)
AM_CXXFLAGS += $(AM_FLAG_M3264_PRI)

74
dhat/tests/acc.c Normal file
@ -0,0 +1,74 @@
// Testing accesses of blocks.
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
void* m1(size_t n) { return malloc(n); }
void* m2(size_t n) { return malloc(n); }
int main(void)
{
// 0th char is written 0 times, 1st char is written once, etc.
char* a = malloc(32);
for (int i = 1; i < 32; i++) {
for (int j = 0; j < i; j++) {
a[i] = 0;
}
}
free(a);
// Repetition and gaps.
int* b = malloc(20);
b[0] = 1;
b[2] = b[0];
for (int i = 0; i < 10; i++) {
b[4] = 99;
}
free(b);
// 33 bytes, goes onto a second line in dh_view.
char* c = calloc(33, 1);
c[32] = 0;
free(c);
// 1024 bytes, accesses are shown.
char* d = malloc(1024);
for (int i = 0; i < 1024; i++) {
d[i] = d[1023 - i];
}
for (int i = 500; i < 600; i++) {
d[i] = 0;
}
free(d);
// 1025 bytes, accesses aren't shown.
char* e = calloc(1025, 1);
for (int i = 0; i < 1025; i++) {
e[i] += 1;
}
free(e);
// Lots of accesses, but fewer than the 0xffff max value.
int* f1 = m1(100);
int* f2 = m1(100);
for (int i = 0; i < 50000; i++) {
f1[0] = 0;
f2[0] = 0;
}
free(f1);
free(f2);
// Lots of accesses, more than the 0xffff max value: treated as Infinity.
int* g1 = m2(100);
int* g2 = m2(100);
for (int i = 0; i < 100000; i++) {
g1[0] = 0;
g2[0] = 0;
}
free(g1);
free(g2);
return 0;
}

@ -0,0 +1,7 @@
Total: 2,534 bytes in 9 blocks
At t-gmax: 1,025 bytes in 1 blocks
At t-end: 0 bytes in 0 blocks
Reads: 2,053 bytes
Writes: 1,202,694 bytes

3
dhat/tests/acc.vgtest Normal file
@ -0,0 +1,3 @@
prog: acc
vgopts: --dhat-out-file=dhat.out
cleanup: rm dhat.out

26
dhat/tests/basic.c Normal file
@ -0,0 +1,26 @@
// Some basic allocations and accesses.
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
int main(void)
{
int64_t* m = malloc(1000);
m[0] = 1; // write 8 bytes
m[10] = m[1]; // read and write 8 bytes
char* c = calloc(1, 2000);
for (int i = 0; i < 1000; i++) {
c[i + 1000] = c[i]; // read and write 1000 bytes
}
char* r = realloc(m, 3000);
for (int i = 0; i < 500; i++) {
r[i + 2000] = 99; // write 500 bytes
}
// totals: 1008 read, 1516 write
free(c);
return 0;
}

@ -0,0 +1,7 @@
Total: 6,000 bytes in 3 blocks
At t-gmax: 5,000 bytes in 2 blocks
At t-end: 3,000 bytes in 1 blocks
Reads: 1,008 bytes
Writes: 1,516 bytes

3
dhat/tests/basic.vgtest Normal file
@ -0,0 +1,3 @@
prog: basic
vgopts: --dhat-out-file=dhat.out
cleanup: rm dhat.out

61
dhat/tests/big.c Normal file
@ -0,0 +1,61 @@
// This test implements a moderately complex call tree. The layout of these
// functions matches the layout of the tree produced by dh_view.js, when
// sorted by "total bytes".
#include <stdlib.h>
#define F(f, parent) void* f(size_t n) { return parent(n); }
// Note: don't use j1 -- that is a builtin C function, believe it or not.
F(a, malloc)
F(b1, a)
F(c1, b1)
F(d1, c1)
F(d2, c1) // insig total-bytes
F(c2, b1)
F(b2, a)
F(b3, a) F(e, b3) F(f, e)
F(g, malloc) F(h, g) F(i, h)
F(j2, i) F(k, j2) F(l, k)
F(j3, i) F(m, j3)
F(n1, m)
F(n2, m) F(o, n2)
F(p, i) F(q, p) // insig total-bytes
F(r, i) // insig total-bytes
F(s1, malloc) F(s2, s1) F(s3, s2) F(s4, s3) F(s5, s4)
F(t, malloc)
F(u, malloc)
F(v, malloc)
F(w, v) // insig total-bytes
F(x, v) // insig total-bytes
F(y, v) // insig total-bytes
F(z, v) // insig total-bytes
int main(void)
{
// Call all the leaves in the above tree.
int* d1p = d1(706);
free(d1p); // So the t-end numbers differ from the t-gmax/total numbers.
d2(5);
c2(30);
b2(20);
f(10);
l(60);
n1(30);
o(20);
q(7);
r(3);
s5(30);
t(20);
u(19);
w(9);
x(8);
y(7);
z(5);
z(1);
// And one allocation directly from main().
malloc(10);
}

@ -0,0 +1,7 @@
Total: 1,000 bytes in 19 blocks
At t-gmax: 706 bytes in 1 blocks
At t-end: 294 bytes in 18 blocks
Reads: 0 bytes
Writes: 0 bytes

3
dhat/tests/big.vgtest Normal file
@ -0,0 +1,3 @@
prog: big
vgopts: --dhat-out-file=dhat.out
cleanup: rm dhat.out

6
dhat/tests/empty.c Normal file
@ -0,0 +1,6 @@
// No allocations.
int main(void)
{
return 0;
}

@ -0,0 +1,7 @@
Total: 0 bytes in 0 blocks
At t-gmax: 0 bytes in 0 blocks
At t-end: 0 bytes in 0 blocks
Reads: 0 bytes
Writes: 0 bytes

3
dhat/tests/empty.vgtest Normal file
@ -0,0 +1,3 @@
prog: empty
vgopts: --dhat-out-file=dhat.out
cleanup: rm dhat.out

9
dhat/tests/filter_stderr Executable file
@ -0,0 +1,9 @@
#! /bin/sh
dir=`dirname $0`
$dir/../../tests/filter_stderr_basic |
# Remove the "DHAT, ..." line and the following copyright line.
sed "/^DHAT, a dynamic heap analysis tool/ , /./ d"

76
dhat/tests/sig.c Normal file
@ -0,0 +1,76 @@
// This test implements sorting of a tree involving a mix of significant and
// insignificant nodes. The layout of these functions matches the layout of
// the tree produced by dh_view.js, when sorted by "total bytes".
#include <stdlib.h>
#define F(f, parent) void* f(size_t n) { return parent(n); }
F(am, malloc)
// main
F(a2, am) // main
F(a3, am)
// main
// main
F(bm, malloc)
// main
F(b2, bm) // main
F(b3, bm)
// main
// main
F(cm, malloc)
// main
F(c2, cm) // main
F(c3, cm)
// main
// main
F(dm, malloc)
// main
F(d2, dm) // main
F(d3, dm)
// main
// main
char access(char* p, size_t n)
{
for (int i = 0; i < 1499; i++) {
for (int j = 0; j < n; j++) {
p[j] = j;
}
}
char x = 0;
for (int j = 0; j < n; j++) {
x += p[j];
}
return x;
}
int main(void)
{
char* p;
// Call all the leaves in the above tree. The pointers we pass to access()
// become significant in a high-access sort and insignificant in a
// zero-reads-or-zero-writes sort, and vice versa.
p = am(11); access(p, 11);
p = a2(10); access(p, 10);
p = a3(5); access(p, 5);
p = a3(4); access(p, 4);
p = bm(10); access(p, 10);
p = b2(9); access(p, 9);
p = b3(5);
p = b3(3);
p = cm(9); access(p, 9);
p = c2(8);
p = c3(4);
p = c3(3);
p = dm(8);
p = d2(7);
p = d3(4);
p = d3(2);
}

@ -0,0 +1,7 @@
Total: 102 bytes in 16 blocks
At t-gmax: 102 bytes in 16 blocks
At t-end: 102 bytes in 16 blocks
Reads: 58 bytes
Writes: 86,942 bytes

3
dhat/tests/sig.vgtest Normal file
@ -0,0 +1,3 @@
prog: sig
vgopts: --dhat-out-file=dhat.out
cleanup: rm dhat.out

11
dhat/tests/single.c Normal file
@ -0,0 +1,11 @@
// A single allocation (so the root node is the only node in the tree).
#include <stdlib.h>
int main() {
int* a = (int*)malloc(16);
a[0] = 0;
a[0] = 1;
a[0] = 2;
return 0;
}

@ -0,0 +1,7 @@
Total: 16 bytes in 1 blocks
At t-gmax: 16 bytes in 1 blocks
At t-end: 16 bytes in 1 blocks
Reads: 0 bytes
Writes: 12 bytes

3
dhat/tests/single.vgtest Normal file
@ -0,0 +1,3 @@
prog: single
vgopts: --dhat-out-file=dhat.out
cleanup: rm dhat.out

@ -20,6 +20,7 @@ EXTRA_DIST = \
images/prev.png \
images/up.png \
images/kcachegrind_xtree.png \
images/dh-tree.png \
internals/3_0_BUGSTATUS.txt \
internals/3_1_BUGSTATUS.txt \
internals/3_2_BUGSTATUS.txt \

@ -76,10 +76,18 @@ could just build the docs from XML when doing 'make install', which
would be simpler.
Notes on building PDF / PS documents
------------------------------------
Below are random notes and recollections about how to build PDF / PS
documents from the XML source at various times on various Linux distros.
Notes on building HTML / PDF / PS documents
-------------------------------------------
Below are random notes and recollections about how to build documents
from the XML source at various times on various Linux distros. They're
mostly about the PDF/PS documents, because they are the hardest to
build.
Notes [Jan 2019]
-----------------
For Ubuntu 18.04, to build HTML docs I had to:
sudo apt-get install xsltproc
Notes [May 2017]
----------------

BIN
docs/images/dh-tree.png Normal file

Binary file not shown.

@ -613,7 +613,7 @@ in most cases. We group the available options by rough categories.</para>
<listitem>
<para>Run the Valgrind tool called <varname>toolname</varname>,
e.g. memcheck, cachegrind, callgrind, helgrind, drd, massif,
lackey, none, exp-sgcheck, exp-bbv, exp-dhat, etc.</para>
dhat, lackey, none, exp-sgcheck, exp-bbv, etc.</para>
</listitem>
</varlistentry>
@ -2562,7 +2562,7 @@ need to use them.</para>
malloc related functions, using the
synonym <varname>somalloc</varname>. This synonym is usable for
all tools doing standard replacement of malloc related functions
(e.g. memcheck, massif, drd, helgrind, exp-dhat, exp-sgcheck).
(e.g. memcheck, helgrind, drd, massif, dhat, exp-sgcheck).
</para>
<itemizedlist>

@ -36,15 +36,15 @@
xmlns:xi="http://www.w3.org/2001/XInclude" />
<xi:include href="../../massif/docs/ms-manual.xml" parse="xml"
xmlns:xi="http://www.w3.org/2001/XInclude" />
<xi:include href="../../exp-dhat/docs/dh-manual.xml" parse="xml"
<xi:include href="../../dhat/docs/dh-manual.xml" parse="xml"
xmlns:xi="http://www.w3.org/2001/XInclude" />
<xi:include href="../../lackey/docs/lk-manual.xml" parse="xml"
xmlns:xi="http://www.w3.org/2001/XInclude" />
<xi:include href="../../none/docs/nl-manual.xml" parse="xml"
xmlns:xi="http://www.w3.org/2001/XInclude" />
<xi:include href="../../exp-sgcheck/docs/sg-manual.xml" parse="xml"
xmlns:xi="http://www.w3.org/2001/XInclude" />
<xi:include href="../../exp-bbv/docs/bbv-manual.xml" parse="xml"
xmlns:xi="http://www.w3.org/2001/XInclude" />
<xi:include href="../../lackey/docs/lk-manual.xml" parse="xml"
xmlns:xi="http://www.w3.org/2001/XInclude" />
<xi:include href="../../none/docs/nl-manual.xml" parse="xml"
xmlns:xi="http://www.w3.org/2001/XInclude" />
</book>

@ -1,401 +0,0 @@
<?xml version="1.0"?> <!-- -*- sgml -*- -->
<!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN"
"http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd"
[ <!ENTITY % vg-entities SYSTEM "../../docs/xml/vg-entities.xml"> %vg-entities; ]>
<chapter id="dh-manual"
xreflabel="DHAT: a dynamic heap analysis tool">
<title>DHAT: a dynamic heap analysis tool</title>
<para>To use this tool, you must specify
<option>--tool=exp-dhat</option> on the Valgrind
command line.</para>
<sect1 id="dh-manual.overview" xreflabel="Overview">
<title>Overview</title>
<para>DHAT is a tool for examining how programs use their heap
allocations.</para>
<para>It tracks the allocated blocks, and inspects every memory access
to find which block, if any, it is to. The following data is
collected and presented per allocation point (allocation
stack):</para>
<itemizedlist>
<listitem><para>Total allocation (number of bytes and
blocks)</para></listitem>
<listitem><para>maximum live volume (number of bytes and
blocks)</para></listitem>
<listitem><para>average block lifetime (number of instructions
between allocation and freeing)</para></listitem>
<listitem><para>average number of reads and writes to each byte in
the block ("access ratios")</para></listitem>
<listitem><para>for allocation points which always allocate blocks
only of one size, and that size is 4096 bytes or less: counts
showing how often each byte offset inside the block is
accessed.</para></listitem>
</itemizedlist>
<para>Using these statistics it is possible to identify allocation
points with the following characteristics:</para>
<itemizedlist>
<listitem><para>potential process-lifetime leaks: blocks allocated
by the point just accumulate, and are freed only at the end of the
run.</para></listitem>
<listitem><para>excessive turnover: points which chew through a lot
of heap, even if it is not held onto for very long</para></listitem>
<listitem><para>excessively transient: points which allocate very
short lived blocks</para></listitem>
<listitem><para>useless or underused allocations: blocks which are
allocated but not completely filled in, or are filled in but not
subsequently read.</para></listitem>
<listitem><para>blocks with inefficient layout -- areas never
accessed, or with hot fields scattered throughout the
block.</para></listitem>
</itemizedlist>
<para>As with the Massif heap profiler, DHAT measures program progress
by counting instructions, and so presents all age/time related figures
as instruction counts. This sounds a little odd at first, but it
makes runs repeatable in a way which is not possible if CPU time is
used.</para>
</sect1>
<sect1 id="dh-manual.understanding" xreflabel="Understanding DHAT's output">
<title>Understanding DHAT's output</title>
<para>DHAT provides a lot of useful information on dynamic heap usage.
Most of the art of using it is in interpretation of the resulting
numbers. That is best illustrated via a set of examples.</para>
<sect2>
<title>Interpreting the max-live, tot-alloc and deaths fields</title>
<sect3><title>A simple example</title></sect3>
<screen><![CDATA[
======== SUMMARY STATISTICS ========
guest_insns: 1,045,339,534
[...]
max-live: 63,490 in 984 blocks
tot-alloc: 1,904,700 in 29,520 blocks (avg size 64.52)
deaths: 29,520, at avg age 22,227,424
acc-ratios: 6.37 rd, 1.14 wr (12,141,526 b-read, 2,174,460 b-written)
at 0x4C275B8: malloc (vg_replace_malloc.c:236)
by 0x40350E: tcc_malloc (tinycc.c:6712)
by 0x404580: tok_alloc_new (tinycc.c:7151)
by 0x40870A: next_nomacro1 (tinycc.c:9305)
]]></screen>
<para>Over the entire run of the program, this stack (allocation
point) allocated 29,520 blocks in total, containing 1,904,700 bytes in
total. By looking at the max-live data, we see that not many blocks
were simultaneously live, though: at the peak, there were 63,490
allocated bytes in 984 blocks. This tells us that the program is
steadily freeing such blocks as it runs, rather than hanging on to all
of them until the end and freeing them all.</para>
<para>The deaths entry tells us that 29,520 blocks allocated by this stack
died (were freed) during the run of the program. Since 29,520 is
also the number of blocks allocated in total, that tells us that
all allocated blocks were freed by the end of the program.</para>
<para>It also tells us that the average age at death was 22,227,424
instructions. From the summary statistics we see that the program ran
for 1,045,339,534 instructions, and so the average age at death is
about 2% of the program's total run time.</para>
<sect3><title>Example of a potential process-lifetime leak</title></sect3>
<para>This next example (from a different program than the above)
shows a potential process lifetime leak. A process lifetime leak
occurs when a program keeps allocating data, but only frees the
data just before it exits. Hence the program's heap grows constantly
in size, yet Memcheck reports no leak, because the program has
freed up everything at exit. This is particularly a hazard for
long running programs.</para>
<screen><![CDATA[
======== SUMMARY STATISTICS ========
guest_insns: 418,901,537
[...]
max-live: 32,512 in 254 blocks
tot-alloc: 32,512 in 254 blocks (avg size 128.00)
deaths: 254, at avg age 300,467,389
acc-ratios: 0.26 rd, 0.20 wr (8,756 b-read, 6,604 b-written)
at 0x4C275B8: malloc (vg_replace_malloc.c:236)
by 0x4C27632: realloc (vg_replace_malloc.c:525)
by 0x56FF41D: QtFontStyle::pixelSize(unsigned short, bool) (qfontdatabase.cpp:269)
by 0x5700D69: loadFontConfig() (qfontdatabase_x11.cpp:1146)
]]></screen>
<para>There are two tell-tale signs that this might be a
process-lifetime leak. Firstly, the max-live and tot-alloc numbers
are identical. The only way that can happen is if these blocks are
all allocated and then all deallocated.</para>
<para>Secondly, the average age at death (300 million insns) is 71% of
the total program lifetime (419 million insns), hence this is not a
transient allocation-free spike -- rather, it is spread out over a
large part of the entire run. One interpretation is, roughly, that
all 254 blocks were allocated in the first half of the run, held onto
for the second half, and then freed just before exit.</para>
</sect2>
<sect2>
<title>Interpreting the acc-ratios fields</title>
<sect3><title>A fairly harmless allocation point record</title></sect3>
<screen><![CDATA[
max-live: 49,398 in 808 blocks
tot-alloc: 1,481,940 in 24,240 blocks (avg size 61.13)
deaths: 24,240, at avg age 34,611,026
acc-ratios: 2.13 rd, 0.91 wr (3,166,650 b-read, 1,358,820 b-written)
at 0x4C275B8: malloc (vg_replace_malloc.c:236)
by 0x40350E: tcc_malloc (tinycc.c:6712)
by 0x404580: tok_alloc_new (tinycc.c:7151)
by 0x4046C4: tok_alloc (tinycc.c:7190)
]]></screen>
<para>The acc-ratios field tells us that each byte in the blocks
allocated here is read an average of 2.13 times before the block is
deallocated. Given that the blocks have an average age at death of
34,611,026 instructions, that's one read per byte roughly every 16
million instructions. So from that standpoint the blocks aren't
"working" very hard.</para>
<para>More interesting is the write ratio: each byte is written an
average of 0.91 times. This tells us that some parts of the allocated
blocks are never written, at least 9% on average. To completely
initialise the block would require writing each byte at least once,
and that would give a write ratio of 1.0. The fact that some block
areas are evidently unused might point to data alignment holes or
other layout inefficiencies.</para>
<para>Well, at least all the blocks are freed (24,240 allocations,
24,240 deaths).</para>
<para>If all the blocks had been the same size, DHAT would also show
the access counts by block offset, so we could see where exactly these
unused areas are. However, that isn't the case: the blocks have
varying sizes, so DHAT can't perform such an analysis. We can see
that they must have varying sizes since the average block size, 61.13,
isn't a whole number.</para>
<sect3><title>A more suspicious looking example</title></sect3>
<screen><![CDATA[
max-live: 180,224 in 22 blocks
tot-alloc: 180,224 in 22 blocks (avg size 8192.00)
deaths: none (none of these blocks were freed)
acc-ratios: 0.00 rd, 0.00 wr (0 b-read, 0 b-written)
at 0x4C275B8: malloc (vg_replace_malloc.c:236)
by 0x40350E: tcc_malloc (tinycc.c:6712)
by 0x40369C: __sym_malloc (tinycc.c:6787)
by 0x403711: sym_malloc (tinycc.c:6805)
]]></screen>
<para>Here, both the read and write access ratios are zero. Hence
this point is allocating blocks which are never used, neither read nor
written. Indeed, they are also not freed ("deaths: none") and are
simply leaked. So, here is 180k of completely useless allocation that
could be removed.</para>
<para>Re-running with Memcheck does indeed report the same leak. What
DHAT can tell us, that Memcheck can't, is that not only are the blocks
leaked, they are also never used.</para>
<sect3><title>Another suspicious example</title></sect3>
<para>Here's one where blocks are allocated, written to,
but never read from. We see this immediately from the zero read
access ratio. They do get freed, though:</para>
<screen><![CDATA[
max-live: 54 in 3 blocks
tot-alloc: 1,620 in 90 blocks (avg size 18.00)
deaths: 90, at avg age 34,558,236
acc-ratios: 0.00 rd, 1.11 wr (0 b-read, 1,800 b-written)
at 0x4C275B8: malloc (vg_replace_malloc.c:236)
by 0x40350E: tcc_malloc (tinycc.c:6712)
by 0x4035BD: tcc_strdup (tinycc.c:6750)
by 0x41FEBB: tcc_add_sysinclude_path (tinycc.c:20931)
]]></screen>
<para>In the previous two examples, it is easy to see blocks that are
never written to, or never read from, or some combination of both.
Unfortunately, in C++ code, the situation is less clear. That's
because an object's constructor will write to the underlying block,
and its destructor will read from it. So the block's read and write
ratios will be non-zero even if the object, once constructed, is never
used, but only eventually destructed.</para>
<para>Really, what we want is to measure only memory accesses in
between the end of an object's construction and the start of its
destruction. Unfortunately I do not know of a reliable way to
determine when those transitions are made.</para>
</sect2>
<sect2>
<title>Interpreting "Aggregated access counts by offset" data</title>
<para>For allocation points that always allocate blocks of the same
size, and which are 4096 bytes or smaller, DHAT counts accesses
per offset, for example:</para>
<screen><![CDATA[
max-live: 317,408 in 5,668 blocks
tot-alloc: 317,408 in 5,668 blocks (avg size 56.00)
deaths: 5,668, at avg age 622,890,597
acc-ratios: 1.03 rd, 1.28 wr (327,642 b-read, 408,172 b-written)
at 0x4C275B8: malloc (vg_replace_malloc.c:236)
by 0x5440C16: QDesignerPropertySheetPrivate::ensureInfo (qhash.h:515)
by 0x544350B: QDesignerPropertySheet::setVisible (qdesigner_propertysh...)
by 0x5446232: QDesignerPropertySheet::QDesignerPropertySheet (qdesigne...)
Aggregated access counts by offset:
[ 0] 28782 28782 28782 28782 28782 28782 28782 28782
[ 8] 20638 20638 20638 20638 0 0 0 0
[ 16] 22738 22738 22738 22738 22738 22738 22738 22738
[ 24] 6013 6013 6013 6013 6013 6013 6013 6013
[ 32] 18883 18883 18883 37422 0 0 0 0
[ 40] 5668 11915 5668 5668 11336 11336 11336 11336
[ 48] 6166 6166 6166 6166 0 0 0 0
]]></screen>
<para>This is fairly typical, for C++ code running on a 64-bit
platform. Here, we have aggregated access statistics for 5668 blocks,
all of size 56 bytes. Each byte has been accessed at least 5668
times, except for offsets 12--15, 36--39 and 52--55. These are likely
to be alignment holes.</para>
<para>Careful interpretation of the numbers reveals useful information.
Groups of N consecutive identical numbers that begin at an N-aligned
offset, for N being 2, 4 or 8, are likely to indicate an N-byte object
in the structure at that point. For example, the first 32 bytes of
this object are likely to have the layout</para>
<screen><![CDATA[
[0 ] 64-bit type
[8 ] 32-bit type
[12] 32-bit alignment hole
[16] 64-bit type
[24] 64-bit type
]]></screen>
<para>As a counterexample, it's also clear that, whatever is at offset 32,
it is not a 32-bit value. That's because the last number of the group
(37422) is not the same as the first three (18883 18883 18883).</para>
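<para>The heuristic described above can be sketched in a few lines of
Python. This is only an illustration and is not part of DHAT: the function
name <function>infer_fields</function> and the labels it returns are
assumptions made for this example. It walks the per-byte counts and reports
each group of N identical counts that starts at an N-aligned offset, for N
being 8, 4 or 2, marking all-zero groups as possible holes.</para>
<programlisting><![CDATA[
```python
def infer_fields(counts):
    """Guess a layout from per-byte access counts (a heuristic sketch).

    Returns (offset, size, kind) tuples. kind is "field" for N identical
    non-zero counts at an N-aligned offset, "hole?" for N zero counts at
    an N-aligned offset, and "unknown" for bytes that fit neither.
    """
    layout = []
    offset = 0
    while offset < len(counts):
        for n in (8, 4, 2):  # prefer the widest plausible group
            group = counts[offset:offset + n]
            if offset % n == 0 and len(group) == n and len(set(group)) == 1:
                kind = "hole?" if group[0] == 0 else "field"
                layout.append((offset, n, kind))
                offset += n
                break
        else:  # no aligned group of identical counts starts here
            layout.append((offset, 1, "unknown"))
            offset += 1
    return layout


# Counts for offsets 0..39, copied from the example output above.
counts = ([28782] * 8 + [20638] * 4 + [0] * 4 + [22738] * 8 +
          [6013] * 8 + [18883] * 3 + [37422] + [0] * 4)

for offset, size, kind in infer_fields(counts):
    print(f"[{offset:2}] {size * 8:2}-bit {kind}")
```
]]></programlisting>
<para>As with the manual reading, the result is only a guess based on
dynamic data. For instance, the two-byte group this sketch reports at
offset 32 may simply be two separate bytes that happen to share a
count.</para>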
<para>This example leads one to enquire (by reading the source code)
whether the zeroes at 12--15 and 52--55 are alignment holes, and
whether 48--51 is indeed a 32-bit type. If so, it might be possible
to place what's at 48--51 at 12--15 instead, which would reduce
the object size from 56 to 48 bytes.</para>
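<para>The potential saving can be checked with a small model of the
structure. The sketch below uses Python's <literal>ctypes</literal> module,
which lays out fields following the platform's C ABI. The field names and
types are hypothetical, chosen only to reproduce the holes seen in the
profile, and the sizes it reports assume a platform where 64-bit integers
are 8-byte aligned (such as x86-64).</para>
<programlisting><![CDATA[
```python
import ctypes


class Before(ctypes.Structure):
    # Hypothetical fields; only the holes match the profile above.
    _fields_ = [
        ("a", ctypes.c_uint64),     # offset 0
        ("b", ctypes.c_uint32),     # offset 8, hole at 12..15
        ("c", ctypes.c_uint64),     # offset 16
        ("d", ctypes.c_uint64),     # offset 24
        ("e", ctypes.c_uint8 * 4),  # offset 32, hole at 36..39
        ("f", ctypes.c_uint64),     # offset 40
        ("g", ctypes.c_uint32),     # offset 48, padding at 52..55
    ]


class After(ctypes.Structure):
    # Same fields, with g moved into the hole that followed b.
    _fields_ = [
        ("a", ctypes.c_uint64),
        ("b", ctypes.c_uint32),
        ("g", ctypes.c_uint32),
        ("c", ctypes.c_uint64),
        ("d", ctypes.c_uint64),
        ("e", ctypes.c_uint8 * 4),
        ("f", ctypes.c_uint64),
    ]


print("before:", ctypes.sizeof(Before), "bytes, g at offset", Before.g.offset)
print("after: ", ctypes.sizeof(After), "bytes, g at offset", After.g.offset)
```
]]></programlisting>
<para>Whether such a reordering is safe in a real program depends on the
source code, for example on whether anything relies on the existing field
order.</para>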
<para>Bear in mind that the above inferences are all only "maybes". That's
because they are based on dynamic data, not static analysis of the
object layout. For example, the zeroes might not be alignment
holes, but rather just parts of the structure which were not used
at all for this particular run. Experience shows that's unlikely
to be the case, but it could happen.</para>
</sect2>
</sect1>
<sect1 id="dh-manual.options" xreflabel="DHAT Command-line Options">
<title>DHAT Command-line Options</title>
<para>DHAT-specific command-line options are:</para>
<!-- start of xi:include in the manpage -->
<variablelist id="dh.opts.list">
<varlistentry id="opt.show-top-n" xreflabel="--show-top-n">
<term>
<option><![CDATA[--show-top-n=<number>
[default: 10] ]]></option>
</term>
<listitem>
<para>At the end of the run, DHAT sorts the accumulated
allocation points according to some metric, and shows the
highest scoring entries. <varname>--show-top-n</varname>
controls how many entries are shown. The default of 10 is
quite small. For realistic applications you will probably need
to set it much higher, at least several hundred.</para>
</listitem>
</varlistentry>
<varlistentry id="opt.sort-by" xreflabel="--sort-by=string">
<term>
<option><![CDATA[--sort-by=<string> [default: max-bytes-live] ]]></option>
</term>
<listitem>
<para>At the end of the run, DHAT sorts the accumulated
allocation points according to some metric, and shows the
highest scoring entries. <varname>--sort-by</varname>
selects the metric used for sorting:</para>
<para><varname>max-bytes-live </varname> maximum live bytes [default]</para>
<para><varname>tot-bytes-allocd </varname> bytes allocated in total (turnover)</para>
<para><varname>max-blocks-live </varname> maximum live blocks</para>
<para><varname>tot-blocks-allocd </varname> blocks allocated in total (turnover)</para>
<para>This controls the order in which allocation points are
displayed. You can choose to look at allocation points with
the highest number of live bytes, the highest total byte turnover,
the highest number of live blocks, or the highest total block
turnover. These give usefully different pictures of program behaviour.
For example, sorting by maximum live blocks tends to show up allocation
points creating large numbers of small objects.</para>
</listitem>
</varlistentry>
</variablelist>
<para>One important point to note is that each allocation stack counts
as a separate allocation point. Because stacks have 12 frames by
default, this tends to spread data out over many allocation points.
To reduce the spreading, you may want to pass
<option>--num-callers=4</option> or some similarly small number.</para>
<!-- end of xi:include in the manpage -->
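<para>To illustrate how the choice of <option>--sort-by</option> metric and
the <option>--show-top-n</option> limit change what is shown, the following
Python sketch ranks three made-up allocation points. The records, the site
names and the <function>top_n</function> helper are hypothetical and only
mimic the selection described above. This is not how DHAT itself is
implemented.</para>
<programlisting><![CDATA[
```python
# Hypothetical totals for three allocation points (all numbers made up).
points = [
    {"site": "load_image", "max-bytes-live": 8_000_000,
     "tot-bytes-allocd": 8_000_000,
     "max-blocks-live": 2, "tot-blocks-allocd": 2},
    {"site": "make_node", "max-bytes-live": 480_000,
     "tot-bytes-allocd": 960_000,
     "max-blocks-live": 20_000, "tot-blocks-allocd": 40_000},
    {"site": "format_temp", "max-bytes-live": 256,
     "tot-bytes-allocd": 64_000_000,
     "max-blocks-live": 1, "tot-blocks-allocd": 250_000},
]


def top_n(points, sort_by="max-bytes-live", n=10):
    """Return the n highest-scoring allocation points for one metric."""
    return sorted(points, key=lambda p: p[sort_by], reverse=True)[:n]


for metric in ("max-bytes-live", "tot-bytes-allocd",
               "max-blocks-live", "tot-blocks-allocd"):
    print(metric, "->", [p["site"] for p in top_n(points, metric, n=2)])
```
]]></programlisting>
<para>In this made-up data, the site holding many small blocks only reaches
the top when sorting by maximum live blocks, and the short-lived temporary
only when sorting by turnover.</para>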
</sect1>
</chapter>

@ -1 +0,0 @@

@ -7,7 +7,7 @@
This file is part of Valgrind, a dynamic binary instrumentation
framework.
Copyright (C) 2010-2017 Mozilla Inc
Copyright (C) 2010-2017 Mozilla Foundation
This program is free software; you can redistribute it and/or
modify it under the terms of the GNU General Public License as

@ -1,6 +1,6 @@
//--------------------------------------------------------------------*/
//--- Massif: a heap profiling tool. ms_main.c ---*/
//--------------------------------------------------------------------*/
//--------------------------------------------------------------------//
//--- Massif: a heap profiling tool. ms_main.c ---//
//--------------------------------------------------------------------//
/*
This file is part of Massif, a Valgrind tool for profiling memory

@ -50,12 +50,12 @@ file path=usr/lib/valgrind/cachegrind-x86-solaris owner=r
file path=usr/lib/valgrind/callgrind-amd64-solaris owner=root group=bin mode=0755
file path=usr/lib/valgrind/callgrind-x86-solaris owner=root group=bin mode=0755
file path=usr/lib/valgrind/default.supp owner=root group=bin mode=0644
file path=usr/lib/valgrind/dhat-amd64-solaris owner=root group=bin mode=0755
file path=usr/lib/valgrind/dhat-x86-solaris owner=root group=bin mode=0755
file path=usr/lib/valgrind/drd-amd64-solaris owner=root group=bin mode=0755
file path=usr/lib/valgrind/drd-x86-solaris owner=root group=bin mode=0755
file path=usr/lib/valgrind/exp-bbv-amd64-solaris owner=root group=bin mode=0755
file path=usr/lib/valgrind/exp-bbv-x86-solaris owner=root group=bin mode=0755
file path=usr/lib/valgrind/exp-dhat-amd64-solaris owner=root group=bin mode=0755
file path=usr/lib/valgrind/exp-dhat-x86-solaris owner=root group=bin mode=0755
file path=usr/lib/valgrind/exp-sgcheck-amd64-solaris owner=root group=bin mode=0755
file path=usr/lib/valgrind/exp-sgcheck-x86-solaris owner=root group=bin mode=0755
file path=usr/lib/valgrind/getoff-amd64-solaris owner=root group=bin mode=0755
@ -75,8 +75,8 @@ file path=usr/lib/valgrind/vgpreload_core-amd64-solaris.so owner=r
file path=usr/lib/valgrind/vgpreload_core-x86-solaris.so owner=root group=bin mode=0755
file path=usr/lib/valgrind/vgpreload_drd-amd64-solaris.so owner=root group=bin mode=0755
file path=usr/lib/valgrind/vgpreload_drd-x86-solaris.so owner=root group=bin mode=0755
file path=usr/lib/valgrind/vgpreload_exp-dhat-amd64-solaris.so owner=root group=bin mode=0755
file path=usr/lib/valgrind/vgpreload_exp-dhat-x86-solaris.so owner=root group=bin mode=0755
file path=usr/lib/valgrind/vgpreload_dhat-amd64-solaris.so owner=root group=bin mode=0755
file path=usr/lib/valgrind/vgpreload_dhat-x86-solaris.so owner=root group=bin mode=0755
file path=usr/lib/valgrind/vgpreload_exp-sgcheck-amd64-solaris.so owner=root group=bin mode=0755
file path=usr/lib/valgrind/vgpreload_exp-sgcheck-x86-solaris.so owner=root group=bin mode=0755
file path=usr/lib/valgrind/vgpreload_massif-amd64-solaris.so owner=root group=bin mode=0755

@ -42,18 +42,18 @@ my %coregrind_dirs = (
);
my %tool_dirs = (
"none" => 1,
"lackey" => 1,
"massif" => 1,
"memcheck" => 1,
"drd" => 1,
"helgrind", => 1,
"callgrind" => 1,
"cachegrind" => 1,
"shared" => 1,
"callgrind" => 1,
"helgrind", => 1,
"drd" => 1,
"massif" => 1,
"dhat" => 1,
"lackey" => 1,
"none" => 1,
"exp-bbv" => 1,
"exp-dhat" => 1,
"exp-sgcheck" => 1
"shared" => 1,
);
my %dirs_to_ignore = (