Overhaul DHAT.

This commit thoroughly overhauls DHAT, moving it out of the "experimental"
ghetto. It makes moderate changes to DHAT itself, including dumping
profiling data to a JSON format output file. It also implements a new data
viewer (as a web app, in dhat/dh_view.html).

The main benefits over the old DHAT are as follows.

- The separation of data collection and presentation means you can run a
  program once under DHAT and then sort the data in various ways. Also, full
  data is in the output file, and the viewer chooses what to omit.

- The data can be sorted in more ways than previously. Some of these sorts
  involve useful filters such as "short-lived" and "zero reads or zero
  writes".

- The tree structure view avoids the need to choose stack trace depth. This
  avoids both the problem of not enough depth (when records that should be
  distinct are combined, and may not contain enough information to be
  actionable) and the problem of too much depth (when records that should be
  combined are separated, making them seem less important than they really
  are).

- Byte and block measures are shown with a percentage relative to the
  global count, which helps gauge relative significance of different parts
  of the profile.

- Byte and block measures are also shown with an allocation rate (bytes and
  blocks per million instructions), which enables comparisons across
  multiple profiles, even if those profiles represent different workloads.

- Both global and per-node measurements are taken at the global heap peak
  ("At t-gmax"), which gives Massif-like insight into the point of peak
  memory use.

- The final/lifetimes stats are a bit more useful than the old deaths stats.
  (E.g. the old deaths stats didn't take into account lifetimes of unfreed
  blocks.)

- The handling of realloc() has changed. The sequence
  `p = malloc(100); realloc(p, 200);` now increases the total block count by
  2 and the total byte count by 300. Previously it increased them by 1 and
  200. The new handling is a more operational view that better reflects the
  effect of allocations on performance. It makes a significant difference in
  the results, giving paths involving reallocation (e.g. repeated pushing to
  a growing vector) more prominence.

Other things of note:

- There is now testing, both regression tests that run within the standard
  test suite, and viewer-specific tests that cannot run within the standard
  test suite. The latter are run by loading dh_view.html?test=1 in a web
  browser.

- The commit puts all tool lists in Makefiles (and similar files) in the
  following consistent order: memcheck, cachegrind, callgrind, helgrind,
  drd, massif, dhat, lackey, none; exp-sgcheck, exp-bbv.

- A lot of fields in dh_main.c have been given more descriptive names. Those
  names now match those used in dh_view.js.
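
As a rough illustration of the realloc() accounting change described in the commit message, the following JavaScript sketch models the two counting schemes side by side. The counter object and all function names are hypothetical and are not taken from dh_main.c or dh_view.js. Only the figures for the `malloc(100)` / `realloc(p, 200)` sequence come from the commit message, and the old scheme is modelled only for the growth case.

```javascript
// Hypothetical model of the global "Total" counters. These names are
// illustrative only; they do not appear in dh_main.c or dh_view.js.
function makeCounters() {
  return { totalBlocks: 0, totalBytes: 0 };
}

// A fresh allocation counts as one block of `size` bytes in both schemes.
function onMalloc(c, size) {
  c.totalBlocks += 1;
  c.totalBytes += size;
}

// Old accounting (assumed, growth case only): the block is treated as
// resized in place, so only the size difference is added and the block
// count is unchanged.
function onReallocOld(c, oldSize, newSize) {
  c.totalBytes += newSize - oldSize;
}

// New accounting: the realloc is treated as a brand-new allocation of
// `newSize` bytes, so both the block count and the byte count grow.
function onReallocNew(c, newSize) {
  c.totalBlocks += 1;
  c.totalBytes += newSize;
}

// The sequence from the commit message: p = malloc(100); realloc(p, 200);
const oldC = makeCounters();
onMalloc(oldC, 100);
onReallocOld(oldC, 100, 200);

const newC = makeCounters();
onMalloc(newC, 100);
onReallocNew(newC, 200);

console.log(oldC); // { totalBlocks: 1, totalBytes: 200 }
console.log(newC); // { totalBlocks: 2, totalBytes: 300 }
```

Under the new scheme, a vector that grows by repeated reallocation is charged for every intermediate buffer, which is consistent with the commit message's remark that such paths gain prominence.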
2026-02-03 10:05:29 +00:00 · 2018-10-04 11:00:22 +10:00 · 2018-10-04 11:00:22 +10:00 · 441bfc5f51
commit 441bfc5f51
parent b19f6882cf
45 changed files with 5737 additions and 864 deletions
--- a/.gitignore
+++ b/.gitignore
@ -246,6 +246,34 @@
 /coregrind/m_ume/.deps
 /coregrind/m_ume/.dirstamp

+# /dhat/
+/dhat/*.dSYM
+/dhat/.deps
+/dhat/dhat-*-darwin
+/dhat/dhat-*-linux
+/dhat/dhat-*-solaris
+/dhat/Makefile
+/dhat/Makefile.in
+/dhat/vgpreload_dhat-*-linux.so
+/dhat/vgpreload_dhat-*-darwin.so
+/dhat/vgpreload_dhat-*-solaris.so
+
+# /dhat/tests/
+/dhat/tests/Makefile
+/dhat/tests/Makefile.in
+/dhat/tests/*.dSYM
+/dhat/tests/*.so
+/dhat/tests/*.stderr.diff*
+/dhat/tests/*.stderr.out
+/dhat/tests/*.stdout.diff*
+/dhat/tests/*.stdout.out
+/dhat/tests/.deps
+/dhat/tests/acc
+/dhat/tests/basic
+/dhat/tests/big
+/dhat/tests/empty
+/dhat/tests/single
+
 # /docs/
 /docs/FAQ.txt
 /docs/html
@ -496,22 +524,6 @@
 /exp-bbv/tests/x86-linux/Makefile
 /exp-bbv/tests/x86-linux/Makefile.in

-# /exp-dhat/
-/exp-dhat/*.dSYM
-/exp-dhat/.deps
-/exp-dhat/exp-dhat-*-darwin
-/exp-dhat/exp-dhat-*-linux
-/exp-dhat/exp-dhat-*-solaris
-/exp-dhat/Makefile
-/exp-dhat/Makefile.in
-/exp-dhat/vgpreload_exp-dhat-*-linux.so
-/exp-dhat/vgpreload_exp-dhat-*-darwin.so
-/exp-dhat/vgpreload_exp-dhat-*-solaris.so
-
-# /exp-dhat/tests/
-/exp-dhat/tests/Makefile
-/exp-dhat/tests/Makefile.in
-
 # /exp-sgcheck/
 /exp-sgcheck/*.dSYM
 /exp-sgcheck/.deps
--- a/Makefile.am
+++ b/Makefile.am
@ -6,15 +6,15 @@ include $(top_srcdir)/Makefile.all.am
 TOOLS =		memcheck \
 		cachegrind \
 		callgrind \
-		massif \
-		lackey \
-		none \
 		helgrind \
-		drd
+		drd \
+		massif \
+		dhat \
+		lackey \
+		none

 EXP_TOOLS = 	exp-sgcheck \
-		exp-bbv \
-		exp-dhat
+		exp-bbv

 # Put docs last because building the HTML is slow and we want to get
 # everything else working before we try it.
--- a/16
+++ b/16
@ -20,6 +20,20 @@ support for X86/macOS 10.13, AMD64/macOS 10.13.

 * ==================== TOOL CHANGES ====================

+* DHAT: 
+
+  - DHAT has been thoroughly overhauled and improved. As a result, it has been
+    promoted from an experimental tool to a regular tool. Run it with
+    --tool=dhat instead of --tool=exp-dhat.
+
+  - DHAT now prints only minimal data when the program ends, instead writing
+    the bulk of the profiling data to a file. As a result, the --show-top-n and
+    --sort-by options have been removed.
+    
+  - Data files can be viewed with the new viewer, dh_view.html.
+    
+  - See the documentation for more details.
+
 * Cachegrind:

  - cg_annotate has a new option, --show-percs, which prints percentages next
@ -94,6 +108,8 @@ n-i-bz  Fix callgrind_annotate non deterministic order for equal total
 n-i-bz  callgrind_annotate --threshold=100 does not print all functions.
 n-i-bz  callgrind_annotate Use of uninitialized value in numeric gt (>)

+
+
 Release 3.14.0 (9 October 2018)
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

--- a/configure.ac
+++ b/configure.ac
@ -4636,9 +4636,14 @@ AC_CONFIG_FILES([
   callgrind/tests/Makefile
   helgrind/Makefile
   helgrind/tests/Makefile
+   drd/Makefile
+   drd/scripts/download-and-build-splash2
+   drd/tests/Makefile
   massif/Makefile
   massif/tests/Makefile
   massif/ms_print
+   dhat/Makefile
+   dhat/tests/Makefile
   lackey/Makefile
   lackey/tests/Makefile
   none/Makefile
@ -4664,9 +4669,6 @@ AC_CONFIG_FILES([
   none/tests/x86-solaris/Makefile
   exp-sgcheck/Makefile
   exp-sgcheck/tests/Makefile
-   drd/Makefile
-   drd/scripts/download-and-build-splash2
-   drd/tests/Makefile
   exp-bbv/Makefile
   exp-bbv/tests/Makefile
   exp-bbv/tests/x86/Makefile
@ -4674,8 +4676,6 @@ AC_CONFIG_FILES([
   exp-bbv/tests/amd64-linux/Makefile
   exp-bbv/tests/ppc32-linux/Makefile
   exp-bbv/tests/arm-linux/Makefile
-   exp-dhat/Makefile
-   exp-dhat/tests/Makefile
   shared/Makefile
   solaris/Makefile
 ])
--- a/coregrind/m_libcsetjmp.c
+++ b/coregrind/m_libcsetjmp.c
@ -7,7 +7,7 @@
   This file is part of Valgrind, a dynamic binary instrumentation
   framework.

-   Copyright (C) 2010-2017 Mozilla Inc
+   Copyright (C) 2010-2017 Mozilla Foundation

   This program is free software; you can redistribute it and/or
   modify it under the terms of the GNU General Public License as
--- a/coregrind/m_main.c
+++ b/coregrind/m_main.c
@ -1454,7 +1454,7 @@ Int valgrind_main ( Int argc, HChar **argv, HChar **envp )
       || 0 == VG_(strcmp)(VG_(clo_toolname), "helgrind")
       || 0 == VG_(strcmp)(VG_(clo_toolname), "drd")
       || 0 == VG_(strcmp)(VG_(clo_toolname), "massif")
-       || 0 == VG_(strcmp)(VG_(clo_toolname), "exp-dhat")) {
+       || 0 == VG_(strcmp)(VG_(clo_toolname), "dhat")) {
      /* Change the default setting.  Later on (just below)
         main_process_cmd_line_options should pick up any
         user-supplied setting for it and will override the default
--- a/coregrind/pub_core_libcsetjmp.h
+++ b/coregrind/pub_core_libcsetjmp.h
@ -7,7 +7,7 @@
   This file is part of Valgrind, a dynamic binary instrumentation
   framework.

-   Copyright (C) 2010-2017 Mozilla Inc
+   Copyright (C) 2010-2017 Mozilla Foundation

   This program is free software; you can redistribute it and/or
   modify it under the terms of the GNU General Public License as
--- a/exp-dhat/Makefile.am
+++ b/exp-dhat/Makefile.am
@ -11,89 +11,89 @@ EXTRA_DIST = docs/dh-manual.xml
 #bin_SCRIPTS = dh_print

 #----------------------------------------------------------------------------
-# exp_dhat-<platform>
+# dhat-<platform>
 #----------------------------------------------------------------------------

-noinst_PROGRAMS  = exp-dhat-@VGCONF_ARCH_PRI@-@VGCONF_OS@
+noinst_PROGRAMS  = dhat-@VGCONF_ARCH_PRI@-@VGCONF_OS@
 if VGCONF_HAVE_PLATFORM_SEC
-noinst_PROGRAMS += exp-dhat-@VGCONF_ARCH_SEC@-@VGCONF_OS@
+noinst_PROGRAMS += dhat-@VGCONF_ARCH_SEC@-@VGCONF_OS@
 endif

 EXP_DHAT_SOURCES_COMMON = dh_main.c

-exp_dhat_@VGCONF_ARCH_PRI@_@VGCONF_OS@_SOURCES      = \
+dhat_@VGCONF_ARCH_PRI@_@VGCONF_OS@_SOURCES      = \
 	$(EXP_DHAT_SOURCES_COMMON)
-exp_dhat_@VGCONF_ARCH_PRI@_@VGCONF_OS@_CPPFLAGS     = \
+dhat_@VGCONF_ARCH_PRI@_@VGCONF_OS@_CPPFLAGS     = \
 	$(AM_CPPFLAGS_@VGCONF_PLATFORM_PRI_CAPS@)
-exp_dhat_@VGCONF_ARCH_PRI@_@VGCONF_OS@_CFLAGS       = $(LTO_CFLAGS) \
+dhat_@VGCONF_ARCH_PRI@_@VGCONF_OS@_CFLAGS       = $(LTO_CFLAGS) \
 	$(AM_CFLAGS_@VGCONF_PLATFORM_PRI_CAPS@)
-exp_dhat_@VGCONF_ARCH_PRI@_@VGCONF_OS@_DEPENDENCIES = \
+dhat_@VGCONF_ARCH_PRI@_@VGCONF_OS@_DEPENDENCIES = \
 	$(TOOL_DEPENDENCIES_@VGCONF_PLATFORM_PRI_CAPS@)
-exp_dhat_@VGCONF_ARCH_PRI@_@VGCONF_OS@_LDADD        = \
+dhat_@VGCONF_ARCH_PRI@_@VGCONF_OS@_LDADD        = \
 	$(TOOL_LDADD_@VGCONF_PLATFORM_PRI_CAPS@)
-exp_dhat_@VGCONF_ARCH_PRI@_@VGCONF_OS@_LDFLAGS      = \
+dhat_@VGCONF_ARCH_PRI@_@VGCONF_OS@_LDFLAGS      = \
 	$(TOOL_LDFLAGS_@VGCONF_PLATFORM_PRI_CAPS@)
-exp_dhat_@VGCONF_ARCH_PRI@_@VGCONF_OS@_LINK = \
+dhat_@VGCONF_ARCH_PRI@_@VGCONF_OS@_LINK = \
 	$(top_builddir)/coregrind/link_tool_exe_@VGCONF_OS@ \
 	@VALT_LOAD_ADDRESS_PRI@ \
 	$(LINK) \
-	$(exp_dhat_@VGCONF_ARCH_PRI@_@VGCONF_OS@_CFLAGS) \
-	$(exp_dhat_@VGCONF_ARCH_PRI@_@VGCONF_OS@_LDFLAGS)
+	$(dhat_@VGCONF_ARCH_PRI@_@VGCONF_OS@_CFLAGS) \
+	$(dhat_@VGCONF_ARCH_PRI@_@VGCONF_OS@_LDFLAGS)

 if VGCONF_HAVE_PLATFORM_SEC
-exp_dhat_@VGCONF_ARCH_SEC@_@VGCONF_OS@_SOURCES      = \
+dhat_@VGCONF_ARCH_SEC@_@VGCONF_OS@_SOURCES      = \
 	$(EXP_DHAT_SOURCES_COMMON)
-exp_dhat_@VGCONF_ARCH_SEC@_@VGCONF_OS@_CPPFLAGS     = \
+dhat_@VGCONF_ARCH_SEC@_@VGCONF_OS@_CPPFLAGS     = \
 	$(AM_CPPFLAGS_@VGCONF_PLATFORM_SEC_CAPS@)
-exp_dhat_@VGCONF_ARCH_SEC@_@VGCONF_OS@_CFLAGS       = $(LTO_CFLAGS) \
+dhat_@VGCONF_ARCH_SEC@_@VGCONF_OS@_CFLAGS       = $(LTO_CFLAGS) \
 	$(AM_CFLAGS_@VGCONF_PLATFORM_SEC_CAPS@)
-exp_dhat_@VGCONF_ARCH_SEC@_@VGCONF_OS@_DEPENDENCIES = \
+dhat_@VGCONF_ARCH_SEC@_@VGCONF_OS@_DEPENDENCIES = \
 	$(TOOL_DEPENDENCIES_@VGCONF_PLATFORM_SEC_CAPS@)
-exp_dhat_@VGCONF_ARCH_SEC@_@VGCONF_OS@_LDADD        = \
+dhat_@VGCONF_ARCH_SEC@_@VGCONF_OS@_LDADD        = \
 	$(TOOL_LDADD_@VGCONF_PLATFORM_SEC_CAPS@)
-exp_dhat_@VGCONF_ARCH_SEC@_@VGCONF_OS@_LDFLAGS      = \
+dhat_@VGCONF_ARCH_SEC@_@VGCONF_OS@_LDFLAGS      = \
 	$(TOOL_LDFLAGS_@VGCONF_PLATFORM_SEC_CAPS@)
-exp_dhat_@VGCONF_ARCH_SEC@_@VGCONF_OS@_LINK = \
+dhat_@VGCONF_ARCH_SEC@_@VGCONF_OS@_LINK = \
 	$(top_builddir)/coregrind/link_tool_exe_@VGCONF_OS@ \
 	@VALT_LOAD_ADDRESS_SEC@ \
 	$(LINK) \
-	$(exp_dhat_@VGCONF_ARCH_SEC@_@VGCONF_OS@_CFLAGS) \
-	$(exp_dhat_@VGCONF_ARCH_SEC@_@VGCONF_OS@_LDFLAGS)
+	$(dhat_@VGCONF_ARCH_SEC@_@VGCONF_OS@_CFLAGS) \
+	$(dhat_@VGCONF_ARCH_SEC@_@VGCONF_OS@_LDFLAGS)
 endif

 #----------------------------------------------------------------------------
-# vgpreload_exp_dhat-<platform>.so
+# vgpreload_dhat-<platform>.so
 #----------------------------------------------------------------------------

-noinst_PROGRAMS += vgpreload_exp-dhat-@VGCONF_ARCH_PRI@-@VGCONF_OS@.so
+noinst_PROGRAMS += vgpreload_dhat-@VGCONF_ARCH_PRI@-@VGCONF_OS@.so
 if VGCONF_HAVE_PLATFORM_SEC
-noinst_PROGRAMS += vgpreload_exp-dhat-@VGCONF_ARCH_SEC@-@VGCONF_OS@.so
+noinst_PROGRAMS += vgpreload_dhat-@VGCONF_ARCH_SEC@-@VGCONF_OS@.so
 endif

 if VGCONF_OS_IS_DARWIN
 noinst_DSYMS = $(noinst_PROGRAMS)
 endif

-vgpreload_exp_dhat_@VGCONF_ARCH_PRI@_@VGCONF_OS@_so_SOURCES      = 
-vgpreload_exp_dhat_@VGCONF_ARCH_PRI@_@VGCONF_OS@_so_CPPFLAGS     = \
+vgpreload_dhat_@VGCONF_ARCH_PRI@_@VGCONF_OS@_so_SOURCES      = 
+vgpreload_dhat_@VGCONF_ARCH_PRI@_@VGCONF_OS@_so_CPPFLAGS     = \
 	$(AM_CPPFLAGS_@VGCONF_PLATFORM_PRI_CAPS@)
-vgpreload_exp_dhat_@VGCONF_ARCH_PRI@_@VGCONF_OS@_so_CFLAGS       = \
+vgpreload_dhat_@VGCONF_ARCH_PRI@_@VGCONF_OS@_so_CFLAGS       = \
 	$(AM_CFLAGS_PSO_@VGCONF_PLATFORM_PRI_CAPS@)
-vgpreload_exp_dhat_@VGCONF_ARCH_PRI@_@VGCONF_OS@_so_DEPENDENCIES = \
+vgpreload_dhat_@VGCONF_ARCH_PRI@_@VGCONF_OS@_so_DEPENDENCIES = \
 	$(LIBREPLACEMALLOC_@VGCONF_PLATFORM_PRI_CAPS@)
-vgpreload_exp_dhat_@VGCONF_ARCH_PRI@_@VGCONF_OS@_so_LDFLAGS      = \
+vgpreload_dhat_@VGCONF_ARCH_PRI@_@VGCONF_OS@_so_LDFLAGS      = \
 	$(PRELOAD_LDFLAGS_@VGCONF_PLATFORM_PRI_CAPS@) \
 	$(LIBREPLACEMALLOC_LDFLAGS_@VGCONF_PLATFORM_PRI_CAPS@)

 if VGCONF_HAVE_PLATFORM_SEC
-vgpreload_exp_dhat_@VGCONF_ARCH_SEC@_@VGCONF_OS@_so_SOURCES      = 
-vgpreload_exp_dhat_@VGCONF_ARCH_SEC@_@VGCONF_OS@_so_CPPFLAGS     = \
+vgpreload_dhat_@VGCONF_ARCH_SEC@_@VGCONF_OS@_so_SOURCES      = 
+vgpreload_dhat_@VGCONF_ARCH_SEC@_@VGCONF_OS@_so_CPPFLAGS     = \
 	$(AM_CPPFLAGS_@VGCONF_PLATFORM_SEC_CAPS@)
-vgpreload_exp_dhat_@VGCONF_ARCH_SEC@_@VGCONF_OS@_so_CFLAGS       = \
+vgpreload_dhat_@VGCONF_ARCH_SEC@_@VGCONF_OS@_so_CFLAGS       = \
 	$(AM_CFLAGS_PSO_@VGCONF_PLATFORM_SEC_CAPS@)
-vgpreload_exp_dhat_@VGCONF_ARCH_SEC@_@VGCONF_OS@_so_DEPENDENCIES = \
+vgpreload_dhat_@VGCONF_ARCH_SEC@_@VGCONF_OS@_so_DEPENDENCIES = \
 	$(LIBREPLACEMALLOC_@VGCONF_PLATFORM_SEC_CAPS@)
-vgpreload_exp_dhat_@VGCONF_ARCH_SEC@_@VGCONF_OS@_so_LDFLAGS      = \
+vgpreload_dhat_@VGCONF_ARCH_SEC@_@VGCONF_OS@_so_LDFLAGS      = \
 	$(PRELOAD_LDFLAGS_@VGCONF_PLATFORM_SEC_CAPS@) \
 	$(LIBREPLACEMALLOC_LDFLAGS_@VGCONF_PLATFORM_SEC_CAPS@)
 endif
--- a/exp-dhat/dh_main.c
+++ b/exp-dhat/dh_main.c
--- a/dhat/dh_test.js
+++ b/dhat/dh_test.js
--- a/dhat/dh_view.css
+++ b/dhat/dh_view.css
@ -0,0 +1,130 @@
+
+/*--------------------------------------------------------------------*/
+/*--- DHAT: a Dynamic Heap Analysis Tool               dh_view.css ---*/
+/*--------------------------------------------------------------------*/
+
+/*
+   This file is part of DHAT, a Valgrind tool for profiling the
+   heap usage of programs.
+
+   Copyright (C) 2018 Mozilla Foundation
+
+   This program is free software; you can redistribute it and/or
+   modify it under the terms of the GNU General Public License as
+   published by the Free Software Foundation; either version 2 of the
+   License, or (at your option) any later version.
+
+   This program is distributed in the hope that it will be useful, but
+   WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program; if not, write to the Free Software
+   Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA
+   02111-1307, USA.
+
+   The GNU General Public License is contained in the file COPYING.
+*/
+
+html {
+  background: #cfcfcf; /* pale grey */
+}
+
+.section {
+  border-radius: 10px;
+  background-color: white;
+  padding: 1em;
+  margin: 0.5em 0;
+}
+
+div.header {
+  font-weight: bold;
+  display: inline-block;
+  margin: 0 1.5em 0 0;
+  border-radius: 10px;
+  padding: 0.5em;
+  background-color: #cfcfcf; /* pale grey */
+  -moz-user-select: none;
+  -webkit-user-select: none;
+  -ms-user-select: none;
+  user-select: none;
+}
+
+.hidden {
+  display: none;
+}
+
+.error {
+  color: red;
+}
+
+.invocation {
+  background-color: #bfd7d7; /* pale blue-grey */
+}
+
+.times {
+  background-color: #efdfbf; /* pale brown */
+}
+
+.arrow, .treeline {
+  background-color: white;
+}
+
+.internal {
+  cursor: pointer;
+}
+
+/* increasingly pale shades of green */
+.leaf.lt100 { background-color: #7fff7f; }
+.leaf.lt32  { background-color: #8fff8f; }
+.leaf.lt16  { background-color: #9fff9f; }
+.leaf.lt8   { background-color: #afffaf; }
+.leaf.lt4   { background-color: #bfffbf; }
+.leaf.lt2   { background-color: #cfffcf; }
+.leaf.lt1   { background-color: #dfffdf; }
+.leaf.insig { background-color: #efffef; }
+
+/* increasingly pale shades of yellow */
+.collapsed.lt100 { background-color: #ffff7f; }
+.collapsed.lt32  { background-color: #ffff8f; }
+.collapsed.lt16  { background-color: #ffff9f; }
+.collapsed.lt8   { background-color: #ffffaf; }
+.collapsed.lt4   { background-color: #ffffbf; }
+.collapsed.lt2   { background-color: #ffffcf; }
+.collapsed.lt1   { background-color: #ffffdf; }
+.collapsed.insig { background-color: #ffffef; }
+
+/* increasingly pale shades of blue */
+.expanded.lt100 { background-color: #7f7fff; }
+.expanded.lt32  { background-color: #8f8fff; }
+.expanded.lt16  { background-color: #9f9fff; }
+.expanded.lt8   { background-color: #afafff; }
+.expanded.lt4   { background-color: #bfbfff; }
+.expanded.lt2   { background-color: #cfcfff; }
+.expanded.lt1   { background-color: #dfdfff; }
+.expanded.insig { background-color: #efefff; }
+
+.bold {
+  font-weight: bold;
+}
+
+.threshold {
+  background-color: #dfdfdf; /* pale grey */
+}
+
+.noselect {
+  -moz-user-select: none;
+  -webkit-user-select: none;
+  -ms-user-select: none;
+  user-select: none;
+}
+
+.legend, .timings {
+  font-size: 80%;
+  padding: 0 1em;
+}
+
+.debug {
+  font-size: 80%;
+}
--- a/dhat/dh_view.html
+++ b/dhat/dh_view.html
@ -0,0 +1,10 @@
+<!DOCTYPE html>
+<html>
+  <head>
+    <meta charset="UTF-8">
+    <link rel="stylesheet" href="dh_view.css">
+    <script src="dh_view.js"></script>
+  </head>
+
+  <body onload="onLoad()"></body>
+</html>
--- a/dhat/dh_view.js
+++ b/dhat/dh_view.js
--- a/dhat/docs/dh-manual.xml
+++ b/dhat/docs/dh-manual.xml
@ -0,0 +1,654 @@
+<?xml version="1.0"?> <!-- -*- sgml -*- -->
+<!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN"
+          "http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd"
+[ <!ENTITY % vg-entities SYSTEM "../../docs/xml/vg-entities.xml"> %vg-entities; ]>
+
+
+<chapter id="dh-manual" 
+         xreflabel="DHAT: a dynamic heap analysis tool">
+  <title>DHAT: a dynamic heap analysis tool</title>
+
+<para>To use this tool, you must specify
+<option>--tool=dhat</option> on the Valgrind command line.</para>
+
+
+
+<sect1 id="dh-manual.overview" xreflabel="Overview">
+<title>Overview</title>
+
+<para>DHAT is a tool for examining how programs use their heap
+allocations.</para>
+
+<para>It tracks the allocated blocks, and inspects every memory access
+to find which block, if any, it is to. It presents, on an allocation point
+basis, information about these blocks such as sizes, lifetimes, numbers of
+reads and writes, and read and write patterns.</para>
+
+<para>Using this information it is possible to identify allocation points with
+the following characteristics:</para>
+
+<itemizedlist>
+
+  <listitem><para>potential process-lifetime leaks: blocks allocated
+   by the point just accumulate, and are freed only at the end of the
+   run.</para></listitem>
+
+ <listitem><para>excessive turnover: points which chew through a lot
+  of heap, even if it is not held onto for very long</para></listitem>
+
+ <listitem><para>excessively transient: points which allocate very
+ short lived blocks</para></listitem>
+
+ <listitem><para>useless or underused allocations: blocks which are
+  allocated but not completely filled in, or are filled in but not
+  subsequently read.</para></listitem>
+
+ <listitem><para>blocks with inefficient layout -- areas never
+  accessed, or with hot fields scattered throughout the
+  block.</para></listitem>
+</itemizedlist>
+
+<para>As with the Massif heap profiler, DHAT measures program progress
+by counting instructions, and so presents all age/time related figures
+as instruction counts. This sounds a little odd at first, but it
+makes runs repeatable in a way which is not possible if CPU time is
+used.</para>
+
+</sect1>
+
+
+
+<sect1 id="dh-manual.profile" xreflabel="Using DHAT">
+<title>Using DHAT</title>
+
+<para>First off, as for normal Valgrind use, you probably want to compile with
+debugging info (the <option>-g</option> option). But by contrast with normal
+Valgrind use, you probably do want to turn optimisation on, since you should
+profile your program as it will be normally run.</para>
+
+<para>Second, you need to run your program under DHAT to gather the profiling
+information.</para>
+
+<para>Finally, you need to use DHAT's viewer (in a web browser) to get a
+detailed presentation of that information.</para>
+
+
+<sect2 id="dh-manual.running-DHAT" xreflabel="Running DHAT">
+<title>Running DHAT</title>
+
+<para>To run DHAT on a program <filename>prog</filename>, run:</para>
+<screen><![CDATA[
+valgrind --tool=dhat prog
+]]></screen>
+
+<para>The program will execute (slowly). Upon completion, summary statistics
+that look like this will be printed:</para>
+
+<programlisting><![CDATA[
+==11514== Total:     823,849,731 bytes in 3,929,133 blocks
+==11514== At t-gmax: 133,485,082 bytes in 436,521 blocks
+==11514== At t-end:  258,002 bytes in 2,129 blocks
+==11514== Reads:     2,807,182,810 bytes
+==11514== Writes:    1,149,617,086 bytes
+]]></programlisting>
+
+<para>The first line shows how many heap blocks and bytes were allocated over
+the entire execution.</para>
+
+<para>The second line shows how many heap blocks and bytes were alive at
+<computeroutput>t-gmax</computeroutput>, i.e. the time when the heap size
+reached its global maximum (as measured in bytes).</para>
+
+<para>The third line shows how many heap blocks and bytes were alive at
+<computeroutput>t-end</computeroutput>, i.e. the end of execution. In other
+words, how many blocks and bytes were not explicitly freed. </para>
+
+<para>The fourth and fifth lines show how many bytes within heap blocks were
+read and written during the entire execution. </para>
+
+<para>These lines are moderately interesting at best. More useful information
+can be seen with DHAT's viewer.</para>
+
+</sect2>
+
+
+<sect2 id="dh-manual.outputfile" xreflabel="Output File">
+<title>Output File</title>
+
+<para>As well as printing summary information, DHAT also writes more detailed
+profiling information to a file. By default this file is named
+<filename>dhat.out.&lt;pid&gt;</filename> (where
+<filename>&lt;pid&gt;</filename> is the program's process ID), but its name can
+be changed with the <option>--dhat-out-file</option> option. This file is JSON,
+and intended to be viewed by DHAT's viewer, which is described in the next
+section.</para>
+
+<para>The default <computeroutput>.&lt;pid&gt;</computeroutput> suffix on the
+output file name serves two purposes. Firstly, it means you don't have to
+rename old log files that you don't want to overwrite. Secondly, and more
+importantly, it allows correct profiling with the
+<option>--trace-children=yes</option> option of programs that spawn child
+processes.</para>
+
+<para>The output file can be big, many megabytes for large applications
+built with full debugging information.</para>
+
+</sect2>
+
+</sect1>
+
+
+
+<sect1 id="dh-manual.viewer" xreflabel="DHAT's viewer">
+<title>DHAT's Viewer</title>
+
+<para>DHAT's viewer can be run in a web browser by loading the file
+<computeroutput>dh_view.html</computeroutput>. Use the "Load" button to choose
+a DHAT output file to view.</para>
+
+
+<sect2><title>The Output Header</title>
+
+<para>The first part of the output shows the program command and process ID.
+For example:</para>
+
+<programlisting><![CDATA[
+Invocation {
+  Command: /home/njn/moz/rust0/build/x86_64-unknown-linux-gnu/stage2/bin/rustc --crate-name tuple_stress src/main.rs
+  PID:     18816
+}
+]]></programlisting>
+
+<para>The second part of the output shows the
+<computeroutput>t-gmax</computeroutput> and
+<computeroutput>t-end</computeroutput> values again. For example:</para>
+
+<programlisting><![CDATA[
+Times {
+  t-gmax: 8,138,210,673 instrs (86.92% of program duration)
+  t-end:  9,362,544,994 instrs
+}
+]]></programlisting>
+
+</sect2>
+
+
+<sect2><title>The AP Tree</title>
+
+<para>The third part of the output is the largest and most interesting part,
+showing the allocation point (AP) tree.</para>
+
+
+<sect3><title>Structure</title>
+
+<para>The following image shows a screenshot of part of an AP tree. The font is very
+small because this screenshot is intended to demonstrate the high-level
+structure of the tree rather than the details within the text.</para>
+
+<graphic fileref="images/dh-tree.png" scalefit="1"/>
+
+<para>Like any tree, it has a root node, leaf nodes, and non-leaf nodes. The
+structure of the tree is shown by the lines connecting nodes. Child nodes are
+beneath their parent and indented one level.</para>
+
+<para>The sub-trees beneath a non-leaf node can be collapsed or expanded by
+clicking on the node. It is useful to collapse sub-trees that you aren't
+interested in.</para>
+
+<para>Colours are meaningful, and are intended to ease tree navigation, but the
+information they represent is also present within the text. (This means that
+colour-blind users are not denied any information.)</para>
+
+<para>Each leaf node is coloured green. Each non-leaf node is coloured blue
+and has a down arrow (<computeroutput>▼</computeroutput>) next to it when
+its sub-tree is expanded. Each non-leaf node is coloured yellow and has a
+right arrow (<computeroutput>▶</computeroutput>) next to it when its sub-tree
+is collapsed.</para>
+
+<para>The shade of green, blue or yellow used for a node indicates its
+significance. Darker shades represent greater significance (in terms of bytes
+or blocks).</para>
+
+<para>Note that the entire output is text, even the arrows and lines connecting
+nodes. This means you can copy and paste any part of the output easily into an
+email, bug report, etc.</para>
+
+</sect3>
+
+
+<sect3><title>The Root Node</title>
+
+<para>The root node looks like this:</para>
+
+<programlisting><![CDATA[
+AP 1/1 (25 children) {
+  Total:     1,355,253,987 bytes (100%, 67,454.81/Minstr) in 5,943,417 blocks (100%, 295.82/Minstr), avg size 228.03 bytes, avg lifetime 3,134,692,250.67 instrs (15.6% of program duration)
+  At t-gmax: 423,930,307 bytes (100%) in 1,575,682 blocks (100%), avg size 269.05 bytes
+  At t-end:  258,002 bytes (100%) in 2,129 blocks (100%), avg size 121.18 bytes
+  Reads:     5,478,606,988 bytes (100%, 272,685.7/Minstr), 4.04/byte
+  Writes:    2,040,294,800 bytes (100%, 101,551.22/Minstr), 1.51/byte
+  Allocated at {
+    #0: [root]
+  }
+}
+]]></programlisting>
+
+<para>The root node covers the entire execution. The information is a superset
+of the information shown when DHAT ran, adding details such as allocation
+rates, average block sizes, block lifetimes, and read and write ratios. The
+next example will explain these in more detail.</para>
+
+</sect3>
+
+
+<sect3><title>Interior Nodes</title>
+
+<para>AP nodes further down the tree show information about a subset of
+allocations. For example:</para>
+
+<programlisting><![CDATA[
+AP 1.1/25 (2 children) {
+  Total:     54,533,440 bytes (4.02%, 2,714.28/Minstr) in 458,839 blocks (7.72%, 22.84/Minstr), avg size 118.85 bytes, avg lifetime 1,127,259,403.64 instrs (5.61% of program duration)
+  At t-gmax: 0 bytes (0%) in 0 blocks (0%), avg size 0 bytes
+  At t-end:  0 bytes (0%) in 0 blocks (0%), avg size 0 bytes
+  Reads:     15,993,012 bytes (0.29%, 796.02/Minstr), 0.29/byte
+  Writes:    20,974,752 bytes (1.03%, 1,043.97/Minstr), 0.38/byte
+  Allocated at {
+    #1: 0x95CACC9: alloc (alloc.rs:72)
+    #2: 0x95CACC9: alloc (alloc.rs:148)
+    #3: 0x95CACC9: reserve_internal<syntax::tokenstream::TokenStream,alloc::alloc::Global> (raw_vec.rs:669)
+    #4: 0x95CACC9: reserve<syntax::tokenstream::TokenStream,alloc::alloc::Global> (raw_vec.rs:492)
+    #5: 0x95CACC9: reserve<syntax::tokenstream::TokenStream> (vec.rs:460)
+    #6: 0x95CACC9: push<syntax::tokenstream::TokenStream> (vec.rs:989)
+    #7: 0x95CACC9: parse_token_trees_until_close_delim (tokentrees.rs:27)
+    #8: 0x95CACC9: syntax::parse::lexer::tokentrees::<impl syntax::parse::lexer::StringReader<'a>>::parse_token_tree (tokentrees.rs:81)
+  }
+}
+]]></programlisting>
+
+<para>The first line indicates the node's position in the tree. The
+<computeroutput>1.1</computeroutput> is a unique identifier for the node and
+also says that it is the first child of node <computeroutput>1</computeroutput>
+(which is the root). The <computeroutput>/25</computeroutput> says that it is
+one of 25 children, i.e. it has 24 siblings. The <computeroutput>(2
+children)</computeroutput> says that this node has two children of its
+own.</para>
+
+<para>Allocations are aggregated by their allocation stack trace. The
+<computeroutput>Allocated at</computeroutput> section shows the allocation
+stack trace that is shared by all the blocks covered by this node.</para>
+
+<para>The <computeroutput>Total</computeroutput> line shows that this node
+accounts for 4.02% of all bytes allocated during execution, and 7.72% of all
+blocks. These percentages are useful for comparing the significance of
+different nodes within a single profile; an AP that accounts for 10% of bytes
+allocated is likely to be more interesting than one that accounts for
+2%.</para>
+
+<para>The <computeroutput>Total</computeroutput> line also shows allocation
+rates, measured in bytes and blocks per million instructions. These rates are
+useful for comparing the significance of nodes across profiles made with
+different workloads.</para>
+
+<para>Finally, the <computeroutput>Total</computeroutput> line shows the
+average size and lifetimes of these blocks.</para>
+
+<para>The <computeroutput>At t-gmax</computeroutput> line shows that no
+blocks from this AP were alive when the global heap peak occurred. In other
+words, these blocks do not contribute at all to the global heap peak.</para>
+
+<para>The <computeroutput>At t-end</computeroutput> line shows that no blocks
+from this AP were alive at shutdown. In other words, all those blocks were
+explicitly freed before termination.</para>
+
+<para>The <computeroutput>Reads</computeroutput> line shows how many bytes
+were read within this AP's blocks, the fraction this represents of all heap
+reads, and the read rate. Finally, it shows the read ratio, which is the
+number of reads per byte. In this case the number is 0.29, which is quite
+low -- if no byte was read twice, then only 29% of the allocated bytes were
+read, which means that at least 71% of the bytes were never read! This
+suggests that the blocks are being underutilized and might be worth
+optimizing.</para>
+
+<para>The <computeroutput>Writes</computeroutput> line is similar to the
+<computeroutput>Reads</computeroutput> line. In this case, at most 38% of the
+bytes are ever written, and at least 62% of the bytes were never written.
+</para>
+
+<para>The <computeroutput>Reads</computeroutput> and
+<computeroutput>Writes</computeroutput> measurements suggest that the blocks
+are being under-utilised and might be worth optimizing. Having said that, this
+kind of under-utilisation is common in data structures that grow, such as
+vectors and hash tables, and isn't always fixable. </para>
+
+</sect3>
+
+
+<sect3><title>Leaf Nodes</title>
+
+<para>This is a leaf node:</para>
+
+<programlisting><![CDATA[
+AP 1.1.1.1/2 {
+  Total:     31,460,928 bytes (2.32%, 1,565.9/Minstr) in 262,171 blocks (4.41%, 13.05/Minstr), avg size 120 bytes, avg lifetime 986,406,885.05 instrs (4.91% of program duration)
+  Max:       16,779,136 bytes in 65,543 blocks, avg size 256 bytes
+  At t-gmax: 0 bytes (0%) in 0 blocks (0%), avg size 0 bytes
+  At t-end:  0 bytes (0%) in 0 blocks (0%), avg size 0 bytes
+  Reads:     5,964,704 bytes (0.11%, 296.88/Minstr), 0.19/byte
+  Writes:    10,487,200 bytes (0.51%, 521.98/Minstr), 0.33/byte
+  Allocated at {
+    ^1: 0x95CACC9: alloc (alloc.rs:72)
+    ^2: 0x95CACC9: alloc (alloc.rs:148)
+    ^3: 0x95CACC9: reserve_internal<syntax::tokenstream::TokenStream,alloc::alloc::Global> (raw_vec.rs:669)
+    ^4: 0x95CACC9: reserve<syntax::tokenstream::TokenStream,alloc::alloc::Global> (raw_vec.rs:492)
+    ^5: 0x95CACC9: reserve<syntax::tokenstream::TokenStream> (vec.rs:460)
+    ^6: 0x95CACC9: push<syntax::tokenstream::TokenStream> (vec.rs:989)
+    ^7: 0x95CACC9: parse_token_trees_until_close_delim (tokentrees.rs:27)
+    ^8: 0x95CACC9: syntax::parse::lexer::tokentrees::<impl syntax::parse::lexer::StringReader<'a>>::parse_token_tree (tokentrees.rs:81)
+    ^9: 0x95CAC39: parse_token_trees_until_close_delim (tokentrees.rs:26)
+    ^10: 0x95CAC39: syntax::parse::lexer::tokentrees::<impl syntax::parse::lexer::StringReader<'a>>::parse_token_tree (tokentrees.rs:81)
+    #11: 0x95CAC39: parse_token_trees_until_close_delim (tokentrees.rs:26)
+    #12: 0x95CAC39: syntax::parse::lexer::tokentrees::<impl syntax::parse::lexer::StringReader<'a>>::parse_token_tree (tokentrees.rs:81)
+  }
+}
+]]></programlisting>
+
+<para>The <computeroutput>1.1.1.1/2</computeroutput> indicates that this node
+is a great-grandchild of the root; is the first grandchild of the node in the
+previous example; and has no children.</para>
+
+<para>Leaf nodes contain an additional <computeroutput>Max</computeroutput>
+line, indicating the peak memory use for the blocks covered by this AP. (This
+peak may have occurred at a time other than
+<computeroutput>t-gmax</computeroutput>.) In this case, 31,460,928 bytes were
+allocated from this AP, but the maximum size alive at once was 16,779,136
+bytes.</para>
+
+<para>Stack frames that begin with a <computeroutput>^</computeroutput> rather
+than a <computeroutput>#</computeroutput> are copied from ancestor nodes.
+(In this example, the first 8 frames are identical to those from the node in
+the previous example.) These frames could be found by tracing back through
+ancestor nodes, but that can be annoying, which is why they are duplicated.
+This also means that each node makes complete sense on its own.</para>
+
+</sect3>
+
+
+<sect3><title>Access Counts</title>
+
+<para>If all blocks covered by an AP node have the same size, an additional
+<computeroutput>Accesses</computeroutput> field will be present. It indicates
+how the reads and writes within these blocks were distributed. For
+example:</para>
+
+<programlisting><![CDATA[
+Total:     8,388,672 bytes (0.62%, 417.53/Minstr) in 262,146 blocks (4.41%, 13.05/Minstr), avg size 32 bytes, avg lifetime 16,726,078,401.51 instrs (83.25% of program duration)
+At t-gmax: 8,388,672 bytes (1.98%) in 262,146 blocks (16.64%), avg size 32 bytes
+At t-end:  0 bytes (0%) in 0 blocks (0%), avg size 0 bytes
+Reads:     9,109,682 bytes (0.17%, 453.41/Minstr), 1.09/byte
+Writes:    7,340,088 bytes (0.36%, 365.34/Minstr), 0.88/byte
+Accesses: {
+  [  0]  65547 7 8 4 65529 〃 〃 〃 16 〃 〃 〃 12 〃 〃 〃 〃 〃 〃 〃 〃 〃 〃 〃 65542 〃 〃 〃 - - - - 
+}
+]]></programlisting>
+
+<para>Every block covered by this AP was 32 bytes. Within all of those blocks,
+byte 0 was accessed (read or written) 65,547 times, byte 1 was accessed 7
+times, byte 2 was accessed 8 times, and so on.</para>
+
+<para>The ditto symbol (<computeroutput>〃</computeroutput>) means "same access
+count as the previous byte".</para>
+
+<para>A dash (<computeroutput>-</computeroutput>) means "zero". (It is used
+instead of <computeroutput>0</computeroutput> because it makes unaccessed
+regions more easily identifiable.)</para>
+
+<para>The infinity symbol (<computeroutput>∞</computeroutput>, not present in
+this example) means "exceeded the maximum tracked count".</para>
+
+<para>Block layout can often be inferred from counts. For example, these blocks
+probably have four separate byte-sized fields, followed by a four-byte field,
+and so on.</para>
+
+<para>Access counts can be useful for identifying data alignment holes or other
+layout inefficiencies.</para>
+
+</sect3>
+
+
+<sect3><title>Aggregate Nodes</title>
+
+<para>The AP tree is very large and many nodes represent tiny numbers of blocks
+and bytes. Therefore, DHAT's viewer aggregates insignificant nodes like
+this:</para>
+
+<programlisting><![CDATA[
+AP 1.14.2/2 {
+  Total:     5,175 blocks (0.09%, 0.26/Minstr)
+  Allocated at {
+    [5 insignificant]
+  }
+}
+]]></programlisting>
+
+<para>Much of the detail is stripped away, leaving only basic measurements,
+along with an indication of how many nodes were aggregated together (5 in this
+case).</para>
+
+</sect3>
+
+</sect2>
+
+
+<sect2><title>The Output Footer</title>
+
+<para>Below the AP tree is a line like this:</para>
+
+<programlisting><![CDATA[
+AP significance threshold: total >= 59,434.17 blocks (1%)
+]]></programlisting>
+
+<para>It shows the function used to determine if an AP node is significant. All
+nodes that don't satisfy this function are aggregated. It is occasionally
+useful if you don't understand why an AP node has been aggregated. The exact
+threshold depends on the sort metric (see below).</para>
+
+<para>Finally, the bottom of the page shows a legend that explains some of the
+terms, abbreviations and symbols used in the output.</para>
+
+</sect2>
+
+
+<sect2><title>Sort Metrics</title>
+
+<para>The order in which sub-trees are sorted can be changed via the "Sort
+metric" drop-down menu at the top of DHAT's viewer. Different sort metrics can
+be useful for finding different things. Some sort metrics also incorporate some
+filtering, so that only nodes meeting particular criteria are shown.</para>
+
+<!-- start of xi:include in the manpage -->
+<variablelist>
+
+  <varlistentry>
+    <term>Total (bytes)</term>
+    <listitem><para>The total number of bytes allocated during the execution.
+    Highly useful for evaluating heap churn, though not quite as useful as
+    "Total (blocks)".
+    </para></listitem>
+  </varlistentry>
+
+  <varlistentry>
+    <term>Total (blocks)</term>
+    <listitem><para>The total number of blocks allocated during the execution.
+    Highly useful for evaluating heap churn; reducing the number of calls to
+    the allocator can significantly speed up a program. This is the default
+    sort metric.
+    </para></listitem>
+  </varlistentry>
+
+  <varlistentry>
+    <term>Total (blocks), tiny</term>
+    <listitem><para>Like "Total (blocks)", but shows only very small blocks.
+    Moderately useful, because such blocks are often easy to avoid allocating.
+    </para></listitem>
+  </varlistentry>
+
+  <varlistentry>
+    <term>Total (blocks), short-lived</term>
+    <listitem><para>Like "Total (blocks)", but shows only very short-lived
+    blocks. Moderately useful, because such blocks are often easy to avoid
+    allocating.
+    </para></listitem>
+  </varlistentry>
+
+  <varlistentry>
+    <term>Total (bytes), zero reads or zero writes</term>
+    <listitem><para>Like "Total (bytes)", but shows only blocks that are 
+    never read or never written to (or both). Highly useful, because such
+    blocks indicate poor use of memory and are often easy to avoid allocating.
+    For example, sometimes a block is allocated and written to but then only
+    read if a condition C is true; in that case, it may be possible to delay
+    creating the block until condition C is true. Alternatively, sometimes
+    blocks are created and never used; such blocks are trivial to remove.
+    </para></listitem>
+  </varlistentry>
+
+  <varlistentry>
+    <term>Total (blocks), zero reads or zero writes</term>
+    <listitem><para>Like "Total (bytes), zero reads or zero writes" but for
+    blocks. Highly useful.
+    </para></listitem>
+  </varlistentry>
+
+  <varlistentry>
+    <term>Total (bytes), low-access</term>
+    <listitem><para>Like "Total (bytes)", but shows only blocks that have low
+    numbers of reads or low numbers of writes (or both). Moderately useful,
+    because such blocks indicate poor use of memory.
+    </para></listitem>
+  </varlistentry>
+
+  <varlistentry>
+    <term>Total (blocks), low-access</term>
+    <listitem><para>Like "Total (bytes), low-access", but for blocks.
+    </para></listitem>
+  </varlistentry>
+
+  <varlistentry>
+    <term>At t-gmax (bytes)</term>
+    <listitem><para>This shows the breakdown of memory at the point of peak
+    heap memory usage. Highly useful for reducing peak memory usage.
+    </para></listitem>
+  </varlistentry>
+
+  <varlistentry>
+    <term>At t-end (bytes)</term>
+    <listitem><para>This shows the breakdown of memory at program termination.
+    Highly useful for identifying process-lifetime leaks.
+    </para></listitem>
+  </varlistentry>
+
+  <varlistentry>
+    <term>Reads (bytes)</term>
+    <listitem><para>The number of bytes read within heap blocks. Occasionally
+    useful.
+    </para></listitem>
+  </varlistentry>
+
+  <varlistentry>
+    <term>Reads (bytes), high-access</term>
+    <listitem><para>Like "Reads (bytes)", but only shows blocks with high read
+    ratios. Occasionally useful for identifying hot areas of memory.
+    </para></listitem>
+  </varlistentry>
+
+  <varlistentry>
+    <term>Writes (bytes)</term>
+    <listitem><para>Like "Reads (bytes)", but for writes. Occasionally useful.
+    </para></listitem>
+  </varlistentry>
+
+  <varlistentry>
+    <term>Writes (bytes), high-access</term>
+    <listitem><para>Like "Reads (bytes), high-access", but for writes.
+    Occasionally useful.
+    </para></listitem>
+  </varlistentry>
+
+</variablelist>
+
+<para>The values within a node that represent the chosen sort metric are shown
+in bold, so they stand out.</para>
+
+<para>Here is part of an AP node found with "Total (blocks), tiny", showing
+blocks with an average size of only 8.67 bytes:</para>
+
+<programlisting><![CDATA[
+Total:     3,407,848 bytes (0.25%, 169.62/Minstr) in 393,214 blocks (6.62%, 19.57/Minstr), avg size 8.67 bytes, avg lifetime 1,167,795,629.1 instrs (5.81% of program duration)
+]]></programlisting>
+
+<para>Here is part of an AP node found with "Total (blocks), short-lived",
+showing blocks with an average lifetime of only 181.75 instructions:</para>
+
+<programlisting><![CDATA[
+Total:     23,068,584 bytes (1.7%, 1,148.19/Minstr) in 262,143 blocks (4.41%, 13.05/Minstr), avg size 88 bytes, avg lifetime 181.75 instrs (0% of program duration)
+]]></programlisting>
+
+<para>Here is an example of an AP identified with "Total (blocks), zero reads
+or zero writes", showing blocks that are allocated but never touched:</para>
+
+<programlisting><![CDATA[
+Total:     7,339,920 bytes (0.54%, 365.33/Minstr) in 262,140 blocks (4.41%, 13.05/Minstr), avg size 28 bytes, avg lifetime 1,141,103,997.69 instrs (5.68% of program duration)
+Max:       3,669,960 bytes in 131,070 blocks, avg size 28 bytes
+At t-gmax: 3,336,400 bytes (0.79%) in 119,157 blocks (7.56%), avg size 28 bytes
+At t-end:  0 bytes (0%) in 0 blocks (0%), avg size 0 bytes
+Reads:     0 bytes (0%, 0/Minstr), 0/byte
+Writes:    0 bytes (0%, 0/Minstr), 0/byte
+]]></programlisting>
+
+<para>All the blocks identified by these APs are good candidates for
+optimization.</para>
+
+</sect2>
+
+</sect1>
+
+
+
+<sect1 id="dh-manual.options" xreflabel="DHAT Command-line Options">
+<title>DHAT Command-line Options</title>
+
+<para>DHAT-specific command-line options are:</para>
+
+<!-- start of xi:include in the manpage -->
+<variablelist id="dh.opts.list">
+
+  <varlistentry id="opt.dhat-out-file" xreflabel="--dhat-out-file">
+    <term>
+      <option><![CDATA[--dhat-out-file=<file> ]]></option>
+    </term>
+    <listitem>
+      <para>Write the profile data to 
+            <computeroutput>file</computeroutput> rather than to the default
+            output file,
+            <filename>dhat.out.&lt;pid&gt;</filename>. The
+            <option>%p</option> and <option>%q</option> format specifiers
+            can be used to embed the process ID and/or the contents of an
+            environment variable in the name, as is the case for the core
+            option <option><xref linkend="opt.log-file"/></option>.
+      </para>
+    </listitem>
+  </varlistentry>
+
+</variablelist>
+
+<para>Note that stacks by default have 12 frames. This may be more than
+necessary, in which case the <option>--num-callers</option> flag can be used to
+reduce the number, which may make DHAT run slightly faster.
+</para>
+
+<!-- end of xi:include in the manpage -->
+
+</sect1>
+
+</chapter>
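Note on the manual text: the read and write ratios quoted in the examples are bytes accessed divided by bytes allocated, and the "at least N% never read" bound follows whenever that ratio is below 1. The following is a minimal sketch of that arithmetic. The helper names are hypothetical and are not part of DHAT, and rounding to the nearest hundredth is an assumption made here to avoid floating point.

```c
#include <assert.h>
#include <stdint.h>

// Hypothetical helper (not part of DHAT): access ratio in hundredths,
// i.e. 100 * accessed_bytes / allocated_bytes, rounded to nearest.
uint64_t ratio_hundredths(uint64_t accessed_bytes, uint64_t allocated_bytes)
{
   assert(allocated_bytes > 0);
   return (100 * accessed_bytes + allocated_bytes / 2) / allocated_bytes;
}

// Lower bound, in percent, on the bytes that were never accessed.
// If the ratio is r < 1.00, then even if no byte was accessed twice at
// most r of the bytes were touched, so at least (1 - r) were not.
// A ratio of 1.00 or more gives no such guarantee, hence 0.
uint64_t min_untouched_percent(uint64_t ratio_h)
{
   return ratio_h >= 100 ? 0 : 100 - ratio_h;
}
```

With the figures from the Access Counts example (8,388,672 bytes allocated, 9,109,682 read, 7,340,088 written) this yields 109 and 88 hundredths, which agrees with the 1.09/byte and 0.88/byte shown there. A ratio of 29 hundredths gives the 71% bound quoted in the text.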
--- a/dhat/tests/Makefile.am
+++ b/dhat/tests/Makefile.am
@ -0,0 +1,24 @@
+
+include $(top_srcdir)/Makefile.tool-tests.am
+
+dist_noinst_SCRIPTS = filter_stderr
+
+EXTRA_DIST = \
+	acc.post.exp acc.stderr.exp acc.vgtest \
+	basic.post.exp basic.stderr.exp basic.vgtest \
+	big.post.exp big.stderr.exp big.vgtest \
+	empty.post.exp empty.stderr.exp empty.vgtest \
+	sig.stderr.exp sig.vgtest \
+	single.post.exp single.stderr.exp single.vgtest
+
+check_PROGRAMS = \
+	acc \
+	basic \
+	big \
+	empty \
+	sig \
+	single
+
+AM_CFLAGS   += $(AM_FLAG_M3264_PRI)
+AM_CXXFLAGS += $(AM_FLAG_M3264_PRI)
+
--- a/dhat/tests/acc.c
+++ b/dhat/tests/acc.c
@ -0,0 +1,74 @@
+// Testing accesses of blocks.
+
+#include <stdint.h>
+#include <stdlib.h>
+#include <string.h>
+
+void* m1(size_t n) { return malloc(n); }
+
+void* m2(size_t n) { return malloc(n); }
+
+int main(void)
+{
+   // 0th char is written 0 times, 1st char is written once, etc.
+   char* a = malloc(32);
+   for (int i = 1; i < 32; i++) {
+      for (int j = 0; j < i; j++) {
+         a[i] = 0;
+      }
+   }
+   free(a);
+
+   // Repetition and gaps.
+   int* b = malloc(20);
+   b[0] = 1;
+   b[2] = b[0];
+   for (int i = 0; i < 10; i++) {
+      b[4] = 99;
+   }
+   free(b);
+
+   // 33 bytes, goes onto a second line in dh_view.
+   char* c = calloc(33, 1);
+   c[32] = 0;
+   free(c);
+
+   // 1024 bytes, accesses are shown.
+   char* d = malloc(1024);
+   for (int i = 0; i < 1024; i++) {
+      d[i] = d[1023 - i];
+   }
+   for (int i = 500; i < 600; i++) {
+      d[i] = 0;
+   }
+   free(d);
+
+   // 1025 bytes, accesses aren't shown.
+   char* e = calloc(1025, 1);
+   for (int i = 0; i < 1025; i++) {
+      e[i] += 1;
+   }
+   free(e);
+
+   // Lots of accesses, but fewer than the 0xffff max value.
+   int* f1 = m1(100);
+   int* f2 = m1(100);
+   for (int i = 0; i < 50000; i++) {
+      f1[0] = 0;
+      f2[0] = 0;
+   }
+   free(f1);
+   free(f2);
+
+   // Lots of accesses, more than the 0xffff max value: treated as Infinity.
+   int* g1 = m2(100);
+   int* g2 = m2(100);
+   for (int i = 0; i < 100000; i++) {
+      g1[0] = 0;
+      g2[0] = 0;
+   }
+   free(g1);
+   free(g2);
+
+   return 0;
+}
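Note on acc.c: the last two cases rely on per-byte access counts topping out at 0xffff, as the test comments state. The following is a minimal sketch of a saturating 16-bit counter that behaves that way. The names are hypothetical, and DHAT's actual bookkeeping in dh_main.c may differ.

```c
#include <assert.h>
#include <stdint.h>

#define ACC_MAX 0xffff   // maximum tracked count, per the comments in acc.c

// Hypothetical saturating increment: once the counter reaches ACC_MAX it
// stays there instead of wrapping around to zero.
uint16_t acc_bump(uint16_t count)
{
   return count < ACC_MAX ? (uint16_t)(count + 1) : (uint16_t)ACC_MAX;
}

// Counter value for one byte of one block after n_accesses accesses.
uint16_t acc_after(uint32_t n_accesses)
{
   uint16_t count = 0;
   for (uint32_t i = 0; i < n_accesses; i++) {
      count = acc_bump(count);
   }
   assert(count <= ACC_MAX);
   return count;
}
```

Under this model the 50,000 writes to each of f1 and f2 stay below the cap, while the 100,000 writes to each of g1 and g2 saturate at 0xffff, which is the value the test comments say is treated as Infinity.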
--- a/dhat/tests/acc.stderr.exp
+++ b/dhat/tests/acc.stderr.exp
@ -0,0 +1,7 @@
+
+
+Total:     2,534 bytes in 9 blocks
+At t-gmax: 1,025 bytes in 1 blocks
+At t-end:  0 bytes in 0 blocks
+Reads:     2,053 bytes
+Writes:    1,202,694 bytes
--- a/dhat/tests/acc.vgtest
+++ b/dhat/tests/acc.vgtest
@ -0,0 +1,3 @@
+prog: acc
+vgopts: --dhat-out-file=dhat.out
+cleanup: rm dhat.out
--- a/dhat/tests/basic.c
+++ b/dhat/tests/basic.c
@ -0,0 +1,26 @@
+// Some basic allocations and accesses.
+
+#include <stdint.h>
+#include <stdlib.h>
+#include <string.h>
+
+int main(void)
+{
+   int64_t* m = malloc(1000);
+   m[0] = 1;                     // write 8 bytes
+   m[10] = m[1];                 // read and write 8 bytes
+
+   char* c = calloc(1, 2000);
+   for (int i = 0; i < 1000; i++) {
+      c[i + 1000] = c[i];        // read and write 1000 bytes
+   }
+
+   char* r = realloc(m, 3000);
+   for (int i = 0; i < 500; i++) {
+      r[i + 2000] = 99;          // write 500 bytes
+   }
+                                 // totals: 1008 read, 1516 write
+   free(c);
+
+   return 0;
+}
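Note on basic.c: the expected summary in basic.stderr.exp follows from the realloc handling described in the commit message, where a realloc counts as a new block of the new size. The following sketch is a hypothetical tally, not DHAT's data structures. Treating realloc as "free the old block, then allocate the new one" is an assumption chosen here because it reproduces the expected t-gmax figure.

```c
#include <assert.h>
#include <stddef.h>

// Hypothetical tally mirroring the summary lines printed by DHAT.
typedef struct {
   size_t total_bytes, total_blocks;   // "Total"
   size_t curr_bytes,  curr_blocks;    // live now; "At t-end" at exit
   size_t gmax_bytes,  gmax_blocks;    // snapshot at the peak, "At t-gmax"
} Tally;

void tally_alloc(Tally* t, size_t n)
{
   t->total_bytes += n;  t->total_blocks++;
   t->curr_bytes  += n;  t->curr_blocks++;
   if (t->curr_bytes > t->gmax_bytes) {
      t->gmax_bytes  = t->curr_bytes;
      t->gmax_blocks = t->curr_blocks;
   }
}

void tally_free(Tally* t, size_t n)
{
   assert(t->curr_bytes >= n && t->curr_blocks > 0);
   t->curr_bytes -= n;  t->curr_blocks--;
}

// Assumption: the old block dies and a new block of new_n bytes is born.
void tally_realloc(Tally* t, size_t old_n, size_t new_n)
{
   tally_free(t, old_n);
   tally_alloc(t, new_n);
}

// Replays the heap events of basic.c in program order.
Tally tally_basic(void)
{
   Tally t = {0};
   tally_alloc(&t, 1000);          // m = malloc(1000)
   tally_alloc(&t, 2000);          // c = calloc(1, 2000)
   tally_realloc(&t, 1000, 3000);  // r = realloc(m, 3000)
   tally_free(&t, 2000);           // free(c); r is never freed
   return t;
}
```

Replaying the test this way gives 6,000 bytes in 3 blocks in total, 5,000 bytes in 2 blocks at the peak, and 3,000 bytes in 1 block at exit, matching basic.stderr.exp.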
--- a/dhat/tests/basic.stderr.exp
+++ b/dhat/tests/basic.stderr.exp
@ -0,0 +1,7 @@
+
+
+Total:     6,000 bytes in 3 blocks
+At t-gmax: 5,000 bytes in 2 blocks
+At t-end:  3,000 bytes in 1 blocks
+Reads:     1,008 bytes
+Writes:    1,516 bytes
--- a/dhat/tests/basic.vgtest
+++ b/dhat/tests/basic.vgtest
@ -0,0 +1,3 @@
+prog: basic
+vgopts: --dhat-out-file=dhat.out
+cleanup: rm dhat.out
--- a/dhat/tests/big.c
+++ b/dhat/tests/big.c
@ -0,0 +1,61 @@
+// This test implements a moderately complex call tree. The layout of these
+// functions matches the layout of the tree produced by dh_view.js, when
+// sorted by "total bytes".
+
+#include <stdlib.h>
+
+#define F(f, parent)    void* f(size_t n) { return parent(n); }
+
+// Note: don't use j1 -- that is a builtin C function, believe it or not.
+F(a, malloc)
+   F(b1, a)
+      F(c1, b1)
+         F(d1, c1)
+         F(d2, c1)   // insig total-bytes
+      F(c2, b1)
+   F(b2, a)
+   F(b3, a) F(e, b3) F(f, e)
+F(g, malloc) F(h, g) F(i, h)
+   F(j2, i) F(k, j2) F(l, k)
+   F(j3, i) F(m, j3)
+      F(n1, m)
+      F(n2, m) F(o, n2)
+   F(p, i) F(q, p)   // insig total-bytes
+   F(r, i)           // insig total-bytes
+F(s1, malloc) F(s2, s1) F(s3, s2) F(s4, s3) F(s5, s4)
+F(t, malloc)
+F(u, malloc)
+F(v, malloc)
+   F(w, v)           // insig total-bytes
+   F(x, v)           // insig total-bytes
+   F(y, v)           // insig total-bytes
+   F(z, v)           // insig total-bytes
+
+int main(void)
+{
+   // Call all the leaves in the above tree.
+
+   int* d1p = d1(706);
+   free(d1p); // So the t-end numbers differ from the t-gmax/total numbers.
+
+   d2(5);
+   c2(30);
+   b2(20);
+   f(10);
+   l(60);
+   n1(30);
+   o(20);
+   q(7);
+   r(3);
+   s5(30);
+   t(20);
+   u(19);
+   w(9);
+   x(8);
+   y(7);
+   z(5);
+   z(1);
+
+   // And one allocation directly from main().
+   malloc(10);
+}
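Note on big.c: the "insig total-bytes" comments are consistent with a node being kept when it accounts for at least 1% of the global total, the same shape as the "AP significance threshold" line described in the manual. The following is a sketch of that predicate. The function name is hypothetical and the flat 1% cutoff is an assumption; the manual says the exact threshold depends on the sort metric.

```c
#include <assert.h>
#include <stdint.h>

// Hypothetical predicate: a node is significant when its value is at
// least 1% of the global total. Integer arithmetic avoids rounding issues.
int is_significant(uint64_t node_value, uint64_t global_total)
{
   assert(node_value <= global_total);
   return 100 * node_value >= global_total;
}
```

With the 1,000 bytes that big.c allocates in total, the 10 bytes reached via f() meet the threshold exactly, while the 9 bytes via w() and the 6 bytes via z() fall below it, in line with the comments in the function tree.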
--- a/dhat/tests/big.stderr.exp
+++ b/dhat/tests/big.stderr.exp
@ -0,0 +1,7 @@
+
+
+Total:     1,000 bytes in 19 blocks
+At t-gmax: 706 bytes in 1 blocks
+At t-end:  294 bytes in 18 blocks
+Reads:     0 bytes
+Writes:    0 bytes
--- a/dhat/tests/big.vgtest
+++ b/dhat/tests/big.vgtest
@ -0,0 +1,3 @@
+prog: big
+vgopts: --dhat-out-file=dhat.out
+cleanup: rm dhat.out
--- a/dhat/tests/empty.c
+++ b/dhat/tests/empty.c
@ -0,0 +1,6 @@
+// No allocations.
+
+int main(void)
+{
+   return 0;
+}
--- a/dhat/tests/empty.stderr.exp
+++ b/dhat/tests/empty.stderr.exp
@ -0,0 +1,7 @@
+
+
+Total:     0 bytes in 0 blocks
+At t-gmax: 0 bytes in 0 blocks
+At t-end:  0 bytes in 0 blocks
+Reads:     0 bytes
+Writes:    0 bytes
--- a/dhat/tests/empty.vgtest
+++ b/dhat/tests/empty.vgtest
@ -0,0 +1,3 @@
+prog: empty
+vgopts: --dhat-out-file=dhat.out
+cleanup: rm dhat.out
--- a/dhat/tests/filter_stderr
+++ b/dhat/tests/filter_stderr
@ -0,0 +1,9 @@
+#! /bin/sh
+
+dir=`dirname $0`
+
+$dir/../../tests/filter_stderr_basic |
+
+# Remove "DHAT, ..." line and the following copyright line.
+sed "/^DHAT, a dynamic heap analysis tool/ , /./ d"
+
--- a/dhat/tests/sig.c
+++ b/dhat/tests/sig.c
@ -0,0 +1,76 @@
+// This test implements sorting of a tree involving a mix of significant and
+// insignificant nodes. The layout of these functions matches the layout of
+// the tree produced by dh_view.js, when sorted by "total bytes".
+
+#include <stdlib.h>
+
+#define F(f, parent)    void* f(size_t n) { return parent(n); }
+
+F(am, malloc)
+   // main
+   F(a2, am) // main
+   F(a3, am)
+      // main
+      // main
+F(bm, malloc)
+   // main
+   F(b2, bm) // main
+   F(b3, bm)
+      // main
+      // main
+F(cm, malloc)
+   // main
+   F(c2, cm) // main
+   F(c3, cm)
+      // main
+      // main
+F(dm, malloc)
+   // main
+   F(d2, dm) // main
+   F(d3, dm)
+      // main
+      // main
+
+char access(char* p, size_t n)
+{
+   for (int i = 0; i < 1499; i++) {
+      for (int j = 0; j < n; j++) {
+         p[j] = j;
+      }
+   }
+   char x = 0;
+   for (int j = 0; j < n; j++) {
+      x += p[j];
+   }
+   return x;
+}
+
+int main(void)
+{
+
+   char* p;
+
+   // Call all the leaves in the above tree. The pointers we pass to access()
+   // become significant in a high-access sort and insignificant in a
+   // zero-reads-or-zero-writes sort, and vice versa.
+
+   p = am(11); access(p, 11);
+   p = a2(10); access(p, 10);
+   p = a3(5);  access(p, 5);
+   p = a3(4);  access(p, 4);
+
+   p = bm(10); access(p, 10);
+   p = b2(9);  access(p, 9);
+   p = b3(5);
+   p = b3(3);
+
+   p = cm(9); access(p, 9);
+   p = c2(8);
+   p = c3(4);
+   p = c3(3);
+
+   p = dm(8);
+   p = d2(7);
+   p = d3(4);
+   p = d3(2);
+}
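Note on sig.c: the expected Reads and Writes figures in sig.stderr.exp can be rederived by hand, assuming DHAT counts only accesses that land inside a heap block. The following sketch uses hypothetical names; the per-call byte counts are read off main() above.

```c
#include <assert.h>
#include <stddef.h>

// Bytes inside each block that access() touches, in call order:
// am(11), a2(10), a3(5), a3(4), bm(10), b2(9), cm(9).
static const size_t touched[] = { 11, 10, 5, 4, 10, 9, 9 };

#define WRITE_PASSES 1499   // outer loop count in access()

size_t sig_touched_bytes(void)
{
   size_t sum = 0;
   for (size_t i = 0; i < sizeof touched / sizeof touched[0]; i++) {
      sum += touched[i];
   }
   assert(sum > 0);
   return sum;
}

// access() makes one read pass over each block.
size_t sig_expected_reads(void)
{
   return sig_touched_bytes();
}

// access() makes WRITE_PASSES write passes over each block.
size_t sig_expected_writes(void)
{
   return WRITE_PASSES * sig_touched_bytes();
}
```

The blocks that are never passed to access() contribute nothing, so the totals are 58 bytes read and 1499 * 58 = 86,942 bytes written.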
--- a/dhat/tests/sig.stderr.exp
+++ b/dhat/tests/sig.stderr.exp
@ -0,0 +1,7 @@
+
+
+Total:     102 bytes in 16 blocks
+At t-gmax: 102 bytes in 16 blocks
+At t-end:  102 bytes in 16 blocks
+Reads:     58 bytes
+Writes:    86,942 bytes
--- a/dhat/tests/sig.vgtest
+++ b/dhat/tests/sig.vgtest
@ -0,0 +1,3 @@
+prog: sig
+vgopts: --dhat-out-file=dhat.out
+cleanup: rm dhat.out
--- a/dhat/tests/single.c
+++ b/dhat/tests/single.c
@ -0,0 +1,11 @@
+// A single allocation (so the root node is the only node in the tree).
+
+#include <stdlib.h>
+
+int main() {
+   int* a = (int*)malloc(16);
+   a[0] = 0;
+   a[0] = 1;
+   a[0] = 2;
+   return 0;
+}
--- a/dhat/tests/single.stderr.exp
+++ b/dhat/tests/single.stderr.exp
@ -0,0 +1,7 @@
+
+
+Total:     16 bytes in 1 blocks
+At t-gmax: 16 bytes in 1 blocks
+At t-end:  16 bytes in 1 blocks
+Reads:     0 bytes
+Writes:    12 bytes
--- a/dhat/tests/single.vgtest
+++ b/dhat/tests/single.vgtest
@ -0,0 +1,3 @@
+prog: single
+vgopts: --dhat-out-file=dhat.out
+cleanup: rm dhat.out
--- a/docs/Makefile.am
+++ b/docs/Makefile.am
@ -20,6 +20,7 @@ EXTRA_DIST = \
 	images/prev.png \
 	images/up.png \
 	images/kcachegrind_xtree.png \
+	images/dh-tree.png \
 	internals/3_0_BUGSTATUS.txt \
 	internals/3_1_BUGSTATUS.txt \
 	internals/3_2_BUGSTATUS.txt \
--- a/docs/README
+++ b/docs/README
@ -76,10 +76,18 @@ could just build the docs from XML when doing 'make install', which
 would be simpler.


-Notes on building PDF / PS documents
------------------------------------
-Below are random notes and recollections about how to build PDF / PS
-documents from the XML source at various times on various Linux distros.
+Notes on building HTML / PDF / PS documents
+-------------------------------------------
+Below are random notes and recollections about how to build documents
+from the XML source at various times on various Linux distros. They're
+mostly about the PDF/PS documents, because they are the hardest to
+build.
+
+Notes [Jan 2019]
+----------------
+For Ubuntu 18.04, to build HTML docs I had to:
+
+  sudo apt-get install xsltproc

 Notes [May 2017]
 ----------------
--- a/docs/images/dh-tree.png
+++ b/docs/images/dh-tree.png
--- a/docs/xml/manual-core.xml
+++ b/docs/xml/manual-core.xml
@ -613,7 +613,7 @@ in most cases.  We group the available options by rough categories.</para>
    <listitem>
      <para>Run the Valgrind tool called <varname>toolname</varname>,
      e.g. memcheck, cachegrind, callgrind, helgrind, drd, massif,
-      lackey, none, exp-sgcheck, exp-bbv, exp-dhat, etc.</para>
+      dhat, lackey, none, exp-sgcheck, exp-bbv, etc.</para>
    </listitem>
  </varlistentry>

@ -2562,7 +2562,7 @@ need to use them.</para>
      malloc related functions, using the
      synonym <varname>somalloc</varname>.  This synonym is usable for
      all tools doing standard replacement of malloc related functions
-      (e.g. memcheck, massif, drd, helgrind, exp-dhat, exp-sgcheck).
+      (e.g. memcheck, helgrind, drd, massif, dhat, exp-sgcheck).
      </para>

      <itemizedlist>
--- a/docs/xml/manual.xml
+++ b/docs/xml/manual.xml
@ -36,15 +36,15 @@
      xmlns:xi="http://www.w3.org/2001/XInclude" />
  <xi:include href="../../massif/docs/ms-manual.xml" parse="xml"  
      xmlns:xi="http://www.w3.org/2001/XInclude" />
-  <xi:include href="../../exp-dhat/docs/dh-manual.xml" parse="xml"  
+  <xi:include href="../../dhat/docs/dh-manual.xml" parse="xml"  
+      xmlns:xi="http://www.w3.org/2001/XInclude" />
+  <xi:include href="../../lackey/docs/lk-manual.xml" parse="xml"  
+      xmlns:xi="http://www.w3.org/2001/XInclude" />
+  <xi:include href="../../none/docs/nl-manual.xml" parse="xml"  
      xmlns:xi="http://www.w3.org/2001/XInclude" />
  <xi:include href="../../exp-sgcheck/docs/sg-manual.xml" parse="xml"  
      xmlns:xi="http://www.w3.org/2001/XInclude" />
  <xi:include href="../../exp-bbv/docs/bbv-manual.xml" parse="xml"  
      xmlns:xi="http://www.w3.org/2001/XInclude" />      
-  <xi:include href="../../lackey/docs/lk-manual.xml" parse="xml"  
-      xmlns:xi="http://www.w3.org/2001/XInclude" />
-  <xi:include href="../../none/docs/nl-manual.xml" parse="xml"  
-      xmlns:xi="http://www.w3.org/2001/XInclude" />

 </book>
--- a/exp-dhat/docs/dh-manual.xml
+++ b/exp-dhat/docs/dh-manual.xml
@ -1,401 +0,0 @@
-<?xml version="1.0"?> <!-- -*- sgml -*- -->
-<!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN"
-          "http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd"
-[ <!ENTITY % vg-entities SYSTEM "../../docs/xml/vg-entities.xml"> %vg-entities; ]>
-
-
-<chapter id="dh-manual" 
-         xreflabel="DHAT: a dynamic heap analysis tool">
-  <title>DHAT: a dynamic heap analysis tool</title>
-
-<para>To use this tool, you must specify
-<option>--tool=exp-dhat</option> on the Valgrind
-command line.</para>
-
-
-
-<sect1 id="dh-manual.overview" xreflabel="Overview">
-<title>Overview</title>
-
-<para>DHAT is a tool for examining how programs use their heap
-allocations.</para>
-
-<para>It tracks the allocated blocks, and inspects every memory access
-to find which block, if any, it is to.  The following data is
-collected and presented per allocation point (allocation
-stack):</para>
-
-<itemizedlist>
-  <listitem><para>Total allocation (number of bytes and
-  blocks)</para></listitem>
-
-  <listitem><para>maximum live volume (number of bytes and
-  blocks)</para></listitem>
-
-  <listitem><para>average block lifetime (number of instructions
-   between allocation and freeing)</para></listitem>
-
-  <listitem><para>average number of reads and writes to each byte in
-   the block ("access ratios")</para></listitem>
-
-  <listitem><para>for allocation points which always allocate blocks
-   only of one size, and that size is 4096 bytes or less: counts
-   showing how often each byte offset inside the block is
-   accessed.</para></listitem>
-</itemizedlist>
-
-<para>Using these statistics it is possible to identify allocation
-points with the following characteristics:</para>
-
-<itemizedlist>
-
-  <listitem><para>potential process-lifetime leaks: blocks allocated
-   by the point just accumulate, and are freed only at the end of the
-   run.</para></listitem>
-
- <listitem><para>excessive turnover: points which chew through a lot
-  of heap, even if it is not held onto for very long</para></listitem>
-
- <listitem><para>excessively transient: points which allocate very
- short lived blocks</para></listitem>
-
- <listitem><para>useless or underused allocations: blocks which are
-  allocated but not completely filled in, or are filled in but not
-  subsequently read.</para></listitem>
-
- <listitem><para>blocks with inefficient layout -- areas never
-  accessed, or with hot fields scattered throughout the
-  block.</para></listitem>
-</itemizedlist>
-
-<para>As with the Massif heap profiler, DHAT measures program progress
-by counting instructions, and so presents all age/time related figures
-as instruction counts.  This sounds a little odd at first, but it
-makes runs repeatable in a way which is not possible if CPU time is
-used.</para>
-
-</sect1>
-
-
-
-
-<sect1 id="dh-manual.understanding" xreflabel="Understanding DHAT's output">
-<title>Understanding DHAT's output</title>
-
-
-<para>DHAT provides a lot of useful information on dynamic heap usage.
-Most of the art of using it is in interpretation of the resulting
-numbers.  That is best illustrated via a set of examples.</para>
-
-
-<sect2>
-<title>Interpreting the max-live, tot-alloc and deaths fields</title>
-
-<sect3><title>A simple example</title></sect3>
-
-<screen><![CDATA[
-   ======== SUMMARY STATISTICS ========
-
-   guest_insns:  1,045,339,534
-   [...]
-   max-live:    63,490 in 984 blocks
-   tot-alloc:   1,904,700 in 29,520 blocks (avg size 64.52)
-   deaths:      29,520, at avg age 22,227,424
-   acc-ratios:  6.37 rd, 1.14 wr  (12,141,526 b-read, 2,174,460 b-written)
-      at 0x4C275B8: malloc (vg_replace_malloc.c:236)
-      by 0x40350E: tcc_malloc (tinycc.c:6712)
-      by 0x404580: tok_alloc_new (tinycc.c:7151)
-      by 0x40870A: next_nomacro1 (tinycc.c:9305)
-]]></screen>
-
-<para>Over the entire run of the program, this stack (allocation
-point) allocated 29,520 blocks in total, containing 1,904,700 bytes in
-total.  By looking at the max-live data, we see that not many blocks
-were simultaneously live, though: at the peak, there were 63,490
-allocated bytes in 984 blocks.  This tells us that the program is
-steadily freeing such blocks as it runs, rather than hanging on to all
-of them until the end and freeing them all.</para>
-
-<para>The deaths entry tells us that 29,520 blocks allocated by this stack
-died (were freed) during the run of the program.  Since 29,520 is
-also the number of blocks allocated in total, that tells us that
-all allocated blocks were freed by the end of the program.</para>
-
-<para>It also tells us that the average age at death was 22,227,424
-instructions.  From the summary statistics we see that the program ran
-for 1,045,339,534 instructions, and so the average age at death is
-about 2% of the program's total run time.</para>
-
-<sect3><title>Example of a potential process-lifetime leak</title></sect3>
-
-<para>This next example (from a different program than the above)
-shows a potential process lifetime leak.  A process lifetime leak
-occurs when a program keeps allocating data, but only frees the
-data just before it exits.  Hence the program's heap grows constantly
-in size, yet Memcheck reports no leak, because the program has
-freed up everything at exit.  This is particularly a hazard for
-long running programs.</para>
-
-<screen><![CDATA[
-   ======== SUMMARY STATISTICS ========
-   
-   guest_insns:  418,901,537
-   [...]
-   max-live:    32,512 in 254 blocks
-   tot-alloc:   32,512 in 254 blocks (avg size 128.00)
-   deaths:      254, at avg age 300,467,389
-   acc-ratios:  0.26 rd, 0.20 wr  (8,756 b-read, 6,604 b-written)
-      at 0x4C275B8: malloc (vg_replace_malloc.c:236)
-      by 0x4C27632: realloc (vg_replace_malloc.c:525)
-      by 0x56FF41D: QtFontStyle::pixelSize(unsigned short, bool) (qfontdatabase.cpp:269)
-      by 0x5700D69: loadFontConfig() (qfontdatabase_x11.cpp:1146)
-]]></screen>
-
-<para>There are two tell-tale signs that this might be a
-process-lifetime leak.  Firstly, the max-live and tot-alloc numbers
-are identical.  The only way that can happen is if these blocks are
-all allocated and then all deallocated.</para>
-
-<para>Secondly, the average age at death (300 million insns) is 71% of
-the total program lifetime (419 million insns), hence this is not a
-transient allocation-free spike -- rather, it is spread out over a
-large part of the entire run.  One interpretation is, roughly, that
-all 254 blocks were allocated in the first half of the run, held onto
-for the second half, and then freed just before exit.</para>
-
-</sect2>
-
-
-<sect2>
-<title>Interpreting the acc-ratios fields</title>
-
-
-<sect3><title>A fairly harmless allocation point record</title></sect3>
-
-<screen><![CDATA[
-   max-live:    49,398 in 808 blocks
-   tot-alloc:   1,481,940 in 24,240 blocks (avg size 61.13)
-   deaths:      24,240, at avg age 34,611,026
-   acc-ratios:  2.13 rd, 0.91 wr  (3,166,650 b-read, 1,358,820 b-written)
-      at 0x4C275B8: malloc (vg_replace_malloc.c:236)
-      by 0x40350E: tcc_malloc (tinycc.c:6712)
-      by 0x404580: tok_alloc_new (tinycc.c:7151)
-      by 0x4046C4: tok_alloc (tinycc.c:7190)
-]]></screen>
-
-<para>The acc-ratios field tells us that each byte in the blocks
-allocated here is read an average of 2.13 times before the block is
-deallocated.  Given that the blocks have an average age at death of
-34,611,026 instructions, that's roughly one read of each byte every
-16 million instructions.  So from that standpoint the blocks aren't
-"working" very hard.</para>
-
-<para>More interesting is the write ratio: each byte is written an
-average of 0.91 times.  This tells us that some parts of the allocated
-blocks are never written, at least 9% on average.  To completely
-initialise the block would require writing each byte at least once,
-and that would give a write ratio of 1.0.  The fact that some block
-areas are evidently unused might point to data alignment holes or
-other layout inefficiencies.</para>
-
-<para>Well, at least all the blocks are freed (24,240 allocations,
-24,240 deaths).</para>
-
-<para>If all the blocks had been the same size, DHAT would also show
-the access counts by block offset, so we could see where exactly these
-unused areas are.  However, that isn't the case: the blocks have
-varying sizes, so DHAT can't perform such an analysis.  We can see
-that they must have varying sizes since the average block size, 61.13,
-isn't a whole number.</para>
-
-
-<sect3><title>A more suspicious looking example</title></sect3>
-
-<screen><![CDATA[
-   max-live:    180,224 in 22 blocks
-   tot-alloc:   180,224 in 22 blocks (avg size 8192.00)
-   deaths:      none (none of these blocks were freed)
-   acc-ratios:  0.00 rd, 0.00 wr  (0 b-read, 0 b-written)
-      at 0x4C275B8: malloc (vg_replace_malloc.c:236)
-      by 0x40350E: tcc_malloc (tinycc.c:6712)
-      by 0x40369C: __sym_malloc (tinycc.c:6787)
-      by 0x403711: sym_malloc (tinycc.c:6805)
-]]></screen>
-
-<para>Here, both the read and write access ratios are zero.  Hence
-this point is allocating blocks which are never used, neither read nor
-written.  Indeed, they are also not freed ("deaths: none") and are
-simply leaked.  So, here is 180k of completely useless allocation that
-could be removed.</para>
-
-<para>Re-running with Memcheck does indeed report the same leak.  What
-DHAT can tell us, that Memcheck can't, is that not only are the blocks
-leaked, they are also never used.</para>
-
-<sect3><title>Another suspicious example</title></sect3>
-
-<para>Here's one where blocks are allocated, written to,
-but never read from.  We see this immediately from the zero read
-access ratio.  They do get freed, though:</para>
-
-<screen><![CDATA[
-   max-live:    54 in 3 blocks
-   tot-alloc:   1,620 in 90 blocks (avg size 18.00)
-   deaths:      90, at avg age 34,558,236
-   acc-ratios:  0.00 rd, 1.11 wr  (0 b-read, 1,800 b-written)
-      at 0x4C275B8: malloc (vg_replace_malloc.c:236)
-      by 0x40350E: tcc_malloc (tinycc.c:6712)
-      by 0x4035BD: tcc_strdup (tinycc.c:6750)
-      by 0x41FEBB: tcc_add_sysinclude_path (tinycc.c:20931)
-]]></screen>
-
-<para>In the previous two examples, it is easy to see blocks that are
-never written to, or never read from, or some combination of both.
-Unfortunately, in C++ code, the situation is less clear.  That's
-because an object's constructor will write to the underlying block,
-and its destructor will read from it.  So the block's read and write
-ratios will be non-zero even if the object, once constructed, is never
-used, but only eventually destructed.</para>
-
-<para>Really, what we want is to measure only memory accesses in
-between the end of an object's construction and the start of its
-destruction.  Unfortunately I do not know of a reliable way to
-determine when those transitions are made.</para>
-
-
-</sect2>
-
-<sect2>
-<title>Interpreting "Aggregated access counts by offset" data</title>
-
-<para>For allocation points that always allocate blocks of the same
-size, and which are 4096 bytes or smaller, DHAT counts accesses
-per offset, for example:</para>
-
-<screen><![CDATA[
-   max-live:    317,408 in 5,668 blocks
-   tot-alloc:   317,408 in 5,668 blocks (avg size 56.00)
-   deaths:      5,668, at avg age 622,890,597
-   acc-ratios:  1.03 rd, 1.28 wr  (327,642 b-read, 408,172 b-written)
-      at 0x4C275B8: malloc (vg_replace_malloc.c:236)
-      by 0x5440C16: QDesignerPropertySheetPrivate::ensureInfo (qhash.h:515)
-      by 0x544350B: QDesignerPropertySheet::setVisible (qdesigner_propertysh...)
-      by 0x5446232: QDesignerPropertySheet::QDesignerPropertySheet (qdesigne...)
-   
-   Aggregated access counts by offset:
-   
-   [   0]  28782 28782 28782 28782 28782 28782 28782 28782
-   [   8]  20638 20638 20638 20638 0 0 0 0 
-   [  16]  22738 22738 22738 22738 22738 22738 22738 22738
-   [  24]  6013 6013 6013 6013 6013 6013 6013 6013 
-   [  32]  18883 18883 18883 37422 0 0 0 0
-   [  40]  5668 11915 5668 5668 11336 11336 11336 11336 
-   [  48]  6166 6166 6166 6166 0 0 0 0 
-]]></screen>
-
-<para>This is fairly typical, for C++ code running on a 64-bit
-platform.  Here, we have aggregated access statistics for 5668 blocks,
-all of size 56 bytes.  Each byte has been accessed at least 5668
-times, except for offsets 12--15, 36--39 and 52--55.  These are likely
-to be alignment holes.</para>
-
-<para>Careful interpretation of the numbers reveals useful information.
-Groups of N consecutive identical numbers that begin at an N-aligned
-offset, for N being 2, 4 or 8, are likely to indicate an N-byte object
-in the structure at that point.  For example, the first 32 bytes of
-this object are likely to have the layout</para>
-
-<screen><![CDATA[
-   [0 ]  64-bit type
-   [8 ]  32-bit type
-   [12]  32-bit alignment hole
-   [16]  64-bit type
-   [24]  64-bit type
-]]></screen>
-
-<para>As a counterexample, it's also clear that, whatever is at offset 32,
-it is not a 32-bit value.  That's because the last number of the group
-(37422) is not the same as the first three (18883 18883 18883).</para>
-
-<para>This example leads one to enquire (by reading the source code)
-whether the zeroes at 12--15 and 52--55 are alignment holes, and
-whether 48--51 is indeed a 32-bit type.  If so, it might be possible
-to place what's at 48--51 at 12--15 instead, which would reduce
-the object size from 56 to 48 bytes.</para>
-
-<para>Bear in mind that the above inferences are all only "maybes".  That's
-because they are based on dynamic data, not static analysis of the
-object layout.  For example, the zeroes might not be alignment
-holes, but rather just parts of the structure which were not used
-at all for this particular run.  Experience shows that's unlikely
-to be the case, but it could happen.</para>
-
-</sect2>
-
-</sect1>
-
-
-
-
-
-
-
-<sect1 id="dh-manual.options" xreflabel="DHAT Command-line Options">
-<title>DHAT Command-line Options</title>
-
-<para>DHAT-specific command-line options are:</para>
-
-<!-- start of xi:include in the manpage -->
-<variablelist id="dh.opts.list">
-
-  <varlistentry id="opt.show-top-n" xreflabel="--show-top-n">
-    <term>
-      <option><![CDATA[--show-top-n=<number>
-      [default: 10] ]]></option>
-    </term>
-    <listitem>
-      <para>At the end of the run, DHAT sorts the accumulated
-       allocation points according to some metric, and shows the
-       highest scoring entries.  <varname>--show-top-n</varname>
-       controls how many entries are shown.  The default of 10 is
-       quite small.  For realistic applications you will probably need
-       to set it much higher, at least several hundred.</para>
-    </listitem>
-  </varlistentry>
-
-  <varlistentry id="opt.sort-by" xreflabel="--sort-by=string">
-    <term>
-      <option><![CDATA[--sort-by=<string> [default: max-bytes-live] ]]></option>
-    </term>
-    <listitem>
-      <para>At the end of the run, DHAT sorts the accumulated
-       allocation points according to some metric, and shows the
-       highest scoring entries.  <varname>--sort-by</varname>
-       selects the metric used for sorting:</para>
-      <para><varname>max-bytes-live   </varname>  maximum live bytes [default]</para>
-      <para><varname>tot-bytes-allocd </varname>  bytes allocated in total (turnover)</para>
-      <para><varname>max-blocks-live  </varname>  maximum live blocks</para>
-      <para><varname>tot-blocks-allocd </varname> blocks allocated in total (turnover)</para>
-      <para>This controls the order in which allocation points are
-       displayed.  You can choose to look at allocation points with
-       the highest number of live bytes, or the highest total byte turnover, or
-       by the highest number of live blocks, or the highest total block
-       turnover.  These give usefully different pictures of program behaviour.
-       For example, sorting by maximum live blocks tends to show up allocation
-       points creating large numbers of small objects.</para>
-    </listitem>
-  </varlistentry>
-
-</variablelist>
-
-<para>One important point to note is that each allocation stack counts
-as a separate allocation point.  Because stacks by default have 12
-frames, this tends to spread data out over multiple allocation points.
-You may want to use the flag --num-callers=4 or some such small
-number, to reduce the spreading.</para>
-
-<!-- end of xi:include in the manpage -->
-
-</sect1>
-
-</chapter>
--- a/exp-dhat/tests/Makefile.am
+++ b/exp-dhat/tests/Makefile.am
@@ -1 +0,0 @@
-
--- a/include/pub_tool_libcsetjmp.h
+++ b/include/pub_tool_libcsetjmp.h
@@ -7,7 +7,7 @@
   This file is part of Valgrind, a dynamic binary instrumentation
   framework.

-   Copyright (C) 2010-2017 Mozilla Inc
+   Copyright (C) 2010-2017 Mozilla Foundation

   This program is free software; you can redistribute it and/or
   modify it under the terms of the GNU General Public License as
--- a/massif/ms_main.c
+++ b/massif/ms_main.c
@@ -1,6 +1,6 @@
-//--------------------------------------------------------------------*/
-//--- Massif: a heap profiling tool.                     ms_main.c ---*/
-//--------------------------------------------------------------------*/
+//--------------------------------------------------------------------//
+//--- Massif: a heap profiling tool.                     ms_main.c ---//
+//--------------------------------------------------------------------//

 /*
   This file is part of Massif, a Valgrind tool for profiling memory
--- a/solaris/valgrind.p5m
+++ b/solaris/valgrind.p5m
@@ -50,12 +50,12 @@ file path=usr/lib/valgrind/cachegrind-x86-solaris                        owner=r
 file path=usr/lib/valgrind/callgrind-amd64-solaris                       owner=root group=bin mode=0755
 file path=usr/lib/valgrind/callgrind-x86-solaris                         owner=root group=bin mode=0755
 file path=usr/lib/valgrind/default.supp                                  owner=root group=bin mode=0644
+file path=usr/lib/valgrind/dhat-amd64-solaris                            owner=root group=bin mode=0755
+file path=usr/lib/valgrind/dhat-x86-solaris                              owner=root group=bin mode=0755
 file path=usr/lib/valgrind/drd-amd64-solaris                             owner=root group=bin mode=0755
 file path=usr/lib/valgrind/drd-x86-solaris                               owner=root group=bin mode=0755
 file path=usr/lib/valgrind/exp-bbv-amd64-solaris                         owner=root group=bin mode=0755
 file path=usr/lib/valgrind/exp-bbv-x86-solaris                           owner=root group=bin mode=0755
-file path=usr/lib/valgrind/exp-dhat-amd64-solaris                        owner=root group=bin mode=0755
-file path=usr/lib/valgrind/exp-dhat-x86-solaris                          owner=root group=bin mode=0755
 file path=usr/lib/valgrind/exp-sgcheck-amd64-solaris                     owner=root group=bin mode=0755
 file path=usr/lib/valgrind/exp-sgcheck-x86-solaris                       owner=root group=bin mode=0755
 file path=usr/lib/valgrind/getoff-amd64-solaris                          owner=root group=bin mode=0755
@@ -75,8 +75,8 @@ file path=usr/lib/valgrind/vgpreload_core-amd64-solaris.so               owner=r
 file path=usr/lib/valgrind/vgpreload_core-x86-solaris.so                 owner=root group=bin mode=0755
 file path=usr/lib/valgrind/vgpreload_drd-amd64-solaris.so                owner=root group=bin mode=0755
 file path=usr/lib/valgrind/vgpreload_drd-x86-solaris.so                  owner=root group=bin mode=0755
-file path=usr/lib/valgrind/vgpreload_exp-dhat-amd64-solaris.so           owner=root group=bin mode=0755
-file path=usr/lib/valgrind/vgpreload_exp-dhat-x86-solaris.so             owner=root group=bin mode=0755
+file path=usr/lib/valgrind/vgpreload_dhat-amd64-solaris.so               owner=root group=bin mode=0755
+file path=usr/lib/valgrind/vgpreload_dhat-x86-solaris.so                 owner=root group=bin mode=0755
 file path=usr/lib/valgrind/vgpreload_exp-sgcheck-amd64-solaris.so        owner=root group=bin mode=0755
 file path=usr/lib/valgrind/vgpreload_exp-sgcheck-x86-solaris.so          owner=root group=bin mode=0755
 file path=usr/lib/valgrind/vgpreload_massif-amd64-solaris.so             owner=root group=bin mode=0755
--- a/tests/check_headers_and_includes
+++ b/tests/check_headers_and_includes
@@ -42,18 +42,18 @@ my %coregrind_dirs = (
    );

 my %tool_dirs = (
-    "none" => 1,
-    "lackey" => 1,
-    "massif" => 1,
    "memcheck" => 1,
-    "drd" => 1,
-    "helgrind", => 1,
-    "callgrind" => 1,
    "cachegrind" => 1,
-    "shared" => 1,
+    "callgrind" => 1,
+    "helgrind", => 1,
+    "drd" => 1,
+    "massif" => 1,
+    "dhat" => 1,
+    "lackey" => 1,
+    "none" => 1,
    "exp-bbv" => 1,
-    "exp-dhat" => 1,
-    "exp-sgcheck" => 1
+    "exp-sgcheck" => 1,
+    "shared" => 1,
    );

 my %dirs_to_ignore = (