<?xml version="1.0"?> <!-- -*- sgml -*- -->
<!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN"
"http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd">

<chapter id="bbv-manual" xreflabel="BBV">
<title>BBV: an experimental basic block vector generation tool</title>

<para>To use this tool, you must specify
<option>--tool=exp-bbv</option> on the Valgrind
command line.</para>

<sect1 id="bbv-manual.overview" xreflabel="Overview">
<title>Overview</title>

<para>
A Basic Block Vector (BBV) is a list of all basic blocks entered
during program execution, together with a count of how many times each
block was run. A basic block is a section of code
with only one entry point and one exit point.
</para>

<para>
BBV is a tool that generates basic block vectors
for use with the SimPoint analysis tool
(http://www.cse.ucsd.edu/~calder/simpoint/).
The SimPoint methodology speeds up architectural
simulations by running only a small portion of a program
and then extrapolating total behavior from this
small portion. Most programs exhibit phase-based behavior, which
means that at various times during execution a program will encounter
intervals of time where the code behaves similarly to a previous
interval. If you can detect these intervals and group them together,
you can approximate the total program behavior
by simulating only a minimal number of intervals and then scaling
the results.
</para>

<para>
In computer architecture research, running a
benchmark on a cycle-accurate simulator can cause slowdowns on the order
of 1000 times, so that full benchmarks take days, weeks, or even longer
to run. Using SimPoint, this time can be reduced significantly,
usually by 90-95%, while still retaining reasonable accuracy.
</para>

<para>
A more complete introduction to how SimPoint works can be
found in the paper "Automatically Characterizing Large Scale
Program Behavior" by T. Sherwood, E. Perelman, G. Hamerly, and
B. Calder.
</para>

</sect1>

<sect1 id="bbv-manual.quickstart" xreflabel="Quick Start">
<title>Using Basic Block Vectors to create SimPoints</title>

<para>
To quickly create a basic block vector file, run Valgrind
like this:
<computeroutput>valgrind --tool=exp-bbv /bin/ls</computeroutput>
In this case we are running the "ls" program, but this
can be any executable. By default a file called
<computeroutput>bb.out.PID</computeroutput> will be created,
where PID is replaced by the process ID of the running process.
This file is the basic block vector. For long-running programs
this file can be quite large, so it might be wise to compress
it with gzip or some other compression program.
</para>
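
<para>
The compression step can also be scripted. The following is a minimal
Python sketch, not part of Valgrind; the helper name
compress_bbv is hypothetical, and the file name bb.out.1234 used in the
usage note is only an example. It has the same effect as running gzip by
hand, except that it keeps the uncompressed file.
</para>

<programlisting><![CDATA[
```python
import gzip
import shutil
from pathlib import Path


def compress_bbv(path):
    """Gzip a BBV file, writing <path>.gz next to it, and return the new path."""
    src = Path(path)
    dst = src.with_name(src.name + ".gz")
    # Stream the copy so that large BBV files are never held fully in memory.
    with open(src, "rb") as fin, gzip.open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)
    return dst
```
]]></programlisting>

<para>
Calling compress_bbv("bb.out.1234") would produce bb.out.1234.gz in the
same directory.
</para>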

<para>
To create actual SimPoint results, you will need the
SimPoint utility, available from the SimPoint webpage
(http://www.cse.ucsd.edu/~calder/simpoint/).
Assuming you have downloaded SimPoint 3.2 and compiled it,
create SimPoint results with a command like the following:

<programlisting><![CDATA[
./SimPoint.3.2/bin/simpoint -inputVectorsGzipped \
-loadFVFile bb.out.1234.gz \
-k 5 -saveSimpoints results.simpts \
-saveSimpointWeights results.weights]]></programlisting>

where bb.out.1234.gz is your compressed basic block vector file
generated by Valgrind exp-bbv.
</para>

<para>
The SimPoint utility performs a random linear projection down to 15 dimensions,
then uses k-means clustering to determine which intervals are
of interest. In this example we specify 5 intervals with the
-k 5 option.
</para>
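
<para>
To illustrate the projection step, here is a minimal Python sketch of a
random linear projection of sparse basic block vectors down to 15
dimensions. It is not SimPoint's implementation: the function name
random_projection, the uniform coefficients in [-1, 1], the per-interval
normalisation, and the fixed seed are all assumptions made for the
example. The clustering step is omitted.
</para>

<programlisting><![CDATA[
```python
import random


def random_projection(vectors, num_blocks, dims=15, seed=0):
    """Project sparse BBVs (dicts of block id -> count) down to `dims` values."""
    rng = random.Random(seed)
    # One random row of `dims` coefficients in [-1, 1] per basic block id.
    matrix = [[rng.uniform(-1.0, 1.0) for _ in range(dims)]
              for _ in range(num_blocks)]
    projected = []
    for vec in vectors:
        total = sum(vec.values()) or 1  # normalise so each interval sums to 1
        point = [0.0] * dims
        for block_id, count in vec.items():
            row = matrix[block_id]
            for d in range(dims):
                point[d] += (count / total) * row[d]
        projected.append(point)
    return projected
```
]]></programlisting>

<para>
Each interval becomes a point with 15 coordinates regardless of how many
basic blocks the program contains, which keeps the later clustering step
cheap. Block ids are assumed here to lie in the range 0 to num_blocks - 1.
</para>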

<para>
The outputs from the SimPoint run are the
<computeroutput>results.simpts</computeroutput>
and <computeroutput>results.weights</computeroutput> files.
The first holds the 5 most relevant intervals of the program.
The second holds the weight to scale each interval by when
extrapolating full-program behavior. The intervals and the weights
can be used in conjunction with a simulator that supports
fast-forwarding: you fast-forward to the interval of interest,
collect statistics for the desired interval length, then combine
the gathered statistics with the weights to
calculate your results.
</para>
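
<para>
The final combination step can be sketched as a weighted average. The
Python function below is only an illustration: the name
weighted_estimate is hypothetical, and it assumes you have already
collected one statistic (for example, cycles per instruction) for each
simulated interval and read the matching weights into a list in the same
order.
</para>

<programlisting><![CDATA[
```python
def weighted_estimate(stats, weights):
    """Combine per-interval statistics (e.g. CPI) using SimPoint weights."""
    if len(stats) != len(weights):
        raise ValueError("need exactly one weight per simulated interval")
    total_weight = sum(weights)
    # Divide by the weight sum so the result is a true weighted average
    # even when the weights do not add up to exactly 1.
    return sum(s * w for s, w in zip(stats, weights)) / total_weight
```
]]></programlisting>

<para>
For example, per-interval values of 1.2, 0.8 and 2.0 with weights 0.5,
0.3 and 0.2 combine to 0.6 + 0.24 + 0.4 = 1.24.
</para>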

</sect1>

<sect1 id="bbv-manual.usage" xreflabel="BBV Usage">
<title>BBV Command Line Options</title>

<para>
BBV has various options that control the behavior of the plugin:
<!-- start of xi:include in the manpage -->
<variablelist id="bbv.opts.list">

<varlistentry id="opt.interval-size" xreflabel="--interval-size">
<term>
<option><![CDATA[--interval-size=<number> [default: 100000000] ]]></option>
</term>
<listitem>
<para>
This option selects the size of the interval to use.
The default is 100
million instructions, which is a commonly used value.
Other sizes can be used; smaller intervals can help programs
with finer-grained phases. However, a smaller interval size
can lead to accuracy issues due to warm-up effects.
(When fast-forwarding, the various architectural features
will be uninitialized, and it will take some number
of instructions before they "warm up" to the state a
full simulation would be in without the fast-forwarding.
Large interval sizes tend to mitigate this.)
</para>
</listitem>
</varlistentry>

<varlistentry id="opt.instr-count-only" xreflabel="--instr-count-only">
<term>
<option><![CDATA[--instr-count-only [default: no] ]]></option>
</term>
<listitem>
<para>
This option tells the tool to only display instruction
count totals, and to not generate the
actual BBV file. This is useful for debugging, and for
gathering instruction count info without generating
the large BBV files.
</para>
</listitem>
</varlistentry>

<varlistentry id="opt.bb-out-file" xreflabel="--bb-out-file">
<term>
<option><![CDATA[--bb-out-file=<name> [default: bb.out.%p] ]]></option>
</term>
<listitem>
<para>
This option selects the name of the basic block file. The default is
bb.out.%p. The
<option>%p</option> and <option>%q</option> format specifiers can be
used to embed the process ID and/or the contents of an environment
variable in the name, as is the case for the core option
<option>--log-file</option>.
</para>
</listitem>
</varlistentry>

<varlistentry id="opt.pc-out-file" xreflabel="--pc-out-file">
<term>
<option><![CDATA[--pc-out-file=<name> [default: pc.out.%p] ]]></option>
</term>
<listitem>
<para>
This option selects the name of the PC file.
This file holds program counter addresses
and function name info for the various basic blocks.
This can be used in conjunction
with the BBV file to fast-forward via function names
instead of just instruction counts.
The default filename is pc.out.%p. The
<option>%p</option> and <option>%q</option> format specifiers can be
used to embed the process ID and/or the contents of an environment
variable in the name, as is the case for the core option
<option>--log-file</option>.
</para>
</listitem>
</varlistentry>

</variablelist>
<!-- end of xi:include in the manpage -->

</para>
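
<para>
The effect of the <option>%p</option> and <option>%q</option> format
specifiers on an output file name can be sketched in Python as follows.
This is an illustration, not Valgrind's code: the function name
expand_out_file is hypothetical, and the sketch assumes the
%q{VAR} spelling that the core option <option>--log-file</option>
accepts for naming the environment variable. An unset variable raises
KeyError here; Valgrind's own handling of that case may differ.
</para>

<programlisting><![CDATA[
```python
import os
import re


def expand_out_file(template, pid=None, env=None):
    """Expand %p and %q{VAR} in an output file name template."""
    pid = os.getpid() if pid is None else pid
    env = os.environ if env is None else env

    def substitute(match):
        if match.group(0) == "%p":
            return str(pid)
        return env[match.group(1)]  # %q{VAR}: raises KeyError if VAR is unset

    return re.sub(r"%p|%q\{(\w+)\}", substitute, template)
```
]]></programlisting>

<para>
With a process ID of 1234, the default template bb.out.%p expands to
bb.out.1234.
</para>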

</sect1>

<sect1 id="bbv-manual.fileformat" xreflabel="BBV File Format">
<title>Basic Block Vector File Format</title>

<para>
The Basic Block Vector is dumped at fixed intervals. By default
this is done every 100 million instructions; the
<option>--interval-size</option> option can be
used to change this.
</para>

<para>
The output file looks like this:
</para>

<programlisting><![CDATA[
T:45:1024 :189:99343
T:11:78573 :15:1353 :56:1
T:18:45 :12:135353 :56:78 :314:4324263]]></programlisting>

<para>
Each new interval starts with a T. This is followed by a colon,
then by a unique number identifying the basic block. This is followed
by another colon, then by the frequency (which is scaled
by the number of instructions in the basic block). Each further
basic block in the interval is listed in the same way, separated
from the previous entry by a space.
</para>
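
<para>
A line in this format can be read with a few lines of Python. The
following is a minimal sketch; the function name parse_bbv_line is
hypothetical, and it assumes that every entry on the line has the
colon-separated shape described here.
</para>

<programlisting><![CDATA[
```python
def parse_bbv_line(line):
    """Parse one interval line into a dict of block id -> weighted count."""
    fields = line.split()
    if not fields or not fields[0].startswith("T:"):
        raise ValueError("interval lines must start with 'T'")
    fields[0] = fields[0][1:]  # drop the leading 'T', keeping ':id:count'
    counts = {}
    for field in fields:
        block_id, count = field.lstrip(":").split(":")
        counts[int(block_id)] = int(count)
    return counts
```
]]></programlisting>

<para>
For the line "T:45:1024 :189:99343" this returns a dictionary that maps
block 45 to 1024 and block 189 to 99343.
</para>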

<para>
The entry count is multiplied by the number of instructions
in the basic block, in order to weight the count so that instructions in
small basic blocks aren't counted as more important than instructions
in large basic blocks.
</para>
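
<para>
As a small worked example of this weighting, consider two blocks that
are each entered 100 times, one containing 2 instructions and the other
20. The Python sketch below (the function name weight_by_size and
its inputs are hypothetical) shows that the weighted values reflect the
tenfold difference in executed instructions, which the raw entry counts
would hide.
</para>

<programlisting><![CDATA[
```python
def weight_by_size(entry_counts, block_sizes):
    """Scale each block's entry count by its instruction count."""
    return {block: entries * block_sizes[block]
            for block, entries in entry_counts.items()}


# Two blocks entered equally often, but of very different sizes.
weighted = weight_by_size({1: 100, 2: 100}, {1: 2, 2: 20})
```
]]></programlisting>

<para>
Here block 1 is recorded as 200 and block 2 as 2000, and the values sum
to the 2200 instructions executed in total.
</para>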

</sect1>

<sect1 id="bbv-manual.implementation" xreflabel="Implementation">
<title>Implementation</title>

<para>
Valgrind provides all of the information necessary to create
BBV files. In the current implementation, all instructions
are instrumented. This is slower (by approximately a factor
of two) than a method that instruments at the basic-block level,
but there are some complications (especially with rep prefix
detection) that make that method more difficult.
</para>

<para>
Valgrind actually provides instrumentation at a superblock level.
A superblock has one entry point but, unlike a basic block, can
have multiple exit points. Once a branch occurs into the middle
of a block, it is split into a new basic block. Because
Valgrind cannot produce "true" basic blocks, the generated
BBV vectors will be different from those generated by other tools.
In practice this does not seem to affect the accuracy of the
SimPoint results. We do internally force the
<option>--vex-guest-chase-thresh=0</option>
option to Valgrind, which forces a more basic-block-like
behavior.
</para>

<para>
When a superblock is run for the first time, it is instrumented
with our BBV routine. This adds a call to our instruction
counting function for each original instruction.
The current superblock is looked up in an Ordered Set to find
a structure that holds block-specific statistics (the entry point
address is the lookup key). We increment the
instruction count for this superblock and
also update the master instruction count.
If the master count exceeds the interval size
then we write the basic block statistics for the current interval
to disk, and then reset all the superblock counters to zero.
</para>
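
<para>
The counting scheme can be modelled in a few lines of Python. This is a
toy sketch and not the tool's actual code: the class name
IntervalCounter is hypothetical, a dictionary keyed by entry address
stands in for the Ordered Set, a list stands in for the output file, and
resetting the master count to zero after each dump is an assumption.
</para>

<programlisting><![CDATA[
```python
class IntervalCounter:
    """Toy model of per-superblock counting with periodic interval dumps."""

    def __init__(self, interval_size=100_000_000):
        self.interval_size = interval_size
        self.block_counts = {}   # entry address -> instructions this interval
        self.master_count = 0    # instructions in the current interval
        self.dumps = []          # stands in for lines written to disk

    def count_instruction(self, entry_addr):
        """Called once per executed instruction of the superblock."""
        self.block_counts[entry_addr] = self.block_counts.get(entry_addr, 0) + 1
        self.master_count += 1
        if self.master_count > self.interval_size:
            self._dump_interval()

    def _dump_interval(self):
        # Record only blocks that ran, then reset the counters.
        self.dumps.append({a: c for a, c in self.block_counts.items() if c})
        self.block_counts = dict.fromkeys(self.block_counts, 0)
        self.master_count = 0
```
]]></programlisting>

<para>
Because every instruction calls the counting function, each superblock's
count is already weighted by its length when the interval is dumped.
</para>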

<para>
On the x86 and amd64 architectures the code takes special
care with rep-prefixed string instructions. This is because
actual hardware counts a rep-prefixed instruction
as one instruction, while a naive Valgrind implementation
would count it as many (possibly hundreds, thousands or even millions)
of instructions. We have special code to handle
this properly, which makes the results match hardware performance
counter results.
</para>
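
<para>
The difference between the two ways of counting can be illustrated with
a short Python sketch. The event format used here, a list of
(address, is_rep) pairs with one event per iteration of a rep-prefixed
instruction, is a hypothetical simplification for the example and is not
how Valgrind represents code internally.
</para>

<programlisting><![CDATA[
```python
def count_instructions(trace):
    """Count instructions naively and with each rep-prefixed run as one.

    `trace` is a list of (address, is_rep) events, one per executed step,
    where each iteration of a rep-prefixed instruction is its own event.
    """
    naive = len(trace)
    hardware_like = 0
    previous = None
    for address, is_rep in trace:
        # Further iterations of the same rep instruction add nothing.
        if not (is_rep and previous == (address, True)):
            hardware_like += 1
        previous = (address, is_rep)
    return naive, hardware_like
```
]]></programlisting>

<para>
A trace with one ordinary instruction, three iterations of a single
rep-prefixed instruction, and another ordinary instruction gives a naive
count of 5 but a hardware-like count of 3.
</para>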

<para>
The exp-bbv tool also counts the fldcw instruction. This
instruction is used on x86 machines when converting numbers
from floating point to integer (among other uses).
On Pentium 4 systems the retired instruction performance
counter counts this instruction as two
instructions (all other known processors only count it as one).
This can affect results when using SimPoint on Pentium 4 systems,
so we provide the count for use in mitigating this at analysis time.
</para>
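
<para>
One way to apply the fldcw count at analysis time is sketched below in
Python. The function name pentium4_retired_estimate is hypothetical, and
the sketch assumes that the total instruction count already includes
each fldcw exactly once.
</para>

<programlisting><![CDATA[
```python
def pentium4_retired_estimate(total_instructions, fldcw_count):
    """Estimate the Pentium 4 retired-instruction counter value."""
    # Each fldcw is assumed to be counted once in total_instructions,
    # so adding fldcw_count makes every fldcw count twice.
    return total_instructions + fldcw_count
```
]]></programlisting>

<para>
For a run of 1,000,000 instructions containing 50 fldcw instructions,
the estimate is 1,000,050.
</para>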

</sect1>

<sect1 id="bbv-manual.threadsupport" xreflabel="BBV Threaded Support">
<title>Threaded Executable Support</title>

<para>
BBV supports threaded programs. When a program has multiple threads,
an additional BBV file is created for each thread (each additional
file is the specified filename with the thread number
appended at the end).
</para>

<para>
There is no official method of using SimPoint with
threaded workloads. The most common method is to run
SimPoint on each thread's results independently, and use
some method of deterministic execution to try to match the
original workload. This should be possible with the current
version of exp-bbv.
</para>

</sect1>

<sect1 id="bbv-manual.validation" xreflabel="BBV Validation">
<title>Validation</title>

<para>
This plugin has been tested on x86, amd64, and ppc32 platforms.
An earlier version of the plugin was tested in detail using
hardware performance counters; this work is described in a paper
from the HiPEAC'08 conference, "Using Dynamic Binary Instrumentation
to Generate Multi-Platform SimPoints: Methodology and Accuracy" by
V.M. Weaver and S.A. McKee.
</para>

</sect1>

<sect1 id="bbv-manual.performance" xreflabel="BBV Performance">
<title>Performance</title>

<para>
Using this program slows down execution by roughly a factor of 40
over native execution. This varies depending on the machine
used and the benchmark being run.
On the SPEC CPU 2000 benchmarks running on a 3.4 GHz Pentium D
processor, the slowdown ranges from 24x (mcf) to 340x (vortex.2).
</para>

</sect1>

</chapter>