observed by Matthieu Castet. Also, add another sanity-check flag. git-svn-id: svn://svn.valgrind.org/valgrind/trunk@7253
<?xml version="1.0"?> <!-- -*- sgml -*- -->
|
|
<!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN"
|
|
"http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd"
|
|
[ <!ENTITY % vg-entities SYSTEM "../../docs/xml/vg-entities.xml"> %vg-entities; ]>
|
|
|
|
|
|
<chapter id="hg-manual" xreflabel="Helgrind: thread error detector">
|
|
<title>Helgrind: a thread error detector</title>
|
|
|
|
<para>To use this tool, you must specify
|
|
<computeroutput>--tool=helgrind</computeroutput> on the Valgrind
|
|
command line.</para>
|
|
|
|
|
|
|
|
|
|
<sect1 id="hg-manual.overview" xreflabel="Overview">
|
|
<title>Overview</title>
|
|
|
|
<para>Helgrind is a Valgrind tool for detecting synchronisation errors
|
|
in C, C++ and Fortran programs that use the POSIX pthreads
|
|
threading primitives.</para>
|
|
|
|
<para>The main abstractions in POSIX pthreads are: a set of threads
|
|
sharing a common address space, thread creation, thread joining,
|
|
thread exit, mutexes (locks), condition variables (inter-thread event
|
|
notifications), reader-writer locks, and semaphores.</para>
|
|
|
|
<para>Helgrind is aware of all these abstractions and tracks their
|
|
effects as accurately as it can. Currently it does not correctly
|
|
handle pthread barriers and pthread spinlocks, although it will not
|
|
object if you use them. On x86 and amd64 platforms, it understands
|
|
and partially handles implicit locking arising from the use of the
|
|
LOCK instruction prefix.
|
|
</para>
|
|
|
|
<para>Helgrind can detect three classes of errors, which are discussed
|
|
in detail in the next three sections:</para>
|
|
|
|
<orderedlist>
|
|
<listitem>
|
|
<para><link linkend="hg-manual.api-checks">
|
|
Misuses of the POSIX pthreads API.</link></para>
|
|
</listitem>
|
|
<listitem>
|
|
<para><link linkend="hg-manual.lock-orders">
|
|
Potential deadlocks arising from lock
|
|
ordering problems.</link></para>
|
|
</listitem>
|
|
<listitem>
|
|
<para><link linkend="hg-manual.data-races">
|
|
Data races -- accessing memory without adequate locking.
|
|
</link></para>
|
|
</listitem>
|
|
</orderedlist>
|
|
|
|
<para>Following those is a section containing
|
|
<link linkend="hg-manual.effective-use">
|
|
hints and tips on how to get the best out of Helgrind.</link>
|
|
</para>
|
|
|
|
<para>Then there is a
|
|
<link linkend="hg-manual.options">summary of command-line
|
|
options.</link>
|
|
</para>
|
|
|
|
<para>Finally, there is
|
|
<link linkend="hg-manual.todolist">a brief summary of areas in which Helgrind
|
|
could be improved.</link>
|
|
</para>
|
|
|
|
</sect1>
|
|
|
|
|
|
|
|
|
|
<sect1 id="hg-manual.api-checks" xreflabel="API Checks">
|
|
<title>Detected errors: Misuses of the POSIX pthreads API</title>
|
|
|
|
<para>Helgrind intercepts calls to many POSIX pthreads functions, and
|
|
is therefore able to report on various common problems. Although
|
|
these are unglamorous errors, their presence can lead to undefined
|
|
program behaviour and hard-to-find bugs later in execution. The
|
|
detected errors are:</para>
|
|
|
|
<itemizedlist>
|
|
<listitem><para>unlocking an invalid mutex</para></listitem>
|
|
<listitem><para>unlocking a not-locked mutex</para></listitem>
|
|
<listitem><para>unlocking a mutex held by a different
|
|
thread</para></listitem>
|
|
<listitem><para>destroying an invalid or a locked mutex</para></listitem>
|
|
<listitem><para>recursively locking a non-recursive mutex</para></listitem>
|
|
<listitem><para>deallocation of memory that contains a
|
|
locked mutex</para></listitem>
|
|
<listitem><para>passing mutex arguments to functions expecting
|
|
reader-writer lock arguments, and vice
|
|
versa</para></listitem>
|
|
<listitem><para>when a POSIX pthread function fails with an
|
|
error code that must be handled</para></listitem>
|
|
<listitem><para>when a thread exits whilst still holding locked
|
|
locks</para></listitem>
|
|
<listitem><para>calling <computeroutput>pthread_cond_wait</computeroutput>
|
|
with a not-locked mutex, or one locked by a different
|
|
thread</para></listitem>
|
|
</itemizedlist>
|
|
|
|
<para>Checks pertaining to the validity of mutexes are generally also
|
|
performed for reader-writer locks.</para>
|
|
|
|
<para>Various kinds of this-can't-possibly-happen events are also
|
|
reported. These usually indicate bugs in the system threading
|
|
library.</para>
|
|
|
|
<para>Reported errors always contain a primary stack trace indicating
|
|
where the error was detected. They may also contain auxiliary stack
|
|
traces giving additional information. In particular, most errors
|
|
relating to mutexes will also tell you where that mutex first came to
|
|
Helgrind's attention (the "<computeroutput>was first observed
|
|
at</computeroutput>" part), so you have a chance of figuring out which
|
|
mutex it is referring to. For example:</para>
|
|
|
|
<programlisting><![CDATA[
|
|
Thread #1 unlocked a not-locked lock at 0x7FEFFFA90
|
|
at 0x4C2408D: pthread_mutex_unlock (hg_intercepts.c:492)
|
|
by 0x40073A: nearly_main (tc09_bad_unlock.c:27)
|
|
by 0x40079B: main (tc09_bad_unlock.c:50)
|
|
Lock at 0x7FEFFFA90 was first observed
|
|
at 0x4C25D01: pthread_mutex_init (hg_intercepts.c:326)
|
|
by 0x40071F: nearly_main (tc09_bad_unlock.c:23)
|
|
by 0x40079B: main (tc09_bad_unlock.c:50)
|
|
]]></programlisting>
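<para>As a rough illustration, a program of the following shape would
be expected to provoke an "unlocked a not-locked lock" report of the
kind shown above. This is a minimal sketch and not the actual
<computeroutput>tc09_bad_unlock.c</computeroutput> test. The function
name <computeroutput>unlock_without_lock</computeroutput> is
hypothetical, and an error-checking mutex is assumed so that the bad
unlock also yields a well-defined return code when the program runs
natively.</para>

<programlisting><![CDATA[
```c
#define _XOPEN_SOURCE 700   /* for PTHREAD_MUTEX_ERRORCHECK */
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical helper: unlocks a mutex that was initialised but never
   locked, and returns the error code from pthread_mutex_unlock. */
int unlock_without_lock(void)
{
    pthread_mutexattr_t attr;
    pthread_mutex_t mx;
    int r;

    r = pthread_mutexattr_init(&attr);
    assert(r == 0);
    /* Assumption: an error-checking mutex, so that the misuse is
       reported through the return code rather than being undefined. */
    r = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    assert(r == 0);
    r = pthread_mutex_init(&mx, &attr);   /* the lock is first seen here */
    assert(r == 0);

    r = pthread_mutex_unlock(&mx);        /* bug: mx is not locked */

    pthread_mutex_destroy(&mx);
    pthread_mutexattr_destroy(&attr);
    return r;
}
```
]]></programlisting>

<para>Calling this function from <computeroutput>main</computeroutput>
gives a primary stack trace ending at the
<computeroutput>pthread_mutex_unlock</computeroutput> call and, in
principle, an auxiliary trace pointing at the
<computeroutput>pthread_mutex_init</computeroutput> call.</para>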
|
|
|
|
<para>Helgrind has a way of summarising thread identities, as
|
|
evidenced here by the text "<computeroutput>Thread
|
|
#1</computeroutput>". This is so that it can speak about threads and
|
|
sets of threads without overwhelming you with details. See
|
|
<link linkend="hg-manual.data-races.errmsgs">below</link>
|
|
for more information on interpreting error messages.</para>
|
|
|
|
</sect1>
|
|
|
|
|
|
|
|
|
|
<sect1 id="hg-manual.lock-orders" xreflabel="Lock Orders">
|
|
<title>Detected errors: Inconsistent Lock Orderings</title>
|
|
|
|
<para>In this section, and in general, to "acquire" a lock simply
|
|
means to lock that lock, and to "release" a lock means to unlock
|
|
it.</para>
|
|
|
|
<para>Helgrind monitors the order in which threads acquire locks.
|
|
This allows it to detect potential deadlocks which could arise from
|
|
the formation of cycles of locks. Detecting such inconsistencies is
|
|
useful because, whilst actual deadlocks are fairly obvious, potential
|
|
deadlocks may never be discovered during testing and could later lead
|
|
to hard-to-diagnose in-service failures.</para>
|
|
|
|
<para>The simplest example of such a problem is as
|
|
follows.</para>
|
|
|
|
<itemizedlist>
|
|
<listitem><para>Imagine some shared resource R, which, for whatever
|
|
reason, is guarded by two locks, L1 and L2, which must both be held
|
|
when R is accessed.</para>
|
|
</listitem>
|
|
<listitem><para>Suppose a thread acquires L1, then L2, and proceeds
|
|
to access R. The implication of this is that all threads in the
|
|
program must acquire the two locks in the order first L1 then L2.
|
|
Not doing so risks deadlock.</para>
|
|
</listitem>
|
|
<listitem><para>The deadlock could happen if two threads -- call them
|
|
T1 and T2 -- both want to access R. Suppose T1 acquires L1 first,
|
|
and T2 acquires L2 first. Then T1 tries to acquire L2, and T2 tries
|
|
to acquire L1, but those locks are both already held. So T1 and T2
|
|
become deadlocked.</para>
|
|
</listitem>
|
|
</itemizedlist>
|
|
|
|
<para>Helgrind builds a directed graph indicating the order in which
|
|
locks have been acquired in the past. When a thread acquires a new
|
|
lock, the graph is updated, and then checked to see if it now contains
|
|
a cycle. The presence of a cycle indicates a potential deadlock involving
|
|
the locks in the cycle.</para>
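<para>The idea can be sketched as follows. This is only an
illustration of cycle detection in a lock-order graph, not Helgrind's
actual implementation: the type <computeroutput>LockGraph</computeroutput>,
the functions <computeroutput>graph_init</computeroutput> and
<computeroutput>graph_acquire</computeroutput>, and the fixed limit
<computeroutput>MAX_LOCKS</computeroutput> are all hypothetical, and
locks are identified by small integers for simplicity.</para>

<programlisting><![CDATA[
```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_LOCKS 8   /* hypothetical limit, for the sketch only */

typedef struct {
    /* edge[a][b] is true if lock a was held when lock b was acquired */
    bool edge[MAX_LOCKS][MAX_LOCKS];
} LockGraph;

void graph_init(LockGraph *g)
{
    memset(g, 0, sizeof *g);
}

/* Depth-first search: is there a path from 'from' to 'to'? */
static bool reaches(const LockGraph *g, int from, int to, bool *seen)
{
    if (from == to)
        return true;
    seen[from] = true;
    for (int next = 0; next < MAX_LOCKS; next++)
        if (g->edge[from][next] && !seen[next] && reaches(g, next, to, seen))
            return true;
    return false;
}

/* Record that 'acquired' was taken whilst 'held' was held.
   Returns true if the new edge closes a cycle in the graph. */
bool graph_acquire(LockGraph *g, int held, int acquired)
{
    bool seen[MAX_LOCKS] = { false };
    assert(held >= 0 && held < MAX_LOCKS);
    assert(acquired >= 0 && acquired < MAX_LOCKS);
    /* A cycle appears if 'held' is already reachable from 'acquired'. */
    bool cycle = reaches(g, acquired, held, seen);
    g->edge[held][acquired] = true;
    return cycle;
}

/* Usage: the orders 0-then-1 and 1-then-2 are consistent, but a later
   2-then-0 acquisition closes a cycle of three locks. */
bool three_lock_cycle_demo(void)
{
    LockGraph g;
    graph_init(&g);
    bool first  = graph_acquire(&g, 0, 1);
    bool second = graph_acquire(&g, 1, 2);
    bool third  = graph_acquire(&g, 2, 0);
    return !first && !second && third;
}
```
]]></programlisting>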
|
|
|
|
<para>In simple situations, where the cycle only contains two locks,
|
|
Helgrind will show where the required order was established:</para>
|
|
|
|
<programlisting><![CDATA[
|
|
Thread #1: lock order "0x7FEFFFAB0 before 0x7FEFFFA80" violated
|
|
at 0x4C23C91: pthread_mutex_lock (hg_intercepts.c:388)
|
|
by 0x40081F: main (tc13_laog1.c:24)
|
|
Required order was established by acquisition of lock at 0x7FEFFFAB0
|
|
at 0x4C23C91: pthread_mutex_lock (hg_intercepts.c:388)
|
|
by 0x400748: main (tc13_laog1.c:17)
|
|
followed by a later acquisition of lock at 0x7FEFFFA80
|
|
at 0x4C23C91: pthread_mutex_lock (hg_intercepts.c:388)
|
|
by 0x400773: main (tc13_laog1.c:18)
|
|
]]></programlisting>
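<para>A program of roughly the following shape would be expected to
provoke a report like the one above. It is a sketch and not the
actual <computeroutput>tc13_laog1.c</computeroutput> test; the names
<computeroutput>mx1</computeroutput>, <computeroutput>mx2</computeroutput>,
<computeroutput>lock_in_order</computeroutput> and
<computeroutput>inconsistent_order</computeroutput> are hypothetical.
Note that only one thread is involved, so the program cannot actually
deadlock. The inconsistency is nevertheless visible in the order of
acquisitions.</para>

<programlisting><![CDATA[
```c
#include <assert.h>
#include <pthread.h>

static pthread_mutex_t mx1 = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t mx2 = PTHREAD_MUTEX_INITIALIZER;

/* Acquire 'first' then 'second', then release both in reverse order.
   Returns the number of locks acquired. */
static int lock_in_order(pthread_mutex_t *first, pthread_mutex_t *second)
{
    int r;
    r = pthread_mutex_lock(first);
    assert(r == 0);
    r = pthread_mutex_lock(second);
    assert(r == 0);
    r = pthread_mutex_unlock(second);
    assert(r == 0);
    r = pthread_mutex_unlock(first);
    assert(r == 0);
    return 2;
}

/* Takes the two locks in both orders, one after the other. */
int inconsistent_order(void)
{
    int n = 0;
    n += lock_in_order(&mx1, &mx2);   /* establishes "mx1 before mx2" */
    n += lock_in_order(&mx2, &mx1);   /* violates that order */
    return n;
}
```
]]></programlisting>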
|
|
|
|
<para>When there are more than two locks in the cycle, the error is
|
|
equally serious. However, at present Helgrind does not show the locks
|
|
involved, so as to avoid flooding you with information. That could be
|
|
fixed in future. Here is an example involving a cycle
of five locks from a naive implementation of the famous Dining
|
|
Philosophers problem
|
|
(see <computeroutput>helgrind/tests/tc14_laog_dinphils.c</computeroutput>).
|
|
In this case Helgrind has detected that all 5 philosophers could
|
|
simultaneously pick up their left fork and then deadlock whilst
|
|
waiting to pick up their right forks.</para>
|
|
|
|
<programlisting><![CDATA[
|
|
Thread #6: lock order "0x6010C0 before 0x601160" violated
|
|
at 0x4C23C91: pthread_mutex_lock (hg_intercepts.c:388)
|
|
by 0x4007C0: dine (tc14_laog_dinphils.c:19)
|
|
by 0x4C25DF7: mythread_wrapper (hg_intercepts.c:178)
|
|
by 0x4E2F09D: start_thread (in /lib64/libpthread-2.5.so)
|
|
by 0x51054CC: clone (in /lib64/libc-2.5.so)
|
|
]]></programlisting>
|
|
|
|
</sect1>
|
|
|
|
|
|
|
|
|
|
<sect1 id="hg-manual.data-races" xreflabel="Data Races">
|
|
<title>Detected errors: Data Races</title>
|
|
|
|
<para>A data race happens, or could happen, when two threads
|
|
access a shared memory location without using suitable locks to
|
|
ensure single-threaded access. Such missing locking can cause
|
|
obscure timing-dependent bugs. Ensuring programs are race-free is
|
|
one of the central difficulties of threaded programming.</para>
|
|
|
|
<para>Reliably detecting races is a difficult problem, and most
|
|
of Helgrind's internals are devoted to dealing with it.
|
|
As a consequence this section is somewhat long and involved.
|
|
We begin with a simple example.</para>
|
|
|
|
|
|
<sect2 id="hg-manual.data-races.example" xreflabel="Simple Race">
|
|
<title>A Simple Data Race</title>
|
|
|
|
<para>About the simplest possible example of a race is as follows. In
|
|
this program, it is impossible to know what the value
|
|
of <computeroutput>var</computeroutput> is at the end of the program.
|
|
Is it 2? Or 1?</para>
|
|
|
|
<programlisting><![CDATA[
|
|
#include <pthread.h>

int var = 0;

void* child_fn ( void* arg ) {
   var++; /* Unprotected relative to parent */ /* this is line 6 */
   return NULL;
}

int main ( void ) {
   pthread_t child;
   pthread_create(&child, NULL, child_fn, NULL);
   var++; /* Unprotected relative to child */ /* this is line 13 */
   pthread_join(child, NULL);
   return 0;
}
|
|
]]></programlisting>
|
|
|
|
<para>The problem is there is nothing to
|
|
stop <computeroutput>var</computeroutput> being updated simultaneously
|
|
by both threads. A correct program would
|
|
protect <computeroutput>var</computeroutput> with a lock of type
|
|
<computeroutput>pthread_mutex_t</computeroutput>, which is acquired
|
|
before each access and released afterwards. Helgrind's output for
|
|
this program is:</para>
|
|
|
|
<programlisting><![CDATA[
|
|
Thread #1 is the program's root thread
|
|
|
|
Thread #2 was created
|
|
at 0x510548E: clone (in /lib64/libc-2.5.so)
|
|
by 0x4E2F305: do_clone (in /lib64/libpthread-2.5.so)
|
|
by 0x4E2F7C5: pthread_create@@GLIBC_2.2.5 (in /lib64/libpthread-2.5.so)
|
|
by 0x4C23870: pthread_create@* (hg_intercepts.c:198)
|
|
by 0x4005F1: main (simple_race.c:12)
|
|
|
|
Possible data race during write of size 4 at 0x601034
|
|
at 0x4005F2: main (simple_race.c:13)
|
|
Old state: shared-readonly by threads #1, #2
|
|
New state: shared-modified by threads #1, #2
|
|
Reason: this thread, #1, holds no consistent locks
|
|
Location 0x601034 has never been protected by any lock
|
|
]]></programlisting>
|
|
|
|
<para>This is quite a lot of detail for an apparently simple error.
|
|
The last clause is the main error message. It says there is a race as
|
|
a result of a write of size 4 (bytes), at 0x601034, which is
|
|
presumably the address of <computeroutput>var</computeroutput>,
|
|
happening in function <computeroutput>main</computeroutput> at line 13
|
|
in the program.</para>
|
|
|
|
<para>Note that it is purely by chance that the race is
|
|
reported for the parent thread's access. It could equally have been
|
|
reported instead for the child's access, at line 6. The error will
|
|
only be reported for one of the locations, since neither the parent
|
|
nor child is, by itself, incorrect. It is only when both access
|
|
<computeroutput>var</computeroutput> without a lock that an error
|
|
exists.</para>
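<para>For comparison, a corrected variant might look like the
following sketch, in which both increments are guarded by the same
mutex. The names <computeroutput>var_mx</computeroutput> and
<computeroutput>run_locked</computeroutput> are hypothetical, and the
parent's work is wrapped in a function rather than placed in
<computeroutput>main</computeroutput> purely for convenience.</para>

<programlisting><![CDATA[
```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

static int var = 0;
static pthread_mutex_t var_mx = PTHREAD_MUTEX_INITIALIZER;  /* guards var */

static void *child_fn(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&var_mx);
    var++;                        /* protected relative to parent */
    pthread_mutex_unlock(&var_mx);
    return NULL;
}

/* Runs parent and child increments under the lock; returns final var. */
int run_locked(void)
{
    pthread_t child;
    int r;

    var = 0;                      /* no child exists yet */
    r = pthread_create(&child, NULL, child_fn, NULL);
    assert(r == 0);

    pthread_mutex_lock(&var_mx);
    var++;                        /* protected relative to child */
    pthread_mutex_unlock(&var_mx);

    r = pthread_join(child, NULL);
    assert(r == 0);
    return var;                   /* child has exited: always 2 */
}
```
]]></programlisting>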
|
|
|
|
<para>The error message shows some other interesting details. The
|
|
sections below explain them. Here we merely note their presence:</para>
|
|
|
|
<itemizedlist>
|
|
<listitem><para>Helgrind maintains some kind of state machine for the
|
|
memory location in question, hence the "<computeroutput>Old
|
|
state:</computeroutput>" and "<computeroutput>New
|
|
state:</computeroutput>" lines.</para>
|
|
</listitem>
|
|
<listitem><para>Helgrind keeps track of which threads have accessed
|
|
the location: "<computeroutput>threads #1, #2</computeroutput>".
|
|
Before printing the main error message, it prints the creation
|
|
points of these two threads, so you can see which threads it is
|
|
referring to.</para>
|
|
</listitem>
|
|
<listitem><para>Helgrind tries to provide an explanation of why the
|
|
race exists: "<computeroutput>Location 0x601034 has never been
|
|
protected by any lock</computeroutput>".</para>
|
|
</listitem>
|
|
</itemizedlist>
|
|
|
|
<para>Understanding the memory state machine is central to
|
|
understanding Helgrind's race-detection algorithm. The next three
|
|
subsections explain this.</para>
|
|
|
|
</sect2>
|
|
|
|
|
|
<sect2 id="hg-manual.data-races.memstates" xreflabel="Memory States">
|
|
<title>Helgrind's Memory State Machine</title>
|
|
|
|
<para>Helgrind tracks the state of every byte of memory used by your
|
|
program. There are a number of states, but only three are
|
|
interesting:</para>
|
|
|
|
<itemizedlist>
|
|
<listitem><para>Exclusive: memory in this state is regarded as owned
|
|
exclusively by one particular thread. That thread may read and
|
|
write it without a lock. Even in highly threaded programs, the
|
|
majority of locations never leave the Exclusive state, since most
|
|
data is thread-private.</para>
|
|
</listitem>
|
|
<listitem><para>Shared-Readonly: memory in this state is regarded as
|
|
shared by multiple threads. In this state, any thread may read the
|
|
memory without a lock, reflecting the fact that readonly data may
|
|
safely be shared between threads without locking.</para>
|
|
</listitem>
|
|
<listitem><para>Shared-Modified: memory in this state is regarded as
|
|
shared by multiple threads, at least one of which has written to it.
|
|
All participating threads must hold at least one lock in common when
|
|
accessing the memory. If no such lock exists, Helgrind reports a
|
|
race error.</para>
|
|
</listitem>
|
|
</itemizedlist>
|
|
|
|
<para>Let's review the simple example above with this in mind. When
|
|
the program starts, <computeroutput>var</computeroutput> is not in any
|
|
of these states. Either the parent or child thread gets to its
|
|
<computeroutput>var++</computeroutput> first, and thereby
|
|
gets Exclusive ownership of the location.</para>
|
|
|
|
<para>The later-running thread now arrives at
|
|
its <computeroutput>var++</computeroutput> statement. It first reads
|
|
the existing value from memory.
|
|
Because <computeroutput>var</computeroutput> is currently marked as
|
|
owned exclusively by the other thread, its state is changed to
|
|
shared-readonly by both threads.</para>
|
|
|
|
<para>This same thread adds one to the value it has and stores it back
|
|
in <computeroutput>var</computeroutput>. This causes another state
|
|
change, this time to the shared-modified state. Because Helgrind has
|
|
also been tracking which threads hold which locks, it can see that
|
|
<computeroutput>var</computeroutput> is in shared-modified state but
|
|
no lock has been used to consistently protect it. Hence a race is
|
|
reported exactly at the transition from shared-readonly to
|
|
shared-modified.</para>
|
|
|
|
<para>The essence of the algorithm is this. Helgrind keeps track of
|
|
each memory location that has been accessed by more than one thread.
|
|
For each such location it incrementally infers the set of locks which
|
|
have consistently been used to protect that location. If the
|
|
location's lockset becomes empty, and at some point one of the threads
|
|
attempts to write to it, a race is then reported.</para>
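<para>A much simplified sketch of such a state machine is shown
below. It is not Helgrind's implementation: the types
<computeroutput>Location</computeroutput> and
<computeroutput>LockSet</computeroutput>, the state names and the
function <computeroutput>on_access</computeroutput> are hypothetical,
locksets are represented as bit masks, and none of the refinements
described later are modelled. The helper
<computeroutput>replay_simple_race</computeroutput> replays the four
accesses of the simple example, with no locks held.</para>

<programlisting><![CDATA[
```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef unsigned LockSet;   /* bit i set: lock number i is in the set */

typedef enum { ST_NEW, ST_EXCLUSIVE, ST_SHARED_RO, ST_SHARED_MOD } State;

typedef struct {
    State   state;
    int     owner;     /* meaningful only in ST_EXCLUSIVE */
    LockSet lockset;   /* candidate locks; meaningful only when shared */
} Location;

/* Apply one access by 'thread' holding the locks in 'held'.
   Returns true if a race would be reported for this access. */
bool on_access(Location *loc, int thread, bool is_write, LockSet held)
{
    assert(loc != NULL);
    switch (loc->state) {
    case ST_NEW:                       /* first access: take ownership */
        loc->state = ST_EXCLUSIVE;
        loc->owner = thread;
        return false;
    case ST_EXCLUSIVE:
        if (thread == loc->owner)      /* owner needs no lock */
            return false;
        loc->lockset = held;           /* second thread: start inferring */
        loc->state = is_write ? ST_SHARED_MOD : ST_SHARED_RO;
        break;
    case ST_SHARED_RO:
        loc->lockset &= held;          /* keep only consistently held locks */
        if (is_write)
            loc->state = ST_SHARED_MOD;
        break;
    case ST_SHARED_MOD:
        loc->lockset &= held;
        break;
    }
    /* Report only once the location is shared-modified with no lock left. */
    return loc->state == ST_SHARED_MOD && loc->lockset == 0;
}

/* Replays var++ by thread 1, then var++ by thread 2, with no locks held.
   Returns the 1-based index of the access that reports a race, or 0. */
int replay_simple_race(void)
{
    Location var = { ST_NEW, 0, 0 };
    const struct { int thread; bool is_write; } trace[] = {
        { 1, false }, { 1, true },     /* first thread: read, then write */
        { 2, false }, { 2, true },     /* later thread: read, then write */
    };
    for (int i = 0; i < 4; i++)
        if (on_access(&var, trace[i].thread, trace[i].is_write, 0))
            return i + 1;
    return 0;
}
```
]]></programlisting>

<para>In this sketch the race is reported on the fourth access, that
is, on the later thread's write, which matches the transition from
shared-readonly to shared-modified described above.</para>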
|
|
|
|
<para>This technique is known as "lockset inference" and was
|
|
introduced in: "Eraser: A Dynamic Data Race Detector for Multithreaded
|
|
Programs" (Stefan Savage, Michael Burrows, Greg Nelson, Patrick
|
|
Sobalvarro and Thomas Anderson, ACM Transactions on Computer Systems,
|
|
15(4):391-411, November 1997).</para>
|
|
|
|
<para>Lockset inference has since been widely implemented, studied and
|
|
extended. Helgrind incorporates several refinements aimed at avoiding
|
|
the high false error rate that naive versions of the algorithm suffer
|
|
from. A
|
|
<link linkend="hg-manual.data-races.summary">summary of the complete
|
|
algorithm used by Helgrind</link> is presented below. First, however,
|
|
it is important to understand details of transitions pertaining to the
|
|
Exclusive-ownership state.</para>
|
|
|
|
</sect2>
|
|
|
|
|
|
|
|
<sect2 id="hg-manual.data-races.exclusive" xreflabel="Excl Transfers">
|
|
<title>Transfers of Exclusive Ownership Between Threads</title>
|
|
|
|
<para>As presented, the algorithm is far too strict. It reports many
|
|
errors in perfectly correct, widely used parallel programming
|
|
constructions, for example, using child worker threads and worker
|
|
thread pools.</para>
|
|
|
|
<para>To avoid these false errors, we must refine the algorithm so
|
|
that it keeps memory in an Exclusive ownership state in cases where it
|
|
would otherwise decay into a shared-readonly or shared-modified state.
|
|
Recall that Exclusive ownership is special in that it grants the
|
|
owning thread the right to access memory without use of any locks. In
|
|
order to support worker-thread and worker-thread-pool idioms, we will
|
|
allow threads to steal exclusive ownership of memory from other
|
|
threads under certain circumstances.</para>
|
|
|
|
<para>Here's an example. Imagine a parent thread creates child
|
|
threads to do units of work. For each unit of work, the parent
|
|
allocates a work buffer, fills it in, and creates the child thread,
|
|
handing it a pointer to the buffer. The child reads/writes the buffer
|
|
and eventually exits, and the waiting parent then extracts the results
|
|
from the buffer:</para>
|
|
|
|
<programlisting><![CDATA[
|
|
typedef ... Buffer;

pthread_t child;
Buffer buf;

/* ---- Parent ---- */                          /* ---- Child ---- */

/* parent writes workload into buf */
pthread_create( &child, NULL, child_fn, &buf );

/* parent does not read */                      void* child_fn ( void* arg ) {
/* or write buf */                                 Buffer* buf = arg;
                                                   /* read/write buf */
                                                   return NULL;
                                                }

pthread_join ( child, NULL );
/* parent reads results from buf */
|
|
]]></programlisting>
|
|
|
|
<para>Although <computeroutput>buf</computeroutput> is accessed by
|
|
both threads, neither uses locks, yet the program is race-free. The
|
|
essential observation is that the child's creation and exit create
|
|
synchronisation events between it and the parent. These force the
|
|
child's accesses to <computeroutput>buf</computeroutput> to happen
|
|
after the parent initialises <computeroutput>buf</computeroutput>, and
|
|
before the parent reads the results
|
|
from <computeroutput>buf</computeroutput>.</para>
|
|
|
|
<para>To model this, Helgrind allows the child to steal, from the
|
|
parent, exclusive ownership of any memory exclusively owned by the
|
|
parent before the pthread_create call. Similarly, once the parent's
|
|
pthread_join call returns, it can steal back ownership of memory
|
|
exclusively owned by the child. In this way ownership
|
|
of <computeroutput>buf</computeroutput> is transferred from parent to
|
|
child and back, so the basic algorithm does not report any races
|
|
despite the absence of any locking.</para>
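<para>A complete program fragment following this pattern might look
like the sketch below. The layout of
<computeroutput>Buffer</computeroutput>, the doubling "work" done by
the child, and the function name
<computeroutput>run_handoff</computeroutput> are all hypothetical and
serve only to make the example self-contained.</para>

<programlisting><![CDATA[
```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

typedef struct {
    int input;    /* written by the parent before pthread_create */
    int result;   /* written by the child, read by the parent after join */
} Buffer;         /* hypothetical layout */

static void *child_fn(void *arg)
{
    Buffer *buf = arg;
    buf->result = buf->input * 2;   /* child reads and writes buf, no lock */
    return NULL;
}

/* Hands a buffer to a child thread and returns the child's result. */
int run_handoff(int input)
{
    Buffer buf;
    pthread_t child;
    int r;

    buf.input = input;              /* parent owns buf exclusively */
    buf.result = 0;

    r = pthread_create(&child, NULL, child_fn, &buf);
    assert(r == 0);

    /* parent neither reads nor writes buf here */

    r = pthread_join(child, NULL);
    assert(r == 0);

    return buf.result;              /* ownership is back with the parent */
}
```
]]></programlisting>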
|
|
|
|
<para>Note that the child may only steal memory owned by the parent
|
|
prior to the pthread_create call. If the child attempts to read or
|
|
write memory which is also accessed by the parent in between the
|
|
pthread_create and pthread_join calls, an error is still
|
|
reported.</para>
|
|
|
|
<para>This technique was introduced with the name "thread lifetime
|
|
segments" in "Runtime Checking of Multithreaded Applications with
|
|
Visual Threads" (Jerry J. Harrow, Jr, Proceedings of the 7th
|
|
International SPIN Workshop on Model Checking of Software, Stanford,
|
|
California, USA, August 2000, LNCS 1885, pp331--342). Helgrind
|
|
implements an extended version of it. Specifically, Helgrind allows
|
|
transfer of exclusive ownership in the following situations:</para>
|
|
|
|
<itemizedlist>
|
|
<listitem><para>At thread creation: a child can acquire ownership of
|
|
memory held exclusively by the parent prior to the child's
|
|
creation.</para>
|
|
</listitem>
|
|
<listitem><para>At thread joining: the joiner (thread not exiting)
|
|
can acquire ownership of memory held exclusively by the joinee
|
|
(thread that is exiting) at the point it exited.</para>
|
|
</listitem>
|
|
<listitem><para>At condition variable signallings and broadcasts. A
|
|
thread Tw which completes a pthread_cond_wait call as a result of
|
|
a signal or broadcast on the same condition variable by some other
|
|
thread Ts, may acquire ownership of memory held exclusively by
|
|
Ts prior to the pthread_cond_signal/broadcast
|
|
call.</para>
|
|
</listitem>
|
|
<listitem><para>At semaphore post (sem_post) calls. A thread Tw
which completes a sem_wait call as a result of a sem_post call
|
|
on the same semaphore by some other thread Tp, may acquire
|
|
ownership of memory held exclusively by Tp prior to the sem_post
|
|
call.</para>
|
|
</listitem>
|
|
</itemizedlist>
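<para>As an illustration of the semaphore case, consider the
following sketch. The names <computeroutput>message</computeroutput>,
<computeroutput>ready</computeroutput>,
<computeroutput>poster</computeroutput> and
<computeroutput>run_sem_handoff</computeroutput> are hypothetical.
The posting thread plays the role of Tp and the calling thread the
role of Tw. No lock protects <computeroutput>message</computeroutput>;
the only ordering between the write and the read comes from the
sem_post and sem_wait calls.</para>

<programlisting><![CDATA[
```c
#include <assert.h>
#include <pthread.h>
#include <semaphore.h>
#include <stddef.h>

static int   message;   /* written by Tp, later read by Tw */
static sem_t ready;     /* posted by Tp once message is written */

static void *poster(void *arg)      /* this thread is Tp */
{
    (void)arg;
    message = 42;                   /* Tp's access, before the post */
    sem_post(&ready);
    return NULL;
}

/* The calling thread is Tw; returns the value it reads from message. */
int run_sem_handoff(void)
{
    pthread_t tp;
    int r, seen;

    r = sem_init(&ready, 0, 0);     /* unnamed semaphore, initially 0 */
    assert(r == 0);
    r = pthread_create(&tp, NULL, poster, NULL);
    assert(r == 0);

    r = sem_wait(&ready);           /* completes as a result of the post */
    assert(r == 0);
    seen = message;                 /* Tw's access, after the wait */

    r = pthread_join(tp, NULL);
    assert(r == 0);
    sem_destroy(&ready);
    return seen;
}
```
]]></programlisting>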
|
|
|
|
</sect2>
|
|
|
|
|
|
|
|
<sect2 id="hg-manual.data-races.re-excl" xreflabel="Re-Excl Transfers">
|
|
<title>Restoration of Exclusive Ownership</title>
|
|
|
|
<para>Another common idiom is to partition the lifetime of the program
|
|
as a whole into several distinct phases. In some of those phases, a
|
|
memory location may be accessed by multiple threads and so require
|
|
locking. In other phases only one thread exists and so can access the
|
|
memory without locking. For example:</para>
|
|
|
|
<programlisting><![CDATA[
|
|
int var = 0;                                    /* shared variable */
pthread_mutex_t mx = PTHREAD_MUTEX_INITIALIZER; /* guard for var */
pthread_t child;

/* ---- Parent ---- */                          /* ---- Child ---- */

var += 1; /* no lock used */

pthread_create( &child, NULL, child_fn, NULL );

                                                void* child_fn ( void* uu ) {
pthread_mutex_lock(&mx);                           pthread_mutex_lock(&mx);
var += 2;                                          var += 3;
pthread_mutex_unlock(&mx);                         pthread_mutex_unlock(&mx);
                                                   return NULL;
                                                }

pthread_join ( child, NULL );

var += 4; /* no lock used */
|
|
]]></programlisting>
|
|
|
|
<para>This program is correct, but using only the mechanisms described
|
|
so far, Helgrind would report an error at
|
|
<computeroutput>var += 4</computeroutput>. This is because, by that
|
|
point, <computeroutput>var</computeroutput> is marked as being in the
|
|
state "shared-modified and protected by the
|
|
lock <computeroutput>mx</computeroutput>", but is being accessed
|
|
without locking. Really, what we want is
|
|
for <computeroutput>var</computeroutput> to return to the parent
|
|
thread's exclusive ownership after the child thread has exited.</para>
|
|
|
|
<para>To make this possible, for every memory location Helgrind also keeps
|
|
track of all the threads that have accessed that location
|
|
-- its threadset. When a thread Tquitter joins back to Tstayer,
|
|
Helgrind examines the threadsets of all memory in shared-modified or
shared-readonly state. In each such threadset, if Tquitter is
mentioned, it is removed and replaced by Tstayer. If, as a result, a
threadset becomes a singleton set containing Tstayer, then the
|
|
location's state is changed to belongs-exclusively-to-Tstayer.</para>
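<para>The update applied to one shared location at a join can be
sketched as follows. This is an illustration only, not Helgrind's
code: the threadset is represented as a bit mask, threads are
numbered from 0 to 31, and the names
<computeroutput>ThreadSet</computeroutput>,
<computeroutput>merge_on_join</computeroutput>,
<computeroutput>owner_after_join</computeroutput> and
<computeroutput>NO_OWNER</computeroutput> are hypothetical.</para>

<programlisting><![CDATA[
```c
#include <assert.h>
#include <stdbool.h>

typedef unsigned ThreadSet;   /* bit t set: thread t has accessed the location */

#define NO_OWNER (-1)         /* location remains in a shared state */

/* Threadset of one shared location after 'quitter' joins back to
   'stayer': the quitter, if mentioned, is replaced by the stayer. */
ThreadSet merge_on_join(ThreadSet threads, int quitter, int stayer)
{
    assert(quitter >= 0 && quitter < 32);
    assert(stayer >= 0 && stayer < 32);
    ThreadSet q = 1u << quitter;
    ThreadSet s = 1u << stayer;
    if (threads & q)
        threads = (threads & ~q) | s;
    return threads;
}

/* Returns the new exclusive owner if the merged threadset is the
   singleton { stayer }, or NO_OWNER if the location stays shared. */
int owner_after_join(ThreadSet threads, int quitter, int stayer)
{
    ThreadSet merged = merge_on_join(threads, quitter, stayer);
    return merged == (1u << stayer) ? stayer : NO_OWNER;
}
```
]]></programlisting>

<para>For instance, a location shared by threads 1 and 2 becomes
exclusively owned by thread 1 when thread 2 joins back to it, whereas
a location shared by threads 1, 2 and 3 remains shared (by threads 1
and 3) after the same join.</para>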
|
|
|
|
<para>In our example, the result is exactly as we desire:
|
|
<computeroutput>var</computeroutput> is reacquired exclusively by the
|
|
parent after the child exits.</para>
|
|
|
|
<para>More generally, when a group of threads merges back to a single
|
|
thread via a cascade of pthread_join calls, any memory shared by the
|
|
group (or a subset of it) ends up being owned exclusively by the sole
|
|
surviving thread. This significantly enhances Helgrind's flexibility,
|
|
since it means that each memory location may make arbitrarily many
|
|
transitions between exclusive and shared ownership. Furthermore, a
|
|
different lock may protect the location during each period of shared
|
|
ownership.</para>
|
|
|
|
</sect2>
|
|
|
|
|
|
|
|
<sect2 id="hg-manual.data-races.summary" xreflabel="Race Det Summary">
|
|
<title>A Summary of the Race Detection Algorithm</title>
|
|
|
|
<para>Helgrind looks for memory locations which are accessed by more
|
|
than one thread. For each such location, Helgrind records which of
|
|
the program's locks were held by the accessing thread at the time of
|
|
each access. The hope is to discover that there is indeed at least
|
|
one lock which is consistently used by all threads to protect that
|
|
location. If no such lock can be found, then there is apparently no
|
|
consistent locking strategy being applied for that location, and so a
|
|
possible data race might result. Helgrind accordingly reports an
|
|
error.</para>
|
|
|
|
<para>In practice this discipline is far too simplistic, and is
|
|
unusable since it reports many races in some widely used and
|
|
known-correct programming disciplines. Helgrind's checking therefore
|
|
incorporates many refinements to this basic idea, and can be
|
|
summarised as follows:</para>
|
|
|
|
<para>The following thread events are intercepted and monitored:</para>
|
|
|
|
<itemizedlist>
|
|
<listitem><para>thread creation and exiting (pthread_create,
|
|
pthread_join, pthread_exit)</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>lock acquisition and release (pthread_mutex_lock,
|
|
pthread_mutex_unlock, pthread_rwlock_rdlock,
|
|
pthread_rwlock_wrlock,
|
|
pthread_rwlock_unlock)</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>inter-thread event notifications (pthread_cond_wait,
|
|
pthread_cond_signal, pthread_cond_broadcast,
|
|
sem_wait, sem_post)</para>
|
|
</listitem>
|
|
</itemizedlist>
|
|
|
|
<para>Memory allocation and deallocation events are intercepted and
|
|
monitored:</para>
|
|
|
|
<itemizedlist>
|
|
<listitem>
|
|
<para>malloc/new/free/delete and variants</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>stack allocation and deallocation</para>
|
|
</listitem>
|
|
</itemizedlist>
|
|
|
|
<para>All memory accesses are intercepted and monitored.</para>
|
|
|
|
<para>By observing the above events, Helgrind can infer certain
|
|
aspects of the program's locking discipline. Programs which adhere to
|
|
the following rules are considered to be acceptable:
|
|
</para>
|
|
|
|
<itemizedlist>
|
|
<listitem>
|
|
<para>A thread may allocate memory, and write initial values into
|
|
it, without locking. That thread is regarded as owning the memory
|
|
exclusively.</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>A thread may read and write memory which it owns exclusively,
|
|
without locking.</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>Memory which is owned exclusively by one thread may be read by
|
|
that thread and others without locking. However, in this situation
|
|
no thread may do unlocked writes to the memory (except for the owner
|
|
thread's initializing write).</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>Memory which is shared between multiple threads, one or more
|
|
of which writes to it, must be protected by a lock which is
|
|
correctly acquired and released by all threads accessing the
|
|
memory.</para>
|
|
</listitem>
|
|
</itemizedlist>
|
|
|
|
<para>Any violation of this discipline will cause an error to be reported.
|
|
However, two exemptions apply:</para>
|
|
|
|
<itemizedlist>
|
|
<listitem>
|
|
<para>A thread Y can acquire exclusive ownership of memory
|
|
previously owned exclusively by a different thread X providing
|
|
X's last access and Y's first access are separated by one of the
|
|
following synchronization events:</para>
|
|
<itemizedlist>
|
|
<listitem><para>X creates thread Y</para></listitem>
|
|
<listitem><para>X joins back to Y</para></listitem>
|
|
<listitem><para>X uses a condition-variable to signal at Y, and Y is
|
|
waiting for that event</para></listitem>
|
|
<listitem><para>Y completes a semaphore wait as a result of X signalling
|
|
on that same semaphore</para></listitem>
|
|
</itemizedlist>
|
|
<para>
|
|
This refinement allows Helgrind to correctly track the ownership
|
|
state of inter-thread buffers used in the worker-thread and
|
|
worker-thread-pool concurrent programming idioms (styles).</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>Similarly, if thread Y joins back to thread X, memory
|
|
exclusively owned by Y becomes exclusively owned by X instead.
|
|
Also, memory that has been shared only by X and Y becomes
|
|
exclusively owned by X. More generally, memory that has been shared
|
|
by X, Y and some arbitrary other set S of threads is re-marked as
|
|
shared by X and S. Hence, under the right circumstances, memory
|
|
shared amongst multiple threads, all of which join into just one,
|
|
can revert to the exclusive ownership state.</para>
|
|
<para>
|
|
In effect, each memory location may make arbitrarily many
|
|
transitions between exclusive and shared ownership. Furthermore, a
|
|
different lock may protect the location during each period of shared
|
|
ownership. This significantly enhances the flexibility of the
|
|
algorithm.</para>
|
|
</listitem>
|
|
</itemizedlist>
|
|
|
|
<para>The ownership state, accessing thread-set and related lock-set
|
|
for each memory location are tracked at 8-bit granularity. This means
|
|
the algorithm is precise even for 16- and 8-bit memory
|
|
accesses.</para>
|
|
|
|
<para>Helgrind correctly handles reader-writer locks in this
|
|
framework. Locations shared between multiple threads can be protected
|
|
during reads by locks held in either read-mode or write-mode, but can
|
|
only be protected during writes by locks held in write-mode. Normal
|
|
POSIX mutexes are treated as if they are reader-writer locks which are
|
|
only ever held in write-mode.</para>
|
|
|
|
<para>Helgrind correctly handles POSIX mutexes for which recursive
|
|
locking is allowed.</para>
|
|
|
|
<para>Helgrind only partially handles x86 and amd64 memory access
instructions preceded by a LOCK prefix. Writes are correctly handled,
|
|
by pretending that the LOCK prefix implies acquisition and release of
|
|
a magic "bus hardware lock" mutex before and after the instruction.
|
|
This unfortunately requires subsequent reads from such locations to
|
|
also use a LOCK prefix, which is not required by the real hardware.
|
|
Helgrind does not offer any equivalent handling for atomic sequences
|
|
on PowerPC/POWER platforms created by the use of lwarx/stwcx
|
|
instructions.</para>
|
|
|
|
</sect2>
|
|
|
|
|
|
|
|
<sect2 id="hg-manual.data-races.errmsgs" xreflabel="Race Error Messages">
|
|
<title>Interpreting Race Error Messages</title>
|
|
|
|
<para>Helgrind's race detection algorithm collects a lot of
|
|
information, and tries to present it in a helpful way when a race is
|
|
detected. Here's an example:</para>
|
|
|
|
<programlisting><![CDATA[
|
|
Thread #2 was created
|
|
at 0x510548E: clone (in /lib64/libc-2.5.so)
|
|
by 0x4E2F305: do_clone (in /lib64/libpthread-2.5.so)
|
|
by 0x4E2F7C5: pthread_create@@GLIBC_2.2.5 (in /lib64/libpthread-2.5.so)
|
|
by 0x4C23870: pthread_create@* (hg_intercepts.c:198)
|
|
by 0x400CEF: main (tc17_sembar.c:195)
|
|
|
|
// And the same for threads #3, #4 and #5 -- omitted for conciseness
|
|
|
|
Possible data race during read of size 4 at 0x602174
|
|
at 0x400BE5: gomp_barrier_wait (tc17_sembar.c:122)
|
|
by 0x400C44: child (tc17_sembar.c:161)
|
|
by 0x4C25DF7: mythread_wrapper (hg_intercepts.c:178)
|
|
by 0x4E2F09D: start_thread (in /lib64/libpthread-2.5.so)
|
|
by 0x51054CC: clone (in /lib64/libc-2.5.so)
|
|
Old state: shared-modified by threads #2, #3, #4, #5
|
|
New state: shared-modified by threads #2, #3, #4, #5
|
|
Reason: this thread, #2, holds no consistent locks
|
|
Last consistently used lock for 0x602174 was first observed
|
|
at 0x4C25D01: pthread_mutex_init (hg_intercepts.c:326)
|
|
by 0x4009E4: gomp_barrier_init (tc17_sembar.c:46)
|
|
by 0x400CBC: main (tc17_sembar.c:192)
|
|
]]></programlisting>
|
|
|
|
<para>Helgrind first announces the creation points of any threads
|
|
referenced in the error message. This is so it can speak concisely
|
|
about threads and sets of threads without repeatedly printing their
|
|
creation point call stacks. Each thread is only ever announced once,
|
|
the first time it appears in any Helgrind error message.</para>
|
|
|
|
<para>The main error message begins at the text
|
|
"<computeroutput>Possible data race during read</computeroutput>".
|
|
At the start is information you would expect to see -- address and
|
|
size of the racing access, whether a read or a write, and the call
|
|
stack at the point it was detected.</para>
|
|
|
|
<para>More interesting is the state transition caused by this access.
|
|
This memory is already in the shared-modified state, and up to now has
|
|
been consistently protected by at least one lock. However, the thread
|
|
making the access in question (thread #2, here) does not hold any
|
|
locks in common with those held during all previous accesses to the
|
|
location -- "no consistent locks", in other words.</para>
|
|
|
|
<para>Finally, Helgrind shows the lock which has protected this
|
|
location in all previous accesses. (If there is more than one, only
|
|
one is shown.)  This can be a useful hint, because it typically shows
|
|
the lock that the programmers intended to use to protect the location,
|
|
but in this case forgot.</para>
|
|
|
|
<para>Here are some more examples of race reports.  This is not an
|
|
exhaustive list of combinations, but should give you some insight into
|
|
how to interpret the output.</para>
|
|
|
|
<programlisting><![CDATA[
|
|
Possible data race during write ...
|
|
Old state: shared-readonly by threads #1, #2, #3
|
|
New state: shared-modified by threads #1, #2, #3
|
|
Reason: this thread, #3, holds no consistent locks
|
|
Location ... has never been protected by any lock
|
|
]]></programlisting>
|
|
|
|
<para>The location is shared by 3 threads, all of which have been
|
|
reading it without locking ("has never been protected by any lock").
|
|
Now one of them is writing it. Regardless of whether the writer has a
|
|
lock or not, this is still an error, because the write races against
|
|
the previously observed reads.</para>
|
|
|
|
<programlisting><![CDATA[
|
|
Possible data race during read ...
|
|
Old state: shared-modified by threads #1, #2, #3
|
|
New state: shared-modified by threads #1, #2, #3
|
|
Reason: this thread, #3, holds no consistent locks
|
|
Last consistently used lock for ... was first observed ...
|
|
]]></programlisting>
|
|
|
|
<para>The location is shared by 3 threads, all of which have been
|
|
reading and writing it while (as required) holding at least one lock
|
|
in common. Now it is being read without that lock being held. In the
|
|
"Last consistently used lock" part, Helgrind offers its best guess as
|
|
to the identity of the lock that should have been used.</para>
|
|
|
|
<programlisting><![CDATA[
|
|
Possible data race during write ...
|
|
Old state: owned exclusively by thread #4
|
|
New state: shared-modified by threads #4, #5
|
|
Reason: this thread, #5, holds no locks at all
|
|
]]></programlisting>
|
|
|
|
<para>A location that has so far been accessed exclusively by thread
|
|
#4 has now been written by thread #5, without use of any lock. This
|
|
can be a sign that the programmer did not consider the possibility of
|
|
the location being shared between threads, or, alternatively, forgot
|
|
to use the appropriate lock.</para>
|
|
|
|
<para>Note that thread #4 exclusively owns the location, and so has
|
|
the right to access it without holding a lock. However, this message
|
|
does not say that thread #4 is not using a lock for this location.
|
|
Indeed, it could be using a lock for the location because it intends
|
|
to make it available to other threads, one of which is thread #5 --
|
|
and thread #5 has forgotten to use the lock.</para>
|
|
|
|
<para>Also, this message implies that Helgrind did not see any
|
|
synchronisation event between threads #4 and #5 that would have
|
|
allowed #5 to acquire exclusive ownership from #4. See
|
|
<link linkend="hg-manual.data-races.exclusive">above</link>
|
|
for a discussion of transfers of exclusive ownership states between
|
|
threads.</para>
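<para>One plausible way to remove a report of the last kind is to make
every thread that touches the location hold the same mutex. The sketch
below assumes that this is the intended fix. The names
<computeroutput>hits</computeroutput>,
<computeroutput>hits_mx</computeroutput>,
<computeroutput>bump</computeroutput> and
<computeroutput>count_hits</computeroutput> are invented for the
example.</para>

<programlisting><![CDATA[
```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

static pthread_mutex_t hits_mx = PTHREAD_MUTEX_INITIALIZER;
static long hits;                /* shared; guarded by hits_mx */

/* Worker: every access to hits is made with hits_mx held. */
static void *bump(void *arg)
{
    long n = *(long *)arg;

    for (long i = 0; i < n; i++) {
        pthread_mutex_lock(&hits_mx);
        hits++;
        pthread_mutex_unlock(&hits_mx);
    }
    return NULL;
}

/* Run two workers and return the final count. */
long count_hits(long per_thread)
{
    pthread_t a, b;
    int rc;

    hits = 0;                    /* no other thread exists yet */
    rc = pthread_create(&a, NULL, bump, &per_thread);
    assert(rc == 0);
    rc = pthread_create(&b, NULL, bump, &per_thread);
    assert(rc == 0);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    return hits;                 /* both workers have been joined */
}
```
]]></programlisting>

<para>Dropping the lock and unlock calls from
<computeroutput>bump</computeroutput> would produce the situation in
the report: a location first touched by one thread and then written by
another while holding no locks at all.</para>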
|
|
|
|
</sect2>
|
|
|
|
|
|
</sect1>
|
|
|
|
<sect1 id="hg-manual.effective-use" xreflabel="Helgrind Effective Use">
|
|
<title>Hints and Tips for Effective Use of Helgrind</title>
|
|
|
|
<para>Helgrind can be very helpful in finding and resolving
|
|
threading-related problems. Like all sophisticated tools, it is most
|
|
effective when you understand how to play to its strengths.</para>
|
|
|
|
<para>Helgrind will be less effective when you merely throw an
|
|
existing threaded program at it and try to make sense of any reported
|
|
errors. It will be more effective if you design threaded programs
|
|
from the start in a way that helps Helgrind verify correctness. The
|
|
same is true for finding memory errors with Memcheck, but applies more
|
|
here, because thread checking is a harder problem. Consequently it is
|
|
much easier to write a correct program for which Helgrind falsely
|
|
reports (threading) errors than it is to write a correct program for
|
|
which Memcheck falsely reports (memory) errors.</para>
|
|
|
|
<para>With that in mind, here are some tips, listed most important first,
|
|
for getting reliable results and avoiding false errors. The first two
|
|
are critical. Any violations of them will swamp you with huge numbers
|
|
of false data-race errors.</para>
|
|
|
|
|
|
<orderedlist>
|
|
|
|
<listitem>
|
|
<para>Make sure your application, and all the libraries it uses,
|
|
use the POSIX threading primitives. Helgrind needs to be able to
|
|
see all events pertaining to thread creation, exit, locking and
|
|
other synchronisation events. To do so it intercepts many POSIX
|
|
pthread_ functions.</para>
|
|
|
|
<para>Do not roll your own threading primitives (mutexes, etc)
|
|
from combinations of the Linux futex syscall, counters and wotnot.
|
|
These throw Helgrind's internal what's-going-on models way off
|
|
course and will give bogus results.</para>
|
|
|
|
<para>Also, do not reimplement existing POSIX abstractions using
|
|
other POSIX abstractions. For example, don't build your own
|
|
semaphore routines or reader-writer locks from POSIX mutexes and
|
|
condition variables. Instead use POSIX reader-writer locks and
|
|
semaphores directly, since Helgrind supports them directly.</para>
|
|
|
|
<para>Helgrind directly supports the following POSIX threading
|
|
abstractions: mutexes, reader-writer locks, condition variables
|
|
(but see below), and semaphores. Currently spinlocks and barriers
|
|
are not supported, although they could be in future. A prototype
|
|
"safe" implementation of barriers, based on semaphores, is
|
|
available: please contact the Valgrind authors for details.</para>
|
|
|
|
<para>At the time of writing, the following popular Linux packages
|
|
are known to implement their own threading primitives:</para>
|
|
|
|
<itemizedlist>
|
|
<listitem><para>Qt version 4.X. Qt 3.X is fine, but not 4.X.
|
|
Helgrind contains partial direct support for Qt 4.X threading,
|
|
but this is not yet in a usable state. Assistance from folks
|
|
knowledgeable in Qt 4 threading internals would be
|
|
appreciated.</para></listitem>
|
|
|
|
<listitem><para>Runtime support library for GNU OpenMP (part of
|
|
GCC), at least GCC versions 4.2 and 4.3. With some minor effort
|
|
of modifying the GNU OpenMP runtime support sources, it is
|
|
possible to use Helgrind on GNU OpenMP compiled codes. Please
|
|
contact the Valgrind authors for details.</para></listitem>
|
|
</itemizedlist>
|
|
</listitem>
|
|
|
|
<listitem>
|
|
<para>Avoid memory recycling.  If you can't avoid it, you must
|
|
tell Helgrind what is going on via the VALGRIND_HG_CLEAN_MEMORY
|
|
client request
|
|
(in <computeroutput>helgrind.h</computeroutput>).</para>
|
|
|
|
<para>Helgrind is aware of standard memory allocation and
|
|
deallocation that occurs via malloc/free/new/delete and from entry
|
|
and exit of stack frames. In particular, when memory is
|
|
deallocated via free, delete, or function exit, Helgrind considers
|
|
that memory clean, so when it is eventually reallocated, its
|
|
history is irrelevant.</para>
|
|
|
|
<para>However, it is common practice to implement memory recycling
|
|
schemes. In these, memory to be freed is not handed to
|
|
free/delete, but instead put into a pool of free buffers to be
|
|
handed out again as required. The problem is that Helgrind has no
|
|
way to know that such memory is logically no longer in use, and
|
|
its history is irrelevant. Hence you must make that explicit,
|
|
using the VALGRIND_HG_CLEAN_MEMORY client request to specify the
|
|
relevant address ranges. It's easiest to put these requests into
|
|
the pool manager code, and use them either when memory is returned
|
|
to the pool, or is allocated from it.</para>
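<para>A pool manager along these lines might look roughly like the
sketch below. Several things here are assumptions: that the header is
installed as <computeroutput>valgrind/helgrind.h</computeroutput>, that
the request takes a start address and a length, and all of the pool
names (<computeroutput>pool_get</computeroutput>,
<computeroutput>pool_put</computeroutput>,
<computeroutput>BUF_SIZE</computeroutput> and so on), which are
invented. If the header cannot be found, the request is replaced by a
no-op so that the sketch still builds.</para>

<programlisting><![CDATA[
```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Assumption: the header lives at <valgrind/helgrind.h>. */
#if defined(__has_include)
#  if __has_include(<valgrind/helgrind.h>)
#    include <valgrind/helgrind.h>
#  endif
#endif
#ifndef VALGRIND_HG_CLEAN_MEMORY
   /* Fallback no-op, used only when the header is unavailable. */
#  define VALGRIND_HG_CLEAN_MEMORY(start, len) \
       do { (void)(start); (void)(len); } while (0)
#endif

enum { BUF_SIZE = 64, POOL_SLOTS = 8 };

static pthread_mutex_t pool_mx = PTHREAD_MUTEX_INITIALIZER;
static char  storage[POOL_SLOTS][BUF_SIZE];
static void *free_list[POOL_SLOTS];
static int   n_free = -1;        /* -1 means "not yet initialised" */

/* Take a buffer from the pool, or return NULL if none is left. */
void *pool_get(void)
{
    void *buf = NULL;

    pthread_mutex_lock(&pool_mx);
    if (n_free < 0) {            /* first call: fill the free list */
        for (n_free = 0; n_free < POOL_SLOTS; n_free++)
            free_list[n_free] = storage[n_free];
    }
    if (n_free > 0)
        buf = free_list[--n_free];
    pthread_mutex_unlock(&pool_mx);
    return buf;
}

/* Return a buffer to the pool. */
void pool_put(void *buf)
{
    assert(buf != NULL);
    /* The old contents are dead: ask Helgrind to forget their history. */
    VALGRIND_HG_CLEAN_MEMORY(buf, BUF_SIZE);

    pthread_mutex_lock(&pool_mx);
    assert(n_free >= 0 && n_free < POOL_SLOTS);
    free_list[n_free++] = buf;
    pthread_mutex_unlock(&pool_mx);
}
```
]]></programlisting>

<para>Here the request is issued when the buffer is returned to the
pool. Issuing it in <computeroutput>pool_get</computeroutput> instead
would serve equally well, as long as it happens before the new owner
first touches the buffer.</para>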
|
|
</listitem>
|
|
|
|
<listitem>
|
|
<para>Avoid POSIX condition variables. If you can, use POSIX
|
|
semaphores (sem_t, sem_post, sem_wait) to do inter-thread event
|
|
signalling. Semaphores with an initial value of zero are
|
|
particularly useful for this.</para>
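<para>As a minimal sketch of such signalling, the following uses a
semaphore with an initial value of zero to announce that a value is
ready. The names <computeroutput>ready</computeroutput>,
<computeroutput>payload</computeroutput>,
<computeroutput>producer</computeroutput> and
<computeroutput>wait_for_payload</computeroutput> are invented for the
example.</para>

<programlisting><![CDATA[
```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <semaphore.h>
#include <stddef.h>

static sem_t ready;              /* starts at 0: nothing has happened */
static int   payload;            /* written before sem_post */

/* Producer: publish the value, then post the event. */
static void *producer(void *arg)
{
    (void)arg;
    payload = 42;
    sem_post(&ready);
    return NULL;
}

/* Consumer: block until the event has been posted, then read. */
int wait_for_payload(void)
{
    pthread_t t;
    int rc, seen;

    rc = sem_init(&ready, 0, 0); /* not shared between processes */
    assert(rc == 0);
    rc = pthread_create(&t, NULL, producer, NULL);
    assert(rc == 0);

    while (sem_wait(&ready) != 0)
        ;                        /* retry if interrupted by a signal */
    seen = payload;

    pthread_join(t, NULL);
    sem_destroy(&ready);
    return seen;
}
```
]]></programlisting>

<para>Unlike a condition-variable signal, the post is not lost if the
producer runs first: the semaphore's count records it, and the later
<computeroutput>sem_wait</computeroutput> returns at once.</para>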
|
|
|
|
<para>Helgrind only partially correctly handles POSIX condition
|
|
variables. This is because Helgrind can see inter-thread
|
|
dependencies between a pthread_cond_wait call and a
|
|
pthread_cond_signal/broadcast call only if the waiting thread
|
|
actually gets to the rendezvous first (so that it actually calls
|
|
pthread_cond_wait). It can't see dependencies between the threads
|
|
if the signaller arrives first. In the latter case, POSIX
|
|
guidelines imply that the associated boolean condition still
|
|
provides an inter-thread synchronisation event, but one which is
|
|
invisible to Helgrind.</para>
|
|
|
|
<para>The result of Helgrind missing some inter-thread
|
|
synchronisation events is to cause it to report false positives.
|
|
That's because missing such events reduces the extent to which it
|
|
can transfer exclusive memory ownership between threads. So
|
|
memory may end up in a shared-modified state when that was not
|
|
intended by the application programmers.</para>
|
|
|
|
<para>The root cause of this synchronisation lossage is
|
|
particularly hard to understand, so an example is helpful. It was
|
|
discussed at length by Arndt Muehlenfeld ("Runtime Race Detection
|
|
in Multi-Threaded Programs", Dissertation, TU Graz, Austria). The
|
|
canonical POSIX-recommended usage scheme for condition variables
|
|
is as follows:</para>
|
|
|
|
<programlisting><![CDATA[
|
|
b is a Boolean condition, which is False most of the time
|
|
cv is a condition variable
|
|
mx is its associated mutex
|
|
|
|
Signaller: Waiter:
|
|
|
|
lock(mx) lock(mx)
|
|
b = True while (b == False)
|
|
signal(cv) wait(cv,mx)
|
|
unlock(mx) unlock(mx)
|
|
]]></programlisting>
|
|
|
|
<para>Assume <computeroutput>b</computeroutput> is False most of
|
|
the time. If the waiter arrives at the rendezvous first, it
|
|
enters its while-loop, waits for the signaller to signal, and
|
|
eventually proceeds. Helgrind sees the signal, notes the
|
|
dependency, and all is well.</para>
|
|
|
|
<para>If the signaller arrives
|
|
first, <computeroutput>b</computeroutput> is set to true, and the
|
|
signal disappears into nowhere. When the waiter later arrives, it
|
|
does not enter its while-loop and simply carries on. But even in
|
|
this case, the waiter code following the while-loop cannot execute
|
|
until the signaller sets <computeroutput>b</computeroutput> to
|
|
True. Hence there is still the same inter-thread dependency, but
|
|
this time it is through an arbitrary in-memory condition, and
|
|
Helgrind cannot see it.</para>
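<para>For reference, the scheme above might be written in C roughly as
follows. This is only a sketch: <computeroutput>b</computeroutput>,
<computeroutput>cv</computeroutput> and
<computeroutput>mx</computeroutput> follow the pseudocode, while
<computeroutput>signaller</computeroutput>,
<computeroutput>waiter</computeroutput> and
<computeroutput>run_rendezvous</computeroutput> are invented names. The
value returned by <computeroutput>waiter</computeroutput> records which
of the two cases occurred on a given run.</para>

<programlisting><![CDATA[
```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

static pthread_mutex_t mx = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  cv = PTHREAD_COND_INITIALIZER;
static bool b = false;           /* the Boolean condition */

static void *signaller(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&mx);
    b = true;
    pthread_cond_signal(&cv);
    pthread_mutex_unlock(&mx);
    return NULL;
}

/* Returns true if the loop body ran, that is, if the waiter got to
   the rendezvous first and really called pthread_cond_wait. */
bool waiter(void)
{
    bool entered_wait = false;

    pthread_mutex_lock(&mx);
    while (!b) {
        entered_wait = true;
        pthread_cond_wait(&cv, &mx);
    }
    pthread_mutex_unlock(&mx);
    return entered_wait;
}

/* Run one rendezvous; returns the final value of b. */
bool run_rendezvous(void)
{
    pthread_t t;
    int rc;

    b = false;                   /* no other thread exists yet */
    rc = pthread_create(&t, NULL, signaller, NULL);
    assert(rc == 0);
    (void)waiter();              /* either case may occur */
    pthread_join(t, NULL);
    return b;
}
```
]]></programlisting>

<para>Which thread arrives first depends on scheduling, so the same
program may or may not give Helgrind a visible dependency from one run
to the next.</para>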
|
|
|
|
<para>By comparison, Helgrind's detection of inter-thread
|
|
dependencies caused by semaphore operations is believed to be
|
|
exactly correct.</para>
|
|
|
|
<para>As far as I know, a solution to this problem that does not
|
|
require source-level annotation of condition-variable wait loops
|
|
is beyond the current state of the art.</para>
|
|
</listitem>
|
|
|
|
<listitem>
|
|
<para>Make sure you are using a supported Linux distribution. At
|
|
present, Helgrind only properly supports x86-linux and amd64-linux
|
|
with glibc-2.3 or later. The latter restriction means we only
|
|
support glibc's NPTL threading implementation. The old
|
|
LinuxThreads implementation is not supported.</para>
|
|
|
|
<para>Unsupported targets may work to varying degrees. In
|
|
particular ppc32-linux and ppc64-linux running NPTL should work,
|
|
but you will get false race errors because Helgrind does not know
|
|
how to properly handle atomic instruction sequences created using
|
|
the lwarx/stwcx instructions.</para>
|
|
</listitem>
|
|
|
|
<listitem>
|
|
<para>Round up all finished threads using pthread_join. Avoid
|
|
detaching threads: don't create threads in the detached state, and
|
|
don't call pthread_detach on existing threads.</para>
|
|
|
|
<para>Using pthread_join to round up finished threads provides a
|
|
clear synchronisation point that both Helgrind and programmers can
|
|
see. This synchronisation point allows Helgrind to adjust its
|
|
memory ownership
|
|
models <link linkend="hg-manual.data-races.exclusive">as described
|
|
extensively above</link>, which helps Helgrind produce more
|
|
accurate error reports.</para>
|
|
|
|
<para>If you don't call pthread_join on a thread, Helgrind has no
|
|
way to know when it finishes, relative to any significant
|
|
synchronisation points for other threads in the program. So it
|
|
assumes that the thread lingers indefinitely and can potentially
|
|
interfere indefinitely with the memory state of the program. It
|
|
has every right to assume that -- after all, it might really be
|
|
the case that, for scheduling reasons, the exiting thread did run
|
|
very slowly in the last stages of its life.</para>
|
|
</listitem>
|
|
|
|
<listitem>
|
|
<para>Perform thread debugging (with Helgrind) and memory
|
|
debugging (with Memcheck) together.</para>
|
|
|
|
<para>Helgrind tracks the state of memory in detail, and memory
|
|
management bugs in the application are liable to cause confusion.
|
|
In extreme cases, applications which do many invalid reads and
|
|
writes (particularly to freed memory) have been known to crash
|
|
Helgrind. So, ideally, you should make your application
|
|
Memcheck-clean before using Helgrind.</para>
|
|
|
|
<para>It may be impossible to make your application Memcheck-clean
|
|
unless you first remove threading bugs. In particular, it may be
|
|
difficult to remove all reads and writes to freed memory in
|
|
multithreaded C++ destructor sequences at program termination.
|
|
So, ideally, you should make your application Helgrind-clean
|
|
before using Memcheck.</para>
|
|
|
|
<para>Since this circularity is obviously unresolvable, at least
|
|
bear in mind that Memcheck and Helgrind are to some extent
|
|
complementary, and you may need to use them together.</para>
|
|
</listitem>
|
|
|
|
<listitem>
|
|
<para>POSIX requires that implementations of standard I/O (printf,
|
|
fprintf, fwrite, fread, etc) are thread safe. Unfortunately GNU
|
|
libc implements this by using internal locking primitives that
|
|
Helgrind is unable to intercept. Consequently Helgrind generates
|
|
many false race reports when you use these functions.</para>
|
|
|
|
<para>Helgrind attempts to hide these errors using the standard
|
|
Valgrind error-suppression mechanism. So, at least for simple
|
|
test cases, you don't see any. Nevertheless, some may slip
|
|
through. Just something to be aware of.</para>
|
|
</listitem>
|
|
|
|
<listitem>
|
|
<para>Helgrind's error checks do not work properly inside the
|
|
system threading library itself
|
|
(<computeroutput>libpthread.so</computeroutput>), and it usually
|
|
observes large numbers of (false) errors in there. Valgrind's
|
|
suppression system then filters these out, so you should not see
|
|
them.</para>
|
|
|
|
<para>If you see any race errors reported
|
|
where <computeroutput>libpthread.so</computeroutput> or
|
|
<computeroutput>ld.so</computeroutput> is the object associated
|
|
with the innermost stack frame, please file a bug report at
|
|
http://www.valgrind.org.</para>
|
|
</listitem>
|
|
|
|
</orderedlist>
|
|
|
|
</sect1>
|
|
|
|
|
|
|
|
|
|
<sect1 id="hg-manual.options" xreflabel="Helgrind Options">
|
|
<title>Helgrind Options</title>
|
|
|
|
<para>The following end-user options are available:</para>
|
|
|
|
<!-- start of xi:include in the manpage -->
|
|
<variablelist id="hg.opts.list">
|
|
|
|
<varlistentry id="opt.happens-before" xreflabel="--happens-before">
|
|
<term>
|
|
<option><![CDATA[--happens-before=none|threads|all
|
|
[default: all] ]]></option>
|
|
</term>
|
|
<listitem>
|
|
<para>Helgrind always regards locks as the basis for
|
|
inter-thread synchronisation. However, by default, before
|
|
reporting a race error, Helgrind will also check whether
|
|
certain other kinds of inter-thread synchronisation events
|
|
happened. It may be that if such events took place, then no
|
|
race really occurred, and so no error needs to be reported.
|
|
See <link linkend="hg-manual.data-races.exclusive">above</link>
|
|
for a discussion of transfers of exclusive ownership states
|
|
between threads.
|
|
</para>
|
|
<para>With <varname>--happens-before=all</varname>, the
|
|
following events are regarded as sources of synchronisation:
|
|
thread creation/joinage, condition variable
|
|
signal/broadcast/waits, and semaphore posts/waits.
|
|
</para>
|
|
<para>With <varname>--happens-before=threads</varname>, only
|
|
thread creation/joinage events are regarded as sources of
|
|
synchronisation.
|
|
</para>
|
|
<para>With <varname>--happens-before=none</varname>, no events
|
|
(apart, of course, from locking) are regarded as sources of
|
|
synchronisation.
|
|
</para>
|
|
<para>Changing this setting from the default will increase your
|
|
false-error rate but give little or no gain. The only advantage
|
|
is that <option>--happens-before=threads</option> and
|
|
<option>--happens-before=none</option> should make Helgrind
|
|
less and less sensitive to the scheduling of threads, and hence
|
|
the output more and more repeatable across runs.
|
|
</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
<varlistentry id="opt.trace-addr" xreflabel="--trace-addr">
|
|
<term>
|
|
<option><![CDATA[--trace-addr=0xXXYYZZ
|
|
]]></option> and
|
|
<option><![CDATA[--trace-level=0|1|2 [default: 1]
|
|
]]></option>
|
|
</term>
|
|
<listitem>
|
|
<para>Requests that Helgrind produce a log of all state changes
|
|
to location 0xXXYYZZ. This can be helpful in tracking down
|
|
tricky races. <varname>--trace-level</varname> controls the
|
|
verbosity of the log. At the default setting (1), a one-line
|
|
summary is printed for each state change.  At level 2, a
|
|
complete stack trace is printed for each state change.</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
</variablelist>
|
|
<!-- end of xi:include in the manpage -->
|
|
|
|
<!-- start of xi:include in the manpage -->
|
|
<para>In addition, the following debugging options are available for
|
|
Helgrind:</para>
|
|
|
|
<variablelist id="hg.debugopts.list">
|
|
|
|
<varlistentry id="opt.trace-malloc" xreflabel="--trace-malloc">
|
|
<term>
|
|
<option><![CDATA[--trace-malloc=no|yes [no]
|
|
]]></option>
|
|
</term>
|
|
<listitem>
|
|
<para>Show all client malloc (etc) and free (etc) requests.</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
<varlistentry id="opt.gen-vcg" xreflabel="--gen-vcg">
|
|
<term>
|
|
<option><![CDATA[--gen-vcg=no|yes|yes-w-vts [no]
|
|
]]></option>
|
|
</term>
|
|
<listitem>
|
|
<para>At exit, write to stderr a dump of the happens-before
|
|
graph computed by Helgrind, in a format suitable for the VCG
|
|
graph visualisation tool. A suitable command line is:</para>
|
|
<para><computeroutput>valgrind --tool=helgrind
|
|
--gen-vcg=yes my_app 2>&amp;1
|
|
| grep xxxxxx | sed "s/xxxxxx//g"
|
|
| xvcg -</computeroutput></para>
|
|
<para>With <varname>--gen-vcg=yes</varname>, the basic
|
|
happens-before graph is shown. With
|
|
<varname>--gen-vcg=yes-w-vts</varname>, the vector timestamp
|
|
for each node is also shown.</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
<varlistentry id="opt.cmp-race-err-addrs"
|
|
xreflabel="--cmp-race-err-addrs">
|
|
<term>
|
|
<option><![CDATA[--cmp-race-err-addrs=no|yes [no]
|
|
]]></option>
|
|
</term>
|
|
<listitem>
|
|
<para>Controls whether or not race (data) addresses should be
|
|
taken into account when removing duplicates of race errors.
|
|
With <varname>--cmp-race-err-addrs=no</varname>, two otherwise
|
|
identical race errors will be considered to be the same even if
their race addresses differ.  With
<varname>--cmp-race-err-addrs=yes</varname> they will be
|
|
considered different. This is provided to help make certain
|
|
regression tests work reliably.</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
<varlistentry id="opt.hg-sanity-flags" xreflabel="--hg-sanity-flags">
|
|
<term>
|
|
<option><![CDATA[--hg-sanity-flags=<XXXXXX> (X = 0|1) [000000]
|
|
]]></option>
|
|
</term>
|
|
<listitem>
|
|
<para>Run extensive sanity checks on Helgrind's internal
|
|
data structures at events defined by the bitstring, as
|
|
follows:</para>
|
|
<para><computeroutput>100000 </computeroutput>at every query
|
|
to the happens-before graph</para>
|
|
<para><computeroutput>010000 </computeroutput>after changes to
|
|
the lock order acquisition graph</para>
|
|
<para><computeroutput>001000 </computeroutput>after every client
|
|
memory access (NB: not currently used)</para>
|
|
<para><computeroutput>000100 </computeroutput>after every client
|
|
memory range permission setting of 256 bytes or greater</para>
|
|
<para><computeroutput>000010 </computeroutput>after every client
|
|
lock or unlock event</para>
|
|
<para><computeroutput>000001 </computeroutput>after every client
|
|
thread creation or joinage event</para>
|
|
<para>Note these will make Helgrind run very slowly, often to
|
|
the point of being completely unusable.</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
</variablelist>
|
|
<!-- end of xi:include in the manpage -->
|
|
|
|
|
|
</sect1>
|
|
|
|
<sect1 id="hg-manual.todolist" xreflabel="To Do List">
|
|
<title>A To-Do List for Helgrind</title>
|
|
|
|
<para>The following is a list of loose ends which should be tidied up
|
|
some time.</para>
|
|
|
|
<itemizedlist>
|
|
<listitem><para>Track which mutexes are associated with which
|
|
condition variables, and emit a warning if this becomes
|
|
inconsistent.</para>
|
|
</listitem>
|
|
<listitem><para>For lock order errors, print the complete lock
|
|
cycle, rather than only doing so for size-2 cycles as at
|
|
present.</para>
|
|
</listitem>
|
|
<listitem><para>Document the VALGRIND_HG_CLEAN_MEMORY client
|
|
request.</para>
|
|
</listitem>
|
|
<listitem><para>Possibly a client request to forcibly transfer
|
|
ownership of memory from one thread to another. Requires further
|
|
consideration.</para>
|
|
</listitem>
|
|
<listitem><para>Add a new client request that marks an address range
|
|
as being "shared-modified with empty lockset" (the error state),
|
|
and describe how to use it.</para>
|
|
</listitem>
|
|
<listitem><para>Document races caused by gcc's thread-unsafe code
|
|
generation for speculative stores. In the interim see
|
|
<computeroutput>http://gcc.gnu.org/ml/gcc/2007-10/msg00266.html
|
|
</computeroutput>
|
|
and <computeroutput>http://lkml.org/lkml/2007/10/24/673</computeroutput>.
|
|
</para>
|
|
</listitem>
|
|
<listitem><para>Don't update the lock-order graph, and don't check
|
|
for errors, when a "try"-style lock operation happens (eg
|
|
pthread_mutex_trylock). Such calls do not add any real
|
|
restrictions to the locking order, since they can always fail to
|
|
acquire the lock, resulting in the caller going off and doing Plan
|
|
B (presumably it will have a Plan B). Doing such checks could
|
|
generate false lock-order errors and confuse users.</para>
|
|
</listitem>
|
|
<listitem><para> Performance can be very poor. Slowdowns on the
|
|
order of 100:1 are not unusual. There is quite some scope for
|
|
performance improvements, though.
|
|
</para>
|
|
</listitem>
|
|
|
|
</itemizedlist>
|
|
|
|
</sect1>
|
|
|
|
</chapter>
|