DRD: a Data Race Detector
=========================

Last update: February 16, 2008 by Bart Van Assche.


Introduction
------------

Multithreading is a concept for modeling multiple concurrent activities
within a single process. Each such concurrent activity is called a
thread. All threads that are active within a process share the same set
of memory locations. Data is exchanged between threads by writing to and
reading from this shared memory. Since the invention of the
multithreading concept there has been an ongoing debate about which way
of modeling concurrent activities is better -- shared memory programming
or message passing. This debate exists because each model has
significant advantages and disadvantages. While shared memory
programming relieves the programmer from writing code for the exchange
of data between concurrent activities, and while shared memory
programming has a performance advantage over message passing, shared
memory programming is error prone. Shared memory programs can exhibit
data races and/or deadlocks. Data races are harmful because they may
lead to unpredictable results and nondeterministic behavior in
multithreaded programs. There are two ways to detect data races and
deadlocks: static analysis and runtime detection by a tool. Since there
do not yet exist any tools that can carry out static analysis of data
races or deadlocks, the only option for statically detecting such
anomalies is source reading by a human. It takes a huge effort, however,
to detect all possible data races or deadlocks via source reading. This
is why tools that detect data races and deadlocks at runtime are
essential.

The de facto standard library for multithreading on Unix systems is the
POSIX threads library, also known as pthreads. The exp-drd tool has been
developed for multithreaded software that uses the POSIX threads
library.


Data Races
----------

Threads in a multithreaded process exchange information by writing to
and reading from memory locations shared by the threads. Two accesses to
the same memory location by different threads are called conflicting
accesses if at least one of these two accesses modifies the contents of
the memory location.

A deterministic exchange of data between threads is only possible if
conflicting accesses happen in a well-defined order. It is the role of
synchronization actions to enforce the runtime execution order of
conflicting accesses. Examples of such synchronization actions are
pthread_mutex_lock(), pthread_mutex_unlock(), sem_wait(), sem_post(),
...

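The following minimal sketch (not part of the original text; build with
-lpthread) illustrates both concepts: the increments of s_counter by the
two threads are conflicting accesses, and the mutex calls are the
synchronization actions that order them. Removing the lock/unlock pair
turns the example into a data race that exp-drd should report.

  #include <pthread.h>
  #include <cstdio>

  static int s_counter = 0;
  static pthread_mutex_t s_mutex = PTHREAD_MUTEX_INITIALIZER;

  static void* increment(void*)
  {
    for (int i = 0; i < 100000; i++)
    {
      pthread_mutex_lock(&s_mutex);   // Without this lock/unlock pair the
      s_counter++;                    // two threads perform conflicting,
      pthread_mutex_unlock(&s_mutex); // unordered accesses: a data race.
    }
    return 0;
  }

  int main()
  {
    pthread_t t1, t2;
    pthread_create(&t1, 0, increment, 0);
    pthread_create(&t2, 0, increment, 0);
    pthread_join(t1, 0);
    pthread_join(t2, 0);
    printf("counter = %d\n", s_counter);
    return 0;
  }
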
An important concept with regard to the ordering of load and store
operations on shared memory is the happens-before-1 relation or hb1
[Adve 1991]. The hb1 relation is a partial order defined over all shared
memory operations. The hb1 relation includes both the intrathread
execution order and the interthread ordering imposed by synchronization
operations. All memory accesses of a single thread are totally ordered
by hb1. Since hb1 is a partial order for interthread memory accesses,
interthread memory accesses are either ordered or not ordered by hb1. A
data race is defined by Adve et al. as two conflicting accesses that are
not ordered by the happens-before-1 relation. In other words, which
accesses are considered data races depends on the runtime behavior of a
program.

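As a minimal sketch (not part of the original text), the program below
contains conflicting accesses to s_value without any mutex, yet there is
no data race: pthread_create() and pthread_join() are synchronization
operations, so all three accesses are ordered by hb1.

  #include <pthread.h>
  #include <cstdio>

  static int s_value = 0;

  static void* worker(void*)
  {
    s_value++;        // Ordered after the write in main() by pthread_create()
    return 0;         // and before the read in main() by pthread_join().
  }

  int main()
  {
    s_value = 41;
    pthread_t t;
    pthread_create(&t, 0, worker, 0);
    pthread_join(t, 0);
    printf("value = %d\n", s_value);   // Always prints 42.
    return 0;
  }
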
There is an interesting relationship between runtime behavior and
multithreaded design patterns. The most straightforward way to ensure
that different threads access shared data in an orderly fashion is to
ensure that at most one thread can access the object at any given time.
A programmer can realize this by surrounding all shared data accesses
with calls to proper synchronization functions. Such a source code
strategy for avoiding data races is also called a locking discipline. An
important property of programs that follow this strategy is that these
programs are data-race free.

There exist two kinds of tools for verifying the runtime behavior of
multithreaded programs. One class of tools verifies a locking strategy,
and another class of tools verifies the absence of data races. The
difference is subtle but important.

The best-known algorithm for runtime verification of a locking strategy
is the so-called Eraser algorithm [Savage 1997]. While this algorithm
makes it possible to catch more programming errors than the conflicting
accesses classified as data races by the definition of Sarita Adve et
al., the Eraser algorithm unfortunately also reports a lot of false
positives. It is tedious to review the output of the Eraser tool
manually and to verify which reported pairs of accesses are false
positives and which pairs are real data races. Research is still ongoing
about how to reduce the number of false positives reported by the Eraser
algorithm -- see e.g. [Mühlenfeld 2007]. The Helgrind tool is a
refinement of the Eraser algorithm.

A second class of data race detection tools detects all conflicting
accesses that are data races according to the definition of Sarita Adve
et al. While in theory there is no guarantee that these tools detect all
locking discipline violations, these tools do not report false
positives. This makes them the most practical tools to use. Examples of
this class of tools are DIOTA [Ronsse 2004], Intel(R) Thread Checker
[Banerjee 2006a, Banerjee 2006b, Sack 2006] and DRD.

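The difference can be made concrete with a small hand-off sketch (not
part of the original text). The payload below is written by the producer
and later modified and read by the consumer without holding any lock,
but the accesses are ordered by hb1 through the queue's mutex and
condition variable, so a happens-before detector stays silent. A pure
lockset detector sees no single lock that consistently protects the
payload and may report a false positive.

  #include <pthread.h>
  #include <cstdio>

  struct Message { int payload; };

  static pthread_mutex_t s_mutex = PTHREAD_MUTEX_INITIALIZER;
  static pthread_cond_t  s_cond  = PTHREAD_COND_INITIALIZER;
  static Message* s_slot = 0;          // One-element hand-off "queue".

  static void* producer(void*)
  {
    Message* m = new Message;
    m->payload = 42;                   // Written without holding any lock.
    pthread_mutex_lock(&s_mutex);
    s_slot = m;                        // The hand-off itself is locked.
    pthread_cond_signal(&s_cond);
    pthread_mutex_unlock(&s_mutex);
    return 0;
  }

  int main()
  {
    pthread_t t;
    pthread_create(&t, 0, producer, 0);

    pthread_mutex_lock(&s_mutex);
    while (s_slot == 0)
      pthread_cond_wait(&s_cond, &s_mutex);
    Message* m = s_slot;
    pthread_mutex_unlock(&s_mutex);

    m->payload *= 2;                   // Modified again without any lock,
    printf("payload = %d\n", m->payload); // yet still ordered by hb1.
    delete m;
    pthread_join(t, 0);
    return 0;
  }

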
About DRD
---------

DRD is still under development; that is why the tool is named exp-drd.
The current version of DRD is able to perform data race detection on
small programs -- DRD quickly runs out of memory for realistically sized
programs. The current version runs well under Linux on x86 CPUs for
multithreaded programs that use the POSIX threading library. Regular
POSIX threads, detached threads, mutexes, condition variables,
spinlocks, semaphores and barriers are supported. POSIX reader-writer
locks are not yet supported.

Although [Savage 1997] claims that a happens-before detector is harder
to implement efficiently than the Eraser algorithm, as of Valgrind
version 3.3.0 exp-drd runs significantly faster than Helgrind on several
regression tests.


Programming with Threads
------------------------

The difficulties of shared memory programming are well known and have
been outlined in more than one paper [Ousterhout 1996, Lee 2006]. It is
possible, however, to develop and debug multithreaded shared memory
software with a reasonable effort, even for large applications (more
than one million lines of code). In what follows, an approach is
explained that has proven to work in practice. Before you decide to use
another approach, make sure you understand very well the consequences of
doing so.

1. Use of synchronization calls.

Do not call synchronization functions directly but use objects that
encapsulate the mutex, Mesa monitor and reader/writer locking policies.
Never use POSIX condition variables directly, since direct use of
condition variables can easily introduce race conditions. And never lock
or unlock mutexes explicitly -- use scoped locking instead.

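A minimal sketch of such wrapper classes follows (not part of the
original text; the class names are illustrative). The ScopedLock
constructor locks and its destructor unlocks, so a mutex can never be
left locked by an early return or an exception.

  #include <pthread.h>

  class Mutex
  {
  public:
    Mutex()       { pthread_mutex_init(&m_mutex, 0); }
    ~Mutex()      { pthread_mutex_destroy(&m_mutex); }
    void lock()   { pthread_mutex_lock(&m_mutex); }
    void unlock() { pthread_mutex_unlock(&m_mutex); }
  private:
    Mutex(const Mutex&);             // Not copyable.
    Mutex& operator=(const Mutex&);
    pthread_mutex_t m_mutex;
  };

  class ScopedLock
  {
  public:
    explicit ScopedLock(Mutex& m) : m_mutex(m) { m_mutex.lock(); }
    ~ScopedLock()                              { m_mutex.unlock(); }
  private:
    ScopedLock(const ScopedLock&);   // Not copyable.
    ScopedLock& operator=(const ScopedLock&);
    Mutex& m_mutex;
  };

  // Callers never touch pthread_mutex_lock() / pthread_mutex_unlock():
  static Mutex s_mutex;
  static int s_shared;

  void set_shared(int v)
  {
    ScopedLock lock(s_mutex);        // Unlocked automatically at end of scope.
    s_shared = v;
  }
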
2. Data hiding.

It is very important in multithreaded software to hide data that is
shared over threads. Make sure that all shared data is declared as
private data members of a class (not public, not protected). Design the
classes that contain shared data such that the number of data members
and the number of member functions is relatively small. Define accessor
functions as needed for querying and modifying member data. Declare the
associated locking objects also as private data members, and document
which locking object protects which data members. Make sure that the
query functions return a copy of data members instead of a reference --
returning a reference would violate data hiding anyway. This approach
has a big advantage, namely that correct use of a locking policy can be
verified by reviewing one class at a time.

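A minimal sketch of this rule (not part of the original text; the class
name is illustrative): the shared member and the mutex that protects it
are both private, and the query function returns a copy instead of a
reference.

  #include <pthread.h>
  #include <string>

  class SharedName
  {
  public:
    SharedName()  { pthread_mutex_init(&m_mutex, 0); }
    ~SharedName() { pthread_mutex_destroy(&m_mutex); }

    void set(const std::string& n)       // Modifier.
    {
      pthread_mutex_lock(&m_mutex);
      m_name = n;
      pthread_mutex_unlock(&m_mutex);
    }

    std::string get() const              // Query: returns a copy.
    {
      pthread_mutex_lock(&m_mutex);
      std::string result(m_name);
      pthread_mutex_unlock(&m_mutex);
      return result;
    }

  private:
    mutable pthread_mutex_t m_mutex;     // Protects m_name, and only m_name.
    std::string m_name;
  };
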
3. Modularity and hierarchy.

For multithreaded software it is even more important than for
single-threaded software that the software has a modular structure and
that there exists a hierarchy between modules. This way every call from
one function to another function can be classified as either a regular
function call (a call from a higher level to a lower level), a callback
(a call from a lower level to a higher level) or a recursive function
call.

4. Avoiding deadlocks.

Deadlocks can be nasty to solve since some deadlocks are very hard to
reproduce. Prevent deadlocks instead of waiting until one pops up.
Preventing deadlocks is possible by making sure that whenever two or
more mutexes are locked simultaneously, these mutexes are always locked
in the same order. One way to ensure this is by assigning each mutex a
locking order and by verifying the locking order at runtime. This
reduces the complexity of testing for the absence of deadlocks from a
multithreaded to a single-threaded problem, which is a huge win. In
order to verify a locking order policy at run time, one can either use a
threading library with built-in support for verifying such a policy or
one can use a tool that verifies the locking order. A sketch of such a
runtime check is shown after this item.

Make sure that no mutexes are locked while invoking a callback (calling
a function from a higher level module) -- invoking a callback while a
mutex is locked is a well-known way to trigger a deadlock.

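The sketch below (not part of the original text; all names are
illustrative) verifies a locking order at runtime: every mutex is
assigned a fixed level, and a thread may only lock a mutex whose level
is higher than the level of the mutex it locked most recently. It
assumes that mutexes are unlocked in reverse locking order, as is the
case with scoped locking.

  #include <pthread.h>
  #include <cassert>
  #include <vector>

  // Per-thread stack with the levels of the mutexes the thread holds.
  static pthread_key_t s_held_key;
  static pthread_once_t s_once = PTHREAD_ONCE_INIT;

  static void delete_held(void* p)
  { delete static_cast<std::vector<int>*>(p); }

  static void create_key()
  { pthread_key_create(&s_held_key, delete_held); }

  static std::vector<int>& held_levels()
  {
    pthread_once(&s_once, create_key);
    std::vector<int>* p =
      static_cast<std::vector<int>*>(pthread_getspecific(s_held_key));
    if (p == 0)
    {
      p = new std::vector<int>;
      pthread_setspecific(s_held_key, p);
    }
    return *p;
  }

  class OrderedMutex
  {
  public:
    explicit OrderedMutex(int level) : m_level(level)
    { pthread_mutex_init(&m_mutex, 0); }
    ~OrderedMutex() { pthread_mutex_destroy(&m_mutex); }

    void lock()
    {
      // Locking order violation: this mutex's level must be higher than
      // that of the most recently locked mutex still held by this thread.
      assert(held_levels().empty() || held_levels().back() < m_level);
      pthread_mutex_lock(&m_mutex);
      held_levels().push_back(m_level);
    }

    void unlock()
    {
      // Assumes mutexes are unlocked in reverse locking order.
      assert(!held_levels().empty() && held_levels().back() == m_level);
      held_levels().pop_back();
      pthread_mutex_unlock(&m_mutex);
    }

  private:
    pthread_mutex_t m_mutex;
    const int m_level;
  };
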
5. Real-time software.

Software with hard real-time constraints is a special case. There exist
real-time applications that must be able to generate a response within
e.g. 1 ms after a certain input has been received. The proper way to
implement time-critical paths is not to call any function in such a path
for which it is not known how long the function call will take. Examples
for Linux of actions with an unknown call time are:
- Locking a mutex.
- Dynamic memory allocation via e.g. malloc(), since malloc() internally
  uses mutexes.
- File I/O, since file I/O uses several resources that are shared over
  threads and even over processes.

An approach that has proven to work for interthread communication
between real-time threads is to use preallocated fixed-size message
queues, and to lock any data needed by any real-time thread in memory
(mlock()). Avoid mutexes with priority inheritance -- see also
[Yodaiken 2004] for more information. A sketch of this approach is
shown below.

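A minimal sketch of that approach (not part of the original text; it
assumes C++11 std::atomic and a power-of-two queue size): the queue
slots are preallocated, push() and pop() never lock a mutex or allocate
memory, and mlockall() keeps the pages of the process in RAM so that the
real-time threads never block on a page fault.

  #include <atomic>
  #include <cstdio>
  #include <sys/mman.h>   // mlockall()

  struct Message { int sensor_id; double value; };

  template<unsigned N>    // N must be a power of two.
  class SpscQueue         // Single producer, single consumer.
  {
  public:
    SpscQueue() : m_head(0), m_tail(0) { }

    bool push(const Message& m)        // Called by the producer only.
    {
      const unsigned head = m_head.load(std::memory_order_relaxed);
      if (head - m_tail.load(std::memory_order_acquire) == N)
        return false;                  // Queue full.
      m_slot[head % N] = m;            // Fill a preallocated slot.
      m_head.store(head + 1, std::memory_order_release);
      return true;
    }

    bool pop(Message& m)               // Called by the consumer only.
    {
      const unsigned tail = m_tail.load(std::memory_order_relaxed);
      if (m_head.load(std::memory_order_acquire) == tail)
        return false;                  // Queue empty.
      m = m_slot[tail % N];
      m_tail.store(tail + 1, std::memory_order_release);
      return true;
    }

  private:
    Message m_slot[N];                 // Preallocated; no malloc() at runtime.
    std::atomic<unsigned> m_head;      // Written by the producer only.
    std::atomic<unsigned> m_tail;      // Written by the consumer only.
  };

  int main()
  {
    // Lock current and future pages in RAM (may require privileges).
    mlockall(MCL_CURRENT | MCL_FUTURE);

    SpscQueue<64> queue;
    Message in = { 1, 3.14 };
    queue.push(in);
    Message out;
    if (queue.pop(out))
      std::printf("sensor %d: %f\n", out.sensor_id, out.value);
    return 0;
  }

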
How to use DRD
--------------

To use this tool, specify --tool=exp-drd on the Valgrind command line.

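For example (the program name below is only a placeholder):

  valgrind --tool=exp-drd ./myprogram

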
Future DRD Versions
-------------------

The following may be expected in future versions of DRD:
* Drastically reduced memory consumption, such that realistic
  applications can be analyzed with DRD.
* Faster operation.
* More extensive documentation.
* Support for reader-writer locks.
* Support for PowerPC CPUs.
* A lock dependency analyzer, as a help in deadlock prevention.
* Elimination of several artificial limitations.


Acknowledgements
----------------

The exp-drd tool is built on top of the Valgrind core and VEX, which
proved to be an excellent infrastructure for building such a tool.

During 2006, the early versions of drd were improved thanks to helpful
feedback from Julian Seward and Nicholas Nethercote. Any bugs are of
course my responsibility.

Some of the regression tests used to test exp-drd were developed by
Julian Seward as regression tests for the Helgrind tool.

I would also like to thank Michiel Ronsse for introducing me, a long
time ago, to vector clocks and to the JiTI and DIOTA projects.


References
----------

[Hansen 1972]
  Per Brinch Hansen.
  A Comparison of Two Synchronizing Concepts.
  Acta Informatica, 1(3), pp. 190-199, 1972.

[Dijkstra 1974]
  Edsger W. Dijkstra.
  Over seinpalen (About Semaphores).
  Circulated privately (never published), 1974.
  http://www.cs.utexas.edu/users/EWD/transcriptions/EWD00xx/EWD74.html

[Hoare 1974]
  C. A. R. Hoare.
  Monitors: an operating system structuring concept.
  Communications of the ACM, Vol. 17, No. 10, October 1974.
  http://www.cs.wisc.edu/~remzi/Classes/736/Fall2003/Papers/hoare-monitors.pdf

[Lamport 1978]
  Leslie Lamport.
  Time, clocks, and the ordering of events in a distributed system.
  Communications of the ACM, Volume 21, Issue 7, 1978.
  http://research.microsoft.com/users/lamport/pubs/time-clocks.pdf
  http://portal.acm.org/citation.cfm?id=359563

[Accetta 1986]
  Mike Accetta, Robert Baron, William Bolosky, David Golub, Richard Rashid,
  Avadis Tevanian and Michael Young.
  Mach: A New Kernel Foundation For UNIX Development.
  USENIX 1986 (Atlanta, GA, June 9-13), pp. 93-112, 1986.
  http://www.fsl.cs.sunysb.edu/~gopalan/seminar/papers/mach.pdf

[Young 1987]
  Michael Young, Avadis Tevanian, Richard Rashid, David Golub,
  Jeffrey Eppinger, Jonathan Chew, William Bolosky, David Black, Robert Baron.
  The duality of memory and communication in the implementation of a
  multiprocessor operating system.
  ACM Symposium on Operating Systems Principles, pp. 63-76, 1987.
  http://csalpha.ist.unomaha.edu/~stanw/papers/csci8550/87-duality.pdf
  http://portal.acm.org/citation.cfm?id=41457.37507

[Adve 1991]
  Sarita V. Adve, Mark D. Hill, Barton P. Miller, Robert H. B. Netzer.
  Detecting data races on weak memory systems.
  Proceedings of the 18th annual international symposium on Computer
  architecture, Toronto, Ontario, Canada, pp. 234-243, 1991.
  http://rsim.cs.uiuc.edu/~sadve/Publications/isca91.dataraces.ps
  http://portal.acm.org/citation.cfm?doid=115953.115976

[Netzer 1992]
  Robert H. B. Netzer and Barton P. Miller.
  What are race conditions? Some issues and formalizations.
  ACM Letters on Programming Languages and Systems, 1(1):74-88, March 1992.
  http://www.securitytechnet.com/resource/security/os/race-conditions.pdf
  http://portal.acm.org/citation.cfm?id=130623

[Ousterhout 1996]
  John Ousterhout.
  Why Threads Are A Bad Idea (for most purposes).
  Invited talk at the 1996 USENIX Technical Conference (January 25, 1996).
  http://home.pacbell.net/ouster/threads.pdf

[Savage 1997]
  Stefan Savage, Michael Burrows, Greg Nelson, Patrick Sobalvarro and
  Thomas Anderson.
  Eraser: A Dynamic Data Race Detector for Multithreaded Programs.
  ACM Transactions on Computer Systems, 15(4):391-411, November 1997.
  http://www.cs.ucsd.edu/users/savage/papers/Tocs97.pdf
  http://portal.acm.org/citation.cfm?id=265927

[Ronsse 1999]
  Michiel Ronsse, Koen De Bosschere.
  RecPlay: a fully integrated practical record/replay system.
  ACM Transactions on Computer Systems (TOCS), Volume 17, Issue 2,
  pp. 133-152, May 1999.
  http://portal.acm.org/citation.cfm?id=312214

[Ronsse 2004]
  Michiel Ronsse, Jonas Maebe, Koen De Bosschere.
  Detecting Data Races in Sequential Programs with DIOTA.
  Proceedings of the 10th International Euro-Par Conference, Springer-Verlag,
  Lecture Notes in Computer Science, pp. 82-89, 2004.
  http://escher.elis.ugent.be/publ/Edocs/DOC/P104_076.pdf

[Yodaiken 2004]
  Victor Yodaiken.
  Against Priority Inheritance.
  FSMLabs Technical Report, 2004.
  http://www.yodaiken.com/papers/inherit.pdf

[Banerjee 2006a]
  Utpal Banerjee, Brian Bliss, Zhiqiang Ma, Paul Petersen.
  Unraveling Data Race Detection in the Intel® Thread Checker.
  First Workshop on Software Tools for Multi-core Systems (STMCS), in
  conjunction with the IEEE/ACM International Symposium on Code Generation
  and Optimization (CGO), Manhattan, New York, NY, March 26, 2006.
  http://www.isi.edu/~kintali/stmcs06/UnravelingDataRace.pdf

[Banerjee 2006b]
  Utpal Banerjee, Brian Bliss, Zhiqiang Ma, Paul Petersen.
  A theory of data race detection.
  Proceedings of the 2006 workshop on Parallel and distributed systems:
  testing and debugging, Portland, Maine, USA, pp. 69-78, 2006.
  http://www.cs.ucsb.edu/~tiwari/papers/threadchecker06
  http://portal.acm.org/citation.cfm?id=1147416

[Lee 2006]
  Edward A. Lee.
  The Problem with Threads.
  IEEE Computer, Volume 39, Issue 5, pp. 33-42, May 2006.
  http://www.eecs.berkeley.edu/Pubs/TechRpts/2006/EECS-2006-1.pdf
  http://portal.acm.org/citation.cfm?id=1137232.1137289

[Lu 2006]
  Shan Lu, Joseph Tucek, Feng Qin, Yuanyuan Zhou.
  AVIO: detecting atomicity violations via access interleaving invariants.
  Proceedings of the 12th international conference on Architectural support
  for programming languages and operating systems, San Jose, California,
  USA, pp. 37-48, 2006.
  http://www.cse.ohio-state.edu/~qin/pub-papers/2006andbefore/asplos062-lu.pdf
  http://portal.acm.org/citation.cfm?id=1168864

[Sack 2006]
  Paul Sack, Brian E. Bliss, Zhiqiang Ma, Paul Petersen, Josep Torrellas.
  Accurate and efficient filtering for the Intel thread checker race
  detector.
  Proceedings of the 1st workshop on Architectural and system support for
  improving software dependability, San Jose, California, pp. 34-41, 2006.
  http://iacoma.cs.uiuc.edu/iacoma-papers/asid06.pdf
  http://portal.acm.org/citation.cfm?id=1181309.1181315

[Mühlenfeld 2007]
  Arndt Mühlenfeld, Franz Wotawa.
  Fault Detection in Multi-threaded C++ Server Applications.
  Proceedings of the 12th ACM SIGPLAN symposium on Principles and practice
  of parallel programming, San Jose, California, USA, poster session,
  pp. 142-143, 2007.
  http://valgrind.org/docs/muehlenfeld2006.pdf
  http://portal.acm.org/citation.cfm?id=1229457

[Zhou 2007]
  Pin Zhou, Radu Teodorescu, Yuanyuan Zhou.
  HARD: Hardware-Assisted Lockset-based Race Detection.
  Proceedings of the 2007 IEEE 13th International Symposium on High
  Performance Computer Architecture, pp. 121-132, 2007.
  http://opera.cs.uiuc.edu/paper/Hard-HPCA07.pdf
  http://portal.acm.org/citation.cfm?id=1317533.1318108