diff --git a/cachegrind/docs/manual.html b/cachegrind/docs/manual.html
index 95fe84080..9af1ec200 100644
--- a/cachegrind/docs/manual.html
+++ b/cachegrind/docs/manual.html
@@ -33,7 +33,7 @@ jseward@acm.org
Copyright © 2000-2002 Julian Seward

-Valgrind is licensed under the GNU General Public License,
+Cachegrind is licensed under the GNU General Public License,
version 2
An open-source tool for finding memory-management problems in Linux-x86 executables.
@@ -45,1992 +45,17 @@ Linux-x86 executables.

Contents of this manual

-

Introduction

- 1.1  What Valgrind is for
- 1.2  What it does with your program +

How to use Cachegrind

-

How to use it, and how to make sense - of the results

- 2.1  Getting started
- 2.2  The commentary
- 2.3  Reporting of errors
- 2.4  Suppressing errors
- 2.5  Command-line flags
- 2.6  Explanation of error messages
- 2.7  Writing suppressions files
- 2.8  The Client Request mechanism
- 2.9  Support for POSIX pthreads
- 2.10  Building and installing
- 2.11  If you have problems
- -

Details of the checking machinery

- 3.1  Valid-value (V) bits
- 3.2  Valid-address (A) bits
- 3.3  Putting it all together
- 3.4  Signals
- 3.5  Memory leak detection
- -

Limitations

- -

How it works -- a rough overview

- 5.1  Getting started
- 5.2  The translation/instrumentation engine
- 5.3  Tracking the status of memory
- 5.4  System calls
- 5.5  Signals
- -

An example

- -

Cache profiling

- -

The design and implementation of Valgrind

+

How Cachegrind works


- -

1  Introduction

- - -

1.1  What Valgrind is for

- -Valgrind is a tool to help you find memory-management problems in your -programs. When a program is run under Valgrind's supervision, all -reads and writes of memory are checked, and calls to -malloc/new/free/delete are intercepted. As a result, Valgrind can -detect problems such as: - - -Problems like these can be difficult to find by other means, often -lying undetected for long periods, then causing occasional, -difficult-to-diagnose crashes. - -

-Valgrind is closely tied to details of the CPU, operating system and -to a lesser extent, compiler and basic C libraries. This makes it -difficult to make it portable, so I have chosen at the outset to -concentrate on what I believe to be a widely used platform: Linux on -x86s. Valgrind uses the standard Unix ./configure, -make, make install mechanism, and I have -attempted to ensure that it works on machines with kernel 2.2 or 2.4 -and glibc 2.1.X or 2.2.X. This should cover the vast majority of -modern Linux installations. - - -

-Valgrind is licensed under the GNU General Public License, version -2. Read the file LICENSE in the source distribution for details. Some -of the PThreads test cases, test/pth_*.c, are taken from -"Pthreads Programming" by Bradford Nichols, Dick Buttlar & Jacqueline -Proulx Farrell, ISBN 1-56592-115-1, published by O'Reilly & -Associates, Inc. - - - -

1.2  What it does with your program

- -Valgrind is designed to be as non-intrusive as possible. It works -directly with existing executables. You don't need to recompile, -relink, or otherwise modify, the program to be checked. Simply place -the word valgrind at the start of the command line -normally used to run the program. So, for example, if you want to run -the command ls -l on Valgrind, simply issue the -command: valgrind ls -l. - -

Valgrind takes control of your program before it starts. Debugging -information is read from the executable and associated libraries, so -that error messages can be phrased in terms of source code -locations. Your program is then run on a synthetic x86 CPU which -checks every memory access. All detected errors are written to a -log. When the program finishes, Valgrind searches for and reports on -leaked memory. - -

You can run pretty much any dynamically linked ELF x86 executable -using Valgrind. Programs run 25 to 50 times slower, and take a lot -more memory, than they usually would. It works well enough to run -large programs. For example, the Konqueror web browser from the KDE -Desktop Environment, version 3.0, runs slowly but usably on Valgrind. - -

Valgrind simulates every single instruction your program executes. -Because of this, it finds errors not only in your application but also -in all supporting dynamically-linked (.so-format) -libraries, including the GNU C library, the X client libraries, Qt, if -you work with KDE, and so on. That often includes libraries, for -example the GNU C library, which contain memory access violations, but -which you cannot or do not want to fix. - -

Rather than swamping you with errors in which you are not -interested, Valgrind allows you to selectively suppress errors, by -recording them in a suppressions file which is read when Valgrind -starts up. The build mechanism attempts to select suppressions which -give reasonable behaviour for the libc and XFree86 versions detected -on your machine. - - -

Section 6 shows an example of use. -

-


- - -

2  How to use it, and how to make sense of the results

- - -

2.1  Getting started

- -First off, consider whether it might be beneficial to recompile your -application and supporting libraries with optimisation disabled and -debugging info enabled (the -g flag). You don't have to -do this, but doing so helps Valgrind produce more accurate and less -confusing error reports. Chances are you're set up like this already, -if you intended to debug your program with GNU gdb, or some other -debugger. - -

-A plausible compromise is to use -g -O. -Optimisation levels above -O have been observed, on very -rare occasions, to cause gcc to generate code which fools Valgrind's -error tracking machinery into wrongly reporting uninitialised value -errors. -O gets you the vast majority of the benefits of -higher optimisation levels anyway, so you don't lose much there. - -

-Valgrind understands both the older "stabs" debugging format, used by -gcc versions prior to 3.1, and the newer DWARF2 format used by gcc 3.1 -and later. - -

-Then just run your application, but place the word -valgrind in front of your usual command-line invocation. -Note that you should run the real (machine-code) executable here. If -your application is started by, for example, a shell or perl script, -you'll need to modify it to invoke Valgrind on the real executables. -Running such scripts directly under Valgrind will result in you -getting error reports pertaining to /bin/sh, -/usr/bin/perl, or whatever interpreter you're using. -This almost certainly isn't what you want and can be confusing. - - -

2.2  The commentary

- -Valgrind writes a commentary, detailing error reports and other -significant events. The commentary goes to standard output by -default. This may interfere with your program, so you can ask for it -to be directed elsewhere. - -

All lines in the commentary are of the following form:
-

-  ==12345== some-message-from-Valgrind
-
-

The 12345 is the process ID. This scheme makes it easy -to distinguish program output from Valgrind commentary, and also easy -to differentiate commentaries from different processes which have -become merged together, for whatever reason. - -

By default, Valgrind writes only essential messages to the commentary, -so as to avoid flooding you with information of secondary importance. -If you want more information about what is happening, re-run, passing -the -v flag to Valgrind. - - - -

2.3  Reporting of errors

- -When Valgrind detects something bad happening in the program, an error -message is written to the commentary. For example:
-
-  ==25832== Invalid read of size 4
-  ==25832==    at 0x8048724: BandMatrix::ReSize(int, int, int) (bogon.cpp:45)
-  ==25832==    by 0x80487AF: main (bogon.cpp:66)
-  ==25832==    by 0x40371E5E: __libc_start_main (libc-start.c:129)
-  ==25832==    by 0x80485D1: (within /home/sewardj/newmat10/bogon)
-  ==25832==    Address 0xBFFFF74C is not stack'd, malloc'd or free'd
-
- -

This message says that the program did an illegal 4-byte read of -address 0xBFFFF74C, which, as far as it can tell, is not a valid stack -address, nor corresponds to any currently malloc'd or free'd blocks. -The read is happening at line 45 of bogon.cpp, called -from line 66 of the same file, etc. For errors associated with an -identified malloc'd/free'd block, for example reading free'd memory, -Valgrind reports not only the location where the error happened, but -also where the associated block was malloc'd/free'd. - -

Valgrind remembers all error reports. When an error is detected, -it is compared against old reports, to see if it is a duplicate. If -so, the error is noted, but no further commentary is emitted. This -avoids you being swamped with bazillions of duplicate error reports. - -

If you want to know how many times each error occurred, run with -the -v option. When execution finishes, all the reports -are printed out, along with, and sorted by, their occurrence counts. -This makes it easy to see which errors have occurred most frequently. - -

Errors are reported before the associated operation actually -happens. For example, if your program decides to read from address -zero, Valgrind will emit a message to this effect, and the program -will then duly die with a segmentation fault. - -

In general, you should try and fix errors in the order that they -are reported. Not doing so can be confusing. For example, a program -which copies uninitialised values to several memory locations, and -later uses them, will generate several error messages. The first such -error message may well give the most direct clue to the root cause of -the problem. - -

The process of detecting duplicate errors is quite an expensive -one and can become a significant performance overhead if your program -generates huge quantities of errors. To avoid serious problems here, -Valgrind will simply stop collecting errors after 300 different errors -have been seen, or 30000 errors in total have been seen. In this -situation you might as well stop your program and fix it, because -Valgrind won't tell you anything else useful after this. Note that -the 300/30000 limits apply after suppressed errors are removed. These -limits are defined in vg_include.h and can be increased -if necessary. - -

To avoid this cutoff you can use the ---error-limit=no flag. Then Valgrind will always show -errors, regardless of how many there are. Use this flag carefully, -since it may have a dire effect on performance. - - - -

2.4  Suppressing errors

- -Valgrind detects numerous problems in the base libraries, such as the -GNU C library, and the XFree86 client libraries, which come -pre-installed on your GNU/Linux system. You can't easily fix these, -but you don't want to see these errors (and yes, there are many!) So -Valgrind reads a list of errors to suppress at startup. -A default suppression file is cooked up by the -./configure script. - -

You can modify and add to the suppressions file at your leisure, -or, better, write your own. Multiple suppression files are allowed. -This is useful if part of your project contains errors you can't or -don't want to fix, yet you don't want to continuously be reminded of -them. - -

Each error to be suppressed is described very specifically, to -minimise the possibility that a suppression-directive inadvertently -suppresses a bunch of similar errors which you did want to see. The -suppression mechanism is designed to allow precise yet flexible -specification of errors to suppress. - -

If you use the -v flag, at the end of execution, Valgrind -prints out one line for each used suppression, giving its name and the -number of times it got used. Here's the suppressions used by a run of -ls -l: -

-  --27579-- supp: 1 socketcall.connect(serv_addr)/__libc_connect/__nscd_getgrgid_r
-  --27579-- supp: 1 socketcall.connect(serv_addr)/__libc_connect/__nscd_getpwuid_r
-  --27579-- supp: 6 strrchr/_dl_map_object_from_fd/_dl_map_object
-
- - -

2.5  Command-line flags

- -You invoke Valgrind like this: -
-  valgrind [options-for-Valgrind] your-prog [options for your-prog]
-
- -

Note that Valgrind also reads options from the environment variable -$VALGRIND_OPTS, and processes them before the command-line -options. - -

Valgrind's default settings succeed in giving reasonable behaviour -in most cases. Available options, in no particular order, are as -follows: -

- -There are also some options for debugging Valgrind itself. You -shouldn't need to use them in the normal run of things. Nevertheless: - - - - - -

2.6  Explanation of error messages

- -Despite considerable sophistication under the hood, Valgrind can only -really detect two kinds of errors, use of illegal addresses, and use -of undefined values. Nevertheless, this is enough to help you -discover all sorts of memory-management nasties in your code. This -section presents a quick summary of what error messages mean. The -precise behaviour of the error-checking machinery is described in -Section 4. - - -

2.6.1  Illegal read / Illegal write errors

-For example: -
-  Invalid read of size 4
-     at 0x40F6BBCC: (within /usr/lib/libpng.so.2.1.0.9)
-     by 0x40F6B804: (within /usr/lib/libpng.so.2.1.0.9)
-     by 0x40B07FF4: read_png_image__FP8QImageIO (kernel/qpngio.cpp:326)
-     by 0x40AC751B: QImageIO::read() (kernel/qimage.cpp:3621)
-     Address 0xBFFFF0E0 is not stack'd, malloc'd or free'd
-
- -

This happens when your program reads or writes memory at a place -which Valgrind reckons it shouldn't. In this example, the program did -a 4-byte read at address 0xBFFFF0E0, somewhere within the -system-supplied library libpng.so.2.1.0.9, which was called from -somewhere else in the same library, called from line 326 of -qpngio.cpp, and so on. - -

Valgrind tries to establish what the illegal address might relate -to, since that's often useful. So, if it points into a block of -memory which has already been freed, you'll be informed of this, and -also where the block was free'd at. Likewise, if it should turn out -to be just off the end of a malloc'd block, a common result of -off-by-one-errors in array subscripting, you'll be informed of this -fact, and also where the block was malloc'd. - -

In this example, Valgrind can't identify the address. Actually the -address is on the stack, but, for some reason, this is not a valid -stack address -- it is below the stack pointer, %esp, and that isn't -allowed. In this particular case it's probably caused by gcc -generating invalid code, a known bug in various flavours of gcc. - -

Note that Valgrind only tells you that your program is about to -access memory at an illegal address. It can't stop the access from -happening. So, if your program makes an access which normally would -result in a segmentation fault, your program will still suffer the same -fate -- but you will get a message from Valgrind immediately prior to -this. In this particular example, reading junk on the stack is -non-fatal, and the program stays alive. - - -

2.6.2  Use of uninitialised values

-For example: -
-  Conditional jump or move depends on uninitialised value(s)
-     at 0x402DFA94: _IO_vfprintf (_itoa.h:49)
-     by 0x402E8476: _IO_printf (printf.c:36)
-     by 0x8048472: main (tests/manuel1.c:8)
-     by 0x402A6E5E: __libc_start_main (libc-start.c:129)
-
- -

An uninitialised-value use error is reported when your program uses -a value which hasn't been initialised -- in other words, is undefined. -Here, the undefined value is used somewhere inside the printf() -machinery of the C library. This error was reported when running the -following small program: -

-  int main()
-  {
-    int x;
-    printf ("x = %d\n", x);
-  }
-
- -

It is important to understand that your program can copy around -junk (uninitialised) data to its heart's content. Valgrind observes -this and keeps track of the data, but does not complain. A complaint -is issued only when your program attempts to make use of uninitialised -data. In this example, x is uninitialised. Valgrind observes the -value being passed to _IO_printf and thence to _IO_vfprintf, but makes -no comment. However, _IO_vfprintf has to examine the value of x so it -can turn it into the corresponding ASCII string, and it is at this -point that Valgrind complains. - -

Sources of uninitialised data tend to be: -

- - - -

2.6.3  Illegal frees

-For example: -
-  Invalid free()
-     at 0x4004FFDF: free (ut_clientmalloc.c:577)
-     by 0x80484C7: main (tests/doublefree.c:10)
-     by 0x402A6E5E: __libc_start_main (libc-start.c:129)
-     by 0x80483B1: (within tests/doublefree)
-     Address 0x3807F7B4 is 0 bytes inside a block of size 177 free'd
-     at 0x4004FFDF: free (ut_clientmalloc.c:577)
-     by 0x80484C7: main (tests/doublefree.c:10)
-     by 0x402A6E5E: __libc_start_main (libc-start.c:129)
-     by 0x80483B1: (within tests/doublefree)
-
-

Valgrind keeps track of the blocks allocated by your program with -malloc/new, so it can know exactly whether or not the argument to -free/delete is legitimate. Here, this test program has -freed the same block twice. As with the illegal read/write errors, -Valgrind attempts to make sense of the address free'd. If, as -here, the address is one which has previously been freed, you will -be told that -- making duplicate frees of the same block easy to spot. - - -

2.6.4  When a block is freed with an inappropriate -deallocation function

-In the following example, a block allocated with new[] -has wrongly been deallocated with free: -
-  Mismatched free() / delete / delete []
-     at 0x40043249: free (vg_clientfuncs.c:171)
-     by 0x4102BB4E: QGArray::~QGArray(void) (tools/qgarray.cpp:149)
-     by 0x4C261C41: PptDoc::~PptDoc(void) (include/qmemarray.h:60)
-     by 0x4C261F0E: PptXml::~PptXml(void) (pptxml.cc:44)
-     Address 0x4BB292A8 is 0 bytes inside a block of size 64 alloc'd
-     at 0x4004318C: __builtin_vec_new (vg_clientfuncs.c:152)
-     by 0x4C21BC15: KLaola::readSBStream(int) const (klaola.cc:314)
-     by 0x4C21C155: KLaola::stream(KLaola::OLENode const *) (klaola.cc:416)
-     by 0x4C21788F: OLEFilter::convert(QCString const &) (olefilter.cc:272)
-
-The following was told to me by the KDE 3 developers. I didn't know -any of it myself. They also implemented the check itself. -

-In C++ it's important to deallocate memory in a way compatible with -how it was allocated. The deal is: -

-The worst thing is that on Linux apparently it doesn't matter if you -do muddle these up, and it all seems to work ok, but the same program -may then crash on a different platform, Solaris for example. So it's -best to fix it properly. According to the KDE folks "it's amazing how -many C++ programmers don't know this". -

-Pascal Massimino adds the following clarification: -delete[] must be called in association with a -new[] because the compiler stores the size of the array -and the pointer-to-member to the destructor of the array's content -just before the pointer actually returned. This implies a -variable-sized overhead in what's returned by new or -new[]. It is rather surprising how compilers [Ed: -runtime-support libraries?] are robust to mismatch in -new/delete -new[]/delete[]. - - -

2.6.5  Passing system call parameters with inadequate -read/write permissions

- -Valgrind checks all parameters to system calls. If a system call -needs to read from a buffer provided by your program, Valgrind checks -that the entire buffer is addressable and has valid data, ie, it is -readable. And if the system call needs to write to a user-supplied -buffer, Valgrind checks that the buffer is addressable. After the -system call, Valgrind updates its administrative information to -precisely reflect any changes in memory permissions caused by the -system call. - -

Here's an example of a system call with an invalid parameter: -

-  #include <stdlib.h>
-  #include <unistd.h>
-  int main( void )
-  {
-    char* arr = malloc(10);
-    (void) write( 1 /* stdout */, arr, 10 );
-    return 0;
-  }
-
- -

You get this complaint ... -

-  Syscall param write(buf) contains uninitialised or unaddressable byte(s)
-     at 0x4035E072: __libc_write
-     by 0x402A6E5E: __libc_start_main (libc-start.c:129)
-     by 0x80483B1: (within tests/badwrite)
-     by <bogus frame pointer> ???
-     Address 0x3807E6D0 is 0 bytes inside a block of size 10 alloc'd
-     at 0x4004FEE6: malloc (ut_clientmalloc.c:539)
-     by 0x80484A0: main (tests/badwrite.c:6)
-     by 0x402A6E5E: __libc_start_main (libc-start.c:129)
-     by 0x80483B1: (within tests/badwrite)
-
- -

... because the program has tried to write uninitialised junk from -the malloc'd block to the standard output. - - -

2.6.6  Warning messages you might see

- -Most of these only appear if you run in verbose mode (enabled by --v): - - - - -

2.7  Writing suppressions files

- -A suppression file describes a bunch of errors which, for one reason -or another, you don't want Valgrind to tell you about. Usually the -reason is that the system libraries are buggy but unfixable, at least -within the scope of the current debugging session. Multiple -suppressions files are allowed. By default, Valgrind uses -$PREFIX/lib/valgrind/default.supp. - -

-You can ask to add suppressions from another file, by specifying ---suppressions=/path/to/file.supp. - -

Each suppression has the following components:
-

- -

-Locations may be either names of shared objects/executables or wildcards -matching function names. They begin obj: and fun: -respectively. Function and object names to match against may use the -wildcard characters * and ?. - -A suppression only suppresses an error when the error matches all the -details in the suppression. Here's an example: -

-  {
-    __gconv_transform_ascii_internal/__mbrtowc/mbtowc
-    Value4
-    fun:__gconv_transform_ascii_internal
-    fun:__mbr*toc
-    fun:mbtowc
-  }
-
- -

What it means is: suppress a use-of-uninitialised-value error, when -the data size is 4, when it occurs in the function -__gconv_transform_ascii_internal, when that is called -from any function of name matching __mbr*toc, -when that is called from -mbtowc. It doesn't apply under any other circumstances. -The string by which this suppression is identified to the user is -__gconv_transform_ascii_internal/__mbrtowc/mbtowc. - -

Another example: -

-  {
-    libX11.so.6.2/libX11.so.6.2/libXaw.so.7.0
-    Value4
-    obj:/usr/X11R6/lib/libX11.so.6.2
-    obj:/usr/X11R6/lib/libX11.so.6.2
-    obj:/usr/X11R6/lib/libXaw.so.7.0
-  }
-
- -

Suppress any size 4 uninitialised-value error which occurs anywhere -in libX11.so.6.2, when called from anywhere in the same -library, when called from anywhere in libXaw.so.7.0. The -inexact specification of locations is regrettable, but is about all -you can hope for, given that the X11 libraries shipped with Red Hat -7.2 have had their symbol tables removed. - -

Note -- since the above two examples did not make it clear -- that -you can freely mix the obj: and fun: -styles of description within a single suppression record. - - - -

2.8  The Client Request mechanism

- -Valgrind has a trapdoor mechanism via which the client program can -pass all manner of requests and queries to Valgrind. Internally, this -is used extensively to make malloc, free, signals, threads, etc, work, -although you don't see that. -

-For your convenience, a subset of these so-called client requests is -provided to allow you to tell Valgrind facts about the behaviour of -your program, and conversely to make queries. In particular, your -program can tell Valgrind about changes in memory range permissions -that Valgrind would not otherwise know about, and so allows clients to -get Valgrind to do arbitrary custom checks. -

-Clients need to include the header file valgrind.h to -make this work. The macros therein have the magical property that -they generate code in-line which Valgrind can spot. However, the code -does nothing when not run on Valgrind, so you are not forced to run -your program on Valgrind just because you use the macros in this file. -Also, you are not required to link your program with any extra -supporting libraries. -

-A brief description of the available macros: -

-

- - - -

2.9  Support for POSIX Pthreads

- -As of late April 02, Valgrind supports programs which use POSIX -pthreads. Doing this has proved technically challenging but is now -mostly complete. It works well enough for significant threaded -applications to work. -

-It works as follows: threaded apps are (dynamically) linked against -libpthread.so. Usually this is the one installed with -your Linux distribution. Valgrind, however, supplies its own -libpthread.so and automatically connects your program to -it instead. -

-The fake libpthread.so and Valgrind cooperate to -implement a user-space pthreads package. This approach avoids the -horrible implementation problems of implementing a truly -multiprocessor version of Valgrind, but it does mean that threaded -apps run only on one CPU, even if you have a multiprocessor machine. -

-Valgrind schedules your threads in a round-robin fashion, with all -threads having equal priority. It switches threads every 50000 basic -blocks (typically around 300000 x86 instructions), which means you'll -get a much finer interleaving of thread executions than when run -natively. This in itself may cause your program to behave differently -if you have some kind of concurrency, critical race, locking, or -similar, bugs. -

-The current (valgrind-1.0 release) state of pthread support is as -follows: -

- - -As of 18 May 02, the following threaded programs now work fine on my -RedHat 7.2 box: Opera 6.0Beta2, KNode in KDE 3.0, Mozilla-0.9.2.1 and -Galeon-0.11.3, both as supplied with RedHat 7.2. Also Mozilla 1.0RC2. -OpenOffice 1.0. MySQL 3.something (the current stable release). - - -

2.10  Building and installing

- -We now use the standard Unix ./configure, -make, make install mechanism, and I have -attempted to ensure that it works on machines with kernel 2.2 or 2.4 -and glibc 2.1.X or 2.2.X. I don't think there is much else to say. -There are no options apart from the usual --prefix that -you should give to ./configure. - -

-The configure script tests the version of the X server -currently indicated by the current $DISPLAY. This is a -known bug. The intention was to detect the version of the current -XFree86 client libraries, so that correct suppressions could be -selected for them, but instead the test checks the server version. -This is just plain wrong. - -

-If you are building a binary package of Valgrind for distribution, -please read README_PACKAGERS. It contains some important -information. - -

-Apart from that there is no excitement here. Let me know if you have -build problems. - - - - -

2.11  If you have problems

-Mail me (jseward@acm.org). - -

See Section 4 for the known limitations of -Valgrind, and for a list of programs which are known not to work on -it. - -

The translator/instrumentor has a lot of assertions in it. They -are permanently enabled, and I have no plans to disable them. If one -of these breaks, please mail me! - -

If you get an assertion failure on the expression -chunkSane(ch) in vg_free() in -vg_malloc.c, this may have happened because your program -wrote off the end of a malloc'd block, or before its beginning. -Valgrind should have emitted a proper message to that effect before -dying in this way. This is a known problem which I should fix. -

- -


- - -

3  Details of the checking machinery

- -Read this section if you want to know, in detail, exactly what and how -Valgrind is checking. - - -

3.1  Valid-value (V) bits

- -It is simplest to think of Valgrind implementing a synthetic Intel x86 -CPU which is identical to a real CPU, except for one crucial detail. -Every bit (literally) of data processed, stored and handled by the -real CPU has, in the synthetic CPU, an associated "valid-value" bit, -which says whether or not the accompanying bit has a legitimate value. -In the discussions which follow, this bit is referred to as the V -(valid-value) bit. - -

Each byte in the system therefore has 8 V bits which follow -it wherever it goes. For example, when the CPU loads a word-size item -(4 bytes) from memory, it also loads the corresponding 32 V bits from -a bitmap which stores the V bits for the process' entire address -space. If the CPU should later write the whole or some part of that -value to memory at a different address, the relevant V bits will be -stored back in the V-bit bitmap. - -

In short, each bit in the system has an associated V bit, which -follows it around everywhere, even inside the CPU. Yes, the CPU's -(integer and %eflags) registers have their own V bit -vectors. - -

Copying values around does not cause Valgrind to check for, or -report on, errors. However, when a value is used in a way which might -conceivably affect the outcome of your program's computation, the -associated V bits are immediately checked. If any of these indicate -that the value is undefined, an error is reported. - -

Here's an (admittedly nonsensical) example: -

-  int i, j;
-  int a[10], b[10];
-  for (i = 0; i < 10; i++) {
-    j = a[i];
-    b[i] = j;
-  }
-
- -

Valgrind emits no complaints about this, since it merely copies -uninitialised values from a[] into b[], and -doesn't use them in any way. However, if the loop is changed to -

-  for (i = 0; i < 10; i++) {
-    j += a[i];
-  }
-  if (j == 77) 
-     printf("hello there\n");
-
-then Valgrind will complain, at the if, that the -condition depends on uninitialised values. - -

Most low level operations, such as adds, cause Valgrind to -use the V bits for the operands to calculate the V bits for the -result. Even if the result is partially or wholly undefined, -it does not complain. - -

Checks on definedness only occur in two places: when a value is -used to generate a memory address, and when a control flow decision -needs to be made. Also, when a system call is detected, Valgrind -checks definedness of parameters as required. - -

If a check should detect undefinedness, an error message is -issued. The resulting value is subsequently regarded as well-defined. -To do otherwise would give long chains of error messages. In effect, -we say that undefined values are non-infectious. - -

This sounds overcomplicated. Why not just check all reads from -memory, and complain if an undefined value is loaded into a CPU register? -Well, that doesn't work well, because perfectly legitimate C programs routinely -copy uninitialised values around in memory, and we don't want endless complaints -about that. Here's the canonical example. Consider a struct -like this: -

-  struct S { int x; char c; };
-  struct S s1, s2;
-  s1.x = 42;
-  s1.c = 'z';
-  s2 = s1;
-
- -

The question to ask is: how large is struct S, in -bytes? An int is 4 bytes and a char one byte, so perhaps a struct S -occupies 5 bytes? Wrong. All (non-toy) compilers I know of will -round the size of struct S up to a whole number of words, -in this case 8 bytes. Not doing this forces compilers to generate -truly appalling code for subscripting arrays of struct -S's. - -

So s1 occupies 8 bytes, yet only 5 of them will be initialised. -For the assignment s2 = s1, gcc generates code to copy -all 8 bytes wholesale into s2 without regard for their -meaning. If Valgrind simply checked values as they came out of -memory, it would yelp every time a structure assignment like this -happened. So the more complicated semantics described above is -necessary. This allows gcc to copy s1 into -s2 any way it likes, and a warning will only be emitted -if the uninitialised values are later used. - -

One final twist to this story. The above scheme allows garbage to -pass through the CPU's integer registers without complaint. It does -this by giving the integer registers V tags, passing these around in -the expected way. This is complicated and computationally expensive to -do, but is necessary. Valgrind is more simplistic about -floating-point loads and stores. In particular, V bits for data read -as a result of floating-point loads are checked at the load -instruction. So if your program uses the floating-point registers to -do memory-to-memory copies, you will get complaints about -uninitialised values. Fortunately, I have not yet encountered a -program which (ab)uses the floating-point registers in this way. - - -

3.2  Valid-address (A) bits

- -Notice that the previous section describes how the validity of values -is established and maintained without having to say whether the -program does or does not have the right to access any particular -memory location. We now consider the latter issue. - -

As described above, every bit in memory or in the CPU has an -associated valid-value (V) bit. In addition, all bytes in memory, but -not in the CPU, have an associated valid-address (A) bit. This -indicates whether or not the program can legitimately read or write -that location. It does not give any indication of the validity of the -data at that location -- that's the job of the V bits -- only whether -or not the location may be accessed. - -

Every time your program reads or writes memory, Valgrind checks the -A bits associated with the address. If any of them indicate an -invalid address, an error is emitted. Note that the reads and writes -themselves do not change the A bits, only consult them. - -

So how do the A bits get set/cleared? Like this: - -

- - - -
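To make the idea of per-byte A bits concrete, here is a toy model, not Valgrind's actual code. It assumes that allocating a block marks its bytes addressable and freeing it marks them inaccessible again; the 64-byte arena and all function names are hypothetical.

```c
#include <stddef.h>
#include <string.h>

#define ARENA_BYTES 64                    /* hypothetical toy address space */

static unsigned char a_bit[ARENA_BYTES];  /* 1 = addressable, 0 = not */

/* Set or clear the A bits for [start, start+len), clipped to the arena. */
static void mark_range(size_t start, size_t len, int ok)
{
    if (start >= ARENA_BYTES)
        return;
    if (len > ARENA_BYTES - start)
        len = ARENA_BYTES - start;
    memset(a_bit + start, ok ? 1 : 0, len);
}

/* Return 1 if every byte of the access is addressable.
   The check only consults the A bits; it never changes them. */
static int access_ok(size_t addr, size_t len)
{
    for (size_t i = 0; i < len; i++)
        if (addr + i >= ARENA_BYTES || !a_bit[addr + i])
            return 0;
    return 1;
}

/* Assumed behaviour: allocation makes a block addressable, freeing revokes it. */
static void toy_malloc(size_t start, size_t len) { mark_range(start, len, 1); }
static void toy_free(size_t start, size_t len)   { mark_range(start, len, 0); }
```

An access that straddles the end of a block fails the check, because at least one of its bytes has a clear A bit.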

3.3  Putting it all together

-Valgrind's checking machinery can be summarised as follows: - - - -Valgrind intercepts calls to malloc, calloc, realloc, valloc, -memalign, free, new and delete. The behaviour you get is: - - - - - - -

3.4  Signals

- -Valgrind provides suitable handling of signals, so, provided you stick -to POSIX stuff, you should be ok. Basic sigaction() and sigprocmask() -are handled. Signal handlers may return in the normal way or do -longjmp(); both should work ok. As specified by POSIX, a signal is -blocked in its own handler. Default actions for signals should work -as before. Etc, etc. - -

Under the hood, dealing with signals is a real pain, and Valgrind's -simulation leaves much to be desired. If your program does -way-strange stuff with signals, bad things may happen. If so, let me -know. I don't promise to fix it, but I'd at least like to be aware of -it. - - - -
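For reference, this is the kind of plain POSIX signal usage the paragraphs above have in mind. It is a small sketch using only sigaction() and raise(); the names on_usr1 and install_and_raise are invented for the example.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

static volatile sig_atomic_t handled = 0;

/* Handler returns in the normal way. Per POSIX, SIGUSR1 is blocked
   while this runs. */
static void on_usr1(int signo)
{
    (void)signo;
    handled++;
}

/* Install a handler for SIGUSR1, raise the signal once, and return how
   many times the handler ran (or -1 on failure). */
static int install_and_raise(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_usr1;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    if (sigaction(SIGUSR1, &sa, NULL) != 0)
        return -1;
    handled = 0;
    if (raise(SIGUSR1) != 0)
        return -1;
    return (int)handled;
}
```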

3.5  Memory leak detection

- -Valgrind keeps track of all memory blocks issued in response to calls -to malloc/calloc/realloc/new. So when the program exits, it knows -which blocks are still outstanding -- have not been returned, in other -words. Ideally, you want your program to have no blocks still in use -at exit. But many programs do. - -

For each such block, Valgrind scans the entire address space of the -process, looking for pointers to the block. One of three situations -may result: - -

- -Valgrind reports summaries about leaked and dubious blocks. -For each such block, it will also tell you where the block was -allocated. This should help you figure out why the pointer to it has -been lost. In general, you should attempt to ensure your programs do -not have any leaked or dubious blocks at exit. - -
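The classification step can be pictured as follows. This is a simplified model rather than Valgrind's code: it assumes the three situations are "a pointer to the start of the block exists", "only a pointer into its interior exists" (dubious) and "no pointer exists" (leaked), and it scans a caller-supplied array of words instead of a real address space. All names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

enum block_status { BLOCK_REACHABLE, BLOCK_DUBIOUS, BLOCK_LEAKED };

/* Classify the block [start, start+size) by scanning candidate words. */
static enum block_status classify_block(uintptr_t start, size_t size,
                                        const uintptr_t *words, size_t nwords)
{
    enum block_status status = BLOCK_LEAKED;
    for (size_t i = 0; i < nwords; i++) {
        if (words[i] == start)
            return BLOCK_REACHABLE;       /* pointer to the start: best case */
        if (words[i] > start && words[i] < start + size)
            status = BLOCK_DUBIOUS;       /* interior pointer seen so far */
    }
    return status;
}
```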

The precise area of memory in which Valgrind searches for pointers -is: all naturally-aligned 4-byte words for which all A bits indicate -addressability and all V bits indicate that the stored value is -actually valid. - -


- - - -

4  Limitations

- -The following list of limitations seems depressingly long. However, -most programs actually work fine. - -

Valgrind will run x86-GNU/Linux ELF dynamically linked binaries, on -a kernel 2.2.X or 2.4.X system, subject to the following constraints: - -

- -Programs which are known not to work are: - - - -Known platform-specific limitations, as of release 1.0.0: - - - - -


- - - -

5  How it works -- a rough overview

-Some gory details, for those with a passion for gory details. You -don't need to read this section if all you want to do is use Valgrind. - - -

5.1  Getting started

- -Valgrind is compiled into a shared object, valgrind.so. The shell -script valgrind sets the LD_PRELOAD environment variable to point to -valgrind.so. This causes the .so to be loaded as an extra library to -any subsequently executed dynamically-linked ELF binary, viz, the -program you want to debug. - -

The dynamic linker allows each .so in the process image to have an -initialisation function which is run before main(). It also allows -each .so to have a finalisation function run after main() exits. - -
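One way to get such initialisation and finalisation functions with GCC or Clang is the constructor and destructor function attributes, sketched below. Whether valgrind.so uses this particular mechanism is not stated here, and the names tool_init, tool_fini and init_ran are invented for the example.

```c
#include <stdio.h>

static int init_ran = 0;

/* Runs before main(), when the object containing it is loaded. */
__attribute__((constructor))
static void tool_init(void)
{
    init_ran = 1;
}

/* Runs after main() returns or exit() is called. */
__attribute__((destructor))
static void tool_fini(void)
{
    fputs("tool_fini: main() has finished\n", stderr);
}
```

Built into a shared object and named in LD_PRELOAD, code like this would run around the main() of any dynamically linked program started afterwards.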

When valgrind.so's initialisation function is called by the dynamic -linker, the synthetic CPU starts up. The real CPU remains locked -in valgrind.so for the entire rest of the program, but the synthetic -CPU returns from the initialisation function. Startup of the program -now continues as usual -- the dynamic linker calls all the other .so's -initialisation routines, and eventually runs main(). This all runs on -the synthetic CPU, not the real one, but the client program cannot -tell the difference. - -

Eventually main() exits, so the synthetic CPU calls valgrind.so's -finalisation function. Valgrind detects this, and uses it as its cue -to exit. It prints summaries of all errors detected, possibly checks -for memory leaks, and then exits the finalisation routine, but now on -the real CPU. The synthetic CPU has now lost control -- permanently --- so the program exits back to the OS on the real CPU, just as it -would have done anyway. - -

On entry, Valgrind switches stacks, so it runs on its own stack. -On exit, it switches back. This means that the client program -continues to run on its own stack, so we can switch back and forth -between running it on the simulated and real CPUs without difficulty. -This was an important design decision, because it makes it easy (well, -significantly less difficult) to debug the synthetic CPU. - - - -

5.2  The translation/instrumentation engine

- -Valgrind does not directly run any of the original program's code. Only -instrumented translations are run. Valgrind maintains a translation -table, which allows it to find the translation quickly for any branch -target (code address). If no translation has yet been made, the -translator - a just-in-time translator - is summoned. This makes an -instrumented translation, which is added to the collection of -translations. Subsequent jumps to that address will use this -translation. - -
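The lookup-or-translate cycle described above can be sketched with a small open-addressed hash table. This is an illustration only: the table size, the probing strategy and the stand-in translate() function are assumptions, not a description of Valgrind's real translation table.

```c
#include <stddef.h>
#include <stdint.h>

#define TT_SLOTS 256                     /* hypothetical table size */

struct tt_entry {
    uintptr_t   orig_addr;               /* branch target in the original code */
    const void *translation;             /* NULL means the slot is empty */
};

static struct tt_entry tt[TT_SLOTS];
static unsigned long   tt_misses;

/* Stand-in for the JIT: hands out distinct dummy "translations". */
static unsigned char fake_code[TT_SLOTS];
static size_t        fake_next;

static const void *translate(uintptr_t orig_addr)
{
    (void)orig_addr;
    return &fake_code[fake_next++ % TT_SLOTS];
}

/* Find the translation for orig_addr, making one on a miss. */
static const void *find_translation(uintptr_t orig_addr)
{
    size_t home = (size_t)(orig_addr % TT_SLOTS);
    for (size_t n = 0; n < TT_SLOTS; n++) {
        struct tt_entry *e = &tt[(home + n) % TT_SLOTS];
        if (e->translation != NULL && e->orig_addr == orig_addr)
            return e->translation;       /* hit: reuse earlier work */
        if (e->translation == NULL) {    /* empty slot: not translated yet */
            tt_misses++;
            e->orig_addr = orig_addr;
            e->translation = translate(orig_addr);
            return e->translation;
        }
    }
    /* Table full: overwrite the home slot (crude eviction for the sketch). */
    tt_misses++;
    tt[home].orig_addr = orig_addr;
    tt[home].translation = translate(orig_addr);
    return tt[home].translation;
}
```

A second jump to the same address returns the stored translation without calling translate() again.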

Valgrind no longer directly supports detection of self-modifying -code. Such checking is expensive, and in practice (fortunately) -almost no applications need it. However, to help people who are -debugging dynamic code generation systems, there is a Client Request -(basically a macro you can put in your program) which directs Valgrind -to discard translations in a given address range. So Valgrind can -still work in this situation provided the client tells it when -code has become out-of-date and needs to be retranslated. - -

The JITter translates basic blocks -- blocks of straight-line-code --- as single entities. To minimise the considerable difficulties of -dealing with the x86 instruction set, x86 instructions are first -translated to a RISC-like intermediate code, similar to SPARC code, -but with an infinite number of virtual integer registers. Initially -each insn is translated separately, and there is no attempt at -instrumentation. - -

The intermediate code is improved, mostly so as to try and cache -the simulated machine's registers in the real machine's registers over -several simulated instructions. This is often very effective. Also, -we try to remove redundant updates of the simulated machine's -condition-code register. - -

The intermediate code is then instrumented, giving more -intermediate code. There are a few extra intermediate-code operations -to support instrumentation; it is all refreshingly simple. After -instrumentation there is a cleanup pass to remove redundant value -checks. - -

This gives instrumented intermediate code which mentions arbitrary -numbers of virtual registers. A linear-scan register allocator is -used to assign real registers and possibly generate spill code. All -of this is still phrased in terms of the intermediate code. This -machinery is inspired by the work of Reuben Thomas (MITE). - -

Then, and only then, is the final x86 code emitted. The -intermediate code is carefully designed so that x86 code can be -generated from it without need for spare registers or other -inconveniences. - -

The translations are managed using a traditional LRU-based caching -scheme. The translation cache has a default size of about 14MB. - - - -

5.3  Tracking the status of memory

Each byte in the -process' address space has nine bits associated with it: one A bit and -eight V bits. The A and V bits for each byte are stored using a -sparse array, which flexibly and efficiently covers arbitrary parts of -the 32-bit address space without imposing significant space or -performance overheads for the parts of the address space never -visited. The scheme used, and speedup hacks, are described in detail -at the top of the source file vg_memory.c, so you should read that for -the gory details. - - - -
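A two-level table is one common way to build a sparse array of this kind. The sketch below is not the scheme in vg_memory.c; the 64KB chunk size, the lazy allocation and the convention that never-visited memory reads as not addressable are all assumptions made for illustration.

```c
#include <stdint.h>
#include <stdlib.h>

#define CHUNK_BITS 16
#define CHUNK_SIZE (1u << CHUNK_BITS)            /* 64KB of client memory */
#define NCHUNKS    (1u << (32 - CHUNK_BITS))     /* covers a 32-bit space */

struct shadow_chunk {
    uint8_t abit[CHUNK_SIZE / 8];                /* one A bit per byte */
    uint8_t vbyte[CHUNK_SIZE];                   /* eight V bits per byte */
};

/* First level: one pointer per chunk; NULL means never visited. */
static struct shadow_chunk *primary[NCHUNKS];

/* Record the A bit and V bits for one byte, allocating its chunk on demand. */
static int set_shadow(uint32_t addr, int a, uint8_t v)
{
    struct shadow_chunk **slot = &primary[addr >> CHUNK_BITS];
    if (*slot == NULL) {
        *slot = calloc(1, sizeof **slot);
        if (*slot == NULL)
            return -1;
    }
    uint32_t off = addr & (CHUNK_SIZE - 1);
    if (a)
        (*slot)->abit[off >> 3] |= (uint8_t)(1u << (off & 7));
    else
        (*slot)->abit[off >> 3] &= (uint8_t)~(1u << (off & 7));
    (*slot)->vbyte[off] = v;
    return 0;
}

/* Read the A bit for one byte; unvisited chunks cost no memory. */
static int get_abit(uint32_t addr)
{
    const struct shadow_chunk *c = primary[addr >> CHUNK_BITS];
    if (c == NULL)
        return 0;
    uint32_t off = addr & (CHUNK_SIZE - 1);
    return (c->abit[off >> 3] >> (off & 7)) & 1;
}
```

Only chunks that are actually touched get second-level storage, which is what keeps the overhead low for the parts of the address space never visited.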

5.4 System calls

-All system calls are intercepted. The memory status map is consulted -before and updated after each call. It's all rather tiresome. See -vg_syscall_mem.c for details. - - - -

5.5  Signals

-All system calls to sigaction() and sigprocmask() are intercepted. If -the client program is trying to set a signal handler, Valgrind makes a -note of the handler address and which signal it is for. Valgrind then -arranges for the same signal to be delivered to its own handler. - -

When such a signal arrives, Valgrind's own handler catches it, and -notes the fact. At a convenient safe point in execution, Valgrind -builds a signal delivery frame on the client's stack and runs its -handler. If the handler longjmp()s, there is nothing more to be said. -If the handler returns, Valgrind notices this, zaps the delivery -frame, and carries on where it left off before delivering the signal. - -

The purpose of this nonsense is that setting signal handlers -essentially amounts to giving callback addresses to the Linux kernel. -We can't allow this to happen, because if it did, signal handlers -would run on the real CPU, not the simulated one. This means the -checking machinery would not operate during the handler run, and, -worse, memory permissions maps would not be updated, which could cause -spurious error reports once the handler had returned. - -

An even worse thing would happen if the signal handler longjmp'd -rather than returned: Valgrind would completely lose control of the -client program. - -

Upshot: we can't allow the client to install signal handlers -directly. Instead, Valgrind must catch, on behalf of the client, any -signal the client asks to catch, and must deliver it to the client on -the simulated CPU, not the real one. This involves considerable -gruesome fakery; see vg_signals.c for details. -

- -


- - -

6  Example

-This is the log for a run of a small program. The program is in fact -correct, and the reported error is the result of a potentially serious -code generation bug in GNU g++ (snapshot 20010527). -
-sewardj@phoenix:~/newmat10$
-~/Valgrind-6/valgrind -v ./bogon 
-==25832== Valgrind 0.10, a memory error detector for x86 RedHat 7.1.
-==25832== Copyright (C) 2000-2001, and GNU GPL'd, by Julian Seward.
-==25832== Startup, with flags:
-==25832== --suppressions=/home/sewardj/Valgrind/redhat71.supp
-==25832== reading syms from /lib/ld-linux.so.2
-==25832== reading syms from /lib/libc.so.6
-==25832== reading syms from /mnt/pima/jrs/Inst/lib/libgcc_s.so.0
-==25832== reading syms from /lib/libm.so.6
-==25832== reading syms from /mnt/pima/jrs/Inst/lib/libstdc++.so.3
-==25832== reading syms from /home/sewardj/Valgrind/valgrind.so
-==25832== reading syms from /proc/self/exe
-==25832== loaded 5950 symbols, 142333 line number locations
-==25832== 
-==25832== Invalid read of size 4
-==25832==    at 0x8048724: _ZN10BandMatrix6ReSizeEiii (bogon.cpp:45)
-==25832==    by 0x80487AF: main (bogon.cpp:66)
-==25832==    by 0x40371E5E: __libc_start_main (libc-start.c:129)
-==25832==    by 0x80485D1: (within /home/sewardj/newmat10/bogon)
-==25832==    Address 0xBFFFF74C is not stack'd, malloc'd or free'd
-==25832==
-==25832== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)
-==25832== malloc/free: in use at exit: 0 bytes in 0 blocks.
-==25832== malloc/free: 0 allocs, 0 frees, 0 bytes allocated.
-==25832== For a detailed leak analysis, rerun with: --leak-check=yes
-==25832==
-==25832== exiting, did 1881 basic blocks, 0 misses.
-==25832== 223 translations, 3626 bytes in, 56801 bytes out.
-
-

The GCC folks fixed this about a week before gcc-3.0 shipped. -


-

- - -

7  Cache profiling

-As well as memory debugging, Valgrind also allows you to do cache simulations -and annotate your source line-by-line with the number of cache misses. In -particular, it records: +

1  Cache profiling

+Cachegrind is a tool for doing cache simulations and annotating your source +line-by-line with the number of cache misses. In particular, it records: