diff --git a/NEWS b/NEWS
index e169ac1ce..e6e0fb578 100644
--- a/NEWS
+++ b/NEWS
@@ -27,6 +27,11 @@ Release 3.8.0 (????)
 * The C++ demangler has been updated so as to work well with C++
   compiled by even the most recent g++'s.
 
+* The new option --fair-sched controls the locking mechanism
+  used by Valgrind. The locking mechanism influences the performance
+  and scheduling of multithreaded applications (in particular
+  on multiprocessor/multicore systems).
+
 * ==================== FIXED BUGS ====================
 
 The following bugs have been fixed or resolved. Note that "n-i-bz"
@@ -41,6 +46,7 @@ https://bugs.kde.org/show_bug.cgi?id=XXXXXX
 where XXXXXX is the bug number as listed below.
 
 247386 make perf does not run all performance tests
+270006 Valgrind scheduler unfair
 270796 s390x: Removed broken support for the TS insn
 271438 Fix configure for proper SSE4.2 detection
 273114 s390x: Support TR, TRE, TROO, TROT, TRTO, and TRTT instructions
diff --git a/docs/xml/manual-core.xml b/docs/xml/manual-core.xml
index 736c39756..2d3d086af 100644
--- a/docs/xml/manual-core.xml
+++ b/docs/xml/manual-core.xml
@@ -1660,6 +1660,44 @@ need to use these.
+
+
+
+
+
+ The --fair-sched option controls the
+ locking mechanism used by Valgrind to serialise thread
+ execution. The locking mechanisms differ in how the threads
+ are scheduled, each giving a different trade-off between fairness and
+ performance. For more details about the Valgrind thread
+ serialisation scheme and its impact on performance and thread
+ scheduling, see the section "Scheduling and Multi-Thread Performance".
+
+
+ The value --fair-sched=yes
+ activates fair scheduling. In short, if multiple threads are
+ ready to run, the threads are scheduled in a round-robin
+ fashion. This mechanism is not available on all platforms or
+ Linux versions. If it is not available,
+ using --fair-sched=yes causes Valgrind to
+ terminate with an error.
+
+
+ The value --fair-sched=try
+ activates fair scheduling if it is available on the
+ platform. Otherwise, it automatically falls back
+ to --fair-sched=no.
+
+
+ The value --fair-sched=no activates
+ a scheduling mechanism that does not guarantee fairness
+ between threads ready to run.
+
+
+
+
+
+
@@ -1836,8 +1874,8 @@ that your program will use the native threading library, but
 Valgrind serialises execution so that only one (kernel) thread is running
 at a time. This approach avoids the horrible implementation problems
 of implementing a truly multithreaded version of Valgrind, but it does
-mean that threaded apps run only on one CPU, even if you have a
-multiprocessor or multicore machine.
+mean that threaded apps never use more than one CPU simultaneously,
+even if you have a multiprocessor or multicore machine.
 
 Valgrind doesn't schedule the threads itself. It merely ensures that
 only one thread runs at once, using a simple locking scheme. The
@@ -1860,6 +1898,86 @@ everything is shared (a thread) or nothing is shared (fork-like); partial
 sharing will fail.
+
+Scheduling and Multi-Thread Performance
+
+A thread executes code only while it holds the lock. After
+executing a certain number of instructions, the running thread releases
+the lock. All threads ready to run then compete to acquire the lock.
+
+The --fair-sched option controls the locking mechanism
+used to serialise thread execution.
+
+ The default pipe-based locking
+(--fair-sched=no) is available on all platforms. The
+pipe-based locking does not guarantee fairness between threads: it is
+quite possible that the thread that has just released the lock
+gets it back immediately. When using the pipe-based locking, different
+executions of the same multithreaded application might give very different
+thread scheduling.
+
+ The futex-based locking is available on some platforms.
+If available, it is activated by --fair-sched=yes or
+--fair-sched=try. The futex-based locking ensures
+fairness between threads: if multiple threads are ready to run, the lock
+is given to the thread that requested it first. Note that a thread
+that is blocked in a system call (e.g. in a blocking read system call) has
+not (yet) requested the lock: such a thread requests the lock only after the
+system call has finished.
+
+ The fairness of the futex-based locking gives better reproducibility
+of the thread scheduling across different executions of a multithreaded
+application. This reproducibility is particularly
+useful when using Helgrind or DRD.
+
+ The Valgrind thread serialisation implies that only one thread
+is running at a time. On a multiprocessor/multicore system, the
+running thread is assigned to one of the CPUs by the OS kernel
+scheduler. When a thread acquires the lock, it is sometimes
+assigned to the same CPU as the thread that just released the
+lock, and sometimes to another CPU. When
+using the pipe-based locking, the thread that just acquired the lock
+will often be scheduled on the same CPU as the thread that just
+released it. With the futex-based locking, the thread that
+just acquired the lock will more often be scheduled on another
+CPU.
+
+The Valgrind thread serialisation and the CPU assignment by the OS
+kernel scheduler can interact badly with the CPU frequency scaling
+available on many modern CPUs: to decrease power consumption, the
+frequency of a CPU or core is automatically decreased if the CPU/core
+has not been used recently. If the OS kernel often assigns the thread
+that just acquired the lock to another CPU/core, there is a good
+chance that this CPU/core is currently running at a low frequency. The
+frequency of this CPU will be increased after some time. However,
+during that time, the (only) running thread will have run at a low
+frequency. Once this thread has run for some time, it releases
+the lock. Another thread acquires the lock and might be
+scheduled on yet another CPU whose clock frequency has been decreased in
+the meantime.
+
+The futex-based locking causes threads to switch CPU/core more
+often. So, if CPU frequency scaling is enabled, the futex-based
+locking might significantly decrease the performance of a
+multithreaded application running under Valgrind (up to 50%
+degradation has been observed). The pipe-based locking also interacts
+somewhat badly with CPU frequency scaling: performance degradation of
+up to 10-20% has been observed.
+
+To avoid this performance degradation, you can tell the
+kernel that all CPUs/cores should always run at maximum clock
+speed. Depending on your Linux distribution, CPU frequency scaling
+might be controlled using a graphical interface or a command-line tool
+such as
+cpufreq-selector or
+cpufreq-set. You can also tell the
+OS scheduler to run a Valgrind process on a specific (fixed) CPU using the
+taskset command: running on a fixed
+CPU should ensure that this CPU keeps a high clock frequency.
+
+
+
+