mirror of
https://github.com/Zenithsiz/ftmemsim-valgrind.git
synced 2026-02-03 18:13:01 +00:00
Document the new --fair-sched option.
git-svn-id: svn://svn.valgrind.org/valgrind/trunk@12398
parent 564e685793
commit 6a15dd16e4
6 NEWS
@@ -27,6 +27,11 @@ Release 3.8.0 (????)
* The C++ demangler has been updated so as to work well with C++
compiled by even the most recent g++'s.

* The new option --fair-sched lets you control the locking mechanism
used by Valgrind. The locking mechanism influences the performance
and scheduling of multithreaded applications (in particular
on multiprocessor/multicore systems).

* ==================== FIXED BUGS ====================

The following bugs have been fixed or resolved. Note that "n-i-bz"
@@ -41,6 +46,7 @@ https://bugs.kde.org/show_bug.cgi?id=XXXXXX
where XXXXXX is the bug number as listed below.

247386 make perf does not run all performance tests
270006 Valgrind scheduler unfair
270796 s390x: Removed broken support for the TS insn
271438 Fix configure for proper SSE4.2 detection
273114 s390x: Support TR, TRE, TROO, TROT, TRTO, and TRTT instructions
@@ -1660,6 +1660,44 @@ need to use these.</para>
</listitem>
</varlistentry>

<varlistentry id="opt.fair-sched" xreflabel="--fair-sched">
<term>
<option><![CDATA[--fair-sched=<no|yes|try> [default: no] ]]></option>
</term>

<listitem> <para>The <option>--fair-sched</option> option controls the
locking mechanism used by Valgrind to serialise thread
execution. The locking mechanisms differ in the way the threads
are scheduled, giving different trade-offs between fairness and
performance. For more details about the Valgrind thread
serialisation principle and its impact on performance and thread
scheduling, see <xref linkend="manual-core.pthreads_perf_sched"/>.

<itemizedlist>
<listitem> <para>The value <option>--fair-sched=yes</option>
activates fair scheduling. Basically, if multiple threads are
ready to run, the threads will be scheduled in a round-robin
fashion. This mechanism is not available on all platforms or
Linux versions. If it is not available,
using <option>--fair-sched=yes</option> will cause Valgrind to
terminate with an error.</para>
</listitem>

<listitem> <para>The value <option>--fair-sched=try</option>
activates fair scheduling if it is available on the
platform. Otherwise, it automatically falls back
to <option>--fair-sched=no</option>.</para>
</listitem>

<listitem> <para>The value <option>--fair-sched=no</option> activates
a scheduling mechanism which does not guarantee fairness
between threads ready to run.</para>
</listitem>
</itemizedlist>
</para></listitem>

</varlistentry>

<varlistentry id="opt.kernel-variant" xreflabel="--kernel-variant">
<term>
<option>--kernel-variant=variant1,variant2,...</option>
@@ -1836,8 +1874,8 @@ that your program will use the native threading library, but Valgrind
serialises execution so that only one (kernel) thread is running at a
time. This approach avoids the horrible implementation problems of
implementing a truly multithreaded version of Valgrind, but it does
mean that threaded apps run only on one CPU, even if you have a
multiprocessor or multicore machine.</para>
mean that threaded apps never use more than one CPU simultaneously,
even if you have a multiprocessor or multicore machine.</para>

<para>Valgrind doesn't schedule the threads itself. It merely ensures
that only one thread runs at once, using a simple locking scheme. The
@@ -1860,6 +1898,86 @@ everything is shared (a thread) or nothing is shared (fork-like); partial
sharing will fail.
</para>

<sect2 id="manual-core.pthreads_perf_sched" xreflabel="Scheduling and Multi-Thread Performance">
<title>Scheduling and Multi-Thread Performance</title>

<para>A thread executes code only when it holds the lock. After
executing a certain number of instructions, the running thread releases
the lock. All threads ready to run then compete to acquire the lock.</para>

<para>The option <option>--fair-sched</option> controls the locking mechanism
used to serialise thread execution.</para>

<para>The default pipe-based locking
(<option>--fair-sched=no</option>) is available on all platforms.
Pipe-based locking does not guarantee fairness between threads: it is
quite possible that the thread that has just released the lock
gets it back immediately. With pipe-based locking, different
executions of the same multithreaded application might give very different
thread scheduling.</para>

<para>The futex-based locking is available on some platforms.
If available, it is activated by <option>--fair-sched=yes</option> or
<option>--fair-sched=try</option>. Futex-based locking ensures
fairness between threads: if multiple threads are ready to run, the lock
is given to the thread which first requested the lock. Note that a thread
which is blocked in a system call (e.g. in a blocking read system call) has
not (yet) requested the lock: such a thread requests the lock only after the
system call has finished.</para>

<para>The fairness of the futex-based locking gives better reproducibility
of the thread scheduling across different executions of a multithreaded
application. This fairness and better reproducibility are particularly
useful when using Helgrind or DRD.</para>

<para>The Valgrind thread serialisation implies that only one thread
is running at a time. On a multiprocessor/multicore system, the
running thread is assigned to one of the CPUs by the OS kernel
scheduler. When a thread acquires the lock, sometimes the thread will
be assigned to the same CPU as the thread that just released the
lock. Sometimes, the thread will be assigned to another CPU. When
using pipe-based locking, the thread that just acquired the lock
will often be scheduled on the same CPU as the thread that just
released the lock. With the futex-based mechanism, the thread that
just acquired the lock will more often be scheduled on another
CPU.</para>

<para>The Valgrind thread serialisation and CPU assignment by the OS
kernel scheduler can interact badly with the CPU frequency scaling
available on many modern CPUs: to decrease power consumption, the
frequency of a CPU or core is automatically decreased if the CPU/core
has not been used recently. If the OS kernel often assigns the thread
which just acquired the lock to another CPU/core, there is a good
chance that this CPU/core is currently at a low frequency. The
frequency of this CPU will be increased after some time. However,
during this time, the (only) running thread will have run at a low
frequency. Once this thread has run for some time, it will release
the lock. Another thread will acquire this lock, and might be
scheduled again on another CPU whose clock frequency was decreased in
the meantime.</para>

<para>The futex-based locking causes threads to switch CPU/core more
often. So, if CPU frequency scaling is activated, the futex-based
locking might significantly decrease the performance of a
multithreaded app running under Valgrind (up to 50% degradation has been
observed). The pipe-based locking also interacts somewhat badly with
CPU frequency scaling. Up to 10-20% performance degradation has been
observed.</para>

<para>To avoid this performance degradation, you can tell the
kernel that all CPUs/cores should always run at maximum clock
speed. Depending on your Linux distribution, CPU frequency scaling
might be controlled using a graphical interface or using command-line
tools such as
<computeroutput>cpufreq-selector</computeroutput> or
<computeroutput>cpufreq-set</computeroutput>. You might also tell the
OS scheduler to run a Valgrind process on a specific (fixed) CPU using the
<computeroutput>taskset</computeroutput> command: running on a fixed
CPU should ensure that this specific CPU keeps a high clock frequency.
</para>

</sect2>


</sect1>
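
The fairness rule described in the new manual section ("the lock will be given to the thread which first requested the lock") can be illustrated with a small model. The sketch below is not Valgrind code and says nothing about how the futex-based or pipe-based locks are actually implemented. It only simulates the hand-off order under the assumption that every thread is always ready to run and requests the lock again right after releasing it. The function names `fifo_lock_order` and `unfair_worst_case_order` are hypothetical, chosen for this illustration.

```python
from collections import deque


def fifo_lock_order(thread_ids, slices):
    """Model of a fair lock: the earliest requester always gets the lock.

    thread_ids: threads in the order they first requested the lock.
    slices: number of lock acquisitions (time slices) to simulate.
    Assumption: each thread re-requests the lock as soon as it releases it.
    """
    waiting = deque(thread_ids)
    order = []
    if not waiting:
        return order
    for _ in range(slices):
        holder = waiting.popleft()  # lock goes to the earliest requester
        order.append(holder)
        waiting.append(holder)      # holder releases, then queues up again
    return order


def unfair_worst_case_order(thread_ids, slices):
    """One possible outcome when fairness is not guaranteed: the thread
    that just released the lock wins it back every time."""
    if not thread_ids:
        return []
    return [thread_ids[0]] * slices


if __name__ == "__main__":
    print(fifo_lock_order(["T1", "T2", "T3"], 6))
    print(unfair_worst_case_order(["T1", "T2", "T3"], 6))
```

With three always-ready threads, the fair model cycles through them in round-robin order, while the unfair model shows the extreme case the manual warns about, where one thread keeps re-acquiring the lock. Real runs without the fairness guarantee fall somewhere between the two, which is why their scheduling is harder to reproduce.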