Performance improvements from 4 to 8% obtained on amd64 on the perf tests by:
1. using UNLIKELY inside tracing macros
2. avoid calling CLG_(switch_thread)(tid) on the hot patch setup_bbcc
unless tid differs from CLG_(current_tid).
git-svn-id: svn://svn.valgrind.org/valgrind/trunk@12939