From ec83134fa2c183f7acb328c37b13789bf2ad591c Mon Sep 17 00:00:00 2001 From: Philippe Waroquiers Date: Sun, 24 Feb 2013 23:16:58 +0000 Subject: [PATCH] fix 315545 (find_TTEntry_from_hcode): Assertion '(UChar*)sec->tt[tteNo].tcptr <= (UChar*)hcode' failed Assertion valgrind: m_transtab.c:674 (find_TTEntry_from_hcode): Assertion '(UChar*)sec->tt[tteNo].tcptr <= (UChar*)hcode' failed. failure (encountered on some platforms while running gdbsrv tests). The problem is related to invalidated entries and the host_extents mapping between hostcode and the translation table entry. The problem: when an entry is invalidated, the translation table entry is changed to status Deleted. However, the host extent array element is not cleaned up. If a search for a host code address (find_TTEntry_from_hcode) finds this entry, the translation table entry in Deleted status is considered as a 'not found', which ensures that the invalidated entry is not used (e.g. for chaining). This is all ok. However, it might be that this Deleted entry is re-used (see function VG_(add_to_transtab), searching for a Empty or Deleted entry. If the Deleted entry is re-used, then a search for the dead host code can give a result pointing to the re-used entry. That is clearly wrong. Note that it is unclear if this bug can only be triggered while using gdbsrv or if this bug can be triggered with just the "normal" invalidation logic of translation. gdbsrv being a heavy "user" of invalidation, it might be it helps to trigger the code. Alternatively, as gdbsrv invalidation is special (e.g. invalidation of some entries is done during translation of other entries), it might be the bug is specific to gdbsrv. In any case, to avoid the bug: searching for an host code address must not only ignore Deleted entries, but must also ignore an entry found via a host_extent element which is for a Deleted entry that was re-used afterwards (pointed to by a newer host_extent element). Multiple solutions are possible for fixing the bug: Sol1: cleanup the host_extents array when an entry is deleted. The cleanup is however deemed costly: Each invalidate operation must do a search in the host_extents. The host_extents array must then be "compacted" to remove the "dead" host extent element from the array. The compact operation can be avoided if instead of removing the element, one marks instead the element as "dead" e.g. by using one bit of UInt len for that: UInt len : 31; Bool dead : 1; This avoids the compact, but still incurrs the cost of search and modify the host_extent for each entry invalidated. Invalidating entries seems to be a critical operation (e.g. specific ECLASS related data structures have been done to allow fast deletion). => it is deemed that a solution not incurring cost during invaliation is preferrable. * Sol 2: detect in find_TTEntry_from_hcode that the host_extent element is re-used, and handle it similarly to an host_extents which points at a Deleted entry. This detection is possible as if an entry is re-used after having been deleted, this implies that its host code will be after the end of the host code of the deleted entry (as host code of a sector is not re-used). The attached patch implements this solution. * Sol 3: avoid re-using an entry : the entry would then stay in Deleted state. This is deemed not ok as it would imply that invalidation of entries will cause a sector to become full faster. The patch: * adds a new function Bool HostExtent__is_dead (const HostExtent* hx, const Sector* sec) telling if the host extent hx from sector sec is a dead entry. * this function is used in find_TTEntry_from_hcode so that dead host extents are not resulting in host code to be found. * adds a regression test which caused the assert failure before (bug was found/reported/isolated in a small test case by Dejan Jevtic). * To check the logic of HostExtent__is_dead, m_transtab.c sanity check is completed to verify that the nr of entries in use in a sector is equal to the nr of non dead entries in the host extent array. * adds/improves traces in m_transtab.c (enabled at compile time using #define DEBUG_TRANSTAB). Some already existing 'if (0)' conditions are replaced by if (DEBUG_TRANSTAB) Regression tested on f12/x86 debian6/amd64 (also with export EXTRA_REGTEST_OPTS=--sanity-level=4) git-svn-id: svn://svn.valgrind.org/valgrind/trunk@13290 --- NEWS | 1 + coregrind/m_transtab.c | 86 ++++++++++++++++--- gdbserver_tests/Makefile.am | 5 ++ gdbserver_tests/nlself_invalidate.stderr.exp | 8 ++ gdbserver_tests/nlself_invalidate.stderrB.exp | 1 + gdbserver_tests/nlself_invalidate.stdinB.gdb | 14 +++ gdbserver_tests/nlself_invalidate.vgtest | 12 +++ gdbserver_tests/self_invalidate.c | 38 ++++++++ 8 files changed, 152 insertions(+), 13 deletions(-) create mode 100644 gdbserver_tests/nlself_invalidate.stderr.exp create mode 100644 gdbserver_tests/nlself_invalidate.stderrB.exp create mode 100644 gdbserver_tests/nlself_invalidate.stdinB.gdb create mode 100644 gdbserver_tests/nlself_invalidate.vgtest create mode 100644 gdbserver_tests/self_invalidate.c diff --git a/NEWS b/NEWS index 6ba5500bb..ce8a208e2 100644 --- a/NEWS +++ b/NEWS @@ -86,6 +86,7 @@ m = merged into 3_8_BRANCH 312913 [390] Dangling pointers error should also report the alloc stack trace 312980 [390] Building on Mountain Lion generates some compiler warnings 313811 [390] Buffer overflow in assert_fail +315545 [390] (find_TTEntry_from_hcode): Assertion '(UChar*)sec->tt[tteNo].tcptr <= (UChar*)hcode' failed n-i-bz [390] report error for vgdb snapshot requested before execution n-i-bz [390] Some wrong command line options could be ignored n-i-bz [390] same as 303624 (fixed in 3.8.0), but for x86 android diff --git a/coregrind/m_transtab.c b/coregrind/m_transtab.c index 0627aa337..566847ce8 100644 --- a/coregrind/m_transtab.c +++ b/coregrind/m_transtab.c @@ -46,7 +46,7 @@ #include "pub_core_dispatch.h" // For VG_(disp_cp*) addresses -/* #define DEBUG_TRANSTAB */ +#define DEBUG_TRANSTAB 0 /*-------------------------------------------------------------*/ @@ -632,6 +632,40 @@ Int HostExtent__cmpOrd ( const void* v1, const void* v2 ) return 0; /* partial overlap */ } +/* True if hx is a dead host extent, i.e. corresponds to host code + of an entry that was invalidated. */ +static +Bool HostExtent__is_dead (const HostExtent* hx, const Sector* sec) +{ + const UInt tteNo = hx->tteNo; +#define LDEBUG(m) if (DEBUG_TRANSTAB) \ + VG_(printf) (m \ + " start 0x%p len %u sector %d ttslot %u" \ + " tt.entry 0x%llu tt.tcptr 0x%p\n", \ + hx->start, hx->len, (int)(sec - sectors), \ + hx->tteNo, \ + sec->tt[tteNo].entry, sec->tt[tteNo].tcptr) + + /* Entry might have been invalidated and not re-used yet.*/ + if (sec->tt[tteNo].status == Deleted) { + LDEBUG("found deleted entry"); + return True; + } + /* Maybe we found this entry via a host_extents which was + inserted for an entry which was changed to Deleted then + re-used after. If this entry was re-used, then its tcptr + is >= to host_extents start (i.e. the previous tcptr) + len. + This is the case as there is no re-use of host code: a new + entry or re-used entry always gets "higher value" host code. */ + if ((UChar*) sec->tt[tteNo].tcptr >= hx->start + hx->len) { + LDEBUG("found re-used entry"); + return True; + } + + return False; +#undef LDEBUG +} + static __attribute__((noinline)) Bool find_TTEntry_from_hcode( /*OUT*/UInt* from_sNo, /*OUT*/UInt* from_tteNo, @@ -663,10 +697,12 @@ Bool find_TTEntry_from_hcode( /*OUT*/UInt* from_sNo, UInt tteNo = hx->tteNo; /* Do some additional sanity checks. */ vg_assert(tteNo <= N_TTES_PER_SECTOR); - /* Entry might have been invalidated. Consider this - as not found. */ - if (sec->tt[tteNo].status == Deleted) + + /* if this hx entry corresponds to dead host code, we must + tell this code has not been found, as it cannot be patched. */ + if (HostExtent__is_dead (hx, sec)) return False; + vg_assert(sec->tt[tteNo].status == InUse); /* Can only half check that the found TTEntry contains hcode, due to not having a length value for the hcode in the @@ -834,11 +870,14 @@ static void unchain_in_preparation_for_deletion ( VexArch vex_arch, UInt here_sNo, UInt here_tteNo ) { - if (0) - VG_(printf)("QQQ unchain_in_prep %u.%u\n", here_sNo, here_tteNo); + if (DEBUG_TRANSTAB) + VG_(printf)("QQQ unchain_in_prep %u.%u...\n", here_sNo, here_tteNo); UWord i, j, n, m; Int evCheckSzB = LibVEX_evCheckSzB(vex_arch); TTEntry* here_tte = index_tte(here_sNo, here_tteNo); + if (DEBUG_TRANSTAB) + VG_(printf)("... QQQ tt.entry 0x%llu tt.tcptr 0x%p\n", + here_tte->entry, here_tte->tcptr); vg_assert(here_tte->status == InUse); /* Visit all InEdges owned by here_tte. */ @@ -984,7 +1023,7 @@ UInt addEClassNo ( /*MOD*/Sector* sec, Int ec, UShort tteno ) vg_assert(ec >= 0 && ec < ECLASS_N); vg_assert(tteno < N_TTES_PER_SECTOR); - if (0) VG_(printf)("ec %d gets %d\n", ec, (Int)tteno); + if (DEBUG_TRANSTAB) VG_(printf)("ec %d gets %d\n", ec, (Int)tteno); if (sec->ec2tte_used[ec] >= sec->ec2tte_size[ec]) { @@ -1002,7 +1041,7 @@ UInt addEClassNo ( /*MOD*/Sector* sec, Int ec, UShort tteno ) sec->ec2tte_size[ec] = new_sz; sec->ec2tte[ec] = new_ar; - if (0) VG_(printf)("expand ec %d to %d\n", ec, new_sz); + if (DEBUG_TRANSTAB) VG_(printf)("expand ec %d to %d\n", ec, new_sz); } /* Common case */ @@ -1215,13 +1254,29 @@ static Bool sanity_check_all_sectors ( void ) Bool sane; Sector* sec; for (sno = 0; sno < N_SECTORS; sno++) { + Int i; + Int nr_not_dead_hx = 0; + Int szhxa; sec = §ors[sno]; if (sec->tc == NULL) continue; sane = sanity_check_eclasses_in_sector( sec ); if (!sane) return False; + szhxa = VG_(sizeXA)(sec->host_extents); + for (i = 0; i < szhxa; i++) { + const HostExtent* hx = VG_(indexXA)(sec->host_extents, i); + if (!HostExtent__is_dead (hx, sec)) + nr_not_dead_hx++; + } + if (nr_not_dead_hx != sec->tt_n_inuse) { + VG_(debugLog)(0, "transtab", + "nr_not_dead_hx %d sanity fail (expected == in use %d)\n", + nr_not_dead_hx, sec->tt_n_inuse); + return False; + } } + if ( !sanity_check_redir_tt_tc() ) return False; if ( !sanity_check_sector_search_order() ) @@ -1375,7 +1430,8 @@ static void initialiseSector ( Int sno ) VG_(machine_get_VexArchInfo)( &vex_arch, NULL ); /* Visit each just-about-to-be-abandoned translation. */ - if (0) VG_(printf)("QQQ unlink-entire-sector: %d START\n", sno); + if (DEBUG_TRANSTAB) VG_(printf)("QQQ unlink-entire-sector: %d START\n", + sno); for (i = 0; i < N_TTES_PER_SECTOR; i++) { if (sec->tt[i].status == InUse) { vg_assert(sec->tt[i].n_tte2ec >= 1); @@ -1394,7 +1450,8 @@ static void initialiseSector ( Int sno ) sec->tt[i].status = Empty; sec->tt[i].n_tte2ec = 0; } - if (0) VG_(printf)("QQQ unlink-entire-sector: %d END\n", sno); + if (DEBUG_TRANSTAB) VG_(printf)("QQQ unlink-entire-sector: %d END\n", + sno); /* Free up the eclass structures. */ for (i = 0; i < ECLASS_N; i++) { @@ -1467,8 +1524,8 @@ void VG_(add_to_transtab)( VexGuestExtents* vge, /* Generally stay sane */ vg_assert(n_guest_instrs < 200); /* it can be zero, tho */ - if (0) - VG_(printf)("add_to_transtab(entry = 0x%llx, len = %d)\n", + if (DEBUG_TRANSTAB) + VG_(printf)("add_to_transtab(entry = 0x%llx, len = %d) ...\n", entry, code_len); n_in_count++; @@ -1593,6 +1650,9 @@ void VG_(add_to_transtab)( VexGuestExtents* vge, vg_assert(hx_prev->start + hx_prev->len <= hx.start); } VG_(addToXA)(hx_array, &hx); + if (DEBUG_TRANSTAB) + VG_(printf)("... hx.start 0x%p hx.len %u sector %d ttslot %d\n", + hx.start, hx.len, y, i); } /* Update the fast-cache. */ @@ -2228,7 +2288,7 @@ void VG_(print_tt_tc_stats) ( void ) " transtab: discarded %'llu (%'llu -> ?" "?)\n", n_disc_count, n_disc_osize ); - if (0) { + if (DEBUG_TRANSTAB) { Int i; VG_(printf)("\n"); for (i = 0; i < ECLASS_N; i++) { diff --git a/gdbserver_tests/Makefile.am b/gdbserver_tests/Makefile.am index 940a49bf3..1e0e42d01 100644 --- a/gdbserver_tests/Makefile.am +++ b/gdbserver_tests/Makefile.am @@ -112,6 +112,10 @@ EXTRA_DIST = \ nlpasssigalrm.stderr.exp \ nlpasssigalrm.stdinB.gdb \ nlpasssigalrm.stdoutB.exp \ + nlself_invalidate.stderrB.exp \ + nlself_invalidate.stderr.exp \ + nlself_invalidate.stdinB.gdb \ + nlself_invalidate.vgtest \ nlsigvgdb.vgtest \ nlsigvgdb.stderr.exp \ nlsigvgdb.stderrB.exp \ @@ -123,6 +127,7 @@ check_PROGRAMS = \ gone \ main_pic \ passsigalrm \ + self_invalidate \ sleepers \ t \ watchpoints diff --git a/gdbserver_tests/nlself_invalidate.stderr.exp b/gdbserver_tests/nlself_invalidate.stderr.exp new file mode 100644 index 000000000..911b3e4ab --- /dev/null +++ b/gdbserver_tests/nlself_invalidate.stderr.exp @@ -0,0 +1,8 @@ +Nulgrind, the minimal Valgrind tool + +(action at startup) vgdb me ... + + + + + diff --git a/gdbserver_tests/nlself_invalidate.stderrB.exp b/gdbserver_tests/nlself_invalidate.stderrB.exp new file mode 100644 index 000000000..c8b202462 --- /dev/null +++ b/gdbserver_tests/nlself_invalidate.stderrB.exp @@ -0,0 +1 @@ +relaying data between gdb and process .... diff --git a/gdbserver_tests/nlself_invalidate.stdinB.gdb b/gdbserver_tests/nlself_invalidate.stdinB.gdb new file mode 100644 index 000000000..e8a22ff0d --- /dev/null +++ b/gdbserver_tests/nlself_invalidate.stdinB.gdb @@ -0,0 +1,14 @@ +# connect gdb to Valgrind gdbserver: +target remote | ./vgdb --wait=60 --vgdb-prefix=./vgdb-prefix-nlself_invalidate +echo vgdb launched process attached\n +# place a breakpoint that will cause an invalidation of the currently executed translation +break *top +# Now, continue till the end. This should not crash +continue +continue +continue +continue +continue +continue +# and the process should exit very quickly now +quit diff --git a/gdbserver_tests/nlself_invalidate.vgtest b/gdbserver_tests/nlself_invalidate.vgtest new file mode 100644 index 000000000..3902538bc --- /dev/null +++ b/gdbserver_tests/nlself_invalidate.vgtest @@ -0,0 +1,12 @@ +# reproduces a bug triggered by translation chaining and gdbserver +# invalidation due to breakpoint, when there is a jump from a tranlation +# block into itself. +prog: self_invalidate +vgopts: --tool=none --vgdb=yes --vgdb-error=0 --vgdb-prefix=./vgdb-prefix-nlself_invalidate +stderr_filter: filter_stderr +prereq: test -e gdb && ( ../tests/arch_test amd64 || ../tests/arch_test mips32 || ../tests/arch_test mips64 ) +progB: gdb +argsB: --quiet -l 60 --nx ./self_invalidate +stdinB: nlself_invalidate.stdinB.gdb +stdoutB_filter: filter_make_empty +stderrB_filter: filter_gdb diff --git a/gdbserver_tests/self_invalidate.c b/gdbserver_tests/self_invalidate.c new file mode 100644 index 000000000..52b3d2bef --- /dev/null +++ b/gdbserver_tests/self_invalidate.c @@ -0,0 +1,38 @@ +/* Based on reproducer done by Dejan Jevtic */ +int main (void) +{ +#if defined(VGA_amd64) + /* Note that small changes in the code below + caused the bug not to be triggered anymore. + E.g. removing the dec %%eax avoids the assert + while this removal causes one more loop to be + executed. */ +__asm__ __volatile__ + ("mov $30, %%eax\n\t" + "top:\n\t" + "mov $-4, %%ebx\n\t" + "add %%ebx, %%eax\n\t" + "dec %%eax\n\t" + "cmp $0x0,%%eax\n\t" + "jne top\n\t" + "mov $60, %%eax\n\t" + "mov $0, %%rdi\n\t" + "syscall\n\t" + : : : "eax", "ebx", "rdi" + ); +#elif defined(VGA_mips32) || defined(VGA_mips64) +__asm__ __volatile__ + ("li $t0, 42\n\t" + "top:\n\t" + "li $t1, -4\n\t" + "addu $t0, $t0, $t1\n\t" + "li $t2, -2\n\t" + "addu $t0, $t0, $t2\n\t" + "addiu $t0, $t0, -1\n\t" + "bnez $t0, top\n\t" + "nop\n\t" + : : : "t0", "t1" + ); +#endif + return 0; +}