GCC10 defaults to -fno-common which produces this error:
guest_s390_defs.h:291: multiple definition of `s390x_vec_op_t'
This is because GCC10 detects there are multiple definitions of the
variable s390x_vec_op_t. We don't want to define a variable, though.
We wanted to define a type (one that currently isn't used).
Fix this by making it a typedef enum.
https://bugzilla.redhat.com/show_bug.cgi?id=1794482
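The difference between the two declaration forms can be sketched as follows.
This is only an illustration: the enumerator names and the helper function
are hypothetical, not the ones in guest_s390_defs.h.

```c
/* Without 'typedef', a declaration of this shape in a header defines a
   VARIABLE named s390x_vec_op_t in every .c file that includes it:

       enum { ... } s390x_vec_op_t;

   With -fno-common those tentative definitions are no longer merged,
   so the linker reports multiple definitions. */

/* With 'typedef', the same name becomes a TYPE and no storage is
   allocated.  Enumerator names are hypothetical placeholders. */
typedef enum {
   HYPOTHETICAL_VEC_OP_A,
   HYPOTHETICAL_VEC_OP_B
} s390x_vec_op_t;

/* The name can now be used to declare parameters and objects. */
static inline int is_op_b(s390x_vec_op_t op)
{
   return op == HYPOTHETICAL_VEC_OP_B;
}
```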
Mark the immediate as signed for Iex_Get and Ist_Put for Ity_V128 on BE.
The Malu_MADD case in emit_MIPSInst in VEX/priv/host_mips_defs.c expects a
signed immediate, hence the change.
This fixes an assert in host_mips_defs.c.
This might happen when the source contains something like
if (something_involving_pcmpxstrx && foo) { .. }
which might use amd64g_dirtyhelper_PCMPxSTRx.
Instruction decoding was not correct. In some cases, BEQC has been decoded
as BNEC and vice versa.
It caused problems with musl malloc() function.
Patch by Stefan Maksimovic.
UASWM and UALWM have not been implemented correctly.
Code used to implement SWM and LWM has been reused without making all of
the required adjustments.
This fixes memcpy() and memset() libc functions.
This code portion introduced a SEGFAULT:
- if (&i->NMin.Cas.sz){
+ if (i->NMin.Cas.sz == 8) {
The implementation of Ist_Cas has been fixed and missing logging has been
added as well.
During a save (push) instruction, adjusting the SP is required before doing
a store, otherwise Memcheck reports a warning because of a write operation
outside of the stack area.
This splits function iselCondCode into iselCondCode_C and iselCondCode_R. The
former is the old function, which computes boolean expressions into an amd64
condition code. The latter is new, and computes boolean expressions into the
lowest bit of an integer register. This enables much better code generation
for Or1/And1 trees, which now result quite commonly from the new &&-recovery
machinery in the front end.
Until now these have been handled by possibly widening the value to 64 bits,
if necessary, followed by a 64-bit shift. That wastes instructions and code
space.
.. hence treating it as a dependency-breaking idiom. Also handle the
resulting IRConst_V256(0xFFFFFFFF) in the amd64 insn selector.
(dup of 96de5118f5332ae145912ebe91b8fa143df74b8d from 'grail')
Possibly fixes #409429.
This isn't a good result. It merely disables the new functionality on MIPS
because enabling it causes segfaults, even with --tool=none, the cause of
which is not obvious. It is only chasing through conditional branches that
is disabled, though. Chasing through unconditional branches (jumps and calls
to known destinations) is still enabled.
* guest_generic_bb_to_IR.c bb_to_IR(): Disable, hopefully temporarily, the key
&&-recovery transformation on MIPS.
* VEX/priv/host_mips_isel.c iselWordExpr_R_wrk(), iselCondCode_wrk():
- add support for Iop_And1, Iop_Or1, and IRConst_U1. This code is my best
guess about what is correct, but is #if 0'd for now.
- Properly guard some Iex_Binop cases that lacked a leading check that the
expression actually was a Binop.
This isn't a good result. It merely disables the new functionality on s390x,
for the reason stated below.
* guest_generic_bb_to_IR.c bb_to_IR(): Disable, hopefully temporarily, the key
&&-recovery transformation on s390x, since it causes Memcheck to crash for
reasons I couldn't figure out. It also exposes some missing Iex_ITE cases
in the s390x insn selector, although those shouldn't be a big deal to fix.
Maybe it's some strangeness to do with the s390x "ex" instruction. I don't
exactly understand how that trickery works, but from some study of it, I
didn't see anything obviously wrong.
It is only chasing through conditional branches that is disabled for s390x.
Chasing through unconditional branches (jumps and calls to known
destinations) is still enabled.
* host_s390_isel.c s390_isel_cc(): No functional change. Code has been added
here to handle the new Iop_And1 and Iop_Or1, and it is somewhat tested, but
is not needed until conditional branch chasing is enabled on s390x.
* do_minimal_initial_iropt_BB: for ppc64, flatten rather than assert flatness.
(Kludge. Sigh.)
* priv/host_ppc_isel.c iselCondCode_wrk(): handle And1 and Or1, the
not-particularly-optimal way
* priv/host_ppc_isel.c iselCondCode_wrk(): handle Ico_U1(0).
* priv/guest_generic_bb_to_IR.c expr_is_guardable(), stmt_is_guardable():
add some missing cases
* do_minimal_initial_iropt_BB: add comment (no functional change)
* priv/host_arm_isel.c iselCondCode_wrk(): handle And1 and Or1, the
not-particularly-optimal way
* guest_arm64_toIR.c: use |sigill_diag| to guard auxiliary diagnostic printing
in case of decode failure
* guest_generic_bb_to_IR.c expr_is_guardable(), stmt_is_guardable(): handle a
few more cases that didn't turn up so far on x86 or amd64
* host_arm64_defs.[ch]:
- new instruction ARM64Instr_Set64, to copy a condition code value into a
register (the CSET instruction)
- use this to reimplement Iop_And1 and Iop_Or1
* Rewrite do_minimal_initial_iropt_BB so it doesn't do full constant folding;
that is unnecessary expense at this point, and later passes will do it
anyway
* do_iropt_BB: don't flatten the incoming block, because
do_minimal_initial_iropt_BB will have run earlier and done so. But at least
for the moment, assert that it really is flat.
* VEX/priv/guest_generic_bb_to_IR.c create_self_checks_as_needed: generate
flat IR so as not to fail the abovementioned assertion.
I believe this completes the target-independent aspects of this work, and also
the x86_64 specifics (of which there are very few).
* removes --vex-guest-chase-cond=no|yes. This was never used in practice.
* rename --vex-guest-chase-thresh=<0..99> to --vex-guest-chase=no|yes. In
other words, downgrade it from a numeric flag to a boolean one, that can
simply disable all chasing if required. (Some tools, notably Callgrind,
force-disable block chasing, so this functionality at least needs to be
retained).
* document some functions
* change naming and terminology from 'speculation' (which it isn't)
to 'guarding' (which it is)
* add a new function |primopMightTrap| so as to avoid conditionalising
IRExprs involving potentially trappy IROps
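A minimal sketch of the idea behind such a predicate is shown below. The
enum and the function name are hypothetical, heavily reduced stand-ins, not
VEX's IROp type or the real |primopMightTrap|.

```c
#include <stdbool.h>

/* Hypothetical, heavily reduced stand-in for an IR operation enum. */
typedef enum {
   DEMO_OP_ADD32,
   DEMO_OP_AND32,
   DEMO_OP_SHL32,
   DEMO_OP_DIV_S32,
   DEMO_OP_REM_U32
} DemoOp;

/* Returns true if evaluating the op could raise a trap (for example a
   division by zero).  An expression using such an op must not be
   evaluated when its guard is false, so it cannot be conditionalised. */
bool demo_op_might_trap(DemoOp op)
{
   switch (op) {
      case DEMO_OP_ADD32:
      case DEMO_OP_AND32:
      case DEMO_OP_SHL32:
         return false;
      case DEMO_OP_DIV_S32:
      case DEMO_OP_REM_U32:
         return true;
   }
   return true; /* conservative default for anything unrecognised */
}
```

Defaulting to "might trap" for unknown operations errs on the safe side: the
cost is a missed transformation rather than a spurious trap.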
This branch contains code which avoids Memcheck false positives resulting from
gcc and clang creating branches on uninitialised data. For example:
bool isClosed;
if (src.isRect(..., &isClosed, ...) && isClosed) {
clang9 -O2 compiles this as:
callq 7e7cdc0 <_ZNK6SkPath6isRectEP6SkRectPbPNS_9DirectionE>
cmpb $0x0,-0x60(%rbp) // "if (isClosed) { .."
je 7ed9e08 // "je after"
test %al,%al // "if (return value of call is nonzero) { .."
je 7ed9e08 // "je after"
..
after:
That is, the && has been evaluated right-to-left. This is a correct
transformation if the compiler can prove that the call to |isRect| returns
|false| along any path on which it does not write its out-parameter
|&isClosed|.
In general, for the lazy-semantics (L->R) C-source-level && operator, we have
|A && B| == |B && A| if you can prove that |B| is |false| whenever A is
undefined. I assume that clang has some kind of interprocedural analysis that
tells it that. The compiler is further obliged to show that |B| won't trap,
since it is now being evaluated speculatively, but that's no big deal to
prove.
A similar result holds, per de Morgan, for transformations involving the C
language ||.
Memcheck correctly handles bitwise &&/|| in the presence of undefined inputs.
It has done so since the beginning. However, it assumes that every
conditional branch in the program is important -- any branch on uninitialised
data is an error. However, this idiom demonstrates otherwise. It defeats
Memcheck's existing &&/|| handling because the &&/|| is spread across two
basic blocks, rather than being bitwise.
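The kind of definedness rule referred to here can be sketched for a single
bit as follows. This is an illustration under stated assumptions, not
Memcheck's actual code: the type and function names are hypothetical, and
shadow bits use the convention 1 = undefined, 0 = defined.

```c
#include <stdint.h>

/* A 1-bit value together with its shadow bit (1 = undefined). */
typedef struct {
   uint8_t val;
   uint8_t vbits;
} Shadowed1;

/* Shadow bit of (a AND b).  The result is defined if both inputs are
   defined, or if either input is a defined 0, because a defined 0
   forces the result to 0 regardless of the other operand. */
uint8_t demo_and1_vbits(Shadowed1 a, Shadowed1 b)
{
   uint8_t either_undef = a.vbits | b.vbits; /* naive pessimistic rule */
   uint8_t a_not_def0   = a.val | a.vbits;   /* 0 only for a defined 0 */
   uint8_t b_not_def0   = b.val | b.vbits;   /* 0 only for a defined 0 */
   return (uint8_t)(either_undef & a_not_def0 & b_not_def0 & 1);
}
```

With this rule, `defined 0 AND undefined` is reported as defined, which is
exactly the case that matters for the right-to-left evaluated && above.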
This initial commit contains a complete initial implementation to fix that.
The basic idea is to detect the && condition spread across two blocks, and
transform it into a single block using bitwise &&. Then Memcheck's existing
accurate instrumentation of bitwise && will correctly handle it. The
transformation is
<contents of basic block A>
C1 = ...
if (!C1) goto after
.. falls through to ..
<contents of basic block B>
C2 = ...
if (!C2) goto after
.. falls through to ..
after:
===>
<contents of basic block A>
C1 = ...
<contents of basic block B, conditional on C1>
C2 = ...
if (!(C1 && C2)) goto after
.. falls through to ..
after:
This assumes that <contents of basic block B> can be conditionalised, at the
IR level, so that the guest state is not modified if C1 is |false|. That's
not possible for all IRStmt kinds, but it is possible for a large enough
subset to make this transformation feasible.
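The equivalence of the two shapes can be sketched in C. This is a toy model,
not VEX IR: the guest state, the single write performed by block B, and the
function names are all hypothetical.

```c
#include <stdbool.h>

/* Hypothetical guest state: one register that block B writes. */
typedef struct {
   int reg;
} DemoState;

/* Original shape: two blocks, each ending in a conditional branch.
   Returns true if control falls through both, false for "goto after". */
bool demo_two_blocks(DemoState* st, bool c1, bool c2)
{
   /* <contents of basic block A> */
   if (!c1) return false;   /* goto after */
   /* <contents of basic block B> */
   st->reg = 42;
   if (!c2) return false;   /* goto after */
   return true;
}

/* Merged shape: block B is guarded by c1, and a single exit tests the
   non-lazy AND of both conditions. */
bool demo_merged_block(DemoState* st, bool c1, bool c2)
{
   /* <contents of basic block A> */
   /* <contents of basic block B, conditional on c1> */
   st->reg = c1 ? 42 : st->reg;
   bool both = c1 & c2;     /* both operands are always evaluated */
   if (!both) return false; /* goto after */
   return true;
}
```

For all four combinations of c1 and c2, both functions return the same
result and leave the state identical, which is the property the guarded
form of block B has to preserve.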
There is no corresponding transformation that recovers an || condition,
because, per de Morgan, that merely corresponds to swapping the side exits vs
fallthroughs, and inverting the sense of the tests, and the pattern-recogniser
as implemented checks all possible combinations already.
The analysis and block-building is performed on the IR returned by the
architecture specific front ends. So they are almost not modified at all: in
fact they are simplified because all logic related to chasing through
unconditional and conditional branches has been removed from them, redone at
the IR level, and centralised.
The only file with big changes is the IRSB constructor logic,
guest_generic_bb_to_IR.c (a.k.a. the "trace builder"). This is a complete
rewrite.
There is some additional work for the IR optimiser (ir_opt.c), since that
needs to do a quick initial simplification pass of the basic blocks, in order
to reduce the number of different IR variants that the trace-builder has to
pattern match on. An important followup task is to further reduce this cost.
There are two new IROps to support this: And1 and Or1, which both operate on
Ity_I1. They are regarded as evaluating both arguments, consistent with AndXX
and OrXX for all other sizes. It is possible to synthesise them at the IR level by
widening the value to Ity_I8 or above, doing bitwise And/Or, and re-narrowing
it, but this gives inefficient code, so I chose to represent them directly.
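The widening route can be sketched in C as below. The function names are
hypothetical, and the three steps stand in for the widen, bitwise-op and
narrow operations at the IR level; no specific IROp names are implied.

```c
#include <stdbool.h>
#include <stdint.h>

/* And on 1-bit values, synthesised via an 8-bit intermediate. */
bool demo_and1_via_i8(bool a, bool b)
{
   uint8_t wa = a ? 1 : 0;         /* widen a to 8 bits */
   uint8_t wb = b ? 1 : 0;         /* widen b to 8 bits */
   uint8_t r  = (uint8_t)(wa & wb); /* 8-bit bitwise And */
   return (r & 1) != 0;            /* narrow back to 1 bit */
}

/* Or on 1-bit values, synthesised the same way. */
bool demo_or1_via_i8(bool a, bool b)
{
   uint8_t wa = a ? 1 : 0;
   uint8_t wb = b ? 1 : 0;
   uint8_t r  = (uint8_t)(wa | wb);
   return (r & 1) != 0;
}
```

Each result costs two widenings, one bitwise operation and one narrowing,
which illustrates why a direct 1-bit operation is cheaper to select code for.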
The transformation appears to work for amd64-linux. In principle -- because
it operates entirely at the IR level -- it should work for all targets,
providing the initial pre-simplification pass can normalise the block ends
into the required form. That will no doubt require some tuning. And1 and Or1
will have to be implemented in all instruction selectors, but that's easy
enough.
Remaining FIXMEs in the code:
* Rename `expr_is_speculatable` et al to `expr_is_conditionalisable`. These
functions merely conditionalise code; the speculation has already been done
by gcc/clang.
* `expr_is_speculatable`: properly check that Iex_Unop/Binop don't contain
operations that might trap (Div, Rem, etc).
* `analyse_block_end`: recognise all block ends, and abort on ones that can't
be recognised. Needed to ensure we don't miss any cases.
* maybe: guest_amd64_toIR.c: generate better code for And1/Or1
* ir_opt.c, do_iropt_BB: remove the initial flattening pass since presimp
will already have done it
* ir_opt.c, do_minimal_initial_iropt_BB (a.k.a. presimp). Make this as
cheap as possible. In particular, calling `cprop_BB_wrk` is total overkill
since we only need copy propagation.
* ir_opt.c: once the above is done, remove boolean parameter for `cprop_BB_wrk`.
* ir_opt.c: concatenate_irsbs: maybe de-dup w.r.t. maybe_unroll_loop_BB.
* remove option `guest_chase_cond` from VexControl (?). It was never used.
* convert option `guest_chase_thresh` from VexControl (?) into a Bool, since
the revised code here only cares about the 0-vs-nonzero distinction now.
Each member of the structure declaration for `VexGuestS390XState' is
commented with its offset within the structure. But starting with
`guest_r0' and for all remaining members, these comments indicate the
wrong offsets, and the actual offsets are 8 bytes higher. Adjust the
comments accordingly.
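One way to keep such offset comments from drifting is to check them at
compile time. The sketch below uses a hypothetical miniature structure, not
the real `VexGuestS390XState' layout, and assumes a C11 compiler for
_Static_assert.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical miniature guest state; field names are placeholders. */
typedef struct {
   /*  0 */ uint64_t demo_a0;
   /*  8 */ uint64_t demo_a1;
   /* 16 */ uint32_t demo_flags;
   /* 20 */ uint32_t demo_pad;
   /* 24 */ uint64_t demo_r0;
} DemoGuestState;

/* If a field is inserted earlier in the structure, these fail to
   compile instead of leaving the comments silently wrong. */
_Static_assert(offsetof(DemoGuestState, demo_flags) == 16,
               "offset comment for demo_flags is stale");
_Static_assert(offsetof(DemoGuestState, demo_r0) == 24,
               "offset comment for demo_r0 is stale");
```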
Necessary changes to support nanoMIPS on Linux.
Part 1/4 - VEX changes
Patch by Aleksandar Rikalo, Dimitrije Nikolic, Tamara Vlahovic and
Aleksandra Karadzic.
nanoMIPS architecture in brief
Designed for embedded devices, nanoMIPS is a variable-length instruction
set architecture (ISA) offering high performance in substantially reduced
code size.
The nanoMIPS ISA combines recoded and new 16-, 32-, and 48-bit instructions
to achieve an ideal balance of performance and code density.
It incorporates all MIPS32 instructions and architecture modules including
MIPS DSP and MIPS MT, as well as new instructions for advanced code size
reduction.
nanoMIPS is supported in release 6 of the MIPS architecture. It is first
implemented in the new MIPS I7200 multi-threaded multi-core processor
series. Compiler support is included in the MIPS GNU-based development
tools.
Related KDE issue: #400872.
(from bug 404406 comment 0):
Valgrind on s390x currently lacks support for the miscellaneous
instruction extensions facility 2.
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Fix false positives when invoking s390-check-opcodes.pl. Also clean up
some code formatting issues in that script. Add the instructions TPEI and
IRBM to guest_s390_toIR.c and s390-opcodes.csv, so they are no longer
warned about.
Add IBM z14 and IBM z14 ZR1 to the list of known machine models. Add an
expected output variant for z14 to the s390x-specific "ecag" test case.
In README.s390, refer to a current version of the z/Architecture
Principles of Operation that describes the instructions introduced with
IBM z14.
The amd64 CPUID dirtyhelpers are mostly static since they emulate some
existing CPU "family". The avx2 ("i7-4910MQ") CPUID variant however
can "dynamically" enable rdrand and/or f16c if the host supports them.
Do the same for the avx_and_cx16 ("i5-2300") CPUID variant.
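The general shape of such a dynamic adjustment can be sketched as follows.
The function and parameter names are hypothetical and this is not the VEX
dirtyhelper code; the bit positions are those of CPUID leaf 1, register ECX
(bit 29 = F16C, bit 30 = RDRAND).

```c
#include <stdbool.h>
#include <stdint.h>

#define DEMO_CPUID1_ECX_F16C   (1u << 29)
#define DEMO_CPUID1_ECX_RDRAND (1u << 30)

/* baseline_ecx is the fixed leaf-1 ECX value of the emulated CPU model.
   The two feature bits are cleared first and then set only if the host
   actually supports the corresponding instructions. */
uint32_t demo_cpuid1_ecx(uint32_t baseline_ecx,
                         bool host_has_f16c, bool host_has_rdrand)
{
   uint32_t ecx = baseline_ecx
                  & ~(DEMO_CPUID1_ECX_F16C | DEMO_CPUID1_ECX_RDRAND);
   if (host_has_f16c)
      ecx |= DEMO_CPUID1_ECX_F16C;
   if (host_has_rdrand)
      ecx |= DEMO_CPUID1_ECX_RDRAND;
   return ecx;
}
```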
https://bugs.kde.org/show_bug.cgi?id=408009
Add Iop_Exp2_32Fx4 to VEX/pub/libvex_ir.h to support the 2^x instruction.
Enable the existing test support for the two instructions in
none/tests/ppc64/subnormal_test.c and none/tests/ppc64/jm-insns.c.
https://bugs.kde.org/show_bug.cgi?id=407340
The results of the floating point instructions vmaddfp, vnmsubfp,
vaddfp, vsubfp, vmaxfp, vminfp, vrefp, vrsqrtefp, vcmpeqfp, vcmpgefp,
and vcmpgtfp are controlled by the setting of the NJ bit in the VSCR
register. If VSCR[NJ] = 0, then denormalized values are handled as
specified by Java and the IEEE standard. If the bit is 1, then each
denormalized element in the vector is replaced with a zero.
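The effect of the NJ bit on a single element can be sketched as below. The
function name is hypothetical and this is not the VEX implementation. It
assumes IEEE 754 single precision, and keeping the sign of the flushed value
is an assumption of this sketch, not something stated above.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Apply the NJ policy to one single-precision element. */
float demo_apply_nj(float x, bool nj)
{
   uint32_t bits;
   memcpy(&bits, &x, sizeof bits);

   /* Denormal: exponent field all zeroes, fraction field nonzero. */
   bool is_denormal = (bits & 0x7F800000u) == 0
                      && (bits & 0x007FFFFFu) != 0;

   if (nj && is_denormal) {
      bits &= 0x80000000u;  /* assumption: keep only the sign bit */
      memcpy(&x, &bits, sizeof x);
   }
   return x;
}
```

With nj false the value passes through unchanged, so denormals keep their
IEEE meaning; with nj true only denormal elements are altered.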
Valgrind was not properly handling the denormalized case for these
instructions. This patch fixes the issue.
https://bugs.kde.org/show_bug.cgi?id=406256