This corrects a valgrind instruction emulation issue revealed by
a GCC change.
The xscvdpsp, xscvdpspn, xscvdpuxws instructions each convert
double precision values to single precision values, and write
the results into bits 0-31 of the 128-bit target register.
To get the value into the normal position for a scalar register,
the result needed to be right-shifted by 32 bits, so gcc always
did that.
It was determined that hardware also always did that, so the (redundant)
gcc shift was removed.
This exposed an issue because valgrind was only writing the result to
bits 0-31 of the target register.
This patch updates the emulation to write the result to both of the involved
32-bit fields.
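As an illustrative sketch (plain C, not the actual VEX IR; the helper name is invented), the fixed write-back for the xscvdpsp case duplicates the converted value into both affected words:

```c
#include <stdint.h>
#include <string.h>

/* Illustrative model of the fixed write-back for xscvdpsp: the 32-bit
   single-precision result lands in both word 0 (bits 0-31) and word 1
   (bits 32-63) of the 128-bit target, modelled here as four words. */
static void xscvdpsp_model(uint32_t vsr[4], double src)
{
    float f = (float)src;            /* double -> single conversion */
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);  /* reinterpret the SP value */
    vsr[0] = bits;                   /* bits 0-31 */
    vsr[1] = bits;                   /* bits 32-63: duplicate copy */
    /* words 2 and 3 are left unchanged */
}
```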
VEX/priv/guest_ppc_toIR.c:
- rearrange ops in dis_vx_conv to update more portions of the target
register with copies of the result. xscvdpsp,xscvdpspn,xscvdpuxws
none/tests/ppc64/test_isa_2_06_part1.c:
- update res32 checking to explicitly include fcfids and fcfidus in the
32-bit result grouping.
none/tests/ppc64/test_isa_2_07_part2.c:
- correct NULL initializer for logic_tests definition
[*1] - GCC change referenced:
2017-09-26 Michael Meissner <meissner@linux.vnet.ibm.com>
* config/rs6000/rs6000.md (movsi_from_sf): Adjust code to
eliminate doing a 32-bit shift right or vector extract after
doing XSCVDPSPN.
patch submitted by: Will Schmidt <will_schmidt@vnet.ibm.com>
reviewed, committed by: Carl Love <cel@us.ibm.com>
Code in VEX/priv/guest_mips_toIR.c is notably refactored.
DSP ASE disassembly has been put in a separate file: guest_mipsdsp_toIR.c.
Patch by Aleksandar Rikalo.
ovl was defined as an unsigned long. This would cause warnings from gcc:
guest_s390_toIR.c:195:30: warning: right shift count >= width of type
[-Wshift-count-overflow]
when building on 32-bit arches, or when building a 32-bit secondary arch.
Fix this by defining ovl as ULong, which is guaranteed to be 64 bits.
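A minimal illustration of the difference (the function name is invented here):

```c
#include <stdint.h>

typedef uint64_t ULong;  /* Valgrind's ULong: 64 bits on every host */

/* Had ovl been 'unsigned long', this shift would exceed the type width
   on an ILP32 host (where unsigned long is 32 bits), triggering
   -Wshift-count-overflow.  With ULong it is well-defined everywhere. */
static ULong top16(ULong ovl)
{
    return ovl >> 48;
}
```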
GCC 7 introduced -Wimplicit-fallthrough
https://developers.redhat.com/blog/2017/03/10/wimplicit-fallthrough-in-gcc-7/
It caught a couple of bugs, but it does need a few extra comments to
explain when a switch-case fall-through is deliberate. Luckily,
with -Wimplicit-fallthrough=2, various existing comments already do that.
I have fixed the bugs by adding explicit break statements where
necessary, and added comments where the fall-through was correct.
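A minimal sketch of both kinds of fix (invented example, not from the actual patch):

```c
/* With -Wimplicit-fallthrough=2, gcc accepts a comment containing
   "fall through" as documentation of a deliberate fall-through; a
   missing break without such a comment gets warned about. */
static int classify(int c)
{
    int score = 0;
    switch (c) {
    case 2:
        score += 10;
        /* fall through */   /* deliberate: case 2 also adds case 1's score */
    case 1:
        score += 1;
        break;               /* explicit break where falling through was a bug */
    default:
        break;
    }
    return score;
}
```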
https://bugs.kde.org/show_bug.cgi?id=405430
The instruction needs to have its 32-bit "lane" values chopped to 32 bits.
The current lane implementation does not do the chopping, so we need to
explicitly do the chop and add.
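As a sketch of the idea (plain C, not the actual IR): each lane sum must be truncated so a carry cannot leak into the neighbouring lane.

```c
#include <stdint.h>

/* Illustrative 2x32-bit lane add on a 64-bit value: each lane is
   explicitly chopped to 32 bits before being recombined, so overflow
   from the low lane cannot spill into the high lane. */
static uint64_t add32x2(uint64_t a, uint64_t b)
{
    uint32_t lo = (uint32_t)a + (uint32_t)b;                 /* chop */
    uint32_t hi = (uint32_t)(a >> 32) + (uint32_t)(b >> 32); /* chop */
    return ((uint64_t)hi << 32) | lo;
}
```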
Valgrind bug 405362
Bug 398870 - Please add support for instruction vcvtps2ph
Bug 353370 - RDRAND amd64->IR: unhandled instruction bytes: 0x48 0xF 0xC7 0xF0
This commit implements:
* amd64 RDRAND instruction, on hosts that have it.
* amd64 VCVTPH2PS and VCVTPS2PH, on hosts that have them.
The presence/absence of these on the host is now reflected in the CPUID
results returned to the guest. So code that tests for these features in
CPUID and acts accordingly should "just work".
* New test cases, none/tests/amd64/rdrand and none/tests/amd64/f16c. These
are built if the host's assembler can handle them, in the usual way.
Certain projects, e.g. https://angr.io, use VEX as an intermediate
representation for the binary code analysis. In order to make it
possible to use them to analyze S/390 code on Intel, this patch
resolves the following issues in the disassembler:
- Bit fields, which are used to describe instruction formats, map to
different bits on different hosts. This patch replaces them with
macros, e.g. the SS.l bit field becomes the SS_l macro. Most bit field usages
are replaced using the following perl script:
perl -p -i \
-e 's/\(&ovl\.value\)/&ovl/g;' \
-e 's/ovl\.value/ovl/g;' \
-e 's/ovl\.fmt\.([a-zA-Z\d_]+)\.([a-z\d]+)/$1_$2(ovl)/g' \
priv/guest_s390_toIR.c
Since after that there are no more structs, #pragma pack is also
removed.
- Instructions are loaded from memory as words, which behaves
differently depending on host endianness. Such loads are replaced by
assembly of words from separately loaded bytes. This affects regular
disassembly functions, and also s390_irgen_EXRL(), which loads
last_execute_target this way.
- disInstr_S390() explicitly prohibits little-endian hosts with an
assert, which is removed in this patch.
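A sketch of both techniques (the field position and the names here are illustrative, not the actual guest_s390_toIR.c definitions):

```c
#include <stdint.h>

typedef uint64_t ULong;
typedef uint8_t  UChar;

/* Field extraction via shift/mask instead of a host-dependent bit
   field; the bit position used here is illustrative only. */
#define SS_l(insn)  (((insn) >> 48) & 0xff)

/* Assemble the instruction word byte-by-byte, so the result is the
   same on big- and little-endian hosts. */
static ULong load_insn_word(const UChar *p)
{
    return ((ULong)p[0] << 56) | ((ULong)p[1] << 48) |
           ((ULong)p[2] << 40) | ((ULong)p[3] << 32) |
           ((ULong)p[4] << 24) | ((ULong)p[5] << 16) |
           ((ULong)p[6] <<  8) |  (ULong)p[7];
}
```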
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Clang/LLVM trips over my_offsetof in VEX/auxprogs/genoffsets.c. See LLVM
PR 40890 for details (https://bugs.llvm.org/show_bug.cgi?id=40890).
Now, it's a Clang bug that it exits on an assertion failure rather than
emitting a diagnostic, but the previous my_offsetof expression is a pointer,
not an integer. Add a cast as done in other definitions of offsetof in
the tree. Patch from Ed Maste <emaste@freebsd.org>.
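The shape of the fix, roughly (this mirrors the idea, not the exact VEX definition):

```c
#include <stddef.h>

/* Classic null-pointer offsetof: the member address is a pointer, so a
   cast to an integer type is needed where the result is used as an
   integer.  Sketch only; VEX's actual definition may differ. */
#define my_offsetof(type, member) \
    ((unsigned long)&((type *)0)->member)

struct demo { char a; int b; };
```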
GCC might use subfe x, x, x to initialize x to 0 or -1, based on
whether the carry flag is set. This happens in some cases when g++
compiles code that resets a unique_ptr. The "trick" used by the compiler is
that it can AND a pointer with the register x (now 0x0 or 0xffffffff)
to set something to NULL or to the given pointer.
subfe is implemented as rD = (not rA) + rB + XER[CA].
If we instead implement it as rD = rB - rA - (XER[CA] ^ 1),
then Memcheck can see that rB and rA cancel each other out if they
are the same.
https://bugs.kde.org/show_bug.cgi?id=404054
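The two formulations are arithmetically identical for CA in {0,1}, which a quick C model confirms:

```c
#include <stdint.h>

/* Old formulation: rD = (not rA) + rB + CA */
static uint64_t subfe_old(uint64_t rA, uint64_t rB, uint64_t ca)
{
    return ~rA + rB + ca;
}

/* New formulation: rD = rB - rA - (CA ^ 1).  Since ~rA == -rA - 1
   (mod 2^64), both compute the same value, but in this form Memcheck
   can see that rA and rB cancel when they are equal. */
static uint64_t subfe_new(uint64_t rA, uint64_t rB, uint64_t ca)
{
    return rB - rA - (ca ^ 1);
}
```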
In s390_isel_vec_expr_wrk() there were some assignments of enum-typed
values to variables of different enum types. This fixes them. It also adds a
few initialisations to variables of type HReg for safety against the
possibility of them being used uninitialised. No functional change. Tested
by Andreas Arnez.
2018-Dec-27: some of the int<->fp conversion operations have been renamed so as to
have a trailing _DEP, meaning "deprecated". This is because they don't
specify a rounding mode to be used for the conversion and so are
underspecified. Their use should be replaced with equivalents that do specify
a rounding mode, either as a first argument or using a suffix on the name,
that indicates the rounding mode to use.
The wrong bit number was used when checking for the vector facility. This
can result in a fatal emulation error: "Encountered an instruction that
requires the vector facility. That facility is not available on this
host."
In many cases the wrong facility bit was set as well, hence
nothing bad happened. But when running Valgrind within a Qemu/KVM guest,
the wrong bit was not (always?) set and the emulation error occurred.
This fix simply corrects the vector facility bit number, changing it from
128 to 129.
addex uses OV as carry in and carry out. For all other instructions
OV is the signed overflow flag. And instructions like adde use CA
as carry.
Replace set_XER_OV_OV32 with set_XER_OV_OV32_ADDEX, which will
call calculate_XER_CA_64 and calculate_XER_CA_32, but with OV
as input, and sets OV and OV32.
Enable test_addex in none/tests/ppc64/test_isa_3_0.c and update
the expected output. test_addex would fail to match the expected
output before this patch.
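An illustrative model of the carry handling (plain C; names are descriptive, not VEX's):

```c
#include <stdint.h>

/* addex behaves like adde, except the carry travels through OV rather
   than CA: OV supplies the carry in and receives the carry out. */
static uint64_t addex_model(uint64_t a, uint64_t b, unsigned *ov)
{
    uint64_t sum = a + b + *ov;
    /* carry out iff the true sum exceeded 64 bits */
    *ov = (sum < a) || (sum == a && *ov);
    return sum;
}
```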
For most SIMD operations that happen on 64-bit values (as would arise from MMX
instructions, for example, such as Add16x4, CmpEQ32x2, etc), generate code
that performs the operation using SSE/SSE2 instructions on values in the low
halves of XMM registers. This is much more efficient than the previous scheme
of calling out to helper functions written in C. There are still a few SIMD64
operations done via helpers, though.
.. by adding support for MOVQ xmm/ireg and using that to implement 64HLtoV128,
4x64toV256 and their inverses. This reduces the number of instructions,
removes the use of memory as an intermediary, and avoids store-forwarding
stalls.
pshufb mm/xmm/ymm rearranges byte lanes in vector registers. It's fairly
widely used, but we generated terrible code for it. With this patch, we just
generate, at the back end, pshufb plus a bit of masking, which is a great
improvement.
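For reference, the per-byte semantics being emitted inline, as a C model of the xmm form:

```c
#include <stdint.h>

/* pshufb, xmm form: each result byte is zero if bit 7 of the mask byte
   is set, otherwise the source byte selected by the mask's low 4 bits. */
static void pshufb16(uint8_t dst[16], const uint8_t src[16],
                     const uint8_t mask[16])
{
    for (int i = 0; i < 16; i++)
        dst[i] = (mask[i] & 0x80) ? 0 : src[mask[i] & 0x0F];
}
```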
* changes set_AV_CR6 so that it does scalar comparisons against zero,
rather than sometimes against an all-ones word. This is something
that Memcheck can instrument exactly.
* in Memcheck, requests expensive instrumentation of Iop_Cmp{EQ,NE}64
by default on ppc64le.
https://bugs.kde.org/show_bug.cgi?id=386945#c62
This makes it possible for memcheck to know which part of the 128-bit
vector is defined, even if the load is partly beyond an addressable block.
Partially resolves bug 386945.
This makes it possible for memcheck to analyse the new gcc strcmp
inlined code correctly even if the ldbrx load is partly beyond an
addressable block.
Partially resolves bug 386945.
This adds support for the z/Architecture vector FP instructions that were
introduced with z13.
The patch was contributed by Vadim Barkov, with some clean-up and minor
adjustments by Andreas Arnez.
This generalises the existing spec rules for W of 32 bits:
W <u 0---(N-1)---0 1 0---0
(that is, B/NB after SUBL, where dep2 has the above form), to also cover
W <=u 0---(N-1)---0 0 1---1
(that is, BE/NBE after SUBL, where dep2 has the specified form).
Patch from Nicolas B. Pierron (nicolas.b.pierron@nbp.name).
This patch addresses the following:
* Fix the implementation of LOCGHI. Previously Valgrind performed 32-bit
sign extension instead of 64-bit sign extension on the immediate value.
* Advertise VXRS in HWCAP. If no VXRS are advertised, but the program
uses vector registers, this could cause problems with a glibc built with
"-march=z13".
This pertains to bug 386945.
VEX/priv/guest_ppc_toIR.c:
gen_POPCOUNT: use Iop_PopCount{32,64} where possible.
gen_vpopcntd_mode32: use Iop_PopCount32.
for cntlz{w,d}, use Iop_ClzNat{32,64}.
gen_byterev32: use Iop_Reverse8sIn32_x1 instead of lengthy sequence.
verbose_Clz32: remove (was unused anyway).
VEX/priv/host_ppc_defs.c, VEX/priv/host_ppc_defs.h:
Don't emit cnttz{w,d}, since we may be generating code for a target which
doesn't support them. Instead we can generate a fairly reasonable
alternative sequence using cntlz{w,d}.
Add support for emitting popcnt{w,d}.
VEX/priv/host_ppc_isel.c:
Add support for: Iop_ClzNat32 Iop_ClzNat64
Redo support for: Iop_Ctz{32,64} and their Nat equivalents, so as to not use
cnttz{w,d}, as mentioned above.
Add support for: Iop_PopCount64 Iop_PopCount32 Iop_Reverse8sIn32_x1
This is part of the fix for bug 386945. It adds the following IROps, plus
their supporting type- and printing- fragments:
Iop_Reverse8sIn32_x1: 32-bit byteswap. A fancy name, but it is consistent
with naming for the other swapping IROps that already exist.
Iop_PopCount64, Iop_PopCount32: population count
Iop_ClzNat64, Iop_ClzNat32, Iop_CtzNat64, Iop_CtzNat32: counting leading and
trailing zeroes, with "natural" (Nat) semantics for a zero input, meaning, in
the case of zero input, return the number of bits in the word. These
functionally overlap with the existing Iop_Clz64, Iop_Clz32, Iop_Ctz64,
Iop_Ctz32. The existing operations are undefined in case of a zero input.
Adding these new variants avoids the complexity of having to change the
declared semantics of the existing operations. Instead they are deprecated
but still available for use.
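The Nat semantics, as a reference model in C (32-bit shown; the 64-bit variants are analogous):

```c
#include <stdint.h>

/* CtzNat32/ClzNat32 reference: same as Ctz/Clz on non-zero inputs, but
   defined to return the word width (32) when the input is zero. */
static uint32_t ctz_nat32(uint32_t x)
{
    uint32_t n = 0;
    if (x == 0) return 32;
    while ((x & 1) == 0) { x >>= 1; n++; }
    return n;
}

static uint32_t clz_nat32(uint32_t x)
{
    uint32_t n = 0;
    if (x == 0) return 32;
    while ((x & 0x80000000u) == 0) { x <<= 1; n++; }
    return n;
}
```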
The VEX implementation of each of the z/Architecture instructions LOCHI,
LOCHHI, and LOCGHI treats the immediate 16-bit operand as an unsigned
integer instead of a signed integer. This is fixed.
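The distinction, sketched in C (function names invented for illustration):

```c
#include <stdint.h>

/* The 16-bit immediate must be sign-extended: going through int16_t
   first propagates the sign bit, while a direct widening zero-extends. */
static int64_t imm16_unsigned(uint16_t imm) { return (int64_t)imm; }
static int64_t imm16_signed(uint16_t imm)   { return (int64_t)(int16_t)imm; }
```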