This is the mail archive of the binutils@sourceware.org mailing list for the binutils project.
Re: [PATCH] Allow setting CpuVRex bit in .arch directive
- From: Jakub Jelinek <jakub at redhat dot com>
- To: "H.J. Lu" <hjl dot tools at gmail dot com>
- Cc: Binutils <binutils at sourceware dot org>, Uros Bizjak <ubizjak at gmail dot com>, Kirill Yukhin <kirill dot yukhin at gmail dot com>
- Date: Tue, 24 May 2016 19:49:33 +0200
- Subject: Re: [PATCH] Allow setting CpuVRex bit in .arch directive
- References: <20160521165405 dot GQ28550 at tucnak dot redhat dot com> <20160521170615 dot GE1875 at tucnak dot redhat dot com> <CAMe9rOrSYftrqeWjZQYmWmn7x_h9vHfz9Fcy3=UVUDNr+O2aCA at mail dot gmail dot com>
- Reply-to: Jakub Jelinek <jakub at redhat dot com>
On Tue, May 24, 2016 at 10:24:11AM -0700, H.J. Lu wrote:
> On Sat, May 21, 2016 at 10:06 AM, Jakub Jelinek <jakub@redhat.com> wrote:
> > Hi!
> >
> > On Sat, May 21, 2016 at 06:54:05PM +0200, Jakub Jelinek wrote:
> >> I've tried today to check for the various AVX512* ISA issues in GCC
> >> using the assembler's .arch support. It seems that by default all flags
> >> (except l1om/k1om) are set, but if I want to allow all insns except, say,
> >> AVX512DQ instructions, there is no way to do it: apart from explicit no*
> >> flags there is no way to remove ISA bits from the default, so one has to
> >> set some CPU and then add all the ISA flags one wants. Most of them can
> >> be added, except for one very important one: the CpuVRex bit.
> >>
> >> Here is a patch that adds support for .arch .vrex to set that bit; another
> >> option might be to set CpuVRex whenever CpuAVX512F is set in 64-bit mode.
> >> Any preferences?
>
> Do you have a testcase to show how CpuVRex is used?
Try:
.arch corei7
.arch .avx512f
vpxord %xmm15, %xmm15, %xmm15
vpxord %xmm16, %xmm16, %xmm16
I get:
/tmp/1.s: Assembler messages:
/tmp/1.s:4: Error: bad register name `%xmm16'
and couldn't find any way to make that assemble if I want to disable
even a single ISA set, and thus have to start with .arch <cpuname>
and add all the ISA sets I want to enable on top of that CPU.
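To illustrate why %xmm15 assembles but %xmm16 does not, here is a minimal C sketch. It is not gas code: ISA_AVX512F, ISA_VREX, needs_evex_high_bit, reg_accepted and enable_avx512f are hypothetical names. EVEX addresses 32 vector registers, so register numbers 16-31 need an extra EVEX-only bit; the sketch assumes that accepting those registers is gated on a single VRex-like flag, and also models the alternative mentioned in the thread of implying that flag from AVX512F in 64-bit mode.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical ISA bits for this sketch only; not gas's real definitions. */
#define ISA_AVX512F 0x1u
#define ISA_VREX    0x2u /* stands in for CpuVRex: permits %xmm16-%xmm31 */

/* EVEX addresses 32 vector registers. The low four bits of the register
   number are encoded as with VEX (ModRM plus EVEX.R/EVEX.B, or vvvv);
   bit 4 needs an EVEX-only bit (EVEX.R', EVEX.V' or EVEX.X, depending on
   the operand position). */
bool needs_evex_high_bit(unsigned regno) {
  assert(regno < 32);
  return (regno & 0x10u) != 0;
}

/* Models the check behind "bad register name `%xmm16'": the upper 16
   registers are only accepted when the VRex-like bit is enabled. */
bool reg_accepted(unsigned regno, unsigned isa) {
  return !needs_evex_high_bit(regno) || (isa & ISA_VREX) != 0;
}

/* Enabling AVX512F, optionally implying the VRex-like bit in 64-bit mode
   (the second option from the thread, instead of a separate .arch .vrex). */
unsigned enable_avx512f(unsigned isa, bool mode64, bool imply_vrex) {
  isa |= ISA_AVX512F;
  if (mode64 && imply_vrex)
    isa |= ISA_VREX;
  return isa;
}
```

With imply_vrex false (the behaviour described above), reg_accepted(16, enable_avx512f(0, true, false)) is false while register 15 is still accepted; with imply_vrex true, %xmm16 would be accepted after .arch .avx512f in 64-bit mode.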
> > BTW, to my surprise, I haven't found any issues in the compiler this way,
> > even the known ones that I've just fixed.
> > E.g.
> > .arch corei7
> > .arch .avx512f
> > .arch .avx512vl
> > vinserti32x4 $0x0, %xmm0, %ymm15, %ymm15
> > vinserti32x4 $0x1, %xmm0, %ymm15, %ymm15
> > vinserti64x2 $0x0, %xmm0, %ymm15, %ymm15
> > vinserti64x2 $0x1, %xmm0, %ymm15, %ymm15
> > vinsertf32x4 $0x0, %xmm0, %ymm15, %ymm15
> > vinsertf32x4 $0x1, %xmm0, %ymm15, %ymm15
> > vinsertf64x2 $0x0, %xmm0, %ymm15, %ymm15
> > vinsertf64x2 $0x1, %xmm0, %ymm15, %ymm15
> > assembles fine, even though IMHO it should not: the 64x2 instructions
> > all require AVX512VL & AVX512DQ.
> >
>
> Since vinsertf64x2 is a CpuAVX512VL instruction, I don't see
> why it shouldn't assemble.
Is it? I believe only vinsertf32x4 is; vinsertf64x2 is
CpuAVX512VL & CpuAVX512DQ:
EVEX.NDS.256.66.0F3A.W0 18 /r ib
VINSERTF32X4 ymm1 {k1}{z}, ymm2, xmm3/m128, imm8
    Op/En: T4   64/32-bit mode: V/V   CPUID: AVX512VL AVX512F
    Insert 128 bits of packed single-precision floating-point values from
    xmm3/m128 and the remaining values from ymm2 into ymm1 under writemask k1.

EVEX.NDS.512.66.0F3A.W0 18 /r ib
VINSERTF32X4 zmm1 {k1}{z}, zmm2, xmm3/m128, imm8
    Op/En: T4   64/32-bit mode: V/V   CPUID: AVX512F
    Insert 128 bits of packed single-precision floating-point values from
    xmm3/m128 and the remaining values from zmm2 into zmm1 under writemask k1.

EVEX.NDS.256.66.0F3A.W1 18 /r ib
VINSERTF64X2 ymm1 {k1}{z}, ymm2, xmm3/m128, imm8
    Op/En: T2   64/32-bit mode: V/V   CPUID: AVX512VL AVX512DQ
    Insert 128 bits of packed double-precision floating-point values from
    xmm3/m128 and the remaining values from ymm2 into ymm1 under writemask k1.

EVEX.NDS.512.66.0F3A.W1 18 /r ib
VINSERTF64X2 zmm1 {k1}{z}, zmm2, xmm3/m128, imm8
    Op/En: T2   64/32-bit mode: V/V   CPUID: AVX512DQ
    Insert 128 bits of packed double-precision floating-point values from
    xmm3/m128 and the remaining values from zmm2 into zmm1 under writemask k1.
vinsertf64x2, 4, 0x6618, None, 1, CpuAVX512DQ|CpuAVX512VL, Modrm|EVex=3|Masking=3|VexOpcode=2|VexVVVV=1|VexW=2|VecESize=1|Disp8MemShift=4|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Imm8, RegXMM|XMMword|Unspecified|BaseIndex|Disp8|Disp16|Disp32|Disp32S|Vec_Disp8, RegYMM, RegYMM }
At least in 319433-024.pdf, section 5.1.5 says:
The fourth column holds abbreviated CPUID feature flags (e.g. appropriate
bits in CPUID.1:ECX, CPUID.1:EDX for SSE/SSE2/SSE3/SSSE3/SSE4.1/SSE4.2/AVX/F16C
support; bits in CPUID.(EAX=07H,ECX=0):EBX for AVX2/AVX512F etc) that indicate
processor support for the instruction. If the corresponding flag is "0", the
instruction will #UD.
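Read literally, that rule is an "all listed bits set" test on CPUID leaf 7. A minimal C sketch follows; cpu_supports_all and VINSERTF64X2_YMM_REQ are hypothetical names. It takes the EBX value as a parameter, so actually executing CPUID, and checking via XGETBV that the OS has enabled the AVX-512 register state, are deliberately left out. The bit positions (AVX512F = 16, AVX512DQ = 17, AVX512VL = 31 in CPUID.(EAX=07H,ECX=0):EBX) are the documented ones.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Feature bits in CPUID.(EAX=07H,ECX=0):EBX. */
#define CPUID7_EBX_AVX512F  (UINT32_C(1) << 16)
#define CPUID7_EBX_AVX512DQ (UINT32_C(1) << 17)
#define CPUID7_EBX_AVX512VL (UINT32_C(1) << 31)

/* Requirement of the 256-bit VINSERTF64X2 form per the table above. */
#define VINSERTF64X2_YMM_REQ (CPUID7_EBX_AVX512VL | CPUID7_EBX_AVX512DQ)

/* An instruction listing several feature flags runs only if every one of
   them is set; otherwise it raises #UD. */
bool cpu_supports_all(uint32_t ebx, uint32_t required) {
  assert(required != 0); /* an empty requirement set is a caller bug */
  return (ebx & required) == required;
}
```

For instance, a CPU reporting AVX512F and AVX512VL but not AVX512DQ fails cpu_supports_all(ebx, VINSERTF64X2_YMM_REQ).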
Therefore, my understanding is that all the listed flags must be enabled,
otherwise the instruction will #UD. Does binutils instead treat
CpuAVX512DQ|CpuAVX512VL as meaning the insn is enabled by either
.arch .avx512vl or .arch .avx512dq alone, rather than only when both
.arch .avx512vl and .arch .avx512dq are given?
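The two readings in the question differ only in the mask test. A minimal C sketch with hypothetical flag bits (ARCH_AVX512F, ARCH_AVX512VL, ARCH_AVX512DQ) and hypothetical helpers insn_enabled_all/insn_enabled_any; it does not claim which test gas actually performs.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bits for ISA extensions enabled via .arch; these are not
   the real opcodes definitions. */
#define ARCH_AVX512F  0x1u
#define ARCH_AVX512VL 0x2u
#define ARCH_AVX512DQ 0x4u

/* Reading 1: an insn tagged CpuAVX512DQ|CpuAVX512VL needs every listed
   extension enabled, matching the SDM's "#UD unless all flags are set". */
bool insn_enabled_all(unsigned enabled, unsigned insn_flags) {
  assert(insn_flags != 0);
  return (enabled & insn_flags) == insn_flags;
}

/* Reading 2: any single listed extension is enough. */
bool insn_enabled_any(unsigned enabled, unsigned insn_flags) {
  assert(insn_flags != 0);
  return (enabled & insn_flags) != 0;
}
```

With only .avx512f and .avx512vl enabled, insn_enabled_all rejects a DQ|VL insn such as the 256-bit vinsertf64x2 while insn_enabled_any accepts it; the latter outcome would be consistent with the vinsert*64x2 lines quoted earlier assembling without .arch .avx512dq.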
Jakub