This is the mail archive of the binutils@sourceware.org mailing list for the binutils project.

Re: [PATCH] Allow setting CpuVRex bit in .arch directive

From: Jakub Jelinek <jakub at redhat dot com>
To: "H.J. Lu" <hjl dot tools at gmail dot com>
Cc: Binutils <binutils at sourceware dot org>, Uros Bizjak <ubizjak at gmail dot com>, Kirill Yukhin <kirill dot yukhin at gmail dot com>
Date: Tue, 24 May 2016 19:49:33 +0200
Subject: Re: [PATCH] Allow setting CpuVRex bit in .arch directive
Authentication-results: sourceware.org; auth=none
References: <20160521165405 dot GQ28550 at tucnak dot redhat dot com> <20160521170615 dot GE1875 at tucnak dot redhat dot com> <CAMe9rOrSYftrqeWjZQYmWmn7x_h9vHfz9Fcy3=UVUDNr+O2aCA at mail dot gmail dot com>
Reply-to: Jakub Jelinek <jakub at redhat dot com>

On Tue, May 24, 2016 at 10:24:11AM -0700, H.J. Lu wrote:
> On Sat, May 21, 2016 at 10:06 AM, Jakub Jelinek <jakub@redhat.com> wrote:
> > Hi!
> >
> > On Sat, May 21, 2016 at 06:54:05PM +0200, Jakub Jelinek wrote:
> >> I've tried today to check for the various AVX512* ISA issues in GCC
> >> using assembly .arch support.  Seems by default all flags (but l1om/k1om)
> >> are set, but if I want to allow all insns but say AVX512DQ ISA instructions
> >> or something similar, there is no way to do it - there is no way except
> >> for explicit no* flags to remove ISA bits from the default, so one has to
> >> set some CPU and then add all the ISA flags one wants.  Seems most of them
> >> can be added, except for one very important one - the CpuVRex bit.
> >>
> >> Here is a patch to add support for .arch .vrex to set that, another option
> >> might be to set CpuVRex whenever CpuAVX512F is set in 64-bit mode.
> >> Any preferences?
> 
> Do you have a testcase to show how CpuVRex is used?

Try:
	.arch corei7
	.arch .avx512f
	vpxord %xmm15, %xmm15, %xmm15
	vpxord %xmm16, %xmm16, %xmm16

I get:
/tmp/1.s: Assembler messages:
/tmp/1.s:4: Error: bad register name `%xmm16'
and I couldn't find any way to make that assemble if I want to
disable even a single ISA set and thus have to start with .arch <cpuname>
and add all the ISA sets I want to enable on top of that CPU.

> > BTW, to my surprise, I haven't found any issues in the compiler this way,
> > even the known ones that I've just fixed.
> > E.g.
> >         .arch   corei7
> >         .arch   .avx512f
> >         .arch   .avx512vl
> >         vinserti32x4    $0x0, %xmm0, %ymm15, %ymm15
> >         vinserti32x4    $0x1, %xmm0, %ymm15, %ymm15
> >         vinserti64x2    $0x0, %xmm0, %ymm15, %ymm15
> >         vinserti64x2    $0x1, %xmm0, %ymm15, %ymm15
> >         vinsertf32x4    $0x0, %xmm0, %ymm15, %ymm15
> >         vinsertf32x4    $0x1, %xmm0, %ymm15, %ymm15
> >         vinsertf64x2    $0x0, %xmm0, %ymm15, %ymm15
> >         vinsertf64x2    $0x1, %xmm0, %ymm15, %ymm15
> > assembles fine, even when it IMHO should not - the 64x2 instructions
> > are all AVX512VL & AVX512DQ.
> >
> 
> Since vinsertf64x2 is a CpuAVX512VL instruction, I don't see
> why it shouldn't assemble.

Is it?  I believe only vinsertf32x4 is; vinsertf64x2 is
CpuAVX512VL & CpuAVX512DQ:

EVEX.NDS.256.66.0F3A.W0 18 /r ib    T4    V/V    AVX512VL AVX512F
VINSERTF32X4 ymm1 {k1}{z}, ymm2, xmm3/m128, imm8
    Insert 128 bits of packed single-precision floating-point values from
    xmm3/m128 and the remaining values from ymm2 into ymm1 under writemask k1.

EVEX.NDS.512.66.0F3A.W0 18 /r ib    T4    V/V    AVX512F
VINSERTF32X4 zmm1 {k1}{z}, zmm2, xmm3/m128, imm8
    Insert 128 bits of packed single-precision floating-point values from
    xmm3/m128 and the remaining values from zmm2 into zmm1 under writemask k1.

EVEX.NDS.256.66.0F3A.W1 18 /r ib    T2    V/V    AVX512VL AVX512DQ
VINSERTF64X2 ymm1 {k1}{z}, ymm2, xmm3/m128, imm8
    Insert 128 bits of packed double-precision floating-point values from
    xmm3/m128 and the remaining values from ymm2 into ymm1 under writemask k1.

EVEX.NDS.512.66.0F3A.W1 18 /r ib    T2    V/V    AVX512DQ
VINSERTF64X2 zmm1 {k1}{z}, zmm2, xmm3/m128, imm8
    Insert 128 bits of packed double-precision floating-point values from
    xmm3/m128 and the remaining values from zmm2 into zmm1 under writemask k1.

vinsertf64x2, 4, 0x6618, None, 1, CpuAVX512DQ|CpuAVX512VL, Modrm|EVex=3|Masking=3|VexOpcode=2|VexVVVV=1|VexW=2|VecESize=1|Disp8MemShift=4|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Imm8, RegXMM|XMMword|Unspecified|BaseIndex|Disp8|Disp16|Disp32|Disp32S|Vec_Disp8, RegYMM, RegYMM }

At least in 319433-024.pdf I see in 5.1.5:

The fourth column holds abbreviated CPUID feature flags (e.g. appropriate
bits in CPUID.1:ECX, CPUID.1:EDX for SSE/SSE2/SSE3/SSSE3/SSE4.1/SSE4.2/AVX/F16C support; bits in
CPUID.(EAX=07H,ECX=0):EBX for AVX2/AVX512F etc) that indicate processor support for the instruction. If
the corresponding flag is '0', the instruction will #UD.

Therefore, my understanding is that all the listed flags need to be enabled,
or the instruction will #UD.  Does binutils instead treat CpuAVX512DQ|CpuAVX512VL
as enabling the insn under either .arch .avx512vl or .arch .avx512dq
alone, rather than only under .arch .avx512vl; .arch .avx512dq?
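
To spell out the two readings of CpuAVX512DQ|CpuAVX512VL, here is a minimal
C sketch.  It is not binutils code: the FEAT_* bits and the enabled_any /
enabled_all helpers are hypothetical names made up for illustration, and the
real cpu flags structure in gas is laid out differently.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical feature bits; NOT the real binutils cpu flags layout. */
enum {
    FEAT_AVX512F  = 1u << 0,
    FEAT_AVX512VL = 1u << 1,
    FEAT_AVX512DQ = 1u << 2
};

/* "Either" reading: accept the insn if ANY flag listed in the
   template is enabled by the current .arch settings. */
static bool enabled_any(unsigned required, unsigned enabled)
{
    assert(required != 0);
    return (required & enabled) != 0;
}

/* "All" reading: accept the insn only if EVERY flag listed in the
   template is enabled, matching the #UD rule quoted above. */
static bool enabled_all(unsigned required, unsigned enabled)
{
    assert(required != 0);
    return (required & enabled) == required;
}
```

With only the AVX512F and AVX512VL bits enabled, a template requiring
FEAT_AVX512DQ | FEAT_AVX512VL passes enabled_any but fails enabled_all,
which would be consistent with the 64x2 forms assembling in the example
above if the "either" reading is what the assembler uses.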

	Jakub
