This is the mail archive of the binutils@sourceware.org mailing list for the binutils project.



Re: [PATCH] Allow setting CpuVRex bit in .arch directive


On Tue, May 24, 2016 at 10:49 AM, Jakub Jelinek <jakub@redhat.com> wrote:
> On Tue, May 24, 2016 at 10:24:11AM -0700, H.J. Lu wrote:
>> On Sat, May 21, 2016 at 10:06 AM, Jakub Jelinek <jakub@redhat.com> wrote:
>> > Hi!
>> >
>> > On Sat, May 21, 2016 at 06:54:05PM +0200, Jakub Jelinek wrote:
>> >> I've tried today to check for the various AVX512* ISA issues in GCC
>> >> using assembly .arch support.  Seems by default all flags (but l1om/k1om)
>> >> are set, but if I want to allow all insns but say AVX512DQ ISA instructions
>> >> or something similar, there is no way to do it - there is no way except
>> >> for explicit no* flags to remove ISA bits from the default, so one has to
>> >> set some CPU and then add all the ISA flags one wants.  Seems most of them
>> >> can be added, except for one very important one - the CpuVRex bit.
>> >>
>> >> Here is a patch to add support for .arch .vrex to set that, another option
>> >> might be to set CpuVRex whenever CpuAVX512F is set in 64-bit mode.
>> >> Any preferences?
>>
>> Do you have a testcase to show how CpuVRex is used?
>
> Try:
>         .arch corei7
>         .arch .avx512f
>         vpxord %xmm15, %xmm15, %xmm15
>         vpxord %xmm16, %xmm16, %xmm16
>
> I get:
> /tmp/1.s: Assembler messages:
> /tmp/1.s:4: Error: bad register name `%xmm16'

I opened:

https://sourceware.org/bugzilla/show_bug.cgi?id=20141

> and couldn't find any way to make that assemble if I want to
> disable even one ISA set and thus have to start with .arch <cpuname>
> and add all the ISA sets I want to enable on top of that CPU.

So you want to disable just AVX512DQ and nothing else.  Wouldn't a
".noarch" directive work better?
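
The failing testcase above can be modelled with a small sketch.  This is
only an illustration in Python, not gas code: the function names, the flag
spellings ("avx512f", "vrex") and the implied_by_avx512f switch are
assumptions made for the example.  It assumes, as the error message in the
thread suggests, that %xmm16 through %xmm31 are accepted only when the
CpuVRex bit is set, and it also models the alternative floated in the patch
mail of implying that bit from CpuAVX512F in 64-bit mode.

```python
import re


def xmm_index(reg):
    """Return N for a register spelled %xmmN, or None for anything else."""
    m = re.fullmatch(r"%xmm(\d+)", reg)
    return int(m.group(1)) if m else None


def register_accepted(reg, flags, implied_by_avx512f=False, mode64=True):
    """Toy model of the register-name check discussed in the thread.

    flags is a set of hypothetical lowercase ISA flag names.
    implied_by_avx512f models the alternative proposal: treat the VRex
    capability as set whenever avx512f is enabled in 64-bit mode.
    """
    idx = xmm_index(reg)
    if idx is None or idx > 31:
        return False
    if idx < 8:
        return True
    if not mode64:
        # %xmm8 and above exist only in 64-bit mode.
        return False
    if idx < 16:
        return True
    # %xmm16..%xmm31 need the VRex capability in this model.
    if "vrex" in flags:
        return True
    return implied_by_avx512f and "avx512f" in flags


# Mirrors ".arch corei7" followed by ".arch .avx512f" from the testcase.
flags = {"corei7", "avx512f"}
for reg in ("%xmm15", "%xmm16"):
    print(reg, register_accepted(reg, flags))
```

Under this model, adding "vrex" to the set (the proposed ".arch .vrex") or
passing implied_by_avx512f=True both make %xmm16 acceptable; which of the
two the assembler should adopt is exactly the open question in the thread.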

>> > BTW, to my surprise, I haven't found any issues in the compiler this way,
>> > even the known ones that I've just fixed.
>> > E.g.
>> >         .arch   corei7
>> >         .arch   .avx512f
>> >         .arch   .avx512vl
>> >         vinserti32x4    $0x0, %xmm0, %ymm15, %ymm15
>> >         vinserti32x4    $0x1, %xmm0, %ymm15, %ymm15
>> >         vinserti64x2    $0x0, %xmm0, %ymm15, %ymm15
>> >         vinserti64x2    $0x1, %xmm0, %ymm15, %ymm15
>> >         vinsertf32x4    $0x0, %xmm0, %ymm15, %ymm15
>> >         vinsertf32x4    $0x1, %xmm0, %ymm15, %ymm15
>> >         vinsertf64x2    $0x0, %xmm0, %ymm15, %ymm15
>> >         vinsertf64x2    $0x1, %xmm0, %ymm15, %ymm15
>> > assembles fine, even when it IMHO should not - the 64x2 instructions
>> > are all AVX512VL & AVX512DQ.
>> >
>>
>> Since vinsertf64x2 is a CpuAVX512VL instruction, I don't see
>> why it shouldn't assemble.
>
> Is it?  I believe only vinsertf32x4 is, vinsertf64x2 is
> CpuAVX512VL & CpuAVX512DQ:
>
> EVEX.NDS.256.66.0F3A.W0 18 /r ib        T4      V/V     AVX512VL        Insert 128 bits of packed single-precision floating-
> VINSERTF32X4 ymm1 {k1}{z}, ymm2,                        AVX512F         point values from xmm3/m128 and the remaining
> xmm3/m128, imm8                                                         values from ymm2 into ymm1 under writemask k1.
>
> EVEX.NDS.512.66.0F3A.W0 18 /r ib        T4      V/V     AVX512F         Insert 128 bits of packed single-precision floating-
> VINSERTF32X4 zmm1 {k1}{z}, zmm2,                                        point values from xmm3/m128 and the remaining
> xmm3/m128, imm8                                                         values from zmm2 into zmm1 under writemask k1.
>
> EVEX.NDS.256.66.0F3A.W1 18 /r ib        T2      V/V     AVX512VL        Insert 128 bits of packed double-precision floating-
> VINSERTF64X2 ymm1 {k1}{z}, ymm2,                        AVX512DQ        point values from xmm3/m128 and the remaining
> xmm3/m128, imm8                                                         values from ymm2 into ymm1 under writemask k1.
>
> EVEX.NDS.512.66.0F3A.W1 18 /r ib        T2      V/V     AVX512DQ        Insert 128 bits of packed double-precision floating-
> VINSERTF64X2 zmm1 {k1}{z}, zmm2,                                        point values from xmm3/m128 and the remaining
> xmm3/m128, imm8                                                         values from zmm2 into zmm1 under writemask k1.
>
> vinsertf64x2, 4, 0x6618, None, 1, CpuAVX512DQ|CpuAVX512VL, Modrm|EVex=3|Masking=3|VexOpcode=2|VexVVVV=1|VexW=2|VecESize=1|Disp8MemShift=4|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Imm8, RegXMM|XMMword|Unspecified|BaseIndex|Disp8|Disp16|Disp32|Disp32S|Vec_Disp8, RegYMM, RegYMM }
>
> At least in 319433-024.pdf I see in 5.1.5:
>
> The fourth column holds abbreviated CPUID feature flags (e.g. appropriate
> bits in CPUID.1:ECX, CPUID.1:EDX for SSE/SSE2/SSE3/SSSE3/SSE4.1/SSE4.2/AVX/F16C support; bits in
> CPUID.(EAX=07H,ECX=0):EBX for AVX2/AVX512F etc) that indicate processor support for the instruction. If
> the corresponding flag is "0", the instruction will #UD.
>
> Therefore, my understanding is that you need all the mentioned flags enabled
> or it will #UD.  Does binutils treat CpuAVX512DQ|CpuAVX512VL instead
> as the insn being enabled in either .arch .avx512vl, or .arch .avx512dq
> alone, rather than only in .arch .avx512vl; .arch .avx512dq ?
>

I opened:

https://sourceware.org/bugzilla/show_bug.cgi?id=20140
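
The two readings of CpuAVX512DQ|CpuAVX512VL in the question above can be
contrasted with a short sketch.  This is a hypothetical Python model, not
the matching code in gas: the function names and the lowercase flag
spellings are assumptions for illustration.  The required-flag sets are
taken from the instruction table quoted in the thread.

```python
def accepts_any(required, enabled):
    """'Either flag is enough' reading: one enabled required flag suffices."""
    return bool(set(required) & set(enabled))


def accepts_all(required, enabled):
    """'All flags needed' reading: every required flag must be enabled."""
    return set(required) <= set(enabled)


# 256-bit forms, per the CPUID column of the quoted table.
VINSERTF32X4_YMM = {"avx512vl", "avx512f"}
VINSERTF64X2_YMM = {"avx512vl", "avx512dq"}

# Mirrors ".arch corei7; .arch .avx512f; .arch .avx512vl" from the example.
enabled = {"avx512f", "avx512vl"}

for name, req in (("vinsertf32x4", VINSERTF32X4_YMM),
                  ("vinsertf64x2", VINSERTF64X2_YMM)):
    print(name, "any:", accepts_any(req, enabled),
          "all:", accepts_all(req, enabled))
```

With only AVX512F and AVX512VL enabled, the "any" reading accepts
vinsertf64x2, which is consistent with what the thread reports the
assembler doing, while the "all" reading rejects it until AVX512DQ is
enabled as well, which is the behaviour argued for from the CPUID column.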

-- 
H.J.

