This is the mail archive of the binutils@sourceware.org mailing list for the binutils project.
Re: [PATCH] x86: Optimize with EVEX128 encoding for AVX512VL
On Fri, Mar 9, 2018 at 3:54 AM, Jan Beulich <JBeulich@suse.com> wrote:
>>>> On 09.03.18 at 12:21, <hjl.tools@gmail.com> wrote:
>> On Fri, Mar 9, 2018 at 12:30 AM, Jan Beulich <JBeulich@suse.com> wrote:
>>>>>> On 09.03.18 at 04:56, <hjl.tools@gmail.com> wrote:
>>>> This is the patch I am checking in. If i.vec_encoding == vex_encoding_evex,
>>>> we need to use EVEX128 encoding.
>>>
>>> But this retains some of the ISA extensions problem - only EVEX512
>>> should be used without "i.tm.cpu_flags.bitfield.cpuavx512vl ||
>>> cpu_arch_isa_flags.bitfield.cpuavx512vl".
>>>
>>
>> The condition is:
>>
>> && (i.tm.opcode_modifier.vex
>> || (!i.mask
>> && !i.rounding
>> && is_evex_encoding (&i.tm)
>> && (i.tm.cpu_flags.bitfield.cpuavx512vl
>> || cpu_arch_isa_flags.bitfield.cpuavx512vl)))
>>
>> For EVEX512 instructions, if
>>
>> i.tm.cpu_flags.bitfield.cpuavx512vl ||
>> cpu_arch_isa_flags.bitfield.cpuavx512vl
>>
>> is false, the optimization is disabled.
>
> Oh, yes, I see - that wasn't visible from the patch alone. I'm sorry
> for the noise. I now also see why you could nicely get rid of that
> extra loop over the register numbers.
>
We can do better. If EVEX encoding isn't required, we can encode
EVEX instructions with VEX128. I am checking in this patch.
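
For readers following the thread, here is a rough standalone model of the
gating condition as it reads with this patch applied. It is only a sketch:
the struct, enum and function names are made up for illustration and are
not the ones in tc-i386.c, and it covers only the fragment of the condition
quoted in this thread, not the opcode and operand checks around it in
optimize_encoding.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the encoding the user asked for. */
enum requested_encoding { ENC_DEFAULT, ENC_VEX, ENC_EVEX };

/* Hypothetical, flattened view of the fields the quoted condition reads. */
struct insn_model {
  bool template_is_vex;          /* models i.tm.opcode_modifier.vex */
  bool template_is_evex;         /* models is_evex_encoding (&i.tm) */
  bool has_mask;                 /* models i.mask */
  bool has_rounding;             /* models i.rounding */
  enum requested_encoding requested;  /* models i.vec_encoding */
  bool template_needs_avx512vl;  /* models i.tm.cpu_flags...cpuavx512vl */
  bool arch_has_avx512vl;        /* models cpu_arch_isa_flags...cpuavx512vl */
};

/* True when the quoted part of the condition lets the optimization run. */
static bool
may_use_shorter_encoding (const struct insn_model *m)
{
  return m->template_is_vex
         || (!m->has_mask
             && !m->has_rounding
             && m->template_is_evex
             /* New with this patch: if EVEX was not explicitly requested,
                AVX512VL is not needed, since the result can be VEX128.  */
             && (m->requested != ENC_EVEX
                 || m->template_needs_avx512vl
                 || m->arch_has_avx512vl));
}

/* A few cases worked through by hand. */
static void
check_examples (void)
{
  struct insn_model plain = { .template_is_evex = true };
  struct insn_model forced = { .template_is_evex = true,
                               .requested = ENC_EVEX };
  struct insn_model masked = { .template_is_evex = true, .has_mask = true };

  assert (may_use_shorter_encoding (&plain));    /* VEX128 is allowed */
  assert (!may_use_shorter_encoding (&forced));  /* EVEX forced, no VL */
  forced.arch_has_avx512vl = true;
  assert (may_use_shorter_encoding (&forced));   /* EVEX128 is allowed */
  assert (!may_use_shorter_encoding (&masked));  /* masking blocks it */
}
```

The "plain" case corresponds to the unmasked %zmm5 lines in the updated
optimize-1.d expectations below, which go from a 6-byte EVEX encoding to a
4-byte VEX one.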
--
H.J.
From d34afb133d7224a91e179151fa7f4a6e5eb07e03 Mon Sep 17 00:00:00 2001
From: "H.J. Lu" <hjl.tools@gmail.com>
Date: Fri, 9 Mar 2018 07:43:30 -0800
Subject: [PATCH] x86: Encode EVEX instructions with VEX128 if possible
If EVEX encoding isn't required, we can encode EVEX instructions with
VEX128.
* config/tc-i386.c (optimize_encoding): Encode EVEX instructions
with VEX128 if EVEX encoding isn't required.
* testsuite/gas/i386/optimize-1.d: Updated.
* testsuite/gas/i386/x86-64-optimize-2.d: Likewise.
---
gas/config/tc-i386.c | 3 ++-
gas/testsuite/gas/i386/optimize-1.d | 24 ++++++++++++------------
gas/testsuite/gas/i386/x86-64-optimize-2.d | 24 ++++++++++++------------
3 files changed, 26 insertions(+), 25 deletions(-)
diff --git a/gas/config/tc-i386.c b/gas/config/tc-i386.c
index 724376096f..e94e01cf10 100644
--- a/gas/config/tc-i386.c
+++ b/gas/config/tc-i386.c
@@ -3875,7 +3875,8 @@ optimize_encoding (void)
|| (!i.mask
&& !i.rounding
&& is_evex_encoding (&i.tm)
- && (i.tm.cpu_flags.bitfield.cpuavx512vl
+ && (i.vec_encoding != vex_encoding_evex
+ || i.tm.cpu_flags.bitfield.cpuavx512vl
|| cpu_arch_isa_flags.bitfield.cpuavx512vl)))
&& ((i.tm.base_opcode == 0x55
|| i.tm.base_opcode == 0x6655
diff --git a/gas/testsuite/gas/i386/optimize-1.d b/gas/testsuite/gas/i386/optimize-1.d
index f7da296697..3ea6e75b9a 100644
--- a/gas/testsuite/gas/i386/optimize-1.d
+++ b/gas/testsuite/gas/i386/optimize-1.d
@@ -10,52 +10,52 @@ Disassembly of section .text:
0+ <_start>:
+[a-f0-9]+: 62 f1 f5 4f 55 e9 vandnpd %zmm1,%zmm1,%zmm5\{%k7\}
+[a-f0-9]+: 62 f1 f5 af 55 e9 vandnpd %ymm1,%ymm1,%ymm5\{%k7\}\{z\}
- +[a-f0-9]+: 62 f1 f5 48 55 e9 vandnpd %zmm1,%zmm1,%zmm5
+ +[a-f0-9]+: c5 f1 55 e9 vandnpd %xmm1,%xmm1,%xmm5
+[a-f0-9]+: c5 f1 55 e9 vandnpd %xmm1,%xmm1,%xmm5
+[a-f0-9]+: 62 f1 74 4f 55 e9 vandnps %zmm1,%zmm1,%zmm5\{%k7\}
+[a-f0-9]+: 62 f1 74 af 55 e9 vandnps %ymm1,%ymm1,%ymm5\{%k7\}\{z\}
- +[a-f0-9]+: 62 f1 74 48 55 e9 vandnps %zmm1,%zmm1,%zmm5
+ +[a-f0-9]+: c5 f0 55 e9 vandnps %xmm1,%xmm1,%xmm5
+[a-f0-9]+: c5 f0 55 e9 vandnps %xmm1,%xmm1,%xmm5
+[a-f0-9]+: c5 f1 df e9 vpandn %xmm1,%xmm1,%xmm5
+[a-f0-9]+: 62 f1 75 4f df e9 vpandnd %zmm1,%zmm1,%zmm5\{%k7\}
+[a-f0-9]+: 62 f1 75 af df e9 vpandnd %ymm1,%ymm1,%ymm5\{%k7\}\{z\}
- +[a-f0-9]+: 62 f1 75 48 df e9 vpandnd %zmm1,%zmm1,%zmm5
+ +[a-f0-9]+: c5 f1 df e9 vpandn %xmm1,%xmm1,%xmm5
+[a-f0-9]+: c5 f1 df e9 vpandn %xmm1,%xmm1,%xmm5
+[a-f0-9]+: 62 f1 f5 4f df e9 vpandnq %zmm1,%zmm1,%zmm5\{%k7\}
+[a-f0-9]+: 62 f1 f5 af df e9 vpandnq %ymm1,%ymm1,%ymm5\{%k7\}\{z\}
- +[a-f0-9]+: 62 f1 f5 48 df e9 vpandnq %zmm1,%zmm1,%zmm5
+ +[a-f0-9]+: c5 f1 df e9 vpandn %xmm1,%xmm1,%xmm5
+[a-f0-9]+: c5 f1 df e9 vpandn %xmm1,%xmm1,%xmm5
+[a-f0-9]+: 62 f1 f5 4f 57 e9 vxorpd %zmm1,%zmm1,%zmm5\{%k7\}
+[a-f0-9]+: 62 f1 f5 af 57 e9 vxorpd %ymm1,%ymm1,%ymm5\{%k7\}\{z\}
- +[a-f0-9]+: 62 f1 f5 48 57 e9 vxorpd %zmm1,%zmm1,%zmm5
+ +[a-f0-9]+: c5 f1 57 e9 vxorpd %xmm1,%xmm1,%xmm5
+[a-f0-9]+: c5 f1 57 e9 vxorpd %xmm1,%xmm1,%xmm5
+[a-f0-9]+: 62 f1 74 4f 57 e9 vxorps %zmm1,%zmm1,%zmm5\{%k7\}
+[a-f0-9]+: 62 f1 74 af 57 e9 vxorps %ymm1,%ymm1,%ymm5\{%k7\}\{z\}
- +[a-f0-9]+: 62 f1 74 48 57 e9 vxorps %zmm1,%zmm1,%zmm5
+ +[a-f0-9]+: c5 f0 57 e9 vxorps %xmm1,%xmm1,%xmm5
+[a-f0-9]+: c5 f0 57 e9 vxorps %xmm1,%xmm1,%xmm5
+[a-f0-9]+: c5 f1 ef e9 vpxor %xmm1,%xmm1,%xmm5
+[a-f0-9]+: 62 f1 75 4f ef e9 vpxord %zmm1,%zmm1,%zmm5\{%k7\}
+[a-f0-9]+: 62 f1 75 af ef e9 vpxord %ymm1,%ymm1,%ymm5\{%k7\}\{z\}
- +[a-f0-9]+: 62 f1 75 48 ef e9 vpxord %zmm1,%zmm1,%zmm5
+ +[a-f0-9]+: c5 f1 ef e9 vpxor %xmm1,%xmm1,%xmm5
+[a-f0-9]+: c5 f1 ef e9 vpxor %xmm1,%xmm1,%xmm5
+[a-f0-9]+: 62 f1 f5 4f ef e9 vpxorq %zmm1,%zmm1,%zmm5\{%k7\}
+[a-f0-9]+: 62 f1 f5 af ef e9 vpxorq %ymm1,%ymm1,%ymm5\{%k7\}\{z\}
- +[a-f0-9]+: 62 f1 f5 48 ef e9 vpxorq %zmm1,%zmm1,%zmm5
+ +[a-f0-9]+: c5 f1 ef e9 vpxor %xmm1,%xmm1,%xmm5
+[a-f0-9]+: c5 f1 ef e9 vpxor %xmm1,%xmm1,%xmm5
+[a-f0-9]+: 62 f1 75 4f f8 e9 vpsubb %zmm1,%zmm1,%zmm5\{%k7\}
+[a-f0-9]+: 62 f1 75 af f8 e9 vpsubb %ymm1,%ymm1,%ymm5\{%k7\}\{z\}
- +[a-f0-9]+: 62 f1 75 48 f8 e9 vpsubb %zmm1,%zmm1,%zmm5
+ +[a-f0-9]+: c5 f1 f8 e9 vpsubb %xmm1,%xmm1,%xmm5
+[a-f0-9]+: c5 f1 f8 e9 vpsubb %xmm1,%xmm1,%xmm5
+[a-f0-9]+: 62 f1 75 4f f9 e9 vpsubw %zmm1,%zmm1,%zmm5\{%k7\}
+[a-f0-9]+: 62 f1 75 af f9 e9 vpsubw %ymm1,%ymm1,%ymm5\{%k7\}\{z\}
- +[a-f0-9]+: 62 f1 75 48 f9 e9 vpsubw %zmm1,%zmm1,%zmm5
+ +[a-f0-9]+: c5 f1 f9 e9 vpsubw %xmm1,%xmm1,%xmm5
+[a-f0-9]+: c5 f1 f9 e9 vpsubw %xmm1,%xmm1,%xmm5
+[a-f0-9]+: 62 f1 75 4f fa e9 vpsubd %zmm1,%zmm1,%zmm5\{%k7\}
+[a-f0-9]+: 62 f1 75 af fa e9 vpsubd %ymm1,%ymm1,%ymm5\{%k7\}\{z\}
- +[a-f0-9]+: 62 f1 75 48 fa e9 vpsubd %zmm1,%zmm1,%zmm5
+ +[a-f0-9]+: c5 f1 fa e9 vpsubd %xmm1,%xmm1,%xmm5
+[a-f0-9]+: c5 f1 fa e9 vpsubd %xmm1,%xmm1,%xmm5
+[a-f0-9]+: 62 f1 f5 4f fb e9 vpsubq %zmm1,%zmm1,%zmm5\{%k7\}
+[a-f0-9]+: 62 f1 f5 af fb e9 vpsubq %ymm1,%ymm1,%ymm5\{%k7\}\{z\}
- +[a-f0-9]+: 62 f1 f5 48 fb e9 vpsubq %zmm1,%zmm1,%zmm5
+ +[a-f0-9]+: c5 f1 fb e9 vpsubq %xmm1,%xmm1,%xmm5
+[a-f0-9]+: c5 f1 fb e9 vpsubq %xmm1,%xmm1,%xmm5
#pass
diff --git a/gas/testsuite/gas/i386/x86-64-optimize-2.d b/gas/testsuite/gas/i386/x86-64-optimize-2.d
index 9222efe8c1..ba3a2df887 100644
--- a/gas/testsuite/gas/i386/x86-64-optimize-2.d
+++ b/gas/testsuite/gas/i386/x86-64-optimize-2.d
@@ -10,7 +10,7 @@ Disassembly of section .text:
0+ <_start>:
+[a-f0-9]+: 62 71 f5 4f 55 f9 vandnpd %zmm1,%zmm1,%zmm15\{%k7\}
+[a-f0-9]+: 62 71 f5 af 55 f9 vandnpd %ymm1,%ymm1,%ymm15\{%k7\}\{z\}
- +[a-f0-9]+: 62 71 f5 48 55 f9 vandnpd %zmm1,%zmm1,%zmm15
+ +[a-f0-9]+: c5 71 55 f9 vandnpd %xmm1,%xmm1,%xmm15
+[a-f0-9]+: c5 71 55 f9 vandnpd %xmm1,%xmm1,%xmm15
+[a-f0-9]+: 62 e1 f5 48 55 c1 vandnpd %zmm1,%zmm1,%zmm16
+[a-f0-9]+: 62 e1 f5 08 55 c1 vandnpd %xmm1,%xmm1,%xmm16
@@ -18,7 +18,7 @@ Disassembly of section .text:
+[a-f0-9]+: 62 b1 f5 00 55 c9 vandnpd %xmm17,%xmm17,%xmm1
+[a-f0-9]+: 62 71 74 4f 55 f9 vandnps %zmm1,%zmm1,%zmm15\{%k7\}
+[a-f0-9]+: 62 71 74 af 55 f9 vandnps %ymm1,%ymm1,%ymm15\{%k7\}\{z\}
- +[a-f0-9]+: 62 71 74 48 55 f9 vandnps %zmm1,%zmm1,%zmm15
+ +[a-f0-9]+: c5 70 55 f9 vandnps %xmm1,%xmm1,%xmm15
+[a-f0-9]+: c5 70 55 f9 vandnps %xmm1,%xmm1,%xmm15
+[a-f0-9]+: 62 e1 74 48 55 c1 vandnps %zmm1,%zmm1,%zmm16
+[a-f0-9]+: 62 e1 74 08 55 c1 vandnps %xmm1,%xmm1,%xmm16
@@ -27,7 +27,7 @@ Disassembly of section .text:
+[a-f0-9]+: c5 71 df f9 vpandn %xmm1,%xmm1,%xmm15
+[a-f0-9]+: 62 71 75 4f df f9 vpandnd %zmm1,%zmm1,%zmm15\{%k7\}
+[a-f0-9]+: 62 71 75 af df f9 vpandnd %ymm1,%ymm1,%ymm15\{%k7\}\{z\}
- +[a-f0-9]+: 62 71 75 48 df f9 vpandnd %zmm1,%zmm1,%zmm15
+ +[a-f0-9]+: c5 71 df f9 vpandn %xmm1,%xmm1,%xmm15
+[a-f0-9]+: c5 71 df f9 vpandn %xmm1,%xmm1,%xmm15
+[a-f0-9]+: 62 e1 75 48 df c1 vpandnd %zmm1,%zmm1,%zmm16
+[a-f0-9]+: 62 e1 75 08 df c1 vpandnd %xmm1,%xmm1,%xmm16
@@ -35,7 +35,7 @@ Disassembly of section .text:
+[a-f0-9]+: 62 b1 75 00 df c9 vpandnd %xmm17,%xmm17,%xmm1
+[a-f0-9]+: 62 71 f5 4f df f9 vpandnq %zmm1,%zmm1,%zmm15\{%k7\}
+[a-f0-9]+: 62 71 f5 af df f9 vpandnq %ymm1,%ymm1,%ymm15\{%k7\}\{z\}
- +[a-f0-9]+: 62 71 f5 48 df f9 vpandnq %zmm1,%zmm1,%zmm15
+ +[a-f0-9]+: c5 71 df f9 vpandn %xmm1,%xmm1,%xmm15
+[a-f0-9]+: c5 71 df f9 vpandn %xmm1,%xmm1,%xmm15
+[a-f0-9]+: 62 e1 f5 48 df c1 vpandnq %zmm1,%zmm1,%zmm16
+[a-f0-9]+: 62 e1 f5 08 df c1 vpandnq %xmm1,%xmm1,%xmm16
@@ -43,7 +43,7 @@ Disassembly of section .text:
+[a-f0-9]+: 62 b1 f5 00 df c9 vpandnq %xmm17,%xmm17,%xmm1
+[a-f0-9]+: 62 71 f5 4f 57 f9 vxorpd %zmm1,%zmm1,%zmm15\{%k7\}
+[a-f0-9]+: 62 71 f5 af 57 f9 vxorpd %ymm1,%ymm1,%ymm15\{%k7\}\{z\}
- +[a-f0-9]+: 62 71 f5 48 57 f9 vxorpd %zmm1,%zmm1,%zmm15
+ +[a-f0-9]+: c5 71 57 f9 vxorpd %xmm1,%xmm1,%xmm15
+[a-f0-9]+: c5 71 57 f9 vxorpd %xmm1,%xmm1,%xmm15
+[a-f0-9]+: 62 e1 f5 48 57 c1 vxorpd %zmm1,%zmm1,%zmm16
+[a-f0-9]+: 62 e1 f5 08 57 c1 vxorpd %xmm1,%xmm1,%xmm16
@@ -51,7 +51,7 @@ Disassembly of section .text:
+[a-f0-9]+: 62 b1 f5 00 57 c9 vxorpd %xmm17,%xmm17,%xmm1
+[a-f0-9]+: 62 71 74 4f 57 f9 vxorps %zmm1,%zmm1,%zmm15\{%k7\}
+[a-f0-9]+: 62 71 74 af 57 f9 vxorps %ymm1,%ymm1,%ymm15\{%k7\}\{z\}
- +[a-f0-9]+: 62 71 74 48 57 f9 vxorps %zmm1,%zmm1,%zmm15
+ +[a-f0-9]+: c5 70 57 f9 vxorps %xmm1,%xmm1,%xmm15
+[a-f0-9]+: c5 70 57 f9 vxorps %xmm1,%xmm1,%xmm15
+[a-f0-9]+: 62 e1 74 48 57 c1 vxorps %zmm1,%zmm1,%zmm16
+[a-f0-9]+: 62 e1 74 08 57 c1 vxorps %xmm1,%xmm1,%xmm16
@@ -60,7 +60,7 @@ Disassembly of section .text:
+[a-f0-9]+: c5 71 ef f9 vpxor %xmm1,%xmm1,%xmm15
+[a-f0-9]+: 62 71 75 4f ef f9 vpxord %zmm1,%zmm1,%zmm15\{%k7\}
+[a-f0-9]+: 62 71 75 af ef f9 vpxord %ymm1,%ymm1,%ymm15\{%k7\}\{z\}
- +[a-f0-9]+: 62 71 75 48 ef f9 vpxord %zmm1,%zmm1,%zmm15
+ +[a-f0-9]+: c5 71 ef f9 vpxor %xmm1,%xmm1,%xmm15
+[a-f0-9]+: c5 71 ef f9 vpxor %xmm1,%xmm1,%xmm15
+[a-f0-9]+: 62 e1 75 48 ef c1 vpxord %zmm1,%zmm1,%zmm16
+[a-f0-9]+: 62 e1 75 08 ef c1 vpxord %xmm1,%xmm1,%xmm16
@@ -68,7 +68,7 @@ Disassembly of section .text:
+[a-f0-9]+: 62 b1 75 00 ef c9 vpxord %xmm17,%xmm17,%xmm1
+[a-f0-9]+: 62 71 f5 4f ef f9 vpxorq %zmm1,%zmm1,%zmm15\{%k7\}
+[a-f0-9]+: 62 71 f5 af ef f9 vpxorq %ymm1,%ymm1,%ymm15\{%k7\}\{z\}
- +[a-f0-9]+: 62 71 f5 48 ef f9 vpxorq %zmm1,%zmm1,%zmm15
+ +[a-f0-9]+: c5 71 ef f9 vpxor %xmm1,%xmm1,%xmm15
+[a-f0-9]+: c5 71 ef f9 vpxor %xmm1,%xmm1,%xmm15
+[a-f0-9]+: 62 e1 f5 48 ef c1 vpxorq %zmm1,%zmm1,%zmm16
+[a-f0-9]+: 62 e1 f5 08 ef c1 vpxorq %xmm1,%xmm1,%xmm16
@@ -76,7 +76,7 @@ Disassembly of section .text:
+[a-f0-9]+: 62 b1 f5 00 ef c9 vpxorq %xmm17,%xmm17,%xmm1
+[a-f0-9]+: 62 71 75 4f f8 f9 vpsubb %zmm1,%zmm1,%zmm15\{%k7\}
+[a-f0-9]+: 62 71 75 af f8 f9 vpsubb %ymm1,%ymm1,%ymm15\{%k7\}\{z\}
- +[a-f0-9]+: 62 71 75 48 f8 f9 vpsubb %zmm1,%zmm1,%zmm15
+ +[a-f0-9]+: c5 71 f8 f9 vpsubb %xmm1,%xmm1,%xmm15
+[a-f0-9]+: c5 71 f8 f9 vpsubb %xmm1,%xmm1,%xmm15
+[a-f0-9]+: 62 e1 75 48 f8 c1 vpsubb %zmm1,%zmm1,%zmm16
+[a-f0-9]+: 62 e1 75 08 f8 c1 vpsubb %xmm1,%xmm1,%xmm16
@@ -84,7 +84,7 @@ Disassembly of section .text:
+[a-f0-9]+: 62 b1 75 00 f8 c9 vpsubb %xmm17,%xmm17,%xmm1
+[a-f0-9]+: 62 71 75 4f f9 f9 vpsubw %zmm1,%zmm1,%zmm15\{%k7\}
+[a-f0-9]+: 62 71 75 af f9 f9 vpsubw %ymm1,%ymm1,%ymm15\{%k7\}\{z\}
- +[a-f0-9]+: 62 71 75 48 f9 f9 vpsubw %zmm1,%zmm1,%zmm15
+ +[a-f0-9]+: c5 71 f9 f9 vpsubw %xmm1,%xmm1,%xmm15
+[a-f0-9]+: c5 71 f9 f9 vpsubw %xmm1,%xmm1,%xmm15
+[a-f0-9]+: 62 e1 75 48 f9 c1 vpsubw %zmm1,%zmm1,%zmm16
+[a-f0-9]+: 62 e1 75 08 f9 c1 vpsubw %xmm1,%xmm1,%xmm16
@@ -92,7 +92,7 @@ Disassembly of section .text:
+[a-f0-9]+: 62 b1 75 00 f9 c9 vpsubw %xmm17,%xmm17,%xmm1
+[a-f0-9]+: 62 71 75 4f fa f9 vpsubd %zmm1,%zmm1,%zmm15\{%k7\}
+[a-f0-9]+: 62 71 75 af fa f9 vpsubd %ymm1,%ymm1,%ymm15\{%k7\}\{z\}
- +[a-f0-9]+: 62 71 75 48 fa f9 vpsubd %zmm1,%zmm1,%zmm15
+ +[a-f0-9]+: c5 71 fa f9 vpsubd %xmm1,%xmm1,%xmm15
+[a-f0-9]+: c5 71 fa f9 vpsubd %xmm1,%xmm1,%xmm15
+[a-f0-9]+: 62 e1 75 48 fa c1 vpsubd %zmm1,%zmm1,%zmm16
+[a-f0-9]+: 62 e1 75 08 fa c1 vpsubd %xmm1,%xmm1,%xmm16
@@ -100,7 +100,7 @@ Disassembly of section .text:
+[a-f0-9]+: 62 b1 75 00 fa c9 vpsubd %xmm17,%xmm17,%xmm1
+[a-f0-9]+: 62 71 f5 4f fb f9 vpsubq %zmm1,%zmm1,%zmm15\{%k7\}
+[a-f0-9]+: 62 71 f5 af fb f9 vpsubq %ymm1,%ymm1,%ymm15\{%k7\}\{z\}
- +[a-f0-9]+: 62 71 f5 48 fb f9 vpsubq %zmm1,%zmm1,%zmm15
+ +[a-f0-9]+: c5 71 fb f9 vpsubq %xmm1,%xmm1,%xmm15
+[a-f0-9]+: c5 71 fb f9 vpsubq %xmm1,%xmm1,%xmm15
+[a-f0-9]+: 62 e1 f5 48 fb c1 vpsubq %zmm1,%zmm1,%zmm16
+[a-f0-9]+: 62 e1 f5 08 fb c1 vpsubq %xmm1,%xmm1,%xmm16
--
2.14.3