This is the mail archive of the binutils@sourceware.org mailing list for the binutils project.
Re: Common SSE4.1/SSE5 insns broken
- From: "H.J. Lu" <hjl at lucon dot org>
- To: Jakub Jelinek <jakub at redhat dot com>
- Cc: binutils at sources dot redhat dot com
- Date: Fri, 28 Dec 2007 07:45:16 -0800
- Subject: Re: Common SSE4.1/SSE5 insns broken
- References: <20071228091034.GI2947@sunsite.mff.cuni.cz>
On Fri, Dec 28, 2007 at 10:10:34AM +0100, Jakub Jelinek wrote:
> Hi!
>
> Doesn't CpuSSE4_1|CpuSSE5 mean it requires SSE4.1 AND SSE5 rather than
> SSE4.1 OR SSE5?
>
> ptest, 2, 0x660f3817, None, 3, CpuSSE4_1|CpuSSE5, Modrm|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { BaseIndex|Disp8|Disp16|Disp32|Disp32S|RegXMM, RegXMM }
> roundpd, 3, 0x660f3a09, None, 3, CpuSSE4_1|CpuSSE5, Modrm|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Imm8, BaseIndex|Disp8|Disp16|Disp32|Disp32S|RegXMM, RegXMM }
> roundps, 3, 0x660f3a08, None, 3, CpuSSE4_1|CpuSSE5, Modrm|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Imm8, BaseIndex|Disp8|Disp16|Disp32|Disp32S|RegXMM, RegXMM }
> roundsd, 3, 0x660f3a0b, None, 3, CpuSSE4_1|CpuSSE5, Modrm|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64, { Imm8, BaseIndex|Disp8|Disp16|Disp32|Disp32S|RegXMM, RegXMM }
> roundss, 3, 0x660f3a0a, None, 3, CpuSSE4_1|CpuSSE5, Modrm|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Imm8, BaseIndex|Disp8|Disp16|Disp32|Disp32S|RegXMM, RegXMM }
>
> Say e.g.:
>
> .arch generic64
> .arch .sse5
> ptest %xmm1,%xmm0
> frczss %xmm2, %xmm1
>
> fails to assemble with
> Warning: `ptest' is not supported on `generic64.sse5'
> Error: suffix or operands invalid for `ptest'
>
> and likewise for .arch .sse4.1. Works if both .sse5 and .sse4.1
> are present. Do we need yet another bit for the common
> SSE4.1 / SSE5 instructions, which .sse4.1, .sse5 would
> both set (and be set in unknown too)?
>
I am checking in this patch to fix it.
Thanks.
H.J.
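
As a rough illustration of the problem Jakub describes, here is a minimal sketch. It is not the actual gas code. It assumes the assembler accepts an instruction only when every CPU bit listed in its template is enabled, which is what the reported failure suggests. The enum values, `insn_supported` and the `arch_*` constants are hypothetical names made up for this example; the real flags live in the `i386_cpu_flags` bitfield.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bit masks, for illustration only.  */
enum {
  SSE4_1      = 1u << 0,
  SSE5        = 1u << 1,
  SSE4_1_OR_5 = 1u << 2   /* shared bit, set by both .sse4.1 and .sse5 */
};

/* Assumed matching rule: an instruction is accepted only when every
   bit it requires is present in the enabled architecture flags.  */
static bool
insn_supported (unsigned required, unsigned enabled)
{
  return (required & ~enabled) == 0;
}

/* Flags enabled by ".arch .sse5" before and after adding the shared
   bit, and by ".arch .sse4.1" after adding it.  */
static const unsigned arch_sse5_old   = SSE5;
static const unsigned arch_sse5_new   = SSE5 | SSE4_1_OR_5;
static const unsigned arch_sse4_1_new = SSE4_1 | SSE4_1_OR_5;

void
demo_checks (void)
{
  /* Old template: SSE4_1|SSE5 behaves as AND, so .sse5 alone fails.  */
  assert (!insn_supported (SSE4_1 | SSE5, arch_sse5_old));
  /* It only worked with both extensions enabled.  */
  assert (insn_supported (SSE4_1 | SSE5, SSE4_1 | SSE5));
  /* New template: the single shared bit is satisfied by either one.  */
  assert (insn_supported (SSE4_1_OR_5, arch_sse5_new));
  assert (insn_supported (SSE4_1_OR_5, arch_sse4_1_new));
}
```

Under that assumption, OR-ing two bits in a template can only ever tighten the requirement, so a separate bit set by every qualifying architecture is one way to express "either of these".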
----
gas/testsuite/
2007-12-28 H.J. Lu <hongjiu.lu@intel.com>
* gas/i386/arch-1.d: New file.
* gas/i386/arch-1.s: Likewise.
* gas/i386/arch-2.d: Likewise.
* gas/i386/arch-2.s: Likewise.
* gas/i386/arch-3.d: Likewise.
* gas/i386/arch-3.s: Likewise.
* gas/i386/arch-4.d: Likewise.
* gas/i386/arch-4.s: Likewise.
* gas/i386/i386.exp: Run arch-1, arch-2, arch-3 and arch-4.
opcodes/
2007-12-28 H.J. Lu <hongjiu.lu@intel.com>
* i386-gen.c (cpu_flag_init): Add CpuSSE4_1_Or_5 to
CPU_SSE4_1_FLAGS, CPU_SSE4_2_FLAGS and CPU_SSE5_FLAGS.
(cpu_flags): Add CpuSSE4_1_Or_5.
* i386-init.h: Regenerated.
* i386-tbl.h: Likewise.
* i386-opc.h (CpuSSE4_1_Or_5): New.
(CpuLM): Updated.
(i386_cpu_flags): Add cpusse4_1_or_5.
* i386-opc.tbl: Use CpuSSE4_1_Or_5 instead of CpuSSE4_1|CpuSSE5
on ptest, roundpd, roundps, roundsd and roundss.
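
The "(CpuLM): Updated." entry follows from how the flag indices in i386-opc.h are chained: each one is defined as the previous one plus 1, so a flag inserted in the middle needs its successor re-pointed, and everything after that shifts on its own. A small sketch of that pattern, using hypothetical flag names rather than the real ones:

```c
#include <assert.h>

/* Hypothetical flag indices, chained in the style of i386-opc.h.  */
#define FlagA   0
#define FlagB   (FlagA + 1)
/* Newly inserted flag.  */
#define FlagNew (FlagB + 1)
/* Was (FlagB + 1) before the insertion; must be updated by hand.  */
#define FlagC   (FlagNew + 1)
/* Unchanged definition; its value shifts automatically.  */
#define FlagD   (FlagC + 1)

void
check_flag_order (void)
{
  assert (FlagNew == 2);
  assert (FlagC == 3);
  assert (FlagD == 4);
}
```

Only the definition immediately after the new flag has to be edited. The bitfield in the flags union presumably has to gain a member at the matching position as well, which is what the i386_cpu_flags entry in the ChangeLog covers.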
--- binutils/gas/testsuite/gas/i386/arch-1.d.arch 2007-12-28 07:30:42.000000000 -0800
+++ binutils/gas/testsuite/gas/i386/arch-1.d 2007-12-28 07:42:49.000000000 -0800
@@ -0,0 +1,15 @@
+#objdump: -dw
+#name: i386 arch 1
+
+.*: file format .*
+
+Disassembly of section .text:
+
+0+ <.text>:
+[ ]*[a-f0-9]+: 66 0f 38 17 c1 ptest %xmm1,%xmm0
+[ ]*[a-f0-9]+: 66 0f 3a 09 c1 00 roundpd \$0x0,%xmm1,%xmm0
+[ ]*[a-f0-9]+: 66 0f 3a 08 c1 00 roundps \$0x0,%xmm1,%xmm0
+[ ]*[a-f0-9]+: 66 0f 3a 0b c1 00 roundsd \$0x0,%xmm1,%xmm0
+[ ]*[a-f0-9]+: 66 0f 3a 0a c1 00 roundss \$0x0,%xmm1,%xmm0
+[ ]*[a-f0-9]+: 66 0f 38 41 d9 phminposuw %xmm1,%xmm3
+#pass
--- binutils/gas/testsuite/gas/i386/arch-1.s.arch 2007-12-28 07:30:42.000000000 -0800
+++ binutils/gas/testsuite/gas/i386/arch-1.s 2007-12-28 07:41:44.000000000 -0800
@@ -0,0 +1,9 @@
+# Test .arch .sse4.1
+.arch generic32
+.arch .sse4.1
+ptest %xmm1,%xmm0
+roundpd $0,%xmm1,%xmm0
+roundps $0,%xmm1,%xmm0
+roundsd $0,%xmm1,%xmm0
+roundss $0,%xmm1,%xmm0
+phminposuw %xmm1,%xmm3
--- binutils/gas/testsuite/gas/i386/arch-2.d.arch 2007-12-28 07:30:42.000000000 -0800
+++ binutils/gas/testsuite/gas/i386/arch-2.d 2007-12-28 07:42:59.000000000 -0800
@@ -0,0 +1,15 @@
+#objdump: -dw
+#name: i386 arch 2
+
+.*: file format .*
+
+Disassembly of section .text:
+
+0+ <.text>:
+[ ]*[a-f0-9]+: 66 0f 38 17 c1 ptest %xmm1,%xmm0
+[ ]*[a-f0-9]+: 66 0f 3a 09 c1 00 roundpd \$0x0,%xmm1,%xmm0
+[ ]*[a-f0-9]+: 66 0f 3a 08 c1 00 roundps \$0x0,%xmm1,%xmm0
+[ ]*[a-f0-9]+: 66 0f 3a 0b c1 00 roundsd \$0x0,%xmm1,%xmm0
+[ ]*[a-f0-9]+: 66 0f 3a 0a c1 00 roundss \$0x0,%xmm1,%xmm0
+[ ]*[a-f0-9]+: f2 0f 38 f1 d9 crc32l %ecx,%ebx
+#pass
--- binutils/gas/testsuite/gas/i386/arch-2.s.arch 2007-12-28 07:30:42.000000000 -0800
+++ binutils/gas/testsuite/gas/i386/arch-2.s 2007-12-28 07:41:48.000000000 -0800
@@ -0,0 +1,9 @@
+# Test .arch .sse4.2
+.arch generic32
+.arch .sse4.2
+ptest %xmm1,%xmm0
+roundpd $0,%xmm1,%xmm0
+roundps $0,%xmm1,%xmm0
+roundsd $0,%xmm1,%xmm0
+roundss $0,%xmm1,%xmm0
+crc32 %ecx,%ebx
--- binutils/gas/testsuite/gas/i386/arch-3.d.arch 2007-12-28 07:30:42.000000000 -0800
+++ binutils/gas/testsuite/gas/i386/arch-3.d 2007-12-28 07:43:08.000000000 -0800
@@ -0,0 +1,15 @@
+#objdump: -dw
+#name: i386 arch 3
+
+.*: file format .*
+
+Disassembly of section .text:
+
+0+ <.text>:
+[ ]*[a-f0-9]+: 66 0f 38 17 c1 ptest %xmm1,%xmm0
+[ ]*[a-f0-9]+: 66 0f 3a 09 c1 00 roundpd \$0x0,%xmm1,%xmm0
+[ ]*[a-f0-9]+: 66 0f 3a 08 c1 00 roundps \$0x0,%xmm1,%xmm0
+[ ]*[a-f0-9]+: 66 0f 3a 0b c1 00 roundsd \$0x0,%xmm1,%xmm0
+[ ]*[a-f0-9]+: 66 0f 3a 0a c1 00 roundss \$0x0,%xmm1,%xmm0
+[ ]*[a-f0-9]+: f2 0f 38 f1 d9 crc32l %ecx,%ebx
+#pass
--- binutils/gas/testsuite/gas/i386/arch-3.s.arch 2007-12-28 07:30:42.000000000 -0800
+++ binutils/gas/testsuite/gas/i386/arch-3.s 2007-12-28 07:41:53.000000000 -0800
@@ -0,0 +1,9 @@
+# Test .arch .sse4
+.arch generic32
+.arch .sse4
+ptest %xmm1,%xmm0
+roundpd $0,%xmm1,%xmm0
+roundps $0,%xmm1,%xmm0
+roundsd $0,%xmm1,%xmm0
+roundss $0,%xmm1,%xmm0
+crc32 %ecx,%ebx
--- binutils/gas/testsuite/gas/i386/arch-4.d.arch 2007-12-28 07:30:42.000000000 -0800
+++ binutils/gas/testsuite/gas/i386/arch-4.d 2007-12-28 07:43:16.000000000 -0800
@@ -0,0 +1,15 @@
+#objdump: -dw
+#name: i386 arch 4
+
+.*: file format .*
+
+Disassembly of section .text:
+
+0+ <.text>:
+[ ]*[a-f0-9]+: 66 0f 38 17 c1 ptest %xmm1,%xmm0
+[ ]*[a-f0-9]+: 66 0f 3a 09 c1 00 roundpd \$0x0,%xmm1,%xmm0
+[ ]*[a-f0-9]+: 66 0f 3a 08 c1 00 roundps \$0x0,%xmm1,%xmm0
+[ ]*[a-f0-9]+: 66 0f 3a 0b c1 00 roundsd \$0x0,%xmm1,%xmm0
+[ ]*[a-f0-9]+: 66 0f 3a 0a c1 00 roundss \$0x0,%xmm1,%xmm0
+[ ]*[a-f0-9]+: 0f 7a 12 ca frczss %xmm2,%xmm1
+#pass
--- binutils/gas/testsuite/gas/i386/arch-4.s.arch 2007-12-28 07:30:42.000000000 -0800
+++ binutils/gas/testsuite/gas/i386/arch-4.s 2007-12-28 07:41:59.000000000 -0800
@@ -0,0 +1,9 @@
+# Test .arch .sse5
+.arch generic32
+.arch .sse5
+ptest %xmm1,%xmm0
+roundpd $0,%xmm1,%xmm0
+roundps $0,%xmm1,%xmm0
+roundsd $0,%xmm1,%xmm0
+roundss $0,%xmm1,%xmm0
+frczss %xmm2, %xmm1
--- binutils/gas/testsuite/gas/i386/i386.exp.arch 2007-12-23 21:28:11.000000000 -0800
+++ binutils/gas/testsuite/gas/i386/i386.exp 2007-12-28 07:29:28.000000000 -0800
@@ -98,6 +98,10 @@ if [expr ([istarget "i*86-*-*"] || [ist
run_dump_test "i386"
run_dump_test "compat"
run_dump_test "compat-intel"
+ run_dump_test "arch-1"
+ run_dump_test "arch-2"
+ run_dump_test "arch-3"
+ run_dump_test "arch-4"
# These tests require support for 8 and 16 bit relocs,
# so we only run them for ELF and COFF targets.
--- binutils/opcodes/i386-gen.c.arch 2007-12-23 21:28:11.000000000 -0800
+++ binutils/opcodes/i386-gen.c 2007-12-28 07:08:45.000000000 -0800
@@ -93,9 +93,9 @@ static initializer cpu_flag_init [] =
{ "CPU_SSSE3_FLAGS",
"CpuMMX|CpuMMX2|CpuSSE|CpuSSE2|CpuSSE3|CpuSSSE3" },
{ "CPU_SSE4_1_FLAGS",
- "CpuMMX|CpuMMX2|CpuSSE|CpuSSE2|CpuSSE3|CpuSSSE3|CpuSSE4_1" },
+ "CpuMMX|CpuMMX2|CpuSSE|CpuSSE2|CpuSSE3|CpuSSSE3|CpuSSE4_1|CpuSSE4_1_Or_5" },
{ "CPU_SSE4_2_FLAGS",
- "CpuMMX|CpuMMX2|CpuSSE|CpuSSE2|CpuSSE3|CpuSSSE3|CpuSSE4_1|CpuSSE4_2" },
+ "CpuMMX|CpuMMX2|CpuSSE|CpuSSE2|CpuSSE3|CpuSSSE3|CpuSSE4_1|CpuSSE4_2|CpuSSE4_1_Or_5" },
{ "CPU_3DNOW_FLAGS",
"CpuMMX|Cpu3dnow" },
{ "CPU_3DNOWA_FLAGS",
@@ -109,7 +109,7 @@ static initializer cpu_flag_init [] =
{ "CPU_ABM_FLAGS",
"CpuABM" },
{ "CPU_SSE5_FLAGS",
- "CpuMMX|CpuMMX2|CpuSSE|CpuSSE2|CpuSSE3|CpuSSE4a|CpuABM|CpuSSE5"}
+ "CpuMMX|CpuMMX2|CpuSSE|CpuSSE2|CpuSSE3|CpuSSE4a|CpuABM|CpuSSE5|CpuSSE4_1_Or_5"}
};
static initializer operand_type_init [] =
@@ -234,6 +234,7 @@ static bitfield cpu_flags[] =
BITFIELD (CpuSSE4_2),
BITFIELD (CpuSSE4a),
BITFIELD (CpuSSE5),
+ BITFIELD (CpuSSE4_1_Or_5),
BITFIELD (Cpu3dnow),
BITFIELD (Cpu3dnowA),
BITFIELD (CpuPadLock),
--- binutils/opcodes/i386-opc.h.arch 2007-12-23 21:28:11.000000000 -0800
+++ binutils/opcodes/i386-opc.h 2007-12-28 07:04:57.000000000 -0800
@@ -82,8 +82,10 @@
#define CpuSSE4_2 (CpuSSE4_1 + 1)
/* SSE5 support required */
#define CpuSSE5 (CpuSSE4_2 + 1)
+/* SSE4.1 or SSE5 support required */
+#define CpuSSE4_1_Or_5 (CpuSSE5 + 1)
/* 64bit support available, used by -march= in assembler. */
-#define CpuLM (CpuSSE5 + 1)
+#define CpuLM (CpuSSE4_1_Or_5 + 1)
/* 64bit support required */
#define Cpu64 (CpuLM + 1)
/* Not supported in the 64bit mode */
@@ -132,6 +134,7 @@ typedef union i386_cpu_flags
unsigned int cpusse4_1:1;
unsigned int cpusse4_2:1;
unsigned int cpusse5:1;
+ unsigned int cpusse4_1_or_5:1;
unsigned int cpulm:1;
unsigned int cpu64:1;
unsigned int cpuno64:1;
--- binutils/opcodes/i386-opc.tbl.arch 2007-12-23 21:28:11.000000000 -0800
+++ binutils/opcodes/i386-opc.tbl 2007-12-28 07:07:07.000000000 -0800
@@ -1373,11 +1373,11 @@ pmovzxwq, 2, 0x660f3834, None, 3, CpuSSE
pmovzxdq, 2, 0x660f3835, None, 3, CpuSSE4_1, Modrm|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64, { BaseIndex|Disp8|Disp16|Disp32|Disp32S|RegXMM, RegXMM }
pmuldq, 2, 0x660f3828, None, 3, CpuSSE4_1, Modrm|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { BaseIndex|Disp8|Disp16|Disp32|Disp32S|RegXMM, RegXMM }
pmulld, 2, 0x660f3840, None, 3, CpuSSE4_1, Modrm|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { BaseIndex|Disp8|Disp16|Disp32|Disp32S|RegXMM, RegXMM }
-ptest, 2, 0x660f3817, None, 3, CpuSSE4_1|CpuSSE5, Modrm|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { BaseIndex|Disp8|Disp16|Disp32|Disp32S|RegXMM, RegXMM }
-roundpd, 3, 0x660f3a09, None, 3, CpuSSE4_1|CpuSSE5, Modrm|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Imm8, BaseIndex|Disp8|Disp16|Disp32|Disp32S|RegXMM, RegXMM }
-roundps, 3, 0x660f3a08, None, 3, CpuSSE4_1|CpuSSE5, Modrm|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Imm8, BaseIndex|Disp8|Disp16|Disp32|Disp32S|RegXMM, RegXMM }
-roundsd, 3, 0x660f3a0b, None, 3, CpuSSE4_1|CpuSSE5, Modrm|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64, { Imm8, BaseIndex|Disp8|Disp16|Disp32|Disp32S|RegXMM, RegXMM }
-roundss, 3, 0x660f3a0a, None, 3, CpuSSE4_1|CpuSSE5, Modrm|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Imm8, BaseIndex|Disp8|Disp16|Disp32|Disp32S|RegXMM, RegXMM }
+ptest, 2, 0x660f3817, None, 3, CpuSSE4_1_Or_5, Modrm|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { BaseIndex|Disp8|Disp16|Disp32|Disp32S|RegXMM, RegXMM }
+roundpd, 3, 0x660f3a09, None, 3, CpuSSE4_1_Or_5, Modrm|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Imm8, BaseIndex|Disp8|Disp16|Disp32|Disp32S|RegXMM, RegXMM }
+roundps, 3, 0x660f3a08, None, 3, CpuSSE4_1_Or_5, Modrm|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Imm8, BaseIndex|Disp8|Disp16|Disp32|Disp32S|RegXMM, RegXMM }
+roundsd, 3, 0x660f3a0b, None, 3, CpuSSE4_1_Or_5, Modrm|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64, { Imm8, BaseIndex|Disp8|Disp16|Disp32|Disp32S|RegXMM, RegXMM }
+roundss, 3, 0x660f3a0a, None, 3, CpuSSE4_1_Or_5, Modrm|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Imm8, BaseIndex|Disp8|Disp16|Disp32|Disp32S|RegXMM, RegXMM }
// SSE4.2 instructions.