This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
[PATCH v3] BZ #14059 - Fix AVX and FMA4 detection.
- From: "Carlos O'Donell" <carlos at systemhalted dot org>
- To: Jeff Law <law at redhat dot com>, Allan McRae <allan at archlinux dot org>, Andreas Jaeger <aj at suse dot com>
- Cc: libc-alpha at sourceware dot org
- Date: Wed, 16 May 2012 21:46:42 -0400
- Subject: [PATCH v3] BZ #14059 - Fix AVX and FMA4 detection.
- References: <CADZpyizmP=jMtyS9yv0eBAAbiuyrou=dremZh7jG5sV73pR71A@mail.gmail.com>
On Wed, May 16, 2012 at 8:21 PM, Carlos O'Donell
<carlos@systemhalted.org> wrote:
> 2012-05-11  Andreas Jaeger  <aj@suse.de>
>             Carlos O'Donell  <carlos_odonell@mentor.com>
>
>         [BZ #14059]
>         * sysdeps/x86_64/multiarch/init-arch.h
>         (bit_YMM_Usable): Rename to...
>         (bit_AVX_Usable): ... this.
>         (bit_FMA4_Usable): New macro.
>         (bit_XMM_state): New macro.
>         (bit_YMM_state): New macro.
>         [__ASSEMBLER__] (index_YMM_Usable): Rename to...
>         [__ASSEMBLER__] (index_AVX_Usable): ... this.
>         [__ASSEMBLER__] (index_FMA4_Usable): New macro.
>         (CPUID_OSXSAVE): New macro.
>         (CPUID_AVX): New macro.
>         (CPUID_FMA4): New macro.
>         (index_YMM_Usable): Rename to...
>         (index_AVX_Usable): ... this.
>         (HAS_AVX): Use HAS_ARCH_FEATURE.
>         (HAS_FMA4): Likewise.
>         (HAS_YMM_USABLE): Remove.
>         * sysdeps/x86_64/multiarch/init-arch.c (__init_cpu_features):
>         Fix check for AVX, enable FMA4 only if it exists and if AVX is
>         usable.
>         * sysdeps/x86_64/multiarch/strcmp.S: Use bit_AVX_Usable.
>         * sysdeps/i386/i686/multiarch/Makefile: Add test-multiarch to tests.
>         * sysdeps/x86_64/multiarch/Makefile: Likewise.
>         * sysdeps/i386/i686/multiarch/test-multiarch.c: New file.
>         * sysdeps/x86_64/multiarch/test-multiarch.c: New file.
> --
>  i386/i686/multiarch/Makefile         |    1
>  i386/i686/multiarch/test-multiarch.c |    1
>  x86_64/multiarch/Makefile            |    1
>  x86_64/multiarch/init-arch.c         |   17 ++++--
>  x86_64/multiarch/init-arch.h         |   51 +++++++++++++-------
>  x86_64/multiarch/strcmp.S            |    9 ++-
>  x86_64/multiarch/test-multiarch.c    |   88 +++++++++++++++++++++++++++++++++++
>  7 files changed, 142 insertions(+), 26 deletions(-)
Does FMA4 support depend on AVX being present *and* enabled?
The patch enables FMA4 support whenever AVX is present; is that wrong?
We test for FMA4 using bit 16, but unfortunately bit 16 of the CPUID
result is marked reserved in the "Intel 64 and IA-32 Architectures
Software Developer's Manual" (May 2012).
What are we actually detecting with FMA4?
I see now that this is an AMD versus Intel mix-up: FMA4 is an AMD
extension, so Intel's manual does not document the bit.
I found FMA4 in "AMD64 Architecture Programmer’s Manual Volume 2:
System Programming" (March 2012) and in "AMD64 Architecture
Programmer’s Manual Volume 6: 128-Bit and 256-Bit, XOP, and FMA4
Instructions", which does not say that FMA4 depends on AVX:
~~~
Support for the new instructions is indicated by use of the CPUID instruction:
- XOP—ECX bit 11 as returned by CPUID function 8000_0001h.
- FMA4—ECX bit 16 as returned by CPUID function 8000_0001h.
Attempting to execute these instructions causes a #UD exception either if they are not present in the hardware or if operating system support for YMM context switching is not indicated by setting CR4.OSXSAVE to 1.
~~~
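As a small illustration of the excerpt, the two bits can be tested in the ECX value returned by CPUID function 8000_0001h. This is only a sketch; the macro and helper names below are hypothetical, not the ones glibc uses.

```c
#include <stdbool.h>
#include <stdint.h>

/* ECX bits of CPUID function 8000_0001h, per the AMD excerpt above.  */
#define CPUID_80000001_ECX_XOP  (UINT32_C (1) << 11)
#define CPUID_80000001_ECX_FMA4 (UINT32_C (1) << 16)

/* Hypothetical helpers: they only report that the processor implements
   the instructions.  Per the excerpt, executing them still raises #UD
   unless the OS has enabled YMM context switching.  */
static inline bool
cpu_reports_xop (uint32_t ecx)
{
  return (ecx & CPUID_80000001_ECX_XOP) != 0;
}

static inline bool
cpu_reports_fma4 (uint32_t ecx)
{
  return (ecx & CPUID_80000001_ECX_FMA4) != 0;
}
```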
Thus FMA4, like AVX, is usable if the CPU reports it and the OS has
enabled YMM state, but it does not depend on AVX.
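The difference between the old and new rules can be sketched with plain booleans instead of real CPUID output. The struct and both function names are hypothetical illustrations; the XCR0 masks match the patch's bit_XMM_state and bit_YMM_state, and the CPUID bit positions are the architectural ones.

```c
#include <stdbool.h>
#include <stdint.h>

/* XCR0 bits, matching the patch's bit_XMM_state and bit_YMM_state.  */
#define XCR0_XMM_STATE (UINT32_C (1) << 1)
#define XCR0_YMM_STATE (UINT32_C (1) << 2)

/* Hypothetical inputs, so the rule can be checked without real hardware.  */
struct cpu_bits
{
  bool osxsave;      /* CPUID 1, ECX bit 27.  */
  bool avx;          /* CPUID 1, ECX bit 28.  */
  bool fma4;         /* CPUID 8000_0001h, ECX bit 16.  */
  uint32_t xcr0_low; /* Low word of XGETBV (0); meaningful only if osxsave.  */
};

/* Old rule: FMA4 was enabled whenever the AVX bit was present.  */
static inline bool
fma4_usable_v2 (struct cpu_bits c)
{
  return c.avx && c.fma4;
}

/* New rule: FMA4, like AVX, needs only OS support for XMM and YMM state.  */
static inline bool
fma4_usable_v3 (struct cpu_bits c)
{
  const uint32_t need = XCR0_XMM_STATE | XCR0_YMM_STATE;
  return c.osxsave && (c.xcr0_low & need) == need && c.fma4;
}
```

For a CPU that reports AVX and FMA4 under an OS that has not enabled YMM state (XCR0 low bits 0x3), fma4_usable_v2 returns true, so FMA4 code would hit #UD, while fma4_usable_v3 returns false. A CPU reporting FMA4 without AVX is the opposite case.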
The delta against v2 is:
diff --git a/sysdeps/x86_64/multiarch/init-arch.c b/sysdeps/x86_64/multiarch/init-arch.c
index 26d62ef..155033d 100644
--- a/sysdeps/x86_64/multiarch/init-arch.c
+++ b/sysdeps/x86_64/multiarch/init-arch.c
@@ -143,21 +143,23 @@ __init_cpu_features (void)
   else
     kind = arch_kind_other;
 
-  if (CPUID_AVX)
+  /* Can we call xgetbv? */
+  if (CPUID_OSXSAVE)
     {
-      /* Determine if AVX is usable. */
-      if (CPUID_OSXSAVE
-          && ({ unsigned int xcrlow;
-                unsigned int xcrhigh;
-                asm ("xgetbv"
-                     : "=a" (xcrlow), "=d" (xcrhigh) : "c" (0));
-                (xcrlow & (bit_YMM_state | bit_XMM_state)) ==
-                  (bit_YMM_state | bit_XMM_state); }))
-        __cpu_features.feature[index_AVX_Usable] |= bit_AVX_Usable;
-
-      /* FMA4 depends on AVX support. */
-      if (CPUID_FMA4)
-        __cpu_features.feature[index_FMA4_Usable] |= bit_FMA4_Usable;
+      unsigned int xcrlow;
+      unsigned int xcrhigh;
+      asm ("xgetbv" : "=a" (xcrlow), "=d" (xcrhigh) : "c" (0));
+      /* Is YMM and XMM state usable? */
+      if ((xcrlow & (bit_YMM_state | bit_XMM_state)) ==
+          (bit_YMM_state | bit_XMM_state))
+        {
+          /* Determine if AVX is usable. */
+          if (CPUID_AVX)
+            __cpu_features.feature[index_AVX_Usable] |= bit_AVX_Usable;
+          /* Determine if FMA4 is usable. */
+          if (CPUID_FMA4)
+            __cpu_features.feature[index_FMA4_Usable] |= bit_FMA4_Usable;
+        }
     }
 
   __cpu_features.family = family;
---
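To inspect a test box by hand, the new logic can be mirrored in user space. This is a minimal sketch, assuming GCC or Clang on x86 (for `<cpuid.h>`'s `__get_cpuid` and an assembler that accepts the `xgetbv` mnemonic); `struct ymm_features` and `detect_ymm_features` are hypothetical names, not glibc interfaces.

```c
#include <cpuid.h>
#include <stdbool.h>

/* Feature masks, written out locally for this sketch.  */
#define ECX1_OSXSAVE (1u << 27)   /* CPUID 1, ECX.  */
#define ECX1_AVX     (1u << 28)   /* CPUID 1, ECX.  */
#define ECX81_FMA4   (1u << 16)   /* CPUID 8000_0001h, ECX.  */
#define XCR0_XMM_YMM ((1u << 1) | (1u << 2))

struct ymm_features
{
  bool ymm_state_ok;   /* OS saves XMM and YMM state (XCR0 bits 1 and 2).  */
  bool avx_usable;
  bool fma4_usable;
};

/* Hypothetical user-space mirror of the patched __init_cpu_features:
   XGETBV runs only when OSXSAVE is set, and AVX and FMA4 are each gated
   on YMM state alone, independently of one another.  */
static struct ymm_features
detect_ymm_features (void)
{
  struct ymm_features f = { false, false, false };
  unsigned int eax, ebx, edx;
  unsigned int ecx1 = 0, ecx81 = 0;

  if (!__get_cpuid (1, &eax, &ebx, &ecx1, &edx))
    return f;
  /* Leaf 8000_0001h may be missing; treat FMA4 as absent then.  */
  if (!__get_cpuid (0x80000001, &eax, &ebx, &ecx81, &edx))
    ecx81 = 0;

  /* Can we call xgetbv?  */
  if (ecx1 & ECX1_OSXSAVE)
    {
      unsigned int xcrlow, xcrhigh;
      __asm__ ("xgetbv" : "=a" (xcrlow), "=d" (xcrhigh) : "c" (0));
      (void) xcrhigh;
      f.ymm_state_ok = (xcrlow & XCR0_XMM_YMM) == XCR0_XMM_YMM;
      if (f.ymm_state_ok)
        {
          f.avx_usable = (ecx1 & ECX1_AVX) != 0;
          f.fma4_usable = (ecx81 & ECX81_FMA4) != 0;
        }
    }
  return f;
}
```

Printing the three flags from a short main () on an FMA4-capable machine would show whether FMA4 is reported usable independently of AVX.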
I'll send out a new email when testing is done.
Who has a box with FMA4 for testing?
Cheers,
Carlos.