This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



[PATCH v2] BZ #14059 - Fix AVX and FMA4 detection.


On Mon, May 14, 2012 at 11:30 AM, Jeff Law <law@redhat.com> wrote:
> When you've got an updated patch, let me know.  I probably need to return my
> testbox today; so the window where I can test is closing rapidly.

Jeff,

Sorry, I couldn't speed things up.

Allan,

For some additional testing could you please test the attached patch?

Community,

I have finished testing on three systems, and everything worked on each:
- an AVX-enabled RHEL61 box,
- a non-AVX, but SSE4.2-enabled, Ubuntu box,
- an ancient i686 system.

I'd thought that calling __get_cpu_features() would work because of this:

# ifndef NOT_IN_libc
#  define __get_cpu_features()  (&__cpu_features)
# endif

But that doesn't cover all of the cases under which this code is compiled.

Therefore I just did this:

/* CPUID_* evaluates to true if the feature flag is enabled.
   We always use &__cpu_features because the HAS_CPUID_* macros
   are called only within __init_cpu_features, where we can't
   call __get_cpu_features without infinite recursion.  */
# define HAS_CPUID_FLAG(idx, reg, bit) \
  (((&__cpu_features)->cpuid[idx].reg & (bit)) != 0)

# define CPUID_OSXSAVE \
  HAS_CPUID_FLAG (COMMON_CPUID_INDEX_1, ecx, bit_OSXSAVE)
# define CPUID_AVX \
  HAS_CPUID_FLAG (COMMON_CPUID_INDEX_1, ecx, bit_AVX)
# define CPUID_FMA4 \
  HAS_CPUID_FLAG (COMMON_CPUID_INDEX_80000001, ecx, bit_FMA4)
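
To illustrate the recursion problem outside of glibc, here is a minimal
sketch. It assumes the accessor lazily triggers initialization, as the
comment above implies. Every mock_* / MOCK_* name is hypothetical and the
struct is a simplified stand-in, not the real layout of __cpu_features.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for glibc's internal definitions.  */
enum { MOCK_INDEX_1, MOCK_INDEX_80000001, MOCK_INDEX_MAX };
#define MOCK_BIT_AVX (1u << 28)   /* CPUID leaf 1, ECX bit 28.  */

struct mock_cpu_features
{
  int initialized;
  struct { uint32_t eax, ebx, ecx, edx; } cpuid[MOCK_INDEX_MAX];
  int avx_present;
};

static struct mock_cpu_features mock_features;
static uint32_t mock_leaf1_ecx;   /* Pretend CPUID result.  */
static int mock_init_calls;       /* Counts entries into the init function.  */

/* Same shape as HAS_CPUID_FLAG: read the global object directly.  */
#define MOCK_HAS_CPUID_FLAG(idx, reg, bit) \
  (((&mock_features)->cpuid[idx].reg & (bit)) != 0)

static void
mock_init_cpu_features (void)
{
  ++mock_init_calls;
  mock_features.cpuid[MOCK_INDEX_1].ecx = mock_leaf1_ecx;
  /* "initialized" is still 0 here, so calling mock_get_cpu_features ()
     at this point would re-enter this function.  */
  mock_features.avx_present
    = MOCK_HAS_CPUID_FLAG (MOCK_INDEX_1, ecx, MOCK_BIT_AVX);
  mock_features.initialized = 1;
}

static const struct mock_cpu_features *
mock_get_cpu_features (void)
{
  if (!mock_features.initialized)
    mock_init_cpu_features ();
  return &mock_features;
}
```

Outside of the init function, callers can keep using the accessor; only
the flag checks made during initialization need the direct form.
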

While the dynamic loader initially uses trivial versions of strcmp,
strchr and rtld-strlen on purpose, and this works, eventually, once the
map for libc.so is loaded, we call __strcasecmp via the PLT, which calls
__init_cpu_features. This is annoyingly hard to follow with gdb, as are
most things in the dynamic loader.

The new patch runs the entire testsuite without regression.

The test-multiarch test passes on an AVX-enabled system with:

Checking HAS_AVX:
  init-arch 1
  cpuinfo (avx) 1
Checking HAS_FMA4:
  init-arch 0
  cpuinfo (fma4) 0
Checking HAS_SSE4_2:
  init-arch 1
  cpuinfo (sse4_2) 1
Checking HAS_SSE4_1:
  init-arch 1
  cpuinfo (sse4_1) 1
Checking HAS_SSSE3:
  init-arch 1
  cpuinfo (ssse3) 1
Checking HAS_POPCOUNT:
  init-arch 1
  cpuinfo (popcnt) 1
0 differences between /proc/cpuinfo and glibc code.

On a recent system without AVX I get:

Checking HAS_AVX:
  init-arch 0
  cpuinfo (avx) 0
Checking HAS_FMA4:
  init-arch 0
  cpuinfo (fma4) 0
Checking HAS_SSE4_2:
  init-arch 1
  cpuinfo (sse4_2) 1
Checking HAS_SSE4_1:
  init-arch 1
  cpuinfo (sse4_1) 1
Checking HAS_SSSE3:
  init-arch 1
  cpuinfo (ssse3) 1
Checking HAS_POPCOUNT:
  init-arch 1
  cpuinfo (popcnt) 1
0 differences between /proc/cpuinfo and glibc code.

On an old i686 system:

Checking HAS_AVX:
  init-arch 0
  cpuinfo (avx) 0
Checking HAS_FMA4:
  init-arch 0
  cpuinfo (fma4) 0
Checking HAS_SSE4_2:
  init-arch 0
  cpuinfo (sse4_2) 0
Checking HAS_SSE4_1:
  init-arch 0
  cpuinfo (sse4_1) 0
Checking HAS_SSSE3:
  init-arch 0
  cpuinfo (ssse3) 0
Checking HAS_POPCOUNT:
  init-arch 0
  cpuinfo (popcnt) 0
0 differences between /proc/cpuinfo and glibc code.
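
For readers without the attachment, the "cpuinfo (flag)" half of each
check above could be computed roughly as follows. This is only a sketch
of whole-token matching on a "flags" line from /proc/cpuinfo; the helper
name cpuinfo_line_has_flag is hypothetical and this is not the code in
test-multiarch.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Return true if FLAG appears as a complete token in LINE, so that
   "avx" does not match "avx2" and "sse4" does not match "sse4_1".  */
static bool
cpuinfo_line_has_flag (const char *line, const char *flag)
{
  assert (line != NULL && flag != NULL);
  size_t len = strlen (flag);
  if (len == 0)
    return false;

  for (const char *p = strstr (line, flag); p != NULL;
       p = strstr (p + 1, flag))
    {
      /* A token starts at the beginning of the line or after a separator.  */
      bool starts = (p == line
                     || p[-1] == ' ' || p[-1] == '\t' || p[-1] == ':');
      /* A token ends at the end of the string or before whitespace.  */
      bool ends = (p[len] == '\0' || p[len] == ' '
                   || p[len] == '\t' || p[len] == '\n');
      if (starts && ends)
        return true;
    }
  return false;
}
```

The result would then be compared against the corresponding init-arch
value, and each mismatch counted as a difference.
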

Patch attached.

2012-05-11  Andreas Jaeger  <aj@suse.de>
	    Carlos O'Donell  <carlos_odonell@mentor.com>

	[BZ #14059]
	* sysdeps/x86_64/multiarch/init-arch.h
	(bit_YMM_Usable): Rename to...
	(bit_AVX_Usable): ... this.
	(bit_FMA4_Usable): New macro.
	(bit_XMM_state): New macro.
	(bit_YMM_state): New macro.
	[__ASSEMBLER__] (index_YMM_Usable): Rename to...
	[__ASSEMBLER__] (index_AVX_Usable): ... this.
	[__ASSEMBLER__] (index_FMA4_Usable): New macro.
	(CPUID_OSXSAVE): New macro.
	(CPUID_AVX): New macro.
	(CPUID_FMA4): New macro.
	(index_YMM_Usable): Rename to...
	(index_AVX_Usable): ... this.
	(HAS_AVX): Use HAS_ARCH_FEATURE.
	(HAS_FMA4): Likewise.
	(HAS_YMM_USABLE): Remove.
	* sysdeps/x86_64/multiarch/init-arch.c (__init_cpu_features):
	Fix check for AVX, enable FMA4 only if it exists and if AVX is
	usable.
	* sysdeps/x86_64/multiarch/strcmp.S: Use bit_AVX_Usable.
	* sysdeps/i386/i686/multiarch/Makefile: Add test-multiarch to tests.
	* sysdeps/x86_64/multiarch/Makefile: Likewise.
	* sysdeps/i386/i686/multiarch/test-multiarch.c: New file.
	* sysdeps/x86_64/multiarch/test-multiarch.c: New file.
--
 i386/i686/multiarch/Makefile         |    1
 i386/i686/multiarch/test-multiarch.c |    1
 x86_64/multiarch/Makefile            |    1
 x86_64/multiarch/init-arch.c         |   17 ++++--
 x86_64/multiarch/init-arch.h         |   51 +++++++++++++-------
 x86_64/multiarch/strcmp.S            |    9 ++-
 x86_64/multiarch/test-multiarch.c    |   88 +++++++++++++++++++++++++++++++++++
 7 files changed, 142 insertions(+), 26 deletions(-)
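
Since the diff itself is only in the attachment, here is a sketch of the
decision described in the init-arch.c entry above: AVX is usable only if
the CPU reports AVX and OSXSAVE and the OS has enabled both XMM and YMM
state in XCR0, and FMA4 is enabled only if it exists and AVX is usable.
The bit positions are the architectural ones; the sk_* / SK_* names and
the pure-function form (register values passed in instead of executing
cpuid/xgetbv) are assumptions made for illustration, not the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SK_BIT_OSXSAVE (1u << 27)  /* CPUID leaf 1, ECX.  */
#define SK_BIT_AVX     (1u << 28)  /* CPUID leaf 1, ECX.  */
#define SK_BIT_FMA4    (1u << 16)  /* CPUID leaf 0x80000001, ECX.  */
#define SK_XMM_STATE   (1u << 1)   /* XCR0: OS saves SSE state.  */
#define SK_YMM_STATE   (1u << 2)   /* XCR0: OS saves AVX state.  */

struct sk_usable
{
  bool avx;
  bool fma4;
};

/* LEAF1_ECX and EXT_ECX are the ECX outputs of CPUID leaves 1 and
   0x80000001.  XCR0_LO is the low half of XCR0 as read by xgetbv; it is
   only consulted when OSXSAVE is set, because xgetbv is not valid
   otherwise.  */
static struct sk_usable
sk_decide (uint32_t leaf1_ecx, uint32_t ext_ecx, uint32_t xcr0_lo)
{
  struct sk_usable u = { false, false };
  const uint32_t need = SK_XMM_STATE | SK_YMM_STATE;

  if ((leaf1_ecx & SK_BIT_AVX) != 0
      && (leaf1_ecx & SK_BIT_OSXSAVE) != 0
      && (xcr0_lo & need) == need)
    {
      u.avx = true;
      /* FMA4 operates on the same registers, so it depends on AVX
         being usable.  */
      if ((ext_ecx & SK_BIT_FMA4) != 0)
        u.fma4 = true;
    }
  return u;
}
```

With these inputs, a CPU that reports AVX and FMA4 but runs under an OS
that does not save YMM state ends up with both features disabled.
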

OK to check in?

Cheers,
Carlos.

Attachment: final-avx-v3.diff
Description: Binary data

