This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
[RFC] dl-procinfo and HWCAP_IMPORTANT support for powerpc
- From: Steve Munroe <sjmunroe at us dot ibm dot com>
- To: libc-alpha at sources dot redhat dot com
- Date: Wed, 14 Dec 2005 22:30:37 -0600
- Subject: [RFC] dl-procinfo and HWCAP_IMPORTANT support for powerpc
The recently committed dl-procinfo support for powerpc provides names for
AT_HWCAP bits as of 2.6.15. This update also defines HWCAP_IMPORTANT
support that allows for HWCAP based extensions to the library search path.
The intent is to allow the loader (ld.so) to select the most appropriate
version of a library given the hardware we are running on. This is
equivalent to the support for i686 optimized libraries for the i386 platform.
The 2.6.15 kernel defines the following AT_HWCAP bits:
#define PPC_FEATURE_32              0x80000000 /* 32-bit mode. */
#define PPC_FEATURE_64              0x40000000 /* 64-bit mode. */
#define PPC_FEATURE_601_INSTR       0x20000000 /* 601 chip, Old POWER ISA. */
#define PPC_FEATURE_HAS_ALTIVEC     0x10000000 /* SIMD/Vector Unit. */
#define PPC_FEATURE_HAS_FPU         0x08000000 /* Floating Point Unit. */
#define PPC_FEATURE_HAS_MMU         0x04000000 /* Memory Management Unit. */
#define PPC_FEATURE_HAS_4xxMAC      0x02000000 /* 4xx Multiply Accumulator. */
#define PPC_FEATURE_UNIFIED_CACHE   0x01000000 /* Unified I/D cache. */
#define PPC_FEATURE_HAS_SPE         0x00800000
#define PPC_FEATURE_HAS_EFP_SINGLE  0x00400000
#define PPC_FEATURE_HAS_EFP_DOUBLE  0x00200000
#define PPC_FEATURE_NO_TB           0x00100000 /* 601/403gx have no timebase */
#define PPC_FEATURE_POWER4          0x00080000 /* POWER4 microarch level */
#define PPC_FEATURE_POWER5          0x00040000 /* POWER5 microarch level */
#define PPC_FEATURE_POWER5_PLUS     0x00020000 /* POWER5+ microarch level */
#define PPC_FEATURE_CELL            0x00010000 /* CELL PU microarch level */
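As a minimal sketch of how these bits might be tested, the helper below
checks a hwcap word against a feature mask. The name ppc_has_feature is
hypothetical, and the word is passed in as a parameter rather than read from
the auxiliary vector, so nothing here depends on a particular kernel or libc
interface.

```c
#include <stdint.h>

/* Values copied from the AT_HWCAP list above.  */
#define PPC_FEATURE_64           0x40000000
#define PPC_FEATURE_HAS_ALTIVEC  0x10000000
#define PPC_FEATURE_POWER4       0x00080000

/* Hypothetical helper: nonzero if every bit in MASK is set in HWCAP.  */
static int
ppc_has_feature (uint32_t hwcap, uint32_t mask)
{
  return (hwcap & mask) == mask;
}
```

For example, a 970 would be expected to satisfy
ppc_has_feature (hwcap, PPC_FEATURE_HAS_ALTIVEC | PPC_FEATURE_POWER4),
while a power4 would fail the same test because it lacks the altivec bit.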
These bits have been given the following procinfo names:
"ppc32",
"ppc64",
"ppc601",
"altivec",
"fpu",
"mmu",
"4xxmac",
"ucache",
"spe",
"efpsingle",
"efpdouble",
"notb",
"power4",
"power5",
"power5+",
"cell"
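To illustrate the correspondence between bits and names, here is a rough
sketch that lists the names for the bits set in a hwcap word. It assumes the
names are ordered from bit 31 (0x80000000) down to bit 16 (0x00010000), as in
the two lists above. The table layout and the helper names (append,
ppc_hwcap_names) are made up for this example and are not necessarily how
dl-procinfo stores or walks them.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Names from the list above, index 0 = 0x80000000, index 15 = 0x00010000.  */
static const char *const ppc_cap_names[16] = {
  "ppc32", "ppc64", "ppc601", "altivec", "fpu", "mmu", "4xxmac", "ucache",
  "spe", "efpsingle", "efpdouble", "notb", "power4", "power5", "power5+",
  "cell"
};

/* Append S to OUT (capacity LEN, *USED bytes already filled).
   Returns 0 on success, -1 if S does not fit.  */
static int
append (char *out, size_t len, size_t *used, const char *s)
{
  size_t n = strlen (s);
  if (n >= len - *used)
    return -1;
  memcpy (out + *used, s, n + 1);   /* Copies the terminating NUL too.  */
  *used += n;
  return 0;
}

/* Hypothetical helper: write the space-separated names of the bits set
   in HWCAP into BUF.  Returns 0 on success, -1 if BUF is too small.  */
static int
ppc_hwcap_names (uint32_t hwcap, char *buf, size_t len)
{
  size_t used = 0;

  if (len == 0)
    return -1;
  buf[0] = '\0';

  for (int i = 0; i < 16; i++)
    {
      if ((hwcap & (UINT32_C (0x80000000) >> i)) == 0)
        continue;
      if (used != 0 && append (buf, len, &used, " ") != 0)
        return -1;
      if (append (buf, len, &used, ppc_cap_names[i]) != 0)
        return -1;
    }
  return 0;
}
```

With a hwcap word of 0x50080000 (64-bit, altivec and power4 bits set) this
yields "ppc64 altivec power4".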
The last 4 names represent architecture feature levels with a corresponding
ISA (see Rationale: below for details). The proposed HWCAP_IMPORTANT mask
is:
+#define HWCAP_IMPORTANT (PPC_FEATURE_HAS_ALTIVEC \
+                         | PPC_FEATURE_POWER4 \
+                         | PPC_FEATURE_POWER5 \
+                         | PPC_FEATURE_POWER5_PLUS \
+                         | PPC_FEATURE_CELL)
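For reference, the mask works out to 0x100F0000. The sketch below shows the
effect of applying it to a hwcap word, so that only the bits that influence
the search path survive. The helper name ppc_important_bits is hypothetical
and is not part of the patch.

```c
#include <stdint.h>

/* Values copied from the AT_HWCAP list above.  */
#define PPC_FEATURE_HAS_ALTIVEC  0x10000000
#define PPC_FEATURE_POWER4       0x00080000
#define PPC_FEATURE_POWER5       0x00040000
#define PPC_FEATURE_POWER5_PLUS  0x00020000
#define PPC_FEATURE_CELL         0x00010000

#define HWCAP_IMPORTANT (PPC_FEATURE_HAS_ALTIVEC \
                         | PPC_FEATURE_POWER4 \
                         | PPC_FEATURE_POWER5 \
                         | PPC_FEATURE_POWER5_PLUS \
                         | PPC_FEATURE_CELL)

/* Hypothetical helper: drop every bit that is not in HWCAP_IMPORTANT.  */
static uint32_t
ppc_important_bits (uint32_t hwcap)
{
  return hwcap & HWCAP_IMPORTANT;
}
```

Bits such as "fpu" or "mmu" are masked out, so they never contribute a
directory level of their own.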
This is the minimum set for defining unique micro-architectural or ISA
features. The dl-procinfo implementation uses this information to augment
the library search list. The proposed correspondence between processors and
runtime library search directories (assuming nptl and 32-bit) is:
processor    library search
==========   ============
power4       /lib/tls/power4, /lib/tls, /lib
power5       /lib/tls/power5, /lib/tls, /lib
power5+      /lib/tls/power5+, /lib/tls, /lib
970          /lib/tls/altivec/power4, /lib/tls/altivec, /lib/tls, /lib
cell         /lib/tls/cell, /lib/tls, /lib
Similarly for 64-bit and /lib64. Since linuxthreads is deprecated, the
directory structure may be simplified (eliminating the tls level of the
directory). If linuxthreads is still supported, it is possible to support
only one implementation of linuxthreads and provide optimized
libraries only for nptl. In this case LD_ASSUME_KERNEL and the ABI note can
be used to simplify the directory structure (as in i386/i686).
Note: The additional "altivec" level for the 970 is an artifact of
encoding the 970 with 2 bits in the AT_HWCAP. The "altivec" directory can
be used to store Altivec optimized libraries, including 32-bit G4
implementations.
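The rows of the table above can be reproduced with a short sketch like the
following. It is only an illustration under stated assumptions: the important
bits are taken most significant first, "tls" is always the first level, and
trailing components are dropped one at a time. The helper names (append,
ppc_search_list) are hypothetical, and the real loader may enumerate more
combinations than this.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PPC_FEATURE_HAS_ALTIVEC  0x10000000
#define PPC_FEATURE_POWER4       0x00080000
#define PPC_FEATURE_POWER5       0x00040000
#define PPC_FEATURE_POWER5_PLUS  0x00020000
#define PPC_FEATURE_CELL         0x00010000

/* Important bits and their directory names, most significant first.  */
static const struct { uint32_t bit; const char *name; } ppc_important[] = {
  { PPC_FEATURE_HAS_ALTIVEC, "altivec" },
  { PPC_FEATURE_POWER4,      "power4"  },
  { PPC_FEATURE_POWER5,      "power5"  },
  { PPC_FEATURE_POWER5_PLUS, "power5+" },
  { PPC_FEATURE_CELL,        "cell"    },
};

/* Append S to OUT (capacity LEN, *USED bytes already filled).
   Returns 0 on success, -1 if S does not fit.  */
static int
append (char *out, size_t len, size_t *used, const char *s)
{
  size_t n = strlen (s);
  if (n >= len - *used)
    return -1;
  memcpy (out + *used, s, n + 1);   /* Copies the terminating NUL too.  */
  *used += n;
  return 0;
}

/* Hypothetical helper: write the comma-separated search list for HWCAP
   under BASE (for example "/lib") into OUT.  Returns 0 on success,
   -1 if OUT is too small.  */
static int
ppc_search_list (uint32_t hwcap, const char *base, char *out, size_t len)
{
  const char *dirs[6] = { "tls" };   /* "tls" plus at most five names.  */
  int ndirs = 1;
  size_t used = 0;

  for (size_t i = 0; i < sizeof ppc_important / sizeof ppc_important[0]; i++)
    if (hwcap & ppc_important[i].bit)
      dirs[ndirs++] = ppc_important[i].name;

  if (len == 0)
    return -1;
  out[0] = '\0';

  /* Emit BASE followed by the first K components, for K = ndirs .. 0.  */
  for (int k = ndirs; k >= 0; k--)
    {
      if (used != 0 && append (out, len, &used, ", ") != 0)
        return -1;
      if (append (out, len, &used, base) != 0)
        return -1;
      for (int j = 0; j < k; j++)
        if (append (out, len, &used, "/") != 0
            || append (out, len, &used, dirs[j]) != 0)
          return -1;
    }
  return 0;
}
```

Called with the altivec and power4 bits set and a base of "/lib", this gives
the 970 row; with only the power4 bit set it gives the power4 row.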
Rationale:
The power4 and 970 implement the full 64-bit PowerPC Version 2.0 ISA
including the "optional" "General Purpose" and "Graphics" groups (for
example, fsqrt and fsqrts). The power4 processors are more aggressively
pipelined, with out-of-order issue to 8 pipelines, and implement a weakly
consistent storage model. More importantly, the Fixed Point, Floating
Point and Load/Store units are paired and symmetrical. This leads to
optimizations that execute more instructions (in parallel) to get shorter
execution (fewer total cycles). These optimizations might actually run
slower on older processors (which can't support the same level of parallel
execution) but are necessary to get full performance out of the power4.
Previous (power3, G4) PowerPC implementations implemented the older Version
1.x ISA, had fewer pipelines, may not have implemented the optional
instructions, and/or implemented a strongly consistent storage model. So knowing that we
are running on a power4 (or newer) processor is very useful information.
The power5 processor implements the full 64-bit PowerPC Version 2.02 ISA
(adds the popcntb, fre, frsqrtes instructions). The power5 also has a
deeper storage queue (stores are more out-of-order than on power4). The
power5+ processor adds 64K page support and 4 additional FP
instructions.
The 970 chip (Apple G5, IBM JS20 blade) is represented as "power4" with the
"altivec" modifier. This is appropriate because the current 970
micro-architecture design was derived from the power4+ design with the addition
of the VMX unit. So fixed point and floating point (non-VMX) optimizations
for power4 are applicable to 970.
We have made a deliberate decision not to identify older processor
generations (POWER3, G4). They are no longer in production and/or are
adequately covered by the current glibc implementation. They will continue
to be supported by the current base/default implementation (as defined by gcc
options -mcpu=powerpc for 32-bit and -mcpu=powerpc64 for 64-bit).
Steven J. Munroe
Linux on Power Toolchain Architect
IBM Corporation, Linux Technology Center