This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH] Sparc exp(), expf() performance improvement


Sparc has a significant performance issue with RAW (read after write).
That is, if a value is stored to a particular address and then read from that
address before the store has reached L2 cache, a pipeline hiccup occurs
and a 30+ cycle delay is seen. Most commonly this issue is seen in the case
of register spill/fills, but it also occurs when a value in an integer register to stored to a temporary in memory and then loaded to a floating point register.
The int to fp and fp to int operations are common in exp() algorithms due
to cracking the exponent from the mantissa to determine which special
case to use in handling particular input data ranges.

Starting with Niagara4 (T4), direct int to fp and fp to int transfer instructions
were added, avoiding this performance issue. If we compile for any Sparc
platform instead of T4 and later, we can't use the direct transfers.
Note that T4 was first introduced in 2011, meaning most current
Sparc/Linux platforms will have this support.

For comparison, recent x86 chips from Intel have thrown enough HW at
the RAW issue to not have any delays when a read-after-write occurs.

The new algorithm is significantly different from the existing sysdeps/ieee754 algorithm. The new algorithm matches the one used by the Solaris/Studio libm exp(), expf() code.
My effort was involved in porting (with Oracle corporate permission), not
algorithm construction.

It seems likely that this code could be faster on other CPUs, but I've only tested it on Sparc as that's the machines I have ready access to. The advantage may be much less on other platforms.

- patrick


On 7/31/2017 2:47 PM, David Miller wrote:
From: Patrick McGehearty <patrick.mcgehearty@oracle.com>
Date: Mon, 31 Jul 2017 15:39:29 -0400

This PATCH is intended to improve exp() and expf() performance on Sparc.
These changes will only be active on Sparc platforms and only for
those platforms that support HWCAP_SPARC_CRYPTO (niagara4 and later).
Can you explain which instructions exactly help make the compiled
C code for exp() and expf() faster instead of being vague like
this?

Wouldn't the new C code you are adding be faster on other CPUs as
well, even without gcc generating instructions for Niagara 4 and
later?

Thank you.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]