This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: [PATCH] Sparc exp(), expf() performance improvement


On 7/31/2017 4:21 PM, David Miller wrote:
From: Patrick McGehearty <patrick.mcgehearty@oracle.com>
Date: Mon, 31 Jul 2017 16:06:44 -0500

Sparc has a significant performance issue with RAW (read after write).
That is, if a value is stored to a particular address and then read
from that address before the store has reached L2 cache, a pipeline
hiccup occurs and a 30+ cycle delay is seen. Most commonly this issue
is seen in the case of register spill/fills, but it also occurs when a
value in an integer register is stored to a temporary in memory and
then loaded to a floating point register. The int to fp and fp to int
operations are common in exp() algorithms due to cracking the exponent
from the mantissa to determine which special case to use in handling
particular input data ranges.
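
To make the "cracking the exponent" step concrete, here is a minimal C
sketch. It is not taken from the patch or from glibc; the helper name
biased_exponent is hypothetical, and it assumes IEEE 754 binary64
doubles. The fp to int transfer is the memcpy: without a direct
transfer instruction, a compiler would typically implement it as a
store to a stack temporary followed by a load, which is the RAW
pattern described above.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not from the patch): return the 11-bit biased
   exponent field of an IEEE 754 binary64 value. */
static unsigned biased_exponent(double x)
{
    uint64_t bits;

    /* fp -> int transfer. Without a direct register move this is
       typically a store of the fp register to memory followed by an
       integer load from the same address. */
    memcpy(&bits, &x, sizeof bits);

    /* Bits 52..62 hold the exponent; bit 63 is the sign. */
    return (unsigned)((bits >> 52) & 0x7ff);
}
```

For example, biased_exponent(1.0) is 1023 (the binary64 bias) and
biased_exponent(1024.0) is 1033. An exp() implementation could branch
on a value like this to choose between its special-case paths.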

Starting with Niagara4 (T4), direct int to fp and fp to int transfer
instructions were added, avoiding this performance issue. If we
compile for any Sparc platform instead of T4 and later, we can't use
the direct transfers. Note that T4 was first introduced in 2011,
meaning most current Sparc/Linux platforms will have this support.

For comparison, recent x86 chips from Intel have thrown enough HW at
the RAW issue to not have any delays when a read-after-write occurs.

The new algorithm is significantly different from the existing
sysdeps/ieee754 algorithm. The new algorithm matches the one used by
the Solaris/Studio libm exp(), expf() code. My effort was involved in
porting (with Oracle corporate permission), not algorithm
construction.

It seems likely that this code could be faster on other CPUs, but I've
only tested it on Sparc, as those are the machines I have ready access
to. The advantage may be much less on other platforms.

You miss my point.

You are doing two _completely_ different things here.

First, you could simply build the existing exp() and expf() C code in
glibc with niagara4.  In fact, if this float<-->int move instruction
helps so much, you probably want to build the entire math library
this way with appropriate ifunc hooks.  Not just exp/expf.
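
As a rough illustration of that first step, the sketch below selects
between two builds of the same routine at run time. It uses a plain
function pointer as a portable stand-in for an ifunc resolver, which
would make the same choice once at load time. Everything here is
hypothetical: CAP_DIRECT_MOVES is an invented capability bit, and
exp_body is a crude placeholder, not glibc's exp().

```c
#include <stdint.h>

typedef double (*exp_impl)(double);

/* Hypothetical capability bit meaning "CPU has direct int<->fp
   register moves". The real hwcap bit is not named in this thread. */
#define CAP_DIRECT_MOVES (UINT64_C(1) << 0)

/* Placeholder standing in for the existing exp() C source. This is a
   crude (1 + x/1024)^1024 approximation, for illustration only. */
static inline double exp_body(double x)
{
    double r = 1.0 + x / 1024.0;
    for (int i = 0; i < 10; i++)
        r *= r;                 /* ten squarings give r^(2^10) */
    return r;
}

/* In a real build these would be the same source compiled twice:
   once with generic flags and once with -mcpu=niagara4. */
static double exp_generic(double x)  { return exp_body(x); }
static double exp_niagara4(double x) { return exp_body(x); }

/* Pick an implementation from a capability mask, as a resolver
   would. */
static exp_impl select_exp(uint64_t caps)
{
    return (caps & CAP_DIRECT_MOVES) ? exp_niagara4 : exp_generic;
}
```

A caller would obtain the pointer once, for example
exp_impl f = select_exp(caps), and call f(x) afterwards. The point of
the split is that the algorithm stays identical in both variants, so
any speedup measured between them comes from the compiler flags alone.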

Second, you could then introduce the new C code implementation of exp
and expf functions and:

1) See if it is faster on other sparc cpus.

2) Ask other glibc developers to test whether it is faster on
    non-sparc cpus as well.

Making both changes and only targeting post-niagara4 cpus is
completely the wrong way to go about this.

I'm preparing to do a trial run on -mcpu=niagara4 for glibc.
I'll report back on any interesting differences for make bench
with/without -mcpu=niagara4 for the current sourceware tree.

I will note that, from my point of view, this project is focused only
on exp() and expf(), as Sparc/Solaris/Studio showed dramatically
better performance on those specific functions. There are a few
other functions which run faster on Sparc/Solaris/Studio, but
nothing like the performance difference for exp() and expf().

- patrick

