This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: [PATCH] Sparc exp(), expf() performance improvement


On 7/31/2017 6:22 PM, Carlos O'Donell wrote:
On 07/31/2017 05:30 PM, Patrick McGehearty wrote:
On 7/31/2017 3:12 PM, Carlos O'Donell wrote:
On 07/31/2017 03:47 PM, David Miller wrote:
From: Patrick McGehearty <patrick.mcgehearty@oracle.com>
Date: Mon, 31 Jul 2017 15:39:29 -0400

This PATCH is intended to improve exp() and expf() performance on Sparc.
These changes will only be active on Sparc platforms and only for
those platforms that support HWCAP_SPARC_CRYPTO (niagara4 and later).
Can you explain which instructions exactly help make the compiled
C code for exp() and expf() faster instead of being vague like
this?

Wouldn't the new C code you are adding be faster on other CPUs as
well, even without gcc generating instructions for Niagara 4 and
later?
... I would also like to see the results of the glibc microbenchmark
*before* and *after* the patches. We have *specific* microbenchmarks
for lots of math functions.

I'm assuming you are referring to the results of running
"make bench". I found some exp results in benchtests/bench.out.
On my test machine (a single core VM in a Sparc S7 running at 4.3GHz):
My apologies, I realize I was being too harsh in my last email.
We look forward to more posts from Oracle to help out with SPARC.

Yes, this is exactly what you need to do, run `make bench` to get
data before and after the changes.

ieee754  (before)
   "exp": {
    "": {
     "duration": 4.34656e+10,
     "iterations": 8.224e+06,
     "max": 16550.3,
     "min": 400.426,
     "mean": 5285.21
    },

new sparc code (after)
   "exp": {
    "": {
     "duration": 4.25365e+10,
     "iterations": 6.07034e+08,
     "max": 183.446,
     "min": 27.095,
     "mean": 70.0726
    },

I have to say that the ratio of 5285/70 = 75x speedup seems way
too optimistic for my new code. I have not investigated the reason
for the apparently super slow max value.
It looks like you've made significant performance gains for the input
values being tested by the microbenchmark.

One reason you might see 75x is that you've eliminated a slow path
that used to go through multi-precision arithmetic?
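
The arithmetic behind the 75x figure can be checked directly from the two bench.out excerpts quoted above. The following is only a sketch: the dictionaries are copied by hand from those excerpts, and the names `before`, `after` and `speedups` are made up for illustration and are not part of glibc's benchtests. In both excerpts, duration divided by iterations reproduces the reported mean.

```python
# Values copied from the two bench.out excerpts quoted in this message.
before = {"duration": 4.34656e+10, "iterations": 8.224e+06,
          "max": 16550.3, "min": 400.426, "mean": 5285.21}
after = {"duration": 4.25365e+10, "iterations": 6.07034e+08,
         "max": 183.446, "min": 27.095, "mean": 70.0726}


def speedups(old, new):
    """Ratio old/new for each per-call statistic (hypothetical helper)."""
    return {key: old[key] / new[key] for key in ("min", "mean", "max")}


ratios = speedups(before, after)
for key, ratio in ratios.items():
    print(f"{key}: {ratio:.1f}x")

# Cross-check: duration / iterations should be close to the reported mean.
for label, run in (("before", before), ("after", after)):
    print(label, run["duration"] / run["iterations"], run["mean"])
```

The ratio of the minimum times is far smaller than the ratio of the means, which fits the suggestion that a few very slow inputs dominate the "before" average.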

I wrote my own standalone tests, which timed exp(x) for a variety of x values
with a repeat factor of 500 (i.e. the computation of each value is timed 500 times).
A sample of results:

   x      ieee754 exp(x)   new exp(x)
   10     172 nsec         37 nsec (or 51 nsec without t4 optimizations)
  -10     172 nsec         37 nsec
  -0.9    172 nsec         19 nsec
   0.9    172 nsec         19 nsec
   0      116 nsec          8 nsec

The ieee754 expf() takes around 200 nsec/call.
The new expf() typically takes around 12-13 nsec/call.
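
A per-value timing loop of the kind described above (each x timed with a repeat factor of 500) might look roughly like the sketch below. This is not the standalone test used for the numbers in this message, which was not posted. The function name `time_exp` is hypothetical, and because the loop runs in Python the figures include interpreter overhead, so they are not comparable to the nsec values quoted here.

```python
import math
import time

REPEAT = 500  # repeat factor mentioned in the message


def time_exp(x, repeat=REPEAT):
    """Average wall-clock nanoseconds per math.exp(x) call (hypothetical helper)."""
    start = time.perf_counter_ns()
    for _ in range(repeat):
        math.exp(x)
    elapsed = time.perf_counter_ns() - start
    return elapsed / repeat


# The same x values as in the sample of results.
inputs = [10.0, -10.0, -0.9, 0.9, 0.0]
results = {x: time_exp(x) for x in inputs}
for x, nsec in results.items():
    print(f"x={x:5.1f}  exp(x) = {nsec:.0f} nsec/call (includes loop overhead)")
```

Timing each input separately, rather than averaging over a mixed input file, is what makes a slow path on particular values visible.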
Perhaps we need more indicative inputs added to benchtests/exp-inputs?

Thank you for explaining the source of the very slow paths in the current code.
It bothers me to have unusually good results from a code change
when I don't have a clue about the reason.

Yes, I can see value in adding a wider range of values for
the inputs to exp.  Some larger and some negative values
would be more representative. Also, by adding more values,
the unusual values would be less heavily weighted in the performance
averages. Of course, changes to benchtests/exp-inputs would
be a separate patch from the code I'm proposing.

- patrick
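
The weighting effect mentioned above can be illustrated with a small calculation. This is a sketch under loud assumptions: it pretends that every "typical" input costs 172 nsec (the ieee754 figure from the standalone tests) and that exactly one input costs 16550.3 nsec (the "max" from the before run). The two numbers come from different measurements and are combined here only to show how one slow input pulls the mean; `mean_with_one_slow_input` is a hypothetical name.

```python
TYPICAL_NSEC = 172.0   # assumed cost of a typical input (from the standalone tests)
SLOW_NSEC = 16550.3    # assumed cost of one slow-path input (bench.out "max", before)


def mean_with_one_slow_input(n_typical):
    """Mean per-call time for n_typical typical inputs plus one slow input."""
    total = n_typical * TYPICAL_NSEC + SLOW_NSEC
    return total / (n_typical + 1)


for n_typical in (4, 49, 499):
    mean = mean_with_one_slow_input(n_typical)
    print(f"{n_typical:3d} typical inputs + 1 slow: mean = {mean:.1f} nsec")
```

As more typical inputs are added, the mean moves toward the typical cost, which is the sense in which the unusual values become less heavily weighted.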

