This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: [PATCH] Sparc exp(), expf() performance improvement


On 7/31/2017 3:12 PM, Carlos O'Donell wrote:
> On 07/31/2017 03:47 PM, David Miller wrote:
>> From: Patrick McGehearty <patrick.mcgehearty@oracle.com>
>> Date: Mon, 31 Jul 2017 15:39:29 -0400
>>
>>> This PATCH is intended to improve exp() and expf() performance on Sparc.
>>> These changes will only be active on Sparc platforms and only for
>>> those platforms that support HWCAP_SPARC_CRYPTO (niagara4 and later).
>>
>> Can you explain which instructions exactly help make the compiled
>> C code for exp() and expf() faster instead of being vague like
>> this?
>>
>> Wouldn't the new C code you are adding be faster on other CPUs as
>> well, even without gcc generating instructions for Niagara 4 and
>> later?
>
> ... I would also like to see the results of the glibc microbenchmark
> *before* and *after* the patches. We have *specific* microbenchmarks
> for lots of math functions.


I'm assuming you are referring to the results of running
"make bench". I found some exp results in benchtests/bench.out.
On my test machine (a single-core VM on a Sparc S7 running at 4.3GHz):

ieee754  (before)
  "exp": {
   "": {
    "duration": 4.34656e+10,
    "iterations": 8.224e+06,
    "max": 16550.3,
    "min": 400.426,
    "mean": 5285.21
   },

new sparc code (after)
  "exp": {
   "": {
    "duration": 4.25365e+10,
    "iterations": 6.07034e+08,
    "max": 183.446,
    "min": 27.095,
    "mean": 70.0726
   },

I have to say that the ratio of the mean values, 5285/70, or about a
75x speedup, seems way too optimistic for my new code. I have not
investigated the reason for the apparently super slow max value.

I wrote my own standalone tests, which tested a variety of different
values for x with a repeat factor of 500 (i.e. time the computation of
each value 500 times).
A sample of results:
x= 10   ieee754 exp(x) = 172 nsec; new exp(x) = 37 nsec (or 51 nsec without t4 optimizations)
x=-10   ieee754 exp(x) = 172 nsec; new exp(x) = 37 nsec
x=-0.9  ieee754 exp(x) = 172 nsec; new exp(x) = 19 nsec
x= 0.9  ieee754 exp(x) = 172 nsec; new exp(x) = 19 nsec
x= 0    ieee754 exp(x) = 116 nsec; new exp(x) =  8 nsec

The ieee754 expf() takes around 200 nsec per call.
The new expf() typically takes around 12-13 nsec per call.

- patrick

