This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: [PATCH] Sparc exp(), expf() performance improvement


On 07/31/2017 05:30 PM, Patrick McGehearty wrote:
> On 7/31/2017 3:12 PM, Carlos O'Donell wrote:
>> On 07/31/2017 03:47 PM, David Miller wrote:
>>> From: Patrick McGehearty <patrick.mcgehearty@oracle.com>
>>> Date: Mon, 31 Jul 2017 15:39:29 -0400
>>>
>>>> This PATCH is intended to improve exp() and expf() performance on Sparc.
>>>> These changes will only be active on Sparc platforms and only for
>>>> those platforms that support HWCAP_SPARC_CRYPTO (niagara4 and later).
>>> Can you explain which instructions exactly help make the compiled
>>> C code for exp() and expf() faster instead of being vague like
>>> this?
>>>
>>> Wouldn't the new C code you are adding be faster on other CPUs as
>>> well, even without gcc generating instructions for Niagara 4 and
>>> later?
>>
>> ... I would also like to see the results of the glibc microbenchmark
>> *before* and *after* the patches. We have *specific* microbenchmarks
>> for lots of math functions.
>>
> 
> I'm assuming you are referring to the results of running
> "make bench". I found some exp results in benchtests/bench.out.
> On my test machine (a single core VM in a Sparc S7 running at 4.3GHz):

My apologies, I realize I was being too harsh in my last email.
We look forward to more posts from Oracle to help out with SPARC.

Yes, this is exactly what you need to do: run `make bench` to get
data before and after the changes.

> ieee754  (before)
>   "exp": {
>    "": {
>     "duration": 4.34656e+10,
>     "iterations": 8.224e+06,
>     "max": 16550.3,
>     "min": 400.426,
>     "mean": 5285.21
>    },
> 
> new sparc code (after)
>   "exp": {
>    "": {
>     "duration": 4.25365e+10,
>     "iterations": 6.07034e+08,
>     "max": 183.446,
>     "min": 27.095,
>     "mean": 70.0726
>    },
> 
> I have to say that the ratio of 5285/70 = 75x speedup seems way
> too optimistic for my new code. I have not investigated the reason
> for the apparently super slow max value.

It looks like you've made significant performance gains for the input
values being tested by the microbenchmark.

One possible reason for the 75x: have you eliminated a slow path
that used to go through multi-precision arithmetic?
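
As a sanity check on the quoted figures, the sketch below recomputes the
per-call mean from each entry's "duration" and "iterations" and then
takes the before/after ratio. It assumes that "mean" is simply duration
divided by iterations, which the quoted numbers are consistent with. The
helper names are made up for illustration and are not part of benchtests.

```python
# Figures copied from the two quoted bench.out excerpts.
before = {"duration": 4.34656e+10, "iterations": 8.224e+06, "mean": 5285.21}
after = {"duration": 4.25365e+10, "iterations": 6.07034e+08, "mean": 70.0726}


def mean_from_totals(entry):
    """Recompute the per-call mean as total duration / iteration count."""
    return entry["duration"] / entry["iterations"]


def speedup(old, new):
    """Ratio of the old per-call cost to the new per-call cost."""
    return old / new


ratio = speedup(before["mean"], after["mean"])

print(f"recomputed mean (before): {mean_from_totals(before):.2f}")
print(f"recomputed mean (after):  {mean_from_totals(after):.4f}")
print(f"mean speedup: {ratio:.1f}x")
```

The recomputed means agree with the reported ones, so the roughly 75x
figure follows directly from the benchmark's own totals and is not an
arithmetic slip.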

> I wrote my own standalone tests which tested a variety of different values for x
> with a repeat factor of 500 (i.e. time the computation of each value 500 times).
> A sample of results:
> x= 10  ieee754 exp(x) = 172 nsec; new exp(x) = 37 nsec (or 51 nsec without t4 optimizations)
> x=-10  ieee754 exp(x) = 172 nsec; new exp(x) = 37 nsec
> x=-0.9 ieee754 exp(x) = 172 nsec; new exp(x) = 19 nsec
> x= 0.9 ieee754 exp(x) = 172 nsec; new exp(x) = 19 nsec
> x= 0   ieee754 exp(x) = 116 nsec; new exp(x) =  8 nsec
> 
> The expf() is around 200 nsec for ieee754.
> The new expf() time is typically around 12-13 nsec/call.

Perhaps we need more representative inputs added to benchtests/exp-inputs?
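
For comparison, here is a small sketch that works out the per-input
speedups from the standalone timings quoted above. The numbers are
copied from that sample. The variable names are illustrative only.

```python
# (x, ieee754 nsec, new nsec) as quoted from the standalone test sample.
samples = [
    (10, 172, 37),
    (-10, 172, 37),
    (-0.9, 172, 19),
    (0.9, 172, 19),
    (0, 116, 8),
]

# Speedup per input value: old time divided by new time.
speedups = {x: old / new for x, old, new in samples}

for x, s in speedups.items():
    print(f"x={x:>5}: {s:.1f}x")

print(f"range: {min(speedups.values()):.1f}x to {max(speedups.values()):.1f}x")
```

These per-input ratios are all well below the 75x seen in bench.out,
which is consistent with the benchmark inputs exercising a different
mix of code paths than the standalone values do.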

-- 
Cheers,
Carlos.

