This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.


Re: [PATCH] improves exp() and expf() performance on Sparc.

From: Wilco Dijkstra <Wilco dot Dijkstra at arm dot com>
To: "patrick dot mcgehearty at oracle dot com" <patrick dot mcgehearty at oracle dot com>
Cc: "libc-alpha at sourceware dot org" <libc-alpha at sourceware dot org>, nd <nd at arm dot com>
Date: Mon, 11 Sep 2017 18:50:28 +0000
Subject: Re: [PATCH] improves exp() and expf() performance on Sparc.

Patrick wrote:

> When the differences are this large and the new max is faster than the
> old min, I don't see a need in doing further performance testing.

Agreed, the new version is so much faster that there is really no contest.
What isn't obvious is how much of a penalty the very large tables incur.
So while I think further improvements are feasible, that shouldn't hold up
adding it to generic code.

> Moving on to expf, the comparison for individual values shows an
> improvement in the range of 15x. benchtests does not measure expf().

We do now have an expf benchmark, see:
https://sourceware.org/ml/libc-alpha/2017-08/msg01126.html

> The Szabolcs code appears to provide similar benefits.  There was
> some discussion of accuracy and of possible changes to the algorithm,
> perhaps by using a larger table. The Sparc code uses a larger table and
> thus may be more accurate for some ulp sensitive values. Or it may be
> a non-issue since both algorithms are using double precision for
> computation.

Part of the discussion was about further improving performance by reducing the
polynomial degree and increasing the table size, at the cost of a small increase
in ULP error (still well below 1 ULP). Another aspect discussed was what one
should do for non-nearest rounding modes. I don't believe we should expect math
functions to be perfect in those modes if that means complicating or even slowing
down round-to-nearest. (While the rounding-mode handling is no longer a critical
performance bug on most targets after I fixed the fenv implementation, it still
causes significant slowdowns in many math functions.)

Talking about tables, the Sparc version uses very large tables, which may be
why it didn't do as well in the expf benchmark or when running wrf_s (1.9% slower).
This appears to be inherent to the algorithm used: while it seems feasible to
almost halve the tables, doing so would mean lower throughput and increased latency.

> Wilco Dijkstra compared the new Sparc code to Szabolcs code on aarch64
> and found Szabolcs code to be 10% faster on aarch64.  That result is
> close enough to justify testing on Sparc. In addition to a performance
> comparison, we'd want to compare accuracy to see if there are notable
> differences.

Accuracy is unlikely to be an issue given that both are already far more accurate
than strictly necessary. For testing I would suggest running the expf trace as well
as wrf_s, with both versions built and ifunced in the same way (as Joseph already
suggested).

Wilco
