This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: [PATCH] improves exp() and expf() performance on Sparc.


On 9/7/2017 4:05 PM, Joseph Myers wrote:
> On Thu, 7 Sep 2017, Patrick McGehearty wrote:
>
>> The sysdeps/ieee_754 subtree has a number of direct calls into
>> ieee754_exp from such places as e_sinh, e_cosh, e_gamma_r, and s_erf.
>> While I have not found direct calls to __exp in the ieee_754 subtree,
>> I see overriding w_exp_compat.c as having some risk of
>> unexpected behavior with the only perceived benefit to be eliminating
>> a modest number of bytes from libm.
>
> Those direct calls don't use the wrapper and so are completely irrelevant
> to the matter of overriding it.
>
> It is quite clear that the wrapper needs to be overridden on any
> architecture providing its own exp (as opposed to __ieee754_exp)
> implementation, just as ia64 overrides it.
>
>> For expf, the comparison for individual values shows an improvement
>> in the range of 15x. benchtests does not measure expf().
>
> Presumably you need to test with the benchmark addition Szabolcs points to
> in his patch submission.
>
>> Making this change will provide a clear, immediate gain in expf()
>> performance.
>
> Maintainability is also important, and it points against having lots of
> architecture-specific versions.  Thus, people interested in expf
> optimization should first be helping with the review of Szabolcs's patch
> (and the benchtests addition patch it builds on).  Once that's done, it
> can provide a basis for judging the merits of architecture-specific expf
> versions (which might well also indicate improvements to Szabolcs's code
> as an alternative to adding an architecture-specific version).
>
> For exp, when you have a better-performing C version the question should
> first be whether it can replace the existing generic C version (possibly
> then being built multiple times on architectures where that's useful)
> rather than whether to add it as architecture-specific code.  Adding a C
> version as architecture-specific code (rather than having limited
> architecture-specific hooks in a generic version) should only be once
> there is evidence of different architectures' performance characteristics
> requiring substantially different approaches.



The sysdeps/ieee754 subtree has a number of direct calls into
__ieee754_exp from such places as e_sinh, e_cosh, e_gamma_r, and s_erf.
While I have not found direct calls to __exp in the ieee754 subtree,
I see overriding w_exp_compat.c as having some risk of
unexpected behavior, with the only perceived benefit being the
elimination of a modest number of bytes from libm.

As for exp performance, when I test isolated values, I find the factor
of improvement between the ieee754 code and the new code on Sparc to be
in the range of 8x to 14x. That does not count cases which trigger
slowexp().

Comparing the "make bench" benchtests/bench.out for exp():
     ieee754    new
max:  17630     174
min:    399      26
mean:  5320      67

When the differences are this large and the new max is faster than the
old min, I don't see a need for further performance testing.
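
For reference, the ratios implied by the bench.out figures above can be
worked out with a few lines of C. This is only an arithmetic sketch: the
struct and function names are made up for illustration, and the numbers
are copied from the table in whatever time units bench.out reports.

```c
#include <stdio.h>

/* One row of the comparison: old (ieee754) time versus new time.  */
struct bench_row
{
  const char *name;
  double old_time;
  double new_time;
};

/* Figures copied from the exp() comparison above.  */
static const struct bench_row rows[] = {
  { "max", 17630.0, 174.0 },
  { "min", 399.0, 26.0 },
  { "mean", 5320.0, 67.0 },
};

/* Speedup factor: how many times faster the new code is for a row.  */
static double
speedup (const struct bench_row *r)
{
  return r->old_time / r->new_time;
}

/* Print one line per row with its speedup factor.  */
static void
print_speedups (void)
{
  for (size_t i = 0; i < sizeof rows / sizeof rows[0]; i++)
    printf ("%-5s %6.1fx\n", rows[i].name, speedup (&rows[i]));
}
```

The ratios come out at roughly 101x for max, 15x for min, and 79x for
mean. The max and mean ratios are well above the 8x to 14x seen for
isolated values, which would be consistent with the benchmark inputs
including slow-path cases in the old code, though that is an inference
from the numbers and has not been checked here.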

Moving on to expf, the comparison for individual values shows an
improvement in the range of 15x. benchtests does not measure expf().
Making this change will provide a clear, immediate gain in expf()
performance.

Szabolcs's code appears to provide similar benefits.  There was
some discussion of accuracy and of possible changes to the algorithm,
perhaps by using a larger table. The Sparc code uses a larger table and
thus may be more accurate for some ulp-sensitive values. Or it may be
a non-issue since both algorithms are using double precision for
computation.

Wilco Dijkstra compared the new Sparc code to Szabolcs's code on aarch64
and found Szabolcs's code to be 10% faster there.  That result is
close enough to justify testing on Sparc. In addition to a performance
comparison, we'd want to compare accuracy to see if there are notable
differences.


