This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: [PATCH] improves exp() and expf() performance on Sparc.


On 9/6/2017 4:01 PM, Joseph Myers wrote:
> On Wed, 6 Sep 2017, Patrick McGehearty wrote:
>
>> The sysdeps/ieee754/dbl-64/w_exp_compat.c
>> declares __exp (double x)
>> and then adds:
>>   hidden_def (__exp)
>>   weak_alias (__exp, exp)
>>
>> I believe the weak_alias in w_exp_compat.c is overridden by the
>> sparc_libm_ifunc in e_exp-generic.c.  At least, I am not seeing any
>> link-time errors about double exp declarations, and I am seeing the new
>> code being executed (as shown by the speed and accuracy changes).
>
> Then you should avoid any object code from w_exp_compat.c being linked
> into libm.so at all, by overriding it with a dummy file, rather than just
> letting certain symbols be overridden at link time.

>> As for error handling, I believe the extra level of indirection on
>> return from exp provided by the sysdeps/ieee754/dbl-64/w_exp_compat.c
>> routine is an anti-performance design. Every normal return from e_exp
>
> It's fairly clearly a design optimized for consistency of error handling
> in the presence of several architecture-specific implementations of the
> main function, without needing to e.g. deal with TLS in assembly code for
> accessing errno or make multiple implementations handle matherr the same
> way.  When you avoid architecture-specific implementations (especially .S
> ones) as far as possible, integrated error handling is more practical,
> especially if you also use new symbol versions to avoid needing to deal
> with matherr.

> For expf, performance obviously needs to be compared with Szabolcs's
> implementation (compiled with whatever options and configured
> appropriately regarding conversions to integer etc. to be optimal for
> SPARC).  For exp, I'm inclined to say performance should be compared with
> the existing exp *with the slow paths calling __slowexp removed along with
> the associated checks for whether to use those slow paths* since those
> slow paths are completely unnecessary.

The sysdeps/ieee754 subtree has a number of direct calls to
__ieee754_exp from places such as e_sinh, e_cosh, e_gamma_r, and s_erf.
While I have not found direct calls to __exp in the ieee754 subtree,
I see overriding w_exp_compat.c as carrying some risk of unexpected
behavior, with the only perceived benefit being the removal of a modest
number of bytes from libm.

For exp, when I test isolated values, I find the new code on Sparc to be
8x to 14x faster than the ieee754 code. That excludes cases that
trigger slowexp().

Comparing the "make bench" results in benchtests/bench.out for exp():
     ieee754    new
max:  17630     174
min:    399      26
mean:  5320      67

When the differences are this large, and the new max is lower than the
old min, I see no need for further performance testing.

For expf, comparing individual values shows an improvement of about
15x; benchtests does not measure expf().
Making this change will provide a clear, immediate gain in expf()
performance.

Is Szabolcs's code in its final form?  There was some discussion
of accuracy and of possible changes to the algorithm, perhaps using
a larger table. The Sparc code uses a larger table and thus may
be more accurate for some ulp-sensitive values. Or it may be a non-issue,
since both algorithms use double precision for the computation.

Wilco Dijkstra compared the new Sparc code with Szabolcs's code on
aarch64 and found Szabolcs's code to be 10% faster there.
That advantage may or may not be reversed on Sparc, but the margin is
small enough to justify testing.
In addition to a performance comparison, we'd want to do an
accuracy comparison to see what differences we might be accepting.


