This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Improving expf() for Sparc and/or generic


Notes from informal testing of Szabolcs Nagy's expf()
vs Patrick McGehearty's expf()  - Sept 11, 2017


Performance Discussion

For the following, Patrick's expf() code was revised from the
previously submitted version to remove one unnecessary branch without
changing the logic or execution order of the computation.  That
change sped up Patrick's code by roughly 10%, making the two
algorithms very close to even in overall performance.

Tests were built with gcc 6.3 using -O -mcpu=niagara7
and run on a Sparc s7 at a nominal clock speed of 4.31 GHz.

The following values were tested: 40.125, 39.75, 21.125, 19.625,
13.125, 12.375, 12.25, 10.0, 5.475, 2.3578, 1.0001, 0.99, 0.80, 0.50,
0.10, 0.0, and the negatives of the above non-zero values.

Two versions of tests over these values were run.

In the first test, expf() for each value was computed 500 times in a
row and the average time per value was computed.
In the second test, expf() was computed for all values in order and
that pass was repeated 500 times. The average time per value was
computed.

When the computation of a single value was repeated, the time to
compute expf() for negative values of x was nearly identical for the
two versions. For positive values, Patrick's code was 14% faster,
or approximately 1.4 nsecs (6 clocks) per value.  When the computation
of all the different values was intermingled, Szabolcs's version was
6% faster (2 clocks) than Patrick's version. The difference is most
likely due to branch prediction, as mispredicted branches on Sparc s7
can easily cost 10-20 cycles.
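
As a check on the nanosecond and clock figures, the conversion at the
nominal 4.31 GHz clock is a single multiplication or division. The
helper names below are hypothetical and exist only to show the
arithmetic.

```c
/* Convert a duration in nanoseconds to clock cycles at a given
   clock rate in GHz (1 GHz is one cycle per nanosecond). */
double ns_to_cycles(double ns, double ghz)
{
    return ns * ghz;
}

/* Convert a cycle count back to nanoseconds at the same clock rate. */
double cycles_to_ns(double cycles, double ghz)
{
    return cycles / ghz;
}
```

With these, ns_to_cycles(1.4, 4.31) is about 6.03, matching the
6 clocks quoted for positive values, and cycles_to_ns(2, 4.31) is
about 0.46 nsecs for the intermingled case. Either gap is smaller than
the cost of one mispredicted branch at 10-20 cycles.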


Correctness Discussion

When in "nearest" rounding mode, both algorithms gave the same result
to the last ulp for all tested values. When the "make check" output is
examined, out of 356 values tested, Patrick's version has 3 values
which differ in the last ulp while Szabolcs's version has 18 values
which differ in the last ulp. Most of these were for very small values
of x (e.g. -0x1p-20). Since there are no differences in round to
nearest mode and all differences in other rounding modes are 1 ulp,
correctness differences are not a major concern.


Recommendation:

Since Szabolcs's code has a very slight edge in performance and
Patrick's code has a very slight edge in correctness, it is not clear
which would be the better choice for the generic version. Before
Patrick's code could be used as generic, it would have
to be adapted to use the call/return wrapper used by ieee754,
which might imply some loss of performance.

Certainly it would be better to have either version as the generic
code and not add a redundant Sparc-specific version.

