This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: [PATCH] Optimized generic expf and exp2f
On 06/09/17 13:36, Wilco Dijkstra wrote:
> Szabolcs Nagy wrote:
>> On 05/09/17 21:58, Joseph Myers wrote:
>>> On Tue, 5 Sep 2017, Arjan van de Ven wrote:
>>
>>>> you mentioned x86 data.. is that based on current git after
>>>> the recent optimizations (on a cpu with fma)?
>>
>>> Really you need to compare with both the fma and non-fma versions (and
>>> compare the C version built both with and without fma, since one
>>> possibility would be that the C version can replace the x86_64 ones but
>>> should be built twice, with and without fma, to achieve that replacement).
>
> My machine has AVX2 and FMA, and when building the new generic expf
> with -mavx2 -mfma I get:
>
> expf reciprocal-throughput: 1.5x faster
> expf latency: 1.4x faster
>
> I verified in both cases FMA was used.

note that the algorithm used is similar, but
the c code uses a smaller polynomial, and another
fma latency is removed from the arg reduction
by using a 2^(x/N) polynomial instead of e^x.
also, the c code has no wrapper and it has
fewer branches to handle special cases.
so the c code does less computation than the
asm; in exchange, it does not pass the current
math tests for non-nearest rounding.