This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: [PATCH] Optimized generic expf and exp2f


On 06/09/17 13:36, Wilco Dijkstra wrote:
> Szabolcs Nagy wrote:
>> On 05/09/17 21:58, Joseph Myers wrote:
>>> On Tue, 5 Sep 2017, Arjan van de Ven wrote:
>>
>>>> you mentioned x86 data.. is that based on current git after
>>>> the recent optimizations (on a cpu with fma)?
>>
>>> Really you need to compare with both the fma and non-fma versions (and 
>>> compare the C version built both with and without fma, since one 
>>> possibility would be that the C version can replace the x86_64 ones but 
>>> should be built twice, with and without fma, to achieve that replacement).
> 
> My machine has AVX2 and FMA, and when building the new generic expf
> with -mavx2 -mfma I get:
> 
> expf reciprocal-throughput: 1.5x faster
> expf latency: 1.4x faster
> 
> I verified in both cases FMA was used.

note that the algorithm used is similar, but
the c code uses a smaller polynomial, and another
fma latency is removed from the arg reduction
by using a 2^(x/N) polynomial instead of e^x.

..and the c code has no wrapper and it has
fewer branches to handle special cases.

so the c code does less computation than the
asm; in exchange it does not pass the current
math tests for non-nearest rounding.

