This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: [RFC][PATCH 0/2] aarch64: Add optimized ASIMD versions of sinf/cosf
On 13/06/2017 10:23, Szabolcs Nagy wrote:
> On 13/06/17 13:56, Sekhar, Ashwin wrote:
>>>> SINF
>>>> ---------------------------------------------------------
>>>> Input        ThunderX88    ThunderX99    CortexA57
>>>> ---------------------------------------------------------
>>>> 0.0          1.88x         1.18x         1.17x
>>>> 2.0^-28      1.33x         1.12x         1.03x
>>>> 2.0^-6       1.48x         1.28x         1.27x
>>>> 0.6*Pi/4     0.94x         1.14x         1.21x
>>>> 13*Pi/8      1.41x         2.00x         2.16x
>>>> 17*Pi/8      1.45x         1.93x         2.23x
>>> based on these numbers my current c implementation is faster,
>>> but it will take time to polish that for submission.
>>
>> Are these going to be aarch64 specific C implementations or changes in
>> generic code?
>>
>> Also, could you please let me know when you plan to submit your patches?
>>
>> I also don't want to duplicate effort. But if you don't plan to
>> submit your changes in the near future, I guess I will go ahead and
>> address the other comments and work on submitting a v2 patch.
>>
>
> the plan is the next release cycle (i plan to post powf
> first, then work on sinf/cosf, possibly sin/cos too, then
> look at vector versions once the vector abi is in gcc).
>
> the c implementation is generic
> (sometimes the instruction scheduling is suboptimal and
> i found that union based bithacks don't always give good
> code but those are issues we can work on the gcc side)
>
> one issue is fma vs non-fma code, i haven't solved that
> yet, but it will probably work either way (since we use
> double prec), if it makes a difference i will add ifdef
> code path for the two cases (might affect the fast arg
> reduction)
x86_64 handles this with ifunc, selecting the fma or non-fma variant at
run time (see sysdeps/x86_64/fpu/multiarch/e_pow-fma4.c, for instance).
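
For comparison, the compile-time alternative mentioned in the quoted text (an ifdef code path for the fma and non-fma cases) might look roughly like the sketch below. This is only an illustration, not code from glibc or from either patch: MADD, poly2 and poly2f are hypothetical names, the coefficients are placeholders, and it assumes a GCC-compatible compiler that predefines __FP_FAST_FMA and provides __builtin_fma.

```c
/* Hypothetical sketch: choose fused or unfused multiply-add at compile
   time.  GCC-compatible compilers predefine __FP_FAST_FMA when fma on
   double maps to a fast hardware instruction for the selected target.  */
#ifdef __FP_FAST_FMA
/* Fused path: a single rounding per multiply-add.  */
# define MADD(a, b, c) __builtin_fma ((a), (b), (c))
#else
/* Unfused path: separate multiply and add, two roundings.  */
# define MADD(a, b, c) ((a) * (b) + (c))
#endif

/* Hypothetical helper: evaluate c0 + x*(c1 + x*c2) in double precision
   using Horner's scheme.  */
static double
poly2 (double x, double c0, double c1, double c2)
{
  return MADD (x, MADD (x, c2, c1), c0);
}

/* Hypothetical float entry point: compute in double and round to float
   once at the end.  The coefficients are placeholders, not a real
   sinf/cosf approximation.  */
static float
poly2f (float x)
{
  return (float) poly2 ((double) x, 1.0, 0.5, 0.25);
}
```

The two paths can differ in the low bits of the double result; whether that survives the final rounding to float is the open question raised above. An ifunc-based variant would instead build both versions and pick one when the symbol is resolved, at the cost of an indirect call.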