This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: [RFC][PATCH 0/2] aarch64: Add optimized ASIMD versions of sinf/cosf



On 13/06/2017 10:23, Szabolcs Nagy wrote:
> On 13/06/17 13:56, Sekhar, Ashwin wrote:
>>>>   SINF
>>>>   ---------------------------------------------------------
>>>>   Input           ThunderX88      ThunderX99      CortexA57
>>>>   ---------------------------------------------------------
>>>>   0.0              1.88x           1.18x           1.17x
>>>>   2.0^-28          1.33x           1.12x           1.03x
>>>>   2.0^-6           1.48x           1.28x           1.27x
>>>>   0.6*Pi/4         0.94x           1.14x           1.21x
>>>>   13*Pi/8          1.41x           2.00x           2.16x
>>>>   17*Pi/8          1.45x           1.93x           2.23x
>>> based on these numbers my current c implementation is faster,
>>> but it will take time to polish that for submission.
>>
>> Are these going to be aarch64 specific C implementations or changes in
>> generic code?
>>
>> And could you please let us know when you plan to submit your patches?
>>
>> I also don't want to duplicate effort. But if you don't plan to
>> submit your changes in the near future, I guess I will go ahead,
>> address the other comments, and work on submitting a v2 patch.
>>
> 
> the plan is the next release cycle (i plan to post powf
> first, then work on sinf/cosf, possibly sin/cos too, then
> look at vector versions once the vector abi is in gcc).
> 
> the c implementation is generic
> (sometimes the instruction scheduling is suboptimal and
> i found that union based bithacks don't always give good
> code but those are issues we can work on the gcc side)
> 
> one issue is fma vs non-fma code, i haven't solved that
> yet, but it will probably work either way (since we use
> double prec), if it makes a difference i will add ifdef
> code path for the two cases (might affect the fast arg
> reduction)

x86_64 handles the fma vs non-fma split with ifunc, selecting the variant
at load time (sysdeps/x86_64/fpu/multiarch/e_pow-fma4.c for instance).

