This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: [PATCH] Sparc exp(), expf() performance improvement



On 04/08/2017 15:05, Wilco Dijkstra wrote:
> Adhemerval wrote:
>>
>> I agree with David: we should refrain from adding even more
>> platform-specific assembly optimizations where default C code could be
>> just as good and also improve generic performance on other platforms.
> 
> Absolutely, the code is already generic and shows great improvements on
> other targets (I tried Patrick's expf and it works fine on AArch64, achieving
> almost the performance of Szabolcs's version).
> 
>> The problem you describe is very similar to the one on POWER before POWER8,
>> where a floating-point to integer transfer incurs a load-hit-store that
>> increases latency.  I tried to mitigate this in sin/cos by tweaking the
>> internal code using hackish hooks (commit 77a2a8b4a19f0), but currently
>> I am convinced that a new algorithm for single float exp, sin, cos (and
>> probably others) is in fact a better solution.
> 
> We certainly need new algorithms and better implementations of existing math
> functions. However in most cases you can use the same generic code and build
> it using the right options for the fp->int transfer instructions. I don't see a reason
> for target specific implementations that are actually generic. Most target specific
> features can be done via macros/inline functions in math_private.h.
> 
> Looking at your commit, it seems to me that it is all generic and in most cases
> the generic code could be updated to use floating point comparisons. Then if
> we can show significant gains using bit manipulation the code could add 
> specialized paths for those cases that benefit.

The problem with this specific issue (fp->int transfer) is that it is assumed
to be a cheap operation and is thus used freely in various places.  It is
indeed cheap on most current architectures, and one can build glibc
targeting the appropriate baseline architecture/chip, which removes most of
the performance issues.

However, it is still hard to provide a baseline build that avoids this
kind of performance issue without resorting to either multiple glibc builds
(relying on the loader/auxv to select the best build) or multiple ifunc
implementations.  And for such an issue, 'ifuncing' just the fp->int transfer
will probably lead to even worse performance due to internal PLT costs.
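
To illustrate the dispatch cost, here is a rough sketch (not glibc code).
A plain function pointer stands in for an ifunc-selected symbol, and the
names asuint_inline, asuint_generic, asuint_dispatch and the two exponent
helpers are hypothetical.  The point is only that the dispatched variant
pays for an indirect call that cannot be inlined, around an operation that
is otherwise a single move (or a store plus load).

```c
#include <stdint.h>
#include <string.h>

/* Inline helper: the compiler can reduce this to the target's fp->int
   transfer sequence directly at the call site.  */
static inline uint32_t
asuint_inline (float x)
{
  uint32_t u;
  memcpy (&u, &x, sizeof u);
  return u;
}

/* Out-of-line variant that a resolver could select at startup.  */
uint32_t
asuint_generic (float x)
{
  uint32_t u;
  memcpy (&u, &x, sizeof u);
  return u;
}

/* Stand-in for an ifunc'ed symbol: every use is an indirect call.  */
uint32_t (*asuint_dispatch) (float) = asuint_generic;

/* Biased exponent via the inlined transfer.  */
uint32_t
exponent_inline (float x)
{
  return (asuint_inline (x) >> 23) & 0xff;
}

/* Same result, but with call overhead added on top of the transfer.  */
uint32_t
exponent_dispatched (float x)
{
  return (asuint_dispatch (x) >> 23) & 0xff;
}
```

Both helpers return the same value; only the cost of reaching the transfer
differs.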

That's why I would like to promote a baseline implementation that assumes
fp->int is costly and tries to avoid it on the function hot path.
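
As a minimal sketch of what avoiding the transfer can look like (again not
actual glibc code; the function names and the 88.0f threshold are
assumptions chosen for illustration), a magnitude check can be written
with floating-point comparisons instead of inspecting the bit pattern:

```c
#include <stdint.h>
#include <string.h>

/* Bit-manipulation style: needs an fp->int transfer just to test the
   magnitude.  0x42b00000 is the bit pattern of 88.0f.  */
int
is_large_bits (float x)
{
  uint32_t u;
  memcpy (&u, &x, sizeof u);
  return (u & 0x7fffffff) >= 0x42b00000;
}

/* Floating-point style: the value stays in FP registers.  Note that this
   returns 0 for NaN, while the bit test above returns 1, so a real
   implementation would have to handle NaN separately.  */
int
is_large_fp (float x)
{
  return x >= 88.0f || x <= -88.0f;
}
```

For non-NaN inputs the two checks agree, so the generic code can use the
second form and keep the first only where it shows a measurable gain.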

