This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.
Re: [PATCH] Sparc exp(), expf() performance improvement
- From: Adhemerval Zanella <adhemerval dot zanella at linaro dot org>
- To: Wilco Dijkstra <Wilco dot Dijkstra at arm dot com>
- Cc: "libc-alpha at sourceware dot org" <libc-alpha at sourceware dot org>, nd <nd at arm dot com>
- Date: Fri, 4 Aug 2017 15:17:32 -0300
- Subject: Re: [PATCH] Sparc exp(), expf() performance improvement
- References: <DB6PR0801MB205301C335767EA2F9CDC46A83B60@DB6PR0801MB2053.eurprd08.prod.outlook.com>
On 04/08/2017 15:05, Wilco Dijkstra wrote:
> Adhemerval wrote:
>>
>> I agree with David: we should refrain from adding even more
>> platform-specific assembly optimizations where default C code could
>> be just as good and would also improve generic performance on other
>> platforms.
>
> Absolutely, the code is already generic and shows great improvements on
> other targets (I tried Patrick's expf and it works fine on AArch64, achieving
> almost the performance of Szabolcs's version).
>
>> The problem you describe is very similar to the one on POWER before POWER8,
>> where a floating-point to integer transfer causes a load-hit-store that
>> increases latency. I tried to mitigate this in sin/cos by tweaking the
>> internal code with some hackish hooks (commit 77a2a8b4a19f0), but I am
>> now convinced that a new algorithm for single-precision exp, sin, cos (and
>> probably others) is in fact a better solution.
>
> We certainly need new algorithms and better implementations of existing math
> functions. However, in most cases you can use the same generic code and build
> it with the right options for the fp->int transfer instructions. I don't see a
> reason for target-specific implementations that are actually generic. Most
> target-specific features can be handled via macros/inline functions in
> math_private.h.
>
> Looking at your commit, it seems to me that it is all generic, and in most
> cases the generic code could be updated to use floating-point comparisons.
> Then, if we can show significant gains from bit manipulation, the code could
> add specialized paths for the cases that benefit.
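To make the trade-off concrete, here is a minimal sketch of the same special-case test written both ways. One path moves the double into an integer register (the fp->int transfer) and inspects its exponent bits; the other stays in the floating-point register file and uses a comparison. The macro FP_TO_INT_IS_CHEAP and the function needs_special_case are hypothetical names for illustration, not existing glibc interfaces, and the 2^10 threshold is arbitrary. Both paths are written to agree on every input, including NaN and infinities.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical compile-time switch (not an existing glibc macro).  A port
   with cheap fp->int moves could define it to 1 in a target header.  */
#ifndef FP_TO_INT_IS_CHEAP
# define FP_TO_INT_IS_CHEAP 0
#endif

/* Nonzero if |x| >= 2^10, x is infinite, or x is NaN: the kind of check
   exp-style code performs before taking its fast path.  */
static inline int
needs_special_case (double x)
{
#if FP_TO_INT_IS_CHEAP
  /* Bit manipulation: copy x into an integer (fp->int transfer), then
     compare the biased exponent, with the sign bit masked off.  */
  uint64_t bits;
  memcpy (&bits, &x, sizeof bits);
  return ((bits >> 52) & 0x7ff) >= 0x409;   /* 0x409 = 1023 + 10 */
#else
  /* Floating-point comparison: x never leaves the FP registers.
     Written as !(ax < limit) so that NaN also takes the special path.  */
  double ax = x < 0 ? -x : x;
  return !(ax < 0x1p10);
#endif
}
```

Whether the bit path is actually faster on a given chip is something to measure per target rather than assume.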
The problem with this specific issue (fp->int transfer) is that it is assumed
to be a cheap operation and is thus used freely in many places. It is indeed
cheap on most current architectures, and one can build glibc with a newer
architecture/chip as the baseline to remove most of the performance issues.
However, it is still hard to provide a baseline build that avoids this kind of
performance issue without resorting to either multiple glibc builds (relying
on the loader/auxv to select the best one) or multiple ifunc implementations.
And for this issue, 'ifuncing' just the fp->int transfer would probably lead
to even worse performance due to the internal PLT costs.
That's why I would like to promote a baseline implementation that assumes
fp->int transfers are costly and tries to avoid them on the function's hot path.
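One ingredient of such a baseline, sketched under stated assumptions: the round-to-nearest step in exp-style argument reduction (x = k*ln2 + r) can be done entirely in floating point by adding and subtracting the constant 0x1.8p52, so k is produced as a double without any fp->int transfer. This is a well-known trick, not necessarily what any patch in this thread does. It assumes the default round-to-nearest mode, |x/ln2| < 2^51, no excess precision (FLT_EVAL_METHOD == 0, so not x87) and no -ffast-math. The names round_in_fp and reduce_ln2 are hypothetical. A full exp still needs an integer k to build 2^k, so this only keeps the reduction step itself in FP registers.

```c
/* Round x to the nearest integer (ties to even), returned as a double.
   Adding 1.5 * 2^52 forces rounding at the units place; subtracting it
   back is exact.  No value is moved to an integer register.
   Assumes round-to-nearest mode, |x| < 2^51, FLT_EVAL_METHOD == 0, and
   that the compiler does not fold (x + c) - c (i.e. no -ffast-math).  */
static inline double
round_in_fp (double x)
{
  const double shift = 0x1.8p52;
  return (x + shift) - shift;
}

/* Hypothetical helper: split x as k*ln2 + r, returning r and storing k
   (as a double) in *k_out.  Uses a single ln2 constant for brevity; a
   real implementation would split ln2 (Cody-Waite) for accuracy.  */
static double
reduce_ln2 (double x, double *k_out)
{
  const double ln2 = 0x1.62e42fefa39efp-1;     /* ln(2)   */
  const double inv_ln2 = 0x1.71547652b82fep0;  /* 1/ln(2) */
  double k = round_in_fp (x * inv_ln2);
  *k_out = k;
  return x - k * ln2;
}
```

For example, reduce_ln2 (10.0, &k) sets k to 14.0 and returns r of roughly 0.296.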