This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
RE: [PATCH] Inline C99 math functions
- From: "Wilco Dijkstra" <wdijkstr at arm dot com>
- To: 'Ondřej Bílka' <neleai at seznam dot cz>
- Cc: "'Joseph Myers'" <joseph at codesourcery dot com>, "GNU C Library" <libc-alpha at sourceware dot org>
- Date: Wed, 17 Jun 2015 16:24:46 +0100
- Subject: RE: [PATCH] Inline C99 math functions
- References: <001201d0a75b$921d9860$b658c920$ at com> <alpine dot DEB dot 2 dot 10 dot 1506151431490 dot 26683 at digraph dot polyomino dot org dot uk> <001701d0a789$f2ab86f0$d80294d0$ at com> <alpine dot DEB dot 2 dot 10 dot 1506151654100 dot 26683 at digraph dot polyomino dot org dot uk> <001801d0a84c$8c5cd7a0$a51686e0$ at com> <20150616164020 dot GA8970 at domone> <001901d0a85d$60857bd0$21907370$ at com> <20150617053502 dot GA13762 at domone>
> Ondřej Bílka wrote:
> On Tue, Jun 16, 2015 at 06:53:39PM +0100, Wilco Dijkstra wrote:
> > > Ondřej Bílka wrote:
> > > On Tue, Jun 16, 2015 at 04:53:11PM +0100, Wilco Dijkstra wrote:
> > > > I added a new math-inlines benchmark based on the string benchmark infrastructure.
> > > > I used 2x1024 inputs: one set with 99% finite FP numbers (20% zeroes) and 1% inf/NaN,
> > > > and a second set with 50% inf and 50% NaN. Here are the relative timings for Cortex-A57:
> > > >
> > > Where is the benchmark? There are several things that could go wrong with it.
> >
> > I'll send it when I can (it has to go through review etc).
> >
> > > > __fpclassify_t: 8.76 7.04
> > > > fpclassify_t: 4.91 5.17
> > >
> > > > __isnormal_inl_t: 8.77 7.16
> > > > isnormal_t: 3.16 3.17
> > >
> > > Where did you get the inline version? I couldn't find it anywhere. Also, such a big
> > > number for an inline implementation is suspect.
> >
> > It does (__fpclassify (x) == FP_NORMAL) like math.h which is obviously a bad
> > idea and the reason for the low performance. Although the GCC isnormal builtin
> > is not particularly fast, it still beats it by more than a factor of 2.
> >
> No, the bad idea was not inlining fpclassify; that accounts for most of the performance difference.
> There is also the problem that glibcdev/glibc/sysdeps/ieee754/dbl-64/s_fpclassify.c is a bit slow, as
> it tests the unlikely cases first, but that is secondary.
Even with the inlined fpclassify (inl2 below), isnormal is slower:
__isnormal_inl2_t:  1.25  3.67
__isnormal_inl_t:   4.59  2.89
isnormal_t:         1     1
So using a dedicated builtin for isnormal is important.
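
For illustration, the two isnormal forms compared above can be sketched roughly as follows. This is not the benchmark code: the function names are hypothetical, and __builtin_fpclassify is used here only as a stand-in for an inlined fpclassify, which is an assumption.

```c
#include <assert.h>
#include <float.h>
#include <math.h>

/* Hypothetical name: isnormal expressed through a full classification,
   i.e. the (fpclassify (x) == FP_NORMAL) form.  Using the GCC builtin
   as the "inlined fpclassify" is an assumption of this sketch.  */
static inline int
isnormal_via_fpclassify (double x)
{
  return __builtin_fpclassify (FP_NAN, FP_INFINITE, FP_NORMAL,
                               FP_SUBNORMAL, FP_ZERO, x) == FP_NORMAL;
}

/* Hypothetical name: the dedicated GCC builtin, normalized to 0/1.  */
static inline int
isnormal_builtin (double x)
{
  return __builtin_isnormal (x) != 0;
}

/* Both forms should agree on every class of input.  */
static void
check_isnormal_variants (void)
{
  const double inputs[] = { 1.0, -2.5, 0.0, -0.0, DBL_MIN / 2,
                            INFINITY, -INFINITY, NAN };
  for (unsigned i = 0; i < sizeof inputs / sizeof inputs[0]; i++)
    assert (isnormal_via_fpclassify (inputs[i])
            == isnormal_builtin (inputs[i]));
}
```

The two functions return the same answers; any timing difference comes from how much classification work the compiler can drop in the first form.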
> > It's certainly correct, but obviously different microarchitectures will show
> > different results. Note the GLIBC private inlines are not particularly good.
> >
> No, the problem is that different benchmarks show different results on the same
> architecture. To speed things up, run the following to test all cases of the
> environment. Run the attached tf script to get results on ARM.
I tried, but I don't think this is a good benchmark - you're not measuring
the FP->int move for the branched version, and you're comparing the signed
version of isinf vs the builtin which does isinf_ns.
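
To make the signed vs. non-signed distinction concrete, here is a minimal sketch using the GCC builtins __builtin_isinf_sign and __builtin_isinf. The wrapper names are hypothetical and this is not the code from the tf script.

```c
#include <assert.h>
#include <math.h>

/* Hypothetical name: signed variant.  Returns -1 for -inf, +1 for +inf
   and 0 otherwise, so the sign has to be computed as well.  */
static inline int
isinf_signed (double x)
{
  return __builtin_isinf_sign (x);
}

/* Hypothetical name: non-signed variant.  Only reports whether x is
   infinite, normalized to 0/1.  */
static inline int
isinf_nosign (double x)
{
  return __builtin_isinf (x) != 0;
}
```

Comparing one against the other in a benchmark is not like for like, since the signed form does strictly more work.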
> Which doesn't matter. As gcc optimizes unneeded checks away, you won't do
> unneeded checks. As using:
>
> __builtin_fpclassify (FP_NAN, FP_INFINITE, \
> FP_NORMAL, FP_SUBNORMAL, FP_ZERO, x),0);
> return result == FP_INFINITE || result == FP_NAN;
>
> is slower than:
>
> return __builtin_isinf (x) || __builtin_isnan (x);
>
> Your claim is false; run the attached tf2 script to test.
That's not what I am seeing: using two explicit isinf/isnan calls (test2) is faster
than the inlined fpclassify (test1):
__fpclassify_test2_t:  1     4.41
__fpclassify_test1_t:  1.23  4.66
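
As a rough sketch of the two shapes being compared (not the attached tf2 script, and with hypothetical function names), assuming GCC builtins for both:

```c
#include <assert.h>
#include <math.h>

/* test1-style (hypothetical name): classify once, then compare the
   result against the two classes of interest.  */
static inline int
is_inf_or_nan_classify (double x)
{
  int result = __builtin_fpclassify (FP_NAN, FP_INFINITE, FP_NORMAL,
                                     FP_SUBNORMAL, FP_ZERO, x);
  return result == FP_INFINITE || result == FP_NAN;
}

/* test2-style (hypothetical name): two explicit builtin checks.  */
static inline int
is_inf_or_nan_explicit (double x)
{
  return __builtin_isinf (x) || __builtin_isnan (x);
}
```

Both return 1 for inf/NaN and 0 for everything else, so only the generated code differs between them.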
Wilco