This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.
RE: [PATCH] Inline C99 math functions
- From: "Wilco Dijkstra" <wdijkstr at arm dot com>
- To: 'Ondřej Bílka' <neleai at seznam dot cz>
- Cc: "'Joseph Myers'" <joseph at codesourcery dot com>, "GNU C Library" <libc-alpha at sourceware dot org>
- Date: Tue, 16 Jun 2015 18:53:39 +0100
- Subject: RE: [PATCH] Inline C99 math functions
> Ondřej Bílka wrote:
> On Tue, Jun 16, 2015 at 04:53:11PM +0100, Wilco Dijkstra wrote:
> > I added a new math-inlines benchmark based on the string benchmark infrastructure.
> > I used 2x1024 inputs: one set with 99% finite FP numbers (20% zeroes) and 1% inf/NaN,
> > and a second set with 50% inf and 50% NaN. Here are the relative timings for Cortex-A57:
> >
> Where is the benchmark? There are several things that could go wrong with it.
I'll send it when I can (it has to go through review etc).
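Until the benchmark is posted, here is one way the two input sets described above could be generated. This is a hypothetical sketch, not the actual benchmark: the fixed-seed LCG, the fill pattern and the shuffle are all assumptions. The shuffle matters because a predictable pattern of special values would let the branch predictor hide the cost of the checks.

```c
#include <stddef.h>
#include <stdint.h>

#define N_INPUTS 1024   /* inputs per set, as described above */

/* Hypothetical fixed-seed LCG so runs are reproducible. */
static uint32_t lcg_state = 12345u;

static uint32_t
lcg_next (void)
{
  lcg_state = lcg_state * 1664525u + 1013904223u;
  return lcg_state;
}

/* Set 1: ~99% finite (about 20% zeroes) and ~1% inf/NaN. */
static void
fill_mostly_finite (double *v)
{
  for (size_t i = 0; i < N_INPUTS; i++)
    {
      if (i % 100 == 99)
        v[i] = (i % 200 == 199) ? __builtin_nan ("") : __builtin_inf ();
      else if (i % 5 == 0)
        v[i] = 0.0;
      else
        v[i] = 1.0 + (double) (lcg_next () >> 8) / (1 << 24);
    }
  /* Fisher-Yates shuffle (using the better high bits of the LCG) so the
     branch predictor cannot learn the fill pattern. */
  for (size_t i = N_INPUTS - 1; i > 0; i--)
    {
      size_t j = (lcg_next () >> 8) % (i + 1);
      double t = v[i];
      v[i] = v[j];
      v[j] = t;
    }
}

/* Set 2: roughly 50% inf and 50% NaN, chosen by the LCG's top bit. */
static void
fill_special (double *v)
{
  for (size_t i = 0; i < N_INPUTS; i++)
    v[i] = (lcg_next () >> 31) ? __builtin_inf () : __builtin_nan ("");
}
```

Each timed function would then be called over every element of a set, with the total time compared across variants.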
> >                   set 1   set 2
> > __fpclassify_t:   8.76    7.04
> > fpclassify_t:     4.91    5.17
>
> > __isnormal_inl_t: 8.77    7.16
> > isnormal_t:       3.16    3.17
>
> Where did you get the inline version? I couldn't find it anywhere. Also, such a
> large number for an inline implementation is suspect.
It does (__fpclassify (x) == FP_NORMAL), like math.h, which is obviously a bad
idea and the reason for the low performance. Although the GCC isnormal built-in
is not particularly fast, it still beats that by more than a factor of 2.
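To make the difference concrete, here is a minimal sketch contrasting the two approaches: classifying fully and comparing against FP_NORMAL, versus testing the exponent field directly. The helper names are hypothetical, the bit test assumes IEEE 754 binary64 doubles, and neither function is the exact code that glibc or GCC generates.

```c
#include <math.h>    /* FP_NAN, FP_INFINITE, FP_NORMAL, FP_SUBNORMAL, FP_ZERO */
#include <stdint.h>
#include <string.h>

/* Full classification, then a comparison: computes all five
   categories just to answer one question. */
static inline int
isnormal_via_fpclassify (double x)
{
  return __builtin_fpclassify (FP_NAN, FP_INFINITE, FP_NORMAL,
                               FP_SUBNORMAL, FP_ZERO, x) == FP_NORMAL;
}

/* Direct test: x is normal iff its 11-bit biased exponent is neither
   0 (zero/subnormal) nor 0x7ff (inf/NaN). */
static inline int
isnormal_bits (double x)
{
  uint64_t bits;
  memcpy (&bits, &x, sizeof bits);
  uint32_t biased_exp = (uint32_t) (bits >> 52) & 0x7ff;
  /* Unsigned wrap-around folds both excluded values into one compare:
     0 becomes 0xffffffff and 0x7ff becomes 0x7fe, both >= 0x7fe. */
  return biased_exp - 1 < 0x7fe;
}
```

The direct version needs one subtraction and one compare, with no branches, whatever the input.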
> > __finite_inl_t:   1.91    1.91
> > __finite_t:       15.29   15.28
> > isfinite_t:       1.28    1.28
> > __isinf_inl_t:    1.92    2.99
> > __isinf_t:        8.9     6.17
> > isinf_t:          1.28    1.28
> > __isnan_inl_t:    1.91    1.92
> > __isnan_t:        15.28   15.28
> > isnan_t:          1       1.01
> >
> > The plain variants (e.g. isnan_t) use the GCC built-ins, the _inl variants use the
> > existing math_private.h inlines (with __isinf fixed to return the sign too),
> > and the __-prefixed variants (e.g. __isnan_t) are the non-inline GLIBC functions.
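A rough sketch of the first two styles being compared: GCC built-ins versus integer tests on the IEEE binary64 encoding, in the spirit of the math_private.h helpers. All function names here are hypothetical and this is not the actual glibc code. `__builtin_isinf_sign` is the GCC built-in that returns -1 for -inf and 1 for +inf, matching the "return the sign too" behaviour of __isinf.

```c
#include <stdint.h>
#include <string.h>

/* "Plain" style: GCC built-ins, expanded inline by the compiler. */
static inline int isnan_builtin (double x) { return __builtin_isnan (x); }
static inline int isinf_builtin (double x) { return __builtin_isinf_sign (x); }

/* "_inl" style: integer tests on the raw bits (assumes binary64). */
static inline uint64_t
as_bits (double x)
{
  uint64_t u;
  memcpy (&u, &x, sizeof u);
  return u;
}

static inline int
isnan_bits (double x)
{
  /* NaN: exponent all ones and non-zero mantissa, so |x| as an
     integer is strictly greater than the encoding of +inf. */
  return (as_bits (x) & 0x7fffffffffffffffULL) > 0x7ff0000000000000ULL;
}

static inline int
isinf_bits (double x)
{
  uint64_t u = as_bits (x);
  if ((u & 0x7fffffffffffffffULL) != 0x7ff0000000000000ULL)
    return 0;
  /* Return the sign of the infinity: -1 for -inf, 1 for +inf. */
  return (u >> 63) ? -1 : 1;
}
```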
> >
> > So this clearly shows that the GCC built-ins win by a huge margin, even over the
> > inline versions.
> That looks a bit suspect; submit the benchmark so we can see whether it's correct.
It's certainly correct, but obviously different microarchitectures will show
different results. Note the GLIBC private inlines are not particularly good.
> > It also shows that multiple isinf/isnan calls would be faster
> > than a single inlined fpclassify...
> >
> No, that's completely wrong. When you look at the assembly generated for
> __builtin_isnan, it is identical to that of (on x64, but I doubt ARM GCC is worse)
>
> __builtin_fpclassify (FP_NAN, FP_INFINITE, FP_NORMAL, FP_SUBNORMAL, FP_ZERO, x) == FP_NAN
>
> So removing fpclassify would, in the best case, not change performance
> at all, and in the worst case it would harm it due to duplicated checks.
fpclassify basically performs several checks and executes some branches if you
save the result in a variable. So you are far better off using dedicated checks
if you only need 2 or 3 of the 5 possible results. And depending on how the code
is structured, you may only ever execute 1 check. That is far cheaper than first
computing the full fpclassify result and then testing it.
> > A run of the math tests doesn't show any obvious differences beyond the
> > usual run-to-run variation. I suspect the difference due to inlining
> > is in the noise for expensive math functions.
> >
> Look at the complex math functions; they use it. For real math you need to pick
> specific inputs to trigger the unlikely path that uses isinf...
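As an illustration of isinf sitting on an unlikely path, consider the example complex multiplication in C99 Annex G: isinf is only evaluated when both parts of the naive product come out as NaN. The sketch below follows that algorithm but uses a hypothetical `cplx` struct and helper names (`box`, `nan_to_zero`, `cmul`) instead of <complex.h>, plus GCC/Clang built-ins; it is not glibc's implementation.

```c
/* Hypothetical complex type so the sketch does not need <complex.h>. */
typedef struct { double re, im; } cplx;

/* Turn a NaN into a zero with the same sign bit; leave others alone. */
static inline double
nan_to_zero (double v)
{
  return __builtin_isnan (v) ? __builtin_copysign (0.0, v) : v;
}

/* "Box" a value: infinity becomes +-1, anything else (including NaN) +-0. */
static inline double
box (double v)
{
  return __builtin_copysign (__builtin_isinf (v) ? 1.0 : 0.0, v);
}

cplx
cmul (cplx z, cplx w)
{
  double a = z.re, b = z.im, c = w.re, d = w.im;
  double ac = a * c, bd = b * d, ad = a * d, bc = b * c;
  double x = ac - bd, y = ad + bc;

  /* Common path: two isnan tests; isinf is never reached. */
  if (__builtin_expect (__builtin_isnan (x) && __builtin_isnan (y), 0))
    {
      int recalc = 0;
      if (__builtin_isinf (a) || __builtin_isinf (b))   /* z is infinite */
        {
          a = box (a); b = box (b);
          c = nan_to_zero (c); d = nan_to_zero (d);
          recalc = 1;
        }
      if (__builtin_isinf (c) || __builtin_isinf (d))   /* w is infinite */
        {
          c = box (c); d = box (d);
          a = nan_to_zero (a); b = nan_to_zero (b);
          recalc = 1;
        }
      /* Recover infinities that came from overflow in the products. */
      if (!recalc && (__builtin_isinf (ac) || __builtin_isinf (bd)
                      || __builtin_isinf (ad) || __builtin_isinf (bc)))
        {
          a = nan_to_zero (a); b = nan_to_zero (b);
          c = nan_to_zero (c); d = nan_to_zero (d);
          recalc = 1;
        }
      if (recalc)
        {
          x = __builtin_inf () * (a * c - b * d);
          y = __builtin_inf () * (a * d + b * c);
        }
    }
  return (cplx) { x, y };
}
```

Ordinary inputs such as (1+2i)(3+4i) never reach the isinf calls, so measuring the effect of inlining isinf here needs inputs like (inf+inf*i)(1+0i).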
Yes, this needs a dedicated test. Still, if we save a cycle in a 100-cycle function,
it is hard to show, given that modern OoO CPUs can have 5% variation from run to run...
Wilco