This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



RE: [PATCH] Inline C99 math functions


> Ondřej Bílka wrote:
> On Tue, Jun 16, 2015 at 04:53:11PM +0100, Wilco Dijkstra wrote:
> > I added a new math-inlines benchmark based on the string benchmark infrastructure.
> > I used 2x1024 inputs, one with 99% finite FP numbers (20% zeroes) and 1% inf/NaN,
> > and the second with 50% inf and 50% NaN. Here are the relative timings for Cortex-A57:
> >
> Where is the benchmark? There are several things that could go wrong with it.

I'll send it when I can (it has to go through review etc).

> > __fpclassify_t:	8.76	7.04
> > fpclassify_t:	4.91	5.17
> 
> > __isnormal_inl_t:	8.77	7.16
> > isnormal_t:		3.16	3.17
> 
> Where did you get the inline version? I couldn't find it anywhere. Also, such a big
> number for an inline implementation is suspect.

It does (__fpclassify (x) == FP_NORMAL), like math.h, which is obviously a bad
idea and the reason for the low performance. Although the GCC isnormal builtin
is not particularly fast, it still beats that by more than a factor of 2.
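
As a rough illustration of the two shapes being compared (the helper names are
hypothetical and this is not the benchmark code; with a recent glibc the
fpclassify macro may itself expand to a built-in, so only the structure matters):

```c
#include <math.h>

/* Hypothetical helper: isnormal expressed as a full classification
   followed by a compare, similar in spirit to
   (__fpclassify (x) == FP_NORMAL).  */
static inline int
isnormal_via_fpclassify (double x)
{
  return fpclassify (x) == FP_NORMAL;
}

/* Hypothetical helper: the dedicated GCC built-in, which only has to
   answer the single question "is x normal?".  */
static inline int
isnormal_via_builtin (double x)
{
  return __builtin_isnormal (x);
}
```

Both return nonzero only for normal numbers, so zero, subnormals, inf and NaN
all give 0.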

> > __finite_inl_t:	1.91	1.91
> > __finite_t:		15.29	15.28
> > isfinite_t:		1.28	1.28
> > __isinf_inl_t:	1.92	2.99
> > __isinf_t:		8.9	6.17
> > isinf_t:		1.28	1.28
> > __isnan_inl_t:	1.91	1.92
> > __isnan_t:		15.28	15.28
> > isnan_t:		1	1.01
> >
> > The plain isnan_t functions use the GCC built-ins, the _inl variant uses the
> > existing math_private.h inlines (with __isinf fixed to return the sign too),
> > and the __isnan variants are the non-inline GLIBC functions.
> >
> > So this clearly shows the GCC built-ins win by a huge margin, even against the
> > inline versions.
> That looks a bit suspect; submit the benchmark so we can see whether it's correct.

It's certainly correct, but obviously different microarchitectures will show
different results. Note the GLIBC private inlines are not particularly good.
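
For reference, a minimal sketch of the three flavours of isnan being timed. The
names are hypothetical, the integer version is only in the general style of a
bit-pattern check (it is not the actual math_private.h code), and it assumes
double is IEEE 754 binary64:

```c
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Flavour 1: the GCC built-in.  */
static inline int
isnan_builtin (double x)
{
  return __builtin_isnan (x);
}

/* Flavour 2: a hypothetical integer-based inline.  Assumes IEEE 754
   binary64: NaN has all exponent bits set and a nonzero mantissa.  */
static inline int
isnan_bits (double x)
{
  uint64_t u;
  memcpy (&u, &x, sizeof u);             /* reinterpret the bits */
  u &= UINT64_C (0x7fffffffffffffff);    /* drop the sign bit */
  return u > UINT64_C (0x7ff0000000000000);
}

/* Flavour 3: the same check forced out of line, standing in for a
   call to a non-inline library function.  */
__attribute__ ((noinline)) int
isnan_call (double x)
{
  return isnan_bits (x);
}
```

All three agree on the result; the benchmark question is only how much each
one costs per call.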

> >  It also shows that multiple isinf/isnan calls would be faster
> > than a single inlined fpclassify...
> >
> No, that's completely wrong. When you look at the assembly when using __builtin_isnan,
> it's identical to that of (on x64, but I doubt that ARM GCC is worse)
> 
> __builtin_fpclassify (FP_NAN, FP_INFINITE, FP_NORMAL, FP_SUBNORMAL, FP_ZERO, x),0) == FP_NAN
> 
> So removing fpclassify would at best not change performance
> at all, and at worst it would harm it due to duplicated checks.

If you save the result in a variable, fpclassify basically does several checks and
executes some branches. So you are far better off using dedicated checks if you
just need 2 or 3 of the 5 possible results. And depending on how the code is
structured you may only ever execute 1 check. That is far cheaper than first
computing the full result for fpclassify and then testing that.
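
A minimal sketch of the difference, using hypothetical helper names and the
inf-or-NaN case as the example:

```c
#include <math.h>

/* Classify first, then test the saved result: all five classes are
   distinguished even though only two of them matter here.  */
static inline int
is_inf_or_nan_classify (double x)
{
  int c = fpclassify (x);
  return c == FP_NAN || c == FP_INFINITE;
}

/* Dedicated checks: only the questions actually asked are evaluated,
   and the || stops after the first check that succeeds.  */
static inline int
is_inf_or_nan_dedicated (double x)
{
  return __builtin_isnan (x) || __builtin_isinf (x);
}
```

The two functions return the same values; whether the second is faster in
practice depends on the compiler and target, which is what the timings above
are meant to measure.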

> > A run of the math tests doesn't show up any obvious differences beyond the
> > usual variations from run to run. I suspect the difference due to inlining
> > is in the noise for expensive math functions.
> >
> Look at complex math, these use it. For real math you need to pick
> specific inputs to trigger unlikely path that uses isinf...

Yes, this needs a dedicated test. Still, if we save a cycle in a 100-cycle function,
it is hard to show, given that modern OoO CPUs can have 5% variation from run to run...

Wilco


