This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



RE: Gcc builtin review: isinf, insnan ...


Ondřej Bílka wrote:
> I raised this issue before but didn't write a patch, so I should do it now.
> I will stay silent about glibc, as it shares the same flaw as gcc.
> 
> The main problem is that these functions try to be branchless, which
> causes a performance regression for most applications versus branched code.

Being branchless is indeed one issue, but the main issue is that they are
never inlined on any target, as the GLIBC headers explicitly disable
inlining by GCC.

> The problem is that a predicted branch is free while a conditional store
> always costs a cycle. So you need an unpredictable branch to get a
> performance gain. When the branch is 95% predicted, branchless code
> doesn't pay for itself if it adds one cycle versus branched code and a
> misprediction costs 20 cycles.
> 
> And NaN is quite an exceptional value, so the branches will almost always
> be predicted. Otherwise the user has other problems: if 5% of his
> data are NaNs then the result will likely be garbage.
> 
> Then you have the problem that with modern gcc you likely won't save a
> branch. Most of these functions are surrounded by an if. From gcc-4.9 on,
> it will optimize out that branch as it's predicated, which results in
> simpler code.
> 
> More evidence for that: I took the assembly of the benchmark below and
> changed the conditional move to a jump, which brings performance back up
> by 10%.
> 
> To show that, I wrote a simple example of a branched isinf that is around
> 10% faster than the builtin.
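
The break-even arithmetic in the quoted text can be written out as a small
cost model. This is only a sketch under the quoted assumptions (a correctly
predicted branch costs nothing, a misprediction costs a fixed penalty, the
branchless sequence adds a fixed number of cycles); the function names are
made up for illustration and real pipelines are more complicated.

```c
#include <assert.h>

/* Expected extra cycles per call for the branched version: the branch is
   modelled as free when predicted and as costing `mispredict_penalty`
   cycles otherwise (the model used in the quoted text). */
static double branch_cost(double predict_rate, double mispredict_penalty)
{
    assert(predict_rate >= 0.0 && predict_rate <= 1.0);
    return (1.0 - predict_rate) * mispredict_penalty;
}

/* Nonzero if the branched version is expected to be cheaper than a
   branchless sequence that always adds `branchless_cost` cycles. */
static int branch_wins(double predict_rate, double mispredict_penalty,
                       double branchless_cost)
{
    return branch_cost(predict_rate, mispredict_penalty) < branchless_cost;
}
```

With a 20-cycle penalty and a 1-cycle branchless overhead, the expected
branch cost at a 95% prediction rate is 0.05 * 20 = 1 cycle, so that is the
break-even point in this model; at 99% the branch costs about 0.2 cycles
and wins, while at 90% it costs about 2 cycles and loses.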

Note that the GCC built-ins are actually incorrect and should not be used
until they are fixed to use integer arithmetic. The GLIBC versions are never
inlined on any target, and adding generic inline implementations gives a
4-6 times speedup. isnan, isnormal, isfinite and issignaling are equally
trivial, needing ~3-4 instructions. An optimized fpclassify implementation
seems small enough to be fully inlineable (useful given that it is used in
lots of complex math functions); however, it could be partially inlined like:

__glibc_likely(isnormal(x)) ? FP_NORMAL : __fpclassify(x)
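
For illustration, here is roughly what integer-arithmetic classification and
the partial inlining above might look like for double. This is a minimal
sketch, not the proposed patch: it assumes IEEE 754 binary64 doubles, all
sketch_* names and the SKETCH_LIKELY macro are hypothetical, and
__builtin_expect (GCC/Clang) stands in for __glibc_likely.

```c
#include <assert.h>
#include <math.h>     /* FP_NAN, FP_INFINITE, FP_ZERO, FP_SUBNORMAL, FP_NORMAL */
#include <stdint.h>
#include <string.h>

#define SKETCH_LIKELY(c) __builtin_expect(!!(c), 1)  /* GCC/Clang assumed */
#define EXP_MASK UINT64_C(0x7ff0000000000000)  /* exponent field, all ones */
#define ABS_MASK UINT64_C(0x7fffffffffffffff)  /* everything but the sign */
#define MIN_NORM UINT64_C(0x0010000000000000)  /* smallest normal number */

/* Bits of x with the sign cleared; memcpy avoids aliasing problems. */
static inline uint64_t abs_bits(double x)
{
    uint64_t u;
    assert(sizeof u == sizeof x);
    memcpy(&u, &x, sizeof u);
    return u & ABS_MASK;
}

/* Each test is one mask plus an integer compare (or a subtract and compare). */
static inline int sketch_isnan(double x)    { return abs_bits(x) > EXP_MASK; }
static inline int sketch_isinf(double x)    { return abs_bits(x) == EXP_MASK; }
static inline int sketch_isfinite(double x) { return abs_bits(x) < EXP_MASK; }
static inline int sketch_isnormal(double x)
{
    /* Unsigned wraparound sends zero and subnormals above the limit. */
    return abs_bits(x) - MIN_NORM < EXP_MASK - MIN_NORM;
}

/* Out-of-line slow path, standing in for __fpclassify. */
static int sketch_fpclassify_slow(double x)
{
    uint64_t a = abs_bits(x);
    if (a == 0)
        return FP_ZERO;
    if (a < MIN_NORM)
        return FP_SUBNORMAL;
    if (a < EXP_MASK)
        return FP_NORMAL;
    return a == EXP_MASK ? FP_INFINITE : FP_NAN;
}

/* Partially inlined fpclassify: the common case is handled inline. */
static inline int sketch_fpclassify(double x)
{
    return SKETCH_LIKELY(sketch_isnormal(x)) ? FP_NORMAL
                                             : sketch_fpclassify_slow(x);
}
```

Because these tests only look at the bit pattern, they perform no
floating-point operations on x. Unlike the traditional glibc isinf, this
sketch_isinf does not report the sign of the infinity.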

Just checking, are you planning to post patches for these?

Wilco


