This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



RE: [PATCH v3] Add math-inline benchmark


> Ondřej Bílka wrote:
> On Fri, Jul 24, 2015 at 01:04:40PM +0100, Wilco Dijkstra wrote:
> > > Ondřej Bílka wrote:
> > > On Thu, Jul 23, 2015 at 01:54:27PM +0100, Wilco Dijkstra wrote:

> It claims that the finite builtin is faster than the inline, but when I
> look at the actual benchtests I reach the opposite conclusion. I ran the
> benchtest three times on Haswell and consistently found that replacing
> finite with isfinite in the pow function causes a regression.
> 
> I also checked on Core 2 and Ivy Bridge: Core 2 shows no difference, and
> on Ivy Bridge the inlines are also faster than the builtin.

There is no regression in your results. On x64 I get a consistent 4.6%
speedup on pow, 8.4% on exp2 and 9.2% on atan. The average speedup of my
patch is 1% across all math functions. See attached results.

> > My goal is to use the fastest possible implementation in all cases.
> > What would the point be to add fast inlines to math.h but keeping the
> > inefficient internal inlines? That makes no sense whatsoever unless
> > your goal is to keep GLIBC slow...
> >
> Then you are forced to take a more difficult route. It isn't only about
> the inefficient inlines but also about any other inlines that get
> introduced.

The only inlines that matter are the inefficient ones because there
are no other inlines.

> As performance varies wildly depending on how you benchmark, you will
> select the implementation that does best on the benchmark. That need not
> have anything in common with practical performance.

Of course it does. A micro benchmark is the best way to evaluate which
code sequence is best. And since the math functions show significant
speedups with my patch, that confirms the results of my micro benchmark
are accurate.

> The problem is that this could be CPU specific: it could matter on some
> CPUs. It is also a matter of measuring the correct function; showing
> that it doesn't matter in one case does not imply that it wouldn't
> matter for a different workload.

Speculation...

> Now, when I looked, you test a different function than remainder, as the
> real implementation calls __ieee754_remainder and handles overflow
> differently. This could also make a difference.

__ieee754_remainder is not exported, so you can't call it.
 
> Also, it does not change the fact that you have a bug there and should
> fix it. This isn't the real remainder: you would need to call the
> ieee754 remainder, which is a different workload since you need to do a
> second check there, and OoO execution could help you.

It doesn't make any difference, as the remainder call has a constant overhead.

> > To conclude, none of your issues are actual issues with my benchmark.
> >
> No, there are still plenty of issues. You need to be more careful about
> what you benchmark.

So far you have not pointed out a single concrete issue.

Wilco

Attachment: bench_x64_original.out
Description: Binary data

Attachment: bench_x64_builtin.out
Description: Binary data
