This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



RE: [PATCH v3] Add math-inline benchmark


> Ondřej Bílka wrote:
> On Thu, Jul 23, 2015 at 01:54:27PM +0100, Wilco Dijkstra wrote:
> > Add a benchmark for isinf/isnan/isnormal/isfinite/fpclassify. The test
> > uses 2 arrays with 1024 doubles, one with 99% finite FP numbers (10%
> > zeroes, 10% negative) and 1% inf/NaN, the other with 50% inf and 50% NaN.
> >
> > This version removes various tests that caused confusion and only leaves
> > the existing GLIBC definitions and inlines for comparison with the GCC
> > builtins. I changed the tests to not inline inside the loop and to branch
> > on the boolean result. The 64-bit immediates used by the GLIBC inlines
> > seem very expensive on some microarchitectures, so this shows even more
> > clearly that using the built-ins results in a significant performance
> > gain (see x64 results below).
> >
> That's better but still not OK.
> 
> First, we need to make something explicit. You argue both that you use the
> benchmark just to show that inlining provides a speedup, and that you use
> it to justify your claims. You cannot have it both ways.

Of course I can. This is not an either/or issue, nor a contentious
statement. The existing inlines are pretty inefficient, so it shouldn't
surprise anyone that they can easily be beaten. And all the
evidence shows that my patch improves on the existing inlines.

My goal is to use the fastest possible implementation in all cases.
What would be the point of adding fast inlines to math.h while keeping
the inefficient internal inlines? That makes no sense whatsoever unless
your goal is to keep GLIBC slow...
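To make the comparison concrete, here is a minimal sketch (not glibc's actual code) of the two styles being argued over: a hand-written bit-twiddling inline, which needs the 64-bit immediates the patch description calls expensive, versus the GCC builtin, which lets the compiler pick the best sequence for the target:

```c
#include <math.h>
#include <stdint.h>
#include <string.h>

/* A bit-twiddling isinf in the style of the old glibc inlines: get the
   bits of the double, mask off the sign, compare against the all-ones
   exponent pattern.  The two 64-bit immediates are what can be costly
   on some microarchitectures.  (Illustrative sketch only.)  */
static inline int
isinf_bits (double x)
{
  uint64_t w;
  memcpy (&w, &x, sizeof w);            /* type-pun via memcpy */
  return (w & 0x7fffffffffffffffULL) == 0x7ff0000000000000ULL;
}

/* The builtin version: GCC is free to emit, e.g., a floating-point
   comparison with no integer immediates at all.  */
static inline int
isinf_builtin (double x)
{
  return __builtin_isinf (x) != 0;
}
```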

> If you also want to make claims about the builtins/current inlines then
> you need a benchmark that can accurately measure both inlines and builtins
> to see what is correct and what is wrong. That's quite hard, as I said,
> since it depends on how gcc will optimize the implementation.

I am confident that my benchmark does a good job. I looked at the
assembly code of all the functions and confirmed that GCC 6 compiled the
code exactly as I expected, so it is measuring precisely what I
intended to measure.
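One plausible reading of the loop shape described in the patch (the classifier kept out of line, with a branch on its boolean result) is sketched below; the function names are illustrative, not the benchmark's actual code, and the `noinline` attribute is GCC-specific:

```c
#include <math.h>
#include <stddef.h>

/* The classifier under test sits behind a noinline wrapper, and the
   loop branches on its boolean result, so both the call overhead and
   the flag test are part of what gets timed.  (Hypothetical sketch.)  */
static int __attribute__ ((noinline))
test_isinf (double d)
{
  return __builtin_isinf (d) != 0;
}

/* Walk an array (the benchmark uses 1024-entry arrays) and branch on
   the result of each classification.  */
static size_t
count_inf (const double *p, size_t n)
{
  size_t hits = 0;
  for (size_t i = 0; i < n; i++)
    if (test_isinf (p[i]))      /* branch on the boolean result */
      hits++;
  return hits;
}
```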

> Here there are still unresolved issues from previous patches.
> 
> First, you still test on x64 without getting EXTRACT_WORDS64
> from math_private.h. Without that you don't measure the
> current inlines.

I ran my benchmark with the movq instruction and it doesn't change the
conclusion in any way (store-to-load forwarding is pretty quick).

So this is a non-issue: it is simply a bug in x64 GCC that should
be fixed, and yet another reason why it is better to use a builtin that
generates better code without needing inline assembler.
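The movq point above is about how the bits of a double reach an integer register. A hedged sketch of the two code shapes under discussion (hypothetical helper names, not glibc's EXTRACT_WORDS64 itself):

```c
#include <stdint.h>
#include <string.h>

/* Going through memory: the compiler may emit a store to the stack and
   a reload into a GPR.  Store-to-load forwarding makes this fast, but
   it is still a round trip.  */
static inline uint64_t
bits_via_memory (double x)
{
  union { double d; uint64_t i; } u = { .d = x };
  return u.i;
}

/* Forcing a direct register-to-register movq with inline asm on
   x86-64, as glibc's inlines do via math_private.h; elsewhere, fall
   back to memcpy.  */
static inline uint64_t
bits_via_movq (double x)
{
  uint64_t i;
#if defined __x86_64__
  __asm__ ("movq %1, %0" : "=r" (i) : "x" (x));   /* xmm -> GPR */
#else
  memcpy (&i, &x, sizeof i);
#endif
  return i;
}
```

Both produce the same value; the debate is only about which instruction sequence the compiler emits, which is exactly what a builtin lets GCC decide per target.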

> First, for isinf you omitted the current isinf inline, which is faster
> than the isinf_ns and builtin versions that you check.

There is no isinf inline. I just kept the three math_private.h inlines.

> Then the remainder test is wrong. It is inlined but kernel_standard isn't.
> As you wanted to use it to measure the performance of a non-inlined
> function, it obviously doesn't measure that.

kernel_standard should not be inlined, as the real implementation isn't
inlined either.

> When you fix that, it clearly shows that the current inlines are better
> on x64.

No, that's not what it shows. In all cases the new inlines are faster.

>    "remainder_test1": {
>     "normal": {
>      "duration": 4.23772e+06,
>      "iterations": 500,
>      "mean": 8475
>     }
>    },
>    "remainder_test2": {
>     "normal": {
>      "duration": 4.45968e+06,
>      "iterations": 500,
>      "mean": 8919
>     }
>    }

Those results are way too fast, so something must be wrong with your changes.

To conclude, none of your issues are actual issues with my benchmark.

Wilco


