This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: [PATCH v2] Add math-inline benchmark


On Tue, Jul 21, 2015 at 05:14:51PM +0100, Wilco Dijkstra wrote:
> > Ondřej Bílka wrote:
> > On Mon, Jul 20, 2015 at 12:01:50PM +0100, Wilco Dijkstra wrote:
> > > > Ondřej Bílka wrote:
> > > > On Fri, Jul 17, 2015 at 02:26:53PM +0100, Wilco Dijkstra wrote:
> > > > But you claimed following in original mail which is wrong:
> > > >
> > > > "
> > > > Results show that using the GCC built-ins in math.h gives huge speedups due to
> > > > avoiding explicit calls, PLT indirection to execute a function with 3-4
> > > > instructions - around 7x on AArch64 and 2.8x on x64. The GCC builtins have
> > > > better performance than the existing math_private inlines for __isnan,
> > > > __finite and __isinf_ns, so these should be removed.
> > > > "
> > >
> > > No that statement is 100% correct.
> > >
> > As for __isinf_ns, on some architectures the current isinf inline is better,
> > so it should be replaced by that instead.
> 
> The current inline (__isinf_ns) is not better than the builtin.
>
That's correct, as I also said. But it doesn't change my point:
__isinf_ns is slower than the builtin, and the builtin is slower on x64
than the current inline. So we should replace the internal __isinf_ns
with the current inline to get better performance.
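
To make the comparison concrete, here is a minimal sketch of the kind of
integer-based isinf inline being discussed, next to the builtin; the names
are illustrative and this is not the exact glibc code:

#include <stdint.h>
#include <string.h>

/* Sketch of an integer-based isinf in the style of the math_private
   inlines (illustrative, not the exact glibc definition): extract the
   bits of the double and compare against the infinity pattern.  */
static inline int
isinf_bits (double x)
{
  uint64_t w;
  memcpy (&w, &x, sizeof w);           /* like EXTRACT_WORDS64 */
  if ((w << 1) != (0x7ffULL << 53))    /* exponent all ones, mantissa 0 */
    return 0;                          /* finite or NaN */
  return (int64_t) w < 0 ? -1 : 1;     /* sign of the infinity */
}

/* The builtin-based alternative discussed in the thread; returns
   nonzero if x is infinite.  */
static inline int
isinf_builtin (double x)
{
  return __builtin_isinf (x);
}

The dispute is over which of these two forms the compiler turns into better
code on each architecture, and only a benchmark on that architecture can
settle it.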
 
> > Your claim about the __finite builtin is definitely false on x64; it's
> > slower:
> 
> No that's not what I am getting on x64. With the movq instruction
> and inlining as my original patch:
> 
That's only your patch. It isn't clear at all that this happens in reality.
My benchmark shows that it doesn't.

So Carlos, we have two benchmarks, each saying that one alternative is
better than the other; how do you resolve that?
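
For reference, the movq point above is about how the bits of the double get
into an integer register. A rough sketch of the two extraction styles under
discussion (the function names are illustrative, not the exact glibc macros):

#include <stdint.h>
#include <string.h>

/* Generic extraction: copy the double's bits into an integer.  Older
   compilers may implement this as a store followed by a load.  */
static inline uint64_t
extract_words64_generic (double x)
{
  uint64_t w;
  memcpy (&w, &x, sizeof w);
  return w;
}

#ifdef __x86_64__
/* x86-64 extraction: move straight from the SSE register to a general
   register with movq, avoiding the memory round trip.  */
static inline uint64_t
extract_words64_movq (double x)
{
  uint64_t w;
  __asm__ ("movq %1, %0" : "=r" (w) : "x" (x));
  return w;
}
#endif

Whether the builtin beats an inline built on the register-to-register form is
exactly what the two benchmarks disagree on.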
 
> > > It's obvious the huge speedup applies to all other architectures as well -
> > > it's hard to imagine that avoiding a call, a return, a PLT indirection and
> > > additional optimization of 3-4 instructions could ever cause a slowdown...
> > >
> > But I didn't ask about that. I pointed out that you made the bug
> > x64-specific, but it's not clear at all whether other architectures are
> > affected or not. So you must test them.
> 
> All architectures are affected as they will get the speedup from the inlining.
>
Again, this is irrelevant. The current inlines could be faster than the
builtins on most architectures, so you should test that and open GCC bugs.
 
> > > > So I ask you again to run my benchmark with EXTRACT_WORDS64 changed, to
> > > > see if this is a problem on arm as well.
> > >
> > > Here are the results for x64 with inlining disabled (__always_inline changed
> > > into noinline) and the movq instruction like you suggested:
> > >
> > I asked you to run my benchmark on arm, not your benchmark on x64. With the
> > modifications you describe, it measures just the performance of a noninline
> > function, which we don't want. There is a difference in how gcc optimizes
> > this, so you must surround the entire expression in a noinline function. For
> > example, gcc optimizes
> > 
> > # define isinf(x) (noninf(x) ? 0 : (x == 1.0 / 0.0 ? 1 : -1))
> > if (isinf(x))
> >   foo()
> > 
> > into
> > 
> > if (!noninf(x))
> >   foo()
> 
> Which is exactly what we want to happen (and why the non-inlined isinf results
> are not interesting, as 99.9% of the time we do the optimization).
>
That isn't completely correct. While this happens, most callers do the is*
check only once at the start, so loading the constants costs you. So you
need to be careful about how you write the benchmark to get a correct
result, as I did in mine.
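
As a concrete illustration, a minimal sketch of that benchmarking pattern
(the function names and the body are made up, not taken from either patch):
wrap the whole expression that uses isinf in a noinline function, so the
compiler cannot hoist the comparison constants out of the benchmark loop and
reuse them across iterations.

#include <math.h>

/* The isinf check and its use stay together, as in real callers; noinline
   keeps the compiler from folding the check into the loop below and
   reusing the loaded constants across iterations.  */
__attribute__ ((noinline)) static double
kernel (double x)
{
  return isinf (x) ? 0.0 : x * x;
}

double
run (const double *in, long n)
{
  double sum = 0.0;
  for (long i = 0; i < n; i++)
    sum += kernel (in[i]);
  return sum;
}
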
> > >    "isnan_t": {
> > >     "normal": {
> > >      "duration": 1.50514e+06,
> > >      "iterations": 500,
> > >      "mean": 3010
> > >     }
> > >    },
> > 
> > Why is isnan faster than the builtin?
> 
> You asked for results with inlining turned off, so basically this shows the difference
> between inlining the isnan builtin into a loop and calling a function which has the
> isnan builtin inlined.
>
I never asked you to do that. I said that you should measure an expression
using isinf, surrounded in a noinline function, to avoid reusing constants
in the loop.

That will give you different results from what you got.

 

