This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH v2] Add math-inline benchmark


On Mon, Jul 20, 2015 at 12:01:50PM +0100, Wilco Dijkstra wrote:
> > Ondřej Bílka wrote:
> > On Fri, Jul 17, 2015 at 02:26:53PM +0100, Wilco Dijkstra wrote:
> > But you claimed following in original mail which is wrong:
> > 
> > "
> > Results show that using the GCC built-ins in math.h gives huge
> > speedups due to avoiding explicit calls and PLT indirection to
> > execute a function with 3-4 instructions - around 7x on AArch64 and
> > 2.8x on x64. The GCC builtins have better performance than the
> > existing math_private inlines for __isnan, __finite and __isinf_ns,
> > so these should be removed.
> > "
> 
> No, that statement is 100% correct.
>
As for __isinf_ns, on some architectures the current isinf inline is
better, so it should be replaced by that instead.
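For reference, a bit-level isinf test in the spirit of those inlines could look like this (a minimal sketch, not the actual glibc code; `isinf_bits` is an invented name):

```c
#include <stdint.h>
#include <string.h>

/* Sketch of a bit-level isinf: clear the sign bit and compare the
   remaining bits against the IEEE-754 binary64 encoding of infinity
   (all-ones exponent, zero mantissa).  */
static inline int
isinf_bits (double x)
{
  uint64_t u;
  memcpy (&u, &x, sizeof u);	/* EXTRACT_WORDS64-style type punning */
  return (u & 0x7fffffffffffffffULL) == 0x7ff0000000000000ULL;
}
```

The memcpy compiles to a single register move (movq on x64), so the whole test is a mask and a compare.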

Your claim about the __finite builtin is definitely false on x64; it is
slower:

   "__finite_inl_t": {
    "normal": {
     "duration": 3.42074e+07,
     "iterations": 500,
     "mean": 68414
    }
   },
   "__isfinite_builtin_t": {
    "normal": {
     "duration": 3.43805e+07,
     "iterations": 500,
     "mean": 68760
    }
   },
   "finite_new_t": {
    "normal": {
     "duration": 3.40305e+07,
     "iterations": 500,
     "mean": 68061
    }
   },
   "finite_new2_t": {
    "normal": {
     "duration": 3.40128e+07,
     "iterations": 500,
     "mean": 68025
    }
   },

Only the isnan result looks correct. Probably nothing could beat a
builtin that boils down to
#define isnan(x) ((x) != (x))
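As a sanity check, the self-comparison trick really does single out NaN (a trivial sketch):

```c
/* Self-comparison: an IEEE-754 comparison involving a NaN is unordered,
   so x != x is true exactly when x is a NaN, and false for every other
   value, including infinities.  */
static inline int
isnan_selfcmp (double x)
{
  return x != x;
}
```

Note this relies on strict IEEE comparison semantics; under -ffinite-math-only GCC is entitled to fold x != x to 0.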

                       
> > Also, when inlines give a speedup you should add math inlines for the
> > signaling NaN case as well. That gives a similar speedup. And it would
> > be natural to ask whether you should use these inlines everywhere,
> > since they are already faster than the builtins.
> 
> I'm not sure what you mean here - I enable the new inlines in exactly the
> right case. Improvements to support signalling NaNs or to speed up the
> built-ins further will be done in GCC.
> 
Why can't we just use:
#ifdef __SUPPORT_SNAN__
math inline
#else
builtin
#endif
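A possible shape for that dispatch (a sketch; as far as I know GCC defines `__SUPPORT_SNAN__` under -fsignaling-nans, and a bit-level test avoids the FP comparison that would raise INVALID on a signaling NaN):

```c
#include <stdint.h>
#include <string.h>

/* Sketch: a bit-level isnan usable when signaling NaNs must pass
   through untouched, since it performs no FP comparison and so can
   never raise INVALID.  A NaN has an all-ones exponent field and a
   nonzero mantissa, i.e. its magnitude bits exceed those of infinity.  */
static inline int
isnan_noexcept (double x)
{
  uint64_t u;
  memcpy (&u, &x, sizeof u);
  return (u & 0x7fffffffffffffffULL) > 0x7ff0000000000000ULL;
}
```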


> > > > So at least on x64 we should publish the math_private inlines instead
> > > > of using the slow builtins.
> > >
> > > Well it was agreed we are going to use the GCC built-ins and then improve
> > > those. If you want to propose additional patches with special inlines for
> > > x64 then please go ahead, but my plan is to improve the builtins.
> > >
> > And how can you be sure that it's just an isolated x64 case? It may
> > also happen on powerpc, arm, sparc and other architectures, and you
> > need to test that.
> 
> It's obvious the huge speedup applies to all other architectures as well -
> it's hard to imagine that avoiding a call, a return, a PLT indirection and 
> additional optimization of 3-4 instructions could ever cause a slowdown...
> 
But I didn't ask about that. My point was that you treated this as an
x64-specific bug, but it is not clear at all whether other architectures
are affected or not. So you must test them.

> > So I ask you again to run my benchmark with the changed EXTRACT_WORDS64
> > to see if this is also a problem on arm.
> 
> Here are the results for x64 with inlining disabled (__always_inline changed
> to noinline) and the movq instruction like you suggested:
> 
I asked you to run my benchmark on arm, not your benchmark on x64. With
the modifications you described, it measures just the performance of a
non-inlined function, which is not what we want. GCC optimizes the
surrounding code differently, so you must wrap the entire expression in
the noinline function. For example, gcc optimizes

# define isinf(x) (noninf (x) ? 0 : (x == 1.0 / 0.0 ? 1 : -1))
if (isinf (x))
  foo ();

into

if (!noninf (x))
  foo ();

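Concretely, the whole tested expression has to sit behind the noinline barrier, along these lines (a sketch; `count_inf` is an invented harness name):

```c
/* Sketch of the methodology point: mark the *whole* expression
   noinline, so GCC cannot fold the predicate into the caller the way
   it does with the macro above.  Only the code inside this function
   is measured as-is.  */
__attribute__ ((noinline)) static int
count_inf (const double *p, int n)
{
  int r = 0;
  for (int i = 0; i < n; i++)
    r += p[i] == 1.0 / 0.0 || p[i] == -1.0 / 0.0;
  return r;
}
```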

>    "__isnan_t": {
>     "normal": {
>      "duration": 3.52048e+06,
>      "iterations": 500,
>      "mean": 7040
>     }
>    },
>    "__isnan_inl_t": {
>     "normal": {
>      "duration": 3.09247e+06,
>      "iterations": 500,
>      "mean": 6184
>     }
>    },
>    "__isnan_builtin_t": {
>     "normal": {
>      "duration": 2.20378e+06,
>      "iterations": 500,
>      "mean": 4407
>     }
>    },
>    "isnan_t": {
>     "normal": {
>      "duration": 1.50514e+06,
>      "iterations": 500,
>      "mean": 3010
>     }
>    },

Why is isnan faster than the builtin?

>    "__isnormal_inl2_t": {
>     "normal": {
>      "duration": 2.18113e+06,
>      "iterations": 500,
>      "mean": 4362
>     }
>    },
>    "__isnormal_builtin_t": {
>     "normal": {
>      "duration": 3.08183e+06,
>      "iterations": 500,
>      "mean": 6163
>     }
>    },

Also here, why did isnormal_inl2 do so well?
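For what it's worth, isnormal reduces to a pure exponent-field test, which may explain why an inline version can beat the builtin (a sketch; `isnormal_bits` is an invented name, not the benchmarked inline):

```c
#include <stdint.h>
#include <string.h>

/* Sketch: a double is normal iff its exponent field is neither all
   zeros (zero or subnormal) nor all ones (infinity or NaN).  */
static inline int
isnormal_bits (double x)
{
  uint64_t u;
  memcpy (&u, &x, sizeof u);
  uint64_t e = u & 0x7ff0000000000000ULL;
  return e != 0 && e != 0x7ff0000000000000ULL;
}
```

The two compares can even be fused into one unsigned range check on the shifted exponent, which could account for the inline's advantage.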

