This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH v2] Add math-inline benchmark


On Tue, Jul 21, 2015 at 06:46:38PM +0100, Wilco Dijkstra wrote:
> > OndÅej BÃlka wrote:
> > On Tue, Jul 21, 2015 at 05:17:23PM +0100, Wilco Dijkstra wrote:
> 
> > > Fpclassify uses the builtin, but I think it's better to remove these
> > > tests as they don't add much value. It's pretty obvious it is better
> > > to use separate tests instead of fpclassify when you only need 2 of them.
> > >
> > I agree that its better to remove them.
> > 
> > However it isn't obvious that separate tests are better as its wrong. It
> > should be same code as gcc should optimize that to better but often
> > isnt. When I looked to assembly of my benchmark of noinline tests then
> > performance difference was caused just by gcc bug. In fpclassify case it
> > increased and decreased stack pointer without writing anything to stack.
> 
> Sure optimization makes a difference to these benchmarks, that's unavoidable,
> so that's why I always carefully check the generated assembly to ensure I
> really measure what I wanted to measure.
> 
> > And its easy to find cases where opposite is true as this depends how
> > you write benchmark, for example if I change my benchmark to use following:
> > 
> > extern __always_inline int
> > __fpclassify_test1 (double d)
> > {
> >   int cl = fpclassify (d);
> >   return (cl == FP_INFINITE) ? 3 * sin (d) :
> >          ((cl == FP_NAN) ? 2 * cos (d) : 0) ;
> > }
> > 
> > extern __always_inline int
> > __fpclassify_test2 (double d)
> > {
> >   return (__builtin_isinf(d)) ? 3 * sin (d) :
> >          ((__builtin_isnan(d)) ? 2 * cos (d) : 0) ;
> > 
> > }
> > 
> > then fpclassify becomes lot faster.
> > 
> >    "__fpclassify_test1_t": {
> >     "normal": {
> >      "duration": 5.91344e+06,
> >      "iterations": 500,
> >      "mean": 11826
> >     }
> >    },
> >    "__fpclassify_test2_t": {
> >     "normal": {
> >      "duration": 7.45868e+06,
> >      "iterations": 500,
> >      "mean": 14917
> >     }
> >    },
> 
> Well that doesn't seem correct. Here are my results for the above
> using latest GCC and modern x64 hardware:
>
Thats what your benchmark produces. Again difference between mine
benchmarks and yours is in how you handled constants loads by forbidding
inlining. My benchmark used following:

#define BOOLTEST(func)                                    \
int __attribute__((noinline)) func## _t_t (double d,int a)        \
{                                                         \
  if (func(d))                                            \
    return 3 * sin (d) + a;                                       \
  else                                                    \
    return 2;                                             \
}                                                         \
int                                                       \
func ## _t (volatile double *p, size_t n, size_t iters)   \
{                                                         \
  int i, j;                                               \
  int res = 0;                                            \
  for (j = 0; j < iters; j++)                             \
    for (i = 0; i < n; i++)                               \
      res += func## _t_t (2.0 * p[i], i);                         \
  return res;                                             \
}

 
>    "__fpclassify_test1_t": {
>     "normal": {
>      "duration": 1.99627e+06,
>      "iterations": 500,
>      "mean": 3992
>     }
>    },
>    "__fpclassify_test2_t": {
>     "normal": {
>      "duration": 1.9893e+06,
>      "iterations": 500,
>      "mean": 3978
>     }
>    },
> 
> > > > Also you don't test inlines in isnormal:
> > > >
> > > > > > +/* Explicit inline similar to existing math.h implementation.  */
> > > > > > +
> > > > > > +#define __isnormal_inl(X) (__fpclassify (X) == FP_NORMAL)
> > >
> > > This is an inline and exactly what the current math.h does. The goal of my
> > > benchmark is comparing the existing implementation with the new inlines.
> > >
> > Then yqu should compare existing implementation with new inlines. Now
> > both would just call fpclassify without your patch.
> 
> OK, I'll add a builtin version for fpclassify as well (and use that in
> __isnormal_inl2) - then all tests have the non-inline version, an internal 
> GLIBC inline if it exists, the builtin and the math.h version.
> 


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]