This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

RE: [PATCH v2] Add math-inline benchmark


> OndÅej BÃlka wrote:
> On Tue, Jul 21, 2015 at 05:17:23PM +0100, Wilco Dijkstra wrote:

> > Fpclassify uses the builtin, but I think it's better to remove these
> > tests as they don't add much value. It's pretty obvious it is better
> > to use separate tests instead of fpclassify when you only need 2 of them.
> >
> I agree that its better to remove them.
> 
> However it isn't obvious that separate tests are better as its wrong. It
> should be same code as gcc should optimize that to better but often
> isnt. When I looked to assembly of my benchmark of noinline tests then
> performance difference was caused just by gcc bug. In fpclassify case it
> increased and decreased stack pointer without writing anything to stack.

Sure optimization makes a difference to these benchmarks, that's unavoidable,
so that's why I always carefully check the generated assembly to ensure I
really measure what I wanted to measure.

> And its easy to find cases where opposite is true as this depends how
> you write benchmark, for example if I change my benchmark to use following:
> 
> extern __always_inline int
> __fpclassify_test1 (double d)
> {
>   int cl = fpclassify (d);
>   return (cl == FP_INFINITE) ? 3 * sin (d) :
>          ((cl == FP_NAN) ? 2 * cos (d) : 0) ;
> }
> 
> extern __always_inline int
> __fpclassify_test2 (double d)
> {
>   return (__builtin_isinf(d)) ? 3 * sin (d) :
>          ((__builtin_isnan(d)) ? 2 * cos (d) : 0) ;
> 
> }
> 
> then fpclassify becomes lot faster.
> 
>    "__fpclassify_test1_t": {
>     "normal": {
>      "duration": 5.91344e+06,
>      "iterations": 500,
>      "mean": 11826
>     }
>    },
>    "__fpclassify_test2_t": {
>     "normal": {
>      "duration": 7.45868e+06,
>      "iterations": 500,
>      "mean": 14917
>     }
>    },

Well that doesn't seem correct. Here are my results for the above
using latest GCC and modern x64 hardware:

   "__fpclassify_test1_t": {
    "normal": {
     "duration": 1.99627e+06,
     "iterations": 500,
     "mean": 3992
    }
   },
   "__fpclassify_test2_t": {
    "normal": {
     "duration": 1.9893e+06,
     "iterations": 500,
     "mean": 3978
    }
   },

> > > Also you don't test inlines in isnormal:
> > >
> > > > > +/* Explicit inline similar to existing math.h implementation.  */
> > > > > +
> > > > > +#define __isnormal_inl(X) (__fpclassify (X) == FP_NORMAL)
> >
> > This is an inline and exactly what the current math.h does. The goal of my
> > benchmark is comparing the existing implementation with the new inlines.
> >
> Then yqu should compare existing implementation with new inlines. Now
> both would just call fpclassify without your patch.

OK, I'll add a builtin version for fpclassify as well (and use that in
__isnormal_inl2) - then all tests have the non-inline version, an internal 
GLIBC inline if it exists, the builtin and the math.h version.

Wilco



Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]