This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: [PATCH] Add math-inline benchmark


On Mon, Jul 06, 2015 at 03:50:11PM +0100, Wilco Dijkstra wrote:
> 
> 
> > Ondřej Bílka wrote:
> > But with latency hiding, by using the argument first, suddenly even isnan
> > and isnormal become regressions.
> > 
> >     for (i = 0; i < n; i++){ res += 3*sin(p[i] * 2.0);    \
> >       if (func (p[i] * 2.0)) res += 5;}                   \
> > 
> > 
> > __fpclassify_test2_t:   92929.4 37256.8
> > __fpclassify_test1_t:   94020.1 35512.1
> >       __fpclassify_t:   17321.2 13325.1
> >         fpclassify_t:   8021.29 4376.89
> >    __isnormal_inl2_t:   93896.9 38941.8
> >     __isnormal_inl_t:   98069.2 46140.4
> >           isnormal_t:   94775.6 36941.8
> >       __finite_inl_t:   84059.9 38304
> >           __finite_t:   96052.4 45998.2
> >           isfinite_t:   93371.5 36659.1
> >        __isinf_inl_t:   92532.9 36050.1
> >            __isinf_t:   95929.4 46585.2
> >              isinf_t:   93290.1 36445.6
> >        __isnan_inl_t:   82760.7 37452.2
> >            __isnan_t:   98064.6 45338.8
> >              isnan_t:   93386.7 37786.4
> 
> Can you try this with:
> 
>     for (i = 0; i < n; i++)                               \
>       { double tmp = p[i] * 2.0;                          \
>         if (sin (tmp) < 1.0) res++;                       \
>         if (func (tmp)) res += 5; }                       \
>
That doesn't change the outcome:

__fpclassify_test2_t: 	99721	51051.6
__fpclassify_test1_t: 	85015.2	43607.4
      __fpclassify_t: 	13997.3	10475.1
        fpclassify_t: 	13502.5	10253.6
   __isnormal_inl2_t: 	76479.4	41531.7
    __isnormal_inl_t: 	76526.9	41560.8
          isnormal_t: 	76458.6	41547.7
      __finite_inl_t: 	71108.6	33271.3
          __finite_t: 	73031	37452.3
          isfinite_t: 	73024.9	37447
       __isinf_inl_t: 	68599.2	32792.9
           __isinf_t: 	74851	40108.8
             isinf_t: 	74871.9	40109.9
       __isnan_inl_t: 	71100.8	33659.6
           __isnan_t: 	72914	37592.4
             isnan_t: 	72909.4	37635.8
 
> Basically GCC does the array read and multiply twice just like you told it
> to (remember this is not using -ffast-math). Also avoid adding unnecessary
> FP operations and conversions (which may interact badly with timing the
> code we're trying to test). 
> 
And how do you know that most users don't use fp conversions in their
code just before isinf? These interactions make benchtests worthless, as
in practice a different variant could be faster than the one that you
measure.

> For me the fixed version still shows the expected answer: the built-ins are
> either faster or as fast as the inlines. So I don't think there is any
> regression here (remember also that previously there were no inlines at all
> except for a few inside GLIBC, so the real speedup is much larger).

That's ARM only. So it looks like we need platform-specific headers and testing.

These give a speedup, but since the internal inlines on x64 are better as
they are, it's natural to ask whether using them in general would give the
same speedup. That leads to fixing the GCC builtins.

