This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.
RE: [PATCH] Add math-inline benchmark
- From: "Wilco Dijkstra" <wdijkstr at arm dot com>
- To: 'Ondřej Bílka' <neleai at seznam dot cz>
- Cc: "GNU C Library" <libc-alpha at sourceware dot org>
- Date: Fri, 10 Jul 2015 17:09:16 +0100
- Subject: RE: [PATCH] Add math-inline benchmark
- References: <001c01d0a912$42357710$c6a06530$ at com> <20150622083657 dot GA3684 at domone> <000701d0b7fb$0f27b840$2d7728c0$ at com> <20150709124454 dot GA29625 at domone>
> Ondřej Bílka wrote:
> On Mon, Jul 06, 2015 at 03:50:11PM +0100, Wilco Dijkstra wrote:
> >
> >
> > > Ondřej Bílka wrote:
> > > But with latency hiding, by using the argument first, suddenly even isnan and
> > > isnormal become regressions.
> > >
> > > for (i = 0; i < n; i++){ res += 3*sin(p[i] * 2.0); \
> > > if (func (p[i] * 2.0)) res += 5;} \
> > >
> > >
> > > __fpclassify_test2_t: 92929.4 37256.8
> > > __fpclassify_test1_t: 94020.1 35512.1
> > > __fpclassify_t: 17321.2 13325.1
> > > fpclassify_t: 8021.29 4376.89
> > > __isnormal_inl2_t: 93896.9 38941.8
> > > __isnormal_inl_t: 98069.2 46140.4
> > > isnormal_t: 94775.6 36941.8
> > > __finite_inl_t: 84059.9 38304
> > > __finite_t: 96052.4 45998.2
> > > isfinite_t: 93371.5 36659.1
> > > __isinf_inl_t: 92532.9 36050.1
> > > __isinf_t: 95929.4 46585.2
> > > isinf_t: 93290.1 36445.6
> > > __isnan_inl_t: 82760.7 37452.2
> > > __isnan_t: 98064.6 45338.8
> > > isnan_t: 93386.7 37786.4
> >
> > Can you try this with:
> >
> > for (i = 0; i < n; i++) \
> > { double tmp = p[i] * 2.0; \
> > if (sin(tmp) < 1.0) res++; if (func (tmp)) res += 5;} \
> >
> That doesn't change the outcome:
>
> __fpclassify_test2_t: 99721 51051.6
> __fpclassify_test1_t: 85015.2 43607.4
> __fpclassify_t: 13997.3 10475.1
> fpclassify_t: 13502.5 10253.6
> __isnormal_inl2_t: 76479.4 41531.7
> __isnormal_inl_t: 76526.9 41560.8
> isnormal_t: 76458.6 41547.7
> __finite_inl_t: 71108.6 33271.3
> __finite_t: 73031 37452.3
> isfinite_t: 73024.9 37447
> __isinf_inl_t: 68599.2 32792.9
> __isinf_t: 74851 40108.8
> isinf_t: 74871.9 40109.9
> __isnan_inl_t: 71100.8 33659.6
> __isnan_t: 72914 37592.4
> isnan_t: 72909.4 37635.8
That doesn't look correct: it looks like this didn't use the built-ins at all.
Did you forget to apply that patch?
Anyway, I received a new machine, so GLIBC now finally builds for x64. Since
there are large variations from run to run, I repeated the same tests 4 times
by copying the FOR_EACH_IMPL loop. The first 1 or 2 runs are bad; the last 2
converge to usable results, so I suspect frequency scaling is an issue here.
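A minimal sketch of the repeat-and-discard approach, assuming POSIX clock_gettime is available. The names isinf_kernel, time_repeated and settled_mean are hypothetical and are not part of the glibc benchtests framework. In the test above, the repetition was done by copying the FOR_EACH_IMPL loop.

```c
#define _POSIX_C_SOURCE 199309L
#include <math.h>
#include <time.h>

/* Hypothetical kernel: adds 5 for every p[i] * 2.0 that is infinite.  */
static int isinf_kernel (const double *p, int n)
{
  int res = 0;
  for (int i = 0; i < n; i++)
    if (isinf (p[i] * 2.0))
      res += 5;
  return res;
}

/* Run the kernel `runs` times back to back and store each run's wall time
   in nanoseconds.  Early runs may be slow while the CPU is still ramping
   up its clock frequency.  Returns the kernel result of the last run.  */
static int time_repeated (const double *p, int n, double *ns, int runs)
{
  int res = 0;
  for (int r = 0; r < runs; r++)
    {
      struct timespec t0, t1;
      clock_gettime (CLOCK_MONOTONIC, &t0);
      res = isinf_kernel (p, n);
      clock_gettime (CLOCK_MONOTONIC, &t1);
      ns[r] = (t1.tv_sec - t0.tv_sec) * 1e9 + (t1.tv_nsec - t0.tv_nsec);
    }
  return res;
}

/* Average only the runs after the first `warmup` ones.  */
static double settled_mean (const double *ns, int runs, int warmup)
{
  double sum = 0.0;
  for (int r = warmup; r < runs; r++)
    sum += ns[r];
  return sum / (runs - warmup);
}
```

Calling settled_mean (ns, 4, 2) corresponds to keeping the last 2 of 4 runs.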
Without the sin(tmp) part I get:
remainder_test2_t: 40786.9 192862
remainder_test1_t: 43008.2 196311
__fpclassify_test2_t: 2856.56 3020.12
__fpclassify_test1_t: 3043.53 3135.89
__fpclassify_t: 12500.6 10152.5
fpclassify_t: 2972.54 3047.65
__isnormal_inl2_t: 4619.55 14491.1
__isnormal_inl_t: 12896.3 10306.7
isnormal_t: 4254.42 3667.87
__finite_inl_t: 3979.58 3991.6
__finite_t: 7039.61 7039.37
isfinite_t: 2992.65 2969.25
__isinf_inl_t: 2852.1 3239.23
__isinf_t: 8991.81 8813.44
isinf_t: 3241.75 3241.54
__isnan_inl_t: 4003.51 3977.73
__isnan_t: 7054.54 7054.5
isnan_t: 2819.66 2801.94
And with the sin() addition:
remainder_test2_t: 105093 214635
remainder_test1_t: 106635 218012
__fpclassify_test2_t: 64290.9 32116.6
__fpclassify_test1_t: 64365.1 32310.2
__fpclassify_t: 72006.1 41607
fpclassify_t: 64190.3 33450.1
__isnormal_inl2_t: 65959.1 33672
__isnormal_inl_t: 71875.7 41727.3
isnormal_t: 65676.1 32826.1
__finite_inl_t: 69600.6 35293.3
__finite_t: 67653.8 38627.2
isfinite_t: 64435.9 34904.9
__isinf_inl_t: 68556.6 33176
__isinf_t: 69066.4 39562.7
isinf_t: 64755.5 34244.6
__isnan_inl_t: 69577.3 34776.2
__isnan_t: 67538.8 38321.3
isnan_t: 63963 33276.6
The remainder test is basically math/w_remainder.c adapted to use __isinf_inl
and __isnan_inl (test1) or the isinf/isnan built-ins (test2).
From this it seems that __isinf_inl is slightly faster than the built-in, but
the difference does not show up as a regression when combined with sin() or in
the remainder test.
So I don't see any potential regression here on x64; in fact, inlining via the
built-ins appears to give good speedups across the board. Besides inlining
calls in applications that use GLIBC, it also inlines many call sites within
GLIBC that weren't previously inlined.
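A minimal sketch of what "inlining using the built-ins" means, assuming GCC or Clang. Each classification macro expands to a compiler built-in that is evaluated inline, rather than to a call to an out-of-line function such as __isinf. The MY_* macro names and count_special are hypothetical, and glibc's actual <math.h> definitions are not reproduced here.

```c
#include <math.h>

/* Hypothetical macros mapping the C99 classification macros to built-ins.  */
#define MY_ISNAN(x)     __builtin_isnan (x)
#define MY_ISINF(x)     __builtin_isinf_sign (x)   /* -1, 0 or 1 */
#define MY_ISFINITE(x)  __builtin_isfinite (x)
#define MY_ISNORMAL(x)  __builtin_isnormal (x)
/* The first five arguments are the values to return for each class, in the
   order GCC requires: NaN, infinite, normal, subnormal, zero.  */
#define MY_FPCLASSIFY(x) \
  __builtin_fpclassify (FP_NAN, FP_INFINITE, FP_NORMAL, FP_SUBNORMAL, \
                        FP_ZERO, x)

/* Example caller: the classification is expanded inline in the loop.  */
static int count_special (const double *p, int n)
{
  int count = 0;
  for (int i = 0; i < n; i++)
    if (!MY_ISFINITE (p[i]))
      count++;
  return count;
}
```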
> > Basically GCC does the array read and multiply twice just like you told it
> > to (remember this is not using -ffast-math). Also avoid adding unnecessary
> > FP operations and conversions (which may interact badly with timing the
> > code we're trying to test).
> >
> And how do you know that most users don't use fp conversions in their
> code just before isinf? These interactions make benchtests worthless, as
> in practice a different variant would be faster than the one that you
> measure.
You always get such interactions; it's unavoidable. That's why I added some
actual math code that uses isinf/isnan, to see how it performs in real life.
> > For me the fixed version still shows the expected answer: the built-ins are
> > either faster or as fast as the inlines. So I don't think there is any
> > regression here (remember also that previously there were no inlines at all
> > except for a few inside GLIBC, so the real speedup is much larger).
>
> That's ARM only. So it looks like we need platform-specific headers and testing.
Well, I just confirmed that the same gains apply to x64.
Wilco