This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: [PATCH] Add math-inline benchmark
- From: Ondřej Bílka <neleai at seznam dot cz>
- To: Wilco Dijkstra <wdijkstr at arm dot com>
- Cc: GNU C Library <libc-alpha at sourceware dot org>
- Date: Mon, 22 Jun 2015 10:36:57 +0200
- Subject: Re: [PATCH] Add math-inline benchmark
- Authentication-results: sourceware.org; auth=none
- References: <001c01d0a912$42357710$c6a06530$ at com>
On Wed, Jun 17, 2015 at 04:28:27PM +0100, Wilco Dijkstra wrote:
> Hi,
>
> Due to popular demand, here is a new benchmark that tests isinf, isnan,
> isnormal, isfinite and fpclassify. It uses 2 arrays with 1024 doubles,
> one with 99% finite FP numbers (10% zeroes, 10% negative) and 1% inf/NaN,
> the other with 50% inf and 50% NaN.
>
> Results show that using the GCC built-ins in math.h will give huge speedups
> by avoiding explicit calls and PLT indirection to execute a function with
> 3-4 instructions. The GCC builtins have similar performance to the existing
> math_private inlines for __isnan, __finite and __isinf_ns.
>
> OK for commit?
>
I ran these; on x64, using the builtins is a regression even with your benchmark.
The main problem here is what exactly you measure. I don't know how much
of your results was caused by measuring the latency of the load/multiply/move
to int register chain. With OoO execution that latency shouldn't be a problem.
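To make the chain in question concrete, here is a minimal sketch of a loop that loads a double, multiplies it, and classifies it, once through integer bit tests and once through the GCC builtin. The names isinf_bits, count_inf_bits and count_inf_builtin are hypothetical and written only for illustration; this is not the glibc inline or the benchmark source, just an assumption about the general shape of such code.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical integer-based check, NOT the glibc inline: the bits of the
   double must be moved to an integer register before they are compared.  */
static inline int
isinf_bits (double x)
{
  uint64_t u;
  memcpy (&u, &x, sizeof u);    /* FP -> integer register move.  */
  return (u & UINT64_C (0x7fffffffffffffff)) == UINT64_C (0x7ff0000000000000);
}

/* Load, multiply, classify via the integer path.  */
static int
count_inf_bits (const double *p, size_t n)
{
  int res = 0;
  assert (p != NULL || n == 0);
  for (size_t i = 0; i < n; i++)
    res += isinf_bits (p[i] * 2.0);
  return res;
}

/* Same loop, but classification is left to the GCC builtin.  */
static int
count_inf_builtin (const double *p, size_t n)
{
  int res = 0;
  assert (p != NULL || n == 0);
  for (size_t i = 0; i < n; i++)
    res += __builtin_isinf (p[i] * 2.0) != 0;
  return res;
}
```

In both loops each iteration depends only on its own p[i], so an out-of-order core can overlap the chains of neighbouring iterations; a timing of such a loop then says more about throughput than about the latency of a single chain.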
The original results follow, with isfinite also inlined:
__fpclassify_test2_t: 3660.24 3733.22
__fpclassify_test1_t: 3696.33 3691.3
__fpclassify_t: 14365.8 11116.5
fpclassify_t: 6045.69 3128.76
__isnormal_inl2_t: 5275.85 14562.6
__isnormal_inl_t: 14753.3 11143.5
isnormal_t: 4418.84 4411.59
__finite_inl_t: 3038.75 3038.4
__finite_t: 7712.42 7697.24
isfinite_t: 3108.91 3107.85
__isinf_inl_t: 2109.05 2817.19
__isinf_t: 8555.51 8559.36
isinf_t: 3472.62 3408.8
__isnan_inl_t: 2682.12 2691.39
__isnan_t: 7698.14 7735.29
isnan_t: 2592.58 2572.82
But with the latency hidden by using the argument first, suddenly even isnan
and isnormal become regressions.
for (i = 0; i < n; i++){ res += 3*sin(p[i] * 2.0); \
if (func (p[i] * 2.0)) res += 5;} \
__fpclassify_test2_t: 92929.4 37256.8
__fpclassify_test1_t: 94020.1 35512.1
__fpclassify_t: 17321.2 13325.1
fpclassify_t: 8021.29 4376.89
__isnormal_inl2_t: 93896.9 38941.8
__isnormal_inl_t: 98069.2 46140.4
isnormal_t: 94775.6 36941.8
__finite_inl_t: 84059.9 38304
__finite_t: 96052.4 45998.2
isfinite_t: 93371.5 36659.1
__isinf_inl_t: 92532.9 36050.1
__isinf_t: 95929.4 46585.2
isinf_t: 93290.1 36445.6
__isnan_inl_t: 82760.7 37452.2
__isnan_t: 98064.6 45338.8
isnan_t: 93386.7 37786.4