This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.
Re: [PATCH] aarch64: optimize the unaligned case of memcmp
Hi Wilco,
On Thu, Jun 29, 2017 at 9:47 AM, Wilco Dijkstra <Wilco.Dijkstra@arm.com> wrote:
> This is the newlib version: https://sourceware.org/ml/newlib/2017/msg00524.html -
> maybe you could run your benchmark to verify it's faster?
Yes, your patch is faster than the patch that I submitted.
Here are the numbers from the bionic-benchmarks with your patch:
Benchmark                              Time         CPU  Iterations
----------------------------------------------------------------------
BM_string_memcmp/8                    27 ns       27 ns    26198166  286.069MB/s
BM_string_memcmp/64                   45 ns       45 ns    15553753  1.32443GB/s
BM_string_memcmp/512                 242 ns      242 ns     2892423  1.97049GB/s
BM_string_memcmp/1024                455 ns      455 ns     1537290  2.09436GB/s
BM_string_memcmp/8k                 3446 ns     3446 ns      203295  2.21392GB/s
BM_string_memcmp/16k                7567 ns     7567 ns       92582  2.01657GB/s
BM_string_memcmp/32k               16081 ns    16081 ns       43524  1.8977GB/s
BM_string_memcmp/64k               31029 ns    31028 ns       22565  1.96712GB/s
BM_string_memcmp_aligned/8           184 ns      184 ns     3800912  41.3654MB/s
BM_string_memcmp_aligned/64          287 ns      287 ns     2438835  212.65MB/s
BM_string_memcmp_aligned/512        1370 ns     1370 ns      511014  356.498MB/s
BM_string_memcmp_aligned/1024       2543 ns     2543 ns      275253  384.006MB/s
BM_string_memcmp_aligned/8k        20413 ns    20411 ns       34306  382.764MB/s
BM_string_memcmp_aligned/16k       42908 ns    42907 ns       16132  364.158MB/s
BM_string_memcmp_aligned/32k       88902 ns    88886 ns        8087  351.574MB/s
BM_string_memcmp_aligned/64k      173016 ns   173007 ns        4122  361.258MB/s
BM_string_memcmp_unaligned/8         212 ns      212 ns     3304163  36.0243MB/s
BM_string_memcmp_unaligned/64        361 ns      361 ns     1941597  169.279MB/s
BM_string_memcmp_unaligned/512      1754 ns     1753 ns      399210  278.492MB/s
BM_string_memcmp_unaligned/1024     3308 ns     3308 ns      211622  295.243MB/s
BM_string_memcmp_unaligned/8k      27227 ns    27225 ns       25637  286.964MB/s
BM_string_memcmp_unaligned/16k     55877 ns    55874 ns       12455  279.645MB/s
BM_string_memcmp_unaligned/32k    112397 ns   112366 ns        6200  278.11MB/s
BM_string_memcmp_unaligned/64k    223493 ns   223482 ns        3127  279.665MB/s
And here are the numbers for the base (without your patch):
Benchmark                              Time         CPU  Iterations
----------------------------------------------------------------------
BM_string_memcmp/8                    30 ns       30 ns    23092418  251.611MB/s
BM_string_memcmp/64                   58 ns       58 ns    12351397  1046.3MB/s
BM_string_memcmp/512                 305 ns      305 ns     2297189  1.56479GB/s
BM_string_memcmp/1024                571 ns      571 ns     1225040  1.66906GB/s
BM_string_memcmp/8k                 4307 ns     4306 ns      162561  1.77175GB/s
BM_string_memcmp/16k                9429 ns     9429 ns       74212  1.61826GB/s
BM_string_memcmp/32k               19166 ns    19164 ns       36521  1.59247GB/s
BM_string_memcmp/64k               37035 ns    37033 ns       18924  1.64811GB/s
BM_string_memcmp_aligned/8           199 ns      199 ns     3514227  38.3337MB/s
BM_string_memcmp_aligned/64          386 ns      386 ns     1811517  158.015MB/s
BM_string_memcmp_aligned/512        1731 ns     1730 ns      403250  282.163MB/s
BM_string_memcmp_aligned/1024       3198 ns     3198 ns      218758  305.354MB/s
BM_string_memcmp_aligned/8k        25217 ns    25214 ns       27483  309.85MB/s
BM_string_memcmp_aligned/16k       52015 ns    52013 ns       13527  300.407MB/s
BM_string_memcmp_aligned/32k      105034 ns   105034 ns        6689  297.522MB/s
BM_string_memcmp_aligned/64k      208464 ns   208465 ns        3427  299.81MB/s
BM_string_memcmp_unaligned/8         339 ns      339 ns     2066772  22.537MB/s
BM_string_memcmp_unaligned/64       1392 ns     1392 ns      502754  43.8398MB/s
BM_string_memcmp_unaligned/512      9194 ns     9194 ns       76133  53.1087MB/s
BM_string_memcmp_unaligned/1024    18326 ns    18324 ns       38209  53.2938MB/s
BM_string_memcmp_unaligned/8k     148591 ns   148585 ns        4713  52.5793MB/s
BM_string_memcmp_unaligned/16k    298256 ns   298207 ns        2343  52.3964MB/s
BM_string_memcmp_unaligned/32k    598855 ns   598828 ns        1085  52.1853MB/s
BM_string_memcmp_unaligned/64k   1196350 ns  1196355 ns         539  52.242MB/s
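To put a number on the difference, the ratio of base time to patched time can be computed per benchmark. The following is a minimal Python sketch and not part of bionic-benchmarks; the dictionary layout and the speedup() helper are hypothetical names chosen for illustration, and only a handful of rows from the two runs above are copied in.

```python
# Wall-clock times in ns, copied from the two benchmark runs above.
# Keys are (variant, size). The choice of rows is illustrative only.
patched_ns = {
    ("memcmp", "8"): 27,
    ("memcmp", "64k"): 31029,
    ("aligned", "64k"): 173016,
    ("unaligned", "8"): 212,
    ("unaligned", "64k"): 223493,
}
base_ns = {
    ("memcmp", "8"): 30,
    ("memcmp", "64k"): 37035,
    ("aligned", "64k"): 208464,
    ("unaligned", "8"): 339,
    ("unaligned", "64k"): 1196350,
}


def speedup(base, patched):
    """Return base/patched time per benchmark (a value > 1 means patched is faster)."""
    return {key: base[key] / patched[key] for key in base}


ratios = speedup(base_ns, patched_ns)
for (variant, size), ratio in sorted(ratios.items()):
    print(f"{variant}/{size}: {ratio:.2f}x")
```

For the rows selected here every ratio is above 1, and the unaligned 64k case shows the largest gain at about 5.35x.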
Thanks,
Sebastian