This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.

Re: [PATCH] aarch64: optimize the unaligned case of memcmp

From: Sebastian Pop <sebpop at gmail dot com>
To: Wilco Dijkstra <Wilco dot Dijkstra at arm dot com>
Cc: Sebastian Pop <s dot pop at samsung dot com>, "libc-alpha at sourceware dot org" <libc-alpha at sourceware dot org>, Marcus Shawcroft <Marcus dot Shawcroft at arm dot com>, "maxim dot kuvyrkov at linaro dot org" <maxim dot kuvyrkov at linaro dot org>, Ramana Radhakrishnan <Ramana dot Radhakrishnan at arm dot com>, "ryan dot arnold at linaro dot org" <ryan dot arnold at linaro dot org>, "adhemerval dot zanella at linaro dot org" <adhemerval dot zanella at linaro dot org>, nd <nd at arm dot com>
Date: Thu, 29 Jun 2017 12:09:28 -0500
Subject: Re: [PATCH] aarch64: optimize the unaligned case of memcmp

Hi Wilco,

On Thu, Jun 29, 2017 at 9:47 AM, Wilco Dijkstra <Wilco.Dijkstra@arm.com> wrote:
> This is the newlib version: https://sourceware.org/ml/newlib/2017/msg00524.html -
> maybe you could run your benchmark to verify it's faster?

Yes, your patch is faster than the patch that I submitted.
Here are the numbers from the bionic-benchmarks with your patch:

Benchmark                                Time           CPU Iterations   Throughput
-----------------------------------------------------------------------------------
BM_string_memcmp/8                      27 ns         27 ns   26198166  286.069MB/s
BM_string_memcmp/64                     45 ns         45 ns   15553753  1.32443GB/s
BM_string_memcmp/512                   242 ns        242 ns    2892423  1.97049GB/s
BM_string_memcmp/1024                  455 ns        455 ns    1537290  2.09436GB/s
BM_string_memcmp/8k                   3446 ns       3446 ns     203295  2.21392GB/s
BM_string_memcmp/16k                  7567 ns       7567 ns      92582  2.01657GB/s
BM_string_memcmp/32k                 16081 ns      16081 ns      43524   1.8977GB/s
BM_string_memcmp/64k                 31029 ns      31028 ns      22565  1.96712GB/s
BM_string_memcmp_aligned/8             184 ns        184 ns    3800912  41.3654MB/s
BM_string_memcmp_aligned/64            287 ns        287 ns    2438835   212.65MB/s
BM_string_memcmp_aligned/512          1370 ns       1370 ns     511014  356.498MB/s
BM_string_memcmp_aligned/1024         2543 ns       2543 ns     275253  384.006MB/s
BM_string_memcmp_aligned/8k          20413 ns      20411 ns      34306  382.764MB/s
BM_string_memcmp_aligned/16k         42908 ns      42907 ns      16132  364.158MB/s
BM_string_memcmp_aligned/32k         88902 ns      88886 ns       8087  351.574MB/s
BM_string_memcmp_aligned/64k        173016 ns     173007 ns       4122  361.258MB/s
BM_string_memcmp_unaligned/8           212 ns        212 ns    3304163  36.0243MB/s
BM_string_memcmp_unaligned/64          361 ns        361 ns    1941597  169.279MB/s
BM_string_memcmp_unaligned/512        1754 ns       1753 ns     399210  278.492MB/s
BM_string_memcmp_unaligned/1024       3308 ns       3308 ns     211622  295.243MB/s
BM_string_memcmp_unaligned/8k        27227 ns      27225 ns      25637  286.964MB/s
BM_string_memcmp_unaligned/16k       55877 ns      55874 ns      12455  279.645MB/s
BM_string_memcmp_unaligned/32k      112397 ns     112366 ns       6200   278.11MB/s
BM_string_memcmp_unaligned/64k      223493 ns     223482 ns       3127  279.665MB/s
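The throughput column is bytes compared per second of CPU time. A minimal Python sketch, assuming the harness counts the full buffer size once per call and prints "MB/s"/"GB/s" in binary multiples (2^20 and 2^30 bytes), reproduces three of the 64k rows of the patched run from their CPU times:

```python
# Sketch: recompute the throughput column from buffer size and CPU time.
# Assumption: "MB/s" and "GB/s" in the benchmark output are binary units
# (2**20 and 2**30 bytes), which matches the printed values to within the
# rounding of the nanosecond column.

MIB = 2 ** 20
GIB = 2 ** 30


def throughput(nbytes: int, cpu_ns: float) -> float:
    """Bytes per second for one memcmp over nbytes that took cpu_ns nanoseconds."""
    return nbytes / (cpu_ns * 1e-9)


# (benchmark name, bytes compared per call, CPU ns per call), patched run
rows = [
    ("BM_string_memcmp/64k", 64 * 1024, 31028),
    ("BM_string_memcmp_aligned/64k", 64 * 1024, 173007),
    ("BM_string_memcmp_unaligned/64k", 64 * 1024, 223482),
]

for name, nbytes, ns in rows:
    bps = throughput(nbytes, ns)
    print(f"{name:32s} {bps / GIB:8.4f} GiB/s  {bps / MIB:9.3f} MiB/s")
```

Small differences in the last printed digit are expected, since the nanosecond values in the table are already rounded.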

And here are the numbers for the base (without your patch):

Benchmark                                Time           CPU Iterations   Throughput
-----------------------------------------------------------------------------------
BM_string_memcmp/8                      30 ns         30 ns   23092418  251.611MB/s
BM_string_memcmp/64                     58 ns         58 ns   12351397   1046.3MB/s
BM_string_memcmp/512                   305 ns        305 ns    2297189  1.56479GB/s
BM_string_memcmp/1024                  571 ns        571 ns    1225040  1.66906GB/s
BM_string_memcmp/8k                   4307 ns       4306 ns     162561  1.77175GB/s
BM_string_memcmp/16k                  9429 ns       9429 ns      74212  1.61826GB/s
BM_string_memcmp/32k                 19166 ns      19164 ns      36521  1.59247GB/s
BM_string_memcmp/64k                 37035 ns      37033 ns      18924  1.64811GB/s
BM_string_memcmp_aligned/8             199 ns        199 ns    3514227  38.3337MB/s
BM_string_memcmp_aligned/64            386 ns        386 ns    1811517  158.015MB/s
BM_string_memcmp_aligned/512          1731 ns       1730 ns     403250  282.163MB/s
BM_string_memcmp_aligned/1024         3198 ns       3198 ns     218758  305.354MB/s
BM_string_memcmp_aligned/8k          25217 ns      25214 ns      27483   309.85MB/s
BM_string_memcmp_aligned/16k         52015 ns      52013 ns      13527  300.407MB/s
BM_string_memcmp_aligned/32k        105034 ns     105034 ns       6689  297.522MB/s
BM_string_memcmp_aligned/64k        208464 ns     208465 ns       3427   299.81MB/s
BM_string_memcmp_unaligned/8           339 ns        339 ns    2066772   22.537MB/s
BM_string_memcmp_unaligned/64         1392 ns       1392 ns     502754  43.8398MB/s
BM_string_memcmp_unaligned/512        9194 ns       9194 ns      76133  53.1087MB/s
BM_string_memcmp_unaligned/1024      18326 ns      18324 ns      38209  53.2938MB/s
BM_string_memcmp_unaligned/8k       148591 ns     148585 ns       4713  52.5793MB/s
BM_string_memcmp_unaligned/16k      298256 ns     298207 ns       2343  52.3964MB/s
BM_string_memcmp_unaligned/32k      598855 ns     598828 ns       1085  52.1853MB/s
BM_string_memcmp_unaligned/64k     1196350 ns    1196355 ns        539   52.242MB/s
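To compare the two runs at a glance, the speedup at each size can be taken as base CPU time divided by patched CPU time, so a ratio above 1.0 means the patched memcmp is faster. The Python sketch below does only that arithmetic on the CPU ns values copied from both tables; the helper name speedups is illustrative, not part of any benchmark tool:

```python
# Sketch: per-size speedup of the patched run over the base run, computed as
# base CPU ns / patched CPU ns (values copied from the two tables).

sizes = ["8", "64", "512", "1024", "8k", "16k", "32k", "64k"]

patched = {
    "memcmp":           [27, 45, 242, 455, 3446, 7567, 16081, 31028],
    "memcmp_aligned":   [184, 287, 1370, 2543, 20411, 42907, 88886, 173007],
    "memcmp_unaligned": [212, 361, 1753, 3308, 27225, 55874, 112366, 223482],
}
base = {
    "memcmp":           [30, 58, 305, 571, 4306, 9429, 19164, 37033],
    "memcmp_aligned":   [199, 386, 1730, 3198, 25214, 52013, 105034, 208465],
    "memcmp_unaligned": [339, 1392, 9194, 18324, 148585, 298207, 598828, 1196355],
}


def speedups(case: str) -> list[float]:
    """Ratio base/patched for each buffer size of one benchmark variant."""
    return [b / p for b, p in zip(base[case], patched[case])]


for case in patched:
    row = "  ".join(f"{s}:{r:.2f}x" for s, r in zip(sizes, speedups(case)))
    print(f"{case:18s} {row}")
```

On these numbers the unaligned case improves by roughly 5.2x to 5.5x from 512 bytes upward, while the plain and aligned variants improve by about 1.2x at 64k.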

Thanks,
Sebastian
