This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH] aarch64: optimize the unaligned case of memcmp


On Tue, Jun 27, 2017 at 10:11 AM, Wilco Dijkstra <Wilco.Dijkstra@arm.com> wrote:
> Hi,
>
> So I had a look at it using the glibc bench-memcmp.c. I quickly got the
> unaligned loop to go faster than the aligned one using ccmp, so I had to
> tune the unaligned loop too... It uses a trick similar to the one in your
> byte loop to remove a branch and 1-2 ALU operations per iteration.
>
> This gives a 24% speedup on both Cortex-A53 and Cortex-A72 for
> the aligned loop, and about 18% for the unaligned loop on top of your
> patch. Aligning either src1 or src2 appears best as there isn't enough
> work in the loops to hide an unaligned access.
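The structure described above — a byte loop to align one source, a word-at-a-time main loop (with one aligned and one possibly unaligned load), and a byte tail to resolve the first difference — can be sketched in C. This is only an illustration of the shape of the algorithm, not the actual glibc code, which is hand-written aarch64 assembly; the function name and layout here are hypothetical:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Illustrative sketch only: word-at-a-time memcmp that aligns src1
   before the main loop.  The real routine is aarch64 assembly and
   also uses ccmp to fuse loop-exit checks, which C cannot express. */
static int
memcmp_sketch (const void *src1, const void *src2, size_t n)
{
  const unsigned char *s1 = src1;
  const unsigned char *s2 = src2;

  /* Byte loop until s1 is 8-byte aligned (or n is exhausted). */
  while (n > 0 && ((uintptr_t) s1 & 7) != 0)
    {
      if (*s1 != *s2)
        return *s1 - *s2;
      s1++, s2++, n--;
    }

  /* Word loop: s1 is now aligned; s2 may still be unaligned, so
     load it via memcpy, which compiles to a single unaligned load
     on aarch64 (the architecture supports unaligned accesses). */
  while (n >= 8)
    {
      uint64_t w1, w2;
      memcpy (&w1, s1, 8);
      memcpy (&w2, s2, 8);
      if (w1 != w2)
        break;              /* fall through to the byte tail */
      s1 += 8, s2 += 8, n -= 8;
    }

  /* Byte tail resolves the first differing byte. */
  while (n > 0)
    {
      if (*s1 != *s2)
        return *s1 - *s2;
      s1++, s2++, n--;
    }
  return 0;
}
```

Aligning only one pointer is the point made above: the loop body is too short to hide the latency of two unaligned accesses, but one aligned stream plus one unaligned stream is cheap on these cores.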

This looks good.  Could you please send a patch?

Thanks,
Sebastian

