This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH] aarch64: optimize the unaligned case of memcmp


On Tue, Jun 27, 2017 at 10:11 AM, Wilco Dijkstra <Wilco.Dijkstra@arm.com> wrote:
> Hi,
>
> So I had a look at it using the glibc bench-memcmp.c. I quickly got the
> unaligned loop to go faster than the aligned one using ccmp, so I had to
> tune the unaligned loop too... It uses a trick similar to the one in your
> byte loop to remove a branch and 1-2 ALU operations per iteration.
>
> This gives a 24% speedup on both Cortex-A53 and Cortex-A72 for
> the aligned loop, and about 18% for the unaligned loop on top of your
> patch. Aligning either src1 or src2 appears best as there isn't enough
> work in the loops to hide an unaligned access.
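The structure described above — a byte loop to align one source, a word-at-a-time main loop (with one aligned and one possibly unaligned load), and a byte tail to resolve the first difference — can be sketched in C. This is only an illustration of the shape of the algorithm, not the actual glibc code, which is hand-written aarch64 assembly; the function name and layout here are hypothetical:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Illustrative sketch only: word-at-a-time memcmp that aligns src1
   before the main loop.  The real routine is aarch64 assembly and
   also uses ccmp to fuse loop-exit checks, which C cannot express. */
static int
memcmp_sketch (const void *src1, const void *src2, size_t n)
{
  const unsigned char *s1 = src1;
  const unsigned char *s2 = src2;

  /* Byte loop until s1 is 8-byte aligned (or n is exhausted). */
  while (n > 0 && ((uintptr_t) s1 & 7) != 0)
    {
      if (*s1 != *s2)
        return *s1 - *s2;
      s1++, s2++, n--;
    }

  /* Word loop: s1 is now aligned; s2 may still be unaligned, so
     load it via memcpy, which compiles to a single unaligned load
     on aarch64 (the architecture supports unaligned accesses). */
  while (n >= 8)
    {
      uint64_t w1, w2;
      memcpy (&w1, s1, 8);
      memcpy (&w2, s2, 8);
      if (w1 != w2)
        break;              /* fall through to the byte tail */
      s1 += 8, s2 += 8, n -= 8;
    }

  /* Byte tail resolves the first differing byte. */
  while (n > 0)
    {
      if (*s1 != *s2)
        return *s1 - *s2;
      s1++, s2++, n--;
    }
  return 0;
}
```

Aligning only one pointer is the point made above: the loop body is too short to hide the latency of two unaligned accesses, but one aligned stream plus one unaligned stream is cheap on these cores.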

This looks good.  Could you please send a patch?

Thanks,
Sebastian

