This is the mail archive of the libc-ports@sources.redhat.com mailing list for the libc-ports project.
Re: [PATCH] sysdeps/arm/armv7/multiarch/memcpy_impl.S: Improve performance.
- From: "Joseph S. Myers" <joseph at codesourcery dot com>
- To: Will Newton <will dot newton at linaro dot org>
- Cc: <libc-ports at sourceware dot org>, <patches at linaro dot org>
- Date: Thu, 29 Aug 2013 23:58:08 +0000
- References: <520894D5 dot 7060207 at linaro dot org>
On Mon, 12 Aug 2013, Will Newton wrote:
> A small change to the entry to the aligned copy loop improves
> performance slightly on A9 and A15 cores for certain copies.
Could you clarify what you mean by "certain copies"?
In particular, have you verified that, for all three variants of this code
(NEON, VFP or neither), the unaligned-copy code is at least as fast for
this case (common 32-bit alignment, but not common 64-bit alignment) as
the code that was previously used for such copies?
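To make the case in question concrete, here is a minimal C sketch of classifying a source/destination pair by mutual alignment. It is not code from the patch; the enum and the names classify_mutual_alignment and demo_mutual_alignment are hypothetical, and the sketch assumes the classification is done on the low bits of (src ^ dst), which the actual assembly may test differently. The case being asked about is MUTUAL_ALIGN_32: both pointers agree modulo 4 but not modulo 8, so word accesses can be aligned on both sides while 64-bit-aligned LDRD/STRD cannot be.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical classification of how src and dst are aligned relative to
   each other (not taken from memcpy_impl.S). */
typedef enum {
    MUTUAL_ALIGN_64,  /* src and dst agree modulo 8 */
    MUTUAL_ALIGN_32,  /* agree modulo 4 but not modulo 8 */
    MUTUAL_ALIGN_NONE /* differ modulo 4 */
} mutual_align;

static mutual_align classify_mutual_alignment(const void *dst, const void *src)
{
    /* The low bits of the XOR are zero exactly where the two addresses
       have the same low bits, i.e. where they agree modulo 2^k. */
    uintptr_t diff = (uintptr_t)dst ^ (uintptr_t)src;
    if ((diff & 7) == 0)
        return MUTUAL_ALIGN_64;
    if ((diff & 3) == 0)
        return MUTUAL_ALIGN_32;
    return MUTUAL_ALIGN_NONE;
}

/* Usage example on an 8-byte-aligned buffer (C11 _Alignas). Only pointer
   values are compared; the buffer contents are never read. */
static void demo_mutual_alignment(void)
{
    _Alignas(8) unsigned char buf[32];
    assert(classify_mutual_alignment(buf + 3, buf + 11) == MUTUAL_ALIGN_64);
    assert(classify_mutual_alignment(buf + 8, buf + 4) == MUTUAL_ALIGN_32);
    assert(classify_mutual_alignment(buf + 1, buf + 0) == MUTUAL_ALIGN_NONE);
}
```

The buf + 8 / buf + 4 pair is the "common 32-bit alignment, but not common 64-bit alignment" case; it is the one whose performance would need checking under each of the NEON, VFP and core-register variants.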
There are various comments regarding alignment, whether stating "LDRD/STRD
support unaligned word accesses" or referring to the mutual alignment that
applies for particular code. Does this patch make any of them out of
date? (If code can now only be reached with common 64-bit alignment, but
in fact requires only 32-bit alignment, the comment should probably state
both those things explicitly.)
--
Joseph S. Myers
joseph@codesourcery.com