This is the mail archive of the libc-ports@sources.redhat.com mailing list for the libc-ports project.
Re: [PATCH v2] ARM: Add Cortex-A15 optimized NEON and VFP memcpy routines, with IFUNC.
- From: Will Newton <will dot newton at linaro dot org>
- To: Richard Henderson <rth at twiddle dot net>
- Cc: libc-ports at sourceware dot org, Patch Tracking <patches at linaro dot org>
- Date: Wed, 17 Apr 2013 16:53:17 +0100
- Subject: Re: [PATCH v2] ARM: Add Cortex-A15 optimized NEON and VFP memcpy routines, with IFUNC.
- References: <516D18F0 dot 4060009 at linaro dot org> <516EC27E dot 8080502 at twiddle dot net>
On 17 April 2013 16:40, Richard Henderson <rth@twiddle.net> wrote:
Hi Richard,
Thanks for the review!
> On 2013-04-16 11:25, Will Newton wrote:
>>
>> ports/sysdeps/arm/armv7/multiarch/Makefile | 3 +
>
>
> Does this really require v7? From a brief read I didn't see anything in the
> _arm version that didn't work since v5te (ldrd and pld). Any reason not to
> put this into armv6 instead?
From reading the comments of the code v7 is required for NEON, v6 is
required for VFP, and unaligned access is required as well. The
unaligned access requirement may be a problem on v5, though I'm not
sure. NB: I did not write the memcpy code, so I have not looked at it
in great detail.
I also had trouble building an armv6 glibc. I only have armv7 systems
to test on and it doesn't seem possible to build for armv6 on an armv7
system as far as I can tell.
>> +ENTRY(memcpy)
>> + .type memcpy, %gnu_indirect_function
>> + ldr r1, .Lmemcpy_arm
>> + tst r0, #HWCAP_ARM_NEON
>> + it ne
>> + ldrne r1, .Lmemcpy_neon
>> + bne 1f
>
>
> Swap vfp and neon tests and you don't need the branch.
True, I'll do that.
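
For illustration, the reordering suggested above can be sketched in C. This is only a model of the selection logic, not the resolver from the patch: the VFP half of the resolver is not quoted in this message, so its shape here is an assumption, and the names (SKETCH_HWCAP_VFP, SKETCH_HWCAP_NEON, select_neon_first, select_vfp_first) and bit values are hypothetical stand-ins for the real HWCAP definitions.

```c
/* Hypothetical stand-ins for the HWCAP bits; the values are assumptions
   made for this sketch, not taken from the patch.  */
#define SKETCH_HWCAP_VFP  (1UL << 6)
#define SKETCH_HWCAP_NEON (1UL << 12)

enum memcpy_impl { IMPL_ARM, IMPL_VFP, IMPL_NEON };

/* Order as in the quoted code: NEON is tested first, so a NEON hit has
   to jump over the (assumed) VFP test that follows.  */
enum memcpy_impl select_neon_first(unsigned long hwcap)
{
    enum memcpy_impl impl = IMPL_ARM;       /* ldr   r1, .Lmemcpy_arm */
    if (hwcap & SKETCH_HWCAP_NEON) {
        impl = IMPL_NEON;                   /* ldrne r1, .Lmemcpy_neon */
        goto done;                          /* bne   1f */
    }
    if (hwcap & SKETCH_HWCAP_VFP)
        impl = IMPL_VFP;
done:
    return impl;
}

/* Swapped order: VFP is tested first and NEON last.  The later test
   simply overrides the earlier result, so no early exit is needed.  */
enum memcpy_impl select_vfp_first(unsigned long hwcap)
{
    enum memcpy_impl impl = IMPL_ARM;
    if (hwcap & SKETCH_HWCAP_VFP)
        impl = IMPL_VFP;
    if (hwcap & SKETCH_HWCAP_NEON)
        impl = IMPL_NEON;
    return impl;
}
```

Both functions pick the same implementation for every combination of the two bits; the second just reaches it with straight-line conditional assignments, which is what lets the assembly version drop the branch.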
>> +.Lreturn:
>
>
> Unused label?
Yes, thanks, will fix.
>> + ldr tmp1, [src, #-60] /* 15 words to go. */
>> + str tmp1, [dst, #-60]
>
>
> These negative offsets mean thumb2 doesn't work. That's fine, but it means
> that you need care for this in the _arm case.
>
> You have two choices: either do the swapping to arm mode by hand in the impl
> file, or force the entire memcpy.o to arm mode by using #define NO_THUMB at
> the top, before the #include <sysdep.h>.
It sounds like switching it all to arm mode is the best option. I'll do that.
--
Will Newton
Toolchain Working Group, Linaro