This is the mail archive of the
libc-ports@sources.redhat.com
mailing list for the libc-ports project.
Re: [PATCH] ARM: Add Cortex-A15 optimized NEON and VFP memcpy routines, with IFUNC.
- From: Will Newton <will dot newton at linaro dot org>
- To: Ondřej Bílka <neleai at seznam dot cz>
- Cc: Måns Rullgård <mans at mansr dot com>, libc-ports at sourceware dot org, Patch Tracking <patches at linaro dot org>
- Date: Thu, 18 Apr 2013 10:47:26 +0100
- Subject: Re: [PATCH] ARM: Add Cortex-A15 optimized NEON and VFP memcpy routines, with IFUNC.
- References: <516BCEE5 dot 9070809 at linaro dot org> <yw1x8v4k6rcc dot fsf at unicorn dot mansr dot com> <CANu=DmjJUZ319+7_M8cyxMga_rYxbGb_QSs87Q29JBdkKX_97g at mail dot gmail dot com> <20130418093900 dot GA3653 at domone dot kolej dot mff dot cuni dot cz>
On 18 April 2013 10:39, Ondřej Bílka <neleai@seznam.cz> wrote:
> On Mon, Apr 15, 2013 at 11:38:49AM +0100, Will Newton wrote:
>> On 15 April 2013 11:06, Måns Rullgård <mans@mansr.com> wrote:
>>
>> Hi Måns,
>>
>> >> Add a high performance memcpy routine optimized for Cortex-A15 with
>> >> variants for use in the presence of NEON and VFP hardware, selected
>> >> at runtime using indirect function support.
>> >
>> > How does this perform on Cortex-A9?
>>
>> The code is also faster on A9 although the gains are not quite as
>> pronounced. A set of numbers is attached (they linewrap pretty
>> horribly inline).
>>
>>
> I forgot to ask where to get the benchmark source. Without it there is no
> way to tell if it was done correctly.
> You must randomly vary sizes in range n..2n and also vary alignments.
The benchmark is taken from the cortex-strings package:
https://launchpad.net/cortex-strings
I wrote a wrapper around the benchmark to vary alignment in {1, 2, 4,
8} and a variety of block lengths between 8 and 200.
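
For illustration, a wrapper of this kind might look like the C sketch
below. It is not the cortex-strings benchmark, nor the wrapper used to
produce the attached numbers: the function names, the buffer layout,
the specific list of lengths and the use of clock_gettime() are all
assumptions made for the example.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

/* Hypothetical harness; every name here is illustrative only. */
enum { MAX_LEN = 200, BUF_SIZE = MAX_LEN + 32 };

/* Alignments from the message; the lengths are an assumed sample
   of block sizes between 8 and 200. */
static const size_t bench_alignments[] = { 1, 2, 4, 8 };
static const size_t bench_lengths[] = { 8, 16, 32, 64, 128, 200 };

static unsigned char src_buf[BUF_SIZE];
static unsigned char dst_buf[BUF_SIZE];

/* Return a pointer into buf that is aligned to `align` bytes but not
   to 2 * align: round up to a 16-byte boundary, then add `align`. */
static unsigned char *at_alignment(unsigned char *buf, size_t align)
{
    assert(align == 1 || align == 2 || align == 4 || align == 8);
    uintptr_t base = ((uintptr_t)buf + 15) & ~(uintptr_t)15;
    return (unsigned char *)(base + align);
}

/* Average time in nanoseconds for one memcpy of `len` bytes with the
   source and destination both at the requested alignment. */
static double time_memcpy_ns(size_t align, size_t len,
                             unsigned long iterations)
{
    /* The volatile function pointer keeps the compiler from
       eliding or inlining the calls being measured. */
    void *(*volatile copy)(void *, const void *, size_t) = memcpy;
    unsigned char *src = at_alignment(src_buf, align);
    unsigned char *dst = at_alignment(dst_buf, align);
    struct timespec start, end;

    clock_gettime(CLOCK_MONOTONIC, &start);
    for (unsigned long i = 0; i < iterations; i++)
        copy(dst, src, len);
    clock_gettime(CLOCK_MONOTONIC, &end);

    double elapsed = (double)(end.tv_sec - start.tv_sec) * 1e9
                   + (double)(end.tv_nsec - start.tv_nsec);
    return elapsed / (double)iterations;
}

/* Print one line per (alignment, length) combination. */
static void run_sweep(unsigned long iterations)
{
    size_t na = sizeof bench_alignments / sizeof bench_alignments[0];
    size_t nl = sizeof bench_lengths / sizeof bench_lengths[0];

    for (size_t a = 0; a < na; a++)
        for (size_t l = 0; l < nl; l++)
            printf("align %zu len %3zu: %.1f ns/call\n",
                   bench_alignments[a], bench_lengths[l],
                   time_memcpy_ns(bench_alignments[a],
                                  bench_lengths[l], iterations));
}
```

Calling run_sweep(1000000) from main() would print one timing per
combination. Fixed lengths are used here to match the description
above; drawing each length at random from n..2n, as suggested in the
quoted text, would be a small change to the inner loop.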
--
Will Newton
Toolchain Working Group, Linaro