This is the mail archive of the
libc-ports@sources.redhat.com
mailing list for the libc-ports project.
Re: [PATCH] ARM: Add Cortex-A15 optimized NEON and VFP memcpy routines, with IFUNC.
- From: Ondřej Bílka <neleai at seznam dot cz>
- To: Will Newton <will dot newton at linaro dot org>
- Cc: libc-ports at sourceware dot org, Patch Tracking <patches at linaro dot org>
- Date: Mon, 15 Apr 2013 15:38:29 +0200
- Subject: Re: [PATCH] ARM: Add Cortex-A15 optimized NEON and VFP memcpy routines, with IFUNC.
- References: <516BCEE5 dot 9070809 at linaro dot org> <CANu=DmhNPNDCy8mMw6q41+kA_WDMPRXWqq2kuzNOgfCB3wfQ6g at mail dot gmail dot com> <20130415102327 dot GA7032 at domone dot kolej dot mff dot cuni dot cz> <CANu=Dmig+mxXWNc_c7tJZr4wuhYdQvFdADrLp0eDhapeCNvuvw at mail dot gmail dot com>
On Mon, Apr 15, 2013 at 11:59:27AM +0100, Will Newton wrote:
> On 15 April 2013 11:23, Ondřej Bílka <neleai@seznam.cz> wrote:
> > On Mon, Apr 15, 2013 at 11:01:37AM +0100, Will Newton wrote:
> >> Attached are a set of benchmarks of the new code versus the existing
> >> memcpy implementation on a Cortex-A15 platform.
> >>
> >
> > As I wrote in the previous thread:
> >
> > On Thu, Apr 04, 2013 at 08:37:01AM +0200, Ondřej Bílka wrote:
> >> Also try benchmarking with real-world data (20MB). I put it at
> >> http://kam.mff.cuni.cz/~ondra/dryrun_memcpy.tar.bz2
> >>
> >> To add NEON, copy the test_generic.c file and add compilation of the
> >> NEON implementation to the benchmark script.
> >>
> >> It currently measures only total time.
> >> I would need something like a timestamp counter for more detailed
> >> results.
> >
> > How well does it fare on my benchmark?
>
> It wasn't clear to me how to integrate my code and run the tests - I
> built a version of replay.c with each memcpy implementation and the
> new one ran in 20% less time, but I don't know if I did that
> correctly.
>
Nice, this looks correct, since an improvement that big cannot happen by
chance.
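For reference, a total-time comparison of this kind can be sketched
roughly as below. This is not the replay.c from the tarball: the names
copy_fn and time_copy are hypothetical, and it assumes a POSIX system
with clock_gettime and a GCC or Clang compiler for the barrier.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L   /* for clock_gettime */
#endif
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Hypothetical type for any routine with the memcpy signature. */
typedef void *(*copy_fn)(void *, const void *, size_t);

/* Run `reps` copies of `n` bytes through `fn` and return the total
   elapsed wall-clock time in seconds. */
double time_copy(copy_fn fn, void *dst, const void *src, size_t n, int reps)
{
    struct timespec t0, t1;

    assert(reps > 0);
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < reps; i++) {
        fn(dst, src, n);
        /* GCC/Clang compiler barrier so the copies are not elided. */
        __asm__ volatile("" : : "r"(dst) : "memory");
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);

    return (double)(t1.tv_sec - t0.tv_sec)
         + (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
}
```

Calling time_copy once per implementation with the same buffers and
repeat count gives two totals to compare. It says nothing about
per-call latency, which is where a timestamp counter would help.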
Do you plan to improve memset in the same way? A first step would be to
take memcpy and replace the loads with a zero register.
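A rough sketch of that idea, in portable C rather than ARM assembly.
The names sketch_memcpy and sketch_bzero are hypothetical, and the
64-bit word loop only stands in for the real NEON/VFP loop:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical word-at-a-time copy: each iteration is a load followed
   by a store. The fixed-size memcpy calls are the portable C way to
   express unaligned 8-byte accesses. */
void *sketch_memcpy(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    while (n >= sizeof(uint64_t)) {
        uint64_t w;
        memcpy(&w, s, sizeof w);   /* load  */
        memcpy(d, &w, sizeof w);   /* store */
        s += sizeof w;
        d += sizeof w;
        n -= sizeof w;
    }
    while (n--)                    /* byte tail */
        *d++ = *s++;
    return dst;
}

/* Same loop structure, with the load replaced by a value that stays
   in a register and is always zero. */
void *sketch_bzero(void *dst, size_t n)
{
    unsigned char *d = dst;
    const uint64_t zero = 0;       /* the "zero register" */

    while (n >= sizeof zero) {
        memcpy(d, &zero, sizeof zero);   /* store only, no load */
        d += sizeof zero;
        n -= sizeof zero;
    }
    while (n--)                    /* byte tail */
        *d++ = 0;
    return dst;
}
```

A general memset would keep the fill byte broadcast across the
register instead of zero; the store loop would stay the same.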
Ondra