This is the mail archive of the libc-ports@sources.redhat.com mailing list for the libc-ports project.


Re: [PATCH] ARM: NEON detected memcpy.


On Wed, Apr 03, 2013 at 09:15:46AM +0100, Will Newton wrote:
> On 3 April 2013 08:58, Shih-Yuan Lee (FourDollars) <sylee@canonical.com> wrote:
> > Hi,
> >
> > I am working on a NEON-detecting memcpy.
> > This is based on what Siarhei Siamashka did in 2009 [1].
> >
> > The idea is to read HWCAP and check the NEON bit.
> > If the NEON bit is set, use the NEON-optimized memcpy.
> > If not, use the original memcpy instead.
> >
> > With the NEON-optimized memcpy, memcpy performance
> > improves by about 50% [2].
> >
> > What do you think about this idea? Any comments are welcome.
> 
> Hi,
> 
> I am working on a similar project within Linaro, which is to add the
> NEON/VFP capable memcpy from cortex-strings[1] to glibc. However I am
> looking at enabling it at runtime via indirect functions which makes
> it slightly more complex than just importing the cortex strings code,
> so I don't have any patches to show you just yet.
> 
> [1] https://launchpad.net/cortex-strings
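
For illustration, the HWCAP check described in the quoted proposal could be sketched roughly as below. This is only a sketch under assumptions, not the code from either patch: the names SKETCH_HWCAP_NEON, memcpy_generic, memcpy_neon, select_memcpy and read_hwcap are hypothetical, and both copy routines are plain C stand-ins for real implementations. It assumes 32-bit ARM Linux, where NEON is reported in bit 12 of AT_HWCAP, and glibc 2.16 or later for getauxval().

```c
#include <stddef.h>
#include <string.h>

#if defined(__linux__) && defined(__arm__)
#include <sys/auxv.h>   /* getauxval(), AT_HWCAP */
#endif

/* Assumption: NEON is bit 12 of AT_HWCAP on 32-bit ARM Linux. */
#define SKETCH_HWCAP_NEON (1UL << 12)

typedef void *(*memcpy_fn)(void *, const void *, size_t);

/* Stand-in for the original memcpy (hypothetical): a byte loop. */
static void *memcpy_generic(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    for (size_t i = 0; i < n; i++)
        d[i] = s[i];
    return dst;
}

/* Stand-in for a NEON-optimized memcpy (hypothetical): a real port
   would provide an assembly routine here. */
static void *memcpy_neon(void *dst, const void *src, size_t n)
{
    return memcpy(dst, src, n);
}

/* Choose an implementation from an AT_HWCAP value. */
static memcpy_fn select_memcpy(unsigned long hwcap)
{
    return (hwcap & SKETCH_HWCAP_NEON) ? memcpy_neon : memcpy_generic;
}

/* Read AT_HWCAP on ARM Linux; on other targets report no NEON. */
static unsigned long read_hwcap(void)
{
#if defined(__linux__) && defined(__arm__)
    return getauxval(AT_HWCAP);
#else
    return 0;
#endif
}
```

A caller would do the selection once, for example `memcpy_fn copy = select_memcpy(read_hwcap());`, and then call `copy(dst, src, n)`. With indirect functions, a resolver would make the same decision at symbol-binding time instead.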

Hi,

You need to optimize the header, because a typical copy is less than 128 bytes.

My measurement of how many 16-byte blocks are used is here:
http://kam.mff.cuni.cz/~ondra/benchmark_string/profile/result.html

If I had code to read the number of cycles from a perf counter, I could
provide a tool to see memcpy performance in an arbitrary binary.

On x64 I used overlapping loads/stores to minimize branches. Try the attached
memcpy and see how it works on small inputs.
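
As a rough illustration of the overlapping load/store idea (a sketch only, not the contents of the attached memcpy_generic.c), the hypothetical helper below copies 0 to 32 bytes. Each size class loads the first and the last k bytes of the range and stores both; when the length is not exactly 2k the two accesses overlap, so no per-byte tail loop is needed. The fixed-size memcpy calls are assumed to compile down to single unaligned loads and stores.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copy n bytes, 0 <= n <= 32.
   Both loads are done before either store. */
static void copy_upto32(unsigned char *dst, const unsigned char *src, size_t n)
{
    if (n >= 16) {
        /* First and last 16 bytes; they overlap when n < 32. */
        unsigned char head[16], tail[16];
        memcpy(head, src, 16);
        memcpy(tail, src + n - 16, 16);
        memcpy(dst, head, 16);
        memcpy(dst + n - 16, tail, 16);
    } else if (n >= 8) {
        /* First and last 8 bytes; they overlap when n < 16. */
        uint64_t head, tail;
        memcpy(&head, src, 8);
        memcpy(&tail, src + n - 8, 8);
        memcpy(dst, &head, 8);
        memcpy(dst + n - 8, &tail, 8);
    } else if (n >= 4) {
        uint32_t head, tail;
        memcpy(&head, src, 4);
        memcpy(&tail, src + n - 4, 4);
        memcpy(dst, &head, 4);
        memcpy(dst + n - 4, &tail, 4);
    } else if (n >= 2) {
        uint16_t head, tail;
        memcpy(&head, src, 2);
        memcpy(&tail, src + n - 2, 2);
        memcpy(dst, &head, 2);
        memcpy(dst + n - 2, &tail, 2);
    } else if (n == 1) {
        dst[0] = src[0];
    }
    /* n == 0: nothing to copy. */
}
```

For example, `copy_upto32(dst, src, 13)` takes the 8-byte branch and writes bytes 0..7 and 5..12, so lengths 8 through 15 all share one path with a single size comparison.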

Attachment: memcpy_generic.c
Description: Text document

