This is the mail archive of the libc-ports@sources.redhat.com mailing list for the libc-ports project.

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]
Other format:	[Raw text]

Re: [PATCH] Optimize MIPS memcpy

From: Steve Ellcey <sellcey at mips dot com>
To: Maxim Kuvyrkov <maxim at codesourcery dot com>
Cc: Andrew T Pinski <pinskia at gmail dot com>, "Joseph S. Myers" <joseph at codesourcery dot com>, <libc-ports at sourceware dot org>
Date: Mon, 8 Oct 2012 10:03:16 -0700
Subject: Re: [PATCH] Optimize MIPS memcpy
References: <5044746c.23eb440a.75e2.618f@mx.google.com> <1346771341.14333.20.camel@ubuntu-sellcey> <596797ED-6575-456D-98FD-C13A209DBC49@mentor.com> <1346948701.14333.152.camel@ubuntu-sellcey> <F265181B-6B5C-401B-B7CC-DC206B601795@codesourcery.com> <1347376645.14333.319.camel@ubuntu-sellcey> <F0856A02-839F-45C7-BF93-AA59E030A015@codesourcery.com> <1348166309.6170.55.camel@ubuntu-sellcey> <25105334-8813-4532-AC0E-B3A44BE69A19@codesourcery.com> <5B30D440-A918-4352-8DED-A7D681DF0338@codesourcery.com>

On Sat, 2012-10-06 at 17:43 +1300, Maxim Kuvyrkov wrote:

> Steve and I have debugged these failures and they now seem to be resolved.  I'll let Steve to followup with analysis and a new patch.
> 
> Meanwhile, I've benchmarked Steve's patch against mine.  On the benchmark that I use both implementations provide equal performance for N64 ABI, but on N32 ABI Steve's patch is only half as fast.  This is, probably, due to using 4-byte operations instead of 8-byte operations for N32 ABI:
> 
> #if _MIPS_SIM == _ABI64
> #define USE_DOUBLE
> #endif
> 
> It should be easy to improve Steve's patch for N32 ABI.  Steve, will you look into that?
> 
> I would also appreciate if you look into making your version of memcpy memmove-safe, if it is not already.
> 
> Thank you,
> 
> --
> Maxim Kuvyrkov
> CodeSourcery / Mentor Graphics

Maxim, do you know if your test is doing a memcpy on overlapping memory?
While our analysis showed that the problem was due to the use of the
'prepare to store' prefetch hint, the code I sent earlier should have
worked fine for any code that was not doing an overlapping memcpy.

For anyone who may be interested, the 'prepare for store' prefetch hint
is different then other 'safe' prefetches which can be executed or
ignored without affecting the results of the code being executed. 

Instead of bringing a chunk of memory into the cache, it simply
allocates a line of cache for use and zeros it out.  If you write to
every byte of that line of cache, you are OK.  But if you use the
'prepare to store' cache hint and do not write to the entire cache line
then the bytes you don't write to get written back to memory as zeros,
overwriting whatever was there before.  The code in my memcpy routine
accounts for this, by checking the length of the buffer before doing the
'prepare to store' prefetches and only using them when it knows that it
is going to write to the entire cache line.

The other issue though is if the source and destination of the memcpy
overlap and if you use the prepare to store prefetch on a memory address
that is also part of the source of the memcpy you will get incorrect
results.  That means that if we want to have memcpy be 'memmove-safe'
we cannot use the 'prepare to store' hint.

I will fix the code to use double loads and stores with the N32 ABI
and add comments about the 'prepare to store' hint.  I hate to give up
on using the 'prepare for store' prefetch hint, since it does result in
the best peformance,  but given the various issues maybe it is not the
best idea for glibc.

Steve Ellcey
sellcey@mips.com

Follow-Ups:
- Re: [PATCH] Optimize MIPS memcpy
  - From: Maxim Kuvyrkov

References:
- Re: [PATCH] Optimize MIPS memcpy
  - From: Maxim Kuvyrkov

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]