This is the mail archive of the libc-ports@sources.redhat.com mailing list for the libc-ports project.



Re: [PATCH] Optimize MIPS memcpy


On 9/10/2012, at 6:03 AM, Steve Ellcey wrote:

> On Sat, 2012-10-06 at 17:43 +1300, Maxim Kuvyrkov wrote:
> 
>> Steve and I have debugged these failures and they now seem to be resolved.  I'll let Steve follow up with an analysis and a new patch.
>> 
>> Meanwhile, I've benchmarked Steve's patch against mine.  On the benchmark that I use, both implementations provide equal performance for the N64 ABI, but on the N32 ABI Steve's patch is only half as fast.  This is probably due to using 4-byte operations instead of 8-byte operations for the N32 ABI:
>> 
>> #if _MIPS_SIM == _ABI64
>> #define USE_DOUBLE
>> #endif
>> 
>> It should be easy to improve Steve's patch for N32 ABI.  Steve, will you look into that?
>> 
>> I would also appreciate it if you looked into making your version of memcpy memmove-safe, if it is not already.
>> 
>> Thank you,
>> 
>> --
>> Maxim Kuvyrkov
>> CodeSourcery / Mentor Graphics
> 
> Maxim, do you know if your test is doing a memcpy on overlapping memory?
> While our analysis showed that the problem was due to the use of the
> 'prepare to store' prefetch hint, the code I sent earlier should have
> worked fine for any code that was not doing an overlapping memcpy.

The test does not use overlapping memcpy.

> 
> For anyone who may be interested, the 'prepare for store' prefetch hint
> is different from other 'safe' prefetches, which can be executed or
> ignored without affecting the results of the code being executed.
> 
> Instead of bringing a chunk of memory into the cache, it simply
> allocates a line of cache for use and zeros it out.  If you write to
> every byte of that line of cache, you are OK.  But if you use the
> 'prepare to store' cache hint and do not write to the entire cache line
> then the bytes you don't write to get written back to memory as zeros,
> overwriting whatever was there before.  The code in my memcpy routine
> accounts for this, by checking the length of the buffer before doing the
> 'prepare to store' prefetches and only using them when it knows that it
> is going to write to the entire cache line.

Could there be a bug in the logic that decides whether a prepare-for-store prefetch is safe?

I've checked the documentation for XLP (which is the target I'm using for testing) and it specifies a 32-byte prefetch.
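
To make the question concrete, here is a minimal C sketch of the kind of check being discussed. It assumes the 32-byte figure from the XLP documentation is the granularity that prepare-for-store zeroes. The names PFS_LINE_SIZE, pfs_first_line and pfs_line_count are hypothetical and are not taken from Steve's patch. The sketch counts only the lines that lie entirely inside the destination buffer, since those are the only ones where every byte is guaranteed to be overwritten.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: prepare-for-store works on 32-byte lines (XLP docs). */
#define PFS_LINE_SIZE 32u

/* First line-aligned address at or after dst (round up). */
uintptr_t pfs_first_line(uintptr_t dst)
{
    uintptr_t mask = (uintptr_t)PFS_LINE_SIZE - 1;

    /* The rounding trick below requires a power-of-two line size. */
    assert((PFS_LINE_SIZE & (PFS_LINE_SIZE - 1)) == 0);
    return (dst + mask) & ~mask;
}

/* Number of whole lines contained in [dst, dst + len).  Only these
   lines are candidates for a prepare-for-store prefetch; partial
   lines at either end must be left alone. */
size_t pfs_line_count(uintptr_t dst, size_t len)
{
    uintptr_t mask = (uintptr_t)PFS_LINE_SIZE - 1;
    uintptr_t lo = pfs_first_line(dst);   /* first fully covered line */
    uintptr_t hi = (dst + len) & ~mask;   /* end of last covered line */

    return hi > lo ? (size_t)((hi - lo) / PFS_LINE_SIZE) : 0;
}
```

If the real routine assumed a different line size than the hardware uses, a check like this would pass for lines that are in fact only partly written.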

> 
> The other issue, though, is overlap between the source and destination
> of the memcpy.  If you use the 'prepare to store' prefetch on a memory
> address that is also part of the source, you will get incorrect
> results.  That means that if we want memcpy to be 'memmove-safe',
> we cannot use the 'prepare to store' hint.

I don't think this is a concern.  Memmove will use memcpy only if the memory locations don't overlap.  And for the record, I'm testing without the memcpy-in-memmove patch.
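
As a rough illustration of that dispatch, the following C sketch hands the copy to memcpy only when the two regions are disjoint. It is not the glibc code or the memcpy-in-memmove patch; regions_overlap and sketch_memmove are hypothetical names, and the byte loops stand in for whatever the real overlapping path does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Nonzero if [dst, dst + n) and [src, src + n) share any byte. */
int regions_overlap(const void *dst, const void *src, size_t n)
{
    uintptr_t d = (uintptr_t)dst;
    uintptr_t s = (uintptr_t)src;

    return d < s ? (s - d < n) : (d - s < n);
}

/* Hypothetical memmove: memcpy is reached only for disjoint regions,
   so a memcpy that uses prepare-for-store never sees overlap here. */
void *sketch_memmove(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    size_t i;

    assert(n == 0 || (dst != NULL && src != NULL));

    if (!regions_overlap(dst, src, n))
        return memcpy(dst, src, n);

    if ((uintptr_t)d < (uintptr_t)s) {
        for (i = 0; i < n; i++)      /* dst below src: copy forward */
            d[i] = s[i];
    } else {
        for (i = n; i > 0; i--)      /* dst above src: copy backward */
            d[i - 1] = s[i - 1];
    }
    return dst;
}
```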

> 
> I will fix the code to use double loads and stores with the N32 ABI
> and add comments about the 'prepare to store' hint.  I hate to give up
> on using the 'prepare for store' prefetch hint, since it does result in
> the best performance, but given the various issues maybe it is not the
> best idea for glibc.

I too want to keep prepare-for-store prefetches if possible.  For debugging purposes you could amend the prepare-for-store prefetch macros to trigger a loop that would unconditionally clobber the memory locations that prepare-for-store is expected to zero out.  Or add some other assertions to help with debugging.
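
A C model of that debugging aid might look like the sketch below. The real change would live in the assembly prefetch macros, so this only shows the idea. The 32-byte line size is the XLP figure mentioned above, and debug_prepare_for_store and PFS_POISON are hypothetical names. It writes a poison byte rather than zero, which is my own choice so that a stray clobber is easy to tell apart from data that was legitimately zero.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PFS_LINE_SIZE 32u    /* assumption: 32-byte lines (XLP docs) */
#define PFS_POISON    0xA5u  /* assumption: any recognizable pattern */

/* Debug stand-in for a prepare-for-store prefetch: unconditionally
   clobber the whole line containing addr, so a line that the copy
   loop does not fully overwrite shows up on every run instead of
   only when the hardware happens to honor the hint. */
void debug_prepare_for_store(void *addr)
{
    uintptr_t mask = (uintptr_t)PFS_LINE_SIZE - 1;
    unsigned char *line;
    size_t i;

    assert(addr != NULL);

    line = (unsigned char *)((uintptr_t)addr & ~mask);  /* line start */
    for (i = 0; i < PFS_LINE_SIZE; i++)
        line[i] = (unsigned char)PFS_POISON;
}
```

After a memcpy built this way, any poison byte left outside the bytes the caller asked to copy points at a prefetch that was issued for a partially written line.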

Thanks,

--
Maxim Kuvyrkov
CodeSourcery / Mentor Graphics


