This is the mail archive of the libc-ports@sources.redhat.com mailing list for the libc-ports project.

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]
Other format:	[Raw text]

Re: [PATCH] Optimize MIPS memcpy

From: Maxim Kuvyrkov <maxim at codesourcery dot com>
To: Steve Ellcey <sellcey at mips dot com>
Cc: Andrew T Pinski <pinskia at gmail dot com>, "Joseph S. Myers" <joseph at codesourcery dot com>, <libc-ports at sourceware dot org>
Date: Tue, 9 Oct 2012 11:30:47 +1300
Subject: Re: [PATCH] Optimize MIPS memcpy
References: <5044746c.23eb440a.75e2.618f@mx.google.com> <1346771341.14333.20.camel@ubuntu-sellcey> <596797ED-6575-456D-98FD-C13A209DBC49@mentor.com> <1346948701.14333.152.camel@ubuntu-sellcey> <F265181B-6B5C-401B-B7CC-DC206B601795@codesourcery.com> <1347376645.14333.319.camel@ubuntu-sellcey> <F0856A02-839F-45C7-BF93-AA59E030A015@codesourcery.com> <1348166309.6170.55.camel@ubuntu-sellcey> <25105334-8813-4532-AC0E-B3A44BE69A19@codesourcery.com> <5B30D440-A918-4352-8DED-A7D681DF0338@codesourcery.com> <1349715796.30194.131.camel@ubuntu-sellcey>

On 9/10/2012, at 6:03 AM, Steve Ellcey wrote:

> On Sat, 2012-10-06 at 17:43 +1300, Maxim Kuvyrkov wrote:
> 
>> Steve and I have debugged these failures and they now seem to be resolved.  I'll let Steve to followup with analysis and a new patch.
>> 
>> Meanwhile, I've benchmarked Steve's patch against mine.  On the benchmark that I use both implementations provide equal performance for N64 ABI, but on N32 ABI Steve's patch is only half as fast.  This is, probably, due to using 4-byte operations instead of 8-byte operations for N32 ABI:
>> 
>> #if _MIPS_SIM == _ABI64
>> #define USE_DOUBLE
>> #endif
>> 
>> It should be easy to improve Steve's patch for N32 ABI.  Steve, will you look into that?
>> 
>> I would also appreciate if you look into making your version of memcpy memmove-safe, if it is not already.
>> 
>> Thank you,
>> 
>> --
>> Maxim Kuvyrkov
>> CodeSourcery / Mentor Graphics
> 
> Maxim, do you know if your test is doing a memcpy on overlapping memory?
> While our analysis showed that the problem was due to the use of the
> 'prepare to store' prefetch hint, the code I sent earlier should have
> worked fine for any code that was not doing an overlapping memcpy.

The test does not use overlapping memcpy.

> 
> For anyone who may be interested, the 'prepare for store' prefetch hint
> is different then other 'safe' prefetches which can be executed or
> ignored without affecting the results of the code being executed. 
> 
> Instead of bringing a chunk of memory into the cache, it simply
> allocates a line of cache for use and zeros it out.  If you write to
> every byte of that line of cache, you are OK.  But if you use the
> 'prepare to store' cache hint and do not write to the entire cache line
> then the bytes you don't write to get written back to memory as zeros,
> overwriting whatever was there before.  The code in my memcpy routine
> accounts for this, by checking the length of the buffer before doing the
> 'prepare to store' prefetches and only using them when it knows that it
> is going to write to the entire cache line.

Can there be a bug in logic that decides that a prepare-for-store prefetch is safe?

I've checked documentation for XLP (which is the target I'm using for testing) and it specifies 32-byte prefetch.

> 
> The other issue though is if the source and destination of the memcpy
> overlap and if you use the prepare to store prefetch on a memory address
> that is also part of the source of the memcpy you will get incorrect
> results.  That means that if we want to have memcpy be 'memmove-safe'
> we cannot use the 'prepare to store' hint.

I don't think this is a concern.  Memmove will use memcpy only if the memory locations don't overlap.  And for the record's sake, I'm testing without the memcpy-in-memmove patch.

> 
> I will fix the code to use double loads and stores with the N32 ABI
> and add comments about the 'prepare to store' hint.  I hate to give up
> on using the 'prepare for store' prefetch hint, since it does result in
> the best peformance,  but given the various issues maybe it is not the
> best idea for glibc.

I too want to keep prepare-for-store prefetches is possible.  For debugging purposes you could amend prepare-for-store prefetch macros to trigger a loop that would unconditionally clobber memory locations that prepare-for-store is expected to zero-out.  Or add some other assertions to help out with debugging.

Thanks,

--
Maxim Kuvyrkov
CodeSourcery / Mentor Graphics

Follow-Ups:
- Re: [PATCH] Optimize MIPS memcpy
  - From: Steve Ellcey
- Re: [PATCH] Optimize MIPS memcpy
  - From: Steve Ellcey

References:
- Re: [PATCH] Optimize MIPS memcpy
  - From: Maxim Kuvyrkov
- Re: [PATCH] Optimize MIPS memcpy
  - From: Steve Ellcey

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]