This is the mail archive of the libc-ports@sources.redhat.com mailing list for the libc-ports project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH] Optimize MIPS memcpy


Forgot to CC libc-ports@ .
On Sat, 2012-09-01 at 18:15 +1200, Maxim Kuvyrkov wrote:
> This patch improves MIPS assembly implementations of memcpy.  Two optimizations are added: prefetching of data for subsequent iterations of memcpy loop and pipelined expansion of unaligned memcpy.  These optimizations speed up MIPS memcpy by about 10%.
> 
> The prefetching part is straightforward: it adds prefetching of a cache line (32 bytes) for +1 iteration for unaligned case and +2 iteration for aligned case.  The rationale here is that it will take prefetch to acquire data about same time as 1 iteration of unaligned loop or 2 iterations of aligned loop.  Values for these parameters were tuned on a modern MIPS processor.
> 

This might hurt Octeon as the cache line size there is 128 bytes.  Can
you say which modern MIPS processor which this has been tuned with?  And
is there a way to not hard code 32 in the assembly but in a macro
instead.

Thanks,
Andrew Pinski


> The pipelined expansion of unaligned loop is implemented in a similar fashion as expansion of the aligned loop.  The assembly is tricky, but it works.
> 
> These changes are almost 3 years old, and have been thoroughly tested in CodeSourcery MIPS toolchains.  Retested with current trunk with no regressions for n32, n64 and o32 ABIs.
> 
> OK to apply?
> 
> --
> Maxim Kuvyrkov
> Mentor Graphics
> 
> 




Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]