This is the mail archive of the
libc-ports@sources.redhat.com
mailing list for the libc-ports project.
Re: [PATCH] Optimize MIPS memcpy
- From: Andrew T Pinski <pinskia at gmail dot com>
- To: Maxim Kuvyrkov <maxim_kuvyrkov at mentor dot com>
- Cc: "Joseph S. Myers" <joseph at codesourcery dot com>, libc-ports at sourceware dot org
- Date: Mon, 03 Sep 2012 02:12:10 -0700
- Subject: Re: [PATCH] Optimize MIPS memcpy
- References: <C54042E9-C90C-46D1-A382-A0895AC9EFE3@mentor.com>
Forgot to CC libc-ports@.
On Sat, 2012-09-01 at 18:15 +1200, Maxim Kuvyrkov wrote:
> This patch improves the MIPS assembly implementations of memcpy. Two optimizations are added: prefetching of data for subsequent iterations of the memcpy loop, and pipelined expansion of the unaligned memcpy loop. These optimizations speed up MIPS memcpy by about 10%.
>
> The prefetching part is straightforward: it prefetches one cache line (32 bytes), one iteration ahead in the unaligned case and two iterations ahead in the aligned case. The rationale is that a prefetch takes about as long to acquire its data as one iteration of the unaligned loop or two iterations of the aligned loop. Values for these parameters were tuned on a modern MIPS processor.
>
This might hurt Octeon, as the cache line size there is 128 bytes. Can
you say which modern MIPS processor this was tuned on? And is there a
way to avoid hard-coding 32 in the assembly and use a macro instead?
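
To illustrate the macro idea, here is a minimal sketch in C rather than
in the MIPS assembly itself. It is not taken from the patch: the names
CACHE_LINE_SIZE, PREFETCH_LINES_AHEAD and copy_with_prefetch are made up
for illustration, the default values only mirror the numbers quoted
above, and __builtin_prefetch assumes GCC or a compatible compiler.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumed tunables (hypothetical names, not from the patch).
   Override at build time, e.g. -DCACHE_LINE_SIZE=128 for Octeon. */
#ifndef CACHE_LINE_SIZE
#define CACHE_LINE_SIZE 32
#endif
#ifndef PREFETCH_LINES_AHEAD
#define PREFETCH_LINES_AHEAD 2
#endif

#define PREFETCH_DISTANCE (PREFETCH_LINES_AHEAD * CACHE_LINE_SIZE)

/* Hypothetical helper: copy n bytes one cache line per iteration,
   issuing a read prefetch PREFETCH_LINES_AHEAD lines in front of the
   current source position. */
void *copy_with_prefetch(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    assert(n == 0 || (d != NULL && s != NULL));

    while (n >= CACHE_LINE_SIZE) {
        /* Only prefetch while the target address is still inside the
           source buffer; the prefetch is a hint and copies no data. */
        if (n > PREFETCH_DISTANCE)
            __builtin_prefetch(s + PREFETCH_DISTANCE, 0);

        memcpy(d, s, CACHE_LINE_SIZE);
        d += CACHE_LINE_SIZE;
        s += CACHE_LINE_SIZE;
        n -= CACHE_LINE_SIZE;
    }

    /* Copy the tail that is shorter than one cache line. */
    memcpy(d, s, n);
    return dst;
}
```

With the line size and the prefetch distance as separate macros, a port
for a core with 128-byte lines would only need to change the
definitions, not the loop body.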
Thanks,
Andrew Pinski
> The pipelined expansion of the unaligned loop is implemented in a fashion similar to the expansion of the aligned loop. The assembly is tricky, but it works.
>
> These changes are almost 3 years old, and have been thoroughly tested in CodeSourcery MIPS toolchains. Retested with current trunk with no regressions for n32, n64 and o32 ABIs.
>
> OK to apply?
>
> --
> Maxim Kuvyrkov
> Mentor Graphics
>
>