This is the mail archive of the newlib@sourceware.org mailing list for the newlib project.
Re: [PATCH][AArch64] Adjust writeback in non-zero memset
- From: "Richard Earnshaw (lists)" <Richard dot Earnshaw at arm dot com>
- To: Wilco Dijkstra <Wilco dot Dijkstra at arm dot com>, "newlib at sourceware dot org" <newlib at sourceware dot org>
- Cc: nd <nd at arm dot com>
- Date: Tue, 6 Nov 2018 15:01:35 +0000
- Subject: Re: [PATCH][AArch64] Adjust writeback in non-zero memset
- References: <DB5PR08MB1030B6D34743A65E1BF4FEBC83CB0@DB5PR08MB1030.eurprd08.prod.outlook.com>
On 06/11/2018 14:42, Wilco Dijkstra wrote:
> This fixes an inefficiency in the non-zero memset. Delaying the writeback
> until the end of the loop is slightly faster on some cores - this shows
> ~5% performance gain on Cortex-A53 when doing large non-zero memsets.
>
> Tested against the GLIBC testsuite.
Thanks, pushed.
R.
>
> ---
>
> diff --git a/newlib/libc/machine/aarch64/memset.S b/newlib/libc/machine/aarch64/memset.S
> index 799e7b7874a397138c5c85cfa2adb85f63c94cef..7c8fe583bf88722d73b90ec470c72b509e5be137 100644
> --- a/newlib/libc/machine/aarch64/memset.S
> +++ b/newlib/libc/machine/aarch64/memset.S
> @@ -142,10 +142,10 @@ L(set_long):
> b.eq L(try_zva)
> L(no_zva):
> sub count, dstend, dst /* Count is 16 too large. */
> - add dst, dst, 16
> + sub dst, dst, 16 /* Dst is biased by -32. */
> sub count, count, 64 + 16 /* Adjust count and bias for loop. */
> -1: stp q0, q0, [dst], 64
> - stp q0, q0, [dst, -32]
> +1: stp q0, q0, [dst, 32]
> + stp q0, q0, [dst, 64]!
> L(tail64):
> subs count, count, 64
> b.hi 1b
>
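[Editorial note: the patch above moves the pointer update ("writeback") from the first store of the loop body (`stp q0, q0, [dst], 64`) to the last (`stp q0, q0, [dst, 64]!`), re-biasing `dst` by -32 to keep the offsets valid. The sketch below is a rough C analogue of the patched loop shape, for illustration only: the function name is invented, the byte-wise `memset` calls stand in for the 128-bit `stp q0, q0` stores, and `count` is assumed to be a non-zero multiple of 64, which the real assembly does not require.]

```c
#include <stddef.h>
#include <string.h>

/* Illustrative C rendition of the patched L(no_zva) loop structure.
 * Each iteration fills 64 bytes at offsets 32..95 from the biased
 * pointer; the pointer advance is conceptually fused with the final
 * store, as the pre-index writeback form does in the assembly. */
static void set64_biased(unsigned char *dst, unsigned char c, size_t count)
{
    dst -= 32;                    /* bias dst by -32, as in the patch   */
    do {
        memset(dst + 32, c, 32);  /* stp q0, q0, [dst, 32]              */
        memset(dst + 64, c, 32);  /* stp q0, q0, [dst, 64]!             */
        dst += 64;                /* writeback at the end of the body   */
        count -= 64;              /* subs count, count, 64; b.hi 1b     */
    } while (count > 0);
}
```

In the pre-patch code the writeback happened on the first `stp`, so the second store's address depended on the just-updated register; deferring the update to the last store shortens that dependency on cores such as Cortex-A53, which is where the quoted ~5% gain comes from.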