This is the mail archive of the newlib@sourceware.org mailing list for the newlib project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH][AArch64] Adjust writeback in non-zero memset


On 06/11/2018 14:42, Wilco Dijkstra wrote:
> This fixes an inefficiency in the non-zero memset.  Delaying the writeback
> until the end of the loop is slightly faster on some cores - this shows
> ~5% performance gain on Cortex-A53 when doing large non-zero memsets.
> 
> Tested against the GLIBC testsuite.

Thanks, pushed.

R.

> 
> ---
> 
> diff --git a/newlib/libc/machine/aarch64/memset.S b/newlib/libc/machine/aarch64/memset.S
> index 799e7b7874a397138c5c85cfa2adb85f63c94cef..7c8fe583bf88722d73b90ec470c72b509e5be137 100644
> --- a/newlib/libc/machine/aarch64/memset.S
> +++ b/newlib/libc/machine/aarch64/memset.S
> @@ -142,10 +142,10 @@ L(set_long):
>  	b.eq	L(try_zva)
>  L(no_zva):
>  	sub	count, dstend, dst	/* Count is 16 too large.  */
> -	add	dst, dst, 16
> +	sub	dst, dst, 16		/* Dst is biased by -32.  */
>  	sub	count, count, 64 + 16	/* Adjust count and bias for loop.  */
> -1:	stp	q0, q0, [dst], 64
> -	stp	q0, q0, [dst, -32]
> +1:	stp	q0, q0, [dst, 32]
> +	stp	q0, q0, [dst, 64]!
>  L(tail64):
>  	subs	count, count, 64
>  	b.hi	1b
> 
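For readers less familiar with AArch64 addressing modes: the patch moves the pointer update (writeback) from the first store of the loop body to the last, biasing dst by -32 so both stores can use positive offsets. A rough C analogue of the two loop shapes, under the simplifying assumptions of an aligned buffer and a whole number of 64-byte chunks (function names and the 0xAB fill value are illustrative, not from the patch):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Old shape: the FIRST store advances dst (stp q0, q0, [dst], 64),
   so the second store needs a negative offset ([dst, -32]).  */
static void fill_early_writeback(unsigned char *dst, size_t chunks)
{
    while (chunks--) {
        memset(dst, 0xAB, 32);       /* stp q0, q0, [dst], 64  */
        dst += 64;                   /* writeback at loop top   */
        memset(dst - 32, 0xAB, 32);  /* stp q0, q0, [dst, -32] */
    }
}

/* New shape: dst is biased by -32 up front; both stores use positive
   offsets and the SECOND store performs the writeback ([dst, 64]!).  */
static void fill_late_writeback(unsigned char *dst, size_t chunks)
{
    dst -= 32;                       /* sub dst, dst, 16 (plus the
                                        pre-existing +16 bias)     */
    while (chunks--) {
        memset(dst + 32, 0xAB, 32);  /* stp q0, q0, [dst, 32]  */
        memset(dst + 64, 0xAB, 32);  /* stp q0, q0, [dst, 64]! */
        dst += 64;                   /* writeback at loop bottom */
    }
}
```

Both variants fill exactly the same bytes; only the placement of the address update differs, which is what gives the ~5% gain on cores where the early writeback stalls the following store.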
