This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.
Re: PowerPC: memset optimization for POWER8/PPC64
- From: Richard Henderson <rth at twiddle dot net>
- To: Adhemerval Zanella <azanella at linux dot vnet dot ibm dot com>, "GNU C. Library" <libc-alpha at sourceware dot org>
- Date: Fri, 18 Jul 2014 09:20:34 -0700
- Subject: Re: PowerPC: memset optimization for POWER8/PPC64
- References: <53C920CD dot 8030506 at linux dot vnet dot ibm dot com>
On 07/18/2014 06:27 AM, Adhemerval Zanella wrote:
> + andi. r11,r10,r15 /* Check alignment of DST. */
s/r15/15/
I had to read that line several times before I noticed the I in ANDI, and that
this wasn't in fact a read of the uninitialized r15. (Stupid ppc
non-enforcement of register vs. integer operand syntax...)
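For readers less used to ppc syntax, here is a rough C equivalent of what the corrected `andi. r11,r10,15` computes, assuming (as the comment in the patch says) that r10 holds DST. The immediate 15 masks the low four address bits; the record form `andi.` also sets CR0, so a conditional branch can test the result directly. The helper names `misalign16` and `bytes_to_align16` are hypothetical and only illustrate the arithmetic.

```c
#include <stdint.h>

/* Rough C equivalent of "andi. r11,r10,15": keep the low four bits of
   the destination address.  Zero means DST is already 16-byte aligned. */
static inline unsigned misalign16(uintptr_t dst)
{
    return (unsigned)(dst & 15);
}

/* Hypothetical helper: number of bytes needed to advance DST to the
   next 16-byte boundary (0 if it is already aligned).  Unsigned
   negation wraps, so -dst & 15 is well defined. */
static inline unsigned bytes_to_align16(uintptr_t dst)
{
    return (unsigned)(-dst & 15);
}
```

For example, an address ending in 0x...3 has a misalignment of 3 and needs 13 more bytes to reach the next 16-byte boundary.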
> + mtocrf 0x01,r0
> + clrldi r0,r0,60
> +
> + /* Get DST aligned to 16 bytes. */
> +1: bf 31,2f
> + stb r4,0(r10)
> + addi r10,r10,1
> +
> +2: bf 30,4f
> + sth r4,0(r10)
> + addi r10,r10,2
> +
> +4: bf 29,8f
> + stw r4,0(r10)
> + addi r10,r10,4
> +
> +8: bf 28,16f
> + std r4,0(r10)
> + addi r10,r10,8
> +
> +16: subf r5,r0,r5
As clever as this is, surely it is less efficient than using the unaligned
store hardware. You know that there are at least 32 bytes to be written; you
could just do two unaligned std instructions to cover the first 16 bytes and
then realign DST.
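To make the comparison concrete, here is a C sketch of both head strategies; it is one reading of the suggestion, not code from the patch. `head_cascade` mirrors the quoted bf 31/30/29/28 sequence: one naturally aligned 1, 2, 4 or 8 byte store per set bit of the pad count. `head_unaligned` assumes len >= 32 and cheap unaligned doubleword stores: it writes the first 16 bytes with two possibly unaligned 8-byte stores, then rounds DST up to the next 16-byte boundary. Up to 16 bytes may be written twice, which is harmless for memset since every byte gets the same value. Both function names and the `pattern8` parameter (fill byte replicated into all eight bytes) are hypothetical; `memcpy` stands in for the individual stores.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Patch-style head: test each bit of the pad count, smallest first,
   so every store is naturally aligned and DST ends 16-byte aligned. */
static unsigned char *head_cascade(unsigned char *dst, size_t *len,
                                   uint64_t pattern8)
{
    size_t pad = (size_t)(-(uintptr_t)dst & 15);
    if (pad & 1) { memcpy(dst, &pattern8, 1); dst += 1; }  /* stb */
    if (pad & 2) { memcpy(dst, &pattern8, 2); dst += 2; }  /* sth */
    if (pad & 4) { memcpy(dst, &pattern8, 4); dst += 4; }  /* stw */
    if (pad & 8) { memcpy(dst, &pattern8, 8); dst += 8; }  /* std */
    *len -= pad;
    return dst;
}

/* Suggested head (sketch, assumes *len >= 32): two unaligned std, then
   round DST up.  (dst + 16) & ~15 always lands 1..16 bytes past DST,
   i.e. inside the 16 bytes that were just written. */
static unsigned char *head_unaligned(unsigned char *dst, size_t *len,
                                     uint64_t pattern8)
{
    memcpy(dst, &pattern8, 8);       /* unaligned std */
    memcpy(dst + 8, &pattern8, 8);   /* unaligned std */
    uintptr_t next = ((uintptr_t)dst + 16) & ~(uintptr_t)15;
    size_t advance = (size_t)(next - (uintptr_t)dst);
    *len -= advance;
    return dst + advance;
}
```

The branch-free version trades a few redundant byte writes for removing four dependent conditional branches, which is the point of the review comment.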
> + /* Write remaining 1~31 bytes. */
> + .align 4
> +L(tail_bytes):
> + beqlr cr6
> +
> + srdi r7,r11,4
> + clrldi r8,r11,60
> + mtocrf 0x01,r7
Likewise.
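One way to apply the same idea to the 1~31 byte tail, sketched in C under the same assumption that unaligned stores are cheap: instead of one store per set bit of the remaining count, pick the largest store size that fits and issue one store anchored at the start of the region and one anchored at its end. The two stores overlap whenever the count is not a power of two, which again is harmless for memset. `tail_overlap` and `pattern8` are hypothetical names, and this is an illustration of the technique rather than the patch's code.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical tail routine for the last n bytes (0 <= n <= 31).
   Each branch covers [0, n) with one store from each end; the second
   store starts at or before the end of the first, so no gap remains. */
static void tail_overlap(unsigned char *dst, size_t n, uint64_t pattern8)
{
    unsigned char *end = dst + n;
    if (n >= 16) {            /* 16..31: 16 bytes from each end */
        memcpy(dst, &pattern8, 8);
        memcpy(dst + 8, &pattern8, 8);
        memcpy(end - 16, &pattern8, 8);
        memcpy(end - 8, &pattern8, 8);
    } else if (n >= 8) {      /* 8..15: one doubleword from each end */
        memcpy(dst, &pattern8, 8);
        memcpy(end - 8, &pattern8, 8);
    } else if (n >= 4) {      /* 4..7: one word from each end */
        memcpy(dst, &pattern8, 4);
        memcpy(end - 4, &pattern8, 4);
    } else if (n >= 2) {      /* 2..3: one halfword from each end */
        memcpy(dst, &pattern8, 2);
        memcpy(end - 2, &pattern8, 2);
    } else if (n == 1) {      /* single byte */
        *dst = (unsigned char)pattern8;
    }
}
```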
r~