This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.


Re: PowerPC: memset optimization for POWER8/PPC64


On 07/18/2014 06:27 AM, Adhemerval Zanella wrote:
> +	andi.	r11,r10,r15	/* Check alignment of DST.  */

s/r15/15/

I had to read that line several times before I noticed the I in ANDI and
realized that this wasn't in fact a read of the uninitialized r15.  (Stupid
ppc non-enforcement of register vs integer syntax...)

> +	mtocrf	0x01,r0
> +	clrldi	r0,r0,60
> +
> +	/* Get DST aligned to 16 bytes.  */
> +1:	bf	31,2f
> +	stb	r4,0(r10)
> +	addi	r10,r10,1
> +
> +2:	bf	30,4f
> +	sth	r4,0(r10)
> +	addi	r10,r10,2
> +
> +4:	bf	29,8f
> +	stw	r4,0(r10)
> +	addi	r10,r10,4
> +
> +8:	bf      28,16f
> +	std     r4,0(r10)
> +	addi    r10,r10,8
> +
> +16:	subf	r5,r0,r5

As clever as this is, surely it is less efficient than using the unaligned
store hardware.  You know that there are at least 32 bytes to be written; you
could just do two unaligned std and then realign.
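
To illustrate the suggestion, here is a minimal C sketch, not POWER8 assembly and
not the patch's code.  The helper name fill_head_unaligned and its signature are
hypothetical.  It assumes the caller has already checked that at least 32 bytes
are to be written and that the target handles unaligned 8-byte stores in
hardware (the fixed-size memcpy calls are expected to compile to plain stores).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: fill the head of a buffer of n >= 32 bytes with two
   possibly unaligned 8-byte stores, then return the first 16-byte-aligned
   address past DST.  *remaining receives the byte count still to be set.  */
static unsigned char *
fill_head_unaligned (unsigned char *dst, int c, size_t n, size_t *remaining)
{
  /* Replicate the fill byte into all eight bytes of a doubleword.  */
  uint64_t pattern = 0x0101010101010101ULL * (unsigned char) c;

  /* Two 8-byte stores cover dst[0..15] regardless of alignment.  */
  memcpy (dst, &pattern, 8);
  memcpy (dst + 8, &pattern, 8);

  /* Distance to the next 16-byte boundary, in the range 1..16; every byte
     skipped over has already been written by the stores above.  */
  size_t skip = 16 - ((uintptr_t) dst & 15);

  *remaining = n - skip;
  return dst + skip;
}
```

The four conditional stb/sth/stw/std steps become two unconditional stores plus
one pointer adjustment; whether that is actually faster on POWER8 would need
measuring.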

> +	/* Write remaining 1~31 bytes.  */
> +	.align  4
> +L(tail_bytes):
> +	beqlr   cr6
> +
> +	srdi    r7,r11,4
> +	clrldi  r8,r11,60
> +	mtocrf  0x01,r7

Likewise.
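
One possible reading of "likewise" for the tail, again as a hedged C sketch
rather than POWER8 assembly: finish with stores that end exactly at the last
byte and are allowed to overlap bytes already written.  The helper name
fill_tail_overlapping is hypothetical.  It assumes the total length was at
least 32, so the 16 bytes before the end always lie inside the buffer, and that
unaligned 8-byte stores are handled in hardware.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: P points at the first unwritten byte, TAIL (0..31) is
   the number of bytes left.  Assumes at least 16 bytes before P + TAIL belong
   to the buffer, which holds when the whole request was >= 32 bytes.  */
static void
fill_tail_overlapping (unsigned char *p, int c, size_t tail)
{
  uint64_t pattern = 0x0101010101010101ULL * (unsigned char) c;
  unsigned char *end = p + tail;

  if (tail == 0)
    return;

  if (tail >= 16)
    {
      /* One full 16-byte chunk at the current position.  */
      memcpy (p, &pattern, 8);
      memcpy (p + 8, &pattern, 8);
    }

  /* Two stores ending exactly at END.  They may rewrite bytes that are
     already set, which is harmless because the value is the same.  */
  memcpy (end - 16, &pattern, 8);
  memcpy (end - 8, &pattern, 8);
}
```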


r~

