This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH 27/27] S390: Optimize memrchr.


On Fri, Jun 26, 2015 at 01:51:52PM +0200, Stefan Liebler wrote:
> This patch provides optimized version of memrchr with the z13 vector
> instructions.
>> +
> +.Lfound_g16:
> +	vlgvb	%r1,%v17,7	/* Index of c or 16 if not found.  */
> +	lgr	%r2,%r5
> +	lghi	%r4,15		/* Highest index of a full vr.  */
> +	clr	%r1,%r4
> +.Lfound_lt16:
> +	la	%r2,0(%r1,%r2)	/* Store current pointer to found c.  */
> +	ber	%r14		/* Return if found c is last loaded byte.  */
> +
> +	/* Shift vector elements left and start search again.  */
> +	aghi	%r1,1		/* Start search after current index.  */
> +	slr	%r4,%r1		/* New highest index.  */
> +	sll	%r1,3		/* Calculate byte count for vector shift
> +				   left.  */
> +	vlvgg	%v17,%r1,0
> +	vslb	%v16,%v16,%v17	/* Vector shift left by byte by number of bytes
> +				   specified in bits 1-4 of byte 7 in v17.  */
> +	vfeeb	%v17,%v16,%v18	/* Find c.  */
> +	la	%r5,1(%r2)	/* Save start-address of shifted v16.  */
> +	vlgvb	%r1,%v17,7	/* Index of c or 16 if not found  */
> +	clr	%r1,%r4
> +	locgrle	%r2,%r5		/* Use stored address as base if c found.  */
> +	jle	.Lfound_lt16	/* Found c within loaded bytes.  */
> +	br	%r14		/* No further c found, return last stored c.  */
> +
> +.Lfound_end:
> +	la	%r2,0(%r4,%r2)	/* Return pointer to found c  */
> +	br	%r14
> +
This sequence looks slow. Both here and in strrchr you should use
something better than loop. One possibility is to use shuffle to do
16-byte byteswap but it needs load constant.

A second way is to use same trick as in calculating clz with ctz. If
instruction produces nonzero for match an 0 for nonmatch then to find
highest nonzero byte you first need or all lower bytes with it for
example by

x |= x >> 8;
x |= x >> 4;
x |= x >> 2;
x |= x >> 1;

Then you look for index of first zero byte which is one more than
highest nonzero byte. When roles of 1 and 0 are reversed and all nonzero
bytes have bit in common you could use &.



For strrchr idea is same. When handling trailing zero you could make
similar trick with zero mask 
x |= x << 8;
x |= x << 4;
x |= x << 2;
x |= x << 1;
x = ~x;
to construct vector that is 255 before and at first zero byte and 0 after
that. Its important that first zero will get 255 to handle strrchr(x,0)
without separate check if c is zero.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]