This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: [PATCH 15/27] S390: Optimize strcmp and wcscmp.


On Fri, Jun 26, 2015 at 01:51:40PM +0200, Stefan Liebler wrote:
> This patch provides optimized versions of strcmp and wcscmp with the z13
> vector instructions.
> 
> The architecture-specific string.h had a typo, which leads to omitting the
> inline version in this file if __USE_STRING_INLINES is defined.
> Tested this inline version by tweaking test-strcmp.c.
> 
> +	lghi	%r5,0		/* current_len = 0.  */
> +
> +.Lloop:
> +	vlbb	%v16,0(%r5,%r2),6 /* Load s1 to block boundary.  */
> +	vlbb	%v17,0(%r5,%r3),6 /* Load s2 to block boundary.  */
> +	lcbb	%r1,0(%r5,%r2),6 /* Get loaded byte count of s1.  */
> +	jo	.Llt16_1	/* Jump away if vr is not fully loaded.  */
> +	lcbb	%r4,0(%r5,%r3),6
> +	jo	.Llt16_2	/* Jump away if vr is not fully loaded.  */
> +	/* Both vrs are fully loaded.  */
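For readers unfamiliar with the z13 instructions, here is a rough scalar C model of what the quoted loop appears to do. It assumes that vlbb/lcbb with boundary code 6 load (and count) at most 16 bytes without crossing a 4 KiB block boundary. `loaded_bytes` and `model_strcmp` are hypothetical names. The byte loop stands in for the vector compare, and where the real code jumps to separate paths (.Llt16_1/.Llt16_2) when a vr is not fully loaded, the model simply uses the shorter count. It illustrates the semantics only, not the performance.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model: each step loads up to 16 bytes from both strings
   without crossing a 4 KiB block boundary (assumed meaning of vlbb/lcbb
   with boundary code 6). */
#define VEC_BYTES 16u
#define BLOCK     4096u

/* Bytes a vlbb-style load from p would fetch: 16, or fewer if the
   4 KiB block boundary is closer.  This models lcbb.  Always >= 1. */
static size_t loaded_bytes(const void *p)
{
    size_t to_boundary = BLOCK - ((uintptr_t)p & (BLOCK - 1));
    return to_boundary < VEC_BYTES ? to_boundary : VEC_BYTES;
}

/* Returns <0, 0 or >0 like strcmp. */
static int model_strcmp(const char *s1, const char *s2)
{
    size_t len = 0;                         /* current_len = 0 */
    for (;;) {
        size_t n1 = loaded_bytes(s1 + len); /* lcbb on s1 */
        size_t n2 = loaded_bytes(s2 + len); /* lcbb on s2 */
        size_t n = n1 < n2 ? n1 : n2;       /* bytes valid in both vrs */
        /* Scalar stand-in for the vector compare: stop at the first
           difference or at the terminating NUL. */
        for (size_t i = 0; i < n; i++) {
            unsigned char c1 = (unsigned char)s1[len + i];
            unsigned char c2 = (unsigned char)s2[len + i];
            if (c1 != c2 || c1 == '\0')
                return (c1 > c2) - (c1 < c2);
        }
        len += n;                           /* advance current_len */
    }
}
```

The byte-wise inner loop never reads past the first NUL, so the model itself is safe in portable C; only the real vector loads need the block-boundary logic.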

What's a block boundary? I think a check for page crossing would be faster,
but we would need data to see how bad the branch misprediction would be.
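Here is a minimal sketch of the page-check alternative, assuming 4 KiB pages and 16-byte vector loads. A load is treated as safe when it stays within the page of its first byte, so the full-width compare can be used directly and the slow path is taken only near a page end. `load_crosses_page` and `both_loads_safe` are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SIZE 4096u   /* assumed page size */
#define VEC_BYTES 16u     /* assumed vector load width */

/* True if a VEC_BYTES-wide load starting at p would touch the next page.
   A load that stays within one page cannot fault if its first byte is
   readable. */
static bool load_crosses_page(const void *p)
{
    return ((uintptr_t)p & (PAGE_SIZE - 1)) > PAGE_SIZE - VEC_BYTES;
}

/* Fast-path predicate for a strcmp-style loop: both 16-byte loads are
   safe, so the full vector compare can run without a per-load count. */
static bool both_loads_safe(const char *s1, const char *s2)
{
    /* Bitwise | combines both tests without a second branch. */
    return !(load_crosses_page(s1) | load_crosses_page(s2));
}
```

Combining the two tests with `|` instead of `||` leaves a single, rarely taken branch per iteration. Whether that beats branching on the lcbb condition code is exactly the misprediction question that needs measurements.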

Also, you should align x after reading the first 16 bytes. For x64, the
best way was to compute the number of 64-byte blocks until y crosses a
cache line and handle that case separately. You could reuse that counter
in strncmp, as I did in my recent submission.
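Here is a rough sketch of the alignment and counter idea, under stated assumptions. After the first unaligned 16-byte compare, both pointers advance so that x becomes 16-byte aligned. The loop then runs a precomputed number of 64-byte blocks before y reaches the next multiple of `boundary`, a hypothetical power-of-two parameter (4096 would correspond to a page). The strncmp variant caps the same counter by the remaining length. This is one possible reading, not the code from the submission mentioned above. `advance_to_align`, `blocks_until_cross` and `blocks_until_cross_n` are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

#define VEC_BYTES   16u  /* width of the first, unaligned compare */
#define BLOCK_BYTES 64u  /* bytes compared per main-loop iteration */

/* Advance applied to both x and y after the first 16-byte compare so
   that x becomes 16-byte aligned.  The result is in 1..16, so no byte
   is skipped; some bytes may be compared twice, which is harmless. */
static size_t advance_to_align(const char *x)
{
    return VEC_BYTES - ((uintptr_t)x & (VEC_BYTES - 1));
}

/* Whole 64-byte blocks readable from y before it reaches the next
   multiple of `boundary` (a power of two).  These iterations need no
   per-iteration crossing check; the straddling block is handled
   separately. */
static size_t blocks_until_cross(const char *y, size_t boundary)
{
    size_t room = boundary - ((uintptr_t)y & (boundary - 1));
    return room / BLOCK_BYTES;
}

/* strncmp variant (assumed reuse): the same counter, capped by the
   whole blocks left within n, so one counter bounds both conditions. */
static size_t blocks_until_cross_n(const char *y, size_t boundary, size_t n)
{
    size_t blocks = blocks_until_cross(y, boundary);
    size_t limit = n / BLOCK_BYTES;
    return blocks < limit ? blocks : limit;
}
```

With a single counter, the hot loop only decrements and tests one register. The boundary and length checks move out of the loop into the code that recomputes the counter.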

