This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.
Re: [PATCH 15/27] S390: Optimize strcmp and wcscmp.
- From: Ondřej Bílka <neleai at seznam dot cz>
- To: Stefan Liebler <stli at linux dot vnet dot ibm dot com>
- Cc: libc-alpha at sourceware dot org
- Date: Fri, 26 Jun 2015 15:27:38 +0200
- Subject: Re: [PATCH 15/27] S390: Optimize strcmp and wcscmp.
- References: <1435319512-22245-1-git-send-email-stli at linux dot vnet dot ibm dot com> <1435319512-22245-16-git-send-email-stli at linux dot vnet dot ibm dot com>
On Fri, Jun 26, 2015 at 01:51:40PM +0200, Stefan Liebler wrote:
> This patch provides optimized versions of strcmp and wcscmp with the z13
> vector instructions.
>
> The architecture-specific string.h had a typo, which led to the inline
> version in this file being omitted if __USE_STRING_INLINES is defined.
> Tested this inline version by tweaking test-strcmp.c.
>
> + lghi %r5,0 /* current_len = 0. */
> +
> +.Lloop:
> + vlbb %v16,0(%r5,%r2),6 /* Load s1 to block boundary. */
> + vlbb %v17,0(%r5,%r3),6 /* Load s2 to block boundary. */
> + lcbb %r1,0(%r5,%r2),6 /* Get loaded byte count of s1. */
> + jo .Llt16_1 /* Jump away if vr is not fully loaded. */
> + lcbb %r4,0(%r5,%r3),6
> + jo .Llt16_2 /* Jump away if vr is not fully loaded. */
> + /* Both vrs are fully loaded. */
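A portable C sketch of what the quoted loop does, for readers unfamiliar with the z13 instructions. It assumes block-boundary code 6 selects a 4 KiB block, so lcbb reports min(16, bytes left before the next 4 KiB boundary). The helper names bytes_to_block_boundary and strcmp_blockwise are hypothetical. Where the real code branches to .Llt16_1/.Llt16_2 when fewer than 16 bytes were loaded, the sketch simply shortens the chunk, and a scalar loop stands in for the vector search for the first mismatching or NUL byte.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper modelling lcbb with block-boundary code 6, assumed
   to mean a 4 KiB block: how many bytes (at most 16) can be loaded from P
   without crossing the next 4 KiB boundary.  Always returns 1..16. */
static size_t
bytes_to_block_boundary (const unsigned char *p)
{
  const uintptr_t block = 4096;
  size_t left = block - ((uintptr_t) p & (block - 1));
  return left < 16 ? left : 16;
}

/* Hypothetical sketch of the loop structure: each iteration covers a chunk
   that both strings can supply without crossing a block boundary, then
   looks for the first byte that differs or is the terminating NUL. */
static int
strcmp_blockwise (const char *s1, const char *s2)
{
  const unsigned char *a = (const unsigned char *) s1;
  const unsigned char *b = (const unsigned char *) s2;
  size_t len = 0; /* current_len, as in the assembly.  */

  for (;;)
    {
      size_t n1 = bytes_to_block_boundary (a + len);
      size_t n2 = bytes_to_block_boundary (b + len);
      size_t n = n1 < n2 ? n1 : n2; /* Both "loads" are valid up to n.  */

      /* Scalar stand-in for the vector mismatch/zero search.  */
      for (size_t i = 0; i < n; i++)
        {
          unsigned char c1 = a[len + i], c2 = b[len + i];
          if (c1 != c2 || c1 == 0)
            return (int) c1 - (int) c2;
        }
      len += n;
    }
}
```

Because the scalar search stops at the first NUL or difference, the sketch never reads past either terminator; the chunking only mirrors where the vector code would have to take its partial-load path.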
What's a block boundary? I think checking for a page crossing would be
faster, but we need data to see how bad the branch misprediction would be.
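A page-crossing check of the kind suggested above might look like this C sketch. It assumes 4 KiB pages and 16-byte vectors, and all names are hypothetical. needs_slow_path tests both pointers exactly; the OR-combined variant folds both offsets into a single compare and branch. It can never miss a real crossing (the OR of the offsets is at least as large as either offset), but it sometimes sends the code down the slow path when neither load actually crosses.

```c
#include <stdbool.h>
#include <stdint.h>

#define PAGE_BYTES 4096u /* assumption: 4 KiB pages */
#define VEC_BYTES 16u    /* width of one vector register */

/* True if a VEC_BYTES-wide load starting at P would touch the next page. */
static bool
may_cross_page (const void *p)
{
  return ((uintptr_t) p & (PAGE_BYTES - 1)) > PAGE_BYTES - VEC_BYTES;
}

/* Exact test for both strings. */
static bool
needs_slow_path (const void *s1, const void *s2)
{
  return may_cross_page (s1) || may_cross_page (s2);
}

/* Conservative single-branch test: no false negatives, some false
   positives. */
static bool
needs_slow_path_or (const void *s1, const void *s2)
{
  uintptr_t off = ((uintptr_t) s1 | (uintptr_t) s2) & (PAGE_BYTES - 1);
  return off > PAGE_BYTES - VEC_BYTES;
}
```

Whether this beats checking the loaded byte count after every load depends on how often the branch is mispredicted, which is exactly the data the reply asks for.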
Also, you should align x after reading the first 16 bytes. For x64, the best
way was to compute the number of 64-byte blocks until y crosses a cache line
and handle that crossing separately. You could reuse that counter in strncmp,
like I did in my recent submission.
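One way to read the alignment suggestion, as a C sketch with hypothetical names. After the first 16 bytes are compared unaligned, advance by 16 - (x & 15) so that x becomes 16-byte aligned. A few bytes get compared twice, which is harmless for strcmp. Then count how many whole 64-byte blocks can be read from y before it reaches the next boundary. The message says "cache line"; here the boundary size is a power-of-two parameter (for example 4096 for a page), which is an assumption. For strncmp, the same counter is simply capped by the remaining length n; this reading of the strncmp remark is also an assumption.

```c
#include <stddef.h>
#include <stdint.h>

#define BLOCK 64u /* bytes handled per unrolled iteration */

/* Offset from the string starts at which x becomes 16-byte aligned,
   assuming bytes 0..15 were already compared.  Result is in 1..16, so
   no bytes are skipped. */
static size_t
first_aligned_offset (const char *x)
{
  return 16 - ((uintptr_t) x & 15);
}

/* Whole 64-byte blocks readable from y + off before reaching the next
   BOUNDARY-aligned address.  BOUNDARY must be a power of two. */
static size_t
blocks_before_boundary (const char *y, size_t off, size_t boundary)
{
  uintptr_t pos = ((uintptr_t) y + off) & (boundary - 1);
  return (boundary - pos) / BLOCK;
}

/* strncmp variant: the same counter, also limited by the bytes left in n. */
static size_t
strncmp_block_budget (const char *y, size_t off, size_t boundary, size_t n)
{
  size_t by_boundary = blocks_before_boundary (y, off, boundary);
  size_t by_length = off < n ? (n - off) / BLOCK : 0;
  return by_boundary < by_length ? by_boundary : by_length;
}
```

Inside the counted loop, no per-load boundary check is needed. When the counter reaches zero, the boundary-crossing block is handled separately and the count is recomputed.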