This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: [PATCH] powerpc: Optimized strcmp for POWER8/PPC64
- From: OndÅej BÃlka <neleai at seznam dot cz>
- To: Adhemerval Zanella <azanella at linux dot vnet dot ibm dot com>
- Cc: Ondrej Bilka <ondra at kam dot mff dot cuni dot cz>, "GNU C. Library" <libc-alpha at sourceware dot org>
- Date: Wed, 7 Jan 2015 20:32:25 +0100
- Subject: Re: [PATCH] powerpc: Optimized strcmp for POWER8/PPC64
- Authentication-results: sourceware.org; auth=none
- References: <54AD48A6 dot 9000105 at linux dot vnet dot ibm dot com> <20150107163026 dot GA24290 at kam dot mff dot cuni dot cz> <54AD6881 dot 2090900 at linux dot vnet dot ibm dot com>
On Wed, Jan 07, 2015 at 03:10:25PM -0200, Adhemerval Zanella wrote:
> On 07-01-2015 14:30, Ondrej Bilka wrote:
> >> + /* For short string up to 16 bytes, load both s1 and s2 using
> >> + unaligned dwords and compare. */
> >> + ld r8,0(r3)
> >> + ld r10,0(r4)
> >> + li r9,0
> >> + cmpb r7,r8,r9
> >> + cmpdi cr7,r7,0
> >> + mr r9,r7
> >> + bne cr7,L(null_found)
> >> + cmpld cr7,r8,r10
> >> + bne cr7,L(different)
> >> + ld r8,8(r3)
> >> + ld r10,8(r4)
> >> + cmpb r9,r8,r7
> >> + cmpdi cr7,r9,r0
> >> + bne cr7,L(null_found)
> >> + cmpld cr7,r8,r10
> >> + bne cr7,L(different)
> >> + addi r7,r3,16
> >> + addi r4,r4,16
> > It makes no sense to do two separate checks which create pretty unpredictable branches.
> >
> > Just or these two check and look at first nonzero byte. Either they differ at that offset or both are zero and easily get result from that.
> >
> Which two checks are you referring exactly? The first two:
>
> + ld r8,0(r3)
> + ld r10,0(r4)
> + li r9,0
> + cmpb r7,r8,r9
> + cmpdi cr7,r7,0
> + mr r9,r7
> + bne cr7,L(null_found)
> + cmpld cr7,r8,r10
> + bne cr7,L(different)
>
> First cmpb instruction is not a branch instruction (and thus has no affect on
> branch prediction). Also, in this code is it has to check for NULL first
> before start to check different bytes at second dword. For instance, for
> strings:
>
No I am refering that to
bne cr7,L(null_found)
and
bne cr7,L(different)
you do need two nearly identical branches, just create mask that detects
both 0 and difference.
On x64 first 16 bytes are handled using this trick, you could replace
bytewise minimum there with bytewise and.
pxor %xmm2, %xmm2
movdqu (%rdi), %xmm1
movdqu (%rsi), %xmm0
pcmpeqb %xmm1, %xmm0
pminub %xmm1, %xmm0
pcmpeqb %xmm2, %xmm0
pmovmskb %xmm0, %eax
testq %rax, %rax
je L(next_48_bytes)
L(return):
bsfq %rax, %rdx
movzbl (%rdi, %rdx), %eax
movzbl (%rsi, %rdx), %edx
subl %edx, %eax
ret