This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: [PATCH] powerpc: Optimized strcmp for POWER8/PPC64


On Wed, Jan 07, 2015 at 03:10:25PM -0200, Adhemerval Zanella wrote:
> On 07-01-2015 14:30, Ondrej Bilka wrote:
> >> +	/* For short string up to 16 bytes, load both s1 and s2 using
> >> +	   unaligned dwords and compare.  */
> >> +	ld	r8,0(r3)
> >> +	ld	r10,0(r4)
> >> +	li	r9,0
> >> +	cmpb	r7,r8,r9
> >> +	cmpdi	cr7,r7,0
> >> +	mr	r9,r7
> >> +	bne 	cr7,L(null_found)
> >> +	cmpld	cr7,r8,r10
> >> +	bne	cr7,L(different)
> >> +	ld	r8,8(r3)
> >> +	ld	r10,8(r4)
> >> +	cmpb	r9,r8,r7
> >> +	cmpdi	cr7,r9,r0
> >> +	bne	cr7,L(null_found)
> >> +	cmpld	cr7,r8,r10
> >> +	bne	cr7,L(different)
> >> +	addi	r7,r3,16
> >> +	addi	r4,r4,16
> > It makes no sense to do two separate checks which create pretty unpredictable branches.
> >
> > Just OR these two checks and look at the first nonzero byte. Either the strings differ at that offset or both bytes are zero, and the result follows easily from that.
> >
> Which two checks are you referring to exactly? The first two:
> 
> +	ld	r8,0(r3)
> +	ld	r10,0(r4)
> +	li	r9,0
> +	cmpb	r7,r8,r9
> +	cmpdi	cr7,r7,0
> +	mr	r9,r7
> +	bne 	cr7,L(null_found)
> +	cmpld	cr7,r8,r10
> +	bne	cr7,L(different)
> 
> The first cmpb instruction is not a branch instruction (and thus has no
> effect on branch prediction).  Also, this code has to check for a NUL byte
> first, before it starts checking for different bytes in the second dword.
> For instance, for strings:
>
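For reference, the per-dword logic of the quoted hunk can be modelled in portable C. This is a sketch under stated assumptions, not the patch itself: `cmpb` emulates the POWER instruction, `load_le64` and `strcmp_two_branches` are hypothetical helper names, a little-endian byte view is assumed, both buffers are assumed readable for at least 16 bytes, `__builtin_ctzll` (GCC/Clang) stands in for a count-trailing-zeros instruction, and plain `strcmp` stands in for the rest of the patch, which is not quoted here. Strings such as "ab\0xyz" and "ab\0pqr" show why the NUL test has to come first in this two-branch scheme: the dwords differ, yet the strings compare equal.

```c
#include <stdint.h>
#include <string.h>

/* Portable model of the POWER cmpb instruction: result byte i is 0xFF
   when byte i of a equals byte i of b, and 0x00 otherwise.  */
static uint64_t cmpb(uint64_t a, uint64_t b)
{
    uint64_t r = 0;
    for (int i = 0; i < 64; i += 8)
        if (((a >> i) & 0xFF) == ((b >> i) & 0xFF))
            r |= (uint64_t)0xFF << i;
    return r;
}

/* Hypothetical helper: load 8 string bytes so that byte 0 lands in the
   least significant position (little-endian view), on any host.  */
static uint64_t load_le64(const unsigned char *p)
{
    uint64_t v = 0;
    for (int i = 0; i < 8; i++)
        v |= (uint64_t)p[i] << (8 * i);
    return v;
}

/* Hypothetical model of the quoted hunk: per dword, a NUL check and a
   separate difference check.  Assumes 16 readable bytes per buffer.  */
static int strcmp_two_branches(const char *p1, const char *p2)
{
    const unsigned char *s1 = (const unsigned char *)p1;
    const unsigned char *s2 = (const unsigned char *)p2;
    for (int off = 0; off < 16; off += 8) {
        uint64_t a = load_le64(s1 + off), b = load_le64(s2 + off);
        if (cmpb(a, 0) != 0) {          /* like bne cr7,L(null_found) */
            for (int i = off; ; i++)    /* s1 ends inside this dword */
                if (s1[i] != s2[i] || s1[i] == 0)
                    return s1[i] - s2[i];
        }
        if (a != b) {                   /* like bne cr7,L(different) */
            int i = off + __builtin_ctzll(~cmpb(a, b)) / 8;
            return s1[i] - s2[i];
        }
    }
    /* Stand-in for the rest of the patch, which is not quoted.  */
    return strcmp(p1 + 16, p2 + 16);
}
```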

No, I am referring to
bne cr7,L(null_found)
and
bne cr7,L(different)

You don't need two nearly identical branches; just create a mask that
detects both a zero byte and a difference.
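A minimal sketch of that single-mask idea, under the same assumptions as any C model of this code (emulated `cmpb`, hypothetical helper names `load_le64` and `strcmp_one_branch`, a little-endian byte view, 16 readable bytes per buffer, `strcmp` as a fallback past 16 bytes, and the GCC/Clang `__builtin_ctzll`): one mask flags every byte where s1 holds a NUL or where the strings differ, so one branch per dword suffices and the first flagged byte decides the result.

```c
#include <stdint.h>
#include <string.h>

/* Portable model of the POWER cmpb instruction: result byte i is 0xFF
   when byte i of a equals byte i of b, and 0x00 otherwise.  */
static uint64_t cmpb(uint64_t a, uint64_t b)
{
    uint64_t r = 0;
    for (int i = 0; i < 64; i += 8)
        if (((a >> i) & 0xFF) == ((b >> i) & 0xFF))
            r |= (uint64_t)0xFF << i;
    return r;
}

/* Hypothetical helper: little-endian view of 8 string bytes.  */
static uint64_t load_le64(const unsigned char *p)
{
    uint64_t v = 0;
    for (int i = 0; i < 8; i++)
        v |= (uint64_t)p[i] << (8 * i);
    return v;
}

/* Hypothetical single-branch variant: NUL mask ORed with the difference
   mask.  Assumes 16 readable bytes per buffer.  */
static int strcmp_one_branch(const char *p1, const char *p2)
{
    const unsigned char *s1 = (const unsigned char *)p1;
    const unsigned char *s2 = (const unsigned char *)p2;
    for (int off = 0; off < 16; off += 8) {
        uint64_t a = load_le64(s1 + off), b = load_le64(s2 + off);
        uint64_t mask = cmpb(a, 0) | ~cmpb(a, b);
        if (mask != 0) {
            /* First flagged byte: either the strings differ there, or
               both bytes are NUL and the difference is 0.  */
            int i = off + __builtin_ctzll(mask) / 8;
            return s1[i] - s2[i];
        }
    }
    /* Fallback for the remainder of the strings.  */
    return strcmp(p1 + 16, p2 + 16);
}
```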

On x64 the first 16 bytes are handled using this trick; you could replace
the bytewise minimum there with a bytewise AND.

        pxor    %xmm2, %xmm2
        movdqu  (%rdi), %xmm1
        movdqu  (%rsi), %xmm0
        pcmpeqb %xmm1, %xmm0
        pminub  %xmm1, %xmm0
        pcmpeqb %xmm2, %xmm0
        pmovmskb        %xmm0, %eax
        testq   %rax, %rax
        je      L(next_48_bytes)
L(return):
        bsfq    %rax, %rdx
        movzbl  (%rdi, %rdx), %eax
        movzbl  (%rsi, %rdx), %edx
        subl    %edx, %eax
        ret
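The substitution suggested above can be checked with a small C model. This is a sketch, not the glibc code: `minub`, `mask_min` and `mask_and` are hypothetical names, and the `cmpb` emulation also stands in for `pcmpeqb` on one 8-byte lane. Because every byte of the equality result is either 0x00 or 0xFF, the bytewise minimum with the s1 bytes equals a bytewise AND, and the final compare against zero flags exactly the bytes where s1 has a NUL or the strings differ.

```c
#include <stdint.h>

/* Portable model of POWER cmpb; on one 8-byte lane it also models x86
   pcmpeqb: result byte i is 0xFF when the two bytes are equal.  */
static uint64_t cmpb(uint64_t a, uint64_t b)
{
    uint64_t r = 0;
    for (int i = 0; i < 64; i += 8)
        if (((a >> i) & 0xFF) == ((b >> i) & 0xFF))
            r |= (uint64_t)0xFF << i;
    return r;
}

/* Model of pminub on one 8-byte lane: bytewise unsigned minimum.  */
static uint64_t minub(uint64_t a, uint64_t b)
{
    uint64_t r = 0;
    for (int i = 0; i < 64; i += 8) {
        uint64_t x = (a >> i) & 0xFF, y = (b >> i) & 0xFF;
        r |= (x < y ? x : y) << i;
    }
    return r;
}

/* The x64 sequence: pcmpeqb, pminub with s1, pcmpeqb against zero.  */
static uint64_t mask_min(uint64_t a, uint64_t b)
{
    return cmpb(minub(cmpb(a, b), a), 0);
}

/* Suggested variant: the bytewise minimum replaced by a bytewise AND.  */
static uint64_t mask_and(uint64_t a, uint64_t b)
{
    return cmpb(cmpb(a, b) & a, 0);
}
```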

