This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: memcmp-sse4.S EqualHappy bug


On Thu, Jun 18, 2015 at 08:05:17PM +0200, Ondřej Bílka wrote:
> I see now. As I am writing a new memcmp, I don't see that as likely, as
> it adds extra overhead that's hard to justify.
> 
> Rereading is needed for good performance: a loop checks 64 bytes at
> once, and sse2 uses destructive operations, so the original data won't
> be there.
> 
> The best workaround would be to add a check after the final subtraction:
> if it's zero, then call
> memcmp(x+found+1, y+found+1, remaining)
> 
> That could be almost free, as you just need to add a je branch after
> the subtraction.

Yes, it's free for the unrolled loop. Only the breakout of the
unrolled loop needs to adjust rdx in addition to rsi/rdi, so that it
can check whether it has reached the end before returning zero.
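
To make the workaround concrete, here is a minimal scalar sketch in C
of the idea discussed above. It is not the glibc code: memcmp_recheck
and first_mismatch are hypothetical names, and the byte loop only
stands in for the vectorised 64-byte check. The assumption is that the
bytes are re-read after a mismatch is detected, and that a zero
difference on the re-read means "keep comparing the rest" rather than
"return 0".

```c
#include <stddef.h>

/* Hypothetical helper: index of the first differing byte within n bytes,
   or n if none differ. Stands in for the vectorised block check. */
static size_t first_mismatch(const volatile unsigned char *a,
                             const volatile unsigned char *b, size_t n)
{
    size_t i = 0;
    while (i < n && a[i] == b[i])
        i++;
    return i;
}

/* memcmp-like sketch: after a detected mismatch, 0 is returned only if
   the end of the buffers is reached with no further difference. */
int memcmp_recheck(const void *x, const void *y, size_t n)
{
    const volatile unsigned char *a = x;
    const volatile unsigned char *b = y;

    while (n > 0) {
        size_t found = first_mismatch(a, b, n);
        if (found == n)
            return 0;               /* reached the end, nothing differs */

        /* Re-read the two bytes, as the destructive SSE compare forces. */
        int diff = (int)a[found] - (int)b[found];
        if (diff != 0)
            return diff;

        /* The bytes changed under us and now compare equal: skip past
           them and continue with the remainder instead of returning 0.
           This is the memcmp(x+found+1, y+found+1, remaining) step. */
        a += found + 1;
        b += found + 1;
        n -= found + 1;
    }
    return 0;
}
```

The loop form replaces the tail call with an adjustment of both
pointers and the remaining length, which mirrors adjusting rsi/rdi and
rdx at the loop breakout.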

> However, now I need to test a new way to check the first 16 bytes that
> would avoid rereading. The problem is that it would scale worse when you
> need to combine results into two 64-byte masks instead of one.
> 
> mov %rdx, %rcx
> neg %rcx
> movdqu (%rsi), %xmm0
> movdqu (%rdi), %xmm1
> movdqu %xmm0, %xmm2
> pcmpgtb %xmm1, %xmm0
> pcmpgtb %xmm2, %xmm1
> pmovmskb %xmm0, %eax
> pmovmskb %xmm1, %edx
> bswap %eax
> bswap %edx

I think the unrolled loop is faster if it does ptest only; those are
many more instructions than the current code executes in its sse4
part. I guess it's likely measurably slower, but that is just a
guess and I haven't benchmarked.

I'm just saying the problem is not the re-reading; re-reading is fine.
Whether the result is above or below zero would be undefined anyway,
whether we re-read or not. The only defined thing is that the function
cannot return 0, and it currently does.
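
As an illustration of that failure mode, here is a single-threaded
model in C. It is only a sketch: buggy_compare and racy_store are
hypothetical names, and racy_store stands in for another thread that
writes to the second buffer between the scan that detects the mismatch
and the re-read that computes the return value.

```c
#include <stddef.h>

/* Hypothetical stand-in for a concurrent writer touching b[idx]. */
static void racy_store(unsigned char *b, size_t idx, unsigned char value)
{
    b[idx] = value;
}

/* Scan for the first mismatch, let the "other thread" run, then re-read
   the bytes and return their difference unconditionally. */
int buggy_compare(const unsigned char *a, unsigned char *b, size_t n,
                  unsigned char racing_value)
{
    for (size_t i = 0; i < n; i++) {
        if (a[i] != b[i]) {                 /* first read: mismatch seen */
            racy_store(b, i, racing_value); /* writer changes b[i] */
            return (int)a[i] - (int)b[i];   /* second read: may be 0 */
        }
    }
    return 0;
}
```

With a = {1, 1}, b = {0, 0} and racing_value = 1, the call returns 0
although b goes from {0, 0} to {1, 0} and is never equal to a. A
nonzero result of either sign would have been acceptable here; the 0
is not.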

> shr %cl, %eax
> shr %cl, %edx
> cmp $-16, %rcx
> jae L(next_48_bytes)
> sub %edx, %eax
> L(ret):
> ret
> L(next_48_bytes):
> sub %edx, %eax
> jne L(ret)
> 

