This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: memcmp-sse4.S EqualHappy bug


On Thu, Jun 18, 2015 at 05:54:42PM +0200, Andrea Arcangeli wrote:
> On Wed, Jun 17, 2015 at 10:19:58PM +0200, Ondřej Bílka wrote:
> I fully understand your arguments about the standard and I expected
> this behavior was permitted.
> 
> I'm also pointing out that with READ_ONCE/WRITE_ONCE/volatile/asm("memory")
> we already go a bit beyond what we can strictly expect from C, in order to
> provide RCU (and to implement spinlocks/mutexes). I just wanted to express
> my views on the practical aspects and on how we could guarantee that, if a
> part of the compared memory never changes and never matches (a part at
> least as large as the atomic access granularity of the arch, a size you
> need to know and which isn't 1 byte minimum on alpha, for example), then
> memcmp is well defined in that it can't return 0. That is, if it returns 0
> it actually read all "length" bytes, and at some point in time each byte
> individually was equal; in our case the last part of the page is never
> changed and never equal, so 0 would be impossible.
> 
> I'm fine if no change is done, and it'd be great if at least the man page
> of memcmp/bcmp were updated. If it were up to me, though, I'd prefer to
> fix this case so that 0 isn't happily returned too soon and unexpectedly,
> especially since the unrolled-loop fast path wouldn't require any change.
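
A concrete restatement of that scenario, as I understand it, is roughly the
following deliberately racy sketch (buffer names and sizes are made up here,
not taken from your report): one thread keeps flipping an early byte while
the last byte of the compared range never matches, and the question is
whether memcmp can still return 0.

#include <pthread.h>
#include <string.h>

static char a[4096], b[4096];

static void *
writer (void *arg)
{
  for (;;)
    a[0] ^= 1;                /* unsynchronized writes on purpose */
  return arg;
}

int
main (void)
{
  pthread_t t;
  b[4095] = 1;                /* the last byte is never equal */
  pthread_create (&t, NULL, writer, NULL);
  for (int i = 0; i < 1000000; i++)
    if (memcmp (a, b, sizeof a) == 0)
      return 1;               /* the surprising "equal" result */
  return 0;
}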

I see now. As I am writing a new memcmp I don't consider that likely to
happen, as it adds extra overhead that's hard to justify.

Rereading is needed for good performance: the loop checks 64 bytes at once,
and SSE2 uses destructive operations, so the original data is no longer in
registers when we need to compute the return value.
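
Roughly, and simplified from 64 bytes down to a single 16-byte vector (this
uses intrinsics instead of the real assembly, so the names are made up),
the fast path does something like the sketch below, which is why the
differing byte has to be read from memory a second time:

#include <emmintrin.h>

static int
cmp16_sketch (const unsigned char *s1, const unsigned char *s2)
{
  __m128i v1 = _mm_loadu_si128 ((const __m128i *) s1);
  __m128i v2 = _mm_loadu_si128 ((const __m128i *) s2);
  /* The compare overwrites the loaded data, so the original bytes are
     gone from the registers once a difference is detected.  */
  v1 = _mm_cmpeq_epi8 (v1, v2);
  int mask = _mm_movemask_epi8 (v1);
  if (mask == 0xffff)
    return 0;                    /* all 16 bytes were equal when loaded */
  int i = __builtin_ctz (~mask); /* index of the first differing byte */
  /* Second read: a concurrent writer may have changed these bytes
     since the vector loads above.  */
  return s1[i] - s2[i];
}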

The best workaround would be to add, after the final subtraction, a check
whether the result is zero, and if so call

memcmp(x+found+1, y+found+1, remaining)

That could be almost free, as you would only need to add a je branch after
the subtraction.
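
In C-like terms the idea is roughly the following; "found" and "remaining"
are placeholders for values the final code would already have at hand, not
names from the actual implementation:

#include <string.h>

/* Sketch only: if the optimized pass concluded "equal", verify once more
   starting just past the last byte it is known to have checked, so that a
   concurrently changing byte cannot turn a real difference into a
   spurious 0.  */
static int
recheck_if_zero (const char *x, const char *y, size_t found,
                 size_t remaining, int result)
{
  if (result == 0 && remaining > 0)
    return memcmp (x + found + 1, y + found + 1, remaining);
  return result;
}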

However, I now need to test a new way of checking the first 16 bytes that
would avoid rereading. The problem is that it would scale worse when you
need to combine the results into two masks per 64-byte block instead of
one.

mov %rdx, %rcx          # rcx = length
neg %rcx                # rcx = -length, used as shift count below
movdqu (%rsi), %xmm0    # first 16 bytes of the second buffer
movdqu (%rdi), %xmm1    # first 16 bytes of the first buffer
movdqu %xmm0, %xmm2     # keep a copy before the destructive compares
pcmpgtb %xmm1, %xmm0    # xmm0: bytes where (%rsi) > (%rdi) (signed)
pcmpgtb %xmm2, %xmm1    # xmm1: bytes where (%rdi) > (%rsi) (signed)
pmovmskb %xmm0, %eax    # turn both byte masks into integer bitmasks
pmovmskb %xmm1, %edx
bswap %eax              # byte-swap the 32-bit registers holding the masks
bswap %edx
shr %cl, %eax           # shift by (-length mod 32), intended to drop
shr %cl, %edx           # mask bits for bytes past the requested length
cmp $-16, %rcx
jae L(next_48_bytes)    # taken when -length >= -16 unsigned, i.e. length <= 16
sub %edx, %eax          # difference of the two masks is the return value
L(ret):
ret
L(next_48_bytes):
sub %edx, %eax
jne L(ret)              # first 16 bytes already differ, return that result

