This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: memcmp-sse4.S EqualHappy bug
- From: Simo Sorce <ssorce at redhat dot com>
- To: Torvald Riegel <triegel at redhat dot com>
- Cc: Andrea Arcangeli <aarcange at redhat dot com>, "Carlos O'Donell" <carlos at redhat dot com>, "Dr. David Alan Gilbert" <dgilbert at redhat dot com>, Szabolcs Nagy <nsz at port70 dot net>, libc-alpha at sourceware dot org, "H.J. Lu" <hongjiu dot lu at intel dot com>
- Date: Fri, 19 Jun 2015 13:08:41 -0400
- Subject: Re: memcmp-sse4.S EqualHappy bug
- References: <1434621040 dot 5250 dot 212 dot camel at localhost dot localdomain> <20150618124900 dot GD14955 at redhat dot com> <1434637415 dot 5250 dot 271 dot camel at localhost dot localdomain> <20150618145202 dot GG14955 at redhat dot com> <1434642635 dot 5250 dot 292 dot camel at localhost dot localdomain> <20150618161943 dot GN14955 at redhat dot com> <20150618172231 dot GS14955 at redhat dot com> <1434649785 dot 30819 dot 37 dot camel at localhost dot localdomain> <20150618181219 dot GL2248 at work-vm> <55841B6B dot 10104 at redhat dot com> <20150619140710 dot GA14955 at redhat dot com> <1434724946 dot 2716 dot 88 dot camel at willson dot usersys dot redhat dot com> <1434727945 dot 3061 dot 51 dot camel at localhost dot localdomain>
It is true that you identify two different issues, but I am not interested
in either; I merely care about failing safe.
On Fri, 2015-06-19 at 17:32 +0200, Torvald Riegel wrote:
> I see two separate issues here. First, where do we draw the line, and
> what do we guarantee. I strongly believe that programs must not have
> data races, and that they should use atomics or other synchronization
> properly (this doesn't mean locks, but relaxed memory order atomics,
> for example).
Programs should not have data races, true, but sometimes they do, and
failing safe keeps 'bad' programs from getting worse.
> Second, do we try to keep buggy programs working. If it has no cost
> to do so (e.g., like it might be true in this case), then doing that
> can help to trigger less errors. But that doesn't mean the buggy
> programs should get fixed eventually.
Sure, but sometimes it is hard to define something as a bug.
In the specific case Andrea reported (which I pondered), the race was
clearly there, but just as clearly the underlying memory was never
identical at any given point. Yet memcmp reported 0, which means the
whole memory area was checked and matched. That was clearly not the
case: memcmp didn't even check the whole memory area, and still
returned 0.
Now, if the 2 memory areas ever were (even for a nanosecond) identical,
then returning zero would have been perfectly acceptable. During a race
you can't know what the result is. But a function that supposedly checks
a data area for equality and instead stops short and returns "equal"
seems just wrong, and it is certainly unexpected.
I understand that the cause is that the program is operating on part
of the area concurrently, which it shouldn't do. Still, I would expect
memcmp to at least always go through the whole area it is expected to
check and not stop short on a fluke and return a "random" result.
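To make the expectation concrete, here is a minimal sketch of a comparison
that can only report "equal" after it has looked at every byte. It is an
illustration only, not glibc's implementation and not a proposed patch; the
name memcmp_full_scan is hypothetical. Each byte is loaded once into a local
and the result is decided from those loaded values. The C standard still
gives no guarantees for a program with a data race, so this shows the
intended fail-safe behavior rather than a proof of safety.

```c
#include <stddef.h>

/* Hypothetical helper, not glibc code: compare n bytes of a and b.
   It returns 0 only after the loop has inspected all n positions. */
int memcmp_full_scan(const void *a, const void *b, size_t n)
{
    const volatile unsigned char *pa = a;
    const volatile unsigned char *pb = b;

    for (size_t i = 0; i < n; i++) {
        /* Load each byte exactly once and decide from those values. */
        unsigned char ca = pa[i];
        unsigned char cb = pb[i];
        if (ca != cb)
            return ca < cb ? -1 : 1;
    }
    /* Reached only if every byte pair compared equal when it was read. */
    return 0;
}
```

With this shape, a nonzero result may come back early, but a zero result
always means all n byte pairs were read and each pair matched at the time
it was read.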
Simo.