This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: memcmp-sse4.S EqualHappy bug
- From: Simo Sorce <simo at redhat dot com>
- To: Torvald Riegel <triegel at redhat dot com>
- Cc: "Dr. David Alan Gilbert" <dgilbert at redhat dot com>, Andrea Arcangeli <aarcange at redhat dot com>, "Carlos O'Donell" <carlos at redhat dot com>, Szabolcs Nagy <nsz at port70 dot net>, libc-alpha at sourceware dot org, "H.J. Lu" <hongjiu dot lu at intel dot com>
- Date: Fri, 19 Jun 2015 13:59:57 -0400
- Subject: Re: memcmp-sse4.S EqualHappy bug
- Authentication-results: sourceware.org; auth=none
- References: <20150618145202 dot GG14955 at redhat dot com> <1434642635 dot 5250 dot 292 dot camel at localhost dot localdomain> <20150618161943 dot GN14955 at redhat dot com> <20150618172231 dot GS14955 at redhat dot com> <1434649785 dot 30819 dot 37 dot camel at localhost dot localdomain> <20150618181219 dot GL2248 at work-vm> <55841B6B dot 10104 at redhat dot com> <20150619140710 dot GA14955 at redhat dot com> <1434724946 dot 2716 dot 88 dot camel at willson dot usersys dot redhat dot com> <1434727945 dot 3061 dot 51 dot camel at localhost dot localdomain> <20150619155907 dot GF2147 at work-vm> <1434733019 dot 3061 dot 68 dot camel at localhost dot localdomain> <1434734186 dot 2716 dot 100 dot camel at willson dot usersys dot redhat dot com> <1434734325 dot 2716 dot 102 dot camel at willson dot usersys dot redhat dot com> <1434735651 dot 3061 dot 87 dot camel at localhost dot localdomain> <1434736520 dot 2716 dot 106 dot camel at willson dot usersys dot redhat dot com>
On Fri, 2015-06-19 at 13:55 -0400, Simo Sorce wrote:
> On Fri, 2015-06-19 at 19:40 +0200, Torvald Riegel wrote:
> > On Fri, 2015-06-19 at 13:18 -0400, Simo Sorce wrote:
> > > On Fri, 2015-06-19 at 13:16 -0400, Simo Sorce wrote:
> > > > On Fri, 2015-06-19 at 18:56 +0200, Torvald Riegel wrote:
> > > > > On Fri, 2015-06-19 at 16:59 +0100, Dr. David Alan Gilbert wrote:
> > > > > > * Torvald Riegel (triegel@redhat.com) wrote:
> > > > > > > On Fri, 2015-06-19 at 10:42 -0400, Simo Sorce wrote:
> > > > > > > > On Fri, 2015-06-19 at 16:07 +0200, Andrea Arcangeli wrote:
> > > > > > > > > On Fri, Jun 19, 2015 at 09:38:51AM -0400, Carlos O'Donell wrote:
> > > > > > > > > > I agree with aborting, but only as long as the hot path's performance
> > > > > > > > > > is not impacted and I haven't thought about how to do that.
> > > > > > > > > >
> > > > > >
> > > > > > > > Clearly people are better using atomic comparisons on canary values
> > > > > > > > instead, but it seem easy to avoid false positives (returning 0 when
> > > > > > > > memory is clearly different) and keep these things working, so why not
> > > > > > > > do it ?
> > > > > > >
> > > > > > > I see two separate issues here. First, where do we draw the line, and
> > > > > > > what do we guarantee. I strongly believe that programs must not have
> > > > > > > data races, and that they should use atomics or other synchronization
> > > > > > > properly (this doesn't mean locks, but relaxed memory order atomics, for
> > > > > > > example).
> > > > > > > Second, do we try to keep buggy programs working. If it has no cost to
> > > > > > > do so (e.g., like it might be true in this case), then doing that can
> > > > > > > help to trigger less errors. But that doesn't mean the buggy programs
> > > > > > > should get fixed eventually.
> > > > > >
> > > > > > I find it difficult to understand the boundaries of what the C library
> > > > > > is allowed to do in this type of optimisation.
> > > > > >
> > > > > > For example, consider the following:
> > > > > >
> > > > > > char a[128];
> > > > > > char b[128];
> > > > > >
> > > > > > put some static data in a[0-63]
> > > > > > put some static data in b[0-63]
> > > > > > a[64]=0;
> > > > > > b[64]=0;
> > > > > > start a thread doing stuff in a[65..]
> > > > > > start a thread doing stuff in b[65..]
> > > > > >
> > > > > > if (!strcmp(a,b)) {
> > > > > > /* Do something */
> > > > > > }
> > > > > >
> > > > > > a) Is that behaviour defined?
> > > > >
> > > > > Good question. I think it should be. This depends on both the data
> > > > > race definition and what strcmp/strncmp/memcmp are specified to do using
> > > > > the abstract machine.
> > > > >
> > > > > The data race definition uses memory locations as granularity, which is
> > > > > in 3.14 in C11. Separate characters in an array should be separate
> > > > > memory locations.
> > > > >
> > > > > C11 isn't very specific regarding strcmp, and just says that it
> > > > > "compares the string[s]" (7.24.4.2). C++14 is a bit more specific
> > > > > regarding basic_string::compare (21.4.7.9), saying that first the length
> > > > > of the strings are determined, and then a strncmp is run using the
> > > > > smaller of the two lengths.
> > > > >
> > > > > Assuming the C++ specs, only the array indices [0..64] should be
> > > > > accessed by the abstract machine. So no data race with the stuff going
> > > > > on in [65..).
> > > > >
> > > > > > b) Is it defined with strncmp(a,b,64) ?
> > > > >
> > > > > Yes.
> > > > >
> > > > > > c) Is it defined with strncmp(a,b,128)?
> > > > >
> > > > > Not sure. C11 says that not more than "n characters" are compared, and
> > > > > characters that follow the a null character aren't compared either.
> > > > > This indicates it wouldn't be different from strncmp(a,b,64) in the
> > > > > particular case.
> > > > > Regarding C++11, I'm not sure. The closest copies a substring
> > > > > (conceptually) and then compares, but the substring copying has to
> > > > > determine length of the string and then subtracting the max length.
> > > > > This would do a strlen first, which wouldn't access past index 64.
> > > > > Thus, should be fine too.
> > > > >
> > > > > > d) Is it defined with memcmp(a,b,64)?
> > > > >
> > > > > No data race, IMO.
> > > > >
> > > > > > e) Is it defined with memcmp(a,b,128)?
> > > > >
> > > > > Data race. Undefined behavior.
> > > > >
> > > > > > f) If I moved that boundary off a nice round % 8 mark would
> > > > > > it matter?
> > > > >
> > > > > No difference as far as the standard is concerned.
> > > > >
> > > > > > I can imagine there may be lots of things that terminate
> > > > > > a string and let other stuff happen in the allocated space
> > > > > > after the end of the string in the belief that at the end
> > > > > > of that string all is unknown. Andrea's case is a bit different
> > > > > > in that it's the later data that's static, but that doesn't
> > > > > > sound like it should change the answer as to what's allowed.
> > > > >
> > > > > I think it does, because the question is whether there is a data race on
> > > > > the memory locations that the abstract machine would access.
> > > >
> > > > Well given we are making examples.
> > > >
> > > > Assume 2 structures like this:
> > > >
> > > > struct test {
> > > > void *pointer;
> > > > char start[16];
> > > > char end[240];
> > > > }
> > > >
> > > > and
> > > >
> > > > struct test a;
> > > > struct test b;
> > > >
> > > > memset(a.end, 1, 240);
> > > > memset(b.end, 2, 240);
> > > >
> > > > In what case it would be expected/legal for
> > > > memcmp(a.start, b.start, 256); to ever return 0 ?
> > >
> > > Just to be clear, if there are data-races in [ab].pointer or [as].start
> > > I see absolutely no problem in stopping short and returning a
> > > positive/negative value. I just do not see how returning 0 would ever
> > > make sense.
> >
> > It would be legal if the invocation of memcmp and at least one of the
> > invocations of memset would be concurrent (ie, not ordered by
> > happens-before).
>
> Nope, sorry , there is a misunderstanding, consider the memsets part of
> initialization, *never* run in parallel with memcmp.
>
> > I do see that this wouldn't be an intuitive outcome if one reasons based
> > on assuming some simplified (or straightforward, or natural, or whatever
> > the personal opinion might be...) implementation. But a data race is
> > undefined behavior, and that's where we're drawing the line.
>
> Sure, I can understand that is the *whole* are under consideration is
Sorry for the typos, the above should read ".. that IF the *whole* AREA
under consideration ..."
> racing. But here we are talking about a large chunk of the area that is
> different and *never* raced upon.
>
> I can accept that the C standard may still decide this is undefined
> behavior even if just one small part of the total area sees concurrent
> modifications, but again, you are returning "short" from the optimized
> code, so it is unclear to me how you can return 0 when the memcmp code
> hasn't checked all the memory, wouldn't it make more sense to return the
> current position ? It is still undefined but at least it won't cause
> false positives.
>
> Simo.
>
--
Simo Sorce * Red Hat, Inc * New York