This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: memcmp-sse4.S EqualHappy bug


On Fri, 2015-06-19 at 18:56 +0200, Torvald Riegel wrote:
> On Fri, 2015-06-19 at 16:59 +0100, Dr. David Alan Gilbert wrote:
> > * Torvald Riegel (triegel@redhat.com) wrote:
> > > On Fri, 2015-06-19 at 10:42 -0400, Simo Sorce wrote:
> > > > On Fri, 2015-06-19 at 16:07 +0200, Andrea Arcangeli wrote:
> > > > > On Fri, Jun 19, 2015 at 09:38:51AM -0400, Carlos O'Donell wrote:
> > > > > > I agree with aborting, but only as long as the hot path's performance
> > > > > > is not impacted and I haven't thought about how to do that.
> > > > > > 
> > 
> > > > Clearly people are better using atomic comparisons on canary values
> > > > instead, but it seem easy to avoid false positives (returning 0 when
> > > > memory is clearly different) and keep these things working, so why not
> > > > do it ?
> > > 
> > > I see two separate issues here.  First, where do we draw the line, and
> > > what do we guarantee.  I strongly believe that programs must not have
> > > data races, and that they should use atomics or other synchronization
> > > properly (this doesn't mean locks, but relaxed memory order atomics, for
> > > example).
> > > Second, do we try to keep buggy programs working.  If it has no cost to
> > > do so (e.g., like it might be true in this case), then doing that can
> > > help to trigger less errors.  But that doesn't mean the buggy programs
> > > should get fixed eventually.
> > 
> > I find it difficult to understand the boundaries of what the C library
> > is allowed to do in this type of optimisation.
> > 
> > For example, consider the following:
> > 
> >     char a[128];
> >     char b[128];
> > 
> >     put some static data in a[0-63]
> >     put some static data in b[0-63]
> >     a[64]=0;
> >     b[64]=0;
> >     start a thread doing stuff in a[65..]
> >     start a thread doing stuff in b[65..]
> >     
> >     if (!strcmp(a,b)) {
> >       /* Do something */
> >     }
> > 
> >    a) Is that behaviour defined?
> 
> Good question.  I think it should be.  This depends on both the data
> race definition and what strcmp/strncmp/memcmp are specified to do using
> the abstract machine.
> 
> The data race definition uses memory locations as granularity, which is
> in 3.14 in C11.  Separate characters in an array should be separate
> memory locations.
> 
> C11 isn't very specific regarding strcmp, and just says that it
> "compares the string[s]" (7.24.4.2).  C++14 is a bit more specific
> regarding basic_string::compare (21.4.7.9), saying that first the length
> of the strings are determined, and then a strncmp is run using the
> smaller of the two lengths.
> 
> Assuming the C++ specs, only the array indices [0..64] should be
> accessed by the abstract machine.  So no data race with the stuff going
> on in [65..).
> 
> >    b) Is it defined with strncmp(a,b,64) ?
> 
> Yes.
> 
> >    c) Is it defined with strncmp(a,b,128)?
> 
> Not sure.  C11 says that not more than "n characters" are compared, and
> characters that follow the a null character aren't compared either.
> This indicates it wouldn't be different from strncmp(a,b,64) in the
> particular case.
> Regarding C++11, I'm not sure.  The closest copies a substring
> (conceptually) and then compares, but the substring copying has to
> determine length of the string and then subtracting the max length.
> This would do a strlen first, which wouldn't access past index 64.
> Thus, should be fine too.
> 
> >    d) Is it defined with memcmp(a,b,64)?
> 
> No data race, IMO.
> 
> >    e) Is it defined with memcmp(a,b,128)?
> 
> Data race.  Undefined behavior.
> 
> >    f) If I moved that boundary off a nice round % 8 mark would
> >       it matter?
> 
> No difference as far as the standard is concerned.
> 
> > I can imagine there may be lots of things that terminate
> > a string and let other stuff happen in the allocated space
> > after the end of the string in the belief that at the end
> > of that string all is unknown.   Andrea's case is a bit different
> > in that it's the later data that's static, but that doesn't
> > sound like it should change the answer as to what's allowed.
> 
> I think it does, because the question is whether there is a data race on
> the memory locations that the abstract machine would access.

Well given we are making examples.

Assume 2 structures like this:

struct test {
    void *pointer;
    char start[16];
    char end[240];
}

and

struct test a;
struct test b;

memset(a.end, 1, 240);
memset(b.end, 2, 240);

In what case it would be expected/legal for 
memcmp(a.start, b.start, 256); to ever return 0 ?

Simo.





Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]