This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: BZ 20822: powerpc: race condition in __lll_unlock_elision


On Wed, 2016-11-23 at 12:25 +0100, Torvald Riegel wrote:
> On Tue, 2016-11-22 at 12:03 -0600, Steven Munroe wrote:
> > On Tue, 2016-11-22 at 09:44 +0100, Torvald Riegel wrote:
> > > On Mon, 2016-11-21 at 17:42 -0600, Steven Munroe wrote:
> > > > On Thu, 2016-11-17 at 15:46 +0100, Torvald Riegel wrote:
> > > > > On Tue, 2016-11-15 at 21:42 +0530, Rajalakshmi Srinivasaraghavan wrote:
> > > > > > The initial problem reported was memory corruption in MongoDB, seen only
> > > > > > on Ubuntu 16.04 ppc64le (not on Ubuntu 15).
> > > > > > The problem was not easily recreatable; after debugging with many tools
> > > > > > and some creative ideas, it was narrowed down to lock elision in glibc.
> > > > > > Ubuntu 16.04 ships glibc 2.23 with lock elision enabled by default, which
> > > > > > is why the testcase fails only on 16.04.
> > > > > > 
> > > > > > As stated in BZ 20822, the short description is:
> > > > > > 
> > > > > > "The update of *adapt_count after the release of the lock causes a race
> > > > > > condition.  Thread A unlocks, thread B continues and returns, destroying
> > > > > > the mutex on its stack, then gets into another function; thread A writes
> > > > > > to *adapt_count and corrupts the stack.  The window is very narrow and
> > > > > > requires that the machine be in SMT mode, so likely the two threads have
> > > > > > to be on the same core in order to hit this."
> > > > > > 
> > > > > > Looking back at the changes to
> > > > > > sysdeps/unix/sysv/linux/powerpc/elision-unlock.c, the suspect is commit
> > > > > > https://sourceware.org/git/?p=glibc.git;a=commitdiff;h=fadd2ad9cc36115440d50b0eae9299e65988917d
> > > > > > which was introduced to improve performance.
> > > > > > 
> > > > > > I am thinking of the following fixes.
> > > > > > 
> > > > > > 1) Update adapt_count before the unlock.  However, I have not yet
> > > > > > determined whether this will affect performance.
> > > > > 
> > > > > Because of the requirements on mutex destruction (and same for rwlock
> > > > > etc.), a thread must not access any data associated with the mutex after
> > > > > having unlocked it (this includes reads because it's correct to also
> > > > > unmap the memory holding the mutex after destroying the mutex). 
> > > > > 
> > > > > > 2) In elision-lock.c/elision-trylock.c, update adapt_count right after
> > > > > > the lock acquire, at the end of the function.
> > > > > > In either case we have to be careful not to introduce another race
> > > > > > condition.
> > > > > 
> > > > > Every access to mutex data outside of transactions must use atomic
> > > > > operations if there would be data races otherwise (according to the C11
> > > > > memory model).  I know that this isn't followed in the current mutex
> > > > > code, but it really should be, and your change qualifies as new code so
> > > > > please take care of this too.
> > > > > 
> > > > In this case (with the proposed patch) the adapt_count update is within
> > > > the critical region, so adding C11 atomics does not add value and would
> > > > generally make the logic harder to read. 
> > > > 
> > > >   __atomic_store_n (adapt_count,
> > > >                     __atomic_load_n (adapt_count, __ATOMIC_RELAXED) - 1,
> > > >                     __ATOMIC_RELAXED);
> > > > 
> > > > is a lot of syntax without any real value.
> > > 
> > > First, it would actually be this in glibc:
> > > atomic_store_relaxed (&adapt_count,
> > >   atomic_load_relaxed(&adapt_count) - 1);
> > > 
> > This is not a good example for your case. The proposed patch:
> > 
> > -      lll_unlock ((*lock), pshared);
> > -
> > -      /* Update the adapt count AFTER completing the critical section.
> > -         Doing this here prevents unneeded stalling when entering
> > -         a critical section.  Saving about 8% runtime on P8.  */
> > +      /* Update the adapt count in the critical section to
> > +         prevent race condition as mentioned in BZ 20822.  */
> >        if (*adapt_count > 0)
> >  	(*adapt_count)--;
> > +      lll_unlock ((*lock), pshared);
> > 
> > makes it clear that this test and update of the adapt_count are
> > contained within the acquire/release semantics of lll_lock and lll_unlock.
> 
> However, it does not make it clear that it should be accessed atomically
> outside of transactions.
> 
> Another reason why we want to use atomic operations consistently is to
> at least allow us to really use C11 atomics in the future (or some sort
> of similar underlying type), so that we can more easily catch cases
> where atomic operations should be used but aren't.  If we don't
> specifically use relaxed-MO operations, the default will be SC.  I'm
> sure you'd like to avoid that too.
> 
> This is a consistency argument wrt other code.  The fact that the atomic
> operation is in a transaction too is just another fact.
> We don't want people to assume that plain memory accesses are
> sufficient, for example because all accesses to adapt_count are within
> transactions.
> 
> > Also, if we add the explicit syntax you propose, someone who has not
> > studied the architecture might leap to the conclusion that add_fetch
> > would be even better.
> 
> I disagree.  If you look at the rules we've set up for concurrent code
> in general (e.g., see "Concurrency" on the wiki, or the concurrent code
> we have converted), we require developers to document what the code
> does; this means the memory orders and operations used, and the
> high-level intent.  If you do this properly, the misunderstandings you
> worry about should not happen.
> 
I disagree; your definition of consistency is, in this case, misleading
for my platform.

> > Which would be absolutely wrong. This would generate a larx-add-stcx
> > sequence which has two very bad side-effects.
> > 
> > 1) On the fallback leg, it would force a larx-hit-stcx flush hazard
> > with the following lll_unlock and cause bad performance.
> > 2) On the transaction leg, it would be redundant and just slow things
> > down.
> 
> So why not add this as a comment?  The use of nonatomic accesses does
> not imply these points, BTW.
> 
So I will add a comment to the effect that explicit atomics are not
required in this case because the update is within the critical region
of lll_lock / lll_unlock.

That should satisfy your concern, and mine.

> Also, the same concerns for how to access adapt_count apply to other
> accesses to it that may not or may be in a transaction (due to nesting).
> You must use atomic accesses for these, so you'd need the explaining
> comment anyway.
> 
Yes, I agree that accesses to adapt_count where the context is not clear
should use atomic_load_relaxed / atomic_store_relaxed in the elision
code.

> On a positive note, it seems that we agree that we want clarity in the
> source code.  I'm arguing that more comments and a consistent use of
> atomic operations (with proper memory orders and comments, of course)
> will get us there.  That's why I suggested that the patch should clean
> up and document all accesses to adapt_count.
> 
> 


