This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: BZ 20822 :powerpc: race condition in __lll_unlock_elision


On Mon, 2016-11-21 at 17:42 -0600, Steven Munroe wrote:
> On Thu, 2016-11-17 at 15:46 +0100, Torvald Riegel wrote:
> > On Tue, 2016-11-15 at 21:42 +0530, Rajalakshmi Srinivasaraghavan wrote:
> > > The initial problem reported was memory corruption in MongoDB, seen only
> > > on Ubuntu 16.04 ppc64le (not on Ubuntu 15).
> > > The problem was not easy to reproduce; after debugging with many tools
> > > and creative ideas, it was narrowed down to lock elision in glibc.
> > > Ubuntu 16.04 ships glibc 2.23 with lock elision enabled by default, which
> > > is why the testcase fails only on 16.04.
> > > 
> > > As stated in BZ 20822, the short description is:
> > > 
> > > "The update of *adapt_count after the release of the lock causes a race
> > > condition. Thread A unlocks, thread B continues and returns, destroying
> > > the mutex on the stack, then enters another function; thread A writes to
> > > *adapt_count and corrupts the stack. The window is very narrow and
> > > requires that the machine be in SMT mode, so the two threads likely have
> > > to be on the same core in order to hit this."
> > > 
> > > Looking back at the file changes in
> > > sysdeps/unix/sysv/linux/powerpc/elision-unlock.c, the suspect is commit
> > > https://sourceware.org/git/?p=glibc.git;a=commitdiff;h=fadd2ad9cc36115440d50b0eae9299e65988917d, which was introduced to improve
> > > performance.
> > > 
> > > I am thinking of the following fixes.
> > > 
> > > 1) Update adapt_count before the unlock. But I have not yet determined
> > > whether this will affect performance.
> > 
> > Because of the requirements on mutex destruction (and same for rwlock
> > etc.), a thread must not access any data associated with the mutex after
> > having unlocked it (this includes reads because it's correct to also
> > unmap the memory holding the mutex after destroying the mutex). 
> > 
> > > 2) In elision-lock.c/elision-trylock.c update adapt_count right after 
> > > the lock acquire at the end of the function.
> > > In either case we have to be careful not to introduce another race 
> > > condition.
> > 
> > Every access to mutex data outside of transactions must use atomic
> > operations if there would be data races otherwise (according to the C11
> > memory model).  I know that this isn't followed in the current mutex
> > code, but it really should be, and your change qualifies as new code so
> > please take care of this too.
> > 
> In this case (with the proposed patch) the adapt_count update is within
> the critical region; adding C11 atomics does not add value and would
> generally make the logic harder to read.
> 
> __atomic_store_n (adapt_count, (__atomic_load_n(adapt_count,
> __ATOMIC_RELAXED) - 1), __ATOMIC_RELAXED)
> 
> is a lot of syntax without any real value.

First, it would actually be this in glibc:
atomic_store_relaxed (&adapt_count,
  atomic_load_relaxed (&adapt_count) - 1);

Second, it does add value in that it makes it clear that there are
concurrent accesses elsewhere that are not in transactions.  We have
consensus to use the new C11-like atomics in new or changed code, and I
don't see a reason to make an exception here.

