This is the mail archive of the libc-help@sourceware.org mailing list for the glibc project.



Re: dead-lock in glibc


On Wed, 2017-03-15 at 21:54 -0400, Carlos O'Donell wrote:
> On Wed, Mar 15, 2017 at 4:35 PM, Joël Krähemann <jkraehemann@gmail.com> wrote:
> > * libc6 2.24-9
> 
> > Might it be that I was trying to do a recursive lock on a
> > non-recursive mutex? I was playing 64 beats with the notation
> > editor of GSequencer in an infinite loop. Suddenly it aborted
> > after some playback, approximately 3 to 4 minutes in.
> 
> No. The asserts are intended to indicate that internal consistency
> has been violated.
> 
> Recursively locking a non-recursive mutex should lead to the thread
> getting stuck forever, but not an assert.
> 
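> (A minimal sketch, not taken from GSequencer, of the two behaviors
> described above; the error-checking variant reports the problem
> cleanly, while the default variant just blocks:)
> 
>     #include <errno.h>
>     #include <pthread.h>
>     #include <stdio.h>
>     #include <string.h>
> 
>     int main(void)
>     {
>         pthread_mutexattr_t attr;
>         pthread_mutex_t m;
> 
>         /* Error-checking mutex: relocking fails with EDEADLK. */
>         pthread_mutexattr_init(&attr);
>         pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
>         pthread_mutex_init(&m, &attr);
>         pthread_mutex_lock(&m);
>         int e = pthread_mutex_lock(&m);
>         printf("second lock: %s\n", strerror(e));  /* EDEADLK */
> 
>         /* Default (non-recursive) mutex: the second lock would
>            simply block the thread forever -- no assert, no error:
>            pthread_mutex_t d = PTHREAD_MUTEX_INITIALIZER;
>            pthread_mutex_lock(&d);
>            pthread_mutex_lock(&d);   <- stuck here              */
>         return 0;
>     }
> 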
> >>> gsequencer: ../nptl/pthread_mutex_lock.c:349:
> >>> __pthread_mutex_lock_full: Assertion `INTERNAL_SYSCALL_ERRNO (e,
> >>> __err) != EDEADLK || (kind != PTHREAD_MUTEX_ERRORCHECK_NP && kind !=
> >>> PTHREAD_MUTEX_RECURSIVE_NP)' failed.
> >>> Aborted
> 
> We've had a failure in the futex syscall, but that should not by
> itself trigger an assert.
> 
> The failure was either "no thread found" (ESRCH) or "deadlock"
> (EDEADLK).
> 
> The assert triggers when we get "deadlock" from the kernel but the
> mutex was error-checking or recursive. Internally we don't ever
> expect to get "deadlock" from the kernel for these kinds of mutexes,
> so getting one indicates an algorithmic problem.
> 
> It's an algorithmic problem because earlier code should have detected
> we owned the mutex in the recursive case, bumped the ownership
> counter, and returned zero.
> 
> It's an algorithmic problem because earlier code should have detected
> we owned the mutex in the error checking case, and should have
> returned EDEADLK without making any futex syscalls.
> 
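> (Schematically -- a simplified model, not the actual
> pthread_mutex_lock code, with made-up field and function names:)
> 
>     #include <errno.h>
>     #include <pthread.h>
>     #include <sys/types.h>
> 
>     /* Illustrative model of the checks that must run before any
>        futex syscall is made. */
>     struct model_mutex { pid_t owner; int count; int kind; };
> 
>     static int model_lock_fastpath(struct model_mutex *m, pid_t me)
>     {
>         if (m->owner == me) {
>             if (m->kind == PTHREAD_MUTEX_RECURSIVE_NP) {
>                 m->count++;       /* bump the ownership counter */
>                 return 0;         /* success, no syscall */
>             }
>             if (m->kind == PTHREAD_MUTEX_ERRORCHECK_NP)
>                 return EDEADLK;   /* report it, no syscall */
>         }
>         return -1;                /* not ours: fall through to the
>                                      futex slow path */
>     }
> 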
> So we didn't own the mutex and an attempt to acquire it determined it
> was locked by someone else (not us), and then the kernel returned
> EDEADLK, which doesn't make sense because we didn't own it to begin
> with!
> 
> It points to a kernel or glibc issue with PI mutexes.
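> 
> (For reference, the kernel-side check can be observed directly.
> A Linux-specific sketch, not glibc code: if the futex word already
> contains the caller's TID, FUTEX_LOCK_PI fails with EDEADLK.)
> 
>     #define _GNU_SOURCE
>     #include <errno.h>
>     #include <linux/futex.h>
>     #include <stdint.h>
>     #include <stdio.h>
>     #include <string.h>
>     #include <sys/syscall.h>
>     #include <unistd.h>
> 
>     int main(void)
>     {
>         /* Put our own TID in the futex word, as if we held it. */
>         uint32_t futex_word = (uint32_t) syscall(SYS_gettid);
> 
>         /* The kernel sees the caller as the owner: EDEADLK. */
>         long r = syscall(SYS_futex, &futex_word, FUTEX_LOCK_PI,
>                          0, NULL, NULL, 0);
>         if (r == -1)
>             printf("FUTEX_LOCK_PI: %s\n", strerror(errno));
>         return 0;
>     }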

It may, but not necessarily.  For example, the load of __lock that
handles the recursive/error-checking case is a separate access from the
CAS, so something else may have changed __lock in the meantime (e.g., a
bug in the application).
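
As a contrived sketch (nothing to do with GSequencer's actual code),
an application bug of that kind could be as simple as an overflow
into the bytes of an adjacent mutex:

    #include <pthread.h>
    #include <string.h>

    struct session {
        char name[16];
        pthread_mutex_t lock;   /* sits right after the buffer */
    };

    void set_name(struct session *s, const char *input)
    {
        strcpy(s->name, input); /* no bounds check: a long input
                                   scribbles over s->lock, including
                                   its lock word and owner TID */
    }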

A reproducer would be really helpful.  If we can't get one, we'd at
least need some information about the affected mutex: its kind, how
it's used by the program, etc.
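
One way to capture that information (illustrative only, with a
hypothetical wrapper name) is to initialize mutexes through a helper
that logs the type and protocol, so a report can state exactly what
kind of mutex was involved:

    #include <pthread.h>
    #include <stdio.h>

    static int traced_mutex_init(pthread_mutex_t *m, int type,
                                 int protocol)
    {
        pthread_mutexattr_t a;
        pthread_mutexattr_init(&a);
        pthread_mutexattr_settype(&a, type);         /* e.g. PTHREAD_MUTEX_ERRORCHECK */
        pthread_mutexattr_setprotocol(&a, protocol); /* e.g. PTHREAD_PRIO_INHERIT */
        fprintf(stderr, "mutex %p: type=%d protocol=%d\n",
                (void *) m, type, protocol);
        int r = pthread_mutex_init(m, &a);
        pthread_mutexattr_destroy(&a);
        return r;
    }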

