This is the mail archive of the
libc-help@sourceware.org
mailing list for the glibc project.
Re: dead-lock in glibc
- From: "Carlos O'Donell" <carlos at systemhalted dot org>
- To: jkraehemann-guest at users dot alioth dot debian dot org
- Cc: "libc-help at sourceware dot org" <libc-help at sourceware dot org>, Torvald Riegel <triegel at redhat dot com>
- Date: Wed, 15 Mar 2017 21:54:04 -0400
- Subject: Re: dead-lock in glibc
- References: <CA+Owze40Onq_uZs2wOjY=O5Xv3D75Ce_b7Sf5qEjMZ-bAnW_wA@mail.gmail.com> <CAE2sS1gXkrLAZf2o54QSkE_fqFMrSd987nP=QYRe=GQEdq26_w@mail.gmail.com> <CA+Owze6vtqJ4jURD2H4fouw5izePVaQ9iun2LCLQ+HqwVvkvWw@mail.gmail.com>
On Wed, Mar 15, 2017 at 4:35 PM, Joël Krähemann <jkraehemann@gmail.com> wrote:
> * libc6 2.24-9
> Might be I was trying to do a recursive lock on a non-recursive mutex?
> I was playing 64 beats with the notation editor of GSequencer in an infinite
> loop. Suddenly it aborted after some playback, approximately 3 to 4 minutes.
No. The asserts are intended to indicate internal consistency is violated.
Recursively locking a non-recursive mutex should lead to the thread
getting stuck forever, but not an assert.
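To illustrate the "stuck forever" behaviour without actually hanging, here is a minimal sketch that relocks a PTHREAD_MUTEX_NORMAL mutex from the owning thread using pthread_mutex_timedlock, so the wait ends at a deadline. The helper name relock_normal_with_timeout and the timeout parameter are made up for this illustration; nothing here is taken from the reporter's program.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <time.h>

/* Hypothetical helper (illustration only): lock a PTHREAD_MUTEX_NORMAL
   mutex, then try to lock it again from the same thread with a timeout.
   Returns the result of the second attempt, or -1 if setup failed. */
int relock_normal_with_timeout(long timeout_ms)
{
    pthread_mutexattr_t attr;
    pthread_mutex_t m;
    struct timespec deadline;
    int rc;

    if (pthread_mutexattr_init(&attr) != 0)
        return -1;
    if (pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_NORMAL) != 0 ||
        pthread_mutex_init(&m, &attr) != 0) {
        pthread_mutexattr_destroy(&attr);
        return -1;
    }
    pthread_mutexattr_destroy(&attr);

    pthread_mutex_lock(&m);  /* first lock succeeds */

    /* pthread_mutex_timedlock expects an absolute CLOCK_REALTIME deadline. */
    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec += timeout_ms / 1000;
    deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
    if (deadline.tv_nsec >= 1000000000L) {
        deadline.tv_sec += 1;
        deadline.tv_nsec -= 1000000000L;
    }

    /* The second lock waits on ourselves; no assert, just a wait that
       only ends because we supplied a deadline. */
    rc = pthread_mutex_timedlock(&m, &deadline);

    pthread_mutex_unlock(&m);
    pthread_mutex_destroy(&m);
    return rc;
}
```

With a plain pthread_mutex_lock in place of the timed variant, the second call would never return; with the timeout it should come back with ETIMEDOUT.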
>>> gsequencer: ../nptl/pthread_mutex_lock.c:349:
>>> __pthread_mutex_lock_full: Assertion `INTERNAL_SYSCALL_ERRNO (e,
>>> __err) != EDEADLK || (kind != PTHREAD_MUTEX_ERRORCHECK_NP && kind !=
>>> PTHREAD_MUTEX_RECURSIVE_NP)' failed.
>>> Aborted
We've had a failure in the futex syscall, but that should not by
itself trigger an assert.
The failure was either "no thread found" (ESRCH) or "deadlock" (EDEADLK).
The assert triggers when we get "deadlock" from the kernel but the
mutex was error-checking or recursive. Internally we don't ever expect
to get "deadlock" from the kernel for these kinds of mutexes, so
getting it indicates an algorithmic problem.
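The condition in the quoted assertion can be restated as a small predicate. This is only a sketch for reading the expression, not glibc code: the function name edeadlk_assert_holds is hypothetical, and it uses the portable PTHREAD_MUTEX_ERRORCHECK and PTHREAD_MUTEX_RECURSIVE names in place of the _NP spellings that appear in the assert message.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical restatement of the assertion's condition (not glibc code).
   err  : error number reported for the failed futex syscall
   kind : mutex type
   Returns true when the assertion would hold, false when it would fire. */
bool edeadlk_assert_holds(int err, int kind)
{
    return err != EDEADLK
           || (kind != PTHREAD_MUTEX_ERRORCHECK
               && kind != PTHREAD_MUTEX_RECURSIVE);
}
```

Read this way, the only failing combinations are EDEADLK together with an error-checking or recursive mutex; any other error, or EDEADLK on a normal mutex, passes this particular check.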
It's an algorithmic problem because earlier code should have detected
we owned the mutex in the recursive case, bumped the ownership
counter, and returned zero.
It's an algorithmic problem because earlier code should have detected
we owned the mutex in the error-checking case, and should have
returned EDEADLK without making any futex syscalls.
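The two expected outcomes described above can be observed from user code. The following is a minimal sketch, assuming an ordinary (non-PI) mutex; relock_result is a hypothetical helper written for this illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>

/* Hypothetical helper (illustration only): lock a mutex of the given type
   twice from the same thread and return the result of the second lock,
   or -1 if setup failed.
   Only call this with PTHREAD_MUTEX_RECURSIVE or PTHREAD_MUTEX_ERRORCHECK;
   with PTHREAD_MUTEX_NORMAL the second lock would block forever. */
int relock_result(int type)
{
    pthread_mutexattr_t attr;
    pthread_mutex_t m;
    int rc;

    if (pthread_mutexattr_init(&attr) != 0)
        return -1;
    if (pthread_mutexattr_settype(&attr, type) != 0 ||
        pthread_mutex_init(&m, &attr) != 0) {
        pthread_mutexattr_destroy(&attr);
        return -1;
    }
    pthread_mutexattr_destroy(&attr);

    pthread_mutex_lock(&m);       /* first lock: we become the owner */
    rc = pthread_mutex_lock(&m);  /* second lock by the same thread */

    if (rc == 0)
        pthread_mutex_unlock(&m); /* undo the recursive acquisition */
    pthread_mutex_unlock(&m);     /* release the original lock */
    pthread_mutex_destroy(&m);
    return rc;
}
```

relock_result(PTHREAD_MUTEX_RECURSIVE) should give 0 and relock_result(PTHREAD_MUTEX_ERRORCHECK) should give EDEADLK, which is the behaviour the assert assumes has already been handled before any futex syscall is made.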
So we didn't own the mutex and an attempt to acquire it determined it
was locked by someone else (not us), and then the kernel returned
EDEADLK, which doesn't make sense because we didn't own it to begin
with!
It points to a kernel or glibc issue with PI mutexes.
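For reference, a PI mutex is one whose protocol attribute is PTHREAD_PRIO_INHERIT. The sketch below shows one way such a mutex might be set up with a given type; init_pi_mutex is a hypothetical helper, and nothing here is known about how GSequencer actually creates its mutexes.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>

/* Hypothetical helper (illustration only): initialise *m as a
   priority-inheritance mutex of the given type, for example
   PTHREAD_MUTEX_ERRORCHECK or PTHREAD_MUTEX_RECURSIVE.
   Returns 0 on success or the error number from the failing call. */
int init_pi_mutex(pthread_mutex_t *m, int type)
{
    pthread_mutexattr_t attr;
    int rc;

    rc = pthread_mutexattr_init(&attr);
    if (rc != 0)
        return rc;

    rc = pthread_mutexattr_settype(&attr, type);
    if (rc == 0)
        rc = pthread_mutexattr_setprotocol(&attr, PTHREAD_PRIO_INHERIT);
    if (rc == 0)
        rc = pthread_mutex_init(m, &attr);

    pthread_mutexattr_destroy(&attr);
    return rc;
}
```

A mutex created this way with an error-checking or recursive type is the combination the failing assert is concerned with.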
Cheers,
Carlos.