This is the mail archive of the libc-help@sourceware.org mailing list for the glibc project.


Re: dead-lock in glibc

From: "Carlos O'Donell" <carlos at systemhalted dot org>
To: jkraehemann-guest at users dot alioth dot debian dot org
Cc: "libc-help at sourceware dot org" <libc-help at sourceware dot org>, Torvald Riegel <triegel at redhat dot com>
Date: Wed, 15 Mar 2017 21:54:04 -0400
Subject: Re: dead-lock in glibc
Authentication-results: sourceware.org; auth=none
References: <CA+Owze40Onq_uZs2wOjY=O5Xv3D75Ce_b7Sf5qEjMZ-bAnW_wA@mail.gmail.com> <CAE2sS1gXkrLAZf2o54QSkE_fqFMrSd987nP=QYRe=GQEdq26_w@mail.gmail.com> <CA+Owze6vtqJ4jURD2H4fouw5izePVaQ9iun2LCLQ+HqwVvkvWw@mail.gmail.com>

On Wed, Mar 15, 2017 at 4:35 PM, Joël Krähemann <jkraehemann@gmail.com> wrote:
> * libc6 2.24-9

> Might I have been trying to do a recursive lock on a non-recursive mutex?
> I was playing 64 beats with the notation editor of GSequencer in an infinite
> loop. Suddenly it aborted after approximately 3 to 4 minutes of playback.

No. The asserts are intended to indicate that internal consistency has
been violated.

Recursively locking a non-recursive mutex should lead to the thread
getting stuck forever, but not an assert.
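
To illustrate the difference between blocking and asserting, here is a
minimal sketch. The helper name and the 100 ms deadline are made up for
the example, and it uses pthread_mutex_timedlock so that it terminates
instead of hanging. On glibc I would expect the second attempt on a
PTHREAD_MUTEX_NORMAL mutex to time out rather than abort:

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <time.h>

/* Hypothetical helper: lock a PTHREAD_MUTEX_NORMAL mutex, then try to lock
   it again from the same thread with a 100 ms deadline.  Returns the result
   of the second (timed) lock attempt.  */
int relock_normal_mutex_with_timeout(void)
{
    pthread_mutexattr_t attr;
    pthread_mutex_t m;
    struct timespec deadline;
    int rc;

    rc = pthread_mutexattr_init(&attr);
    assert(rc == 0);
    rc = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_NORMAL);
    assert(rc == 0);
    rc = pthread_mutex_init(&m, &attr);
    assert(rc == 0);
    pthread_mutexattr_destroy(&attr);

    rc = pthread_mutex_lock(&m);            /* first lock succeeds */
    assert(rc == 0);
    (void)rc;

    /* Absolute CLOCK_REALTIME deadline 100 ms from now.  */
    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_nsec += 100L * 1000L * 1000L;
    if (deadline.tv_nsec >= 1000000000L) {
        deadline.tv_sec += 1;
        deadline.tv_nsec -= 1000000000L;
    }

    /* A plain pthread_mutex_lock here would block forever; the timed
       variant gives up at the deadline instead.  */
    int second = pthread_mutex_timedlock(&m, &deadline);

    pthread_mutex_unlock(&m);
    pthread_mutex_destroy(&m);
    return second;
}
```

Build with -pthread. The point is only that the thread waits; nothing in
this path should abort the process.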

>>> gsequencer: ../nptl/pthread_mutex_lock.c:349:
>>> __pthread_mutex_lock_full: Assertion `INTERNAL_SYSCALL_ERRNO (e,
>>> __err) != EDEADLK || (kind != PTHREAD_MUTEX_ERRORCHECK_NP && kind !=
>>> PTHREAD_MUTEX_RECURSIVE_NP)' failed.
>>> Aborted

We've had a failure in the futex syscall, but that should not by
itself trigger an assert.

The failure was either "no thread found" (ESRCH) or "deadlock" (EDEADLK).

The assert triggers when we get "deadlock" from the kernel but the
mutex was error-checking or recursive. Internally we don't ever expect
to get "deadlock" from the kernel for these kinds of mutexes, so
getting it indicates an algorithmic problem.
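
The condition in the quoted assertion can be modelled as a small
predicate. This is only a sketch of the logic, not the glibc source: the
function names are made up, and it uses the portable
PTHREAD_MUTEX_ERRORCHECK and PTHREAD_MUTEX_RECURSIVE constants in place of
the _NP names that appear in the assertion text.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical model of the assertion: returns true when the assertion
   would hold for a given futex error and mutex kind.  */
bool pi_lock_assertion_holds(int futex_errno, int kind)
{
    return futex_errno != EDEADLK
           || (kind != PTHREAD_MUTEX_ERRORCHECK
               && kind != PTHREAD_MUTEX_RECURSIVE);
}

/* A few cases spelled out.  */
void pi_lock_assertion_examples(void)
{
    /* EDEADLK on a normal mutex: the assertion holds.  */
    assert(pi_lock_assertion_holds(EDEADLK, PTHREAD_MUTEX_NORMAL));
    /* EDEADLK on a recursive or error-checking mutex: it fails,
       which is the abort reported here.  */
    assert(!pi_lock_assertion_holds(EDEADLK, PTHREAD_MUTEX_RECURSIVE));
    assert(!pi_lock_assertion_holds(EDEADLK, PTHREAD_MUTEX_ERRORCHECK));
    /* Any other error leaves this particular assertion satisfied.  */
    assert(pi_lock_assertion_holds(ESRCH, PTHREAD_MUTEX_RECURSIVE));
}
```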

It's an algorithmic problem because earlier code should have detected
we owned the mutex in the recursive case, bumped the ownership
counter, and returned zero.

It's an algorithmic problem because earlier code should have detected
we owned the mutex in the error checking case, and should have
returned EDEADLK without making any futex syscalls.
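
The two expected behaviours can be observed from user code with a short
sketch like the following. The helper name is made up for illustration,
and it only exercises the documented pthread API, not the glibc internals:

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical helper: lock a mutex of the given type twice from the same
   thread and return the result of the second pthread_mutex_lock call.
   Only call this with PTHREAD_MUTEX_RECURSIVE or PTHREAD_MUTEX_ERRORCHECK;
   other types may block forever on the second lock.  */
int second_lock_result(int type)
{
    pthread_mutexattr_t attr;
    pthread_mutex_t m;
    int rc;

    rc = pthread_mutexattr_init(&attr);
    assert(rc == 0);
    rc = pthread_mutexattr_settype(&attr, type);
    assert(rc == 0);
    rc = pthread_mutex_init(&m, &attr);
    assert(rc == 0);
    pthread_mutexattr_destroy(&attr);

    rc = pthread_mutex_lock(&m);            /* first lock: we become owner */
    assert(rc == 0);
    (void)rc;

    int second = pthread_mutex_lock(&m);    /* relock by the owner */
    if (second == 0)
        pthread_mutex_unlock(&m);           /* undo the recursive acquisition */

    pthread_mutex_unlock(&m);
    pthread_mutex_destroy(&m);
    return second;
}
```

With PTHREAD_MUTEX_RECURSIVE the second call should return 0, and with
PTHREAD_MUTEX_ERRORCHECK it should return EDEADLK. Build with -pthread.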

So we didn't own the mutex and an attempt to acquire it determined it
was locked by someone else (not us), and then the kernel returned
EDEADLK, which doesn't make sense because we didn't own it to begin
with!

It points to a kernel or glibc issue with PI mutexes.
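
For reference, a PI (priority-inheritance) mutex is one whose attribute
has the PTHREAD_PRIO_INHERIT protocol set. The sketch below creates a
recursive PI mutex and relocks it from the owning thread. The helper name
is made up, and I am not assuming this is how GSequencer sets up its
mutexes. Some systems may report ENOTSUP for the PI protocol, so the
helper passes any setup error back to the caller:

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <errno.h>
#include <pthread.h>

/* Hypothetical helper: create a recursive PI mutex, lock it twice from the
   same thread, and return 0 if both locks succeeded.  Returns the error
   code of the first step that failed otherwise.  */
int pi_recursive_relock(void)
{
    pthread_mutexattr_t attr;
    pthread_mutex_t m;

    int rc = pthread_mutexattr_init(&attr);
    if (rc != 0)
        return rc;

    rc = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_RECURSIVE);
    if (rc == 0)
        rc = pthread_mutexattr_setprotocol(&attr, PTHREAD_PRIO_INHERIT);
    if (rc == 0)
        rc = pthread_mutex_init(&m, &attr);
    pthread_mutexattr_destroy(&attr);
    if (rc != 0)
        return rc;                          /* e.g. ENOTSUP */

    rc = pthread_mutex_lock(&m);            /* we become the owner */
    if (rc == 0) {
        int second = pthread_mutex_lock(&m);    /* relock by the owner */
        if (second == 0)
            pthread_mutex_unlock(&m);
        pthread_mutex_unlock(&m);
        rc = second;
    }

    pthread_mutex_destroy(&m);
    return rc;
}
```

In a healthy setup the relock should simply succeed, which is why an
EDEADLK from the kernel on such a mutex is unexpected.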

Cheers,
Carlos.

Follow-Ups:
- Re: dead-lock in glibc
  - From: Joël Krähemann
- Re: dead-lock in glibc
  - From: Torvald Riegel

References:
- dead-lock in glibc
  - From: Joël Krähemann
- Re: dead-lock in glibc
  - From: Carlos O'Donell
- Re: dead-lock in glibc
  - From: Joël Krähemann
