This is the mail archive of the libc-help@sourceware.org mailing list for the glibc project.



Re: dead-lock in glibc


Hi

I just ran the test again; this time it hung at a different point.

Bests,
Joël


On Thu, Mar 16, 2017 at 7:30 AM, Joël Krähemann <jkraehemann@gmail.com> wrote:
> Hi Carlos
>
> Thank you for the hints. If you need additional information please let me know.
>
> regards,
> Joël
>
>
> On Thu, Mar 16, 2017 at 2:54 AM, Carlos O'Donell
> <carlos@systemhalted.org> wrote:
>> On Wed, Mar 15, 2017 at 4:35 PM, Joël Krähemann <jkraehemann@gmail.com> wrote:
>>> * libc6 2.24-9
>>
>>> Might it be that I was trying to do a recursive lock on a non-recursive mutex?
>>> I was playing 64 beats with the notation editor of GSequencer in an infinite
>>> loop. Suddenly it aborted after approximately 3 to 4 minutes of playback.
>>
>> No. The asserts are intended to indicate internal consistency is violated.
>>
>> Recursively locking a non-recursive mutex should lead to the thread
>> getting stuck forever, but not an assert.
>>
>>>>> gsequencer: ../nptl/pthread_mutex_lock.c:349:
>>>>> __pthread_mutex_lock_full: Assertion `INTERNAL_SYSCALL_ERRNO (e,
>>>>> __err) != EDEADLK || (kind != PTHREAD_MUTEX_ERRORCHECK_NP && kind !=
>>>>> PTHREAD_MUTEX_RECURSIVE_NP)' failed.
>>>>> Aborted
>>
>> We've had a failure in the futex syscall, but that should not by
>> itself trigger an assert.
>>
>> The failure was either "no thread found" or "deadlock".
>>
>> The assert triggers when we get "deadlock" from the kernel but the
>> mutex was error-checking or recursive. Internally we don't ever expect
>> to get "deadlock" from the kernel for these kinds of mutexes, so
>> getting it indicates an algorithmic problem.
>>
>> It's an algorithmic problem because earlier code should have detected
>> we owned the mutex in the recursive case, bumped the ownership
>> counter, and returned zero.
>>
>> It's an algorithmic problem because earlier code should have detected
>> we owned the mutex in the error checking case, and should have
>> returned EDEADLK without making any futex syscalls.
>>
>> So we didn't own the mutex and an attempt to acquire it determined it
>> was locked by someone else (not us), and then the kernel returned
>> EDEADLK, which doesn't make sense because we didn't own it to begin
>> with!
>>
>> It points to a kernel or glibc issue with PI mutexes.
>>
>> Cheers,
>> Carlos.

Attachment: ags_functional_log
Description: Binary data

