This is the mail archive of the libc-help@sourceware.org mailing list for the glibc project.
Re: Weird behavior observed with NPTL semaphores
- From: Adhemerval Zanella <azanella at linux dot vnet dot ibm dot com>
- To: libc-help at sourceware dot org
- Date: Wed, 12 Nov 2014 22:43:30 -0200
- Subject: Re: Weird behavior observed with NPTL semaphores
- References: <86064E213FDE854885686AF0C4584029A76CB4A1A8 at ONWVEXCHMB02 dot ciena dot com>
Hi
On 30-10-2014 18:37, Tetreault, Francois wrote:
> Hello,
>
> We have questions about the glibc Native POSIX Thread Library (NPTL).
>
> We have an application with a few threads that use mutexes to arbitrate access to shared data.
> The contents of the mutex object are shown below.
> mMutex = {
>   __data = {
>     __lock = -2147473878,
>     __count = 0,
>     __owner = 0,
>     __kind = 33,
>     __nusers = 0,
>     {
>       __spins = 0,
>       __list = {
>         __next = 0x0
>       }
>     }
>   },
>   __size = "\200\000&*", '\000' <repeats 11 times>, "!\000\000\000\000\000\000\000",
>   __align = -2147473878
> }
>
> Here __kind = 33 decodes to PTHREAD_MUTEX_PI_RECURSIVE_NP, given:
> #define PTHREAD_MUTEX_TYPE(m) ((m)->__data.__kind & 127)
>
> PTHREAD_MUTEX_PRIO_INHERIT_NP = 32
> PTHREAD_MUTEX_RECURSIVE_NP = 1
> PTHREAD_MUTEX_PI_RECURSIVE_NP = PTHREAD_MUTEX_PRIO_INHERIT_NP | PTHREAD_MUTEX_RECURSIVE_NP
>
> A problem occurs very rarely: the code fails to release the mutex, complaining that the mutex is not owned by any thread when it tries to unlock it.
> We added our own instrumentation in the hope of understanding what is going on; see our trace below.
> Caution: our tracing is not perfect because it is not reentrant, so a thread could easily be preempted while capturing the data.
> Also note that, in our trace:
> . "pre" is the value of the fields prior to the mutex operation, and "post" is afterwards.
> . MUTEX_GIVE is a call to pthread_mutex_unlock(), and
> . MUTEX_TAKE is a call to pthread_mutex_lock().
>
> { [trace 1]
> calling_task = 3659,
> action = MUTEX_GIVE,
> pre_count = 1,
> pre_owner = 3659,
> post_count = 0,
> post_owner = 0
> }, { [trace 2]
> calling_task = 4690,
> action = MUTEX_TAKE,
> pre_count = 0,
> pre_owner = 0,
> post_count = 1,
> post_owner = 4690
> }, { [trace 3]
> calling_task = 3659,
> action = MUTEX_TAKE,
> pre_count = 1,
> pre_owner = 4690,
> post_count = 1,
> post_owner = 3659
> }, { [trace 4]
> calling_task = 4690,
> action = MUTEX_GIVE,
> pre_count = 1,
> pre_owner = 4690,
> post_count = 0,
> post_owner = 0
> }, { [trace 5]
> calling_task = 3659,
> action = MUTEX_GIVE,
> pre_count = 0,
> pre_owner = 0,
> post_count = 0,
> post_owner = 0
> }, { [trace 6]
> calling_task = 4690,
> action = MUTEX_TAKE,
> pre_count = 0,
> pre_owner = 0,
> post_count = 0,
> post_owner = 0
> }, { [trace 7]
> calling_task = 3659,
> action = MUTEX_TAKE,
> pre_count = 0,
> pre_owner = 0,
> post_count = 1,
> post_owner = 0
> }, { [trace 8]
> calling_task = 3659,
> action = MUTEX_GIVE,
> pre_count = 1,
> pre_owner = 0,
> post_count = 1,
> post_owner = 0
> }
>
> After [trace 8], the mutex contents are as follows:
> mMutex = {
>   __data = {
>     __lock = -2147479989,
>     __count = 1,
>     __owner = 0,
>     __kind = 33,
>     __nusers = 0,
>     {
>       __spins = 0,
>       __list = {
>         __next = 0x0
>       }
>     }
>   },
>   __size = "\200\000\016K\000\000\000\001\000\000\000\000\000\000\000!\000\000\000\000\000\000\000",
>   __align = -2147479989
> }
> }
>
> The trace data raised more questions than it answered.
>
> 1. Is it ever a valid state for count to be greater than 0 while owner is 0?
> 2. Note that our code asserts if either pthread_mutex_unlock() or pthread_mutex_lock() returns an error code.
> 3. In [trace 5], on entry (pre) we expected the mutex to be owned by 3659, but both count and owner are 0.
> 4. From this point on, the trace contents no longer make sense. Yet our code only asserts when it reaches [trace 8]!
> 5. Also notice that the owner field is always 0 from [trace 5] onwards.
> 6. Are there any known bugs that could lead to this weird behavior?
>
> System information:
> . Linux Kernel version: 3.4.36
> . Glibc version: 2.9 "stable"
> . GCC version: powerpc-e500-linux-gnuspe-gcc (GCC) 4.6.3
> . Processor: Freescale MPC8572
> . Mode of operation: Symmetric Multi-Processing (SMP)
>
> Thank you,
> Francois
>
>
Your GLIBC version seems quite old compared to both your kernel and GCC. Have you
tried a newer GLIBC? I am not aware of any powerpc bugs related to pthreads,
but given the GLIBC version I would not rule one out. Also, I think there have been some fixes
for PTHREAD_MUTEX_PI_RECURSIVE_NP in more recent versions.