This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: shared data protection failed in pthread_cond_timedwait


On 2014/4/26 14:45, Yang Yingliang wrote:
> On 2014/4/25 17:43, Will Newton wrote:
>> On 25 April 2014 07:58, Yang Yingliang <yangyingliang@huawei.com> wrote:
>>> Hi,
>>>
>>>
>>> I have 22 threads waiting in pthread_cond_timedwait. When they are all woken up, I found
>>> that more than one thread can access the shared data in pthread_cond_timedwait.
>>>
>>> I added print messages with the following patch:
>>>
>>> --- libc/nptl/pthread_cond_timedwait.c
>>> +++ libc/nptl/pthread_cond_timedwait.c
>>> @@ -34,6 +34,7 @@
>>>  #else
>>>  # include <bits/libc-vdso.h>
>>>  #endif
>>> +#include <stdio.h>
>>>
>>>  /* Cleanup handler, defined in pthread_cond_wait.c.  */
>>>  extern void __condvar_cleanup (void *arg)
>>> @@ -235,7 +239,9 @@
>>>
>>>   bc_out:
>>>
>>> +printf("start do sub :%d, lock:%d %p\n", cond->__data.__nwaiters, cond->__data.__lock, pthread_self());
>>>    cond->__data.__nwaiters -= 1 << COND_NWAITERS_SHIFT;
>>> +printf("end do sub :%d, lock:%d %p\n", cond->__data.__nwaiters, cond->__data.__lock, pthread_self());
>>>
>>>    /* If pthread_cond_destroy was called on this variable already,
>>>       notify the pthread_cond_destroy caller all waiters have left
>>>
>>>
>>> I tested on Linux arma15el 3.10.37+ #2 SMP Fri Apr 25 11:23:25 CST 2014 armv7l GNU/Linux.
>>> Here is the result:
>>>
>>> start do sub :45, lock:1 0xb6d9a460
>>> end do sub :43, lock:1 0xb6d9a460
>>> start do sub :43, lock:1 0xb6d9e460
>>> end do sub :41, lock:2 0xb6d9e460
>>> start do sub :43, lock:2 0xb6dbe460    //two threads both access the shared data
>>> start do sub :41, lock:1 0xb6daa460
>>> end do sub :39, lock:2 0xb6daa460
>>> start do sub :39, lock:2 0xb6de6460
>>> end do sub :37, lock:2 0xb6de6460
>>> start do sub :37, lock:2 0xb6db6460
>>> end do sub :35, lock:2 0xb6db6460
>>> start do sub :35, lock:2 0xb6dc2460
>>> end do sub :33, lock:2 0xb6dc2460
>>> end do sub :37, lock:2 0xb6dbe460
>>> start do sub :33, lock:2 0xb6dc6460
>>> end do sub :31, lock:0 0xb6dc6460
>>> start do sub :31, lock:2 0xb6dae460
>>> end do sub :29, lock:2 0xb6dae460
>>> start do sub :29, lock:2 0xb6db2460
>>> end do sub :27, lock:2 0xb6db2460
>>> start do sub :27, lock:2 0xb6dba460
>>> end do sub :25, lock:2 0xb6dba460
>>> start do sub :25, lock:2 0xb6da2460
>>> end do sub :23, lock:2 0xb6da2460
>>>
>>> Is lll_lock (cond->__data.__lock, pshared) failing?
>>>
>>> pshared is LLL_SHARED.
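
For reference, the scenario above boils down to something like the following minimal sketch: N threads block in pthread_cond_timedwait on a process-shared condition variable (pshared == LLL_SHARED corresponds to PTHREAD_PROCESS_SHARED) and are woken by a single broadcast. Thread count, timeout and names are illustrative, not the original test case.

/* Minimal sketch: 22 waiters on a process-shared condvar, one broadcast.  */
#include <pthread.h>
#include <time.h>
#include <unistd.h>

#define NTHREADS 22

static pthread_mutex_t mutex;
static pthread_cond_t cond;
static int ready;

static void *
waiter (void *arg)
{
  struct timespec ts;
  clock_gettime (CLOCK_REALTIME, &ts);
  ts.tv_sec += 10;				/* Generous timeout.  */

  pthread_mutex_lock (&mutex);
  while (!ready)
    pthread_cond_timedwait (&cond, &mutex, &ts);
  pthread_mutex_unlock (&mutex);
  return NULL;
}

int
main (void)
{
  pthread_mutexattr_t ma;
  pthread_condattr_t ca;
  pthread_t th[NTHREADS];

  /* Process-shared, i.e. pshared == LLL_SHARED in the NPTL code.  */
  pthread_mutexattr_init (&ma);
  pthread_mutexattr_setpshared (&ma, PTHREAD_PROCESS_SHARED);
  pthread_mutex_init (&mutex, &ma);

  pthread_condattr_init (&ca);
  pthread_condattr_setpshared (&ca, PTHREAD_PROCESS_SHARED);
  pthread_cond_init (&cond, &ca);

  for (int i = 0; i < NTHREADS; i++)
    pthread_create (&th[i], NULL, waiter, NULL);

  sleep (1);					/* Let all threads reach the wait.  */
  pthread_mutex_lock (&mutex);
  ready = 1;
  pthread_cond_broadcast (&cond);		/* Wake all 22 waiters at once.  */
  pthread_mutex_unlock (&mutex);

  for (int i = 0; i < NTHREADS; i++)
    pthread_join (th[i], NULL);
  return 0;
}
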
>>
>> I have had a quick look at this and there is no obvious reason I can
>> see for this behaviour, unless there is some way that IO buffering
>> could cause the messages to be strangely interleaved. The other
>> alternative that may be worth investigating is whether or not
>> ldrex/strex is working correctly in your SMP system.
>>
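
A rough way to check the second point is to stress compare-and-swap directly from several threads and verify the final count. This is only a sketch using GCC's __atomic builtins, which should compile down to ldrex/strex for a 32-bit word on ARMv7; thread and iteration counts are arbitrary. A wrong final value would point at the exclusive monitor rather than at glibc.

/* Sketch: hammer one word with CAS from several threads; the count
   must come out exact if ldrex/strex is working on this SMP system.  */
#include <pthread.h>
#include <stdio.h>

#define NTHREADS   8
#define ITERATIONS 1000000

static int counter;

static void *
hammer (void *arg)
{
  for (int i = 0; i < ITERATIONS; i++)
    {
      int old = __atomic_load_n (&counter, __ATOMIC_RELAXED);
      /* Retry until the CAS succeeds; each success adds exactly 1.  */
      while (!__atomic_compare_exchange_n (&counter, &old, old + 1, 0,
					   __ATOMIC_ACQUIRE,
					   __ATOMIC_RELAXED))
	;
    }
  return NULL;
}

int
main (void)
{
  pthread_t th[NTHREADS];

  for (int i = 0; i < NTHREADS; i++)
    pthread_create (&th[i], NULL, hammer, NULL);
  for (int i = 0; i < NTHREADS; i++)
    pthread_join (th[i], NULL);

  printf ("expected %d, got %d\n", NTHREADS * ITERATIONS, counter);
  return 0;
}
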
> 
> After doing some investigation, it looks like atomic_compare_and_exchange_val_acq
> is not atomic. So two threads can both acquire the lock when the futex is 0. Is there
> something wrong with atomic_compare_and_exchange_val_acq?
> 
> #define __lll_lock(futex, private)					      \
>   ((void) ({								      \
>     int *__futex = (futex);						      \
>     if (__builtin_expect (atomic_compare_and_exchange_val_acq (__futex,       \
> 								1, 0), 0))    \
>       {									      \
> 	if (__builtin_constant_p (private) && (private) == LLL_PRIVATE)	      \
> 	  __lll_lock_wait_private (__futex);				      \
> 	else								      \
> 	  __lll_lock_wait (__futex, private);				      \
>       }									      \
>   }))
> 
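
To spell out what that fast path relies on: atomic_compare_and_exchange_val_acq must return the previous value of the futex, so only one of the competing threads can ever observe 0 and take the lock without entering the futex wait. Roughly, in plain C (a sketch, with GCC's __atomic_compare_exchange_n standing in for the glibc primitive; names are hypothetical):

/* Sketch of the semantics __lll_lock depends on: exactly one thread may
   see the previous value 0 and own the lock; everyone else must see a
   non-zero value and fall into the slow (futex wait) path.  */
static inline int
cas_val_acq (int *futex, int newval, int oldval)
{
  int expected = oldval;
  __atomic_compare_exchange_n (futex, &expected, newval, 0,
			       __ATOMIC_ACQUIRE, __ATOMIC_RELAXED);
  return expected;		/* Previous value; 0 means we took the lock.  */
}

static inline void
lock_sketch (int *futex)
{
  if (cas_val_acq (futex, 1, 0) != 0)
    {
      /* Lost the race: the futex was already 1 or 2, so this thread must
	 not touch the protected data until the wait path hands it the
	 lock, e.g. __lll_lock_wait (futex, private).  */
    }
}
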

I noticed that atomic_compare_and_exchange_val_acq is not atomic for ARM in glibc-2.18.
I will try glibc-2.19.

