This is the mail archive of the glibc-bugs@sourceware.org mailing list for the glibc project.



[Bug nptl/13165] pthread_cond_wait() can consume a signal that was sent before it started waiting


http://sourceware.org/bugzilla/show_bug.cgi?id=13165

--- Comment #5 from Mihail Mihaylov <mihaylov.mihail at gmail dot com> 2011-09-25 21:32:35 UTC ---
Created attachment 5945
  --> http://sourceware.org/bugzilla/attachment.cgi?id=5945
Test to observe the race

Attaching a self-contained test. Here is what the test does:

We have a mutex and a condition variable. We also have several auxiliary
condition variables and counters.

The main thread locks the mutex and creates as many waiter threads as possible.
The waiter threads start by blocking on the mutex. Then both the main thread and
the waiter threads start looping to perform iterations of the test until the
race condition in NPTL is hit.
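
The setup described above can be sketched roughly as follows. This is not the
code from the attachment: the names spawn_waiters, waiter and started, and the
MAX_WAITERS cap, are hypothetical and chosen only for illustration (the attached
test creates as many threads as it can). The sketch only shows that waiters
created while the main thread holds the mutex cannot get past their first lock
until the mutex is released.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define MAX_WAITERS 64  /* hypothetical cap; the real test has no fixed limit */

static pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
static int started;     /* waiters that got past the mutex; protected by mutex */

static void *waiter(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&mutex);    /* blocks while the main thread holds the mutex */
    started++;
    pthread_mutex_unlock(&mutex);
    return NULL;
}

/* Creates waiters while holding the mutex, then releases them.
   Returns the number of waiters that were created and joined. */
int spawn_waiters(void)
{
    pthread_t threads[MAX_WAITERS];
    int count = 0;
    int i;

    pthread_mutex_lock(&mutex);
    /* Keep creating threads until the cap is reached or creation fails. */
    while (count < MAX_WAITERS &&
           pthread_create(&threads[count], NULL, waiter, NULL) == 0)
        count++;

    /* No waiter can have acquired the mutex yet. */
    assert(started == 0);
    pthread_mutex_unlock(&mutex);

    for (i = 0; i < count; i++)
        pthread_join(threads[i], NULL);

    /* Joining synchronizes with the waiters, so this read is safe. */
    assert(started == count);
    return count;
}
```

Calling spawn_waiters() once from main() should be enough to try it; build
with -pthread.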

The loops of the main thread and the waiter threads are synchronized and go
like this:

1) The main thread starts by releasing the mutex and blocking on an auxiliary
condvar. This unblocks the waiter threads which start by entering the first
wait on the condition variable. Each waiter thread increments a waiters counter
before waiting and the last one also signals the auxiliary condvar to notify
the main thread that all waiters are blocked on the first wait.

2) When all waiters are blocked on the first wait, the main thread is unblocked
and starts sending signals. It sends as many signals as there are waiters, so
all waiters should move (eventually) beyond the first wait. The main thread
holds the mutex while sending the signals. The 'releaseMutexBetweenSignals'
constant controls whether it will release and reacquire the mutex between
signals.

3) Each unblocked waiter decrements the waiters counter and moves to the second
wait. To simplify the test, the waiters don't enter the second wait until all
signals from step 2 have been sent. This is controlled through a sent signals
counter and another auxiliary condvar.

4) After the main thread has sent all signals, it starts waiting for at least
two waiters to block on the second wait. This is facilitated by a counter of
the threads that have reached the second wait and one more auxiliary condvar.

5) When at least two threads have blocked on the second wait, the main thread
sends one more signal. Threads that get unblocked from the second wait may
start a third wait to allow the test iteration to complete before they loop
back to the first wait. (Of course, this actually happens only when the main
thread releases the mutex in step 6.)

6) The main thread starts waiting for all waiters to exit the first wait. Each
waiter that exits the first wait decrements the waiters count and the last one
signals the last auxiliary condvar, on which the main thread waits. If the wait
times out, the test has failed; otherwise it has passed.

7) If the test has passed, all waiters are waiting on the condition variable in
the second wait or the third wait, so the main thread sends a broadcast to
unblock them and all waiters move back to the first wait. With this the test
iteration is complete and a new iteration begins.
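
As an illustration of steps 1 and 2 only, here is a minimal sketch of the
handshake around the first wait. It is not the attached test and it does not
try to reproduce the race (there is no second wait). All identifiers
(run_first_wait_round, waiter_thread, arrived, all_blocked, MAX_WAITERS) are
hypothetical. The separate 'arrived' counter is an addition of the sketch, so
that the handshake stays correct even if a waiter wakes up spuriously.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define MAX_WAITERS 8   /* hypothetical cap to keep the sketch small */

static pthread_mutex_t mtx = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t cond = PTHREAD_COND_INITIALIZER;        /* condvar under test */
static pthread_cond_t all_blocked = PTHREAD_COND_INITIALIZER; /* auxiliary condvar */
static int expected;    /* number of waiters created; protected by mtx */
static int arrived;     /* waiters that have reached the first wait */
static int waiters;     /* waiters currently inside the first wait */

static void *waiter_thread(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&mtx);
    waiters++;
    if (++arrived == expected)
        pthread_cond_signal(&all_blocked);  /* last one in notifies the main thread */
    pthread_cond_wait(&cond, &mtx);         /* first wait, deliberately without a predicate */
    waiters--;
    pthread_mutex_unlock(&mtx);
    return NULL;
}

/* Runs one round of steps 1 and 2. Returns the number of waiters released. */
int run_first_wait_round(void)
{
    pthread_t t[MAX_WAITERS];
    int created = 0;
    int i;

    pthread_mutex_lock(&mtx);
    arrived = 0;
    while (created < MAX_WAITERS &&
           pthread_create(&t[created], NULL, waiter_thread, NULL) == 0)
        created++;
    expected = created;

    /* Step 1: release the mutex and block until every waiter has arrived. */
    while (arrived < expected)
        pthread_cond_wait(&all_blocked, &mtx);

    /* Step 2: send one signal per waiter while holding the mutex. */
    for (i = 0; i < expected; i++)
        pthread_cond_signal(&cond);
    pthread_mutex_unlock(&mtx);

    for (i = 0; i < created; i++)
        pthread_join(t[i], NULL);
    assert(waiters == 0);   /* every waiter left the first wait */
    return created;
}
```

Because every waiter is already blocked on the condition variable when the
signals are sent, and each signal must unblock at least one blocked thread,
the joins in this reduced version are expected to complete.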


The main point about this test is that at the point where the main thread sends
the single signal, all waiters should be:

1) either waiting on the mutex in the first wait,
2) or waiting on the condition variable in the second wait,
3) or waiting on the mutex in the wait on the auxiliary condvar from step 3

which means that if the mutex is released for long enough, all threads should
acquire the mutex in the first wait and the waiters count should eventually
reach zero. Step 6 is meant to provide this time. In step 6 the main thread
releases the mutex and starts waiting, and every waiter that acquires the mutex
releases it almost immediately and starts waiting itself, so nothing prevents
the threads in group (1) above from acquiring the mutex one by one and bringing
the waiters counter back to 0. The only thing that can get in the way is a
waiter that is still blocked on the condition variable in the first wait, which
is exactly what the test aims to trigger and detect.
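
The detection in step 6 amounts to a timed wait on a counter. A minimal sketch
of such a helper, assuming a condition variable that uses the default
CLOCK_REALTIME clock, could look like this. The name wait_for_zero and the
millisecond timeout parameter are hypothetical and are not taken from the
attachment.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <time.h>

/* Waits until *counter drops to zero or timeout_ms milliseconds elapse.
   The caller must hold *m, and *counter must be protected by *m.
   Returns 0 if the counter reached zero, ETIMEDOUT if it did not in time. */
int wait_for_zero(pthread_mutex_t *m, pthread_cond_t *c,
                  const int *counter, long timeout_ms)
{
    struct timespec deadline;
    int rc = 0;

    assert(timeout_ms >= 0);

    /* pthread_cond_timedwait takes an absolute deadline, not a duration. */
    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec += timeout_ms / 1000;
    deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
    if (deadline.tv_nsec >= 1000000000L) {
        deadline.tv_sec += 1;
        deadline.tv_nsec -= 1000000000L;
    }

    /* Re-check the counter after every wakeup; stop on timeout or error. */
    while (*counter > 0 && rc == 0)
        rc = pthread_cond_timedwait(c, m, &deadline);

    return (*counter == 0) ? 0 : rc;
}
```

In the terms of the test, a return value of ETIMEDOUT with a non-zero waiters
count would correspond to a failed iteration: some waiter never left the first
wait even though enough signals had been sent.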
