This is the mail archive of the cygwin@cygwin.com mailing list for the Cygwin project.



Re: 1.1.3 and upwards: apparent bug with pthread_cond_wait() and/or signal()


On Wednesday 01 May 2002 22:22, Robert Collins wrote:
> > -----Original Message-----
> > From: Michael Beach [mailto:michaelb@ieee.org]
> > Sent: Wednesday, May 01, 2002 9:44 PM
> > To: cygwin@cygwin.com
> > Subject: 1.1.3 and upwards: apparent bug with
> > pthread_cond_wait() and/or signal()
> >
> >
> > Hi all, I've just been wrestling with some code I've been
> > writing, trying to
> > get pthreads condition variables to work under Cygwin on
> > Windows 2000. I've
> > tried DLL versions 1.1.3 and the 20020409 snapshot, and
> > neither are working
> > for me, so I'm assuming that no versions in between will work
> > either...
>
> Between 1.1.3 and 1.3.0 a huge change occurred in the pthreads code
> base, so this assumption is not safe. (It's not necessarily wrong
> either.) I'd definitely be using 1.3.10 though.

I'll give it a try, but I'm not too hopeful considering that the snapshot 
(which postdates 1.3.10) doesn't seem to work.

>
> > #include <pthread.h>
> > #include <iostream>
>
> The cygwin c++ libgcc, stdlibc++ and gcc are not built with thread
> support, so C++ and threads may not work well together. C should work
> fine, and if anyone convinces Chris to release a thread-enabled gcc, C++
> should get better.
>
> > int main(int argc, char *argv[])
> >
> > {
> >
> >     CondVarTestData td;
> >
> >     pthread_mutex_init(&td.m, 0);
> >
> >     pthread_cond_init(&td.cv, 0);
> >
> >     td.state = CondVarTestData::START;
> >
> >     pthread_t th;
> >
> >     pthread_create(&th, 0, condVarTestThreadEntry, &td);
> >
> >     {
> >
> > 	pthread_mutex_lock(&td.m);
>
> you should lock this before starting your thread. It's a potential race.
> And due to cygwin's implementation, it *is* racing, and your other
> thread is entering the mutex and signalling before you enter the mutex
> and wait. That early signal with no waiters gets lost (as it should).

Thanks for taking the time to look at this issue, but I must disagree that 
this is the problem. There *is* indeterminacy here (vis-a-vis what is 
guaranteed by the pthreads spec) as to which thread locks the mutex first, 
but I'd hesitate to call it a race condition since the completion of the test 
program (by design) does not *depend* on which thread gets to the mutex 
first. I've included relevant parts of the program again below to illustrate 
my point.

If the test thread locks the mutex first, sure it will probably signal before 
the main thread is waiting, but that doesn't matter because the main thread 
won't sleep since it tests the condition (that the shared state is 
NEW_THREAD_RUNNING) to see whether or not it should call pthread_cond_wait(), 
and the test thread ensures that that condition is satisfied before it 
signals. So the test thread will then end up waiting for the main thread to 
signal it, which it will do. Then the test thread exits, the main thread 
joins it and the program terminates successfully.

On the other hand, if the main thread gets to the mutex first then it will 
wait (as the NEW_THREAD_RUNNING condition will not be satisfied). At this 
point the test thread will get to run and will signal the waiting main thread 
after setting the state to NEW_THREAD_RUNNING. The main thread will then wake 
when the test thread itself calls pthread_cond_wait() (and so releases the 
mutex). Then the main thread will signal the waiting test thread, which then 
exits, and so the program then terminates much as before.
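
The two orderings described above can be exercised directly. What follows is 
only a minimal sketch, not the original test program: the Handshake struct, 
worker() and runHandshake() are hypothetical names, and Handshake stands in 
for CondVarTestData, whose definition is not shown in this message. When 
mainLocksFirst is true the main thread takes the mutex before creating the 
thread; otherwise it sleeps briefly so that the new thread very likely locks 
and signals first. Because both waits are guarded by a test of the shared 
state, either order should complete on a conforming pthreads implementation.

```cpp
#include <pthread.h>
#include <unistd.h>
#include <cassert>

// Hypothetical stand-in for CondVarTestData (not the original definition).
struct Handshake {
    enum State { START, NEW_THREAD_RUNNING, NEW_THREAD_ACKNOWLEDGED };
    pthread_mutex_t m;
    pthread_cond_t cv;
    State state;
};

// Test thread: announce that it is running, then wait to be acknowledged.
static void *worker(void *arg)
{
    Handshake *h = static_cast<Handshake *>(arg);
    if (pthread_mutex_lock(&h->m) != 0)
        return 0;
    h->state = Handshake::NEW_THREAD_RUNNING;
    pthread_cond_signal(&h->cv);          // may be lost if nobody waits yet
    while (h->state != Handshake::NEW_THREAD_ACKNOWLEDGED)
        pthread_cond_wait(&h->cv, &h->m); // releases the mutex while asleep
    pthread_mutex_unlock(&h->m);
    return 0;
}

// Runs one handshake; returns true if it completed and the thread was joined.
bool runHandshake(bool mainLocksFirst)
{
    Handshake h;
    if (pthread_mutex_init(&h.m, 0) != 0) return false;
    if (pthread_cond_init(&h.cv, 0) != 0) return false;
    h.state = Handshake::START;

    pthread_t th;
    if (mainLocksFirst) {
        // Main holds the mutex before the test thread exists.
        if (pthread_mutex_lock(&h.m) != 0) return false;
        if (pthread_create(&th, 0, worker, &h) != 0) {
            pthread_mutex_unlock(&h.m);
            return false;
        }
    } else {
        // Give the test thread time to lock and signal first (likely, not
        // guaranteed; the result must not depend on it either way).
        if (pthread_create(&th, 0, worker, &h) != 0) return false;
        usleep(50 * 1000);
        if (pthread_mutex_lock(&h.m) != 0) return false;
    }

    // The predicate check makes a lost early signal harmless.
    while (h.state != Handshake::NEW_THREAD_RUNNING)
        pthread_cond_wait(&h.cv, &h.m);
    h.state = Handshake::NEW_THREAD_ACKNOWLEDGED;
    pthread_cond_signal(&h.cv);
    pthread_mutex_unlock(&h.m);

    bool ok = pthread_join(th, 0) == 0 &&
              h.state == Handshake::NEW_THREAD_ACKNOWLEDGED;
    pthread_cond_destroy(&h.cv);
    pthread_mutex_destroy(&h.m);
    return ok;
}
```

Calling runHandshake(true) and runHandshake(false) covers both cases. A hang 
in either call would point at the condition variable implementation rather 
than at the ordering of the two threads.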

If the above hand-wavy explanation does not seem convincing, then I'd also 
like to tender the empirical evidence of the printed output from the test 
runs on Cygwin and Linux. In both cases the output is the same, up until the 
point when the Cygwin built version just stops producing output at all. This 
tends to indicate that the underlying thread systems are making the same 
scheduling decisions with respect to those two threads, so the argument that 
it works on Linux but not on Cygwin due to an inherent race condition 
resolving itself differently (due to different scheduling of the threads) on 
the different platforms does not seem to hold much water...

However, that said, I will be trying 1.3.10 to see if it makes a difference. 
If not, then I guess I will just have to make the move to the Win32 threading 
and synchronization APIs. Blech!

int main(int argc, char *argv[])
{
    CondVarTestData td;
    pthread_mutex_init(&td.m, 0);
    pthread_cond_init(&td.cv, 0);
    td.state = CondVarTestData::START;
    pthread_t th;
    pthread_create(&th, 0, condVarTestThreadEntry, &td);
    {
        pthread_mutex_lock(&td.m);
        while (td.state != CondVarTestData::NEW_THREAD_RUNNING)
        {
            pthread_cond_wait(&td.cv, &td.m);
            clog << "-- main thread wakes!" << endl;
        }
        td.state = CondVarTestData::NEW_THREAD_ACKNOWLEDGED;
        clog << "-- main thread about to signal()" << endl;
        pthread_cond_signal(&td.cv);
        pthread_mutex_unlock(&td.m);
    }
    clog << "-- main thread waiting for exit..." << endl;
    pthread_join(th, 0);
    cout << "%% PASSED" << endl;

    return 0;
}


void *condVarTestThreadEntry(void *arg)
{
    CondVarTestData *td = (CondVarTestData *)arg;
    pthread_mutex_lock(&td->m);
    td->state = CondVarTestData::NEW_THREAD_RUNNING;
    pthread_cond_signal(&td->cv);
    clog << "-- test thread has signal()ed" << endl;
    while (td->state != CondVarTestData::NEW_THREAD_ACKNOWLEDGED)
    {
        clog << "-- test thread about to wait()..." << endl;
        pthread_cond_wait(&td->cv, &td->m);
        clog << "-- test thread wakes!" << endl;
    }
    pthread_mutex_unlock(&td->m);
    clog << "-- test thread about to exit..." << endl;
    return 0;
}

>
> You should also _always_ test for the return value when using pthreads
> calls. They don't throw exceptions and they don't set errno, so the only
> way you can tell an error has occurred is to record the return value.

Yes I know. The reason for this sloppy coding is that this test program is 
the result of quickly stripping out calls to a C++ threading library (which 
in the case of Cygwin simply wraps pthreads quite thinly) and replacing them 
with raw pthreads. The library does handle error returns, but I wanted to 
demonstrate the problem without any "noise" from the library before posting 
to the list.
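
For reference, checking the return values need not add much noise. The 
following is a minimal sketch and not the library mentioned above: 
checkPthread() and unlockUnownedMutex() are hypothetical helpers. It relies 
on the fact that pthreads calls return the error number directly instead of 
setting errno, and it uses an error-checking mutex so that a misuse (unlocking 
a mutex the caller does not hold) is reported as EPERM rather than passing 
silently.

```cpp
#include <pthread.h>
#include <cerrno>
#include <cstring>
#include <stdexcept>
#include <string>
#include <cassert>

// Hypothetical helper: turn a non-zero pthreads return code into an exception.
// The return code itself is the error number; errno is not consulted.
void checkPthread(int rc, const char *what)
{
    if (rc != 0)
        throw std::runtime_error(std::string(what) + ": " + std::strerror(rc));
}

// Demonstrates an error that is only visible through the return value:
// unlocking an error-checking mutex that the calling thread has not locked.
int unlockUnownedMutex()
{
    pthread_mutexattr_t attr;
    checkPthread(pthread_mutexattr_init(&attr), "pthread_mutexattr_init");
    checkPthread(pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK),
                 "pthread_mutexattr_settype");

    pthread_mutex_t m;
    checkPthread(pthread_mutex_init(&m, &attr), "pthread_mutex_init");
    checkPthread(pthread_mutexattr_destroy(&attr), "pthread_mutexattr_destroy");

    int rc = pthread_mutex_unlock(&m);   // never locked: expected to fail

    checkPthread(pthread_mutex_destroy(&m), "pthread_mutex_destroy");
    return rc;                           // caller inspects the error number
}
```

In the test program each call would then be wrapped, for example 
checkPthread(pthread_mutex_lock(&td.m), "pthread_mutex_lock"), so that a 
failing call is reported at the point where it happens.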

>
> Rob

Regards
M.Beach


