This is the mail archive of the cygwin@cygwin.com mailing list for the Cygwin project.



Re: 1.1.3 and upwards: apparent bug with pthread_cond_wait() and/or signal()


On Thursday 02 May 2002 01:37, Robert Collins wrote:
> > -----Original Message-----
> > From: Michael Beach [mailto:michaelb@ieee.org]
> > Sent: Thursday, May 02, 2002 12:21 AM
> >
> >
> > Thanks for taking the time to look at this issue, but I must
> > disagree that
> > this is the problem.
>
> You're going to have to debug this yourself. I've given you my opinion :].
>
> > If the test thread locks the mutex first, sure it will
> > probably signal before
> > the main thread is waiting, but that doesn't matter because
> > the main thread
>
> Does this sequence look plausible to you? I don't claim it is what's
> happening because the string output doesn't fit..  but it illustrates
> the race. On a dual processor machine this is much more likely than a
> single.
>
> thread - lock
> thread - state=run
> thread - signal
> main - lock
> main - test state (passes)

No, I don't think it's plausible. In particular, we can't get to "main-lock" 
until we get to "thread wait" because it's not until then that "thread" has 
(implicitly) released the mutex. The OS can pre-empt "thread" all it likes, 
but as soon as "main" has progressed to the pthread_mutex_lock() call it (i.e. 
"main") will no longer be runnable and so won't be scheduled, until "thread" 
calls pthread_cond_wait().
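
The handshake under discussion can be sketched roughly as follows. This is a minimal illustration, not the original test program (which is not included in this message); the names state, run_handshake() and test_thread() are hypothetical. Each side re-tests its predicate in a loop while holding the mutex, so the exchange should complete whichever thread takes the lock first.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical names: the original test program is not shown here. */
enum state { STATE_IDLE, STATE_RUN, STATE_ACKNOWLEDGED };

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t cond = PTHREAD_COND_INITIALIZER;
static enum state state = STATE_IDLE;

/* "thread": announce that it is running, then wait for the acknowledgement. */
static void *test_thread(void *arg)
{
    (void)arg;
    if (pthread_mutex_lock(&lock) != 0)
        return (void *)1;
    state = STATE_RUN;
    pthread_cond_signal(&cond);
    /* pthread_cond_wait() releases the mutex while blocked and
       re-acquires it before returning. */
    while (state != STATE_ACKNOWLEDGED)
        pthread_cond_wait(&cond, &lock);
    pthread_mutex_unlock(&lock);
    return NULL;
}

/* "main": wait for STATE_RUN, acknowledge it, then join the thread.
   Returns 0 if the whole exchange completed. */
int run_handshake(void)
{
    pthread_t tid;
    void *result = NULL;
    int rc;

    state = STATE_IDLE;
    rc = pthread_create(&tid, NULL, test_thread, NULL);
    if (rc != 0)
        return rc;

    rc = pthread_mutex_lock(&lock);
    if (rc != 0)
        return rc;
    /* If the thread already signalled, the predicate is true and
       no wait happens, so the early signal is not lost. */
    while (state != STATE_RUN)
        pthread_cond_wait(&cond, &lock);
    state = STATE_ACKNOWLEDGED;
    pthread_cond_signal(&cond);
    pthread_mutex_unlock(&lock);

    rc = pthread_join(tid, &result);
    if (rc != 0)
        return rc;
    return result == NULL ? 0 : -1;
}
```

Calling run_handshake() from a main() is expected to return 0 on a conforming pthreads implementation; a hang in pthread_join() would correspond to the symptom described in this thread.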

> thread - test state (fails)
> main - state = acknowledged
> main - signal
> thread wait
> main - unlock
> main - join
> thread is hung.
>
>
> what are we seeing:
> main - lock
> main - test state fails
> main - wait
> thread - lock
> thread - state=run
> thread - signal
> -- test thread has signal()ed
> thread - test state (fails)
> -- test thread about to wait()...
> thread wait
> -- main thread wakes!
> main - state = acknowledged
> -- main thread about to signal()
> main - signal
> main - unlock
> -- main thread waiting for exit...
> thread should wake here.
>
> > If the above hand-wavy explanation does not seem convincing,
>
> ...
>
> > the different platforms does not seem to hold much water...
>
> Without a few more output statements, I'll not buy into that.

Fair enough.

> However I
> do accept your hand waving. Particularly since I've noticed something
> useful out of this: pthread_join's argument should not be 0. I have to
> dig up the spec to confirm this though.... but our code will segfault
> like crazy on you as it stands.

Well, I'm not sure what the standard says on this either, and I've not had an 
authoritative reference book handy lately, so I've just been going with 
what's legal according to the manpages on SuSE 7.2. So my excuse is "Linux 
made me do it".
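
For reference, here is a small sketch of joining a thread with a non-null result pointer. It assumes the argument in question is the second parameter of pthread_join(); the thread above does not say which one is meant. POSIX describes that parameter with "if value_ptr is not NULL", so a null pointer there is permitted by the standard, but passing a real pointer sidesteps the question. The names worker() and join_with_result() are hypothetical.

```c
#include <pthread.h>
#include <stddef.h>

static int exit_value = 42;

/* Hypothetical worker: returns a pointer to a static int. */
static void *worker(void *arg)
{
    (void)arg;
    return &exit_value;
}

/* Creates the worker, joins it, and returns the int it pointed at,
   or -1 if either pthreads call reports an error. */
int join_with_result(void)
{
    pthread_t tid;
    void *result = NULL;

    if (pthread_create(&tid, NULL, worker, NULL) != 0)
        return -1;
    /* A non-null second argument receives the thread's return value. */
    if (pthread_join(tid, &result) != 0)
        return -1;
    return *(int *)result;
}
```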

>
> > However, that said, I will be trying 1.3.10 to see if it
> > makes a difference.
> > If not, then I guess I will just have to make the move to the
> > Win32 threading
> > and synchronization APIs. Blech!
>
> You could always help us debug the pthreads code... I wonder if the
> recent patches I haven't reviewed properly yet address this. If you had
> time, you could try them and see...

In principle I'd be pleased to help, but in practice my time is a bit tight 
right now as I've been doing the public spirited thing for one or two bugs 
I've encountered in other open source projects I've been using, and now I 
think my employer would like me to focus more closely on Real Work (TM) ;-)

However if you're not expecting high bandwidth, if you could point me at a 
document or whatnot that explains how to set up a development environment I'd 
be willing to have a go.

>
> > > You should also _always_ test for the return value when using pthreads
> > > calls. They don't throw exceptions and they don't set errno, so the
> > > only way you can tell an error has occurred is to record the return
> > > value.
> >
> > Yes I know. The reason for this sloppy coding is that this
> > test program is
> > ...
>
> Please don't remove error handling. If I were to run this program I'd
> expect to have error handling so I don't have to add it in. And running
> the code w/o error handling won't help me identify anything non-trivial.

Sure. The quick'n'dirty pthreads calls were only so I didn't have to post 
half of our source tree in order to illustrate the problem with an example 
that actually compiles. If you're serious about wanting to run it, give me a 
shout and I'll give you a version with error handling.
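
As a rough idea of what such error handling could look like, here is a minimal sketch. The helper check() and the function checked_lock_cycle() are hypothetical names, not part of the test program. It relies on the point quoted above: pthreads calls report failure through their return value (an error number) rather than through errno, so the return value is passed to strerror() directly.

```c
#include <pthread.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: report a failed pthreads call and pass the
   error number through unchanged. */
static int check(int rc, const char *what)
{
    if (rc != 0)
        fprintf(stderr, "%s failed: %s\n", what, strerror(rc));
    return rc;
}

/* Initialise, lock, unlock and destroy a mutex, checking every call.
   Returns 0 on success or the first error number encountered. */
int checked_lock_cycle(void)
{
    pthread_mutex_t m;
    int rc;

    rc = check(pthread_mutex_init(&m, NULL), "pthread_mutex_init");
    if (rc != 0)
        return rc;
    rc = check(pthread_mutex_lock(&m), "pthread_mutex_lock");
    if (rc == 0)
        rc = check(pthread_mutex_unlock(&m), "pthread_mutex_unlock");
    if (rc == 0)
        rc = check(pthread_mutex_destroy(&m), "pthread_mutex_destroy");
    return rc;
}
```

The same wrapper can be applied to the pthread_cond_wait(), pthread_cond_signal() and pthread_join() calls in a test case, so that any failure shows up in the output instead of passing silently.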

>
> Rob (Cygwin pthreads maintainer).

Regards
M.Beach


