This is the mail archive of the ecos-discuss@sources.redhat.com mailing list for the eCos project.


Re: questions about condition variable example from eCos book

From: Nick Garnett <nickg at ecoscentric dot com>
To: "Eric Smith" <eric-ecd at brouhaha dot com>
Cc: ecos-discuss at sources dot redhat dot com
Date: 28 Apr 2003 11:19:29 +0100
Subject: Re: [ECOS] questions about condition variable example from eCos book
References: <32786.64.169.63.74.1051517627.squirrel@ruckus.brouhaha.com>

"Eric Smith" <eric-ecd at brouhaha dot com> writes:

> I've got some questions about the condition variable example code in
> the eCos book on pages 107-109.  I'm new to eCos so it's possible that
> I'm misinterpreting something.
> 
> First, the description of cyg_cond_signal() says that it will "wake up
> exactly one thread waiting on the condition variable [...]  If there are
> no threads waiting for the condition variable when it is signaled, nothing
> happens."  But it also says "a race condition could arise if more than
> one thread is waiting for the condition variable.  This is why it is
> important for the waiting thread to retest the condition variable to
> ensure its proper state."
> 
> I don't understand where this race condition comes from.  Even if there
> are multiple threads waiting on the same condition, cyg_cond_signal() will
> only wake up one, right?  So how could the thread wake up without the
> condition having been signalled?

The problem is not really in what happens in the condition variable,
but in what happens in the mutex. When a thread waiting on a condition
variable is signalled, it has to re-acquire the mutex before
proceeding. Obviously it cannot do this immediately, since the thread
doing the signalling currently has the mutex, so it must wait until
the mutex becomes free. However, there may be other threads ahead of
it on the mutex queue, or a higher priority thread may jump in and
grab the mutex before it can run. So, by the time the thread actually
gets to run, other threads may have changed the state of the protected
data (stolen the buffer, consumed the resource, read the bytes from
the serial device...) and the condition it was waiting for is no
longer true. Hence, it has to re-test the condition and wait again if
it is not true.
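
As a rough sketch of that re-test pattern, here it is written with POSIX
threads as a runnable stand-in for the eCos calls (pthread_cond_wait()
playing the role of cyg_cond_wait(); in eCos the mutex is bound to the
condition variable by cyg_cond_init() rather than passed to the wait call).
The names put_value() and take_value() are made up for illustration and do
not come from the book's example.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

static pthread_mutex_t mut  = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  cond = PTHREAD_COND_INITIALIZER;
static bool buffer_empty = true;   /* protected by mut */
static int  buffer;                /* protected by mut */

/* Producer side (hypothetical helper): publish a value, wake one waiter. */
void put_value(int v)
{
    pthread_mutex_lock(&mut);
    buffer = v;
    buffer_empty = false;
    pthread_cond_signal(&cond);
    pthread_mutex_unlock(&mut);
}

/* Consumer side (hypothetical helper): block until a value is present. */
int take_value(void)
{
    int v;

    pthread_mutex_lock(&mut);
    /* The wait releases the mutex and re-acquires it before returning.
       Another thread may have run in between, so test the flag again. */
    while (buffer_empty)
        pthread_cond_wait(&cond, &mut);
    assert(!buffer_empty);         /* holds only because of the loop */
    v = buffer;
    buffer_empty = true;
    pthread_mutex_unlock(&mut);
    return v;
}
```

The sketch does nothing to stop the producer overwriting a value that has
not been consumed yet; that is a separate problem.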

There are other, more obscure, sources of potential race
conditions. Some implementations of condition variables, particularly
on multiprocessors, may occasionally cause more threads to be woken
than is strictly necessary. It is perfectly reasonable for any
condition variable implementation to treat signal() as if it were
broadcast().
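
To illustrate, the sketch below wakes two consumers for a single item by
calling broadcast where a signal would do, which is an assumption chosen to
imitate an implementation that wakes more threads than necessary. It uses
POSIX threads as a stand-in for the eCos API, and the names items, done and
run_demo() are hypothetical. Because each consumer re-tests in a while loop,
only one of them takes the item.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

static pthread_mutex_t mut  = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  cond = PTHREAD_COND_INITIALIZER;
static int  items;      /* resources available, protected by mut */
static int  consumed;   /* resources taken, protected by mut */
static bool done;       /* tells idle consumers to give up */

static void *consumer(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&mut);
    /* Re-test after every wakeup: the other consumer may already have
       taken the item this wakeup was announcing. */
    while (items == 0 && !done)
        pthread_cond_wait(&cond, &mut);
    if (items > 0) {
        items--;
        consumed++;
    }
    assert(items >= 0);  /* an if() in place of the while() could break this */
    pthread_mutex_unlock(&mut);
    return NULL;
}

/* Start two consumers, offer one item, then tell the loser to stop.
   Returns the number of items actually consumed. */
int run_demo(void)
{
    pthread_t c1, c2;

    items = 0;
    consumed = 0;
    done = false;
    pthread_create(&c1, NULL, consumer, NULL);
    pthread_create(&c2, NULL, consumer, NULL);

    pthread_mutex_lock(&mut);
    items = 1;
    pthread_cond_broadcast(&cond);   /* wakes every waiter, not just one */
    pthread_mutex_unlock(&mut);

    pthread_mutex_lock(&mut);
    done = true;
    pthread_cond_broadcast(&cond);
    pthread_mutex_unlock(&mut);

    pthread_join(c1, NULL);
    pthread_join(c2, NULL);
    return consumed;
}
```

With the loop in place run_demo() reports one item consumed however the
threads are scheduled. With an if(), the second consumer could return from
the wait and decrement items below zero.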

> 
> The example code contains two threads.  Here's an excerpt,
> slightly reformatted and without most of the comments:
> 

[snip code]

> 
> 
> So in this example, why is it not adequate for the while statement
> on line 44 to be an if statement instead?  I'll concede that using a
> while is better defensive programming, but it doesn't seem strictly
> necessary as the text claims.
>

In this specific example, with only two threads, an if() might be
adequate. However, as soon as you add a second consumer thread, there
is the potential for a thread to come out of cyg_cond_wait() when
buffer_empty is true and then proceed to process bogus data.

In the real world, adding more threads later is very common, and one
might forget to go back and change that if() to a while(). So it's best
to get into the habit of using while() all the time.

> 
> It seems fairly clear that line 19 "buffer_empty = false" in thread_a should
> actually come after line 22 acquires the mutex, in order to prevent exactly
> this sort of race condition.

I agree. This looks like an error, probably the result of some "tidying
up".

> 
> A more general problem with the example code is that if there is a single
> buffer shared between thread_a and thread_b, there needs to be something
> to prevent thread_a from refilling the buffer before thread_b is done
> with it.  To solve this problem, it may be necessary to move the
> "acquire data" portion of thread_a after the mutex has been locked.  So
> the code with these two fixes would be:
> 
> 14      while (1)
> 15        {
>             cyg_mutex_lock (& mut_cond_var);
> 16          // Acquire data into the buffer...
> 19          buffer_empty = false;
> 25          cyg_cond_signal (& cond_var);
> 28          cyg_mutex_unlock (& mut_cond_var);
> 29        }
> 
> Since the thread has the mutex locked during the acquisition of data and
> setting buffer_empty = false, those could be done in either order, but as
> a matter of style it seems best to not set the variable until after the
> acquisition is completed.
> 

A possible problem with this approach is that if we have priority
inversion protection enabled, it is possible for the thread to end up
doing the entire data acquisition at an unnaturally raised priority.
It would be nice to avoid that.

An alternative approach would be to change the buffer_empty variable
into a buffer_owner, or a buffer_state variable that contains slightly
more information. This would allow thread_a to distinguish between the
buffer being unused, and being in use by thread_b.
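
A minimal sketch of what such a state variable might look like follows. It
uses POSIX threads as a stand-in for the eCos calls and assumes a single
producer and a single consumer. The enum values and the functions
producer_put() and consumer_get() are invented for illustration. The mutex
is held only while the state changes, so the data acquisition and the
processing both happen outside it.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical states: unused, filled and ready, being read by thread_b. */
enum buffer_state { BUF_FREE, BUF_FULL, BUF_IN_USE };

static pthread_mutex_t mut  = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  cond = PTHREAD_COND_INITIALIZER;
static enum buffer_state state = BUF_FREE;   /* protected by mut */
static int buffer;                           /* owned according to state */

/* thread_a side: wait for a free buffer, fill it, hand it over. */
void producer_put(int v)
{
    pthread_mutex_lock(&mut);
    while (state != BUF_FREE)
        pthread_cond_wait(&cond, &mut);
    pthread_mutex_unlock(&mut);

    /* Data acquisition happens without the mutex held. This is safe only
       because a single producer is assumed: nobody else leaves BUF_FREE. */
    buffer = v;

    pthread_mutex_lock(&mut);
    assert(state == BUF_FREE);
    state = BUF_FULL;
    pthread_cond_broadcast(&cond);  /* both sides wait on the same variable */
    pthread_mutex_unlock(&mut);
}

/* thread_b side: claim a full buffer, use it, then release it. */
int consumer_get(void)
{
    int v;

    pthread_mutex_lock(&mut);
    while (state != BUF_FULL)
        pthread_cond_wait(&cond, &mut);
    state = BUF_IN_USE;             /* thread_a must not refill it yet */
    pthread_mutex_unlock(&mut);

    v = buffer;                     /* processing, also outside the mutex */

    pthread_mutex_lock(&mut);
    state = BUF_FREE;
    pthread_cond_broadcast(&cond);
    pthread_mutex_unlock(&mut);
    return v;
}
```

With more than one producer, an extra "being filled" state would be needed
so that a second producer could not claim the same free buffer.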

-- 
Nick Garnett                    eCos Kernel Architect
http://www.ecoscentric.com/     The eCos and RedBoot experts

-- 
Before posting, please read the FAQ: http://sources.redhat.com/fom/ecos
and search the list archive: http://sources.redhat.com/ml/ecos-discuss
