This is the mail archive of the
ecos-devel@sourceware.org
mailing list for the eCos project.
Subtle bug not calling DSRs.
- From: Sergei Organov <osv at javad dot com>
- To: ecos-devel at sources dot redhat dot com
- Date: Thu, 22 Dec 2005 22:00:08 +0300
- Subject: Subtle bug not calling DSRs.
Hello,
I believe I've found bug when eCos doesn't call pended DSR(s) until
_next_ time the scheduler lock is released. The scenario I see is like
this:
1. Start single thread that is run as soon as scheduler is enabled.
2. The thread suspends itself that way or another (wait on a primitive
or just sleep, -- it doesn't matter).
3. The (2) results in context switch from this thread to the idle thread
from within Cyg_Scheduler::unlock_inner(0).
4. During context switch interrupt happens that is requesting
corresponding DSR to be run. However, as scheduler is locked during
context switch, the DSR is not invoked on exit from the interrupt.
5. Context switch continues and transfers control to the idle thread
entry address being Cyg_HardwareThread::thread_entry(). This is
because the idle thread had no chance to run before this point.
6. Cyg_HardwareThread::thread_entry() has no code to check for pended
DSRs, so the DSR posted in (4) is not run.
In my case system just hangs (loops forever in idle thread) at this
point as my thread (1) in fact waits for the interrupt, but
corresponding DSR that is meant to wakeup the thread is never called.
The quick-and-dirty fix to the problem is:
--- thread.cxx 14 Dec 2005 13:48:31 -0000
+++ thread.cxx 22 Dec 2005 18:24:03 -0000
@@ -102,6 +102,9 @@
Cyg_Scheduler::zero_sched_lock(); // the assignment into the code above.
HAL_REORDER_BARRIER();
+ Cyg_Scheduler::lock();
+ Cyg_Scheduler::unlock();
+
// Call entry point in a loop.
for(;;)
But while we are at it, it seems that entire prologue of the
Cyg_HardwareThread::thread_entry() should be moved to a new method of
the Cyg_Scheduler class as the prologue consists entirely of the
intimate details of how scheduler works. Moreover, this prologue is in
fact a slightly modified copy of the code found in the
Cyg_Scheduler::unlock_inner() executing after the line
HAL_THREAD_SWITCH_CONTEXT( ¤t->stack_ptr,
&next->stack_ptr );
Even if this common part can't be easily factored out, it is much better
to at least keep two slightly different copies in one place close to
each other I think.
Anyway, somebody more experienced in the eCos internal architecture
(Nick?) may wish to take a look at this and could probably come up with
a better patch.
--
Sergei.