This is the mail archive of the
ecos-discuss@sources.redhat.com
mailing list for the eCos project.
Re: 0xDEADBEEF LOCK in spl_any() ;)
- To: <ecos-discuss at sources dot redhat dot com>
- Subject: Re: [ECOS] 0xDEADBEEF LOCK in spl_any() ;)
- From: Hugo Tyson <hmt at redhat dot com>
- Date: 18 Oct 2001 11:14:52 +0100
- References: <000001c15733$75443f00$090110ac@TRENT>
- Reply-To: <ecos-discuss at sources dot redhat dot com>
"Trenton D. Adams" <tadams@extremeeng.com> writes:
> FYI: Trying to debug WaveLAN pccard driver.
>
> I'm getting a dead lock in spl_any () in
> "net/tcpip/current/src/ecos/synch.c"
> I don't know what the "cyg_mutex_lock( &splx_mutex )" is locking splx_mutex
> for. Anyone???
splx() is the network stack's mechanism for mutual exclusion and atomic
access to network drivers.
The implementation of splx() locks a mutex to provide that mutual
exclusion; that is its whole job. Otherwise multiple threads might call into
your driver simultaneously.
> if ( cyg_thread_self() != splx_thread ) {
>     cyg_mutex_lock( &splx_mutex ); // <<< DEADLOCKS HERE
The thread you are looking at yields because it cannot get the mutex,
because some other thread owns the splx() lock. The other thread seems not
to run to completion of the locked section of code. Solve that, and you
have it!
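
To make the ownership check concrete, here is a minimal single-threaded
model of it. This is only a sketch, not the eCos code: model_lock,
model_enter() and model_exit() are made-up names, thread ids are plain
ints, and "would block" is returned as false instead of actually sleeping.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: thread ids are positive ints, 0 means "no owner". */
typedef struct {
    int owner;   /* which thread holds the lock, 0 if free */
    int depth;   /* how many nested entries the owner has made */
} model_lock;

/* Returns true if `thread` now holds the lock,
 * false if a real thread would have to block here. */
bool model_enter(model_lock *l, int thread)
{
    if (l->owner == thread) {   /* owner re-enters without locking again */
        l->depth++;
        return true;
    }
    if (l->owner != 0)          /* held by another thread: caller must wait */
        return false;
    l->owner = thread;
    l->depth = 1;
    return true;
}

/* Undo one model_enter(); the lock is free once the outermost entry exits. */
void model_exit(model_lock *l, int thread)
{
    assert(l->owner == thread && l->depth > 0);
    if (--l->depth == 0)
        l->owner = 0;
}
```

In this model a second thread keeps getting false for as long as the first
one has not made its final model_exit() call, which is the situation
described above.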
>     old_spl = 0; // Free when we unlock this context
>     CYG_ASSERT( 0 == splx_thread, "Thread still owned" );
>     CYG_ASSERT( 0 == spl_state, "spl still set" );
>     splx_thread = cyg_thread_self();
> }
>
> FYI: spl_any () is called some time during the splsoftnet() call in
> "net/tcpip/current/src/sys/net/route.c:608"
>
> On a last note, I would try and figure this out myself, but I found 2903
> occurrences of splx_mutex throughout the sources. I imagine there's
> someone out there that understands the net stack better, and could give
> me a hint as to why this might be happening, and how to resolve it!? :)
Use GDB to look at the mutex when the system is "deadlocked"; it contains
an "owner" field. That's a pointer to a thread, the owner. See what
thread it is; see where it's executing. There's your problem!
For example, if, somewhere in your driver code, you get stuck in a loop,
that would do it, because whatever thread enters the driver code must own
the splx() lock and therefore owns that mutex.
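
One common way to rule out the stuck-loop case is to bound any polling
loop so that it gives up instead of spinning forever with the lock held.
A rough sketch follows, assuming a hypothetical status callback.
wait_ready_bounded(), ready_fn and fake_ready() are illustrative names,
not eCos or WaveLAN driver APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical callback that reads a device status flag. */
typedef bool (*ready_fn)(void *dev);

/* Poll at most max_polls times; report failure instead of spinning forever. */
bool wait_ready_bounded(ready_fn is_ready, void *dev, unsigned max_polls)
{
    assert(is_ready != NULL);
    for (unsigned i = 0; i < max_polls; i++) {
        if (is_ready(dev))
            return true;
    }
    return false;   /* caller can unwind and let the lock be released */
}

/* Fake device for illustration: reports ready once *countdown reaches 0. */
bool fake_ready(void *dev)
{
    unsigned *countdown = dev;
    if (*countdown == 0)
        return true;
    (*countdown)--;
    return false;
}
```

Whether giving up is acceptable, and what the poll limit should be, depends
on the hardware; the point is only that the thread eventually leaves the
locked section.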
Of course, it could be any or all of the usual causes of odd behaviour, such
as unexpected deep recursion, i.e. your driver receives a packet and makes
the call to give it to the stack, and another call asking you to transmit
comes in because of that receive; you notice there is a packet ready, so you
go to receive it, make the call to give it to the stack, and so on, leading
to stack overflow. Or just plain stack overflow anyway...
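
The recursion scenario can be simulated in a few lines. This is a toy
model under stated assumptions, not the real stack or driver: sim_driver,
sim_rx(), sim_tx() and sim_stack_input() are invented names, and the
"stack" simply asks for a transmit on every packet it is given. With the
guard off, nesting depth grows with the number of waiting packets; with it
on, the outer receive call drains them in a loop.

```c
#include <assert.h>
#include <stdbool.h>

typedef struct {
    int  rx_pending;   /* packets waiting in the simulated hardware */
    int  depth;        /* current nesting of sim_rx() calls */
    int  max_depth;    /* deepest nesting seen so far */
    bool guard;        /* if true, refuse to re-enter sim_rx() */
} sim_driver;

void sim_rx(sim_driver *d);

/* Transmit path: notices a waiting packet and tries to service it. */
static void sim_tx(sim_driver *d)
{
    if (d->rx_pending > 0 && !(d->guard && d->depth > 0))
        sim_rx(d);   /* re-entry: this is where the recursion comes from */
}

/* Stand-in for the stack: every delivered packet triggers a transmit. */
static void sim_stack_input(sim_driver *d)
{
    sim_tx(d);
}

/* Receive path: take a packet and hand it to the stack. */
void sim_rx(sim_driver *d)
{
    d->depth++;
    if (d->depth > d->max_depth)
        d->max_depth = d->depth;
    do {
        assert(d->rx_pending > 0);
        d->rx_pending--;
        sim_stack_input(d);
    } while (d->guard && d->rx_pending > 0);   /* guarded: drain in a loop */
    d->depth--;
}
```

On a target with small thread stacks, the unguarded shape only needs a
modest burst of packets to run off the end of the stack.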
HTH,
- Huge