This is the mail archive of the libc-alpha@sources.redhat.com mailing list for the glibc project.



Re: [libc-alpha] linuxthreads bug in 2.2.4 under ppc linux


On Fri, 7 Dec 2001, Kevin B. Hendricks wrote:

> Based on disassembling the code, the problem is here in
> 
> void __pthread_alt_unlock(struct _pthread_fastlock *lock);
> 
> After the test to see if the node was abandoned:
> 
>  if (p_node->abandoned) {
>         /* Remove abandoned node. */
> 
> It turns out it was not, so the else clause is invoked and the following
> code is run:

It should *always* turn out this way if you are not using
pthread_mutex_timedlock. Only the timed-out locking function will
abandon a wait node in the wait queue. In fact that's the reason wait
nodes exist, so they can be abandoned, leaving it to the lock owner to
``garbage collect'' them. 

There isn't any safe way for a thread which times out on a wait to
remove its wait node from the middle of the wait queue; only the lock
owner can destructively manipulate the middle of the queue. Other threads
can only push new nodes onto the front, using an atomic compare-and-swap,
so the owner knows it can't race against anyone removing in the middle.

(Too bad I somehow forgot about this principle by the time I got to
writing wait_node_alloc and wait_node_free. Doh!)
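
To make the push-at-the-front rule concrete, here is a minimal sketch. It is not the linuxthreads source: the struct layout, the `lock_status` type and the `lock_or_enqueue` name are all hypothetical, and C11 `<stdatomic.h>` stands in for the library's own compare-and-swap primitive.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified wait node; not the real linuxthreads layout. */
struct wait_node {
    struct wait_node *next;  /* lock status seen at enqueue time: 1 or an older node */
    int abandoned;           /* set by a waiter that timed out */
};

/* Status word: 0 = free, 1 = held with no waiters,
   any other value = address of the most recently pushed wait node. */
typedef _Atomic uintptr_t lock_status;

/* Take the lock if it is free, otherwise push `node` at the front.
   Returns 1 if the lock was acquired, 0 if the node was enqueued. */
static int lock_or_enqueue(lock_status *status, struct wait_node *node)
{
    assert(status != NULL && node != NULL);
    uintptr_t old = atomic_load(status);
    for (;;) {
        uintptr_t desired;
        if (old == 0) {
            desired = 1;                           /* free -> held, no waiters */
        } else {
            node->abandoned = 0;
            node->next = (struct wait_node *)old;  /* never null while enqueued */
            desired = (uintptr_t)node;
        }
        /* On failure `old` is refreshed with the current status; retry. */
        if (atomic_compare_exchange_weak(status, &old, desired))
            return old == 0;
    }
}
```

Every writer other than the owner touches only the status word, and only through the compare-and-swap, which is why the owner may freely edit links further down the queue.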
> 
>      } else if ((prio = p_node->thr->p_priority) >= maxprio) {
>         /* Otherwise remember it if its thread has a higher or equal priority
>            compared to that of any node seen thus far. */
>         maxprio = prio;
>         pp_max_prio = pp_node;
>         p_max_prio = p_node;
>       }
> 
> But the wait_node structure being looked at had all 0 values

That tells you something is strange. A busy lock has some
non-zero value.  That value is 1 when no threads are waiting,
otherwise a pointer to the first wait node.

When the lock is busy, the calling thread enqueues itself onto the queue,
and in doing so copies the lock's current status into its wait_node's
next field.

So, if the node is correctly enqueued, its next field cannot possibly
be zero. It's not a null terminated list, but a 1-terminated list,
so to speak. No node's next link can be null in that list.
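
A sketch of what the owner's scan over such a 1-terminated list might look like, with a check for the "impossible" null link. This is only an illustration under simplifying assumptions: `scan_queue`, `QUEUE_END` and the `prio` field (a stand-in for thr->p_priority) are hypothetical names, the code is single-threaded, and it treats the head like any other link, whereas a real implementation has to update the head with compare-and-swap because new waiters may be pushing concurrently.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified wait node; not the real linuxthreads layout. */
struct wait_node {
    struct wait_node *next;  /* older node, or QUEUE_END at the tail */
    int abandoned;           /* set by a waiter that timed out */
    int prio;                /* stand-in for thr->p_priority */
};

/* The list is terminated by the value 1, not by a null pointer. */
#define QUEUE_END ((struct wait_node *)(uintptr_t)1)

/* Owner-side scan: unlink abandoned nodes and return the live node with the
   highest priority (the oldest one among equals), or NULL if there is none.
   Sets *corrupt to 1 and returns NULL if a null link is found, which cannot
   happen for a correctly enqueued node. */
static struct wait_node *scan_queue(struct wait_node **head, int *corrupt)
{
    struct wait_node **pp = head;
    struct wait_node *p, *p_max = NULL;
    int maxprio = INT_MIN;

    assert(head != NULL && corrupt != NULL);
    *corrupt = 0;
    while ((p = *pp) != QUEUE_END) {
        if (p == NULL) {          /* zeroed or destroyed node upstream */
            *corrupt = 1;
            return NULL;
        }
        if (p->abandoned) {
            *pp = p->next;        /* remove abandoned node, stay on this link */
            continue;
        }
        if (p->prio >= maxprio) { /* >= prefers nodes deeper in the queue */
            maxprio = p->prio;
            p_max = p;
        }
        pp = &p->next;
    }
    return p_max;
}
```

Fed a node whose fields are all zero, this scan follows its next field, finds a null link and reports corruption instead of dereferencing it.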

> In the code r4 is the address of the fastlock and its status value is 
> 0x0fb57250 which is the pointer to the wait_node.
> 
> (gdb) x/10 $r4
> 0x7fffd3a4:     0x0fb57250      0x0fde66cc      0x0fb56e40      0x7fffd3c0
> 0x7fffd3b4:     0x0fdc8588      0x0fde66cc      0x0fb57250      0x7fffd3d0
> 0x7fffd3c4:     0x0fdc895c      0x0fb5f974
> 
> Unfortunately the wait node itself is all zeros (p_node->abandoned was 0,
> but the thr and next pointers were also 0).

Perhaps the thread bailed out of its wait prematurely, without being
resumed by the lock owner. That would spell disaster, because the wait
node is defined in automatic storage; it must not be destroyed while
it is still enqueued.

> So the question is: is this a legal state?

No; in fact this is a felony in at least 38 states. ;)

