This is the mail archive of the libc-alpha@sources.redhat.com mailing list for the glibc project.

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]
Other format:	[Raw text]

Re:Example ex11 fails on SMP systems

From: "Steve Munroe" <sjmunroe at us dot ibm dot com>
To: libc-alpha at sources dot redhat dot com
Date: Wed, 24 Apr 2002 17:14:15 -0500
Subject: Re:Example ex11 fails on SMP systems

Continuing on this topic....

The initial patch only reactivated the spurious_wakeup restart loop but
resulted in a different hang. In this case we found that

if (self->p_nextlock != NULL)

was not sufficient to detect whether the pthread_descr was on the spinlock
wait list or not. Specifically pthread_unlock sets "self->p_nextlock =
NULL" before restarting a suspended thread. However this condition also
holds when the waiting thread is the last thread or only thread on the wait
list. This "false positive" was causing (in conjunction with the
spurious_wakeup) a thread, already on the wait list, to be enqueued again.

To address this problem I needed a way to guarantee that the ->p_nextlock
of any waiting thread would never be zero (NULL). The logic:

    do {
       oldstatus = lock->__status;
       ...
       newstatus = (long) self | 1;
       ...
         THREAD_SETMEM(self, p_nextlock, (pthread_descr) (oldstatus &
~1L));
       ...
     } while(! __compare_and_swap(&lock->__status, oldstatus, newstatus));

copies lock->__status (via oldstatus) to self->p_nextlock before storing
self (via newstatus) into lock->__status. For a blocked spinlock __status
will be non-zero (either 1L or self | 1L), so p_nextlock would be non-zero
as well except for the (oldstatus & ~1L) used in the assignment. This "ands
off" the low order bit and results in zero (NULL) for the first waiter.

Assigning oldstatus directly (without the "& ~1L") to p_nextlock would
guarantee that p_nextlock would be non-zero until pthread_unlock sets it to
NULL. However the logic of pthread_unlock depends on the p_nextlock chain
being valid addresses and the list being NULL terminated. So removing the "
& ~1L" from pthread_lock requires adding the same masking to three places
in pthread_unlock. The following patch contains the complete set of changes
to spinlock.c

>>>>>>>>>>>>>>
diff -rc2P glibc-2.2.5/ChangeLog glibc-2.2.5-pthreads/ChangeLog
*** glibc-2.2.5/ChangeLog     Sun Jan 20 21:20:18 2002
--- glibc-2.2.5-pthreads/ChangeLog  Wed Apr 24 16:49:24 2002
***************
*** 1,2 ****
--- 1,9 ----
+ 2002-04-24  Steven Munroe  <sjmunroe@us.ibm.com>
+
+     * linuxthreads/spinlock.c
+     Fixed race conditions in spinlock.c related to "spurious_wakeups"
+     generated by timed rwlocks. This fixes a hang in
+     linuxthreads/Examples/ex11 during make check.
+
  2002-01-18  Andreas Schwab  <schwab@suse.de>

diff -rc2P glibc-2.2.5/linuxthreads/spinlock.c
glibc-2.2.5-pthreads/linuxthreads/spinlock.c
*** glibc-2.2.5/linuxthreads/spinlock.c   Wed Aug 29 21:11:19 2001
--- glibc-2.2.5-pthreads/linuxthreads/spinlock.c      Tue Apr 23 11:11:19
2002
***************
*** 88,93 ****
    spin_count = 0;

- again:
-
    /* On SMP, try spinning to get the lock. */

--- 88,91 ----
***************
*** 117,120 ****
--- 115,120 ----
    }

+ again:
+
    /* No luck, try once more or suspend. */

***************
*** 133,137 ****

      if (self != NULL) {
!       THREAD_SETMEM(self, p_nextlock, (pthread_descr) (oldstatus & ~1L));
        /* Make sure the store in p_nextlock completes before performing
           the compare-and-swap */
--- 133,137 ----

      if (self != NULL) {
!       THREAD_SETMEM(self, p_nextlock, (pthread_descr) (oldstatus));
        /* Make sure the store in p_nextlock completes before performing
           the compare-and-swap */
***************
*** 217,221 ****
      }
      ptr = &(thr->p_nextlock);
!     thr = *ptr;
    }

--- 217,221 ----
      }
      ptr = &(thr->p_nextlock);
!     thr = (pthread_descr)((long)(thr->p_nextlock) & ~1L);
    }

***************
*** 229,233 ****
      thr = (pthread_descr) (oldstatus & ~1L);
      if (! __compare_and_swap_with_release_semantics
!         (&lock->__status, oldstatus, (long)(thr->p_nextlock)))
        goto again;
    } else {
--- 229,233 ----
      thr = (pthread_descr) (oldstatus & ~1L);
      if (! __compare_and_swap_with_release_semantics
!         (&lock->__status, oldstatus, (long)(thr->p_nextlock) & -1L))
        goto again;
    } else {
***************
*** 235,239 ****
         But in this case we must also flip the least significant bit
         of the status to mark the lock as released. */
!     thr = *maxptr;
      *maxptr = thr->p_nextlock;

--- 235,239 ----
         But in this case we must also flip the least significant bit
         of the status to mark the lock as released. */
!     thr = (pthread_descr)((long)*maxptr & -1L);
      *maxptr = thr->p_nextlock;

<<<<<<<<<<<<<<

With this patch applied ex11 ran 1000 iterations without failure (before it
would fail within 20 iterations).

I am not convinced that this is the best solution but it at least seems to
work. Anybody have a suggestions or alternatives?

Unfortunately this patch does not fix ex9 which still fails with either a
segfault or (less frequently) a hang. I will continue looking into ex9.

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]