This is the mail archive of the libc-hacker@sourceware.cygnus.com mailing list for the glibc project.



Re: "memory" clobber in synchronization primitives


> Date: Tue, 1 Sep 1998 01:03:14 -0400
> From: Roland McGrath <roland@frob.com>

> > Note that on ppc, 'volatile' is not at present sufficient for
> > multiprocessor synchronization.  Arguably, this should be fixed in
> > egcs---certainly it would make the embedded people happy.
> 
> Please clarify.  `volatile' is not sufficient on the x86 either to
> guarantee coherent values when one processor writes while another reads.  I
> do not think it needs to be or should be--that's what the explicit
> synchronization primitives are for.  The purpose of `volatile' is to ensure
> that reads and writes cannot be optimized out, for things like mapped
> device registers, and interrupt handlers changing memory values.  I do not
> think that `volatile' should be made to, or be expected to, do MP
> synchronization.  Programmers can use the synchronization facilities.

The question then becomes "should the pthread lock/unlock functions be
using these synchronization facilities?".  I claim that they should,
if only to allow user code to be machine-independent.

Then, if they do, it is the synchronization facilities that should be
memory barriers, not the actual compare-and-swap itself.

If this is done in linuxthreads itself, rather than in the
compare_and_swap routines, life becomes somewhat nicer; for one
thing, it is only necessary to put such a barrier _before_ unlock and
_after_ lock, rather than both before and after each operation, as is
done now for ppc.  In fact, I think there are some code paths that
miss the barrier in unlock, which is bad---fortunately, at present
they're too long for real problems to occur.

So, I propose the attached patch. The only addition is a new macro,

#define enforce_memory_ordering() asm ("" : : : "memory")

which may or may not need an actual instruction inside the "",
depending on whether a particular class of Linux ports supports
multiple-processor systems, and on what your processor promises with
respect to external synchronization.  The macro is a memory barrier:
all stores before it in program order occur before all stores after
it in program order.

With this, a "generic" compare_and_swap definition is

extern inline int
__compare_and_swap (long *p, long oldval, long newval)
{
  int result;
  asm ("<insert something here>"
       : "=g"(result), "+m"(*p)
       : "g"(newval), "g"(oldval));
  return result;
}

That is, it doesn't clobber 'memory', is not 'volatile', and its
outputs are *p (which it may have changed) and the flag that says
whether or not it made the swap.  test_and_set, for machines that
don't have compare_and_swap, is similar; it just has two fewer inputs.
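For reference, the contract that asm body must implement is just the
following (a plain-C model, not atomic; shown only to pin down the
return value and the update of *p):

```c
/* Non-atomic model of __compare_and_swap's contract: if *p equals
   oldval, store newval and return 1; otherwise leave *p alone and
   return 0.  Real implementations do this atomically in asm.  */
static int
model_compare_and_swap (long *p, long oldval, long newval)
{
  if (*p == oldval)
    {
      *p = newval;
      return 1;
    }
  return 0;
}
```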

PS: My copy of the "Intel Architecture Software Developer's Manual"
suggests the CPUID instruction for enforce_memory_ordering, at
least for i686.  I will leave it up to people who can actually test it
to make the change if needed.
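Such an i686 definition might look like this (untested, exactly as
the caveat above says; note that CPUID writes EAX through EDX, so all
four registers must appear as outputs or clobbers):

```c
/* Untested sketch of a serializing enforce_memory_ordering() for
   i686, following the Intel manual's CPUID suggestion.  CPUID
   clobbers EAX-EDX, hence the "+a" output and register clobbers.  */
#define enforce_memory_ordering() \
  do { \
    int eax_scratch = 0; \
    __asm__ __volatile__ ("cpuid" \
                          : "+a" (eax_scratch) \
                          : /* no inputs */ \
                          : "ebx", "ecx", "edx", "memory"); \
  } while (0)
```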

-- 
Geoffrey Keating <geoffk@ozemail.com.au>

===File ~/patches/glibc-26.diff=============================
1998-09-03  Geoff Keating  <geoffk@ozemail.com.au>

	* spinlock.c (__pthread_lock): Use enforce_memory_ordering().
	(__pthread_trylock): Likewise.
	(__pthread_unlock): Likewise.
	(__pthread_compare_and_swap): Likewise.
	* sysdeps/alpha/pt-machine.h (enforce_memory_ordering): New macro.
	(RELEASE): Delete, is unused.
	(__compare_and_swap): Remove 'mb' instruction.
	* sysdeps/arm/pt-machine.h (enforce_memory_ordering): New macro.
	* sysdeps/i386/pt-machine.h (enforce_memory_ordering): New macro.
	* sysdeps/i386/i686/pt-machine.h (enforce_memory_ordering): New macro.
	* sysdeps/m68k/pt-machine.h (enforce_memory_ordering): New macro.
	* sysdeps/mips/pt-machine.h (enforce_memory_ordering): New macro.
	* sysdeps/powerpc/pt-machine.h (enforce_memory_ordering): New macro
	to replace sync().
	(__compare_and_swap): Rewrite.
	* sysdeps/sparc/sparc32/pt-machine.h (enforce_memory_ordering):
	New macro.
	(RELEASE): Delete, is unused.
	* sysdeps/sparc/sparc64/pt-machine.h (enforce_memory_ordering):
	New macro.

diff -urpN libc-old/linuxthreads/spinlock.c libc/linuxthreads/spinlock.c
--- libc-old/linuxthreads/spinlock.c	Tue Aug 25 17:34:28 1998
+++ libc/linuxthreads/spinlock.c	Thu Sep  3 12:38:20 1998
@@ -53,6 +53,7 @@ void __pthread_lock(struct _pthread_fast
       self->p_nextwaiting = (pthread_descr) oldstatus;
   } while(! compare_and_swap(&lock->status, oldstatus, newstatus,
                              &lock->spinlock));
+  enforce_memory_ordering();
   if (oldstatus != 0) suspend(self);
 }
 
@@ -64,6 +65,7 @@ int __pthread_trylock(struct _pthread_fa
     oldstatus = lock->status;
     if (oldstatus != 0) return EBUSY;
   } while(! compare_and_swap(&lock->status, 0, 1, &lock->spinlock));
+  enforce_memory_ordering();
   return 0;
 }
 
@@ -73,6 +75,7 @@ void __pthread_unlock(struct _pthread_fa
   pthread_descr thr, * ptr, * maxptr;
   int maxprio;
 
+  enforce_memory_ordering();
 again:
   oldstatus = lock->status;
   if (oldstatus == 1) {
@@ -109,6 +112,7 @@ again:
   }
   /* Wake up the selected waiting thread */
   thr->p_nextwaiting = NULL;
+  enforce_memory_ordering();
   restart(thr);
 }
 
@@ -127,8 +131,10 @@ int __pthread_compare_and_swap(long * pt
 {
   int res;
   if (testandset(spinlock)) __pthread_acquire(spinlock);
+  enforce_memory_ordering();
   if (*ptr == oldval) {
     *ptr = newval; res = 1;
+    enforce_memory_ordering();
   } else {
     res = 0;
   }
diff -urpN libc-old/linuxthreads/sysdeps/alpha/pt-machine.h libc/linuxthreads/sysdeps/alpha/pt-machine.h
--- libc-old/linuxthreads/sysdeps/alpha/pt-machine.h	Fri Aug 14 21:27:32 1998
+++ libc/linuxthreads/sysdeps/alpha/pt-machine.h	Thu Sep  3 12:42:57 1998
@@ -21,6 +21,8 @@
 
 #include <asm/pal.h>
 
+#define enforce_memory_ordering() __asm__ __volatile__("mb" : : : "memory")
+
 
 /* Spinlock implementation; required.  */
 extern inline long testandset(int *spinlock)
@@ -44,12 +46,6 @@ extern inline long testandset(int *spinl
   return ret;
 }
 
-/* Spinlock release; default is just set to zero.  */
-#define RELEASE(spinlock) \
-  __asm__ __volatile__("mb" : : : "memory"); \
-  *spinlock = 0
-
-
 /* Begin allocating thread stacks at this address.  Default is to allocate
    them just below the initial program stack.  */
 #define THREAD_STACK_START_ADDRESS  0x40000000000
@@ -93,7 +89,7 @@ extern inline int __compare_and_swap(lon
 	"mov %3,%0\n\t"
 	"stq_c %0,%1\n\t"
 	"beq %0,1b\n\t"
-	"2:\tmb\n"
+	"2:"
 	"/* End compare & swap */"
 	: "=&r"(ret), "=m"(*p)
 	: "r"(oldval), "r"(newval), "m"(*p));
diff -urpN libc-old/linuxthreads/sysdeps/arm/pt-machine.h libc/linuxthreads/sysdeps/arm/pt-machine.h
--- libc-old/linuxthreads/sysdeps/arm/pt-machine.h	Fri Aug 14 21:27:32 1998
+++ libc/linuxthreads/sysdeps/arm/pt-machine.h	Thu Sep  3 12:44:04 1998
@@ -24,6 +24,8 @@
    machines.  Unfortunately we have no way to detect this at compile
    time; let's hope nobody tries to use one.  */
 
+#define enforce_memory_ordering() __asm__ ("" : : : "memory")
+
 /* Spinlock implementation; required.  */
 extern inline int
 testandset (int *spinlock)
diff -urpN libc-old/linuxthreads/sysdeps/i386/i686/pt-machine.h libc/linuxthreads/sysdeps/i386/i686/pt-machine.h
--- libc-old/linuxthreads/sysdeps/i386/i686/pt-machine.h	Fri Aug 14 21:27:32 1998
+++ libc/linuxthreads/sysdeps/i386/i686/pt-machine.h	Thu Sep  3 12:45:03 1998
@@ -20,6 +20,8 @@
    Boston, MA 02111-1307, USA.  */
 
 
+#define enforce_memory_ordering() __asm__ ("" : : : "memory")
+
 /* Spinlock implementation; required.  */
 extern inline int
 testandset (int *spinlock)
diff -urpN libc-old/linuxthreads/sysdeps/i386/pt-machine.h libc/linuxthreads/sysdeps/i386/pt-machine.h
--- libc-old/linuxthreads/sysdeps/i386/pt-machine.h	Fri Aug 14 21:27:32 1998
+++ libc/linuxthreads/sysdeps/i386/pt-machine.h	Thu Sep  3 12:44:40 1998
@@ -20,6 +20,8 @@
    Boston, MA 02111-1307, USA.  */
 
 
+#define enforce_memory_ordering() __asm__ ("" : : : "memory")
+
 /* Spinlock implementation; required.  */
 extern inline int
 testandset (int *spinlock)
diff -urpN libc-old/linuxthreads/sysdeps/m68k/pt-machine.h libc/linuxthreads/sysdeps/m68k/pt-machine.h
--- libc-old/linuxthreads/sysdeps/m68k/pt-machine.h	Fri Aug 14 21:27:32 1998
+++ libc/linuxthreads/sysdeps/m68k/pt-machine.h	Thu Sep  3 12:46:47 1998
@@ -20,6 +20,9 @@
    59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.  */
 
 
+/* Yes, 'nop' really does enforce memory ordering on the m68k series!  */
+#define enforce_memory_ordering() __asm__ ("nop" : : : "memory")
+
 /* Spinlock implementation; required.  */
 extern inline int
 testandset (int *spinlock)
diff -urpN libc-old/linuxthreads/sysdeps/mips/pt-machine.h libc/linuxthreads/sysdeps/mips/pt-machine.h
--- libc-old/linuxthreads/sysdeps/mips/pt-machine.h	Fri Aug 14 21:27:32 1998
+++ libc/linuxthreads/sysdeps/mips/pt-machine.h	Thu Sep  3 12:47:40 1998
@@ -26,6 +26,7 @@
    yet correctly.  There is however a better solution for R3000
    uniprocessor machines possible.  */
 
+#define enforce_memory_ordering() __asm__ ("" : : : "memory")
 
 /* Spinlock implementation; required.  */
 extern inline long testandset(int *spinlock)
diff -urpN libc-old/linuxthreads/sysdeps/powerpc/pt-machine.h libc/linuxthreads/sysdeps/powerpc/pt-machine.h
--- libc-old/linuxthreads/sysdeps/powerpc/pt-machine.h	Tue Aug 25 17:34:29 1998
+++ libc/linuxthreads/sysdeps/powerpc/pt-machine.h	Thu Sep  3 16:21:21 1998
@@ -25,9 +25,9 @@
    are completed before we reset a lock.  */
 #if 0
 /* on non multiprocessor systems, you can just: */
-#define sync() /* nothing */
+#define enforce_memory_ordering() asm ("" : : : "memory")
 #else
-#define sync() __asm__ __volatile__ ("sync")
+#define enforce_memory_ordering() asm ("sync" : : : "memory")
 #endif
 
 /* Get some notion of the current stack.  Need not be exactly the top
@@ -47,19 +47,17 @@ extern inline
 int
 __compare_and_swap (long *p, long oldval, long newval)
 {
-  int ret;
+  int curval;
 
-  sync();
-  __asm__ __volatile__(
-		       "0:    lwarx %0,0,%1 ;"
-		       "      xor. %0,%3,%0;"
-		       "      bne 1f;"
-		       "      stwcx. %2,0,%1;"
-		       "      bne- 0b;"
-		       "1:    "
-	: "=&r"(ret)
-	: "r"(p), "r"(newval), "r"(oldval)
-	: "cr0", "memory");
-  sync();
-  return ret == 0;
+  asm ("\
+0:	lwarx	%0,0,%2
+	cmpw%I4	%0,%4
+	bne	1f
+	stwcx.	%3,0,%2
+	bne-	0b
+1:"
+       : "=&r"(curval), "+m"(*p)
+       : "r"(p), "r"(newval), "Ir"(oldval)
+       : "cr0");
+  return curval == oldval;
 }
diff -urpN libc-old/linuxthreads/sysdeps/sparc/sparc32/pt-machine.h libc/linuxthreads/sysdeps/sparc/sparc32/pt-machine.h
--- libc-old/linuxthreads/sysdeps/sparc/sparc32/pt-machine.h	Fri Aug 14 21:27:32 1998
+++ libc/linuxthreads/sysdeps/sparc/sparc32/pt-machine.h	Thu Sep  3 12:54:23 1998
@@ -19,6 +19,8 @@
    write to the Free Software Foundation, Inc., 59 Temple Place - Suite 330,
    Boston, MA 02111-1307, USA.  */
 
+#define enforce_memory_ordering() asm ("stbar" : : : "memory")
+
 /* Spinlock implementation; required.  */
 static inline int testandset(int *spinlock)
 {
@@ -30,12 +32,6 @@ static inline int testandset(int *spinlo
 
   return ret;
 }
-
-
-/* Spinlock release; default is just set to zero.  */
-#define RELEASE(spinlock) \
-  __asm__ __volatile__("stbar; stb %1,%0" : "=m"(*(spinlock)) : "r"(0));
-
 
 /* Get some notion of the current stack.  Need not be exactly the top
    of the stack, just something somewhere in the current frame.  */
diff -urpN libc-old/linuxthreads/sysdeps/sparc/sparc64/pt-machine.h libc/linuxthreads/sysdeps/sparc/sparc64/pt-machine.h
--- libc-old/linuxthreads/sysdeps/sparc/sparc64/pt-machine.h	Fri Aug 14 21:27:32 1998
+++ libc/linuxthreads/sysdeps/sparc/sparc64/pt-machine.h	Thu Sep  3 12:54:57 1998
@@ -19,6 +19,7 @@
    not, write to the Free Software Foundation, Inc.,
    59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.  */
 
+#define enforce_memory_ordering() asm ("stbar" : : : "memory")
 
 /* Spinlock implementation; required.  */
 extern inline int
============================================================

