This is the mail archive of the
libc-hacker@sourceware.cygnus.com
mailing list for the glibc project.
Re: "memory" clobber in synchronization primitives
- To: roland@frob.com
- Subject: Re: "memory" clobber in synchronization primitives
- From: Geoff Keating <geoffk@ozemail.com.au>
- Date: Thu, 3 Sep 1998 21:49:01 +1000
- CC: drepper@cygnus.com, libc-hacker@gnu.org
- References: <199809010503.BAA18697@baalperazim.frob.com>
> Date: Tue, 1 Sep 1998 01:03:14 -0400
> From: Roland McGrath <roland@frob.com>
> > Note that on ppc, 'volatile' is not at present sufficient for
> > multiprocessor synchronization. Arguably, this should be fixed in
> > egcs---certainly it would make the embedded people happy.
>
> Please clarify. `volatile' is not sufficient on the x86 either to
> guarantee coherent values when one processor writes while another reads. I
> do not think it needs to be or should be--that's what the explicit
> synchronization primitives are for. The purpose of `volatile' is to ensure
> that reads and writes cannot be optimized out, for things like mapped
> device registers, and interrupt handlers changing memory values. I do not
> think that `volatile' should be made to, or be expected to, do MP
> synchronization. Programmers can use the synchronization facilities.
The question then becomes "should the pthread lock/unlock functions be
using these synchronization facilities?". I claim that they should,
if only to allow user code to be machine-independent.
Then, if they do, it is the synchronization facilities that should be
memory barriers, not the actual compare-and-swap itself.
Doing this in linuxthreads itself, rather than in the
compare_and_swap routines, makes life somewhat nicer; for one
thing, a barrier is only necessary _before_ unlock and _after_ lock,
rather than both before and after each operation, as is done now for
ppc. In fact, I think there are some code paths that miss the barrier
in unlock, which is bad; fortunately, at present they're too long for
real problems to occur.
So, I propose the attached patch. The only addition is a new macro,
#define enforce_memory_ordering() asm ("" : : : "memory")
which may or may not need something between the quotes, depending on
whether a particular class of Linux ports supports multiprocessor
systems, and on what your processor promises with respect to external
synchronization. The macro is a memory barrier: all stores before it
in program order complete before all stores after it in program order.
With this, a "generic" compare_and_swap definition is
extern inline int
__compare_and_swap (long *p, long oldval, long newval)
{
  int result;
  asm ("<insert something here>"
       : "=g"(result), "+m"(*p)
       : "g"(newval), "g"(oldval));
  return result;
}
That is, it doesn't clobber 'memory' and is not 'volatile', and its
outputs are *p (which it may have changed) and the flag that says
whether or not it made the swap. test_and_set, for machines that
don't have compare_and_swap, is similar; it just has two fewer inputs.
PS: My copy of the "Intel Architecture Software Developer's Manual"
suggests the CPUID instruction for enforce_memory_ordering, at
least for i686. I will leave it up to people who can actually test it
to make the change if needed.
--
Geoffrey Keating <geoffk@ozemail.com.au>
===File ~/patches/glibc-26.diff=============================
1998-09-03 Geoff Keating <geoffk@ozemail.com.au>
* spinlock.c (__pthread_lock): Use enforce_memory_ordering().
(__pthread_trylock): Likewise.
(__pthread_unlock): Likewise.
(__pthread_compare_and_swap): Likewise.
* sysdeps/alpha/pt-machine.h (enforce_memory_ordering): New macro.
(RELEASE): Delete, is unused.
(__compare_and_swap): Remove 'mb' instruction.
* sysdeps/arm/pt-machine.h (enforce_memory_ordering): New macro.
* sysdeps/i386/pt-machine.h (enforce_memory_ordering): New macro.
* sysdeps/i386/i686/pt-machine.h (enforce_memory_ordering): New macro.
* sysdeps/m68k/pt-machine.h (enforce_memory_ordering): New macro.
* sysdeps/mips/pt-machine.h (enforce_memory_ordering): New macro.
* sysdeps/powerpc/pt-machine.h (enforce_memory_ordering): New macro
to replace sync().
(__compare_and_swap): Rewrite.
* sysdeps/sparc/sparc32/pt-machine.h (enforce_memory_ordering):
New macro.
(RELEASE): Delete, is unused.
* sysdeps/sparc/sparc64/pt-machine.h (enforce_memory_ordering):
New macro.
diff -urpN libc-old/linuxthreads/spinlock.c libc/linuxthreads/spinlock.c
--- libc-old/linuxthreads/spinlock.c Tue Aug 25 17:34:28 1998
+++ libc/linuxthreads/spinlock.c Thu Sep 3 12:38:20 1998
@@ -53,6 +53,7 @@ void __pthread_lock(struct _pthread_fast
self->p_nextwaiting = (pthread_descr) oldstatus;
} while(! compare_and_swap(&lock->status, oldstatus, newstatus,
&lock->spinlock));
+ enforce_memory_ordering();
if (oldstatus != 0) suspend(self);
}
@@ -64,6 +65,7 @@ int __pthread_trylock(struct _pthread_fa
oldstatus = lock->status;
if (oldstatus != 0) return EBUSY;
} while(! compare_and_swap(&lock->status, 0, 1, &lock->spinlock));
+ enforce_memory_ordering();
return 0;
}
@@ -73,6 +75,7 @@ void __pthread_unlock(struct _pthread_fa
pthread_descr thr, * ptr, * maxptr;
int maxprio;
+ enforce_memory_ordering();
again:
oldstatus = lock->status;
if (oldstatus == 1) {
@@ -109,6 +112,7 @@ again:
}
/* Wake up the selected waiting thread */
thr->p_nextwaiting = NULL;
+ enforce_memory_ordering();
restart(thr);
}
@@ -127,8 +131,10 @@ int __pthread_compare_and_swap(long * pt
{
int res;
if (testandset(spinlock)) __pthread_acquire(spinlock);
+ enforce_memory_ordering();
if (*ptr == oldval) {
*ptr = newval; res = 1;
+ enforce_memory_ordering();
} else {
res = 0;
}
diff -urpN libc-old/linuxthreads/sysdeps/alpha/pt-machine.h libc/linuxthreads/sysdeps/alpha/pt-machine.h
--- libc-old/linuxthreads/sysdeps/alpha/pt-machine.h Fri Aug 14 21:27:32 1998
+++ libc/linuxthreads/sysdeps/alpha/pt-machine.h Thu Sep 3 12:42:57 1998
@@ -21,6 +21,8 @@
#include <asm/pal.h>
+#define enforce_memory_ordering() __asm__ __volatile__("mb" : : : "memory")
+
/* Spinlock implementation; required. */
extern inline long testandset(int *spinlock)
@@ -44,12 +46,6 @@ extern inline long testandset(int *spinl
return ret;
}
-/* Spinlock release; default is just set to zero. */
-#define RELEASE(spinlock) \
- __asm__ __volatile__("mb" : : : "memory"); \
- *spinlock = 0
-
-
/* Begin allocating thread stacks at this address. Default is to allocate
them just below the initial program stack. */
#define THREAD_STACK_START_ADDRESS 0x40000000000
@@ -93,7 +89,7 @@ extern inline int __compare_and_swap(lon
"mov %3,%0\n\t"
"stq_c %0,%1\n\t"
"beq %0,1b\n\t"
- "2:\tmb\n"
+ "2:"
"/* End compare & swap */"
: "=&r"(ret), "=m"(*p)
: "r"(oldval), "r"(newval), "m"(*p));
diff -urpN libc-old/linuxthreads/sysdeps/arm/pt-machine.h libc/linuxthreads/sysdeps/arm/pt-machine.h
--- libc-old/linuxthreads/sysdeps/arm/pt-machine.h Fri Aug 14 21:27:32 1998
+++ libc/linuxthreads/sysdeps/arm/pt-machine.h Thu Sep 3 12:44:04 1998
@@ -24,6 +24,8 @@
machines. Unfortunately we have no way to detect this at compile
time; let's hope nobody tries to use one. */
+#define enforce_memory_ordering() __asm__ ("" : : : "memory")
+
/* Spinlock implementation; required. */
extern inline int
testandset (int *spinlock)
diff -urpN libc-old/linuxthreads/sysdeps/i386/i686/pt-machine.h libc/linuxthreads/sysdeps/i386/i686/pt-machine.h
--- libc-old/linuxthreads/sysdeps/i386/i686/pt-machine.h Fri Aug 14 21:27:32 1998
+++ libc/linuxthreads/sysdeps/i386/i686/pt-machine.h Thu Sep 3 12:45:03 1998
@@ -20,6 +20,8 @@
Boston, MA 02111-1307, USA. */
+#define enforce_memory_ordering() __asm__ ("" : : : "memory")
+
/* Spinlock implementation; required. */
extern inline int
testandset (int *spinlock)
diff -urpN libc-old/linuxthreads/sysdeps/i386/pt-machine.h libc/linuxthreads/sysdeps/i386/pt-machine.h
--- libc-old/linuxthreads/sysdeps/i386/pt-machine.h Fri Aug 14 21:27:32 1998
+++ libc/linuxthreads/sysdeps/i386/pt-machine.h Thu Sep 3 12:44:40 1998
@@ -20,6 +20,8 @@
Boston, MA 02111-1307, USA. */
+#define enforce_memory_ordering() __asm__ ("" : : : "memory")
+
/* Spinlock implementation; required. */
extern inline int
testandset (int *spinlock)
diff -urpN libc-old/linuxthreads/sysdeps/m68k/pt-machine.h libc/linuxthreads/sysdeps/m68k/pt-machine.h
--- libc-old/linuxthreads/sysdeps/m68k/pt-machine.h Fri Aug 14 21:27:32 1998
+++ libc/linuxthreads/sysdeps/m68k/pt-machine.h Thu Sep 3 12:46:47 1998
@@ -20,6 +20,9 @@
59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. */
+/* Yes, 'nop' really does enforce memory ordering on the m68k series! */
+#define enforce_memory_ordering() __asm__ ("nop" : : : "memory")
+
/* Spinlock implementation; required. */
extern inline int
testandset (int *spinlock)
diff -urpN libc-old/linuxthreads/sysdeps/mips/pt-machine.h libc/linuxthreads/sysdeps/mips/pt-machine.h
--- libc-old/linuxthreads/sysdeps/mips/pt-machine.h Fri Aug 14 21:27:32 1998
+++ libc/linuxthreads/sysdeps/mips/pt-machine.h Thu Sep 3 12:47:40 1998
@@ -26,6 +26,7 @@
yet correctly. There is however a better solution for R3000
uniprocessor machines possible. */
+#define enforce_memory_ordering() __asm__ ("" : : : "memory")
/* Spinlock implementation; required. */
extern inline long testandset(int *spinlock)
diff -urpN libc-old/linuxthreads/sysdeps/powerpc/pt-machine.h libc/linuxthreads/sysdeps/powerpc/pt-machine.h
--- libc-old/linuxthreads/sysdeps/powerpc/pt-machine.h Tue Aug 25 17:34:29 1998
+++ libc/linuxthreads/sysdeps/powerpc/pt-machine.h Thu Sep 3 16:21:21 1998
@@ -25,9 +25,9 @@
are completed before we reset a lock. */
#if 0
/* on non multiprocessor systems, you can just: */
-#define sync() /* nothing */
+#define enforce_memory_ordering() asm ("" : : : "memory")
#else
-#define sync() __asm__ __volatile__ ("sync")
+#define enforce_memory_ordering() asm ("sync" : : : "memory")
#endif
/* Get some notion of the current stack. Need not be exactly the top
@@ -47,19 +47,17 @@ extern inline
int
__compare_and_swap (long *p, long oldval, long newval)
{
- int ret;
+ int curval;
- sync();
- __asm__ __volatile__(
- "0: lwarx %0,0,%1 ;"
- " xor. %0,%3,%0;"
- " bne 1f;"
- " stwcx. %2,0,%1;"
- " bne- 0b;"
- "1: "
- : "=&r"(ret)
- : "r"(p), "r"(newval), "r"(oldval)
- : "cr0", "memory");
- sync();
- return ret == 0;
+ asm ("\
+0: lwarx %0,0,%2
+ cmpw%I4 %0,%4
+ bne 1f
+ stwcx. %3,0,%2
+ bne- 0b
+1:"
+ : "=&r"(curval), "+m"(*p)
+ : "r"(p), "r"(newval), "Ir"(oldval)
+ : "cr0");
+ return curval == oldval;
}
diff -urpN libc-old/linuxthreads/sysdeps/sparc/sparc32/pt-machine.h libc/linuxthreads/sysdeps/sparc/sparc32/pt-machine.h
--- libc-old/linuxthreads/sysdeps/sparc/sparc32/pt-machine.h Fri Aug 14 21:27:32 1998
+++ libc/linuxthreads/sysdeps/sparc/sparc32/pt-machine.h Thu Sep 3 12:54:23 1998
@@ -19,6 +19,8 @@
write to the Free Software Foundation, Inc., 59 Temple Place - Suite 330,
Boston, MA 02111-1307, USA. */
+#define enforce_memory_ordering() asm ("stbar" : : : "memory")
+
/* Spinlock implementation; required. */
static inline int testandset(int *spinlock)
{
@@ -30,12 +32,6 @@ static inline int testandset(int *spinlo
return ret;
}
-
-
-/* Spinlock release; default is just set to zero. */
-#define RELEASE(spinlock) \
- __asm__ __volatile__("stbar; stb %1,%0" : "=m"(*(spinlock)) : "r"(0));
-
/* Get some notion of the current stack. Need not be exactly the top
of the stack, just something somewhere in the current frame. */
diff -urpN libc-old/linuxthreads/sysdeps/sparc/sparc64/pt-machine.h libc/linuxthreads/sysdeps/sparc/sparc64/pt-machine.h
--- libc-old/linuxthreads/sysdeps/sparc/sparc64/pt-machine.h Fri Aug 14 21:27:32 1998
+++ libc/linuxthreads/sysdeps/sparc/sparc64/pt-machine.h Thu Sep 3 12:54:57 1998
@@ -19,6 +19,7 @@
not, write to the Free Software Foundation, Inc.,
59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. */
+#define enforce_memory_ordering() asm ("stbar" : : : "memory")
/* Spinlock implementation; required. */
extern inline int
============================================================