This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.
Re: [PATCHv2] powerpc: Spinlock optimization and cleanup
- From: Torvald Riegel <triegel at redhat dot com>
- To: munroesj at linux dot vnet dot ibm dot com
- Cc: "Paul E. Murphy" <murphyp at linux dot vnet dot ibm dot com>, "libc-alpha at sourceware dot org" <libc-alpha at sourceware dot org>, rth at twiddle dot net, Tulio Magno Quites Machado Filho <tuliom at linux dot vnet dot ibm dot com>, Adhemerval Zanella <adhemerval dot zanella at linaro dot org>, Steve Munroe <sjmunroe at us dot ibm dot com>
- Date: Fri, 02 Oct 2015 13:38:28 +0200
- Subject: Re: [PATCHv2] powerpc: Spinlock optimization and cleanup
- Authentication-results: sourceware.org; auth=none
- References: <560C0DA6 dot 5060409 at linux dot vnet dot ibm dot com> <1443727671 dot 30828 dot 420 dot camel at localhost dot localdomain> <1443731284 dot 9011 dot 32 dot camel at oc7878010663>
On Thu, 2015-10-01 at 15:28 -0500, Steven Munroe wrote:
> On Thu, 2015-10-01 at 21:27 +0200, Torvald Riegel wrote:
> > On Wed, 2015-09-30 at 11:28 -0500, Paul E. Murphy wrote:
> > > diff --git a/sysdeps/powerpc/nptl/pthread_spin_lock.c b/sysdeps/powerpc/nptl/pthread_spin_lock.c
> > > index cc081f8..8a39da9 100644
> > > --- a/sysdeps/powerpc/nptl/pthread_spin_lock.c
> > > +++ b/sysdeps/powerpc/nptl/pthread_spin_lock.c
> > > @@ -19,24 +19,26 @@
> > > #include "pthreadP.h"
> > >
> > > int
> > > -pthread_spin_lock (lock)
> > > - pthread_spinlock_t *lock;
> > > +pthread_spin_lock (pthread_spinlock_t *lock)
> > > {
> > > unsigned int __tmp;
> > >
> > > asm volatile (
> > > - "1: lwarx %0,0,%1\n"
> > > + "0: lwzx %0,0,%1\n"
> > > + " cmpwi 0,%0,0\n"
> > > + " bne 0b\n"
> > > + "1: lwarx %0,0,%1" MUTEX_HINT_ACQ "\n"
> > > " cmpwi 0,%0,0\n"
> > > " bne- 2f\n"
> > > " stwcx. %2,0,%1\n"
> > > " bne- 2f\n"
> > > - " isync\n"
> > > - " .subsection 1\n"
> > > - "2: lwzx %0,0,%1\n"
> > > - " cmpwi 0,%0,0\n"
> > > - " bne 2b\n"
> > > - " b 1b\n"
> > > - " .previous"
> > > + __ARCH_ACQ_INSTR "\n"
> > > + " .subsection 1\n"
> > > + "2: lwzx %0,0,%1\n"
> > > + " cmpwi 0,%0,0\n"
> > > + " bne 2b\n"
> > > + " b 1b\n"
> > > + " .previous"
> > > : "=&r" (__tmp)
> > > : "r" (lock), "r" (1)
> > > : "cr0", "memory");
> >
> > Is this essentially a test-and-test-and-set implementation of a lock,
> > with the MUTEX_HINT_ACQ hint added?  If so, have you considered
> > adding a variant of atomic_compare_exchange_weak_acquire or
> > atomic_exchange_acquire that also sets the hint, and then writing a C
> > function that uses an atomic load in the outer test loop and the new
> > cmpxchg/xchg variant in the inner loop?  That would make it easier to
> > eventually merge this with the generic version of spinlocks.  Also,
> > once we add spinning and randomized exponential backoff to the
> > generic locks, powerpc would be able to benefit from those generic
> > changes too (perhaps with some additional tuning).
> >
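The structure proposed above (a relaxed load in the outer test loop and an acquire CAS in the inner loop) might be sketched in C11 roughly as follows. The function name and exact shape are illustrative, not glibc code; a hint-setting CAS variant could be substituted on powerpc without changing the structure.

```c
#include <stdatomic.h>

/* Illustrative test-and-test-and-set spinlock.  The outer loop spins
   on a plain relaxed load, generating no reservation traffic; only
   when the lock looks free is the weak acquire CAS attempted.  */
static int
spin_lock_sketch (atomic_int *lock)
{
  int expected;
  do
    {
      /* Outer test loop: read-only spin until the lock appears free.  */
      while (atomic_load_explicit (lock, memory_order_relaxed) != 0)
        ;
      expected = 0;
    }
  /* Weak CAS: may fail spuriously, so retry from the load loop.  */
  while (!atomic_compare_exchange_weak_explicit (lock, &expected, 1,
                                                 memory_order_acquire,
                                                 memory_order_relaxed));
  return 0;
}
```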
> Paul is traveling today so I will answer.
>
> It may be functionally possible to use a
>
> __atomic_compare_exchange_n ((mem), (expected), (desired),
> weak, __ATOMIC_ACQUIRE, __ATOMIC_RELAXED)
>
> in this construct.
I agree, but that's not what I proposed.
> But the cmpxchg is a clumsy way to generate the PowerISA acquire import
> barrier, which has a very specific form.
>
> Also it is not obvious how the MUTEX_HINT_ACQ applies to
> __atomic_compare_exchange_n in general or even
> __atomic_compare_exchange_n (*, *, *, *, __ATOMIC_ACQUIRE,
> __ATOMIC_RELAXED).
>
> In the PowerISA, MUTEX_HINT_ACQ means:
>
> The Load and Reserve instructions include an Exclusive
> Access hint (EH), which can be used to indicate
> that the instruction sequence being executed is implementing
> one of two types of algorithms:
>
> Atomic Update (EH=0)
> This hint indicates that the program is using a fetch and
> operate (e.g., fetch and add) or some similar algorithm
> and that all programs accessing the shared variable are
> likely to use a similar operation to access the shared
> variable for some time.
> Exclusive Access (EH=1)
> This hint indicates that the program is attempting to
> acquire a lock and if it succeeds, will perform another
> store to the lock variable (releasing the lock) before
> another program attempts to modify the lock variable.
>
> It is not clear that C11 __atomic_compare_exchange when/if/ever implies
> exclusive access in the meaning of the PowerISA.
Yes. The hint is orthogonal to the C11 CAS semantics.
This is why I asked about adding a variant of *glibc's*
atomic_compare_exchange_weak_acquire; see include/atomic.h.  This would
likely be implemented with custom asm on Power, and would simply defer
to atomic_compare_exchange_weak_acquire on all other archs.
Whether introducing that is better than just specializing the spinlocks
is something you will have to figure out. But there are several
lock-like CASes throughout the code (e.g., pthread_once, so not just in
the mutexes or rwlock). You probably don't want to have custom Power
variants of all those synchronization mechanisms.
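As a rough illustration of that idea (the macro name below is invented for this sketch and does not exist in include/atomic.h), the generic fallback could simply defer to a plain weak acquire CAS, leaving powerpc free to override it with asm that sets the EH=1 hint on lwarx:

```c
#include <stdatomic.h>

/* Hypothetical hinted-CAS variant, for illustration only.  A powerpc
   version would override this with custom lwarx/stwcx. asm carrying
   the EH=1 hint; the generic fallback just defers to the ordinary
   weak acquire compare-and-exchange.  */
#ifndef atomic_compare_exchange_weak_acquire_hint_excl
# define atomic_compare_exchange_weak_acquire_hint_excl(mem, expected, desired) \
  atomic_compare_exchange_weak_explicit ((mem), (expected), (desired),          \
                                         memory_order_acquire,                  \
                                         memory_order_relaxed)
#endif
```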
> Finally trying to combine atomic_compare_exchange_weak_acquire with the
> relaxed load spin is likely to generate some awkward sequences with
> unnecessary code.
Perhaps, but that's determined by the compiler and how good it is at
optimizing atomics.  Whether it matters will have to be tracked by a
benchmark: if this unnecessary code matters and the compiler isn't yet
doing a good enough job, it should be visible in some benchmark; once it
is, the benchmark will tell us, and we're not stuck with an "old
assumption" in our code.
> I would prefer to proceed with this implementation and revisit when we
> are further along with C11 integration.
I don't mind having intermediate steps. But if you do something
Power-specific, and then somebody else improves the generic version, IMO
this other person has no obligation to spend any effort on updating your
Power-specific version too.