This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: [PATCHv2] powerpc: Spinlock optimization and cleanup


On Thu, 2015-10-01 at 15:28 -0500, Steven Munroe wrote:
> On Thu, 2015-10-01 at 21:27 +0200, Torvald Riegel wrote:
> > On Wed, 2015-09-30 at 11:28 -0500, Paul E. Murphy wrote:
> > > diff --git a/sysdeps/powerpc/nptl/pthread_spin_lock.c b/sysdeps/powerpc/nptl/pthread_spin_lock.c
> > > index cc081f8..8a39da9 100644
> > > --- a/sysdeps/powerpc/nptl/pthread_spin_lock.c
> > > +++ b/sysdeps/powerpc/nptl/pthread_spin_lock.c
> > > @@ -19,24 +19,26 @@
> > >  #include "pthreadP.h"
> > >  
> > >  int
> > > -pthread_spin_lock (lock)
> > > -     pthread_spinlock_t *lock;
> > > +pthread_spin_lock (pthread_spinlock_t *lock)
> > >  {
> > >    unsigned int __tmp;
> > >  
> > >    asm volatile (
> > > -       "1:	lwarx	%0,0,%1\n"
> > > +       "0:	lwzx    %0,0,%1\n"
> > > +       "	cmpwi   0,%0,0\n"
> > > +       "	bne	0b\n"
> > > +       "1:	lwarx	%0,0,%1" MUTEX_HINT_ACQ "\n"
> > >         "	cmpwi	0,%0,0\n"
> > >         "	bne-	2f\n"
> > >         "	stwcx.	%2,0,%1\n"
> > >         "	bne-	2f\n"
> > > -       "	isync\n"
> > > -       "	.subsection 1\n"
> > > -       "2:	lwzx	%0,0,%1\n"
> > > -       "	cmpwi	0,%0,0\n"
> > > -       "	bne	2b\n"
> > > -       "	b	1b\n"
> > > -       "	.previous"
> > > +                __ARCH_ACQ_INSTR "\n"
> > > +       "        .subsection 1\n"
> > > +       "2:	lwzx    %0,0,%1\n"
> > > +       "        cmpwi   0,%0,0\n"
> > > +       "        bne     2b\n"
> > > +       "        b       1b\n"
> > > +       "        .previous"
> > >         : "=&r" (__tmp)
> > >         : "r" (lock), "r" (1)
> > >         : "cr0", "memory");
> > 
> > Is this essentially a test-and-test-and-set implementation of a lock,
> > with the MUTEX_HINT_ACQ hint added on top?  If so, have you considered
> > adding a variant of atomic_compare_exchange_weak_acquire or
> > atomic_exchange_acquire that sets the hint, and then writing a C
> > function that uses an atomic load in the outer test loop and the new
> > cmpxchg/xchg variant in the inner loop?  That would make it easier to
> > eventually merge this with the generic version of spinlocks.  Also,
> > once we add spinning and randomized exponential backoff to the generic
> > locks, powerpc would be able to benefit from those generic changes too
> > (perhaps with some additional tuning).
> > 
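[The shape being suggested above can be sketched in a few lines.  This is
a minimal illustration, not the glibc code: a relaxed load spins in the
outer "test" loop, and the acquire CAS runs only once the lock looks
free, so the reservation (lwarx/stwcx. on Power) is not hammered while
the lock is held.  In-tree, glibc's atomic_load_relaxed and
atomic_compare_exchange_weak_acquire from include/atomic.h would take
the place of the C11 calls used here.]

```c
#include <stdatomic.h>

typedef atomic_int spinlock_t;

static void
spin_lock (spinlock_t *lock)
{
  for (;;)
    {
      /* Outer test loop: plain relaxed load, no store attempts, so a
	 held lock generates only read traffic.  */
      while (atomic_load_explicit (lock, memory_order_relaxed) != 0)
	;
      /* Lock looks free: try to take it with an acquire CAS.  */
      int expected = 0;
      if (atomic_compare_exchange_weak_explicit
	    (lock, &expected, 1,
	     memory_order_acquire, memory_order_relaxed))
	return;
    }
}

static void
spin_unlock (spinlock_t *lock)
{
  atomic_store_explicit (lock, 0, memory_order_release);
}
```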
> Paul is traveling today so I will answer.
> 
> It may be functionally possible to use a 
> 
> __atomic_compare_exchange_n ((mem), (expected), (desired),
> 	weak, __ATOMIC_ACQUIRE, __ATOMIC_RELAXED)
> 
> in this construct.

I agree, but that's not what I proposed.

> But the cmpxchg is a clumsy way to generate the PowerISA Acquire Import
> Barrier, which has a very specific form.
> 
> Also it is not obvious how the MUTEX_HINT_ACQ applies to
> __atomic_compare_exchange_n in general or even
> __atomic_compare_exchange_n (*, *, *, *, __ATOMIC_ACQUIRE,
> __ATOMIC_RELAXED).
> 
> In the PowerISA, MUTEX_HINT_ACQ means:
> 
>  The Load and Reserve instructions include an Exclusive
>  Access hint (EH), which can be used to indicate
>  that the instruction sequence being executed is implementing
>  one of two types of algorithms:
> 
>  Atomic Update (EH=0)
>   This hint indicates that the program is using a fetch and
>   operate (e.g., fetch and add) or some similar algorithm
>   and that all programs accessing the shared variable are
>   likely to use a similar operation to access the shared
>   variable for some time.
>  Exclusive Access (EH=1)
>   This hint indicates that the program is attempting to
>   acquire a lock and if it succeeds, will perform another
>   store to the lock variable (releasing the lock) before
>   another program attempts to modify the lock variable.
> 
> It is not clear whether (or when) C11 __atomic_compare_exchange implies
> exclusive access in the sense of the PowerISA.

Yes.  The hint is orthogonal to the C11 CAS semantics.
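[To make the hint's mechanics concrete: the EH bit is just one more
operand on lwarx, and MUTEX_HINT_ACQ is a token pasted into the asm
template by string concatenation.  A sketch, assuming MUTEX_HINT_ACQ
expands to ",1" where the hint is honored and to "" elsewhere, so the
same template serves both cases:]

```c
#include <string.h>

/* Assumption: on hint-capable processors the macro carries the EH=1
   operand; elsewhere it would be defined as "".  */
#define MUTEX_HINT_ACQ ",1"

/* String concatenation attaches the hint to the lwarx template.  */
static const char lwarx_acq_template[] =
  "1:\tlwarx\t%0,0,%1" MUTEX_HINT_ACQ "\n";
```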

This is why I asked about adding a variant of *glibc's*
atomic_compare_exchange_weak_acquire (see include/atomic.h).  This would
likely be implemented with custom asm on Power, and would simply defer
to atomic_compare_exchange_weak_acquire on all other archs.
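[The fallback side of such a variant could be a one-line deferral.  A
sketch: the _excl name is hypothetical, and the stand-in definition of
atomic_compare_exchange_weak_acquire mimics the glibc macro with the GCC
builtin only so the fragment compiles outside glibc.]

```c
/* Stand-in for glibc's atomic_compare_exchange_weak_acquire
   (include/atomic.h), so this fragment is self-contained.  */
#define atomic_compare_exchange_weak_acquire(mem, expected, desired)   \
  __atomic_compare_exchange_n ((mem), (expected), (desired), 1,	       \
			       __ATOMIC_ACQUIRE, __ATOMIC_RELAXED)

/* The proposed variant (hypothetical name): Power would define this
   with custom asm emitting lwarx with EH=1 (elided here); every other
   arch has no hint to express and simply defers to the plain macro.  */
#ifndef __powerpc__
# define atomic_compare_exchange_weak_acquire_excl(mem, expected, desired) \
  atomic_compare_exchange_weak_acquire (mem, expected, desired)
#endif
```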

Whether introducing that is better than just specializing the spinlocks
is something you will have to figure out.  But there are several
lock-like CASes throughout the code (e.g., in pthread_once), so not just
in the mutexes or rwlock.  You probably don't want to have custom Power
variants of all those synchronization mechanisms.

> Finally trying to combine atomic_compare_exchange_weak_acquire with the
> relaxed load spin is likely to generate some awkward sequences with
> unnecessary code.

Perhaps, but that's determined by the compiler and how good it is at
optimizing atomics.  Whether it matters will have to be tracked by a
benchmark: if this unnecessary code has a real cost and the compiler
isn't doing a good enough job yet, it will show up there.  The benchmark
will tell us, and we won't be stuck with an "old assumption" in our
code.

> I would prefer to proceed with this implementation and revisit when we
> are further along with C11 integration.

I don't mind having intermediate steps.  But if you do something
Power-specific, and then somebody else improves the generic version, IMO
this other person has no obligation to spend any effort on updating your
Power-specific version too.

