This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: [PATCHv2] powerpc: Spinlock optimization and cleanup


On Wed, 2015-09-30 at 11:28 -0500, Paul E. Murphy wrote:
> Changes from V1:
> 
> * Use C macro for atomic_store_release as suggested in
> comments.
> 
> * Run benchmarks to quantify the performance changes and
> note them in the commit message.
> 
> ---8<---
> This patch optimizes powerpc spinlock implementation by:
> 
> * The current algorithm spins over a lwzx, but only after issuing a
>   lwarx first.  The optimization is to avoid that first lwarx in the
>   contended case (it is much more costly than a normal load).
> 
> * Use the correct EH hint bit on the larx for supported ISAs.  For lock
>   acquisition, the thread that acquired the lock with a successful stcx
>   does not want to give away write ownership of the cacheline.  The
>   idea is to make the load reservation "sticky" about retaining write
>   authority to the line.  That way, the store that must inevitably come
>   to release the lock can succeed quickly and not contend with other
>   threads issuing lwarx.  If another thread does a store to the line
>   (false sharing), the winning thread must give up write authority to
>   it.  The proper value of EH for the larx for a lock acquisition is 1.
> 
> * Increase contended lock performance by up to 40%, with no measurable
>   impact on uncontended locks on P8.

Could you add the tests you used to the glibc microbenchmarks (or
whatever works best for them)?  We do want to be able to track
performance, and having benchmarks is the first step towards that.

> It also adds some cleanup to use the defined acquire-semantics
> instructions and to convert the function prototypes to default C style.
> 
> Thanks to Adhemerval Zanella who did most of the work.  I've run some
> tests, and addressed some minor feedback.
> 
> 2015-09-25  Paul E. Murphy  <murphyp@linux.vnet.ibm.com>
> 
> 	* sysdeps/powerpc/nptl/pthread_spin_lock.c (pthread_spin_lock):
> 	Optimize first check for contention case and add lwarx hint.
> 	* sysdeps/powerpc/nptl/pthread_spin_trylock.c (pthread_spin_trylock):
> 	Use ANSI prototype.
> 	* sysdeps/unix/sysv/linux/powerpc/pthread_spin_unlock.c: Move to ...
> 	* sysdeps/powerpc/nptl/pthread_spin_unlock.c: ... here, and
> 	update to new atomic macros.
> ---
>  sysdeps/powerpc/nptl/pthread_spin_lock.c           |   22 ++++++++-------
>  sysdeps/powerpc/nptl/pthread_spin_trylock.c        |    3 +-
>  sysdeps/powerpc/nptl/pthread_spin_unlock.c         |   27 +++++++++++++++++++
>  .../unix/sysv/linux/powerpc/pthread_spin_unlock.c  |   28 --------------------
>  4 files changed, 40 insertions(+), 40 deletions(-)
>  create mode 100644 sysdeps/powerpc/nptl/pthread_spin_unlock.c
>  delete mode 100644 sysdeps/unix/sysv/linux/powerpc/pthread_spin_unlock.c
> 
> diff --git a/sysdeps/powerpc/nptl/pthread_spin_lock.c b/sysdeps/powerpc/nptl/pthread_spin_lock.c
> index cc081f8..8a39da9 100644
> --- a/sysdeps/powerpc/nptl/pthread_spin_lock.c
> +++ b/sysdeps/powerpc/nptl/pthread_spin_lock.c
> @@ -19,24 +19,26 @@
>  #include "pthreadP.h"
>  
>  int
> -pthread_spin_lock (lock)
> -     pthread_spinlock_t *lock;
> +pthread_spin_lock (pthread_spinlock_t *lock)
>  {
>    unsigned int __tmp;
>  
>    asm volatile (
> -       "1:	lwarx	%0,0,%1\n"
> +       "0:	lwzx    %0,0,%1\n"
> +       "	cmpwi   0,%0,0\n"
> +       "	bne	0b\n"
> +       "1:	lwarx	%0,0,%1" MUTEX_HINT_ACQ "\n"
>         "	cmpwi	0,%0,0\n"
>         "	bne-	2f\n"
>         "	stwcx.	%2,0,%1\n"
>         "	bne-	2f\n"
> -       "	isync\n"
> -       "	.subsection 1\n"
> -       "2:	lwzx	%0,0,%1\n"
> -       "	cmpwi	0,%0,0\n"
> -       "	bne	2b\n"
> -       "	b	1b\n"
> -       "	.previous"
> +                __ARCH_ACQ_INSTR "\n"
> +       "        .subsection 1\n"
> +       "2:	lwzx    %0,0,%1\n"
> +       "        cmpwi   0,%0,0\n"
> +       "        bne     2b\n"
> +       "        b       1b\n"
> +       "        .previous"
>         : "=&r" (__tmp)
>         : "r" (lock), "r" (1)
>         : "cr0", "memory");

Is this essentially a test-and-test-and-set implementation of a lock,
with the MUTEX_HINT_ACQ hint added?  If so, have you considered adding
a variant of atomic_compare_exchange_weak_acquire or
atomic_exchange_acquire that sets the hint, and then writing a C
function that uses an atomic load in the outer test loop and the new
cmpxchg/xchg variant in the inner loop?  That would make it easier to
eventually merge this with the generic version of spinlocks.  Also,
once we add spinning and randomized exponential backoff to the generic
locks, powerpc would be able to benefit from those generic changes too
(perhaps with some additional tuning).
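For the record, a minimal sketch of the two-loop structure I have in
mind, in plain C11 atomics rather than glibc's internal macros (the
names here are illustrative, and a glibc version would use the internal
atomic_* macros plus the proposed hinted cmpxchg variant):

```c
#include <stdatomic.h>

typedef atomic_int spinlock_t;

static void
spin_lock_sketch (spinlock_t *lock)
{
  int expected;
  do
    {
      /* Outer test loop: plain loads only, so a waiter does not pull
         the cache line exclusive while the lock is held (this is the
         lwzx loop in the powerpc asm).  */
      while (atomic_load_explicit (lock, memory_order_relaxed) != 0)
        ;
      expected = 0;
      /* Inner loop: a single weak CAS attempt with acquire semantics
         (the lwarx/stwcx. pair; the EH hint would be folded into the
         hinted cmpxchg variant).  */
    }
  while (!atomic_compare_exchange_weak_explicit
           (lock, &expected, 1,
            memory_order_acquire, memory_order_relaxed));
}
```

A generic version along these lines would let architectures supply only
the hinted cmpxchg/xchg while sharing the spin policy.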

> diff --git a/sysdeps/powerpc/nptl/pthread_spin_unlock.c b/sysdeps/powerpc/nptl/pthread_spin_unlock.c
> new file mode 100644
> index 0000000..f830ad2
> --- /dev/null
> +++ b/sysdeps/powerpc/nptl/pthread_spin_unlock.c
> @@ -0,0 +1,27 @@
> +/* pthread_spin_unlock -- unlock a spin lock.  PowerPC version.
> +   Copyright (C) 2007-2015 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <http://www.gnu.org/licenses/>.  */
> +
> +#include "pthreadP.h"
> +#include <lowlevellock.h>

Do you really need to include lowlevellock.h?

> +int
> +pthread_spin_unlock (pthread_spinlock_t *lock)
> +{
> +  atomic_store_release (lock, 0);
> +  return 0;
> +}
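For readers outside glibc internals: atomic_store_release here
corresponds to a C11 release store, which orders all prior
critical-section accesses before the lock word becomes visible as 0.  A
standalone sketch using <stdatomic.h> (the function name is
illustrative):

```c
#include <stdatomic.h>

/* Sketch of the unlock above in plain C11: a release store of 0,
   equivalent in effect to glibc's atomic_store_release (lock, 0).  */
static int
spin_unlock_sketch (atomic_int *lock)
{
  atomic_store_explicit (lock, 0, memory_order_release);
  return 0;
}
```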

