This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: [PATCH 2/2] S390: Test if lock is free before using atomic instruction in spin-lock.


On 12/09/2016 04:16 PM, Stefan Liebler wrote:
On 11/24/2016 02:04 PM, Torvald Riegel wrote:
On Wed, 2016-11-23 at 16:09 +0100, Stefan Liebler wrote:
This patch changes the behaviour of pthread_spin_lock on s390:
Instead of spinning on the lock with a compare-and-swap instruction,
the atomic_compare_and_exchange_bool_acq macro is used.
The s390 atomic_compare_and_exchange_bool_acq macro, called with constant
zero as oldval, first compares the lock with normal instructions.  If it
is free, the compare-and-swap instruction is used to acquire the lock.
While the lock is held by one CPU, other CPUs will not exclusively lock
the memory of the lock in a loop.  If the lock is unlocked by another
CPU, this is observed via the normal instruction and the lock is
acquired with a compare-and-swap instruction.
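For illustration, the test-before-CAS idea corresponds roughly to the
following plain C (a sketch only, with a hypothetical function name; the
real logic lives inside the s390 macro):

int
my_spin_lock (pthread_spinlock_t *lock)
{
  /* atomic_compare_and_exchange_bool_acq returns zero on success.
     Attempt the CAS only when the lock appears to be free; while it is
     held, spin with plain loads so the cache line is not requested
     exclusively on every iteration.  */
  while (atomic_compare_and_exchange_bool_acq (lock, 1, 0))
    while (*lock != 0)
      ;
  return 0;
}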

I don't want to have more arch-specific code than we really need.
Please compare what you have against the generic spinlock code.  If you
find the generic spinlock code lacking, then please propose a change for
it in a way that does not make things worse for other architectures.

If there are arch-specific properties that significantly affect the
approach the generic spinlock takes (e.g., assumptions about CAS vs.
atomic-exchange runtime overheads), then please split them out.

In the long term, this will also benefit s390.  For example, the
spinlock code you have has no backoff, and introduces spinning with
loads in a pretty ugly way through the hack you have added to the CAS
(first loading the lock's value and comparing it manually if the
supplied argument is a constant).
While the generic spinlock code we have is also not very great,
improving it is what will make life easier for the whole glibc project.
Also, if someone else improves the generic spinlock code, s390 would
miss out on this if you have added your custom spinlock code.
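To make the backoff point concrete: spinning with loads plus a simple
exponential backoff could look roughly like this (a hedged sketch with a
hypothetical name, not the actual generic code):

int
my_spin_lock_backoff (pthread_spinlock_t *lock)
{
  int delay = 1;
  while (atomic_compare_and_exchange_bool_acq (lock, 1, 0))
    /* Wait with plain loads and back off exponentially to reduce
       coherence traffic under contention.  */
    while (*lock != 0)
      {
        for (int i = 0; i < delay; i++)
          atomic_spin_nop ();
        if (delay < 64)
          delay <<= 1;
      }
  return 0;
}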


I had a look at the assembler output of the generic spinlock code, e.g.:
int
pthread_spin_trylock (pthread_spinlock_t *lock)
{
  return atomic_exchange_acq (lock, 1) ? EBUSY : 0;
}

In the s390x assembler output, a new stack frame is generated; the lock
value is loaded, stored to the stack, loaded back from the stack, and
then passed to the cs instruction.
After the cs instruction, the old value is stored to the stack, loaded
back from the stack, and then compared against zero.

I also had a look at the aarch64 pthread_spin_trylock.os compiled with
the build-many script and GCC 6.2.
aarch64 uses the __atomic builtins for atomic_exchange_acq,
atomic_compare_and_exchange_val_acq, ... .
The atomic_exchange_acq results in the exclusive load/store pair.  Then
the old value is also stored to the stack and immediately loaded back in
order to compare it against zero.

The type pthread_spinlock_t is a volatile int!
If lock is cast to (int *) before being passed to the atomic macros,
the assembler output looks okay.
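For example, applied to the trylock above, the workaround would read (a
sketch):

int
pthread_spin_trylock (pthread_spinlock_t *lock)
{
  /* Casting away the volatile qualifier lets the compiler keep the
     temporary in a register instead of spilling it to the stack.  */
  return atomic_exchange_acq ((int *) lock, 1) ? EBUSY : 0;
}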

If I change the current generic spinlock code, do I have to rewrite it
to use the C11-like macros?

I've tested the following function in advance:
int
foo (pthread_spinlock_t *lock)
{
  return atomic_load_acquire (lock);
}

With the __atomic_load_n version of atomic_load_acquire, the assembler
output contains a load which is returned as expected.
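
Simplified, the builtin variant of the macro is just the following (the
real macro additionally checks the operand size):

/* USE_ATOMIC_COMPILER_BUILTINS == 1: the compiler handles the volatile
   access itself and emits a single load.  */
# define atomic_load_acquire(mem) \
  __atomic_load_n ((mem), __ATOMIC_ACQUIRE)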

The non-C11 macro results in the following:
   0:   58 10 20 00             l       %r1,0(%r2)
   4:   b3 c1 00 0f             ldgr    %f0,%r15
   8:   e3 f0 ff 58 ff 71       lay     %r15,-168(%r15)
   e:   50 10 f0 a4             st      %r1,164(%r15)
  12:   58 10 f0 a4             l       %r1,164(%r15)
  16:   50 10 f0 a0             st      %r1,160(%r15)
  1a:   58 20 f0 a0             l       %r2,160(%r15)
  1e:   b3 cd 00 f0             lgdr    %r15,%f0
  22:   b9 14 00 22             lgfr    %r2,%r2
  26:   07 fe                   br      %r14
The extra stores to and loads from the stack result from the "__typeof
(*(mem)) __atg101_val" usage in the atomic_load_* macros in conjunction
with volatile.

As this behaviour is not s390-specific, I've grepped to see which arches
use the generic spinlock code and the C11-like atomic macros:
sysdeps/hppa/nptl/pthread_spin_lock.c:18:#define SPIN_LOCK_READS_BETWEEN_CMPXCHG 1000
sysdeps/unix/sysv/linux/hppa/atomic-machine.h:40:#define USE_ATOMIC_COMPILER_BUILTINS 0

sysdeps/microblaze/nptl/pthread_spin_lock.c:19:#define SPIN_LOCK_READS_BETWEEN_CMPXCHG 1000
sysdeps/microblaze/atomic-machine.h:39:#define USE_ATOMIC_COMPILER_BUILTINS 0

sysdeps/aarch64/nptl/pthread_spin_lock.c:19:#define SPIN_LOCK_READS_BETWEEN_CMPXCHG 1000
sysdeps/aarch64/atomic-machine.h:40:#define USE_ATOMIC_COMPILER_BUILTINS 1

sysdeps/mips/nptl/pthread_spin_lock.c:18:#define SPIN_LOCK_READS_BETWEEN_CMPXCHG 1000
sysdeps/mips/atomic-machine.h:95:#define USE_ATOMIC_COMPILER_BUILTINS 1
sysdeps/mips/atomic-machine.h:216:#define USE_ATOMIC_COMPILER_BUILTINS 0

sysdeps/nios2/nptl/pthread_spin_lock.c:19:#define SPIN_LOCK_READS_BETWEEN_CMPXCHG 1000
sysdeps/unix/sysv/linux/nios2/atomic-machine.h:35:#define USE_ATOMIC_COMPILER_BUILTINS 0

sysdeps/arm/nptl/pthread_spin_lock.c:18:#define SPIN_LOCK_READS_BETWEEN_CMPXCHG 1000
sysdeps/arm/atomic-machine.h:37:#define USE_ATOMIC_COMPILER_BUILTINS 0

sysdeps/m68k/nptl/pthread_spin_lock.c:19:#define SPIN_LOCK_READS_BETWEEN_CMPXCHG 1000
sysdeps/m68k/m680x0/m68020/atomic-machine.h:48:#define USE_ATOMIC_COMPILER_BUILTINS 0
sysdeps/m68k/coldfire/atomic-machine.h:54:#define USE_ATOMIC_COMPILER_BUILTINS 0
sysdeps/unix/sysv/linux/m68k/coldfire/atomic-machine.h:40:#define USE_ATOMIC_COMPILER_BUILTINS 0

What is your suggestion for how to proceed with the volatile int type
in conjunction with the atomic macros?

Bye Stefan

This patch is no longer needed, as I've posted an adjusted version of
the generic spinlock code:
[PATCH 1/2] Optimize generic spinlock code and use C11 like atomic macros.
https://www.sourceware.org/ml/libc-alpha/2016-12/msg00617.html

[PATCH 2/2] S390: Use generic spinlock code.
https://www.sourceware.org/ml/libc-alpha/2016-12/msg00618.html

Please have a look.

Bye.
Stefan

