This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: [PATCH] powerpc: remove linux lowlevellock.h
- From: Torvald Riegel <triegel at redhat dot com>
- To: Adhemerval Zanella <azanella at linux dot vnet dot ibm dot com>
- Cc: libc-alpha at sourceware dot org
- Date: Wed, 24 Sep 2014 15:29:25 +0200
- Subject: Re: [PATCH] powerpc: remove linux lowlevellock.h
- Authentication-results: sourceware.org; auth=none
- References: <5421AF5E dot 8030402 at linux dot vnet dot ibm dot com> <1411554406 dot 27838 dot 93 dot camel at triegel dot csb> <5422B122 dot 8040803 at linux dot vnet dot ibm dot com>
On Wed, 2014-09-24 at 08:55 -0300, Adhemerval Zanella wrote:
> On 24-09-2014 07:26, Torvald Riegel wrote:
> > On Tue, 2014-09-23 at 14:35 -0300, Adhemerval Zanella wrote:
> >> diff --git a/sysdeps/unix/sysv/linux/powerpc/pthread_spin_unlock.c b/sysdeps/unix/sysv/linux/powerpc/pthread_spin_unlock.c
> >> index 2b8c84d..4906205 100644
> >> --- a/sysdeps/unix/sysv/linux/powerpc/pthread_spin_unlock.c
> >> +++ b/sysdeps/unix/sysv/linux/powerpc/pthread_spin_unlock.c
> >> @@ -22,7 +22,7 @@
> >> int
> >> pthread_spin_unlock (pthread_spinlock_t *lock)
> >> {
> >> - __asm __volatile (__lll_rel_instr ::: "memory");
> >> + __asm __volatile (__ARCH_REL_INSTR ::: "memory");
> >> *lock = 0;
> >> return 0;
> >> }
> > I'd prefer if this would be a memory_order_release barrier (which we
> > don't have yet, I know, so bear with me). However, I see that the
> > current situation is a bit messy:
> > * It seems that atomic_write_barrier is intended to be a C11
> > memory_order_release barrier/fence; at least it's used like that in all
> > what I saw in glibc. So atomic_write_barrier would be our current way
> > to express a release barrier.
> > * However, PowerPC atomic_write_barrier is "eieio" and is not using
> > __ARCH_REL_INSTR. Can you clarify the difference?
>
> These two pages [1] [2] have detailed information about C11/C++11 memory order
> mappings to PowerPC instructions.
I'm aware of these ...
> As you can see the store release is indeed 'lwsync; st'. First link also
> gives you an example of a spinlock implementation, however it is using a
> full memory barrier (sync) instead of a write one (lwsync). It is better
> explained in second link:
>
> "If the critical section contains only normal cached memory operations,
> the lwsync barrier may be substituted for the heavier-weight sync instruction
> above."
>
> Non-cached memory are handled by kernel and exported to application only in very
> specific situations (and afaik C11/C++11 specifications don't even handle it).
> The idea is a process won't try to run a spinlock in a DMA area.
>
> Also the 'Synchronizes With' gives some explanation about 'eieio' and 'lwsync'
> relation. My understanding is 'lwsync' is the correct memory order instruction
> to use in such cases.
>
> [1] http://www.cl.cam.ac.uk/~pes20/cpp/cpp0xmappings.html
> [2] http://www.rdrop.com/users/paulmck/scalability/paper/N2745r.2011.03.04a.html
... but not of the difference between eieio and lwsync. AFAICT, N2745
states that eieio really applies only to stores.
What are your thoughts about changing atomic_write_barrier to eieio?
Are you aware of any prior reasoning about that (e.g., whether
atomic_write_barrier really only needs to prevent reordering of writes)?
Would there be a significant performance degradation?
> >
> > I'm interested in having a release barrier there so that this won't get
> > overlooked once we do the actual transition to C11 atomics.
> >
> > Also, related to that, it would be ideal if we could use the generic
> > implementation (nptl/pthread_spin_unlock.c) on Power as well. However,
> > this one uses atomic_full_barrier, which is stronger than what's used on
> > the major archs (x86, Power, ARM).
BTW, I've noticed that ARM uses the generic spin_lock, so ARM uses a
full barrier effectively.
> > I strongly believe this should use a release barrier, despite what POSIX
> > is claiming (see #16432), but we'll have to deal with that separately.
>
> I believe this is exactly why we have a specific POWER implementation: atomic
> full barrier in powerpc are 'sync' and for this spinlock cases 'lwsync' is suffice.
But it shouldn't be POWER-specific, IMO. Instead, the generic
implementation should have a release barrier only (so, translating to
lwsync on power), and all archs negatively affected by this should
update their release barriers to be proper release barriers.