Re: [PATCH] powerpc: remove linux lowlevellock.h


On 24-09-2014 10:29, Torvald Riegel wrote:
> On Wed, 2014-09-24 at 08:55 -0300, Adhemerval Zanella wrote:
>> On 24-09-2014 07:26, Torvald Riegel wrote:
>>> On Tue, 2014-09-23 at 14:35 -0300, Adhemerval Zanella wrote:
>>>> diff --git a/sysdeps/unix/sysv/linux/powerpc/pthread_spin_unlock.c b/sysdeps/unix/sysv/linux/powerpc/pthread_spin_unlock.c
>>>> index 2b8c84d..4906205 100644
>>>> --- a/sysdeps/unix/sysv/linux/powerpc/pthread_spin_unlock.c
>>>> +++ b/sysdeps/unix/sysv/linux/powerpc/pthread_spin_unlock.c
>>>> @@ -22,7 +22,7 @@
>>>>  int
>>>>  pthread_spin_unlock (pthread_spinlock_t *lock)
>>>>  {
>>>> -  __asm __volatile (__lll_rel_instr ::: "memory");
>>>> +  __asm __volatile (__ARCH_REL_INSTR ::: "memory");
>>>>    *lock = 0;
>>>>    return 0;
>>>>  }
>>> I'd prefer it if this were a memory_order_release barrier (which we
>>> don't have yet, I know, so bear with me).  However, I see that the
>>> current situation is a bit messy:
>>> * It seems that atomic_write_barrier is intended to be a C11
>>> memory_order_release barrier/fence; at least it's used like that in
>>> everything I saw in glibc.  So atomic_write_barrier would be our
>>> current way to express a release barrier.
>>> * However, PowerPC atomic_write_barrier is "eieio" and is not using
>>> __ARCH_REL_INSTR.  Can you clarify the difference?
>> These two pages [1] [2] have detailed information about C11/C++11 memory order
>> mappings to PowerPC instructions.
> I'm aware of these ...
>
>> As you can see, the store release is indeed 'lwsync; st'.  The first link
>> also gives an example of a spinlock implementation; however, it uses a
>> full memory barrier (sync) instead of a write barrier (lwsync). This is
>> better explained in the second link:
>>
>> "If the critical section contains only normal cached memory operations, 
>> the lwsync barrier may be substituted for the heavier-weight sync instruction
>> above."
>>
>> Non-cached memory is handled by the kernel and exported to applications only
>> in very specific situations (and AFAIK the C11/C++11 specifications don't
>> even cover it). The idea is that a process won't try to run a spinlock in a
>> DMA area.
>>
>> Also, the 'Synchronizes With' section gives some explanation of the relation
>> between 'eieio' and 'lwsync'. My understanding is that 'lwsync' is the
>> correct memory-order instruction to use in such cases.
>>
>> [1] http://www.cl.cam.ac.uk/~pes20/cpp/cpp0xmappings.html
>> [2] http://www.rdrop.com/users/paulmck/scalability/paper/N2745r.2011.03.04a.html
> ... but not of the difference between eieio and lwsync.  AFAICT, N2745
> states that eieio really applies only to stores.
>
> What are your thoughts about changing atomic_write_barrier to eieio?
> Are you aware of any prior reasoning about that (e.g., whether
> atomic_write_barrier really only needs to prevent reordering of writes)?
> Would there be a significant performance degradation?

Rereading my last message, I think I wasn't clear: my understanding is that
'eieio' is a safe atomic_write_barrier for both cacheable and cache-inhibited
memory, and it is preferred over 'sync' for performance reasons.

Now, for cacheable memory (the default memory allocated by the kernel), 'lwsync'
is a safe atomic_write_barrier and it also performs better than 'eieio'
(and AFAIK GCC won't emit 'eieio' for C11/C++11 atomic operations). That's why
I see no need to change pthread_spin_unlock to use 'eieio'.
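
To make that concrete, here is a minimal sketch of the same unlock written
with C11 atomics (treating pthread_spinlock_t as an atomic int purely for
illustration); per the mappings in [1], GCC lowers the release store to
'lwsync; st' on PowerPC, i.e. the same sequence as __ARCH_REL_INSTR followed
by the plain store in the patch above:

#include <pthread.h>
#include <stdatomic.h>

int
pthread_spin_unlock (pthread_spinlock_t *lock)
{
  /* Release store: on PowerPC this compiles to 'lwsync' followed by a
     plain store, so no full 'sync' is needed.  */
  atomic_store_explicit ((atomic_int *) lock, 0, memory_order_release);
  return 0;
}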

>
>>> I'm interested in having a release barrier there so that this won't get
>>> overlooked once we do the actual transition to C11 atomics.
>>>
>>> Also, related to that, it would be ideal if we could use the generic
>>> implementation (nptl/pthread_spin_unlock.c) on Power as well.  However,
>>> this one uses atomic_full_barrier, which is stronger than what's used on
>>> the major archs (x86, Power, ARM).
> BTW, I've noticed that ARM uses the generic spin_lock, so ARM effectively
> uses a full barrier.

A full barrier does work on PowerPC; the specific optimization to use the
write barrier 'lwsync' was added later.
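
Concretely, the two unlock sequences differ only in the barrier (a sketch;
on PowerPC atomic_full_barrier expands to 'sync' and __ARCH_REL_INSTR to
'lwsync'):

  /* Generic unlock: full barrier, orders all prior loads and stores,
     but is the most expensive option.  */
  __asm __volatile ("sync" ::: "memory");
  *lock = 0;

  /* PowerPC-specific unlock: release barrier, sufficient for normal
     cacheable memory and cheaper.  */
  __asm __volatile ("lwsync" ::: "memory");
  *lock = 0;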

>
>>> I strongly believe this should use a release barrier, despite what POSIX
>>> is claiming (see #16432), but we'll have to deal with that separately.
>> I believe this is exactly why we have a specific POWER implementation: the
>> atomic full barrier on PowerPC is 'sync', and for these spinlock cases
>> 'lwsync' suffices.
> But it shouldn't be POWER-specific, IMO.  Instead, the generic
> implementation should have a release barrier only (so, translating to
> lwsync on power), and all archs negatively affected by this should
> update their release barriers to be proper release barriers.

I agree with this definition. Do you have any update on the discussion you
raised in comment #1 of BZ#16432?
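
For reference, the generic version would then look roughly like this (a
sketch, assuming atomic_write_barrier is first made a proper release
barrier, i.e. 'lwsync' on PowerPC, per the discussion above):

int
pthread_spin_unlock (pthread_spinlock_t *lock)
{
  /* The generic code currently uses atomic_full_barrier ('sync' on
     PowerPC); a release barrier is sufficient here.  */
  atomic_write_barrier ();
  *lock = 0;
  return 0;
}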

