This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: [PATCH] [RFC] nptl: use compare and exchange for lll_cond_lock
- From: Adhemerval Zanella <azanella at linux dot vnet dot ibm dot com>
- To: libc-alpha at sourceware dot org
- Date: Thu, 25 Sep 2014 21:20:48 -0300
- Subject: Re: [PATCH] [RFC] nptl: use compare and exchange for lll_cond_lock
- References: <5421B1C2 dot 9020509 at linux dot vnet dot ibm dot com> <1411668008 dot 22112 dot 67 dot camel at triegel dot csb>
On 25-09-2014 15:00, Torvald Riegel wrote:
> On Tue, 2014-09-23 at 14:45 -0300, Adhemerval Zanella wrote:
>> While checking the generated code and macros used in generic lowlevellock.h,
>> I noted that powerpc and other architectures use a compare and swap instead of a
>> plain exchange on lll_cond_lock.
>> I am not really sure which behavior would be desirable, since as far as I could see
>> they both have the same side effects (since lll_cond_lock, different
>> from lll_lock, does not hold value of '1').
> What do you mean by "[the function] does not hold value of '1'"?
Bad wording on my part: I meant the 'futex' variable used in lll_cond_lock.
>
>> So I am proposing this patch to sync the default implementation with what most
>> architectures (ia64, ppc, s390, sparc, x86, hppa) use for lll_cond_lock. I see
>> that only microblaze and sh (I am not sure about this one, I am not well versed in
>> its assembly and I'm being guided by its comment) use atomic_exchange_acq
>> instead.
> I think both versions work from a correctness POV, but doing an
> unconditional exchange should be the right thing to do.
>
> The default implementation of __lll_lock_wait will test if the futex
> variable equals 2, and if not, do an exchange right away before running
> the FUTEX_WAIT syscall. So if the CAS that you propose fails, the next
> thing that will happen is an exchange. Thus, it seems that we should do
> the exchange right away.
>
> Thoughts?
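For reference, the slow path described above can be sketched roughly as follows with C11 atomics. This is only an illustration of the described behavior, not the glibc source: sketch_lock_wait, fake_futex_wait and fake_wait_calls are hypothetical names, and the FUTEX_WAIT syscall is replaced by a stub that pretends the owner released the lock while we slept.

```c
#include <stdatomic.h>

/* Hypothetical counter, only here so the stub's use can be observed. */
static int fake_wait_calls;

/* Hypothetical stand-in for the FUTEX_WAIT syscall: instead of sleeping,
   it pretends the lock owner released the lock in the meantime. */
static void fake_futex_wait(atomic_int *futex, int expected)
{
    (void) expected;
    fake_wait_calls++;
    atomic_store(futex, 0);
}

/* Sketch of the slow path: if the futex already equals 2, wait first;
   otherwise (and after every wakeup) do an unconditional exchange to 2.
   A previous value of 0 means the lock was free and is now ours. */
static void sketch_lock_wait(atomic_int *futex)
{
    if (atomic_load_explicit(futex, memory_order_relaxed) == 2)
        fake_futex_wait(futex, 2);

    while (atomic_exchange_explicit(futex, 2, memory_order_acquire) != 0)
        fake_futex_wait(futex, 2);
}
```

Under these assumptions, a failed compare and exchange in the fast path with the futex at 1 is followed immediately by the exchange in the loop above.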
The only 'advantage' I see in using the compare and exchange version is that it might
be an optimization on architectures that use LL/SC instead of a CAS instruction. For
instance, on POWER the exchange version is translated to:
	li	r9,2
1:	lwarx	r10,r0,r3,1
	stwcx.	r9,r0,r3
	bne-	1b
	isync
And for compare and exchange:
	li	r10,2
	li	r9,0
1:	lwarx	r8,r0,r3,1
	cmpw	r8,r9
	bne	2f
	stwcx.	r10,r0,r3
	bne-	1b
2:	isync
So in contended cases, if the lock is already taken, the compare and exchange version
skips the store (which on POWER8 costs at least 10 cycles).
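The difference between the two variants can also be shown in portable C11 atomics. This is a minimal sketch, not the glibc macros: cond_lock_xchg and cond_lock_cas are hypothetical names, and how a compiler lowers them to lwarx/stwcx. sequences is not guaranteed to match the listings above.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Variant A: unconditional exchange. Always stores 2, even if the lock
   is already taken. Returns true if the previous value was 0, i.e. the
   lock was free and is now held by the caller. */
static bool cond_lock_xchg(atomic_int *futex)
{
    return atomic_exchange_explicit(futex, 2, memory_order_acquire) == 0;
}

/* Variant B: compare and exchange. Stores 2 only if the futex is 0;
   when the lock is already taken the store is skipped and the futex
   keeps its current value. Returns true if the lock was acquired. */
static bool cond_lock_cas(atomic_int *futex)
{
    int expected = 0;
    return atomic_compare_exchange_strong_explicit(
        futex, &expected, 2,
        memory_order_acquire, memory_order_relaxed);
}
```

With the futex at 1 (locked, no waiters), variant A fails to acquire but leaves the futex at 2, while variant B fails and leaves it at 1, so the caller still has to mark it as contended before sleeping.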