This is the mail archive of the libc-ports@sources.redhat.com mailing list for the libc-ports project.



Re: Coldfire __lll_lock fails under heavy system stress


On 11/1/2012 11:35 PM, Carlos O'Donell wrote:
> On Thu, Nov 1, 2012 at 12:51 PM, Joseph S. Myers
> <joseph@codesourcery.com> wrote:
>> On Thu, 1 Nov 2012, Ed Slas wrote:
>>
>>> Thanks for your time. I understand the kernel's atomic_cmpxchg_32() is
>>> most likely the issue, but note that most of the other platforms use an
>>> atomic lock in user space, then resort to the kernel to arbitrate
>>> contention. The Coldfire port makes the atomic_cmpxchg_32 kernel call
>>> first, even though there is a user-space atomic lock available (the TAS
>>> instruction).
>> I don't believe TAS is sufficient to implement a general
>> compare-and-exchange operation, such as is expected by NPTL.  The syscall
>> is used because the ColdFire architecture has neither an atomic
>> compare-and-exchange instruction nor the load-locked / store-conditional
>> pair that is used on some other architectures to implement
>> compare-and-exchange in userspace.
> Correct, TAS is not sufficient. You really do need to be able to CAS
> in both userspace *and* in the kernel for futexes to be useful.
>
> One defect on HP-PARISC was that our kernel-helper CAS didn't
> coordinate with the futex syscall.
>
> We fixed this by having the kernel-helper CAS use the same locks as
> the futex syscall would use in order to complete the futex operation
> when required.
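
To illustrate the gap in C11 terms (a hypothetical sketch, not code from
any of these ports): an exchange-with-1 primitive can only ever write the
value 1, while CAS can make the conditional 0/1/2 transitions that NPTL's
lowlevellock depends on.

    #include <stdatomic.h>
    #include <stdbool.h>

    /* TAS: atomically store 1 and return the old value.  The only
       value it can ever write is 1, so a waiter cannot atomically
       mark a lock as "locked with waiters". */
    static inline int test_and_set (atomic_int *p)
    {
      return atomic_exchange (p, 1);
    }

    /* CAS: replace *P with NEWVAL only if it still holds OLDVAL.
       This is what NPTL's 0/1/2 lock-state transitions are built on. */
    static inline bool compare_and_swap (atomic_int *p, int oldval, int newval)
    {
      return atomic_compare_exchange_strong (p, &oldval, newval);
    }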

Our older 32-bit tilepro architecture has this same issue of supporting
only a single not-very-powerful atomic primitive, "tns".  It has the
semantics of "atomic_exchange(1)", i.e. you write the 32-bit value "1" and
get back the old value.  In the end we provided a kernel fastpath cmpxchg()
operation (as well as a few other atomic update primitives like "add",
"and", and "or"), and we use the kernel cmpxchg() in the glibc fastpath. 
The kernel fastpath is really much faster than a regular syscall, though. 
We leave interrupts disabled throughout, don't save/restore any registers,
and just take some bits in VA and hash them into an array of "tns" locks to
implement atomicity.  When cache/TLB is hot the whole syscall takes only
about 50 cycles.  (And note that kernel locks, futex locks, and the fast
atomic syscalls all coordinate with each other on tilepro.)
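
In rough C, that hashed-lock scheme works like this (a sketch with invented
names and sizes; the real fastpath is kernel-side code, as described above):

    #include <stdint.h>

    /* Stand-in for the tilepro "tns" instruction: atomically write 1
       and return the previous value. */
    static inline int tns (volatile int *p)
    {
      return __atomic_exchange_n (p, 1, __ATOMIC_ACQUIRE);
    }

    #define NLOCKS 1024                /* invented size */
    static volatile int tns_locks[NLOCKS];

    /* cmpxchg made atomic by a lock chosen from bits of the VA;
       returns the old value.  The real fastpath also runs with
       interrupts disabled, so the guard is never held across a
       context switch. */
    int fast_cmpxchg (int *va, int oldval, int newval)
    {
      volatile int *l = &tns_locks[((uintptr_t) va >> 2) % NLOCKS];
      while (tns (l) != 0)
        ;                              /* spin until we stored into a 0 */
      int cur = *va;
      if (cur == oldval)
        *va = newval;
      __atomic_store_n (l, 0, __ATOMIC_RELEASE);
      return cur;
    }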

We tried implementing pthread locks with "tns" in userspace, but it's
tricky because you need an extra bit of state to track whether the mutex
is contended.  We ended up just using our kernel fastpath for everything.
(Well, we use "tns" for pthread_spinlock.)
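
The spinlock case is the easy one, since "tns" provides exactly the one
bit of state a spinlock needs; schematically (same tns() stand-in as above):

    /* Stand-in for the "tns" instruction, as in the sketch above. */
    static inline int tns (volatile int *p)
    {
      return __atomic_exchange_n (p, 1, __ATOMIC_ACQUIRE);
    }

    /* 0 = free, 1 = held; no contended/uncontended distinction needed. */
    void spin_lock (volatile int *lock)
    {
      while (tns (lock) != 0)
        ;                          /* old value was 1: someone holds it */
    }

    void spin_unlock (volatile int *lock)
    {
      __atomic_store_n (lock, 0, __ATOMIC_RELEASE);
    }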

One approach we rejected early on, because it seemed hard to get right, was
to use two words of state for glibc's lowlevellock: a spinlock word managed
with "tns", plus a value word holding the lock state (0, 1, or 2).  One
problem with this approach is that if a thread gets context-switched while
holding the "tns" lock, before completing the read-modify-write of the
value word, the lock is frozen and everyone else ends up busy-waiting on
the spinlock part.  Real atomic instructions are much more convenient; good
thing the Tilera chip architects listened to us software folks for the
current-generation processor :-)
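
Concretely, that rejected layout would have looked something like this
(illustrative only, not anything we shipped):

    /* Sketch of the rejected two-word lowlevellock.  GUARD is the
       "tns" spinlock word; STATE holds the usual NPTL value
       (0 = free, 1 = locked, 2 = locked with waiters). */
    struct llock
    {
      volatile int guard;
      int state;
    };

    static inline int tns (volatile int *p)
    {
      return __atomic_exchange_n (p, 1, __ATOMIC_ACQUIRE);
    }

    int llock_trylock (struct llock *l)
    {
      while (tns (&l->guard) != 0)
        ;                            /* spin on the guard word */
      /* The hazard: preempt this thread anywhere between here and
         the release of GUARD below, and every other thread busy-waits
         on the guard until it is scheduled again. */
      int old = l->state;
      if (old == 0)
        l->state = 1;
      __atomic_store_n (&l->guard, 0, __ATOMIC_RELEASE);
      return old == 0;
    }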

-- 
Chris Metcalf, Tilera Corp.
http://www.tilera.com

