This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: Remove sparcv8 support


On 11/9/2016 12:15 PM, David Miller wrote:
> From: Torvald Riegel <triegel@redhat.com>
> Date: Wed, 09 Nov 2016 09:08:15 -0800

>> What approach are you going to use in the kernel to emulate the CAS if
>> the hardware doesn't offer one?  If you are not stopping all threads,
>> then there could be concurrent stores to the same memory location
>> targeted by the CAS; to make such stores atomic wrt. the CAS, you would
>> need to implement atomic stores in glibc to also use the kernel (e.g.,
>> to do a CAS).
>
> I keep hearing about this case, but as long as the CAS is atomic, what
> is the difference between the store being synchronized in some way
> or not?

> I think the ordering allowed for gives the same set of legal results.

> In any possible case either the CAS "wins" or the async store "wins",
> and that determines the final result written.  All combinations are
> legal outcomes even with a hardware CAS implementation.

That's not actually true.  Suppose you have an initial value of zero, and you
race a plain store of 2 against a kernel CAS from 0 to 1.  The only legal final
value is 2: either the store hit first and the CAS failed, or the CAS hit first
and succeeded and was then overwritten by the 2.  But if the kernel CAS starts
first and loads the zero, then the store lands and sets the value to 2, the CAS
will still decide it was successful and write the 1, leaving the value illegally
set to 1.
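
To make the failure mode concrete, here is a minimal C sketch; emulated_cas is
a hypothetical stand-in for the kernel's emulation, and the comment marks where
the racing store can land:

/* Hypothetical kernel-side CAS emulation that serializes only against
 * other emulated CASes (e.g., via an internal lock), not against plain
 * stores from other CPUs.
 */
int emulated_cas(int *mem, int oldval, int newval)
{
	int prev = *mem;	/* reads the initial 0 */

	/* <-- a plain store of 2 from another CPU can land here --> */

	if (prev == oldval)
		*mem = newval;	/* blindly writes 1, silently discarding
				 * the 2: the illegal outcome above */
	return prev;
}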

> I really don't think such asynchronous stores are legal, nor should
> they be explicitly accommodated in the CAS emulation support.  Either
> the value is maintained in an atomic manner, or it is not.  And if it
> is, updates must use CAS.  Straight stores are only legal during the
> initialization of the word, before any CAS code paths can get to the
> value.
>
> I cannot think of any sane setup that can allow async stores
> intermixed with CAS updates.

So despite arguing above that mixing CAS and asynchronous stores is safe,
here you are arguing that you shouldn't do it?  In any case, yes, I think you
have come to the right conclusion: you shouldn't do it.
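
Put another way, once a word is managed by the emulated CAS, even a plain
assignment has to be routed through that same primitive.  A hedged sketch,
reusing the hypothetical emulated_cas from above:

/* An "atomic store" built from the CAS primitive itself, so that it
 * serializes against every other CAS on the same word.
 */
void atomic_store_via_cas(int *mem, int newval)
{
	int old = *mem;
	int prev;

	/* Retry until our CAS wins the race. */
	while ((prev = emulated_cas(mem, old, newval)) != old)
		old = prev;
}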

If you're interested, I have some optimized code for the tilepro architecture
that handles this in arch/tile.  In kernel/intvec_32.S, the intvec_\vecname
macro does a fastpath check for negative syscall numbers and calls out to
sys_cmpxchg, which figures out which atomic operation to provide.  We actually
support both 32- and 64-bit cmpxchg, as well as an "atomic_update" that does
(*mem & mask) + added, giving obvious implementations for atomic_exchange,
atomic_exchange_and_add, atomic_and_val, and atomic_or_val (see glibc's
sysdeps/tile/tilepro/atomic-machine.h); a sketch of those identities follows
below.  There's some very hairy stuff designed to handle the case of faulting
on a bad user address here, since we haven't set up the kernel stack yet.  But
it works, and it's quite fast (about 50 cycles for the fast syscall).
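
The four derivations are short enough to spell out.  This is just a sketch of
the identities, not the actual glibc macro definitions; atomic_update here
stands in for the real fast syscall, atomically replacing *mem with
(*mem & mask) + added and returning the old value:

/* Kernel-assisted primitive: *mem = (*mem & mask) + added, returning
 * the old value.  Declaration only; the real work is the fast syscall.
 */
int atomic_update(int *mem, int mask, int added);

int atomic_exchange(int *mem, int newval)
{
	return atomic_update(mem, 0, newval);	/* (old & 0) + new == new */
}

int atomic_exchange_and_add(int *mem, int inc)
{
	return atomic_update(mem, ~0, inc);	/* (old & ~0) + inc == old + inc */
}

int atomic_and_val(int *mem, int m)
{
	return atomic_update(mem, m, 0);	/* (old & m) + 0 == old & m */
}

int atomic_or_val(int *mem, int m)
{
	/* (old & ~m) and m share no set bits, so the add acts as an OR. */
	return atomic_update(mem, ~m, m);	/* (old & ~m) + m == old | m */
}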

We also hook into the same logic to support a more extended set of in-kernel
atomic operations; see arch/tile/lib/atomic*32* for that stuff.

The underlying locking is done by hashing into a lock table based on the low
bits of the address, which lets us support process-shared as well as
process-private futexes.  It does mean that if multiple processes start up
roughly simultaneously and all try to lock the same process-private futex
address, they contend with each other, since they're using the same VA.  Oh
well; we didn't come up with a better scheme that still had good uncontended
performance, though perhaps a better hash function exists.
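
As a rough illustration of the hashing (the constants and exact bit selection
here are hypothetical, not the actual arch/tile values): if the hash uses only
page-offset bits, every mapping of a given page picks the same bucket no matter
which process or base VA it is mapped at, which is what makes the
process-shared case work, and also why identical private VAs collide:

#include <stdint.h>

/* With 4 KB pages and 4-byte words there are at most 1024 distinct
 * word offsets within a page, hence at most 10 useful hash bits.
 */
#define LOCK_HASH_BITS	10
#define LOCK_HASH_SIZE	(1u << LOCK_HASH_BITS)

static int atomic_locks[LOCK_HASH_SIZE];	/* one small lock per bucket */

static int *lock_for_address(const void *addr)
{
	uintptr_t a = (uintptr_t)addr;

	/* Drop the two bits that are constant within a 4-byte word, then
	 * fold the page-offset bits into a bucket index.
	 */
	return &atomic_locks[(a >> 2) & (LOCK_HASH_SIZE - 1)];
}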

--
Chris Metcalf, Mellanox Technologies
http://www.mellanox.com

