This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH 0/6][BZ #11588] pi-condvars: add priority inheritance for pthread_cond_* internal lock


On Mon, Aug 18, 2014 at 11:53:18PM +0200, Torvald Riegel wrote:
> On Mon, 2014-08-18 at 17:41 -0400, Rich Felker wrote:
> > On Mon, Aug 18, 2014 at 11:14:19PM +0200, Torvald Riegel wrote:
> > > On Tue, 2014-07-29 at 19:31 -0500, gratian.crisan@ni.com wrote:
> > > > Torvald Riegel made us aware of the new POSIX changes related to condvars
> > > > (http://austingroupbugs.net/view.php?id=609) and C++11 clarification
> > > > (http://cplusplus.github.com/LWG/lwg-active.html#2190)
> > > > We believe we can work on these issues in parallel and if they end up
> > > > colliding we will fix it.
> > > 
> > > I've asked the Austin Group about the current status of #609.  It seems
> > > they want the stronger ordering guarantees:
> > > http://austingroupbugs.net/view.php?id=609#c2349
> > > 
> > > I have an implementation that fulfills these guarantees, but I don't
> > > think it's possible to fully implement PI with the stronger guarantees
> > > and the futex operations that we currently have.  The possible options

BTW, does the current implementation, or your new one pending, address
self-synchronized destruction, i.e.

"It shall be safe to destroy an initialized condition variable upon
which no threads are currently blocked."

> > > that I see are:
> > > * Accept potential ABA issues (e.g., a lost wake-up if you do exactly
> > > 2^32 signals after a wait).
> > > * Accept that there's no PI every 2^32 wait calls (maybe that number can
> > > be increased somewhat, but this depends on the interleaving of
> > > wait/signal calls I believe).
> > > * Don't support PI on process-shared condvars, so that we can boost the
> > > priority of waiters with per-waiter PI mutexes.  More overhead.
> > 
> > Is there any documentation on the current glibc design and how it
> > avoids the 2^32 ABA issue with sequence numbers?
> 
> The current algorithm doesn't prevent the ABA issue.

Do you have an in-progress new one that does, though? Naively it seems
impossible to avoid as long as you're using futex waits on a sequence
number as the wait mechanism, unless you can reliably avoid reusing
sequence numbers on which there is currently a waiter.

If you have access to a list of all the current waiters (possible in
non-process-shared case; impossible in process-shared case as far as I
can tell) including knowledge of which sequence numbers they're
waiting on, you can always choose a new sequence number distinct from
any number in the list, and thereby avoid the problem. This is a
different approach from what I used in my implementation, but I think
it's very practical for the non-process-shared case.

One alternate approach to sequence numbers is to wait on a futex where
the waiter's tid has been stored; and where the waker stores 0, -1,
its own tid, or similar before waking. This guarantees that the new
value stored will never be equal to the value another thread is
waiting on. However it has pathologically bad performance when lots of
waiters are arriving; a second waiter is likely to write its tid after
the first waiter has done so, but before the first waiter performs the
futex wait, and in that case you get flooded with spurious wakes
(EAGAIN from the futex syscall). Additional logic could prevent them
from turning into actual spurious returns from the pthread_cond_wait
function, I think, but you still get a lot of extra cpu load.

> > FYI the new cond var implementation I have in musl can definitely be
> > extended to support PI on the internal locks, and it has no sequence
> > number issues, but it's also not compatible with process-shared cond
> > vars.
> 
> Well, I haven't found an algorithm that avoids all ABA issues (modulo
> those you won't hit in practice, like 64b overflow), has the stronger
> ordering guarantees, and does PI all the time.
> 
> Given that we're discussing glibc here, if you want to contribute an
> implementation that does fulfill these requirements, please feel
> welcome.

Unfortunately that's beyond the scope of what I have time to do. I
cited the fact that I do have an implementation outside of glibc
meeting these requirements just as a basis for it being possible, and
maybe as a source of ideas. But the design is radically different from
glibc's (each waiter is waiting on a different futex address, on its
own stack, rather than an address inside the cond var object) so ideas
from it are unlikely to be directly applicable.

> > > What would be everyone's preference?
> > 
> > For the second option, if you can identify the sequence number at
> > which there's going to be an issue and where you could solve it by
> > refraining from using PI, why not do this instead: On that particular
> > signal/broadcast operation, make it always behave as a broadcast
> > (spurious wakes are allowed) and synchronize with all waiters, waiting
> > for them to reach the point where they're ready to block on the mutex
> > before the signal/broadcast function returns? For non-PI, this would
> > have very bad priority-related consequences, but with PI, I think
> > those problems are avoided. Naturally this one (one in 2^32) call
> > would be more expensive than typical signals/waits, but it should
> > work.
> 
> I believe the priority issue is still there, in particular with PI.
> Whether or not we do a broadcast isn't the critical point.  (For non-PI
> we have to assume an eventually fair / non-starving scheduler anyway, or
> none of our blocking algorithms would work reliably in practice.)

Hmm, indeed I don't see an immediate way to make it work. In order for
a PI lock to give priority to another thread, you need to know the
target thread's tid and have it stored in the futex value, but there
are an unbounded number of waiters and no way for them all to store
their tids where they are available to the signaling thread...

Rich


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]