This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: [RFC] mutex destruction (#13690): problem description and workarounds


On Mon, 2014-12-01 at 16:22 -0500, Rich Felker wrote:
> On Mon, Dec 01, 2014 at 09:52:30PM +0100, Torvald Riegel wrote:
> > On Mon, 2014-12-01 at 12:05 -0500, Rich Felker wrote:
> > > On Mon, Dec 01, 2014 at 05:42:05PM +0100, Torvald Riegel wrote:
> > > > On Mon, 2014-12-01 at 10:38 -0500, Rich Felker wrote:
> > > > > On Fri, Apr 04, 2014 at 04:20:30PM +0200, Torvald Riegel wrote:
> > > > > > === Workaround 1a: New FUTEX_WAKE_SPURIOUS operation that avoids the
> > > > > > specification change
> > > > > > 
> > > > > > This is like Workaround 1, except that the kernel could add a new futex
> > > > > > op that works like FUTEX_WAKE except that:
> > > > > > * FUTEX_WAITs woken up by a FUTEX_WAKE_SPURIOUS will always return
> > > > > > EINTR.  EINTR for spurious wakeups is already part of the spec, so
> > > > > > correct futex users are already handling this (e.g., glibc does).
> > > > > > * Make sure (and specify) that FUTEX_WAKE_SPURIOUS calls that hit
> > > > > > other futexes (e.g., PI) are ignored and don't cause wake-ups (or
> > > > > > cause just the benign spurious wakeups already specified).
> > > > > > 
> > > > > > Users of FUTEX_WAKE_SPURIOUS should have to do very little compared to
> > > > > > when using FUTEX_WAKE.  The only thing that they don't have anymore is
> > > > > > the ability to distinguish between a real wakeup and a spurious one.
> > > > > > Single-use FUTEX_WAITs could be affected, but we don't have them in
> > > > > > glibc.  The only other benefit from being able to distinguish between
> > > > > > real and spurious is in combination with a timeout: If the wake-up is
> > > > > > real on a single-use futex, there's no need to check timeouts again.
> > > > > > But will programs want to use this often, and will they need to use
> > > > > > FUTEX_WAKE_SPURIOUS in this case?  I guess not.
> > > > > > 
> > > > > > Pros:
> > > > > > * Correct futex uses will need no changes.
> > > > > > Cons:
> > > > > > * Needs a new futex operation.
> > > > > 
> > > > > I'm fine with this except for the return value. EINTR should never
> > > > > mean anything but "interrupted by signal". Especially if we're going
> > > > > to be exposing futex() to applications as a public API, which should
> > > > > be done, applications should be able to rely on EINTR always being
> > > > > "interrupted by signal" in the sense that it's acceptable to assume it
> > > > > doesn't happen if you're not using (interrupting) signal handlers and
> > > > > that it's okay to use a standard EINTR retry loop if you want to. This
> > > > > would not be valid if EINTR were overloaded with the above meaning.
> > > > > 
> > > > > There are plenty of other errno codes that could be used without
> > > > > creating this problem. EINPROGRESS has good precedent as a "non-error"
> > > > > error condition, and seems like a reasonable choice, but I'm fine with
> > > > > anything that doesn't overload EINTR or other existing errors in ways
> > > > > that would break existing handling.
> > > > 
> > > > Given that glibc hasn't exposed an API for it, what would it break?
> > > 
> > > The kernel has exposed an API for it, and non-glibc software is using
> > > it via syscall() and/or asm. Answering that question would require
> > > surveying all such software. However I'm not sure that the proposal
> > > for a new FUTEX_WAKE_SPURIOUS would not already break such users. If
> > > they really want to count wakes and are relying on existing futex wait
> > > semantics, a new error condition that returns spuriously at seemingly
> > > random times is potentially going to break things (although not quite
> > > as badly as a spurious return of zero).
> > 
> > My proposal above reuses EINTR.  The futex man page states:
> > "Signals (see  signal(7)) or other spurious wakeups cause FUTEX_WAIT to
> > fail with the error EINTR."
> > 
> > The source of "other spurious wake-ups" isn't defined, so I don't see how
> > a program could reliably prevent them, or reason that they won't ever
> > appear.  Thus, it seems that correct futex uses would have to be
> > prepared to handle EINTR.
> 
> I think this is incorrect documentation. I cannot find any hint at
> what other sort of "other spurious wake-ups" could cause EINTR.

But that's no reason to not have it.  I think it makes perfect sense to
allow for spurious wake-ups, especially for futexes.  Even if currently
there's no case in which there would be a spurious wake-up, it's safer
to have an error code that allows it so that if you need to have a
spurious wake-up later on, you have a way to delegate the issue to the
caller -- which, for futexes, is perfectly fine due to how they are
designed.
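
To illustrate what "delegating the issue to the caller" looks like, here
is a minimal sketch of a futex wait loop that tolerates spurious
wake-ups.  The helper names (futex_wait, futex_wake, wait_for_flag,
set_flag) are made up for this example and are not glibc API; the only
real interface used is the raw futex system call via syscall().  The
loop re-checks the futex word after every return, so a return of 0,
EINTR, or EAGAIN are all handled the same way.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <limits.h>
#include <linux/futex.h>
#include <stdatomic.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical thin wrappers around the raw futex system call. */
static long futex_wait(_Atomic uint32_t *addr, uint32_t expected)
{
    /* Blocks only if *addr still equals expected; no timeout. */
    return syscall(SYS_futex, addr, FUTEX_WAIT_PRIVATE, expected,
                   NULL, NULL, 0);
}

static long futex_wake(_Atomic uint32_t *addr, int nwaiters)
{
    return syscall(SYS_futex, addr, FUTEX_WAKE_PRIVATE, nwaiters,
                   NULL, NULL, 0);
}

/* Block until *flag is nonzero.  The loop condition, not the return
 * value of futex_wait, decides whether we are done, so real and
 * spurious wake-ups need not be distinguished. */
static int wait_for_flag(_Atomic uint32_t *flag)
{
    while (atomic_load(flag) == 0) {
        if (futex_wait(flag, 0) == -1
            && errno != EINTR && errno != EAGAIN)
            return -1;  /* unexpected error; errno is left as set */
    }
    return 0;
}

/* Publish the flag first, then wake every waiter. */
static void set_flag(_Atomic uint32_t *flag)
{
    atomic_store(flag, 1);
    futex_wake(flag, INT_MAX);
}
```

A waiter written this way would presumably not care whether a wake-up
came from FUTEX_WAKE, a signal, or something like the proposed
FUTEX_WAKE_SPURIOUS.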

> > > There are other ways to use interrupting signals similarly to
> > > cancellation where you actually want to know you were interrupted by a
> > > signal handler.
> > 
> > But how would you distinguish from "other spurious wakeups" that are
> > currently allowed?
> 
> As far as I can tell that's just ***-covering by the man page with no
> basis in what the kernel actually does. If there are actually current
> situations under which it can produce EINTR without a signal, that's
> very bad. For instance sem_wait must return EINTR when actually
> interrupted by a non-SA_RESTART signal, but it's forbidden from
> returning EINTR if that didn't happen.

EINTR is a 'may fail' error for sem_wait.  POSIX states that sem_wait is
interruptible, but I read this as allowing interruption, not requiring it.

The signal man pages list sem_wait as having to return EINTR if
interrupted, but what's the point?  There's no way for the thread that
raises the signal to know when sem_wait has started to execute.  So you
can never be sure when a signal will actually hit sem_wait.  The only
way I see to reliably interrupt sem_wait is to have the signal handler
execute sem_post -- because that's the only thing sem_wait checks before
blocking.  But then all that the semaphore implementation needs to do is
try to lock the semaphore when interrupted, and in this case it won't
return EINTR either.
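
The scenario can be sketched with a toy futex-based semaphore.  This is
not glibc's implementation; toy_sem, its functions, and the demo helper
are hypothetical names chosen for illustration.  The signal handler is
installed without SA_RESTART and posts the semaphore.  When the blocked
wait is interrupted, it tries to take a token before deciding what to
return, so under these assumptions it is expected to return 0 rather
than fail with EINTR.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <linux/futex.h>
#include <signal.h>
#include <stdatomic.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <sys/time.h>
#include <unistd.h>

/* Hypothetical toy semaphore: the token count is the futex word. */
typedef struct { _Atomic uint32_t tokens; } toy_sem;

/* Take one token if available; never blocks. */
static int toy_sem_trywait(toy_sem *s)
{
    uint32_t v = atomic_load(&s->tokens);
    while (v > 0) {
        if (atomic_compare_exchange_weak(&s->tokens, &v, v - 1))
            return 0;
    }
    return -1;
}

static void toy_sem_post(toy_sem *s)
{
    atomic_fetch_add(&s->tokens, 1);
    syscall(SYS_futex, &s->tokens, FUTEX_WAKE_PRIVATE, 1, NULL, NULL, 0);
}

/* Returns 0 once a token was taken.  Fails with EINTR only if the
 * wait was interrupted and there is still no token to take. */
static int toy_sem_wait(toy_sem *s)
{
    for (;;) {
        if (toy_sem_trywait(s) == 0)
            return 0;
        long r = syscall(SYS_futex, &s->tokens, FUTEX_WAIT_PRIVATE, 0,
                         NULL, NULL, 0);
        if (r == -1 && errno == EINTR) {
            if (toy_sem_trywait(s) == 0)
                return 0;       /* the handler posted: report success */
            errno = EINTR;
            return -1;
        }
        if (r == -1 && errno != EAGAIN)
            return -1;          /* unexpected error */
    }
}

static toy_sem demo_sem;

static void on_alarm(int sig)
{
    int saved_errno = errno;    /* keep the interrupted code's errno */
    (void)sig;
    toy_sem_post(&demo_sem);
    errno = saved_errno;
}

/* Arm a 10 ms timer whose handler posts demo_sem, then wait on it. */
static int demo_interrupted_wait(void)
{
    struct sigaction sa = {0};
    sa.sa_handler = on_alarm;   /* note: SA_RESTART is not set */
    sigemptyset(&sa.sa_mask);
    sigaction(SIGALRM, &sa, NULL);

    struct itimerval tv = { .it_value = { .tv_usec = 10000 } };
    setitimer(ITIMER_REAL, &tv, NULL);
    return toy_sem_wait(&demo_sem);
}
```

If the signal arrives before the thread blocks, the futex word no longer
matches the expected value, the wait fails with EAGAIN, and the loop
takes the token on the next iteration, so the result is the same.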

