This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Futex error handling


We got complaints from the kernel side that glibc doesn't react properly
to futex errors being returned.  Thus, I'm looking at what we'd actually
need to improve.  I'm using this as documentation for the futex error
codes: https://lkml.org/lkml/2014/5/15/356

Generally, we have three categories of faults (ie, the cause for an
error/failure):
* Bug in glibc ("BL")
* Bug in the client program ("BP")
* Failures that are neither a bug in glibc nor the program ("F")

Also, there are cases where it's not a "real" failure, but just
something that is expected behavior that needs to be handled ("NF").

I'm not aware of a general policy about whether glibc should abort or
assert (ie, abort only with assertion checks enabled) when the fault is
in the BL or BP categories.  I'd say we don't, because there's no way to
handle it anyway, and other things will likely go wrong; but I don't
have a strong opinion.  Thoughts?

For every futex op, here's a list of how I'd categorize the possible
error codes (I'm ignoring ENOSYS, which is NF when feature testing (or
BL)):

FUTEX_WAIT:
* EFAULT is either BL or BP.  Nothing we can do.  The access should
already have faulted earlier, when we read the futex variable.
* EINVAL (alignment and timeout normalization) is BL/BP.
* EWOULDBLOCK, ETIMEDOUT are NF.
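
To make the FUTEX_WAIT list concrete, here is a minimal sketch of how a
wrapper could separate the NF cases from the BL/BP ones.  The names
fault_cat, classify_futex_wait and futex_wait are hypothetical and not
existing glibc internals; error codes not listed above (e.g., EINTR) are
just reported as unknown here.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <linux/futex.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical names for the categories used in this mail.  */
enum fault_cat { CAT_OK, CAT_NF, CAT_BUG /* BL or BP */, CAT_UNKNOWN };

/* Classify an errno value returned by FUTEX_WAIT.  */
static enum fault_cat
classify_futex_wait (int err)
{
  switch (err)
    {
    case 0:
      return CAT_OK;
    case EAGAIN:      /* Same value as EWOULDBLOCK on Linux.  */
    case ETIMEDOUT:
      return CAT_NF;
    case EFAULT:
    case EINVAL:
      return CAT_BUG;
    default:          /* E.g., EINTR or ENOSYS; not covered by the list.  */
      return CAT_UNKNOWN;
    }
}

/* Thin wrapper around the raw syscall: returns 0 after a wake-up,
   otherwise the errno value set by the kernel.  */
static int
futex_wait (uint32_t *uaddr, uint32_t expected,
            const struct timespec *timeout)
{
  long r = syscall (SYS_futex, uaddr, FUTEX_WAIT | FUTEX_PRIVATE_FLAG,
                    expected, timeout, NULL, 0);
  return r == -1 ? errno : 0;
}
```

A caller would do something like classify_futex_wait (futex_wait (&word,
val, NULL)) and retry or return on CAT_NF; what to do on CAT_BUG is the
policy question raised above.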

FUTEX_WAKE, FUTEX_WAKE_OP:
* EFAULT can be BL/BP *or* NF, so we *must not* abort or assert in this
case.  This is due to how futexes work when combined with certain rules
for destruction of the underlying synchronization data structure; see my
description of the mutex destruction issue (but this can happen with
other data structures such as semaphores or cond vars too):
https://sourceware.org/ml/libc-alpha/2014-04/msg00075.html
* EINVAL (futex alignment) is BL/BP.
* EINVAL (inconsistent state or hit a PI futex) can be either BL/BP *or*
NF.  The latter is caused by the mutex destruction issue, except that
the pending FUTEX_WAKE after destruction doesn't hit an inaccessible
memory location but one that has been reused for a PI futex.  Thus, we
must not abort or assert in this case either.
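
As a rough sketch of what "must not abort or assert" could look like for
FUTEX_WAKE: the hypothetical helper below (futex_wake_tolerant is not an
existing glibc function) checks alignment itself, so that an EINVAL from
the kernel can no longer be the alignment case, and then treats EFAULT
and EINVAL from the kernel as possibly benign.  Whether silently
ignoring them is the right policy is an assumption on my part.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <linux/futex.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Wake up to NR waiters on the 32-bit futex word at UADDR.  Returns the
   number of woken waiters, 0 if a possibly benign error was ignored, or
   a negative errno value for errors that indicate a bug.  */
static int
futex_wake_tolerant (void *uaddr, int nr)
{
  /* Misalignment is always BL/BP, so detect it before the syscall.  */
  if ((uintptr_t) uaddr % sizeof (uint32_t) != 0)
    return -EINVAL;

  long r = syscall (SYS_futex, uaddr, FUTEX_WAKE | FUTEX_PRIVATE_FLAG,
                    nr, NULL, NULL, 0);
  if (r >= 0)
    return (int) r;

  switch (errno)
    {
    case EFAULT:   /* Memory may have been unmapped after destruction.  */
    case EINVAL:   /* Memory may have been reused for a PI futex.  */
      return 0;
    default:
      return -errno;
    }
}
```

The cost of this approach is that a real BL/BP fault that shows up as
EFAULT is hidden too; we can't tell the two apart from the error code
alone.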

FUTEX_REQUEUE:
* Like FUTEX_WAKE, except that it's not safe to use concurrently with
possible destruction / reuse of the futex memory (because requeueing to
a futex that's unrelated to the new futex located in reused memory is
bad).

FUTEX_CMP_REQUEUE:
* Like FUTEX_REQUEUE.  EAGAIN is NF.

FUTEX_WAKE_OP:
* Haven't looked at this yet.  Only used in condvars, and might not be
necessary for a condvar that's not based on a condvar-internal lock.

FUTEX_WAIT_BITSET / FUTEX_WAKE_BITSET:
* Like FUTEX_WAIT / FUTEX_WAKE.  The additional EINVAL is BL.

FUTEX_LOCK_PI, FUTEX_TRYLOCK_PI:
* EFAULT is BL/BP.
* ENOMEM is F.  We need to handle this.
* EINVAL, EPERM, ESRCH are BL/BP.
* EAGAIN and ETIMEDOUT are NF.
* EDEADLOCK is BP (or BL).
* EOWNERDEAD is F.
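
The PI lock list is the first one with real F cases, so here is the same
list written as a small table in code.  This is only a sketch: the enum
and the function name are hypothetical, and the code uses the errno
names from <errno.h> (EDEADLK, for which glibc also provides the
EDEADLOCK name, and EOWNERDEAD).

```c
#include <errno.h>

/* Hypothetical names for the categories used in this mail.  */
enum fault_cat { CAT_OK, CAT_NF, CAT_BUG /* BL or BP */, CAT_F, CAT_UNKNOWN };

/* Classify an errno value returned by FUTEX_LOCK_PI or
   FUTEX_TRYLOCK_PI according to the list above.  */
static enum fault_cat
classify_futex_lock_pi (int err)
{
  switch (err)
    {
    case 0:
      return CAT_OK;
    case EAGAIN:
    case ETIMEDOUT:
      return CAT_NF;
    case ENOMEM:       /* Must be reported to the caller somehow.  */
    case EOWNERDEAD:   /* Previous owner died while holding the lock.  */
      return CAT_F;
    case EFAULT:
    case EINVAL:
    case EPERM:
    case ESRCH:
    case EDEADLK:      /* BP (or BL).  */
      return CAT_BUG;
    default:
      return CAT_UNKNOWN;
    }
}
```

Only the CAT_F results need a way to reach the caller of the pthreads
function; how to do that is step 3 of the plan below.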

FUTEX_UNLOCK_PI:
* (I guess this can return EFAULT too, which is BL/BP.)
* EINVAL and EPERM are BL/BP.  I don't think there's a mutex destruction
issue with PI locks because the kernel takes care of both resetting the
value of the futex var and waking up threads; it should do so in a way
that won't access reused memory.  I guess we should check that though...

FUTEX_WAIT_REQUEUE_PI:
* EFAULT and EINVAL are BL/BP.
* EWOULDBLOCK and ETIMEDOUT are NF.
* EOWNERDEAD is F.

FUTEX_CMP_REQUEUE_PI is like FUTEX_CMP_REQUEUE except:
* ENOMEM is F.
* EPERM and ESRCH are BL/BP.
* EDEADLOCK is BP (or BL).


I think the next steps to improve this should be:
1) Getting consensus on how we want to handle BL and BP in general.
2) Applying the outcome of that to the list above and getting consensus
on the result.
3) For each case of F, find the best way to report it to the caller
(e.g., error code from the pthreads function, abort, ...).
4) Change each use of the futexes accordingly, one at a time.

I've asked Michael Kerrisk for the state of the futex error docs, but
haven't gotten a reply yet.  (Last time I checked, the new input from
the email I referred to above wasn't part of the futex docs yet.)

Thoughts?

