This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: Futex error handling


On Tue, Sep 16, 2014 at 05:36:25PM +0200, Torvald Riegel wrote:
> We got complains from the kernel side that glibc wouldn't react properly
> to futex errors being returned.

Are these complaints public anywhere?

> Thus, I'm looking at what we'd need to
> actually improve.  I'm using this here as a documentation for futex
> error codes: https://lkml.org/lkml/2014/5/15/356
> 
> Generally, we have three categories of faults (ie, the cause for an
> error/failure):
> * Bug in glibc ("BL")
> * Bug in the client program ("BP")
> * Failures that are neither a bug in glibc nor the program ("F")
> 
> Also, there are cases where it's not a "real" failure, but just
> something that is expected behavior that needs to be handled ("NF").
> 
> I'm not aware of a general policy about whether glibc should abort or
> assert (ie, abort only with assertion checks enabled) when the fault is
> in the BL or BP categories.  I'd say we don't, because there's no way to
> handle it anyway, and other things will likely go wrong; but I don't
> have a strong opinion.  Thoughts?
> 
> For every futex op, here's a list of how I'd categorize the possible
> error codes (I'm ignoring ENOSYS, which is NF when feature testing (or
> BL)):
> 
> FUTEX_WAIT:
> * EFAULT is either BL or BP.  Nothing we can do.  Should have failed
> earlier when we accessed the futex variable.
> * EINVAL (alignment and timeout normalization) is BL/BP.
> * EWOULDBLOCK, ETIMEDOUT are NF.

I would distingish multiple versions of "BP" for EINVAL here. You seem
to have mixed "program has invoked undefined behavior" (e.g. invalid
synchronization object) with "program has provided an erroneous
argument which the implementation is required to report" (e.g. invalid
timespec contents). If you don't want to add a new class, the latter
technically could just be considered NF; it's fully equivalent to NF
in terms of how it has to be handled.

> FUTEX_WAKE, FUTEX_WAKE_OP:
> * EFAULT can be BL/BP *or* NF, so we *must not* abort or assert in this
> case.  This is due to how futexes work when combined with certain rules
> for destruction of the underlying synchronization data structure; see my
> description of the mutex destruction issue (but this can happen with
> other data structures such as semaphores or cond vars too):
> https://sourceware.org/ml/libc-alpha/2014-04/msg00075.html

Note that it's possible to use FUTEX_WAKE_OP in such a way that EFAULT
is reserved for BL/BP (and not NF). I don't see any point in
having/using FUTEX_WAKE_OP except for this purpose, but maybe I'm
missing something.

> * EINVAL (futex alignment) is BL/BP.
> * EINVAL (inconsistent state or hit a PI futex) can be either BL/BP *or*
> NF.  The latter is caused by the mutex destruction issue, only that a
> pending FUTEX_WAKE after destruction doesn't hit an inaccessible memory
> location but one which has been reused for a PI futex.  Thus, we must
> not abort or assert in this case.

Nice catch. I wasn't aware of this one (the PI causing EINVAL).

> FUTEX_REQUEUE:
> * Like FUTEX_WAKE, except that it's not safe to use concurrently with
> possible destruction / reuse of the futex memory (because requeueing to
> a futex that's unrelated to the new futex located in reused memory is
> bad).

I think all cases where FUTEX_REQUEUE remotely makes sense as an
operation inherently preclude destruction of the target (for example,
the mutex that cond var waiters are going to try to relock can't be
deleted until they've all relocked it and returned).

> I think the next steps to improve this should be:
> 1) Getting consensus on how we want to handle BL and BP in general.
> 2) Applying the outcome of that to the list above and getting consensus
> on the result.
> 3) For each case of F, find the best way to report it to the caller
> (e.g., error code from the pthreads function, abort, ...).
> 4) Change each use of the futexes accordingly, one at a time.

Sounds reasonable to me.

> I've asked Michael Kerrisk for the state of the futex error docs, but
> haven't gotten a reply yet.  (Last time I checked, the new input from
> the email I referred to above wasn't part of the futex docs yet.)

Yes, documentation on futex behavior would be very nice to have...

Rich


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]