This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.
Re: Futex error handling
- From: Rich Felker <dalias at libc dot org>
- To: Torvald Riegel <triegel at redhat dot com>
- Cc: GLIBC Devel <libc-alpha at sourceware dot org>, Darren Hart <dvhart at infradead dot org>
- Date: Tue, 16 Sep 2014 14:54:57 -0400
- Subject: Re: Futex error handling
- Authentication-results: sourceware.org; auth=none
- References: <1410881785 dot 4967 dot 292 dot camel at triegel dot csb> <20140916165607 dot GZ23797 at brightrain dot aerifal dot cx> <1410891158 dot 4967 dot 303 dot camel at triegel dot csb>
On Tue, Sep 16, 2014 at 08:12:38PM +0200, Torvald Riegel wrote:
> > > FUTEX_WAKE, FUTEX_WAKE_OP:
> > > * EFAULT can be BL/BP *or* NF, so we *must not* abort or assert in this
> > > case. This is due to how futexes work when combined with certain rules
> > > for destruction of the underlying synchronization data structure; see my
> > > description of the mutex destruction issue (but this can happen with
> > > other data structures such as semaphores or cond vars too):
> > > https://sourceware.org/ml/libc-alpha/2014-04/msg00075.html
> >
> > Note that it's possible to use FUTEX_WAKE_OP in such a way that EFAULT
> > is reserved for BL/BP (and not NF). I don't see any point in
> > having/using FUTEX_WAKE_OP except for this purpose, but maybe I'm
> > missing something.
>
> I agree that I was a bit sloppy in the categorization. You're right
> that depending on how it's used, EFAULT can be just BL/BP. This applies
> to both FUTEX_WAKE and FUTEX_WAKE_OP, I think; the latter has just a
> finite number of bits, so you can't avoid an ABA issue entirely. You

I'm not sure what ABA issue you have in mind. The EFAULT case that
occurs with FUTEX_WAKE, and which I claim FUTEX_WAKE_OP avoids, arises
when the atomic operation on the futex int associated with the wake
lets another thread synchronize and determine that it may legally
destroy the object before the actual wake is sent. FUTEX_WAKE_OP can
fully avoid this by performing the atomic operation after looking up
and locking the futex hash bucket, so that there's no further access
to the object after the atomic and thus no opportunity for a fault.
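
A minimal sketch of the two unlock patterns being contrasted, not
glibc's actual code. It assumes a lock word that only ever holds 0
(unlocked), 1, or 2; the helper names unlock_then_wake and
unlock_with_wake_op are hypothetical. The first helper stores in
userspace and wakes in a second step; the second passes the same
address as both uaddr and uaddr2 so that the store and the wake happen
inside a single futex call.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <linux/futex.h>
#include <stdatomic.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: release the word in userspace, then wake. */
static long unlock_then_wake(_Atomic uint32_t *word)
{
    assert(word != NULL);
    atomic_store_explicit(word, 0, memory_order_release);
    /* Window: another thread may now take the lock, release it, and
       destroy the object, so the address used below may be gone. */
    return syscall(SYS_futex, word, FUTEX_WAKE | FUTEX_PRIVATE_FLAG,
                   1, NULL, NULL, 0);
}

/* Hypothetical helper: let the futex call do the store and the wake. */
static long unlock_with_wake_op(_Atomic uint32_t *word)
{
    assert(word != NULL);
    /* Encoded operation: "*uaddr2 = 0". The comparison "old value < 0"
       is never true for a word holding 0, 1, or 2, so the conditional
       second wake on uaddr2 is never taken. */
    uint32_t op = FUTEX_OP(FUTEX_OP_SET, 0, FUTEX_OP_CMP_LT, 0);
    return syscall(SYS_futex, word, FUTEX_WAKE_OP | FUTEX_PRIVATE_FLAG,
                   1,                    /* wake at most one waiter on uaddr */
                   (void *)(uintptr_t)0, /* val2, passed in the timeout slot */
                   word,                 /* uaddr2: word the op is applied to */
                   op);
}
```

With the second helper the caller never touches the word after the
call returns, so an EFAULT there would point at a caller bug rather
than at a legitimate destruction race. The cost of this simple form is
that every unlock enters the kernel, including uncontended ones.
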
> So, to summarize, my categories kind of assume a "typical" use of those
> operations in glibc. What I was trying to point out is that we can't
> abort in the generic futex syscall code when we see EFAULT, because
> that's wrong for typical uses of FUTEX_WAKE.

Yes, I agree with this. I'm not clear yet on whether it would be an
advantage to use FUTEX_WAKE_OP to avoid this, but I think it's
plausible that it might be.
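
As an illustration of the point about generic code, here is a sketch
of a wake wrapper that does not abort on EFAULT. The name
futex_wake_tolerant and the choice to return -errno for other errors
are assumptions made for this example, not anything glibc defines; how
the remaining error codes should be handled is left to the caller.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <linux/futex.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical wrapper. Returns the number of waiters woken, 0 if the
   kernel reported EFAULT, or -errno for any other failure. */
static long futex_wake_tolerant(uint32_t *word, int nwake)
{
    assert(nwake >= 0);
    long r = syscall(SYS_futex, word, FUTEX_WAKE, nwake, NULL, NULL, 0);
    if (r >= 0)
        return r;
    /* With a plain FUTEX_WAKE, EFAULT can simply mean the object was
       already destroyed and unmapped, so nobody is left to wake. */
    if (errno == EFAULT)
        return 0;
    return -errno;
}
```

A caller that uses FUTEX_WAKE_OP in the way described above could
instead treat EFAULT as a hard error, since the destruction race does
not apply there.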
Rich