This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.
Re: NPTL futex error handling
- From: Rich Felker <dalias at aerifal dot cx>
- To: Siddhesh Poyarekar <siddhesh at redhat dot com>
- Cc: Florian Weimer <fweimer at redhat dot com>, GNU C Library <libc-alpha at sourceware dot org>
- Date: Tue, 28 Jan 2014 21:14:30 -0500
- Subject: Re: NPTL futex error handling
- Authentication-results: sourceware.org; auth=none
- References: <52E66CD1 dot 5030402 at redhat dot com> <CAAHN_R2bBH_xLA7dB_4zDXZyLURb6Euvcpni3GWFU2wY1Mvb0Q at mail dot gmail dot com> <52E672E2 dot 3010907 at redhat dot com> <20140128043817 dot GH2149 at spoyarek dot pnq dot redhat dot com>
On Tue, Jan 28, 2014 at 10:08:17AM +0530, Siddhesh Poyarekar wrote:
> On Mon, Jan 27, 2014 at 03:53:22PM +0100, Florian Weimer wrote:
> > On 01/27/2014 03:38 PM, Siddhesh Poyarekar wrote:
> > >On 27 January 2014 19:57, Florian Weimer <fweimer@redhat.com> wrote:
> > >>Looking at the kernel code, I believe that FUTEX_LOCK_PI and FUTEX_WAKE (in
> > >>the case of cross-process mutexes, it seems) can fail with ENOMEM, in
> > >>addition to the more-or-less expected failure cases.
> > >>
> > >>Is the ENOMEM return value due to kernel changes after the initial futex
> > >>implementation, or has this already been evaluated and deemed not to be
> > >>necessary for correctness?
> > >
> > >I guess you're referring to the return from refill_pi_state_cache?
> >
> > Yes, and the call to get_user_pages in the cross-process case.
>
> Hmmm, I (incorrectly) assumed that get_user_pages would only return
> EFAULT. get_user_pages failing with ENOMEM will impact everything,
> even regular FUTEX_WAIT. In fact, it doesn't look like
> pthread_mutex_lock is supposed to return EFAULT either.
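To make the error cases concrete, here is a minimal C sketch of a FUTEX_WAIT wrapper. It separates the expected outcomes (woken, value changed, interrupted by a signal) from unexpected ones like the ENOMEM and EFAULT returns discussed above. It assumes Linux, the raw futex syscall via syscall(SYS_futex, ...), and a process-private futex. The enum and the names futex_call and futex_wait_checked are hypothetical, not glibc internals. No timeout is passed, so ETIMEDOUT cannot occur.

```c
#include <assert.h>
#include <errno.h>
#include <linux/futex.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical classification of FUTEX_WAIT outcomes; not a glibc API. */
enum futex_wait_result {
    WAIT_WOKEN,          /* 0: woken by FUTEX_WAKE (or spuriously) */
    WAIT_VALUE_CHANGED,  /* EAGAIN: *uaddr != expected at call time */
    WAIT_INTERRUPTED,    /* EINTR: interrupted by a signal handler */
    WAIT_UNEXPECTED      /* anything else, e.g. ENOMEM or EFAULT */
};

/* Raw futex syscall; glibc provides no public futex() wrapper. */
static long futex_call(uint32_t *uaddr, int op, uint32_t val)
{
    /* No timeout, second address or val3 is needed for FUTEX_WAIT here. */
    return syscall(SYS_futex, uaddr, op, val, NULL, NULL, 0);
}

/* Sleeps while *uaddr == expected; errno is left set on failure. */
static enum futex_wait_result futex_wait_checked(uint32_t *uaddr,
                                                 uint32_t expected)
{
    if (futex_call(uaddr, FUTEX_WAIT_PRIVATE, expected) == 0)
        return WAIT_WOKEN;
    switch (errno) {
    case EAGAIN: /* EWOULDBLOCK has the same value on Linux */
        return WAIT_VALUE_CHANGED;
    case EINTR:
        return WAIT_INTERRUPTED;
    default:
        return WAIT_UNEXPECTED;
    }
}
```

With a word of 0 and an expected value of 1, the kernel returns EAGAIN immediately, so the call is classified as WAIT_VALUE_CHANGED. What a caller such as pthread_mutex_lock should do on WAIT_UNEXPECTED is the question raised in this thread.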

Am I mistaken, or does this also affect unlock for non-PI mutexes? It
seems this could lead to a deadlock where the unlock fails to wake any
waiters. The proper fix at the kernel level, if there's no simpler
way, would be to simply wake ALL tasks waiting on futexes if a futex
wake fails due to ENOMEM.
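The unlock problem shows up in a minimal C sketch of a three-state futex lock (0 = unlocked, 1 = locked, 2 = locked with possible waiters), in the style of Ulrich Drepper's "Futexes Are Tricky". This is not glibc's actual lowlevellock code. The type sketch_mutex and the functions futex_call, sketch_lock and sketch_unlock are hypothetical, and the sketch assumes Linux and process-private futexes. The key point is that unlock releases the lock word before calling FUTEX_WAKE. If that wake fails, no later step compensates for it.

```c
#include <assert.h>
#include <errno.h>
#include <linux/futex.h>
#include <stdatomic.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Lock word: 0 = unlocked, 1 = locked, 2 = locked, waiters possible. */
typedef struct {
    _Atomic uint32_t state;
} sketch_mutex;

static long futex_call(_Atomic uint32_t *uaddr, int op, uint32_t val)
{
    return syscall(SYS_futex, uaddr, op, val, NULL, NULL, 0);
}

static void sketch_lock(sketch_mutex *m)
{
    uint32_t c = 0;
    if (atomic_compare_exchange_strong(&m->state, &c, 1))
        return; /* uncontended: 0 -> 1 */
    if (c != 2)
        c = atomic_exchange(&m->state, 2); /* announce a waiter */
    while (c != 0) {
        /* Return value ignored: EAGAIN/EINTR mean "re-check the word".
           A persistent error such as ENOMEM degrades this into spinning,
           but does not lose a wakeup. */
        futex_call(&m->state, FUTEX_WAIT_PRIVATE, 2);
        c = atomic_exchange(&m->state, 2);
    }
}

/* Returns 0 on success, or the errno of a failed FUTEX_WAKE. */
static int sketch_unlock(sketch_mutex *m)
{
    if (atomic_exchange(&m->state, 0) == 2) {
        /* The lock is already released here. If this wake fails (e.g.
           with ENOMEM), a waiter asleep in FUTEX_WAIT is not woken, and a
           later uncontended lock/unlock (0 -> 1 -> 0) will not wake it
           either: the deadlock described in the message. */
        if (futex_call(&m->state, FUTEX_WAKE_PRIVATE, 1) == -1)
            return errno;
    }
    return 0;
}
```

Returning the errno from sketch_unlock only reports the failure. There is no obvious recovery in user space, since a retried wake may fail the same way. That fits the message's argument for a kernel-side fix.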

Really, the fundamental issue is that the kernel is poorly designed,
without consideration for resource allocation failure. Operations like
"wake" should not consume resources. But even if fixing this addresses
only one of many similar issues, I think we should push to get it
fixed, with the aim of eventually solving the underlying problem.

Rich