This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.


Re: Futex error handling

From: Roland McGrath <roland at hack dot frob dot com>
To: Torvald Riegel <triegel at redhat dot com>
Cc: GLIBC Devel <libc-alpha at sourceware dot org>, Darren Hart <dvhart at infradead dot org>
Date: Tue, 21 Oct 2014 15:34:04 -0700 (PDT)
Subject: Re: Futex error handling
References: <1410881785 dot 4967 dot 292 dot camel at triegel dot csb> <1413821696 dot 8483 dot 40 dot camel at triegel dot csb>

The high-level comment is that we have always favored having actual bugs
cause quick and complete failure.

If it's a bug in libc, then it should fail early and catastrophically so
that we find out about the bug as soon as possible.  That trades off
against any runtime cost of detecting the case.  If it's cheap to detect,
then detect it.  If it's not so cheap, then don't pay the cost because we
don't expect that we'll have the bug.  Using assert is a middle ground for
things that have enough cost that we don't just leave them in all the time,
but little enough that there's still some question about it.  I don't know
what the distribution of NDEBUG use is like across distributions.  If every
distribution builds production libc with NDEBUG then in practice assert
will not catch any real-world problems and it shouldn't really count as
runtime detection, because only people developing libc will ever see it.
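
To make the cost trade-off concrete, here is a minimal C sketch.  All of the
names are hypothetical and this is not glibc code.  It shows an O(1) sanity
check that stays in every build and aborts immediately, next to an O(n)
consistency check placed under assert, which disappears entirely when the
code is compiled with -DNDEBUG.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: report an internal inconsistency and abort, so the
   failure is quick and complete rather than silent.  */
static void
internal_fatal (const char *msg)
{
  fprintf (stderr, "fatal internal error: %s\n", msg);
  abort ();
}

/* Hypothetical internal data structure, for illustration only.  */
struct node
{
  struct node *next;
};

struct queue
{
  struct node *head;
  size_t count;            /* Cached length of the list at HEAD.  */
};

/* O(n) walk of the list; too costly to run on every operation.  */
static size_t
list_length (const struct node *n)
{
  size_t len = 0;
  for (; n != NULL; n = n->next)
    ++len;
  return len;
}

static void
queue_push (struct queue *q, struct node *n)
{
  /* Cheap check: one comparison, so it is always compiled in.  */
  if (n == NULL)
    internal_fatal ("queue_push called with a NULL node");

  n->next = q->head;
  q->head = n;
  q->count++;

  /* Expensive check: compiled out under NDEBUG, so only builds without
     NDEBUG (typically developers' builds) will ever see it fire.  */
  assert (list_length (q->head) == q->count);
}
```

If production builds define NDEBUG, only the NULL check above can catch
anything in the field; the list walk only ever runs in development builds.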

If it's user code invoking undefined behavior, then it should fail early
and catastrophically so that developers don't get the false impression that
their code is OK when it happens not to break the use cases they test
adequately.  (Said another way, so that we avoid giving developers an
excuse to complain when a future implementation change "breaks" their
programs that were always broken, but theretofore ignorably so.)  That too
trades off against any runtime cost of detecting the case.  I'd say the
allowance for cost of detection is marginally higher than in the first
case, because we expect user bugs to be more common than libc bugs.  But
it's still not much, since correct programs performing better is more
important to us than buggy programs being easier to debug.

Those are generic principles.  There's another kind of case, which I don't
think you mentioned, that is especially apropos for the futex operations.  That
is unexpected results from the kernel.  That could of course just be a libc
bug that causes its expectations to be wrong.  But it could also be a
kernel bug, or a new compatibility problem (e.g. some system call starts
returning new error codes in a new kernel version that weren't possible
when the libc code was written, built, and tested).  For those I'm not sure
there is any general rule that will really help.  It might just require
careful consideration case by case for what is the wisest form of
future-proofing.  Sometimes, propagating whatever error the kernel gave
back to the user is clearly the best thing to do.  But there might also be
situations where an unexpected result means that libc has become confused
about what state the kernel left things in, and crashing would be better.
And finally, there might well be instances of kernel bugs that we could
adequately recognize and work around.
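
As one possible illustration of the case-by-case approach, here is a minimal
C sketch of a futex wait wrapper.  It assumes Linux and calls the raw
syscall(SYS_futex, ...) interface rather than glibc's internal macros, and the
wrapper name and its policy are hypothetical.  It hands the documented
FUTEX_WAIT outcomes (EAGAIN, EINTR, ETIMEDOUT) back to the caller and treats
anything else as a sign that libc's model of the kernel state may be wrong,
so it aborts.  It does not show the third option, working around a known
kernel bug.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <linux/futex.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical wrapper: block while *WORD still equals EXPECTED.
   REL_TIMEOUT is a relative timeout, or NULL to wait indefinitely.
   Returns 0 on wakeup, or EAGAIN, EINTR or ETIMEDOUT, which callers are
   expected to handle.  Any other result aborts the process.  */
static int
futex_wait_checked (unsigned int *word, unsigned int expected,
                    const struct timespec *rel_timeout)
{
  long ret = syscall (SYS_futex, word, FUTEX_WAIT_PRIVATE, expected,
                      rel_timeout, NULL, 0);
  if (ret == 0)
    return 0;

  int err = errno;
  switch (err)
    {
    case EAGAIN:     /* *WORD != EXPECTED: caller re-checks its state.  */
    case EINTR:      /* Interrupted by a signal: caller may retry.  */
    case ETIMEDOUT:  /* The relative timeout expired.  */
      return err;
    default:
      /* EFAULT, EINVAL, ENOSYS, or an error code this code predates:
         fail loudly instead of guessing what state the kernel left.  */
      fprintf (stderr, "futex_wait: unexpected error %d (%s)\n",
               err, strerror (err));
      abort ();
    }
}
```

Whether a given unexpected code belongs in the "propagate" list or the
"abort" branch is exactly the per-case judgment described above; the sketch
only fixes one arbitrary choice.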

Follow-Ups:
- Re: Futex error handling
  - From: Carlos O'Donell

References:
- Re: Futex error handling
  - From: Torvald Riegel
