This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: [RFC] mutex destruction (#13690): problem description and workarounds
- From: Torvald Riegel <triegel at redhat dot com>
- To: Rich Felker <dalias at libc dot org>
- Cc: "Carlos O'Donell" <carlos at redhat dot com>, GLIBC Devel <libc-alpha at sourceware dot org>
- Date: Thu, 04 Dec 2014 20:01:07 +0100
- Subject: Re: [RFC] mutex destruction (#13690): problem description and workarounds
- References: <20141201170542 dot GY29621 at brightrain dot aerifal dot cx> <1417467150 dot 1771 dot 581 dot camel at triegel dot csb> <20141201212223 dot GZ29621 at brightrain dot aerifal dot cx> <1417553118 dot 3930 dot 14 dot camel at triegel dot csb> <20141202210316 dot GI29621 at brightrain dot aerifal dot cx> <547F17E3 dot 9060901 at redhat dot com> <1417703533 dot 22797 dot 16 dot camel at triegel dot csb> <5480807D dot 3040309 at redhat dot com> <20141204173402 dot GQ4574 at brightrain dot aerifal dot cx> <1417719246 dot 22797 dot 34 dot camel at triegel dot csb> <20141204185726 dot GS4574 at brightrain dot aerifal dot cx>
On Thu, 2014-12-04 at 13:57 -0500, Rich Felker wrote:
> On Thu, Dec 04, 2014 at 07:54:06PM +0100, Torvald Riegel wrote:
> > On Thu, 2014-12-04 at 12:34 -0500, Rich Felker wrote:
> > > On Thu, Dec 04, 2014 at 10:40:45AM -0500, Carlos O'Donell wrote:
> > > > On 12/04/2014 09:32 AM, Torvald Riegel wrote:
> > > > >> I agree. The conflation of EINTR for non-signal use is IMO going to be
> > > > >> a design decision we regret in the future.
> > > > >
> > > > > I'd rather see the fault in the POSIX semantics, which do not make it
> > > > > clear that signal handlers should do sem_post if they need to reliably
> > > > > interrupt a sem_wait.
> > > >
> > > > If we are going to disallow a signal from interrupting sem_wait we should just
> > > > change the semantics, version the interface, and document that glibc no
> > > > longer ever returns EINTR for sem_wait, and that the right way to interrupt
> > > > it is with a signal handler that does sem_post.
> > > >
> > > > This prevents users from complaining that what they observe with strace
> > > > and gdb is a signal arriving after the sem_wait, but not interrupting it.
> > > > We can claim the user is looking under the hood, but that's what they do,
> > > > and if we can possibly avoid those arguments we win. We know we're right,
> > > > we know we don't want to allow timing to imply ordering, but we need time
> > > > to educate developers (including that looking under the hood leads to
> > > > non-obvious observations).
> > > >
> > > > I really wish the kernel returned some other error code for woken up
> > > > vs. signal. Is it not possible to get the kernel to distinguish these
> > > > two? Am I forgetting something?
> > >
> > > It *DOES*. It returns 0 for woken-up, and EINTR for
> > > interrupted-by-signal.
> >
> > No. See man 2 futex, return values of FUTEX_WAIT:
> > "Signals (see signal(7)) or other spurious wakeups cause FUTEX_WAIT
> > to fail with the error EINTR."
> >
> > The LKML message that expanded on other error codes states that existing
> > wording for FUTEX_WAIT "seems ok": https://lkml.org/lkml/2014/5/15/356
> >
> > So, EINTR is currently documented as happening *either* due to a signal
> > or spuriously.
>
> This documentation is incorrect. There is currently no cause of EINTR
> other than signals, nor has there been in the past. I'll ask Michael
> to fix this.

Thanks, but asking Michael Kerrisk is not sufficient. What we need is
explicit agreement by the kernel folks (preferably on LKML) that
"EINTR only ever means a signal" is the contract that they want. There
can very well be reasons to specify a return value in a certain way even
if the current implementation does not make use of that freedom.
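
To make concrete what the contract as currently documented in man 2 futex
means for a caller, here is a minimal sketch, assuming Linux and the raw
futex system call. The helper names futex_wait and wait_while_equal are
made up for illustration; this is not what glibc does internally. The
point is only that the caller re-checks the futex word after EINTR and
draws no conclusion about whether a signal was delivered.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <linux/futex.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: thin wrapper around the raw FUTEX_WAIT operation.
   Returns 0 when woken, or -1 with errno set (EAGAIN if *addr != expected
   at the time of the call, EINTR per the documented contract). */
static long futex_wait(_Atomic uint32_t *addr, uint32_t expected)
{
    return syscall(SYS_futex, addr, FUTEX_WAIT, expected, NULL, NULL, 0);
}

/* Hypothetical helper: block until *addr no longer equals old.
   Returns 0 on success, -1 with errno set on an unexpected error. */
static int wait_while_equal(_Atomic uint32_t *addr, uint32_t old)
{
    while (atomic_load(addr) == old) {
        if (futex_wait(addr, old) == -1) {
            /* EINTR is documented as "signal or spurious wakeup", so it
               is not treated as proof of a signal: just re-check the
               word. EAGAIN means the word changed before we slept. */
            if (errno == EINTR || errno == EAGAIN)
                continue;
            return -1;
        }
        /* A return of 0 also leads to a re-check of the word, because a
           wakeup by itself says nothing about the new value. */
    }
    return 0;
}
```

A caller written this way behaves the same under either reading of EINTR.
The disagreement only matters for code that wants to report "interrupted
by a signal" upward, as sem_wait does.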