This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: Lock elision problems in glibc-2.18
- From: Torvald Riegel <triegel at redhat dot com>
- To: libc-alpha at sourceware dot org
- Cc: andi at firstfloor dot org
- Date: Tue, 17 Sep 2013 12:58:05 +0200
- Subject: Re: Lock elision problems in glibc-2.18
- References: <20130823084916 dot GA5506 at linux dot vnet dot ibm dot com> <1378901002 dot 3196 dot 14075 dot camel at triegel dot csb> <1379001387 dot 32370 dot 1509 dot camel at triegel dot csb> <87ioy36hx2 dot fsf at tassilo dot jf dot intel dot com> <1379321886 dot 32370 dot 3665 dot camel at triegel dot csb> <20130916102501 dot GA4391 at linux dot vnet dot ibm dot com> <1379343986 dot 32370 dot 4419 dot camel at triegel dot csb> <20130917104309 dot GA25252 at linux dot vnet dot ibm dot com>
On Tue, 2013-09-17 at 12:43 +0200, Dominik Vogt wrote:
> On Mon, Sep 16, 2013 at 05:06:26PM +0200, Torvald Riegel wrote:
> > On Mon, 2013-09-16 at 12:25 +0200, Dominik Vogt wrote:
> > > On Mon, Sep 16, 2013 at 10:58:06AM +0200, Torvald Riegel wrote:
> > > > On Fri, 2013-09-13 at 23:23 -0700, Andi Kleen wrote:
> > > > > Torvald Riegel <triegel@redhat.com> writes:
> > > > >
> > > > > > We need to do this, and if the app
> > > > > > misinterprets and *always* keeps retrying, it will hang.
> > > > >
> > > > > The app is broken then. Retrying forever is simply not allowed.
> > > > > I don't think it makes sense to complicate glibc for this.
>
> Anyway, even if the application does not retry forever, this is no
> guarantee that the glibc abort handler is ever called.
If it does not retry forever, it eventually will execute
nontransactionally. Once it does, we've "peeled off" this layer of the
problem. Thus, eventually, glibc's elision will constitute the
outermost transaction, so its abort handlers will be executed if
necessary.
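To make "does not retry forever" concrete, here is a minimal sketch of a
caller with a bounded retry budget. It is only an illustration, not glibc
code: try_begin_transaction(), end_transaction(), MAX_RETRIES and
aborts_remaining are hypothetical names, and the transaction is simulated
so that the sketch runs on machines without HTM. Once the budget is used
up, the caller takes the mutex outside of any transaction, which is the
point at which glibc's elision (if enabled) would be the outermost
transaction.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

#define MAX_RETRIES 3            /* assumed retry budget, not a glibc constant */

static pthread_mutex_t fallback_lock = PTHREAD_MUTEX_INITIALIZER;
static int counter;              /* data protected by the critical section */
static int txn_attempts;         /* how often a transaction was attempted */
static int fallback_runs;        /* how often the nontransactional path ran */
static int aborts_remaining;     /* hypothetical knob: attempts that will abort */
static bool in_txn;

/* Hypothetical stand-in for starting a hardware transaction.
   Simulated: it reports an abort while aborts_remaining > 0. */
static bool try_begin_transaction(void)
{
    txn_attempts++;
    if (aborts_remaining > 0) {
        aborts_remaining--;
        return false;
    }
    in_txn = true;
    return true;
}

/* Hypothetical stand-in for committing the transaction. */
static void end_transaction(void)
{
    assert(in_txn);
    in_txn = false;
}

static void increment(void)
{
    /* Bounded retrying: give up on transactions after MAX_RETRIES aborts. */
    for (int i = 0; i < MAX_RETRIES; i++) {
        if (try_begin_transaction()) {
            counter++;
            end_transaction();
            return;
        }
    }

    /* Nontransactional fallback. No enclosing transaction exists here,
       so an elided mutex would start the outermost transaction. */
    pthread_mutex_lock(&fallback_lock);
    counter++;
    fallback_runs++;
    pthread_mutex_unlock(&fallback_lock);
}
```

A caller that looped on try_begin_transaction() without the MAX_RETRIES
bound would never reach the fallback, which is the hang discussed
earlier in the thread.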
> The
> fallback code might use mechanisms other than pthread_mutex_lock
> for protection, and thus the mutexes that aborted might not be used
> at all.
Right, but if pthread mutexes aren't used in the fallback, why do its
abort handlers need to be called? IOW, in such a case the caller
effectively turns off elision.
> Thus, the glibc code must not rely on its abort handlers
> ever being called.
I think that's not quite a precise characterization. It can rely on the
abort handlers being called if it started the outermost transaction. It
can also expect that callers with enclosing transactions will eventually
fall back to nontransactional execution.
It is true that we cannot expect that abort handlers are always called,
so if you tried to use the abort handlers for precise monitoring of
aborts caused by critical sections that use elision, then this wouldn't
work. But from a correctness point of view, the pthread mutexes will
eventually do what they are supposed to.
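The flat-nesting behavior behind this can be modelled with a small
simulation. This is a sketch under stated assumptions, not how any
particular HTM or glibc implements it: txn_begin(), txn_abort() and the
app_/lib_ functions are hypothetical names, and an abort is modelled as
a plain function call to the outermost handler, whereas real hardware
would roll back and transfer control to the outermost transaction's
fallback path.

```c
#include <assert.h>
#include <stddef.h>

typedef void (*abort_handler)(void);

static int depth;                        /* current nesting depth */
static abort_handler outermost_handler;  /* only the outermost one is kept */
static int app_aborts;                   /* aborts seen by the application */
static int lib_aborts;                   /* aborts seen by the library */

static void app_on_abort(void) { app_aborts++; }
static void lib_on_abort(void) { lib_aborts++; }

/* Flat nesting: a nested begin only increases the depth. Its handler is
   subsumed by the handler of the outermost transaction. */
static void txn_begin(abort_handler h)
{
    if (depth == 0)
        outermost_handler = h;
    depth++;
}

/* An abort discards all nesting levels and runs the outermost handler. */
static void txn_abort(void)
{
    assert(depth > 0);
    abort_handler h = outermost_handler;
    depth = 0;
    outermost_handler = NULL;
    h();
}

/* Hypothetical library critical section that uses elision and hits a
   conflict, so its transaction aborts. */
static void lib_elided_section(void)
{
    txn_begin(lib_on_abort);
    txn_abort();
}

/* The application encloses the library call in its own transaction. */
static void app_transactional_call(void)
{
    txn_begin(app_on_abort);
    lib_elided_section();
}

/* The application has fallen back to nontransactional execution. */
static void app_nontransactional_call(void)
{
    lib_elided_section();
}
```

In this model, app_transactional_call() only ever triggers the
application's handler, while app_nontransactional_call() triggers the
library's handler, because the library's transaction is then the
outermost one.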
> I am very reluctant to claim all sorts of scenarios to be broken
> or impossible with the knowledge we have at the moment. The fact
> that glibc interfaces are backed by POSIX does not make it immune
> to bugs.
I haven't heard this particular argument by anyone else so far. What
we're discussing here is not really specific to any pthreads facilities,
but rather platform/arch-wide rules for how you can use HTMs with flat
nesting.