This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.
Re: Race in pthread_mutex_lock while promoting to PTHREAD_MUTEX_ELISION_NP.
- From: Stefan Liebler <stli at linux dot ibm dot com>
- To: GNU C Library <libc-alpha at sourceware dot org>
- Cc: Torvald Riegel <triegel at redhat dot com>, "Carlos O'Donell" <codonell at redhat dot com>
- Date: Tue, 12 Jun 2018 16:26:40 +0200
- Subject: Re: Race in pthread_mutex_lock while promoting to PTHREAD_MUTEX_ELISION_NP.
- References: <395f3653-a80a-31d4-2291-8b5a1654fb15@linux.vnet.ibm.com> <1527975115.7569.51.camel@redhat.com>
Hi,
I've opened "Bug 23275 - Race in pthread_mutex_lock while promoting
to PTHREAD_MUTEX_ELISION_NP" and posted a patch.
Please have a look at it and reply to that post:
"[PATCH] Fix race in pthread_mutex_lock while promoting to
PTHREAD_MUTEX_ELISION_NP [BZ #23275]"
(https://www.sourceware.org/ml/libc-alpha/2018-06/msg00246.html)
On 06/02/2018 11:31 PM, Torvald Riegel wrote:
> On Thu, 2018-05-24 at 16:39 +0200, Stefan Liebler wrote:
>> How can we solve this issue?
> Good catch. I don't have the time currently to look into this in
> detail, but the underlying problem is that __kind is concurrently read
> and modified, so we need to have a plan for how to deal with it,
> document that (concurrency notes), and use atomics to avoid the data
> races and make everyone aware of this field being concurrently accessed
> data.
> Regarding how to solve this: Maybe we should force elision only when
> unlocking, so that the change happens within the critical section and
> doesn't affect further operations within that critical section. We'd
> still need to check all the paths on which concurrent threads also
> trying to acquire the lock may have old information (ie, elision bit not
> set); my guess would be that it should usually be fine if they believe
> elision is not yet enabled if they read __kind again after acquiring the
> lock.
If we force elision when unlocking, __kind is changed within the
critical section, but __kind is read in pthread_mutex_lock before
entering the critical section. Then there could be other threads in
pthread_mutex_lock - on the normal path - waiting for the futex. That's
fine, and yes, I think only one thread is within the critical section.
BUT the __owner and __nusers fields are set while locking, not while
unlocking.
> Not updating __owner or __nusers if elision is enabled may also be an
> option (the mutex types that require checks aren't compatible with
> elision).
If the mutex is using the elision path, then those fields aren't
updated. If the race occurs, then pthread_mutex_lock - on the normal
path - updates those fields, but pthread_mutex_unlock - on the elision
path - does not. This leads to the symptoms mentioned above.
> Another approach may be to cache __pthread_force_elision in the mutex
> itself but not by messing with __kind in a way that affects the handling
> of the mutex.
> More generally, it would be great if we could clean up the mutex code,
> remove some of the special cases and checks that aren't required by
> POSIX. In practice, I guess that many people simply don't check return
> codes anyway -- one simply expects locking to work.
Bye
Stefan