This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: [PATCH v2] malloc: Fix list_lock/arena lock deadlock [BZ #19182]
- From: Torvald Riegel <triegel at redhat dot com>
- To: Florian Weimer <fweimer at redhat dot com>
- Cc: GNU C Library <libc-alpha at sourceware dot org>
- Date: Mon, 21 Dec 2015 16:21:23 +0100
- Subject: Re: [PATCH v2] malloc: Fix list_lock/arena lock deadlock [BZ #19182]
- Authentication-results: sourceware.org; auth=none
- References: <56389C10 dot 6060001 at redhat dot com> <56642691 dot 7060806 at redhat dot com> <1449852593 dot 27311 dot 98 dot camel at localhost dot localdomain> <566F1114 dot 1060706 at redhat dot com> <1450380373 dot 26597 dot 29 dot camel at localhost dot localdomain> <56740C24 dot 90901 at redhat dot com>
On Fri, 2015-12-18 at 14:37 +0100, Florian Weimer wrote:
> On 12/17/2015 08:26 PM, Torvald Riegel wrote:
> > On Mon, 2015-12-14 at 19:57 +0100, Florian Weimer wrote:
> >> On 12/11/2015 05:49 PM, Torvald Riegel wrote:
> >>> I would provide more detail on the synchronization of read-only
> >>> traversals of the list. list_lock seems to be used to enforce mutual
> >>> exclusion between modification operations to the list (including
> >>> necessary reading/traversal of the list, so you can't go in without a
> >>> lock, figure out what you need to change, and then grab list_lock). For
> >>> traversal-only / write-op synchronization, atomic operations are used.
> >>> This is the reason why write ops do not need to use acquire fences when
> >>> traversing the list. The acquire fences are missing in reused_arena,
> >>> and ptmalloc_[un]lock_all[2], it seems.
> >>
> >> Yeah. reused_arena also has a data race on the local static variable
> >> (and this one could actually be harmful).
> >
> > The lack of acquire fences is harmful. Or do you mean that this piece
> > of code might be more likely to cause code-generation-related problems
> > triggered by data races?
>
> I'm worried that the exit condition used for the loop does not trigger
> reliably and that two or more threads could loop endlessly in
> reused_arena. I don't know if it can happen in practice.
I think eventually, a thread would pick up the most recent values, and
given that the arenas stick around forever, the list navigation bug
might not lead to an error. However, when a new arena is added, a
thread may read uninitialized values for the elements of the arena struct
(because the pointer to the arena is published with mo_release but not
read with mo_acquire).
> > I think a FIXME would be good; we know there is a bug, so we should just
> > document that.
>
> The attached version of the patch points out the data races I know of.
> I also adjusted the comments in struct malloc_state in malloc.c.
OK. The changes should be enough to make the issue stand out.
> > I'm not sure whether my question came across, so let me ask again: When
> > the lock is released temporarily, we still hold a pointer to the arena.
> > Until we reacquire the lock, what can happen to the arena? Can the
> > arena and its memory be deleted or similar, and the memory be reused
> > for something else?
>
> Arenas are allocated once and then stick around until the process exits.
Okay, that's important, and it seems to be required for correctness.
> The locking during arena selection is only there to select an
> uncontended arena. Releasing an arena lock and reacquiring it changes
> the way arenas are selected and means that another thread is more likely
> to select the same arena.
>
> I'm not sure why we even perform arena selection in parallel. The risk
> is that the lock probing detects the additional selection process, but
> not actual arena usage (for which the acquired lock might be a poor
> indicator anyway).
Agreed, based on your description of the situation.
Overall, this patch looks good to me. Thanks!