This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: [PATCH v3] malloc: Fix list_lock/arena lock deadlock [BZ #19182]


On Tue, 2015-12-15 at 13:13 +0100, Florian Weimer wrote:
> diff --git a/malloc/arena.c b/malloc/arena.c
> index 3dab7bb..a84bb22 100644
> --- a/malloc/arena.c
> +++ b/malloc/arena.c
> @@ -68,15 +68,23 @@ extern int sanity_check_heap_info_alignment[(sizeof (heap_info)
>  
>  static __thread mstate thread_arena attribute_tls_model_ie;
>  
> -/* Arena free list.  list_lock protects the free_list variable below,
> -   and the next_free and attached_threads members of the mstate
> -   objects.  No other (malloc) locks must be taken while list_lock is
> -   active, otherwise deadlocks may occur.  */
> +/* Arena free list.  free_list_lock synchronizes access to the
> +   free_list variable below, and the next_free and attached_threads
> +   members of struct malloc_state objects.  No other locks must be
> +   acquired after free_list_lock has been acquired.  */
>  
> -static mutex_t list_lock = _LIBC_LOCK_INITIALIZER;
> +static mutex_t free_list_lock = _LIBC_LOCK_INITIALIZER;
>  static size_t narenas = 1;
>  static mstate free_list;
>  
> +/* list_lock prevents concurrent writes to the next member of struct
> +   malloc_state objects.  (Read access to the next member is
> +   synchronized via atomic_full_barrier before the write in
> +   _int_new_arena.)  list_lock also prevents concurrent forks.  When

I only see an atomic_write_barrier there.  In any case, this new comment
is incorrect, because atomic_write_barrier alone is insufficient on
architectures like Power or ARM.  I don't mind if you document how this
was supposed to work, but the comment should make it obvious that the
current code does not achieve it, because the matching fences on the
reader side are missing.  So I think we should explain that, and perhaps
put a FIXME in the comments too.
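
For illustration only (this is not glibc code, and the names node and
list_head are made up): a minimal C11-atomics sketch of the publish
pattern that comment describes, together with the acquire load that the
reader side would need.  glibc uses its own atomic_* macros rather than
<stdatomic.h>, so treat this purely as a sketch of the ordering
argument.

#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for struct malloc_state.  */
struct node
{
  int payload;
  struct node *next;
};

static _Atomic (struct node *) list_head;

/* Writer: initialize the node, then publish it.  The release store
   corresponds to atomic_write_barrier () followed by the plain store
   to main_arena.next in _int_new_arena.  */
void
publish (struct node *n, int value)
{
  n->payload = value;
  n->next = atomic_load_explicit (&list_head, memory_order_relaxed);
  atomic_store_explicit (&list_head, n, memory_order_release);
}

/* Reader: on weakly ordered architectures (Power, ARM), this load
   needs acquire (or at least consume) semantics; with a plain load,
   the subsequent read of n->payload may observe pre-initialization
   contents despite the writer's barrier.  A plain load here is the
   missing fence on the reader side.  */
int
read_first_payload (void)
{
  struct node *n = atomic_load_explicit (&list_head, memory_order_acquire);
  if (n == NULL)
    return -1;
  return n->payload;
}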

> +   list_lock is acquired, no arena lock must be acquired, but it is
> +   permitted to acquire arena locks after list_lock.  */
> +static mutex_t list_lock = _LIBC_LOCK_INITIALIZER;
> +
>  /* Mapped memory in non-main arenas (reliable only for NO_THREADS). */
>  static unsigned long arena_mem;
>  
> @@ -207,6 +215,9 @@ ptmalloc_lock_all (void)
>    if (__malloc_initialized < 1)
>      return;
>  
> +  /* We do not acquire free_list_lock here because we completely
> +     reconstruct free_list in ptmalloc_unlock_all2.  */
> +
>    if (mutex_trylock (&list_lock))
>      {
>        if (thread_arena == ATFORK_ARENA_PTR)
> @@ -286,6 +297,7 @@ ptmalloc_unlock_all2 (void)
>  
>    /* Push all arenas to the free list, except save_arena, which is
>       attached to the current thread.  */
> +  mutex_init (&free_list_lock);
>    if (save_arena != NULL)
>      ((mstate) save_arena)->attached_threads = 1;
>    free_list = NULL;
> @@ -303,6 +315,7 @@ ptmalloc_unlock_all2 (void)
>        if (ar_ptr == &main_arena)
>          break;
>      }
> +
>    mutex_init (&list_lock);
>    atfork_recursive_cntr = 0;
>  }
> @@ -735,7 +748,7 @@ heap_trim (heap_info *heap, size_t pad)
>  /* Create a new arena with initial size "size".  */
>  
>  /* If REPLACED_ARENA is not NULL, detach it from this thread.  Must be
> -   called while list_lock is held.  */
> +   called while free_list_lock is held.  */
>  static void
>  detach_arena (mstate replaced_arena)
>  {
> @@ -788,12 +801,9 @@ _int_new_arena (size_t size)
>    mstate replaced_arena = thread_arena;
>    thread_arena = a;
>    mutex_init (&a->mutex);
> -  (void) mutex_lock (&a->mutex);
>  
>    (void) mutex_lock (&list_lock);
>  
> -  detach_arena (replaced_arena);
> -
>    /* Add the new arena to the global list.  */
>    a->next = main_arena.next;
>    atomic_write_barrier ();
> @@ -801,6 +811,19 @@ _int_new_arena (size_t size)
>  
>    (void) mutex_unlock (&list_lock);
>  
> +  (void) mutex_lock (&free_list_lock);
> +  detach_arena (replaced_arena);
> +  (void) mutex_unlock (&free_list_lock);
> +
> +  /* Lock this arena.  NB: We may not have exclusive access to this
> +     arena anymore because the arena is now accessible from the
> +     main_arena.next list and could have been picked by reused_arena
> +     in the meantime.  This can only happen for the last arena created
> +     (before the arena limit is reached).  This seems unlikely, so we
> +     do not protect against this special case.  */

"This seems unlikely" is not sufficient for correctness, IMO.  Either we
*known* that a situation will be unlikely (e.g., because we know that an
overflow or such will require execution time that is an order of
magnitudes larger than the lifetime of current machines) and thus make
this assumption, or we admit that there is a bug but that we do not
handle it yet.  Being honest about it is much better than hand-waving, I
believe.
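
Again purely for illustration, with made-up names (obj, head, creator,
scanner): a small pthread sketch of the window being discussed.  The
object becomes reachable through the shared list before its creator
takes the lock, so a concurrent scanner can lock it first, much as
reused_arena could pick up the freshly published arena.

#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

/* Hypothetical stand-in for an arena: a lockable object that is
   published on a shared list before its creator locks it.  */
struct obj
{
  pthread_mutex_t mutex;
  struct obj *next;
};

/* Plays the role of main_arena.next.  */
static _Atomic (struct obj *) head;

static void *
creator (void *arg)
{
  struct obj *o = arg;
  pthread_mutex_init (&o->mutex, NULL);
  o->next = atomic_load_explicit (&head, memory_order_relaxed);
  atomic_store_explicit (&head, o, memory_order_release);
  /* Race window: between the publication above and the lock below,
     another thread may acquire o->mutex first, so the creator no
     longer has exclusive access to o.  */
  pthread_mutex_lock (&o->mutex);
  pthread_mutex_unlock (&o->mutex);
  return NULL;
}

static void *
scanner (void *unused)
{
  struct obj *o = atomic_load_explicit (&head, memory_order_acquire);
  if (o != NULL)
    {
      pthread_mutex_lock (&o->mutex);   /* may win the race */
      puts ("scanner locked the new object first");
      pthread_mutex_unlock (&o->mutex);
    }
  return NULL;
}

int
main (void)
{
  static struct obj o;
  pthread_t t1, t2;
  pthread_create (&t1, NULL, creator, &o);
  pthread_create (&t2, NULL, scanner, NULL);
  pthread_join (t1, NULL);
  pthread_join (t2, NULL);
  return 0;
}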


Otherwise, the patch looks okay to me (my v2 disclaimer still applies,
though).  Eventually, I think more thorough documentation will be good
to have, but that can be done after the urgent bug fix.
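
As a rough starting point for that documentation (a sketch of my own,
not proposed wording), the acquisition rules stated in the patch could
be summarized along these lines:

/* Locking rules (sketch, not authoritative):

   free_list_lock -- protects free_list and the next_free and
       attached_threads members; no other malloc lock may be acquired
       while it is held.

   list_lock -- serializes writes to the ->next members and fork();
       it must not be acquired while an arena lock is held, but arena
       locks may be acquired while list_lock is held.  */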

