This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: [PATCH][malloc] Avoid atomics in have_fastchunks


On 09/19/2017 10:32 AM, Wilco Dijkstra wrote:
> Currently free typically uses 2 atomic operations per call.  The have_fastchunks
> flag indicates whether there are recently freed blocks in the fastbins.  This
> is purely an optimization to avoid calling malloc_consolidate too often and
> to avoid the overhead of walking all fast bins even if all are empty during a
> sequence of allocations.  However, using atomics to update the flag is
> completely unnecessary since it can be changed into a simple boolean.  There
> is no change
> in multi-threaded behaviour given the flag is already approximate (it may be
> set when there are no blocks in any fast bins, or it may be clear when there
> are free blocks that could be consolidated).
> 
> Performance of malloc/free improves by 27% on a simple benchmark on AArch64
> (both single and multithreaded). The number of load/store exclusive instructions is
> reduced by 33%. Bench-malloc-thread speeds up by ~3% in all cases.
> 
> Passes GLIBC tests. OK to commit?
> 
> ChangeLog:
> 2017-09-19  Wilco Dijkstra  <wdijkstr@arm.com>
> 
> 	* malloc/malloc.c (FASTCHUNKS_BIT): Remove.
> 	(have_fastchunks): Remove.
> 	(clear_fastchunks): Remove.
> 	(set_fastchunks): Remove.
> 	(malloc_state): Add have_fastchunks.
> 	(malloc_init_state): Use have_fastchunks.
> 	(do_check_malloc_state): Remove incorrect invariant checks.
> 	(_int_malloc): Use have_fastchunks.
> 	(_int_free): Likewise.
> 	(malloc_consolidate): Likewise.

Quick review. I think we'll need a v2 before we're ready to commit, or at
least a discussion around the use of relaxed MO loads and stores.

High level:

It is great to see someone looking at the details of malloc at an
atomic-by-atomic level of cost analysis. I know we have looked briefly at
fastbins and the tradeoff between taking the arena lock (one atomic) and the
CAS required to put the fastbin back in the list.

I'm happy to see this paying dividends for you e.g. ~3% global gains.

I like the direction this fix takes and the process you used to verify it.

Implementation level:

You use unadorned loads and stores of the variable av->have_fastchunks, and
this constitutes a data race, which is undefined behaviour in C11.

For example, in _int_free, have_lock could be 0 when we write to
av->have_fastchunks while any other thread might be reading it concurrently.
That is a data race under the C11 memory model.

Please use relaxed MO loads and stores if that is what we need.

After you add the relaxed MO loads and stores the comment for have_fastchunks
will need a little more explicit language about why the relaxed MO loads and
stores are OK from a P&C perspective.

Details:

Does this patch change the number of times malloc_consolidate might be
called? Do you have any figures on this? That would be a user-visible change
(and would require a bug #).

-- 
Cheers,
Carlos.

