This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: [updated patch] malloc per-thread cache ready for review


On 07/05/2017 02:55 PM, DJ Delorie wrote:
> 
> Latest patch, with news, install, changelog, and all latest tweaks...
> This has that "appease gcc" line simply removed, since I couldn't
> reproduce it at the moment.  If/when it comes up again, I'll patch it
> with a push/pop then.
> 
> Anything else?  ;-)
 
Two nits below.

> +* A per-thread cache has been added to malloc.

Suggest:

* A per-thread cache has been added to malloc. Access to the cache requires
  no locks and therefore significantly accelerates the fast path to allocate
  and free small amounts of memory. Refilling an empty cache requires locking
  the underlying arena. Performance measurements show significant gains in a
  wide variety of user workloads. Workloads were captured using a special
  instrumented malloc and analyzed with a malloc simulator. Contributed by
  DJ Delorie with the help of Florian Weimer and Carlos O'Donell.
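For reviewers less familiar with the idea, here is a minimal, hypothetical C sketch of the fast path that entry describes. It is not the patch's code. Each thread owns an array of singly linked free lists, one per 16-byte size class, each capped at a fixed count. Allocating from or freeing into the cache touches only thread-local state and takes no lock. A miss or a full list falls through to a mutex-protected stand-in for an arena, which here is just the system malloc/free called under a lock so that the locking is visible. The names tc_malloc, tc_free, TC_BINS, TC_COUNT and TC_STEP are assumptions for illustration. Unlike real malloc, the sketch refills one chunk at a time and does not flush the cache at thread exit.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical parameters, chosen for illustration only. */
#define TC_BINS  64   /* number of size classes */
#define TC_COUNT 7    /* max cached chunks per size class */
#define TC_STEP  16   /* bytes between size classes */

/* A cached free chunk is reused as a list node. */
typedef struct tc_entry { struct tc_entry *next; } tc_entry;

/* Per-thread state: each thread has its own copy, so no lock is needed. */
static _Thread_local tc_entry *tc_bins[TC_BINS];
static _Thread_local unsigned tc_counts[TC_BINS];

/* Stand-in for an arena: a shared allocator guarded by one mutex. */
static pthread_mutex_t arena_lock = PTHREAD_MUTEX_INITIALIZER;

/* Every chunk carries a header recording its size class
   (TC_BINS means "too large to cache"). */
typedef union { size_t bin; max_align_t align; } tc_header;

/* Class i serves requests of i*TC_STEP+1 .. (i+1)*TC_STEP bytes. */
static size_t tc_bin_for(size_t n) { return n == 0 ? 0 : (n - 1) / TC_STEP; }

static void *arena_alloc(size_t bytes) {
  pthread_mutex_lock(&arena_lock);
  void *p = malloc(bytes);
  pthread_mutex_unlock(&arena_lock);
  return p;
}

static void arena_free(void *p) {
  pthread_mutex_lock(&arena_lock);
  free(p);
  pthread_mutex_unlock(&arena_lock);
}

void *tc_malloc(size_t n) {
  size_t bin = tc_bin_for(n);
  if (bin < TC_BINS && tc_bins[bin] != NULL) {
    /* Fast path: pop from this thread's list without locking. */
    tc_entry *e = tc_bins[bin];
    tc_bins[bin] = e->next;
    tc_counts[bin]--;
    return e;
  }
  /* Slow path: take the arena lock.  Cacheable requests are rounded up
     to the class size so the chunk can later serve any request in it. */
  size_t payload = bin < TC_BINS ? (bin + 1) * TC_STEP : n;
  tc_header *h = arena_alloc(sizeof(tc_header) + payload);
  if (h == NULL)
    return NULL;
  h->bin = bin < TC_BINS ? bin : TC_BINS;
  return h + 1;
}

void tc_free(void *p) {
  if (p == NULL)
    return;
  tc_header *h = (tc_header *)p - 1;
  size_t bin = h->bin;
  assert(bin <= TC_BINS);
  if (bin < TC_BINS && tc_counts[bin] < TC_COUNT) {
    /* Fast path: push onto this thread's list without locking. */
    tc_entry *e = p;
    e->next = tc_bins[bin];
    tc_bins[bin] = e;
    tc_counts[bin]++;
    return;
  }
  /* Cache full or chunk too large: return it to the locked arena. */
  arena_free(h);
}
```

For example, freeing a 24-byte block and then requesting 20 bytes from the same thread returns the same pointer, since both requests fall in the 17..32-byte class, and nothing is locked on that path.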

> diff -x .git -x po -rup glibc.pristine/manual/install.texi glibc.djmalloc-tcache/manual/install.texi
> --- glibc.pristine/manual/install.texi	2017-06-26 17:13:13.080645103 -0400
> +++ glibc.djmalloc-tcache/manual/install.texi	2017-07-05 14:22:50.287119093 -0400
> @@ -232,6 +232,11 @@ libnss_nisplus are not built at all.
>  Use this option to enable libnsl with all depending NSS modules and
>  header files.
>  
> +@item --disable-experimental-malloc
> +By default, a per-thread cache is enabled in @code{malloc}.  While
> +this cache can be disabled on a per-application basis using tunables,

How? Please call out the exact tunable to use to disable the cache.

> +this option can be used to remove it from the build completely.
> +
>  @item --build=@var{build-system}
>  @itemx --host=@var{host-system}
>  These options are for cross-compiling.  If you specify both options and

At the same time, add the same text about disabling the cache to the
description of the appropriate tunable below.

> diff -x .git -x po -rup glibc.pristine/manual/tunables.texi glibc.djmalloc-tcache/manual/tunables.texi
> --- glibc.pristine/manual/tunables.texi	2017-06-23 20:49:10.201148494 -0400
> +++ glibc.djmalloc-tcache/manual/tunables.texi	2017-07-05 14:50:27.881007814 -0400
> @@ -193,6 +193,37 @@ systems the limit is twice the number of
>  is 8 times the number of cores online.
>  @end deftp
>  
> +@deftp Tunable glibc.malloc.tcache_max
> +The maximum size of a request (in bytes) which may be met via the
> +per-thread cache.  The default (and maximum) value is 1032 bytes on
> +64-bit systems and 516 bytes on 32-bit systems.
> +@end deftp
> +
> +@deftp Tunable glibc.malloc.tcache_count
> +The maximum number of chunks of each size to cache. The default is 7.
> +There is no upper limit, other than available system memory.
> +
> +The approximate maximum overhead of the per-thread cache is thus equal
> +to the number of bins times the chunk count in each bin times the size
> +of each chunk.  With defaults, the approximate maximum overhead of the
> +per-thread cache is approximately 236 KB on 64-bit systems and 118 KB
> +on 32-bit systems.
> +@end deftp
> +
> +@deftp Tunable glibc.malloc.tcache_unsorted_limit
> +When the user requests memory and the request cannot be met via the
> +per-thread cache, the arenas are used to meet the request.  At this
> +time, additional chunks will be moved from existing arena lists to
> +pre-fill the corresponding cache.  While copies from the fastbins,
> +smallbins, and regular bins are bounded and predictable due to the bin
> +sizes, copies from the unsorted bin are not bounded, and incur
> +additional time penalties as they need to be sorted as they're
> +scanned.  To make scanning the unsorted list more predictable and
> +bounded, the user may set this tunable to limit the number of chunks
> +that are scanned from the unsorted list while searching for chunks to
> +pre-fill the per-thread cache with.  The default, or when set to zero,
> +is no limit.
> +
>  @node Hardware Capability Tunables
>  @section Hardware Capability Tunables
>  @cindex hardware capability tunables
> 
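The 1032- and 516-byte defaults quoted for glibc.malloc.tcache_max can be reproduced with a short calculation. This sketch assumes 64 cache bins, chunk sizes that start at the minimum chunk size and grow by the malloc alignment, and a usable size one size_t header word smaller than the chunk size. The per-target constants (16-byte alignment, 32-byte minimum chunk and 8-byte size_t on 64-bit; 8, 16 and 4 on a 32-bit target with 8-byte alignment) are assumptions about typical targets, and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Usable bytes served by cache bin IDX under the assumed layout:
   chunk size = IDX * ALIGN + MINSIZE, minus one SIZE_SZ header word. */
size_t tcache_bin_usable(size_t idx, size_t align, size_t minsize,
                         size_t size_sz)
{
  return idx * align + minsize - size_sz;
}

/* The largest cacheable request is the usable size of the last bin. */
size_t tcache_max_bytes(size_t nbins, size_t align, size_t minsize,
                        size_t size_sz)
{
  assert(nbins > 0);
  return tcache_bin_usable(nbins - 1, align, minsize, size_sz);
}
```

Under those assumptions, tcache_max_bytes(64, 16, 32, 8) is 63 * 16 + 32 - 8 = 1032 and tcache_max_bytes(64, 8, 16, 4) is 63 * 8 + 16 - 4 = 516.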

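The overhead figures quoted for glibc.malloc.tcache_count come out if "bins times count times size" is read as a sum over bins, since each bin holds a different chunk size. This sketch assumes 64 bins, 7 chunks per bin, and bin i serving up to i * align + minsize - size_sz usable bytes, with 16-byte alignment, a 32-byte minimum chunk and an 8-byte size_t on 64-bit (8, 16 and 4 on 32-bit). Chunk headers are not counted, and the function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Approximate cache overhead: for every bin, COUNT chunks times the
   usable size of that bin (i * ALIGN + MINSIZE - SIZE_SZ).  Chunk
   headers and alignment padding are not counted. */
size_t tcache_overhead(size_t nbins, size_t count, size_t align,
                       size_t minsize, size_t size_sz)
{
  size_t total = 0;
  for (size_t i = 0; i < nbins; i++)
    total += count * (i * align + minsize - size_sz);
  return total;
}
```

With those inputs it yields 236,544 bytes on 64-bit and 118,272 bytes on 32-bit, which matches the 236 KB and 118 KB in the patch when KB means 1000 bytes.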

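To illustrate what glibc.malloc.tcache_unsorted_limit bounds, here is a hypothetical sketch, not glibc's code. It walks an unsorted singly linked list, moves chunks of the wanted size into a per-size cache until the cache is full, and stops after limit chunks have been examined, where 0 means no limit. The chunk type and prefill_from_unsorted are simplified stand-ins. The real allocator also sorts non-matching chunks into bins as it scans, which is the cost the tunable is meant to cap; that step is omitted here.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified free chunk on an "unsorted" list (hypothetical type). */
typedef struct chunk {
  size_t size;
  struct chunk *next;
} chunk;

/* Move chunks of size WANT from *UNSORTED into *CACHE, which may hold
   at most CACHE_MAX chunks.  Examine at most LIMIT chunks; LIMIT == 0
   means no limit.  Returns the number of chunks examined. */
size_t prefill_from_unsorted(chunk **unsorted, chunk **cache,
                             size_t *cache_count, size_t cache_max,
                             size_t want, size_t limit)
{
  size_t scanned = 0;
  chunk **link = unsorted;
  assert(cache_count != NULL);
  while (*link != NULL && *cache_count < cache_max) {
    if (limit != 0 && scanned == limit)
      break;                 /* scan bound reached */
    chunk *c = *link;
    scanned++;
    if (c->size == want) {
      *link = c->next;       /* unlink from the unsorted list */
      c->next = *cache;      /* push onto the cache */
      *cache = c;
      (*cache_count)++;
    } else {
      link = &c->next;       /* skip; real code would sort it into a bin */
    }
  }
  return scanned;
}
```

With limit set to 2, a list whose first two chunks are 32 and 48 bytes yields one cached 32-byte chunk and returns after examining two entries, however long the rest of the list is.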
-- 
Cheers,
Carlos.

