This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.
Re: RFC: replace ptmalloc2
- From: Eric Wong <normalperson at yhbt dot net>
- To: Will Newton <will dot newton at linaro dot org>
- Cc: Jörn Engel <joern at purestorage dot com>, Rich Felker <dalias at libc dot org>, Siddhesh Poyarekar <siddhesh dot poyarekar at gmail dot com>, GNU C Library <libc-alpha at sourceware dot org>
- Date: Fri, 17 Oct 2014 09:03:40 +0000
- Subject: Re: RFC: replace ptmalloc2
- References: <20141009215447 dot GD8583 at Sligo dot logfs dot org> <CAAHN_R0JDNQkx7oV0HS9Knv7nsPZiARLeFb4zpPa+rj7cNfECg at mail dot gmail dot com> <20141010010743 dot GA15146 at Sligo dot logfs dot org> <20141010012530 dot GX23797 at brightrain dot aerifal dot cx> <20141010013302 dot GC15146 at Sligo dot logfs dot org> <20141010020229 dot GY23797 at brightrain dot aerifal dot cx> <20141014233254 dot GA1860 at Sligo dot logfs dot org> <20141015040031 dot GR32028 at brightrain dot aerifal dot cx> <20141015045238 dot GA4528 at Sligo dot logfs dot org> <CANu=DmgNQm4A0ChTky+c4iSBwDjb5uAmsHd3H7MynQPjL3vecA at mail dot gmail dot com>
Will Newton <will.newton@linaro.org> wrote:
> I currently have some microbenchmarks and a script to run a few open
> source applications inside a docker container and measure the
> performance of the commonly used allocators. I could certainly use
> more and more realistic workloads to add to it:
>
> https://git.linaro.org/toolchain/cortex-malloc.git
Hi Will, great to see you have a producer-consumer test in there! In my
own tests I also noticed that jemalloc seemed to struggle with large
remote frees (memory freed by a thread other than the one that
allocated it).
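For anyone following along, a minimal producer-consumer test of this kind might look like the sketch below. One thread mallocs fixed-size buffers and hands them through a bounded ring to a second thread, which frees them, so every free is a remote free. The names (pc_run, PC_ITEMS, PC_ALLOC_SIZE, the ring itself) are hypothetical and not taken from cortex-malloc or xtbench, and timing is left out.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical parameters, not taken from any existing benchmark. */
#define PC_RING_SLOTS 64
#define PC_ITEMS      10000
#define PC_ALLOC_SIZE (256 * 1024)  /* larger than glibc's initial 128 KiB mmap threshold */

struct pc_ring {
    void *slot[PC_RING_SLOTS];
    size_t head, tail;              /* head: next write, tail: next read */
    pthread_mutex_t lock;
    pthread_cond_t not_full, not_empty;
    size_t freed;                   /* written only by the consumer thread */
};

static void *pc_producer(void *arg)
{
    struct pc_ring *r = arg;
    for (size_t i = 0; i < PC_ITEMS; i++) {
        char *p = malloc(PC_ALLOC_SIZE);
        assert(p != NULL);
        p[0] = (char)i;             /* touch the buffer so it is really backed */
        pthread_mutex_lock(&r->lock);
        while (r->head - r->tail == PC_RING_SLOTS)
            pthread_cond_wait(&r->not_full, &r->lock);
        r->slot[r->head++ % PC_RING_SLOTS] = p;
        pthread_cond_signal(&r->not_empty);
        pthread_mutex_unlock(&r->lock);
    }
    return NULL;
}

static void *pc_consumer(void *arg)
{
    struct pc_ring *r = arg;
    for (size_t i = 0; i < PC_ITEMS; i++) {
        pthread_mutex_lock(&r->lock);
        while (r->head == r->tail)
            pthread_cond_wait(&r->not_empty, &r->lock);
        void *p = r->slot[r->tail++ % PC_RING_SLOTS];
        pthread_cond_signal(&r->not_full);
        pthread_mutex_unlock(&r->lock);
        free(p);                    /* remote free: the producer allocated p */
        r->freed++;
    }
    return NULL;
}

/* Runs one producer and one consumer; returns how many buffers were freed. */
static size_t pc_run(void)
{
    struct pc_ring r;
    memset(&r, 0, sizeof r);
    pthread_mutex_init(&r.lock, NULL);
    pthread_cond_init(&r.not_full, NULL);
    pthread_cond_init(&r.not_empty, NULL);

    pthread_t prod, cons;
    pthread_create(&prod, NULL, pc_producer, &r);
    pthread_create(&cons, NULL, pc_consumer, &r);
    pthread_join(prod, NULL);
    pthread_join(cons, NULL);

    pthread_cond_destroy(&r.not_empty);
    pthread_cond_destroy(&r.not_full);
    pthread_mutex_destroy(&r.lock);
    return r.freed;
}
```

Wrapping pc_run() in a timer and running it under different allocators (e.g. via LD_PRELOAD) would give a rough comparison; varying PC_ALLOC_SIZE separates small from large remote frees.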
Several months ago, I also started studying memory allocator
implementations and working on some benchmarks and a dlmalloc-based
allocator of my own, but got discouraged and sidetracked by other things.
* git://80x24.org/xtbench
- xthr.c is mine, also producer-consumer (uses URCU),
- t-test* from ptmalloc3
There's also ebizzy...
* git://80x24.org/femalloc
- it is dlmalloc + wait-free queue (from URCU)
dlmalloc 2.8 has an API (mspace) for doing per-thread arenas.
- "fe" == "fool's errand", that's what working on a general-purpose
malloc feels like :x
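The "dlmalloc + queue" idea can be sketched roughly as follows, under stated assumptions: each arena owns a queue, a thread freeing a block that belongs to another thread's arena pushes it onto that arena's queue instead of taking a lock, and the owner drains the queue on its next malloc or free. This sketch swaps in a simple C11-atomics stack (lock-free push, drain by one atomic exchange) rather than URCU's wait-free queue, and rf_push/rf_drain are hypothetical names, not femalloc's code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>  /* free() works as an arena_free stand-in for testing */

/* Hypothetical link stored in the first bytes of each remotely freed block. */
struct rf_node {
    struct rf_node *next;
};

/* One queue per arena: any thread may push, only the owner drains. */
struct rf_queue {
    _Atomic(struct rf_node *) head;
};

/* Called by a non-owner thread on free(): lock-free (not wait-free) push. */
static void rf_push(struct rf_queue *q, void *block)
{
    assert(block != NULL);
    struct rf_node *n = block;
    struct rf_node *old = atomic_load_explicit(&q->head, memory_order_relaxed);
    do {
        n->next = old;  /* old is refreshed by a failed CAS */
    } while (!atomic_compare_exchange_weak_explicit(&q->head, &old, n,
                 memory_order_release, memory_order_relaxed));
}

/* Called by the owning thread: take every pending block at once, then hand
 * each one to the arena's ordinary free routine. Returns the count drained. */
static size_t rf_drain(struct rf_queue *q, void (*arena_free)(void *))
{
    struct rf_node *n = atomic_exchange_explicit(&q->head, NULL,
                                                 memory_order_acquire);
    size_t count = 0;
    while (n != NULL) {
        struct rf_node *next = n->next;  /* read before the block is released */
        arena_free(n);
        n = next;
        count++;
    }
    return count;
}
```

Because the owner detaches the whole list in one exchange and never pops single nodes, this pattern avoids the ABA problem of a general lock-free stack. In a dlmalloc-based allocator, arena_free would presumably wrap mspace_free() on the owning mspace.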
> So far I would say that tcmalloc seems to have the best performance
> but the highest space overhead.
I found the locklessinc.com malloc is fast, too, but it uses lots of space
and needs to be ported to non-x86 architectures. I also like its use of a
slab allocator for some small allocations and may do something similar
if I continue with femalloc. femalloc currently performs well for medium
and larger allocations, but there are some glaring weaknesses I
documented in the README[1].
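A slab allocator for small sizes, in the generic sense, can be sketched as a few power-of-two size classes, each with a free list carved out of larger slabs. This is a generic illustration, not the lockless allocator's actual design; the class sizes, SLAB_BYTES, and function names are hypothetical, and the sketch is single-threaded and never returns slabs to the system.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical parameters for the sketch. */
#define SLAB_BYTES  (64 * 1024)
#define NUM_CLASSES 4              /* 16, 32, 64, 128 bytes */
#define MIN_SHIFT   4

struct slab_free { struct slab_free *next; };

static struct slab_free *slab_free_list[NUM_CLASSES];

/* Map a request size to a size-class index, or -1 if too large for slabs. */
static int slab_class(size_t n)
{
    for (int c = 0; c < NUM_CLASSES; c++)
        if (n <= ((size_t)1 << (c + MIN_SHIFT)))
            return c;
    return -1;
}

/* Carve a fresh slab into equal objects and push them onto the free list. */
static int slab_refill(int c)
{
    size_t obj = (size_t)1 << (c + MIN_SHIFT);
    char *slab = malloc(SLAB_BYTES);   /* a real allocator would mmap here */
    if (slab == NULL)
        return -1;
    for (size_t off = 0; off + obj <= SLAB_BYTES; off += obj) {
        struct slab_free *f = (struct slab_free *)(slab + off);
        f->next = slab_free_list[c];
        slab_free_list[c] = f;
    }
    return 0;
}

static void *slab_alloc(size_t n)
{
    int c = slab_class(n);
    if (c < 0)
        return NULL;                   /* caller falls back to the general allocator */
    if (slab_free_list[c] == NULL && slab_refill(c) != 0)
        return NULL;
    struct slab_free *f = slab_free_list[c];
    slab_free_list[c] = f->next;
    return f;
}

/* The caller passes the size it allocated with; there is no per-object header. */
static void slab_release(void *p, size_t n)
{
    int c = slab_class(n);
    assert(c >= 0);
    struct slab_free *f = p;
    f->next = slab_free_list[c];
    slab_free_list[c] = f;
}
```

Because the size class is implied by the request, objects carry no header; a real allocator would usually recover the class from the slab's address rather than trusting the caller.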
There are several other things I want to keep in mind for a malloc
(but do not have good automated tests for yet):
* Copy-on-write sharing after fork, and swap behavior, for large allocations.
The ptmalloc/dlmalloc layout seems bad for this because the boundary
tags end up touching extra pages, especially for bigger allocations.
femalloc inherits this weakness, which is part of the reason I've been
working on other things instead.
* Ability to take advantage of THP (transparent huge pages) without
inflating memory usage. This is important for folks who run many
threads without overcommit. I think software like MySQL with tens to
hundreds of threads on a handful of CPUs is here to stay. Getting folks
to remember knobs like MALLOC_ARENA_MAX is annoying and tiring.
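To make the boundary-tag point concrete, here is a rough model (an assumption loosely based on dlmalloc's chunk layout, not a measurement): a two-word header sits before each payload, free() writes free-list links into the first bytes of the freed payload, and it also updates the header of the following chunk. Counting the distinct pages written shows that freeing a large chunk dirties pages at both ends of it, and each such page costs a copy-on-write fault in whichever process writes it after fork (or a swap-in if it was swapped out). The MODEL_* constants and the function name are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_PAGE 4096u
#define MODEL_HDR  (2 * sizeof(size_t))   /* prev_foot + head, dlmalloc-style */
#define MODEL_LINK (2 * sizeof(void *))   /* fd/bk written into a freed chunk */

static uintptr_t first_page(uintptr_t lo) { return lo / MODEL_PAGE; }
static uintptr_t last_page(uintptr_t lo, size_t len) { return (lo + len - 1) / MODEL_PAGE; }

/* Rough model: distinct pages written by free() when chunk metadata lives
 * inline around a payload of `size` bytes starting at address `mem`. */
static unsigned inline_tag_pages_dirtied(uintptr_t mem, size_t size)
{
    /* Range A: this chunk's header plus the free-list links in the payload. */
    uintptr_t a_lo = first_page(mem - MODEL_HDR);
    uintptr_t a_hi = last_page(mem - MODEL_HDR, MODEL_HDR + MODEL_LINK);
    /* Range B: the next chunk's header, updated to record that this one is free. */
    uintptr_t b_lo = first_page(mem + size);
    uintptr_t b_hi = last_page(mem + size, MODEL_HDR);

    unsigned count = (unsigned)(a_hi - a_lo + 1);
    for (uintptr_t p = b_lo; p <= b_hi; p++)
        if (p < a_lo || p > a_hi)         /* count pages not already in range A */
            count++;
    return count;
}
```

With 4 KiB pages, freeing a 64-byte payload at 0x10010 dirties one page in this model, while freeing a 1 MiB payload at the same address dirties two pages 1 MiB apart, even though the bulk of the payload is never written.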
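One common way to use THP selectively is to map large regions at a 2 MiB-aligned address and call madvise(MADV_HUGEPAGE) only on those, leaving small mappings on normal pages so they are not inflated. The sketch assumes x86-64's 2 MiB huge page size and a hypothetical 2 MiB cutoff; map_maybe_huge is an illustrative name, and madvise may still fail (e.g. on kernels built without THP), which the advised flag reports.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

#define HUGE_2M ((size_t)2 << 20)   /* assumed huge page size (x86-64) */
#define THP_MIN HUGE_2M             /* hypothetical cutoff for requesting THP */

/* Map `len` bytes at a 2 MiB-aligned address; request THP only when the
 * region consists of whole huge pages. *advised is set if madvise succeeded. */
static void *map_maybe_huge(size_t len, int *advised)
{
    size_t pg = (size_t)sysconf(_SC_PAGESIZE);
    len = (len + pg - 1) / pg * pg;       /* keep p + len page-aligned */
    *advised = 0;

    size_t span = len + HUGE_2M;          /* over-reserve to find an aligned start */
    char *raw = mmap(NULL, span, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (raw == MAP_FAILED)
        return NULL;

    uintptr_t start = ((uintptr_t)raw + HUGE_2M - 1) & ~(uintptr_t)(HUGE_2M - 1);
    char *p = (char *)start;
    size_t head = (size_t)(p - raw);
    size_t tail = span - head - len;
    if (head)
        munmap(raw, head);                /* trim the unaligned prefix */
    if (tail)
        munmap(p + len, tail);            /* trim the unused suffix */

#ifdef MADV_HUGEPAGE
    if (len >= THP_MIN && len % HUGE_2M == 0)
        *advised = (madvise(p, len, MADV_HUGEPAGE) == 0);
#endif
    return p;
}
```

On the arena side, glibc also exposes the MALLOC_ARENA_MAX cap programmatically as mallopt(M_ARENA_MAX, n) from <malloc.h>, though that still requires each application to know to set it.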
[1] http://femalloc.80x24.org/README - in case the git clone is too heavy