This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: RFC: replace ptmalloc2
- From: Will Newton <will dot newton at linaro dot org>
- To: Eric Wong <normalperson at yhbt dot net>
- Cc: Jörn Engel <joern at purestorage dot com>, Rich Felker <dalias at libc dot org>, Siddhesh Poyarekar <siddhesh dot poyarekar at gmail dot com>, GNU C Library <libc-alpha at sourceware dot org>
- Date: Mon, 20 Oct 2014 15:17:56 +0100
- Subject: Re: RFC: replace ptmalloc2
- Authentication-results: sourceware.org; auth=none
- References: <20141009215447 dot GD8583 at Sligo dot logfs dot org> <CAAHN_R0JDNQkx7oV0HS9Knv7nsPZiARLeFb4zpPa+rj7cNfECg at mail dot gmail dot com> <20141010010743 dot GA15146 at Sligo dot logfs dot org> <20141010012530 dot GX23797 at brightrain dot aerifal dot cx> <20141010013302 dot GC15146 at Sligo dot logfs dot org> <20141010020229 dot GY23797 at brightrain dot aerifal dot cx> <20141014233254 dot GA1860 at Sligo dot logfs dot org> <20141015040031 dot GR32028 at brightrain dot aerifal dot cx> <20141015045238 dot GA4528 at Sligo dot logfs dot org> <CANu=DmgNQm4A0ChTky+c4iSBwDjb5uAmsHd3H7MynQPjL3vecA at mail dot gmail dot com> <20141017090340 dot GA12253 at dcvr dot yhbt dot net>
On 17 October 2014 10:03, Eric Wong <normalperson@yhbt.net> wrote:
> Will Newton <will.newton@linaro.org> wrote:
>> I currently have some microbenchmarks and a script to run a few open
>> source applications inside a docker container and measure the
>> performance of the commonly used allocators. I could certainly use
>> more and more realistic workloads to add to it:
>>
>> https://git.linaro.org/toolchain/cortex-malloc.git
>
> Hi Will, great to see you have a producer-consumer test in there! I
> noticed jemalloc seemed flawed with large remote frees in my own tests,
> too.
>
> Several months ago, I also started studying memory allocator
> implementations and working on some benchmarks and a dlmalloc-based one,
> but got discouraged and sidetracked by other things.
>
> * git://80x24.org/xtbench
> - xthr.c is mine, also producer-consumer (uses URCU),
> - t-test* from ptmalloc3
> There's also ebizzy...
>
> * git://80x24.org/femalloc
> - it is dlmalloc + wait-free queue (from URCU)
> dlmalloc 2.8 has an API (mspace) for doing per-thread arenas.
> - "fe" == "fool's errand", that's what working on a general-purpose
> malloc feels like :x

Thanks for the links, it looks like an interesting approach. I'm not really
familiar with URCU so I guess I'll have to get my head around that
first.

>> So far I would say that tcmalloc seems to have the best performance
>> but the highest space overhead.
>
> I found the locklessinc.com malloc is fast, too; but it uses lots of space
> and needs to be ported to non-x86. I also like the use of a slab
> allocator in lockless for some small allocations and may do something
> similar if I continue with femalloc. femalloc currently performs
> well for medium/larger allocations, but there's some glaring weaknesses
> I documented in the README[1].
>
>
> There's several other things I want to keep in mind for a malloc
> (but do not have good automated tests for, yet):
>
> * copy-on-write sharing on fork + swap behavior for large allocations.
> The ptmalloc/dlmalloc layout seems bad for this because the boundary
> tags end up touching extra pages, esp for bigger allocations. femalloc
> inherits this weakness, so that's part of the reason I've been working
> on other things, instead.

I believe tcmalloc should behave better in this case but I do not have
a test for this scenario either.
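
To make the boundary-tag concern quoted above concrete, here is a rough
toy model in C. It is not the real ptmalloc/dlmalloc chunk layout: the
4 KiB page size, the 16-byte header and the helper names are all
assumptions made up for this example. It only counts how many extra
pages an in-band header placed directly before the payload pulls into
the chunk.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical figures, for illustration only. */
#define PAGE_SIZE   4096u
#define HEADER_SIZE 16u   /* assumed size of an in-band chunk header */

/* Index of the page that contains addr. */
static uintptr_t page_of(uintptr_t addr)
{
    return addr / PAGE_SIZE;
}

/* Number of pages covered by the byte range [start, start + len). */
static size_t pages_spanned(uintptr_t start, size_t len)
{
    assert(len > 0);
    return (size_t)(page_of(start + len - 1) - page_of(start) + 1);
}

/*
 * Extra pages the chunk covers once a header of HEADER_SIZE bytes is
 * stored immediately before the payload. In this model the allocator
 * writes that header on malloc/free, so the page holding it gets
 * dirtied (and un-shared after fork) even if the application never
 * writes to the payload.
 */
static size_t extra_pages_from_header(uintptr_t payload, size_t len)
{
    size_t with_header = pages_spanned(payload - HEADER_SIZE,
                                       len + HEADER_SIZE);
    return with_header - pages_spanned(payload, len);
}
```

With these assumed numbers, a page-aligned 1 MiB payload covers 256
pages on its own and 257 once the header is included, while a small
allocation whose header and payload share a page costs nothing extra.
An allocator that keeps its metadata out of band would not touch the
payload's neighbouring page at all in this model.
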
> * Ability to take advantage of THP, but not inflate memory usage.
> This is important for folks who run many threads w/o overcommit.
> I think software like MySQL with 10s-100s of threads on a handful of
> CPUs is here to stay. Getting folks to remember knobs like
> MALLOC_ARENA_MAX is annoying and tiring.

Do you have an idea in mind how malloc could integrate with THP?
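
For reference, one mechanism Linux already exposes is
madvise(MADV_HUGEPAGE) on a suitably aligned anonymous mapping. The
sketch below is only an assumption about what an allocator back end
might do, not a proposal for glibc: the 2 MiB huge page size and the
name thp_region_alloc are hypothetical, and the madvise call is a hint
that the kernel is free to ignore.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

/* Assumed THP size: 2 MiB (typical on x86-64, not guaranteed). */
#define HUGE_SZ ((size_t)2 << 20)

/*
 * Map at least len bytes, rounded up to a multiple of HUGE_SZ and
 * aligned to HUGE_SZ, then hint that THP may back the region.
 * Returns NULL on failure; the mapped size is stored in *out_len.
 */
static void *thp_region_alloc(size_t len, size_t *out_len)
{
    if (len == 0 || len > SIZE_MAX - 2 * HUGE_SZ)
        return NULL;

    size_t want = (len + HUGE_SZ - 1) & ~(HUGE_SZ - 1);
    size_t span = want + HUGE_SZ;   /* slack so we can align */

    char *raw = mmap(NULL, span, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (raw == MAP_FAILED)
        return NULL;

    /* Round the start up to the next HUGE_SZ boundary. */
    uintptr_t a = ((uintptr_t)raw + HUGE_SZ - 1) & ~(uintptr_t)(HUGE_SZ - 1);
    char *aligned = (char *)a;
    assert(((uintptr_t)aligned & (HUGE_SZ - 1)) == 0);

    /* Trim the unaligned head and the unused tail of the mapping. */
    size_t head = (size_t)(aligned - raw);
    size_t tail = span - head - want;
    if (head)
        munmap(raw, head);
    if (tail)
        munmap(aligned + want, tail);

#ifdef MADV_HUGEPAGE
    /* Hint only: failure (e.g. THP disabled) is deliberately ignored. */
    (void)madvise(aligned, want, MADV_HUGEPAGE);
#endif

    if (out_len)
        *out_len = want;
    return aligned;
}
```

A caller would release the region with munmap(p, n), where n is the
value stored through out_len. Rounding every region up to 2 MiB is
exactly the kind of inflation the quoted concern is about, so this
would presumably only make sense for large, long-lived regions.
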
--
Will Newton
Toolchain Working Group, Linaro