This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: RFC: replace ptmalloc2


On 17 October 2014 10:03, Eric Wong <normalperson@yhbt.net> wrote:
> Will Newton <will.newton@linaro.org> wrote:
>> I currently have some microbenchmarks and a script to run a few open
>> source applications inside a docker container and measure the
>> performance of the commonly used allocators. I could certainly use
>> more and more realistic workloads to add to it:
>>
>> https://git.linaro.org/toolchain/cortex-malloc.git
>
> Hi Will, great to see you have a producer-consumer test in there!  I
> noticed jemalloc seemed flawed with large remote frees in my own tests,
> too.
>
> Several months ago, I also started studying memory allocator
> implementations and working on some benchmarks and a dlmalloc-based one,
> but got discouraged and sidetracked by other things.
>
> * git://80x24.org/xtbench
>   - xthr.c is mine, also producer-consumer (uses URCU),
>   - t-test* from ptmalloc3
>   There's also ebizzy...
>
> * git://80x24.org/femalloc
>   - it is dlmalloc + wait-free queue (from URCU)
>     dlmalloc 2.8 has an API (mspace) for doing per-thread arenas.
>   - "fe" == "fool's errand", that's what working on a general-purpose
>     malloc feels like :x

Thanks for the links, it looks like an interesting approach. I'm not really
familiar with URCU so I guess I'll have to get my head around that
first.

>> So far I would say that tcmalloc seems to have the best performance
>> but the highest space overhead.
>
> I found the locklessinc.com malloc is fast, too; but it uses lots of space
> and needs to be ported to non-x86.  I also like the use of a slab
> allocator in lockless for some small allocations and may do something
> similar if I continue with femalloc.  femalloc currently performs
> well for medium/larger allocations, but there are some glaring weaknesses
> I documented in the README[1].
>
>
> There's several other things I want to keep in mind for a malloc
> (but do not have good automated tests for, yet):
>
> * copy-on-write sharing on fork + swap behavior for large allocations.
>   The ptmalloc/dlmalloc layout seems bad for this because the boundary
>   tags end up touching extra pages, esp for bigger allocations.  femalloc
>   inherits this weakness, so that's part of the reason I've been working
>   on other things, instead.

I believe tcmalloc should behave better in this case but I do not have
a test for this scenario either.
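
To make the extra-page concern concrete, here is a rough sketch of the
arithmetic only, not a test of any real allocator. The 4096-byte page size,
the 16-byte inline header and both function names are assumptions made for
illustration; actual boundary-tag sizes and layouts differ between
implementations.

```c
#include <stddef.h>

#define PAGE_SIZE   4096u  /* assumed page size */
#define HEADER_SIZE 16u    /* assumed size of an inline boundary tag */

/* Number of pages overlapped by the byte range [start, start + len). */
static size_t pages_spanned(size_t start, size_t len, size_t page)
{
    if (len == 0)
        return 0;
    size_t first = start / page;
    size_t last = (start + len - 1) / page;
    return last - first + 1;
}

/*
 * Pages overlapped when a header of HEADER_SIZE bytes sits directly in
 * front of the user data, as in a boundary-tag layout.  The caller must
 * pass user_start >= HEADER_SIZE.
 */
static size_t pages_with_inline_header(size_t user_start, size_t user_len)
{
    return pages_spanned(user_start - HEADER_SIZE,
                         HEADER_SIZE + user_len, PAGE_SIZE);
}
```

With page-aligned user data of 1 MiB, the data alone covers 256 pages, while
data plus inline header covers 257. If the allocator writes that header after
fork (for example on free), the page holding it is copied even though the
application never wrote to it.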

> * Ability to take advantage of THP, but not inflate memory usage.
>   This is important for folks who run many threads w/o overcommit.
>   I think software like MySQL with 10s-100s of threads on a handful of
>   CPUs is here to stay.  Getting folks to remember knobs like
>   MALLOC_ARENA_MAX is annoying and tiring.

Do you have an idea in mind how malloc could integrate with THP?
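
As one possible starting point for that discussion, and only a sketch: on
Linux an allocator could map its arena memory at huge-page alignment and
hint the kernel with madvise(MADV_HUGEPAGE). The 2 MiB huge page size and
the function name thp_region_alloc are assumptions for illustration; the
hint is advisory and this says nothing about how to avoid inflating memory
usage across many threads.

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

#define HUGE_PAGE_SIZE ((size_t)2 << 20)  /* assumed: 2 MiB huge pages */

/*
 * Map at least `len` bytes, rounded up to a multiple of HUGE_PAGE_SIZE and
 * aligned to HUGE_PAGE_SIZE, then ask the kernel to back it with THP.
 * Returns NULL on failure or if len is 0.
 */
static void *thp_region_alloc(size_t len)
{
    if (len == 0 || len > SIZE_MAX - 2 * HUGE_PAGE_SIZE)
        return NULL;

    size_t size = (len + HUGE_PAGE_SIZE - 1) & ~(HUGE_PAGE_SIZE - 1);
    size_t span = size + HUGE_PAGE_SIZE;  /* slack so we can align */

    char *raw = mmap(NULL, span, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (raw == MAP_FAILED)
        return NULL;

    uintptr_t addr = (uintptr_t)raw;
    uintptr_t aligned = (addr + HUGE_PAGE_SIZE - 1)
                        & ~(uintptr_t)(HUGE_PAGE_SIZE - 1);
    size_t head = (size_t)(aligned - addr);
    size_t tail = span - head - size;

    /* Trim the unaligned slack on both sides of the aligned region. */
    if (head != 0)
        munmap(raw, head);
    if (tail != 0)
        munmap((char *)aligned + size, tail);

#ifdef MADV_HUGEPAGE
    /* Advisory only: the kernel may ignore it, so the result is unchecked. */
    (void)madvise((void *)aligned, size, MADV_HUGEPAGE);
#endif
    return (void *)aligned;
}
```

A caller would carve allocations out of the returned region and release it
with munmap() using the rounded-up size.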

-- 
Will Newton
Toolchain Working Group, Linaro

