This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: RFC: replace ptmalloc2


Will Newton <will.newton@linaro.org> wrote:
> I currently have some microbenchmarks and a script to run a few open
> source applications inside a docker container and measure the
> performance of the commonly used allocators. I could certainly use
> more and more realistic workloads to add to it:
> 
> https://git.linaro.org/toolchain/cortex-malloc.git

Hi Will, great to see you have a producer-consumer test in there!  I
noticed jemalloc seemed flawed with large remote frees in my own tests,
too.
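
For anyone unfamiliar with the pattern, here is a minimal sketch of a
"remote free" workload: one thread allocates, another thread frees.  This
is not the test from cortex-malloc nor xthr.c; the names (run_remote_free,
struct ring, SLOTS) are made up for illustration, and a plain
mutex/condvar ring stands in for whatever queue a real benchmark uses.
It assumes size > 0.

```c
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

#define SLOTS 64 /* hypothetical queue depth */

struct ring {
    void *slot[SLOTS];
    size_t head, tail, count;
    pthread_mutex_t mu;
    pthread_cond_t not_empty, not_full;
};

struct job { struct ring *r; size_t n, size, done; };

/* Block until there is room, then enqueue one pointer. */
static void ring_put(struct ring *r, void *p)
{
    pthread_mutex_lock(&r->mu);
    while (r->count == SLOTS)
        pthread_cond_wait(&r->not_full, &r->mu);
    r->slot[r->tail] = p;
    r->tail = (r->tail + 1) % SLOTS;
    r->count++;
    pthread_cond_signal(&r->not_empty);
    pthread_mutex_unlock(&r->mu);
}

/* Block until a pointer is available, then dequeue it. */
static void *ring_get(struct ring *r)
{
    void *p;
    pthread_mutex_lock(&r->mu);
    while (r->count == 0)
        pthread_cond_wait(&r->not_empty, &r->mu);
    p = r->slot[r->head];
    r->head = (r->head + 1) % SLOTS;
    r->count--;
    pthread_cond_signal(&r->not_full);
    pthread_mutex_unlock(&r->mu);
    return p;
}

/* Allocates and touches every block, then hands it to the other thread. */
static void *producer(void *arg)
{
    struct job *j = arg;
    for (size_t i = 0; i < j->n; i++) {
        void *p = malloc(j->size);
        if (!p)
            abort();
        memset(p, 0xa5, j->size);
        ring_put(j->r, p);
    }
    return NULL;
}

/* Frees blocks that were allocated by the producer thread. */
static void *consumer(void *arg)
{
    struct job *j = arg;
    for (size_t i = 0; i < j->n; i++) {
        free(ring_get(j->r));
        j->done++;
    }
    return NULL;
}

/* Runs n allocations of `size` bytes; returns how many were freed remotely. */
size_t run_remote_free(size_t n, size_t size)
{
    struct ring r;
    struct job j = { &r, n, size, 0 };
    pthread_t prod, cons;

    memset(&r, 0, sizeof(r));
    pthread_mutex_init(&r.mu, NULL);
    pthread_cond_init(&r.not_empty, NULL);
    pthread_cond_init(&r.not_full, NULL);
    if (pthread_create(&prod, NULL, producer, &j) ||
        pthread_create(&cons, NULL, consumer, &j))
        abort();
    pthread_join(prod, NULL);
    pthread_join(cons, NULL);
    pthread_cond_destroy(&r.not_full);
    pthread_cond_destroy(&r.not_empty);
    pthread_mutex_destroy(&r.mu);
    return j.done;
}
```

Timing a call such as run_remote_free(100000, 1 << 20) under each
allocator (e.g. via LD_PRELOAD) would be one way to compare how they
handle large frees coming from a thread that did not allocate.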

Several months ago, I also started studying memory allocator
implementations and working on some benchmarks and a dlmalloc-based one,
but got discouraged and sidetracked by other things.

* git://80x24.org/xtbench
  - xthr.c is mine, also producer-consumer (uses URCU),
  - t-test* from ptmalloc3
  There's also ebizzy...

* git://80x24.org/femalloc
  - it is dlmalloc + wait-free queue (from URCU)
    dlmalloc 2.8 has an API (mspace) for doing per-thread arenas.
  - "fe" == "fool's errand", that's what working on a general-purpose
    malloc feels like :x

> So far I would say that tcmalloc seems to have the best performance
> but the highest space overhead.

I found the locklessinc.com malloc is fast, too; but it uses lots of space
and needs to be ported to non-x86.  I also like the use of a slab
allocator in lockless for some small allocations and may do something
similar if I continue with femalloc.  femalloc currently performs
well for medium/larger allocations, but there are some glaring weaknesses
I documented in the README[1].


There are several other things I want to keep in mind for a malloc
(but do not have good automated tests for, yet):

* copy-on-write sharing on fork + swap behavior for large allocations.
  The ptmalloc/dlmalloc layout seems bad for this because the boundary
  tags end up touching extra pages, esp. for bigger allocations.  femalloc
  inherits this weakness, so that's part of the reason I've been working
  on other things, instead.

* Ability to take advantage of THP, but not inflate memory usage.
  This is important for folks who run many threads w/o overcommit.
  I think software like MySQL with 10s-100s of threads on a handful of
  CPUs is here to stay.  Getting folks to remember knobs like
  MALLOC_ARENA_MAX is annoying and tiring.
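
To make the boundary tag point in the first item concrete, here is a rough
sketch of the page arithmetic.  It is a simplification, not the real
ptmalloc/dlmalloc chunk layout: it assumes a single header of hdr bytes
placed directly in front of the payload, and the function names are
invented for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Number of pages overlapped by the byte range [start, start + len).
 * Assumes len > 0 and page_size > 0. */
size_t pages_spanned(uintptr_t start, size_t len, size_t page_size)
{
    uintptr_t first = start / page_size;
    uintptr_t last = (start + len - 1) / page_size;
    return (size_t)(last - first + 1);
}

/* Pages touched when a header of hdr bytes (an assumed size, e.g. 16)
 * sits immediately before the payload. */
size_t pages_with_header(uintptr_t payload, size_t len, size_t hdr,
                         size_t page_size)
{
    return pages_spanned(payload - hdr, len + hdr, page_size);
}
```

With 4096-byte pages, a page-aligned 8192-byte payload spans 2 pages by
itself, but 3 once a 16-byte header in front of it is counted.  Whenever
the allocator writes that header, it dirties a page the application data
may never have needed, which is what hurts copy-on-write sharing and swap.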



[1] http://femalloc.80x24.org/README - in case the git clone is too heavy

