This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.
Re: RFC: replace ptmalloc2
- From: Siddhesh Poyarekar <siddhesh dot poyarekar at gmail dot com>
- To: GNU C Library <libc-alpha at sourceware dot org>
- Cc: Jörn Engel <joern at purestorage dot com>
- Date: Fri, 10 Oct 2014 05:21:01 +0530
- Subject: Re: RFC: replace ptmalloc2
Ugh, sorry about the personal reply; I thought I had hit Reply-All.
Siddhesh
On 10 October 2014 05:19, Siddhesh Poyarekar
<siddhesh.poyarekar@gmail.com> wrote:
> On 10 October 2014 03:24, Jörn Engel <joern@purestorage.com> wrote:
>> I have recently been forced to look at the internals of ptmalloc2.
>> There are some low-hanging fruits for fixing, but overall I find it
>> more worthwhile to replace the allocator with one of the alternatives,
>> jemalloc being my favorite.
>
> The list archives have a discussion on introducing alternate
> allocators in glibc. The summary is that we'd like to do it, but
> that doesn't necessarily mean replacing ptmalloc2 completely. The
> approach we'd like is a tunable that selects the allocator
> implementation (maybe even system-wide), with ptmalloc2 as the
> default.
>
>> Problems encountered the hard way:
>> - Using per-thread arenas causes horrible memory bloat. While it is
>> theoretically possible for arenas to shrink and return memory to the
>> kernel, that rarely happens in practice. Effectively, every arena
>> retains (or nearly retains) the largest size it has ever reached.
>> Given many threads whose behaviour varies over time, a significant
>> fraction of memory can be wasted here.
>
> The bloat you're seeing is address space, not actual memory use;
> more often than not the commit charge is not affected that badly.
> Also, you can control the number of arenas the allocator spawns by
> using the MALLOC_ARENA_MAX environment variable (or the M_ARENA_MAX
> mallopt option).
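>
> For illustration, a minimal sketch of capping arenas from within the
> program (mallopt() and M_ARENA_MAX are glibc extensions; setting
> MALLOC_ARENA_MAX=2 in the environment has the same effect):
>
>   #include <malloc.h>   /* mallopt, M_ARENA_MAX (glibc extensions) */
>   #include <stdlib.h>
>
>   int main (void)
>   {
>     /* Cap the allocator at two arenas.  Do this before any threads
>        start allocating, since arenas are created lazily.  */
>     mallopt (M_ARENA_MAX, 2);
>
>     void *p = malloc (4096);   /* served from one of the two arenas */
>     free (p);
>     return 0;
>   }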
>
>> - mmap() failing once 65530 vmas are in use by a process (the
>> default vm.max_map_count). There is a kernel bug that plays into
>> this, but ptmalloc2 would hit the limit even without it. On a
>> large system, one can go OOM (malloc returning NULL) with hundreds
>> of gigabytes free.
>> - mmap_sem causing high latency for multithreaded processes. Yes,
>> this is a kernel-internal lock, but ptmalloc2 is the main reason for
>> hammering the lock.
>
> Then arenas are not the problem; it is the address space allocated
> using mmap. Setting MALLOC_MMAP_THRESHOLD_ to a high enough value
> should bring those allocations into one of the arenas, but you risk
> fragmentation by doing that. Either way, you might find it useful
> to use the malloc systemtap probes[1] to characterize malloc usage
> in your program.
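>
> As a sketch, raising the threshold programmatically (mallopt() and
> M_MMAP_THRESHOLD are glibc extensions; MALLOC_MMAP_THRESHOLD_ in the
> environment is the equivalent knob):
>
>   #include <malloc.h>   /* mallopt, M_MMAP_THRESHOLD (glibc extensions) */
>   #include <stdlib.h>
>
>   int main (void)
>   {
>     /* Serve requests up to 64 MiB from the arenas instead of raw
>        mmap, trading fewer vmas for possible arena fragmentation.
>        Setting an explicit threshold also disables glibc's dynamic
>        threshold adjustment.  */
>     mallopt (M_MMAP_THRESHOLD, 64 * 1024 * 1024);
>
>     void *p = malloc (1024 * 1024);   /* now arena-allocated */
>     free (p);
>     return 0;
>   }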
>
>> Possible improvements found by source code inspection and via
>> testcases:
>> - Everything mentioned in
>> https://www.facebook.com/notes/facebook-engineering/scalable-memory-allocation-using-jemalloc/480222803919
>
> That post compares jemalloc with the glibc 2.5 implementation,
> which predates per-thread arenas. According to performance numbers
> some customers gave us (Red Hat), per-thread arenas give them
> anywhere between 20% and 30% improvement in application speed at
> the expense of additional address space usage.
>
>> - Arenas are a bad choice for per-thread caches.
>
> Bad is a strong word. Arenas work well for a lot of cases and less
> well for others.
>
>> - mprotect_size seems to be responsible for silly behaviour. When
>> extending the main arena with sbrk(), one could immediately
>> mprotect() the entire extension and be done. Instead mprotect() is
>
> We don't use mprotect on the main arena.
>
>> often called at 4k granularity. Each call takes mmap_sem for
>> writing and potentially splits off new vmas. That is far too
>> expensive to do in small increments.
>> It gets better when looking at the other arenas. Memory is
>> allocated via mmap(PROT_NONE), so every mprotect() will split off
>> new vmas. Potentially some of them can get merged later on, but
>> current Linux kernels contain at least one bug, so this doesn't
>> always happen.
>> If someone is arguing in favor of PROT_NONE as a debugging or
>> security measure, I wonder why we don't have the equivalent for the
>> main arena. Do we really want the worst of both worlds?
>
> mprotect usage is not just a diagnostic or security measure; it is
> primarily there to reduce the commit charge of the process. This is
> what keeps actual memory usage low for processes despite their
> large address space usage.
>
> The granularity of mprotect can be evaluated (it should probably be
> max(request, M_TRIM_THRESHOLD)), but again, it should not split off
> many vmas. At any point you ought to have only two splits of each
> arena heap: one that is PROT_READ|PROT_WRITE and one that is
> PROT_NONE, since adjacent vmas with the same protection should
> merge. Multiple vmas arise either from arena extensions with new
> arena heaps (a different concept from the process heap) or from
> allocations that went directly to mmap. The latter obviously has
> more potential to overrun the vma limit the way you describe.
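>
> To make the reserve/commit pattern concrete, here is a minimal
> sketch of the technique (illustrative only, not glibc's actual
> code): reserve address space with PROT_NONE, then commit pieces
> with mprotect() as they are needed, so the PROT_NONE tail carries
> no commit charge.
>
>   #include <sys/mman.h>
>
>   #define HEAP_SIZE (64UL * 1024 * 1024)  /* reserved, not committed */
>
>   int main (void)
>   {
>     /* Reserve a heap-sized chunk of address space.  PROT_NONE means
>        no commit charge is taken for it yet.  */
>     char *heap = mmap (NULL, HEAP_SIZE, PROT_NONE,
>                        MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
>     if (heap == MAP_FAILED)
>       return 1;
>
>     /* Commit the first 1 MiB once it is actually needed.  As long
>        as committed regions stay adjacent, the kernel keeps just two
>        vmas: one PROT_READ|PROT_WRITE and one PROT_NONE.  */
>     if (mprotect (heap, 1024 * 1024, PROT_READ | PROT_WRITE) != 0)
>       return 1;
>
>     heap[0] = 1;   /* now backed by committed memory */
>     return 0;
>   }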
>
>> All of the above have convinced me to abandon ptmalloc2 and use a
>> different allocator for my work project. But look at the Facebook
>> post again and see the 2x performance improvement for their
>> webserver load. That is not exactly a micro-benchmark for
>> allocators, but it translates to significant hardware savings in
>> the real world. It
>> would be nice to get those savings out of the box.
>
> It is perfectly fine if you decide to use a different allocator.
> You're obviously welcome to improve glibc's malloc (which could
> definitely use a lot of improvement), and even to help build the
> framework for multiple allocators in glibc that would make choosing
> an alternate allocator easier. I don't think anybody is working on
> the latter yet.
>
> Thanks,
> Siddhesh
> --
> http://siddhesh.in
--
http://siddhesh.in