This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: RFC: replace ptmalloc2


Ugh, sorry about the personal reply; I thought I had hit Reply-All.

Siddhesh

On 10 October 2014 05:19, Siddhesh Poyarekar
<siddhesh.poyarekar@gmail.com> wrote:
> On 10 October 2014 03:24, Jörn Engel <joern@purestorage.com> wrote:
>> I have recently been forced to look at the internals of ptmalloc2.
>> There are some low-hanging fruits for fixing, but overall I find it
>> more worthwhile to replace the allocator with one of the alternatives,
>> jemalloc being my favorite.
>
> The list archives have a discussion on introducing alternate
> allocators in glibc.  The summary is that we'd like to do it, but it
> doesn't necessarily mean replacing ptmalloc2 completely.  The approach
> we'd like is to have a tunable to select (maybe even for the entire
> OS) the allocator implementation, with ptmalloc2 being the default.
>
>> Problems encountered the hard way:
>> - Using per-thread arenas causes horrible memory bloat.  While it is
>>   theoretically possible for arenas to shrink and return memory to the
>>   kernel, that rarely happens in practice.  Effectively every arena
>>   retains the biggest size it has ever had in history (or close to).
>>   Given many threads and dynamic behaviour of individual threads, a
>>   significant ratio of memory can be wasted here.
>
> The bloat you're seeing is address space, not actual memory; the
> commit charge is usually not affected that badly.  Also, you can
> control the number of arenas the allocator
> spawns off by using the MALLOC_ARENA_MAX environment variable (or the
> M_ARENA_MAX mallopt option).
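>
> As a rough sketch (untested; the cap of 2 below is just an arbitrary
> example value), the same limit can be set from inside the program:
>
>   #include <malloc.h>
>
>   /* Older headers may not expose the constant; glibc uses -8.  */
>   #ifndef M_ARENA_MAX
>   # define M_ARENA_MAX -8
>   #endif
>
>   int main (void)
>   {
>     /* Equivalent to running with MALLOC_ARENA_MAX=2 in the
>        environment; call it early, before extra arenas get created.
>        mallopt returns 0 on failure.  */
>     if (mallopt (M_ARENA_MAX, 2) == 0)
>       return 1;
>
>     /* ... rest of the program ... */
>     return 0;
>   }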
>
>> - mmap() failing once 65530 vmas are used by a process.  There
>>   is a kernel bug that plays into this, but ptmalloc2 would hit this
>>   limit even without the kernel bug.  Given a large system, one can go
>>   OOM (malloc returning NULL) with hundreds of gigabytes free on the
>>   system.
>> - mmap_sem causing high latency for multithreaded processes.  Yes,
>>   this is a kernel-internal lock, but ptmalloc2 is the main reason for
>>   hammering the lock.
>
> Then arenas are not the problem; it is the address space allocated
> using mmap.  Setting MALLOC_MMAP_THRESHOLD_ to a high enough value
> should bring those into one of the arenas, but you risk fragmentation
> by doing that.  Either way, you might find it useful to use the malloc
> systemtap probes[1] to characterize malloc usage in your program.
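>
> Along the same lines, a small sketch (the 4 MiB threshold is an
> arbitrary example; note that setting it also disables glibc's dynamic
> mmap threshold adjustment):
>
>   #include <stdlib.h>
>   #include <malloc.h>
>
>   int main (void)
>   {
>     /* Requests below 4 MiB are now served from the arenas instead of
>        individual mmap calls; equivalent to setting
>        MALLOC_MMAP_THRESHOLD_=4194304 in the environment.  */
>     mallopt (M_MMAP_THRESHOLD, 4 * 1024 * 1024);
>
>     void *p = malloc (1024 * 1024);  /* comes from an arena, not mmap */
>     free (p);
>     return 0;
>   }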
>
>> Possible improvements found by source code inspection and via
>> testcases:
>> - Everything mentioned in
>>   https://www.facebook.com/notes/facebook-engineering/scalable-memory-allocation-using-jemalloc/480222803919
>
> That post compares the glibc 2.5 implementation with jemalloc, and
> the former does not have per-thread arenas.  According to performance
> numbers some customers gave us (Red Hat), per-thread arenas give them
> anywhere between 20%-30% improvement in application speed at the
> expense of additional address space usage.
>
>> - Arenas are a bad choice for per-thread caches.
>
> Bad is a strong word.  It works well for a lot of cases, and not so
> well for others.
>
>> - mprotect_size seems to be responsible for silly behaviour.  When
>>   extending the main arena with sbrk(), one could immediately
>>   mprotect() the entire extension and be done.  Instead mprotect() is
>
> We don't use mprotect on the main arena.
>
>>   often called at 4k granularity.  Each call takes mmap_sem for
>>   writing and potentially splits off new vmas.  That is way too
>>   expensive to do at such a small granularity.
>>   It gets better when looking at the other arenas.  Memory is
>>   allocated via mmap(PROT_NONE), so every mprotect() will split off
>>   new vmas.  Potentially some of them can get merged later on.  But
>>   current Linux kernels contain at least one bug, so this doesn't
>>   always happen.
>>   If someone is arguing in favor of PROT_NONE as a debugging or
>>   security measure, I wonder why we don't have the equivalent for the
>>   main arena.  Do we really want the worst of both worlds?
>
> mprotect usage is not just a diagnostic or security measure; it is
> primarily there to reduce the commit charge of the process.  This is
> what keeps the actual memory usage low for processes despite having
> large address space usage.
>
> Granularity of mprotect can be evaluated (it should probably be
> max(request, M_TRIM_THRESHOLD)), but again, it should not split off a
> lot of vmas.  At any point, you ought to have only two splits of each
> arena heap - one that is PROT_READ|PROT_WRITE and the other that is
> PROT_NONE since adjacent vmas with the same protection should merge.
> The multiple vmas are either because of arena extensions with arena
> heaps (different concept from the process heap) or due to allocations
> that went directly to mmap.  The latter obviously has more potential
> to overrun the vma limit the way you describe.
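>
> To illustrate the commit charge point, here is a standalone sketch
> (not glibc code; sizes are arbitrary) of the reserve-with-PROT_NONE,
> commit-with-mprotect pattern the arena heaps use:
>
>   #include <stdio.h>
>   #include <sys/mman.h>
>
>   int main (void)
>   {
>     size_t reserve = 64UL << 20;  /* 64 MiB of address space */
>     size_t commit = 1UL << 20;    /* make 1 MiB of it usable */
>
>     /* The PROT_NONE reservation shows up as address space but adds
>        nothing to the commit charge.  */
>     char *base = mmap (NULL, reserve, PROT_NONE,
>                        MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
>     if (base == MAP_FAILED)
>       return 1;
>
>     /* Commit a prefix.  While protections stay contiguous this leaves
>        only two vmas: one read-write, one PROT_NONE.  */
>     if (mprotect (base, commit, PROT_READ | PROT_WRITE) != 0)
>       return 1;
>     base[0] = 1;  /* backed by real memory only on first touch */
>
>     printf ("reserved %zu MiB, committed %zu MiB\n",
>             reserve >> 20, commit >> 20);
>     munmap (base, reserve);
>     return 0;
>   }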
>
>> All of the above have convinced me to abandon ptmalloc2 and use a
>> different allocator for my work project.  But look at the facebook
>> post again and see the 2x performance improvement for their webserver
>> load.  That is not exactly a micro-benchmark for allocators, but
>> translates to significant hardware savings in the real world.  It
>> would be nice to get those savings out of the box.
>
> It is perfectly fine if you decide to use a different allocator.
> You're obviously welcome to improve the glibc malloc (which definitely
> could use a lot of improvement) and even help build the framework to
> have multiple allocators in glibc to make it easier to choose an
> alternate allocator.  I don't think anybody is working on the latter
> yet.
>
> Thanks,
> Siddhesh
> --
> http://siddhesh.in



-- 
http://siddhesh.in

