This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: [patch] malloc per-thread cache ready for review
- From: Markus Trippelsdorf <markus at trippelsdorf dot de>
- To: Carlos O'Donell <carlos at redhat dot com>
- Cc: DJ Delorie <dj at redhat dot com>, libc-alpha at sourceware dot org
- Date: Wed, 1 Feb 2017 22:23:19 +0100
- Subject: Re: [patch] malloc per-thread cache ready for review
- Authentication-results: sourceware.org; auth=none
- References: <20170201163333.GD17590@x4> <xnefzhlv2y.fsf@greed.delorie.com> <20170201165446.GE17590@x4> <6218e1a2-9208-791e-b7d8-455a714cfde2@redhat.com>
On 2017.02.01 at 15:50 -0500, Carlos O'Donell wrote:
> On 02/01/2017 11:54 AM, Markus Trippelsdorf wrote:
> > On 2017.02.01 at 11:44 -0500, DJ Delorie wrote:
> >>
> >> Markus Trippelsdorf <markus@trippelsdorf.de> writes:
> >>> http://locklessinc.com/downloads/lockless_allocator_src.tgz (the best in
> >>> my testing) or jemalloc.
> >>
> >> Before we go down the "which allocator is best" road... glibc's
> >> allocator is intended to be a general purpose "reasonably good enough"
> >> system allocator. It's easy to find a specific allocator that beats it
> >> in a specific test, but being a specifically best allocator is not our
> >> goal here - providing an allocator that can be the default on a
> >> Linux-based system is.
> >>
> >> Hence, my goal with the per-thread cache is to make it "generally
> >> better" for overall system performance.
> >>
> >> I am not trying to make it better than every other allocator in every
> >> case, that's a futile exercise.
> >
> > Well, there wouldn't be a reason for all these alternative allocators if
> > glibc's were "reasonably good". In fact it is often astonishingly bad.
>
> Given a priori knowledge of the workload, you can choose an allocator
> whose semantics match your allocation pattern, which improves both
> performance and memory usage.
>
> There are many more allocators than tcmalloc and jemalloc, and many more
> embedded allocators in projects that you don't readily have visibility
> into.
>
> For a general-purpose allocator, performance can only be measured
> against a given corpus of workloads.
>
> To this day no serious corpus of workloads has been collected to measure
> allocators against. All the academic papers I've seen only test against
> a few workloads.
>
> I hope that within glibc we can gather up workloads to test the allocator
> and raise the performance and quality. We have started gathering malloc
> traces for just this purpose.
>
> Regarding your comments about glibc malloc being astonishingly bad, do
> you have a reference to such a workload? I am looking for _real_
> workloads not synthetic ones created to show worst case behaviour in
> heap-based allocators (dlmalloc, ptmalloc, and glibc's malloc).
A Google search turns up several examples. At Facebook, for instance,
switching to jemalloc doubled server throughput:
https://www.facebook.com/notes/facebook-engineering/scalable-memory-allocation-using-jemalloc/480222803919/
https://www.facebook.com/Engineering/videos/696488619305/
I have mentioned compile times of C++ projects. Here, an alternative
allocator typically decreases build times by 6-8% (I have measured this
myself).
> > Examples are all major browsers (using jemalloc or tcmalloc) and Rust
> > (jemalloc gets linked in to all generated binaries by default).
>
> TL;DR: I don't think any of these examples chose jemalloc because glibc's
> malloc was bad, but because they wanted to offer a choice, and fix
> Windows flaws.
>
> I can only speculate here because few projects provide detailed analysis
> backed up by real data as the rationale for switching to an alternate
> allocator.
>
> Firstly the browsers were looking for a cross-OS solution to solving
> memory fragmentation issues in Windows, something we don't specifically
> cater to in glibc, but which the portable jemalloc did solve. This was
> in the FF3 era when jemalloc was added and it solved the Windows XP
> fragmentation issues.
>
> The major browsers use forks of jemalloc. As far as I can tell the forks
> have diverged enough that they are hardly recognizable as the originals,
> e.g. jemalloc vs. mozjemalloc, though they do merge in jemalloc enhancements.
Well, Chromium uses tcmalloc by default.
And I don't think that Windows was the main reason for the switch.
More likely, the multi-threaded nature of modern browsers requires a
better allocator.
In this case DJ's per-thread cache patch would help, I guess.
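The core idea behind a per-thread cache is that each thread keeps a
small stash of recently freed blocks in thread-local free lists, so the
common malloc/free path touches no lock and no shared arena. The sketch
below is only an illustration of that idea, not DJ's actual patch; the
names (`cached_alloc`, `cached_free`), size classes, and limits are all
made up for this example:

```c
#include <stddef.h>
#include <stdlib.h>

#define NCLASSES   8   /* illustrative: 8 size classes */
#define CLASS_STEP 32  /* class c serves sizes up to (c+1)*32 bytes */
#define CACHE_MAX  16  /* cap cached blocks per class per thread */

struct cached { struct cached *next; };

/* Thread-local state: no locking needed on the fast path. */
static __thread struct cached *tcache[NCLASSES];
static __thread unsigned tcount[NCLASSES];

static int size_to_class(size_t n)
{
    if (n == 0 || n > (size_t)NCLASSES * CLASS_STEP)
        return -1;  /* too large: bypass the cache */
    return (int)((n - 1) / CLASS_STEP);
}

void *cached_alloc(size_t n)
{
    int c = size_to_class(n);
    if (c >= 0 && tcache[c]) {
        /* Fast path: pop from this thread's free list, no lock. */
        struct cached *b = tcache[c];
        tcache[c] = b->next;
        tcount[c]--;
        return b;
    }
    /* Slow path: round up to the class size so the block is reusable. */
    return malloc(c >= 0 ? (size_t)(c + 1) * CLASS_STEP : n);
}

/* Caller passes the same n it allocated with, so we find the class. */
void cached_free(void *p, size_t n)
{
    int c = size_to_class(n);
    if (p && c >= 0 && tcount[c] < CACHE_MAX) {
        struct cached *b = p;  /* reuse the block itself as a list node */
        b->next = tcache[c];
        tcache[c] = b;
        tcount[c]++;
        return;  /* block stays cached for this thread */
    }
    free(p);
}
```

Because the lists are LIFO and thread-local, a free followed by an
allocation of the same size class returns the still-cache-hot block
immediately, which is where the multi-threaded win comes from.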
--
Markus