This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: [patch] malloc per-thread cache ready for review


On 2017.02.01 at 15:50 -0500, Carlos O'Donell wrote:
> On 02/01/2017 11:54 AM, Markus Trippelsdorf wrote:
> > On 2017.02.01 at 11:44 -0500, DJ Delorie wrote:
> >>
> >> Markus Trippelsdorf <markus@trippelsdorf.de> writes:
> >>> http://locklessinc.com/downloads/lockless_allocator_src.tgz (the best in
> >>> my testing) or jemalloc.
> >>
> >> Before we go down the "which allocator is best" road... glibc's
> >> allocator is intended to be a general purpose "reasonably good enough"
> >> system allocator.  It's easy to find a specific allocator that beats it
> >> in a specific test, but being a specifically best allocator is not our
> >> goal here - providing an allocator that can be the default on a
> >> Linux-based system is.
> >>
> >> Hence, my goal with the per-thread cache is to make it "generally
> >> better" for overall system performance.
> >>
> >> I am not trying to make it better than every other allocator in every
> >> case, that's a futile exercise.
> > 
> > Well, there wouldn't be a reason for all these alternative allocators if
> > glibc's were "reasonably good". In fact, it is often astonishingly bad.
> 
> A priori knowledge of the workload allows you to choose an allocator
> whose semantics match your allocation pattern, which improves both
> performance and memory usage.
> 
> There are many more allocators than tcmalloc and jemalloc, and many more
> embedded allocators in projects that you don't readily have visibility
> into.
> 
> For a general purpose allocator the performance of the allocator can 
> only be measured against a given corpus of workloads.
> 
> To this day no serious corpus of workloads has been collected to measure
> allocators against. All the academic papers I've seen only test against
> a few workloads.
> 
> I hope that within glibc we can gather up workloads to test the allocator
> and raise the performance and quality. We have started gathering malloc
> traces for just this purpose.
> 
> Regarding your comments about glibc malloc being astonishingly bad, do
> you have a reference to such a workload? I am looking for _real_
> workloads not synthetic ones created to show worst case behaviour in
> heap-based allocators (dlmalloc, ptmalloc, and glibc's malloc).

Using Google turns up several examples. At Facebook, for example,
switching to jemalloc doubled server throughput:
https://www.facebook.com/notes/facebook-engineering/scalable-memory-allocation-using-jemalloc/480222803919/
https://www.facebook.com/Engineering/videos/696488619305/

I have mentioned compile times of C++ projects. There, an alternative
allocator typically decreases build times by 6-8% (I have measured this
myself).
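For anyone who wants to reproduce this kind of measurement, LD_PRELOAD lets
you swap the allocator for a single run without relinking anything. A minimal
sketch follows; the jemalloc path and the build command are assumptions,
adjust both for your system (e.g. set BUILD='make -j8'):

```shell
# Stand-in workload; replace with your real build command, e.g. 'make -j8'.
BUILD="${BUILD:-true}"
# Typical Debian/Ubuntu path; the soname and directory vary by distro.
JEMALLOC="${JEMALLOC:-/usr/lib/x86_64-linux-gnu/libjemalloc.so.2}"

# Baseline run with glibc's malloc.
t0=$(date +%s); $BUILD; t1=$(date +%s)
echo "glibc malloc: $((t1 - t0))s"

# Same workload with jemalloc interposed via the dynamic loader.
if [ -e "$JEMALLOC" ]; then
    t0=$(date +%s); LD_PRELOAD="$JEMALLOC" $BUILD; t1=$(date +%s)
    echo "jemalloc:     $((t1 - t0))s"
else
    echo "jemalloc:     not found at $JEMALLOC, skipping"
fi
```

Run each configuration several times and compare medians; a single run of a
large build is noisy enough to swamp a 6-8% difference.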

> > Examples are all major browsers (using jemalloc or tcmalloc) and Rust
> > (jemalloc gets linked in for all generated binaries by default).
> 
> TLDR; I don't think any of these examples chose jemalloc because glibc's
> malloc was bad, but because they wanted to offer a choice, and fix
> Windows flaws.
> 
> I can only speculate here because few projects provide detailed analysis
> backed up by real data as the rationale for switching to an alternate
> allocator.
> 
> Firstly, the browsers were looking for a cross-OS solution to memory
> fragmentation issues on Windows, something we don't specifically
> cater to in glibc, but which the portable jemalloc did solve. This was
> in the FF3 era, when jemalloc was added and it solved the Windows XP
> fragmentation issues.
> 
> The major browsers use forks of jemalloc. As far as I can tell, after
> their modifications the forks have diverged considerably from the
> originals, e.g. jemalloc vs. mozjemalloc, though they do merge in
> jemalloc enhancements.

Well, Chromium uses tcmalloc by default. And I don't think that Windows
was the main reason for the switch. More likely the multi-threaded
nature of modern browsers requires a better allocator.

In this case DJ's per-thread cache patch would help, I guess.

-- 
Markus

