
Re: [patch] malloc per-thread cache ready for review


On Thursday 02 February 2017 02:53 AM, Markus Trippelsdorf wrote:
> Using google turns up several examples. For example at Facebook
> switching to jemalloc doubled their server throughput:
> https://www.facebook.com/notes/facebook-engineering/scalable-memory-allocation-using-jemalloc/480222803919/
> https://www.facebook.com/Engineering/videos/696488619305/

Several examples I had googled up in the past showed tcmalloc or
jemalloc being a good 20% faster than glibc malloc.  The catch in many
of them was that they were comparing against a very old glibc (RHEL-4
or RHEL-5), i.e. before glibc malloc got per-thread arenas.  In many
proprietary workloads I have had a chance to see (tech support is quite
nice that way), per-thread arenas brought over 30% improvement in
performance at the cost of address space usage, which is similar to
what happens with jemalloc or tcmalloc.
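
That address space cost is tunable, by the way.  Here is a minimal
sketch of capping the arena count through the standard mallopt
interface (M_ARENA_MAX is real glibc API; the cap of 2 is only an
illustrative value):

/* Minimal sketch: bound address space by capping glibc malloc at two
   arenas, trading back some of the contention that per-thread arenas
   were added to avoid.  Equivalent to MALLOC_ARENA_MAX=2 in the
   environment.  */
#include <malloc.h>
#include <stdlib.h>

int
main (void)
{
  mallopt (M_ARENA_MAX, 2);

  void *p = malloc (1024);   /* now served from at most two arenas,
                                whatever the thread count */
  free (p);
  return 0;
}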

One of the things we have talked about on the list in the past is
providing a framework in glibc to choose an alternate allocator, and
that might actually be a better way to do things.  That is, allow
installation of multiple allocator libraries and then have a tunable,
glibc.malloc.allocator, that decides whether to use libtcmalloc.so,
libjemalloc.so, or the default (libc.so) as the allocator.  It would
be nice to see this happen alongside improving glibc malloc, since the
tunables framework is now (finally!) in glibc mainline.
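
To make the idea concrete, here is a hand-wavy sketch of the kind of
dispatch such a tunable could drive.  This is not a proposal for the
actual mechanism (glibc would want to do this in the dynamic loader,
before the first allocation), and the MALLOC_ALLOCATOR environment
variable below is only a stand-in for the hypothetical
glibc.malloc.allocator tunable:

/* Hand-wavy sketch of allocator selection.  Build with -ldl on older
   glibc.  MALLOC_ALLOCATOR is a stand-in for a hypothetical tunable;
   everything falls back to libc.so if the library or its symbols are
   missing.  */
#include <dlfcn.h>
#include <stdio.h>
#include <stdlib.h>

typedef void *(*malloc_fn) (size_t);
typedef void (*free_fn) (void *);

int
main (void)
{
  malloc_fn do_malloc = malloc;   /* default: libc.so */
  free_fn do_free = free;

  const char *lib = getenv ("MALLOC_ALLOCATOR");
  if (lib != NULL)
    {
      void *handle = dlopen (lib, RTLD_NOW | RTLD_LOCAL);
      if (handle != NULL)
        {
          malloc_fn m = (malloc_fn) dlsym (handle, "malloc");
          free_fn f = (free_fn) dlsym (handle, "free");
          if (m != NULL && f != NULL)   /* take both or neither */
            {
              do_malloc = m;
              do_free = f;
            }
        }
    }

  void *p = do_malloc (64);
  printf ("64 bytes at %p from %s\n", p, lib != NULL ? lib : "libc.so");
  do_free (p);
  return 0;
}

Something like MALLOC_ALLOCATOR=libjemalloc.so ./app would then pick
jemalloc without an LD_PRELOAD hack.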

> Well, Chromium uses tcmalloc by default.
> And I don't think that Windows was the main reason for the switch.
> More likely the multi threaded nature of modern browsers requires a
> better allocator.
> 
> In this case DJ's per-thread cache patch would help, I guess.

DJ's patch does not only benefit multi-threaded workloads (although I
assume it will likely improve those by a great deal); it also shortens
the hot path quite considerably and possibly (a hand-waving guess) has
a positive effect on icache usage as well.  In my testing (multiple
runs on multiple systems, because I couldn't believe it at first) I saw
a significant improvement in SPEC2006 numbers, often more than the
total time spent in malloc in those tests.
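
For anyone who hasn't read the patch yet, the gist of why the hot path
gets shorter is roughly this.  A toy sketch, nothing like DJ's actual
code: the bin count and size classes below are made up, and a real
cache would read the size from the chunk header instead of taking it
as a parameter:

/* Toy per-thread cache: the common case becomes a lock-free pop from
   a thread-local free list instead of a trip through the locked
   arena.  Illustration only; sizes and limits are invented.  */
#include <stdlib.h>

#define TCACHE_BINS 8
#define BIN_SIZE(i) (((i) + 1) * 16)

struct tc_entry { struct tc_entry *next; };

static __thread struct tc_entry *tcache[TCACHE_BINS];

static void *
cached_malloc (size_t size)
{
  for (int i = 0; i < TCACHE_BINS; i++)
    if (size <= BIN_SIZE (i))
      {
        struct tc_entry *e = tcache[i];
        if (e != NULL)
          {
            /* Hot path: no lock, no arena, no syscall.  */
            tcache[i] = e->next;
            return e;
          }
        /* Miss: fall back to the (locked) arena path.  */
        return malloc (BIN_SIZE (i));
      }
  return malloc (size);
}

static void
cached_free (void *p, size_t size)
{
  for (int i = 0; i < TCACHE_BINS; i++)
    if (size <= BIN_SIZE (i))
      {
        struct tc_entry *e = p;
        e->next = tcache[i];   /* push onto the thread-local bin */
        tcache[i] = e;
        return;
      }
  free (p);
}

int
main (void)
{
  void *p = cached_malloc (24);
  cached_free (p, 24);
  void *q = cached_malloc (24);   /* now served from the cache */
  cached_free (q, 24);
  return 0;
}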

Siddhesh

