This is the mail archive of the
mailing list for the glibc project.
Re: [RFC/PoC] malloc: use wfcqueue to speed up remote frees
- From: Eric Wong <normalperson at yhbt dot net>
- To: Carlos O'Donell <carlos at redhat dot com>
- Cc: libc-alpha at sourceware dot org
- Date: Wed, 1 Aug 2018 09:26:26 +0000
- Subject: Re: [RFC/PoC] malloc: use wfcqueue to speed up remote frees
- References: <20180731084936.g4yw6wnvt677miti@dcvr> <firstname.lastname@example.org> <20180731231819.57xsqvdfdyfxrzy5@whir> <email@example.com> <20180801062352.rlrjqmsszntkzlfe@untitled> <firstname.lastname@example.org>
Carlos O'Donell <email@example.com> wrote:
> On 08/01/2018 02:23 AM, Eric Wong wrote:
> > Carlos O'Donell <firstname.lastname@example.org> wrote:
> >> On 07/31/2018 07:18 PM, Eric Wong wrote:
> >>> Also, if I spawn a bunch of threads and get a bunch of
> >>> arenas early in the program lifetime; and then only have few
> >>> threads later, there can be a lot of idle arenas.
> >> Yes. That is true. We don't coalesce arenas to match the thread
> >> demand.
> > Eep :< If contention can be avoided (which tcache seems to
> > work well for), limiting arenas to CPU count seems desirable and
> > worth trying.
> In general it is not as bad as you think.
> An arena is made up of a chain of heaps, each an mmap'd block, and
> if we can manage to free an entire heap then we unmap the heap,
> and if we're lucky we can manage to free down the entire arena
> (_int_free -> large chunk / consolidate -> heap_trim -> shrink_heap).
> So we might just end up with a large number of arenas that don't
> have very much allocated at all, but are all on the arena free list
> waiting for a thread to attach to them to reduce overall contention.
> I agree that it would be *better* if we had one arena per CPU and
> each thread could easily determine the CPU it was on (via a
> restartable sequence) and then allocate CPU-local memory to work
> with (the best you can do; ignoring NUMA effects).
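The per-CPU arena idea above can be sketched roughly as follows. This is a minimal illustration, not glibc's arena code: `demo_arena`, `demo_arenas`, `DEMO_MAX_CPUS` and `demo_arena_for_current_cpu` are hypothetical names, and it uses `sched_getcpu()` instead of a restartable sequence. Without rseq the thread can migrate right after the lookup, so a real allocator would still need a lock or atomics on the arena it picked.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE            /* for sched_getcpu(), a glibc extension */
#endif
#include <assert.h>
#include <sched.h>
#include <stddef.h>

/* Harmless redeclaration, in case another header was pulled in
   before _GNU_SOURCE was defined. */
int sched_getcpu(void);

#define DEMO_MAX_CPUS 64       /* hypothetical cap, for illustration only */

/* Hypothetical stand-in for an arena; not glibc's malloc_state. */
struct demo_arena {
    size_t bytes_in_use;
};

static struct demo_arena demo_arenas[DEMO_MAX_CPUS];

/* Pick the arena that belongs to the CPU we are running on right now.
   The answer may already be stale when the caller uses it. */
static struct demo_arena *demo_arena_for_current_cpu(void)
{
    int cpu = sched_getcpu();
    if (cpu < 0)
        cpu = 0;               /* fall back to arena 0 on failure */
    size_t idx = (size_t)cpu % DEMO_MAX_CPUS;
    assert(idx < DEMO_MAX_CPUS);
    return &demo_arenas[idx];
}
```

A caller would do `struct demo_arena *a = demo_arena_for_current_cpu();` and then allocate from `a` under whatever synchronization the arena requires.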
Thanks for the info on arenas. One problem for Ruby is that we
get many threads, and they create allocations of varying
lifetimes. All this happens even though malloc contention is
rarely a problem in Ruby because of the global VM lock (GVL).
Even without restartable sequences, I was wondering if lfstack
(also in urcu) could even be used for sharing/distributing
arenas between threads. This would require tcache to avoid
retries on lfstack pop/push.
Much less straightforward than using wfcqueue for frees with
this patch, though :)
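To make the "retries on push" concern concrete, here is a minimal sketch of a lock-free stack that could hold idle arenas. It is written with C11 atomics and is not urcu's lfstack API; `demo_node`, `demo_stack`, `demo_push` and `demo_pop_all` are hypothetical names. Only push and pop-all are shown, because popping a single node needs extra protection against ABA that is out of scope here.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical node: one entry per idle arena. */
struct demo_node {
    struct demo_node *next;
    int arena_id;
};

struct demo_stack {
    _Atomic(struct demo_node *) head;
};

/* Push retries its compare-and-swap whenever another thread changed
   the head in the meantime; this is the retry loop that a per-thread
   cache would help keep off the hot path. */
static void demo_push(struct demo_stack *s, struct demo_node *n)
{
    assert(n != NULL);
    struct demo_node *old =
        atomic_load_explicit(&s->head, memory_order_relaxed);
    do {
        n->next = old;         /* old is refreshed by a failed CAS */
    } while (!atomic_compare_exchange_weak_explicit(
                 &s->head, &old, n,
                 memory_order_release, memory_order_relaxed));
}

/* Detach the whole stack with a single exchange; no retry loop and
   no ABA hazard. Returns the former top node, or NULL if empty. */
static struct demo_node *demo_pop_all(struct demo_stack *s)
{
    return atomic_exchange_explicit(&s->head, NULL,
                                    memory_order_acquire);
}
```

Nodes come back in last-in, first-out order, so the most recently released arena is at the front of the detached list.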
We only had green threads back in Ruby 1.8, and I guess many
Rubyists got used to the idea that they could have many
threads cheaply. Ruby 1.9+ moved to 100% native threads,
so I'm also trying to reintroduce green threads as an option
in Ruby (while still keeping native threads).
> > OK, I noticed my patch fails conformance tests because
> > (despite my use of __cds_wfcq_splice_nonblocking) it references
> > poll(), despite poll() being in an impossible code path:
> > __cds_wfcq_splice_nonblocking -> ___cds_wfcq_splice
> > -> ___cds_wfcq_busy_wait -> poll
> > The poll call is impossible because the `blocking' parameter is 0;
> > but I guess the linker doesn't know that?
> Correct. We can fix that easily at a later date. Don't worry about it.
Heh, a bit dirty, but #define-ing poll away seems to work :)
diff --git a/malloc/malloc.c b/malloc/malloc.c
index 40d61e45db..89e675c7a0 100644
@@ -247,6 +247,11 @@
/* For SINGLE_THREAD_P. */
+/* prevent wfcqueue.h from including poll.h and linking to it */
+#define poll(a,b,c) assert(0 && "should not be called")
#define _LGPL_SOURCE /* allows inlines */
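
The effect of that #define can be shown in isolation. The sketch below is a hypothetical stand-in, not the wfcqueue code: `demo_queue`, `demo_splice` and `demo_splice_nonblocking` are made-up names that mimic a splice helper whose blocking branch calls poll(). Because the function-like macro is defined before the helper, the dead call expands to an assert and the object file never references the poll symbol. The sketch assumes <poll.h> is not included after the macro, since the macro would mangle the prototype there.

```c
#include <assert.h>
#include <stddef.h>

/* Must come before the queue code: turns every poll(...) call that
   follows into an assertion instead of a symbol reference. */
#define poll(a, b, c) assert(0 && "should not be called")

/* Hypothetical queue state, for illustration only. */
struct demo_queue {
    int ready;   /* nonzero once the producer has finished linking */
    int items;
};

/* Moves all items from src to dst. Returns the number moved, or -1
   if the queue is not ready and the caller asked not to block. */
static inline int demo_splice(struct demo_queue *dst,
                              struct demo_queue *src, int blocking)
{
    while (!src->ready) {
        if (!blocking)
            return -1;         /* nonblocking callers leave here */
        poll(NULL, 0, 10);     /* expands to the assert above */
    }
    int moved = src->items;
    dst->items += moved;
    src->items = 0;
    return moved;
}

/* With blocking fixed at 0 the poll branch is unreachable. */
static inline int demo_splice_nonblocking(struct demo_queue *dst,
                                          struct demo_queue *src)
{
    return demo_splice(dst, src, 0);
}
```

Note that building with NDEBUG turns the assert into a no-op, so a blocking caller would spin instead of aborting; the trick is only appropriate when the blocking path is never taken.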