Re: [PATCH] [RFC] malloc: Reduce worst-case behaviour with madvise and refault overhead


> On Mon, Feb 09, 2015 at 03:52:22PM -0500, Carlos O'Donell wrote:
>> On 02/09/2015 09:06 AM, Mel Gorman wrote:
>> > while (data_to_process) {
>> > 	buf = malloc(large_size);
>> > 	do_stuff();
>> > 	free(buf);
>> > }
>> 
>> Why isn't the fix to change the application to hoist the
>> malloc out of the loop?
> 
> I understand this is impossible for some language idioms (typically
> OOP, and despite my personal belief that this indicates they're bad
> language idioms, I don't want to descend into that type of argument),
> but to me the big question is:
> 
> Why, when you have a large buffer -- so large that it can effect
> MADV_DONTNEED or munmap when freed -- are you doing so little with it
> in do_stuff() that the work performed on the buffer doesn't dominate
> the time spent?
> 
> This indicates to me that the problem might actually be significant
> over-allocation beyond the size that's actually going to be used. Do
> we have some real-world specific examples of where this is happening?
> If it's poor design in application code and the applications could be
> corrected, I think we should consider whether the right fix is on the
> application side.
> 
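
For reference, the hoisted variant being suggested above would look
roughly like this (just a sketch; large_size, data_to_process and
do_stuff() are placeholders carried over from the quoted example):

	buf = malloc(large_size);
	while (data_to_process) {
		do_stuff();
	}
	free(buf);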


I have also run into this issue numerous times, and filed a bug:
https://sourceware.org/bugzilla/show_bug.cgi?id=17195

As a real-world example, I have higher-level numerical software.
E.g. in Python with NumPy you write code like this:
a = b + c + d
where b, c and d are large arrays. Due to limitations of the library
and of Python, this involves allocating multiple large temporary
arrays, while the amount of work done on the memory itself is very
small.
Reusing existing memory is hard to do from the high level the
developer is working at here.
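
To make the allocation pattern concrete, a = b + c + d over large
arrays conceptually expands to something like the following (a
hypothetical C sketch of what the library ends up doing internally,
not its actual code):

	/* a = b + c + d over n-element arrays: every intermediate
	   result gets its own freshly allocated buffer.  */
	double *tmp = malloc(n * sizeof(double));
	for (size_t i = 0; i < n; i++)
		tmp[i] = b[i] + c[i];	/* tmp = b + c */
	double *a = malloc(n * sizeof(double));
	for (size_t i = 0; i < n; i++)
		a[i] = tmp[i] + d[i];	/* a = tmp + d */
	free(tmp);	/* large free, which can trigger trim/madvise */
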
This works fine when single-threaded, as the allocator respects the
trim threshold and reuses the memory most of the time. But when doing
the same thing in multiple threads, glibc goes haywire and throws the
memory away on each free, only for it to be refaulted a few cycles
later.
This is really bad for performance; when profiling these types of
applications you will often see 30% of the runtime spent clearing
pages.
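
The obvious application-side knobs here are the mallopt thresholds;
this is a sketch of what one would expect to help (although, per the
bug report above, the per-thread arenas do not honor the trim
threshold, which is exactly the problem):

	#include <malloc.h>

	/* Call early, before spawning threads: keep freed memory
	   cached in the allocator instead of returning it to the
	   kernel on every free.  */
	mallopt(M_TRIM_THRESHOLD, -1);	/* -1 disables heap trimming */
	mallopt(M_MMAP_THRESHOLD, 512 * 1024 * 1024);
		/* serve large buffers from the heap rather than
		   per-allocation mmap/munmap */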

(I am aware this type of programming is also hard on the caches, but
some applications simply do not operate at a level where you can
control this.)

