This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: [PATCH] [RFC] malloc: Reduce worst-case behaviour with madvise and refault overhead


On 02/13/2015 05:29 PM, Mel Gorman wrote:
> On Fri, Feb 13, 2015 at 04:42:34PM +0100, Julian Taylor wrote:
>> On 02/13/2015 02:10 PM, Mel Gorman wrote:
>>> On Thu, Feb 12, 2015 at 06:58:14PM +0100, Julian Taylor wrote:
>>>>> On Mon, Feb 09, 2015 at 03:52:22PM -0500, Carlos O'Donell wrote:
>>>>>> On 02/09/2015 09:06 AM, Mel Gorman wrote:
>>>>>>> while (data_to_process) {
>>>>>>> 	buf = malloc(large_size);	/* large enough to cross the trim/mmap threshold */
>>>>>>> 	do_stuff(buf);			/* work on the buffer */
>>>>>>> 	free(buf);			/* pages go back to the kernel; the next
>>>>>>> 					   iteration refaults zeroed pages */
>>>>>>> }
>>>>>>
>>>>>> Why isn't the fix to change the application to hoist the
>>>>>> malloc out of the loop?
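
Hoisting the allocation, as suggested, would look roughly like this
sketch (do_stuff, data_to_process and large_size are the placeholder
names from the snippet above):

    void *buf = malloc(large_size);    /* allocate once, before the loop */
    while (data_to_process) {
        do_stuff(buf);                 /* reuse the same buffer */
    }
    free(buf);                         /* free once, after the loop */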
>>>>>
>>>>> I understand this is impossible for some language idioms (typically
>>>>> OOP, and despite my personal belief that this indicates they're bad
>>>>> language idioms, I don't want to descend into that type of argument),
>>>>> but to me the big question is:
>>>>>
>>>>> Why, when you have a large buffer -- so large that freeing it
>>>>> triggers MADV_DONTNEED or munmap -- are you doing so little with it
>>>>> in do_stuff() that the work performed on the buffer doesn't dominate
>>>>> the time spent?
>>>>>
>>>>> This indicates to me that the problem might actually be significant
>>>>> over-allocation beyond the size that's actually going to be used. Do
>>>>> we have some real-world specific examples of where this is happening?
>>>>> If it's poor design in application code and the applications could be
>>>>> corrected, I think we should consider whether the right fix is on the
>>>>> application side.
>>>>>
>>>>
>>>>
>>>> I also ran into this issue numerous times and filed a bug:
>>>> https://sourceware.org/bugzilla/show_bug.cgi?id=17195
>>>>
...
>>>
>>> Alternatively, would you be in a position to test v2 of this patch and
>>> see if the performance of your application can be addressed by tuning
>>> the trim threshold to a high value?
>>>
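
For reference, "tuning trim threshold" here means raising glibc's
M_TRIM_THRESHOLD, either through the MALLOC_TRIM_THRESHOLD_ environment
variable or programmatically with mallopt(); the 128 MiB value below is
an arbitrary example:

    #include <malloc.h>

    int main(void)
    {
        /* Keep up to 128 MiB of freed memory at the top of the heap
           instead of returning it to the kernel on free().  Note that
           setting any of these thresholds explicitly disables glibc's
           dynamic mmap threshold adjustment. */
        mallopt(M_TRIM_THRESHOLD, 128 * 1024 * 1024);

        /* ... run the allocation-heavy workload here ... */
        return 0;
    }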
>>
>> I can give it a try. Though the OpenMP testcase from the bug should
>> exhibit the same problem, so you can hopefully try that yourself.
> 
> I easily can but the first version got hit with the "does anything in
> the real world care?" hammer. Any artificial test case is vulnerable to
> the same feedback. The python snippet is also artificial but it's a bit
> harder to wave away as being either unreasonable code or fixable through
> other means.
> 

FWIW, there was actually a paper written about memory allocation in
numpy being slow due to page zeroing with glibc [1].
It has many real-world test cases, and they show real gains that can
be achieved by using a custom caching allocator.
Unfortunately the paper does not go into detail about which cases
spawned threads and which didn't. Many of the poor-performance cases
might also be caused by unavoidable page zeroing simply because the
data was above the hard mmap threshold, so it is hard to say how much
is really caused by this specific issue.
But it is likely that at least some of them spawned threads, e.g. when
numpy is using OpenBLAS, threads are started at import time regardless
of what the code actually does.
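
As a minimal sketch of the caching idea (illustrative only, not the
paper's actual allocator, and not thread-safe): keep the most recently
freed large buffer and hand it back if the next request fits, so its
pages are never returned to the kernel and never refaulted:

    #include <stdlib.h>

    static void  *cached_buf  = NULL;
    static size_t cached_size = 0;

    void *cached_malloc(size_t size)
    {
        if (cached_buf != NULL && cached_size >= size) {
            void *buf = cached_buf;    /* reuse: pages stay mapped */
            cached_buf = NULL;
            return buf;
        }
        return malloc(size);
    }

    void cached_free(void *buf, size_t size)
    {
        if (cached_buf == NULL) {      /* keep the buffer instead of   */
            cached_buf  = buf;         /* returning its pages to the   */
            cached_size = size;        /* kernel through free()        */
            return;
        }
        free(buf);
    }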

[1] hiperfit.dk/pdf/Doubling.pdf (I was not involved in any way with
this paper)

