This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.

Re: [PATCH] [RFC] malloc: Reduce worst-case behaviour with madvise and refault overhead


On 02/10/2015 11:54 AM, Mel Gorman wrote:
> It's true that MADV_FREE will be faster than MADV_DONTNEED in this case
> as the pages do not need to be reallocated. That does not make it a
> cheap operation and using MADV_FREE thousands of times per second in the
> worst-case is still going to hurt. MADV_FREE still incurs a page table
> protection update, TLB flush and if it's reused then it's another minor
> fault. In the absence of memory pressure, the main gain is that you do
> not have to refault, realloc and zero the pages.

I agree.
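
For reference, the call-site difference is trivial; everything Mel
describes is on the kernel side. A minimal sketch (error checking
omitted) of the two semantics, assuming <sys/mman.h> on the build
machine defines MADV_FREE:

#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

int
main (void)
{
  size_t len = 4096;
  char *p = mmap (NULL, len, PROT_READ | PROT_WRITE,
                  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

  memset (p, 0xaa, len);
  /* MADV_DONTNEED drops the pages immediately; the next touch refaults
     and gets freshly zeroed memory.  */
  madvise (p, len, MADV_DONTNEED);
  printf ("after MADV_DONTNEED: %#x\n", p[0] & 0xff);  /* prints 0 */

#ifdef MADV_FREE
  memset (p, 0xaa, len);
  /* MADV_FREE only marks the pages lazily reclaimable; until the kernel
     is under pressure, reuse keeps the old contents and skips the
     refault-and-zero cost Mel describes.  */
  madvise (p, len, MADV_FREE);
  printf ("after MADV_FREE: %#x\n", p[0] & 0xff);      /* often 0xaa */
#endif
  return 0;
}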

>> Thus, instead of adding more complex heuristics to glibc's malloc I think
> 
> Is it really that complex? It's a basic heuristic comparing two counters
> that affects the alloc and free slow paths. The checks are negligible in
> comparison to the necessary work if glibc calls into the kernel.

It is complex. It's a time-based integral value that is going to
complicate debugging and reproducers, and users will need to be educated
about how it works. We will likely also want a tunable for it so that we
can disable it if it negatively impacts workloads. That tunable adds
runtime overhead, and all of that is added complexity.
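
To make concrete what I mean by added state to reason about -- this is
only a schematic of the kind of counter comparison being discussed, not
Mel's actual patch:

/* Schematic only, not the proposed patch.  Two counters on the slow
   paths: trimming is skipped while recent re-growth outweighs recent
   trims, with periodic decay so old history fades.  Every field here is
   extra state a user debugging malloc behaviour now has to understand.  */
static unsigned long recent_grows;  /* heap had to grow again soon    */
static unsigned long recent_trims;  /* we considered returning pages  */

static int
should_trim_now (void)
{
  recent_trims++;
  if (recent_grows > recent_trims)
    {
      /* The pages we gave back were wanted again; back off.  */
      recent_grows /= 2;
      return 0;
    }
  recent_trims /= 2;
  return 1;
}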

Please keep in mind that glibc's malloc is a general-purpose allocator
which aims to be good at the average case. Unfortunately we lack data with
which to measure "average" workloads, which is why the glibc community is
looking at whole-system benchmarks to gather workload data and integrate
it directly into our microbenchmarks (see benchtests/) or a future
whole-system benchmark framework.

So while the technical complexity of the fix is low, that's not the only
factor I measure when looking at a fix.

A high-quality contribution for changes to malloc should IMO include a
microbenchmark or two that captures the workload against which the
change was tested (see benchtests/README).
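
As a concrete, if simplistic, example of what I mean -- not the
benchtests harness itself, just a standalone timing loop of the kind
that could be adapted into benchtests/ (sizes and counts are arbitrary
placeholders):

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

int
main (void)
{
  enum { ITERS = 10000, BATCH = 256, SIZE = 16 * 1024 };
  void *ptrs[BATCH];
  struct timespec t0, t1;

  clock_gettime (CLOCK_MONOTONIC, &t0);
  /* Allocate and free a batch of mid-sized blocks repeatedly so that
     trimming/madvise costs show up in the timing.  */
  for (int i = 0; i < ITERS; i++)
    {
      for (int j = 0; j < BATCH; j++)
        ptrs[j] = malloc (SIZE);
      for (int j = 0; j < BATCH; j++)
        free (ptrs[j]);
    }
  clock_gettime (CLOCK_MONOTONIC, &t1);

  double secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
  printf ("%d iterations in %.3f s\n", ITERS, secs);
  return 0;
}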

>> we should be testing the addition of MADV_FREE in glibc's malloc and then
>> supporting or adjusting the proposal for MADV_FREE or vrange dependent on
>> the outcome.
>>
> 
> They are orthogonal to each other and both can exist side by side. MADV_FREE
> is cheaper than MADV_DONTNEED but avoiding unnecessary system calls is far
> cheaper. As for MADV_FREE vs vrange, my expectation is that MADV_FREE will
> be merged in the current merge window. The patches are sitting in Andrew
> Morton's mmotm tree but he has not sent a merge request yet.

They are orthogonal in a technical sense, but they solve the same problem:
Make glibc's malloc faster for workload X.

My opinion is that MADV_FREE is a technically superior solution to rate
limiting madvise() calls from malloc. Rate limiting or dynamic heuristics
have been difficult to tune and support over the last decade, and I am
very hesitant to accept more without serious discussion, justification,
and data.

>> In the meantime we can talk about mitigating the problems in glibc's
>> allocator for systems with old kernels, but it should not be the primary
>> solution. In glibc we would conditionalize the changes against the first
>> kernel version that included MADV_FREE, and when the minimum supported
>> kernel version is higher than that we would remove the code in question.
>>
>> My suggested next steps are:
>>
>> (a) Test using kernel+MADV_FREE with a hacked glibc malloc that uses
>>     MADV_FREE, see how that performs, and inform upstream kernel.
>>
> 
> The upstream kernel developers in this case won't care what the performance
> is (changelog is already set) although Minchan would care if there was any
> bug reports associated with its usage. MADV_FREE is likely to be supported
> either way and it's up to the glibc folk whether they want to support it.

We do want to support it. I also believe that upstream does care.
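
For step (a), a hacked test build need not be invasive: where the
build's headers define MADV_FREE, prefer it and fall back otherwise. A
rough sketch of the idea, purely illustrative and not glibc's actual
shrink/trim code path:

#include <sys/mman.h>

/* Illustrative helper: return unused pages to the kernel, preferring
   MADV_FREE when the headers define it.  */
static int
return_pages (void *addr, size_t len)
{
#ifdef MADV_FREE
  if (madvise (addr, len, MADV_FREE) == 0)
    return 0;
  /* The running kernel may be older than the headers; fall back.  */
#endif
  return madvise (addr, len, MADV_DONTNEED);
}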

>> (b) Continue discussion over rate-limiting MADV_DONTNEED as a temporary
>>     measure. Before we go any further here, please increase M_TRIM_THRESHOLD
>>     in ebizzy and see if that makes a difference? It should make a difference
>>     by increasing the threshold at which we trim back to the OS, both sbrk,
>>     and mmaps, and thus reduce the MADV_DONTNEED calls at the cost of increased
>>     memory pressure. Changing the default though is not a trivial thing, since
>>     it could lead to immediate OOM for existing applications that run close to
>>     the limit of RAM. Discussion and analysis will be required.
>>
> 
> Altering trim threshold does not appear to help but that's not really
> surprising considering that it's only applied to the main arena and not
> the per-thread heaps. At least that is how I'm interpreting this check

I agree. It looks like a bug to me. The arenas should, from first
principles, all behave the same regardless of the underlying storage,
e.g. brk vs. mmap.
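
For completeness, the knob itself is easy to poke at from the
application side; it just does not reach the per-thread heaps today.
Either of these raises the main arena's threshold to 1 MiB:

#include <malloc.h>

int
main (void)
{
  /* Equivalent to setting MALLOC_TRIM_THRESHOLD_=1048576 in the
     environment before starting the process.  As discussed above, this
     currently only affects the main arena.  */
  mallopt (M_TRIM_THRESHOLD, 1024 * 1024);
  /* ... workload ... */
  return 0;
}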
 
> You may have noticed that part of the patch takes mp_.mmap_threshold into
> account in the shrinking decision but this was the wrong choice. I should
> have at least used the trim threshold there.  I had considered just using
> trim threshold but worried that the glibc developers would not like it
> as it required manual tuning by the user.

Please do not hesitate to ask for an opinion on the list, like you are
doing right now with your RFC. Just ask whether there is a preference,
choice, or rationale for using or not using trim_threshold.

I see algorithm development going through various phases:

(a) Implement algorithm, add tunables.

(b) Evaluate algorithm against workloads and tweak tunnables.

(c) Develop dynamic tuning and test.

(d) Replace tunables with dynamic tuning.

With glibc's malloc we've been at (b) for years. I hope that with some
of the tunables unification work we're doing we can make (b), (c) and
(d) easier for the core developers.

> If trim threshold was to be used as a workaround then I'd think we'd need
> at least this patch. Anyone using the ebizzy benchmark without knowing this will
> still report a regression between distros with newer glibcs but at least
> a google search might find this thread.

The trimming threshold is another solution.

Limiting the trimming should reduce the number of MADV_DONTNEED calls,
and make all of the arenas behave more uniformly.

> diff --git a/malloc/arena.c b/malloc/arena.c
> index 886defb074a2..a78d4835a825 100644
> --- a/malloc/arena.c
> +++ b/malloc/arena.c
> @@ -696,7 +696,7 @@ heap_trim (heap_info *heap, size_t pad)
>      }
>    top_size = chunksize (top_chunk);
>    extra = (top_size - pad - MINSIZE - 1) & ~(pagesz - 1);
> -  if (extra < (long) pagesz)
> +  if (extra < (long) mp_.trim_threshold)
>      return 0;
>  
>    /* Try to shrink. */
> 

This patch looks correct to me. I think it's a bug that non-main arenas
don't use the trim threshold. All arenas should behave the same way,
which makes them easier to debug and use.
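
To put numbers on it: with 4 KiB pages the old check trims a non-main
heap as soon as a single page of slack exists at the top, whereas with
the default 128 KiB trim threshold the heap has to accumulate 32 pages
of slack before we shrink it and issue the MADV_DONTNEED -- the same
behaviour the main arena already gets.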

How does the patch impact your testing?

I might try this out in Fedora Rawhide for a few months.

Would you post another patch with just this change and your test
results?

Cheers,
Carlos.
