This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: [PATCH] [RFC] malloc: Reduce worst-case behaviour with madvise and refault overhead


Thanks for reviewing.

On Mon, Feb 09, 2015 at 03:52:22PM -0500, Carlos O'Donell wrote:
> On 02/09/2015 09:06 AM, Mel Gorman wrote:
> > while (data_to_process) {
> > 	buf = malloc(large_size);
> > 	do_stuff();
> > 	free(buf);
> > }
> 
> Why isn't the fix to change the application to hoist the
> malloc out of the loop?
> 
> buf = malloc(large_size);
> while (data_to_process)
>   {
>     do_stuff();
>   }
> free(buf);
> 
> Is it simply that the software frameworks themselves are
> unable to do this directly?
> 

Fixing the benchmark in this case hides the problem -- glibc malloc has
a pathological worst case for a relatively basic allocation pattern that
is encountered simply because the application happens to use threads
(processes would have avoided the problem). It was spotted when comparing
versions of a distribution; initially I assumed it was a kernel issue
until I analysed the problem. Even if ebizzy were fixed and it had an
upstream maintainer who would accept the patch, glibc would still have
the same problem. It would be a shame if the recommendation in some cases
were simply to avoid using malloc/free and instead cache buffers within
the application.
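
To make the pattern concrete, a minimal reproducer sketch is below. It
is not the ebizzy source; the thread count, buffer size and iteration
count are illustrative, and the exact trim behaviour depends on
M_TRIM_THRESHOLD/M_MMAP_THRESHOLD and the glibc version. The point is
that threads allocate from mmap'd (non-main) arenas, which are shrunk
with madvise(MADV_DONTNEED), so every free can discard pages that the
next iteration immediately refaults:

/* Build with: gcc -O2 -pthread repro.c */
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

#define NTHREADS   4
#define LARGE_SIZE (1UL << 20)	/* 1MB, illustrative */
#define ITERATIONS 100000

static void *worker(void *arg)
{
	(void)arg;
	for (int i = 0; i < ITERATIONS; i++) {
		char *buf = malloc(LARGE_SIZE);
		if (buf == NULL)
			break;
		/* Touch every page so memory trimmed by the previous
		   free must be faulted back in. */
		memset(buf, 1, LARGE_SIZE);
		free(buf);
	}
	return NULL;
}

int main(void)
{
	pthread_t threads[NTHREADS];

	for (int i = 0; i < NTHREADS; i++)
		pthread_create(&threads[i], NULL, worker, NULL);
	for (int i = 0; i < NTHREADS; i++)
		pthread_join(threads[i], NULL);
	return 0;
}

Run the same loop in a single-threaded process and the allocations come
from the main arena, which is trimmed with sbrk rather than madvise,
which is why processes side-step the worst case.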

> I can understand your position. Ebizzy models the workload and
> you use the workload model to improve performance by changing
> the runtime to match the workload.
> 

Exactly.

> The problem I face as a maintainer is that you've added
> complexity to malloc in the form of a decaying counter, and
> I need a strong justification for that kind of added complexity.
> 

I would also welcome suggestions on how madvise could be throttled without
the use of counters. The counters are heap-local, so I do not expect
cache conflicts, and the allocation-side counter is only updated after a
recent heap shrink to minimise updates.
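
For illustration only, the shape of the heuristic is roughly the
following. The names (heap_counters, refault_credit, MADVISE_CREDIT_MAX)
are hypothetical; this is a sketch of the idea, not the patch itself:

#define MADVISE_CREDIT_MAX 8	/* arbitrary cap for illustration */

struct heap_counters {
	unsigned int refault_credit;	/* decaying, heap-local */
};

/* Allocation side: called only when a heap has to grow again shortly
   after it was shrunk, i.e. the trimmed pages were refaulted. */
static void heap_note_refault(struct heap_counters *h)
{
	if (h->refault_credit < MADVISE_CREDIT_MAX)
		h->refault_credit++;
}

/* Free side: decide whether a heap shrink should issue
   madvise(MADV_DONTNEED) or skip it this time. */
static int heap_may_madvise(struct heap_counters *h)
{
	if (h->refault_credit == 0)
		return 1;	/* no recent refaults: trim as usual */
	h->refault_credit--;	/* decay the counter and throttle */
	return 0;
}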

Initially I worked around this in the kernel, but any solution there
breaks the existing semantics of MADV_DONTNEED and was rejected. See the
last paragraph of https://lkml.org/lkml/2015/2/2/696 .
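
For anyone unfamiliar with why that matters: for anonymous private
mappings, MADV_DONTNEED guarantees the pages are discarded immediately
and subsequent reads refault zero-filled pages, and applications rely
on that. A minimal demonstration (error handling elided):

#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

int main(void)
{
	size_t len = 1UL << 20;
	char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	memset(p, 0xaa, len);		/* populate the pages */
	madvise(p, len, MADV_DONTNEED);	/* discard them immediately */
	printf("%d\n", p[0]);		/* refaults; prints 0, not 0xaa */
	munmap(p, len);
	return 0;
}

Any kernel-side throttling that delayed or skipped the discard would
change what that read returns, hence the rejection.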

> For example, I see you're from SUSE, have you put this change
> through testing in your distribution builds or releases?

I'm not the glibc maintainer for our distribution, but even if I were,
the distribution has an upstream-first policy. A change of this type would
have to be acceptable to the upstream maintainers. If this can be addressed
here then I can ask our glibc maintainers to apply the patch as a backport.

> What were the results? Under what *real* workloads did this
> make a difference?
> 

It was detected manually, but the behaviour was also spotted in firefox
during normal browsing (200 madvise calls per second on average when
monitored for a short period) and in evolution when updating search
folders (110 madvise calls per second). In neither case can I actually
quantify the impact because the overhead is a relatively small part of
the overall workload.
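
For reference, call rates like these can be gathered with standard
tooling, e.g. something along these lines (illustrative invocations;
the tracepoint needs sufficient privileges):

# Count madvise calls system-wide over 10 seconds.
perf stat -a -e syscalls:sys_enter_madvise sleep 10

# Per-process alternative: summarise madvise call counts with strace.
strace -c -f -e trace=madvise -p <pid>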

MariaDB was calling madvise 95 times a second while populating a database
during the startup phase of the sysbench benchmark. In that case, the
cost of the page table teardown + refault is negligible in comparison to
the IO costs and the bulk of the CPU time is spent in mariadb itself,
but glancing at perf top, it looks like about 25% of system CPU time is
spent tearing down and reallocating pages. During the sysbench run
itself, mariadb was calling madvise 2000 times a second. I didn't
formally quantify the impact as I do not have a test setup for testing
glibc modifications system-wide, and it's essentially the same problem
seen with ebizzy, except that ebizzy is a hell of a lot easier to test
with a modified glibc.

Thanks.

-- 
Mel Gorman
SUSE Labs

