This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: [PATCH] Simple malloc benchtest.


On Tue, Dec 31, 2013 at 11:33:03AM +0530, Siddhesh Poyarekar wrote:
> On Sat, Dec 28, 2013 at 02:19:28AM +0100, Ondřej Bílka wrote:
> > You are reinventing the wheel. What you described is measurement 1:
> 
> I didn't reinvent the wheel.  I know I described measurement 1 - I
> started my statement with 'IMO measurement 1, i.e. ...'.
> 
> > You need to know the limits of measurements, and that they could fail badly
> > in certain circumstances. In papers, measurement 4 is common, as authors know
> > that the other measurements are more misleading.
> 
> I did not say that measurement 4 is altogether wrong either.  It may
> be useful for other purposes, like serving as one metric when comparing
> the fragmentation of algorithms.  It is, however, a horrible measurement
> when your aim is to decide whether fragmentation is a relevant problem
> at all, since it will obviously give you the answer you're looking for.
> 
You need to be careful that the problem is indeed fragmentation and not
header overhead or reluctance to return memory to the system.

When an application frees memory it decreases the live/system memory
ratio, but this is mostly a potential problem, as the freed chunks will be
reused until you reach the point where you need to allocate more memory.
That point depends on the application's usage pattern; one possibility
would be to make the simplification that all allocations have the same
size, taken from the range 16-4096 bytes, compute the percentage of free
memory that can be reused, and draw that in a graph. Larger requests are
not a problem from a memory usage point of view, as you can map/unmap the
pages contained in gaps; that would increase page fragmentation, which is
a different problem.
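
For concreteness, here is a rough sketch of that metric: pretend every
request has one fixed size from the 16-4096 range and compute what
percentage of the free gaps could actually be carved into requests of that
size; sweeping the request size gives the points for the graph. The gap
sizes below are made-up input, a real measurement would walk the
allocator's free chunks.

#include <stdio.h>
#include <stddef.h>

/* Percentage of the free gaps that could be reused by requests of `req'
   bytes, assuming each gap is split into as many whole requests as fit.  */
static double
usable_percentage (const size_t *gaps, size_t ngaps, size_t req)
{
  size_t total = 0, usable = 0;
  for (size_t i = 0; i < ngaps; i++)
    {
      total += gaps[i];
      usable += (gaps[i] / req) * req;
    }
  return total ? 100.0 * usable / total : 0.0;
}

int
main (void)
{
  /* Illustrative free-gap sizes in bytes, not real data.  */
  const size_t gaps[] = { 24, 80, 200, 1500, 4096, 10000 };
  const size_t ngaps = sizeof gaps / sizeof gaps[0];

  for (size_t req = 16; req <= 4096; req *= 2)
    printf ("request %4zu bytes: %.1f%% of free memory reusable\n",
            req, usable_percentage (gaps, ngaps, req));
  return 0;
}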

> > 1. It does not really measure fragmentation. When you have a program that
> > previously needed twice its current live memory, then depending on the
> > allocator the free memory could either form a contiguous chunk or be
> > fragmented into many islands. With that amount of resources even a bad
> > allocator could likely satisfy all requests, and when the average consists
> > mostly of these contributions the results will be inconclusive.
> 
> <snip>
> 
> > 2. The number you get could be chaotic, depending on tiny details. Imagine a
> > program that at startup does a lot of small allocations, then frees everything
> > but the last allocation. Then it does a long computation that is not memory
> > intensive.
> >
> > In that program the amount of memory used depends only on the address of the
> > last allocation, which will vary wildly even for the most innocent change in
> > how chunks are ordered.
> > 
> > A variant of this problem is a program that is mostly stable
> > except for periodic peaks.
> >
> > A good allocator could fit 80% of the peak usage into existing gaps and
> > need to place only 20% into a new area. A bad allocator would place 80% of
> > the peak data into a new area.
> >
> > The bad allocator would come out as the winner, because when freeing the
> > peak data it crosses the trim threshold while the good allocator does not.
> 
> For both these points you talk about a 'bad allocator' without
> actually defining what it means.  Does it mean a slow allocator or
> does it mean an allocator that is good at freeing resources in a
> timely manner?  It doesn't seem to be the latter if it can be tweaked
> into managing resources better, so I'm assuming you mean a slow
> allocator?  Do you still not see why allocation speed alone is not a
> good enough measurement?
> 
We are talking about fragmentation, so good/bad refers to how much
fragmentation the allocator causes. An example of a bad allocator is one
that picks the largest gap and places the request there. Whether an
allocator is good depends on the workload, and I do not think there is a
simple universal definition that covers all cases.

The performance of the allocator is not a factor in these comparisons.
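
For illustration only, a minimal sketch of that "bad" policy (worst fit:
always place the request in the largest gap); the free-gap bookkeeping is
hypothetical, not how our malloc tracks chunks. The point is that it shaves
every large gap down into medium-sized remainders, so over time no gap
stays large.

#include <stddef.h>

/* Return the index of the largest gap that can hold a request of `req'
   bytes, or -1 if none fits.  */
static int
worst_fit (const size_t *gap_sizes, size_t ngaps, size_t req)
{
  int best = -1;
  for (size_t i = 0; i < ngaps; i++)
    if (gap_sizes[i] >= req
        && (best < 0 || gap_sizes[i] > gap_sizes[best]))
      best = (int) i;
  return best;
}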

> > 3. Problems with stability. Unless you are measuring a long-running process
> > with a mostly flat profile, the numbers you get could vary wildly
> > depending on the input. The allocation pattern could stay the same, but if the
> > majority of the time is spent in a routine that does not allocate memory and
> > that time varies with the input, you get different percentages.
> 
> Agreed.
> 
> > 4. Averaging live memory versus used memory does not measure
> > fragmentation but how effectively we return memory to the system.
> >
> > You could get better numbers by simply choosing a smaller trim threshold
> > and triggering trim more often.
> > 
> 
> Yes, at the cost of performance.  You make an interesting point by
> trying to differentiate between the act of just blocking memory and
> not using it and blocking memory, not using it and asking for
> additional memory from the system, the latter behaviour being the
> textbook definition of fragmentation.  I believe this ties in with
> your point 3 as well, since a program with a flat profile that has
> blocked a big chunk of memory because of an allocated chunk on top of it
> will show up as performing badly when it shouldn't.
> 
> One way to counter this could be to somehow factor in events like
> growth of memory in the presence of holes in memory, so what you'll be
> looking at is the slope of memory usage, where a flatter profile is
> better.  But that's just off the top of my head, so I'm sure there
> would be more factors in play.
> 
You need to sacrifice these when comparing different strategies, as
otherwise the performance of both would depend on a tunable parameter and
the winner would be the one that picked the smaller tunable value, which,
without considering other factors, would not tell you which one is better.
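
For reference, the trim threshold is already a runtime tunable, so a
benchmark (or an application) can move it without rebuilding glibc; the
64 KiB value below is only an example, not a recommendation.

#include <malloc.h>

int
main (void)
{
  /* Return free memory at the top of the heap to the kernel once more
     than 64 KiB of it is unused; the default is larger, so this trims
     more aggressively at the cost of extra syscalls.  */
  mallopt (M_TRIM_THRESHOLD, 64 * 1024);

  /* ... allocation-heavy workload ... */
  return 0;
}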

> > We currently do not return memory to the system well, see bug 11261, and
> > it is common that users like this one complain that we use 50% more memory
> > than the alternatives.
> > http://juliank.wordpress.com/2012/05/31/memory-allocators/
> 
> I wonder how relevant the comparison in memory usage is in that blog
> post, given that the bloated 'memory usage' may actually just be
> address space usage due to the per-thread arenas, where most of each
> arena is unused and doesn't even have any pages backing the address
> space.  In other words, I believe more information is needed from that
> test.
> 
Still, when an average user discovers that we use more memory than the
alternatives, it does not look good.

> That said, I don't disagree that we need to look at and monitor how
> good or bad our fragmentation is.  In fact, that is why I mentioned
> the need for any malloc benchmark to keep a tab on fragmentation as
> well ;)
> 
> > You would get a more realistic estimate by actually unmapping unused pages
> > and dealing with holes, but that would change how long the program spends on
> > various parts, which would change the average.
> 
> I'm not sure I understand.  I assume you're suggesting that we unmap
> or mprotect(PROT_NONE) (or use some other mechanism to give back the
> address space) for chunks that are larger than a threshold.  That's
> not a bad idea and it will reduce memory usage, but the cost would be
> additional syscalls.  More work needs to be done to show that the net
> result is beneficial, and you'll need a benchmark that measures both
> fragmentation and speed to prove that :)

Yes, returning memory by some means, but this needs to be tested on live
applications rather than synthetic benchmarks. For these you probably want
to use timing information, for example keeping a queue of blocks to unmap,
with timestamps, and periodically (say once per 128 allocations or per big
allocation) going through the queue and unmapping chunks older than 100ms.
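
Something along these lines, purely as a sketch; the queue layout, the
128/100ms constants and the pending_free/flush_old names are invented for
illustration, this is not existing malloc code.

#include <stdint.h>
#include <stddef.h>
#include <time.h>
#include <sys/mman.h>

struct pending { void *addr; size_t len; struct timespec when; };

static struct pending queue[256];
static size_t nqueued, nfrees;

static int64_t
elapsed_ms (const struct timespec *a, const struct timespec *b)
{
  return (b->tv_sec - a->tv_sec) * 1000 + (b->tv_nsec - a->tv_nsec) / 1000000;
}

/* Unmap every queued block that has been idle for more than 100ms.  */
static void
flush_old (void)
{
  struct timespec now;
  clock_gettime (CLOCK_MONOTONIC, &now);
  size_t kept = 0;
  for (size_t i = 0; i < nqueued; i++)
    {
      if (elapsed_ms (&queue[i].when, &now) > 100)
        munmap (queue[i].addr, queue[i].len);  /* give the pages back */
      else
        queue[kept++] = queue[i];              /* keep younger entries */
    }
  nqueued = kept;
}

/* Called when a large block is freed, instead of unmapping it immediately.  */
static void
pending_free (void *addr, size_t len)
{
  if (nqueued == sizeof queue / sizeof queue[0])
    flush_old ();
  if (nqueued == sizeof queue / sizeof queue[0])
    {
      /* Queue full of young entries: fall back to unmapping right away.  */
      munmap (addr, len);
      return;
    }
  queue[nqueued].addr = addr;
  queue[nqueued].len = len;
  clock_gettime (CLOCK_MONOTONIC, &queue[nqueued].when);
  nqueued++;

  /* Periodic scan, e.g. once per 128 frees as suggested above.  */
  if (++nfrees % 128 == 0)
    flush_old ();
}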

You could also periodically read /proc/meminfo and determine the
thresholds based on that.
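
A sketch of that idea, assuming we key off the MemFree line and adjust the
trim threshold accordingly; the 1 GiB cut-off and both threshold values are
arbitrary examples.

#include <stdio.h>
#include <malloc.h>

/* Read the MemFree value from /proc/meminfo, in kB, or -1 on failure.  */
static long
mem_free_kb (void)
{
  FILE *f = fopen ("/proc/meminfo", "r");
  if (f == NULL)
    return -1;
  char line[128];
  long kb = -1;
  while (fgets (line, sizeof line, f) != NULL)
    if (sscanf (line, "MemFree: %ld kB", &kb) == 1)
      break;
  fclose (f);
  return kb;
}

static void
adjust_trim_threshold (void)
{
  long kb = mem_free_kb ();
  if (kb < 0)
    return;                                  /* keep the current setting */
  if (kb > 1024 * 1024)                      /* more than 1 GiB free */
    mallopt (M_TRIM_THRESHOLD, 512 * 1024);  /* trim lazily */
  else
    mallopt (M_TRIM_THRESHOLD, 64 * 1024);   /* trim aggressively */
}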

