This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: [PATCH] benchtests: Add malloc microbenchmark


On Tue, 2014-04-15 at 12:27 -0400, Rich Felker wrote:
> On Tue, Apr 15, 2014 at 04:42:25PM +0100, Will Newton wrote:
> > On 15 April 2014 16:36, Steven Munroe <munroesj@linux.vnet.ibm.com> wrote:
> > > On Tue, 2014-04-15 at 14:35 +0100, Will Newton wrote:
> > >> Add a microbenchmark for measuring malloc and free performance. The
> > >> benchmark allocates and frees buffers of random sizes in a random
> > >> order and measures the overall execution time and RSS. Variants of the
> > >> benchmark are run with 8, 32 and 64 threads to measure the effect of
> > >> concurrency on allocator performance.
> > >>
> > >> The random block sizes used follow an inverse square distribution
> > >> which is intended to mimic the behaviour of real applications which
> > >> tend to allocate many more small blocks than large ones.
> > >>
> > >
> > > This test is more likely to measure the locking overhead of random() than
> > > it is to measure malloc performance.
> > 
> > It uses rand_r so I don't think this is the case.
> 
> If you're using rand_r, you need to be careful how you use the output,
> as glibc's rand_r implementation has very poor statistical properties.
> See:
> 
> http://sourceware.org/bugzilla/show_bug.cgi?id=15615
> 
> snip
> 
> > The benchmark code spends roughly 80% of its time within malloc/free
> > and friends, which is good, but does leave some room for improvement.
> > Around 10% of the time is spent in dealing with random number
> > generation so maybe a simple inline random number generator would
> > improve things.
> 
I personally strive for 95-99% of the time in the software-under-test (SUT).
This is much harder than it looks but can and should be done.

The other issue to look out for is gettimeofday/clock_gettime overheads.
You need to run the SUT long enough that the clock reading and
conversion is not a factor in the measurement.
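
A rough sketch of how one might check this, assuming CLOCK_MONOTONIC is
an acceptable clock for the benchmark. The helper names (now_ns,
clock_read_cost_ns, min_region_ns) are made up for illustration and are
not part of the proposed patch. The idea is to time many back-to-back
clock reads, then size the timed region so that the two reads bracketing
it stay below a chosen fraction of the total:

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Current CLOCK_MONOTONIC reading in nanoseconds. */
static uint64_t now_ns(void)
{
    struct timespec ts;
    int rc = clock_gettime(CLOCK_MONOTONIC, &ts);
    assert(rc == 0);
    (void) rc;
    return (uint64_t) ts.tv_sec * 1000000000ull + (uint64_t) ts.tv_nsec;
}

/* Average cost in nanoseconds of one clock read, estimated by timing
   `reads` back-to-back reads (including the conversion to ns). */
static double clock_read_cost_ns(unsigned reads)
{
    assert(reads > 0);
    uint64_t start = now_ns();
    for (unsigned i = 0; i < reads; i++)
        (void) now_ns();
    uint64_t end = now_ns();
    return (double) (end - start) / reads;
}

/* Shortest timed region, in nanoseconds, for which the two clock reads
   around it account for at most `max_fraction` of the measurement. */
static double min_region_ns(double read_cost_ns, double max_fraction)
{
    assert(max_fraction > 0.0);
    return 2.0 * read_cost_ns / max_fraction;
}
```

For example, min_region_ns(clock_read_cost_ns(100000), 0.01) gives a
lower bound on how long each timed batch of malloc/free calls should run
if the clock is to stay under 1% of the measurement.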

> What about just pregenerating a large array of random numbers and
> accessing sequential slots of the array? This potentially has cache
> issues but it might be possible to simply use a small array and wrap
> back to the beginning, perhaps performing a trivial operation like
> adding the last output of the previous run onto the value in the
> array.
> 
This is generally a better design for a micro-benchmark. 
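
One possible reading of Rich's suggestion, as a minimal sketch only. The
pool size, the struct and function names, and the use of xorshift32 to
fill the pool are all my assumptions. The pool is filled once outside the
timed region; each draw reads the next slot, adds the previous output
onto it, stores the result back and wraps at the end of the array. I have
not evaluated the statistical quality of the resulting sequence.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define POOL_SIZE 1024u  /* power of two, so wrapping is a single mask */

struct rand_pool {
    uint32_t slot[POOL_SIZE];
    size_t next;     /* index of the slot the next draw will use */
    uint32_t last;   /* most recent output, folded into the next slot */
};

/* One xorshift32 step. Used only to fill the pool before timing starts. */
static uint32_t xorshift32(uint32_t x)
{
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    return x;
}

/* Fill the pool from a nonzero seed (xorshift32 is stuck at zero). */
static void rand_pool_init(struct rand_pool *p, uint32_t seed)
{
    assert(seed != 0);
    uint32_t x = seed;
    for (size_t i = 0; i < POOL_SIZE; i++) {
        x = xorshift32(x);
        p->slot[i] = x;
    }
    p->next = 0;
    p->last = 0;
}

/* Draw a value: one load, one add, one store and an index update.
   No locks and no library calls in the timed path. */
static inline uint32_t rand_pool_next(struct rand_pool *p)
{
    uint32_t v = p->slot[p->next] + p->last;
    p->slot[p->next] = v;            /* later passes see changed values */
    p->next = (p->next + 1) & (POOL_SIZE - 1);
    p->last = v;
    return v;
}
```

Each benchmark thread would own its own struct rand_pool, so there is no
shared state to contend on, and a 4 KiB pool should stay resident in L1
on most machines.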


