This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: [PATCH v2] benchtests: Add malloc microbenchmark


On Tue, Jun 10, 2014 at 08:47:36AM +0100, Will Newton wrote:
> On 9 June 2014 21:33, Ondřej Bílka <neleai@seznam.cz> wrote:
> > On Mon, Jun 09, 2014 at 10:07:53PM +0530, Siddhesh Poyarekar wrote:
> >> On Mon, Jun 09, 2014 at 04:14:35PM +0100, Will Newton wrote:
> >> > > A maximum of 32K only tests arena allocation performance.  This is
> >> > > fine for now since malloc+mmap performance is as interesting.  What is
> >> <snip>
> >> >
> >> > There's at least two axes we are interested in - how performance
> >> > scales with the number of threads and how performance scales with the
> >> > allocation size. For thread performance (which this benchmark is
> >> > about) the larger allocations are not so interesting - typically their
> >> > locking overhead is in the kernel rather than userland and in terms of
> >> > real world application performance it's just not as likely to be a
> >> > bottleneck as small allocations. We have to be pragmatic in which
> >> > choices we make as the full matrix of threads versus allocation sizes
> >> > would be pretty huge.
> >>
> >> Heh, I noticed my typo now - I meant to say that malloc+mmap
> >> performance is *not* as interesting :)
> >>
> > The problem is that this benchmark does not measure multithreaded
> > performance well. Just spawning many threads does not say much; my guess
> > is that locking will quickly cause convergence to a state where each
> > core runs a thread with a separate arena. Also it does not measure the
> > hard case when you allocate memory in one thread.
> >
> > I looked at the multithreaded benchmark and it has additional flaws:
> >
> > Big variance: running time varies by around 10% across iterations,
> > depending on how the kernel schedules the threads. Running threads and
> > measuring time after you join them measures the slowest thread, so at
> > the end some cores are idle.
> 
> Thanks for the suggestion, I will look into this.
> 
> > Bad units: when I run the benchmark with one thread the mean is:
> > "mean": 91.605,
> > However when we run 32 threads it looks as if malloc got around three
> > times faster:
> >  "mean": 28.5883,
> 
> What is wrong with that? I assume you have a multi-core system, would
> you not expect more threads to have higher throughput?
> 
The field is labelled "mean", which reads as the mean execution time of the
function, not as throughput. What matters here is the overhead caused by
parallelism, not a throughput figure that you then have to divide by the
number of cores.

