This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: [PATCH v2] benchtests: Add malloc microbenchmark
- From: Ondřej Bílka <neleai at seznam dot cz>
- To: Will Newton <will dot newton at linaro dot org>
- Cc: Siddhesh Poyarekar <siddhesh at redhat dot com>, libc-alpha <libc-alpha at sourceware dot org>
- Date: Tue, 10 Jun 2014 18:48:08 +0200
- Subject: Re: [PATCH v2] benchtests: Add malloc microbenchmark
- References: <1397737835-15868-1-git-send-email-will dot newton at linaro dot org> <20140530094508 dot GQ12497 at spoyarek dot pnq dot redhat dot com> <CANu=Dmh=pdfj088kHQ9-eqaFmXKQN=bkQMjrC851EHjc_G3sPg at mail dot gmail dot com> <20140609163753 dot GI24899 at spoyarek dot pnq dot redhat dot com> <20140609203326 dot GA5396 at domone dot podge> <CANu=DmhwQw69ze5f9bsOQoETK01+H65q8oA0LnDhigYBcrbQ2w at mail dot gmail dot com>
On Tue, Jun 10, 2014 at 08:47:36AM +0100, Will Newton wrote:
> On 9 June 2014 21:33, Ondřej Bílka <neleai@seznam.cz> wrote:
> > On Mon, Jun 09, 2014 at 10:07:53PM +0530, Siddhesh Poyarekar wrote:
> >> On Mon, Jun 09, 2014 at 04:14:35PM +0100, Will Newton wrote:
> >> > > A maximum of 32K only tests arena allocation performance. This is
> >> > > fine for now since malloc+mmap performance is as interesting. What is
> >> <snip>
> >> >
> >> > There are at least two axes we are interested in: how performance
> >> > scales with the number of threads and how performance scales with the
> >> > allocation size. For thread performance (which this benchmark is
> >> > about) the larger allocations are not so interesting: typically their
> >> > locking overhead is in the kernel rather than in userland, and in terms
> >> > of real-world application performance it's just not as likely to be a
> >> > bottleneck as small allocations. We have to be pragmatic about which
> >> > choices we make, as the full matrix of threads versus allocation sizes
> >> > would be pretty huge.
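To put a rough number on "pretty huge", here is a small sketch that counts the runs in a power-of-two sweep over both axes. The ranges in the example (1 to 32 threads, 16 bytes up to the 32K maximum mentioned above) and the helper names `pow2_steps` and `sweep_runs` are assumptions for illustration, not anything defined by the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: number of values LO, 2*LO, 4*LO, ... that are <= HI.  */
static size_t
pow2_steps (size_t lo, size_t hi)
{
  assert (lo > 0);
  size_t n = 0;
  for (size_t v = lo; v <= hi; v *= 2)
    {
      n++;
      if (v > hi / 2)   /* stop before doubling could overflow or exceed HI */
        break;
    }
  return n;
}

/* Hypothetical: runs needed for the full thread-count x allocation-size
   matrix, with thread counts 1, 2, 4, ... up to MAX_THREADS.  */
static size_t
sweep_runs (size_t max_threads, size_t min_size, size_t max_size)
{
  return pow2_steps (1, max_threads) * pow2_steps (min_size, max_size);
}
```

`sweep_runs (32, 16, 32768)` gives 72 configurations (6 thread counts times 12 sizes), and that is before the repetitions each configuration needs to average out run-to-run variance.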
> >>
> >> Heh, I noticed my typo now - I meant to say that malloc+mmap
> >> performance is *not* as interesting :)
> >>
> > The problem is that this benchmark does not measure multithreaded
> > performance well. Just spawning many threads does not say much; my guess
> > is that locking will quickly cause convergence to a state where each
> > core runs a thread with a separate arena. It also does not measure the
> > hard case where you allocate memory in one thread.
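A minimal sketch of one reading of that hard case, assuming it means memory allocated in one thread and freed in another (the message does not spell this out). The names `handoff`, `producer`, `consumer` and `cross_thread_round` are hypothetical and not part of glibc's benchtests.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical hand-off buffer shared by the two threads.  */
struct handoff
{
  void **blocks;
  size_t count;
  size_t size;
};

/* First thread: performs every malloc.  */
static void *
producer (void *arg)
{
  struct handoff *h = arg;
  for (size_t i = 0; i < h->count; i++)
    h->blocks[i] = malloc (h->size);
  return NULL;
}

/* Second thread: frees chunks that a different thread allocated.  */
static void *
consumer (void *arg)
{
  struct handoff *h = arg;
  for (size_t i = 0; i < h->count; i++)
    free (h->blocks[i]);
  return NULL;
}

/* Allocates COUNT blocks of SIZE bytes in one thread and frees them in
   another.  Returns 0 on success, -1 if a thread could not be created.  */
static int
cross_thread_round (size_t count, size_t size)
{
  assert (count > 0);
  struct handoff h = { calloc (count, sizeof (void *)), count, size };
  pthread_t t;
  int rc = 0;

  if (h.blocks == NULL)
    return -1;
  if (pthread_create (&t, NULL, producer, &h) != 0)
    {
      free (h.blocks);
      return -1;
    }
  pthread_join (t, NULL);

  if (pthread_create (&t, NULL, consumer, &h) != 0)
    {
      consumer (&h);   /* fall back: free in the calling thread instead */
      rc = -1;
    }
  else
    pthread_join (t, NULL);

  free (h.blocks);
  return rc;
}
```

Here the two phases run one after the other, so the sketch only isolates the cost of freeing chunks allocated elsewhere; a fuller benchmark would overlap them through a queue and time each side with `clock_gettime`.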
> >
> > I looked at the multithreaded benchmark and it has additional flaws:
> >
> > Big variance: running time varies by around 10% across iterations,
> > depending on how the kernel schedules the threads. Running threads and
> > measuring the time after you join them measures the slowest thread, so
> > at the end some cores are idle.
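A minimal sketch of an alternative, assuming each thread times only its own loop with `clock_gettime (CLOCK_MONOTONIC, ...)` rather than the main thread timing everything up to the last `pthread_join`. `worker`, `run_timed_threads` and `thread_spread` are hypothetical names; the spread between the fastest and slowest thread makes the idle-core effect visible instead of folding it into a single number.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical per-thread state; each worker records its own elapsed time.  */
struct worker
{
  size_t iters;        /* malloc/free pairs to perform */
  size_t size;         /* allocation size in bytes */
  double elapsed_ns;   /* written by the worker itself */
};

static double
now_ns (void)
{
  struct timespec ts;
  clock_gettime (CLOCK_MONOTONIC, &ts);
  return (double) ts.tv_sec * 1e9 + (double) ts.tv_nsec;
}

static void *
worker_run (void *arg)
{
  struct worker *w = arg;
  double start = now_ns ();
  for (size_t i = 0; i < w->iters; i++)
    {
      void *volatile p = malloc (w->size);  /* volatile keeps the call */
      free (p);
    }
  w->elapsed_ns = now_ns () - start;
  return NULL;
}

/* Runs NTHREADS workers and stores each thread's own elapsed time in
   OUT_NS[i].  Returns 0 on success, -1 on failure.  */
static int
run_timed_threads (size_t nthreads, size_t iters, size_t size, double *out_ns)
{
  struct worker *w = calloc (nthreads, sizeof *w);
  pthread_t *tids = calloc (nthreads, sizeof *tids);
  size_t started = 0;
  int rc = 0;

  if (w == NULL || tids == NULL)
    {
      free (w);
      free (tids);
      return -1;
    }
  for (size_t i = 0; i < nthreads; i++)
    {
      w[i].iters = iters;
      w[i].size = size;
      if (pthread_create (&tids[i], NULL, worker_run, &w[i]) != 0)
        {
          rc = -1;
          break;
        }
      started++;
    }
  for (size_t i = 0; i < started; i++)
    pthread_join (tids[i], NULL);
  for (size_t i = 0; rc == 0 && i < nthreads; i++)
    out_ns[i] = w[i].elapsed_ns;
  free (w);
  free (tids);
  return rc;
}

/* (slowest - fastest) / slowest: 0 means perfectly balanced threads.  */
static double
thread_spread (const double *ns, size_t n)
{
  assert (n > 0);
  double lo = ns[0], hi = ns[0];
  for (size_t i = 1; i < n; i++)
    {
      if (ns[i] < lo)
        lo = ns[i];
      if (ns[i] > hi)
        hi = ns[i];
    }
  return hi > 0.0 ? (hi - lo) / hi : 0.0;
}
```

Reporting the per-thread times (or their spread) alongside the join-to-join wall time lets a reader tell scheduling imbalance apart from a genuine slowdown in malloc.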
>
> Thanks for the suggestion, I will look into this.
>
> > Bad units: when I run the benchmark with one thread, the mean is:
> > "mean": 91.605,
> > However, when we run 32 threads, it looks as if malloc becomes
> > around three times faster:
> > "mean": 28.5883,
>
> What is wrong with that? I assume you have a multi-core system; would
> you not expect more threads to have higher throughput?
>
It says "mean", which reads as the mean execution time of the function,
not throughput. You are more interested in the overhead caused by
parallelism than in throughput, which you would need to divide by the
number of cores to interpret.
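A small sketch of reporting these quantities separately, assuming the harness knows the wall-clock time of the whole run, the sum of the per-thread times, and the total number of calls. The struct and function names are hypothetical, and this is not a claim about how the patch computes its "mean".

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical report separating latency from throughput.  */
struct malloc_report
{
  double ns_per_call;       /* mean cost of one call, from per-thread times */
  double wall_ns_per_call;  /* wall time / total calls: shrinks with threads */
  double calls_per_sec;     /* aggregate throughput across all threads */
};

/* WALL_NS: elapsed wall-clock time of the whole run.
   THREAD_NS_SUM: sum of each thread's own elapsed time.
   TOTAL_CALLS: calls executed across all threads.  */
static struct malloc_report
make_report (double wall_ns, double thread_ns_sum, size_t total_calls)
{
  assert (wall_ns > 0.0 && total_calls > 0);
  struct malloc_report r;
  r.ns_per_call = thread_ns_sum / (double) total_calls;
  r.wall_ns_per_call = wall_ns / (double) total_calls;
  r.calls_per_sec = (double) total_calls / (wall_ns / 1e9);
  return r;
}
```

With 4 threads that each spend 100 µs on 1000 calls in parallel, `make_report (100000.0, 400000.0, 4000)` gives 100 ns per call but only 25 ns of wall time per call: the cost of each call is unchanged, yet the wall-time figure suggests a fourfold speedup.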