This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: [PATCH v2] benchtests: Add malloc microbenchmark


On Mon, Jun 09, 2014 at 04:14:35PM +0100, Will Newton wrote:
> > A maximum of 32K only tests arena allocation performance.  This is
> > fine for now since malloc+mmap performance is as interesting.  What is
<snip>
> 
> There are at least two axes we are interested in - how performance
> scales with the number of threads and how performance scales with the
> allocation size. For thread performance (which this benchmark is
> about) the larger allocations are not so interesting - typically their
> locking overhead is in the kernel rather than userland and in terms of
> real world application performance it's just not as likely to be a
> bottleneck as small allocations. We have to be pragmatic in which
> choices we make as the full matrix of threads versus allocation sizes
> would be pretty huge.

Heh, I noticed my typo now - I meant to say that malloc+mmap
performance is *not* as interesting :)

> So I guess I should probably also write a benchmark for allocation
> size for glibc as well...

Yes, it would be a separate benchmark and probably would need some
specific allocation patterns rather than random sizes.  Of course
choosing allocation patterns is not going to be easy.

> > Mark as const.
> 
> Ok, although I don't believe it affects code generation.

Right, it's just pedantry.

> > I don't know how useful max_rss would be since we're only doing a
> > malloc and never really writing anything to the allocated memory.
> > Smaller sizes will probably result in actual page allocation since we
> > write to the chunk headers, but probably not so for larger sizes.
> 
> Yes, it is slightly problematic. What you probably want to do is
> zero all the memory and measure RSS at that point but it would slow
> down the benchmark and spend lots of time in memset instead. At the
> moment it tells you how many pages are taken up by book-keeping but
> not how many of those pages your application would touch anyway.

Oh I didn't mean to imply that we zero pages and try to get a more
accurate RSS value.  My point was that we could probably just do away
with it completely because it doesn't really tell us much - I can't
see how pages taken up by book-keeping would be useful.
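
To illustrate the gap between allocating memory and touching it, here
is a minimal sketch and not part of the patch under review.  The
helper names peak_rss_kb and rss_growth_kb are hypothetical, and it
assumes Linux, where getrusage reports ru_maxrss in kilobytes.

```c
#include <stddef.h>
#include <stdlib.h>
#include <sys/resource.h>
#include <unistd.h>

/* Peak resident set size of this process as reported by getrusage
   (kilobytes on Linux), or -1 on failure.  */
long
peak_rss_kb (void)
{
  struct rusage usage;

  if (getrusage (RUSAGE_SELF, &usage) != 0)
    return -1;
  return usage.ru_maxrss;
}

/* Hypothetical helper: allocate SIZE bytes, then write one byte per
   page.  Stores the growth in peak RSS after the malloc alone in
   *ALLOC_GROWTH and after the writes in *TOUCH_GROWTH.  Returns 0 on
   success, -1 on failure.  */
int
rss_growth_kb (size_t size, long *alloc_growth, long *touch_growth)
{
  long page = sysconf (_SC_PAGESIZE);
  long before = peak_rss_kb ();

  if (page <= 0 || before < 0)
    return -1;

  /* The volatile qualifier keeps the compiler from dropping the writes.  */
  volatile char *p = malloc (size);
  if (p == NULL)
    return -1;
  long after_alloc = peak_rss_kb ();

  for (size_t i = 0; i < size; i += (size_t) page)
    p[i] = 1;
  long after_touch = peak_rss_kb ();

  free ((void *) p);
  if (after_alloc < 0 || after_touch < 0)
    return -1;

  *alloc_growth = after_alloc - before;
  *touch_growth = after_touch - before;
  return 0;
}
```

For a large SIZE one would expect *ALLOC_GROWTH to stay close to zero
and *TOUCH_GROWTH to approach SIZE, which is why max_rss measured
after a bare malloc says little about what an application would use.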

However if you do want to show resource usage, then address space
usage (VSZ) might show scary numbers due to the per-thread arenas, but
they would be much more representative.  Also, it might be useful to
see how address space usage scales with threads, especially for
32-bit.
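
One possible way to sample address space usage, sketched under the
assumption of a Linux system with /proc mounted (the first field of
/proc/self/statm is the total program size in pages).  The name
vsz_kb is hypothetical and the patch does not contain this code.

```c
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: virtual size (VSZ) of the calling process in
   kilobytes, read from /proc/self/statm.  Linux-specific.  Returns -1
   on failure.  Kilobytes are used so the result fits in a long on
   32-bit targets.  */
long
vsz_kb (void)
{
  long pages;
  long page_size = sysconf (_SC_PAGESIZE);
  FILE *f = fopen ("/proc/self/statm", "r");

  if (f == NULL)
    return -1;

  /* The first field is the total program size, counted in pages.  */
  int matched = fscanf (f, "%ld", &pages);
  fclose (f);

  if (matched != 1 || page_size <= 0)
    return -1;
  return pages * (page_size / 1024);
}
```

Sampling this once before the worker threads start and once after
they finish, for each thread count, would give a rough picture of how
address space usage grows with the number of threads.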

> No I haven't looked into that, so far I have been treating malloc as a
> black box and I'm hoping not to tailor the benchmark too far to one
> implementation or another.

I agree that the benchmark should not be tailored to the current
implementation, but then this behaviour would essentially be another
set of inputs.  Simply increasing the maximum size from 32K to about
128K (that's the initial threshold for mmap anyway) might result in
that behaviour being triggered more frequently.
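
As a rough sketch of what a raised upper bound could look like when
picking random sizes: the names MIN_BLOCK_SIZE, MAX_BLOCK_SIZE,
next_random and random_block_size are hypothetical, the lower bound of
4 is an arbitrary choice, and the xorshift generator is only a
stand-in for whatever random source the benchmark actually uses.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed bounds: the upper one matches the initial mmap threshold
   of 128K mentioned above, the lower one is arbitrary.  */
#define MIN_BLOCK_SIZE 4
#define MAX_BLOCK_SIZE (128 * 1024)

/* xorshift32 pseudo-random generator.  *STATE must be nonzero; each
   thread can keep its own state so that no locking is needed.  */
uint32_t
next_random (uint32_t *state)
{
  uint32_t x = *state;

  x ^= x << 13;
  x ^= x >> 17;
  x ^= x << 5;
  *state = x;
  return x;
}

/* Return a block size in [MIN_BLOCK_SIZE, MAX_BLOCK_SIZE].  The
   modulo introduces a slight bias, which is ignored in this sketch.  */
size_t
random_block_size (uint32_t *state)
{
  uint32_t range = MAX_BLOCK_SIZE - MIN_BLOCK_SIZE + 1;

  return MIN_BLOCK_SIZE + next_random (state) % range;
}
```

With a uniform distribution only the requests right at the top of the
range would be large enough to reach the threshold, so a real
benchmark may want to weight the size distribution differently.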

> I'll rework the patches and hopefully get a graphing script to go
> with it...

Thanks!  I have marked this patch as Accepted in patchwork as I think
it could go in as an initial revision for the test with nits fixed, so
you can push the benchmark and then work on improvements to it.  Or
you can do your improvements and post a new version - your choice.

Siddhesh


