This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.

Re: Summary for the glibc benchmark BoF

From: Ondřej Bílka <neleai at seznam dot cz>
To: Steven Munroe <munroesj at linux dot vnet dot ibm dot com>
Cc: Siddhesh Poyarekar <siddhesh at redhat dot com>, libc-alpha at sourceware dot org
Date: Tue, 18 Aug 2015 21:47:44 +0200
Subject: Re: Summary for the glibc benchmark BoF
Authentication-results: sourceware.org; auth=none
References: <20150818080953 dot GG2415 at spoyarek dot pnq dot redhat dot com> <1439925005 dot 569 dot 13 dot camel at oc7878010663>

On Tue, Aug 18, 2015 at 02:10:05PM -0500, Steven Munroe wrote:
> On Tue, 2015-08-18 at 13:39 +0530, Siddhesh Poyarekar wrote:
> > Here's a summary of what transpired in and around the glibc
> > benchmarking BoF at the Cauldron last week.  Apologies for sending
> > this out late.  The intent of this email is to get things started to
> > hopefully have a deliverable by 2.23 release.
> > 
> > We started with a summary of the current state of benchmarks and
> > defined the two problem statements we wanted to tackle viz. the string
> > and malloc benchmark inputs and whole system benchmarks.
> > 
> > The bigger interest was around whole system benchmarks and we came to
> > the following points of agreement:
> > 
> > - Create a separate project outside the glibc source tree that hosts
> >   installed-tree testing framework and code for glibc along with code
> >   to do whole system benchmarks
> > 
> > - The glibc source tree should have a make target within it that pulls
> >   in the glibc-test project and performs the necessary actions, like
> >   building and running installed-tree tests or system benchmark
> >   framework
> > 
> > - Work on the benchmark framework should focus on the schema of the
> >   output from the benchmark runs and not the technology.  That way, we
> >   allow external tools to run their own benchmarks and submit data for
> >   their workloads.  For example, it could be a JSON file with a
> >   specified format that captures details about the test environment, a
> >   description of the workload being tested and then input and timing
> >   data for functions that are being tested.
> > 
> > - FUTURE: Figure out a way to store the output data and process it
> > 
> > - FUTURE: Get patterns from the data to come up with representative
> >   inputs for the microbenchmarks
> > 
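
To make the JSON idea above concrete, here is a minimal sketch of what one
record could look like. Nothing here was agreed at the BoF: the field names
("environment", "workload", "functions", "input", "time_ns") and the
make_record helper are assumptions for illustration, and the timing values
are placeholders, not measurements.

```python
import json
import platform


def make_record(workload, function, samples):
    """Build one benchmark record. All field names are hypothetical."""
    return {
        # Details about the test environment.
        "environment": {
            "machine": platform.machine(),
            "system": platform.system(),
        },
        # Free-form description of the workload being tested.
        "workload": workload,
        # Input and timing data, keyed by the function under test.
        "functions": {
            function: [{"input": inp, "time_ns": t} for inp, t in samples],
        },
    }


record = make_record(
    workload="example text-processing run",
    function="strlen",
    # Placeholder values only; these are not real measurements.
    samples=[
        ({"length": 16, "alignment": 0}, 12.5),
        ({"length": 256, "alignment": 8}, 40.0),
    ],
)

text = json.dumps(record, indent=2, sort_keys=True)
print(text)
```

Because only the output format is fixed, any external tool that can emit a
file of this shape could submit data, whatever it uses to collect timings.
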
> > Ondrej has volunteered to work on this.  I guess the next steps would
> > be for Ondrej to come up with a first draft and also work with
> > sourceware admins to make a new project namespace for this.
> > 
> > As for the string benchmarks, opinions on whether the current
> > benchmarks are useful don't seem to converge.  I got the impression
> > that folks from IBM were content with using the string benchmarks as a
> > valid input, while Ondrej and a couple of others strongly believe that
> > the benchmarks are not representative.  I lean towards the latter, but
> > I don't have enough background to definitively lean either way.  We
> > concluded in the end that we would just have to wait for someone to
> > come up with some concrete improvement suggestions for these
> > benchmarks.  The outputs from whole system benchmarks may help us
> > build a representative input set for the string microbenchmarks.
> > 
> I don't think IBM is saying that the current benchmarks are complete or
> completely representative. We are saying that current benchmarks are
> what we have, and that the lack of some hypothetical
> "better" benchmark should not be used as an excuse to block a patch.
>
A problem is that without a benchmark, or with misleading benchmarks, you
could end up with code that is a regression without finding that out,
due to the lack of an accurate benchmark.

So you would need to redo those functions anyway once you found
performance problems that could have been caught in the review phase.
 
> I would personally like to see more representative benchmarks based on
> actual usage.
> 
> I would also assert that benchmarks should be split into representative
> (of normal usage) and extreme (for example testing for quadratic
> behavior or only testing for the needle match at the very end of the
> haystack) categories. And clearly labeled as such.
> 
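
The split suggested above could be sketched roughly as follows, with every
input carrying an explicit category label. This is an illustration only: it
times Python's str.find rather than any glibc function, the input set is
made up rather than derived from actual usage data, and the names
make_inputs and run are hypothetical.

```python
import time


def make_inputs(haystack_len=4096):
    """Return (category, name, haystack, needle) tuples. Names are hypothetical."""
    needle = "needle"
    filler = "a" * (haystack_len - len(needle))
    return [
        # Stand-in for a "normal usage" case: short haystack, early match.
        ("representative", "short-early-match",
         "a " + needle + " in some text", needle),
        # Extreme: the only match is at the very end of a long haystack.
        ("extreme", "match-at-end", filler + needle, needle),
        # Extreme: many near-misses, a classic trigger for quadratic
        # behavior in naive search implementations.
        ("extreme", "near-miss-prefixes", "a" * haystack_len, "a" * 32 + "b"),
    ]


def run(inputs, repeats=100):
    """Time each input and keep its category label next to the result."""
    results = []
    for category, name, haystack, needle in inputs:
        start = time.perf_counter_ns()
        for _ in range(repeats):
            pos = haystack.find(needle)
        elapsed = time.perf_counter_ns() - start
        results.append({
            "category": category,
            "name": name,
            "position": pos,
            "mean_ns": elapsed / repeats,
        })
    return results


results = run(make_inputs())
for r in results:
    print(f'{r["category"]:>14}  {r["name"]:<20} {r["mean_ns"]:.0f} ns')
```

Keeping the label in the output means a reader comparing two
implementations can tell at a glance whether a difference shows up in the
representative cases or only in the extreme ones.
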
> 
> > Those present at the BoF, please add to this or make corrections if
> > you think I'm misremembering any points.
> > 
> > Thanks,
> > Siddhesh
> >
