
Re: Summary for the glibc benchmark BoF


On Tue, 2015-08-18 at 21:47 +0200, Ondřej Bílka wrote:
> On Tue, Aug 18, 2015 at 02:10:05PM -0500, Steven Munroe wrote:
> > On Tue, 2015-08-18 at 13:39 +0530, Siddhesh Poyarekar wrote:
> > > Here's a summary of what transpired in and around the glibc
> > > benchmarking BoF at the Cauldron last week.  Apologies for sending
> > > this out late.  The intent of this email is to get things started to
> > > hopefully have a deliverable by the 2.23 release.
> > > 
> > > We started with a summary of the current state of benchmarks and
> > > defined the two problem statements we wanted to tackle viz. the string
> > > and malloc benchmark inputs and whole system benchmarks.
> > > 
> > > The bigger interest was around whole system benchmarks and we came to
> > > the following points of agreement:
> > > 
> > > - Create a separate project outside the glibc source tree that hosts
> > >   installed-tree testing framework and code for glibc along with code
> > >   to do whole system benchmarks
> > > 
> > > - The glibc source tree should have a make target within it that pulls
> > >   in the glibc-test project and performs the necessary actions, like
> > >   building and running installed-tree tests or system benchmark
> > >   framework
> > > 
> > > - Work on the benchmark framework should focus on the schema of the
> > >   output from the benchmark runs and not the technology.  That way, we
> > >   allow external tools to run their own benchmarks and submit data for
> > >   their workloads.  For example, it could be a JSON file with a
> > >   specified format that captures details about the test environment, a
> > >   description of the workload being tested and then input and timing
> > >   data for functions that are being tested.
> > > 
> > > - FUTURE: Figure out a way to store the output data and process it
> > > 
> > > - FUTURE: Get patterns from the data to come up with representative
> > >   inputs for the microbenchmarks
> > > 
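(To make the schema point above concrete, here is a minimal sketch of
what one record in such a JSON output could look like.  The field names
and values are purely illustrative, not a proposed format:

    {
      "env":      { "cpu": "POWER8", "glibc": "2.22", "kernel": "4.1" },
      "workload": "web server string handling",
      "function": "memcpy",
      "inputs":   [ { "size": 64, "align": [0, 0], "calls": 123456 } ],
      "timings":  { "mean_ns": 12.3, "max_ns": 40.1 }
    }

Whatever the exact fields turn out to be, agreeing on the schema rather
than on a particular harness is what would let external tools submit
data for their own workloads in the same format.)
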
> > > Ondrej has volunteered to work on this.  I guess the next steps would
> > > be for Ondrej to come up with a first draft and also work with
> > > sourceware admins to make a new project namespace for this.
> > > 
> > > As for the string benchmarks, opinions on whether the current
> > > benchmarks are useful don't seem to converge.  I got the impression
> > > that folks from IBM were content with using the string benchmarks as a
> > > valid input, while Ondrej and a couple of others strongly believe that
> > > the benchmarks are not representative.  I lean towards the latter, but
> > > I don't have enough background to definitively lean either way.  We
> > > concluded in the end that we would just have to wait for someone to
> > > come up with some concrete improvement suggestions for these
> > > benchmarks.  The outputs from whole system benchmarks may help us
> > > build a representative input set for the string microbenchmarks.
> > > 
> > I don't think IBM is saying that the current benchmarks are complete or
> > completely representative. We are saying that the current benchmarks are
> > what we have, and that the lack of some hypothetical "better" benchmark
> > should not be used as an excuse to block a patch.
> >
> A problem is that without a benchmark, or with a misleading one, you
> could end up with code that is a regression and never find out, because
> there is no accurate benchmark to catch it.
> 
Then write a better benchmark so that we have it for the next release.

Don't try to block an improvement just because it is not (in your
opinion) perfect.

> So you would need to redo those functions anyway when you find
> performance problems that could have been caught in the review phase.
> 
Performance is a process, not an event.


> > I would personally like to see more representative benchmarks based on
> > actual usage.
> > 
> > I would also assert that benchmarks should be split into representative
> > (of normal usage) and extreme (for example testing for quadratic
> > behavior or only testing for the needle match at the very end of the
> > haystack) categories, and clearly labeled as such.
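
To make the "extreme" category concrete, this is roughly the kind of
adversarial strstr input such a test might construct (the sizes and the
timing harness here are only a sketch, not an existing benchmark):

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <time.h>

    int main (void)
    {
      /* Haystack of repeated 'a' with the only match at the very end;
         a naive quadratic strstr does about needle_len comparisons at
         every haystack position on input like this.  */
      size_t hay_len = 1 << 20;   /* illustrative sizes only */
      size_t ndl_len = 1 << 10;
      char *hay = malloc (hay_len + 1);
      char *ndl = malloc (ndl_len + 1);
      memset (hay, 'a', hay_len);
      memset (ndl, 'a', ndl_len);
      hay[hay_len - 1] = 'b';     /* match only at the end of the haystack */
      ndl[ndl_len - 1] = 'b';
      hay[hay_len] = '\0';
      ndl[ndl_len] = '\0';

      clock_t t0 = clock ();
      char *p = strstr (hay, ndl);
      clock_t t1 = clock ();
      printf ("match offset: %ld, time: %.3f s\n",
              p ? (long) (p - hay) : -1L,
              (double) (t1 - t0) / CLOCKS_PER_SEC);
      free (hay);
      free (ndl);
      return 0;
    }

A representative case, by contrast, would use needle and haystack sizes
and match positions taken from observed call patterns, which is exactly
what the whole system benchmark data should eventually give us.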
> > 
> > 
> > > Those present at the BoF, please add to this or make corrections if
> > > you think I'm misremembering any points.
> > > 
> > > Thanks,
> > > Siddhesh
> > > 
> 


