This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: Summary for the glibc benchmark BoF
- From: Steven Munroe <munroesj at linux dot vnet dot ibm dot com>
- To: Ondřej Bílka <neleai at seznam dot cz>
- Cc: Siddhesh Poyarekar <siddhesh at redhat dot com>, libc-alpha at sourceware dot org
- Date: Tue, 18 Aug 2015 15:13:25 -0500
- Subject: Re: Summary for the glibc benchmark BoF
- Authentication-results: sourceware.org; auth=none
- References: <20150818080953 dot GG2415 at spoyarek dot pnq dot redhat dot com> <1439925005 dot 569 dot 13 dot camel at oc7878010663> <20150818194744 dot GA6194 at domone>
- Reply-to: munroesj at linux dot vnet dot ibm dot com
On Tue, 2015-08-18 at 21:47 +0200, Ondřej Bílka wrote:
> On Tue, Aug 18, 2015 at 02:10:05PM -0500, Steven Munroe wrote:
> > On Tue, 2015-08-18 at 13:39 +0530, Siddhesh Poyarekar wrote:
> > > Here's a summary of what transpired in and around the glibc
> > > benchmarking BoF at the Cauldron last week. Apologies for sending
> > > this out late. The intent of this email is to get things started to
> > > hopefully have a deliverable by 2.23 release.
> > >
> > > We started with a summary of the current state of benchmarks and
> > > defined the two problem statements we wanted to tackle viz. the string
> > > and malloc benchmark inputs and whole system benchmarks.
> > >
> > > The bigger interest was around whole system benchmarks and we came to
> > > the following points of agreement:
> > >
> > > - Create a separate project outside the glibc source tree that hosts
> > > installed-tree testing framework and code for glibc along with code
> > > to do whole system benchmarks
> > >
> > > - The glibc source tree should have a make target within it that pulls
> > > in the glibc-test project and performs the necessary actions, like
> > > building and running installed-tree tests or system benchmark
> > > framework
> > >
> > > - Work on the benchmark framework should focus on the schema of the
> > > output from the benchmark runs and not the technology. That way, we
> > > allow external tools to run their own benchmarks and submit data for
> > > their workloads. For example, it could be a JSON file with a
> > > specified format that captures details about the test environment, a
> > > description of the workload being tested and then input and timing
> > > data for functions that are being tested.
> > >
> > > - FUTURE: Figure out a way to store the output data and process it
> > >
> > > - FUTURE: Get patterns from the data to come up with representative
> > > inputs for the microbenchmarks
> > >
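The JSON output format suggested above might, as a strawman, look something like this (every field name here is illustrative, not an agreed schema):

```json
{
  "environment": {
    "machine": "x86_64",
    "glibc_version": "2.22",
    "kernel": "4.1.0"
  },
  "workload": "description of the workload being tested",
  "functions": {
    "memcpy": {
      "inputs": [ { "length": 64, "align_src": 0, "align_dst": 8 } ],
      "timings_ns": [ 12.4, 11.9, 12.1 ]
    }
  }
}
```

Keeping the schema (rather than the tool) as the contract would let any external benchmark emit a file like this for submission.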
> > > Ondrej has volunteered to work on this. I guess the next steps would
> > > be for Ondrej to come up with a first draft and also work with
> > > sourceware admins to make a new project namespace for this.
> > >
> > > As for the string benchmarks, opinions on whether the current
> > > benchmarks are useful don't seem to converge. I got the impression
> > > that folks from IBM were content with using the string benchmarks as a
> > > valid input, while Ondrej and a couple of others strongly believe that
> > > the benchmarks are not representative. I lean towards the latter, but
> > > I don't have enough background to definitively lean either way. We
> > > concluded in the end that we would just have to wait for someone to
> > > come up with some concrete improvement suggestions for these
> > > benchmarks. The outputs from whole system benchmarks may help us
> > > build a representative input set for the string microbenchmarks.
> > >
> > I don't think IBM is saying that the current benchmarks are complete or
> > completely representative. We are saying that the current benchmarks are
> > what we have, and that the lack of some hypothetical "better" benchmark
> > should not be used as an excuse to block a patch.
> >
> A problem is that without a benchmark, or with misleading benchmarks, you
> could end up with code that is a regression without finding that out, due
> to the lack of an accurate benchmark.
>
Then write a better benchmark so that we have it for the next release.
Don't try to block an improvement just because it is not (in your
opinion) perfect.
> So you would need to redo those functions anyway when you find
> performance problems that could have been caught in the review phase.
>
Performance is a process, not an event.
> > I would personally like to see more representative benchmarks based on
> > actual usage.
> >
> > I would also assert that benchmarks should be split into representative
> > (of normal usage) and extreme (for example testing for quadratic
> > behavior or only testing for the needle match at the very end of the
> > haystack) categories. And clearly labeled as such.
> >
> >
> > > Those present at the BoF, please add to this or make corrections if
> > > you think I'm misremembering any points.
> > >
> > > Thanks,
> > > Siddhesh
> > >
>