This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: [PATCH] Inline C99 math functions
- From: Ondřej Bílka <neleai at seznam dot cz>
- To: Carlos O'Donell <carlos at redhat dot com>
- Cc: Adhemerval Zanella <adhemerval dot zanella at linaro dot org>, libc-alpha at sourceware dot org
- Date: Thu, 9 Jul 2015 17:02:44 +0200
- Subject: Re: [PATCH] Inline C99 math functions
- References: <alpine dot DEB dot 2 dot 10 dot 1506151431490 dot 26683 at digraph dot polyomino dot org dot uk> <001701d0a789$f2ab86f0$d80294d0$ at com> <20150615185201 dot GA3023 at domone> <alpine dot DEB dot 2 dot 10 dot 1506152127340 dot 9772 at digraph dot polyomino dot org dot uk> <20150616050045 dot GA8021 at domone> <55801706 dot 4010109 at linaro dot org> <20150616134331 dot GA7016 at domone> <55807EFB dot 8090702 at redhat dot com> <20150703084056 dot GD32307 at domone> <55969C28 dot 5060803 at redhat dot com>
On Fri, Jul 03, 2015 at 10:28:56AM -0400, Carlos O'Donell wrote:
> On 07/03/2015 04:40 AM, Ondřej Bílka wrote:
> > On Tue, Jun 16, 2015 at 03:54:35PM -0400, Carlos O'Donell wrote:
> >> On 06/16/2015 09:43 AM, Ondřej Bílka wrote:
> >>>> So to move this proposal forward, how exactly do you propose to
> >>>> create a benchtest for such a scenario? I get that this is tricky and
> >>>> a lot of variables may apply, but I do agree with Joseph that we
> >>>> shouldn't aim for strictly optimal performance; imho using compiler
> >>>> builtins with reasonable performance is a gain in code maintainability.
> >>>>
> >>> As I said before, these are hard to measure, and I could also
> >>> argue against my own benchmark that it's inaccurate, as it doesn't
> >>> measure the effect on the CPU pipeline when the function does other
> >>> computation. The answer is: don't do a microbenchmark.
> >>
> >> That's not an answer, an answer is "Here's a patch to extend the libm testing
> >> to show how isinf/isnan/signbit/isfinite/isnormal/fpclassify impact performance."
> >>
> > No, it is an answer, as it isn't my responsibility to provide a
> > benchmark to convince people that a change is desirable, but the
> > submitter's. As I said before, he should for example add catan to
> > benchtest, measure the difference and report that. If necessary,
> > increase the iteration count to catch the difference. That is
> > necessary anyway if we want to measure microoptimizations that improve
> > performance by several cycles.
>
> I never said it was your responsibility. It is the responsibility of the person
> submitting the patch to provide an objective description of how they verified
> the performance gains. It has become standard practice to recommend the author
> of the patch contribute a microbenchmark, but it need not be a microbenchmark.
> The author does need to provide sufficient performance justification including
> the methods used to measure the performance gains to ensure the results can
> be reproduced.
>
Correct. There are plenty of things that could go wrong here, so I need
to point that out.
The only benchmark whose result I would be certain of is to take one of
the numerical applications mentioned before, where the is* functions are
a bottleneck, and measure performance before and after.
> In this case if you believe catan can be used to test the performance in this
> patch, please suggest this to Wilco. However, I will not accept performance
> patches without justification for the performance gains.
>
I already wrote about that, to get a more precise answer.
> > Carlos, you talk a lot about deciding objectively, but when I ask you
> > to, it's never done. So I will ask you again to decide based on my
> > previous benchmark. There, sometimes the builtin is 20% faster and
> > sometimes the current inline is 20% faster. How do you imagine that
> > experts would decide solely on that, instead of telling you that it's
> > inconclusive and you need to do real-world measurements, or that the
> > benchmark is flawed because X?
>
> My apologies if I failed to do something you requested.
>
> I'm not interested in abstract examples, I'm interested in the patch being
> submitted by ARM. I will continue this discussion on the downstream thread
> that includes the microbenchmark written by Wilco.
>
> In general I expect there are certainly classes of performance problems
> that have inconclusive results. In which case I will reject that patch until
> you tell me how you measure the performance gains and what you expect the
> performance gains to be on average.
>
The problem is that most functions have inconclusive results, as we use
mined cases where it's possible and must rely on assumptions about the
input distribution to get a speedup.
The simplest example is that in string functions we assume that a
64-byte access crosses a page boundary only rarely, and the list of
these assumptions goes on and on. Then, when a microbenchmark violates
one of these because it is simplistic, what matters is whether the
change gives a real speedup of programs, not what the microbenchmark
shows.
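
To make the page-crossing assumption concrete, here is a minimal sketch
in C. It is not glibc's code: the 4096-byte page size, the 64-byte
window, and the names crosses_page and use_fast_path are all assumptions
chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed page size for illustration; real code would not hard-code it
   and could query it, e.g. with sysconf(_SC_PAGESIZE). */
#define PAGE_SIZE 4096u

/* Return 1 if reading n bytes starting at addr would touch the
   following page, 0 if the whole read stays inside one page. */
static int crosses_page(uintptr_t addr, size_t n)
{
    assert(n > 0 && n <= PAGE_SIZE);
    return (addr % PAGE_SIZE) + n > PAGE_SIZE;
}

/* Hypothetical dispatch: a wide 64-byte read is only safe when it
   cannot run into a possibly unmapped next page.  Returns 1 when the
   fast path may be used and 0 when a slower fallback is needed. */
static int use_fast_path(const char *s)
{
    return !crosses_page((uintptr_t)s, 64);
}
```

With these assumed numbers, only 63 of the 4096 possible page offsets
send a call to the slow path, about 1.5% if start addresses were spread
uniformly. A microbenchmark whose inputs all start in the last 63 bytes
of a page would exercise only the slow path, so it would measure
something real programs rarely do.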