This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: [PATCH] Inline C99 math functions
- From: Ondřej Bílka <neleai at seznam dot cz>
- To: Carlos O'Donell <carlos at redhat dot com>
- Cc: Adhemerval Zanella <adhemerval dot zanella at linaro dot org>, libc-alpha at sourceware dot org
- Date: Fri, 3 Jul 2015 10:40:56 +0200
- Subject: Re: [PATCH] Inline C99 math functions
- Authentication-results: sourceware.org; auth=none
- References: <001201d0a75b$921d9860$b658c920$ at com> <alpine dot DEB dot 2 dot 10 dot 1506151431490 dot 26683 at digraph dot polyomino dot org dot uk> <001701d0a789$f2ab86f0$d80294d0$ at com> <20150615185201 dot GA3023 at domone> <alpine dot DEB dot 2 dot 10 dot 1506152127340 dot 9772 at digraph dot polyomino dot org dot uk> <20150616050045 dot GA8021 at domone> <55801706 dot 4010109 at linaro dot org> <20150616134331 dot GA7016 at domone> <55807EFB dot 8090702 at redhat dot com>
On Tue, Jun 16, 2015 at 03:54:35PM -0400, Carlos O'Donell wrote:
> On 06/16/2015 09:43 AM, Ondřej Bílka wrote:
> >> So to make this proposal to move forward, how exactly do you propose to
> >> create a benchtest for such scenario? I get this is tricky and a lot of
> >> variables may apply, but I do agree with Joseph that we shouldn't quite
> >> aim for optimal performance, imho using compiler builtins with reasonable
> >> performance is a gain in code maintainability.
> >>
> > As I said before, these are hard to measure, and I could also
> > argue against my own benchmark that it's inaccurate, as it doesn't measure
> > the effect of the CPU pipeline when the function does other computation.
> > The answer is: don't do microbenchmarks.
>
> That's not an answer, an answer is "Here's a patch to extend the libm testing
> to show how isinf/isnan/signbit/isfinite/isnormal/fpclassify impact performance."
>
No, it is an answer: it isn't my responsibility to provide a benchmark to
convince people that the change is desirable, but the submitter's. As I said
before, he should, for example, add catan to benchtests, measure the
difference and report that. If necessary, increase the iteration count to
catch the difference. That is necessary anyway if we want to measure
micro-optimizations that improve performance by several cycles.
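The point about the iteration count can be illustrated with a standalone
timing helper. This is only a sketch under stated assumptions: the names
`bench_fn`, `seconds_per_call` and `sample_workload` are hypothetical, and it
uses the standard C `clock()` function rather than glibc's benchtests
framework.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical function type for the code under test.  */
typedef double (*bench_fn) (double);

/* Placeholder workload (an assumption); a real run would time e.g. catan.  */
static double
sample_workload (double x)
{
  return x * x + 1.0;
}

/* Average CPU time per call in seconds.  A larger ITERS spreads the
   timer granularity over more calls, so smaller differences show up.  */
static double
seconds_per_call (bench_fn fn, double arg, long iters)
{
  assert (fn != NULL && iters > 0);

  /* Volatile sink so the compiler cannot discard the calls.  */
  volatile double sink = 0.0;

  clock_t start = clock ();
  for (long i = 0; i < iters; i++)
    sink += fn (arg);
  clock_t stop = clock ();

  (void) sink;
  return (double) (stop - start) / CLOCKS_PER_SEC / (double) iters;
}
```

Calling `seconds_per_call (sample_workload, 1.5, iters)` with a growing
`iters` gives a steadier per-call figure. Note that the indirect call in the
loop adds its own overhead, which matters when the function under test costs
only a few cycles.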
> I agree that microbenchmarks can be misleading if interpreted by automated
> systems, but we aren't talking about that yet, we are talking about experts
> using these tools to discuss patches in an objective fashion.
>
No, they will give you exactly the same argument that I gave to explain why
what you want is impossible.
Carlos, you talk a lot about deciding objectively, but when I ask you to do
so, it's never done. So I will ask you again to decide based on my previous
benchmark. There the builtin is sometimes 20% faster and sometimes the current
inline is 20% faster. How do you imagine that experts would decide
solely on that, instead of telling you that it's inconclusive and you need
to do real-world measurements, or that the benchmark is flawed because of X?
scenario / test        variant    real       user       sys

don't inline
  conditional add      branched   0m1.313s   0m1.312s   0m0.000s
                       builtin    0m1.309s   0m1.308s   0m0.000s
  branch               branched   0m1.310s   0m1.308s   0m0.000s
                       builtin    0m1.337s   0m1.312s   0m0.004s
  sum                  branched   0m1.209s   0m1.204s   0m0.000s
                       builtin    0m1.216s   0m1.212s   0m0.000s

inline outer call
  conditional add      branched   0m0.705s   0m0.704s   0m0.000s
                       builtin    0m0.916s   0m0.916s   0m0.000s
  branch               branched   0m0.806s   0m0.804s   0m0.000s
                       builtin    0m0.721s   0m0.716s   0m0.000s
  sum                  branched   0m1.029s   0m1.028s   0m0.000s
                       builtin    0m0.911s   0m0.908s   0m0.000s

inline inner call
  conditional add      branched   0m1.038s   0m1.032s   0m0.000s
                       builtin    0m1.024s   0m1.016s   0m0.000s
  branch               branched   0m0.614s   0m0.608s   0m0.000s
                       builtin    0m0.606s   0m0.608s   0m0.000s
  sum                  branched   0m0.662s   0m0.660s   0m0.000s
                       builtin    0m0.629s   0m0.628s   0m0.000s

tight loop
  conditional add      branched   0m0.208s   0m0.208s   0m0.000s
                       builtin    0m0.326s   0m0.324s   0m0.000s
  branch               branched   0m0.204s   0m0.200s   0m0.000s
                       builtin    0m0.325s   0m0.324s   0m0.000s
  sum                  branched   0m0.328s   0m0.332s   0m0.000s
                       builtin    0m0.486s   0m0.484s   0m0.000s
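
The benchmark source is not included in this message. As a rough illustration
of the kind of code being compared, the sketch below shows one way a
"branched" integer-based isinf and a "builtin" isinf could be written. The
function names, the bit-pattern test and the counting loop are assumptions
made for illustration. They are not the code that produced the timings, and
the exact meaning of the "conditional add", "branch" and "sum" labels is not
given here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed "branched" variant: classify by inspecting the IEEE 754
   binary64 bit pattern with integer operations.  */
static inline int
isinf_branched (double x)
{
  uint64_t bits;
  memcpy (&bits, &x, sizeof bits);	/* well-defined type pun */
  /* Drop the sign bit; infinity has exponent all ones, mantissa zero.  */
  return (bits & UINT64_C (0x7fffffffffffffff))
	 == UINT64_C (0x7ff0000000000000);
}

/* "builtin" variant: let the compiler choose the instruction sequence.  */
static inline int
isinf_builtin (double x)
{
  return __builtin_isinf (x) != 0;
}

/* Hypothetical caller that sums the classification results over an
   array, selecting the variant with USE_BUILTIN.  */
static long
count_inf (const double *v, size_t n, int use_builtin)
{
  assert (v != NULL || n == 0);

  long count = 0;
  for (size_t i = 0; i < n; i++)
    count += use_builtin ? isinf_builtin (v[i]) : isinf_branched (v[i]);
  return count;
}
```

Both variants return the same result for every input. Which one is faster
depends on how the surrounding code is inlined and scheduled, so timings from
a single loop of this kind do not settle the question.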