This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.
Re: [PATCH] Fixes tree-loop-distribute-patterns issues
- From: Ondřej Bílka <neleai at seznam dot cz>
- To: Roland McGrath <roland at hack dot frob dot com>
- Cc: Adhemerval Zanella <azanella at linux dot vnet dot ibm dot com>, Carlos O'Donell <carlos at redhat dot com>, "GNU C. Library" <libc-alpha at sourceware dot org>, Siddhesh Poyarekar <siddhesh at redhat dot com>
- Date: Fri, 21 Jun 2013 04:00:55 +0200
- Subject: Re: [PATCH] Fixes tree-loop-distribute-patterns issues
- References: <51C0AFB7 dot 1060009 at linux dot vnet dot ibm dot com> <20130618205608 dot 9CCE22C0AC at topped-with-meat dot com> <51C1BFE9 dot 4070805 at linux dot vnet dot ibm dot com> <51C1CEFC dot 9000100 at redhat dot com> <51C1FE4C dot 3020400 at linux dot vnet dot ibm dot com> <20130619221130 dot 7B91A2C10E at topped-with-meat dot com> <51C31177 dot 90303 at linux dot vnet dot ibm dot com> <20130620175832 dot 0E6FA2C133 at topped-with-meat dot com> <20130620213141 dot GA4833 at domone dot kolej dot mff dot cuni dot cz> <20130620205919 dot 9156B2C135 at topped-with-meat dot com>
On Thu, Jun 20, 2013 at 01:59:19PM -0700, Roland McGrath wrote:
> > Actually you should split simple_* into separate files and compile them with
> > -O0.
>
> __attribute__ ((optimize ("O0"))) is sufficient in compilers that support
> it (4.6, I think) and less hassle than breaking up files. I don't think
> anyone does or should care about performance analysis using compilers that
> are so old as not to have that.
>
> > Doing otherwise makes their performance dependent on gcc version and
> > this makes results even more unreliable.
>
> Perhaps that matters for benchtests, if they are intended to use the
> simple_* implementations' performance as a baseline for comparison. The
> correctness tests (i.e. all tests outside benchtests/) do not care about
> that, and that's all I'm personally concerned with.
>
> If what you want as a performance baseline is "the obvious loop handling a
> byte at a time", then -O0 code can easily be substantially worse than this
> and give a misleading impression of what naive code would actually do.
> With -O0, the compiler is exceedingly stupid (by design), and usually every
> operation has excess spill and reload operations, which could easily
> dominate the performance of what would otherwise be a very tight loop.
> Short of hand-coding naive assembly for each machine, I'm not sure how you
> can robustly address that issue. Perhaps -O1 is a good fit for what
> assembly a human would write when not trying to be especially clever;
> but that's just a shot in the dark.
>
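A minimal sketch of the per-function approach discussed above, assuming GCC. The hypothetical reference functions simple_memset_O0 and simple_memset_O1 are the same naive byte-at-a-time loop, each pinned to its own optimization level with __attribute__ ((optimize (...))), so the command-line -O flag should not change their code. At -O0 in particular, GCC should not apply the -ftree-loop-distribute-patterns transformation named in the subject, which could otherwise replace such a loop with a call to memset. OPT_LEVEL and simple_memset_agrees are illustrative names, not glibc code; with compilers that lack the attribute (e.g. clang) the macro compiles away.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* GCC-specific: the optimize attribute overrides the command-line -O level
   for a single function.  Compiled out for compilers without it.  */
#if defined (__GNUC__) && !defined (__clang__)
# define OPT_LEVEL(level) __attribute__ ((optimize (level)))
#else
# define OPT_LEVEL(level)
#endif

/* Hypothetical reference implementation: the obvious loop handling a
   byte at a time, pinned to -O0 (stable but full of spills/reloads).  */
OPT_LEVEL ("O0") void *
simple_memset_O0 (void *s, int c, size_t n)
{
  unsigned char *p = s;
  for (size_t i = 0; i < n; i++)
    p[i] = (unsigned char) c;
  return s;
}

/* The same loop pinned to -O1, closer to what a human might write
   without trying to be clever.  */
OPT_LEVEL ("O1") void *
simple_memset_O1 (void *s, int c, size_t n)
{
  unsigned char *p = s;
  for (size_t i = 0; i < n; i++)
    p[i] = (unsigned char) c;
  return s;
}

/* Correctness check: both variants must match the libc memset for one
   (c, n) pair, including leaving bytes past n untouched.  Returns 1 on
   agreement, 0 otherwise.  */
int
simple_memset_agrees (int c, size_t n)
{
  unsigned char ref[256], a[256], b[256];
  assert (n <= sizeof ref);

  /* Start all three buffers from the same non-trivial pattern.  */
  memset (ref, 0x5a, sizeof ref);
  memcpy (a, ref, sizeof ref);
  memcpy (b, ref, sizeof ref);

  memset (ref, c, n);
  simple_memset_O0 (a, c, n);
  simple_memset_O1 (b, c, n);

  return memcmp (ref, a, sizeof ref) == 0
         && memcmp (ref, b, sizeof ref) == 0;
}
```

simple_memset_agrees (0xab, 100) is expected to return 1 with either variant; only the generated code, and hence the timing, should differ between the two, which is exactly the baseline question raised above.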
I chose -O0 as the lesser evil, compared with having the reference
implementation run twice as fast depending on which compiler you use.
One solution is to mandate running the benchmarks with a fixed version of
gcc and fixed flags.
A second option would be to keep the generated assembly in the tree,
together with a regeneration script that is run with a specific gcc version.
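For the first option, a benchmark could at least refuse to build with the wrong compiler and record which compiler produced its numbers. This sketch assumes GCC or a compatible compiler: __GNUC__, __VERSION__ and __OPTIMIZE__ are predefined macros, while REQUIRED_GCC_MAJOR and bench_header are hypothetical names chosen for illustration. It records only whether optimization was enabled, not the exact flags.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical knob: build with -DREQUIRED_GCC_MAJOR=N to refuse any
   other GCC major version.  */
#if defined REQUIRED_GCC_MAJOR && defined __GNUC__ && !defined __clang__
# if __GNUC__ != REQUIRED_GCC_MAJOR
#  error "benchmark must be built with the pinned GCC major version"
# endif
#endif

/* __VERSION__ is predefined by GCC (and clang) as a version string.  */
#ifdef __VERSION__
# define BENCH_COMPILER __VERSION__
#else
# define BENCH_COMPILER "unknown"
#endif

/* __OPTIMIZE__ is defined whenever an -O level above -O0 is in effect.  */
#ifdef __OPTIMIZE__
# define BENCH_OPTIMIZED 1
#else
# define BENCH_OPTIMIZED 0
#endif

/* Write a header line identifying the compiler that built the benchmark,
   so results from different toolchains are not silently compared.
   Returns what snprintf returns: the untruncated length.  */
int
bench_header (char *buf, size_t size)
{
  assert (buf != NULL || size == 0);
  return snprintf (buf, size, "compiler: %s; optimized: %s",
                   BENCH_COMPILER, BENCH_OPTIMIZED ? "yes" : "no");
}
```

Printing this header at the top of every results file is one way to make comparisons across gcc versions visible rather than silent.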