This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.
Re: [PATCH] Fixes tree-loop-distribute-patterns issues
- From: Ondřej Bílka <neleai at seznam dot cz>
- To: "Joseph S. Myers" <joseph at codesourcery dot com>
- Cc: Roland McGrath <roland at hack dot frob dot com>, Adhemerval Zanella <azanella at linux dot vnet dot ibm dot com>, Carlos O'Donell <carlos at redhat dot com>, "GNU C. Library" <libc-alpha at sourceware dot org>, Siddhesh Poyarekar <siddhesh at redhat dot com>
- Date: Fri, 21 Jun 2013 09:47:21 +0200
- Subject: Re: [PATCH] Fixes tree-loop-distribute-patterns issues
- References: <51C1BFE9 dot 4070805 at linux dot vnet dot ibm dot com> <51C1CEFC dot 9000100 at redhat dot com> <51C1FE4C dot 3020400 at linux dot vnet dot ibm dot com> <20130619221130 dot 7B91A2C10E at topped-with-meat dot com> <51C31177 dot 90303 at linux dot vnet dot ibm dot com> <20130620175832 dot 0E6FA2C133 at topped-with-meat dot com> <20130620213141 dot GA4833 at domone dot kolej dot mff dot cuni dot cz> <Pine dot LNX dot 4 dot 64 dot 1306202232290 dot 14606 at digraph dot polyomino dot org dot uk> <20130621004338 dot GA6306 at domone dot kolej dot mff dot cuni dot cz> <Pine dot LNX dot 4 dot 64 dot 1306210058560 dot 14606 at digraph dot polyomino dot org dot uk>
On Fri, Jun 21, 2013 at 01:06:07AM +0000, Joseph S. Myers wrote:
> On Fri, 21 Jun 2013, Ondrej Bilka wrote:
>
> > > I expect -O0 performance to depend a lot more on GCC version than -O2.
> > >
> > You expect it, but can you prove it? Please provide two versions of GCC
> > that generate a different simple-* function when compiling with -O0 -S.
> >
> > The versions I checked are:
> > Debian 4.5.3-12
> > Debian 4.7.1-2
> > gcc version 4.9.0 20130516 (experimental) (GCC)
> >
> > The assembly produced is the same for the following fragment:
> >
> > void *
> > memset (char *s, int c, int n)
> > {
> >   int i;
> >   for (i = 0; i < n; i++)
> >     s[i] = c;
> >   return s;
> > }
>
> I tried 4.3- and 4.4-based compilers building for i586 and got differences:
>
> < movl %eax, %edx
> < addl 8(%ebp), %edx
> < movl 12(%ebp), %eax
> < movb %al, (%edx)
> ---
> > addl 8(%ebp), %eax
> > movl 12(%ebp), %edx
> > movb %dl, (%eax)
>
> The general principle is simple enough: -O0 code is more likely to depend
> on the fine details of the implementation, because differences in the
> internal representation of no semantic significance can easily result in
> changes to the generated code when a dumb conversion from IR to assembly
> is in operation, whereas with -O2 such non-semantic differences are likely
> to be optimized away. And for such simple functions there's only a
> limited amount an optimizer can do so different compiler versions are
> likely to differ only in insubstantial matters of instruction ordering and
> register allocation.
>
Are you sure? Lower optimization levels keep the structure of the program
mostly intact, so a single change is unlikely to have a big impact on
performance. If so, the combined effect of such changes is likely to be
just noise.
At -O2 many more optimizations are enabled, so there is more room for
change. That this is a simple function is an argument against -O2, as a
single optimization can have a big impact.
Off the top of my head, unrolling will make a big difference. If someone
devises a sane heuristic for enabling the unroller at -O2, the swing will
be big. I have also heard that there are plans to enable the vectorizer at
-O2 for obvious cases, which would also be a big change.
> Any sort of performance measurement involving -O0 is extremely suspect,
> simply because performance is essentially not a consideration at all for
> -O0 code generation; other matters such as speed of the compiler itself
> and debuggability are the considerations involved, and are the things
> people may try to avoid regressing across compiler upgrades.
>
Here we need it mainly as a reference, and in this case that matters more
than actual performance.
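A minimal C sketch of keeping a simple-* reference usable as a reference however it is compiled. It assumes GCC's `optimize` function attribute and its -ftree-loop-distribute-patterns option (the transformation that can rewrite such a byte loop into a call to memset). The macro `NO_LOOP_TO_LIBCALL` and the functions `simple_memset` and `matches_reference` are hypothetical names for illustration, not code from the patch under discussion.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical macro: ask GCC not to replace the loop below with a
   library call.  Clang also defines __GNUC__ but ignores this
   attribute, so it is left out there. */
#if defined __GNUC__ && !defined __clang__
# define NO_LOOP_TO_LIBCALL \
  __attribute__ ((__optimize__ ("-fno-tree-loop-distribute-patterns")))
#else
# define NO_LOOP_TO_LIBCALL
#endif

/* Hypothetical reference: a plain byte loop used only to check results. */
NO_LOOP_TO_LIBCALL void *
simple_memset (void *s, int c, size_t n)
{
  unsigned char *p = s;
  for (size_t i = 0; i < n; i++)
    p[i] = (unsigned char) c;
  return s;
}

/* Hypothetical checker: returns 1 if IMPL fills the first N bytes of a
   256-byte buffer exactly as the reference does and leaves the rest
   untouched; returns 0 on mismatch or if N is too large. */
int
matches_reference (void *(*impl) (void *, int, size_t), int c, size_t n)
{
  unsigned char want[256], got[256];
  if (n > sizeof want)
    return 0;
  memset (want, 0xaa, sizeof want);
  memset (got, 0xaa, sizeof got);
  simple_memset (want, c, n);
  impl (got, c, n);
  return memcmp (want, got, sizeof want) == 0;
}
```

Without the attribute, a compiler that applies this transformation could turn `simple_memset` into a call to the very memset it is meant to check, so the comparison would test memset against itself.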