This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: HWCAP is method to determine cpu features, not selection mechanism.
- From: Ondřej Bílka <neleai at seznam dot cz>
- To: Adhemerval Zanella <adhemerval dot zanella at linaro dot org>
- Cc: Szabolcs Nagy <szabolcs dot nagy at arm dot com>, "libc-alpha at sourceware dot org" <libc-alpha at sourceware dot org>
- Date: Wed, 10 Jun 2015 17:09:44 +0200
- Subject: Re: HWCAP is method to determine cpu features, not selection mechanism.
- Authentication-results: sourceware.org; auth=none
- References: <55760314 dot 6070601 at linux dot vnet dot ibm dot com> <5576FC80 dot 1090806 at arm dot com> <1433862393 dot 21101 dot 9 dot camel at sjmunroe-ThinkPad-W500> <20150609154223 dot GA20028 at domone> <1433865684 dot 21101 dot 20 dot camel at sjmunroe-ThinkPad-W500> <20150610125047 dot GA10861 at domone> <55783D2A dot 8050703 at linaro dot org> <557846D9 dot 3060909 at arm dot com> <55784802 dot 8070605 at linaro dot org>
On Wed, Jun 10, 2015 at 11:21:54AM -0300, Adhemerval Zanella wrote:
>
>
> On 10-06-2015 11:16, Szabolcs Nagy wrote:
> > On 10/06/15 14:35, Adhemerval Zanella wrote:
> >> I agree that adding an API to modify the current hwcap is not a good
> >> approach. However the cost you are assuming here are *very* x86 biased,
> >> where you have only on instruction (movl <variable>(%rip), %<destiny>)
> >> to load an external variable defined in a shared library, where for
> >> powerpc it is more costly:
> >
> > debian codesearch found 4 references to __builtin_cpu_supports
> > all seem to avoid using it repeatedly.
> >
> > multiversioning dispatch only happens at startup (for a small
> > number of functions according to existing practice).
> >
> > so why is hwcap expected to be used in hot loops?
> >
>
> Good question, I do not know and I believe Steve could answer this
> better than me. I am only advocating here that assuming x86 costs
> for powerpc is not the way to evaluate this patch.
Sorry, but the details don't matter when the underlying idea is just bad.
Even if getting hwcap otherwise took 20 cycles, it would still be a bad
idea. Since you need to read hwcap only once, at initialization, its cost
is completely irrelevant.
First, as I explained, the major flaw of Steve's approach is this: how
exactly do you ensure that gcc won't insert a newer instruction that
would lead to a crash on an older platform?
Second, it makes no sense. If you are in a situation where hwcap access
becomes noticeable in a profile, then the check itself is also noticeable
in the profile. So use an ifunc, which saves you the additional cycles
spent checking hwcap bits.
A programmer that uses hwcap in a hot loop is just incompetent. It stays
constant for the lifetime of the application. So he should make several
copies of the loop, each compiled with the appropriate options.
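A minimal sketch of what "more copies of the loop" could look like, with the choice made once outside the loop. The feature bit FEATURE_FAST and the names sum_generic, sum_fast and select_sum are hypothetical, used only for illustration. In a real build each variant would presumably be compiled with different target options (for instance in separate files, or with GCC's target attribute); here both bodies are plain C so that the sketch stays portable.

```c
#include <stddef.h>

/* Hypothetical feature bit; a real program would use an actual hwcap bit. */
#define FEATURE_FAST 0x1UL

/* Baseline copy of the loop, safe on every CPU. */
static unsigned long sum_generic(const unsigned int *v, size_t n)
{
  unsigned long s = 0;
  for (size_t i = 0; i < n; i++)
    s += v[i];
  return s;
}

/* Second copy of the same loop. In practice this one would be built
   with options that enable the newer instructions. */
static unsigned long sum_fast(const unsigned int *v, size_t n)
{
  unsigned long s = 0;
  for (size_t i = 0; i < n; i++)
    s += v[i];
  return s;
}

typedef unsigned long (*sum_fn)(const unsigned int *, size_t);

/* Called once at startup; the hot loop itself never looks at hwcap. */
static sum_fn select_sum(unsigned long hwcap)
{
  return (hwcap & FEATURE_FAST) ? sum_fast : sum_generic;
}
```

The caller stores the result of select_sum once and calls through that pointer afterwards, so the hwcap test costs nothing per iteration.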
Then, even if the compiler handled these issues correctly, you will
probably lose more on missed compiler optimizations than your supposed
gain. The compiler can select a suboptimal path because it doesn't want
to expand the function too much due to size concerns.
That is quite easy to show; for example, the following would be an order
of magnitude slower with hwcap than with ifuncs. The reason is that even
gcc-5.1 doesn't split it into two branches, each doing a shift. Instead it
emits a div instruction, which takes forever.
int hwcap;
unsigned int foo(unsigned int i)
{
  int d = 8;
  if (hwcap & 42)
    d = 4;
  return i / d;
}
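For comparison, a rough sketch of the split form, where each variant divides by a compile-time constant, so the compiler can turn the unsigned division into a shift. The names foo_div4, foo_div8 and resolve_foo are hypothetical. The selection is done here with a plain function pointer; an ifunc resolver would make the same choice once at load time.

```c
/* Each variant divides by a constant power of two, so no div
   instruction is needed for the unsigned operand. */
static unsigned int foo_div4(unsigned int i)
{
  return i / 4;
}

static unsigned int foo_div8(unsigned int i)
{
  return i / 8;
}

typedef unsigned int (*foo_fn)(unsigned int);

/* Same test as in the example above (mask 42), but evaluated once
   instead of on every call. */
static foo_fn resolve_foo(int hwcap_bits)
{
  return (hwcap_bits & 42) ? foo_div4 : foo_div8;
}
```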