This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: HWCAP is method to determine cpu features, not selection mechanism.


On Wed, Jun 10, 2015 at 11:21:54AM -0300, Adhemerval Zanella wrote:
> 
> 
> On 10-06-2015 11:16, Szabolcs Nagy wrote:
> > On 10/06/15 14:35, Adhemerval Zanella wrote:
> >> I agree that adding an API to modify the current hwcap is not a good
> >> approach. However, the costs you are assuming here are *very* x86 biased:
> >> there you have only one instruction (movl <variable>(%rip), %<destination>)
> >> to load an external variable defined in a shared library, whereas for
> >> powerpc it is more costly:
> > 
> > Debian Code Search found 4 references to __builtin_cpu_supports;
> > all seem to avoid using it repeatedly.
> > 
> > multiversioning dispatch only happens at startup (for a small
> > number of functions according to existing practice).
> > 
> > so why is hwcap expected to be used in hot loops?
> > 
> 
> Good question; I do not know, and I believe Steve could answer this
> better than I can.  I am only arguing here that assuming x86 costs
> for powerpc is not the right way to evaluate this patch.

Sorry, but your details don't matter when the underlying idea is just bad.
Even if getting hwcap took 20 cycles, it would still be a bad idea. As
you need to read hwcap only once, at initialization, the cost of loading
it is completely irrelevant.
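Reading hwcap once at initialization can be sketched as below. This assumes Linux with glibc 2.16 or later, where getauxval(AT_HWCAP) returns the hwcap word from the auxiliary vector; the names app_hwcap, app_hwcap_init and app_has_feature are hypothetical and not part of any glibc API.

```c
#include <sys/auxv.h>   /* getauxval, AT_HWCAP (assumes Linux, glibc >= 2.16) */

/* Hypothetical process-wide copy of the hwcap word. */
static unsigned long app_hwcap;

/* Call once during start-up, before any feature-dependent code runs. */
void app_hwcap_init(void)
{
  app_hwcap = getauxval(AT_HWCAP);
}

/* Later queries only test bits in the cached copy; the value cannot
   change while the process runs. */
int app_has_feature(unsigned long bit)
{
  return (app_hwcap & bit) != 0;
}
```

app_hwcap_init() would be called once from main() or an equivalent start-up hook. The meaning of each bit is architecture specific.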

First, as I explained, the major flaw of Steve's approach: how exactly do
you ensure that gcc won't insert a newer instruction that would lead to
a crash on an older platform?

Second, it makes no sense. If you are in a situation where hwcap access
becomes noticeable in a profile, then the check itself is also
noticeable in the profile. So use an ifunc, which will save you those
additional cycles spent checking hwcap bits.

A programmer who uses hwcap in a hot loop is just incompetent. It stays
constant for the whole run of the application, so he should make several
copies of the loop, each compiled with the appropriate options.
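Making one copy of the loop per CPU variant, with the hwcap test hoisted out of the loop, might look like the sketch below. HWCAP_HYPOTHETICAL_POPCNT is an invented bit, and __builtin_popcount is a GCC/Clang builtin. In a real build the feature copy would be compiled with the matching target option (for example, in a separate source file), which this sketch does not show.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical hwcap bit meaning "CPU has a population-count instruction". */
#define HWCAP_HYPOTHETICAL_POPCNT (1UL << 0)

/* Baseline bit count that any CPU can run. */
static unsigned int popcount_portable(uint32_t x)
{
  unsigned int c = 0;
  while (x)
    {
      x &= x - 1;   /* clear the lowest set bit */
      c++;
    }
  return c;
}

/* Copy of the loop for CPUs without the feature. */
static unsigned long count_bits_baseline(const uint32_t *v, size_t n)
{
  unsigned long total = 0;
  for (size_t k = 0; k < n; k++)
    total += popcount_portable(v[k]);
  return total;
}

/* Copy of the loop for CPUs with the feature; here it uses default
   compiler options, so the builtin may still expand to generic code. */
static unsigned long count_bits_feature(const uint32_t *v, size_t n)
{
  unsigned long total = 0;
  for (size_t k = 0; k < n; k++)
    total += (unsigned long) __builtin_popcount(v[k]);
  return total;
}

/* hwcap is consulted once per call, outside either loop body. */
unsigned long count_bits(const uint32_t *v, size_t n, unsigned long hwcap)
{
  if (hwcap & HWCAP_HYPOTHETICAL_POPCNT)
    return count_bits_feature(v, n);
  return count_bits_baseline(v, n);
}
```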

Then, even if the compiler handled these issues correctly, you would
probably lose more on missed compiler optimizations than your supposed
gain. The compiler can select a suboptimal path because it doesn't want
to expand the function too much due to size concerns.

That is quite easy to hit. For example, the following gets an order of
magnitude slower with hwcap than with ifuncs. The reason is that even
gcc-5.1 doesn't split it into two branches, each doing a shift. Instead
it emits a div instruction, which takes forever.

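int hwcap;  /* stands in for the hwcap bits; constant after startup */
unsigned int foo(unsigned int i)
{
  int d = 8;
  if (hwcap & 42)
    d = 4;
  /* d is always 8 or 4, but it is not a compile-time constant here,
     so gcc-5.1 emits a div instead of a shift.  */
  return i / d;
}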
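For comparison, the split that the message says gcc-5.1 does not perform can be written by hand: each branch divides by a constant power of two, which compilers typically lower to a right shift for unsigned operands. This version still tests the bits on every call; an ifunc would remove that test as well. hwcap_value is a hypothetical stand-in for the hwcap variable in the example.

```c
/* Hypothetical global standing in for the hwcap value in foo(). */
int hwcap_value;

/* Same result as foo(), but each branch has a constant divisor. */
unsigned int foo_split(unsigned int i)
{
  if (hwcap_value & 42)
    return i / 4;   /* can be emitted as i >> 2 */
  return i / 8;     /* can be emitted as i >> 3 */
}
```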

