This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: HWCAP is a method to determine cpu features, not a selection mechanism.



On 10-06-2015 17:56, Ondřej Bílka wrote:
> On Wed, Jun 10, 2015 at 01:58:27PM -0500, Steven Munroe wrote:
>> On Wed, 2015-06-10 at 17:53 +0200, Ondřej Bílka wrote:
>>> On Wed, Jun 10, 2015 at 12:23:40PM -0300, Adhemerval Zanella wrote:
>>>>
>>>>
>>>> On 10-06-2015 12:09, Ondřej Bílka wrote:
>>>>> On Wed, Jun 10, 2015 at 11:21:54AM -0300, Adhemerval Zanella wrote:
>>>>>>
>>>>>>
>>>>>> On 10-06-2015 11:16, Szabolcs Nagy wrote:
>>>>>>> On 10/06/15 14:35, Adhemerval Zanella wrote:
>>>>>>>> I agree that adding an API to modify the current hwcap is not a good
>>>>>>>> approach. However the costs you are assuming here are *very* x86 biased,
>>>>>>>> where you have only one instruction (movl <variable>(%rip), %<destination>)
>>>>>>>> to load an external variable defined in a shared library, whereas for
>>>>>>>> powerpc it is more costly:
>>>>>>>
>>>>>>> debian codesearch found 4 references to __builtin_cpu_supports
>>>>>>> all seem to avoid using it repeatedly.
>>>>>>>
>>>>>>> multiversioning dispatch only happens at startup (for a small
>>>>>>> number of functions according to existing practice).
>>>>>>>
>>>>>>> so why is hwcap expected to be used in hot loops?
>>>>>>>
>>>>>>
>>> snip
>>>> And my understanding is that the idea is to optimize hwcap access to provide
>>>> a 'better' way to enable '__builtin_cpu_supports' for powerpc.  IFUNC is
>>>> another way to provide function selection, but that does not rule out that
>>>> accessing hwcap through TLS is *faster* than the current options. It is up to
>>>> the developer to decide whether to use IFUNC or __builtin_cpu_supports.
>>>> Whether they use it in hot loops or not, it is up to them to profile and pick
>>>> another way.
>>>>
>>>> You can say the same about the current x86 __builtin_cpu_supports support:
>>>> you should not use it in loops, you should use ifunc, whatever.
>>>
>>> Sorry but no again. We are talking here about the difference between variable
>>> access and TCB access. You forgot to count the total cost. That includes the
>>> per-thread initialization overhead to set up hwcap, increased per-thread
>>> memory usage, the maintenance burden and increased cache misses.
>>> If you access hwcap only rarely, as you should, then per-thread copies
>>> would introduce a cache miss that is more costly than the GOT overhead. In the
>>> GOT case it could be avoided, as the combined threads would access it more often.
>>>
>> Actually Adhemerval does have the knowledge, background, and experience
>> to understand this difference and accurately assess the trade-offs.
>>
> While he may have the background, he didn't cover the drawbacks. So I needed
> to point them out to start discussing a cost-benefit analysis instead of
> looking at them through rose-colored glasses.
>  

What I did was point out that your earlier analysis related to instruction
latency was x86 biased and did not hold for the powerpc TOC cost model.  I
was *not* advocating anything more, nor saying that this hwcap in the TCB is
the best approach.
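
To make that concrete, here is a rough sketch; the variable name is
hypothetical and the instruction sequences are approximations of typical
compiler output, not taken from either implementation:

/* Illustrative only: 'feature_word' is a hypothetical exported global
   standing in for a hwcap-style bitmask.  */
extern unsigned int feature_word;

int
have_feature (unsigned int mask)
{
  /* x86_64: at best a single PC-relative load,
         movl   feature_word(%rip), %eax
     (one extra load if the access has to go through the GOT).

     powerpc64: external data is reached through the TOC,
         addis  r9, r2, .LC0@toc@ha    # TOC entry for feature_word
         ld     r9, .LC0@toc@l(r9)     # address of feature_word
         lwz    r9, 0(r9)              # the value itself
     i.e. two extra instructions before the data itself is loaded.  */
  return (feature_word & mask) != 0;
}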

And I do see the points you raised as valid, but IMHO this kind of
discussion will stretch on without end, mainly because it is based on
assumptions and tradeoffs.

Now, my opinion is that powerpc should implement __builtin_cpu_supports
similarly to x86, by adding it to libgcc and using initial-exec TLS variables.
It will create 2 dynamic relocations (R_PPC64_TPREL16_HI and R_PPC64_TPREL16_LO),
but the access will require only 2 arithmetic instructions and 1 load.  It will
decouple the implementation from GLIBC and not require any more TCB fields.
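
For reference, a hedged usage sketch: the x86 feature string "avx2" is the
existing GCC interface, while the powerpc string below is only a placeholder
for whatever names the proposed libgcc support would accept, and both loop
variants are hypothetical:

#include <stddef.h>

extern void vector_loop (double *, size_t);   /* hypothetical */
extern void scalar_loop (double *, size_t);   /* hypothetical */

void
process (double *buf, size_t n)
{
  /* Resolve the feature test once, outside any hot loop, as
     recommended earlier in the thread.  */
#if defined __x86_64__
  int fast = __builtin_cpu_supports ("avx2");
#elif defined __powerpc64__
  int fast = __builtin_cpu_supports ("vsx");  /* placeholder feature name */
#else
  int fast = 0;
#endif

  if (fast)
    vector_loop (buf, n);
  else
    scalar_loop (buf, n);
}

With the proposed initial-exec TLS variable the builtin itself would expand to
roughly the 2 arithmetic instructions and 1 load mentioned above, so even an
occasional inline check stays cheap.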

