This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: powerpc __tls_get_addr call optimization


On 03/20/2015 11:27 AM, Rich Felker wrote:
> On Fri, Mar 20, 2015 at 06:25:02PM +1030, Alan Modra wrote:
>> On Thu, Mar 19, 2015 at 11:33:16PM -0400, Carlos O'Donell wrote:
>>> On 03/18/2015 10:56 PM, Alan Modra wrote:
>>>> On Wed, Mar 18, 2015 at 01:07:32PM -0400, Carlos O'Donell wrote:
>>>>> On 03/18/2015 02:11 AM, Alan Modra wrote:
>>>>>> Now that Alex's fixes for static TLS have gone in, I figure it's worth
>>>>>> revisiting an old patch of mine.
>>>>>> https://sourceware.org/ml/libc-alpha/2009-03/msg00053.html
>>>>>
>>>>> I'm not against this patch, but it certainly seems like you would be
>>>>> better served by just implementing tls descriptors?
>>>>
>>>> I think this is one better than tls descriptors, because powerpc
>>>> avoids the indirect function call used by tls descriptors.
>>>
>>> You mean to say it is "faster" than tls descriptors, but at the same
>>
>> To be honest, there isn't much difference in the optimized case where
>> static TLS is available.  It boils down to an indirect call to a
>> function that loads one value vs. a direct call to a stub that loads
>> two values and compares one against zero.  I think what I've
>> implemented is slightly better for PowerPC, but whether that would
>> carry over to other architectures is debatable.
> 
> If the performance difference isn't measurable in real-world
> applications, I would think uniformity between targets would be a lot
> more valuable.
> 
> I also don't see how your approach is a "direct call". The function
> being called is in a different DSO so it has to go through a pointer
> in the GOT or similar, in which case it's just as "indirect" as the
> TLSDESC call would be.

I agree, and this was my initial inclination too, but I'm not against what
Alan has implemented. As a machine maintainer he should be allowed some
leeway to argue that this implementation is "N instructions less" and
therefore must be faster. Such a speedup is hard to show in a
microbenchmark, but on average it should result in, say, less CPU usage
over billions of cycles.

IBM has to accept that the downside to all of this is that breakage in
this area may take longer to fix, and get fewer fixes, than on those
arches already using TLSDESC.
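
For anyone following along, here is a rough C model of the two fast paths
Alan describes in the quoted text: an indirect call to a function that
loads one value, versus a call to a stub that loads two values and
compares one against zero. This is only a sketch, not the glibc or linker
code. Every name in it (fake_thread_block, tlsdesc_access,
tls_get_addr_stub, slow_tls_get_addr) is hypothetical, the thread pointer
is faked with a plain array, and treating a zero module word as the
"static TLS available" marker is an assumption of the model.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the thread pointer; real code uses a register. */
static char fake_thread_block[256];
static char *thread_pointer = fake_thread_block;

/* --- TLS descriptor style ------------------------------------------- */
struct tlsdesc {
    ptrdiff_t (*entry)(struct tlsdesc *); /* resolver, reached indirectly */
    ptrdiff_t arg;                        /* offset from the thread pointer */
};

/* Static-TLS resolver: loads one value and returns it. */
static ptrdiff_t tlsdesc_return(struct tlsdesc *d)
{
    return d->arg;
}

static void *tlsdesc_access(struct tlsdesc *d)
{
    return thread_pointer + d->entry(d);  /* the indirect call */
}

/* --- Stub style -------------------------------------------------------- */
struct tls_index {
    uintptr_t module;  /* assumed: 0 means "offset is thread-pointer relative" */
    ptrdiff_t offset;
};

static int slow_calls;  /* counts trips through the slow path */

/* Stand-in for the general lookup; models one module at a fixed offset 64. */
static void *slow_tls_get_addr(struct tls_index *ti)
{
    assert(ti->module != 0);
    slow_calls++;
    return thread_pointer + 64 + ti->offset;
}

/* The stub: two loads and one compare against zero on the fast path. */
static void *tls_get_addr_stub(struct tls_index *ti)
{
    uintptr_t module = ti->module;
    ptrdiff_t offset = ti->offset;

    if (module == 0)
        return thread_pointer + offset;
    return slow_tls_get_addr(ti);
}
```

In this model both fast paths compute the same address; the only
difference is whether the caller reaches the code through a function
pointer held in the descriptor or through the stub's compare-and-return.
It says nothing about how the stub itself is reached, which is the point
Rich raises.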

Cheers,
Carlos.

