This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: [PATCH] powerpc: New feature - HWCAP/HWCAP2 bits in the TCB



On 09-07-2015 18:51, Ondřej Bílka wrote:
> On Thu, Jul 09, 2015 at 04:31:17PM -0300, Adhemerval Zanella wrote:
>>
>>
>> On 09-07-2015 16:02, Ondřej Bílka wrote:
>>> On Tue, Jul 07, 2015 at 10:35:24AM -0500, Steven Munroe wrote:
>>>> Not so simple on PowerISA as we don't have PC-relative addressing.
>>>>
>>>> 1) The global entry requires two instructions to establish the TOC/GOT.
>>>> 2) Medium model requires two instructions (fused) to load a pointer from
>>>> the GOT.
>>>> 3) Finally we can load the cached hwcap.
>>>>
>>>> None of this is required for the TP+offset.
>>>>
>>> And why didn't you write that when it was first suggested? When you don't
>>> answer, it looks like you don't want to answer because that suggestion is
>>> better.
>>>
>>> The problem here isn't the lack of relative addressing but that you don't
>>> start with the GOT in a register.
>>>
>>> You certainly could do a similar hack to the one you do with the tcb and
>>> place the hwcap bits just after it, so you could do just one load.
>>>
>>> That you require so many instructions on powerpc is a gcc bug rather than
>>> a rule. You don't need that many instructions when you place frequently
>>> used symbols in the -32768..32767 range. For example, here you could save
>>> one addition.
>>>
>>> int x, y;
>>> int foo()
>>> {
>>>   return x + y;
>>> }
>>>
>>> original
>>>
>>> 00000000000007d0 <foo>:
>>>  7d0:	02 00 4c 3c 	addis   r2,r12,2
>>>  7d4:	30 78 42 38 	addi    r2,r2,30768
>>>  7d8:	00 00 00 60 	nop
>>>  7dc:	30 80 42 e9 	ld      r10,-32720(r2)
>>>  7e0:	00 00 00 60 	nop
>>>  7e4:	38 80 22 e9 	ld      r9,-32712(r2)
>>>  7e8:	00 00 6a 80 	lwz     r3,0(r10)
>>>  7ec:	00 00 29 81 	lwz     r9,0(r9)
>>>  7f0:	14 4a 63 7c 	add     r3,r3,r9
>>>  7f4:	b4 07 63 7c 	extsw   r3,r3
>>>  7f8:	20 00 80 4e 	blr
>>>
>>> new
>>>
>>>  	addis   r2,r12,2
>>> 	ld      r10,-1952(r2)
>>> 	ld      r9,-1944(r2)
>>> 	lwz     r3,0(r10)
>>> 	lwz     r9,0(r9)
>>> 	add     r3,r3,r9
>>> 	extsw   r3,r3
>>> 	blr
>>
>> No, you can't. You need to take into consideration that the powerpc64le
>> ELFv2 ABI has two entry points for every function, global and local. The
>> former is used when you need to materialize the TOC, while with the latter
>> you can reuse the same TOC. The compiler has no information about this; it
>> has to be decided by the linker.
>>
> Of course I can; reusing the TOC is not mandatory. That would just decrease
> performance a bit for local calls.

Reusing the TOC is exactly the optimization the linker performs to avoid
calling the global entry point.  The problems are: 1. it still requires
materializing the TOC at global entry points, where you will need to
save/restore it in PLT stubs, and 2. you will need a hwcap copy per TOC/DSO.
I think Steven's proposal is meant to avoid exactly these.  In fact, this
was one option I advocated to him before he reminded me of these issues.
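
To make the second issue concrete, here is a rough C sketch of the two
layouts. It is not glibc code: `struct fake_tcb`, `dso_hwcap_copy` and both
helper names are invented for illustration, and the real TCB layout on
powerpc is different. It only shows that the per-DSO variant needs one
variable per shared object (reached through that object's TOC), while the
TCB variant needs one slot per thread at a fixed offset from the thread
pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the thread control block.  The field order
   and names are assumptions made for this sketch only.  */
struct fake_tcb
{
  void *self;
  uint64_t hwcap;   /* Assumed slot caching the AT_HWCAP bits.  */
  uint64_t hwcap2;  /* Assumed slot caching the AT_HWCAP2 bits.  */
};

/* Variant A: every DSO keeps its own copy.  On powerpc64 the address of
   this variable is found through the DSO's TOC, so the TOC pointer must
   be valid before the load.  */
static uint64_t dso_hwcap_copy;

static int
has_feature_via_dso_copy (uint64_t mask)
{
  return (dso_hwcap_copy & mask) != 0;
}

/* Variant B: one copy per thread, read at a fixed offset from a
   thread-pointer-like base.  No TOC is involved in the access.  */
static int
has_feature_via_tcb (const struct fake_tcb *tp, uint64_t mask)
{
  assert (tp != NULL);
  return (tp->hwcap & mask) != 0;
}
```

In variant B the compiler can fold `offsetof (struct fake_tcb, hwcap)` into
the displacement of a single load, which is the property the TP+offset
proposal relies on.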

> 
> You need the majority of calls to come from a different DSO for the global
> entry point to be used. Otherwise, if you use the local entry point, there
> is no reason to use the tcb, as a hidden variable does the same job (and
> you could use the local entry point in the PLT of the same DSO). An example
> that I previously mentioned is compiled by
> 
>  gcc hw.c h.o -O3 -fPIC  -mcmodel=medium -shared
> 
> extern int __hwcap __attribute__ ((visibility ("hidden"))) ;
> int foo(int x, int y)
> {
>   if (__hwcap)
>     return x;
>   else
>     return y;
> }
> 
> into
> 
> 0000000000000750 <foo>:
>  750:	02 00 4c 3c 	addis   r2,r12,2
>  754:	b0 78 42 38 	addi    r2,r2,30896
>  758:	00 00 00 60 	nop
>  75c:	54 80 22 81 	lwz     r9,-32684(r2)
>  760:	00 00 89 2f 	cmpwi   cr7,r9,0
>  764:	20 00 9e 4c 	bnelr   cr7
>  768:	78 23 83 7c 	mr      r3,r4
>  76c:	20 00 80 4e 	blr
> 
> which, with the local entry point, uses only one load, the same as the tcb
> proposal.
> 
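
For reference, the first two instructions of the listing above
(addis r2,r12,2 and addi r2,r2,30896) are the global entry sequence that
builds the TOC pointer from r12. The sketch below shows how a 32-bit offset
is split into the two immediates and checks whether a displacement fits a
single load. The struct and function names are made up for this example;
the only facts assumed are that addi and D-form loads such as lwz take a
sign-extended 16-bit immediate, and that DS-form loads such as ld
additionally need a displacement that is a multiple of 4.

```c
#include <stdint.h>

/* Immediates for an addis/addi pair that together add OFFSET.  */
struct ha_lo
{
  int32_t ha;  /* Immediate for addis (added after shifting left by 16).  */
  int32_t lo;  /* Immediate for addi (sign-extended 16 bits).  */
};

static struct ha_lo
split_ha_lo (int32_t offset)
{
  struct ha_lo r;
  /* Sign-extend the low 16 bits without relying on narrowing casts.  */
  uint32_t low = (uint32_t) offset & 0xffffu;
  r.lo = (int32_t) (low ^ 0x8000u) - 0x8000;
  /* offset - lo is an exact multiple of 65536; the high half is rounded
     up whenever the low half is negative.  */
  r.ha = (int32_t) (((int64_t) offset - r.lo) / 65536);
  return r;
}

/* Nonzero if DISP fits a D-form load such as lwz.  */
static int
fits_d_form (int32_t disp)
{
  return disp >= -32768 && disp <= 32767;
}

/* Nonzero if DISP fits a DS-form load such as ld.  */
static int
fits_ds_form (int32_t disp)
{
  return fits_d_form (disp) && disp % 4 == 0;
}
```

With the listing's values, 2 * 65536 + 30896 = 161968 is the distance from
the function entry to the TOC pointer, and the lwz displacement -32684 fits
in a single D-form load.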

