This is the mail archive of the
binutils@sourceware.org
mailing list for the binutils project.
Re: Allow pie links to create PLT entries
- From: "H.J. Lu" <hjl dot tools at gmail dot com>
- To: Sriraman Tallam <tmsriram at google dot com>
- Cc: Cary Coutant <ccoutant at google dot com>, binutils <binutils at sourceware dot org>, David Li <davidxl at google dot com>, Ian Lance Taylor <iant at google dot com>
- Date: Thu, 29 Jan 2015 15:31:11 -0800
- Subject: Re: Allow pie links to create PLT entries
- References: <CAAs8HmyEG-m74+vcKFzuFTzVB-1cQvp1K_k3Hji=9ZnFci7CtA at mail dot gmail dot com> <CAMe9rOoW6NDcAgTdY1rATCR+ncLd3RaoMyX=hqFU-A6hxBHAUQ at mail dot gmail dot com> <CAAs8HmyLBFgrj70-U8xBuDv00RbESBwznAs6+9Q_tm_1cRoUkA at mail dot gmail dot com> <CAMe9rOqEx8X2444FCZJDbQm=VKniUM0bRNaUuqknQyeOnVj7HA at mail dot gmail dot com> <CAAs8Hmxm4ya74vf6TpJOAYFO3Yn17bDj=wNN40Hr=nC9M7pPiA at mail dot gmail dot com> <CAMe9rOoGwg-y5EQNavqsd6xWAMbpYNyo12TnNT1NvJiURNqwAw at mail dot gmail dot com>
On Thu, Jan 29, 2015 at 3:13 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
> On Thu, Jan 29, 2015 at 2:17 PM, Sriraman Tallam <tmsriram@google.com> wrote:
>> On Thu, Jan 29, 2015 at 12:17 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>>> On Thu, Jan 29, 2015 at 12:08 PM, Sriraman Tallam <tmsriram@google.com> wrote:
>>>> On Thu, Jan 29, 2015 at 11:48 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
>>>>> On Thu, Jan 29, 2015 at 11:00 AM, Sriraman Tallam <tmsriram@google.com> wrote:
>>>>>> Hi,
>>>>>>
>>>>>> Here is a simple example that fails to link with -pie but which
>>>>>> should work just fine without having to use -fPIE.
>>>>>>
>>>>>> foo.cc
>>>>>> ======
>>>>>> int extern_func();
>>>>>> int main()
>>>>>> {
>>>>>> extern_func();
>>>>>> return 0;
>>>>>> }
>>>>>>
>>>>>> bar.cc
>>>>>> =====
>>>>>> int extern_func()
>>>>>> {
>>>>>> return 1;
>>>>>> }
>>>>>>
>>>>>> $ g++ -fPIC -shared bar.cc -o libbar.so
>>>>>> $ g++ foo.cc -lbar -pie
>>>>>>
>>>>>> ld: error: foo.o: requires dynamic R_X86_64_PC32 reloc against
>>>>>> '_Z11extern_funcv' which may overflow at runtime; recompile with -fPIC
>>>>>>
>>>>>> It fails because the linker refuses to create a PLT entry for the
>>>>>> R_X86_64_PC32 relocation when it is perfectly fine to do so. Note that I
>>>>>> could have recompiled foo.cc with -fPIE or -fPIC, but I still think
>>>>>> this should be allowed. With gold's support for copy relocations in PIE
>>>>>> and with this change, there are fewer cases where we would need -fPIE to
>>>>>> get working PIE links. This would help us link non-PIE
>>>>>> objects into PIE executables.
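To see why routing this call through a PLT entry is safe, recall that an R_X86_64_PC32 field holds the signed 32-bit value S + A - P. Here S is the target address, A is the addend and P is the address of the field. Below is a minimal C++ sketch built on that definition. The helper names and all addresses are hypothetical and chosen only for illustration.

A PLT entry in the executable moves together with the call site when the PIE is relocated, so the displacement to it is a link-time constant that fits. The displacement to extern_func inside libbar.so is different: it depends on where the dynamic linker maps the library, and it can easily exceed 2 GiB.

```cpp
#include <cassert>
#include <cstdint>

// Value written into an R_X86_64_PC32 field: S + A - P, where S is the
// target address, A the addend and P the address of the field itself.
int64_t pc32_value(int64_t S, int64_t A, int64_t P) { return S + A - P; }

// The field is a signed 32-bit integer, so the value must fit in int32_t.
bool fits_pc32(int64_t value) {
  return value >= INT32_MIN && value <= INT32_MAX;
}

// Hypothetical layout: both offsets are relative to the PIE's load base,
// so their difference does not depend on where the executable is loaded.
constexpr int64_t kCallFieldOffset = 0x1131;  // P - load base
constexpr int64_t kPltEntryOffset = 0x1030;   // PLT entry - load base

// Displacement from the call site to the executable's own PLT entry.
int64_t plt_displacement(int64_t load_base) {
  return pc32_value(load_base + kPltEntryOffset, -4,
                    load_base + kCallFieldOffset);
}

// Displacement from the call site directly to the library function, whose
// address is only known once the dynamic linker has mapped libbar.so.
int64_t direct_displacement(int64_t load_base, int64_t lib_func_addr) {
  return pc32_value(lib_func_addr, -4, load_base + kCallFieldOffset);
}
```

Take a hypothetical load base of 0x555555554000. The PLT displacement is -261 for that base and for any other base. A direct call to a library function mapped at 0x7ffff7fc10f9 would need a displacement of 0x2aaaa2a6bfc4, which does not fit in 32 bits.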
>>>>>
>>>>> You can't do it for x86 since EBX isn't set up for calling via PLT.
>>>>> For x86-64, there should be little difference between PIE
>>>>> and non-PIE code.
>>>>
>>>> True, but that little difference sometimes causes non-trivial
>>>> performance penalties. With copy relocation support for PIE added
>>>> recently, one big difference causing non-trivial performance penalty
>>>> went away. However, there are still differences in the way global
>>>> arrays are accessed. For instance,
>>>>
>>>> uint32_t a[] = {1, 2, 3, 4};
>>>>
>>>> a[i] can be accessed with one insn without -fPIE, whereas with -fPIE
>>>> we need two: one extra to compute the 64-bit address of a.
>>>>
>>>> Without -fPIE:
>>>>
>>>> movslq 0x1655(%rip),%rax # 401b80 <i>
>>>> mov 0x401b30(,%rax,4),%esi # a[i]
>
> If you link it with -pie, you will have TEXTREL in the executable.
> Do you want relocations in text sections in PIE?
>
>>>> With -fPIE:
>>>>
>>>> movslq 0x16c5(%rip),%rdx # <i>
>>>> lea 0x166e(%rip),%rax # <&a>
>>>> mov (%rax,%rdx,4),%esi # a[i]
>>>>
>>>> I wish we could use just one insn to do the last two in the -fPIE
>>>> case, using PC-relative addressing like:
>>>> mov 0x166e(%rip, %rdx, 4), %esi
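H.J.'s TEXTREL remark and the missing instruction both follow from how the effective address is formed.

In the non-PIE form, the absolute address of a is encoded in the instruction as a sign-extended 32-bit displacement. That only works if the address is fixed at link time and lies below 2 GiB. A PIE is loaded at a base chosen at run time, so the field would need a run-time relocation in .text.

As far as the x86-64 encoding goes, RIP-relative addressing cannot be combined with an index register. That is why the -fPIE code first materializes &a with lea.

The C++ sketch below models both forms. The function names and the PIE addresses are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Non-PIE form: mov disp32(,%rax,4),%esi. The disp32 is sign-extended to
// 64 bits, so it can hold the absolute address of `a` only if that address
// lies in [-2^31, 2^31) and is fixed at link time.
bool fits_sign_extended_disp32(int64_t addr) {
  return addr >= INT32_MIN && addr <= INT32_MAX;
}

// Effective address of a[i] (4-byte elements) with the absolute form.
int64_t ea_absolute(int32_t disp32, int64_t i) {
  return static_cast<int64_t>(disp32) + i * 4;
}

// -fPIE form: lea disp32(%rip),%rax; mov (%rax,%rdx,4),%esi.
// RIP-relative addressing takes no index register, so &a is first
// computed into a register and then indexed.
int64_t ea_pie(int64_t next_insn_addr, int32_t disp32, int64_t i) {
  const int64_t base = next_insn_addr + disp32;  // value produced by the lea
  return base + i * 4;                           // (%rax,%rdx,4)
}
```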
>>>
>>> Can you improve GCC codegen for this?
>>
>> I didn't find an instruction like that which I could use. Is there one?
>>
>>> I implemented an
>>> optimization in ld to convert
>>>
>>> mov foo@GOTPCREL(%rip), %reg
>>> to
>>> lea foo(%rip), %reg
>>>
>>> for the locally defined symbol, foo. It improves PIE performance
>>> by as much as 10%. You may want to implement it in gold. See
>>> elf_x86_64_convert_mov_to_lea for details.
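The conversion works because of how the two instructions are encoded on x86-64:

- mov disp32(%rip),%reg is REX.W 8B /r.
- lea disp32(%rip),%reg is REX.W 8D /r.

They differ only in the opcode byte. The linker can patch 0x8b to 0x8d and turn the R_X86_64_GOTPCREL relocation into R_X86_64_PC32, so the displacement points at foo itself instead of at its GOT slot.

The C++ sketch below shows only that byte-level rewrite. The Reloc struct and convert_mov_to_lea_sketch are hypothetical. They are not ld's elf_x86_64_convert_mov_to_lea, which applies further checks.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

enum class RelocType { GOTPCREL, PC32 };

// Hypothetical relocation record for this sketch.
struct Reloc {
  RelocType type;
  std::size_t offset;    // offset of the 4-byte displacement in `text`
  bool symbol_is_local;  // foo is defined in, and bound within, this module
};

// Rewrites `mov foo@GOTPCREL(%rip), %reg` into `lea foo(%rip), %reg`.
// Byte layout: [REX.W] [opcode] [ModRM] [disp32], with r.offset -> disp32.
bool convert_mov_to_lea_sketch(std::vector<uint8_t>& text, Reloc& r) {
  if (r.type != RelocType::GOTPCREL || !r.symbol_is_local)
    return false;  // a preemptible symbol still needs its GOT slot
  if (r.offset < 2 || r.offset + 4 > text.size())
    return false;
  uint8_t& opcode = text[r.offset - 2];
  const uint8_t modrm = text[r.offset - 1];
  // Only a load (opcode 0x8b) using RIP-relative addressing (mod=00, rm=101).
  if (opcode != 0x8b || (modrm & 0xc7) != 0x05)
    return false;
  opcode = 0x8d;             // lea computes the address instead of loading it
  r.type = RelocType::PC32;  // displacement now resolves to foo, not its GOT slot
  return true;
}
```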
>>
>> Wow, this is cool! But with copy relocation support for PIE, I think
>> this should already be fixed, since the compiler can safely assume that the
>> global is defined in the executable no matter what. Do you have an
>> example where foo@GOTPCREL is still used for globals?
>>
>> foo.cc
>> ---------
>> #include <cstdio>
>> extern int a;
>> int main()
>> {
>> printf("%p", (void *)&a);
>> }
>>
>> Before copy relocation support for PIE was checked into GCC:
>>
>> foo.s
>> ------
>>
>> ....
>> movq a@GOTPCREL(%rip), %rax
>> .....
>>
>> and after copy relocation support:
>>
>> foo.s
>> ------
>>
>> .......
>> leaq a(%rip), %rsi
>> ......
>>
>> Did I miss something?
>>
>>
>
> If you don't have GOTPCREL relocations against locally
> defined symbols, this optimization won't apply.
Here is the same libstdc++.so.6.0.21 from today's GCC 5 on Linux/x86-64, linked by ld.bfd and by ld.gold.
With ld.bfd:
[hjl@gnu-6 src]$ readelf -r /tmp/libstdc++.so.6.0.21 |wc -l
4659
[hjl@gnu-6 src]$
With ld.gold:
[hjl@gnu-6 src]$ readelf -r .libs/libstdc++.so.6.0.21 |wc -l
5516
[hjl@gnu-6 src]$
ld.bfd has another optimization:
commit dd7e64d45b317128f5fe813a8da0b13b4ad046ae
Author: H.J. Lu <hjl.tools@gmail.com>
Date: Tue Nov 25 05:05:39 2014 -0800
Optimize out i386/x86-64 JUMP_SLOT relocation
When there are both PLT and GOT references to the same function symbol,
the linker will create a GOTPLT slot for the PLT entry and a GOT slot for the GOT
reference. A run-time JUMP_SLOT relocation is created to update the
GOTPLT slot and a run-time GLOB_DAT relocation is created to update the
GOT slot. Both JUMP_SLOT and GLOB_DAT relocations will apply the same
symbol value to GOTPLT and GOT slots, respectively, at run-time.
This optimization combines GOTPLT and GOT slots into a single GOT slot
and removes the run-time JUMP_SLOT relocation. It replaces the regular
PLT entry:
indirect jump [GOTPLT slot]
push relocation index
jump PLT0
with a GOT PLT entry with an indirect jump via the GOT slot:
indirect jump [GOT slot]
nop
and resolves PLT references to the GOT PLT entry.
We must avoid this optimization if pointer equality is needed since
we don't clear symbol value in this case and the dynamic linker won't
update the GOT slot. Otherwise, the resulting binary will get into an
infinite loop at run-time.
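The rule in this commit message is a decision made per function symbol. The minimal C++ sketch below models only which run-time relocations are emitted. The FuncSymbol struct, its flags and count_dyn_relocs are hypothetical names, and the sketch does not generate the PLT or GOT contents.

```cpp
#include <cassert>

// Hypothetical per-symbol facts gathered while scanning relocations.
struct FuncSymbol {
  bool has_plt_ref;              // the function is called through the PLT
  bool has_got_ref;              // its address is loaded via a GOT slot
  bool pointer_equality_needed;  // its symbol value must stay the PLT address
};

struct DynRelocCount {
  int jump_slot;  // run-time JUMP_SLOT relocations (GOTPLT slot)
  int glob_dat;   // run-time GLOB_DAT relocations (GOT slot)
};

// Counts run-time relocations for one symbol, with or without combining
// the GOTPLT and GOT slots into a single GOT slot.
DynRelocCount count_dyn_relocs(const FuncSymbol& s, bool optimize) {
  const bool combine = optimize && s.has_plt_ref && s.has_got_ref &&
                       !s.pointer_equality_needed;
  DynRelocCount c{0, 0};
  if (s.has_got_ref) c.glob_dat = 1;               // GOT slot always updated
  if (s.has_plt_ref && !combine) c.jump_slot = 1;  // separate GOTPLT slot
  return c;
}
```

With combine set, the PLT entry becomes the "indirect jump [GOT slot]; nop" form and shares the GLOB_DAT-updated slot. When pointer equality is needed, the sketch falls back to the regular PLT entry, as the commit message requires.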
You may want to implement it in gold.
--
H.J.