This is the mail archive of the binutils@sourceware.org mailing list for the binutils project.
Re: [PATCH][x86_64] Convert indirect call via GOT to direct when possible
- From: Sriraman Tallam <tmsriram at google dot com>
- To: "H.J. Lu" <hjl dot tools at gmail dot com>
- Cc: binutils <binutils at sourceware dot org>, Cary Coutant <ccoutant at gmail dot com>, David Li <davidxl at google dot com>
- Date: Fri, 20 May 2016 14:15:49 -0700
- Subject: Re: [PATCH][x86_64] Convert indirect call via GOT to direct when possible
- References: <CAAs8HmxxdBpS7w8udZgK0QFi5TnenU3wGhpPfhWeKE8Tr=thvA at mail dot gmail dot com> <CAMe9rOpk3aOK5mMkKvYQyzeQxJ-h8o+3KjLRikKSkLmMfqoUtg at mail dot gmail dot com>
On Fri, May 20, 2016 at 1:32 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
> On Fri, May 20, 2016 at 1:27 PM, Sriraman Tallam <tmsriram@google.com> wrote:
>> Hi,
>>
>> GCC has the option -fno-plt, which converts all extern calls to indirect
>> calls via the GOT to prevent the linker from generating any PLT stubs.
>> However, if the function ends up defined in the executable, this patch
>> converts those indirect calls/jumps to direct ones. Since the indirect
>> call is one byte longer, an extra nop is needed at the beginning.
>>
>> Here is a simple example:
>>
>> main.c
>> ---------
>> extern int foo();
>> int main() {
>>   return foo();
>> }
>>
>> deffoo.c
>> -----------
>> int foo() {
>>   return 0;
>> }
>>
>> $ gcc -fno-plt main.c deffoo.c
>> $ objdump -d a.out
>>
>> 0000000000400626 <main>:
>> ...
>> 40062a: ff 15 28 14 00 00 callq *0x1428(%rip) # 401a58 <_DYNAMIC+0x1d8>
>>
>> The call is indirect even though foo is defined in the executable.
>>
>> With this patch,
>> 0000000000400606 <main>:
>> ....
>> 40060a: 90 nop
>> 40060b: e8 03 00 00 00 callq 400613 <foo>
>>
>> The call is now direct with an extra nop.
>>
>>
>
> Please try ld, which uses 0x67 prefix (addr32) instead of nop.
Is this committed to ld? Trunk ld does not seem to do this.
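
For reference, the two call rewrites discussed above differ only in the padding byte. The following is a minimal C sketch of the byte-level transformation, not code taken from the patch or from ld. The names relax_got_call and pad_style are hypothetical, and the sketch assumes the linker has already established that the target is defined in the executable and within rel32 range.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names for illustration; neither the patch nor ld uses them. */
enum pad_style {
    PAD_NOP_BEFORE,    /* 90 e8 rel32: nop, then call (this patch)        */
    PAD_ADDR32_PREFIX  /* 67 e8 rel32: addr32 call (the ld variant above) */
};

/* Rewrite the 6-byte `call *disp32(%rip)` (ff 15 disp32) stored in insn[]
   into a direct call to `target`. `insn_addr` is the address of insn[0]. */
void relax_got_call(uint8_t insn[6], uint64_t insn_addr,
                    uint64_t target, enum pad_style style)
{
    assert(insn[0] == 0xff && insn[1] == 0x15);

    /* In both forms the direct call ends at insn_addr + 6, so the
       displacement is the same regardless of the padding byte. */
    int64_t delta = (int64_t)(target - (insn_addr + 6));
    assert(delta >= INT32_MIN && delta <= INT32_MAX);
    uint32_t rel32 = (uint32_t)(int32_t)delta;

    insn[0] = (style == PAD_NOP_BEFORE) ? 0x90 : 0x67;
    insn[1] = 0xe8;
    /* Store the displacement in little-endian byte order. */
    insn[2] = (uint8_t)(rel32 & 0xff);
    insn[3] = (uint8_t)((rel32 >> 8) & 0xff);
    insn[4] = (uint8_t)((rel32 >> 16) & 0xff);
    insn[5] = (uint8_t)((rel32 >> 24) & 0xff);
}
```

With the addresses from the quoted example (a 6-byte slot at 0x40060a and foo at 0x400613), the displacement is 0x400613 - 0x400610 = 3, which matches the e8 03 00 00 00 bytes in the patched disassembly.
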
Also a quick thing about -fPIE and -fno-plt. The assembly looks like
this for the call:
movq foo@GOTPCREL(%rip), %rax
jmp *%rax
Why can't we make this a single jmp *foo@GOTPCREL(%rip)?
This goes via the GOT if foo is external, and the GOT is always reachable
with a 32-bit offset. Did I miss anything obvious?
> Also for
>
> jmp *foo@GOTPCREL(%rip)
>
> ld converts it to
>
> jmp foo
> nop
>
> --
> H.J.
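
The jmp conversion quoted above pads after the instruction rather than before it. A minimal C sketch of that rewrite follows, assuming the standard encodings (ff 25 disp32 for the indirect jmp, e9 rel32 for the direct one). The function name relax_got_jmp is hypothetical, and this is not ld's actual implementation.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper for illustration only.
   Rewrite the 6-byte `jmp *disp32(%rip)` (ff 25 disp32) stored in insn[]
   as a 5-byte `jmp rel32` (e9 rel32) followed by a 1-byte nop (90).
   `insn_addr` is the address of insn[0]. */
void relax_got_jmp(uint8_t insn[6], uint64_t insn_addr, uint64_t target)
{
    assert(insn[0] == 0xff && insn[1] == 0x25);

    /* The direct jmp is 5 bytes long, so rel32 is measured from
       insn_addr + 5; the trailing nop is never executed. */
    int64_t delta = (int64_t)(target - (insn_addr + 5));
    assert(delta >= INT32_MIN && delta <= INT32_MAX);
    uint32_t rel32 = (uint32_t)(int32_t)delta;

    insn[0] = 0xe9;
    /* Store the displacement in little-endian byte order. */
    insn[1] = (uint8_t)(rel32 & 0xff);
    insn[2] = (uint8_t)((rel32 >> 8) & 0xff);
    insn[3] = (uint8_t)((rel32 >> 16) & 0xff);
    insn[4] = (uint8_t)((rel32 >> 24) & 0xff);
    insn[5] = 0x90;
}
```

Because an unconditional jmp never falls through, the padding can go after it; the call case has to keep its return address at the end of the 6-byte slot, which is why the padding byte goes in front there.
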