This is the mail archive of the
binutils@sourceware.org
mailing list for the binutils project.
Re: [PATCH][x86_64] Convert indirect call via GOT to direct when possible
- From: Cary Coutant <ccoutant at gmail dot com>
- To: Sriraman Tallam <tmsriram at google dot com>
- Cc: "H.J. Lu" <hjl dot tools at gmail dot com>, binutils <binutils at sourceware dot org>, David Li <davidxl at google dot com>
- Date: Fri, 10 Jun 2016 13:06:04 -0700
- Subject: Re: [PATCH][x86_64] Convert indirect call via GOT to direct when possible
- Authentication-results: sourceware.org; auth=none
- References: <CAAs8HmxxdBpS7w8udZgK0QFi5TnenU3wGhpPfhWeKE8Tr=thvA at mail dot gmail dot com> <CAMe9rOpk3aOK5mMkKvYQyzeQxJ-h8o+3KjLRikKSkLmMfqoUtg at mail dot gmail dot com> <CAAs8Hmw2KQ2neDNP5cnQPBVBZMJthvQGTARPiwa-NfAx5R6ugw at mail dot gmail dot com> <CAMe9rOrOyYv0+svcObyaBcoYbAWZTadEPm-mAGQUFFyNjPgctg at mail dot gmail dot com> <CAAs8HmzoxWe2YpvjviV-bs2BRotGa_WFWbCyyLh-_L=s00yxjQ at mail dot gmail dot com>
>>> * x86_64.cc (can_convert_callq_to_direct): New function.
>>> Target_x86_64<size>::Scan::global: Check if an indirect call via
>>> GOT can be converted to direct.
>>> Target_x86_64<size>::Relocate::relocate: Change any indirect call
>>> via GOT that can be converted.
>>> * testsuite/Makefile.am (x86_64_indirect_call_to_direct.sh): New test.
>>> * testsuite/Makefile.in: Regenerate.
>>> * testsuite/x86_64_indirect_call_to_direct1.s: New file.
>>> * testsuite/x86_64_indirect_jump_to_direct1.s: New file.
>>>
>>
>> Do you need to check R_X86_64_REX_GOTPCRELX for branch?
>
> Ok, patch changed to not check for this and refactored a bit.
Thanks for the refactoring -- I was about to suggest that. There's
still too much logic outside the can_convert_* functions that is
duplicated in Scan::global() and in Relocate::relocate(), all because
we don't want to fetch the section contents without doing a few
preliminary checks. I'd like to move most of that logic into the
can_convert_* functions, so let's start with a Lazy_view class, which
will fetch the section contents only when needed:
template<int size>
class Lazy_view
{
public:
Lazy_view(Sized_relobj_file<size, false>* object, unsigned int data_shndx)
: object_(object), data_shndx_(data_shndx), view_(NULL), view_size_(0)
{ }
inline unsigned char
operator[](size_t offset)
{
if (this->view_ == NULL)
this->view_ = this->object_->section_contents(this->data_shndx_,
&this->view_size_,
true);
if (offset >= this->view_size_)
return 0;
return this->view_[offset];
}
private:
Sized_relobj_file<size, false>* object_;
unsigned int data_shndx_;
const unsigned char* view_;
section_size_type view_size_;
};
Now we can move (almost) all of that external logic into the
can_convert_* routines, leaving us with this in Scan::global():
Lazy_view<size> view(object, data_shndx);
size_t r_offset = reloc.get_r_offset();
if (r_offset >= 2
&& Target_x86_64<size>::can_convert_mov_to_lea(gsym, r_type,
r_offset, &view))
break;
if (r_offset >= 2
&& Target_x86_64<size>::can_convert_callq_to_direct(gsym, r_type,
r_offset, &view))
break;
... and this in Relocate::relocate():
if ((gsym == NULL
&& rela.get_r_offset() >= 2
&& view[-2] == 0x8b
&& !psymval->is_ifunc_symbol())
|| (gsym != NULL
&& rela.get_r_offset() >= 2
&& Target_x86_64<size>::can_convert_mov_to_lea(gsym, r_type,
0, &view)))
{
...
}
else if (gsym != NULL
&& rela.get_r_offset() >= 2
&& Target_x86_64<size>::can_convert_callq_to_direct(gsym,
r_type,
0, &view))
{
if (view[-1] == 0x15)
{
...
}
else
{
...
}
}
Folding all the external logic into the can_convert_* functions, and
refactoring a bit gives this:
template<class View_type>
static inline bool
can_convert_mov_to_lea(const Symbol* gsym, unsigned int r_type,
size_t r_offset, View_type* view)
{
gold_assert(gsym != NULL);
// We cannot do the conversion unless it's one of these relocations.
if (r_type != elfcpp::R_X86_64_GOTPCREL
&& r_type != elfcpp::R_X86_64_GOTPCRELX
&& r_type != elfcpp::R_X86_64_REX_GOTPCRELX)
return false;
// We cannot convert references to IFUNC symbols, or to symbols that
// are not local to the current module.
if (gsym->type() == elfcpp::STT_GNU_IFUNC
|| gsym->is_undefined ()
|| gsym->is_from_dynobj()
|| gsym->is_preemptible())
return false;
if (parameters->options().shared()
&& (gsym->visibility() == elfcpp::STV_DEFAULT
|| gsym->visibility() == elfcpp::STV_PROTECTED)
&& !parameters->options().Bsymbolic())
return false;
// We cannot convert references to the _DYNAMIC symbol.
if (strcmp(gsym->name(), "_DYNAMIC") == 0)
return false;
// Check for a MOV opcode.
return (*view)[r_offset - 2] == 0x8b;
}
template<class View_type>
static inline bool
can_convert_callq_to_direct(const Symbol* gsym, unsigned int r_type,
size_t r_offset, View_type* view)
{
gold_assert(gsym != NULL);
// We cannot do the conversion unless it's a GOTPCRELX relocation.
if (r_type != elfcpp::R_X86_64_GOTPCRELX)
return false;
// We cannot convert references to IFUNC symbols, or to symbols that
// are not local to the current module.
if (gsym->type() == elfcpp::STT_GNU_IFUNC
|| gsym->is_undefined ()
|| gsym->is_from_dynobj()
|| gsym->is_preemptible())
return false;
// Check for a CALLQ or JMPQ opcode.
return ((*view)[r_offset - 2] == 0xff
&& ((*view)[r_offset - 1] == 0x15
|| (*view)[r_offset - 1] == 0x25));
}
A couple of observations, though:
1. Sri, in your patch, you just test for sym type == STT_FUNC. Isn't
it sufficient to test for sym type != STT_GNU_IFUNC (as in the
convert-to-lea case)? I don't think it really matters -- if we see a
jump to an STT_OBJECT or STT_NOTYPE symbol, why isn't the
transformation just as valid?
2. HJ, given an R_X86_64_GOTPCRELX relocation, is it still necessary
to check the opcode during Scan::global()? Doesn't the relocation
guarantee that it's an appropriate instruction for the transformation?
I think in both cases, we could skip fetching the section contents if
we have this relocation.
3. This piece of can_convert_mov_to_lea has me a bit puzzled:
if (parameters->options().shared()
&& (gsym->visibility() == elfcpp::STV_DEFAULT
|| gsym->visibility() == elfcpp::STV_PROTECTED)
&& !parameters->options().Bsymbolic())
return false;
We've already tested for is_preemptible, which should take care of all
of these cases. If I remove this code, however, I get a couple of
failures that I need to investigate. At the very least, I suspect this
part of the logic can be simplified. I want to investigate this
further before committing the whole thing.
-cary