This is the mail archive of the
mailing list for the binutils project.
Re: RFC: Possible tweak to MIPS16/microMIPS PLT choice
- From: Richard Sandiford <rdsandiford at googlemail dot com>
- To: "Maciej W. Rozycki" <macro at codesourcery dot com>
- Cc: <binutils at sourceware dot org>
- Date: Mon, 07 Oct 2013 19:55:43 +0100
- Subject: Re: RFC: Possible tweak to MIPS16/microMIPS PLT choice
- Authentication-results: sourceware.org; auth=none
- References: <87y567spj7 dot fsf at talisman dot default> <alpine dot DEB dot 1 dot 10 dot 1310071523470 dot 6696 at tp dot orcam dot me dot uk> <87wqlphulb dot fsf at sandifor-thinkpad dot stglab dot manchester dot uk dot ibm dot com> <alpine dot DEB dot 1 dot 10 dot 1310071826100 dot 6696 at tp dot orcam dot me dot uk>
"Maciej W. Rozycki" <email@example.com> writes:
> On Mon, 7 Oct 2013, Richard Sandiford wrote:
>> > I still owe you a new version of the test suite part proposed there, I
>> > remember -- I just couldn't allocate any time to do that, sigh...
>> NP -- I have a harness I'm happy with, I just need to fill in the details :-)
> I explicitly covered some corner cases in my proposal, I'd prefer it
> didn't get lost and therefore some form of the original test suite made
> its way into our sources. I reckon your main concern was the redundancy
> of some cases and the lack of machine code matching.
Let's leave the discussion of that until I've finished the proposed version.
You can tell me if I missed a case.
FWIW, I think I've found one bug. I haven't checked yet, but I'm pretty
sure it predates your patch.
>> > Hmm, I'm not sure if I follow you. The choice looks obvious to me: we
>> > want to use microMIPS PLT entries whenever possible because they are
>> > smaller (the whole purpose of the creation of the microMIPS ISA). In your
>> > scenario:
>> > 1. We obviously can use a microMIPS PLT entry because we have microMIPS
>> > code in the object being linked.
>> > 2. We obviously can't use a microMIPS PLT entry because the presence of
>> > MIPS16 code precludes the use of microMIPS code.
>> > 3. We obviously can't use a microMIPS PLT entry because we only saw
>> > standard MIPS code and therefore we want to run it on a standard MIPS
>> > processor.
>> > 4. We obviously can use a microMIPS PLT entry because we have microMIPS
>> > code in the object being linked.
>> > -- so what's unclear about it?
>> > In the absence of JAL relocations it's the output microMIPS ELF file
>> > header flag that makes BFD decide whether to use microMIPS or standard PLT
>> > entries and it seems obvious and straightforward to me. Why do you think
>> > CALL relocations should be treated specially in any way? Am I missing
>> > anything?
>> Yeah, I should have been more explicit, sorry. I thought the justification
>> for using MIPS PLTs in microMIPS objects, rather than converting MIPS
>> JALs to JALXs, was that we wanted to use the caller's encoding to avoid
>> the hit of a mode change. Wouldn't the hit be just the same if the call
>> is coming indirectly through a JALR rather than directly through a JALX?
> The hit was mainly a concern for MIPS16 mode switches and I believe is
> irrelevant for indirect calls because for these there is no branch
> prediction and hence no need to invalidate state already prefetched into
> the pipeline.
Hmm, I'd naively assumed that it'd be obvious that JALX was going to be
a mode change. How do things get mispredicted? FWIW, in:
    1. There is a considerable pipeline reconfiguration overhead at least in
       some implementations for cross-mode jumps made to switch to and from
       the MIPS16 mode -- the pipeline has to be drained, the execution
       decoder reconfigured and instruction fetches for the new mode started
       from scratch. The overhead in some cores is I'm told a number between
       ten and twenty cycles; closer to the latter figure than the former
Do the cores manage to avoid the decoder reconfiguration delay for
indirect jumps, or is it just that all indirect jumps take the hit?
>> That is, I wasn't sure why we were treating:
>> jal foo
>> and the equivalent of:
>> lw $25, %call16(foo)($gp)
>> jalr $25
>> differently when deciding the encoding of foo. With JAL->JALX conversion,
>> both forms can call either "standard" or "compressed" code, but as things
>> stand, they don't influence the choice in the same way.
> A JAL->JALX conversion could be made, but the issue here is there is no
> equivalent available for the J instruction. So to avoid a further
> complication such a conversion isn't made at all. And don't forget the
> JAL->BAL or J->B relaxation that we do want to make whenever possible.
> Overall a lot of hassle for a questionable gain. Someone who wants to
> squeeze out as much as possible from memory available won't be using
> mixed-mode binaries anyway.
OK, fair point. So for the MIPS-in-a-microMIPS-object case (case 4),
using a MIPS PLT was as much for convenience and simplicity as anything?
Note that case (2) is the opposite though. It could occur with pure
MIPS16 input and is a case where we would use MIPS code where it wasn't
really necessary. OTOH, the lazy-binding stubs that the code would
normally use would be MIPS rather than MIPS16, so I suppose it probably
isn't worth worrying about.
Oh well, thanks for hearing me out.
>> So going back to the list above, the equivalent for JAL relocations is:
>> 1. microMIPS JAL + HI/LO ref -> microMIPS PLT
>> 2. MIPS16 JAL + HI/LO ref -> MIPS16 PLT
>> 3. MIPS JAL + HI/LO ref in non-microMIPS object -> MIPS PLT
>> 4. MIPS JAL + HI/LO ref in microMIPS object -> MIPS PLT
>> where (2) and (4) give different results from the GOT CALL case.
>> I was thinking the GOT CALL and JAL cases could be handled in the same way.
> Well, as noted above the objective is different for JAL vs GOT CALL, so I
> don't think the two cases need to match each other. It's fine for GOT
> CALL to switch modes -- although will it really? Does the GOT entry
> created to satisfy a CALL relocation point to a PLT slot rather than a
> lazy binding stub in a non-PIC abicalls executable? That would look like
> an unnecessary performance pessimisation to me -- why two indirect calls
> instead of one only?
It does if there are HI/LO references, because then we need the executable's
"foo" symbol to be the canonical function address. A lazy binding stub can't
be used for that because the stub/resolver interface relies on the incoming
value of $gp. And we can't have both a PLT and a stub, because they need
conflicting symbol definitions.
If we wanted to get fancy, we could make the .got entry initially point
to a local bit of code that calls the PLT and then copies the .got.plt
entry to the .got entry. But that too feels like unnecessary complexity
for a corner case.
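
For reference, the two four-case lists discussed above (GOT CALL vs. JAL,
both with a HI/LO reference) can be summarised as a small decision function.
This is only a sketch of the behaviour as described in this thread, not the
actual BFD code: the enum names, plt_encoding() and the output_is_micromips
parameter are all hypothetical, and the assertion encodes the statement that
MIPS16 code precludes microMIPS code.

```c
#include <assert.h>

/* Hypothetical names for illustration only; none of this is BFD code. */
enum isa { ISA_MIPS, ISA_MIPS16, ISA_MICROMIPS };
enum call_reloc { REL_JAL, REL_GOT_CALL };

/* Pick the PLT entry encoding for a symbol that also has a HI/LO reference.
   CALLER is the encoding of the calling code; OUTPUT_IS_MICROMIPS models
   the microMIPS flag in the output ELF file header.  */
static enum isa
plt_encoding (enum call_reloc reloc, enum isa caller, int output_is_micromips)
{
  /* As stated in the thread, MIPS16 code precludes microMIPS code.  */
  assert (!(caller == ISA_MIPS16 && output_is_micromips));

  /* Direct calls: match the caller's encoding, since no JAL->JALX
     conversion is made.  */
  if (reloc == REL_JAL)
    return caller;

  /* GOT CALL: only the output header flag matters, so a MIPS16 caller
     ends up with a standard MIPS PLT entry.  */
  return output_is_micromips ? ISA_MICROMIPS : ISA_MIPS;
}
```

With this model, plt_encoding (REL_GOT_CALL, ISA_MIPS16, 0) gives ISA_MIPS
while plt_encoding (REL_JAL, ISA_MIPS16, 0) gives ISA_MIPS16, which is the
case (2) difference; case (4) differs in the same way for a standard MIPS
caller with output_is_micromips set.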