This is the mail archive of the
mailing list for the binutils project.
Re: [PATCH] MIPS/binutils: microMIPS linker relaxation fixes
"Maciej W. Rozycki" <firstname.lastname@example.org> writes:
>> > So I have actually given it some more thought and my understanding of the
>> > ABI remains that while orphaned R_MIPS_LO16 relocations are indeed
>> > permitted, they still must be preceded by a corresponding R_MIPS_HI16,
>> > although that is not required to be adjacent. I believe this is only
>> > permitted to allow cases like you quoted to avoid unnecessary extra code
>> > to add missing R_MIPS_HI16 relocations.
>> There are still potential problems though. We deliberately allow things like:
>> lui $4,%hi(foo)
>> lw $6,%lo(foo)($4)
>> lw $7,%lo(foo+4)($4)
>> .align 8
>> foo:
>> .word X, Y
>> and foo is allowed to be in a text section. Does your patch ensure that
>> foo remains 8-byte aligned, even if we relax code earlier in the section?
> Sigh, you're right -- I wish we realised this earlier on. No, the
> alignment of foo will get broken of course just as alignment of standard
> MIPS code would, as noted with the original submission of this update.
> Of course if you run this under Linux, then you won't notice unless you
> observe the performance drop badly.
Hmm, run the code above you mean? The point is that the code doesn't
work if foo becomes 0x....7ffc, since foo and foo+4 no longer have
the same high part.
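To make the failure concrete, here is a minimal sketch of the %hi/%lo arithmetic, assuming the usual convention that %lo is a sign-extended 16-bit value and %hi is adjusted to compensate. The helper names and the two addresses are hypothetical and chosen only for illustration:

```python
def hi16(addr):
    """%hi: upper 16 bits, rounded to compensate for a sign-extended %lo."""
    return ((addr + 0x8000) >> 16) & 0xFFFF


def lo16(addr):
    """%lo: low 16 bits interpreted as a signed 16-bit value."""
    low = addr & 0xFFFF
    return low - 0x10000 if low & 0x8000 else low


def effective_address(hi_of, lo_of):
    """Address formed by `lui %hi(hi_of)` followed by a load using `%lo(lo_of)`."""
    return ((hi16(hi_of) << 16) + lo16(lo_of)) & 0xFFFFFFFF


aligned = 0x10007FF8     # hypothetical 8-byte-aligned foo
misaligned = 0x10007FFC  # hypothetical foo after relaxation shifted it by 4

for foo in (aligned, misaligned):
    reached = effective_address(foo, foo + 4)
    print(f"foo={foo:#x}: second lw reaches {reached:#x}, wanted {foo + 4:#x}")
```

With the 8-byte-aligned address the single lui serves both loads. With the shifted address, %lo(foo+4) wraps to a negative offset while the lui still holds %hi(foo), so the second load lands 0x10000 bytes too low.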
>> > Do you have a better idea?
>> TBH, my inclination is to remove it from trunk too. I imagine
>> GCC's LTO will catch many of the interesting cases (because then
>> we assemble the output object's text section at once).
> OK, so let's see where we are. We've got three kinds of relaxation
> actions we make:
> 1. I think with the changes I made to branch relaxation in GAS we are
> mostly covered. There's one corner case remaining I reckon (I'd have
> to go back to the code and/or my earlier notes to track it down), where
> we fail to convert to a short or compact branch. And branches between
> separate modules are extremely rare, so I wouldn't bother about them.
> So all the branch relaxation code here should by now have been mostly
> redundant. I'll have a look into that corner case yet -- I may not be
> able to do that immediately though.
Sounds good. :-)
> 2. Short delay slot relaxation, i.e. JAL->JALS conversion. We actually
> should be handling JALR->JALRS and BGEZAL/BLTZAL->BGEZALS/BLTZALS as
> well, but we don't. These can and actually should be done in GAS.
> There are two cases to handle:
> * Instructions swapped into a delay slot. I reckon this is a bit
> tricky, but I think still doable. The instruction to be swapped is
> already of the right size, it's just not swapped if it's of the wrong
> size for the delay slot. We should enable that swapping and flip the
> delay slot size bit in the respective branch/jump opcode.
Yeah. This doesn't seem too difficult on face value though.
> * Instructions manually scheduled in a delay slot ("noreorder" mode).
> Currently the mnemonic used for the branch/jump determines the size
> of this instruction. I think we should always treat the long delay
> slot mnemonics as macros; they will often come from assembly written
> for the standard MIPS mode the conversion of which to the respective
> short delay slot mnemonics is IMO infeasible. Not even mentioning
> that if operands are substituted in any way (e.g. by macro
> expansion), then the size of the instruction may vary between
> assembly passes.
> Again, this may be a bit tricky as it requires looking forwards it
> would seem. But perhaps we can handle this with relaxation, or maybe
> simpler yet -- by tweaking the previous instruction emitted through
> the history of instructions we maintain.
Yeah. We'd still need a variant frag in the latter case, to cope with
things like ".loc"s between the two instructions. But I agree full
relaxation isn't needed. We already change variant frags on the fly
when doing things like nop insertion, so we might be able to do
something similar here.
> I think we should have a way to disable this branch/jump conversion,
> perhaps in the "nomacro" mode or with a new setting (up to debate).
Not sure if I follow this, probably due to my lack of familiarity with
microMIPS. If you really want a JALR rather than a JALRS, wouldn't the
simplest and most explicit way be to add ".32" to the delay slot insn?
There's then no way the assembler could validly change the JALR.
If I'm wrong about that, then I don't think ".set nomacro" is appropriate.
I think "macro" in that context means "one pseudo-instruction that
expands to multiple real instructions".
So I agree it makes sense to treat "JALR" as a macro (in the INSN_MACRO
sense). For consistency, it seems sensible to allow "JALR ...; FOO.16"
to be written as a shorthand for "JALRS ...; FOO.16" as well. I.e.
it seems sensible not to care where the 16-bitness comes from.
> * While at it we might want to think about instruction swapping around
> JALX -- as noted above we don't do that if the instruction does not
> satisfy the delay slot size requirement and there's no JALXS
> instruction. We could convert the instruction to the 32-bit size.
> But then it may be really tough unless we relax all the 16-bit
> instructions which, conversely, seems an overkill to me. So I
> wouldn't put too much effort into it, but still I think it's worth
Agreed on both counts (about it being interesting, but lower priority).
> 3. HI0_LO16 and ADDIUPC relaxation. There's nothing that can be done for
> the former any earlier than by the linker, period. But do we care? I
> think the architecture makes this optimisation unlikely to matter.
> It's really unusual for TLB systems to map these low/high pages. Are
> they used in BAT systems? I don't know -- can anyone comment? The
> addresses from 0 up are typically useful in the error exception
> handlers (where CP0.Status.ERL switches to the identity mapping of the
> virtual address space), but are they really such a common case as to
> dedicate a linker optimisation for? I doubt it. So I think we can
> safely drop this feature and nobody will notice.
Also sounds good :-)
> Now as to the ADDIUPC relaxation -- this I think is really worth the
> trouble as I have seen significant text size reduction as a result of
> this optimisation. I'll dig out the exact figures I've got with an
> example app. The problem is again you cannot really make this
> optimisation any earlier than in the linker. The compiler or assembler
> do not know what the size of the final executable will be and therefore
> which references are going to fit in the ADDIUPC's range or not.
It'd be interesting to see the numbers. The most telling statistic
would probably be the contribution made by ADDIUPC references to symbols
that don't live in text sections.
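As a rough illustration of why this decision needs final addresses, the range check the linker would have to make can be sketched as below. This assumes the microMIPS ADDIUPC encoding carries a signed 23-bit immediate scaled by 4 and applied to the PC with its low two bits cleared; the function name and the sample addresses are hypothetical:

```python
# Assumption: signed 23-bit immediate, scaled by 4, relative to (PC & ~3).
ADDIUPC_IMM_BITS = 23


def fits_addiupc(pc, target):
    """Return True if `target` is reachable from an ADDIUPC located at `pc`."""
    base = pc & ~3
    offset = target - base
    if offset & 3:
        return False  # the scaled immediate cannot express this offset
    limit = 1 << (ADDIUPC_IMM_BITS - 1 + 2)  # 2**24 bytes either side
    return -limit <= offset < limit


pc = 0x00400102  # hypothetical address of the instruction
print(fits_addiupc(pc, 0x00410000))  # nearby, word-aligned target
print(fits_addiupc(pc, 0x02000000))  # target beyond the assumed range
```

Both `pc` and `target` are only known once the output sections have been laid out, which is the point made above about the compiler and assembler.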
> Hmm, I wonder if there's anything we could do about this. One thought
> I've got is to refrain from making this optimisation if there are data
> symbols in code being processed. But it seems unlikely to me to work
> reasonably, because (please correct me if I am wrong) at the point
> relaxation is made all the text sections from all the modules have
> already been merged into the respective output sections and we cannot
> only omit the fragments that correspond to modules that had data
> symbols in text while preserving their alignment too if any of the
> preceding fragments shrinks. At least without turning half of the BFD
> linker code upside down, the lone idea of which makes me feel chilly.
Taking "fragment" to mean "input section", then I don't think that
in itself is a problem. Internal alignment within each input section
can't be larger than the alignment of the input section itself.
But I'm not sure we can rely on symbols like "foo" in the example
above having a different type from ".LXXXX"-style branch targets,
or from labels inserted for exception handling, debug info, etc.
You would also need to stop references to local data from being
converted into section-relative form (%lo(foo) becoming
%lo(.text + const), etc.) We already have legacy objects
in which that sort of transformation has happened.
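The alignment point above (internal alignment cannot exceed the input section's own alignment) can be sketched with a toy layout model. It assumes the linker re-pads each input section to its declared alignment after relaxation; the section names, sizes and the offset of foo are all hypothetical:

```python
def align_up(addr, alignment):
    """Round addr up to the next multiple of alignment."""
    return (addr + alignment - 1) // alignment * alignment


def layout(sections, base=0):
    """sections: list of (name, size, alignment). Returns {name: start address}."""
    starts = {}
    addr = base
    for name, size, alignment in sections:
        addr = align_up(addr, alignment)
        starts[name] = addr
        addr += size
    return starts


FOO_OFFSET = 0x10  # hypothetical offset of foo inside b.text

# An earlier input section shrinks from 0x26 to 0x20 bytes during relaxation.
before = layout([("a.text", 0x26, 4), ("b.text", 0x40, 8)])
after = layout([("a.text", 0x20, 4), ("b.text", 0x40, 8)])
print((before["b.text"] + FOO_OFFSET) % 8, (after["b.text"] + FOO_OFFSET) % 8)

# Shrinking code inside b.text ahead of foo by 2 bytes is what breaks it.
print((after["b.text"] + FOO_OFFSET - 2) % 8)
```

In this model, shrinking an earlier input section leaves foo aligned because b.text is padded back to its 8-byte boundary. Only shrinking code within the same input section, ahead of foo, moves it off that boundary.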
> Any other thoughts? What do the others do -- or are we the only target
> doing this kind of linker relaxation? What's LTO BTW?
Link time optimisation. GCC stores IL in each object file, and then
instead of linking the original assembly from each object together,
it can merge the IL and recompile it into a single piece of assembly.