This is the mail archive of the binutils@sourceware.org mailing list for the binutils project.



Re: New .nops directive, to aid Linux alternatives patching?


On 08/02/2018 20:36, H.J. Lu wrote:
> On Thu, Feb 8, 2018 at 12:33 PM, Andrew Cooper
> <andrew.cooper3@citrix.com> wrote:
>> On 08/02/2018 20:28, H.J. Lu wrote:
>>> On Thu, Feb 8, 2018 at 12:27 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>>>> On Thu, Feb 8, 2018 at 12:18 PM, Andrew Cooper
>>>> <andrew.cooper3@citrix.com> wrote:
>>>>> On 08/02/2018 20:10, H.J. Lu wrote:
>>>>>> On Thu, Feb 8, 2018 at 11:26 AM, Andrew Cooper
>>>>>> <andrew.cooper3@citrix.com> wrote:
>>>>>>> Hello,
>>>>>>>
>>>>>>> I realise this is a little bit niche, but how feasible would it be to
>>>>>>> introduce a new .nops directive which takes a size parameter, and
>>>>>>> outputs long nops covering the number of specified bytes?
>>>>>>>
>>>>>> Sounds to me you want a pseudo NOP instruction:
>>>>>>
>>>>>> pseudo-NOP N
>>>>>>
>>>>>> which generates a long NOP with N byte.  Is that correct.  If yes,
>>>>>> what is the range of N?
>>>>> Currently 255 based on other implementation limits, and I expect that
>>>>> ought to be long enough for anyone.  There is one existing user for
>>>>> N=43, and I expect that to grow a bit.
>>>>>
>>>>> The real answer properly depends at what point it is more efficient to
>>>>> jmp rather than wasting decode bandwidth decoding nops, and I don't know
>>>>> the answer, but expect that it isn't larger than 255.
>>>>>
>>>> How about
>>>>
>>>> {nop} N
>>>>
>>>> If N is less than 15 bytes, it generates a long nop.   Otherwise, we use a jump
>>>> instruction over nops.  Does it work for you?
>>> N will be limited to 255.
>> Do you mean up to 255 bytes of adjacent long nops, or still a jump if
>> over 15 bytes?  For alternatives in the range of 15-30, a jmp is almost
>> certainly slower than executing through the nops.  The ORM isn't clear
>> where the split lies, and I expect it is very uarch specific.
> How about this
>
> {nop} N, L
> {nop} N
>
> N is <= 255. If L is missing, L is 15.
>
> If N < L then
>   Long NOPs up to N bytes
> else
>   jmp + long nops up to N bytes.
> fi

I'm afraid that I don't think that will be very helpful in that form.
Are there technical reasons why you don't want to emit more than a
single 15-byte long nop?

First of all, 9-byte long nops are the longest you can use without
suffering decode stalls on most processors due to excess segment
prefixes, which is why both Linux and Xen top out there when dynamically
adding new nops.

Secondly, I don't understand why you want the jmp.  I think it would be
entirely reasonable to make it the programmer's problem to work out when
a jmp is more efficient.  If the patchsites really do get stupidly long,
we could make a boot-time u-arch calculation to decide whether the jmp
or the nops are better.  Shorter patchsites are better, though, so I
don't expect a patchsite long enough for a jmp to be beneficial to get
any production use.

Ideally, such an implementation would just emit as many long nops as
would fill up the space requested.  One trick to consider, however, is
that if you've got N+10 bytes remaining and are emitting N-sized long
nops (where N is most likely 9), then emitting an N+8 long nop and a
2-byte long nop is more efficient to execute than an N+9 nop and a
single-byte nop, as the single-byte nop can't be optimised during
execution.
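
As a rough illustration of the fill strategy described above, here is a
minimal Python sketch, not a proposal for the actual gas implementation.
The names split_nops, emit_nops and LONG_NOPS are hypothetical.  The byte
sequences are the recommended multi-byte NOP encodings from the Intel SDM,
and the sketch assumes "N+8 and a 2-byte nop" means N bytes of long nops
followed by an 8-byte nop and a 2-byte nop.

```python
# Recommended multi-byte NOP encodings (Intel SDM), indexed by length.
LONG_NOPS = {
    1: bytes([0x90]),
    2: bytes([0x66, 0x90]),
    3: bytes([0x0F, 0x1F, 0x00]),
    4: bytes([0x0F, 0x1F, 0x40, 0x00]),
    5: bytes([0x0F, 0x1F, 0x44, 0x00, 0x00]),
    6: bytes([0x66, 0x0F, 0x1F, 0x44, 0x00, 0x00]),
    7: bytes([0x0F, 0x1F, 0x80, 0x00, 0x00, 0x00, 0x00]),
    8: bytes([0x0F, 0x1F, 0x84, 0x00, 0x00, 0x00, 0x00, 0x00]),
    9: bytes([0x66, 0x0F, 0x1F, 0x84, 0x00, 0x00, 0x00, 0x00, 0x00]),
}


def split_nops(total, max_len=9):
    """Split `total` padding bytes into nop lengths of at most `max_len`.

    Hypothetical helper: fills greedily with max_len-sized nops, but if
    that would leave a trailing single-byte nop, shortens the previous
    nop by one byte so the tail becomes a 2-byte nop instead.
    """
    if total < 0:
        raise ValueError("total must be non-negative")
    if not 3 <= max_len <= 9:
        raise ValueError("max_len must be between 3 and 9")

    full, rest = divmod(total, max_len)
    chunks = [max_len] * full
    if rest == 1 and full > 0:
        # Avoid "..., max_len, 1": emit "..., max_len - 1, 2" instead.
        chunks[-1] = max_len - 1
        chunks.append(2)
    elif rest:
        chunks.append(rest)
    return chunks


def emit_nops(total, max_len=9):
    """Return the padding bytes for a patch site of `total` bytes."""
    return b"".join(LONG_NOPS[n] for n in split_nops(total, max_len))
```

With the defaults, a 19-byte patch site is split as 9, 8, 2 rather than
9, 9, 1, and the existing N=43 user mentioned earlier in the thread would
get four 9-byte nops followed by a 7-byte nop.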

~Andrew

