This is the mail archive of the binutils@sourceware.org mailing list for the binutils project.



Re: Re: Help needed to track down bug: linking Linux kernel with gold creates unbootable kernel


John Reiser wrote:
Not identical, difference starts at byte 230:
-vmlinux.gold: file format elf64-x86-64
+vmlinux.bfd: file format elf64-x86-64
...
It appears to be a difference in the address chosen for that global (and
other globals later on).

Although the placement of globals chosen from a library need not be identical, it would be comforting to verify that this is the only reason. Try changing the link command to remove all *.a (extract and specify each *.o explicitly). Then there should be no difference.

There still is. The size of .rodata differs, so the order is probably still different too.


Does gold do some optimizations that bfd ld doesn't do (such as dropping unneeded globals, or reordering globals so that no space is wasted on alignment when another global fits in between)?

This is the commandline I used (using /usr/bin/ld vs /usr/local/bin/ld):
/usr/bin/ld --build-id -m elf_x86_64 -o vmlinux.bfd -T arch/x86/kernel/vmlinux.lds arch/x86/kernel/head_64.o arch/x86/kernel/head64.o arch/x86/kernel/head.o arch/x86/kernel/init_task.o init/built-in.o --start-group usr/built-in.o arch/x86/built-in.o kernel/built-in.o mm/built-in.o fs/built-in.o ipc/built-in.o security/built-in.o crypto/built-in.o block/built-in.o as/*.o lib/built-in.o arch/x86/lib/built-in.o drivers/built-in.o sound/built-in.o firmware/built-in.o arch/x86/pci/built-in.o arch/x86/power/built-in.o arch/x86/video/built-in.o net/built-in.o --end-group .tmp_kallsyms2.o


I still have differences:
-ffffffff810000e1: 48 01 2d 08 c0 46 00 add %rbp,0x46c008(%rip) # ffffffff8146c0f0 <trampoline_level4_pgt>
-ffffffff810000e8: 48 01 2d f9 cf 46 00 add %rbp,0x46cff9(%rip) # ffffffff8146d0e8 <trampoline_level4_pgt+0xff8>
+ffffffff810000e1: 48 01 2d 98 74 40 00 add %rbp,0x407498(%rip) # ffffffff81407580 <trampoline_level4_pgt>
+ffffffff810000e8: 48 01 2d 89 84 40 00 add %rbp,0x408489(%rip) # ffffffff81408578 <trampoline_level4_pgt+0xff8>
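
To make the comparison above concrete, here is a small sketch that recomputes the RIP-relative targets from the instruction bytes in those objdump lines. It assumes the 32-bit displacement is the last four bytes of the instruction, which holds for the "48 01 2d <disp32>" encoding shown; the helper name rip_target is made up for illustration.

```python
# Sketch: recompute the RIP-relative targets from the objdump lines above.
MASK64 = (1 << 64) - 1

def rip_target(insn_addr: int, insn_bytes: bytes) -> int:
    """Return the address referenced by a RIP-relative operand."""
    # Assumption: the disp32 is the final 4 bytes of the instruction.
    disp = int.from_bytes(insn_bytes[-4:], "little", signed=True)
    # RIP-relative operands are relative to the address of the NEXT instruction.
    next_insn = insn_addr + len(insn_bytes)
    return (next_insn + disp) & MASK64

gold = rip_target(0xffffffff810000e1, bytes.fromhex("48012d08c04600"))
bfd = rip_target(0xffffffff810000e1, bytes.fromhex("48012d98744000"))
print(hex(gold), hex(bfd), hex(gold - bfd))
```

The instruction itself sits at the same address in both files; only the displacement changes, which is consistent with the symbol having been placed at a different address rather than the code having changed.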


So I did this (the .s is obtained by objdump -d vmlinux.gold >gold.s)
sed -re 's/(# |0x)[a-z0-9]+/HEX/g' gold.s | colrm 1 47 >gold1.s

And diff those.
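
For reference, roughly the same normalization can be sketched in Python. This is only an approximation of the sed + colrm pipeline: it assumes 8-column tab stops when cutting the first 47 columns, and the output name bfd1.s is a placeholder for the second listing, not something from the actual run.

```python
import difflib
import re

# Same pattern as the sed expression: "# <hex>" comments and "0x<hex>" operands.
HEX_RE = re.compile(r"(# |0x)[a-z0-9]+")

def normalize(line: str, cut: int = 47) -> str:
    """Replace address-like tokens with HEX, then drop the first `cut` columns."""
    line = HEX_RE.sub("HEX", line.rstrip("\n"))
    # Assumption: tabs expand to 8-column stops, approximating colrm.
    return line.expandtabs(8)[cut:]

def diff_listings(gold_lines, bfd_lines):
    """Unified diff of two normalized disassembly listings."""
    a = [normalize(l) for l in gold_lines]
    b = [normalize(l) for l in bfd_lines]
    return list(difflib.unified_diff(a, b, "gold1.s", "bfd1.s", lineterm=""))
```

Two lines that differ only in addresses and displacements normalize to the same text, so they drop out of the diff.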

Then aside from some local symbol name differences:
-       cmp    HEX(%rip),%edx        HEX <.LC3>
+       cmp    HEX(%rip),%edx        HEX <kallsyms_token_index+HEX>

I have this diff (+ is bfd), which is coming from .notes (why does objdump think it needs to dump .notes as assembly though?):
+ add $HEX,%al
+ add %al,(%rax)
+ adc $HEX,%al
+ add %al,(%rax)
+ add (%rax),%eax
+ add %al,(%rax)
+ rex.RXB
+ rex.WRX push %rbp
+ add %dh,%bh
+ insb (%dx),%es:(%rdi)
+ jle ffffffff813d1250 <bad_to_user+HEX>
+ and $HEX,%dl
+ (bad)
+ jge ffffffff813d1331 <__start___ex_table+HEX>
+ cs
+ callq ffffffff2bbe8fc1 <__crc___pskb_pull_tail+HEX>
+ stc
+ mov $HEX,%ch
+ (bad)




... there is also a difference in padding:
gold uses 00 00 90 90 (add %al,(%rax); nop; nop), while BFD uses 90 90 90 90 (4 nops).

That is a dispute over interpretation of the linker script:

    } :text=0x9090

The original spec was from the days when 2==sizeof(int), so padding was a 16-bit value, thus 0x9090 was all that mattered. Check the spec for an update regarding the width of padding. In the meantime, try changing the script to

    } :text=0x90909090

which should remove this source of differences.

Yes, that removes the differences from the nops.
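
The two readings of the fill value can be sketched as follows. This is only an illustration of the bytes reported in this thread, not a statement of what either linker does internally: one reading repeats the hex digits as a byte pattern, the other treats the value as a 32-bit big-endian word. Both function names are made up.

```python
from itertools import cycle, islice

def fill_as_pattern(hex_literal: str, n: int) -> bytes:
    """Reading 1: the hex digits themselves are the repeating byte pattern."""
    digits = hex_literal[2:]              # assumes a leading "0x"
    if len(digits) % 2:
        digits = "0" + digits
    return bytes(islice(cycle(bytes.fromhex(digits)), n))

def fill_as_u32_be(value: int, n: int) -> bytes:
    """Reading 2: the value is a 32-bit big-endian word, repeated."""
    pattern = (value & 0xFFFFFFFF).to_bytes(4, "big")
    return bytes(islice(cycle(pattern), n))

print(fill_as_pattern("0x9090", 4).hex(" "))      # 90 90 90 90
print(fill_as_u32_be(0x9090, 4).hex(" "))         # 00 00 90 90
print(fill_as_u32_be(0x90909090, 4).hex(" "))     # 90 90 90 90
```

With 0x90909090 both readings produce the same bytes, which matches the observation that the padding differences disappear.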



If I read that correctly it means it uses hardware pages with a pagesize
of 2MB for kernel text.

Yes.


Since gold aligns only to 0x1000 perhaps the rodata ends up in the same
hardware page as the .text.

I think these are the relevant align commands from the vmlinux.lds ...

. = ALIGN((1 << 21));

It is a bug that gold does not propagate that alignment constraint to the .p_align.

If the hw pagesize is 2MB, then it's not divisible, so it's a bug.
Should I open a bug report, or are there some patches to gold that I
could try?

Definitely open a bug report about ". = ALIGN((1 << 21));"

Opened bug 11490.
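
The alignment concern can be illustrated with a little arithmetic. The end-of-text address below is hypothetical, chosen only to show the effect; the point is that a section start aligned to 0x1000 can land in the same 2MB hardware page as the end of .text, while one aligned to 1 << 21 cannot.

```python
PAGE_4K = 0x1000
PAGE_2M = 1 << 21

def align_up(addr: int, alignment: int) -> int:
    """Round addr up to the next multiple of a power-of-two alignment."""
    return (addr + alignment - 1) & ~(alignment - 1)

def same_page(a: int, b: int, page_size: int) -> bool:
    """True if both addresses fall inside the same page of the given size."""
    return a // page_size == b // page_size

text_end = 0xffffffff813d1234            # hypothetical end of .text

rodata_4k = align_up(text_end, PAGE_4K)  # start if only 0x1000 alignment is used
rodata_2m = align_up(text_end, PAGE_2M)  # start if ALIGN((1 << 21)) is honored

print(hex(rodata_4k), same_page(text_end - 1, rodata_4k, PAGE_2M))
print(hex(rodata_2m), same_page(text_end - 1, rodata_2m, PAGE_2M))
```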



I think the .note difference is just due to gold embedding its version:
-Note section [ 2] '.notes' of 60 bytes at offset 0x3d2c58:
+Note section [ 2] '.notes' of 36 bytes at offset 0x5d1c58:
Owner Data size Type
- GNU 8 GNU_GOLD_VERSION
- Linker version: gold 1.9
GNU 20 GNU_BUILD_ID
- Build ID: a865af685f5222cdc17a28ea4e49d58b2185bc05
+ Build ID: 07b53da4e169ad1079080043ad72384fb80d0ea3
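
The 60 vs 36 byte sizes are consistent with exactly one extra note. A quick sketch of the arithmetic, assuming the usual ELF note layout (a 12-byte header, then the NUL-terminated owner name and the descriptor, each padded to 4 bytes); the function name is made up:

```python
def note_size(owner: bytes, desc_len: int) -> int:
    """Size in bytes of one ELF note entry, assuming 4-byte padding."""
    def pad4(n: int) -> int:
        return (n + 3) & ~3
    header = 12                              # namesz, descsz, type: 4 bytes each
    return header + pad4(len(owner) + 1) + pad4(desc_len)

build_id = note_size(b"GNU", 20)             # GNU_BUILD_ID, 20-byte descriptor
gold_version = note_size(b"GNU", 8)          # GNU_GOLD_VERSION, 8-byte descriptor

print(build_id, gold_version, build_id + gold_version)
```

So the gold version note alone would account for the 24-byte difference in .notes.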

Again, it would be comforting to make a test run with GNU_GOLD_VERSION omitted, to see if the .text becomes identical (except for Build ID) with ld.


I did that (by editing gold source and returning from create_gold_note()), but as I've shown above there are still diffs due to global addresses...


Best regards,
--Edwin

