This is the mail archive of the
systemtap@sourceware.org
mailing list for the systemtap project.
Troubles with debug info, using systemtap on debian.
- From: James Y Knight <foom at fuhm dot net>
- To: systemtap at sourceware dot org
- Date: Mon, 9 Nov 2009 20:18:16 -0500
- Subject: Troubles with debug info, using systemtap on debian.
I built my own 2.6.31 kernel with:
make-kpkg --initrd --revision 1 --append-to-version -jknight-1-amd64 kernel_image kernel_headers kernel_debug
I have kernel-package version 12.025.
I installed all 3 debs that it created.
Therefore, I had directories on my filesystem that looked like this:
Original compile directory: /usr/src/linux-source-2.6.31
Kernel mods installed in: /lib/modules/2.6.31-jknight-1-amd64/
Debug data installed in: /usr/lib/debug/lib/modules/2.6.31-jknight-1-amd64/
/lib/modules/2.6.31-jknight-1-amd64/build got created as a symlink
to: /usr/src/linux-source-2.6.31
Systemtap worked fine for symbols in vmlinux, but segfaulted
when trying to probe modules. For example, this simplest of scripts
segfaulted in the translator:
probe module("autofs4").function("autofs4_fill_super") {}
Failed with this backtrace:
#0 0x00002b0e7568e34f in memmove () from /lib/libc.so.6
#1 0x00002b0e749cdf7c in elf64_xlatetof (dest=0x7fff515977d0,
src=0x7fff51597800,
encode=<value optimized out>) at elf32_xlatetof.c:118
#2 0x00002b0e747aeb0e in relocate (offset=49, addend=0x7fff51597940,
rtype=<value optimized out>,
symndx=11) at relocate.c:436
#3 0x00002b0e747af238 in relocate_section (ehdr=<value optimized
out>, shstrndx=<value optimized out>,
reloc_symtab=<value optimized out>, scn=0x291d020,
shdr=0x7fff515979d0, tscn=0x291cf68,
debugscn=false, partial=true) at relocate.c:501
#4 0x00002b0e747af741 in __libdwfl_relocate_section (mod=0x2908f60,
relocated=0x291cbb0,
relocscn=0x291d020, tscn=0x291cf68, partial=<value optimized
out>) at relocate.c:632
#5 0x00002b0e747b04a6 in dwfl_module_address_section (mod=0x2908f60,
address=<value optimized out>,
bias=0x7fff51597ed8) at derelocate.c:399
#6 0x000000000046d2f5 in dump_unwindsyms (m=0x2908f60,
userdata=<value optimized out>,
name=<value optimized out>, base=65536, arg=0x7fff51598330) at
translate.cxx:4730
#7 0x00002b0e747b1677 in dwfl_getmodules (dwfl=0x28cb170,
callback=0x46c560 <dump_unwindsyms>,
arg=0x7fff51598330, offset=2) at dwfl_getmodules.c:103
#8 0x0000000000469f66 in emit_symbol_data (s=@0x7fff515990f0) at
translate.cxx:4970
#9 0x000000000046c041 in translate_pass (s=@0x7fff515990f0) at
translate.cxx:5273
#10 0x000000000041062f in main (argc=2, argv=0x7fff5159aeb8) at
main.cxx:1231
Adding --ignore-vmlinux --ignore-dwarf didn't cause the crash to go
away.
Eventually, I figured out that it was finding debug data from a
strange location:
/lib/modules/2.6.31-jknight-1-amd64/build/debian/linux-image-2.6.31-jknight-1-amd64-dbg/usr/lib/debug/lib/modules/2.6.31-jknight-1-amd64/kernel/fs/autofs4/autofs4.ko
(I found that via, at that backtrace, "f 6; print *m").
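To see up front which copies of a module exist under the modules directory (including anything reachable through the "build" symlink), something like the following Python sketch can list them. It is only an illustration of a recursive search; I am not claiming it matches what stap or elfutils actually does, and the function name is made up.

```python
import os

def find_module_files(root, module_name):
    """List every <module_name>.ko or <module_name>.ko.debug below root.

    Hypothetical helper for illustration only; it does not reproduce
    the search order used by stap or elfutils.
    """
    wanted = {module_name + ".ko", module_name + ".ko.debug"}
    hits = []
    seen = set()
    # followlinks=True so that symlinks such as "build" are descended into.
    for dirpath, dirnames, filenames in os.walk(root, followlinks=True):
        real = os.path.realpath(dirpath)
        if real in seen:
            # Already visited through another path: avoid symlink loops.
            dirnames[:] = []
            continue
        seen.add(real)
        for name in filenames:
            if name in wanted:
                hits.append(os.path.join(dirpath, name))
    return sorted(hits)
```

Calling find_module_files("/lib/modules/2.6.31-jknight-1-amd64", "autofs4") would then show every candidate file, including ones left behind in packaging build directories.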
Okay, I thought, that's odd. Let me just remove the "build" symlink,
so that hopefully it finds the debug data from the installed
kernel-debug package. Well, that failed, because the files there are
apparently expected to be called *.ko.debug, but I had a file called:
/usr/lib/debug/lib/modules/2.6.31-jknight-1-amd64/kernel/fs/autofs4/autofs4.ko
instead. So, I symlinked it to be called autofs4.ko.debug.
Note that autofs4.ko there is the same file (same md5sum) as the one
it found and crashed with above in /lib/modules/../build.
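For anyone wanting to repeat the "same file" check without md5sum, here is a minimal Python sketch of the comparison. The helper names are made up for illustration; it just hashes both files and compares the digests.

```python
import hashlib

def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def same_content(path_a, path_b):
    """True if both files have the same MD5 digest."""
    return md5_of(path_a) == md5_of(path_b)
```

Matching digests are a practical check that the two autofs4.ko copies are byte-for-byte the same, so any difference in behaviour should come from where the file was found rather than from its contents.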
And it still crashed. But now in a different place!
#0 0x00002b4884c2f34f in memmove () from /lib/libc.so.6
#1 0x00002b4883f6ef7c in elf64_xlatetof (dest=0x7fff49c3cff0,
src=0x7fff49c3d020,
encode=<value optimized out>) at elf32_xlatetof.c:118
#2 0x00002b4883d4fb0e in relocate (offset=47, addend=0x7fff49c3d160,
rtype=<value optimized out>,
symndx=179) at relocate.c:436
#3 0x00002b4883d50238 in relocate_section (ehdr=<value optimized
out>, shstrndx=<value optimized out>,
reloc_symtab=<value optimized out>, scn=0x2a160b0,
shdr=0x7fff49c3d1f0, tscn=0x2a15ff8,
debugscn=false, partial=true) at relocate.c:501
#4 0x00002b4883d50898 in __libdwfl_relocate (mod=0x2a511f0,
debugfile=0x2a15db0,
debug=<value optimized out>) at relocate.c:609
#5 0x00002b4883d539e8 in dwfl_module_getelf (mod=0x2a511f0,
loadbase=0x7fff49c3d6e0)
at dwfl_module_getelf.c:76
#6 0x000000000046cf79 in dump_unwindsyms (m=0x2a511f0,
userdata=<value optimized out>,
name=0x2b488f606b8f "autofs4_direct_root_inode_operations",
base=65536, arg=0x7fff49c3db40)
at translate.cxx:4475
#7 0x00002b4883d52677 in dwfl_getmodules (dwfl=0x19b9440,
callback=0x46c560 <dump_unwindsyms>,
arg=0x7fff49c3db40, offset=2) at dwfl_getmodules.c:103
#8 0x0000000000469f66 in emit_symbol_data (s=@0x7fff49c3e900) at
translate.cxx:4970
#9 0x000000000046c041 in translate_pass (s=@0x7fff49c3e900) at
translate.cxx:5273
#10 0x000000000041062f in main (argc=2, argv=0x7fff49c406c8) at
main.cxx:1231
Eventually, after a bit of flailing, I decided to put the build symlink
back, but remove all the temporary packaging build directories:
rm -rf /usr/src/linux-source-2.6.31/debian/linux*
Now, stap found the debuginfo in:
/lib/modules/2.6.31-jknight-1-amd64/build/fs/autofs4/autofs4.ko
That is the file actually generated by the kernel build process,
unmangled by debian packaging scripts. And, then it worked! Without
segfaulting, hooray!
So, some questions, at the end of all this:
1) Surely --ignore-dwarf --ignore-vmlinux should've caused systemtap
to not use libelf to find and parse the dwarf debug info?
2) Why did stap find the debug data at such a strange path in
/lib/modules/.../build/debian/...? Does it do something like traverse every
file, recursively, under the modules directory until it finds one it
likes? That's quite...odd. I noticed that even if I renamed "build" to
"build.foo", it *STILL* looked in there.
3) The debian kernel's debuginfo does "objcopy --only-keep-debug".
That seems like it shouldn't cause systemtap to blow up, but
it does. I guess that's a known bug?
4) Why does it blow up *differently* depending on whether it found the
file in /usr/lib/debug or /lib/modules?
5) Whose bug is it that systemtap doesn't look for
/usr/lib/debug/.../autofs4.ko, but only autofs4.ko.debug?
Apparently this is a difference between debian and Fedora. Fedora
systems append .debug, Debian systems do not. My guess: debian should
be patching their copy of elfutils to not append ".debug"? But maybe
that's an upstream bug, and it should try both by default (or
something). I dunno.
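The "try both" idea could look roughly like this Python sketch. To be clear, this is not elfutils code and I don't know how its lookup is actually structured; the function names and the order of the two candidates are my own assumptions.

```python
import os

def debuginfo_candidates(module_path, debug_root="/usr/lib/debug"):
    """Return candidate debuginfo paths for an installed module.

    module_path is the absolute path of the installed .ko file.
    Hypothetical helper: tries the Fedora-style name (with ".debug")
    first, then the Debian-style name (no suffix).
    """
    base = os.path.join(debug_root, module_path.lstrip("/"))
    return [base + ".debug", base]

def first_existing(paths):
    """Return the first path that is an existing regular file, else None."""
    for path in paths:
        if os.path.isfile(path):
            return path
    return None
```

For /lib/modules/2.6.31-jknight-1-amd64/kernel/fs/autofs4/autofs4.ko this yields the autofs4.ko.debug name first and the plain autofs4.ko name second, both under /usr/lib/debug, so either distribution's layout would be found.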
Someone else discovered the ".debug" issue in another program:
http://www.visophyte.org/rev_control/patches/chronicle-recorder/debian-usr-lib-debug-support.patch
And here's the debian reference about how to install debuginfo:
http://www.debian.org/doc/developers-reference/best-pkging-practices.html#bpp-dbg
I guess all these except the first are probably bugs in elfutils, not
systemtap, so perhaps I should be reporting it there instead. But
despite what you might think, I actually have no clue about any of
this crap: any clue you might infer from the above has all been gained
by random flailing over the course of the last couple of hours. So I
figure it's safer to report here first, and redirect if requested. :)
James