This is the mail archive of the
elfutils-devel@sourceware.org
mailing list for the elfutils project.
Re: [patch 2/2] Fix loading core files without build-ids
- From: Jan Kratochvil <jan dot kratochvil at redhat dot com>
- To: elfutils-devel at lists dot fedorahosted dot org
- Date: Sun, 19 May 2013 18:38:56 +0200
- Subject: Re: [patch 2/2] Fix loading core files without build-ids
Hi Mark,
there is still some work on this patch as I have found from your review, see
below.
On Sat, 18 May 2013 00:00:58 +0200, Mark Wielaard wrote:
> BTW. How often does it happen that build-id are missing in core files?
> Is that just when a distro "misconfigures" the kernel dumping settings?
> We should of course handle it correctly anyway, just wondering.
CORE_DUMP_DEFAULT_ELF_HEADERS is in linux-2.6.git since
commit 656eb2cd5da153762f2e8419ca117ce12ef522c3
Author: Roland McGrath <roland@redhat.com>
Date: Sat Oct 18 20:28:23 2008 -0700
and it is default Y since
commit 895021552d6ffe8a4d076cb5c4b1e700c33e96e1
Author: Roland McGrath <roland@redhat.com>
Date: Wed Oct 27 15:34:09 2010 -0700
So anything older than 2010 or anything non-Linux or any GDB gcore-generated
core file (still not fixed, mea culpa). Besides that someone can also
unconfigure it, I guess some ultra-mini kernel configs may illogically do it.
OTOH for practical Fedora/RHEL cases this patch is probably irrelevant.
> > The testcase run-unstrip-n.sh had to be updated:
> > (a) It was currently reporting filenames like /lib/libc.so.6 but those were
> > a false match, the system files do not match build-ids from the core file,
> > therefore there should be no successful match to system files.
[...]
> But now you don't know the name at all? That is a pity since they are a
> good hint. Would be nice if we could provide them in some way as hint as
> you see with /proc/self/exe -> /bin/foo (deleted), or by putting them
> between [brackets].
This is a real bug, thanks for noticing it.
New dwfl_link_map_report does not report modules for which it cannot find its
file on disk. This is correct as one does not know the module boundaries just
from a link map entry. The module boundaries were found in such case by
dwfl_segment_report_module and names filled in later by dwfl_link_map_report.
Recent Linux kernels now produce also core note NT_FILE but in such case one
usually has also build-id headers already so this patch is not needed.
My current plan is:
(1) Save filenames for non-existing files during initial dwfl_link_map_report.
(2) Run dwfl_segment_report_module even if dwfl_link_map_report found something.
(2b) dwfl_segment_report_module should omit conflicting modules.
(3) Update found anonymous "[dso]" modules by the saved filenames.
> > * core-file.c (dwfl_core_file_report): Move raw segments reporting
> > lower. New variables vdso_bias, vdso_l_ld and exec_vaddr. Call also
> > dwfl_segment_report_module for EXEC_VADDR and VDSO_L_LD. Call
> > dwfl_segment_report_module for raw segments only if LISTED is zero.
>
> Is the FIXME: Use vdso_bias. just informational, or were you going to do
> that?
I am not going to do that. I was trying to refactor the code to make it
possible but I find too many changes needed in a code I do not understand
well. Primarily I believe the bias auto-detection in
dwfl_segment_report_module should always work correctly for vDSO.
> So you don't sniff at all if dwfl_link_map_report () found
> something. Would it hurt at this point to sniff?
That's the question. Sniffing will always bring in possibly incorrect data.
I believe it would be useful only in the case link_map (r_debug->r_map) is
corrupted in the middle and some of the libraries do not get reported
(as I have seen in real world core files from ABRT). Maybe report_r_debug
could report corrupted link_map - it currently internally checks
++iterations < dwfl->lookup_elts
but it should also check l_prev does match the previous entry like GDB does.
Still core files with corrupted link_map I find more a corner case and it
could be added as an add-on patch IMO.
> It is a pity dwfl_segment_report_module () doesn't have documentation. I am
> still staring at it to fully understand what it does.
Yes, I also had this problem, this is why I used it only as a 2nd pass if
dwfl_link_map_report did not find anything.
One needs to decide there when it is really safe to add a new non-conflicting
module by dwfl_segment_report_module but I admit I did not investigate it
much.
> > * dwfl_module_build_id.c (check_notes): Move into
> > __libdwfl_find_elf_build_id.
> > (__libdwfl_find_build_id): Rename to ...
> > (__libdwfl_find_elf_build_id): ... here. Add parameters build_id_bits,
> > build_id_elfaddr and build_id_len. Verify MOD vs. ELF.
> > (__libdwfl_find_elf_build_id) (check_notes): Remove parameters mod and
> > set, rename data_vaddr to data_elfaddr. Do not call found_build_id.
> > (__libdwfl_find_elf_build_id): Update the check_notes caller, do not
> > adjust its data_elfaddr parameter.
> > (__libdwfl_find_build_id): New wrapper of __libdwfl_find_elf_build_id.
> > * libdwflP.h (__libdwfl_find_elf_build_id): New declaration.
>
> OK, as written here the moving around confused me a little. But reading
> the code it looks correct. Since __libdwfl_find_build_id () adds the
> main bias afterwards, you don't adjusting vaddr in the other places
> anymore.
Technically I just split out the libelf/ part of __libdwfl_find_build_id into
__libdwfl_find_elf_build_id. But in the end I found I cannot easily make it
really libelf/-clean as __libdwfl_find_elf_build_id needs
__libdwfl_relocate_value for ET_REL and separating the call out of it would
complicate the code more than worth moving it to libelf/. But still I could
refactor it more to really it move __libdwfl_find_elf_build_id to libelf/ .
Thanks,
Jan