This is the mail archive of the elfutils-devel@sourceware.org mailing list for the elfutils project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [patch 2/2] Fix loading core files without build-ids


Hi Mark,

there is still some work on this patch as I have found from your review, see
below.


On Sat, 18 May 2013 00:00:58 +0200, Mark Wielaard wrote:
> BTW. How often does it happen that build-id are missing in core files?
> Is that just when a distro "misconfigures" the kernel dumping settings?
> We should of course handle it correctly anyway, just wondering.

CORE_DUMP_DEFAULT_ELF_HEADERS is in linux-2.6.git since
	commit 656eb2cd5da153762f2e8419ca117ce12ef522c3
	Author: Roland McGrath <roland@redhat.com>
	Date:   Sat Oct 18 20:28:23 2008 -0700

and it is default Y since
	commit 895021552d6ffe8a4d076cb5c4b1e700c33e96e1
	Author: Roland McGrath <roland@redhat.com>
	Date:   Wed Oct 27 15:34:09 2010 -0700

So anything older than 2010 or anything non-Linux or any GDB gcore-generated
core file (still not fixed, mea culpa).  Besides that someone can also
unconfigure it, I guess some ultra-mini kernel configs may illogically do it.

OTOH for practical Fedora/RHEL cases this patch is probably irrelevant.


> > The testcase run-unstrip-n.sh had to be updated:
> > (a) It was currently reporting filenames like /lib/libc.so.6 but those were
> >     a false match, the system files do not match build-ids from the core file,
> >     therefore there should be no successful match to system files.
[...]
> But now you don't know the name at all? That is a pity since they are a
> good hint. Would be nice if we could provide them in some way as hint as
> you see with /proc/self/exe -> /bin/foo (deleted), or by putting them
> between [brackets].

This is a real bug, thanks for noticing it.

New dwfl_link_map_report does not report modules for which it cannot find its
file on disk.  This is correct as one does not know the module boundaries just
from a link map entry.  The module boundaries were found in such case by
dwfl_segment_report_module and names filled in later by dwfl_link_map_report.

Recent Linux kernels now produce also core note NT_FILE but in such case one
usually has also build-id headers already so this patch is not needed.

My current plan is:
(1) Save filenames for non-existing files during initial dwfl_link_map_report.
(2) Run dwfl_segment_report_module even if dwfl_link_map_report found something.
(2b) dwfl_segment_report_module should omit conflicting modules.
(3) Update found anonymous "[dso]" modules by the saved filenames.


> > 	* core-file.c (dwfl_core_file_report): Move raw segments reporting
> >     	lower.  New variables vdso_bias, vdso_l_ld and exec_vaddr.  Call also
> >     	dwfl_segment_report_module for EXEC_VADDR and VDSO_L_LD.  Call
> >     	dwfl_segment_report_module for raw segments only if LISTED is zero.
> 
> Is the FIXME: Use vdso_bias. just informational, or were you going to do
> that?

I am not going to do that.  I was trying to refactor the code to make it
possible but I find too many changes needed in a code I do not understand
well.  Primarily I believe the bias auto-detection in
dwfl_segment_report_module should always work correctly for vDSO.


> So you don't sniff at all if dwfl_link_map_report () found
> something. Would it hurt at this point to sniff?

That's the question.  Sniffing will always bring in possibly incorrect data.
I believe it would be useful only in the case link_map (r_debug->r_map) is
corrupted in the middle and some of the libraries do not get reported
(as I have seen in real world core files from ABRT).  Maybe report_r_debug
could report corrupted link_map - it currently internally checks
	++iterations < dwfl->lookup_elts
but it should also check l_prev does match the previous entry like GDB does.

Still core files with corrupted link_map I find more a corner case and it
could be added as an add-on patch IMO.


> It is a pity dwfl_segment_report_module () doesn't have documentation. I am
> still staring at it to fully understand what it does. 

Yes, I also had this problem, this is why I used it only as a 2nd pass if
dwfl_link_map_report did not find anything.

One needs to decide there when it is really safe to add a new non-conflicting
module by dwfl_segment_report_module but I admit I did not investigate it
much.


> > 	* dwfl_module_build_id.c (check_notes): Move into
> >     	__libdwfl_find_elf_build_id.
> >     	(__libdwfl_find_build_id): Rename to ...
> >     	(__libdwfl_find_elf_build_id): ... here.  Add parameters build_id_bits,
> >     	build_id_elfaddr and build_id_len.  Verify MOD vs. ELF.
> >     	(__libdwfl_find_elf_build_id) (check_notes): Remove parameters mod and
> >     	set, rename data_vaddr to data_elfaddr.  Do not call found_build_id.
> >     	(__libdwfl_find_elf_build_id): Update the check_notes caller, do not
> >     	adjust its data_elfaddr parameter.
> >     	(__libdwfl_find_build_id): New wrapper of __libdwfl_find_elf_build_id.
> >         * libdwflP.h (__libdwfl_find_elf_build_id): New declaration.
> 
> OK, as written here the moving around confused me a little. But reading
> the code it looks correct. Since __libdwfl_find_build_id () adds the
> main bias afterwards, you don't adjusting vaddr in the other places
> anymore.

Technically I just split out the libelf/ part of __libdwfl_find_build_id into
__libdwfl_find_elf_build_id.  But in the end I found I cannot easily make it
really libelf/-clean as __libdwfl_find_elf_build_id needs
__libdwfl_relocate_value for ET_REL and separating the call out of it would
complicate the code more than worth moving it to libelf/.  But still I could
refactor it more to really it move __libdwfl_find_elf_build_id to libelf/ .


Thanks,
Jan

Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]