This is the mail archive of the
elfutils-devel@sourceware.org
mailing list for the elfutils project.
Re: [patch 2/2 v4] Fix loading core files without build-ids
- From: Mark Wielaard <mjw at redhat dot com>
- To: elfutils-devel at lists dot fedorahosted dot org
- Date: Wed, 29 May 2013 11:50:26 +0200
- Subject: Re: [patch 2/2 v4] Fix loading core files without build-ids
On Tue, 2013-05-28 at 20:36 +0200, Jan Kratochvil wrote:
> On Mon, 27 May 2013 16:05:46 +0200, Mark Wielaard wrote:
> > Since it doesn't actually free the struct itself, lets call it
> > clear_r_debug_info to avoid confusion.
>
> Done.
Thanks.
> > Reporting the full file names was arguably wrong to begin with. I don't
> > think we should restore that, but if people do feel this is a regression we
> > could add a guesser to unstrip.c itself (but I don't think we should).
>
> The full pathnames were clearly a bug for this patchset.
Yes, I agree, and I don't think anybody should rely on them, but maybe
something did anyway. I am fine with calling that broken behavior
though.
> But when we talk about it in general, as a new patchset, I rather find a bug
> dropping the path, when core file already contains the absolute pathname why
> to hide it from the user?
Right, if we can guarantee that path is the real one and not just an
arbitrary guess.
> But when we talk about some possible further extensions we should rather
> forget about the whole link map and just use NT_FILE. Or one can just already
> display the full pathnames easily (not in eu-readelf):
> $ readelf -n test-core.core
> Start End Page Offset
> 0x0000000000400000 0x0000000000402000 0x0000000000000000
> /home/jkratoch/redhat/elfutils-libregr/test-core-main
> 0x00000037ef400000 0x00000037ef420000 0x0000000000000000
> /usr/lib64/ld-2.16.so
Yes that is nice, we should have support for that too, in the future, in
some separate patch. Found the ELF NOTE description here:
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=2aa362c49c314a98fb9aebbd7760a461667bac05
Don't know if there is a more "official" documentation.
Does gdb/gcore also add this NT_FILE note now?
> > Document that count gets updated on success.
>
> The report_module helper is gone now.
>
> > The clearing of r_debug_info on failure is a nice because it prevents
> > accidental memory leaks, but also confused me on first read of the patch
> > (because I forgot the comment when reading the later code).
>
> The report_module helper is gone now.
That is a nice cleanup.
> > > /* We return the number of modules we found if we found any.
> > > If we found none, we return -1 instead of 0 if there was an
> > > - error rather than just nothing found. If link_map handling
> > > - failed, we still have the sniffed modules. */
> > > - return sniffed == 0 || listed > sniffed ? listed : sniffed;
> > > + error rather than just nothing found. */
> > > + return sniffed || listed >= 0 ? listed + sniffed : listed;
> >
> > What about the case of sniffed == 0 && listed == 0?
> > Should that return -1?
>
> It did not return -1 before so why we should now? There may exist valid core
> file which just does not contain any ELF module (=does not contain any
> identifiable ELF module).
>
> A different case is when there is some real unexpected failure processing the
> data structures, in such case the function still returns -1, either be an
> early return -1 call or due to LISTED == -1 (from report_r_debug()
> returning -1).
Aha. Sorry, I misread the comment "if there was an error". So if there
is no error and nothing found, just report zero. OK.
> > > + /* Try to match up DYN_VADDR against L_LD as found in link map.
> > > + Segments sniffing may guess invalid address as the first read-only memory
> > > + mapping may not be dumped to the core file (if ELF headers are not dumped)
> > > + and the ELF header is dumped first with the read/write mapping of the same
> > > + file at higher addresses. */
> > > + if (r_debug_info != NULL)
> > > + for (const struct r_debug_info_module *module = r_debug_info->module;
> > > + module != NULL; module = module->next)
> > > + if (module_start <= module->l_ld && module->l_ld < module_end)
> > > + {
> > > + GElf_Addr fixup = module->l_ld - dyn_vaddr;
> [...]
> > fixup might need a comment.
>
> added:
> /* L_LD read from link map must be right while DYN_VADDR is unsafe.
> Therefore subtract DYN_VADDR and add L_LD to get a possibly
> corrective displacement for all addresses computed so far. */
>
> > It wasn't immediately clear to me. The fixup is the difference between the
> > PT_LOAD address that the link-map know is the correct one and the PT_DYNAMIC
> > address (of which there can be only one). And module_start points to the
> > PT_LOAD address that was just gotten from the phdrs. Right?
>
> Right.
Thanks.
> > Actually now that I do understand, that is kind of what the comment above
> > already says. I just didn't immediately made the connection with the
> > offset/fixup being static. Is fixup always positive?
>
> At least in the supplied testcase it is negative.
>
> In fact I did not spent time thinking if it is positive or negative,
> I was following the logic which address is safe and which unsafe above.
OK, then I am wondering about the first check before the fixup.
> + if (module_start <= module->l_ld && module->l_ld < module_end)
Should the fixup not always be calculated and then the next check takes
care of everything?
> + GElf_Addr fixup = module->l_ld - dyn_vaddr;
> + if ((fixup & (dwfl->segment_align - 1)) == 0
> + && module_start + fixup <= module->l_ld
> + && module->l_ld < module_end + fixup)
Since the fixup could just be zero?
> > BTW. Why does link-map set module->name to the empty string when no name
> > (NULL) was found? Just convenience like here, so no NULL check is
> > needed, or is the empty name significant?
>
> 'name' is a GCC zero-length array, I am sorry but I cannot make it NULL:
> struct r_debug_info_module
> {
> [...]
> char name[0];
> };
Ah, OK then.
> > > diff --git a/libdwfl/link_map.c b/libdwfl/link_map.c
> > > @@ -352,9 +355,108 @@ report_r_debug (uint_fast8_t elfclass, uint_fast8_t elfdata,
> > > /* We have to find the file's phdrs to compute along with l_addr
> > > what its runtime address boundaries are. */
> > >
> > > - // XXX hook for sysroot
> > > - mod = INTUSE(dwfl_report_elf) (dwfl, basename (name),
> > > - name, -1, l_addr, true);
> > > + Dwfl_Module *mod = NULL;
> > > + if (iterations == 1 && dwfl->executable_for_core != NULL)
> > > + {
> > > + /* Find the main executable.
> > > + FIXME: Try to also find it via build-id. */
> > > + name = dwfl->executable_for_core;
> > > + }
> >
> > To solve that FIXME we should do somewhat the same as is done below with
> > the "inlined dwfl_report_elf" snippet? And we already do try to open the
> > executable if we cannot read the phdrs. Maybe this can be factored out a
> > bit?
>
> In fact when we do not have any executable filename or we failed to open the
> executable filename the only thing we can do is currently already done by the
> second pass of dwfl_segment_report_module calls with the r_debug_info hints.
>
> This code was there before I implemented the second pass even after the link
> map pass succeeded (found >= 1 shared libraries).
>
> So I have removed the FIXME comment and simplified it all a bit.
>
> > > + if (iterations != 2 && name != NULL)
> > > + {
> > > + /* Find a shared library or main executable. Do not try to
> > > + find a file for vDSO (where ITERATIONS equals 2). */
> >
> > I couldn't find where the iteration == 1 => exe, iteration == 2 => vDSO
> > comes from. Shouldn't the second check be iterations > 2 ?
>
> As discussed on #elfutils this ITERATIONS == 2 condition was unsafe, while
> currently vDSO is the 2nd entry with glibc at least with older or non-Linux
> core files 2nd entry is a regular shared library.
>
> One could make some name-based exceptions ("linux-vdso.so.1") like in
> http://pkgs.fedoraproject.org/cgit/gdb.git/tree/gdb-glibc-vdso-workaround.patch
>
> I found also this case is now obsoleted by the second
> dwfl_segment_report_module pass, finding the vDSO did not do anything else
> than just to call dwfl_segment_report_module for its segment.
All this certainly makes things a lot cleaner. Thanks.
> Fun is this all is now worthless - only for backward compatibility which has
> discutable value for core files - as it has been obsoleted by NT_FILE;
> probably, not sure, NT_FILE contains the list of any mapped files, not just
> the files really opened as shared libraries.
It certainly isn't worthless! We still would like to handle
old/incomplete core files from older systems (on newer systems). We
don't support NT_FILE yet. There are other ways core files are generated
than with the latest kernel version, like with gdb/gcore, which we also
like to handle.
> Attaching significantly simplified patch.
> commit cd584cc822842862923670ebc44d40bf55485a69
> Author: Jan Kratochvil <jan.kratochvil@redhat.com>
> Date: Thu May 23 20:19:16 2013 +0200
>
> Use DT_DEBUG library search first.
>
> libdwfl/
> 2013-05-28 Jan Kratochvil <jan.kratochvil@redhat.com>
>
> * argp-std.c (parse_opt) <ARGP_KEY_SUCCESS> <opt->core> <opt->e>: Set
> executable_for_core before calling dwfl_core_file_report.
> * core-file.c (clear_r_debug_info): New function.
> (dwfl_core_file_report): Move raw segments reporting lower. New
> variable r_debug_info, pass it to dwfl_segment_report_module. Call
> clear_r_debug_info in the end. Return sum of LISTED and SNIFFED.
> * dwfl_module_build_id.c (check_notes): Move into
> __libdwfl_find_elf_build_id.
> (__libdwfl_find_build_id): Rename to ...
> (__libdwfl_find_elf_build_id): ... here. Add parameters build_id_bits,
> build_id_elfaddr and build_id_len. Verify MOD vs. ELF.
> (__libdwfl_find_elf_build_id) (check_notes): Remove parameters mod and
> set, rename data_vaddr to data_elfaddr. Do not call found_build_id.
> (__libdwfl_find_elf_build_id): Update the check_notes caller, do not
> adjust its data_elfaddr parameter.
> (__libdwfl_find_build_id): New wrapper of __libdwfl_find_elf_build_id.
> * dwfl_segment_report_module.c (dwfl_segment_report_module): New
> parameter r_debug_info. New variable name_is_final. Adjust addresses
> according to R_DEBUG_INFO->MODULE. Check conflicts against DWFL.
> Do not overwrite NAME by SONAME if NAME_IS_FINAL.
> * libdwflP.h (__libdwfl_find_elf_build_id): New declaration.
> (struct r_debug_info_module, struct r_debug_info): New definitions.
> (dwfl_segment_report_module, dwfl_link_map_report): Add parameter
> r_debug_info.
> * link_map.c: Include fcntl.h.
> (report_r_debug): Add parameter r_debug_info, describe it in the
> function comment. Delete dwfl_addrmodule call and its dependent code.
> Verify build-id before calling dwfl_report_elf, also supply
> executable_for_core to it. Store r_debug_info->module info when
> appropriate.
> (dwfl_link_map_report): Add parameter r_debug_info. New variable
> in_ok. Try to read IN from EXECUTABLE_FOR_CORE. Update report_r_debug
> caller parameters.
>
> tests/
> 2013-05-28 Jan Kratochvil <jan.kratochvil@redhat.com>
>
> * Makefile.am (EXTRA_DIST): Add test-core-lib.so.bz2,
> test-core.core.bz2 and test-core.exec.bz2.
> * run-addrname-test.sh: New test for these files.
> * run-unstrip-n.sh: Update expected output. New test for these files.
> * test-core-lib.so.bz2: New file.
> * test-core.core.bz2: New file.
> * test-core.exec.bz2: New file.
>
> Signed-off-by: Jan Kratochvil <jan.kratochvil@redhat.com>
Looks good to me. But please merge this commit and the code movement
commit into one, since those are really one change.
Thanks,
Mark