This is the mail archive of the elfutils-devel@sourceware.org mailing list for the elfutils project.
Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]
Other format:	[Raw text]
Re: [patch 2/2 v4] Fix loading core files without build-ids

From: Mark Wielaard <mjw at redhat dot com>
To: elfutils-devel at lists dot fedorahosted dot org
Date: Wed, 29 May 2013 11:50:26 +0200
Subject: Re: [patch 2/2 v4] Fix loading core files without build-ids
On Tue, 2013-05-28 at 20:36 +0200, Jan Kratochvil wrote:
> On Mon, 27 May 2013 16:05:46 +0200, Mark Wielaard wrote:
> > Since it doesn't actually free the struct itself, lets call it
> > clear_r_debug_info to avoid confusion.
> 
> Done.

Thanks.

> > Reporting the full file names was arguably wrong to begin with. I don't
> > think we should restore that, but if people do feel this is a regression we
> > could add a guesser to unstrip.c itself (but I don't think we should).
> 
> The full pathnames were clearly a bug for this patchset.

Yes, I agree, and I don't think anybody should rely on them, but maybe
something did anyway. I am fine with calling that broken behavior
though.

> But when we talk about it in general, as a new patchset, I rather find a bug
> dropping the path, when core file already contains the absolute pathname why
> to hide it from the user?

Right, if we can guarantee that path is the real one and not just an
arbitrary guess.

> But when we talk about some possible further extensions we should rather
> forget about the whole link map and just use NT_FILE.  Or one can just already
> display the full pathnames easily (not in eu-readelf):
> $ readelf -n test-core.core
>                  Start                 End         Page Offset
>     0x0000000000400000  0x0000000000402000  0x0000000000000000
>         /home/jkratoch/redhat/elfutils-libregr/test-core-main
>     0x00000037ef400000  0x00000037ef420000  0x0000000000000000
>         /usr/lib64/ld-2.16.so

Yes that is nice, we should have support for that too, in the future, in
some separate patch. Found the ELF NOTE description here:
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=2aa362c49c314a98fb9aebbd7760a461667bac05
Don't know if there is a more "official" documentation.
Does gdb/gcore also add this NT_FILE note now?

> > Document that count gets updated on success.
> 
> The report_module helper is gone now.
>
> > The clearing of r_debug_info on failure is a nice because it prevents
> > accidental memory leaks, but also confused me on first read of the patch
> > (because I forgot the comment when reading the later code).
> 
> The report_module helper is gone now.

That is a nice cleanup.

> > >    /* We return the number of modules we found if we found any.
> > >       If we found none, we return -1 instead of 0 if there was an
> > > -     error rather than just nothing found.  If link_map handling
> > > -     failed, we still have the sniffed modules.  */
> > > -  return sniffed == 0 || listed > sniffed ? listed : sniffed;
> > > +     error rather than just nothing found.  */
> > > +  return sniffed || listed >= 0 ? listed + sniffed : listed;
> > 
> > What about the case of sniffed == 0 && listed == 0?
> > Should that return -1?
> 
> It did not return -1 before so why we should now?  There may exist valid core
> file which just does not contain any ELF module (=does not contain any
> identifiable ELF module).
> 
> A different case is when there is some real unexpected failure processing the
> data structures, in such case the function still returns -1, either be an
> early return -1 call or due to LISTED == -1 (from report_r_debug()
> returning -1).

Aha. Sorry, I misread the comment "if there was an error". So if there
is no error and nothing found, just report zero. OK.

> > > +  /* Try to match up DYN_VADDR against L_LD as found in link map.
> > > +     Segments sniffing may guess invalid address as the first read-only memory
> > > +     mapping may not be dumped to the core file (if ELF headers are not dumped)
> > > +     and the ELF header is dumped first with the read/write mapping of the same
> > > +     file at higher addresses.  */
> > > +  if (r_debug_info != NULL)
> > > +    for (const struct r_debug_info_module *module = r_debug_info->module;
> > > +	 module != NULL; module = module->next)
> > > +      if (module_start <= module->l_ld && module->l_ld < module_end)
> > > +	{
> > > +	  GElf_Addr fixup = module->l_ld - dyn_vaddr;
> [...]
> > fixup might need a comment.
> 
> added:
>           /* L_LD read from link map must be right while DYN_VADDR is unsafe.
>              Therefore subtract DYN_VADDR and add L_LD to get a possibly
>              corrective displacement for all addresses computed so far.  */
> 
> > It wasn't immediately clear to me. The fixup is the difference between the
> > PT_LOAD address that the link-map know is the correct one and the PT_DYNAMIC
> > address (of which there can be only one). And module_start points to the
> > PT_LOAD address that was just gotten from the phdrs. Right?
> 
> Right.

Thanks.

> > Actually now that I do understand, that is kind of what the comment above
> > already says. I just didn't immediately made the connection with the
> > offset/fixup being static. Is fixup always positive?
> 
> At least in the supplied testcase it is negative.
> 
> In fact I did not spent time thinking if it is positive or negative,
> I was following the logic which address is safe and which unsafe above.

OK, then I am wondering about the first check before the fixup.

> + if (module_start <= module->l_ld && module->l_ld < module_end)

Should the fixup not always be calculated and then the next check takes
care of everything?

> +	  GElf_Addr fixup = module->l_ld - dyn_vaddr;
> +	  if ((fixup & (dwfl->segment_align - 1)) == 0
> +	      && module_start + fixup <= module->l_ld
> +	      && module->l_ld < module_end + fixup)

Since the fixup could just be zero?

> > BTW. Why does link-map set module->name to the empty string when no name
> > (NULL) was found? Just convenience like here, so no NULL check is
> > needed, or is the empty name significant?
> 
> 'name' is a GCC zero-length array, I am sorry but I cannot make it NULL:
> 	struct r_debug_info_module
> 	{ 
> 	[...]
> 	  char name[0];
> 	};

Ah, OK then.

> > > diff --git a/libdwfl/link_map.c b/libdwfl/link_map.c
> > > @@ -352,9 +355,108 @@ report_r_debug (uint_fast8_t elfclass, uint_fast8_t elfdata,
> > >        /* We have to find the file's phdrs to compute along with l_addr
> > >  	 what its runtime address boundaries are.  */
> > >  
> > > -      // XXX hook for sysroot
> > > -      mod = INTUSE(dwfl_report_elf) (dwfl, basename (name),
> > > -				     name, -1, l_addr, true);
> > > +      Dwfl_Module *mod = NULL;
> > > +      if (iterations == 1 && dwfl->executable_for_core != NULL)
> > > +	{
> > > +	  /* Find the main executable.
> > > +	     FIXME: Try to also find it via build-id.  */
> > > +	  name = dwfl->executable_for_core;
> > > +	}
> > 
> > To solve that FIXME we should do somewhat the same as is done below with
> > the "inlined dwfl_report_elf" snippet? And we already do try to open the
> > executable if we cannot read the phdrs. Maybe this can be factored out a
> > bit?
> 
> In fact when we do not have any executable filename or we failed to open the
> executable filename the only thing we can do is currently already done by the
> second pass of dwfl_segment_report_module calls with the r_debug_info hints.
> 
> This code was there before I implemented the second pass even after the link
> map pass succeeded (found >= 1 shared libraries).
> 
> So I have removed the FIXME comment and simplified it all a bit.
>
> > > +      if (iterations != 2 && name != NULL)
> > > +	{
> > > +	  /* Find a shared library or main executable.  Do not try to
> > > +	     find a file for vDSO (where ITERATIONS equals 2).  */
> > 
> > I couldn't find where the iteration == 1 => exe, iteration == 2 => vDSO
> > comes from. Shouldn't the second check be iterations > 2 ?
> 
> As discussed on #elfutils this ITERATIONS == 2 condition was unsafe, while
> currently vDSO is the 2nd entry with glibc at least with older or non-Linux
> core files 2nd entry is a regular shared library.
> 
> One could make some name-based exceptions ("linux-vdso.so.1") like in
> 	http://pkgs.fedoraproject.org/cgit/gdb.git/tree/gdb-glibc-vdso-workaround.patch
> 
> I found also this case is now obsoleted by the second
> dwfl_segment_report_module pass, finding the vDSO did not do anything else
> than just to call dwfl_segment_report_module for its segment.

All this certainly makes things a lot cleaner. Thanks.

> Fun is this all is now worthless - only for backward compatibility which has
> discutable value for core files - as it has been obsoleted by NT_FILE;
> probably, not sure, NT_FILE contains the list of any mapped files, not just
> the files really opened as shared libraries.

It certainly isn't worthless! We still would like to handle
old/incomplete core files from older systems (on newer systems). We
don't support NT_FILE yet. There are other ways core files are generated
than with the latest kernel version, like with gdb/gcore, which we also
like to handle.

> Attaching significantly simplified patch.

> commit cd584cc822842862923670ebc44d40bf55485a69
> Author: Jan Kratochvil <jan.kratochvil@redhat.com>
> Date:   Thu May 23 20:19:16 2013 +0200
> 
>     Use DT_DEBUG library search first.
>     
>     libdwfl/
>     2013-05-28  Jan Kratochvil  <jan.kratochvil@redhat.com>
>     
>     	* argp-std.c (parse_opt) <ARGP_KEY_SUCCESS> <opt->core> <opt->e>: Set
>     	executable_for_core before calling dwfl_core_file_report.
>     	* core-file.c (clear_r_debug_info): New function.
>     	(dwfl_core_file_report): Move raw segments reporting lower.  New
>     	variable r_debug_info, pass it to dwfl_segment_report_module.  Call
>     	clear_r_debug_info in the end.  Return sum of LISTED and SNIFFED.
>     	* dwfl_module_build_id.c (check_notes): Move into
>     	__libdwfl_find_elf_build_id.
>     	(__libdwfl_find_build_id): Rename to ...
>     	(__libdwfl_find_elf_build_id): ... here.  Add parameters build_id_bits,
>     	build_id_elfaddr and build_id_len.  Verify MOD vs. ELF.
>     	(__libdwfl_find_elf_build_id) (check_notes): Remove parameters mod and
>     	set, rename data_vaddr to data_elfaddr.  Do not call found_build_id.
>     	(__libdwfl_find_elf_build_id): Update the check_notes caller, do not
>     	adjust its data_elfaddr parameter.
>     	(__libdwfl_find_build_id): New wrapper of __libdwfl_find_elf_build_id.
>     	* dwfl_segment_report_module.c (dwfl_segment_report_module): New
>     	parameter r_debug_info.  New variable name_is_final.  Adjust addresses
>     	according to R_DEBUG_INFO->MODULE.  Check conflicts against DWFL.
>     	Do not overwrite NAME by SONAME if NAME_IS_FINAL.
>     	* libdwflP.h (__libdwfl_find_elf_build_id): New declaration.
>     	(struct r_debug_info_module, struct r_debug_info): New definitions.
>     	(dwfl_segment_report_module, dwfl_link_map_report): Add parameter
>     	r_debug_info.
>     	* link_map.c: Include fcntl.h.
>     	(report_r_debug): Add parameter r_debug_info, describe it in the
>     	function comment.  Delete dwfl_addrmodule call and its dependent code.
>     	Verify build-id before calling dwfl_report_elf, also supply
>     	executable_for_core to it.  Store r_debug_info->module info when
>     	appropriate.
>     	(dwfl_link_map_report): Add parameter r_debug_info.  New variable
>     	in_ok.  Try to read IN from EXECUTABLE_FOR_CORE.  Update report_r_debug
>     	caller parameters.
>     
>     tests/
>     2013-05-28  Jan Kratochvil  <jan.kratochvil@redhat.com>
>     
>     	* Makefile.am (EXTRA_DIST): Add test-core-lib.so.bz2,
>     	test-core.core.bz2 and test-core.exec.bz2.
>     	* run-addrname-test.sh: New test for these files.
>     	* run-unstrip-n.sh: Update expected output.  New test for these files.
>     	* test-core-lib.so.bz2: New file.
>     	* test-core.core.bz2: New file.
>     	* test-core.exec.bz2: New file.
>     
>     Signed-off-by: Jan Kratochvil <jan.kratochvil@redhat.com>

Looks good to me. But please merge this commit and the code movement
commit into one, since those are really one change.

Thanks,

Mark
Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]