This is the mail archive of the
elfutils-devel@sourceware.org
mailing list for the elfutils project.
Re: [PATCH] libdw: Make dwarf_getfuncs find all (defining) DW_TAG_subprogram DIEs.
- From: Josh Stone <jistone at redhat dot com>
- To: elfutils-devel at lists dot fedorahosted dot org
- Date: Thu, 19 Sep 2013 18:42:43 -0700
- Subject: Re: [PATCH] libdw: Make dwarf_getfuncs find all (defining) DW_TAG_subprogram DIEs.
On 09/18/2013 06:33 AM, Mark Wielaard wrote:
> dwarf_getfuncs used to return only the DW_TAG_subprogram DIEs that were
> direct children of the given CU. This is normally how GCC outputs the
> subprogram DIEs. But not always. For nested functions the subprogram DIE
> is placed under the subprogram DIE where it is nested. Other compilers
> might output the defining subprogram DIE of a C++ class function under
> the DW_TAG_namespace DIE where it was defined. Both such constructs seem
> allowed by the DWARF specification. So just searching the CU DIE children
> was wrong.
>
> To find all (defining) subprogram DIEs in a CU dwarf_getfuncs should
> use __libdw_visit_scopes to walk the tree. The only tricky part is
> making sure the offset returned and used when the callback returns
> DWARF_CB_ABORT is correct and the search continues at the right spot
> in the CU DIE tree.
As one might expect, deeper iteration ends up quite a bit slower, but we
may have to pay that price for correctness. However, before I get into
any numbers, I think you're *still* not catching everything.
I started experimenting with having stap do its own deep dive, rather
than using dwarf_getfuncs at all. We have full DIE walks in a few
different contexts, and I figured that if dwarf_getfuncs is now going to
be more expensive, maybe I could instead just consolidate our own full
DIE walks and cache multiple things at once.
With systemtap-debuginfo-2.3-1.fc19.x86_64, my patched stap.git found
75713 probes in 'process("/usr/bin/stap").function("*")' (as counted by
the pass-2 -v line). Unpatched stap.git with your patched elfutils.git
found only 61056 probes. :(
Here's one example which is not found: at pc=0x5cee9, there is an inline
instance of allocate(a)ios_base.h.
[ da48c] compile_unit
producer (GNU_strp_alt) "GNU C++ [...]"
language (data1) C_plus_plus (4)
name (strp) "parse.cxx"
...
[10a491] inlined_subroutine
abstract_origin (ref_addr) [ 1f72b]
low_pc (addr) +0x000000000005cee9
-->
[ 1f72b] subprogram
specification (ref_addr) [ 11b0]
inline (data1) declared_inlined (3)
-->
[ 11b0] subprogram
external (flag_present) Yes
name (GNU_strp_alt) "allocate"
decl_file (data1) 5 (->ios_base.h)
decl_line (data1) 99
As far as I can tell, dwarf_getfuncs never reports 1f72b. The trail
through partial units to that DIE should be:
[ da48c] compile_unit
...
[ da4e6] imported_unit
import (ref_addr) [ 21aad]
-->
[ 21aad] partial_unit
...
[ 21abb] imported_unit
import (ref_addr) [ 1e686]
-->
[ 1e686] partial_unit
...
[ 1f72b] subprogram
specification (ref_addr) [ 11b0]
inline (data1) declared_inlined (3)
And now that I've laid that out, I think I know why it's missed. Your
new tree_visitor is filtering out (die_offset < offset), in an attempt
to make the dwarf_getfuncs offset parameter work to resume. This can
fail with partial_units, because the DIE offsets as they are traversed
are not necessarily ordered anymore.
If I just remove your line that updates v->offset, then suddenly it
works. However, this means you won't have the right value to return at
the end of dwarf_getfuncs. Maybe that should be tracked in a separate
field of visitor_info.
Resuming is still going to be problematic with a less-than comparison.
I think rather it needs to iterate until the exact requested offset is
found, without assuming any order. Unless offset==0, of course, then
just start at the beginning.
I think this also means that you can't prune out whole subtrees, since
you don't know whether that offset could be found within, via an import.
And if a tree happens to be imported multiple times (can it?), then
resuming on an offset within that is ambiguous -- oh well.
HTH,
Josh