This is the mail archive of the archer@sourceware.org mailing list for the Archer project.
Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]
Other format:	[Raw text]
Re: Generating gdb index at link time

From: Cary Coutant <ccoutant at google dot com>
To: Tom Tromey <tromey at redhat dot com>, Project Archer <archer at sourceware dot org>
Cc: Dodji Seketeli <dodji at redhat dot com>, Sterling Augustine <saugustine at google dot com>
Date: Fri, 5 Aug 2011 13:30:35 -0700
Subject: Re: Generating gdb index at link time
References: <CAHACq4oHymXfBOqzJfyzQXNyW5PYkN6g65X6x1rMU+YmJybZmQ@mail.gmail.com><m3d3gkx8ar.fsf@fleche.redhat.com><CAHACq4ofyavZBt4y65OfYSoeW--HeEAT=sz86urLC0MXBB0hnA@mail.gmail.com><m3d3gjwuho.fsf@fleche.redhat.com><CAHACq4poC0QkGowdHkg0_Y1FRmyXTZZDeoBYrUf91nxz9SJtQw@mail.gmail.com>
> [ opening up to the archer mailing list -- the beginning of the
> conversation is copied below ]

Oops, really adding archer this time! Sorry for the dup.

> [ also adding Sterling, who will be doing the GCC changes ]
>
>
> On Fri, Aug 5, 2011 at 12:04 PM, Tom Tromey <tromey@redhat.com> wrote:
>> Cary> No problem at all. I considered sending this to the archer mailing
>> Cary> list, but figured I'd just start with you and Dodji. Would that be
>> Cary> what you were thinking of by "public", or something more like the gcc
>> Cary> list?
>>
>> I was thinking archer or gdb.
>>
>> Tom> This problem was that .debug_pub* are explicitly for public names, but
>> Tom> for GDB we wanted to see all names (so that "break staticfunction" works
>> Tom> and "break typo" doesn't cause reading all CUs). ?This particular
>> Tom> problem was resolved by you :) saying that we should just change GCC to
>> Tom> emit all the names.
>>
>> Cary> No one has actually changed GCC to do that, yet, though -- right?
>>
>> I think either Dodji or I had a patch at one point. ?It may have been
>> part of Dodji's patch to change .debug_pub* to also include all the
>> extra info we thought we needed at the time.
>>
>> I'm not sure if it is still around.
>
> If you can find it, we'd definitely be interested in it.
>
>> Tom> The other issue with .debug_pub* was that I implemented a reader for it
>> Tom> in gdb and it was too slow. ?It may be possible to fix this -- I don't
>> Tom> think I tried lazily reading it.
>>
>> Cary> OK, that actually answers a question I posed a while back -- having a
>> Cary> complete hashed symbol table does help, compared to just having a
>> Cary> usable implementation of pubnames/pubtypes.
>>
>> Yeah. ?Though, again IIRC, one problem was that we had to do name
>> canonicalization while reading.
>>
>> Tom> I think at the time I was concerned that the arange was incorrect. ?I
>> Tom> did find some arange bugs in the wild (all since fixed) when we first
>> Tom> used that code. ?Also I think I had a concern, probably theoretical,
>> Tom> that since aranges is optional it might not appear -- but it seems to me
>> Tom> now that this could be accounted for at index creation time.
>>
>> Cary> In gold, I plan to check for DW_AT_ranges or DW_AT_{low,high}_pc in
>> Cary> the compile_unit DIE, as I'm doing now. That much of parsing the
>> Cary> debug_info isn't that expensive.
>>
>> You may want to research this change a little, just to make sure your
>> plan is compatible with what gdb expects:
>>
>> ? ?http://sourceware.org/bugzilla/show_bug.cgi?id=12302
>> ? ?http://sourceware.org/ml/gdb-patches/2010-12/msg00093.html
>>
>> I'm not sure what test case inspired this.
>
> Thanks, I'll check with dje when he gets back on Monday. I think
> gold's current behavior is compatible with this, but I'll check to
> make sure.
>
>> Cary> I think the biggest problem is an issue I've seen where the DIE for a
>> Cary> class isn't properly nested inside the right context (even after
>> Cary> following DW_AT_specification). There's an ugly workaround for this in
>> Cary> gdb (by looking for a member subprogram with a linkage name, and
>> Cary> extracting the class name from that), which I copied into gold. As
>> Cary> part of this work, I'd like to fix that bug in GCC, and any others
>> Cary> like it.
>>
>> Yeah. ?We've tried to file and/or fix problems like this as we've run
>> across them. ?There are still some open bugs around naming.
>>
>> Tom
>
> Thanks,
>
> -cary
>
>
>
> ---------- Forwarded message ----------
> From: Cary Coutant <ccoutant@google.com>
> Date: Thu, Aug 4, 2011 at 6:10 PM
> Subject: Generating gdb index at link time
> To: Tom Tromey <tromey@redhat.com>, Dodji Seketeli <dodji@redhat.com>
>
>
> I talked to Dodji in London in June and I believe I mentioned that I
> was working on adding gdb index support in the gold linker. I did
> finish that, but I've failed miserably so far at making it perform
> decently -- it more than doubles the link time for a large C++ binary.
> Logically, it shouldn't take any more time than it would take to load
> the non-indexed file in gdb and save the gdb index from there, but it
> does, and I've been struggling to fix that. It's still bad enough that
> I haven't yet bothered to submit the patch upstream. It does, however,
> have a significant effect on gdb startup time, as I'm sure you know,
> so we definitely want to have it.
>
> So I've been thinking about faster ways of generating the index...
>
> I'm planning to fix gcc to emit all the names we want to see in
> .debug_pubnames and .debug_pubtypes (extending .debug_pubtypes to
> include type units as well), then have gold generate the gdb index
> from those sections and .debug_aranges, plus what ought to be a *very*
> fast pass over the compilation unit headers in .debug_info. Gold could
> then discard the .debug_pub* sections from the final output.
>
> In reviewing some old email discussions, I noted a couple of things.
> One of your original complaints about .debug_pubnames was that it
> didn't distinguish between function and variable, or between typedef,
> enum, struct, and class. But the gdb index format does not do that
> either. Is this something you discovered later was not actually
> necessary? Likewise, I believe you also mentioned something early on
> about wanting to distinguish between private and public names, but no
> such distinction is made in the gdb index.
>
> Did you ever reach any conclusion on the value of .debug_aranges vs.
> the address table in the gdb index? I saw a gcc patch go by to add an
> empty range for compilation units that had no code so that you could
> distinguish between an empty arange and a missing arange. Did that
> resolve the problem with .debug_aranges? Were there other issues?
>
> Any comments, objections, advice, ...?
>
> -cary
>
>
>
>
> ---------- Forwarded message ----------
> From: Tom Tromey <tromey@redhat.com>
> Date: Fri, Aug 5, 2011 at 7:05 AM
> Subject: Re: Generating gdb index at link time
> To: Cary Coutant <ccoutant@google.com>
> Cc: Dodji Seketeli <dodji@redhat.com>
>
>
> Cary> I talked to Dodji in London in June and I believe I mentioned that I
> Cary> was working on adding gdb index support in the gold linker.
>
> Nice.
>
> It may be better to have this discussion in public; but at least I'd
> like to CC Jan and Keith, if you don't mind, as they can shed some light
> on a couple of issues.
>
> Cary> I did finish that, but I've failed miserably so far at making it
> Cary> perform decently -- it more than doubles the link time for a large
> Cary> C++ binary.
>
> I'm surprised by this as well.
>
> Cary> One of your original complaints about .debug_pubnames was that it
> Cary> didn't distinguish between function and variable, or between typedef,
> Cary> enum, struct, and class. But the gdb index format does not do that
> Cary> either. Is this something you discovered later was not actually
> Cary> necessary?
>
> Yeah; I tried it without this information, just having gdb expand all
> matching CUs, and aside from an issue with type lookup it performed
> acceptable. ?We fixed the type lookup thing without affecting the index,
> though I don't remember the details right now.
>
> Cary> Likewise, I believe you also mentioned something early on
> Cary> about wanting to distinguish between private and public names, but no
> Cary> such distinction is made in the gdb index.
>
> This problem was that .debug_pub* are explicitly for public names, but
> for GDB we wanted to see all names (so that "break staticfunction" works
> and "break typo" doesn't cause reading all CUs). ?This particular
> problem was resolved by you :) saying that we should just change GCC to
> emit all the names.
>
> The other issue with .debug_pub* was that I implemented a reader for it
> in gdb and it was too slow. ?It may be possible to fix this -- I don't
> think I tried lazily reading it.
>
> Cary> Did you ever reach any conclusion on the value of .debug_aranges vs.
> Cary> the address table in the gdb index? I saw a gcc patch go by to add an
> Cary> empty range for compilation units that had no code so that you could
> Cary> distinguish between an empty arange and a missing arange. Did that
> Cary> resolve the problem with .debug_aranges? Were there other issues?
>
> Putting the range info into the index may have been a mistake.
> I am not sure why I did this :(
>
> I still have arange-reading code on a branch. ?We could resurrect this
> and drop this information from the index.
>
> There was some bug fix involving the address ranges emitted in the
> index, but I don't recall details now.
>
> I think at the time I was concerned that the arange was incorrect. ?I
> did find some arange bugs in the wild (all since fixed) when we first
> used that code. ?Also I think I had a concern, probably theoretical,
> that since aranges is optional it might not appear -- but it seems to me
> now that this could be accounted for at index creation time.
>
> Cary> Any comments, objections, advice, ...?
>
> I think the biggest difficulty is in C++ name canonicalization.
>
> My understanding (Jan and Keith are the experts here) is that DW_AT_name
> does not always agree with the demangler; but one also cannot always
> rely on the demangler because not all entities are given a
> DW_linkage_name (perhaps fixed in newer GCC versions? ?Dodji would
> know, or anyway it is in bugzilla).
>
> We have a second canonicalization step (a bunch of code in
> cp-name-parser.y) that we run on the demangled names. ?I'm not 100% sure
> this is still necessary. ?I think this code is reasonably
> self-contained; but maybe slow.
>
> I think it is worth considering changes to the index, even radical ones,
> if it would make your solution better. ?The index is purely an ad hoc
> invention and should, IMO, be considered as mutable as any other piece.
>
> Tom
>
>
>
>
> ---------- Forwarded message ----------
> From: Cary Coutant <ccoutant@google.com>
> Date: Fri, Aug 5, 2011 at 11:10 AM
> Subject: Re: Generating gdb index at link time
> To: Tom Tromey <tromey@redhat.com>
> Cc: Dodji Seketeli <dodji@redhat.com>
>
>
>> It may be better to have this discussion in public; but at least I'd
>> like to CC Jan and Keith, if you don't mind, as they can shed some light
>> on a couple of issues.
>
> No problem at all. I considered sending this to the archer mailing
> list, but figured I'd just start with you and Dodji. Would that be
> what you were thinking of by "public", or something more like the gcc
> list?
>
>> Cary> I did finish that, but I've failed miserably so far at making it
>> Cary> perform decently -- it more than doubles the link time for a large
>> Cary> C++ binary.
>>
>> I'm surprised by this as well.
>
> Yeah, I found a few poor implementation choices (and a stupid bug
> where I scanned the section table for the .debug_str section for every
> string reference, even when I already knew the section index from the
> relocation) that made big improvements when fixed, but it's still
> spending a lot of time tracking the relocations, and a lot of time
> parsing DIEs and computing canonical names that I don't need simply
> because I *might* need them to follow a DW_AT_specification or
> DW_AT_abstract_origin reference.
>
>> Cary> Likewise, I believe you also mentioned something early on
>> Cary> about wanting to distinguish between private and public names, but no
>> Cary> such distinction is made in the gdb index.
>>
>> This problem was that .debug_pub* are explicitly for public names, but
>> for GDB we wanted to see all names (so that "break staticfunction" works
>> and "break typo" doesn't cause reading all CUs). ?This particular
>> problem was resolved by you :) saying that we should just change GCC to
>> emit all the names.
>
> No one has actually changed GCC to do that, yet, though -- right?
>
>> The other issue with .debug_pub* was that I implemented a reader for it
>> in gdb and it was too slow. ?It may be possible to fix this -- I don't
>> think I tried lazily reading it.
>
> OK, that actually answers a question I posed a while back -- having a
> complete hashed symbol table does help, compared to just having a
> usable implementation of pubnames/pubtypes.
>
>> Cary> Did you ever reach any conclusion on the value of .debug_aranges vs.
>> Cary> the address table in the gdb index? I saw a gcc patch go by to add an
>> Cary> empty range for compilation units that had no code so that you could
>> Cary> distinguish between an empty arange and a missing arange. Did that
>> Cary> resolve the problem with .debug_aranges? Were there other issues?
>>
>> Putting the range info into the index may have been a mistake.
>> I am not sure why I did this :(
>>
>> I still have arange-reading code on a branch. ?We could resurrect this
>> and drop this information from the index.
>
> I may be interested in doing that at a later point, but I think I'll
> start by using it as you've designed it.
>
>> I think at the time I was concerned that the arange was incorrect. ?I
>> did find some arange bugs in the wild (all since fixed) when we first
>> used that code. ?Also I think I had a concern, probably theoretical,
>> that since aranges is optional it might not appear -- but it seems to me
>> now that this could be accounted for at index creation time.
>
> In gold, I plan to check for DW_AT_ranges or DW_AT_{low,high}_pc in
> the compile_unit DIE, as I'm doing now. That much of parsing the
> debug_info isn't that expensive.
>
>> I think the biggest difficulty is in C++ name canonicalization.
>>
>> My understanding (Jan and Keith are the experts here) is that DW_AT_name
>> does not always agree with the demangler; but one also cannot always
>> rely on the demangler because not all entities are given a
>> DW_linkage_name (perhaps fixed in newer GCC versions? ?Dodji would
>> know, or anyway it is in bugzilla).
>
> I think the biggest problem is an issue I've seen where the DIE for a
> class isn't properly nested inside the right context (even after
> following DW_AT_specification). There's an ugly workaround for this in
> gdb (by looking for a member subprogram with a linkage name, and
> extracting the class name from that), which I copied into gold. As
> part of this work, I'd like to fix that bug in GCC, and any others
> like it.
>
>> I think it is worth considering changes to the index, even radical ones,
>> if it would make your solution better. ?The index is purely an ad hoc
>> invention and should, IMO, be considered as mutable as any other piece.
>
> Definitely something to pursue, after I get it working as is.
>
> Thanks for doing all this in the first place, and for your help!
>
> -cary
>
Follow-Ups:
- Re: Generating gdb index at link time
  - From: Dodji Seketeli
- Re: Generating gdb index at link time
  - From: Daniel Jacobowitz
Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]