This is the mail archive of the gdb-patches@sourceware.org mailing list for the GDB project.



Re: New feature "source-id"


On Mon, Mar 17, 2014 at 5:48 PM, Bruce Dawson <bruced@valvesoftware.com> wrote:
> Hi, I thought I'd chime in since I'm the one who suggested this idea (at Steam Dev Days), and I have a (cruder, not worth sharing) version of this which we have been using for about a year, so I speak from experience.
>
> The size issue is not really a code-size issue but a file-size issue. Traditionally the debug information has been in strippable sections so that non-developer users don't need to pay any price (download bandwidth, storage) for debug information which they don't care about. I think that the source-id information falls into the same category. A simple hello world program could end up with many KB of source-id information and it would be a shame to have that in the data segment, loaded or not. Stripped ELF files don't have debug information, and they shouldn't have source-id information, philosophically and practically.
>
> Getting something into the .note section is done in our build system by using objcopy --add-section. For some reason we do this in our build pipeline *after* we've stripped the symbols so we actually add the section to our .dbg file rather than the original .so file. We then have to remove and recreate the .gnu_debuglink. Adding the .note section to the original .so file would be cleaner since then the normal stripping steps would just handle it like other debug information. I think I did it wrong because it was simpler, but it might have been through ignorance.

It might be serendipitous that you do it afterwards.
Note that strip doesn't remove .note.gnu.build-id.
I'm guessing there's a way to mark a .note section as strippable, but
I only skimmed binutils.

> The way that our workflow works is that after we have done a build we use objdump -Wl (slightly customized to reduce the volume of information) to get a list of all of the source files used to create a shared object. Then we query our source-control system for the version numbers associated with the files (we use Perforce). We then inject this data (a mapping from each local file system path to a Perforce path/version information) into the custom section. We have a hacky system for getting gdb to find the files, and we are looking forward to Gerhard's superior system.
>
> I don't see how this would work with a source file. You can't easily know the set of input files (which includes header files, and source files from archive files) until the build has completed. If you put the mappings in a source file then you would need to compile that after the initial link and then relink. That would be inefficient. So, you would need a way to find the list of source files (including header files) prior to building. That sounds messy.
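
As a rough illustration of the mapping described above (each local path paired with a version-control path and revision), here is a minimal Python sketch. The line format, the build_payload name, and the sample depot paths are assumptions made for illustration; they are not what Valve's pipeline or the source-id patch actually uses. Entries are sorted so that the same inputs always produce the same bytes.

```python
def build_payload(mapping):
    """Serialize {local_path: (depot_path, revision)} into bytes.

    Hypothetical format: one "local<TAB>depot#rev" line per file.
    """
    lines = []
    # Sort by local path so the output does not depend on dict order.
    for local_path in sorted(mapping):
        depot_path, revision = mapping[local_path]
        lines.append(f"{local_path}\t{depot_path}#{revision}\n")
    return "".join(lines).encode("utf-8")


if __name__ == "__main__":
    # Made-up paths and revisions, for illustration only.
    example = {
        "/build/src/main.c": ("//depot/proj/src/main.c", 12),
        "/build/include/util.h": ("//depot/proj/include/util.h", 7),
    }
    print(build_payload(example).decode("utf-8"), end="")
```

The resulting bytes could then be written to a file and handed to objcopy --add-section.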

I can imagine generating a .c file (say, could even be .S) from the
post-link binary that contains source information, compile it, and
then using objcopy to add the resultant section with the source
information into the resultant binary.  The key here being that source
file information gets put in a specific section so it's easy to do
this.  If one is splitting the debug information into a separate file,
one could objcopy --add-section the source-file-section there instead.
There's no real "relinking" here.  I don't see the difference with
what you're doing now.
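
The generate, compile, and objcopy sequence could be sketched roughly as follows in Python. The section name .source_id, the file names, and the helper names are all hypothetical; the script only builds the C text and the command lines and does not run anything. The string escaping covers only the common cases and assumes otherwise printable ASCII input.

```python
def c_string_literal(text):
    """Quote text as a C string literal (common escapes only)."""
    escapes = {"\\": "\\\\", '"': '\\"', "\n": "\\n", "\t": "\\t"}
    return '"' + "".join(escapes.get(ch, ch) for ch in text) + '"'


def generate_c_source(payload_text, section=".source_id"):
    """Return C source that places payload_text in its own section."""
    return (
        "/* Generated file, do not edit. */\n"
        "static const char source_id[] "
        f'__attribute__((section("{section}"), used)) =\n'
        f"    {c_string_literal(payload_text)};\n"
    )


def build_commands(c_file, binary, section=".source_id"):
    """Return the command lines for compile, extract, and add-section."""
    obj = c_file + ".o"
    raw = c_file + ".bin"
    return [
        # Compile the generated file on its own; nothing is relinked.
        ["gcc", "-c", c_file, "-o", obj],
        # Dump just the one section's contents as raw bytes.
        ["objcopy", "-O", "binary", f"--only-section={section}", obj, raw],
        # Add those bytes as a new section to the binary (or the .dbg file).
        ["objcopy", "--add-section", f"{section}={raw}", binary],
    ]


if __name__ == "__main__":
    print(generate_c_source("/build/src/main.c\t//depot/proj/src/main.c#12\n"))
    for cmd in build_commands("srcinfo.c", "libfoo.so.dbg"):
        print(" ".join(cmd))
```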

> BTW, one thing to think about is that this system should produce reproducible results. The gcc way is that if your input files have not changed then you should get an identical output. With Perforce this automatically works -- the file version numbers only change if new versions of the files have been submitted. For VCS systems with a global version number (git, IIRC) some care must be taken. If the global version number is used to specify which versions of the files to retrieve then a check-in of an unrelated file will change the source-id information and make the binaries not match. I believe that would be bad. This doesn't affect Gerhard's code at all, but it does affect how a sample script to create the source-id information should work.
>
> Finally, I think that a new command-line option to objdump to dump *just* source paths would be helpful. objdump -Wl is too slow -- something better is needed and I could not find anything. The patch to objdump would be trivial. Or a stand-alone tool could be created, although that is less tempting.
>
> Thanks to Gerhard for pushing this through. I dream of a future where I can step through the code in (almost) any package and have symbols and source code magically show up on-demand.
>
> Bruce Dawson, Valve
>
> -----Original Message-----
> From: gdb-patches-owner@sourceware.org [mailto:gdb-patches-owner@sourceware.org] On Behalf Of Doug Evans
> Sent: Monday, March 17, 2014 5:26 PM
> To: Gerhard Gappmeier
> Cc: gdb-patches
> Subject: Re: New feature "source-id"
>
> On Mon, Mar 17, 2014 at 12:01 PM, Gerhard Gappmeier <gerhard.gappmeier@ascolab.com> wrote:
>>> > example vcsinfo.c:
>>> > /* this file was generated, bla bla, don't modify */
>>> > static const char vcs_type[] = "git";
>>> > static const char vcs_url[] = "git@github.com:gergap/source-id.git";
>>> > static const char vcs_version[] = "c2ec66e6a36451ba47422d186fd97311989ef278";
>>> I think it's weird to store this in .rodata instead of somewhere it
>>> can be easily stripped, especially if you plan on adding the sha1
>>> file hashes through this same mechanism, since that is a less
>>> constant size, though you did mention adding that to the debug info
>>> specifically.
>> I agree. That's a good point. I think we should stay with the original
>> idea of having a .note section. It is also more consistent with the build-id feature.
>
> I agree the consistency of .note is nice, but I wouldn't preclude people wanting something different.
> Getting something into a .note section may involve more build changes than some group may want to take on.
>
>> Another argument against adding this to the source might be code size.
>> For small programs on embedded devices memory matters, so saving these
>> strings would be a benefit. The .note section can be stripped and the
>> feature would still work with the "separate-debug-info" approach.
>
> Technically, even if the info was added to the source (so to speak), it needn't affect code size.
> I can imagine all of these (so called) global variables being put in a specific section which is put in a non-loadable segment.
>
> The solution in gdb needn't preclude any implementation, that is up to the script.
> So, assuming the community wants this feature, let's separate out how the source information is obtained from how gdb uses it.
>
> btw, if Python doesn't have a library for reading ELF files, it should.
> Thus we needn't hardcode anything about where the data lives into gdb
> - leave it to the externally supplied script.
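
To give a feel for how little an external script would need in order to read a .note-style section, here is a minimal Python sketch that parses the raw bytes of a note section. It assumes little-endian data and 4-byte alignment, and it does not locate the section inside an ELF file; the "SRCID" note name and its type value are made up for the demo.

```python
import struct


def _pad4(n):
    """Round n up to the next multiple of 4."""
    return (n + 3) & ~3


def parse_notes(data):
    """Parse raw note-section bytes into (name, type, desc) tuples."""
    notes = []
    offset = 0
    while offset + 12 <= len(data):
        # Header: name size, descriptor size, note type (32 bits each).
        namesz, descsz, ntype = struct.unpack_from("<III", data, offset)
        offset += 12
        name = data[offset:offset + namesz].rstrip(b"\0")
        offset += _pad4(namesz)
        desc = data[offset:offset + descsz]
        offset += _pad4(descsz)
        notes.append((name.decode("ascii", errors="replace"), ntype, desc))
    return notes


def make_note(name, ntype, desc):
    """Build one note record (used here only to create demo input)."""
    name_bytes = name.encode("ascii") + b"\0"
    return (
        struct.pack("<III", len(name_bytes), len(desc), ntype)
        + name_bytes.ljust(_pad4(len(name_bytes)), b"\0")
        + desc.ljust(_pad4(len(desc)), b"\0")
    )


if __name__ == "__main__":
    # "SRCID" and type 1 are hypothetical, not a registered note type.
    blob = make_note("SRCID", 1, b"git")
    print(parse_notes(blob))
```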

