This is the mail archive of the gdb-patches@sources.redhat.com mailing list for the GDB project.

Re: [PATCH] Fix coff symbol table reading problem for C code compiled by g++

From: Jim Blandy <jimb at redhat dot com>
To: Ton van Overbeek <v-overbeek at cistron dot nl>, Daniel Jacobowitz <drow at false dot org>
Cc: gdb-patches at sources dot redhat dot com
Date: 15 Sep 2004 19:22:31 -0500
Subject: Re: [PATCH] Fix coff symbol table reading problem for C code compiled by g++
References: <Pine.LNX.4.58.0408130200490.24052@picard.cistron.nl>

Ton van Overbeek <v-overbeek@cistron.nl> writes:

> I found a bug/problem in the symbol reading code in symtab.c in gdb-6.2.
> 
> The problem occurs when reading symbols from a coff object file produced
> from a C source file compiled by g++ (not by gcc).
> The particular compiler is m68k-palmos-gcc, which is still based on gcc-2.95.3.
> See http://prc-tools.sourceforge.net.
> I know this gcc version is very old, but I believe the problem may also exist
> for other compilers.
> 
> When compiling normal C code by g++ it produces mangled function names.
> The coff symbol reader first reads the function name and inserts this in
> the minimal symbol table and in a demangled name hash table in
> symbol_set_names(). When reading the '.bf' symbol the function name is inserted
> in the real symbol table. This time the symbol is already in the hash table
> in symbol_set_names() and symbol_find_demangled_name() is not called. A side
> effect of symbol_find_demangled_name() is that it changes/corrects the
> gsymbol->language field. In the case of 'C compiled by g++' it changes
> it from language_auto to language_cplus.
> Because symbol_find_demangled_name() is not called, the gsymbol->language field
> in the full symbol table stays set to language_auto.
> This causes all kinds of problems when looking up symbols later, since the
> stored name is the mangled name and the demangled name is empty: the symbol is
> not found in the full symbol table and the code falls back on the minimal
> symbol table or e.g. function names.
> When trying to set a breakpoint on a function, the breakpoint is then set
> on the last line of the preceding function.
> 
> I have applied the following fix to ensure that symbol_find_demangled_name()
> is also called in this case. It is working for me. I do not know
> if something else is needed for other languages/compilers/compiler
> versions.

So, let me make sure I understand this correctly:

The essential problem is that symbol_set_names sometimes has the side
effect of setting GSYMBOL->language, and sometimes it doesn't: whether
it does depends on whether that particular mangled name has been seen
before in this objfile, which shouldn't matter.
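
To make that concrete, here is a toy model of the behavior as I
understand it.  This is not GDB code: every toy_* name is made up, the
"demangler" is a stand-in that only recognizes names containing "__F",
and the cache is a plain array rather than the real hash table.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum toy_language { TOY_LANG_AUTO, TOY_LANG_CPLUS };

struct toy_symbol {
  const char *name;
  enum toy_language language;
};

#define TOY_CACHE_MAX 16

/* Hypothetical per-objfile cache of mangled names already processed.  */
static const char *toy_seen[TOY_CACHE_MAX];
static size_t toy_seen_count;

/* Stand-in for the demangler.  As a side effect it corrects the
   symbol's language when the name looks like a C++ mangled name.  */
static void
toy_find_demangled_name (struct toy_symbol *sym, const char *mangled)
{
  if (strstr (mangled, "__F") != NULL)
    sym->language = TOY_LANG_CPLUS;
}

/* Models the inconsistency: on a cache hit we return early, so the
   language side effect above never happens for this symbol.  */
static void
toy_set_names (struct toy_symbol *sym, const char *mangled)
{
  size_t i;

  sym->name = mangled;

  for (i = 0; i < toy_seen_count; i++)
    if (strcmp (toy_seen[i], mangled) == 0)
      return;  /* Seen before: language is left untouched.  */

  toy_find_demangled_name (sym, mangled);

  assert (toy_seen_count < TOY_CACHE_MAX);
  toy_seen[toy_seen_count++] = mangled;
}
```

If you call toy_set_names twice with the same mangled name, first for
the minimal symbol and then for the full symbol, only the first symbol
ends up with TOY_LANG_CPLUS; the second stays at TOY_LANG_AUTO.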

Here's the thread about introducing demangled_names_hash:

    http://sources.redhat.com/ml/gdb-patches/2003-01/msg00726.html

The main motivation for introducing it was to be able to include
mangled names in the partial symbol tables; we were also hoping to
save time by avoiding calling the demangler.  As it turns out, the
time saved by not calling the demangler was used up (to within 1%) by
the overhead of the patch, so there was no net performance win.
(Assuming you weren't paging...)

The problem with your patch is that it brings back all the calls to
the demangler that the hash table allowed us to avoid: the demangler
gets called every time, whether we've already demangled the symbol
before or not.

I think the fundamental problem is that the hash table only retains
partial information about the results from symbol_find_demangled_name:
it retains the demangled name, but not the language whose demangler we
used.  If we could retain that information, then symbol_set_names
could consistently provide the language.

I see two approaches.  Based on the discussion in the thread, space is
at a premium, so we're only considering things which won't
significantly increase the memory usage.  Specific numbers are from
the test case discussed in the thread.

- Store the language in another byte beyond the demangled name.  This
  makes the form of the hash table entries even less obvious.  It
  would also add 200k of memory consumption.  On the other hand,
  depending on the granularity of obstack_alloc, perhaps many of those
  extra bytes would fall into the padding at the end of the value.

  The hair could be localized to symbol_set_names, though.

- Have a separate hash table for each language.  In 'struct objfile',
  we'd have:

      struct htab *demangled_names_hashes[nr_languages];

  They'd be allocated lazily.  One would need to probe all hash tables
  before deciding that a symbol hadn't been seen yet (or, only the
  hash tables that'd actually been allocated, typically only one
  unless you're mixing languages).  Then, the index of the hash table
  you'd found your name in would tell you the language.

  This would entail a lot of changes elsewhere to properly initialize
  and free demangled_names_hashes.
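
A rough sketch of both ideas, to show what each would look like.  None
of this is the real GDB code: the toy_* names are invented, the entry
layout "mangled NUL demangled NUL" is my assumption about the existing
entries, malloc/calloc stand in for the obstack, and a small array
stands in for each hash table.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

enum toy_language { TOY_LANG_AUTO, TOY_LANG_C, TOY_LANG_CPLUS,
                    TOY_NR_LANGUAGES };

/* Approach 1: one extra byte after the demangled name's NUL.
   Assumed layout: mangled '\0' demangled '\0' language-byte.  */
static char *
toy_make_entry (const char *mangled, const char *demangled,
                enum toy_language lang)
{
  size_t mlen = strlen (mangled), dlen = strlen (demangled);
  char *entry = malloc (mlen + 1 + dlen + 1 + 1);

  if (entry == NULL)
    return NULL;
  memcpy (entry, mangled, mlen + 1);
  memcpy (entry + mlen + 1, demangled, dlen + 1);
  entry[mlen + 1 + dlen + 1] = (char) lang;
  return entry;
}

static const char *
toy_entry_demangled (const char *entry)
{
  return entry + strlen (entry) + 1;
}

static enum toy_language
toy_entry_language (const char *entry)
{
  const char *demangled = toy_entry_demangled (entry);

  return (enum toy_language) demangled[strlen (demangled) + 1];
}

/* Approach 2: one lazily allocated table per language.  */
#define TOY_TABLE_MAX 8

struct toy_table {
  const char *names[TOY_TABLE_MAX];
  size_t count;
};

static struct toy_table *toy_tables[TOY_NR_LANGUAGES];

/* Probe only the tables that exist; the index of the table that
   holds the name is the language.  Returns 1 if found.  */
static int
toy_lookup_language (const char *mangled, enum toy_language *lang_out)
{
  int lang;
  size_t i;

  for (lang = 0; lang < TOY_NR_LANGUAGES; lang++)
    {
      struct toy_table *t = toy_tables[lang];

      if (t == NULL)
        continue;
      for (i = 0; i < t->count; i++)
        if (strcmp (t->names[i], mangled) == 0)
          {
            *lang_out = (enum toy_language) lang;
            return 1;
          }
    }
  return 0;
}

/* Record MANGLED under LANG, allocating that language's table on
   first use.  Returns 1 on success, 0 if out of memory or full.  */
static int
toy_record (const char *mangled, enum toy_language lang)
{
  assert (lang < TOY_NR_LANGUAGES);
  if (toy_tables[lang] == NULL)
    toy_tables[lang] = calloc (1, sizeof (struct toy_table));
  if (toy_tables[lang] == NULL || toy_tables[lang]->count == TOY_TABLE_MAX)
    return 0;
  toy_tables[lang]->names[toy_tables[lang]->count++] = mangled;
  return 1;
}
```

In the first approach the reader of an entry has to know to skip two
strings to find the language, which is the "less obvious" part.  In the
second, the entries are unchanged but every miss costs one probe per
allocated table.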

Daniel, what do you think?  Have I at least got the problem right?
