This is the mail archive of the gdb-patches@sources.redhat.com mailing list for the GDB project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: RFA: Search for symbol names the same way they're hashed.


On Tue, 1 Oct 2002 22:29:14 -0500, Jim Blandy <jimb@redhat.com> said:

> But when MANGLED_NAME is zero, then the SYMBOL_MATCHES_NAME part is
> questionable.  The definition of SYMBOL_MATCHES_NAME from symtab.h is
> as follows:

> #define SYMBOL_MATCHES_NAME(symbol, name)				\
>   (STREQ (SYMBOL_NAME (symbol), (name))					\
>    || (SYMBOL_DEMANGLED_NAME (symbol) != NULL				\
>        && strcmp_iw (SYMBOL_DEMANGLED_NAME (symbol), (name)) == 0))

> It returns true if NAME matches either SYMBOL_NAME or
> SYMBOL_DEMANGLED_NAME.

> If the intention really was to use SYMBOL_MATCHES_NAME to recognize
> matches, then the code is broken.  If SYMBOL_NAME (sym) matches
> NAME, and sym has a demangled name which is different from NAME
> (which will usually be the case), SYMBOL_MATCHES_NAME (sym, name)
> will be true, but finish_block will have hashed on the demangled
> name, and probably will have filed sym in a different hash bucket
> than the one we'll search.

I was thinking about this some last week, albeit from a slightly
different angle, and I agree with you that this doesn't work well.  My
attitude is that, in lookup_block_symbol, 'name' should be the
demangled form of the name.  (After all, there's a separate
'mangled_name' argument, not a separate 'demangled_name' argument.)
Assuming of course that we're in the C++ case where there's a
difference between mangled and demangled names, then using the current
definition of SYMBOL_MATCHES_NAME is wrong: it first does

  STREQ (SYMBOL_NAME (symbol), (name)

which tests 'name' against the _mangled_ form of the name, which we
never want to do, so this could lead to false positives.  It's not
very likely to run into false positives, of course, because a
demangled name should never look like a mangled name, but why run the
risk?  (Though I hadn't thought of the interaction with hashing that
you brought up: that certainly decreases the likelihood of false
positives.)

My current solution is perhaps a bit hard to read from the sources on
my branch, but basically it boils down to this:

1) This concept of 'a name that is as demangled as possible' is a
pretty important one and occurs in multiple places in GDB's sources,
so let's formalize it:

/* Macro that returns the demangled name of the symbol if if possible
   and the symbol name if not possible.  This is like
   SYMBOL_SOURCE_NAME except that it doesn't depend on the value of
   'demangle' (and is hence more suitable for internal usage).  The
   result should never be NULL.  */

/* FIXME: carlton/2002-09-26: Probably the situation with this and
   SYMBOL_SOURCE_NAME should be rethought.  */

#define SYMBOL_BEST_NAME(symbol)					\
  (SYMBOL_DEMANGLED_NAME (symbol) != NULL				\
   ? SYMBOL_DEMANGLED_NAME (symbol)					\
   : SYMBOL_NAME (symbol))

(I'm not exactly wedded to the name 'SYMBOL_BEST_NAME'; it was the
first acceptable name that I thought of, given that SYMBOL_SOURCE_NAME
was already taken to mean something subtly different.)

I then went and inserted this macro in some (but by no means all: I
haven't yet had time to do an exhaustive search) of all the places in
GDB where it would fit.  E.g. the definition for SYMBOL_SOURCE_NAME
now becomes

/* Macro that returns the "natural source name" of a symbol.  In C++ this is
   the "demangled" form of the name if demangle is on and the "mangled" form
   of the name if demangle is off.  In other languages this is just the
   symbol name.  The result should never be NULL. */

/* NOTE: carlton/2002-09-26: For external use only; in many
   situations, SYMBOL_BEST_NAME is more appropriate.  */

#define SYMBOL_SOURCE_NAME(symbol)					\
  (demangle ? SYMBOL_BEST_NAME (symbol) : SYMBOL_NAME (symbol))

or, when I'm hashing symbol names to build the hash table, I do

	  hash_index = msymbol_hash_iw (SYMBOL_BEST_NAME (sym)) % nbuckets;

2) So then what happens to lookup_block_symbol?  When looking up names
in a hash table, I find the index just like I do when building it.
Then, to test whether or not I've found the right symbol, if
'mangled_name' is nonzero, I do

  strcmp (SYMBOL_NAME (sym), mangled_name)

and otherwise, I do

  strcmp_iw (SYMBOL_BEST_NAME (sym), name)


Some further thoughts:

It seems entirely plausible to me that SYMBOL_MATCHES_NAME should
really be defined as follows:

#define SYMBOL_MATCHES_NAME(symbol, name) \
  (strcmp_iw (SYMBOL_BEST_NAME (sym), name) == 0)

But I haven't seriously looked at the other places where
SYMBOL_MATCHES_NAME is being used.  I suppose another plausible option
would be

#define SYMBOL_MATCHES_NAME(symbol, name) \
  (SYMBOL_DEMANGLED_NAME (sym) != NULL
   ? strcmp_iw (SYMBOL_DEMANGLED_NAME (sym), name) == 0
   : strcmp (SYMBOL_NAME (sym), name) == 0)

(This is effectively what you're doing in your patch to
lookup_block_symbol.)  The difference here would be that we use strcmp
in the non-C++ case instead of strcmp_iw.  My guess is that, in
practice, it wouldn't make a difference in the non-C++ case (since
symbol names shouldn't contain spaces or parentheses) and, if it
somehow did make a difference, then one could make a good case for
strcmp_iw being the right thing.  Given that I want to encourage usage
of SYMBOL_BEST_NAME wherever possible, I prefer the first definition
of SYMBOL_MATCHES_NAME.

If people agree with me that introducing a macro like SYMBOL_BEST_NAME
would clarify matters here and elsewhere, I could spend some more time
looking at where it could be used and submit a patch that introduces
it.

David Carlton
carlton@math.stanford.edu


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]