This is the mail archive of the
gdb-patches@sources.redhat.com
mailing list for the GDB project.
RFA/symtab: (Almost) always hash blocks when searching them
- From: Daniel Jacobowitz <drow at mvista dot com>
- To: gdb-patches at sources dot redhat dot com
- Date: Sun, 9 Feb 2003 17:03:21 -0500
- Subject: RFA/symtab: (Almost) always hash blocks when searching them
I'm working on modifying the symbol lookup functions to return multiple
symbols when there are multiple possible matches, and I didn't want to have
to modify all three kinds of binary search in lookup_block_symbol. First I
tried fixing mdebugread.c to generate hashed blocks properly; it was too
messy, and I couldn't build an mdebug toolchain to test with [mips-ecoff was
my best guess, and it's been broken for months. Part of it was my fault and
then GCC started segfaulting after I fixed that].
So instead, I added a new function to hash a block retroactively. Then, in
lookup_block_symbol, where we would previously have done a binary search we
instead hash the block and do a hash table search. Amortized cost is
somewhat lower, complexity cost is much lower. I like it.
I also updated the comments; the bit about not matching demangled names was
out of date. Symbols are hashed by their demangled name, if any.
Is this patch OK?
--
Daniel Jacobowitz
MontaVista Software Debian GNU/Linux Developer
2003-02-09 Daniel Jacobowitz <drow@mvista.com>
* Makefile.in (symtab.o): Add $(buildsym_h) and $(gdb_assert_h).
* buildsym.c (hash_block): New function.
* buildsym.h: Add prototype for hash_block.
* symtab.c: Include "buildsym.h" and "gdb_assert.h".
(lookup_block_symbol): Use hash_block. Remove binary search.
Update comments.
Index: Makefile.in
===================================================================
RCS file: /cvs/src/src/gdb/Makefile.in,v
retrieving revision 1.328
diff -u -p -r1.328 Makefile.in
--- Makefile.in 7 Feb 2003 05:33:44 -0000 1.328
+++ Makefile.in 9 Feb 2003 21:52:05 -0000
@@ -2216,7 +2216,7 @@ symtab.o: symtab.c $(defs_h) $(symtab_h)
$(gdbcmd_h) $(call_cmds_h) $(gdb_regex_h) $(expression_h) \
$(language_h) $(demangle_h) $(inferior_h) $(linespec_h) \
$(filenames_h) $(gdb_obstack_h) $(gdb_string_h) $(gdb_stat_h) \
- $(cp_abi_h) $(source_h)
+ $(cp_abi_h) $(source_h) $(buildsym_h) $(gdb_assert_h)
target.o: target.c $(defs_h) $(gdb_string_h) $(target_h) $(gdbcmd_h) \
$(symtab_h) $(inferior_h) $(bfd_h) $(symfile_h) $(objfiles_h) \
$(gdb_wait_h) $(dcache_h) $(regcache_h)
Index: buildsym.c
===================================================================
RCS file: /cvs/src/src/gdb/buildsym.c,v
retrieving revision 1.27
diff -u -p -r1.27 buildsym.c
--- buildsym.c 14 Jan 2003 00:15:05 -0000 1.27
+++ buildsym.c 9 Feb 2003 21:52:05 -0000
@@ -203,6 +203,56 @@ free_pending_blocks (void)
pending_blocks = NULL;
}
+/* Convert a block BLOCK with unhashed symbols into a block with
+ hashed symbols.
+
+ All blocks coming out of the debug info readers should either be
+ function blocks (which contain a list of arguments in addition to
+ locals, so we don't hash them [yet] in order to preserve the order
+ of the arguments) or already hashed. However, some older readers
+ don't use finish_block, so they don't create hashed blocks. The only
+ offender as of 2003-02-09 is mdebugread. */
+
+void
+hash_block (struct block *block)
+{
+ int i, nsyms;
+ struct symbol **syms;
+
+ gdb_assert (BLOCK_FUNCTION (block) == NULL);
+ gdb_assert (BLOCK_HASHTABLE (block) == 0);
+
+ nsyms = BLOCK_NSYMS (block);
+
+ /* Save the current list of symbols. */
+ syms = xmalloc (nsyms * sizeof (struct symbol *));
+ memcpy (syms, block->sym, nsyms * sizeof (struct symbol *));
+ memset (block->sym, 0, nsyms * sizeof (struct symbol *));
+
+ BLOCK_HASHTABLE (block) = 1;
+
+ /* Just reuse the already-allocated symbol pointers as our hashtable.
+ Normally we'd use a smaller hash table than this, but reclaiming
+ the memory at this point is hard. */
+ BLOCK_BUCKETS (block) = nsyms;
+
+ for (i = 0; i < nsyms; i++)
+ {
+ struct symbol *sym;
+ unsigned int hash_index;
+ const char *name = SYMBOL_DEMANGLED_NAME (syms[i]);
+ if (name == NULL)
+ name = SYMBOL_NAME (syms[i]);
+ hash_index = msymbol_hash_iw (name);
+ hash_index = hash_index % BLOCK_BUCKETS (block);
+ sym = BLOCK_BUCKET (block, hash_index);
+ BLOCK_BUCKET (block, hash_index) = syms[i];
+ syms[i]->hash_next = sym;
+ }
+
+ xfree (syms);
+}
+
/* Take one of the lists of symbols and make a block from it. Keep
the order the symbols have in the list (reversed from the input
file). Put the block on the list of pending blocks. */
Index: buildsym.h
===================================================================
RCS file: /cvs/src/src/gdb/buildsym.h,v
retrieving revision 1.10
diff -u -p -r1.10 buildsym.h
--- buildsym.h 9 Jan 2003 18:30:32 -0000 1.10
+++ buildsym.h 9 Feb 2003 21:52:05 -0000
@@ -227,6 +227,8 @@ extern void add_symbol_to_list (struct s
extern struct symbol *find_symbol_in_list (struct pending *list,
char *name, int length);
+extern void hash_block (struct block *block);
+
extern void finish_block (struct symbol *symbol,
struct pending **listhead,
struct pending_block *old_blocks,
Index: symtab.c
===================================================================
RCS file: /cvs/src/src/gdb/symtab.c,v
retrieving revision 1.87
diff -u -p -r1.87 symtab.c
--- symtab.c 4 Feb 2003 18:07:01 -0000 1.87
+++ symtab.c 9 Feb 2003 21:52:06 -0000
@@ -40,17 +40,17 @@
#include "linespec.h"
#include "source.h"
#include "filenames.h" /* for FILENAME_CMP */
+#include "cp-abi.h"
+#include "buildsym.h"
#include "hashtab.h"
-
#include "gdb_obstack.h"
-
#include <sys/types.h>
#include <fcntl.h>
#include "gdb_string.h"
#include "gdb_stat.h"
#include <ctype.h>
-#include "cp-abi.h"
+#include "gdb_assert.h"
/* Prototypes for local functions */
@@ -1572,30 +1572,23 @@ find_main_psymtab (void)
return (NULL);
}
-/* Search BLOCK for symbol NAME in NAMESPACE.
-
- Note that if NAME is the demangled form of a C++ symbol, we will fail
- to find a match during the binary search of the non-encoded names, but
- for now we don't worry about the slight inefficiency of looking for
- a match we'll never find, since it will go pretty quick. Once the
- binary search terminates, we drop through and do a straight linear
- search on the symbols. Each symbol which is marked as being a C++
- symbol (language_cplus set) has both the encoded and non-encoded names
- tested for a match.
-
- If MANGLED_NAME is non-NULL, verify that any symbol we find has this
- particular mangled name.
-*/
+/* Search BLOCK for symbol NAME in NAMESPACE. If MANGLED_NAME is non-NULL,
+ verify that any symbol we find has this particular mangled name. */
struct symbol *
lookup_block_symbol (register const struct block *block, const char *name,
const char *mangled_name,
const namespace_enum namespace)
{
- register int bot, top, inc;
- register struct symbol *sym;
- register struct symbol *sym_found = NULL;
- register int do_linear_search = 1;
+ int bot, top;
+ struct symbol *sym;
+ struct symbol *sym_found = NULL;
+
+ if (BLOCK_SHOULD_SORT (block))
+ {
+ hash_block (block);
+ gdb_assert (BLOCK_HASHTABLE (block));
+ }
if (BLOCK_HASHTABLE (block))
{
@@ -1613,88 +1606,8 @@ lookup_block_symbol (register const stru
return NULL;
}
- /* If the blocks's symbols were sorted, start with a binary search. */
-
- if (BLOCK_SHOULD_SORT (block))
- {
- /* Reset the linear search flag so if the binary search fails, we
- won't do the linear search once unless we find some reason to
- do so */
-
- do_linear_search = 0;
- top = BLOCK_NSYMS (block);
- bot = 0;
-
- /* Advance BOT to not far before the first symbol whose name is NAME. */
-
- while (1)
- {
- inc = (top - bot + 1);
- /* No need to keep binary searching for the last few bits worth. */
- if (inc < 4)
- {
- break;
- }
- inc = (inc >> 1) + bot;
- sym = BLOCK_SYM (block, inc);
- if (!do_linear_search && (SYMBOL_LANGUAGE (sym) == language_java))
- {
- do_linear_search = 1;
- }
- if (SYMBOL_SOURCE_NAME (sym)[0] < name[0])
- {
- bot = inc;
- }
- else if (SYMBOL_SOURCE_NAME (sym)[0] > name[0])
- {
- top = inc;
- }
- else if (strcmp (SYMBOL_SOURCE_NAME (sym), name) < 0)
- {
- bot = inc;
- }
- else
- {
- top = inc;
- }
- }
-
- /* Now scan forward until we run out of symbols, find one whose
- name is greater than NAME, or find one we want. If there is
- more than one symbol with the right name and namespace, we
- return the first one; I believe it is now impossible for us
- to encounter two symbols with the same name and namespace
- here, because blocks containing argument symbols are no
- longer sorted. The exception is for C++, where multiple functions
- (cloned constructors / destructors, in particular) can have
- the same demangled name. So if we have a particular
- mangled name to match, try to do so. */
-
- top = BLOCK_NSYMS (block);
- while (bot < top)
- {
- sym = BLOCK_SYM (block, bot);
- if (SYMBOL_NAMESPACE (sym) == namespace
- && (mangled_name
- ? strcmp (SYMBOL_NAME (sym), mangled_name) == 0
- : SYMBOL_MATCHES_NAME (sym, name)))
- {
- return sym;
- }
- if (SYMBOL_SOURCE_NAME (sym)[0] > name[0])
- {
- break;
- }
- bot++;
- }
- }
-
- /* Here if block isn't sorted, or we fail to find a match during the
- binary search above. If during the binary search above, we find a
- symbol which is a Java symbol, then we have re-enabled the linear
- search flag which was reset when starting the binary search.
-
- This loop is equivalent to the loop above, but hacked greatly for speed.
+ /* If a block isn't hashable, we do a linear search here. Right now
+ this means only for blocks associated with a function.
Note that parameter symbols do not always show up last in the
list; this loop makes sure to take anything else other than
@@ -1702,55 +1615,53 @@ lookup_block_symbol (register const stru
last resort. Note that this only takes up extra computation
time on a match. */
- if (do_linear_search)
- {
- top = BLOCK_NSYMS (block);
- bot = 0;
- while (bot < top)
+ top = BLOCK_NSYMS (block);
+ bot = 0;
+ while (bot < top)
+ {
+ sym = BLOCK_SYM (block, bot);
+ if (SYMBOL_NAMESPACE (sym) == namespace
+ && (mangled_name
+ ? strcmp (SYMBOL_NAME (sym), mangled_name) == 0
+ : SYMBOL_MATCHES_NAME (sym, name)))
{
- sym = BLOCK_SYM (block, bot);
- if (SYMBOL_NAMESPACE (sym) == namespace
- && (mangled_name
- ? strcmp (SYMBOL_NAME (sym), mangled_name) == 0
- : SYMBOL_MATCHES_NAME (sym, name)))
+ /* If SYM has aliases, then use any alias that is active
+ at the current PC. If no alias is active at the current
+ PC, then use the main symbol.
+
+ ?!? Is checking the current pc correct? Is this routine
+ ever called to look up a symbol from another context?
+
+ FIXME: No, it's not correct. If someone sets a
+ conditional breakpoint at an address, then the
+ breakpoint's `struct expression' should refer to the
+ `struct symbol' appropriate for the breakpoint's
+ address, which may not be the PC.
+
+ Even if it were never called from another context,
+ it's totally bizarre for lookup_symbol's behavior to
+ depend on the value of the inferior's current PC. We
+ should pass in the appropriate PC as well as the
+ block. The interface to lookup_symbol should change
+ to require the caller to provide a PC. */
+
+ if (SYMBOL_ALIASES (sym))
+ sym = find_active_alias (sym, read_pc ());
+
+ sym_found = sym;
+ if (SYMBOL_CLASS (sym) != LOC_ARG &&
+ SYMBOL_CLASS (sym) != LOC_LOCAL_ARG &&
+ SYMBOL_CLASS (sym) != LOC_REF_ARG &&
+ SYMBOL_CLASS (sym) != LOC_REGPARM &&
+ SYMBOL_CLASS (sym) != LOC_REGPARM_ADDR &&
+ SYMBOL_CLASS (sym) != LOC_BASEREG_ARG)
{
- /* If SYM has aliases, then use any alias that is active
- at the current PC. If no alias is active at the current
- PC, then use the main symbol.
-
- ?!? Is checking the current pc correct? Is this routine
- ever called to look up a symbol from another context?
-
- FIXME: No, it's not correct. If someone sets a
- conditional breakpoint at an address, then the
- breakpoint's `struct expression' should refer to the
- `struct symbol' appropriate for the breakpoint's
- address, which may not be the PC.
-
- Even if it were never called from another context,
- it's totally bizarre for lookup_symbol's behavior to
- depend on the value of the inferior's current PC. We
- should pass in the appropriate PC as well as the
- block. The interface to lookup_symbol should change
- to require the caller to provide a PC. */
-
- if (SYMBOL_ALIASES (sym))
- sym = find_active_alias (sym, read_pc ());
-
- sym_found = sym;
- if (SYMBOL_CLASS (sym) != LOC_ARG &&
- SYMBOL_CLASS (sym) != LOC_LOCAL_ARG &&
- SYMBOL_CLASS (sym) != LOC_REF_ARG &&
- SYMBOL_CLASS (sym) != LOC_REGPARM &&
- SYMBOL_CLASS (sym) != LOC_REGPARM_ADDR &&
- SYMBOL_CLASS (sym) != LOC_BASEREG_ARG)
- {
- break;
- }
+ break;
}
- bot++;
}
+ bot++;
}
+
return (sym_found); /* Will be NULL if not found. */
}