This is the mail archive of the
gdb-patches@sources.redhat.com
mailing list for the GDB project.
Re: [RFA] Kill some linear searches in minsyms.c
- From: Elena Zannoni <ezannoni at redhat dot com>
- To: Daniel Jacobowitz <drow at mvista dot com>
- Cc: gdb-patches at sources dot redhat dot com, jimb at redhat dot com, ezannoni at redhat dot com
- Date: Mon, 6 Jan 2003 14:15:34 -0500
- Subject: Re: [RFA] Kill some linear searches in minsyms.c
- References: <20030106042203.GA28848@nevyn.them.org>
Daniel Jacobowitz writes:
> So uh, yeah, and stuff. I found an interesting one today when I wasn't even
> looking for it. Earlier today I posted an updated patch to make it easy,
> very easy, to profile GDB. Here's a pretty good sample of why I think this
> is important. I was actually looking for something completely different,
> but I noticed this on the way.
>
> Consider this patch:
>
> 2000-03-06 Jim Blandy <jimb@redhat.com>
>
> From Tom Tromey <tromey@cygnus.com> and Keith Seitz <?>:
>
> * minsyms.c: #include <ctype.h>, for msymbol_hash_iw.
> (compact_minimal_symbols): Added `objfile' argument.
> Put symbols in the objfile's hash table.
> (install_minimal_symbols): Put symbols in the objfile's demangled
> hash table.
> (lookup_minimal_symbol): Use hash table to find symbol in
> objfile.
> (msymbol_hash_iw, msymbol_hash, add_minsym_to_hash_table): New
> functions.
> (prim_record_minimal_symbol_and_info): Initialize the
> hash link fields of the new minimal symbol.
> * symtab.h (struct minimal_symbol): New fields `hash_next',
> `demangled_hash_next'.
> (msymbol_hash_iw, msymbol_hash, add_minsym_to_hash_table): Declare.
> * objfiles.h (MINIMAL_SYMBOL_HASH_SIZE): New define.
> (struct objfile): New fields `msymbol_hash',
> `msymbol_demangled_hash'.
>
> These replaced a linear search with a hash table. We'd better use it, too,
> since a good-sized application has an awful lot of minimal symbols.
> However, two other related functions (that's lookup_minimal_symbol_text and
> lookup_minimal_symbol_solib_trampoline) were not converted. These get
> called reasonably often, vis:
>
What command gdb executing at this point?
> 0.75 0.43 62/310 create_overlay_event_breakpoint [16]
> 3.01 1.72 248/310 create_longjmp_breakpoint [5]
> [4] 48.1 3.76 2.16 310 lookup_minimal_symbol_text [4]
>
> That "3.76" is in _seconds_, by the way. Then they proceed to do:
> 1.26 0.00 9567591/9567734 strcmp_iw [15]
> 0.90 0.00 27551335/30742029 symbol_demangled_name [17]
>
> 27.5 million calls to a wrapper function. My first instinct was that a
> wrapper function was a bad idea, then. My second instinct was to smack my
> first instinct upside the head and find out why we were calling it so many
> times, since I knew we hashed minsyms. Converting them to the hash table
> once I found the problem was quite easy. The patch is attached; is it OK to
> commit? No regressions on i386-pc-linux-gnu, for what that's worth, and
> it's quite straightforward.
>
how do the numbers for those functions look after your patch?
>
>
> Future things to examine:
>
> Before, GDB's CPU time to load symbols for all of mozilla, get through a
> large number of threaded shared library events (ow, but I'm glad the pread64
> patch is in, or this measurement would have been just impossible to sort out
> of the noise), and reach the first dialog box was 26.630000 seconds. After
> the attached patch, 17.820000 seconds. Much better. The new hot spots are:
>
> 21.32 1.42 1.42 1594586 0.00 0.00 hash
> 10.66 2.13 0.71 185486 0.00 0.00 find_corresponding_bincl_psymtab
> 8.11 2.67 0.54 1128766 0.00 0.00 bcache
> 6.76 3.12 0.45 41 10.98 91.28 read_dbx_symtab
> 6.31 3.54 0.42 1519 0.28 0.30 end_psymtab
>
> I suspect we can cut find_corresponding_bincl_psymtab out of the way too;
> and maybe the bcache needs some rethinking into a more specialized data
> structure since we're calling it so often; but maybe there's nothing that
> can be done about that (this test has a terrifyingly large number of symbols
> in it). There's 3.3 million calls to mmalloc, too - a million of them are
> from a call to save_inferior_ptid in thread_db_xfer_memory. And
> symbol_demangled_name is still on the top ten, via compare_psymbols. That's
> another good candidate for a clever data structure if it becomes worth the
> time, which it isn't in this test.
>
> --
> Daniel Jacobowitz
> MontaVista Software Debian GNU/Linux Developer
>
> 2003-01-05 Daniel Jacobowitz <drow@mvista.com>
>
> * minsyms.c (enum ms_search_which): New.
> (lookup_minimal_symbol_internal): New function, broken out from
> lookup_minimal_symbol.
> (lookup_minimal_symbol, lookup_minimal_symbol_text)
> (lookup_minimal_symbol_solib_trampoline): Use them.
^^^^
lookup_minimal_symbol_internal???
I don't like the _internal part, could you call it '_aux' instead?
This seems more in line with the rest of gdb's names. Hmm, but maybe
we already have a function by that name. I also winder how much work
would it be to eventually remove these new functions, that now become
wrappers.
>
> Index: minsyms.c
> ===================================================================
> RCS file: /cvs/src/src/gdb/minsyms.c,v
> retrieving revision 1.22
> diff -u -p -r1.22 minsyms.c
> --- minsyms.c 11 Jul 2002 20:46:19 -0000 1.22
> +++ minsyms.c 6 Jan 2003 04:19:16 -0000
> @@ -1,5 +1,6 @@
> /* GDB routines for manipulating the minimal symbol tables.
> - Copyright 1992, 1993, 1994, 1995, 1996, 1997, 1998, 1999, 2000, 2001
> + Copyright 1992, 1993, 1994, 1995, 1996, 1997, 1998, 1999, 2000, 2001,
> + 2002, 2003
> Free Software Foundation, Inc.
> Contributed by Cygnus Support, using pieces from other GDB modules.
>
> @@ -132,11 +133,18 @@ add_minsym_to_demangled_hash_table (stru
> }
> }
>
> +enum ms_search_which {
> + ms_search_all,
> + ms_search_text,
> + ms_search_trampoline
> +};
>
> /* Look through all the current minimal symbol tables and find the
> first minimal symbol that matches NAME. If OBJF is non-NULL, limit
> - the search to that objfile. If SFILE is non-NULL, limit the search
> - to that source file. Returns a pointer to the minimal symbol that
> + the search to that objfile. If SFILE is non-NULL, only find file-scope
> + symbols from that source file (global symbols are still preferred).
> + WHICH indicates which minimal symbols to accept: all, text symbols only,
> + or trampolines only. Returns a pointer to the minimal symbol that
> matches, or NULL if no match is found.
>
> Note: One instance where there may be duplicate minimal symbols with
> @@ -144,9 +152,10 @@ add_minsym_to_demangled_hash_table (stru
> symbol tables for an executable contain global symbols with the same
> names (the dynamic linker deals with the duplication). */
>
> -struct minimal_symbol *
> -lookup_minimal_symbol (register const char *name, const char *sfile,
> - struct objfile *objf)
> +static struct minimal_symbol *
> +lookup_minimal_symbol_internal (register const char *name, const char *sfile,
> + struct objfile *objf,
> + enum ms_search_which which)
> {
> struct objfile *objfile;
> struct minimal_symbol *msymbol;
> @@ -170,68 +179,77 @@ lookup_minimal_symbol (register const ch
> objfile != NULL && found_symbol == NULL;
> objfile = objfile->next)
> {
> - if (objf == NULL || objf == objfile)
> + int pass;
> + if (objf && objf != objfile)
> + continue;
> +
> + /* Do two passes: the first over the ordinary hash table,
> + and the second over the demangled hash table. */
> + for (pass = 1; pass <= 2 && found_symbol == NULL; pass++)
> {
> - /* Do two passes: the first over the ordinary hash table,
> - and the second over the demangled hash table. */
> - int pass;
> -
> - for (pass = 1; pass <= 2 && found_symbol == NULL; pass++)
> + /* Select hash list according to pass. */
> + if (pass == 1)
> + msymbol = objfile->msymbol_hash[hash];
> + else
> + msymbol = objfile->msymbol_demangled_hash[dem_hash];
> +
> + for (; msymbol != NULL && found_symbol == NULL;
> + msymbol = (pass == 1)
> + ? msymbol->hash_next
> + : msymbol->demangled_hash_next)
please, don't do this. This is much worse than the previous while, with the
updates at the end.
> {
> - /* Select hash list according to pass. */
> - if (pass == 1)
> - msymbol = objfile->msymbol_hash[hash];
> - else
> - msymbol = objfile->msymbol_demangled_hash[dem_hash];
> + if (!SYMBOL_MATCHES_NAME (msymbol, name))
> + continue;
>
> - while (msymbol != NULL && found_symbol == NULL)
> + if (which == ms_search_trampoline)
> {
> - if (SYMBOL_MATCHES_NAME (msymbol, name))
> - {
> - switch (MSYMBOL_TYPE (msymbol))
> - {
> - case mst_file_text:
> - case mst_file_data:
> - case mst_file_bss:
> + if (MSYMBOL_TYPE (msymbol) == mst_solib_trampoline)
> + return msymbol;
> + continue;
> + }
> +
> + if (which == ms_search_text
> + && MSYMBOL_TYPE (msymbol) != mst_file_text
> + && MSYMBOL_TYPE (msymbol) != mst_text)
> + continue;
> +
I don't like the fact that the check for ms_search_all is buried
inside the switch statement, while the other 2 are here. Could you
have an outer switch for the ms_search_*? Also with your changes will
the internal switch case mst_solib_trampoline ever be reached? It
seems like we would find it earlier and return right away.
The whole flow of control of this function seems a bit confused.
> + switch (MSYMBOL_TYPE (msymbol))
> + {
> + case mst_file_data:
> + case mst_file_bss:
> + case mst_file_text:
> #ifdef SOFUN_ADDRESS_MAYBE_MISSING
> - if (sfile == NULL || STREQ (msymbol->filename, sfile))
> - found_file_symbol = msymbol;
> + if (sfile == NULL || STREQ (msymbol->filename, sfile))
> + found_file_symbol = msymbol;
> #else
> - /* We have neither the ability nor the need to
> - deal with the SFILE parameter. If we find
> - more than one symbol, just return the latest
> - one (the user can't expect useful behavior in
> - that case). */
> - found_file_symbol = msymbol;
> + /* We have neither the ability nor the need to deal
> + with the SFILE parameter. If we find more than
> + one symbol, just return the latest one (the user
> + can't expect useful behavior in that case). */
> + found_file_symbol = msymbol;
> #endif
> - break;
> + break;
>
> - case mst_solib_trampoline:
> + case mst_solib_trampoline:
>
> - /* If a trampoline symbol is found, we prefer to
> - keep looking for the *real* symbol. If the
> - actual symbol is not found, then we'll use the
> - trampoline entry. */
> - if (trampoline_symbol == NULL)
> - trampoline_symbol = msymbol;
> - break;
> -
> - case mst_unknown:
> - default:
> - found_symbol = msymbol;
> - break;
> - }
> - }
> -
> - /* Find the next symbol on the hash chain. */
> - if (pass == 1)
> - msymbol = msymbol->hash_next;
> - else
> - msymbol = msymbol->demangled_hash_next;
> + /* If a trampoline symbol is found, we prefer to
> + keep looking for the *real* symbol. If the actual
> + symbol is not found, then we'll use the
> + trampoline entry. */
> + if (trampoline_symbol == NULL)
> + trampoline_symbol = msymbol;
> + break;
> +
> + case mst_unknown:
> + default:
> + if (which == ms_search_all)
> + found_symbol = msymbol;
> + break;
> }
> }
> }
> }
> +
> /* External symbols are best. */
> if (found_symbol)
> return found_symbol;
> @@ -247,127 +265,28 @@ lookup_minimal_symbol (register const ch
> return NULL;
> }
>
> -/* Look through all the current minimal symbol tables and find the
> - first minimal symbol that matches NAME and of text type.
> - If OBJF is non-NULL, limit
> - the search to that objfile. If SFILE is non-NULL, limit the search
> - to that source file. Returns a pointer to the minimal symbol that
> - matches, or NULL if no match is found.
> - */
Please, don't deleted the comments, they are still relevant, in a sense.
I mean, the function still searches for the NAME symbol of text type.
Do we have a comment for this function?
> +struct minimal_symbol *
> +lookup_minimal_symbol (register const char *name, const char *sfile,
> + struct objfile *objf)
> +{
> + return lookup_minimal_symbol_internal (name, sfile, objf, ms_search_all);
> +}
>
> struct minimal_symbol *
> lookup_minimal_symbol_text (register const char *name, const char *sfile,
> struct objfile *objf)
> {
> - struct objfile *objfile;
> - struct minimal_symbol *msymbol;
> - struct minimal_symbol *found_symbol = NULL;
> - struct minimal_symbol *found_file_symbol = NULL;
> -
> -#ifdef SOFUN_ADDRESS_MAYBE_MISSING
> - if (sfile != NULL)
> - {
> - char *p = strrchr (sfile, '/');
> - if (p != NULL)
> - sfile = p + 1;
> - }
> -#endif
> -
> - for (objfile = object_files;
> - objfile != NULL && found_symbol == NULL;
> - objfile = objfile->next)
> - {
> - if (objf == NULL || objf == objfile)
> - {
> - for (msymbol = objfile->msymbols;
> - msymbol != NULL && SYMBOL_NAME (msymbol) != NULL &&
> - found_symbol == NULL;
> - msymbol++)
> - {
> - if (SYMBOL_MATCHES_NAME (msymbol, name) &&
> - (MSYMBOL_TYPE (msymbol) == mst_text ||
> - MSYMBOL_TYPE (msymbol) == mst_file_text))
> - {
> - switch (MSYMBOL_TYPE (msymbol))
> - {
> - case mst_file_text:
> -#ifdef SOFUN_ADDRESS_MAYBE_MISSING
> - if (sfile == NULL || STREQ (msymbol->filename, sfile))
> - found_file_symbol = msymbol;
> -#else
> - /* We have neither the ability nor the need to
> - deal with the SFILE parameter. If we find
> - more than one symbol, just return the latest
> - one (the user can't expect useful behavior in
> - that case). */
> - found_file_symbol = msymbol;
> -#endif
> - break;
> - default:
> - found_symbol = msymbol;
> - break;
> - }
> - }
> - }
> - }
> - }
> - /* External symbols are best. */
> - if (found_symbol)
> - return found_symbol;
> -
> - /* File-local symbols are next best. */
> - if (found_file_symbol)
> - return found_file_symbol;
> -
> - return NULL;
> + return lookup_minimal_symbol_internal (name, sfile, objf, ms_search_text);
> }
>
> -/* Look through all the current minimal symbol tables and find the
> - first minimal symbol that matches NAME and of solib trampoline type.
> - If OBJF is non-NULL, limit
> - the search to that objfile. If SFILE is non-NULL, limit the search
> - to that source file. Returns a pointer to the minimal symbol that
> - matches, or NULL if no match is found.
> - */
> -
Same here, leave the comment (updated of course).
> struct minimal_symbol *
> lookup_minimal_symbol_solib_trampoline (register const char *name,
> - const char *sfile, struct objfile *objf)
> + const char *sfile,
> + struct objfile *objf)
> {
> - struct objfile *objfile;
> - struct minimal_symbol *msymbol;
> - struct minimal_symbol *found_symbol = NULL;
> -
> -#ifdef SOFUN_ADDRESS_MAYBE_MISSING
> - if (sfile != NULL)
> - {
> - char *p = strrchr (sfile, '/');
> - if (p != NULL)
> - sfile = p + 1;
> - }
> -#endif
> -
> - for (objfile = object_files;
> - objfile != NULL && found_symbol == NULL;
> - objfile = objfile->next)
> - {
> - if (objf == NULL || objf == objfile)
> - {
> - for (msymbol = objfile->msymbols;
> - msymbol != NULL && SYMBOL_NAME (msymbol) != NULL &&
> - found_symbol == NULL;
> - msymbol++)
> - {
> - if (SYMBOL_MATCHES_NAME (msymbol, name) &&
> - MSYMBOL_TYPE (msymbol) == mst_solib_trampoline)
> - return msymbol;
> - }
> - }
> - }
> -
> - return NULL;
> + return lookup_minimal_symbol_internal (name, sfile, objf,
> + ms_search_trampoline);
> }
> -
>
> /* Search through the minimal symbol table for each objfile and find
> the symbol whose address is the largest address that is still less