This is the mail archive of the gdb-patches@sourceware.cygnus.com mailing list for the GDB project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]

Horrible GDB symbol table performance


And i'll put money these are either java, or C++ programs, right?

I had a theory a while ago most of that most of the performance problem in
lookup_partial_symbol is because we can't do the binary search, or force a
linear lookup, a lot of the time.

So, i grabbed the first small-medium size C++ program i could find, (First
thing google found on "large C++ progrram" where i could easily download
the source was a social security administration application that estimates
benefits.), to see if there was any chance i was right, and turned on
profiling in gdb to make sure my application could get close to yours in
the various percentages.

Oh, i also added a few counters to gdb in lookup_partial_symbol, and a
quick function to print them out, like so:
[root@dan anypiab]# ../../../gdb/gdb ./anypiab
GNU gdb 5.0
Copyright 2000 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you
are
welcome to change it and/or distribute copies of it under certain
conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for
details.
This GDB was configured as "i686-pc-linux"...info s
(gdb) info sy
symbol      symbolstat
(gdb) info symbolstat
Total searches:92
Linear searches forced:84
On average, linear search goes through 21.000000 symbols to find a match.
Searches we couldn't binary search:0
(gdb)

Searches we couldn't binary search means the global flag is 0, so we
couldn't binary search it.
forced linear searches are just that. It means it's really doing a linear
search, not just that the flag is set. This includes the times we had to
linear search because of the global flag being 0.
Obviously, the average is how many symbols the linear search went
through (inside the for loop), divided by the number of linear searches we
did.
Hmm, already not a particularly good start, but let's do a bunch of stuff
(print out variables, mainly, since i assume this is what most people do
in debugging), and see what happens.
(gdb) info symbolstat
Total searches:4778
Linear searches forced:4518
On average, linear search goes through 155.000000 symbols to find a match.
Searches we couldn't binary search:2343
Since i didn't have all the SSA datafiles it needed, i had to stop after
printing out almost all of the symbols in the first 10 lines (bout 10
symbols, all of them C++ classes), stepping
through calls,  etc.

I'd consider it a short debug session.
The stats say to me, that we can't linear search because of the global
flag about 50% of the time. 47% of the rest of the time, we get forced
into a linear search, and search an average of 155 symbols.
So we are only binary searching 3% of the time.
Which is bad for performance, obviously.
But hey, could just be a fluke, so let's keep going.
Seeing that i needed some more data, i chose another medium size program,
DDD.
~135k lines.

To make a long story short, after 15 minutes of debugging, and tracking
down a real crash in DDD, here are the stats:

Total searches: 858210
Linear searches forced: 825302
On average, linear search goes through 2036.000000 symbols to find a 
match.
Searches we couldn't binary search: 394732

So, we are being forced into linear search, 96% of the time, with ~40-45%
of those from the global flag being 0.
So 96% of the time, we end up searching through an average of 2036
symbols.
That's really bad.
And gprof confirms JT's results, i get 68% of my time spend in
lookup_partial_symbol.

So why doesn't anyone else see it?
Well, we mostly debug gdb.
which never will force a linear search when the global flag isn't set,
since it's written in C (if your language is C++, it'll get forced into a
linear search every time it can't find the symbol through a binary search)
Same with gcc.
So you get:
(After a quick session on gcc)
(gdb) info symbolstat
Total searches:54100
Linear searches forced:49090
On average, linear search goes through 23.000000 symbols to find a match.
Searches we couldn't binary search:49090

So while we aren't binary searching about 90% of the time, we aren't
looking at that many symbols when we aren't, so we don't get hurt.

What's my point?
Well, basically, that the binary search being done isn't helping, because
its not used, or we ignore the results, when debugging C++ programs.

Just for kicks, at the expense of a ton of memory, and mainly because i
could do it really quickly (using the libiberty hash table), i made it do
a hash table lookup on the same data it's doing a linear search through
now, and as expected, it went a bunch faster (Like,
lookup_partial_symbol dropped to an absurdly low percentage of the time).
Doing this properly would take more than the hundred lines or so it took
for my hack (since i didn't bother to change the structure of the partial
symbol tables, i just had it add it to my hash table as well), but it
would certainly help. For those wondering, i did handle the namespace
issue, just like it does now, so it is a valid comparison).

Also interesting to note is that printing some of these variables, and
members, was performing >800 partial symbol lookups a pop.

That took me completely by surprise.
all i did was print "argle.bargle", not something 50 levels deep.

THen again, i could be completely off, it's late, and i could just not be
thinking clearly.
--Dan





Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]