This is the mail archive of the gdb@sourceware.cygnus.com mailing list for the GDB project. See the GDB home page for more information.



[srikanth@cup.hp.com: GDB questions]


OK, who wants to speak to save bcache? -s

------- Start of forwarded message -------
From: Srikanth Adayapalam <srikanth@cup.hp.com>
Subject: GDB questions
To: shebs@cygnus.com
Date: Mon, 15 Mar 1999 11:22:31 PST

Hi Stan,

	My name is Srikanth and I work for the Wildebeest project at HP.
I have been working on profiling the heap usage of GDB in response to 
several complaints from our customers about GDB's voracious appetite for
memory. In the first phase I am focusing on fixing allocation bugs and leaks,
eliminating redundancies, and tuning high-overhead data structures. The second
phase will focus on architectural improvements to speed up startup times and
improve memory usage.

	One of the things that I ran into in the high-overhead items list is
the byte cache (bcache). This is the hash table used by the symbol reader
when it is building psymtabs. Here are my observations on this hash table 
scheme :

	o it is used only during the psymtab building stage and the only 
          things we store in it are vanilla symbol names, their demangled 
          equivalents, and psymbols themselves. (It would be a digression
          here to mention that when we attempt to stick symbol names into
          the bcache, we are already duplicating strings, for the symbol
          name strings come from the VT or its equivalent and are around
          for the duration of the object file.)

        o one thing unique about this hash table is that its objective seems
          to be not to achieve O(1) access to the objects stored in it. 
          To appreciate this see that the lookup_cache() function is not 
          called by any part of GDB other than the bcache module itself (and
          cannot be called, as it is a file-static routine). Rather, the
          objective seems to be to minimize storage requirements by
          maintaining unique copies of objects.

        o thus the only client of this module i.e., the symbol reader, requests
          this module to store certain kinds of objects (char *, psymbols) 
          and is provided in return with a pointer to the location where the 
          bcache module actually stored it according to its internal
          algorithms. The client never looks up the hash table, since it has
          no need to: it already has a pointer to the whole object.
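
The store-and-return-a-pointer behaviour described in the points above can be sketched roughly as follows. This is a minimal illustration, not GDB's bcache code: the names intern_table and intern_bytes, the bucket count, and the hash function are all hypothetical choices made for the example.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define NBUCKETS 1024            /* hypothetical size, not GDB's */

/* One stored object, chained from its hash bucket. */
struct entry {
    struct entry *next;
    size_t len;
    unsigned char data[];        /* C99 flexible array member */
};

struct intern_table {
    struct entry *buckets[NBUCKETS];
};

/* FNV-1a hash, chosen only for illustration. */
static unsigned hash_bytes(const void *bytes, size_t len)
{
    const unsigned char *p = bytes;
    unsigned h = 2166136261u;
    for (size_t i = 0; i < len; i++) {
        h ^= p[i];
        h *= 16777619u;
    }
    return h % NBUCKETS;
}

/* Store LEN bytes and return a pointer to the table's single copy.
   If identical bytes are already stored, return the existing copy. */
const void *intern_bytes(struct intern_table *t, const void *bytes, size_t len)
{
    unsigned h = hash_bytes(bytes, len);

    /* Internal lookup: the only place the table is ever searched. */
    for (struct entry *e = t->buckets[h]; e != NULL; e = e->next)
        if (e->len == len && memcmp(e->data, bytes, len) == 0)
            return e->data;

    struct entry *e = malloc(sizeof *e + len);
    assert(e != NULL);
    e->next = t->buckets[h];
    e->len = len;
    memcpy(e->data, bytes, len);
    t->buckets[h] = e;
    return e->data;
}
```

A caller would zero-initialise a struct intern_table, call intern_bytes() once per name or psymbol, and keep only the returned pointer. Every byte in the bucket array and in the next/len fields is housekeeping rather than payload.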

    Ironically, this module does not minimize the memory requirements of GDB
but rather increases them tremendously.  These are some of the numbers that
illustrate this point. The memory requirements are as reported by gdb (when
run with -statistics command line option) to bring up the application and
break on main.

                          Without bcache    With bcache        Bloat (%)
                     
HP C compiler              151+ MB            291+ MB              48
A Customer Application      82+ MB            107+ MB              23
GDB                         20+ MB             24+ MB              16
HP C++ Compiler             43+ MB             51+ MB              15

	The case marked "without bcache" is actually the bcache module itself,
but one that does not bother to eliminate duplicates.
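
The Bloat figures appear to match the extra memory expressed as a truncated percentage of the "with bcache" total. That reading is inferred from the rounded MB values in the table and is not stated in the message. The helper name bloat_percent is hypothetical.

```c
#include <assert.h>

/* Extra memory as a whole-number percentage of the "with bcache"
   total, truncated toward zero. Inputs are the rounded MB figures. */
int bloat_percent(int without_mb, int with_mb)
{
    assert(with_mb > 0);
    return (with_mb - without_mb) * 100 / with_mb;
}
```

Under this assumption the four rows of the table (151/291, 82/107, 20/24, 43/51) reproduce the listed values.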

	A further observation of interest is that this module scales very
poorly: the overhead (that is, every byte not used to store GDB's data,
such as cells used for housekeeping info like pointers, hash chain heads,
etc.) is of the order of O(m * n * 64k), where m is the number of
load modules compiled with -g and n is the number of distinct string
lengths. This spells doom for applications with a large number of shared
libraries (m increases) and for C++ (n increases, since we also stick
demangled names into the cache). This explains the lower overhead in the
case of GDB and the C++ compiler, as they have only the main a.out compiled
with -g.
I think the first two benchmarks are more typical of HP's customers' code. 
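
To get a feel for the m * n * 64k estimate, the product can be evaluated directly. This sketch assumes that the 64k term counts table slots and that each slot is pointer-sized; the message states neither, and the function name and sample inputs are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define SLOTS_PER_TABLE 65536ULL   /* the "64k" term from the message */

/* Housekeeping bytes for m load modules built with -g and n distinct
   lengths, assuming one SLOTS_PER_TABLE-slot table per (module, length)
   pair and slot_size bytes per slot. */
unsigned long long bcache_overhead_bytes(unsigned long long m,
                                         unsigned long long n,
                                         size_t slot_size)
{
    assert(slot_size > 0);
    return m * n * SLOTS_PER_TABLE * slot_size;
}
```

With 4-byte slots, one module and one length cost 256 KB of table space. Ten modules with fifty distinct lengths each would cost 131,072,000 bytes before any data is stored.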
 

	That is the story. Before we go ahead and unplug the bcache, we thought
it would be prudent to check with you guys to make sure we are not overlooking
anything here. Is there any compelling reason we should avoid duplicates in
psymbols?

	While we are on the topic, would it be possible to provide any
details you have on PR 2207 ? I saw some reference to this in GDB sources
and would like to find out more.

Thanks for your time.
Srikanth
 
------- End of forwarded message -------