This is the mail archive of the
mailing list for the Archer project.
Re: Initial psymtab replacement results
>>>>> "Daniel" == Daniel Jacobowitz <firstname.lastname@example.org> writes:
Daniel> Did you create the index, then populate the table? Is it quicker to
Daniel> populate the table, and then create the index?
I did both; they are both slow.
Originally I didn't have an index and when I was playing with the SQLite
shell I noticed searches were slow. So, I made an index -- which was
very slow to create in the shell.
Then I thought that maybe making the index before populating the table
would be faster. I made that change to gdb, but it was still quite slow.
Another idea I have is to make a new column holding a hash code, and not
use an index; or maybe use that for the index (indexing on an integer
column may be faster).
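
A rough sketch of the hash-column idea, using Python's sqlite3 module only because it is short to write. The column name name_hash, the index name symbols_hash, and the choice of CRC-32 as the hash are all assumptions made for illustration; none of this is what gdb does. The lookup compares the integer first and then the name, so hash collisions are harmless.

```python
import sqlite3
import zlib


def name_hash(name: str) -> int:
    """Return a 32-bit hash of a symbol name (CRC-32, chosen only for illustration)."""
    return zlib.crc32(name.encode("utf-8"))


def build_hashed_table(names):
    """Create an in-memory symbols table with an integer hash column and index it."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE symbols (name TEXT, name_hash INTEGER)")
    conn.executemany(
        "INSERT INTO symbols (name, name_hash) VALUES (?, ?)",
        [(n, name_hash(n)) for n in names],
    )
    # The index is on the integer column, not on the text column.
    conn.execute("CREATE INDEX symbols_hash ON symbols (name_hash)")
    conn.commit()
    return conn


def lookup(conn, name):
    """Match on the hash first, then on the name to reject hash collisions."""
    rows = conn.execute(
        "SELECT name FROM symbols WHERE name_hash = ? AND name = ?",
        (name_hash(name), name),
    )
    return [row[0] for row in rows]
```

Whether this is actually faster than indexing the text column would have to be measured; the sketch only shows the shape of the schema and the query.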
I was experimenting just now, and removing the "CREATE INDEX" and
changing the schema to mark symbols.name as "PRIMARY KEY" made database
creation much faster -- for gdb, down from 60 seconds to 19. I still
think that is too slow though.
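
For reference, the two schema variants look roughly like this. This is a minimal sketch using Python's sqlite3 module, not the code in gdb; only the table name symbols and the column name come from the description above, and the index name symbols_name is made up. A PRIMARY KEY column must be unique, so the second variant uses INSERT OR IGNORE to drop duplicate names, which is an assumption about how duplicates would be handled.

```python
import sqlite3


def build_with_index(names):
    """Variant 1: plain table, populated first, then indexed with CREATE INDEX."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE symbols (name TEXT)")
    conn.executemany("INSERT INTO symbols (name) VALUES (?)", [(n,) for n in names])
    conn.execute("CREATE INDEX symbols_name ON symbols (name)")
    conn.commit()
    return conn


def build_with_primary_key(names):
    """Variant 2: no CREATE INDEX; symbols.name is declared PRIMARY KEY instead."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE symbols (name TEXT PRIMARY KEY)")
    # Duplicate names would violate the key, so they are silently skipped here.
    conn.executemany(
        "INSERT OR IGNORE INTO symbols (name) VALUES (?)", [(n,) for n in names]
    )
    conn.commit()
    return conn


def has_symbol(conn, name):
    """Return True if a row with this name exists."""
    row = conn.execute("SELECT 1 FROM symbols WHERE name = ?", (name,)).fetchone()
    return row is not None
```

The sketch does not reproduce the timings quoted above; it only shows the difference between the two schemas.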
Daniel> Frank made a good point about putting host characteristics in the
Daniel> cache key. By careful choice of the types stored, we should be able
Daniel> to create a mapped data structure that is in practice dependent only
Daniel> on endianness and maybe pointer size. WDYT?
Yeah, I may give that a try.
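
One way the host characteristics could be folded into a cache key, as a hedged sketch: the function names host_tag and cache_key and the key layout are hypothetical, and file_id stands for whatever identifies the object file. Only endianness and pointer size are included, per the suggestion above.

```python
import struct
import sys


def host_tag() -> str:
    """Describe the host by endianness and pointer size only."""
    pointer_bits = struct.calcsize("P") * 8  # size of a native pointer, in bits
    return f"{sys.byteorder}-{pointer_bits}"


def cache_key(file_id: str) -> str:
    """Combine a file identifier with the host tag (hypothetical key layout)."""
    return f"{file_id}.{host_tag()}"
```

Two hosts with the same endianness and pointer size would then share a cache entry, and any other host would get its own.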
Daniel> I know you've done a lot of work to kill psymtabs. Do we populate
Daniel> psymtabs from the index, or are they pretty much optional now? In
Daniel> other words, can we reclaim and reuse the memory formerly spent on
What I did was introduce a new struct of function pointers, alongside
struct sym_fns. This provides an abstraction that replaces direct uses
of partial symbols. The API "design" is completely ad hoc, based on
what previously existed. So, it is rather weird and large; e.g., it has
a special function just for Ada, because ada-lang.c directly examines
Then I moved all the uses of partial symbols into a new file, psymtab.c,
and made a new rule: only psymtab.c and the debuginfo readers are
allowed to directly manipulate these data structures.
Finally, I changed dwarf2read.c to have a separate implementation of
these functions, and to use its own indexing data structures.
dwarf2read now decides per-objfile whether to use partial symbols or the indices.
I did all this because I did not think it was possible to really create
psymbols from the DWARF indices.
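
The arrangement described above can be illustrated with a small model. In gdb this is a C struct of function pointers; the Python below is only a sketch of the same shape, and every name in it (ObjFile, QuickFunctions, choose_quick_functions, and the two tables) is hypothetical. The real API is larger and, as noted, more ad hoc.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class ObjFile:
    """Hypothetical stand-in for an objfile."""
    name: str
    symbols: List[str]
    index: Optional[Dict[str, int]] = None  # symbol name -> CU number, if present


@dataclass(frozen=True)
class QuickFunctions:
    """Table of callables, playing the role of a C struct of function pointers."""
    kind: str
    has_symbols: Callable[[ObjFile], bool]
    lookup: Callable[[ObjFile, str], bool]


def _psym_has_symbols(objfile: ObjFile) -> bool:
    return bool(objfile.symbols)


def _psym_lookup(objfile: ObjFile, name: str) -> bool:
    return name in objfile.symbols  # linear scan over the partial symbols


def _index_has_symbols(objfile: ObjFile) -> bool:
    return bool(objfile.index)


def _index_lookup(objfile: ObjFile, name: str) -> bool:
    return objfile.index is not None and name in objfile.index


PSYMTAB_FUNCTIONS = QuickFunctions("psymtab", _psym_has_symbols, _psym_lookup)
INDEX_FUNCTIONS = QuickFunctions("index", _index_has_symbols, _index_lookup)


def choose_quick_functions(objfile: ObjFile) -> QuickFunctions:
    """Decide per objfile: use the index if there is one, else partial symbols."""
    return INDEX_FUNCTIONS if objfile.index is not None else PSYMTAB_FUNCTIONS
```

Callers only ever go through the chosen table, so they do not need to know which representation sits behind it.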
This approach saves a bit of memory when using the index. I don't have
numbers handy but my recollection is that the savings isn't very large.
I have considered modifying dwarf2read to create "new-style" data
structures when the indices are not available. I haven't implemented
this yet, though, because it is more work and the payoff doesn't seem to
The new code could free some memory whenever it reads full symbols for a
CU. I haven't implemented that yet.
Finally, with "-readnow", dwarf2read no longer reads partial symbols or
the indices; it skips directly to just reading everything. I only did
this because it was easy to implement; I actually consider -readnow to
be fairly useless.
Another idea I have is to change the threaded-dwarf branch to read
psymtabs in the background thread. This isn't too terribly hard, now
that psymtabs are fully segregated.
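
The general pattern would be something like the following sketch: start the read on a worker thread and block only when the result is first needed. This is not the threaded-dwarf branch's code; read_partial_symbols and LazySymbols are hypothetical placeholders, and the reader here just returns canned data.

```python
from concurrent.futures import ThreadPoolExecutor


def read_partial_symbols(path):
    """Hypothetical stand-in for the (slow) partial symbol reader."""
    return {"file": path, "symbols": ["main"]}


class LazySymbols:
    """Start reading in the background; wait only when the result is requested."""

    def __init__(self, executor, path):
        self._future = executor.submit(read_partial_symbols, path)

    def get(self):
        # Blocks until the background read has finished, then returns its result.
        return self._future.result()
```

Usage would look like creating a ThreadPoolExecutor at startup, constructing a LazySymbols for each file, and calling get() from the first lookup that needs the symbols.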