This is the mail archive of the binutils@sourceware.org mailing list for the binutils project.

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]
Other format:	[Raw text]

Patricia trie symbol tables?

From: John Moser <john dot r dot moser at gmail dot com>
To: binutils at sourceware dot org
Cc: Michael Meeks <michael dot meeks at novell dot com>
Date: Sun, 25 Jun 2006 22:45:28 -0400
Subject: Patricia trie symbol tables?

I was looking at some of the work Michael Meeks was doing, and thinking
on it, reading up a lot on linker optimizations lately actually.
Anyway, based on a paper Ulrich Drepper spit out a while back, I had a
weird thought...

[1] http://people.redhat.com/drepper/dsohowto.pdf

[1] page 8 shows a couple of symbol names:

_ZN14some_namespace22some_longer_class_nameC1Ei
_ZN14some_namespace22some_longer_class_name19the_getter_functionEv

Reading what Drepper says about -Wl,-O1 and why to use it, I get the
gist of this:  You'll do comparisons against
"_ZN14some_namespace22some_longer_class_name" twice, what a waste.  The
point of -Wl,-O1 is to make the buckets smaller so this happens less
often, as a side effect of making the hash table bigger (in other words,
no guarantees; but it'll probably happen).

I had come up with a slightly different approach a few hours ago, while
reading through some of Meeks' stuff and trying to condense all of this
information into a usable form.  This one involved doing a second hash
table, containing buckets of patricia tries instead of linked lists.

A patricia (radix) trie is a trie where nodes with only one child and no
data are collapsed into their children.  At any rate the function of a
trie is to use strings as keys to look up data very quickly.  Let's give
an example radix trie contaning:

foo
foobar
junk
break
braid
foodrax
foodramon
foodragon

Radix Trie:

 |-br--eak
 |   |-ead
 |
-|-foo--bar
 |    |-dra--x
 |         |-gon
 |         |-mon
 |
 | -junk

Starting from the root of this, we compare to find 'f' and find 'foo'.
Then 'd' and find 'dra'.  Then 'g' and find 'gon'.  Right now we know
we've found 'foodragon', we can stop.  So we compared:  b foo\0 b dra\0
x gon\0

This is opposed to comparing:  foo\0 foob j b b foodrax foodram
foodragon\0

Yes, I'm sure everyone here knew what a radix trie is already.
Anyway :)

I was thinking about it, and it seemed logical to me to do something
like that.  It'd have to be a second section, i.e. .gnu.dynradix
(instead of .dynstr?); and the pointers could point to the pointer in
the real symbol table (although if we can get away with it, the dynamic
linker could just use the radix tries the whole way across the board--
but where do we get better locality in the long run?  Meeks loves
cachegrind...)

another thought I had, are hash tables even useful with these?  I guess
you could skip comparing with aAbBcC.... to find zlib_compress() by
using hash tables with buckets of radix tries as opposed to just a giant
radix trie... any takers?

At any rate, it's just a thought.  I'm not up for coding it (I only
pretend to understand this stuff) but I figured I'd put it out there to
see what comments I get.
-- 
John Moser <john.r.moser@gmail.com>

Attachment: signature.asc
Description: This is a digitally signed message part

Follow-Ups:
- Re: Patricia trie symbol tables?
  - From: Florian Weimer
- Re: Patricia trie symbol tables?
  - From: John Moser

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]