This is the mail archive of the guile@cygnus.com mailing list for the guile project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]

Re: Hashtables in guile.


Telford Tendys <telford@triangle.triode.net.au> writes:

> > > Hash tables are fast to pull things out of and fast to put things
> > > into but slow to resize -- them's the breaks.
> > 
> > Well, yes and no.  The standard hash table implementation is like
> > that.  There are a number of other hash table algorithms; one, in
> > particular, only requires rehashing one bucket at a time as it
> > resizes.  Still as much work, but not all in one lump. 
> 
> Is there a URL that explains this algorithm, I'm curious to see
> if it might be useful to me. (a book reference would make an
> acceptable second choice :-)

Well, I saw it in a Doctor Dobbs a few years ago; sorry, that's not
very helpful, is it?  I can give you an outline, though. 

Assume we have a hasher which returns an integer and is modulo-safe.
Let n be the current number of bins, and let m be such that
2**(m-1) <= n < 2**m.  To find where an item should go,
hash the key, modulo it by 2**m, then by n.  Store it in a linked list
of buckets there (as in the standard separate chaining).  When you
need to expand the hash table from n to n+1, you only need to re-hash
the keys in bucket n+1-2**(m-1); they will end up in either the same
bucket or in bucket n+1.  Similarly for contracting the table. 

> > We'd also want some good string-hashing functions preimplemented.
> > Doing it right is hard. 
> 
> I wrote a hasher that had a lookup table of 256 constant random ints.
> Each byte in the input is used as an index into the lookup table
> and the random int that returns from the lookup is xored into
> an accumulator. The accumulator rotates by one bit after each
> xor into it. It was simple, reasonably fast, and had statistical
> characteristics virtually the same as a uniform random generator
> would. I have C++ source somewhere if anyone is interested.

That sounds good... but this is one area (like crypto) where trusting
the wizards might be better than rolling your own.  There are hash
functions designed to be fast and to de-cluster english text as much
as possible.  Oh, a perfect hash generator would be cool too. 

Andrew