This is the mail archive of the guile@cygnus.com mailing list for the guile project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]

Re: avl-trees vs. hashes


On 20 Oct 1998, Harvey J. Stein wrote:

> Jay Glascoe <jglascoe@jay.giss.nasa.gov> writes:
> 
>  > On 20 Oct 1998, Harvey J. Stein wrote:
>  > 
>  > >    Get the smallest item.
>  > >    Get the largest item.
>  > >    Get the next item.
>  > >    Get the previous item.
>  > 
>  > I've never used trees in everyday programming.  Could you give me an
>  > example of how the above procedures might be useful to me?
> 
> Databases - maintaining ordered lists.
> 

fair enough.  I use gdbm and bsddb hashes a lot. 

> Well, except that someone who's doing something like putting lots of
> things, then deleting a lot of them, then putting new things in,
> etc. will get a lot of overhead from the continual growing & shrinking
> of the hash table.  You might want to make the shrinking part
> optional.
> 

good idea.  Right now my hash tables look like this

guile> (define foo (make-hashtab))
guile> (do ((i 1 (+ i 1))) ((> i 10)) (hashtab-set! foo i i))
guile> foo
(((number-buckets . 4) (max-bucket-size . 5) (largest-bucket . 3)
(number-entries . 10) (largest-bucket-index . 2)) . #((2 (00007 7 . 7)
(00003 3 . 3)) (3 (00010 10 . 10) (00006 6 . 6) (00002 2 . 2)) (3 (00009 9
. 9) (00005 5 . 5) (00001 1 . 1)) (2 (00008 8 . 8) (00004 4 . 4))))

I added the 'largest-bucket-index association so I only have to search
for the next largest bucket (an O(n) procedure) when the current largest
bucket decreases in size (this happens one in n times).

So, I'll just add a new association 'auto-shrink-flag, and modify
make-hashtab to accept 3 optional arguments (max-bucket-size,
number-buckets, and auto-shrink-flag).  I'll even add two new procedures:
(hashtab-enable 'auto-shrink) and (hashtab-disable 'auto-shrink) so the
user can easily turn it on/off as they please.

>  > > Also, why isn't shrinking the hashtable identical to growing it?
>  > 
>  > Garbage collection.
>  > 

that was sort of an off-the-cuff remark.  I agree, in theory shrinking a
hash table should take no longer than growing one.  But in the timing
tests I'm using, garbage collection is much slower than, uh, garbage
creation. 

> I'm still not sure exactly what you mean about shrinking & growing
> being different, but I'm sure it'll become clear when I see the code.
> 

I put a gzipped tar-ball on the web at
http://www.giss.nasa.gov/~jglascoe/hashtab-type/
containing hashtab-type.[ch], my-hasher.[ch], myguile.c and a makefile.

Please keep in mind, the code is a bit of a mess and it's almost
completely uncommented.  I'll clean it up when I'm happy with it.  It's
not my best work, but it is functional.

	Jay
	jglascoe@jay.giss.nasa.gov