
Re: avl-trees vs. hashes


>  > > What the tree based stuff does that hash tables don't do is ordering.
>  > > You need a less-than operator as well as an equal operator & then you
>  > > can also do the following operations in O(log n):
>  > > 
>  > >    Get the smallest item.
>  > >    Get the largest item.
>  > >    Get the next item.
>  > >    Get the previous item.
>  > > 
>  > 
>  > I've never used trees in everyday programming.  Could you give me an
>  > example of how the above procedures might be useful to me?
> 
> Databases - maintaining ordered lists.
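
For instance, neighbour queries come cheap once things are ordered.  A
rough sketch (a sorted association list stands in for the tree here --
a real AVL tree gives the same answers in O(log n) rather than the O(n)
a flat list needs):

    ;; Entries kept sorted by key, as an in-order walk of a tree would be.
    (define entries '((1 . "a") (3 . "b") (7 . "c") (9 . "d")))

    (define (smallest lst) (car lst))
    (define (largest lst)  (car (last-pair lst)))

    ;; First entry whose key is strictly greater than KEY, or #f.
    (define (next-after key lst)
      (cond ((null? lst) #f)
            ((> (caar lst) key) (car lst))
            (else (next-after key (cdr lst)))))

    (smallest entries)        ; => (1 . "a")
    (next-after 3 entries)    ; => (7 . "c")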

If you want to use floating-point numbers as keys, you can't use hash
tables, because a hash table can only look up an exact match on a key.
Although an exact match on a floating-point number is possible, in
practice (and in Scheme vernacular) floating-point numbers are inexact.
You need to ask for all entries within plus or minus some epsilon of the
key -- and getting that sort of information out of a hash table requires
a linear search.
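
Roughly, using Guile's built-in hash procedures (just an illustration,
nothing to do with the C code under discussion), with a plain linear
scan standing in for the epsilon query:

    (define table (make-hash-table 31))
    (hash-set! table 0.3 'found)

    (hash-ref table 0.3)           ; => found
    (hash-ref table (+ 0.1 0.2))   ; => #f -- not exactly the same key

    ;; The range query has to walk the entries one by one: collect
    ;; every (key . value) pair with |key - x| <= eps.
    (define (lookup-near entries x eps)
      (cond ((null? entries) '())
            ((<= (abs (- (caar entries) x)) eps)
             (cons (car entries) (lookup-near (cdr entries) x eps)))
            (else (lookup-near (cdr entries) x eps))))

    (lookup-near '((0.1 . a) (0.3 . b) (0.7 . c)) 0.3 1e-9)
    ;; => ((0.3 . b))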

People said earlier that hash tables are faster for insertion and
deletion, but that is only true if you have chosen the size of the table
to suit the number of keys; if you choose the size badly, hash table
performance drops off. If you use auto-resize on hash tables then your
insertion is no longer O(1) but tends towards O(log N). This probably
doesn't matter in most cases, because insertion speed is not as
important as the speed of a read-only lookup: most applications do far
more reads than inserts or deletes.
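
To put the resize cost in concrete terms: when the load passes some
threshold the table has to rebuild its bucket vector and rehash
everything it already holds.  A minimal sketch of that rebuild step (not
the actual C code, just the shape of it, with the buckets as a vector of
association lists and Guile's hash procedure picking the index):

    ;; Double the bucket vector.  The interesting part is that every
    ;; existing entry gets hashed again into the new, larger vector --
    ;; that rehash is the cost a correctly pre-sized table never pays.
    (define (grow buckets)
      (let ((new (make-vector (* 2 (vector-length buckets)) '())))
        (do ((i 0 (+ i 1)))
            ((= i (vector-length buckets)) new)
          (for-each
           (lambda (entry)
             (let ((j (hash (car entry) (vector-length new))))
               (vector-set! new j (cons entry (vector-ref new j)))))
           (vector-ref buckets i)))))

    (grow (vector '(("a" . 1) ("b" . 2)) '()))
    ;; => a four-bucket vector holding the same two entries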

>  > > And although it'd be nice to have the functionality of
>  > > shrinking/reforming the hashtable, I don't expect it to get much use.
>  > > In particular, I wouldn't automatically call it in the delete
>  > > function.
>  > 
>  > I call it when the largest bucket size is less than one quarter the
>  > maximum bucket size.  The vector size is then halved.  This seems
>  > to work well.
> 
> Well, except that someone who's doing something like putting lots of
> things, then deleting a lot of them, then putting new things in,
> etc. will get a lot of overhead from the continual growing & shrinking
> of the hash table.  You might want to make the shrinking part
> optional.

It should all be optional, but still: if you don't know how many keys
you expect to handle, then you will be better off with the auto-resize.
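
Roughly what I have in mind for keeping the shrinking optional (a
sketch of the decision only; the names are made up for the example):

    ;; Halve the bucket vector only when shrinking is enabled and the
    ;; longest chain has dropped below a quarter of the chain length
    ;; that triggers a grow.
    (define (want-shrink? shrink-enabled longest-chain chain-limit)
      (and shrink-enabled
           (< longest-chain (quotient chain-limit 4))))

    (want-shrink? #t 2 16)   ; => #t, halve the vector
    (want-shrink? #f 2 16)   ; => #f, shrinking switched off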

> I keep forgetting the code's in C.  What'd be interesting would be to
> see if the scheme version compiled by hobbit is as fast as your C
> version.  Probably some business of appropriately flagging it as in
> use is needed.

I've been paranoidly setting every unused value to SCM_BOOL_F to give
the garbage collector an easy time. I haven't tried to measure how much
of a penalty this is, or to check in how many places it really matters.
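
In Scheme terms the habit amounts to something like this (an
illustration, not the C code itself):

    ;; Overwrite a slot after emptying it, so the vector no longer
    ;; keeps the old value alive for the garbage collector.
    (define (take! vec i)
      (let ((value (vector-ref vec i)))
        (vector-set! vec i #f)   ; the C code uses SCM_BOOL_F here
        value))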

> I'm still not sure exactly what you mean about shrinking & growing
> being different, but I'm sure it'll become clear when I see the code.
> 
> When you say "insert", I assume that's using the stored hash values
> as opposed to using a hashtable-put! function...

Hmmm, I usually think of ``insert'' as putting in a new key/value pair:
if the key already exists, replace its value; if not, add a new entry.
The result is that doing lots of inserts causes the table to fill up.
Is this an unusual use of the word?

If I just want to use an existing key then I call it ``find'' or ``lookup''.
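
Guile's built-in hash procedures already make the same split --
hash-set! inserts or replaces, hash-ref only looks up:

    (define t (make-hash-table 31))

    (hash-set! t "x" 1)   ; insert: new key
    (hash-set! t "x" 2)   ; insert again: replaces the old value
    (hash-ref  t "x")     ; lookup => 2
    (hash-ref  t "y")     ; lookup => #f, table left unchanged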

	- Tel