This is the mail archive of the guile@cygnus.com mailing list for the guile project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]

Re: Scheme style auto-resizing hashtable


Jay Glascoe <jglascoe@jay.giss.nasa.gov> writes:

 > I've been working on a Guile extension; auto-resizing hashtables.  The
 > extension is basically a bunch of functions which operate on a pair
 > (cons header
 >       vector)
 > 
 > where header is an alist
 > 
 > (list (cons 'number-buckets 4)   ;; 4 is default initial size
 >       (cons 'max-bucket-size 5)  ;; 5 seems to work best
 >       (cons 'largest-bucket 0)
 >       (cons 'number-entries 0))
 > 
 > and vector is, well, a vector of "buckets". Each bucket looks like
 > 
 > (N (hash1 key1 . value1) (hash2 key2 . value2) ... (hashN keyN . valueN))

<snip>

That looks exactly right.

 > Anyway, I re-timed the infamous python dictionaries (as ported to Guile),
 > Guile's hash tables, and these things I made, "hashtabs".  The result is
 > that the hash tables and the hashtabs are equally fast (NOTE: this time
 > around I disabled debugging, and chose a better initial size for the hash
 > tables: 529 for 3000 entries).  The python dictionary type is only about
 > 5% faster than the others. 

What about with a bigger hash table size, such as 2999 or 1499 or
1091?  As far as I recall, TCL's (& thus STk's) hash tables resize
when a 4th element is inserted into a bucket.

It'd also be interesting to run your speed test in STk.

 > The advantage (well, some might call it a disadvantage) that the hash
 > tables and hashtabs have over the python dictionaries is that the
 > dictionaries are "opaque".  You can't see, or screw up, a dictionary
 > because its representation is completely internal.  Guile's hash tables,
 > on the other hand, are just vectors of alists.  And the resizable hashtabs
 > are pairs of alists and vectors.  In fact, except for the hashing
 > functions, both the hash tables and the hashtabs could easily be rewritten
 > in Scheme, with only a modest speed penalty (umm, 100% slower?).

You implemented your hashtabs in C?

You can make them opaque if you want.  Have (make-hash-table) return a
closure with the data built in.  Then do things like:

   (let ((h (make-hash-table)))
      (h 'put key value)
      (h 'get key value)
      ...)

But I'd say you're better off leaving them open.

-- 
Harvey J. Stein
BFM Financial Research
hjstein@bfr.co.il