This is the mail archive of the guile@cygnus.com mailing list for the guile project.
On Fri, 23 Oct 1998, Maciej Stachowiak wrote:

> This isn't directly on-topic, but I was going to suggest earlier that
> you change the header alist to a vector despite your readability
> concerns; it should be trivial to write a hashtab-header-alist or
> hashtab-properties in Scheme to translate it, and the more I think
> about it, the more convinced I am it would be a noticeable performance
> win.

yeah, I think you're right. I earlier replied that I like to do (car myhashtab) to see its stats, but I could just as easily do (hashtab-stats myhashtab) and maybe throw in some real stats like the mean/median bucket size, standard deviation, etc. (a stats pass along those lines is sketched below). I think you're right about making the header a vector. I'm currently calling scm_sloppy_assq 4 times upon insertion of a new key-value pair (5 times if the current bucket size exceeds the largest bucket size so far).

> Regarding this proposal, basically nothing can save you from a hash
> function that hashes a lot of things to the exact same value; no
> number of resizes will help. The only hope for salvation here is for
> hashing to significantly change when the size of the range changes, so
> that resizing will actually solve such collisions.

I've found that the expected maximum bucket size seems to be related to log(number-entries). I can't prove that, but I am currently modifying max-bucket-size to grow at this rate and it does seem to work well (see the resize sketch below).

> Another note: I use both eq? and equal? hashing in my code a fair bit.
> Also it would be nice if your code were able to hash by any of eq?,
> eqv? or equal? (the hashers for the first two should not be hard). At
> that point, I would suggest that a suitably cleaned up version of your
> code should replace Guile hash tables entirely; even 3% or 5%
> performance overhead is a small price to pay for auto-resizing.

well, all we have to do there is rewrite the basic procedures to accept different eq predicates, then write a bunch of wrapper functions, i.e. hashtab_setqx(key, value) would call hashtab_basic_setx(key, value, scm_eq_p, my_hasherq) (see the wrapper sketch below). Of course, the hasher functions would all be quite different.

> Another reason that maybe not considering the ultimate hash range in
> the hash function may be a lose: the obvious way to implement `hashq'
> with no mod is to cast the SCM value as an int; this will never
> collide in a word-size space but almost certainly will in many common
> hashtable sizes.

My current tests use a big file filled with unique, and very large, random numbers. But still, the bigger the hashtable, the bigger the expected largest bucket size relative to the average bucket size.

Jay
jglascoe@jay.giss.nasa.gov
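
A minimal sketch of the hashtab-stats idea from the message, in C. The struct layout and all names here are illustrative assumptions, not Guile's actual representation; it just walks the bucket vector and computes the mean and standard deviation of chain lengths:

    #include <math.h>

    struct entry { struct entry *next; /* key, value ... */ };
    struct hashtab {
      unsigned long n_buckets;
      struct entry **buckets;     /* n_buckets chains of entries */
    };

    /* mean and standard deviation of bucket sizes */
    void
    hashtab_stats (struct hashtab *h, double *mean, double *stddev)
    {
      double sum = 0.0, sumsq = 0.0;
      unsigned long i;
      for (i = 0; i < h->n_buckets; i++)
        {
          unsigned long len = 0;
          struct entry *e;
          for (e = h->buckets[i]; e != NULL; e = e->next)
            len++;
          sum += len;
          sumsq += (double) len * (double) len;
        }
      *mean = sum / (double) h->n_buckets;
      *stddev = sqrt (sumsq / (double) h->n_buckets - *mean * *mean);
    }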
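A hedged sketch of the grow-with-log(n) rule Jay mentions: the tolerated maximum bucket size tracks log2 of the entry count, and a resize fires when an insertion pushes a chain past it. The constants here are guesses for illustration, not measured values:

    #include <math.h>

    /* tolerated chain length: grows like log2(number of entries);
       the +2 floor keeps small tables from resizing constantly */
    static unsigned long
    max_bucket_size (unsigned long n_entries)
    {
      return 2 + (unsigned long) (log ((double) n_entries + 1.0) / log (2.0));
    }

    /* after an insertion leaves the target bucket with `len' entries:

         if (len > max_bucket_size (h->n_entries))
           hashtab_resize (h);    /- e.g. double n_buckets and rehash -/
    */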
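The wrapper scheme from the message, again as a sketch: one basic procedure parameterized on the equality predicate and hasher, and thin per-predicate entry points. The message gives only the call shape, so the typedefs, the table argument, and the hashtab_basic_setx signature are assumptions; scm_eq_p, scm_eqv_p, and scm_equal_p are Guile's real predicate primitives, and my_hasherq is the hypothetical pointer hasher sketched after this block:

    #include <libguile.h>

    typedef SCM (*eq_pred) (SCM a, SCM b);
    typedef unsigned long (*hasher) (SCM key, unsigned long n_buckets);

    unsigned long my_hasherq (SCM key, unsigned long n_buckets);

    /* the one basic procedure, parameterized on predicate and hasher */
    void hashtab_basic_setx (SCM tab, SCM key, SCM value,
                             eq_pred pred, hasher hash);

    /* eq?-flavored wrapper; the eqv?/equal? variants would pass
       scm_eqv_p/scm_equal_p and their own (deeper) hashers */
    void
    hashtab_setqx (SCM tab, SCM key, SCM value)
    {
      hashtab_basic_setx (tab, key, value, scm_eq_p, my_hasherq);
    }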
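Finally, Maciej's hashq point in code form, assuming SCM is a word-sized integer type as it was in Guile at the time: the cast never collides over a word-sized range, so all the collisions you actually see come from the mod. A hypothetical my_hasherq:

    unsigned long
    my_hasherq (SCM key, unsigned long n_buckets)
    {
      unsigned long raw = (unsigned long) key;  /* the cast is the hash */
      /* only the mod produces collisions: heap-allocated objects are
         word-aligned, so if n_buckets shares a factor with that
         alignment (both multiples of 8, say), whole classes of buckets
         go unused -- a prime n_buckets avoids this */
      return raw % n_buckets;
    }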