This is the mail archive of the guile@cygnus.com mailing list for the guile project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]

Re: Scheme style auto-resizing hashtable


Jay Glascoe <jglascoe@jay.giss.nasa.gov> writes:

 > On 18 Oct 1998, Harvey J. Stein wrote:
 > 
 > > What about with a bigger hash table size, such as 2999 or 1499 or
 > > 1091?  As far as I recall, TCL's (& thus STk's) hash tables resize
 > > when a 4th element is inserted into a bucket.
 > >
 > 
 > these hashtables resize, the vector's length doubles, when a bucket
 > exceeds the max-bucket-size.  (make-hashtab 0) is a very bad idea.
 > 
 > > 
 > > It'd also be interesting to run your speed test in STk.

I did a very rough comparison of Guile's hash tables vs STk's
hash tables.  Of course, they're hard to compare because STk & guile
are different interpreters.  But, the hash table insertion timings are
different enough & the overhead timings are close enough that it seems
that STk's hash tables are faster than Guile's (on the order of 20%
faster when computing runtime-overhead, and on the order of 50% when
doing runtime-gc-overhead, assuming (gc-run-time) returns time in
milliseconds).

Here're the details:

hashtest.scm:

(debug-disable 'debug)

(define (profile:clock)
  (/ (get-internal-run-time)
     internal-time-units-per-second))

(define-macro (timeit form)
  (let ((g1 (gensym))
	(g2 (gensym)))
    `(let* ((,g2 ())
	    (,g1 (profile:clock)))
       ,form
       (set! ,g2 (profile:clock))
       (display "Elapsed time : ")
       (display (- ,g2 ,g1))
       (display "      GC time : ")
       (display (gc-run-time))
       (newline)
       (display "GC Stats : ")
       (display (gc-stats))
       (newline))))

(define (make-hash-table) (make-vector 1009))

(define (do-test func lines number-times)
  (do ((i 1 (+ i 1)))
      ((>= i number-times))
    (let ((mydict (make-hash-table)))
      (for-each (func mydict) lines))))

(define (insert-entry mydict)
  (lambda (line)
    (let* ((len (string-length line))
	   (top (inexact->exact (/ len 10)))
	   (key (string->symbol (substring line 0 top))))
      (hash-set! mydict key line))))

(define (port->list p)
  (let loop ((ln (read-line p))
	     (l '()))
    (cond ((not (eof-object? ln))
	   (loop (read-line p) (cons ln l)))
	  (else (reverse l)))))


hashtest.stk:

(define (do-test func lines number-times)
  (do ((i 1 (+ i 1)))
      ((>= i number-times))
    (let ((mydict (make-hash-table)))
      (for-each (func mydict) lines))))

(define (insert-entry mydict)
  (lambda (line)
    (let* ((len (string-length line))
	   (top (inexact->exact (/ len 10)))
	   (key (string->symbol (substring line 0 top))))
      (hash-table-put! mydict key line))))

(define (port->list p)
  (let loop ((ln (read-line p))
	     (l '()))
    (cond ((not (eof-object? ln))
	   (loop (read-line p) (cons ln l)))
	  (else (reverse l)))))


Guile run:

;;; Setup:
guile> (load "hashtest.scm")
guile> (define l (port->list (open-input-file "x.ps")))

;;; Test insertions:
guile> (timeit (do-test insert-entry  l 10))
Elapsed time : 2.27      GC time : 76
GC Stats : ((gc-time-taken . 76) (cells-allocated . 59384) (cell-heap-size . 98304) (bytes-malloced . 168731) (gc-malloc-threshold . 227016) (cell-heap-segments (1075433480 . 1075171336) (1075961864 . 1075437576)))
guile> (timeit (do-test insert-entry  l 10))
Elapsed time : 2.2      GC time : 111
GC Stats : ((gc-time-taken . 111) (cells-allocated . 87620) (cell-heap-size . 98304) (bytes-malloced . 181773) (gc-malloc-threshold . 227016) (cell-heap-segments (1075433480 . 1075171336) (1075961864 . 1075437576)))
guile> (timeit (do-test insert-entry  l 10))
Elapsed time : 2.24      GC time : 155
GC Stats : ((gc-time-taken . 155) (cells-allocated . 48070) (cell-heap-size . 98304) (bytes-malloced . 164513) (gc-malloc-threshold . 227016) (cell-heap-segments (1075433480 . 1075171336) (1075961864 . 1075437576)))

;;; Test overhead:
guile> (timeit (do-test  (lambda (h) (lambda (i) #f))  l 10))
Elapsed time : 0.18      GC time : 164
GC Stats : ((gc-time-taken . 164) (cells-allocated . 32618) (cell-heap-size . 98304) (bytes-malloced . 151438) (gc-malloc-threshold . 227016) (cell-heap-segments (1075433480 . 1075171336) (1075961864 . 1075437576)))
guile> (timeit (do-test  (lambda (h) (lambda (i) #f))  l 10))
Elapsed time : 0.140000000000001      GC time : 168
GC Stats : ((gc-time-taken . 168) (cells-allocated . 85879) (cell-heap-size . 98304) (bytes-malloced . 167586) (gc-malloc-threshold . 227016) (cell-heap-segments (1075433480 . 1075171336) (1075961864 . 1075437576)))
guile> (timeit (do-test  (lambda (h) (lambda (i) #f))  l 10))
Elapsed time : 0.18      GC time : 176
GC Stats : ((gc-time-taken . 176) (cells-allocated . 69803) (cell-heap-size . 98304) (bytes-malloced . 159520) (gc-malloc-threshold . 227016) (cell-heap-segments (1075433480 . 1075171336) (1075961864 . 1075437576)))
guile> 

STk run:

;;; Setup:
STk> (load "hashtest.stk")
#[undefined]
STk> (define l (port->list (open-input-file "x.ps")))
#[undefined]

;;; Test insertions:
STk> (time (do-test insert-entry  l 10))
;;    Time: 1810.00ms
;; GC time: 580.00ms
;;   Cells: 554247
#[undefined]
STk> (time (do-test insert-entry  l 10))
;;    Time: 1800.00ms
;; GC time: 560.00ms
;;   Cells: 553911
#[undefined]

;;; Test overhead:
STk> (time (do-test (lambda (h) (lambda (i) #f)) l 10))
;;    Time: 140.00ms
;; GC time: 70.00ms
;;   Cells: 88769
#[undefined]
STk> (time (do-test (lambda (h) (lambda (i) #f)) l 10))
;;    Time: 150.00ms
;; GC time: 80.00ms
;;   Cells: 88757
#[undefined]


-- 
Harvey J. Stein
BFM Financial Research
hjstein@bfr.co.il