This is the mail archive of the guile@sourceware.cygnus.com mailing list for the Guile project.
Re: more guile for perl refugees (split, join)
- To: ttn at glug dot org
- Subject: Re: more guile for perl refugees (split, join)
- From: Steve Tell <tell at telltronics dot org>
- Date: Wed, 28 Jun 2000 01:15:27 -0400 (EDT)
- cc: Guile Mailing List <guile at sourceware dot cygnus dot com>
On Mon, 26 Jun 2000, thi wrote:
> did you know guile has module `(ice-9 string-fun)'? it would be
> interesting to see a performance study between your implementation and
> another one using those procedures.
I did rummage through (ice-9 string-fun) before writing my "split"
routine, and didn't find any immediately useful substitutes. The closest
looked like it might be the commented-out "with-regexp-parts".
Benchmarking my "split" with what is probably the most common regular
expression in Perl programs, one-or-more whitespace characters
("[ \t\n]+"), might mainly measure how well regcomp/regexec optimize
that case. It might well be worth writing a special-case split-whitespace
routine to supplement string-fun.
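A minimal sketch of such a routine, assuming plain Guile Scheme with no regexp at all. The name split-whitespace is hypothetical (it is not part of (ice-9 string-fun)), and because it uses char-whitespace? it also treats carriage returns and form feeds as separators, which is slightly broader than "[ \t\n]+".

```scheme
;; Hypothetical split-whitespace: split STR on runs of whitespace,
;; scanning characters directly instead of compiling a regexp.
(define (split-whitespace str)
  (let loop ((i 0) (start #f) (acc '()))
    (cond ((= i (string-length str))
           ;; End of string: flush the word in progress, if any.
           (reverse (if start (cons (substring str start i) acc) acc)))
          ((char-whitespace? (string-ref str i))
           ;; Whitespace ends the current word, if one is open.
           (loop (+ i 1) #f
                 (if start (cons (substring str start i) acc) acc)))
          (else
           ;; Non-whitespace: open a word unless one is already open.
           (loop (+ i 1) (or start i) acc)))))
```

Like Perl's special split ' ' form, this drops leading and trailing whitespace rather than producing empty fields, so (split-whitespace "  foo bar\tbaz\n") gives ("foo" "bar" "baz").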
Another perl-related comparison: has anyone thought about a module that
uses read-hash-extend to compile constant regular expressions once, at read
time? Sure, you can call (make-regexp) outside of a critical loop, but that
can put the regular expression string far away from where the compiled
regexp is used.
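For comparison, here is a sketch of the conventional approach, where the regexp is compiled once before the loop and reused on every iteration. The helper name count-matching is hypothetical and only illustrates the pattern.

```scheme
(use-modules (ice-9 regex))

;; Hypothetical helper: count how many strings in LINES match "de+".
;; make-regexp runs once, outside the loop; only regexp-exec repeats.
(define (count-matching lines)
  (let ((r (make-regexp "de+")))          ; compiled once
    (let loop ((ls lines) (n 0))
      (cond ((null? ls) n)
            ((regexp-exec r (car ls))     ; #f when there is no match
             (loop (cdr ls) (+ n 1)))
            (else (loop (cdr ls) n))))))
```

In a short procedure like this the pattern stays close to its use; in a larger one the let binding tends to drift away from the regexp-exec call, which is the problem a read-time syntax would address.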
Trying not to be a "lazy bastard," I poked about in the source and figured
out how read-hash-extend works well enough to throw this together, and
made a few notes.
Then while trying to write a little documentation, I discovered #., as in:
#.(make-regexp "foo")
Is this standard scheme or a guile extension?
Anyway, maybe these ramblings still have some tutorial use:
(use-modules (ice-9 regex))
; Use the read-hash-extend facility to add a syntax for constant
; regular expressions that are to be compiled once when read in,
; instead of during the normal flow of execution. This can let loops
; that repeatedly use a constant regexp be optimized without moving the
; expression's definition far away from its use.
;
; With this hash extension, these two expressions behave identically:
;
; (let ((r (make-regexp "de+"))) (regexp-exec r "abcdeeef"))
; (regexp-exec #+"de+" "abcdeeef")
;
;
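(read-hash-extend #\+
  (lambda (c port)
    ;; Read the datum following "#+"; it must be a string literal.
    (let ((s (read port)))
      (if (string? s)
          (make-regexp s)   ; compiled once, when the form is read
          (error "syntax error; #+<string> expected")))))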
;
; (very poorly written) general notes on read-hash-extend in lieu of real
; documentation.
;
; The read-hash-extend procedure takes two arguments, a character and a
; procedure. The procedure is stored in a hash table keyed on the
; character.
; Later, when guile's reader encounters a token beginning with '#'
; followed by a character that it doesn't otherwise recognize, it calls
; the hash-extend procedure associated with the character with two
; arguments, the character and the reader's current input port.
; The procedure should call (read port) to consume guile tokens as
; necessary to implement its new syntax. The procedure should return a
; single guile object which will be the value of the new #-syntax.
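To make that calling convention concrete, here is a sketch of a handler written as an ordinary procedure and called directly on a string port, the way the reader would call it after consuming "#" and the dispatch character. The name icase-regexp-reader is hypothetical, and the sketch deliberately does not register it with read-hash-extend, so it makes no claim about which # character is free.

```scheme
(use-modules (ice-9 regex))

;; Hypothetical handler following the read-hash-extend protocol:
;; it receives the dispatch character C and the reader's input PORT,
;; consumes exactly the datum it needs, and returns one object.
(define (icase-regexp-reader c port)
  (let ((s (read port)))                  ; consume one datum after "#<c>"
    (if (string? s)
        (make-regexp s regexp/icase)      ; case-insensitive regexp
        (error "syntax error; string expected after #" c))))

;; Simulate the reader: the port is positioned just after "#<c>".
(define demo-port (open-input-string "\"DE+\" next-token"))
(define r (icase-regexp-reader #\+ demo-port))

(match:substring (regexp-exec r "abcdeeef"))  ; => "deee"
(read demo-port)   ; => next-token, left untouched for the normal reader
```

The handler reads only the string it needs; everything after it stays on the port for the reader to continue with, which is why the procedure should call (read port) no more times than its syntax requires.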
;
; Among the characters NOT available for use with read-hash-extend, because
; they are reserved for other guile/RnRS syntax, are:
;   most alphabetic characters
;   * { \ ! ( & ' .
;