This is the mail archive of the guile@cygnus.com mailing list for the guile project.



Re: A useful syntax for regexps and other things - And scheme version.


Mikael Djurfeldt writes:
 > "Harvey J. Stein" <abel@netvision.net.il> writes:
 > 
 > > In scheme one can do much better.  When I was moving my URL parsing
 > > code around I finally wrote a function
 > > 
 > >    (list->string-regexp lst)
 > 
 > This is a great idea!  Nice!
 > 
 > > which would take a regular expression such as 
 > >    '((set "^a-zA-Z_$")
 > >      (group "read" or "readv" or "readln"
 > > 	    or "write" or "writev" or "writeln"
 > > 	    or "reset" or "extend" or "rewrite"
 > > 	    or "close")
 > >      (zero-or-more whitespace)
 > >      "(")
 > 
 > But I like the syntax in the commentary better:
 > 
 > > ;;          (or regexp1 regexp2 ...)   - Match regexp1 or regexp2 or ...
 > 
 > Why do you use infix notation above?  Is that the in-"complete job"?

That's one of the reasons I don't like my implementation: it's not
self-consistent.  I intended to make "or" a prefix operator, but ended
up making it an infix operator, maybe because that was easier.  I
think it could be fixed by changing:


	   ((or) (concat regexp-or
			 (list->regexp-string (cdr l) quote)))

to something like:

	   ((or) (concat-between (mapcar (lambda (regexp)
					   (list->regexp-string regexp quote))
					 (cdr l))
				 regexp-or))

and adding:

(defun concat-between (list separator)
  (cond ((null list)
	 "")
	((null (cdr list))
	 (car list))
	(t
	 (concat (car list)
		 separator
		 (concat-between (cdr list) separator)))))

Then the regexp list above could be written as:

   '((set "^a-zA-Z_$")
     (group or "read" "readv" "readln"
	    "write" "writev" "writeln"
	    "reset" "extend" "rewrite"
	    "close")
     (zero-or-more whitespace)
     "(")


Also, here's my Scheme version.  It's basically the same, but it
doesn't have the automatic escaping of the Emacs version, so a
regexp-quote would have to be written and applied at the appropriate
times.
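
A minimal sketch of such a quoting function follows.  It is not part
of the code in this post: the name my-regexp-quote and the variable
regexp-special-chars are made up for illustration, and the set of
characters treated as special is an assumption that has to be adjusted
to the regexp package in use.

```scheme
;; Hypothetical helper, not from the original post.
;; ASSUMPTION: these are the characters that are special when they
;; appear unescaped; change the string for your regexp package.
(define regexp-special-chars ".*[]^$\\")

;; True if character c occurs in regexp-special-chars.
(define (special-char? c)
  (let loop ((i 0))
    (cond ((= i (string-length regexp-special-chars)) #f)
          ((char=? c (string-ref regexp-special-chars i)) #t)
          (else (loop (+ i 1))))))

;; Return a copy of s with a backslash in front of every special
;; character, so that s matches only itself.
(define (my-regexp-quote s)
  (let loop ((chars (string->list s))
             (acc '()))                ; result so far, reversed
    (cond ((null? chars)
           (list->string (reverse acc)))
          ((special-char? (car chars))
           (loop (cdr chars)
                 (cons (car chars) (cons #\\ acc))))
          (else
           (loop (cdr chars)
                 (cons (car chars) acc))))))

(my-regexp-quote "a.b*")   ; => "a\\.b\\*"
```

It would be applied to literal strings before they are handed to the
converter, not to the contents of a set or to operator symbols.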

Again, though, let me stress that it'd be better to use Bigloo's
syntax, since it predates mine.  Let's start trying to be more
compatible instead of less compatible...

;;; Set these up for your particular scheme regexp package...
(define regexp-start-group "\\(")
(define regexp-end-group "\\)")
(define regexp-start-set "[")
(define regexp-end-set "]")
(define regexp-one-or-more "\\+")
(define regexp-zero-or-more "*")
(define regexp-zero-or-one "\\?")
(define regexp-or "\\|")
(define regexp-begin "^")
(define regexp-end "$")
(define regexp-any-char ".")

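;;; Convert a list-structured regexp into a regexp string.  Strings are
;;; passed through unescaped; operator symbols expand to the strings
;;; defined above.
(define (list->regexp-string l)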
  (cond ((null? l) "")
	((and (list? l)
	      (symbol? (car l)))
	 (case (car l)
	   ((group) (string-append regexp-start-group
				   (list->regexp-string (cdr l))
				   regexp-end-group))
	   ((set)   (string-append regexp-start-set
				   (list->regexp-string (cdr l))
				   regexp-end-set))
	   ((one-or-more) (string-append (list->regexp-string (cdr l))
					 regexp-one-or-more))
	   ((zero-or-more) (string-append (list->regexp-string (cdr l))
					  regexp-zero-or-more))
	   ((zero-or-one) (string-append (list->regexp-string (cdr l))
					  regexp-zero-or-one))
	   ((begin) (string-append regexp-begin
				   (list->regexp-string (cdr l))))
	   ((end) (string-append regexp-end
				 (list->regexp-string (cdr l))))
	   ((any-char) (string-append regexp-any-char
				      (list->regexp-string (cdr l))))
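	   ((or) (string-append regexp-or
				(list->regexp-string (cdr l))))
	   ;; Unknown operator symbols would otherwise fall through the
	   ;; case and produce an unspecified value.
	   (else (error "list->regexp-string: unknown operator" (car l)))))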
	((list? l)
	 (string-append (list->regexp-string (car l))
			(list->regexp-string (cdr l))))
	(else
	 l)))


The first set of definitions (before list->regexp-string) is for
Guile.  I believe one would use these instead for STk and SCM, but
I'm not sure:

(define regexp-start-group "(")
(define regexp-end-group ")")
(define regexp-start-set "[")
(define regexp-end-set "]")
(define regexp-one-or-more "+")
(define regexp-zero-or-more "*")
(define regexp-zero-or-one "?")
(define regexp-or "|")
(define regexp-begin "^")
(define regexp-end "$")
(define regexp-any-char ".")
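
For the Scheme version, the prefix "or" fix described for Emacs
earlier in this message might look roughly like this.  It is only a
sketch: concat-between is a direct translation of the Emacs helper,
regexp-or is assumed to have the Guile value given earlier, and the
changed clause is shown as a comment rather than as tested code.

```scheme
;; ASSUMPTION: the Guile setting from earlier in this message.
(define regexp-or "\\|")

;; Join a list of strings, putting separator between adjacent items.
(define (concat-between strings separator)
  (cond ((null? strings)
         "")
        ((null? (cdr strings))
         (car strings))
        (else
         (string-append (car strings)
                        separator
                        (concat-between (cdr strings) separator)))))

;; With this helper, the (or) clause of list->regexp-string could
;; become a prefix operator:
;;
;;   ((or) (concat-between (map list->regexp-string (cdr l))
;;                         regexp-or))

(concat-between '("read" "readv" "close") regexp-or)
;; => "read\\|readv\\|close"
```

Each alternative is converted on its own and the separator is placed
only between alternatives, so (or "read" "readv") and the infix form
"read" or "readv" would produce the same string.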