This is the mail archive of the guile@cygnus.com mailing list for the guile project.



Re: improving read-line


| Date: Thu, 11 Dec 1997 17:40:57 -0500 (EST)
| From: Tim Pierce <twp@skepsis.com>
| 
| read-line is presently just a wrapper around read-delimited, which is
| dog-slow. I'd like to make it use %read-line instead, which should
| give us a considerable performance improvement.  However, since
| %read-line relies on fgets for regular files and pipes, it can only
| handle newline-delimited lines.  That means that if you've rebound
| scm-line-incrementors in order to change the behavior of read-line,
| you'll have to call read-delimited instead.

Setting scm-line-incrementors doesn't seem good enough anyway: it
wouldn't cope well with lines terminated by \r\n.  fgets can, but only
after recompiling on a system where that's the norm.

It does seem like something that should be configurable in some way.
It's like a trivial case of the general external character encoding
problem (supporting JIS, BIG5, ISO-10646 etc. too, if that seems
convenient).
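
To illustrate the terminator point: a line read with fgets could be
post-processed to accept either ending without recompiling.  This is
only a sketch; chomp_line is a hypothetical helper name, not anything
in the Guile sources, and it assumes the buffer holds a NUL-terminated
string as fgets leaves it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of Guile): strip one trailing "\n" or
   "\r\n" from a NUL-terminated buffer, e.g. one filled by fgets.
   Returns the length of the line without its terminator. */
size_t chomp_line(char *buf)
{
    size_t len;

    assert(buf != NULL);
    len = strlen(buf);
    if (len > 0 && buf[len - 1] == '\n') {
        buf[--len] = '\0';              /* drop the newline */
        if (len > 0 && buf[len - 1] == '\r')
            buf[--len] = '\0';          /* drop a preceding carriage return */
    }
    return len;
}
```

A lone \r that is not followed by \n is left in place, so this covers
Unix and DOS style lines but not bare-\r terminators.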

| 
| While I'm at it, I'd also like to get rid of the `split' and `peek'
| arguments to read-line.  These are a pain in the neck to implement and
| don't seem to be very useful if you're only reading newline-delimited
| records.

You didn't suggest abolishing 'concat and 'trim.  Are the other two
really such a pain in the neck to implement?  Can you not just do an
SCM_CUNGET after the fgets for the 'peek case?  The extra options
don't seem essential, but they give compatibility with scsh's
read-line.
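
Roughly what I have in mind for the 'peek case, written against plain
stdio rather than the Guile port macros: ungetc stands in for
SCM_CUNGET here, and read_line_peek is a made-up name.  It is a sketch
under those assumptions, not a patch.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: read one line with fgets, remove the trailing
   newline from the buffer and push it back onto the stream, so that
   the next read sees the delimiter.  Returns the line length without
   the newline, or -1 at end of file or on error. */
long read_line_peek(FILE *fp, char *buf, int size)
{
    size_t len;

    assert(fp != NULL && buf != NULL && size > 1);
    if (fgets(buf, size, fp) == NULL)
        return -1;
    len = strlen(buf);
    if (len > 0 && buf[len - 1] == '\n') {
        buf[--len] = '\0';
        /* Standard C guarantees one character of pushback. */
        if (ungetc('\n', fp) == EOF)
            return -1;
    }
    return (long) len;
}
```

Since only the single delimiter character has to go back, one
character of pushback is enough, whatever the line length.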

%read-line seems to have a bug when handling the NUL character:

guile> (define q (open-input-file "/vmlinuz"))
guile> (define a (read-line q))
guile> (string-length a)
388
guile> (ftell q)
389
guile> (define q (open-input-file "/vmlinuz"))
guile> (define a (%read-line q))
guile> (string-length a)
6
guile> (ftell q)
79
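
I haven't traced this in the source, so the following is a guess at
the cause rather than a diagnosis: if the length of the result is
taken with strlen after fgets, everything after the first NUL is lost
even though the file position has moved past it.  The sketch below
reproduces that pattern with plain stdio and shows a getc loop that
counts bytes explicitly instead.  Both function names are
hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Suspect pattern: fgets reads past an embedded NUL, but strlen stops
   counting at it, so the reported length is too short. */
size_t read_line_strlen(FILE *fp, char *buf, int size)
{
    assert(fp != NULL && buf != NULL && size > 1);
    if (fgets(buf, size, fp) == NULL)
        return 0;
    return strlen(buf);
}

/* Alternative: read up to and including '\n' with getc, counting the
   bytes stored so that embedded NULs are kept.  No terminating NUL is
   written; the return value is the number of bytes in buf. */
size_t read_line_counted(FILE *fp, char *buf, size_t size)
{
    size_t n = 0;
    int c;

    assert(fp != NULL && buf != NULL);
    while (n < size && (c = getc(fp)) != EOF) {
        buf[n++] = (char) c;
        if (c == '\n')
            break;
    }
    return n;
}
```

The getc loop gives up the speed of fgets, of course, so it shows the
correct behaviour rather than a fast implementation.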

Perhaps it would be useful to optimise read-line!, which is often the
one to use if you care about speed.  Scsh doesn't have it.  The SCM
version would be faster than the current implementation, but I'm not
sure by how much.

| Are there any objections to this change?  Please please please please
| say no.

Suggestion of change causes objection, can't avoid that.