This is the mail archive of the guile@cygnus.com mailing list for the guile project.



Re: readline improvements


> From: "ccf::satchell"@hermes.dra.hmg.gb
> Date: Fri, 19 Dec 1997 11:24:20 GMT
> 
> The extra speed in read-line and friends is welcome, but there is
> a weird side-effect (at least in guile-core-971215 on Ultrix), as
> read-line has stopped working on pipes. A quick check showed that
> %read-line is affected, but %read-delimited! is ok. Is this intended,
> or should I flag it as a bug?

This is a bug -- thanks for reporting it.  It'll be fixed in
tomorrow's snapshot.  read-line will be slower on pipes than on
regular files, but it should be functional.

Here are the issues, for the curious and clever:

ANSI fgets is fundamentally broken.  It reads characters from its
input stream until a newline is read (or the buffer fills), which
means that it will happily read and store null characters in your
storage buffer.  However, fgets does not report the length of the
string that it has read.  So if it returns with a null-terminated
buffer that does not contain a newline, there is an ambiguity:

   * the string was read from the end of the file, and there was no
     trailing newline at the end of file.

   * the line just read contains embedded null characters.

Why fgets does not simply return the number of characters read and
stored is beyond me; I guess ultimately we can blame Rob Pike for this
nonsense.  Anyway, in order to cope with embedded nulls and still be
able to use fast library calls, scm_fgets calls ftell to find out how
many characters it just read.  But ftell always returns 0 for
non-regular files, hence the bug you noticed.

For now, pipe objects will use a generic character-at-a-time line
reader.  This is slower than scm_fgets, but it should work.

Perl handles this problem by telling the stdio library, essentially,
"please go away."  The record input routine in Perl calls read() to
read chunks into a buffer, and then scans the buffer in core for
delimiter characters.  In some ways that's the Right Thing, since it
permits doing all kinds of funky things with delimiter strings and
buffered input that are harder to do if you rely on stdio.  But in
order to use *any* stdio functions (and ultimately that's just about
unavoidable), Perl has to use the stream's own buffer for all of this
work, and that creates monstrous portability headaches.