This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

[RFC/PoC 0/4] IO: Introduce getdelim_append


An issue with the getdelim function recently came up on the git
mailing list:

If getdelim fails (e.g. due to ENOMEM), the contents which have
already been read from the stream and copied to the output buffer are
effectively lost: the bytes exist in the output buffer and *n
faithfully reflects the allocated size of that, but the caller has no
way of knowing how many actually came from the stream, and how many
might be random malloc/realloc junk, since the local variable cur_len
is lost.

This means that there is no way for the application to try to free
some memory and retry the getdelim call or falling back to some other
method (e.g. a slow getc loop). One way to solve this is to introduce
getdelim_append, which has an extra in/out parameter allowing the
caller to indicate the initial offset in the buffer to start writing
at. getdelim_append updates this parameter every time content is
copied to the output buffer.

There are other use cases apart from allowing the application to try
to recover from an error. For example, one can imagine reading a file
format where a set of header lines is delimited by a blank
line. Reading the entire header can then be done without maintaining
both a line buffer and a final buffer, copying from one to the other:

  char *buf = NULL;
  size_t cap = 0, len = 0, old_len;
  ssize_t ret;
  do {
    old_len = len;
    ret = getdelim_append(&buf, &cap, '\n', f, &len);
  } while (ret > 0 && len-old_len > 1);

(this could probably just be ret > 1, but in more complicated
situations one could use old_len with ret and/or len to inspect the
last record read).

Notes:

(1) I'm not fond of the name, but it's the least bad I could come up
with (I'd hate getdelim2).

(2) The patch series is by no means complete. I suppose
getdelim_append should be defined in a separate file, appropriate
stuff should be added to the installed headers, documentation updated
etc.; hence the Proof of Concept heading.

(3) I'd prefer to call the new parameter 'offset' instead of
'cur_len', but I think the patches are more readable if we keep the
'cur_len' name initially.

(4) I considered allowing cur_len to be NULL, and then behaving like
getdelim. The new getdelim wrapper would then pass NULL, and the
necessary dummy declaration be moved to getdelim_append.

(5) One should probably add an EINVAL sanity check to ensure that
*cur_len <= *n, to catch callers passing an uninitialized cur_len
(thinking that it is only an output parameter). Depending on (4),
there's also an obvious cur_len == NULL check to add.

(6) 4/4 is somewhat unrelated.

Rasmus Villemoes (4):
  getdelim: Compute result separately
  getdelim: Change cur_len from ssize_t to size_t
  getdelim: Introduce getdelim_append
  getdelim: Allow a return value of SSIZE_MAX

 libio/iogetdelim.c | 35 ++++++++++++++++++++++++++---------
 libio/libioP.h     |  1 +
 2 files changed, 27 insertions(+), 9 deletions(-)

-- 
2.1.3


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]