This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH v2] Fix offset caching for streams and use it for ftell (BZ #16680)


On Mon, Mar 17, 2014 at 04:17:46AM -0400, Carlos O'Donell wrote:
> > I have now brought back the behavior of seeking to end of file for
> > append-only streams, but with a slight difference.  I don't cache the
> > offset though, since we would want ftell to query the current file
> > position through lseek while the stream is not active.  Since the
> > offset is moved to the end of file, we can rely on the file position
> > reported by lseek and we don't need to resort to the stat() nonsense.
> 
> OK, but I wouldn't call it nonsense, you either stat() or lseek() and
> both cost a syscall. Your patch inlines the lseek from get_file_offset
> and removes the stat. That's fine.

It is nonsense. For example stat does the wrong thing on device nodes.

> Does this reinstitution of lseek cause any problems for the use of ftell
> on inactive streams? For example is it really correct to have _IO_file_open
> seek to the end of the fully flushed file in append mode? What about
> other users of the fd that might expect the underlying offset to remain
> the same?

It's perfectly correct for fopen to perform the seek. For fdopen I'm
not so clear; I think it's wrong for fdopen to change the position
except in the case where O_APPEND wasn't set and fdopen adds it (in
this case the implementation can do whatever it wants, no?).

> > Finally, the cache is always reliable, except when there are unflished
> > writes in an append mode stream (i.e. both a and a+).  In the latter
> > case, it is safe to just do an lseek to SEEK_END.  The value can be
> > safely cached too, since the file handle is already active at this
> > point.  Incidentally, this is the only state change we affect in the
> > file handle (apart from taking locks of course).
> 
> When you say unflushed writes, I assume those are writes in the
> stream buffers that have not been passed to the kernel write syscall,
> as opposed to writes that the kernel itself has not flushed to disk
> from its own buffers.

Of course. From an application perspective kernel buffers don't exist.
They are not supposed to be observable in any way except for
performance.

> I don't understand this part. The cached offset *could* be reliable
> but you would have to take into account the unflushed writes, but we
> don't so it isn't? Why else is it unreliable?

It's not necessarily unreliable in all cases, but in two important
ones:

1. Transitioning from reading or seeking to writing, and the current
offset on the underlying fd is not at the end of the file.

2. Transitioning from being an inactive handle to being the active
handle, and the current offset on the underlying fd is not at the end
of the file.

In both cases, formally, the first write moved the position to the end
of the file (note: O_APPEND does this for you), but this isn't seen
yet because there hasn't been an actual write to the fd; the data is
stored in the FILE buffer. So you need to determine what position the
buffered data is going to get written to. lseek(fd,0,SEEK_END) does
this. And you don't have to worry that the result will be stale by the
time the underlying write happens, since the active handle rules
forbid any operations on the fd that would make it stale.

Rich


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]