This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: [PATCH] Manual should discourage mixing TEMP_FAILURE_RETRY and close
On 12/06/2016 11:42 PM, Zack Weinberg wrote:
> On 12/06/2016 10:51 AM, Rich Felker wrote:
>>> Also, I took a look at the illumos code base, which presumably does
>>> not differ greatly from historical Solaris on this point. It looks to
>>> me (closef(), closeandsetf()) that also on that system there are paths
>>> in close() where errors are returned but the FD is nevertheless
>>> closed. So it really does seem that POSIX.1 was specifying (and plans
>>> to again specify) behavior that is inconsistent with actual
>>> implementations.
>>
>> I think you're assuming POSIX is imposing something opposite of what
>> it's doing. The new text has filedes left open _only_ in the case of
>> EINTR. For all other errors it's closed and available for reassignment
>> on subsequent open, etc.
>
> But that's still wrong! `fildes` MUST be closed even when EINTR is
> returned. That's what all existing implementations do (if I understood
> Michael correctly),
I believe it is "most (not all) implementations.
> and even if I've misunderstood that part, because
> *some* implementations do it, portable code has to assume that it's
> unsafe to retry the operation. So the only sane specification for POSIX
> to make is that the descriptor is closed.
That's my opinion.
> (No, new programs cannot assume that it's safe to retry, because new
> programs get run on old operating systems.)
Exactly. This is the sense in which I'd argue that the POSIX.1
proposal (to forbid an implementation from returning EINTR
and close the FD) potentially worsens portability.
> (Yes, this means delayed failures can get lost. Code that cares
> should've called `fsync` anyway. Also, the whole thing is moot if you
> use SA_RESTART on all your signals, which is the Right Thing for tons of
> other reasons.)
>
> ----
>
> Regardless of what the specification does or doesn't say or will be
> corrected to say, what the *glibc manual* should say is clear, IMNSHO:
> don't retry close(). Proposed new text follows - it's not as emphatic
> as what Michael wrote for the manpages but I think it's sufficient.
Yes, I still think the glibc manual needs fixing. But, by now, I've
tried to be more nuanced in a further revision to the close(2) man
page, which by now reads:
A careful programmer will check the return value of close(), since
it is quite possible that errors on a previous write(2) operation
are reported only on the final close() that releases the open file
description. Failing to check the return value when closing a
file may lead to silent loss of data. This can especially be
observed with NFS and with disk quota.
Note, however, that a failure return should be used only for diag‐
nostic purposes (i.e., a warning to the application that there may
still be I/O pending or there may have been failed I/O) or reme‐
dial purposes (e.g., writing the file once more or creating a
backup).
Retrying the close() after a failure return is the wrong thing to
do, since this may cause a reused file descriptor from another
thread to be closed. This can occur because the Linux kernel
always releases the file descriptor early in the close operation,
freeing it for reuse; the steps that may return an error, such as
flushing data to the filesystem or device, occur only later in the
close operation.
Many other implementations similarly always close the file
descriptor (except in the case of EBADF, meaning that the file
descriptor was invalid) even if they subsequently report an error
on return from close(). POSIX.1 is currently silent on this
point, but there are plans to mandate this behavior in the next
major release of the standard
A careful programmer who wants to know about I/O errors may pre‐
cede close() with a call to fsync(2).
The EINTR error is a somewhat special case. Regarding the EINTR
error, POSIX.1-2013 says:
If close() is interrupted by a signal that is to be caught,
it shall return -1 with errno set to EINTR and the state of
fildes is unspecified.
This permits the behavior that occurs on Linux and many other
implementations, where, as with other errors that may be reported
by close(), the file descriptor is guaranteed to be closed. How‐
ever, it also permits another possibility: that the implementation
returns an EINTR error and keeps the file descriptor open.
(According to its documentation, HP-UX's close() does this.) The
caller must then once more use close() to close the file descrip‐
tor, to avoid file descriptor leaks. This divergence in implemen‐
tation behaviors provides a difficult hurdle for portable applica‐
tions, since on many implementations, close() must not be called
again after an EINTR error, and on at least one, close() must be
called again. There are plans to address this conundrum for the
next major release of the POSIX.1 standard.
(Excepting the first paragraph, all of the text there was added by me,
and can be considered in the public domain if any of it is useful for
inclusion in the glibc manual.)
> @comment unistd.h
> @comment POSIX.1
> @deftypefun int close (int @var{filedes})
> @safety{@prelim{}@mtsafe{}@assafe{}@acsafe{@acsfd{}}}
> The function @code{close} closes the file descriptor @var{filedes}.
> Closing a file has the following consequences:
>
> @itemize @bullet
> @item
> The file descriptor is deallocated.
>
> @item
> Any record locks owned by the process on the file are unlocked.
>
> @item
> When all file descriptors associated with a pipe or FIFO have been closed,
> any unread data is discarded.
> @end itemize
>
> Closing a file does @emph{not} guarantee that all data has been
> successfully written to disk. If you need to make sure of this, you
> must call @code{fsync} before @code{close} (@pxref{Synchronizing I/O
> operations}).
>
> @code{close} never @emph{fails}; after any call to @code{close}
> returns, the file descriptor @var{filedes} has been closed and must
> not be used again. However, it may still report an error. It returns
> 0 if there was no error, and @math{-1} if an error occurred. The
> following @code{errno} error conditions are defined for this function:
>
> @table @code
> @item ENOSPC
> @itemx EIO
> @itemx EDQUOT
> These codes indicate an I/O error in an earlier call to @code{write}
> that appeared to succeed. @xref{I/O Primitives}, for details on their
> meaning. Because of these ``delayed failures,'' it is important to
> check for errors in @code{close}.
>
> @item EINTR
> The @code{close} call was interrupted by a signal. The only
> consequence of this is that a ``delayed failure'' may have been
> lost---the file descriptor has still been closed. You should
> @emph{not} retry the system call.
>
> @item EBADF
> The @var{filedes} argument was not an open file descriptor to begin
> with (and still isn't).
> @end table
>
> This function is a cancellation point in multi-threaded programs. This
> is a problem if the thread allocates some resources (like memory, file
> descriptors, semaphores or whatever) at the time @code{close} is
> called. If the thread gets canceled these resources stay allocated
> until the program ends. To avoid this, calls to @code{close} should be
> protected using cancellation handlers.
> @c ref pthread_cleanup_push / pthread_cleanup_pop
>
> @code{close} can close any type of file descriptor, no matter how it
> was opened. Therefore, there is no @code{close64}, as it is
> unnecessary. However, @code{FILE} objects must be closed using
> @code{fclose} instead (@pxref{Closing Streams}).
> @end deftypefun
While the above is accurate for glibc on Linux, and is also
true for many other systems (AFAICT), I can see why there
may be resistance to accepting it for the glibc manual.
Maybe, some more nuanced text would be helpful. In any case
the current manual text should still be changed, IMO.
Cheers,
Michael
--
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/