This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

NSS error reporting (bug 20532)


I'm looking again at bug 20532 and related issues.

The core problem is that the NSS service functions have many different
ways of reporting result status:

[1]  The return value, enum nss_status.
[2]  errno value via the int *errnop argument
[3]  direct errno update
[4]  h_errno value via the int *herrnop argument (for host lookups)
[5]  direct h_errno update (for host lookups)
[6]  empty results (while otherwise reporting success)

The overall behavior is extremely inconsistent even within glibc.

(A) The general assumption seems to be that NSS_STATUS_SUCCESS and
NSS_STATUS_NOTFOUND [1] do not have any other error codes associated
with them (i.e., the other error flags are indeterminate).  This is
explicitly enforced by the get*by*_r wrapper in glibc, which discards
errno for the success case.

(B) For NSS_STATUS_TRYAGAIN [1], *errnop [2] equals ERANGE, and, for
hosts, lookups, *errnop equals NETDB_INTERNAL, is used to request retry
with a larger buffer.  Therefore NSS_STATUS_TRYAGAIN requires a valid
*errnop [2] value (non-host lookup) or a valid *herrnop [4] value (host
lookup), or a valid *herrnop [4] value and valid *errnop [2] value (host
lookup with *herrnop equals NETDB_INTERNAL).

(C) NSS_STATUS_UNAVAIL [1] seems to follow the same rules about validity
as NSS_STATUS_TRYAGAIN.

(D) getaddrinfo, in gethosts, unconditionally uses *errnop [2] and
*herrnop [4], without checking the return value, in violation of (B).

(E) In contrast to (D), getaddrinfo largely follows (B) for the
AF_UNSPEC/fct4 NSS function call.

(F) getaddrinfo manually propagates *herrnop [4] to h_errno in many
cases, for internal use and perhaps for use by the caller of getaddrinfo
(but this is not part of POSIX and underspecified).

(G) Googles fix for bug 20532 touches getaddrinfo code which reads
h_errno [5] directly (not *herrnop [4]).

(H) getaddrinfo documentation and actually callers suggest that errno is
only valid if getaddrinfo fails with EAI_SYSTEM.

(I) getaddrinfo, besides the handling for (B)/(D)/(E), generally assumes
that the NSS service function has set errno directly [3], and does not
propagate it to the caller even for EAI_SYSTEM.

(J) Empty results [6] cannot be reported by most of the get* NSS
functions (e.g., a struct passwd * indicates a single).  For
getaddrinfo, the struct addrinfo * list includes at least one element if
getaddrinfo is successful (says POSIX).  So that seems to rule out [6]
completely.

(K) _nss_files_get*_r saves and restores errno around the
internal_setent call.  This means that even a failure with
NSS_STATUS_UNAVAIL or NSS_STATUS_TRYAGAIN does not result in a proper
*errnop [2] or errno [3] value.  This could cause problems for the
get*ent implementation in glibc if the initial errno value was ERANGE
(because there is no way for the caller to tell that the error occurred
during the internal_setent phase).

(L) Carlos added NSS_STATUS_NOTFOUND with *errnop equals 0 as a
documented special case to the manual, in commit
d4e301c5c65393837e438b6d81feabfbfde7b9c7.  This contradicts (A).
NSS_STATUS_NOTFOUND is handled implicitly by __nss_next2, which does not
have access to the errno value, so I do not understand how this could work.

(M) NSS_STATUS_RETURN does not seem to have associated error codes, like
NSS_STATUS_SUCCESS.  The manual refuses to document this status code,
and I have not tried to reverse-engineer its behavior.

(N) Many callers alias *errnop [2] and errno [3] (passing &errno as
errnop), but this is not fully consistent.

(O) Fewer callers alias *herrnop [4] and h_errnop, and most of them are
bugs:

(Q) The glibc get*ent wrappers with a herrnop parameter appear to ignore
that and use &h_errno instead (which looks like a genuine bug).


I suppose this list could go on for quite some time.  The question is in
which direction we want to move this.

I think the three values (enum nss_status return value [1], *errnop [2],
*herrnop [4]) should behave roughly like this (as expressed in Standard
ML, I hope anyone reading thus far can understand this):

  type errno = int

  datatype h_errno =
      NETDB_SUCCESS
    | HOST_NOT_FOUND
    | TRY_AGAIN
    | NO_RECOVERY
    | NO_DATA
    | NETDB_INTERNAL of errno

  datatype 'a nss_status =
      NSS_STATUS_TRYAGAIN of 'a
    | NSS_STATUS_UNAVAIL of 'a
    | NSS_STATUS_NOTFOUND
    | NSS_STATUS_SUCCESS
    | NSS_STATUS_RETURN

  (* The hosts database has an associated h_errno variable, and errno is
     only valid in the NETDB_INTERNAL case.)
  type host_status = h_errno nss_status

  (* The other databases use errno directly, without h_errno.  *)
  type other_status = errno nss_status

In addition, we should change NSS service function calls so that we call
them with *errnop == 0, *herrnop == 0, and if we need any of these error
values (due to NSS_STATUS_TRYAGAIN, NSS_STATUS_UNAVAIL results) and they
are still 0, we use errno [3] and h_errno [5] directly as a replacement.
 This should provide backwards compatibility in cases where external NSS
service functions currently expect aliasing of to the thread-local
variables and updates (say) errno, but not *errnop.

The gethosts issue (D) should be fixed, and getaddrinfo should only read
the other error codes for NSS_STATUS_TRYAGAIN, NSS_STATUS_UNAVAIL, per
the SML types above.

(Q) is also a clear bug.  (L) probably should be reverted.  (K) looks
like it needs fixing as well.

I don't really know how to test this, except by making the cleanups and
doing some whole-system tests.

Comments?

Thanks,
Florian


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]