This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]
Other format:	[Raw text]

Re: NSS error reporting (bug 20532)

From: Florian Weimer <fweimer at redhat dot com>
To: Carlos O'Donell <carlos at redhat dot com>, GNU C Library <libc-alpha at sourceware dot org>
Date: Fri, 4 Aug 2017 11:54:11 +0200
Subject: Re: NSS error reporting (bug 20532)
Authentication-results: sourceware.org; auth=none
Authentication-results: ext-mx09.extmail.prod.ext.phx2.redhat.com; dmarc=none (p=none dis=none) header.from=redhat.com
Authentication-results: ext-mx09.extmail.prod.ext.phx2.redhat.com; spf=fail smtp.mailfrom=fweimer at redhat dot com
Dmarc-filter: OpenDMARC Filter v1.3.2 mx1.redhat.com DBCC3DB935
References: <c128f474-992e-c7de-5fae-a2437778c1eb@redhat.com> <87e3d20e-5a51-93ba-6276-0f5305836b2c@redhat.com> <02cabe4c-f2e8-03f7-8307-71cf2aedace5@redhat.com> <a2d5b334-e02f-b999-6605-c1d4ff114250@redhat.com>

On 08/03/2017 09:28 PM, Carlos O'Donell wrote:
> On 08/03/2017 01:18 PM, Florian Weimer wrote:
>> On 08/03/2017 06:49 PM, Carlos O'Donell wrote:
>>> On 08/03/2017 12:28 PM, Florian Weimer wrote:
>>>> (L) Carlos added NSS_STATUS_NOTFOUND with *errnop equals 0 as a
>>>> documented special case to the manual, in commit
>>>> d4e301c5c65393837e438b6d81feabfbfde7b9c7.  This contradicts (A).
>>>> NSS_STATUS_NOTFOUND is handled implicitly by __nss_next2, which does not
>>>> have access to the errno value, so I do not understand how this could work.
>>
>>> I must assume that (A) is not quite correct. I had two reproducers where
>>> errno was propagated to the caller, and did result in observable differences?
>>
>> Can you dig up the details?  And what was fixed in response to this
>> clarification?
> 
> https://bugzilla.redhat.com/show_bug.cgi?id=988068
> 
> SSSD, installed, client setup, but no server running was returning
> NSS_STATUS_UNAVAIL / ENOENT, and that results in the ENOENT being
> propagated to the caller. This was not the intent. The intent was
> for the SSSD NSS service module to be transparent to the user if
> sssd was not running. So I had to clarify a class of NSS service module
> was was intalled, answering replies, lacked a connected data source,
> but wanted to return a no-error result that would be retried later.
> 
> The example reproducer was as simple as:
> 
> #include <sys/types.h>
> #include <pwd.h>
> #include <stdio.h>
> 
> int main(int argc, char* argv[]) {
>   struct passwd pwd;
>   char buf[4096];
>   int err;
>   struct passwd *res;
> 
>   err = getpwnam_r(argv[1], &pwd, buf, 4096, &res);
> 
>   printf("<%s> err: <%d>\n", argv[1], err);
>   return 0;
> }

This reproducer does not match the output below.

I think there are several issues in play here.

(i) NSS_STATUS_UNAVAIL must come with an errno error code on the NSS
module interface.  If there is no error code, NSS_STATUS_UNAVAIL is
likely the wrong result code.

(ii) getpwnam_r (and all of get*_r) uses &errno as the errnop pointer,
and calls __set_errno explicitly (even to 0).  A lot of code relies on
the errno update, which is not required by POSIX.

(iii) As a result, getpwnam (without _r) does not leave errno untouched
on success, which is required to tell the cause of the getpwnam result:
error lack of user.

(getpwnam follows the readdir interface convention here: Set errno to
zero before the call, and if it is still zero, no error happened.)

> With SSSD using status==NSS_STATUS_TRYAGAIN errno==EAGAIN:
> 
> getgrent() OK group(0) = root 
> getgrent() OK group(1) = bin 
> getgrent() OK group(2) = daemon 
> ...
> getgrent: Resource temporarily unavailable
> getgrent error 11 

Note: Now we are talking about getgrent.

get*ent (in the form of __nss_getent_r) has both bugs (ii) and (iii).
For getgrent, the case for readdir-like behavior is even clearer.

> It should have been clarified to say "errno is left untouched."

Untouched by whom?  The NSS module or the framework?  The framework
currently clobbers errno unconditionally for NSS_STATUS_SUCCESS and
NSS_STATUS_NOTFOUND in a few places.

> Should we have saved and restored errno?

We save and restore errno in the NSS framework for NSS_STATUS_NOTFOUND,
so that we get POSIX semantics for errno.

We could also pass a local int variable instead of &errno for errnop,
and leave it to the NSS module not to clobber errno, but that will
result in a lot of code duplication, and the NSS_STATUS_NOTFOUND code is
already unambiguous in this regard.  I now think we should not try to
separate errno and *errnop because so much code relies on errno updates
even for the *_r functions (see glob and nscd for examples), and keeping
the aliasing (even extending it to getaddrinfo) will increase
compatibility with existing NSS Modules.

The errno saving cannot happen at the level of the *_r functions because
of the ERANGE case, which requires an *errnop update.  In particular,
what can happen is that we get NSS_STATUS_TRYAGAIN/ERANGE first,
followed by NSS_STATUS_NOTFOUND once we retry with a larger buffer.
This can happen with nss_files and long comment lines, for instance.

Thanks,
Florian

Follow-Ups:
- Re: NSS error reporting (bug 20532)
  - From: Carlos O'Donell

References:
- NSS error reporting (bug 20532)
  - From: Florian Weimer
- Re: NSS error reporting (bug 20532)
  - From: Carlos O'Donell
- Re: NSS error reporting (bug 20532)
  - From: Florian Weimer
- Re: NSS error reporting (bug 20532)
  - From: Carlos O'Donell

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]