This is the mail archive of the glibc-bugs@sourceware.org mailing list for the glibc project.



[Bug network/22775] New: getaddrinfo can fail permanently under rare circumstances


https://sourceware.org/bugzilla/show_bug.cgi?id=22775

            Bug ID: 22775
           Summary: getaddrinfo can fail permanently under rare circumstances
           Product: glibc
           Version: 2.23
            Status: UNCONFIRMED
          Severity: minor
          Priority: P2
         Component: network
          Assignee: unassigned at sourceware dot org
          Reporter: korb@monument-software.de
  Target Milestone: ---

If:
- getaddrinfo is called with AI_ADDRCONFIG set
- the hosts entry in nsswitch.conf contains an invalid/missing library in
  addition to "files dns"
- the system has no non-loopback interface with a v4 IP yet, but does have
  one with a v6 IP (e.g. a link-local one while a DHCP request is still in
  progress)
- the name does not have a v6 entry in /etc/hosts, but does have a v4 entry
  (I can't reproduce it with a name that must be looked up via DNS)
- no successful queries are made for names that do not appear in /etc/hosts

...then getaddrinfo gets stuck returning -11 (System error) even after the
system has acquired a v4 IP.
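
The conditions above could be exercised with a small probe along these lines.
This is only a sketch: the helper name probe_lookup is made up for
illustration, and the hints simply combine ai_family=0 with AI_ADDRCONFIG.

```c
#include <netdb.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper: resolve `name` with ai_family=0 and
   ai_flags=AI_ADDRCONFIG and return getaddrinfo's result code
   unchanged (0 means success). */
static int probe_lookup(const char *name)
{
    struct addrinfo hints;
    struct addrinfo *res = NULL;

    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_UNSPEC;   /* 0: accept both v4 and v6 */
    hints.ai_flags = AI_ADDRCONFIG;

    int rc = getaddrinfo(name, NULL, &hints, &res);
    if (rc == 0)
        freeaddrinfo(res);         /* only valid after a successful call */
    return rc;
}
```

Calling probe_lookup repeatedly with a few seconds between attempts, and
printing gai_strerror(rc) for each result, should show whether the result
stays at EAI_SYSTEM after the v4 address has arrived.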

Because this combination is unusual, here is the system on which I saw the
problem, in case I misinterpreted a detail:
- "embedded" Linux box running an application that tries to connect to a
  local service over TCP at bootup
- eth0 IP assigned via DHCP, application starts at roughly the same time
- /etc/nsswitch.conf contains "hosts: files mdns_minimal [NOTFOUND=return]
  dns mdns", even though libnss-mdns is not installed on the system(*)
- host name used by the application listed in /etc/hosts, but only with
  127.0.0.1 and not ::1 because unfortunately it's not IPv6-capable yet
- getaddrinfo is called with ai_family=0, ai_flags=AI_ADDRCONFIG (verified
  with additional fprintfs in glibc)
- application retries the connection a few seconds after each failure

(*) removing the mdns references from nsswitch.conf avoids the issue, which is
why I think this is just a minor bug - it only happens if the system is
misconfigured

On this system the application sometimes connects to the local service
without problems and sometimes keeps reporting "Host not found" errors. The
failures happen when check_pf reports seen_ipv4=0, seen_ipv6=1, because the
lookup then only searches /etc/hosts for a v6 IP. The successful case happens
when the application's first lookup is made before the external interface is
up (check_pf reports seen_ipv4=0, seen_ipv6=0).

In the failing situation, it appears that h_errno is set to -1 (internal error)
because no library could be loaded for the invalid mdns NSS entries. When the
external interface gains an IPv4 address, the loop over the NSS entries in
gaih_inet stops after the first one (files), but since h_errno is still set to
NETDB_INTERNAL from the failure during the previous lookup, it returns
-EAI_SYSTEM even though the lookup was successful.
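
The sequence described above can be modelled in isolation. The following is a
deliberately simplified sketch and not glibc's code: model_status stands in
for h_errno, and every name in it (model_lookup, struct model_service, the
MODEL_* and RESULT_* values, the reset_status switch) is invented for
illustration.

```c
#include <stdbool.h>
#include <stddef.h>

enum { MODEL_OK = 0, MODEL_INTERNAL = -1 };   /* -1 as for NETDB_INTERNAL */
enum { RESULT_SUCCESS = 0, RESULT_NONAME = -2, RESULT_SYSTEM = -11 };

/* Stand-in for h_errno: it survives from one lookup to the next. */
static int model_status = MODEL_OK;

struct model_service {
    const char *name;   /* e.g. "files", "mdns_minimal" */
    bool loadable;      /* false: the NSS library is missing */
    bool has_v4;        /* service knows a v4 address for the name */
    bool has_v6;        /* service knows a v6 address for the name */
};

/* Walk the services in order and stop at the first one that answers.
   A service that cannot be loaded records an internal error in
   model_status.  A successful answer does not touch model_status, so a
   stale internal error from an earlier call still wins at the end,
   unless reset_status asks for it to be cleared up front. */
static int model_lookup(const struct model_service *svc, size_t n,
                        bool want_v4, bool want_v6, bool reset_status)
{
    bool found = false;

    if (reset_status)
        model_status = MODEL_OK;

    for (size_t i = 0; i < n && !found; i++) {
        if (!svc[i].loadable) {
            model_status = MODEL_INTERNAL;
            continue;
        }
        if ((want_v4 && svc[i].has_v4) || (want_v6 && svc[i].has_v6))
            found = true;
    }

    if (model_status == MODEL_INTERNAL)
        return RESULT_SYSTEM;
    return found ? RESULT_SUCCESS : RESULT_NONAME;
}
```

With a "files" entry that only has a v4 address followed by an unloadable
"mdns_minimal" entry, a v6-only call returns RESULT_SYSTEM, and a later call
that wants v4 as well still returns RESULT_SYSTEM although "files" answered.
Passing reset_status=true on that later call yields RESULT_SUCCESS in this
model; whether clearing the status is the right fix for glibc itself is a
separate question.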

If another lookup is made that can be resolved via DNS but not /etc/hosts,
h_errno changes its value and getaddrinfo gets unstuck, but our application
only attempts to look up that single name and thus never hits DNS.

It appears that this exact failure case has already been mentioned as a theory
on the mailing list in [1], but looking at the current code in the git repo I
think it can still be triggered (getaddrinfo.c lines 1073-1077).

[1] https://sourceware.org/ml/libc-alpha/2017-08/msg00206.html
