Re: getaddrinfo chokes at hostnames containing "emoji" characters

On 05/16/2018 04:03 PM, Name Surname wrote:
Florian Weimer wrote:
On 05/16/2018 10:40 AM, Name Surname wrote:
Greetings everyone.

I recently bought a domain name containing "emoji" characters, as a
novelty and in order to do some experiments. I tried getting the IP
address associated with it using getaddrinfo; however, it fails and
returns "Name or service not known". The same thing happens with any
program that uses glibc for name resolution. I understand that emoji
domains are not valid according to IDNA2008; however, some ccTLDs sell
them, they were supported under IDNA2003, and web browsers resolve them
normally according to IDNA2003 (at least Firefox does).

Is this a bug or a feature?

In the near future, glibc will use the system libidn2 library to
implement AI_IDN getaddrinfo support.  You will have to convince the
libidn2 maintainers to enable Emoji support (by default), but as long as
there is no published standard for that at all (perhaps with the
exception of Unicode TR46 transitional mode, which is not recommended),
this seems difficult.

It seems that, according to the WHATWG URL standard, IDNs should be
processed as per IDNA2008:

  > Let result be the result of running Unicode ToASCII with
  > domain_name set to domain, UseSTD3ASCIIRules set to beStrict,
  > CheckHyphens set to false,
  > CheckBidi set to true, CheckJoiners set to true,
  > *processing_option set to Nontransitional_Processing*,
  > and VerifyDnsLength set to beStrict.


(Emphasis mine)

If I am understanding the standard correctly, then discussion of this
matter is moot, as this implies that emoji domains are not even
considered valid URLs.

Yes, Firefox implements something else. It generates a DNS request from <http://nä>, which is not allowed according to UseSTD3ASCIIRules. This is probably a specification bug.

But based on what I understand, IDNA with TR46 non-transitional processing does not actually allow emojis.
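
For experiments in the meantime, one possible workaround is to skip IDNA
processing entirely and hand the resolver the ASCII ("xn--") form of the
name, since that is an ordinary ASCII hostname as far as getaddrinfo is
concerned. The following is only a minimal sketch in Python, not
anything glibc or libidn2 provides: the helper names to_ace_label,
to_ace_hostname and resolve are made up for illustration, and the code
applies raw Punycode (RFC 3492) with none of the mapping, normalization
or validity checks that IDNA2003, IDNA2008 or TR46 would perform.

```python
import socket

ACE_PREFIX = "xn--"  # prefix that marks a Punycode-encoded label


def to_ace_label(label: str) -> str:
    """Encode a single DNS label with raw Punycode.

    Assumption: no IDNA mapping or validation is wanted, so non-ASCII
    labels are encoded exactly as given. ASCII labels are only lowercased.
    """
    if label.isascii():
        return label.lower()
    return ACE_PREFIX + label.encode("punycode").decode("ascii")


def to_ace_hostname(hostname: str) -> str:
    """Encode every dot-separated label of a hostname."""
    return ".".join(to_ace_label(part) for part in hostname.split("."))


def resolve(hostname: str):
    """Hypothetical helper: resolve the pre-encoded name via getaddrinfo.

    This performs a real DNS lookup, so it is not called in this sketch.
    """
    return socket.getaddrinfo(to_ace_hostname(hostname), None)
```

With this, to_ace_hostname("☃.net") yields "xn--n3h.net", which can be
passed to getaddrinfo (or any glibc-based tool) directly. Whether the
registry actually delegates such a name is a separate question.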


