This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.


Re: Is it OK to write ASCII strings directly into locale source files?

From: Florian Weimer <fw at deneb dot enyo dot de>
To: Carlos O'Donell <carlos at redhat dot com>
Cc: Mike FABIAN <mfabian at redhat dot com>, Andreas Schwab <schwab at suse dot de>, libc-alpha at sourceware dot org
Date: Tue, 25 Jul 2017 16:37:03 +0200
Subject: Re: Is it OK to write ASCII strings directly into locale source files?
Authentication-results: sourceware.org; auth=none
References: <s9d8tje9e1k.fsf@redhat.com> <5f71f2f6-be0e-2b5d-91ce-03386eafa7f7@redhat.com> <mvmy3rdx577.fsf@suse.de> <87h8y13gvb.fsf@mid.deneb.enyo.de> <e43a088a-cb33-c322-7587-c20d993e7fa6@redhat.com> <87379lczdi.fsf@mid.deneb.enyo.de> <7fa0552d-c24b-3c5c-cad3-1359eb4dd6bd@redhat.com> <s9dbmo9xcjq.fsf@redhat.com> <f550c8ca-da3f-7fbb-e54a-d372fea36a9d@redhat.com> <87mv7sbo75.fsf@mid.deneb.enyo.de> <a5d4bdc3-163e-a925-0425-c5130f8d3b76@redhat.com>

* Carlos O'Donell:

>>> However, I caution against throwing away the compatibility of our locales
>>> with POSIX, which doesn't seem to allow UTF-8 in the specification.
>> 
>> It does, to some extent:
>> 
>> | A character in the portable character set can be represented by the
>> | character itself, in which case the value of the character is
>> | implementation-defined. (Implementations may allow other characters
>> | to be represented as themselves, but such locale definitions are not
>> | portable.)
>> 
>> You'll need a very hostile interpretation to say that this doesn't
>> allow multi-byte character sequences in localedef input.
>
> I see what you're saying, which is that we are *still* POSIX compliant,
> but not portable?

Right, and I think that's okay because the glibc locales are for
glibc.

> I assume we are focusing on the "()" text which allows some kind of escape
> hatch outside of the portable character set and allows us to use UTF-8?

Exactly.

>> But I found this in the guts of localedef:
>> 
>> 	      /* The standards leave it up to the implementation to decide
>> 		 what to do with character which stand for themself.  We
>> 		 could jump through hoops to find out the value relative to
>> 		 the charmap and the repertoire map, but instead we leave
>> 		 it up to the locale definition author to write a better
>> 		 definition.  We assume here that every character which
>> 		 stands for itself is encoded using ISO 8859-1.  Using the
>> 		 escape character is allowed.  */
>> 
>> So we currently hard-code ISO 8859-1 (not UTF-8) to avoid the
>> bootstrapping problem.
>  
> We could just assume UTF-8, but yes, it looks like this needs a little bit
> more looking into.

Yes, and we don't have a real bootstrapping problem because while we
have a charmap file for UTF-8, we have a separate UTF-8 implementation
in iconv/gconv, and we could use that to break the loop.
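
To illustrate why the assumed encoding matters, here is a minimal
sketch in Python. It is not localedef's actual code; the function
names are hypothetical. It only shows that the same bytes in a locale
source file give different character values depending on whether they
are read as ISO 8859-1 or as UTF-8, while pure ASCII input is
unaffected:

```python
def decode_as_latin1(raw):
    """Treat every byte as one ISO 8859-1 character.

    In ISO 8859-1 the byte value equals the code point, which is
    roughly what the quoted localedef comment describes.
    """
    return list(raw)


def decode_as_utf8(raw):
    """Decode the bytes as UTF-8 and return the resulting code points."""
    return [ord(ch) for ch in raw.decode("utf-8")]


# U+00E4 (a with diaeresis) written literally in a UTF-8 source file.
raw = b"\xc3\xa4"

print([hex(c) for c in decode_as_latin1(raw)])  # two characters
print([hex(c) for c in decode_as_utf8(raw)])    # one character
```

For bytes in the ASCII range both readings agree, which is why
restricting locale sources to the portable character set sidesteps
the question.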

> Either way, I support using the portable character set today, and that's
> a step forward.

Agreed.
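
A check along these lines could flag locale source lines that go
beyond the portable character set. This is only a sketch: the set
below is an approximation of the POSIX portable character set (the
listed control characters plus printable ASCII), and the helper name
and the sample lines are made up for illustration.

```python
# Assumption: NUL, alert, backspace, tab, newline, vertical tab,
# form feed, carriage return, plus space and the printable ASCII range.
PORTABLE = frozenset(b"\0\a\b\t\n\v\f\r") | frozenset(range(0x20, 0x7F))


def non_portable_bytes(line):
    """Return (offset, byte value) pairs for bytes outside PORTABLE."""
    return [(i, b) for i, b in enumerate(line) if b not in PORTABLE]


# Hypothetical locale source lines, given as raw bytes.
print(non_portable_bytes(b'mon "January"'))
print(non_portable_bytes(b'mon "M\xc3\xa4rz"'))
```

An empty result means the line can be read the same way regardless of
which encoding localedef assumes for literal characters.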

