This is the mail archive of the
libc-locales@sourceware.org
mailing list for the GNU libc locales project.
Re: locale encodings
- From: Keld Simonsen <keld at keldix dot com>
- To: Carlos O'Donell <carlos at redhat dot com>
- Cc: Troy Korjuslommi <tjk at tksoft dot com>, Steven Abner <pheonix at zoomtown dot com>, libc-locales at sourceware dot org
- Date: Tue, 12 Nov 2013 18:11:04 +0200
- Subject: Re: locale encodings
- References: <31AACAB8-A716-47CC-B755-F33DD77BA51E at zoomtown dot com> <1384174607 dot 4028 dot 8 dot camel at uno11 dot loco> <20131112012257 dot GA31828 at rap dot rap dot dk> <5281BEB1 dot 2010909 at redhat dot com> <20131112133642 dot GA22738 at rap dot rap dot dk> <52823D8C dot 5060309 at redhat dot com>
On Tue, Nov 12, 2013 at 09:39:08AM -0500, Carlos O'Donell wrote:
> On 11/12/2013 08:36 AM, Keld Simonsen wrote:
> > On Tue, Nov 12, 2013 at 12:37:53AM -0500, Carlos O'Donell wrote:
> >> On 11/11/2013 08:22 PM, Keld Simonsen wrote:
> >>> Well, the encoding of the source code of all locales should be 7-bit ASCII, for
> >>> maximum portability. Then the target encoding should be recorded via the
> >>> % charset specification, which gives a list of possible charsets, comma separated.
> >>> UTF-8 should always be included there, but other encodings should also be available.
> >>
> >> So one of the points that we've been trying to gather consensus on is:
> >> Is it really important to have 7-bit ASCII? Why not use UTF-8 for the
> >> locale source? It's readily readable by all editors and allows
> >> language-specific comments in the source files for maximum maintainability.
> >
> > I think using UTF-8 is a bad idea, e.g. for embedded systems, and for systems that is
> > not maintained in UTF-8. It can also cause trouble when communicating the source.
>
> Sorry, could you please expand on that?
>
> Do you have examples of embedded systems that use glibc locale source and
> don't support UTF-8? All such embedded systems that I know of run Linux
> and do support UTF-8.
No, I don't have examples of embedded systems that do not run UTF-8.
But I believe they are out there: TV sets, routers and the like.
And non-Linux systems. libc can run on many platforms, not just Linux.
> What do you mean by "systems that is [sic] not maintained in UTF-8?"
Many Linux systems do not run UTF-8 natively. My own, for example.
And then there are all the UTF-16 and UCS-2 systems. Think Apple.
> What kind of problems do you foresee when communicating the source?
On some IBM systems even some ASCII characters are converted wrongly. Hence the use of %
as the comment character instead of #. On some printers # is printed wrongly.
And so on. In Japan \ is sometimes printed wrongly. In my own country
Ø is sometimes printed wrongly. If we go to full UCS, then many printers
do not support full UCS. Even with fonts, many do not support full UCS,
and certainly not the latest version of ISO 10646.
Even if a character is correctly displayed, it can be difficult to see
which character it is, out of the more than 100,000 characters in ISO 10646.
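To illustrate the point about look-alike characters, here is a minimal sketch in Python that lists every non-ASCII character in a piece of text together with its code point and Unicode name. The function name describe_non_ascii and the sample string are hypothetical, chosen only for this illustration; the sketch is not part of any glibc tool.

```python
import unicodedata


def describe_non_ascii(text):
    """Return (line, column, code point, name) for each non-ASCII character."""
    findings = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ord(ch) > 0x7F:
                # Characters without a Unicode name get a placeholder.
                name = unicodedata.name(ch, "<unnamed>")
                findings.append((line_no, col, f"U+{ord(ch):04X}", name))
    return findings


# Three characters that look nearly identical in many fonts.
sample = "\u00d8 \u2205 \u2300"
for finding in describe_non_ascii(sample):
    print(finding)
```

The three sample characters are reported as LATIN CAPITAL LETTER O WITH STROKE, EMPTY SET and DIAMETER SIGN, which is hard to tell from the glyphs alone.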
Many of our sources restrict themselves to a limited subset of ASCII, for the same reasons.
This includes ISO 14652 and ISO 30112. I also believe the Unicode tables do the same.
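As a rough sketch of how an ASCII-only source can still carry any character, the following Python snippet rewrites non-ASCII characters in the <UXXXX> symbolic notation used in glibc locale sources. The function name to_ucs_notation is hypothetical, and the eight-digit form for code points above U+FFFF is an assumption of this sketch rather than something stated in this thread.

```python
def to_ucs_notation(text):
    """Replace every non-ASCII character with a <UXXXX> symbolic name."""
    out = []
    for ch in text:
        cp = ord(ch)
        if cp <= 0x7F:
            out.append(ch)                # 7-bit ASCII passes through unchanged
        elif cp <= 0xFFFF:
            out.append(f"<U{cp:04X}>")    # four hex digits within the BMP
        else:
            out.append(f"<U{cp:08X}>")    # assumed eight-digit form above U+FFFF
    return "".join(out)


print(to_ucs_notation("s\u00f8ndag"))  # s<U00F8>ndag
```

The result contains only 7-bit ASCII, so it is not affected by the conversion and printing problems described above.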
Best regards
keld