This is the mail archive of the libc-locales@sourceware.org mailing list for the GNU libc locales project.



Re: locale encodings


On Tue, Nov 12, 2013 at 09:39:08AM -0500, Carlos O'Donell wrote:
> On 11/12/2013 08:36 AM, Keld Simonsen wrote:
> > On Tue, Nov 12, 2013 at 12:37:53AM -0500, Carlos O'Donell wrote:
> >> On 11/11/2013 08:22 PM, Keld Simonsen wrote:
> >>> Well, the encoding of the source code of all locales should be 7-bit ASCII, for
> >>> maximum portability. Then the target encoding should be recorded via the 
> >>> % charset specification, which gives a list of possible charsets, comma separated.
> >>> UTF-8 should always be included there, but other encodings should also be available.
> >>
> >> So one of the points that we've been trying to gather consensus on is:
> >> Is it really important to have 7-bit ASCII? Why not use UTF-8 for
> >> the locale source? It's readily readable by all editors and allows
> >> language-specific comments in the source files for maximum maintainability.
> > 
> > I think using UTF-8 is a bad idea, e.g. for embedded systems, and for systems that is
> > not maintained in UTF-8. It can also cause trouble when communicating the source.
> 
> Sorry, could you please expand on that?
> 
> Do you have examples of embedded systems that use glibc locale source and
> don't support UTF-8? All such embedded systems that I know of run Linux
> and do support UTF-8.

No, I don't have examples of embedded systems that do not run in UTF-8.
But I believe they are out there: TV sets, routers and the like.
And non-Linux systems. libc can run on many platforms, not just Linux.

> What do you mean by "systems that is [sic] not maintained in UTF-8?"

Many Linux systems do not run UTF-8 natively. My own, for example.
And then there are all the UTF-16 and UCS-2 systems. Think Apple.

> What kind of problems do you foresee when communicating the source?

On some IBM systems even some ASCII characters are converted wrongly. Hence the use of %
as a comment character instead of #. On some printers # is printed wrongly.
And so on. In Japan \ is sometimes printed wrongly. In my own country
Ø is sometimes printed wrongly. If we go to full UCS, then many printers
do not support full UCS. Even among fonts, many do not support full UCS,
and certainly not the latest version of ISO 10646.

Even if a character is displayed correctly, it can be difficult to see
which character it is, out of the more than 100,000 characters in ISO 10646.

Many of our sources restrict themselves to a limited ASCII repertoire, for the same reasons.
This includes ISO 14652 and ISO 30112. I also believe the Unicode tables do the same.
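
As an illustration of the ASCII-only approach, here is a minimal sketch in
Python. It checks that some source bytes are pure 7-bit ASCII, and it rewrites
non-ASCII characters as <UXXXX> symbolic names of the kind used in locale
sources. The function names are hypothetical, and the use of eight hex digits
for code points above U+FFFF is an assumption of this sketch, not something
stated in this thread.

```python
def is_7bit_ascii(data: bytes) -> bool:
    """Return True if every byte is in the 7-bit ASCII range (0x00-0x7F)."""
    return all(b < 0x80 for b in data)


def to_symbolic(text: str) -> str:
    """Replace each non-ASCII character with a <UXXXX> symbolic name.

    ASCII characters are passed through unchanged. Code points above
    U+FFFF are written with eight hex digits (an assumption of this sketch).
    """
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 0x80:
            out.append(ch)
        elif cp <= 0xFFFF:
            out.append(f"<U{cp:04X}>")
        else:
            out.append(f"<U{cp:08X}>")
    return "".join(out)


# Example: a word containing one non-ASCII letter.
encoded = to_symbolic("søndag")
print(encoded)
print(is_7bit_ascii(encoded.encode("ascii")))
```

The result contains only ASCII bytes, so it survives transport through
systems and printers that mishandle anything outside that range.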

Best regards
keld

