This is the mail archive of the glibc-bugs@sourceware.org mailing list for the glibc project.

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]
Other format:	[Raw text]

[Bug localedata/22387] Replace unicode sequences <Uxxxx> for characters inside the ASCII printable range

From: "keld at keldix dot com" <sourceware-bugzilla at sourceware dot org>
To: glibc-bugs at sourceware dot org
Date: Thu, 09 Nov 2017 10:19:17 +0000
Subject: [Bug localedata/22387] Replace unicode sequences <Uxxxx> for characters inside the ASCII printable range
Auto-submitted: auto-generated
References: <bug-22387-131@http.sourceware.org/bugzilla/>

https://sourceware.org/bugzilla/show_bug.cgi?id=22387

--- Comment #20 from keld at keldix dot com <keld at keldix dot com> ---
On Fri, Nov 03, 2017 at 09:56:16AM +0000, egmont at gmail dot com wrote:
> https://sourceware.org/bugzilla/show_bug.cgi?id=22387
> 
> --- Comment #11 from Egmont Koblinger <egmont at gmail dot com> ---
> I don't understand the EBCDIC worries at all.
> 
> These locale definition files are in ASCII. If you interpret these same files
> in EBCDIC, section names and property names don't make any sense, and neither
> do encoded characters such as "<U0020>", I mean it's no longer
> less/greater-than, uppercase U and digits.

Yes all source files should be converted from Ascii to the ebcdic in question.
This is also the case on UTF-16 systems, the source files should be converted
from some sort of ascii compatible encoding to UTF-16. Or the other way - if
you
move sources from a non ascii-compatible system to an ascii-compatible system.

This process can be done automatically using eg iconv.

> Then, if you iconv the file, the resulting <U0020> and friends still define
> Unicode codepoints and not EBCDIC ones.

No they are not unicode (or UCS) codepoints. When you compile the locale into a
binary
format, then you apply an EBCDIC charmap, and the symbolic <uxxxx> character
names get
encoded according to the EBCDIC encoding applied by localedef -f option
question.

> So, in order to use these files in an EBCDIC environment, they need to be
> converted on two different levels.

No, only one level of conversion is needed and that can be fully automated.

> This does not become any harder or any more complicated by allowing plain ASCII
> characters.

Well, not so, if you operate in an environment with a source encoding different
from the ebcdic target encoding, and vice versa. 

best regards
Keld

-- 
You are receiving this mail because:
You are on the CC list for the bug.

References:
- [Bug localedata/22387] New: Replace unicode sequences <Uxxxx> for characters inside the ASCII printable range
  - From: claude at 2xlibre dot net

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]