This is the mail archive of the
glibc-bugs@sourceware.org
mailing list for the glibc project.
[Bug localedata/22387] Replace unicode sequences <Uxxxx> for characters inside the ASCII printable range
- From: "keld at keldix dot com" <sourceware-bugzilla at sourceware dot org>
- To: glibc-bugs at sourceware dot org
- Date: Thu, 09 Nov 2017 10:19:17 +0000
- Subject: [Bug localedata/22387] Replace unicode sequences <Uxxxx> for characters inside the ASCII printable range
- Auto-submitted: auto-generated
- References: <bug-22387-131@http.sourceware.org/bugzilla/>
https://sourceware.org/bugzilla/show_bug.cgi?id=22387
--- Comment #20 from keld at keldix dot com <keld at keldix dot com> ---
On Fri, Nov 03, 2017 at 09:56:16AM +0000, egmont at gmail dot com wrote:
> https://sourceware.org/bugzilla/show_bug.cgi?id=22387
>
> --- Comment #11 from Egmont Koblinger <egmont at gmail dot com> ---
> I don't understand the EBCDIC worries at all.
>
> These locale definition files are in ASCII. If you interpret these same files
> in EBCDIC, section names and property names don't make any sense, and neither
> do encoded characters such as "<U0020>", I mean it's no longer
> less/greater-than, uppercase U and digits.
Yes, all source files should be converted from ASCII to the EBCDIC encoding in
question. This is also the case on UTF-16 systems: the source files should be
converted from some ASCII-compatible encoding to UTF-16. The same applies the
other way, if you move sources from a non-ASCII-compatible system to an
ASCII-compatible system.
This process can be done automatically, e.g. using iconv.
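To illustrate the kind of automatic conversion meant here, the following is a minimal sketch in Python rather than iconv itself. The sample line and the recode() helper are hypothetical, and Python's cp037 codec is only assumed as a stand-in for "the EBCDIC in question":

```python
# Hypothetical fragment of a locale source file (not taken from glibc).
SOURCE_LINE = 'example_keyword "<U0020>"'


def recode(data, src, dst):
    """Re-encode bytes from src to dst, roughly like `iconv -f src -t dst`."""
    return data.decode(src).encode(dst)


ascii_bytes = SOURCE_LINE.encode("ascii")

# ASCII -> EBCDIC (cp037 assumed as the target code page).
ebcdic_bytes = recode(ascii_bytes, "ascii", "cp037")

# ASCII -> UTF-16 (little endian, no byte order mark).
utf16_bytes = recode(ascii_bytes, "ascii", "utf-16-le")

# The conversion is reversible, so sources can also be moved the other way.
round_trip = recode(ebcdic_bytes, "cp037", "ascii")
```

After the conversion the text "<U0020>" is still the same seven characters; only the bytes that represent them have changed.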
> Then, if you iconv the file, the resulting <U0020> and friends still define
> Unicode codepoints and not EBCDIC ones.
No, they are not Unicode (or UCS) code points. When you compile the locale into
a binary format, you apply an EBCDIC charmap, and the symbolic <Uxxxx>
character names get encoded according to the EBCDIC charmap in question, given
with the localedef -f option.
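A rough sketch of that lookup, under stated assumptions: this is a toy model, not localedef's actual implementation. build_charmap() and resolve() are hypothetical helpers, and the tables are derived from Python's ascii and cp037 codecs instead of being read from real charmap files:

```python
import re

# Matches symbolic character names of the form <Uxxxx>.
SYMBOL = re.compile(r"<U[0-9A-Fa-f]{4}>", re.IGNORECASE)


def build_charmap(codec, code_points):
    """Toy charmap: symbolic name -> bytes in the given encoding."""
    return {"<U%04X>" % cp: chr(cp).encode(codec) for cp in code_points}


def resolve(text, charmap):
    """Encode every symbolic name in text by looking it up in the charmap."""
    names = (m.group(0).upper() for m in SYMBOL.finditer(text))
    return b"".join(charmap[name] for name in names)


printable = range(0x20, 0x7F)
ascii_map = build_charmap("ascii", printable)
ebcdic_map = build_charmap("cp037", printable)  # cp037 is an assumed example

# The same symbolic names yield different bytes depending on the charmap.
as_ascii = resolve("<U0041><U0020>", ascii_map)    # b"A "
as_ebcdic = resolve("<U0041><U0020>", ebcdic_map)  # b"\xc1\x40"
```

In this model the symbolic name acts purely as a key into whichever charmap is applied at compile time.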
> So, in order to use these files in an EBCDIC environment, they need to be
> converted on two different levels.
No, only one level of conversion is needed and that can be fully automated.
> This does not become any harder or any more complicated by allowing plain ASCII
> characters.
Well, not so if you operate in an environment whose source encoding differs
from the EBCDIC target encoding, or vice versa.
best regards
Keld