This is the mail archive of the libc-locales@sourceware.org mailing list for the GNU libc locales project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [Bug localedata/22387] Replace unicode sequences <Uxxxx> for characters inside the ASCII printable range


On Fri, Nov 03, 2017 at 09:56:16AM +0000, egmont at gmail dot com wrote:
> https://sourceware.org/bugzilla/show_bug.cgi?id=22387
> 
> --- Comment #11 from Egmont Koblinger <egmont at gmail dot com> ---
> I don't understand the EBCDIC worries at all.
> 
> These locale definition files are in ASCII. If you interpret these same files
> in EBCDIC, section names and property names don't make any sense, and neither
> do encoded characters such as "<U0020>", I mean it's no longer
> less/greater-than, uppercase U and digits.

Yes all source files should be converted from Ascii to the ebcdic in question.
This is also the case on UTF-16 systems, the source files should be converted
from some sort of ascii compatible encoding to UTF-16. Or the other way - if you
move sources from a non ascii-compatible system to an ascii-compatible system.

This process can be done automatically using eg iconv.

> Then, if you iconv the file, the resulting <U0020> and friends still define
> Unicode codepoints and not EBCDIC ones.

No they are not unicode (or UCS) codepoints. When you compile the locale into a binary
format, then you apply an EBCDIC charmap, and the symbolic <uxxxx> character names get
encoded according to the EBCDIC encoding applied by localedef -f option question.

> So, in order to use these files in an EBCDIC environment, they need to be
> converted on two different levels.

No, only one level of conversion is needed and that can be fully automated.

> This does not become any harder or any more complicated by allowing plain ASCII
> characters.

Well, not so, if you operate in an environment with a source encoding different
from the ebcdic target encoding, and vice versa. 

best regards
Keld


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]