This is the mail archive of the libc-locales@sourceware.org mailing list for the GNU libc locales project.



[Bug localedata/22387] Replace unicode sequences <Uxxxx> for characters inside the ASCII printable range


https://sourceware.org/bugzilla/show_bug.cgi?id=22387

--- Comment #27 from Egmont Koblinger <egmont at gmail dot com> ---
(In reply to keld@keldix.com from comment #25)

> This commit is highly problematic, damaging the portability of glibc locales.

If this kind of portability is really a concern, someone could come up with a
script that converts from the new version to the old one. It could even be
integrated with the build system to the level where these generated files are
actually placed under BUILD and then further processed.
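
To make the suggestion concrete, here is a minimal sketch in Python of what
such a back-conversion could look like. It handles only the contents of a
single string value; a real script would also have to respect comments,
keywords and the escape character of the locale file format, none of which is
attempted here. The name `to_uxxxx` is made up for illustration, and writing
code points above U+FFFF with eight hex digits is an assumption of this sketch.

```python
import re

# Matches an already-escaped <Uxxxx> or <Uxxxxxxxx> token so it can be
# passed through unchanged.
UCS_TOKEN = re.compile(r"<U[0-9A-Fa-f]{4}(?:[0-9A-Fa-f]{4})?>")


def to_uxxxx(text):
    """Rewrite every literal character in `text` as a <Uxxxx> token.

    Hypothetical helper: tokens that are already in <Uxxxx> form are kept
    as they are, everything else is escaped character by character.
    """
    out = []
    pos = 0
    while pos < len(text):
        m = UCS_TOKEN.match(text, pos)
        if m:
            # Existing token: copy it verbatim and skip past it.
            out.append(m.group(0))
            pos = m.end()
        else:
            cp = ord(text[pos])
            # Assumption: 4 hex digits up to U+FFFF, 8 hex digits above.
            out.append("<U%04X>" % cp if cp <= 0xFFFF else "<U%08X>" % cp)
            pos += 1
    return "".join(out)


print(to_uxxxx("h<U00E9>tf<U0151>"))
```

Fed either the mixed form or raw UTF-8 text, a helper like this would emit
the fully escaped form, so the old-style files could be generated rather than
maintained by hand.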

I wish the current change went even further, towards raw UTF-8 at least
for printable and "non-problematic" (by some vague, arbitrary definition)
characters.

I have on a few occasions made some minor edits to affected parts of a locale
file; dealing with the <Uxxxx> notation was a nightmare. Working with a string
like "h<U00E9>tf<U0151>" is already much better than
"<U0068><U00E9><U0074><U0066><U0151>", but seeing "hétfő" would be ideal.
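
The three forms above can be derived from one another mechanically. The
following Python sketch decodes <Uxxxx> tokens, either all of them or only
those in the printable ASCII range. The name `from_uxxxx`, the `ascii_only`
flag and the `UNSAFE` set are hypothetical; in particular, which ASCII
characters count as "problematic" is a guess made for illustration, not
something taken from the actual commit.

```python
import re

# Try the 8-digit form first so <Uxxxxxxxx> is not mistaken for <Uxxxx>.
UCS_TOKEN = re.compile(r"<U([0-9A-Fa-f]{8}|[0-9A-Fa-f]{4})>")

# Assumption: characters kept escaped even though they are printable ASCII,
# because they could clash with the file syntax.
UNSAFE = set('"<>')


def from_uxxxx(text, ascii_only=False):
    """Replace <Uxxxx> tokens in `text` by the characters they stand for.

    With ascii_only=True, only printable ASCII characters outside UNSAFE
    are decoded; every other token is left untouched.
    """
    def decode(m):
        ch = chr(int(m.group(1), 16))
        if ascii_only and not (" " <= ch <= "~" and ch not in UNSAFE):
            return m.group(0)
        return ch

    return UCS_TOKEN.sub(decode, text)


escaped = "<U0068><U00E9><U0074><U0066><U0151>"
print(from_uxxxx(escaped, ascii_only=True))
print(from_uxxxx(escaped))
```

With `ascii_only=True` the sketch yields the mixed form, and without it the
raw UTF-8 text.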

Source code is meant to be human-readable, which all these <Uxxxx> sequences
most certainly are not.

There's a reason people write code like
  printf("Hello world!\n");
and not
  printf("\x48\x65\x6c\x6c\x6f\x20\x77\x6f\x72\x6c\x64\x21\x0a");

If for whatever reason the latter, hard-to-read (and hard-to-write) form is
required, it should be auto-generated from the former, easy-to-read (and
easy-to-write) one.

