This is the mail archive of the
glibc-bugs@sourceware.org
mailing list for the glibc project.
[Bug localedata/22387] Replace unicode sequences <Uxxxx> for characters inside the ASCII printable range
- From: "egmont at gmail dot com" <sourceware-bugzilla at sourceware dot org>
- To: glibc-bugs at sourceware dot org
- Date: Thu, 02 Nov 2017 20:40:12 +0000
- Subject: [Bug localedata/22387] Replace unicode sequences <Uxxxx> for characters inside the ASCII printable range
- Auto-submitted: auto-generated
- References: <bug-22387-131@http.sourceware.org/bugzilla/>
https://sourceware.org/bugzilla/show_bug.cgi?id=22387
Egmont Koblinger <egmont at gmail dot com> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |egmont at gmail dot com
--- Comment #6 from Egmont Koblinger <egmont at gmail dot com> ---
(In reply to Andreas Schwab from comment #5)
> See
> <http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap07.html#tag_07_03>
> for the full rules.
Bullet point 2 here says "Within a string, the double-quote character, the
escape character, and the right angle bracket character shall be escaped [...]"
Why not the left angle bracket too? Otherwise you can't tell for sure whether
"<U+0020>" stands for a space, or for literal
lessthan-you-plus-oh-oh-two-oh-greaterthan.
I think it doesn't hurt to remain a bit safer with special characters, e.g.
escape the comma, semicolon, less-than, greater-than, backslash, and whatever
the escape character is (typically overridden to slash in locale files),
everywhere.
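A minimal sketch of the conservative escaping suggested above, assuming the escape character has been overridden to "/". The function name, its parameters, and the exact set of escaped characters are hypothetical; this illustrates the proposal, not what POSIX requires or what localedef currently accepts.

```python
def escape_specials(text, escape_char="/", specials=',;<>\\"'):
    """Prefix every special character with the escape character.

    The set of specials is an assumption taken from the list in the
    comment above (plus the double quote named in the POSIX text);
    the escape character itself is always escaped as well.
    """
    to_escape = set(specials) | {escape_char}
    return "".join(escape_char + ch if ch in to_escape else ch for ch in text)


# A literal "<U0020>" would then be written unambiguously:
print(escape_specials("<U0020>"))  # /<U0020/>
```

With the left angle bracket escaped as well, an unescaped "<U0020>" can only mean the symbolic name for a space.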
---
On the other hand, what about non-ASCII characters? Are they allowed as raw
UTF-8, or do they still need to be escaped? Allowing raw UTF-8, such as a
weekday name of "hétfő" rather than "h<U00E9>tf<U0151>", would greatly improve
the readability of the file.
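To illustrate the readability difference, here is a small sketch that expands <Uxxxx> sequences into the characters they name. The helper name and the regular expression are assumptions made for this example (4 to 8 hex digits are accepted, and values above U+10FFFF are not checked); it is not how localedef parses its input.

```python
import re

# Matches symbolic names such as <U00E9>; the accepted digit count
# (4 to 8 hex digits) is an assumption made for this sketch.
_UCS_NAME = re.compile(r"<U([0-9A-Fa-f]{4,8})>")


def decode_ucs_names(text: str) -> str:
    """Replace every <Uxxxx> sequence with the character it names."""
    return _UCS_NAME.sub(lambda m: chr(int(m.group(1), 16)), text)


print(decode_ucs_names("h<U00E9>tf<U0151>"))  # hétfő
```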