This is the mail archive of the glibc-bugs@sourceware.org mailing list for the glibc project.


[Bug localedata/22387] Replace unicode sequences <Uxxxx> for characters inside the ASCII printable range

From: "egmont at gmail dot com" <sourceware-bugzilla at sourceware dot org>
To: glibc-bugs at sourceware dot org
Date: Thu, 02 Nov 2017 20:40:12 +0000
Subject: [Bug localedata/22387] Replace unicode sequences <Uxxxx> for characters inside the ASCII printable range
Auto-submitted: auto-generated
References: <bug-22387-131@http.sourceware.org/bugzilla/>

https://sourceware.org/bugzilla/show_bug.cgi?id=22387

Egmont Koblinger <egmont at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |egmont at gmail dot com

--- Comment #6 from Egmont Koblinger <egmont at gmail dot com> ---
(In reply to Andreas Schwab from comment #5)

> See
> <http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap07.html#tag_07_03>
> for the full rules.

Bullet point 2 here says "Within a string, the double-quote character, the
escape character, and the right angle bracket character shall be escaped [...]"

Why not the left angle bracket too? Otherwise you can't tell for sure whether
"<U+0020>" stands for a space, or for literal
lessthan-you-plus-oh-oh-two-oh-greaterthan.

I think it doesn't hurt to be a bit safer with special characters, e.g.
always escape the comma, semicolon, less-than, greater-than, backslash, and
whatever the escape character is (typically overridden to slash in locale
files).
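
To illustrate the "escape everything special" idea above, here is a minimal
Python sketch. The helper name escape_locale_string is hypothetical and is not
part of glibc or localedef; the set of escaped characters is the one listed
above plus the double quote that POSIX already requires, and the default
escape character "/" is an assumption based on the usual override in locale
files.

```python
def escape_locale_string(text: str, escape_char: str = "/") -> str:
    """Prefix every special character in text with escape_char.

    Hypothetical helper for illustration only; it does not reproduce
    what localedef or any glibc tool actually does.
    """
    # Characters suggested in the comment above, plus the double quote
    # (required by POSIX) and the escape character itself.
    specials = set(',;<>\\"') | {escape_char}
    return "".join(escape_char + ch if ch in specials else ch for ch in text)


if __name__ == "__main__":
    # A literal "<U0020>" no longer looks like a symbolic space.
    print(escape_locale_string("<U0020>"))
```

With both angle brackets escaped, an unescaped "<" can only start a symbolic
name, so the ambiguity described above goes away.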

---

On the other hand, what about non-ASCII characters? Are they allowed as raw
UTF-8, or do they still need to be escaped? Allowing raw UTF-8, such as a
weekday name of "hétfő" rather than "h<U00E9>tf<U0151>" would greatly improve
readability of the file.
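
As a rough illustration of the readability difference, the following Python
sketch expands <Uxxxx> sequences into the characters they stand for. The
function name decode_ucs_sequences is made up for this example, and the code
assumes that every "<U" followed by 4 to 8 hex digits and ">" is a symbolic
name; it ignores escape characters and does not validate code points, so it is
not a faithful locale-file parser.

```python
import re

# Matches <Uxxxx> with 4 to 8 hexadecimal digits (assumed syntax).
_UCS_SEQUENCE = re.compile(r"<U([0-9A-Fa-f]{4,8})>")


def decode_ucs_sequences(text: str) -> str:
    """Replace each <Uxxxx> sequence with the corresponding character.

    Illustrative only: escaped brackets are not handled.
    """
    return _UCS_SEQUENCE.sub(lambda m: chr(int(m.group(1), 16)), text)


if __name__ == "__main__":
    print(decode_ucs_sequences("h<U00E9>tf<U0151>"))  # hétfő
```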


References:
- [Bug localedata/22387] New: Replace unicode sequences <Uxxxx> for characters inside the ASCII printable range
  - From: claude at 2xlibre dot net
