This is the mail archive of the
libc-locales@sourceware.org
mailing list for the GNU libc locales project.
[Bug localedata/22073] charmaps/UTF-8: wcwidth of U+00AD (soft hyphen): 0 or 1 ?
- From: "tg at mirbsd dot de" <sourceware-bugzilla at sourceware dot org>
- To: libc-locales at sourceware dot org
- Date: Wed, 06 Sep 2017 14:38:06 +0000
- Subject: [Bug localedata/22073] charmaps/UTF-8: wcwidth of U+00AD (soft hyphen): 0 or 1 ?
- Auto-submitted: auto-generated
- References: <bug-22073-716@http.sourceware.org/bugzilla/>
https://sourceware.org/bugzilla/show_bug.cgi?id=22073
--- Comment #6 from Thorsten Glaser <tg at mirbsd dot de> ---
Unicode does NOT define the column width of a char in the terminal. This shows
in all those mailing list threads, in which they basically assume all fonts to
be proportional.
wcwidth() however basically *is* the column width of a char in the terminal in
a fixed-width cell layout.
The cōnsēnsus seems to be to ask _users_ to avoid using U+00AD because of the two
different histories in interpretation, and use something else for the separate
purposes. That leaves us with needing a definition for this char *should* it
appear anywhere still.
I’m arguing for 1 because:
• 0 is for combining characters and NUL only
• the “possible soft hyphen” reading of U+00AD is not a combining character
• most importantly, it keeps compatibility with previous/older/other wcwidth()
implementations
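
The rule argued for above can be sketched in C. This is only an illustration
under stated assumptions, not glibc's wcwidth(): proposed_width() and
is_combining() are hypothetical names, the combining table is deliberately
incomplete (two well-known blocks only), and double-width characters and the
-1 result for control characters are left out to keep the focus on U+00AD.

```c
#include <stdbool.h>
#include <stddef.h>
#include <wchar.h>

/* Hypothetical, deliberately incomplete table of combining ranges. */
struct range {
    wchar_t first;
    wchar_t last;
};

static const struct range combining[] = {
    { 0x0300, 0x036F },  /* Combining Diacritical Marks */
    { 0x20D0, 0x20FF },  /* Combining Diacritical Marks for Symbols */
};

/* Returns true if wc falls inside one of the ranges listed above. */
static bool is_combining(wchar_t wc)
{
    for (size_t i = 0; i < sizeof combining / sizeof combining[0]; i++) {
        if (wc >= combining[i].first && wc <= combining[i].last)
            return true;
    }
    return false;
}

/*
 * Hypothetical helper, NOT the real wcwidth(): width 0 is reserved for
 * NUL and combining characters; everything else, including U+00AD
 * (soft hyphen), occupies one cell.
 */
static int proposed_width(wchar_t wc)
{
    if (wc == L'\0' || is_combining(wc))
        return 0;
    return 1;
}
```

With this rule, U+00AD is simply not special-cased: it is neither NUL nor in
the combining table, so it falls through to width 1 like any other spacing
character.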
The 0 faction should not lose out here because:
• The char should be avoided already *anyway*
• Terminal emulators never implement wrapping at a “possible soft hyphen”, only
at the end of the line
• Unicode data is still available elsewhere; this bug report is precisely about
wcwidth(), which only “almost” aligns with the various Unicode datas (yes, I
know, wrong plural, but I can’t think of anything better to express what I
mean, right now)