This is the mail archive of the libc-locales@sourceware.org mailing list for the GNU libc locales project.



[Bug localedata/22073] charmaps/UTF-8: wcwidth of U+00AD (soft hyphen): 0 or 1 ?


https://sourceware.org/bugzilla/show_bug.cgi?id=22073

--- Comment #8 from Mike Frysinger <vapier at gentoo dot org> ---
(In reply to Thorsten Glaser from comment #6)

i'm aware wcwidth isn't explicitly defined by Unicode standards, but that
doesn't mean they completely ignore it.  they discuss terminal emulators
multiple times (including the SHY FAQ), and it's why things like
EastAsianWidth.txt exist in the first place.  it's also pretty clear what the
current Unicode standard intends for this codepoint.

> • 0 is for combining characters and NUL only

that is incorrect.  you mishandle Prepended_Concatenation_Mark (see bug 22070),
and ignore Format (Cf) characters, which are all 0 (or you're incorrectly
claiming that Cf's are not combining characters).  and Cf is what U+00AD is
classified as.
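
to make the classification argument concrete, here is a minimal Python sketch.
it is not glibc's implementation: `sketch_wcwidth` is a hypothetical helper
that only assumes the rule discussed here (NUL, combining marks and Cf map to
0, East Asian Wide/Fullwidth map to 2, everything else printable maps to 1).
it deliberately ignores the Prepended_Concatenation_Mark exception from bug
22070, and its results depend on the Unicode version bundled with the Python
interpreter.

```python
import unicodedata


def sketch_wcwidth(ch: str) -> int:
    """Hypothetical column-width estimate for one code point (not glibc's)."""
    if ch == "\0":
        return 0  # NUL occupies no column
    category = unicodedata.category(ch)
    if category == "Cc":
        return -1  # other control characters are non-printable
    if category in ("Mn", "Me", "Cf"):
        return 0  # combining marks and format characters, e.g. U+00AD
    if unicodedata.east_asian_width(ch) in ("W", "F"):
        return 2  # East Asian Wide / Fullwidth
    return 1


# Show the General_Category and the sketched width for a few code points.
for ch in ("\u00ad", "\u0301", "a", "\u4e2d"):
    print(f"U+{ord(ch):04X} {unicodedata.category(ch)} {sketch_wcwidth(ch)}")
```

the only point of the sketch is that a width rule keyed on General_Category
gives 0 for U+00AD once it is Cf rather than Pd.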

> • the “possible soft hyphen” reading of U+00AD is not a combining character

except that it is.  if Unicode wanted it to be an explicit hyphen, they would
have kept its class as Pd (punctuation character), not changed it to Cf (format
control).  they also wouldn't have described it explicitly as:
"Soft Hyphen. Despite its name, U+00AD soft hyphen is not a hyphen, but rather
an invisible format character used to indicate optional intraword breaks."

> • compatibility with previous/older/other wcwidth() implementations, most
> importantly

appealing to historical wcwidth behavior isn't a great argument.  ones written
against older Unicode standards are definitely wrong across many codepoints
(emoji much?), and as i already mentioned, implementations converge on the
latest Unicode releases, all of which say this should be 0.

> • The char should be avoided already *anyway*
> • Terminal emulators never implement wrapping at a “possible soft hyphen”,
> only at the end of the line

then by your own argument, having it follow the Unicode standard is a non-issue.

(In reply to Thorsten Glaser from comment #7)

if your terminal and the target application disagree about encoding then you've
already lost.  everything above 0x7F will be wrong (0x80 != U+0080 or 0xc2
0x80).
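
a small Python sketch of that mismatch, assuming the two sides are UTF-8 and
Latin-1 (the choice of Latin-1 is an assumption for illustration; the same
kind of breakage happens with any other 8-bit encoding):

```python
shy = "\u00ad"  # SOFT HYPHEN

utf8_bytes = shy.encode("utf-8")      # two bytes: 0xC2 0xAD
latin1_bytes = shy.encode("latin-1")  # one byte: 0xAD

# A Latin-1 reader sees two characters where the UTF-8 writer sent one.
misread = utf8_bytes.decode("latin-1")

# U+0080 is encoded in UTF-8 as 0xC2 0x80, never as a lone 0x80 byte.
c1_utf8 = "\u0080".encode("utf-8")

# A lone 0x80 byte is not valid UTF-8 at all.
try:
    b"\x80".decode("utf-8")
    lone_byte_is_valid_utf8 = True
except UnicodeDecodeError:
    lone_byte_is_valid_utf8 = False

print(utf8_bytes, latin1_bytes, repr(misread), c1_utf8, lone_byte_is_valid_utf8)
```

once the byte streams disagree like this, the width assigned to U+00AD no
longer matters, since the terminal is not even seeing U+00AD.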

