This is the mail archive of the libc-locales@sourceware.org mailing list for the GNU libc locales project.


[Bug localedata/21750] column width of characters incompatible with classical wcwidth

From: "egmont at gmail dot com" <sourceware-bugzilla at sourceware dot org>
To: libc-locales at sourceware dot org
Date: Fri, 18 Aug 2017 10:23:17 +0000
Subject: [Bug localedata/21750] column width of characters incompatible with classical wcwidth
Auto-submitted: auto-generated
References: <bug-21750-716@http.sourceware.org/bugzilla/>

https://sourceware.org/bugzilla/show_bug.cgi?id=21750

--- Comment #12 from Egmont Koblinger <egmont at gmail dot com> ---
(In reply to Thorsten Glaser from comment #8)

> We’re not strictly speaking deviating from UCD because UCD does *not* define
> wcwidth.

Well, it defines the East_Asian_Width property from which you derive wcwidth
using a couple of generic rules plus a few exceptions to them.

You've just (re?)added 3248..324F and a few other ranges to these exceptions,
which in my eyes means that yes, you are deviating from Unicode.
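To illustrate what "generic rules plus a few exceptions" can look like, here is a minimal Python sketch. It is not glibc's generator or Markus Kuhn's code. The function names, the `EXCEPTIONS` table, and the assumption that the exception forces 3248..324F to 2 columns are all hypothetical, chosen only to show the mechanism being discussed.

```python
import unicodedata

# Hypothetical exception table: (first, last, forced width).
# Assumption for illustration: the exception makes 3248..324F double width.
EXCEPTIONS = [(0x3248, 0x324F, 2)]


def generic_width(ch):
    """Width from generic rules only: General_Category and East_Asian_Width."""
    if unicodedata.category(ch) in ("Mn", "Me", "Cf"):
        return 0  # combining marks and format characters take no cell
    if unicodedata.east_asian_width(ch) in ("W", "F"):
        return 2  # Wide and Fullwidth take two cells
    return 1  # everything else, including Ambiguous, takes one cell


def width_with_exceptions(ch):
    """Same rules, but hand-maintained exceptions win over the derived value."""
    cp = ord(ch)
    for first, last, width in EXCEPTIONS:
        if first <= cp <= last:
            return width
    return generic_width(ch)
```

With such a table, `generic_width("\u3248")` and `width_with_exceptions("\u3248")` disagree, which is the kind of deviation from the Unicode-derived value that this comment is about.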

> Terminal emulators use wcwidth, especially xterm uses ONLY it *and* defines
> it.
> 
> Applications such as editors in the terminal (cf. jupp) use wcwidth or carry
> their own data which is prepared the same way as wcwidth (often they use a
> copy of xterm's code).

To be more precise, xterm and a few others copy Markus Kuhn's implementation. I
don't think anyone copies from xterm.

This defines the 3248..324F range as ambiguous (I've checked the most recent
xterm-330 and a randomly chosen, roughly 4-year-old xterm-300; an even older,
randomly picked xterm-260 differs, which suggests that xterm caught up with
these changes long ago). By default, ambiguous means 1 cell wide in xterm
(unless -cjk_width is specified, in which case these, like all other ambiguous
characters, become double width)...
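The treatment of ambiguous characters described above can be sketched as follows. This is a simplified Python illustration, not xterm's actual code: `cell_width` and its `cjk_width` parameter are hypothetical names that mimic the effect of the -cjk_width option, and the result depends on the Unicode data shipped with the Python interpreter.

```python
import unicodedata


def cell_width(ch, cjk_width=False):
    """Cells occupied by ch; Ambiguous is 1 unless cjk_width is set."""
    eaw = unicodedata.east_asian_width(ch)
    if eaw in ("W", "F"):
        return 2  # Wide and Fullwidth are always double
    if eaw == "A":
        return 2 if cjk_width else 1  # Ambiguous follows the switch
    return 1  # Narrow, Halfwidth, Neutral


# U+3248 is classified as Ambiguous in the Unicode data.
print(cell_width("\u3248"), cell_width("\u3248", cjk_width=True))
```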

> You speak of compatibility and breaking. Strictly speaking, the switch glibc
> recently (two or three majors, I think) did to regenerated data *did* break
> applications, and this bugreport is 100% returning the glibc data to the way
> it was before in the places the previous change introduced bugs, while still
> keeping it up-to-date with recent Unicode.
> 
> So, therefore, with this patch applied, less things will break than without.

... so I absolutely don't get why fewer things would be broken now. As far as I
can see, with this patch you have just further broken the handling of these
codepoints by deviating from both Unicode and xterm.

> Outlyers like libglib (used by only one of the multitude of terminal
> emulators) can then import the data (and mechanism used to generate) from
> here.

You don't seriously expect that two glibc maintainers can decide over a chat
to add a few exceptions to the generic rules, and that "outlyers" (like glib,
maybe Qt, maybe Java, maybe some other "giant" pieces of (perhaps commercial)
software, maybe libc implementations of other Unices (like macOS), maybe a
whole lot more) will follow, do you?

(And on a side note... IMHO submitting a change right after someone raises
concerns, without giving time for a reasonable discussion, isn't really
polite... Especially since it recently took me about 2 years and about 10-15
unanswered pings to get a well unit-tested locale change through, I can't
understand the hurry now.)


References:
- [Bug localedata/21750] New: column width of characters incompatible with classical wcwidth
  - From: tg at mirbsd dot de
