This is the mail archive of the glibc-bugs@sourceware.org mailing list for the glibc project.
[Bug localedata/22070] New: charmaps/UTF-8: wcwidth for Prepended_Concatenation_Mark codepoints set to 0 (should be 1)
- From: "vapier at gentoo dot org" <sourceware-bugzilla at sourceware dot org>
- To: glibc-bugs at sourceware dot org
- Date: Sun, 03 Sep 2017 16:35:26 +0000
- Auto-submitted: auto-generated
https://sourceware.org/bugzilla/show_bug.cgi?id=22070
Bug ID: 22070
Summary: charmaps/UTF-8: wcwidth for Prepended_Concatenation_Mark codepoints set to 0 (should be 1)
Product: glibc
Version: unspecified
Status: NEW
Severity: normal
Priority: P2
Component: localedata
Assignee: unassigned at sourceware dot org
Reporter: vapier at gentoo dot org
CC: libc-locales at sourceware dot org
Target Milestone: ---
We currently mark all Cf (format) characters as width 0, but this ignores the
Prepended_Concatenation_Mark codepoints. Specifically, these should all have a
wcwidth of 1:
0600..0605 ; Prepended_Concatenation_Mark # Cf ARABIC NUMBER SIGN..ARABIC NUMBER MARK ABOVE
06DD ; Prepended_Concatenation_Mark # Cf ARABIC END OF AYAH
070F ; Prepended_Concatenation_Mark # Cf SYRIAC ABBREVIATION MARK
08E2 ; Prepended_Concatenation_Mark # Cf ARABIC DISPUTED END OF AYAH
110BD ; Prepended_Concatenation_Mark # Cf KAITHI NUMBER SIGN
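
For illustration only, here is a minimal Python sketch of the rule being
proposed: Cf characters stay at width 0 unless they are in the
Prepended_Concatenation_Mark list above, in which case they get width 1. This
is not the glibc generator code; the helper names are hypothetical, and the
ranges are simply copied from the list above.

```python
import unicodedata

# Ranges copied from the list in this report (Unicode 10.0.0).
PREPENDED_CONCATENATION_MARK = [
    (0x0600, 0x0605),    # ARABIC NUMBER SIGN..ARABIC NUMBER MARK ABOVE
    (0x06DD, 0x06DD),    # ARABIC END OF AYAH
    (0x070F, 0x070F),    # SYRIAC ABBREVIATION MARK
    (0x08E2, 0x08E2),    # ARABIC DISPUTED END OF AYAH
    (0x110BD, 0x110BD),  # KAITHI NUMBER SIGN
]


def is_prepended_concatenation_mark(cp):
    """Return True if the codepoint falls in one of the ranges above."""
    return any(lo <= cp <= hi for lo, hi in PREPENDED_CONCATENATION_MARK)


def proposed_cf_width(cp):
    """Hypothetical helper: width for Cf characters under the proposal.

    Returns None for characters that are not General_Category Cf,
    since this report only concerns the Cf rule.
    """
    if unicodedata.category(chr(cp)) != "Cf":
        return None
    return 1 if is_prepended_concatenation_mark(cp) else 0
```

Under this sketch, proposed_cf_width(0x0600) gives 1, while an ordinary format
character such as U+200B ZERO WIDTH SPACE still gives 0.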
The Unicode Standard 10.0.0, chapter 9, section 2, pages 377-378 [1] states:
Signs Spanning Numbers. Several other special signs are written in association
with numbers in the Arabic script. All of these signs can span multiple-digit
numbers, rather than just a single digit. They are not formally considered
combining marks in the sense used by the Unicode Standard, although they
clearly interact graphically with their associated sequence of digits. In the
text representation they precede the sequence of digits that they span, rather
than follow a base character, as would be the case for a combining mark. Their
General_Category value is Cf (format character). Unlike most other format
characters, however, they should be rendered with a visible glyph, even in
circumstances where no suitable digit or sequence of digits follows them in
logical order. The characters have the Bidi_Class value of Arabic_Number to
make them appear in the same run as the numbers following them.
A few similar signs spanning numbers or letters are associated with scripts
other than Arabic. See the discussion of U+070F SYRIAC ABBREVIATION MARK in
Section 9.3, Syriac, and the discussion of U+110BD KAITHI NUMBER SIGN in
Section 15.2, Kaithi. All of these prefixed format controls, including the
non-Arabic ones, are given the property value
Prepended_Concatenation_Mark=True, to identify them as a class. They also have
special behavior in text segmentation. (See Unicode Standard Annex #29,
“Unicode Text Segmentation.”)
[1] http://unicode.org/versions/Unicode10.0.0/ch09.pdf