This is the mail archive of the glibc-bugs@sourceware.org mailing list for the glibc project.


[Bug localedata/22070] New: charmaps/UTF-8: wcwidth for Prepended_Concatenation_Mark codepoints set to 0 (should be 1)


https://sourceware.org/bugzilla/show_bug.cgi?id=22070

            Bug ID: 22070
           Summary: charmaps/UTF-8: wcwidth for
                    Prepended_Concatenation_Mark codepoints set to 0
                    (should be 1)
           Product: glibc
           Version: unspecified
            Status: NEW
          Severity: normal
          Priority: P2
         Component: localedata
          Assignee: unassigned at sourceware dot org
          Reporter: vapier at gentoo dot org
                CC: libc-locales at sourceware dot org
  Target Milestone: ---

We currently mark all Cf (Format) characters as width 0, but this ignores the
Prepended_Concatenation_Mark code points.  Specifically, these should all have
a wcwidth of 1:
0600..0605 ; Prepended_Concatenation_Mark # Cf  ARABIC NUMBER SIGN..ARABIC NUMBER MARK ABOVE
06DD       ; Prepended_Concatenation_Mark # Cf  ARABIC END OF AYAH
070F       ; Prepended_Concatenation_Mark # Cf  SYRIAC ABBREVIATION MARK
08E2       ; Prepended_Concatenation_Mark # Cf  ARABIC DISPUTED END OF AYAH
110BD      ; Prepended_Concatenation_Mark # Cf  KAITHI NUMBER SIGN
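
To make the proposed rule concrete, here is a minimal sketch in C of the exception described above. It is not how glibc generates its width tables; the names cp_range, pcm_ranges, is_prepended_concatenation_mark and cf_width are hypothetical and exist only for illustration. The ranges are copied from the list above.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Inclusive code point ranges with Prepended_Concatenation_Mark=True,
   copied from the list in this report (Unicode 10.0.0).
   All names here are hypothetical, not glibc identifiers.  */
struct cp_range
{
  uint32_t first;
  uint32_t last;
};

static const struct cp_range pcm_ranges[] =
{
  { 0x0600, 0x0605 },   /* ARABIC NUMBER SIGN..ARABIC NUMBER MARK ABOVE */
  { 0x06DD, 0x06DD },   /* ARABIC END OF AYAH */
  { 0x070F, 0x070F },   /* SYRIAC ABBREVIATION MARK */
  { 0x08E2, 0x08E2 },   /* ARABIC DISPUTED END OF AYAH */
  { 0x110BD, 0x110BD }, /* KAITHI NUMBER SIGN */
};

/* Return true if CP is one of the code points listed above.  */
static bool
is_prepended_concatenation_mark (uint32_t cp)
{
  size_t n = sizeof pcm_ranges / sizeof pcm_ranges[0];
  for (size_t i = 0; i < n; i++)
    if (cp >= pcm_ranges[i].first && cp <= pcm_ranges[i].last)
      return true;
  return false;
}

/* Proposed width for a code point the caller already knows to have
   General_Category Cf: 1 for the marks above, 0 for every other Cf.  */
static int
cf_width (uint32_t cp)
{
  return is_prepended_concatenation_mark (cp) ? 1 : 0;
}
```

With this sketch, cf_width (0x0600) is 1, while an ordinary format character such as U+200B ZERO WIDTH SPACE stays at 0.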

Unicode 10.0.0, chapter 9, section 2, pages 377-378 [1] states:
Signs Spanning Numbers. Several other special signs are written in association
with numbers in the Arabic script. All of these signs can span multiple-digit
numbers, rather than just a single digit. They are not formally considered
combining marks in the sense used by the Unicode Standard, although they
clearly interact graphically with their associated sequence of digits. In the
text representation they precede the sequence of digits that they span, rather
than follow a base character, as would be the case for a combining mark. Their
General_Category value is Cf (format character). Unlike most other format
characters, however, they should be rendered with a visible glyph, even in
circumstances where no suitable digit or sequence of digits follows them in
logical order. The characters have the Bidi_Class value of Arabic_Number to
make them appear in the same run as the numbers following them.

A few similar signs spanning numbers or letters are associated with scripts
other than Arabic. See the discussion of U+070F syriac abbreviation mark in
Section 9.3, Syriac, and the discussion of U+110BD kaithi number sign in
Section 15.2, Kaithi. All of these prefixed format controls, including the
non-Arabic ones, are given the property value
Prepended_Concatenation_Mark=True, to identify them as a class. They also have
special behavior in text segmentation. (See Unicode Standard Annex #29,
“Unicode Text Segmentation.”)

[1] http://unicode.org/versions/Unicode10.0.0/ch09.pdf

