This is the mail archive of the glibc-bugs@sourceware.org mailing list for the glibc project.



[Bug locale/19883] New: Unicode encodings should be limited to characters U+0010FFFF and below


https://sourceware.org/bugzilla/show_bug.cgi?id=19883

            Bug ID: 19883
           Summary: Unicode encodings should be limited to characters
                    U+0010FFFF and below
           Product: glibc
           Version: 2.23
            Status: NEW
          Severity: normal
          Priority: P2
         Component: locale
          Assignee: unassigned at sourceware dot org
          Reporter: jsm28 at gcc dot gnu.org
  Target Milestone: ---

There is code in various places in glibc that allows for UTF-8 encodings
representing characters above U+0010FFFF, which were valid in the 2003 edition
of ISO 10646 but not in the 2011 edition.

Such code should be identified and removed, and such encodings should be
treated as invalid on input.  For UCS-4, wchar_t and any equivalent encodings,
values above U+0010FFFF should likewise be treated as invalid, in the same way
that values above U+7FFFFFFF and values in the surrogate range already are (or
should be) for such encodings.  In particular, they should not be converted to
such UTF-8 encodings on conversion to UTF-8.
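
As an illustration only, the checks described above might look roughly like the
following sketch.  The helper names (is_unicode_scalar, encode_utf8,
utf8_seq_len) are hypothetical and are not existing glibc functions; the actual
glibc conversion code is structured differently.  The sketch assumes that
surrogates and values above U+0010FFFF are rejected both when encoding to UTF-8
and when classifying UTF-8 lead bytes on input.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: return 1 if C is at most U+10FFFF and is not in
   the surrogate range U+D800..U+DFFF, else 0.  */
static int
is_unicode_scalar (uint32_t c)
{
  if (c > 0x10FFFF)
    return 0;
  if (c >= 0xD800 && c <= 0xDFFF)
    return 0;
  return 1;
}

/* Hypothetical encoder: store the UTF-8 form of C in OUT (which must have
   room for 4 bytes) and return the number of bytes written, or 0 if C is
   not valid, instead of emitting a sequence for a value above U+10FFFF.  */
static size_t
encode_utf8 (uint32_t c, unsigned char *out)
{
  if (!is_unicode_scalar (c))
    return 0;
  if (c < 0x80)
    {
      out[0] = (unsigned char) c;
      return 1;
    }
  if (c < 0x800)
    {
      out[0] = (unsigned char) (0xC0 | (c >> 6));
      out[1] = (unsigned char) (0x80 | (c & 0x3F));
      return 2;
    }
  if (c < 0x10000)
    {
      out[0] = (unsigned char) (0xE0 | (c >> 12));
      out[1] = (unsigned char) (0x80 | ((c >> 6) & 0x3F));
      out[2] = (unsigned char) (0x80 | (c & 0x3F));
      return 3;
    }
  out[0] = (unsigned char) (0xF0 | (c >> 18));
  out[1] = (unsigned char) (0x80 | ((c >> 12) & 0x3F));
  out[2] = (unsigned char) (0x80 | ((c >> 6) & 0x3F));
  out[3] = (unsigned char) (0x80 | (c & 0x3F));
  return 4;
}

/* Hypothetical input-side check: return the sequence length implied by a
   UTF-8 lead byte, or 0 if the byte cannot start a valid sequence.  Lead
   bytes 0xF5..0xFF could only begin encodings above U+10FFFF (including
   the old 5- and 6-byte forms), so they are rejected.  A lead byte of
   0xF4 still needs its second byte checked to be at most 0x8F.  */
static size_t
utf8_seq_len (unsigned char lead)
{
  if (lead < 0x80)
    return 1;
  if (lead >= 0xC2 && lead <= 0xDF)
    return 2;
  if (lead >= 0xE0 && lead <= 0xEF)
    return 3;
  if (lead >= 0xF0 && lead <= 0xF4)
    return 4;
  return 0;
}
```

With these definitions, encode_utf8 (0x110000, buf) returns 0 rather than
producing a four-byte sequence, and utf8_seq_len rejects the lead bytes that
could only introduce encodings beyond U+0010FFFF.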

