This is the mail archive of the
libc-locales@sourceware.org
mailing list for the GNU libc locales project.
[Bug localedata/17588] New: Update UTF-8 charmap and width to Unicode 7.0.0
- From: "pravin.d.s at gmail dot com" <sourceware-bugzilla at sourceware dot org>
- To: libc-locales at sourceware dot org
- Date: Wed, 12 Nov 2014 10:11:18 +0000
- Subject: [Bug localedata/17588] New: Update UTF-8 charmap and width to Unicode 7.0.0
- Auto-submitted: auto-generated
https://sourceware.org/bugzilla/show_bug.cgi?id=17588
Bug ID: 17588
Summary: Update UTF-8 charmap and width to Unicode 7.0.0
Product: glibc
Version: unspecified
Status: NEW
Severity: normal
Priority: P2
Component: localedata
Assignee: unassigned at sourceware dot org
Reporter: pravin.d.s at gmail dot com
CC: libc-locales at sourceware dot org
Forked from #14094. It is good to have separate bugs for the UTF-8 and i18n
file updates. Tracking changes and issues will be clearer in the long term.
*************************************************************
Joseph Myers 2012-05-10 20:27:32 UTC
The Unicode locale data - character map and LC_CTYPE information - should be
updated from Unicode 6.1 (the character map is currently based on 6.0, and
LC_CTYPE is currently based on 5.0). This should be done with proper
automation and wiki documentation being added of how to do future updates. I
identified the following tasks at
<http://sourceware.org/ml/libc-alpha/2012-05/msg00590.html>:
* Ensure the character type data in localedata/charmaps/i18n can be
  properly reproduced from Unicode 5.0 data using gen-unicode-ctype.c,
  adapting gen-unicode-ctype.c as needed to replicate any changes that
  may have been made not using that program.
* Update the character type data to Unicode 6.1, removing any local
  hacks from gen-unicode-ctype.c that are no longer needed.
  (10646:2012, corresponding to Unicode 6.1, appears to be in
  publication stage so should be out very soon.)
* Ensure the character data in localedata/charmaps/UTF-8 can be
  reproduced in some automated fashion from Unicode 6.0, locating any
  previously used automation for this or creating some new automation
  if any previous automation can't be found.
* Update the character data to Unicode 6.1, removing any local hacks
  in the automation from the previous step.
* Document thoroughly on the wiki how the automation works and how to
  do updates to new Unicode versions.
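
As a rough illustration of the "automated fashion" asked for in the UTF-8
charmap task above, the sketch below turns UnicodeData.txt records into
charmap-style lines. It is not the automation glibc uses. The function names
(charmap_line, lines_from_unicode_data) are hypothetical, and the output layout
(<Uxxxx> symbol, /xNN byte sequence, character name) is an assumption about the
charmap file format. Range records (the "First>"/"Last>" pairs) and surrogates
are deliberately not handled.

```python
def charmap_line(code_point: int, name: str) -> str:
    """Format one charmap-style line for a single code point.

    Assumed layout: 4 hex digits in the symbol for the BMP,
    8 hex digits above it, then the UTF-8 bytes as /xNN.
    """
    if code_point <= 0xFFFF:
        symbol = f"<U{code_point:04X}>"
    else:
        symbol = f"<U{code_point:08X}>"
    # Derive the byte sequence from the code point instead of storing it.
    encoded = "".join(f"/x{b:02x}" for b in chr(code_point).encode("utf-8"))
    return f"{symbol:<11} {encoded:<12} {name}"


def lines_from_unicode_data(text: str):
    """Yield one charmap-style line per UnicodeData.txt record in text."""
    for row in text.splitlines():
        fields = row.split(";")
        if len(fields) < 2 or not fields[0]:
            continue  # skip blank or malformed rows
        yield charmap_line(int(fields[0], 16), fields[1])


# Three records in UnicodeData.txt format, used here as sample input.
SAMPLE = """\
0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;
20AC;EURO SIGN;Sc;0;ET;;;;;N;;;;;
10000;LINEAR B SYLLABLE B008 A;Lo;0;L;;;;;N;;;;;
"""

if __name__ == "__main__":
    for line in lines_from_unicode_data(SAMPLE):
        print(line)
```

Deriving the bytes from the code point means an update to a new Unicode
version only needs a new UnicodeData.txt as input.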
Comment 1 Rich Felker 2012-05-11 03:25:47 UTC
One of the major "local hacks" can be fixed, fixing many other problems at the
same time, by switching to using the Unicode "Alphabetic" property (from
DerivedCoreProperties.txt) instead of just categories L* for class alpha. Right
now there are many languages whose letters are considered non-alphabetic by
glibc because they're in category Mn or Mc or even Cf. There are "local hacks"
to fix this for maybe one or two languages, but using the right Unicode
property would fix it for all languages.
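
The difference between the two approaches can be sketched as follows. This is
only an illustration, not gen-unicode-ctype.c or any glibc code. It parses
lines in the DerivedCoreProperties.txt format and lists the code points that
carry the Alphabetic property but fall outside the L* categories. The helper
name parse_property and the three-line SAMPLE excerpt are assumptions made for
the example, and the general categories come from Python's own unicodedata
module, whose Unicode version depends on the interpreter.

```python
import unicodedata

# Small excerpt in DerivedCoreProperties.txt format (illustrative input only).
SAMPLE = """\
0041..005A    ; Alphabetic # L&  [26] LATIN CAPITAL LETTER A..LATIN CAPITAL LETTER Z
0345          ; Alphabetic # Mn       COMBINING GREEK YPOGEGRAMMENI
093E..0940    ; Alphabetic # Mc   [3] DEVANAGARI VOWEL SIGN AA..DEVANAGARI VOWEL SIGN II
"""


def parse_property(text: str, wanted: str) -> set[int]:
    """Return the set of code points that have the property `wanted`."""
    code_points: set[int] = set()
    for line in text.splitlines():
        data = line.split("#", 1)[0].strip()  # drop the trailing comment
        if not data:
            continue
        rng, prop = (part.strip() for part in data.split(";")[:2])
        if prop != wanted:
            continue
        first, _, last = rng.partition("..")
        # A single code point has no "..", so last falls back to first.
        code_points.update(range(int(first, 16), int(last or first, 16) + 1))
    return code_points


alphabetic = parse_property(SAMPLE, "Alphabetic")

# Code points an "L* only" rule would leave out of class alpha.
missed = sorted(
    cp for cp in alphabetic
    if not unicodedata.category(chr(cp)).startswith("L")
)

if __name__ == "__main__":
    for cp in missed:
        print(f"U+{cp:04X} {unicodedata.category(chr(cp))} {unicodedata.name(chr(cp))}")
```

For this excerpt, the code points missed by a category-only rule are the
combining mark U+0345 (Mn) and the Devanagari vowel signs U+093E..U+0940 (Mc).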
*******************************************************