This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.

Update __STDC_ISO_10646__, and general Unicode character data issues


glibc's __STDC_ISO_10646__ value hasn't been updated for a long time.
The meaning of this constant is, to quote C11, "An integer constant of
the form yyyymmL (for example, 199712L). If this symbol is defined,
then every character in the Unicode required set, when stored in an
object of type wchar_t, has the same value as the short identifier of
that character. The Unicode required set consists of all the
characters that are defined by ISO/IEC 10646, along with all
amendments and technical corrigenda, as of the specified year and
month. If some other encoding is used, the macro shall not be defined
and the actual encoding used is implementation-defined.".

The meaning doesn't seem wonderfully well-defined, given that you can
always use a wchar_t value for a new character with old glibc and it
can be copied, or converted to and from UTF-8, just fine.

localedata/charmaps/UTF-8 was last updated by

2011-05-09  Ulrich Drepper  <drepper@gmail.com>

        [BZ #12711]
        * charmaps/UTF-8: Update from recent Unidata.txt file.

(note there were two commits in quick succession, the second removing
some incorrect lines added by the first) and from some spot checks it
looks like this corresponds to Unicode 6.0 and ISO/IEC 10646:2011.  So
on that basis I propose this patch (tested x86_64) to update the value
to 201103L.  (It's correct that the comment still says "2nd ed.";
edition numbering changed when 10646-1 and 10646-2 were merged into a
single 10646 document.)  If, however, you were concerned about character
properties for iswalpha etc., then those (localedata/locales/i18n)
were last substantially updated for Unicode 5.0, which corresponds to
amendment 2 of 10646:2003, published 2006-07-01, so the value would be 200607L.

Do we have any locale experts interested in sorting out the various
bits of Unicode and related data in glibc properly?  I think the tasks
would include:

* Ensure the character type data in localedata/locales/i18n can be
  properly reproduced from Unicode 5.0 data using gen-unicode-ctype.c,
  adapting gen-unicode-ctype.c as needed to replicate any changes that
  may have been made not using that program.

* Update the character type data to Unicode 6.1, removing any local
  hacks from gen-unicode-ctype.c that are no longer needed.
  (10646:2012, corresponding to Unicode 6.1, appears to be in
  publication stage so should be out very soon.)

* Ensure the character data in localedata/charmaps/UTF-8 can be
  reproduced in some automated fashion from Unicode 6.0, locating any
  previously used automation for this or creating some new automation
  if any previous automation can't be found.

* Update the character data to Unicode 6.1, removing any local hacks
  in the automation from the previous step.

* Document thoroughly on the wiki how the automation works and how to
  do updates to new Unicode versions.

* Figure out the origins of the localedata/locales/iso14651_t1_* files
  and whether it's possible to update them usefully from Unicode data.

ISO 10646 and 14651 are freely available from
<http://standards.iso.org/ittf/PubliclyAvailableStandards/index.html>.

2012-05-10  Joseph Myers  <joseph@codesourcery.com>

	* include/stdc-predef.h (__STDC_ISO_10646__): Increase to 201103L.

diff --git a/include/stdc-predef.h b/include/stdc-predef.h
index 146bc5c..788669f 100644
--- a/include/stdc-predef.h
+++ b/include/stdc-predef.h
@@ -30,8 +30,9 @@
 #define __STDC_IEC_559__		1
 #define __STDC_IEC_559_COMPLEX__	1
 
-/* wchar_t uses ISO 10646-1 (2nd ed., published 2000-09-15) / Unicode 3.1.  */
-#define __STDC_ISO_10646__		200009L
+/* wchar_t uses ISO/IEC 10646 (2nd ed., published 2011-03-15) /
+   Unicode 6.0.  */
+#define __STDC_ISO_10646__		201103L
 
 /* We do not support C11 <threads.h>.  */
 #define __STDC_NO_THREADS__		1

-- 
Joseph S. Myers
joseph@codesourcery.com