This is the mail archive of the
libc-locales@sourceware.org
mailing list for the GNU libc locales project.
[Bug localedata/14095] Review / update collation data from Unicode / ISO 14651
- From: "joseph at codesourcery dot com" <sourceware-bugzilla at sourceware dot org>
- To: libc-locales at sourceware dot org
- Date: Tue, 30 Jun 2015 11:14:35 +0000
- Subject: [Bug localedata/14095] Review / update collation data from Unicode / ISO 14651
- Auto-submitted: auto-generated
- References: <bug-14095-716 at http dot sourceware dot org/bugzilla/>
https://sourceware.org/bugzilla/show_bug.cgi?id=14095
--- Comment #2 from joseph at codesourcery dot com <joseph at codesourcery dot com> ---
The people involved in getting the collation data to its present state are
mostly no longer involved in glibc development, so if you want an
authoritative answer you'll need to do a lot of work tracking them down.
My hypothesis would be that each person submitting a change generally had
their own itch to scratch: supporting collation for their own language
better. They either had no interest in a more general update to a newer
version of ISO 14651 (if a newer version even existed at that time), or
had insufficient time / expertise / resources to get involved in their
national standards committees parallel to JTC1/SC2/WG2 (if ISO 14651 did
not support their language then). Each person accepting such a change, in
turn, decided that it was better to have the incremental improvement than
to have no collation support for that language for the indefinite future,
until someone appeared to contribute a more thorough update.
We don't, however, need to know people's motivations for making
incremental changes rather than larger bulk updates. The questions that
are actually relevant for updating the data now are more along the lines
of: for the original addition of the ISO 14651 data, what differences are
there from the relevant version of ISO 14651? Do those differences relate
to conceptual differences between the POSIX collation model and the ISO
14651 collation model, or do they reflect different choices for how to
collate particular characters? If they reflect different choices, do we
still agree that those choices are appropriate for the contexts in which
glibc locales are used, or, with hindsight, would the ISO 14651 choices
now be better? Where a change was made subsequently affecting existing
characters, is the change still at variance with current ISO 14651, and do
we think there is still a good reason for such a difference? Where
collation support for new characters was added, how does that support
compare to the support, if any, for those characters in current ISO 14651,
and are there any differences we think are deliberate and should be
preserved? Do any differences reflect cases where e.g. different national
standards specify different collation for the same characters (or
collation differs by context), and so individual locales may need to
override the generic international version?
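As a rough illustration of the first step in answering such questions, the
following sketch compares two collation tables and reports which characters
appear in only one of them and which adjacent characters swap relative
order. It is only a sketch under stated assumptions: the entry format
("<Uxxxx> weights", with "%" as the comment character) is a simplification,
the function names parse_table and compare_tables are hypothetical, and
position in the file is used as a crude proxy for collation order. A real
comparison would need to evaluate the actual weights at each level.

```python
import re

# Matches a simplified entry such as "<U0061> <a>;<BAS>;<MIN>;IGNORE".
# Assumption: one character per line, weights contain no whitespace.
ENTRY = re.compile(r"^<U([0-9A-Fa-f]{4,8})>\s+(\S+)")


def parse_table(text, comment_char="%"):
    """Return {code point: (position, weights)} for each entry line."""
    table = {}
    for line in text.splitlines():
        # Drop trailing comments; a real parser must also honour escapes.
        line = line.split(comment_char, 1)[0].strip()
        match = ENTRY.match(line)
        if match:
            cp = int(match.group(1), 16)
            # Keep the first occurrence; position is the entry's rank.
            table.setdefault(cp, (len(table), match.group(2)))
    return table


def compare_tables(ours, theirs):
    """Summarise how two parsed tables differ."""
    common = sorted(set(ours) & set(theirs), key=lambda cp: ours[cp][0])
    # Adjacent pairs in our order whose relative order is reversed in theirs.
    reordered = [
        (a, b)
        for a, b in zip(common, common[1:])
        if theirs[a][0] > theirs[b][0]
    ]
    return {
        "only_ours": sorted(set(ours) - set(theirs)),
        "only_theirs": sorted(set(theirs) - set(ours)),
        "reordered": reordered,
    }
```

Each reported difference would then still need the manual review described
above: whether it is deliberate, and whether it should be preserved.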
Yes, a lot of detailed, careful work is involved in analysing the history
of the current collation data in order to produce justified answers to
those questions, with recommendations for how to use data from
current ISO 14651. Given the responsibility to users to avoid
regressions, we need to understand what changes would be involved in such
an update, and satisfy ourselves that they are good changes rather than
regressions, as part of making such an update. Contributors willing to
help with that careful analysis are welcome.