This is the mail archive of the libc-locales@sourceware.org mailing list for the GNU libc locales project.



Re: [Bug localedata/14095] Review / update collation data from Unicode / ISO 14651


On Tue, Jun 30, 2015 at 11:14:35AM +0000, joseph at codesourcery dot com wrote:
> https://sourceware.org/bugzilla/show_bug.cgi?id=14095
> 
> --- Comment #2 from joseph at codesourcery dot com <joseph at codesourcery dot com> ---
> The people involved in getting the collation data to its present state are 
> mostly no longer involved in glibc development, so if you want an 
> authoritative answer you'll need to do a lot of work tracking them down.  
> My hypothesis would be that each person submitting a change generally had 
> their own itch to scratch (supporting collation for their own language 
> better, with no interest in a more general update to a newer version of 
> ISO 14651, if a newer version even existed at that time, or insufficient 
> time / expertise / resources to get involved in their national standards 
> committees parallel to JTC1/SC2/WG2, if ISO 14651 did not support their 
> language then) and that each person accepting such a change decided that 
> it was better to have the incremental improvement than to have no 
> collation support for that language for the indefinite future until 
> someone appeared to contribute a more thorough update.
> 
> We don't, however, need to know people's motivations for making 
> incremental changes rather than larger bulk updates.  The questions that 
> are actually relevant for updating the data now are more along the lines 
> of: for the original addition of the ISO 14651 data, what differences are 
> there from the relevant version of ISO 14651?  Do those differences relate 
> to conceptual differences between the POSIX collation model and the ISO 
> 14651 collation model, or do they reflect different choices for how to 
> collate particular characters?  If they reflect different choices, do we 
> still agree that those choices are appropriate for the contexts in which 
> glibc locales are used, or, with hindsight, would the ISO 14651 choices 
> now be better?  Where a change was made subsequently affecting existing 
> characters, is the change still at variance with current ISO 14651, and do 
> we think there is still a good reason for such a difference?  Where 
> collation support for new characters was added, how does that support 
> compare to the support, if any, for those characters in current ISO 14651, 
> and are there any differences we think are deliberate and should be 
> preserved?  Do any differences reflect cases where e.g. different national 
> standards specify different collation for the same characters (or 
> collation differs by context), and so individual locales may need to 
> override the generic international version?
> 
> Yes, there is a lot of detailed, careful work involved in analysis of the 
> history of the current collation data in order to produce a justified 
> analysis of those questions with recommendations for how to use data from 
> current ISO 14651.  Given the responsibility to users to avoid 
> regressions, we need to understand what changes would be involved in such 
> an update, and satisfy ourselves that they are good changes rather than 
> regressions, as part of making such an update.  Contributors willing to 
> help with that careful analysis are welcome.

Well, I was the author of many of the collation specs for different
languages, and I am still around; I even joined glibc maintenance
just a few years ago.

The ISO 14651 and POSIX models are the same, or rather ISO 14651 is
backwards compatible with POSIX. We cannot say that we follow POSIX
strictly, because then the locales could not work: POSIX is not well
suited to ISO 10646 UCS. So we adhere to ISO 14651 rather than to POSIX.

The different locale collation data were designed to adhere to
14651, in an orthogonal way, just like 14651 was designed to be used.

I am willing to contribute by looking into the different issues.

Best regards
Keld

