This is the mail archive of the libc-locales@sourceware.org mailing list for the GNU libc locales project.



Re: [Bug localedata/17750] wrong collation order of diacritics in most locales


On Wed, Nov 29, 2017 at 07:27:32PM +0000, egmont at gmail dot com wrote:
> https://sourceware.org/bugzilla/show_bug.cgi?id=17750
> 
> --- Comment #18 from Egmont Koblinger <egmont at gmail dot com> ---
> (In reply to keld@keldix.com from comment #16)
> 
> > Also other languages, where French words and names are the biggest source
> > of multiple accented characters, should have diacrit backward.
> > This goes for Danish (my own language), Swedish, Norwegian, Finnish, Dutch.
> 
> I can't speak any of these languages, but looking at some random Finnish text I
> see tons of ä and ö letters, and a significant number of words contain two or more
> of them. Hence I seriously doubt the correctness of your claim.

Well, in Finnish and the other Nordic languages, like Danish, Swedish and Norwegian, ä, ö etc.
are not considered accented letters but genuine separate letters. They are sorted as letters
of their own, not as a and o with a diacritic, so that is why there are few strings with
more than one accented letter.


> Even if looking only at the foreign words within these languages, I'd _guess_
> that they take words from each other or maybe German more often than from
> French. But even if we assume French is the most common source of foreign
> words, that's still not a strong enough reason to go for backwards diacrit
> ordering. In order for backwards diacrit ordering to even be a possibility to
> consider, I believe French accented words should outweigh all other local and
> foreign accented words combined.

German umlaut letters are treated much the same way in Finnish (and Swedish): ä and ö in
German words are then the same as the genuine Finnish/Swedish letters.

Yes, I also think that the total number of French words with two or more accented letters
(according to the rules of the specific language) should outweigh the total
number of other occurrences, but I believe that this is the case in the examples that I
have given.
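The forward/backward distinction argued about here can be seen in a toy model. The Python sketch below is an illustration under simplifying assumptions, not glibc's implementation or the full ISO 14651 algorithm: each letter is split (via Unicode NFD) into a base letter and its diacritics, words are compared first on base letters and then on diacritics, and the hypothetical `backward_secondary` flag reverses the diacritic comparison. The hypothetical `separate_letters` parameter models languages where ä and ö are letters of their own.

```python
import unicodedata


def collation_key(word, backward_secondary=False, separate_letters=""):
    """Toy two-level sort key: (base letters, diacritics per letter).

    Hypothetical helper for illustration only; not glibc's algorithm.
    """
    primary = []    # level 1: base letters
    secondary = []  # level 2: combining diacritics attached to each letter
    # Normalize to NFC first so each iteration sees one whole letter.
    for ch in unicodedata.normalize("NFC", word):
        if ch in separate_letters:
            # A letter of its own (e.g. Finnish/Swedish ä, ö):
            # it differs at level 1 and carries no diacritic at level 2.
            primary.append(ch)
            secondary.append("")
            continue
        # NFD splits e.g. "é" into "e" + U+0301 COMBINING ACUTE ACCENT.
        decomposed = unicodedata.normalize("NFD", ch)
        primary.append(decomposed[0])
        secondary.append(decomposed[1:])  # "" when the letter is unaccented
    if backward_secondary:
        # "Backward" diacritic ordering: compare accents from the end of the word.
        secondary.reverse()
    return (tuple(primary), tuple(secondary))


if __name__ == "__main__":
    words = ["côté", "cote", "coté", "côte"]
    print(sorted(words, key=collation_key))
    # ['cote', 'coté', 'côte', 'côté']   (forward)
    print(sorted(words, key=lambda w: collation_key(w, backward_secondary=True)))
    # ['cote', 'côte', 'coté', 'côté']   (backward, the traditional French order)

    # Made-up strings: with ä and ö as letters of their own,
    # the diacritic direction no longer matters.
    fi = ["pää", "paa", "päa", "paä"]
    fwd = sorted(fi, key=lambda w: collation_key(w, False, "äö"))
    bwd = sorted(fi, key=lambda w: collation_key(w, True, "äö"))
    print(fwd == bwd)  # True
```

On the classic French quartet cote/côte/coté/côté the two directions disagree, because the words have the same base letters but differ in accents at more than one position. Once ä and ö are declared separate letters, the strings already differ at the base-letter level and the diacritic direction never comes into play.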


> By the way, don't these languages have some "official" collation rules, or at
> least some established common practice?

There are specifications from the official standards bodies specifying the backward diacritic rules, yes.
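For reference, in POSIX `localedef` source the direction of each comparison level is declared per locale with `order_start` in the `LC_COLLATE` category. A minimal, illustrative fragment with backward ordering at the second (diacritic) level could look like this; the level meanings in the comments follow the usual ISO 14651 layout and are an assumption here, and the collating-element weight lines are omitted:

```
comment_char %
escape_char /

LC_COLLATE
% Assumed levels: 1 = base letter, 2 = diacritic (compared backward),
% 3 = case, 4 = special characters (with position).
order_start forward;backward;forward;forward,position
% ... collating-element weight lines omitted ...
order_end
END LC_COLLATE
```

Switching the second entry to `forward` gives left-to-right diacritic comparison instead.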

Best regards
keld

