This is the mail archive of the
mailing list for the GNU libc locales project.
Re: [PATCH][BZ 18934] hu_HU: Fix multiple sorting bugs.
Thanks again for your valuable input! I hope it'll help us move forward.
> - When a word appears both with and without hyphen (pingpong and
> ping-pong), they collate differently.
This is another case where I haven't touched anything. I was happy
that spaces and hyphens were "accidentally" treated the way the Hungarian
rules specify (and the unit tests verify this to some extent). The rules say
that spaces and hyphens should be ignored, but they do not specify what
should happen if they are the only difference. Glibc's ordering seems
to be "pingpong" < "ping pong" < "ping-pong", which I personally don't
like; I'd prefer "pingpong" to be at the end. Anyway, if we're going
to change this at all, it should be a separate, subsequent change.
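For anyone who wants to check this ordering on their own system, here is
a minimal Python sketch. It assumes the hu_HU.UTF-8 locale is installed;
the helper name sort_with_locale is made up for illustration. The result
depends on the installed glibc version, so I give no expected output.

```python
import locale

def sort_with_locale(words, name="hu_HU.UTF-8"):
    """Sort words using the collation rules of the named locale.

    Returns None when that locale is not available on this system.
    """
    saved = locale.setlocale(locale.LC_COLLATE)  # query the current setting
    try:
        locale.setlocale(locale.LC_COLLATE, name)
    except locale.Error:
        return None
    try:
        # strxfrm produces keys that compare the way strcoll would.
        return sorted(words, key=locale.strxfrm)
    finally:
        locale.setlocale(locale.LC_COLLATE, saved)  # restore the old setting

print(sort_with_locale(["ping-pong", "pingpong", "ping pong"]))
```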
The standard not only leaves certain cases unspecified, it also says in
bullet point 14e that in some cases rules other than the ones
specified might be used, e.g. sorting based on the first unit. Similarly,
point 16 mentions that in some cases it's desirable to use a generic
Latin alphabet that doesn't know anything about Hungarian compound
letters and such.
Back to 14e, one typical example is phone books. Note that in
Hungarian the names are in "reverse" order, family name followed by
given name. According to 14d, the ordering should be "Kiss Tamás" <
"Kis Tamás". This is counterintuitive and prevents grouping (family
name written out only once for multiple entries). Phone books order
the family names, and within the same family name they order the given
names.
I think it's beyond glibc's scope to address different possible
variations of collation. I, for one, have no desire whatsoever to
come up with various hu_HU@whatever collation definitions.
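To make the difference between 14d and the phone book ordering mentioned
above concrete, here is a rough Python sketch. It is not the hu_HU
collation: it compares lowercased code points and knows nothing about
Hungarian compound letters or accents. The names key_14d and
key_phone_book are invented for illustration, and the sketch assumes the
family name is everything before the first space.

```python
def key_14d(name):
    # Ignore spaces and hyphens, then compare the name as a single string.
    return name.lower().replace(" ", "").replace("-", "")

def key_phone_book(name):
    # Compare the family name first, then the given name(s).
    family, _, given = name.partition(" ")
    return (family.lower(), given.lower())

names = ["Kis Tamás", "Kiss Tamás"]
print(sorted(names, key=key_14d))         # ['Kiss Tamás', 'Kis Tamás']
print(sorted(names, key=key_phone_book))  # ['Kis Tamás', 'Kiss Tamás']
```

With the first key the second "s" of "Kiss" is compared against the "t"
of "Tamás", which is why "Kiss Tamás" comes first; with the second key
"Kis" is a prefix of "Kiss" and so sorts before it.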