This is the mail archive of the mailing list for the GNU libc locales project.

Re: [PATCH][BZ 18934] hu_HU: Fix multiple sorting bugs.

Carlos, any news?

Did Luis's comments and mine help you move forward?

I'd like to emphasize again that my patch does not do anything
serious. No big redesign, no fundamental change, nothing like this.
The things Luis mentioned were either already implemented that way, or
I did not touch them. It's just a few technically small bugfixes that
I made. Really no big deal. Plus unittests.

A long time ago I offered to turn this all-in-one patch into 4-5
patches to be applied on top of each other. But then they'd have to
be reviewed and applied at once, in a particular order (because
they'd heavily conflict). I know that this is generally the preferred
approach; however, it does not work well with test-driven development,
since there's no way to test the intermediate (i.e. deliberately still
broken) states. Having chosen TDD, the result of my work was a patch
that fixes all the referenced bugs in a single step. I still offer to
spend some more time on it to create a few smaller, easier-to-review
patches *if* that is seriously what's missing for getting my work
accepted. Let me know.


On Sun, Feb 5, 2017 at 5:29 PM, Egmont Koblinger <> wrote:
> Hi Luis,
> Thanks again for your valuable input! I hope it'll help us move forward.
>> - When a word appears both with and without hyphen (pingpong and
>> ping-pong), they collate differently.
> This is another case where I haven't touched anything. I was happy
> that spaces, hyphens were "accidentally" treated the way the Hungarian
> rules specify (and the unittests verify to some extent). The rules say
> that spaces and hyphens should be ignored -- but do not specify what
> should happen if they are the only difference. Glibc's ordering seems
> to be "pingpong" < "ping pong" < "ping-pong" which I personally don't
> like, I'd prefer "pingpong" being at the end. Anyway, if we're about
> to change this at all, it should be a subsequent separate change.
> The standard is not only unspecified in certain cases, it also says in
> bullet point 14e that in some cases different rules than the ones
> specified might be used, e.g. sort based on the first unit. Similarly,
> point 16 mentions that in some cases it's desired to use a generic
> Latin alphabet that doesn't know anything about Hungarian compound
> letters and such.
> Back to 14e, one typical example is phone books. Note that in
> Hungarian the names are in "reverse" order, family name followed by
> given name. According to 14d, the ordering should be "Kiss Tamás" <
> "Kis Tamás". This is counterintuitive and prevents grouping (family
> name written out only once for multiple entries). Phone books order
> the family names, and within the same family name they order the given
> names.
> I think it's beyond glibc's scope to address different possible
> variations of collations. I, for one, have no desire whatsoever trying
> to come up with various hu_HU@whatever collation definitions.
> cheers,
> egmont
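
To make the "Kiss Tamás" / "Kis Tamás" example from the quoted message
concrete, here is a minimal sketch in Python. It is only an
illustration: it compares plain code points rather than using the
hu_HU collation, and the two key functions are hypothetical helpers,
not anything from glibc or the patch. The first key ignores spaces and
hyphens (as rule 14d is described above), the second sorts by family
name first and given name second, the way phone books are said to.

```python
def ignore_separators_key(name):
    # Drop spaces and hyphens before comparing, so the whole name
    # is treated as one unit (assumed reading of rule 14d).
    return name.replace(" ", "").replace("-", "").casefold()


def phone_book_key(name):
    # Hungarian order is "family given"; split at the first space and
    # compare the family name first, then the given name.
    family, _, given = name.partition(" ")
    return (family.casefold(), given.casefold())


names = ["Kis Tamás", "Kiss Tamás"]

by_rule = sorted(names, key=ignore_separators_key)
by_phone_book = sorted(names, key=phone_book_key)

print(by_rule)        # ['Kiss Tamás', 'Kis Tamás']
print(by_phone_book)  # ['Kis Tamás', 'Kiss Tamás']
```

With separators ignored, "kisstamás" sorts before "kistamás" because
's' < 't' at the fourth letter; with the family name compared first,
"Kis" sorts before "Kiss" as a prefix.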
