This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: [BZ #18441] strcoll performance regression
- From: Leonhard Holz <leonhard dot holz at web dot de>
- To: libc-alpha at sourceware dot org
- Date: Thu, 18 Jun 2015 12:30:00 +0200
- Subject: Re: [BZ #18441] strcoll performance regression
- References: <55678186 dot 40106 at web dot de>
Hey,
could someone please attach this as a comment to
https://sourceware.org/bugzilla/show_bug.cgi?id=18441 ?
Best,
Leonhard
On 28.05.2015 at 22:58, Leonhard Holz wrote:
> Hello,
>
> the regression is triggered when the locale has no information about the sort
> order of the given characters. With the th_TH locale, strcoll is pretty quick:
>
> "strcoll": {
>   "wikipedia-th#th_TH.UTF-8": {
>     "duration": 4.31123e+06,
>     "iterations": 16,
>     "mean": 269452
>   }
> }
>
> The English locale uses four passes to determine the sort order. In the first three
> passes it reports a recognized sequence length of zero, regardless of the Thai
> word given. At the fourth level it recognizes the characters, which are all
> considered equal, so the string length actually determines the sort order.
>
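The four passes described above can be illustrated with a toy model. This is only a sketch under stated assumptions, not glibc's actual implementation: `toy_weight` and `toy_collate` are hypothetical names, and the weight table is invented so that lowercase ASCII letters have real weights while every other byte is ignored on the first three levels and gets one shared weight on the last.

```c
#include <stddef.h>

#define TOY_LEVELS 4

/* Hypothetical weight table (an assumption for illustration only).
   A weight of 0 means "ignored at this level". */
static int toy_weight(unsigned char c, int level)
{
    if (c >= 'a' && c <= 'z')
        return level == 0 ? c - 'a' + 1 : 1;
    /* Unknown characters: ignored on levels 0..2, equal on the last. */
    return level == TOY_LEVELS - 1 ? 1 : 0;
}

/* Multi-pass comparison: each level rescans both strings from the start. */
static int toy_collate(const char *a, const char *b)
{
    for (int level = 0; level < TOY_LEVELS; level++) {
        const unsigned char *pa = (const unsigned char *) a;
        const unsigned char *pb = (const unsigned char *) b;

        for (;;) {
            /* Skip characters that carry no weight at this level. */
            while (*pa && toy_weight(*pa, level) == 0)
                pa++;
            while (*pb && toy_weight(*pb, level) == 0)
                pb++;

            if (!*pa || !*pb) {
                if (*pa)
                    return 1;   /* a has weighted characters left */
                if (*pb)
                    return -1;  /* b has weighted characters left */
                break;          /* tie at this level, try the next */
            }

            int wa = toy_weight(*pa, level);
            int wb = toy_weight(*pb, level);
            if (wa != wb)
                return wa < wb ? -1 : 1;
            pa++;
            pb++;
        }
    }
    return 0;
}
```

In this model two strings made only of unknown bytes tie on the first three levels after a full scan each, and the last level can only tell them apart by length, which mirrors the behaviour reported for Thai words under the English locale.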
> The former version had a cache that avoided lookups in the locale data tables for
> passes > 1, which probably helped in this scenario (but slowed down all others).
>
> Anyhow, the huge difference is astonishing. Next I will investigate how exactly the
> sequence lookup works to figure out why it takes so long. If anyone has an
> idea and can point me in the right direction, please comment.
>
> Best,
> Leonhard
>
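For anyone who wants to reproduce the comparison from the quoted message, a minimal timing sketch could look like the following. It is not the glibc benchmark harness that produced the numbers above; `time_strcoll` is a hypothetical helper, and it only uses the standard `setlocale`, `strcoll` and `clock_gettime` calls.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L  /* for clock_gettime under strict -std=c11 */
#endif
#include <locale.h>
#include <string.h>
#include <time.h>

/* Hypothetical helper: time `iterations` calls of strcoll(a, b) under the
   named LC_COLLATE locale. Returns 0 and stores the mean time per call in
   nanoseconds in *mean_ns, or returns -1 if the iteration count is not
   positive or the locale cannot be selected (e.g. it is not installed). */
static int time_strcoll(const char *locale_name, const char *a, const char *b,
                        long iterations, double *mean_ns)
{
    if (iterations <= 0 || setlocale(LC_COLLATE, locale_name) == NULL)
        return -1;

    struct timespec start, end;
    volatile int sink = 0;  /* keeps the result of each call observable */

    clock_gettime(CLOCK_MONOTONIC, &start);
    for (long i = 0; i < iterations; i++)
        sink = strcoll(a, b);
    clock_gettime(CLOCK_MONOTONIC, &end);
    (void) sink;

    double elapsed_ns = (double) (end.tv_sec - start.tv_sec) * 1e9
                        + (double) (end.tv_nsec - start.tv_nsec);
    *mean_ns = elapsed_ns / (double) iterations;
    return 0;
}
```

Calling it once with "th_TH.UTF-8" and once with an English locale such as "en_US.UTF-8" (an assumed name for the English locale mentioned above) on the same pair of Thai strings should show whether the slowdown is present on a given system. Both locales must be installed, otherwise the helper returns -1.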