This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: [BZ #18441] strcoll performance regression


Hey,

maybe someone can attach this as a comment to
https://sourceware.org/bugzilla/show_bug.cgi?id=18441?

Best,
Leonhard

On 28.05.2015 at 22:58, Leonhard Holz wrote:
> Hello,
> 
> the trigger for the regression is that the locale has no information about the
> sort order of the given characters. With the locale th_TH it is pretty quick:
> 
> "strcoll": {
>    "wikipedia-th#th_TH.UTF-8": {
>     "duration": 4.31123e+06,
>     "iterations": 16,
>     "mean": 269452
>    }
> }
> 
> The English locale has four passes to determine the sort order. In the first three
> passes it reports a recognized sequence length of zero, independent of the Thai
> word given. At the fourth level it recognizes the characters, which are all
> considered equal, so the string length actually determines the sort order.
> 
> The former version had a cache that avoided lookups in the locale data tables for
> passes > 1, which probably helped in this scenario (but slows down all others).
> 
> Anyhow, the huge difference is astonishing. Next I will investigate how exactly the
> sequence lookup works to figure out why it takes so long. But if anyone has an
> idea and can point me in the right direction, please comment.
> 
> Best,
> Leonhard
> 

