This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.


Re: [BZ #18441] strcoll performance regression

From: Leonhard Holz <leonhard dot holz at web dot de>
To: libc-alpha at sourceware dot org
Date: Thu, 18 Jun 2015 12:30:00 +0200
Subject: Re: [BZ #18441] strcoll performance regression
Authentication-results: sourceware.org; auth=none
References: <55678186 dot 40106 at web dot de>

Hey,

could someone attach this as a comment to bug 18441?
https://sourceware.org/bugzilla/show_bug.cgi?id=18441

Best,
Leonhard

On 28.05.2015 at 22:58, Leonhard Holz wrote:
> Hello,
> 
> the regression is triggered when the locale has no information about the
> sort order of the given characters. With the th_TH locale it is pretty quick:
> 
> "strcoll": {
>    "wikipedia-th#th_TH.UTF-8": {
>     "duration": 4.31123e+06,
>     "iterations": 16,
>     "mean": 269452
>    }
> }
> 
> The English locale needs four passes to determine the sort order. In the first three
> passes it reports a recognized sequence length of zero, regardless of the Thai
> word given. In the fourth pass it recognizes the characters, which are all
> considered equal, so the string length actually determines the sort order.
> 
> The former version had a cache that avoided lookups in the locale data tables for
> passes > 1, which probably helped in this scenario (but slowed down all others).
> 
> Anyhow, the huge difference is astonishing. Next I will investigate how exactly the
> sequence lookup works to figure out why it takes so long. If anyone has an
> idea and can point me in the right direction, please comment.
> 
> Best,
> Leonhard
>
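
For anyone who wants to compare the two locales outside the benchmark
suite, here is a minimal sketch of a timing helper. It is only an
illustration: the name time_strcoll and its parameters are made up for this
example and are not part of glibc or its benchtests, and it measures CPU time
with clock(), so the numbers are not comparable to the figures quoted above.

```c
#include <locale.h>
#include <string.h>
#include <time.h>

/* Hypothetical helper (not part of glibc): time `iterations` calls of
   strcoll (a, b) with LC_COLLATE set to `locale_name`.
   Returns the elapsed CPU time in seconds, or -1.0 if setlocale rejects
   the locale name (for example because the locale is not installed).  */
double
time_strcoll (const char *locale_name, const char *a, const char *b,
              long iterations)
{
  if (setlocale (LC_COLLATE, locale_name) == NULL)
    return -1.0;

  /* Store each result in a volatile so the calls are not optimized away.  */
  volatile int sink = 0;

  clock_t start = clock ();
  for (long i = 0; i < iterations; i++)
    sink = strcoll (a, b);
  clock_t end = clock ();
  (void) sink;

  /* Restore the default collation so later calls are unaffected.  */
  setlocale (LC_COLLATE, "C");

  return (double) (end - start) / CLOCKS_PER_SEC;
}
```

Calling it once with "th_TH.UTF-8" and once with "en_US.UTF-8" on the same
pair of Thai strings should show the kind of gap described above, assuming
both locales are installed on the test machine.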
