This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
[RFC] further speedup of strcoll
- From: Leonhard Holz <leonhard dot holz at web dot de>
- To: libc-alpha at sourceware dot org
- Date: Wed, 11 Feb 2015 10:38:02 +0100
- Subject: [RFC] further speedup of strcoll
- Authentication-results: sourceware.org; auth=none
The strcoll method can probably be made faster by fast-forwarding to the position
of the first difference of the two strings via bytewise comparison so that the
expensive location-aware comparison has to deal with the interesting part of the
strings only. But this has some implications so I ask for your feedback.
First issue is that I do not know of a function which returns the position of the
first difference of two strings. This functionality is actually included in strcmp
but not accessible. I could implement a new __ function in C and later carve out
platform dependent versions from strcmp.
Then it needs to be encoding aware as for anything but 8bit encodings the bytewise
comparison can stop in the middle of a multi-byte value. Actually there is no easy
way get the encoding type (one could check for "UTF-8" in the locale name, but
this is way to expensive) and probably __locale_t has to be extended with an int
encoding_type that e.g. states 0 for other, 1 for 8-bit, 2 for UTF-8 and so on. Is
this feasible?
And finally it is necessary to check if the difference is inside a backward
sequence since then there could be a difference after the found difference that is
logically before it. And since there is no way to find the start of a backward
sequence from within the comparison has to start from the beginning of the strings
and the effort to get the position of the first difference is wasted. But this
should be overweighted by the speedup in the other cases, especially if the
strings are actually equal.
Best,
Leonhard