This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

[RFC] further speedup of strcoll


The strcoll method can probably be made faster by fast-forwarding to the position of the first difference of the two strings via bytewise comparison so that the expensive location-aware comparison has to deal with the interesting part of the strings only. But this has some implications so I ask for your feedback.

First issue is that I do not know of a function which returns the position of the first difference of two strings. This functionality is actually included in strcmp but not accessible. I could implement a new __ function in C and later carve out platform dependent versions from strcmp.

Then it needs to be encoding aware as for anything but 8bit encodings the bytewise comparison can stop in the middle of a multi-byte value. Actually there is no easy way get the encoding type (one could check for "UTF-8" in the locale name, but this is way to expensive) and probably __locale_t has to be extended with an int encoding_type that e.g. states 0 for other, 1 for 8-bit, 2 for UTF-8 and so on. Is this feasible?

And finally it is necessary to check if the difference is inside a backward sequence since then there could be a difference after the found difference that is logically before it. And since there is no way to find the start of a backward sequence from within the comparison has to start from the beginning of the strings and the effort to get the position of the first difference is wasted. But this should be overweighted by the speedup in the other cases, especially if the strings are actually equal.

Best,
Leonhard


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]