This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.
Re: [RFC] Statistics of non-ASCII characters in strings
- From: Ondřej Bílka <neleai at seznam dot cz>
- To: Florian Weimer <fweimer at redhat dot com>
- Cc: Wilco Dijkstra <wdijkstr at arm dot com>, libc-alpha at sourceware dot org
- Date: Tue, 23 Dec 2014 16:20:49 +0100
- Subject: Re: [RFC] Statistics of non-ASCII characters in strings
- References: <001401d01df6$0f7cc5a0$2e7650e0$ at com> <54997DBF dot 6070305 at redhat dot com>
On Tue, Dec 23, 2014 at 03:35:43PM +0100, Florian Weimer wrote:
> On 12/22/2014 03:46 PM, Wilco Dijkstra wrote:
> >Does anyone have statistics of how often strings contain non-ASCII characters? I'm asking because
> >it's feasible to make many string functions faster if they are predominantly ASCII by using a
> >different check for the null byte.
>
> Why can't you do the equivalent of
>
> X = ((X & 0x80) >> 1) | (X & 0x7F);
>
> before the new check? Does this lengthen the dependency chain too much?
>
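For reference, here is a minimal C sketch of the word-at-a-time checks under discussion, assuming 64-bit words. The thread does not spell out the ASCII-only check, so `has_zero_ascii` below assumes it is the classic `(x - 0x01..01) & 0x80..80` test with the `& ~x` step dropped. `fold_high_bit` is a word-wide version of the per-byte fold quoted above. All helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helpers for 64-bit words; the names are illustrative. */
#define ONES  0x0101010101010101ULL
#define HIGHS 0x8080808080808080ULL

/* Load 8 bytes without alignment or aliasing problems. */
static uint64_t load_word(const char *p)
{
    uint64_t w;
    memcpy(&w, p, sizeof w);
    return w;
}

/* Classic exact test: nonzero iff some byte of x is 0x00. */
static uint64_t has_zero_exact(uint64_t x)
{
    return (x - ONES) & ~x & HIGHS;
}

/* Assumed form of the cheaper ASCII check: no "& ~x" step. Nonzero iff some
   byte is 0x00 or some byte is >= 0x81, so it is exact only when every byte
   lies in 0x00..0x80. */
static uint64_t has_zero_ascii(uint64_t x)
{
    return (x - ONES) & HIGHS;
}

/* Word-wide form of the quoted per-byte fold: OR bit 7 of each byte into
   bit 6 and clear bit 7. Each byte becomes <= 0x7F and stays nonzero iff it
   was nonzero, so has_zero_ascii(fold_high_bit(x)) is exact. */
static uint64_t fold_high_bit(uint64_t x)
{
    return ((x & HIGHS) >> 1) | (x & ~HIGHS);
}

/* Usage: the ASCII-only check misfires on UTF-8 bytes; after folding it
   does not. */
static void check_examples(void)
{
    uint64_t ascii = load_word("abcdefgh");
    uint64_t utf8  = load_word("caf\xc3\xa9" "xyz");  /* "café" in UTF-8, then "xyz" */
    assert(has_zero_exact(ascii) == 0);
    assert(has_zero_ascii(ascii) == 0);
    assert(has_zero_exact(utf8) == 0);
    assert(has_zero_ascii(utf8) != 0);                 /* false positive */
    assert(has_zero_ascii(fold_high_bit(utf8)) == 0);  /* removed by the fold */
}
```

In this sketch the fold spends two ANDs, a shift and an OR to save the single `& ~x` of the exact test.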
When the string is short and you never enter the loop, it is best to
locate the null byte exactly. For longer strings you get considerable
savings even by skipping a single operation per iteration.
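A sketch of that split as a hypothetical `word_strlen`, with the helpers repeated so the block stands alone: the first word gets the exact test, so a short string never enters the loop, while the loop uses the cheaper test and falls back to the exact one only when it fires. To stay within defined C behavior it assumes the string sits in a zero-padded buffer readable in whole 8-byte words; optimized implementations typically rely on aligned loads that cannot cross a page boundary instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ONES  0x0101010101010101ULL
#define HIGHS 0x8080808080808080ULL

/* Exact: nonzero iff some byte of x is 0x00. */
static uint64_t has_zero_exact(uint64_t x) { return (x - ONES) & ~x & HIGHS; }

/* Cheaper: also nonzero when some byte is >= 0x81 (false positive). */
static uint64_t has_zero_ascii(uint64_t x) { return (x - ONES) & HIGHS; }

/* Hypothetical strlen. Assumes s is readable in whole 8-byte words up to the
   word holding the terminator (e.g. a zero-padded buffer whose size is a
   multiple of 8). */
static size_t word_strlen(const char *s)
{
    const char *p = s;
    uint64_t w;

    memcpy(&w, p, sizeof w);
    /* Short string: the first word gets the exact test and the loop is
       never entered. */
    if (!has_zero_exact(w)) {
        /* Long string: each word gets the cheaper test; the exact test runs
           only when it fires (a real NUL or a byte >= 0x81). */
        do {
            p += sizeof w;
            memcpy(&w, p, sizeof w);
        } while (!(has_zero_ascii(w) && has_zero_exact(w)));
    }
    /* The word at p holds the first NUL; find it bytewise, which avoids
       depending on byte order. */
    while (*p != '\0')
        p++;
    return (size_t)(p - s);
}

/* Usage: strings in zero-padded 64-byte buffers. */
static void check_examples(void)
{
    char ascii[64] = "a string longer than eight bytes";
    /* UTF-8 bytes trip the cheap test; the exact test then rejects them. */
    char utf8[64] = "caf\xc3\xa9 na\xc3\xafve r\xc3\xa9sum\xc3\xa9";
    assert(word_strlen(ascii) == strlen(ascii));
    assert(word_strlen(utf8) == strlen(utf8));
}
```

Since `has_zero_exact(w)` implies `has_zero_ascii(w)`, the loop condition is equivalent to the exact test alone; the `&&` just ensures the extra `& ~x` is evaluated only on the rare words where the cheap test fires.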