This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: [RFC] Statistics of non-ASCII characters in strings
- From: Alexander Monakov <amonakov at ispras dot ru>
- To: Florian Weimer <fweimer at redhat dot com>
- Cc: Wilco Dijkstra <wdijkstr at arm dot com>, libc-alpha at sourceware dot org
- Date: Tue, 23 Dec 2014 18:25:07 +0300 (MSK)
- Subject: Re: [RFC] Statistics of non-ASCII characters in strings
- Authentication-results: sourceware.org; auth=none
- References: <001401d01df6$0f7cc5a0$2e7650e0$ at com> <54997DBF dot 6070305 at redhat dot com>
On Tue, 23 Dec 2014, Florian Weimer wrote:
> Why can't you do the equivalent of
>
> X = ((X & 0x80) >> 1) | (X & 0x7F);
>
> before the new check? Does this lengthen the dependency chain too much?
If I understood the previous discussion correctly, there's another possibility.
Wilco's proposal is to use a zero-byte matcher that would give a false
positive on byte 0x80. One can use such a matcher to skip from the beginning of
the string to the first occurrence of either 0x0 or 0x80, and then
continue with normal strlen from there.
Alexander
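
The two-phase idea described above can be sketched roughly as follows. This
is only an illustration, not code from the thread or from glibc: the matcher
formula, the 64-bit word size, and the names has_zero_or_0x80 and
sketch_strlen are all assumptions made for the example. The matcher is one
plausible cheap test that fires on bytes 0x00 and 0x80 and on nothing else;
the matcher Wilco actually proposed may differ.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define HIGHS UINT64_C(0x8080808080808080)
#define LOW7  UINT64_C(0x7f7f7f7f7f7f7f7f)

/* Hypothetical cheap matcher: returns 1 iff some byte of x is 0x00 or 0x80.
   Adding 0x7f to the low 7 bits of a byte sets bit 7 exactly when those
   7 bits are nonzero, and 0x7f + 0x7f = 0xfe never carries into the next
   byte.  Bit 7 of the input is ignored, hence the false positive on 0x80. */
static int has_zero_or_0x80(uint64_t x)
{
    return (~((x & LOW7) + LOW7) & HIGHS) != 0;
}

/* Sketch of the two-phase strlen.  Like word-at-a-time libc routines, it
   reads whole aligned words and may therefore touch bytes past the
   terminator (never past the enclosing aligned word); that goes beyond
   what strict ISO C guarantees for arbitrary objects. */
static size_t sketch_strlen(const char *s)
{
    const char *p = s;
    uint64_t w;

    /* Advance byte by byte until p is word-aligned. */
    while ((uintptr_t)p % sizeof w != 0) {
        if (*p == '\0')
            return (size_t)(p - s);
        p++;
    }

    /* Phase 1: skip whole words with the cheap matcher.  The loop stops
       at the first word containing a 0x00 or a 0x80 byte. */
    for (;;) {
        memcpy(&w, p, sizeof w);
        if (has_zero_or_0x80(w))
            break;
        p += sizeof w;
    }

    /* Phase 2: no terminator lies before p, so an exact strlen can
       continue from here. */
    return (size_t)(p - s) + strlen(p);
}
```

For pure ASCII strings phase 1 covers everything up to the word holding the
terminator; a string with an early 0x80 byte simply drops to the exact
routine sooner, so the result stays correct either way.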