This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: [RFC] Statistics of non-ASCII characters in strings
- From: Rich Felker <dalias at libc dot org>
- To: Alexander Monakov <amonakov at ispras dot ru>
- Cc: Florian Weimer <fweimer at redhat dot com>, Wilco Dijkstra <wdijkstr at arm dot com>, libc-alpha at sourceware dot org
- Date: Tue, 23 Dec 2014 14:26:34 -0500
- Subject: Re: [RFC] Statistics of non-ASCII characters in strings
- References: <001401d01df6$0f7cc5a0$2e7650e0$ at com> <54997DBF dot 6070305 at redhat dot com> <alpine dot LNX dot 2 dot 11 dot 1412231751580 dot 32565 at monopod dot intra dot ispras dot ru>
On Tue, Dec 23, 2014 at 06:25:07PM +0300, Alexander Monakov wrote:
>
>
> On Tue, 23 Dec 2014, Florian Weimer wrote:
> > Why can't you do the equivalent of
> >
> > X = ((X & 0x80) >> 1) | (X & 0x7F);
> >
> > before the new check? Does this lengthen the dependency chain too much?
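A rough word-at-a-time sketch of the folding step suggested above, for illustration only. The helper names, the 64-bit word size, and the particular "cheap" zero check (assumed exact only when every byte is below 0x80) are assumptions and not taken from the thread:

```c
#include <stdint.h>

#define ONES ((uint64_t)0x0101010101010101ULL)
#define LOW7 (ONES * 0x7f)   /* 0x7f in every byte */
#define HIGH (ONES * 0x80)   /* 0x80 in every byte */

/* Word-wide form of X = ((X & 0x80) >> 1) | (X & 0x7F): bit 7 of each
   byte is folded into bit 6, so a byte stays nonzero iff it was nonzero,
   and every resulting byte is below 0x80. */
static uint64_t fold_high_bit(uint64_t w)
{
    return ((w & HIGH) >> 1) | (w & LOW7);
}

/* Assumed cheap check (hypothetical): exact only if all bytes are < 0x80.
   Adding 0x7f sets bit 7 of a byte iff that byte is nonzero, and no carry
   can cross a byte boundary because each sum is at most 0xfe. */
static uint64_t has_zero_7bit(uint64_t w)
{
    return ~(w + LOW7) & HIGH;
}

/* Folding first makes the cheap check exact for arbitrary bytes, at the
   cost of extra operations ahead of it in the dependency chain. */
static int word_has_zero_byte(uint64_t w)
{
    return has_zero_7bit(fold_high_bit(w)) != 0;
}
```

The fold costs two ANDs, a shift and an OR per word before the check can start, which is the dependency-chain concern raised in the question.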
>
> If I understood the previous discussion correctly, there's another possibility.
> Wilco's proposal is to use a zero-byte matcher that gives a false positive
> on byte 0x80. One can use such a matcher to skip from the beginning of the
> string to the first occurrence of either 0x0 or 0x80, and then continue with
> a normal strlen from there.
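A minimal sketch of the two-phase idea quoted above, under stated assumptions. The matcher formula, the names zero_or_0x80 and two_phase_strlen, and the 64-bit word size are hypothetical; the thread does not give Wilco's actual matcher. Like typical word-at-a-time string code, the loop may read a few bytes past the terminator within one aligned word, which real libc code relies on but ISO C does not guarantee:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ONES ((uint64_t)0x0101010101010101ULL)
#define LOW7 (ONES * 0x7f)   /* 0x7f in every byte */
#define HIGH (ONES * 0x80)   /* 0x80 in every byte */

/* Hypothetical cheap matcher: bit 7 of a result byte is set when the
   input byte is 0x00 or 0x80 (the false positive), clear otherwise. */
static uint64_t zero_or_0x80(uint64_t w)
{
    return ~((w & LOW7) + LOW7) & HIGH;
}

size_t two_phase_strlen(const char *s)
{
    const char *p = s;

    /* Byte loop until p is word-aligned, so that a word read below
       never straddles a page boundary. */
    while ((uintptr_t)p % sizeof(uint64_t) != 0) {
        if (*p == '\0')
            return (size_t)(p - s);
        p++;
    }

    /* Phase 1: skip whole words that contain neither 0x00 nor 0x80. */
    for (;;) {
        uint64_t w;
        memcpy(&w, p, sizeof w);   /* aligned load; may over-read the tail */
        if (zero_or_0x80(w))
            break;
        p += sizeof w;
    }

    /* Phase 2: the word at p holds a 0x00 or a 0x80, so fall back to an
       exact strlen from here. */
    return (size_t)(p - s) + strlen(p);
}
```

For pure ASCII input the fast loop runs all the way to the word holding the terminator; the exact routine only does significant work once a 0x80 byte shows up.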
This sounds like a very good approach.
Rich