This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.

Re: [RFC] Statistics of non-ASCII characters in strings

From: Rich Felker <dalias at libc dot org>
To: Alexander Monakov <amonakov at ispras dot ru>
Cc: Florian Weimer <fweimer at redhat dot com>, Wilco Dijkstra <wdijkstr at arm dot com>, libc-alpha at sourceware dot org
Date: Tue, 23 Dec 2014 14:26:34 -0500
Subject: Re: [RFC] Statistics of non-ASCII characters in strings
Authentication-results: sourceware.org; auth=none
References: <001401d01df6$0f7cc5a0$2e7650e0$ at com> <54997DBF dot 6070305 at redhat dot com> <alpine dot LNX dot 2 dot 11 dot 1412231751580 dot 32565 at monopod dot intra dot ispras dot ru>

On Tue, Dec 23, 2014 at 06:25:07PM +0300, Alexander Monakov wrote:
> 
> 
> On Tue, 23 Dec 2014, Florian Weimer wrote:
> > Why can't you do the equivalent of
> > 
> >   X = ((X & 0x80) >> 1) | (X & 0x7F);
> > 
> > before the new check?  Does this lengthen the dependency chain too much?
> 
> If I understood the previous discussion correctly, there's another possibility.
> Wilco's proposal is to use a zero byte matcher that would give a false
> positive on byte 0x80.  One can use such a matcher to skip from the beginning of
> the string to the first occurrence of either 0x0 or 0x80 in the string, and then
> continue with normal strlen from there.

This sounds like a very good approach.
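
For illustration only, here is a minimal sketch of that skip-then-strlen
idea in C. It is not code from this thread or from glibc. The matcher
shown (zero_or_0x80) is an assumption: one possible word-at-a-time test
that fires on bytes 0x00 and 0x80, since the exact matcher Wilco proposed
is not quoted here. The name sketch_strlen is hypothetical as well. Like
typical word-at-a-time string code, it reads whole aligned 8-byte words
and so may read a few bytes past the terminator within the same word,
which is the usual libc-internal assumption rather than strict ISO C.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ONES ((uint64_t)0x0101010101010101ULL)
#define LOW7 (ONES * 0x7F)   /* 0x7f in every byte */
#define HIGH (ONES * 0x80)   /* 0x80 in every byte */

/* Assumed matcher: nonzero iff some byte of x has its low seven bits
   all clear, i.e. the byte is 0x00 or (false positive) 0x80.  The
   per-byte addition never carries into the next byte. */
static uint64_t zero_or_0x80(uint64_t x)
{
    return ~((x & LOW7) + LOW7) & HIGH;
}

/* Hypothetical strlen: skip quickly with the inexact matcher, then
   let the ordinary strlen finish from the first candidate word. */
size_t sketch_strlen(const char *s)
{
    const char *p = s;

    /* Go byte by byte until p is aligned to the word size. */
    while ((uintptr_t)p % sizeof(uint64_t) != 0) {
        if (*p == '\0')
            return (size_t)(p - s);
        p++;
    }

    /* Skip whole words that contain neither 0x00 nor 0x80. */
    for (;;) {
        uint64_t w;
        memcpy(&w, p, sizeof w);   /* aligned 8-byte load */
        if (zero_or_0x80(w))
            break;
        p += sizeof w;
    }

    /* Either the terminator or a 0x80 byte is in this word; the
       exact strlen sorts out which and handles the rest. */
    return (size_t)(p - s) + strlen(p);
}
```

For pure ASCII input the fast loop runs all the way to the word holding
the terminator. For text containing 0x80 (for example as a UTF-8
continuation byte) the result is still correct; only the remainder of
the string is handled by the regular strlen.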

Rich

References:
- [RFC] Statistics of non-ASCII characters in strings
  - From: Wilco Dijkstra
- Re: [RFC] Statistics of non-ASCII characters in strings
  - From: Florian Weimer
- Re: [RFC] Statistics of non-ASCII characters in strings
  - From: Alexander Monakov
