This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.


Re: [RFC] Statistics of non-ASCII characters in strings

From: Alexander Monakov <amonakov at ispras dot ru>
To: Florian Weimer <fweimer at redhat dot com>
Cc: Wilco Dijkstra <wdijkstr at arm dot com>, libc-alpha at sourceware dot org
Date: Tue, 23 Dec 2014 18:25:07 +0300 (MSK)
Subject: Re: [RFC] Statistics of non-ASCII characters in strings
Authentication-results: sourceware.org; auth=none
References: <001401d01df6$0f7cc5a0$2e7650e0$ at com> <54997DBF dot 6070305 at redhat dot com>

On Tue, 23 Dec 2014, Florian Weimer wrote:
> Why can't you do the equivalent of
> 
>   X = ((X & 0x80) >> 1) | (X & 0x7F);
> 
> before the new check?  Does this lengthen the dependency chain too much?

If I understood the previous discussion correctly, there's another possibility.
Wilco's proposal is to use a zero-byte matcher that would give a false
positive on byte 0x80.  One can use such a matcher to skip from the beginning of
the string to the first occurrence of either 0x0 or 0x80, and then continue with
a normal strlen from there.
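
To make the idea concrete, here is a rough sketch in C. It is not taken from
any patch in this thread: the name sketch_strlen, the 64-bit word size and the
particular matcher (it flags any byte whose low 7 bits are clear, i.e. 0x00 or
0x80) are assumptions chosen for illustration, and Wilco's matcher may well
differ. Like most word-at-a-time string code, it assumes that aligned word
loads which run past the terminator are harmless on the target.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define LOW7  UINT64_C(0x7f7f7f7f7f7f7f7f)
#define HIGHS UINT64_C(0x8080808080808080)

/* Hypothetical matcher: nonzero iff some byte of x is 0x00 or 0x80.
   Adding 0x7f to the low 7 bits of a byte sets its top bit exactly when
   those 7 bits are nonzero, and never carries into the next byte.  */
static inline uint64_t zero_or_0x80(uint64_t x)
{
    return ~((x & LOW7) + LOW7) & HIGHS;
}

/* Hypothetical two-phase strlen built on the inexact matcher.  */
size_t sketch_strlen(const char *s)
{
    const char *p = s;

    /* Go byte by byte until p is aligned to the word size.  */
    while ((uintptr_t)p % sizeof(uint64_t) != 0) {
        if (*p == '\0')
            return (size_t)(p - s);
        p++;
    }

    /* Phase 1: skip whole words containing neither 0x00 nor 0x80.  */
    for (;;) {
        uint64_t w;
        memcpy(&w, p, sizeof w);   /* aligned load without aliasing issues */
        if (zero_or_0x80(w))
            break;
        p += sizeof w;
    }

    /* Phase 2: the match may be a false positive (0x80), so finish with
       an exact strlen starting at the word that matched.  */
    return (size_t)(p - s) + strlen(p);
}
```

For pure-ASCII strings phase 1 covers everything up to the word that holds
the terminator, so the inexact matcher does nearly all of the work. A string
with an early 0x80 byte simply falls through to the exact strlen sooner.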

Alexander

Follow-Ups:
- Re: [RFC] Statistics of non-ASCII characters in strings
  - From: Rich Felker

References:
- [RFC] Statistics of non-ASCII characters in strings
  - From: Wilco Dijkstra
- Re: [RFC] Statistics of non-ASCII characters in strings
  - From: Florian Weimer
