This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: [RFC] Statistics of non-ASCII characters in strings


On 12/23/2014 04:47 PM, Carlos O'Donell wrote:
> On 12/22/2014 09:46 AM, Wilco Dijkstra wrote:
>> Does anyone have statistics of how often strings contain non-ASCII
>> characters? I'm asking because it's feasible to make many string
>> functions faster if they are predominantly ASCII by using a different
>> check for the null byte. So if say 80-90% of strings in strcpy/strlen
>> are ASCII then it would be well worth optimizing for it.
>
> I don't know that anyone has this data.
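
For readers unfamiliar with the idea, here is a minimal sketch of what "a different check for the null byte" could look like. It is not glibc's code and not necessarily what Wilco has in mind. The names has_zero, maybe_zero and sketch_strlen are made up for illustration, and the buffer-capacity parameter exists only so the sketch never reads out of bounds. The cheaper test saves one operation per word but can fire spuriously on bytes >= 0x81, so it only pays off when most words are pure ASCII.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ONES  UINT64_C(0x0101010101010101)
#define HIGHS UINT64_C(0x8080808080808080)

/* Exact test: nonzero if and only if some byte of w is 0x00. */
static inline uint64_t has_zero(uint64_t w)
{
    return (w - ONES) & ~w & HIGHS;
}

/* Cheaper test (one operation fewer): nonzero whenever a byte is 0x00,
   but it can also be nonzero when a byte is >= 0x81. If every byte is
   ASCII (< 0x80), it is nonzero exactly when a zero byte is present. */
static inline uint64_t maybe_zero(uint64_t w)
{
    return (w - ONES) & HIGHS;
}

/* Hypothetical helper: length of the NUL-terminated string in buf.
   Assumes cap is a multiple of 8 and buf[0..cap) contains a NUL,
   so every 8-byte load stays inside the buffer. */
static size_t sketch_strlen(const char *buf, size_t cap)
{
    size_t i = 0;

    assert(cap % sizeof(uint64_t) == 0);
    while (i + sizeof(uint64_t) <= cap) {
        uint64_t w;
        memcpy(&w, buf + i, sizeof w);
        /* Fast path: the cheap test; run the exact test only when
           the cheap one fires, to rule out a non-ASCII false hit. */
        if (maybe_zero(w) != 0 && has_zero(w) != 0)
            break;
        i += sizeof w;
    }
    /* Locate the terminator inside the final word byte by byte. */
    while (i < cap && buf[i] != '\0')
        i++;
    return i;
}
```

With mostly-ASCII input the second test almost never runs. With mostly non-ASCII input both tests run on nearly every word, which is slower than using the exact test alone. That is why the statistics matter.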

The OpenJDK folks are collecting somewhat similar data as part of this project:

  <http://openjdk.java.net/jeps/8054307>

The question is slightly different (how many strings exist which contain non-ASCII characters, and how many of them are not even ISO-8859-1?). Even though the application behavior under consideration is less dynamic (you can get that from a heap dump), it is difficult to obtain such data.
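
To make the three-way split concrete, here is a rough sketch of how one string could be classified on the C side. It assumes the strings are valid UTF-8, which is an assumption of this example and not how Java stores strings. The names str_class and classify_utf8 are invented for illustration. It relies on the fact that code points U+0080 through U+00FF encode in UTF-8 as a lead byte 0xC2 or 0xC3 followed by one continuation byte.

```c
#include <stddef.h>

enum str_class { STR_ASCII, STR_LATIN1, STR_OTHER };

/* Classify a NUL-terminated string, assumed to be valid UTF-8:
   STR_ASCII  - every byte is below 0x80
   STR_LATIN1 - every code point is at most U+00FF (fits ISO-8859-1)
   STR_OTHER  - at least one code point is above U+00FF */
static enum str_class classify_utf8(const char *s)
{
    enum str_class result = STR_ASCII;
    const unsigned char *p = (const unsigned char *)s;

    while (*p != 0) {
        if (*p < 0x80) {
            p++;                       /* plain ASCII byte */
        } else if ((*p == 0xC2 || *p == 0xC3) && (p[1] & 0xC0) == 0x80) {
            result = STR_LATIN1;       /* U+0080..U+00FF, two bytes */
            p += 2;
        } else {
            return STR_OTHER;          /* anything beyond Latin-1 */
        }
    }
    return result;
}
```

Counting the three outcomes over every string that passes through strlen or strcpy in a real workload is the hard part. The classification itself is cheap.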

--
Florian Weimer / Red Hat Product Security

