This is the mail archive of the newlib@sourceware.org mailing list for the newlib project.



Re: Generic strlen


On 10/29/2010 06:53 PM, Eric Blake wrote:
On 10/29/2010 04:46 PM, David A. Ramos wrote:
As long as reading beyond the end of a string does not fault, you can't
detect the violation of the standard, so the as-if rule applies.  Prove
to me that there is an architecture that can fault on anything less than
a word boundary, and then we'll talk about changing the code.  Until
then, this implementation may violate strict C89, but it is by all means
portable to all possible platforms that newlib will ever target.
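
For readers following along, here is a minimal sketch of the word-at-a-time technique under discussion. It is not newlib's actual source: the names `word_strlen`, `has_zero_byte`, `ONES` and `HIGHS` are made up for illustration, and it assumes 8-bit bytes and that an aligned `unsigned long` read never straddles a protection boundary.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative constants (assumption: CHAR_BIT == 8). */
#define ONES  (ULONG_MAX / 0xFF)   /* 0x0101...01 */
#define HIGHS (ONES * 0x80)        /* 0x8080...80 */

/* Nonzero if any byte of w is zero. */
static int has_zero_byte(unsigned long w)
{
    return ((w - ONES) & ~w & HIGHS) != 0;
}

/* Hypothetical word-at-a-time strlen, for illustration only. */
size_t word_strlen(const char *s)
{
    const char *p = s;
    const unsigned long *w;

    assert(s != NULL);

    /* Step byte by byte until p sits on a word boundary. */
    while (((uintptr_t)p % sizeof(unsigned long)) != 0) {
        if (*p == '\0')
            return (size_t)(p - s);
        p++;
    }

    /* Read whole aligned words. The last read may include bytes
       past the terminator, but it never leaves the aligned word
       that contains the terminator. */
    w = (const unsigned long *)p;
    while (!has_zero_byte(*w))
        w++;

    /* Find the terminator inside the final word. */
    p = (const char *)w;
    while (*p != '\0')
        p++;
    return (size_t)(p - s);
}
```

The over-read is confined to the aligned word holding the terminator, which is why it cannot fault on hardware that protects memory at word granularity or coarser. It is still outside what strict C permits, as acknowledged above.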

Take a look at the February 2008 edition of the Intel 64 and IA-32 Architectures Software Developer’s Manual Volume 3B: System Programming Guide, Part 2, Section 18.2: Debug Registers:


"For each breakpoint, the following information can be specified:
- The linear address where the breakpoint is to occur.
- The length of the breakpoint location (1, 2, or 4 bytes)."

Running under a debugger is not a normal expectation, and you are naive if you expect that libc will be using byte accesses when it is much faster to use word accesses.


"When the DE flag is set, the processor interprets bits as follows: 11 - Break on data reads or writes but not instruction fetches."

Using this version of strlen precludes a developer from setting a watchpoint on a byte within the same word as the end of a string. Such a watchpoint would, in fact, trigger erroneously whenever strlen reads that word, which makes debugging difficult.
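
To make the overlap concrete, the following sketch checks whether a watched address falls inside the bytes that a word-at-a-time strlen would read past the terminator. The function name `overread_hits_watch` and its parameters are hypothetical, and the model assumes strlen reads exactly one aligned word of `word_size` bytes around the terminator.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if `watch` lies after the terminator but inside the
   aligned word that contains it, i.e. in the over-read region.
   Hypothetical helper, for illustration only. */
int overread_hits_watch(uintptr_t str, size_t len,
                        uintptr_t watch, size_t word_size)
{
    uintptr_t nul;       /* address of the terminating '\0' */
    uintptr_t word_end;  /* one past the aligned word holding it */

    assert(word_size != 0);

    nul = str + len;
    word_end = (nul / word_size + 1) * word_size;
    return watch > nul && watch < word_end;
}
```

For example, with a 5-character string at address 0x1000 and 4-byte words, the terminator is at 0x1005 and the over-read covers 0x1006 and 0x1007, so a byte watchpoint at either address would be hit even though neither byte belongs to the string.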

If you're going to the extremes of setting watchpoints on the tail of a string, then you should either be prepared to watch all possible word read sizes, or supply your own strlen() implementation, overriding libc, that does the naive (and SLOW) byte-wise access to guarantee that your debugging session will hit what you want. But we should not penalize libc for this non-typical use.
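
A byte-wise override of the kind described here could look like the sketch below. The name `bytewise_strlen` is hypothetical; an actual override would be named `strlen` and linked ahead of libc, which is not shown.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical byte-at-a-time strlen: touches only the bytes of
   the string and its terminator, never anything beyond them. */
size_t bytewise_strlen(const char *s)
{
    const char *p = s;

    assert(s != NULL);

    while (*p != '\0')
        p++;
    return (size_t)(p - s);
}
```

Because each iteration reads a single byte, a watchpoint next to the string is hit only if that byte is part of the string or its terminator, at the cost of one load and compare per character.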

glibc's generic code versions do _the exact same thing_ of reading
beyond string bounds in a lot of their str* functions, and I don't see
anyone asking glibc to change their generic version.  Just because word
accesses might make debugging a bit more difficult, and just because you
have to add exceptions to your memory tracer tools to skip known safe
patterns like strlen() reading an entire aligned word even though it
exceeds the bounds of the string ending in that word, does not mean that
we should pessimize the code.


Agreed. The glibc implementation of strlen uses the same algorithm. One can already optionally build newlib with byte-wise versions of the str* functions by using the --enable-target-optspace option.


-- Jeff J.

