This is the mail archive of the libc-alpha@sourceware.cygnus.com mailing list for the glibc project.



Re: strtok bug


On Fri, 14 Jan 2000, Clifford Wolf wrote:

> The other thing is that it only dumps core if the first argument is "".
> 
> strtok("a",":"); strtok(NULL,":");  strtok(NULL,":"); strtok(NULL,":");

By the way, in general, it is not safe to call strtok with a string literal as
the first argument, because strtok modifies that string when it extracts a
token.  That shouldn't happen in these examples, since no separator
character to be overwritten follows the token.
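To make the modification visible, here is a small sketch. The helper name
separator_was_overwritten is made up for illustration. It tokenizes a
modifiable array instead of a literal and checks that the ':' following the
token has been replaced by a null byte.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: returns 1 if strtok replaced the ':' that
   follows the first token with a null byte. */
int separator_was_overwritten(void)
{
    char buf[] = "a:b";            /* modifiable array, not a literal */
    char *tok = strtok(buf, ":");

    assert(tok == buf);            /* the token starts at the 'a' */
    return buf[1] == '\0';         /* the ':' has been overwritten */
}
```

With a string literal in place of buf, that same store would land in memory
the program is not allowed to modify.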

> Works fine on ppc because the pointer is saved as soon as the first
> character has been processed. There is only one exception: If there is
> no character processed.

I believe the exception is more general than that: if no token is extracted.
Characters may be processed even though no token is extracted.

Have you tried, instead of an empty string, strings containing sequences
of separator characters?

For example:

    char colon[] = ":::";

    strtok(colon, ":"); strtok(NULL, ":");

This will probably also crash, because strtok has to skip the initial sequence
of separator characters while scanning for the first non-separator character.
Here it finds none, so characters are processed but no token is extracted.
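The skipping step can be expressed with the standard strspn function. The
following is only a sketch of the idea, not the library's code, and both
helper names are invented for this message. It deliberately avoids calling
strtok a second time, so it stays within defined behavior.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: number of leading separator bytes that a
   tokenizer has to skip before a token can start. */
size_t leading_separators(const char *s, const char *sep)
{
    assert(s != NULL && sep != NULL);
    return strspn(s, sep);
}

/* Hypothetical helper: returns 1 if the string holds no token at all,
   i.e. skipping the separators runs into the terminating null byte. */
int has_no_token(const char *s, const char *sep)
{
    assert(s != NULL && sep != NULL);
    return s[strspn(s, sep)] == '\0';
}
```

Both "" and ":::" have no token in this sense; the only difference is how
many characters are examined before that is known.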

> The X/Open standard says:
> 
> "... If no such byte is found, the current token extends to the end of the
> string pointed to by s1, and subsequent searches for a token will return a
> null pointer. ..."

ANSI C says the same thing; the X/Open wording here is derived straight from
it. This refers to the processing that takes place *after* any leading
separator characters have been skipped and the first non-separator character
has been found.  The task then is to find the end of the token, by scanning
for the first subsequent separator character.

The above text merely says that if the end of the string is encountered,
the token just extends to the end of the string; the terminating null
byte effectively acts as the separator.
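A short sketch of that case, using a made-up function name. The second token
is ended by the terminating null byte rather than by a ':', and the search
after it returns a null pointer.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: returns 1 if the last token runs to the end of
   the string and the following search returns a null pointer. */
int last_token_extends_to_end(void)
{
    char buf[] = "a:bc";
    char *first = strtok(buf, ":");     /* "a", the ':' is overwritten */
    char *second = strtok(NULL, ":");   /* "bc", ended by the null byte */
    char *third = strtok(NULL, ":");    /* no more tokens */

    assert(first != NULL && second != NULL);
    return strcmp(first, "a") == 0
        && strcmp(second, "bc") == 0
        && third == NULL;
}
```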

In this case, the pointer must be saved---that requirement is implicit
because subsequent searches must return null. Though I suppose that the
requirement could be met without saving the pointer, so I actually
wouldn't hastily come to that conclusion.
 
> Sure - It's not defined in the standard what should happen for subsequent
> searches when already the first call returned a NULL pointer. But it does
> not say 'Read and write from a NULL pointer and dump core'. It just says
> nothing about it - so the most logical thing would be to do the same thing
> as it's defined for the case that the last token has been reached and
> another search is invoked: returning a NULL pointer.

The absence of a definition of behavior constitutes undefined behavior. There
are several rational ways to handle undefined behavior.

Whenever it is easy and efficient to do so, undefined behavior should be
diagnosed: a diagnostic should be produced during program translation or
execution, and possibly the program should be stopped. It's certainly
nice if your compiler can help you find array overruns or arithmetic
overflows!

Secondly, if it's not easily diagnosed, the next best thing is to do nothing
about it. The behavior is documented as undefined, so the programmer should
know not to invoke it.

Lastly, undefined behavior may be an opportunity to provide some kind of
documented extension. This makes sense for only a few kinds of undefined
behavior.  Moreover, purposely correcting for undefined behavior is a poor
strategy. It only hinders the programmer in developing portable software.
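From the caller's side, a portable program can sidestep the question by never
calling strtok again once it has returned a null pointer. A sketch of that
pattern follows; count_tokens is a made-up name, not a library function.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: counts the tokens in buf, which is modified.
   The loop stops at the first null pointer and never calls strtok
   again after that. */
size_t count_tokens(char *buf, const char *sep)
{
    size_t n = 0;
    char *tok;

    assert(buf != NULL && sep != NULL);
    for (tok = strtok(buf, sep); tok != NULL; tok = strtok(NULL, sep))
        n++;
    return n;
}
```

Written this way, an empty string or a string of nothing but separators
simply yields a count of zero, because the first call already returns null.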

> > Ah well. I never want to think about strtok again! ;)
> 
> Me too. But I don't have a choice because I have to port programs which
> depend on the function (like inetd) and I don't like to write workarounds
> for all the programs because of a bug in the C library.

Despite the undefined behavior, I agree that this is a bug.  Here is why: some
existing library documentation (info page) suggests that this may in fact be a
documented extension:

     The string to be split up is passed as the NEWSTRING argument on
     the first call only.  The `strtok' function uses this to set up
     some internal state information.  Subsequent calls to get additional
     tokens from the same string are indicated by passing a null pointer
     as the NEWSTRING argument.  Calling `strtok' with another non-null
     NEWSTRING argument reinitializes the state information.

Therefore, it would seem that the library fails to provide documented behavior
as promised, since no state information (i.e. the saved pointer) is initialized
on the first call under some conditions.
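For comparison, here is a minimal sketch of a tokenizer that sets up its
state on every call with a non-null first argument, whether or not a token is
found. It is not the glibc source; the name sketch_strtok and its layout are
assumptions made for illustration, built on the standard strspn and strcspn
functions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Sketch only, not the glibc implementation. */
char *sketch_strtok(char *s, const char *sep)
{
    static char *saved;            /* state kept between calls */
    char *token;

    assert(sep != NULL);
    if (s == NULL)
        s = saved;                 /* continue with the saved position */
    if (s == NULL)
        return NULL;               /* no string has ever been passed */

    s += strspn(s, sep);           /* skip leading separators */
    if (*s == '\0') {
        saved = s;                 /* save even when no token is found */
        return NULL;
    }

    token = s;
    s += strcspn(s, sep);          /* scan for the end of the token */
    if (*s != '\0')
        *s++ = '\0';               /* overwrite the separator */
    saved = s;
    return token;
}
```

With this arrangement, a first call on "" or ":::" returns null, and so does
every later call with a null first argument, because the saved pointer always
refers to the terminating null byte of the last string passed in.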

This is analogous to some C vendor saying that void main(void) { } plays a
lovely melody out of the soundcard, but instead it causes your power supply to
explode.

> yours,
>  - clifford
> 
> PS: I still don't know why I only get the core dumps on ppc. It seems like
> some crazy pointer magic ...

The reason you don't get the core dump on i386 is that custom machine
code is used for strtok rather than the C code in sysdeps/generic.

Do a find on the source tree for 'strtok.S'.

(But did you also say that it doesn't crash on alpha? There is no alpha assembly
file for strtok, but maybe some inline code is used. There is some crap in
string/bits/string2.h )

