Bug 12701

Summary: scanf accepts non-matching input
Product: glibc Reporter: Rich Felker <bugdal>
Component: stdioAssignee: Not yet assigned to anyone <unassigned>
Status: REOPENED ---    
Severity: critical CC: BM-2cUWXLL9JqDWpsk8EivFHCgbsbXFtsxQrk, bnnf-yellowbot, bnnf, fweimer, hjl.tools, lynneandallan, namboru, pcarlini, ricilake, tydeman.fred, vincent-srcware
Priority: P2 Flags: fweimer: security+
Version: unspecified   
Target Milestone: ---   
Host: Target:
Build: Last reconfirmed:
Attachments: scanf test cases

Description Rich Felker 2011-04-25 15:13:20 UTC
glib'c scanf function incorrectly handles cases where it reads a sequence of characters which are an initial subsequence of a matching sequence, but not actually a matching sequence, for the conversion specifier. Examples include:

sscanf("abc", "%4c", buf) returns 1 instead of 0 or EOF (not sure which is correct) and leaves no way for the caller to know buf[3] is unfilled.

sscanf("0xz", "%x%c", &x, &c) returns 2 instead of 0.

sscanf("1.0e+!", "%f%c", &x, &c) returns 2 instead of 0.

etc.
Comment 1 Ulrich Drepper 2011-05-02 01:40:09 UTC
All of these cases are correctly handled.

scanf is badly designed, just don't use it if you cannot live with these results.
Comment 2 Rich Felker 2011-05-02 02:35:40 UTC
They are not correctly handled. Please refer to C99, 7.19.6.2, paragraph 9, which defines an input item as:

"the longest sequence of input characters which does not exceed any specified field width and which is, or is a prefix of, a matching input sequence"

Paragraph 10 then reads:

"If the input item is not a matching sequence, the execution of the directive fails: this condition is a matching failure."

Clearly in the case of sscanf("0xz", "%x%c", &x, &c), the first "input item" is "0x", and it is not a matching sequence for the %x conversion (see the specification of strtoul, in terms of which scanf %x is specified), so the result must be a matching failure.

If you're going to wrongly mark this bug as "RESOLVED", at least mark it "WONTFIX" rather than "INVALID" and acknowledge that it's a bug that you're unwilling to fix, and that glibc is intentionally non-conformant in this matter.
Comment 3 Ulrich Drepper 2011-05-03 00:30:31 UTC
They are handled correctly.  You don't understand the limit of push backs.
Comment 4 Rich Felker 2011-05-03 00:40:26 UTC
Yes I understand pushbacks.

Scanning "0xz" for %x results in an input item of "0x" with "z" pushed back into the unread buffer. The bug has nothing to do with pushbacks, because the right data is pushed back. The bug is that a non-matching input item is treated as a match rather than a matching error.

Perhaps you thought I was saying the input item should be "0", successfully converted, with "x" as the next unread character in the buffer. Of course this is wrong and I do not believe such a thing.

Perhaps you should try reading the actual language standard rather than assuming you're right.
Comment 5 Ulrich Drepper 2011-05-03 01:13:29 UTC
(In reply to comment #4)
> Yes I understand pushbacks.

You apparently don't.   This is no place to get a free education.

Don't reopen the bug, there will be no change.
Comment 6 Rich Felker 2011-05-03 02:22:50 UTC
OK if you insist that I don't reopen it, I'm fixing the resolution to "WONTFIX".
Comment 7 Rich Felker 2011-09-25 04:42:31 UTC
Reopening since I found a statement from an official source (Fred J. Tydeman, Vice-char of PL22.11) that the glibc behavior is incorrect:

http://newsgroups.derkeiler.com/Archive/Comp/comp.std.c/2009-09/msg00045.html

Sorry I don't have a better newsgroup archive link.
Comment 8 Ulrich Drepper 2011-10-29 17:14:29 UTC
What on earth are you talking about.  Fred said exactly the same: 0xz causes the z to be rejected for the %x and therefore used for the %c.  Stop wasting my time.
Comment 9 Rich Felker 2011-10-29 21:24:08 UTC
Apparently you only read the first quoted paragraph and not the second:

> > - the input item "0x" is not a matching sequence, so the execution of
> > the whole directive fails;
> 
> Correct

What part of "the execution of the whole directive fails" are you not understanding? When a directive fails, scanf stops and returns the number of directives successfully converted and stored. This number is zero, not two. The %c is never processed. glibc is wrong. Please fix it.

If you insist on keeping compatibility with hypothetical existing binaries that depend on the wrong behavior, that's what glibc has symbol versioning for...
Comment 10 Ulrich Drepper 2011-10-29 21:36:38 UTC
The behavior is correct and wanted.  Now stop wasting people's time.
Comment 11 Rich Felker 2011-10-30 05:42:51 UTC
Fred Tydeman (vice chair of PL22.11/J11) has stated as clearly and directly that the current glibc behavior is NOT correct. Whether it's wanted is a more subjective question, but I have not seen anyone but yourself who wants scanf to behave incorrectly in this manner. Please fix this bug.
Comment 12 Rich Felker 2012-03-17 20:39:24 UTC
Ping. Would somebody other than Mr. Drepper be willing to review this bug report?
Comment 13 Joseph Myers 2012-03-18 14:28:19 UTC
This bug report appears to be correct, and the erroneous behavior described still present with current glibc (tested x86_64).
Comment 14 Rich Felker 2012-04-18 05:21:24 UTC
Created attachment 6345 [details]
scanf test cases

I recently wrote a set of test cases for verifying my scanf implementation, and running it against glibc reproduces A LOT of instances of this bug... See attached test program.
Comment 15 paxdiablo 2012-11-26 08:26:04 UTC
I think this bug report is correct, at least in relation to the '%x/0xz' sample.

There's a big difference between an input item, which *may* be an initial subset of a properly scanned directive, and the *properly scanned directive* itself.

Pushback controls how far you can back up the "input stream pointer" and is the reason why scanf is usually not used by professionals, who prefer a fgets/sscanf combo so they can bak up to the start of the line themselves. However, the pushback is only relevant here in that context. The failure of '0x' when scanning '%x' will not be able to push back all the way to the '0' because of this limitation.

The function call sscanf ("a0xz", "%c%x%c") should return 1, not 3.

The controlling part of the standard is the bit dealing with the 'x' directive itself:

=====
Matches an optionally signed hexadecimal integer, whose format is the same as expected for the subject sequence of the strtoul function with the value 16 for the base argument.
=====

The strtoul stuff states:

=====
If the value of base is zero, the expected form of the subject sequence is that of an integer constant as described in 6.4.4.1, optionally preceded by a plus or minus sign, but not including an integer suffix. If the value of base is between 2 and 36 (inclusive), the expected form of the subject sequence is a sequence of letters and digits representing an integer with the radix specified by base, optionally preceded by a plus or minus sign, but not including an integer suffix. The letters from a (or A) through z (or Z) are ascribed the values 10 through 35; only letters and digits whose ascribed values are less than that of base are permitted. If the value of base is 16, the characters 0x or 0X may optionally precede the sequence of letters and digits, following the sign if present.
=====

The controlling part there would be "a sequence of letters and digits representing an integer" - you may argue that such a sequence may consist of zero characters but I don't think anyone in their right mind would suggest that definition represented an integer. In any case, the '0x' string fails on strtoul:

    char *x;
    int rc = 42;
    rc = strtoul ("0x", &x, 16);
    printf ("%d [%s]/n", rc, x);
produces:

    0 [0x]
So even though rc is set to 0, the fact that the pointer points to the first bad character means that the '0x' itself is not a valid hex number.

Putting in '0x5' as the string gives you:

    5 []
so that the first bad character is the end of the string (ie, there WERE no bad characters).
Comment 16 Florian Weimer 2014-06-13 14:54:27 UTC
(In reply to Rich Felker from comment #0)

> sscanf("abc", "%4c", buf) returns 1 instead of 0 or EOF (not sure which is
> correct) and leaves no way for the caller to know buf[3] is unfilled.

So this is an information leak.
Comment 17 Florian Weimer 2014-06-27 14:00:25 UTC
*** Bug 12437 has been marked as a duplicate of this bug. ***
Comment 18 Florian Weimer 2018-01-04 12:17:34 UTC
*** Bug 22672 has been marked as a duplicate of this bug. ***
Comment 19 Joseph Myers 2018-06-12 11:15:17 UTC
*** Bug 23274 has been marked as a duplicate of this bug. ***
Comment 20 Joseph Myers 2018-06-12 11:16:01 UTC
*** Bug 1765 has been marked as a duplicate of this bug. ***
Comment 21 Mark winds 2021-02-10 19:30:14 UTC Comment hidden (spam)
Comment 22 mark 2021-03-05 15:27:45 UTC Comment hidden (spam)
Comment 23 namboru 2021-09-15 02:40:36 UTC Comment hidden (spam)
Comment 24 Vincent Lefèvre 2023-07-18 12:01:35 UTC
Note that scanf also accepts "nan(" while it shouldn't (because "nan()" is valid), but for a different reason. See bug 30647 for issues related to scanf with nan.