This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]
Other format:	[Raw text]

Re: Implement C11 annex K?

From: "David A. Wheeler" <dwheeler at dwheeler dot com>
To: "libc-alpha" <libc-alpha at sourceware dot org>
Date: Fri, 15 Aug 2014 18:00:59 -0400 (EDT)
Subject: Re: Implement C11 annex K?
Authentication-results: sourceware.org; auth=none
Reply-to: dwheeler at dwheeler dot com

David A. Wheeler (me) said:
> > snprintf is overcomplicated for simple string copying (a common operation)

On Fri, 15 Aug 2014 12:39:12 -0400, Rich Felker <dalias@libc.org> wrote:
> One extra argument of "%s" is not really over-complicated. I'll grant
> one thing: having the return value with type int rather than size_t is
> mildly problematic, in that snprintf can't be fully equivalent to
> strlcpy or strncpy_s, but in cases where the input string is >2GB
> you'd probably prefer to have an error rather than processing an
> unbounded amount of input, as it's almost surely malicious. If your
> data is really potentially that large, C strings are not the proper
> format for it.

I agree snprintf's return type (int) is problematic compared to strlcpy/strlcat (size_t).

Your comments here are a little oblique; let me explain what *I* think the return type problem is
(I'm *guessing* we mean the same thing). snprintf returns the number of chars it *WOULD* have written,
not what it ACTUALLY wrote, so the "obvious" way to use it for a string copy looks like this:
 char buf[100];
 if (snprintf(buf, sizeof(buf), "%s", src) >= sizeof(buf)) {  // FAIL
   // handle truncation
 }
But wait, "int" is signed!!  Thus, if you give it a source with a length a little longer than INT_MAX,
then the int will (probably) wrap; snprintf will return a NEGATIVE number on
truncation, which will always less less than a sizeof, and thus the "handle truncation" is *skipped*
even when truncation has occurred. INT_MAX is only guaranteed to be +32767, so this
can happen easily on small systems (think "internet of things"), but even on some
32-bit systems this can happen.

This can be handled, but you can't just use a simple cast; errors also return negatives from snprintf.
Why do I need this complexity for trivial copy and concatenation again?
It might be better for it to just not copy so much (that's the point of rsize_t in annex K),
but clearly returning a value that is unlikely to be interpreted correctly is a problem.

> ... concatenation without storing a "current position" between
> operations is bad anyway (see: Schlemiel the painter).

I completely disagree; this is simply another trade-off.

In many cases you *should* hire Schlemiel.  Tracking current position
can be much faster, sure, but it is also a great way to create off-by-one errors.
In practice errors are often more than one, since it's easy to forget to change the position.
Concatenation code is often not in anything time-critical, anyway.
Thus the performance time "saved" via tracking current position is actually a net project negative
once you consider the increased development time and the increased risk and
potential impact of dangerous buffer overflows from using approaches
that easily lead to mistakes. Even in time-critical code Schlemiel
does quite well when the normal case is fairly short, and this is a
really really common case.

As strings get long, Schlemiel gets slower, but he's a more reliable worker :-).

> Still I agree with you that this makes it hard to fix existing broken code using
> only snprintf, and in fact it's dangerous when people try because they
> do idiotic things that invoke UB like:
> 
> snprintf(buf, size, "%s%s", buf, foo);

...

> > No formatting string means no opportunity for that mistake.
> > There's also the overhead (admittedly slight) of passing and processing the formatting string.
> 
> Actually it's a fairly heavy overhead; for small strings it certainly
> dominates the runtime of the whole operation.

That makes my argument stronger, then.

> > Many people who are trying to write secure software in C (such as the OpenBSD and Microsoft folks)
> > are increasingly trying to *stop* the use of traditional functions like strcpy and strncpy,
> 
> I reject the claim that MS's interest in stopping the use of these
> traditional functions has anything to do with security.

I disagree, but that's not my point.  My point is that buffer overflows are
a serious problem; we need widely-available functions for common cases that are
easy to use and make them much less likely.  Also, strlcpy/strlcat were created by OpenBSD
to counter buffer overflows, so clearly it's not just Microsoft that see a need for
standard easy-to-use functions to counter buffer overflows in fixed-length buffers.

--- David A. Wheeler

Follow-Ups:
- Re: Implement C11 annex K?
  - From: Rich Felker
- Re: Implement C11 annex K?
  - From: Russ Allbery

References:
- Re: Implement C11 annex K?
  - From: Rich Felker

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]