This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: Undefined behavior in glibc -- was: Re: [PATCH][BZ 17979][BZ 17721] Fix issues with sys/cdefs.h and uchar.h when using non-gcc compiler.
- From: Joseph Myers <joseph at codesourcery dot com>
- To: Alexander Cherepanov <ch3root at openwall dot com>
- Cc: Dwight Guth <dwight dot guth at runtimeverification dot com>, <libc-alpha at sourceware dot org>
- Date: Wed, 3 Feb 2016 13:07:16 +0000
- Subject: Re: Undefined behavior in glibc -- was: Re: [PATCH][BZ 17979][BZ 17721] Fix issues with sys/cdefs.h and uchar.h when using non-gcc compiler.
- Authentication-results: sourceware.org; auth=none
- References: <27c31890079f41775175b94a4abedb0c dot squirrel at server316 dot webhostingpad dot com> <alpine dot DEB dot 2 dot 10 dot 1601282115100 dot 6102 at digraph dot polyomino dot org dot uk> <CACLXh_1_dQ5D1QrKQN0pVPzt001WmS4BgwcKZkULK8XnbEMb+g at mail dot gmail dot com> <alpine dot DEB dot 2 dot 10 dot 1601282246340 dot 6102 at digraph dot polyomino dot org dot uk> <CACLXh_3rAudocTEbtZQpVoDcWgm_ww3KcX6j9XCkSRTZVPTUMg at mail dot gmail dot com> <alpine dot DEB dot 2 dot 10 dot 1601282251350 dot 6102 at digraph dot polyomino dot org dot uk> <20160128225845 dot GE14840 at vapier dot lan> <alpine dot DEB dot 2 dot 10 dot 1601282311351 dot 6102 at digraph dot polyomino dot org dot uk> <20160128234356 dot GH14840 at vapier dot lan> <56AAB6CE dot 8060101 at openwall dot com> <20160129005816 dot GK14840 at vapier dot lan> <56AC16C8 dot 4030202 at openwall dot com> <alpine dot DEB dot 2 dot 10 dot 1601311559210 dot 31071 at digraph dot polyomino dot org dot uk> <56B1F294 dot 5020105 at openwall dot com>
On Wed, 3 Feb 2016, Alexander Cherepanov wrote:
> But those parts which cannot be implemented in ISO C would be implemented in
> asm? AIUI there is no much magic in compiling C parts of glibc so they should
No, not necessarily. Written in GNU C with a good understanding of what
transformations are actually possible given the code visible to the
compiler and the rules of GNU C.
> > If something involves undefined behavior in GNU C,
>
> GNU C is ISO C (whatever it means:-) + GCC implementation-defined behavior as
> described in [1] + GCC extensions[2], right?
Plus undocumented features and a good understanding of aspects of how C
relates to the underlying machines (that is, you cannot assume e.g. that a
C addition corresponds to a machine add instruction, but you can assume
that memory is made of pages and that if it's not visible to the compiler
how memory was allocated, having particular access to any byte in a page
means having such access to all bytes in that page, subject to avoiding
data races on write). Cf
<https://www.cl.cam.ac.uk/~pes20/cerberus/notes50-2015-05-24-survey-discussion.html>
(but the non-ISO aspects of GNU C as used in low-level system software
such as glibc and the Linux kernel should be assumed to be much more
wide-ranging than the issues discussed there).
> 1) strlen and other string functions read memory by chunks which is aliasing
> violation;
> 2) strlen accesses the OOB memory.
Both of those fall under deliberate use of separate compilation, i.e. it's
not visible to the compiler what the original effective type of the memory
was or its original size, so a whole page can be accessed provided the
types used don't involve aliasing violations within the source visible to
the compiler.
> > Where glibc code relies on separate compilation to avoid undefined
> > behavior, this is not a bug; use of LTO on glibc's own code is not
> > supported. For example, where functions with different C types but
> > compatible ABIs are aliased, or a function is otherwise defined with a
> > different type from that used when it is called. Similarly, asm may be
> > used to limit code movement, and that could e.g. mean that aliasing is not
> > undefined behavior where it would otherwise be undefined.
>
> Not sure what you mean. Is it specific to calling functions or it's supposed
> to cover the case of strlen too?
It covers strlen. strlen is well-defined in isolation, so can be presumed
to be compiled to an ABI-conforming object file, and such an object file,
if it would work for programs where it would be valid for that strlen
implementation to be part of the program (under a different name), would
also work for other programs that are valid with ISO C strlen.
> I don't see cleaning glibc for LTO in todo list in the wiki. Shouldn't it be
> there?
The todo list basically has ideas that one person once thought might be
useful. In general, you can't assume that things there have consensus,
and nor do you need consensus to put something there. I think
facilitating LTO of the standard C library is a very hard whole-system
issue, not something that could be considered for glibc in isolation;
you'd probably need several different forms of source annotation that act
as barriers to particular forms of LTO, saying that calls to a function
can only be optimized with reference to its semantics and not with
reference to the contents of a particular implementation of it.
> Well, before I go and file the bugs, perhaps it's worth skimming through the
> list. It's mainly invalid pointer arithmetic. GNU C seems not to deviate from
> ISO C in this topic. Should the issues be grouped by type, by file, by subdir
> or in some other way?
I think such cases should generally be considered bugs except for limited
cases such as string functions that deliberately work in terms of pages or
aligned units smaller than pages, and so may go slightly outside the
original object (before or after) but obviously cannot wrap around the
address space to affect the results of comparisons.
The natural division is based on whether it might make sense to fix two
issues separately - so any issues in parts of glibc for which different
people might have expertise, or that differ in how clear it is that there
is a problem or how clear it is what the correct fix would be, should be
filed separately. Of course, it's even better if fixes (following the
contribution checklist) for such issues are submitted as well.
> Capping pointers at the end of the address space
> ------------------------------------------------
>
> Some functions in C11 have a parameter which limits a number of characters
> written. IMHO it's clear that C11 doesn't intend this parameter to be a real
> size of output buffer and, thus, it could validly be SIZE_MAX. But we have
> seen differing opinions in the recent thread about strlcpy. Perhaps you can
> clear the question.
Given the dispute in the Austin Group over snprintf, a C11 DR is clearly
needed.
> Structs at NULL
> ---------------
>
> Plausible optimization: this is unconditional UB, hence mark the code as
> unreachable and remove everything around.
I think such offsetof-type code should be considered valid GNU C, although
it would still be better to use offsetof. At least some are inside
"#ifndef offsetof" (some of these examples are code shared with gnulib or
gettext) and so obviously not bugs in glibc (given that <stddef.h> is
included).
> Pointers used after free (without dereferencing)
> ------------------------------------------------
>
> https://sourceware.org/git/?p=glibc.git;a=blob;f=libio/fileops.c;hb=5163b4b76f61e361f0f4bbe3b96732b12e5c9b1a#l1063
> (IIUC conditions "fp->_IO_read_base != NULL" and "!_IO_in_backup (fp)" should
> be swapped.)
>
> https://sourceware.org/git/?p=glibc.git;a=blob;f=malloc/tst-obstack.c;hb=5163b4b76f61e361f0f4bbe3b96732b12e5c9b1a#l25
> https://sourceware.org/git/?p=glibc.git;a=blob;f=malloc/tst-malloc-backtrace.c;hb=5163b4b76f61e361f0f4bbe3b96732b12e5c9b1a#l30
> https://sourceware.org/git/?p=glibc.git;a=blob;f=io/pwd.c;hb=5163b4b76f61e361f0f4bbe3b96732b12e5c9b1a#l41
I'd say such comparisons should be avoided, but that's more a cleanup than
a bug fix.
> Memory allocations without check
> --------------------------------
>
> Not UB but while we are at it...
>
> AIUI the policy in glibc is to check results of malloc no matter how small the
> requested amount of memory is. Right?
Yes.
> So mallocs like in https://sourceware.org/bugzilla/show_bug.cgi?id=19416
> should be reported?
Yes.
--
Joseph S. Myers
joseph@codesourcery.com