This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: Undefined behavior in glibc


On 2016-02-03 16:07, Joseph Myers wrote:
If something involves undefined behavior in GNU C,

GNU C is ISO C (whatever it means:-) + GCC implementation-defined behavior as
described in [1] + GCC extensions[2], right?

Plus undocumented features

The moment you say "undocumented" everything falls apart. One cannot be sure whether some observed behavior is an undocumented feature or something that will be broken by the next gcc release.

and a good understanding of aspects of how C
relates to the underlying machines (that is, you cannot assume e.g. that a
C addition corresponds to a machine add instruction, but you can assume
that memory is made of pages and that if it's not visible to the compiler
how memory was allocated, having particular access to any byte in a page
means having such access to all bytes in that page, subject to avoiding
data races on write).

1. This rules out all modern hardening/debugging solutions like Valgrind, ASan, Intel MPX, CHERI (don't know if the latter is relevant for glibc).

2. You seem to be concerned with crashes only but there are other problems too, e.g., info leaks (think Heartbleed). Saying that the code is free to access any accessible memory means that one cannot catch info leaks automatically.

Cf
<https://www.cl.cam.ac.uk/~pes20/cerberus/notes50-2015-05-24-survey-discussion.html>
(but the non-ISO aspects of GNU C as used in low-level system software
such as glibc and the Linux kernel should be assumed to be much more
wide-ranging than the issues discussed there).

Yes, I have seen it. I have even linked to your reply to this survey in my last email.

1) strlen and other string functions read memory by chunks which is aliasing
violation;
2) strlen accesses the OOB memory.

Both of those fall under deliberate use of separate compilation, i.e. it's
not visible to the compiler what the original effective type of the memory
was or its original size, so a whole page can be accessed provided the
types used don't involve aliasing violations within the source visible to
the compiler.

I thought that this falls under "there is no plausible optimization [due to the info not available to a compiler]". Well, ok.

Then let's look at strlen (&co.) from another angle. It starts accessing its parameter as chars and then continues accessing it as longs, like this:

  ...
  char *p = ...;
  *p++;
  *(long *)p;

Is there a situation where this is valid in GNU C? For ISO C the question seems quite moot but I think Question 16 in DR 017 plus DR 260 say that it's not valid. Maybe there are no restrictions in GNU C on intra-object pointer arithmetic / accesses at all (apart from aliasing rules)?

I don't see cleaning glibc for LTO in todo list in the wiki. Shouldn't it be
there?

The todo list basically has ideas that one person once thought might be
useful.  In general, you can't assume that things there have consensus,
and nor do you need consensus to put something there.  I think
facilitating LTO of the standard C library is a very hard whole-system
issue, not something that could be considered for glibc in isolation;
you'd probably need several different forms of source annotation that act
as barriers to particular forms of LTO, saying that calls to a function
can only be optimized with reference to its semantics and not with
reference to the contents of a particular implementation of it.

Isn't noinline+noclone enough for this right now?

Well, before I go and file the bugs, perhaps it's worth skimming through the
list. It's mainly invalid pointer arithmetic. GNU C seems not to deviate from
ISO C in this topic. Should the issues be grouped by type, by file, by subdir
or in some other way?

I think such cases should generally be considered bugs except for limited
cases such as string functions that deliberately work in terms of pages or
aligned units smaller than pages, and so may go slightly outside the
original object (before or after) but obviously cannot wrap around the
address space to affect the results of comparisons.

Ok

The natural division is based on whether it might make sense to fix two
issues separately - so any issues in parts of glibc for which different
people might have expertise, or that differ in how clear it is that there
is a problem or how clear it is what the correct fix would be, should be
filed separately.  Of course, it's even better if fixes (following the
contribution checklist) for such issues are submitted as well.

Ok

Capping pointers at the end of the address space
------------------------------------------------

Some functions in C11 have a parameter which limits a number of characters
written. IMHO it's clear that C11 doesn't intend this parameter to be a real
size of output buffer and, thus, it could validly be SIZE_MAX. But we have
seen differing opinions in the recent thread about strlcpy. Perhaps you can
clear the question.

Given the dispute in the Austin Group over snprintf, a C11 DR is clearly
needed.

Ok

Structs at NULL
---------------

Plausible optimization: this is unconditional UB, hence mark the code as
unreachable and remove everything around.

I think such offsetof-type code should be considered valid GNU C, although

Ok

it would still be better to use offsetof.  At least some are inside
"#ifndef offsetof" (some of these examples are code shared with gnulib or
gettext) and so obviously not bugs in glibc (given that <stddef.h> is
included).

Yeah, and I managed to include some that are used to get a size of a field instead of an offset and so are fine even in ISO C. Have to filter them out in the future.

Pointers used after free (without dereferencing)
------------------------------------------------

https://sourceware.org/git/?p=glibc.git;a=blob;f=libio/fileops.c;hb=5163b4b76f61e361f0f4bbe3b96732b12e5c9b1a#l1063
(IIUC conditions "fp->_IO_read_base != NULL" and "!_IO_in_backup (fp)" should
be swapped.)

FTR: this one seems to be a false. fp->_IO_read_base could be accessed on line 1063 after a free on line 1020 but it's set inside _IO_setg on line 1025.

https://sourceware.org/git/?p=glibc.git;a=blob;f=malloc/tst-obstack.c;hb=5163b4b76f61e361f0f4bbe3b96732b12e5c9b1a#l25
https://sourceware.org/git/?p=glibc.git;a=blob;f=malloc/tst-malloc-backtrace.c;hb=5163b4b76f61e361f0f4bbe3b96732b12e5c9b1a#l30
https://sourceware.org/git/?p=glibc.git;a=blob;f=io/pwd.c;hb=5163b4b76f61e361f0f4bbe3b96732b12e5c9b1a#l41

I'd say such comparisons should be avoided, but that's more a cleanup than
a bug fix.

BTW is there a formal method to tell test files from important files? I see .../tst-* and .../test-* files but io/pwd.c looks like a test too.

--
Alexander Cherepanov


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]