This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.
Re: [RFC][PATCH] Refactoring FORTIFY

From: George Burgess IV <george dot burgess dot iv at gmail dot com>
To: Adhemerval Zanella <adhemerval dot zanella at linaro dot org>
Cc: libc-alpha at sourceware dot org
Date: Mon, 4 Dec 2017 20:52:51 -0800
Subject: Re: [RFC][PATCH] Refactoring FORTIFY
Authentication-results: sourceware.org; auth=none
References: <CAKh6zBFAGomCFc+Y9=qpH-2N6AonKUja1R+cWy=p_=T=rs29cQ@mail.gmail.com> <41677d63-4cec-e24b-e4b7-378aa030a0f7@linaro.org>
Apologies for the lag; catching up from a vacation.

> __use_clang_fortify is a compiler defined flag which is (?) define only
> for clang 5.0 or higher.

The compiler feature we key off of in clang for this is
`overloadable_unmarked`, which isn't overly interesting in itself, but
it does indicate whether clang has the last feature we need
in order to implement FORTIFY in the way this patch does.

> In any case the document referenced is a good
> addition for explanation, but I usually prefer to have a more self
> explanatory detail on the thread and or email itself.

Well, it's not exactly a *short* explanation, but I did include details
on how this all works at the end of this email. Please let me know
if you had something else in mind, or if there's anything I can
expand on. :)

> I really want to avoid a compiler specific flag to deactivate fortify
> where there is already in place better approach [...]

Yeah, the goal was to be as backwards compatible as possible
(so people could still have the FORTIFY protection they have
today if they didn't want to tweak their code), but FORTIFY on
clang today doesn't really provide a great experience, so I'm
happy to keep FORTIFY all-or-nothing.

> Indeed to move forward on review we will need to split this patch by
> family functions, maybe an initial patch to add the required macros
> and gcc required refactoring and other for family of functions or
> by headers with the adjustments for clang.

Sounds good. I'll start on that shortly. :)

> The only issue I see is lacking of testing I do not have an easier
> solution.  One option would to try check for clang with support for
> fortify on path and add extra rules for build the fortify tests
> with clang as well, but I am not sure how complex would be to adjust
> current testcase for such scenario.

Good call. I'll see if I can make this work.

-------

# Details of implementation

Clang's FORTIFY implementation boils down to a set of overloads of
standard library functions. We have special attributes we can
overload on (enable_if + pass_object_size) while keeping the
same language-level function type. In overload resolution, clang
prefers functions that have these attributes over overloads without,
all else being equal, which allows us to 'intercept' all calls to
FORTIFY'ed functions.
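
To make the overload preference concrete, here is a minimal sketch. The function name which_overload and the HAVE_FORTIFY_OVERLOADS macro are made up for illustration and are not part of the patch; on compilers that don't report the two attributes, the sketch collapses to a single plain function.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: the overload-based path is only enabled when the compiler
   reports both attributes; other compilers get one plain function.  */
#if defined(__has_attribute)
# if __has_attribute(overloadable) && __has_attribute(pass_object_size)
#  define HAVE_FORTIFY_OVERLOADS 1
# endif
#endif
#ifndef HAVE_FORTIFY_OVERLOADS
# define HAVE_FORTIFY_OVERLOADS 0
#endif

#if HAVE_FORTIFY_OVERLOADS
/* Stand-in for the ordinary library function.  It is never selected
   below, hence the unused attribute.  */
static __attribute__((overloadable, unused)) int
which_overload(const char *s)
{
  (void) s;
  return 0;
}

/* Stand-in for the FORTIFY overload: same language-level type, but the
   parameter carries pass_object_size, so overload resolution prefers it.  */
static __attribute__((overloadable)) int
which_overload(const char *const s __attribute__((pass_object_size(0))))
{
  (void) s;
  return 1;
}
#else
/* No overloading available: only the plain function exists.  */
static int
which_overload(const char *s)
{
  (void) s;
  return 0;
}
#endif
```

With the attributes available, a call like which_overload(buf) should land on the second definition; without them, it lands on the only one.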

(These overloads do all have C++-like mangled names, but they're
also all static + always_inline'd, so the fact that we're using
overloads shouldn't be easily visible to the user.)

For compile-time diagnostics, clang uses diagnose_if, which is a
function-level attribute that tries to evaluate a condition at each
callsite of a function. If the condition is true, it'll emit a warning
or error.
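
As a rough illustration of diagnose_if, the sketch below attaches a warning to a made-up function, fill8, which is not part of the patch. The DIAGNOSE_IF wrapper macro is also hypothetical and expands to nothing where the attribute isn't reported.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumption: DIAGNOSE_IF is a local helper macro, not a glibc name.  */
#if defined(__has_attribute)
# if __has_attribute(diagnose_if)
#  define DIAGNOSE_IF(cond, msg) \
     __attribute__((diagnose_if(cond, msg, "warning")))
# endif
#endif
#ifndef DIAGNOSE_IF
# define DIAGNOSE_IF(cond, msg)
#endif

/* The attribute sits on the declaration so that it can refer to the
   parameters; the condition is checked at each call site when it can
   be evaluated there.  */
static size_t fill8(char *dst, size_t n)
  DIAGNOSE_IF(n > 8, "fill8 called with n larger than 8");

/* Hypothetical function: writes at most 8 bytes of 'x' into dst and
   returns how many bytes it wrote.  */
static size_t
fill8(char *dst, size_t n)
{
  if (n > 8)
    n = 8;
  memset(dst, 'x', n);
  return n;
}
```

A call such as fill8(buf, 9) would be expected to produce the warning on a compiler that supports the attribute, while fill8(buf, 4) stays silent.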

Due to how clang is architected (no inlining before we run
optimizations; no AST/accurate type info is available during
optimizations), we also need to use the pass_object_size(N) attribute
on any function parameter p that we need to call
__builtin_object_size on. This adds a hidden size_t parameter to the
function that declares p, and causes every call to that function to
pass __builtin_object_size(p, N) as the hidden argument.
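
A small sketch of what the hidden parameter buys us. The names visible_size and PASS_OBJECT_SIZE are invented for this example; where the attribute isn't reported, the macro expands to nothing and the function only sees whatever the optimizer can prove.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: PASS_OBJECT_SIZE is a local helper macro, not a glibc name.  */
#if defined(__has_attribute)
# if __has_attribute(pass_object_size)
#  define PASS_OBJECT_SIZE(n) __attribute__((pass_object_size(n)))
# endif
#endif
#ifndef PASS_OBJECT_SIZE
# define PASS_OBJECT_SIZE(n)
#endif

/* Hypothetical helper: reports the size the callee can see for buf.
   With pass_object_size, the caller evaluates
   __builtin_object_size (buf, 0) and passes it along, so the callee
   does not depend on being inlined.  The parameter must be const in
   the definition.  */
static size_t
visible_size(void *const buf PASS_OBJECT_SIZE(0))
{
  return __builtin_object_size(buf, 0);
}
```

For a 16-byte local array, the result should be 16 when the size is known at the call site, and (size_t) -1 when it is not (e.g. no attribute and no inlining).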

There are also a number of cases where FORTIFY uses
pass_object_size on functions that don't call __builtin_object_size.
This is generally for some combination of three reasons:
- there's nothing to overload on (e.g. see printf),
- as noted above, in a set with two overloads that have identical
  signatures (in C/C++), the one with pass_object_size wins, and
- functions with one or more parameters that have the
  pass_object_size attribute cannot have their address taken.

Without the last bullet, code like:

void foo(void *fn);
void bar() { foo(open); }

breaks, since we have open(const char *, int) and
open(const char *, int, int) overloads.

Putting this all into practice, improving run-time bounds
checking for clang FORTIFY is simply an issue of adding
pass_object_size to functions. Compile-time checks, OTOH,
are a bit more interesting, since GCC wants the check to live in
the function's body, whereas clang wants it to live in a function-level
attribute. My patch adds
__FORTIFY_PRECONDITIONS/__FORTIFY_FUNCTION_END
macros, which turn into balanced braces in GCC, and are nops in
clang.
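
The shape of that expansion can be sketched as follows. MY_PRECONDITIONS, MY_FUNCTION_END, and twice are stand-in names chosen for this example, and the expansions are my reading of the description above rather than the patch's actual definitions.

```c
#include <assert.h>

/* Assumption: on clang the diagnostics live in attributes that sit
   before the body, so both macros vanish; elsewhere they open and
   close an outer block that holds the check statements.  */
#if defined(__clang__)
# define MY_PRECONDITIONS
# define MY_FUNCTION_END
#else
# define MY_PRECONDITIONS {
# define MY_FUNCTION_END }
#endif

static int
twice(int x)
MY_PRECONDITIONS
  /* Diagnostic macros would go here: attributes on clang, statements
     inside the outer block on GCC.  */
{
  return 2 * x;
}
MY_FUNCTION_END
```

On GCC this becomes a body nested in one extra pair of braces; on clang it is an ordinary function definition.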

The macros that emit diagnostics (e.g.
__FORTIFY_WARNING_ONLY_IF_BOS_LT2) either turn into
diagnose_if on clang, or unfold into a FORTIFY warning function
declaration + call on GCC. The call is logically unreachable, but
done in a way that GCC can't DCE it if we need to emit a
diagnostic. Because we're hiding all of this behind a macro,
many textual FORTIFY function bodies turn into:
{
  if (__FORTIFY_CALL_CHK && __bos (__ptr) != (size_t) -1)
    return __foo_chk (__ptr, __bos (__ptr), ...);
  return __foo_real (__ptr, ...);
}

...Where __FORTIFY_CALL_CHK is always 1 on clang, but
on GCC, it's a variable (which should be trivially foldable to a
constant after a tiny bit of optimization). It records whether
our static checks were able to prove that the call is safe
(0 if guaranteed safe, 1 if not). It's a bit ugly, but
it matches how we act in the current FORTIFY implementation.
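
For a feel of how such a body behaves, here is a self-contained sketch using made-up names (my_fill, my_fill_chk, BOS0). The local variable call_chk only approximates the role of __FORTIFY_CALL_CHK; the real macro is defined by the patch and may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define BOS0(p) __builtin_object_size(p, 0)

/* Hypothetical checked variant: aborts on overflow, the way a
   __foo_chk function would.  */
static void *
my_fill_chk(void *dst, int c, size_t n, size_t dstlen)
{
  if (n > dstlen)
    abort();
  return memset(dst, c, n);
}

static inline __attribute__((always_inline)) void *
my_fill(void *dst, int c, size_t n)
{
  /* Stand-in for __FORTIFY_CALL_CHK: 0 only when n is a compile-time
     constant that is known to fit in dst, 1 otherwise.  */
  int call_chk = !(__builtin_constant_p(n) && n <= BOS0(dst));

  /* Take the checked path only if the size of dst is known and the
     call has not already been proven safe.  */
  if (call_chk && BOS0(dst) != (size_t) -1)
    return my_fill_chk(dst, c, n, BOS0(dst));
  return memset(dst, c, n);
}
```

Whichever path is taken, an in-bounds call such as my_fill(buf, 'a', sizeof buf) fills the buffer and returns buf.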

True variadic functions are a bit of a pain point, since clang can't
inline them and use __va_arg_pack () like GCC does. So, if
available, clang falls back on the va_list versions (vprintf instead
of printf, ...).
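
The fallback looks roughly like the sketch below. my_snprintf is a hypothetical wrapper written for this email, not a function from the patch; a real FORTIFY wrapper would call the checked variant of the va_list function.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical wrapper: collect the variadic arguments into a va_list
   and hand them to the v* variant, instead of forwarding them with
   __va_arg_pack ().  */
static __attribute__((format(printf, 3, 4))) int
my_snprintf(char *buf, size_t len, const char *fmt, ...)
{
  va_list ap;
  int ret;

  va_start(ap, fmt);
  ret = vsnprintf(buf, len, fmt, ap);
  va_end(ap);
  return ret;
}
```

The wrapper is a real out-of-line variadic function here, which is why it works without any inlining support from the compiler.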

Finally, C's variadic fauxverloads (like open()) are handled by
actual overloads. I've yet to find a clean way to stamp these out
with macros, so this patch just uses one function per reasonable
signature of a function.

...And I think that's about it for the nitty-gritty of the actual
implementation. As you noted, going through macros makes
both GCC and Clang emit "note:"s about each macro; the
implementation of some of these macros tries to keep that to
a bare minimum, at the cost of some repetition.

Hope this helped, :)
George

On Mon, Nov 20, 2017 at 3:03 PM, Adhemerval Zanella
<adhemerval.zanella@linaro.org> wrote:
> On 11/09/2017 03:26, George Burgess IV wrote:
>> Hello,
>>
>> Attached is a patch that aims to substantially improve FORTIFY's
>> usefulness with clang, and make defining FORTIFY'ed functions require
>> less ceremony.
>>
>> I'm not looking for a thorough review at this time, nor do I hope to
>> land this patch in one piece. The goal of this thread is to ensure
>> that everyone's more or less happy with where I'd like to take glibc's
>> FORTIFY implementation.
>
> Thanks for the patch and patience. The idea sound reasonable and it will
> be indeed a good addition to extend fortify functionality to clang as well.
>
>>
>> Please note that this change is intended to be a functional nop for
>> all compilers that aren't clang >= 5.0, which was just released last
>> Thursday.
>
> As a potential side note it would be good to add some short explanation
> of the internal required to adjust it for clang, for instance that
> __use_clang_fortify is a compiler defined flag which is (?) define only
> for clang 5.0 or higher. In any case the document referenced is a good
> addition for explanation, but I usually prefer to have a more self
> explanatory detail on the thread and or email itself.
>
>>
>> Diving in: as said, this patch removes a lot of duplication from
>> FORTIFY in the common case, and makes FORTIFY far more useful for
>> those who use glibc with clang. Namely, with this patch, clang becomes
>> capable of emitting compile-time diagnostics on par (*) with GCC's,
>> and clang's ability to perform run-time checks is substantially
>> improved over what we have today.
>>
>> It essentially does this by wrapping up the majority of the
>> compiler-specific incantations (declaring __foo_chk_warn,
>> conditionally calling it, ...) behind a macro, and uses that to stamp
>> out FORTIFY's compile-time diagnostic bits. While this approach is the
>> cleanest I've been able to come up with, it has potential downsides:
>>
>> - Compile-time diagnostics with GCC are somewhat different than what
>> they are today. To show this, I've attached tst-chk2-output.diff,
>> which is a diff of the diagnostics produced by running GCC 7.1 on
>> debug/tst-chk2.c. I don't find the difference to be substantial, but
>> it does exist.
>
> Taking your provided file as an example I do not see it as blocker
> for this approach.  Although the new warning prints the macros used
> to parametrize the builtins invocation, I think it still shows enough
> information for debugging.  For instance:
>
> -../libio/bits/stdio2.h:64:10: warning: ‘__builtin___snprintf_chk’: specified bound 3 exceeds the size 2 of the destination [-Wstringop-overflow=]
> -   return __builtin___snprintf_chk (__s, __n, __USE_FORTIFY_LEVEL - 1,
> -          ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> -        __bos (__s), __fmt, __va_arg_pack ());
> -        ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> +../libio/bits/stdio2.h:79:7: warning: ‘__builtin___snprintf_chk’: specified bound 3 exceeds the size 2 of the destination [-Wstringop-overflow=]
> +   int __result = __FORTIFY_CALL_VA_BUILTIN (snprintf, __s, __n,
>
> Also, some seems more convoluted as:
>
>  In function ‘wmemmove’,
>      inlined from ‘do_test’ at tst-chk1.c:681:3:
> -../wcsmbs/bits/wchar2.h:77:9: warning: call to ‘__wmemmove_chk_warn’ declared with attribute warning: wmemmove called with length bigger than size of destination buffer
> -  return __wmemmove_chk_warn (__s1, __s2, __n,
> -         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> -         __bos0 (__s1) / sizeof (wchar_t));
> -         ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> +../wcsmbs/bits/wchar2.h:56:42: warning: call to ‘__wmemmove_warn’ declared with attribute warning: wmemmove called with length bigger than size of destination buffer
> +      __FORTIFY_WARNING_ONLY_IF_BOS0_LT2 (__wmemmove_warn, __n, __s1,
> +../misc/sys/cdefs.h:234:5: note: in definition of macro ‘__FORTIFY_WARNING_ONLY_IF_BOS0_LT2’
> +     err_fn (); \
> +     ^~~~~~
>
> Which might trigger an warning with different internal names being used
> and also with a more convoluted warning which points to an internal name
> instead direct to a builtin.  I still think this is non blocker since
> the warning is still meaningful and correct to point the api usage issue.
>
>> - In very rare cases, the code generated by GCC will be a bit worse
>> (e.g. slower+larger) with this patch. I know this may be a tough sell,
>> but please hear me out. :)
>>
>> With this patch, we sometimes emit diagnostics by emitting code like:
>> if (should_emit_compile_time_warning) {
>>   volatile char c = 0;
>>   if (__glibc_unlikely (c))
>>     function_with_warnattr ();
>> }
>>
>> Where `should_emit_compile_time_warning` always folds to a constant
>> during compilation. So, 0 diagnostics should mean 0 change in code
>> quality.
>>
>> I don't believe this is a deal-breaker, since:
>> - if you're using FORTIFY, you presumably care enough to fix
>> FORTIFY-related warnings, which makes this regression nonexistent for
>> you,
>> - if you're using FORTIFY, you know there will be a (small, but
>> present) performance penalty,
>> - directly after each __glibc_unlikely(c) branch is call to a FORTIFY
>> function with known broken input, which should abort the program
>> anyway, and
>> - this doesn't apply to *all* diagnostics (e.g. bad calls to open()
>> don't have this penalty); just ones where we can unify clang's and
>> GCC's diagnostic emission bits.
>
> I also agree potential pessimization on code that triggers the fortify
> checks is not really a blocker.  The only issue I would consider for
> it would be if hot path for fortified code also get slower/larger,
> but it does not seem to be the case.
>
>>
>> In any case, please see binary-output.diff for an idea of the
>> difference this makes on code compiled with GCC 7.1. The function
>> being compiled was a call to wmemmove with an undersized buffer. With
>> a sufficiently large buffer, both today's FORTIFY implementation and
>> the proposed one produce identical bodies for the function in
>> question.
>>
>> Other than that, I know of no regressions that this patch causes with GCC.
>>
>> For clang, in very rare cases (read: I've seen ~10 instances of this
>> testing similar implementations across Android, ChromeOS, and another
>> very large code base), it can also break existing code. For specifics
>> on that, and an overview on how clang's FORTIFY implementation works,
>> please see
>> https://docs.google.com/document/d/1DFfZDICTbL7RqS74wJVIJ-YnjQOj1SaoqfhbgddFYSM/edit?usp=sharing
>
> For 'Incompatibilities between clang and GCC FORTIFY' chapter in your
> documentation, the first one I do not see really an issue because
> relying on compiler constant fold strlen is quite fragile.  The second
> one seem more a minor issue, but the a compiler details for a very
> specialized usage.  I do not have a strong opinion if they should be
> used as blockers.
>
>> The "How does clang handle it?" section covers the primary attributes
>> this patch uses, and the "Incompatibilities between clang and GCC
>> FORTIFY" section covers places where this patch might break existing
>> clang users.
>>
>> Though I expect clang breakages to be very rare and trivial to fix,
>> this patch introduces a _CLANG_FORTIFY_DISABLE macro, which can be
>> used to turn this entire patch into a nop for clang >= 5.0.
>
> I really want to avoid a compiler specific flag to deactivate fortify
> where there is already in place better approach (either adjusting
> the code or disabling fortify). Also it means it would need to to
> continue export such macro to maintain compatibility, which also adds
> a lot of complexity with no so straightforward gains.
>
>>
>> -----
>>
>> I apologize if this is a lot to dump into one email. As said, if we
>> decide this is an acceptable direction to head in, I hope to land this
>> patch in many easily reviewable parts later on.
>
> Indeed to move forward on review we will need to split this patch by
> family functions, maybe an initial patch to add the required macros
> and gcc required refactoring and other for family of functions or
> by headers with the adjustments for clang.
>
> The only issue I see is lacking of testing I do not have an easier
> solution.  One option would to try check for clang with support for
> fortify on path and add extra rules for build the fortify tests
> with clang as well, but I am not sure how complex would be to adjust
> current testcase for such scenario.
>
>>
>> If you have any questions, comments, concerns, etc. please don't
>> hesitate to voice them.
>>
>> Thank you very much for your time, :)
>> George
>>
>> (*) - This means that clang can catch almost as many *types* of bugs
>> as GCC at compile-time. Due to architectural constraints, clang can
>> only perform these checks prior to optimizing code.
>> So, clang is still unable to statically diagnose bugs that require it
>> to do more than simple constant folding. __builtin_object_size,
>> however, isn't required to be folded until after we optimize code, so
>> many dynamic checks can still take advantage of optimizations.
>>
