This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: [PATCH 2/4] New string function explicit_bzero (from OpenBSD).
On 19/08/2016 10:00, Zack Weinberg wrote:
> On 08/19/2016 08:43 AM, Adhemerval Zanella wrote:
>> On 18/08/2016 17:53, Torvald Riegel wrote:
>>> On Thu, 2016-08-18 at 20:31 +0200, Florian Weimer wrote:
>>>> On 08/17/2016 07:19 PM, Zack Weinberg wrote:
>>>>> +#ifdef __USE_MISC
>>>>> +/* As bzero, but the compiler will not delete a call to this
>>>>> + function, even if S is dead after the call. Note: this function
>>>>> + has its own implementation file and should not be slurped into
>>>>> + string-inlines.o. */
>>>>> +__extern_inline void
>>>>> +explicit_bzero (void *__s, size_t __n)
>>>>> +{
>>>>> + memset (__s, '\0', __n);
>>>>> + __glibc_read_memory (__s, __n);
>>>>> +}
>>>>> +#endif
>>>>
>>>> __extern_inline can expand to nothing at all, and you would get multiple
>>>> definitions of explicit_bzero this way.
>
> OK, what should I be using instead?
Fortify functions uses '__fortify_function' which defines both
__extern_always_inline and __attribute_artificial__. However fortify is guarded
by _FORTIFY_SOURCE, which requires GCC 4.1 (so __extern_always_inline will be
defined anyway).
Stdio also defines:
#ifndef __extern_inline
# define __STDIO_INLINE inline
#else
# define __STDIO_INLINE __extern_inline
#endif
#ifdef __USE_EXTERN_INLINES
...
#endif
But I think this will also only add unnecessary complexity on the patch.
Also, another issue with gnu_inline is when building for C++ it might get also
multiple definitions [1].
That's why I think simpler solution would be just to remove this inline
optimization.
[1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=41194
>
>>>> I don't think we want explicit_bzero to be inlined, it's useful to have
>>>> this name in the executable. Furthermore, we might want to add
>>>> additional state clearing later, so an implementation in libc.so.6 seems
>>>> desirable anyway.
>
> Oddly enough, inlining the memset leads to better security properties
> for generated code. Remember the "key that lives in the vector
> registers until it has to be copied onto the stack to be erased" scenario?
>
> ---
> #include <emmintrin.h>
> #include <string.h>
>
> extern void explicit_bzero (void *ptr, size_t size);
>
> extern __m128 get_key ();
> extern void encrypt_with_key (__m128 key, void *data, size_t size);
>
> void encrypt (void *data, size_t size)
> {
> __m128 key = get_key ();
> encrypt_with_key (key, data, size);
> explicit_bzero (&key, sizeof key);
> }
> ---
>
> (Assume an ABI where __m128 actually gets passed in registers.)
>
> If you expose the memset to inlining, the key *won't* be copied into
> the stack and then immediately erased; the compiler will just dutifully
> clear 16 bytes of the stack and then pass them to __glibc_read_memory.
> It still won't be erased from the vector registers, but that can only be
> fixed with compiler support.
>
> (It also leads to significantly more compact code -- see my experiment
> with OpenSSL from last year -- but that's not as important.)
I also think compact code should not be a blocker here as well. However I
do not have a better solution for this specific solution without compiler
support.
>
>>> I suppose we just want a compiler barrier here though, and don't need a
>>> memory barrier in the sense of something that constrains HW reordering.
>
> That is correct; however, the "memory barrier" I was talking about was
>
> asm volatile ("" ::: "memory");
>
> which is also only a compiler barrier.
>
>> I would also suggest to avoid adding this inline optimization, it just add
>> another exported symbol by glibc with little performance benefit.
>
> The *other* reason why I want to keep the inline optimization is that it
> means I don't need to add __explicit_bzero_chk (which, because of
> ifuncs, means futzing with *every sysdeps definition of memset*!). So
> either way there's going to be a new exported symbol.
>
> zw
I do not think this is really required, you can still add a default
debug/explicit_bzero_chk.c which will call explicit_bzero/memset. Each
port already defines how internal memset is called (either using
internal ifunc or redirecting to a default symbol) so I see no need
to add arch specific implementations.
This does not prevent though to later we add arch-specific optimized
version (which might be ifunc).