This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.


Re: [PATCH] elf: Add AES-128 implementation for arc4random

From: Florian Weimer <fweimer at redhat dot com>
To: Adhemerval Zanella <adhemerval dot zanella at linaro dot org>, libc-alpha at sourceware dot org
Date: Mon, 5 Mar 2018 15:48:47 +0100
Subject: Re: [PATCH] elf: Add AES-128 implementation for arc4random
Authentication-results: sourceware.org; auth=none
References: <20180302125302.5CF0F4045458E@oldenburg.str.redhat.com> <afe2fe20-8e3f-a7d2-971a-6a08778e92b5@linaro.org>

On 03/02/2018 08:27 PM, Adhemerval Zanella wrote:
> On 02/03/2018 09:53, Florian Weimer wrote:
>> This commit imports the AES-128 implementation from libgcrypt.
>>
>> This code has to reside in ld.so because it will be used to
>> initialize the stack protector cookie and the pointer guard
>> from the AT_RANDOM variable.
>>
>> AES-128 was chosen as the cryptographic primitive because hardware
>> support for AES-128 is much more widespread than for SHA-1 or SHA-256.
>> This means that we can add hardware acceleration for arc4random for
>> a larger number of systems, as a subsequent optimization.
>
> I noted other system (*BSD, Linux kernel, etc.) are using ChaCha20 instead
> of AES-128 for both arc4random and /dev/{u}random, but I don't have much

FreeBSD still uses RC4 for arc4random for some reason.

> information why exactly ChaCha20 was picked instead.  Checking some
> discussion why ChaCha20 is preferable [1] it seems is usually faster on
> hardware without specialized instructions and less susceptible to cache
> timing attacks. However, cryptoanalysis is not really forte, so I just
> curious why we should do something different than others for arc4random.
>
> [1] https://crypto.stackexchange.com/questions/34455/whats-the-appeal-of-using-chacha20-instead-of-aes

The advantage of ChaCha20 is that the key schedule is very cheap. This means that you can feed back the output from the generator and use it as the key for the next block (this provides what is called “backtracking resistance”). It's not really advisable to do this with a software implementation of AES-128 for performance reasons.

The downside is that this Xₙ := fⁿ(X₀) construction risks running into a small(ish) cycle after many blocks. Therefore, the implementation in libbsd feeds back not just the 256 key bits, but also 64 bits for the initial vector, probably hoping that the 320 bits make it sufficiently unlikely that the initial run until the first repeated block is shorter than 2**64 iterations or so.
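The size of that risk can be estimated with a birthday-style union bound. The sketch below assumes the feedback step behaves like a random function on N = 2**state_bits states, which is an idealization and not a proven property of ChaCha20; under that assumption, the chance that the first k iterates contain a repeat is at most k(k-1)/(2N). The function name `repeat_bound` is made up for this illustration.

```python
from fractions import Fraction
from math import log2


def repeat_bound(state_bits: int, iterations_log2: int) -> Fraction:
    """Union bound k*(k-1)/(2*N) on the chance that the first k iterates
    of a random function on N = 2**state_bits states contain a repeat.

    Assumption: the feedback function is modelled as a random function.
    """
    k = 1 << iterations_log2
    n = 1 << state_bits
    return Fraction(k * (k - 1), 2 * n)


if __name__ == "__main__":
    # Compare feeding back 256 bits (key only) with 320 bits (key + IV),
    # for an initial run of 2**64 iterations.
    for bits in (256, 320):
        bound = repeat_bound(bits, 64)
        print(f"{bits}-bit state: bound is about 2**{log2(float(bound)):.0f}")
```

Under this model the bound for 2**64 iterations is about 2**-129 with 256 bits of fed-back state and about 2**-193 with 320 bits, so both are tiny, and the extra 64 bits add further margin.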

If you encrypt a counter using AES-128, you do not have this problem because the encrypted blocks are all distinct, but this means there is a generic discriminator because the block repeats expected after 2**64 or so blocks (due to the birthday paradox) simply do not happen.
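The effect is easy to see at toy scale. The sketch below does not use AES; it swaps in a seeded random permutation on 16-bit blocks as a stand-in for a keyed block cipher, and compares it with a seeded random function on the same domain. All names and sizes here are assumptions chosen for illustration.

```python
import random

BLOCK_BITS = 16              # toy block size; AES uses 128-bit blocks
DOMAIN = 1 << BLOCK_BITS


def toy_permutation(seed: int) -> list:
    """Stand-in for a keyed block cipher: a seeded random permutation."""
    table = list(range(DOMAIN))
    random.Random(seed).shuffle(table)
    return table


def toy_random_function(seed: int) -> list:
    """Stand-in for an ideal random generator: independent random outputs."""
    rng = random.Random(seed)
    return [rng.randrange(DOMAIN) for _ in range(DOMAIN)]


def count_repeats(table: list, blocks: int) -> int:
    """Count repeated output blocks when 'encrypting' counters 0..blocks-1."""
    outputs = table[:blocks]
    return blocks - len(set(outputs))


BLOCKS = 4096                # well above sqrt(DOMAIN) = 256
perm_repeats = count_repeats(toy_permutation(1), BLOCKS)
func_repeats = count_repeats(toy_random_function(1), BLOCKS)

if __name__ == "__main__":
    print("permutation repeats:", perm_repeats)
    print("random function repeats:", func_repeats)
```

The permutation never repeats a block, while the random function does once the number of blocks is well past the square root of the domain size. For 128-bit blocks that threshold is around 2**64 blocks, which is where the missing repeats become observable.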

Another advantage of the encrypted counter approach is that you have very little per-thread state. Basically just the counter and a key stream discriminator (see pthread_thread_number_np), although for some coprocessor implementations, it may be beneficial to create more than a single block per iteration.
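As a rough picture of how small that state is, the sketch below builds the 16-byte input block that would then be encrypted with AES-128 under a process-wide key. The 64/64-bit split between thread number and counter, the class name, and the field names are assumptions made for this illustration, not the layout of the patch; the encryption step is omitted because the Python standard library has no AES.

```python
import struct
from dataclasses import dataclass

MASK64 = 0xFFFFFFFFFFFFFFFF


@dataclass
class ThreadState:
    """Hypothetical per-thread generator state: 16 bytes of payload."""
    thread_number: int       # assumed unique per thread (key stream discriminator)
    counter: int = 0

    def next_block(self) -> bytes:
        """Return the next 16-byte counter block and advance the counter."""
        block = struct.pack(">QQ", self.thread_number & MASK64, self.counter)
        # Wraps after 2**64 blocks; a real implementation would have to
        # rekey or otherwise handle this instead of silently wrapping.
        self.counter = (self.counter + 1) & MASK64
        return block
```

Because the thread number occupies half of every block, two threads never feed the same block to the cipher, so their output streams stay distinct without sharing any mutable state.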

The backtracking protection in libbsd still looks somewhat expensive, so libbsd generates 1024 output bytes for each feedback step. 40 bytes are fed back, the rest is returned to the application piece by piece. This buffer really has to be thread-local if we want an implementation which scales, and using 1024 bytes for this seems to be a bit over the top. We could probably do with fewer bytes than that (40 + X), but it will substantially reduce generator throughput.
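The scheme described above can be sketched as follows. This is a minimal illustration written from the description in this message, not a port of the libbsd code: it uses the original ChaCha20 layout with a 64-bit block counter and a 64-bit IV, restarts the block counter at zero after every feedback step (an assumption), and the class name `FeedbackGenerator` is made up. Python also cannot securely erase the bytes it has handed out, which a real implementation would do.

```python
import struct

MASK32 = 0xFFFFFFFF
SIGMA = struct.unpack("<4I", b"expand 32-byte k")
KEY_LEN, IV_LEN = 32, 8
FEEDBACK = KEY_LEN + IV_LEN      # 40 bytes fed back per step
BUFFER = 1024                    # key stream bytes generated per step


def _rotl(x: int, n: int) -> int:
    return ((x << n) & MASK32) | (x >> (32 - n))


def _quarter_round(s: list, a: int, b: int, c: int, d: int) -> None:
    s[a] = (s[a] + s[b]) & MASK32; s[d] = _rotl(s[d] ^ s[a], 16)
    s[c] = (s[c] + s[d]) & MASK32; s[b] = _rotl(s[b] ^ s[c], 12)
    s[a] = (s[a] + s[b]) & MASK32; s[d] = _rotl(s[d] ^ s[a], 8)
    s[c] = (s[c] + s[d]) & MASK32; s[b] = _rotl(s[b] ^ s[c], 7)


def chacha20_block(key: bytes, iv: bytes, counter: int) -> bytes:
    """One 64-byte ChaCha20 key stream block (64-bit counter, 64-bit IV)."""
    init = (list(SIGMA) + list(struct.unpack("<8I", key))
            + [counter & MASK32, (counter >> 32) & MASK32]
            + list(struct.unpack("<2I", iv)))
    s = init[:]
    for _ in range(10):          # 10 double rounds = 20 rounds
        _quarter_round(s, 0, 4, 8, 12)
        _quarter_round(s, 1, 5, 9, 13)
        _quarter_round(s, 2, 6, 10, 14)
        _quarter_round(s, 3, 7, 11, 15)
        _quarter_round(s, 0, 5, 10, 15)
        _quarter_round(s, 1, 6, 11, 12)
        _quarter_round(s, 2, 7, 8, 13)
        _quarter_round(s, 3, 4, 9, 14)
    return struct.pack("<16I", *((x + y) & MASK32 for x, y in zip(s, init)))


class FeedbackGenerator:
    """Generate BUFFER bytes per step; the first 40 become the next key and IV."""

    def __init__(self, seed: bytes):
        if len(seed) != FEEDBACK:
            raise ValueError("seed must be 40 bytes (32-byte key + 8-byte IV)")
        self._key, self._iv = seed[:KEY_LEN], seed[KEY_LEN:]
        self._pool = b""

    def _refill(self) -> None:
        stream = b"".join(chacha20_block(self._key, self._iv, i)
                          for i in range(BUFFER // 64))
        # Feedback step: the old key and IV are overwritten, so earlier
        # output cannot be recomputed from the current state.
        self._key = stream[:KEY_LEN]
        self._iv = stream[KEY_LEN:FEEDBACK]
        self._pool = stream[FEEDBACK:]

    def read(self, n: int) -> bytes:
        out = bytearray()
        while len(out) < n:
            if not self._pool:
                self._refill()
            take = n - len(out)
            out += self._pool[:take]
            self._pool = self._pool[take:]
        return bytes(out)
```

With these sizes, 984 of every 1024 generated bytes reach the application. Shrinking the buffer to a single 64-byte block would leave only 24 usable bytes per feedback step, which illustrates why a smaller buffer costs throughput.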

Performance-wise, on current Intel CPUs with AES support, the AES-128 encrypted counter approach will provide a throughput of around 3 gigabytes per second, with 80 bytes of per-thread state. I expect that the ChaCha20 approach in libbsd will reach this level of performance only with a large per-thread buffer, such as the 1064 bytes used in libbsd, which should give around 2.75 gigabytes per second. With 396 bytes of per-thread state, the predicted performance is 1.1 gigabytes per second, and with 104 bytes, it is 0.24 gigabytes per second.

The nominal security strength of ChaCha20 is higher than that of AES-128 (256 bits vs less than 128 bits), but this is for the cipher itself, not for the generators derived from it. I'm not aware of any reviews of the actual generators.

So if we want backtracking protection, we'd probably have to go with the 396-byte ChaCha20 approach (maybe after recovering the TLS space occupied by _res). Otherwise, AES-128 will be the better choice for a lot of users (who have access to hardware with AES-128 acceleration).

Unfortunately, maintaining both approaches has quite a bit of overhead because they are so different.


Thanks,
Florian
