This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: [PATCH] elf: Add AES-128 implementation for arc4random


On 03/02/2018 08:27 PM, Adhemerval Zanella wrote:


On 02/03/2018 09:53, Florian Weimer wrote:
This commit imports the AES-128 implementation from libgcrypt.

This code has to reside in ld.so because it will be used to
initialize the stack protector cookie and the pointer guard
from the AT_RANDOM variable.

AES-128 was chosen as the cryptographic primitive because hardware
support for AES-128 is much more widespread than for SHA-1 or SHA-256.
This means that we can add hardware acceleration for arc4random for
a larger number of systems, as a subsequent optimization.

I noted that other systems (*BSD, the Linux kernel, etc.) use ChaCha20 instead
of AES-128 for both arc4random and /dev/{u}random, but I don't have much

FreeBSD still uses RC4 for arc4random for some reason.

information on why exactly ChaCha20 was picked instead.  From some
discussion of why ChaCha20 is preferable [1], it seems it is usually faster on
hardware without specialized instructions and less susceptible to cache
timing attacks. However, cryptanalysis is not really my forte, so I am just
curious why we should do something different from others for arc4random.

[1] https://crypto.stackexchange.com/questions/34455/whats-the-appeal-of-using-chacha20-instead-of-aes

The advantage of ChaCha20 is that the key schedule is very cheap. This means that you can feed back the output from the generator and use it for the next block (this is called “backtracking resistance”). It's not really advisable to do this with a software implementation of AES-128 for performance reasons.

The downside is that this Xₙ := fⁿ(X₀) construction risks running into a small(ish) cycle after many blocks. Therefore, the implementation in libbsd feeds back not just the 256 key bits, but also 64 bits for the initial vector, probably hoping that the 320 bits make it sufficiently unlikely that the initial run until the first repeated block is shorter than 2**64 iterations or so.
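To make the cycle concern concrete, here is a minimal sketch that iterates a feedback function on a deliberately tiny 16-bit state and measures how long it runs before it starts repeating. The function toy_step is a hypothetical stand-in, not ChaCha20, and find_cycle is just Brent's cycle-finding algorithm; both names are made up for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for "run the cipher, feed the output back as the
   next state".  This is NOT ChaCha20; it is a cheap, non-injective 16-bit
   mixing function so the whole state space can be explored quickly. */
static uint16_t toy_step(uint16_t x)
{
    uint32_t v = x;
    v = (v * 0x9E3Bu + 0x7F4Au) & 0xFFFFu;
    v ^= v >> 7;
    v = (v * v + 0x1234u) & 0xFFFFu;   /* squaring loses information */
    return (uint16_t)v;
}

struct rho {
    uint32_t tail;    /* steps taken before the cycle is entered */
    uint32_t cycle;   /* length of the cycle itself */
};

/* Brent's algorithm: find tail and cycle length of x, f(x), f(f(x)), ... */
static struct rho find_cycle(uint16_t x0)
{
    uint32_t power = 1, lam = 1;
    uint16_t tortoise = x0;
    uint16_t hare = toy_step(x0);

    /* Phase 1: determine the cycle length. */
    while (tortoise != hare) {
        if (power == lam) {
            tortoise = hare;
            power *= 2;
            lam = 0;
        }
        hare = toy_step(hare);
        lam++;
    }

    /* Phase 2: determine where the cycle starts. */
    uint32_t mu = 0;
    tortoise = hare = x0;
    for (uint32_t i = 0; i < lam; i++)
        hare = toy_step(hare);
    while (tortoise != hare) {
        tortoise = toy_step(tortoise);
        hare = toy_step(hare);
        mu++;
    }

    /* A 16-bit state cannot run longer than 2**16 steps without repeating. */
    assert(mu + lam <= 65536u);
    return (struct rho){ mu, lam };
}
```

For a function that behaves like a random mapping on an n-bit state, the run before the first repeat is expected to be on the order of 2**(n/2) steps, which is why feeding back 320 bits rather than 256 bits moves the expected first repeat further out.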

If you encrypt a counter using AES-128, you do not have this problem because the encrypted blocks are all distinct. However, this means there is a generic distinguisher: the block repeats that would be expected after 2**64 or so blocks (due to the birthday paradox) simply do not happen.
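The absence of repeats described above can be seen at toy scale. The sketch below encrypts a counter with a hypothetical 16-bit Feistel cipher (toy_encrypt, not AES-128, and not secure) and counts repeated output blocks. Because any block cipher is a permutation for a fixed key, the count stays at zero over the whole counter range, whereas a truly random 16-bit source would be expected to repeat after roughly 2**8 outputs.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Arbitrary 8-bit round function; a Feistel network is a permutation
   regardless of what this function computes. */
static uint8_t round_fn(uint8_t half, uint8_t k)
{
    uint8_t v = (uint8_t)(half + k);
    v = (uint8_t)(v * 167u + 13u);
    return (uint8_t)(v ^ (v >> 3));
}

/* Hypothetical 16-bit block cipher standing in for AES-128.
   Illustration only, NOT cryptographically secure. */
static uint16_t toy_encrypt(uint16_t block, const uint8_t key[4])
{
    uint8_t l = (uint8_t)(block >> 8);
    uint8_t r = (uint8_t)(block & 0xFFu);
    for (int i = 0; i < 4; i++) {
        uint8_t t = (uint8_t)(l ^ round_fn(r, key[i]));
        l = r;
        r = t;
    }
    return (uint16_t)((l << 8) | r);
}

/* Encrypt the counters 0 .. n-1 and count how many output blocks
   were already produced by an earlier counter value. */
static unsigned count_repeats(const uint8_t key[4], uint32_t n)
{
    static uint8_t seen[65536];
    unsigned repeats = 0;

    assert(n <= 65536u);   /* the counter must not wrap around */
    memset(seen, 0, sizeof seen);
    for (uint32_t ctr = 0; ctr < n; ctr++) {
        uint16_t out = toy_encrypt((uint16_t)ctr, key);
        if (seen[out])
            repeats++;
        seen[out] = 1;
    }
    return repeats;
}
```

With a 128-bit block, the same effect only becomes observable after about 2**64 blocks, which is where the bound in the text comes from.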

Another advantage of the encrypted counter approach is that you have very little per-thread state. Basically just the counter and a key stream discriminator (see pthread_thread_number_np), although for some coprocessor implementations, it may be beneficial to create more than a single block per iteration.

The backtracking protection in libbsd still looks somewhat expensive, so libbsd generates 1024 output bytes for each feedback step. 40 bytes are fed back, the rest is returned to the application piece by piece. This buffer really has to be thread-local if we want an implementation which scales, and using 1024 bytes for this seems to be a bit over the top. We could probably do with fewer bytes than that (40 + X), but it will substantially reduce generator throughput.
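The buffering scheme described above can be sketched roughly as follows. This is not the libbsd code: toy_keystream is a hypothetical, non-cryptographic placeholder for the ChaCha20 keystream, and the struct layout, the names, and the choice to zero bytes once they have been consumed are assumptions made for this illustration. Only the sizes (1024 bytes generated per step, 40 of them fed back) come from the text.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { KEY_IV_BYTES = 40, BUF_BYTES = 1024 };

struct toy_gen {
    uint8_t key_iv[KEY_IV_BYTES];  /* 256-bit key plus 64-bit IV */
    uint8_t buf[BUF_BYTES];        /* output of one feedback step */
    size_t avail;                  /* unread bytes at the end of buf */
};

/* Hypothetical placeholder for the ChaCha20 keystream.
   NOT cryptographic; it only makes the sketch self-contained. */
static void toy_keystream(const uint8_t key_iv[KEY_IV_BYTES],
                          uint8_t *out, size_t len)
{
    uint64_t s = 0x9E3779B97F4A7C15ull;
    for (size_t i = 0; i < KEY_IV_BYTES; i++)
        s = (s ^ key_iv[i]) * 0x100000001B3ull;
    s |= 1;   /* xorshift must not start from zero */
    for (size_t i = 0; i < len; i++) {
        s ^= s << 13;
        s ^= s >> 7;
        s ^= s << 17;
        out[i] = (uint8_t)(s >> 32);
    }
}

/* One feedback step: generate a full buffer, then turn its first
   40 bytes into the next key and IV and erase them from the buffer. */
static void toy_refill(struct toy_gen *g)
{
    toy_keystream(g->key_iv, g->buf, BUF_BYTES);
    memcpy(g->key_iv, g->buf, KEY_IV_BYTES);
    memset(g->buf, 0, KEY_IV_BYTES);
    g->avail = BUF_BYTES - KEY_IV_BYTES;
}

/* Hand out the remaining bytes piece by piece, refilling on demand. */
static void toy_random_bytes(struct toy_gen *g, uint8_t *out, size_t len)
{
    assert(g->avail <= BUF_BYTES - KEY_IV_BYTES);
    while (len > 0) {
        if (g->avail == 0)
            toy_refill(g);
        size_t n = len < g->avail ? len : g->avail;
        uint8_t *src = g->buf + BUF_BYTES - g->avail;
        memcpy(out, src, n);
        memset(src, 0, n);   /* do not keep bytes already handed out */
        out += n;
        len -= n;
        g->avail -= n;
    }
}
```

In this sketch each refill yields 1024 - 40 = 984 bytes for the application, so the cost of the feedback step is spread over 984 output bytes. Shrinking the buffer to 40 + X bytes keeps the fixed 40-byte overhead but spreads it over only X bytes, which is the throughput trade-off mentioned above.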

Performance-wise, on current Intel CPUs with AES support, the AES-128 encrypted counter approach will provide a throughput of around 3 gigabytes per second, with 80 bytes of per-thread state. I expect that the ChaCha20 approach in libbsd will reach this level of performance only with a large per-thread buffer, such as the 1064 bytes used in libbsd, which should give around 2.75 gigabytes per second. With 396 bytes of per-thread state, the predicted performance is 1.1 gigabytes per second, and with 104 bytes, it is 0.24 gigabytes per second.

The nominal security strength of ChaCha20 is higher than that of AES-128 (256 bits vs less than 128 bits), but this is for the cipher itself, not for the generators derived from it. I'm not aware of any reviews of the actual generators.

So if we want backtracking protection, we'd probably have to go with the 396-byte ChaCha20 approach (maybe after recovering the TLS space occupied by _res). Otherwise, AES-128 will be the better choice for a lot of users (who have access to hardware with AES-128 acceleration).

Unfortunately, maintaining both approaches has quite a bit of overhead because they are so different.

Thanks,
Florian

