This is the mail archive of the
mailing list for the glibc project.
[Bug libc/15615] New: Poor quality output from rand_r
- From: "bugdal at aerifal dot cx" <sourceware-bugzilla at sourceware dot org>
- To: glibc-bugs at sourceware dot org
- Date: Wed, 12 Jun 2013 23:39:09 +0000
- Subject: [Bug libc/15615] New: Poor quality output from rand_r
- Auto-submitted: auto-generated
Bug ID: 15615
Summary: Poor quality output from rand_r
Assignee: unassigned at sourceware dot org
Reporter: bugdal at aerifal dot cx
CC: drepper.fsp at gmail dot com
Created attachment 7075
test program to generate data for analysis by dieharder
Implementing a decent rand_r is very tricky because the interface requirement
forces the full PRNG state to fit in 32 bits; this rules out pretty much all
good PRNGs. Nonetheless, glibc's rand_r is much worse than it needs to be.
glibc's rand_r is based on the LCG published in the C standard:
next = next * 1103515245 + 12345;
return next / 65536 % 32768;
This sample code in the standard assumes RAND_MAX == 32767. The lowest bit of
the output has period 2^17 and the highest bit has period 2^31. As such, if
only some of the bits are to be used, the highest bits should be kept and the
lower bits discarded; this is the principle the sample code above is using for
generating 15-bit output from a 32-bit internal state.
However, glibc is trying to generate 31-bit output from a 32-bit state. Just
discarding the lowest bit would not be acceptable (the new lowest bit would
have period 4), so something different needs to be done. glibc's approach is to
advance the LCG 3 times (since 3 and 2^32 are relatively prime, this does not
hurt the period of the internal state) and take 10-11 bits from each iteration
to make a 31-bit output. Unfortunately, this approach has 2 serious flaws:
1. The 10-11 bits taken are bits 16-25 or 16-26, which have moderately low
period, rather than bits 21-31 or 22-31 which would have maximal period. Thus,
a lot of the potential period length is being thrown away.
2. Concatenating the bits from multiple steps, rather than using a single step
of the LCG, introduces bias into the output. The frequency of each possible
output is not the same; instead, some outputs occur much more often, and some
not at all. Bias is especially noticable when the period is short, which it
necessarily is with a 32-bit state, and which is made much worse by item 1
If nothing else, item 1 should be fixed by adjusting the bitshifts to use the
highest available bits, rather than the middle bits, to obtain the 3 10-11 bit
values. This is straightforward and a pure improvement.
To fully fix rand_r, the approach of concatenating multiple iterations should
be abandoned in favor of a single-LCG-iteration approach followed by an
invertable transformation on the output. Obviously a 32-bit cryptographic block
cipher would give the best statistical properties, but it would be slow. In
musl, we are now using the "temper" function from mersenne twister. Another
solution that seems to work is performing a mix of bit rotations, bswap, and
multiplication by well-chosen odd numbers, applied to the output of the LCG.
I am attaching a test program for use with the dieharder PRNG test suite that
demonstrates the low quality of glibc's rand_r. In particular, at least the
OPSO, OQSO, and DNA tests show problems (tests 5, 6, and 7) and offhand it
makes sense that this kind of bias would affect them. Note that, since
dieharder requires 32-bit input as opposed to 31-bit input, I slightly modified
the glibc algorithm to pull 11,11,10 bits instead of 11,10,10 bits, and
included it in the test program rather than calling glibc's rand_r. Producing
tests that work directly with glibc's rand_r to demonstrate the problem would
be a lot more work.
To run the test, compile glibc_rand_r.c and pipe output to dieharder:
./glibc_rand_r | dieharder -a -g 200
You are receiving this mail because:
You are on the CC list for the bug.