[Bug libc/15615] Poor quality output from rand_r

--- Comment #3 from Ondrej Bilka <neleai at seznam dot cz> ---
On Thu, Jun 13, 2013 at 12:38:42PM +0000, bugdal at aerifal dot cx wrote:
> --- Comment #2 from Rich Felker <bugdal at aerifal dot cx> ---
> On Thu, Jun 13, 2013 at 08:26:27AM +0000, neleai at seznam dot cz wrote:
> > A problem here is that for many users predictability is much more
> > important than quality. A developer expects that the output of rand_r
> > with a state he controls will not vary. A change can cause extra
> > debugging hassle when code mysteriously fails on one machine but not
> > another, or desync issues.
> Could you explain better what you're concerned about? By
> "predictable", do you mean keeping the same sequence it's had in the
> past? Aside from that, any PRNG with 32-bit state and 31-bit output is
> equally "predictable".
> > > To fully fix rand_r, the approach of concatenating multiple iterations should
> > > be abandoned in favor of a single-LCG-iteration approach followed by an
> > > invertible transformation on the output. Obviously a 32-bit cryptographic block
> > > cipher would give the best statistical properties, but it would be slow. In
> > 
> > This is false, I have a replacement of this with four rounds of AES. On
> > Intel, using aesenc, performance is better than current. I did not
> > propose that because of the problems above. I wrote an RFC for a random
> > replacement on libc-alpha; browse the archives.
> AES itself does not use 32-bit blocks, so you must be using a modified
> version. Would you care to explain? I searched the archives but could
> not find your post.
Here, I wrote a version relevant to random. I did this to see how fast I
could get if I employ parallelism and inlining.

To test a rand_r equivalent I wrote a simple generator (mostly to test
performance; I did not look at quality).

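  movd   (%rdi), %xmm0     # load the 32-bit state, zero-extended to 128 bits
  movdqa %xmm0, %xmm1      # working copy of the state

  aesenc %xmm0, %xmm1      # four AES rounds on xmm1, with the
  aesenc %xmm0, %xmm1      # original state (xmm0) as the round key
  aesenc %xmm0, %xmm1
  aesenc %xmm0, %xmm1
  movd   %xmm1, (%rdi)     # store the low 32 bits as the new state
  movd   %xmm1, %eax       # the same 32 bits are the return value
  shr    $1, %eax          # reduce to 31 bits, as rand_r returns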

On Sandy Bridge this code runs at half the speed of rand_r.
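
For comparison, the single-LCG-iteration approach followed by an
invertible output transformation, as suggested in the original report,
could look roughly like the portable C below. This is only a sketch and
not a proposed patch: the names sketch_rand_r and mix32 are made up for
illustration, the LCG constants are the classic ANSI C example ones, and
the mixing step borrows the 32-bit finalizer constants from MurmurHash3
as one possible choice of invertible transformation. I have not measured
its speed or statistical quality.

```c
#include <stdint.h>

/* Hypothetical helper: an invertible mixing function on 32-bit values.
   Each step is a bijection: xor with a right shift of itself, and
   multiplication by an odd constant modulo 2^32. */
static uint32_t mix32(uint32_t x)
{
    x ^= x >> 16;
    x *= 0x85ebca6bu;
    x ^= x >> 13;
    x *= 0xc2b2ae35u;
    x ^= x >> 16;
    return x;
}

/* Hypothetical rand_r-like function: one LCG step per call, then the
   invertible transformation, then truncation to 31 bits. */
int sketch_rand_r(unsigned int *seed)
{
    uint32_t s = (uint32_t)*seed;

    s = s * 1103515245u + 12345u;   /* single LCG iteration */
    *seed = s;                      /* the LCG value is the whole state */

    return (int)(mix32(s) >> 1);    /* 31-bit non-negative result */
}
```

Because the state is advanced by the plain LCG and the mixing is applied
only to the output, the period of the state stays 2^32 and the sequence
for a given seed is fixed on every machine.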

