This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



RE: [RFC PATCH] aarch64: improve memset


> Richard wrote:
> On 11/10/2014 09:09 PM, Wilco Dijkstra wrote:
> > I spotted one issue in the alignment code:
> >
> > +	stp	xzr, xzr, [tmp2, #64]
> > +
> > +	/* Store up to first SIZE, aligned 16.  */
> > +.ifgt	\size - 64
> > +	stp	xzr, xzr, [tmp2, #80]
> > +	stp	xzr, xzr, [tmp2, #96]
> > +	stp	xzr, xzr, [tmp2, #112]
> > +	stp	xzr, xzr, [tmp2, #128]
> > +.ifgt	\size - 128
> > +.err
> > +.endif
> > +.endif
> >
> > This should be:
> >
> > +	/* Store up to first SIZE, aligned 16.  */
> > +.ifgt	\size - 64
> > +	stp	xzr, xzr, [tmp2, #64]
> > +	stp	xzr, xzr, [tmp2, #80]
> > +	stp	xzr, xzr, [tmp2, #96]
> > +	stp	xzr, xzr, [tmp2, #112]
> > +.ifgt	\size - 128
> > +.err
> > +.endif
> 
> Incorrect.
> 
> tmp2 is backward aligned from dst_in, which means that tmp2+0 may be before
> dst_in.  Thus we write the first 16 bytes, unaligned, then write to tmp2+16
> through tmp2+N to clear the first N+1 to N+16 bytes.
> 
> However, if we stop at tmp2+48 (or tmp2+112) we could be leaving up to 15 bytes
> uninitialized.

No, in the worst case we need to write 64 bytes. The proof is trivial:
dst = x0 & -64 and tmp2 = x0 & -16, so tmp2 = dst + (x0 & 0x30), i.e. tmp2 >= dst.
The stores up to [tmp2, #48] therefore already reach at least dst + 64, and
since we start doing the dc's at dst + 64, the stp to [tmp2, #64] is redundant.
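
The argument above can be checked by brute force over all 64 possible
alignments of x0. The following is only a sketch, not part of the patch: it
assumes the store sequence described in the quoted text (one unaligned
16-byte store at x0, then 16-byte stp stores at tmp2+16, tmp2+32 and
tmp2+48), and the function names and the base address 0x1000 are made up for
illustration.

```python
def head_is_covered(x0: int) -> bool:
    """Return True if every byte in [x0, dst + 64) is written
    without needing a store at tmp2 + 64."""
    dst = x0 & -64     # 64-byte aligned base; the dc's start at dst + 64
    tmp2 = x0 & -16    # 16-byte aligned base used by the stp stores

    # The identity used in the proof.
    assert tmp2 == dst + (x0 & 0x30) and tmp2 >= dst

    # Assumed store sequence: first 16 bytes unaligned at x0 ...
    written = set(range(x0, x0 + 16))
    # ... then aligned 16-byte stores at tmp2 + 16, + 32, + 48.
    for offset in (16, 32, 48):
        written.update(range(tmp2 + offset, tmp2 + offset + 16))

    # Bytes that must be cleared before the dc's take over at dst + 64.
    needed = set(range(x0, dst + 64))
    return needed <= written


def all_alignments_covered(base: int = 0x1000) -> bool:
    """Check every offset of x0 within one 64-byte block (base is arbitrary)."""
    return all(head_is_covered(base + k) for k in range(64))
```

Under these assumptions, all_alignments_covered() returns True. The same
reasoning applies to the 128-byte case with the offsets shifted accordingly.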

Wilco


