This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
RE: [RFC PATCH] aarch64: improve memset
- From: "Wilco Dijkstra" <wdijkstr at arm dot com>
- To: "'Richard Henderson'" <rth at twiddle dot net>
- Cc: <will dot newton at linaro dot org>, <marcus dot shawcroft at gmail dot com>, <libc-alpha at sourceware dot org>
- Date: Tue, 11 Nov 2014 12:52:03 -0000
- Subject: RE: [RFC PATCH] aarch64: improve memset
- Authentication-results: sourceware.org; auth=none
- References: <002701cffaa0$77623570$6626a050$ at com> <002801cffaa5$eb2852f0$c178f8d0$ at com> <545F237A dot 8070808 at twiddle dot net> <002901cffd22$3fa9ed10$befdc730$ at com> <5461C50C dot 1020508 at twiddle dot net>
> Richard wrote:
> On 11/10/2014 09:09 PM, Wilco Dijkstra wrote:
> > I spotted one issue in the alignment code:
> >
> > + stp xzr, xzr, [tmp2, #64]
> > +
> > + /* Store up to first SIZE, aligned 16. */
> > +.ifgt \size - 64
> > + stp xzr, xzr, [tmp2, #80]
> > + stp xzr, xzr, [tmp2, #96]
> > + stp xzr, xzr, [tmp2, #112]
> > + stp xzr, xzr, [tmp2, #128]
> > +.ifgt \size - 128
> > +.err
> > +.endif
> > +.endif
> >
> > This should be:
> >
> > + /* Store up to first SIZE, aligned 16. */
> > +.ifgt \size - 64
> > + stp xzr, xzr, [tmp2, #64]
> > + stp xzr, xzr, [tmp2, #80]
> > + stp xzr, xzr, [tmp2, #96]
> > + stp xzr, xzr, [tmp2, #112]
> > +.ifgt \size - 128
> > +.err
> > +.endif
> > +.endif
>
> Incorrect.
>
> tmp2 is backward aligned from dst_in, which means that tmp2+0 may be before
> dst_in. Thus we write the first 16 bytes, unaligned, then write to tmp2+16
> through tmp2+N to clear the first N+1 to N+16 bytes.
>
> However, if we stop at tmp2+48 (or tmp2+112) we could be leaving up to 15 bytes
> uninitialized.
No - in the worst case we need to write only 64 bytes. The proof is trivial:
dst = x0 & -64 and tmp2 = x0 & -16, so tmp2 = dst + (x0 & 0x30), hence tmp2 >= dst.
Since we start issuing the dc's at dst + 64, the stp to [tmp2, #64] is redundant.
Wilco