This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.
Re: [Patch, AArch64] Optimized strcpy
- From: Marcus Shawcroft <marcus dot shawcroft at gmail dot com>
- To: Richard Earnshaw <rearnsha at arm dot com>
- Cc: Glibc Development List <libc-alpha at sourceware dot org>
- Date: Wed, 7 Jan 2015 11:17:42 +0000
- Subject: Re: [Patch, AArch64] Optimized strcpy
- Authentication-results: sourceware.org; auth=none
- References: <54917329 dot 4090601 at arm dot com> <5491759B dot 4020704 at arm dot com> <54AD033E dot 509 at arm dot com>
> Following the various discussions about the above, I've done some
> further tweaking of the code, and indeed there are some further
> performance improvements, particularly for short strings.
>
> I think this is likely to be the final version (at least, for 2.21).
>
> Changes this time around:
>
> - Add the ability to build the code as stpcpy().
>
> - Small change to the page crossing check, which uses the same number
> of instructions but could be faster on some micro-architectures.
>
> - For the slow (page crossing) check, once a page cross is known to
> occur, jump to the normal entry point.
>
> - For big-endian only, on the first check we pre-reverse the bytes so
> that we don't have to recalculate the syndrome in the (likely) case that
> the string is short.
>
> - For the initial unaligned fetch, detect zeros in the first and second
> DWords independently and jump to the relevant epilogue sequence
> directly. This eliminates another level of branching later on for the
> special cases where we have to use sub-DWord sized stores.
>
> - Other changes are mostly re-ordering of the hunks of code and
> micro-optimizations that fall out of the above changes.
>
> OK?
>
> * sysdeps/aarch64/strcpy.S: New file.
> * sysdeps/aarch64/stpcpy.S: New file.
OK, can you also add a NEWS entry? Thanks.