This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: [PATCH RFC] Improve 64bit memcpy performance for Haswell CPU with AVX instruction
- From: "H.J. Lu" <hjl dot tools at gmail dot com>
- To: Ling Ma <ling dot ma dot program at gmail dot com>
- Cc: Ondrej Bilka <neleai at seznam dot cz>, GNU C Library <libc-alpha at sourceware dot org>, Liubov Dmitrieva <liubov dot dmitrieva at gmail dot com>, yumkam at gmail dot com, Ling Ma <ling dot ml at alibaba-inc dot com>
- Date: Mon, 12 May 2014 11:16:10 -0700
- Subject: Re: [PATCH RFC] Improve 64bit memcpy performance for Haswell CPU with AVX instruction
- References: <1398055946-4493-1-git-send-email-ling dot ma at alipay dot com> <CAOGi=dOQEbbkkzQGz-ZtQ0-WEHj2=hjmbstZXvZyLqycVy18Kg at mail dot gmail dot com>
On Fri, May 9, 2014 at 5:40 AM, Ling Ma <ling.ma.program@gmail.com> wrote:
> If there are still some issues on the latest memcpy and memset, please
> let us know.
>
> Thanks
> Ling
>
> 2014-04-21 12:52 GMT+08:00, ling.ma.program@gmail.com
> <ling.ma.program@gmail.com>:
>> From: Ling Ma <ling.ml@alibaba-inc.com>
>>
>> This patch takes advantage of Haswell (HSW) memory bandwidth, reduces
>> branch mispredictions by avoiding branch instructions, and forces the
>> destination to be aligned for AVX instructions.
>>
>> The CPU2006 403.gcc benchmark indicates that this patch improves
>> performance by 6% to 14%.
>>
>> This version only jumps to the backward copy for the memmove overlap case.
>> Thanks to Ondra for his comments and to Yuriy for the C code hint.
>> ---
>> ChangeLog | 16 +
>> sysdeps/x86_64/multiarch/Makefile | 1 +
>> sysdeps/x86_64/multiarch/ifunc-impl-list.c | 12 +
>> sysdeps/x86_64/multiarch/memcpy-avx-unaligned.S | 399 +++++++++++++++++++++++
>> sysdeps/x86_64/multiarch/memcpy.S | 4 +
>> sysdeps/x86_64/multiarch/memcpy_chk.S | 3 +
>> sysdeps/x86_64/multiarch/memmove-avx-unaligned.S | 22 ++
>> sysdeps/x86_64/multiarch/memmove.c | 7 +-
>> sysdeps/x86_64/multiarch/memmove_chk.c | 6 +-
>> sysdeps/x86_64/multiarch/mempcpy-avx-unaligned.S | 22 ++
>> sysdeps/x86_64/multiarch/mempcpy.S | 3 +
>> sysdeps/x86_64/multiarch/mempcpy_chk.S | 3 +
>> 12 files changed, 494 insertions(+), 4 deletions(-)
>> create mode 100644 sysdeps/x86_64/multiarch/memcpy-avx-unaligned.S
>> create mode 100644 sysdeps/x86_64/multiarch/memmove-avx-unaligned.S
>> create mode 100644 sysdeps/x86_64/multiarch/mempcpy-avx-unaligned.S
>>
>> diff --git a/ChangeLog b/ChangeLog
>> index 9bb48ab..b8638e9 100644
>> --- a/ChangeLog
>> +++ b/ChangeLog
>> @@ -1,4 +1,20 @@
>> 2014-04-21 Ling Ma <ling.ml@alibaba-inc.com>
>> +
>> + * sysdeps/x86_64/multiarch/Makefile: Add avx memcpy/mempcpy/memmove
>> + * sysdeps/x86_64/multiarch/ifunc-impl-list.c: Add support for related
>> + files with avx memcpy
>> + * sysdeps/x86_64/multiarch/memcpy.S: Add support for avx memcpy
>> + * sysdeps/x86_64/multiarch/memcpy_chk.S: Add support for avx memcpy_chk
>> + * sysdeps/x86_64/multiarch/memmove.c: Add support for avx memmove
>> + * sysdeps/x86_64/multiarch/memmove_chk.c: Add support for avx memmove_chk
>> + * sysdeps/x86_64/multiarch/mempcpy.S: Add support for avx mempcpy
>> + * sysdeps/x86_64/multiarch/mempcpy_chk.S: Add support for avx mempcpy_chk
>> + * sysdeps/x86_64/multiarch/memcpy-avx-unaligned.S: New file for avx
>> + memcpy
>> + * sysdeps/x86_64/multiarch/mempcpy-avx-unaligned.S: New file for avx
>> + mempcpy
>> + * sysdeps/x86_64/multiarch/memmove-avx-unaligned.S: New file for avx
>> + memmove
>> +
I didn't see the original patch in the libc-alpha mailing list archive at
https://sourceware.org/ml/libc-alpha/
Can you resubmit it and make sure that it shows up in the libc-alpha
mailing list archive?
Thanks.
--
H.J.