This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: [PATCH RFC] Improve 64bit memcpy performance for Haswell CPU with AVX instruction


Any comments?

Thanks
Ling

2014-06-30 22:45 GMT+08:00, Ling Ma <ling.ma.program@gmail.com>:
> We refined the code and removed the backward copy for memcpy on Haswell,
> and simplified the code for memmove.
> We then re-tested the code; the comparison report (against the pending
> sse2 memcpy) and the latest code are in the gzipped attachment. The report
> shows that the avx memcpy improves performance by 12% to 67% when the copy
> size is over 256 bytes; when the copy size is below 256 bytes the result
> is almost the same.
>
> Thanks
> Ling
>
> 2014-06-26 0:34 GMT+08:00, Ondřej Bílka <neleai@seznam.cz>:
>> On Wed, Jun 25, 2014 at 08:16:58AM -0700, H.J. Lu wrote:
>>> On Wed, Jun 25, 2014 at 7:45 AM, Ling Ma <ling.ma.program@gmail.com>
>>> wrote:
>>> > By modifying the test suite, we re-tested 403.gcc in two parts: copies
>>> > below 256 bytes and copies over 256 bytes. The results in the gzipped
>>> > attachment show (compared with the pending sse2 memcpy):
>>> > 1. When the copy size is below 256 bytes, the avx memcpy gets almost
>>> > the same performance because its instructions also use 16-byte registers.
>>> >
>>> > 2. When the copy size is over 256 bytes, the avx memcpy improves
>>> > performance by 4.9% to 33% because its instructions use 32-byte registers.
>>> >
>>> > So the avx memcpy avoids a regression for small sizes and improves
>>> > performance for big sizes.
>>> >
>>> > Thanks
>>> > Ling
>>> >
>>>
>>> I'd like to get it in.  Any more feedback?
>>>
>> Now I have only a generic comment: the formatting needs to be fixed, as with memset.
>>
>> Also, what is the point of this code? The forward/backward decision has
>> already been made.
>>
>> +#ifdef USE_AS_MEMMOVE
>> +       mov     %rsi, %r10
>> +       sub     %rdi, %r10
>> +       cmp     %rdx, %r10
>> +       jae     L(memmove_use_memcpy_fwd)
>> +       cmp     %rcx, %r10
>> +       jae     L(memmove_use_memcpy_fwd)
>> +       jmp L(gobble_mem_fwd_llc_start)
>> +L(memmove_use_memcpy_fwd):
>> +#endif
>> +       cmp     %rcx, %rdx
>> +       jae     L(gobble_big_data_fwd)
>> +#ifdef USE_AS_MEMMOVE
>> +L(gobble_mem_fwd_llc_start):
>> +#endif
>>
>> I will comment on the performance tests later.
>>
>
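
For readers following the quoted USE_AS_MEMMOVE hunk, its branch structure can be restated in C. This is only a sketch of the control flow visible in the excerpt, not the patch's actual code: the function and enum names are hypothetical, and treating %rcx as a size threshold is an assumption, since the excerpt does not show where %rcx is loaded. %rdi, %rsi and %rdx are taken as dst, src and len per the x86-64 calling convention.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names for the two targets the quoted hunk can reach. */
enum copy_path {
    PATH_BIG_DATA_FWD,   /* stands in for L(gobble_big_data_fwd) */
    PATH_LLC_FWD         /* stands in for L(gobble_mem_fwd_llc_start) */
};

/*
 * Mirror of the quoted branches.  `threshold` models the value held in
 * %rcx, which is an assumption: the excerpt does not define it.
 */
static enum copy_path
choose_fwd_path(const void *dst, const void *src, size_t len,
                size_t threshold, int is_memmove)
{
    if (is_memmove) {
        /* mov %rsi,%r10 ; sub %rdi,%r10 : unsigned src - dst.
           The value wraps to a very large number when src < dst. */
        uintptr_t dist = (uintptr_t)src - (uintptr_t)dst;

        /* Both jae tests fail only when dist < len and dist < threshold;
           in that case the hunk jumps straight to the LLC path and
           skips the size comparison below. */
        if (dist < len && dist < threshold)
            return PATH_LLC_FWD;
    }

    /* L(memmove_use_memcpy_fwd): cmp %rcx,%rdx ; jae big-data path. */
    return len >= threshold ? PATH_BIG_DATA_FWD : PATH_LLC_FWD;
}
```

Written this way, the extra memmove test only changes the outcome when src lies a short distance above dst (closer than both len and the threshold): such copies never take the big-data path, whatever their size.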

