This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]

Re: [PATCH x86_64] Update memcpy, mempcpy and memmove selection order for Excavator CPU BZ #19583


On Fri, Mar 18, 2016 at 6:51 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
> On Fri, Mar 18, 2016 at 6:22 AM, Pawar, Amit <Amit.Pawar@amd.com> wrote:
>>>No, it isn't fixed.  Avoid_AVX_Fast_Unaligned_Load should disable __memcpy_avx_unaligned and nothing more.  Also you need to fix ALL selections.
>>
>> diff --git a/sysdeps/x86_64/multiarch/memcpy.S b/sysdeps/x86_64/multiarch/memcpy.S
>> index 8882590..a5afaf4 100644
>> --- a/sysdeps/x86_64/multiarch/memcpy.S
>> +++ b/sysdeps/x86_64/multiarch/memcpy.S
>> @@ -39,6 +39,8 @@ ENTRY(__new_memcpy)
>>         ret
>>  #endif
>>  1:     lea     __memcpy_avx_unaligned(%rip), %RAX_LP
>> +       HAS_ARCH_FEATURE (Avoid_AVX_Fast_Unaligned_Load)
>> +       jnz     3f
>>         HAS_ARCH_FEATURE (AVX_Fast_Unaligned_Load)
>>         jnz     2f
>>         lea     __memcpy_sse2_unaligned(%rip), %RAX_LP
>> @@ -52,6 +54,8 @@ ENTRY(__new_memcpy)
>>         jnz     2f
>>         lea     __memcpy_ssse3(%rip), %RAX_LP
>>  2:     ret
>> +3:     lea     __memcpy_ssse3(%rip), %RAX_LP
>> +       ret
>>  END(__new_memcpy)
>>
>>  # undef ENTRY
>>
>> Will update all IFUNCs if this is OK; else please suggest.
>>
>
> Better, but not OK.  Try something like
>
> diff --git a/sysdeps/x86_64/multiarch/memcpy.S
> b/sysdeps/x86_64/multiarch/memcpy.S
> index ab5998c..2abe2fd 100644
> --- a/sysdeps/x86_64/multiarch/memcpy.S
> +++ b/sysdeps/x86_64/multiarch/memcpy.S
> @@ -42,9 +42,11 @@ ENTRY(__new_memcpy)
>    ret
>  #endif
>  1:   lea   __memcpy_avx_unaligned(%rip), %RAX_LP
> +  HAS_ARCH_FEATURE (Avoid_AVX_Fast_Unaligned_Load)
> +  jnz   3f
>    HAS_ARCH_FEATURE (AVX_Fast_Unaligned_Load)
>    jnz   2f
> -  lea   __memcpy_sse2_unaligned(%rip), %RAX_LP
> +3:   lea   __memcpy_sse2_unaligned(%rip), %RAX_LP
>    HAS_ARCH_FEATURE (Fast_Unaligned_Load)
>    jnz   2f
>    lea   __memcpy_sse2(%rip), %RAX_LP
>

One question: if you don't want __memcpy_avx_unaligned,
why do you set AVX_Fast_Unaligned_Load?

-- 
H.J.

