This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: [PATCH] x86-64: Add wmemset optimized with SSE2/AVX2/AVX512
- From: "H.J. Lu" <hjl dot tools at gmail dot com>
- To: "Rodriguez Bahena, Victor" <victor dot rodriguez dot bahena at intel dot com>
- Cc: GNU C Library <libc-alpha at sourceware dot org>
- Date: Mon, 5 Jun 2017 10:36:28 -0700
- Subject: Re: [PATCH] x86-64: Add wmemset optimized with SSE2/AVX2/AVX512
- Authentication-results: sourceware.org; auth=none
- References: <20170521203442.GA20131@gmail.com> <CAMe9rOrTUvjYcCYnSKS-7Y8A63-MjFOwQHMATSkLoE8ZCt=thw@mail.gmail.com> <D5541339.151DA%victor.rodriguez.bahena@intel.com> <CAMe9rOojxpE0cCakd8TvZuPeBkQ-uMHrsD1TYbvFFa5zhKBEgw@mail.gmail.com>
On Fri, Jun 2, 2017 at 12:45 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
> On Wed, May 31, 2017 at 4:29 AM, Rodriguez Bahena, Victor
> <victor.rodriguez.bahena@intel.com> wrote:
>> +1
>>
>> -----Original Message-----
>> From: <libc-alpha-owner@sourceware.org> on behalf of "H.J. Lu"
>> <hjl.tools@gmail.com>
>> Date: Tuesday, May 30, 2017 at 6:41 PM
>> To: GNU C Library <libc-alpha@sourceware.org>
>> Subject: Re: [PATCH] x86-64: Add wmemset optimized with SSE2/AVX2/AVX512
>>
>>>On Sun, May 21, 2017 at 1:34 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>>>> The difference between memset and wmemset is byte vs int. Add stubs
>>>> to SSE2/AVX2/AVX512 memset for wmemset with updated constant and size:
>>>>
>>>> SSE2 wmemset:
>>>> shl $0x2,%rdx
>>>> movd %esi,%xmm0
>>>> mov %rdi,%rax
>>>> pshufd $0x0,%xmm0,%xmm0
>>>> jmp entry_from_wmemset
>>>>
>>>> SSE2 memset:
>>>> movd %esi,%xmm0
>>>> mov %rdi,%rax
>>>> punpcklbw %xmm0,%xmm0
>>>> punpcklwd %xmm0,%xmm0
>>>> pshufd $0x0,%xmm0,%xmm0
>>>> entry_from_wmemset:
>>>>
>>>> Since the ERMS versions of wmemset require "rep stosl" instead of
>>>> "rep stosb", only the vector store stubs of SSE2/AVX2/AVX512 wmemset
>>>> are added. The SSE2 wmemset is about 3X faster and the AVX2 wmemset
>>>> is about 6X faster on Haswell.
>>>>
>>>> OK for master?
>>>
>>>Any objections?
>>>
>>>> H.J.
>>>> ---
>>>> * include/wchar.h (__wmemset_chk): New.
>>>> * sysdeps/x86_64/memset.S (VDUP_TO_VEC0_AND_SET_RETURN): Renamed
>>>> to MEMSET_VDUP_TO_VEC0_AND_SET_RETURN.
>>>> (WMEMSET_VDUP_TO_VEC0_AND_SET_RETURN): New.
>>>> (WMEMSET_CHK_SYMBOL): Likewise.
>>>> (WMEMSET_SYMBOL): Likewise.
>>>> (__wmemset): Add hidden definition.
>>>> (wmemset): Add weak hidden definition.
>>>> * sysdeps/x86_64/multiarch/ifunc-impl-list.c
>>>> (__libc_ifunc_impl_list): Add __wmemset_sse2_unaligned,
>>>> __wmemset_avx2_unaligned, __wmemset_avx512_unaligned,
>>>> __wmemset_chk_sse2_unaligned, __wmemset_chk_avx2_unaligned
>>>> and __wmemset_chk_avx512_unaligned.
>>>> * sysdeps/x86_64/multiarch/memset-avx2-unaligned-erms.S
>>>> (VDUP_TO_VEC0_AND_SET_RETURN): Renamed to ...
>>>> (MEMSET_VDUP_TO_VEC0_AND_SET_RETURN): This.
>>>> (WMEMSET_VDUP_TO_VEC0_AND_SET_RETURN): New.
>>>> (WMEMSET_SYMBOL): Likewise.
>>>> * sysdeps/x86_64/multiarch/memset-avx512-unaligned-erms.S
>>>> (VDUP_TO_VEC0_AND_SET_RETURN): Renamed to ...
>>>> (MEMSET_VDUP_TO_VEC0_AND_SET_RETURN): This.
>>>> (WMEMSET_VDUP_TO_VEC0_AND_SET_RETURN): New.
>>>> (WMEMSET_SYMBOL): Likewise.
>>>> * sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S: Updated.
>>>> (WMEMSET_CHK_SYMBOL): New.
>>>> (WMEMSET_CHK_SYMBOL (__wmemset_chk, unaligned)): Likewise.
>>>> (WMEMSET_SYMBOL (__wmemset, unaligned)): Likewise.
>>>> * sysdeps/x86_64/multiarch/memset.S (WMEMSET_SYMBOL): New.
>>>> (libc_hidden_builtin_def): Also define __GI_wmemset and
>>>> __GI___wmemset.
>>>> (weak_alias): New.
>>>> * sysdeps/x86_64/multiarch/wmemset.S: New file.
>>>> * sysdeps/x86_64/multiarch/wmemset_chk.S: Likewise.
>>>> * sysdeps/x86_64/wmemset.S: Likewise.
>>>> * sysdeps/x86_64/wmemset_chk.S: Likewise.
>
> Here is the updated patch to implement IFUNC wmemset in C.
>
>
I will check it in today.
--
H.J.