This is the mail archive of the glibc-bugs@sourceware.org mailing list for the glibc project.



[Bug string/19776] Improve sysdeps/x86_64/multiarch/memcpy-sse2-unaligned.S


https://sourceware.org/bugzilla/show_bug.cgi?id=19776

--- Comment #23 from cvs-commit at gcc dot gnu.org <cvs-commit at gcc dot gnu.org> ---
This is an automated email from the git hooks/post-receive script. It was
generated because a ref change was pushed to the repository containing
the project "GNU C Library master sources".

The branch, hjl/erms/master has been created
        at  ae2460f45588f301579553cd108e29477488a4b9 (commit)

- Log -----------------------------------------------------------------
https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=ae2460f45588f301579553cd108e29477488a4b9

commit ae2460f45588f301579553cd108e29477488a4b9
Author: H.J. Lu <hjl.tools@gmail.com>
Date:   Sun Mar 13 00:26:57 2016 -0800

    Add memmove/memset-avx512-unaligned-erms-no-vzeroupper.S

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=6e068a5e5a7db76310931b5dc0244bd572cb7fe7

commit 6e068a5e5a7db76310931b5dc0244bd572cb7fe7
Author: H.J. Lu <hjl.tools@gmail.com>
Date:   Fri Mar 25 08:20:17 2016 -0700

    Add x86-64 memset with vector unaligned stores and rep stosb

    Implement x86-64 memset with vector unaligned stores and rep stosb.  Key
    features:

    1. Use overlapping stores to avoid branches.
    2. For size <= 4 times the vector register size, fully unroll the loop.
    3. For size > 4 times the vector register size, store 4 times the vector
    register size at a time.
    4. If size > REP_STOSB_THRESHOLD, use rep stosb.
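The four features above can be sketched in portable C.  This is a minimal illustration under stated assumptions, not the glibc assembly: VEC_SIZE (16, modelling one SSE2 register), SKETCH_REP_STOSB_THRESHOLD (2048, an illustrative value rather than glibc's tuning), store_vec and sketch_memset are all hypothetical names.  A fixed-size memcpy stands in for one unaligned vector store, and sizes below one vector use plain byte stores.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical parameters: VEC_SIZE models a 16-byte SSE2 register and
   SKETCH_REP_STOSB_THRESHOLD is an illustrative cutoff, not glibc's. */
#define VEC_SIZE 16
#define SKETCH_REP_STOSB_THRESHOLD 2048

/* Model one unaligned vector store: copy the broadcast VEC to P. */
static inline void store_vec(unsigned char *p, const unsigned char *vec)
{
  memcpy(p, vec, VEC_SIZE);
}

/* rep stosb on x86-64 with GCC/Clang; a portable byte loop elsewhere. */
static void rep_stosb(unsigned char *d, int c, size_t n)
{
#if defined(__x86_64__) && defined(__GNUC__)
  __asm__ volatile ("rep stosb" : "+D"(d), "+c"(n) : "a"(c) : "memory");
#else
  while (n--)
    *d++ = (unsigned char) c;
#endif
}

void *sketch_memset(void *dst, int c, size_t n)
{
  unsigned char *d = dst;
  unsigned char vec[VEC_SIZE];

  for (size_t i = 0; i < VEC_SIZE; i++)   /* broadcast the fill byte */
    vec[i] = (unsigned char) c;

  if (n < VEC_SIZE) {                     /* below one vector: byte stores */
    for (size_t i = 0; i < n; i++)
      d[i] = (unsigned char) c;
    return dst;
  }
  if (n <= 2 * VEC_SIZE) {                /* 1. two overlapping stores */
    store_vec(d, vec);
    store_vec(d + n - VEC_SIZE, vec);
    return dst;
  }
  if (n <= 4 * VEC_SIZE) {                /* 2. fully unrolled */
    store_vec(d, vec);
    store_vec(d + VEC_SIZE, vec);
    store_vec(d + n - 2 * VEC_SIZE, vec);
    store_vec(d + n - VEC_SIZE, vec);
    return dst;
  }
  if (n > SKETCH_REP_STOSB_THRESHOLD) {   /* 4. large sizes go to rep stosb */
    rep_stosb(d, c, n);
    return dst;
  }
  /* 3. Four vectors per iteration; the final block of four is anchored at
     the end of the buffer and may overlap what the loop already wrote. */
  unsigned char *last = d + n - 4 * VEC_SIZE;
  for (unsigned char *p = d; p < last; p += 4 * VEC_SIZE) {
    store_vec(p, vec);
    store_vec(p + VEC_SIZE, vec);
    store_vec(p + 2 * VEC_SIZE, vec);
    store_vec(p + 3 * VEC_SIZE, vec);
  }
  store_vec(last, vec);
  store_vec(last + VEC_SIZE, vec);
  store_vec(last + 2 * VEC_SIZE, vec);
  store_vec(last + 3 * VEC_SIZE, vec);
  return dst;
}
```

Anchoring the last stores at d + n means no size-dependent remainder branch is needed: rewriting a few bytes twice is cheaper than branching on the exact tail length.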

    A single file provides 2 implementations of memset, one with rep stosb
    and one without.  They share the same code when size is between 2 times
    the vector register size and REP_STOSB_THRESHOLD.
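One way to picture "one file, two implementations" in C is a single shared body with a flag that is constant at each call site, so the compiler specializes one copy per entry point.  This is a hypothetical sketch of the structure only: memset_body, sketch_memset_unaligned, sketch_memset_unaligned_erms, VEC_SIZE and SKETCH_REP_STOSB_THRESHOLD are illustrative names, and the rep stosb path is stood in for by a byte loop to keep the example portable.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define VEC_SIZE 16                       /* hypothetical: one SSE2 register */
#define SKETCH_REP_STOSB_THRESHOLD 2048   /* hypothetical tuning value */

/* Model one unaligned vector store of the fill byte B. */
static inline void store_vec(unsigned char *p, unsigned char b)
{
  unsigned char vec[VEC_SIZE];
  for (size_t i = 0; i < VEC_SIZE; i++)
    vec[i] = b;
  memcpy(p, vec, VEC_SIZE);
}

/* Body shared by both entry points.  USE_REP is a constant at each call
   site, so each entry point gets its own specialized copy. */
static inline void *memset_body(void *dst, int c, size_t n, bool use_rep)
{
  unsigned char *d = dst;
  unsigned char b = (unsigned char) c;

  if (n < VEC_SIZE) {
    for (size_t i = 0; i < n; i++)
      d[i] = b;
    return dst;
  }
  if (use_rep && n > SKETCH_REP_STOSB_THRESHOLD) {
    /* Stand-in for rep stosb: a byte loop keeps the sketch portable. */
    for (size_t i = 0; i < n; i++)
      d[i] = b;
    return dst;
  }
  /* Shared vector path: whole vectors, then one overlapping tail store. */
  for (size_t off = 0; off + VEC_SIZE < n; off += VEC_SIZE)
    store_vec(d + off, b);
  store_vec(d + n - VEC_SIZE, b);
  return dst;
}

void *sketch_memset_unaligned(void *dst, int c, size_t n)
{
  return memset_body(dst, c, n, false);
}

void *sketch_memset_unaligned_erms(void *dst, int c, size_t n)
{
  return memset_body(dst, c, n, true);
}
```

The two entry points differ only in whether the threshold check can divert to the rep path; everything else is the same code.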

        * sysdeps/x86_64/multiarch/Makefile (sysdep_routines): Add
        memset-sse2-unaligned-erms, memset-avx2-unaligned-erms and
        memset-avx512-unaligned-erms.
        * sysdeps/x86_64/multiarch/ifunc-impl-list.c
        (__libc_ifunc_impl_list): Test __memset_chk_sse2_unaligned,
        __memset_chk_sse2_unaligned_erms, __memset_chk_avx2_unaligned,
        __memset_chk_avx2_unaligned_erms, __memset_chk_avx512_unaligned,
        __memset_chk_avx512_unaligned_erms, __memset_sse2_unaligned,
        __memset_sse2_unaligned_erms, __memset_erms,
        __memset_avx2_unaligned, __memset_avx2_unaligned_erms,
        __memset_avx512_unaligned_erms and __memset_avx512_unaligned.
        * sysdeps/x86_64/multiarch/memset-avx2-unaligned-erms.S: New
        file.
        * sysdeps/x86_64/multiarch/memset-avx512-unaligned-erms.S:
        Likewise.
        * sysdeps/x86_64/multiarch/memset-sse2-unaligned-erms.S:
        Likewise.
        * sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S:
        Likewise.

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=395b59341eac10502cffe2e6de218af054b8415b

commit 395b59341eac10502cffe2e6de218af054b8415b
Author: H.J. Lu <hjl.tools@gmail.com>
Date:   Fri Mar 18 12:36:03 2016 -0700

    Add x86-64 memmove with vector unaligned loads and rep movsb

    Implement x86-64 memmove with vector unaligned loads and rep movsb.  Key
    features:

    1. Use overlapping loads and stores to avoid branches.
    2. For size <= 8 times the vector register size, load all sources into
    registers and store them together.
    3. If there is no address overlap between source and destination, copy
    from both ends, 4 times the vector register size at a time.
    4. If the destination address > the source address, copy backward 8
    times the vector register size at a time.
    5. Otherwise, copy forward 8 times the vector register size at a time.
    6. If size > REP_MOVSB_THRESHOLD, use rep movsb for forward copy.  Avoid
    slow backward rep movsb by falling back to backward copy 8 times the
    vector register size at a time.

    When size <= 8 times the vector register size, there is no check for
    address overlap between source and destination, because all sources are
    loaded into registers before any store.  Since the overhead of the
    overlap check is small when size > 8 times the vector register size,
    memcpy is an alias of memmove.
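Why small sizes need no overlap check can be shown with a minimal C sketch, assuming a 16-byte vector and covering sizes from 1 to 4 vectors (the commit's limit is 8; the same pattern extends with more registers).  Every load completes before the first store, so the result is correct whether or not the buffers overlap, and no copy direction has to be chosen.  VEC_SIZE, vec_t, load_vec, store_vec and sketch_move_vec_to_4x are hypothetical names.

```c
#include <stddef.h>
#include <string.h>

#define VEC_SIZE 16   /* hypothetical: one SSE2 register */

typedef struct { unsigned char b[VEC_SIZE]; } vec_t;  /* models a register */

static inline vec_t load_vec(const unsigned char *p)
{ vec_t v; memcpy(&v, p, VEC_SIZE); return v; }

static inline void store_vec(unsigned char *p, vec_t v)
{ memcpy(p, &v, VEC_SIZE); }

/* Copy n bytes, VEC_SIZE <= n <= 4 * VEC_SIZE, between possibly
   overlapping buffers.  All loads happen before any store. */
void *sketch_move_vec_to_4x(void *dst, const void *src, size_t n)
{
  unsigned char *d = dst;
  const unsigned char *s = src;

  if (n <= 2 * VEC_SIZE) {
    /* Head and tail vectors overlap in the middle for n < 2 * VEC_SIZE. */
    vec_t v0 = load_vec(s);
    vec_t v1 = load_vec(s + n - VEC_SIZE);
    store_vec(d, v0);
    store_vec(d + n - VEC_SIZE, v1);
  } else {
    /* Two vectors from each end cover any n up to 4 * VEC_SIZE. */
    vec_t v0 = load_vec(s);
    vec_t v1 = load_vec(s + VEC_SIZE);
    vec_t v2 = load_vec(s + n - 2 * VEC_SIZE);
    vec_t v3 = load_vec(s + n - VEC_SIZE);
    store_vec(d, v0);
    store_vec(d + VEC_SIZE, v1);
    store_vec(d + n - 2 * VEC_SIZE, v2);
    store_vec(d + n - VEC_SIZE, v3);
  }
  return dst;
}
```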

    A single file provides 2 implementations of memmove, one with rep movsb
    and one without.  They share the same code when size is between 2 times
    the vector register size and REP_MOVSB_THRESHOLD.

        [BZ #19776]
        * sysdeps/x86_64/multiarch/Makefile (sysdep_routines): Add
        memmove-sse2-unaligned-erms, memmove-avx-unaligned-erms and
        memmove-avx512-unaligned-erms.
        * sysdeps/x86_64/multiarch/ifunc-impl-list.c
        (__libc_ifunc_impl_list): Test
        __memmove_chk_avx512_unaligned_2,
        __memmove_chk_avx512_unaligned_erms,
        __memmove_chk_avx_unaligned_2, __memmove_chk_avx_unaligned_erms,
        __memmove_chk_sse2_unaligned_2,
        __memmove_chk_sse2_unaligned_erms, __memmove_avx_unaligned_2,
        __memmove_avx_unaligned_erms, __memmove_avx512_unaligned_2,
        __memmove_avx512_unaligned_erms, __memmove_erms,
        __memmove_sse2_unaligned_2, __memmove_sse2_unaligned_erms,
        __memcpy_chk_avx512_unaligned_2,
        __memcpy_chk_avx512_unaligned_erms,
        __memcpy_chk_avx_unaligned_2, __memcpy_chk_avx_unaligned_erms,
        __memcpy_chk_sse2_unaligned_2, __memcpy_chk_sse2_unaligned_erms,
        __memcpy_avx_unaligned_2, __memcpy_avx_unaligned_erms,
        __memcpy_avx512_unaligned_2, __memcpy_avx512_unaligned_erms,
        __memcpy_sse2_unaligned_2, __memcpy_sse2_unaligned_erms,
        __memcpy_erms, __mempcpy_chk_avx512_unaligned_2,
        __mempcpy_chk_avx512_unaligned_erms,
        __mempcpy_chk_avx_unaligned_2, __mempcpy_chk_avx_unaligned_erms,
        __mempcpy_chk_sse2_unaligned_2, __mempcpy_chk_sse2_unaligned_erms,
        __mempcpy_avx512_unaligned_2, __mempcpy_avx512_unaligned_erms,
        __mempcpy_avx_unaligned_2, __mempcpy_avx_unaligned_erms,
        __mempcpy_sse2_unaligned_2, __mempcpy_sse2_unaligned_erms and
        __mempcpy_erms.
        * sysdeps/x86_64/multiarch/memmove-avx-unaligned-erms.S: New
        file.
        * sysdeps/x86_64/multiarch/memmove-avx512-unaligned-erms.S:
        Likewise.
        * sysdeps/x86_64/multiarch/memmove-sse2-unaligned-erms.S:
        Likewise.
        * sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:
        Likewise.

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=30b534f494028b891e15b74e7fd232429d685295

commit 30b534f494028b891e15b74e7fd232429d685295
Author: H.J. Lu <hjl.tools@gmail.com>
Date:   Thu Sep 15 15:47:01 2011 -0700

    Initial Enhanced REP MOVSB/STOSB (ERMS) support

    Newer Intel processors support Enhanced REP MOVSB/STOSB (ERMS), which
    is reported by a feature bit in CPUID.  This patch adds the ERMS bit to
    x86 cpu-features.

        * sysdeps/x86/cpu-features.h (bit_cpu_ERMS): New.
        (index_cpu_ERMS): Likewise.
        (reg_ERMS): Likewise.
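Outside glibc, the same feature bit can be queried directly.  ERMS is reported in CPUID leaf 7, sub-leaf 0, EBX bit 9.  The sketch below assumes GCC 7+ or a recent Clang, whose <cpuid.h> provides __get_cpuid_count (it returns 0 when the leaf is not supported).  SKETCH_BIT_ERMS and sketch_cpu_has_erms are hypothetical names and are not glibc's bit_cpu_ERMS/index_cpu_ERMS machinery.

```c
#include <stdbool.h>
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#endif

/* ERMS: CPUID.(EAX=07H, ECX=0):EBX bit 9. */
#define SKETCH_BIT_ERMS (1u << 9)

bool sketch_cpu_has_erms(void)
{
#if defined(__x86_64__) || defined(__i386__)
  unsigned int eax, ebx, ecx, edx;
  /* Returns 0 if the CPU's maximum basic leaf is below 7. */
  if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx))
    return false;
  return (ebx & SKETCH_BIT_ERMS) != 0;
#else
  return false;   /* not x86: there is no ERMS to report */
#endif
}
```

A caller could use the result to pick between a plain vector copy and an _erms variant at startup, much as the ifunc-selected implementations listed above do.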

-----------------------------------------------------------------------

