This is the mail archive of the glibc-bugs@sourceware.org mailing list for the glibc project.



[Bug string/19776] Improve sysdeps/x86_64/multiarch/memcpy-sse2-unaligned.S


https://sourceware.org/bugzilla/show_bug.cgi?id=19776

--- Comment #22 from cvs-commit at gcc dot gnu.org <cvs-commit at gcc dot gnu.org> ---
This is an automated email from the git hooks/post-receive script. It was
generated because a ref change was pushed to the repository containing
the project "GNU C Library master sources".

The branch, hjl/erms/master has been created
        at  9a4aba90edef8b8635712299e78a525380420dff (commit)

- Log -----------------------------------------------------------------
https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=9a4aba90edef8b8635712299e78a525380420dff

commit 9a4aba90edef8b8635712299e78a525380420dff
Author: H.J. Lu <hjl.tools@gmail.com>
Date:   Sun Mar 13 00:26:57 2016 -0800

    Add memmove/memset-avx512-unaligned-erms-no-vzeroupper.S

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=c2240bc05c7efea04397719be2c1ccd8f8e8b745

commit c2240bc05c7efea04397719be2c1ccd8f8e8b745
Author: H.J. Lu <hjl.tools@gmail.com>
Date:   Fri Mar 25 08:20:17 2016 -0700

    Add x86-64 memset with unaligned vector stores and rep stosb

    Implement x86-64 memset with unaligned vector stores and rep stosb.  Key
    features (see the sketch after this list):

    1. Use overlapping register stores to avoid branching on the exact size.
    2. For sizes up to 4 times the vector register size, fully unroll the loop.
    3. For sizes above 4 times the vector register size, store 4 vector
    registers' worth of data per iteration.
    4. For sizes above REP_STOSB_THRESHOLD, use rep stosb.
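
    A minimal C sketch of this dispatch, assuming a hypothetical 32-byte
    VEC_SIZE and a hypothetical REP_STOSB_THRESHOLD value (the real
    implementation is the hand-written assembly in
    memset-vec-unaligned-erms.S and derives both from the CPU); plain
    memset calls stand in for unaligned vector stores and for rep stosb:

    #include <stddef.h>
    #include <string.h>

    /* Hypothetical constants for illustration; the real code picks VEC_SIZE
       from the selected ISA (16/32/64 bytes) and the threshold from the
       CPU's cache sizes.  */
    #define VEC_SIZE            32
    #define REP_STOSB_THRESHOLD 2048

    /* memset () on a chunk stands in for one unaligned vector store;
       memset () on the whole buffer stands in for "rep stosb".  */
    static void *
    memset_sketch (void *dst, int c, size_t n)
    {
      unsigned char *p = dst;

      if (n <= 4 * VEC_SIZE)
        {
          /* 1 + 2: fully unrolled; stores are anchored at both ends of the
             buffer and may overlap in the middle, so no branch per exact
             size is needed.  */
          if (n >= VEC_SIZE)
            {
              memset (p, c, VEC_SIZE);                     /* [0, VEC)        */
              memset (p + n - VEC_SIZE, c, VEC_SIZE);      /* [n-VEC, n)      */
              if (n > 2 * VEC_SIZE)
                {
                  memset (p + VEC_SIZE, c, VEC_SIZE);          /* [VEC, 2VEC)     */
                  memset (p + n - 2 * VEC_SIZE, c, VEC_SIZE);  /* [n-2VEC, n-VEC) */
                }
            }
          else
            memset (p, c, n);   /* sub-vector sizes; the real code uses
                                   smaller overlapping stores here too */
          return dst;
        }

      if (n > REP_STOSB_THRESHOLD)
        {
          memset (p, c, n);     /* 4: large buffers go to "rep stosb" */
          return dst;
        }

      /* 3: medium buffers: 4 vectors' worth per iteration...  */
      while (n > 4 * VEC_SIZE)
        {
          memset (p, c, 4 * VEC_SIZE);
          p += 4 * VEC_SIZE;
          n -= 4 * VEC_SIZE;
        }
      memset (p, c, n);         /* ...then the remaining tail */
      return dst;
    }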

        * sysdeps/x86_64/multiarch/Makefile (sysdep_routines): Add
        memset-sse2-unaligned-erms, memset-avx2-unaligned-erms and
        memset-avx512-unaligned-erms.
        * sysdeps/x86_64/multiarch/ifunc-impl-list.c
        (__libc_ifunc_impl_list): Test __memset_chk_sse2_unaligned,
        __memset_chk_sse2_unaligned_erms, __memset_chk_avx2_unaligned,
        __memset_chk_avx2_unaligned_erms, __memset_chk_avx512_unaligned,
        __memset_chk_avx512_unaligned_erms, __memset_sse2_unaligned,
        __memset_sse2_unaligned_erms, __memset_erms,
        __memset_avx2_unaligned, __memset_avx2_unaligned_erms,
        __memset_avx512_unaligned_erms and __memset_avx512_unaligned.
        * sysdeps/x86_64/multiarch/memset-avx2-unaligned-erms.S: New
        file.
        * sysdeps/x86_64/multiarch/memset-avx512-unaligned-erms.S:
        Likewise.
        * sysdeps/x86_64/multiarch/memset-sse2-unaligned-erms.S:
        Likewise.
        * sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S:
        Likewise.

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=ae9731e6d4c3b97bdd152fb57110ddcecdb210f8

commit ae9731e6d4c3b97bdd152fb57110ddcecdb210f8
Author: H.J. Lu <hjl.tools@gmail.com>
Date:   Fri Mar 18 12:36:03 2016 -0700

    Add x86-64 memmove with unaligned vector loads and rep movsb

    Implement x86-64 memmove with unaligned vector loads and rep movsb.  Key
    features (see the sketch below):

    1. Use overlapping register loads/stores to avoid branching on the exact
    size.
    2. For sizes up to 8 times the vector register size, load the whole
    source into registers and store them together.
    3. If source and destination do not overlap, copy from both ends,
    4 vector registers' worth at a time.
    4. If the destination address is above the source address, copy backward,
    8 vector registers' worth at a time.
    5. Otherwise, copy forward, 8 vector registers' worth at a time.
    6. If the size exceeds REP_MOVSB_THRESHOLD, use rep movsb for the forward
    copy.  Avoid the slow backward rep movsb by falling back to a backward
    copy of 8 vector registers' worth at a time.

    Also provide an alias for memcpy.
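
    A minimal C sketch of these decisions, with hypothetical VEC_SIZE and
    REP_MOVSB_THRESHOLD values; a vector load/store pair is modelled as a
    memcpy through a small temporary, and a whole-buffer memcpy stands in
    for rep movsb (the real implementation is the assembly in
    memmove-vec-unaligned-erms.S):

    #include <stddef.h>
    #include <string.h>

    /* Hypothetical constants for illustration.  */
    #define VEC_SIZE            32
    #define REP_MOVSB_THRESHOLD 2048

    /* One unaligned vector load followed by one unaligned vector store.  */
    static void
    copy_vec (unsigned char *d, const unsigned char *s)
    {
      unsigned char v[VEC_SIZE];
      memcpy (v, s, VEC_SIZE);
      memcpy (d, v, VEC_SIZE);
    }

    static void *
    memmove_sketch (void *dst, const void *src, size_t n)
    {
      unsigned char *d = dst;
      const unsigned char *s = src;

      if (n <= 8 * VEC_SIZE)
        {
          /* 2: load the whole source into registers first, then store it,
             so small overlapping copies need no direction choice.  */
          unsigned char tmp[8 * VEC_SIZE];
          memcpy (tmp, s, n);
          memcpy (d, tmp, n);
        }
      else if (d > s && d < s + n)
        {
          /* 4: destination overlaps the source from above: copy backward.
             6: backward rep movsb is slow, so large backward copies also
             take this path (8 vectors per step in the real code).  */
          size_t i = n;
          while (i >= VEC_SIZE)
            {
              i -= VEC_SIZE;
              copy_vec (d + i, s + i);
            }
          while (i-- > 0)
            d[i] = s[i];        /* sub-vector head */
        }
      else if (n > REP_MOVSB_THRESHOLD && (s + n <= d || d + n <= s))
        {
          /* 6: large, non-overlapping copy: hand it to rep movsb.  */
          memcpy (d, s, n);
        }
      else
        {
          /* 3/5: forward copy; the real code works from both ends, 4 or 8
             vectors per step, one vector per step here for brevity.  */
          size_t i = 0;
          for (; i + VEC_SIZE <= n; i += VEC_SIZE)
            copy_vec (d + i, s + i);
          for (; i < n; i++)
            d[i] = s[i];        /* sub-vector tail */
        }
      return dst;
    }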

        [BZ #19776]
        * sysdeps/x86_64/multiarch/Makefile (sysdep_routines): Add
        memmove-sse2-unaligned-erms, memmove-avx-unaligned-erms and
        memmove-avx512-unaligned-erms.
        * sysdeps/x86_64/multiarch/ifunc-impl-list.c
        (__libc_ifunc_impl_list): Test
        __memmove_chk_avx512_unaligned_2,
        __memmove_chk_avx512_unaligned_erms,
        __memmove_chk_avx_unaligned_2, __memmove_chk_avx_unaligned_erms,
        __memmove_chk_sse2_unaligned_2,
        __memmove_chk_sse2_unaligned_erms, __memmove_avx_unaligned_2,
        __memmove_avx_unaligned_erms, __memmove_avx512_unaligned_2,
        __memmove_avx512_unaligned_erms, __memmove_erms,
        __memmove_sse2_unaligned_2, __memmove_sse2_unaligned_erms,
        __memcpy_chk_avx512_unaligned_2,
        __memcpy_chk_avx512_unaligned_erms,
        __memcpy_chk_avx_unaligned_2, __memcpy_chk_avx_unaligned_erms,
        __memcpy_chk_sse2_unaligned_2, __memcpy_chk_sse2_unaligned_erms,
        __memcpy_avx_unaligned_2, __memcpy_avx_unaligned_erms,
        __memcpy_avx512_unaligned_2, __memcpy_avx512_unaligned_erms,
        __memcpy_sse2_unaligned_2, __memcpy_sse2_unaligned_erms,
        __memcpy_erms, __mempcpy_chk_avx512_unaligned_2,
        __mempcpy_chk_avx512_unaligned_erms,
        __mempcpy_chk_avx_unaligned_2, __mempcpy_chk_avx_unaligned_erms,
        __mempcpy_chk_sse2_unaligned_2, __mempcpy_chk_sse2_unaligned_erms,
        __mempcpy_avx512_unaligned_2, __mempcpy_avx512_unaligned_erms,
        __mempcpy_avx_unaligned_2, __mempcpy_avx_unaligned_erms,
        __mempcpy_sse2_unaligned_2, __mempcpy_sse2_unaligned_erms and
        __mempcpy_erms.
        * sysdeps/x86_64/multiarch/memmove-avx-unaligned-erms.S: New
        file.
        * sysdeps/x86_64/multiarch/memmove-avx512-unaligned-erms.S:
        Likewise.
        * sysdeps/x86_64/multiarch/memmove-sse2-unaligned-erms.S:
        Likewise.
        * sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:
        Likewise.

    Fix sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=30b534f494028b891e15b74e7fd232429d685295

commit 30b534f494028b891e15b74e7fd232429d685295
Author: H.J. Lu <hjl.tools@gmail.com>
Date:   Thu Sep 15 15:47:01 2011 -0700

    Initial Enhanced REP MOVSB/STOSB (ERMS) support

    Newer Intel processors support Enhanced REP MOVSB/STOSB (ERMS), which is
    reported by a feature bit in CPUID.  This patch adds the Enhanced REP
    MOVSB/STOSB (ERMS) bit to the x86 cpu-features framework.
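
    For reference, ERMS is reported in CPUID leaf 7 (sub-leaf 0), EBX bit 9.
    A standalone check, using the compiler-provided <cpuid.h> helpers
    (assuming a GCC/Clang recent enough to have __get_cpuid_count) rather
    than glibc's internal cpu-features machinery, might look like:

    #include <stdio.h>
    #include <cpuid.h>   /* compiler header, not part of glibc */

    int
    main (void)
    {
      unsigned int eax, ebx, ecx, edx;

      /* CPUID leaf 7, sub-leaf 0; returns 0 if the leaf is unsupported.  */
      if (!__get_cpuid_count (7, 0, &eax, &ebx, &ecx, &edx))
        {
          puts ("CPUID leaf 7 not supported");
          return 1;
        }

      /* ERMS is EBX bit 9 of that leaf.  */
      puts ((ebx & (1u << 9)) ? "ERMS supported" : "ERMS not supported");
      return 0;
    }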

        * sysdeps/x86/cpu-features.h (bit_cpu_ERMS): New.
        (index_cpu_ERMS): Likewise.
        (reg_ERMS): Likewise.

-----------------------------------------------------------------------

-- 
You are receiving this mail because:
You are on the CC list for the bug.
