This is the mail archive of the glibc-bugs@sourceware.org mailing list for the glibc project.



[Bug string/19776] Improve sysdeps/x86_64/multiarch/memcpy-sse2-unaligned.S


https://sourceware.org/bugzilla/show_bug.cgi?id=19776

--- Comment #33 from cvs-commit at gcc dot gnu.org <cvs-commit at gcc dot gnu.org> ---
This is an automated email from the git hooks/post-receive script. It was
generated because a ref change was pushed to the repository containing
the project "GNU C Library master sources".

The branch, master has been updated
       via  88b57b8ed41d5ecf2e1bdfc19556f9246a665ebb (commit)
      from  5cdd1989d1d2f135d02e66250f37ba8e767f9772 (commit)

Those revisions listed above that are new to this repository have
not appeared on any other notification email; so we list those
revisions in full, below.

- Log -----------------------------------------------------------------
https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=88b57b8ed41d5ecf2e1bdfc19556f9246a665ebb

commit 88b57b8ed41d5ecf2e1bdfc19556f9246a665ebb
Author: H.J. Lu <hjl.tools@gmail.com>
Date:   Thu Mar 31 10:04:26 2016 -0700

    Add x86-64 memmove with unaligned load/store and rep movsb

    Implement x86-64 memmove with unaligned load/store and rep movsb.
    Support 16-byte, 32-byte and 64-byte vector register sizes.  When
    size is at most 8 times the vector register size, there is no check
    for address overlap between source and destination.  Since the
    overhead of the overlap check is small when size is greater than 8
    times the vector register size, memcpy is an alias of memmove.

    A single file provides 2 implementations of memmove, one with rep movsb
    and the other without rep movsb.  They share the same code when size is
    between 2 times the vector register size and REP_MOVSB_THRESHOLD, which
    is 2KB for the 16-byte vector register size and scales up with larger
    vector register sizes.
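    The scaling described above can be sketched as a simple linear scale
    from the 16-byte baseline.  This is an illustrative C sketch, not the
    actual glibc macro; the function name is hypothetical:

```c
/* Sketch of the threshold scaling described above: 2KB at the 16-byte
   vector size, scaled linearly with vector size.  Assumes vec_size is
   a multiple of 16; rep_movsb_threshold is an illustrative name, not
   a glibc symbol. */
#include <stddef.h>

static size_t rep_movsb_threshold(size_t vec_size)
{
    return 2048 * (vec_size / 16); /* 2KB, 4KB, 8KB for 16B, 32B, 64B */
}
```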

    Key features:

    1. Use overlapping loads and stores to avoid branching.
    2. For sizes at most 8 times the vector register size, load all of
    the source into registers and store it together.
    3. If there is no address overlap between source and destination,
    copy from both ends, 4 times the vector register size at a time.
    4. If the destination address is greater than the source address,
    copy backward, 8 times the vector register size at a time.
    5. Otherwise, copy forward, 8 times the vector register size at a
    time.
    6. Use rep movsb only for forward copies.  Avoid slow backward rep
    movsb by falling back to a backward copy of 8 times the vector
    register size at a time.
    7. Skip the copy when the destination address equals the source
    address.
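    Item 1 above, the overlapping load/store trick, can be illustrated in
    portable C.  This is a hedged sketch of the general technique, not the
    glibc assembly; the function name and the 8-byte "vector" width are
    assumptions for illustration:

```c
/* Sketch of the overlapping load/store technique for 8 <= n <= 16,
   using two 8-byte accesses instead of branching on the exact size.
   Both loads complete before either store, so the copy is safe even
   when src and dst overlap (memmove semantics).  copy_8_to_16 is an
   illustrative name, not a glibc symbol; compilers lower the
   fixed-size memcpy calls to single unaligned loads/stores. */
#include <stdint.h>
#include <string.h>

static void copy_8_to_16(void *dst, const void *src, size_t n)
{
    uint64_t head, tail;
    memcpy(&head, src, 8);                       /* first 8 bytes */
    memcpy(&tail, (const char *)src + n - 8, 8); /* last 8 bytes  */
    memcpy(dst, &head, 8);
    memcpy((char *)dst + n - 8, &tail, 8);
}
```

    When n < 16 the two stores overlap in the middle, but since the
    overlapping region is written with the same data twice, the result is
    correct for every size in the range without a single branch.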

        [BZ #19776]
        * sysdeps/x86_64/multiarch/Makefile (sysdep_routines): Add
        memmove-sse2-unaligned-erms, memmove-avx-unaligned-erms and
        memmove-avx512-unaligned-erms.
        * sysdeps/x86_64/multiarch/ifunc-impl-list.c
        (__libc_ifunc_impl_list): Test
        __memmove_chk_avx512_unaligned_2,
        __memmove_chk_avx512_unaligned_erms,
        __memmove_chk_avx_unaligned_2, __memmove_chk_avx_unaligned_erms,
        __memmove_chk_sse2_unaligned_2,
        __memmove_chk_sse2_unaligned_erms, __memmove_avx_unaligned_2,
        __memmove_avx_unaligned_erms, __memmove_avx512_unaligned_2,
        __memmove_avx512_unaligned_erms, __memmove_erms,
        __memmove_sse2_unaligned_2, __memmove_sse2_unaligned_erms,
        __memcpy_chk_avx512_unaligned_2,
        __memcpy_chk_avx512_unaligned_erms,
        __memcpy_chk_avx_unaligned_2, __memcpy_chk_avx_unaligned_erms,
        __memcpy_chk_sse2_unaligned_2, __memcpy_chk_sse2_unaligned_erms,
        __memcpy_avx_unaligned_2, __memcpy_avx_unaligned_erms,
        __memcpy_avx512_unaligned_2, __memcpy_avx512_unaligned_erms,
        __memcpy_sse2_unaligned_2, __memcpy_sse2_unaligned_erms,
        __memcpy_erms, __mempcpy_chk_avx512_unaligned_2,
        __mempcpy_chk_avx512_unaligned_erms,
        __mempcpy_chk_avx_unaligned_2, __mempcpy_chk_avx_unaligned_erms,
        __mempcpy_chk_sse2_unaligned_2, __mempcpy_chk_sse2_unaligned_erms,
        __mempcpy_avx512_unaligned_2, __mempcpy_avx512_unaligned_erms,
        __mempcpy_avx_unaligned_2, __mempcpy_avx_unaligned_erms,
        __mempcpy_sse2_unaligned_2, __mempcpy_sse2_unaligned_erms and
        __mempcpy_erms.
        * sysdeps/x86_64/multiarch/memmove-avx-unaligned-erms.S: New
        file.
        * sysdeps/x86_64/multiarch/memmove-avx512-unaligned-erms.S:
        Likewise.
        * sysdeps/x86_64/multiarch/memmove-sse2-unaligned-erms.S:
        Likewise.
        * sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:
        Likewise.

-----------------------------------------------------------------------

Summary of changes:
 ChangeLog                                          |   40 ++
 sysdeps/x86_64/multiarch/Makefile                  |    5 +-
 sysdeps/x86_64/multiarch/ifunc-impl-list.c         |   99 +++++
 .../x86_64/multiarch/memmove-avx-unaligned-erms.S  |    9 +
 .../multiarch/memmove-avx512-unaligned-erms.S      |   11 +
 .../x86_64/multiarch/memmove-sse2-unaligned-erms.S |    9 +
 .../x86_64/multiarch/memmove-vec-unaligned-erms.S  |  462 ++++++++++++++++++++
 7 files changed, 634 insertions(+), 1 deletions(-)
 create mode 100644 sysdeps/x86_64/multiarch/memmove-avx-unaligned-erms.S
 create mode 100644 sysdeps/x86_64/multiarch/memmove-avx512-unaligned-erms.S
 create mode 100644 sysdeps/x86_64/multiarch/memmove-sse2-unaligned-erms.S
 create mode 100644 sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S

-- 
You are receiving this mail because:
You are on the CC list for the bug.
