This is the mail archive of the glibc-bugs@sourceware.org mailing list for the glibc project.
Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]
Other format:	[Raw text]
[Bug string/19776] Improve sysdeps/x86_64/multiarch/memcpy-sse2-unaligned.S

From: "cvs-commit at gcc dot gnu.org" <sourceware-bugzilla at sourceware dot org>
To: glibc-bugs at sourceware dot org
Date: Sat, 02 Apr 2016 17:13:47 +0000
Subject: [Bug string/19776] Improve sysdeps/x86_64/multiarch/memcpy-sse2-unaligned.S
Auto-submitted: auto-generated
References: <bug-19776-131 at http dot sourceware dot org/bugzilla/>
https://sourceware.org/bugzilla/show_bug.cgi?id=19776

--- Comment #38 from cvs-commit at gcc dot gnu.org <cvs-commit at gcc dot gnu.org> ---
This is an automated email from the git hooks/post-receive script. It was
generated because a ref change was pushed to the repository containing
the project "GNU C Library master sources".

The branch, hjl/erms/2.23 has been created
        at  4e339b9dc65217fb9b9be6cdc0e991f4ae64ccfe (commit)

- Log -----------------------------------------------------------------
https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=4e339b9dc65217fb9b9be6cdc0e991f4ae64ccfe

commit 4e339b9dc65217fb9b9be6cdc0e991f4ae64ccfe
Author: H.J. Lu <hjl.tools@gmail.com>
Date:   Fri Apr 1 14:01:24 2016 -0700

    X86-64: Add dummy memcopy.h and wordcopy.c

    Since x86-64 doesn't use memory copy functions, add dummy memcopy.h and
    wordcopy.c to reduce code size.  It reduces the size of libc.so by about
    1 KB.

        * sysdeps/x86_64/memcopy.h: New file.
        * sysdeps/x86_64/wordcopy.c: Likewise.

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=997e6c0db2c351f4a7b688c3134c1f77a0aa49de

commit 997e6c0db2c351f4a7b688c3134c1f77a0aa49de
Author: H.J. Lu <hjl.tools@gmail.com>
Date:   Thu Mar 31 12:46:57 2016 -0700

    X86-64: Remove previous default/SSE2/AVX2 memcpy/memmove

    Since the new SSE2/AVX2 memcpy/memmove are faster than the previous ones,
    we can remove the previous SSE2/AVX2 memcpy/memmove and replace them with
    the new ones.

    No change in IFUNC selection if SSE2 and AVX2 memcpy/memmove weren't used
    before.  If SSE2 or AVX2 memcpy/memmove were used, the new SSE2 or AVX2
    memcpy/memmove optimized with Enhanced REP MOVSB will be used for
    processors with ERMS.  The new AVX512 memcpy/memmove will be used for
    processors with AVX512 which prefer vzeroupper.

    Since the new SSE2 memcpy/memmove are faster than the previous default
    memcpy/memmove used in libc.a and ld.so, we also remove the previous
    default memcpy/memmove and make them the default memcpy/memmove.

    Together, it reduces the size of libc.so by about 6 KB and the size of
    ld.so by about 2 KB.

    It also fixes the placement of __mempcpy_erms and __memmove_erms.

        [BZ #19776]
        * sysdeps/x86_64/memcpy.S: Make it dummy.
        * sysdeps/x86_64/mempcpy.S: Likewise.
        * sysdeps/x86_64/memmove.S: New file.
        * sysdeps/x86_64/memmove_chk.S: Likewise.
        * sysdeps/x86_64/multiarch/memmove.S: Likewise.
        * sysdeps/x86_64/multiarch/memmove_chk.S: Likewise.
        * sysdeps/x86_64/memmove.c: Removed.
        * sysdeps/x86_64/multiarch/memcpy-avx-unaligned.S: Likewise.
        * sysdeps/x86_64/multiarch/memcpy-sse2-unaligned.S: Likewise.
        * sysdeps/x86_64/multiarch/memmove-avx-unaligned.S: Likewise.
        * sysdeps/x86_64/multiarch/memmove-sse2-unaligned-erms.S:
        Likewise.
        * sysdeps/x86_64/multiarch/memmove.c: Likewise.
        * sysdeps/x86_64/multiarch/memmove_chk.c: Likewise.
        * sysdeps/x86_64/multiarch/Makefile (sysdep_routines): Remove
        memcpy-sse2-unaligned, memmove-avx-unaligned,
        memcpy-avx-unaligned and memmove-sse2-unaligned-erms.
        * sysdeps/x86_64/multiarch/ifunc-impl-list.c
        (__libc_ifunc_impl_list): Replace
        __memmove_chk_avx512_unaligned_2 with
        __memmove_chk_avx512_unaligned.  Remove
        __memmove_chk_avx_unaligned_2.  Replace
        __memmove_chk_sse2_unaligned_2 with
        __memmove_chk_sse2_unaligned.  Remove __memmove_chk_sse2 and
        __memmove_avx_unaligned_2.  Replace __memmove_avx512_unaligned_2
        with __memmove_avx512_unaligned.  Replace
        __memmove_sse2_unaligned_2 with __memmove_sse2_unaligned.
        Remove __memmove_sse2.  Replace __memcpy_chk_avx512_unaligned_2
        with __memcpy_chk_avx512_unaligned.  Remove
        __memcpy_chk_avx_unaligned_2.  Replace
        __memcpy_chk_sse2_unaligned_2 with __memcpy_chk_sse2_unaligned.
        Remove __memcpy_chk_sse2.  Remove __memcpy_avx_unaligned_2.
        Replace __memcpy_avx512_unaligned_2 with
        __memcpy_avx512_unaligned.  Remove __memcpy_sse2_unaligned_2
        and __memcpy_sse2.  Replace __mempcpy_chk_avx512_unaligned_2
        with __mempcpy_chk_avx512_unaligned.  Remove
        __mempcpy_chk_avx_unaligned_2.  Replace
        __mempcpy_chk_sse2_unaligned_2 with
        __mempcpy_chk_sse2_unaligned.  Remove __mempcpy_chk_sse2.
        Replace __mempcpy_avx512_unaligned_2 with
        __mempcpy_avx512_unaligned.  Remove __mempcpy_avx_unaligned_2.
        Replace __mempcpy_sse2_unaligned_2 with
        __mempcpy_sse2_unaligned.  Remove __mempcpy_sse2.
        * sysdeps/x86_64/multiarch/memcpy.S (__new_memcpy): Support
        __memcpy_avx512_unaligned_erms and __memcpy_avx512_unaligned.
        Use __memcpy_avx_unaligned_erms and __memcpy_sse2_unaligned_erms
        if processor has ERMS.  Default to __memcpy_sse2_unaligned.
        (ENTRY): Removed.
        (END): Likewise.
        (ENTRY_CHK): Likewise.
        (libc_hidden_builtin_def): Likewise.
        Don't include ../memcpy.S.
        * sysdeps/x86_64/multiarch/memcpy_chk.S (__memcpy_chk): Support
        __memcpy_chk_avx512_unaligned_erms and
        __memcpy_chk_avx512_unaligned.  Use
        __memcpy_chk_avx_unaligned_erms and
        __memcpy_chk_sse2_unaligned_erms if if processor has ERMS.
        Default to __memcpy_chk_sse2_unaligned.
        * sysdeps/x86_64/multiarch/memmove-avx-unaligned-erms.S: Skip if
        not in libc.
        * sysdeps/x86_64/multiarch/memmove-avx512-unaligned-erms.S:
        Likewise.
        * sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S
        (MEMCPY_SYMBOL): New.
        (MEMPCPY_SYMBOL): Likewise.
        (MEMMOVE_CHK_SYMBOL): Likewise.
        (__mempcpy_erms, __memmove_erms): Moved before __mempcpy_chk
        with unaligned_erms.
        Replace MEMMOVE_SYMBOL with MEMMOVE_CHK_SYMBOL on __mempcpy_chk
        symbols.  Replace MEMMOVE_SYMBOL with MEMPCPY_SYMBOL on
        __mempcpy symbols.  Change function suffix from unaligned_2 to
        unaligned.  Provide alias for __memcpy_chk in libc.a.  Provide
        alias for memcpy in libc.a and ld.so.
        * sysdeps/x86_64/multiarch/mempcpy.S (__mempcpy): Support
        __mempcpy_avx512_unaligned_erms and __mempcpy_avx512_unaligned.
        Use __mempcpy_avx_unaligned_erms and __mempcpy_sse2_unaligned_erms
        if processor has ERMS.  Default to __mempcpy_sse2_unaligned.
        (ENTRY): Removed.
        (END): Likewise.
        (ENTRY_CHK): Likewise.
        (libc_hidden_builtin_def): Likewise.
        Don't include ../mempcpy.S.
        (mempcpy): New.  Add a weak alias.
        * sysdeps/x86_64/multiarch/mempcpy_chk.S (__mempcpy_chk): Support
        __mempcpy_chk_avx512_unaligned_erms and
        __mempcpy_chk_avx512_unaligned.  Use
        __mempcpy_chk_avx_unaligned_erms and
        __mempcpy_chk_sse2_unaligned_erms if if processor has ERMS.
        Default to __mempcpy_chk_sse2_unaligned.

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=0ff8c6a7b53c5bb28ac3d3e0ae8da8099491b16c

commit 0ff8c6a7b53c5bb28ac3d3e0ae8da8099491b16c
Author: H.J. Lu <hjl.tools@gmail.com>
Date:   Thu Mar 31 10:42:30 2016 -0700

    X86-64: Remove the previous SSE2/AVX2 memsets

    Since the new SSE2/AVX2 memsets are faster than the previous ones, we
    can remove the previous SSE2/AVX2 memsets and replace them with the
    new ones.  This reduces the size of libc.so by about 900 bytes.

    No change in IFUNC selection if SSE2 and AVX2 memsets weren't used
    before.  If SSE2 or AVX2 memset was used, the new SSE2 or AVX2 memset
    optimized with Enhanced REP STOSB will be used for processors with
    ERMS.  The new AVX512 memset will be used for processors with AVX512
    which prefer vzeroupper.

        [BZ #19881]
        * sysdeps/x86_64/multiarch/memset-sse2-unaligned-erms.S: Folded
        into ...
        * sysdeps/x86_64/memset.S: This.
        (__bzero): Removed.
        (__memset_tail): Likewise.
        (__memset_chk): Likewise.
        (memset): Likewise.
        (MEMSET_CHK_SYMBOL): New. Define only if MEMSET_SYMBOL isn't
        defined.
        (MEMSET_SYMBOL): Define only if MEMSET_SYMBOL isn't defined.
        * sysdeps/x86_64/multiarch/memset-avx2.S: Removed.
        (__memset_zero_constant_len_parameter): Check SHARED instead of
        PIC.
        * sysdeps/x86_64/multiarch/Makefile (sysdep_routines): Remove
        memset-avx2 and memset-sse2-unaligned-erms.
        * sysdeps/x86_64/multiarch/ifunc-impl-list.c
        (__libc_ifunc_impl_list): Remove __memset_chk_sse2,
        __memset_chk_avx2, __memset_sse2 and __memset_avx2_unaligned.
        * sysdeps/x86_64/multiarch/memset-avx2-unaligned-erms.S: Skip
        if not in libc.
        * sysdeps/x86_64/multiarch/memset-avx512-unaligned-erms.S:
        Likewise.
        * sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S
        (MEMSET_CHK_SYMBOL): New.  Define if not defined.
        (__bzero): Check VEC_SIZE == 16 instead of USE_MULTIARCH.
        Replace MEMSET_SYMBOL with MEMSET_CHK_SYMBOL on __memset_chk
        symbols.
        Properly check USE_MULTIARCH on __memset symbols.
        * sysdeps/x86_64/multiarch/memset.S (memset): Replace
        __memset_sse2 and __memset_avx2 with __memset_sse2_unaligned
        and __memset_avx2_unaligned.  Use __memset_sse2_unaligned_erms
        or __memset_avx2_unaligned_erms if processor has ERMS.  Support
        __memset_avx512_unaligned_erms and __memset_avx512_unaligned.
        (memset): Removed.
        (__memset_chk): Likewise.
        (MEMSET_SYMBOL): New.
        (libc_hidden_builtin_def): Replace __memset_sse2 with
        __memset_sse2_unaligned.
        * sysdeps/x86_64/multiarch/memset_chk.S (__memset_chk): Replace
        __memset_chk_sse2 and __memset_chk_avx2 with
        __memset_chk_sse2_unaligned and __memset_chk_avx2_unaligned_erms.
        Use __memset_chk_sse2_unaligned_erms or
        __memset_chk_avx2_unaligned_erms if processor has ERMS.  Support
        __memset_chk_avx512_unaligned_erms and
        __memset_chk_avx512_unaligned.

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=cfb059c79729b26284863334c9aa04f0a3b967b9

commit cfb059c79729b26284863334c9aa04f0a3b967b9
Author: H.J. Lu <hjl.tools@gmail.com>
Date:   Fri Apr 1 15:08:48 2016 -0700

    Remove Fast_Copy_Backward from Intel Core processors

    Intel Core i3, i5 and i7 processors have fast unaligned copy and
    copy backward is ignored.  Remove Fast_Copy_Backward from Intel Core
    processors to avoid confusion.

        * sysdeps/x86/cpu-features.c (init_cpu_features): Don't set
        bit_arch_Fast_Copy_Backward for Intel Core proessors.

    (cherry picked from commit 27d3ce1467990f89126e228559dec8f84b96c60e)

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=30c389be1af67c4d0716d207b6780c6169d1355f

commit 30c389be1af67c4d0716d207b6780c6169d1355f
Author: H.J. Lu <hjl.tools@gmail.com>
Date:   Thu Mar 31 10:05:51 2016 -0700

    Add x86-64 memset with unaligned store and rep stosb

    Implement x86-64 memset with unaligned store and rep movsb.  Support
    16-byte, 32-byte and 64-byte vector register sizes.  A single file
    provides 2 implementations of memset, one with rep stosb and the other
    without rep stosb.  They share the same codes when size is between 2
    times of vector register size and REP_STOSB_THRESHOLD which defaults
    to 2KB.

    Key features:

    1. Use overlapping store to avoid branch.
    2. For size <= 4 times of vector register size, fully unroll the loop.
    3. For size > 4 times of vector register size, store 4 times of vector
    register size at a time.

        [BZ #19881]
        * sysdeps/x86_64/multiarch/Makefile (sysdep_routines): Add
        memset-sse2-unaligned-erms, memset-avx2-unaligned-erms and
        memset-avx512-unaligned-erms.
        * sysdeps/x86_64/multiarch/ifunc-impl-list.c
        (__libc_ifunc_impl_list): Test __memset_chk_sse2_unaligned,
        __memset_chk_sse2_unaligned_erms, __memset_chk_avx2_unaligned,
        __memset_chk_avx2_unaligned_erms, __memset_chk_avx512_unaligned,
        __memset_chk_avx512_unaligned_erms, __memset_sse2_unaligned,
        __memset_sse2_unaligned_erms, __memset_erms,
        __memset_avx2_unaligned, __memset_avx2_unaligned_erms,
        __memset_avx512_unaligned_erms and __memset_avx512_unaligned.
        * sysdeps/x86_64/multiarch/memset-avx2-unaligned-erms.S: New
        file.
        * sysdeps/x86_64/multiarch/memset-avx512-unaligned-erms.S:
        Likewise.
        * sysdeps/x86_64/multiarch/memset-sse2-unaligned-erms.S:
        Likewise.
        * sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S:
        Likewise.

    (cherry picked from commit 830566307f038387ca0af3fd327706a8d1a2f595)

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=980d639b4ae58209843f09a29d86b0a8303b6650

commit 980d639b4ae58209843f09a29d86b0a8303b6650
Author: H.J. Lu <hjl.tools@gmail.com>
Date:   Thu Mar 31 10:04:26 2016 -0700

    Add x86-64 memmove with unaligned load/store and rep movsb

    Implement x86-64 memmove with unaligned load/store and rep movsb.
    Support 16-byte, 32-byte and 64-byte vector register sizes.  When
    size <= 8 times of vector register size, there is no check for
    address overlap bewteen source and destination.  Since overhead for
    overlap check is small when size > 8 times of vector register size,
    memcpy is an alias of memmove.

    A single file provides 2 implementations of memmove, one with rep movsb
    and the other without rep movsb.  They share the same codes when size is
    between 2 times of vector register size and REP_MOVSB_THRESHOLD which
    is 2KB for 16-byte vector register size and scaled up by large vector
    register size.

    Key features:

    1. Use overlapping load and store to avoid branch.
    2. For size <= 8 times of vector register size, load  all sources into
    registers and store them together.
    3. If there is no address overlap bewteen source and destination, copy
    from both ends with 4 times of vector register size at a time.
    4. If address of destination > address of source, backward copy 8 times
    of vector register size at a time.
    5. Otherwise, forward copy 8 times of vector register size at a time.
    6. Use rep movsb only for forward copy.  Avoid slow backward rep movsb
    by fallbacking to backward copy 8 times of vector register size at a
    time.
    7. Skip when address of destination == address of source.

        [BZ #19776]
        * sysdeps/x86_64/multiarch/Makefile (sysdep_routines): Add
        memmove-sse2-unaligned-erms, memmove-avx-unaligned-erms and
        memmove-avx512-unaligned-erms.
        * sysdeps/x86_64/multiarch/ifunc-impl-list.c
        (__libc_ifunc_impl_list): Test
        __memmove_chk_avx512_unaligned_2,
        __memmove_chk_avx512_unaligned_erms,
        __memmove_chk_avx_unaligned_2, __memmove_chk_avx_unaligned_erms,
        __memmove_chk_sse2_unaligned_2,
        __memmove_chk_sse2_unaligned_erms, __memmove_avx_unaligned_2,
        __memmove_avx_unaligned_erms, __memmove_avx512_unaligned_2,
        __memmove_avx512_unaligned_erms, __memmove_erms,
        __memmove_sse2_unaligned_2, __memmove_sse2_unaligned_erms,
        __memcpy_chk_avx512_unaligned_2,
        __memcpy_chk_avx512_unaligned_erms,
        __memcpy_chk_avx_unaligned_2, __memcpy_chk_avx_unaligned_erms,
        __memcpy_chk_sse2_unaligned_2, __memcpy_chk_sse2_unaligned_erms,
        __memcpy_avx_unaligned_2, __memcpy_avx_unaligned_erms,
        __memcpy_avx512_unaligned_2, __memcpy_avx512_unaligned_erms,
        __memcpy_sse2_unaligned_2, __memcpy_sse2_unaligned_erms,
        __memcpy_erms, __mempcpy_chk_avx512_unaligned_2,
        __mempcpy_chk_avx512_unaligned_erms,
        __mempcpy_chk_avx_unaligned_2, __mempcpy_chk_avx_unaligned_erms,
        __mempcpy_chk_sse2_unaligned_2, __mempcpy_chk_sse2_unaligned_erms,
        __mempcpy_avx512_unaligned_2, __mempcpy_avx512_unaligned_erms,
        __mempcpy_avx_unaligned_2, __mempcpy_avx_unaligned_erms,
        __mempcpy_sse2_unaligned_2, __mempcpy_sse2_unaligned_erms and
        __mempcpy_erms.
        * sysdeps/x86_64/multiarch/memmove-avx-unaligned-erms.S: New
        file.
        * sysdeps/x86_64/multiarch/memmove-avx512-unaligned-erms.S:
        Likwise.
        * sysdeps/x86_64/multiarch/memmove-sse2-unaligned-erms.S:
        Likwise.
        * sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:
        Likwise.

    (cherry picked from commit 88b57b8ed41d5ecf2e1bdfc19556f9246a665ebb)

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=bf2bc5e5c9d7aa8af28b299ec26b8a37352730cc

commit bf2bc5e5c9d7aa8af28b299ec26b8a37352730cc
Author: H.J. Lu <hjl.tools@gmail.com>
Date:   Mon Mar 28 19:22:59 2016 -0700

    Initial Enhanced REP MOVSB/STOSB (ERMS) support

    The newer Intel processors support Enhanced REP MOVSB/STOSB (ERMS) which
    has a feature bit in CPUID.  This patch adds the Enhanced REP MOVSB/STOSB
    (ERMS) bit to x86 cpu-features.

        * sysdeps/x86/cpu-features.h (bit_cpu_ERMS): New.
        (index_cpu_ERMS): Likewise.
        (reg_ERMS): Likewise.

    (cherry picked from commit 0791f91dff9a77263fa8173b143d854cad902c6d)

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=7c244283ff12329b3bca9878b8edac3b3fe5c7bc

commit 7c244283ff12329b3bca9878b8edac3b3fe5c7bc
Author: H.J. Lu <hjl.tools@gmail.com>
Date:   Mon Mar 28 13:15:59 2016 -0700

    Make __memcpy_avx512_no_vzeroupper an alias

    Since x86-64 memcpy-avx512-no-vzeroupper.S implements memmove, make
    __memcpy_avx512_no_vzeroupper an alias of __memmove_avx512_no_vzeroupper
    to reduce code size of libc.so.

        * sysdeps/x86_64/multiarch/Makefile (sysdep_routines): Remove
        memcpy-avx512-no-vzeroupper.
        * sysdeps/x86_64/multiarch/memcpy-avx512-no-vzeroupper.S: Renamed
        to ...
        * sysdeps/x86_64/multiarch/memmove-avx512-no-vzeroupper.S: This.
        (MEMCPY): Don't define.
        (MEMCPY_CHK): Likewise.
        (MEMPCPY): Likewise.
        (MEMPCPY_CHK): Likewise.
        (MEMPCPY_CHK): Renamed to ...
        (__mempcpy_chk_avx512_no_vzeroupper): This.
        (MEMPCPY_CHK): Renamed to ...
        (__mempcpy_chk_avx512_no_vzeroupper): This.
        (MEMCPY_CHK): Renamed to ...
        (__memmove_chk_avx512_no_vzeroupper): This.
        (MEMCPY): Renamed to ...
        (__memmove_avx512_no_vzeroupper): This.
        (__memcpy_avx512_no_vzeroupper): New alias.
        (__memcpy_chk_avx512_no_vzeroupper): Likewise.

    (cherry picked from commit 064f01b10b57ff09cda7025f484b848c38ddd57a)

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=a9a14991fb2d3e69f80d25e9bbf2f6b0bcf11c3d

commit a9a14991fb2d3e69f80d25e9bbf2f6b0bcf11c3d
Author: H.J. Lu <hjl.tools@gmail.com>
Date:   Mon Mar 28 13:13:36 2016 -0700

    Implement x86-64 multiarch mempcpy in memcpy

    Implement x86-64 multiarch mempcpy in memcpy to share most of code.  It
    reduces code size of libc.so.

        [BZ #18858]
        * sysdeps/x86_64/multiarch/Makefile (sysdep_routines): Remove
        mempcpy-ssse3, mempcpy-ssse3-back, mempcpy-avx-unaligned
        and mempcpy-avx512-no-vzeroupper.
        * sysdeps/x86_64/multiarch/memcpy-avx-unaligned.S (MEMPCPY_CHK):
        New.
        (MEMPCPY): Likewise.
        * sysdeps/x86_64/multiarch/memcpy-avx512-no-vzeroupper.S
        (MEMPCPY_CHK): New.
        (MEMPCPY): Likewise.
        * sysdeps/x86_64/multiarch/memcpy-ssse3-back.S (MEMPCPY_CHK): New.
        (MEMPCPY): Likewise.
        * sysdeps/x86_64/multiarch/memcpy-ssse3.S (MEMPCPY_CHK): New.
        (MEMPCPY): Likewise.
        * sysdeps/x86_64/multiarch/mempcpy-avx-unaligned.S: Removed.
        * sysdeps/x86_64/multiarch/mempcpy-avx512-no-vzeroupper.S:
        Likewise.
        * sysdeps/x86_64/multiarch/mempcpy-ssse3-back.S: Likewise.
        * sysdeps/x86_64/multiarch/mempcpy-ssse3.S: Likewise.

    (cherry picked from commit c365e615f7429aee302f8af7bf07ae262278febb)

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=4fc09dabecee1b7cafdbca26ee7c63f68e53c229

commit 4fc09dabecee1b7cafdbca26ee7c63f68e53c229
Author: H.J. Lu <hjl.tools@gmail.com>
Date:   Mon Mar 28 04:39:48 2016 -0700

    [x86] Add a feature bit: Fast_Unaligned_Copy

    On AMD processors, memcpy optimized with unaligned SSE load is
    slower than emcpy optimized with aligned SSSE3 while other string
    functions are faster with unaligned SSE load.  A feature bit,
    Fast_Unaligned_Copy, is added to select memcpy optimized with
    unaligned SSE load.

        [BZ #19583]
        * sysdeps/x86/cpu-features.c (init_cpu_features): Set
        Fast_Unaligned_Copy with Fast_Unaligned_Load for Intel
        processors.  Set Fast_Copy_Backward for AMD Excavator
        processors.
        * sysdeps/x86/cpu-features.h (bit_arch_Fast_Unaligned_Copy):
        New.
        (index_arch_Fast_Unaligned_Copy): Likewise.
        * sysdeps/x86_64/multiarch/memcpy.S (__new_memcpy): Check
        Fast_Unaligned_Copy instead of Fast_Unaligned_Load.

    (cherry picked from commit e41b395523040fcb58c7d378475720c2836d280c)

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=75f2d47e459a6bf5656a938e5c63f8b581eb3ee6

commit 75f2d47e459a6bf5656a938e5c63f8b581eb3ee6
Author: Florian Weimer <fweimer@redhat.com>
Date:   Fri Mar 25 11:11:42 2016 +0100

    tst-audit10: Fix compilation on compilers without bit_AVX512F [BZ #19860]

        [BZ# 19860]
        * sysdeps/x86_64/tst-audit10.c (avx512_enabled): Always return
        zero if the compiler does not provide the AVX512F bit.

    (cherry picked from commit f327f5b47be57bc05a4077344b381016c1bb2c11)

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=96c7375cb8b6f1875d9865f2ae92ecacf5f5e6fa

commit 96c7375cb8b6f1875d9865f2ae92ecacf5f5e6fa
Author: H.J. Lu <hjl.tools@gmail.com>
Date:   Tue Mar 22 08:36:16 2016 -0700

    Don't set %rcx twice before "rep movsb"

        * sysdeps/x86_64/multiarch/memcpy-avx-unaligned.S (MEMCPY):
        Don't set %rcx twice before "rep movsb".

    (cherry picked from commit 3c9a4cd16cbc7b79094fec68add2df66061ab5d7)

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=c273f613b0cc779ee33cc33d20941d271316e483

commit c273f613b0cc779ee33cc33d20941d271316e483
Author: H.J. Lu <hjl.tools@gmail.com>
Date:   Tue Mar 22 07:46:56 2016 -0700

    Set index_arch_AVX_Fast_Unaligned_Load only for Intel processors

    Since only Intel processors with AVX2 have fast unaligned load, we
    should set index_arch_AVX_Fast_Unaligned_Load only for Intel processors.

    Move AVX, AVX2, AVX512, FMA and FMA4 detection into get_common_indeces
    and call get_common_indeces for other processors.

    Add CPU_FEATURES_CPU_P and CPU_FEATURES_ARCH_P to aoid loading
    GLRO(dl_x86_cpu_features) in cpu-features.c.

        [BZ #19583]
        * sysdeps/x86/cpu-features.c (get_common_indeces): Remove
        inline.  Check family before setting family, model and
        extended_model.  Set AVX, AVX2, AVX512, FMA and FMA4 usable
        bits here.
        (init_cpu_features): Replace HAS_CPU_FEATURE and
        HAS_ARCH_FEATURE with CPU_FEATURES_CPU_P and
        CPU_FEATURES_ARCH_P.  Set index_arch_AVX_Fast_Unaligned_Load
        for Intel processors with usable AVX2.  Call get_common_indeces
        for other processors with family == NULL.
        * sysdeps/x86/cpu-features.h (CPU_FEATURES_CPU_P): New macro.
        (CPU_FEATURES_ARCH_P): Likewise.
        (HAS_CPU_FEATURE): Use CPU_FEATURES_CPU_P.
        (HAS_ARCH_FEATURE): Use CPU_FEATURES_ARCH_P.

    (cherry picked from commit f781a9e96138d8839663af5e88649ab1fbed74f8)

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=c858d10a4e7fd682f2e7083836e4feacc2d580f4

commit c858d10a4e7fd682f2e7083836e4feacc2d580f4
Author: H.J. Lu <hjl.tools@gmail.com>
Date:   Thu Mar 10 05:26:46 2016 -0800

    Add _arch_/_cpu_ to index_*/bit_* in x86 cpu-features.h

    index_* and bit_* macros are used to access cpuid and feature arrays o
    struct cpu_features.  It is very easy to use bits and indices of cpuid
    array on feature array, especially in assembly codes.  For example,
    sysdeps/i386/i686/multiarch/bcopy.S has

        HAS_CPU_FEATURE (Fast_Rep_String)

    which should be

        HAS_ARCH_FEATURE (Fast_Rep_String)

    We change index_* and bit_* to index_cpu_*/index_arch_* and
    bit_cpu_*/bit_arch_* so that we can catch such error at build time.

        [BZ #19762]
        * sysdeps/unix/sysv/linux/x86_64/64/dl-librecon.h
        (EXTRA_LD_ENVVARS): Add _arch_ to index_*/bit_*.
        * sysdeps/x86/cpu-features.c (init_cpu_features): Likewise.
        * sysdeps/x86/cpu-features.h (bit_*): Renamed to ...
        (bit_arch_*): This for feature array.
        (bit_*): Renamed to ...
        (bit_cpu_*): This for cpu array.
        (index_*): Renamed to ...
        (index_arch_*): This for feature array.
        (index_*): Renamed to ...
        (index_cpu_*): This for cpu array.
        [__ASSEMBLER__] (HAS_FEATURE): Add and use field.
        [__ASSEMBLER__] (HAS_CPU_FEATURE)): Pass cpu to HAS_FEATURE.
        [__ASSEMBLER__] (HAS_ARCH_FEATURE)): Pass arch to HAS_FEATURE.
        [!__ASSEMBLER__] (HAS_CPU_FEATURE): Replace index_##name and
        bit_##name with index_cpu_##name and bit_cpu_##name.
        [!__ASSEMBLER__] (HAS_ARCH_FEATURE): Replace index_##name and
        bit_##name with index_arch_##name and bit_arch_##name.

    (cherry picked from commit 6aa3e97e2530f9917f504eb4146af119a3f27229)

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=7a90b56b0c3f8e55df44957cf6de7d3c9c04cbb9

commit 7a90b56b0c3f8e55df44957cf6de7d3c9c04cbb9
Author: Roland McGrath <roland@hack.frob.com>
Date:   Tue Mar 8 12:31:13 2016 -0800

    Fix tst-audit10 build when -mavx512f is not supported.

    (cherry picked from commit 3bd80c0de2f8e7ca8020d37739339636d169957e)

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=ba80f6ceea3a6b6f711038646f419125fe3ad39c

commit ba80f6ceea3a6b6f711038646f419125fe3ad39c
Author: Florian Weimer <fweimer@redhat.com>
Date:   Mon Mar 7 16:00:25 2016 +0100

    tst-audit4, tst-audit10: Compile AVX/AVX-512 code separately [BZ #19269]

    This ensures that GCC will not use unsupported instructions before
    the run-time check to ensure support.

    (cherry picked from commit 3c0f7407eedb524c9114bb675cd55b903c71daaa)

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=b8fe596e7f750d4ee2fca14d6a3999364c02662e

commit b8fe596e7f750d4ee2fca14d6a3999364c02662e
Author: H.J. Lu <hjl.tools@gmail.com>
Date:   Sun Mar 6 16:48:11 2016 -0800

    Group AVX512 functions in .text.avx512 section

        * sysdeps/x86_64/multiarch/memcpy-avx512-no-vzeroupper.S:
        Replace .text with .text.avx512.
        * sysdeps/x86_64/multiarch/memset-avx512-no-vzeroupper.S:
        Likewise.

    (cherry picked from commit fee9eb6200f0e44a4b684903bc47fde36d46f1a5)

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=e455d17680cfaebb12692547422f95ba1ed30e29

commit e455d17680cfaebb12692547422f95ba1ed30e29
Author: H.J. Lu <hjl.tools@gmail.com>
Date:   Fri Mar 4 08:37:40 2016 -0800

    x86-64: Fix memcpy IFUNC selection

    Chek Fast_Unaligned_Load, instead of Slow_BSF, and also check for
    Fast_Copy_Backward to enable __memcpy_ssse3_back.  Existing selection
    order is updated with following selection order:

    1. __memcpy_avx_unaligned if AVX_Fast_Unaligned_Load bit is set.
    2. __memcpy_sse2_unaligned if Fast_Unaligned_Load bit is set.
    3. __memcpy_sse2 if SSSE3 isn't available.
    4. __memcpy_ssse3_back if Fast_Copy_Backward bit it set.
    5. __memcpy_ssse3

        [BZ #18880]
        * sysdeps/x86_64/multiarch/memcpy.S: Check Fast_Unaligned_Load,
        instead of Slow_BSF, and also check for Fast_Copy_Backward to
        enable __memcpy_ssse3_back.

    (cherry picked from commit 14a1d7cc4c4fd5ee8e4e66b777221dd32a84efe8)

-----------------------------------------------------------------------

-- 
You are receiving this mail because:
You are on the CC list for the bug.
Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]