This is the mail archive of the glibc-bugs@sourceware.org mailing list for the glibc project.



[Bug dynamic-link/21258] Branch predication in _dl_runtime_resolve_avx512_opt leads to lower CPU frequency


https://sourceware.org/bugzilla/show_bug.cgi?id=21258

--- Comment #4 from cvs-commit at gcc dot gnu.org <cvs-commit at gcc dot gnu.org> ---
This is an automated email from the git hooks/post-receive script. It was
generated because a ref change was pushed to the repository containing
the project "GNU C Library master sources".

The branch, release/2.25/master has been updated
       via  903b77defb6f2ee2552c06472339f33091e3c7b4 (commit)
      from  df29db0bec24211cfc917db52024bf8deecac2c9 (commit)

Those revisions listed above that are new to this repository have
not appeared on any other notification email; so we list those
revisions in full, below.

- Log -----------------------------------------------------------------
https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=903b77defb6f2ee2552c06472339f33091e3c7b4

commit 903b77defb6f2ee2552c06472339f33091e3c7b4
Author: H.J. Lu <hjl.tools@gmail.com>
Date:   Tue Mar 21 10:59:31 2017 -0700

    x86-64: Improve branch predication in _dl_runtime_resolve_avx512_opt [BZ #21258]

    On Skylake server, _dl_runtime_resolve_avx512_opt is used to preserve
    the first 8 vector registers.  The code layout is

      if only %xmm0 - %xmm7 registers are used
         preserve %xmm0 - %xmm7 registers
      if only %ymm0 - %ymm7 registers are used
         preserve %ymm0 - %ymm7 registers
      preserve %zmm0 - %zmm7 registers

    Branch prediction always executes the fallthrough code path
    speculatively, preserving %zmm0 - %zmm7 registers even when only
    %xmm0 - %xmm7 registers are used.  This leads to lower CPU frequency
    on Skylake server.  This patch changes the fallthrough code path to
    preserve %xmm0 - %xmm7 registers instead:

      if whole %zmm0 - %zmm7 registers are used
         preserve %zmm0 - %zmm7 registers
      if only %ymm0 - %ymm7 registers are used
         preserve %ymm0 - %ymm7 registers
      preserve %xmm0 - %xmm7 registers

    Tested on Skylake server.

        [BZ #21258]
        * sysdeps/x86_64/dl-trampoline.S (_dl_runtime_resolve_opt):
        Define only if _dl_runtime_resolve is defined to
        _dl_runtime_resolve_sse_vex.
        * sysdeps/x86_64/dl-trampoline.h (_dl_runtime_resolve_opt):
        Fallthrough to _dl_runtime_resolve_sse_vex.

    (cherry picked from commit c15f8eb50cea7ad1a4ccece6e0982bf426d52c00)

-----------------------------------------------------------------------
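
The reordering described in the log above can be sketched in C. This is
only an illustration under stated assumptions: the actual code is x86-64
assembly in sysdeps/x86_64/dl-trampoline.h, and every name below
(vec_state, save_path, old_layout, new_layout, check_layouts_agree) is
hypothetical. A C compiler is also free to lay out branches differently,
so the sketch shows the ordering of the checks, not the generated code.

```c
#include <assert.h>

/* Hypothetical names for illustration only; not taken from glibc. */
enum vec_state { STATE_XMM_ONLY, STATE_YMM, STATE_ZMM };
enum save_path { SAVE_XMM, SAVE_YMM, SAVE_ZMM };

/* Old layout: the widest (ZMM) save is the fallthrough path, so it is
   the path that follows when neither earlier branch is taken.  */
static enum save_path old_layout(enum vec_state s)
{
    if (s == STATE_XMM_ONLY)
        return SAVE_XMM;
    if (s == STATE_YMM)
        return SAVE_YMM;
    return SAVE_ZMM;            /* fallthrough */
}

/* New layout: the narrowest (XMM) save is the fallthrough path, and the
   ZMM save is reached only through an explicit branch.  */
static enum save_path new_layout(enum vec_state s)
{
    if (s == STATE_ZMM)
        return SAVE_ZMM;
    if (s == STATE_YMM)
        return SAVE_YMM;
    return SAVE_XMM;            /* fallthrough */
}

/* Both layouts select the same save path for every state; only the
   position of the fallthrough path differs.  */
static void check_layouts_agree(void)
{
    for (int s = STATE_XMM_ONLY; s <= STATE_ZMM; s++)
        assert(old_layout((enum vec_state) s)
               == new_layout((enum vec_state) s));
}
```

Calling check_layouts_agree() confirms that the two orderings are
functionally equivalent, which matches the commit's description: the
change affects which path is the fallthrough, not which registers end up
being preserved.
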

Summary of changes:
 ChangeLog                      |    9 +++++++++
 sysdeps/x86_64/dl-trampoline.S |    3 +--
 sysdeps/x86_64/dl-trampoline.h |    9 +++++----
 3 files changed, 15 insertions(+), 6 deletions(-)

