This is the mail archive of the
glibc-bugs@sourceware.org
mailing list for the glibc project.
[Bug dynamic-link/21258] Branch predication in _dl_runtime_resolve_avx512_opt leads to lower CPU frequency
- From: "cvs-commit at gcc dot gnu.org" <sourceware-bugzilla at sourceware dot org>
- To: glibc-bugs at sourceware dot org
- Date: Fri, 07 Apr 2017 17:24:27 +0000
- Subject: [Bug dynamic-link/21258] Branch predication in _dl_runtime_resolve_avx512_opt leads to lower CPU frequency
- Auto-submitted: auto-generated
- References: <bug-21258-131@http.sourceware.org/bugzilla/>
https://sourceware.org/bugzilla/show_bug.cgi?id=21258
--- Comment #4 from cvs-commit at gcc dot gnu.org <cvs-commit at gcc dot gnu.org> ---
This is an automated email from the git hooks/post-receive script. It was
generated because a ref change was pushed to the repository containing
the project "GNU C Library master sources".
The branch, release/2.25/master has been updated
via 903b77defb6f2ee2552c06472339f33091e3c7b4 (commit)
from df29db0bec24211cfc917db52024bf8deecac2c9 (commit)
Those revisions listed above that are new to this repository have
not appeared on any other notification email; so we list those
revisions in full, below.
- Log -----------------------------------------------------------------
https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=903b77defb6f2ee2552c06472339f33091e3c7b4
commit 903b77defb6f2ee2552c06472339f33091e3c7b4
Author: H.J. Lu <hjl.tools@gmail.com>
Date: Tue Mar 21 10:59:31 2017 -0700
x86-64: Improve branch predication in _dl_runtime_resolve_avx512_opt [BZ
#21258]
On Skylake server, _dl_runtime_resolve_avx512_opt is used to preserve
the first 8 vector registers. The code layout is
if only %xmm0 - %xmm7 registers are used
preserve %xmm0 - %xmm7 registers
if only %ymm0 - %ymm7 registers are used
preserve %ymm0 - %ymm7 registers
preserve %zmm0 - %zmm7 registers
Branch predication always executes the fallthrough code path to preserve
%zmm0 - %zmm7 registers speculatively, even though only %xmm0 - %xmm7
registers are used. This leads to lower CPU frequency on Skylake
server. This patch changes the fallthrough code path to preserve
%xmm0 - %xmm7 registers instead:
if whole %zmm0 - %zmm7 registers are used
preserve %zmm0 - %zmm7 registers
if only %ymm0 - %ymm7 registers are used
preserve %ymm0 - %ymm7 registers
preserve %xmm0 - %xmm7 registers
Tested on Skylake server.
[BZ #21258]
* sysdeps/x86_64/dl-trampoline.S (_dl_runtime_resolve_opt):
Define only if _dl_runtime_resolve is defined to
_dl_runtime_resolve_sse_vex.
* sysdeps/x86_64/dl-trampoline.h (_dl_runtime_resolve_opt):
Fallthrough to _dl_runtime_resolve_sse_vex.
(cherry picked from commit c15f8eb50cea7ad1a4ccece6e0982bf426d52c00)
-----------------------------------------------------------------------
Summary of changes:
ChangeLog | 9 +++++++++
sysdeps/x86_64/dl-trampoline.S | 3 +--
sysdeps/x86_64/dl-trampoline.h | 9 +++++----
3 files changed, 15 insertions(+), 6 deletions(-)
--
You are receiving this mail because:
You are on the CC list for the bug.