This is the mail archive of the
glibc-cvs@sourceware.org
mailing list for the glibc project.
GNU C Library master sources branch gentoo/2.24 updated. glibc-2.24-38-gb73ec92
- From: vapier at sourceware dot org
- To: glibc-cvs at sourceware dot org
- Date: 8 Dec 2016 19:27:11 -0000
- Subject: GNU C Library master sources branch gentoo/2.24 updated. glibc-2.24-38-gb73ec92
This is an automated email from the git hooks/post-receive script. It was
generated because a ref change was pushed to the repository containing
the project "GNU C Library master sources".
The branch, gentoo/2.24 has been updated
via b73ec923c79ab493a9265930a45800391329571a (commit)
via 04c5f782796052de9d06975061eb3376ccbcbdb1 (commit)
via 9b34c1494d8e61bb3d718e2ea83b856030476737 (commit)
via 2afb8a945ddc104c5ef9aa61f32427c19b681232 (commit)
via df13b9c22a0fb690a0ab9dd4af163ae3c459d975 (commit)
via b4391b0c7def246a4503db1af683122681c12a56 (commit)
via 0d5f4a32a34f048b35360a110a0e6d1c87e3eced (commit)
via 0ab02a62e42e63b058e7a4e160dbe51762ef2c46 (commit)
via 901db98f36690e4743feefd985c6ba2d7fd19813 (commit)
from caafe2b2612be88046d7bad4da42dbc2b07fbcd7 (commit)
Those revisions listed above that are new to this repository have
not appeared on any other notification email; so we list those
revisions in full, below.
- Log -----------------------------------------------------------------
http://sourceware.org/git/gitweb.cgi?p=glibc.git;a=commitdiff;h=b73ec923c79ab493a9265930a45800391329571a
commit b73ec923c79ab493a9265930a45800391329571a
Author: Aurelien Jarno <aurelien@aurel32.net>
Date: Tue Aug 2 09:18:59 2016 +0200
alpha: fix trunc for big input values
The alpha-specific versions of trunc and truncf always add and subtract
0x1.0p23 or 0x1.0p52, even for big values. This causes errors like the
following in the testsuite:
Failure: Test: trunc_towardzero (0x1p107)
Result:
is: 1.6225927682921334e+32 0x1.fffffffffffffp+106
should be: 1.6225927682921336e+32 0x1.0000000000000p+107
difference: 1.8014398509481984e+16 0x1.0000000000000p+54
ulp : 0.5000
max.ulp : 0.0000
Change this by returning the input value when its absolute value is
greater than or equal to 0x1.0p23 or 0x1.0p52. NaNs still have to go
through the add and subtract operations so that a signaling NaN gets quieted.
Finally, remove the code that handles the inexact exception; trunc should
never generate such an exception.
Changelog:
* sysdeps/alpha/fpu/s_trunc.c (__trunc): Return the input value
when its absolute value is greater than 0x1.0p52.
[_IEEE_FP_INEXACT] Remove.
* sysdeps/alpha/fpu/s_truncf.c (__truncf): Return the input value
when its absolute value is greater than 0x1.0p23.
[_IEEE_FP_INEXACT] Remove.
(cherry picked from commit b74d259fe793499134eb743222cd8dd7c74a31ce)
(cherry picked from commit e6eab16cc302e6c42f79e1af02ce98ebb9a783bc)
diff --git a/sysdeps/alpha/fpu/s_trunc.c b/sysdeps/alpha/fpu/s_trunc.c
index 16cb114..4b986a6 100644
--- a/sysdeps/alpha/fpu/s_trunc.c
+++ b/sysdeps/alpha/fpu/s_trunc.c
@@ -28,12 +28,11 @@ __trunc (double x)
double two52 = copysign (0x1.0p52, x);
double r, tmp;
+ if (isgreaterequal (fabs (x), 0x1.0p52))
+ return x;
+
__asm (
-#ifdef _IEEE_FP_INEXACT
- "addt/suic %2, %3, %1\n\tsubt/suic %1, %3, %0"
-#else
"addt/suc %2, %3, %1\n\tsubt/suc %1, %3, %0"
-#endif
: "=&f"(r), "=&f"(tmp)
: "f"(x), "f"(two52));
diff --git a/sysdeps/alpha/fpu/s_truncf.c b/sysdeps/alpha/fpu/s_truncf.c
index 2290f28..3e93356 100644
--- a/sysdeps/alpha/fpu/s_truncf.c
+++ b/sysdeps/alpha/fpu/s_truncf.c
@@ -27,12 +27,11 @@ __truncf (float x)
float two23 = copysignf (0x1.0p23, x);
float r, tmp;
+ if (isgreaterequal (fabsf (x), 0x1.0p23))
+ return x;
+
__asm (
-#ifdef _IEEE_FP_INEXACT
- "adds/suic %2, %3, %1\n\tsubs/suic %1, %3, %0"
-#else
"adds/suc %2, %3, %1\n\tsubs/suc %1, %3, %0"
-#endif
: "=&f"(r), "=&f"(tmp)
: "f"(x), "f"(two23));
http://sourceware.org/git/gitweb.cgi?p=glibc.git;a=commitdiff;h=04c5f782796052de9d06975061eb3376ccbcbdb1
commit 04c5f782796052de9d06975061eb3376ccbcbdb1
Author: Aurelien Jarno <aurelien@aurel32.net>
Date: Tue Aug 2 09:18:59 2016 +0200
alpha: fix rint on sNaN input
The alpha version of rint wrongly returns sNaN for sNaN input. Fix that
by checking for NaN and returning the input value added to itself
in that case.
Changelog:
* sysdeps/alpha/fpu/s_rint.c (__rint): Add argument with itself
when it is a NaN.
* sysdeps/alpha/fpu/s_rintf.c (__rintf): Likewise.
(cherry picked from commit cb7f9d63b921ea1a1cbb4ab377a8484fd5da9a2b)
(cherry picked from commit 8eb9a92e0522f2d4f2d4167df919d066c85d3408)
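Why adding a NaN to itself helps can be shown with a small sketch. It assumes IEEE 754-2008 binary64 encoding, where the most significant fraction bit is the quiet bit, and a target that passes doubles around without converting them (for example x86-64 with SSE). The helper names make_snan, is_quiet_nan and quiet_if_nan are hypothetical.

```c
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Build a signaling NaN: exponent all ones, quiet bit clear,
   non-zero payload.  */
static double
make_snan (void)
{
  uint64_t bits = UINT64_C (0x7ff0000000000001);
  double d;
  memcpy (&d, &bits, sizeof d);
  return d;
}

/* Return non-zero if D is a NaN with the quiet bit set.  */
static int
is_quiet_nan (double d)
{
  uint64_t bits;
  memcpy (&bits, &d, sizeof bits);
  return isnan (d) && (bits & UINT64_C (0x0008000000000000)) != 0;
}

/* The pattern used by the patch: an arithmetic operation on a NaN
   produces a quiet NaN (and raises the invalid exception for sNaN).  */
static double
quiet_if_nan (double x)
{
  if (isnan (x))
    {
      volatile double v = x;   /* force the addition to happen at run time */
      return v + v;
    }
  return x;
}
```

Non-NaN inputs pass through quiet_if_nan untouched, so the check only costs a comparison on the common path.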
diff --git a/sysdeps/alpha/fpu/s_rint.c b/sysdeps/alpha/fpu/s_rint.c
index f33fe72..259348a 100644
--- a/sysdeps/alpha/fpu/s_rint.c
+++ b/sysdeps/alpha/fpu/s_rint.c
@@ -23,6 +23,9 @@
double
__rint (double x)
{
+ if (isnan (x))
+ return x + x;
+
if (isless (fabs (x), 9007199254740992.0)) /* 1 << DBL_MANT_DIG */
{
double tmp1, new_x;
diff --git a/sysdeps/alpha/fpu/s_rintf.c b/sysdeps/alpha/fpu/s_rintf.c
index 1400dfe..645728a 100644
--- a/sysdeps/alpha/fpu/s_rintf.c
+++ b/sysdeps/alpha/fpu/s_rintf.c
@@ -22,6 +22,9 @@
float
__rintf (float x)
{
+ if (isnanf (x))
+ return x + x;
+
if (isless (fabsf (x), 16777216.0f)) /* 1 << FLT_MANT_DIG */
{
/* Note that Alpha S_Floating is stored in registers in a
http://sourceware.org/git/gitweb.cgi?p=glibc.git;a=commitdiff;h=9b34c1494d8e61bb3d718e2ea83b856030476737
commit 9b34c1494d8e61bb3d718e2ea83b856030476737
Author: Aurelien Jarno <aurelien@aurel32.net>
Date: Tue Aug 2 09:18:59 2016 +0200
alpha: fix floor on sNaN input
The alpha version of floor wrongly returns sNaN for sNaN input. Fix that
by checking for NaN and returning the input value added to itself
in that case.
Finally, remove the code that handles the inexact exception; floor should
never generate such an exception.
Changelog:
* sysdeps/alpha/fpu/s_floor.c (__floor): Add argument with itself
when it is a NaN.
[_IEEE_FP_INEXACT] Remove.
* sysdeps/alpha/fpu/s_floorf.c (__floorf): Likewise.
(cherry picked from commit 65cc568cf57156e5230db9a061645e54ff028a41)
(cherry picked from commit 1912cc082df4739c2388c375f8d486afdaa7d49b)
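The overall shape of the patched function (NaN check first, then an integer round trip only for magnitudes below 2^53) can be sketched in portable C. This is only an illustration under stated assumptions: the alpha code uses cvttq/cvtqt with a round-toward-minus-infinity qualifier, whereas this sketch uses a truncating cast plus a correction, and the final copysign is a choice made here to keep the sign of zero. The name floor_sketch is hypothetical.

```c
#include <math.h>

static double
floor_sketch (double x)
{
  /* New in the patch: quiet a signaling NaN instead of returning it.  */
  if (isnan (x))
    return x + x;

  if (isless (fabs (x), 9007199254740992.0))   /* 1 << DBL_MANT_DIG */
    {
      /* The cast truncates toward zero; step down by one when that
         rounded a negative non-integer up.  */
      double r = (double) (long long) x;
      if (r > x)
        r -= 1.0;
      return copysign (r, x);   /* keeps floor (-0.0) == -0.0 */
    }

  /* Larger magnitudes are already integers (or infinities).  */
  return x;
}
```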
diff --git a/sysdeps/alpha/fpu/s_floor.c b/sysdeps/alpha/fpu/s_floor.c
index 1a6f8c4..9930f6b 100644
--- a/sysdeps/alpha/fpu/s_floor.c
+++ b/sysdeps/alpha/fpu/s_floor.c
@@ -27,16 +27,15 @@
double
__floor (double x)
{
+ if (isnan (x))
+ return x + x;
+
if (isless (fabs (x), 9007199254740992.0)) /* 1 << DBL_MANT_DIG */
{
double tmp1, new_x;
__asm (
-#ifdef _IEEE_FP_INEXACT
- "cvttq/svim %2,%1\n\t"
-#else
"cvttq/svm %2,%1\n\t"
-#endif
"cvtqt/m %1,%0\n\t"
: "=f"(new_x), "=&f"(tmp1)
: "f"(x));
diff --git a/sysdeps/alpha/fpu/s_floorf.c b/sysdeps/alpha/fpu/s_floorf.c
index 8cd80e2..015c04f 100644
--- a/sysdeps/alpha/fpu/s_floorf.c
+++ b/sysdeps/alpha/fpu/s_floorf.c
@@ -26,6 +26,9 @@
float
__floorf (float x)
{
+ if (isnanf (x))
+ return x + x;
+
if (isless (fabsf (x), 16777216.0f)) /* 1 << FLT_MANT_DIG */
{
/* Note that Alpha S_Floating is stored in registers in a
@@ -36,11 +39,7 @@ __floorf (float x)
float tmp1, tmp2, new_x;
__asm ("cvtst/s %3,%2\n\t"
-#ifdef _IEEE_FP_INEXACT
- "cvttq/svim %2,%1\n\t"
-#else
"cvttq/svm %2,%1\n\t"
-#endif
"cvtqt/m %1,%0\n\t"
: "=f"(new_x), "=&f"(tmp1), "=&f"(tmp2)
: "f"(x));
http://sourceware.org/git/gitweb.cgi?p=glibc.git;a=commitdiff;h=2afb8a945ddc104c5ef9aa61f32427c19b681232
commit 2afb8a945ddc104c5ef9aa61f32427c19b681232
Author: Aurelien Jarno <aurelien@aurel32.net>
Date: Tue Aug 2 09:18:59 2016 +0200
alpha: fix ceil on sNaN input
The alpha version of ceil wrongly returns sNaN for sNaN input. Fix that
by checking for NaN and returning the input value added to itself
in that case.
Finally, remove the code that handles the inexact exception; ceil should
never generate such an exception.
Changelog:
* sysdeps/alpha/fpu/s_ceil.c (__ceil): Add argument with itself
when it is a NaN.
[_IEEE_FP_INEXACT] Remove.
* sysdeps/alpha/fpu/s_ceilf.c (__ceilf): Likewise.
(cherry picked from commit 062e53c195b4a87754632c7d51254867247698b4)
(cherry picked from commit 3eff6f84311d2679a58a637e3be78b4ced275762)
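The diff context for this commit shows the input being negated before a round-toward-minus-infinity conversion, which suggests the identity ceil (x) = -floor (-x). A minimal portable sketch of that identity, with the new NaN check in front, could look as follows. The helper names are hypothetical, and the truncating cast stands in for the alpha conversion instructions.

```c
#include <math.h>

/* Floor for finite values with magnitude below 2^53.  */
static double
floor_small (double x)
{
  double r = (double) (long long) x;   /* truncates toward zero */
  return r > x ? r - 1.0 : r;
}

static double
ceil_sketch (double x)
{
  /* New in the patch: quiet a signaling NaN instead of returning it.  */
  if (isnan (x))
    return x + x;

  if (isless (fabs (x), 9007199254740992.0))   /* 1 << DBL_MANT_DIG */
    {
      /* Round the negated input down, then negate the result back.  */
      double r = -floor_small (-x);
      return copysign (r, x);   /* e.g. ceil (-0.5) is -0.0 */
    }

  return x;
}
```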
diff --git a/sysdeps/alpha/fpu/s_ceil.c b/sysdeps/alpha/fpu/s_ceil.c
index c1ff864..e9c350a 100644
--- a/sysdeps/alpha/fpu/s_ceil.c
+++ b/sysdeps/alpha/fpu/s_ceil.c
@@ -26,17 +26,16 @@
double
__ceil (double x)
{
+ if (isnan (x))
+ return x + x;
+
if (isless (fabs (x), 9007199254740992.0)) /* 1 << DBL_MANT_DIG */
{
double tmp1, new_x;
new_x = -x;
__asm (
-#ifdef _IEEE_FP_INEXACT
- "cvttq/svim %2,%1\n\t"
-#else
"cvttq/svm %2,%1\n\t"
-#endif
"cvtqt/m %1,%0\n\t"
: "=f"(new_x), "=&f"(tmp1)
: "f"(new_x));
diff --git a/sysdeps/alpha/fpu/s_ceilf.c b/sysdeps/alpha/fpu/s_ceilf.c
index 7e63a6f..77e01a9 100644
--- a/sysdeps/alpha/fpu/s_ceilf.c
+++ b/sysdeps/alpha/fpu/s_ceilf.c
@@ -25,6 +25,9 @@
float
__ceilf (float x)
{
+ if (isnanf (x))
+ return x + x;
+
if (isless (fabsf (x), 16777216.0f)) /* 1 << FLT_MANT_DIG */
{
/* Note that Alpha S_Floating is stored in registers in a
@@ -36,11 +39,7 @@ __ceilf (float x)
new_x = -x;
__asm ("cvtst/s %3,%2\n\t"
-#ifdef _IEEE_FP_INEXACT
- "cvttq/svim %2,%1\n\t"
-#else
"cvttq/svm %2,%1\n\t"
-#endif
"cvtqt/m %1,%0\n\t"
: "=f"(new_x), "=&f"(tmp1), "=&f"(tmp2)
: "f"(new_x));
http://sourceware.org/git/gitweb.cgi?p=glibc.git;a=commitdiff;h=df13b9c22a0fb690a0ab9dd4af163ae3c459d975
commit df13b9c22a0fb690a0ab9dd4af163ae3c459d975
Author: H.J. Lu <hjl.tools@gmail.com>
Date: Tue Sep 6 08:50:55 2016 -0700
X86-64: Add _dl_runtime_resolve_avx[512]_{opt|slow} [BZ #20508]
There is a transition penalty when SSE instructions are mixed with 256-bit
AVX or 512-bit AVX512 load instructions. Since _dl_runtime_resolve_avx
and _dl_runtime_profile_avx512 save/restore 256-bit YMM/512-bit ZMM
registers, there is a transition penalty when SSE instructions are used
with lazy binding on AVX and AVX512 processors.
To avoid the SSE transition penalty, if only the lower 128 bits of the first
8 vector registers are non-zero, we can preserve %xmm0 - %xmm7 registers
with the zero upper bits.
For AVX and AVX512 processors which support XGETBV with ECX == 1, we can
use XGETBV with ECX == 1 to check if the upper 128 bits of YMM registers
or the upper 256 bits of ZMM registers are zero. We can restore only the
non-zero portion of vector registers with AVX/AVX512 load instructions
which will zero-extend upper bits of vector registers.
This patch adds _dl_runtime_resolve_sse_vex which saves and restores
XMM registers with 128-bit AVX store/load instructions. It is used to
preserve YMM/ZMM registers when only the lower 128 bits are non-zero.
_dl_runtime_resolve_avx_opt and _dl_runtime_resolve_avx512_opt are added
and used on AVX/AVX512 processors supporting XGETBV with ECX == 1 so
that we store and load only the non-zero portion of vector registers.
This avoids the SSE transition penalty caused by _dl_runtime_resolve_avx and
_dl_runtime_profile_avx512 when only the lower 128 bits of vector
registers are used.
_dl_runtime_resolve_avx_slow is added and used for AVX processors which
don't support XGETBV with ECX == 1. Since there is no SSE transition
penalty on AVX512 processors which don't support XGETBV with ECX == 1,
_dl_runtime_resolve_avx512_slow isn't provided.
[BZ #20495]
[BZ #20508]
* sysdeps/x86/cpu-features.c (init_cpu_features): For Intel
processors, set Use_dl_runtime_resolve_slow and set
Use_dl_runtime_resolve_opt if XGETBV supports ECX == 1.
* sysdeps/x86/cpu-features.h (bit_arch_Use_dl_runtime_resolve_opt):
New.
(bit_arch_Use_dl_runtime_resolve_slow): Likewise.
(index_arch_Use_dl_runtime_resolve_opt): Likewise.
(index_arch_Use_dl_runtime_resolve_slow): Likewise.
* sysdeps/x86_64/dl-machine.h (elf_machine_runtime_setup): Use
_dl_runtime_resolve_avx512_opt and _dl_runtime_resolve_avx_opt
if Use_dl_runtime_resolve_opt is set. Use
_dl_runtime_resolve_slow if Use_dl_runtime_resolve_slow is set.
* sysdeps/x86_64/dl-trampoline.S: Include <cpu-features.h>.
(_dl_runtime_resolve_opt): New. Defined for AVX and AVX512.
(_dl_runtime_resolve): Add one for _dl_runtime_resolve_sse_vex.
* sysdeps/x86_64/dl-trampoline.h (_dl_runtime_resolve_avx_slow):
New.
(_dl_runtime_resolve_opt): Likewise.
(_dl_runtime_profile): Define only if _dl_runtime_profile is
defined.
(cherry picked from commit fb0f7a6755c1bfaec38f490fbfcaa39a66ee3604)
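The ChangeLog entries above describe a two-step selection: init_cpu_features sets two flags on Intel processors, and elf_machine_runtime_setup later picks a trampoline from them. The following is a minimal sketch of that decision logic as plain C, with no CPUID or GOT access. The struct, enum and function names are hypothetical; the use of bit 2 of EAX from CPUID leaf 0xd, sub-leaf 1 as the "XGETBV supports ECX == 1" indicator follows the Intel documentation.

```c
#include <stdbool.h>

struct arch_features
{
  bool avx512f_usable;
  bool avx_usable;
  bool use_resolve_opt;    /* models Use_dl_runtime_resolve_opt */
  bool use_resolve_slow;   /* models Use_dl_runtime_resolve_slow */
};

enum trampoline
{
  TRAMP_SSE, TRAMP_AVX, TRAMP_AVX_SLOW, TRAMP_AVX_OPT,
  TRAMP_AVX512, TRAMP_AVX512_OPT
};

/* Intel branch: always prefer the slow variant over the plain one, and
   enable the opt variant when XGETBV with ECX == 1 is available.  */
static void
set_intel_resolve_flags (struct arch_features *f, unsigned int max_cpuid,
                         unsigned int cpuid_0xd_1_eax)
{
  f->use_resolve_slow = true;
  f->use_resolve_opt = (max_cpuid >= 0xd
                        && (cpuid_0xd_1_eax & (1u << 2)) != 0);
}

/* Pick the lazy-binding trampoline; there is no AVX512 slow variant.  */
static enum trampoline
select_trampoline (const struct arch_features *f)
{
  if (f->avx512f_usable)
    return f->use_resolve_opt ? TRAMP_AVX512_OPT : TRAMP_AVX512;
  if (f->avx_usable)
    {
      if (f->use_resolve_opt)
        return TRAMP_AVX_OPT;
      if (f->use_resolve_slow)
        return TRAMP_AVX_SLOW;
      return TRAMP_AVX;
    }
  return TRAMP_SSE;
}
```

Processors that never run the Intel branch keep both flags clear in this model and therefore fall back to the pre-existing trampolines.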
diff --git a/ChangeLog b/ChangeLog
index a51771c..406a1f2 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,28 @@
+2016-11-30 H.J. Lu <hongjiu.lu@intel.com>
+
+ [BZ #20495]
+ [BZ #20508]
+ * sysdeps/x86/cpu-features.c (init_cpu_features): For Intel
+ processors, set Use_dl_runtime_resolve_slow and set
+ Use_dl_runtime_resolve_opt if XGETBV suports ECX == 1.
+ * sysdeps/x86/cpu-features.h (bit_arch_Use_dl_runtime_resolve_opt):
+ New.
+ (bit_arch_Use_dl_runtime_resolve_slow): Likewise.
+ (index_arch_Use_dl_runtime_resolve_opt): Likewise.
+ (index_arch_Use_dl_runtime_resolve_slow): Likewise.
+ * sysdeps/x86_64/dl-machine.h (elf_machine_runtime_setup): Use
+ _dl_runtime_resolve_avx512_opt and _dl_runtime_resolve_avx_opt
+ if Use_dl_runtime_resolve_opt is set. Use
+ _dl_runtime_resolve_slow if Use_dl_runtime_resolve_slow is set.
+ * sysdeps/x86_64/dl-trampoline.S: Include <cpu-features.h>.
+ (_dl_runtime_resolve_opt): New. Defined for AVX and AVX512.
+ (_dl_runtime_resolve): Add one for _dl_runtime_resolve_sse_vex.
+ * sysdeps/x86_64/dl-trampoline.h (_dl_runtime_resolve_avx_slow):
+ New.
+ (_dl_runtime_resolve_opt): Likewise.
+ (_dl_runtime_profile): Define only if _dl_runtime_profile is
+ defined.
+
2016-11-03 Joseph Myers <joseph@codesourcery.com>
* conform/Makefile ($(linknamespace-header-tests)): Also depend on
http://sourceware.org/git/gitweb.cgi?p=glibc.git;a=commitdiff;h=b4391b0c7def246a4503db1af683122681c12a56
commit b4391b0c7def246a4503db1af683122681c12a56
Author: H.J. Lu <hjl.tools@gmail.com>
Date: Tue Sep 6 08:50:55 2016 -0700
X86-64: Add _dl_runtime_resolve_avx[512]_{opt|slow} [BZ #20508]
There is a transition penalty when SSE instructions are mixed with 256-bit
AVX or 512-bit AVX512 load instructions. Since _dl_runtime_resolve_avx
and _dl_runtime_profile_avx512 save/restore 256-bit YMM/512-bit ZMM
registers, there is a transition penalty when SSE instructions are used
with lazy binding on AVX and AVX512 processors.
To avoid the SSE transition penalty, if only the lower 128 bits of the first
8 vector registers are non-zero, we can preserve %xmm0 - %xmm7 registers
with the zero upper bits.
For AVX and AVX512 processors which support XGETBV with ECX == 1, we can
use XGETBV with ECX == 1 to check if the upper 128 bits of YMM registers
or the upper 256 bits of ZMM registers are zero. We can restore only the
non-zero portion of vector registers with AVX/AVX512 load instructions
which will zero-extend upper bits of vector registers.
This patch adds _dl_runtime_resolve_sse_vex which saves and restores
XMM registers with 128-bit AVX store/load instructions. It is used to
preserve YMM/ZMM registers when only the lower 128 bits are non-zero.
_dl_runtime_resolve_avx_opt and _dl_runtime_resolve_avx512_opt are added
and used on AVX/AVX512 processors supporting XGETBV with ECX == 1 so
that we store and load only the non-zero portion of vector registers.
This avoids the SSE transition penalty caused by _dl_runtime_resolve_avx and
_dl_runtime_profile_avx512 when only the lower 128 bits of vector
registers are used.
_dl_runtime_resolve_avx_slow is added and used for AVX processors which
don't support XGETBV with ECX == 1. Since there is no SSE transition
penalty on AVX512 processors which don't support XGETBV with ECX == 1,
_dl_runtime_resolve_avx512_slow isn't provided.
[BZ #20495]
[BZ #20508]
* sysdeps/x86/cpu-features.c (init_cpu_features): For Intel
processors, set Use_dl_runtime_resolve_slow and set
Use_dl_runtime_resolve_opt if XGETBV supports ECX == 1.
* sysdeps/x86/cpu-features.h (bit_arch_Use_dl_runtime_resolve_opt):
New.
(bit_arch_Use_dl_runtime_resolve_slow): Likewise.
(index_arch_Use_dl_runtime_resolve_opt): Likewise.
(index_arch_Use_dl_runtime_resolve_slow): Likewise.
* sysdeps/x86_64/dl-machine.h (elf_machine_runtime_setup): Use
_dl_runtime_resolve_avx512_opt and _dl_runtime_resolve_avx_opt
if Use_dl_runtime_resolve_opt is set. Use
_dl_runtime_resolve_slow if Use_dl_runtime_resolve_slow is set.
* sysdeps/x86_64/dl-trampoline.S: Include <cpu-features.h>.
(_dl_runtime_resolve_opt): New. Defined for AVX and AVX512.
(_dl_runtime_resolve): Add one for _dl_runtime_resolve_sse_vex.
* sysdeps/x86_64/dl-trampoline.h (_dl_runtime_resolve_avx_slow):
New.
(_dl_runtime_resolve_opt): Likewise.
(_dl_runtime_profile): Define only if _dl_runtime_profile is
defined.
(cherry picked from commit fb0f7a6755c1bfaec38f490fbfcaa39a66ee3604)
(cherry picked from commit 4b8790c81c1a7b870a43810ec95e08a2e501123d)
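The run-time check performed by the _opt trampolines can be modelled as a pure function of the bitmap that XGETBV returns for ECX == 1. This is a sketch under stated assumptions: the bit positions (bit 2 for the upper halves of YMM registers, bit 6 for the upper halves of ZMM0-15) come from the XSAVE state-component numbering and are not shown in this email, and the enum and function names are hypothetical.

```c
#include <stdint.h>

#define BIT_YMM_STATE      (1u << 2)   /* upper 128 bits of YMM in use */
#define BIT_ZMM0_15_STATE  (1u << 6)   /* upper 256 bits of ZMM0-15 in use */

enum resolver { RESOLVE_SSE_VEX, RESOLVE_AVX, RESOLVE_AVX512 };

/* VEC_SIZE is 32 for the AVX variant and 64 for the AVX512 variant;
   XINUSE is the low word returned by XGETBV with ECX == 1.  */
static enum resolver
pick_by_xinuse (unsigned int vec_size, uint32_t xinuse)
{
  if (vec_size == 32)
    /* Only XMM-sized saves are needed when YMM state is not in use.  */
    return (xinuse & BIT_YMM_STATE) ? RESOLVE_AVX : RESOLVE_SSE_VEX;

  /* vec_size == 64: mirror the andl/cmpl/jl/je sequence.  */
  uint32_t state = xinuse & (BIT_YMM_STATE | BIT_ZMM0_15_STATE);
  if (state < BIT_YMM_STATE)
    return RESOLVE_SSE_VEX;   /* neither YMM nor ZMM state in use */
  if (state == BIT_YMM_STATE)
    return RESOLVE_AVX;       /* YMM in use, ZMM not */
  return RESOLVE_AVX512;      /* fall through to the full save */
}
```

Keeping the decision separate from the XGETBV instruction itself makes it possible to exercise every branch on hardware that has no AVX512 support.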
diff --git a/sysdeps/x86/cpu-features.c b/sysdeps/x86/cpu-features.c
index 9ce4b49..11b9af2 100644
--- a/sysdeps/x86/cpu-features.c
+++ b/sysdeps/x86/cpu-features.c
@@ -205,6 +205,20 @@ init_cpu_features (struct cpu_features *cpu_features)
if (CPU_FEATURES_ARCH_P (cpu_features, AVX2_Usable))
cpu_features->feature[index_arch_AVX_Fast_Unaligned_Load]
|= bit_arch_AVX_Fast_Unaligned_Load;
+
+ /* To avoid SSE transition penalty, use _dl_runtime_resolve_slow.
+ If XGETBV suports ECX == 1, use _dl_runtime_resolve_opt. */
+ cpu_features->feature[index_arch_Use_dl_runtime_resolve_slow]
+ |= bit_arch_Use_dl_runtime_resolve_slow;
+ if (cpu_features->max_cpuid >= 0xd)
+ {
+ unsigned int eax;
+
+ __cpuid_count (0xd, 1, eax, ebx, ecx, edx);
+ if ((eax & (1 << 2)) != 0)
+ cpu_features->feature[index_arch_Use_dl_runtime_resolve_opt]
+ |= bit_arch_Use_dl_runtime_resolve_opt;
+ }
}
/* This spells out "AuthenticAMD". */
else if (ebx == 0x68747541 && ecx == 0x444d4163 && edx == 0x69746e65)
diff --git a/sysdeps/x86/cpu-features.h b/sysdeps/x86/cpu-features.h
index 97ffe76..a8b5a73 100644
--- a/sysdeps/x86/cpu-features.h
+++ b/sysdeps/x86/cpu-features.h
@@ -37,6 +37,8 @@
#define bit_arch_Prefer_No_VZEROUPPER (1 << 17)
#define bit_arch_Fast_Unaligned_Copy (1 << 18)
#define bit_arch_Prefer_ERMS (1 << 19)
+#define bit_arch_Use_dl_runtime_resolve_opt (1 << 20)
+#define bit_arch_Use_dl_runtime_resolve_slow (1 << 21)
/* CPUID Feature flags. */
@@ -107,6 +109,8 @@
# define index_arch_Prefer_No_VZEROUPPER FEATURE_INDEX_1*FEATURE_SIZE
# define index_arch_Fast_Unaligned_Copy FEATURE_INDEX_1*FEATURE_SIZE
# define index_arch_Prefer_ERMS FEATURE_INDEX_1*FEATURE_SIZE
+# define index_arch_Use_dl_runtime_resolve_opt FEATURE_INDEX_1*FEATURE_SIZE
+# define index_arch_Use_dl_runtime_resolve_slow FEATURE_INDEX_1*FEATURE_SIZE
# if defined (_LIBC) && !IS_IN (nonlib)
@@ -277,6 +281,8 @@ extern const struct cpu_features *__get_cpu_features (void)
# define index_arch_Prefer_No_VZEROUPPER FEATURE_INDEX_1
# define index_arch_Fast_Unaligned_Copy FEATURE_INDEX_1
# define index_arch_Prefer_ERMS FEATURE_INDEX_1
+# define index_arch_Use_dl_runtime_resolve_opt FEATURE_INDEX_1
+# define index_arch_Use_dl_runtime_resolve_slow FEATURE_INDEX_1
#endif /* !__ASSEMBLER__ */
diff --git a/sysdeps/x86_64/dl-machine.h b/sysdeps/x86_64/dl-machine.h
index ed0c1a8..c0f0fa1 100644
--- a/sysdeps/x86_64/dl-machine.h
+++ b/sysdeps/x86_64/dl-machine.h
@@ -68,7 +68,10 @@ elf_machine_runtime_setup (struct link_map *l, int lazy, int profile)
Elf64_Addr *got;
extern void _dl_runtime_resolve_sse (ElfW(Word)) attribute_hidden;
extern void _dl_runtime_resolve_avx (ElfW(Word)) attribute_hidden;
+ extern void _dl_runtime_resolve_avx_slow (ElfW(Word)) attribute_hidden;
+ extern void _dl_runtime_resolve_avx_opt (ElfW(Word)) attribute_hidden;
extern void _dl_runtime_resolve_avx512 (ElfW(Word)) attribute_hidden;
+ extern void _dl_runtime_resolve_avx512_opt (ElfW(Word)) attribute_hidden;
extern void _dl_runtime_profile_sse (ElfW(Word)) attribute_hidden;
extern void _dl_runtime_profile_avx (ElfW(Word)) attribute_hidden;
extern void _dl_runtime_profile_avx512 (ElfW(Word)) attribute_hidden;
@@ -118,9 +121,26 @@ elf_machine_runtime_setup (struct link_map *l, int lazy, int profile)
indicated by the offset on the stack, and then jump to
the resolved address. */
if (HAS_ARCH_FEATURE (AVX512F_Usable))
- *(ElfW(Addr) *) (got + 2) = (ElfW(Addr)) &_dl_runtime_resolve_avx512;
+ {
+ if (HAS_ARCH_FEATURE (Use_dl_runtime_resolve_opt))
+ *(ElfW(Addr) *) (got + 2)
+ = (ElfW(Addr)) &_dl_runtime_resolve_avx512_opt;
+ else
+ *(ElfW(Addr) *) (got + 2)
+ = (ElfW(Addr)) &_dl_runtime_resolve_avx512;
+ }
else if (HAS_ARCH_FEATURE (AVX_Usable))
- *(ElfW(Addr) *) (got + 2) = (ElfW(Addr)) &_dl_runtime_resolve_avx;
+ {
+ if (HAS_ARCH_FEATURE (Use_dl_runtime_resolve_opt))
+ *(ElfW(Addr) *) (got + 2)
+ = (ElfW(Addr)) &_dl_runtime_resolve_avx_opt;
+ else if (HAS_ARCH_FEATURE (Use_dl_runtime_resolve_slow))
+ *(ElfW(Addr) *) (got + 2)
+ = (ElfW(Addr)) &_dl_runtime_resolve_avx_slow;
+ else
+ *(ElfW(Addr) *) (got + 2)
+ = (ElfW(Addr)) &_dl_runtime_resolve_avx;
+ }
else
*(ElfW(Addr) *) (got + 2) = (ElfW(Addr)) &_dl_runtime_resolve_sse;
}
diff --git a/sysdeps/x86_64/dl-trampoline.S b/sysdeps/x86_64/dl-trampoline.S
index 12f1a5c..39f595e 100644
--- a/sysdeps/x86_64/dl-trampoline.S
+++ b/sysdeps/x86_64/dl-trampoline.S
@@ -18,6 +18,7 @@
#include <config.h>
#include <sysdep.h>
+#include <cpu-features.h>
#include <link-defines.h>
#ifndef DL_STACK_ALIGNMENT
@@ -86,9 +87,11 @@
#endif
#define VEC(i) zmm##i
#define _dl_runtime_resolve _dl_runtime_resolve_avx512
+#define _dl_runtime_resolve_opt _dl_runtime_resolve_avx512_opt
#define _dl_runtime_profile _dl_runtime_profile_avx512
#include "dl-trampoline.h"
#undef _dl_runtime_resolve
+#undef _dl_runtime_resolve_opt
#undef _dl_runtime_profile
#undef VEC
#undef VMOV
@@ -104,9 +107,11 @@
#endif
#define VEC(i) ymm##i
#define _dl_runtime_resolve _dl_runtime_resolve_avx
+#define _dl_runtime_resolve_opt _dl_runtime_resolve_avx_opt
#define _dl_runtime_profile _dl_runtime_profile_avx
#include "dl-trampoline.h"
#undef _dl_runtime_resolve
+#undef _dl_runtime_resolve_opt
#undef _dl_runtime_profile
#undef VEC
#undef VMOV
@@ -126,3 +131,18 @@
#define _dl_runtime_profile _dl_runtime_profile_sse
#undef RESTORE_AVX
#include "dl-trampoline.h"
+#undef _dl_runtime_resolve
+#undef _dl_runtime_profile
+#undef VMOV
+#undef VMOVA
+
+/* Used by _dl_runtime_resolve_avx_opt/_dl_runtime_resolve_avx512_opt
+ to preserve the full vector registers with zero upper bits. */
+#define VMOVA vmovdqa
+#if DL_RUNTIME_RESOLVE_REALIGN_STACK || VEC_SIZE <= DL_STACK_ALIGNMENT
+# define VMOV vmovdqa
+#else
+# define VMOV vmovdqu
+#endif
+#define _dl_runtime_resolve _dl_runtime_resolve_sse_vex
+#include "dl-trampoline.h"
diff --git a/sysdeps/x86_64/dl-trampoline.h b/sysdeps/x86_64/dl-trampoline.h
index b90836a..abe4471 100644
--- a/sysdeps/x86_64/dl-trampoline.h
+++ b/sysdeps/x86_64/dl-trampoline.h
@@ -50,6 +50,105 @@
#endif
.text
+#ifdef _dl_runtime_resolve_opt
+/* Use the smallest vector registers to preserve the full YMM/ZMM
+ registers to avoid SSE transition penalty. */
+
+# if VEC_SIZE == 32
+/* Check if the upper 128 bits in %ymm0 - %ymm7 registers are non-zero
+ and preserve %xmm0 - %xmm7 registers with the zero upper bits. Since
+ there is no SSE transition penalty on AVX512 processors which don't
+ support XGETBV with ECX == 1, _dl_runtime_resolve_avx512_slow isn't
+ provided. */
+ .globl _dl_runtime_resolve_avx_slow
+ .hidden _dl_runtime_resolve_avx_slow
+ .type _dl_runtime_resolve_avx_slow, @function
+ .align 16
+_dl_runtime_resolve_avx_slow:
+ cfi_startproc
+ cfi_adjust_cfa_offset(16) # Incorporate PLT
+ vorpd %ymm0, %ymm1, %ymm8
+ vorpd %ymm2, %ymm3, %ymm9
+ vorpd %ymm4, %ymm5, %ymm10
+ vorpd %ymm6, %ymm7, %ymm11
+ vorpd %ymm8, %ymm9, %ymm9
+ vorpd %ymm10, %ymm11, %ymm10
+ vpcmpeqd %xmm8, %xmm8, %xmm8
+ vorpd %ymm9, %ymm10, %ymm10
+ vptest %ymm10, %ymm8
+ # Preserve %ymm0 - %ymm7 registers if the upper 128 bits of any
+ # %ymm0 - %ymm7 registers aren't zero.
+ PRESERVE_BND_REGS_PREFIX
+ jnc _dl_runtime_resolve_avx
+ # Use vzeroupper to avoid SSE transition penalty.
+ vzeroupper
+ # Preserve %xmm0 - %xmm7 registers with the zero upper 128 bits
+ # when the upper 128 bits of %ymm0 - %ymm7 registers are zero.
+ PRESERVE_BND_REGS_PREFIX
+ jmp _dl_runtime_resolve_sse_vex
+ cfi_adjust_cfa_offset(-16) # Restore PLT adjustment
+ cfi_endproc
+ .size _dl_runtime_resolve_avx_slow, .-_dl_runtime_resolve_avx_slow
+# endif
+
+/* Use XGETBV with ECX == 1 to check which bits in vector registers are
+ non-zero and only preserve the non-zero lower bits with zero upper
+ bits. */
+ .globl _dl_runtime_resolve_opt
+ .hidden _dl_runtime_resolve_opt
+ .type _dl_runtime_resolve_opt, @function
+ .align 16
+_dl_runtime_resolve_opt:
+ cfi_startproc
+ cfi_adjust_cfa_offset(16) # Incorporate PLT
+ pushq %rax
+ cfi_adjust_cfa_offset(8)
+ cfi_rel_offset(%rax, 0)
+ pushq %rcx
+ cfi_adjust_cfa_offset(8)
+ cfi_rel_offset(%rcx, 0)
+ pushq %rdx
+ cfi_adjust_cfa_offset(8)
+ cfi_rel_offset(%rdx, 0)
+ movl $1, %ecx
+ xgetbv
+ movl %eax, %r11d
+ popq %rdx
+ cfi_adjust_cfa_offset(-8)
+ cfi_restore (%rdx)
+ popq %rcx
+ cfi_adjust_cfa_offset(-8)
+ cfi_restore (%rcx)
+ popq %rax
+ cfi_adjust_cfa_offset(-8)
+ cfi_restore (%rax)
+# if VEC_SIZE == 32
+ # For YMM registers, check if YMM state is in use.
+ andl $bit_YMM_state, %r11d
+ # Preserve %xmm0 - %xmm7 registers with the zero upper 128 bits if
+ # YMM state isn't in use.
+ PRESERVE_BND_REGS_PREFIX
+ jz _dl_runtime_resolve_sse_vex
+# elif VEC_SIZE == 64
+ # For ZMM registers, check if YMM state and ZMM state are in
+ # use.
+ andl $(bit_YMM_state | bit_ZMM0_15_state), %r11d
+ cmpl $bit_YMM_state, %r11d
+ # Preserve %xmm0 - %xmm7 registers with the zero upper 384 bits if
+ # neither YMM state nor ZMM state are in use.
+ PRESERVE_BND_REGS_PREFIX
+ jl _dl_runtime_resolve_sse_vex
+ # Preserve %ymm0 - %ymm7 registers with the zero upper 256 bits if
+ # ZMM state isn't in use.
+ PRESERVE_BND_REGS_PREFIX
+ je _dl_runtime_resolve_avx
+# else
+# error Unsupported VEC_SIZE!
+# endif
+ cfi_adjust_cfa_offset(-16) # Restore PLT adjustment
+ cfi_endproc
+ .size _dl_runtime_resolve_opt, .-_dl_runtime_resolve_opt
+#endif
.globl _dl_runtime_resolve
.hidden _dl_runtime_resolve
.type _dl_runtime_resolve, @function
@@ -162,7 +261,10 @@ _dl_runtime_resolve:
.size _dl_runtime_resolve, .-_dl_runtime_resolve
-#ifndef PROF
+/* To preserve %xmm0 - %xmm7 registers, dl-trampoline.h is included
+ twice, for _dl_runtime_resolve_sse and _dl_runtime_resolve_sse_vex.
+ But we don't need another _dl_runtime_profile for XMM registers. */
+#if !defined PROF && defined _dl_runtime_profile
# if (LR_VECTOR_OFFSET % VEC_SIZE) != 0
# error LR_VECTOR_OFFSET must be multples of VEC_SIZE
# endif
http://sourceware.org/git/gitweb.cgi?p=glibc.git;a=commitdiff;h=0d5f4a32a34f048b35360a110a0e6d1c87e3eced
commit 0d5f4a32a34f048b35360a110a0e6d1c87e3eced
Author: Aurelien Jarno <aurelien@aurel32.net>
Date: Thu Nov 24 12:10:13 2016 +0100
x86_64: fix static build of __memcpy_chk for compilers defaulting to PIC/PIE
When glibc is compiled with a gcc 6.2 that has been configured
to default to PIC/PIE, the static version of __memcpy_chk is not built,
as the test is done on PIC instead of SHARED. Fix the test to check for
SHARED, as is done for similar functions like memmove_chk.
Changelog:
* sysdeps/x86_64/memcpy_chk.S (__memcpy_chk): Check for SHARED
instead of PIC.
(cherry picked from commit 380ec16d62f459d5a28cfc25b7b20990c45e1cc9)
(cherry picked from commit 2d16e81babd1d7b66d10cec0bc6d6d86a7e0c95e)
diff --git a/sysdeps/x86_64/memcpy_chk.S b/sysdeps/x86_64/memcpy_chk.S
index 2296b55..a95b3ad 100644
--- a/sysdeps/x86_64/memcpy_chk.S
+++ b/sysdeps/x86_64/memcpy_chk.S
@@ -19,7 +19,7 @@
#include <sysdep.h>
#include "asm-syntax.h"
-#ifndef PIC
+#ifndef SHARED
/* For libc.so this is defined in memcpy.S.
For libc.a, this is a separate source to avoid
memcpy bringing in __chk_fail and all routines
http://sourceware.org/git/gitweb.cgi?p=glibc.git;a=commitdiff;h=0ab02a62e42e63b058e7a4e160dbe51762ef2c46
commit 0ab02a62e42e63b058e7a4e160dbe51762ef2c46
Author: Maciej W. Rozycki <macro@imgtec.com>
Date: Thu Nov 17 19:15:51 2016 +0000
MIPS: Add `.insn' to ensure a text label is defined as code not data
Avoid a build error with microMIPS compilation and recent versions of
GAS which complain if a branch targets a label which is marked as data
rather than microMIPS code:
../sysdeps/mips/mips32/crti.S: Assembler messages:
../sysdeps/mips/mips32/crti.S:72: Error: branch to a symbol in another ISA mode
make[2]: *** [.../csu/crti.o] Error 1
as commit 9d862524f6ae ("MIPS: Verify the ISA mode and alignment of
branch and jump targets") closed a hole in branch processing, making
relocation calculation respect the ISA mode of the symbol referred.
This allowed diagnosing the situation where an attempt is made to pass
control from code assembled for one ISA mode to code assembled for a
different ISA mode and either relaxing the branch to a cross-mode jump
or if that is not possible, then reporting this as an error rather than
letting such code build and then fail unpredictably at run time.
This however requires the correct annotation of branch targets as code,
because the ISA mode is not relevant for data symbols and is therefore
not recorded for them. The `.insn' pseudo-op is used for this purpose
and has been supported by GAS since:
Wed Feb 12 14:36:29 1997 Ian Lance Taylor <ian@cygnus.com>
* config/tc-mips.c (mips_pseudo_table): Add "insn".
(s_insn): New static function.
* doc/c-mips.texi: Document .insn.
so there has been no reason to avoid it where required. More recently
this pseudo-op has been documented, by the microMIPS architecture
specification[1][2], as required for the correct interpretation of any
code label which is not followed by an actual instruction in an assembly
source.
Use it in our crti.S files then, to mark that the trailing label there
with no instructions following is indeed not a code bug and the branch
is legitimate.
References:
[1] "MIPS Architecture for Programmers, Volume II-B: The microMIPS32
Instruction Set", MIPS Technologies, Inc., Document Number: MD00582,
Revision 5.04, January 15, 2014, Section 7.1 "Assembly-Level
Compatibility", p. 533
[2] "MIPS Architecture for Programmers, Volume II-B: The microMIPS64
Instruction Set", MIPS Technologies, Inc., Document Number: MD00594,
Revision 5.04, January 15, 2014, Section 8.1 "Assembly-Level
Compatibility", p. 623
2016-11-23 Matthew Fortune <Matthew.Fortune@imgtec.com>
Maciej W. Rozycki <macro@imgtec.com>
* sysdeps/mips/mips32/crti.S (_init): Add `.insn' pseudo-op at
`.Lno_weak_fn' label.
* sysdeps/mips/mips64/n32/crti.S (_init): Likewise.
* sysdeps/mips/mips64/n64/crti.S (_init): Likewise.
(cherry picked from commit cfaf1949ff1f8336b54c43796d0e2531bc8a40a2)
(cherry picked from commit 65a2b63756a4d622b938910d582d8b807c471c9a)
diff --git a/sysdeps/mips/mips32/crti.S b/sysdeps/mips/mips32/crti.S
index 5c0ad73..dfbbdc4 100644
--- a/sysdeps/mips/mips32/crti.S
+++ b/sysdeps/mips/mips32/crti.S
@@ -74,6 +74,7 @@ _init:
.reloc 1f,R_MIPS_JALR,PREINIT_FUNCTION
1: jalr $25
.Lno_weak_fn:
+ .insn
#else
lw $25,%got(PREINIT_FUNCTION)($28)
.reloc 1f,R_MIPS_JALR,PREINIT_FUNCTION
diff --git a/sysdeps/mips/mips64/n32/crti.S b/sysdeps/mips/mips64/n32/crti.S
index 00b89f3..afe6d8e 100644
--- a/sysdeps/mips/mips64/n32/crti.S
+++ b/sysdeps/mips/mips64/n32/crti.S
@@ -74,6 +74,7 @@ _init:
.reloc 1f,R_MIPS_JALR,PREINIT_FUNCTION
1: jalr $25
.Lno_weak_fn:
+ .insn
#else
lw $25,%got_disp(PREINIT_FUNCTION)($28)
.reloc 1f,R_MIPS_JALR,PREINIT_FUNCTION
diff --git a/sysdeps/mips/mips64/n64/crti.S b/sysdeps/mips/mips64/n64/crti.S
index f59b20c..4049d29 100644
--- a/sysdeps/mips/mips64/n64/crti.S
+++ b/sysdeps/mips/mips64/n64/crti.S
@@ -74,6 +74,7 @@ _init:
.reloc 1f,R_MIPS_JALR,PREINIT_FUNCTION
1: jalr $25
.Lno_weak_fn:
+ .insn
#else
ld $25,%got_disp(PREINIT_FUNCTION)($28)
.reloc 1f,R_MIPS_JALR,PREINIT_FUNCTION
http://sourceware.org/git/gitweb.cgi?p=glibc.git;a=commitdiff;h=901db98f36690e4743feefd985c6ba2d7fd19813
commit 901db98f36690e4743feefd985c6ba2d7fd19813
Author: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Date: Mon Nov 21 11:06:15 2016 -0200
Fix writes past the allocated array bounds in execvpe (BZ#20847)
This patch fixes invalid writes past the end of stack-allocated buffers
in 2 places in the execvpe implementation:
1. On 'maybe_script_execute' function where it allocates the new
argument list and it does not account that a minimum of argc
plus 3 elements (default shell path, script name, arguments,
and ending null pointer) should be considered. The straightforward
fix is just to take account of the correct list size on argument
copy.
2. On '__execvpe' where the executable file name length may not
account for ending '\0' and thus subsequent path creation may
write past array bounds because it needs to add the terminating
null. The fix is to change how to calculate the executable name
size to add the final '\0' and adjust the rest of the code
accordingly.
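The size arithmetic behind both fixes can be sketched in isolation. This is
an illustrative sketch only, not the glibc code: the helper names
shell_argv_slots and join_path are hypothetical, nargs here counts the real
arguments without the terminating NULL (which may differ from how the
patched function counts), and plain memcpy stands in for the mempcpy call
seen in the diff.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not a glibc function): number of char * slots
   needed for a shell fallback argument list when the original argv
   holds NARGS real arguments, not counting its terminating NULL.
   Slot 0 is the shell, slot 1 the script, then argv[1..NARGS-1],
   then the ending NULL.  The minimum is therefore 3.  */
static size_t
shell_argv_slots (size_t nargs)
{
  return nargs > 1 ? nargs + 2 : 3;
}

/* Hypothetical helper: write "<dir>/<file>" into BUF.  FILE_SIZE is
   strlen (file) + 1, so it already counts the '\0'; the space needed
   is dir_len + 1 (for '/') + file_size.  An empty DIR gets no slash.
   Returns 0 on success, -1 if BUF is too small.  */
static int
join_path (char *buf, size_t buf_size, const char *dir, size_t dir_len,
           const char *file)
{
  assert (buf != NULL && dir != NULL && file != NULL);

  size_t file_size = strlen (file) + 1;
  size_t slash = dir_len > 0 ? 1 : 0;

  if (dir_len + slash + file_size > buf_size)
    return -1;

  memcpy (buf, dir, dir_len);
  if (slash)
    buf[dir_len] = '/';
  /* Copy exactly file_size bytes: the name plus its terminator.
     Copying file_size + 1 here would be the off-by-one write.  */
  memcpy (buf + dir_len + slash, file, file_size);
  return 0;
}
```

For example, joining the 8-byte directory "/usr/bin" with "ls" needs
8 + 1 + 3 = 12 bytes, so a 12-byte buffer succeeds and an 11-byte buffer is
rejected instead of being overrun.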
As described in GCC bug report 78433 [1], these issues were masked off by
GCC because it allocated several bytes more than necessary so that many
off-by-one bugs went unnoticed.
Checked on x86_64 with a recent GCC (7.0.0 20161121) with -O3 in CFLAGS.
[BZ #20847]
* posix/execvpe.c (maybe_script_execute): Remove write past allocated
array bounds.
(__execvpe): Likewise.
[1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78433
(cherry picked from commit d174436712e3cabce70d6cd771f177b6fe0e097b)
diff --git a/posix/execvpe.c b/posix/execvpe.c
index d933f9c..7cdb06a 100644
--- a/posix/execvpe.c
+++ b/posix/execvpe.c
@@ -48,12 +48,13 @@ maybe_script_execute (const char *file, char *const argv[], char *const envp[])
}
}
- /* Construct an argument list for the shell. */
+ /* Construct an argument list for the shell. It will contain at minimum 3
+ arguments (current shell, script, and an ending NULL. */
char *new_argv[argc + 1];
new_argv[0] = (char *) _PATH_BSHELL;
new_argv[1] = (char *) file;
if (argc > 1)
- memcpy (new_argv + 2, argv + 1, argc * sizeof(char *));
+ memcpy (new_argv + 2, argv + 1, (argc - 1) * sizeof(char *));
else
new_argv[2] = NULL;
@@ -91,10 +92,11 @@ __execvpe (const char *file, char *const argv[], char *const envp[])
/* Although GLIBC does not enforce NAME_MAX, we set it as the maximum
size to avoid unbounded stack allocation. Same applies for
PATH_MAX. */
- size_t file_len = __strnlen (file, NAME_MAX + 1);
+ size_t file_len = __strnlen (file, NAME_MAX) + 1;
size_t path_len = __strnlen (path, PATH_MAX - 1) + 1;
- if ((file_len > NAME_MAX)
+ /* NAME_MAX does not include the terminating null character. */
+ if (((file_len-1) > NAME_MAX)
|| !__libc_alloca_cutoff (path_len + file_len + 1))
{
errno = ENAMETOOLONG;
@@ -103,6 +105,9 @@ __execvpe (const char *file, char *const argv[], char *const envp[])
const char *subp;
bool got_eacces = false;
+ /* The resulting string maximum size would be potentially a entry
+ in PATH plus '/' (path_len + 1) and then the the resulting file name
+ plus '\0' (file_len since it already accounts for the '\0'). */
char buffer[path_len + file_len + 1];
for (const char *p = path; ; p = subp)
{
@@ -123,7 +128,7 @@ __execvpe (const char *file, char *const argv[], char *const envp[])
execute. */
char *pend = mempcpy (buffer, p, subp - p);
*pend = '/';
- memcpy (pend + (p < subp), file, file_len + 1);
+ memcpy (pend + (p < subp), file, file_len);
__execve (buffer, argv, envp);
-----------------------------------------------------------------------
Summary of changes:
ChangeLog | 25 ++++++++++
posix/execvpe.c | 15 ++++--
sysdeps/alpha/fpu/s_ceil.c | 7 +--
sysdeps/alpha/fpu/s_ceilf.c | 7 +--
sysdeps/alpha/fpu/s_floor.c | 7 +--
sysdeps/alpha/fpu/s_floorf.c | 7 +--
sysdeps/alpha/fpu/s_rint.c | 3 +
sysdeps/alpha/fpu/s_rintf.c | 3 +
sysdeps/alpha/fpu/s_trunc.c | 7 +--
sysdeps/alpha/fpu/s_truncf.c | 7 +--
sysdeps/mips/mips32/crti.S | 1 +
sysdeps/mips/mips64/n32/crti.S | 1 +
sysdeps/mips/mips64/n64/crti.S | 1 +
sysdeps/x86/cpu-features.c | 14 +++++
sysdeps/x86/cpu-features.h | 6 ++
sysdeps/x86_64/dl-machine.h | 24 ++++++++-
sysdeps/x86_64/dl-trampoline.S | 20 ++++++++
sysdeps/x86_64/dl-trampoline.h | 104 +++++++++++++++++++++++++++++++++++++++-
sysdeps/x86_64/memcpy_chk.S | 2 +-
19 files changed, 228 insertions(+), 33 deletions(-)
hooks/post-receive
--
GNU C Library master sources