This is the mail archive of the glibc-cvs@sourceware.org mailing list for the glibc project.



GNU C Library master sources branch master updated. glibc-2.21-478-g1663be0


This is an automated email from the git hooks/post-receive script. It was
generated because a ref change was pushed to the repository containing
the project "GNU C Library master sources".

The branch, master has been updated
       via  1663be053d50c06bb0f971c87d41a7b83f96fe15 (commit)
      from  9c02f663f6b387b3905b629ffe584c9abf2030dc (commit)

Those revisions listed above that are new to this repository have
not appeared on any other notification email; so we list those
revisions in full, below.

- Log -----------------------------------------------------------------
http://sourceware.org/git/gitweb.cgi?p=glibc.git;a=commitdiff;h=1663be053d50c06bb0f971c87d41a7b83f96fe15

commit 1663be053d50c06bb0f971c87d41a7b83f96fe15
Author: Andrew Senkevich <andrew.senkevich@intel.com>
Date:   Wed Jun 17 16:10:51 2015 +0300

    Vector expf for x86_64 and tests.
    
    This is an implementation of vectorized expf containing SSE, AVX,
    AVX2 and AVX512 versions according to the Vector ABI
    <https://groups.google.com/forum/#!topic/x86-64-abi/LmppCfN1rZ4>.
    
        * sysdeps/unix/sysv/linux/x86_64/libmvec.abilist: New symbols added.
        * sysdeps/x86/fpu/bits/math-vector.h: Added SIMD declaration and asm
        redirections for expf.
        * sysdeps/x86_64/fpu/Makefile (libmvec-support): Added new files.
        * sysdeps/x86_64/fpu/Versions: New versions added.
        * sysdeps/x86_64/fpu/libm-test-ulps: Regenerated.
        * sysdeps/x86_64/fpu/multiarch/Makefile (libmvec-sysdep_routines): Added
        build of SSE, AVX2 and AVX512 IFUNC versions.
        * sysdeps/x86_64/fpu/multiarch/svml_s_expf16_core.S: New file.
        * sysdeps/x86_64/fpu/multiarch/svml_s_expf16_core_avx512.S: New file.
        * sysdeps/x86_64/fpu/multiarch/svml_s_expf4_core.S: New file.
        * sysdeps/x86_64/fpu/multiarch/svml_s_expf4_core_sse4.S: New file.
        * sysdeps/x86_64/fpu/multiarch/svml_s_expf8_core.S: New file.
        * sysdeps/x86_64/fpu/multiarch/svml_s_expf8_core_avx2.S: New file.
        * sysdeps/x86_64/fpu/svml_s_expf16_core.S: New file.
        * sysdeps/x86_64/fpu/svml_s_expf4_core.S: New file.
        * sysdeps/x86_64/fpu/svml_s_expf8_core.S: New file.
        * sysdeps/x86_64/fpu/svml_s_expf8_core_avx.S: New file.
        * sysdeps/x86_64/fpu/svml_s_expf_data.S: New file.
        * sysdeps/x86_64/fpu/svml_s_expf_data.h: New file.
        * sysdeps/x86_64/fpu/test-float-vlen16-wrappers.c: Vector expf tests.
        * sysdeps/x86_64/fpu/test-float-vlen16.c: Likewise.
        * sysdeps/x86_64/fpu/test-float-vlen4-wrappers.c: Likewise.
        * sysdeps/x86_64/fpu/test-float-vlen4.c: Likewise.
        * sysdeps/x86_64/fpu/test-float-vlen8-avx2-wrappers.c: Likewise.
        * sysdeps/x86_64/fpu/test-float-vlen8-avx2.c: Likewise.
        * sysdeps/x86_64/fpu/test-float-vlen8-wrappers.c: Likewise.
        * sysdeps/x86_64/fpu/test-float-vlen8.c: Likewise.
        * NEWS: Mention addition of x86_64 vector expf.
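
For readers not familiar with the Vector ABI naming used throughout this
patch, the new symbols decode as an ISA letter ('b' = SSE, 'c' = AVX,
'd' = AVX2, 'e' = AVX-512), 'N' for an unmasked variant, the vector
length, and 'v' for a vector argument.  The prototypes below are an
illustrative sketch only, not declarations taken from the patch:

#include <immintrin.h>

/* Sketch: the vector variants of expf added by this commit.  Arguments
   and results are passed in the corresponding vector registers.  */
__m128 _ZGVbN4v_expf (__m128 x);    /* 4 floats, SSE4 (SSE2 fallback)          */
__m256 _ZGVcN8v_expf (__m256 x);    /* 8 floats, AVX wrapper over 4-lane core  */
__m256 _ZGVdN8v_expf (__m256 x);    /* 8 floats, AVX2                          */
__m512 _ZGVeN16v_expf (__m512 x);   /* 16 floats, AVX-512                      */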

diff --git a/ChangeLog b/ChangeLog
index 62ddb2f..8122db3 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,5 +1,35 @@
 2015-06-17  Andrew Senkevich  <andrew.senkevich@intel.com>
 
+	* sysdeps/unix/sysv/linux/x86_64/libmvec.abilist: New symbols added.
+	* sysdeps/x86/fpu/bits/math-vector.h: Added SIMD declaration and asm
+	redirections for expf.
+	* sysdeps/x86_64/fpu/Makefile (libmvec-support): Added new files.
+	* sysdeps/x86_64/fpu/Versions: New versions added.
+	* sysdeps/x86_64/fpu/libm-test-ulps: Regenerated.
+	* sysdeps/x86_64/fpu/multiarch/Makefile (libmvec-sysdep_routines): Added
+	build of SSE, AVX2 and AVX512 IFUNC versions.
+	* sysdeps/x86_64/fpu/multiarch/svml_s_expf16_core.S: New file.
+	* sysdeps/x86_64/fpu/multiarch/svml_s_expf16_core_avx512.S: New file.
+	* sysdeps/x86_64/fpu/multiarch/svml_s_expf4_core.S: New file.
+	* sysdeps/x86_64/fpu/multiarch/svml_s_expf4_core_sse4.S: New file.
+	* sysdeps/x86_64/fpu/multiarch/svml_s_expf8_core.S: New file.
+	* sysdeps/x86_64/fpu/multiarch/svml_s_expf8_core_avx2.S: New file.
+	* sysdeps/x86_64/fpu/svml_s_expf16_core.S: New file.
+	* sysdeps/x86_64/fpu/svml_s_expf4_core.S: New file.
+	* sysdeps/x86_64/fpu/svml_s_expf8_core.S: New file.
+	* sysdeps/x86_64/fpu/svml_s_expf8_core_avx.S: New file.
+	* sysdeps/x86_64/fpu/svml_s_expf_data.S: New file.
+	* sysdeps/x86_64/fpu/svml_s_expf_data.h: New file.
+	* sysdeps/x86_64/fpu/test-float-vlen16-wrappers.c: Vector expf tests.
+	* sysdeps/x86_64/fpu/test-float-vlen16.c: Likewise.
+	* sysdeps/x86_64/fpu/test-float-vlen4-wrappers.c: Likewise.
+	* sysdeps/x86_64/fpu/test-float-vlen4.c: Likewise.
+	* sysdeps/x86_64/fpu/test-float-vlen8-avx2-wrappers.c: Likewise.
+	* sysdeps/x86_64/fpu/test-float-vlen8-avx2.c: Likewise.
+	* sysdeps/x86_64/fpu/test-float-vlen8-wrappers.c: Likewise.
+	* sysdeps/x86_64/fpu/test-float-vlen8.c: Likewise.
+	* NEWS: Mention addition of x86_64 vector expf.
+
 	* bits/libm-simd-decl-stubs.h: Added stubs for exp.
 	* math/bits/mathcalls.h: Added exp declaration with __MATHCALL_VEC.
 	* sysdeps/unix/sysv/linux/x86_64/libmvec.abilist: New versions added.
diff --git a/NEWS b/NEWS
index bfb4487..d66a64b 100644
--- a/NEWS
+++ b/NEWS
@@ -53,7 +53,7 @@ Version 2.22
   condition in some applications.
 
 * Added vector math library named libmvec with the following vectorized x86_64
-  implementations: cos, cosf, sin, sinf, log, logf, exp.
+  implementations: cos, cosf, sin, sinf, log, logf, exp, expf.
   The library can be disabled with --disable-mathvec. Use of the functions is
   enabled with -fopenmp -ffast-math starting from -O1 for GCC version >= 4.9.0.
   The library is linked in as needed when using -lm (no need to specify -lmvec
diff --git a/sysdeps/unix/sysv/linux/x86_64/libmvec.abilist b/sysdeps/unix/sysv/linux/x86_64/libmvec.abilist
index ff9431f..9652215 100644
--- a/sysdeps/unix/sysv/linux/x86_64/libmvec.abilist
+++ b/sysdeps/unix/sysv/linux/x86_64/libmvec.abilist
@@ -5,6 +5,7 @@ GLIBC_2.22
  _ZGVbN2v_log F
  _ZGVbN2v_sin F
  _ZGVbN4v_cosf F
+ _ZGVbN4v_expf F
  _ZGVbN4v_logf F
  _ZGVbN4v_sinf F
  _ZGVcN4v_cos F
@@ -12,6 +13,7 @@ GLIBC_2.22
  _ZGVcN4v_log F
  _ZGVcN4v_sin F
  _ZGVcN8v_cosf F
+ _ZGVcN8v_expf F
  _ZGVcN8v_logf F
  _ZGVcN8v_sinf F
  _ZGVdN4v_cos F
@@ -19,9 +21,11 @@ GLIBC_2.22
  _ZGVdN4v_log F
  _ZGVdN4v_sin F
  _ZGVdN8v_cosf F
+ _ZGVdN8v_expf F
  _ZGVdN8v_logf F
  _ZGVdN8v_sinf F
  _ZGVeN16v_cosf F
+ _ZGVeN16v_expf F
  _ZGVeN16v_logf F
  _ZGVeN16v_sinf F
  _ZGVeN8v_cos F
diff --git a/sysdeps/x86/fpu/bits/math-vector.h b/sysdeps/x86/fpu/bits/math-vector.h
index 9a353bc..3b71589 100644
--- a/sysdeps/x86/fpu/bits/math-vector.h
+++ b/sysdeps/x86/fpu/bits/math-vector.h
@@ -42,6 +42,8 @@
 #  define __DECL_SIMD_logf __DECL_SIMD_x86_64
 #  undef __DECL_SIMD_exp
 #  define __DECL_SIMD_exp __DECL_SIMD_x86_64
+#  undef __DECL_SIMD_expf
+#  define __DECL_SIMD_expf __DECL_SIMD_x86_64
 
 /* Workaround to exclude unnecessary symbol aliases in libmvec
    while GCC creates the vector names based on scalar asm name.
@@ -59,6 +61,10 @@ __asm__ ("_ZGVbN2v___exp_finite = _ZGVbN2v_exp");
 __asm__ ("_ZGVcN4v___exp_finite = _ZGVcN4v_exp");
 __asm__ ("_ZGVdN4v___exp_finite = _ZGVdN4v_exp");
 __asm__ ("_ZGVeN8v___exp_finite = _ZGVeN8v_exp");
+__asm__ ("_ZGVbN4v___expf_finite = _ZGVbN4v_expf");
+__asm__ ("_ZGVcN8v___expf_finite = _ZGVcN8v_expf");
+__asm__ ("_ZGVdN8v___expf_finite = _ZGVdN8v_expf");
+__asm__ ("_ZGVeN16v___expf_finite = _ZGVeN16v_expf");
 
 # endif
 #endif
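
The math-vector.h change above is what lets the compiler use these entry
points automatically.  As the NEWS entry above notes, vectorization is
enabled with -fopenmp -ffast-math starting from -O1 on GCC >= 4.9;
-ffast-math makes GCC reference the *_finite names, which the __asm__
redirections above alias back to the plain vector symbols.  A minimal
caller (a hypothetical example, not part of the patch) might look like
this:

#include <math.h>

/* Built with e.g. "gcc -O2 -fopenmp -ffast-math -c", the expf call in
   this simd loop may be compiled into a call to one of the libmvec
   variants such as _ZGVbN4v_expf or _ZGVdN8v_expf.  */
void
vector_expf (float *out, const float *in, int n)
{
#pragma omp simd
  for (int i = 0; i < n; i++)
    out[i] = expf (in[i]);
}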
diff --git a/sysdeps/x86_64/fpu/Makefile b/sysdeps/x86_64/fpu/Makefile
index bd6d693..eab738f 100644
--- a/sysdeps/x86_64/fpu/Makefile
+++ b/sysdeps/x86_64/fpu/Makefile
@@ -12,6 +12,8 @@ libmvec-support += svml_d_cos2_core svml_d_cos4_core_avx \
 		   svml_s_logf8_core_avx svml_s_logf8_core svml_s_logf16_core \
 		   svml_s_logf_data svml_d_exp2_core svml_d_exp4_core_avx \
 		   svml_d_exp4_core svml_d_exp8_core svml_d_exp_data \
+		   svml_s_expf4_core svml_s_expf8_core_avx svml_s_expf8_core \
+		   svml_s_expf16_core svml_s_expf_data \
 		   init-arch
 endif
 
diff --git a/sysdeps/x86_64/fpu/Versions b/sysdeps/x86_64/fpu/Versions
index 00e34e7..0eaa8e8 100644
--- a/sysdeps/x86_64/fpu/Versions
+++ b/sysdeps/x86_64/fpu/Versions
@@ -7,5 +7,6 @@ libmvec {
     _ZGVbN4v_cosf; _ZGVcN8v_cosf; _ZGVdN8v_cosf; _ZGVeN16v_cosf;
     _ZGVbN4v_sinf; _ZGVcN8v_sinf; _ZGVdN8v_sinf; _ZGVeN16v_sinf;
     _ZGVbN4v_logf; _ZGVcN8v_logf; _ZGVdN8v_logf; _ZGVeN16v_logf;
+    _ZGVbN4v_expf; _ZGVcN8v_expf; _ZGVdN8v_expf; _ZGVeN16v_expf;
   }
 }
diff --git a/sysdeps/x86_64/fpu/libm-test-ulps b/sysdeps/x86_64/fpu/libm-test-ulps
index 45ebc04..ba1367f 100644
--- a/sysdeps/x86_64/fpu/libm-test-ulps
+++ b/sysdeps/x86_64/fpu/libm-test-ulps
@@ -1535,17 +1535,25 @@ idouble: 1
 ildouble: 1
 ldouble: 1
 
+Function: "exp_vlen16":
+float: 1
+
 Function: "exp_vlen2":
 double: 1
 
 Function: "exp_vlen4":
 double: 1
+float: 1
 
 Function: "exp_vlen4_avx2":
 double: 1
 
 Function: "exp_vlen8":
 double: 1
+float: 1
+
+Function: "exp_vlen8_avx2":
+float: 1
 
 Function: "expm1":
 double: 1
diff --git a/sysdeps/x86_64/fpu/multiarch/Makefile b/sysdeps/x86_64/fpu/multiarch/Makefile
index d6355ae..9e10251 100644
--- a/sysdeps/x86_64/fpu/multiarch/Makefile
+++ b/sysdeps/x86_64/fpu/multiarch/Makefile
@@ -63,5 +63,7 @@ libmvec-sysdep_routines += svml_d_cos2_core_sse4 svml_d_cos4_core_avx2 \
 			   svml_s_sinf8_core_avx2 svml_s_sinf16_core_avx512 \
 			   svml_s_logf4_core_sse4 svml_s_logf8_core_avx2 \
 			   svml_s_logf16_core_avx512 svml_d_exp2_core_sse4 \
-			   svml_d_exp4_core_avx2 svml_d_exp8_core_avx512
+			   svml_d_exp4_core_avx2 svml_d_exp8_core_avx512 \
+			   svml_s_expf4_core_sse4 svml_s_expf8_core_avx2 \
+			   svml_s_expf16_core_avx512
 endif
diff --git a/sysdeps/x86_64/fpu/test-float-vlen8-avx2.c b/sysdeps/x86_64/fpu/multiarch/svml_s_expf16_core.S
similarity index 50%
copy from sysdeps/x86_64/fpu/test-float-vlen8-avx2.c
copy to sysdeps/x86_64/fpu/multiarch/svml_s_expf16_core.S
index 1026d63..3b3489d 100644
--- a/sysdeps/x86_64/fpu/test-float-vlen8-avx2.c
+++ b/sysdeps/x86_64/fpu/multiarch/svml_s_expf16_core.S
@@ -1,4 +1,4 @@
-/* Tests for AVX2 ISA versions of vector math functions.
+/* Multiple versions of vectorized expf.
    Copyright (C) 2014-2015 Free Software Foundation, Inc.
    This file is part of the GNU C Library.
 
@@ -16,15 +16,24 @@
    License along with the GNU C Library; if not, see
    <http://www.gnu.org/licenses/>.  */
 
-#include "test-float-vlen8.h"
-
-#undef VEC_SUFF
-#define VEC_SUFF _vlen8_avx2
-
-#define TEST_VECTOR_cosf 1
-#define TEST_VECTOR_sinf 1
-#define TEST_VECTOR_logf 1
-
-#define REQUIRE_AVX2
-
-#include "libm-test.c"
+#include <sysdep.h>
+#include <init-arch.h>
+
+	.text
+ENTRY (_ZGVeN16v_expf)
+        .type   _ZGVeN16v_expf, @gnu_indirect_function
+        cmpl    $0, KIND_OFFSET+__cpu_features(%rip)
+        jne     1f
+        call    __init_cpu_features
+1:      leaq    _ZGVeN16v_expf_skx(%rip), %rax
+        testl   $bit_AVX512DQ_Usable, __cpu_features+FEATURE_OFFSET+index_AVX512DQ_Usable(%rip)
+        jnz     3f
+2:      leaq    _ZGVeN16v_expf_knl(%rip), %rax
+        testl   $bit_AVX512F_Usable, __cpu_features+FEATURE_OFFSET+index_AVX512F_Usable(%rip)
+        jnz     3f
+        leaq    _ZGVeN16v_expf_avx2_wrapper(%rip), %rax
+3:      ret
+END (_ZGVeN16v_expf)
+
+#define _ZGVeN16v_expf _ZGVeN16v_expf_avx2_wrapper
+#include "../svml_s_expf16_core.S"
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_expf16_core_avx512.S b/sysdeps/x86_64/fpu/multiarch/svml_s_expf16_core_avx512.S
new file mode 100644
index 0000000..cb807e0
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_s_expf16_core_avx512.S
@@ -0,0 +1,446 @@
+/* Function expf vectorized with AVX-512. KNL and SKX versions.
+   Copyright (C) 2014-2015 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+#include "svml_s_expf_data.h"
+#include "svml_s_wrapper_impl.h"
+
+	.text
+ENTRY (_ZGVeN16v_expf_knl)
+#ifndef HAVE_AVX512_ASM_SUPPORT
+WRAPPER_IMPL_AVX512 _ZGVdN8v_expf
+#else
+/*
+   ALGORITHM DESCRIPTION:
+
+     Argument representation:
+     M = rint(X*2^k/ln2) = 2^k*N+j
+     X = M*ln2/2^k + r = N*ln2 + ln2*(j/2^k) + r
+     then -ln2/2^(k+1) < r < ln2/2^(k+1)
+     Alternatively:
+     M = trunc(X*2^k/ln2)
+     then 0 < r < ln2/2^k
+
+     Result calculation:
+     exp(X) = exp(N*ln2 + ln2*(j/2^k) + r)
+     = 2^N * 2^(j/2^k) * exp(r)
+     2^N is calculated by bit manipulation
+     2^(j/2^k) is computed from table lookup
+     exp(r) is approximated by polynomial
+
+     The table lookup is skipped if k = 0.
+     For low accuracy approximation, exp(r) ~ 1 or 1+r.  */
+
+        pushq     %rbp
+        cfi_adjust_cfa_offset (8)
+        cfi_rel_offset (%rbp, 0)
+        movq      %rsp, %rbp
+        cfi_def_cfa_register (%rbp)
+        andq      $-64, %rsp
+        subq      $1280, %rsp
+        movq      __svml_sexp_data@GOTPCREL(%rip), %rax
+
+/* r = x-n*ln2_hi/2^k */
+        vmovaps   %zmm0, %zmm6
+
+/* compare against threshold */
+        movl      $-1, %ecx
+        vmovups   __sInvLn2(%rax), %zmm3
+        vmovups   __sLn2hi(%rax), %zmm5
+
+/* m = x*2^k/ln2 + shifter */
+        vfmadd213ps __sShifter(%rax), %zmm0, %zmm3
+        vmovups     __sPC5(%rax), %zmm9
+
+/* n = m - shifter = rint(x*2^k/ln2) */
+        vsubps    __sShifter(%rax), %zmm3, %zmm7
+
+/* remove sign of x by "and" operation */
+        vpandd   __iAbsMask(%rax), %zmm0, %zmm1
+        vpaddd   __iBias(%rax), %zmm3, %zmm4
+        vpcmpgtd __iDomainRange(%rax), %zmm1, %k1
+
+/* compute 2^N with "shift" */
+        vpslld       $23, %zmm4, %zmm8
+        vfnmadd231ps %zmm7, %zmm5, %zmm6
+        vpbroadcastd %ecx, %zmm2{%k1}{z}
+
+/* r = r-n*ln2_lo/2^k = x - n*ln2/2^k */
+        vfnmadd132ps __sLn2lo(%rax), %zmm6, %zmm7
+
+/* set mask for overflow/underflow */
+        vptestmd  %zmm2, %zmm2, %k0
+        kmovw     %k0, %ecx
+
+/* c5*r+c4 */
+        vfmadd213ps __sPC4(%rax), %zmm7, %zmm9
+
+/* (c5*r+c4)*r+c3 */
+        vfmadd213ps __sPC3(%rax), %zmm7, %zmm9
+
+/* ((c5*r+c4)*r+c3)*r+c2 */
+        vfmadd213ps __sPC2(%rax), %zmm7, %zmm9
+
+/* (((c5*r+c4)*r+c3)*r+c2)*r+c1 */
+        vfmadd213ps __sPC1(%rax), %zmm7, %zmm9
+
+/* exp(r) = ((((c5*r+c4)*r+c3)*r+c2)*r+c1)*r+c0 */
+        vfmadd213ps __sPC0(%rax), %zmm7, %zmm9
+
+/* 2^N*exp(r) */
+        vmulps    %zmm9, %zmm8, %zmm1
+        testl     %ecx, %ecx
+        jne       .LBL_1_3
+
+.LBL_1_2:
+        cfi_remember_state
+        vmovaps   %zmm1, %zmm0
+        movq      %rbp, %rsp
+        cfi_def_cfa_register (%rsp)
+        popq      %rbp
+        cfi_adjust_cfa_offset (-8)
+        cfi_restore (%rbp)
+        ret
+
+.LBL_1_3:
+        cfi_restore_state
+        vmovups   %zmm0, 1152(%rsp)
+        vmovups   %zmm1, 1216(%rsp)
+        je        .LBL_1_2
+
+        xorb      %dl, %dl
+        kmovw     %k4, 1048(%rsp)
+        xorl      %eax, %eax
+        kmovw     %k5, 1040(%rsp)
+        kmovw     %k6, 1032(%rsp)
+        kmovw     %k7, 1024(%rsp)
+        vmovups   %zmm16, 960(%rsp)
+        vmovups   %zmm17, 896(%rsp)
+        vmovups   %zmm18, 832(%rsp)
+        vmovups   %zmm19, 768(%rsp)
+        vmovups   %zmm20, 704(%rsp)
+        vmovups   %zmm21, 640(%rsp)
+        vmovups   %zmm22, 576(%rsp)
+        vmovups   %zmm23, 512(%rsp)
+        vmovups   %zmm24, 448(%rsp)
+        vmovups   %zmm25, 384(%rsp)
+        vmovups   %zmm26, 320(%rsp)
+        vmovups   %zmm27, 256(%rsp)
+        vmovups   %zmm28, 192(%rsp)
+        vmovups   %zmm29, 128(%rsp)
+        vmovups   %zmm30, 64(%rsp)
+        vmovups   %zmm31, (%rsp)
+        movq      %rsi, 1064(%rsp)
+        movq      %rdi, 1056(%rsp)
+        movq      %r12, 1096(%rsp)
+        cfi_offset_rel_rsp (12, 1096)
+        movb      %dl, %r12b
+        movq      %r13, 1088(%rsp)
+        cfi_offset_rel_rsp (13, 1088)
+        movl      %ecx, %r13d
+        movq      %r14, 1080(%rsp)
+        cfi_offset_rel_rsp (14, 1080)
+        movl      %eax, %r14d
+        movq      %r15, 1072(%rsp)
+        cfi_offset_rel_rsp (15, 1072)
+        cfi_remember_state
+
+.LBL_1_6:
+        btl       %r14d, %r13d
+        jc        .LBL_1_12
+
+.LBL_1_7:
+        lea       1(%r14), %esi
+        btl       %esi, %r13d
+        jc        .LBL_1_10
+
+.LBL_1_8:
+        addb      $1, %r12b
+        addl      $2, %r14d
+        cmpb      $16, %r12b
+        jb        .LBL_1_6
+
+        kmovw     1048(%rsp), %k4
+        movq      1064(%rsp), %rsi
+        kmovw     1040(%rsp), %k5
+        movq      1056(%rsp), %rdi
+        kmovw     1032(%rsp), %k6
+        movq      1096(%rsp), %r12
+        cfi_restore (%r12)
+        movq      1088(%rsp), %r13
+        cfi_restore (%r13)
+        kmovw     1024(%rsp), %k7
+        vmovups   960(%rsp), %zmm16
+        vmovups   896(%rsp), %zmm17
+        vmovups   832(%rsp), %zmm18
+        vmovups   768(%rsp), %zmm19
+        vmovups   704(%rsp), %zmm20
+        vmovups   640(%rsp), %zmm21
+        vmovups   576(%rsp), %zmm22
+        vmovups   512(%rsp), %zmm23
+        vmovups   448(%rsp), %zmm24
+        vmovups   384(%rsp), %zmm25
+        vmovups   320(%rsp), %zmm26
+        vmovups   256(%rsp), %zmm27
+        vmovups   192(%rsp), %zmm28
+        vmovups   128(%rsp), %zmm29
+        vmovups   64(%rsp), %zmm30
+        vmovups   (%rsp), %zmm31
+        movq      1080(%rsp), %r14
+        cfi_restore (%r14)
+        movq      1072(%rsp), %r15
+        cfi_restore (%r15)
+        vmovups   1216(%rsp), %zmm1
+        jmp       .LBL_1_2
+
+.LBL_1_10:
+        cfi_restore_state
+        movzbl    %r12b, %r15d
+        vmovss    1156(%rsp,%r15,8), %xmm0
+        call      expf@PLT
+        vmovss    %xmm0, 1220(%rsp,%r15,8)
+        jmp       .LBL_1_8
+
+.LBL_1_12:
+        movzbl    %r12b, %r15d
+        vmovss    1152(%rsp,%r15,8), %xmm0
+        call      expf@PLT
+        vmovss    %xmm0, 1216(%rsp,%r15,8)
+        jmp       .LBL_1_7
+
+#endif
+END (_ZGVeN16v_expf_knl)
+
+ENTRY (_ZGVeN16v_expf_skx)
+#ifndef HAVE_AVX512_ASM_SUPPORT
+WRAPPER_IMPL_AVX512 _ZGVdN8v_expf
+#else
+/*
+   ALGORITHM DESCRIPTION:
+
+     Argument representation:
+     M = rint(X*2^k/ln2) = 2^k*N+j
+     X = M*ln2/2^k + r = N*ln2 + ln2*(j/2^k) + r
+     then -ln2/2^(k+1) < r < ln2/2^(k+1)
+     Alternatively:
+     M = trunc(X*2^k/ln2)
+     then 0 < r < ln2/2^k
+
+     Result calculation:
+     exp(X) = exp(N*ln2 + ln2*(j/2^k) + r)
+     = 2^N * 2^(j/2^k) * exp(r)
+     2^N is calculated by bit manipulation
+     2^(j/2^k) is computed from table lookup
+     exp(r) is approximated by polynomial
+
+     The table lookup is skipped if k = 0.
+     For low accuracy approximation, exp(r) ~ 1 or 1+r.  */
+
+        pushq     %rbp
+        cfi_adjust_cfa_offset (8)
+        cfi_rel_offset (%rbp, 0)
+        movq      %rsp, %rbp
+        cfi_def_cfa_register (%rbp)
+        andq      $-64, %rsp
+        subq      $1280, %rsp
+        movq      __svml_sexp_data@GOTPCREL(%rip), %rax
+
+/* r = x-n*ln2_hi/2^k */
+        vmovaps   %zmm0, %zmm7
+
+/* compare against threshold */
+        vmovups   .L_2il0floatpacket.13(%rip), %zmm3
+        vmovups __sInvLn2(%rax), %zmm4
+        vmovups __sShifter(%rax), %zmm1
+        vmovups __sLn2hi(%rax), %zmm6
+        vmovups __sPC5(%rax), %zmm10
+
+/* m = x*2^k/ln2 + shifter */
+        vfmadd213ps %zmm1, %zmm0, %zmm4
+
+/* n = m - shifter = rint(x*2^k/ln2) */
+        vsubps    %zmm1, %zmm4, %zmm8
+        vpaddd __iBias(%rax), %zmm4, %zmm5
+        vfnmadd231ps %zmm8, %zmm6, %zmm7
+
+/* compute 2^N with "shift" */
+        vpslld    $23, %zmm5, %zmm9
+
+/* r = r-n*ln2_lo/2^k = x - n*ln2/2^k */
+        vfnmadd132ps __sLn2lo(%rax), %zmm7, %zmm8
+
+/* c5*r+c4 */
+        vfmadd213ps __sPC4(%rax), %zmm8, %zmm10
+
+/* (c5*r+c4)*r+c3 */
+        vfmadd213ps __sPC3(%rax), %zmm8, %zmm10
+
+/* ((c5*r+c4)*r+c3)*r+c2 */
+        vfmadd213ps __sPC2(%rax), %zmm8, %zmm10
+
+/* (((c5*r+c4)*r+c3)*r+c2)*r+c1 */
+        vfmadd213ps __sPC1(%rax), %zmm8, %zmm10
+
+/* exp(r) = ((((c5*r+c4)*r+c3)*r+c2)*r+c1)*r+c0 */
+        vfmadd213ps __sPC0(%rax), %zmm8, %zmm10
+
+/* 2^N*exp(r) */
+        vmulps    %zmm10, %zmm9, %zmm1
+
+/* remove sign of x by "and" operation */
+        vpandd __iAbsMask(%rax), %zmm0, %zmm2
+        vpcmpd    $2, __iDomainRange(%rax), %zmm2, %k1
+        vpandnd   %zmm2, %zmm2, %zmm3{%k1}
+
+/* set mask for overflow/underflow */
+        vptestmd  %zmm3, %zmm3, %k0
+        kmovw     %k0, %ecx
+        testl     %ecx, %ecx
+        jne       .LBL_2_3
+
+.LBL_2_2:
+        cfi_remember_state
+        vmovaps   %zmm1, %zmm0
+        movq      %rbp, %rsp
+        cfi_def_cfa_register (%rsp)
+        popq      %rbp
+        cfi_adjust_cfa_offset (-8)
+        cfi_restore (%rbp)
+        ret
+
+.LBL_2_3:
+        cfi_restore_state
+        vmovups   %zmm0, 1152(%rsp)
+        vmovups   %zmm1, 1216(%rsp)
+        je        .LBL_2_2
+
+        xorb      %dl, %dl
+        xorl      %eax, %eax
+        kmovw     %k4, 1048(%rsp)
+        kmovw     %k5, 1040(%rsp)
+        kmovw     %k6, 1032(%rsp)
+        kmovw     %k7, 1024(%rsp)
+        vmovups   %zmm16, 960(%rsp)
+        vmovups   %zmm17, 896(%rsp)
+        vmovups   %zmm18, 832(%rsp)
+        vmovups   %zmm19, 768(%rsp)
+        vmovups   %zmm20, 704(%rsp)
+        vmovups   %zmm21, 640(%rsp)
+        vmovups   %zmm22, 576(%rsp)
+        vmovups   %zmm23, 512(%rsp)
+        vmovups   %zmm24, 448(%rsp)
+        vmovups   %zmm25, 384(%rsp)
+        vmovups   %zmm26, 320(%rsp)
+        vmovups   %zmm27, 256(%rsp)
+        vmovups   %zmm28, 192(%rsp)
+        vmovups   %zmm29, 128(%rsp)
+        vmovups   %zmm30, 64(%rsp)
+        vmovups   %zmm31, (%rsp)
+        movq      %rsi, 1064(%rsp)
+        movq      %rdi, 1056(%rsp)
+        movq      %r12, 1096(%rsp)
+        cfi_offset_rel_rsp (12, 1096)
+        movb      %dl, %r12b
+        movq      %r13, 1088(%rsp)
+        cfi_offset_rel_rsp (13, 1088)
+        movl      %ecx, %r13d
+        movq      %r14, 1080(%rsp)
+        cfi_offset_rel_rsp (14, 1080)
+        movl      %eax, %r14d
+        movq      %r15, 1072(%rsp)
+        cfi_offset_rel_rsp (15, 1072)
+        cfi_remember_state
+
+
+.LBL_2_6:
+        btl       %r14d, %r13d
+        jc        .LBL_2_12
+
+.LBL_2_7:
+        lea       1(%r14), %esi
+        btl       %esi, %r13d
+        jc        .LBL_2_10
+
+.LBL_2_8:
+        incb      %r12b
+        addl      $2, %r14d
+        cmpb      $16, %r12b
+        jb        .LBL_2_6
+
+        kmovw     1048(%rsp), %k4
+        kmovw     1040(%rsp), %k5
+        kmovw     1032(%rsp), %k6
+        kmovw     1024(%rsp), %k7
+        vmovups   960(%rsp), %zmm16
+        vmovups   896(%rsp), %zmm17
+        vmovups   832(%rsp), %zmm18
+        vmovups   768(%rsp), %zmm19
+        vmovups   704(%rsp), %zmm20
+        vmovups   640(%rsp), %zmm21
+        vmovups   576(%rsp), %zmm22
+        vmovups   512(%rsp), %zmm23
+        vmovups   448(%rsp), %zmm24
+        vmovups   384(%rsp), %zmm25
+        vmovups   320(%rsp), %zmm26
+        vmovups   256(%rsp), %zmm27
+        vmovups   192(%rsp), %zmm28
+        vmovups   128(%rsp), %zmm29
+        vmovups   64(%rsp), %zmm30
+        vmovups   (%rsp), %zmm31
+        vmovups   1216(%rsp), %zmm1
+        movq      1064(%rsp), %rsi
+        movq      1056(%rsp), %rdi
+        movq      1096(%rsp), %r12
+        cfi_restore (%r12)
+        movq      1088(%rsp), %r13
+        cfi_restore (%r13)
+        movq      1080(%rsp), %r14
+        cfi_restore (%r14)
+        movq      1072(%rsp), %r15
+        cfi_restore (%r15)
+        jmp       .LBL_2_2
+
+.LBL_2_10:
+        cfi_restore_state
+        movzbl    %r12b, %r15d
+        vmovss    1156(%rsp,%r15,8), %xmm0
+        vzeroupper
+        vmovss    1156(%rsp,%r15,8), %xmm0
+
+        call      expf@PLT
+
+        vmovss    %xmm0, 1220(%rsp,%r15,8)
+        jmp       .LBL_2_8
+
+.LBL_2_12:
+        movzbl    %r12b, %r15d
+        vmovss    1152(%rsp,%r15,8), %xmm0
+        vzeroupper
+        vmovss    1152(%rsp,%r15,8), %xmm0
+
+        call      expf@PLT
+
+        vmovss    %xmm0, 1216(%rsp,%r15,8)
+        jmp       .LBL_2_7
+
+#endif
+END (_ZGVeN16v_expf_skx)
+
+	.section .rodata, "a"
+.L_2il0floatpacket.13:
+	.long	0xffffffff,0xffffffff,0xffffffff,0xffffffff,0xffffffff,0xffffffff,0xffffffff,0xffffffff,0xffffffff,0xffffffff,0xffffffff,0xffffffff,0xffffffff,0xffffffff,0xffffffff,0xffffffff
+	.type	.L_2il0floatpacket.13,@object
diff --git a/sysdeps/x86_64/fpu/test-float-vlen4.c b/sysdeps/x86_64/fpu/multiarch/svml_s_expf4_core.S
similarity index 56%
copy from sysdeps/x86_64/fpu/test-float-vlen4.c
copy to sysdeps/x86_64/fpu/multiarch/svml_s_expf4_core.S
index 5cb293f..37d38bc 100644
--- a/sysdeps/x86_64/fpu/test-float-vlen4.c
+++ b/sysdeps/x86_64/fpu/multiarch/svml_s_expf4_core.S
@@ -1,4 +1,4 @@
-/* Tests for SSE ISA versions of vector math functions.
+/* Multiple versions of vectorized expf.
    Copyright (C) 2014-2015 Free Software Foundation, Inc.
    This file is part of the GNU C Library.
 
@@ -16,10 +16,23 @@
    License along with the GNU C Library; if not, see
    <http://www.gnu.org/licenses/>.  */
 
-#include "test-float-vlen4.h"
+#include <sysdep.h>
+#include <init-arch.h>
 
-#define TEST_VECTOR_cosf 1
-#define TEST_VECTOR_sinf 1
-#define TEST_VECTOR_logf 1
+	.text
+ENTRY (_ZGVbN4v_expf)
+        .type   _ZGVbN4v_expf, @gnu_indirect_function
+        cmpl    $0, KIND_OFFSET+__cpu_features(%rip)
+        jne     1f
+        call    __init_cpu_features
+1:      leaq    _ZGVbN4v_expf_sse4(%rip), %rax
+        testl   $bit_SSE4_1, __cpu_features+CPUID_OFFSET+index_SSE4_1(%rip)
+        jz      2f
+        ret
+2:      leaq    _ZGVbN4v_expf_sse2(%rip), %rax
+        ret
+END (_ZGVbN4v_expf)
+libmvec_hidden_def (_ZGVbN4v_expf)
 
-#include "libm-test.c"
+#define _ZGVbN4v_expf _ZGVbN4v_expf_sse2
+#include "../svml_s_expf4_core.S"
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_expf4_core_sse4.S b/sysdeps/x86_64/fpu/multiarch/svml_s_expf4_core_sse4.S
new file mode 100644
index 0000000..fcc1859
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_s_expf4_core_sse4.S
@@ -0,0 +1,212 @@
+/* Function expf vectorized with SSE4.
+   Copyright (C) 2014-2015 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+#include "svml_s_expf_data.h"
+
+	.text
+ENTRY (_ZGVbN4v_expf_sse4)
+/*
+   ALGORITHM DESCRIPTION:
+
+     Argument representation:
+     M = rint(X*2^k/ln2) = 2^k*N+j
+     X = M*ln2/2^k + r = N*ln2 + ln2*(j/2^k) + r
+     then -ln2/2^(k+1) < r < ln2/2^(k+1)
+     Alternatively:
+     M = trunc(X*2^k/ln2)
+     then 0 < r < ln2/2^k
+
+     Result calculation:
+     exp(X) = exp(N*ln2 + ln2*(j/2^k) + r)
+     = 2^N * 2^(j/2^k) * exp(r)
+     2^N is calculated by bit manipulation
+     2^(j/2^k) is computed from table lookup
+     exp(r) is approximated by polynomial
+
+     The table lookup is skipped if k = 0.
+     For low accuracy approximation, exp(r) ~ 1 or 1+r.  */
+
+        pushq     %rbp
+        cfi_adjust_cfa_offset (8)
+        cfi_rel_offset (%rbp, 0)
+        movq      %rsp, %rbp
+        cfi_def_cfa_register (%rbp)
+        andq      $-64, %rsp
+        subq      $320, %rsp
+        movaps    %xmm0, %xmm5
+        movq      __svml_sexp_data@GOTPCREL(%rip), %rax
+        movups __sInvLn2(%rax), %xmm0
+
+/* m = x*2^k/ln2 + shifter */
+        mulps     %xmm5, %xmm0
+        movups __sShifter(%rax), %xmm6
+        movups __sLn2hi(%rax), %xmm4
+        addps     %xmm6, %xmm0
+
+/* n = m - shifter = rint(x*2^k/ln2) */
+        movaps    %xmm0, %xmm2
+
+/* remove sign of x by "and" operation */
+        movdqu __iAbsMask(%rax), %xmm7
+        subps     %xmm6, %xmm2
+
+/* r = x-n*ln2_hi/2^k */
+        mulps     %xmm2, %xmm4
+        pand      %xmm5, %xmm7
+
+/* compare against threshold */
+        pcmpgtd __iDomainRange(%rax), %xmm7
+        movups __sLn2lo(%rax), %xmm1
+
+/* set mask for overflow/underflow */
+        movmskps  %xmm7, %ecx
+        movaps    %xmm5, %xmm7
+        movups __sPC5(%rax), %xmm3
+        subps     %xmm4, %xmm7
+
+/* r = r-n*ln2_lo/2^k = x - n*ln2/2^k */
+        mulps     %xmm1, %xmm2
+
+/* compute 2^N with "shift" */
+        movdqu __iBias(%rax), %xmm6
+        subps     %xmm2, %xmm7
+
+/* c5*r+c4 */
+        mulps     %xmm7, %xmm3
+        paddd     %xmm6, %xmm0
+        pslld     $23, %xmm0
+        addps __sPC4(%rax), %xmm3
+
+/* (c5*r+c4)*r+c3 */
+        mulps     %xmm7, %xmm3
+        addps __sPC3(%rax), %xmm3
+
+/* ((c5*r+c4)*r+c3)*r+c2 */
+        mulps     %xmm7, %xmm3
+        addps __sPC2(%rax), %xmm3
+
+/* (((c5*r+c4)*r+c3)*r+c2)*r+c1 */
+        mulps     %xmm7, %xmm3
+        addps __sPC1(%rax), %xmm3
+
+/* exp(r) = ((((c5*r+c4)*r+c3)*r+c2)*r+c1)*r+c0 */
+        mulps     %xmm3, %xmm7
+        addps __sPC0(%rax), %xmm7
+
+/* 2^N*exp(r) */
+        mulps     %xmm7, %xmm0
+        testl     %ecx, %ecx
+        jne       .LBL_1_3
+
+.LBL_1_2:
+        cfi_remember_state
+        movq      %rbp, %rsp
+        cfi_def_cfa_register (%rsp)
+        popq      %rbp
+        cfi_adjust_cfa_offset (-8)
+        cfi_restore (%rbp)
+        ret
+
+.LBL_1_3:
+        cfi_restore_state
+        movups    %xmm5, 192(%rsp)
+        movups    %xmm0, 256(%rsp)
+        je        .LBL_1_2
+
+        xorb      %dl, %dl
+        xorl      %eax, %eax
+        movups    %xmm8, 112(%rsp)
+        movups    %xmm9, 96(%rsp)
+        movups    %xmm10, 80(%rsp)
+        movups    %xmm11, 64(%rsp)
+        movups    %xmm12, 48(%rsp)
+        movups    %xmm13, 32(%rsp)
+        movups    %xmm14, 16(%rsp)
+        movups    %xmm15, (%rsp)
+        movq      %rsi, 136(%rsp)
+        movq      %rdi, 128(%rsp)
+        movq      %r12, 168(%rsp)
+        cfi_offset_rel_rsp (12, 168)
+        movb      %dl, %r12b
+        movq      %r13, 160(%rsp)
+        cfi_offset_rel_rsp (13, 160)
+        movl      %ecx, %r13d
+        movq      %r14, 152(%rsp)
+        cfi_offset_rel_rsp (14, 152)
+        movl      %eax, %r14d
+        movq      %r15, 144(%rsp)
+        cfi_offset_rel_rsp (15, 144)
+        cfi_remember_state
+
+.LBL_1_6:
+        btl       %r14d, %r13d
+        jc        .LBL_1_12
+
+.LBL_1_7:
+        lea       1(%r14), %esi
+        btl       %esi, %r13d
+        jc        .LBL_1_10
+
+.LBL_1_8:
+        incb      %r12b
+        addl      $2, %r14d
+        cmpb      $16, %r12b
+        jb        .LBL_1_6
+
+        movups    112(%rsp), %xmm8
+        movups    96(%rsp), %xmm9
+        movups    80(%rsp), %xmm10
+        movups    64(%rsp), %xmm11
+        movups    48(%rsp), %xmm12
+        movups    32(%rsp), %xmm13
+        movups    16(%rsp), %xmm14
+        movups    (%rsp), %xmm15
+        movq      136(%rsp), %rsi
+        movq      128(%rsp), %rdi
+        movq      168(%rsp), %r12
+        cfi_restore (%r12)
+        movq      160(%rsp), %r13
+        cfi_restore (%r13)
+        movq      152(%rsp), %r14
+        cfi_restore (%r14)
+        movq      144(%rsp), %r15
+        cfi_restore (%r15)
+        movups    256(%rsp), %xmm0
+        jmp       .LBL_1_2
+
+.LBL_1_10:
+        cfi_restore_state
+        movzbl    %r12b, %r15d
+        movss     196(%rsp,%r15,8), %xmm0
+
+        call      expf@PLT
+
+        movss     %xmm0, 260(%rsp,%r15,8)
+        jmp       .LBL_1_8
+
+.LBL_1_12:
+        movzbl    %r12b, %r15d
+        movss     192(%rsp,%r15,8), %xmm0
+
+        call      expf@PLT
+
+        movss     %xmm0, 256(%rsp,%r15,8)
+        jmp       .LBL_1_7
+
+END (_ZGVbN4v_expf_sse4)
diff --git a/sysdeps/x86_64/fpu/test-float-vlen4.c b/sysdeps/x86_64/fpu/multiarch/svml_s_expf8_core.S
similarity index 51%
copy from sysdeps/x86_64/fpu/test-float-vlen4.c
copy to sysdeps/x86_64/fpu/multiarch/svml_s_expf8_core.S
index 5cb293f..e3dc1b1 100644
--- a/sysdeps/x86_64/fpu/test-float-vlen4.c
+++ b/sysdeps/x86_64/fpu/multiarch/svml_s_expf8_core.S
@@ -1,4 +1,4 @@
-/* Tests for SSE ISA versions of vector math functions.
+/* Multiple versions of vectorized expf.
    Copyright (C) 2014-2015 Free Software Foundation, Inc.
    This file is part of the GNU C Library.
 
@@ -10,16 +10,29 @@
    The GNU C Library is distributed in the hope that it will be useful,
    but WITHOUT ANY WARRANTY; without even the implied warranty of
    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
-   Lesser General Public License for more details.
+    Lesser General Public License for more details.
 
    You should have received a copy of the GNU Lesser General Public
    License along with the GNU C Library; if not, see
    <http://www.gnu.org/licenses/>.  */
 
-#include "test-float-vlen4.h"
+#include <sysdep.h>
+#include <init-arch.h>
 
-#define TEST_VECTOR_cosf 1
-#define TEST_VECTOR_sinf 1
-#define TEST_VECTOR_logf 1
+	.text
+ENTRY (_ZGVdN8v_expf)
+        .type   _ZGVdN8v_expf, @gnu_indirect_function
+        cmpl    $0, KIND_OFFSET+__cpu_features(%rip)
+        jne     1f
+        call    __init_cpu_features
+1:      leaq    _ZGVdN8v_expf_avx2(%rip), %rax
+        testl   $bit_AVX2_Usable, __cpu_features+FEATURE_OFFSET+index_AVX2_Usable(%rip)
+        jz      2f
+        ret
+2:      leaq    _ZGVdN8v_expf_sse_wrapper(%rip), %rax
+        ret
+END (_ZGVdN8v_expf)
+libmvec_hidden_def (_ZGVdN8v_expf)
 
-#include "libm-test.c"
+#define _ZGVdN8v_expf _ZGVdN8v_expf_sse_wrapper
+#include "../svml_s_expf8_core.S"
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_expf8_core_avx2.S b/sysdeps/x86_64/fpu/multiarch/svml_s_expf8_core_avx2.S
new file mode 100644
index 0000000..c876ecc
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_s_expf8_core_avx2.S
@@ -0,0 +1,202 @@
+/* Function expf vectorized with AVX2.
+   Copyright (C) 2014-2015 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+#include "svml_s_expf_data.h"
+
+	.text
+ENTRY(_ZGVdN8v_expf_avx2)
+/*
+   ALGORITHM DESCRIPTION:
+
+     Argument representation:
+     M = rint(X*2^k/ln2) = 2^k*N+j
+     X = M*ln2/2^k + r = N*ln2 + ln2*(j/2^k) + r
+     then -ln2/2^(k+1) < r < ln2/2^(k+1)
+     Alternatively:
+     M = trunc(X*2^k/ln2)
+     then 0 < r < ln2/2^k
+
+     Result calculation:
+     exp(X) = exp(N*ln2 + ln2*(j/2^k) + r)
+     = 2^N * 2^(j/2^k) * exp(r)
+     2^N is calculated by bit manipulation
+     2^(j/2^k) is computed from table lookup
+     exp(r) is approximated by polynomial
+
+     The table lookup is skipped if k = 0.
+     For low accuracy approximation, exp(r) ~ 1 or 1+r.  */
+
+        pushq     %rbp
+        cfi_adjust_cfa_offset (8)
+        cfi_rel_offset (%rbp, 0)
+        movq      %rsp, %rbp
+        cfi_def_cfa_register (%rbp)
+        andq      $-64, %rsp
+        subq      $448, %rsp
+        movq      __svml_sexp_data@GOTPCREL(%rip), %rax
+        vmovaps   %ymm0, %ymm2
+        vmovups __sInvLn2(%rax), %ymm7
+        vmovups __sShifter(%rax), %ymm4
+        vmovups __sLn2hi(%rax), %ymm3
+        vmovups __sPC5(%rax), %ymm1
+
+/* m = x*2^k/ln2 + shifter */
+        vfmadd213ps %ymm4, %ymm2, %ymm7
+
+/* n = m - shifter = rint(x*2^k/ln2) */
+        vsubps    %ymm4, %ymm7, %ymm0
+        vpaddd __iBias(%rax), %ymm7, %ymm4
+
+/* remove sign of x by "and" operation */
+        vandps __iAbsMask(%rax), %ymm2, %ymm5
+
+/* compare against threshold */
+        vpcmpgtd __iDomainRange(%rax), %ymm5, %ymm6
+
+/* r = x-n*ln2_hi/2^k */
+        vmovaps   %ymm2, %ymm5
+        vfnmadd231ps %ymm0, %ymm3, %ymm5
+
+/* r = r-n*ln2_lo/2^k = x - n*ln2/2^k */
+        vfnmadd132ps __sLn2lo(%rax), %ymm5, %ymm0
+
+/* c5*r+c4 */
+        vfmadd213ps __sPC4(%rax), %ymm0, %ymm1
+
+/* (c5*r+c4)*r+c3 */
+        vfmadd213ps __sPC3(%rax), %ymm0, %ymm1
+
+/* ((c5*r+c4)*r+c3)*r+c2 */
+        vfmadd213ps __sPC2(%rax), %ymm0, %ymm1
+
+/* (((c5*r+c4)*r+c3)*r+c2)*r+c1 */
+        vfmadd213ps __sPC1(%rax), %ymm0, %ymm1
+
+/* exp(r) = ((((c5*r+c4)*r+c3)*r+c2)*r+c1)*r+c0 */
+        vfmadd213ps __sPC0(%rax), %ymm0, %ymm1
+
+/* set mask for overflow/underflow */
+        vmovmskps %ymm6, %ecx
+
+/* compute 2^N with "shift" */
+        vpslld    $23, %ymm4, %ymm6
+
+/* 2^N*exp(r) */
+        vmulps    %ymm1, %ymm6, %ymm0
+        testl     %ecx, %ecx
+        jne       .LBL_1_3
+
+.LBL_1_2:
+        cfi_remember_state
+        movq      %rbp, %rsp
+        cfi_def_cfa_register (%rsp)
+        popq      %rbp
+        cfi_adjust_cfa_offset (-8)
+        cfi_restore (%rbp)
+        ret
+
+.LBL_1_3:
+        cfi_restore_state
+        vmovups   %ymm2, 320(%rsp)
+        vmovups   %ymm0, 384(%rsp)
+        je        .LBL_1_2
+
+        xorb      %dl, %dl
+        xorl      %eax, %eax
+        vmovups   %ymm8, 224(%rsp)
+        vmovups   %ymm9, 192(%rsp)
+        vmovups   %ymm10, 160(%rsp)
+        vmovups   %ymm11, 128(%rsp)
+        vmovups   %ymm12, 96(%rsp)
+        vmovups   %ymm13, 64(%rsp)
+        vmovups   %ymm14, 32(%rsp)
+        vmovups   %ymm15, (%rsp)
+        movq      %rsi, 264(%rsp)
+        movq      %rdi, 256(%rsp)
+        movq      %r12, 296(%rsp)
+        cfi_offset_rel_rsp (12, 296)
+        movb      %dl, %r12b
+        movq      %r13, 288(%rsp)
+        cfi_offset_rel_rsp (13, 288)
+        movl      %ecx, %r13d
+        movq      %r14, 280(%rsp)
+        cfi_offset_rel_rsp (14, 280)
+        movl      %eax, %r14d
+        movq      %r15, 272(%rsp)
+        cfi_offset_rel_rsp (15, 272)
+        cfi_remember_state
+
+.LBL_1_6:
+        btl       %r14d, %r13d
+        jc        .LBL_1_12
+
+.LBL_1_7:
+        lea       1(%r14), %esi
+        btl       %esi, %r13d
+        jc        .LBL_1_10
+
+.LBL_1_8:
+        incb      %r12b
+        addl      $2, %r14d
+        cmpb      $16, %r12b
+        jb        .LBL_1_6
+
+        vmovups   224(%rsp), %ymm8
+        vmovups   192(%rsp), %ymm9
+        vmovups   160(%rsp), %ymm10
+        vmovups   128(%rsp), %ymm11
+        vmovups   96(%rsp), %ymm12
+        vmovups   64(%rsp), %ymm13
+        vmovups   32(%rsp), %ymm14
+        vmovups   (%rsp), %ymm15
+        vmovups   384(%rsp), %ymm0
+        movq      264(%rsp), %rsi
+        movq      256(%rsp), %rdi
+        movq      296(%rsp), %r12
+        cfi_restore (%r12)
+        movq      288(%rsp), %r13
+        cfi_restore (%r13)
+        movq      280(%rsp), %r14
+        cfi_restore (%r14)
+        movq      272(%rsp), %r15
+        cfi_restore (%r15)
+        jmp       .LBL_1_2
+
+.LBL_1_10:
+        cfi_restore_state
+        movzbl    %r12b, %r15d
+        vmovss    324(%rsp,%r15,8), %xmm0
+        vzeroupper
+
+        call      expf@PLT
+
+        vmovss    %xmm0, 388(%rsp,%r15,8)
+        jmp       .LBL_1_8
+
+.LBL_1_12:
+        movzbl    %r12b, %r15d
+        vmovss    320(%rsp,%r15,8), %xmm0
+        vzeroupper
+
+        call      expf@PLT
+
+        vmovss    %xmm0, 384(%rsp,%r15,8)
+        jmp       .LBL_1_7
+
+END(_ZGVdN8v_expf_avx2)
diff --git a/sysdeps/x86_64/fpu/test-float-vlen8.c b/sysdeps/x86_64/fpu/svml_s_expf16_core.S
similarity index 79%
copy from sysdeps/x86_64/fpu/test-float-vlen8.c
copy to sysdeps/x86_64/fpu/svml_s_expf16_core.S
index 3fe10ad..d9d355c 100644
--- a/sysdeps/x86_64/fpu/test-float-vlen8.c
+++ b/sysdeps/x86_64/fpu/svml_s_expf16_core.S
@@ -1,4 +1,4 @@
-/* Tests for AVX ISA versions of vector math functions.
+/* Function expf vectorized with AVX-512. Wrapper to AVX2 version.
    Copyright (C) 2014-2015 Free Software Foundation, Inc.
    This file is part of the GNU C Library.
 
@@ -16,10 +16,10 @@
    License along with the GNU C Library; if not, see
    <http://www.gnu.org/licenses/>.  */
 
-#include "test-float-vlen8.h"
+#include <sysdep.h>
+#include "svml_s_wrapper_impl.h"
 
-#define TEST_VECTOR_cosf 1
-#define TEST_VECTOR_sinf 1
-#define TEST_VECTOR_logf 1
-
-#include "libm-test.c"
+	.text
+ENTRY (_ZGVeN16v_expf)
+WRAPPER_IMPL_AVX512 _ZGVdN8v_expf
+END (_ZGVeN16v_expf)
diff --git a/sysdeps/x86_64/fpu/test-float-vlen16.c b/sysdeps/x86_64/fpu/svml_s_expf4_core.S
similarity index 77%
copy from sysdeps/x86_64/fpu/test-float-vlen16.c
copy to sysdeps/x86_64/fpu/svml_s_expf4_core.S
index 10da5fe..71c5da4 100644
--- a/sysdeps/x86_64/fpu/test-float-vlen16.c
+++ b/sysdeps/x86_64/fpu/svml_s_expf4_core.S
@@ -1,4 +1,4 @@
-/* Tests for AVX-512 ISA versions of vector math functions.
+/* Function expf vectorized with SSE2.
    Copyright (C) 2014-2015 Free Software Foundation, Inc.
    This file is part of the GNU C Library.
 
@@ -16,12 +16,15 @@
    License along with the GNU C Library; if not, see
    <http://www.gnu.org/licenses/>.  */
 
-#include "test-float-vlen16.h"
 
-#define TEST_VECTOR_cosf 1
-#define TEST_VECTOR_sinf 1
-#define TEST_VECTOR_logf 1
+#include <sysdep.h>
+#include "svml_s_wrapper_impl.h"
 
-#define REQUIRE_AVX512F
+	.text
+ENTRY (_ZGVbN4v_expf)
+WRAPPER_IMPL_SSE2 expf
+END (_ZGVbN4v_expf)
 
-#include "libm-test.c"
+#ifndef USE_MULTIARCH
+ libmvec_hidden_def (_ZGVbN4v_expf)
+#endif
diff --git a/sysdeps/x86_64/fpu/test-float-vlen16.c b/sysdeps/x86_64/fpu/svml_s_expf8_core.S
similarity index 75%
copy from sysdeps/x86_64/fpu/test-float-vlen16.c
copy to sysdeps/x86_64/fpu/svml_s_expf8_core.S
index 10da5fe..d254a99 100644
--- a/sysdeps/x86_64/fpu/test-float-vlen16.c
+++ b/sysdeps/x86_64/fpu/svml_s_expf8_core.S
@@ -1,4 +1,4 @@
-/* Tests for AVX-512 ISA versions of vector math functions.
+/* Function expf vectorized with AVX2, wrapper version.
    Copyright (C) 2014-2015 Free Software Foundation, Inc.
    This file is part of the GNU C Library.
 
@@ -16,12 +16,14 @@
    License along with the GNU C Library; if not, see
    <http://www.gnu.org/licenses/>.  */
 
-#include "test-float-vlen16.h"
+#include <sysdep.h>
+#include "svml_s_wrapper_impl.h"
 
-#define TEST_VECTOR_cosf 1
-#define TEST_VECTOR_sinf 1
-#define TEST_VECTOR_logf 1
+	.text
+ENTRY (_ZGVdN8v_expf)
+WRAPPER_IMPL_AVX _ZGVbN4v_expf
+END (_ZGVdN8v_expf)
 
-#define REQUIRE_AVX512F
-
-#include "libm-test.c"
+#ifndef USE_MULTIARCH
+ libmvec_hidden_def (_ZGVdN8v_expf)
+#endif
diff --git a/sysdeps/x86_64/fpu/test-float-vlen8.c b/sysdeps/x86_64/fpu/svml_s_expf8_core_avx.S
similarity index 79%
copy from sysdeps/x86_64/fpu/test-float-vlen8.c
copy to sysdeps/x86_64/fpu/svml_s_expf8_core_avx.S
index 3fe10ad..ece40ba 100644
--- a/sysdeps/x86_64/fpu/test-float-vlen8.c
+++ b/sysdeps/x86_64/fpu/svml_s_expf8_core_avx.S
@@ -1,4 +1,4 @@
-/* Tests for AVX ISA versions of vector math functions.
+/* Function expf vectorized in AVX ISA as wrapper to SSE4 ISA version.
    Copyright (C) 2014-2015 Free Software Foundation, Inc.
    This file is part of the GNU C Library.
 
@@ -16,10 +16,10 @@
    License along with the GNU C Library; if not, see
    <http://www.gnu.org/licenses/>.  */
 
-#include "test-float-vlen8.h"
+#include <sysdep.h>
+#include "svml_s_wrapper_impl.h"
 
-#define TEST_VECTOR_cosf 1
-#define TEST_VECTOR_sinf 1
-#define TEST_VECTOR_logf 1
-
-#include "libm-test.c"
+        .text
+ENTRY(_ZGVcN8v_expf)
+WRAPPER_IMPL_AVX _ZGVbN4v_expf
+END(_ZGVcN8v_expf)
diff --git a/sysdeps/x86_64/fpu/svml_s_expf_data.S b/sysdeps/x86_64/fpu/svml_s_expf_data.S
new file mode 100644
index 0000000..eee9d69
--- /dev/null
+++ b/sysdeps/x86_64/fpu/svml_s_expf_data.S
@@ -0,0 +1,63 @@
+/* Data for function expf.
+   Copyright (C) 2014-2015 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include "svml_s_expf_data.h"
+
+	.section .rodata, "a"
+	.align 64
+
+/* Data table for vector implementations of function expf.
+   The table may contain polynomial, reduction, lookup coefficients and
+   other coefficients obtained through different methods of research and
+   experimental work.  */
+
+	.globl __svml_sexp_data
+__svml_sexp_data:
+
+/* Range reduction coefficients:
+ * log(2) inverted */
+float_vector __sInvLn2 0x3fb8aa3b
+
+/* right shifter constant */
+float_vector __sShifter 0x4b400000
+
+/* log(2) high part */
+float_vector __sLn2hi 0x3f317200
+
+/* log(2) low part */
+float_vector __sLn2lo 0x35bfbe8e
+
+/* bias */
+float_vector __iBias 0x0000007f
+
+/* Polynomial coefficients:
+ * Here we approximate 2^x on [-0.5, 0.5] */
+float_vector __sPC0 0x3f800000
+float_vector __sPC1 0x3f7ffffe
+float_vector __sPC2 0x3effff34
+float_vector __sPC3 0x3e2aacac
+float_vector __sPC4 0x3d2b8392
+float_vector __sPC5 0x3c07d9fe
+
+/* absolute value mask */
+float_vector __iAbsMask 0x7fffffff
+
+/* working domain range */
+float_vector __iDomainRange 0x42aeac4f
+	.type	__svml_sexp_data,@object
+	.size __svml_sexp_data,.-__svml_sexp_data
diff --git a/sysdeps/x86_64/fpu/test-float-vlen8-avx2.c b/sysdeps/x86_64/fpu/svml_s_expf_data.h
similarity index 50%
copy from sysdeps/x86_64/fpu/test-float-vlen8-avx2.c
copy to sysdeps/x86_64/fpu/svml_s_expf_data.h
index 1026d63..beaa290 100644
--- a/sysdeps/x86_64/fpu/test-float-vlen8-avx2.c
+++ b/sysdeps/x86_64/fpu/svml_s_expf_data.h
@@ -1,4 +1,4 @@
-/* Tests for AVX2 ISA versions of vector math functions.
+/* Offsets for data table for vector function expf.
    Copyright (C) 2014-2015 Free Software Foundation, Inc.
    This file is part of the GNU C Library.
 
@@ -16,15 +16,30 @@
    License along with the GNU C Library; if not, see
    <http://www.gnu.org/licenses/>.  */
 
-#include "test-float-vlen8.h"
-
-#undef VEC_SUFF
-#define VEC_SUFF _vlen8_avx2
-
-#define TEST_VECTOR_cosf 1
-#define TEST_VECTOR_sinf 1
-#define TEST_VECTOR_logf 1
-
-#define REQUIRE_AVX2
-
-#include "libm-test.c"
+#ifndef S_EXPF_DATA_H
+#define S_EXPF_DATA_H
+
+#define __sInvLn2                     	0
+#define __sShifter                    	64
+#define __sLn2hi                      	128
+#define __sLn2lo                      	192
+#define __iBias                       	256
+#define __sPC0                        	320
+#define __sPC1                        	384
+#define __sPC2                        	448
+#define __sPC3                        	512
+#define __sPC4                        	576
+#define __sPC5                        	640
+#define __iAbsMask                    	704
+#define __iDomainRange                	768
+
+.macro float_vector offset value
+.if .-__svml_sexp_data != \offset
+.err
+.endif
+.rept 16
+.long \value
+.endr
+.endm
+
+#endif
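
Together with the ALGORITHM DESCRIPTION comments earlier in the patch,
the data table above is enough to follow the scheme in scalar form.  The
sketch below is illustrative only (the helper names are made up, not
part of the patch); it mirrors the no-table (k = 0) path that the SSE4,
AVX2 and AVX-512 kernels implement, using the per-lane values of
__svml_sexp_data:

#include <math.h>
#include <stdint.h>
#include <string.h>

static float
from_bits (uint32_t u)
{
  float f;
  memcpy (&f, &u, sizeof f);
  return f;
}

/* Illustrative scalar sketch of the vectorized algorithm.  */
static float
expf_sketch (float x)
{
  uint32_t xbits;
  memcpy (&xbits, &x, sizeof xbits);

  /* Overflow/underflow/NaN inputs (|x| above __iDomainRange) are handed
     to the scalar expf, as in the .LBL_*_10/12 paths of the kernels.  */
  if ((int32_t) (xbits & 0x7fffffff) > (int32_t) 0x42aeac4f)
    return expf (x);

  float inv_ln2 = from_bits (0x3fb8aa3b);   /* __sInvLn2:  1/ln(2)    */
  float shifter = from_bits (0x4b400000);   /* __sShifter: 1.5 * 2^23 */
  float ln2_hi  = from_bits (0x3f317200);   /* __sLn2hi               */
  float ln2_lo  = from_bits (0x35bfbe8e);   /* __sLn2lo               */
  float c0 = from_bits (0x3f800000), c1 = from_bits (0x3f7ffffe);
  float c2 = from_bits (0x3effff34), c3 = from_bits (0x3e2aacac);
  float c4 = from_bits (0x3d2b8392), c5 = from_bits (0x3c07d9fe);

  /* m = x/ln2 + shifter; n = m - shifter = rint(x/ln2) */
  float m = x * inv_ln2 + shifter;
  float n = m - shifter;

  /* 2^n by bit manipulation: (bits(m) + __iBias) << 23 */
  uint32_t mbits;
  memcpy (&mbits, &m, sizeof mbits);
  float pow2n = from_bits ((mbits + 0x7f) << 23);

  /* r = x - n*ln2, using the hi/lo split of ln2 */
  float r = (x - n * ln2_hi) - n * ln2_lo;

  /* exp(r) ~ ((((c5*r+c4)*r+c3)*r+c2)*r+c1)*r+c0 */
  float p = ((((c5 * r + c4) * r + c3) * r + c2) * r + c1) * r + c0;

  return pow2n * p;
}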
diff --git a/sysdeps/x86_64/fpu/test-float-vlen16-wrappers.c b/sysdeps/x86_64/fpu/test-float-vlen16-wrappers.c
index 72435e4..19c249a 100644
--- a/sysdeps/x86_64/fpu/test-float-vlen16-wrappers.c
+++ b/sysdeps/x86_64/fpu/test-float-vlen16-wrappers.c
@@ -25,3 +25,4 @@
 VECTOR_WRAPPER (WRAPPER_NAME (cosf), _ZGVeN16v_cosf)
 VECTOR_WRAPPER (WRAPPER_NAME (sinf), _ZGVeN16v_sinf)
 VECTOR_WRAPPER (WRAPPER_NAME (logf), _ZGVeN16v_logf)
+VECTOR_WRAPPER (WRAPPER_NAME (expf), _ZGVeN16v_expf)
diff --git a/sysdeps/x86_64/fpu/test-float-vlen16.c b/sysdeps/x86_64/fpu/test-float-vlen16.c
index 10da5fe..09514c4 100644
--- a/sysdeps/x86_64/fpu/test-float-vlen16.c
+++ b/sysdeps/x86_64/fpu/test-float-vlen16.c
@@ -21,6 +21,7 @@
 #define TEST_VECTOR_cosf 1
 #define TEST_VECTOR_sinf 1
 #define TEST_VECTOR_logf 1
+#define TEST_VECTOR_expf 1
 
 #define REQUIRE_AVX512F
 
diff --git a/sysdeps/x86_64/fpu/test-float-vlen4-wrappers.c b/sysdeps/x86_64/fpu/test-float-vlen4-wrappers.c
index f51575d..55bd026 100644
--- a/sysdeps/x86_64/fpu/test-float-vlen4-wrappers.c
+++ b/sysdeps/x86_64/fpu/test-float-vlen4-wrappers.c
@@ -25,3 +25,4 @@
 VECTOR_WRAPPER (WRAPPER_NAME (cosf), _ZGVbN4v_cosf)
 VECTOR_WRAPPER (WRAPPER_NAME (sinf), _ZGVbN4v_sinf)
 VECTOR_WRAPPER (WRAPPER_NAME (logf), _ZGVbN4v_logf)
+VECTOR_WRAPPER (WRAPPER_NAME (expf), _ZGVbN4v_expf)
diff --git a/sysdeps/x86_64/fpu/test-float-vlen4.c b/sysdeps/x86_64/fpu/test-float-vlen4.c
index 5cb293f..df31593 100644
--- a/sysdeps/x86_64/fpu/test-float-vlen4.c
+++ b/sysdeps/x86_64/fpu/test-float-vlen4.c
@@ -21,5 +21,6 @@
 #define TEST_VECTOR_cosf 1
 #define TEST_VECTOR_sinf 1
 #define TEST_VECTOR_logf 1
+#define TEST_VECTOR_expf 1
 
 #include "libm-test.c"
diff --git a/sysdeps/x86_64/fpu/test-float-vlen8-avx2-wrappers.c b/sysdeps/x86_64/fpu/test-float-vlen8-avx2-wrappers.c
index 7515a59..637949b 100644
--- a/sysdeps/x86_64/fpu/test-float-vlen8-avx2-wrappers.c
+++ b/sysdeps/x86_64/fpu/test-float-vlen8-avx2-wrappers.c
@@ -28,3 +28,4 @@
 VECTOR_WRAPPER (WRAPPER_NAME (cosf), _ZGVdN8v_cosf)
 VECTOR_WRAPPER (WRAPPER_NAME (sinf), _ZGVdN8v_sinf)
 VECTOR_WRAPPER (WRAPPER_NAME (logf), _ZGVdN8v_logf)
+VECTOR_WRAPPER (WRAPPER_NAME (expf), _ZGVdN8v_expf)
diff --git a/sysdeps/x86_64/fpu/test-float-vlen8-avx2.c b/sysdeps/x86_64/fpu/test-float-vlen8-avx2.c
index 1026d63..e550f4d 100644
--- a/sysdeps/x86_64/fpu/test-float-vlen8-avx2.c
+++ b/sysdeps/x86_64/fpu/test-float-vlen8-avx2.c
@@ -24,6 +24,7 @@
 #define TEST_VECTOR_cosf 1
 #define TEST_VECTOR_sinf 1
 #define TEST_VECTOR_logf 1
+#define TEST_VECTOR_expf 1
 
 #define REQUIRE_AVX2
 
diff --git a/sysdeps/x86_64/fpu/test-float-vlen8-wrappers.c b/sysdeps/x86_64/fpu/test-float-vlen8-wrappers.c
index 6dde1a2..3b0a63d 100644
--- a/sysdeps/x86_64/fpu/test-float-vlen8-wrappers.c
+++ b/sysdeps/x86_64/fpu/test-float-vlen8-wrappers.c
@@ -25,3 +25,4 @@
 VECTOR_WRAPPER (WRAPPER_NAME (cosf), _ZGVcN8v_cosf)
 VECTOR_WRAPPER (WRAPPER_NAME (sinf), _ZGVcN8v_sinf)
 VECTOR_WRAPPER (WRAPPER_NAME (logf), _ZGVcN8v_logf)
+VECTOR_WRAPPER (WRAPPER_NAME (expf), _ZGVcN8v_expf)
diff --git a/sysdeps/x86_64/fpu/test-float-vlen8.c b/sysdeps/x86_64/fpu/test-float-vlen8.c
index 3fe10ad..db5b2e5 100644
--- a/sysdeps/x86_64/fpu/test-float-vlen8.c
+++ b/sysdeps/x86_64/fpu/test-float-vlen8.c
@@ -21,5 +21,6 @@
 #define TEST_VECTOR_cosf 1
 #define TEST_VECTOR_sinf 1
 #define TEST_VECTOR_logf 1
+#define TEST_VECTOR_expf 1
 
 #include "libm-test.c"

-----------------------------------------------------------------------

Summary of changes:
 ChangeLog                                          |   30 ++
 NEWS                                               |    2 +-
 sysdeps/unix/sysv/linux/x86_64/libmvec.abilist     |    4 +
 sysdeps/x86/fpu/bits/math-vector.h                 |    6 +
 sysdeps/x86_64/fpu/Makefile                        |    2 +
 sysdeps/x86_64/fpu/Versions                        |    1 +
 sysdeps/x86_64/fpu/libm-test-ulps                  |    8 +
 sysdeps/x86_64/fpu/multiarch/Makefile              |    4 +-
 sysdeps/x86_64/fpu/multiarch/svml_s_expf16_core.S  |   39 ++
 .../fpu/multiarch/svml_s_expf16_core_avx512.S      |  446 ++++++++++++++++++++
 sysdeps/x86_64/fpu/multiarch/svml_s_expf4_core.S   |   38 ++
 .../x86_64/fpu/multiarch/svml_s_expf4_core_sse4.S  |  212 ++++++++++
 sysdeps/x86_64/fpu/multiarch/svml_s_expf8_core.S   |   38 ++
 .../x86_64/fpu/multiarch/svml_s_expf8_core_avx2.S  |  202 +++++++++
 sysdeps/x86_64/fpu/svml_s_expf16_core.S            |   25 ++
 sysdeps/x86_64/fpu/svml_s_expf4_core.S             |   30 ++
 sysdeps/x86_64/fpu/svml_s_expf8_core.S             |   29 ++
 sysdeps/x86_64/fpu/svml_s_expf8_core_avx.S         |   25 ++
 sysdeps/x86_64/fpu/svml_s_expf_data.S              |   63 +++
 sysdeps/x86_64/fpu/svml_s_expf_data.h              |   45 ++
 sysdeps/x86_64/fpu/test-float-vlen16-wrappers.c    |    1 +
 sysdeps/x86_64/fpu/test-float-vlen16.c             |    1 +
 sysdeps/x86_64/fpu/test-float-vlen4-wrappers.c     |    1 +
 sysdeps/x86_64/fpu/test-float-vlen4.c              |    1 +
 .../x86_64/fpu/test-float-vlen8-avx2-wrappers.c    |    1 +
 sysdeps/x86_64/fpu/test-float-vlen8-avx2.c         |    1 +
 sysdeps/x86_64/fpu/test-float-vlen8-wrappers.c     |    1 +
 sysdeps/x86_64/fpu/test-float-vlen8.c              |    1 +
 28 files changed, 1255 insertions(+), 2 deletions(-)
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_expf16_core.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_expf16_core_avx512.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_expf4_core.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_expf4_core_sse4.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_expf8_core.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_expf8_core_avx2.S
 create mode 100644 sysdeps/x86_64/fpu/svml_s_expf16_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_s_expf4_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_s_expf8_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_s_expf8_core_avx.S
 create mode 100644 sysdeps/x86_64/fpu/svml_s_expf_data.S
 create mode 100644 sysdeps/x86_64/fpu/svml_s_expf_data.h


hooks/post-receive
-- 
GNU C Library master sources

