This is the mail archive of the glibc-bugs@sourceware.org mailing list for the glibc project.



[Bug dynamic-link/21871] New: _dl_runtime_resolve_avx_opt is slower than _dl_runtime_resolve_avx_slow


https://sourceware.org/bugzilla/show_bug.cgi?id=21871

            Bug ID: 21871
           Summary: _dl_runtime_resolve_avx_opt is slower than
                    _dl_runtime_resolve_avx_slow
           Product: glibc
           Version: 2.26
            Status: NEW
          Severity: normal
          Priority: P2
         Component: dynamic-link
          Assignee: unassigned at sourceware dot org
          Reporter: hjl.tools at gmail dot com
  Target Milestone: ---
            Target: x86-64

On AVX machines that support XGETBV with ECX == 1, such as Skylake processors,

(gdb) disass _dl_runtime_resolve_avx_opt
Dump of assembler code for function _dl_runtime_resolve_avx_opt:
   0x0000000000015890 <+0>:     push   %rax
   0x0000000000015891 <+1>:     push   %rcx
   0x0000000000015892 <+2>:     push   %rdx
   0x0000000000015893 <+3>:     mov    $0x1,%ecx
   0x0000000000015898 <+8>:     xgetbv 
   0x000000000001589b <+11>:    mov    %eax,%r11d
   0x000000000001589e <+14>:    pop    %rdx
   0x000000000001589f <+15>:    pop    %rcx
   0x00000000000158a0 <+16>:    pop    %rax
   0x00000000000158a1 <+17>:    and    $0x4,%r11d
   0x00000000000158a5 <+21>:    bnd je 0x16200 <_dl_runtime_resolve_sse_vex>
End of assembler dump.

is slower than:

(gdb) disass _dl_runtime_resolve_avx_slow
Dump of assembler code for function _dl_runtime_resolve_avx_slow:
   0x0000000000015850 <+0>:     vorpd  %ymm0,%ymm1,%ymm8
   0x0000000000015854 <+4>:     vorpd  %ymm2,%ymm3,%ymm9
   0x0000000000015858 <+8>:     vorpd  %ymm4,%ymm5,%ymm10
   0x000000000001585c <+12>:    vorpd  %ymm6,%ymm7,%ymm11
   0x0000000000015860 <+16>:    vorpd  %ymm8,%ymm9,%ymm9
   0x0000000000015865 <+21>:    vorpd  %ymm10,%ymm11,%ymm10
   0x000000000001586a <+26>:    vpcmpeqd %xmm8,%xmm8,%xmm8
   0x000000000001586f <+31>:    vorpd  %ymm9,%ymm10,%ymm10
   0x0000000000015874 <+36>:    vptest %ymm10,%ymm8
   0x0000000000015879 <+41>:    bnd jae 0x158b0 <_dl_runtime_resolve_avx>
   0x000000000001587c <+44>:    vzeroupper 
   0x000000000001587f <+47>:    bnd jmpq 0x16200 <_dl_runtime_resolve_sse_vex>
End of assembler dump.
(gdb) 

since xgetbv takes many more cycles than single-cycle operations like
vorpd/vpcmpeqd/vptest.  _dl_runtime_resolve_opt should be used only with
AVX512, where AVX512 instructions lead to a lower CPU frequency on Skylake
server.
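
For readers less familiar with the check in _dl_runtime_resolve_avx_opt, here
is a minimal C sketch of the decision it makes. XGETBV with ECX == 1 returns
XCR0 masked by the processor's "in use" state bitmap, and bit 2 (0x4) covers
the AVX state, i.e. the upper halves of the YMM registers. The names
upper_ymm_in_use, pick_resolve_path, read_xinuse_low and the enum values are
hypothetical and chosen only for illustration; they are not glibc symbols.
The inline-assembly helper assumes GCC or Clang on x86-64, and the caller is
assumed to have confirmed via CPUID that XGETBV with ECX == 1 is supported.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit 2 of the value returned by XGETBV (ECX == 1): AVX state in use. */
#define XSTATE_AVX_BIT (1u << 2)

/* Hypothetical labels for the two trampolines seen in the disassembly. */
typedef enum { RESOLVE_SSE_VEX, RESOLVE_AVX } resolve_path;

/* Mirrors "and $0x4,%r11d": true when the upper YMM halves hold live state. */
static inline bool upper_ymm_in_use(uint32_t xinuse_low)
{
    return (xinuse_low & XSTATE_AVX_BIT) != 0;
}

/* Mirrors the "je _dl_runtime_resolve_sse_vex" branch: if the AVX state is
   not in use, saving only the SSE registers is enough. */
static inline resolve_path pick_resolve_path(uint32_t xinuse_low)
{
    return upper_ymm_in_use(xinuse_low) ? RESOLVE_AVX : RESOLVE_SSE_VEX;
}

#if defined(__x86_64__) && defined(__GNUC__)
/* Reads the low 32 bits of the in-use bitmap. Assumption: the caller has
   already verified that the CPU and OS support XGETBV with ECX == 1;
   otherwise this instruction faults. */
static inline uint32_t read_xinuse_low(void)
{
    uint32_t eax, edx;
    __asm__ volatile ("xgetbv" : "=a" (eax), "=d" (edx) : "c" (1));
    (void) edx;  /* high half is not needed for the AVX bit */
    return eax;
}
#endif
```

The point of the report is that this single xgetbv read, while short, costs
more cycles than the chain of vorpd/vptest instructions that
_dl_runtime_resolve_avx_slow uses to reach the same decision.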

