This is the mail archive of the mailing list for the glibc project.


Re: [PATCH] Remove unnecessary IFUNC dispatch for __memset_chk.

On Sun, Aug 9, 2015 at 11:24 AM, Zack Weinberg wrote:
> On 08/09/2015 01:56 PM, H.J. Lu wrote:
>>> Thanks, that clarifies what IFUNC _does_, but it doesn't help me
>>> understand how it interacts with the libc_hidden_* optimization.  I see
>>> in the code that e.g. __GI_memset is pointed directly at __memset_sse2
>>> (for amd64) but I do not understand whether that is a limitation of the
>>> current implementation, a deliberate choice to avoid indirection at
>>> the cost of missing out on AVX2 tuning, or both.
>> Those comments were made when the first IFUNC implementation
>> was done.  We have improved the IFUNC implementation since then
>> and those comments may not be true today.  But we have to verify
>> that at least the extra indirection via the PLT doesn't hurt
>> performance on most current processors.
> That doesn't help me understand.  Let me try to ask more specific questions.
> Is an IFUNC's variant-selecting function called only once per process,
> or every time?


> If we sent calls to 'memset' through the PLT (as is
> currently done for 'malloc') would that mean they were subject to IFUNC
> dispatch?


> Is there any *other* way (that already exists - nothing that would
> require binutils changes) to cause calls to 'memset' to
> be subject to IFUNC dispatch?  Compared to using the PLT, what are the

Just remove

#  ifdef SHARED
#  undef libc_hidden_builtin_def
/* It doesn't make sense to send libc-internal memset calls through a PLT.
   The speedup we get from using GPR instruction is likely eaten away
   by the indirect call in the PLT.  */
#  define libc_hidden_builtin_def(name) \
.globl __GI_memset; __GI_memset = __memset_sse2
#  endif

> costs and benefits of doing it that way?

A local IFUNC call must go through the PLT, which costs an extra
indirect branch instruction.  In exchange, it gives you the best
implementation for your processor at run time.
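
To make the trade-off concrete, here is a minimal portable sketch that
models the two paths with an ordinary function pointer.  It is not real
IFUNC and does not use the PLT: the names memset_baseline, memset_tuned,
cpu_has_fast_path, memset_slot, dispatched_memset and internal_memset
are all hypothetical stand-ins.  It assumes the usual behaviour that the
selection function runs when the slot is first filled rather than on
every call.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef void *(*memset_fn)(void *, int, size_t);

/* Hypothetical variants; stand-ins for a baseline and a CPU-tuned memset. */
static void *memset_baseline(void *dst, int c, size_t n)
{
    unsigned char *p = dst;
    for (size_t i = 0; i < n; i++)
        p[i] = (unsigned char) c;
    return dst;
}

static void *memset_tuned(void *dst, int c, size_t n)
{
    return memset(dst, c, n);
}

static int resolver_calls;         /* counts how often selection runs */
static int cpu_has_fast_path = 1;  /* assumption: stands in for a CPU feature check */

/* Plays the role of the variant-selecting (resolver) function. */
static memset_fn resolve_memset(void)
{
    resolver_calls++;
    return cpu_has_fast_path ? memset_tuned : memset_baseline;
}

/* Models the slot behind a PLT entry: filled once, then reused.  In this
   model the slot is checked on each call; a real dynamic linker fills it
   while processing relocations instead.  */
static memset_fn memset_slot;

void *dispatched_memset(void *dst, int c, size_t n)
{
    if (memset_slot == NULL)
        memset_slot = resolve_memset();
    assert(memset_slot != NULL);
    return memset_slot(dst, c, n);  /* the extra indirect branch */
}

/* Models "__GI_memset = __memset_sse2": a direct call with no dispatch,
   always bound to the baseline variant.  */
void *internal_memset(void *dst, int c, size_t n)
{
    return memset_baseline(dst, c, n);
}
```

In this model, calling dispatched_memset any number of times runs
resolve_memset once, while internal_memset never consults the slot and
so never picks up the tuned variant.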

> What is the function of __libc_ifunc_impl_list?  The document you
> referred me to does not mention it or suggest that it might be necessary.

__libc_ifunc_impl_list returns all implementations supported on your
processor.  It is used for testing and benchmarking.
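
As an illustration of why such a list is useful for testing, the
following sketch walks a table of variants and checks each usable one
against the C library's memset.  The struct impl_entry, the table
memset_impls, the two variants and check_all_memset_impls are
hypothetical names invented for this example; it does not call
__libc_ifunc_impl_list, whose signature is not shown in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

typedef void *(*memset_impl_fn)(void *, int, size_t);

/* Hypothetical record describing one implementation. */
struct impl_entry {
    const char *name;
    memset_impl_fn fn;
    bool usable;   /* would reflect whether this CPU supports the variant */
};

static void *memset_bytewise(void *dst, int c, size_t n)
{
    unsigned char *p = dst;
    for (size_t i = 0; i < n; i++)
        p[i] = (unsigned char) c;
    return dst;
}

static void *memset_wordwise(void *dst, int c, size_t n)
{
    unsigned char *p = dst;
    unsigned char pattern[8];
    for (size_t i = 0; i < sizeof pattern; i++)
        pattern[i] = (unsigned char) c;
    while (n >= sizeof pattern) {   /* copy 8 bytes at a time */
        memcpy(p, pattern, sizeof pattern);
        p += sizeof pattern;
        n -= sizeof pattern;
    }
    while (n-- > 0)                 /* then the remaining tail bytes */
        *p++ = (unsigned char) c;
    return dst;
}

static const struct impl_entry memset_impls[] = {
    { "memset_bytewise",    memset_bytewise, true  },
    { "memset_wordwise",    memset_wordwise, true  },
    { "memset_unavailable", memset_wordwise, false },  /* pretend unsupported */
};

/* Checks every usable variant against the library memset.  Returns the
   number of variants checked, or -1 on the first mismatch.  */
int check_all_memset_impls(void)
{
    unsigned char expect[37], got[37];
    int checked = 0;

    for (size_t i = 0; i < sizeof memset_impls / sizeof memset_impls[0]; i++) {
        if (!memset_impls[i].usable)
            continue;
        assert(memset_impls[i].fn != NULL);
        memset(expect, 0xAB, sizeof expect);
        memset(got, 0, sizeof got);
        void *ret = memset_impls[i].fn(got, 0xAB, sizeof got);
        if (ret != got || memcmp(expect, got, sizeof got) != 0)
            return -1;
        checked++;
    }
    return checked;
}
```

A benchmark harness could walk the same table and time each usable
entry instead of comparing results.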

