This is the mail archive of the
mailing list for the glibc project.
Re: [PATCH] Remove unnecessary IFUNC dispatch for __memset_chk.
- From: "H.J. Lu" <hjl dot tools at gmail dot com>
- To: Zack Weinberg <zackw at panix dot com>
- Cc: Andreas Schwab <schwab at linux-m68k dot org>, GNU C Library <libc-alpha at sourceware dot org>
- Date: Sun, 9 Aug 2015 13:36:03 -0700
- Subject: Re: [PATCH] Remove unnecessary IFUNC dispatch for __memset_chk.
- Authentication-results: sourceware.org; auth=none
- References: <20150809013434 dot 0B16814B9A at panix1 dot panix dot com> <m28u9lotfk dot fsf at linux-m68k dot org> <55C76FCD dot 5020607 at panix dot com> <CAMe9rOoAWjRma_mG_FazVh3FGOyiGJ=g82=bsfGqa-COnt5p1g at mail dot gmail dot com> <55C78525 dot 40402 at panix dot com> <CAMe9rOrKg8nzB67+OCXz5n1u7ZLnJncpX7J6KkEXqe0Bra843w at mail dot gmail dot com> <55C79AD8 dot 3070301 at panix dot com>
On Sun, Aug 9, 2015 at 11:24 AM, Zack Weinberg <zackw at panix dot com> wrote:
> On 08/09/2015 01:56 PM, H.J. Lu wrote:
>>> Thanks, that clarifies what IFUNC _does_, but it doesn't help me
>>> understand how it interacts with the libc_hidden_* optimization. I see
>>> in the code that e.g. __GI_memset is pointed directly at __memset_sse2
>>> (for amd64) but I do not understand whether that is a limitation of the
>>> current implementation, a deliberate choice to avoid indirection at
>>> the cost of missing out on AVX2 tuning, or both.
>> Those comments were made when the first IFUNC implementation
>> was done. We have improved the IFUNC implementation since then
>> and those comments may not be true today. But we have to verify
>> that at least the extra indirection via the PLT doesn't hurt
>> performance on most current processors.
> That doesn't help me understand. Let me try to ask more specific questions.
> Is an IFUNC's variant-selecting function called only once per process,
> or every time?
> If we sent libc.so-internal calls to 'memset' through the PLT (as is
> currently done for 'malloc') would that mean they were subject to IFUNC
> dispatch?
> Is there any *other* way (that already exists - nothing that would
> require binutils changes) to cause libc.so-internal calls to 'memset' to
> be subject to IFUNC dispatch? Compared to using the PLT, what are the
> costs and benefits of doing it that way?
# ifdef SHARED
#  undef libc_hidden_builtin_def
/* It doesn't make sense to send libc-internal memset calls through a PLT.
   The speedup we get from using GPR instruction is likely eaten away
   by the indirect call in the PLT.  */
#  define libc_hidden_builtin_def(name) \
	.globl __GI_memset; __GI_memset = __memset_sse2
A local IFUNC call must go through the PLT, at the cost of one extra
indirect branch instruction. In return, it gives you the best
implementation for your processor, selected at run time.
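
To make the mechanism concrete, here is a minimal sketch of an IFUNC symbol written with the GCC/Clang `ifunc` attribute. It assumes an ELF target with IFUNC support (for example x86-64 Linux with glibc). All names in it (`my_fill`, `resolve_my_fill`, `fill_bytewise`, `fill_unrolled`, `fill_demo`) are hypothetical and are not glibc's, and the selection test in the resolver is only a stand-in for real CPU feature detection.

```c
/* Sketch only: needs GCC or Clang on an ELF target with IFUNC support.
   Every name below is hypothetical, not taken from glibc.  */
#include <stddef.h>

/* Baseline implementation: one byte per iteration.  */
static void *fill_bytewise(void *dst, int c, size_t n)
{
    unsigned char *p = dst;
    while (n--)
        *p++ = (unsigned char)c;
    return dst;
}

/* Alternative implementation: four bytes per iteration, then the tail.  */
static void *fill_unrolled(void *dst, int c, size_t n)
{
    unsigned char *p = dst;
    for (; n >= 4; n -= 4, p += 4)
        p[0] = p[1] = p[2] = p[3] = (unsigned char)c;
    while (n--)
        *p++ = (unsigned char)c;
    return dst;
}

typedef void *fill_fn(void *, int, size_t);

/* Resolver: the dynamic loader calls this when it resolves the IFUNC
   relocation and binds my_fill to whatever address is returned.
   Assumption: a real resolver would check CPU features here; the
   pointer-size test is just a placeholder for that decision.  */
static fill_fn *resolve_my_fill(void)
{
    return sizeof(void *) >= 8 ? fill_unrolled : fill_bytewise;
}

/* The IFUNC symbol itself.  Calls to it are indirect (through the
   PLT/GOT), which is the extra branch discussed above.  */
void *my_fill(void *dst, int c, size_t n)
    __attribute__((ifunc("resolve_my_fill")));

/* Small self-check: returns 0 if my_fill filled the whole buffer.  */
int fill_demo(void)
{
    unsigned char buf[7] = {0};
    my_fill(buf, 0xAB, sizeof buf);
    for (size_t i = 0; i < sizeof buf; i++)
        if (buf[i] != 0xAB)
            return 1;
    return 0;
}
```

Calling `fill_demo()` from `main` exercises the dispatch. The caller never names a specific variant; it only pays for the indirect call.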
> What is the function of __libc_ifunc_impl_list? The document you
> referred me to does not mention it or suggest that it might be necessary.
__libc_ifunc_impl_list returns all implementations supported on your
processor; it is used for testing and benchmarking.
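
As an illustration of why such a list helps with testing, here is a small sketch of the idea: a function that enumerates the candidate implementations, and a test loop that runs each usable one instead of only the variant the resolver would pick. This is not glibc's internal interface; `struct impl_entry`, `list_set_impls` and `check_all_impls` are hypothetical names, and the "unusable" entry merely simulates a variant whose CPU feature is missing.

```c
/* Sketch only: a hypothetical analogue of an implementation list,
   not glibc's actual internal API.  */
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

struct impl_entry {
    const char *name;                    /* label for test/benchmark output */
    void *(*fn)(void *, int, size_t);    /* the implementation itself */
    bool usable;                         /* can it run on this processor? */
};

static void *set_bytewise(void *dst, int c, size_t n)
{
    unsigned char *p = dst;
    while (n--)
        *p++ = (unsigned char)c;
    return dst;
}

static void *set_libc(void *dst, int c, size_t n)
{
    return memset(dst, c, n);
}

/* Copies up to max entries into array and returns how many were stored.  */
size_t list_set_impls(struct impl_entry *array, size_t max)
{
    const struct impl_entry all[] = {
        { "set_bytewise", set_bytewise, true },
        { "set_libc", set_libc, true },
        /* Assumption: pretend this variant needs an absent CPU feature.  */
        { "set_unavailable", set_bytewise, false },
    };
    size_t n = 0;
    for (size_t i = 0; i < sizeof all / sizeof all[0] && n < max; i++)
        array[n++] = all[i];
    return n;
}

/* Runs every usable implementation on a small buffer and returns the
   number of implementations that produced a wrong result.  */
int check_all_impls(void)
{
    struct impl_entry impls[8];
    size_t count = list_set_impls(impls, 8);
    int failures = 0;

    for (size_t i = 0; i < count; i++) {
        unsigned char buf[16];
        bool ok = true;

        if (!impls[i].usable)
            continue;                    /* skip variants this CPU cannot run */
        memset(buf, 0, sizeof buf);
        impls[i].fn(buf, 0x5A, sizeof buf);
        for (size_t j = 0; j < sizeof buf; j++)
            if (buf[j] != 0x5A)
                ok = false;
        if (!ok)
            failures++;
    }
    return failures;
}
```

A benchmark can walk the same list and time each entry by name, which is not possible when only the IFUNC-selected symbol is reachable.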