Re: [PATCH] Remove unnecessary IFUNC dispatch for __memset_chk.

On Mon, Aug 10, 2015 at 10:48:03PM -0400, Mike Frysinger wrote:
> On 10 Aug 2015 23:12, Ondřej Bílka wrote:
> > On Sun, Aug 09, 2015 at 11:09:20PM -0400, Mike Frysinger wrote:
> > > On 09 Aug 2015 14:24, Zack Weinberg wrote:
> > > > Is an IFUNC's variant-selecting function called only once per process,
> > > > or every time?
> > > 
> > > it's once-per-process.  if it were every time, it'd defeat the point of the
> > > optimization.
> >
> > No, it's once per shared library.
> it depends on how you're counting.  i'm talking about each ifunc resolver -- it 
> only executes once per process.  yes, the overall lookup of ifunc relocs happens
> on a per-object basis, but it doesn't mean each resolver runs more than once.
No, you are wrong. You should verify, not make guesses. It's clearly
called twice in the following program.

gcc libfoo.c -fPIC -shared -o libfoo.so

int foo (int x) __attribute__ ((ifunc ("resolve_foo")));

int foo_impl (int x)
{
  return 42;
}

int bar (void)
{
  return foo (5);
}

static int (*resolve_foo (void)) (int)
{
  return foo_impl; // we'll just always select this routine
}

gcc main.c -L. -lfoo

int foo(int x);
int bar();
int main(){
  foo (5);
  bar ();
  return 0;
}
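
For comparison, here is a minimal sketch of the single-object case, where
the resolver should run only once no matter how often foo is called. It
assumes GCC or Clang on an ELF/glibc target with IFUNC support; the names
resolver_calls and call_foo_twice are hypothetical and are not part of the
program above.

```c
/* Sketch only: needs GNU IFUNC support (GCC or Clang, ELF/glibc target).
   resolver_calls and call_foo_twice are made-up names for illustration.  */

static int resolver_calls;   /* how many times the resolver has run */

static int foo_impl (int x)
{
  (void) x;
  return 42;
}

/* The resolver returns the implementation that foo gets bound to.  It runs
   while relocations are being processed, so it only touches a static
   counter and calls nothing else.  */
static int (*resolve_foo (void)) (int)
{
  resolver_calls++;
  return foo_impl;
}

int foo (int x) __attribute__ ((ifunc ("resolve_foo")));

/* Calls foo twice from this one object, then reports the resolver count.
   Returns -1 if foo did not dispatch to foo_impl.  */
int call_foo_twice (void)
{
  if (foo (5) != 42 || foo (5) != 42)
    return -1;
  return resolver_calls;
}
```

Calling call_foo_twice from main and printing the result shows the count
for one object. Putting the same kind of counter into the libfoo resolver
above is one way to check the count when both main and bar call foo.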

> > > > If we sent calls to 'memset' through the PLT (as is
> > > > currently done for 'malloc') would that mean they were subject to IFUNC
> > > > dispatch?
> > > 
> > > it's a double-edged sword.  we specifically want to avoid the PLT for two
> > > reasons:
> > > (1) speed (PLT is slow)
> > > (2) interposition (we don't want someone exporting a memset symbol and then 
> > > internal glibc code calling that instead of our own version)
> >
> > No, as I wrote in
> >
> > [PATCH] x86-64: Remove plt bypassing of ifuncs.
> >
> > that's a completely flawed analysis. In the best case you could save a
> > few cycles. When I looked at the functions, for most of them you would
> > lose at least twenty cycles, as the differences between implementations
> > are that big.
> it isn't a flawed analysis as i covered this explicitly in the part of my
> e-mail that you snipped

No, it is flawed, and you snipped the essential part. You didn't consider
code size. You will have two versions of the same function, effectively
doubling the code size for each of these functions. You will spend much
more on cache-line misses than you are trying to save. So why do you
think that all internal functions are frequently called whenever a user
app calls some libc function? And that the rest of the application is so
small that it doesn't notice it has 1 KB of cache per function available?

