This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: [RFC] ifunc suck, use ufunc.


On Mon, May 25, 2015 at 04:36:52AM +0200, Ondřej Bílka wrote:
> On Mon, May 25, 2015 at 03:43:23AM +0200, Szabolcs Nagy wrote:
> > * Ondřej Bílka <neleai@seznam.cz> [2015-05-24 23:38:58 +0200]:
> > > A main benefit would be interlibrary constant folding. Why waste cycles
> > > on reinitializing a constant? Just save it in the ufunc structure. The
> > > resolver could then precompute tables to improve speed.
> > > 
> > > As for interposing these, you would need to interpose the resolver.
> > > 
> > > GCC support is not needed, but we could gain something from an alternate
> > > calling convention, as passing the resolver struct is common and it could
> > > be preserved across loops with tail calls.
> > > 
> > > A future direction could be to replace the PLT and linker with ufunc; it
> > > would require adding a function string pointer to the structure and first
> > > calling a generic resolver to select the specific resolver.
> > > 
> > > Comments?
> > > 
> > 
> > this makes memset non-async-signal-safe. (qoi issue)
> >
> Did I explicitly say that it's an architecture-specific optimization, or
> did I forget?

AS-safety is broken regardless of arch. Only the barrier stuff is
arch-specific.

> > it is not thread-safe either and would need an acquire
> > load barrier on every invocation of memset to fix that
> > or the use of thread local storage. (conformance issue)
> > 
> > (in the example only resolve->fn is modified and idempotently,
> > this would work in practice but as soon as ->data is accessed
> > too the memory ordering guarantees are required.. which can
> > be made efficient on some archs but only in asm)
> > 
> > in the example memset is called through the wrong type
> > of function pointer: the resolver and resolvee are
> > incompatible so this is invalid c, only works in asm.
> > 
> That's why I intended it as architecture-specific. On x64 it will work
> with the memset prototype. Adding atomics/locking in the resolver would be
> unnecessary overhead.
> 
> This could be made generic by defining macros that expand to an atomic
> read on archs that don't act as a PRAM.

Do you realize the relative cost of an atomic read (barrier) versus a
small memset? This is like driving an extra mile to a cheaper gas
station to save $0.01 per gallon...
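
For reference, a minimal sketch of where that acquire load would sit if the
structure were written in portable C11. Every name here (struct ufunc,
ufunc_memset, memset_resolver, memset_impl, the state field) is hypothetical
and not taken from the proposal; the proposed version is asm-only and calls
through the plain memset prototype. The state flag is an added assumption so
that two threads racing on the first call do not both write ->data. This does
nothing for async-signal-safety.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <string.h>

struct ufunc;
typedef void *(*ufunc_fn)(struct ufunc *, void *, int, size_t);

/* Hypothetical per-call-site structure; the layout is an assumption. */
struct ufunc {
    _Atomic(ufunc_fn) fn;    /* published last, with release ordering */
    _Atomic int state;       /* 0 = unresolved, 1 = claimed by one thread */
    unsigned char data[16];  /* precomputed by the resolver */
};

/* Resolved implementation: reads ->data, so it relies on the acquire load
   in ufunc_memset. The fill byte is the call site's constant, cached in
   data[0]; the c argument is ignored here. */
static void *memset_impl(struct ufunc *u, void *dst, int c, size_t n)
{
    (void)c;
    return memset(dst, u->data[0], n);
}

/* First-call resolver: exactly one thread fills ->data and publishes the
   implementation. Every caller that reaches the resolver is served by the
   generic memset for this one call. */
static void *memset_resolver(struct ufunc *u, void *dst, int c, size_t n)
{
    int expected = 0;
    if (atomic_compare_exchange_strong(&u->state, &expected, 1)) {
        memset(u->data, c, sizeof u->data);
        atomic_store_explicit(&u->fn, memset_impl, memory_order_release);
    }
    return memset(dst, c, n);
}

/* Call-site wrapper: this acquire load runs on every invocation. */
static void *ufunc_memset(struct ufunc *u, void *dst, int c, size_t n)
{
    ufunc_fn fn = atomic_load_explicit(&u->fn, memory_order_acquire);
    return fn(u, dst, c, n);
}
```

A call site would declare something like
"static struct ufunc u = { .fn = memset_resolver };" and call
ufunc_memset(&u, x, 0, n). On x86-64 the acquire load typically compiles to a
plain load; on weaker-ordered archs it may need a barrier or a special load
instruction on every call.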

> > it is not clear to me how many such ufunc structs will be
> > in a program for a specific function and how their redundant
> > initialization is avoided.
> > (one for every call site? every tu? every dso?)
> > 
> The main objective makes this necessary on a per-call-site basis. These
> aren't redundant, as the data will be different for different call sites.
> For example, in the sequence
> 
> memset (x,0,n);
> memset (y,1,n);
> 
> the data for the first memset would contain 16 zero bytes and the data for
> the second 16 one bytes, to save cycles on repeatedly creating the mask.
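
As an illustration of the cached-mask idea only, here is a portable C sketch
under stated assumptions: fill_mask, make_mask and fill_with_mask are
hypothetical names, and the 16-byte copies stand in for the SSE stores that
the real asm would use. The mask is built once per fill value and reused for
each fill instead of being rebuilt on every call.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical cached pattern: 16 copies of the fill byte. */
struct fill_mask {
    unsigned char bytes[16];
};

/* Build the mask once, e.g. at resolve time for a given call site. */
static struct fill_mask make_mask(int c)
{
    struct fill_mask m;
    memset(m.bytes, (unsigned char)c, sizeof m.bytes);
    return m;
}

/* Fill n bytes at dst from the prebuilt mask, 16 bytes at a time,
   then copy the remaining tail of fewer than 16 bytes. */
static void *fill_with_mask(void *dst, const struct fill_mask *m, size_t n)
{
    unsigned char *p = dst;
    while (n >= sizeof m->bytes) {
        memcpy(p, m->bytes, sizeof m->bytes);
        p += sizeof m->bytes;
        n -= sizeof m->bytes;
    }
    if (n != 0)
        memcpy(p, m->bytes, n);
    return dst;
}
```

For the two calls above, one would keep make_mask(0) for the first call site
and make_mask(1) for the second. Whether this saves anything measurable over
rebuilding the mask on each call is not shown here.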
> 
> Then there is a planned x64-specific optimization where I need to change
> the prototype further to pass the data in the xmm0 register and the end of
> the string. Then you could call into different places in the unrolled
> moves, like
> 
> ....
> movdqu %xmm0, -64(%rdi)
> movdqu %xmm0, -48(%rdi)
> movdqu %xmm0, -32(%rdi)
> movdqu %xmm0, -16(%rdi)
> ret
> 
> This fixes the problem that gcc does similar unrolling but only with
> 8-byte moves, and tends to be quite excessive with it, which causes a
> performance penalty on cold paths. Also, gcc couldn't do this without
> splitting the function into several per-CPU variants, as on some archs this
> would be slow without aligning, on others a rep stosq would be faster,
> etc., and you must do resolution to determine which case applies.
> 
> A per-DSO variant would be possible with more bookkeeping (as I don't know
> how to convince the compiler to do that); you would need to end
> compilation by adding a file with hidden variables with the protected
> attribute.
> 
> A similar idea would make more sense as a gcc optimization that first
> extracts the address from the PLT.

This is utterly hideous...

Rich

