This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: [RFC] ifunc suck, use ufunc.


On Mon, May 25, 2015 at 03:11:35PM -0300, Adhemerval Zanella wrote:
> 
> 
> On 25-05-2015 10:49, Ondřej Bílka wrote:
> > On Mon, May 25, 2015 at 09:17:48AM -0300, Adhemerval Zanella wrote:
> >>
> >> Although the reason for such a mechanism seems reasonable, I do not
> >> see a good approach to adding more architecture-specific behaviour
> >> to GLIBC. IFUNC is already not really available on all platforms and
> >> it has its own idiosyncrasies for some ports (like hwcap init order).
> >>
> > While it isn't wise to add another mechanism, it is necessary. If you
> > have a program where in 50% of call sites implementation A is better
> > while in the other half B is better, how would you resolve that? No
> > matter how you select, you would get bad performance (and resolving
> > the selection from a backtrace would likely cost more than it saves).
> > 
> > Also you would lose performance from not being able to use precomputed
> > tables.
> > 
> > You could use ifunc at the cost of losing per-call-site information.
> > However, ifuncs would get ugly; they would no longer be a simple "if
> > you have feature X, select the implementation using X", checked in
> > order from newest to oldest.
> > 
> > At program start you would need to read a file and select the
> > function according to the profile in that file. You should probably
> > do the same with ufuncs, but these provide some stack.
> > 
> > An aim would be to replace ifuncs with ufunc or some variant. For that
> > they would need to coexist for some time until everything is converted.
> > 
> > One side benefit is that you won't need to add ifunc for new
> > architectures; just use ufuncs. One problem would be cost: as mentioned
> > before, without a hardware atomic write it could be too costly. One
> > possibility that I mentioned was to force double compilation on these
> > architectures, so you could do the resolving for a given CPU and then
> > distribute separate binaries with each CPU's data.
> 
> I understand the problem, but I find your initial idea troublesome. As
> pointed out by Szabolcs, there are security issues with putting function
> pointer information in read-write segments (which recent changes to
> binutils seem to avoid with copy segments and relro), it potentially
> causes issues for debugging and profiling tools, and it relies on memory
> semantics that are too x86_64 centric, among others.
>
As I wrote before, that's a solved problem. Use a PTR_MANGLE/DEMANGLE
equivalent, which harms performance a bit. Also, what is the problem with
debugging? Right now stepping into a string function sends you to random
assembly, so how would adding ufunc be different?
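
A rough sketch of what a PTR_MANGLE-style scheme could look like for a
writable function-pointer slot. The guard constant, the rotation count and
the names mangle, demangle, slot_store, slot_call and copy_bytes are all
made up for illustration. As far as I know glibc's real PTR_MANGLE is a
per-architecture macro keyed on a random per-process guard, so treat this
only as an approximation of the idea, not as glibc's code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef void *(*copy_fn)(void *, const void *, size_t);

/* Hypothetical guard value; a real one would be random per process. */
static uintptr_t guard = (uintptr_t)0x9e3779b97f4a7c15ULL;

enum { ROT = 9 };                       /* arbitrary rotation count */
#define PTR_BITS (sizeof(uintptr_t) * 8)

static uintptr_t rotl(uintptr_t v) { return (v << ROT) | (v >> (PTR_BITS - ROT)); }
static uintptr_t rotr(uintptr_t v) { return (v >> ROT) | (v << (PTR_BITS - ROT)); }

/* XOR with the guard, then rotate; demangle undoes both steps in reverse. */
static uintptr_t mangle(copy_fn f) { return rotl((uintptr_t)f ^ guard); }
static copy_fn demangle(uintptr_t m) { return (copy_fn)(rotr(m) ^ guard); }

/* Stand-in implementation so the example does not depend on libc internals. */
static void *copy_bytes(void *dst, const void *src, size_t n)
{
  unsigned char *d = dst;
  const unsigned char *s = src;
  while (n--)
    *d++ = *s++;
  return dst;
}

/* The writable slot only ever holds the mangled value. */
static uintptr_t slot;

static void slot_store(copy_fn f) { slot = mangle(f); }

static void *slot_call(void *dst, const void *src, size_t n)
{
  copy_fn f = demangle(slot);           /* the extra cost on every call */
  return f(dst, src, n);
}

/* Usage: store an implementation, then call through the slot. */
int mangle_demo(void)
{
  char buf[4];
  slot_store(copy_bytes);
  slot_call(buf, "abc", 4);
  return strcmp(buf, "abc");            /* 0 on success */
}
```

The demangle step on each call is the "harms performance a bit" part: an
overwritten slot decodes to an address the attacker cannot easily predict
without knowing the guard.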
 
> Also, I see IFUNC as *not* meant to fix the issue you are bringing up, but
> rather to provide chip-specific general optimization for the average case.
> What you are suggesting is a polymorphic way to change the function call
> based on profiling feedback.  How are you going to choose the granularity?
> Per program, per thread, per call?  Also, as pointed out, what happens if
> you start to use more profiling feedback information than the call site and
> start to require more atomic accesses to select/update the function call?
>
Not quite. With profiling, ufunc should replace ifunc, reducing it to stub
status. You said that you want chip-specific general optimization. A
natural question is which one is best. For that you need to do profiling
and select the one that looks best. Otherwise it would be suboptimal.

So ifunc selection would be the granularity at which you aggregate the
profile over all programs.

As I mentioned before, you need two phases: a first one where you enable
profiling and compile with per-call granularity everywhere, and a second
one where you strip rarely used ufuncs, precompute arrays and select the
implementation for the given CPU.
 
> > 
> > Yes, the main reason a Java-like approach is unsuitable is that on
> > Linux a lot of programs have a very short lifetime, so you need
> > persistent storage to collect enough data to select an implementation
> > with some reliability. This is compounded by the fact that some
> > functions are called rarely.
> > 
> > As I said before, one benefit could be to make the resolver select the
> > implementation that minimizes size for the first twenty calls and then
> > do potentially expensive profiling/reading of a profile to determine
> > the optimum.
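
To make the "first twenty calls" policy quoted above concrete, here is a
minimal sketch of a per-call-site slot. It is only an illustration under
assumptions: struct call_site, site_copy, copy_small, copy_large and the
SMALL_LIMIT threshold are hypothetical names and values, and a real ufunc
would patch the call target rather than go through a helper like this.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <string.h>

typedef void *(*copy_fn)(void *, const void *, size_t);

/* Two stand-in implementations; real ones would be tuned per CPU. */
static void *copy_small(void *dst, const void *src, size_t n)
{
  unsigned char *d = dst;
  const unsigned char *s = src;
  while (n--)
    *d++ = *s++;
  return dst;
}

static void *copy_large(void *dst, const void *src, size_t n)
{
  return memcpy(dst, src, n);
}

/* PROFILE_CALLS follows the "twenty calls" above; SMALL_LIMIT is arbitrary. */
enum { PROFILE_CALLS = 20, SMALL_LIMIT = 64 };

struct call_site {
  _Atomic(copy_fn) impl;   /* NULL while the site is still being profiled */
  atomic_size_t calls;     /* calls seen during warm-up */
  atomic_size_t bytes;     /* total bytes requested during warm-up */
};

static void *site_copy(struct call_site *cs, void *dst, const void *src,
                       size_t n)
{
  assert(cs != NULL);

  /* Fast path: the site has already been resolved. */
  copy_fn f = atomic_load_explicit(&cs->impl, memory_order_acquire);
  if (f != NULL)
    return f(dst, src, n);

  /* Warm-up: record the size. With several threads the two counters may
     be slightly out of step, which is acceptable for a heuristic. */
  size_t bytes = atomic_fetch_add(&cs->bytes, n) + n;
  size_t calls = atomic_fetch_add(&cs->calls, 1) + 1;
  if (calls >= PROFILE_CALLS) {
    copy_fn pick = (bytes / calls <= SMALL_LIMIT) ? copy_small : copy_large;
    atomic_store_explicit(&cs->impl, pick, memory_order_release);
  }
  return copy_small(dst, src, n);   /* compact default during warm-up */
}
```

Each call site would own one zero-initialised struct call_site, so a site
that mostly copies a few bytes and a site that copies large buffers can end
up with different implementations, which a single ifunc resolution per
symbol cannot express.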
