This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Doing more inside an ifunc (Was Re: Document use of IFUNC support outside of libc.)


On Mon, Mar 07, 2016 at 05:33:24PM +0000, Szabolcs Nagy wrote:
> there seems to be interest in optimizations/dispatch based
> on the micro architecture which is not easily available in
> userspace currently (on aarch64).

Sorry, I was interested in this conversation but completely missed it,
so starting it again.  I hope it's not too late :)

> linux exports various cpu info in /sys but that is not
> stable abi and users probably don't want large number of
> syscalls traversing the /sys tree at process startup just
> to get slightly better tuned memcpy or similar.
> 
> one idea by Adhemerval Zanella was to use vdso for this.
> (the kernel can provide a versioned function symbol there
> to return a pointer to some cpu info struct, which can be
> read only thus shared across processes).
> there is no proposed design for this yet either on kernel
> or libc side, but it would make sense if ifunc could use it.
> 
> currently the only reliable mechanisms for ifunc dispatch
> are hwcap feature bits (if passed as argument) or cpuid
> like instruction (e.g. on aarch64 cpuid like instructions
> are not available to userspace, but can be emulated by the
> kernel or provided as syscall, in either case it would be
> context switch into the kernel, which can be bad if large
> number of ifunc resolvers do it e.g. because function multi-
> versioning is implemented that way, unless there is some
> caching mechanism which is also not easy to do in ifunc...)
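A minimal sketch of hwcap-based dispatch, for illustration only. It models the resolver as an ordinary function that takes the hwcap word, so it can run on any host. The bit EXAMPLE_HWCAP_ASIMD (chosen to match arm64's HWCAP_ASIMD, 1 << 1) and both memcpy variants are hypothetical stand-ins. Wiring it up as a real IFUNC would use GCC's ifunc attribute; on aarch64, glibc passes the hwcap value as the resolver's first argument.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical feature bit; matches arm64's HWCAP_ASIMD (1 << 1).
   Real code should take it from the system headers. */
#define EXAMPLE_HWCAP_ASIMD (1UL << 1)

typedef void *(*memcpy_fn)(void *, const void *, size_t);

/* Portable byte-by-byte fallback. */
static void *memcpy_generic(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    while (n--)
        *d++ = *s++;
    return dst;
}

/* Hypothetical stand-in for a SIMD-tuned variant. */
static void *memcpy_simd(void *dst, const void *src, size_t n)
{
    return memcpy(dst, src, n);
}

/* Resolver logic: it can only see feature bits, not which
   microarchitecture it is running on.  As a real IFUNC on aarch64:
     void *my_memcpy(void *, const void *, size_t)
         __attribute__((ifunc("resolve_memcpy")));
   where resolve_memcpy(unsigned long hwcap) returns select_memcpy(hwcap). */
static memcpy_fn select_memcpy(unsigned long hwcap)
{
    return (hwcap & EXAMPLE_HWCAP_ASIMD) ? memcpy_simd : memcpy_generic;
}
```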

The context switch is not the worst thing that can happen for the
emulated instructions because we can easily cache the result and
reduce the number of context switches to a minimum.  The hard part with
the emulated instruction (MRS) is heterogeneous systems, where it would
be difficult (if not impossible) for userspace to use the emulated
instruction alone to deterministically identify all of the processor
cores.
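A sketch of the caching described above, assuming Linux/arm64 with kernel MRS emulation. Later kernels advertise this emulation via HWCAP_CPUID, taken here as bit 11 of AT_HWCAP per the arm64 uapi headers. The trapping read runs at most once. The cached value describes only the core the thread happened to be on, which is exactly why this does not solve the heterogeneous case. On other hosts the read returns 0 as a placeholder.

```c
#include <stdint.h>
#if defined(__aarch64__) && defined(__linux__)
#include <sys/auxv.h>
#endif

/* Counts how many times the expensive (trapping) read actually ran. */
static unsigned midr_raw_reads;

/* Reads MIDR_EL1.  On Linux/arm64 the MRS traps and the kernel emulates
   it (a context switch).  It is attempted only if AT_HWCAP bit 11
   (assumed HWCAP_CPUID) is set; otherwise 0 is returned. */
static uint64_t read_midr_raw(void)
{
    midr_raw_reads++;
#if defined(__aarch64__) && defined(__linux__)
    if (getauxval(AT_HWCAP) & (1UL << 11)) {
        uint64_t v;
        __asm__ volatile("mrs %0, midr_el1" : "=r"(v));
        return v;
    }
#endif
    return 0;
}

/* Caches the first result so repeated resolvers cost no extra traps.
   Not thread-safe: assumes single-threaded callers such as startup
   relocation processing.  On a big.LITTLE-style system the value only
   identifies the core this thread ran on at the time of the read. */
static uint64_t cached_midr(void)
{
    static int have_midr;
    static uint64_t midr;
    if (!have_midr) {
        midr = read_midr_raw();
        have_midr = 1;
    }
    return midr;
}
```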

So the emulated instruction will only work for specific processor
cores that are known to only ever appear in homogeneous
configurations.  For anything else, we will need the kernel to give us
full information about all of the cores in another way, either via
sysfs or the vdso.  The sysfs route was proposed earlier [1], but it
is hairy for us because it traverses the filesystem to identify all
CPU cores, so the number of syscalls grows with the number of cores.
The vdso alternative is better because the kernel can give us all of
the information in exactly one call while also avoiding the context
switch.
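To make the vdso idea concrete, here is a purely hypothetical sketch. No such vdso interface, struct layout, or symbol exists; struct cpu_info_table and hypothetical_vdso_cpu_info() are invented for illustration. One call returns a read-only table covering every core, so a resolver can check whether the system is homogeneous without any syscalls. The sysfs route would instead need at least an open/read/close per core. The MIDR values are example data for a mixed 4+4 system.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_CPUS 8

/* HYPOTHETICAL layout of a read-only table the kernel could share. */
struct cpu_info_table {
    uint32_t version;
    uint32_t ncpus;
    uint64_t midr[MAX_CPUS]; /* one identification value per core */
};

/* Example data: two different core types, four of each. */
static const struct cpu_info_table example_table = {
    1, 8,
    { 0x410fd034, 0x410fd034, 0x410fd034, 0x410fd034,
      0x410fd070, 0x410fd070, 0x410fd070, 0x410fd070 },
};

/* HYPOTHETICAL stand-in for a versioned vdso symbol: one call, no syscall. */
static const struct cpu_info_table *hypothetical_vdso_cpu_info(void)
{
    return &example_table;
}

/* Returns 1 if every core reports the same MIDR, i.e. a resolver could
   safely tune for a single microarchitecture. */
static int all_cores_identical(const struct cpu_info_table *t)
{
    uint32_t n = t->ncpus < MAX_CPUS ? t->ncpus : MAX_CPUS;
    for (uint32_t i = 1; i < n; i++)
        if (t->midr[i] != t->midr[0])
            return 0;
    return 1;
}
```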

I had hacked up a patch to test using the sysfs patches in [1], and it
required reimplementing some string functions locally so that the
resolver did not reference the libc ones, but that was about the only
thing needed to get it working.  Safety, however, is a completely
different issue, and I don't know if we can even guarantee it during
symbol resolution.
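The message does not say why the string functions had to be reimplemented. A common reason is that libc's string routines are often IFUNCs themselves and may not be resolved yet when another resolver runs. The sketch below shows the kind of self-contained helper that implies: parsing a hex value such as the contents of a sysfs MIDR file. It assumes a "0x..." text format, and local_parse_hex is a hypothetical name. It uses no libc calls at all.

```c
#include <stddef.h>
#include <stdint.h>

/* Local helper: value of one hex digit, or -1 if not a hex digit. */
static int local_hexval(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Parses an optional "0x" prefix followed by hex digits, stopping at the
   first non-hex character (e.g. a trailing newline).  Does not detect
   overflow past 16 digits.  Returns 0 on success, -1 if no digits. */
static int local_parse_hex(const char *s, size_t len, uint64_t *out)
{
    size_t i = 0;
    uint64_t v = 0;
    if (len >= 2 && s[0] == '0' && (s[1] == 'x' || s[1] == 'X'))
        i = 2;
    size_t start = i;
    for (; i < len; i++) {
        int d = local_hexval(s[i]);
        if (d < 0)
            break;
        v = (v << 4) | (uint64_t)d;
    }
    if (i == start)
        return -1;
    *out = v;
    return 0;
}
```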

Siddhesh

[1] https://lkml.org/lkml/2015/9/16/452

