Re: RFC: x86-64: Use fxsave/xsave/xsavec in _dl_runtime_resolve [BZ #21265]


On Thu, Oct 19, 2017 at 9:30 PM, Carlos O'Donell <carlos@redhat.com> wrote:
> On 10/19/2017 03:36 PM, H.J. Lu wrote:
>>> Do you have any statistics on the timing for large applications
>>> that use a lot of libraries? I don't see gcc, binutils, or glibc as
>>> indicative of the complexity of shared libraries in terms of loaded
>>> shared libraries.
>>
>> _dl_runtime_resolve is only called once, the first time an external
>> function is called.  Having many shared libraries isn't a problem unless
>> all execution time is spent in _dl_runtime_resolve.  I don't believe
>> this is typical behavior.
>
> When you have many shared libraries, you are constantly calling
> _dl_runtime_resolve as the application features are first being used,
> and the question I have is "What kind of additional latency does this
> look like for an application with a lot of DSOs and a lot of external
> functions?"

When there are many DSOs, it takes more time to look up a symbol,
and the time to save/restore vector registers becomes noise.  The only
case where the time to save/restore vector registers becomes non-trivial
is when:

1. There are only a few DSOs, so symbol lookup takes few cycles.  And
2. There are many external functions which are called only once.  And
3. These external functions take very few cycles.

I can create such a testcase.  But I don't think it is a typical case.
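For reference, a rough sketch of that kind of measurement (this is only an
illustration I put together, not the testcase above; the file name, the use
of sin, and the build flags are my own assumptions): time the first call to
an external function, which goes through the PLT and _dl_runtime_resolve
under lazy binding, against the second call, which jumps through the
already-resolved PLT entry.  Build without BIND_NOW, e.g.
"gcc -O2 -fno-builtin bench.c -lm -o bench", and compare a run with
LD_BIND_NOW=1 set to see how much of the first-call cost is lazy binding.

/* bench.c: hypothetical micro-benchmark, not part of glibc.  */
#include <math.h>
#include <stdio.h>
#include <time.h>

static long long
now_ns (void)
{
  struct timespec ts;
  clock_gettime (CLOCK_MONOTONIC, &ts);
  return (long long) ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

int
main (void)
{
  volatile double x = 2.0;

  long long t0 = now_ns ();
  volatile double r1 = sin (x);   /* First call: goes through _dl_runtime_resolve.  */
  long long t1 = now_ns ();
  volatile double r2 = sin (x);   /* Second call: direct jump via the resolved PLT entry.  */
  long long t2 = now_ns ();

  printf ("first call:  %lld ns\n", t1 - t0);
  printf ("second call: %lld ns\n", t2 - t1);
  return r1 == r2 ? 0 : 1;
}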

> I understand that you *have* tested the raw latency of the call itself,
> but it's not clear how that relates to real-world performance. I would
> like to see a real look at some application to see how it operates.
>
>>> If we can show that the above latency is in the noise for real
>>> applications using many DSOs, then it makes your case better for
>>> supporting the alternate calling conventions.
>>>
>>
>> Here is the updated patch, which updates the xsave state size for
>> GLIBC_TUNABLES=glibc.tune.hwcaps=-XSAVEC_Usable
>
> OK.
>
> Is the purpose of the tunable to disable or enable using xsave and
> allow an application to get back the performance it might have lost?

No.  This disables XSAVEC and uses XSAVE instead.  XSAVEC takes
less space and should be faster.
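To make the size point concrete, here is a rough sketch (mine, not part of
the patch) that reads CPUID leaf 0xD to compare the standard-format XSAVE
area size with the compacted size XSAVEC would use for the same enabled
features.  The exact numbers depend on which features XCR0 enables, so
treat the output as illustrative.

/* Sub-leaf 0 reports in EBX the size of the standard-format XSAVE area
   for the features enabled in XCR0; sub-leaf 1 reports XSAVEC support in
   EAX bit 1 and the compacted-format size in EBX.  */
#include <cpuid.h>
#include <stdio.h>

int
main (void)
{
  unsigned int eax, ebx, ecx, edx;

  __cpuid_count (0xD, 0, eax, ebx, ecx, edx);
  unsigned int xsave_size = ebx;

  __cpuid_count (0xD, 1, eax, ebx, ecx, edx);
  unsigned int has_xsavec = (eax >> 1) & 1;
  unsigned int xsavec_size = ebx;

  printf ("XSAVE  (standard format):  %u bytes\n", xsave_size);
  if (has_xsavec)
    printf ("XSAVEC (compacted format): %u bytes\n", xsavec_size);
  else
    printf ("XSAVEC not supported\n");
  return 0;
}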

> If we are going to recommend this to users, we should add another
> tunable that is easier to use and document that.
>
> e.g. glibc.tune.x86_optimize_???call=1 (disables xsave usage,
> defaults to 0).

We can add this on top of what we have now.  But it will make the text
size of ld.so bigger, which pollutes the cache.  The increased cycles to
save/restore vector registers with fxsave/xsave/xsavec are just noise.

-- 
H.J.

