This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: [PATCH] Save and restore xmm0-xmm7 in _dl_runtime_resolve
- From: "H.J. Lu" <hjl dot tools at gmail dot com>
- To: Ondřej Bílka <neleai at seznam dot cz>
- Cc: "libc-alpha at sourceware dot org" <libc-alpha at sourceware dot org>
- Date: Tue, 28 Jul 2015 19:03:35 -0700
- Subject: Re: [PATCH] Save and restore xmm0-xmm7 in _dl_runtime_resolve
- References: <CAMe9rOoXLPUr_LUexoRKjrCdNhP0J8EMY+1XNAaLnpW1qknb7w at mail dot gmail dot com> <20150709142827 dot GA18030 at domone> <CAMe9rOoXCwiPdQVP7_tV7599f6y9w_n1P+SXsE7urb69f3v7gA at mail dot gmail dot com> <20150711104654 dot GA26570 at domone> <20150711202742 dot GA9074 at gmail dot com> <20150711235002 dot GA7543 at gmail dot com> <20150726131622 dot GA10623 at domone> <CAMe9rOre_GQimKou2PXjp95xcfN1jYO5-tkEAB7eMbP1HMO+FQ at mail dot gmail dot com> <20150727101015 dot GA489 at domone> <CAMe9rOrpDk6ixUJ+9RU5L0aV=uLUJzzNtJc-XuPURTFHhXzGRw at mail dot gmail dot com> <20150727132623 dot GA13448 at domone> <CAMe9rOpwaByc1ogE2Y4fJzc_hwNo5+B23F7T1Qdsv2qKJy8DcQ at mail dot gmail dot com> <CAMe9rOpniyn+1f4Lq=RBqD+F4C-KYF+vgxUBqS+x01EHuKdzdA at mail dot gmail dot com>
On Tue, Jul 28, 2015 at 1:55 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
> On Mon, Jul 27, 2015 at 6:37 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
>> On Mon, Jul 27, 2015 at 6:26 AM, Ondřej Bílka <neleai@seznam.cz> wrote:
>>> On Mon, Jul 27, 2015 at 06:14:07AM -0700, H.J. Lu wrote:
>>>> >>
>>>> >> There is a potential performance issue. This won't change parameters
>>>> >> passed in 256-bit/512-bit vector registers because SSE load will only
>>>> >> update the lower 128 bits of 256-bit/512-bit vector registers while
>>>> >> preserving the upper bits. But these SSE load operations may not be
>>>> >> fast on all current and future processors. To load the entire
>>>> >> 256-bit/512-bit vector registers, we need to check CPU feature in
>>>> >> each symbol lookup. On the other hand, we can compile x86-64 ld.so
>>>> >> with -msse2. I don't know what the final performance impact is.
>>>> >>
>>>> > Yes, these should be saved due to problems with modes. There could be
>>>> > a problem that saving these takes longer. You don't need to
>>>> > check CPU features on each call.
>>>> > Make _dl_runtime_resolve a function pointer and on
>>>> > startup initialize it to the correct variant.
>>>>
>>>> One more indirect call.
>>>>
>>> No, my proposal is different. We could do this:
>>>
>>> void *_dl_runtime_resolve;
>>> void startup (void)
>>> {
>>>   /* Pick the variant once, based on CPU features.  */
>>>   if (has_avx ())
>>>     _dl_runtime_resolve = _dl_runtime_resolve_avx;
>>>   else
>>>     _dl_runtime_resolve = _dl_runtime_resolve_sse;
>>> }
>>>
>>> Then we will assign the correct variant.
>>
>> Yes, this may work for both _dl_runtime_profile and
>> _dl_runtime_resolve. I will see what I can do.
>>
>
> Please try hjl/pr18661 branch. I implemented:
>
> 0000000000016fd0 t _dl_runtime_profile_avx
> 0000000000016b50 t _dl_runtime_profile_avx512
> 0000000000017450 t _dl_runtime_profile_sse
> 00000000000168d0 t _dl_runtime_resolve_avx
> 0000000000016780 t _dl_runtime_resolve_avx512
> 0000000000016a20 t _dl_runtime_resolve_sse
I enabled SSE in ld.so and it works fine.
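
The startup-time selection discussed in the quoted thread can be
sketched in plain C roughly as follows. This is only an illustration
under stated assumptions, not the code on the branch: the names
resolve_sse, resolve_avx, resolve_avx512, select_resolver and
call_resolver are hypothetical stand-ins for the assembly trampolines,
and the GCC/Clang builtin __builtin_cpu_supports is used in place of
whatever CPU feature check ld.so performs itself.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the assembly trampolines; each one only
   reports which variant it represents.  */
typedef const char *(*resolver_fn) (void);

static const char *resolve_sse (void) { return "sse"; }
static const char *resolve_avx (void) { return "avx"; }
static const char *resolve_avx512 (void) { return "avx512"; }

/* Set once at startup; every later call goes through this pointer
   without re-checking CPU features.  */
static resolver_fn runtime_resolve = NULL;

void
select_resolver (void)
{
#if defined (__x86_64__) || defined (__i386__)
  /* GCC/Clang builtins, used here in place of ld.so's own CPU
     feature detection (an assumption for this sketch).  */
  __builtin_cpu_init ();
  if (__builtin_cpu_supports ("avx512f"))
    runtime_resolve = resolve_avx512;
  else if (__builtin_cpu_supports ("avx"))
    runtime_resolve = resolve_avx;
  else
#endif
    /* Fallback, and the only choice on non-x86 builds.  */
    runtime_resolve = resolve_sse;
}

const char *
call_resolver (void)
{
  /* The pointer must have been initialized by select_resolver.  */
  assert (runtime_resolve != NULL);
  return runtime_resolve ();
}
```

Calling select_resolver () once before the first call_resolver () is
enough; the feature check is paid a single time rather than on each
symbol lookup.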
--
H.J.