This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.
Re: [PATCH] Save and restore xmm0-xmm7 in _dl_runtime_resolve
- From: "H.J. Lu" <hjl dot tools at gmail dot com>
- To: Ondřej Bílka <neleai at seznam dot cz>
- Cc: "libc-alpha at sourceware dot org" <libc-alpha at sourceware dot org>
- Date: Wed, 29 Jul 2015 05:11:51 -0700
- Subject: Re: [PATCH] Save and restore xmm0-xmm7 in _dl_runtime_resolve
- References: <CAMe9rOoXLPUr_LUexoRKjrCdNhP0J8EMY+1XNAaLnpW1qknb7w at mail dot gmail dot com> <20150709142827 dot GA18030 at domone> <CAMe9rOoXCwiPdQVP7_tV7599f6y9w_n1P+SXsE7urb69f3v7gA at mail dot gmail dot com> <20150711104654 dot GA26570 at domone> <20150711202742 dot GA9074 at gmail dot com> <20150711235002 dot GA7543 at gmail dot com> <20150726131622 dot GA10623 at domone> <CAMe9rOre_GQimKou2PXjp95xcfN1jYO5-tkEAB7eMbP1HMO+FQ at mail dot gmail dot com> <20150727101015 dot GA489 at domone> <CAMe9rOrpDk6ixUJ+9RU5L0aV=uLUJzzNtJc-XuPURTFHhXzGRw at mail dot gmail dot com> <20150727132623 dot GA13448 at domone> <CAMe9rOpwaByc1ogE2Y4fJzc_hwNo5+B23F7T1Qdsv2qKJy8DcQ at mail dot gmail dot com> <CAMe9rOpniyn+1f4Lq=RBqD+F4C-KYF+vgxUBqS+x01EHuKdzdA at mail dot gmail dot com> <CAMe9rOrYYVsvGEjXWVyro2VfOsnjH30yiR=CDvSFcyU6qwU6Lg at mail dot gmail dot com>
On Tue, Jul 28, 2015 at 7:03 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
> On Tue, Jul 28, 2015 at 1:55 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>> On Mon, Jul 27, 2015 at 6:37 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
>>> On Mon, Jul 27, 2015 at 6:26 AM, Ondřej Bílka <neleai@seznam.cz> wrote:
>>>> On Mon, Jul 27, 2015 at 06:14:07AM -0700, H.J. Lu wrote:
>>>>> >>
>>>>> >> There is a potential performance issue. This won't change parameters
>>>>> >> passed in 256-bit/512-bit vector registers, because an SSE load only
>>>>> >> updates the lower 128 bits of a 256-bit/512-bit vector register while
>>>>> >> preserving the upper bits. But these SSE load operations may not be
>>>>> >> fast on all current and future processors. To load the entire
>>>>> >> 256-bit/512-bit vector registers, we would need to check CPU features on
>>>>> >> each symbol lookup. On the other hand, we can compile x86-64 ld.so
>>>>> >> with -msse2. I don't know what the final performance impact is.
>>>>> >>
>>>>> > Yes, these should be saved due to problems with modes. Saving them
>>>>> > could take longer, but you don't need to check CPU features on each
>>>>> > call. Make _dl_runtime_resolve a function pointer and, at
>>>>> > startup, initialize it to the correct variant.
>>>>>
>>>>> One more indirect call.
>>>>>
>>>> No, my proposal is different; we could do this:
>>>>
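>>>> /* Selected once at startup; the resolver signature is assumed.  */
>>>> static void (*_dl_runtime_resolve) (void);
>>>>
>>>> static void
>>>> startup (void)
>>>> {
>>>>   if (has_avx ())
>>>>     _dl_runtime_resolve = _dl_runtime_resolve_avx;
>>>>   else
>>>>     _dl_runtime_resolve = _dl_runtime_resolve_sse;
>>>> }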
>>>>
>>>> Then the correct variant is assigned once, at startup.
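A minimal C sketch of the dispatch-once idea proposed above, assuming GCC or Clang on x86-64 (for `__builtin_cpu_init` and `__builtin_cpu_supports`). The names `resolve_sse`, `resolve_avx`, `runtime_resolve`, `select_resolver` and `startup` are hypothetical stand-ins: the real `_dl_runtime_resolve_*` variants are assembly trampolines, not C functions returning a label.

```c
#include <assert.h>

/* Hypothetical stand-ins for the SSE and AVX resolver trampolines; here they
   only return a label so the selected variant can be observed.  */
static const char *resolve_sse (void) { return "sse"; }
static const char *resolve_avx (void) { return "avx"; }

/* Chosen once at startup; later calls never re-check CPU features.  */
static const char *(*runtime_resolve) (void) = resolve_sse;

/* Kept separate from startup () so the choice can be exercised with a
   fixed flag instead of the host CPU.  */
static void
select_resolver (int has_avx)
{
  runtime_resolve = has_avx ? resolve_avx : resolve_sse;
}

/* Query the CPU a single time and store the matching variant.  */
static void
startup (void)
{
  __builtin_cpu_init ();
  select_resolver (__builtin_cpu_supports ("avx"));
}
```

After `startup ()` runs, every call through `runtime_resolve` reaches the chosen variant directly, so the per-lookup feature check that H.J. was worried about disappears.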
>>>
>>> Yes, this may work for both _dl_runtime_profile and
>>> _dl_runtime_resolve. I will see what I can do.
>>>
>>
>> Please try the hjl/pr18661 branch. I implemented:
>>
>> 0000000000016fd0 t _dl_runtime_profile_avx
>> 0000000000016b50 t _dl_runtime_profile_avx512
>> 0000000000017450 t _dl_runtime_profile_sse
>> 00000000000168d0 t _dl_runtime_resolve_avx
>> 0000000000016780 t _dl_runtime_resolve_avx512
>> 0000000000016a20 t _dl_runtime_resolve_sse
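The three resolver variants listed above suggest a widest-first selection. The sketch below assumes that order (AVX-512, then AVX, then SSE as the x86-64 baseline) and assumes the choice is written into the GOT slot that lazy binding already jumps through. In the standard x86-64 lazy PLT, PLT0 ends with an indirect `jmp` through GOT[2], so changing which address is stored there adds no extra indirect call. `got`, `plt0`, `runtime_setup` and the `resolve_*` functions are hypothetical simulations, not the code on the hjl/pr18661 branch.

```c
#include <assert.h>

/* Hypothetical resolver variants; each returns an id instead of resolving.  */
enum variant { VARIANT_SSE, VARIANT_AVX, VARIANT_AVX512 };
static int resolve_sse (void) { return VARIANT_SSE; }
static int resolve_avx (void) { return VARIANT_AVX; }
static int resolve_avx512 (void) { return VARIANT_AVX512; }

typedef int (*resolver_fn) (void);

/* Simulated GOT: slot 2 holds the lazy-binding resolver address.  */
static resolver_fn got[3];

/* Pick the widest usable variant once; SSE is always present on x86-64.  */
static void
runtime_setup (int avx512f_usable, int avx_usable)
{
  if (avx512f_usable)
    got[2] = resolve_avx512;
  else if (avx_usable)
    got[2] = resolve_avx;
  else
    got[2] = resolve_sse;
}

/* Stands in for PLT0, which already makes one indirect jump through GOT[2];
   selecting a variant changes only the stored address, not the call count.  */
static int
plt0 (void)
{
  return got[2] ();
}
```

For example, `runtime_setup (0, 1)` followed by `plt0 ()` reaches `resolve_avx`, and the same single indirect call is used whichever variant was stored.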
>
> I enabled SSE in ld.so and it works fine.
>
I fully enabled SSE in ld.so on the hjl/pr18661 branch. I didn't notice
any issues on either AVX or non-AVX machines. There may be some
performance improvement, but it is hard to tell.
--
H.J.