This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH] Save and restore xmm0-xmm7 in _dl_runtime_resolve


On Tue, Jul 28, 2015 at 7:03 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
> On Tue, Jul 28, 2015 at 1:55 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>> On Mon, Jul 27, 2015 at 6:37 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
>>> On Mon, Jul 27, 2015 at 6:26 AM, OndÅej BÃlka <neleai@seznam.cz> wrote:
>>>> On Mon, Jul 27, 2015 at 06:14:07AM -0700, H.J. Lu wrote:
>>>>> >>
>>>>> >> There is a potential performance issue.  This won't change parameters
>>>>> >> passed in S256-bit/512-bit vector registers because SSE load will only
>>>>> >> update the lower 128 bits of 256-bit/512-bit vector registers while
>>>>> >> preserving the upper bits.  But these SSE load operations may not be
>>>>> >> fast on all current and future processors.  To load the entire
>>>>> >> 256-bit/512-bit vector registers, we need to check CPU feature in
>>>>> >> each symbol lookup.  On the other hand, we can compile x86-64 ld.so
>>>>> >> with -msse2.  I don't know what the final performance impact is.
>>>>> >>
>>>>> > Yes, these should be saved due problems with modes. There could be
>>>>> > problem that saving these takes longer. You don't need
>>>>> > check cpu features on each call.
>>>>> > Make _dl_runtime_resolve a function pointer and on
>>>>> > startup initialize it to correct variant.
>>>>>
>>>>> One more indirect call.
>>>>>
>>>> no, my proposal is different, we could do this:
>>>>
>>>> void *_dl_runtime_resolve;
>>>> int startup()
>>>> {
>>>>   if (has_avx())
>>>>     _dl_runtime_resolve = _dl_runtime_resolve_avx;
>>>>   else
>>>>     _dl_runtime_resolve = _dl_runtime_resolve_sse;
>>>> }
>>>>
>>>> Then we will assign correct variant.
>>>
>>> Yes, this may work for both _dl_runtime_profile and
>>>  _dl_runtime_resolve.  I will see what I can do.
>>>
>>
>> Please try hjl/pr18661 branch.  I implemented:
>>
>> 0000000000016fd0 t _dl_runtime_profile_avx
>> 0000000000016b50 t _dl_runtime_profile_avx512
>> 0000000000017450 t _dl_runtime_profile_sse
>> 00000000000168d0 t _dl_runtime_resolve_avx
>> 0000000000016780 t _dl_runtime_resolve_avx512
>> 0000000000016a20 t _dl_runtime_resolve_sse
>
> I enabled SSE in ld.so and it works fine.
>

I fully enabled SSE in ld.so on hjl/pr18661 branch.  I didn't notice
any issue on both AVX and non-AVX machines.  There may be
some performance improvement.  But it is hard to tell.


-- 
H.J.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]