This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: memcpy performance regressions 2.19 -> 2.24(5)


Sorry, yes, I meant independent of the tunables discussion.  Thanks for
pointing out that macro; I hadn't realized it was there, but it makes
sense for supporting older compilers that didn't have IFUNCs.

I see you added the original ifunc implementation back in 2009!
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=40528

It seems that GCC 4.7 is now needed to build glibc, so it should be OK
to switch?  I'm happy to volunteer to do the conversions for the x86_64
routines if you think it makes sense.
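
For reference, a minimal sketch of what a routine written with the compiler's
ifunc attribute (rather than the libc_ifunc macro) might look like.  This is
not the glibc code: my_copy, resolve_my_copy, copy_bytewise and copy_builtin
are hypothetical names, the AVX2 check is only a placeholder selection rule,
and it assumes GCC (or a compatible compiler) targeting ELF with IFUNC support.

```c
#include <stddef.h>
#include <string.h>

/* Two hypothetical variants; the names are illustrative, not glibc's. */
static void *copy_bytewise(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    while (n--)
        *d++ = *s++;
    return dst;
}

static void *copy_builtin(void *dst, const void *src, size_t n)
{
    return __builtin_memcpy(dst, src, n);
}

typedef void *(*copy_fn)(void *, const void *, size_t);

/* Resolver: run once by the dynamic linker while relocations are being
   processed; it returns the address of the variant to bind to my_copy. */
static copy_fn resolve_my_copy(void)
{
#if defined(__x86_64__) || defined(__i386__)
    /* Must be called before __builtin_cpu_supports inside a resolver. */
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx2"))   /* placeholder criterion */
        return copy_builtin;
#endif
    return copy_bytewise;
}

/* The exported symbol; its implementation is chosen by the resolver. */
void *my_copy(void *dst, const void *src, size_t n)
    __attribute__((ifunc("resolve_my_copy")));
```

Callers just call my_copy(dst, src, n); the selection logic is ordinary C in
the resolver, which is the part a TUNABLE could later hook into.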



On Tue, May 23, 2017 at 8:42 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
> On Tue, May 23, 2017 at 5:56 PM, Erich Elsen <eriche@google.com> wrote:
>> Ok.  Do you have any specific concerns?  It would help make it easier
>> for us to do the testing internally to switch to memcpy.c.
>
> We use libc_ifunc to implement IFUNC, like x86_64/multiarch/strstr.c.  It may
> be a good idea to switch to a different format and require all IFUNCs in C
> for x86-64 if compilers with the IFUNC attribute are required to build glibc.
> But this is independent of tunables.
>
>> Interesting, thanks for the info.  More reason for being able to
>> select the implementation!
>> On Tue, May 23, 2017 at 3:55 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>>> On Tue, May 23, 2017 at 3:12 PM, Erich Elsen <eriche@google.com> wrote:
>>>> Sounds good to me.  Even if tunables aren't added, does memcpy.S ->
>>>> memcpy.c seem reasonable?
>>>
>>> I prefer not to do it for now.  We can revisit it later, after tunable
>>> support is added to cpu_features.
>>>
>>> BTW, REP MOV is expected to have lower bandwidth on multi-socket
>>> systems, but has the benefit of lower cache disruption throughout the
>>> cache hierarchy.  This is a trade-off between overall system throughput
>>> and single-program performance.
>>>
>>>
>>>> On Tue, May 23, 2017 at 3:07 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>>>>> On Tue, May 23, 2017 at 1:57 PM, Erich Elsen <eriche@google.com> wrote:
>>>>>> Maybe there's room for both?
>>>>>>
>>>>>> Setting the cpu_features would affect everything; it would be useful
>>>>>> to be able to target only specific (and very important) routines.
>>>>>
>>>>> I prefer to do the cpu_features first.  If it turns out not to be
>>>>> sufficient, we then do the IFUNC implementation.
>>>>>
>>>>>> On Tue, May 23, 2017 at 1:46 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>>>>>>> On Tue, May 23, 2017 at 1:39 PM, Erich Elsen <eriche@google.com> wrote:
>>>>>>>> I was also thinking that it might be nice to have a TUNABLE that sets
>>>>>>>> the implementation of memcpy directly.  It would be easier to do this
>>>>>>>> if memcpy.S was memcpy.c.  Attached is a patch that does the
>>>>>>>> conversion but doesn't add the tunables.  How would you feel about
>>>>>>>> this?  It has no runtime impact, probably increases the size slightly,
>>>>>>>> and makes the code easier to read / modify.
>>>>>>>>
>>>>>>>
>>>>>>> It depends on how far you want to go.  We can add TUNABLE support
>>>>>>> to each IFUNC implementation or we can add TUNABLE support to
>>>>>>> cpu_features to update processor features.  I prefer the latter.
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> H.J.
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> H.J.
>>>
>>>
>>>
>>> --
>>> H.J.
>
>
>
> --
> H.J.

