This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH 4/4] S390: Implement mempcpy with help of memcpy. [BZ #19765]



On 09/05/2016 11:15, Stefan Liebler wrote:
> On 05/05/2016 06:36 PM, H.J. Lu wrote:
>> On Thu, May 5, 2016 at 9:34 AM, Adhemerval Zanella
>> <adhemerval.zanella@linaro.org> wrote:
>>>
>>>
>>>> On May 5, 2016, at 11:45, H.J. Lu <hjl.tools@gmail.com> wrote:
>>>>
>>>> On Thu, May 5, 2016 at 7:15 AM, Adhemerval Zanella
>>>> <adhemerval.zanella@linaro.org> wrote:
>>>>>
>>>>>
>>>>>> On 05/05/2016 10:37, H.J. Lu wrote:
>>>>>> On Wed, May 4, 2016 at 1:58 PM, Adhemerval Zanella
>>>>>> <adhemerval.zanella@linaro.org> wrote:
>>>>>>>
>>>>>>>
>>>>>>>> On 04/05/2016 17:51, Wilco Dijkstra wrote:
>>>>>>>> Adhemerval Zanella wrote:
>>>>>>>>>
>>>>>>>>> But my point is all the architectures which provide an optimized mempcpy is
>>>>>>>>> though either 1. jump directly to optimized memcpy (s390 case for this patchset),
>>>>>>>>> 2. clonning the same memcpy implementation and adjusting the pointers (x86_64) or
>>>>>>>>> 3. using a similar strategy for both implementations (powerpc).
>>>>>>>>
>>>>>>>> Indeed, which of those are used doesn't matter much.
>>>>>>>>
>>>>>>>>> So for this change I am proposing compiler support won't be required because both
>>>>>>>>> memcpy and __mempcpy will be transformed to memcpy + s.  Based on assumption that
>>>>>>>>> memcpy is fast as mempcpy implementation I think there is no need to just add
>>>>>>>>> this micro-optimization to only s390, but rather make is general.
>>>>>>>>
>>>>>>>> GLIBC already has this optimization in the generic string header, it's just that s390 wants
>>>>>>>> to do something different again. As long as GCC isn't fixed this isn't possible to support
>>>>>>>> s390 without this header workaround. And we need GCC to improve so things work
>>>>>>>> better for all the other C libraries...
>>>>>>>
>>>>>>> But the current one at string/string.h is only enabled with !defined _HAVE_STRING_ARCH_mempcpy,
>>>>>>> so if a port actually adds a mempcpy one it won't be enabled.  What I am trying to argue it
>>>>>>> to just remove the !defined _HAVE_STRING_ARCH_mempcpy and enable it as default for all
>>>>>>> ports.
>>>>>>
>>>>>> Please don't enable it for x86.  Calling memcpy means we have to
>>>>>> save and restore 2 registers for no good reasons.
>>>>>
>>>>> Yes, direct call will require save and restore the size for further add
>>>>> and this is true for most architectures.  My question is if does this
>>>>> really matter in currently GLIBC internal usage and on programs that
>>>>> might use it compared against the burden of keeping the various
>>>>> string*.h header in check for multiple architectures or adding this
>>>>> logic (mempcpy transformation to memcpy) on compiler.
>>>>
>>>> What burden? There is nothing to do in glibc for x86.  GCC can
>>>> inline mempcpy for x86.
>>>
>>> In fact I am objecting all the bits GLIBC added on string*.h that only adds complexity for some micro-optimizations. For x86 I do agree that transforming mempcpy to memcpy is no the best strategy.
>>>
>>> My rationale is to avoid add even more arch-specific bits in installed headers to add such optimizations.
>>
>> I believe most of those micro-optimizations belong to GCC, not glibc.
>> Of course, we should keep the existing ones for older GCCs.  We
>> should avoid adding new ones.
>>
> 
> Does this mean, the proposed way is to not add a macro for mempcpy in sysdeps/s390/bits/string.h or e.g. for another architecture?
> 
> If _HAVE_STRING_ARCH_mempcpy is not defined for s390, then it will always use the macro defined in string/string.h, which inlines memcpy + n and the mempcpy-function is not called. The memcpy is transformed to mvc, mvhi for e.g. constant lengths. Otherwise, the memcpy-function is called and the length will be added after the call.
> 
> According to PR70140 (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70140), the macro in string/string.h will be removed after this bug is fixed in GCC?
> Then I think the decision, if mempcpy or memcpy is called or an inline version is emitted, will be done per architecture backend?
> 
> If this is all true, then the mempcpy-function will be called in future if it makes sense!?
> 
> 
> If _HAVE_STRING_ARCH_mempcpy will be defined for s390, then mempcpy-function is always called - even for cases with constant length where mvc or mvhi is emitted.
> This is definitely not the intention of this patch.
> After fixing PR70140, GCC will correctly handle the constant length cases!?

What I am proposing is to avoid add more arch-specific optimization on installed
string*.h headers and instead work on adding them on compiler side.  My view is
we should cleanup as much as possible the string headers and only add optimization
that are more architecture neutral.

Related to patch, my understanding is s390x does not really provide an optimized
mempcpy (it uses the default mempcpy.c) and I think a better approach would be
to add an optimized mempcpy like x88_64 (c365e615f7429aee302f8af7bf07ae262278febb).
The mempcpy won't be optimized directly to mvc instruction, but at least it will
call the optimized memcpy.  The full optimization will be done by correctly
handling the transformation on compiler size (the PR#70140 as you referred).


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]