This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH 4/4] S390: Implement mempcpy with help of memcpy. [BZ #19765]


On 05/13/2016 05:30 PM, Stefan Liebler wrote:


On 05/13/2016 05:06 PM, H.J. Lu wrote:
On Fri, May 13, 2016 at 7:59 AM, Stefan Liebler
<stli@linux.vnet.ibm.com> wrote:


On 05/13/2016 04:49 PM, H.J. Lu wrote:

On Fri, May 13, 2016 at 7:42 AM, Stefan Liebler
<stli@linux.vnet.ibm.com>
wrote:



On 05/12/2016 04:10 PM, Adhemerval Zanella wrote:




On 09/05/2016 11:15, Stefan Liebler wrote:


On 05/05/2016 06:36 PM, H.J. Lu wrote:


On Thu, May 5, 2016 at 9:34 AM, Adhemerval Zanella
<adhemerval.zanella@linaro.org> wrote:




On May 5, 2016, at 11:45, H.J. Lu <hjl.tools@gmail.com> wrote:

On Thu, May 5, 2016 at 7:15 AM, Adhemerval Zanella
<adhemerval.zanella@linaro.org> wrote:




On 05/05/2016 10:37, H.J. Lu wrote:
On Wed, May 4, 2016 at 1:58 PM, Adhemerval Zanella
<adhemerval.zanella@linaro.org> wrote:




On 04/05/2016 17:51, Wilco Dijkstra wrote:
Adhemerval Zanella wrote:



But my point is all the architectures which provide an
optimized
mempcpy is
though either 1. jump directly to optimized memcpy (s390
case
for
this patchset),
2. clonning the same memcpy implementation and adjusting the
pointers (x86_64) or
3. using a similar strategy for both implementations
(powerpc).



Indeed, which of those are used doesn't matter much.

So for this change I am proposing compiler support won't be
required because both
memcpy and __mempcpy will be transformed to memcpy + s.
Based
on
assumption that
memcpy is fast as mempcpy implementation I think there is no
need
to just add
this micro-optimization to only s390, but rather make is
general.



GLIBC already has this optimization in the generic string
header,
it's just that s390 wants
to do something different again. As long as GCC isn't
fixed this
isn't possible to support
s390 without this header workaround. And we need GCC to
improve
so
things work
better for all the other C libraries...



But the current one at string/string.h is only enabled with
!defined _HAVE_STRING_ARCH_mempcpy,
so if a port actually adds a mempcpy one it won't be enabled.
What
I am trying to argue it
to just remove the !defined _HAVE_STRING_ARCH_mempcpy and
enable
it
as default for all
ports.



Please don't enable it for x86.  Calling memcpy means we
have to
save and restore 2 registers for no good reasons.



Yes, direct call will require save and restore the size for
further
add
and this is true for most architectures.  My question is if does
this
really matter in currently GLIBC internal usage and on programs
that
might use it compared against the burden of keeping the various
string*.h header in check for multiple architectures or
adding this
logic (mempcpy transformation to memcpy) on compiler.



What burden? There is nothing to do in glibc for x86.  GCC can
inline mempcpy for x86.



In fact I am objecting all the bits GLIBC added on string*.h that
only
adds complexity for some micro-optimizations. For x86 I do
agree that
transforming mempcpy to memcpy is no the best strategy.

My rationale is to avoid add even more arch-specific bits in
installed
headers to add such optimizations.



I believe most of those micro-optimizations belong to GCC, not
glibc.
Of course, we should keep the existing ones for older GCCs.  We
should avoid adding new ones.


Does this mean, the proposed way is to not add a macro for
mempcpy in
sysdeps/s390/bits/string.h or e.g. for another architecture?

If _HAVE_STRING_ARCH_mempcpy is not defined for s390, then it will
always
use the macro defined in string/string.h, which inlines memcpy +
n and
the
mempcpy-function is not called. The memcpy is transformed to mvc,
mvhi
for
e.g. constant lengths. Otherwise, the memcpy-function is called
and the
length will be added after the call.

According to PR70140
(https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70140), the macro in
string/string.h will be removed after this bug is fixed in GCC?
Then I think the decision, if mempcpy or memcpy is called or an
inline
version is emitted, will be done per architecture backend?

If this is all true, then the mempcpy-function will be called in
future
if it makes sense!?


If _HAVE_STRING_ARCH_mempcpy will be defined for s390, then
mempcpy-function is always called - even for cases with constant
length
where mvc or mvhi is emitted.
This is definitely not the intention of this patch.
After fixing PR70140, GCC will correctly handle the constant length
cases!?



What I am proposing is to avoid add more arch-specific
optimization on
installed
string*.h headers and instead work on adding them on compiler
side.  My
view is
we should cleanup as much as possible the string headers and only add
optimization
that are more architecture neutral.

Related to patch, my understanding is s390x does not really
provide an
optimized
mempcpy (it uses the default mempcpy.c) and I think a better approach
would be
to add an optimized mempcpy like x88_64
(c365e615f7429aee302f8af7bf07ae262278febb).
The mempcpy won't be optimized directly to mvc instruction, but at
least
it will
call the optimized memcpy.  The full optimization will be done by
correctly
handling the transformation on compiler size (the PR#70140 as you
referred).



Okay. Here is the updated patch. It implements mempcpy with memcpy as
before, but does not change the s390-specific string.h and
_HAVE_STRING_ARCH_mempcpy is not defined. Thus at the moment mempcpy()
won't
be called, but instead memcpy + n is inlined by the mempcpy macro in
string/string.h. After fxing PR70140, either the mempcpy macro in
string/string.h has to be removed or _HAVE_STRING_ARCH_mempcpy has
to be
defined for S390.

Okay to commit?


+__asm__ (".weak mempcpy\n\t"
+ ".set mempcpy,__mempcpy\n\t");

Don't we have a macro for this?

Yes that is true, but I can't use it in mempcpy.c in this case,
because the macro s390_libc_ifunc (__mempcpy) uses an inline assembly
to implement the __mempcpy ifunc symbol.
If I use the weak_alias macro here in the c-file, gcc fails because
it has
no knowledge of symbol __mempcpy.


Can you make s390_libc_ifunc similar to libc_ifunc to expose
FUNC to gcc?


No I won't do that.
At the moment s390_vx_libc_ifunc is implemented in this way,
then the weak_alias works.
But I have to remove the asm(FUNCNAME) in
"extern void *resolverfunction(..) asm (FUNCNAME)"!

The dwarf information for the resolverfunction has a DW_AT_linkage_name
entry, which shows to FUNCNAME. If you do an inferior function call to
FUNCNAME in lldb, then it fails due to something like that:
"error: no matching function for call to 'FUNCNAME'
candidate function not viable: no known conversion from 'const char [6]'
to 'unsigned long' for 1st argument"
(The unsigned long is the dl_hwcap argument of the resolverfunction).

Here is the dwarf information for e.g. __resolve___strlen:
<1><1e6424>: Abbrev Number: 43 (DW_TAG_subprogram)
     <1e6425>   DW_AT_external    : 1
     <1e6425>   DW_AT_name        : (indirect string, offset: 0x1146e):
__resolve___strlen
     <1e6429>   DW_AT_decl_file   : 1
     <1e642a>   DW_AT_decl_line   : 23
     <1e642b>   DW_AT_linkage_name: (indirect string, offset: 0x1147a):
strlen
     <1e642f>   DW_AT_prototyped  : 1
     <1e642f>   DW_AT_type        : <0x1e4ccd>
     <1e6433>   DW_AT_low_pc      : 0x998e0
     <1e643b>   DW_AT_high_pc     : 0x16
     <1e6443>   DW_AT_frame_base  : 1 byte block: 9c
(DW_OP_call_frame_cfa)
     <1e6445>   DW_AT_GNU_all_call_sites: 1
     <1e6445>   DW_AT_sibling     : <0x1e6459>
  <2><1e6449>: Abbrev Number: 44 (DW_TAG_formal_parameter)
     <1e644a>   DW_AT_name        : (indirect string, offset: 0x1845):
dl_hwcap
     <1e644e>   DW_AT_decl_file   : 1
     <1e644f>   DW_AT_decl_line   : 23
     <1e6450>   DW_AT_type        : <0x1e4c8d>
     <1e6454>   DW_AT_location    : 0x122115 (location list)


Without the asm (FUNCNAME), there is no DW_AT_linkage_name entry and the
inferior function call is working.

Do you have any better idea?


Any objection?
Otherwise I'll commit the patch series soon.

Bye
Stefan


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]