This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.

Re: Review decision to inline mempcpy to memcpy.

From: Jakub Jelinek <jakub at redhat dot com>
To: Wilco Dijkstra <Wilco dot Dijkstra at arm dot com>
Cc: "Carlos O'Donell" <carlos at redhat dot com>, GNU C Library <libc-alpha at sourceware dot org>, Ondrej Bilka <neleai at seznam dot cz>, "Joseph S. Myers" <joseph at codesourcery dot com>, Jeff Law <law at redhat dot com>
Date: Fri, 4 Mar 2016 17:57:31 +0100
Subject: Re: Review decision to inline mempcpy to memcpy.
Authentication-results: sourceware.org; auth=none
References: <56D856F2 dot 4020000 at redhat dot com> <AM3PR08MB0088D8CBEE224AA54E620F6983BE0 at AM3PR08MB0088 dot eurprd08 dot prod dot outlook dot com>
Reply-to: Jakub Jelinek <jakub at redhat dot com>

On Fri, Mar 04, 2016 at 04:48:00PM +0000, Wilco Dijkstra wrote:
> > Were the changes in glibc to optimize mempcpy as memcpy
> > originally motivated by performance for ARM?
> 
> OK, so the goal behind this was to provide the best possible out of the box performance
> in GLIBC without requiring all targets to write a lot of assembler code. For less

If people use mempcpy in their code, they do it for a reason.
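
For illustration, one common reason is that mempcpy returns a pointer just
past the last byte written, so consecutive copies can be chained without
recomputing the destination.  A minimal sketch, assuming glibc (mempcpy is a
GNU extension, hence _GNU_SOURCE); the helper name join3 is hypothetical and
does not come from this thread:

```c
#define _GNU_SOURCE   /* mempcpy is a GNU extension */
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: concatenate three strings into dst, which the
   caller guarantees is large enough.  Returns a pointer to the
   terminating NUL byte. */
char *join3(char *dst, const char *a, const char *b, const char *c)
{
    char *p = dst;
    p = mempcpy(p, a, strlen(a));  /* p now points just past the copy of a */
    p = mempcpy(p, b, strlen(b));
    p = mempcpy(p, c, strlen(c));
    *p = '\0';
    return p;
}
```

With memcpy, each step would need a separate addition of the length to the
destination pointer.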

> > The crux of the argument is that the compiler may be able
> > to do a better job of optimizing if it knows the call was
> > a mempcpy as opposed to memcpy + addition.
> 
> No, unfortunately even GCC6 optimizes memcpy better than mempcpy:

Must be some aarch64 backend issue then.  On x86_64 I get the following for
memcpy (x, y, 32):
        movq    (%rsi), %rdx
        movq    %rdi, %rax
        movq    %rdx, (%rdi)
        movq    8(%rsi), %rdx
        movq    %rdx, 8(%rdi)
        movq    16(%rsi), %rdx
        movq    %rdx, 16(%rdi)
        movq    24(%rsi), %rdx
        movq    %rdx, 24(%rdi)
        ret
and for mempcpy (x, y, 32):
        movq    (%rsi), %rax
        movq    %rax, (%rdi)
        movq    8(%rsi), %rax
        movq    %rax, 8(%rdi)
        movq    16(%rsi), %rax
        movq    %rax, 16(%rdi)
        movq    24(%rsi), %rax
        movq    %rax, 24(%rdi)
        leaq    32(%rdi), %rax
        ret
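
The message does not show the C source behind these listings.  A sketch of
functions that could be compiled (e.g. with gcc -O2 -S) to compare the two
cases, assuming glibc and a constant size of 32; the function names
copy_memcpy and copy_mempcpy are hypothetical, and the exact output will
depend on compiler version and flags:

```c
#define _GNU_SOURCE   /* needed for the mempcpy declaration in glibc */
#include <string.h>

/* Hypothetical test functions; not taken from the original message. */
void *copy_memcpy(void *x, const void *y)
{
    return memcpy(x, y, 32);   /* returns x */
}

void *copy_mempcpy(void *x, const void *y)
{
    return mempcpy(x, y, 32);  /* returns (char *)x + 32 */
}
```

The only semantic difference is the return value, which matches the extra
leaq 32(%rdi), %rax in the second listing versus movq %rdi, %rax in the
first.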

> If we can get GCC to do the right thing depending on the preference of the target and

That is wrong: the preference isn't cast in stone, yet you force it into the
apps.

	Jakub

References:
- Review decision to inline mempcpy to memcpy.
  - From: Carlos O'Donell
