This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: [PATCH] Do not multiply by zero in bzero.
- From: "Carlos O'Donell" <carlos at redhat dot com>
- To: OndÅej BÃlka <neleai at seznam dot cz>
- Cc: libc-alpha at sourceware dot org
- Date: Thu, 21 Mar 2013 16:40:37 -0400
- Subject: Re: [PATCH] Do not multiply by zero in bzero.
- References: <20130321162026 dot GA16603 at domone dot kolej dot mff dot cuni dot cz>
On 03/21/2013 12:20 PM, OndÅej BÃlka wrote:
> Hello,
>
> I noticed that bzero jumps directly at memset. Memset then computes mask
> by 64bit multiplication. This is suboptimal in bzero case as mask iz
> zero.
>
> I emulating first few instructions of memset and simplified it.
>
> Related question is that we first check if n==1. In my profile this
> happens only with 2% of cases. Is there special reason to keep it?
I agree that it seems odd to have a special case for n==1 with
no comment or indication why that might be better.
Please post as distinct patch with that change to remove the
n==1 check and provide some performance numbers if it actually
makes a difference. If it doesn't then at least it's a code size
win, which is usually a performance win by reducing cache size.
> Passes test on x86-64. OK to commit?
>
> * sysdeps/x86_64/memset.S (memset): Move entry from bzero.
> (bzero): Skip multiplication by zero in memset.
>
> ---
> sysdeps/x86_64/memset.S | 14 ++++++++++----
> 1 files changed, 10 insertions(+), 4 deletions(-)
>
> diff --git a/sysdeps/x86_64/memset.S b/sysdeps/x86_64/memset.S
> index b393efe..375c284 100644
> --- a/sysdeps/x86_64/memset.S
> +++ b/sysdeps/x86_64/memset.S
> @@ -25,9 +25,15 @@
> .text
> #if !defined NOT_IN_libc
> ENTRY(__bzero)
> - mov %rsi,%rdx /* Adjust parameter. */
> - xorl %esi,%esi /* Fill with 0s. */
> - jmp L(memset_entry)
> + mov %rsi,%r8 /* Adjust parameter. */
> + xor %rdx,%rdx /* Fill with 0s. */
> +
> + cmp $0x1,%r8
> + mov %rdi,%rax /* memset returns the dest address. */
> + jne L(entry_from_bzero)
> + mov %dl,(%rdi)
> + retq
> +
> END(__bzero)
> weak_alias (__bzero, bzero)
> #endif
> @@ -39,7 +45,6 @@ ENTRY_CHK (__memset_chk)
> END_CHK (__memset_chk)
> #endif
> ENTRY (memset)
> -L(memset_entry):
> cmp $0x1,%rdx
> mov %rdi,%rax /* memset returns the dest address. */
> jne L(ck2)
> @@ -50,6 +55,7 @@ L(ck2):
> mov %rdx,%r8
> movzbq %sil,%rdx
> imul %r9,%rdx
> +L(entry_from_bzero):
> L(now_dw_aligned):
> cmp $0x90,%r8
> ja L(ck_mem_ops_method)
>
Richard has already commented on the assembly.
Post a follow-up version.
Cheers,
Carlos.