This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: [PATCH v4] faster strlen on x64
- From: Ondřej Bílka <neleai at seznam dot cz>
- To: Andreas Jaeger <aj at suse dot com>
- Cc: libc-alpha at sourceware dot org
- Date: Thu, 7 Mar 2013 01:29:20 +0100
- Subject: Re: [PATCH v4] faster strlen on x64
- References: <20130213113840.GA7781@domone.kolej.mff.cuni.cz><5137BB03.7020206@suse.com>
On Wed, Mar 06, 2013 at 10:54:11PM +0100, Andreas Jaeger wrote:
> On 02/13/2013 12:38 PM, Ondřej Bílka wrote:
> >-# define RETURN jmp L(StartStrcpyPart)
> >-# include "strlen-sse2-pminub.S"
> >-# undef RETURN
>
> You say that you inline the strlen part in the changes entry but I
> do not see this from the function.
>
> Please add comments here, it's not clear what this code does at all.
>
> Explain at the beginning the algorithm used, and explain what's
> happening in the code.
>
It inlines strlen exactly as the comment says. This is done by including
strlen-sse2-pminub.S as is and expanding RETURN.
It is a temporary solution to resolve the dependency of strcat on strlen.
The alternative is to keep strlen-sse2-pminub.S and the strcat files.
I delayed the proper strcat patch because it also depends on strcpy, as I
described in the relevant threads.
snip.
>
> this needs a comment. What is this macro doing?
>
It creates a bitmask in %rdx whose i-th bit is set if the byte at %rax[i] is zero.
> >+#define FIND_ZERO \
> >+ pcmpeqb (%rax), %xmm8; \
> >+ pcmpeqb 16(%rax), %xmm9; \
> >+ pcmpeqb 32(%rax), %xmm10; \
> >+ pcmpeqb 48(%rax), %xmm11; \
> >+ pmovmskb %xmm8, %esi; \
> >+ pmovmskb %xmm9, %edx; \
> >+ pmovmskb %xmm10, %r8d; \
> >+ pmovmskb %xmm11, %ecx; \
> >+ salq $16, %rdx; \
> >+ salq $16, %rcx; \
> >+ orq %rsi, %rdx; \
> >+ orq %r8, %rcx; \
> >+ salq $32, %rcx; \
> >+ orq %rcx, %rdx;
> >+
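As a rough illustration, the effect of FIND_ZERO can be modelled in portable C.
This is only a sketch: the names zero_mask16 and find_zero64 are hypothetical,
the scalar loop stands in for pcmpeqb plus pmovmskb, and it assumes that
%xmm8..%xmm11 hold zero when the macro is entered.

```c
#include <stdint.h>

/* Scalar stand-in for "pcmpeqb against a zeroed register" followed by
   pmovmskb: bit i of the result is set iff p[i] == 0, for i in 0..15.  */
static uint64_t zero_mask16(const unsigned char *p)
{
    uint64_t m = 0;
    for (int i = 0; i < 16; i++)
        if (p[i] == 0)
            m |= (uint64_t)1 << i;
    return m;
}

/* Combines four 16-bit masks with the same shifts and ORs as the macro.
   The local names mirror the registers used in the quoted code.  */
static uint64_t find_zero64(const unsigned char *block)
{
    uint64_t rsi = zero_mask16(block);       /* bytes  0..15 */
    uint64_t rdx = zero_mask16(block + 16);  /* bytes 16..31 */
    uint64_t r8  = zero_mask16(block + 32);  /* bytes 32..47 */
    uint64_t rcx = zero_mask16(block + 48);  /* bytes 48..63 */

    rdx <<= 16;   /* salq $16, %rdx */
    rcx <<= 16;   /* salq $16, %rcx */
    rdx |= rsi;   /* orq %rsi, %rdx */
    rcx |= r8;    /* orq %r8, %rcx  */
    rcx <<= 32;   /* salq $32, %rcx */
    rdx |= rcx;   /* orq %rcx, %rdx */
    return rdx;   /* bit i set iff block[i] == 0, for i in 0..63 */
}
```

Calling find_zero64 on a 64-byte block gives one bit per byte, so the position
of the lowest set bit is the offset of the first zero byte in the block.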
>
> What is this prolog doing?
It tests whether the end condition has been reached.
>
> >+# define STRNLEN_PROLOG \
> >+ mov %r11, %rsi; \
> >+ subq %rax, %rsi; \
> >+ andq $-64, %rax; \
> >+ testq $-64, %rsi; \
> >+ je L(strnlen_ret)
> >+#else
> >+# define STRNLEN_PROLOG andq $-64, %rax;
> >+#endif
> >+
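A minimal C sketch of the arithmetic in the strnlen variant of STRNLEN_PROLOG,
under the assumption (not shown in the quoted hunk) that %r11 holds the address
one past the last byte strnlen may examine and %rax holds the current pointer.
The function names are hypothetical.

```c
#include <stdint.h>

/* Models: mov %r11, %rsi; subq %rax, %rsi; testq $-64, %rsi; je ...
   Returns 1 when (end - cur) has no bits set above the low six,
   i.e. when fewer than 64 bytes remain before the assumed limit.  */
static int limit_within_64(uintptr_t cur, uintptr_t end)
{
    uintptr_t remaining = end - cur;
    return (remaining & ~(uintptr_t)63) == 0;
}

/* Models: andq $-64, %rax (round the pointer down to a 64-byte boundary).  */
static uintptr_t align_down_64(uintptr_t p)
{
    return p & ~(uintptr_t)63;
}
```

With these assumptions, the jump to L(strnlen_ret) is taken exactly when
limit_within_64 returns 1.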
>
> And this one? Please document!
It avoids duplication, as the code for the common case and for crossing a page
is nearly identical. I do not have a better explanation.
> >+#define PROLOG(lab) \
> >+ movq %rdi, %rcx; \
> >+ xorq %rax, %rcx; \
> >+ STRNLEN_PROLOG; \
> >+ sarq %cl, %rdx; \
> >+ test %rdx, %rdx; \
> >+ je L(lab); \
> >+ bsfq %rdx, %rax; \
> >+ ret
> >+
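The shift-and-scan part of PROLOG can be sketched in C as follows. It assumes
that %rdx holds the FIND_ZERO mask, %rdi the start of the string, and %rax the
aligned-down address the mask was computed from; the function name and the -1
return value (standing for the jump to L(lab)) are hypothetical.

```c
#include <stdint.h>

/* Drops mask bits for bytes that lie before the start of the string,
   then returns the index of the first zero byte relative to s,
   or -1 if no zero byte was seen in this block.  */
static int first_zero_from_start(uint64_t mask, uintptr_t s, uintptr_t base)
{
    /* movq %rdi, %rcx; xorq %rax, %rcx: misalignment of s within the block.
       A 64-bit shift by %cl uses only the low six bits of the count.  */
    unsigned shift = (unsigned)((s ^ base) & 63);

    mask >>= shift;          /* sarq %cl, %rdx (logical shift used here) */
    if (mask == 0)           /* test %rdx, %rdx; je L(lab) */
        return -1;

    int idx = 0;             /* bsfq %rdx, %rax: lowest set bit */
    while ((mask & 1) == 0) {
        mask >>= 1;
        idx++;
    }
    return idx;
}
```

The sketch uses a logical shift where the macro uses an arithmetic one; the
position of the lowest set bit, which is all that bsfq looks at, is the same in
both cases.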
Rest tomorrow.