This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: [PATCH v4] faster strlen on x64


On Wed, Mar 06, 2013 at 10:54:11PM +0100, Andreas Jaeger wrote:
> On 02/13/2013 12:38 PM, Ondřej Bílka wrote:

> >-# define RETURN  jmp L(StartStrcpyPart)
> >-# include "strlen-sse2-pminub.S"
> >-# undef RETURN
> 
> You say that you inline the strlen part in the changes entry but I
> do not see this from the function.
> 
> Please add comments here, it's not clear what this code does at all.
> 
> Explain at the beginning the algorithm used, and explain what's
> happening in the code.
> 
It inlines strlen exactly as the comment says. This is done by including
strlen-sse2-pminub.S unchanged and expanding RETURN.
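The quoted lines define RETURN as `jmp L(StartStrcpyPart)` and then include strlen-sse2-pminub.S, so wherever the strlen code would return, the strcat build continues into its copy part instead. A rough C analogue of that define-then-include pattern is sketched below. The names STRLEN_BODY, sketch_strlen and sketch_strcat are hypothetical, the shared body is a macro rather than a separate file, and a plain byte loop stands in for the SSE2 code.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical shared body standing in for strlen-sse2-pminub.S:
   it computes the length of s and hands the result to RETURN,
   whose definition is picked up at the point of expansion. */
#define STRLEN_BODY(s)                  \
    do {                                \
        const char *p_ = (s);           \
        while (*p_ != '\0')             \
            p_++;                       \
        RETURN((size_t)(p_ - (s)));     \
    } while (0)

/* strlen build: RETURN really returns the length. */
#define RETURN(n) return (n)
static size_t sketch_strlen(const char *s)
{
    STRLEN_BODY(s);
}
#undef RETURN

/* strcat build: RETURN jumps into the copy part instead,
   analogous to "jmp L(StartStrcpyPart)". */
#define RETURN(n) do { len = (n); goto start_strcpy_part; } while (0)
static char *sketch_strcat(char *dst, const char *src)
{
    size_t len;
    STRLEN_BODY(dst);
start_strcpy_part:
    /* Copy src, including its terminator, to the end of dst. */
    memcpy(dst + len, src, strlen(src) + 1);
    return dst;
}
#undef RETURN
```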

It is a temporary solution to resolve the dependency of strcat on strlen.
The alternative is to keep both strlen-sse2-pminub.S and the strcat files.

I have delayed a proper strcat patch because it also depends on strcpy,
as I described in the relevant threads.

snip.

> 
> this needs a comment. What is this macro doing?
>
Creates a bitmask in %rdx whose i-th bit is set if byte %rax[i] is zero, for the 64 bytes starting at %rax.
> >+#define FIND_ZERO	\
> >+	pcmpeqb	(%rax), %xmm8;	\
> >+	pcmpeqb	16(%rax), %xmm9;	\
> >+	pcmpeqb	32(%rax), %xmm10;	\
> >+	pcmpeqb	48(%rax), %xmm11;	\
> >+	pmovmskb	%xmm8, %esi;	\
> >+	pmovmskb	%xmm9, %edx;	\
> >+	pmovmskb	%xmm10, %r8d;	\
> >+	pmovmskb	%xmm11, %ecx;	\
> >+	salq	$16, %rdx;	\
> >+	salq	$16, %rcx;	\
> >+	orq	%rsi, %rdx;	\
> >+	orq	%r8, %rcx;	\
> >+	salq	$32, %rcx;	\
> >+	orq	%rcx, %rdx;
> >+
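What FIND_ZERO computes can be sketched in C with SSE2 intrinsics. Because pcmpeqb overwrites its destination, the macro presumably relies on %xmm8-%xmm11 having been zeroed beforehand; the sketch compares against an explicit zero vector instead. find_zero64 is a hypothetical name, and the sketch uses unaligned loads, whereas the macro's non-VEX pcmpeqb memory operands require %rax to be 16-byte aligned. A scalar fallback is included for builds without SSE2.

```c
#include <stdint.h>

#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Hypothetical C rendering of FIND_ZERO: bit i of the result is set
   iff p[i] == 0, for the 64 bytes starting at p. */
static uint64_t find_zero64(const unsigned char *p)
{
#if defined(__SSE2__)
    const __m128i zero = _mm_setzero_si128();
    /* pcmpeqb + pmovmskb on each 16-byte chunk: one 16-bit mask each. */
    uint64_t m0 = (uint64_t)(unsigned)_mm_movemask_epi8(
        _mm_cmpeq_epi8(_mm_loadu_si128((const __m128i *)(p + 0)), zero));
    uint64_t m1 = (uint64_t)(unsigned)_mm_movemask_epi8(
        _mm_cmpeq_epi8(_mm_loadu_si128((const __m128i *)(p + 16)), zero));
    uint64_t m2 = (uint64_t)(unsigned)_mm_movemask_epi8(
        _mm_cmpeq_epi8(_mm_loadu_si128((const __m128i *)(p + 32)), zero));
    uint64_t m3 = (uint64_t)(unsigned)_mm_movemask_epi8(
        _mm_cmpeq_epi8(_mm_loadu_si128((const __m128i *)(p + 48)), zero));
    /* Same combination as the salq/orq sequence in the macro. */
    return m0 | (m1 << 16) | (m2 << 32) | (m3 << 48);
#else
    /* Portable fallback computing the same mask one byte at a time. */
    uint64_t mask = 0;
    for (int i = 0; i < 64; i++)
        if (p[i] == 0)
            mask |= (uint64_t)1 << i;
    return mask;
#endif
}
```

Since bit i corresponds to byte i, the index of the first zero byte is the index of the lowest set bit, which is what bsfq extracts later in PROLOG.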
> 
> What is this prolog doing?
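In the strnlen build it tests whether the end condition has been reached, i.e. whether fewer than 64 bytes remain between %rax and the limit in %r11. In both builds it aligns %rax down to a 64-byte boundary.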
> 
> >+# define STRNLEN_PROLOG	\
> >+	mov	%r11, %rsi;	\
> >+	subq	%rax, %rsi;	\
> >+	andq	$-64, %rax;	\
> >+	testq	$-64, %rsi;	\
> >+	je	L(strnlen_ret)
> >+#else
> >+# define STRNLEN_PROLOG  andq $-64, %rax;
> >+#endif
> >+
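The arithmetic in STRNLEN_PROLOG can be sketched in C as follows. It assumes %r11 holds the address at which strnlen must stop (s + maxlen); that reading is an assumption, since the setup of %r11 is not quoted here. The struct and function names are hypothetical, and addresses are plain integers so the check can be exercised without touching memory.

```c
#include <stdint.h>

/* Hypothetical result type for the sketch. */
struct strnlen_prolog_result {
    uintptr_t aligned; /* %rax after andq $-64, %rax */
    int near_limit;    /* nonzero where the macro would jump to L(strnlen_ret) */
};

/* Sketch of STRNLEN_PROLOG; `end` plays the role of %r11 (assumed). */
static struct strnlen_prolog_result strnlen_prolog(uintptr_t cur, uintptr_t end)
{
    struct strnlen_prolog_result r;
    uintptr_t remaining = end - cur;       /* mov %r11,%rsi; subq %rax,%rsi */
    r.aligned = cur & ~(uintptr_t)63;      /* andq $-64,%rax */
    /* testq $-64,%rsi; je: true iff 0 <= remaining < 64 (unsigned). */
    r.near_limit = (remaining & ~(uintptr_t)63) == 0;
    return r;
}
```

Because the subtraction is unsigned, a position past the limit wraps to a huge value and does not count as near the limit, which matches what testq $-64 does on the 64-bit difference.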
> 
> And this one? Please document!
It avoids duplication, as the code for the common case and for the page-crossing case is nearly
identical. I do not have a better explanation.
> >+#define PROLOG(lab)	\
> >+	movq	%rdi, %rcx;	\
> >+	xorq	%rax, %rcx;	\
> >+	STRNLEN_PROLOG;	\
> >+	sarq	%cl, %rdx;	\
> >+	test	%rdx, %rdx;	\
> >+	je	L(lab);	\
> >+	bsfq	%rdx, %rax;	\
> >+	ret
> >+
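A C sketch of what PROLOG does, assuming %rax holds %rdi rounded down to a 64-byte boundary so that %rdi xor %rax is the offset of the string start within the block (the quoted code does not show how %rax is set up, so this is an assumption). zero_mask64 is a hypothetical scalar stand-in for FIND_ZERO, and __builtin_ctzll (GCC/Clang) plays the role of bsfq. Like the assembly, the sketch reads whole aligned 64-byte blocks, including bytes before s and after the terminator; in ISO C that is only well defined when s lies in a 64-byte-aligned array padded to a multiple of 64 bytes, so callers must arrange that.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar stand-in for FIND_ZERO: bit i set iff block[i] == 0. */
static uint64_t zero_mask64(const unsigned char *block)
{
    uint64_t mask = 0;
    for (int i = 0; i < 64; i++)
        if (block[i] == 0)
            mask |= (uint64_t)1 << i;
    return mask;
}

/* Sketch of the PROLOG idea followed by a plain aligned loop. */
static size_t strlen_prolog_sketch(const char *s)
{
    uintptr_t addr = (uintptr_t)s;
    uintptr_t base = addr & ~(uintptr_t)63;
    unsigned offset = (unsigned)(addr ^ base);   /* movq %rdi,%rcx; xorq %rax,%rcx */
    const unsigned char *block = (const unsigned char *)s - offset;

    /* sarq %cl,%rdx: drop the bits for bytes before s. A logical shift is
       used here; the lowest set bit, which is all bsfq looks at, is the same. */
    uint64_t mask = zero_mask64(block) >> offset;
    if (mask != 0)
        return (size_t)__builtin_ctzll(mask);   /* bsfq %rdx,%rax; ret */

    /* je L(lab): no zero yet, continue with the following aligned blocks. */
    for (;;) {
        block += 64;
        mask = zero_mask64(block);
        if (mask != 0)
            return (size_t)(block - (const unsigned char *)s)
                   + (size_t)__builtin_ctzll(mask);
    }
}
```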

The rest tomorrow.

