This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: [PATCH 3/3] Add i386 memset and memcpy assembly functions


On Wed, Aug 26, 2015 at 06:46:31AM -0700, H.J. Lu wrote:
> Add i386 memset and memcpy assembly functions with REP MOVSB/STOSB
> instructions.  They will be used to implement i386 multi-arch memcpy.
> 
> OK for master?
>
No, as rep stosb has terrible performance on most machines; on Ivy
Bridge it's around six times slower than rep stosq. I wouldn't be
surprised if, when you test it on affected machines, it turned out to be
at least three times slower than rep stosl.

The only exception I know of where you should use rep stosb is Haswell.

Perhaps you could adapt this implementation, which I used for rep stosq,
and change it to use rep stosl?

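.text
.globl memset_rep8
.type memset_rep8, @function
memset_rep8:
 .cfi_startproc
 movzbl  %sil, %eax                /* rax = fill byte, zero-extended */
 lea (%rdi, %rdx), %rcx            /* rcx = dst + n (end pointer) */
 movabsq $72340172838076673, %rsi  /* 0x0101010101010101 */
 imulq %rsi, %rax                  /* replicate the byte into all 8 bytes */

 cmp $7, %rdx
 jbe .Lless_16_bytes               /* n <= 7, despite the label name */
 movq %rax, (%rdi)                 /* first 8 bytes, unaligned */
 movq %rdi, %rsi                   /* save dst for the return value */
 leaq 8(%rdi), %rdi
 movq %rax, -8(%rcx)               /* last 8 bytes, unaligned */
 andq $-8, %rdi                    /* first 8-byte boundary after dst */
 subq %rdi, %rcx
 shrq $3, %rcx                     /* whole qwords from there to the end */
 rep stosq
 movq %rsi, %rax                   /* return dst */
 ret

.p2align 4
.Lless_16_bytes:
 movq %rax, %rsi                   /* rsi = replicated fill pattern */
 movq %rdi, %rax                   /* return dst */
 testb $4, %dl
 jne .Lbetween_4_7_bytes
 cmp $1, %dl
 jbe .Lbetween_0_1_byte
 movw %si, -2(%rcx)                /* n is 2 or 3: last two bytes */
 movb %sil, (%rdi)                 /* and the first byte */
 ret

.p2align 3
.Lbetween_4_7_bytes:
 movl %esi, (%rdi)                 /* first 4 bytes */
 movl %esi, -4(%rcx)               /* last 4 bytes, may overlap the first */
 ret

.Lbetween_0_1_byte:
 jb .Lzero_byte                    /* flags from cmp $1, %dl: n == 0 */
 movb %sil, (%rdi)                 /* n == 1 */
.Lzero_byte:
 ret

 .cfi_endproc
.size memset_rep8, .-memset_rep8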
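
For anyone who wants to check the branch structure before porting it to
rep stosl, here is a rough C model of the routine above. It is only a
sketch: the name memset_model is made up for illustration, the plain loop
stands in for rep stosq, and memcpy stands in for the unaligned stores. It
says nothing about performance; it only mirrors which bytes each path
writes.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical C model of memset_rep8; not glibc code. */
void *memset_model(void *dst, int c, size_t n)
{
    unsigned char *p = dst;
    unsigned char *end = p + n;
    /* Replicate the byte: 0x0101010101010101 == 72340172838076673. */
    uint64_t v64 = (uint64_t)(unsigned char)c * UINT64_C(0x0101010101010101);
    uint32_t v32 = (uint32_t)v64;
    uint16_t v16 = (uint16_t)v64;

    if (n >= 8) {
        memcpy(p, &v64, 8);        /* first 8 bytes, unaligned */
        memcpy(end - 8, &v64, 8);  /* last 8 bytes, may overlap the first */
        /* First 8-byte boundary strictly after p. */
        unsigned char *q = (unsigned char *)(((uintptr_t)p + 8) & ~(uintptr_t)7);
        size_t qwords = (size_t)(end - q) >> 3;
        for (size_t i = 0; i < qwords; i++)  /* stands in for rep stosq */
            memcpy(q + 8 * i, &v64, 8);
        return dst;
    }
    if (n & 4) {                   /* 4..7 bytes: two overlapping dwords */
        memcpy(p, &v32, 4);
        memcpy(end - 4, &v32, 4);
        return dst;
    }
    if (n > 1) {                   /* 2..3 bytes: last word, first byte */
        memcpy(end - 2, &v16, 2);
        p[0] = (unsigned char)c;
        return dst;
    }
    if (n == 1)
        p[0] = (unsigned char)c;
    return dst;
}
```

The overlapping head and tail stores are what let the aligned middle run
without a separate remainder loop. A rep stosl variant would presumably
use 4-byte stores and a shift by 2 in the same places.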

