This is the mail archive of the libc-alpha@sources.redhat.com mailing list for the glibc project.



[RFC] i386 memcmp implementation


Here is a memcmp implementation for the i386.  From timing the test program,
it looks faster than both the GCC builtin memcmp (a simple "repe cmpsb") and
the glibc implementation of memcmp (also a simple "repe cmpsb", but with
additional call/return overhead).  Though I have not timed it, it should
also be a lot faster than sysdeps/generic/memcmp.c, whose register pressure
makes it impossible to run fast on the x86 architecture.
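
For reference, the baseline that both existing versions amount to can be
sketched in C.  This is only an illustration of what a "repe cmpsb" loop
does (one byte from each buffer per step, stopping at the first difference
or when the count runs out); the name bytewise_memcmp is made up for this
example and the code is not taken from GCC or glibc.

```c
#include <stddef.h>

/* Hypothetical helper, for illustration only: a byte-at-a-time compare
   with the same shape as a "repe cmpsb" loop. */
int bytewise_memcmp(const void *s1, const void *s2, size_t n)
{
    const unsigned char *a = s1;
    const unsigned char *b = s2;

    while (n-- != 0) {
        if (*a != *b)
            return *a < *b ? -1 : 1;   /* first differing byte decides */
        a++;
        b++;
    }
    return 0;                          /* all n bytes were equal */
}
```

Every byte costs one loop step here, which is the cost that a word-wise
comparison tries to avoid.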

utente@engineer:~/esperimenti$ gcc -g -O3 -fno-builtin test.c memcmp.S
utente@engineer:~/esperimenti$ time ./a.out

real    0m0.088s
user    0m0.090s
sys     0m0.000s

utente@engineer:~/esperimenti$ gcc -g -O3 -fno-builtin test.c
utente@engineer:~/esperimenti$ time ./a.out

real    0m0.108s
user    0m0.100s
sys     0m0.010s

utente@engineer:~/esperimenti$ gcc -g -O3 test.c
utente@engineer:~/esperimenti$ time ./a.out

real    0m0.102s
user    0m0.100s
sys     0m0.010s



Notes:

1) the ideas are the same as those behind memcmp.c, but the implementation
was heavily simplified (no loop unrolling, use of the shrdl instruction even
though it might be suboptimal on the i586) to decrease register pressure.

2) this is not meant to be optimized for a particular arch, so it should be
good for sysdeps/i386/i686.  By taking a more careful look at pairing
instructions, it can easily be adapted to work well for sysdeps/i386/i586
too.

3) If this is accepted, I can send in the paperwork, add headers, and do
whatever else is needed to turn it into a `real' patch.

4) If this is accepted, I would disable inlining of memcmp in GCC, because
the inlined version is a lot slower.  It would also be useful to remove the
memcmp optimization in bits/string.h and bits/string2.h, or to redo it so
that only the first few bytes are compared inline and the real meat is still
done with the faster algorithm.

5) The improvement (about 20%) is consistent across several runs, and it is
also consistent with the performance pitfall that I saw when using memcmp in
the regex routines.  So the current implementation of memcmp for i386
architectures is decidedly suboptimal.
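
To make notes 1 and 4 more concrete, here is a rough C sketch of the general
technique.  It is not the attached memcmp.S; all names (sketch_memcmp,
merge_words, quick_memcmp and the static helpers) are made up for this
example, and it assumes 32-bit words and a little-endian machine, as on the
i386.  merge_words computes what a single shrdl computes: the word that
starts `shift' bits into the pair hi:lo.  quick_memcmp illustrates the
"compare the first byte inline, then call the real routine" idea.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Byte-wise compare, used for the tail and to locate a difference. */
static int cmp_tail(const unsigned char *a, const unsigned char *b, size_t n)
{
    for (; n != 0; a++, b++, n--)
        if (*a != *b)
            return (int)*a - (int)*b;
    return 0;
}

/* Load one 32-bit word in native byte order. */
static uint32_t load_word(const unsigned char *p)
{
    uint32_t w;
    memcpy(&w, p, sizeof w);
    return w;
}

/* What one shrdl computes.  shift must be 8, 16 or 24 (never 0 or 32). */
uint32_t merge_words(uint32_t lo, uint32_t hi, unsigned shift)
{
    return (lo >> shift) | (hi << (32 - shift));
}

/* Hypothetical word-wise memcmp; assumes a little-endian machine. */
int sketch_memcmp(const void *s1, const void *s2, size_t n)
{
    const unsigned char *a = s1;
    const unsigned char *b = s2;

    /* Prologue: compare bytes until a is word aligned. */
    while (n != 0 && ((uintptr_t)a & 3) != 0) {
        if (*a != *b)
            return (int)*a - (int)*b;
        a++; b++; n--;
    }

    unsigned off = (unsigned)((uintptr_t)b & 3);
    if (off == 0) {
        /* Both pointers aligned: compare one word per step. */
        while (n >= 4 && load_word(a) == load_word(b)) {
            a += 4; b += 4; n -= 4;
        }
    } else if (n >= 8) {
        /* b is misaligned: read only aligned words on the b side and
           merge each neighbouring pair with shifts. */
        const unsigned char *next = b + (4 - off);  /* next aligned word */
        uint32_t lo = 0;
        for (unsigned i = off; i < 4; i++)          /* partial first word */
            lo |= (uint32_t)b[i - off] << (8 * i);
        while (n >= 8) {                            /* keeps reads in bounds */
            uint32_t hi = load_word(next);
            if (load_word(a) != merge_words(lo, hi, 8 * off))
                break;
            lo = hi;
            a += 4; b += 4; next += 4; n -= 4;
        }
    }
    /* Remaining bytes, or the word that differed. */
    return cmp_tail(a, b, n);
}

/* Note 4 idea: check the first byte inline, then call the real routine. */
static inline int quick_memcmp(const void *s1, const void *s2, size_t n)
{
    const unsigned char *a = s1;
    const unsigned char *b = s2;

    if (n != 0 && a[0] != b[0])
        return (int)a[0] - (int)b[0];
    return sketch_memcmp(s1, s2, n);
}
```

The sketch keeps only a handful of values live in the inner loop (two
pointers, the count, lo and hi), which is the point of dropping the
unrolling: on the x86 there are not enough registers for much more.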

Paolo Bonzini


Attachment: memcmp.S
Description: Binary data

Attachment: test.c
Description: Text document

