This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: Implementation of some string.h function using SSE2 instructions


Paweł Sikora wrote:
$ cat sse2_strings.c
#include "sse2_strings.h"
#include <emmintrin.h>

static inline __m128i not( __m128i x )
{
        __m128i zero = { 0 };
        __m128i ones = _mm_cmpeq_epi8( zero, zero );
        return _mm_xor_si128( x, ones );
}

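int sse2_strcmp( sse2_byte_buffer s1, sse2_byte_buffer s2 )
{
        /* Dereferencing a __m128i* is an aligned load, so s1 and s2
           must both be 16-byte aligned. */
        for ( int mask = 0; ; s1 += sizeof( __m128i ), s2 += sizeof( __m128i ) )
        {
                __m128i m1 = *( __m128i* )( s1 );
                __m128i m2 = *( __m128i* )( s2 );
                /* r1: 0xFF in every byte position where m1 and m2 differ */
                __m128i r1 = not( _mm_cmpeq_epi8( m1, m2 ) );
                __m128i zero = { 0 };
                /* r2, r3: 0xFF in every byte position holding a NUL */
                __m128i r2 = _mm_cmpeq_epi8( m1, zero );
                __m128i r3 = _mm_cmpeq_epi8( m2, zero );
                __m128i r = _mm_or_si128( r1, _mm_or_si128( r2, r3 ) );
                /* one mask bit per byte; bit 0 is the lowest address */
                mask = _mm_movemask_epi8( r );
                if ( mask )
                {
                        /* first byte that differs or terminates a string */
                        unsigned index = __builtin_ffs( mask ) - 1;
                        return ( s1[ index ] - s2[ index ] );
                }
        }
}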

Yesterday I found a slightly faster method. However, the tests I've done show that this function can be **slower** than plain i686 code, especially for short strings, i.e. when the inner loop is executed 1-3 times. Some speedup, around 2x, appears for relatively long strings (inner loop executed 30+ times). So I don't think strcmp can simply be replaced.

I've also done some tests with strlen: for short strings the speedup is
around 6x, for longer strings up to 13x.
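
For reference, here is a minimal sketch of what an SSE2 strlen along these lines might look like. It is not the code that was measured above; the name sse2_strlen_sketch is hypothetical. It assumes GCC or a compatible compiler (for __builtin_ffs) and SSE2 support (-msse2 on 32-bit x86). It rounds the pointer down to a 16-byte boundary so that every load is aligned, which means it can read up to 15 bytes outside the string within the same aligned block. That cannot fault on page-protected memory, but it is outside what ISO C strictly permits and memory checkers may complain.

```c
#include <emmintrin.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical name; a sketch, not the function benchmarked in this message. */
size_t sse2_strlen_sketch( const char *s )
{
        const __m128i zero = _mm_setzero_si128();

        /* Round down to a 16-byte boundary so all loads are aligned. */
        const char *p = ( const char * )( ( uintptr_t )s & ~( uintptr_t )15 );
        unsigned skip = ( unsigned )( s - p );

        /* First block: find NUL bytes, then drop the bits that belong
           to bytes located before the start of the string. */
        unsigned mask = ( unsigned )_mm_movemask_epi8(
                _mm_cmpeq_epi8( _mm_load_si128( ( const __m128i * )p ), zero ) );
        mask &= 0xFFFFu << skip;

        /* Remaining blocks: 16 bytes per iteration until a NUL shows up. */
        while ( mask == 0 )
        {
                p += 16;
                mask = ( unsigned )_mm_movemask_epi8(
                        _mm_cmpeq_epi8( _mm_load_si128( ( const __m128i * )p ), zero ) );
        }

        /* The lowest set bit marks the first NUL in the current block. */
        return ( size_t )( p + ( __builtin_ffs( ( int )mask ) - 1 ) - s );
}
```

Unlike the strcmp above, this has no requirement on the alignment of its argument, since only one pointer has to be brought to a 16-byte boundary.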


w.


