This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH] powerpc: strstr optimization




On 05/06/2015 02:44 AM, OndÅej BÃlka wrote:

It does. AFAIR for practical strings on x64 its only around 3 times slower than
theoretical lower bound.

A lower bound is strlen call as you must find terminating null when
needle isn't present.

I hope that main idea was clear and you will
use it. Code I send is complex as it contains ten different speedups
that improve performance.


In my optimization there are totally three branches.
case 1# if needle len >= 8 (common case)
case 2# if needle < 8 (common case)
case 3# For page boundary cross haystack. (rare case)

Checking if its case 1 or 2 is relatively costy as in practice haystacks
tend to be short. Most apps that use strstr have 90% call haystack less than 64 bytes long.

You don't need to know needle size if you know that it doesn't contain
first pair. Also its rare that it contains first pair of characters twice.
strlen of needle is needed in all the cases so as to make sure haystack is longer than needle as part of initial check before proceeding to search. Once this check is passed, strchr is
called to find the  position of first character.


For case 1, I always compare 8 characters at a time using cmpb instruction.
-> perform basic strlen, strnlen, checks. Move haystack based on
strchr result.
-> Load first 8 bytes(taking care of aligned load)from haystack and
needle and compare them.
If they are same proceed to next 8 bytes of haystack and needle.
Else load next doubleword from haystack.Form a doubleword with last
seven characters from first load and first character from
second load and compare again with needle. Proceed in a similar way.
This gives good improvement.

Thats looks worse than naive algorithm on practical strings. Most
instructions you do are useless as you have mismatch at first character.

Following function should beat it.

char *strstr (char *s, char *n)
{
   if (n[0] == 0)
     return s;
   while (s = strchr (s, n[0]))
     {
       long i;
       for (i=1; n[i] && n[i] == s[i]; i++);
       if (n[i] == 0)
         return s;
       s++;
     }
}
Calling strchr in a loop is not showing improvement when compared to the proposed patch.

Or use musl strstr idea. While implementation is terrible as it in
practice unnecessary spends 90% time on preprocessing their loop is
relatively good for portable implementation.



--
Thanks
Rajalakshmi S


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]