This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]
Other format:	[Raw text]

Re: [PATCH] powerpc: strstr optimization

From: Rajalakshmi Srinivasaraghavan <raji at linux dot vnet dot ibm dot com>
To: OndÅej BÃlka <neleai at seznam dot cz>
Cc: libc-alpha at sourceware dot org, sjmunroe at us dot ibm dot com
Date: Wed, 06 May 2015 18:58:35 +0530
Subject: Re: [PATCH] powerpc: strstr optimization
Authentication-results: sourceware.org; auth=none
References: <1429254584-124997-1-git-send-email-raji at linux dot vnet dot ibm dot com> <20150501120510 dot GB2203 at domone> <554704A1 dot 3010007 at linux dot vnet dot ibm dot com> <20150505211414 dot GA18085 at domone>



On 05/06/2015 02:44 AM, OndÅej BÃlka wrote:


It does. AFAIR for practical strings on x64 its only around 3 times slower than
theoretical lower bound.

A lower bound is strlen call as you must find terminating null when
needle isn't present.

I hope that main idea was clear and you will
use it. Code I send is complex as it contains ten different speedups
that improve performance.

In my optimization there are totally three branches.
case 1# if needle len >= 8 (common case)
case 2# if needle < 8 (common case)
case 3# For page boundary cross haystack. (rare case)

Checking if its case 1 or 2 is relatively costy as in practice haystacks
tend to be short. Most apps that use strstr have 90% call haystack less than 64 bytes long.

You don't need to know needle size if you know that it doesn't contain
first pair. Also its rare that it contains first pair of characters twice.

strlen of needle is needed in all the cases so as to make sure haystackis longer than needleas part of initial check before proceeding to search. Once this check ispassed, strchr is

called to find the  position of first character.

For case 1, I always compare 8 characters at a time using cmpb instruction.
-> perform basic strlen, strnlen, checks. Move haystack based on
strchr result.
-> Load first 8 bytes(taking care of aligned load)from haystack and
needle and compare them.
If they are same proceed to next 8 bytes of haystack and needle.
Else load next doubleword from haystack.Form a doubleword with last
seven characters from first load and first character from
second load and compare again with needle. Proceed in a similar way.
This gives good improvement.

Thats looks worse than naive algorithm on practical strings. Most
instructions you do are useless as you have mismatch at first character.

Following function should beat it.

char *strstr (char *s, char *n)
{
   if (n[0] == 0)
     return s;
   while (s = strchr (s, n[0]))
     {
       long i;
       for (i=1; n[i] && n[i] == s[i]; i++);
       if (n[i] == 0)
         return s;
       s++;
     }
}

Calling strchr in a loop is not showing improvement when compared tothe proposed patch.


Or use musl strstr idea. While implementation is terrible as it in
practice unnecessary spends 90% time on preprocessing their loop is
relatively good for portable implementation.


--
Thanks
Rajalakshmi S

Follow-Ups:
- Re: [PATCH] powerpc: strstr optimization
  - From: Steven Munroe
- Re: [PATCH] powerpc: strstr optimization
  - From: OndÅej BÃlka

References:
- Re: [PATCH] powerpc: strstr optimization
  - From: OndÅej BÃlka
- Re: [PATCH] powerpc: strstr optimization
  - From: Rajalakshmi Srinivasaraghavan
- Re: [PATCH] powerpc: strstr optimization
  - From: OndÅej BÃlka

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]