On Thu, Jun 18, 2015 at 03:59:55PM +0530, Rajalakshmi Srinivasaraghavan wrote:
2048 number is chosen by checking various length inputs and
comparing the performance. So for 2047 case, there wont be worst
performance.
What are you talking about? I am saying that aaaa haystack with aaa...ab
needle of size 2047 is worst case for your algorithm performance.
I ran the attached test.c in a loop 10000 times.
proposed patch took 19s and existing code took 20s on power machine.
How could you ran it? You have two problems with it
1. It doesn't compile, its missing final } to end main
2. It doesn't measure strstr at all. Its pure function so gcc optimized
that away.
As when I fixed these two issues just 100 iterations take 1.002s with my
x64 vector implementation and 44.538s on generic implemenation your test
couldn't take just 20s for existing code.
The reason I mentioned them as different is, in strstr() patch the
input is loaded as double word ie. 8 bytes at a time by byte and in
strcasestr its byte by byte load.The input is handled in different
way.
I see now that you wrote basically
long i=0;
n0 = tolower(n[0])
while (s[i])
{
int j=1;
if (tolower(s[i])==n0)
while (n[j] && tolower(n[j]) == tolower(s[i+j]))
j++;
if (n[j]=='\0')
return s+i;
i++;
}
return NULL;
For some reason I skimmed that patch and though that you did obvious
optimization once you have vector strstr. You use vectorized tolower to
convert 8 bytes at once. Example of conversion is in
sysdeps/x86_64/multiarch/strcmp-sse42.S
Second is about average case perforance. There worst case is irrelevant
as you use buy-or-rent techniques to handle it. I sumbitted following
patch with easy way to handle that.
[PATCH] Improve generic strstr performance.
I applied your proposed patches and the benchtests and I could see
proposed patch(__strstr_power7) is better than existing code
(__strstr_ppc).
13363 out of 13442 combinations shows improvement.
Attached benchtest results.
I said use that patch. Next line of my previous mail explains that.
its currently superseeded by generic vector bigraph search but it should
beat your strstr.
So you should do also comparison with that.
Your benchmark is invalid for simple reason, I guard vector patch with:
if (__BYTE_ORDER == __LITTLE_ENDIAN)
{