This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]
Other format:	[Raw text]

Re: [PATCH] powerpc: strcasestr optimization

From: Rajalakshmi Srinivasaraghavan <raji at linux dot vnet dot ibm dot com>
To: OndÅej BÃlka <neleai at seznam dot cz>
Cc: libc-alpha at sourceware dot org, Steve Munroe <sjmunroe at us dot ibm dot com>
Date: Wed, 24 Jun 2015 12:47:23 +0530
Subject: Re: [PATCH] powerpc: strcasestr optimization
Authentication-results: sourceware.org; auth=none
References: <1433270823 dot 14958 dot 15 dot camel at sjmunroe-ThinkPad-W500> <20150602210000 dot GB20004 at domone> <1433882432 dot 21101 dot 71 dot camel at sjmunroe-ThinkPad-W500> <20150609210555 dot 3A7CA2C3BDC at topped-with-meat dot com> <1433885184 dot 21101 dot 83 dot camel at sjmunroe-ThinkPad-W500> <20150609214202 dot 5D0612C3BE3 at topped-with-meat dot com> <20150610104842 dot GA4871 at domone> <55788320 dot 5090405 at linux dot vnet dot ibm dot com> <20150616072122 dot GA15725 at domone> <55829DA3 dot 1010108 at linux dot vnet dot ibm dot com> <20150618125838 dot GA27174 at domone>



On 06/18/2015 06:28 PM, OndÅej BÃlka wrote:

On Thu, Jun 18, 2015 at 03:59:55PM +0530, Rajalakshmi Srinivasaraghavan wrote:

2048 number is chosen by checking various length inputs and
comparing the performance. So for 2047 case, there wont be worst
performance.

What are you talking about? I am saying that aaaa haystack with aaa...ab
  needle of size 2047 is worst case for your algorithm performance.

I ran the attached test.c in a loop 10000 times.
proposed patch took 19s and existing code took 20s on power machine.

How could you ran it? You have two problems with it
1. It doesn't compile, its missing final } to end main
2. It doesn't measure strstr at all. Its pure function so gcc optimized
that away.

As when I fixed these two issues just 100 iterations take 1.002s with my
x64 vector implementation and 44.538s on generic implemenation your test
couldn't take just 20s for existing code.

The reason I mentioned them as different is, in strstr() patch the
input is loaded as double word ie. 8 bytes at a time by byte and in
strcasestr its byte by byte load.The input is handled in different
way.


I see now that you wrote basically

long i=0;
n0 = tolower(n[0])
while (s[i])
   {
     int j=1;
     if (tolower(s[i])==n0)
       while (n[j] && tolower(n[j]) == tolower(s[i+j]))
         j++;
     if (n[j]=='\0')
       return s+i;
     i++;
   }
return NULL;

For some reason I skimmed that patch and though that you did obvious
optimization once you have vector strstr. You use vectorized tolower to
convert 8 bytes at once. Example of conversion is in
  sysdeps/x86_64/multiarch/strcmp-sse42.S

For strcasestr() optimization I am trying to use tolower to convert 8bytes and will send the modified patch.

For strstr() I have sent the new version
https://sourceware.org/ml/libc-alpha/2015-06/msg00788.html


Second is about average case perforance. There worst case is irrelevant
as you use buy-or-rent techniques to handle it. I sumbitted following
patch with easy way to handle that.

[PATCH] Improve generic strstr performance.

I applied your proposed patches and the benchtests and I could see
proposed patch(__strstr_power7) is better than existing code
(__strstr_ppc).
13363 out of 13442 combinations shows improvement.
Attached benchtest results.

I said use that patch. Next line of my previous mail explains that.

its currently superseeded by generic vector bigraph search but it should
beat your strstr.


So you should do also comparison with that.


Your benchmark is invalid for simple reason, I guard vector patch with:

if (__BYTE_ORDER == __LITTLE_ENDIAN)
   {

Modified new version works better for both LE and BE. Results attachedin https://sourceware.org/ml/libc-alpha/2015-06/msg00788.html

So as powerpc is big endian my patch does nothing at all and you are
just measuring generic strstr again. If you want test bigraph search to
that on power64le.

Used your proposed code on powerpc64le and observed that quadratic worstcase still takes more time with your proposed code.It took more than 2 minutes when the needle size is 100000 and haystackis of size 1000000.

What overall performance gain? Did you try to find application and
measure performance difference before and after? Big sizes are not very
relevant in practice. You look behind 32'th byte of haystack only in
1.5% of calls so 10% regression at these sizes is much worse that 30%
gain on 256 byte range. In my profiler a simple collection shows
following

No I don't check on any specific application and I use the glibc
benchtests for performance measurement which is considered to be the
standard way of performance measurement before proposing a patch.


Then you have problem that you don't know which data are important and
which are not.

Proof by numbers is quite suspect. Input ranges are completely
arbitrary. So I could decide that I want to check for sizes 1..64 all
possible 64byte needle and haystack alignments. That generates 262144
test cases which should be enough.


--
Thanks
Rajalakshmi S

Follow-Ups:
- Re: [PATCH] powerpc: strcasestr optimization
  - From: OndÅej BÃlka

References:
- Re: [PATCH] powerpc: strcasestr optimization
  - From: Steven Munroe
- Re: [PATCH] powerpc: strcasestr optimization
  - From: OndÅej BÃlka
- Re: [PATCH] powerpc: strcasestr optimization
  - From: Steven Munroe
- Re: [PATCH] powerpc: strcasestr optimization
  - From: Roland McGrath
- Re: [PATCH] powerpc: strcasestr optimization
  - From: Steven Munroe
- Re: [PATCH] powerpc: strcasestr optimization
  - From: Roland McGrath
- Re: [PATCH] powerpc: strcasestr optimization
  - From: OndÅej BÃlka
- Re: [PATCH] powerpc: strcasestr optimization
  - From: Rajalakshmi Srinivasaraghavan
- Re: [PATCH] powerpc: strcasestr optimization
  - From: OndÅej BÃlka
- Re: [PATCH] powerpc: strcasestr optimization
  - From: Rajalakshmi Srinivasaraghavan
- Re: [PATCH] powerpc: strcasestr optimization
  - From: OndÅej BÃlka

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]