This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: [PATCH] powerpc: strcasestr optimization
- From: Steven Munroe <munroesj at linux dot vnet dot ibm dot com>
- To: Rajalakshmi Srinivasaraghavan <raji at linux dot vnet dot ibm dot com>
- Cc: OndÅej BÃlka <neleai at seznam dot cz>, Roland McGrath <roland at hack dot frob dot com>, GNU C Library <libc-alpha at sourceware dot org>, Steve Munroe <sjmunroe at us dot ibm dot com>
- Date: Wed, 10 Jun 2015 14:06:01 -0500
- Subject: Re: [PATCH] powerpc: strcasestr optimization
- Authentication-results: sourceware.org; auth=none
- References: <556C36D8 dot 2070208 at linux dot vnet dot ibm dot com> <20150601122830 dot GA14649 at domone> <1433169407 dot 10235 dot 5 dot camel at sjmunroe-ThinkPad-W500> <20150601162215 dot GA8955 at domone> <1433270823 dot 14958 dot 15 dot camel at sjmunroe-ThinkPad-W500> <20150602210000 dot GB20004 at domone> <1433882432 dot 21101 dot 71 dot camel at sjmunroe-ThinkPad-W500> <20150609210555 dot 3A7CA2C3BDC at topped-with-meat dot com> <1433885184 dot 21101 dot 83 dot camel at sjmunroe-ThinkPad-W500> <20150609214202 dot 5D0612C3BE3 at topped-with-meat dot com> <20150610104842 dot GA4871 at domone> <55788320 dot 5090405 at linux dot vnet dot ibm dot com>
- Reply-to: munroesj at linux dot vnet dot ibm dot com
On Thu, 2015-06-11 at 00:04 +0530, Rajalakshmi Srinivasaraghavan wrote:
>
> On 06/10/2015 04:18 PM, OndÅej BÃlka wrote:
> >
> > The onus for making all this happen is on the person who wants to change the code from its status quo.
> >
> > That would mean that its Rajalakshmi and responsibility to
> > write benchmark that shows its performance improvement. I stated my
> > objections explicitly described which inputs to use.
> >
> > Problem is that Steven failed to do his duty as a powerpc maintainer
> > which is to read and review change, check benchmark inputs and that
> > they are adequate. I could supply really bad implementation of strstr
> > which first checks special case that needle and haystack are periodic
> > abcdef sequences and returns result expected from benchmark.
> >
> > I did just simple request that every freshmen should know and Steven
> > should as powerpc maintainer check how its handled. Its that naive
> > algorithm and proposed algorithm are slow on inputs of form
> >
> > strstr("aaaaa....aaaa","aaaa...aaab");
> >
> > As from your previous mail its their responsibility to add that case for
> > benchmark. A 'solution' on v2 would be to switch into two-way algorithm
> > when size is bigger than 2048. That isn't solution as you still have
> > problem with size 2047. I seen somewhere comment that its security
> > problem when you have website with search function on user comments and
> > implementation internally uses strstr then attacker could just post
> > comment with aaaaa...a and search few times for aa...ab.
> >
Thanks Raja for correcting my misconception. So Raja did responded (in
https://sourceware.org/ml/libc-alpha/2015-06/msg00032.html) with a new
benchmark that we believe should address OndÅej BÃlka concerns.
I apologize if this error on my part cause any undo increases in blood
pressure or heart burn.
> 2048 number is chosen by checking various length inputs and comparing
> the performance. So for 2047 case, there wont be worst performance.
>
> > Roland as its really basic issue I think that its proposer
> > responsibility to check. Also for strstr thread Carlos said about strstr
> > quadratic case:
> >
> >> We should add gnulib's test to the microbenchmarks.
> >
> >
> > Then as I commented that algorithm is flawed its from my experience,
> > Raji's description was:
> >
> > For case 1, I always compare 8 characters at a time using cmpb instruction.
> > instruction.
> > -> perform basic strlen, strnlen, checks. Move haystack based on strchr result.
> > strchr result.
> > -> Load first 8 bytes(taking care of aligned load)from haystack and needle and compare them.
> > needle and compare them.
> > If they are same proceed to next 8 bytes of haystack and needle.
> > Else load next doubleword from haystack.Form a doubleword with last
> > seven characters from first load and first character from
> > second load and compare again with needle. Proceed in a similar way.
> > This gives good improvement.
> >
> Lets not mix the review comments of strstr and strcasestr.(as the
> algorithm is different)
> > My comment was that it wastes time most of time as in common case these
> > already differ in first byte so all these shifts were completely
> > unnecessary, a simple optimization of naive algorithm using
> >
> > while(s=strchr(s,n[0]))
> If I use while(s=strchr(s,n[0]) as mentioned in
> https://sourceware.org/ml/libc-alpha/2015-05/msg00060.html
> the quadratic testcase gives worst peformance. Attached the testcase.
> >
> > would be faster. Steven instead of doing his duty as powerpc maintainer
> > and checking result of benchtest that I send instead tried to bullshit
> > himself by telling that powerpc architecture is too special. Then he
> > asked me do do benchmark instead as stalling tactics.
> >
> > If he did his duty of reading benchmark results there is already
> > evidence. First to notice is there are regressions in original
> > benchmark, like following:
> >
> > Length 30/2, alignment 0/ 9, found: 20.7812 32.2656 13.125 19.7188
> > Length 30/2, alignment 0/ 9, fail: 32.4688 33.9219 36.75 31.4688
> >
> I agree there are few regression cases(79 out of 13442 cases) in
> 'proposed_benchresults.txt' attached in
> https://sourceware.org/ml/libc-alpha/2015-05/msg00276.html.
> This is minimal when compared to the overall performance gain.
>
> > That he should comment, as strstr on average end on 10.6 byte after
> > haystack start its quite relevant.
> >
> > But if he read proposed benchmark where needle and haystack are randomly
> > generated from list of 128 characters then he would easily see it.
> I have used your proposed benchtest, and it always gives improvement for
> strcasestr.Please check the file 'newresults' in
> https://sourceware.org/ml/libc-alpha/2015-06/msg00032.html
> >
> > Here is trimmed version to see pattern. Note that first there are lot of
> > times where performance is less than 13. Thats case where strchr didn't
> > find first character and we just return(generic case now does
> > unnecessary work leading to worse numbers but thats addressed on patch that I send with strchr loop.)
> >
> > Its natural to ask what would happen if you called strchr again. In 120
> > bytes it would probably didn't found second character and we would
> > finish in 26 cycles at most not 111.5
> >
> > I expect that average loop performance would be 13.4375/135 = 0.1 cycles
> > per byte not cycle per byte that this algorithm exhibits after he loses
> > performance gains from first correct strchr call.
> >
> > simple_strstr stupid_strstr __strstr_power7
> > __strstr_ppc
> >
> > Length 4/2, alignment 0/ 3, found: 9.35938 8.09375 7.90625 8.5
> > Length 12/2, alignment 3/ 9, fail: 9.23438 16.5625 7.90625 9.20312
> > Length 24/2, alignment 3/ 9, fail: 9.54688 26.2812 8.32812 10.1875
> > Length 44/4, alignment 9/ 3, found: 13.4688 40.9219 9.59375 13.9688
> > Length 44/4, alignment 9/ 3, fail: 13.4688 40.9531 50.6875 13.9844
> > Length 52/4, alignment 3/ 9, found: 13.5312 47.2812 9.5625 14.7344
> > Length 52/4, alignment 3/ 9, fail: 45.9688 53.625 35.9688 47.5938
> > Length 75/5, alignment 15/ 9, fail: 73.75 67.9375 64.2188 75.9062
> > Length 75/5, alignment 15/15, found: 15.2188 65.9062 10.7344 17.8438
> > Length 75/5, alignment 15/15, fail: 61.5469 68.6562 50.375 63.0312
> > Length 120/8, alignment 15/ 3, fail: 113.281 111.5 103.5 114.406
> > Length 120/8, alignment 15/ 9, found: 21.25 99.1406 12.8438 28.1406
> > Length 135/9, alignment 15/15, found: 23.125 110.734 13.4375 30.7969
> > Length 135/9, alignment 15/15, fail: 147.484 113.781 110.516 165.984
> > Length 168/12, alignment 3/15, found: 31.2344 134.172 13.6719 39.9375
> > Length 168/12, alignment 3/15, fail: 170.766 137.609 131.859 179.859
> > Length 168/12, alignment 9/ 0, found: 100.562 137.922 76.6562 112.578
> > Length 168/12, alignment 9/ 0, fail: 168.703 144.078 144.828 181.203
> > Length 168/12, alignment 9/ 3, found: 129.938 137.703 109.828 136.266
> > Length 168/12, alignment 9/ 3, fail: 178.734 137.875 165.469 183.609
> > Length 465/31, alignment 15/ 0, fail: 73.0469 363.375 32.2969 87.0625
> > Length 465/31, alignment 15/ 3, found: 507.656 373.703 429.797 543.625
> > Length 465/31, alignment 15/ 3, fail: 512.922 373.188 437.047 556.844
> > Length 465/31, alignment 15/ 9, found: 443.969 379.359 392.828 464.75
> > Length 465/31, alignment 15/ 9, fail: 484.312 376.406 423.703 513.672
> > Length 465/31, alignment 15/15, found: 326.859 376 256.625 350.422
> > Length 465/31, alignment 15/15, fail: 512.641 376.203 439.234 545.906
> > Length 131071/16, alignment 0/ 0, found: 116583 108378 122125
> > 127809
> > Length 131071/16, alignment 0/ 0, fail: 118298 108639 122208
> > 128700
> >
> >
>