This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH] powerpc: strcasestr optimization


On Thu, 2015-06-11 at 00:04 +0530, Rajalakshmi Srinivasaraghavan wrote:
> 
> On 06/10/2015 04:18 PM, OndÅej BÃlka wrote:
> >
> >   The onus for making all this happen is on the person who wants to change the code from its status quo.
> >
> > That would mean that its Rajalakshmi and responsibility to
> > write benchmark that shows its performance improvement. I stated my
> > objections explicitly described which inputs to use.
> >
> > Problem is that Steven failed to do his duty as a powerpc maintainer
> > which is to read and review change, check benchmark inputs and that
> > they are adequate. I could supply really bad implementation of strstr
> > which first checks special case that needle and haystack are periodic
> > abcdef sequences and returns result expected from benchmark.
> >
> > I did just simple request that every freshmen should know and Steven
> > should as powerpc maintainer check how its handled. Its that naive
> > algorithm and proposed algorithm are slow on inputs of form
> >
> > strstr("aaaaa....aaaa","aaaa...aaab");
> >
> > As from your previous mail its their responsibility to add that case for
> > benchmark. A 'solution' on v2 would be to switch into two-way algorithm
> > when size is bigger than 2048. That isn't solution as you still have
> > problem with size 2047. I seen somewhere comment that its security
> > problem when you have website with search function on user comments and
> > implementation internally uses strstr then attacker could just post
> > comment with aaaaa...a and search few times for aa...ab.
> >

Thanks Raja for correcting my misconception. So Raja did responded (in
https://sourceware.org/ml/libc-alpha/2015-06/msg00032.html) with a new
benchmark that we believe should address OndÅej BÃlka concerns.

I apologize if this error on my part cause any undo increases in blood
pressure or heart burn.

> 2048 number is chosen by checking various length inputs and comparing 
> the performance. So for 2047 case, there wont be worst performance.
> 
> > Roland as its really basic issue I think that its proposer
> > responsibility to check. Also for strstr thread Carlos said about strstr
> > quadratic case:
> >
> >> We should add gnulib's test to the microbenchmarks.
> >
> >
> > Then as I commented that algorithm is flawed its from my experience,
> > Raji's description was:
> >
> > For case 1, I always compare 8 characters at a time using cmpb instruction.
> > instruction.
> > -> perform basic strlen, strnlen, checks. Move haystack based on strchr result.
> > strchr result.
> > -> Load first 8 bytes(taking care of aligned load)from haystack and needle and compare them.
> > needle and compare them.
> > If they are same proceed to next 8 bytes of haystack and needle.
> > Else load next doubleword from haystack.Form a doubleword with last
> > seven characters from first load and first character from
> > second load and compare again with needle. Proceed in a similar way.
> > This gives good improvement.
> >
> Lets not mix the review comments of strstr and strcasestr.(as the 
> algorithm is different)
> > My comment was that it wastes time most of time as in common case these
> > already differ in first byte so all these shifts were completely
> > unnecessary, a simple optimization of naive algorithm using
> >
> > while(s=strchr(s,n[0]))
> If I use while(s=strchr(s,n[0]) as mentioned in
> https://sourceware.org/ml/libc-alpha/2015-05/msg00060.html
> the quadratic testcase gives worst peformance. Attached the testcase.
> >
> > would be faster. Steven instead of doing his duty as powerpc maintainer
> > and checking result of benchtest that I send instead tried to bullshit
> > himself by telling that powerpc architecture is too special. Then he
> > asked me do do benchmark instead as stalling tactics.
> >
> > If he did his duty of reading benchmark results there is already
> > evidence. First to notice is there are regressions in original
> > benchmark, like following:
> >
> > Length   30/2, alignment  0/ 9, found:  20.7812 32.2656 13.125  19.7188
> > Length   30/2, alignment  0/ 9, fail:   32.4688 33.9219 36.75   31.4688
> >
> I agree there are few regression cases(79 out of 13442 cases) in 
> 'proposed_benchresults.txt' attached in
> https://sourceware.org/ml/libc-alpha/2015-05/msg00276.html.
> This is minimal when compared to the overall performance gain.
> 
> > That he should comment, as strstr on average end on 10.6 byte after
> > haystack start its quite relevant.
> >
> > But if he read proposed benchmark where needle and haystack are randomly
> > generated from list of 128 characters then he would easily see it.
> I have used your proposed benchtest, and it always gives improvement for 
> strcasestr.Please check the file 'newresults' in 
> https://sourceware.org/ml/libc-alpha/2015-06/msg00032.html
> >
> > Here is trimmed version to see pattern. Note that first there are lot of
> > times where performance is less than 13. Thats case where strchr didn't
> > find first character and we just return(generic case now does
> > unnecessary work leading to worse numbers but thats addressed on patch that I send with strchr loop.)
> >
> > Its natural to ask what would happen if you called strchr again. In 120
> > bytes it would probably didn't found second character and we would
> > finish in 26 cycles at most not 111.5
> >
> > I expect that average loop performance would be 13.4375/135 = 0.1 cycles
> > per byte not cycle per byte that this algorithm exhibits after he loses
> > performance gains from first correct strchr call.
> >
> >                          simple_strstr   stupid_strstr   __strstr_power7
> > __strstr_ppc
> >
> > Length    4/2, alignment  0/ 3, found:  9.35938 8.09375 7.90625 8.5
> > Length   12/2, alignment  3/ 9, fail:   9.23438 16.5625 7.90625 9.20312
> > Length   24/2, alignment  3/ 9, fail:   9.54688 26.2812 8.32812 10.1875
> > Length   44/4, alignment  9/ 3, found:  13.4688 40.9219 9.59375 13.9688
> > Length   44/4, alignment  9/ 3, fail:   13.4688 40.9531 50.6875 13.9844
> > Length   52/4, alignment  3/ 9, found:  13.5312 47.2812 9.5625  14.7344
> > Length   52/4, alignment  3/ 9, fail:   45.9688 53.625  35.9688 47.5938
> > Length   75/5, alignment 15/ 9, fail:   73.75   67.9375 64.2188 75.9062
> > Length   75/5, alignment 15/15, found:  15.2188 65.9062 10.7344 17.8438
> > Length   75/5, alignment 15/15, fail:   61.5469 68.6562 50.375  63.0312
> > Length  120/8, alignment 15/ 3, fail:   113.281 111.5   103.5   114.406
> > Length  120/8, alignment 15/ 9, found:  21.25   99.1406 12.8438 28.1406
> > Length  135/9, alignment 15/15, found:  23.125  110.734 13.4375 30.7969
> > Length  135/9, alignment 15/15, fail:   147.484 113.781 110.516 165.984
> > Length  168/12, alignment  3/15, found: 31.2344 134.172 13.6719 39.9375
> > Length  168/12, alignment  3/15, fail:  170.766 137.609 131.859 179.859
> > Length  168/12, alignment  9/ 0, found: 100.562 137.922 76.6562 112.578
> > Length  168/12, alignment  9/ 0, fail:  168.703 144.078 144.828 181.203
> > Length  168/12, alignment  9/ 3, found: 129.938 137.703 109.828 136.266
> > Length  168/12, alignment  9/ 3, fail:  178.734 137.875 165.469 183.609
> > Length  465/31, alignment 15/ 0, fail:  73.0469 363.375 32.2969 87.0625
> > Length  465/31, alignment 15/ 3, found: 507.656 373.703 429.797 543.625
> > Length  465/31, alignment 15/ 3, fail:  512.922 373.188 437.047 556.844
> > Length  465/31, alignment 15/ 9, found: 443.969 379.359 392.828 464.75
> > Length  465/31, alignment 15/ 9, fail:  484.312 376.406 423.703 513.672
> > Length  465/31, alignment 15/15, found: 326.859 376     256.625 350.422
> > Length  465/31, alignment 15/15, fail:  512.641 376.203 439.234 545.906
> > Length 131071/16, alignment  0/ 0, found:       116583  108378  122125
> > 127809
> > Length 131071/16, alignment  0/ 0, fail:        118298  108639  122208
> > 128700
> >
> >
> 



Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]