This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH v3] powerpc: strstr optimization


On Fri, Jul 17, 2015 at 09:13:59AM -0300, Adhemerval Zanella wrote:
> 
> 
> On 17-07-2015 08:53, OndÅej BÃlka wrote:
> > On Fri, Jul 17, 2015 at 04:14:53PM +0530, Rajalakshmi Srinivasaraghavan wrote:
> >>
> >>
> >> On 07/17/2015 06:11 AM, OndÅej BÃlka wrote:
> >>> On Thu, Jul 16, 2015 at 06:05:10PM -0300, Tulio Magno Quites Machado Filho wrote:
> >  >
> >>> And as generic case I many times asked if you ran my generic strstr
> >>> patch. It was lot faster than first two iteration and in practice looks
> >> I have run benchtest against your generic strstr patches
> >> and the results are attached in
> >> https://sourceware.org/ml/libc-alpha/2015-06/msg00788.html
> >> where 12750 out of 13442 combinations shows improvement.
> >>
> > And I was repeatedly telling you that your benchmark is invalid. You
> > compare generic digraph search with power7 strchr. That has two
> > problems.
> > 
> > First one is that I previously described that reason of using digraphs instead
> > of just strchr loop is better performance in more degenerated cases.
> > Your benchmarks just shows that checking for one character is better
> > than for two when probability of that character is very low.
> > 
> > So to test these you need to use different benchmark where you set
> > probabilities of first character to 1/8, 1/16, 1/32, 1/64 and you will
> > see that on some point digraphs beat single character. 
> > 
> > From my previous mail:
> > 
> > "
> > Just use same patch like I send with ((unsigned) rand())%16 + 1 and you
> > will see completely different numbers in benchmark.
> > 
> > This is real problem, not just pathological case as it could easily
> > happen in some programs. If you search english text for 
> > strstr(haystack," the") then on average every fourth character is space.
> > "
> > 
> > As second problem I was repeatedly telling you that you need compare
> > power7 optimized code with power7 one. So either add powerpc
> > instructions to skeleton or use my previous patch. As I said that my
> > patch would beat yours it holds as that applied to strstr v2. Then I
> > realized this problem and said that you should check naive loop which
> > you didn't do.
> > 
> > Following comment from my previous mail still applies:
> > 
> > "
> > My comment was that it wastes time most of time as in common case these
> > already differ in first byte so all these shifts were completely
> > unnecessary, a simple optimization of naive algorithm using
> > 
> > while(s=strchr(s,n[0]))
> > 
> > would be faster.
> > "
> > 
> > Which I also paraphrased here which applies to your v3. 
> > 
> > "
> > Second is about average case perforance. There worst case is irrelevant
> > as you use buy-or-rent techniques to handle it. I sumbitted following
> > patch with easy way to handle that.
> > 
> > [PATCH] Improve generic strstr performance.
> > 
> > its currently superseeded by generic vector bigraph search but it should
> > beat your strstr.
> > "
> >  
> >>> twice faster than this assembly. Thats because strings are generaly
> >>> short. Its bit slower than assembly or large size. Thats just caused by
> >>> using ppc strchr instead power7 one which is slower. So I just used two
> >>> versions.
> >>>
> >>> Results with modified benchmark attached, where I did hack to rename my
> >>> proposal as strstr_power7b while strstr_power7 is current to clearly see
> >>> difference.
> >>>
> >> How did you manage to get both __strstr_power7b and __strstr_power7
> >> results in the same benchtest run? or Did you run two different
> >> benchtests and merge the output?
> >>
> > Trivial.
> > 1. Just add new implementation after patching without deleting old one, name it __strstr_power7b
> > 2. Add __strstr_power7b to ifunc_impl_list
> > 3. Run benchmark
> > 
> 
> I would be very good if we could work on improve the strstr benchmark
> to shows this issues you just raised.

For quadratic case I already send patch in thread. For other issues
microbenchmark isn't sufficient as usual. These inputs are not
completely random. While I could with lot of effort model these as
markov chains it would be wasted work as getting trace is much simpler
and more reliable. 


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]