This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH][BZ #12100] strstr with unaligned loads.


On Tue, Aug 27, 2013 at 04:45:00AM -0400, Rich Felker wrote:
> On Tue, Aug 27, 2013 at 10:30:48AM +0200, Ondřej Bílka wrote:
> > > winning on random testcases is meaningless since it has quadratic
> > > performance in the worst-case.
> > > For strstr random input is always the
> > > easy case.
> > 
> > How good is it on a binary alphabet?
> 
> Two-way is very good on binary alphabet, and on 4-letter alphabet
> (e.g. DNA). The bad character table optimization is rather useless on
> small alphabets, of course, but that's not a core part of the
> algorithm, just an optimization that helps achieve sublinear
> performance in many real-world usages (while maintaining worst-case
> linear performance).
> 
> Note that having an optimized search for the next occurrence of the
> first character or two is also rather useless in a binary alphabet...
>
A binary alphabet tends to be the worst case for optimized implementations.
Most tricks do not apply there: shifts tend to be only 1 or 2 bytes and
branches tend to be mispredicted. The performance impact of changing the
alphabet from binary to ternary in two-way is big.
 
> > > > > presumably because musl always uses the bad-character table). There
> > > > > may be additional logic considerations to integrating any optimized
> > > > > "next pair" search with the bad character table; I have not checked
> > > > > this.
> > > > > 
> > > > Bad character is also heuristic that is not needed when pair heuristic
> > > > is used as it would only slow it down. 
> > > 
> > > Bad character table changes the typical runtime from O(n) to O(n/m).
> > > This is HUGE.
> > 
> > You are making the basic mistake of arguing that some algorithm is
> > better in the best case.
> 
> My above comments were comparing two-way with bad character table to
> plain two-way. Bad character table does not always help (it just helps
> in many typical real-world cases), but it never hurts more than a
> small O(m) (m = needle length) startup cost. In my implementation, I
> combine this with the operation of computing the needle length. Of
> course strlen is a faster O(m) operation by several times, so one
> could argue that the bad character table still "costs too much", but
> empirically I found my version of two-way (in musl) outperforming
> glibc's version of two-way (not the SSE code in glibc) on many
> reasonable inputs where the needle was 16-31 bytes.
>
> > > > > On Sun, Aug 25, 2013 at 11:07:58AM +0200, OndÅej BÃlka wrote:
> > > > > > This could be solved by the standard buy-or-rent technique which I
> > > > > > employ. For any position p there can be only p+64n+512 comparisons
> > > > > > before I switch to the two-way algorithm, which gives linear complexity.
> > > 
> > > That's a reasonable solution, but I think it would be even better to
> > > integrate your SSE tricks with two-way.
> > >
> > It's not worth doing from my perspective, as the factorization step costs
> > around 20 cycles per needle byte. In that time I could often determine
> > three times over that there is no match.
> 
> I think we'd need to see more empirical data to determine this. Your
> switchover threshold is not exactly cheap. If the intent is just to
> give good performance for "typical" (e.g. English text) uses of
> strstr, and otherwise only avoid pathologically bad performance that
> could be used for DoS, then your approach may be reasonable. On the

Intent was to get the best possible performance for typical cases and
reasonable performance for all cases.


> other hand, if you want to support arbitrary reasonable uses of strstr
> (like DNA searches in an entire genome), it's probably not adequate.
> But that's just my semi-educated guess. A real determination would
> require empirical measurement.
> 
You cannot beat a specialized algorithm (e.g. DNA search) without supplying
contextual information to the algorithm. When you decide to switch based
only on data from the current run, you will slow the common case down.
A profile-guided optimization is possible, but it would need added
infrastructure to save context information.


