This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH][BZ #12100] strstr with unaligned loads.


On Tue, Aug 27, 2013 at 02:16:41PM +0200, OndÅej BÃlka wrote:
> On Tue, Aug 27, 2013 at 04:45:00AM -0400, Rich Felker wrote:
> > On Tue, Aug 27, 2013 at 10:30:48AM +0200, OndÅej BÃlka wrote:
> > > > winning on random testcases is meaningless since it has quadratic
> > > > performance in the worst-case.
> > > > For strstr random input is always the
> > > > easy case.
> > > 
> > > How good is it on binary alphabeth?
> > 
> > Two-way is very good on binary alphabet, and on 4-letter alphabet
> > (e.g. DNA). The bad character table optimization is rather useless on
> > small alphabets, of course, but that's not a core part of the
> > algorithm, just an optimization that helps achieve sublinear
> > performance in many real-world usages (while maintaining worst-case
> > linear performance).
> > 
> > Note that having an optimized search for the next occurrance of the
> > first character or two is also rather useless in a binary alphabet...
> >
> A binary alphabet tends to be worst case for optimized implementations.

It's even worse for unoptimized implementations, since long prefix
matches and periodicity in needles becomes the norm rather than the
pathological case.

> > > Its not worth doing from my perspective as factorization step cost
> > > around 20 cycles per needle byte. In that time I could often find three
> > > times that there is no match.
> > 
> > I think we'd need to see more empirical data to determine this. Your
> > switchover threshold is not exactly cheap. If the intent is just to
> > give good performance for "typical" (e.g. English text) uses of
> > strstr, and otherwise only avoid pathologically bad performance that
> > could be used for DoS, then your approach may be reasonable. On the
> 
> Indent was get best possible performance for typical cases and
> reasonable for all cases. 

I don't think you're _too_ far off from that intent.

Rich


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]