This is the mail archive of the glibc-bugs-regex@sources.redhat.com mailing list for the glibc project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [Bug regex/429] regex hangs on backreferences


> But even with your patch bug-regex11.c with s/#if 0/#if 1/ eats certainly
> more than 10 minutes of CPU time (until I killed it).

Yes.  I managed to run a 7-backreference version, which took a few minutes
before my patch.  True, it does not make the full testcase feasible yet, but
I'll work on it.

> Either there is a better algorithm for many backreferences, or we should
> consider using NFA for patterns where DFA with backtracing is known to
take
> too long.

I think you can do some kind of caching to cut the number of invocations of
calc_dst_limits_pos.  Its implementation is naive.  Complexity is
exponential in N because every epsilon closure visits N backreferences
without any remote hope of succeeding, because no OP_{OPEN,CLOSE}_SUBEXP is
on the epsilon closure.

I have to figure out exactly how the backref cache enters the game, because
adding something ad hoc in regcomp.c does not seem the right way to fix it.
And also, I want to understand which cases are common both in practice (to
avoid slowing down the common case) and in the worst case: I'd like
factor.sed and dc.sed to be sped up by 10% while fixing this bug.  I
certainly hope to bring it down to O(N) in the number of backreferences,
albeit with a pretty big constant in front of it.

Paolo


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]