This is the mail archive of the libc-alpha@sources.redhat.com mailing list for the glibc project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

[PATCH] Speed up backreferences


I profiled factor.sed with cachegrind.  Cache utilization is very good,
showing practically no misses.  Time is spent 99.99% in sift_states_bkref,
with only half a dozen lines resulting in literally a few hundred million
instructions (99.9%)!  I attach a small extract of the output with the
execution counts of sift_states_backward and sift_states_bkref.

I noticed that a couple of `if' tests in this routine can easily be hoisted
to just after the tested value is computed.  This is trivial, but it does
save up to 20% on factor.sed when factorizing 143.

I alsi prepared a more complex patch that attempts to do more aggressive
caching.  It does give a bigger performance improvement, but only 25% (5%
better than to the simpler patch) so I am including both because Isamu
probably can do this stuff better than me.  It might even be more effective
to try some kind of hashing based on entry->node and entry->str_idx, since
they are loop invariant.  Knowing the size of the cache would help deciding
if it's worth doing it; but I do know you can afford doing complicated
things in match_ctx_add_entry because it is executed less than 1% of the
time.

Anyway, I am afraid the matcher is losing time chasing pointers and sieving
for valid cache entries rather than effectively looking for a match for the
backreference. :-(

The patches pass the testsuite in GNU sed will-be-4.0 and 3.9x.  They both
apply on CVS regex, the 25% patch not requiring the 20% one to have been
applied previously.

Paolo Bonzini

Attachment: regex.cachegrind
Description: Binary data

Attachment: regex-20-percent-speedup.patch
Description: Binary data

Attachment: regex-25-percent-speedup.patch
Description: Binary data


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]