This is the mail archive of the libc-alpha@sources.redhat.com mailing list for the glibc project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

[PATCH] Three regex speedups, one of which is actually a bugfix


This is a set of three patches.

The first, regex-speedup-uniq-sed.patch, fixes a bug in the definition
of re_dfastate_t.  I noticed that the uniq.sed testcase in the sed
testsuite had a different profile than the other backreference
stress tests, with re_acquire_state_context accounting for 50% of
the run time due to a quadratic bottleneck.  The reason was that
the searched state's context did not fit in the hash table's context
field, so each call created a new state!  Fixing the definition of
re_dfa_state_t trims 97% of re_acquire_state_context's run time,
and 50% of the total run time, for this particular testcase.

The second, regex-rewrite-node-set-insert.patch, gains 2-3% on every
testcase I tried by rewriting re_node_set_insert.  Doing a binary search
is overkill, the algorithm is still going to be O(n) because of
memmove; the patch implements the insertion sort with a single loop,
reducing the loop and setup overheaps.  Note that the comment
for the function is not exact, as the function expects that the node is
not in the set.

For the third, regex-cache-word-char.patch, I noticed an unused word_char
bitset in regex_t; instead of zapping it, I tried caching the result of
IS_WORD_CHAR, and it resulted in a consistent 3-6% speedup with
__ctype_b_loc disappearing from the profile.  The patch also moves the fetch of
"preg->newline_anchor" in re_string_context_at rather than in the caller, since
"preg" had to be passed nevertheless, and this contributes to the speedup.

Paolo


Attachment: regex-cache-word-char.patch
Description: Binary data

Attachment: regex-rewrite-node-set-insert.patch
Description: Binary data

Attachment: regex-speedup-uniq-sed.patch
Description: Binary data


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]