This is the mail archive of the libc-alpha@sources.redhat.com mailing list for the glibc project.



Re: [PATCH] Three regex speedups, one of which is actually a bugfix


> Why context : 10? 4 bits are enough IMHO.

I keep confusing contexts and constraints.  The latter are 10 bits wide.

> If fetching preg->newline_anchor contributes to the speedup, then that
> argument should be removed, not changed.

preg->newline_anchor is not needed unless the character is a newline.  Passing preg
instead of preg->newline_anchor saves a memory access for almost all calls to
re_string_context_at.  Fetching it from the re_string_t or from the re_regex_t does
not save anything, unless you also move word_char to the re_string_t so that
you can remove the argument altogether.  That is a complementary optimization that
can be made in a follow-up patch.
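
To illustrate the point about the memory access, here is a minimal sketch and not the glibc code.  struct pattern, load_newline_anchor, context_eager, context_lazy and count_loads are hypothetical names, and the counter only stands in for the load the caller would otherwise do before every call.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins; these are not the glibc definitions. */
enum { CTX_NONE = 0, CTX_NEWLINE = 2 };

struct pattern { int newline_anchor; };

/* Counts reads of newline_anchor, purely for illustration. */
static unsigned long anchor_loads;

static int load_newline_anchor(const struct pattern *p)
{
    assert(p != NULL);
    ++anchor_loads;
    return p->newline_anchor;
}

/* Eager variant: the caller has to load the flag before every call. */
static unsigned context_eager(unsigned char c, int newline_anchor)
{
    return (c == '\n' && newline_anchor) ? CTX_NEWLINE : CTX_NONE;
}

/* Lazy variant: the pointer is dereferenced only when c is a newline. */
static unsigned context_lazy(unsigned char c, const struct pattern *p)
{
    return (c == '\n' && load_newline_anchor(p)) ? CTX_NEWLINE : CTX_NONE;
}

/* Scan a string with either variant and report how many loads happened. */
static unsigned long count_loads(const char *s, const struct pattern *p, int lazy)
{
    unsigned newline_ctx = 0;

    anchor_loads = 0;
    for (; *s != '\0'; ++s) {
        unsigned char c = (unsigned char) *s;
        newline_ctx += lazy ? context_lazy(c, p)
                            : context_eager(c, load_newline_anchor(p));
    }
    (void) newline_ctx;
    return anchor_loads;
}
```

On a five-character string with one newline, the eager variant reads the flag five times and the lazy one reads it once.  An optimizing compiler may hoist the load in a simple loop like this, so treat the counter as a model of the idea rather than a measurement.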

> It is not initialized always, because in the common case there is no
> \<, \>, \b, \B, \w and \W in regular expression and so differentiating
> between word and non-word characters is not needed at all.

But every match ends up in build_tr_table, which calls IS_WORD_CHAR 256 times and
brings __ctype_b_loc high in the profile.  That's why it's better to always
initialize word_char.  If you use a cached bitset instead of calling isalnum, the
lookup costs the same whether the cached bitset is correct (initialized with isalnum)
or all-zeros (unless you go down into branch prediction, which is overkill, isn't it?).
Using a flag to avoid iswalnum calls in IS_WIDE_WORD_CHAR is again a complementary
optimization, which can be done with a separate patch.
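
A minimal sketch of the cached bitset idea, not the actual patch.  struct word_table, word_table_init and word_table_test are hypothetical names, and treating '_' as a word character next to isalnum is an assumption about what IS_WORD_CHAR tests.

```c
#include <assert.h>
#include <ctype.h>
#include <limits.h>
#include <string.h>

/* Number of bits in one table element, and elements needed for all bytes. */
#define WORD_BITS (sizeof(unsigned long) * CHAR_BIT)
#define WORD_TABLE_LEN ((UCHAR_MAX + 1 + WORD_BITS - 1) / WORD_BITS)

/* Hypothetical cache: one bit per possible unsigned char value. */
struct word_table { unsigned long bits[WORD_TABLE_LEN]; };

/* Fill the table once; this is the only place that calls isalnum. */
static void word_table_init(struct word_table *t)
{
    assert(t != NULL);
    memset(t, 0, sizeof *t);
    for (int c = 0; c <= UCHAR_MAX; ++c)
        if (isalnum(c) || c == '_')
            t->bits[c / WORD_BITS] |= 1UL << (c % WORD_BITS);
}

/* Lookup is a shift and a mask, with no call into the ctype machinery. */
static int word_table_test(const struct word_table *t, unsigned char c)
{
    return (int) ((t->bits[c / WORD_BITS] >> (c % WORD_BITS)) & 1UL);
}
```

After the one-time initialization, 256 lookups in a loop like the one in build_tr_table touch only the table, and an all-zeros table would be read at exactly the same cost.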

Paolo




