This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
[PATCH] Speed up fnmatch.
- From: OndÅej BÃlka <neleai at seznam dot cz>
- To: libc-alpha at sourceware dot org
- Date: Wed, 9 Oct 2013 19:43:59 +0200
- Subject: [PATCH] Speed up fnmatch.
- Authentication-results: sourceware.org; auth=none
Hi, fnmatch on multibyte string is slow for several reasons.
Main reason is that we spend ages converting strings to wide
strings.
Most of time this overhead is not needed as user does a simple search
with no special characters.
For utf8 these could be checked by strstr as there is only one way how
encode ascii-only sequence. I could generalize this to any encoding for
which this is true.
A strstr will quickly determine that there is no match because there is
no match with start.
I excluded all special characters from documentation and it passes test.
OK to commit?
* posix/fnmatch.c: Optimize likely case.
diff --git a/posix/fnmatch.c b/posix/fnmatch.c
index 0f26a2e..27aec2e 100644
--- a/posix/fnmatch.c
+++ b/posix/fnmatch.c
@@ -329,7 +329,37 @@ fnmatch (pattern, string, flags)
int flags;
{
# if HANDLE_MULTIBYTE
- if (__builtin_expect (MB_CUR_MAX, 1) != 1)
+ int multibyte_needed = 0;
+ if (MB_CUR_MAX != 1)
+ {
+ char *encoding = nl_langinfo (CODESET);
+ if (strcmp (encoding, "UTF8"))
+ multibyte_needed = 1;
+ }
+
+ /* ASCII with \+/.*?[{(@! excluded. */
+ static unsigned char normal[256] = {
+ 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
+ 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0,
+ 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1,
+ 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1,
+ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
+ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
+ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
+ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 };
+
+ if (!multibyte_needed)
+ {
+ char start[16];
+ size_t i;
+ for (i = 0; i < 8 && normal[(unsigned char) pattern[i]]; i++)
+ start[i] = pattern[i];
+ start[i] = 0;
+ char *string2 = strstr (string, start);
+ if (!string2)
+ return FNM_NOMATCH;
+ }
+ if (MB_CUR_MAX != 1)
{
mbstate_t ps;
size_t n;