This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

[PATCH] Speed up fnmatch.


Hi, fnmatch on multibyte string is slow for several reasons.
Main reason is that we spend ages converting strings to wide
strings.

Most of time this overhead is not needed as user does a simple search
with no special characters. 

For utf8 these could be checked by strstr as there is only one way how
encode ascii-only sequence. I could generalize this to any encoding for
which this is true.

A strstr will quickly determine that there is no match because there is
no match with start.

I excluded all special characters from documentation and it passes test.

OK to commit?

	* posix/fnmatch.c: Optimize likely case.

diff --git a/posix/fnmatch.c b/posix/fnmatch.c
index 0f26a2e..27aec2e 100644
--- a/posix/fnmatch.c
+++ b/posix/fnmatch.c
@@ -329,7 +329,37 @@ fnmatch (pattern, string, flags)
      int flags;
 {
 # if HANDLE_MULTIBYTE
-  if (__builtin_expect (MB_CUR_MAX, 1) != 1)
+  int multibyte_needed = 0;
+  if (MB_CUR_MAX != 1)
+    {
+      char *encoding = nl_langinfo (CODESET);
+      if (strcmp (encoding, "UTF8"))
+	multibyte_needed = 1;
+    }
+
+  /* ASCII with \+/.*?[{(@! excluded.  */
+  static unsigned char normal[256] = {
+ 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
+ 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0,
+ 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1,
+ 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1,
+ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
+ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
+ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
+ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 };
+
+  if (!multibyte_needed)
+    {
+      char start[16];
+      size_t i;
+      for (i = 0; i < 8 && normal[(unsigned char) pattern[i]]; i++)
+        start[i] = pattern[i];
+      start[i] = 0;
+      char *string2 = strstr (string, start);
+      if (!string2)
+        return FNM_NOMATCH;
+    }
+  if (MB_CUR_MAX != 1)
     {
       mbstate_t ps;
       size_t n;


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]