This is the mail archive of the
glibc-bugs@sourceware.org
mailing list for the glibc project.
[Bug libc/5514] New: memmem is O(n^2), but should be O(n)
- From: "ebb9 at byu dot net" <sourceware-bugzilla at sourceware dot org>
- To: glibc-bugs at sources dot redhat dot com
- Date: 20 Dec 2007 13:30:13 -0000
- Subject: [Bug libc/5514] New: memmem is O(n^2), but should be O(n)
- Reply-to: sourceware-bugzilla at sourceware dot org
The worst-case complexity of memmem is currently quadratic, O(m*n) where m is
the length of the haystack and n the length of the needle. The worst case
occurs when the distinguishing feature of the needle is at the end, the haystack
consists of repetitions of the needle's prefix, and the needle is either not
present or occurs late in the haystack. Since each iteration examines the bulk
of the needle, but only advances one byte per iteration, you have quadratic
performance. Using the Knuth-Morris-Pratt algorithm or Boyer-Moore algorithm
would guarantee worst-case complexity of O(m+n), at the expense of a memory
allocation of O(n).
Gnulib recently converted its memmem implementation to use KMP, along with an
allocation amortization that avoids KMP until after a bounded number of failed
naive iterations, and falling back to the naive algorithm when KMP cannot
allocate enough memory.
http://git.sv.gnu.org/gitweb/?p=gnulib.git;a=blob;f=lib/memmem.c;h=69d2edc;hb=fc068cf
It looks like strstr could also benefit from this algorithmic speedup.
--
Summary: memmem is O(n^2), but should be O(n)
Product: glibc
Version: 2.4
Status: NEW
Severity: normal
Priority: P2
Component: libc
AssignedTo: drepper at redhat dot com
ReportedBy: ebb9 at byu dot net
CC: glibc-bugs at sources dot redhat dot com
http://sourceware.org/bugzilla/show_bug.cgi?id=5514
------- You are receiving this mail because: -------
You are on the CC list for the bug, or are watching someone who is.