This is the mail archive of the newlib@sourceware.org mailing list for the newlib project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH v3] Improve performance of strstr


On Oct 18 17:29, Wilco Dijkstra wrote:
> v3: Add support for read ahead using strnlen, giving an additional 25% speedup
> on large inputs (both short and long needles).
> 
> This patch significantly improves performance of strstr by using Sunday's
> Quick-Search algorithm.  Due to its simplicity it has the best average
> performance of string matching algorithms on almost all inputs.  It uses a
> bad-character shift table to skip past mismatches.
> 
> The needle length is limited to 254 - this reduces the shift table memory 
> 4 to 8 times, lowering preprocessing overhead and minimizing cache effects.
> The limit also implies its worst-case performance is linear.
> 
> Larger needles are processed by the Two-Way algorithm.  The macro AVAILABLE
> has been improved to use strnlen to read the input in chunks.  This results
> in a 2.5 times speedup for large needles, reducing the performance drop when
> the Quick-Search algorithm can't be used.
> 
> The code for 1-4 byte needles has been simplified and now uses unsigned
> char.  Since the optimized code relies on 8-bit chars, we defer to the
> size-optimized implementation if CHAR_BIT > 8.
> 
> The performance gain of finding a set of randomly chosen words of size 8 in 
> 256 bytes of English text is 14 times on AArch64. For longer haystacks the
> gain is well over 20 times.
> 
> The size-optimized strstr has also been rewritten from scratch to improve
> performance.  On the same test the performance gain is 69%.
> 
> Tested against GLIBC testsuite, randomized tests and the GNULIB strstr test
> (https://git.savannah.gnu.org/cgit/gnulib.git/tree/tests/test-strstr.c).
> 
> --

Pushed.


Thanks,
Corinna

-- 
Corinna Vinschen
Cygwin Maintainer
Red Hat

Attachment: signature.asc
Description: PGP signature


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]