This is the mail archive of the
glibc-bugs-regex@sourceware.org
mailing list for the glibc project.
[Bug regex/19348] New: re_search is incredibly slow when processing '$' on long lines
- From: "alex_y_xu at yahoo dot ca" <sourceware-bugzilla at sourceware dot org>
- To: glibc-bugs-regex at sourceware dot org
- Date: Wed, 09 Dec 2015 14:40:18 +0000
- Subject: [Bug regex/19348] New: re_search is incredibly slow when processing '$' on long lines
- Auto-submitted: auto-generated
https://sourceware.org/bugzilla/show_bug.cgi?id=19348
Bug ID: 19348
Summary: re_search is incredibly slow when processing '$' on long lines
Product: glibc
Version: unspecified
Status: NEW
Severity: normal
Priority: P2
Component: regex
Assignee: unassigned at sourceware dot org
Reporter: alex_y_xu at yahoo dot ca
CC: drepper.fsp at gmail dot com
Target Milestone: ---
$ echo {1..5000000} > file # adjust based on CPU speed
$ time sed -e 's/$/stuff/' file >/dev/null # logical way to append to lines
sed -e 's/$/stuff/' file > /dev/null 2.91s user 0.09s system 99% cpu 3.007 total
$ time sed -e 's/.*/&stuff/' file >/dev/null
sed -e 's/.*/&stuff/' file > /dev/null 1.62s user 0.34s system 99% cpu 1.972 total
For comparison, musl (tested via busybox sed) was 2x faster in the first case
than in the second.
Intuitively, the glibc result does not make sense: .* should be slower because
it needs to match the entire string, whereas $ can skip to the end of the line
(since sed must already find the newline in order to run the commands).
However, according to callgrind, glibc spends an inordinate amount of time
inside check_halt_state_context, re_state_reconstruct, and
re_string_context_at.
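As a sanity check on the comparison, the following sketch confirms that both
substitutions produce identical output on a single long line, so any timing
difference should come from the matching itself and not from the two commands
doing different work. The temporary file, the smaller input size (100000
numbers, not 5000000), and the variable names are assumptions chosen for
illustration.

```shell
# Illustrative check only: temp file, input size, and variable names are assumptions.
tmp=$(mktemp)

# Build one long line of space-separated numbers, terminated by a single newline.
seq 1 100000 | paste -s -d ' ' - > "$tmp"

# Checksum the output of each substitution.
sum_dollar=$(sed -e 's/$/stuff/' "$tmp" | cksum)
sum_dotstar=$(sed -e 's/.*/&stuff/' "$tmp" | cksum)

rm -f "$tmp"

# Both commands append "stuff" to the line, so the checksums should agree.
if [ "$sum_dollar" = "$sum_dotstar" ]; then
    echo "outputs match"
else
    echo "outputs differ"
fi
```

Prefixing each sed invocation with time, as in the transcript, then compares
the cost of the two patterns on the same input.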
I am unsure whether this qualifies as a glibc bug or how to fix it, but I think
it is useful to have on the record.