This is the mail archive of the glibc-bugs-regex@sourceware.org mailing list for the glibc project.

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]
Other format:	[Raw text]

[Bug regex/19348] New: re_search is incredibly slow when processing '$' on long lines

From: "alex_y_xu at yahoo dot ca" <sourceware-bugzilla at sourceware dot org>
To: glibc-bugs-regex at sourceware dot org
Date: Wed, 09 Dec 2015 14:40:18 +0000
Subject: [Bug regex/19348] New: re_search is incredibly slow when processing '$' on long lines
Auto-submitted: auto-generated

https://sourceware.org/bugzilla/show_bug.cgi?id=19348

            Bug ID: 19348
           Summary: re_search is incredibly slow when processing '$' on
                    long lines
           Product: glibc
           Version: unspecified
            Status: NEW
          Severity: normal
          Priority: P2
         Component: regex
          Assignee: unassigned at sourceware dot org
          Reporter: alex_y_xu at yahoo dot ca
                CC: drepper.fsp at gmail dot com
  Target Milestone: ---

$ echo {1..5000000} > file # adjust based on CPU speed
    $ time sed -e 's/$/stuff/' file >/dev/null # logical way to append to lines
    sed -e 's/$/stuff/' file > /dev/null  2.91s user 0.09s system 99% cpu 3.007
total
    $ time sed -e 's/.*/&stuff/' file >/dev/null
    sed -e 's/.*/&stuff/' file > /dev/null  1.62s user 0.34s system 99% cpu
1.972 total

musl via busybox sed was tested to be 2x faster in the first case than in the
second.

intuitively, this does not make sense. .* should be slower because it needs to
match the entire string whereas $ can skip to the end of the line (since sed
must already find the new line in order to run the commands).

however, glibc spends an inordinate amount of time inside of
check_halt_state_context, re_state_reconstruct, and re_string_context_at,
according to callgrind.

I am unsure whether this qualifies as a glibc bug or how to fix it, but I think
it is useful to have on the record.

-- 
You are receiving this mail because:
You are on the CC list for the bug.

Follow-Ups:
- [Bug regex/19348] re_search matches $ much slower than .*
  - From: alex_y_xu at yahoo dot ca

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]