This is the mail archive of the
glibc-bugs-regex@sourceware.org
mailing list for the glibc project.
[Bug regex/13637] New: incorrect match in multi-byte (non-UTF8) string
- From: "leonardo at ngdn dot org" <sourceware-bugzilla at sourceware dot org>
- To: glibc-bugs-regex at sources dot redhat dot com
- Date: Tue, 31 Jan 2012 12:47:55 +0000
- Subject: [Bug regex/13637] New: incorrect match in multi-byte (non-UTF8) string
- Auto-submitted: auto-generated
http://sourceware.org/bugzilla/show_bug.cgi?id=13637
Bug #: 13637
Summary: incorrect match in multi-byte (non-UTF8) string
Product: glibc
Version: 2.15
Status: NEW
Severity: normal
Priority: P2
Component: regex
AssignedTo: drepper.fsp@gmail.com
ReportedBy: leonardo@ngdn.org
Classification: Unclassified
Created attachment 6186
--> http://sourceware.org/bugzilla/attachment.cgi?id=6186
reg.sh: a script to reproduce the problem
When a string mixing single-byte and multi-byte characters is passed to
re_search(), the function seems to lose track of which bytes belong to
multi-byte characters and returns an incorrect match. So far I have only seen
this in the ja_JP.eucjp locale.
The problem can be reproduced when the following string:
aaa\xb7\xefa\xbf\xb7\xbd\xe8
... is matched against the pattern:
\xb7\xbd
The two bytes in the pattern are, respectively, the last byte of the second
multi-byte character and the first byte of the third multi-byte character in
the original string. The pattern therefore straddles a character boundary and
should not match.
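
To illustrate why a match at that position would be wrong, here is a small
Python sketch that compares a byte-wise search with a character-aware one. It
is not the attached reg.sh and does not call glibc at all; the helper name
char_starts is made up for this example, and it assumes the usual EUC-JP
lead-byte rules (0x8F starts a 3-byte character, 0x8E or 0xA1..0xFE starts a
2-byte character).

```python
# Byte strings taken from the report.
subject = b"aaa\xb7\xefa\xbf\xb7\xbd\xe8"
pattern = b"\xb7\xbd"


def char_starts(data):
    """Return the byte offsets at which each EUC-JP character begins.

    Hypothetical helper; assumes well-formed EUC-JP input.
    """
    starts = []
    i = 0
    while i < len(data):
        starts.append(i)
        lead = data[i]
        if lead == 0x8F:                          # 3-byte character
            i += 3
        elif lead == 0x8E or 0xA1 <= lead <= 0xFE:  # 2-byte character
            i += 2
        else:                                     # single-byte character
            i += 1
    return starts


# A naive byte-wise search finds the pattern bytes inside the subject.
byte_hit = subject.find(pattern)

# A character-aware search decodes both sides first, then searches.
char_hit = subject.decode("euc_jp").find(pattern.decode("euc_jp"))

starts = char_starts(subject)
print("character start offsets:", starts)
print("byte-wise hit offset:", byte_hit)
print("hit is on a character boundary:", byte_hit in starts)
print("character-aware hit:", char_hit)
```

The byte-wise hit falls in the middle of the second multi-byte character, so a
multi-byte-aware matcher would be expected to report no match, which is what
the character-aware search in the sketch does.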
The number of "a"s prepended to the original string seems to make all the
difference here. I could only reproduce the problem with exactly 3 or 4
leading "a"s. For example, if you remove one "a" from the prefix of the
original string:
aa\xb7\xefa\xbf\xb7\xbd\xe8
... the problem no longer happens.
I'm attaching a script that reproduces the problem. The 'sed' version I'm using
is compiled with "--without-included-regex", so it should use glibc's regex
functions. Unfortunately I can't yet confirm that the bug is not in sed, but
I'm trying to create a self-contained program to demonstrate the problem.