This is the mail archive of the
glibc-bugs@sources.redhat.com
mailing list for the glibc project.
[Bug regex/522] New: [regex] charset-based optimizations inhibited outside glibc
- From: "bonzini at gnu dot org" <sourceware-bugzilla at sources dot redhat dot com>
- To: glibc-bugs at sources dot redhat dot com
- Date: 8 Nov 2004 10:07:01 -0000
- Subject: [Bug regex/522] New: [regex] charset-based optimizations inhibited outside glibc
- Reply-to: sourceware-bugzilla at sources dot redhat dot com
The attached patch enables optimizations based on the active charset being
UTF-8, or a superset of ASCII, even when regex is being compiled outside glibc.
While a more complete solution would use gnulib and gettext's locale_charset
function, this would make central tools such as /bin/sed and /bin/awk require
external files. Since regex does not need the charset name, but only to know if
it is UTF-8, we can use simple string matching.
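
For illustration only, the string-matching idea might look roughly like the sketch below. This is not the attached patch: the helper names codeset_name_is_utf8 and current_charset_is_utf8 are hypothetical, and the sketch assumes the codeset name comes from POSIX nl_langinfo(CODESET) and that "UTF-8" and "UTF8" (in any letter case) are the only spellings worth accepting.

```c
#include <langinfo.h>   /* nl_langinfo, CODESET (POSIX) */
#include <stddef.h>     /* NULL */
#include <strings.h>    /* strcasecmp (POSIX) */

/* Hypothetical helper, not taken from the patch: return 1 if NAME is
   a spelling of UTF-8, ignoring letter case, and 0 otherwise.  */
int
codeset_name_is_utf8 (const char *name)
{
  return (name != NULL
          && (strcasecmp (name, "UTF-8") == 0
              || strcasecmp (name, "UTF8") == 0));
}

/* Hypothetical helper: return 1 if the codeset of the current LC_CTYPE
   locale is UTF-8.  The caller is assumed to have already called
   setlocale (LC_CTYPE, "") or similar.  */
int
current_charset_is_utf8 (void)
{
  return codeset_name_is_utf8 (nl_langinfo (CODESET));
}
```

A caller would run setlocale (LC_CTYPE, "") once at startup and then test current_charset_is_utf8 () before enabling a UTF-8-specific fast path. No charset alias tables are consulted, which is the point of avoiding locale_charset here.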
The patch also detects whether the active charset is a superset of ASCII by
checking that btowc(c) == (wchar_t) c for 0 <= c <= 127. This assumes that the
wchar_t encoding is ISO 10646, which always seems to be the case.
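
As a rough sketch of the btowc check described above (again, not the patch itself): the function name charset_is_ascii_superset is hypothetical, and the code relies on the same assumption that wchar_t holds ISO 10646 code points, so that an ASCII byte must convert to the identical numeric value.

```c
#include <wchar.h>      /* btowc, wint_t */

/* Hypothetical helper, not taken from the patch: return 1 if every
   byte in 0..127 converts to the wide character with the same numeric
   value in the current LC_CTYPE locale, and 0 otherwise.  This treats
   wchar_t as ISO 10646, which is an assumption.  */
int
charset_is_ascii_superset (void)
{
  int c;

  for (c = 0; c < 128; ++c)
    /* btowc returns WEOF when C is not a valid one-byte character;
       that value never equals C, so the comparison below rejects it
       as well.  */
    if (btowc (c) != (wint_t) c)
      return 0;

  return 1;
}
```

The result depends on the locale in effect at the time of the call, so it would need to be computed after setlocale and recomputed if the locale changes.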
--
Summary: [regex] charset-based optimizations inhibited outside glibc
Product: glibc
Version: unspecified
Status: NEW
Severity: normal
Priority: P2
Component: regex
AssignedTo: bonzini at gnu dot org
ReportedBy: bonzini at gnu dot org
CC: glibc-bugs-regex at sources dot redhat dot com,
    glibc-bugs at sources dot redhat dot com
http://sources.redhat.com/bugzilla/show_bug.cgi?id=522