This is the mail archive of the libc-help@sourceware.org mailing list for the glibc project.
coding neutral regexec to do UTF-8 ranges
- From: Joël Krähemann <jkraehemann at gmail dot com>
- To: libc-help at sourceware dot org
- Date: Fri, 23 Jun 2017 11:50:35 +0200
- Subject: coding neutral regexec to do UTF-8 ranges
- Authentication-results: sourceware.org; auth=none
- Reply-to: jkraehemann-guest at users dot alioth dot debian dot org
Hi
The regexec() function has problems handling UTF-8 byte ranges. It only
works as intended when the environment variables are set like:
LANG=C
LC_ALL=C
With these settings, my application is not able to apply any gettext
translations. Here is a sample of such an expression used by my application:
static const char *chars_pattern =
"^(([0-9])|(\xC2\xB7)|((\xCC[\x80-\xBF])|(\xCD[\x80-\xAF]))|((\xE2\x80\xBF)|(\xE2\x81\x80)))";
When my system uses any UTF-8 locale, the expression above causes the
program to fail, because regexec() then interprets the ranges as
multi-byte sequences. That is definitely wrong in this situation.
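One possible workaround, given only as a sketch: take the locale from the
environment, so that LC_MESSAGES is still available to gettext, and then
force only LC_CTYPE and LC_COLLATE back to "C" before compiling the
expression. This assumes that the regex code consults only those two
categories, which I have not verified for every libc. The helper names
compile_bytewise() and matches_chars_pattern() are hypothetical and only
serve the illustration.

```c
#include <locale.h>
#include <regex.h>
#include <stddef.h>

/* Pattern from this message: raw UTF-8 bytes and byte ranges. */
static const char *chars_pattern =
  "^(([0-9])|(\xC2\xB7)|((\xCC[\x80-\xBF])|(\xCD[\x80-\xAF]))|((\xE2\x80\xBF)|(\xE2\x81\x80)))";

/* Hypothetical helper: compile a pattern so that it is read byte by byte. */
static int
compile_bytewise(regex_t *preg, const char *pattern)
{
  /* Take every category, including LC_MESSAGES, from the environment. */
  setlocale(LC_ALL, "");

  /* Assumption: character type and collation are the only categories
     that influence how the bracket ranges are interpreted. */
  setlocale(LC_CTYPE, "C");
  setlocale(LC_COLLATE, "C");

  return regcomp(preg, pattern, REG_EXTENDED);
}

/* Hypothetical helper: 1 if text matches at its start, 0 if it does not,
   -1 if the pattern could not be compiled. */
static int
matches_chars_pattern(const char *text)
{
  regex_t re;
  int rc;

  if (compile_bytewise(&re, chars_pattern) != 0)
    return -1;

  rc = regexec(&re, text, 0, NULL, 0);
  regfree(&re);

  return rc == 0 ? 1 : 0;
}
```

Calling matches_chars_pattern("\xC2\xB7") should then report a match even
when LANG names a UTF-8 locale. Note that gettext derives its output
encoding from LC_CTYPE, so with LC_CTYPE set to "C" a call to
bind_textdomain_codeset() with "UTF-8" is probably needed as well to get
the translations back undamaged.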
http://www.nongnu.org/gsequencer/
The file using UTF-8 ranges:
http://git.savannah.nongnu.org/cgit/gsequencer.git/tree/ags/lib/ags_turtle.c?h=0.8.x
The main function setting environment variables:
http://git.savannah.nongnu.org/cgit/gsequencer.git/tree/ags/gsequencer_main.c?h=0.8.x
Best,
Joël