This is the mail archive of the libc-help@sourceware.org mailing list for the glibc project.

Encoding-neutral regexec() for UTF-8 byte ranges


Hi

The regexec() function has some issues with UTF-8 byte ranges: it only
handles them correctly when the environment variables are set like this:

LANG=C
LC_ALL=C

With these settings, my application cannot apply any gettext
translations. Here is a sample of such an expression used by my
application:

static const char *chars_pattern =
"^(([0-9])|(\xC2\xB7)|((\xCC[\x80-\xBF])|(\xCD[\x80-\xAF]))|((\xE2\x80\xBF)|(\xE2\x81\x80)))";

With any UTF-8 locale active on my system, the expression above causes
the program to fail, because regexec() interprets the ranges as
multi-byte sequences. That is definitely wrong in this situation, since
the pattern is meant to match raw bytes.

The project homepage:
http://www.nongnu.org/gsequencer/

The file using UTF-8 ranges:
http://git.savannah.nongnu.org/cgit/gsequencer.git/tree/ags/lib/ags_turtle.c?h=0.8.x

The main function setting environment variables:
http://git.savannah.nongnu.org/cgit/gsequencer.git/tree/ags/gsequencer_main.c?h=0.8.x

Bests,
Joël
