This is the mail archive of the
newlib@sourceware.org
mailing list for the newlib project.
Re: [PATCH/RFA] Internationalize ctype functionality
On Mar 26 21:55, Howland Craig D (Craig) wrote:
> 1) Wouldn't it be cleaner, especially in files in which it happens more
> than once, to replace things like:
>
> #ifdef __CYGWIN__
> char __declspec(dllexport) *__ctype_ptr__ = _ctype_b + 127;
> [...]
> char DLLEXPORT *__ctype_ptr__ = _ctype_b + 127;
>
> (given that the only differences on the lines is the dll attribute)?
> This would not only make ctype_.c more readable, but more maintainable.
I didn't change that. It's just as it was in the original code.
It's Jeff's call.
> 2) I don't entirely understand the following, possibly due to my lack
> of knowledge on the topic:
> >- The toupper and tolower functions are now charset independent. If
> > the character is > 0x7f, it will be converted to wide char and then
> > towupper/towlower is called on it.
> > This is only a temporary solution. It works, but it's a bit sedated
> > for native characters. In the long run we should rather add
> > upper/lower-case transformation tables, similar to the new ctype
> > character class tables.
> toupper and tolower operate on regular characters, which have a defined
> range of unsigned-char-allowed-values and EOF. How can it work to
> change it to a wide character except in the degenerate case when wide
> characters are the same width as regular characters?
What the code does is this:
  if (mbtowc (&wc, s, 1) >= 0
- If the character is convertible to a wide char
      && wctomb (s, (wchar_t) towupper ((wint_t) wc)) == 1)
- And the towupper (or towlower) of the result can be converted back
  to a single byte char
    c = s[0];
- Use it. The wide char conversion is lossless. If the conversion
  works and the result is a singlebyte char, it's used, otherwise c is
  returned. I don't see a problem with this approach.
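To make the three steps concrete, here is a minimal standalone sketch of the same round trip. The helper name upper_via_wide is hypothetical and not part of the patch, and the cast to unsigned char on the result is an addition of this sketch so that the return value stays in the unsigned char range. What mbtowc does with a byte above 0x7f depends on the current locale and the C library, so no particular result for such bytes is implied.

```c
#include <limits.h>
#include <stdlib.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical helper, not part of newlib: upper-case one byte by a
   round trip through wchar_t, returning the input unchanged if either
   conversion fails.  Callers are expected to pass a value in the
   unsigned char range; EOF is not handled here. */
static int
upper_via_wide (int c)
{
  char s[MB_LEN_MAX] = { 0 };
  wchar_t wc;

  s[0] = (char) c;
  /* Step 1: the single byte must convert to a wide character. */
  if (mbtowc (&wc, s, 1) >= 0
      /* Step 2: the upper-cased wide character must convert back to
         exactly one byte. */
      && wctomb (s, (wchar_t) towupper ((wint_t) wc)) == 1)
    /* Step 3: use that byte (cast added in this sketch). */
    return (unsigned char) s[0];
  return c;
}
```

For plain ASCII input this behaves like the classic implementation; for bytes above 0x7f the result is either a single-byte upper-case equivalent or the unchanged input.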
> That is, should
> it be gated by a check that MB_CUR_MAX == 1?
I'm not quite sure. While POSIX states that the incoming int must be
representable as an unsigned char, it doesn't explicitly state that
this unsigned char must be from a singlebyte charset.
OTOH, all the other isalpha/isprint/etc functions only work for
singlebyte chars anyway. And if we start using transformation tables
at some point...
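For comparison, a table-driven variant could look roughly like the sketch below. It is an illustration only: the names latin1_lower, init_latin1_lower and table_tolower are made up, it hard-codes ISO-8859-1 as the charset, and it is not taken from the patch or from newlib.

```c
/* Hypothetical 256-entry lower-case table for ISO-8859-1 only; a real
   implementation would need one table per supported charset. */
static unsigned char latin1_lower[256];

static void
init_latin1_lower (void)
{
  int i;

  for (i = 0; i < 256; i++)
    latin1_lower[i] = (unsigned char) i;        /* identity by default */
  for (i = 'A'; i <= 'Z'; i++)
    latin1_lower[i] = (unsigned char) (i + 0x20);
  /* 0xC0..0xDE are upper-case letters in ISO-8859-1, except 0xD7,
     which is the multiplication sign. */
  for (i = 0xC0; i <= 0xDE; i++)
    if (i != 0xD7)
      latin1_lower[i] = (unsigned char) (i + 0x20);
}

static int
table_tolower (int c)
{
  /* EOF and anything else outside the unsigned char range pass
     through unchanged. */
  if (c < 0 || c > 255)
    return c;
  return latin1_lower[c];
}
```

A lookup like this needs no wide char conversion at all, which is the attraction, at the cost of maintaining a table for each charset.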
> 3) (both toupper.c and tolower.c do this)
> [...]
> + if ((unsigned char) c <= 0x7f)
> + return isupper (c) ? c - 'A' + 'a' : c;
> + char s[8] = { c, '\0' };
> + wchar_t wc;
> + if (mbtowc (&wc, s, 1) >= 0
> + && wctomb (s, (wchar_t) towlower ((wint_t) wc)) == 1)
> + c = s[0];
>
> The char s[8] and wchar_t lines will not work, coming in the middle
> of a block, unless the compiler is C99 compliant. Does Newlib assume
> (require) C99 compilers? (I hope so, but don't think so.)
That was an oversight. I created that code for Cygwin originally and it
uses what gcc provides. I'll fix that together with the constant 8 in
s[8] which should actually be MB_LEN_MAX, and an additional check for
EOF (which is in the domain for tolower/toupper per POSIX). The new
patch for tolower/toupper looks like this now:
Index: libc/ctype/tolower.c
===================================================================
RCS file: /cvs/src/src/newlib/libc/ctype/tolower.c,v
retrieving revision 1.2
diff -u -p -r1.2 tolower.c
--- libc/ctype/tolower.c 28 Oct 2005 21:33:22 -0000 1.2
+++ libc/ctype/tolower.c 27 Mar 2009 09:55:42 -0000
@@ -46,10 +46,31 @@ No supporting OS subroutines are require
#include <_ansi.h>
#include <ctype.h>
+#ifdef _MB_CAPABLE
+#include <limits.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <wctype.h>
+#include <wchar.h>
+#endif
#undef tolower
int
_DEFUN(tolower,(c),int c)
{
- return isupper(c) ? (c) - 'A' + 'a' : c;
+#ifdef _MB_CAPABLE
+ if ((unsigned char) c <= 0x7f)
+ return isupper (c) ? c - 'A' + 'a' : c;
+ else if (c != EOF && MB_CUR_MAX == 1)
+ {
+ char s[MB_LEN_MAX] = { c, '\0' };
+ wchar_t wc;
+ if (mbtowc (&wc, s, 1) >= 0
+ && wctomb (s, (wchar_t) towlower ((wint_t) wc)) == 1)
+ c = s[0];
+ }
+ return c;
+#else
+ return isupper(c) ? (c) - 'A' + 'a' : c;
+#endif
}
Index: libc/ctype/toupper.c
===================================================================
RCS file: /cvs/src/src/newlib/libc/ctype/toupper.c,v
retrieving revision 1.2
diff -u -p -r1.2 toupper.c
--- libc/ctype/toupper.c 28 Oct 2005 21:33:22 -0000 1.2
+++ libc/ctype/toupper.c 27 Mar 2009 09:55:42 -0000
@@ -45,10 +45,31 @@ No supporting OS subroutines are require
#include <_ansi.h>
#include <ctype.h>
+#ifdef _MB_CAPABLE
+#include <limits.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <wctype.h>
+#include <wchar.h>
+#endif
#undef toupper
int
_DEFUN(toupper,(c),int c)
{
- return islower(c) ? c - 'a' + 'A' : c;
+#ifdef _MB_CAPABLE
+ if ((unsigned char) c <= 0x7f)
+ return islower (c) ? c - 'a' + 'A' : c;
+ else if (c != EOF && MB_CUR_MAX == 1)
+ {
+ char s[MB_LEN_MAX] = { c, '\0' };
+ wchar_t wc;
+ if (mbtowc (&wc, s, 1) >= 0
+ && wctomb (s, (wchar_t) towupper ((wint_t) wc)) == 1)
+ c = s[0];
+ }
+ return c;
+#else
+ return islower (c) ? c - 'a' + 'A' : c;
+#endif
}
Corinna
--
Corinna Vinschen
Cygwin Project Co-Leader
Red Hat