This is the mail archive of the newlib@sourceware.org mailing list for the newlib project.



Re: [PATCH/RFA] Extended wctomb/mbtowc conversion and more stuff


Corinna Vinschen wrote:
Ok,

this is the new patch about the extended wctomb_r/mbtowc_r stuff.

It got more complicated because of various requirements in Cygwin.
One of them is the requirement to be able to call mbtowc for a charset
other than the current locale charset.

I guess the best I can do is to explain what this patch is doing,
covering the details as I go along.

- Set the default charset to "ASCII", rather than "ISO-8859-1".

  This change has two reasons.  First of all, POSIX requires that
  the default setting for all applications which don't explicitly
  call setlocale is the "POSIX" or "C" locale.  In this locale,
  only ASCII characters are supported.  This is also (correctly) the case
  in the ctype functions in newlib.  Only the charset is wrongly
  set to "ISO-8859-1".  Wrong in POSIX terms, and wrong because it's
  not really supported by default.

- Add support for correct ISO-8859-x multibyte<->wide char conversion.

- If the input to setlocale is "C" or "POSIX", set the charset to
  "ASCII" now.

- Add support for all default ANSI and OEM codepages used on Windows,
  CP437, CP720, CP737, CP775, CP850, CP852, CP855, CP857, CP858, CP862,
  CP866, CP874, CP1125, CP1250, CP1251, CP1252, CP1253, CP1254, CP1255,
  CP1256, CP1257, CP1258.

  This new charset support requires a couple of new character conversion
  tables which I put into a new file called libc/stdlib/sb_charsets.c,
  and which are only built on _MB_CAPABLE systems.  The tables are now
  guarded by the defines we talked about, _MB_EXTENDED_CHARSETS_ISO and
  _MB_EXTENDED_CHARSETS_DOS.  Maybe the latter would be better renamed
  to _MB_EXTENDED_CHARSETS_WINDOWS, though.

- On Cygwin, add support for the charsets GBK, CP949 (Korean unified Hangul),
  and BIG5.  My current implementation of these charset conversions requires
  OS support, so Cygwin needs to be able to set them in setlocale(), but
  I have no implementation for newlib so far.

- On Cygwin, if no explicit charset is defined as input to setlocale,
  search for the current ANSI codepage and set it as current charset,
  if it's one of the supported charsets, otherwise default to ISO-8859-1.

  The change from the former patch is that the function
  __set_charset_from_codepage is now defined in Cygwin, not in newlib.

- Also on Cygwin, call a function __set_ctype, also defined in Cygwin only
  for now.  This allows switching the ctype tables for the various charsets.

  The idea is that this function can also be defined in newlib at one
  point.  We just have to discuss the implementation.  In Cygwin the
  ctype data is copied over into the standard ctype array.  This is the
  only way to do it which allows backward compatible behaviour with
  existing applications due to the nature of the isXXX functions being
  mostly used as macros defined in ctype.h.

- Allow "eucJP" in addition to "EUCJP", and "Big5" in addition to "BIG5",
  to support typical spellings of these charsets on other systems.

- The functions _wctomb_r and _mbtowc_r are now split into multiple
  functions for each supported charset, rather than having to call
  strcmp multiple times to determine which charset is used.

  To do that, the setlocale() function sets the function pointers
  __wctomb/__mbtowc according to the current charset.  On systems which
  are not _MB_CAPABLE, only two such functions exist, __ascii_wctomb and
  __ascii_mbtowc.

  The change from the former implementation is that the charset
  is one of the parameters to these functions.  That's necessary to
  allow Cygwin to call the __iso_mbtowc and __cp_mbtowc functions with
  an alternate charset.

- On Cygwin, don't use the newlib implementation of SJIS, JIS, and EUCJP
  mbtowc/wctomb.  The reason is that newlib's implementations don't
  convert the input multibyte chars to UTF wchars; rather, they convert
  them to a simple self-made form of wchars.  This doesn't work well
  on Cygwin, because the underlying OS always requires wchars to be UTF-16.
  Therefore Cygwin has its own implementations of __sjis_mbtowc, etc.

- Along the same lines, the function __jp2uc now does not convert the
  incoming character at all on Cygwin, because the incoming char is
  already UTF on Cygwin.

- All iswXXX and towXXX functions have been changed so that on
  _MB_CAPABLE systems all wchar_t input is either SJIS/JIS/EUCJP, which
  requires converting the character to unicode first, or the input is
  already unicode.  This is the wchar_t representation for all other
  charsets anyway, and the only wchar_t representation on Cygwin as
  outlined above.

- The defines _MB_EXTENDED_CHARSETS_ISO and _MB_EXTENDED_CHARSETS_DOS are
  defined in libc/include/sys/config.h.  I also added a define
  _MB_EXTENDED_CHARSETS_ALL which is right now only set on Cygwin.
  It enables the other two, and I expect it to also enable the still
  missing _MB_EXTENDED_CHARSETS_GBK, _MB_EXTENDED_CHARSETS_KOR,
  and _MB_EXTENDED_CHARSETS_BIG5, as soon as they are available.

- In libc/include/sys/reent.h, I marked the struct _reent members
  _current_category and _current_locale as unused.  They are, because
  they were only (incorrectly) used by the old setlocale implementation.
  I don't want to remove them, so that the size of struct _reent stays
  the same for backward compatibility with existing code.
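
The function-pointer dispatch described in the list above can be sketched
roughly as follows.  This is only an illustration under assumptions: every
demo_* name is hypothetical, the reent and mbstate_t parameters of the real
functions are left out, and the "ISO" converter is a stand-in that knows a
single ISO-8859-15 mapping instead of using the real tables.

```c
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical converter type: like mbtowc, plus an explicit charset. */
typedef int (*demo_mbtowc_fn) (wchar_t *pwc, const char *s, size_t n,
                               const char *charset);

/* ASCII converter: copy one byte unchanged. */
int demo_ascii_mbtowc (wchar_t *pwc, const char *s, size_t n,
                       const char *charset)
{
  (void) charset;               /* not needed for ASCII */
  if (s == NULL)
    return 0;                   /* not state-dependent */
  if (n == 0)
    return -2;                  /* more input needed */
  if (pwc != NULL)
    *pwc = (wchar_t) (unsigned char) *s;
  return *s != '\0';
}

/* Stand-in for a table-driven converter.  It only knows that byte 0xa4
   is the euro sign in ISO-8859-15; everything else is copied as is. */
int demo_iso_mbtowc (wchar_t *pwc, const char *s, size_t n,
                     const char *charset)
{
  unsigned char c;

  if (s == NULL)
    return 0;
  if (n == 0)
    return -2;
  c = (unsigned char) *s;
  if (pwc != NULL)
    *pwc = (c == 0xa4 && strcmp (charset, "ISO-8859-15") == 0)
           ? (wchar_t) 0x20ac : (wchar_t) c;
  return c != '\0';
}

/* Current converter and charset, both starting out as ASCII. */
static demo_mbtowc_fn demo_mbtowc = demo_ascii_mbtowc;
static char demo_charset[32] = "ASCII";

/* Pick the converter once, when the charset changes.
   Returns 0 on success, -1 if the charset is not supported. */
int demo_set_charset (const char *charset)
{
  if (strcmp (charset, "ASCII") == 0)
    demo_mbtowc = demo_ascii_mbtowc;
  else if (strncmp (charset, "ISO-8859-", 9) == 0)
    demo_mbtowc = demo_iso_mbtowc;
  else
    return -1;
  strncpy (demo_charset, charset, sizeof demo_charset - 1);
  demo_charset[sizeof demo_charset - 1] = '\0';
  return 0;
}

/* Generic entry point: a single indirect call, no strcmp per character. */
int demo_mbtowc_r (wchar_t *pwc, const char *s, size_t n)
{
  return demo_mbtowc (pwc, s, n, demo_charset);
}
```

Because the charset is an argument of the converter, a caller can also
invoke demo_iso_mbtowc directly with a charset other than the current one,
which is the Cygwin use case mentioned above.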

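For the single-byte ISO-8859-x charsets, the conversion in the patch is a
table lookup for bytes from 0xa0 upwards, with a table value of 0 marking an
invalid character.  A minimal sketch of that lookup, assuming a toy table
with a single row and only two populated entries (the real tables live in
sb_charsets.c; the names demo_iso_conv and demo_sb_mbtowc are hypothetical,
and plain errno is used instead of the reent structure):

```c
#include <errno.h>
#include <stddef.h>
#include <wchar.h>

#define DEMO_ROWS 1

/* Toy table covering bytes 0xa0..0xff of one charset.  Only two slots are
   filled (values taken from ISO-8859-15); all other slots stay 0, which
   the lookup below treats as "invalid". */
static const wchar_t demo_iso_conv[DEMO_ROWS][0x60] = {
  {
    [0xa4 - 0xa0] = 0x20ac,     /* EURO SIGN */
    [0xbc - 0xa0] = 0x0152      /* LATIN CAPITAL LIGATURE OE */
  }
};

/* Convert one byte using table row ROW.  A ROW outside the table means
   "no table for this charset", in which case the byte is copied as is. */
int demo_sb_mbtowc (wchar_t *pwc, const char *s, size_t n, int row)
{
  wchar_t dummy;
  const unsigned char *t = (const unsigned char *) s;

  if (pwc == NULL)
    pwc = &dummy;
  if (s == NULL)
    return 0;                   /* not state-dependent */
  if (n == 0)
    return -2;                  /* more input needed */

  if (*t >= 0xa0 && row >= 0 && row < DEMO_ROWS)
    {
      *pwc = demo_iso_conv[row][*t - 0xa0];
      if (*pwc == 0)            /* hole in the table: invalid character */
        {
          errno = EILSEQ;
          return -1;
        }
      return 1;
    }

  *pwc = (wchar_t) *t;          /* lower range maps to itself */
  return *t != '\0';
}
```

The CPxxx functions in the patch follow the same pattern, except that the
table starts at 0x80 instead of 0xa0.
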
Again, the patch is split in two.  The first one contains all changes
except those in ctype, the second one contains the ctype changes.

I have a rather big patch to Cygwin which requires this functionality
to go in first.  I hope the patch is basically ok to apply.

I have split up the long ChangeLog entry for better readability.
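
On the UTF-16 point made above: when wchar_t is only 16 bits wide, the
__utf8_mbtowc code in the patch returns a four-byte UTF-8 character as two
surrogate halves.  The arithmetic can be sketched on its own like this
(the helper names are hypothetical, and the decoder assumes an already
validated sequence):

```c
#include <stdint.h>

/* Combine the payload bits of a well-formed four-byte UTF-8 sequence
   into a code point.  No validation is done here. */
uint32_t demo_utf8_decode4 (const unsigned char b[4])
{
  return ((uint32_t) (b[0] & 0x07) << 18)
         | ((uint32_t) (b[1] & 0x3f) << 12)
         | ((uint32_t) (b[2] & 0x3f) << 6)
         | (uint32_t) (b[3] & 0x3f);
}

/* Split a code point in the range 0x10000..0x10ffff into a UTF-16
   surrogate pair, using the same formulas as the patch. */
void demo_to_surrogates (uint32_t cp, uint16_t *hi, uint16_t *lo)
{
  uint32_t v = cp - 0x10000;

  *hi = (uint16_t) (0xd800 | ((v >> 10) & 0x3ff));  /* first half */
  *lo = (uint16_t) (0xdc00 | (v & 0x3ff));          /* second half */
}
```

For example, the bytes f0 9f 98 80 decode to U+1F600, which splits into
the pair 0xd83d, 0xde00.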

Please put the _mbtowc_r and _wctomb_r functions at the top of the files plus the default ASCII
versions so people don't have to wade through to the bottom. I don't think the change of the default
charset name is going to affect anybody. I am ok with you checking in the patch.


-- Jeff J.

Corinna


	* libc/ctype/iswalpha.c: Handle all wchar_t as unicode on
	_MB_CAPABLE systems.
	* libc/ctype/iswblank.c: Ditto.
	* libc/ctype/iswcntrl.c: Ditto.
	* libc/ctype/iswprint.c: Ditto.
	* libc/ctype/iswpunct.c: Ditto.
	* libc/ctype/iswspace.c: Ditto.
	* libc/ctype/jp2uc.c (__jp2uc): On Cygwin, just return c.
	Explain why.
	* libc/ctype/towlower.c: Ditto.
	* libc/ctype/towupper.c: Ditto.

	* libc/include/sys/config.h: Define _MB_EXTENDED_CHARSETS_ISO
	and _MB_EXTENDED_CHARSETS_DOS if _MB_EXTENDED_CHARSETS_ALL is
	defined.  Define _MB_EXTENDED_CHARSETS_ALL on Cygwin only for now.
	* libc/include/sys/reent.h (struct _reent): Mark _current_category
	and _current_locale as unused.

	* libc/locale/locale.c: Add new charset support to documentation.
	Include ../stdio/local.h from here.
	(lc_ctype_charset): Set to "ASCII" by default.
	(lc_message_charset): Ditto.
	(_setlocale_r): Don't set _current_category and _current_locale.
	(loadlocale): Add Cygwin codepage support.  On _MB_CAPABLE
	systems, set __mbtowc and __wctomb function pointers to function
	corresponding with current charset.  Don't allow non-existent
	ISO-8859-12 charset.  Add support for Windows singlebyte codepages.
	On Cygwin, add support for GBK, CP949, and BIG5.  On Cygwin,
	call __set_ctype() in case the category is LC_CTYPE.  Don't set
	_current_category and _current_locale.

	* libc/stdlib/Makefile.am (GENERAL_SOURCES): Add sb_charsets.c.
	* libc/stdlib/Makefile.in: Regenerate.
	* libc/stdlib/local.h: Add prototype for __locale_charset.
	Add prototypes for __mbtowc and __wctomb pointers.
	Add prototypes for charset-specific _wctomb_r and _mbtowc_r
	functions.
	Declare tables and functions from sb_charsets.c.
	* libc/stdlib/mbtowc_r.c (__mbtowc): Define.  Set to __ascii_mbtowc
	by default.
	(__iso_mbtowc): New function.
	(__cp_mbtowc): New function.
	(__utf8_mbtowc): New function.
	(__sjis_mbtowc): New function.  Disable on Cygwin.
	(__eucjp_mbtowc): New function.  Disable on Cygwin.
	(__jis_mbtowc): New function.  Disable on Cygwin.
	(__ascii_mbtowc): New function.
	(_mbtowc_r): Just call __mbtowc from here.
	* libc/stdlib/sb_charsets.c: New file, adding singlebyte to UTF
	conversion tables for all ISO and CP charsets.
	(__iso_8859_index): New function.
	(__cp_index): New function.
	* libc/stdlib/wctomb_r.c (__wctomb): Define.  Set to __ascii_wctomb
	by default.
	(__utf8_wctomb): New function.
	(__sjis_wctomb): New function.  Disable on Cygwin.
	(__eucjp_wctomb): New function.  Disable on Cygwin.
	(__jis_wctomb): New function.  Disable on Cygwin.
	(__iso_wctomb): New function.
	(__cp_wctomb): New function.
	(__ascii_wctomb): New function.
	(_wctomb_r): Just call __wctomb from here.


Index: libc/include/sys/config.h
===================================================================
RCS file: /cvs/src/src/newlib/libc/include/sys/config.h,v
retrieving revision 1.50
diff -u -p -r1.50 config.h
--- libc/include/sys/config.h 20 Mar 2009 20:44:14 -0000 1.50
+++ libc/include/sys/config.h 22 Mar 2009 16:25:07 -0000
@@ -179,6 +179,7 @@
#if defined(__CYGWIN__)
#include <cygwin/config.h>
#define __LINUX_ERRNO_EXTENSIONS__ 1
+#define _MB_EXTENDED_CHARSETS_ALL 1
#endif
#if defined(__rtems__)
@@ -211,4 +212,12 @@
#endif
#endif
+/* If _MB_EXTENDED_CHARSETS_ALL is set, we want all of the extended
+ charsets. The extended charsets add a few functions and a couple
+ of tables of a few K each. */
+#ifdef _MB_EXTENDED_CHARSETS_ALL
+#define _MB_EXTENDED_CHARSETS_ISO 1
+#define _MB_EXTENDED_CHARSETS_DOS 1
+#endif
+
#endif /* __SYS_CONFIG_H__ */
Index: libc/include/sys/reent.h
===================================================================
RCS file: /cvs/src/src/newlib/libc/include/sys/reent.h,v
retrieving revision 1.45
diff -u -p -r1.45 reent.h
--- libc/include/sys/reent.h 10 Dec 2008 23:43:12 -0000 1.45
+++ libc/include/sys/reent.h 22 Mar 2009 16:25:07 -0000
@@ -371,8 +371,8 @@ struct _reent
int __sdidinit; /* 1 means stdio has been init'd */
- int _current_category; /* used by setlocale */
- _CONST char *_current_locale;
+ int _current_category; /* unused */
+ _CONST char *_current_locale; /* unused */
struct _mprec *_mp;
Index: libc/locale/locale.c
===================================================================
RCS file: /cvs/src/src/newlib/libc/locale/locale.c,v
retrieving revision 1.9
diff -u -p -r1.9 locale.c
--- libc/locale/locale.c 3 Mar 2009 09:28:45 -0000 1.9
+++ libc/locale/locale.c 22 Mar 2009 16:25:07 -0000
@@ -47,11 +47,18 @@ and <<"C">> values for <[locale]>; strin
honored unless _MB_CAPABLE is defined in which case POSIX locale strings
are allowed, plus five extensions supported for backward compatibility with
older implementations using newlib: <<"C-UTF-8">>, <<"C-JIS">>, <<"C-EUCJP">>,
-<<"C-SJIS">>, or <<"C-ISO-8859-x">> with 1 <= x <= 15. Even when using
-POSIX locale strings, the only charsets allowed are <<"UTF-8">>, <<"JIS">>,
-<<"EUCJP">>, <<"SJIS">>, or <<"ISO-8859-x">> with 1 <= x <= 15. (<<"">> is
-also accepted; if given, the settings are read from the corresponding
-LC_* environment variables and $LANG according to POSIX rules.
+<<"C-SJIS">>, <<"C-ISO-8859-x">> with 1 <= x <= 15, or <<"C-CPxxx">> with
+xxx in [437, 720, 737, 775, 850, 852, 855, 857, 858, 862, 866, 874, 1125, 1250,
+1251, 1252, 1253, 1254, 1255, 1256, 1257, 1258]. Even when using POSIX
+locale strings, the only charsets allowed are <<"UTF-8">>, <<"JIS">>,
+<<"EUCJP">>, <<"SJIS">>, <<"ISO-8859-x">> with 1 <= x <= 15, or
+<<"CPxxx">> with xxx in [437, 720, 737, 775, 850, 852, 855, 857, 858, 862, 866,
+874, 1125, 1250, 1251, 1252, 1253, 1254, 1255, 1256, 1257, 1258].
+(<<"">> is also accepted; if given, the settings are read from the
+corresponding LC_* environment variables and $LANG according to POSIX rules.
+
+Under Cygwin, this implementation additionally supports the charsets <<"GBK">>,
+<<"CP949">>, and <<"BIG5">>.
If you use <<NULL>> as the <[locale]> argument, <<setlocale>> returns
a pointer to the string representing the current locale (always
@@ -85,6 +92,9 @@ PORTABILITY
ANSI C requires <<setlocale>>, but the only locale required across all
implementations is the C locale.
+NOTES
+There is no ISO-8859-12 codepage. It's also refused by this implementation.
+
No supporting OS subroutines are required.
*/
@@ -129,6 +139,11 @@ No supporting OS subroutines are require
#include <limits.h>
#include <reent.h>
#include <stdlib.h>
+#include <wchar.h>
+#include "../stdlib/local.h"
+#ifdef __CYGWIN__
+#include <windows.h>
+#endif
#define _LC_LAST 7
#define ENCODING_LEN 31
@@ -190,8 +205,8 @@ static const char *__get_locale_env(stru
#endif
-static char lc_ctype_charset[ENCODING_LEN + 1] = "ISO-8859-1";
-static char lc_message_charset[ENCODING_LEN + 1] = "ISO-8859-1";
+static char lc_ctype_charset[ENCODING_LEN + 1] = "ASCII";
+static char lc_message_charset[ENCODING_LEN + 1] = "ASCII";
char *
_DEFUN(_setlocale_r, (p, category, locale),
@@ -205,8 +220,6 @@ _DEFUN(_setlocale_r, (p, category, local
if (strcmp (locale, "POSIX") && strcmp (locale, "C")
&& strcmp (locale, ""))
return NULL;
- p->_current_category = category;
- p->_current_locale = locale;
}
return "C";
#else
@@ -361,6 +374,11 @@ currentlocale()
#endif
#ifdef _MB_CAPABLE
+#ifdef __CYGWIN__
+extern void *__set_charset_from_codepage (unsigned int, char *charset);
+extern void __set_ctype (const char *charset);
+#endif /* __CYGWIN__ */
+
static char *
loadlocale(struct _reent *p, int category)
{
@@ -382,7 +400,7 @@ loadlocale(struct _reent *p, int categor
if (!strcmp (locale, "POSIX"))
strcpy (locale, "C");
if (!strcmp (locale, "C")) /* Default "C" locale */
- strcpy (charset, "ISO-8859-1");
+ strcpy (charset, "ASCII");
else if (locale[0] == 'C' && locale[1] == '-') /* Old newlib style */
strcpy (charset, locale + 2);
else /* POSIX style */
@@ -414,7 +432,11 @@ loadlocale(struct _reent *p, int categor
}
else if (c[0] == '\0' || c[0] == '@')
/* End of string or just a modifier */
+#ifdef __CYGWIN__
+ __set_charset_from_codepage (GetACP (), charset);
+#else
strcpy (charset, "ISO-8859-1");
+#endif
else
/* Invalid string */
return NULL;
@@ -426,42 +448,155 @@ loadlocale(struct _reent *p, int categor
if (strcmp (charset, "UTF-8"))
return NULL;
mbc_max = 6;
+#ifdef _MB_CAPABLE
+ __wctomb = __utf8_wctomb;
+ __mbtowc = __utf8_mbtowc;
+#endif
break;
case 'J':
if (strcmp (charset, "JIS"))
return NULL;
mbc_max = 8;
+#ifdef _MB_CAPABLE
+ __wctomb = __jis_wctomb;
+ __mbtowc = __jis_mbtowc;
+#endif
break;
case 'E':
- if (strcmp (charset, "EUCJP"))
+ if (strcmp (charset, "EUCJP") && strcmp (charset, "eucJP"))
return NULL;
+ strcpy (charset, "EUCJP");
mbc_max = 2;
+#ifdef _MB_CAPABLE
+ __wctomb = __eucjp_wctomb;
+ __mbtowc = __eucjp_mbtowc;
+#endif
break;
case 'S':
if (strcmp (charset, "SJIS"))
return NULL;
mbc_max = 2;
+#ifdef _MB_CAPABLE
+ __wctomb = __sjis_wctomb;
+ __mbtowc = __sjis_mbtowc;
+#endif
break;
case 'I':
- default:
- /* Must be exactly one of ISO-8859-1, [...] ISO-8859-15. */
+ /* Must be exactly one of ISO-8859-1, [...] ISO-8859-16, except for
+ ISO-8859-12. */
if (strncmp (charset, "ISO-8859-", 9))
return NULL;
- val = strtol (charset + 9, &end, 10);
- if (val < 1 || val > 15 || *end)
+ val = _strtol_r (p, charset + 9, &end, 10);
+ if (val < 1 || val > 16 || val == 12 || *end)
+ return NULL;
+ mbc_max = 1;
+#ifdef _MB_CAPABLE
+#ifdef _MB_EXTENDED_CHARSETS_ISO
+ __wctomb = __iso_wctomb;
+ __mbtowc = __iso_mbtowc;
+#else /* !_MB_EXTENDED_CHARSETS_ISO */
+ __wctomb = __ascii_wctomb;
+ __mbtowc = __ascii_mbtowc;
+#endif /* _MB_EXTENDED_CHARSETS_ISO */
+#endif
+ break;
+ case 'C':
+ if (charset[1] != 'P')
+ return NULL;
+ val = _strtol_r (p, charset + 2, &end, 10);
+ if (*end)
+ return NULL;
+ switch (val)
+ {
+ case 437:
+ case 720:
+ case 737:
+ case 775:
+ case 850:
+ case 852:
+ case 855:
+ case 857:
+ case 858:
+ case 862:
+ case 866:
+ case 874:
+ case 1125:
+ case 1250:
+ case 1251:
+ case 1252:
+ case 1253:
+ case 1254:
+ case 1255:
+ case 1256:
+ case 1257:
+ case 1258:
+ mbc_max = 1;
+#ifdef _MB_CAPABLE
+#ifdef _MB_EXTENDED_CHARSETS_DOS
+ __wctomb = __cp_wctomb;
+ __mbtowc = __cp_mbtowc;
+#else /* !_MB_EXTENDED_CHARSETS_DOS */
+ __wctomb = __ascii_wctomb;
+ __mbtowc = __ascii_mbtowc;
+#endif /* _MB_EXTENDED_CHARSETS_DOS */
+#endif
+ break;
+#ifdef __CYGWIN__
+ case 949:
+ mbc_max = 2;
+#ifdef _MB_CAPABLE
+ __wctomb = __kr_wctomb;
+ __mbtowc = __kr_mbtowc;
+#endif
+ break;
+#endif
+ default:
+ return NULL;
+ }
+ break;
+ case 'A':
+ if (strcmp (charset, "ASCII"))
return NULL;
mbc_max = 1;
+#ifdef _MB_CAPABLE
+ __wctomb = __ascii_wctomb;
+ __mbtowc = __ascii_mbtowc;
+#endif
+ break;
+#ifdef __CYGWIN__
+ case 'G':
+ if (strcmp (charset, "GBK"))
+ return NULL;
+ mbc_max = 2;
+#ifdef _MB_CAPABLE
+ __wctomb = __gbk_wctomb;
+ __mbtowc = __gbk_mbtowc;
+#endif
break;
+ case 'B':
+ if (strcmp (charset, "BIG5") && strcmp (charset, "Big5"))
+ return NULL;
+ strcpy (charset, "BIG5");
+ mbc_max = 2;
+#ifdef _MB_CAPABLE
+ __wctomb = __big5_wctomb;
+ __mbtowc = __big5_mbtowc;
+#endif
+ break;
+#endif /* __CYGWIN__ */
+ default:
+ return NULL;
}
if (category == LC_CTYPE)
{
strcpy (lc_ctype_charset, charset);
__mb_cur_max = mbc_max;
+#ifdef __CYGWIN__
+ __set_ctype (charset);
+#endif
}
else if (category == LC_MESSAGES)
strcpy (lc_message_charset, charset);
- p->_current_category = category;
- p->_current_locale = locale;
return strcpy(current_categories[category], new_categories[category]);
}
Index: libc/stdlib/Makefile.am
===================================================================
RCS file: /cvs/src/src/newlib/libc/stdlib/Makefile.am,v
retrieving revision 1.28
diff -u -p -r1.28 Makefile.am
--- libc/stdlib/Makefile.am 25 Feb 2009 21:33:17 -0000 1.28
+++ libc/stdlib/Makefile.am 22 Mar 2009 16:25:07 -0000
@@ -48,6 +48,7 @@ GENERAL_SOURCES = \
rand_r.c \
realloc.c \
reallocf.c \
+ sb_charsets.c \
strtod.c \
strtol.c \
strtoul.c \
Index: libc/stdlib/local.h
===================================================================
RCS file: /cvs/src/src/newlib/libc/stdlib/local.h,v
retrieving revision 1.1.1.1
diff -u -p -r1.1.1.1 local.h
--- libc/stdlib/local.h 17 Feb 2000 19:39:47 -0000 1.1.1.1
+++ libc/stdlib/local.h 22 Mar 2009 16:25:07 -0000
@@ -5,4 +5,61 @@
char * _EXFUN(_gcvt,(struct _reent *, double , int , char *, char, int));
+char *__locale_charset ();
+
+#ifndef __mbstate_t_defined
+#include <wchar.h>
+#endif
+
+int (*__wctomb) (struct _reent *, char *, wchar_t, const char *, mbstate_t *);
+int __ascii_wctomb (struct _reent *, char *, wchar_t, const char *,
+ mbstate_t *);
+#ifdef _MB_CAPABLE
+int __utf8_wctomb (struct _reent *, char *, wchar_t, const char *, mbstate_t *);
+int __sjis_wctomb (struct _reent *, char *, wchar_t, const char *, mbstate_t *);
+int __eucjp_wctomb (struct _reent *, char *, wchar_t, const char *,
+ mbstate_t *);
+int __jis_wctomb (struct _reent *, char *, wchar_t, const char *, mbstate_t *);
+int __iso_wctomb (struct _reent *, char *, wchar_t, const char *, mbstate_t *);
+int __cp_wctomb (struct _reent *, char *, wchar_t, const char *, mbstate_t *);
+#ifdef __CYGWIN__
+int __gbk_wctomb (struct _reent *, char *, wchar_t, const char *, mbstate_t *);
+int __kr_wctomb (struct _reent *, char *, wchar_t, const char *, mbstate_t *);
+int __big5_wctomb (struct _reent *, char *, wchar_t, const char *, mbstate_t *);
+#endif
+#endif
+
+int (*__mbtowc) (struct _reent *, wchar_t *, const char *, size_t,
+ const char *, mbstate_t *);
+int __ascii_mbtowc (struct _reent *, wchar_t *, const char *, size_t,
+ const char *, mbstate_t *);
+#ifdef _MB_CAPABLE
+int __utf8_mbtowc (struct _reent *, wchar_t *, const char *, size_t,
+ const char *, mbstate_t *);
+int __sjis_mbtowc (struct _reent *, wchar_t *, const char *, size_t,
+ const char *, mbstate_t *);
+int __eucjp_mbtowc (struct _reent *, wchar_t *, const char *, size_t,
+ const char *, mbstate_t *);
+int __jis_mbtowc (struct _reent *, wchar_t *, const char *, size_t,
+ const char *, mbstate_t *);
+int __iso_mbtowc (struct _reent *, wchar_t *, const char *, size_t,
+ const char *, mbstate_t *);
+int __cp_mbtowc (struct _reent *, wchar_t *, const char *, size_t,
+ const char *, mbstate_t *);
+#ifdef __CYGWIN__
+int __gbk_mbtowc (struct _reent *, wchar_t *, const char *, size_t,
+ const char *, mbstate_t *);
+int __kr_mbtowc (struct _reent *, wchar_t *, const char *, size_t,
+ const char *, mbstate_t *);
+int __big5_mbtowc (struct _reent *, wchar_t *, const char *, size_t,
+ const char *, mbstate_t *);
+#endif
+#endif
+
+wchar_t __iso_8859_conv[14][0x60];
+int __iso_8859_index (const char *);
+
+wchar_t __cp_conv[12][0x80];
+int __cp_index (const char *);
+
#endif
Index: libc/stdlib/mbtowc_r.c
===================================================================
RCS file: /cvs/src/src/newlib/libc/stdlib/mbtowc_r.c,v
retrieving revision 1.11
diff -u -p -r1.11 mbtowc_r.c
--- libc/stdlib/mbtowc_r.c 19 Mar 2009 19:47:52 -0000 1.11
+++ libc/stdlib/mbtowc_r.c 22 Mar 2009 16:25:07 -0000
@@ -5,10 +5,13 @@
#include <wchar.h>
#include <string.h>
#include <errno.h>
+#include "local.h"
-#ifdef _MB_CAPABLE
-extern char *__locale_charset ();
+int (*__mbtowc) (struct _reent *, wchar_t *, const char *, size_t,
+ const char *, mbstate_t *)
+ = __ascii_mbtowc;
+#ifdef _MB_CAPABLE
typedef enum { ESCAPE, DOLLAR, BRACKET, AT, B, J, NUL, JIS_CHAR, OTHER, JIS_C_NUM } JIS_CHAR_TYPE;
typedef enum { ASCII, JIS, A_ESC, A_ESC_DL, JIS_1, J_ESC, J_ESC_BR,
@@ -43,17 +46,18 @@ static JIS_ACTION JIS_action_table[JIS_S
/* J_ESC */ { ERROR, ERROR, NOOP, ERROR, ERROR, ERROR, ERROR, ERROR, ERROR },
/* J_ESC_BR */{ ERROR, ERROR, ERROR, ERROR, MAKE_A, MAKE_A, ERROR, ERROR, ERROR },
};
-#endif /* _MB_CAPABLE */
/* we override the mbstate_t __count field for more complex encodings and use it store a state value */
#define __state __count
+#ifdef _MB_EXTENDED_CHARSETS_ISO
int
-_DEFUN (_mbtowc_r, (r, pwc, s, n, state),
- struct _reent *r _AND
- wchar_t *pwc _AND
- const char *s _AND
- size_t n _AND
+_DEFUN (__iso_mbtowc, (r, pwc, s, n, charset, state),
+ struct _reent *r _AND
+ wchar_t *pwc _AND
+ const char *s _AND
+ size_t n _AND
+ const char *charset _AND
mbstate_t *state)
{
wchar_t dummy;
@@ -62,190 +66,384 @@ _DEFUN (_mbtowc_r, (r, pwc, s, n, state)
if (pwc == NULL)
pwc = &dummy;
- if (s != NULL && n == 0)
+ if (s == NULL)
+ return 0;
+
+ if (n == 0)
return -2;
-#ifdef _MB_CAPABLE
- if (strlen (__locale_charset ()) <= 1)
- { /* fall-through */ }
- else if (!strcmp (__locale_charset (), "UTF-8"))
- {
- int ch;
- int i = 0;
-
- if (s == NULL)
- return 0; /* UTF-8 character encodings are not state-dependent */
-
- if (state->__count == 4)
- {
- /* Create the second half of the surrogate pair. For a description
- see the comment below. */
- wint_t tmp = (wchar_t)((state->__value.__wchb[0] & 0x07) << 18)
- | (wchar_t)((state->__value.__wchb[1] & 0x3f) << 12)
- | (wchar_t)((state->__value.__wchb[2] & 0x3f) << 6)
- | (wchar_t)(state->__value.__wchb[3] & 0x3f);
- state->__count = 0;
- *pwc = 0xdc00 | ((tmp - 0x10000) & 0x3ff);
- return 2;
- }
- if (state->__count == 0)
- ch = t[i++];
- else
+ if (*t >= 0xa0)
+ {
+ int iso_idx = __iso_8859_index (charset + 9);
+ if (iso_idx >= 0)
{
- if (n < (size_t)-1)
- ++n;
- ch = state->__value.__wchb[0];
+ *pwc = __iso_8859_conv[iso_idx][*t - 0xa0];
+ if (*pwc == 0) /* Invalid character */
+ {
+ r->_errno = EILSEQ;
+ return -1;
+ }
+ return 1;
}
+ }
+
+ *pwc = (wchar_t) *t;
+
+ if (*t == '\0')
+ return 0;
+
+ return 1;
+}
+#endif /* _MB_EXTENDED_CHARSETS_ISO */
+
+#ifdef _MB_EXTENDED_CHARSETS_DOS
+int
+_DEFUN (__cp_mbtowc, (r, pwc, s, n, charset, state),
+ struct _reent *r _AND
+ wchar_t *pwc _AND
+ const char *s _AND
+ size_t n _AND
+ const char *charset _AND
+ mbstate_t *state)
+{
+ wchar_t dummy;
+ unsigned char *t = (unsigned char *)s;
+
+ if (pwc == NULL)
+ pwc = &dummy;
+
+ if (s == NULL)
+ return 0;
+
+ if (n == 0)
+ return -2;
- if (ch == '\0')
+ if (*t >= 0x80)
+ {
+ int cp_idx = __cp_index (charset + 2);
+ if (cp_idx >= 0)
{
- *pwc = 0;
- state->__count = 0;
- return 0; /* s points to the null character */
+ *pwc = __cp_conv[cp_idx][*t - 0x80];
+ if (*pwc == 0) /* Invalid character */
+ {
+ r->_errno = EILSEQ;
+ return -1;
+ }
+ return 1;
}
+ }
+
+ *pwc = (wchar_t)*t;
+
+ if (*t == '\0')
+ return 0;
+
+ return 1;
+}
+#endif /* _MB_EXTENDED_CHARSETS_DOS */
+
+int
+_DEFUN (__utf8_mbtowc, (r, pwc, s, n, charset, state),
+ struct _reent *r _AND
+ wchar_t *pwc _AND
+ const char *s _AND
+ size_t n _AND
+ const char *charset _AND
+ mbstate_t *state)
+{
+ wchar_t dummy;
+ unsigned char *t = (unsigned char *)s;
+ int ch;
+ int i = 0;
+
+ if (pwc == NULL)
+ pwc = &dummy;
+
+ if (s == NULL)
+ return 0;
+
+ if (n == 0)
+ return -2;
+
+ if (state->__count == 4)
+ {
+ /* Create the second half of the surrogate pair. For a description
+ see the comment below. */
+ wint_t tmp = (wchar_t)((state->__value.__wchb[0] & 0x07) << 18)
+ | (wchar_t)((state->__value.__wchb[1] & 0x3f) << 12)
+ | (wchar_t)((state->__value.__wchb[2] & 0x3f) << 6)
+ | (wchar_t)(state->__value.__wchb[3] & 0x3f);
+ state->__count = 0;
+ *pwc = 0xdc00 | ((tmp - 0x10000) & 0x3ff);
+ return 2;
+ }
+ if (state->__count == 0)
+ ch = t[i++];
+ else
+ {
+ if (n < (size_t)-1)
+ ++n;
+ ch = state->__value.__wchb[0];
+ }
+
+ if (ch == '\0')
+ {
+ *pwc = 0;
+ state->__count = 0;
+ return 0; /* s points to the null character */
+ }
- if (ch >= 0x0 && ch <= 0x7f)
+ if (ch >= 0x0 && ch <= 0x7f)
+ {
+ /* single-byte sequence */
+ state->__count = 0;
+ *pwc = ch;
+ return 1;
+ }
+ if (ch >= 0xc0 && ch <= 0xdf)
+ {
+ /* two-byte sequence */
+ state->__value.__wchb[0] = ch;
+ state->__count = 1;
+ if (n < 2)
+ return -2;
+ ch = t[i++];
+ if (ch < 0x80 || ch > 0xbf)
{
- /* single-byte sequence */
- state->__count = 0;
- *pwc = ch;
- return 1;
+ r->_errno = EILSEQ;
+ return -1;
+ }
+ if (state->__value.__wchb[0] < 0xc2)
+ {
+ /* overlong UTF-8 sequence */
+ r->_errno = EILSEQ;
+ return -1;
+ }
+ state->__count = 0;
+ *pwc = (wchar_t)((state->__value.__wchb[0] & 0x1f) << 6)
+ | (wchar_t)(ch & 0x3f);
+ return i;
+ }
+ if (ch >= 0xe0 && ch <= 0xef)
+ {
+ /* three-byte sequence */
+ wchar_t tmp;
+ state->__value.__wchb[0] = ch;
+ if (state->__count == 0)
+ state->__count = 1;
+ else if (n < (size_t)-1)
+ ++n;
+ if (n < 2)
+ return -2;
+ ch = (state->__count == 1) ? t[i++] : state->__value.__wchb[1];
+ if (state->__value.__wchb[0] == 0xe0 && ch < 0xa0)
+ {
+ /* overlong UTF-8 sequence */
+ r->_errno = EILSEQ;
+ return -1;
+ }
+ if (ch < 0x80 || ch > 0xbf)
+ {
+ r->_errno = EILSEQ;
+ return -1;
+ }
+ state->__value.__wchb[1] = ch;
+ state->__count = 2;
+ if (n < 3)
+ return -2;
+ ch = t[i++];
+ if (ch < 0x80 || ch > 0xbf)
+ {
+ r->_errno = EILSEQ;
+ return -1;
+ }
+ state->__count = 0;
+ tmp = (wchar_t)((state->__value.__wchb[0] & 0x0f) << 12)
+ | (wchar_t)((state->__value.__wchb[1] & 0x3f) << 6)
+ | (wchar_t)(ch & 0x3f);
+
+ if (tmp >= 0xd800 && tmp <= 0xdfff)
+ {
+ r->_errno = EILSEQ;
+ return -1;
+ }
+ *pwc = tmp;
+ return i;
+ }
+ if (ch >= 0xf0 && ch <= 0xf7)
+ {
+ /* four-byte sequence */
+ wint_t tmp;
+ state->__value.__wchb[0] = ch;
+ if (state->__count == 0)
+ state->__count = 1;
+ else if (n < (size_t)-1)
+ ++n;
+ if (n < 2)
+ return -2;
+ ch = (state->__count == 1) ? t[i++] : state->__value.__wchb[1];
+ if (state->__value.__wchb[0] == 0xf0 && ch < 0x90)
+ {
+ /* overlong UTF-8 sequence */
+ r->_errno = EILSEQ;
+ return -1;
+ }
+ if (ch < 0x80 || ch > 0xbf)
+ {
+ r->_errno = EILSEQ;
+ return -1;
}
- else if (ch >= 0xc0 && ch <= 0xdf)
+ state->__value.__wchb[1] = ch;
+ if (state->__count == 1)
+ state->__count = 2;
+ else if (n < (size_t)-1)
+ ++n;
+ if (n < 3)
+ return -2;
+ ch = (state->__count == 2) ? t[i++] : state->__value.__wchb[2];
+ if (ch < 0x80 || ch > 0xbf)
+ {
+ r->_errno = EILSEQ;
+ return -1;
+ }
+ state->__value.__wchb[2] = ch;
+ state->__count = 3;
+ if (n < 4)
+ return -2;
+ ch = t[i++];
+ if (ch < 0x80 || ch > 0xbf)
+ {
+ r->_errno = EILSEQ;
+ return -1;
+ }
+ tmp = (wint_t)((state->__value.__wchb[0] & 0x07) << 18)
+ | (wint_t)((state->__value.__wchb[1] & 0x3f) << 12)
+ | (wint_t)((state->__value.__wchb[2] & 0x3f) << 6)
+ | (wint_t)(ch & 0x3f);
+ if (tmp > 0xffff && sizeof(wchar_t) == 2)
+ {
+ /* On systems which have wchar_t being UTF-16 values, the value
+ doesn't fit into a single wchar_t in this case. So what we
+ do here is to store the state with a special value of __count
+ and return the first half of a surrogate pair. As return
+ value we choose to return the half of the actual UTF-8 char.
+ The second half is returned in case we recognize the special
+ __count value above. */
+ state->__value.__wchb[3] = ch;
+ state->__count = 4;
+ *pwc = 0xd800 | (((tmp - 0x10000) >> 10) & 0x3ff);
+ return 2;
+ }
+ *pwc = tmp;
+ state->__count = 0;
+ return i;
+ }
+
+ r->_errno = EILSEQ;
+ return -1;
+}
+
+/* Cygwin defines its own doublebyte charset conversion functions
+ because the underlying OS requires wchar_t == UTF-16. */
+#ifndef __CYGWIN__
+int
+_DEFUN (__sjis_mbtowc, (r, pwc, s, n, charset, state),
+ struct _reent *r _AND
+ wchar_t *pwc _AND
+ const char *s _AND
+ size_t n _AND
+ const char *charset _AND
+ mbstate_t *state)
+{
+ wchar_t dummy;
+ unsigned char *t = (unsigned char *)s;
+ int ch;
+ int i = 0;
+
+ if (pwc == NULL)
+ pwc = &dummy;
+
+ if (s == NULL)
+ return 0; /* not state-dependent */
+
+ if (n == 0)
+ return -2;
+
+ ch = t[i++];
+ if (state->__count == 0)
+ {
+ if (_issjis1 (ch))
{
- /* two-byte sequence */
state->__value.__wchb[0] = ch;
state->__count = 1;
- if (n < 2)
+ if (n <= 1)
return -2;
ch = t[i++];
- if (ch < 0x80 || ch > 0xbf)
- {
- r->_errno = EILSEQ;
- return -1;
- }
- if (state->__value.__wchb[0] < 0xc2)
- {
- /* overlong UTF-8 sequence */
- r->_errno = EILSEQ;
- return -1;
- }
- state->__count = 0;
- *pwc = (wchar_t)((state->__value.__wchb[0] & 0x1f) << 6)
- | (wchar_t)(ch & 0x3f);
- return i;
}
- else if (ch >= 0xe0 && ch <= 0xef)
+ }
+ if (state->__count == 1)
+ {
+ if (_issjis2 (ch))
{
- /* three-byte sequence */
- wchar_t tmp;
- state->__value.__wchb[0] = ch;
- if (state->__count == 0)
- state->__count = 1;
- else if (n < (size_t)-1)
- ++n;
- if (n < 2)
- return -2;
- ch = (state->__count == 1) ? t[i++] : state->__value.__wchb[1];
- if (state->__value.__wchb[0] == 0xe0 && ch < 0xa0)
- {
- /* overlong UTF-8 sequence */
- r->_errno = EILSEQ;
- return -1;
- }
- if (ch < 0x80 || ch > 0xbf)
- {
- r->_errno = EILSEQ;
- return -1;
- }
- state->__value.__wchb[1] = ch;
- state->__count = 2;
- if (n < 3)
- return -2;
- ch = t[i++];
- if (ch < 0x80 || ch > 0xbf)
- {
- r->_errno = EILSEQ;
- return -1;
- }
+ *pwc = (((wchar_t)state->__value.__wchb[0]) << 8) + (wchar_t)ch;
state->__count = 0;
- tmp = (wchar_t)((state->__value.__wchb[0] & 0x0f) << 12)
- | (wchar_t)((state->__value.__wchb[1] & 0x3f) << 6)
- | (wchar_t)(ch & 0x3f);
-
- if (tmp >= 0xd800 && tmp <= 0xdfff)
- {
- r->_errno = EILSEQ;
- return -1;
- }
- *pwc = tmp;
return i;
}
- else if (ch >= 0xf0 && ch <= 0xf7)
+ else
+ {
+ r->_errno = EILSEQ;
+ return -1;
+ }
+ }
+
+ *pwc = (wchar_t)*t;
+
+ if (*t == '\0')
+ return 0;
+
+ return 1;
+}
+
+int
+_DEFUN (__eucjp_mbtowc, (r, pwc, s, n, charset, state),
+ struct _reent *r _AND
+ wchar_t *pwc _AND
+ const char *s _AND
+ size_t n _AND
+ const char *charset _AND
+ mbstate_t *state)
+{
+ wchar_t dummy;
+ unsigned char *t = (unsigned char *)s;
+ int ch;
+ int i = 0;
+
+ if (pwc == NULL)
+ pwc = &dummy;
+
+ if (s == NULL)
+ return 0;
+
+ if (n == 0)
+ return -2;
+
+ ch = t[i++];
+ if (state->__count == 0)
+ {
+ if (_iseucjp (ch))
{
- /* four-byte sequence */
- wint_t tmp;
state->__value.__wchb[0] = ch;
- if (state->__count == 0)
- state->__count = 1;
- else if (n < (size_t)-1)
- ++n;
- if (n < 2)
- return -2;
- ch = (state->__count == 1) ? t[i++] : state->__value.__wchb[1];
- if (state->__value.__wchb[0] == 0xf0 && ch < 0x90)
- {
- /* overlong UTF-8 sequence */
- r->_errno = EILSEQ;
- return -1;
- }
- if (ch < 0x80 || ch > 0xbf)
- {
- r->_errno = EILSEQ;
- return -1;
- }
- state->__value.__wchb[1] = ch;
- if (state->__count == 1)
- state->__count = 2;
- else if (n < (size_t)-1)
- ++n;
- if (n < 3)
- return -2;
- ch = (state->__count == 2) ? t[i++] : state->__value.__wchb[2];
- if (ch < 0x80 || ch > 0xbf)
- {
- r->_errno = EILSEQ;
- return -1;
- }
- state->__value.__wchb[2] = ch;
- state->__count = 3;
- if (n < 4)
+ state->__count = 1;
+ if (n <= 1)
return -2;
ch = t[i++];
- if (ch < 0x80 || ch > 0xbf)
- {
- r->_errno = EILSEQ;
- return -1;
- }
- tmp = (wint_t)((state->__value.__wchb[0] & 0x07) << 18)
- | (wint_t)((state->__value.__wchb[1] & 0x3f) << 12)
- | (wint_t)((state->__value.__wchb[2] & 0x3f) << 6)
- | (wint_t)(ch & 0x3f);
- if (tmp > 0xffff && sizeof(wchar_t) == 2)
- {
- /* On systems which have wchar_t being UTF-16 values, the value
- doesn't fit into a single wchar_t in this case. So what we
- do here is to store the state with a special value of __count
- and return the first half of a surrogate pair. As return
- value we choose to return the half of the actual UTF-8 char.
- The second half is returned in case we recognize the special
- __count value above. */
- state->__value.__wchb[3] = ch;
- state->__count = 4;
- *pwc = 0xd800 | (((tmp - 0x10000) >> 10) & 0x3ff);
- return 2;
- }
- *pwc = tmp;
+ }
+ }
+ if (state->__count == 1)
+ {
+ if (_iseucjp (ch))
+ {
+ *pwc = (((wchar_t)state->__value.__wchb[0]) << 8) + (wchar_t)ch;
state->__count = 0;
return i;
}
@@ -254,165 +452,141 @@ _DEFUN (_mbtowc_r, (r, pwc, s, n, state)
r->_errno = EILSEQ;
return -1;
}
- }
- else if (!strcmp (__locale_charset (), "SJIS"))
+ }
+
+ *pwc = (wchar_t)*t;
+
+ if (*t == '\0')
+ return 0;
+
+ return 1;
+}
+
+int
+_DEFUN (__jis_mbtowc, (r, pwc, s, n, charset, state),
+ struct _reent *r _AND
+ wchar_t *pwc _AND
+ const char *s _AND
+ size_t n _AND
+ const char *charset _AND
+ mbstate_t *state)
+{
+ wchar_t dummy;
+ unsigned char *t = (unsigned char *)s;
+ JIS_STATE curr_state;
+ JIS_ACTION action;
+ JIS_CHAR_TYPE ch;
+ unsigned char *ptr;
+ unsigned int i;
+ int curr_ch;
+
+ if (pwc == NULL)
+ pwc = &dummy;
+
+ if (s == NULL)
{
- int ch;
- int i = 0;
- if (s == NULL)
- return 0; /* not state-dependent */
- ch = t[i++];
- if (state->__count == 0)
- {
- if (_issjis1 (ch))
- {
- state->__value.__wchb[0] = ch;
- state->__count = 1;
- if (n <= 1)
- return -2;
- ch = t[i++];
- }
- }
- if (state->__count == 1)
- {
- if (_issjis2 (ch))
- {
- *pwc = (((wchar_t)state->__value.__wchb[0]) << 8) + (wchar_t)ch;
- state->__count = 0;
- return i;
- }
- else
- {
- r->_errno = EILSEQ;
- return -1;
- }
- }
+ state->__state = ASCII;
+ return 1; /* state-dependent */
}
- else if (!strcmp (__locale_charset (), "EUCJP"))
+
+ if (n == 0)
+ return -2;
+
+ curr_state = state->__state;
+ ptr = t;
+
+ for (i = 0; i < n; ++i)
{
- int ch;
- int i = 0;
- if (s == NULL)
- return 0; /* not state-dependent */
- ch = t[i++];
- if (state->__count == 0)
+ curr_ch = t[i];
+ switch (curr_ch)
{
- if (_iseucjp (ch))
- {
- state->__value.__wchb[0] = ch;
- state->__count = 1;
- if (n <= 1)
- return -2;
- ch = t[i++];
- }
- }
- if (state->__count == 1)
- {
- if (_iseucjp (ch))
- {
- *pwc = (((wchar_t)state->__value.__wchb[0]) << 8) + (wchar_t)ch;
- state->__count = 0;
- return i;
- }
+ case ESC_CHAR:
+ ch = ESCAPE;
+ break;
+ case '$':
+ ch = DOLLAR;
+ break;
+ case '@':
+ ch = AT;
+ break;
+ case '(':
+ ch = BRACKET;
+ break;
+ case 'B':
+ ch = B;
+ break;
+ case 'J':
+ ch = J;
+ break;
+ case '\0':
+ ch = NUL;
+ break;
+ default:
+ if (_isjis (curr_ch))
+ ch = JIS_CHAR;
else
- {
- r->_errno = EILSEQ;
- return -1;
- }
+ ch = OTHER;
+ }
+
+ action = JIS_action_table[curr_state][ch];
+ curr_state = JIS_state_table[curr_state][ch];
+
+ switch (action)
+ {
+ case NOOP:
+ break;
+ case EMPTY:
+ state->__state = ASCII;
+ *pwc = (wchar_t)0;
+ return 0;
+ case COPY_A:
+ state->__state = ASCII;
+ *pwc = (wchar_t)*ptr;
+ return (i + 1);
+ case COPY_J1:
+ state->__value.__wchb[0] = t[i];
+ break;
+ case COPY_J2:
+ state->__state = JIS;
+ *pwc = (((wchar_t)state->__value.__wchb[0]) << 8) + (wchar_t)(t[i]);
+ return (i + 1);
+ case MAKE_A:
+ ptr = (unsigned char *)(t + i + 1);
+ break;
+ case ERROR:
+ default:
+ r->_errno = EILSEQ;
+ return -1;
}
+
}
- else if (!strcmp (__locale_charset (), "JIS"))
- {
- JIS_STATE curr_state;
- JIS_ACTION action;
- JIS_CHAR_TYPE ch;
- unsigned char *ptr;
- unsigned int i;
- int curr_ch;
-
- if (s == NULL)
- {
- state->__state = ASCII;
- return 1; /* state-dependent */
- }
-
- curr_state = state->__state;
- ptr = t;
-
- for (i = 0; i < n; ++i)
- {
- curr_ch = t[i];
- switch (curr_ch)
- {
- case ESC_CHAR:
- ch = ESCAPE;
- break;
- case '$':
- ch = DOLLAR;
- break;
- case '@':
- ch = AT;
- break;
- case '(':
- ch = BRACKET;
- break;
- case 'B':
- ch = B;
- break;
- case 'J':
- ch = J;
- break;
- case '\0':
- ch = NUL;
- break;
- default:
- if (_isjis (curr_ch))
- ch = JIS_CHAR;
- else
- ch = OTHER;
- }
- action = JIS_action_table[curr_state][ch];
- curr_state = JIS_state_table[curr_state][ch];
-
- switch (action)
- {
- case NOOP:
- break;
- case EMPTY:
- state->__state = ASCII;
- *pwc = (wchar_t)0;
- return 0;
- case COPY_A:
- state->__state = ASCII;
- *pwc = (wchar_t)*ptr;
- return (i + 1);
- case COPY_J1:
- state->__value.__wchb[0] = t[i];
- break;
- case COPY_J2:
- state->__state = JIS;
- *pwc = (((wchar_t)state->__value.__wchb[0]) << 8) + (wchar_t)(t[i]);
- return (i + 1);
- case MAKE_A:
- ptr = (unsigned char *)(t + i + 1);
- break;
- case ERROR:
- default:
- r->_errno = EILSEQ;
- return -1;
- }
+ state->__state = curr_state;
+ return -2; /* n < bytes needed */
+}
+#endif /* !__CYGWIN__*/
+#endif /* _MB_CAPABLE */
- }
+int
+_DEFUN (__ascii_mbtowc, (r, pwc, s, n, charset, state),
+ struct _reent *r _AND
+ wchar_t *pwc _AND
+ const char *s _AND
+ size_t n _AND
+ const char *charset _AND
+ mbstate_t *state)
+{
+ wchar_t dummy;
+ unsigned char *t = (unsigned char *)s;
- state->__state = curr_state;
- return -2; /* n < bytes needed */
- }
-#endif /* _MB_CAPABLE */
+ if (pwc == NULL)
+ pwc = &dummy;
- /* otherwise this must be the "C" locale or unknown locale */
if (s == NULL)
- return 0; /* not state-dependent */
+ return 0;
+
+ if (n == 0)
+ return -2;
*pwc = (wchar_t)*t;
@@ -421,3 +595,14 @@ _DEFUN (_mbtowc_r, (r, pwc, s, n, state)
return 1;
}
+
+int
+_DEFUN (_mbtowc_r, (r, pwc, s, n, state),
+ struct _reent *r _AND
+ wchar_t *pwc _AND
+ const char *s _AND
+ size_t n _AND
+ mbstate_t *state)
+{
+ return __mbtowc (r, pwc, s, n, __locale_charset (), state);
+}
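
The SJIS and EUC-JP converters in the mbtowc_r.c hunks above share one
restartable two-byte pattern: save the lead byte in the conversion state,
return -2 if the trail byte has not arrived yet, and finish the character
on the next call.  The following is only a rough standalone sketch of that
pattern, not code from the patch.  The names demo_state, demo_is_mb and
demo_mbtowc are hypothetical, the 0xa1..0xfe byte range is a simplified
stand-in for _iseucjp, and errno handling is left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the mbstate_t fields used by the patch. */
struct demo_state
{
  int count;                    /* 0 = nothing pending, 1 = lead byte saved */
  unsigned char lead;           /* the saved lead byte */
};

/* Assumed byte test, loosely modelled on EUC-JP; not newlib's _iseucjp. */
static int
demo_is_mb (int c)
{
  return c >= 0xa1 && c <= 0xfe;
}

/* Returns the number of bytes consumed by this call, 0 for NUL,
   -1 for an invalid trail byte, -2 if the sequence is incomplete. */
int
demo_mbtowc (unsigned int *pwc, const char *s, size_t n,
             struct demo_state *st)
{
  const unsigned char *t = (const unsigned char *) s;
  unsigned int dummy;
  int ch;
  int i = 0;

  assert (st != NULL);
  if (pwc == NULL)
    pwc = &dummy;
  if (s == NULL)
    return 0;                   /* encoding is not state-dependent */
  if (n == 0)
    return -2;

  ch = t[i++];
  if (st->count == 0 && demo_is_mb (ch))
    {
      st->lead = (unsigned char) ch;
      st->count = 1;
      if (n <= 1)
        return -2;              /* lead byte kept, trail byte still missing */
      ch = t[i++];
    }
  if (st->count == 1)
    {
      if (!demo_is_mb (ch))
        return -1;              /* invalid trail byte */
      *pwc = ((unsigned int) st->lead << 8) + (unsigned int) ch;
      st->count = 0;
      return i;
    }

  /* Plain single-byte character. */
  *pwc = t[0];
  return t[0] == '\0' ? 0 : 1;
}
```

Feeding the sketch one byte at a time shows the restart behaviour: the
first call with only a lead byte returns -2 and leaves count at 1, and the
second call with the trail byte returns 1 and stores the combined value.
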
Index: libc/stdlib/sb_charsets.c
===================================================================
RCS file: libc/stdlib/sb_charsets.c
diff -N libc/stdlib/sb_charsets.c
--- /dev/null 1 Jan 1970 00:00:00 -0000
+++ libc/stdlib/sb_charsets.c 22 Mar 2009 16:25:07 -0000
@@ -0,0 +1,697 @@
+#include <newlib.h>
+#include <wchar.h>
+
+#ifdef _MB_CAPABLE
+extern char *__locale_charset ();
+
+#ifdef _MB_EXTENDED_CHARSETS_ISO
+/* Tables for the ISO-8859-x to UTF conversion. The first index into the
+ table is a value computed from the value x (function __iso_8859_index),
+ the second index is the value of the incoming character - 0xa0.
+ Values < 0xa0 don't have to be converted anyway. */
+wchar_t __iso_8859_conv[14][0x60] = {
+ /* ISO-8859-2 */
+ { 0xa0, 0x104, 0x2d8, 0x141, 0xa4, 0x13d, 0x15a, 0xa7,
+ 0xa8, 0x160, 0x15e, 0x164, 0x179, 0xad, 0x17d, 0x17b,
+ 0xb0, 0x105, 0x2db, 0x142, 0xb4, 0x13e, 0x15b, 0x2c7,
+ 0xb8, 0x161, 0x15f, 0x165, 0x17a, 0x2dd, 0x17e, 0x17c,
+ 0x154, 0xc1, 0xc2, 0x102, 0xc4, 0x139, 0x106, 0xc7,
+ 0x10c, 0xc9, 0x118, 0xcb, 0x11a, 0xcd, 0xce, 0x10e,
+ 0x110, 0x143, 0x147, 0xd3, 0xd4, 0x150, 0xd6, 0xd7,
+ 0x158, 0x16e, 0xda, 0x170, 0xdc, 0xdd, 0x162, 0xdf,
+ 0x155, 0xe1, 0xe2, 0x103, 0xe4, 0x13a, 0x107, 0xe7,
+ 0x10d, 0xe9, 0x119, 0xeb, 0x11b, 0xed, 0xee, 0x10f,
+ 0x111, 0x144, 0x148, 0xf3, 0xf4, 0x151, 0xf6, 0xf7,
+ 0x159, 0x16f, 0xfa, 0x171, 0xfc, 0xfd, 0x163, 0x2d9 },
+ /* ISO-8859-3 */
+ { 0xa0, 0x126, 0x2d8, 0xa3, 0xa4, 0x0, 0x124, 0xa7,
+ 0xa8, 0x130, 0x15e, 0x11e, 0x134, 0xad, 0x0, 0x17b,
+ 0xb0, 0x127, 0xb2, 0xb3, 0xb4, 0xb5, 0x125, 0xb7,
+ 0xb8, 0x131, 0x15f, 0x11f, 0x135, 0xbd, 0x0, 0x17c,
+ 0xc0, 0xc1, 0xc2, 0x0, 0xc4, 0x10a, 0x108, 0xc7,
+ 0xc8, 0xc9, 0xca, 0xcb, 0xcc, 0xcd, 0xce, 0xcf,
+ 0x0, 0xd1, 0xd2, 0xd3, 0xd4, 0x120, 0xd6, 0xd7,
+ 0x11c, 0xd9, 0xda, 0xdb, 0xdc, 0x16c, 0x15c, 0xdf,
+ 0xe0, 0xe1, 0xe2, 0x0, 0xe4, 0x10b, 0x109, 0xe7,
+ 0xe8, 0xe9, 0xea, 0xeb, 0xec, 0xed, 0xee, 0xef,
+ 0x0, 0xf1, 0xf2, 0xf3, 0xf4, 0x121, 0xf6, 0xf7,
+ 0x11d, 0xf9, 0xfa, 0xfb, 0xfc, 0x16d, 0x15d, 0x2d9 },
+ /* ISO-8859-4 */
+ { 0xa0, 0x104, 0x138, 0x156, 0xa4, 0x128, 0x13b, 0xa7,
+ 0xa8, 0x160, 0x112, 0x122, 0x166, 0xad, 0x17d, 0xaf,
+ 0xb0, 0x105, 0x2db, 0x157, 0xb4, 0x129, 0x13c, 0x2c7,
+ 0xb8, 0x161, 0x113, 0x123, 0x167, 0x14a, 0x17e, 0x14b,
+ 0x100, 0xc1, 0xc2, 0xc3, 0xc4, 0xc5, 0xc6, 0x12e,
+ 0x10c, 0xc9, 0x118, 0xcb, 0x116, 0xcd, 0xce, 0x12a,
+ 0x110, 0x145, 0x14c, 0x136, 0xd4, 0xd5, 0xd6, 0xd7,
+ 0xd8, 0x172, 0xda, 0xdb, 0xdc, 0x168, 0x16a, 0xdf,
+ 0x101, 0xe1, 0xe2, 0xe3, 0xe4, 0xe5, 0xe6, 0x12f,
+ 0x10d, 0xe9, 0x119, 0xeb, 0x117, 0xed, 0xee, 0x12b,
+ 0x111, 0x146, 0x14d, 0x137, 0xf4, 0xf5, 0xf6, 0xf7,
+ 0xf8, 0x173, 0xfa, 0xfb, 0xfc, 0x169, 0x16b, 0x2d9 },
+ /* ISO-8859-5 */
+ { 0xa0, 0x401, 0x402, 0x403, 0x404, 0x405, 0x406, 0x407,
+ 0x408, 0x409, 0x40a, 0x40b, 0x40c, 0xad, 0x40e, 0x40f,
+ 0x410, 0x411, 0x412, 0x413, 0x414, 0x415, 0x416, 0x417,
+ 0x418, 0x419, 0x41a, 0x41b, 0x41c, 0x41d, 0x41e, 0x41f,
+ 0x420, 0x421, 0x422, 0x423, 0x424, 0x425, 0x426, 0x427,
+ 0x428, 0x429, 0x42a, 0x42b, 0x42c, 0x42d, 0x42e, 0x42f,
+ 0x430, 0x431, 0x432, 0x433, 0x434, 0x435, 0x436, 0x437,
+ 0x438, 0x439, 0x43a, 0x43b, 0x43c, 0x43d, 0x43e, 0x43f,
+ 0x440, 0x441, 0x442, 0x443, 0x444, 0x445, 0x446, 0x447,
+ 0x448, 0x449, 0x44a, 0x44b, 0x44c, 0x44d, 0x44e, 0x44f,
+ 0x2116, 0x451, 0x452, 0x453, 0x454, 0x455, 0x456, 0x457,
+ 0x458, 0x459, 0x45a, 0x45b, 0x45c, 0xa7, 0x45e, 0x45f },
+ /* ISO-8859-6 */
+ { 0xa0, 0x0, 0x0, 0x0, 0xa4, 0x0, 0x0, 0x0,
+ 0x0, 0x0, 0x0, 0x0, 0x60c, 0xad, 0x0, 0x0,
+ 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
+ 0x0, 0x0, 0x0, 0x61b, 0x0, 0x0, 0x0, 0x61f,
+ 0x0, 0x621, 0x622, 0x623, 0x624, 0x625, 0x626, 0x627,
+ 0x628, 0x629, 0x62a, 0x62b, 0x62c, 0x62d, 0x62e, 0x62f,
+ 0x630, 0x631, 0x632, 0x633, 0x634, 0x635, 0x636, 0x637,
+ 0x638, 0x639, 0x63a, 0x0, 0x0, 0x0, 0x0, 0x0,
+ 0x640, 0x641, 0x642, 0x643, 0x644, 0x645, 0x646, 0x647,
+ 0x648, 0x649, 0x64a, 0x64b, 0x64c, 0x64d, 0x64e, 0x64f,
+ 0x650, 0x651, 0x652, 0x64b, 0xf4, 0xf5, 0xf6, 0xf7,
+ 0xf8, 0xf9, 0xfa, 0xfb, 0xfc, 0xfd, 0xfe, 0xff },
+ /* ISO-8859-7 */
+ { 0xa0, 0x2018, 0x2019, 0xa3, 0x20ac, 0x20af, 0xa6, 0xa7,
+ 0xa8, 0xa9, 0x37a, 0xab, 0xac, 0xad, 0x0, 0x2015,
+ 0xb0, 0xb1, 0xb2, 0xb3, 0x384, 0x385, 0x386, 0xb7,
+ 0x388, 0x389, 0x38a, 0xbb, 0x38c, 0xbd, 0x38e, 0x38f,
+ 0x390, 0x391, 0x392, 0x393, 0x394, 0x395, 0x396, 0x397,
+ 0x398, 0x399, 0x39a, 0x39b, 0x39c, 0x39d, 0x39e, 0x39f,
+ 0x3a0, 0x3a1, 0x0, 0x3a3, 0x3a4, 0x3a5, 0x3a6, 0x3a7,
+ 0x3a8, 0x3a9, 0x3aa, 0x3ab, 0x3ac, 0x3ad, 0x3ae, 0x3af,
+ 0x3b0, 0x3b1, 0x3b2, 0x3b3, 0x3b4, 0x3b5, 0x3b6, 0x3b7,
+ 0x3b8, 0x3b9, 0x3ba, 0x3bb, 0x3bc, 0x3bd, 0x3be, 0x3bf,
+ 0x3c0, 0x3c1, 0x3c2, 0x3c3, 0x3c4, 0x3c5, 0x3c6, 0x3c7,
+ 0x3c8, 0x3c9, 0x3ca, 0x3cb, 0x3cc, 0x3cd, 0x3ce, 0xff },
+ /* ISO-8859-8 */
+ { 0xa0, 0x0, 0xa2, 0xa3, 0xa4, 0xa5, 0xa6, 0xa7,
+ 0xa8, 0xa9, 0xd7, 0xab, 0xac, 0xad, 0xae, 0xaf,
+ 0xb0, 0xb1, 0xb2, 0xb3, 0xb4, 0xb5, 0xb6, 0xb7,
+ 0xb8, 0xb9, 0xf7, 0xbb, 0xbc, 0xbd, 0xbe, 0x0,
+ 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
+ 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
+ 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
+ 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x2017,
+ 0x5d0, 0x5d1, 0x5d2, 0x5d3, 0x5d4, 0x5d5, 0x5d6, 0x5d7,
+ 0x5d8, 0x5d9, 0x5da, 0x5db, 0x5dc, 0x5dd, 0x5de, 0x5df,
+ 0x5e0, 0x5e1, 0x5e2, 0x5e3, 0x5e4, 0x5e5, 0x5e6, 0x5e7,
+ 0x5e8, 0x5e9, 0x5ea, 0x0, 0x0, 0x200e, 0x200f, 0x200e },
+ /* ISO-8859-9 */
+ { 0xa0, 0xa1, 0xa2, 0xa3, 0xa4, 0xa5, 0xa6, 0xa7,
+ 0xa8, 0xa9, 0xaa, 0xab, 0xac, 0xad, 0xae, 0xaf,
+ 0xb0, 0xb1, 0xb2, 0xb3, 0xb4, 0xb5, 0xb6, 0xb7,
+ 0xb8, 0xb9, 0xba, 0xbb, 0xbc, 0xbd, 0xbe, 0xbf,
+ 0xc0, 0xc1, 0xc2, 0xc3, 0xc4, 0xc5, 0xc6, 0xc7,
+ 0xc8, 0xc9, 0xca, 0xcb, 0xcc, 0xcd, 0xce, 0xcf,
+ 0x11e, 0xd1, 0xd2, 0xd3, 0xd4, 0xd5, 0xd6, 0xd7,
+ 0xd8, 0xd9, 0xda, 0xdb, 0xdc, 0x130, 0x15e, 0xdf,
+ 0xe0, 0xe1, 0xe2, 0xe3, 0xe4, 0xe5, 0xe6, 0xe7,
+ 0xe8, 0xe9, 0xea, 0xeb, 0xec, 0xed, 0xee, 0xef,
+ 0x11f, 0xf1, 0xf2, 0xf3, 0xf4, 0xf5, 0xf6, 0xf7,
+ 0xf8, 0xf9, 0xfa, 0xfb, 0xfc, 0x131, 0x15f, 0xff },
+ /* ISO-8859-10 */
+ { 0xa0, 0x104, 0x112, 0x122, 0x12a, 0x128, 0x136, 0xa7,
+ 0x13b, 0x110, 0x160, 0x166, 0x17d, 0xad, 0x16a, 0x14a,
+ 0xb0, 0x105, 0x113, 0x123, 0x12b, 0x129, 0x137, 0xb7,
+ 0x13c, 0x111, 0x161, 0x167, 0x17e, 0x2015, 0x16b, 0x14b,
+ 0x100, 0xc1, 0xc2, 0xc3, 0xc4, 0xc5, 0xc6, 0x12e,
+ 0x10c, 0xc9, 0x118, 0xcb, 0x116, 0xcd, 0xce, 0xcf,
+ 0xd0, 0x145, 0x14c, 0xd3, 0xd4, 0xd5, 0xd6, 0x168,
+ 0xd8, 0x172, 0xda, 0xdb, 0xdc, 0xdd, 0xde, 0xdf,
+ 0x101, 0xe1, 0xe2, 0xe3, 0xe4, 0xe5, 0xe6, 0x12f,
+ 0x10d, 0xe9, 0x119, 0xeb, 0x117, 0xed, 0xee, 0xef,
+ 0xf0, 0x146, 0x14d, 0xf3, 0xf4, 0xf5, 0xf6, 0x169,
+ 0xf8, 0x173, 0xfa, 0xfb, 0xfc, 0xfd, 0xfe, 0x138 },
+ /* ISO-8859-11 */
+ { 0xa0, 0xe01, 0xe02, 0xe03, 0xe04, 0xe05, 0xe06, 0xe07,
+ 0xe08, 0xe09, 0xe0a, 0xe0b, 0xe0c, 0xe0d, 0xe0e, 0xe0f,
+ 0xe10, 0xe11, 0xe12, 0xe13, 0xe14, 0xe15, 0xe16, 0xe17,
+ 0xe18, 0xe19, 0xe1a, 0xe1b, 0xe1c, 0xe1d, 0xe1e, 0xe1f,
+ 0xe20, 0xe21, 0xe22, 0xe23, 0xe24, 0xe25, 0xe26, 0xe27,
+ 0xe28, 0xe29, 0xe2a, 0xe2b, 0xe2c, 0xe2d, 0xe2e, 0xe2f,
+ 0xe30, 0xe31, 0xe32, 0xe33, 0xe34, 0xe35, 0xe36, 0xe37,
+ 0xe38, 0xe39, 0xe3a, 0x0, 0x0, 0x0, 0x0, 0xe3f,
+ 0xe40, 0xe41, 0xe42, 0xe43, 0xe44, 0xe45, 0xe46, 0xe47,
+ 0xe48, 0xe49, 0xe4a, 0xe4b, 0xe4c, 0xe4d, 0xe4e, 0xe4f,
+ 0xe50, 0xe51, 0xe52, 0xe53, 0xe54, 0xe55, 0xe56, 0xe57,
+ 0xe58, 0xe59, 0xe5a, 0xe5b, 0xe31, 0xe34, 0xe47, 0xff },
+ /* ISO-8859-12 doesn't exist. The below code decrements the index
+ into the table by one for ISO numbers > 12. */
+ /* ISO-8859-13 */
+ { 0xa0, 0x201d, 0xa2, 0xa3, 0xa4, 0x201e, 0xa6, 0xa7,
+ 0xd8, 0xa9, 0x156, 0xab, 0xac, 0xad, 0xae, 0xc6,
+ 0xb0, 0xb1, 0xb2, 0xb3, 0x201c, 0xb5, 0xb6, 0xb7,
+ 0xf8, 0xb9, 0x157, 0xbb, 0xbc, 0xbd, 0xbe, 0xe6,
+ 0x104, 0x12e, 0x100, 0x106, 0xc4, 0xc5, 0x118, 0x112,
+ 0x10c, 0xc9, 0x179, 0x116, 0x122, 0x136, 0x12a, 0x13b,
+ 0x160, 0x143, 0x145, 0xd3, 0x14c, 0xd5, 0xd6, 0xd7,
+ 0x172, 0x141, 0x15a, 0x16a, 0xdc, 0x17b, 0x17d, 0xdf,
+ 0x105, 0x12f, 0x101, 0x107, 0xe4, 0xe5, 0x119, 0x113,
+ 0x10d, 0xe9, 0x17a, 0x117, 0x123, 0x137, 0x12b, 0x13c,
+ 0x161, 0x144, 0x146, 0xf3, 0x14d, 0xf5, 0xf6, 0xf7,
+ 0x173, 0x142, 0x15b, 0x16b, 0xfc, 0x17c, 0x17e, 0x2019 },
+ /* ISO-8859-14 */
+ { 0xa0, 0x1e02, 0x1e03, 0xa3, 0x10a, 0x10b, 0x1e0a, 0xa7,
+ 0x1e80, 0xa9, 0x1e82, 0x1e0b, 0x1ef2, 0xad, 0xae, 0x178,
+ 0x1e1e, 0x1e1f, 0x120, 0x121, 0x1e40, 0x1e41, 0xb6, 0x1e56,
+ 0x1e81, 0x1e57, 0x1e83, 0x1e60, 0x1ef3, 0x1e84, 0x1e85, 0x1e61,
+ 0xc0, 0xc1, 0xc2, 0xc3, 0xc4, 0xc5, 0xc6, 0xc7,
+ 0xc8, 0xc9, 0xca, 0xcb, 0xcc, 0xcd, 0xce, 0xcf,
+ 0x174, 0xd1, 0xd2, 0xd3, 0xd4, 0xd5, 0xd6, 0x1e6a,
+ 0xd8, 0xd9, 0xda, 0xdb, 0xdc, 0xdd, 0x176, 0xdf,
+ 0xe0, 0xe1, 0xe2, 0xe3, 0xe4, 0xe5, 0xe6, 0xe7,
+ 0xe8, 0xe9, 0xea, 0xeb, 0xec, 0xed, 0xee, 0xef,
+ 0x175, 0xf1, 0xf2, 0xf3, 0xf4, 0xf5, 0xf6, 0x1e6b,
+ 0xf8, 0xf9, 0xfa, 0xfb, 0xfc, 0xfd, 0x177, 0xff },
+ /* ISO-8859-15 */
+ { 0xa0, 0xa1, 0xa2, 0xa3, 0x20ac, 0xa5, 0x160, 0xa7,
+ 0x161, 0xa9, 0xaa, 0xab, 0xac, 0xad, 0xae, 0xaf,
+ 0xb0, 0xb1, 0xb2, 0xb3, 0x17d, 0xb5, 0xb6, 0xb7,
+ 0x17e, 0xb9, 0xba, 0xbb, 0x152, 0x153, 0x178, 0xbf,
+ 0xc0, 0xc1, 0xc2, 0xc3, 0xc4, 0xc5, 0xc6, 0xc7,
+ 0xc8, 0xc9, 0xca, 0xcb, 0xcc, 0xcd, 0xce, 0xcf,
+ 0xd0, 0xd1, 0xd2, 0xd3, 0xd4, 0xd5, 0xd6, 0xd7,
+ 0xd8, 0xd9, 0xda, 0xdb, 0xdc, 0xdd, 0xde, 0xdf,
+ 0xe0, 0xe1, 0xe2, 0xe3, 0xe4, 0xe5, 0xe6, 0xe7,
+ 0xe8, 0xe9, 0xea, 0xeb, 0xec, 0xed, 0xee, 0xef,
+ 0xf0, 0xf1, 0xf2, 0xf3, 0xf4, 0xf5, 0xf6, 0xf7,
+ 0xf8, 0xf9, 0xfa, 0xfb, 0xfc, 0xfd, 0xfe, 0xff },
+ /* ISO-8859-16 */
+ { 0xa0, 0x104, 0x105, 0x141, 0x20ac, 0x201e, 0x160, 0xa7,
+ 0x161, 0xa9, 0x218, 0xab, 0x179, 0xad, 0x17a, 0x17b,
+ 0xb0, 0xb1, 0x10c, 0x142, 0x17d, 0x201d, 0xb6, 0xb7,
+ 0x17e, 0x10d, 0x219, 0xbb, 0x152, 0x153, 0x178, 0x17c,
+ 0xc0, 0xc1, 0xc2, 0x102, 0xc4, 0x106, 0xc6, 0xc7,
+ 0xc8, 0xc9, 0xca, 0xcb, 0xcc, 0xcd, 0xce, 0xcf,
+ 0x110, 0x143, 0xd2, 0xd3, 0xd4, 0x150, 0xd6, 0x15a,
+ 0x170, 0xd9, 0xda, 0xdb, 0xdc, 0x118, 0x21a, 0xdf,
+ 0xe0, 0xe1, 0xe2, 0x103, 0xe4, 0x107, 0xe6, 0xe7,
+ 0xe8, 0xe9, 0xea, 0xeb, 0xec, 0xed, 0xee, 0xef,
+ 0x111, 0x144, 0xf2, 0xf3, 0xf4, 0x151, 0xf6, 0x15b,
+ 0x171, 0xf9, 0xfa, 0xfb, 0xfc, 0x119, 0x21b, 0xff }
+};
+#endif /* _MB_EXTENDED_CHARSETS_ISO */
+
+#ifdef _MB_EXTENDED_CHARSETS_DOS
+/* Tables for the Windows default singlebyte ANSI codepage conversion.
+ The first index into the table is a value computed from the codepage
+ value (function __cp_index), the second index is the value of the
+ incoming character - 0x80.
+ Values < 0x80 don't have to be converted anyway. */
+wchar_t __cp_conv[22][0x80] = {
+ /* CP437 */
+ { 0xc7, 0xfc, 0xe9, 0xe2, 0xe4, 0xe0, 0xe5, 0xe7,
+ 0xea, 0xeb, 0xe8, 0xef, 0xee, 0xec, 0xc4, 0xc5,
+ 0xc9, 0xe6, 0xc6, 0xf4, 0xf6, 0xf2, 0xfb, 0xf9,
+ 0xff, 0xd6, 0xdc, 0xa2, 0xa3, 0xa5, 0x20a7, 0x192,
+ 0xe1, 0xed, 0xf3, 0xfa, 0xf1, 0xd1, 0xaa, 0xba,
+ 0xbf, 0x2310, 0xac, 0xbd, 0xbc, 0xa1, 0xab, 0xbb,
+ 0x2591, 0x2592, 0x2593, 0x2502, 0x2524, 0x2561, 0x2562, 0x2556,
+ 0x2555, 0x2563, 0x2551, 0x2557, 0x255d, 0x255c, 0x255b, 0x2510,
+ 0x2514, 0x2534, 0x252c, 0x251c, 0x2500, 0x253c, 0x255e, 0x255f,
+ 0x255a, 0x2554, 0x2569, 0x2566, 0x2560, 0x2550, 0x256c, 0x2567,
+ 0x2568, 0x2564, 0x2565, 0x2559, 0x2558, 0x2552, 0x2553, 0x256b,
+ 0x256a, 0x2518, 0x250c, 0x2588, 0x2584, 0x258c, 0x2590, 0x2580,
+ 0x3b1, 0xdf, 0x393, 0x3c0, 0x3a3, 0x3c3, 0xb5, 0x3c4,
+ 0x3a6, 0x398, 0x3a9, 0x3b4, 0x221e, 0x3c6, 0x3b5, 0x2229,
+ 0x2261, 0xb1, 0x2265, 0x2264, 0x2320, 0x2321, 0xf7, 0x2248,
+ 0xb0, 0x2219, 0xb7, 0x221a, 0x207f, 0xb2, 0x25a0, 0xa0 },
+ /* CP720 */
+ { 0x0, 0x0, 0xe9, 0xe2, 0x0, 0xe0, 0x0, 0xe7,
+ 0xea, 0xeb, 0xe8, 0xef, 0xee, 0x0, 0x0, 0x0,
+ 0x0, 0x651, 0x652, 0xf4, 0xa4, 0x640, 0xfb, 0xf9,
+ 0x621, 0x622, 0x623, 0x624, 0xa3, 0x625, 0x626, 0x627,
+ 0x628, 0x629, 0x62a, 0x62b, 0x62c, 0x62d, 0x62e, 0x62f,
+ 0x630, 0x631, 0x632, 0x633, 0x634, 0x635, 0xab, 0xbb,
+ 0x2591, 0x2592, 0x2593, 0x2502, 0x2524, 0x2561, 0x2562, 0x2556,
+ 0x2555, 0x2563, 0x2551, 0x2557, 0x255d, 0x255c, 0x255b, 0x2510,
+ 0x2514, 0x2534, 0x252c, 0x251c, 0x2500, 0x253c, 0x255e, 0x255f,
+ 0x255a, 0x2554, 0x2569, 0x2566, 0x2560, 0x2550, 0x256c, 0x2567,
+ 0x2568, 0x2564, 0x2565, 0x2559, 0x2558, 0x2552, 0x2553, 0x256b,
+ 0x256a, 0x2518, 0x250c, 0x2588, 0x2584, 0x258c, 0x2590, 0x2580,
+ 0x636, 0x637, 0x638, 0x639, 0x63a, 0x641, 0xb5, 0x642,
+ 0x643, 0x644, 0x645, 0x646, 0x647, 0x648, 0x649, 0x64a,
+ 0x2261, 0x64b, 0x64c, 0x64d, 0x64e, 0x64f, 0x650, 0x2248,
+ 0xb0, 0x2219, 0xb7, 0x221a, 0x207f, 0xb2, 0x25a0, 0xa0 },
+ /* CP737 */
+ { 0x391, 0x392, 0x393, 0x394, 0x395, 0x396, 0x397, 0x398,
+ 0x399, 0x39a, 0x39b, 0x39c, 0x39d, 0x39e, 0x39f, 0x3a0,
+ 0x3a1, 0x3a3, 0x3a4, 0x3a5, 0x3a6, 0x3a7, 0x3a8, 0x3a9,
+ 0x3b1, 0x3b2, 0x3b3, 0x3b4, 0x3b5, 0x3b6, 0x3b7, 0x3b8,
+ 0x3b9, 0x3ba, 0x3bb, 0x3bc, 0x3bd, 0x3be, 0x3bf, 0x3c0,
+ 0x3c1, 0x3c3, 0x3c2, 0x3c4, 0x3c5, 0x3c6, 0x3c7, 0x3c8,
+ 0x2591, 0x2592, 0x2593, 0x2502, 0x2524, 0x2561, 0x2562, 0x2556,
+ 0x2555, 0x2563, 0x2551, 0x2557, 0x255d, 0x255c, 0x255b, 0x2510,
+ 0x2514, 0x2534, 0x252c, 0x251c, 0x2500, 0x253c, 0x255e, 0x255f,
+ 0x255a, 0x2554, 0x2569, 0x2566, 0x2560, 0x2550, 0x256c, 0x2567,
+ 0x2568, 0x2564, 0x2565, 0x2559, 0x2558, 0x2552, 0x2553, 0x256b,
+ 0x256a, 0x2518, 0x250c, 0x2588, 0x2584, 0x258c, 0x2590, 0x2580,
+ 0x3c9, 0x3ac, 0x3ad, 0x3ae, 0x3ca, 0x3af, 0x3cc, 0x3cd,
+ 0x3cb, 0x3ce, 0x386, 0x388, 0x389, 0x38a, 0x38c, 0x38e,
+ 0x38f, 0xb1, 0x2265, 0x2264, 0x3aa, 0x3ab, 0xf7, 0x2248,
+ 0xb0, 0x2219, 0xb7, 0x221a, 0x207f, 0xb2, 0x25a0, 0xa0 },
+ /* CP775 */
+ { 0x106, 0xfc, 0xe9, 0x101, 0xe4, 0x123, 0xe5, 0x107,
+ 0x142, 0x113, 0x156, 0x157, 0x12b, 0x179, 0xc4, 0xc5,
+ 0xc9, 0xe6, 0xc6, 0x14d, 0xf6, 0x122, 0xa2, 0x15a,
+ 0x15b, 0xd6, 0xdc, 0xf8, 0xa3, 0xd8, 0xd7, 0xa4,
+ 0x100, 0x12a, 0xf3, 0x17b, 0x17c, 0x17a, 0x201d, 0xa6,
+ 0xa9, 0xae, 0xac, 0xbd, 0xbc, 0x141, 0xab, 0xbb,
+ 0x2591, 0x2592, 0x2593, 0x2502, 0x2524, 0x104, 0x10c, 0x118,
+ 0x116, 0x2563, 0x2551, 0x2557, 0x255d, 0x12e, 0x160, 0x2510,
+ 0x2514, 0x2534, 0x252c, 0x251c, 0x2500, 0x253c, 0x172, 0x16a,
+ 0x255a, 0x2554, 0x2569, 0x2566, 0x2560, 0x2550, 0x256c, 0x17d,
+ 0x105, 0x10d, 0x119, 0x117, 0x12f, 0x161, 0x173, 0x16b,
+ 0x17e, 0x2518, 0x250c, 0x2588, 0x2584, 0x258c, 0x2590, 0x2580,
+ 0xd3, 0xdf, 0x14c, 0x143, 0xf5, 0xd5, 0xb5, 0x144,
+ 0x136, 0x137, 0x13b, 0x13c, 0x146, 0x112, 0x145, 0x2019,
+ 0xad, 0xb1, 0x201c, 0xbe, 0xb6, 0xa7, 0xf7, 0x201e,
+ 0xb0, 0x2219, 0xb7, 0xb9, 0xb3, 0xb2, 0x25a0, 0xa0 },
+ /* CP850 */
+ { 0xc7, 0xfc, 0xe9, 0xe2, 0xe4, 0xe0, 0xe5, 0xe7,
+ 0xea, 0xeb, 0xe8, 0xef, 0xee, 0xec, 0xc4, 0xc5,
+ 0xc9, 0xe6, 0xc6, 0xf4, 0xf6, 0xf2, 0xfb, 0xf9,
+ 0xff, 0xd6, 0xdc, 0xf8, 0xa3, 0xd8, 0xd7, 0x192,
+ 0xe1, 0xed, 0xf3, 0xfa, 0xf1, 0xd1, 0xaa, 0xba,
+ 0xbf, 0xae, 0xac, 0xbd, 0xbc, 0xa1, 0xab, 0xbb,
+ 0x2591, 0x2592, 0x2593, 0x2502, 0x2524, 0xc1, 0xc2, 0xc0,
+ 0xa9, 0x2563, 0x2551, 0x2557, 0x255d, 0xa2, 0xa5, 0x2510,
+ 0x2514, 0x2534, 0x252c, 0x251c, 0x2500, 0x253c, 0xe3, 0xc3,
+ 0x255a, 0x2554, 0x2569, 0x2566, 0x2560, 0x2550, 0x256c, 0xa4,
+ 0xf0, 0xd0, 0xca, 0xcb, 0xc8, 0x131, 0xcd, 0xce,
+ 0xcf, 0x2518, 0x250c, 0x2588, 0x2584, 0xa6, 0xcc, 0x2580,
+ 0xd3, 0xdf, 0xd4, 0xd2, 0xf5, 0xd5, 0xb5, 0xfe,
+ 0xde, 0xda, 0xdb, 0xd9, 0xfd, 0xdd, 0xaf, 0xb4,
+ 0xad, 0xb1, 0x2017, 0xbe, 0xb6, 0xa7, 0xf7, 0xb8,
+ 0xb0, 0xa8, 0xb7, 0xb9, 0xb3, 0xb2, 0x25a0, 0xa0 },
+ /* CP852 */
+ { 0xc7, 0xfc, 0xe9, 0xe2, 0xe4, 0x16f, 0x107, 0xe7,
+ 0x142, 0xeb, 0x150, 0x151, 0xee, 0x179, 0xc4, 0x106,
+ 0xc9, 0x139, 0x13a, 0xf4, 0xf6, 0x13d, 0x13e, 0x15a,
+ 0x15b, 0xd6, 0xdc, 0x164, 0x165, 0x141, 0xd7, 0x10d,
+ 0xe1, 0xed, 0xf3, 0xfa, 0x104, 0x105, 0x17d, 0x17e,
+ 0x118, 0x119, 0xac, 0x17a, 0x10c, 0x15f, 0xab, 0xbb,
+ 0x2591, 0x2592, 0x2593, 0x2502, 0x2524, 0xc1, 0xc2, 0x11a,
+ 0x15e, 0x2563, 0x2551, 0x2557, 0x255d, 0x17b, 0x17c, 0x2510,
+ 0x2514, 0x2534, 0x252c, 0x251c, 0x2500, 0x253c, 0x102, 0x103,
+ 0x255a, 0x2554, 0x2569, 0x2566, 0x2560, 0x2550, 0x256c, 0xa4,
+ 0x111, 0x110, 0x10e, 0xcb, 0x10f, 0x147, 0xcd, 0xce,
+ 0x11b, 0x2518, 0x250c, 0x2588, 0x2584, 0x162, 0x16e, 0x2580,
+ 0xd3, 0xdf, 0xd4, 0x143, 0x144, 0x148, 0x160, 0x161,
+ 0x154, 0xda, 0x155, 0x170, 0xfd, 0xdd, 0x163, 0xb4,
+ 0xad, 0x2dd, 0x2db, 0x2c7, 0x2d8, 0xa7, 0xf7, 0xb8,
+ 0xb0, 0xa8, 0x2d9, 0x171, 0x158, 0x159, 0x25a0, 0xa0 },
+ /* CP855 */
+ { 0x452, 0x402, 0x453, 0x403, 0x451, 0x401, 0x454, 0x404,
+ 0x455, 0x405, 0x456, 0x406, 0x457, 0x407, 0x458, 0x408,
+ 0x459, 0x409, 0x45a, 0x40a, 0x45b, 0x40b, 0x45c, 0x40c,
+ 0x45e, 0x40e, 0x45f, 0x40f, 0x44e, 0x42e, 0x44a, 0x42a,
+ 0x430, 0x410, 0x431, 0x411, 0x446, 0x426, 0x434, 0x414,
+ 0x435, 0x415, 0x444, 0x424, 0x433, 0x413, 0xab, 0xbb,
+ 0x2591, 0x2592, 0x2593, 0x2502, 0x2524, 0x445, 0x425, 0x438,
+ 0x418, 0x2563, 0x2551, 0x2557, 0x255d, 0x439, 0x419, 0x2510,
+ 0x2514, 0x2534, 0x252c, 0x251c, 0x2500, 0x253c, 0x43a, 0x41a,
+ 0x255a, 0x2554, 0x2569, 0x2566, 0x2560, 0x2550, 0x256c, 0xa4,
+ 0x43b, 0x41b, 0x43c, 0x41c, 0x43d, 0x41d, 0x43e, 0x41e,
+ 0x43f, 0x2518, 0x250c, 0x2588, 0x2584, 0x41f, 0x44f, 0x2580,
+ 0x42f, 0x440, 0x420, 0x441, 0x421, 0x442, 0x422, 0x443,
+ 0x423, 0x436, 0x416, 0x432, 0x412, 0x44c, 0x42c, 0x2116,
+ 0xad, 0x44b, 0x42b, 0x437, 0x417, 0x448, 0x428, 0x44d,
+ 0x42d, 0x449, 0x429, 0x447, 0x427, 0xa7, 0x25a0, 0xa0 },
+ /* CP857 */
+ { 0xc7, 0xfc, 0xe9, 0xe2, 0xe4, 0xe0, 0xe5, 0xe7,
+ 0xea, 0xeb, 0xe8, 0xef, 0xee, 0x131, 0xc4, 0xc5,
+ 0xc9, 0xe6, 0xc6, 0xf4, 0xf6, 0xf2, 0xfb, 0xf9,
+ 0x130, 0xd6, 0xdc, 0xf8, 0xa3, 0xd8, 0x15e, 0x15f,
+ 0xe1, 0xed, 0xf3, 0xfa, 0xf1, 0xd1, 0x11e, 0x11f,
+ 0xbf, 0xae, 0xac, 0xbd, 0xbc, 0xa1, 0xab, 0xbb,
+ 0x2591, 0x2592, 0x2593, 0x2502, 0x2524, 0xc1, 0xc2, 0xc0,
+ 0xa9, 0x2563, 0x2551, 0x2557, 0x255d, 0xa2, 0xa5, 0x2510,
+ 0x2514, 0x2534, 0x252c, 0x251c, 0x2500, 0x253c, 0xe3, 0xc3,
+ 0x255a, 0x2554, 0x2569, 0x2566, 0x2560, 0x2550, 0x256c, 0xa4,
+ 0xba, 0xaa, 0xca, 0xcb, 0xc8, 0x0, 0xcd, 0xce,
+ 0xcf, 0x2518, 0x250c, 0x2588, 0x2584, 0xa6, 0xcc, 0x2580,
+ 0xd3, 0xdf, 0xd4, 0xd2, 0xf5, 0xd5, 0xb5, 0x0,
+ 0xd7, 0xda, 0xdb, 0xd9, 0xec, 0xff, 0xaf, 0xb4,
+ 0xad, 0xb1, 0x0, 0xbe, 0xb6, 0xa7, 0xf7, 0xb8,
+ 0xb0, 0xa8, 0xb7, 0xb9, 0xb3, 0xb2, 0x25a0, 0xa0 },
+ /* CP858 */
+ { 0xc7, 0xfc, 0xe9, 0xe2, 0xe4, 0xe0, 0xe5, 0xe7,
+ 0xea, 0xeb, 0xe8, 0xef, 0xee, 0xec, 0xc4, 0xc5,
+ 0xc9, 0xe6, 0xc6, 0xf4, 0xf6, 0xf2, 0xfb, 0xf9,
+ 0xff, 0xd6, 0xdc, 0xf8, 0xa3, 0xd8, 0xd7, 0x192,
+ 0xe1, 0xed, 0xf3, 0xfa, 0xf1, 0xd1, 0xaa, 0xba,
+ 0xbf, 0xae, 0xac, 0xbd, 0xbc, 0xa1, 0xab, 0xbb,
+ 0x2591, 0x2592, 0x2593, 0x2502, 0x2524, 0xc1, 0xc2, 0xc0,
+ 0xa9, 0x2563, 0x2551, 0x2557, 0x255d, 0xa2, 0xa5, 0x2510,
+ 0x2514, 0x2534, 0x252c, 0x251c, 0x2500, 0x253c, 0xe3, 0xc3,
+ 0x255a, 0x2554, 0x2569, 0x2566, 0x2560, 0x2550, 0x256c, 0xa4,
+ 0xf0, 0xd0, 0xca, 0xcb, 0xc8, 0x20ac, 0xcd, 0xce,
+ 0xcf, 0x2518, 0x250c, 0x2588, 0x2584, 0xa6, 0xcc, 0x2580,
+ 0xd3, 0xdf, 0xd4, 0xd2, 0xf5, 0xd5, 0xb5, 0xfe,
+ 0xde, 0xda, 0xdb, 0xd9, 0xfd, 0xdd, 0xaf, 0xb4,
+ 0xad, 0xb1, 0x2017, 0xbe, 0xb6, 0xa7, 0xf7, 0xb8,
+ 0xb0, 0xa8, 0xb7, 0xb9, 0xb3, 0xb2, 0x25a0, 0xa0 },
+ /* CP862 */
+ { 0x5d0, 0x5d1, 0x5d2, 0x5d3, 0x5d4, 0x5d5, 0x5d6, 0x5d7,
+ 0x5d8, 0x5d9, 0x5da, 0x5db, 0x5dc, 0x5dd, 0x5de, 0x5df,
+ 0x5e0, 0x5e1, 0x5e2, 0x5e3, 0x5e4, 0x5e5, 0x5e6, 0x5e7,
+ 0x5e8, 0x5e9, 0x5ea, 0xa2, 0xa3, 0xa5, 0x20a7, 0x192,
+ 0xe1, 0xed, 0xf3, 0xfa, 0xf1, 0xd1, 0xaa, 0xba,
+ 0xbf, 0x2310, 0xac, 0xbd, 0xbc, 0xa1, 0xab, 0xbb,
+ 0x2591, 0x2592, 0x2593, 0x2502, 0x2524, 0x2561, 0x2562, 0x2556,
+ 0x2555, 0x2563, 0x2551, 0x2557, 0x255d, 0x255c, 0x255b, 0x2510,
+ 0x2514, 0x2534, 0x252c, 0x251c, 0x2500, 0x253c, 0x255e, 0x255f,
+ 0x255a, 0x2554, 0x2569, 0x2566, 0x2560, 0x2550, 0x256c, 0x2567,
+ 0x2568, 0x2564, 0x2565, 0x2559, 0x2558, 0x2552, 0x2553, 0x256b,
+ 0x256a, 0x2518, 0x250c, 0x2588, 0x2584, 0x258c, 0x2590, 0x2580,
+ 0x3b1, 0xdf, 0x393, 0x3c0, 0x3a3, 0x3c3, 0xb5, 0x3c4,
+ 0x3a6, 0x398, 0x3a9, 0x3b4, 0x221e, 0x3c6, 0x3b5, 0x2229,
+ 0x2261, 0xb1, 0x2265, 0x2264, 0x2320, 0x2321, 0xf7, 0x2248,
+ 0xb0, 0x2219, 0xb7, 0x221a, 0x207f, 0xb2, 0x25a0, 0xa0 },
+ /* CP866 */
+ { 0x410, 0x411, 0x412, 0x413, 0x414, 0x415, 0x416, 0x417,
+ 0x418, 0x419, 0x41a, 0x41b, 0x41c, 0x41d, 0x41e, 0x41f,
+ 0x420, 0x421, 0x422, 0x423, 0x424, 0x425, 0x426, 0x427,
+ 0x428, 0x429, 0x42a, 0x42b, 0x42c, 0x42d, 0x42e, 0x42f,
+ 0x430, 0x431, 0x432, 0x433, 0x434, 0x435, 0x436, 0x437,
+ 0x438, 0x439, 0x43a, 0x43b, 0x43c, 0x43d, 0x43e, 0x43f,
+ 0x2591, 0x2592, 0x2593, 0x2502, 0x2524, 0x2561, 0x2562, 0x2556,
+ 0x2555, 0x2563, 0x2551, 0x2557, 0x255d, 0x255c, 0x255b, 0x2510,
+ 0x2514, 0x2534, 0x252c, 0x251c, 0x2500, 0x253c, 0x255e, 0x255f,
+ 0x255a, 0x2554, 0x2569, 0x2566, 0x2560, 0x2550, 0x256c, 0x2567,
+ 0x2568, 0x2564, 0x2565, 0x2559, 0x2558, 0x2552, 0x2553, 0x256b,
+ 0x256a, 0x2518, 0x250c, 0x2588, 0x2584, 0x258c, 0x2590, 0x2580,
+ 0x440, 0x441, 0x442, 0x443, 0x444, 0x445, 0x446, 0x447,
+ 0x448, 0x449, 0x44a, 0x44b, 0x44c, 0x44d, 0x44e, 0x44f,
+ 0x401, 0x451, 0x404, 0x454, 0x407, 0x457, 0x40e, 0x45e,
+ 0xb0, 0x2219, 0xb7, 0x221a, 0x2116, 0xa4, 0x25a0, 0xa0 },
+ /* CP874 */
+ { 0x20ac, 0x0, 0x0, 0x0, 0x0, 0x2026, 0x0, 0x0,
+ 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
+ 0x0, 0x2018, 0x2019, 0x201c, 0x201d, 0x2022, 0x2013, 0x2014,
+ 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
+ 0xa0, 0xe01, 0xe02, 0xe03, 0xe04, 0xe05, 0xe06, 0xe07,
+ 0xe08, 0xe09, 0xe0a, 0xe0b, 0xe0c, 0xe0d, 0xe0e, 0xe0f,
+ 0xe10, 0xe11, 0xe12, 0xe13, 0xe14, 0xe15, 0xe16, 0xe17,
+ 0xe18, 0xe19, 0xe1a, 0xe1b, 0xe1c, 0xe1d, 0xe1e, 0xe1f,
+ 0xe20, 0xe21, 0xe22, 0xe23, 0xe24, 0xe25, 0xe26, 0xe27,
+ 0xe28, 0xe29, 0xe2a, 0xe2b, 0xe2c, 0xe2d, 0xe2e, 0xe2f,
+ 0xe30, 0xe31, 0xe32, 0xe33, 0xe34, 0xe35, 0xe36, 0xe37,
+ 0xe38, 0xe39, 0xe3a, 0x0, 0x0, 0x0, 0x0, 0xe3f,
+ 0xe40, 0xe41, 0xe42, 0xe43, 0xe44, 0xe45, 0xe46, 0xe47,
+ 0xe48, 0xe49, 0xe4a, 0xe4b, 0xe4c, 0xe4d, 0xe4e, 0xe4f,
+ 0xe50, 0xe51, 0xe52, 0xe53, 0xe54, 0xe55, 0xe56, 0xe57,
+ 0xe58, 0xe59, 0xe5a, 0xe5b, 0xfc, 0xfd, 0xfe, 0xff },
+ /* CP1125 */
+ { 0x410, 0x411, 0x412, 0x413, 0x414, 0x415, 0x416, 0x417,
+ 0x418, 0x419, 0x41a, 0x41b, 0x41c, 0x41d, 0x41e, 0x41f,
+ 0x420, 0x421, 0x422, 0x423, 0x424, 0x425, 0x426, 0x427,
+ 0x428, 0x429, 0x42a, 0x42b, 0x42c, 0x42d, 0x42e, 0x42f,
+ 0x430, 0x431, 0x432, 0x433, 0x434, 0x435, 0x436, 0x437,
+ 0x438, 0x439, 0x43a, 0x43b, 0x43c, 0x43d, 0x43e, 0x43f,
+ 0x2591, 0x2592, 0x2593, 0x2502, 0x2524, 0x2561, 0x2562, 0x2556,
+ 0x2555, 0x2563, 0x2551, 0x2557, 0x255d, 0x255c, 0x255b, 0x2510,
+ 0x2514, 0x2534, 0x252c, 0x251c, 0x2500, 0x253c, 0x255e, 0x255f,
+ 0x255a, 0x2554, 0x2569, 0x2566, 0x2560, 0x2550, 0x256c, 0x2567,
+ 0x2568, 0x2564, 0x2565, 0x2559, 0x2558, 0x2552, 0x2553, 0x256b,
+ 0x256a, 0x2518, 0x250c, 0x2588, 0x2584, 0x258c, 0x2590, 0x2580,
+ 0x440, 0x441, 0x442, 0x443, 0x444, 0x445, 0x446, 0x447,
+ 0x448, 0x449, 0x44a, 0x44b, 0x44c, 0x44d, 0x44e, 0x44f,
+ 0x401, 0x451, 0x490, 0x491, 0x404, 0x454, 0x406, 0x456,
+ 0x407, 0x457, 0xb7, 0x221a, 0x2116, 0xa4, 0x25a0, 0xa0 },
+ /* CP1250 */
+ { 0x20ac, 0x0, 0x201a, 0x0, 0x201e, 0x2026, 0x2020, 0x2021,
+ 0x0, 0x2030, 0x160, 0x2039, 0x15a, 0x164, 0x17d, 0x179,
+ 0x0, 0x2018, 0x2019, 0x201c, 0x201d, 0x2022, 0x2013, 0x2014,
+ 0x0, 0x2122, 0x161, 0x203a, 0x15b, 0x165, 0x17e, 0x17a,
+ 0xa0, 0x2c7, 0x2d8, 0x141, 0xa4, 0x104, 0xa6, 0xa7,
+ 0xa8, 0xa9, 0x15e, 0xab, 0xac, 0xad, 0xae, 0x17b,
+ 0xb0, 0xb1, 0x2db, 0x142, 0xb4, 0xb5, 0xb6, 0xb7,
+ 0xb8, 0x105, 0x15f, 0xbb, 0x13d, 0x2dd, 0x13e, 0x17c,
+ 0x154, 0xc1, 0xc2, 0x102, 0xc4, 0x139, 0x106, 0xc7,
+ 0x10c, 0xc9, 0x118, 0xcb, 0x11a, 0xcd, 0xce, 0x10e,
+ 0x110, 0x143, 0x147, 0xd3, 0xd4, 0x150, 0xd6, 0xd7,
+ 0x158, 0x16e, 0xda, 0x170, 0xdc, 0xdd, 0x162, 0xdf,
+ 0x155, 0xe1, 0xe2, 0x103, 0xe4, 0x13a, 0x107, 0xe7,
+ 0x10d, 0xe9, 0x119, 0xeb, 0x11b, 0xed, 0xee, 0x10f,
+ 0x111, 0x144, 0x148, 0xf3, 0xf4, 0x151, 0xf6, 0xf7,
+ 0x159, 0x16f, 0xfa, 0x171, 0xfc, 0xfd, 0x163, 0x2d9 },
+ /* CP1251 */
+ { 0x402, 0x403, 0x201a, 0x453, 0x201e, 0x2026, 0x2020, 0x2021,
+ 0x20ac, 0x2030, 0x409, 0x2039, 0x40a, 0x40c, 0x40b, 0x40f,
+ 0x452, 0x2018, 0x2019, 0x201c, 0x201d, 0x2022, 0x2013, 0x2014,
+ 0x0, 0x2122, 0x459, 0x203a, 0x45a, 0x45c, 0x45b, 0x45f,
+ 0xa0, 0x40e, 0x45e, 0x408, 0xa4, 0x490, 0xa6, 0xa7,
+ 0x401, 0xa9, 0x404, 0xab, 0xac, 0xad, 0xae, 0x407,
+ 0xb0, 0xb1, 0x406, 0x456, 0x491, 0xb5, 0xb6, 0xb7,
+ 0x451, 0x2116, 0x454, 0xbb, 0x458, 0x405, 0x455, 0x457,
+ 0x410, 0x411, 0x412, 0x413, 0x414, 0x415, 0x416, 0x417,
+ 0x418, 0x419, 0x41a, 0x41b, 0x41c, 0x41d, 0x41e, 0x41f,
+ 0x420, 0x421, 0x422, 0x423, 0x424, 0x425, 0x426, 0x427,
+ 0x428, 0x429, 0x42a, 0x42b, 0x42c, 0x42d, 0x42e, 0x42f,
+ 0x430, 0x431, 0x432, 0x433, 0x434, 0x435, 0x436, 0x437,
+ 0x438, 0x439, 0x43a, 0x43b, 0x43c, 0x43d, 0x43e, 0x43f,
+ 0x440, 0x441, 0x442, 0x443, 0x444, 0x445, 0x446, 0x447,
+ 0x448, 0x449, 0x44a, 0x44b, 0x44c, 0x44d, 0x44e, 0x44f },
+ /* CP1252 */
+ { 0x20ac, 0x0, 0x201a, 0x192, 0x201e, 0x2026, 0x2020, 0x2021,
+ 0x2c6, 0x2030, 0x160, 0x2039, 0x152, 0x0, 0x17d, 0x0,
+ 0x0, 0x2018, 0x2019, 0x201c, 0x201d, 0x2022, 0x2013, 0x2014,
+ 0x2dc, 0x2122, 0x161, 0x203a, 0x153, 0x0, 0x17e, 0x178,
+ 0xa0, 0xa1, 0xa2, 0xa3, 0xa4, 0xa5, 0xa6, 0xa7,
+ 0xa8, 0xa9, 0xaa, 0xab, 0xac, 0xad, 0xae, 0xaf,
+ 0xb0, 0xb1, 0xb2, 0xb3, 0xb4, 0xb5, 0xb6, 0xb7,
+ 0xb8, 0xb9, 0xba, 0xbb, 0xbc, 0xbd, 0xbe, 0xbf,
+ 0xc0, 0xc1, 0xc2, 0xc3, 0xc4, 0xc5, 0xc6, 0xc7,
+ 0xc8, 0xc9, 0xca, 0xcb, 0xcc, 0xcd, 0xce, 0xcf,
+ 0xd0, 0xd1, 0xd2, 0xd3, 0xd4, 0xd5, 0xd6, 0xd7,
+ 0xd8, 0xd9, 0xda, 0xdb, 0xdc, 0xdd, 0xde, 0xdf,
+ 0xe0, 0xe1, 0xe2, 0xe3, 0xe4, 0xe5, 0xe6, 0xe7,
+ 0xe8, 0xe9, 0xea, 0xeb, 0xec, 0xed, 0xee, 0xef,
+ 0xf0, 0xf1, 0xf2, 0xf3, 0xf4, 0xf5, 0xf6, 0xf7,
+ 0xf8, 0xf9, 0xfa, 0xfb, 0xfc, 0xfd, 0xfe, 0xff },
+ /* CP1253 */
+ { 0x20ac, 0x0, 0x201a, 0x192, 0x201e, 0x2026, 0x2020, 0x2021,
+ 0x0, 0x2030, 0x0, 0x2039, 0x0, 0x0, 0x0, 0x0,
+ 0x0, 0x2018, 0x2019, 0x201c, 0x201d, 0x2022, 0x2013, 0x2014,
+ 0x0, 0x2122, 0x0, 0x203a, 0x0, 0x0, 0x0, 0x0,
+ 0xa0, 0x385, 0x386, 0xa3, 0xa4, 0xa5, 0xa6, 0xa7,
+ 0xa8, 0xa9, 0x0, 0xab, 0xac, 0xad, 0xae, 0x2015,
+ 0xb0, 0xb1, 0xb2, 0xb3, 0x384, 0xb5, 0xb6, 0xb7,
+ 0x388, 0x389, 0x38a, 0xbb, 0x38c, 0xbd, 0x38e, 0x38f,
+ 0x390, 0x391, 0x392, 0x393, 0x394, 0x395, 0x396, 0x397,
+ 0x398, 0x399, 0x39a, 0x39b, 0x39c, 0x39d, 0x39e, 0x39f,
+ 0x3a0, 0x3a1, 0x0, 0x3a3, 0x3a4, 0x3a5, 0x3a6, 0x3a7,
+ 0x3a8, 0x3a9, 0x3aa, 0x3ab, 0x3ac, 0x3ad, 0x3ae, 0x3af,
+ 0x3b0, 0x3b1, 0x3b2, 0x3b3, 0x3b4, 0x3b5, 0x3b6, 0x3b7,
+ 0x3b8, 0x3b9, 0x3ba, 0x3bb, 0x3bc, 0x3bd, 0x3be, 0x3bf,
+ 0x3c0, 0x3c1, 0x3c2, 0x3c3, 0x3c4, 0x3c5, 0x3c6, 0x3c7,
+ 0x3c8, 0x3c9, 0x3ca, 0x3cb, 0x3cc, 0x3cd, 0x3ce, 0xff },
+ /* CP1254 */
+ { 0x20ac, 0x0, 0x201a, 0x192, 0x201e, 0x2026, 0x2020, 0x2021,
+ 0x2c6, 0x2030, 0x160, 0x2039, 0x152, 0x0, 0x0, 0x0,
+ 0x0, 0x2018, 0x2019, 0x201c, 0x201d, 0x2022, 0x2013, 0x2014,
+ 0x2dc, 0x2122, 0x161, 0x203a, 0x153, 0x0, 0x0, 0x178,
+ 0xa0, 0xa1, 0xa2, 0xa3, 0xa4, 0xa5, 0xa6, 0xa7,
+ 0xa8, 0xa9, 0xaa, 0xab, 0xac, 0xad, 0xae, 0xaf,
+ 0xb0, 0xb1, 0xb2, 0xb3, 0xb4, 0xb5, 0xb6, 0xb7,
+ 0xb8, 0xb9, 0xba, 0xbb, 0xbc, 0xbd, 0xbe, 0xbf,
+ 0xc0, 0xc1, 0xc2, 0xc3, 0xc4, 0xc5, 0xc6, 0xc7,
+ 0xc8, 0xc9, 0xca, 0xcb, 0xcc, 0xcd, 0xce, 0xcf,
+ 0x11e, 0xd1, 0xd2, 0xd3, 0xd4, 0xd5, 0xd6, 0xd7,
+ 0xd8, 0xd9, 0xda, 0xdb, 0xdc, 0x130, 0x15e, 0xdf,
+ 0xe0, 0xe1, 0xe2, 0xe3, 0xe4, 0xe5, 0xe6, 0xe7,
+ 0xe8, 0xe9, 0xea, 0xeb, 0xec, 0xed, 0xee, 0xef,
+ 0x11f, 0xf1, 0xf2, 0xf3, 0xf4, 0xf5, 0xf6, 0xf7,
+ 0xf8, 0xf9, 0xfa, 0xfb, 0xfc, 0x131, 0x15f, 0xff },
+ /* CP1255 */
+ { 0x20ac, 0x0, 0x201a, 0x192, 0x201e, 0x2026, 0x2020, 0x2021,
+ 0x2c6, 0x2030, 0x0, 0x2039, 0x0, 0x0, 0x0, 0x0,
+ 0x0, 0x2018, 0x2019, 0x201c, 0x201d, 0x2022, 0x2013, 0x2014,
+ 0x2dc, 0x2122, 0x0, 0x203a, 0x0, 0x0, 0x0, 0x0,
+ 0xa0, 0xa1, 0xa2, 0xa3, 0x20aa, 0xa5, 0xa6, 0xa7,
+ 0xa8, 0xa9, 0xd7, 0xab, 0xac, 0xad, 0xae, 0xaf,
+ 0xb0, 0xb1, 0xb2, 0xb3, 0xb4, 0xb5, 0xb6, 0xb7,
+ 0xb8, 0xb9, 0xf7, 0xbb, 0xbc, 0xbd, 0xbe, 0xbf,
+ 0x5b0, 0x5b1, 0x5b2, 0x5b3, 0x5b4, 0x5b5, 0x5b6, 0x5b7,
+ 0x5b8, 0x5b9, 0x0, 0x5bb, 0x5bc, 0x5bd, 0x5be, 0x5bf,
+ 0x5c0, 0x5c1, 0x5c2, 0x5c3, 0x5f0, 0x5f1, 0x5f2, 0x5f3,
+ 0x5f4, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
+ 0x5d0, 0x5d1, 0x5d2, 0x5d3, 0x5d4, 0x5d5, 0x5d6, 0x5d7,
+ 0x5d8, 0x5d9, 0x5da, 0x5db, 0x5dc, 0x5dd, 0x5de, 0x5df,
+ 0x5e0, 0x5e1, 0x5e2, 0x5e3, 0x5e4, 0x5e5, 0x5e6, 0x5e7,
+ 0x5e8, 0x5e9, 0x5ea, 0x0, 0x0, 0x200e, 0x200f, 0xff },
+ /* CP1256 */
+ { 0x20ac, 0x67e, 0x201a, 0x192, 0x201e, 0x2026, 0x2020, 0x2021,
+ 0x2c6, 0x2030, 0x679, 0x2039, 0x152, 0x686, 0x698, 0x688,
+ 0x6af, 0x2018, 0x2019, 0x201c, 0x201d, 0x2022, 0x2013, 0x2014,
+ 0x6a9, 0x2122, 0x691, 0x203a, 0x153, 0x200c, 0x200d, 0x6ba,
+ 0xa0, 0x60c, 0xa2, 0xa3, 0xa4, 0xa5, 0xa6, 0xa7,
+ 0xa8, 0xa9, 0x6be, 0xab, 0xac, 0xad, 0xae, 0xaf,
+ 0xb0, 0xb1, 0xb2, 0xb3, 0xb4, 0xb5, 0xb6, 0xb7,
+ 0xb8, 0xb9, 0x61b, 0xbb, 0xbc, 0xbd, 0xbe, 0x61f,
+ 0x6c1, 0x621, 0x622, 0x623, 0x624, 0x625, 0x626, 0x627,
+ 0x628, 0x629, 0x62a, 0x62b, 0x62c, 0x62d, 0x62e, 0x62f,
+ 0x630, 0x631, 0x632, 0x633, 0x634, 0x635, 0x636, 0xd7,
+ 0x637, 0x638, 0x639, 0x63a, 0x640, 0x641, 0x642, 0x643,
+ 0xe0, 0x644, 0xe2, 0x645, 0x646, 0x647, 0x648, 0xe7,
+ 0xe8, 0xe9, 0xea, 0xeb, 0x649, 0x64a, 0xee, 0xef,
+ 0x64b, 0x64c, 0x64d, 0x64e, 0xf4, 0x64f, 0x650, 0xf7,
+ 0x651, 0xf9, 0x652, 0xfb, 0xfc, 0x200e, 0x200f, 0x6d2 },
+ /* CP1257 */
+ { 0x20ac, 0x0, 0x201a, 0x0, 0x201e, 0x2026, 0x2020, 0x2021,
+ 0x0, 0x2030, 0x0, 0x2039, 0x0, 0xa8, 0x2c7, 0xb8,
+ 0x0, 0x2018, 0x2019, 0x201c, 0x201d, 0x2022, 0x2013, 0x2014,
+ 0x0, 0x2122, 0x0, 0x203a, 0x0, 0xaf, 0x2db, 0x0,
+ 0xa0, 0x0, 0xa2, 0xa3, 0xa4, 0x0, 0xa6, 0xa7,
+ 0xd8, 0xa9, 0x156, 0xab, 0xac, 0xad, 0xae, 0xc6,
+ 0xb0, 0xb1, 0xb2, 0xb3, 0xb4, 0xb5, 0xb6, 0xb7,
+ 0xf8, 0xb9, 0x157, 0xbb, 0xbc, 0xbd, 0xbe, 0xe6,
+ 0x104, 0x12e, 0x100, 0x106, 0xc4, 0xc5, 0x118, 0x112,
+ 0x10c, 0xc9, 0x179, 0x116, 0x122, 0x136, 0x12a, 0x13b,
+ 0x160, 0x143, 0x145, 0xd3, 0x14c, 0xd5, 0xd6, 0xd7,
+ 0x172, 0x141, 0x15a, 0x16a, 0xdc, 0x17b, 0x17d, 0xdf,
+ 0x105, 0x12f, 0x101, 0x107, 0xe4, 0xe5, 0x119, 0x113,
+ 0x10d, 0xe9, 0x17a, 0x117, 0x123, 0x137, 0x12b, 0x13c,
+ 0x161, 0x144, 0x146, 0xf3, 0x14d, 0xf5, 0xf6, 0xf7,
+ 0x173, 0x142, 0x15b, 0x16b, 0xfc, 0x17c, 0x17e, 0x2d9 },
+ /* CP1258 */
+ { 0x20ac, 0x0, 0x201a, 0x192, 0x201e, 0x2026, 0x2020, 0x2021,
+ 0x2c6, 0x2030, 0x0, 0x2039, 0x152, 0x0, 0x0, 0x0,
+ 0x0, 0x2018, 0x2019, 0x201c, 0x201d, 0x2022, 0x2013, 0x2014,
+ 0x2dc, 0x2122, 0x0, 0x203a, 0x153, 0x0, 0x0, 0x178,
+ 0xa0, 0xa1, 0xa2, 0xa3, 0xa4, 0xa5, 0xa6, 0xa7,
+ 0xa8, 0xa9, 0xaa, 0xab, 0xac, 0xad, 0xae, 0xaf,
+ 0xb0, 0xb1, 0xb2, 0xb3, 0xb4, 0xb5, 0xb6, 0xb7,
+ 0xb8, 0xb9, 0xba, 0xbb, 0xbc, 0xbd, 0xbe, 0xbf,
+ 0xc0, 0xc1, 0xc2, 0x102, 0xc4, 0xc5, 0xc6, 0xc7,
+ 0xc8, 0xc9, 0xca, 0xcb, 0x300, 0xcd, 0xce, 0xcf,
+ 0x110, 0xd1, 0x309, 0xd3, 0xd4, 0x1a0, 0xd6, 0xd7,
+ 0xd8, 0xd9, 0xda, 0xdb, 0xdc, 0x1af, 0x303, 0xdf,
+ 0xe0, 0xe1, 0xe2, 0x103, 0xe4, 0xe5, 0xe6, 0xe7,
+ 0xe8, 0xe9, 0xea, 0xeb, 0x301, 0xed, 0xee, 0xef,
+ 0x111, 0xf1, 0x323, 0xf3, 0xf4, 0x1a1, 0xf6, 0xf7,
+ 0xf8, 0xf9, 0xfa, 0xfb, 0xfc, 0x1b0, 0x20ab, 0xff }
+};
+#endif /* _MB_EXTENDED_CHARSETS_DOS */
+
+/* Handle one to five decimal digits. Return -1 in any other case. */
+static int
+__micro_atoi (const char *s)
+{
+ int ret = 0;
+
+ if (!*s)
+ return -1;
+ while (*s)
+ {
+ if (*s < '0' || *s > '9' || ret >= 10000)
+ return -1;
+ ret = 10 * ret + (*s++ - '0');
+ }
+ return ret;
+}
+
+#ifdef _MB_EXTENDED_CHARSETS_ISO
+int
+__iso_8859_index (const char *charset_ext)
+{
+ int iso_idx = __micro_atoi (charset_ext);
+ if (iso_idx >= 2 && iso_idx <= 16)
+ {
+ iso_idx -= 2;
+ if (iso_idx > 10)
+ --iso_idx;
+ return iso_idx;
+ }
+ return -1;
+}
+#endif /* _MB_EXTENDED_CHARSETS_ISO */
+
+#ifdef _MB_EXTENDED_CHARSETS_DOS
+int
+__cp_index (const char *charset_ext)
+{
+ int cp_idx = __micro_atoi (charset_ext);
+ switch (cp_idx)
+ {
+ case 437:
+ cp_idx = 0;
+ break;
+ case 720:
+ cp_idx = 1;
+ break;
+ case 737:
+ cp_idx = 2;
+ break;
+ case 775:
+ cp_idx = 3;
+ break;
+ case 850:
+ cp_idx = 4;
+ break;
+ case 852:
+ cp_idx = 5;
+ break;
+ case 855:
+ cp_idx = 6;
+ break;
+ case 857:
+ cp_idx = 7;
+ break;
+ case 858:
+ cp_idx = 8;
+ break;
+ case 862:
+ cp_idx = 9;
+ break;
+ case 866:
+ cp_idx = 10;
+ break;
+ case 874:
+ cp_idx = 11;
+ break;
+ case 1125:
+ cp_idx = 12;
+ break;
+ case 1250:
+ cp_idx = 13;
+ break;
+ case 1251:
+ cp_idx = 14;
+ break;
+ case 1252:
+ cp_idx = 15;
+ break;
+ case 1253:
+ cp_idx = 16;
+ break;
+ case 1254:
+ cp_idx = 17;
+ break;
+ case 1255:
+ cp_idx = 18;
+ break;
+ case 1256:
+ cp_idx = 19;
+ break;
+ case 1257:
+ cp_idx = 20;
+ break;
+ case 1258:
+ cp_idx = 21;
+ break;
+ default:
+ cp_idx = -1;
+ break;
+ }
+ return cp_idx;
+}
+#endif /* _MB_EXTENDED_CHARSETS_DOS */
+#endif /* _MB_CAPABLE */
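
For readers following along: the two index helpers above turn the numeric tail of a charset name into a row number in the conversion tables. The callers later in the patch pass charset + 9 (skipping "ISO-8859-") and charset + 2 (skipping "CP"). Below is a minimal standalone sketch of the ISO case. The function names with the _sketch suffix are made up for illustration and are not part of the patch; the arithmetic is copied from __micro_atoi and __iso_8859_index. Note that, if I read the arithmetic right, the suffix "12" lands on the same row as "13", since there is no ISO-8859-12 table and the gap is closed by the decrement.

```c
/* Hypothetical stand-in for __micro_atoi: accepts one to five decimal
   digits and returns -1 for anything else. */
static int
micro_atoi_sketch (const char *s)
{
  int ret = 0;

  if (!*s)
    return -1;                  /* empty string */
  while (*s)
    {
      /* Reject non-digits, and reject a sixth digit (ret already has
         five digits once it reaches 10000). */
      if (*s < '0' || *s > '9' || ret >= 10000)
        return -1;
      ret = 10 * ret + (*s++ - '0');
    }
  return ret;
}

/* Hypothetical stand-in for __iso_8859_index: maps the numeric suffix
   of "ISO-8859-N" to a table row.  Rows start at ISO-8859-2, and the
   row count skips the nonexistent ISO-8859-12. */
static int
iso_8859_index_sketch (const char *charset_ext)
{
  int iso_idx = micro_atoi_sketch (charset_ext);

  if (iso_idx >= 2 && iso_idx <= 16)
    {
      iso_idx -= 2;             /* ISO-8859-2 is row 0 */
      if (iso_idx > 10)
        --iso_idx;              /* close the gap left by number 12 */
      return iso_idx;
    }
  return -1;
}
```

With these definitions "2" maps to row 0, "11" to row 9, "13" to row 10 and "16" to row 13, while "1" and "17" are rejected with -1.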
Index: libc/stdlib/wctomb_r.c
===================================================================
RCS file: /cvs/src/src/newlib/libc/stdlib/wctomb_r.c,v
retrieving revision 1.12
diff -u -p -r1.12 wctomb_r.c
--- libc/stdlib/wctomb_r.c 19 Mar 2009 19:47:52 -0000 1.12
+++ libc/stdlib/wctomb_r.c 22 Mar 2009 16:25:07 -0000
@@ -4,209 +4,338 @@
#include <wchar.h>
#include <locale.h>
#include "mbctype.h"
+#include "local.h"
-extern char *__locale_charset ();
+int (*__wctomb) (struct _reent *, char *, wchar_t, const char *charset,
+ mbstate_t *)
+ = __ascii_wctomb;
+#ifdef _MB_CAPABLE
/* for some conversions, we use the __count field as a place to store a state value */
#define __state __count
int
-_DEFUN (_wctomb_r, (r, s, wchar, state),
- struct _reent *r _AND
- char *s _AND
- wchar_t _wchar _AND
+_DEFUN (__utf8_wctomb, (r, s, wchar, charset, state),
+ struct _reent *r _AND
+ char *s _AND
+ wchar_t _wchar _AND
+ const char *charset _AND
mbstate_t *state)
{
- /* Avoids compiler warnings about comparisons that are always false
- due to limited range when sizeof(wchar_t) is 2 but sizeof(wint_t)
- is 4, as is the case on cygwin. */
wint_t wchar = _wchar;
- if (strlen (__locale_charset ()) <= 1)
- { /* fall-through */ }
- else if (!strcmp (__locale_charset (), "UTF-8"))
- {
- if (s == NULL)
- return 0; /* UTF-8 encoding is not state-dependent */
+ if (s == NULL)
+ return 0; /* UTF-8 encoding is not state-dependent */
- if (state->__count == -4 && (wchar < 0xdc00 || wchar >= 0xdfff))
+ if (state->__count == -4 && (wchar < 0xdc00 || wchar >= 0xdfff))
+ {
+ /* At this point only the second half of a surrogate pair is valid. */
+ r->_errno = EILSEQ;
+ return -1;
+ }
+ if (wchar <= 0x7f)
+ {
+ *s = wchar;
+ return 1;
+ }
+ if (wchar >= 0x80 && wchar <= 0x7ff)
+ {
+ *s++ = 0xc0 | ((wchar & 0x7c0) >> 6);
+ *s = 0x80 | (wchar & 0x3f);
+ return 2;
+ }
+ if (wchar >= 0x800 && wchar <= 0xffff)
+ {
+ if (wchar >= 0xd800 && wchar <= 0xdfff)
{
- /* At this point only the second half of a surrogate pair is valid. */
- r->_errno = EILSEQ;
- return -1;
- }
- if (wchar <= 0x7f)
- {
- *s = wchar;
- return 1;
- }
- else if (wchar >= 0x80 && wchar <= 0x7ff)
- {
- *s++ = 0xc0 | ((wchar & 0x7c0) >> 6);
- *s = 0x80 | (wchar & 0x3f);
- return 2;
- }
- else if (wchar >= 0x800 && wchar <= 0xffff)
- {
- if (wchar >= 0xd800 && wchar <= 0xdfff)
+ wint_t tmp;
+ /* UTF-16 surrogates -- must not occur in normal UCS-4 data */
+ if (sizeof (wchar_t) != 2)
+ {
+ r->_errno = EILSEQ;
+ return -1;
+ }
+ if (wchar >= 0xdc00)
{
- wint_t tmp;
- /* UTF-16 surrogates -- must not occur in normal UCS-4 data */
- if (sizeof (wchar_t) != 2)
+ /* Second half of a surrogate pair. It's not valid if
+ we don't have already read a first half of a surrogate
+ before. */
+ if (state->__count != -4)
{
r->_errno = EILSEQ;
return -1;
}
- if (wchar >= 0xdc00)
- {
- /* Second half of a surrogate pair. It's not valid if
- we don't have already read a first half of a surrogate
- before. */
- if (state->__count != -4)
- {
- r->_errno = EILSEQ;
- return -1;
- }
- /* If it's valid, reconstruct the full Unicode value and
- return the trailing three bytes of the UTF-8 char. */
- tmp = (state->__value.__wchb[0] << 16)
- | (state->__value.__wchb[1] << 8)
- | (wchar & 0x3ff);
- state->__count = 0;
- *s++ = 0x80 | ((tmp & 0x3f000) >> 12);
- *s++ = 0x80 | ((tmp & 0xfc0) >> 6);
- *s = 0x80 | (tmp & 0x3f);
- return 3;
- }
- /* First half of a surrogate pair. Store the state and return
- the first byte of the UTF-8 char. */
- tmp = ((wchar & 0x3ff) << 10) + 0x10000;
- state->__value.__wchb[0] = (tmp >> 16) & 0xff;
- state->__value.__wchb[1] = (tmp >> 8) & 0xff;
- state->__count = -4;
- *s = (0xf0 | ((tmp & 0x1c0000) >> 18));
- return 1;
+ /* If it's valid, reconstruct the full Unicode value and
+ return the trailing three bytes of the UTF-8 char. */
+ tmp = (state->__value.__wchb[0] << 16)
+ | (state->__value.__wchb[1] << 8)
+ | (wchar & 0x3ff);
+ state->__count = 0;
+ *s++ = 0x80 | ((tmp & 0x3f000) >> 12);
+ *s++ = 0x80 | ((tmp & 0xfc0) >> 6);
+ *s = 0x80 | (tmp & 0x3f);
+ return 3;
}
- *s++ = 0xe0 | ((wchar & 0xf000) >> 12);
- *s++ = 0x80 | ((wchar & 0xfc0) >> 6);
- *s = 0x80 | (wchar & 0x3f);
- return 3;
- }
- else if (wchar >= 0x10000 && wchar <= 0x10ffff)
- {
- *s++ = 0xf0 | ((wchar & 0x1c0000) >> 18);
- *s++ = 0x80 | ((wchar & 0x3f000) >> 12);
- *s++ = 0x80 | ((wchar & 0xfc0) >> 6);
- *s = 0x80 | (wchar & 0x3f);
- return 4;
- }
+ /* First half of a surrogate pair. Store the state and return
+ the first byte of the UTF-8 char. */
+ tmp = ((wchar & 0x3ff) << 10) + 0x10000;
+ state->__value.__wchb[0] = (tmp >> 16) & 0xff;
+ state->__value.__wchb[1] = (tmp >> 8) & 0xff;
+ state->__count = -4;
+ *s = (0xf0 | ((tmp & 0x1c0000) >> 18));
+ return 1;
+ }
+ *s++ = 0xe0 | ((wchar & 0xf000) >> 12);
+ *s++ = 0x80 | ((wchar & 0xfc0) >> 6);
+ *s = 0x80 | (wchar & 0x3f);
+ return 3;
+ }
+ if (wchar >= 0x10000 && wchar <= 0x10ffff)
+ {
+ *s++ = 0xf0 | ((wchar & 0x1c0000) >> 18);
+ *s++ = 0x80 | ((wchar & 0x3f000) >> 12);
+ *s++ = 0x80 | ((wchar & 0xfc0) >> 6);
+ *s = 0x80 | (wchar & 0x3f);
+ return 4;
+ }
+
+ r->_errno = EILSEQ;
+ return -1;
+}
+
+/* Cygwin defines its own doublebyte charset conversion functions
+ because the underlying OS requires wchar_t == UTF-16. */
+#ifndef __CYGWIN__
+int
+_DEFUN (__sjis_wctomb, (r, s, wchar, charset, state),
+ struct _reent *r _AND
+ char *s _AND
+ wchar_t _wchar _AND
+ const char *charset _AND
+ mbstate_t *state)
+{
+ wint_t wchar = _wchar;
+
+ unsigned char char2 = (unsigned char)wchar;
+ unsigned char char1 = (unsigned char)(wchar >> 8);
+
+ if (s == NULL)
+ return 0; /* not state-dependent */
+
+ if (char1 != 0x00)
+ {
+ /* first byte is non-zero..validate multi-byte char */
+ if (_issjis1(char1) && _issjis2(char2))
+ {
+ *s++ = (char)char1;
+ *s = (char)char2;
+ return 2;
+ }
else
{
r->_errno = EILSEQ;
return -1;
}
}
- else if (!strcmp (__locale_charset (), "SJIS"))
+ *s = (char) wchar;
+ return 1;
+}
+
+int
+_DEFUN (__eucjp_wctomb, (r, s, wchar, charset, state),
+ struct _reent *r _AND
+ char *s _AND
+ wchar_t _wchar _AND
+ const char *charset _AND
+ mbstate_t *state)
+{
+ wint_t wchar = _wchar;
+ unsigned char char2 = (unsigned char)wchar;
+ unsigned char char1 = (unsigned char)(wchar >> 8);
+
+ if (s == NULL)
+ return 0; /* not state-dependent */
+
+ if (char1 != 0x00)
{
- unsigned char char2 = (unsigned char)wchar;
- unsigned char char1 = (unsigned char)(wchar >> 8);
+ /* first byte is non-zero..validate multi-byte char */
+ if (_iseucjp (char1) && _iseucjp (char2))
+ {
+ *s++ = (char)char1;
+ *s = (char)char2;
+ return 2;
+ }
+ else
+ {
+ r->_errno = EILSEQ;
+ return -1;
+ }
+ }
+ *s = (char) wchar;
+ return 1;
+}
- if (s == NULL)
- return 0; /* not state-dependent */
+int
+_DEFUN (__jis_wctomb, (r, s, wchar, charset, state),
+ struct _reent *r _AND
+ char *s _AND
+ wchar_t _wchar _AND
+ const char *charset _AND
+ mbstate_t *state)
+{
+ wint_t wchar = _wchar;
+ int cnt = 0;
+ unsigned char char2 = (unsigned char)wchar;
+ unsigned char char1 = (unsigned char)(wchar >> 8);
- if (char1 != 0x00)
- {
- /* first byte is non-zero..validate multi-byte char */
- if (_issjis1(char1) && _issjis2(char2))
- {
- *s++ = (char)char1;
- *s = (char)char2;
- return 2;
- }
- else
+ if (s == NULL)
+ return 1; /* state-dependent */
+
+ if (char1 != 0x00)
+ {
+ /* first byte is non-zero..validate multi-byte char */
+ if (_isjis (char1) && _isjis (char2))
+ {
+ if (state->__state == 0)
{
- r->_errno = EILSEQ;
- return -1;
+ /* must switch from ASCII to JIS state */
+ state->__state = 1;
+ *s++ = ESC_CHAR;
+ *s++ = '$';
+ *s++ = 'B';
+ cnt = 3;
}
- }
+ *s++ = (char)char1;
+ *s = (char)char2;
+ return cnt + 2;
+ }
+ r->_errno = EILSEQ;
+ return -1;
}
- else if (!strcmp (__locale_charset (), "EUCJP"))
+ if (state->__state != 0)
{
- unsigned char char2 = (unsigned char)wchar;
- unsigned char char1 = (unsigned char)(wchar >> 8);
+ /* must switch from JIS to ASCII state */
+ state->__state = 0;
+ *s++ = ESC_CHAR;
+ *s++ = '(';
+ *s++ = 'B';
+ cnt = 3;
+ }
+ *s = (char)char2;
+ return cnt + 1;
+}
+#endif /* !__CYGWIN__ */
- if (s == NULL)
- return 0; /* not state-dependent */
-
- if (char1 != 0x00)
- {
- /* first byte is non-zero..validate multi-byte char */
- if (_iseucjp (char1) && _iseucjp (char2))
- {
- *s++ = (char)char1;
- *s = (char)char2;
- return 2;
- }
- else
- {
- r->_errno = EILSEQ;
- return -1;
- }
- }
+#ifdef _MB_EXTENDED_CHARSETS_ISO
+int
+_DEFUN (__iso_wctomb, (r, s, wchar, charset, state),
+ struct _reent *r _AND
+ char *s _AND
+ wchar_t _wchar _AND
+ const char *charset _AND
+ mbstate_t *state)
+{
+ wint_t wchar = _wchar;
+
+ if (s == NULL)
+ return 0;
+
+ /* wchars <= 0x9f translate to all ISO charsets directly. */
+ if (wchar >= 0xa0)
+ {
+ int iso_idx = __iso_8859_index (charset + 9);
+ if (iso_idx >= 0)
+ {
+ unsigned char mb;
+
+ if (s == NULL)
+ return 0;
+
+ for (mb = 0; mb < 0x60; ++mb)
+ if (__iso_8859_conv[iso_idx][mb] == wchar)
+ {
+ *s = (char) (mb + 0xa0);
+ return 1;
+ }
+ r->_errno = EILSEQ;
+ return -1;
+ }
}
- else if (!strcmp (__locale_charset (), "JIS"))
+
+ if ((size_t)wchar >= 0x100)
{
- int cnt = 0;
- unsigned char char2 = (unsigned char)wchar;
- unsigned char char1 = (unsigned char)(wchar >> 8);
-
- if (s == NULL)
- return 1; /* state-dependent */
-
- if (char1 != 0x00)
- {
- /* first byte is non-zero..validate multi-byte char */
- if (_isjis (char1) && _isjis (char2))
- {
- if (state->__state == 0)
- {
- /* must switch from ASCII to JIS state */
- state->__state = 1;
- *s++ = ESC_CHAR;
- *s++ = '$';
- *s++ = 'B';
- cnt = 3;
- }
- *s++ = (char)char1;
- *s = (char)char2;
- return cnt + 2;
- }
- else
- {
- r->_errno = EILSEQ;
- return -1;
- }
- }
- else
- {
- if (state->__state != 0)
- {
- /* must switch from JIS to ASCII state */
- state->__state = 0;
- *s++ = ESC_CHAR;
- *s++ = '(';
- *s++ = 'B';
- cnt = 3;
- }
- *s = (char)char2;
- return cnt + 1;
- }
+ r->_errno = EILSEQ;
+ return -1;
+ }
+
+ *s = (char) wchar;
+ return 1;
+}
+#endif /* _MB_EXTENDED_CHARSETS_ISO */
+
+#ifdef _MB_EXTENDED_CHARSETS_DOS
+int
+_DEFUN (__cp_wctomb, (r, s, wchar, charset, state),
+ struct _reent *r _AND
+ char *s _AND
+ wchar_t _wchar _AND
+ const char *charset _AND
+ mbstate_t *state)
+{
+ wint_t wchar = _wchar;
+
+ if (s == NULL)
+ return 0;
+
+ if (wchar >= 0x80)
+ {
+ int cp_idx = __cp_index (charset + 2);
+ if (cp_idx >= 0)
+ {
+ unsigned char mb;
+
+ if (s == NULL)
+ return 0;
+
+ for (mb = 0; mb < 0x80; ++mb)
+ if (__cp_conv[cp_idx][mb] == wchar)
+ {
+ *s = (char) (mb + 0x80);
+ return 1;
+ }
+ r->_errno = EILSEQ;
+ return -1;
+ }
+ }
+
+ if ((size_t)wchar >= 0x100)
+ {
+ r->_errno = EILSEQ;
+ return -1;
}
+ *s = (char) wchar;
+ return 1;
+}
+#endif /* _MB_EXTENDED_CHARSETS_DOS */
+#endif /* _MB_CAPABLE */
+
+int
+_DEFUN (__ascii_wctomb, (r, s, wchar, charset, state),
+ struct _reent *r _AND
+ char *s _AND
+ wchar_t _wchar _AND
+ const char *charset _AND
+ mbstate_t *state)
+{
+ /* Avoids compiler warnings about comparisons that are always false
+ due to limited range when sizeof(wchar_t) is 2 but sizeof(wint_t)
+ is 4, as is the case on cygwin. */
+ wint_t wchar = _wchar;
+
if (s == NULL)
return 0;
- /* otherwise we are dealing with a single byte character */
if ((size_t)wchar >= 0x100)
{
r->_errno = EILSEQ;
@@ -216,4 +345,13 @@ _DEFUN (_wctomb_r, (r, s, wchar, state),
*s = (char) wchar;
return 1;
}
-
+
+int
+_DEFUN (_wctomb_r, (r, s, wchar, state),
+ struct _reent *r _AND
+ char *s _AND
+ wchar_t _wchar _AND
+ mbstate_t *state)
+{
+ return __wctomb (r, s, _wchar, __locale_charset (), state);
+}
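
A note on the surrogate handling in __utf8_wctomb, since it is the only stateful part of the UTF-8 path: when wchar_t is 16 bits wide, a character outside the BMP arrives as two calls. The first call emits only the UTF-8 lead byte and parks the upper bits of the code point in the state; the second call emits the three trailing bytes. The following standalone sketch mirrors that arithmetic. The struct and function names are invented for illustration (the patch keeps this state in mbstate_t's __count and __value.__wchb), it handles surrogate units only, and it treats the whole range 0xdc00..0xdfff as valid second halves, which is an assumption of the sketch rather than a copy of the range check in the patch.

```c
#include <stdint.h>

/* Hypothetical state holder standing in for the mbstate_t fields
   used by the patch. */
struct utf8_state_sketch
{
  int pending;                  /* nonzero after a first surrogate half */
  unsigned char hi[2];          /* bits 8..23 of the partial code point */
};

/* Convert one UTF-16 surrogate unit to UTF-8 bytes.  Returns the number
   of bytes written to out (1 for a first half, 3 for a second half),
   or -1 if the unit is not a surrogate or arrives out of order. */
static int
utf8_surrogate_sketch (struct utf8_state_sketch *st, uint16_t unit,
                       unsigned char *out)
{
  uint32_t tmp;

  if (unit >= 0xd800 && unit <= 0xdbff && !st->pending)
    {
      /* First half: compute the upper part of the code point, remember
         it, and emit the 4-byte lead byte right away. */
      tmp = ((uint32_t) (unit & 0x3ff) << 10) + 0x10000;
      st->hi[0] = (tmp >> 16) & 0xff;
      st->hi[1] = (tmp >> 8) & 0xff;
      st->pending = 1;
      out[0] = 0xf0 | ((tmp & 0x1c0000) >> 18);
      return 1;
    }
  if (unit >= 0xdc00 && unit <= 0xdfff && st->pending)
    {
      /* Second half: rebuild the full code point and emit the three
         continuation bytes. */
      tmp = ((uint32_t) st->hi[0] << 16)
            | ((uint32_t) st->hi[1] << 8)
            | (unit & 0x3ff);
      st->pending = 0;
      out[0] = 0x80 | ((tmp & 0x3f000) >> 12);
      out[1] = 0x80 | ((tmp & 0xfc0) >> 6);
      out[2] = 0x80 | (tmp & 0x3f);
      return 3;
    }
  return -1;
}
```

Feeding the pair 0xd83d, 0xde00 (U+1F600) through two calls yields the bytes f0 and then 9f 98 80, which together form the usual four-byte UTF-8 sequence.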




