This is the mail archive of the libc-alpha@sources.redhat.com mailing list for the glibc project.



a more exhaustive iconv check



Hi,

Up to now, the gconv module for a particular encoding and its charmap have
been free to disagree, and they often did, because they come from
different sources. But it makes no sense for the locale tables (created
from the charmap) and the runtime conversion (done by the gconv module)
to disagree.

Therefore here is a new test that verifies that the charmap and iconv
(in the charset to Unicode direction) agree. The reverse iconv direction
must generally agree as well, except for a few limited and known cases,
which are listed in CHARSET.irreversible files.
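
The reverse-direction check can be sketched on tiny tables as follows. This
is an illustration only: the sample entries and the helper name
expected_inverse are made up, and it assumes that every line of the
irreversible file also occurs in the charmap table.

```sh
#!/bin/sh
# Illustration only; the table contents below are invented sample data.
LC_ALL=C
export LC_ALL

# expected_inverse CHARMAP_TABLE IRREVERSIBLE_FILE
# Prints the entries the reverse direction is expected to produce: all
# charmap entries except those listed as irreversible.  Lines present in
# both files occur twice after the merge, so "uniq -u" drops them.
expected_inverse () {
  cat "$1" "$2" | sort | uniq -u
}

tmp=${TMPDIR:-/tmp}/tst-sketch.$$
mkdir "$tmp" || exit 1
printf '0x41\t0x0041\n0x5C\t0x005C\n0x815F\t0x005C\n' > "$tmp/charmap.table"
printf '0x815F\t0x005C\n' > "$tmp/irreversible"

# Prints the two reversible entries, 0x41 and 0x5C.
expected_inverse "$tmp/charmap.table" "$tmp/irreversible"
rm -rf "$tmp"
```

The real test then compares this expected list against the sorted output
of the Unicode to charset run with cmp.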

This patch uncovers a few bugs, which are fixed in the following mails. If
you don't apply one of these fixes, you have to comment out the
corresponding line in iconvdata/tst-tables.sh.
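
Commenting out works because the driver loop skips every entry whose first
field starts with '#'. A minimal sketch of that loop, with a made-up
three-entry list and echo standing in for the call to tst-table.sh (the
function name list_tests is hypothetical):

```sh
#!/bin/sh
# list_tests: reads "charset [charmap]" lines on stdin and prints, for each
# enabled charset, the charmap it would be checked against.  An empty
# charmap field defaults to the charset name, as in tst-table.sh.
list_tests () {
  while read charset charmap; do
    case ${charset} in \#*) continue;; esac
    echo "${charset} ${charmap:-$charset}"
  done
}

list_tests <<EOF
  ASCII             ANSI_X3.4-1968
  #ISO_5428                             Handling of combining marks is broken
  ISO-8859-1
EOF
```

Only the ASCII and ISO-8859-1 entries are reported; the ISO_5428 line is
skipped.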

             Bruno


New files to be "chmod a+x" before commit:
  iconvdata/tst-tables.sh
  iconvdata/tst-table.sh
  iconvdata/tst-table-charmap.sh

2000-09-03  Bruno Haible  <haible@clisp.cons.org>

	* iconvdata/tst-tables.sh: New file.
	* iconvdata/tst-table.sh: New file.
	* iconvdata/tst-table-from.c: New file.
	* iconvdata/tst-table-to.c: New file.
	* iconvdata/tst-table-charmap.sh: New file.
	* iconvdata/Makefile (test-srcs): Set to tst-table-from tst-table-to.
	(distribute): Add tst-tables.sh, tst-table.sh, tst-table-charmap.sh,
	tst-table-from.c, tst-table-to.c, EUC-JP.irreversible,
	ISIRI-3342.irreversible, SJIS.irreversible.
	(tests): Add dependency on tst-tables.out.
	(tst-tables.out, tst-tables-clean): New rules.
	(do-tests-clean, common-mostlyclean): Require tst-tables-clean.
	* iconvdata/ISIRI-3342.irreversible: New file.
	* iconvdata/EUC-JP.irreversible: New file.
	* iconvdata/SJIS.irreversible: New file.

*** glibc-20000831/iconvdata/tst-tables.sh.bak	Sun Sep  3 00:19:30 2000
--- glibc-20000831/iconvdata/tst-tables.sh	Sun Sep  3 15:51:47 2000
***************
*** 0 ****
--- 1,213 ----
+ #!/bin/sh
+ # Copyright (C) 2000 Free Software Foundation, Inc.
+ # This file is part of the GNU C Library.
+ # Contributed by Bruno Haible <haible@clisp.cons.org>, 2000.
+ #
+ # The GNU C Library is free software; you can redistribute it and/or
+ # modify it under the terms of the GNU Library General Public License as
+ # published by the Free Software Foundation; either version 2 of the
+ # License, or (at your option) any later version.
+ #
+ # The GNU C Library is distributed in the hope that it will be useful,
+ # but WITHOUT ANY WARRANTY; without even the implied warranty of
+ # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ # Library General Public License for more details.
+ #
+ # You should have received a copy of the GNU Library General Public
+ # License along with the GNU C Library; see the file COPYING.LIB.  If not,
+ # write to the Free Software Foundation, Inc., 59 Temple Place - Suite 330,
+ # Boston, MA 02111-1307, USA.
+ 
+ # Checks that the iconv() implementation (in both directions) for the
+ # stateless encodings agrees with the corresponding charmap table.
+ 
+ common_objpfx=$1
+ objpfx=$2
+ 
+ status=0
+ 
+ cat <<EOF |
+   # Single-byte and other "small" encodings come here.
+   # Keep this list in the same order as gconv-modules.
+   #
+   # charset name    table name          comment
+   ASCII             ANSI_X3.4-1968
+   ISO646-GB         BS_4730
+   ISO646-CA         CSA_Z243.4-1985-1
+   ISO646-CA2        CSA_Z243.4-1985-2
+   ISO646-DE         DIN_66003
+   ISO646-DK         DS_2089
+   ISO646-ES         ES
+   ISO646-ES2        ES2
+   ISO646-CN         GB_1988-80
+   ISO646-IT         IT
+   ISO646-JP         JIS_C6220-1969-RO
+   ISO646-JP-OCR-B   JIS_C6229-1984-B
+   ISO646-YU         JUS_I.B1.002
+   ISO646-KR         KSC5636
+   ISO646-HU         MSZ_7795.3
+   ISO646-CU         NC_NC00-10
+   ISO646-FR         NF_Z_62-010
+   ISO646-FR1        NF_Z_62-010_1973
+   ISO646-NO         NS_4551-1
+   ISO646-NO2        NS_4551-2
+   ISO646-PT         PT
+   ISO646-PT2        PT2
+   ISO646-SE         SEN_850200_B
+   ISO646-SE2        SEN_850200_C
+   ISO-8859-1
+   ISO-8859-2
+   ISO-8859-3
+   ISO-8859-4
+   ISO-8859-5
+   ISO-8859-6
+   ISO-8859-7
+   ISO-8859-8
+   ISO-8859-9
+   ISO-8859-10
+   #ISO-8859-11                          No corresponding table, nonstandard
+   ISO-8859-13
+   ISO-8859-14
+   ISO-8859-15
+   ISO-8859-16
+   T.61-8BIT
+   ISO_6937
+   #ISO_6937-2        ISO-IR-90          Handling of combining marks is broken
+   KOI-8
+   KOI8-R
+   LATIN-GREEK
+   LATIN-GREEK-1
+   HP-ROMAN8
+   EBCDIC-AT-DE
+   EBCDIC-AT-DE-A
+   EBCDIC-CA-FR
+   EBCDIC-DK-NO
+   EBCDIC-DK-NO-A
+   EBCDIC-ES
+   EBCDIC-ES-A
+   EBCDIC-ES-S
+   EBCDIC-FI-SE
+   EBCDIC-FI-SE-A
+   EBCDIC-FR
+   EBCDIC-IS-FRISS
+   EBCDIC-IT
+   EBCDIC-PT
+   EBCDIC-UK
+   EBCDIC-US
+   IBM037
+   IBM038
+   IBM256
+   IBM273
+   IBM274
+   IBM275
+   IBM277
+   IBM278
+   IBM280
+   IBM281
+   IBM284
+   IBM285
+   IBM290
+   IBM297
+   IBM420
+   IBM423
+   IBM424
+   IBM437
+   IBM500
+   IBM850
+   IBM851
+   IBM852
+   IBM855
+   IBM857
+   IBM860
+   IBM861
+   IBM862
+   IBM863
+   IBM864
+   IBM865
+   IBM866
+   IBM868
+   IBM869
+   IBM870
+   IBM871
+   IBM875
+   IBM880
+   IBM891
+   IBM903
+   IBM904
+   IBM905
+   IBM918
+   IBM1004
+   IBM1026
+   IBM1047
+   CP1250
+   CP1251
+   CP1252
+   CP1253
+   CP1254
+   CP1255
+   CP1256
+   CP1257
+   CP1258
+   IBM874
+   CP737
+   CP775
+   MACINTOSH
+   IEC_P27-1
+   ASMO_449
+   ISO-IR-99         ANSI_X3.110-1983
+   ISO-IR-139        CSN_369103
+   CWI
+   DEC-MCS
+   ECMA-CYRILLIC
+   ISO-IR-153        GOST_19768-74
+   GREEK-CCITT
+   GREEK7
+   GREEK7-OLD
+   INIS
+   INIS-8
+   INIS-CYRILLIC
+   ISO_2033          ISO_2033-1983
+   ISO_5427
+   ISO_5427-EXT
+   #ISO_5428                             Handling of combining marks is broken
+   ISO_10367-BOX
+   MAC-IS
+   MAC-UK
+   NATS-DANO
+   NATS-SEFI
+   WIN-SAMI-2        SAMI-WS2
+   ISO-IR-197
+   TIS-620
+   KOI8-U
+   ISIRI-3342
+   #
+   # Multibyte encodings come here
+   #
+   SJIS
+   #EUC-KR                               Charmap contains extraneous entries
+   CP949
+   #JOHAB                                No charmap exists
+   BIG5
+   #BIG5HKSCS                            Broken, please fix it
+   EUC-JP
+   EUC-CN            GB2312
+   #GBK                                  Converter uses private area characters
+   EUC-TW
+   #GB18030                              Broken, please fix it
+   #
+   # Stateful encodings not testable this way
+   #
+   #ISO-2022-JP
+   #ISO-2022-JP-2
+   #ISO-2022-KR
+   #ISO-2022-CN
+   #
+ EOF
+ while read charset charmap; do
+   case ${charset} in \#*) continue;; esac
+   echo "Testing ${charset}" 1>&2
+   ./tst-table.sh ${common_objpfx} ${objpfx} ${charset} ${charmap} \
+   || { echo "failed: ./tst-table.sh ${common_objpfx} ${objpfx} ${charset} ${charmap}"; status=1; }
+ done
+ 
+ exit $status
*** glibc-20000831/iconvdata/tst-table.sh.bak	Sun Sep  3 01:00:10 2000
--- glibc-20000831/iconvdata/tst-table.sh	Sun Sep  3 15:46:49 2000
***************
*** 0 ****
--- 1,75 ----
+ #!/bin/sh
+ # Copyright (C) 2000 Free Software Foundation, Inc.
+ # This file is part of the GNU C Library.
+ # Contributed by Bruno Haible <haible@clisp.cons.org>, 2000.
+ #
+ # The GNU C Library is free software; you can redistribute it and/or
+ # modify it under the terms of the GNU Library General Public License as
+ # published by the Free Software Foundation; either version 2 of the
+ # License, or (at your option) any later version.
+ #
+ # The GNU C Library is distributed in the hope that it will be useful,
+ # but WITHOUT ANY WARRANTY; without even the implied warranty of
+ # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ # Library General Public License for more details.
+ #
+ # You should have received a copy of the GNU Library General Public
+ # License along with the GNU C Library; see the file COPYING.LIB.  If not,
+ # write to the Free Software Foundation, Inc., 59 Temple Place - Suite 330,
+ # Boston, MA 02111-1307, USA.
+ 
+ # Checks that the iconv() implementation (in both directions) for a
+ # stateless encoding agrees with the charmap table.
+ 
+ common_objpfx=$1
+ objpfx=$2
+ charset=$3
+ charmap=$4
+ 
+ GCONV_PATH=${common_objpfx}iconvdata
+ export GCONV_PATH
+ LC_ALL=C
+ export LC_ALL
+ 
+ set -e
+ 
+ # Get the charmap.
+ ./tst-table-charmap.sh ${charmap:-$charset} \
+   < ../localedata/charmaps/${charmap:-$charset} \
+   > ${objpfx}tst-${charset}.charmap.table
+ 
+ # Precompute expected differences between the two iconv directions.
+ if test ${charset} = EUC-TW; then
+   irreversible=${objpfx}tst-${charset}.irreversible
+   grep '^0x8EA1' ${objpfx}tst-${charset}.charmap.table > ${irreversible}
+ else
+   irreversible=${charset}.irreversible
+ fi
+ 
+ # iconv in one direction.
+ ${common_objpfx}elf/ld.so --library-path $common_objpfx \
+ ${objpfx}tst-table-from ${charset} \
+   > ${objpfx}tst-${charset}.table
+ 
+ # iconv in the other direction.
+ ${common_objpfx}elf/ld.so --library-path $common_objpfx \
+ ${objpfx}tst-table-to ${charset} | sort \
+   > ${objpfx}tst-${charset}.inverse.table
+ 
+ # Difference between the two iconv directions.
+ diff ${objpfx}tst-${charset}.table ${objpfx}tst-${charset}.inverse.table | \
+   grep '^[<>]' | sed -e 's,^. ,,' > ${objpfx}tst-${charset}.irreversible.table
+ 
+ # Check 1: charmap and iconv forward should be identical.
+ cmp -s ${objpfx}tst-${charset}.charmap.table ${objpfx}tst-${charset}.table
+ 
+ # Check 2: the difference between the two iconv directions.
+ if test -f ${irreversible}; then
+   cat ${objpfx}tst-${charset}.charmap.table ${irreversible} | sort | uniq -u \
+     > ${objpfx}tst-${charset}.tmp.table
+   cmp -s ${objpfx}tst-${charset}.tmp.table ${objpfx}tst-${charset}.inverse.table
+ else
+   cmp -s ${objpfx}tst-${charset}.table ${objpfx}tst-${charset}.inverse.table
+ fi
+ 
+ exit 0
*** glibc-20000831/iconvdata/tst-table-from.c.bak	Sun Sep  3 00:19:30 2000
--- glibc-20000831/iconvdata/tst-table-from.c	Sun Sep  3 02:49:14 2000
***************
*** 0 ****
--- 1,225 ----
+ /* Copyright (C) 2000 Free Software Foundation, Inc.
+    This file is part of the GNU C Library.
+    Contributed by Bruno Haible <haible@clisp.cons.org>, 2000.
+ 
+    The GNU C Library is free software; you can redistribute it and/or
+    modify it under the terms of the GNU Library General Public License as
+    published by the Free Software Foundation; either version 2 of the
+    License, or (at your option) any later version.
+ 
+    The GNU C Library is distributed in the hope that it will be useful,
+    but WITHOUT ANY WARRANTY; without even the implied warranty of
+    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+    Library General Public License for more details.
+ 
+    You should have received a copy of the GNU Library General Public
+    License along with the GNU C Library; see the file COPYING.LIB.  If not,
+    write to the Free Software Foundation, Inc., 59 Temple Place - Suite 330,
+    Boston, MA 02111-1307, USA.  */
+ 
+ /* Create a table from CHARSET to Unicode.
+    This is a good test for CHARSET's iconv() module, in particular the
+    FROM_LOOP BODY macro.  */
+ 
+ #include <string.h>
+ #include <stdio.h>
+ #include <stdlib.h>
+ #include <iconv.h>
+ #include <errno.h>
+ 
+ /* Converts a byte buffer to a hexadecimal string.  */
+ static const char*
+ hexbuf (unsigned char buf[], unsigned int buflen)
+ {
+   static char msg[50];
+ 
+   switch (buflen)
+     {
+     case 1:
+       sprintf (msg, "0x%02X", buf[0]);
+       break;
+     case 2:
+       sprintf (msg, "0x%02X%02X", buf[0], buf[1]);
+       break;
+     case 3:
+       sprintf (msg, "0x%02X%02X%02X", buf[0], buf[1], buf[2]);
+       break;
+     case 4:
+       sprintf (msg, "0x%02X%02X%02X%02X", buf[0], buf[1], buf[2], buf[3]);
+       break;
+     default:
+       abort ();
+     }
+   return msg;
+ }
+ 
+ /* Attempts to convert a byte buffer BUF (BUFLEN bytes) to OUT (6 bytes)
+    using the conversion descriptor CD.  Returns the number of written bytes,
+    or 0 if BUF is incomplete (more bytes needed), or -1 if it is invalid.  */
+ static int
+ try (iconv_t cd, unsigned char buf[], unsigned int buflen, unsigned char *out)
+ {
+   const char *inbuf = (const char *) buf;
+   size_t inbytesleft = buflen;
+   char *outbuf = (char *) out;
+   size_t outbytesleft = 6;
+   size_t result = iconv (cd,
+ 			 (char **) &inbuf, &inbytesleft,
+ 			 &outbuf, &outbytesleft);
+   if (result == (size_t)(-1))
+     {
+       if (errno == EILSEQ)
+ 	{
+ 	  return -1;
+ 	}
+       else if (errno == EINVAL)
+ 	{
+ 	  return 0;
+ 	}
+       else
+ 	{
+ 	  int saved_errno = errno;
+ 	  fprintf (stderr, "%s: iconv error: ", hexbuf (buf, buflen));
+ 	  errno = saved_errno;
+ 	  perror ("");
+ 	  exit (1);
+ 	}
+     }
+   else
+     {
+       if (inbytesleft != 0)
+ 	{
+ 	  fprintf (stderr, "%s: inbytes = %ld, outbytes = %ld\n",
+ 		   hexbuf (buf, buflen),
+ 		   (long) (buflen - inbytesleft),
+ 		   (long) (6 - outbytesleft));
+ 	  exit (1);
+ 	}
+       return 6 - outbytesleft;
+     }
+ }
+ 
+ /* Returns the out[] buffer as a Unicode value.  */
+ static unsigned int
+ utf8_decode (const unsigned char *out, unsigned int outlen)
+ {
+   return (outlen==1 ? out[0] :
+ 	  outlen==2 ? ((out[0] & 0x1f) << 6) + (out[1] & 0x3f) :
+ 	  outlen==3 ? ((out[0] & 0x0f) << 12) + ((out[1] & 0x3f) << 6) + (out[2] & 0x3f) :
+ 	  outlen==4 ? ((out[0] & 0x07) << 18) + ((out[1] & 0x3f) << 12) + ((out[2] & 0x3f) << 6) + (out[3] & 0x3f) :
+ 	  outlen==5 ? ((out[0] & 0x03) << 24) + ((out[1] & 0x3f) << 18) + ((out[2] & 0x3f) << 12) + ((out[3] & 0x3f) << 6) + (out[4] & 0x3f) :
+ 	  outlen==6 ? ((out[0] & 0x01) << 30) + ((out[1] & 0x3f) << 24) + ((out[2] & 0x3f) << 18) + ((out[3] & 0x3f) << 12) + ((out[4] & 0x3f) << 6) + (out[5] & 0x3f) :
+ 	  0xfffd);
+ }
+ 
+ int
+ main (int argc, char *argv[])
+ {
+   const char *charset;
+   iconv_t cd;
+ 
+   if (argc != 2)
+     {
+       fprintf (stderr, "Usage: tst-table-from charset\n");
+       exit (1);
+     }
+   charset = argv[1];
+ 
+   cd = iconv_open ("UTF-8", charset);
+   if (cd == (iconv_t)(-1))
+     {
+       perror ("iconv_open");
+       exit (1);
+     }
+ 
+   {
+     unsigned char out[6];
+     unsigned char buf[4];
+     unsigned int i0, i1, i2, i3;
+     int result;
+ 
+     for (i0 = 0; i0 < 0x100; i0++)
+       {
+ 	buf[0] = i0;
+ 	result = try (cd, buf, 1, out);
+ 	if (result < 0)
+ 	  {
+ 	  }
+ 	else if (result > 0)
+ 	  {
+ 	    printf ("0x%02X\t0x%04X\n",
+ 		    i0, utf8_decode (out, result));
+ 	  }
+ 	else
+ 	  {
+ 	    for (i1 = 0; i1 < 0x100; i1++)
+ 	      {
+ 		buf[1] = i1;
+ 		result = try (cd, buf, 2, out);
+ 		if (result < 0)
+ 		  {
+ 		  }
+ 		else if (result > 0)
+ 		  {
+ 		    printf ("0x%02X%02X\t0x%04X\n",
+ 			    i0, i1, utf8_decode (out, result));
+ 		  }
+ 		else
+ 		  {
+ 		    for (i2 = 0; i2 < 0x100; i2++)
+ 		      {
+ 			buf[2] = i2;
+ 			result = try (cd, buf, 3, out);
+ 			if (result < 0)
+ 			  {
+ 			  }
+ 			else if (result > 0)
+ 			  {
+ 			    printf ("0x%02X%02X%02X\t0x%04X\n",
+ 				    i0, i1, i2, utf8_decode (out, result));
+ 			  }
+ 			else if (strcmp (charset, "UTF-8"))
+ 			  {
+ 			    for (i3 = 0; i3 < 0x100; i3++)
+ 			      {
+ 				buf[3] = i3;
+ 				result = try (cd, buf, 4, out);
+ 				if (result < 0)
+ 				  {
+ 				  }
+ 				else if (result > 0)
+ 				  {
+ 				    printf ("0x%02X%02X%02X%02X\t0x%04X\n",
+ 					    i0, i1, i2, i3,
+ 					    utf8_decode (out, result));
+ 				  }
+ 				else
+ 				  {
+ 				    fprintf (stderr,
+ 					     "%s: incomplete byte sequence\n",
+ 					     hexbuf (buf, 4));
+ 				    exit (1);
+ 				  }
+ 			      }
+ 			  }
+ 		      }
+ 		  }
+ 	      }
+ 	  }
+       }
+   }
+ 
+   if (iconv_close (cd) < 0)
+     {
+       perror ("iconv_close");
+       exit (1);
+     }
+ 
+   if (ferror (stdin) || ferror (stdout))
+     {
+       fprintf (stderr, "I/O error\n");
+       exit (1);
+     }
+ 
+   exit (0);
+ }
*** glibc-20000831/iconvdata/tst-table-to.c.bak	Sun Sep  3 00:19:30 2000
--- glibc-20000831/iconvdata/tst-table-to.c	Sun Sep  3 02:48:44 2000
***************
*** 0 ****
--- 1,107 ----
+ /* Copyright (C) 2000 Free Software Foundation, Inc.
+    This file is part of the GNU C Library.
+    Contributed by Bruno Haible <haible@clisp.cons.org>, 2000.
+ 
+    The GNU C Library is free software; you can redistribute it and/or
+    modify it under the terms of the GNU Library General Public License as
+    published by the Free Software Foundation; either version 2 of the
+    License, or (at your option) any later version.
+ 
+    The GNU C Library is distributed in the hope that it will be useful,
+    but WITHOUT ANY WARRANTY; without even the implied warranty of
+    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+    Library General Public License for more details.
+ 
+    You should have received a copy of the GNU Library General Public
+    License along with the GNU C Library; see the file COPYING.LIB.  If not,
+    write to the Free Software Foundation, Inc., 59 Temple Place - Suite 330,
+    Boston, MA 02111-1307, USA.  */
+ 
+ /* Create a table from Unicode to CHARSET.
+    This is a good test for CHARSET's iconv() module, in particular the
+    TO_LOOP BODY macro.  */
+ 
+ #include <stddef.h>
+ #include <stdio.h>
+ #include <stdlib.h>
+ #include <iconv.h>
+ #include <errno.h>
+ 
+ int
+ main (int argc, char *argv[])
+ {
+   const char *charset;
+   iconv_t cd;
+ 
+   if (argc != 2)
+     {
+       fprintf (stderr, "Usage: tst-table-to charset\n");
+       exit (1);
+     }
+   charset = argv[1];
+ 
+   cd = iconv_open (charset, "UCS-2");
+   if (cd == (iconv_t)(-1))
+     {
+       perror ("iconv_open");
+       exit (1);
+     }
+ 
+   {
+     unsigned int i;
+     unsigned char buf[10];
+ 
+     for (i = 0; i < 0x10000; i++)
+       {
+ 	unsigned short in = i;
+ 	const char *inbuf = (const char *) &in;
+ 	size_t inbytesleft = sizeof (unsigned short);
+ 	char *outbuf = (char *) buf;
+ 	size_t outbytesleft = sizeof (buf);
+ 	size_t result = iconv (cd,
+ 			       (char **) &inbuf, &inbytesleft,
+ 			       &outbuf, &outbytesleft);
+ 	if (result == (size_t)(-1))
+ 	  {
+ 	    if (errno != EILSEQ)
+ 	      {
+ 		int saved_errno = errno;
+ 		fprintf (stderr, "0x%02X: iconv error: ", i);
+ 		errno = saved_errno;
+ 		perror ("");
+ 		exit (1);
+ 	      }
+ 	  }
+ 	else if (result == 0) /* ignore conversions with transliteration */
+ 	  {
+ 	    unsigned int j, jmax;
+ 	    if (inbytesleft != 0 || outbytesleft == sizeof (buf))
+ 	      {
+ 		fprintf (stderr, "0x%02X: inbytes = %ld, outbytes = %ld\n", i,
+ 			 (long) (sizeof (unsigned short) - inbytesleft),
+ 			 (long) (sizeof (buf) - outbytesleft));
+ 		exit (1);
+ 	      }
+ 	    jmax = sizeof (buf) - outbytesleft;
+ 	    printf ("0x");
+ 	    for (j = 0; j < jmax; j++)
+ 	      printf ("%02X", buf[j]);
+ 	    printf ("\t0x%04X\n", i);
+ 	  }
+       }
+   }
+ 
+   if (iconv_close (cd) < 0)
+     {
+       perror ("iconv_close");
+       exit (1);
+     }
+ 
+   if (ferror (stdin) || ferror (stdout))
+     {
+       fprintf (stderr, "I/O error\n");
+       exit (1);
+     }
+ 
+   exit (0);
+ }
*** glibc-20000831/iconvdata/tst-table-charmap.sh.bak	Sun Sep  3 00:19:30 2000
--- glibc-20000831/iconvdata/tst-table-charmap.sh	Sun Sep  3 12:00:04 2000
***************
*** 0 ****
--- 1,35 ----
+ #!/bin/sh
+ # Copyright (C) 2000 Free Software Foundation, Inc.
+ # This file is part of the GNU C Library.
+ # Contributed by Bruno Haible <haible@clisp.cons.org>, 2000.
+ #
+ # The GNU C Library is free software; you can redistribute it and/or
+ # modify it under the terms of the GNU Library General Public License as
+ # published by the Free Software Foundation; either version 2 of the
+ # License, or (at your option) any later version.
+ #
+ # The GNU C Library is distributed in the hope that it will be useful,
+ # but WITHOUT ANY WARRANTY; without even the implied warranty of
+ # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ # Library General Public License for more details.
+ #
+ # You should have received a copy of the GNU Library General Public
+ # License along with the GNU C Library; see the file COPYING.LIB.  If not,
+ # write to the Free Software Foundation, Inc., 59 Temple Place - Suite 330,
+ # Boston, MA 02111-1307, USA.
+ 
+ # Converts a glibc format charmap to a simple format .table file.
+ 
+ LC_ALL=C
+ export LC_ALL
+ 
+ case "$1" in
+   POSIX )
+     # Old POSIX/DKUUG borrowed format
+     grep '^<.*>.*/x[0-9A-Fa-f]*[ 	]*<U....>.*$' | grep -v 'not a real character' | sed -e 's,^<.*>[ 	]*\([/x0-9A-Fa-f]*\)[ 	]*<U\(....\)>.*$,\1	0x\2,' | tr abcdef ABCDEF | sed -e 's,/x\([0-9A-F][0-9A-F]\),\1,g' | sed -e 's,^,0x,' | sort | uniq | grep -v '^0x00	0x\([1-9A-F]...\|.[1-9A-F]..\|..[1-9A-F].\|...[1-9A-F]\)'
+     ;;
+   *)
+     # New Unicode based format
+     sed -e 's,^%IRREVERSIBLE%,,' | grep '^<U....>[ 	]*/x' | grep -v 'not a real character' | sed -e 's,<U\(....\)>[ 	]*\([/x0-9A-Fa-f]*\).*$,\2	0x\1,' | tr abcdef ABCDEF | sed -e 's,/x\([0-9A-F][0-9A-F]\),\1,g' | sed -e 's,^,0x,' | sort | uniq | grep -v '^0x00	0x\([1-9A-F]...\|.[1-9A-F]..\|..[1-9A-F].\|...[1-9A-F]\)'
+     ;;
+ esac
*** glibc-20000831/iconvdata/Makefile.bak	Wed Aug 30 23:43:37 2000
--- glibc-20000831/iconvdata/Makefile	Sun Sep  3 16:31:27 2000
***************
*** 51,56 ****
--- 51,58 ----
  
  tests = bug-iconv1 bug-iconv2
  
+ test-srcs := tst-table-from tst-table-to
+ 
  include ../Makeconfig
  
  libJIS-routines := jis0201 jis0208 jis0212
***************
*** 89,95 ****
  distribute := gconv-modules extra-module.mk gap.awk gaptab.awk		    \
  	      gen-8bit.sh gen-8bit-gap.sh gen-8bit-gap-1.sh		    \
  	      TESTS $(filter-out testdata/CVS%, $(wildcard testdata/*))	    \
! 	      run-iconv-test.sh 8bit-generic.c 8bit-gap.c		    \
  	      ansi_x3.110.c asmo_449.c big5.c cp737.c cp737.h		    \
  	      cp775.c cp775.h ibm874.c cns11643.c cns11643.h		    \
  	      cns11643l1.c cns11643l1.h cp1250.c cp1251.c cp1252.c cp1253.c \
--- 91,100 ----
  distribute := gconv-modules extra-module.mk gap.awk gaptab.awk		    \
  	      gen-8bit.sh gen-8bit-gap.sh gen-8bit-gap-1.sh		    \
  	      TESTS $(filter-out testdata/CVS%, $(wildcard testdata/*))	    \
! 	      run-iconv-test.sh tst-tables.sh tst-table.sh		    \
! 	      tst-table-charmap.sh tst-table-from.c tst-table-to.c	    \
! 	      EUC-JP.irreversible ISIRI-3342.irreversible SJIS.irreversible \
! 	      8bit-generic.c 8bit-gap.c					    \
  	      ansi_x3.110.c asmo_449.c big5.c cp737.c cp737.h		    \
  	      cp775.c cp775.h ibm874.c cns11643.c cns11643.h		    \
  	      cns11643l1.c cns11643l1.h cp1250.c cp1251.c cp1252.c cp1253.c \
***************
*** 244,250 ****
  
  ifeq (no,$(cross-compiling))
  ifeq (yes,$(build-shared))
! tests: $(objpfx)iconv-test.out
  endif
  endif
  
--- 249,255 ----
  
  ifeq (no,$(cross-compiling))
  ifeq (yes,$(build-shared))
! tests: $(objpfx)iconv-test.out $(objpfx)tst-tables.out
  endif
  endif
  
***************
*** 254,259 ****
--- 259,275 ----
  			 $(addprefix $(objpfx),$(modules.so)) \
  			 $(common-objdir)/iconv/iconv_prog TESTS
  	$(SHELL) -e $< $(common-objdir) > $@
+ 
+ $(objpfx)tst-tables.out: tst-tables.sh $(objpfx)gconv-modules \
+ 			 $(addprefix $(objpfx),$(modules.so)) \
+ 			 $(objpfx)tst-table-from $(objpfx)tst-table-to
+ 	$(SHELL) $< $(common-objpfx) $(common-objpfx)iconvdata/ > $@
+ 
+ do-tests-clean common-mostlyclean: tst-tables-clean
+ 
+ .PHONY: tst-tables-clean
+ tst-tables-clean:
+ 	-rm -f $(objpfx)tst-*.table $(objpfx)tst-EUC-TW.irreversible
  
  ifdef objpfx
  $(objpfx)gconv-modules: gconv-modules
*** glibc-20000831/iconvdata/ISIRI-3342.irreversible.bak	Sun Sep  3 03:51:34 2000
--- glibc-20000831/iconvdata/ISIRI-3342.irreversible	Sun Sep  3 03:50:02 2000
***************
*** 0 ****
--- 1,52 ----
+ 0x80	0x0000
+ 0x81	0x0001
+ 0x82	0x0002
+ 0x83	0x0003
+ 0x84	0x0004
+ 0x85	0x0005
+ 0x86	0x0006
+ 0x87	0x0007
+ 0x88	0x0008
+ 0x89	0x0009
+ 0x8A	0x000A
+ 0x8B	0x000B
+ 0x8C	0x000C
+ 0x8D	0x000D
+ 0x8E	0x000E
+ 0x8F	0x000F
+ 0x90	0x0010
+ 0x91	0x0011
+ 0x92	0x0012
+ 0x93	0x0013
+ 0x94	0x0014
+ 0x95	0x0015
+ 0x96	0x0016
+ 0x97	0x0017
+ 0x98	0x0018
+ 0x99	0x0019
+ 0x9A	0x001A
+ 0x9B	0x001B
+ 0x9C	0x001C
+ 0x9D	0x001D
+ 0x9E	0x001E
+ 0x9F	0x001F
+ 0xA0	0x0020
+ 0xA3	0x0021
+ 0xA6	0x002E
+ 0xA8	0x0029
+ 0xA9	0x0028
+ 0xAB	0x002B
+ 0xAD	0x002D
+ 0xAF	0x002F
+ 0xBA	0x003A
+ 0xBC	0x003C
+ 0xBD	0x003D
+ 0xBE	0x003E
+ 0xE2	0x005D
+ 0xE3	0x005B
+ 0xE4	0x007D
+ 0xE5	0x007B
+ 0xE8	0x002A
+ 0xEA	0x007C
+ 0xEB	0x005C
+ 0xFF	0x007F
*** glibc-20000831/iconvdata/EUC-JP.irreversible.bak	Sun Sep  3 15:35:47 2000
--- glibc-20000831/iconvdata/EUC-JP.irreversible	Sun Sep  3 12:17:13 2000
***************
*** 0 ****
--- 1,6 ----
+ 0x5C	0x00A5
+ 0x7E	0x203E
+ 0x8FA2B7	0x007E
+ 0x8FA2B7	0xFF5E
+ 0xA1C0	0x005C
+ 0xA1C0	0xFF3C
*** glibc-20000831/iconvdata/SJIS.irreversible.bak	Sun Sep  3 15:36:00 2000
--- glibc-20000831/iconvdata/SJIS.irreversible	Sun Sep  3 04:09:56 2000
***************
*** 0 ****
--- 1,7 ----
+ 0x5C	0x005C
+ 0x7E	0x007E
+ 0x815F	0x005C
+ 0x815F	0xFF3C
+ 0x8191	0xFFE0
+ 0x8192	0xFFE1
+ 0x81CA	0xFFE2
