This is the mail archive of the libc-alpha@sources.redhat.com mailing list for the glibc project.



wrong charmap name for Shift_JIS



Hi,

For the Shift_JIS encoding, glibc uses the name "SJIS", but the IANA
charset registry has no "SJIS" entry, only "Shift_JIS" (the preferred MIME
name) and "MS_Kanji". Passing the standard name "Shift_JIS" to localedef
does not produce a working locale:

  # localedef -c -f SHIFT_JIS -i ja_JP ja_JP.SJIS
  <lots of error messages>
  # LC_ALL=ja_JP.SJIS locale charmap
  ANSI_X3.4-1968

Using "SJIS" instead makes nl_langinfo(CODESET) return a nonstandard name:

  # localedef -c -f SJIS -i ja_JP ja_JP.SJIS
  character map `SJIS' is not ASCII compatible, locale not ISO C compliant
  # LC_ALL=ja_JP.SJIS locale charmap
  SJIS

The fact that GNU gettext expects PO files labelled with "charset=SJIS",
a choice which was made for consistency with glibc, has been reported as
a bug in GNU gettext.

To fix this, here is a patch that follows the approach we took for GB2312
(which was previously named EUC-CN in glibc). It leads to the following
behaviour:

  # localedef -c -f SHIFT_JIS -i ja_JP ja_JP.SJIS
  character map `SHIFT_JIS' is not ASCII compatible, locale not ISO C compliant
  # LC_ALL=ja_JP.SJIS locale charmap
  SHIFT_JIS

  # localedef -c -f SJIS -i ja_JP ja_JP.SJIS
  character map `SHIFT_JIS' is not ASCII compatible, locale not ISO C compliant
  # LC_ALL=ja_JP.SJIS locale charmap
  SHIFT_JIS


localedata/ChangeLog:
2001-05-26  Bruno Haible  <haible@clisp.cons.org>

	* charmaps/SHIFT_JIS: Renamed from charmaps/SJIS. Change code_set_name
	to SHIFT_JIS. Add SJIS as alias.
	* Makefile (CHARMAPS): For SJIS locale, use SHIFT_JIS charmap.
	* gen-locale.sh: Likewise.

ChangeLog:
2001-05-26  Bruno Haible  <haible@clisp.cons.org>

	* iconvdata/tst-tables.sh: For SJIS module, use SHIFT_JIS charmap.
	* manual/charset.texi: Write Shift_JIS, not Shift-JIS.

Please rename localedata/charmaps/SJIS to localedata/charmaps/SHIFT_JIS
before applying the patch.
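
The locale-name-to-charmap mapping that the Makefile and gen-locale.sh
changes rely on can be tried in isolation. The following is only a sketch:
the function name charmaps_for_locales is hypothetical and not part of the
patch; the two sed expressions are the ones used in the patched CHARMAPS
rule.

```shell
#!/bin/sh
# Hypothetical helper (not part of the patch): derive charmap names from a
# space-separated list of locale names, using the same two sed expressions
# as the patched CHARMAPS rule in localedata/Makefile.
charmaps_for_locales() {
  # 1st expression: keep only the part after the first dot of each word.
  # 2nd expression: map the locale suffix SJIS to the charmap name SHIFT_JIS.
  echo "$1" | sed -e 's/[^ .]*[.]\([^ ]*\)/\1/g' -e 's/SJIS/SHIFT_JIS/g'
}

charmaps_for_locales "ja_JP.EUC-JP ja_JP.SJIS fr_FR.ISO-8859-1"
# prints: EUC-JP SHIFT_JIS ISO-8859-1
```

The second expression is a plain string substitution, so it rewrites any
occurrence of SJIS in the list. Among the test locales in the Makefile,
only ja_JP.SJIS is affected.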

--- glibc-20010430/localedata/charmaps/SJIS.bak	Mon Dec  4 19:53:45 2000
+++ glibc-20010430/localedata/charmaps/SHIFT_JIS	Sat May 26 16:22:11 2001
@@ -1,9 +1,10 @@
-<code_set_name> SJIS
+<code_set_name> SHIFT_JIS
 <comment_char> %
 <escape_char> /
 <mb_cur_min> 1
 <mb_cur_max> 2
 
+% alias SJIS
 CHARMAP
 <U0000>     /x00     NULL (NUL)
 <U0001>     /x01     START OF HEADING (SOH)
--- glibc-20010430/localedata/Makefile.bak	Tue Feb  6 14:39:11 2001
+++ glibc-20010430/localedata/Makefile	Sat May 26 16:40:33 2001
@@ -125,7 +125,8 @@
 	   en_US.ISO-8859-1 ja_JP.EUC-JP da_DK.ISO-8859-1 \
 	   hr_HR.ISO-8859-2 sv_SE.ISO-8859-1 ja_JP.SJIS fr_FR.ISO-8859-1
 LOCALE_SRCS := $(shell echo "$(LOCALES)"|sed 's/\([^ .]*\)[^ ]*/\1/g')
-CHARMAPS := $(shell echo "$(LOCALES)"|sed 's/[^ .]*[.]\([^ ]*\)/\1/g')
+CHARMAPS := $(shell echo "$(LOCALES)" | \
+		    sed -e 's/[^ .]*[.]\([^ ]*\)/\1/g' -e s/SJIS/SHIFT_JIS/g)
 CTYPE_FILES = $(addsuffix /LC_CTYPE,$(LOCALES))
 
 generated-dirs += $(LOCALES)
--- glibc-20010430/localedata/gen-locale.sh.bak	Thu Jul 13 19:45:51 2000
+++ glibc-20010430/localedata/gen-locale.sh	Sat May 26 16:45:21 2001
@@ -1,6 +1,6 @@
 #! /bin/sh
 # Generate test locale files.
-# Copyright (C) 2000 Free Software Foundation, Inc.
+# Copyright (C) 2000-2001 Free Software Foundation, Inc.
 # This file is part of the GNU C Library.
 #
 # The GNU C Library is free software; you can redistribute it and/or
@@ -43,4 +43,5 @@
 charmap=`echo $locfile|sed 's|[^.]*[.]\(.*\)/LC_CTYPE|\1|'`
 
 echo "Generating locale $locale.$charmap: this might take a while..."
-generate_locale $charmap $locale $locale.$charmap
+generate_locale `echo $charmap | sed -e s/SJIS/SHIFT_JIS/` $locale \
+		$locale.$charmap
--- glibc-20010430/iconvdata/tst-tables.sh.bak	Sat Oct 28 01:18:54 2000
+++ glibc-20010430/iconvdata/tst-tables.sh	Sat May 26 16:58:54 2001
@@ -1,5 +1,5 @@
 #!/bin/sh
-# Copyright (C) 2000 Free Software Foundation, Inc.
+# Copyright (C) 2000-2001 Free Software Foundation, Inc.
 # This file is part of the GNU C Library.
 # Contributed by Bruno Haible <haible@clisp.cons.org>, 2000.
 #
@@ -184,7 +184,7 @@
   #
   # Multibyte encodings come here
   #
-  SJIS
+  SJIS              SHIFT_JIS
   EUC-KR
   CP949
   JOHAB
--- glibc-20010430/manual/charset.texi.bak	Mon Apr 30 22:26:42 2001
+++ glibc-20010430/manual/charset.texi	Sat May 26 16:28:18 2001
@@ -247,6 +247,7 @@
 bytes.
 
 @cindex EUC
+@cindex Shift_JIS
 @cindex SJIS
 In most uses of @w{ISO 2022} the defined character sets do not allow
 state changes which cover more than the next character.  This has the
@@ -254,7 +255,7 @@
 sequence of a character one can interpret a text correctly.  Examples of
 character sets using this policy are the various EUC character sets
 (used by Sun's operations systems, EUC-JP, EUC-KR, EUC-TW, and EUC-CN)
-or SJIS (Shift-JIS, a Japanese encoding).
+or Shift_JIS (SJIS, a Japanese encoding).
 
 But there are also character sets using a state which is valid for more
 than one character and has to be changed by another byte sequence.

