This is the mail archive of the
libc-alpha@sourceware.cygnus.com
mailing list for the glibc project.
problem with ISO-2022-KR encoder
- To: libc-alpha at sourceware dot cygnus dot com
- Subject: problem with ISO-2022-KR encoder
- From: Bruno Haible <haible at ilog dot fr>
- Date: Mon, 20 Dec 1999 16:54:56 +0100 (MET)
Hello,
The glibc-2.1.1 iconv ISO-2022-KR encoder emits the "Esc $ ) C" sequence
only once, at the beginning of its output, rather than in every line.
Ken Lunde's CJK.INF says the SO designator ("Esc $ ) C") needs to appear only
once, at the beginning of a text (rationale: ISO-2022-KR uses only one two-byte
character set), but RFC 1557 says it must appear once in every line
containing SO characters (rationale: if some lines of the text get
lost, the remaining lines are still recognizable as Korean).
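
To make the per-line reading of RFC 1557 concrete, here is a minimal Python
sketch of an encoder that buffers each line and prefixes "Esc $ ) C" only when
that line contains SO characters. It is not the glibc implementation. The
function names are hypothetical, and it assumes that Python's standard euc_kr
codec is an acceptable source of KS C 5601 byte pairs (EUC-KR is KS C 5601 with
the high bit set, so clearing that bit gives the 7-bit form).

```python
# Sketch only: per-line designator placement, as the RFC 1557 reading
# described above requires. Names are hypothetical, not taken from glibc.
DESIGNATOR = b"\x1b$)C"  # Esc $ ) C
SO = b"\x0e"             # shift out: following bytes are KS C 5601
SI = b"\x0f"             # shift in: back to ASCII


def _encode_line(line: str) -> bytes:
    body = bytearray()
    shifted = False   # True while in two-byte mode
    used_so = False   # True once this line has emitted an SO
    for ch in line:
        if ord(ch) < 0x80:
            if shifted:
                body += SI
                shifted = False
            body += ch.encode("ascii")
        else:
            if not shifted:
                body += SO
                shifted = True
                used_so = True
            # Assumption: euc_kr yields KS C 5601 pairs with the high bit
            # set; unencodable characters raise UnicodeEncodeError.
            body += bytes(b & 0x7F for b in ch.encode("euc_kr"))
    if shifted:
        body += SI    # return to ASCII before the line ends
    return (DESIGNATOR if used_so else b"") + bytes(body)


def encode_iso2022kr_per_line(text: str) -> bytes:
    # Split on "\n" only, so every output line is handled independently.
    return b"\n".join(_encode_line(line) for line in text.split("\n"))
```

With this placement, a line that survives on its own still starts with the
designator whenever it contains two-byte characters, and pure ASCII lines are
left untouched.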
glibc-2.1.1 iconv doesn't implement this RFC 1557 requirement:
$ iconv -f UTF-8 -t ISO-2022-KR < KSC5601-snippet.utf-8 > x
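Here is a hexdump of the output. The second line of text begins after the 0A
at offset 0x32; it contains SO (0E) characters but no "Esc $ ) C" (1B 24 29 43):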
000000 1B 24 29 43 4B 6F 72 65 61 6E 20 28 0E 47 51 31  .$)CKorean (.GQ1
000010 5B 0F 29 09 09 09 0E 3E 48 33 67 47 4F 3C 3C 3F  [.)....>H3gGO<<?
000020 64 0F 2C 20 0E 3E 48 33 67 47 4F 3D 4A 34 4F 31  d., .>H3gGO=J4O1
000030 6E 0F 0A 09 4B 53 43 20 20 2D 2D 20 0E 6A 2A 51  n...KSC  -- .j*Q
^^ ^^
000040 28 0F 20 20 0E 4B 52 5B 21 0F 0A                 (.  .KR[!..
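
The same check can be done mechanically. The following Python sketch (the
function name is hypothetical) splits a byte stream into lines and reports
every line that contains SO without an "Esc $ ) C" before it on that line. The
DUMP bytes are copied from the hexdump above.

```python
from typing import List

DESIGNATOR = b"\x1b$)C"  # Esc $ ) C
SO = b"\x0e"

# Output of the iconv run, copied byte for byte from the hexdump.
DUMP = bytes.fromhex(
    "1B 24 29 43 4B 6F 72 65 61 6E 20 28 0E 47 51 31"
    "5B 0F 29 09 09 09 0E 3E 48 33 67 47 4F 3C 3C 3F"
    "64 0F 2C 20 0E 3E 48 33 67 47 4F 3D 4A 34 4F 31"
    "6E 0F 0A 09 4B 53 43 20 20 2D 2D 20 0E 6A 2A 51"
    "28 0F 20 20 0E 4B 52 5B 21 0F 0A"
)


def lines_missing_designator(data: bytes) -> List[int]:
    """Return 1-based numbers of lines that use SO with no designator before it."""
    missing = []
    # 0x0A cannot occur inside a two-byte character (those use 0x21..0x7E),
    # so splitting on it is safe.
    for number, line in enumerate(data.split(b"\n"), start=1):
        so_at = line.find(SO)
        if so_at == -1:
            continue  # pure ASCII line, no designator needed
        esc_at = line.find(DESIGNATOR)
        if esc_at == -1 or esc_at > so_at:
            missing.append(number)
    return missing


print(lines_missing_designator(DUMP))  # prints [2]
```

Line 1 passes because it starts with the designator; line 2 is flagged because
its first SO, at offset 0x3C, has no designator before it.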
Bruno