This is the mail archive of the libc-alpha@sourceware.cygnus.com mailing list for the glibc project.



problem with ISO-2022-KR encoder


Hello,

The glibc-2.1.1 iconv ISO-2022-KR encoder emits the designator sequence "Esc $ ) C"
only once, at the beginning of its output, instead of once in every line that
contains SO characters.

Ken Lunde's CJK.INF says the designator sequence needs to appear only once,
at the beginning of a text (rationale: ISO-2022-KR uses only one two-byte
character set), but RFC 1557 says it must appear once in every line
containing SO characters (rationale: if some lines of the text get lost,
the remaining ones are still recognizable as Korean).
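To make the RFC 1557 rule concrete, here is a minimal Python sketch of an encoder that emits "Esc $ ) C" once at the start of every line containing SO characters. It is not glibc's code: `encode_iso2022kr_rfc1557` is a hypothetical name, and the sketch assumes that Python's `euc_kr` codec yields KS C 5601 characters in EUC form (both bytes with the high bit set), so clearing bit 7 gives the 7-bit form used between SO and SI. Treating LF as the only line terminator, and putting the sequence at the very start of the line rather than anywhere before the first SO, are choices made for this sketch.

```python
# Minimal sketch of an RFC 1557-style ISO-2022-KR encoder.
# Hypothetical helper for illustration; not glibc's implementation.

DESIGNATOR = b"\x1b$)C"  # ESC $ ) C: designate KS C 5601 into G1
SO = b"\x0e"  # shift out: following byte pairs are KS C 5601
SI = b"\x0f"  # shift in: back to ASCII


def encode_iso2022kr_rfc1557(text: str) -> bytes:
    """Encode text so that every line containing SO starts with ESC $ ) C."""
    out = bytearray()
    lines = text.split("\n")  # assumption: LF is the only line terminator
    for i, line in enumerate(lines):
        body = bytearray()
        shifted = False  # True while between SO and SI
        used_so = False  # whether this line needs the designator
        for ch in line:
            # Assumption: euc_kr gives KS C 5601 in EUC form (bytes >= 0xA1)
            # and raises UnicodeEncodeError for characters outside it.
            raw = ch.encode("euc_kr")
            if len(raw) == 1:  # plain ASCII
                if shifted:
                    body += SI
                    shifted = False
                body += raw
            else:  # two-byte KS C 5601 character
                if not shifted:
                    body += SO
                    shifted = True
                    used_so = True
                body += bytes(b & 0x7F for b in raw)  # EUC -> 7-bit form
        if shifted:
            body += SI  # end every line in ASCII
        if used_so:
            out += DESIGNATOR  # RFC 1557: once per line containing SO
        out += body
        if i < len(lines) - 1:
            out += b"\n"
    return bytes(out)


print(encode_iso2022kr_rfc1557("Korean (한글)\n한글\n"))
```

For "Korean (한글)" the first line comes out as 1B 24 29 43 4B 6F 72 65 61 6E 20 28 0E 47 51 31 5B 0F 29, the same bytes that begin the glibc output in the hexdump of this message; the difference is that the second line gets its own 1B 24 29 43 before its SO.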

glibc-2.1.1 iconv doesn't implement this RFC 1557 requirement:

$ iconv -f UTF-8 -t ISO-2022-KR < KSC5601-snippet.utf-8 > x

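Here is a hexdump of the output. The ^^ marks point at the line feed (0A)
that ends the first line and at the first SO (0E) of the second line: that
line contains SO characters, but no "Esc $ ) C" precedes them.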
000000  1B 24 29 43 4B 6F 72 65 61 6E 20 28 0E 47 51 31  .$)CKorean (.GQ1
000010  5B 0F 29 09 09 09 0E 3E 48 33 67 47 4F 3C 3C 3F  [.)....>H3gGO<<?
000020  64 0F 2C 20 0E 3E 48 33 67 47 4F 3D 4A 34 4F 31  d., .>H3gGO=J4O1
000030  6E 0F 0A 09 4B 53 43 20 20 2D 2D 20 0E 6A 2A 51  n...KSC  -- .j*Q
              ^^                            ^^
000040  28 0F 20 20 0E 4B 52 5B 21 0F 0A                 (.  .KR[!..
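The problem can also be spotted mechanically by checking, line by line, whether "Esc $ ) C" occurs before the first SO. The Python sketch below runs that check on the bytes from the hexdump above; `lines_missing_designator` is a hypothetical helper, and splitting lines on LF alone is an assumption.

```python
from __future__ import annotations

DESIGNATOR = b"\x1b$)C"  # ESC $ ) C
SO = b"\x0e"  # shift out

# The glibc-2.1.1 iconv output, copied from the hexdump above.
ICONV_OUTPUT = bytes.fromhex(
    "1B 24 29 43 4B 6F 72 65 61 6E 20 28 0E 47 51 31 "
    "5B 0F 29 09 09 09 0E 3E 48 33 67 47 4F 3C 3C 3F "
    "64 0F 2C 20 0E 3E 48 33 67 47 4F 3D 4A 34 4F 31 "
    "6E 0F 0A 09 4B 53 43 20 20 2D 2D 20 0E 6A 2A 51 "
    "28 0F 20 20 0E 4B 52 5B 21 0F 0A"
)


def lines_missing_designator(data: bytes) -> list[int]:
    """Return the 1-based numbers of LF-separated lines whose first SO
    is not preceded by ESC $ ) C on the same line."""
    missing = []
    for lineno, line in enumerate(data.split(b"\n"), start=1):
        first_so = line.find(SO)
        if first_so == -1:
            continue  # pure ASCII line: no designator required
        if line.find(DESIGNATOR, 0, first_so) == -1:
            missing.append(lineno)
    return missing


print(lines_missing_designator(ICONV_OUTPUT))  # → [2]
```

Only the second line is reported: the first line is covered by the single designator at offset 0, while the second line switches to KS C 5601 without one.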


          Bruno
