This is the mail archive of the libc-alpha@sourceware.cygnus.com mailing list for the glibc project.


using iconv for conversion from/to Unicode

To: libc-alpha at sourceware dot cygnus dot com
Subject: using iconv for conversion from/to Unicode
From: Bruno Haible <haible at ilog dot fr>
Date: Tue, 14 Mar 2000 14:51:10 +0100 (MET)
Cc: linux-utf8 at nl dot linux dot org


Hi Ulrich,

Would it be possible to add to glibc two encodings:

  (a) UCS-2 with the endianness and alignment restrictions of the running CPU,
      without byte order mark, i.e. arrays of uint16_t,

  (b) UCS-4 with the endianness and alignment restrictions of the running CPU,
      without byte order mark, i.e. arrays of uint32_t,

Proposed names: "uint16_t" and "uint32_t". Or "UCS-2-INTERNAL" and
"UCS-4-INTERNAL".

This would be a great help for programs that use Unicode as their internal
representation. Some such programs use UTF-8 as their internal string
representation, but others use uint16_t[] or uint32_t[]. Currently
such programs, in order to avoid endianness and BOM issues, have to
convert in two separate steps: from the locale-dependent encoding to
UTF-8 via iconv(), then from UTF-8 to uint16_t[] or uint32_t[] via a
self-written recoding loop. This wastes programmer effort and CPU cycles.
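
For illustration, here is a minimal sketch of the kind of hand-written
second step described above: decoding UTF-8 (as produced by a first
iconv() call with "UTF-8" as the target) into a native-endian uint32_t[]
array. The function name utf8_to_ucs4 is hypothetical, and the sketch
assumes the modern UTF-8 range (U+0000..U+10FFFF, at most four bytes per
character), which is an assumption of this example rather than something
stated in the message.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: decode srclen bytes of UTF-8 into native-endian
   code points. Returns the number of code points written to dst, or
   (size_t)-1 if the input is malformed or dst is too small. */
size_t utf8_to_ucs4(const unsigned char *src, size_t srclen,
                    uint32_t *dst, size_t dstcap)
{
    /* Smallest code point that legitimately needs 1, 2, 3 or 4 bytes. */
    static const uint32_t min_cp[4] = { 0x0, 0x80, 0x800, 0x10000 };
    size_t i = 0;
    size_t n = 0;

    assert(src != NULL && dst != NULL);

    while (i < srclen) {
        unsigned char c = src[i];
        uint32_t cp;
        size_t extra;   /* number of continuation bytes expected */
        size_t k;

        if (c < 0x80)                { cp = c;        extra = 0; }
        else if ((c & 0xE0) == 0xC0) { cp = c & 0x1F; extra = 1; }
        else if ((c & 0xF0) == 0xE0) { cp = c & 0x0F; extra = 2; }
        else if ((c & 0xF8) == 0xF0) { cp = c & 0x07; extra = 3; }
        else return (size_t)-1;      /* invalid lead byte */

        if (extra > srclen - i - 1)
            return (size_t)-1;       /* sequence truncated */

        for (k = 1; k <= extra; k++) {
            unsigned char cc = src[i + k];
            if ((cc & 0xC0) != 0x80)
                return (size_t)-1;   /* not a continuation byte */
            cp = (cp << 6) | (cc & 0x3F);
        }

        /* Reject overlong forms, surrogates and out-of-range values. */
        if (cp < min_cp[extra] || cp > 0x10FFFF ||
            (cp >= 0xD800 && cp <= 0xDFFF))
            return (size_t)-1;

        if (n == dstcap)
            return (size_t)-1;       /* output buffer full */
        dst[n++] = cp;
        i += extra + 1;
    }
    return n;
}
```

Every program that keeps its strings in uint32_t[] needs some variant of
this loop today; a native-endian UCS-4 encoding name in iconv() would
make it unnecessary.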

"UCS-2-INTERNAL" would not be hard to implement: it is just an #ifdef
choice between "UNICODEBIG" and "UNICODELITTLE", both of which are already
implemented in glibc.
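
The #ifdef choice could be sketched as follows, from the point of view
of an application that has to pick the name itself. This is only an
illustration, not glibc code: the macro and function names are
hypothetical, and the sketch assumes the __BYTE_ORDER__ predefined
macros provided by GCC and Clang.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Compile-time selection of the UCS-2 encoding name matching the host
   byte order. __BYTE_ORDER__ is a GCC/Clang extension (assumption). */
#if !defined(__BYTE_ORDER__)
# error "byte order macros not available; this sketch assumes GCC or Clang"
#elif __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
# define HOST_BIG_ENDIAN 1
# define UCS2_NATIVE_NAME "UNICODEBIG"
#elif __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
# define HOST_BIG_ENDIAN 0
# define UCS2_NATIVE_NAME "UNICODELITTLE"
#else
# error "unsupported byte order"
#endif

/* Run-time probe: returns 1 if the most significant byte of a uint16_t
   is stored first in memory, 0 otherwise. */
int host_is_big_endian(void)
{
    const uint16_t probe = 0x0102;
    unsigned char first;

    memcpy(&first, &probe, 1);
    return first == 0x01;
}

/* Hypothetical helper: the encoding name to pass to iconv_open() for
   native-endian uint16_t[] data. */
const char *ucs2_native_name(void)
{
    /* Sanity check: the compile-time choice matches the running CPU. */
    assert(host_is_big_endian() == HOST_BIG_ENDIAN);
    return UCS2_NATIVE_NAME;
}
```

An application would pass the result of ucs2_native_name() to
iconv_open() as the source or target encoding. A "UCS-2-INTERNAL" name
inside glibc would move this choice out of every application and into
the library.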

"UCS-4-INTERNAL" would not be hard to add either: It's already glibc's
internal encoding, but unfortunately you can't convert from/to it using
iconv().

Whatever new names you choose in glibc, they will be supported by the next
versions of 'libiconv' and 'recode'. Therefore don't worry about portability.
(glibc is the only system libc with a usable iconv() anyway...)

Bruno

Follow-Ups:
- Re: using iconv for conversion from/to Unicode
  - From: Markus Kuhn
