This is the mail archive of the libc-alpha@sources.redhat.com mailing list for the glibc project.


Re: iconv and combining characters


Chris Heath wrote:
> separate codeset name for Unicode that may be non-NFC.  Something like:
>   iconv -f UTF-8-UNNORMALIZED -t L1
> This has the advantage of not having any speed/memory penalty for those
> who know their data is NFC.

That's a good suggestion. While
    iconv -f UTF-8 -t UTF-8
will remain a no-op,
    iconv -f UTF-8-UNNORMALIZED -t UTF-8
will actually be useful, since it would normalize its input to NFC.
UCS-4-UNNORMALIZED and UTF-16-UNNORMALIZED should be covered similarly.
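
As a rough illustration of what a conversion from the proposed
UTF-8-UNNORMALIZED codeset might amount to, here is a minimal Python sketch.
It is not glibc code: the function name convert_unnormalized_utf8 is
hypothetical, and it assumes the conversion is simply "compose to NFC, then
encode to the target", using Python's standard unicodedata module.

```python
import unicodedata


def convert_unnormalized_utf8(data: bytes, target: str = "latin-1") -> bytes:
    """Hypothetical stand-in for `iconv -f UTF-8-UNNORMALIZED -t L1`.

    Assumption: the proposed codeset would behave like NFC normalization
    followed by an ordinary conversion to the target encoding.
    """
    text = data.decode("utf-8")
    # Compose base characters with their combining marks where possible.
    composed = unicodedata.normalize("NFC", text)
    return composed.encode(target)


# 'e' followed by U+0301 COMBINING ACUTE ACCENT: valid UTF-8, but not NFC.
decomposed_input = "e\u0301".encode("utf-8")
latin1_output = convert_unnormalized_utf8(decomposed_input)


def plain_conversion_fails(data: bytes) -> bool:
    """Show that converting without the normalization step cannot succeed."""
    try:
        data.decode("utf-8").encode("latin-1")
    except UnicodeEncodeError:
        return True
    return False
```

Without the normalization step the combining accent has no Latin-1
equivalent, so a plain conversion has to fail or drop the character; with it,
the pair collapses into the single precomposed character.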

Andreas Schwab wrote:
> MacOS X uses NFD throughout.  (If you have filenames in NFC and you import
> them via NFS to MacOS X the Finder gets confused.)

Good point. This means we should also offer something like

    iconv -f UTF-8 -t UTF-8-DECOMPOSED
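
The opposite direction can be sketched the same way. Again this is only an
illustration in Python, not glibc code: to_decomposed_utf8 is a hypothetical
name, and it assumes UTF-8-DECOMPOSED would mean plain NFD as implemented by
the standard unicodedata module.

```python
import unicodedata


def to_decomposed_utf8(data: bytes) -> bytes:
    """Hypothetical stand-in for `iconv -f UTF-8 -t UTF-8-DECOMPOSED`.

    Assumption: the target codeset corresponds to Unicode NFD.
    """
    text = data.decode("utf-8")
    # Split precomposed characters into base character + combining marks.
    return unicodedata.normalize("NFD", text).encode("utf-8")


# U+00E9 LATIN SMALL LETTER E WITH ACUTE, precomposed (NFC) form.
composed_input = "\u00e9".encode("utf-8")
decomposed_output = to_decomposed_utf8(composed_input)
```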

Thanks for the good suggestions.

Bruno

