This is the mail archive of the glibc-bugs@sourceware.org mailing list for the glibc project.


[Bug localedata/22371] U+FFE2 and U+FFE4, iconv does not convert to HALFWIDTH(EUC-JISX0213)


https://sourceware.org/bugzilla/show_bug.cgi?id=22371

Carlos O'Donell <carlos at redhat dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |carlos at redhat dot com

--- Comment #2 from Carlos O'Donell <carlos at redhat dot com> ---
(In reply to Akira Nakajima from comment #0)
> When converting to EUC-JISX0213,
> iconv does not convert 'FULLWIDTH NOT SIGN' (U+FFE2) to 'NOT SIGN' (U+00AC).
> 
> # printf '\xef\xbf\xa2' | iconv -c -f UTF-8 -t EUC-JISX0213 | od -tx1
> 0000000
> 
> 
> Nearby characters are converted from FULLWIDTH to HALFWIDTH.
> 
> # printf '\xef\xbf\xa3' | iconv -c -f UTF-8 -t EUC-JISX0213 | od -tx1
> 0000000 a1 b1
> 
> 
> nkf converts to 'a2 cc'.

You can get similar behaviour from iconv by appending '//TRANSLIT' to the
target charset, which enables transliteration.

printf '\xef\xbf\xa2' | iconv -c -f UTF-8 -t EUC-JISX0213//TRANSLIT | od -tx1
0000000 a2 cc
0000002

Without transliteration, 'FULLWIDTH NOT SIGN' has no direct representation
in the target character map and, as Andreas explains, the conversion mapping
must be injective for iconv. The bytes 'a2 cc' are already assigned to
<U00AC>, the NOT SIGN.

Note that once transliterated you cannot go back: the mapping with
transliteration is not injective.
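
To make the "cannot go back" point concrete, here is a minimal sketch in
Python. It does not use glibc or its 'translit_wide' table. It assumes the
Unicode <wide> compatibility decomposition is a fair stand-in for the
fullwidth-to-narrow folding discussed here, and 'fold_wide' is a
hypothetical helper name.

```python
import unicodedata


def fold_wide(text: str) -> str:
    """Replace characters that have a <wide> compatibility decomposition.

    Illustration only: this reads the Unicode Character Database shipped
    with Python, not glibc's translit_wide table.
    """
    out = []
    for ch in text:
        decomp = unicodedata.decomposition(ch)  # e.g. "<wide> 00AC"
        if decomp.startswith("<wide> "):
            # Everything after the tag is a list of hex code points.
            out.append("".join(chr(int(cp, 16)) for cp in decomp.split()[1:]))
        else:
            out.append(ch)
    return "".join(out)


# Two distinct inputs collapse onto the same output, so nothing can tell
# afterwards which of them was the original.
fullwidth_not = "\uffe2"  # FULLWIDTH NOT SIGN
plain_not = "\u00ac"      # NOT SIGN
print(fold_wide(fullwidth_not) == fold_wide(plain_not))
```

Because both inputs fold to U+00AC, converting the result back to UTF-8 can
only ever yield U+00AC. That is the information lost when '//TRANSLIT' is used.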

> Similarly,
> iconv does not convert 'FULLWIDTH BROKEN BAR' (U+FFE4)  to 'BROKEN BAR'
> (U+00A6).
> 
> # printf '\xef\xbf\xa4' | iconv -c -f UTF-8 -t EUC-JISX0213 | od -tx1
> 0000000
> 
> # printf '\xef\xbf\xa4' | unorm --normalization=nfkd | od -tx1
> 0000000 c2 a6
> 
> HALFWIDTH (U+00A6) exists in EUC-JISX0213 at 'a9 a5'.
> http://charset.uic.jp/show/eucjisx0213/

printf '\xef\xbf\xa4' | iconv -c -f UTF-8 -t EUC-JISX0213//TRANSLIT | od -tx1
0000000 a9 a5
0000002

Again, with transliteration you can change one into the other.

Keep in mind that transliteration is locale-specific.

In this case glibc has a 'translit_wide' table specifically for wide
transliterations, and it *is* part of the language- and locale-neutral
transliterations that are included by our global i18n locale specification.
Thus all locales "should" have this neutral transformation available.

