This is the mail archive of the glibc-bugs@sourceware.org mailing list for the glibc project.
[Bug localedata/20903] charmaps: glibc's Windows single-byte pages don't map like Windows for previously unmapped points
- From: "arthur200126 at gmail dot com" <sourceware-bugzilla at sourceware dot org>
- To: glibc-bugs at sourceware dot org
- Date: Fri, 02 Dec 2016 15:00:45 +0000
- Auto-submitted: auto-generated
- References: <bug-20903-131@http.sourceware.org/bugzilla/>
https://sourceware.org/bugzilla/show_bug.cgi?id=20903
--- Comment #3 from Mingye Wang <arthur200126 at gmail dot com> ---
$ iconv -f utf-8 -t windows-1252 <<< $'\u0081' | hexdump -C
iconv: illegal input sequence at position 0
(Expected output is \x81\n.)
$ iconv -f windows-874 -t utf-32le <<< $'\x9f\x81' | hexdump -C
iconv: illegal input sequence at position 0
(Expected output is \x9f\0\0\0\x81\0\0\0\n\0\0\0.)
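The expected UTF-32LE bytes follow from a same-value mapping: each C1 byte becomes the code point with the same value, stored as four little-endian bytes. A minimal sketch of that arithmetic, where `utf32le_same_value` is a hypothetical helper name (not part of iconv or glibc) and the trailing newline from the here-string is left out:

```shell
# Hypothetical helper: print the escaped UTF-32LE form of each input byte,
# assuming the byte maps to the code point with the same value (< 0x100).
utf32le_same_value() {
    for b in "$@"; do
        # Low byte first, then three zero bytes (little-endian).
        printf '\\x%02x\\0\\0\\0' "$(( $b ))"
    done
    printf '\n'
}

utf32le_same_value 0x9f 0x81
```

This only prints the escaped notation used above; it does not call iconv or produce raw bytes.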
In theory, this should work once the mappings are fixed:
$ blob=$' '
$ blob=${blob%' '}$'\u00a0'
$ blob=$(iconv -t cp1252 -f utf-8 <<< "$blob")
$ blob=$(iconv -t cp1252 -f utf-8 <<< "$blob") # ERROR
$ blob=$(iconv -t iso-8859-1 -f utf-8 <<< "$blob")
$ blob=$(iconv -t cp1252 -f utf-8 <<< "$blob")
$ blob=$(iconv -t iso-8859-1 -f utf-8 <<< "$blob")
$ [[ "$blob" == $'\xa0' ]]; echo $? # expected: 0
* * *
Correction: Windows only does the same-value assignment for the C1 range
(0x80-0x9f). Outside of this range, Windows assigns PUA mappings as if the
bytes were EUDC chars.[1] (I should have read "best fit" more carefully.)
[1]: https://bugs.python.org/issue28712#msg281044
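The corrected rule can be sketched as a small classifier. This is only an illustration under stated assumptions: `unmapped_byte_mapping` is a hypothetical helper, it assumes the byte passed in is one the charmap leaves unmapped, and it deliberately does not guess the PUA code point, since that value is not given here.

```shell
# Hypothetical helper (not part of glibc or iconv): describe how Windows
# would map a byte that the single-byte charmap otherwise leaves unmapped.
unmapped_byte_mapping() {
    byte=$(( $1 ))    # accepts decimal or 0x-prefixed hex
    if [ "$byte" -ge 128 ] && [ "$byte" -le 159 ]; then
        # C1 range (0x80-0x9f): same-value assignment.
        printf 'U+%04X\n' "$byte"
    else
        # Outside C1: a PUA code point, whose exact value is not
        # derivable from the byte alone, so none is printed.
        printf 'PUA\n'
    fi
}

unmapped_byte_mapping 0x81   # prints U+0081
unmapped_byte_mapping 0xdb   # prints PUA
```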