This is the mail archive of the glibc-bugs@sourceware.org mailing list for the glibc project.



[Bug localedata/20903] charmaps: glibc's Windows single-byte pages don't map like Windows for previously unmapped points


https://sourceware.org/bugzilla/show_bug.cgi?id=20903

--- Comment #3 from Mingye Wang <arthur200126 at gmail dot com> ---
$ iconv -f utf-8 -t windows-1252 <<< $'\u0081' | hexdump -C
iconv: illegal input sequence at position 0
(Expected output: \x81\n.)

$ iconv -f windows-874 -t utf-32le <<< $'\x9f\x81' | hexdump -C
iconv: illegal input sequence at position 0
(Expected output: \x9f\0\0\0\x81\0\0\0\n\0\0\0.)

In theory, this should work once the mappings are fixed:

$ blob=$' '
$ blob=${blob%' '}$'\u00a0'
$ blob=$(iconv -t cp1252 -f utf-8 <<< "$blob")
$ blob=$(iconv -t cp1252 -f utf-8 <<< "$blob") # ERROR
$ blob=$(iconv -t iso-8859-1 -f utf-8 <<< "$blob")
$ blob=$(iconv -t cp1252 -f utf-8 <<< "$blob")
$ blob=$(iconv -t iso-8859-1 -f utf-8 <<< "$blob")
$ [[ "$blob" == $'\xa0' ]]; echo $? # expected: 0

* * *

Correction: Windows only does the same-value assignment for the C1 range
(0x80-0x9f). Outside this range, Windows assigns PUA (Private Use Area)
mappings, as if the bytes were EUDC (end-user-defined) characters.[1]
(I should have read "best fit" more carefully.)
  [1]: https://bugs.python.org/issue28712#msg281044
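The corrected rule can be sketched as a small shell function. This is only an
illustration of the rule as stated above, not glibc or Windows code: the name
win_fallback is hypothetical, it assumes the byte passed in is one the code
page leaves unmapped, and it does not try to compute the actual PUA code
point, which depends on the code page.

```shell
# Hypothetical helper: classify the fallback for a byte that the code page
# table leaves unmapped. Accepts decimal or 0x-prefixed hex.
win_fallback() {
    wf_byte=$(( $1 ))
    if [ "$wf_byte" -ge 128 ] && [ "$wf_byte" -le 159 ]; then
        # C1 range (0x80-0x9f): same-value assignment, byte N -> U+00NN
        printf 'U+%04X\n' "$wf_byte"
    else
        # Outside C1: a private-use code point; exact value not derived here
        printf 'PUA\n'
    fi
}

win_fallback 0x81   # prints U+0081
win_fallback 0xdb   # prints PUA
```

For the first iconv example above, this gives U+0081 for byte 0x81, which is
why the expected result there is the single byte 0x81.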

