This is the mail archive of the
glibc-bugs@sources.redhat.com
mailing list for the glibc project.
[Bug libc/1124] iconv incorrectly converts cp1255
- From: "redhat-bugzilla at future dot shiny dot co dot il" <sourceware-bugzilla at sources dot redhat dot com>
- To: glibc-bugs at sources dot redhat dot com
- Date: 31 Jul 2005 20:44:54 -0000
- Subject: [Bug libc/1124] iconv incorrectly converts cp1255
- References: <20050723195836.1124.z9u2k@bezeqint.net>
- Reply-to: sourceware-bugzilla at sources dot redhat dot com
------- Additional Comments From redhat-bugzilla at future dot shiny dot co dot il 2005-07-31 20:44 -------
From a look into the code today, it turns out the problem originates from the fact
that CP-1255 is a stateful encoding. Upon taking a Hebrew letter from the input,
iconv buffers it and takes the next character. If the next character is a Nikud
(Hebrew diacritic mark), it tries to find a single Unicode codepoint that
represents the Letter+Nikud sequence and uses that.
In short, it always buffers one letter back. Therefore, unless the string is
NUL-terminated (so that the NUL flushes the last character), the last
character remains in the "state" storage and is never flushed to the output as a
Unicode codepoint.
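
The buffering described above can be sketched with a small toy model. This is not glibc's code: the LookbackDecoder class, its pending attribute and the three-entry COMPOSE table are made up for illustration, and Python's own stateless cp1255 codec is used only to turn bytes into characters. It shows the last letter getting stuck in the state until something else (here a NUL) pushes it out.

```python
# Toy model of a one-letter lookback decoder; NOT glibc's actual code.
# COMPOSE holds a few real Unicode precomposed Hebrew forms, for illustration.
COMPOSE = {
    ("\u05e9", "\u05c1"): "\ufb2a",  # SHIN + SHIN DOT
    ("\u05d0", "\u05b7"): "\ufb2e",  # ALEF + PATAH
    ("\u05d1", "\u05bc"): "\ufb31",  # BET + DAGESH
}


class LookbackDecoder:
    def __init__(self):
        self.pending = ""  # the "state": at most one buffered letter

    def feed(self, data: bytes) -> str:
        out = []
        for ch in data.decode("cp1255"):
            if self.pending:
                composed = COMPOSE.get((self.pending, ch))
                if composed is not None:
                    # Letter+Nikud has a single-codepoint form: emit it.
                    out.append(composed)
                    self.pending = ""
                    continue
                # No composition possible: release the buffered letter.
                out.append(self.pending)
                self.pending = ""
            if "\u05d0" <= ch <= "\u05ea":
                self.pending = ch  # Hebrew letter: hold it back
            else:
                out.append(ch)
        return "".join(out)


dec = LookbackDecoder()
text = "\u05e9\u05c1\u05d0"  # SHIN, SHIN DOT, ALEF
out = dec.feed(text.encode("cp1255"))
stuck = dec.pending           # the ALEF never reached the output
tail = dec.feed(b"\x00")      # a trailing NUL pushes it out
```

After the first call, out holds only the composed SHIN and the ALEF is still sitting in dec.pending; it appears only once the NUL is fed.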
It doesn't perform a flush of the state at the end, probably since iconv is a
stream recoder and therefore the next call might provide the continuation of the
stream (e.g. the first part of the stream ends with a Letter and the next part
begins with a Nikud, and they can be joined as a single Unicode codepoint).
There's no way to tell iconv "This is the last chunk of the stream, so flush away".
Solutions? Either:
1. Always flush the last character. (This is not a biggie. In this case we would
emit the non-composed form: the Unicode codepoint for the letter followed by the
Unicode codepoint for the Nikud.)
2. Make a call to iconv with a NULL input buffer and a non-NULL output buffer
perform this flush.
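
To make the trade-off between the two options concrete, here is a rough toy sketch. It is not glibc code: the convert function, its always_flush flag and the one-entry COMPOSE table are hypothetical, and it works on already-decoded characters rather than CP-1255 bytes. Passing None as the chunk stands in for "NULL input, non-NULL output".

```python
import unicodedata

# Toy model only; names and table are illustrative assumptions, not glibc code.
COMPOSE = {("\u05e9", "\u05c1"): "\ufb2a"}  # SHIN + SHIN DOT -> precomposed


def convert(state: list, chunk, always_flush: bool = False) -> str:
    """state is a one-element list holding the buffered letter (or "")."""
    out = []
    if chunk is None:
        # Option 2: a call with no input flushes whatever is buffered.
        out.append(state[0])
        state[0] = ""
        return "".join(out)
    for ch in chunk:
        pair = (state[0], ch)
        if pair in COMPOSE:
            out.append(COMPOSE[pair])
            state[0] = ""
            continue
        out.append(state[0])  # release the previous letter, if any
        state[0] = ch if "\u05d0" <= ch <= "\u05ea" else ""
        if not state[0]:
            out.append(ch)    # not a Hebrew letter: emit immediately
    if always_flush:
        # Option 1: never carry a letter across calls.
        out.append(state[0])
        state[0] = ""
    return "".join(out)


# Option 1: a Letter/Nikud pair split across two chunks stays non-composed.
s1 = [""]
opt1 = convert(s1, "\u05e9", always_flush=True) + convert(s1, "\u05c1", always_flush=True)

# Default behaviour: the same split still composes across calls...
s2 = [""]
dflt = convert(s2, "\u05e9") + convert(s2, "\u05c1")

# ...and option 2 recovers a trailing letter with an explicit flush call.
s3 = [""]
opt2 = convert(s3, "\u05d0") + convert(s3, None)
```

In this model option 1 yields the two-codepoint sequence, which is canonically equivalent to the composed form (unicodedata.normalize("NFD", dflt) equals opt1), while option 2 keeps cross-chunk composition and leaves it to the caller to signal the end of the stream.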
What do you say?
--
http://sources.redhat.com/bugzilla/show_bug.cgi?id=1124