This is the mail archive of the glibc-bugs@sources.redhat.com mailing list for the glibc project.



[Bug libc/1124] iconv incorrectly converts cp1255


------- Additional Comments From redhat-bugzilla at future dot shiny dot co dot il  2005-07-31 20:44 -------
From a look into the code today, it turns out the problem originates from the
fact that the CP-1255 converter is stateful. Upon taking a Hebrew letter from
input, it buffers it and reads the next character. If that next character is a
Nikud (Hebrew diacritic mark), it tries to find a single Unicode codepoint that
represents the Letter+Nikud sequence and, if one exists, emits that instead.

In short, it always buffers one letter back. Therefore, unless the string is
null-terminated (thus having the null flush the last character), the last
character will remain in the "state" storage and will never be flushed as a
Unicode codepoint to the output.
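
The buffering described above can be sketched with a toy model. This is not
glibc's code: the names toy_state, toy_convert and toy_flush are made up for
illustration, and only a handful of CP-1255 bytes and three composed forms are
modelled. It only shows why a trailing letter stays in the state until
something flushes it.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy model of a one-letter lookbehind decoder. Hypothetical names,
   not the glibc implementation. */
typedef struct {
    uint32_t pending;   /* buffered Hebrew letter, or 0 if none */
} toy_state;

/* Map a few CP-1255 bytes to Unicode; everything else becomes U+FFFD. */
static uint32_t toy_to_unicode(unsigned char b)
{
    if (b < 0x80) return b;                                  /* ASCII */
    if (b >= 0xE0 && b <= 0xFA) return 0x05D0u + (b - 0xE0); /* alef..tav */
    if (b == 0xC7) return 0x05B7;                            /* patah */
    if (b == 0xC8) return 0x05B8;                            /* qamats */
    if (b == 0xCC) return 0x05BC;                            /* dagesh */
    return 0xFFFD;                                           /* not modelled */
}

/* Return the composed codepoint for base+mark, or 0 if none is modelled. */
static uint32_t toy_compose(uint32_t base, uint32_t mark)
{
    if (base == 0x05D0 && mark == 0x05B7) return 0xFB2E; /* alef with patah */
    if (base == 0x05D0 && mark == 0x05B8) return 0xFB2F; /* alef with qamats */
    if (base == 0x05D1 && mark == 0x05BC) return 0xFB31; /* bet with dagesh */
    return 0;
}

static int toy_is_letter(uint32_t cp)
{
    return cp >= 0x05D0 && cp <= 0x05EA;
}

/* Convert n bytes. 'out' needs room for n + 1 codepoints, because a letter
   left pending by an earlier call may be emitted here as well.
   Returns the number of codepoints written. */
size_t toy_convert(toy_state *st, const unsigned char *in, size_t n,
                   uint32_t *out)
{
    size_t w = 0;
    for (size_t i = 0; i < n; i++) {
        uint32_t cp = toy_to_unicode(in[i]);
        if (st->pending) {
            uint32_t composed = toy_compose(st->pending, cp);
            if (composed) {            /* letter + mark joined into one */
                out[w++] = composed;
                st->pending = 0;
                continue;
            }
            out[w++] = st->pending;    /* no composition: emit the letter */
            st->pending = 0;
        }
        if (toy_is_letter(cp))
            st->pending = cp;          /* hold it back: a mark may follow */
        else
            out[w++] = cp;
    }
    return w;                          /* a trailing letter is still pending */
}

/* Emit the buffered letter, if any. Returns 0 or 1. */
size_t toy_flush(toy_state *st, uint32_t *out)
{
    if (!st->pending) return 0;
    out[0] = st->pending;
    st->pending = 0;
    return 1;
}
```

Feeding the bytes 0xE0 0xC7 0xE1 (alef, patah, bet) to toy_convert writes only
the composed alef; the bet stays in the state until toy_flush is called, which
is the loss described above.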

It doesn't perform a flush of the state at the end, probably since iconv is a
stream recoder and therefore the next call might provide the continuation of the
stream (e.g. the first part of the stream ends with a Letter and the next part
begins with a Nikud, and they can be joined as a single Unicode codepoint).
There's no way to tell iconv "This is the last chunk of the stream, so flush away".

Solutions? Either:
1. Always flush the last character at the end of each call. (This is not a
biggie. If a Nikud then arrives in the next chunk, we will emit the non-composed
form: the Unicode codepoint for the letter followed by the one for the Nikud.)
2. Make iconv with a NULL input buffer and a non-NULL output buffer perform
this flush.
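
For reference, option 2 corresponds to a call shape that POSIX already defines:
iconv() with a null input pointer and a valid output buffer resets the
conversion state and may write a closing sequence. A minimal sketch of a caller
using it follows. The helper name convert_then_flush is made up, and whether
the CP-1255 converter actually emits its buffered letter on that call is
exactly what is being proposed here, so treat that part as an assumption about
the installed glibc.

```c
#include <assert.h>
#include <iconv.h>
#include <stddef.h>

/* Convert 'inlen' bytes from encoding 'from' to encoding 'to', then ask the
   converter to return to its initial state so that anything it still holds
   can be written out. Hypothetical helper, for illustration only.
   Returns the number of bytes written to 'out', or (size_t)-1 on failure. */
size_t convert_then_flush(const char *from, const char *to,
                          const char *in, size_t inlen,
                          char *out, size_t outcap)
{
    assert(in != NULL && out != NULL);

    iconv_t cd = iconv_open(to, from);
    if (cd == (iconv_t)-1)
        return (size_t)-1;             /* conversion not available */

    char *inp = (char *)in;            /* iconv() takes char **, not const */
    char *outp = out;
    size_t inleft = inlen;
    size_t outleft = outcap;

    /* Normal conversion of the final chunk of the stream. */
    size_t rc = iconv(cd, &inp, &inleft, &outp, &outleft);

    /* "This is the last chunk": null input, non-null output. */
    if (rc != (size_t)-1)
        rc = iconv(cd, NULL, NULL, &outp, &outleft);

    iconv_close(cd);
    return rc == (size_t)-1 ? (size_t)-1 : outcap - outleft;
}
```

A caller converting Hebrew text would pass "CP1255" as 'from' and "UTF-8" as
'to', and would make the null-input call only once, after the final chunk, so
that letters at chunk boundaries in the middle of the stream can still be
joined with a following Nikud.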

What do you say?

-- 


http://sources.redhat.com/bugzilla/show_bug.cgi?id=1124


