This is the mail archive of the
glibc-bugs@sourceware.org
mailing list for the glibc project.
[Bug manual/19404] New: Treatment of combining characters by iconv not documented well
- From: "GavinSmith0123 at gmail dot com" <sourceware-bugzilla at sourceware dot org>
- To: glibc-bugs at sourceware dot org
- Date: Sun, 27 Dec 2015 11:47:09 +0000
- Subject: [Bug manual/19404] New: Treatment of combining characters by iconv not documented well
- Auto-submitted: auto-generated
https://sourceware.org/bugzilla/show_bug.cgi?id=19404
Bug ID: 19404
Summary: Treatment of combining characters by iconv not
documented well
Product: glibc
Version: unspecified
Status: NEW
Severity: normal
Priority: P2
Component: manual
Assignee: unassigned at sourceware dot org
Reporter: GavinSmith0123 at gmail dot com
CC: mtk.manpages at gmail dot com, roland at gnu dot org
Target Milestone: ---
Some character encodings, such as cp-1255, contain combining characters that
attach to the preceding character, for example to represent accents or vowel
points.
When iconv processes input in such an encoding, it may withhold a character
from the output until it has seen whether a combining character follows that
must be combined with it. At the end of the input, iconv must therefore be
called with null input arguments to flush the last character.
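
The flush described above might look like the following sketch. The helper
name convert_and_flush, its parameters and its error convention are
hypothetical and chosen only for illustration; the only library calls used
are iconv_open, iconv and iconv_close. It assumes the whole input is
available at once and that the output buffer is large enough.

```c
#include <assert.h>
#include <iconv.h>
#include <stddef.h>

/* Hypothetical helper (not part of glibc): convert INLEN bytes of IN from
   encoding FROM to encoding TO, writing at most OUTCAP bytes to OUT.
   Returns the number of bytes written, or (size_t) -1 on any error.  */
size_t convert_and_flush(const char *from, const char *to,
                         const char *in, size_t inlen,
                         char *out, size_t outcap)
{
    assert(in != NULL && out != NULL);

    iconv_t cd = iconv_open(to, from);
    if (cd == (iconv_t) -1)
        return (size_t) -1;

    /* iconv takes a char ** for the input; the cast drops const only to
       match that prototype.  */
    char *inp = (char *) in;
    size_t inleft = inlen;
    char *outp = out;
    size_t outleft = outcap;

    /* Convert the input.  A character that could still take a combining
       character may remain in the conversion state after this call.  */
    size_t rc = iconv(cd, &inp, &inleft, &outp, &outleft);

    /* Extra call with null input arguments: emits whatever is still
       pending and puts the conversion back into its initial state.  */
    if (rc != (size_t) -1)
        rc = iconv(cd, NULL, NULL, &outp, &outleft);

    iconv_close(cd);
    return rc == (size_t) -1 ? (size_t) -1 : (size_t) (outp - out);
}
```

A caller would pass, for instance, "CP1255" and "UTF-8" as FROM and TO. For
input arriving in pieces, the null-input call belongs after the final piece
only.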
The manual (in manual/charset.texi, node Generic Conversion Interface) doesn't
document this well. It says the following:
"If INBUF is a null pointer, the `iconv' function performs the
necessary action to put the state of the conversion into the
initial state.
...
"Therefore an `iconv' call to reset the state should always
be performed if some protocol requires this for the output text"
It is not obvious that this applies to combining characters. In this case,
every non-combining graphical character both is and is not a shift character:
it acts as one when a combining character follows it, and does not when no
combining character follows it or when it occurs at the end of the input. This
is not what people have in mind when they read about "shift sequences". The
manual explains that the shift state is reset for the output, but not that
graphical characters may still be waiting to be output.
Moreover, the following in the manual is misleading:
"If all input from the input buffer is successfully converted and
stored in the output buffer, the function returns the number of
non-reversible conversions performed."
This is not true, because iconv can return a non-negative value (that is,
report success) while a character from the input is still held in the
conversion state and has not been stored in the output buffer.
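
One way to observe this is to compare how many bytes have been written
before and after the flushing call. The probe below is only a sketch: the
name probe_pending and its interface are hypothetical, and the 64-byte
scratch buffer is assumed to be large enough for the test input.

```c
#include <assert.h>
#include <iconv.h>
#include <stddef.h>

/* Hypothetical probe (not part of glibc): convert IN, then flush, and
   report how many output bytes existed before and after the flush.
   Returns 0 if both iconv calls succeeded, -1 otherwise.  */
int probe_pending(const char *from, const char *to,
                  const char *in, size_t inlen,
                  size_t *before_flush, size_t *after_flush)
{
    assert(in != NULL && before_flush != NULL && after_flush != NULL);

    iconv_t cd = iconv_open(to, from);
    if (cd == (iconv_t) -1)
        return -1;

    char out[64];              /* scratch output, assumed large enough */
    char *inp = (char *) in;   /* cast only to match iconv's prototype */
    size_t inleft = inlen;
    char *outp = out;
    size_t outleft = sizeof out;
    int status = -1;

    /* The first call reports success even if a character is still pending. */
    if (iconv(cd, &inp, &inleft, &outp, &outleft) != (size_t) -1) {
        *before_flush = (size_t) (outp - out);

        /* Null input arguments: write out anything held in the state.  */
        if (iconv(cd, NULL, NULL, &outp, &outleft) != (size_t) -1) {
            *after_flush = (size_t) (outp - out);
            status = 0;
        }
    }

    iconv_close(cd);
    return status;
}
```

If the first count is smaller than the second for an input such as a single
cp-1255 Hebrew letter, the first call returned successfully without having
stored all of its converted input.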
The extra call to iconv was missing in wget (see
http://lists.gnu.org/archive/html/bug-wget/2015-12/msg00110.html) and info (see
https://lists.gnu.org/archive/html/bug-texinfo/2015-12/msg00010.html).