This is the mail archive of the glibc-bugs@sourceware.org mailing list for the glibc project.



[Bug manual/19404] New: Treatment of combining characters by iconv not documented well


https://sourceware.org/bugzilla/show_bug.cgi?id=19404

            Bug ID: 19404
           Summary: Treatment of combining characters by iconv not
                    documented well
           Product: glibc
           Version: unspecified
            Status: NEW
          Severity: normal
          Priority: P2
         Component: manual
          Assignee: unassigned at sourceware dot org
          Reporter: GavinSmith0123 at gmail dot com
                CC: mtk.manpages at gmail dot com, roland at gnu dot org
  Target Milestone: ---

Some character encodings, such as cp-1255, contain combining characters that
combine with the preceding character, for example to represent accents or
vowel points.

When iconv processes input in such an encoding, it may hold back a character
from the input until it sees whether a combining character follows that would
have to be combined with it. At the end of the input, it is therefore
necessary to call iconv once more with a null INBUF to flush the last character.
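A minimal C sketch of the end-of-input flush, assuming a POSIX iconv. `convert_all` is a hypothetical helper, not a glibc function, and it assumes the whole output fits in one fixed-size buffer. The important step is the second `iconv` call with a null INBUF, made after all input has been consumed.

```c
#include <iconv.h>
#include <stddef.h>

/* Hypothetical helper: converts all INLEN bytes of IN from encoding FROM
   to encoding TO, writing into OUT (capacity OUTCAP). Returns the number
   of bytes written, or -1 on any error (including a too-small OUT). */
static long convert_all(const char *to, const char *from,
                        const char *in, size_t inlen,
                        char *out, size_t outcap)
{
    iconv_t cd = iconv_open(to, from);
    if (cd == (iconv_t) -1)
        return -1;

    char *inbuf = (char *) in;   /* iconv takes char ** but does not modify input */
    char *outbuf = out;
    size_t inleft = inlen, outleft = outcap;

    /* Convert the input proper. A non-error result does not guarantee that
       every input character is already in OUT: a converter that composes
       characters may still hold the last one in the conversion state. */
    if (iconv(cd, &inbuf, &inleft, &outbuf, &outleft) == (size_t) -1) {
        iconv_close(cd);
        return -1;
    }

    /* Flush: with a null INBUF, iconv emits anything still pending and
       returns the conversion state to the initial state. */
    if (iconv(cd, NULL, NULL, &outbuf, &outleft) == (size_t) -1) {
        iconv_close(cd);
        return -1;
    }

    iconv_close(cd);
    return (long) (outcap - outleft);
}
```

For example, converting the 5-byte UTF-8 string "caf\xc3\xa9" to "UCS-4BE" should yield 16 bytes: four characters of four bytes each.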

The manual (in manual/charset.texi, node Generic Conversion Interface) doesn't
document this well. It says the following:

"If INBUF is a null pointer, the `iconv' function performs the
     necessary action to put the state of the conversion into the
     initial state."

...

"Therefore an `iconv' call to reset the state should always
     be performed if some protocol requires this for the output text"


This does not obviously apply to combining characters. Here every
non-combining graphical character is, in effect, both a shift character and not
a shift character: it acts as one when a combining character follows it, and
not when no combining character follows it or it occurs at the end of the
input. This is not what people have in mind when they read about "shift
sequences". The manual explains that the shift state is reset for the output,
but not that graphical characters may still be waiting to be output.

Moreover, the following in the manual is misleading:

"If all input from the input buffer is successfully converted and
     stored in the output buffer, the function returns the number of
     non-reversible conversions performed."

This is not true: iconv can return a successful (non-error) value even though a
character from the input is still held in the iconv state and has not been
stored in the output buffer.
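The return-value problem can be observed with a small experiment. This sketch assumes the iconv implementation provides a "CP1255" converter; `demo_pending` is a hypothetical name. It converts the single byte 0xE0 (HEBREW LETTER ALEF, U+05D0, which vowel points may follow) to UTF-8 and records how many output bytes exist before and after the flush call. If the converter holds back the letter as this report describes for glibc, the first call succeeds and consumes all input, yet `before_flush` is 0; the two UTF-8 bytes of U+05D0 appear only after the flush.

```c
#include <iconv.h>
#include <stddef.h>

/* Hypothetical demo: converts the CP1255 byte 0xE0 (ALEF) to UTF-8 and
   stores the output byte counts before and after the flush call.
   Returns the result of the main iconv call, or (size_t) -1 if the
   converter is unavailable or either call fails. */
static size_t demo_pending(size_t *before_flush, size_t *after_flush)
{
    iconv_t cd = iconv_open("UTF-8", "CP1255");
    if (cd == (iconv_t) -1)
        return (size_t) -1;               /* CP1255 not supported here */

    char in[] = "\xE0";
    char out[16];
    char *inbuf = in, *outbuf = out;
    size_t inleft = 1, outleft = sizeof out;

    size_t rc = iconv(cd, &inbuf, &inleft, &outbuf, &outleft);
    /* Can be 0 even though rc is not an error and inleft is 0:
       the letter may be waiting for a possible combining character. */
    *before_flush = sizeof out - outleft;

    if (rc != (size_t) -1
        && iconv(cd, NULL, NULL, &outbuf, &outleft) == (size_t) -1)
        rc = (size_t) -1;
    *after_flush = sizeof out - outleft;  /* pending letter now written */

    iconv_close(cd);
    return rc;
}
```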

The extra call to iconv was missing in wget (see
http://lists.gnu.org/archive/html/bug-wget/2015-12/msg00110.html) and in info
(see https://lists.gnu.org/archive/html/bug-texinfo/2015-12/msg00010.html).

