This is the mail archive of the glibc-bugs@sourceware.org mailing list for the glibc project.


[Bug locale/19519] iconv(1) with -c option hangs on illegal multi-byte sequences (CVE-2016-10228)


https://sourceware.org/bugzilla/show_bug.cgi?id=19519

Shane Seymour <shane.seymour at hpe dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |shane.seymour at hpe dot com

--- Comment #8 from Shane Seymour <shane.seymour at hpe dot com> ---
I thought I'd chime in since I believe the independent report mentioned by
Florian was from me.

Although it's not a generic fix for the underlying issue, for the class of
character sets that use shift in/shift out sequences (e.g. IBM930 and
IBM933) some of them already behave as expected with -c and some don't (I
can't remember which one did ignore a duplicate shift in or out). For
IBM930 the BODY macro looks in part like this:

#define BODY \
  {                                                                           \
    uint32_t ch = *inptr;                                                     \
    uint32_t res;                                                             \
                                                                              \
    if (__builtin_expect (ch, 0) == SO)                                       \
      {                                                                       \
        /* Shift OUT, change to DBCS converter.  */                           \
        if (curcs == db)                                                      \
          {                                                                   \
            result = __GCONV_ILLEGAL_INPUT;                                   \
            break;                                                            \
          }                                                                   \
        curcs = db;                                                           \
        ++inptr;                                                              \
        continue;                                                             \
      }                                                                       \

If this line:

        if (curcs == db)                                                      \

was changed to:

        if (curcs == db && (! ignore_errors_p ()))                            \

then a duplicate shift-out sequence in IBM930 would no longer cause the
iconv command to loop forever with the -c option (the shift-in code needs
the same change).
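
For anyone who wants to poke at this from the iconv(3) side, here is a
rough reproducer sketch (untested as written). It assumes the //IGNORE
suffix exercises the same ignore_errors_p() path that iconv -c relies on,
and that 0x0E is the IBM930 shift-out byte; the no-progress check at the
bottom is only there so the sketch terminates instead of spinning the way
iconv(1) -c does:

#include <errno.h>
#include <iconv.h>
#include <stdio.h>
#include <string.h>

int
main (void)
{
  /* "//IGNORE" requests the ignore-errors mode; the input below is a
     shift out followed by a second, redundant shift out.  */
  iconv_t cd = iconv_open ("UTF-8//IGNORE", "IBM930");
  if (cd == (iconv_t) -1)
    {
      perror ("iconv_open");
      return 1;
    }

  char inbuf[] = "\x0e\x0e";
  char *inptr = inbuf;
  size_t inleft = 2;
  char outbuf[64];

  while (inleft > 0)
    {
      char *outptr = outbuf;
      size_t outleft = sizeof outbuf;
      size_t before = inleft;

      if (iconv (cd, &inptr, &inleft, &outptr, &outleft) == (size_t) -1)
        fprintf (stderr, "iconv: %s, %zu input byte(s) left\n",
                 strerror (errno), inleft);

      if (inleft == before)
        {
          /* The converter made no progress; iconv(1) -c has no such
             check and keeps calling the conversion forever.  */
          fputs ("no progress, giving up\n", stderr);
          break;
        }
    }

  iconv_close (cd);
  return 0;
}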

More generally, though, the issue could be resolved by having the code
that does the conversions not return an error when it has been told to
ignore errors. In the above example, while duplicate shift in/out sequences
are illegal, when ignoring errors the duplicate should probably just be
consumed and ignored.
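
Spelled out in full (rather than as the one-line condition change above),
the consume-and-ignore behaviour for the duplicate shift out would look
roughly like this - untested, using the same variables as the BODY fragment
quoted earlier, with the continuation backslashes omitted. In this
particular case it collapses to the one-line change, because falling
through to the normal SO handling already consumes the byte:

    if (__builtin_expect (ch, 0) == SO)
      {
        /* Shift OUT, change to DBCS converter.  */
        if (curcs == db)
          {
            if (! ignore_errors_p ())
              {
                result = __GCONV_ILLEGAL_INPUT;
                break;
              }
            /* Ignoring errors: consume the redundant SO and move on.  */
            ++inptr;
            continue;
          }
        curcs = db;
        ++inptr;
        continue;
      }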

Should there be a general rule that the character set conversion macros
must always consume some input when faced with something invalid and
they've been told to ignore errors? Then iconv wouldn't need to know
anything about the character set when set to ignore errors, and it
couldn't loop.

For fixed-size multi-byte character sets the converter at least knows,
based on the character set, how many bytes it should consume, so it's
easier for it to skip past any error (e.g. IBM930 knows whether it should
skip one or two bytes based on the current shift state).
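
As a trivial illustration (hypothetical, reusing the curcs/db names from
the fragment quoted above):

  /* How far to step past one invalid character in IBM930, given the
     current shift state: one byte in SBCS mode, two in DBCS mode.  */
  size_t bytes_to_skip = (curcs == db ? 2 : 1);
  inptr += bytes_to_skip;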

I thought of these general rules for when errors are ignored:

1. Character set converters using a fixed or known number of bytes will
skip forward one character when encountering an invalid character (this
includes character sets that are always one fixed size and character sets
that use shift states to switch between different fixed sizes).
2. Invalid structural data will be consumed (e.g. duplicate shift in/out
states in character sets like IBM930).
3. Variable-size character sets will consume everything from the start of
a character up to and including the byte that made the character invalid
(e.g. in SJIS, if the byte sequence 0x81 0x20 were seen, both bytes would
be consumed, not just the 0x81 - although that could just as easily be
"advance one byte and start again"; see the sketch after this list).
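
To make rule 3 concrete, here is a small hypothetical helper - not glibc
code, and the SJIS lead/trail byte ranges are from memory (vendor
extensions such as CP932 differ slightly) - showing what "consume up to
and including the offending byte" means versus the advance-one-byte
alternative:

#include <stddef.h>

/* Rough SJIS byte classification: lead bytes 0x81-0x9F and 0xE0-0xEF,
   trail bytes 0x40-0xFC excluding 0x7F.  */
int
sjis_lead_byte (unsigned char b)
{
  return (b >= 0x81 && b <= 0x9f) || (b >= 0xe0 && b <= 0xef);
}

int
sjis_trail_byte (unsigned char b)
{
  return b >= 0x40 && b <= 0xfc && b != 0x7f;
}

/* Rule 3: for an invalid sequence starting at P with AVAIL bytes left,
   consume the lead byte plus the byte that made the character invalid.
   The alternative mentioned above would simply return 1 and
   resynchronise on the next byte.  */
size_t
rule3_skip (const unsigned char *p, size_t avail)
{
  if (sjis_lead_byte (p[0]) && avail >= 2 && ! sjis_trail_byte (p[1]))
    return 2;   /* e.g. 0x81 0x20: both bytes are consumed.  */
  return 1;     /* otherwise just the single invalid byte.  */
}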

Those rules are entirely open to discussion though - I've probably missed
some conditions that need to be covered. The downside is that there are a
lot of macros to review to make sure they would all follow a defined set
of rules, and doing so may change their behaviour.

Anything following a conversion failure should really be considered
undefined anyway (as someone else said, garbage in, garbage out), so how
bad data gets treated can follow any set of rules as long as they are
applied consistently by everything.

Does it make more sense for the character set converters to decide what to
drop, skip or quietly consume when told to ignore errors, rather than doing
it somewhere that doesn't really know much about any particular character
set?

-- 
You are receiving this mail because:
You are on the CC list for the bug.
