This is the mail archive of the cygwin mailing list for the Cygwin project.



Re: 1.7.0-48: [BUG] Passing characters above 128 from bash command line


Corinna Vinschen wrote:
> On Jun 3 12:02, Christopher Faylor wrote:
>> On Wed, Jun 03, 2009 at 04:27:55PM +0200, Corinna Vinschen wrote:
>>> On Jun 3 09:18, Edward Lam wrote:
>>>> Corinna Vinschen wrote:
>>>>> The question is, what do you expect? [...]
>>>> [...]
>>>> Wikipedia has several suggestions on how to handle invalid UTF-8 byte sequences (http://en.wikipedia.org/wiki/UTF-8). Personally, I favor the rule that uses the replacement character.
>>> Chris implemented using the invalid code point solution.  The discussion
>>> in http://www.mail-archive.com/linux-utf8@nl.linux.org/msg00080.html
>>> supports this solution.  What's missing so far is the way back, from
>>> an invalid single second half of a surrogate pair in the 0xDCxx range
>>> back to the correct byte value.  I'm just looking into that.
>> The way back was not, AFAIK, needed for Cygwin programs.  I don't think
>> there is a valid way back for Windows programs.
>
> The way back is not needed for the argv handling in Cygwin, but it gets necessary if you converted to UTF-16 in other circumstances. It's not much of a problem since the way back is a no-brainer, in contrast to the conversion to UTF-16.
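
To make the "invalid code point" idea and its way back concrete, here is a minimal sketch. It is not Cygwin's code: it uses Python's built-in "surrogateescape" error handler (PEP 383), and it assumes the same layout as the one described above, i.e. an undecodable byte 0xNN is stored as the lone low surrogate U+DCNN. The helper names bytes_to_text and text_to_bytes are made up for illustration.

```python
# Sketch only: Python's "surrogateescape" handler (PEP 383) stands in for
# the 0xDCxx mapping discussed above. This is an assumption for
# illustration, not the Cygwin implementation.


def bytes_to_text(raw: bytes) -> str:
    """Decode UTF-8; each undecodable byte 0xNN becomes code point U+DCNN."""
    return raw.decode("utf-8", errors="surrogateescape")


def text_to_bytes(text: str) -> bytes:
    """The way back: each U+DCNN escape is turned into the byte 0xNN again."""
    return text.encode("utf-8", errors="surrogateescape")


raw = b"abc\xe9def"  # 0xE9 on its own is not a valid UTF-8 sequence
text = bytes_to_text(raw)

# Show the code points; the bad byte appears as a value in the 0xDCxx range.
print([hex(ord(ch)) for ch in text])

# The round trip restores the original bytes exactly.
print(text_to_bytes(text) == raw)
```

The way back is simple because the escape is a one-to-one mapping on single bytes. The catch is that a lone low surrogate is not well-formed UTF-16, so a consumer that validates its input may reject it or stop at it.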

What is the current state of affairs in Cygwin 1.7.0-48? Is the invalid code point solution currently used when converting the command line to UTF-16 to spawn non-Cygwin processes? What I'm trying to understand is where the command line truncation takes place: in the parent or in the child process.


If the truncation is happening in the child process because of the invalid code point, then perhaps we should consider using the replacement character solution when spawning non-cygwin child processes. IMHO, having a bad character is better than having a truncated command line. At least, the problem (invalid UTF-8) then becomes more obvious.
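
For comparison, a rough sketch of the replacement character rule, again in Python rather than anything taken from Cygwin. The function name decode_with_replacement is hypothetical; errors="replace" is the standard Python decoder option that substitutes U+FFFD for input it cannot decode.

```python
# Sketch only: illustrates the replacement character rule with Python's
# standard "replace" error handler. Not taken from the Cygwin sources.

REPLACEMENT = "\ufffd"  # U+FFFD REPLACEMENT CHARACTER


def decode_with_replacement(raw: bytes) -> str:
    """Decode UTF-8, substituting U+FFFD for input that cannot be decoded."""
    return raw.decode("utf-8", errors="replace")


arg = decode_with_replacement(b"abc\xe9def")

# The argument keeps its full length; only the bad byte is replaced.
print(len(arg))
print(REPLACEMENT in arg)

# U+FFFD is an ordinary BMP character, so strict UTF-16 encoding succeeds.
utf16 = arg.encode("utf-16-le")
print(len(utf16))
```

The trade-off is that this is lossy: once a byte has been replaced by U+FFFD, the original value cannot be recovered, so there is no way back. In exchange, the resulting string is well-formed UTF-16 and the bad input shows up visibly in the argument.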

-Edward


