This is the mail archive of the cygwin-apps@cygwin.com mailing list for the Cygwin project.



Re[2]: libgetopt++ and setup and libstdc++


Hello Robert,

Wednesday, May 01, 2002, 10:22:03 AM, you wrote:

>> -----Original Message-----
>> From: Gary R. Van Sickle [mailto:g.r.vansickle@worldnet.att.net] 
>> Sent: Monday, April 29, 2002 5:39 AM

>> > Except that widechar != unicode. WCHAR is still a 0-terminated
>> > string, but Unicode strings are not 0-terminated.
>> 
>> Sure they are.  A Unicode '\0' == 0x0000 (regardless of your 
>> byte order ;-)).
>>

Zero-terminated strings (C-style strings) have nothing to do with the
basic_string template class. basic_string can contain any character,
including \0. It's much the same as the STL vector. The WCHAR here
only specifies the storage size of a single character...

I.e. you can have typedef basic_string<struct SomeStrangeChar>
SomeStrangeCharString;
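
To make the point about embedded \0 concrete, here is a minimal sketch
using std::wstring. The function names are made up for this
illustration; only std::wstring and std::wcslen are standard.

```cpp
#include <cassert>
#include <cstddef>
#include <cwchar>
#include <string>

// Build a wide string with a NUL in the middle. The (pointer, count)
// constructor copies exactly `count` characters and does not stop at L'\0'.
std::wstring make_embedded_nul() {
    const wchar_t raw[] = L"abc\0def";  // 7 characters plus the implicit terminator
    std::wstring s(raw, 7);
    assert(s.size() == 7);              // the container kept everything
    return s;
}

// Length as the container sees it: every stored character counts.
std::size_t container_length(const std::wstring& s) {
    return s.size();
}

// Length as a C function sees it: counts only up to the first L'\0'.
std::size_t c_style_length(const std::wstring& s) {
    return std::wcslen(s.c_str());
}
```

For the string built above, container_length reports 7 while
c_style_length reports 3, so the terminator convention belongs to the
C functions and not to the string class.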

RC> Read http://www.unicode.org/unicode/uni2book/ch05.pdf section 5.2.
RC> Also read http://www.unicode.org/unicode/uni2book/ch02.pdf which does
RC> note that nul(U+0000) can be used as a string terminator.

RC> Then http://www.unicode.org/unicode/reports/tr17/
RC> "C and C++ char* APIs use serialized bytes, which could represent a
RC> variety of different character maps, including ISO Latin 1, UTF-8,
RC> Windows 1252, as well as compound character maps such as Shift-JIS or
RC> 2022-JP. A byte API could also handle UTF-16BE or UTF-16LE, which are
RC> serialized forms of Unicode. However, these APIs must be allow for the
RC> existence of any byte value, and typically use memcpy plus length
RC> instead of strcpy for manipulating strings." (which is possibly
RC> referring to a non-wchar_t aware strcpy, not sure here).

RC> Anyway, things like UTF-8 can confuse the heck out of c-libraries
RC> because of their multi-byte nature, where
RC> a) a NULL may be part way through a character, not terminating, and
RC> b) a NULL may be illegal at a given point, and the previous partial
RC> character is invalid.
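
A small sketch of a zero byte sitting inside a character. I picked
UTF-16LE for it (one of the serialized forms named in the TR17 quote)
because the effect is easy to show there; that choice and the function
names are mine, not something from the thread.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// "Hi" serialized as UTF-16LE: two bytes per character, low byte first.
// The high byte of each of these characters is 0x00.
std::string utf16le_hi() {
    const char bytes[] = {'H', '\0', 'i', '\0'};
    std::string s(bytes, sizeof bytes);  // length-based copy keeps all 4 bytes
    assert(s.size() == sizeof bytes);
    return s;
}

// strlen treats the first zero byte as the end of the string ...
std::size_t length_by_strlen(const std::string& s) {
    return std::strlen(s.c_str());
}

// ... while an API that carries the length reports every byte.
std::size_t length_by_count(const std::string& s) {
    return s.size();
}
```

Here length_by_strlen gives 1 and length_by_count gives 4, which is
why the quoted text recommends memcpy plus a length over strcpy for
such data.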

RC> Finally, note that Unicode requires 21 bits of storage, so a 16 bit WCHAR
RC> will still involve multi-unit sequences.
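
As a sketch of what that means with 16-bit units: code points above
U+FFFF are split into a surrogate pair in UTF-16. The helper below is
hypothetical and assumes a compiler with the C++11 types char16_t and
char32_t, which postdate this thread.

```cpp
#include <cassert>
#include <string>

// Encode one Unicode code point (U+0000..U+10FFFF, surrogates excluded)
// as UTF-16 code units.
std::u16string to_utf16(char32_t cp) {
    assert(cp <= 0x10FFFF);               // 21 bits is the upper bound
    assert(cp < 0xD800 || cp > 0xDFFF);   // surrogates are not characters
    std::u16string out;
    if (cp < 0x10000) {
        // Fits in a single 16-bit unit.
        out.push_back(static_cast<char16_t>(cp));
    } else {
        const char32_t v = cp - 0x10000;  // 20 bits remain
        out.push_back(static_cast<char16_t>(0xD800 + (v >> 10)));    // high surrogate
        out.push_back(static_cast<char16_t>(0xDC00 + (v & 0x3FF)));  // low surrogate
    }
    return out;
}
```

For example, U+0041 comes out as one unit, while U+10000 comes out as
the two units 0xD800 0xDC00.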

Quote from "The C++ Programming Language":

  "A wide character - that is, an object of type wchar_t ($4.3) - is
  like a char, except that it takes up two or more bytes."

RC> Does the newlib && lib-gcc and libstdc++ string <WCHAR> correctly
RC> understand unicode (and what representation does it use?). Does it use
RC> the same as Win32 WCHAR does? 

>> > (See the NT kernel defines for
>> > UNICODE_STRING to see how unicode strings are represented.).

Btw, I read somewhere else that Windows does not support the full
Japanese character set, only the most commonly used characters.

