This is the mail archive of the cygwin-developers mailing list for the Cygwin project.



Re: "C" character set (again)


2010/1/10 Corinna Vinschen:
>> >> Was that the only change you applied? Are you aware that this also
>> >> requires reverting the changes to libc/stdlib/mbtowc_r.c and
>> >> libc/stdlib/wctomb_r.c, which set the function pointers __mbtowc to
>> >> __utf8_mbtowc and __wctomb to __utf8_wctomb on Cygwin?
>>
>> D'oh. I was (in the past), but didn't remember it this time round ...
>>
>>
>> > Boy, that's tricky. If this change gets reverted, the initial
>> > environment is converted to ASCII, rather than UTF-8. This very change
>> > was what allowed us to remove the special code to fetch the LC_xxx vars
>> > from the Windows environment before converting the environment.
>>
>> ... which is why the environment is being correctly converted to UTF-8 here.
>>
>> So how about leaving the initial __mbtowc and __wctomb pointers as
>> they are?
>
> It feels so unclean...

Does that matter, as long as everything's cleaned up by the time the
actual program starts? Speaking of which, what locale context are C++
global constructors executed in? Is the filesystem/console charset
already set according to the environment by that point?

Here's another concern regarding C changing to ASCII: what would a
user who sets LANG=C (or LANG=C.ASCII, for that matter) expect to
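happen to filenames? Currently, any non-ASCII character would turn into
^X-escaped UTF-8 (the CAN character, 0x18, followed by the UTF-8 bytes).
However, since ASCII doesn't have anything beyond 0x7F (btw, thanks for
patching newlib accordingly), no byte of a UTF-8 multibyte sequence can
be mistaken for an ASCII character. The ^X therefore isn't actually
necessary, and filenames in C(.ASCII) could just use straight UTF-8
anyway.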
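To make the filename argument concrete, here is a minimal sketch, not Cygwin's actual conversion code, comparing the two schemes. It assumes a simplified model in which a filename is an array of Unicode code points rather than Windows UTF-16, and the helper names encode_utf8 and name_to_bytes are hypothetical. With escaping on, each non-ASCII character gets the CAN (0x18, ^X) prefix before its UTF-8 bytes; with escaping off the output is plain UTF-8. Since every byte of a UTF-8 multibyte sequence is 0x80 or above and an ASCII charset never produces such bytes, the plain form stays unambiguous.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encode one Unicode code point as UTF-8 and return the number of bytes
   written (1 to 4).  Surrogate code points are not rejected here. */
static size_t
encode_utf8 (uint32_t cp, unsigned char *out)
{
  assert (cp <= 0x10FFFF);
  if (cp < 0x80)
    {
      out[0] = (unsigned char) cp;
      return 1;
    }
  if (cp < 0x800)
    {
      out[0] = (unsigned char) (0xC0 | (cp >> 6));
      out[1] = (unsigned char) (0x80 | (cp & 0x3F));
      return 2;
    }
  if (cp < 0x10000)
    {
      out[0] = (unsigned char) (0xE0 | (cp >> 12));
      out[1] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
      out[2] = (unsigned char) (0x80 | (cp & 0x3F));
      return 3;
    }
  out[0] = (unsigned char) (0xF0 | (cp >> 18));
  out[1] = (unsigned char) (0x80 | ((cp >> 12) & 0x3F));
  out[2] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
  out[3] = (unsigned char) (0x80 | (cp & 0x3F));
  return 4;
}

/* Hypothetical helper: convert a filename of n code points to bytes in out,
   which must hold at least 5 * n bytes.  If escape_non_ascii is nonzero,
   every non-ASCII character is prefixed with CAN (0x18, ^X), modelling the
   current C/ASCII behaviour; otherwise the result is plain UTF-8, as
   proposed.  Returns the number of bytes written (no NUL is appended). */
static size_t
name_to_bytes (const uint32_t *name, size_t n, int escape_non_ascii,
               unsigned char *out)
{
  size_t len = 0;
  for (size_t i = 0; i < n; i++)
    {
      if (escape_non_ascii && name[i] >= 0x80)
        out[len++] = 0x18;
      len += encode_utf8 (name[i], out + len);
    }
  return len;
}
```

For the name "é.txt" (U+00E9 followed by ".txt"), the escaped form is 0x18 0xC3 0xA9 followed by ".txt" (7 bytes), while the plain form is 0xC3 0xA9 followed by ".txt" (6 bytes).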

Therefore, would something like the patch below make sense?

Andy


Index: winsup/cygwin/syscalls.cc
===================================================================
RCS file: /cvs/src/src/winsup/cygwin/syscalls.cc,v
retrieving revision 1.553
diff -r1.553 syscalls.cc
4351,4353c4352,4363
<   cygheap->locale.mbtowc = __mbtowc;
<   cygheap->locale.wctomb = __wctomb;
<   strcpy (cygheap->locale.charset, __locale_charset ());
---
>   if (strcmp (__locale_charset (), "ASCII") == 0) {
>     /* Use UTF-8 for filenames and console anyway */
>     cygheap->locale.mbtowc = __utf8_mbtowc;
>     cygheap->locale.wctomb = __utf8_wctomb;
>     strcpy (cygheap->locale.charset, "UTF-8");
>   }
>   else {
>     cygheap->locale.mbtowc = __mbtowc;
>     cygheap->locale.wctomb = __wctomb;
>     strcpy (cygheap->locale.charset, __locale_charset ());
>   }
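The selection logic in this patch can also be tried outside Cygwin with a small standalone mock. Everything here is hypothetical: struct fs_locale stands in for cygheap->locale, the conv_fn type is deliberately simplified (newlib's real __mbtowc/__wctomb pointers have longer signatures), and the placeholder converters differ only in identity. Like the patch, it assumes the charset name reported for C/C.ASCII is exactly "ASCII".

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, simplified converter signature. */
typedef int (*conv_fn) (void);

/* Placeholder converters; distinct return values keep them distinct. */
static int utf8_mbtowc (void) { return 1; }
static int utf8_wctomb (void) { return 2; }
static int latin1_mbtowc (void) { return 3; }
static int latin1_wctomb (void) { return 4; }

/* Hypothetical stand-in for the three cygheap->locale fields. */
struct fs_locale
{
  conv_fn mbtowc;
  conv_fn wctomb;
  char charset[32];
};

/* Mirror of the patched logic: locale_charset, cur_mbtowc and cur_wctomb
   play the roles of __locale_charset (), __mbtowc and __wctomb. */
static void
select_fs_charset (struct fs_locale *loc, const char *locale_charset,
                   conv_fn cur_mbtowc, conv_fn cur_wctomb)
{
  assert (loc != NULL && locale_charset != NULL);
  if (strcmp (locale_charset, "ASCII") == 0)
    {
      /* Use UTF-8 for filenames and console anyway. */
      loc->mbtowc = utf8_mbtowc;
      loc->wctomb = utf8_wctomb;
      locale_charset = "UTF-8";
    }
  else
    {
      loc->mbtowc = cur_mbtowc;
      loc->wctomb = cur_wctomb;
    }
  /* Bounded copy; always NUL-terminates, truncating overlong names. */
  snprintf (loc->charset, sizeof loc->charset, "%s", locale_charset);
}
```

Unlike the patch, the sketch copies the charset name once after the branch, using a bounded snprintf instead of strcpy, so the two branches only differ in which converters they install.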

