This is the mail archive of the cygwin mailing list for the Cygwin project.



Re: Q: Is anybody here using the CYGWIN=codepage:oem setting?


On Mar 19 11:48, David Rothenberger wrote:
> On 3/19/2009 11:13 AM, Corinna Vinschen wrote:
>> On Mar 19 10:33, David Rothenberger wrote:
>>> On 3/19/2009 6:09 AM, Corinna Vinschen wrote:
>>>> If you've set $LANG to, say, "en_US.UTF-8", Cygwin would use the UTF-8
>>>> charset *iff* the application switched the codepage by calling something
>>>> along the lines of `setlocale(LC_ALL, "");'.
>>>> An application which does not call setlocale (which means, it's not
>>>> native language aware anyway) would still use the default ANSI codepage.
>>>
>>> I ran into an issue yesterday where I was trying to "du -sh" a directory
>>> that contained files whose names included UTF characters, I think.
>>> Without CYGWIN=codepage:utf8, this failed. It worked fine when I added
>>> CYGWIN=codepage:utf8.
>>
>> Yes, sure.  As described in the User's Guide.  That's exactly what bugs
>> me right now.  To get UTF-8 support you have to set LANG or LC_ALL or
>> whatever, *and* CYGWIN=codepage:utf8.
>
> In my specific case, I didn't need to set LANG or LC_ALL, just  
> CYGWIN=codepage:utf8.

Yes, sure.  LANG and friends are used in the locale-specific functions
in newlib, while codepage:xxx is used in Cygwin itself.  Your case is
only a matter of converting filenames from UTF-16 to some multibyte
charset.  That conversion uses codepage:xxx right now.  All other
multibyte/wide character handling in the application is controlled by
setlocale, though.
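
To illustrate the application side of this, here is a minimal sketch
in plain ISO C.  It is not the conversion code inside the Cygwin DLL;
the helper name name_to_multibyte is hypothetical and only shows that
wcstombs() converts to the charset of whatever locale is currently
in effect.

```c
#include <stdlib.h>
#include <wchar.h>

/* Hypothetical helper, for illustration only: convert a wide-character
   name to the multibyte charset of the *current* locale.
   Returns 0 on success, -1 if a character cannot be represented in
   that charset or if buf is too small for the result plus its NUL. */
int
name_to_multibyte (const wchar_t *wname, char *buf, size_t buflen)
{
  size_t n = wcstombs (buf, wname, buflen);

  if (n == (size_t) -1)
    return -1;          /* character not representable in this charset */
  if (n >= buflen)
    return -1;          /* output filled up, no room for the NUL */
  return 0;
}
```

An application that wants the charset named in LANG or LC_ALL would
call setlocale (LC_ALL, "") from <locale.h> once at startup, before
using such a helper.  Without that call the program stays in the
default "C" locale, so names containing characters outside that
locale's charset can make the conversion fail.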

>>> So my question is, will this work if codepage is dropped and I set LANG
>>> to en_US.UTF-8? Is there anything in the Cygwin DLL itself that uses
>>> codepage that might be valuable to enable even for applications that
>>> aren't native language aware and don't call setlocale()?
>>
>> Not exactly.  However, assuming you have a file using characters which
>> are not in your current ANSI codeset, then you could only manipulate
>> that file when setting LANG="xx_YY.UTF-8", and only in applications
>> which call setlocale().
>
> I have no idea whether du calls setlocale() or not. I think you're  
> saying that today, with codepage:utf8, it is able to get sizes for files  
> using non-ANSI characters, but if codepage is removed, it would not be  
> able to do so unless it called setlocale(). Is that right?

Right.


Corinna

-- 
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Project Co-Leader          cygwin AT cygwin DOT com
Red Hat
