This is the mail archive of the cygwin mailing list for the Cygwin project.

RE: [BUG REPORT]sed -e 's/[B-D]/_/g' replaces unexpected characters

From: "Buchbinder, Barry (NIH/NIAID) [E]" <BBuchbinder at niaid dot nih dot gov>
To: "cygwin at cygwin dot com" <cygwin at cygwin dot com>
Date: Tue, 25 Jun 2013 16:06:58 +0000
Subject: RE: [BUG REPORT]sed -e 's/[B-D]/_/g' replaces unexpected characters
References: <CA+nJC97He=j-O2FZ-Y2jJhYXEJn2o2EfC1wO39+2bZ=nj1f-zA at mail dot gmail dot com> <20130625152356 dot GD11958 at calimero dot vinschen dot de> <5F8AAC04F9616747BC4CC0E803D5907D0C37C25C at MLBXv04 dot nih dot gov>

Lavrentiev, Anton sent the following at Tuesday, June 25, 2013 11:44 AM
>> The character ordering is based on the default Windows ordering for the
>> locale, and that's dictionary ordering, apparently.
>
>Ah, I see what you meant here. There's a more detailed explanation:
>
>http://www.gnu.org/software/gawk/manual/html_node/Ranges-and-Locales.html

Also, the "Reporting Bugs" section of the sed info documentation
explicitly says that this is not a bug:

`[a-z]' is case insensitive
     You are encountering problems with locales.  POSIX mandates that
     `[a-z]' uses the current locale's collation order - in C parlance,
     that means using `strcoll(3)' instead of `strcmp(3)'.  Some
     locales have a case-insensitive collation order, others don't.

     Another problem is that `[a-z]' tries to use collation symbols.
     This only happens if you are on the GNU system, using GNU libc's
     regular expression matcher instead of compiling the one supplied
     with GNU sed.  In a Danish locale, for example, the regular
     expression `^[a-z]$' matches the string `aa', because this is a
     single collating symbol that comes after `a' and before `b'; `ll'
     behaves similarly in Spanish locales, or `ij' in Dutch locales.

     To work around these problems, which may cause bugs in shell
     scripts, set the `LC_COLLATE' and `LC_CTYPE' environment variables
     to `C'.
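
For what it's worth, here is a minimal sketch of that workaround applied to
the command from the original report. The input string is made up for
illustration. I use `LC_ALL=C` rather than setting `LC_COLLATE` and
`LC_CTYPE` separately, on the assumption that `LC_ALL` might already be set
in the environment, in which case it would override the other two.

```sh
# Hypothetical input, not taken from the original report.
# With LC_ALL=C the range [B-D] is compared by byte value,
# so it covers only the uppercase letters B, C and D.
result=$(printf 'ABCDE abcde\n' | LC_ALL=C sed -e 's/[B-D]/_/g')
printf '%s\n' "$result"   # prints: A___E abcde
```

Prefixing the assignment to the command confines the locale change to that
one sed invocation, so the rest of the script keeps the user's locale.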

References:
- [BUG REPORT]sed -e 's/[B-D]/_/g' replaces unexpected characters
  - From: Atry
- Re: [BUG REPORT]sed -e 's/[B-D]/_/g' replaces unexpected characters
  - From: Corinna Vinschen
- RE: [BUG REPORT]sed -e 's/[B-D]/_/g' replaces unexpected characters
  - From: Lavrentiev, Anton (NIH/NLM/NCBI) [C]
