This is the mail archive of the libc-locales@sourceware.org mailing list for the GNU libc locales project.


[Bug localedata/13063] 'sort -u' will erase some Chinese characters

From: "maiku.fabian at gmail dot com" <sourceware-bugzilla at sourceware dot org>
To: libc-locales at sourceware dot org
Date: Thu, 20 Jul 2017 08:01:58 +0000
Subject: [Bug localedata/13063] 'sort -u' will erase some Chinese characters
Auto-submitted: auto-generated
References: <bug-13063-716@http.sourceware.org/bugzilla/>

https://sourceware.org/bugzilla/show_bug.cgi?id=13063

--- Comment #7 from Mike FABIAN <maiku.fabian at gmail dot com> ---
(In reply to Mingye Wang from comment #6)
> This bug is not only seen with extA characters, but also seen with simple
> punctuations and/or kanas. 
> 
> $ printf '%s\n' ， 。 ： ￥ あ か ア カ a b c , . : $ | LC_COLLATE=zh_CN.UTF-8 sort -u
> ,
> :
> .
> $
> ，
> a
> b
> c
> 
> (uniq does the same thing.)
> 
> It seems that glibc is just eating away anything not on that list. (What
> kind of equivalence assumption is that?)

This is caused by the UNDEFINED collation symbol not working correctly;
see:

https://sourceware.org/bugzilla/show_bug.cgi?id=18978
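A minimal Python sketch of why distinct characters can disappear, assuming that `sort -u` (and `uniq`) treat two lines as duplicates whenever the locale's collation compares them as equal, i.e. `strcoll()` returns 0. If the locale definition gives several characters the same collation weights (for example because UNDEFINED does not assign them distinct positions), every such character after the first in a sorted run is dropped even though the code points differ. The function name `collation_unique` is hypothetical and only emulates the adjacent-duplicate removal of `sort -u`. It uses the standard `locale` module. The actual output for the `zh_CN.UTF-8` example depends on the installed glibc version and locales, so none is shown.

```python
import locale
from typing import Iterable, List


def collation_unique(words: Iterable[str], loc: str) -> List[str]:
    """Sort `words` by the collation of locale `loc` and keep only the first
    item of each run that strcoll() reports as equal (hypothetical helper,
    roughly emulating `sort -u` under the assumption stated above)."""
    previous = locale.setlocale(locale.LC_COLLATE)  # query current setting
    locale.setlocale(locale.LC_COLLATE, loc)  # raises locale.Error if missing
    try:
        result: List[str] = []
        for w in sorted(words, key=locale.strxfrm):
            # Dropped if it collates equal to the last kept item,
            # even when the two strings contain different characters.
            if result and locale.strcoll(result[-1], w) == 0:
                continue
            result.append(w)
        return result
    finally:
        locale.setlocale(locale.LC_COLLATE, previous)  # restore original


if __name__ == "__main__":
    words = ["，", "。", "：", "￥", "あ", "か", "ア", "カ",
             "a", "b", "c", ",", ".", ":", "$"]
    try:
        print(collation_unique(words, "zh_CN.UTF-8"))
    except locale.Error:
        print("zh_CN.UTF-8 is not installed on this system")
```

In the "C" locale, collation is a plain code-point comparison, so no distinct characters are merged; comparing that result with the `zh_CN.UTF-8` result on an affected system shows which characters the locale considers equal.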

-- 
You are receiving this mail because:
You are the assignee for the bug.
