This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: Improved check-localedef script
- From: Mike FABIAN <mfabian at redhat dot com>
- To: Luis Javier Merino <ninjalj at gmail dot com>
- Cc: Zack Weinberg <zackw at panix dot com>, GNU C Library <libc-alpha at sourceware dot org>, Rafal Luzynski <digitalfreak at lingonborough dot com>
- Date: Fri, 04 Aug 2017 14:45:21 +0200
- Subject: Re: Improved check-localedef script
- References: <CAKCAbMjLN7SMWwveXVokSCttqso+r+1AttpFEpDBdJcSyiuQ4Q@mail.gmail.com> <s9dfud7j0kc.fsf@redhat.com> <CABjvSdjwLw_eJu1G9WU7PG=KarWp6At5uU0rM0HeoS8v=wktSg@mail.gmail.com>
Luis Javier Merino <ninjalj@gmail.com> wrote:
>> This is the first abmon string:
>>
>> abmon "جنوری";/
>>
>> The last letter in this string, ی U+06CC ARABIC LETTER FARSI YEH
>> is not convertible to CP1256.
>>
>> But this letter seems to be really used in writing Urdu, see:
>>
>> https://en.wikipedia.org/wiki/Urdu_alphabet
>> https://en.wikipedia.org/wiki/Urdu_alphabet#Ye
>>
>> So I think CP1256 is not a suitable charset to use for Urdu.
>
>
> Note that there is a transliteration rule for that letter:
>
> translit_start
> include "translit_combining";""
>
> % those two lettes are not in cp1256...
>
> % Maddah above -> Alef with madda above
> <U0653> "<U0622>"
> % Farsi yeh -> yeh
> <U06CC> "<U064A>"
>
> translit_end
Yes, this transliterates it to an Arabic letter which looks identical
in most cases, and that Arabic letter is contained in CP1256.
echo -n ی | iconv -f utf-8 -t cp1256//translit
does not transliterate it though.
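
To make the intended effect of the two quoted rules concrete, here is a minimal sketch in Python. It is not glibc's or iconv's implementation. The table TRANSLIT and the helpers is_encodable and encode_cp1256 are hypothetical names made up for this illustration, and it assumes Python's built-in "cp1256" codec follows the Windows-1256 mapping. A replacement is applied only when the character cannot be encoded directly.

```python
# Hypothetical table mirroring the two quoted translit rules (illustration only).
TRANSLIT = {
    "\u0653": "\u0622",  # maddah above -> alef with madda above
    "\u06cc": "\u064a",  # Farsi yeh -> yeh
}


def is_encodable(ch: str, charset: str = "cp1256") -> bool:
    """Return True if ch can be encoded directly in charset."""
    try:
        ch.encode(charset)
        return True
    except UnicodeEncodeError:
        return False


def encode_cp1256(text: str) -> bytes:
    """Encode text to CP1256, falling back to TRANSLIT for unencodable characters."""
    out = bytearray()
    for ch in text:
        if is_encodable(ch):
            out += ch.encode("cp1256")
        else:
            # Raises KeyError when no rule exists for the character.
            out += TRANSLIT[ch].encode("cp1256")
    return bytes(out)


if __name__ == "__main__":
    abmon = "\u062c\u0646\u0648\u0631\u06cc"  # the first abmon string quoted above
    print(is_encodable("\u06cc"))      # Farsi yeh, checked directly
    print(encode_cp1256(abmon).hex())  # bytes after applying the fallback
```

With such a fallback the abmon string becomes representable in CP1256, at the cost of replacing the Farsi yeh with the Arabic yeh.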
>> https://en.wikipedia.org/wiki/Windows-1256
>>
>> says:
>>
>> Wikipedia> Windows-1256 is a code page used to write Arabic (and possibly some
>>
>> Note the “possibly”.
>>
>> Wikipedia> other languages that use Arabic script, like Persian and Urdu) under
>> Wikipedia> Microsoft Windows.
>> Wikipedia> [...]
>> Wikipedia> Unicode and UTF-8 are preferred to Windows 1256 in modern
>> Wikipedia> applications. 0.1% of all web pages use Windows-1256 in June 2016.
>>
>> So CP1256 doesn’t seem to be used much anymore.
>>
>
> Still, Xorg's locale.alias aliases ur_PK to ur_PK.CP1256:
> https://cgit.freedesktop.org/xorg/lib/libX11/tree/nls/locale.alias.pre#n1121
> , but that line comes straight from 2004:
> https://cgit.freedesktop.org/xorg/lib/libX11/commit/nls/locale.alias.pre?id=c6349f43193b74a3c09945f3093a871b0157ba47
In glibc, we do not have a ur_PK locale using CP1256 encoding:
$ locale -a | grep ^ur
ur_IN
ur_IN.utf8
ur_PK
ur_PK.utf8
$ LC_ALL=ur_PK locale charmap
UTF-8
$
--
Mike FABIAN <mfabian@redhat.com>