This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.


Re: Improved check-localedef script


4.08.2017 08:42 Mike FABIAN <mfabian@redhat.com> wrote:
>
>
> Zack Weinberg <zackw@panix.com> wrote:
>
> [...]
> > ... and finds dozens and dozens of errors. The full list is attached,
> > but here's a small sample:
> >
> > localedata/locales/ur_PK... (charset: cp1256)
> > localedata/locales/ur_PK:114: string not representable in cp1256:
> > 062C 0646 0648 0631 06CC
> > localedata/locales/ur_PK:115: string not representable in cp1256:
> > 0641 0631 0648 0631 06CC
> > localedata/locales/ur_PK:117: string not representable in cp1256:
> > 0627 067E 0631 06CC 0644
> >
> > These are the abmon strings, so I think it really would be a problem...
>
> This is the first abmon string:
>
> abmon "جنوری";/
>
> The last letter in this string, ی U+06CC ARABIC LETTER FARSI YEH
> is not convertible to CP1256.
> [...]

This "% Charset: CP1256" line is just a comment.  Is it used anywhere?
I don't think so.  I think the localedata/SUPPORTED file is what
matters, and it requires ur_PK (and ur_IN as well) to be generated for
UTF-8 only.
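
For reference, the sample error quoted above can be reproduced outside
the script.  This is only a minimal sketch in Python, not what
check-localedef does internally; the helper name unrepresentable() is
made up for illustration, and it relies on Python's own cp1256 codec
matching the charmap glibc uses, which is an assumption.

```python
def unrepresentable(text, charset):
    """Return the code points of `text` that `charset` cannot encode."""
    missing = []
    for ch in text:
        try:
            ch.encode(charset)
        except UnicodeEncodeError:
            missing.append(f"U+{ord(ch):04X}")
    return missing


# First abmon string reported for ur_PK line 114:
# 062C 0646 0648 0631 06CC
abmon_jan = "\u062C\u0646\u0648\u0631\u06CC"

print("cp1256:", unrepresentable(abmon_jan, "cp1256"))
print("utf-8: ", unrepresentable(abmon_jan, "utf-8"))
```

With the declared charset it should flag only the FARSI YEH, and with
UTF-8 (the only charset SUPPORTED asks for) nothing at all.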

> [...]
> So I think we should replace
>
> % Charset: CP1256
>
> with
>
> % Charset: UTF-8
>
> in ur_PK.

The file is currently pure 7-bit ASCII.  Do we need this line at all?
What about removing it?  If it should not be removed, then maybe
consider ASCII; UTF-8 is good if ASCII cannot be used.  Strictly
speaking, CP1256 is also true: the file uses ASCII, which is a common
subset of many other charsets.  The only problem is that CP1256 is
misleading and causes those false positives.
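
To illustrate the distinction between the encoding of the source file
and the characters it describes, here is a rough Python sketch.  It
assumes the locale source spells non-ASCII characters in the <Uxxxx>
notation, so the sample line below is a guess at how line 114 looks on
disk, not a copy of it; is_ascii() and expand_ucs_escapes() are
hypothetical helper names.

```python
import re


def is_ascii(data):
    """True if every byte of `data` is 7-bit ASCII."""
    return all(b < 0x80 for b in data)


def expand_ucs_escapes(line):
    """Replace each <Uxxxx> escape with the character it names."""
    return re.sub(
        r"<U([0-9A-Fa-f]{4,8})>",
        lambda m: chr(int(m.group(1), 16)),
        line,
    )


# Assumed on-disk form of the first abmon entry (illustrative only).
sample = 'abmon "<U062C><U0646><U0648><U0631><U06CC>";/'

print(is_ascii(sample.encode("ascii")))        # the source bytes
expanded = expand_ucs_escapes(sample)
print(is_ascii(expanded.encode("utf-8")))      # the described string
```

The source bytes are ASCII whatever the "% Charset:" comment says; only
the expanded strings depend on the target charset.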

TL;DR: my suggestions are (in the order of my preference):

- remove this line,
- replace with % Charset: ASCII
- replace with % Charset: UTF-8
- leave unchanged,
- feel free to post your own suggestion.

Regards,

Rafal

