This is the mail archive of the libc-locales@sourceware.org mailing list for the GNU libc locales project.



[Bug localedata/22241] New locale: Yakut (Sakha) locale for Russia (sah_RU)


https://sourceware.org/bugzilla/show_bug.cgi?id=22241

--- Comment #1 from Rafal Luzynski <digitalfreak at lingonborough dot com> ---
Comment on attachment 10501
  --> https://sourceware.org/bugzilla/attachment.cgi?id=10501
sah_RU locale file for glibc

Thank you for providing this new locale file. Here is my review:

>escape_char  /
>comment_char  %

This is correct but I think there is no need to have two spaces after
"escape_char" and "comment_char".

Also, as stated in bug 1123, the legal notice should go here, starting with:
"% This file is part of the GNU C Library and contains locale data."

See: https://sourceware.org/git/?p=glibc.git;a=commitdiff;h=a4cea54
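
For illustration, a sketch of how the top of the file could look with single
spaces and the notice in place.  Only the first line of the notice is quoted
from above; the remaining wording is reproduced from memory of other locale
files, so please copy it from an existing file in localedata/locales rather
than from here:

```
escape_char /
comment_char %

% This file is part of the GNU C Library and contains locale data.
% The Free Software Foundation does not claim any copyright interest
% in the locale data contained in this file.  The foregoing does not
% affect the license of the GNU C Library as a whole.  It does not
% exempt you from the conditions of the license if your use would
% otherwise be governed by that license.
```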

>% Yakut (Sakha) locale for Russian Federation
>% Source: Valery Timiriliyev
>% Email: timiriliyev@gmail.com
>% Tel: (none)
>% Fax: (none)

Probably it does not matter much but there is no need to put "none" in
parentheses.

>% Language: sah
>% Territory: RU
>% Revision: 1.0.0
>% Date: 2017-10-02

Regarding the dates: please remember to update them when you deliver a new
version.

>% Users: general
>% Repertoiremap: mnemonic,ds

I don't understand what this "Repertoiremap" comment is for.  I suggest
removing it.

>% Charset: UTF-8

We have agreed to remove this line because it is misleading.  See:
https://sourceware.org/git/?p=glibc.git;a=commitdiff;h=39b20aa

>%
>
>LC_IDENTIFICATION
>title      "Yakut (Sakha) locale for Russian Federation"
>source     "Valery Timiriliyev"
>address    ""
>contact    "Valery Timiriliyev"
>email      "timiriliyev@gmail.com"
>tel        ""
>fax        ""
>language   "Yakut (Sakha)"
>territory  "Russian Federation"
>revision   "1.0.0"
>date       "2017-10-02"
>%
>category  "i18n:2012";LC_IDENTIFICATION
>category  "i18n:2012";LC_CTYPE
>category  "i18n:2012";LC_COLLATE
>category  "i18n:2012";LC_TIME
>category  "i18n:2012";LC_NUMERIC
>category  "i18n:2012";LC_MONETARY
>category  "i18n:2012";LC_MESSAGES
>category  "i18n:2012";LC_PAPER
>category  "i18n:2012";LC_MEASUREMENT
>category  "i18n:2012";LC_NAME
>category  "i18n:2012";LC_ADDRESS
>category  "i18n:2012";LC_TELEPHONE
>END LC_IDENTIFICATION

LGTM.

>
>LC_CTYPE
>copy "ru_RU"
>END LC_CTYPE

I haven't checked: are you sure that all Yakut-specific characters are already
covered there?

>
>LC_COLLATE
>copy "iso14651_t1"
>
>collating-symbol <YAK-GHE>
>collating-symbol <YAK-ENG>
>collating-symbol <YAK-OE>
>collating-symbol <YAK-HE>
>collating-symbol <YAK-UE>
>
>reorder-after <CYR-GHE>
><YAK-GHE>
>reorder-after <CYR-EN>
><YAK-ENG>
>reorder-after <CYR-O>
><YAK-OE>
>reorder-after <CYR-ES>
><YAK-HE>
>reorder-after <CYR-OU>
><YAK-UE>
>
>% Ҕ after Г
>reorder-after <U0433>
><U0495> <YAK-GHE>;<PCL>;<MIN>;IGNORE
>reorder-after <U0413>
><U0494> <YAK-GHE>;<PCL>;<CAP>;IGNORE
>
>% Ҥ after Н
>reorder-after <U043D>
><U04A5> <YAK-ENG>;<PCL>;<MIN>;IGNORE
>reorder-after <U041D>
><U04A4> <YAK-ENG>;<PCL>;<CAP>;IGNORE
>
>% Ө after О
>reorder-after <U043E>
><U04E9> <YAK-OE>;<PCL>;<MIN>;IGNORE
>reorder-after <U041E>
><U04E8> <YAK-OE>;<PCL>;<CAP>;IGNORE
>
>% Һ after С
>reorder-after <U0441>
><U04BB> <YAK-HE>;<PCL>;<MIN>;IGNORE
>reorder-after <U0421>
><U04BA> <YAK-HE>;<PCL>;<CAP>;IGNORE
>
>% Ү after У
>reorder-after <U0443>
><U04AF> <YAK-UE>;<PCL>;<MIN>;IGNORE
>reorder-after <U0423>
><U04AE> <YAK-UE>;<PCL>;<CAP>;IGNORE
>
>reorder-end
>
>END LC_COLLATE

I haven't tested this but I've compared the source code with the Yakut alphabet
as described in Wikipedia.  Looks good.  Please ignore the Unicode issues of my
browser. :-)

>
>LC_MONETARY
>% ISO 4217 Currency and fund codes
>
>% 'RUB '
>int_curr_symbol         "<U0052><U0055><U0042><U0020>"
>
>% '₽'
>currency_symbol         "<U20BD>"
>
>% '.'
>mon_decimal_point       "<U002E>"
>
>% ' ' (no-break space)
>mon_thousands_sep       "<U00A0>"
>
>mon_grouping            3;3
>positive_sign           ""
>negative_sign           "<U002D>"
>int_frac_digits         2
>frac_digits             2
>p_cs_precedes           0
>p_sep_by_space          1
>n_cs_precedes           0
>n_sep_by_space          1
>p_sign_posn             1
>n_sign_posn             1
>END LC_MONETARY

This section is identical to ru_RU, so please consider using
copy "ru_RU" instead.
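
A sketch of the shortened section, assuming the values really are meant to
stay identical to ru_RU:

```
LC_MONETARY
copy "ru_RU"
END LC_MONETARY
```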

>
>LC_NUMERIC
>% ","
>decimal_point           "<U002C>"
>
>% no-break space
>thousands_sep           "<U00A0>"
>grouping                3;3
>END LC_NUMERIC

The same here.

>
>LC_TIME
>% abday - The abbreviations for the week days:
>% - бн, оп, ср, чп, бт, сб, бс

THE MAJOR ISSUE HERE: The abday and day arrays must always start with Sunday.
If you want Monday to be displayed first, that is specified elsewhere (see
"first_weekday" below).
See also: https://sourceware.org/glibc/wiki/Locales#LC_TIME
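
As a sketch, here are your current abbreviations reordered so that Sunday
comes first.  The spellings are unchanged; the CLDR remarks below are a
separate matter:

```
% - бс, бн, оп, ср, чп, бт, сб
abday       "<U0431><U0441>";"<U0431><U043D>";/
            "<U043E><U043F>";"<U0441><U0440>";/
            "<U0447><U043F>";"<U0431><U0442>";/
            "<U0441><U0431>"
```

The same rotation applies to the "day" array: move the last entry to the
front and leave the rest in order.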

>abday       "<U0431><U043D>";/
>            "<U043E><U043F>";"<U0441><U0440>";/

CLDR says that the Wednesday abbreviation should be "сэ" while yours is "ср".
We tend to trust CLDR more, so please consider changing to "сэ" or report an
issue to CLDR.
See CLDR: http://st.unicode.org/cldr-apps/v#/sah/Gregorian/

>            "<U0447><U043F>";"<U0431><U0442>";/

Similar problem with Friday: CLDR says "бэ" while your version is "бт".  Again,
please switch your version to CLDR or report an issue to CLDR.

>            "<U0441><U0431>";"<U0431><U0441>"
>
>% day - The full names of the week days:
>% - бэнидиэнньик, оптуорунньук, сэрэдэ,
>%   чэппиэр, бээтинсэ, субуота, баскыһыанньа

Same problem with days order.

>day         "<U0431><U044D><U043D><U0438><U0434><U0438><U044D><U043D><U043D><U044C><U0438><U043A>";/
>            "<U043E><U043F><U0442><U0443><U043E><U0440><U0443><U043D><U043D><U044C><U0443><U043A>";/
>            "<U0441><U044D><U0440><U044D><U0434><U044D>";/
>            "<U0447><U044D><U043F><U043F><U0438><U044D><U0440>";/
>            "<U0431><U044D><U044D><U0442><U0438><U043D><U0441><U044D>";/

Friday: CLDR says "Бээтиҥсэ" while your version is "бээтинсэ".  I believe that
it's CLDR's bug to start with the uppercase "Б" because all other letters are
lowercase (please report an issue) but there is still the difference "ҥ" vs.
"н".  Please switch to CLDR version or report an issue to CLDR.
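
If you go with the CLDR spelling in lowercase ("бээтиҥсэ"), the entry would
presumably read as follows, with <U04A5> standing for "ҥ".  This is only a
sketch of that one line:

```
            "<U0431><U044D><U044D><U0442><U0438><U04A5><U0441><U044D>";/
```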

>            "<U0441><U0443><U0431><U0443><U043E><U0442><U0430>";/
>            "<U0431><U0430><U0441><U043A><U044B><U04BB><U044B><U0430><U043D><U043D><U044C><U0430>"
>
>% abmon - The abbreviations for the months
>% - тохс, олун, кул, муус, ыам, бэс, от, атыр, бал, алт, сэт, ахс
>abmon       "<U0442><U043E><U0445><U0441>";"<U043E><U043B><U0443><U043D>";/
>            "<U043A><U0443><U043B>";"<U043C><U0443><U0443><U0441>";/

CLDR says "Клн" and "Мсу" here while your version is "кул" and "муус".  I
believe it's a CLDR bug to start with uppercase, but I'd rather trust CLDR
regarding the other letters.  Regarding uppercase vs. lowercase: for languages
which don't need different grammatical cases for standalone vs. formatting
month names, CLDR suggests that standalone month names can start with uppercase
while formatting names can start with lowercase.

>            "<U044B><U0430><U043C>";"<U0431><U044D><U0441>";/
>            "<U043E><U0442>";"<U0430><U0442><U044B><U0440>";/

CLDR says "Отй" and "Атр" here while your version is "от" and "атыр".  Same
comment as above.

>            "<U0431><U0430><U043B>";"<U0430><U043B><U0442>";/

September: CLDR says "Блҕ" while your version is "бал".  Same comment as above.

>            "<U0441><U044D><U0442>";"<U0430><U0445><U0441>"
>
>% mon - The full names of the months -
>% - тохсунньу, олунньу, кулун тутар, муус устар, ыам ыйа
>%   бэс ыйа, от ыйа, атырдьах ыйа, балаҕан ыйа, алтынньы
>%   сэтинньи, ахсынньы

I suggest adding commas after the month names at the end of each line to avoid
confusion.  One may think that "ыам ыйа бэс ыйа" is a single month name with a
line broken inside.
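
A sketch of the comment with the trailing commas added and your spellings
unchanged:

```
% mon - The full names of the months -
% - тохсунньу, олунньу, кулун тутар, муус устар, ыам ыйа,
%   бэс ыйа, от ыйа, атырдьах ыйа, балаҕан ыйа, алтынньы,
%   сэтинньи, ахсынньы
```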

>mon         "<U0442><U043E><U0445><U0441><U0443><U043D><U043D><U044C><U0443>";/
>            "<U043E><U043B><U0443><U043D><U043D><U044C><U0443>";/
>            "<U043A><U0443><U043B><U0443><U043D><U0020><U0442><U0443><U0442><U0430><U0440>";/
>            "<U043C><U0443><U0443><U0441><U0020><U0443><U0441><U0442><U0430><U0440>";/
>            "<U044B><U0430><U043C><U0020><U044B><U0439><U0430>";/

I am confused here: CLDR says "ыам ыйа" if the month name is standalone (same
as your version so probably it is correct) but "Ыам ыйын" when it is in a
formatting context.  Which version is correct, modulo upper/lowercase?

>            "<U0431><U044D><U0441><U0020><U044B><U0439><U0430>";/
>            "<U043E><U0442><U0020><U044B><U0439><U0430>";/
>            "<U0430><U0442><U044B><U0440><U0434><U044C><U0430><U0445><U0020><U044B><U0439><U0430>";/
>            "<U0431><U0430><U043B><U0430><U0495><U0430><U043D><U0020><U044B><U0439><U0430>";/

Same comment everywhere above.  Additionally, for August CLDR consistently says
"атырдьых" while your version says "атырдьах".  Please fix.

>            "<U0430><U043B><U0442><U044B><U043D><U043D><U044C><U044B>";/
>            "<U0441><U044D><U0442><U0438><U043D><U043D><U044C><U0438>";/
>            "<U0430><U0445><U0441><U044B><U043D><U043D><U044C><U044B>"

Probably correct; please re-read my comment about lower/uppercase letters in
CLDR.

>
>% Abreviated date and time representation to be referenced by the "%c" field descriptor -

Typo: s/Abreviated/Abbreviated/, and please don't exceed 80 chars per line.

>%
>% "%a" (short weekday name),
>% "%d" (day of month as a decimal number),
>% "%b" (short month name),
>% "%Y" (year with century as a decimal number),
>% "%T" (24-hour clock time in format HH:MM:SS),
>% "%Z" (Time zone name)
>d_t_fmt "%a %Y с. %b %d к. %T (%Z)"

Please don't use non-ASCII characters in the data.  We have agreed to use the
ASCII characters directly so it is correct to write "%a %Y" but "с" should be
"<U0441>" instead.
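
A sketch of the same format with only the two Cyrillic letters escaped
("с" as <U0441>, "к" as <U043A>), keeping your field order:

```
d_t_fmt "%a %Y <U0441>. %b %d <U043A>. %T (%Z)"
```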

>
>% Date representation to be referenced by the "%x" field descriptor -
>% "%d/%m/%Y", day/month/year as decimal numbers (01/01/2000).
>d_fmt       "%Y.%m.%d"
>
>% Time representation to be referenced by the "%X" field descriptor -
>% "%T" (24-hour clock time in format HH:MM:SS)
>t_fmt       "%T"
>
>% Define representation of ante meridiem and post meridiem strings -
>% The "" mean default to "AM" and "PM".
>am_pm       "";""
>
>% Define time representation in 12-hour format with "am_pm", to be referenced by the "%r"
>% The "" means that this format is not supported.
>t_fmt_ampm  ""

LGTM

>
>% Date representation not described in ISO/IEC 14652. Comes out as -
>% "%a %b %e %H:%M:%S %Z %Y" which is default "date" command output
>%
>% %a - abbreviated weekday name,
>% %b - abreviated month name,
>% %e - day of month as a decimal number with leading space (1 to 31),
>% %H - hour (24-hour clock) as a decimal number (00 to 23),
>% %M - minute as a decimal number (00 to 59),
>% %S - seconds as a decimal number (00 to 59),
>% %Z - time-zone name,
>% %Y - year with century as a decimal number,e.g. 2001.
>date_fmt "%a %Y с. %b %e к. %H:%M:%S (%Z)"

Again please don't use non-ASCII characters.
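
For this line the escaped form would presumably be:

```
date_fmt "%a %Y <U0441>. %b %e <U043A>. %H:%M:%S (%Z)"
```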

>
>week    7;19971130;1
>first_weekday 2

Omitting "first_workday" does not seem to cause a problem here, but please add
it to avoid confusion:

first_workday 2

>END LC_TIME
>
>LC_MESSAGES
>% The affirmative response -
>% '^[yYдДэЭ]'
>yesexpr "<U005E><U005B><U0079><U0059><U0434><U0414><U044D><U042D><U005D>"
>
>% The negative response -
>% '^[nNнНсС]'
>noexpr "<U005E><U005B><U006E><U004E><U043D><U041D><U0441><U0421><U005D>"

Correct but we have agreed that "+" and "1" should be included in yesexpr
and "-" and "0" should be included in noexpr.
See: https://sourceware.org/git/?p=glibc.git;a=commitdiff;h=f982160
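
A sketch of what that could look like here, with the ASCII characters written
directly and your Cyrillic letters left as they are:

```
% '^[+1yYдДэЭ]'
yesexpr "^[+1yY<U0434><U0414><U044D><U042D>]"

% '^[-0nNнНсС]'
noexpr  "^[-0nN<U043D><U041D><U0441><U0421>]"
```

The "-" is placed right after the opening bracket so that it is taken
literally and not as a range operator.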

Please include "yesstr" and "nostr" here.  They should contain "yes" and "no"
in the Yakut language, respectively.  Their capitalization should follow the
regular grammar rules, without assuming that they appear at the beginning of a
sentence.

>END LC_MESSAGES
>
>LC_PAPER
>copy "ru_RU"
>END LC_PAPER
>
>LC_TELEPHONE
>copy "ru_RU"
>END LC_TELEPHONE
>
>LC_MEASUREMENT
>copy "ru_RU"
>END LC_MEASUREMENT

LGTM.

>
>LC_NAME
>% Format for addressing a person.
>% "%d%t%g%t%m%t%f"
>%
>% "Salutation",
>% "Empty string, or <Space>",
>% "First given name",
>% "Empty string, or <Space>",
>% "Middle names",
>% "Empty string, or <Space>",
>% "Clan names"
>name_fmt    "<U0025><U0064><U0025><U0074><U0025><U0067><U0025><U0074>/
><U0025><U006D><U0025><U0074><U0025><U0066>"
>END LC_NAME

It looks identical to ru_RU, so please consider using copy "ru_RU".

>
>LC_ADDRESS
>postal_fmt    "<U0025><U0066><U0025><U004E><U0025><U0061><U0025><U004E>/
><U0025><U0064><U0025><U004E><U0025><U0062><U0025><U004E><U0025><U0073>/
><U0020><U0025><U0068><U0020><U0025><U0065><U0020><U0025><U0072><U0025>/
><U004E><U0025><U0025><U007A><U0020><U0025><U0054><U0025>/
><U004E><U0025><U0063><U0025><U004E>"
>
>% Россия
>country_name  "<U0420><U043E><U0441><U0441><U0438><U044F>"
>
>% Саха тыла
>lang_name     "<U0421><U0430><U0445><U0430><U0020><U0442><U044B><U043B><U0430>"
>
>% UN Geneve 1949:68 Distinguishing signs of vehicles in international traffic
>% RUS
>country_car    "<U0052><U0055><U0053>"
>
>% ISO 639 language abbreviations:
>% 639-1 2 letter, 639-2 3 letter terminology
>% (empty), sah, sah
>lang_ab       ""
>lang_term     "<U0073><U0061><U0068>"
>lang_lib      "<U0073><U0061><U0068>"
>
>% ISO 3166 country number and 2 and 3 letter abreviations
>% RU, RUS
>country_ab2   "<U0052><U0055>"
>country_ab3   "<U0052><U0055><U0053>"
>country_num   643
>
>END LC_ADDRESS

It would be nice if you could provide country_isbn, otherwise this section
LGTM.

Once you have fixed these issues, please provide a patch rather than a
standalone file.  The patch should add this file and add the language to
localedata/SUPPORTED.  I'm not sure ATM, though, whether the new line in
localedata/SUPPORTED should be "sah_RU/UTF-8" or "sah_RU.UTF-8/UTF-8".  Please
see some examples of patches which add a new language:
https://sourceware.org/git/?p=glibc.git;a=commitdiff;h=f8de956
https://sourceware.org/git/?p=glibc.git;a=commitdiff;h=3f802ae
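
As far as I can tell, locales that are built only for UTF-8 tend to be listed
without the codeset in the locale name, so the first form looks more likely.
A sketch of the new entry under that assumption, to be placed in alphabetical
order (please check it against the neighbouring entries and the patches
above):

```
sah_RU/UTF-8 \
```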

Note that the file locale/iso-639.def already contains a line for the Yakut
language, but please verify that it is correct.  It says "Yakut" while you
provide "Yakut (Sakha)".  I believe your version is correct.
