This is the mail archive of the glibc-bugs@sources.redhat.com mailing list for the glibc project.



[Bug localedata/428] New: Several bugs in te_IN localedata


There are many problems in the Telugu locale. I have fixed most of them
(collation data is not added yet) and created a patch. The justification for
each change follows. Please let me know if more explanation is needed.


===================

Justification:
--------------

LC_IDENTIFICATION

- The name of the language is not "Telgu" but "Telugu".
http://www.w3.org/WAI/ER/IG/ert/iso639.htm

- email address has been changed following the discussion at
http://sources.redhat.com/ml/libc-locales/2004-q3/msg00021.html

- language should be a 2-letter code, or a 3-letter code if no 2-letter code
exists, as specified in ISO 639.
- territory should be a 2-letter code as specified in ISO 3166. Both points are
discussed at
http://sources.redhat.com/ml/libc-alpha/2003-06/msg00138.html

- revision and date were also in violation of ISO 14652. These had to be
updated anyway.

- The category keyword specifies that a category is present and which
specification it conforms to. The first operand should be either "i18n:2003" or
"posix:1993" (or another value if the localedata conforms to another
standard). This has been discussed at
http://sources.redhat.com/ml/libc-alpha/2003-06/msg00184.html

LC_MONETARY

- The currency symbol should include a dot "." after the symbol (Rs), as in
Rs.12,34,56,789. See the figures on the Reserve Bank of India site
(http://www.rbi.org.in). Also, the existing symbol is in Devanagari (hi_IN), not
in Telugu. The new symbol is "rU." written in Telugu. Example usages are at
http://www.enaadu.net/ and http://www.vaartha.com/ - the online versions of
Andhra Pradesh's (where more than 75 million Telugu speakers live) most popular
newspapers.

- mon_grouping should be 3;2 since in India we have demarcations for thousands
(10^3), lakhs (10^5) and crores (10^7). Examples can be found on the RBI site.
http://www.rbi.org.in

- p_sep_by_space and n_sep_by_space should be zero as we now have a dot after
the currency symbol. Example usages can again be found at
http://www.enaadu.net
http://www.vaartha.com
http://www.rbi.org.in

LC_NUMERIC

- grouping has the same problem as mon_grouping.
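
To illustrate what a 3;2 grouping produces, here is a minimal Python sketch. It
is not glibc's implementation: the helper names group_digits and format_money
are hypothetical, the last group size is assumed to repeat, and only
sep_by_space values 0 and 1 are modelled. The "Rs." symbol is the Latin form
used in the example above, not the Telugu symbol from the patch.

```python
def group_digits(digits, grouping=(3, 2), sep=","):
    """Group a digit string from the right; the last size in `grouping` repeats."""
    groups = []
    index = 0
    while digits:
        # Use the next group size, or keep repeating the final one.
        size = grouping[min(index, len(grouping) - 1)]
        groups.append(digits[-size:])
        digits = digits[:-size]
        index += 1
    return sep.join(reversed(groups))


def format_money(amount, symbol="Rs.", sep_by_space=0):
    """Prefix the currency symbol; only sep_by_space 0 and 1 are modelled."""
    gap = " " if sep_by_space == 1 else ""
    return symbol + gap + group_digits(str(amount))


print(group_digits("123456789"))        # 12,34,56,789 (thousands, lakhs, crores)
print(group_digits("123456789", (3,)))  # 123,456,789 (groups of three, for contrast)
print(format_money(123456789))          # Rs.12,34,56,789
```

With sep_by_space set to zero, the dot after the symbol is the only separator
between the symbol and the amount.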

LC_TIME

- abmon and mon are the same. The original is not entirely wrong; months can be
written either way. In Telugu we generally try to end words with vowels, which
brings sweetness to the language. The Eenadu calendar has the usage I have given.
http://www.eenadu.net/calender2004/calhome.htm

- In the am_pm keyword, the values given are "poorvahna" and "aparaahna". These
are Sanskrit words and not exactly Telugu words. In fact nobody (TV channels,
newspapers) uses these words. We (Telugu speakers) use four divisions in a day
instead of AM and PM: early morning (0000-0600 hrs), morning (0600-1200 hrs),
evening (1200-1800 hrs) and night (1800-2400 hrs). I see that giving all
four is not possible according to the cultural convention definition. So I have
given the abbreviations (proper usage) for morning and evening. One can find the
usages at
http://www.eenadu.net

- d_t_fmt, t_fmt, d_fmt and t_fmt_ampm are all incorrect. Firstly, in the
original formats, %I (hour on a 12-hour clock) is given without %p (am_pm
info), so the time formats in d_t_fmt and t_fmt were not logical. Moreover,
these are not the formats used for Telugu at all. I have taken the formats from
the Vaartha (has more occurrences) and Eenadu newspapers.
http://www.eenadu.net
http://www.vaartha.com
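
The problem with %I lacking %p can be seen with a small Python sketch. The two
format strings and the sample date are made up for illustration and are not the
formats from the old or new locale file. It assumes Python's default "C" locale,
where %p expands to a non-empty AM/PM marker.

```python
from datetime import datetime

# Two times twelve hours apart on an arbitrary sample date.
morning = datetime(2004, 10, 5, 9, 30)
evening = datetime(2004, 10, 5, 21, 30)

without_p = "%I:%M"     # 12-hour clock, no am_pm info
with_p = "%I:%M %p"     # 12-hour clock plus am_pm info

# Without %p both times render identically, so the reader cannot tell them apart.
print(morning.strftime(without_p), evening.strftime(without_p))

# With %p the two renderings differ.
print(morning.strftime(with_p), evening.strftime(with_p))
```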

LC_MESSAGES

- After looking at the discussion about always including [yY] and [nN] at:
http://sources.redhat.com/bugzilla/show_bug.cgi?id=71
http://sources.redhat.com/ml/libc-locales/2004-q3/msg00018.html
I felt that including [yY] and [nN] was a good idea. Telugu does not have any
conflicts with them because all Telugu symbols have their own block in
Unicode. I also felt that having a .* after the expressions would be the
correct thing to do in case the application using the data checks whether the
expression matches the response string completely (rather than just finding the
expression within the response string).
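
The difference the trailing .* makes can be sketched in Python. The patterns
here are hypothetical ASCII-only stand-ins (the Telugu alternatives from the
patch are not reproduced), and Python's re module is used in place of the POSIX
regular expressions an application would normally apply to yesexpr.

```python
import re

loose = re.compile(r"^[yY]")    # hypothetical yesexpr without the trailing .*
full = re.compile(r"^[yY].*")   # the same expression with .* appended

response = "yes"

# An application that only looks for the expression at the start is happy either way.
print(bool(loose.match(response)))      # True

# An application that demands a complete match rejects the short form...
print(bool(loose.fullmatch(response)))  # False

# ...but accepts the form that ends in .*
print(bool(full.fullmatch(response)))   # True
```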

LC_NAME

- name_fmt is given as "%p%t%f%t%g" which means "Profession FamilyName
FirstGivenName". OtherGivenNames are not included. So my name "Sunil Mohan
Adapa" will only come up as "Adapa Sunil". "Mohan" will be missing! So the format
should be "%p%t%f%t%g%t%m", which adds "%m" for OtherGivenNames.

- name_mr, name_mrs, name_miss, name_ms are all given in plain English when
Telugu has very well defined salutations. There is no salutation in Telugu which
is valid for all females, so I left name_ms blank.
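
The effect of the name_fmt change can be sketched with a toy expander in
Python. This is not how glibc processes name_fmt: format_name is a hypothetical
helper that handles only the %p, %f, %g, %m and %t escapes, and it assumes %t
yields a space only when the preceding escape produced a non-empty string.

```python
import re


def format_name(fmt, fields):
    """Expand the %-escapes in fmt using fields; literal text is ignored."""
    out = []
    previous = ""
    for code in re.findall(r"%(.)", fmt):
        if code == "t":
            # Assumed rule: a space only if the preceding field was non-empty.
            piece = " " if previous else ""
        else:
            piece = fields.get(code, "")
            previous = piece
        out.append(piece)
    return "".join(out)


# p = profession, f = family name, g = first given name, m = other given names
person = {"p": "", "f": "Adapa", "g": "Sunil", "m": "Mohan"}

print(format_name("%p%t%f%t%g", person))      # Adapa Sunil
print(format_name("%p%t%f%t%g%t%m", person))  # Adapa Sunil Mohan
```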

LC_ADDRESS

- postal_fmt is totally wrong. First, it does not even have a white space
anywhere in the entire format. Further, it is close to the opposite of the
order in which we write addresses in India. I have collected some sample Indian
addresses from the 'net and am attaching this file. I have based the new format
on these addresses and on what I know from daily experience.

- I have added the missing keywords lang_name, lang_ab, lang_term (after
getting warnings from 'localedef').

-- 
           Summary: Several bugs in te_IN localedata
           Product: glibc
           Version: unspecified
            Status: NEW
          Severity: normal
          Priority: P2
         Component: localedata
        AssignedTo: pere at hungry dot com
        ReportedBy: sunil at atc dot tcs dot co dot in
                CC: glibc-bugs at sources dot redhat dot com


http://sources.redhat.com/bugzilla/show_bug.cgi?id=428


