This is the mail archive of the libc-locales@sourceware.org mailing list for the GNU libc locales project.



Re: [PATCH] Locales: Cyrillic -> ASCII transliteration table [BZ #2872] re-submission for 2.29


On 09.10.2018 00:23, Rafal Luzynski wrote:
> 8.10.2018 14:40 Marko Myllynen <myllynen@redhat.com> wrote:
>> Hi,
>>
>> Thanks for the update. I have a few mostly cosmetic comments below,
>> hopefully we'll hear from others whether they agree with this direction.
>>

Yeah, the earlier we get feedback, the more productive we are. I'd be
happy to get as much feedback on this as early as possible, so everybody
concerned, please chime in.

> 
>> - No duplicates:
>>
>> % CYRILLIC SMALL LETTER IE
>> <U0435> <U0065>; <U0065>
>>
>> should become:
>>
>> % CYRILLIC SMALL LETTER IE
>> <U0435> <U0065>
>>
>> - There are a few issues with the definitions:
>>
>> % CYRILLIC CAPITAL LETTER U
>> <U0423> <U0055>; <U0055>
>> % CYRILLIC UNDEFINED
>> <U0423><U0423> <U00DA>; "<U0055><U0060>"
>>
>> % CYRILLIC SMALL LETTER U
>> <U0443> <U0075>; <U0075>
>> % CYRILLIC UNDEFINED
>> <U0443><U0443> <U00FA>; "<U0075><U0060>"
> 
> Are the duplicates here because some Cyrillic letters may have multiple
> Latin transliterations depending on the context, for example Cyrillic IE
> must be transliterated sometimes as "e", sometimes as "ie", sometimes
> as "ye" or "je"?  Can we provide rules for groups of characters instead?
No, the duplicates are just an artifact of my line-generating logic. I
have removed them. As far as I understand, transcription that varies
between languages/locales cannot be handled in a single file at all.
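
For illustration only, here is a minimal Python sketch of how repeated
alternatives such as the `<U0065>; <U0065>` entry quoted above could be
dropped from a rule line. It is not the spreadsheet logic actually used;
the function name `dedupe_alternatives` is hypothetical, and the sketch
assumes that `%` starts a comment line and that `;` never appears inside
a quoted alternative.

```python
def dedupe_alternatives(line):
    """Drop repeated alternatives from one transliteration rule line.

    Assumptions (not taken from the glibc sources): '%' starts a comment
    line, and ';' only ever separates alternatives.
    """
    stripped = line.strip()
    # Leave blank lines and comment lines untouched.
    if not stripped or stripped.startswith("%"):
        return stripped
    parts = stripped.split(None, 1)
    if len(parts) < 2:
        # Only a source sequence and no alternatives: nothing to dedupe.
        return stripped
    source, rest = parts
    seen = []
    for alt in rest.split(";"):
        alt = alt.strip()
        # Keep the first occurrence of each alternative, in order.
        if alt and alt not in seen:
            seen.append(alt)
    # Join with a bare ';' so no space follows the separator.
    return source + " " + ";".join(seen)


print(dedupe_alternatives("<U0435> <U0065>; <U0065>"))
```

Because the alternatives are rejoined with a bare `;`, the same pass also
removes the spaces after `;`.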

> 
>> I wonder whether it would be possible to automate generation of this
>> file so that issues like the above could be avoided? But perhaps that
>> could be the next step once this initial patch lands.

I am generating the content part of translit_cyrillic from a
LibreOffice spreadsheet. Not sure whether you have had time to look at
it yet:
https://sourceware.org/bugzilla/attachment.cgi?id=11299

Anyway, I have just fixed the issues identified by Marko above in that
spreadsheet. I will make the changes requested below and then upload
the new translit_cyrillic file to Bugzilla.


>> - Please add the standard glibc locale header (see the existing
>> translit_* files for reference)
>> - Consider wrapping the header lines at or around column 70-72
>> - Consider describing which characters, character ranges, or blocks are
>> supported (perhaps also describe why some of those are not included, see
>> e.g. https://en.wikipedia.org/wiki/Cyrillic_script_in_Unicode)
>> - Please remove trailing whitespace and spaces after ;
>
> Thanks for this, Marko.  While at this, in the ChangeLog and in the commit
> message these paths:
>
> 	* locales/aa_DJ: likewise
>
> 1. Should be a relative path starting in the root directory of glibc
>    source, that is: "* localedata/locales/aa_DJ".
> 2. Should be "Likewise." (starting with an uppercase and ending with a
>    dot).

Will do.
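
The whitespace requests above (no trailing whitespace, no spaces after
`;`) could be checked mechanically before uploading. The following is
only a sketch under stated assumptions: `whitespace_issues` is a
hypothetical helper, not part of glibc, and it assumes that lines
starting with `%` are comments whose text should not be checked for
`;` spacing.

```python
import re


def whitespace_issues(text):
    """Return (line_number, message) pairs for the two whitespace problems."""
    issues = []
    for number, line in enumerate(text.splitlines(), start=1):
        # Any whitespace left at the end of the line is reported.
        if line != line.rstrip():
            issues.append((number, "trailing whitespace"))
        # Assumption: '%' starts a comment; skip the ';' check there.
        is_comment = line.lstrip().startswith("%")
        if not is_comment and re.search(r";[ \t]", line):
            issues.append((number, "space after ';'"))
    return issues


sample = "<U0435> <U0065>; <U0065>\n<U0443> <U0075> \n% a; b"
for number, message in whitespace_issues(sample):
    print(f"line {number}: {message}")
```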

Bests,
Egor

