This is the mail archive of the
libc-locales@sourceware.org
mailing list for the GNU libc locales project.
[Bug localedata/16061] Review / update transliteration data
- From: "maiku.fabian at gmail dot com" <sourceware-bugzilla at sourceware dot org>
- To: libc-locales at sourceware dot org
- Date: Mon, 04 May 2015 10:42:12 +0000
- Subject: [Bug localedata/16061] Review / update transliteration data
- Auto-submitted: auto-generated
- References: <bug-16061-716 at http dot sourceware dot org/bugzilla/>
https://sourceware.org/bugzilla/show_bug.cgi?id=16061
--- Comment #5 from Mike FABIAN <maiku.fabian at gmail dot com> ---
(In reply to Marko Myllynen from comment #4)
> (In reply to Mike FABIAN from comment #2)
> > (In reply to Marko Myllynen from comment #0)
> >
> > C-translit.h.in seems to be manually edited and not generated from
> > Unicode data.
>
> Based on earlier changelog comments it seems that C-translit.h.in was
> updated manually for Unicode 3.2.0, should it now be updated for Unicode
> 7.0.0 by some means?
Probably, but how?
> > is apparently manually edited and not generated.
> >
> > locales/translit_cjk_variants
> >
> > is not generated from Unicode data either but from a UniVariants.Z
> > file which can still be found here:
> >
> > http://kanji.zinbun.kyoto-u.ac.jp/~yasuoka/ftp/CJKtable/UniVariants.Z
> >
> > It is from 2002-08-15 and I have no idea how it has been created.
> > So I did not touch /translit_cjk_variants.
>
> Perhaps we could add a note about its origins to the file.
There is already a note in the comment section of that file.
> Also, shouldn't à and à be handled in the same way?
What do you mean by “handled in the same way”?
> Looking at translit_neutral in more detail, I think it's actually wrong
> place for letters, it should contain non-letters only and if specific rules
> are needed for letters like à or Ã, those should be added directly in locale
> files (so the patch discussed in bug 15593 should not have been applied to
> translit_neutral after all). This would also mean that the special rules in
> the generator for cases like EM DASH and EN DASH should probably end up in
> translit_neutral, not translit_combining.
My guess is that the purpose of translit_neutral is to contain
transliterations which are locale “neutral”, i.e. are the same for
all locales. So I see no reason not to include letters.
> > > but some characters (like U+00D6, Ö) have decomposition defined in
> > > Unicode but not in glibc.
> >
> > glibc had this already in translit_combining:
> >
> > (was already there, not added by my patch, it is generated from
> > UnicodeData.txt by decomposing to U+004F U+0308 and then stripping the
> > combining character U+0308).
>
> Yes, I think what I meant to say was that the decomposition to U+004F U+0308
> was missing but as you point out it is defined in some locales where it
> would be needed. Btw, I wonder should U+00D6 actually decompose to U+004F
> U+00A8 after U+004F U+0308 in those locales?
Ö -> O¨
Why? Is that a reasonable transliteration? It throws away less
information but I think it is common practice to transliterate Ö
just as O in English for example.
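
To make the two candidate transliterations of U+00D6 discussed above concrete,
here is a minimal sketch. It is not the glibc generator script; it assumes
that Python's bundled unicodedata module (which ships its own copy of the
Unicode character database) is an acceptable stand-in for reading
UnicodeData.txt directly. The helper names decompose, strip_combining and
spacing_variant are made up for this illustration.

```python
import unicodedata


def decompose(ch: str) -> str:
    """Return the canonical decomposition (NFD) of a character."""
    return unicodedata.normalize("NFD", ch)


def strip_combining(text: str) -> str:
    """Drop every character whose canonical combining class is non-zero."""
    return "".join(c for c in text if unicodedata.combining(c) == 0)


def spacing_variant(text: str) -> str:
    """Replace U+0308 COMBINING DIAERESIS with the spacing U+00A8 DIAERESIS."""
    return text.replace("\u0308", "\u00a8")


def codepoints(text: str) -> list[str]:
    """Format a string as a list of U+XXXX code point labels."""
    return [f"U+{ord(c):04X}" for c in text]


o_diaeresis = "\u00d6"                  # LATIN CAPITAL LETTER O WITH DIAERESIS
nfd = decompose(o_diaeresis)            # base letter followed by U+0308

print(codepoints(nfd))                  # the decomposed sequence
print(codepoints(strip_combining(nfd)))  # combining mark removed
print(codepoints(spacing_variant(nfd)))  # combining mark made spacing
```

The first result corresponds to the decomposition mentioned in the thread
(U+004F U+0308), the second to the existing translit_combining style entry
(plain O), and the third to the proposed U+004F U+00A8 fallback.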