This is the mail archive of the cygwin mailing list for the Cygwin project.



Re: How does Cygwin handle non-Latin1 man pages? (move to UTF-8?)



Erwin Waterlander wrote, on 24-9-2013 22:01:
Hi,

As far as I can see, Cygwin assumes that man pages are encoded in Latin-1 (ISO-8859-1).
Take the man pages of vim, for instance.

/usr/share/man/fr/man1/vim.1.gz is encoded in Latin-1.

$ export LANG=fr_FR.UTF-8
$ man vim

This will show the French man page correctly. Latin-1 is converted to UTF-8.

For the Russian translation of the vim manual I see two files:
/usr/share/man/ru.UTF-8/man1/vim.1.gz
/usr/share/man/ru.KOI8-R/man1/vim.1.gz


When I type
$ export LANG=ru_RU.UTF-8
$ man vim

I get the English man page instead of the Russian one.
I think this is because /usr/share/man/ru/man1/vim.1.gz does not exist.


The problem here is that man looks for the manual in these directories, in this order:
/usr/share/man/ru_RU.UTF-8
/usr/share/man/ru_RU
/usr/share/man/ru

None of these three paths is present on Cygwin.
I could set LANG to ru.UTF-8, but this is not common practice; normally you set LANG to ru_RU.UTF-8. I therefore think that the non-Latin-1 directories under /usr/share/man have the wrong names. When I set LANG to ru.UTF-8, man finds the Russian man page but displays it incorrectly, even after I fix the NROFF line in /etc/man.conf. Moving /usr/share/man/ru.UTF-8 to /usr/share/man/ru_RU.UTF-8 (and fixing man.conf) makes the man page display properly. This confirms that the non-Latin-1 directories have the wrong names in Cygwin.
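
The lookup order described above can be sketched in shell. This is only an illustration of the three candidates listed in this message, not man's actual implementation; lang_candidates is a hypothetical helper name and /usr/share/man is assumed as the default root.

```shell
# Print the directories that man is described as trying for a locale,
# most specific first. lang_candidates is a hypothetical helper.
lang_candidates() {
    locale_name=$1
    man_root=${2:-/usr/share/man}      # assumed default man root
    no_codeset=${locale_name%%.*}      # ru_RU.UTF-8 -> ru_RU
    lang_only=${no_codeset%%_*}        # ru_RU -> ru
    printf '%s\n' \
        "$man_root/$locale_name" \
        "$man_root/$no_codeset" \
        "$man_root/$lang_only"
}

lang_candidates ru_RU.UTF-8
```

For ru_RU.UTF-8 this never produces ru.UTF-8 or ru.KOI8-R, which is why the directories shipped on Cygwin are not found.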

When I type

$ export LANG=ru_RU.UTF-8
$ export LANGUAGE=ru.UTF-8
$ man vim

The Russian man page is displayed, but all Russian characters are displayed incorrectly.
I think this is because man assumes that the page is in Latin-1.

To get the Russian man page displayed correctly, I need to change /etc/man.conf.
I change the line with NROFF to:
NROFF         /usr/bin/preconv | /usr/bin/nroff -c -mandoc 2>/dev/null

Now the Russian man page displays correctly, but all the Latin-1 pages are displayed incorrectly.

This can be fixed by adding a coding tag, which preconv understands, to the first or second line of the man page. When I set LANG to fr_FR.UTF-8, move /usr/share/man/fr.UTF-8 to /usr/share/man/fr_FR.UTF-8, and add this tag to vim.1:

.\" -*- coding: latin-1; -*-

The French manual displays properly.
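
Adding the tag by hand to many pages is tedious, so here is a minimal sketch of how it could be scripted. add_coding_tag is a hypothetical helper, it works on an uncompressed page, and it simply puts the tag on the first line; pages that already start with a preprocessor line would probably need the tag on the second line instead.

```shell
# Prepend a preconv coding tag to an uncompressed man page, unless one of
# its first two lines already carries a tag.
# add_coding_tag is a hypothetical helper; the default encoding is latin-1.
add_coding_tag() {
    page=$1
    encoding=${2:-latin-1}
    if head -n 2 "$page" | grep -q 'coding:'; then
        return 0                       # already tagged, leave it alone
    fi
    tmp=$(mktemp) || return 1
    # The printf format emits:  .\" -*- coding: <encoding>; -*-
    { printf '.\\" -*- coding: %s; -*-\n' "$encoding"; cat "$page"; } > "$tmp" &&
        cat "$tmp" > "$page"
    rm -f "$tmp"
}

# Example (on a copy, after gunzip):  add_coding_tag ./vim.1
```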



So I undid my change in /etc/man.conf.


On Linux the trend is to convert all man pages to UTF-8 encoding.
Will Cygwin follow this trend?



The following needs to be done in Cygwin to have man pages for all scripts displayed properly out of the box (assuming a UTF-8 locale and use of mintty):

* Rename the non-latin1 directories under /usr/share/man/ to fr_FR.UTF-8, ru_RU.UTF-8, and so on.
* Change /etc/man.conf to use preconv:
NROFF         /usr/bin/preconv | /usr/bin/nroff -c -mandoc 2>/dev/null
* Convert all Latin-1 coded man pages to UTF-8, or add a latin-1 coding tag on the first line.
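
For the last item, converting a single page could look roughly like this. It is a sketch only: latin1_page_to_utf8 is a hypothetical helper, it assumes the input really is Latin-1 (iconv cannot detect that), and it writes to a separate output file rather than replacing the original.

```shell
# Convert one gzip-compressed Latin-1 man page to a gzip-compressed UTF-8 page.
# latin1_page_to_utf8 is a hypothetical helper name.
latin1_page_to_utf8() {
    src=$1    # existing page, assumed to be Latin-1
    dst=$2    # new UTF-8 page to write (must differ from src)
    gzip -dc "$src" | iconv -f ISO-8859-1 -t UTF-8 | gzip -c > "$dst"
}

# Example:  latin1_page_to_utf8 ./vim.1.gz ./vim.utf8.1.gz
```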

regards,

--
Erwin Waterlander
http://waterlan.home.xs4all.nl/

