This is the mail archive of the
kawa@sources.redhat.com
mailing list for the Kawa project.
Re: A few tips
- From: Per Bothner <per at bothner dot com>
- To: Dominique Boucher <dboucher at nuecho dot com>
- Cc: "'Kawa List'" <kawa at sources dot redhat dot com>
- Date: Sun, 09 Nov 2003 20:47:14 -0800
- Subject: Re: A few tips
- References: <000501c39f55$a8ce8850$6400a8c0@Forman>
Dominique Boucher wrote:
> I run Kawa servlets with Tomcat 4.1 on Linux, using Sun’s JDK 2
> v1.4.1_01.
"Linux" means a lot of different things. My impression is that many
distributions (including Red Hat for sure) are moving towards using
UTF-8 as the standard/default encoding. If your files are instead
ISO-8859-1, you'll probably have problems.
> Some of the configuration files contain
> Scheme strings with diacritics (French accents, for instance).
To be pedantic, the files only "contain" diacritics if interpreted using
the correct encoding. Specifically, the files are encoded in
ISO-Latin-1, but your software environment thinks they use some other
encoding, probably UTF-8.
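
A minimal sketch of that mismatch, not taken from the original thread: the same bytes are decoded once as ISO-8859-1 and once as UTF-8. The class and method names are hypothetical, and StandardCharsets assumes Java 7 or later (newer than the JDK 1.4.1 discussed here).

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration: one byte sequence, two charset assumptions.
public class EncodingMismatch {
    // Interpret the bytes as ISO-8859-1 (one byte per character).
    static String decodeLatin1(byte[] bytes) {
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    // Interpret the same bytes as UTF-8; malformed input is replaced
    // with U+FFFD rather than raising an error.
    static String decodeUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "café" as it would be stored in an ISO-8859-1 file: é is the single byte 0xE9.
        byte[] latin1Bytes = { 'c', 'a', 'f', (byte) 0xE9 };
        System.out.println(decodeLatin1(latin1Bytes).equals("caf\u00e9"));
        System.out.println(decodeUtf8(latin1Bytes).indexOf('\uFFFD') >= 0);
    }
}
```

A lone 0xE9 byte is not valid UTF-8, so the UTF-8 reading loses the accented character even though the file itself is unchanged.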
This page http://fedora.redhat.com/docs/release-notes/ has some notes on
encoding. Red Hat believes "In the long term, all systems are expected
to migrate to UTF-8, eliminating this issue."
My guess is you might need to change your LANG environment variable,
unless you're willing to migrate to UTF-8.
> 2. For the constant Scheme strings in the source code, you must make
> sure that Kawa reads them properly when compiling. So add the mutation
> to 'port-char-encoding' on the command-line:
>   shell> kawa -e '(set! port-char-encoding "ISO-8859-1")' -C sourcefile.scm
The same presumably also applies for files loaded with the -f flag.
And presumably for expressions typed at the Kawa console. I don't think
setting port-char-encoding will help with the latter, since the
standard input has already been opened. For that you need to set LANG.
> 3. Make sure the strings are not put in 'unescaped-data'.
I assume the "not" was unintended.
> This way, all the special characters (those with French diacritics) will
> be translated to their equivalent numerical entities (&#233; for é)
> properly.
Whether é is written as é or as &#233; should depend on the encoding used
for the underlying PrintWriter. Unfortunately, I don't know of any
reliable way to get that. However, we can use port-char-encoding. We
can also use OutputStreamWriter's getEncoding.
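
One way to sketch that decision, assuming the output encoding's name is already known: keep a character literal when the encoding can represent it, and fall back to a numeric entity otherwise. This is not Kawa's implementation; EntityEscaper and escape are hypothetical names, and escaping of markup characters such as & and < is deliberately left out.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

// Hypothetical helper: emit numeric entities only for characters
// that the target encoding cannot represent.
public class EntityEscaper {
    static String escape(String text, String encodingName) {
        CharsetEncoder encoder = Charset.forName(encodingName).newEncoder();
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int codePoint = text.codePointAt(i);
            int length = Character.charCount(codePoint);
            String piece = text.substring(i, i + length);
            if (encoder.canEncode(piece)) {
                out.append(piece);                              // representable: write as-is
            } else {
                out.append("&#").append(codePoint).append(';'); // fallback: numeric entity
            }
            i += length;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("caf\u00e9", "US-ASCII"));   // é cannot be encoded here
        System.out.println(escape("caf\u00e9", "ISO-8859-1")); // é can be encoded here
    }
}
```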
Does anyone know how one finds out in Java what the default (system)
encoding is?
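
For reference, a sketch of three standard-library ways to ask that question. The class and method names are hypothetical. Charset.defaultCharset() was added in Java 5, so it postdates the JDK 1.4.1 mentioned above; the "file.encoding" system property and OutputStreamWriter.getEncoding() are available on older JDKs. Note that getEncoding() may return a historical name such as "UTF8" rather than the canonical "UTF-8".

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStreamWriter;
import java.nio.charset.Charset;

// Hypothetical illustration of querying the JVM's default encoding.
public class DefaultEncoding {
    // System property set by the JVM at startup.
    static String fromSystemProperty() {
        return System.getProperty("file.encoding");
    }

    // Canonical name of the default charset (Java 5 and later).
    static String fromCharsetApi() {
        return Charset.defaultCharset().name();
    }

    // A writer created without an explicit encoding uses the default one.
    static String fromWriter() {
        OutputStreamWriter writer = new OutputStreamWriter(new ByteArrayOutputStream());
        return writer.getEncoding();
    }

    public static void main(String[] args) {
        System.out.println("file.encoding property: " + fromSystemProperty());
        System.out.println("Charset.defaultCharset: " + fromCharsetApi());
        System.out.println("OutputStreamWriter:     " + fromWriter());
    }
}
```

The values printed depend on the platform and on settings such as LANG, which is the point: running this inside the servlet container shows what encoding it is actually using.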
> [Note: these problems may be due to a special configuration of the C
> locale, but I can't modify it easily for Tomcat.
It's not a problem with the C locale per se. However, it may be a
problem that Tomcat is *using* the C locale - or a UTF-8 locale. I bet
with the correct environment flags (LANG? LOCALE?) you could fix that.
--
--Per Bothner
per@bothner.com http://per.bothner.com/