This is the mail archive of the guile@cygnus.com mailing list for the guile project.



Re: difficulty of writing translators



; It is often mentioned that translators from other languages into
; scheme are an important point in Stallman's plans for the guile
; project. Already the tcl-war archive logs some arguing about that.
; Though scheme seems to have everything necessary to adapt elegantly
; to foreign syntaxes, writing a particular translator always seems to
; be a non-trivial task.  Are there intuitive estimates of which
; languages are hard to translate, and thus require brainstorming by
; programmers with decades of experience, and which are fairly easy?

Yes and no.  I don't think anyone knows quite what it will be like for
all these languages.  A better way to answer your question might be to
describe the process and let you judge for yourself.

* Construct a reader which takes source code and converts it into a
parse tree.

** You could write a custom reader in Scheme, using a C module, scheme
regular-expression stuff, the scheme reader, and any other
miscellaneous toys that are handy.  In particular, one could possibly
use the reader from some free implementation of the language.  (One
would dynamically link this in).

** You could write a lexer and a parser spec using the lang module.
This resembles using lex and yacc.
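
To give a feel for the hand-written route, here is a minimal sketch of
the lexing half of a reader in plain scheme.  It is only an
illustration: it does not use the lang module, the names split-while
and tokenize are made up for this example, and a real reader would
also need strings, comments, multi-character operators and error
reporting.

```scheme
;; Hypothetical helper: split a list of characters into the longest
;; prefix whose characters satisfy pred, and the remainder.
;; Returns a pair (prefix . rest).
(define (split-while pred chars)
  (let loop ((cs chars) (acc '()))
    (if (and (pair? cs) (pred (car cs)))
        (loop (cdr cs) (cons (car cs) acc))
        (cons (reverse acc) cs))))

;; Hypothetical tokenizer: digit runs become numbers, letter runs become
;; symbols, whitespace is skipped, and any other character becomes a
;; one-character string.
(define (tokenize str)
  (let loop ((cs (string->list str)) (tokens '()))
    (cond ((null? cs) (reverse tokens))
          ((char-whitespace? (car cs)) (loop (cdr cs) tokens))
          ((char-numeric? (car cs))
           (let ((r (split-while char-numeric? cs)))
             (loop (cdr r)
                   (cons (string->number (list->string (car r))) tokens))))
          ((char-alphabetic? (car cs))
           (let ((r (split-while char-alphabetic? cs)))
             (loop (cdr r)
                   (cons (string->symbol (list->string (car r))) tokens))))
          (else
           (loop (cdr cs) (cons (string (car cs)) tokens))))))
```

A parser would then consume this token list and build the parse tree,
for instance by turning the tokens of "while (a > 0) ..." into a
(while (> a 0) ...) list.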

* Translate the parse tree into scheme source in some reasonable way.

** For example, replace (while (> a 0) statement statement ...) with
the equivalent scheme code for a loop.
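
The while example can be sketched as a small tree rewriter.  This is
one possible way to do it, not how any existing guile translator works;
the procedure name translate is invented for the illustration, and the
parse tree is assumed to be made of proper lists.

```scheme
;; Hypothetical sketch: rewrite every (while test body ...) form in a
;; parse tree into a named-let loop.  Caveats of the sketch: the loop
;; name `loop` is not hygienic (it would capture a source-language
;; variable called loop), and quoted data is rewritten too.
(define (translate tree)
  (cond ((not (pair? tree)) tree)        ; atoms pass through unchanged
        ((eq? (car tree) 'while)
         (let ((test (translate (cadr tree)))
               (body (map translate (cddr tree))))
           `(let loop ()
              (if ,test
                  (begin ,@body (loop))))))
        (else (map translate tree))))    ; recurse into every other form
```

Given '(while (> a 0) (set! a (- a 1))), this produces a named-let
form that tests (> a 0), runs the body, and calls itself again, which
can then be handed to eval like any other scheme code.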

* Write the standard library of functions.

** These would typically be written in terms of the scheme standard
functions.  As an example, Java method calls would have to be mapped
onto tiny-clos calls (or something).

** These could also be done by using C code from a free
implementation.  As an example, elisp has a pile of buffer
manipulation functions.  The best way to put these into scheme would
be to use the emacs C code. 
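
As a small illustration of the first approach, here is how one library
function of a Python-like language might be written on top of scheme's
standard functions.  The name py-len and the choice of which scheme
types to accept are assumptions made for the example.

```scheme
;; Hypothetical sketch: a Python-style len() defined with standard
;; scheme procedures, dispatching on the scheme type of its argument.
(define (py-len x)
  (cond ((string? x) (string-length x))
        ((vector? x) (vector-length x))
        ((list? x) (length x))
        (else (error "object has no length:" x))))
```

Most of a standard library is this sort of thing, just a great deal
more of it.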

There is also some work that needs to be done on guile before this is
convenient.  The module system needs to be reworked somewhat (probably
into something like Scheme48's) so that other-language modules can
easily be loaded.

Some thought also needs to be put into the language we're merging:
it's important that the user be able to see new functions that appear
at the scheme level.  (For example, CTAX users need to be able to get
at all the user scheme code.)  This poses some problems when
other-language functions have very different calling conventions than
scheme does.
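
One way to picture the calling-convention problem: suppose the other
language passes and returns everything as strings, as tcl does.  A
scheme procedure then has to be wrapped before that language can call
it.  The sketch below is only an assumption about how such glue might
look; wrap-for-strings and str-add are invented names.

```scheme
;; Hypothetical sketch: expose a numeric scheme procedure under a
;; strings-in, string-out calling convention.
(define (wrap-for-strings proc)
  (lambda args
    ;; Convert each string argument to a number, call the underlying
    ;; procedure, and convert the numeric result back to a string.
    (number->string (apply proc (map string->number args)))))

;; The scheme procedure + as the string-based language would see it.
(define str-add (wrap-for-strings +))
```

With this, (str-add "2" "3") calls + on 2 and 3 and hands the result
back as a string.  Going the other way, so that scheme code can call
other-language functions, needs the mirror-image wrapper.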

Static typing may be a headache. 

; How do e.g. elisp, gcl, clisp, perl, pike, python, tcl (without tk),
; C, C++, Pascal, Modula, Java, Smalltalk, Javascript, Haskell, ML,
; compare with respect to translatability into guile? Or some
; pseudo-languages like sendmail, TeX, html, VB, fvwmrc, awk, sed,
; ...?

Well, my estimates:

elisp, gcl, clisp: pretty easy if one rips apart the source code
perl: Nasty and difficult.  Maybe doable by using perl as a library.
python: not too bad natively
tcl: Ugh.  Not too hard, but nasty and messy.  Some
	data-representation issues.
C, C++: problems of batch compilation, low-level access. Doable, but
	hard.
Pascal, Modula-3: not as hard as C,C++
Java: not too hard (see Kawa)
Haskell, ML: not bad, but comparable to the C implementation

Pseudo-languages are probably easier, because they tend to be
simpler.  But their "standard libraries" are more problematic.

A third class you might want to think about is new custom languages.
I know, I know.  But frequently it's a good idea; cf. MATLAB, yorick,
MAPLE, etc.  These tend to be easy.  For example, CTAX is done in a
few k of source.  If people's objection to guile is not "another new
language" but "cursed parentheses" then this is a good solution.  For
example, a dotfile reader should be doable in a few k (that is, a
gizmo which accepts dotfile-like syntax and translates it into scheme,
for configuration of, say, scwm).
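
To show how little such a gizmo needs, here is a minimal sketch that
turns one line of an invented dotfile syntax into a scheme form.  The
syntax (whitespace-separated words, # for comments) and every name in
it are assumptions for the example; none of this is real scwm
configuration.

```scheme
;; Hypothetical helper: split a string on whitespace into a list of
;; non-empty words.
(define (split-words str)
  (let loop ((cs (string->list str)) (word '()) (words '()))
    ;; flush returns words with the word in progress (if any) added.
    (define (flush)
      (if (null? word)
          words
          (cons (list->string (reverse word)) words)))
    (cond ((null? cs) (reverse (flush)))
          ((char-whitespace? (car cs)) (loop (cdr cs) '() (flush)))
          (else (loop (cdr cs) (cons (car cs) word) words)))))

;; Numbers stay numbers; every other word becomes a symbol.
(define (word->datum w)
  (or (string->number w) (string->symbol w)))

;; Translate one dotfile line into a scheme form, or #f for blank
;; lines and lines whose first word starts with #.
(define (dotfile-line->form line)
  (let ((words (split-words line)))
    (if (or (null? words)
            (char=? (string-ref (car words) 0) #\#))
        #f
        (map word->datum words))))
```

A line such as "border-width 4" becomes the list (border-width 4),
which can be evaluated directly if border-width is defined as a scheme
procedure.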

Anyway, in short: 
You're writing a new compiler for a language.
But you've got a really cool assembler.

Andrew