This is the mail archive of the guile@cygnus.com mailing list for the guile project.



Perl translator: roadmap


Hello Guile-folk,

This message details the steps I believe are necessary for creating a
successful Perl-to-Scheme translator.  By "successful" I mean one that
can run as many pure-Perl programs as a given version of Perl, and
could at least conceivably be made source-compatible with a subset of
Perl's C interface.

Do not start by writing a parser.  Instead, start by using Perl's
excellent embedding interface to link the Perl interpreter into a
Guile module or Guile proper.  (Type "perldoc perlembed" for details.)
When all else is done, this part may be jettisoned, but it will be
extremely valuable between now and then.

Devise a way to wrap Perl data in Scheme objects, and vice versa.
Perl values' reference counts must be maintained; a simple increment on
import and decrement on sweep will do for starters.  Scheme objects held by
Perl must be marked.  I recommend keeping a doubly-linked list of them
in the Perl interpreter object.  The Perl wrapper object's destructor
should remove its Scheme thing from the list.  Of course, the same
object may occur in the list more than once, which amounts to a
reference count.  (Naturally, one will want to avoid getting Perl
references mixed up in circular data structures.)
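
To make the list idea concrete, here is a minimal sketch in plain C.
It does not use the real Perl or Guile headers: scheme_obj,
protect_list and the protect_* functions are hypothetical stand-ins
(in Guile the object would be an SCM, and the list head would live in
the Perl interpreter object).  Each Perl wrapper keeps the node it was
given, so its destructor can unlink in constant time, and an object
that appears twice simply stays marked until both nodes are gone.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a Scheme object; a real port would use SCM. */
typedef struct scheme_obj { int marked; } scheme_obj;

/* One node per Perl wrapper that holds a Scheme object. */
typedef struct protect_node {
    scheme_obj *obj;
    struct protect_node *prev, *next;
} protect_node;

/* Hypothetical list head, imagined as a field of the Perl interpreter. */
typedef struct protect_list { protect_node *head; } protect_list;

/* Call when a Perl wrapper is created.  The same object may be added
   more than once; the duplicates act as a reference count. */
protect_node *protect_add(protect_list *list, scheme_obj *obj)
{
    protect_node *n = malloc(sizeof *n);
    if (n == NULL)
        return NULL;
    n->obj = obj;
    n->prev = NULL;
    n->next = list->head;
    if (list->head != NULL)
        list->head->prev = n;
    list->head = n;
    return n;
}

/* Call from the Perl wrapper's destructor. */
void protect_remove(protect_list *list, protect_node *n)
{
    assert(n != NULL);
    if (n->prev != NULL)
        n->prev->next = n->next;
    else
        list->head = n->next;
    if (n->next != NULL)
        n->next->prev = n->prev;
    free(n);
}

/* Call from the Scheme collector's mark phase. */
void protect_mark_all(const protect_list *list)
{
    for (const protect_node *n = list->head; n != NULL; n = n->next)
        n->obj->marked = 1;
}
```

The doubly-linked form is what makes the destructor cheap: it needs no
search, only the node pointer stored in the wrapper.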

Write a general-purpose Perl-to-Scheme data conversion function in C
and another for Scheme-to-Perl.  They should convert natural
counterparts to each other (e.g., integers and strings) and use the
wrapper for unconvertible stuff.  I recommend converting Perl
arrayrefs to Scheme lists and vice versa, performing a deep copy.  I
also suggest using a Perl arrayref-ref to represent a vector.  #f,
'(), and the various Perl forms of false will get sticky... I guess #f
goes with undef, so the Perl test of Scheme truth is "defined($bla)".
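
As a rough illustration of the Perl-to-Scheme direction, here is a
self-contained C model.  The perl_val and scm_val types are made up
for this sketch and stand in for Perl's SV/AV and Guile's SCM; only
undef, integers and arrayrefs are handled.  It shows the deep copy of
an arrayref into a list and the mapping of undef to #f.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical model of a Perl value (stands in for SV/AV). */
typedef enum { PV_UNDEF, PV_INT, PV_ARRAYREF } pv_kind;
typedef struct perl_val {
    pv_kind kind;
    long i;                   /* used when kind == PV_INT */
    struct perl_val **elems;  /* used when kind == PV_ARRAYREF */
    size_t len;
} perl_val;

/* Hypothetical model of a Scheme value (stands in for SCM). */
typedef enum { SV_FALSE, SV_INT, SV_NIL, SV_PAIR } sv_kind;
typedef struct scm_val {
    sv_kind kind;
    long i;                   /* used when kind == SV_INT */
    struct scm_val *car, *cdr;
} scm_val;

static scm_val *scm_new(sv_kind kind)
{
    scm_val *v = calloc(1, sizeof *v);
    if (v == NULL)
        abort();
    v->kind = kind;
    return v;
}

/* Deep-copy a Perl value into a freshly allocated Scheme value. */
scm_val *perl_to_scheme(const perl_val *p)
{
    switch (p->kind) {
    case PV_UNDEF:
        return scm_new(SV_FALSE);          /* undef becomes #f */
    case PV_INT: {
        scm_val *v = scm_new(SV_INT);
        v->i = p->i;
        return v;
    }
    case PV_ARRAYREF: {
        /* Build the list back to front so it ends in '(). */
        scm_val *list = scm_new(SV_NIL);
        for (size_t k = p->len; k-- > 0; ) {
            scm_val *cell = scm_new(SV_PAIR);
            cell->car = perl_to_scheme(p->elems[k]);
            cell->cdr = list;
            list = cell;
        }
        return list;
    }
    }
    abort();  /* unknown kind; a real version would wrap the value */
}

/* Release a value built by perl_to_scheme. */
void scm_free(scm_val *v)
{
    if (v == NULL)
        return;
    if (v->kind == SV_PAIR) {
        scm_free(v->car);
        scm_free(v->cdr);
    }
    free(v);
}
```

Note that an empty arrayref comes out as '() rather than #f, which is
one of the places where the truth values get sticky.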

Provide a way to evaluate and call Perl code in Scheme, and to call
Scheme functions in Perl.  Scheme should be able to specify any of the
three Perl contexts (scalar, list, void) for Perl evaluation.  To be
robust, both languages must be able to catch each other's exceptions.
Perl has mechanisms to ensure that control returns to a given frame in
the event of a longjmp and to unwind its context.  If a longjmp occurs
because of a Perl error, the code that catches it should raise a
Scheme exception, and vice versa.

You may well ask what any of the above has to do with writing a
translator.  Er, nothing... except that it will create an environment
which is very conducive to experimenting with and debugging conversion
and evaluation techniques.  We will know the semantics are correct,
because Perl will be doing all of the work.  Then, bit by bit, we can
replace Perl data structures and operators with Scheme equivalents,
using the humongous body of existing Perl code to test every step of
the way.

You may then ask whether it is really worth the trouble to set up such
an environment when the goal is to write a translator.  I have two
answers to this.  One, a Perl-in-Scheme embedding would be useful in
its own right.  Two, it has been done in the case of Emacs Lisp.  The
Perlmacs implementation would, I imagine, serve as a good example for
doing the same thing in Guile.  See
http://www.perl.com/CPAN/authors/id/JTOBEY/ for further information.

Are you with me so far?  Good!  Now comes the exciting part.

I will lay out the steps for writing a Perl-to-Elisp compiler based on
what is already done.  I don't know Guile as well as Emacs, so I will
just assume that everything I say would apply to Guile as well.
Substitute "Scheme" for "Lisp".

Write accessors for all Perl data types.  For example, to get at an
element in a hash, you would have something like:

    (perl-hash-fetch hash key)

and to list the keys, you could do:

    (perl-hash-keys 'list-context hash)

I have made a start at this with Elisp, but it is still alpha, and the
details are very subject to change.  I have a Perl module with subs
like

    sub HASH::NEW        ()         { my %x; \%x }
    sub HASH::GET        ($)        { \%{$_[0]} }
    sub HASH::FETCH      (\%$)      { $_[0]->{$_[1]} }
    sub HASH::STORE      (\%$$)     { $_[0]->{$_[1]} = $_[2] }
    sub HASH::KEYS       (\%)       { keys %{$_[0]} }
    sub HASH::EACH       (\%)       { each %{$_[0]} }
    sub HASH::CLEAR      (\%)       { %{$_[0]} = () }

and a Lisp module that auto-generates the accessors from them using a
built-in `perl-call' subr.

From there, it will be easy to convert any ordinary Perl data
structure (not counting foreign objects such as database handles) into
a Lisp form which, when evaluated, produces a copy of the original.
The copy would be a Perl data structure of arbitrary complexity rooted
in a single Lisp object.

Next, do the same for Perl code references.  The B module (which is
the workhorse of the Perl-to-C and Perl bytecode compilers) will be
very useful for this.  There will have to be a Lisp data type for
holding a Perl "op" (a C function pointer with associated data).  B
will allow us to decompose a compiled Perl sub into its op tree.

Next, write a Lisp subr that executes a single Perl op.  When executed
in a loop, it should have the same effect as perl's runops() function
(defined in run.c).  Ops work directly with Perl's stack, which is
essentially a Perl array.  For example, pp_helem() pops a key and a
hash pointer from the stack, looks up the hash element, and pushes the
resulting value.  We will need to make push and pop functions
available to Lisp.
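
The single-op stepping idea can be modelled in a few lines of plain C.
This is a simplified sketch, not Perl's real code: interp, op, ppaddr,
pp_const, pp_add and run_one_op are names I made up, the stack holds
plain longs rather than Perl values, and I am assuming (as I
understand the real runops loop to work) that each op function returns
the next op to execute, with NULL meaning "done".

```c
#include <assert.h>
#include <stddef.h>

#define STACK_MAX 64

typedef struct interp interp;
typedef struct op op;

/* An op: a C function pointer with associated data. */
struct op {
    op *(*ppaddr)(interp *);  /* runs the op, returns the next op */
    op *next;                 /* default successor */
    long value;               /* data used by pp_const */
};

/* Toy interpreter state: an argument stack and the current op. */
struct interp {
    long stack[STACK_MAX];
    int sp;
    op *cur;
};

/* The push and pop primitives that would be exposed to Lisp. */
void push(interp *in, long v)
{
    assert(in->sp < STACK_MAX);
    in->stack[in->sp++] = v;
}

long pop(interp *in)
{
    assert(in->sp > 0);
    return in->stack[--in->sp];
}

/* Push this op's constant. */
op *pp_const(interp *in)
{
    push(in, in->cur->value);
    return in->cur->next;
}

/* Pop two operands, push their sum. */
op *pp_add(interp *in)
{
    long right = pop(in);
    long left = pop(in);
    push(in, left + right);
    return in->cur->next;
}

/* Execute exactly one op; this is the piece a Lisp subr would wrap. */
void run_one_op(interp *in)
{
    in->cur = in->cur->ppaddr(in);
}

/* Calling run_one_op in a loop gives the effect of a runops function. */
void run_ops(interp *in, op *start)
{
    in->cur = start;
    while (in->cur != NULL)
        run_one_op(in);
}
```

Chaining two pp_const ops into a pp_add and calling run_ops leaves a
single value, the sum, on the stack.  An op like pp_helem would follow
the same pattern: pop the key and the hash, push the element.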

Then it will be possible to replace Perl's hashes (for instance) with
an equivalent Lisp structure.  All you need to do is write new
versions of the (hopefully few) opcodes that deal directly with the
hash type.  Assuming the semantics of Perl hashes are exactly
replicated, the interpreter will still be able to run every bit of
pure Perl code that it could before.  Hash *values* will still be Perl
objects, but the structure that contains them and the keys will not.

Proceeding in this manner, I think it will be possible to replace the
Perl interpreter piece by piece.  In the end, the code decomposer will
start with a Perl coderef and output a representation of its op tree
as purely Lisp data.  Up to this point, Perl and Lisp still run in the
same address space.

The next step will be to rewrite the Perl bytecode reader in Lisp.
Essentially, I think, all it has to do is recreate an op tree from
its serialized form.  Once this is accomplished, you could drop the
Perl core and rely on the bytecode compiler, if it were not for Perl's
ability to eval strings at runtime.

Maybe 10% of all serious Perl programs will run in an interpreter that
can't eval strings.  Perhaps the number could be increased to 50% by
making small changes to the programs.  But we should not count on the
sympathy of their maintainers.  As a last step, one should reimplement
Perl's yylex() and yyparse() functions.

This may all sound too difficult to contemplate.  (Indeed, it may *be*
too difficult.)  However, the alternative, a pseudo-Perl that can run
hello world and maybe for (1..10) {print}, just does not appeal to me
in the least.

Regards
-John