This is the mail archive of the guile@cygnus.com mailing list for the guile project.
Hello Guile-folk,

This message details the steps I believe are necessary for creating a successful Perl-to-Scheme translator. By "successful" I mean one that can run as many pure-Perl programs as a given version of Perl, and could at least conceivably be made source-compatible with a subset of Perl's C interface.

Do not start by writing a parser. Instead, start by using Perl's excellent embedding interface to link the Perl interpreter into a Guile module or Guile proper (type "perldoc perlembed"). When all else is done, this part may be jettisoned, but it will be extremely valuable between now and then.

Devise a way to wrap Perl data in Scheme objects, and vice versa. Perl values' reference counts must be maintained; a simple increment on import and decrement on sweep will do for starters. Scheme objects held by Perl must be marked. I recommend keeping a doubly-linked list of them in the Perl interpreter object. The Perl wrapper object's destructor should remove its Scheme thing from the list. Of course, the same object may occur in the list more than once, which amounts to a reference count. (Naturally, one will want to avoid getting Perl references mixed up in circular data structures.)

Write a general-purpose Perl-to-Scheme data conversion function in C, and another for Scheme-to-Perl. They should convert natural counterparts to each other (e.g., integers and strings) and use the wrapper for unconvertible stuff. I recommend converting Perl arrayrefs to Scheme lists and vice versa, performing a deep copy. I also suggest using a Perl arrayref-ref to represent a vector. #f, '(), and the various Perl forms of false will get sticky... I guess #f goes with undef, so the Perl test of Scheme truth is "defined($bla)".

Provide a way to evaluate and call Perl code in Scheme, and to call Scheme functions in Perl. Scheme should be able to specify any of the three Perl contexts (scalar, list, void) for Perl evaluation. To be robust, both languages must be able to catch each other's exceptions.
Perl has mechanisms to ensure that control returns to a given frame in the event of a longjmp and to unwind its context. If a longjmp occurs because of a Perl error, the code that catches it should raise a Scheme exception, and vice versa.

You may well ask what any of the above has to do with writing a translator. Er, nothing... except that it will create an environment which is very conducive to experimenting with and debugging conversion and evaluation techniques. We will know the semantics are correct, because Perl will be doing all of the work. Then, bit by bit, we can replace Perl data structures and operators with Scheme equivalents, using the humongous body of existing Perl code to test every step of the way.

You may then ask whether it is really worth the trouble to set up such an environment when the goal is to write a translator. I have two answers to this. One, a Perl-in-Scheme embedding would be useful in its own right. Two, it has been done in the case of Emacs Lisp. The Perlmacs implementation would, I imagine, serve as a good example for doing the same thing in Guile. See http://www.perl.com/CPAN/authors/id/JTOBEY/ for further information.

Are you with me so far? Good! Now comes the exciting part. I will lay out the steps for writing a Perl-to-Elisp compiler based on what is already done. I don't know Guile as well as Emacs, so I will just assume that everything I say would apply to Guile as well. Substitute "Scheme" for "Lisp".

Write accessors for all Perl data types. For example, to get at an element in a hash, you would have something like:

    (perl-hash-fetch hash key)

and to list the keys, you could do:

    (perl-hash-keys 'list-context hash)

I have made a start at this with Elisp, but it is still alpha, and the details are very subject to change.
I have a Perl module with subs like

    sub HASH::NEW   ()     { my %x; \%x }
    sub HASH::GET   ($)    { \%{$_[0]} }
    sub HASH::FETCH (\%$)  { $_[0]->{$_[1]} }
    sub HASH::STORE (\%$$) { $_[0]->{$_[1]} = $_[2] }
    sub HASH::KEYS  (\%)   { keys %{$_[0]} }
    sub HASH::EACH  (\%)   { each %{$_[0]} }
    sub HASH::CLEAR (\%)   { %{$_[0]} = () }

and a Lisp module that auto-generates the accessors from them using a built-in `perl-call' subr.

From there, it will be easy to convert any ordinary Perl data structure (not counting foreign objects such as database handles) into a Lisp form which, when evaluated, produces a copy of the original. The copy would be a Perl data structure of arbitrary complexity rooted in a single Lisp object.

Next, do the same for Perl code references. The B module (which is the workhorse of the Perl-to-C and Perl bytecode compilers) will be very useful for this. There will have to be a Lisp data type for holding a Perl "op" (a C function pointer with associated data). B will allow us to decompose a compiled Perl sub into its op tree.

Next, write a Lisp subr that executes a single Perl op. When executed in a loop, it should have the same effect as perl's runops() function (defined in run.c). Ops work directly with Perl's stack, which is essentially a Perl array. For example, pp_helem() pops a key and a hash pointer from the stack, looks up the hash element, and pushes the resulting value. We will need to make push and pop functions available to Lisp.

Then it will be possible to replace Perl's hashes (for instance) with an equivalent Lisp structure. All you need to do is write new versions of the (hopefully few) opcodes that deal directly with the hash type. Assuming the semantics of Perl hashes are exactly replicated, the interpreter will still be able to run every bit of pure Perl code that it could before. Hash *values* will still be Perl objects, but the structure that contains them and the keys will not.
Proceeding in this manner, I think it will be possible to replace the Perl interpreter piece by piece. In the end, the code decomposer will start with a Perl coderef and output a representation of its op tree as purely Lisp data.

Up to this point, Perl and Lisp run in the same address space. The next step will be to rewrite the Perl bytecode reader in Lisp. All it has to do, essentially, I think, is recreate an op tree from its serialized form. Once this is accomplished, you could drop the Perl core and rely on the bytecode compiler, if it were not for Perl's ability to eval strings at runtime. Maybe 10% of all serious Perl programs will run in an interpreter that can't eval strings. Perhaps the number could be increased to 50% by making small changes to the programs. But we should not count on the sympathy of their maintainers. As a last step, one should reimplement Perl's yylex() and yyparse() functions.

This may all sound too difficult to contemplate. (Indeed, it may *be* too difficult.) However, the alternative, a pseudo-Perl that can run "hello world" and maybe

    for (1..10) {print}

just does not appeal to me in the least.

Regards,
-John