Code generation

A Compilation object manages the classes, methods, and temporary state generated as a result of compiling a single top-level ModuleExp.

class Compilation
{ ...;
  ClassType[] classes;
  boolean immediate;
  public ClassType addClass
    (LambdaExp lexp, String name)
  { ... }
  public ClassType(ModuleExp exp, ...)
  { ...; addClass (exp, ...); }
}

Each Compilation may create one or more ClassType objects, each of which generates the bytecodes for one class. Each ClassType is generated from a LambdaExp, including the top ModuleExp. The boolean immediate is true if we are compiling for immediate loading, and is false if the target is one or more .class files.

The addClass method does all the work to compile a given LambdaExp. It creates a ClassType, adds it to Compilation's classes array, and generates Method objects for the constructor and the main applyX method. Once the applyX Method has been created, addClass emits some bytecodes to set up the incoming parameters, and then invokes the virtual compile method on the body of the LambdaExp, which generates the code that does the actual work of the procedure.

The Compilation constructor gets a ModuleExp, which it passes to addClass. The compile method of LambdaExp (which gets called for all lambdas except the dummy top-level) also calls addClass to generate the class corresponding to the lambda, and then it emits instructions to create a new instance of the generated Procedure class, and pushes it on the Java stack.

Targets

Most operations in the Java VM leave their result on the VM stack, where they are available for succeeding operations. The obvious and general way to compile an expression is therefore to generate bytecode instructions that leave the result (in the form of a Object reference) on the stack. This handles most cases quite well, but we can do better. We specify a Target parameter when invoking the compile method; the Target specifies where to leave the result.

public abstract class Target
{ ...;
  public abstract void compileFromStack
    (Compilation comp, Type stackType);
  public static final Target Ignore
  = new IgnoreTarget();
}

An Expression's compile method does not have to handle all the kinds of Targets, as long as it can generate code to leave the result on the VM stack, and then invoke compileFromStack, which is responsible for moving the result to the actual target.

The simplest Target is an IgnoreTarget. It is used when the result of an expression will be ignored, but we still need to evaluate it for possible side-effects. The implementation of IgnoreTarget.compileFromStack just emits an instrcution to pop a value from the VM stack. Expressions that have no side-effects can check if the target is an IgnoreTarget, and then immediately resturn. This saves a useless push-pop pair.

The usual Target is an StackTarget. This specifies that an expression should leave the result on the VM stack. Normally, the type of the result is Object, but a StackTarget can specify some other expected type, when that can be determined. The implementation of StackTarget.compileFromStack is also trivial: If the type of the result on the stack is a sub-type of the expected target type, nothing needs to be done; otherwise, it generates code to do the type conversion.

Things get more interesting when we come to ConditionalTarget.

public class ConditionalTarget
    extends Target
{ ...;
  public Label ifTrue, ifFalse;
}

A ConditionalTarget is used when compiling the test expression in a conditional. The expression is evaluated as a boolean value; if the result is true, control transfers to ifTrue; otherwise control transfers to ifFalse. Using ConditionalTarget makes it straight-forward to generate optimal code for nested conditionals, including and and or macros, and (when inlining) functions such as not and eq?.

The bytecode package

The ClassType and Method classes are in a separate gnu.bytecode package, which is an intermediate-level interface to code generation and Java .class files. It is essentially independent of Scheme or the rest of Kawa.

class ClassType extends Type
{ ...;
  CpoolEntry[] constant_pool;
  Method methods; // List of methods.
  Field fields; // List of fields.
  public Field addField
    (String name, Type type, int flags)
  { ...Create new field... }
  public method addMethod(String name,...)
  { ...Create new method... }
  public void writeToStream
    (OutputStream stream) { ... }
  public void writeToFile(String filename)
  { ... }
  public byte[] writeToArray()
  { ... }
}

The ClassType class is the main class of the bytecode package. It manages a list Fields, a list of Methods, and the constant pool. There are utility methods for adding new fields, methods, and constant pool entries.

When the ClassType has been fully built, the writeToFile method can be used to write out the contents into a file. The result has the format of a .class file [JavaVmSpec]. Alternatively, the class can be written to an internal byte array (that has the same layout as a .class file) using the writeToArray method. The resulting byte array may be used by a ClassLoader to define a new class for immediate execution. Both of the these methods are implemented on top of the more general writeToStream.

Each method is represented by a Method object.

class Method implements AttrContainer
{ ...;
  Type[] arg_types;
  Type return_type;
  Attribute attributes;
}

An AttrContainer is an object that contains zero or more Attributes. The Java .class file format is quite extensible. Much of the information is stored in named attributes. There are standard attributes, but an application can also define new ones (that are supposed to be ignored by applications that do not understand them). Each class file may have a set of top-level attributes. In addition, each field and method may have attributes. Some standard attributes may have nested sub-attributes.

public abstract class Attribute
{ ...;
  AttrContainer container;
  String name;
}

An Attribute's container specifies who owns the attribute. The attribute also has a name, plus methods to gets its size, write it out, etc.

The most interesting (and large) standard Attribute occurs in a method and has the name "Code". It contains the actual bytecode instructions of a non-native non-abstract method, and we represent it using CodeAttr.

class CodeAttr extends Attribute
{ ...;
  Variable addLocal(Type t, String name)
  { ... }
  public void emitLoad(Variable var)
  { ... }
  public void emitPushInt(int i)
  { ... }
  public void putLineNumber(int lineno)
  { ... }
}

As an example of the level of functionality, emitPushInt compiles code to push an integer i on stack. It selects the right instruction, and if i is too big for one of the instructions that take an inline value, it will create a constant pool entry for i, and push that.

The method addLocal creates a new local variable (and makes sure debugging information is emitted for it), while emitLoad pushes the value of the variable on the stack.

Kawa calls putLineNumber to indicate that the current location corresponds to a given line number. These are emitted in the .class file, and most Java interpreters will use them when printing a stack trace.

We use gnu.bytecode mainly for generating .class files, but it also has classes to read .class files, and also classes to print a ClassType in readable format. The combination makes for a decent Java dis-assembler.

There are other toolkits for creating or analyzing .class files, but gnu.bytecode was written to provide a lot of support for code generation while having little overhead. For example, some assemblers represent each instruction using an Instruction instance, whereas CodeAttr just stores all the instruction in a byte array. Using a linked list of Instructions may be more “object-oriented”, and it does make it easier to do peep-hole optimizations, but the time and space overhead compared to using an array of bytes is huge. (If you do need to do peephole optimizations, then it makes sense to use a doubly-linked list of Instructions, but to use that in conjunction with CodeAttr. You will in any case want a byte-array representation for input and output.)

Literals

A Scheme quoted form or self-evaluating form expands to a QuoteExp. Compiling a QuoteExp would seem a trivial exercise, but it is not. There is no way to embed (say) a list literal in Java code. Instead we create a static field in the top-level class for a each (different) QuoteExp in the body we are compiling. The code compiled for a QuoteExp then just needs to load the value from the corresponding static field. The tricky part is making sure that the static field gets initialized (when the top-level class is loaded) to the value of the quoted form.

The basic idea is that for:

(define (foo) '(3 . 4))

we compile:

class foo extends Procedure0
{
  Object static lit1;
  public static
  { // Initializer
    lit1 = new Pair(IntNum.make(3),
                    IntNum.make(4));
  }
  public Object apply0()
  { return lit1; }
}

When the compiled class foo is loaded, we do:

Class fooCl = Class.forName("foo");
Procedure fooPr
  = (Procedure) fooCl.newInstance ();
// Using foo:
Object result = fooPr.apply0 ();

How does the Kawa compiler generate the appropriate new Pair expression as shown above? A class whose instances may appear in a quoted form implements the Compilable interface:

interface Compilable
{
  Literal makeLiteral(Compilation comp);
  void emit(Literal l, Compilation comp);
}

The makeLiteral creates a Literal object that represents the value of this object. That Literal is later passed to emit, which emits bytecode instructions that (when evaluated) cause a value equal to this to be pushed on the Java evaluation stack.

This two-part protocol may be overkill, but it makes it possible to combine duplicate constants and it also supports circularly defined constants. (Standard Scheme does not support self-referential constants, but Common Lisp does. See section 25.1.4 “Similarity of Constants” in [CommonLisp2].)

It is possible that the Compilable interface will be replaced or augmented with JDK 1.1's serialization feature.

If we are compiling for immediate execution, we do not need to generate code to regenerate the literal. In fact, we want to re-use the literal from the original source form. The problem is passing the source literal to the byte-compiled class. To do that, we use the CompiledProc interface.

interface CompiledProc
{
  public abstract void setLiterals
    (Object[] values);
}

An immediate class compiled from a top-level form implements the CompiledProc form. After an instance of the ModuleBody has been created, it is coerced to a CompiledProc, and setLiterals is called. The argument to setLiterals is an array of the necessary literal values, and the method that implements it in the compiled code causes the array of literal values to be saved in the ModuleBody instance, so it can be accessed by the compiled code.