8. Writing an application

This chapter contains information for those wishing to write their own CGEN application.

8.1 File Layout

Source files in cgen are organized in a very specific way.(32) This makes it easy to find things.

The top-level file is cgen-<app>.scm.
The best way to create this file is to copy an existing application's file (e.g. cgen-opc.scm) and modify it to suit.
The file <app>.scm contains general application-specific utilities.
Other files are named <app>-foo.scm.
Add an entry to dev.scm (load-<app>).

8.2 File Generation Process

This is an overview of the cgen workflow.

cgen is started with a list of files to generate and the code generation options.
Source code is loaded.
  - Application-independent code is loaded if it is not compiled in.
  - Application-specific code is loaded.
    Currently application-specific code is never compiled in.(33)
      - It doesn't affect speed as much as the application-independent code does.
      - It is subject to more frequent changes.
      - Application development is easier if changes to .scm files are "ready to use".
Ultimately the procedure `cpu-load' is called, which is the main driver for loading .cpu files.
Various data structures are initialized.
Data files are loaded.
  - The main <arch>.cpu file is loaded.
    There is a #include-like mechanism for loading other files, so big architectures can be broken up into several files.

    While the architecture description is being loaded, entries that were not requested are discarded. This happens, for example, when building a simulator: there's no point in keeping instructions specific to a machine that is not being generated. What to keep is based on the MACH and ISA attributes.
  - Application-specific data files are loaded, e.g. <arch>.sim.
Builtin elements are created.
Each requested file is generated by calling its cgen-<file> generator.
The output is written to the output file with with-output-to-file, so the generator must write to (current-output-port).

Some files require heavy-duty processing of the cpu description. For example, the simulator computes the instruction formats from the instruction field lists of each instruction. This computation is deferred to each cgen-<file> procedure that needs it and must be explicitly requested by that procedure. The results are cached, so this is only done once.
Additional processing is done for some opcodes files.
Several opcodes files are built from three sources:
  - generated code
  - a section in the <arch>.opc file
    It's not appropriate to put large amounts of C (or perhaps any C) in cgen description files, yet some things are best expressed in another language (e.g. assembler/disassembler operand parsing/printing).
  - a foo.in file
    It seems cleaner to put large amounts of non-machine-generated C in files separate from the code generator.
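The last steps of this workflow (a generator that writes only to (current-output-port), and an expensive analysis that is deferred, explicitly requested, and cached) can be sketched in Guile Scheme. Every name below (cgen-demo.h, -insn-formats, -generate-file) is hypothetical and not part of cgen, and using delay/force for compute-once caching is an assumption made for illustration, not a description of how cgen caches its results.

```scheme
;; Hypothetical sketch; none of these names are real cgen procedures.

;; Internal counter: how many times the expensive analysis has run.
(define -format-computations 0)

;; Stand-in for heavy-duty processing such as computing instruction
;; formats.  `delay' defers the work; `force' runs it at most once and
;; caches the result for every later request.
(define -insn-formats
  (delay
    (begin
      (set! -format-computations (+ -format-computations 1))
      '(fmt-add fmt-ld fmt-st)))
)

;; A generator explicitly requests the analysis it needs.
(define (-get-insn-formats) (force -insn-formats))

;; A cgen-<file>-style generator: it writes only to (current-output-port)
;; and never opens a file itself.
(define (cgen-demo.h)
  (for-each (lambda (fmt)
              (display "/* format: ")
              (display fmt)
              (display " */")
              (newline))
            (-get-insn-formats))
)

;; The driver chooses the destination; with-output-to-file rebinds
;; (current-output-port) while the generator runs.
(define (-generate-file filename generator)
  (with-output-to-file filename generator)
)
```

For example, `(-generate-file "demo.h" cgen-demo.h)` writes three comment lines to demo.h. Running the generator again reuses the cached list instead of recomputing it.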

8.3 Coding Conventions

unless a definition occupies one line, its final trailing parenthesis is on a line by itself, beginning in column one
definitions internal to a source file begin with '-'
global state variables are named *foo-bar* [FIXME: current code needs updating]
avoid uppercase, except for constants (e.g. *UNSPECIFIED*)
procedures that return a boolean result end in '?'
procedures that modify something end in '!'
classes are named <name>
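The conventions above can be seen together in a small example. This is a hypothetical module written for illustration; *MAX-DEMO-WIDTH*, *demo-widths* and the demo-width procedures are not part of cgen.

```scheme
;; Hypothetical module illustrating the cgen coding conventions.

;; Constant: uppercase is reserved for constants.
(define *MAX-DEMO-WIDTH* 32)

;; Global state variable: named *foo-bar*.
(define *demo-widths* '())

;; Internal definition: leading '-'.  It fits on one line, so its
;; closing parenthesis stays on the same line.
(define (-valid-width? w) (and (integer? w) (> w 0) (<= w *MAX-DEMO-WIDTH*)))

;; Predicate: name ends in '?'.  Multi-line definition, so the final
;; parenthesis sits alone in column one.
(define (demo-width-known? w)
  (if (member w *demo-widths*) #t #f)
)

;; Procedure that modifies something: name ends in '!'.
(define (demo-width-add! w)
  (if (not (-valid-width? w))
      (error "invalid width" w))
  (if (not (demo-width-known? w))
      (set! *demo-widths* (cons w *demo-widths*)))
  *demo-widths*
)
```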

8.4 Accessing Loaded Data

Each kind of description file entry (defined with `define-foo') is recorded in an object of class <foo>.(34) All the data is collected together in an object of class <arch>.(35)

Data for the currently selected architecture is obtained with several access functions.

  (current-arch-name)
  - return symbol that is the name of the arch
  - this is the name specified with `define-arch'

  (current-arch-comment)
  - return the comment specified with `define-arch'

  (current-arch-atlist)
  - return the attributes specified with `define-arch'

  (current-arch-default-alignment)
  - return a symbol indicating the default alignment
    - one of aligned, unaligned, forced

  (current-arch-insn-lsb0?)
  - return #t if the least significant bit in a word is numbered 0
  - return #f if the most significant bit in a word is numbered 0

  (current-arch-mach-name-list)
  - return a list of names (as symbols) of all machs in the architecture

  (current-arch-isa-name-list)
  - return a list of names (as symbols) of all isas in the architecture

  For most of the remaining elements, there are three main accessors:
  [foo is sometimes abbreviated]
    - current-foo-list - returns list of <foo> objects in the architecture
    - current-foo-add! - add a <foo> object to the architecture
    - current-foo-lookup - look up the <foo> object by its name

  <atlist>
  (current-attr-list)
  (current-attr-add!)
  (current-attr-lookup)

  <enum>
  (current-enum-list)
  (current-enum-add!)
  (current-enum-lookup)

  <keyword>
  (current-kw-list)
  (current-kw-add!)
  (current-kw-lookup)

  <isa>
  (current-isa-list)
  (current-isa-add!)
  (current-isa-lookup)

  <cpu>
  (current-cpu-list)
  (current-cpu-add!)
  (current-cpu-lookup)

  <mach>
  (current-mach-list)
  (current-mach-add!)
  (current-mach-lookup)

  <model>
  (current-model-list)
  (current-model-add!)
  (current-model-lookup)

  <hardware>
  (current-hw-list)
  (current-hw-add!)
  (current-hw-lookup)

  <ifield>
  (current-ifld-list)
  (current-ifld-add!)
  (current-ifld-lookup)

  <operand>
  (current-op-list)
  (current-op-add!)
  (current-op-lookup)

  <insn>
  (current-insn-list)
  (current-insn-add!)
  (current-insn-lookup)

  <macro-insn>
  (current-minsn-list)
  (current-minsn-add!)
  (current-minsn-lookup)

  (current-ifmt-list)
  - return list of computed <iformat> objects

  (current-sfmt-list)
  - return list of computed <sformat> objects

  [there are a few more to be documented, not sure they'll remain as is]
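The list/add!/lookup triple follows one pattern for every element kind. The sketch below is a minimal stand-in, assuming entries are plain (name . data) pairs held in a global list; real cgen records <foo> objects inside an <arch> object, and the demo-kw names are hypothetical. Returning #f from a failed lookup is this sketch's choice, not documented cgen behavior.

```scheme
;; Hypothetical stand-in for current-kw-list / current-kw-add! /
;; current-kw-lookup.  An entry is modelled as (name . data).

(define *demo-kw-list* '())   ; newest entry first

;; Return entries in the order they were added.
(define (current-demo-kw-list) (reverse *demo-kw-list*))

;; Record a new entry.
(define (current-demo-kw-add! kw)
  (set! *demo-kw-list* (cons kw *demo-kw-list*))
)

;; Look an entry up by its name (a symbol); #f if absent.
(define (current-demo-kw-lookup name) (assq name *demo-kw-list*))
```

Keeping the list newest-first makes add! constant time, at the cost of a reverse when the full list is requested.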

8.5 Arch Name References

To simplify writing code generators, system names can be specified with fixed strings rather than having to compute them. The output is post-processed to convert the strings to the actual names. Upper and lower case names are supported.

For the architecture name use @arch@, @ARCH@.
For the cpu family name use @cpu@, @CPU@.
For the prefix use @prefix@, @PREFIX@.

The ‘prefix’ notion is used to segregate different code for the same cpu family. For example, this is used to segregate the ARM ISA from the Thumb ISA.
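The post-processing pass can be pictured as plain string substitution. This is a hypothetical sketch, not cgen's implementation: the procedure names and the (key . value) alist are assumptions, and it assumes @foo@ maps to the lowercase form of the name and @FOO@ to the uppercase form. The values "m32r" and "m32rbf" are illustrative only.

```scheme
;; Hypothetical sketch of an @arch@-style post-processing pass.

;; Return the index of PATTERN in STR at or after START, or #f.
(define (-string-search str pattern start)
  (let ((plen (string-length pattern))
        (slen (string-length str)))
    (let loop ((i start))
      (cond ((> (+ i plen) slen) #f)
            ((string=? (substring str i (+ i plen)) pattern) i)
            (else (loop (+ i 1))))))
)

;; Replace every occurrence of FROM in STR with TO.
(define (-string-replace-all str from to)
  (let loop ((start 0) (acc '()))
    (let ((i (-string-search str from start)))
      (if i
          (loop (+ i (string-length from))
                (cons to (cons (substring str start i) acc)))
          (apply string-append
                 (reverse (cons (substring str start (string-length str)) acc))))))
)

;; NAMES is an alist such as ((arch . "m32r") (prefix . "m32rbf")).
;; @key@ becomes the lowercase value, @KEY@ the uppercase value.
(define (-subst-arch-names text names)
  (let loop ((text text) (names names))
    (if (null? names)
        text
        (let* ((key (symbol->string (caar names)))
               (val (cdar names)))
          (loop (-string-replace-all
                 (-string-replace-all text
                                      (string-append "@" key "@")
                                      (string-downcase val))
                 (string-append "@" (string-upcase key) "@")
                 (string-upcase val))
                (cdr names)))))
)
```

With these inputs, `(-subst-arch-names "int @arch@_init (void); /* @ARCH@ @prefix@ */" '((arch . "m32r") (prefix . "m32rbf")))` yields `"int m32r_init (void); /* M32R m32rbf */"`.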

8.6 String Building

Output generation uses a combination of writing text out as it is computed and building text for later writing out.

The top-level file generator uses string-write. It takes string-lists and thunks as arguments and writes each argument in turn to stdout. String-lists are lists of strings (nested arbitrarily deep). It's cheaper to cons long strings together than to use string-append. A thunk returns a string-list to write out, but it isn't called until all preceding arguments to `string-write' have been written out. This allows building up large amounts of text to be deferred until it is needed.

The main procedures for building strings and writing them out are:

(string-write string-list-or-thunk1 string-list-or-thunk2 …)
Loops over arguments writing them out in turn.
(string-write-map proc string-list-or-thunk-list)
Apply proc to each element in string-list-or-thunk-list and write out the result.
(string-list arg1 arg2 …)
Return list of arguments. This is identical to list except it is intended to take string-lists as arguments.
(string-list-map proc arg-list)
Return list of proc applied to each element of arg-list. This is identical to map except it is intended to take strings as arguments.
(string-append string1 string2 …)
For small arguments it's just as well to use string-append. This is a standard Scheme procedure. The output is also easier to read when developing interactively. And some subroutines are used in multiple contexts including some where strings are required, so sometimes you have to use string-append.
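The string-write idea can be illustrated with a small re-implementation. This is a hypothetical sketch rather than cgen's code: demo-string-write and -write-string-list are made-up names, and the sketch only accepts thunks as top-level arguments, which is an assumption about scope.

```scheme
;; Hypothetical re-implementation of the string-write idea (not cgen's
;; code).  A string-list is a string or a possibly nested list of
;; string-lists; nesting avoids copying text with string-append.

(define (-write-string-list sl)
  (cond ((string? sl) (display sl))
        ((pair? sl) (for-each -write-string-list sl))
        ((null? sl) #f)
        (else (error "not a string-list:" sl)))
)

;; Write each argument in turn.  A thunk is not called until every
;; preceding argument has been written.
(define (demo-string-write . args)
  (for-each (lambda (arg)
              (-write-string-list (if (procedure? arg) (arg) arg)))
            args)
)
```

For example, `(demo-string-write "a" (lambda () (list "b" (list "c" "d"))) (list (list "e") "f"))` prints `abcdef`, and the thunk runs only after `"a"` has been written.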

8.7 COS

COS is CGEN's Object System. It's a simple OO system for Guile that was written to provide something useful until Guile had its own. COS will be replaced with GOOPS if the Scheme implementation of CGEN is kept.

The pure Scheme implementation of COS uses vectors to record objects and classes.

A complete list of user-visible functions is at the top of ‘cos.scm’.

Here is a list of the frequently used ones.

(class-make name parent-name-list element-list method-list)
Use class-make to define a class.
name: symbol, <name-of-class>
parent-name-list: list of symbols, names of each parent class
element-list: list of either symbols or (symbol . initial-value)
method-list: list of (symbol . lambda)
The result is the class's definition. It is usually assigned to a global variable with the same name as the class. Current cgen code always does this. It's not a requirement, but it is the convention.
(new <class-name>)
Create a new object with new. <class-name> is typically the global variable that recorded the results of class-make. The result is a new object of the requested class. Class elements have either an "undefined" value or an initial value if one was specified when the class was defined.
(define-getters class-name prefix element-list)
Elements (aka members) are read/written with "accessors". Read accessors are defined with define-getters, which creates one procedure for each element, each defined as (prefix-element-name object).

This is a macro so don't quote anything.
(define-setters class-name prefix element-list)
Write accessors are defined with define-setters, which creates one procedure for each element, each defined as (prefix-set-element-name! object new-value).

This is a macro so don't quote anything.
(elm-get object elm-name)
This can only be used in method definitions.
(elm-set! object elm-name new-value)
This can only be used in method definitions.
(send object method-name arg1 arg2)
Invoke method method-name on object.

The convention is to put this in a cover function: (class-name-method-name object arg1 arg2).
(send-next object method-name arg1 arg2)
Same as send, except that it is only usable in methods and invokes the method in the parent class.
(make object . args)
One standard way to create a new object is with make. It is a wrapper, defined as:

  (define (make object . args)
    (apply send (cons (new object) (cons 'make! args)))
  )
(vmake class . args)
The other standard way to create objects is with vmake.

args is a list of option names and arguments.

??? Not completely implemented yet.
(method-make! class method-name lambda)
The normal way of creating methods is to use method-make!, not define them with the class. It's just easier to define them separately.
(method-make-virtual! class method-name lambda)
Virtual methods are created with method-make-virtual!.
(method-make-forward! class elm-name methods) -> unspecified
Forwarding a method invocation on one object to another is extremely useful so some utilities have been created to simplify creating forwarding methods.

methods is a list of method names. A method is created for each one that forwards the method onto the object contained in element ELM-NAME.
(method-make-virtual-forward!)
Same as method-make-forward! except that it creates virtual methods.


This document was generated by Doug Evans on January 28, 2010 using texi2html 1.78.

8.1 File Layout		Organization of source files
8.2 File Generation Process		Workflow in cgen
8.3 Coding Conventions		Coding conventions
8.4 Accessing Loaded Data		Reading data from loaded .cpu files
8.5 Arch Name References		Architecture names in generated code
8.6 String Building		Building long strings and writing them out
8.7 COS		Cgen's Object System