5. Porting

This chapter describes how to do a CGEN port. It focuses on doing binutils and simulator ports, but the procedure should be generally applicable.

5.1 Introduction to porting  
5.2 Supported Guile versions  
5.3 Running configure  
5.4 Writing a CPU description file  
5.5 Doing an opcodes port  
5.6 Doing a GAS port  
5.7 Building a GAS test suite  
5.8 Doing a simulator port  
5.9 Building a simulator test suite  

5.1 Introduction to porting

Doing a GNU tools port for a new processor basically consists of porting the following components more or less in order. The order can be changed, of course, but the following order is reasonable. Certainly things like BFD and opcodes need to be finished earlier than others. Bugs in earlier pieces are often not found until later pieces are tested, so no piece is necessarily finished until they all are.

The use of CGEN affects the opcodes, GAS, and simulator portions only. As always, the M32R port is a good reference base.

One goal of CGEN is to describe the CPU in an application-independent manner so that program generators can do all the repetitive work of generating code and tables for each CPU that is ported.

For opcodes, several files are generated. No additional code need be written in the opcodes directory, although as an escape hatch the user can add target-specific code to file <arch>.opc in the CGEN cpu source directory. The functions in that file are included in the relevant generated files. An example of when you need to create an <arch>.opc file is when there are special pseudo-ops that need to be parsed, for example the high/shigh pseudo-ops of the M32R. See section 5.5 Doing an opcodes port.

For GAS, no files are generated (except test cases!) so the port is done more or less like the other GAS ports except that the assembler uses the CGEN-built opcode table plus `toplevel/gas/cgen.[ch]'.

For the simulator, several files are built, and other support files need to be written. See section 5.8 Doing a simulator port.

5.2 Supported Guile versions

In order to avoid suffering from the bug of the day when using snapshots, CGEN development has been confined to Guile releases only. As of this writing (1999-04-26) only Guile 1.2 and 1.3 are supported. At some point in the future older versions of Guile will no longer be supported.

If using Guile 1.2, configure it with --enable-guile-debug --enable-dynamic-linking to work around an unknown bug in this version of Guile. I ran into this on Solaris 2.6.

5.3 Running configure

When doing porting or maintenance activity with CGEN, the build tree must be configured with the --enable-cgen-maint option. This adds the necessary dependencies to the `toplevel/opcodes' and `toplevel/sim' directories.

CGEN uses Guile, so Guile must be installed. At present, if Guile isn't installed in `/usr/local', the --with-guile=/guile/install/dir option must be passed to `configure' to specify where it is installed.

5.4 Writing a CPU description file

The first step in doing a CGEN port is writing a CPU description file. The best way to do that is to take an existing file (such as the one for the M32R) and use it as a template.

Writing a CPU description file generally involves writing each of the following types of entries, in order. See section 3. CGEN's Register Transfer Language, for detailed descriptions of each type of entry that appears in the description file.

5.4.1 Conventions  Programming style conventions
5.4.2 Writing define-arch  Architecture wide specs
5.4.3 Writing define-isa  Instruction set characteristics
5.4.4 Writing define-cpu  CPU families
5.4.5 Writing define-mach  Machine variants
5.4.6 Writing define-model  Models of each machine variant
5.4.7 Writing define-hardware  Hardware elements
5.4.8 Writing define-ifield  Instruction fields
5.4.9 Writing define-normal-insn-enum  Instruction enums
5.4.10 Writing define-operand  Instruction operands
5.4.11 Writing define-insn  Instructions
5.4.12 Writing define-macro-insn  Macro instructions
5.4.13 Using define-pmacro  Preprocessor macros
5.4.14 Interactive development  Useful things to do in a Guile shell

5.4.1 Conventions

First a digression on conventions and programming style.

  1. define-foo vs. define-normal-foo

    Each CPU description define- entry generally provides two forms: the normal form and the general form. The normal form has a simple, fixed-argument syntax that allows one to specify the most popular elements. When one needs to specify more obscure elements of the entry, one uses the general form, which is a list of name/value pairs. The naming convention is to call the normal form define-normal-foo and the general form define-foo.

  2. Parentheses placement


      (define-normal-insn-enum
        insn-op1 "insn format enums" () f-op1 OP1_
        (ADD ADDC SUB SUBC
         AND OR   XOR INV)
      )

    All Lisp/Scheme code I've read puts the trailing parenthesis on the previous line. CGEN programming style says the last trailing parenthesis goes on a line by itself. If someone wants to put forth an argument of why this should change, please do. I like putting the very last parenthesis on a line by itself in column 1 because it makes it easier to traverse the file with a parenthesis matching keystroke.

  3. StudlyCaps vs. _ vs. -

    The convention is to have most things lowercase with words separated by `-'. Things that are uppercase are fixed and well defined: enum values and mode names. This convention must be followed.

5.4.2 Writing define-arch

Various simple and architecture-wide common things like the name of the processor must be defined somewhere, so all of this stuff is put under define-arch.

This must be the first entry in the description file.
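
As a rough illustration, a define-arch entry might look like the sketch below. It is modeled on the M32R port. The name `foo', the comment string, and the mach and isa lists are made up for this example, and the exact set of fields should be checked against section 3. CGEN's Register Transfer Language.

```scheme
; Sketch only: "foo", "foo1" and "foo2" are hypothetical names.
(define-arch
  (name foo)                     ; name of the architecture
  (comment "Example FOO architecture")
  (default-alignment aligned)    ; default alignment of memory accesses
  (insn-lsb0? #t)                ; bit number 0 is the least significant bit
  (machs foo1 foo2)              ; machine variants, see define-mach
  (isas foo)                     ; instruction sets, see define-isa
)
```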

5.4.3 Writing define-isa

There are two purposes to define-isa. The first is to specify parameters needed to decode instructions.

The second is to give the instruction set a name. This is important for architectures like the ARM where one CPU can execute multiple instruction sets.
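
A minimal define-isa entry could be sketched as follows, assuming a hypothetical instruction set `foo' whose instructions are based on 16 bit words. The three size parameters are the ones used by the M32R port; the values shown here are illustrative only.

```scheme
; Sketch only: "foo" and the sizes are assumptions for this example.
(define-isa
  (name foo)                      ; gives the instruction set a name
  (default-insn-bitsize 16)       ; default size of an instruction, in bits
  (base-insn-bitsize 16)          ; number of bits initially fetched for decoding
  (default-insn-word-bitsize 16)  ; word size used when computing bit numbers
)
```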

5.4.4 Writing define-cpu

CPU families are an internal and artificial classification designed to collect processor variants that are sufficiently similar together under one roof for the simulator. What is "sufficiently similar" is up to the programmer. For example, if the only difference between two processor variants is that one has a few extra instructions, there's no point in treating them separately in the simulator.

When simulating the variant without the extra instructions, said instructions are marked as "invalid". On the other hand, putting 32 and 64 bit variants of an architecture under one roof is problematic since the word size is different. What "under one roof" means is left fuzzy for now, but basically the simulator engine has a collection of structures defining internal state, and "CPU families" minimize the number of copies of generated code that manipulate this state.
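
A CPU family entry might look like the sketch below. The name `foobf' follows the M32R port's convention of a "bf" (base family) suffix and is hypothetical, as are the endianness and word size.

```scheme
; Sketch only: "foobf" and its parameter values are assumptions.
(define-cpu
  (name foobf)                 ; must differ from the arch and mach names
  (comment "FOO base family")
  (endian big)                 ; byte order of the family
  (word-bitsize 32)            ; 32 and 64 bit variants would go in separate families
)
```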

5.4.5 Writing define-mach

CGEN uses "mach" in the same sense that BFD uses "mach". "Mach", which is short for `machine', defines a variant of the architecture.
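
A machine variant is tied to the CPU family that simulates it. A minimal sketch, using the hypothetical names `foo1' and `foobf':

```scheme
; Sketch only: "foo1" and "foobf" are hypothetical names.
(define-mach
  (name foo1)                  ; name of the variant
  (comment "Generic FOO cpu")
  (cpu foobf)                  ; CPU family this variant belongs to
)
```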

5.4.6 Writing define-model

When describing a CPU, in any context, there is "architecture" and there is "implementation". In CGEN parlance a "model" is an implementation of a "mach". Models specify pipeline and other performance related characteristics of the implementation.

Some architectures bring pipeline details up into the architecture (rather than making them an implementation detail). It's not clear yet how to handle all the various possibilities so at present this is done on a case-by-case basis. Maybe a straightforward solution will emerge.
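
The sketch below shows what a very simple model with a single execution unit might look like. It is patterned after the test model in the M32R port; the names `foo1-test' and `foo1' are hypothetical, and the meaning of each unit parameter should be verified against section 3. CGEN's Register Transfer Language.

```scheme
; Sketch only: a default model with one function unit and no
; pipeline details. "foo1-test" and "foo1" are hypothetical names.
(define-model
  (name foo1-test) (comment "test model") (attrs)
  (mach foo1)                  ; the mach this model implements

  (unit u-exec "Execution Unit" ()
        1 1 ; issue done
        () ; state
        () ; inputs
        () ; outputs
        () ; profile action (default)
        )
)
```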

5.4.7 Writing define-hardware

The registers of the processor are specified with define-hardware. Also, immediate constants and addresses are defined to be "hardware". By convention, all hardware element names are prefaced with `h-'. This convention must be followed.

Pre-defined hardware elements are:

`h-memory'
    Normal CPU memory(17)
`h-sint'
    signed integer
`h-uint'
    unsigned integer
`h-addr'
    an address
`h-iaddr'
    an instruction address

Where are floats you ask? They'll be defined when the need arises.

The program counter is named `h-pc' and must be specified. It is not a builtin element as sometimes architectures need to modify its behaviour (in the get/set specs).
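
As an illustration, a bank of sixteen general registers and the program counter might be described as in the sketch below, which is adapted from the M32R port. The register names in the keyword list are examples only, and `dnh' ("define normal hardware") is the shorthand used by the M32R description file rather than something introduced in this chapter.

```scheme
; Sketch only: register count, mode and names are assumptions.
(define-hardware
  (name h-gr)
  (comment "general registers")
  (attrs)
  (type register WI (16))      ; sixteen registers of mode WI (word integer)
  (indices keyword ""          ; assembler names and their register numbers
           ((r0 0) (r1 1) (r2 2) (sp 15)))
)

; Normal form: name comment attrs type indices values handlers
(dnh h-pc "program counter" (PC) (pc) () () ())
```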

5.4.8 Writing define-ifield

Writing instruction field entries involves analyzing the instruction set and creating an entry for each field. If a field has multiple purposes, one can create separate entries for each intended purpose. The names should generally follow the names used by the architecture reference manual.

By convention, all instruction field names are prefaced with `f-'. This convention must be followed.

CGEN tries to allow the use of the bit numbering as found in the architecture reference manual. This minimizes transcription errors both when writing the `.cpu' file and later when communicating field info to people.

There are two key pieces of data that CGEN uses to organize field specification: the default insn word size (in bits), and whether bit number 0 is the LSB (least significant bit) or the MSB (most significant bit).

In the general case, fields are described with 4 numbers: word-offset, word-length, start, and length. All instruction fields (*) live in exactly one word and must be contiguous. Non-contiguous fields are specified with "multi-ifields" which are fields built up out of several smaller typically disjoint fields. The size of the word depends on the context.

`word-offset' specifies the offset in bits from the start of the insn to the word containing the field.

`word-length' specifies the size in bits of the word containing the field.

`start' specifies the position of the MSB of the field in the word.

`length' specifies the size in bits of the field.


Suppose an ISA has instructions that are normally 16 bits, but has instructions that may take an additional 32 bit immediate and optionally an additional 16 bit immediate after that. Also suppose the ISA numbers the bits starting from the LSB.

default-insn-word-bitsize = 16, lsb0? = #t

An instruction with four 4 bit fields, one 32 bit immediate, and the optional 16 bit immediate might be:

  | op1 | op2 | r1 | r2 | simm32 | simm16 |

            word-offset  word-length  start  length
f-op1:           0            16        15      4
f-op2:           0            16        11      4
f-r1:            0            16         7      4
f-r2:            0            16         3      4
f-simm32:       16            32        31     32
f-simm16:       48            16        15     16

If lsb0? = #f, then the example becomes:

            word-offset  word-length  start  length
f-op1:           0            16         0      4
f-op2:           0            16         4      4
f-r1:            0            16         8      4
f-r2:            0            16        12      4
f-simm32:       16            32         0     32
f-simm16:       48            16         0     16

Endianness for the purposes of this example is irrelevant. In the word containing op1,op2,r1,r2, op1 is in the most significant nibble and r2 is in the least significant nibble.
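
To make the meaning of `start' and `length' concrete, here is a small sketch in plain Scheme that pulls a field out of one instruction word under either bit numbering. The procedure `extract-field' is hypothetical and written only for this illustration; it is not the extraction code CGEN generates.

```scheme
;; Extract a field from the word that contains it.
;; value:       the word, as a non-negative integer
;; word-length: size of the word in bits
;; start:       position of the field's MSB, numbered according to lsb0?
;; len:         size of the field in bits
(define (extract-field value word-length start len lsb0?)
  (let ((shift (if lsb0?
                   ;; bit 0 is the LSB: the field's lowest bit is start-len+1
                   (- start (- len 1))
                   ;; bit 0 is the MSB: count the bits to the right of the field
                   (- word-length (+ start len)))))
    ;; shift right, then keep the low len bits
    (modulo (quotient value (expt 2 shift))
            (expt 2 len))))

;; The 16 bit word #x1234 holds op1=1, op2=2, r1=3, r2=4.
(extract-field #x1234 16 15 4 #t)  ; f-op1 with lsb0? = #t, gives 1
(extract-field #x1234 16 0 4 #f)   ; f-op1 with lsb0? = #f, gives 1
(extract-field #x1234 16 3 4 #t)   ; f-r2 with lsb0? = #t, gives 4
(extract-field #x1234 16 12 4 #f)  ; f-r2 with lsb0? = #f, gives 4
```

Both numberings select the same bits; only the way the position of the field's MSB is written down differs.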

For a large number of cases specifying all 4 numbers is excessive. With careful redefinition of the starting bit number, one can get away with only specifying start,length. Imagine several words of the default insn word size laid out from the start of the insn. On top of that lay the field. Now pick the minimal set of words that are required to contain the field. That is the "word" we use. The `start' value is basically computed by adding the offset of the first containing word to the starting bit of the field in the word. It's slightly more complicated than that because lsb0? and the word's size must be taken into account. This is best illustrated by rewriting the above example:

lsb0? = #t

            start  length
f-op1:        15      4
f-op2:        11      4
f-r1:          7      4
f-r2:          3      4
f-simm32:     47     32
f-simm16:     63     16

lsb0? = #f

            start  length
f-op1:         0      4
f-op2:         4      4
f-r1:          8      4
f-r2:         12      4
f-simm32:     16     32
f-simm16:     48     16

Note: This simpler definition doesn't work in all cases. Where it doesn't the full-blown definition must be used.
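
With the simpler start,length form, the four 4 bit fields of the lsb0? = #t table could be written as in the sketch below. `dnf' ("define normal ifield") is the shorthand used by the M32R description file and takes name, comment, attributes, start and length; the comment strings are made up for this example.

```scheme
; Sketch only: entries for the lsb0? = #t example above.
; dnf arguments: name comment attrs start length
(dnf f-op1 "op1" () 15 4)
(dnf f-op2 "op2" () 11 4)
(dnf f-r1  "r1"  ()  7 4)
(dnf f-r2  "r2"  ()  3 4)
```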

There are currently no shorthand macros for specifying the full-blown definition. If you have to use it, consider writing a macro to reduce typing.

(*) This doesn't include fields like multi-ifields.

5.4.9 Writing define-normal-insn-enum

Writing instruction enum entries involves analyzing the instruction set and attaching names to the opcode fields. For example, if a field named `op1' is used to select among the add, addc, sub, subc, and, or, xor, and inv instructions, one would write something like the following:

  (define-normal-insn-enum
    insn-op1 "insn format enums" () f-op1 OP1_
    (ADD ADDC SUB SUBC
     AND OR   XOR INV)
  )

These entries simplify instruction definitions by giving a name to a particular value for a particular instruction field. By convention, enum names are uppercase. This convention must be followed.

5.4.10 Writing define-operand

Operands are what instruction semantics use to refer to hardware elements. The typical use of an operand is to map instruction fields to hardware. For example, if field `f-r2' is used to specify one of the registers defined by the h-gr hardware entry, one would write:

(dnop sr "source register" () h-gr f-r2)

dnop is short for "define normal operand" (18). See section 3. CGEN's Register Transfer Language, for more information.
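
For comparison, the same operand written in the general name/value form might look like the sketch below. The field names shown are the ones believed to correspond to the arguments of dnop; check them against section 3. CGEN's Register Transfer Language before relying on them.

```scheme
; Sketch only: general-form equivalent of
;   (dnop sr "source register" () h-gr f-r2)
(define-operand
  (name sr)
  (comment "source register")
  (attrs)
  (type h-gr)                  ; hardware element the operand refers to
  (index f-r2)                 ; instruction field that selects the register
)
```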

5.4.11 Writing define-insn

This involves going through the CPU manual and writing an entry for each instruction. Instructions specific to a particular machine variant are marked with the `MACH' attribute. Example:

  (define-normal-insn
    add "add instruction"
    ((MACH mach1)) ; or (MACH mach1,mach2,...) for multiple variants
    ...
  )

The `base' machine is a predefined machine variant that includes instructions available to all variants, and is the default if no `MACH' attribute is specified.
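
A complete entry, written with the `dni' shorthand for define-normal-insn, might look like the following sketch. The format and semantics are borrowed from the M32R add instruction; the operands `dr' and `sr' and the enums `OP1_0' and `OP2_10' are assumed to have been defined earlier in the description file.

```scheme
; Sketch only: dni arguments are
;   name comment attrs syntax format semantics timing
(dni add "add instruction"
     ((MACH mach1))            ; only available on machine variant mach1
     "add $dr,$sr"             ; assembler syntax
     (+ OP1_0 OP2_10 dr sr)    ; opcode enums followed by operands
     (set dr (add dr sr))      ; semantics: dr = dr + sr
     ()                        ; no timing information
)
```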

When the `.cpu' file is processed, CGEN will analyze the semantics to determine:

CGEN will also try to simplify the semantics as much as possible:

5.4.12 Writing define-macro-insn

Some instructions are really aliases for other instructions, maybe even a sequence of them. For example, an architecture that has a general decrement-then-store instruction might have a specialized version of this instruction called push supported by the assembler. These are handled with "macro instructions". Macro instructions are used by the assembler/disassembler only. They are not used by the simulator.
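
The push example could be sketched as follows, patterned after the M32R port. It assumes that an instruction named `st-minus' (store with pre-decrement) with operands `src1' and `src2' has already been defined, and that register 15 is the stack pointer. `dnmi' is the shorthand for define-normal-macro-insn used by that port.

```scheme
; Sketch only: "push reg" is emitted as st-minus with src2 fixed to
; register 15. st-minus, src1 and src2 are assumed to exist.
(dnmi push "push"
      ()
      "push $src1"
      (emit st-minus src1 (src2 15))
)
```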

5.4.13 Using define-pmacro

When a group of entries, say instructions, share similar information, a macro (in the C preprocessor sense) can be used to simplify the description. This can be used to save a lot of typing, which also improves readability since often 1 page of code is easier to understand than 4.

Here is an example from the M32R port.

(define-pmacro (bin-op mnemonic op2-op sem-op imm-prefix imm)
  (begin
     (dni mnemonic
          (.str mnemonic " reg/reg")
          ()
          (.str mnemonic " $dr,$sr")
          (+ OP1_0 op2-op dr sr)
          (set dr (sem-op dr sr))
          ()
     )
     (dni (.sym mnemonic "3")
          (.str mnemonic " reg/" imm)
          ()
          (.str mnemonic "3 $dr,$sr," imm-prefix "$" imm)
          (+ OP1_8 op2-op dr sr imm)
          (set dr (sem-op sr imm))
          ()
     )
  )
)
(bin-op add OP2_10 add "$hash" slo16)
(bin-op and OP2_12 and ""      uimm16)
(bin-op or  OP2_14 or  "$hash" ulo16)
(bin-op xor OP2_13 xor ""      uimm16)

.sym/.str are short for Scheme's symbol-append and string-append operations and are conceptually the same as the C preprocessor's ## concatenation operator. See section 4.6 Symbol concatenation, and section 4.7 String concatenation, for details.
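
To see what the concatenations produce for the first bin-op invocation, the sketch below imitates .str and .sym in plain Scheme. The procedures `pm-str' and `pm-sym' are hypothetical stand-ins written for this illustration; they are not part of CGEN.

```scheme
;; Convert a symbol to a string; leave strings alone.
(define (to-string x)
  (if (symbol? x) (symbol->string x) x))

;; Stand-in for .str: concatenate symbols and strings into a string.
(define (pm-str . args)
  (apply string-append (map to-string args)))

;; Stand-in for .sym: concatenate symbols and strings into a symbol.
(define (pm-sym . args)
  (string->symbol (apply pm-str args)))

;; Pieces built by (bin-op add OP2_10 add "$hash" slo16):
(pm-str 'add " reg/reg")                        ; "add reg/reg"
(pm-sym 'add "3")                               ; the symbol add3
(pm-str 'add " reg/" 'slo16)                    ; "add reg/slo16"
(pm-str 'add "3 $dr,$sr," "$hash" "$" 'slo16)   ; "add3 $dr,$sr,$hash$slo16"
```

So this one invocation defines two instructions, add and add3, each with its own comment and assembler syntax string.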

5.4.14 Interactive development

The normal way(19) of writing a CPU description file involves starting Guile and developing the `.cpu' file interactively. The basic steps are:

  1. Run guile.
  2. (load "dev.scm")
  3. Load application, e.g. (load-opc) or (load-sim)
  4. Load CPU description file, e.g. (cload #:arch "m32r")
  5. Run generators until output looks reasonable, e.g. (cgen-opc.c)

To assist in the development process and to cut down on some typing, `dev.scm' looks for `$HOME/.cgenrc' and, if present, loads it. Typical things that `.cgenrc' contains are definitions of procedures that combine steps 3 and 4 above.


(define (m32r-opc)
  (load-opc)
  (cload #:arch "m32r")
)
(define (m32r-sim)
  (load-sim)
  (cload #:arch "m32r" #:options "with-scache with-profile=fn")
)
(define (m32rbf-sim)
  (load-sim)
  (cload #:arch "m32r" #:machs "m32r" #:options "with-scache with-profile=fn")
)
(define (m32rxf-sim)
  (load-sim)
  (cload #:arch "m32r" #:machs "m32rx" #:options "with-scache with-profile=fn")
)

CPU description files are loaded into an interactive guile session with cload. The syntax is:

(cload #:arch arch
       [#:machs "mach-list"]
       [#:isas "isa-list"]
       [#:options "option-list"])

Only the #:arch argument is mandatory.

`mach-list' is a comma separated string of machines to keep.

`isa-list' is a comma separated string of isas to keep.

`option-list' is a space separated string of options for the application.
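
A few invocations, as they might be typed into the Guile session after loading an application, are sketched below. The mach names and options are taken from the M32R examples in this section; the isa name "m32r" is an assumption.

```scheme
; Sketch only: typical cload calls for the M32R.
(cload #:arch "m32r")                        ; keep every mach and isa
(cload #:arch "m32r" #:machs "m32r,m32rx")   ; comma separated mach list
(cload #:arch "m32r" #:isas "m32r")          ; isa name assumed for illustration
(cload #:arch "m32r" #:machs "m32rx"
       #:options "with-scache with-profile=fn")  ; space separated options
```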

5.5 Doing an opcodes port

The best way to begin a port is to take an existing one (preferably one that is similar to the new port) and use it as a template.

  1. Run guile.
  2. (load "dev.scm"). This loads in a set of interactive development routines.
  3. (load-opc). Load the opcodes support.
  4. Edit your `cpu/<arch>.cpu' and `cpu/<arch>.opc' files.
  5. (cload #:arch "cpu/")
  6. Run each of:
  7. Repeat steps 4, 5 and 6 until the output looks reasonable.
  8. Add dependencies to `opcodes/Makefile.am' to generate the eight opcodes files (use the M32R port as an example).
  9. Run make dep from the `opcodes' build directory.
  10. Run make all-opcodes from the top level build directory.

Note that Guile is not currently shipped with Binutils, etc. Until Guile is shipped with Binutils, etc. or a C implementation of CGEN is done, the generated files are installed in the source directory and checked into CVS.

5.6 Doing a GAS port

A GAS CGEN port is essentially no different than a normal port except that the CGEN opcode table is used, and there are extra supporting routines available in `gas/cgen.[ch]'. As always, a good way to get started is to take the M32R port as a template and go from there.

The important CGEN-specific things to keep in mind are:

5.7 Building a GAS test suite

CGEN can also build the template for test cases for all instructions. In some cases it can also generate the actual instructions. The result is then assembled, disassembled, verified, and checked into CVS. Further changes are usually done by hand as it's easier. The goal here is to save the enormous amount of initial typing that is required.

  1. cd to the CGEN build directory
  2. make gas-test

    At this point two files have been created in the CGEN build directory: `gas-allinsn.exp' and `gas-build.sh'. The `gas-build.sh' script takes one command line argument: the location of your `gas' build directory. If this argument is omitted, the script looks in `../gas'.

  3. Copy `gas-allinsn.exp' to `toplevel/gas/testsuite/gas/<arch>/allinsn.exp'.
  4. sh gas-build.sh

    At this point directory `tmpdir' contains two files: `allinsn.s' and `allinsn.d'. File `allinsn.d' usually needs a bit of massaging.

  5. Copy `tmpdir/allinsn.[sd]' to `toplevel/gas/testsuite/gas/<arch>'
  6. Run make check in the `gas' build directory and massage things until you're satisfied the files are correct.
  7. Check files into CVS.

At this point further additions/modifications are usually done by hand.

5.8 Doing a simulator port

The same basic procedure as for opcodes porting applies here.

  1. Run guile.
  2. (load "dev.scm")
  3. (load-sim)
  4. Edit your `cpu/<arch>.cpu' file.
  5. (cload #:arch "cpu/")
  6. Run each of:
  7. Repeat steps 4, 5 and 6 until the output looks reasonable.
  8. Edit your `cpu/<arch>.cpu' file.
  9. (cload #:arch "cpu/" #:machs "mach1[,mach2[,...]]")
  10. Run each of:
  11. Repeat steps 8, 9 and 10 until the output looks reasonable.

The following additional files are also needed. These live in the `sim/<arch>' directory. Administrivia files like `configure.in' and `Makefile.in' are omitted.

5.9 Building a simulator test suite

CGEN can also build the template for test cases for all instructions. In some cases it can also generate the actual instructions (20). The result is then verified and checked into CVS. Further changes are usually done by hand as it's easier. The goal here is to save the enormous amount of initial typing that is required.

  1. cd to the CGEN build directory
  2. make sim-test ISA=<arch>

    At this point two files have been created in the CGEN build directory: `sim-allinsn.exp' and `sim-build.sh'.

  3. Copy `sim-allinsn.exp' to `toplevel/sim/testsuite/sim/<arch>/allinsn.exp'.
  4. sh sim-build.sh

    At this point a new subdirectory called `tmpdir' has been created, containing one test case for each instruction. The framework has been filled in but not the actual test case. It's handy to write an "include file" containing assembler macros that simplify writing test cases. See `toplevel/sim/testsuite/sim/m32r/testutils.inc' for an example.

  5. Write `testutils.inc'.
  6. Finish each test case.
  7. Copy `tmpdir/*.cgs' to `toplevel/sim/testsuite/sim/<arch>'.
  8. Run make check in the `sim' build directory and massage things until you're satisfied the files are correct.
  9. Check files into CVS.

At this point further additions/modifications are usually done by hand.

This document was generated by Ben Elliston on January, 8 2003 using texi2html