This is the mail archive of the cgen@sources.redhat.com mailing list for the CGEN project.



exposed pipeline patch (long!)


I have big problems with this patch.

The SID stuff I don't much care about.
Each application's developers must be free to do things as they see fit,
provided they do so within the domain of that application and do not
intrude into cgen proper.

One must have compelling reasons for moving or putting
application specific stuff into the non-application specific parts of cgen
(and to be honest I don't think they exist in this case).
Before this patch goes in I think someone needs to justify the following:

Referring to APPLICATION in rtl-c.scm.  Blech.

 > -(define-fn xop (estate options mode object) object)
 > +(define-fn xop (estate options mode object) 
 > +  (let ((delayed (assoc '#:delay (estate-modifiers estate))))
 > +    (if (and delayed
 > +	     (equal? APPLICATION 'SID-SIMULATOR)
 > +	     (operand? object))

and here

 > +(define-fn delay (estate options mode num-node rtx)
 > +  (case APPLICATION
 > +    ((SID-SIMULATOR)

and the references in rtl.texi.

 > +@item (delay num expr)
 > +In older "sim" simulators, indicates that there are @samp{num} delay
 > +slots in the processing of @samp{expr}. When using this rtx in instruction
 > +semantics, CGEN will infer that the instruction has the DELAY-SLOT
 > +attribute.  
 > +
 > +In newer "sid" simulators, evaluates to the writeback queue for hardware
 > +operand @samp{expr}, at @samp{num} instruction cycles in the
 > +future. @samp{expr} @emph{must} be a hardware operand in this case. 

rtl.texi shall not mention any particular app, and especially not go
into details about implementation.
[it can certainly make general references to classes of apps,
e.g. simulators, but that's it]

rtl shall define the ISA in an application independent way.
We can't have `delay' mean one thing to one app and one thing to another app.

Can the people who want this patch to go in try to come up with
a different way to do things?  At the very least define `delay'
in an application independent manner.  If any reasonable form of
`delay' is insufficient to describe every architecture we are interested
in then clearly we need something more.  [But obviously before
creating new rtl, one should make sure it's warranted.]
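
As a point of reference, here is a minimal sketch (in C++, since that is
what the generated simulator code is written in) of one possible
application-independent reading of `delay': a write issued with delay N
is committed at the end of the cycle N cycles after the one in which it
was queued.  Every name below is hypothetical; none of it is CGEN or SID
code, and the fixed sizes are assumptions made only to keep the example
small.

```cpp
#include <array>
#include <cassert>
#include <vector>

// Hypothetical constants, chosen only for illustration.
constexpr int kMaxDelay = 2;              // largest delay any insn uses
constexpr int kPipeSize = kMaxDelay + 1;  // one slot per in-flight cycle
constexpr int kNumRegs  = 4;

// One queued register write.
struct PendingWrite {
  int reg;    // register index
  int value;  // value to commit
};

// Register file whose writes may be deferred by a number of cycles.
struct DelayedRegFile {
  std::array<int, kNumRegs> regs{};                        // committed state
  std::array<std::vector<PendingWrite>, kPipeSize> slots;  // ring of queues
  int tick = 0;                                            // current cycle

  // Queue a write to be committed at the end of cycle (tick + delay).
  void set_delayed(int delay, int reg, int value) {
    assert(delay >= 0 && delay <= kMaxDelay);
    assert(reg >= 0 && reg < kNumRegs);
    slots[(tick + delay) % kPipeSize].push_back({reg, value});
  }

  // Commit every write due in the current cycle, in queue order,
  // then advance to the next cycle.
  void end_cycle() {
    std::vector<PendingWrite> &due = slots[tick % kPipeSize];
    for (const PendingWrite &w : due) regs[w.reg] = w.value;
    due.clear();
    ++tick;
  }
};
```

With this reading, a write queued with delay 1 is still invisible after
the first end_cycle() and becomes visible after the second.  Nothing in
the definition depends on which application consumes it.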

IMO it is not ok to commit this.

Ben Elliston writes:
 > I'm posting this patch on behalf of Graydon Hoare, who wrote this
 > exposed pipeline support last year.  It's a more generalised form of
 > the (delay ..) rtx and has been used for a couple of ports already.
 > 
 > Rather than just commit it, I thought I would post it for review.
 > Okay to commit?
 > 
 > Ben
 > 
 > 2001-06-05  graydon hoare  <graydon@redhat.com>
 > 
 >         * utils.scm (foldl): Define.
 >         (foldr): Define.
 >         (union): Define.
 >         (intersection): Simplify.
 >         * sid.scm : Set APPLICATION to SID-SIMULATOR.
 >         (-op-gen-delayed-set-maybe-trace): Define.
 >         (<operand> 'gen-set-{quiet,trace}): Delegate to
 >         op-gen-delayed-set-quiet etc. Note: this is still a little tangled
 >         up and needs cleaning.
 >         (-with-parallel?): Hardwire with-parallel to #t.
 >         (<operand> 'cxmake-get): Replace with lookahead-aware code
 >         * sid-decode.scm: Remove per-insn writeback fns.
 >         (-gen-idesc-decls): Redefine sem_fn type.
 >         * sid-cpu.scm (gen-write-stack-structure): Replace parexec stuff
 >         with write stack stuff.
 >         (cgen-write.cxx): Replace per-insn writebacks with single write
 >         stack writeback. Add write stack reset function.
 >         (-gen-scache-semantic-fn insn): Replace parexec stuff with write
 >         stack stuff.
 >         * rtl-c.scm (xop): Clone operand into delayed operand if #:delayed
 >         estate attribute set.
 >         (delay): Set #:delayed attribute to calculated delay, update
 >         maximum delay of cpu, check (delay ...) usage.
 >         * operand.scm (<operand>): Add delayed slot to <operand>.
 >         * mach.scm (<cpu>): Add max-delay slot to <cpu>.
 >         * dev.scm (load-sid): Set APPLICATION to SID-SIMULATOR.
 >         * doc/rtl.texi (Expressions): Add section on (delay ...).
 > 
 > Index: utils.scm
 > ===================================================================
 > RCS file: /cvs/src/src/cgen/utils.scm,v
 > retrieving revision 1.7
 > diff -u -p -r1.7 utils.scm
 > --- utils.scm	7 Jan 2002 08:23:59 -0000	1.7
 > +++ utils.scm	9 Jan 2003 03:22:12 -0000
 > @@ -78,6 +78,10 @@
 >  
 >  (define (spaces n) (make-string n #\space))
 >  
 > +; simple list-generators
 > +(define (seq p q) (if (> p q) '() (cons p (seq (+ p 1) q))))
 > +(define (fill x n) (if (> n 0) (cons x (fill x (- n 1))) '()))
 > +
 >  ; Write N spaces to PORT, or the current output port if elided.
 >  
 >  (define (write-spaces n . port)
 > @@ -471,6 +475,17 @@
 >    (reverse! (list-drop n (reverse l)))
 >  )
 >  
 > +;; left fold
 > +(define (foldl kons accum lis) 
 > +  (if (null? lis) accum 
 > +      (foldl kons (kons accum (car lis)) (cdr lis))))
 > +
 > +;; right fold
 > +(define (foldr kons knil lis) 
 > +  (if (null? lis) knil 
 > +      (kons (car lis) (foldr kons knil (cdr lis)))))
 > +
 > +
 >  ; APL's +\ operation on a vector of numbers.
 >  
 >  (define (plus-scan l)
 > @@ -540,12 +555,13 @@
 >  
 >  ; Return intersection of two lists.
 >  
 > -(define (intersection l1 l2)
 > -  (cond ((null? l1) l1)
 > -	((null? l2) l2)
 > -	((memq (car l1) l2) (cons (car l1) (intersection (cdr l1) l2)))
 > -	(else (intersection (cdr l1) l2)))
 > -)
 > +(define (intersection a b) 
 > +  (foldl (lambda (l e) (if (memq e a) (cons e l) l)) '() b))
 > +
 > +; Return union of two lists.
 > +
 > +(define (union a b) 
 > +  (foldl (lambda (l e) (if (memq e l) l (cons e l))) a b))
 >  
 >  ; Return a count of the number of elements of list L1 that are in list L2.
 >  ; Uses memq.
 > Index: sid.scm
 > ===================================================================
 > RCS file: /cvs/src/src/cgen/sid.scm,v
 > retrieving revision 1.7
 > diff -u -p -r1.7 sid.scm
 > --- sid.scm	7 Jan 2002 08:23:59 -0000	1.7
 > +++ sid.scm	9 Jan 2003 03:22:18 -0000
 > @@ -10,7 +10,7 @@
 >  ; [It still does but that's to be fixed.]
 >  
 >  ; Specify which application.
 > -(set! APPLICATION 'SIMULATOR)
 > +(set! APPLICATION 'SID-SIMULATOR)
 >  
 >  ; Misc. state info.
 >  
 > @@ -118,7 +118,7 @@
 >  ; While processing operand reading (or writing), parallel execution support
 >  ; needs to be turned off, so it is up to the appropriate cgen-foo.c proc to
 >  ; set-with-parallel?! appropriately.
 > -(define -with-parallel? #f)
 > +(define -with-parallel? #t)
 >  (define (with-parallel?) -with-parallel?)
 >  (define (set-with-parallel?! flag) (set! -with-parallel? flag))
 >  
 > @@ -924,43 +924,6 @@
 >  	 (rtl-c++ INT yes? nil #:rtl-cover-fns? #t)))
 >  )
 >  
 > -; For parallel write post-processing, we don't want to defer setting the pc.
 > -; ??? Not sure anymore.
 > -;(method-make!
 > -; <pc> 'gen-set-quiet
 > -; (lambda (self estate mode index selector newval)
 > -;   (-op-gen-set-quiet self estate mode index selector newval)))
 > -;(method-make!
 > -; <pc> 'gen-set-trace
 > -; (lambda (self estate mode index selector newval)
 > -;   (-op-gen-set-trace self estate mode index selector newval)))
 > -
 > -; Name of C macro to access parallel execution operand support.
 > -
 > -(define -par-operand-macro "OPRND")
 > -
 > -; Return C code to fetch an operand's value and save it away for the
 > -; semantic handler.  This is used to handle parallel execution of several
 > -; instructions where all inputs of all insns are read before any outputs are
 > -; written.
 > -; For operands, the word `read' is only used in this context.
 > -
 > -(define (op:read op sfmt)
 > -  (let ((estate (estate-make-for-normal-rtl-c++ nil nil)))
 > -    (send op 'gen-read estate sfmt -par-operand-macro))
 > -)
 > -
 > -; Return C code to write an operand's value.
 > -; This is used to handle parallel execution of several instructions where all
 > -; outputs are written to temporary spots first, and then a final
 > -; post-processing pass is run to update cpu state.
 > -; For operands, the word `write' is only used in this context.
 > -
 > -(define (op:write op sfmt)
 > -  (let ((estate (estate-make-for-normal-rtl-c++ nil nil)))
 > -    (send op 'gen-write estate sfmt -par-operand-macro))
 > -)
 > -
 >  ; Default gen-read method.
 >  ; This is used to help support targets with parallel insns.
 >  ; Either this or gen-write (but not both) is used.
 > @@ -1010,36 +973,46 @@
 >  (method-make!
 >   <operand> 'cxmake-get
 >   (lambda (self estate mode index selector)
 > -   (let ((mode (if (mode:eq? 'DFLT mode)
 > -		   (send self 'get-mode)
 > -		   mode))
 > -	 (index (if index index (op:index self)))
 > -	 (selector (if selector selector (op:selector self))))
 > -     ; If the object is marked with the RAW attribute, access the hardware
 > -     ; object directly.
 > +   (let* ((mode (if (mode:eq? 'DFLT mode)
 > +		    (send self 'get-mode)
 > +		    mode))
 > +	  (hw (op:type self))
 > +	  (index (if index index (op:index self)))
 > +	  (selector (if selector selector (op:selector self)))
 > +	  (delayval (op:delay self))
 > +	  (md (mode:c-type mode))
 > +	  (name (if 
 > +		 (eq? (obj:name hw) 'h-memory)
 > +		 (string-append md "_memory")
 > +		 (gen-c-symbol (obj:name hw))))
 > +	  (getter (op:getter self))
 > +	  (def-val (cond ((obj-has-attr? self 'RAW)
 > +			  (send hw 'cxmake-get-raw estate mode index selector))
 > +			 (getter
 > +			  (let ((args (car getter))
 > +				(expr (cadr getter)))
 > +			    (rtl-c-expr mode expr
 > +					(if (= (length args) 0) nil
 > +					    (list (list (car args) 'UINT index)))
 > +					#:rtl-cover-fns? #t
 > +					#:output-language (estate-output-language estate))))
 > +			 (else
 > +			  (send hw 'cxmake-get estate mode index selector)))))
 > +     
 >       (logit 4 "<operand> cxmake-get self=" (obj:name self) " mode=" (obj:name mode)
 >  	    " index=" (obj:name index) " selector=" selector "\n")
 > -     (cond ((obj-has-attr? self 'RAW)
 > -	    (send (op:type self) 'cxmake-get-raw estate mode index selector))
 > -	   ; If the instruction could be parallely executed with others and
 > -	   ; we're doing read pre-processing, the operand has already been
 > -	   ; fetched, we just have to grab the cached value.
 > -	   ((with-parallel-read?)
 > -	    (cx:make-with-atlist mode
 > -				 (string-append -par-operand-macro
 > -						" (" (gen-sym self) ")")
 > -				 nil)) ; FIXME: want CACHED attr if present
 > -	   ((op:getter self)
 > -	    (let ((args (car (op:getter self)))
 > -		  (expr (cadr (op:getter self))))
 > -	      (rtl-c-expr mode expr
 > -			  (if (= (length args) 0)
 > -			      nil
 > -			      (list (list (car args) 'UINT index)))
 > -			  #:rtl-cover-fns? #t
 > -			  #:output-language (estate-output-language estate))))
 > -	   (else
 > -	    (send (op:type self) 'cxmake-get estate mode index selector)))))
 > +     
 > +     (if delayval
 > +	 (if (derived-operand? self)
 > +	     (error "delayed derived operands currently unsupported: " self)
 > +	     (let ((idx (if index (string-append ", " (-gen-hw-index index estate)) "")))	   
 > +	       (cx:make mode (string-append "lookahead ("
 > +					    (number->string delayval)
 > +					    ", tick, " 
 > +					    "buf." name "_writes, " 
 > +					    (cx:c def-val) 
 > +					    idx ")"))))
 > +	 def-val)))
 >  )
 >  
 >  
 > @@ -1049,16 +1022,9 @@
 >    (send (op:type op) 'gen-set-quiet estate mode index selector newval)
 >  )
 >  
 > -(define (-op-gen-set-quiet-parallel op estate mode index selector newval)
 > -  (string-append
 > -   (if (op-save-index? op)
 > -       (string-append "    " -par-operand-macro " (" (-op-index-name op) ")"
 > -		      " = " (-gen-hw-index index estate) ";\n")
 > -       "")
 > -   "    "
 > -   -par-operand-macro " (" (gen-sym op) ")"
 > -   " = " (cx:c newval) ";\n")
 > -)
 > +(define (-op-gen-delayed-set-quiet op estate mode index selector newval)
 > +  (-op-gen-delayed-set-maybe-trace op estate mode index selector newval #f))
 > +
 >  
 >  (define (-op-gen-set-trace op estate mode index selector newval)
 >    (string-append
 > @@ -1079,12 +1045,7 @@
 >         ;else
 >         (send (op:type op) 'gen-set-quiet estate mode index selector
 >  		(cx:make-with-atlist mode "opval" (cx:atlist newval))))
 > -   (if (and (with-profile?)
 > -	    (op:cond? op))
 > -       (string-append "    written |= (1ULL << "
 > -		      (number->string (op:num op))
 > -		      ");\n")
 > -       "")
 > +   
 >  ; TRACE_RESULT_<MODE> (cpu, abuf, hwnum, opnum, value);
 >  ; For each insn record array of operand numbers [or indices into
 >  ; operand instance table].
 > @@ -1122,21 +1083,41 @@
 >     "  }\n")
 >  )
 >  
 > -(define (-op-gen-set-trace-parallel op estate mode index selector newval)
 > -  (string-append
 > -   "  {\n"
 > -   "    " (mode:c-type mode) " opval = " (cx:c newval) ";\n"
 > -   (if (op-save-index? op)
 > -       (string-append "    " -par-operand-macro " (" (-op-index-name op) ")"
 > -		      " = " (-gen-hw-index index estate) ";\n")
 > -       "")
 > -   "    " -par-operand-macro " (" (gen-sym op) ")"
 > -   " = opval;\n"
 > -   (if (op:cond? op)
 > -       (string-append "    written |= (1ULL << "
 > -		      (number->string (op:num op))
 > -		      ");\n")
 > -       "")
 > +(define (-op-gen-delayed-set-trace op estate mode index selector newval)
 > +  (-op-gen-delayed-set-maybe-trace op estate mode index selector newval #t))
 > +
 > +(define (-op-gen-delayed-set-maybe-trace op estate mode index selector newval do-trace?)
 > +  (let* ((pad "    ")
 > +	 (hw (op:type op))
 > +	 (delayval (op:delay op))
 > +	 (md (mode:c-type mode))
 > +	 (name (if 
 > +		(eq? (obj:name hw) 'h-memory)
 > +		(string-append md "_memory")
 > +		(gen-c-symbol (obj:name hw))))
 > +	 (val (cx:c newval))
 > +	 (idx (if index (-gen-hw-index index estate) ""))
 > +	 (idx-args (if (equal? idx "") "" (string-append ", " idx)))
 > +	 )
 > +    
 > +    (string-append
 > +     "  {\n"
 > +
 > +     (if delayval 
 > +
 > +	 ;; delayed write: push it to the appropriate buffer
 > +	 (string-append	    
 > +	  pad md " opval = " val ";\n"
 > +	  pad "buf." name "_writes [(tick + " (number->string delayval)
 > +	  ") % @prefix@::pipe_sz].push (@prefix@::write<" md ">(pc, opval" idx-args "));\n")
 > +
 > +	 ;; else, uh, we should never have been called!
 > +	 (error "-op-gen-delayed-set-maybe-trace called on non-delayed operand"))       
 > +     
 > +     
 > +     (if do-trace?
 > +
 > +	 (string-append
 >  ; TRACE_RESULT_<MODE> (cpu, abuf, hwnum, opnum, value);
 >  ; For each insn record array of operand numbers [or indices into
 >  ; operand instance table].
 > @@ -1169,8 +1150,8 @@
 >  	   ""))
 >     "opval << dec << \"  \";\n"
 >     "  }\n")
 > -)
 > -
 > +	 ;; else no tracing is emitted
 > +	 ""))))
 >  
 >  ; Return C code to set the value of an operand.
 >  ; NEWVAL is a <c-expr> object of the value to store.
 > @@ -1189,8 +1170,8 @@
 >  	 (selector (if selector selector (op:selector self))))
 >       (cond ((obj-has-attr? self 'RAW)
 >  	    (send (op:type self) 'gen-set-quiet-raw estate mode index selector newval))
 > -	   ((with-parallel-write?)
 > -	    (-op-gen-set-quiet-parallel self estate mode index selector newval))
 > +	   ((op:delay self)
 > +	    (-op-gen-delayed-set-quiet self estate mode index selector newval))
 >  	   (else
 >  	    (-op-gen-set-quiet self estate mode index selector newval)))))
 >  )
 > @@ -1212,26 +1193,12 @@
 >  	 (selector (if selector selector (op:selector self))))
 >       (cond ((obj-has-attr? self 'RAW)
 >  	    (send (op:type self) 'gen-set-quiet-raw estate mode index selector newval))
 > -	   ((with-parallel-write?)
 > -	    (-op-gen-set-trace-parallel self estate mode index selector newval))
 > +	   ((op:delay self)
 > +	    (-op-gen-delayed-set-trace self estate mode index selector newval))
 >  	   (else
 >  	    (-op-gen-set-trace self estate mode index selector newval)))))
 >  )
 >  
 > -; Define and undefine C macros to tuck away details of instruction format used
 > -; in the parallel execution functions.  See gen-define-field-macro for a
 > -; similar thing done for extraction/semantic functions.
 > -
 > -(define (gen-define-parallel-operand-macro sfmt)
 > -  (string-append "#define " -par-operand-macro "(f) "
 > -		 "par_exec->operands."
 > -		 (gen-sym sfmt)
 > -		 ".f\n")
 > -)
 > -
 > -(define (gen-undef-parallel-operand-macro sfmt)
 > -  (string-append "#undef " -par-operand-macro "\n")
 > -)
 >  
 >  ; Operand profiling and parallel execution support.
 >  
 > Index: sid-decode.scm
 > ===================================================================
 > RCS file: /cvs/src/src/cgen/sid-decode.scm,v
 > retrieving revision 1.8
 > diff -u -p -r1.8 sid-decode.scm
 > --- sid-decode.scm	7 Feb 2002 18:46:19 -0000	1.8
 > +++ sid-decode.scm	9 Jan 2003 03:22:18 -0000
 > @@ -47,10 +47,7 @@ bool @prefix@_idesc::idesc_table_initial
 >  	       (if pbb?
 >  		   "0, "
 >  		   (string-append (-gen-sem-fn-name insn) ", "))
 > -	       "")
 > -           (if (with-parallel?)
 > -               (string-append (-gen-write-fn-name sfmt) ", ")
 > -               "")
 > +	       "") 
 >  	   "\"" (string-upcase name) "\", "
 >  	   (gen-cpu-insn-enum (current-cpu) insn)
 >  	   ", "
 > @@ -131,25 +128,6 @@ bool @prefix@_idesc::idesc_table_initial
 >  )
 >  
 >  
 > -;; and the same for writeback functions
 > -
 > -(define (-gen-write-fn-name sfmt)
 > -  (string-append "@prefix@_write_" (gen-sym sfmt))
 > -)
 > -
 > -
 > -(define (-gen-write-fn-decls)
 > -  (string-write
 > -   "// Decls of each writeback fn.\n\n"
 > -   "using @cpu@::@prefix@_write_fn;\n"
 > -   (string-list-map (lambda (sfmt)
 > -		      (string-list "extern @prefix@_write_fn "
 > -				   (-gen-write-fn-name sfmt)
 > -				   ";\n"))
 > -		    (current-sfmt-list))
 > -   "\n"
 > -   )
 > -)
 >  
 >  
 >  ; idesc, argbuf, and scache types
 > @@ -164,14 +142,9 @@ struct @cpu@_cpu;
 >  struct @prefix@_scache;
 >  "
 >     (if (with-parallel?)
 > -       "struct @prefix@_parexec;\n" "")
 > -   (if (with-parallel?)
 > -       "typedef void (@prefix@_sem_fn) (@cpu@_cpu* cpu, @prefix@_scache* sem, @prefix@_parexec* par_exec);"
 > +       "typedef void (@prefix@_sem_fn) (@cpu@_cpu* cpu, @prefix@_scache* sem, int tick, @prefix@::write_stacks &buf);"
 >         "typedef sem_status (@prefix@_sem_fn) (@cpu@_cpu* cpu, @prefix@_scache* sem);")
 >     "\n"
 > -   (if (with-parallel?)
 > -       "typedef sem_status (@prefix@_write_fn) (@cpu@_cpu* cpu, @prefix@_scache* sem, @prefix@_parexec* par_exec);"
 > -       "")
 >     "\n"   
 >  "
 >  // Instruction descriptor.
 > @@ -192,12 +165,6 @@ struct @prefix@_idesc {
 >    @prefix@_sem_fn* execute;\n\n"
 >         "")
 >  
 > -   (if (with-parallel?)
 > -       "\
 > -  // scache write executor for this insn
 > -  @prefix@_write_fn* writeback;\n\n"
 > -       "")
 > -
 >     "\
 >    const char* insn_name;
 >    enum @prefix@_insn_type sem_index;
 > @@ -300,15 +267,6 @@ struct @prefix@_scache {
 >    // argument buffer
 >    @prefix@_sem_fields fields;
 >  
 > -" (if (or (with-profile?) (with-parallel-write?))
 > -      (string-append "
 > -  // writeback flags
 > -  // Only used if profiling or parallel execution support enabled during
 > -  // file generation.
 > -  unsigned long long written;
 > -")
 > -      "") "
 > -
 >    // decode given instruction
 >    void decode (@cpu@_cpu* current_cpu, PCADDR pc, @prefix@_insn_word base_insn, @prefix@_insn_word entire_insn);
 >  };
 > @@ -718,6 +676,11 @@ void
 >  #ifndef @PREFIX@_DECODE_H
 >  #define @PREFIX@_DECODE_H
 >  
 > +namespace @prefix@ {
 > +// forward declaration of struct in -defs.h
 > +struct write_stacks;
 > +}
 > +
 >  namespace @cpu@ {
 >  
 >  using namespace cgen;
 > @@ -739,10 +702,6 @@ typedef UINT @prefix@_insn_word;
 >     ; There's no pressing need for it though.
 >     (if (with-scache?)
 >         -gen-sem-fn-decls
 > -       "")
 > -
 > -   (if (with-parallel?)
 > -       -gen-write-fn-decls
 >         "")
 >  
 >     "\
 > Index: sid-cpu.scm
 > ===================================================================
 > RCS file: /cvs/src/src/cgen/sid-cpu.scm,v
 > retrieving revision 1.7
 > diff -u -p -r1.7 sid-cpu.scm
 > --- sid-cpu.scm	7 Feb 2002 18:46:19 -0000	1.7
 > +++ sid-cpu.scm	9 Jan 2003 03:22:23 -0000
 > @@ -199,6 +199,34 @@ namespace @arch@ {
 >     (-gen-hardware-struct #f (find hw-need-storage? (current-hw-list))))
 >  )
 >  
 > +(define (-gen-hw-stream-and-destream-fns) 
 > +  (let* ((sa string-append)
 > +	 (regs (find hw-need-storage? (current-hw-list)))
 > +	 (reg-dim (lambda (r) 
 > +		    (let ((dims (-hw-vector-dims r)))
 > +		      (if (equal? 0 (length dims)) 
 > +			  "0"
 > +			  (number->string (car dims))))))
 > +	 (stream-reg (lambda (r) 
 > +		       (let ((rname (sa "hardware." (gen-c-symbol (obj:name r)))))
 > +			 (if (hw-scalar? r)
 > +			     (sa "    ost << " rname " << ' ';\n")
 > +			     (sa "    for (int i = 0; i < " (reg-dim r) 
 > +				 "; i++)\n      ost << " rname "[i] << ' ';\n")))))
 > +	 (destream-reg (lambda (r) 
 > +			 (let ((rname (sa "hardware." (gen-c-symbol (obj:name r)))))
 > +			   (if (hw-scalar? r)
 > +			       (sa "    ist >> " rname ";\n")
 > +			       (sa "    for (int i = 0; i < " (reg-dim r) 
 > +				   "; i++)\n      ist >> " rname "[i];\n"))))))
 > +    (sa
 > +     "  void stream_cgen_hardware (std::ostream &ost) const \n  {\n"
 > +     (string-map stream-reg regs)
 > +     "  }\n"
 > +     "  void destream_cgen_hardware (std::istream &ist) \n  {\n"
 > +     (string-map destream-reg regs)
 > +     "  }\n")))
 > +
 >  ; Generate <cpu>-cpu.h
 >  
 >  (define (cgen-cpu.h)
 > @@ -222,6 +250,8 @@ public:
 >  
 >     -gen-hardware-types
 >  
 > +   -gen-hw-stream-and-destream-fns
 > +
 >     "  // C++ register access function templates\n"
 >     "#define current_cpu this\n\n"
 >     (lambda ()
 > @@ -295,68 +325,161 @@ typedef struct {
 >     )
 >  )
 >  
 > -; Utility of gen-parallel-exec-type to generate the definition of one
 > -; structure in PAREXEC.
 > -; SFMT is an <sformat> object.
 >  
 > -(define (gen-parallel-exec-elm sfmt)
 > -  (string-append
 > -   "    struct { /* " (obj:comment sfmt) " */\n"
 > -   (let ((sem-ops
 > -	  ((if (with-parallel-write?) sfmt-out-ops sfmt-in-ops) sfmt)))
 > -     (if (null? sem-ops)
 > -	 "      int empty;\n"
 > -	 (string-map
 > -	  (lambda (op)
 > -	    (logit 2 "Processing operand " (obj:name op) " of format "
 > -		   (obj:name sfmt) " ...\n")
 > -	      (if (with-parallel-write?)
 > -		  (let ((index-type (and (op-save-index? op)
 > -					 (gen-index-type op sfmt))))
 > -		    (string-append "      " (gen-type op)
 > -				   " " (gen-sym op) ";\n"
 > -				   (if index-type
 > -				       (string-append "      " index-type 
 > -						      " " (gen-sym op) "_idx;\n")
 > -				       "")))
 > -		  (string-append "      "
 > -				 (gen-type op)
 > -				 " "
 > -				 (gen-sym op)
 > -				 ";\n")))
 > -	  sem-ops)))
 > -   "    } " (gen-sym sfmt) ";\n"
 > -   )
 > -)
 > +
 > +
 > +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
 > +;;; begin stack-based write schedule
 > +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
 > +
 > +(define useful-mode-names '(BI QI HI SI DI UQI UHI USI UDI SF DF))
 > +
 > +;(define (-calculated-memory-write-buffer-size)
 > +;  (let* ((is-mem? (lambda (op) (eq? (hw-sem-name (op:type op)) 'h-memory)))
 > +;	 (count-mem-writes
 > +;	  (lambda (sfmt) (length (find is-mem? (sfmt-out-ops sfmt))))))
 > +;    (apply max (append '(0) (map count-mem-writes (current-sfmt-list))))))
 > +
 > +
 > +;; note: this doesn't really correctly approximate the worst case. user-supplied functions
 > +;; might rewrite the pipeline extensively while it's running. 
 > +;(define (-worst-case-number-of-writes-to hw-name)
 > +;  (let* ((sfmts (current-sfmt-list))
 > +;	 (out-ops (map sfmt-out-ops sfmts))
 > +;	 (pred (lambda (op) (equal? hw-name (gen-c-symbol (obj:name (op:type op))))))
 > +;	 (filtered-ops (map (lambda (ops) (find pred ops)) out-ops)))
 > +;    (apply max (cons 0 (map (lambda (ops) (length ops)) filtered-ops)))))
 > +	 
 > +(define (-hw-gen-write-stack-decl nm mode)
 > +  (let* (
 > +; for the time being, we're disabling this size-estimation stuff and just
 > +; requiring the user to supply a parameter WRITE_BUF_SZ before they include -defs.h
 > +;	 (pipe-sz (+ 1 (max-delay (cpu-max-delay (current-cpu)))))
 > +;	 (sz (* pipe-sz (-worst-case-number-of-writes-to nm))))
 > +	 
 > +	 (mode-pad (spaces (- 4 (string-length mode))))
 > +	 (stack-name (string-append nm "_writes")))
 > +    (string-append
 > +     "  write_stack< write<" mode "> >" mode-pad "\t" stack-name "\t[pipe_sz];\n")))
 > +
 > +
 > +(define (-hw-gen-write-struct-decl)
 > +  (let* ((dims (-worst-case-index-dims))
 > +	 (sa string-append)
 > +	 (ns number->string)
 > +	 (idxs (seq 0 (- dims 1)))
 > +	 (ctor (sa "write (PCADDR _pc, MODE _val"
 > +		   (string-map (lambda (x) (sa ", USI _idx" (ns x) "=0")) idxs)
 > +		   ") : pc(_pc), val(_val)"
 > +		   (string-map (lambda (x) (sa ", idx" (ns x) "(_idx" (ns x) ")")) idxs)
 > +		   " {} \n"))
 > +	 (idx-fields (string-map (lambda (x) (sa "    USI idx" (ns x) ";\n")) idxs)))
 > +    (sa
 > +     "\n\n"
 > +     "  template <typename MODE>\n"
 > +     "  struct write\n"
 > +     "  {\n"
 > +     "    USI pc;\n"
 > +     "    MODE val;\n"
 > +     idx-fields
 > +     "    " ctor 
 > +     "    write() {}\n"
 > +     "  };\n" )))
 > +	       
 > +(define (-hw-vector-dims hw) (elm-get (hw-type hw) 'dimensions))			    
 > +(define (-worst-case-index-dims)
 > +  (apply max
 > +	 (append '(1) ; for memory accesses
 > +		 (map (lambda (hw) (length (-hw-vector-dims hw))) 
 > +		      (find (lambda (hw) (not (scalar? hw))) (current-hw-list))))))
 > +
 > +(define (-gen-writestacks)
 > +  (let* ((hw (find register? (current-hw-list)))
 > +	 (modes useful-mode-names) 
 > +	 (hw-pairs (map (lambda (h) (list (gen-c-symbol (obj:name h))
 > +					    (obj:name (hw-mode h)))) 
 > +			hw))
 > +	 (mem-pairs (map (lambda (m) (list (string-append m "_memory") m)) 
 > +			 modes))
 > +	 (all-pairs (append mem-pairs hw-pairs))
 > +
 > +	 (h1 "\n\n// write stacks used in parallel execution\n\n  struct write_stacks\n  {\n  // types of stacks\n\n")
 > +	 (wb (string-append
 > +	      "\n\n  // unified writeback function (defined in @prefix@-write.cc)"
 > +	        "\n  void writeback (int tick, @cpu@::@cpu@_cpu* current_cpu);"
 > +		"\n  // unified write-stack clearing function (defined in @prefix@-write.cc)"
 > +	        "\n  void reset ();"))
 > +	 (zz "\n\n  }; // end struct @prefix@::write_stacks \n\n")
 > +	 (st (string-append 
 > +	      "  std::ostream &operator<< (std::ostream &ost, const @prefix@::write_stacks &s);\n"
 > +	      "  std::istream &operator>> (std::istream &ist, @prefix@::write_stacks &s);\n"))
 > +	 )
 > +    (string-append	
 > +     (-hw-gen-write-struct-decl)
 > +     (foldl (lambda (s pair) (string-append s (apply -hw-gen-write-stack-decl pair))) h1 all-pairs)	  
 > +     wb
 > +     zz
 > +     st)))
 > +
 > +
 > +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
 > +;;; end stack-based write schedule
 > +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
 > +	  
 >  
 >  ; Generate the definition of the structure that holds register values, etc.
 > -; for use during parallel execution.  When instructions are executed parallelly
 > -; either
 > -; - their inputs are read before their outputs are written.  Thus we have to
 > -; fetch the input values of several instructions before executing any of them.
 > -; - or their outputs are queued here first and then written out after all insns
 > -; have executed.
 > -; The fetched/queued values are stored in an array of PAREXEC structs, one
 > -; element per instruction.
 > +; for use during parallel execution.  
 >  
 > -(define (gen-parallel-exec-type)
 > -  (logit 2 "Generating PAREXEC type ...\n")
 > -  (string-append
 > -   (if (with-parallel-write?)
 > -       "/* Queued output values of an instruction.  */\n"
 > -       "/* Fetched input values of an instruction.  */\n")
 > -   "\
 > +(define (gen-write-stack-structure)
 > +  (let (;(membuf-sz (-calculated-memory-write-buffer-size))
 > +	(max-delay (cpu-max-delay (current-cpu))))
 > +    (logit 2 "Generating write stack structure ...\n")
 > +    (string-append
 > +     "  static const int max_delay = "   
 > +     (number->string max-delay) ";\n"
 > +     "  static const int pipe_sz = "     
 > +     (number->string (+ 1 max-delay)) "; // max_delay + 1\n"
 >  
 > -struct @prefix@_parexec {
 > -  union {\n"
 > -   (string-map gen-parallel-exec-elm (current-sfmt-list))
 > -   "\
 > -  } operands;
 > -  /* For conditionally written operands, bitmask of which ones were.  */
 > -  unsigned long long written;
 > -};\n\n"
 > -   )
 > -)
 > +"
 > +#ifndef WRITE_BUF_SZ
 > +#define WRITE_BUF_SZ 1
 > +#endif
 > +
 > +  template <typename ELT> 
 > +  struct write_stack 
 > +  {
 > +    int t;
 > +    const int sz;
 > +    ELT buf[WRITE_BUF_SZ];
 > +
 > +    write_stack       ()             : t(-1), sz(WRITE_BUF_SZ) {}
 > +    inline bool empty ()             { return (t == -1); }
 > +    inline void clear ()             { t = -1; }
 > +    inline void pop   ()             { assert (t > -1); t--;}
 > +    inline void push  (const ELT &e) { assert (t+1 < sz); buf [++t] = e;}
 > +    inline ELT &top   ()             { return buf [t>0 ? ( t<sz ? t : sz-1) : 0];}
 > +  };
 > +
 > +  // look ahead for latest write with index = idx, where time of write is
 > +  // <= dist steps from base (present) in write_stack array st.
 > +  // returning def if no scheduled write is found.
 > +
 > +  template <typename STKS, typename VAL>
 > +  inline VAL lookahead (int dist, int base, STKS &st, VAL def, int idx=0)
 > +  {
 > +    for (; dist > 0; --dist)
 > +    {
 > +      write_stack <VAL> &v = st [(base + dist) % pipe_sz];
 > +      for (int i = v.t; i > 0; --i) 
 > +	  if (v.buf [i].idx0 == idx) return v.buf [i];
 > +    }
 > +    return def;
 > +  }
 > +
 > +"
 > + 
 > +     (-gen-writestacks)     
 > +     )))
 >  
 >  ; Generate the TRACE_RECORD struct definition.
 >  
 > @@ -375,16 +498,26 @@ typedef struct @prefix@_trace_record {
 >  
 >  ; Generate <cpu>-defs.h
 >  
 > +(define semantics-processed? #f)
 > +
 >  (define (cgen-defs.h)
 >    (logit 1 "Generating " (gen-cpu-name) " defs.h ...\n")
 >    (assert-keep-one)
 > -
 > +  
 >    ; Turn parallel execution support on if cpu needs it.
 >    (set-with-parallel?! (state-parallel-exec?))
 >  
 >    ; Initialize rtl->c generation.
 >    (rtl-c-config! #:rtl-cover-fns? #t)
 >  
 > +  (sim-analyze-insns!)
 > +
 > +  ; ensure semantic analysis has happened, in time
 > +  ; for the pipeline size to be calculated
 > +  (if (and (with-parallel?)
 > +	   (not semantics-processed?))
 > +      (error "defs.h must be generated after sem.cxx for parallel-execution type CPUs"))
 > +
 >    (string-write
 >     (gen-copyright "CPU family header for @cpu@ / @prefix@."
 >  		  copyright-red-hat package-red-hat-simulators)
 > @@ -392,15 +525,26 @@ typedef struct @prefix@_trace_record {
 >  #ifndef DEFS_@PREFIX@_H
 >  #define DEFS_@PREFIX@_H
 >  
 > +#include <stack>
 > +#include \"cgen-types.h\"
 > +
 > +// forward declaration\n\n  
 >  namespace @cpu@ {
 > +struct @cpu@_cpu;
 > +}
 > +
 > +namespace @prefix@ {
 > +
 > +using namespace cgen;
 > +
 >  \n"
 >  
 >     (if (with-parallel?)
 > -       gen-parallel-exec-type
 > -       "")
 > +       gen-write-stack-structure
 > +       "// no parallel-execution support\n")
 >  
 >     "\
 > -} // end @cpu@ namespace
 > +} // end @prefix@ namespace
 >  
 >  #endif /* DEFS_@PREFIX@_H */\n"
 >     )
 > @@ -417,47 +561,132 @@ namespace @cpu@ {
 >  ; Return C code to fetch and save all output operands to instructions with
 >  ; <sformat> SFMT.
 >  
 > -(define (-gen-write-args sfmt)
 > -  (string-map (lambda (op) (op:write op sfmt))
 > -	      (sfmt-out-ops sfmt))
 > -)
 > +; Generate <cpu>-write.cxx.
 >  
 > -; Utility of gen-write-fns to generate a writer function for <sformat> SFMT.
 >  
 > -(define (-gen-write-fn sfmt)
 > -  (logit 2 "Processing write function for \"" (obj:name sfmt) "\" ...\n")
 > -  (string-list
 > -   "\nsem_status\n"
 > -   (-gen-write-fn-name sfmt) " (@cpu@_cpu* current_cpu, @prefix@_scache* sem, @prefix@_parexec* par_exec)\n"
 > -   "{\n"
 > -   (if (with-scache?)
 > -       (gen-define-field-macro sfmt)
 > -       "")
 > -   (gen-define-parallel-operand-macro sfmt)
 > -   "  @prefix@_scache* abuf = sem;\n"
 > -   "  unsigned long long written = abuf->written;\n"
 > -   "  PCADDR pc = abuf->addr;\n"
 > -   "  PCADDR npc = 0; // dummy value for branches\n"
 > -   "  sem_status status = SEM_STATUS_NORMAL; // ditto\n"
 > -   "\n"
 > -   (-gen-write-args sfmt)
 > -   "\n"
 > -   "  return status;\n"
 > -   (gen-undef-parallel-operand-macro sfmt)
 > -   (if (with-scache?)
 > -       (gen-undef-field-macro sfmt)
 > -       "")
 > -   "}\n\n")
 > -)
 > +(define (-gen-register-writer nm mode dims)
 > +  (let* ((pad "    ")
 > +	 (sa string-append)
 > +	 (idx-args (string-map (lambda (x) (sa "w.idx" (number->string x) ", ")) 
 > +			       (seq 0 (- dims 1)))))
 > +    (sa pad "while (! " nm "_writes[tick].empty())\n"
 > +	pad "{\n"
 > +	pad "  write<" mode "> &w = " nm "_writes[tick].top();\n"
 > +	pad "  current_cpu->" nm "_set(" idx-args "w.val);\n"
 > +	pad "  " nm "_writes[tick].pop();\n"
 > +	pad "}\n\n")))
 > +
 > +(define (-gen-memory-writer nm mode dims)
 > +  (let* ((pad "    ")
 > +	 (sa string-append)
 > +	 (idx-args (string-map (lambda (x) (sa ", w.idx" (number->string x) "")) 
 > +			       (seq 0 (- dims 1)))))
 > +    (sa pad "while (! " nm "_writes[tick].empty())\n"
 > +	pad "{\n"
 > +	pad "  write<" mode "> &w = " nm "_writes[tick].top();\n"
 > +	pad "  current_cpu->SETMEM" mode " (w.pc" idx-args ", w.val);\n"
 > +	pad "  " nm "_writes[tick].pop();\n"
 > +	pad "}\n\n")))
 > +
 > +
 > +(define (-gen-reset-fn)
 > +  (let* ((sa string-append)
 > +	 (objs (append (map (lambda (h) (gen-c-symbol (obj:name h))) 
 > +			    (find register? (current-hw-list)))
 > +		       (map (lambda (m) (sa m "_memory")) useful-mode-names)))
 > +	 (clr (lambda (elt) (sa "    clear_stacks (" elt "_writes);\n"))))
 > +    (sa 
 > +     "  template <typename ST> \n"
 > +     "  static void clear_stacks (ST &st)\n"
 > +     "  {\n"
 > +     "    for (int i = 0; i < @prefix@::pipe_sz; i++)\n"
 > +     "      st[i].clear();\n"
 > +     "  }\n\n"
 > +     "  void @prefix@::write_stacks::reset ()\n  {\n"
 > +     (string-map clr objs)
 > +     "  }")))
 > +
 > +(define (-gen-unified-write-fn) 
 > +  (let* ((hw (find register? (current-hw-list)))
 > +	 (modes useful-mode-names)	
 > +	 (hw-triples (map (lambda (h) (list (gen-c-symbol (obj:name h))
 > +					    (obj:name (hw-mode h))
 > +					    (length (-hw-vector-dims h)))) 
 > +			hw))
 > +	 (mem-triples (map (lambda (m) (list (string-append m "_memory") m 1)) 
 > +			 modes)))
 >  
 > -(define (-gen-write-fns)
 > -  (logit 2 "Processing writer functions ...\n")
 > -  (string-write-map (lambda (sfmt) (-gen-write-fn sfmt))
 > -		    (current-sfmt-list))
 > -)
 > +    (logit 2 "Generating writer function ...\n") 
 > +    (string-append
 > +     "
 > +
 > +  void @prefix@::write_stacks::writeback (int tick, @cpu@::@cpu@_cpu* current_cpu) 
 > +  {
 > +"
 > +     "\n    // register writeback loops\n"
 > +     (string-map (lambda (t) (apply -gen-register-writer t)) hw-triples)
 > +     "\n    // memory writeback loops\n"
 > +     (string-map (lambda (t) (apply -gen-memory-writer t)) mem-triples)
 > +"
 > +  }
 > +")))
 >  
 >  
 > -; Generate <cpu>-write.cxx.
 > +(define (-gen-stacks-stream-and-destream-fns) 
 > +  (let* ((sa string-append)
 > +	 (regs (find hw-need-storage? (current-hw-list)))
 > +	 (reg-dim (lambda (r) 
 > +		    (let ((dims (-hw-vector-dims r)))
 > +		      (if (equal? 0 (length dims)) 
 > +			  "0"
 > +			  (number->string (car dims))))))
 > +	 (write-stacks 
 > +	  (map (lambda (n) (sa n "_writes"))
 > +	       (append (map (lambda (r) (gen-c-symbol (obj:name r))) regs)
 > +		       (map (lambda (m) (sa m "_memory")) useful-mode-names))))
 > +	 (stream-stacks (lambda (s) (sa "    stream_stacks ( s." s ", ost);\n")))
 > +	 (destream-stacks (lambda (s) (sa "    destream_stacks ( s." s ", ist);\n")))
 > +	 (stack-boilerplate
 > +	  (sa
 > +	   "  template <typename ST> \n"
 > +	   "  void stream_stacks (const ST &st, std::ostream &ost)\n"
 > +	   "  {\n"
 > +	   "    for (int i = 0; i < @prefix@::pipe_sz; i++)\n"
 > +	   "    {\n"
 > +	   "      ost << st[i].t << ' ';\n"
 > +	   "      for (int j = 0; j <= st[i].t; j++)\n"
 > +	   "      {\n"
 > +	   "        ost << st[i].buf[j].pc << ' ';\n"
 > +	   "        ost << st[i].buf[j].val << ' ';\n"
 > +	   "        ost << st[i].buf[j].idx0 << ' ';\n"
 > +	   "      }\n"
 > +	   "    }\n"
 > +	   "  }\n"
 > +	   "  \n"
 > +	   "  template <typename ST> \n"
 > +	   "  void destream_stacks (ST &st, std::istream &ist)\n"
 > +	   "  {\n"
 > +	   "    for (int i = 0; i < @prefix@::pipe_sz; i++)\n"
 > +	   "    {\n"
 > +	   "      ist >> st[i].t;\n"
 > +	   "      for (int j = 0; j <= st[i].t; j++)\n"
 > +	   "      {\n"
 > +	   "        ist >> st[i].buf[j].pc;\n"
 > +	   "        ist >> st[i].buf[j].val;\n"
 > +	   "        ist >> st[i].buf[j].idx0;\n"
 > +	   "      }\n"
 > +	   "    }\n"
 > +	   "  }\n"
 > +	   "  \n")))
 > +    (sa stack-boilerplate
 > +	"  std::ostream & @prefix@::operator<< (std::ostream &ost, const @prefix@::write_stacks &s)\n   {\n"
 > +	(string-map stream-stacks write-stacks)
 > +	"\n    return ost;\n"
 > +	"  }\n"
 > +	"  std::istream & @prefix@::operator>> (std::istream &ist, @prefix@::write_stacks &s)\n   {\n"
 > +	(string-map destream-stacks write-stacks)
 > +	"\n    return ist;\n"
 > +	"  }\n")))
 >  
 >  (define (cgen-write.cxx)
 >    (logit 1 "Generating " (gen-cpu-name) " write.cxx ...\n")
 > @@ -465,8 +694,8 @@ namespace @cpu@ {
 >  
 >    (sim-analyze-insns!)
 >  
 > -  ; Turn parallel execution support off.
 > -  (set-with-parallel?! #f)
 > +  ; Turn parallel execution support on if needed.
 > +  (set-with-parallel?! (state-parallel-exec?))
 >  
 >    ; Tell the rtx->c translator we are the simulator.
 >    (rtl-c-config! #:rtl-cover-fns? #t)
 > @@ -478,12 +707,18 @@ namespace @cpu@ {
 >     "\
 >  
 >  #include \"@cpu@.h\"
 > -using namespace @cpu@;
 > -
 > +#include <iostream>
 >  "
 > -   -gen-write-fns
 > +   (if (with-parallel?) 
 > +       (string-append
 > +	 (-gen-reset-fn)
 > +	 (-gen-unified-write-fn)
 > +	 (-gen-stacks-stream-and-destream-fns))
 > +
 > +       "// no write-stack functions required\n")
 >     )
 >  )
 > +
 >  
 >  ; ******************
 >  ; cgen-semantics.cxx
 > @@ -521,19 +756,14 @@ using namespace @cpu@;
 >  	 "sem_status\n")
 >       "@prefix@_sem_" (gen-sym insn)
 >       (if (with-parallel?)
 > -	 " (@cpu@_cpu* current_cpu, @prefix@_scache* sem, @prefix@_parexec* par_exec)\n"
 > +	 (string-append " (@cpu@_cpu* current_cpu, @prefix@_scache* sem, const int tick, \n\t"
 > +			"@prefix@::write_stacks &buf)\n")
 >  	 " (@cpu@_cpu* current_cpu, @prefix@_scache* sem)\n")
 >       "{\n"
 >       (gen-define-field-macro (insn-sfmt insn))
 > -     (if (with-parallel?)
 > -	 (gen-define-parallel-operand-macro (insn-sfmt insn))
 > -	 "")
 >       "  sem_status status = SEM_STATUS_NORMAL;\n"
 >       "  @prefix@_scache* abuf = sem;\n"
 > -     ; Unconditionally written operands are not recorded here.
 > -     (if (or (with-profile?) (with-parallel-write?))
 > -	 "  unsigned long long written = 0;\n"
 > -	 "")
 > +
 >       ; The address of this insn, needed by extraction and semantic code.
 >       ; Note that the address recorded in the cpu state struct is not used.
 >       ; For faster engines that copy will be out of date.
 > @@ -542,23 +772,12 @@ using namespace @cpu@;
 >       "\n"
 >       (gen-semantic-code insn)
 >       "\n"
 > -     ; Only update what's been written if some are conditionally written.
 > -     ; Otherwise we know they're all written so there's no point in
 > -     ; keeping track.
 > -     (if (or (with-profile?) (with-parallel-write?))
 > -	 (if (-any-cond-written? (insn-sfmt insn))
 > -	     "  abuf->written = written;\n"
 > -	     "")
 > -	 "")
 >       (if cti?
 >  	 "  current_cpu->done_cti_insn (npc, status);\n"
 >  	 "  current_cpu->done_insn (npc, status);\n")
 >       (if (with-parallel?)
 >  	 ""
 >  	 "  return status;\n")
 > -     (if (with-parallel?)
 > -	 (gen-undef-parallel-operand-macro (insn-sfmt insn))
 > -	 "")
 >       (gen-undef-field-macro (insn-sfmt insn))
 >       "}\n\n"
 >       ))
 > @@ -576,13 +795,14 @@ using namespace @cpu@;
 >  ; Each instruction is implemented in its own function.
 >  
 >  (define (cgen-semantics.cxx)
 > -  (logit 1 "Generating " (gen-cpu-name) " semantics.cxx ...\n")
 > +  (logit 1 "Generating " (gen-cpu-name) " semantics.cxx ")
 >    (assert-keep-one)
 >  
 >    (sim-analyze-insns!)
 >  
 >    ; Turn parallel execution support on if cpu needs it.
 >    (set-with-parallel?! (state-parallel-exec?))
 > +  (logit 1 (if (state-parallel-exec?) " (parallel) ...\n" "...\n"))
 >  
 >    ; Tell the rtx->c translator we are the simulator.
 >    (rtl-c-config! #:rtl-cover-fns? #t)
 > @@ -590,6 +810,8 @@ using namespace @cpu@;
 >    ; Indicate we're currently not generating a pbb engine.
 >    (set-current-pbb-engine?! #f)
 >  
 > +  (set! semantics-processed? #t)
 > +
 >    (string-write
 >     (gen-copyright "Simulator instruction semantics for @prefix@."
 >  		  copyright-red-hat package-red-hat-simulators)
 > @@ -598,6 +820,7 @@ using namespace @cpu@;
 >  #include \"@cpu@.h\"
 >  
 >  using namespace @cpu@; // FIXME: namespace organization still wip
 > +using namespace @prefix@; // FIXME: namespace organization still wip
 >  
 >  #define GET_ATTR(name) GET_ATTR_##name ()
 >  
 > @@ -655,9 +878,6 @@ using namespace @cpu@; // FIXME: namespa
 >       (if (with-scache?)
 >  	 (gen-define-field-macro (insn-sfmt insn))
 >  	 "")
 > -     (if parallel?
 > -	 (gen-define-parallel-operand-macro (insn-sfmt insn))
 > -	 "")
 >       ; Unconditionally written operands are not recorded here.
 >       (if (or (with-profile?) (with-parallel-write?))
 >  	 "      unsigned long long written = 0;\n"
 > @@ -694,9 +914,6 @@ using namespace @cpu@; // FIXME: namespa
 >  	 (string-append "      pbb_br_npc = npc;\n"
 >  			"      pbb_br_status = br_status;\n")
 >  	 "")
 > -     (if parallel?
 > -	 (gen-undef-parallel-operand-macro (insn-sfmt insn))
 > -	 "")
 >       (if (with-scache?)
 >  	 (gen-undef-field-macro (insn-sfmt insn))
 >  	 "")
 > @@ -950,9 +1167,6 @@ struct @prefix@_pbb_label {
 >  			"      vpc = vpc + 1;\n")
 >  	 "")
 >       (gen-define-field-macro (sfrag-sfmt frag))
 > -     (if parallel?
 > -	 (gen-define-parallel-operand-macro (sfrag-sfmt frag))
 > -	 "")
 >       ; Unconditionally written operands are not recorded here.
 >       (if (or (with-profile?) (with-parallel-write?))
 >  	 "      unsigned long long written = 0;\n"
 > @@ -992,9 +1206,6 @@ struct @prefix@_pbb_label {
 >  	      (sfrag-trailer? frag))
 >  	 (string-append "      pbb_br_npc = npc;\n"
 >  			"      pbb_br_status = br_status;\n")
 > -	 "")
 > -     (if parallel?
 > -	 (gen-undef-parallel-operand-macro (sfrag-sfmt frag))
 >  	 "")
 >       (gen-undef-field-macro (sfrag-sfmt frag))
 >       "    }\n"
 > Index: rtl-c.scm
 > ===================================================================
 > RCS file: /cvs/src/src/cgen/rtl-c.scm,v
 > retrieving revision 1.4
 > diff -u -p -r1.4 rtl-c.scm
 > --- rtl-c.scm	8 Sep 2000 22:18:37 -0000	1.4
 > +++ rtl-c.scm	9 Jan 2003 03:22:25 -0000
 > @@ -1304,7 +1304,23 @@
 >  			"bad arg to `operand'" object-or-name)))
 >  )
 >  
 > -(define-fn xop (estate options mode object) object)
 > +(define-fn xop (estate options mode object) 
 > +  (let ((delayed (assoc '#:delay (estate-modifiers estate))))
 > +    (if (and delayed
 > +	     (equal? APPLICATION 'SID-SIMULATOR)
 > +	     (operand? object))
 > +	;; if we're looking at an operand inside a (delay ...) rtx, then we
 > +	;; are talking about a _delayed_ operand, which is a different
 > +	;; beast.  rather than try to work out what context we were
 > +	;; constructed within, we just clone the operand instance and set
 > +	;; the new one to have a delayed value. the setters and getters
 > +	;; will work it out.
 > +	(let ((obj (object-copy object))
 > +	      (amount (cadr delayed)))
 > +	  (op:set-delay! obj amount)
 > +	  obj)
 > +	;; else return the normal object
 > +	object)))
 >  
 >  (define-fn local (estate options mode object-or-name)
 >    (cond ((rtx-temp? object-or-name)
 > @@ -1363,9 +1379,38 @@
 >    (cx:make VOID "; /*clobber*/\n")
 >  )
 >  
 > -(define-fn delay (estate options mode n rtx)
 > -  (s-sequence (estate-with-modifiers estate '((#:delay))) VOID '() rtx) ; wip!
 > -)
 > +
 > +(define-fn delay (estate options mode num-node rtx)
 > +  (case APPLICATION
 > +    ((SID-SIMULATOR)
 > +     (let* ((n (cadddr num-node))
 > +	    (old-delay (let ((old (assoc '#:delay (estate-modifiers estate))))
 > +			 (if old (cadr old) 0)))
 > +	    (new-delay (+ n old-delay)))    
 > +       (begin
 > +	 ;; check for proper usage
 > +     	 (if (let* ((hw (case (car rtx) 
 > +			  ((operand) (op:type (rtx-operand-obj rtx)))
 > +			  ((xop) (op:type (rtx-xop-obj rtx)))
 > +			  (else #f))))		    	       
 > +	       (not (and hw (or (pc? hw) (memory? hw) (register? hw)))))
 > +	     (context-error 
 > +	      (estate-context estate) 
 > +	      (string-append 
 > +	       "(delay ...) rtx applied to wrong type of operand '" (car rtx) "'. should be pc, register or memory")))
 > +	 ;; signal an error if we're delayed and not in a "parallel-insns" CPU
 > +	 (if (not (with-parallel?)) 
 > +	     (context-error 	      
 > +	      (estate-context estate) 
 > +	      "delayed operand in a non-parallel cpu"))
 > +	 ;; update cpu-global pipeline bound
 > +	 (cpu-set-max-delay! (current-cpu) (max (cpu-max-delay (current-cpu)) new-delay))      
 > +	 ;; pass along new delay to embedded rtx
 > +	 (rtx-eval-with-estate rtx mode (estate-with-modifiers estate `((#:delay ,new-delay)))))))
 > +
 > +    ;; not in sid-land
 > +    (else (s-sequence (estate-with-modifiers estate '((#:delay))) VOID '() rtx))))
 > +
 >  
 >  ; Gets expanded as a macro.
 >  ;(define-fn annul (estate yes?)
 > Index: operand.scm
 > ===================================================================
 > RCS file: /cvs/src/src/cgen/operand.scm,v
 > retrieving revision 1.5
 > diff -u -p -r1.5 operand.scm
 > --- operand.scm	20 Dec 2002 06:39:04 -0000	1.5
 > +++ operand.scm	9 Jan 2003 03:22:29 -0000
 > @@ -90,6 +90,9 @@
 >  		; referenced.  #f means the operand is always referenced by
 >  		; the instruction.
 >  		(cond? . #f)
 > +		
 > +		; whether (and by how much) this instance of the operand is delayed.
 > +		(delayed . #f)
 >  		)
 >  	      nil)
 >  )
 > @@ -135,6 +138,8 @@
 >  (define op:set-num! (elm-make-setter <operand> 'num))
 >  (define op:cond? (elm-make-getter <operand> 'cond?))
 >  (define op:set-cond?! (elm-make-setter <operand> 'cond?))
 > +(define op:delay (elm-make-getter <operand> 'delayed))
 > +(define op:set-delay! (elm-make-setter <operand> 'delayed))
 >  
 >  ; Compute the hardware type lazily.
 >  ; FIXME: op:type should be named op:hwtype or some such.
 > Index: mach.scm
 > ===================================================================
 > RCS file: /cvs/src/src/cgen/mach.scm,v
 > retrieving revision 1.2
 > diff -u -p -r1.2 mach.scm
 > --- mach.scm	12 Jul 2001 02:32:25 -0000	1.2
 > +++ mach.scm	9 Jan 2003 03:22:31 -0000
 > @@ -755,8 +755,7 @@
 >    (apply min (cons 65535
 >  		   (map insn-length (find (lambda (insn)
 >  					    (and (not (has-attr? insn 'ALIAS))
 > -						 (eq? (obj-attr-value insn 'ISA)
 > -						      (obj:name isa))))
 > +						 (isa-supports? isa insn)))
 >  					  (non-multi-insns (current-insn-list))))))
 >  )
 >  
 > @@ -765,9 +764,8 @@
 >    ; [a language with infinite precision can't have max-reduce-iota-0 :-)]
 >    (apply max (cons 0
 >  		   (map insn-length (find (lambda (insn)
 > -					    (and (not (has-attr? insn 'ALIAS))
 > -						 (eq? (obj-attr-value insn 'ISA)
 > -						      (obj:name isa))))
 > +					  (and (not (has-attr? insn 'ALIAS))
 > +						 (isa-supports? isa insn)))
 >  					  (non-multi-insns (current-insn-list))))))
 >  )
 >  
 > @@ -1008,13 +1006,19 @@
 >  		; Allow a cpu family to override the isa parallel-insns spec.
 >  		; ??? Concession to the m32r port which can go away, in time.
 >  		parallel-insns
 > +
 > +		; Computed: maximum number of insns which may pass before
 > +		; an insn writes back its output operands.
 > +		max-delay
 > +
 >  		)
 >  	      nil)
 >  )
 >  
 >  ; Accessors.
 >  
 > -(define-getters <cpu> cpu (word-bitsize insn-chunk-bitsize file-transform parallel-insns))
 > +(define-getters <cpu> cpu (word-bitsize insn-chunk-bitsize file-transform parallel-insns max-delay))
 > +(define-setters <cpu> cpu (max-delay))
 >  
 >  ; Return endianness of instructions.
 >  
 > @@ -1064,7 +1068,9 @@
 >  	      word-bitsize
 >  	      insn-chunk-bitsize
 >  	      file-transform
 > -	      parallel-insns)
 > +	      parallel-insns
 > +	      0 ; default max-delay. will compute correct value
 > +	      )
 >  	(begin
 >  	  (logit 2 "Ignoring " name ".\n")
 >  	  #f))) ; cpu is not to be kept
 > @@ -1284,13 +1290,13 @@
 >    ; Assert only one cpu family has been selected.
 >    (assert-keep-one)
 >  
 > -  (let ((par-insns (map isa-parallel-insns (current-isa-list)))
 > +  (let ((false->zero (lambda (x) (if x x 0)))
 > +	(par-insns (map isa-parallel-insns (current-isa-list)))
 >  	(cpu-par-insns (cpu-parallel-insns (current-cpu))))
 >      ; ??? The m32r does have parallel execution, but to keep support for the
 >      ; base mach simpler, a cpu family is allowed to override the isa spec.
 > -    (or cpu-par-insns
 > -	; FIXME: ensure all have same value.
 > -	(car par-insns)))
 > +    (max (false->zero cpu-par-insns) 
 > +	 (apply max (map false->zero par-insns))))
 >  )
 >  
 >  ; Return boolean indicating if parallel execution support is required.
 > Index: dev.scm
 > ===================================================================
 > RCS file: /cvs/src/src/cgen/dev.scm,v
 > retrieving revision 1.5
 > diff -u -p -r1.5 dev.scm
 > --- dev.scm	21 Dec 2002 22:22:33 -0000	1.5
 > +++ dev.scm	9 Jan 2003 03:22:31 -0000
 > @@ -115,7 +115,7 @@
 >    (load "sid-model")
 >    (load "sid-decode")
 >    (set! verbose-level 3)
 > -  (set! APPLICATION 'SIMULATOR)
 > +  (set! APPLICATION 'SID-SIMULATOR)
 >  )
 >  
 >  (define (load-sim)
 > Index: doc/rtl.texi
 > ===================================================================
 > RCS file: /cvs/src/src/cgen/doc/rtl.texi,v
 > retrieving revision 1.17
 > diff -u -p -r1.17 rtl.texi
 > --- doc/rtl.texi	22 Dec 2002 04:49:26 -0000	1.17
 > +++ doc/rtl.texi	9 Jan 2003 03:22:34 -0000
 > @@ -1833,7 +1833,7 @@ This is a character string consisting of
 >  Fields are denoted by @code{$operand} or
 >  @code{$@{operand@}}@footnote{Support for @code{$@{operand@}} is
 >  work-in-progress.}.  If a @samp{$} is required in the syntax, it is
 > -specified with @samp{\$}.  At most one white-space character may be
 > +specified with @samp{$$}.  At most one white-space character may be
 >  present and it must be a blank separating the instruction mnemonic from
 >  the operands.  This doesn't restrict the user's assembler, this is
 >  @c Is this reasonable?
 > @@ -2257,10 +2257,39 @@ first argument.
 >  Indicate that @samp{object} is written in mode @samp{mode}, without
 >  saying how. This could be useful in conjunction with the C escape hooks.
 >  
 > -@item (delay mode num expr)
 > -Indicate that there are @samp{num} delay slots in the processing of
 > -@samp{expr}.  When using this rtx in instruction semantics, CGEN will
 > -infer that the instruction has the DELAY-SLOT attribute.
 > +@item (delay num expr)
 > +In older "sim" simulators, indicates that there are @samp{num} delay
 > +slots in the processing of @samp{expr}. When using this rtx in instruction
 > +semantics, CGEN will infer that the instruction has the DELAY-SLOT
 > +attribute.  
 > +
 > +In newer "sid" simulators, evaluates to the writeback queue for hardware
 > +operand @samp{expr}, at @samp{num} instruction cycles in the
 > +future. @samp{expr} @emph{must} be a hardware operand in this case. 
 > +
 > +For example, @code{(set (delay 3 pc) (+ pc 1))} will schedule a write to
 > +the @samp{pc} register in the writeback phase of the 3rd instruction
 > +after the current. Alternatively, @code{(set gr1 (delay 3 gr2))} will
 > +immediately update the @samp{gr1} register with the @emph{latest write}
 > +to the @samp{gr2} register scheduled between the present and 3
 > +instructions in the future. @code{(delay 0 ...)}  refers to the
 > +writeback phase of the current instruction.
 > +
 > +This effect is modeled with a circular buffer of "write stacks" for each
 > +hardware element (register banks get a single stack). The size of the
 > +circular buffer is calculated from the uses of @code{(delay ...)} 
 > +rtxs. When a delayed write occurs, the simulator pushes the write onto
 > +the appropriate write stack in the "future" of the circular buffer for
 > +the written-to hardware element. At the end of each instruction cycle,
 > +the simulator executes all writes in all write stacks for the time slice
 > +just ending. When a delayed read (essentially a pipeline bypass) occurs,
 > +the simulator looks ahead in the circular buffer for any writes
 > +scheduled in the future write stack. If it doesn't find one, it
 > +progressively backs off towards the "current" instruction cycle's write
 > +stack, and if it still finds no scheduled writes then it returns the
 > +current state of the CPU. Thus while delayed writes are fast, delayed
 > +reads are potentially slower in a simulator with long pipelines and very
 > +large register banks.
 >  
 >  @item (annul yes?)
 >  @c FIXME: put annul into the glossary.
 > 
 > 
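
To make the write-stack description in the rtl.texi hunk above easier to follow, here is a minimal C++ sketch of a circular buffer of pending writes for one register bank. It is an illustration only and not the code the patch generates: the names DelayedBank, Write, schedule, read and writeback are hypothetical, PipeSz is assumed to be larger than the largest delay used, writes within one time slice are assumed to be applied in the order they were scheduled, and the delayed read backs off all the way to the current cycle's slot, as the texinfo text describes.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sketch of the "circular buffer of write stacks" idea.
// None of these names come from the patch.
struct Write {
  int idx;  // register index within the bank
  int val;  // value to be written at writeback time
};

template <std::size_t PipeSz, std::size_t NumRegs>
class DelayedBank {
public:
  // Delayed write: queue `val` for register `idx`, to land `delay`
  // cycles from now (0 means the current cycle's writeback phase).
  void schedule(int delay, int idx, int val) {
    assert(delay >= 0 && static_cast<std::size_t>(delay) < PipeSz);
    slots_[(tick_ + delay) % PipeSz].push_back({idx, val});
  }

  // Delayed read (a pipeline bypass): start `delay` cycles ahead and
  // back off towards the current cycle, returning the most recently
  // scheduled write to `idx`. If none is pending, return the register.
  int read(int delay, int idx) const {
    assert(delay >= 0 && static_cast<std::size_t>(delay) < PipeSz);
    for (int d = delay; d >= 0; --d) {
      const std::vector<Write>& s = slots_[(tick_ + d) % PipeSz];
      for (auto it = s.rbegin(); it != s.rend(); ++it)
        if (it->idx == idx) return it->val;
    }
    return regs_[idx];
  }

  // End of an instruction cycle: apply every write queued for the
  // time slice just ending, then advance to the next slice.
  void writeback() {
    std::vector<Write>& s = slots_[tick_];
    for (const Write& w : s) regs_[w.idx] = w.val;
    s.clear();
    tick_ = (tick_ + 1) % PipeSz;
  }

  // Current architectural value of register `idx`.
  int reg(int idx) const { return regs_[idx]; }

private:
  std::array<std::vector<Write>, PipeSz> slots_{};
  std::array<int, NumRegs> regs_{};  // all registers start at zero
  std::size_t tick_ = 0;             // index of the current time slice
};
```

In this sketch, something like (set (delay 2 gr1) ...) would map to schedule (2, 1, value), and a read of (delay 3 gr1) would map to read (3, 1). The cost asymmetry mentioned in the texinfo text shows up directly: schedule is a single push, while read may scan up to delay+1 slots.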

