This is the mail archive of the
cgen@sources.redhat.com
mailing list for the CGEN project.
Re: Types and other issues with cgen
On Wed, Aug 06, 2003 at 10:27:25AM -0700, Doug Evans wrote:
> Michael Meissner writes:
> > I've been looking at the internal types used within cgen, and I wanted to get
> > some comments before I start making wholesale changes. Sorry for the length,
> > but I thought it is important to talk about the issues (#1, #4, and #8 are
> > minor issues).
> >
> > 1) Cgen uses the PARAMS macro to selectively hide prototypes. Given that both GCC
> > and BINUTILS now require a C90 compiler with prototypes, would patches that go
> > through and completely prototype things be accepted?
>
> Yep.
>
> > 2) Cgen has a type mechanism (DI/SI/etc.) but it doesn't seem to be used in the
> > actual code for at least the assembler and disassembler
>
> Using the modes in the assembler/disassembler isn't the right way to go.
> These modes are for semantic operation, not assembly/disassembly.
> Imagine some instruction with an immediate operand that is a fixed
> set of constants that is encoded with special magic numbers.
> Register indices are another example.
> There's a disconnect between representation in the instruction
> and use during semantic evaluation.
>
> > (I haven't gotten to sim/sid yet).
>
> Modes are definitely used in simulation.
>
> > All fields in the cgen_fields structure are signed long, no
> > matter what the type that I declare in the .cpu file is. In part this seems
> > to be because extract_normal and friends take an address of the field to
> > fill, and return 0/1 for error and success. Wouldn't a better approach be
> > to size & type the fields as the user specified, and make the extract
> > functions return the extracted value and return error/success via a
> > pointer. I could see either separate extractor functions for each type, or
> > signed/unsigned extractor functions of the widest type, or just a single
> > extract function being used.
>
> Either way one has to have multiple functions per type
> (unless of course one used a union or some such),
> regardless of whether the pass/fail indicator is the result
> or returned via a pointer to it.
Not necessarily; you could always have one function which returns the widest
type and let the compiler do any narrowing/sign conversion.
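For concreteness, a sketch of the single-extractor idea (extract_field and the
bit numbering are made up for illustration; the real code would live in
extract_normal and friends):

```c
#include <stdint.h>

/* Sketch only: one extractor returning the widest signed type, with
   pass/fail reported through a pointer.  "start" is the LSB position
   of the field.  Relies on arithmetic right shift of negative values,
   which is implementation-defined but universal in practice.  */
static int64_t
extract_field (uint64_t insn, int start, int length, int *status)
{
  if (length <= 0 || length > 64 || start < 0 || start + length > 64)
    {
      *status = 0;                      /* fail */
      return 0;
    }
  *status = 1;                          /* success */
  /* Shift the field's top bit up to bit 63, then arithmetic-shift
     back down so the sign bit extends through the whole result.  */
  return (int64_t) (insn << (64 - start - length)) >> (64 - length);
}
```

The caller then assigns the result to whatever field type it declared, and the
compiler does the narrowing.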
> Having multiple variants of the internal extract_normal routine
> is an increment in complication I haven't needed yet so I've been
> deferring it.
>
> Note that there are already functions that have multiple variants
> dependent on type. See for example m32r_cgen_[gs]et_{int,vma}_operand
> in opcodes/m32r-ibld.c.
> These functions aren't currently used by any binutils program.
> They're services offered to programs outside of binutils.
>
> > 3) Signed long is another problem in that the machine I'm targeting is a 64-bit
> > machine, but I am doing development on an x86 machine. If we keep to a
> > single type, it should be at least bfd_signed_vma which will be the
> > appropriate size to hold addresses in the target machine. This will mean
> > having to rewrite the places that just call printf or the print functions,
> > but that is not too difficult. Another possibility is to use a cgen
> > specific type (or two types for signed/unsigned) that is sized to be as
> > large as the largest type used in the .cpu file. Ideally for 32-bit ports
> > on 32-bit hosts, you would not slow things down by using 64 bit types
> > blindly, but it would allow those of us developing for larger hosts to
> > use cgen.
>
> For assembly/disassembly purposes the issue is what is the maximum
> size of a "word" in the instruction's representation?
> And for the sake of [V]LIW machines let's keep separate the notion
> of individual instructions inside one collection of instructions
> (or in Transmeta parlance: atoms and molecules (whoop dee doo)).
Yep.
> I'm assuming/hoping you can pack each instruction separately
> and then combine them at the end, and for now do the final packing
> (or initial unpacking for disassembly) outside of cgen.
Yes, the packing is fairly trivial: break each instruction into a 2-bit field
and a 41-bit field, combine the three 2-bit fields into one 5-bit field, and
that 5-bit field, followed by the three 41-bit fields, makes a 128-bit
combined instruction. In the instruction encoding, only the values 0, 1, and
2 are allowed in each 2-bit field. Labels will force padding to the next
128-bit boundary. The one 86-bit instruction is treated as two separate
43-bit instructions.
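For what it's worth, a sketch of that packing. The base-3 combination of the
three 2-bit tags into the 5-bit template is my assumption of how three values
of 0..2 fit in 5 bits (3^3 = 27 <= 32), and the exact bit layout here is made
up:

```c
#include <stdint.h>

/* 128-bit bundle as two 64-bit words; bit 0 of "lo" is bundle bit 0.  */
struct bundle { uint64_t hi, lo; };

/* Each input word carries a 2-bit tag in bits 41-42 (values 0..2 only)
   and a 41-bit body in bits 0-40.  Layout assumed here: 5-bit template
   at bundle bits 123-127, then the three 41-bit bodies at bits 82-122,
   41-81, and 0-40.  */
static struct bundle
pack_bundle (const uint64_t insn[3])
{
  uint64_t b0 = insn[0] & ((1ULL << 41) - 1);
  uint64_t b1 = insn[1] & ((1ULL << 41) - 1);
  uint64_t b2 = insn[2] & ((1ULL << 41) - 1);
  uint32_t tmpl = 0;
  struct bundle b;
  int i;

  for (i = 0; i < 3; i++)               /* base-3 combine of the tags */
    tmpl = tmpl * 3 + (uint32_t) ((insn[i] >> 41) & 3);

  b.lo = b2 | (b1 << 41);               /* low 23 bits of b1 at 41-63 */
  b.hi = (b1 >> 23)                     /* high 18 bits of b1 at 0-17 */
         | (b0 << 18)                   /* 41 bits of b0 at 18-58     */
         | ((uint64_t) tmpl << 59);     /* 5-bit template at 59-63    */
  return b;
}
```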
> > There are machines out there with 128 bit registers, such as the MIPS chip
> > that is at the heart of the Sony PlayStation, the SSE2 registers on the
> > Pentium IV, and the AltiVec registers on the newer PowerPCs. However, C
> > compilers don't often provide 128 bit types. We might want to think
> > about how to handle these machines as well. In terms of instruction size, I
> > do have an 86 bit instruction which pushes the problem also. This may
> > require using GMP if needed. Too bad we aren't coding in C++, where we
> > could just define a class type to get the extra precision.
>
> cgen based simulators (written in C) can already handle simulating
> architectures with 64 bit values on hosts where the compiler doesn't
> have long long (with C++ there's less of an issue).
> Dunno how often it is used, so no claim is made that there isn't bitrot
> or that it's complete, but it was tested way back when.
> Grep for HAVE_LONGLONG in sim/common/cgen-types.h.
>
> Semantics modes are to some extent black boxes.
> As new modes become needed we can add them.
> A simulator on a host with a compiler that can't represent them
> can represent them as a struct and provide the necessary
> manipulators of that struct. (for c++ s/struct/class/ if you prefer)
> No claim is made that the addition will be a walk in the park,
> but that's the plan-of-record.
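As an illustration of that plan (the TI name and the two-word layout here are
assumptions of mine, not cgen's actual representation; cf.
sim/common/cgen-types.h):

```c
#include <stdint.h>

/* A 128-bit value as a struct, for hosts whose compiler has no
   128-bit integer type.  */
typedef struct { uint64_t hi, lo; } TI;

/* One manipulator: 128-bit addition built from two 64-bit words.  */
static TI
ti_add (TI a, TI b)
{
  TI r;
  r.lo = a.lo + b.lo;
  r.hi = a.hi + b.hi + (r.lo < a.lo);   /* carry out of the low word */
  return r;
}
```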
>
> > 4) As a nit, we use unsigned int for the hash type, and I suspect it might be
> > cleaner if we had a cgen specific type for holding hash values (ie,
> > cgen_hash_t).
>
> Sure. An increment in complication I was deferring.
> One might want to add to the name the context in which it is used.
> Cgen might want to use different kinds of hashes in different contexts.
>
> > 5) As an experiment, I compiled cgen with -Wconversion, and it showed a lot of
> > places where implicit signed<->unsigned conversions were going on. A lot of
> > the places were using int to hold sizes like buffer lengths, and passing
> > sizeof(...) to the value, and size_t would be more useful. Unfortunately it
> > also shows other places where having a single type for the fields (such as
> > long currently, or bfd_signed_vma/cgen_int_t possibly in the future)
> > forces implicit conversions. One
> > of my thoughts is to have a union of an appropriate unsigned and signed
> > types of the same size, and use the appropriate element in the expansion.
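Something along these lines (cgen_field_t is a name I'm making up):

```c
#include <stdint.h>

/* One field type with both views; generated code picks the member
   matching the declared signedness, keeping -Wconversion quiet at
   the point of use.  Reading the member not last written is the
   usual type-punning idiom, reliable on the hosts we care about.  */
typedef union
{
  int64_t  s;   /* signed view */
  uint64_t u;   /* unsigned view */
} cgen_field_t;
```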
>
> Removing the warnings would certainly be a good idea, though this
> particular warning doesn't always have a high signal/noise ratio.
My first attempt at using a 64-bit type fails on the m32r since there are
places I haven't caught yet where it stores a 32-bit unsigned value (which
happens to hold a negative value) into a larger 64-bit item, and the sign
doesn't extend correctly. I'm assuming that, as I go through the tedious task
of fixing all of the warnings, they will show where I'm losing precision.
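The failure mode is the classic one; a minimal illustration (function names
invented):

```c
#include <stdint.h>

/* A negative value parked in a 32-bit unsigned zero-extends when
   widened to 64 bits; it must pass through the signed 32-bit type
   first to get sign extension.  */
static int64_t
widen_wrong (uint32_t v) { return (int64_t) v; }            /* zero-extends */

static int64_t
widen_right (uint32_t v) { return (int64_t) (int32_t) v; }  /* sign-extends */
```

widen_wrong (0xffffffff) yields 4294967295 where widen_right yields -1.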
> > 6) Using bfd_put_bits and bfd_get_bits to convert the bits into proper endian
> > format only works for bit sizes of 8, 16, 32, and 64. In all other places,
> > bfd aborts (my machine has mostly 43 bit instructions, and 1 86 bit
> > instruction before the encoding mentioned in #7). It might be better to
> > open code this, rather than falling back to the bfd functions.
> >
> > Another idea is to always encode instructions expressed as a series of bytes
> > in big endian (or little endian) format, and then expect the final assembler
> > encoding to do the appropriate copying. Otherwise, I see a lot of code that
> > checks the endianness to get the correct byte.
>
> A final assembly pass to do the appropriate copying isn't necessarily
> a slam-dunk.
>
> The asm/disasm side of cgen currently has two modes of representing
> instructions: as an "int" in host byte order, or as a string of bytes
> in target byte order.
Yes, I know, but in the code that handles the byte-string case rather than the
integer case, I see ?: operations checking endianness to decide whether to
fetch a byte from the beginning or the end.
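I.e., the pattern boils down to something like this (illustrative only, not
the actual opcodes code):

```c
/* Pick the n'th most significant byte of an instruction stored as a
   string of bytes in target order: from the front on a big-endian
   target, from the back on a little-endian one.  */
static unsigned char
nth_msb_byte (const unsigned char *buf, int total_bytes, int n,
              int big_endian_p)
{
  return big_endian_p ? buf[n] : buf[total_bytes - 1 - n];
}
```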
> > 7) As I have mentioned in the past, my machine uses 3 43-bit instructions that
> > are encoded into a 128 bit super instruction. Any ideas for the syntax for
> > specifying the encode/decode operations?
>
> I'm not sure I understand. In what context?
Basically, where would the proper place be to add this? (define-isa seems the
logical candidate, though define-mach/define-cpu are other possibilities.) I'm
thinking of something like the handlers option in define-operand.
> > 8) The @arch@_cgen_hw_table uses (PTR) in initializing the asm_data field.
> > This makes debugging harder. Would it be possible to have 2 fields so that
> > each member is correctly typed, and you can print out pointers in the
> > debugger?
>
> 2 fields? How would it look?
> (it's certainly a useful thing to do, and I'd say go for it,
> but I'm not clear what the result would look like)
I'm going off of this comment in include/opcode/cgen.h:
typedef struct
{
  char *name;
  enum cgen_hw_type type;
  /* There is currently no example where both index specs and value specs
     are required, so for now both are clumped under "asm_data".  */
  enum cgen_asm_type asm_type;
  PTR asm_data;
#ifndef CGEN_HW_NBOOL_ATTRS
#define CGEN_HW_NBOOL_ATTRS 1
#endif
  CGEN_ATTR_TYPE (CGEN_HW_NBOOL_ATTRS) attrs;
#define CGEN_HW_ATTRS(hw) (&(hw)->attrs)
} CGEN_HW_ENTRY;
I assume we would just have two fields, one for holding index specs, and the
other for holding value specs.
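Roughly like this (the spec types here are hypothetical stand-ins for whatever
index specs and value specs actually are; the point is just that each member
is typed, so gdb can print it without a cast):

```c
#include <stddef.h>

/* Hypothetical two-field replacement for "PTR asm_data": at most one
   of the pointers is non-NULL, and both are correctly typed.  */
struct index_spec { int lo, hi; };                       /* stand-in */
struct value_spec { long value; const char *keyword; };  /* stand-in */

typedef struct
{
  const char *name;
  const struct index_spec *asm_index;   /* index specs, or NULL */
  const struct value_spec *asm_value;   /* value specs, or NULL */
} hw_entry;
```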
> Note that things are currently not totally hopeless.
> One could print the value and then say "info sym <value>",
> and then print the variable gdb gives.
>
> > So, suggestions on how you would like me to extend cgen to handle the problems
> > my machine exposes?
>
> For assembly/disassembly, I need to think about it for a bit.
> I think what we need to do is be able to handle each insn
> individually and handle packing/unpacking outside of cgen (for now).
> That reduces the problem to handling how "words" are laid out
> in each individual insn(atom). Since we're dealing with 43 bit
> entities (or 2*43 bits), I'm wondering if treating them as
> 64 bit entities for packing/unpacking will work.
It might, once we use a 64-bit type. However, the long instruction is still a
problem since it doesn't fit in an integral type.
> (how the 2*43 bits case would be handled would depend on the details
> I guess, maybe 2*64 bits or maybe 32+64 bits).
>
> I'm guessing studying how to handle ia64 would suffice.
That was my first thought (the designer of my machine used the IA-64 as a
model, so they are similar in some superficial ways). However, I've concluded
that the IA-64 port was never completed, even if you build it on a
64-bit machine so that long is 64 bits. Among other things, it has no support
for encoding the instructions, and so would fail in bfd_put_bits.
> > My initial thoughts are to use a cgen specific type for the types. The first
> > round would use bfd_vma/bfd_signed_vma, but eventually size the type based on
> > the maximum size used in the .cpu file. I'm thinking of using the union with
> > signed and unsigned fields, to deal with many of the conversion issues.
>
> If after reading the above you still think this is the way to go,
> let's discuss it further.
--
Michael Meissner
email: gnu@the-meissners.org
http://www.the-meissners.org