This is the mail archive of the mailing list for the binutils project.

Re: ELF octets_per_byte

> I am in the process of trying to port binutils to a new architecture,
> the ZipCPU.  (You can find a description of it here:
>,zipcpu)  One unique "feature" of this
> processor is that the size of the minimum addressable unit is 32-bits.
> While binutils has support for an "octets_per_byte" value other than
> one, this feature does not appear to be fully supported.  Indeed, the
> "bfd/elflink.c" file contains several "FIXME" lines regarding the
> insufficiency of the current support.

It doesn't seem to me that having a minimum addressable unit of
32-bits necessarily makes your byte size 32 bits. That would actually
surprise me quite a lot. You've simply described a word-addressed
machine, and it would be quite sensible to continue to have four 8-bit
bytes in each of those 32-bit words. Does your C compiler evaluate
sizeof(void *) to 1 or to 4?
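
One way to answer that question is to ask the target compiler directly. The sketch below is only an illustration; the function names are made up for this example and are not part of binutils or any ZipCPU toolchain. It relies only on standard C: sizeof counts in units of char, and CHAR_BIT gives the width of a char in bits.

```c
#include <limits.h>
#include <stddef.h>

/* Number of bits in the compiler's char, i.e. the C-level "byte". */
int char_bits(void) {
    return CHAR_BIT;
}

/* Size of a data pointer, measured in chars (sizeof always counts chars). */
size_t pointer_size_in_chars(void) {
    return sizeof(void *);
}
```

With 32-bit pointers, a compiler that keeps 8-bit chars would report CHAR_BIT as 8 and sizeof(void *) as 4, while one that treats the 32-bit word as its char would report 32 and 1.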

Indeed, I'm not at all surprised that there are many FIXMEs
associated with a parameter named "octets_per_byte". The name just
doesn't make sense, unless it's a float: I'd expect such a parameter
to be 1 or some fraction close to 1 (e.g., for 6- or 7- or 9-bit bytes
typical of some legacy architectures from the 60s and 70s). But I
guess bfd actually does use "byte" to mean "word" for word-addressed
machines. Sigh. That's not a definition I would have chosen. You can
mentally translate "byte" to "word" inside bfd, and to "octet" when
reading the ELF spec.

> All of these can be easily fixed, and I would like to propose a patch
> (or series of patches) to do this.  The first part of this process will
> need to be identifying which ELF variables/values are "bytes" (units of
> the targets address space), and which are "octets" (8-bit values, units
> of the more commonly used address space).  Sadly, these units are not
> consistent with the meaning of "bytes" found within the ELF
> specification, nor can they be since the ELF specification does not
> acknowledge the potential difference between these two.

Is it really the case that we haven't yet seen a word-addressed
machine that uses ELF?

ELF states right up front that it's designed for 8-bit bytes and
32-bit or 64-bit architectures. It's an on-disk file format, so in
today's world, that means it's byte oriented, where "byte" means the
same thing as "octet" (a term invented by standards bodies just to
avoid any bias against machines where bytes weren't clearly 8 bits
long). Other than that, though, it's defined in terms of C structures,
so its format is completely defined by the ABI of your target.

ELF also clearly separates the notions of a file offset (Elfxx_Off)
from that of a program address (Elfxx_Addr). Anything that's declared
in the ELF spec as Elfxx_Off is a file offset, specified in bytes
(octets), while anything that's declared as Elfxx_Addr is a machine
address, whatever that means on your target machine. Sizes, whether on
disk or in memory, are consistently described as in bytes.
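
To make the offset/address split concrete, here is a minimal sketch of how a tool might locate the on-disk octets for a given target address. Everything in it is an assumption for illustration: the struct is a simplified stand-in rather than the real section header layout, and the helper name and the octets_per_unit parameter are hypothetical. It also assumes addresses count the target's minimum addressable units while file offsets count octets.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified view of one section; not the real ELF layout. */
struct section_view {
    uint64_t addr;    /* load address, in target addressable units */
    uint64_t offset;  /* position of the contents in the file, in octets */
};

/*
 * Map a target address inside the section to a file offset in octets.
 * The distance from the section start is measured in addressable units,
 * so it is scaled by octets_per_unit before being added to the offset.
 */
uint64_t file_offset_of_address(const struct section_view *sec,
                                uint64_t address,
                                unsigned octets_per_unit) {
    assert(octets_per_unit > 0);
    assert(address >= sec->addr);
    return sec->offset + (address - sec->addr) * octets_per_unit;
}
```

For a section at address 0x1000 and file offset 0x200, address 0x1004 maps to file offset 0x204 when each addressable unit is one octet, and to 0x210 when each unit is four octets.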

> For the purpose of beginning a discussion, and based upon a reading of
> the ELF specification, I propose the following values be in units of
> "octets":
> section size
> section header size
> section header offset
> For the most part, these values *must* be in octets, or it will be
> impossible to read and process an ELF file.
> I also propose that the following values are in units of target address
> space "bytes":
> ELF header "entry" address

Yes, this is an Elfxx_Addr.

> section header address

No, this is an Elfxx_Off. It's not an "address"; it's a "file offset",
so it must be in bytes.

> symbol value

Yes, this is an Elfxx_Addr.

> symbol size

No, this is a size, in bytes.

> relocation offset

Yes, this is an Elfxx_Addr. It's described as an "offset", because it
may be relative to the start of a section, but it's not a "file offset".

> relocation addend

This is just a pure number used in relocation processing, so it makes
sense that it should be in the same units as a symbol value.
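
As a rough sketch of that reading, the usual S + A computation can be kept entirely in target addressable units, with a conversion to octets only when finding the place to patch in the section's on-disk contents. The struct and function names are hypothetical, the record is a simplified stand-in for a real relocation entry, and octets_per_unit is an assumed parameter.

```c
#include <stdint.h>

/* Hypothetical, simplified relocation record; not the real ELF layout. */
struct reloc_sketch {
    uint64_t offset;  /* place to patch, in addressable units from section start */
    int64_t  addend;  /* pure number, in the same units as a symbol value */
};

/* S + A: symbol value plus addend, both in target addressable units. */
uint64_t reloc_target_value(uint64_t symbol_value, const struct reloc_sketch *r) {
    return symbol_value + (uint64_t)r->addend;
}

/* Octet index of the place to patch within the section's file contents. */
uint64_t reloc_patch_octet(const struct reloc_sketch *r, unsigned octets_per_unit) {
    return r->offset * octets_per_unit;
}
```

With a symbol value of 0x100, an addend of -2 and a relocation offset of 3, the relocated value is 0xFE in target units, and with four octets per unit the patch starts at octet 12 of the section contents.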

