This is the mail archive of the
mailing list for the binutils project.
Re: ELF octets_per_byte
- From: Dan <dgisselq at verizon dot net>
- To: Cary Coutant <ccoutant at gmail dot com>
- Cc: dgisselq at ieee dot org, Binutils <binutils at sourceware dot org>
- Date: Wed, 24 Feb 2016 21:34:12 -0500
- Subject: Re: ELF octets_per_byte
- Authentication-results: sourceware.org; auth=none
- References: <1456242622 dot 30661 dot 448 dot camel at jericho> <CAJimCsF_3u-CDXkybODnkTAyZqihS-wdwFwL+qqWnUfHKQ3xnQ at mail dot gmail dot com>
- Reply-to: dgisselq at ieee dot org
Please allow me to respond below,
On Wed, 2016-02-24 at 15:51 -0800, Cary Coutant wrote:
> > I am in the process of trying to port binutils to a new architecture,
> > the ZipCPU. (You can find a description of it here:
> > https://opencores.org/project,zipcpu) One unique "feature" of this
> > processor is that the size of the minimum addressable unit is 32-bits.
> > While binutils has support for an "octets_per_byte" value other than
> > one, this feature does not appear to be fully supported. Indeed, the
> > "bfd/elflink.c" file contains several "FIXME" lines regarding the
> > insufficiency of the current support.
> It doesn't seem to me that having a minimum addressable unit of 32-bits
> necessarily makes your byte size 32 bits. That would actually surprise
> me quite a lot. You've simply described a word-addressed machine, and
> it would be quite sensible to continue to have four 8-bit bytes in each
> of those 32-bit words.
While I would be tempted to agree with you wholeheartedly, the "octets"
versus "byte" definition is one I reverse engineered from within the gas
code for an assembler. Within that code, there's a lot of support for
OCTETS_PER_BYTE being something other than 1 as well as
OCTETS_PER_BYTE_POWER being defined such that
(1<<OCTETS_PER_BYTE_POWER)==OCTETS_PER_BYTE. That said, only two other
CPUs appear to have OCTETS_PER_BYTE set to anything other than 1: the
TI-C4x, and the TI-C54.
Using this terminology, a "byte" is the minimum addressable unit,
whereas an "octet" is 8 bits.
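As a rough illustration of that terminology, here is a minimal C sketch of converting between target "bytes" and octets. It assumes OCTETS_PER_BYTE_POWER is the base-2 logarithm of OCTETS_PER_BYTE. The macro names mirror the ones mentioned above, but the values (a 32-bit byte) and the two helper functions are hypothetical and are not taken from gas or BFD.

```c
#include <stdint.h>

/* Assumed example values for a target whose byte is 32 bits wide.
   OCTETS_PER_BYTE_POWER is taken to be log2(OCTETS_PER_BYTE). */
#define OCTETS_PER_BYTE_POWER 2
#define OCTETS_PER_BYTE (1 << OCTETS_PER_BYTE_POWER)

/* Hypothetical helper: convert a count of target bytes to octets. */
static uint64_t bytes_to_octets(uint64_t bytes)
{
    return bytes << OCTETS_PER_BYTE_POWER;
}

/* Hypothetical helper: convert a count of octets to whole target
   bytes, discarding any partial byte. */
static uint64_t octets_to_bytes(uint64_t octets)
{
    return octets >> OCTETS_PER_BYTE_POWER;
}
```

With these assumed values, three target bytes occupy twelve octets in a file.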
This "OCTETS_PER_BYTE" feature has some support within BFD, and it was
in the hopes of updating and correcting that support that I am writing.
> Does your C compiler evaluate sizeof(void *) to 1 or to 4?
On the ZipCPU, ...
sizeof(char) = sizeof(int) = sizeof(void *) = 1 (32-bits)
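For readers less familiar with how C reports sizes: sizeof counts in units of char, and CHAR_BIT from <limits.h> gives the number of bits in a char. The sketch below uses a hypothetical BITS_OF macro and helper functions to show the relationship. On a typical host CHAR_BIT is 8; on a target like the one described here, a conforming compiler would presumably define CHAR_BIT as 32, though that is an assumption and not something stated in this thread.

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical macro: width of a type in bits. sizeof counts chars,
   and CHAR_BIT is the number of bits in one char. */
#define BITS_OF(type) (sizeof(type) * CHAR_BIT)

/* Hypothetical helpers reporting widths for the three types above. */
static size_t bits_in_char(void)    { return BITS_OF(char); }
static size_t bits_in_int(void)     { return BITS_OF(int); }
static size_t bits_in_pointer(void) { return BITS_OF(void *); }
```

If all three sizeof values are 1 and CHAR_BIT is 32, each helper would return 32.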
> Indeed, I'm not at all surprised that there are many FIXME's
> associated with a parameter named "octets_per_byte". The name just
> doesn't make sense, unless it's a float: I'd expect such a parameter
> to be 1 or some fraction close to 1 (e.g., for 6- or 7- or 9-bit bytes
> typical of some legacy architectures from the 60s and 70s). But I
> guess bfd actually does use "byte" to mean "word" for word-addressed
> machines. Sigh. That's not a definition I would have chosen. You can
> mentally translate "byte" to "word" inside bfd, and to "octet" when
> reading the ELF spec.
> > All of these can be easily fixed, and I would like to propose a patch
> > (or series of patches) to do this. The first part of this process will
> > need to be identifying which ELF variables/values are "bytes" (units of
> > the target's address space), and which are "octets" (8-bit values, units
> > of the more commonly used address space). Sadly, these units are not
> > consistent with the meaning of "bytes" found within the ELF
> > specification, nor can they be since the ELF specification does not
> > acknowledge the potential difference between these two.
> Is it really the case that we haven't yet seen a word-addressed
> machine that uses ELF?
A quick grep for OCTETS_PER_BYTE in binutils/gas/config/*.h reveals that
only the two TI chips have this feature.
> ELF states right up front that it's designed for 8-bit bytes and
> 32-bit or 64-bit architectures. It's an on-disk file format, so in
> today's world, that means it's byte oriented, where "byte" means the
> same thing as "octet" (a term invented by standards bodies just to
> avoid any bias against machines where bytes weren't clearly 8 bits
> long). Other than that, though, it's defined in terms of C structures,
> so its format is completely defined by the ABI of your target
> ELF also clearly separates the notions of a file offset (Elfxx_Off)
> from that of a program address (Elfxx_Addr). Anything that's declared
> in the ELF spec as Elfxx_Off is a file offset, specified in bytes
> (octets), while anything that's declared as Elfxx_Addr is a machine
> address, whatever that means on your target machine. Sizes, whether on
> disk or in memory, are consistently described as in bytes.
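To make the offset/address distinction concrete, here is a small C sketch of mapping a target address inside a section to a position in the file. Everything in it is hypothetical: the typedefs stand in for Elfxx_Off and Elfxx_Addr, the struct is a simplified stand-in for a section header, and EXAMPLE_OCTETS_PER_BYTE assumes a target with 32-bit addressable units. It assumes the address field is in target units while the file offset and size are in octets.

```c
#include <stdint.h>

/* Hypothetical stand-ins for Elfxx_Off and Elfxx_Addr. */
typedef uint32_t file_off_t;     /* octets from the start of the file */
typedef uint32_t target_addr_t;  /* target addressable units */

/* Assumed: one addressable unit on the target is four octets. */
#define EXAMPLE_OCTETS_PER_BYTE 4u

/* Simplified, hypothetical view of a section header. */
struct example_section {
    target_addr_t addr;         /* load address, in target units */
    file_off_t    offset;       /* position in the file, in octets */
    uint32_t      size_octets;  /* size, in octets */
};

/* Return the file offset (in octets) of the data at target address
   `addr`, or UINT32_MAX if the address lies outside the section. */
static file_off_t addr_to_file_offset(const struct example_section *s,
                                      target_addr_t addr)
{
    if (addr < s->addr)
        return UINT32_MAX;
    /* Scale the distance from target units to octets before adding
       it to the file offset. */
    uint64_t delta_octets =
        (uint64_t)(addr - s->addr) * EXAMPLE_OCTETS_PER_BYTE;
    if (delta_octets >= s->size_octets)
        return UINT32_MAX;
    return s->offset + (file_off_t)delta_octets;
}
```

The scaling step is the point of the sketch: an address difference has to be multiplied by the octets-per-byte factor before it can be combined with a file offset.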
> > For the purpose of beginning a discussion, and based upon a reading of
> > the ELF specification, I propose the following values be in units of
> > "octets":
> > section size
> > section header size
> > section header offset
> > For the most part, these values *must* be in octets, or it will be
> > impossible to read and process an ELF file.
> > I also propose that the following values are in units of target address
> > space "bytes":
> > ELF header "entry" address
> Yes, this is an Elfxx_Addr.
> > section header address
> No, this is an Elfxx_Off. It's not an "address"; it's a "file offset",
> so it must be in bytes.
Looks like I may have gotten one wrong, then. However, I think the
current functionality stores "bytes" and not "octets" into this field.
I'll have to go back and double check.
> > symbol value
> Yes, this is an Elfxx_Addr.
> > symbol size
> No, this is a size, in bytes.
> > relocation offset
> Yes, this is an Elfxx_Addr. It's described as an "offset", because it
> may be relative to the start of a section, but it's not a "file
> offset".
> > relocation addend
> This is just a pure number used in relocation processing, so it makes
> sense that it should be in the same units as a symbol value.
I'll go dig into the section header address to see if that needs
adjusting for how it is kept.