This is the mail archive of the binutils@sources.redhat.com mailing list for the binutils project.



Re: bfd function to read ELF file image from memory


> Right - my theory was that you could extend and clean up the current
> BFD_IN_MEMORY implementation so that the memory was either local (as
> it currently is), remote (as your patch implements) or even an mmap'ed
> file (something that people have been asking for in bfd for ages, and
> is partially implemented in bfdwin.c).

My patch implements copying from remote memory once, not generalized access
to it.  Lazy reading would be pretty straightforward to implement.  But it
is rather a digression from the motivating purpose here.  The case at hand
is an ELF file image of about 2kb, so it would surely be more overhead to
split it up and access memory piecemeal than just to read the whole thing.
Moreover, I have a problem to solve in gdb and the BFD function I've
written is the naturally clean and somewhat general BFD component of it.
It's pretty trivial to implement because of the existing BFD_IN_MEMORY
functionality.  I have no impetus to add a more complex BFD facility for
which there is no actual demand, and everyone's time is probably better
spent elsewhere.  (All that said, what you've described is a few hours'
hacking on bfdio.c, plus much more time writing something that uses such a
facility so as to test it.)

> probably rename the functions to ..._create_bfd_from_remote_memory().
> I would also recommend adding a couple of prototypes to elf-bfd.h.

Of course.  I posted the function for comments about how to integrate it,
not as a finished addition.  (And I never care what the names are.)  

> Certainly.  If you have the code in a separate file, then you could do
> the same thing as the peXXigen.c file, which is conditionally compiled
> and linked into the bfd library based on the configuration target.
> (See bfd/configure.in).  Alternatively you could use a system like the
> definition of COFF_WITH_PE, which is defined in a variety of different
> source files (eg bfd/pe-arm.c) and then tested for in various other
> files (eg bfd/coffcode.h).

I don't see how any of this is apropos.  Your examples are about different
backends sharing common code.  For that purpose, elfcode.h is already the
right place for the new function's implementation.  The question is about
libbfd interfaces used by applications, that are available only conditionally.
I don't think you've cited an example like that for me to examine.  If you
have, please be more specific about what functions I should look at.

> Hmm, I take your point - it would be easier if the function were
> always defined in the bfd library, even if it was not used/not
> intended to be used by a particular bfd target.  (I think it would be
> OK if it were only defined in ELF targeted versions of the library
> though).
> 
> Given that the code is not target specific itself, (just the
> availability of the ELF headers/file contents in remote memory), I
> withdraw my suggestion that the code be conditionalised.

Ok.  That still doesn't tell me explicitly what to do.  Are you saying that
it should in fact be a new backend function pointer in struct bfd_target?
Can you tell me where to find the checklist of things I need to do to
properly add such an interface?

> > Still, I would just like to have one calling interface that handles
> > 32-bit and 64-bit ELF.  To implement that front-end function I need
> > a compile-time check I can use in elfcode.h that tells me whether
> > 32-bit and/or 64-bit targets are being built.  What works?
> 
> You can check "#if ARCH_SIZE == 64" or "#if ARCH_SIZE == 32".

I must be misunderstanding you, or else that can't be right.  I asked about
how front-end source code can know whether it needs to call just the 32-bit
function from the 32-bit compile of elfcode.h, just the 64-bit function
from the 64-bit compile of elfcode.h, or switch between the two at runtime
based on bfd_get_arch_size. 
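
To make the question concrete, here is a rough, standalone sketch of the kind
of front end being asked about.  Everything in it is hypothetical: the
DEMO_HAVE_ELF32/DEMO_HAVE_ELF64 macros stand in for whatever compile-time
check BFD actually offers (that is exactly the open question), the two
demo_from_memory_NN functions stand in for the 32-bit and 64-bit compiles of
elfcode.h, and the arch_size argument plays the role of the value returned by
bfd_get_arch_size, assumed here to be 32, 64, or -1 for ELF.

```c
#include <assert.h>

/* Hypothetical build-time switches; BFD does not define these names.  */
#define DEMO_HAVE_ELF32 1
#define DEMO_HAVE_ELF64 1

#if DEMO_HAVE_ELF32
/* Stand-in for the function produced by the 32-bit compile.  */
int demo_from_memory_32 (void) { return 32; }
#endif

#if DEMO_HAVE_ELF64
/* Stand-in for the function produced by the 64-bit compile.  */
int demo_from_memory_64 (void) { return 64; }
#endif

/* Single calling interface: pick the implementation at run time, but
   only reference the ones that were actually built.  Returns 0 when no
   suitable implementation is available.  */
int
demo_from_memory (int arch_size)
{
  /* Assumption: the caller passes 32, 64, or -1 (unknown).  */
  assert (arch_size == 32 || arch_size == 64 || arch_size == -1);

  switch (arch_size)
    {
#if DEMO_HAVE_ELF32
    case 32:
      return demo_from_memory_32 ();
#endif
#if DEMO_HAVE_ELF64
    case 64:
      return demo_from_memory_64 ();
#endif
    default:
      return 0;
    }
}
```

A check like "#if ARCH_SIZE == 64" cannot play the role of these macros as
far as I can tell, since it only makes sense inside one particular compile
of elfcode.h.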

> I would recommend including a description of how TARGET_READ_MEMORY is
> supposed to work - what parameters it takes, what values it returns,
> whether it is possible for it to timeout, etc.
> 
> Also - please document the purpose of the LOADBASEP parameter.

Sure.

> I think that the choice of name for the internal Ehdr is
> slightly confusing.  I would suggest that you change it to "i_ehdr",

Whatever floats your boat.

> 
> > +  contents_size = ehdr.e_phoff + ehdr.e_phnum;
> > +  if ((bfd_vma) contents_size < ehdr.e_ehsize)
> > +    {
> > +      bfd_set_error (bfd_error_wrong_format);
> > +      return NULL;
> > +    }
> 
> This bit of code has me confused.  I assume that 'contents_size' is
> being used to reserve space in the (to be allocated) 'contents'
> buffer for the ELF header, but why are e_phoff and e_phnum being
> used ? 

No.  That variable is tallying the total size of the buffer that will be
allocated to hold the file image.  A valid ELF file can have its phdrs not
covered by any PT_LOAD segment, placed anywhere in the file.  The original
version of this code tried to read a whole contiguous file and so to be
pedantically correct this calculation made absolutely sure the phdrs were
covered.  The current version of the function only reads from the PT_LOAD
segments, so that bit of code doesn't make much sense any more.

> Couldn't you just remember the value of 'i' each time you encounter a
> PT_LOAD section in the for() loop, so saving yourself this do...while
> loop ?

Sure.  The loop costs very little, but there is nothing wrong with saving
the last index.

> It would also be a good idea to check that there actually *is* a PT_LOAD
> section.

Yes, another holdover from the original implementation.  The first version
would be happy just reading an ELF file header and phdrs with no PT_LOAD
segments and delivering a file image containing just those headers.
That no longer makes sense.

> Hmmm, what about files with disjoint segments ?  ie files which load
> some segments to low addresses and some to high addresses ?  With the
> current code 'contents' will be some huge number and so this malloc
> will fail.

You are mistaken.  The load addresses have no bearing on those
calculations.  Perhaps you saw `p_offset' in the code and thought
`p_vaddr'.  The size computed is that of the file image, not the
memory image.  In an ELF file with a segment loaded at 0x1000 and a
segment loaded at 0x1000000, the two segments are contiguous in the
file (modulo alignment padding).
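
As a hedged illustration of that point, the standalone sketch below computes
an image size the same way the function does, from p_offset and p_filesz
only.  The struct and all demo_* names are made up for the example and are
not BFD types; the two segments use the load addresses mentioned above, with
file offsets and sizes invented for the sake of the arithmetic.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, stripped-down program header: only the fields needed.  */
struct demo_phdr
{
  uint64_t p_offset;	/* Where the segment starts in the file.  */
  uint64_t p_vaddr;	/* Where the segment is loaded in memory.  */
  uint64_t p_filesz;	/* Bytes of the segment present in the file.  */
  uint64_t p_align;	/* Alignment; assumed to be a power of two.  */
};

/* Round X up to a multiple of ALIGN.  */
uint64_t
demo_align_up (uint64_t x, uint64_t align)
{
  assert (align != 0 && (align & (align - 1)) == 0);
  return (x + align - 1) & -align;
}

/* Size of the file image covering every segment: the largest rounded-up
   end *file offset*.  p_vaddr is deliberately never consulted.  */
uint64_t
demo_image_size (const struct demo_phdr *ph, size_t n)
{
  uint64_t size = 0;
  size_t i;

  for (i = 0; i < n; ++i)
    {
      uint64_t end = demo_align_up (ph[i].p_offset + ph[i].p_filesz,
				    ph[i].p_align);
      if (end > size)
	size = end;
    }
  return size;
}

/* Segments loaded at 0x1000 and 0x1000000, adjacent in the file.  */
const struct demo_phdr demo_segments[2] = {
  /* p_offset, p_vaddr,   p_filesz, p_align */
  { 0x0,       0x1000,    0x600,    0x1000 },
  { 0x1000,    0x1000000, 0x200,    0x1000 },
};
```

Here demo_image_size (demo_segments, 2) comes to 0x2000, i.e. two pages,
even though the load addresses are nearly 16MB apart.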

> Wouldn't it be better to load each segment contiguously into the
> 'contents' buffer and adjust the p_vaddr values in the program header
> appropriately ?

Changing p_vaddr values would completely break all address-using
calculations ever done on anything in the file.  It would be possible
to pack segments together by adjusting their p_offset values while
keeping them congruent to p_vaddr modulo p_align.  However, in
practice you will never see an ELF image that isn't already packed
that way.  Given that such overly clever code would in reality never
do anything different, I prefer the simplicity of the current
approach wherein I am replicating the file as it was (according to
its original headers) rather than doing any calculations (into which
I could introduce bugs) to freshly lay out the file.  I will change
it if you prefer, but be aware that the added complexity will never
get any testing unless we contrive ELF images unlike any that exist
in nature.
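
For reference, the congruence constraint and the repacking calculation
described above might look roughly like the sketch below.  This is only an
illustration with made-up demo_* names, not code proposed for BFD; it treats
a p_align of 0 or 1 as "no constraint", as the ELF specification does, and
otherwise assumes p_align is a power of two.

```c
#include <assert.h>
#include <stdint.h>

/* Return nonzero if P_OFFSET is congruent to P_VADDR modulo P_ALIGN.  */
int
demo_offset_congruent (uint64_t p_offset, uint64_t p_vaddr, uint64_t p_align)
{
  if (p_align <= 1)
    return 1;			/* No alignment constraint.  */
  assert ((p_align & (p_align - 1)) == 0);
  return ((p_offset ^ p_vaddr) & (p_align - 1)) == 0;
}

/* Return the smallest file offset >= MIN_OFFSET that is congruent to
   P_VADDR modulo P_ALIGN.  This is the "packing" calculation: place the
   next segment as early in the file as the constraint permits.  */
uint64_t
demo_pack_offset (uint64_t min_offset, uint64_t p_vaddr, uint64_t p_align)
{
  uint64_t mask;

  if (p_align <= 1)
    return min_offset;
  assert ((p_align & (p_align - 1)) == 0);
  mask = p_align - 1;
  /* Unsigned wraparound is harmless here because 2^64 is a multiple of
     any power-of-two alignment.  */
  return min_offset + ((p_vaddr - min_offset) & mask);
}
```

For an image that is already packed, demo_pack_offset returns the p_offset
the segment already has, which is why the extra code would never do anything
different in practice.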


I've changed the couple of small things you cited, and here is the
new version of the function.


Thanks,
Roland



/* Create a new BFD as if by bfd_openr.  Rather than opening a file,
   reconstruct an ELF file by reading the segments out of remote memory
   based on the ELF file header at EHDR_VMA and the ELF program headers it
   points to.  If not null, *LOADBASEP is filled in with the difference
   between the VMAs from which the segments were read, and the VMAs the
   file headers (and hence BFD's idea of each section's VMA) put them at.

   The function TARGET_READ_MEMORY is called to copy LEN bytes from the
   remote memory at target address VMA into the local buffer at MYADDR; it
   should return zero on success or an `errno' code on failure.  TEMPL must
   be a BFD for a target with the word size and byte order found in the
   remote memory.  */

bfd *
NAME(bfd_elf,bfd_from_memory)
     (bfd *templ, bfd_vma ehdr_vma, bfd_vma *loadbasep,
      int (*target_read_memory) (bfd_vma vma, char *myaddr, int len))
{
  Elf_External_Ehdr x_ehdr;	/* Elf file header, external form */
  Elf_Internal_Ehdr i_ehdr;	/* Elf file header, internal form */
  Elf_External_Phdr *x_phdrs;
  Elf_Internal_Phdr *i_phdrs, *last_phdr;
  bfd *nbfd;
  struct bfd_in_memory *bim;
  int contents_size;
  char *contents;
  int err;
  unsigned int i;
  bfd_vma loadbase;

  /* Read in the ELF header in external format.  */
  err = target_read_memory (ehdr_vma, (char *) &x_ehdr, sizeof x_ehdr);
  if (err)
    {
      bfd_set_error (bfd_error_system_call);
      errno = err;
      return NULL;
    }

  /* Now check to see if we have a valid ELF file, and one that BFD can
     make use of.  The magic number must match, the address size ('class')
     and byte-swapping must match our XVEC entry.  */

  if (! elf_file_p (&x_ehdr)
      || x_ehdr.e_ident[EI_VERSION] != EV_CURRENT
      || x_ehdr.e_ident[EI_CLASS] != ELFCLASS)
    {
      bfd_set_error (bfd_error_wrong_format);
      return NULL;
    }

  /* Check that file's byte order matches xvec's */
  switch (x_ehdr.e_ident[EI_DATA])
    {
    case ELFDATA2MSB:		/* Big-endian */
      if (! bfd_header_big_endian (templ))
	{
	  bfd_set_error (bfd_error_wrong_format);
	  return NULL;
	}
      break;
    case ELFDATA2LSB:		/* Little-endian */
      if (! bfd_header_little_endian (templ))
	{
	  bfd_set_error (bfd_error_wrong_format);
	  return NULL;
	}
      break;
    case ELFDATANONE:		/* No data encoding specified */
    default:			/* Unknown data encoding specified */
      bfd_set_error (bfd_error_wrong_format);
      return NULL;
    }

  elf_swap_ehdr_in (templ, &x_ehdr, &i_ehdr);

  /* The file header tells where to find the program headers.
     These are what we use to actually choose what to read.  */

  if (i_ehdr.e_phentsize != sizeof (Elf_External_Phdr) || i_ehdr.e_phnum == 0)
    {
      bfd_set_error (bfd_error_wrong_format);
      return NULL;
    }

  x_phdrs = (Elf_External_Phdr *)
    bfd_malloc (i_ehdr.e_phnum * (sizeof *x_phdrs + sizeof *i_phdrs));
  if (x_phdrs == NULL)
    {
      bfd_set_error (bfd_error_no_memory);
      return NULL;
    }
  err = target_read_memory (ehdr_vma + i_ehdr.e_phoff, (char *) x_phdrs,
			    i_ehdr.e_phnum * sizeof x_phdrs[0]);
  if (err)
    {
      free (x_phdrs);
      bfd_set_error (bfd_error_system_call);
      errno = err;
      return NULL;
    }
  i_phdrs = (Elf_Internal_Phdr *) &x_phdrs[i_ehdr.e_phnum];

  contents_size = 0;
  last_phdr = NULL;
  loadbase = ehdr_vma;
  for (i = 0; i < i_ehdr.e_phnum; ++i)
    {
      elf_swap_phdr_in (templ, &x_phdrs[i], &i_phdrs[i]);
      if (i_phdrs[i].p_type == PT_LOAD)
	{
	  bfd_vma segment_end;
	  segment_end = (i_phdrs[i].p_offset + i_phdrs[i].p_filesz
			 + i_phdrs[i].p_align - 1) & -i_phdrs[i].p_align;
	  if (segment_end > (bfd_vma) contents_size)
	    contents_size = segment_end;

	  if ((i_phdrs[i].p_offset & -i_phdrs[i].p_align) == 0)
	    loadbase = ehdr_vma - (i_phdrs[i].p_vaddr & -i_phdrs[i].p_align);

	  last_phdr = &i_phdrs[i];
	}
    }

  if (last_phdr == NULL)
    {
      /* No PT_LOAD segment at all; don't leak the phdr buffer.  */
      free (x_phdrs);
      bfd_set_error (bfd_error_wrong_format);
      return NULL;
    }

  /* Trim the last segment so we don't bother with zeros in the last page
     that are off the end of the file.  However, if the extra bit in that
     page includes the section headers, keep them.  */
  if ((bfd_vma) contents_size > last_phdr->p_offset + last_phdr->p_filesz
      && (bfd_vma) contents_size >= (i_ehdr.e_shoff
				     + i_ehdr.e_shnum * i_ehdr.e_shentsize))
    {
      contents_size = last_phdr->p_offset + last_phdr->p_filesz;
      if ((bfd_vma) contents_size < (i_ehdr.e_shoff
				     + i_ehdr.e_shnum * i_ehdr.e_shentsize))
	contents_size = i_ehdr.e_shoff + i_ehdr.e_shnum * i_ehdr.e_shentsize;
    }
  else
    contents_size = last_phdr->p_offset + last_phdr->p_filesz;

  /* Now we know the size of the whole image we want read in.  */
  contents = (char *) bfd_zmalloc ((bfd_size_type) contents_size);
  if (contents == NULL)
    {
      free (x_phdrs);
      bfd_set_error (bfd_error_no_memory);
      return NULL;
    }

  for (i = 0; i < i_ehdr.e_phnum; ++i)
    if (i_phdrs[i].p_type == PT_LOAD)
      {
	bfd_vma start = i_phdrs[i].p_offset & -i_phdrs[i].p_align;
	bfd_vma end = (i_phdrs[i].p_offset + i_phdrs[i].p_filesz
		       + i_phdrs[i].p_align - 1) & -i_phdrs[i].p_align;
	if (end > (bfd_vma) contents_size)
	  end = contents_size;
	err = target_read_memory ((loadbase + i_phdrs[i].p_vaddr)
				  & -i_phdrs[i].p_align,
				  contents + start, end - start);
	if (err)
	  {
	    free (x_phdrs);
	    free (contents);
	    bfd_set_error (bfd_error_system_call);
	    errno = err;
	    return NULL;
	  }
      }
  free (x_phdrs);

  /* If the segments visible in memory didn't include the section headers,
     then clear them from the file header.  */
  if ((bfd_vma) contents_size < (i_ehdr.e_shoff
				 + i_ehdr.e_shnum * i_ehdr.e_shentsize))
    {
      memset (&x_ehdr.e_shoff, 0, sizeof x_ehdr.e_shoff);
      memset (&x_ehdr.e_shnum, 0, sizeof x_ehdr.e_shnum);
      memset (&x_ehdr.e_shstrndx, 0, sizeof x_ehdr.e_shstrndx);
    }

  /* This will normally have been in the first PT_LOAD segment.  But it
     conceivably could be missing, and we might have just changed it.  */
  memcpy (contents, &x_ehdr, sizeof x_ehdr);

  /* Now we have a memory image of the ELF file contents.  Make a BFD.  */
  bim = ((struct bfd_in_memory *)
	 bfd_malloc ((bfd_size_type) sizeof (struct bfd_in_memory)));
  if (bim == NULL)
    {
      free (contents);
      bfd_set_error (bfd_error_no_memory);
      return NULL;
    }
  nbfd = _bfd_new_bfd ();
  if (nbfd == NULL)
    {
      free (bim);
      free (contents);
      bfd_set_error (bfd_error_no_memory);
      return NULL;
    }
  nbfd->filename = "<in-memory>";
  nbfd->xvec = templ->xvec;
  bim->size = contents_size;
  bim->buffer = contents;
  nbfd->iostream = (PTR) bim;
  nbfd->flags = BFD_IN_MEMORY;
  nbfd->direction = read_direction;
  nbfd->mtime = time (NULL);
  nbfd->mtime_set = TRUE;

  if (loadbasep)
    *loadbasep = loadbase;
  return nbfd;
}
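
To exercise the function without a live target, a TARGET_READ_MEMORY callback
can simply copy out of a local buffer.  The sketch below is one hedged way to
do that, following the contract in the comment above: zero on success, an
`errno' code on failure.  The demo_vma typedef is a stand-in for bfd_vma so
the example compiles on its own, and demo_set_memory and demo_read_memory are
hypothetical names, not part of BFD or gdb.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

typedef unsigned long demo_vma;	/* Stand-in for bfd_vma.  */

/* The pretend "remote" memory: SIZE bytes at BUF, mapped at BASE.  */
static const char *demo_mem;
static demo_vma demo_mem_base;
static size_t demo_mem_size;

void
demo_set_memory (const char *buf, demo_vma base, size_t size)
{
  demo_mem = buf;
  demo_mem_base = base;
  demo_mem_size = size;
}

/* Copy LEN bytes from "remote" address VMA into MYADDR.  Return zero on
   success, or EIO if any part of the range is not mapped.  */
int
demo_read_memory (demo_vma vma, char *myaddr, int len)
{
  demo_vma off;

  assert (myaddr != NULL);
  if (len < 0 || demo_mem == NULL || vma < demo_mem_base)
    return EIO;
  off = vma - demo_mem_base;
  if (off > demo_mem_size || (size_t) len > demo_mem_size - off)
    return EIO;
  memcpy (myaddr, demo_mem + off, (size_t) len);
  return 0;
}
```

With the real bfd_vma in place of demo_vma, a function of this shape could be
passed as the last argument, with EHDR_VMA set to the address at which the
buffer holding the ELF header is pretended to be mapped.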

