This is the mail archive of the binutils@sources.redhat.com mailing list for the binutils project.
Re: [ELF] symbol sets handling [2]

From: Fabio Alemagna <falemagn at studenti dot unina dot it>
To: Ian Lance Taylor <ian at airs dot com>
Cc: Alan Modra <amodra at bigpond dot net dot au>, Nick Clifton <nickc at redhat dot com>, <binutils at sources dot redhat dot com>
Date: Wed, 12 Feb 2003 16:40:15 +0100 (CET)
Subject: Re: [ELF] symbol sets handling [2]
On 10 Feb 2003, Ian Lance Taylor wrote:

> Fabio Alemagna <falemagn@studenti.unina.it> writes:
>
> > On 8 Feb 2003, Ian Lance Taylor wrote:
> > > You say that you do want to sort a collection of symbol set sections,
> > > but your examples appear to be ones of sorting symbols within a single
> > > symbol set section.
> >
> > It's the same thing, really, for how it works: all symbols with the same
> > priority go in the same section, and all symbols with different priorities
> > go in different sections (each one differing by the priority suffix), then
> > the SORT() ld command is used to sort those sections.
>
> I'm trying to ask what you want to achieve in the final program.

I think I made it clear already...?

> It seems clear to me that the two cases are not the same.

Well, they are, since the end result is the same.

> The way to achieve one goal is not the way to achieve a different
> goal.

The final goal is only one, and the ways you proposed both lead to that
goal.

> > It's just like it happens with c++ constructors/destructors, something
> > that already works with ld (look at the .ctors/.dtors sections in your ld
> > script), I'm not inventing anything new, I'd just like to make it work
> > with general symbol sets.
>
> C++ constructors/destructors in ELF are not handled in the same way
> that what I think of as symbol sets are handled.

Who said that? I said that I _would_ make symbol sets work the same way
c++ constructors/destructors are handled. Right now, of course, they are
handled differently.


> The common elements of C++ constructors/destructors and symbol sets
> are this: they are a simple list of symbols, such that the program at
> runtime can easily find the start of the list and length or the end of
> the list.

And I'm sure you see that, up to here, it's just like with symbol sets,
right?

> C++ constructors/destructors in ELF work because they are explicitly
> called out in the linker script.

Exactly. Now, what I'm looking for is a way to make that _automated_, at
least for normal symbol sets.

> For historical reasons, the compiler
> cooperates by using two object files: crtbegin.o and crtend.o.  These
> object files define the special symbol which marks the start of the
> constructor/destructor lists (__CTOR_LIST__ and __DTOR_LIST__).  For
> historical reasons, the lists start with -1 and end with 0.  The
> program refers to the appropriate symbol, skips the first entry, and
> moves down the list until it finds a zero.

I know all that, but that's just an implementation detail. If you think
about it, there's no need for the crt*.o object files, nor for the -1 and
the 0 in the list: it could all be handled with "__start" and "__stop"
prefixed symbols as well. So let's just forget about this detail.

> The compiler puts all C++ constructors/destructors in sections with
> particular names: .ctors and .dtors.  When a constructor/destructor

That's not exact: the compiler puts the constructors'/destructors'
_symbol values_ (that is, their addresses) in those sections. The
functions themselves stay in the .text section.

> has a priority, the compiler includes the priority in the name of the
> section, e.g. .ctors.PRI.  The linker includes all such sections,
> sorted, in the final .ctors or .dtors section.  This happens because
> of explicit SORT statements in the linker script.

Ok, focus on that now: that's _exactly_ what I proposed to do for symbol
sets. Substitute "constructor/destructor" with "symbol" and substitute
".ctors and .dtors" with "set's name", and you have what I proposed.


> That's how C++ constructors/destructors work.  Symbol sets, at least
> what I call symbol sets, work in a completely different fashion.

They work just like constructors/destructors without priority, except that
they're not explicitly handled in the linker script and their name
doesn't have to begin with a '.' (and of course they also differ in the
implementation details I described above). Apart from those, the end result
is the same: several sections are joined together.

> To implement symbol sets, the linker provides a simple enhancement.
> The linker already naturally groups all input sections with the same
> name together.  That is, all input sections named .foo will be grouped
> together into a single output section named .foo.  The support for
> symbol sets is this: if the output section name can be represented as
> a C identifier, then the linker will automatically define special
> symbols which the program may use at runtime to identify the start and
> end of the output section.
>
> This feature may be used in a variety of ways.  To use it to implement
> symbol sets, the program will normally define a section which contains
> only the value of a symbol.  The linker will group all sections with
> the same name together, and the program can then at runtime use the
> specially defined symbol to find the set of all symbols with the same
> name.
>
> This feature permits an arbitrary number of symbol sets, without
> requiring them to be explicitly called out in the linker script.  Note
> that the program must normally still be aware of the name of each
> interesting symbol set (although one could imagine building a set of
> symbol sets).
>
>
> Anyhow, as can be seen, there are two different mechanisms for two
> different results.

Not quite, as I showed above.

Now, the only improvement I'd make to symbol sets is to make it
possible to handle prioritized symbols.

As I said, it would work exactly like it does with constructors/destructors
(except for the implementation details).

Let's make an example.

Here's a list of symbols with their priorities and the object files in
which those symbols are defined. I want to put them in the set "myset",
and I want the symbols to be sorted on the basis of their priorities, in
ascending order. Symbols without priority go at the end of the sorted
list.

Object file | Symbol | Priority
------------+--------+---------
a.o         | A      | 10
b.o         | B      | 3
b.o         | C      | 7
c.o         | D      | <no priority>
c.o         | E      | 12
d.o         | F      | <no priority>

Now let's sort them:

Object file | Symbol | Priority
------------+--------+---------
b.o         | B      | 3
b.o         | C      | 7
a.o         | A      | 10
c.o         | E      | 12
c.o         | D      | <no priority>
d.o         | F      | <no priority>

How do we achieve that? It's very simple, indeed.

Each symbol goes in a section whose name is the set's name. If the symbol
has a priority, a dot and the priority number are added to the section's
name (just like with constructors/destructors, you see?).

Here's a list of symbols, with their sections and the object files in
which they are defined:

Object file | Symbol | Section[.<symbol's priority>]
------------+--------+------------------------------
a.o         | A      | myset.10
b.o         | B      | myset.3
b.o         | C      | myset.7
c.o         | D      | myset
c.o         | E      | myset.12
d.o         | F      | myset

Now, _IF_ the ld script contained this piece:

    __start_myset = .;
    myset :
    {
        *(SORT(myset.*))
        *(myset)
    }
    __stop_myset = .;

The final executable would contain a section named "myset" whose content
would be the content of the sections in the above list, as if the ld
script contained this piece:

__start_myset = .;
myset :
{
    b.o(myset.3)
    b.o(myset.7)
    a.o(myset.10)
    c.o(myset.12)
    c.o(myset)
    d.o(myset)
}
__stop_myset = .;

And the end result would be that the final executable would contain a
section whose content is an array of symbol values (where the value is the
address of the symbol in the process image), which would look like this in
C:

    void *__start_myset[] =
    {
        &B, &C, &A, &E, &D, &F
    };

    /* one past the last element of the array */
    void **__stop_myset = __start_myset
        + sizeof (__start_myset) / sizeof (__start_myset[0]);

So, you see, the symbols are ordered on the basis of their priorities, and
everything has been done just by using the linker.
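For reference, the part that already works today (sets without priorities) can be exercised from C roughly as follows. This is only a minimal sketch: it assumes GCC-style attributes and GNU ld on an ELF target, and the names fn_b, fn_c, entry_b, entry_c, myset_count and myset_call_all are made up for the illustration. The order of entries coming from the same object file is not guaranteed here.

```c
#include <stddef.h>
#include <stdio.h>

typedef void (*set_fn)(void);

static int calls;   /* counts how many set members were invoked */

static void fn_b(void) { calls++; puts("B"); }
static void fn_c(void) { calls++; puts("C"); }

/* Put the *addresses* of the functions in the section "myset";
   the functions themselves stay in .text. */
static set_fn entry_b __attribute__((used, section("myset"))) = fn_b;
static set_fn entry_c __attribute__((used, section("myset"))) = fn_c;

/* Defined automatically by GNU ld, because "myset" is a valid
   C identifier. */
extern set_fn __start_myset[];
extern set_fn __stop_myset[];

/* Number of entries the linker collected in the set. */
size_t myset_count(void)
{
    return (size_t)(__stop_myset - __start_myset);
}

/* Walk the set from start to stop and call every member. */
void myset_call_all(void)
{
    for (set_fn *p = __start_myset; p < __stop_myset; p++)
        (*p)();
}
```

Calling myset_call_all() from main() should invoke both functions; with prioritized sections the loop would stay the same, only the order in which the linker lays out the entries would change.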

That's basically how I do it now, except that the pieces of the ld script
that build the sets are generated automatically by a postprocessing tool.
This tool acts as a wrapper around ld. It first invokes ld with the -r flag
on the given files and feeds the resulting object file to objdump. It then
searches objdump's output for section names matching a given pattern,
extracts the set name from each of them and generates the corresponding ld
script snippets. Finally, it merges these snippets into a preexisting ld
script and feeds the result to ld, together with the previously generated
object file, for the final link.
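The name-extraction and snippet-generation steps could be sketched in C like this. It is not the actual tool: the function names are hypothetical, and it assumes the pattern is simply NAME or NAME.PRIORITY, with NAME containing no dot (so names such as .ctors.10 are rejected).

```c
#include <stdio.h>
#include <string.h>

/* Copy the set name of a section such as "myset.10" or "myset" into out.
   Returns 0 on success, -1 if the name is empty or does not fit. */
int set_name_from_section(const char *section, char *out, size_t outsz)
{
    const char *dot = strchr(section, '.');
    size_t len = dot ? (size_t)(dot - section) : strlen(section);

    if (len == 0 || len >= outsz)
        return -1;
    memcpy(out, section, len);
    out[len] = '\0';
    return 0;
}

/* Write the ld script snippet for one set into buf.
   Returns what snprintf returns (the length the text needs). */
int emit_set_snippet(const char *set, char *buf, size_t bufsz)
{
    return snprintf(buf, bufsz,
                    "__start_%s = .;\n"
                    "%s :\n"
                    "{\n"
                    "    *(SORT(%s.*))\n"   /* prioritized members first */
                    "    *(%s)\n"           /* members without priority last */
                    "}\n"
                    "__stop_%s = .;\n",
                    set, set, set, set, set);
}
```

For the section names in the table above, every call to set_name_from_section() yields "myset", so a single snippet is generated for the whole set.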

Now, I just want to get rid of that postprocessor and do everything in ld.

> > That's what I proposed... But there's no need for ".gnu_symbol_set", it
> > can be done just by extending the way symbol sets are handled now.
>
> What do you mean by this?  In what way would you extend the way symbol
> sets are handled now?  Can you write down precisely the steps you
> propose that the linker should take?

If I knew the steps the linker should take I would have implemented it
already, don't you think? :) The thing is that I don't know how to
implement it in the emulation script, so I don't know which steps ld
should take. However, I showed above how I'd like it to work.

> Remember that right now the linker does not know anything about symbol
> sets, at least as I define them.  What the linker knows is that when a
> section can be named as a C identifier, it will define a __start and
> __stop symbol.
>
> > My
> > problem is that I'm not sure how to handle the "extended" situation... How
> > would you do it in case I had to use ".gnu_symbol_set"?
>
> Well, guessing at what you want to achieve, I would say that when the
> place_orphan function sees an orphan section whose name starts with
> .gnu_symbol_set, it sets the output section name by dropping the
> PRIORITY field.  It defines __start and __stop symbols based on the
> NAME field.  It then places the input section in the output section
> sorted by the full input section name.

Yes, that I know. The problem is: how do I do the sorting? Should I do it
by hand (meaning, should I implement it from scratch), or can I somehow
use the SORT() ld command?
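One caveat applies whichever way the sorting is done. Assuming the names are compared as plain strings (which, as far as I know, is what SORT() does), "myset.10" sorts before "myset.3", so the priorities would have to be zero-padded to a fixed width to get numeric order. The sketch below only illustrates that point: the helper names and the 5-digit width are arbitrary choices, and it is not ld code.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* qsort comparator: plain string comparison of two section names. */
static int by_name(const void *a, const void *b)
{
    return strcmp(*(const char *const *)a, *(const char *const *)b);
}

/* Sort an array of section names the way a name-based sort would. */
void sort_section_names(const char **names, size_t n)
{
    qsort(names, n, sizeof names[0], by_name);
}

/* Build "SET.PPPPP" with the priority zero-padded to 5 digits, so that
   string order and numeric order agree for priorities up to 99999. */
int padded_section_name(char *out, size_t outsz,
                        const char *set, unsigned prio)
{
    return snprintf(out, outsz, "%s.%05u", set, prio);
}
```

Sorting { "myset.10", "myset.3", "myset.7", "myset.12" } with sort_section_names() gives myset.10, myset.12, myset.3, myset.7, whereas the padded names myset.00003, myset.00007, myset.00010, myset.00012 come out in the intended order.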

Fabio Alemagna