This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: IFUNC resolver scheduling (bugs 21041, 20019)


On 01/26/2017 02:37 AM, Carlos O'Donell wrote:
On 01/25/2017 08:57 AM, Florian Weimer wrote:
I tried to fix most of the issues involving IFUNC resolvers and their
access to unrelocated symbols.

Thank you very much for tackling this somewhat complicated topic.

Like the issue of constructors and destructions, IFUNC is not a trivially
black and white issue, and some trade-offs need to be made.

My first attempt (on branch fw/bug21041) records all relocations
against IFUNC resolvers during elf_machine_rela processing, and adds
another pass to apply them.

I like this design. It's relatively straight forward to understand,
audit and maintain.

The general approach is to execute IFUNC resolvers
deepest-dependency-first (based on the object in which the IFUNC
resolver resides, not the relocation which references it).  Most of
the time, this ensures that IFUNC resolvers do not encounter
unrelocated symbols because IFUNC resolvers from their dependencies
have already executed, patching in the missing relocation.  Symbol
interposition can still bypass that, though.

Is there a problem with symbol interposition?

Symbol interposition can make available to IFUNC resolver function definitions which reside in objects which have not been fully relocated yet. This is what happens in the initial test case for bug 21041 (libpthread interposes longjmp). Delaying IFUNC relocations should fix this particular test case because the non-IFUNC relocations for libpthread are processed before the IFUNC resolver in libpthread runs. But is not a general solution to this kind of problem.

The branch is currently slightly buggy because it reverses the order
of relocation processing within the same object, and it does not
delay copy relocations, so those could copy unrelocated data.
(Support for copy relocations whose source address is determined by
an IFUNC resolver is missing as well, but this is a separate matter
and quite easy to fix.)

I don't think there is any real bug in reversing the order of the
IRELATIVE relocation handling.  What matters more is that other
earlier listed relocations are completely handled in order that the
resolver function may run correctly (where possible, and if not possible
then at least deterministic).

Right. It might still be beneficial to process these relocations in increasing order.

Isn't delaying copy relocations and support for copy relocations whose
source address is determined by an IFUNC resolver the same thing?

No, I don't think so.

The latter is simply a missing switch case in the handler for delayed IFUNC processing. The former is a relocation ordering issue.

Support for DT_TEXTREL is currently missing, too.  This needs another
preparatory patch like the one to delayed RELRO activation.  (For
dlopen, the RELRO activation delay does a bit too much work because
it mprotects everything again, not just the newly allocated
objects.)

DT_TEXTREL shouldn't be too different, just filling the right pieces
given the framework you already have in place?

Yes, it's similar to RELRO processing.

I think Szabolcs makes a good point here, in that we should consider:

* What is the cost of a second walk?

I fear it is substantial. I'm attaching some crude scripts which I used to gather statistics.

For example, I get this for /bin/bash:

Section '.rela.dyn' contains 2512 relocations:
  19 IFUNC relocations from index 2294 to 2496, interval size 203 (92.3%)
  7 COPY relocations from index 2505 to 2511, interval size 7 (0.3%)

In general, we typically have 10% of relocations which we have redo per object. But if we look at only those relocations which require symbol lookup, the percentage is much higher (92.3% in the bash example).

The main problem with the two-pass approach is that there is no place where it can cache symbol lookup results. If we assume that relocation performance is dominated by symbol lookups (which is reasonable IMHO), we are looking at a very substantial performance hit for the two-pass approach. Some of the performance hit could be avoided if we installed a magic function pointer as a temporary relocation result, and skip relocations which are not equal to that magic value on subsequent passes.

The next problem is that it's not actual two-pass. We'd need a separate pass for each IFUNC provider/consumer pair, I think. Otherwise, we would not perform IFUNC relocations in depth-first order anymore.

* What is the cost of only recording only that IRELATIVE relocs exist
  (recording the first one) and then starting the walk from there?

IRELATIVE relocations are very rare because you need symbol visibility management and IFUNCs in the same object. libc.so.6 has 8 of them. Some of them are likely accidents and unintentional. I'm not sure if they are a good target for optimization.

The combreloc feature sorts relocations of the same type together.
The static linker could sort relocations involving IFUNCs together,
and we could recognize this in the dynamic linker and record spans of
relocations instead individual relocations.  This would reduce the
number of delayed relocation records we'd have to allocate.

This is exactly what POWER does for OPD-related relocs (Alan Modra
explained this to me once).

I think this would be a reasonable optimization, but it's just that,
an optimization. Since you can always record the first record, and
start the walk there for the given object?

It's a very important optimization because it ensures we only re-do the symbol lookups we absolutely have to do. Even if we have the optimized symbol ordering, a newly added IFUNC could easily cause a 25% performance hit for relocation processing because the symbol range is much larger again.

Thanks,
Florian

Attachment: list-ifunc-symbols.sh
Description: application/shellscript

#!/usr/bin/python3

import io
import re
import subprocess
import sys

ifunc_symbol_path, object_path = sys.argv[1:]

def load_ifunc_symbols(path):
    result = set()
    with open(path) as f:
        for line in f:
            line = line.strip()
            if line:
                result.add(line)
    return result

ifunc_symbols = load_ifunc_symbols(ifunc_symbol_path)

RE_RELOCATION_SECTION = re.compile(
    r"^Relocation section \[[^\]]+\] '([^']+)'(?: for section \[[^\]]+\] '[^']+')? at offset 0x[0-9a-f]+ contains (\d+) entries:$")
RE_RELOCATION = re.compile(
    r"^  0x[0-9a-f]+  (\S+) +(?:0x)?[0-9a-f]+ +[+-]\d+ +(?:(\S+))?$")

IRELATIVE_RELOCATIONS = set("386_IRELATIVE X86_64_IRELATIVE".split())
COPY_RELOCATIONS = set("386_COPY X86_64_COPY".split())

def report_section():
    if section_count != count:
        raise IOError("section {r!} count mismatch: expected {}, actual {}".format(
            relocation_section, section_count, count))
    print("Section {!r} contains {} relocations:".format(
        relocation_section, section_count))
    def report_relocs(name, relocs, total):
        if relocs:
            start = min(relocs)
            end = max(relocs)
            length = end - start + 1
            fraction = float(length) / total * 100
            print("  {} {} relocations from index {} to {}, interval size {} ({:.1f}%)".format(
                len(relocs), name, start, end, length, fraction))
    report_relocs("IFUNC", ifuncs, len(symbol_relocs))
    report_relocs("IRELATIVE", irelatives, section_count)
    report_relocs("COPY", copies, section_count)
    
readelf = subprocess.Popen(("eu-readelf", "-r", "--", object_path),
                           stdout = subprocess.PIPE)
state = 0
for line in io.TextIOWrapper(readelf.stdout):
    if state == 0:
        if line != '\n':
            match = RE_RELOCATION_SECTION.match(line)
            if match is None:
                raise IOError(
                    "expecting relocation section: {!r}".format(line))
            relocation_section, section_count = match.groups()
            section_count = int(section_count)
            count = 0
            ifuncs = []
            irelatives = []
            copies = []
            symbol_relocs = []
            state = 1
    elif state == 1:
        if line.startswith("  Offset"):
            state = 2
        else:
            raise IOError("expecting table header: {!r}".format(line))
    elif state == 2:
        if line == '\n':
            report_section()
            state = 0
        else:
            match = RE_RELOCATION.match(line)
            if match is None:
                raise IOError("expecting relocation: {!r}".format(line))
            rtype, symbol = match.groups()
            if rtype in IRELATIVE_RELOCATIONS:
                irelatives.append(count)
            elif rtype in COPY_RELOCATIONS:
                copies.append(count)
            elif symbol:
                symbol_relocs.append(count)
                if symbol in ifunc_symbols:
                    ifuncs.append(count)
            count += 1
if readelf.wait() != 0:
    sys.exit(1)
if count > 0:
    report_section()

Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]