This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: RFC: TLS improvements for IA32 and AMD64/EM64T
- From: Alexandre Oliva <aoliva at redhat dot com>
- To: "Menezes, Evandro" <evandro dot menezes at amd dot com>
- Cc: "Jan Beulich" <JBeulich at novell dot com>, "Michael Matz" <matz at suse dot de>, discuss at x86-64 dot org, "Andreas Jaeger" <aj at suse dot de>, binutils at sources dot redhat dot com, libc-alpha at sources dot redhat dot com
- Date: Sun, 08 Oct 2006 17:52:46 -0300
- Subject: Re: RFC: TLS improvements for IA32 and AMD64/EM64T
- References: <84EA05E2CA77634C82730353CBE3A8430346AB84@SAUSEXMB1.amd.com>
Hi, Evandro,
Sorry that it took so long for me to get back to you after the GCC
Summit. I've been quite busy and couldn't focus on this issue for a
while.
Here's an updated patch that should address all of your concerns. The
proposed ABI changes have remained unchanged for almost a year, and in
the meantime we've ported them to one more platform (ARM), so I believe
this is rock solid now.
Let me know what you think about the proposed changes. They document
what's implemented in GNU binutils, GCC and the pending patches I have
for glibc, which I'm retesting after updating them to a current tree.
Thanks,
for ChangeLog
from Alexandre Oliva <aoliva@redhat.com>
* object-files.tex (Relocation Types): Add
R_X86_64_GOTPC32_TLSDESC, R_X86_64_TLSDESC_CALL and
R_X86_64_TLSDESC. Add pointer to description. Add short
description of all TLS relocations. Fix typo in DTPMOD64.
* dl.tex (Procedure Linkage Table): Mention lazy relocation of TLS
descriptors. Add short description.
Index: dl.tex
===================================================================
--- dl.tex.orig 2006-10-08 16:53:13.000000000 -0300
+++ dl.tex 2006-10-08 17:39:44.000000000 -0300
@@ -265,6 +265,22 @@ evaluates procedure linkage table entrie
resolution and relocation until the first execution of a table entry.
\index{procedure linkage table|)}
+Relocation entries of type \codeindex{R_X86_64_TLSDESC} may also be
+subject to lazy relocation, using a single entry in the procedure
+linkage table and in the global offset table, at locations given by
+\texttt{DT_TLSDESC_PLT} and \texttt{DT_TLSDESC_GOT}, respectively, as
+described in ``Thread-Local Storage Descriptors for IA32 and
+AMD64/EM64T''\footnote{This document is currently available via
+ \url{http://people.redhat.com/aoliva/writeups/TLS/RFC-TLSDESC-x86.txt}}.
+
+To keep this document self-contained: \texttt{DT_TLSDESC_GOT} specifies
+a GOT entry in which the dynamic loader should store the address of its
+internal TLS descriptor resolver function, whereas
+\texttt{DT_TLSDESC_PLT} specifies the address of a PLT entry to be used
+as the TLS descriptor resolver function for lazy resolution from within
+this module. The PLT entry must push the link map of the module onto
+the stack and tail-call the internal TLS descriptor resolver function.
+
\subsubsection{Large Models}
In the small and medium code models the size of both the PLT and the GOT
Index: object-files.tex
===================================================================
--- object-files.tex.orig 2006-10-08 16:53:13.000000000 -0300
+++ object-files.tex 2006-10-08 17:46:49.000000000 -0300
@@ -435,7 +435,7 @@ the relocation addend.
\texttt{R_X86_64_PC16} & 13 & \textit{word16} & \texttt{S + A - P} \\
\texttt{R_X86_64_8} & 14 & \textit{word8} & \texttt{S + A} \\
\texttt{R_X86_64_PC8} & 15 & \textit{word8} & \texttt{S + A - P} \\
- \texttt{R_X86_64_DPTMOD64} & 16 & \textit{word64} & \\
+ \texttt{R_X86_64_DTPMOD64} & 16 & \textit{word64} & \\
\texttt{R_X86_64_DTPOFF64} & 17 & \textit{word64} & \\
\texttt{R_X86_64_TPOFF64} & 18 & \textit{word64} & \\
\texttt{R_X86_64_TLSGD} & 19 & \textit{word32} & \\
@@ -448,6 +448,9 @@ the relocation addend.
\texttt{R_X86_64_GOTPC32} & 26 & \textit{word32} & \texttt{GOT + A - P} \\
\texttt{R_X86_64_SIZE32} & 32 & \textit{word32} & \texttt{Z + A} \\
\texttt{R_X86_64_SIZE64} & 33 & \textit{word64} & \texttt{Z + A} \\
+ \texttt{R_X86_64_GOTPC32_TLSDESC} & 34 & \textit{word32} & \\
+ \texttt{R_X86_64_TLSDESC_CALL} & 35 & none & \\
+ \texttt{R_X86_64_TLSDESC} & 36 & \textit{word64}$\times 2$ & \\
% \texttt{R_X86_64_GOT64} & 16 & \textit{word64} & \texttt{G + A} \\
% \texttt{R_X86_64_PLT64} & 17 & \textit{word64} & \texttt{L + A - P} \\
\end{tabular}
@@ -469,6 +472,7 @@ to those used for the \intelabi. \footn
loading the offset into a displacement register; the base plus
immediate displacement addressing form can be used.}
+\begin{sloppypar}
The \texttt{R_X86_64_GOTPCREL} relocation has different semantics from the
\texttt{R_X86_64_GOT32} or equivalent i386 \texttt{R_I386_GOTPC} relocation.
In particular, because the \xARCH architecture has an addressing mode relative
@@ -477,6 +481,7 @@ using a single instruction. The calcula
\texttt{R_X86_64_GOTPCREL} relocation gives the difference between the location
in the GOT where the symbol's address is given and the location where the
relocation is applied.
+\end{sloppypar}
\begin{sloppypar}
The \texttt{R_X86_64_32} and \texttt{R_X86_64_32S} relocations truncate
@@ -492,19 +497,72 @@ relocations is not conformant to this AB
added for documentation purposes. The \texttt{R_X86_64_16}, and
\texttt{R_X86_64_8} relocations truncate the computed value to 16-bits
resp. 8-bits.
+\end{sloppypar}
-The relocations \texttt{R_X86_64_DPTMOD64},
-\texttt{R_X86_64_DTPOFF64}, \texttt{R_X86_64_TPOFF64} ,
-\texttt{R_X86_64_TLSGD} , \texttt{R_X86_64_TLSLD} ,
+\begin{sloppypar}
+The relocations \texttt{R_X86_64_DTPMOD64},
+\texttt{R_X86_64_DTPOFF64}, \texttt{R_X86_64_TPOFF64},
+\texttt{R_X86_64_TLSGD}, \texttt{R_X86_64_TLSLD},
\texttt{R_X86_64_DTPOFF32}, \texttt{R_X86_64_GOTTPOFF} and
\texttt{R_X86_64_TPOFF32} are listed for completeness. They are part
of the Thread-Local Storage ABI extensions and are documented in the
document called ``ELF Handling for Thread-Local
Storage''\footnote{This document is currently available via
- \url{http://people.redhat.com/drepper/tls.pdf}}\index{Thread-Local Storage}.
+ \url{http://people.redhat.com/drepper/tls.pdf}}\index{Thread-Local
+ Storage}. The relocations \texttt{R_X86_64_GOTPC32_TLSDESC},
+\texttt{R_X86_64_TLSDESC_CALL} and \texttt{R_X86_64_TLSDESC} are also
+used for Thread-Local Storage, but are not documented there as of this
+writing. A description can be found in the document ``Thread-Local
+Storage Descriptors for IA32 and AMD64/EM64T''\footnote{This document
+ is currently available via
+ \url{http://people.redhat.com/aoliva/writeups/TLS/RFC-TLSDESC-x86.txt}}.
+\end{sloppypar}
+
+In order to make this document self-contained, a description of the
+TLS relocations follows.
+\begin{sloppypar}
+\texttt{R_X86_64_DTPMOD64} resolves to the index of the dynamic thread
+vector entry that points to the base address of the TLS block
+corresponding to the module that defines the referenced symbol.
+\texttt{R_X86_64_DTPOFF64} and \texttt{R_X86_64_DTPOFF32} compute the
+offset from the pointer in that entry to the referenced symbol. The
+linker generates \texttt{DTPMOD64} and \texttt{DTPOFF64} relocations in
+adjacent GOT entries, in response to \texttt{R_X86_64_TLSGD} and
+\texttt{R_X86_64_TLSLD} relocations. If the linker can compute the
+offset itself, because the referenced symbol binds locally, the
+\texttt{DTPOFF64} relocation may be omitted. Otherwise, such relocations
+always come in pairs, with \texttt{DTPOFF64} applying to the word64 right
+after the corresponding \texttt{DTPMOD64} relocation.
\end{sloppypar}
+\texttt{R_X86_64_TPOFF64} and \texttt{R_X86_64_TPOFF32} resolve to the
+offset from the thread pointer to a thread-local variable. The former
+is generated in response to \texttt{R_X86_64_GOTTPOFF}, which resolves
+to the PC-relative address of a GOT entry containing such a 64-bit
+offset.
+
+\texttt{R_X86_64_TLSGD} and \texttt{R_X86_64_TLSLD} both resolve to
+PC-relative offsets to a \texttt{DTPMOD} GOT entry. The difference
+between them is that, for \texttt{TLSGD}, the following GOT entry will
+contain the offset of the referenced symbol into its TLS block,
+whereas, for \texttt{TLSLD}, it will contain the offset of the base
+address of the TLS block, that is, zero. The idea is that adding the
+\texttt{DTPOFF32} value of a symbol to the address obtained through
+\texttt{TLSLD} yields the address obtained through \texttt{TLSGD}.
+
+\texttt{R_X86_64_TLSDESC} resolves to a pair of word64s, called a TLS
+descriptor, the first of which is a pointer to a function, followed by
+an argument. The function is passed a pointer to this pair of entries
+in \%rax and, using the argument in the second entry, it must compute
+and return in \%rax the offset from the thread pointer to the symbol
+referenced in the relocation, without modifying any registers other
+than processor flags. \texttt{R_X86_64_GOTPC32_TLSDESC} resolves to the
+PC-relative address of a TLS descriptor corresponding to the named
+symbol. \texttt{R_X86_64_TLSDESC_CALL} must annotate the instruction
+used to call the TLS descriptor resolver function, so as to enable
+relaxation of that instruction.
+
\subsection{Large Models}
In order to extend both the PLT and the GOT beyond 2GB, it
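For readers who want the arithmetic behind the TLS relocation descriptions in the patch spelled out, here is a minimal C model. It is a sketch under stated assumptions, not loader code: the names `module_blocks`, `dtv`, `tls_got_pair`, `init_model`, `tls_lookup` and `tpoff_address` are hypothetical, the model DTV is a plain array indexed from 0, and each module's TLS block is a fixed-size byte array. It shows that a TLSGD pair (module, DTPOFF64) and a TLSLD pair (module, 0) plus the symbol's DTPOFF32 value reach the same address, and that TPOFF-based access is just thread pointer plus offset.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_MODULES 4 /* hypothetical limit for this model */

/* Hypothetical per-thread TLS blocks, one per module, and a model DTV
   mapping a module index to the base address of that module's block. */
static unsigned char module_blocks[MAX_MODULES][64];
static unsigned char *dtv[MAX_MODULES];

/* The two adjacent GOT words referenced by TLSGD/TLSLD code sequences. */
typedef struct {
    uint64_t module; /* filled in via R_X86_64_DTPMOD64 */
    uint64_t offset; /* R_X86_64_DTPOFF64 for TLSGD; 0 for TLSLD */
} tls_got_pair;

/* Point each model DTV entry at its module's TLS block. */
void init_model(void) {
    for (size_t i = 0; i < MAX_MODULES; i++)
        dtv[i] = module_blocks[i];
}

/* Stand-in for the loader's lookup (compare glibc's __tls_get_addr):
   the DTV entry for the module plus the offset within its block. */
void *tls_lookup(const tls_got_pair *p) {
    assert(p->module < MAX_MODULES);
    return dtv[p->module] + p->offset;
}

/* Access through R_X86_64_TPOFF64/TPOFF32: the variable's address is the
   thread pointer plus a (typically negative) offset. */
void *tpoff_address(unsigned char *thread_pointer, int64_t tpoff) {
    return thread_pointer + tpoff;
}
```

After `init_model()`, a TLSGD pair `{2, 16}` and a TLSLD pair `{2, 0}` plus a DTPOFF32 value of 16 both yield `module_blocks[2] + 16`, which is the equivalence the TLSGD/TLSLD paragraph relies on.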
On Sep 19, 2005, "Menezes, Evandro" <evandro.menezes@amd.com> wrote:
> Alexandre,
>> Please read the document referenced in the patch, for
>> starters. In it you'll see the exact spelling of the coding
>> samples is not final yet, and it doesn't make sense to
>> maintain yet another copy of this until it settles down.
> When it does, it'll be added to the ABI then. Not before. For now, it's OK to reserve the relocation numbers in this mailing list.
>> Also, you'll find that the calculations are not quite
>> possible to express in the way other relocations are
>> expressed; suggestions are welcome.
> State so, perhaps in a note, expanding what they mean.
>> Finally, what's wrong
>> with following the existing practice of referring to TLS
>> specs elsewhere?
> The intent is that the x86-64 ABI remains a stand-alone document as much as possible. It's not quite there yet, but adding yet another external reference sets it back even further.
> BTW, the TLS reference is slated to be incorporated into the x86-64 ABI.
>> The point of this posting was more to reserve the relocation
>> numbers for these purposes (the purpose of the relocations is
>> quite solid already, even though the numbers have changed as
>> recently as yesterday), but I'm yet to do some more
>> performance tests with some minor variations of the code
>> sequences to choose the best one. I don't want to have to
>> maintain all this information in sync between multiple specs
>> documents and the several different packages that implement
>> them; having a single specs document is much better for now.
> That's fine. When it reaches a mature state, patches against the ABI will be more than welcome.
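As the quoted exchange points out, what an R_X86_64_TLSDESC relocation produces is hard to write as an `S + A - P` style formula: it is a function pointer plus an argument that only that function interprets. Below is a minimal C sketch of that contract under loud assumptions: `struct tlsdesc`, `resolve_static`, `resolve_lazy`, `pending_offset` and `tlsdesc_address` are hypothetical names; an ordinary C call replaces the real convention (descriptor address in %rax, offset returned in %rax, all other registers except flags preserved); and the lazy path only mimics, in spirit, the resolution reached through DT_TLSDESC_PLT/DT_TLSDESC_GOT.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Model of the two word64s written by R_X86_64_TLSDESC. */
struct tlsdesc;
typedef intptr_t (*tlsdesc_resolver)(struct tlsdesc *);

struct tlsdesc {
    tlsdesc_resolver entry; /* first word: resolver function */
    intptr_t arg;           /* second word: the resolver's argument */
};

/* Fast path for a variable in static TLS: the argument already holds the
   thread-pointer offset, so the resolver just returns it. */
intptr_t resolve_static(struct tlsdesc *td) {
    return td->arg;
}

/* Hypothetical stand-in for the result of the symbol lookup that a real
   lazy resolver would perform. */
static intptr_t pending_offset;

/* Lazy path: compute the offset on first use, then rewrite the descriptor
   (argument first, function pointer last) so later calls take the fast
   path.  Unlike a real dynamic loader, this ignores concurrent callers. */
intptr_t resolve_lazy(struct tlsdesc *td) {
    td->arg = pending_offset;
    td->entry = resolve_static;
    return td->entry(td);
}

/* What the annotated code sequence does with the result: add the returned
   offset to the thread pointer to form the variable's address. */
char *tlsdesc_address(char *thread_pointer, struct tlsdesc *td) {
    assert(td->entry != NULL);
    return thread_pointer + td->entry(td);
}
```

A descriptor initialized as `{ resolve_lazy, 0 }` resolves on its first call and is rewritten to `resolve_static`, so every later call is a single indirect call that returns the stored offset; that fast path is the point of the descriptor design.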
--
Alexandre Oliva http://www.lsd.ic.unicamp.br/~oliva/
Secretary for FSF Latin America http://www.fsfla.org/
Red Hat Compiler Engineer aoliva@{redhat.com, gcc.gnu.org}
Free Software Evangelist oliva@{lsd.ic.unicamp.br, gnu.org}