This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.

Re: RFC: TLS improvements for IA32 and AMD64/EM64T


On Sep 17, 2005, Alexandre Oliva <aoliva@redhat.com> wrote:

> On Sep 16, 2005, Alexandre Oliva <aoliva@redhat.com> wrote:
>> On Sep 16, 2005, Alexandre Oliva <aoliva@redhat.com> wrote:
>>> Over the past few months, I've been working on porting to IA32 and
>>> AMD64/EM64T the interesting bits of the TLS design I came up with for
>>> FR-V, achieving some impressive speedups along with slight code size
>>> reductions in the most common cases.

>>> Although the design is not set in stone yet, it's fully implemented
>>> and functional with patches I'm about to post for binutils, gcc and
>>> glibc mainline, as follow-ups to this message, except that the GCC
>>> patch will go to gcc-patches, as expected.

>> This is the glibc portion of the implementation.  I've built it for
>> amd64-linux-gnu and i686-pc-linux-gnu, with both the current TLS
>> dialect and the new -mtls-dialect=gnu2, without regressions.

> Revised patch using new relocation and dynamic entry numbers.  Tested
> again on all 4 combinations mentioned above.

Revised patch that no longer includes unrelated changes, fixes a few
inconsistent uses of addends here and there (copy&pastos), avoids
crashing when lazily resolving TLSDESC relocations to weak symbols
that turn out to be undefined, and, as a bonus, handles them correctly
such that their address does map to NULL for all threads.  I know
people shouldn't rely on this, since the linker may very well break
it, but hey, since I was already tweaking the code to avoid crashes,
why not take the final step and make it match a random person's
expectations as closely as possible? :-)
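
To illustrate what that last bit means for user code, here's a sketch
(not part of the patch; `optional_tls' is a hypothetical undefined
weak TLS symbol, and the program is built with -mtls-dialect=gnu2):

/* With the fix, a lazily-resolved TLSDESC reference to an undefined
   weak TLS symbol no longer crashes the resolver, and the symbol's
   address compares equal to NULL in every thread.  */
extern __thread int optional_tls
  __attribute__ ((weak, tls_model ("global-dynamic")));

int
have_optional_tls (void)
{
  /* False in all threads while the symbol remains undefined.  */
  return &optional_tls != 0;
}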

Index: ChangeLog
from  Alexandre Oliva  <aoliva@redhat.com>

	Introduce TLS descriptors for i386 and x86_64.
	* elf/dl-reloc.c (_dl_try_allocate_static_tls): Extract from...
	(_dl_allocate_static_tls): ... here.  Rearrange failure path.
	(TRY_STATIC_TLS): New macro.
	* elf/dl-conflict.c (TRY_STATIC_TLS): Dummy define.
	* elf/elf.h (DT_TLSDESC_GOT, DT_TLSDESC_PLT): Define.
	(R_386_TLS_GOTDESC, R_386_TLS_DESC_CALL, R_386_TLS_DESC): Define.
	(R_X86_64_PC64, R_X86_64_GOTOFF64, R_X86_64_GOTPC32): Merge from
	binutils.
	(R_X86_64_GOTPC32_TLSDESC, R_X86_64_TLSDESC_CALL,
	R_X86_64_TLSDESC): Define.
	(R_386_NUM, R_X86_64_NUM): Adjust.
	* sysdeps/i386/Makefile (sysdep-dl-routines, sysdep_routines,
	sysdep-rtld-routines): Add tlsdesc and dl-tlsdesc for elf subdir.
	(gen-as-const-headers): Add tlsdesc.sym to csu subdir.
	* sysdeps/i386/dl-lookupcfg.h: New file.  Introduce _dl_unmap to
	release tlsdesc_table.
	* sysdeps/i386/dl-machine.h: Include dl-tlsdesc.h.
	(elf_machine_type_class): Mark R_386_TLS_DESC as PLT class.
	(elf_machine_rel): Handle R_386_TLS_DESC.
	(elf_machine_rela): Likewise.
	(elf_machine_lazy_rel): Likewise.
	(elf_machine_lazy_rela): Likewise.
	* sysdeps/i386/dl-tls.h (struct dl_tls_index): Name it.
	* sysdeps/i386/dl-tlsdesc.S: New file.
	* sysdeps/i386/dl-tlsdesc.h: New file.
	* sysdeps/i386/tlsdesc.c: New file.
	* sysdeps/i386/tlsdesc.sym: New file.
	* sysdeps/i386/bits/linkmap.h (struct link_map_machine): Add
	tlsdesc_table.
	* sysdeps/x86_64/Makefile (sysdep-dl-routines, sysdep_routines,
	sysdep-rtld-routines): Add tlsdesc and dl-tlsdesc for elf subdir.
	(gen-as-const-headers): Add tlsdesc.sym to csu subdir.
	* sysdeps/x86_64/dl-lookupcfg.h: New file.  Introduce _dl_unmap to
	release tlsdesc_table.
	* sysdeps/x86_64/dl-machine.h: Include dl-tlsdesc.h.
	(elf_machine_runtime_setup): Set up lazy TLSDESC GOT entry.
	(elf_machine_type_class): Mark R_X86_64_TLSDESC as PLT class.
	(elf_machine_rela): Handle R_X86_64_TLSDESC.
	(elf_machine_lazy_rel): Likewise.
	* sysdeps/x86_64/dl-tls.h (struct dl_tls_index): Name it.
	(__tls_get_addr): Do not declare for non-shared compiles.
	* sysdeps/x86_64/dl-tlsdesc.S: New file.
	* sysdeps/x86_64/dl-tlsdesc.h: New file.
	* sysdeps/x86_64/tlsdesc.c: New file.
	* sysdeps/x86_64/tlsdesc.sym: New file.
	* sysdeps/x86_64/bits/linkmap.h (struct link_map_machine): Add
	tlsdesc_table for both 32- and 64-bit structs.
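
For reviewers who haven't seen the FR-V version: the central data
structure is the two-word descriptor that this patch stores in the
GOT slot where a tls_index pair would otherwise live (see the new
dl-tlsdesc.h files below).  Schematically, in C -- a sketch of the
contract, not patch code; the i386 variant additionally passes the
argument in a register via regparm(1):

/* An (entry, arg) pair: the dynamic linker installs a resolver or
   one of the fast-path stubs in `entry', and `arg' holds whatever
   that entry point needs -- a precomputed offset, a pointer to
   dynamic-TLS data, or a pointer back to the relocation while it
   is still unresolved.  */
struct tlsdesc
{
  ptrdiff_t (*entry) (struct tlsdesc *);
  void *arg;
};

/* What a -mtls-dialect=gnu2 access amounts to, conceptually: call
   the descriptor with itself as the argument and add the returned
   thread-pointer-relative offset to the thread pointer.  The real
   sequence is emitted by the compiler and marked with
   R_386_TLS_DESC_CALL or R_X86_64_TLSDESC_CALL for relaxation.  */
static inline void *
tls_address (struct tlsdesc *td, char *thread_pointer)
{
  return thread_pointer + td->entry (td);
}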

Index: elf/dl-conflict.c
===================================================================
--- elf/dl-conflict.c.orig
+++ elf/dl-conflict.c
@@ -1,5 +1,5 @@
 /* Resolve conflicts against already prelinked libraries.
-   Copyright (C) 2001, 2002, 2003, 2004 Free Software Foundation, Inc.
+   Copyright (C) 2001, 2002, 2003, 2004, 2005 Free Software Foundation, Inc.
    This file is part of the GNU C Library.
    Contributed by Jakub Jelinek <jakub@redhat.com>, 2001.
 
@@ -45,6 +45,7 @@ _dl_resolve_conflicts (struct link_map *
 #define RESOLVE_MAP(ref, version, flags) (*ref = NULL, NULL)
 #define RESOLVE(ref, version, flags) (*ref = NULL, 0)
 #define CHECK_STATIC_TLS(ref_map, sym_map) ((void) 0)
+#define TRY_STATIC_TLS(ref_map, sym_map) (0)
 #define RESOLVE_CONFLICT_FIND_MAP(map, r_offset) \
   do {									      \
     while ((resolve_conflict_map->l_map_end < (ElfW(Addr)) (r_offset))	      \
Index: elf/dl-reloc.c
===================================================================
--- elf/dl-reloc.c.orig
+++ elf/dl-reloc.c
@@ -44,9 +44,9 @@
    This function intentionally does not return any value but signals error
    directly, as static TLS should be rare and code handling it should
    not be inlined as much as possible.  */
-void
-internal_function __attribute_noinline__
-_dl_allocate_static_tls (struct link_map *map)
+int
+internal_function
+_dl_try_allocate_static_tls (struct link_map *map)
 {
   /* If we've already used the variable with dynamic access, or if the
      alignment requirements are too high, fail.  */
@@ -54,8 +54,7 @@ _dl_allocate_static_tls (struct link_map
       || map->l_tls_align > GL(dl_tls_static_align))
     {
     fail:
-      _dl_signal_error (0, map->l_name, NULL, N_("\
-cannot allocate memory in static TLS block"));
+      return -1;
     }
 
 # if TLS_TCB_AT_TP
@@ -109,6 +108,20 @@ cannot allocate memory in static TLS blo
     }
   else
     map->l_need_tls_init = 1;
+
+  return 0;
+}
+
+void
+internal_function __attribute_noinline__
+_dl_allocate_static_tls (struct link_map *map)
+{
+  if (map->l_tls_offset == FORCED_DYNAMIC_TLS_OFFSET
+      || _dl_try_allocate_static_tls (map))
+    {
+      _dl_signal_error (0, map->l_name, NULL, N_("\
+cannot allocate memory in static TLS block"));
+    }
 }
 
 /* Initialize static TLS area and DTV for current (only) thread.
@@ -267,6 +280,12 @@ _dl_relocate_object (struct link_map *l,
 	_dl_allocate_static_tls (sym_map);				\
     } while (0)
 
+#define TRY_STATIC_TLS(map, sym_map)					\
+    (__builtin_expect ((sym_map)->l_tls_offset				\
+		       != FORCED_DYNAMIC_TLS_OFFSET, 1)			\
+     && (__builtin_expect ((sym_map)->l_tls_offset != NO_TLS_OFFSET, 1)	\
+	 || _dl_try_allocate_static_tls (sym_map) == 0))
+
 #include "dynamic-link.h"
 
     ELF_DYNAMIC_RELOCATE (l, lazy, consider_profiling);
Index: elf/elf.h
===================================================================
--- elf/elf.h.orig
+++ elf/elf.h
@@ -699,6 +699,12 @@ typedef struct
    If any adjustment is made to the ELF object after it has been
    built these entries will need to be adjusted.  */
 #define DT_ADDRRNGLO	0x6ffffe00
+#define DT_TLSDESC_PLT	0x6ffffef6	/* Location of PLT entry for
+					   TLS descriptor resolver
+					   calls.  */
+#define DT_TLSDESC_GOT	0x6ffffef7	/* Location of GOT entry used
+					   by TLS descriptor resolver
+					   PLT entry.  */
 #define DT_GNU_CONFLICT	0x6ffffef8	/* Start of conflict section */
 #define DT_GNU_LIBLIST	0x6ffffef9	/* Library list */
 #define DT_CONFIG	0x6ffffefa	/* Configuration information.  */
@@ -1136,8 +1142,17 @@ typedef struct
 #define R_386_TLS_DTPMOD32 35		/* ID of module containing symbol */
 #define R_386_TLS_DTPOFF32 36		/* Offset in TLS block */
 #define R_386_TLS_TPOFF32  37		/* Negated offset in static TLS block */
+/* 38? */
+#define R_386_TLS_GOTDESC  39		/* GOT offset for TLS descriptor.  */
+#define R_386_TLS_DESC_CALL 40		/* Marker of call through TLS
+					   descriptor for
+					   relaxation.  */
+#define R_386_TLS_DESC     41		/* TLS descriptor containing
+					   pointer to code and to
+					   argument, returning the TLS
+					   offset for the symbol.  */
 /* Keep this the last entry.  */
-#define R_386_NUM	   38
+#define R_386_NUM	   42
 
 /* SUN SPARC specific definitions.  */
 
@@ -2496,8 +2511,17 @@ typedef Elf32_Addr Elf32_Conflict;
 #define R_X86_64_GOTTPOFF	22	/* 32 bit signed PC relative offset
 					   to GOT entry for IE symbol */
 #define R_X86_64_TPOFF32	23	/* Offset in initial TLS block */
+#define R_X86_64_PC64		24	/* PC relative 64 bit */
+#define R_X86_64_GOTOFF64	25	/* 64 bit offset to GOT */
+#define R_X86_64_GOTPC32	26	/* 32 bit signed pc relative
+					   offset to GOT */
+/* 27 .. 33 */
+#define R_X86_64_GOTPC32_TLSDESC 34	/* GOT offset for TLS descriptor.  */
+#define R_X86_64_TLSDESC_CALL   35	/* Marker for call through TLS
+					   descriptor.  */
+#define R_X86_64_TLSDESC        36	/* TLS descriptor.  */
 
-#define R_X86_64_NUM		24
+#define R_X86_64_NUM		37
 
 
 /* AM33 relocations.  */
Index: sysdeps/i386/Makefile
===================================================================
--- sysdeps/i386/Makefile.orig
+++ sysdeps/i386/Makefile
@@ -65,3 +65,13 @@ endif
 ifneq (,$(filter -mno-tls-direct-seg-refs,$(CFLAGS)))
 defines += -DNO_TLS_DIRECT_SEG_REFS
 endif
+
+ifeq ($(subdir),elf)
+sysdep-dl-routines += tlsdesc dl-tlsdesc
+sysdep_routines += tlsdesc dl-tlsdesc
+sysdep-rtld-routines += tlsdesc dl-tlsdesc
+endif
+
+ifeq ($(subdir),csu)
+gen-as-const-headers += tlsdesc.sym
+endif
Index: sysdeps/i386/bits/linkmap.h
===================================================================
--- sysdeps/i386/bits/linkmap.h.orig
+++ sysdeps/i386/bits/linkmap.h
@@ -2,4 +2,5 @@ struct link_map_machine
   {
     Elf32_Addr plt; /* Address of .plt + 0x16 */
     Elf32_Addr gotplt; /* Address of .got + 0x0c */
+    void *tlsdesc_table; /* Address of TLS descriptor hash table.  */
   };
Index: sysdeps/i386/dl-lookupcfg.h
===================================================================
--- /dev/null
+++ sysdeps/i386/dl-lookupcfg.h
@@ -0,0 +1,28 @@
+/* Configuration of lookup functions.
+   Copyright (C) 2005 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, write to the Free
+   Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA
+   02111-1307 USA.  */
+
+#define DL_UNMAP_IS_SPECIAL
+
+#include_next <dl-lookupcfg.h>
+
+struct link_map;
+
+extern void _dl_unmap (struct link_map *map);
+
+#define DL_UNMAP(map) _dl_unmap (map)
Index: sysdeps/i386/dl-machine.h
===================================================================
--- sysdeps/i386/dl-machine.h.orig
+++ sysdeps/i386/dl-machine.h
@@ -25,6 +25,7 @@
 #include <sys/param.h>
 #include <sysdep.h>
 #include <tls.h>
+#include <dl-tlsdesc.h>
 
 /* Return nonzero iff ELF header is compatible with the running host.  */
 static inline int __attribute__ ((unused))
@@ -248,7 +249,7 @@ _dl_start_user:\n\
 # define elf_machine_type_class(type) \
   ((((type) == R_386_JMP_SLOT || (type) == R_386_TLS_DTPMOD32		      \
      || (type) == R_386_TLS_DTPOFF32 || (type) == R_386_TLS_TPOFF32	      \
-     || (type) == R_386_TLS_TPOFF)					      \
+     || (type) == R_386_TLS_TPOFF || (type) == R_386_TLS_DESC)		      \
     * ELF_RTYPE_CLASS_PLT)						      \
    | (((type) == R_386_COPY) * ELF_RTYPE_CLASS_COPY))
 #else
@@ -375,6 +376,38 @@ elf_machine_rel (struct link_map *map, c
 	    *reloc_addr = sym->st_value;
 # endif
 	  break;
+	case R_386_TLS_DESC:
+	  {
+	    struct tlsdesc volatile *td =
+	      (struct tlsdesc volatile *)reloc_addr;
+
+# ifndef RTLD_BOOTSTRAP
+	    if (! sym)
+	      td->entry = _dl_tlsdesc_undefweak;
+	    else
+# endif
+	      {
+# ifndef RTLD_BOOTSTRAP
+#  ifndef SHARED
+		CHECK_STATIC_TLS (map, sym_map);
+#  else
+		if (!TRY_STATIC_TLS (map, sym_map))
+		  {
+		    td->arg = _dl_make_tlsdesc_dynamic
+		      (sym_map, sym->st_value + (ElfW(Word))td->arg);
+		    td->entry = _dl_tlsdesc_dynamic;
+		  }
+		else
+#  endif
+# endif
+		  {
+		    td->arg = (void*)(sym->st_value - sym_map->l_tls_offset
+				      + (ElfW(Word))td->arg);
+		    td->entry = _dl_tlsdesc_return;
+		  }
+	      }
+	    break;
+	  }
 	case R_386_TLS_TPOFF32:
 	  /* The offset is positive, backward from the thread pointer.  */
 # ifdef RTLD_BOOTSTRAP
@@ -488,6 +521,41 @@ elf_machine_rela (struct link_map *map, 
 	     Therefore the offset is already correct.  */
 	  *reloc_addr = (sym == NULL ? 0 : sym->st_value) + reloc->r_addend;
 	  break;
+	case R_386_TLS_DESC:
+	  {
+	    struct tlsdesc volatile *td =
+	      (struct tlsdesc volatile *)reloc_addr;
+
+# ifndef RTLD_BOOTSTRAP
+	    if (!sym)
+	      {
+		td->arg = (void*)reloc->r_addend;
+		td->entry = _dl_tlsdesc_undefweak;
+	      }
+	    else
+# endif
+	      {
+# ifndef RTLD_BOOTSTRAP
+#  ifndef SHARED
+		CHECK_STATIC_TLS (map, sym_map);
+#  else
+		if (!TRY_STATIC_TLS (map, sym_map))
+		  {
+		    td->arg = _dl_make_tlsdesc_dynamic
+		      (sym_map, sym->st_value + reloc->r_addend);
+		    td->entry = _dl_tlsdesc_dynamic;
+		  }
+		else
+#  endif
+# endif
+		  {
+		    td->arg = (void*)(sym->st_value - sym_map->l_tls_offset
+				      + reloc->r_addend);
+		    td->entry = _dl_tlsdesc_return;
+		  }
+	      }
+	  }
+	  break;
 	case R_386_TLS_TPOFF32:
 	  /* The offset is positive, backward from the thread pointer.  */
 	  /* We know the offset of object the symbol is contained in.
@@ -582,6 +650,55 @@ elf_machine_lazy_rel (struct link_map *m
 	*reloc_addr = (map->l_mach.plt
 		       + (((Elf32_Addr) reloc_addr) - map->l_mach.gotplt) * 4);
     }
+#ifdef USE_TLS
+  else if (__builtin_expect (r_type == R_386_TLS_DESC, 1))
+    {
+      struct tlsdesc volatile * __attribute__((__unused__)) td =
+	(struct tlsdesc volatile *)reloc_addr;
+
+      /* Handle relocations that reference the local *ABS* in a simple
+	 way, so as to preserve a potential addend.  */
+      if (ELF32_R_SYM (reloc->r_info) == 0)
+	td->entry = _dl_tlsdesc_resolve_abs_plus_addend;
+      /* Given a known-zero addend, we can store a pointer to the
+	 reloc in the arg position.  */
+      else if (td->arg == 0)
+	{
+	  td->arg = (void*)reloc;
+	  td->entry = _dl_tlsdesc_resolve_rel;
+	}
+      else
+	{
+	  /* We could handle non-*ABS* relocations with non-zero addends
+	     by allocating dynamically an arg to hold a pointer to the
+	     reloc, but that sounds pointless.  */
+	  const Elf32_Rel *const r = reloc;
+	  /* The code below was borrowed from elf_dynamic_do_rel().  */
+	  const ElfW(Sym) *const symtab =
+	    (const void *) D_PTR (map, l_info[DT_SYMTAB]);
+
+#ifdef RTLD_BOOTSTRAP
+	  /* The dynamic linker always uses versioning.  */
+	  assert (map->l_info[VERSYMIDX (DT_VERSYM)] != NULL);
+#else
+	  if (map->l_info[VERSYMIDX (DT_VERSYM)])
+#endif
+	    {
+	      const ElfW(Half) *const version =
+		(const void *) D_PTR (map, l_info[VERSYMIDX (DT_VERSYM)]);
+	      ElfW(Half) ndx = version[ELFW(R_SYM) (r->r_info)] & 0x7fff;
+	      elf_machine_rel (map, r, &symtab[ELFW(R_SYM) (r->r_info)],
+			       &map->l_versions[ndx],
+			       (void *) (l_addr + r->r_offset));
+	    }
+#ifndef RTLD_BOOTSTRAP
+	  else
+	    elf_machine_rel (map, r, &symtab[ELFW(R_SYM) (r->r_info)], NULL,
+			     (void *) (l_addr + r->r_offset));
+#endif
+	}
+    }
+#endif
   else
     _dl_reloc_bad_type (map, r_type, 1);
 }
@@ -593,6 +710,22 @@ __attribute__ ((always_inline))
 elf_machine_lazy_rela (struct link_map *map,
 		       Elf32_Addr l_addr, const Elf32_Rela *reloc)
 {
+#ifdef USE_TLS
+  Elf32_Addr *const reloc_addr = (void *) (l_addr + reloc->r_offset);
+  const unsigned int r_type = ELF32_R_TYPE (reloc->r_info);
+  if (__builtin_expect (r_type == R_386_JMP_SLOT, 1))
+    ;
+  else if (__builtin_expect (r_type == R_386_TLS_DESC, 1))
+    {
+      struct tlsdesc volatile * __attribute__((__unused__)) td =
+	(struct tlsdesc volatile *)reloc_addr;
+
+      td->arg = (void*)reloc;
+      td->entry = _dl_tlsdesc_resolve_rela;
+    }
+  else
+    _dl_reloc_bad_type (map, r_type, 1);
+#endif
 }
 
 #endif	/* !RTLD_BOOTSTRAP */
Index: sysdeps/i386/dl-tls.h
===================================================================
--- sysdeps/i386/dl-tls.h.orig
+++ sysdeps/i386/dl-tls.h
@@ -19,7 +19,7 @@
 
 
 /* Type used for the representation of TLS information in the GOT.  */
-typedef struct
+typedef struct dl_tls_index
 {
   unsigned long int ti_module;
   unsigned long int ti_offset;
Index: sysdeps/i386/dl-tlsdesc.S
===================================================================
--- /dev/null
+++ sysdeps/i386/dl-tlsdesc.S
@@ -0,0 +1,228 @@
+/* Thread-local storage handling in the ELF dynamic linker.  i386 version.
+   Copyright (C) 2004, 2005 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, write to the Free
+   Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA
+   02111-1307 USA.  */
+
+#include <sysdep.h>
+#include <tls.h>
+#include "tlsdesc.h"
+
+	.text
+#ifdef USE_TLS
+	.hidden _dl_tlsdesc_return
+	.global	_dl_tlsdesc_return
+	.type	_dl_tlsdesc_return,@function
+	cfi_startproc
+	.align 16
+_dl_tlsdesc_return:
+	movl	4(%eax), %eax
+	ret
+	cfi_endproc
+	.size	_dl_tlsdesc_return, .-_dl_tlsdesc_return
+
+	.hidden _dl_tlsdesc_undefweak
+	.global	_dl_tlsdesc_undefweak
+	.type	_dl_tlsdesc_undefweak,@function
+	cfi_startproc
+	.align 16
+_dl_tlsdesc_undefweak:
+	movl	4(%eax), %eax
+	subl	%gs:0, %eax
+	ret
+	cfi_endproc
+	.size	_dl_tlsdesc_undefweak, .-_dl_tlsdesc_undefweak
+
+#ifdef SHARED
+	.hidden _dl_tlsdesc_dynamic
+	.global	_dl_tlsdesc_dynamic
+	.type	_dl_tlsdesc_dynamic,@function
+
+     /* %eax points to the TLS descriptor, such that 0(%eax) points to
+	_dl_tlsdesc_dynamic itself, and 4(%eax) points to a struct
+	tlsdesc_dynamic_arg object.  It must return in %eax the offset
+	between the thread pointer and the object denoted by the
+	argument, without clobbering any registers.
+
+	The assembly code that follows is a rendition of the following
+	C code, hand-optimized a little bit.
+
+ptrdiff_t
+__attribute__ ((__regparm__ (1)))
+_dl_tlsdesc_dynamic (struct tlsdesc *tdp)
+{
+  struct tlsdesc_dynamic_arg *td = tdp->arg;
+  dtv_t *dtv = *(dtv_t **)((char *)__thread_pointer + DTV_OFFSET);
+  if (__builtin_expect (td->gen_count <= dtv[0].counter
+			&& (dtv[td->tlsinfo.ti_module].pointer.val
+			    != TLS_DTV_UNALLOCATED),
+			1))
+    return dtv[td->tlsinfo.ti_module].pointer.val + td->tlsinfo.ti_offset
+      - __thread_pointer;
+
+  return ___tls_get_addr (&td->tlsinfo) - __thread_pointer;
+}
+*/
+	cfi_startproc
+	.align 16
+_dl_tlsdesc_dynamic:
+	/* Preserve call-clobbered registers.
+	   We need two scratch regs anyway.
+	   FIXME: maybe remove the requirement to preserve them?  */
+	subl	$28, %esp
+	cfi_adjust_cfa_offset (28)
+	movl	%ecx, 20(%esp)
+	movl	%edx, 24(%esp)
+	movl	TLSDESC_ARG(%eax), %eax
+	movl	%gs:DTV_OFFSET, %edx
+	movl	TLSDESC_GEN_COUNT(%eax), %ecx
+	cmpl	(%edx), %ecx
+	ja	.Lslow
+	movl	TLSDESC_MODID(%eax), %ecx
+	movl	(%edx,%ecx,8), %edx
+	cmpl	$-1, %edx
+	je	.Lslow
+	movl	TLSDESC_MODOFF(%eax), %eax
+	addl	%edx, %eax
+.Lret:
+	movl	20(%esp), %ecx
+	subl	%gs:0, %eax
+	movl	24(%esp), %edx
+	addl	$28, %esp
+	cfi_adjust_cfa_offset (-28)
+	ret
+	.p2align 4,,7
+.Lslow:
+	cfi_adjust_cfa_offset (28)
+	movl	%ebx, 16(%esp)
+	call	__i686.get_pc_thunk.bx
+	addl	$_GLOBAL_OFFSET_TABLE_, %ebx
+	call	___tls_get_addr@PLT
+	movl	16(%esp), %ebx
+	jmp	.Lret
+	cfi_endproc
+	.size	_dl_tlsdesc_dynamic, .-_dl_tlsdesc_dynamic
+#endif /* SHARED */
+
+	.hidden _dl_tlsdesc_resolve_abs_plus_addend
+	.global	_dl_tlsdesc_resolve_abs_plus_addend
+	.type	_dl_tlsdesc_resolve_abs_plus_addend,@function
+	cfi_startproc
+	.align 16
+_dl_tlsdesc_resolve_abs_plus_addend:
+0:
+	pushl	%eax
+	cfi_adjust_cfa_offset (4)
+	pushl	%ecx
+	cfi_adjust_cfa_offset (4)
+	pushl	%edx
+	cfi_adjust_cfa_offset (4)
+	movl	$1f - 0b, %ecx
+	movl	4(%ebx), %edx
+	call	_dl_tlsdesc_resolve_abs_plus_addend_fixup
+1:
+	popl	%edx
+	cfi_adjust_cfa_offset (-4)
+	popl	%ecx
+	cfi_adjust_cfa_offset (-4)
+	popl	%eax
+	cfi_adjust_cfa_offset (-4)
+	jmp	*(%eax)
+	cfi_endproc
+	.size	_dl_tlsdesc_resolve_abs_plus_addend, .-_dl_tlsdesc_resolve_abs_plus_addend
+
+	.hidden _dl_tlsdesc_resolve_rel
+	.global	_dl_tlsdesc_resolve_rel
+	.type	_dl_tlsdesc_resolve_rel,@function
+	cfi_startproc
+	.align 16
+_dl_tlsdesc_resolve_rel:
+0:
+	pushl	%eax
+	cfi_adjust_cfa_offset (4)
+	pushl	%ecx
+	cfi_adjust_cfa_offset (4)
+	pushl	%edx
+	cfi_adjust_cfa_offset (4)
+	movl	$1f - 0b, %ecx
+	movl	4(%ebx), %edx
+	call	_dl_tlsdesc_resolve_rel_fixup
+1:
+	popl	%edx
+	cfi_adjust_cfa_offset (-4)
+	popl	%ecx
+	cfi_adjust_cfa_offset (-4)
+	popl	%eax
+	cfi_adjust_cfa_offset (-4)
+	jmp	*(%eax)
+	cfi_endproc
+	.size	_dl_tlsdesc_resolve_rel, .-_dl_tlsdesc_resolve_rel
+
+	.hidden _dl_tlsdesc_resolve_rela
+	.global	_dl_tlsdesc_resolve_rela
+	.type	_dl_tlsdesc_resolve_rela,@function
+	cfi_startproc
+	.align 16
+_dl_tlsdesc_resolve_rela:
+0:
+	pushl	%eax
+	cfi_adjust_cfa_offset (4)
+	pushl	%ecx
+	cfi_adjust_cfa_offset (4)
+	pushl	%edx
+	cfi_adjust_cfa_offset (4)
+	movl	$1f - 0b, %ecx
+	movl	4(%ebx), %edx
+	call	_dl_tlsdesc_resolve_rela_fixup
+1:
+	popl	%edx
+	cfi_adjust_cfa_offset (-4)
+	popl	%ecx
+	cfi_adjust_cfa_offset (-4)
+	popl	%eax
+	cfi_adjust_cfa_offset (-4)
+	jmp	*(%eax)
+	cfi_endproc
+	.size	_dl_tlsdesc_resolve_rela, .-_dl_tlsdesc_resolve_rela
+
+	.hidden _dl_tlsdesc_resolve_hold
+	.global	_dl_tlsdesc_resolve_hold
+	.type	_dl_tlsdesc_resolve_hold,@function
+	cfi_startproc
+	.align 16
+_dl_tlsdesc_resolve_hold:
+0:
+	pushl	%eax
+	cfi_adjust_cfa_offset (4)
+	pushl	%ecx
+	cfi_adjust_cfa_offset (4)
+	pushl	%edx
+	cfi_adjust_cfa_offset (4)
+	movl	$1f - 0b, %ecx
+	movl	4(%ebx), %edx
+	call	_dl_tlsdesc_resolve_hold_fixup
+1:
+	popl	%edx
+	cfi_adjust_cfa_offset (-4)
+	popl	%ecx
+	cfi_adjust_cfa_offset (-4)
+	popl	%eax
+	cfi_adjust_cfa_offset (-4)
+	jmp	*(%eax)
+	cfi_endproc
+	.size	_dl_tlsdesc_resolve_hold, .-_dl_tlsdesc_resolve_hold
+
+#endif /* USE_TLS */
Index: sysdeps/i386/dl-tlsdesc.h
===================================================================
--- /dev/null
+++ sysdeps/i386/dl-tlsdesc.h
@@ -0,0 +1,60 @@
+/* Thread-local storage descriptor handling in the ELF dynamic linker.
+   i386 version.
+   Copyright (C) 2005 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, write to the Free
+   Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA
+   02111-1307 USA.  */
+
+#ifndef _I386_DL_TLSDESC_H
+# define _I386_DL_TLSDESC_H 1
+
+/* Type used to represent a TLS descriptor in the GOT.  */
+struct tlsdesc
+{
+  ptrdiff_t __attribute__((regparm(1))) (*entry)(struct tlsdesc *);
+  void *arg;
+};
+
+typedef struct dl_tls_index
+{
+  unsigned long int ti_module;
+  unsigned long int ti_offset;
+} tls_index;
+
+/* Type used as the argument in a TLS descriptor for a symbol that
+   needs dynamic TLS offsets.  */
+struct tlsdesc_dynamic_arg
+{
+  tls_index tlsinfo;
+  size_t gen_count;
+};
+
+extern ptrdiff_t attribute_hidden __attribute__((regparm(1)))
+  _dl_tlsdesc_return(struct tlsdesc *),
+  _dl_tlsdesc_undefweak(struct tlsdesc *),
+  _dl_tlsdesc_resolve_abs_plus_addend(struct tlsdesc *),
+  _dl_tlsdesc_resolve_rel(struct tlsdesc *),
+  _dl_tlsdesc_resolve_rela(struct tlsdesc *),
+  _dl_tlsdesc_resolve_hold(struct tlsdesc *);
+
+# ifdef SHARED
+extern void *_dl_make_tlsdesc_dynamic (struct link_map *map, size_t ti_offset);
+
+extern ptrdiff_t attribute_hidden __attribute__((regparm(1)))
+  _dl_tlsdesc_dynamic(struct tlsdesc *);
+# endif
+
+#endif
Index: sysdeps/i386/tlsdesc.c
===================================================================
--- /dev/null
+++ sysdeps/i386/tlsdesc.c
@@ -0,0 +1,673 @@
+/* Manage TLS descriptors.  i386 version.
+   Copyright (C) 2005 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, write to the Free
+   Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA
+   02111-1307 USA.  */
+
+#include <link.h>
+#include <ldsodefs.h>
+#include <elf/dynamic-link.h>
+#include <tls.h>
+#include <dl-tlsdesc.h>
+
+#ifdef USE_TLS
+# ifdef SHARED
+
+extern void weak_function free (void *ptr);
+
+/* The hashcode handling code below is heavily inspired by libiberty's
+   hashtab code, but with most adaptation points and support for
+   deleting elements removed.
+
+   Copyright (C) 1999, 2000, 2001, 2002, 2003 Free Software Foundation, Inc.
+   Contributed by Vladimir Makarov (vmakarov@cygnus.com).  */
+
+inline static unsigned long
+higher_prime_number (unsigned long n)
+{
+  /* These are primes that are near, but slightly smaller than, a
+     power of two.  */
+  static const unsigned long primes[] = {
+    (unsigned long) 7,
+    (unsigned long) 13,
+    (unsigned long) 31,
+    (unsigned long) 61,
+    (unsigned long) 127,
+    (unsigned long) 251,
+    (unsigned long) 509,
+    (unsigned long) 1021,
+    (unsigned long) 2039,
+    (unsigned long) 4093,
+    (unsigned long) 8191,
+    (unsigned long) 16381,
+    (unsigned long) 32749,
+    (unsigned long) 65521,
+    (unsigned long) 131071,
+    (unsigned long) 262139,
+    (unsigned long) 524287,
+    (unsigned long) 1048573,
+    (unsigned long) 2097143,
+    (unsigned long) 4194301,
+    (unsigned long) 8388593,
+    (unsigned long) 16777213,
+    (unsigned long) 33554393,
+    (unsigned long) 67108859,
+    (unsigned long) 134217689,
+    (unsigned long) 268435399,
+    (unsigned long) 536870909,
+    (unsigned long) 1073741789,
+    (unsigned long) 2147483647,
+					/* 4294967291L */
+    ((unsigned long) 2147483647) + ((unsigned long) 2147483644),
+  };
+
+  const unsigned long *low = &primes[0];
+  const unsigned long *high = &primes[sizeof(primes) / sizeof(primes[0])];
+
+  while (low != high)
+    {
+      const unsigned long *mid = low + (high - low) / 2;
+      if (n > *mid)
+	low = mid + 1;
+      else
+	high = mid;
+    }
+
+#if 0
+  /* If we've run out of primes, abort.  */
+  if (n > *low)
+    {
+      fprintf (stderr, "Cannot find prime bigger than %lu\n", n);
+      abort ();
+    }
+#endif
+
+  return *low;
+}
+
+struct hashtab
+{
+  /* Table itself.  */
+  void **entries;
+
+  /* Current size (in entries) of the hash table */
+  size_t size;
+
+  /* Current number of elements.  */
+  size_t n_elements;
+};
+
+inline static struct hashtab *
+htab_create (void)
+{
+  struct hashtab *ht = malloc (sizeof (struct hashtab));
+
+  if (! ht)
+    return NULL;
+  ht->size = 3;
+  ht->entries = malloc (sizeof (void *) * ht->size);
+  if (! ht->entries)
+    return NULL;
+
+  ht->n_elements = 0;
+
+  memset (ht->entries, 0, sizeof (void *) * ht->size);
+
+  return ht;
+}
+
+/* This is only called from _dl_unmap, so it's safe to call
+   free().  See the discussion below.  */
+inline static void
+htab_delete (struct hashtab *htab)
+{
+  int i;
+
+  for (i = htab->size - 1; i >= 0; i--)
+    if (htab->entries[i])
+      free (htab->entries[i]);
+
+  free (htab->entries);
+  free (htab);
+}
+
+/* Similar to htab_find_slot, but without several unwanted side effects:
+    - Does not call htab->eq_f when it finds an existing entry.
+    - Does not change the count of elements/searches/collisions in the
+      hash table.
+   This function also assumes there are no deleted entries in the table.
+   HASH is the hash value for the element to be inserted.  */
+
+inline static void **
+find_empty_slot_for_expand (struct hashtab *htab, int hash)
+{
+  size_t size = htab->size;
+  unsigned int index = hash % size;
+  void **slot = htab->entries + index;
+  int hash2;
+
+  if (! *slot)
+    return slot;
+
+  hash2 = 1 + hash % (size - 2);
+  for (;;)
+    {
+      index += hash2;
+      if (index >= size)
+	index -= size;
+
+      slot = htab->entries + index;
+      if (! *slot)
+	return slot;
+    }
+}
+
+/* The following function changes size of memory allocated for the
+   entries and repeatedly inserts the table elements.  The occupancy
+   of the table after the call will be about 50%.  Naturally the hash
+   table must already exist.  Remember also that the place of the
+   table entries is changed.  If memory allocation failures are allowed,
+   this function will return zero, indicating that the table could not be
+   expanded.  If all goes well, it will return a non-zero value.  */
+
+inline static int
+htab_expand (struct hashtab *htab, int (*hash_fn)(void *))
+{
+  void **oentries;
+  void **olimit;
+  void **p;
+  void **nentries;
+  size_t nsize;
+
+  oentries = htab->entries;
+  olimit = oentries + htab->size;
+
+  /* Resize only when table after removal of unused elements is either
+     too full or too empty.  */
+  if (htab->n_elements * 2 > htab->size)
+    nsize = higher_prime_number (htab->n_elements * 2);
+  else
+    nsize = htab->size;
+
+  nentries = malloc (sizeof (void *) * nsize);
+  if (nentries == NULL)
+    return 0;
+  memset (nentries, 0, sizeof (void *) * nsize);
+  htab->entries = nentries;
+  htab->size = nsize;
+
+  p = oentries;
+  do
+    {
+      if (*p)
+	*find_empty_slot_for_expand (htab, hash_fn (*p))
+	  = *p;
+
+      p++;
+    }
+  while (p < olimit);
+
+#if 0 /* We can't tell whether this was allocated by the malloc()
+	 built into ld.so or the one in the main executable or libc,
+	 and calling free() for something that wasn't malloc()ed could
+	 do Very Bad Things (TM).  Take the conservative approach
+	 here, potentially wasting as much memory as actually used by
+	 the hash table, even if multiple growths occur.  That's not
+	 so bad as to require some overengineered solution that would
+	 enable us to keep track of how it was allocated. */
+  free (oentries);
+#endif
+  return 1;
+}
+
+/* This function searches for a hash table slot containing an entry
+   equal to the given element.  To delete an entry, call this with
+   INSERT = 0, then call htab_clear_slot on the slot returned (possibly
+   after doing some checks).  To insert an entry, call this with
+   INSERT = 1, then write the value you want into the returned slot.
+   When inserting an entry, NULL may be returned if memory allocation
+   fails.  */
+
+inline static void **
+htab_find_slot (struct hashtab *htab, void *ptr, int insert,
+		int (*hash_fn)(void *), int (*eq_fn)(void *, void *))
+{
+  unsigned int index;
+  int hash, hash2;
+  size_t size;
+  void **entry;
+
+  if (htab->size * 3 <= htab->n_elements * 4
+      && htab_expand (htab, hash_fn) == 0)
+    return NULL;
+
+  hash = hash_fn (ptr);
+
+  size = htab->size;
+  index = hash % size;
+
+  entry = &htab->entries[index];
+  if (!*entry)
+    goto empty_entry;
+  else if (eq_fn (*entry, ptr))
+    return entry;
+
+  hash2 = 1 + hash % (size - 2);
+  for (;;)
+    {
+      index += hash2;
+      if (index >= size)
+	index -= size;
+
+      entry = &htab->entries[index];
+      if (!*entry)
+	goto empty_entry;
+      else if (eq_fn (*entry, ptr))
+	return entry;
+    }
+
+ empty_entry:
+  if (!insert)
+    return NULL;
+
+  htab->n_elements++;
+  return entry;
+}
+
+inline static int
+hash_tlsdesc(void *p)
+{
+  struct tlsdesc_dynamic_arg *td = p;
+
+  /* We know all entries are for the same module, so ti_offset is the
+     only distinguishing entry.  */
+  return td->tlsinfo.ti_offset;
+}
+
+inline static int
+eq_tlsdesc(void *p, void *q)
+{
+  struct tlsdesc_dynamic_arg *tdp = p, *tdq = q;
+
+  return tdp->tlsinfo.ti_offset == tdq->tlsinfo.ti_offset;
+}
+
+inline static int
+map_generation (struct link_map *map)
+{
+  size_t idx = map->l_tls_modid;
+  struct dtv_slotinfo_list *listp = GL(dl_tls_dtv_slotinfo_list);
+
+  /* Find the place in the dtv slotinfo list.  */
+  do
+    {
+      /* Does it fit in the array of this list element?  */
+      if (idx < listp->len)
+	{
+	  /* We should never get here for a module in static TLS, so
+	     we can assume that, if the generation count is zero, we
+	     still haven't determined the generation count for this
+	     module.  */
+	  if (listp->slotinfo[idx].gen)
+	    return listp->slotinfo[idx].gen;
+	  else
+	    break;
+	}
+      idx -= listp->len;
+      listp = listp->next;
+    }
+  while (listp != NULL);
+
+  /* If we get to this point, the module still hasn't been assigned an
+     entry in the dtv slotinfo data structures, and it will when we're
+     done with relocations.  At that point, the module will get a
+     generation number that is one past the current generation, so
+     return exactly that.  */
+  return GL(dl_tls_generation) + 1;
+}
+
+void *
+_dl_make_tlsdesc_dynamic (struct link_map *map, size_t ti_offset)
+{
+  struct hashtab *ht;
+  void **entry;
+  struct tlsdesc_dynamic_arg *td, test;
+
+  /* FIXME: We could use a per-map lock here, but is it worth it?  */
+  __rtld_lock_lock_recursive (GL(dl_load_lock));
+
+  ht = map->l_mach.tlsdesc_table;
+  if (! ht)
+    {
+      ht = htab_create ();
+      if (! ht)
+	{
+	  __rtld_lock_unlock_recursive (GL(dl_load_lock));
+	  return 0;
+	}
+      map->l_mach.tlsdesc_table = ht;
+    }
+
+  test.tlsinfo.ti_module = map->l_tls_modid;
+  test.tlsinfo.ti_offset = ti_offset;
+  entry = htab_find_slot (ht, &test, 1, hash_tlsdesc, eq_tlsdesc);
+  if (*entry)
+    {
+      td = *entry;
+      __rtld_lock_unlock_recursive (GL(dl_load_lock));
+      return td;
+    }
+
+  *entry = td = malloc (sizeof (struct tlsdesc_dynamic_arg));
+  /* This may be higher than the map's generation, but it doesn't
+     matter much.  Worst case, we'll have one extra DTV update per
+     thread.  */
+  td->gen_count = map_generation (map);
+  td->tlsinfo = test.tlsinfo;
+
+  __rtld_lock_unlock_recursive (GL(dl_load_lock));
+  return td;
+}
+
+# endif /* SHARED */
+
+/* The idea of the following two functions is to stop multiple threads
+   from attempting to resolve the same TLS descriptor without busy
+   waiting.  Ideally, we should be able to release the lock right
+   after changing td->entry, and then using say a condition variable
+   or a futex wake to wake up any waiting threads, but let's try to
+   avoid introducing such dependencies.  */
+
+inline static int
+_dl_tlsdesc_resolve_early_return_p (struct tlsdesc volatile *td, void *caller)
+{
+  if (caller != td->entry)
+    return 1;
+
+  __rtld_lock_lock_recursive (GL(dl_load_lock));
+  if (caller != td->entry)
+    {
+      __rtld_lock_unlock_recursive (GL(dl_load_lock));
+      return 1;
+    }
+
+  td->entry = _dl_tlsdesc_resolve_hold;
+
+  return 0;
+}
+
+inline static void
+_dl_tlsdesc_wake_up_held_fixups (void)
+{
+  __rtld_lock_unlock_recursive (GL(dl_load_lock));
+}
+
+/* The following 4 functions take an entry_check_offset argument.
+   It's computed by the caller as an offset between its entry point
+   and the call site, such that by adding the built-in return address
+   that is implicitly passed to the function with this offset, we can
+   easily obtain the caller's entry point to compare with the entry
+   point given in the TLS descriptor.  If it's changed, we want to
+   return immediately.  */
+
+/* These macros are copied from elf/dl-reloc.c */
+
+#define CHECK_STATIC_TLS(map, sym_map)					\
+    do {								\
+      if (__builtin_expect ((sym_map)->l_tls_offset == NO_TLS_OFFSET	\
+			    || ((sym_map)->l_tls_offset			\
+				== FORCED_DYNAMIC_TLS_OFFSET), 0))	\
+	_dl_allocate_static_tls (sym_map);				\
+    } while (0)
+
+#define TRY_STATIC_TLS(map, sym_map)					\
+    (__builtin_expect ((sym_map)->l_tls_offset				\
+		       != FORCED_DYNAMIC_TLS_OFFSET, 1)			\
+     && (__builtin_expect ((sym_map)->l_tls_offset != NO_TLS_OFFSET, 1)	\
+	 || _dl_try_allocate_static_tls (sym_map) == 0))
+
+int internal_function _dl_try_allocate_static_tls (struct link_map *map);
+
+/* This function is used to lazily resolve TLS_DESC REL relocations
+   that reference the *ABS* segment in their own link maps.  The
+   argument is the addend originally stored there.  */
+
+void
+__attribute__ ((regparm (3))) attribute_hidden
+_dl_tlsdesc_resolve_abs_plus_addend_fixup (struct tlsdesc volatile *td,
+					   struct link_map *l,
+					   ptrdiff_t entry_check_offset)
+{
+  ptrdiff_t addend = (ptrdiff_t) td->arg;
+
+  if (_dl_tlsdesc_resolve_early_return_p (td, __builtin_return_address (0)
+					  - entry_check_offset))
+    return;
+
+#ifndef SHARED
+  CHECK_STATIC_TLS (l, l);
+#else
+  if (!TRY_STATIC_TLS (l, l))
+    {
+      td->arg = _dl_make_tlsdesc_dynamic (l, addend);
+      td->entry = _dl_tlsdesc_dynamic;
+    }
+  else
+#endif
+    {
+      td->arg = (void*)(addend - l->l_tls_offset);
+      td->entry = _dl_tlsdesc_return;
+    }
+
+  _dl_tlsdesc_wake_up_held_fixups ();
+}
+
+/* This function is used to lazily resolve TLS_DESC REL relocations
+   that originally had zero addends.  The argument location, that
+   originally held the addend, is used to hold a pointer to the
+   relocation, but it has to be restored before we call the function
+   that applies relocations.  */
+
+void
+__attribute__ ((regparm (3))) attribute_hidden
+_dl_tlsdesc_resolve_rel_fixup (struct tlsdesc volatile *td,
+			       struct link_map *l,
+			       ptrdiff_t entry_check_offset)
+{
+  const ElfW(Rel) *reloc = td->arg;
+
+  if (_dl_tlsdesc_resolve_early_return_p (td, __builtin_return_address (0)
+					  - entry_check_offset))
+    return;
+
+  /* The code below was borrowed from _dl_fixup(),
+     except for checking for STB_LOCAL.  */
+  const ElfW(Sym) *const symtab
+    = (const void *) D_PTR (l, l_info[DT_SYMTAB]);
+  const char *strtab = (const void *) D_PTR (l, l_info[DT_STRTAB]);
+  const ElfW(Sym) *sym = &symtab[ELFW(R_SYM) (reloc->r_info)];
+  lookup_t result;
+
+   /* Look up the target symbol.  If the normal lookup rules are not
+      used don't look in the global scope.  */
+  if (ELFW(ST_BIND) (sym->st_info) != STB_LOCAL
+      && __builtin_expect (ELFW(ST_VISIBILITY) (sym->st_other), 0) == 0)
+    {
+      const struct r_found_version *version = NULL;
+
+      if (l->l_info[VERSYMIDX (DT_VERSYM)] != NULL)
+	{
+	  const ElfW(Half) *vernum =
+	    (const void *) D_PTR (l, l_info[VERSYMIDX (DT_VERSYM)]);
+	  ElfW(Half) ndx = vernum[ELFW(R_SYM) (reloc->r_info)] & 0x7fff;
+	  version = &l->l_versions[ndx];
+	  if (version->hash == 0)
+	    version = NULL;
+	}
+
+      result = _dl_lookup_symbol_x (strtab + sym->st_name, l, &sym,
+				    l->l_scope, version, ELF_RTYPE_CLASS_PLT,
+				    DL_LOOKUP_ADD_DEPENDENCY, NULL);
+    }
+  else
+    {
+      /* We already found the symbol.  The module (and therefore its load
+	 address) is also known.  */
+      result = l;
+    }
+
+  if (!sym)
+    {
+      td->arg = 0;
+      td->entry = _dl_tlsdesc_undefweak;
+    }
+  else
+    {
+#  ifndef SHARED
+      CHECK_STATIC_TLS (l, result);
+#  else
+      if (!TRY_STATIC_TLS (l, result))
+	{
+	  td->arg = _dl_make_tlsdesc_dynamic (result, sym->st_value);
+	  td->entry = _dl_tlsdesc_dynamic;
+	}
+      else
+#  endif
+	{
+	  td->arg = (void*)(sym->st_value - result->l_tls_offset);
+	  td->entry = _dl_tlsdesc_return;
+	}
+    }
+
+  _dl_tlsdesc_wake_up_held_fixups ();
+}
+
+/* This function is used to lazily resolve TLS_DESC RELA relocations.
+   The argument location is used to hold a pointer to the relocation.  */
+
+void
+__attribute__ ((regparm (3))) attribute_hidden
+_dl_tlsdesc_resolve_rela_fixup (struct tlsdesc volatile *td,
+				struct link_map *l,
+				ptrdiff_t entry_check_offset)
+{
+  const ElfW(Rela) *reloc = td->arg;
+
+  if (_dl_tlsdesc_resolve_early_return_p (td, __builtin_return_address (0)
+					  - entry_check_offset))
+    return;
+
+  /* The code below was borrowed from _dl_fixup(),
+     except for checking for STB_LOCAL.  */
+  const ElfW(Sym) *const symtab
+    = (const void *) D_PTR (l, l_info[DT_SYMTAB]);
+  const char *strtab = (const void *) D_PTR (l, l_info[DT_STRTAB]);
+  const ElfW(Sym) *sym = &symtab[ELFW(R_SYM) (reloc->r_info)];
+  lookup_t result;
+
+   /* Look up the target symbol.  If the normal lookup rules are not
+      used don't look in the global scope.  */
+  if (ELFW(ST_BIND) (sym->st_info) != STB_LOCAL
+      && __builtin_expect (ELFW(ST_VISIBILITY) (sym->st_other), 0) == 0)
+    {
+      const struct r_found_version *version = NULL;
+
+      if (l->l_info[VERSYMIDX (DT_VERSYM)] != NULL)
+	{
+	  const ElfW(Half) *vernum =
+	    (const void *) D_PTR (l, l_info[VERSYMIDX (DT_VERSYM)]);
+	  ElfW(Half) ndx = vernum[ELFW(R_SYM) (reloc->r_info)] & 0x7fff;
+	  version = &l->l_versions[ndx];
+	  if (version->hash == 0)
+	    version = NULL;
+	}
+
+      result = _dl_lookup_symbol_x (strtab + sym->st_name, l, &sym,
+				    l->l_scope, version, ELF_RTYPE_CLASS_PLT,
+				    DL_LOOKUP_ADD_DEPENDENCY, NULL);
+    }
+  else
+    {
+      /* We already found the symbol.  The module (and therefore its load
+	 address) is also known.  */
+      result = l;
+    }
+
+  if (!sym)
+    {
+      td->arg = (void*)reloc->r_addend;
+      td->entry = _dl_tlsdesc_undefweak;
+    }
+  else
+    {
+#  ifndef SHARED
+      CHECK_STATIC_TLS (l, result);
+#  else
+      if (!TRY_STATIC_TLS (l, result))
+	{
+	  td->arg = _dl_make_tlsdesc_dynamic (result, sym->st_value
+					      + reloc->r_addend);
+	  td->entry = _dl_tlsdesc_dynamic;
+	}
+      else
+#  endif
+	{
+	  td->arg = (void*)(sym->st_value - result->l_tls_offset
+			    + reloc->r_addend);
+	  td->entry = _dl_tlsdesc_return;
+	}
+    }
+
+  _dl_tlsdesc_wake_up_held_fixups ();
+}
+
+void
+__attribute__ ((regparm (3))) attribute_hidden
+_dl_tlsdesc_resolve_hold_fixup (struct tlsdesc volatile *td,
+				struct link_map *l __attribute__((__unused__)),
+				ptrdiff_t entry_check_offset)
+{
+  /* Maybe we're lucky and can return early.  */
+  if (__builtin_return_address (0) - entry_check_offset != td->entry)
+    return;
+
+  /* Locking here will stop execution until the running resolver runs
+     _dl_tlsdesc_wake_up_held_fixups(), releasing the lock.
+
+     FIXME: We'd be better off waiting on a condition variable, such
+     that we didn't have to hold the lock throughout the relocation
+     processing.  */
+  __rtld_lock_lock_recursive (GL(dl_load_lock));
+  __rtld_lock_unlock_recursive (GL(dl_load_lock));
+}
+
+#endif /* USE_TLS */
+
+void
+_dl_unmap (struct link_map *map)
+{
+  __munmap ((void *) (map)->l_map_start,
+	    (map)->l_map_end - (map)->l_map_start);
+
+#if USE_TLS && SHARED
+  /* _dl_unmap is only called for dlopen()ed libraries, for which
+     calling free() is safe, or before we've completed the initial
+     relocation, in which case calling free() is probably pointless,
+     but still safe.  */
+  if (map->l_mach.tlsdesc_table)
+    htab_delete (map->l_mach.tlsdesc_table);
+#endif
+}
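
In case the locking protocol above isn't obvious from the fixup
functions: the first thread into a resolver parks latecomers on the
hold entry point until the descriptor is final.  A simplified model,
with a pthread mutex standing in for GL(dl_load_lock) and the
return-address entry check elided (again, a sketch, not patch code):

#include <pthread.h>
#include <stddef.h>

struct desc
{
  ptrdiff_t (*entry) (struct desc *);
  void *arg;
};

static pthread_mutex_t load_lock = PTHREAD_MUTEX_INITIALIZER;

static ptrdiff_t resolve (struct desc *);
static ptrdiff_t hold (struct desc *);

/* Fast path once resolved: the offset was stored in arg.  */
static ptrdiff_t
final (struct desc *td)
{
  return (ptrdiff_t) td->arg;
}

static ptrdiff_t
resolve (struct desc *td)
{
  pthread_mutex_lock (&load_lock);
  if (td->entry != resolve)
    {
      /* Another thread resolved the descriptor while we waited.  */
      pthread_mutex_unlock (&load_lock);
      return td->entry (td);
    }
  td->entry = hold;		/* Park any latecomers.  */
  /* ... the actual symbol lookup would happen here ... */
  td->arg = (void *) 42;	/* Stand-in for the computed offset.  */
  td->entry = final;
  pthread_mutex_unlock (&load_lock);
  return final (td);
}

/* Latecomers block on the lock until the resolver releases it,
   then redispatch through the now-final entry.  */
static ptrdiff_t
hold (struct desc *td)
{
  pthread_mutex_lock (&load_lock);
  pthread_mutex_unlock (&load_lock);
  return td->entry (td);
}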
Index: sysdeps/i386/tlsdesc.sym
===================================================================
--- /dev/null
+++ sysdeps/i386/tlsdesc.sym
@@ -0,0 +1,20 @@
+#include <stddef.h>
+#include <sysdep.h>
+#include <tls.h>
+#include <link.h>
+#include <dl-tlsdesc.h>
+
+--
+
+-- Abuse tls.h macros to derive offsets relative to the thread register.
+#if defined USE_TLS
+
+DTV_OFFSET			offsetof(struct pthread, header.dtv)
+
+TLSDESC_ARG			offsetof(struct tlsdesc, arg)
+
+TLSDESC_GEN_COUNT		offsetof(struct tlsdesc_dynamic_arg, gen_count)
+TLSDESC_MODID			offsetof(struct tlsdesc_dynamic_arg, tlsinfo.ti_module)
+TLSDESC_MODOFF			offsetof(struct tlsdesc_dynamic_arg, tlsinfo.ti_offset)
+
+#endif
Index: sysdeps/x86_64/Makefile
===================================================================
--- sysdeps/x86_64/Makefile.orig
+++ sysdeps/x86_64/Makefile
@@ -9,3 +9,13 @@ endif
 ifeq ($(subdir),gmon)
 sysdep_routines += _mcount
 endif
+
+ifeq ($(subdir),elf)
+sysdep-dl-routines += tlsdesc dl-tlsdesc
+sysdep_routines += tlsdesc dl-tlsdesc
+sysdep-rtld-routines += tlsdesc dl-tlsdesc
+endif
+
+ifeq ($(subdir),csu)
+gen-as-const-headers += tlsdesc.sym
+endif
Index: sysdeps/x86_64/bits/linkmap.h
===================================================================
--- sysdeps/x86_64/bits/linkmap.h.orig
+++ sysdeps/x86_64/bits/linkmap.h
@@ -3,6 +3,7 @@ struct link_map_machine
   {
     Elf64_Addr plt; /* Address of .plt + 0x16 */
     Elf64_Addr gotplt; /* Address of .got + 0x18 */
+    void *tlsdesc_table; /* Address of TLS descriptor hash table.  */
   };
 
 #else
@@ -10,5 +11,6 @@ struct link_map_machine
   {
     Elf32_Addr plt; /* Address of .plt + 0x16 */
     Elf32_Addr gotplt; /* Address of .got + 0x0c */
+    void *tlsdesc_table; /* Address of TLS descriptor hash table.  */
   };
 #endif
Index: sysdeps/x86_64/dl-lookupcfg.h
===================================================================
--- /dev/null
+++ sysdeps/x86_64/dl-lookupcfg.h
@@ -0,0 +1,28 @@
+/* Configuration of lookup functions.
+   Copyright (C) 2005 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, write to the Free
+   Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA
+   02111-1307 USA.  */
+
+#define DL_UNMAP_IS_SPECIAL
+
+#include_next <dl-lookupcfg.h>
+
+struct link_map;
+
+extern void _dl_unmap (struct link_map *map);
+
+#define DL_UNMAP(map) _dl_unmap (map)
Index: sysdeps/x86_64/dl-machine.h
===================================================================
--- sysdeps/x86_64/dl-machine.h.orig
+++ sysdeps/x86_64/dl-machine.h
@@ -26,6 +26,7 @@
 #include <sys/param.h>
 #include <sysdep.h>
 #include <tls.h>
+#include <dl-tlsdesc.h>
 
 /* Return nonzero iff ELF header is compatible with the running host.  */
 static inline int __attribute__ ((unused))
@@ -131,6 +132,10 @@ elf_machine_runtime_setup (struct link_m
 	got[2] = (Elf64_Addr) &_dl_runtime_resolve;
     }
 
+  if (l->l_info[ADDRIDX (DT_TLSDESC_GOT)] && lazy)
+    *(Elf64_Addr*)(D_PTR (l, l_info[ADDRIDX (DT_TLSDESC_GOT)]) + l->l_addr)
+      = (Elf64_Addr) &_dl_tlsdesc_resolve_rela;
+
   return lazy;
 }
 
@@ -194,7 +199,9 @@ _dl_start_user:\n\
 # define elf_machine_type_class(type)					      \
   ((((type) == R_X86_64_JUMP_SLOT					      \
      || (type) == R_X86_64_DTPMOD64					      \
-     || (type) == R_X86_64_DTPOFF64 || (type) == R_X86_64_TPOFF64)	      \
+     || (type) == R_X86_64_DTPOFF64					      \
+     || (type) == R_X86_64_TPOFF64					      \
+     || (type) == R_X86_64_TLSDESC)					      \
     * ELF_RTYPE_CLASS_PLT)						      \
    | (((type) == R_X86_64_COPY) * ELF_RTYPE_CLASS_COPY))
 #else
@@ -323,6 +330,41 @@ elf_machine_rela (struct link_map *map, 
 	    *reloc_addr = sym->st_value + reloc->r_addend;
 # endif
 	  break;
+	case R_X86_64_TLSDESC:
+	  {
+	    struct tlsdesc volatile *td =
+	      (struct tlsdesc volatile *)reloc_addr;
+
+# ifndef RTLD_BOOTSTRAP
+	    if (! sym)
+	      {
+		td->arg = (void*)reloc->r_addend;
+		td->entry = _dl_tlsdesc_undefweak;
+	      }
+	    else
+# endif
+	      {
+# ifndef RTLD_BOOTSTRAP
+#  ifndef SHARED
+		CHECK_STATIC_TLS (map, sym_map);
+#  else
+		if (!TRY_STATIC_TLS (map, sym_map))
+		  {
+		    td->arg = _dl_make_tlsdesc_dynamic
+		      (sym_map, sym->st_value + reloc->r_addend);
+		    td->entry = _dl_tlsdesc_dynamic;
+		  }
+		else
+#  endif
+# endif
+		  {
+		    td->arg = (void*)(sym->st_value - sym_map->l_tls_offset
+				      + reloc->r_addend);
+		    td->entry = _dl_tlsdesc_return;
+		  }
+	      }
+	    break;
+	  }
 	case R_X86_64_TPOFF64:
 	  /* The offset is negative, forward from the thread pointer.  */
 # ifndef RTLD_BOOTSTRAP
@@ -435,6 +477,15 @@ elf_machine_lazy_rel (struct link_map *m
 	  map->l_mach.plt
 	  + (((Elf64_Addr) reloc_addr) - map->l_mach.gotplt) * 2;
     }
+  else if (__builtin_expect (r_type == R_X86_64_TLSDESC, 1))
+    {
+      struct tlsdesc volatile * __attribute__((__unused__)) td =
+	(struct tlsdesc volatile *)reloc_addr;
+
+      td->arg = (void*)reloc;
+      td->entry = (void*)(D_PTR (map, l_info[ADDRIDX (DT_TLSDESC_PLT)])
+			  + map->l_addr);
+    }
   else
     _dl_reloc_bad_type (map, r_type, 1);
 }
Index: sysdeps/x86_64/dl-tls.h
===================================================================
--- sysdeps/x86_64/dl-tls.h.orig
+++ sysdeps/x86_64/dl-tls.h
@@ -1,5 +1,5 @@
 /* Thread-local storage handling in the ELF dynamic linker.  x86-64 version.
-   Copyright (C) 2002 Free Software Foundation, Inc.
+   Copyright (C) 2002, 2005 Free Software Foundation, Inc.
    This file is part of the GNU C Library.
 
    The GNU C Library is free software; you can redistribute it and/or
@@ -19,11 +19,13 @@
 
 
 /* Type used for the representation of TLS information in the GOT.  */
-typedef struct
+typedef struct dl_tls_index
 {
   unsigned long int ti_module;
   unsigned long int ti_offset;
 } tls_index;
 
 
+#ifdef SHARED
 extern void *__tls_get_addr (tls_index *ti);
+#endif
Index: sysdeps/x86_64/dl-tlsdesc.S
===================================================================
--- /dev/null
+++ sysdeps/x86_64/dl-tlsdesc.S
@@ -0,0 +1,208 @@
+/* Thread-local storage handling in the ELF dynamic linker.  x86_64 version.
+   Copyright (C) 2004, 2005 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, write to the Free
+   Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA
+   02111-1307 USA.  */
+
+#include <sysdep.h>
+#include <tls.h>
+#include "tlsdesc.h"
+
+	.text
+#ifdef USE_TLS
+	.hidden _dl_tlsdesc_return
+	.global	_dl_tlsdesc_return
+	.type	_dl_tlsdesc_return,@function
+	cfi_startproc
+	.align 16
+_dl_tlsdesc_return:
+	movq	8(%rax), %rax
+	ret
+	cfi_endproc
+	.size	_dl_tlsdesc_return, .-_dl_tlsdesc_return
+
+	.hidden _dl_tlsdesc_undefweak
+	.global	_dl_tlsdesc_undefweak
+	.type	_dl_tlsdesc_undefweak,@function
+	cfi_startproc
+	.align 16
+_dl_tlsdesc_undefweak:
+	movq	8(%rax), %rax
+	subq	%fs:0, %rax
+	ret
+	cfi_endproc
+	.size	_dl_tlsdesc_undefweak, .-_dl_tlsdesc_undefweak
+
+#ifdef SHARED
+	.hidden _dl_tlsdesc_dynamic
+	.global	_dl_tlsdesc_dynamic
+	.type	_dl_tlsdesc_dynamic,@function
+
+     /* %rax points to the TLS descriptor, such that 0(%rax) points to
+	_dl_tlsdesc_dynamic itself, and 8(%rax) points to a struct
+	tlsdesc_dynamic_arg object.  It must return in %rax the offset
+	between the thread pointer and the object denoted by the
+	argument, without clobbering any registers.
+
+	The assembly code that follows is a rendition of the following
+	C code, hand-optimized a little bit.
+
+ptrdiff_t
+_dl_tlsdesc_dynamic (register struct tlsdesc *tdp asm ("%rax"))
+{
+  struct tlsdesc_dynamic_arg *td = tdp->arg;
+  dtv_t *dtv = *(dtv_t **)((char *)__thread_pointer + DTV_OFFSET);
+  if (__builtin_expect (td->gen_count <= dtv[0].counter
+			&& (dtv[td->tlsinfo.ti_module].pointer.val
+			    != TLS_DTV_UNALLOCATED),
+			1))
+    return dtv[td->tlsinfo.ti_module].pointer.val + td->tlsinfo.ti_offset
+      - __thread_pointer;
+
+  return __tls_get_addr_internal (&td->tlsinfo) - __thread_pointer;
+}
+*/
+	cfi_startproc
+	.align 16
+_dl_tlsdesc_dynamic:
+	/* Preserve call-clobbered registers that we modify.
+	   We need two scratch regs anyway.  */
+	pushq	%rsi
+	cfi_adjust_cfa_offset (8)
+	movq	%fs:DTV_OFFSET, %rsi
+	pushq	%rdi
+	cfi_adjust_cfa_offset (8)
+	movq	TLSDESC_ARG(%rax), %rdi
+	movq	(%rsi), %rax
+	cmpq	%rax, TLSDESC_GEN_COUNT(%rdi)
+	ja	.Lslow
+	movq	TLSDESC_MODID(%rdi), %rax
+	salq	$4, %rax
+	movq	(%rax,%rsi), %rax
+	cmpq	$-1, %rax
+	je	.Lslow
+	addq	TLSDESC_MODOFF(%rdi), %rax
+.Lret:
+	popq	%rdi
+	cfi_adjust_cfa_offset (-8)
+	subq	%fs:0, %rax
+	popq	%rsi
+	cfi_adjust_cfa_offset (-8)
+	ret
+.Lslow:
+	/* Besides rdi and rsi, saved above, save rdx, rcx, r8, r9,
+	   r10 and r11.  Also align the stack, which is off by 8 bytes.  */
+	cfi_adjust_cfa_offset (16)
+	subq	$56, %rsp
+	cfi_adjust_cfa_offset (56)
+	movq	%rdx, 8(%rsp)
+	movq	%rcx, 16(%rsp)
+	movq	%r8, 24(%rsp)
+	movq	%r9, 32(%rsp)
+	movq	%r10, 40(%rsp)
+	movq	%r11, 48(%rsp)
+	/* %rdi already points to the tlsinfo data structure.  */
+	call	__tls_get_addr@PLT
+	movq	8(%rsp), %rdx
+	movq	16(%rsp), %rcx
+	movq	24(%rsp), %r8
+	movq	32(%rsp), %r9
+	movq	40(%rsp), %r10
+	movq	48(%rsp), %r11
+	addq	$56, %rsp
+	cfi_adjust_cfa_offset (-56)
+	jmp	.Lret
+	cfi_endproc
+	.size	_dl_tlsdesc_dynamic, .-_dl_tlsdesc_dynamic
+#endif /* SHARED */
+
+	.hidden _dl_tlsdesc_resolve_rela
+	.global	_dl_tlsdesc_resolve_rela
+	.type	_dl_tlsdesc_resolve_rela,@function
+	cfi_startproc
+	.align 16
+	/* The PLT entry will have pushed the link_map pointer.  */
+	cfi_adjust_cfa_offset (8)
+_dl_tlsdesc_resolve_rela:
+	/* Save all call-clobbered registers.  */
+	subq	$72, %rsp
+	cfi_adjust_cfa_offset (72)
+	movq	%rax, (%rsp)
+	movq	%rdi, 8(%rsp)
+	movq	%rax, %rdi	/* Pass tlsdesc* in %rdi.  */
+	movq	%rsi, 16(%rsp)
+	movq	72(%rsp), %rsi	/* Pass link_map* in %rsi.  */
+	movq	%r8, 24(%rsp)
+	movq	%r9, 32(%rsp)
+	movq	%r10, 40(%rsp)
+	movq	%r11, 48(%rsp)
+	movq	%rdx, 56(%rsp)
+	movq	%rcx, 64(%rsp)
+	call	_dl_tlsdesc_resolve_rela_fixup
+	movq	(%rsp), %rax
+	movq	8(%rsp), %rdi
+	movq	16(%rsp), %rsi
+	movq	24(%rsp), %r8
+	movq	32(%rsp), %r9
+	movq	40(%rsp), %r10
+	movq	48(%rsp), %r11
+	movq	56(%rsp), %rdx
+	movq	64(%rsp), %rcx
+	addq	$80, %rsp
+	cfi_adjust_cfa_offset (-80)
+	jmp	*(%rax)
+	cfi_endproc
+	.size	_dl_tlsdesc_resolve_rela, .-_dl_tlsdesc_resolve_rela
+
+	.hidden _dl_tlsdesc_resolve_hold
+	.global	_dl_tlsdesc_resolve_hold
+	.type	_dl_tlsdesc_resolve_hold,@function
+	cfi_startproc
+	.align 16
+_dl_tlsdesc_resolve_hold:
+0:
+	/* Save all call-clobbered registers.  */
+	subq	$72, %rsp
+	cfi_adjust_cfa_offset (72)
+	movq	%rax, (%rsp)
+	movq	%rdi, 8(%rsp)
+	movq	%rax, %rdi	/* Pass tlsdesc* in %rdi.  */
+	movq	%rsi, 16(%rsp)
+	movq	$1f - 0b, %rsi	/* Pass return address offset in %rsi.  */
+	movq	%r8, 24(%rsp)
+	movq	%r9, 32(%rsp)
+	movq	%r10, 40(%rsp)
+	movq	%r11, 48(%rsp)
+	movq	%rdx, 56(%rsp)
+	movq	%rcx, 64(%rsp)
+	call	_dl_tlsdesc_resolve_hold_fixup
+1:
+	movq	(%rsp), %rax
+	movq	8(%rsp), %rdi
+	movq	16(%rsp), %rsi
+	movq	24(%rsp), %r8
+	movq	32(%rsp), %r9
+	movq	40(%rsp), %r10
+	movq	48(%rsp), %r11
+	movq	56(%rsp), %rdx
+	movq	64(%rsp), %rcx
+	addq	$72, %rsp
+	cfi_adjust_cfa_offset (-72)
+	jmp	*(%rax)
+	cfi_endproc
+	.size	_dl_tlsdesc_resolve_hold, .-_dl_tlsdesc_resolve_hold
+
+#endif /* USE_TLS */
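
For reference, the contract these entry points implement can be
modeled in C as follows.  This is only a sketch, not part of the
patch: tls_desc_address and thread_pointer are invented names, and
the real call sites are emitted by the compiler under
-mtls-dialect=gnu2.

#include <stddef.h>

/* A TLS descriptor entry point receives the descriptor's own address
   in %rax and returns, also in %rax, the offset of the variable from
   the thread pointer, clobbering no other registers.  */
struct tlsdesc
{
  ptrdiff_t (*entry) (struct tlsdesc *);
  void *arg;
};

static inline void *
tls_desc_address (struct tlsdesc *td, char *thread_pointer)
{
  /* The compiler emits approximately
	leaq	x@tlsdesc(%rip), %rax
	call	*x@tlscall(%rax)
	addq	%fs:0, %rax
     which is the assembly rendition of this return statement.  */
  return thread_pointer + td->entry (td);
}
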
Index: sysdeps/x86_64/dl-tlsdesc.h
===================================================================
--- /dev/null
+++ sysdeps/x86_64/dl-tlsdesc.h
@@ -0,0 +1,63 @@
+/* Thread-local storage descriptor handling in the ELF dynamic linker.
+   x86_64 version.
+   Copyright (C) 2005 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, write to the Free
+   Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA
+   02111-1307 USA.  */
+
+#ifndef _X86_64_DL_TLSDESC_H
+# define _X86_64_DL_TLSDESC_H 1
+
+/* Use this to access DT_TLSDESC_PLT and DT_TLSDESC_GOT.  */
+#ifndef ADDRIDX
+# define ADDRIDX(tag) (DT_NUM + DT_THISPROCNUM + DT_VERSIONTAGNUM \
+		       + DT_EXTRANUM + DT_VALNUM + DT_ADDRTAGIDX (tag))
+#endif
+
+/* Type used to represent a TLS descriptor in the GOT.  */
+struct tlsdesc
+{
+  ptrdiff_t (*entry) (struct tlsdesc *on_rax);
+  void *arg;
+};
+
+typedef struct dl_tls_index
+{
+  unsigned long int ti_module;
+  unsigned long int ti_offset;
+} tls_index;
+
+/* Type used as the argument in a TLS descriptor for a symbol that
+   needs dynamic TLS offsets.  */
+struct tlsdesc_dynamic_arg
+{
+  tls_index tlsinfo;
+  size_t gen_count;
+};
+
+extern ptrdiff_t attribute_hidden
+  _dl_tlsdesc_return (struct tlsdesc *on_rax),
+  _dl_tlsdesc_undefweak (struct tlsdesc *on_rax),
+  _dl_tlsdesc_resolve_rela (struct tlsdesc *on_rax),
+  _dl_tlsdesc_resolve_hold (struct tlsdesc *on_rax);
+
+# ifdef SHARED
+extern void *_dl_make_tlsdesc_dynamic (struct link_map *map, size_t ti_offset);
+
+extern ptrdiff_t attribute_hidden _dl_tlsdesc_dynamic (struct tlsdesc *);
+# endif
+
+#endif
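
To make the resolver's output concrete, the two steady-state shapes a
relocated descriptor can take would look roughly like this.  The
values are invented for illustration; the real contents are computed
in tlsdesc.c below.

/* Static TLS: the offset from the thread pointer is fixed at
   relocation time, so _dl_tlsdesc_return just loads it.  */
struct tlsdesc a_static_desc = {
  .entry = _dl_tlsdesc_return,
  .arg = (void *) (ptrdiff_t) -48,	/* offset from thread pointer */
};

/* Dynamic TLS: the module's block may not exist yet in every thread,
   so the argument carries what _dl_tlsdesc_dynamic needs to consult
   the DTV, plus the generation count at which the module id became
   valid.  */
struct tlsdesc_dynamic_arg an_arg = {
  .tlsinfo = { .ti_module = 3, .ti_offset = 16 },
  .gen_count = 2,
};
struct tlsdesc a_dynamic_desc = {
  .entry = _dl_tlsdesc_dynamic,
  .arg = &an_arg,
};
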
Index: sysdeps/x86_64/tlsdesc.c
===================================================================
--- /dev/null
+++ sysdeps/x86_64/tlsdesc.c
@@ -0,0 +1,556 @@
+/* Manage TLS descriptors.  x86_64 version.
+   Copyright (C) 2005 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, write to the Free
+   Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA
+   02111-1307 USA.  */
+
+#include <link.h>
+#include <ldsodefs.h>
+#include <elf/dynamic-link.h>
+#include <tls.h>
+#include <dl-tlsdesc.h>
+
+#ifdef USE_TLS
+# ifdef SHARED
+
+extern void weak_function free (void *ptr);
+
+/* The hash table code below is heavily inspired by libiberty's
+   hashtab code, but with most adaptation points and the support for
+   deleting elements removed.
+
+   Copyright (C) 1999, 2000, 2001, 2002, 2003 Free Software Foundation, Inc.
+   Contributed by Vladimir Makarov (vmakarov@cygnus.com).  */
+
+inline static unsigned long
+higher_prime_number (unsigned long n)
+{
+  /* These are primes that are near, but slightly smaller than, a
+     power of two.  */
+  static const unsigned long primes[] = {
+    (unsigned long) 7,
+    (unsigned long) 13,
+    (unsigned long) 31,
+    (unsigned long) 61,
+    (unsigned long) 127,
+    (unsigned long) 251,
+    (unsigned long) 509,
+    (unsigned long) 1021,
+    (unsigned long) 2039,
+    (unsigned long) 4093,
+    (unsigned long) 8191,
+    (unsigned long) 16381,
+    (unsigned long) 32749,
+    (unsigned long) 65521,
+    (unsigned long) 131071,
+    (unsigned long) 262139,
+    (unsigned long) 524287,
+    (unsigned long) 1048573,
+    (unsigned long) 2097143,
+    (unsigned long) 4194301,
+    (unsigned long) 8388593,
+    (unsigned long) 16777213,
+    (unsigned long) 33554393,
+    (unsigned long) 67108859,
+    (unsigned long) 134217689,
+    (unsigned long) 268435399,
+    (unsigned long) 536870909,
+    (unsigned long) 1073741789,
+    (unsigned long) 2147483647,
+					/* 4294967291L */
+    ((unsigned long) 2147483647) + ((unsigned long) 2147483644),
+  };
+
+  const unsigned long *low = &primes[0];
+  const unsigned long *high = &primes[sizeof(primes) / sizeof(primes[0])];
+
+  while (low != high)
+    {
+      const unsigned long *mid = low + (high - low) / 2;
+      if (n > *mid)
+	low = mid + 1;
+      else
+	high = mid;
+    }
+
+#if 0
+  /* If we've run out of primes, abort.  */
+  if (n > *low)
+    {
+      fprintf (stderr, "Cannot find prime bigger than %lu\n", n);
+      abort ();
+    }
+#endif
+
+  return *low;
+}
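+
+/* For example, higher_prime_number (10) returns 13: the binary
+   search above selects the smallest listed prime that is greater
+   than or equal to its argument.  */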
+
+struct hashtab
+{
+  /* Table itself.  */
+  void **entries;
+
+  /* Current size (in entries) of the hash table */
+  size_t size;
+
+  /* Current number of elements.  */
+  size_t n_elements;
+};
+
+inline static struct hashtab *
+htab_create (void)
+{
+  struct hashtab *ht = malloc (sizeof (struct hashtab));
+
+  if (! ht)
+    return NULL;
+  ht->size = 3;
+  ht->entries = malloc (sizeof (void *) * ht->size);
+  if (! ht->entries)
+    {
+      free (ht);
+      return NULL;
+    }
+
+  ht->n_elements = 0;
+
+  memset (ht->entries, 0, sizeof (void *) * ht->size);
+
+  return ht;
+}
+
+/* This is only called from _dl_unmap, so it's safe to call
+   free().  See the discussion below.  */
+inline static void
+htab_delete (struct hashtab *htab)
+{
+  int i;
+
+  for (i = htab->size - 1; i >= 0; i--)
+    if (htab->entries[i])
+      free (htab->entries[i]);
+
+  free (htab->entries);
+  free (htab);
+}
+
+/* Similar to htab_find_slot below, but without several unwanted side
+   effects:
+    - Does not call the equality function when it finds an existing
+      entry.
+    - Does not change the count of elements in the hash table.
+   This function also assumes there are no deleted entries in the
+   table.  HASH is the hash value for the element to be inserted.  */
+
+inline static void **
+find_empty_slot_for_expand (struct hashtab *htab, int hash)
+{
+  size_t size = htab->size;
+  unsigned int index = hash % size;
+  void **slot = htab->entries + index;
+  int hash2;
+
+  if (! *slot)
+    return slot;
+
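+  /* Secondary hash for double hashing: a probe step in the range
+     [1, size - 2].  Since the table size is prime, the step is
+     coprime to it and the probe sequence visits every slot.  */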
+  hash2 = 1 + hash % (size - 2);
+  for (;;)
+    {
+      index += hash2;
+      if (index >= size)
+	index -= size;
+
+      slot = htab->entries + index;
+      if (! *slot)
+	return slot;
+    }
+}
+
+/* The following function changes the size of the memory allocated for
+   the entries and reinserts the existing table elements.  The
+   occupancy of the table after the call will be about 50%.  Naturally,
+   the hash table must already exist, and the locations of its entries
+   change.  If memory allocation fails, this function returns zero,
+   indicating that the table could not be expanded; if all goes well,
+   it returns a non-zero value.  */
+
+inline static int
+htab_expand (struct hashtab *htab, int (*hash_fn) (void *))
+{
+  void **oentries;
+  void **olimit;
+  void **p;
+  void **nentries;
+  size_t nsize;
+
+  oentries = htab->entries;
+  olimit = oentries + htab->size;
+
+  /* Resize only when the table is too full; since elements are never
+     removed, it cannot become too empty.  */
+  if (htab->n_elements * 2 > htab->size)
+    nsize = higher_prime_number (htab->n_elements * 2);
+  else
+    nsize = htab->size;
+
+  nentries = malloc (sizeof (void *) * nsize);
+  if (nentries == NULL)
+    return 0;
+  memset (nentries, 0, sizeof (void *) * nsize);
+  htab->entries = nentries;
+  htab->size = nsize;
+
+  p = oentries;
+  do
+    {
+      if (*p)
+	*find_empty_slot_for_expand (htab, hash_fn (*p))
+	  = *p;
+
+      p++;
+    }
+  while (p < olimit);
+
+#if 0 /* We can't tell whether this was allocated by the malloc()
+	 built into ld.so or the one in the main executable or libc,
+	 and calling free() for something that wasn't malloc()ed could
+	 do Very Bad Things (TM).  Take the conservative approach
+	 here, potentially wasting as much memory as actually used by
+	 the hash table, even if multiple growths occur.  That's not
+	 so bad as to require some overengineered solution that would
+	 enable us to keep track of how it was allocated. */
+  free (oentries);
+#endif
+  return 1;
+}
+
+/* This function searches for a hash table slot containing an entry
+   equal to the given element.  To insert an entry, call this with
+   INSERT = 1, then write the value you want into the returned slot.
+   When inserting an entry, NULL may be returned if memory allocation
+   fails.  Deletion is not supported in this stripped-down version.  */
+
+inline static void **
+htab_find_slot (struct hashtab *htab, void *ptr, int insert,
+		int (*hash_fn) (void *), int (*eq_fn) (void *, void *))
+{
+  unsigned int index;
+  int hash, hash2;
+  size_t size;
+  void **entry;
+
+  if (htab->size * 3 <= htab->n_elements * 4
+      && htab_expand (htab, hash_fn) == 0)
+    return NULL;
+
+  hash = hash_fn (ptr);
+
+  size = htab->size;
+  index = hash % size;
+
+  entry = &htab->entries[index];
+  if (!*entry)
+    goto empty_entry;
+  else if (eq_fn (*entry, ptr))
+    return entry;
+
+  hash2 = 1 + hash % (size - 2);
+  for (;;)
+    {
+      index += hash2;
+      if (index >= size)
+	index -= size;
+
+      entry = &htab->entries[index];
+      if (!*entry)
+	goto empty_entry;
+      else if (eq_fn (*entry, ptr))
+	return entry;
+    }
+
+ empty_entry:
+  if (!insert)
+    return NULL;
+
+  htab->n_elements++;
+  return entry;
+}
+
+inline static int
+hash_tlsdesc (void *p)
+{
+  struct tlsdesc_dynamic_arg *td = p;
+
+  /* We know all entries are for the same module, so ti_offset is the
+     only distinguishing field.  */
+  return td->tlsinfo.ti_offset;
+}
+
+inline static int
+eq_tlsdesc (void *p, void *q)
+{
+  struct tlsdesc_dynamic_arg *tdp = p, *tdq = q;
+
+  return tdp->tlsinfo.ti_offset == tdq->tlsinfo.ti_offset;
+}
+
+inline static int
+map_generation (struct link_map *map)
+{
+  size_t idx = map->l_tls_modid;
+  struct dtv_slotinfo_list *listp = GL(dl_tls_dtv_slotinfo_list);
+
+  /* Find the place in the dtv slotinfo list.  */
+  do
+    {
+      /* Does it fit in the array of this list element?  */
+      if (idx < listp->len)
+	{
+	  /* We should never get here for a module in static TLS, so
+	     we can assume that, if the generation count is zero, we
+	     still haven't determined the generation count for this
+	     module.  */
+	  if (listp->slotinfo[idx].gen)
+	    return listp->slotinfo[idx].gen;
+	  else
+	    break;
+	}
+      idx -= listp->len;
+      listp = listp->next;
+    }
+  while (listp != NULL);
+
+  /* If we get to this point, the module still hasn't been assigned an
+     entry in the dtv slotinfo data structures; it will only get one
+     once we're done with relocation processing.  At that point, the
+     module will be assigned a generation number that is one past the
+     current generation, so return exactly that.  */
+  return GL(dl_tls_generation) + 1;
+}
+
+void *
+_dl_make_tlsdesc_dynamic (struct link_map *map, size_t ti_offset)
+{
+  struct hashtab *ht;
+  void **entry;
+  struct tlsdesc_dynamic_arg *td, test;
+
+  /* FIXME: We could use a per-map lock here, but is it worth it?  */
+  __rtld_lock_lock_recursive (GL(dl_load_lock));
+
+  ht = map->l_mach.tlsdesc_table;
+  if (! ht)
+    {
+      ht = htab_create ();
+      if (! ht)
+	{
+	  __rtld_lock_unlock_recursive (GL(dl_load_lock));
+	  return 0;
+	}
+      map->l_mach.tlsdesc_table = ht;
+    }
+
+  test.tlsinfo.ti_module = map->l_tls_modid;
+  test.tlsinfo.ti_offset = ti_offset;
+  entry = htab_find_slot (ht, &test, 1, hash_tlsdesc, eq_tlsdesc);
+  if (*entry)
+    {
+      td = *entry;
+      __rtld_lock_unlock_recursive (GL(dl_load_lock));
+      return td;
+    }
+
+  *entry = td = malloc (sizeof (struct tlsdesc_dynamic_arg));
+  if (! td)
+    {
+      __rtld_lock_unlock_recursive (GL(dl_load_lock));
+      return 0;
+    }
+  /* This may be higher than the map's generation, but it doesn't
+     matter much.  Worst case, we'll have one extra DTV update per
+     thread.  */
+  td->gen_count = map_generation (map);
+  td->tlsinfo = test.tlsinfo;
+
+  __rtld_lock_unlock_recursive (GL(dl_load_lock));
+  return td;
+}
+
+# endif /* SHARED */
+
+/* The idea of the following two functions is to stop multiple threads
+   from attempting to resolve the same TLS descriptor without busy
+   waiting.  Ideally, we should be able to release the lock right
+   after changing td->entry and then use, say, a condition variable
+   or a futex wake to wake up any waiting threads, but let's try to
+   avoid introducing such dependencies.  */
+
+inline static int
+_dl_tlsdesc_resolve_early_return_p (struct tlsdesc volatile *td, void *caller)
+{
+  if (caller != td->entry)
+    return 1;
+
+  __rtld_lock_lock_recursive (GL(dl_load_lock));
+  if (caller != td->entry)
+    {
+      __rtld_lock_unlock_recursive (GL(dl_load_lock));
+      return 1;
+    }
+
+  td->entry = _dl_tlsdesc_resolve_hold;
+
+  return 0;
+}
+
+inline static void
+_dl_tlsdesc_wake_up_held_fixups (void)
+{
+  __rtld_lock_unlock_recursive (GL(dl_load_lock));
+}
+
+/* _dl_tlsdesc_resolve_hold_fixup below takes an entry_check_offset
+   argument.  It's computed by the caller as the offset between its
+   entry point and the call site, such that subtracting it from the
+   return address implicitly passed to the function yields the
+   caller's entry point, which we compare with the entry point
+   recorded in the TLS descriptor.  If they no longer match, another
+   thread has already completed the resolution, and we want to return
+   immediately.  */
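+
+/* For instance, the x86_64 stub _dl_tlsdesc_resolve_hold in
+   dl-tlsdesc.S passes $1f - 0b, where 0: labels the stub's entry
+   point and 1: labels the instruction after the call, so
+   __builtin_return_address (0) - entry_check_offset recovers the
+   address of 0:.  */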
+
+/* These macros are copied from elf/dl-reloc.c */
+
+#define CHECK_STATIC_TLS(map, sym_map)					\
+    do {								\
+      if (__builtin_expect ((sym_map)->l_tls_offset == NO_TLS_OFFSET	\
+			    || ((sym_map)->l_tls_offset			\
+				== FORCED_DYNAMIC_TLS_OFFSET), 0))	\
+	_dl_allocate_static_tls (sym_map);				\
+    } while (0)
+
+#define TRY_STATIC_TLS(map, sym_map)					\
+    (__builtin_expect ((sym_map)->l_tls_offset				\
+		       != FORCED_DYNAMIC_TLS_OFFSET, 1)			\
+     && (__builtin_expect ((sym_map)->l_tls_offset != NO_TLS_OFFSET, 1)	\
+	 || _dl_try_allocate_static_tls (sym_map) == 0))
+
+int internal_function _dl_try_allocate_static_tls (struct link_map *map);
+
+/* This function is used to lazily resolve TLS_DESC RELA relocations.
+   The argument location is used to hold a pointer to the relocation.  */
+
+void
+attribute_hidden
+_dl_tlsdesc_resolve_rela_fixup (struct tlsdesc volatile *td,
+				struct link_map *l)
+{
+  const ElfW(Rela) *reloc = td->arg;
+
+  if (_dl_tlsdesc_resolve_early_return_p
+      (td, (void*)(D_PTR (l, l_info[ADDRIDX (DT_TLSDESC_PLT)]) + l->l_addr)))
+    return;
+
+  /* The code below was borrowed from _dl_fixup().  */
+  const ElfW(Sym) *const symtab
+    = (const void *) D_PTR (l, l_info[DT_SYMTAB]);
+  const char *strtab = (const void *) D_PTR (l, l_info[DT_STRTAB]);
+  const ElfW(Sym) *sym = &symtab[ELFW(R_SYM) (reloc->r_info)];
+  lookup_t result;
+
+   /* Look up the target symbol.  If the normal lookup rules are not
+      used, don't look in the global scope.  */
+  if (ELFW(ST_BIND) (sym->st_info) != STB_LOCAL
+      && __builtin_expect (ELFW(ST_VISIBILITY) (sym->st_other), 0) == 0)
+    {
+      const struct r_found_version *version = NULL;
+
+      if (l->l_info[VERSYMIDX (DT_VERSYM)] != NULL)
+	{
+	  const ElfW(Half) *vernum =
+	    (const void *) D_PTR (l, l_info[VERSYMIDX (DT_VERSYM)]);
+	  ElfW(Half) ndx = vernum[ELFW(R_SYM) (reloc->r_info)] & 0x7fff;
+	  version = &l->l_versions[ndx];
+	  if (version->hash == 0)
+	    version = NULL;
+	}
+
+      result = _dl_lookup_symbol_x (strtab + sym->st_name, l, &sym,
+				    l->l_scope, version, ELF_RTYPE_CLASS_PLT,
+				    DL_LOOKUP_ADD_DEPENDENCY, NULL);
+    }
+  else
+    {
+      /* We already found the symbol.  The module (and therefore its load
+	 address) is also known.  */
+      result = l;
+    }
+
+  if (! sym)
+    {
+      td->arg = (void*)reloc->r_addend;
+      td->entry = _dl_tlsdesc_undefweak;
+    }
+  else
+    {
+#  ifndef SHARED
+      CHECK_STATIC_TLS (l, result);
+#  else
+      if (!TRY_STATIC_TLS (l, result))
+	{
+	  td->arg = _dl_make_tlsdesc_dynamic (result, sym->st_value
+					      + reloc->r_addend);
+	  td->entry = _dl_tlsdesc_dynamic;
+	}
+      else
+#  endif
+	{
+	  td->arg = (void*)(sym->st_value - result->l_tls_offset
+			    + reloc->r_addend);
+	  td->entry = _dl_tlsdesc_return;
+	}
+    }
+
+  _dl_tlsdesc_wake_up_held_fixups ();
+}
+
+void
+attribute_hidden
+_dl_tlsdesc_resolve_hold_fixup (struct tlsdesc volatile *td,
+				ptrdiff_t entry_check_offset)
+{
+  /* Maybe we're lucky and can return early.  */
+  if (__builtin_return_address (0) - entry_check_offset != td->entry)
+    return;
+
+  /* Locking here will stop execution until the running resolver runs
+     _dl_tlsdesc_wake_up_held_fixups (), releasing the lock.
+
+     FIXME: We'd be better off waiting on a condition variable, such
+     that we didn't have to hold the lock throughout the relocation
+     processing.  */
+  __rtld_lock_lock_recursive (GL(dl_load_lock));
+  __rtld_lock_unlock_recursive (GL(dl_load_lock));
+}
+
+#endif /* USE_TLS */
+
+void
+_dl_unmap (struct link_map *map)
+{
+  __munmap ((void *) (map)->l_map_start,
+	    (map)->l_map_end - (map)->l_map_start);
+
+#if USE_TLS && SHARED
+  /* _dl_unmap is only called for dlopen()ed libraries, for which
+     calling free() is safe, or before we've completed the initial
+     relocation, in which case calling free() is probably pointless,
+     but still safe.  */
+  if (map->l_mach.tlsdesc_table)
+    htab_delete (map->l_mach.tlsdesc_table);
+#endif
+}
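
The resolve/hold handshake above can be summarized with a hedged
sketch in which the glibc internals are replaced by a plain pthread
mutex; early_return_p and hold are invented names standing in for
_dl_tlsdesc_resolve_early_return_p and the _dl_tlsdesc_resolve_hold
stub.

#include <pthread.h>
#include <stddef.h>

struct tlsdesc
{
  ptrdiff_t (*entry) (struct tlsdesc *);
  void *arg;
};

static pthread_mutex_t load_lock = PTHREAD_MUTEX_INITIALIZER;

static ptrdiff_t hold (struct tlsdesc *td);

/* Only one thread gets a zero return and proceeds to resolve the
   descriptor; every other thread either already sees a final entry
   point or is diverted to hold().  */
static int
early_return_p (struct tlsdesc volatile *td, void *caller)
{
  if (caller != td->entry)
    return 1;

  pthread_mutex_lock (&load_lock);
  if (caller != td->entry)
    {
      pthread_mutex_unlock (&load_lock);
      return 1;
    }

  td->entry = hold;	/* Latecomers now block in hold().  */
  return 0;
}

/* Wait until the resolving thread drops the lock, then retry through
   the (by now final) entry point.  */
static ptrdiff_t
hold (struct tlsdesc *td)
{
  pthread_mutex_lock (&load_lock);
  pthread_mutex_unlock (&load_lock);
  return td->entry (td);
}
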
Index: sysdeps/x86_64/tlsdesc.sym
===================================================================
--- /dev/null
+++ sysdeps/x86_64/tlsdesc.sym
@@ -0,0 +1,20 @@
+#include <stddef.h>
+#include <sysdep.h>
+#include <tls.h>
+#include <link.h>
+#include <dl-tlsdesc.h>
+
+--
+
+-- Abuse tls.h macros to derive offsets relative to the thread register.
+#if defined USE_TLS
+
+DTV_OFFSET			offsetof(struct pthread, header.dtv)
+
+TLSDESC_ARG			offsetof(struct tlsdesc, arg)
+
+TLSDESC_GEN_COUNT		offsetof(struct tlsdesc_dynamic_arg, gen_count)
+TLSDESC_MODID			offsetof(struct tlsdesc_dynamic_arg, tlsinfo.ti_module)
+TLSDESC_MODOFF			offsetof(struct tlsdesc_dynamic_arg, tlsinfo.ti_offset)
+
+#endif
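
As with other .sym files in glibc, the build feeds this through the
gen-as-const machinery: each expression is evaluated by the C
compiler and emitted as an assembler-visible constant, which is how
dl-tlsdesc.S gets DTV_OFFSET and friends.  The generated header has
roughly this shape; the descriptor offsets follow the struct layouts
in dl-tlsdesc.h, while DTV_OFFSET depends on the layout of struct
pthread and is shown here only as an example.

#define DTV_OFFSET 8
#define TLSDESC_ARG 8
#define TLSDESC_GEN_COUNT 16
#define TLSDESC_MODID 0
#define TLSDESC_MODOFF 8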


-- 
Alexandre Oliva         http://www.lsd.ic.unicamp.br/~oliva/
Red Hat Compiler Engineer   aoliva@{redhat.com, gcc.gnu.org}
Free Software Evangelist  oliva@{lsd.ic.unicamp.br, gnu.org}

