This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: RFC: TLS improvements for IA32 and AMD64/EM64T


On Sep 22, 2005, Alexandre Oliva <aoliva@redhat.com> wrote:

> On Sep 17, 2005, Alexandre Oliva <aoliva@redhat.com> wrote:
>> On Sep 16, 2005, Alexandre Oliva <aoliva@redhat.com> wrote:
>>> On Sep 16, 2005, Alexandre Oliva <aoliva@redhat.com> wrote:
>>>> Over the past few months, I've been working on porting to IA32 and
>>>> AMD64/EM64T the interesting bits of the TLS design I came up with for
>>>> FR-V, achieving some impressive speedups along with slight code size
>>>> reductions in the most common cases.

>>>> Although the design is not set in stone yet, it's fully implemented
>>>> and functional with patches I'm about to post for binutils, gcc and
>>>> glibc mainline, as follow-ups to this message, except that the GCC
>>>> patch will go to gcc-patches, as expected.

>>> This is the glibc portion of the implementation.  I've built it for
>>> amd64-linux-gnu and i686-pc-linux-gnu, with both the current TLS
>>> dialect and the new -mtls-dialect=gnu2, without regressions.

>> Revised patch using new relocation and dynamic entry numbers.  Tested
>> again on all 4 combinations mentioned above.

> Revised patch that does not include other changes, fixes a few
> inconsistent uses of addends here and there (copy&pastos), avoids
> crashing when lazily resolving TLSDESC relocations to weak symbols
> that turn out to be undefined, and, as a bonus, handles them correctly
> such that their address does map to NULL for all threads.  I know
> people shouldn't rely on this, since the linker may very well break
> it, but hey, since I was already tweaking the code to avoid crashes,
> why not take the final step and make it work as closely as possible to
> a random person's expectations? :-)
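A rough model of the undefined-weak behaviour described above. It assumes the descriptor layout from the patch's dl-tlsdesc.h (an entry-point word followed by an argument word) and assumes that compiled code adds the offset returned by the entry point to the thread pointer. The names `tlsdesc_model`, `model_tlsdesc_return`, `model_tlsdesc_undefweak` and `model_tls_address` are hypothetical. The thread pointer is passed explicitly here instead of being read from %gs:0:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a TLS descriptor.  The real i386 entry point
   receives the descriptor in %eax and reads the thread pointer from
   %gs:0; here the thread pointer is an explicit parameter so that the
   sketch is plain, portable C.  */
struct tlsdesc_model
{
  intptr_t (*entry) (const struct tlsdesc_model *td, uintptr_t tp);
  uintptr_t arg;
};

/* Static TLS: ARG already holds the (possibly negative) offset of the
   variable from the thread pointer.  */
static intptr_t
model_tlsdesc_return (const struct tlsdesc_model *td, uintptr_t tp)
{
  (void) tp;
  return (intptr_t) td->arg;
}

/* Undefined weak symbol: ARG holds the addend, and the entry returns
   ADDEND - TP, so TP + result == ADDEND in every thread.  With a zero
   addend, every thread computes a NULL address.  */
static intptr_t
model_tlsdesc_undefweak (const struct tlsdesc_model *td, uintptr_t tp)
{
  return (intptr_t) (td->arg - tp);
}

/* What compiled code does at each access: call the entry point and
   add the thread pointer to the offset it returns.  */
static uintptr_t
model_tls_address (const struct tlsdesc_model *td, uintptr_t tp)
{
  assert (td->entry != 0);
  return tp + (uintptr_t) td->entry (td, tp);
}
```

Because the undefined-weak entry returns the addend minus the thread pointer, every thread computes the same address. That address is NULL for a zero addend. This is the same arithmetic as the `movl 4(%eax), %eax` / `subl %gs:0, %eax` sequence in _dl_tlsdesc_undefweak.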

> Index: ChangeLog
> from  Alexandre Oliva  <aoliva@redhat.com>

> 	Introduce TLS descriptors for i386 and x86_64.

binutils 2.17 and GCC 4.2 will be able to generate code conforming to
this ABI extension, so ideally glibc should support the corresponding
relocations.  The patch has been pending for a while and, AFAIK, it
has not even been reviewed yet :-(

I re-tested this with yesterday's glibc tree, using GCC and binutils
trunks from one or two weeks ago.  There were no regressions on 32- or
64-bit builds using either the old or the new dynamic access model,
with Initial Exec either enabled or forcibly disabled (8 builds in total).

Index: ChangeLog
from  Alexandre Oliva  <aoliva@redhat.com>

	Introduce TLS descriptors for i386 and x86_64.
	* elf/dl-reloc.c (_dl_try_allocate_static_tls): Extract from...
	(_dl_allocate_static_tls): ... here.  Rearrange failure path.
	(TRY_STATIC_TLS): New macro.
	* elf/dl-conflict.c (TRY_STATIC_TLS): Dummy define.
	* elf/elf.h (DT_TLSDESC_GOT, DT_TLSDESC_PLT): Define.
	(R_386_TLS_GOTDESC, R_386_TLS_DESC_CALL, R_386_TLS_DESC): Define.
	(R_X86_64_PC64, R_X86_64_GOTOFF64, R_X86_64_GOTPC32): Merge from
	binutils.
	(R_X86_64_GOTPC32_TLSDESC, R_X86_64_TLSDESC_CALL,
	R_X86_64_TLSDESC): Define.
	(R_386_NUM, R_X86_64_NUM): Adjust.
	* sysdeps/i386/Makefile (sysdep-dl-routines, sysdep_routines,
	sysdep-rtld-routines): Add tlsdesc and dl-tlsdesc for elf subdir.
	(gen-as-const-headers): Add tlsdesc.sym to csu subdir.
	* sysdeps/i386/dl-lookupcfg.h: New file.  Introduce _dl_unmap to
	release tlsdesc_table.
	* sysdeps/i386/dl-machine.h: Include dl-tlsdesc.h.
	(elf_machine_type_class): Mark R_386_TLS_DESC as PLT class.
	(elf_machine_rel): Handle R_386_TLS_DESC.
	(elf_machine_rela): Likewise.
	(elf_machine_lazy_rel): Likewise.
	(elf_machine_lazy_rela): Likewise.
	* sysdeps/i386/dl-tls.h (struct dl_tls_index): Name it.
	* sysdeps/i386/dl-tlsdesc.S: New file.
	* sysdeps/i386/dl-tlsdesc.h: New file.
	* sysdeps/i386/tlsdesc.c: New file.
	* sysdeps/i386/tlsdesc.sym: New file.
	* sysdeps/i386/bits/linkmap.h (struct link_map_machine): Add
	tlsdesc_table.
	* sysdeps/x86_64/Makefile (sysdep-dl-routines, sysdep_routines,
	sysdep-rtld-routines): Add tlsdesc and dl-tlsdesc for elf subdir.
	(gen-as-const-headers): Add tlsdesc.sym to csu subdir.
	* sysdeps/x86_64/dl-lookupcfg.h: New file.  Introduce _dl_unmap to
	release tlsdesc_table.
	* sysdeps/x86_64/dl-machine.h: Include dl-tlsdesc.h.
	(elf_machine_runtime_setup): Set up lazy TLSDESC GOT entry.
	(elf_machine_type_class): Mark R_X86_64_TLSDESC as PLT class.
	(elf_machine_rel): Handle R_X86_64_TLSDESC.
	(elf_machine_rela): Likewise.
	(elf_machine_lazy_rel): Likewise.
	* sysdeps/x86_64/dl-tls.h (struct dl_tls_index): Name it.
	(__tls_get_addr): Do not declare for non-shared compiles.
	* sysdeps/x86_64/dl-tlsdesc.S: New file.
	* sysdeps/x86_64/dl-tlsdesc.h: New file.
	* sysdeps/x86_64/tlsdesc.c: New file.
	* sysdeps/x86_64/tlsdesc.sym: New file.
	* sysdeps/x86_64/bits/linkmap.h (struct link_map_machine): Add
	tlsdesc_table for both 32- and 64-bit structs.

Index: elf/dl-conflict.c
===================================================================
--- elf/dl-conflict.c.orig	2006-01-13 18:14:11.000000000 -0500
+++ elf/dl-conflict.c	2006-01-13 18:16:22.000000000 -0500
@@ -45,6 +45,7 @@
 #define RESOLVE_MAP(ref, version, flags) (*ref = NULL, NULL)
 #define RESOLVE(ref, version, flags) (*ref = NULL, 0)
 #define CHECK_STATIC_TLS(ref_map, sym_map) ((void) 0)
+#define TRY_STATIC_TLS(ref_map, sym_map) (0)
 #define RESOLVE_CONFLICT_FIND_MAP(map, r_offset) \
   do {									      \
     while ((resolve_conflict_map->l_map_end < (ElfW(Addr)) (r_offset))	      \
Index: elf/dl-reloc.c
===================================================================
--- elf/dl-reloc.c.orig	2006-01-13 18:16:22.000000000 -0500
+++ elf/dl-reloc.c	2006-01-13 18:16:22.000000000 -0500
@@ -44,9 +44,9 @@
    This function intentionally does not return any value but signals error
    directly, as static TLS should be rare and code handling it should
    not be inlined as much as possible.  */
-void
-internal_function __attribute_noinline__
-_dl_allocate_static_tls (struct link_map *map)
+int
+internal_function
+_dl_try_allocate_static_tls (struct link_map *map)
 {
   /* If we've already used the variable with dynamic access, or if the
      alignment requirements are too high, fail.  */
@@ -54,8 +54,7 @@
       || map->l_tls_align > GL(dl_tls_static_align))
     {
     fail:
-      _dl_signal_error (0, map->l_name, NULL, N_("\
-cannot allocate memory in static TLS block"));
+      return -1;
     }
 
 # if TLS_TCB_AT_TP
@@ -109,6 +108,20 @@
     }
   else
     map->l_need_tls_init = 1;
+
+  return 0;
+}
+
+void
+internal_function __attribute_noinline__
+_dl_allocate_static_tls (struct link_map *map)
+{
+  if (map->l_tls_offset == FORCED_DYNAMIC_TLS_OFFSET
+      || _dl_try_allocate_static_tls (map))
+    {
+      _dl_signal_error (0, map->l_name, NULL, N_("\
+cannot allocate memory in static TLS block"));
+    }
 }
 
 /* Initialize static TLS area and DTV for current (only) thread.
@@ -267,6 +280,12 @@
 	_dl_allocate_static_tls (sym_map);				\
     } while (0)
 
+#define TRY_STATIC_TLS(map, sym_map)					\
+    (__builtin_expect ((sym_map)->l_tls_offset				\
+		       != FORCED_DYNAMIC_TLS_OFFSET, 1)			\
+     && (__builtin_expect ((sym_map)->l_tls_offset != NO_TLS_OFFSET, 1)	\
+	 || _dl_try_allocate_static_tls (sym_map) == 0))
+
 #include "dynamic-link.h"
 
     ELF_DYNAMIC_RELOCATE (l, lazy, consider_profiling);
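The new TRY_STATIC_TLS macro chooses between a static TLS offset and the dynamic path without signalling an error. The following is a minimal sketch of that decision. It assumes simplified sentinel values and a plain byte budget for the static TLS surplus, and it ignores alignment. The names `model_map`, `model_try_allocate_static_tls` and `model_try_static_tls` are hypothetical:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sentinel values for the model; glibc defines its own
   NO_TLS_OFFSET and FORCED_DYNAMIC_TLS_OFFSET.  */
enum { MODEL_NO_TLS_OFFSET = 0, MODEL_FORCED_DYNAMIC = -1 };

struct model_map
{
  long tls_offset;   /* Static TLS offset, or one of the sentinels.  */
  size_t tls_size;   /* Size of the module's TLS block.  */
};

/* Hypothetical stand-in for _dl_try_allocate_static_tls: carve the
   block out of the remaining surplus and return 0, or return -1 on
   failure without signalling an error.  Offsets are a running total
   of block sizes; alignment is ignored.  */
static int
model_try_allocate_static_tls (struct model_map *map, size_t *surplus,
                               long *next_offset)
{
  assert (surplus != NULL && next_offset != NULL);
  if (map->tls_offset == MODEL_FORCED_DYNAMIC || map->tls_size > *surplus)
    return -1;
  *surplus -= map->tls_size;
  *next_offset += (long) map->tls_size;
  map->tls_offset = *next_offset;
  return 0;
}

/* Same shape as TRY_STATIC_TLS: never static once forced dynamic;
   reuse an existing static offset; otherwise try to allocate one.
   A zero result means "take the dynamic path".  */
static int
model_try_static_tls (struct model_map *map, size_t *surplus,
                      long *next_offset)
{
  return map->tls_offset != MODEL_FORCED_DYNAMIC
         && (map->tls_offset != MODEL_NO_TLS_OFFSET
             || model_try_allocate_static_tls (map, surplus,
                                               next_offset) == 0);
}
```

A zero result corresponds to the branch in elf_machine_rel(a) that calls _dl_make_tlsdesc_dynamic and installs _dl_tlsdesc_dynamic. A nonzero result leads to _dl_tlsdesc_return with a precomputed offset.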
Index: elf/elf.h
===================================================================
--- elf/elf.h.orig	2006-01-13 18:14:11.000000000 -0500
+++ elf/elf.h	2006-01-13 18:16:22.000000000 -0500
@@ -699,6 +699,12 @@
    If any adjustment is made to the ELF object after it has been
    built these entries will need to be adjusted.  */
 #define DT_ADDRRNGLO	0x6ffffe00
+#define DT_TLSDESC_PLT	0x6ffffef6	/* Location of PLT entry for
+					   TLS descriptor resolver
+					   calls.  */
+#define DT_TLSDESC_GOT	0x6ffffef7	/* Location of GOT entry used
+					   by TLS descriptor resolver
+					   PLT entry.  */
 #define DT_GNU_CONFLICT	0x6ffffef8	/* Start of conflict section */
 #define DT_GNU_LIBLIST	0x6ffffef9	/* Library list */
 #define DT_CONFIG	0x6ffffefa	/* Configuration information.  */
@@ -1136,8 +1142,17 @@
 #define R_386_TLS_DTPMOD32 35		/* ID of module containing symbol */
 #define R_386_TLS_DTPOFF32 36		/* Offset in TLS block */
 #define R_386_TLS_TPOFF32  37		/* Negated offset in static TLS block */
+/* 38? */
+#define R_386_TLS_GOTDESC  39		/* GOT offset for TLS descriptor.  */
+#define R_386_TLS_DESC_CALL 40		/* Marker of call through TLS
+					   descriptor for
+					   relaxation.  */
+#define R_386_TLS_DESC     41		/* TLS descriptor containing
+					   pointer to code and to
+					   argument, returning the TLS
+					   offset for the symbol.  */
 /* Keep this the last entry.  */
-#define R_386_NUM	   38
+#define R_386_NUM	   42
 
 /* SUN SPARC specific definitions.  */
 
@@ -2509,8 +2524,17 @@
 #define R_X86_64_GOTTPOFF	22	/* 32 bit signed PC relative offset
 					   to GOT entry for IE symbol */
 #define R_X86_64_TPOFF32	23	/* Offset in initial TLS block */
+#define R_X86_64_PC64		24	/* PC relative 64 bit */
+#define R_X86_64_GOTOFF64	25	/* 64 bit offset to GOT */
+#define R_X86_64_GOTPC32	26	/* 32 bit signed pc relative
+					   offset to GOT */
+/* 27 .. 33 */
+#define R_X86_64_GOTPC32_TLSDESC 34	/* GOT offset for TLS descriptor.  */
+#define R_X86_64_TLSDESC_CALL   35	/* Marker for call through TLS
+					   descriptor.  */
+#define R_X86_64_TLSDESC        36	/* TLS descriptor.  */
 
-#define R_X86_64_NUM		24
+#define R_X86_64_NUM		37
 
 
 /* AM33 relocations.  */
Index: sysdeps/i386/Makefile
===================================================================
--- sysdeps/i386/Makefile.orig	2006-01-13 18:14:11.000000000 -0500
+++ sysdeps/i386/Makefile	2006-01-13 18:16:22.000000000 -0500
@@ -65,3 +65,13 @@
 ifneq (,$(filter -mno-tls-direct-seg-refs,$(CFLAGS)))
 defines += -DNO_TLS_DIRECT_SEG_REFS
 endif
+
+ifeq ($(subdir),elf)
+sysdep-dl-routines += tlsdesc dl-tlsdesc
+sysdep_routines += tlsdesc dl-tlsdesc
+sysdep-rtld-routines += tlsdesc dl-tlsdesc
+endif
+
+ifeq ($(subdir),csu)
+gen-as-const-headers += tlsdesc.sym
+endif
Index: sysdeps/i386/bits/linkmap.h
===================================================================
--- sysdeps/i386/bits/linkmap.h.orig	2006-01-13 18:14:11.000000000 -0500
+++ sysdeps/i386/bits/linkmap.h	2006-01-13 18:16:22.000000000 -0500
@@ -2,4 +2,5 @@
   {
     Elf32_Addr plt; /* Address of .plt + 0x16 */
     Elf32_Addr gotplt; /* Address of .got + 0x0c */
+    void *tlsdesc_table; /* Address of TLS descriptor hash table.  */
   };
Index: sysdeps/i386/dl-lookupcfg.h
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ sysdeps/i386/dl-lookupcfg.h	2006-01-13 18:16:22.000000000 -0500
@@ -0,0 +1,28 @@
+/* Configuration of lookup functions.
+   Copyright (C) 2005 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, write to the Free
+   Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA
+   02111-1307 USA.  */
+
+#define DL_UNMAP_IS_SPECIAL
+
+#include_next <dl-lookupcfg.h>
+
+struct link_map;
+
+extern void _dl_unmap (struct link_map *map);
+
+#define DL_UNMAP(map) _dl_unmap (map)
Index: sysdeps/i386/dl-machine.h
===================================================================
--- sysdeps/i386/dl-machine.h.orig	2006-01-13 18:14:11.000000000 -0500
+++ sysdeps/i386/dl-machine.h	2006-01-13 18:16:22.000000000 -0500
@@ -25,6 +25,7 @@
 #include <sys/param.h>
 #include <sysdep.h>
 #include <tls.h>
+#include <dl-tlsdesc.h>
 
 /* Return nonzero iff ELF header is compatible with the running host.  */
 static inline int __attribute__ ((unused))
@@ -248,7 +249,7 @@
 # define elf_machine_type_class(type) \
   ((((type) == R_386_JMP_SLOT || (type) == R_386_TLS_DTPMOD32		      \
      || (type) == R_386_TLS_DTPOFF32 || (type) == R_386_TLS_TPOFF32	      \
-     || (type) == R_386_TLS_TPOFF)					      \
+     || (type) == R_386_TLS_TPOFF || (type) == R_386_TLS_DESC)		      \
     * ELF_RTYPE_CLASS_PLT)						      \
    | (((type) == R_386_COPY) * ELF_RTYPE_CLASS_COPY))
 #else
@@ -375,6 +376,38 @@
 	    *reloc_addr = sym->st_value;
 # endif
 	  break;
+	case R_386_TLS_DESC:
+	  {
+	    struct tlsdesc volatile *td =
+	      (struct tlsdesc volatile *)reloc_addr;
+
+# ifndef RTLD_BOOTSTRAP
+	    if (! sym)
+	      td->entry = _dl_tlsdesc_undefweak;
+	    else
+# endif
+	      {
+# ifndef RTLD_BOOTSTRAP
+#  ifndef SHARED
+		CHECK_STATIC_TLS (map, sym_map);
+#  else
+		if (!TRY_STATIC_TLS (map, sym_map))
+		  {
+		    td->arg = _dl_make_tlsdesc_dynamic
+		      (sym_map, sym->st_value + (ElfW(Word))td->arg);
+		    td->entry = _dl_tlsdesc_dynamic;
+		  }
+		else
+#  endif
+# endif
+		  {
+		    td->arg = (void*)(sym->st_value - sym_map->l_tls_offset
+				      + (ElfW(Word))td->arg);
+		    td->entry = _dl_tlsdesc_return;
+		  }
+	      }
+	    break;
+	  }
 	case R_386_TLS_TPOFF32:
 	  /* The offset is positive, backward from the thread pointer.  */
 # ifdef RTLD_BOOTSTRAP
@@ -488,6 +521,41 @@
 	     Therefore the offset is already correct.  */
 	  *reloc_addr = (sym == NULL ? 0 : sym->st_value) + reloc->r_addend;
 	  break;
+	case R_386_TLS_DESC:
+	  {
+	    struct tlsdesc volatile *td =
+	      (struct tlsdesc volatile *)reloc_addr;
+
+# ifndef RTLD_BOOTSTRAP
+	    if (!sym)
+	      {
+		td->arg = (void*)reloc->r_addend;
+		td->entry = _dl_tlsdesc_undefweak;
+	      }
+	    else
+# endif
+	      {
+# ifndef RTLD_BOOTSTRAP
+#  ifndef SHARED
+		CHECK_STATIC_TLS (map, sym_map);
+#  else
+		if (!TRY_STATIC_TLS (map, sym_map))
+		  {
+		    td->arg = _dl_make_tlsdesc_dynamic
+		      (sym_map, sym->st_value + reloc->r_addend);
+		    td->entry = _dl_tlsdesc_dynamic;
+		  }
+		else
+#  endif
+# endif
+		  {
+		    td->arg = (void*)(sym->st_value - sym_map->l_tls_offset
+				      + reloc->r_addend);
+		    td->entry = _dl_tlsdesc_return;
+		  }
+	      }
+	  }
+	  break;
 	case R_386_TLS_TPOFF32:
 	  /* The offset is positive, backward from the thread pointer.  */
 	  /* We know the offset of object the symbol is contained in.
@@ -582,6 +650,55 @@
 	*reloc_addr = (map->l_mach.plt
 		       + (((Elf32_Addr) reloc_addr) - map->l_mach.gotplt) * 4);
     }
+#ifdef USE_TLS
+  else if (__builtin_expect (r_type == R_386_TLS_DESC, 1))
+    {
+      struct tlsdesc volatile * __attribute__((__unused__)) td =
+	(struct tlsdesc volatile *)reloc_addr;
+
+      /* Handle relocations that reference the local *ABS* in a simple
+	 way, so as to preserve a potential addend.  */
+      if (ELF32_R_SYM (reloc->r_info) == 0)
+	td->entry = _dl_tlsdesc_resolve_abs_plus_addend;
+      /* Given a known-zero addend, we can store a pointer to the
+	 reloc in the arg position.  */
+      else if (td->arg == 0)
+	{
+	  td->arg = (void*)reloc;
+	  td->entry = _dl_tlsdesc_resolve_rel;
+	}
+      else
+	{
+	  /* We could handle non-*ABS* relocations with non-zero addends
+	     by allocating dynamically an arg to hold a pointer to the
+	     reloc, but that sounds pointless.  */
+	  const Elf32_Rel *const r = reloc;
+	  /* The code below was borrowed from elf_dynamic_do_rel().  */
+	  const ElfW(Sym) *const symtab =
+	    (const void *) D_PTR (map, l_info[DT_SYMTAB]);
+
+#ifdef RTLD_BOOTSTRAP
+	  /* The dynamic linker always uses versioning.  */
+	  assert (map->l_info[VERSYMIDX (DT_VERSYM)] != NULL);
+#else
+	  if (map->l_info[VERSYMIDX (DT_VERSYM)])
+#endif
+	    {
+	      const ElfW(Half) *const version =
+		(const void *) D_PTR (map, l_info[VERSYMIDX (DT_VERSYM)]);
+	      ElfW(Half) ndx = version[ELFW(R_SYM) (r->r_info)] & 0x7fff;
+	      elf_machine_rel (map, r, &symtab[ELFW(R_SYM) (r->r_info)],
+			       &map->l_versions[ndx],
+			       (void *) (l_addr + r->r_offset));
+	    }
+#ifndef RTLD_BOOTSTRAP
+	  else
+	    elf_machine_rel (map, r, &symtab[ELFW(R_SYM) (r->r_info)], NULL,
+			     (void *) (l_addr + r->r_offset));
+#endif
+	}
+    }
+#endif
   else
     _dl_reloc_bad_type (map, r_type, 1);
 }
@@ -593,6 +710,22 @@
 elf_machine_lazy_rela (struct link_map *map,
 		       Elf32_Addr l_addr, const Elf32_Rela *reloc)
 {
+#ifdef USE_TLS
+  Elf32_Addr *const reloc_addr = (void *) (l_addr + reloc->r_offset);
+  const unsigned int r_type = ELF32_R_TYPE (reloc->r_info);
+  if (__builtin_expect (r_type == R_386_JMP_SLOT, 1))
+    ;
+  else if (__builtin_expect (r_type == R_386_TLS_DESC, 1))
+    {
+      struct tlsdesc volatile * __attribute__((__unused__)) td =
+	(struct tlsdesc volatile *)reloc_addr;
+
+      td->arg = (void*)reloc;
+      td->entry = _dl_tlsdesc_resolve_rela;
+    }
+  else
+    _dl_reloc_bad_type (map, r_type, 1);
+#endif
 }
 
 #endif	/* !RTLD_BOOTSTRAP */
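The lazy path above stores a pointer to the relocation in the descriptor's argument word and points the entry at a resolver. The first call fixes up the descriptor and then jumps to the final entry. The following is a simplified single-threaded sketch of that protocol. It uses hypothetical names throughout, reduces the relocation record to a static offset plus an addend, and omits register saving and any handling of concurrent first calls:

```c
#include <assert.h>
#include <stdint.h>

struct lazy_desc;
typedef intptr_t (*lazy_entry_fn) (struct lazy_desc *);

/* Hypothetical descriptor: the same two words as struct tlsdesc.  */
struct lazy_desc
{
  lazy_entry_fn entry;
  void *arg;
};

/* Hypothetical relocation record standing in for Elf32_Rel(a): the
   symbol's offset in static TLS plus the addend.  */
struct lazy_reloc
{
  intptr_t sym_offset;
  intptr_t addend;
};

static int resolve_calls;   /* Counts how often the slow path runs.  */

/* Final entry point, like _dl_tlsdesc_return: ARG is the offset.  */
static intptr_t
lazy_return (struct lazy_desc *td)
{
  return (intptr_t) td->arg;
}

/* Like _dl_tlsdesc_resolve_rela: ARG initially points at the reloc.
   Store the final argument before the new entry point (the order the
   patch uses), then transfer control to the new entry, as the
   assembly trampolines do with jmp *(%eax).  */
static intptr_t
lazy_resolve (struct lazy_desc *td)
{
  const struct lazy_reloc *r = td->arg;
  resolve_calls++;
  td->arg = (void *) (r->sym_offset + r->addend);
  td->entry = lazy_return;
  return td->entry (td);
}

/* What elf_machine_lazy_rela does for R_386_TLS_DESC in the patch.  */
static void
lazy_setup (struct lazy_desc *td, const struct lazy_reloc *r)
{
  assert (td != 0 && r != 0);
  td->arg = (void *) r;
  td->entry = lazy_resolve;
}
```

After the first call, the descriptor no longer refers to the relocation, so later accesses go straight to the final entry point.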
Index: sysdeps/i386/dl-tls.h
===================================================================
--- sysdeps/i386/dl-tls.h.orig	2006-01-13 18:14:11.000000000 -0500
+++ sysdeps/i386/dl-tls.h	2006-01-13 18:16:22.000000000 -0500
@@ -19,7 +19,7 @@
 
 
 /* Type used for the representation of TLS information in the GOT.  */
-typedef struct
+typedef struct dl_tls_index
 {
   unsigned long int ti_module;
   unsigned long int ti_offset;
Index: sysdeps/i386/dl-tlsdesc.S
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ sysdeps/i386/dl-tlsdesc.S	2006-01-13 18:16:22.000000000 -0500
@@ -0,0 +1,228 @@
+/* Thread-local storage handling in the ELF dynamic linker.  i386 version.
+   Copyright (C) 2004, 2005 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, write to the Free
+   Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA
+   02111-1307 USA.  */
+
+#include <sysdep.h>
+#include <tls.h>
+#include "tlsdesc.h"
+
+	.text
+#ifdef USE_TLS
+	.hidden _dl_tlsdesc_return
+	.global	_dl_tlsdesc_return
+	.type	_dl_tlsdesc_return,@function
+	cfi_startproc
+	.align 16
+_dl_tlsdesc_return:
+	movl	4(%eax), %eax
+	ret
+	cfi_endproc
+	.size	_dl_tlsdesc_return, .-_dl_tlsdesc_return
+
+	.hidden _dl_tlsdesc_undefweak
+	.global	_dl_tlsdesc_undefweak
+	.type	_dl_tlsdesc_undefweak,@function
+	cfi_startproc
+	.align 16
+_dl_tlsdesc_undefweak:
+	movl	4(%eax), %eax
+	subl	%gs:0, %eax
+	ret
+	cfi_endproc
+	.size	_dl_tlsdesc_undefweak, .-_dl_tlsdesc_undefweak
+
+#ifdef SHARED
+	.hidden _dl_tlsdesc_dynamic
+	.global	_dl_tlsdesc_dynamic
+	.type	_dl_tlsdesc_dynamic,@function
+
+     /* %eax points to the TLS descriptor, such that 0(%eax) points to
+	_dl_tlsdesc_dynamic itself, and 4(%eax) points to a struct
+	tlsdesc_dynamic_arg object.  It must return in %eax the offset
+	between the thread pointer and the object denoted by the
+	argument, without clobbering any registers.
+
+	The assembly code that follows is a rendition of the following
+	C code, hand-optimized a little bit.
+
+ptrdiff_t
+__attribute__ ((__regparm__ (1)))
+_dl_tlsdesc_dynamic (struct tlsdesc *tdp)
+{
+  struct tlsdesc_dynamic_arg *td = tdp->arg;
+  dtv_t *dtv = *(dtv_t **)((char *)__thread_pointer + DTV_OFFSET);
+  if (__builtin_expect (td->gen_count <= dtv[0].counter
+			&& (dtv[td->tlsinfo.ti_module].pointer.val
+			    != TLS_DTV_UNALLOCATED),
+			1))
+    return dtv[td->tlsinfo.ti_module].pointer.val + td->tlsinfo.ti_offset
+      - __thread_pointer;
+
+  return ___tls_get_addr (&td->tlsinfo) - __thread_pointer;
+}
+*/
+	cfi_startproc
+	.align 16
+_dl_tlsdesc_dynamic:
+	/* Preserve call-clobbered registers.
+	   We need two scratch regs anyway.
+	   FIXME: maybe remove the requirement to preserve them?  */
+	subl	$28, %esp
+	cfi_adjust_cfa_offset (28)
+	movl	%ecx, 20(%esp)
+	movl	%edx, 24(%esp)
+	movl	TLSDESC_ARG(%eax), %eax
+	movl	%gs:DTV_OFFSET, %edx
+	movl	TLSDESC_GEN_COUNT(%eax), %ecx
+	cmpl	(%edx), %ecx
+	ja	.Lslow
+	movl	TLSDESC_MODID(%eax), %ecx
+	movl	(%edx,%ecx,8), %edx
+	cmpl	$-1, %edx
+	je	.Lslow
+	movl	TLSDESC_MODOFF(%eax), %eax
+	addl	%edx, %eax
+.Lret:
+	movl	20(%esp), %ecx
+	subl	%gs:0, %eax
+	movl	24(%esp), %edx
+	addl	$28, %esp
+	cfi_adjust_cfa_offset (-28)
+	ret
+	.p2align 4,,7
+.Lslow:
+	cfi_adjust_cfa_offset (28)
+	movl	%ebx, 16(%esp)
+	call	__i686.get_pc_thunk.bx
+	addl	$_GLOBAL_OFFSET_TABLE_, %ebx
+	call	___tls_get_addr@PLT
+	movl	16(%esp), %ebx
+	jmp	.Lret
+	cfi_endproc
+	.size	_dl_tlsdesc_dynamic, .-_dl_tlsdesc_dynamic
+#endif /* SHARED */
+
+	.hidden _dl_tlsdesc_resolve_abs_plus_addend
+	.global	_dl_tlsdesc_resolve_abs_plus_addend
+	.type	_dl_tlsdesc_resolve_abs_plus_addend,@function
+	cfi_startproc
+	.align 16
+_dl_tlsdesc_resolve_abs_plus_addend:
+0:
+	pushl	%eax
+	cfi_adjust_cfa_offset (4)
+	pushl	%ecx
+	cfi_adjust_cfa_offset (4)
+	pushl	%edx
+	cfi_adjust_cfa_offset (4)
+	movl	$1f - 0b, %ecx
+	movl	4(%ebx), %edx
+	call	_dl_tlsdesc_resolve_abs_plus_addend_fixup
+1:
+	popl	%edx
+	cfi_adjust_cfa_offset (-4)
+	popl	%ecx
+	cfi_adjust_cfa_offset (-4)
+	popl	%eax
+	cfi_adjust_cfa_offset (-4)
+	jmp	*(%eax)
+	cfi_endproc
+	.size	_dl_tlsdesc_resolve_abs_plus_addend, .-_dl_tlsdesc_resolve_abs_plus_addend
+
+	.hidden _dl_tlsdesc_resolve_rel
+	.global	_dl_tlsdesc_resolve_rel
+	.type	_dl_tlsdesc_resolve_rel,@function
+	cfi_startproc
+	.align 16
+_dl_tlsdesc_resolve_rel:
+0:
+	pushl	%eax
+	cfi_adjust_cfa_offset (4)
+	pushl	%ecx
+	cfi_adjust_cfa_offset (4)
+	pushl	%edx
+	cfi_adjust_cfa_offset (4)
+	movl	$1f - 0b, %ecx
+	movl	4(%ebx), %edx
+	call	_dl_tlsdesc_resolve_rel_fixup
+1:
+	popl	%edx
+	cfi_adjust_cfa_offset (-4)
+	popl	%ecx
+	cfi_adjust_cfa_offset (-4)
+	popl	%eax
+	cfi_adjust_cfa_offset (-4)
+	jmp	*(%eax)
+	cfi_endproc
+	.size	_dl_tlsdesc_resolve_rel, .-_dl_tlsdesc_resolve_rel
+
+	.hidden _dl_tlsdesc_resolve_rela
+	.global	_dl_tlsdesc_resolve_rela
+	.type	_dl_tlsdesc_resolve_rela,@function
+	cfi_startproc
+	.align 16
+_dl_tlsdesc_resolve_rela:
+0:
+	pushl	%eax
+	cfi_adjust_cfa_offset (4)
+	pushl	%ecx
+	cfi_adjust_cfa_offset (4)
+	pushl	%edx
+	cfi_adjust_cfa_offset (4)
+	movl	$1f - 0b, %ecx
+	movl	4(%ebx), %edx
+	call	_dl_tlsdesc_resolve_rela_fixup
+1:
+	popl	%edx
+	cfi_adjust_cfa_offset (-4)
+	popl	%ecx
+	cfi_adjust_cfa_offset (-4)
+	popl	%eax
+	cfi_adjust_cfa_offset (-4)
+	jmp	*(%eax)
+	cfi_endproc
+	.size	_dl_tlsdesc_resolve_rela, .-_dl_tlsdesc_resolve_rela
+
+	.hidden _dl_tlsdesc_resolve_hold
+	.global	_dl_tlsdesc_resolve_hold
+	.type	_dl_tlsdesc_resolve_hold,@function
+	cfi_startproc
+	.align 16
+_dl_tlsdesc_resolve_hold:
+0:
+	pushl	%eax
+	cfi_adjust_cfa_offset (4)
+	pushl	%ecx
+	cfi_adjust_cfa_offset (4)
+	pushl	%edx
+	cfi_adjust_cfa_offset (4)
+	movl	$1f - 0b, %ecx
+	movl	4(%ebx), %edx
+	call	_dl_tlsdesc_resolve_hold_fixup
+1:
+	popl	%edx
+	cfi_adjust_cfa_offset (-4)
+	popl	%ecx
+	cfi_adjust_cfa_offset (-4)
+	popl	%eax
+	cfi_adjust_cfa_offset (-4)
+	jmp	*(%eax)
+	cfi_endproc
+	.size	_dl_tlsdesc_resolve_hold, .-_dl_tlsdesc_resolve_hold
+
+#endif /* USE_TLS */
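The comment in _dl_tlsdesc_dynamic gives a C rendition of its fast path. The following is a portable sketch of the same fast/slow split. It assumes a toy DTV in which element 0 holds the generation counter, and it uses a static buffer to stand in for the memory that ___tls_get_addr would allocate. All names, the 64-byte block size and the `MODEL_UNALLOCATED` marker are hypothetical. The marker is -1, matching the `cmpl $-1` in the assembly:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy DTV: element 0 carries the generation counter,
   element N the block address of module N.  */
#define MODEL_UNALLOCATED ((uintptr_t) -1)

typedef union
{
  size_t counter;
  uintptr_t val;
} model_dtv_t;

struct model_dynamic_arg
{
  size_t ti_module;
  size_t ti_offset;
  size_t gen_count;
};

static int slow_calls;
static char model_heap[256];
static size_t model_heap_used;

/* Hypothetical stand-in for ___tls_get_addr: bring the generation up
   to date, allocate the module's block if needed, return the address.  */
static uintptr_t
model_tls_get_addr (model_dtv_t *dtv, const struct model_dynamic_arg *a)
{
  slow_calls++;
  if (dtv[0].counter < a->gen_count)
    dtv[0].counter = a->gen_count;
  if (dtv[a->ti_module].val == MODEL_UNALLOCATED)
    {
      assert (model_heap_used + 64 <= sizeof model_heap);
      dtv[a->ti_module].val = (uintptr_t) &model_heap[model_heap_used];
      model_heap_used += 64;
    }
  return dtv[a->ti_module].val + a->ti_offset;
}

/* Fast path: if the DTV is recent enough and the block is allocated,
   compute the offset from the thread pointer directly; otherwise fall
   back to the slow path and subtract the thread pointer.  */
static intptr_t
model_tlsdesc_dynamic (model_dtv_t *dtv, const struct model_dynamic_arg *a,
                       uintptr_t tp)
{
  if (a->gen_count <= dtv[0].counter
      && dtv[a->ti_module].val != MODEL_UNALLOCATED)
    return (intptr_t) (dtv[a->ti_module].val + a->ti_offset - tp);
  return (intptr_t) (model_tls_get_addr (dtv, a) - tp);
}
```

Once the first slow-path call has allocated the block, later calls with a generation the DTV already covers take the fast path. This is what lets the assembly avoid calling ___tls_get_addr in the common case.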
Index: sysdeps/i386/dl-tlsdesc.h
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ sysdeps/i386/dl-tlsdesc.h	2006-01-13 18:16:22.000000000 -0500
@@ -0,0 +1,60 @@
+/* Thread-local storage descriptor handling in the ELF dynamic linker.
+   i386 version.
+   Copyright (C) 2005 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, write to the Free
+   Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA
+   02111-1307 USA.  */
+
+#ifndef _I386_DL_TLSDESC_H
+# define _I386_DL_TLSDESC_H 1
+
+/* Type used to represent a TLS descriptor in the GOT.  */
+struct tlsdesc
+{
+  ptrdiff_t __attribute__((regparm(1))) (*entry)(struct tlsdesc *);
+  void *arg;
+};
+
+typedef struct dl_tls_index
+{
+  unsigned long int ti_module;
+  unsigned long int ti_offset;
+} tls_index;
+
+/* Type used as the argument in a TLS descriptor for a symbol that
+   needs dynamic TLS offsets.  */
+struct tlsdesc_dynamic_arg
+{
+  tls_index tlsinfo;
+  size_t gen_count;
+};
+
+extern ptrdiff_t attribute_hidden __attribute__((regparm(1)))
+  _dl_tlsdesc_return(struct tlsdesc *),
+  _dl_tlsdesc_undefweak(struct tlsdesc *),
+  _dl_tlsdesc_resolve_abs_plus_addend(struct tlsdesc *),
+  _dl_tlsdesc_resolve_rel(struct tlsdesc *),
+  _dl_tlsdesc_resolve_rela(struct tlsdesc *),
+  _dl_tlsdesc_resolve_hold(struct tlsdesc *);
+
+# ifdef SHARED
+extern void *_dl_make_tlsdesc_dynamic (struct link_map *map, size_t ti_offset);
+
+extern ptrdiff_t attribute_hidden __attribute__((regparm(1)))
+  _dl_tlsdesc_dynamic(struct tlsdesc *);
+# endif
+
+#endif
Index: sysdeps/i386/tlsdesc.c
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ sysdeps/i386/tlsdesc.c	2006-01-13 18:16:22.000000000 -0500
@@ -0,0 +1,673 @@
+/* Manage TLS descriptors.  i386 version.
+   Copyright (C) 2005 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, write to the Free
+   Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA
+   02111-1307 USA.  */
+
+#include <link.h>
+#include <ldsodefs.h>
+#include <elf/dynamic-link.h>
+#include <tls.h>
+#include <dl-tlsdesc.h>
+
+#ifdef USE_TLS
+# ifdef SHARED
+
+extern void weak_function free (void *ptr);
+
+/* The hashcode handling code below is heavily inspired by libiberty's
+   hashtab code, but with most adaptation points and support for
+   deleting elements removed.
+
+   Copyright (C) 1999, 2000, 2001, 2002, 2003 Free Software Foundation, Inc.
+   Contributed by Vladimir Makarov (vmakarov@cygnus.com).  */
+
+inline static unsigned long
+higher_prime_number (unsigned long n)
+{
+  /* These are primes that are near, but slightly smaller than, a
+     power of two.  */
+  static const unsigned long primes[] = {
+    (unsigned long) 7,
+    (unsigned long) 13,
+    (unsigned long) 31,
+    (unsigned long) 61,
+    (unsigned long) 127,
+    (unsigned long) 251,
+    (unsigned long) 509,
+    (unsigned long) 1021,
+    (unsigned long) 2039,
+    (unsigned long) 4093,
+    (unsigned long) 8191,
+    (unsigned long) 16381,
+    (unsigned long) 32749,
+    (unsigned long) 65521,
+    (unsigned long) 131071,
+    (unsigned long) 262139,
+    (unsigned long) 524287,
+    (unsigned long) 1048573,
+    (unsigned long) 2097143,
+    (unsigned long) 4194301,
+    (unsigned long) 8388593,
+    (unsigned long) 16777213,
+    (unsigned long) 33554393,
+    (unsigned long) 67108859,
+    (unsigned long) 134217689,
+    (unsigned long) 268435399,
+    (unsigned long) 536870909,
+    (unsigned long) 1073741789,
+    (unsigned long) 2147483647,
+					/* 4294967291L */
+    ((unsigned long) 2147483647) + ((unsigned long) 2147483644),
+  };
+
+  const unsigned long *low = &primes[0];
+  const unsigned long *high = &primes[sizeof(primes) / sizeof(primes[0])];
+
+  while (low != high)
+    {
+      const unsigned long *mid = low + (high - low) / 2;
+      if (n > *mid)
+	low = mid + 1;
+      else
+	high = mid;
+    }
+
+#if 0
+  /* If we've run out of primes, abort.  */
+  if (n > *low)
+    {
+      fprintf (stderr, "Cannot find prime bigger than %lu\n", n);
+      abort ();
+    }
+#endif
+
+  return *low;
+}
+
+struct hashtab
+{
+  /* Table itself.  */
+  void **entries;
+
+  /* Current size (in entries) of the hash table */
+  size_t size;
+
+  /* Current number of elements.  */
+  size_t n_elements;
+};
+
+inline static struct hashtab *
+htab_create (void)
+{
+  struct hashtab *ht = malloc (sizeof (struct hashtab));
+
+  if (! ht)
+    return NULL;
+  ht->size = 3;
+  ht->entries = malloc (sizeof (void *) * ht->size);
+  if (! ht->entries)
+    return NULL;
+
+  ht->n_elements = 0;
+
+  memset (ht->entries, 0, sizeof (void *) * ht->size);
+
+  return ht;
+}
+
+/* This is only called from _dl_unmap, so it's safe to call
+   free().  See the discussion below.  */
+inline static void
+htab_delete (struct hashtab *htab)
+{
+  int i;
+
+  for (i = htab->size - 1; i >= 0; i--)
+    if (htab->entries[i])
+      free (htab->entries[i]);
+
+  free (htab->entries);
+  free (htab);
+}
+
+/* Similar to htab_find_slot, but without several unwanted side effects:
+    - Does not call htab->eq_f when it finds an existing entry.
+    - Does not change the count of elements/searches/collisions in the
+      hash table.
+   This function also assumes there are no deleted entries in the table.
+   HASH is the hash value for the element to be inserted.  */
+
+inline static void **
+find_empty_slot_for_expand (struct hashtab *htab, int hash)
+{
+  size_t size = htab->size;
+  unsigned int index = hash % size;
+  void **slot = htab->entries + index;
+  int hash2;
+
+  if (! *slot)
+    return slot;
+
+  hash2 = 1 + hash % (size - 2);
+  for (;;)
+    {
+      index += hash2;
+      if (index >= size)
+	index -= size;
+
+      slot = htab->entries + index;
+      if (! *slot)
+	return slot;
+    }
+}
+
+/* The following function changes size of memory allocated for the
+   entries and repeatedly inserts the table elements.  The occupancy
+   of the table after the call will be about 50%.  Naturally the hash
+   table must already exist.  Remember also that the place of the
+   table entries is changed.  If memory allocation failures are allowed,
+   this function will return zero, indicating that the table could not be
+   expanded.  If all goes well, it will return a non-zero value.  */
+
+inline static int
+htab_expand (struct hashtab *htab, int (*hash_fn)(void *))
+{
+  void **oentries;
+  void **olimit;
+  void **p;
+  void **nentries;
+  size_t nsize;
+
+  oentries = htab->entries;
+  olimit = oentries + htab->size;
+
+  /* Grow the table when it is more than half full; otherwise just
+     rehash the entries at the current size.  */
+  if (htab->n_elements * 2 > htab->size)
+    nsize = higher_prime_number (htab->n_elements * 2);
+  else
+    nsize = htab->size;
+
+  nentries = malloc (sizeof (void *) * nsize);
+  if (nentries == NULL)
+    return 0;
+  memset (nentries, 0, sizeof (void *) * nsize);
+  htab->entries = nentries;
+  htab->size = nsize;
+
+  p = oentries;
+  do
+    {
+      if (*p)
+	*find_empty_slot_for_expand (htab, hash_fn (*p))
+	  = *p;
+
+      p++;
+    }
+  while (p < olimit);
+
+#if 0 /* We can't tell whether this was allocated by the malloc()
+	 built into ld.so or the one in the main executable or libc,
+	 and calling free() for something that wasn't malloc()ed could
+	 do Very Bad Things (TM).  Take the conservative approach
+	 here, potentially wasting as much memory as actually used by
+	 the hash table, even if multiple growths occur.  That's not
+	 so bad as to require some overengineered solution that would
+	 enable us to keep track of how it was allocated. */
+  free (oentries);
+#endif
+  return 1;
+}
+
+/* This function searches for a hash table slot containing an entry
+   equal to the given element.  If there is none and INSERT = 0, it
+   returns NULL.  To insert an entry, call this with INSERT = 1, then,
+   if the returned slot is still empty, write the value you want into
+   it.  When inserting an entry, NULL may also be returned if memory
+   allocation fails.  Unlike libiberty's htab_find_slot, there is no
+   support for deleting entries.  */
+
+inline static void **
+htab_find_slot (struct hashtab *htab, void *ptr, int insert,
+		int (*hash_fn)(void *), int (*eq_fn)(void *, void *))
+{
+  unsigned int index;
+  int hash, hash2;
+  size_t size;
+  void **entry;
+
+  if (htab->size * 3 <= htab->n_elements * 4
+      && htab_expand (htab, hash_fn) == 0)
+    return NULL;
+
+  hash = hash_fn (ptr);
+
+  size = htab->size;
+  index = hash % size;
+
+  entry = &htab->entries[index];
+  if (!*entry)
+    goto empty_entry;
+  else if (eq_fn (*entry, ptr))
+    return entry;
+
+  hash2 = 1 + hash % (size - 2);
+  for (;;)
+    {
+      index += hash2;
+      if (index >= size)
+	index -= size;
+
+      entry = &htab->entries[index];
+      if (!*entry)
+	goto empty_entry;
+      else if (eq_fn (*entry, ptr))
+	return entry;
+    }
+
+ empty_entry:
+  if (!insert)
+    return NULL;
+
+  htab->n_elements++;
+  return entry;
+}
+
+inline static int
+hash_tlsdesc(void *p)
+{
+  struct tlsdesc_dynamic_arg *td = p;
+
+  /* We know all entries are for the same module, so ti_offset is the
+     only distinguishing field.  */
+  return td->tlsinfo.ti_offset;
+}
+
+inline static int
+eq_tlsdesc(void *p, void *q)
+{
+  struct tlsdesc_dynamic_arg *tdp = p, *tdq = q;
+
+  return tdp->tlsinfo.ti_offset == tdq->tlsinfo.ti_offset;
+}
+
+inline static int
+map_generation (struct link_map *map)
+{
+  size_t idx = map->l_tls_modid;
+  struct dtv_slotinfo_list *listp = GL(dl_tls_dtv_slotinfo_list);
+
+  /* Find the place in the dtv slotinfo list.  */
+  do
+    {
+      /* Does it fit in the array of this list element?  */
+      if (idx < listp->len)
+	{
+	  /* We should never get here for a module in static TLS, so
+	     we can assume that, if the generation count is zero, we
+	     still haven't determined the generation count for this
+	     module.  */
+	  if (listp->slotinfo[idx].gen)
+	    return listp->slotinfo[idx].gen;
+	  else
+	    break;
+	}
+      idx -= listp->len;
+      listp = listp->next;
+    }
+  while (listp != NULL);
+
+  /* If we get to this point, the module still hasn't been assigned an
+     entry in the dtv slotinfo data structures, and it will when we're
+     done with relocations.  At that point, the module will get a
+     generation number that is one past the current generation, so
+     return exactly that.  */
+  return GL(dl_tls_generation) + 1;
+}
+
+void *
+_dl_make_tlsdesc_dynamic (struct link_map *map, size_t ti_offset)
+{
+  struct hashtab *ht;
+  void **entry;
+  struct tlsdesc_dynamic_arg *td, test;
+
+  /* FIXME: We could use a per-map lock here, but is it worth it?  */
+  __rtld_lock_lock_recursive (GL(dl_load_lock));
+
+  ht = map->l_mach.tlsdesc_table;
+  if (! ht)
+    {
+      ht = htab_create ();
+      if (! ht)
+	{
+	  __rtld_lock_unlock_recursive (GL(dl_load_lock));
+	  return 0;
+	}
+      map->l_mach.tlsdesc_table = ht;
+    }
+
+  test.tlsinfo.ti_module = map->l_tls_modid;
+  test.tlsinfo.ti_offset = ti_offset;
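+  entry = htab_find_slot (ht, &test, 1, hash_tlsdesc, eq_tlsdesc);
+  if (! entry)
+    {
+      /* Growing the table failed.  */
+      __rtld_lock_unlock_recursive (GL(dl_load_lock));
+      return 0;
+    }
+
+  if (*entry)
+    {
+      td = *entry;
+      __rtld_lock_unlock_recursive (GL(dl_load_lock));
+      return td;
+    }
+
+  *entry = td = malloc (sizeof (struct tlsdesc_dynamic_arg));
+  if (! td)
+    {
+      __rtld_lock_unlock_recursive (GL(dl_load_lock));
+      return 0;
+    }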
+  /* This may be higher than the map's generation, but it doesn't
+     matter much.  Worst case, we'll have one extra DTV update per
+     thread.  */
+  td->gen_count = map_generation (map);
+  td->tlsinfo = test.tlsinfo;
+
+  __rtld_lock_unlock_recursive (GL(dl_load_lock));
+  return td;
+}
+
+# endif /* SHARED */
+
+/* The idea of the following two functions is to stop multiple threads
+   from attempting to resolve the same TLS descriptor, without resorting
+   to busy waiting.  Ideally, we should be able to release the lock
+   right after changing td->entry, and then use, say, a condition
+   variable or a futex wake to wake up any waiting threads, but let's
+   try to avoid introducing such dependencies.  */
+
+inline static int
+_dl_tlsdesc_resolve_early_return_p (struct tlsdesc volatile *td, void *caller)
+{
+  if (caller != td->entry)
+    return 1;
+
+  __rtld_lock_lock_recursive (GL(dl_load_lock));
+  if (caller != td->entry)
+    {
+      __rtld_lock_unlock_recursive (GL(dl_load_lock));
+      return 1;
+    }
+
+  td->entry = _dl_tlsdesc_resolve_hold;
+
+  return 0;
+}
+
+inline static void
+_dl_tlsdesc_wake_up_held_fixups (void)
+{
+  __rtld_lock_unlock_recursive (GL(dl_load_lock));
+}
+
+/* The following 4 functions take an entry_check_offset argument.
+   It's computed by the caller as the offset between its entry point
+   and the call site, such that by subtracting this offset from the
+   return address obtained with __builtin_return_address, we can
+   easily obtain the caller's entry point to compare with the entry
+   point given in the TLS descriptor.  If the descriptor's entry
+   point has changed, we want to return immediately.  */
+
+/* These macros are copied from elf/dl-reloc.c.  */
+
+#define CHECK_STATIC_TLS(map, sym_map)					\
+    do {								\
+      if (__builtin_expect ((sym_map)->l_tls_offset == NO_TLS_OFFSET	\
+			    || ((sym_map)->l_tls_offset			\
+				== FORCED_DYNAMIC_TLS_OFFSET), 0))	\
+	_dl_allocate_static_tls (sym_map);				\
+    } while (0)
+
+#define TRY_STATIC_TLS(map, sym_map)					\
+    (__builtin_expect ((sym_map)->l_tls_offset				\
+		       != FORCED_DYNAMIC_TLS_OFFSET, 1)			\
+     && (__builtin_expect ((sym_map)->l_tls_offset != NO_TLS_OFFSET, 1)	\
+	 || _dl_try_allocate_static_tls (sym_map) == 0))
+
+int internal_function _dl_try_allocate_static_tls (struct link_map *map);
+
+/* This function is used to lazily resolve TLS_DESC REL relocations
+   that reference the *ABS* segment in their own link maps.  The
+   argument is the addend originally stored there.  */
+
+void
+__attribute__ ((regparm (3))) attribute_hidden
+_dl_tlsdesc_resolve_abs_plus_addend_fixup (struct tlsdesc volatile *td,
+					   struct link_map *l,
+					   ptrdiff_t entry_check_offset)
+{
+  ptrdiff_t addend = (ptrdiff_t) td->arg;
+
+  if (_dl_tlsdesc_resolve_early_return_p (td, __builtin_return_address (0)
+					  - entry_check_offset))
+    return;
+
+#ifndef SHARED
+  CHECK_STATIC_TLS (l, l);
+#else
+  if (!TRY_STATIC_TLS (l, l))
+    {
+      td->arg = _dl_make_tlsdesc_dynamic (l, addend);
+      td->entry = _dl_tlsdesc_dynamic;
+    }
+  else
+#endif
+    {
+      td->arg = (void*)(addend - l->l_tls_offset);
+      td->entry = _dl_tlsdesc_return;
+    }
+
+  _dl_tlsdesc_wake_up_held_fixups ();
+}
+
+/* This function is used to lazily resolve TLS_DESC REL relocations
+   that originally had zero addends.  The argument location, that
+   originally held the addend, is used to hold a pointer to the
+   relocation, but it has to be restored before we call the function
+   that applies relocations.  */
+
+void
+__attribute__ ((regparm (3))) attribute_hidden
+_dl_tlsdesc_resolve_rel_fixup (struct tlsdesc volatile *td,
+			       struct link_map *l,
+			       ptrdiff_t entry_check_offset)
+{
+  const ElfW(Rel) *reloc = td->arg;
+
+  if (_dl_tlsdesc_resolve_early_return_p (td, __builtin_return_address (0)
+					  - entry_check_offset))
+    return;
+
+  /* The code below was borrowed from _dl_fixup(),
+     except for checking for STB_LOCAL.  */
+  const ElfW(Sym) *const symtab
+    = (const void *) D_PTR (l, l_info[DT_SYMTAB]);
+  const char *strtab = (const void *) D_PTR (l, l_info[DT_STRTAB]);
+  const ElfW(Sym) *sym = &symtab[ELFW(R_SYM) (reloc->r_info)];
+  lookup_t result;
+
+   /* Look up the target symbol.  If the normal lookup rules are not
+      used don't look in the global scope.  */
+  if (ELFW(ST_BIND) (sym->st_info) != STB_LOCAL
+      && __builtin_expect (ELFW(ST_VISIBILITY) (sym->st_other), 0) == 0)
+    {
+      const struct r_found_version *version = NULL;
+
+      if (l->l_info[VERSYMIDX (DT_VERSYM)] != NULL)
+	{
+	  const ElfW(Half) *vernum =
+	    (const void *) D_PTR (l, l_info[VERSYMIDX (DT_VERSYM)]);
+	  ElfW(Half) ndx = vernum[ELFW(R_SYM) (reloc->r_info)] & 0x7fff;
+	  version = &l->l_versions[ndx];
+	  if (version->hash == 0)
+	    version = NULL;
+	}
+
+      result = _dl_lookup_symbol_x (strtab + sym->st_name, l, &sym,
+				    l->l_scope, version, ELF_RTYPE_CLASS_PLT,
+				    DL_LOOKUP_ADD_DEPENDENCY, NULL);
+    }
+  else
+    {
+      /* We already found the symbol.  The module (and therefore its load
+	 address) is also known.  */
+      result = l;
+    }
+
+  if (!sym)
+    {
+      td->arg = 0;
+      td->entry = _dl_tlsdesc_undefweak;
+    }
+  else
+    {
+#  ifndef SHARED
+      CHECK_STATIC_TLS (l, result);
+#  else
+      if (!TRY_STATIC_TLS (l, result))
+	{
+	  td->arg = _dl_make_tlsdesc_dynamic (result, sym->st_value);
+	  td->entry = _dl_tlsdesc_dynamic;
+	}
+      else
+#  endif
+	{
+	  td->arg = (void*)(sym->st_value - result->l_tls_offset);
+	  td->entry = _dl_tlsdesc_return;
+	}
+    }
+
+  _dl_tlsdesc_wake_up_held_fixups ();
+}
+
+/* This function is used to lazily resolve TLS_DESC RELA relocations.
+   The argument location is used to hold a pointer to the relocation.  */
+
+void
+__attribute__ ((regparm (3))) attribute_hidden
+_dl_tlsdesc_resolve_rela_fixup (struct tlsdesc volatile *td,
+				struct link_map *l,
+				ptrdiff_t entry_check_offset)
+{
+  const ElfW(Rela) *reloc = td->arg;
+
+  if (_dl_tlsdesc_resolve_early_return_p (td, __builtin_return_address (0)
+					  - entry_check_offset))
+    return;
+
+  /* The code below was borrowed from _dl_fixup(),
+     except for checking for STB_LOCAL.  */
+  const ElfW(Sym) *const symtab
+    = (const void *) D_PTR (l, l_info[DT_SYMTAB]);
+  const char *strtab = (const void *) D_PTR (l, l_info[DT_STRTAB]);
+  const ElfW(Sym) *sym = &symtab[ELFW(R_SYM) (reloc->r_info)];
+  lookup_t result;
+
+   /* Look up the target symbol.  If the normal lookup rules are not
+      used don't look in the global scope.  */
+  if (ELFW(ST_BIND) (sym->st_info) != STB_LOCAL
+      && __builtin_expect (ELFW(ST_VISIBILITY) (sym->st_other), 0) == 0)
+    {
+      const struct r_found_version *version = NULL;
+
+      if (l->l_info[VERSYMIDX (DT_VERSYM)] != NULL)
+	{
+	  const ElfW(Half) *vernum =
+	    (const void *) D_PTR (l, l_info[VERSYMIDX (DT_VERSYM)]);
+	  ElfW(Half) ndx = vernum[ELFW(R_SYM) (reloc->r_info)] & 0x7fff;
+	  version = &l->l_versions[ndx];
+	  if (version->hash == 0)
+	    version = NULL;
+	}
+
+      result = _dl_lookup_symbol_x (strtab + sym->st_name, l, &sym,
+				    l->l_scope, version, ELF_RTYPE_CLASS_PLT,
+				    DL_LOOKUP_ADD_DEPENDENCY, NULL);
+    }
+  else
+    {
+      /* We already found the symbol.  The module (and therefore its load
+	 address) is also known.  */
+      result = l;
+    }
+
+  if (!sym)
+    {
+      td->arg = (void*)reloc->r_addend;
+      td->entry = _dl_tlsdesc_undefweak;
+    }
+  else
+    {
+#  ifndef SHARED
+      CHECK_STATIC_TLS (l, result);
+#  else
+      if (!TRY_STATIC_TLS (l, result))
+	{
+	  td->arg = _dl_make_tlsdesc_dynamic (result, sym->st_value
+					      + reloc->r_addend);
+	  td->entry = _dl_tlsdesc_dynamic;
+	}
+      else
+#  endif
+	{
+	  td->arg = (void*)(sym->st_value - result->l_tls_offset
+			    + reloc->r_addend);
+	  td->entry = _dl_tlsdesc_return;
+	}
+    }
+
+  _dl_tlsdesc_wake_up_held_fixups ();
+}
+
+void
+__attribute__ ((regparm (3))) attribute_hidden
+_dl_tlsdesc_resolve_hold_fixup (struct tlsdesc volatile *td,
+				struct link_map *l __attribute__((__unused__)),
+				ptrdiff_t entry_check_offset)
+{
+  /* Maybe we're lucky and can return early.  */
+  if (__builtin_return_address (0) - entry_check_offset != td->entry)
+    return;
+
+  /* Locking here will stop execution until the running resolver runs
+     _dl_tlsdesc_wake_up_held_fixups(), releasing the lock.
+
+     FIXME: We'd be better off waiting on a condition variable, such
+     that we didn't have to hold the lock throughout the relocation
+     processing.  */
+  __rtld_lock_lock_recursive (GL(dl_load_lock));
+  __rtld_lock_unlock_recursive (GL(dl_load_lock));
+}
+
+#endif /* USE_TLS */
+
+void
+_dl_unmap (struct link_map *map)
+{
+  __munmap ((void *) (map)->l_map_start,
+	    (map)->l_map_end - (map)->l_map_start);
+
+#if USE_TLS && SHARED
+  /* _dl_unmap is only called for dlopen()ed libraries, for which
+     calling free() is safe, or before we've completed the initial
+     relocation, in which case calling free() is probably pointless,
+     but still safe.  */
+  if (map->l_mach.tlsdesc_table)
+    htab_delete (map->l_mach.tlsdesc_table);
+#endif
+}
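The tlsdesc.c hash table above is an open-addressing table with double hashing: a key first tries slot hash % size, and on a collision the index advances by 1 + hash % (size - 2), wrapping around. Because the sizes are prime, that step is coprime to the size, so the probe eventually reaches every slot; once the table is at least 75% full it is rehashed into a prime-sized table of at least twice the element count. The standalone sketch below reproduces only that probe and growth policy, under simplifying assumptions: keys are nonzero size_t values stored directly in the slots (zero means empty), the prime list is shortened, and the names (probe, expand, insert, next_prime) are hypothetical rather than glibc's.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, shortened subset of higher_prime_number's primes.  */
static const size_t PRIMES[] = { 7, 13, 31, 61, 127, 251, 509, 1021 };

struct table
{
  size_t *slots;      /* 0 marks an empty slot; keys must be nonzero */
  size_t size;        /* always prime */
  size_t n_elements;
};

/* Smallest listed prime >= N, clamped to the last entry.  */
static size_t
next_prime (size_t n)
{
  size_t i;

  for (i = 0; i < sizeof PRIMES / sizeof PRIMES[0] - 1; i++)
    if (PRIMES[i] >= n)
      break;
  return PRIMES[i];
}

/* Return the slot holding KEY, or the first empty slot on its probe
   sequence.  The table must have at least one empty slot.  */
static size_t *
probe (size_t *slots, size_t size, size_t key)
{
  size_t index = key % size;
  size_t step;

  assert (size > 2);
  step = 1 + key % (size - 2);
  while (slots[index] != 0 && slots[index] != key)
    {
      index += step;
      if (index >= size)
        index -= size;
    }
  return &slots[index];
}

/* Rehash into a table sized to the smallest listed prime >= twice the
   element count, as htab_expand does.  Returns 0 on allocation failure.  */
static int
expand (struct table *t)
{
  size_t nsize = t->n_elements * 2 > t->size
                 ? next_prime (t->n_elements * 2) : t->size;
  size_t *nslots = calloc (nsize, sizeof *nslots);
  size_t i;

  if (nslots == NULL)
    return 0;
  for (i = 0; i < t->size; i++)
    if (t->slots[i] != 0)
      *probe (nslots, nsize, t->slots[i]) = t->slots[i];
  free (t->slots);  /* ld.so deliberately skips this; see htab_expand.  */
  t->slots = nslots;
  t->size = nsize;
  return 1;
}

/* Insert KEY unless present.  Returns 1 if added, 0 if already there,
   -1 if growing failed.  Grows at >= 75% occupancy, before probing.  */
static int
insert (struct table *t, size_t key)
{
  size_t *slot;

  if (t->size * 3 <= t->n_elements * 4 && !expand (t))
    return -1;
  slot = probe (t->slots, t->size, key);
  if (*slot == key)
    return 0;
  *slot = key;
  t->n_elements++;
  return 1;
}
```

Starting from a 3-slot table, as htab_create does, inserting the keys 8, 16, ..., 320 grows it through sizes 7, 13, 31 and 61, and inserting 16 again then reports the key as already present.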
Index: sysdeps/i386/tlsdesc.sym
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ sysdeps/i386/tlsdesc.sym	2006-01-13 18:16:22.000000000 -0500
@@ -0,0 +1,20 @@
+#include <stddef.h>
+#include <sysdep.h>
+#include <tls.h>
+#include <link.h>
+#include <dl-tlsdesc.h>
+
+--
+
+-- Abuse tls.h macros to derive offsets relative to the thread register.
+#if defined USE_TLS
+
+DTV_OFFSET			offsetof(struct pthread, header.dtv)
+
+TLSDESC_ARG			offsetof(struct tlsdesc, arg)
+
+TLSDESC_GEN_COUNT		offsetof(struct tlsdesc_dynamic_arg, gen_count)
+TLSDESC_MODID			offsetof(struct tlsdesc_dynamic_arg, tlsinfo.ti_module)
+TLSDESC_MODOFF			offsetof(struct tlsdesc_dynamic_arg, tlsinfo.ti_offset)
+
+#endif
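tlsdesc.sym is not compiled as C: the gen-as-const-headers machinery evaluates each expression with the C compiler and emits the result as a constant, so dl-tlsdesc.S can write TLSDESC_ARG(%rax) instead of a hard-coded displacement. A rough C analogue follows, assuming local copies of the structure layouts from the x86_64 dl-tlsdesc.h and leaving out DTV_OFFSET, which needs the internal struct pthread.

```c
#include <assert.h>
#include <stddef.h>

/* Local copies of the layouts declared in dl-tlsdesc.h, so that this
   sketch compiles outside glibc.  */
struct tlsdesc
{
  ptrdiff_t (*entry) (struct tlsdesc *);
  void *arg;
};

typedef struct
{
  unsigned long int ti_module;
  unsigned long int ti_offset;
} tls_index;

struct tlsdesc_dynamic_arg
{
  tls_index tlsinfo;
  size_t gen_count;
};

/* The same expressions tlsdesc.sym hands to gen-as-const.  */
enum
{
  TLSDESC_ARG = offsetof (struct tlsdesc, arg),
  TLSDESC_GEN_COUNT = offsetof (struct tlsdesc_dynamic_arg, gen_count),
  TLSDESC_MODID = offsetof (struct tlsdesc_dynamic_arg, tlsinfo.ti_module),
  TLSDESC_MODOFF = offsetof (struct tlsdesc_dynamic_arg, tlsinfo.ti_offset)
};

/* The slow path of _dl_tlsdesc_dynamic passes the tlsdesc_dynamic_arg
   pointer straight to __tls_get_addr, so tlsinfo must come first.  */
static_assert (offsetof (struct tlsdesc_dynamic_arg, tlsinfo) == 0,
               "tlsinfo must be at offset 0");
```

On an LP64 target such as x86_64, TLSDESC_ARG, TLSDESC_MODID, TLSDESC_MODOFF and TLSDESC_GEN_COUNT come out as 8, 0, 8 and 16; with the same layout on i386 they would be 4, 0, 4 and 8.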
Index: sysdeps/x86_64/Makefile
===================================================================
--- sysdeps/x86_64/Makefile.orig	2006-01-13 18:14:11.000000000 -0500
+++ sysdeps/x86_64/Makefile	2006-01-13 18:16:22.000000000 -0500
@@ -9,3 +9,13 @@
 ifeq ($(subdir),gmon)
 sysdep_routines += _mcount
 endif
+
+ifeq ($(subdir),elf)
+sysdep-dl-routines += tlsdesc dl-tlsdesc
+sysdep_routines += tlsdesc dl-tlsdesc
+sysdep-rtld-routines += tlsdesc dl-tlsdesc
+endif
+
+ifeq ($(subdir),csu)
+gen-as-const-headers += tlsdesc.sym
+endif
Index: sysdeps/x86_64/bits/linkmap.h
===================================================================
--- sysdeps/x86_64/bits/linkmap.h.orig	2006-01-13 18:14:11.000000000 -0500
+++ sysdeps/x86_64/bits/linkmap.h	2006-01-13 18:16:22.000000000 -0500
@@ -3,6 +3,7 @@
   {
     Elf64_Addr plt; /* Address of .plt + 0x16 */
     Elf64_Addr gotplt; /* Address of .got + 0x18 */
+    void *tlsdesc_table; /* Address of TLS descriptor hash table.  */
   };
 
 #else
@@ -10,5 +11,6 @@
   {
     Elf32_Addr plt; /* Address of .plt + 0x16 */
     Elf32_Addr gotplt; /* Address of .got + 0x0c */
+    void *tlsdesc_table; /* Address of TLS descriptor hash table.  */
   };
 #endif
Index: sysdeps/x86_64/dl-lookupcfg.h
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ sysdeps/x86_64/dl-lookupcfg.h	2006-01-13 18:16:22.000000000 -0500
@@ -0,0 +1,28 @@
+/* Configuration of lookup functions.
+   Copyright (C) 2005 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, write to the Free
+   Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA
+   02111-1307 USA.  */
+
+#define DL_UNMAP_IS_SPECIAL
+
+#include_next <dl-lookupcfg.h>
+
+struct link_map;
+
+extern void _dl_unmap (struct link_map *map);
+
+#define DL_UNMAP(map) _dl_unmap (map)
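dl-lookupcfg.h overrides how the dynamic linker unmaps an object: it defines DL_UNMAP_IS_SPECIAL and routes DL_UNMAP to _dl_unmap, which, besides unmapping the segments, frees the per-map TLS descriptor table. The sketch below illustrates that hook pattern with hypothetical stand-ins (fake_map, generic_unmap, port_unmap); the fallback under #ifndef DL_UNMAP_IS_SPECIAL is an assumption about how common code consumes the macro, not a copy of it.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct link_map.  */
struct fake_map
{
  void *tlsdesc_table;    /* like l_mach.tlsdesc_table */
  int unmapped;
};

/* Stands in for the plain __munmap of the object's segments.  */
static void
generic_unmap (struct fake_map *map)
{
  map->unmapped = 1;
}

/* Port override in the spirit of _dl_unmap: do the generic work, then
   release the per-map TLS descriptor state (htab_delete in the patch).  */
static void
port_unmap (struct fake_map *map)
{
  generic_unmap (map);
  free (map->tlsdesc_table);
  map->tlsdesc_table = NULL;
}

/* What the port's configuration header provides.  */
#define DL_UNMAP_IS_SPECIAL
#define DL_UNMAP(map) port_unmap (map)

/* Assumed generic fallback, used only when no port overrides it.  */
#ifndef DL_UNMAP_IS_SPECIAL
# define DL_UNMAP(map) generic_unmap (map)
#endif
```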
Index: sysdeps/x86_64/dl-machine.h
===================================================================
--- sysdeps/x86_64/dl-machine.h.orig	2006-01-13 18:14:11.000000000 -0500
+++ sysdeps/x86_64/dl-machine.h	2006-01-13 18:16:22.000000000 -0500
@@ -26,6 +26,7 @@
 #include <sys/param.h>
 #include <sysdep.h>
 #include <tls.h>
+#include <dl-tlsdesc.h>
 
 /* Return nonzero iff ELF header is compatible with the running host.  */
 static inline int __attribute__ ((unused))
@@ -131,6 +132,10 @@
 	got[2] = (Elf64_Addr) &_dl_runtime_resolve;
     }
 
+  if (l->l_info[ADDRIDX (DT_TLSDESC_GOT)] && lazy)
+    *(Elf64_Addr*)(D_PTR (l, l_info[ADDRIDX (DT_TLSDESC_GOT)]) + l->l_addr)
+      = (Elf64_Addr) &_dl_tlsdesc_resolve_rela;
+
   return lazy;
 }
 
@@ -194,7 +199,9 @@
 # define elf_machine_type_class(type)					      \
   ((((type) == R_X86_64_JUMP_SLOT					      \
      || (type) == R_X86_64_DTPMOD64					      \
-     || (type) == R_X86_64_DTPOFF64 || (type) == R_X86_64_TPOFF64)	      \
+     || (type) == R_X86_64_DTPOFF64					      \
+     || (type) == R_X86_64_TPOFF64					      \
+     || (type) == R_X86_64_TLSDESC)					      \
     * ELF_RTYPE_CLASS_PLT)						      \
    | (((type) == R_X86_64_COPY) * ELF_RTYPE_CLASS_COPY))
 #else
@@ -323,6 +330,41 @@
 	    *reloc_addr = sym->st_value + reloc->r_addend;
 # endif
 	  break;
+	case R_X86_64_TLSDESC:
+	  {
+	    struct tlsdesc volatile *td =
+	      (struct tlsdesc volatile *)reloc_addr;
+
+# ifndef RTLD_BOOTSTRAP
+	    if (! sym)
+	      {
+		td->arg = (void*)reloc->r_addend;
+		td->entry = _dl_tlsdesc_undefweak;
+	      }
+	    else
+# endif
+	      {
+# ifndef RTLD_BOOTSTRAP
+#  ifndef SHARED
+		CHECK_STATIC_TLS (map, sym_map);
+#  else
+		if (!TRY_STATIC_TLS (map, sym_map))
+		  {
+		    td->arg = _dl_make_tlsdesc_dynamic
+		      (sym_map, sym->st_value + reloc->r_addend);
+		    td->entry = _dl_tlsdesc_dynamic;
+		  }
+		else
+#  endif
+# endif
+		  {
+		    td->arg = (void*)(sym->st_value - sym_map->l_tls_offset
+				      + reloc->r_addend);
+		    td->entry = _dl_tlsdesc_return;
+		  }
+	      }
+	    break;
+	  }
 	case R_X86_64_TPOFF64:
 	  /* The offset is negative, forward from the thread pointer.  */
 # ifndef RTLD_BOOTSTRAP
@@ -435,6 +477,15 @@
 	  map->l_mach.plt
 	  + (((Elf64_Addr) reloc_addr) - map->l_mach.gotplt) * 2;
     }
+  else if (__builtin_expect (r_type == R_X86_64_TLSDESC, 1))
+    {
+      struct tlsdesc volatile * __attribute__((__unused__)) td =
+	(struct tlsdesc volatile *)reloc_addr;
+
+      td->arg = (void*)reloc;
+      td->entry = (void*)(D_PTR (map, l_info[ADDRIDX (DT_TLSDESC_PLT)])
+			  + map->l_addr);
+    }
   else
     _dl_reloc_bad_type (map, r_type, 1);
 }
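The R_X86_64_TLSDESC case above fills in a two-word descriptor; compiled code calls the descriptor's entry with the descriptor's address and adds the result to the thread pointer. For a symbol in static TLS the entry just returns the precomputed offset st_value - l_tls_offset + r_addend; for an undefined weak symbol it returns the addend minus the thread pointer, so the final address is the addend itself (NULL for a zero addend). The portable C model below covers those two cases under stated assumptions: a plain variable replaces %fs:0, ordinary C calls replace the %rax-based convention, and apply_tlsdesc and tls_address are hypothetical helpers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same two-word shape as struct tlsdesc in dl-tlsdesc.h.  */
struct tlsdesc
{
  ptrdiff_t (*entry) (struct tlsdesc *);
  void *arg;
};

/* Hypothetical stand-in for the thread pointer (%fs:0 on x86_64).  */
static char *thread_pointer;

/* Like _dl_tlsdesc_return: ARG already is the offset from tp.  */
static ptrdiff_t
tlsdesc_return (struct tlsdesc *td)
{
  return (ptrdiff_t) (intptr_t) td->arg;
}

/* Like _dl_tlsdesc_undefweak: return ARG - tp, so tp + result == ARG.  */
static ptrdiff_t
tlsdesc_undefweak (struct tlsdesc *td)
{
  return (ptrdiff_t) ((uintptr_t) td->arg - (uintptr_t) thread_pointer);
}

/* Mirror of the non-lazy R_X86_64_TLSDESC case for a module already in
   static TLS.  SYM_VALUE, TLS_OFFSET and ADDEND are hypothetical inputs
   standing in for st_value, l_tls_offset and r_addend.  */
static void
apply_tlsdesc (struct tlsdesc *td, int sym_defined, ptrdiff_t sym_value,
               ptrdiff_t tls_offset, ptrdiff_t addend)
{
  if (!sym_defined)
    {
      td->arg = (void *) (intptr_t) addend;
      td->entry = tlsdesc_undefweak;
    }
  else
    {
      td->arg = (void *) (intptr_t) (sym_value - tls_offset + addend);
      td->entry = tlsdesc_return;
    }
}

/* What compiled code does: call through the descriptor, then add tp.
   Unsigned arithmetic keeps the wraparound well defined.  */
static void *
tls_address (struct tlsdesc *td)
{
  return (void *) ((uintptr_t) thread_pointer + (uintptr_t) td->entry (td));
}
```

With the thread pointer at area + 48, a symbol with st_value 8 in a module whose l_tls_offset is 48, accessed with addend 4, resolves to area + 12: 36 bytes below the thread pointer, consistent with the negative TPOFF64 offsets noted above.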
Index: sysdeps/x86_64/dl-tls.h
===================================================================
--- sysdeps/x86_64/dl-tls.h.orig	2006-01-13 18:14:11.000000000 -0500
+++ sysdeps/x86_64/dl-tls.h	2006-01-13 18:16:22.000000000 -0500
@@ -1,5 +1,5 @@
 /* Thread-local storage handling in the ELF dynamic linker.  x86-64 version.
-   Copyright (C) 2002 Free Software Foundation, Inc.
+   Copyright (C) 2002, 2005 Free Software Foundation, Inc.
    This file is part of the GNU C Library.
 
    The GNU C Library is free software; you can redistribute it and/or
@@ -19,11 +19,13 @@
 
 
 /* Type used for the representation of TLS information in the GOT.  */
-typedef struct
+typedef struct dl_tls_index
 {
   unsigned long int ti_module;
   unsigned long int ti_offset;
 } tls_index;
 
 
+#ifdef SHARED
 extern void *__tls_get_addr (tls_index *ti);
+#endif
Index: sysdeps/x86_64/dl-tlsdesc.S
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ sysdeps/x86_64/dl-tlsdesc.S	2006-01-13 18:16:22.000000000 -0500
@@ -0,0 +1,203 @@
+/* Thread-local storage handling in the ELF dynamic linker.  x86_64 version.
+   Copyright (C) 2004, 2005 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, write to the Free
+   Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA
+   02111-1307 USA.  */
+
+#include <sysdep.h>
+#include <tls.h>
+#include "tlsdesc.h"
+
+	.text
+#ifdef USE_TLS
+	.hidden _dl_tlsdesc_return
+	.global	_dl_tlsdesc_return
+	.type	_dl_tlsdesc_return,@function
+	cfi_startproc
+	.align 16
+_dl_tlsdesc_return:
+	movq	8(%rax), %rax
+	ret
+	cfi_endproc
+	.size	_dl_tlsdesc_return, .-_dl_tlsdesc_return
+
+	.hidden _dl_tlsdesc_undefweak
+	.global	_dl_tlsdesc_undefweak
+	.type	_dl_tlsdesc_undefweak,@function
+	cfi_startproc
+	.align 16
+_dl_tlsdesc_undefweak:
+	movq	8(%rax), %rax
+	subq	%fs:0, %rax
+	ret
+	cfi_endproc
+	.size	_dl_tlsdesc_undefweak, .-_dl_tlsdesc_undefweak
+
+#ifdef SHARED
+	.hidden _dl_tlsdesc_dynamic
+	.global	_dl_tlsdesc_dynamic
+	.type	_dl_tlsdesc_dynamic,@function
+
+     /* %rax points to the TLS descriptor, such that 0(%rax) points to
+	_dl_tlsdesc_dynamic itself, and 8(%rax) points to a struct
+	tlsdesc_dynamic_arg object.  It must return in %rax the offset
+	between the thread pointer and the object denoted by the
+	argument, without clobbering any registers.
+
+	The assembly code that follows is a rendition of the following
+	C code, hand-optimized a little bit.
+
+ptrdiff_t
+_dl_tlsdesc_dynamic (register struct tlsdesc *tdp asm ("%rax"))
+{
+  struct tlsdesc_dynamic_arg *td = tdp->arg;
+  dtv_t *dtv = *(dtv_t **)((char *)__thread_pointer + DTV_OFFSET);
+  if (__builtin_expect (td->gen_count <= dtv[0].counter
+			&& (dtv[td->tlsinfo.ti_module].pointer.val
+			    != TLS_DTV_UNALLOCATED),
+			1))
+    return dtv[td->tlsinfo.ti_module].pointer.val + td->tlsinfo.ti_offset
+      - __thread_pointer;
+
+  return __tls_get_addr_internal (&td->tlsinfo) - __thread_pointer;
+}
+*/
+	cfi_startproc
+	.align 16
+_dl_tlsdesc_dynamic:
+	/* Preserve call-clobbered registers that we modify.
+	   We need two scratch regs anyway.  */
+	movq	%rsi, -16(%rsp)
+	movq	%fs:DTV_OFFSET, %rsi
+	movq	%rdi, -8(%rsp)
+	movq	TLSDESC_ARG(%rax), %rdi
+	movq	(%rsi), %rax
+	cmpq	%rax, TLSDESC_GEN_COUNT(%rdi)
+	ja	.Lslow
+	movq	TLSDESC_MODID(%rdi), %rax
+	salq	$4, %rax
+	movq	(%rax,%rsi), %rax
+	cmpq	$-1, %rax
+	je	.Lslow
+	addq	TLSDESC_MODOFF(%rdi), %rax
+.Lret:
+	movq	-16(%rsp), %rsi
+	subq	%fs:0, %rax
+	movq	-8(%rsp), %rdi
+	ret
+.Lslow:
+	/* Besides rdi and rsi, saved above, save rdx, rcx, r8, r9,
+	   r10 and r11.  Also, align the stack, which is off by 8 bytes.  */
+	subq	$72, %rsp
+	cfi_adjust_cfa_offset (72)
+	movq	%rdx, 8(%rsp)
+	movq	%rcx, 16(%rsp)
+	movq	%r8, 24(%rsp)
+	movq	%r9, 32(%rsp)
+	movq	%r10, 40(%rsp)
+	movq	%r11, 48(%rsp)
+	/* %rdi already points to the tlsinfo data structure.  */
+	call	__tls_get_addr@PLT
+	movq	8(%rsp), %rdx
+	movq	16(%rsp), %rcx
+	movq	24(%rsp), %r8
+	movq	32(%rsp), %r9
+	movq	40(%rsp), %r10
+	movq	48(%rsp), %r11
+	addq	$72, %rsp
+	cfi_adjust_cfa_offset (-72)
+	jmp	.Lret
+	cfi_endproc
+	.size	_dl_tlsdesc_dynamic, .-_dl_tlsdesc_dynamic
+#endif /* SHARED */
+
+	.hidden _dl_tlsdesc_resolve_rela
+	.global	_dl_tlsdesc_resolve_rela
+	.type	_dl_tlsdesc_resolve_rela,@function
+	cfi_startproc
+	.align 16
+	/* The PLT entry will have pushed the link_map pointer.  */
+	cfi_adjust_cfa_offset (8)
+_dl_tlsdesc_resolve_rela:
+	/* Save all call-clobbered registers.  Reserve 8 extra bytes so
+	   that the stack, already offset by the PLT entry's push, is
+	   16-byte aligned at the call below.  */
+	subq	$80, %rsp
+	cfi_adjust_cfa_offset (80)
+	movq	%rax, (%rsp)
+	movq	%rdi, 8(%rsp)
+	movq	%rax, %rdi	/* Pass tlsdesc* in %rdi.  */
+	movq	%rsi, 16(%rsp)
+	movq	80(%rsp), %rsi	/* Pass link_map* in %rsi.  */
+	movq	%r8, 24(%rsp)
+	movq	%r9, 32(%rsp)
+	movq	%r10, 40(%rsp)
+	movq	%r11, 48(%rsp)
+	movq	%rdx, 56(%rsp)
+	movq	%rcx, 64(%rsp)
+	call	_dl_tlsdesc_resolve_rela_fixup
+	movq	(%rsp), %rax
+	movq	8(%rsp), %rdi
+	movq	16(%rsp), %rsi
+	movq	24(%rsp), %r8
+	movq	32(%rsp), %r9
+	movq	40(%rsp), %r10
+	movq	48(%rsp), %r11
+	movq	56(%rsp), %rdx
+	movq	64(%rsp), %rcx
+	addq	$88, %rsp
+	cfi_adjust_cfa_offset (-88)
+	jmp	*(%rax)
+	cfi_endproc
+	.size	_dl_tlsdesc_resolve_rela, .-_dl_tlsdesc_resolve_rela
+
+	.hidden _dl_tlsdesc_resolve_hold
+	.global	_dl_tlsdesc_resolve_hold
+	.type	_dl_tlsdesc_resolve_hold,@function
+	cfi_startproc
+	.align 16
+_dl_tlsdesc_resolve_hold:
+0:
+	/* Save all call-clobbered registers.  */
+	subq	$72, %rsp
+	cfi_adjust_cfa_offset (72)
+	movq	%rax, (%rsp)
+	movq	%rdi, 8(%rsp)
+	movq	%rax, %rdi	/* Pass tlsdesc* in %rdi.  */
+	movq	%rsi, 16(%rsp)
+	movq	$1f - 0b, %rsi	/* Pass return address offset in %rsi.  */
+	movq	%r8, 24(%rsp)
+	movq	%r9, 32(%rsp)
+	movq	%r10, 40(%rsp)
+	movq	%r11, 48(%rsp)
+	movq	%rdx, 56(%rsp)
+	movq	%rcx, 64(%rsp)
+	call	_dl_tlsdesc_resolve_hold_fixup
+1:
+	movq	(%rsp), %rax
+	movq	8(%rsp), %rdi
+	movq	16(%rsp), %rsi
+	movq	24(%rsp), %r8
+	movq	32(%rsp), %r9
+	movq	40(%rsp), %r10
+	movq	48(%rsp), %r11
+	movq	56(%rsp), %rdx
+	movq	64(%rsp), %rcx
+	addq	$72, %rsp
+	cfi_adjust_cfa_offset (-72)
+	jmp	*(%rax)
+	cfi_endproc
+	.size	_dl_tlsdesc_resolve_hold, .-_dl_tlsdesc_resolve_hold
+
+#endif /* USE_TLS */
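The comment in _dl_tlsdesc_dynamic gives the C logic behind the assembly: a fast path that trusts the thread's DTV when its generation counter is at least the descriptor's gen_count and the module's slot is allocated, and a slow path through __tls_get_addr otherwise. The runnable model below follows that structure with a toy, single-threaded DTV. It is a sketch under assumptions: dtv_t is simplified to a one-pointer union (the assembly scales the module id by 16 with salq $4, so the real entries are larger), TLS_DTV_UNALLOCATED is modeled as (void *) -1 to match the cmpq $-1, and slow_get_addr is a hypothetical stand-in for __tls_get_addr that returns an address rather than an offset.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_MODULES 4
/* Modeled on the cmpq $-1 test in the assembly.  */
#define TLS_DTV_UNALLOCATED ((void *) -1)

/* Simplified DTV entry (hypothetical).  */
typedef union
{
  size_t counter;   /* dtv[0]: generation this DTV is current for */
  void *pointer;    /* dtv[i]: TLS block of module i, or unallocated */
} dtv_t;

typedef struct
{
  unsigned long ti_module;
  unsigned long ti_offset;
} tls_index;

struct tlsdesc_dynamic_arg
{
  tls_index tlsinfo;
  size_t gen_count;
};

/* Toy per-thread state (hypothetical).  */
static dtv_t dtv[MAX_MODULES + 1];
static size_t global_generation;          /* like GL(dl_tls_generation) */
static char blocks[MAX_MODULES + 1][64];  /* backing storage per module */
static int slow_calls;

static void
dtv_init (void)
{
  int i;

  dtv[0].counter = 0;
  for (i = 1; i <= MAX_MODULES; i++)
    dtv[i].pointer = TLS_DTV_UNALLOCATED;
}

/* Stand-in for __tls_get_addr: bring the DTV up to date and allocate
   the module's block on first use.  */
static void *
slow_get_addr (tls_index *ti)
{
  slow_calls++;
  dtv[0].counter = global_generation;
  if (dtv[ti->ti_module].pointer == TLS_DTV_UNALLOCATED)
    dtv[ti->ti_module].pointer = blocks[ti->ti_module];
  return (char *) dtv[ti->ti_module].pointer + ti->ti_offset;
}

/* C shape of _dl_tlsdesc_dynamic, returning an address, not an offset.  */
static void *
dynamic_get_addr (struct tlsdesc_dynamic_arg *td)
{
  /* Check the generation first: a stale DTV may not yet cover this
     module.  */
  if (td->gen_count <= dtv[0].counter)
    {
      void *p = dtv[td->tlsinfo.ti_module].pointer;

      if (p != TLS_DTV_UNALLOCATED)
        return (char *) p + td->tlsinfo.ti_offset;   /* fast path */
    }
  return slow_get_addr (&td->tlsinfo);                /* slow path */
}
```

The first access for a descriptor whose gen_count is ahead of the thread's DTV takes the slow path once; after that the DTV is current and later accesses stay on the fast path, which is what the register-preserving assembly above optimizes for.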
Index: sysdeps/x86_64/dl-tlsdesc.h
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ sysdeps/x86_64/dl-tlsdesc.h	2006-01-13 18:16:22.000000000 -0500
@@ -0,0 +1,63 @@
+/* Thread-local storage descriptor handling in the ELF dynamic linker.
+   x86_64 version.
+   Copyright (C) 2005 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, write to the Free
+   Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA
+   02111-1307 USA.  */
+
+#ifndef _X86_64_DL_TLSDESC_H
+# define _X86_64_DL_TLSDESC_H 1
+
+/* Use this to access DT_TLSDESC_PLT and DT_TLSDESC_GOT.  */
+#ifndef ADDRIDX
+# define ADDRIDX(tag) (DT_NUM + DT_THISPROCNUM + DT_VERSIONTAGNUM \
+		       + DT_EXTRANUM + DT_VALNUM + DT_ADDRTAGIDX (tag))
+#endif
+
+/* Type used to represent a TLS descriptor in the GOT.  */
+struct tlsdesc
+{
+  ptrdiff_t (*entry)(struct tlsdesc *on_rax);
+  void *arg;
+};
+
+typedef struct dl_tls_index
+{
+  unsigned long int ti_module;
+  unsigned long int ti_offset;
+} tls_index;
+
+/* Type used as the argument in a TLS descriptor for a symbol that
+   needs dynamic TLS offsets.  */
+struct tlsdesc_dynamic_arg
+{
+  tls_index tlsinfo;
+  size_t gen_count;
+};
+
+extern ptrdiff_t attribute_hidden
+  _dl_tlsdesc_return(struct tlsdesc *on_rax),
+  _dl_tlsdesc_undefweak(struct tlsdesc *on_rax),
+  _dl_tlsdesc_resolve_rela(struct tlsdesc *on_rax),
+  _dl_tlsdesc_resolve_hold(struct tlsdesc *on_rax);
+
+# ifdef SHARED
+extern void *_dl_make_tlsdesc_dynamic (struct link_map *map, size_t ti_offset);
+
+extern ptrdiff_t attribute_hidden _dl_tlsdesc_dynamic(struct tlsdesc *);
+# endif
+
+#endif
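_dl_tlsdesc_resolve_rela and _dl_tlsdesc_resolve_hold implement lazy binding. A descriptor initially points at the resolver; the first thread to take the lock while the entry is unchanged installs the hold entry while it works, and any thread that finds the entry changed returns early and dispatches through whatever the descriptor now holds. The model below expresses that protocol in C, with a POSIX mutex in place of GL(dl_load_lock). It is a sketch under stated assumptions: the "resolution" just stores a hypothetical constant offset, the names are invented, and, like the patch, it relies on volatile stores rather than C11 atomics, so it illustrates the control flow rather than serving as a portable concurrent implementation.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

struct tlsdesc
{
  ptrdiff_t (*volatile entry) (struct tlsdesc *);
  void *volatile arg;
};

/* Hypothetical stand-ins for GL(dl_load_lock) and the lookup result.  */
static pthread_mutex_t load_lock = PTHREAD_MUTEX_INITIALIZER;
static const ptrdiff_t RESOLVED_OFFSET = -64;
static int fixups_done;

/* Final state, like _dl_tlsdesc_return.  */
static ptrdiff_t
resolved_entry (struct tlsdesc *td)
{
  return (ptrdiff_t) (intptr_t) td->arg;
}

/* Like _dl_tlsdesc_resolve_hold: if another thread is still resolving
   TD, block on the lock until it finishes, then re-dispatch.  */
static ptrdiff_t
hold_entry (struct tlsdesc *td)
{
  if (td->entry == hold_entry)
    {
      pthread_mutex_lock (&load_lock);
      pthread_mutex_unlock (&load_lock);
    }
  return td->entry (td);
}

/* Like _dl_tlsdesc_resolve_rela: only a thread that still sees this
   entry under the lock resolves TD; others return early and re-dispatch.  */
static ptrdiff_t
lazy_entry (struct tlsdesc *td)
{
  if (td->entry == lazy_entry)
    {
      pthread_mutex_lock (&load_lock);
      if (td->entry == lazy_entry)
        {
          td->entry = hold_entry;       /* park threads arriving meanwhile */
          fixups_done++;                /* the symbol lookup would go here */
          td->arg = (void *) (intptr_t) RESOLVED_OFFSET;
          td->entry = resolved_entry;   /* publish after ARG is set */
        }
      pthread_mutex_unlock (&load_lock);
    }
  return td->entry (td);
}
```

Because the entry is rechecked under the lock, only one caller performs the resolution; the FIXME in _dl_tlsdesc_resolve_hold_fixup points out that a condition variable would avoid holding the lock for the whole relocation.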
Index: sysdeps/x86_64/tlsdesc.c
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ sysdeps/x86_64/tlsdesc.c	2006-01-13 18:16:22.000000000 -0500
@@ -0,0 +1,556 @@
+/* Manage TLS descriptors.  x86_64 version.
+   Copyright (C) 2005 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, write to the Free
+   Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA
+   02111-1307 USA.  */
+
+#include <link.h>
+#include <ldsodefs.h>
+#include <elf/dynamic-link.h>
+#include <tls.h>
+#include <dl-tlsdesc.h>
+
+#ifdef USE_TLS
+# ifdef SHARED
+
+extern void weak_function free (void *ptr);
+
+/* The hashcode handling code below is heavily inspired by libiberty's
+   hashtab code, but with most adaptation points and support for
+   deleting elements removed.
+
+   Copyright (C) 1999, 2000, 2001, 2002, 2003 Free Software Foundation, Inc.
+   Contributed by Vladimir Makarov (vmakarov@cygnus.com).  */
+
+inline static unsigned long
+higher_prime_number (unsigned long n)
+{
+  /* These are primes that are near, but slightly smaller than, a
+     power of two.  */
+  static const unsigned long primes[] = {
+    (unsigned long) 7,
+    (unsigned long) 13,
+    (unsigned long) 31,
+    (unsigned long) 61,
+    (unsigned long) 127,
+    (unsigned long) 251,
+    (unsigned long) 509,
+    (unsigned long) 1021,
+    (unsigned long) 2039,
+    (unsigned long) 4093,
+    (unsigned long) 8191,
+    (unsigned long) 16381,
+    (unsigned long) 32749,
+    (unsigned long) 65521,
+    (unsigned long) 131071,
+    (unsigned long) 262139,
+    (unsigned long) 524287,
+    (unsigned long) 1048573,
+    (unsigned long) 2097143,
+    (unsigned long) 4194301,
+    (unsigned long) 8388593,
+    (unsigned long) 16777213,
+    (unsigned long) 33554393,
+    (unsigned long) 67108859,
+    (unsigned long) 134217689,
+    (unsigned long) 268435399,
+    (unsigned long) 536870909,
+    (unsigned long) 1073741789,
+    (unsigned long) 2147483647,
+					/* 4294967291L */
+    ((unsigned long) 2147483647) + ((unsigned long) 2147483644),
+  };
+
+  const unsigned long *low = &primes[0];
+  const unsigned long *high = &primes[sizeof (primes) / sizeof (primes[0]) - 1];
+
+  while (low != high)
+    {
+      const unsigned long *mid = low + (high - low) / 2;
+      if (n > *mid)
+	low = mid + 1;
+      else
+	high = mid;
+    }
+
+#if 0
+  /* If we've run out of primes, abort.  */
+  if (n > *low)
+    {
+      fprintf (stderr, "Cannot find prime bigger than %lu\n", n);
+      abort ();
+    }
+#endif
+
+  return *low;
+}
+
+struct hashtab
+{
+  /* Table itself.  */
+  void **entries;
+
+  /* Current size (in entries) of the hash table.  */
+  size_t size;
+
+  /* Current number of elements.  */
+  size_t n_elements;
+};
+
+inline static struct hashtab *
+htab_create (void)
+{
+  struct hashtab *ht = malloc (sizeof (struct hashtab));
+
+  if (! ht)
+    return NULL;
+  ht->size = 3;
+  ht->n_elements = 0;
+  ht->entries = malloc (sizeof (void *) * ht->size);
+  if (! ht->entries)
+    {
+      free (ht);
+      return NULL;
+    }
+  memset (ht->entries, 0, sizeof (void *) * ht->size);
+  return ht;
+}
+
+/* This is only called from _dl_unmap, so it's safe to call
+   free().  See the discussion below.  */
+inline static void
+htab_delete (struct hashtab *htab)
+{
+  int i;
+
+  for (i = htab->size - 1; i >= 0; i--)
+    if (htab->entries[i])
+      free (htab->entries[i]);
+
+  free (htab->entries);
+  free (htab);
+}
+
+/* Similar to htab_find_slot, but without several unwanted side effects:
+    - Does not call the equality function when it finds an existing entry.
+    - Does not change the element count of the hash table, as it is
+      only used to reinsert elements that were already counted.
+   This function also assumes there are no deleted entries in the table.
+   HASH is the hash value for the element to be inserted.  */
+
+inline static void **
+find_empty_slot_for_expand (struct hashtab *htab, int hash)
+{
+  size_t size = htab->size;
+  unsigned int index = hash % size;
+  void **slot = htab->entries + index;
+  int hash2;
+
+  if (! *slot)
+    return slot;
+
+  hash2 = 1 + hash % (size - 2);
+  for (;;)
+    {
+      index += hash2;
+      if (index >= size)
+	index -= size;
+
+      slot = htab->entries + index;
+      if (! *slot)
+	return slot;
+    }
+}
+
+/* The following function changes size of memory allocated for the
+   entries and repeatedly inserts the table elements.  The occupancy
+   of the table after the call will be about 50%.  Naturally the hash
+   table must already exist.  Remember also that the place of the
+   table entries is changed.  If memory allocation failures are allowed,
+   this function will return zero, indicating that the table could not be
+   expanded.  If all goes well, it will return a non-zero value.  */
+
+inline static int
+htab_expand (struct hashtab *htab, int (*hash_fn)(void *))
+{
+  void **oentries;
+  void **olimit;
+  void **p;
+  void **nentries;
+  size_t nsize;
+
+  oentries = htab->entries;
+  olimit = oentries + htab->size;
+
+  /* Grow the table only when it is more than half full; unlike the
+     libiberty original, this version never shrinks it.  */
+  if (htab->n_elements * 2 > htab->size)
+    nsize = higher_prime_number (htab->n_elements * 2);
+  else
+    nsize = htab->size;
+
+  nentries = malloc (sizeof (void *) * nsize);
+  if (nentries == NULL)
+    return 0;
+  memset (nentries, 0, sizeof (void *) * nsize);
+  htab->entries = nentries;
+  htab->size = nsize;
+
+  p = oentries;
+  do
+    {
+      if (*p)
+	*find_empty_slot_for_expand (htab, hash_fn (*p))
+	  = *p;
+
+      p++;
+    }
+  while (p < olimit);
+
+#if 0 /* We can't tell whether this was allocated by the malloc()
+	 built into ld.so or the one in the main executable or libc,
+	 and calling free() for something that wasn't malloc()ed could
+	 do Very Bad Things (TM).  Take the conservative approach
+	 here, potentially wasting as much memory as actually used by
+	 the hash table, even if multiple growths occur.  That's not
+	 so bad as to require some overengineered solution that would
+	 enable us to keep track of how it was allocated. */
+  free (oentries);
+#endif
+  return 1;
+}
+
+/* This function searches for a hash table slot containing an entry
+   equal to the given element.  With INSERT = 0, it returns NULL if
+   there is no such entry; deleting entries is not supported here.
+   To insert an entry, call this with INSERT = 1, then write the value
+   you want into the returned slot if it is still empty.  NULL is also
+   returned if the table needed to grow and memory allocation failed,
+   whatever the value of INSERT.  */
+
+inline static void **
+htab_find_slot (struct hashtab *htab, void *ptr, int insert,
+		int (*hash_fn)(void *), int (*eq_fn)(void *, void *))
+{
+  unsigned int index;
+  int hash, hash2;
+  size_t size;
+  void **entry;
+
+  if (htab->size * 3 <= htab->n_elements * 4
+      && htab_expand (htab, hash_fn) == 0)
+    return NULL;
+
+  hash = hash_fn (ptr);
+
+  size = htab->size;
+  index = hash % size;
+
+  entry = &htab->entries[index];
+  if (!*entry)
+    goto empty_entry;
+  else if (eq_fn (*entry, ptr))
+    return entry;
+
+  hash2 = 1 + hash % (size - 2);
+  for (;;)
+    {
+      index += hash2;
+      if (index >= size)
+	index -= size;
+
+      entry = &htab->entries[index];
+      if (!*entry)
+	goto empty_entry;
+      else if (eq_fn (*entry, ptr))
+	return entry;
+    }
+
+ empty_entry:
+  if (!insert)
+    return NULL;
+
+  htab->n_elements++;
+  return entry;
+}
+
+inline static int
+hash_tlsdesc (void *p)
+{
+  struct tlsdesc_dynamic_arg *td = p;
+
+  /* We know all entries are for the same module, so ti_offset is the
+     only distinguishing field.  */
+  return td->tlsinfo.ti_offset;
+}
+
+inline static int
+eq_tlsdesc (void *p, void *q)
+{
+  struct tlsdesc_dynamic_arg *tdp = p, *tdq = q;
+
+  return tdp->tlsinfo.ti_offset == tdq->tlsinfo.ti_offset;
+}
+
+inline static int
+map_generation (struct link_map *map)
+{
+  size_t idx = map->l_tls_modid;
+  struct dtv_slotinfo_list *listp = GL(dl_tls_dtv_slotinfo_list);
+
+  /* Find the place in the dtv slotinfo list.  */
+  do
+    {
+      /* Does it fit in the array of this list element?  */
+      if (idx < listp->len)
+	{
+	  /* We should never get here for a module in static TLS, so
+	     we can assume that, if the generation count is zero, we
+	     still haven't determined the generation count for this
+	     module.  */
+	  if (listp->slotinfo[idx].gen)
+	    return listp->slotinfo[idx].gen;
+	  else
+	    break;
+	}
+      idx -= listp->len;
+      listp = listp->next;
+    }
+  while (listp != NULL);
+
+  /* If we get to this point, the module still hasn't been assigned an
+     entry in the dtv slotinfo data structures, and it will when we're
+     done with relocations.  At that point, the module will get a
+     generation number that is one past the current generation, so
+     return exactly that.  */
+  return GL(dl_tls_generation) + 1;
+}
+
+void *
+_dl_make_tlsdesc_dynamic (struct link_map *map, size_t ti_offset)
+{
+  struct hashtab *ht;
+  void **entry;
+  struct tlsdesc_dynamic_arg *td, test;
+
+  /* FIXME: We could use a per-map lock here, but is it worth it?  */
+  __rtld_lock_lock_recursive (GL(dl_load_lock));
+
+  ht = map->l_mach.tlsdesc_table;
+  if (! ht)
+    {
+      ht = htab_create ();
+      if (! ht)
+	{
+	  __rtld_lock_unlock_recursive (GL(dl_load_lock));
+	  return 0;
+	}
+      map->l_mach.tlsdesc_table = ht;
+    }
+
+  test.tlsinfo.ti_module = map->l_tls_modid;
+  test.tlsinfo.ti_offset = ti_offset;
+  entry = htab_find_slot (ht, &test, 1, hash_tlsdesc, eq_tlsdesc);
+  if (! entry || *entry)
+    {
+      td = entry ? *entry : NULL;
+      __rtld_lock_unlock_recursive (GL(dl_load_lock));
+      return td;
+    }
+
+  *entry = td = malloc (sizeof (struct tlsdesc_dynamic_arg));
+  /* This may be higher than the map's generation, but it doesn't
+     matter much.  Worst case, we'll have one extra DTV update per
+     thread.  */
+  td->gen_count = map_generation (map);
+  td->tlsinfo = test.tlsinfo;
+
+  __rtld_lock_unlock_recursive (GL(dl_load_lock));
+  return td;
+}
+
+# endif /* SHARED */
+
+/* The idea of the following two functions is to stop multiple threads
+   from attempting to resolve the same TLS descriptor without busy
+   waiting.  Ideally, we should be able to release the lock right
+   after changing td->entry, and then use, say, a condition variable
+   or a futex wake to wake up any waiting threads, but let's try to
+   avoid introducing such dependencies.  */
+
+inline static int
+_dl_tlsdesc_resolve_early_return_p (struct tlsdesc volatile *td, void *caller)
+{
+  if (caller != td->entry)
+    return 1;
+
+  __rtld_lock_lock_recursive (GL(dl_load_lock));
+  if (caller != td->entry)
+    {
+      __rtld_lock_unlock_recursive (GL(dl_load_lock));
+      return 1;
+    }
+
+  td->entry = _dl_tlsdesc_resolve_hold;
+
+  return 0;
+}
+
+inline static void
+_dl_tlsdesc_wake_up_held_fixups (void)
+{
+  __rtld_lock_unlock_recursive (GL(dl_load_lock));
+}
+
+/* _dl_tlsdesc_resolve_hold_fixup takes an entry_check_offset argument,
+   computed by its caller as the offset from the caller's entry point to
+   the call's return address, such that subtracting it from the return
+   address implicitly passed to the function yields the caller's entry
+   point, to be compared with the entry point in the TLS descriptor.
+   _dl_tlsdesc_resolve_rela_fixup compares against the module's
+   DT_TLSDESC_PLT entry instead.  If it has changed, both return at once.  */
+
+/* These macros are copied from elf/dl-reloc.c */
+
+#define CHECK_STATIC_TLS(map, sym_map)					\
+    do {								\
+      if (__builtin_expect ((sym_map)->l_tls_offset == NO_TLS_OFFSET	\
+			    || ((sym_map)->l_tls_offset			\
+				== FORCED_DYNAMIC_TLS_OFFSET), 0))	\
+	_dl_allocate_static_tls (sym_map);				\
+    } while (0)
+
+#define TRY_STATIC_TLS(map, sym_map)					\
+    (__builtin_expect ((sym_map)->l_tls_offset				\
+		       != FORCED_DYNAMIC_TLS_OFFSET, 1)			\
+     && (__builtin_expect ((sym_map)->l_tls_offset != NO_TLS_OFFSET, 1)	\
+	 || _dl_try_allocate_static_tls (sym_map) == 0))
+
+int internal_function _dl_try_allocate_static_tls (struct link_map *map);
+
+/* This function is used to lazily resolve TLS_DESC RELA relocations.
+   Until then, the descriptor's arg field points to the relocation.  */
+
+void
+attribute_hidden
+_dl_tlsdesc_resolve_rela_fixup (struct tlsdesc volatile *td,
+				struct link_map *l)
+{
+  const ElfW(Rela) *reloc = td->arg;
+
+  if (_dl_tlsdesc_resolve_early_return_p
+      (td, (void*)(D_PTR (l, l_info[ADDRIDX (DT_TLSDESC_PLT)]) + l->l_addr)))
+    return;
+
+  /* The code below was borrowed from _dl_fixup().  */
+  const ElfW(Sym) *const symtab
+    = (const void *) D_PTR (l, l_info[DT_SYMTAB]);
+  const char *strtab = (const void *) D_PTR (l, l_info[DT_STRTAB]);
+  const ElfW(Sym) *sym = &symtab[ELFW(R_SYM) (reloc->r_info)];
+  lookup_t result;
+
+  /* Look up the target symbol.  If the normal lookup rules are not
+     used, don't look in the global scope.  */
+  if (ELFW(ST_BIND) (sym->st_info) != STB_LOCAL
+      && __builtin_expect (ELFW(ST_VISIBILITY) (sym->st_other), 0) == 0)
+    {
+      const struct r_found_version *version = NULL;
+
+      if (l->l_info[VERSYMIDX (DT_VERSYM)] != NULL)
+	{
+	  const ElfW(Half) *vernum =
+	    (const void *) D_PTR (l, l_info[VERSYMIDX (DT_VERSYM)]);
+	  ElfW(Half) ndx = vernum[ELFW(R_SYM) (reloc->r_info)] & 0x7fff;
+	  version = &l->l_versions[ndx];
+	  if (version->hash == 0)
+	    version = NULL;
+	}
+
+      result = _dl_lookup_symbol_x (strtab + sym->st_name, l, &sym,
+				    l->l_scope, version, ELF_RTYPE_CLASS_PLT,
+				    DL_LOOKUP_ADD_DEPENDENCY, NULL);
+    }
+  else
+    {
+      /* We already found the symbol.  The module (and therefore its load
+	 address) is also known.  */
+      result = l;
+    }
+
+  if (! sym)
+    {
+      td->arg = (void*)reloc->r_addend;
+      td->entry = _dl_tlsdesc_undefweak;
+    }
+  else
+    {
+#  ifndef SHARED
+      CHECK_STATIC_TLS (l, result);
+#  else
+      if (!TRY_STATIC_TLS (l, result))
+	{
+	  td->arg = _dl_make_tlsdesc_dynamic (result, sym->st_value
+					      + reloc->r_addend);
+	  td->entry = _dl_tlsdesc_dynamic;
+	}
+      else
+#  endif
+	{
+	  td->arg = (void*)(sym->st_value - result->l_tls_offset
+			    + reloc->r_addend);
+	  td->entry = _dl_tlsdesc_return;
+	}
+    }
+
+  _dl_tlsdesc_wake_up_held_fixups ();
+}
+
+void
+attribute_hidden
+_dl_tlsdesc_resolve_hold_fixup (struct tlsdesc volatile *td,
+				ptrdiff_t entry_check_offset)
+{
+  /* Maybe we're lucky and can return early.  */
+  if (__builtin_return_address (0) - entry_check_offset != td->entry)
+    return;
+
+  /* Locking here will stop execution until the running resolver runs
+     _dl_tlsdesc_wake_up_held_fixups(), releasing the lock.
+
+     FIXME: We'd be better off waiting on a condition variable, such
+     that we didn't have to hold the lock throughout the relocation
+     processing.  */
+  __rtld_lock_lock_recursive (GL(dl_load_lock));
+  __rtld_lock_unlock_recursive (GL(dl_load_lock));
+}
+
+#endif /* USE_TLS */
+
+void
+_dl_unmap (struct link_map *map)
+{
+  __munmap ((void *) (map)->l_map_start,
+	    (map)->l_map_end - (map)->l_map_start);
+
+#if USE_TLS && SHARED
+  /* _dl_unmap is only called for dlopen()ed libraries, for which
+     calling free() is safe, or before we've completed the initial
+     relocation, in which case calling free() is probably pointless,
+     but still safe.  */
+  if (map->l_mach.tlsdesc_table)
+    htab_delete (map->l_mach.tlsdesc_table);
+#endif
+}
Index: sysdeps/x86_64/tlsdesc.sym
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ sysdeps/x86_64/tlsdesc.sym	2006-01-13 18:16:22.000000000 -0500
@@ -0,0 +1,20 @@
+#include <stddef.h>
+#include <sysdep.h>
+#include <tls.h>
+#include <link.h>
+#include <dl-tlsdesc.h>
+
+--
+
+-- Abuse tls.h macros to derive offsets relative to the thread register.
+#if defined USE_TLS
+
+DTV_OFFSET			offsetof(struct pthread, header.dtv)
+
+TLSDESC_ARG			offsetof(struct tlsdesc, arg)
+
+TLSDESC_GEN_COUNT		offsetof(struct tlsdesc_dynamic_arg, gen_count)
+TLSDESC_MODID			offsetof(struct tlsdesc_dynamic_arg, tlsinfo.ti_module)
+TLSDESC_MODOFF			offsetof(struct tlsdesc_dynamic_arg, tlsinfo.ti_offset)
+
+#endif
-- 
Alexandre Oliva         http://www.lsd.ic.unicamp.br/~oliva/
Secretary for FSF Latin America        http://www.fsfla.org/
Red Hat Compiler Engineer   aoliva@{redhat.com, gcc.gnu.org}
Free Software Evangelist  oliva@{lsd.ic.unicamp.br, gnu.org}
