This is the mail archive of the glibc-bugs@sourceware.org mailing list for the glibc project.



[Bug libc/5741] Segfault in __libc_dlopen_mode ()


------- Additional Comments From sjmunroe at us dot ibm dot com  2008-02-06 17:19 -------
The standard powerpc64 plt call stub looks like:

0x400000751c8 <._init+440>:     addis   r12,r2,0
0x400000751cc <._init+444>:     std     r2,40(r1)
0x400000751d0 <._init+448>:     ld      r11,-31232(r12) // plt->func
0x400000751d4 <._init+452>:     ld      r2,-31224(r12) // plt->toc
0x400000751d8 <._init+456>:     mtctr   r11
0x400000751dc <._init+460>:     ld      r11,-31216(r12) // plt->aux
0x400000751e0 <._init+464>:     bctr
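
The three load displacements in the stub are 8 bytes apart, which matches a three-doubleword entry laid out as func, toc, aux. A small C check of that arithmetic, as a sketch only: the struct name plt_entry_model_t and the helper field_offset are hypothetical and are not taken from glibc.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one plt entry: three 64-bit words. */
typedef struct {
    uint64_t func;  /* loaded by "ld r11,-31232(r12)" */
    uint64_t toc;   /* loaded by "ld r2,-31224(r12)"  */
    uint64_t aux;   /* loaded by "ld r11,-31216(r12)" */
} plt_entry_model_t;

/* Displacements copied from the stub listing above. */
enum { DISP_FUNC = -31232, DISP_TOC = -31224, DISP_AUX = -31216 };

/* Byte offset of a field inside the entry, relative to the func word. */
static long field_offset(long disp)
{
    return disp - DISP_FUNC;
}
```

With this model, field_offset(DISP_TOC) is 8 and field_offset(DISP_AUX) is 16, the same as the offsetof values for the toc and aux fields.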

For lazy resolution, plt entries 1-n are initialized to plt[n].func =
&glink[n]. The toc and aux fields of an unresolved plt entry are NULL. Calls to
unresolved plt entries end up in glink0, which computes the plt index and
transfers control to _dl_runtime_resolve (via plt[0], AKA plt_reserve). Note
that the unresolved plt entry does not need the toc set because glink0 will load
&_dl_runtime_resolve and the toc for ld.so from the plt[0] entry.
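
The lazy state described above can be modeled in a few lines of C. This is a minimal sketch, not the ld.so code: the type fdesc_t and the helpers plt_init_lazy and plt_is_lazy are hypothetical names, the field names are borrowed from the fd_func/fd_toc/fd_aux fields used by elf_machine_fixup_plt, and the array passed in stands for entries 1-n only (the reserved plt[0] is left out).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a function descriptor / plt entry. */
typedef struct {
    uint64_t fd_func;  /* entry point address */
    uint64_t fd_toc;   /* TOC base for the callee */
    uint64_t fd_aux;   /* auxiliary word */
} fdesc_t;

/* Put every entry in its lazy state: func points at the matching
   glink stub, toc and aux stay NULL (0). */
static void plt_init_lazy(fdesc_t *plt, const uint64_t *glink_addr, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        plt[i].fd_func = glink_addr[i];
        plt[i].fd_toc = 0;
        plt[i].fd_aux = 0;
    }
}

/* An entry is still unresolved while func points at its glink stub
   and the toc has not been filled in. */
static int plt_is_lazy(const fdesc_t *entry, uint64_t glink_stub)
{
    return entry->fd_func == glink_stub && entry->fd_toc == 0;
}
```

A call through an entry for which plt_is_lazy holds lands in the glink stub rather than in the target function, which is why the NULL toc is harmless at that point.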

_dl_runtime_resolve calls _dl_fixup, which in-lines elf_machine_fixup_plt.
elf_machine_fixup_plt effectively copies the target function's resolved opd entry
over the caller's plt entry:

...
/* For PPC64, fixup_plt copies the function descriptor from opd
over the corresponding PLT entry.
Initially, PLT Entry[i] is set up for lazy linking.
For lazy linking, the fd_toc and fd_aux entries are irrelevant,
so for thread safety we write them before changing fd_func.  */

plt->fd_aux = rel->fd_aux + offset;
plt->fd_toc = rel->fd_toc + offset;
PPC_DCBST (&plt->fd_aux);
PPC_DCBST (&plt->fd_toc);
PPC_SYNC;

plt->fd_func = rel->fd_func + offset;
PPC_DCBST (&plt->fd_func);
PPC_SYNC;
...

This should be a safe sequence as the caller should never see the updated
plt->fd_func until the plt->fd_toc store completes and is broadcast system wide
(by the data cache block store and sync instruction). Seeing the updated toc but
the old (glink) function address is ok as glink0 will reload the toc anyway.
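
The intended write ordering (aux and toc first, func last) can be expressed portably with C11 atomics. This is an analogue for illustration, assuming a hypothetical plt_entry_t type and fixup_plt_model helper; it is not the glibc code, which uses PPC_DCBST and PPC_SYNC, and it is not proposed here as the fix.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical model of a plt entry with atomically accessed fields. */
typedef struct {
    _Atomic uint64_t fd_func;
    _Atomic uint64_t fd_toc;
    _Atomic uint64_t fd_aux;
} plt_entry_t;

/* Writer side: publish aux and toc first, then func.
   The release store on fd_func orders the two earlier stores
   before it from the writer's point of view. */
static void fixup_plt_model(plt_entry_t *plt, uint64_t func,
                            uint64_t toc, uint64_t aux)
{
    atomic_store_explicit(&plt->fd_aux, aux, memory_order_relaxed);
    atomic_store_explicit(&plt->fd_toc, toc, memory_order_relaxed);
    atomic_store_explicit(&plt->fd_func, func, memory_order_release);
}
```

Note that in the C11 model a release store only constrains the writer. A reader gets the guarantee only if its own loads are ordered as well, for example with an acquire load of fd_func before the load of fd_toc.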

This has worked for a long time and clearly works for in-order
micro-architectures. POWER5 is both out-of-order and speculative super-scalar,
with up to 175 instructions in flight per thread. If the plt entry crosses a
cache-line or page boundary, the function pointer load can be delayed (for
example by a cache or TLB miss) while the toc load completes early, so the caller
can pair the new function address with the old (NULL) toc.
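
The reader side of the race can be sketched the same way. The model below assumes, as a reading of the description above, that the failing case is a resolved function address paired with a still-NULL toc. The names plt_entry_t, call_snapshot_t, stub_load_model and snapshot_is_safe are hypothetical, and the acquire load stands in for whatever load ordering the real call stub would need; the stub shown earlier has no such ordering.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical model of the two plt fields the call stub depends on. */
typedef struct {
    _Atomic uint64_t fd_func;
    _Atomic uint64_t fd_toc;
} plt_entry_t;

/* What one call through the stub ends up using. */
typedef struct {
    uint64_t func;
    uint64_t toc;
} call_snapshot_t;

/* Ordered reader: the acquire load of fd_func keeps the later
   fd_toc load from being satisfied ahead of it. */
static call_snapshot_t stub_load_model(plt_entry_t *plt)
{
    call_snapshot_t s;
    s.func = atomic_load_explicit(&plt->fd_func, memory_order_acquire);
    s.toc = atomic_load_explicit(&plt->fd_toc, memory_order_relaxed);
    return s;
}

/* A snapshot is safe if it is still lazy (glink0 reloads the toc)
   or if the resolved func comes with a non-NULL toc. */
static int snapshot_is_safe(call_snapshot_t s, uint64_t glink_stub)
{
    if (s.func == glink_stub)
        return 1;
    return s.toc != 0;
}
```

A snapshot with the new toc and the old glink address passes the check, matching the remark that glink0 reloads the toc anyway; a snapshot with the new func and a NULL toc is the one combination it rejects.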

It does not happen often (1 in 50,000 on an 8 core system) but it does happen.

It also seems that the data cache block store is not strong enough. On POWER5 the
dcbst instruction has no direct effect on the L1 DCache (since it is
store-through), and no direct effect on the L2 cache or L3 cache (both of these
are kept coherent). As a result, the instruction simply goes through address
translation, reports any errors, and completes.

-- 


http://sourceware.org/bugzilla/show_bug.cgi?id=5741
