This is the mail archive of the
libc-help@sourceware.org
mailing list for the glibc project.
Re: A possible libc/dlmopen/pthreads bug
- From: Adhemerval Zanella <adhemerval dot zanella at linaro dot org>
- To: libc-help at sourceware dot org
- Date: Wed, 24 Jan 2018 15:08:22 -0200
- Subject: Re: A possible libc/dlmopen/pthreads bug
- Authentication-results: sourceware.org; auth=none
- References: <alpine.DEB.2.20.1801241345460.22921@noise.cbg.collabora.co.uk>
On 24/01/2018 11:59, Vivek Das Mohapatra wrote:
> Hello -
>
> As I've posted here before, I'm working on a segregated-dynamic-linking
> mechanism based on dlmopen (to allow applications to link against libraries
> without sharing their dependencies).
>
> While testing some recent builds, I think I may have tripped over a
> possible dlmopen/pthreads bug. It is entirely possible that I'm doing
> something wrong, or failing to do something - a lot of this is new
> territory for me.
>
> Either way, I'd appreciate some feedback/insight into what's going on.
>
> A bit of background before I proceed: libcapsule is the helper library
> I'm working on, based on dlmopen, that allows me to create "shim"
> libraries that expose the symbols from their immediate target,
> but don't expose any further dependencies.
>
> +-----------------------------+ +----------------------------+
> | Runtime filesystem | | Host filesystem |
> | | | |
> | +------------+ | | |
> | | Executable | | | |
> | ++------+----+ | | |
> | | | | | |
> | | +--+---------------+ | | +------------------+ |
> | | | shim libX11 | <-----> | real libX11 |-+ |
> | | +--+---------------+ | | +----------+-------+ | |
> | | | | | | | |
> | | | | | +---+-------+ | |
> | ++------+-----+ | | | libA | | |
> | | libc | | | +-----------+ | |
> | +-------------+ | | +-----------+ | |
> | | | | libB |-+ |
> | | | +-----------+ |
> +-----------------------------+ +----------------------------+
>
> The libraries on the right hand side from the host filesystem are
> in a dlmopen namespace of their own (as an implementation detail
> they have the same libc as the DSOs on the left, but it is a new
> copy opened with dlmopen)
>
> The shim libX11 takes care of making sure the executable gets symbols
> from the real libX11 instead of the shim, but the bug we're going
> to look is not (I think) related to that, so I'm not going to discuss
> that process here.
>
> The visible symptom is that when I launch pulseaudio with a few shimmed
> libraries (the first case I stumbled upon, and easy to reproduce) it seems
> to deadlock very early in its life.
>
> A bit of digging with strace and gdb shows that when it locks up
> it does so inside setresuid. A bit more digging indicates that the
> code is infinite looping here:
>
> __nptl_setxid (cmdp=0xffffd9d8) at allocatestack.c:1105
> +list
> 1103 1104 /* Now the list with threads using user-allocated stacks. */
> 1105 list_for_each (runp, &__stack_user)
> 1106 {
> 1107 struct pthread *t = list_entry (runp, struct pthread, list);
> 1108 if (t == self)
> 1109 continue;
> 1110 1111 setxid_mark_thread (cmdp, t);
> 1112 }
>
> For some reason, list_for_each never terminates.
>
> If I disable encapsulation (remove the shim libraries from the library
> path) then the following holds at that point in the code:
>
> Breakpoint 6, __nptl_setxid (cmdp=0xffffd9e8) at allocatestack.c:1105
> 1105 list_for_each (runp, &__stack_user)
> +bt
> #0 __nptl_setxid (cmdp=0xffffd9e8) at allocatestack.c:1105
> #1 0xf7b96162 in __GI___setresuid (ruid=1000, euid=1000, suid=1000)
> at ../sysdeps/unix/sysv/linux/i386/setresuid.c:29
> #2 0x5655b7f0 in pa_drop_root ()
> #3 0x56558a6e in main ()
>
> Digging into __stack_user:
>
> +p __stack_user
> $1 = {next = 0xf73a48a0, prev = 0xf73a48a0}
>
> +p &__stack_user
> $2 = (list_t *) 0xf7d1d1a4 <__stack_user>
>
> +p (&__stack_user)->next
> $3 = (struct list_head *) 0xf73a48a0
>
> +p (&__stack_user)->next->next
> $4 = (struct list_head *) 0xf7d1d1a4 <__stack_user>
>
> +p (&__stack_user)->next->next->next
> $5 = (struct list_head *) 0xf73a48a0
>
> We find a circular linked list, which contains a pointer to __stack_user.
> Since list_for_each is invoked as list_for_each(…, &__stack_user),
> this means the for loop it implements will terminate, allowing setresuid
> to proceed.
>
> // ============================================================================
> Note: The definition of list_for_each is this:
>
> # define list_for_each(pos, head) \
> for (pos = (head)->next; pos != (head); pos = pos->next)
> // ============================================================================
>
> Now let's examine the same case with the shim in place:
>
> Breakpoint 6, __nptl_setxid (cmdp=0xffffd9d8) at allocatestack.c:1105
> 1105 list_for_each (runp, &__stack_user)
> ⋮
> +p __stack_user
> $1 = {next = 0xf76eeb60, prev = 0xf76eeb60}
>
> +p &__stack_user
> $2 = (list_t *) 0xf7d8f1a4 <__stack_user>
>
> +p (&__stack_user)->next
> $3 = (struct list_head *) 0xf76eeb60
>
> +p (&__stack_user)->next->next
> $4 = (struct list_head *) 0xf71391a4
>
> +p (&__stack_user)->next->next->next
> $5 = (struct list_head *) 0xf76eeb60
>
> We can see we have a circular linked list, as before, but it does
> _not_ contain the element supplied as the head to list_for_each:
> We're going to loop forever.
>
> ============================================================================
>
> Next let's try and figure out when/where this happens.
> Setting various breakpoints and watches we uncover the following:
>
> +run
> Starting program: /usr/bin/pulseaudio --start
> [Thread debugging using libthread_db enabled]
> Using host libthread_db library "/lib/i386-linux-gnu/libthread_db.so.1".
>
> Breakpoint 1, __pthread_initialize_minimal_internal () at nptl-init.c:290
> 290 {
> +break allocatestack.c:1105
> Breakpoint 6 at 0xf7d78b2c: file allocatestack.c, line 1105.
> +watch __stack_user
> Hardware watchpoint 7: __stack_user
> +watch __stack_user.next
> Hardware watchpoint 8: __stack_user.next
> +cont
> Continuing.
>
> Hardware watchpoint 7: __stack_user
>
> Old value = {next = 0x0, prev = 0x0}
> New value = {next = 0xf7d8f1a4 <__stack_user>, prev = 0x0}
>
> Hardware watchpoint 8: __stack_user.next
>
> Old value = (struct list_head *) 0x0
> New value = (struct list_head *) 0xf7d8f1a4 <__stack_user>
> __pthread_initialize_minimal_internal () at nptl-init.c:377
> 377 list_add (&pd->list, &__stack_user);
> +cont
> Continuing.
>
> Hardware watchpoint 7: __stack_user
>
> Old value = {next = 0xf7d8f1a4 <__stack_user>, prev = 0x0}
> New value = {next = 0xf7d8f1a4 <__stack_user>, prev = 0xf76eeb60}
> list_add (head=<optimized out>, newp=0xf76eeb60) at ../include/list.h:64
> 64 head->next = newp;
> +cont
> Continuing.
>
> Hardware watchpoint 7: __stack_user
>
> Old value = {next = 0xf7d8f1a4 <__stack_user>, prev = 0xf76eeb60}
> New value = {next = 0xf76eeb60, prev = 0xf76eeb60}
>
> Hardware watchpoint 8: __stack_user.next
>
> Old value = (struct list_head *) 0xf7d8f1a4 <__stack_user>
> New value = (struct list_head *) 0xf76eeb60
> __pthread_initialize_minimal_internal () at nptl-init.c:381
> 381 THREAD_SETMEM (pd, report_events, __nptl_initial_report_events);
> +cont
> Continuing.
>
> Breakpoint 2, __pthread_init_static_tls (map=0x5657e040) at allocatestack.c:1210
> 1210 {
>
> // ============================================================================
> // At this point we step to the end of __pthread_init_static_tls and set
> // an extra watch point on the address currently holding &__stack_user
> // ============================================================================
>
> +p __stack_user.next
> $1 = (struct list_head *) 0xf76eeb60
>
> +p __stack_user.next->next
> $2 = (struct list_head *) 0xf7d8f1a4 <__stack_user> ← STILL GOOD
>
> +watch __stack_user.next->next
> Hardware watchpoint 9: __stack_user.next->next
> +s
>
>
> // And here it is: Hardware watchpoint 9: __stack_user.next->next
>
> Old value = (struct list_head *) 0xf7d8f1a4 <__stack_user>
> New value = (struct list_head *) 0xf71391a4 ← >>>>> GONE WRONG HERE <<<<<
> 0xf7121c83 in ?? ()
>
> // Hm, an unknown address scribbling on __stack_user.
>
> +call calloc(1, sizeof(Dl_info))
> $3 = (void *) 0x56574d18
> +call dladdr(0xf7121c83, $3)
> $4 = 1
>
> +p *(Dl_info *)$3
> $5 = {dli_fname = 0x565755b8 "/lib/i386-linux-gnu/libpthread.so.0",
> dli_fbase = 0xf711d000,
> dli_sname = 0xf711f617 "__pthread_initialize_minimal",
> dli_saddr = 0xf7121be0}
>
> // Well that can't be right, can it? gdb should have figured out the name
> // of 0xf7121c83, not said ?? - let's work out the address in the other
> // direction:
>
> +p __pthread_initialize_minimal
> $6 = {<text variable, no debug info>} 0xf7d77be0 <__pthread_initialize_minimal_internal>
>
> +call dladdr(0xf7d77be0, $3)
> $8 = 1
>
> +p *(Dl_info *)$3
> $10 = {dli_fname = 0xf7fd4d70 "/lib/i386-linux-gnu/libpthread.so.0",
> dli_fbase = 0xf7d73000,
> dli_sname = 0xf7d75617 "__pthread_initialize_minimal",
> dli_saddr = 0xf7d77be0 <__pthread_initialize_minimal_internal>}
>
> // ============================================================================
>
> Aha! Same DSO, different base address. So the ?? instance of
> __pthread_initialize_minimal_internal was from the _other_ copy of libc,
> inside the dlmopen namespace - the one gdb doesn't know how to inspect.
>
>
> PS: for completeness, I went back and followed the __stack_user linked list
> at the "GONE WRONG HERE" point, just to be sure:
>
> +p __stack_user
> $1 = {next = 0xf76eeb60, prev = 0xf76eeb60}
>
> +p __stack_user.next
> $2 = (struct list_head *) 0xf76eeb60
>
> +p __stack_user.next->next
> $3 = (struct list_head *) 0xf71391a4
>
> +p __stack_user.next->next->next
> $4 = (struct list_head *) 0xf71391a4
>
> +p __stack_user.next->next->next->next
> $5 = (struct list_head *) 0xf71391a4
>
> So the linked list definitely doesn't contain &__stack_user any more.
>
> // ============================================================================
>
> Apologies for the exegesis: It seems to me that the copy of libc in the
> private namespace has somehow managed to scribble on the linked list
> pointed to by __stack_user, overwriting a key address.
>
> Is my analysis correct? Is there something I could or should have done to
> avoid this?
>
> A while ago (https://sourceware.org/ml/libc-help/2018-01/msg00002.html)
> I suggested a dlmopen flag RTLD_UNIQUE or similar which would cause the
> existing mapping of the target library in the main namespace/link-map to be
> re-used instead of creating a new one: I believe this would prevent this
> problem (and others detailed in that message) from occurring - any thoughts?
Nice write through, if you could please open a bug report if possible with a
testcase to trigger it. I am wondering if this is triggering already reported
issues with dlmopen: BZ#18684 [1], BZ#15271 [2], and BZ#15134 [3]. In fact
it really looks like BZ#18684, where Carlos noted namespace's global searchlist
(RTLD_GLOBAL) is never initialized in some cases.
[1] https://sourceware.org/bugzilla/show_bug.cgi?id=18684
[2] https://sourceware.org/bugzilla/show_bug.cgi?id=15271
[3] https://sourceware.org/bugzilla/show_bug.cgi?id=15134