This is the mail archive of the libc-help@sourceware.org mailing list for the glibc project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: A possible libc/dlmopen/pthreads bug



On 24/01/2018 11:59, Vivek Das Mohapatra wrote:
> Hello -
> 
> As I've posted here before, I'm working on a segregated-dynamic-linking
> mechanism based on dlmopen (to allow applications to link against libraries
> without sharing their dependencies).
> 
> While testing some recent builds, I think I may have tripped over a
> possible dlmopen/pthreads bug. It is entirely possible that I'm doing
> something wrong, or failing to do something - a lot of this is new
> territory for me.
> 
> Either way, I'd appreciate some feedback/insight into what's going on.
> 
> A bit of background before I proceed: libcapsule is the helper library
> I'm working on, based on dlmopen, that allows me to create "shim"
> libraries that expose the symbols from their immediate target,
> but don't expose any further dependencies.
> 
>   +-----------------------------+ +----------------------------+
>   | Runtime filesystem          | | Host filesystem            |
>   |                             | |                            |
>   | +------------+              | |                            |
>   | | Executable |              | |                            |
>   | ++------+----+              | |                            |
>   |  |      |                   | |                            |
>   |  |   +--+---------------+   | |   +------------------+     |
>   |  |   | shim libX11      | <-----> | real libX11      |-+   |
>   |  |   +--+---------------+   | |   +----------+-------+ |   |
>   |  |      |                   | |              |         |   |
>   |  |      |                   | |          +---+-------+ |   |
>   | ++------+-----+             | |          | libA      | |   |
>   | | libc        |             | |          +-----------+ |   |
>   | +-------------+             | |          +-----------+ |   |
>   |                             | |          | libB      |-+   |
>   |                             | |          +-----------+     |
>   +-----------------------------+ +----------------------------+
> 
> The libraries on the right hand side from the host filesystem are
> in a dlmopen namespace of their own (as an implementation detail
> they have the same libc as the DSOs on the left, but it is a new
> copy opened with dlmopen)
> 
> The shim libX11 takes care of making sure the executable gets symbols
> from the real libX11 instead of the shim, but the bug we're going
> to look is not (I think) related to that, so I'm not going to discuss
> that process here.
> 
> The visible symptom is that when I launch pulseaudio with a few shimmed
> libraries (the first case I stumbled upon, and easy to reproduce) it seems
> to deadlock very early in its life.
> 
> A bit of digging with strace and gdb shows that when it locks up
> it does so inside setresuid. A bit more digging indicates that the
> code is infinite looping here:
> 
> __nptl_setxid (cmdp=0xffffd9d8) at allocatestack.c:1105
> +list
> 1103 1104      /* Now the list with threads using user-allocated stacks.  */
> 1105      list_for_each (runp, &__stack_user)
> 1106        {
> 1107          struct pthread *t = list_entry (runp, struct pthread, list);
> 1108          if (t == self)
> 1109            continue;
> 1110 1111          setxid_mark_thread (cmdp, t);
> 1112        }
> 
> For some reason, list_for_each never terminates.
> 
> If I disable encapsulation (remove the shim libraries from the library
> path) then the following holds at that point in the code:
> 
> Breakpoint 6, __nptl_setxid (cmdp=0xffffd9e8) at allocatestack.c:1105
> 1105      list_for_each (runp, &__stack_user)
> +bt
> #0  __nptl_setxid (cmdp=0xffffd9e8) at allocatestack.c:1105
> #1  0xf7b96162 in __GI___setresuid (ruid=1000, euid=1000, suid=1000)
>       at ../sysdeps/unix/sysv/linux/i386/setresuid.c:29
> #2  0x5655b7f0 in pa_drop_root ()
> #3  0x56558a6e in main ()
> 
> Digging into __stack_user:
> 
> +p __stack_user
> $1 = {next = 0xf73a48a0, prev = 0xf73a48a0}
> 
> +p &__stack_user
> $2 = (list_t *) 0xf7d1d1a4 <__stack_user>
> 
> +p (&__stack_user)->next
> $3 = (struct list_head *) 0xf73a48a0
> 
> +p (&__stack_user)->next->next
> $4 = (struct list_head *) 0xf7d1d1a4 <__stack_user>
> 
> +p (&__stack_user)->next->next->next
> $5 = (struct list_head *) 0xf73a48a0
> 
> We find a circular linked list, which contains a pointer to __stack_user.
> Since list_for_each is invoked as list_for_each(…, &__stack_user),
> this means the for loop it implements will terminate, allowing setresuid
> to proceed.
> 
> // ============================================================================
> Note: The definition of list_for_each is this:
> 
> # define list_for_each(pos, head) \
>   for (pos = (head)->next; pos != (head); pos = pos->next)
> // ============================================================================
> 
> Now let's examine the same case with the shim in place:
> 
> Breakpoint 6, __nptl_setxid (cmdp=0xffffd9d8) at allocatestack.c:1105
> 1105      list_for_each (runp, &__stack_user)
>  ⋮
> +p __stack_user
> $1 = {next = 0xf76eeb60, prev = 0xf76eeb60}
> 
> +p &__stack_user
> $2 = (list_t *) 0xf7d8f1a4 <__stack_user>
> 
> +p (&__stack_user)->next
> $3 = (struct list_head *) 0xf76eeb60
> 
> +p (&__stack_user)->next->next
> $4 = (struct list_head *) 0xf71391a4
> 
> +p (&__stack_user)->next->next->next
> $5 = (struct list_head *) 0xf76eeb60
> 
> We can see we have a circular linked list, as before, but it does
> _not_ contain the element supplied as the head to list_for_each:
> We're going to loop forever.
> 
> ============================================================================
> 
> Next let's try and figure out when/where this happens.
> Setting various breakpoints and watches we uncover the following:
> 
> +run
> Starting program: /usr/bin/pulseaudio --start
> [Thread debugging using libthread_db enabled]
> Using host libthread_db library "/lib/i386-linux-gnu/libthread_db.so.1".
> 
> Breakpoint 1, __pthread_initialize_minimal_internal () at nptl-init.c:290
> 290    {
> +break allocatestack.c:1105
> Breakpoint 6 at 0xf7d78b2c: file allocatestack.c, line 1105.
> +watch __stack_user
> Hardware watchpoint 7: __stack_user
> +watch __stack_user.next
> Hardware watchpoint 8: __stack_user.next
> +cont
> Continuing.
> 
> Hardware watchpoint 7: __stack_user
> 
> Old value = {next = 0x0, prev = 0x0}
> New value = {next = 0xf7d8f1a4 <__stack_user>, prev = 0x0}
> 
> Hardware watchpoint 8: __stack_user.next
> 
> Old value = (struct list_head *) 0x0
> New value = (struct list_head *) 0xf7d8f1a4 <__stack_user>
> __pthread_initialize_minimal_internal () at nptl-init.c:377
> 377      list_add (&pd->list, &__stack_user);
> +cont
> Continuing.
> 
> Hardware watchpoint 7: __stack_user
> 
> Old value = {next = 0xf7d8f1a4 <__stack_user>, prev = 0x0}
> New value = {next = 0xf7d8f1a4 <__stack_user>, prev = 0xf76eeb60}
> list_add (head=<optimized out>, newp=0xf76eeb60) at ../include/list.h:64
> 64      head->next = newp;
> +cont
> Continuing.
> 
> Hardware watchpoint 7: __stack_user
> 
> Old value = {next = 0xf7d8f1a4 <__stack_user>, prev = 0xf76eeb60}
> New value = {next = 0xf76eeb60, prev = 0xf76eeb60}
> 
> Hardware watchpoint 8: __stack_user.next
> 
> Old value = (struct list_head *) 0xf7d8f1a4 <__stack_user>
> New value = (struct list_head *) 0xf76eeb60
> __pthread_initialize_minimal_internal () at nptl-init.c:381
> 381      THREAD_SETMEM (pd, report_events, __nptl_initial_report_events);
> +cont
> Continuing.
> 
> Breakpoint 2, __pthread_init_static_tls (map=0x5657e040) at allocatestack.c:1210
> 1210    {
> 
> // ============================================================================
> // At this point we step to the end of __pthread_init_static_tls and set
> // an extra watch point on the address currently holding &__stack_user
> // ============================================================================
> 
> +p __stack_user.next
> $1 = (struct list_head *) 0xf76eeb60
> 
> +p __stack_user.next->next
> $2 = (struct list_head *) 0xf7d8f1a4 <__stack_user>  ← STILL GOOD
> 
> +watch __stack_user.next->next
> Hardware watchpoint 9: __stack_user.next->next
> +s
> 
> 
> // And here it is: Hardware watchpoint 9: __stack_user.next->next
> 
> Old value = (struct list_head *) 0xf7d8f1a4 <__stack_user>
> New value = (struct list_head *) 0xf71391a4 ← >>>>> GONE WRONG HERE <<<<<
> 0xf7121c83 in ?? ()
> 
> // Hm, an unknown address scribbling on __stack_user.
> 
> +call calloc(1, sizeof(Dl_info))
> $3 = (void *) 0x56574d18
> +call dladdr(0xf7121c83, $3)
> $4 = 1
> 
> +p *(Dl_info *)$3
> $5 = {dli_fname = 0x565755b8 "/lib/i386-linux-gnu/libpthread.so.0",
>       dli_fbase = 0xf711d000,
>       dli_sname = 0xf711f617 "__pthread_initialize_minimal",
>       dli_saddr = 0xf7121be0}
> 
> // Well that can't be right, can it? gdb should have figured out the name
> // of 0xf7121c83, not said ?? - let's work out the address in the other
> // direction:
> 
> +p __pthread_initialize_minimal
> $6 = {<text variable, no debug info>} 0xf7d77be0 <__pthread_initialize_minimal_internal>
> 
> +call dladdr(0xf7d77be0, $3)
> $8 = 1
> 
> +p *(Dl_info *)$3
> $10 = {dli_fname = 0xf7fd4d70 "/lib/i386-linux-gnu/libpthread.so.0",
>        dli_fbase = 0xf7d73000,
>        dli_sname = 0xf7d75617 "__pthread_initialize_minimal",
>        dli_saddr = 0xf7d77be0 <__pthread_initialize_minimal_internal>}
> 
> // ============================================================================
> 
> Aha! Same DSO, different base address. So the ?? instance of
> __pthread_initialize_minimal_internal was from the _other_ copy of libc,
> inside the dlmopen namespace - the one gdb doesn't know how to inspect.
> 
> 
> PS: for completeness, I went back and followed the __stack_user linked list
> at the "GONE WRONG HERE" point, just to be sure:
> 
> +p __stack_user
> $1 = {next = 0xf76eeb60, prev = 0xf76eeb60}
> 
> +p __stack_user.next
> $2 = (struct list_head *) 0xf76eeb60
> 
> +p __stack_user.next->next
> $3 = (struct list_head *) 0xf71391a4
> 
> +p __stack_user.next->next->next
> $4 = (struct list_head *) 0xf71391a4
> 
> +p __stack_user.next->next->next->next
> $5 = (struct list_head *) 0xf71391a4
> 
> So the linked list definitely doesn't contain &__stack_user any more.
> 
> // ============================================================================
> 
> Apologies for the exegesis: It seems to me that the copy of libc in the
> private namespace has somehow managed to scribble on the linked list
> pointed to by __stack_user, overwriting a key address.
> 
> Is my analysis correct? Is there something I could or should have done to
> avoid this?
> 
> A while ago (https://sourceware.org/ml/libc-help/2018-01/msg00002.html)
> I suggested a dlmopen flag RTLD_UNIQUE or similar which would cause the
> existing mapping of the target library in the main namespace/link-map to be
> re-used instead of creating a new one: I believe this would prevent this
> problem (and others detailed in that message) from occurring - any thoughts?

Nice write through, if you could please open a bug report if possible with a
testcase to trigger it.  I am wondering if this is triggering already reported
issues with dlmopen: BZ#18684 [1], BZ#15271 [2], and BZ#15134 [3].  In fact
it really looks like BZ#18684, where Carlos noted namespace's global searchlist
(RTLD_GLOBAL) is never initialized in some cases.

[1] https://sourceware.org/bugzilla/show_bug.cgi?id=18684
[2] https://sourceware.org/bugzilla/show_bug.cgi?id=15271
[3] https://sourceware.org/bugzilla/show_bug.cgi?id=15134


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]