This is the mail archive of the glibc-bugs@sourceware.org mailing list for the glibc project.



[Bug dynamic-link/22745] New: _nptl_setxid can loop forever if a dlmopen namespace tries to initialise pthreads after the main namespace does


https://sourceware.org/bugzilla/show_bug.cgi?id=22745

            Bug ID: 22745
           Summary: _nptl_setxid can loop forever if a dlmopen namespace
                    tries to initialise pthreads after the main namespace
                    does
           Product: glibc
           Version: 2.24
            Status: UNCONFIRMED
          Severity: normal
          Priority: P2
         Component: dynamic-link
          Assignee: unassigned at sourceware dot org
          Reporter: vivek at collabora dot com
  Target Milestone: ---

Created attachment 10759
  --> https://sourceware.org/bugzilla/attachment.cgi?id=10759&action=edit
Test case: build two executables. One uses dlmopen and triggers the lockup;
the other uses dlopen and does not.

I stumbled upon this while testing pulseaudio in conjunction with
dlmopen: pulseaudio seems to lock up very soon after it starts.

A bit of digging with strace and gdb shows that when it locks up,
it does so inside setresuid. A bit more digging indicates that the
code loops infinitely here:

__nptl_setxid (cmdp=0xffffd9d8) at allocatestack.c:1105
+list
1103    
1104      /* Now the list with threads using user-allocated stacks.  */
1105      list_for_each (runp, &__stack_user)
1106        {
1107          struct pthread *t = list_entry (runp, struct pthread, list);
1108          if (t == self)
1109            continue;
1110    
1111          setxid_mark_thread (cmdp, t);
1112        }

For some reason, list_for_each never terminates.

If I disable the dlmopen code path then the following holds at that
point in the code:

Breakpoint 6, __nptl_setxid (cmdp=0xffffd9e8) at allocatestack.c:1105
1105      list_for_each (runp, &__stack_user)
+bt
#0  __nptl_setxid (cmdp=0xffffd9e8) at allocatestack.c:1105
#1  0xf7b96162 in __GI___setresuid (ruid=1000, euid=1000, suid=1000)
      at ../sysdeps/unix/sysv/linux/i386/setresuid.c:29
#2  0x5655b7f0 in pa_drop_root ()
#3  0x56558a6e in main ()

Digging into __stack_user:

+p __stack_user
$1 = {next = 0xf73a48a0, prev = 0xf73a48a0}

+p &__stack_user
$2 = (list_t *) 0xf7d1d1a4 <__stack_user>

+p (&__stack_user)->next
$3 = (struct list_head *) 0xf73a48a0

+p (&__stack_user)->next->next
$4 = (struct list_head *) 0xf7d1d1a4 <__stack_user>

+p (&__stack_user)->next->next->next
$5 = (struct list_head *) 0xf73a48a0

We find a circular linked list that contains &__stack_user itself.
Since list_for_each is invoked as list_for_each(…, &__stack_user),
the for loop it implements terminates as soon as it arrives back at
&__stack_user, allowing setresuid to proceed.

// ============================================================================
Note: The definition of list_for_each is this:

# define list_for_each(pos, head) \
  for (pos = (head)->next; pos != (head); pos = pos->next)
// ============================================================================
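A minimal sketch of that termination condition, assuming a simplified stand-in for glibc's internal list_t (the real type lives in glibc's private include/list.h). It copies the list_for_each macro above and rebuilds the healthy one-entry ring seen in the run without dlmopen; count_entries and make_one_entry_ring are hypothetical helpers for illustration, not glibc functions.

```c
#include <assert.h>

/* Simplified stand-in for glibc's internal list node (not the real header). */
typedef struct list_head
{
  struct list_head *next;
  struct list_head *prev;
} list_t;

/* Same shape as the list_for_each macro quoted above. */
#define list_for_each(pos, head) \
  for (pos = (head)->next; pos != (head); pos = pos->next)

/* Hypothetical helper: count the nodes visited before the walk gets back
   to head.  It only terminates if head is part of the cycle.  */
static int
count_entries (list_t *head)
{
  int n = 0;
  list_t *runp;
  list_for_each (runp, head)
    n++;
  return n;
}

/* Hypothetical helper: build the working-case ring from the report,
   head <-> node (__stack_user and the main thread's list entry).  */
static void
make_one_entry_ring (list_t *head, list_t *node)
{
  head->next = head->prev = node;
  node->next = node->prev = head;
  assert (head->next->next == head);  /* head is back in the cycle */
}
```

With head -> node -> head the loop visits one entry and then sees pos == head; an empty list (a head pointing at itself) visits none.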

Now let's examine the same case with the dlmopen call put back in place:

Breakpoint 6, __nptl_setxid (cmdp=0xffffd9d8) at allocatestack.c:1105
1105      list_for_each (runp, &__stack_user)
 ⋮
+p __stack_user
$1 = {next = 0xf76eeb60, prev = 0xf76eeb60}

+p &__stack_user
$2 = (list_t *) 0xf7d8f1a4 <__stack_user>

+p (&__stack_user)->next
$3 = (struct list_head *) 0xf76eeb60

+p (&__stack_user)->next->next
$4 = (struct list_head *) 0xf71391a4

+p (&__stack_user)->next->next->next
$5 = (struct list_head *) 0xf76eeb60

We can see we have a circular linked list, as before, but it does
_not_ contain the element supplied as the head to list_for_each, so the
condition pos != (head) never becomes false: we're going to loop forever.
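One way to check this without a debugger is to ask whether following ->next from head->next ever gets back to head. The sketch below, again using a simplified stand-in for glibc's list_t, does that with Floyd's tortoise-and-hare cycle detection, so it needs no arbitrary step limit. head_reachable is a hypothetical diagnostic helper, not part of glibc; it returns false for the pointer shape printed above (head -> A -> B -> A) and true for the healthy ring.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct list_head
{
  struct list_head *next;
  struct list_head *prev;
} list_t;

/* Hypothetical diagnostic: would list_for_each (pos, head) terminate?
   True iff following ->next from head->next leads back to head.
   Floyd's cycle detection: the hare checks every node it passes, and the
   tortoise only serves to detect a cycle that excludes head.  */
static bool
head_reachable (const list_t *head)
{
  assert (head != NULL);
  const list_t *slow = head->next;
  const list_t *fast = head->next;

  while (fast != NULL)
    {
      if (fast == head)
        return true;            /* walk returns to head: loop ends */
      fast = fast->next;        /* hare: first single step */
      if (fast == NULL)
        return false;           /* not circular at all */
      if (fast == head)
        return true;
      fast = fast->next;        /* hare: second single step */
      slow = slow->next;        /* tortoise: one step */
      if (fast == slow)
        return fast == head;    /* met inside a cycle; is it head? */
    }
  return false;
}
```

Assuming no next pointer is NULL, list_for_each (pos, head) terminates exactly when head_reachable (head) is true.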

// ============================================================================

Next, let's try to figure out when and where this happens.
Setting various breakpoints and watchpoints, we uncover the following:

+run
Starting program: /usr/bin/pulseaudio --start
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/i386-linux-gnu/libthread_db.so.1".

Breakpoint 1, __pthread_initialize_minimal_internal () at nptl-init.c:290
290     {
+break allocatestack.c:1105
Breakpoint 6 at 0xf7d78b2c: file allocatestack.c, line 1105.
+watch __stack_user
Hardware watchpoint 7: __stack_user
+watch __stack_user.next
Hardware watchpoint 8: __stack_user.next
+cont
Continuing.

Hardware watchpoint 7: __stack_user

Old value = {next = 0x0, prev = 0x0}
New value = {next = 0xf7d8f1a4 <__stack_user>, prev = 0x0}

Hardware watchpoint 8: __stack_user.next

Old value = (struct list_head *) 0x0
New value = (struct list_head *) 0xf7d8f1a4 <__stack_user>
__pthread_initialize_minimal_internal () at nptl-init.c:377
377       list_add (&pd->list, &__stack_user);
+cont
Continuing.

Hardware watchpoint 7: __stack_user

Old value = {next = 0xf7d8f1a4 <__stack_user>, prev = 0x0}
New value = {next = 0xf7d8f1a4 <__stack_user>, prev = 0xf76eeb60}
list_add (head=<optimized out>, newp=0xf76eeb60) at ../include/list.h:64
64        head->next = newp;
+cont
Continuing.

Hardware watchpoint 7: __stack_user

Old value = {next = 0xf7d8f1a4 <__stack_user>, prev = 0xf76eeb60}
New value = {next = 0xf76eeb60, prev = 0xf76eeb60}

Hardware watchpoint 8: __stack_user.next

Old value = (struct list_head *) 0xf7d8f1a4 <__stack_user>
New value = (struct list_head *) 0xf76eeb60
__pthread_initialize_minimal_internal () at nptl-init.c:381
381       THREAD_SETMEM (pd, report_events, __nptl_initial_report_events);
+cont
Continuing.

Breakpoint 2, __pthread_init_static_tls (map=0x5657e040) at
allocatestack.c:1210
1210    {

// ============================================================================
// At this point we step to the end of __pthread_init_static_tls and set
// an extra watchpoint on the field currently holding &__stack_user
// ============================================================================

+p __stack_user.next
$1 = (struct list_head *) 0xf76eeb60

+p __stack_user.next->next
$2 = (struct list_head *) 0xf7d8f1a4 <__stack_user>  ← STILL GOOD

+watch __stack_user.next->next
Hardware watchpoint 9: __stack_user.next->next
+s

// And here it is: 
Hardware watchpoint 9: __stack_user.next->next

Old value = (struct list_head *) 0xf7d8f1a4 <__stack_user>
New value = (struct list_head *) 0xf71391a4 ← >>>>> GONE WRONG HERE <<<<<
0xf7121c83 in ?? ()

// Hm, an unknown address scribbling on __stack_user.

+call calloc(1, sizeof(Dl_info))
$3 = (void *) 0x56574d18
+call dladdr(0xf7121c83, $3)
$4 = 1

+p *(Dl_info *)$3
$5 = {dli_fname = 0x565755b8 "/lib/i386-linux-gnu/libpthread.so.0",
      dli_fbase = 0xf711d000,
      dli_sname = 0xf711f617 "__pthread_initialize_minimal",
      dli_saddr = 0xf7121be0}

// Well, that can't be right, can it? gdb should have resolved the name
// of 0xf7121c83 instead of printing ??. Let's work out the address in the
// other direction:

+p __pthread_initialize_minimal
$6 = {<text variable, no debug info>} 0xf7d77be0
      <__pthread_initialize_minimal_internal>

+call dladdr(0xf7d77be0, $3)
$8 = 1

+p *(Dl_info *)$3
$10 = {dli_fname = 0xf7fd4d70 "/lib/i386-linux-gnu/libpthread.so.0",
       dli_fbase = 0xf7d73000,
       dli_sname = 0xf7d75617 "__pthread_initialize_minimal", 
       dli_saddr = 0xf7d77be0 <__pthread_initialize_minimal_internal>}
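The same dladdr lookup can be done from inside a program. The sketch below is an illustration, not part of the attached test case: describe_address is a hypothetical helper that asks dladdr which loaded object contains an address and returns the address's offset from that object's load base (dli_fbase), which is the number to compare between two copies of the same DSO. It assumes _GNU_SOURCE, and on glibc older than 2.34 it needs linking with -ldl.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <dlfcn.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: print which loaded object contains addr and return
   addr's offset from that object's load base, or -1 if dladdr cannot map
   the address to any loaded object.  */
static long
describe_address (const void *addr)
{
  Dl_info info;

  assert (addr != NULL);  /* a NULL address is a caller bug here */
  if (dladdr (addr, &info) == 0)
    return -1;

  long offset = (long) ((uintptr_t) addr - (uintptr_t) info.dli_fbase);
  printf ("%p: %s (base %p) near %s, offset 0x%lx\n",
          addr,
          info.dli_fname ? info.dli_fname : "?",
          info.dli_fbase,
          info.dli_sname ? info.dli_sname : "??",
          (unsigned long) offset);
  return offset;
}
```

Given the same function in two copies of one DSO, it should print different bases but the same offset.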

// ============================================================================

Aha! Same DSO, different base address. So the ?? instance of
__pthread_initialize_minimal_internal was from the _other_ copy of libpthread,
inside the dlmopen namespace: the one gdb doesn't know how to inspect.
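The "same DSO, different base" observation can be checked with plain arithmetic on the addresses printed above: subtract a copy's dli_fbase from an address to get its offset within libpthread.so.0, then add the other copy's base to translate it. The constants are copied from the transcript; offset_in and translate are hypothetical helpers, and uint32_t reflects the i386 build.

```c
#include <assert.h>
#include <stdint.h>

/* Load bases copied from the dladdr output above (i386, 32-bit).  */
#define NS_BASE   0xf711d000u  /* libpthread.so.0 in the dlmopen namespace */
#define MAIN_BASE 0xf7d73000u  /* libpthread.so.0 in the main namespace */

/* Offset of addr within the object loaded at base.  */
static uint32_t
offset_in (uint32_t addr, uint32_t base)
{
  assert (addr >= base);
  return addr - base;
}

/* Map an address in one copy of a DSO to the matching address in another
   copy of the same DSO.  */
static uint32_t
translate (uint32_t addr, uint32_t from_base, uint32_t to_base)
{
  return to_base + offset_in (addr, from_base);
}
```

Both copies place __pthread_initialize_minimal at offset 0x4be0, and the scribbling PC 0xf7121c83 is 0xa3 bytes into the namespace copy (0xf7d77c83 in the main copy, where gdb can disassemble it). Assuming __stack_user lives in the main copy of libpthread.so.0, as its address suggests, the stray pointer 0xf71391a4 sits at the same offset (0x1c1a4) in the namespace copy as __stack_user (0xf7d8f1a4) does in the main copy, which is consistent with it being the namespace copy's own __stack_user.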

PS: for completeness, I went back and followed the __stack_user linked list
at the "GONE WRONG HERE" point, just to be sure:

+p __stack_user
$1 = {next = 0xf76eeb60, prev = 0xf76eeb60}

+p __stack_user.next
$2 = (struct list_head *) 0xf76eeb60

+p __stack_user.next->next
$3 = (struct list_head *) 0xf71391a4

+p __stack_user.next->next->next
$4 = (struct list_head *) 0xf71391a4

+p __stack_user.next->next->next->next
$5 = (struct list_head *) 0xf71391a4

So the linked list definitely doesn't contain &__stack_user any more.

// ============================================================================

Apologies for the lengthy exegesis. It seems to me that the copy of
libpthread in the private namespace has somehow managed to scribble on the
linked list pointed to by __stack_user, overwriting a key address.
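The pointer values above fit a simple scenario, offered here as an assumption rather than a confirmed diagnosis: both copies of libpthread initialise on the same initial thread, so both call list_add on the same list entry (0xf76eeb60), each with its own __stack_user head. The sketch reuses simplified models of INIT_LIST_HEAD and list_add (no barriers; not the real glibc code), and main_head, ns_head and pd are hypothetical names. After the second insertion, a walk from the main head reaches the namespace head and then bounces between it and the shared entry forever.

```c
#include <assert.h>
#include <stdbool.h>

typedef struct list_head
{
  struct list_head *next;
  struct list_head *prev;
} list_t;

/* Simplified models of INIT_LIST_HEAD and list_add.  */
static void
init_list_head (list_t *head)
{
  head->next = head->prev = head;
}

static void
list_add (list_t *newp, list_t *head)
{
  newp->next = head->next;
  newp->prev = head;
  head->next->prev = newp;
  head->next = newp;
}

/* Hypothetical scenario: two namespaces, two list heads, one shared entry
   (the initial thread's descriptor).  */
static void
simulate_two_namespaces (list_t *main_head, list_t *ns_head, list_t *pd)
{
  init_list_head (main_head);                  /* main libpthread */
  list_add (pd, main_head);
  assert (main_head->next->next == main_head); /* "STILL GOOD" */

  init_list_head (ns_head);                    /* dlmopen copy */
  list_add (pd, ns_head);                      /* relinks the shared entry */
}

/* True if a list_for_each walk from head gets back to head within
   max_steps steps.  */
static bool
walk_returns (const list_t *head, int max_steps)
{
  const list_t *pos = head->next;
  for (int i = 0; i < max_steps; i++)
    {
      if (pos == head)
        return true;
      pos = pos->next;
    }
  return false;
}
```

The resulting links (main head -> entry -> namespace head -> entry) match the $3/$4/$5 values printed earlier for the dlmopen case, while the namespace's own list stays well formed.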

Is my analysis correct? Is there something I could or should have done to
avoid this?

A while ago (https://sourceware.org/ml/libc-help/2018-01/msg00002.html)
I suggested a dlmopen flag RTLD_UNIQUE (or similar) that would cause the
existing mapping of the target library in the main namespace/link-map to be
reused instead of creating a new one. I believe this would prevent this
problem (and others detailed in that message) from occurring. Any thoughts?

