This is the mail archive of the glibc-bugs@sourceware.org mailing list for the glibc project.



[Bug nptl/17705] New: nptl_db: stale thread create/death events if debugger detaches


https://sourceware.org/bugzilla/show_bug.cgi?id=17705

            Bug ID: 17705
           Summary: nptl_db: stale thread create/death events if debugger
                    detaches
           Product: glibc
           Version: unspecified
            Status: NEW
          Severity: normal
          Priority: P2
         Component: nptl
          Assignee: unassigned at sourceware dot org
          Reporter: palves at redhat dot com
                CC: drepper.fsp at gmail dot com

I wrote a GDB test that attaches to a program that is constantly and quickly
spawning short-lived threads.  The test makes GDB attach, lets threads hit a
breakpoint, detaches, and then reattaches, over and over.

Sometimes, the test fails with a surprising libthread_db error:

 (gdb) continue
 Continuing.
 Cannot get thread event message: debugger service failed
 (gdb)

Investigation showed that the test exposes a libthread_db issue.

The problem arises if the debugger detaches just after a thread has decided
that it needs to report an event (thread creation or death), but before the
thread has actually queued the event (in __nptl_last_event) and called the
event function (__nptl_create_event or __nptl_death_event).  The debugger is
then no longer around to consume the event, yet the thread is still left
dangling in the __nptl_last_event event queue/list.

__pthread_create_2_1():
...
  /* Start the thread.  */
  if (__glibc_unlikely (report_thread_creation (pd)))
    {
...
      retval = create_thread (pd, iattr, true, STACK_VARIABLES_ARGS,
                              &thread_ran);
      if (retval == 0)
        {
...
          pd->eventbuf.eventnum = TD_CREATE;
          pd->eventbuf.eventdata = pd;

          /* Enqueue the descriptor.  */
          do
            pd->nextevent = __nptl_last_event;
          while (atomic_compare_and_exchange_bool_acq (&__nptl_last_event,
                                                       pd, pd->nextevent)
                                                     != 0);

          /* Now call the function which signals the event.  */
          __nptl_create_event ();
...


That is, the race is hit if the debugger detaches after the
report_thread_creation check and before the __nptl_create_event call.
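
The window can be modelled outside glibc with a small, single-threaded sketch.
Everything named here (fake_pd, last_event, debugger_attached, enqueue_event,
simulate_detach_race) is a hypothetical stand-in rather than glibc code, and
C11 atomics replace glibc's internal atomic macros.  It only replays the three
steps in the order that triggers the problem:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for glibc internals; not the real definitions.  */
enum { EV_NONE = 0, EV_CREATE = 1 };

struct fake_pd
{
  int eventnum;                 /* models pd->eventbuf.eventnum */
  struct fake_pd *nextevent;    /* models pd->nextevent */
};

static _Atomic (struct fake_pd *) last_event;  /* models __nptl_last_event */
static atomic_bool debugger_attached;          /* models "events enabled" */

/* Push PD on the list head with a compare-and-exchange retry loop.  */
static void
enqueue_event (struct fake_pd *pd, int eventnum)
{
  assert (eventnum != EV_NONE);
  pd->eventnum = eventnum;
  pd->nextevent = atomic_load (&last_event);
  while (!atomic_compare_exchange_weak (&last_event, &pd->nextevent, pd))
    ;  /* On failure, pd->nextevent was refreshed with the current head.  */
}

/* Replay the race in a fixed order.  Returns true if PD ends up queued
   while no debugger is attached to consume it.  */
bool
simulate_detach_race (struct fake_pd *pd)
{
  atomic_store (&last_event, NULL);
  atomic_store (&debugger_attached, true);

  /* Step 1: the thread decides it must report (the early check).  */
  bool must_report = atomic_load (&debugger_attached);

  /* Step 2: the debugger detaches inside the window.  */
  atomic_store (&debugger_attached, false);

  /* Step 3: the thread acts on its earlier, now outdated, decision.  */
  if (must_report)
    enqueue_event (pd, EV_CREATE);

  return atomic_load (&last_event) == pd
         && !atomic_load (&debugger_attached);
}
```

In this model, simulate_detach_race returns true: the descriptor sits at the
head of the list even though nothing will ever dequeue it.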

Later, when the thread dies, if it has a glibc-managed stack and that stack is
reused, its event buffer is cleared, but __nptl_last_event (or a thread further
along the chain that starts at __nptl_last_event) still holds a stale pointer
to it.

So if another GDB then attaches, as soon as any thread pushes another event,
the new GDB fetches the events out of libthread_db with td_ta_event_getmsg.
td_ta_event_getmsg then finds in the event list a stale pointer to the reused
thread stack, with no event recorded, and fails with TD_DBERR:

td_err_e
td_ta_event_getmsg (const td_thragent_t *ta_arg, td_event_msg_t *msg)
{
...
  /* If the structure is on the list there better be an event recorded.  */
  if ((int) (uintptr_t) eventnum == TD_EVENT_NONE)
    return TD_DBERR;
...

And thus GDB's "debugger service failed" error message.
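
The consumer side can be sketched the same way.  The names below (fake_pd,
fake_event_getmsg, fake_reuse_stack, the ERR_* and EV_* constants) are
hypothetical and only mimic the shape of the check quoted above.  Stack reuse
is modelled as clearing the event number without unlinking the descriptor,
which is an assumption based on the description in this report:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; the real types live in glibc and libthread_db.  */
enum { EV_NONE = 0, EV_CREATE = 1 };
enum { ERR_OK = 0, ERR_NOMSG = 1, ERR_DBERR = 2 };

struct fake_pd
{
  int eventnum;                 /* models pd->eventbuf.eventnum */
  struct fake_pd *nextevent;    /* models pd->nextevent */
};

/* Models a debugger dequeuing the event at the head of the list.  */
int
fake_event_getmsg (struct fake_pd **head, int *eventnum_out)
{
  assert (head != NULL && eventnum_out != NULL);

  struct fake_pd *pd = *head;
  if (pd == NULL)
    return ERR_NOMSG;           /* nothing queued */

  /* If the structure is on the list there had better be an event.  */
  if (pd->eventnum == EV_NONE)
    return ERR_DBERR;

  *eventnum_out = pd->eventnum;
  *head = pd->nextevent;        /* unlink the consumed descriptor */
  pd->eventnum = EV_NONE;
  pd->nextevent = NULL;
  return ERR_OK;
}

/* Models stack reuse: the event is cleared, but nobody unlinks PD.  */
void
fake_reuse_stack (struct fake_pd *pd)
{
  pd->eventnum = EV_NONE;
}
```

With a descriptor that was queued and then passed through fake_reuse_stack,
fake_event_getmsg returns ERR_DBERR and leaves the list head untouched, so
every later call in this model fails the same way.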

If the thread had been allocated on a user-provided stack, the failure modes
would be even more "interesting", possibly even corrupting the inferior, as
that TD_EVENT_NONE check (and a similar one in td_thr_event_getmsg) might well
be fooled, since it reads through a dangling pointer.
