This is the mail archive of the
glibc-bugs@sourceware.org
mailing list for the glibc project.
[Bug nptl/17705] New: nptl_db: stale thread create/death events if debugger detaches
- From: "palves at redhat dot com" <sourceware-bugzilla at sourceware dot org>
- To: glibc-bugs at sourceware dot org
- Date: Fri, 12 Dec 2014 17:28:20 +0000
- Subject: [Bug nptl/17705] New: nptl_db: stale thread create/death events if debugger detaches
- Auto-submitted: auto-generated
https://sourceware.org/bugzilla/show_bug.cgi?id=17705
Bug ID: 17705
Summary: nptl_db: stale thread create/death events if debugger
detaches
Product: glibc
Version: unspecified
Status: NEW
Severity: normal
Priority: P2
Component: nptl
Assignee: unassigned at sourceware dot org
Reporter: palves at redhat dot com
CC: drepper.fsp at gmail dot com
I wrote a GDB test that attaches to a program that is constantly and quickly
spawning short-lived threads. The test makes GDB attach, have threads hit a
breakpoint, detach, and then reattach, over and over.
Sometimes, the test fails with a surprising libthread_db error:
(gdb) continue
Continuing.
Cannot get thread event message: debugger service failed
(gdb)
Investigation showed that the test exposes a libthread_db issue.
If the debugger detaches just after a thread has decided that it needs to
report an event (thread creation or death), but before the event is actually
queued (in __nptl_last_event) and the event function (__nptl_create_event or
__nptl_death_event) is called, the debugger won't be around to consume the
event, and the thread will be left dangling in the __nptl_last_event
event queue/list.
__pthread_create_2_1():
...
  /* Start the thread. */
  if (__glibc_unlikely (report_thread_creation (pd)))
    {
      ...
      retval = create_thread (pd, iattr, true, STACK_VARIABLES_ARGS,
                              &thread_ran);
      if (retval == 0)
        {
          ...
          pd->eventbuf.eventnum = TD_CREATE;
          pd->eventbuf.eventdata = pd;
          /* Enqueue the descriptor. */
          do
            pd->nextevent = __nptl_last_event;
          while (atomic_compare_and_exchange_bool_acq (&__nptl_last_event,
                                                       pd, pd->nextevent)
                 != 0);
          /* Now call the function which signals the event. */
          __nptl_create_event ();
...
That is, the problem occurs if the debugger detaches after the
report_thread_creation check and before the __nptl_create_event call.
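The window described above can be modeled outside of glibc. The following is
only a sketch using C11 atomics, not glibc code: every model_* name is
hypothetical, model_thread stands in for the thread descriptor,
model_last_event for __nptl_last_event, and a plain flag stands in for
whatever report_thread_creation actually checks. The point it illustrates is
that the decision to report and the enqueue are separate steps, and the
enqueue never re-checks whether a debugger is still attached.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the thread descriptor; only the fields
   needed for this illustration. */
struct model_thread {
    int eventnum;                     /* MODEL_EVENT_NONE means "no event" */
    struct model_thread *nextevent;   /* link in the pending-event list */
};

enum { MODEL_EVENT_NONE = 0, MODEL_CREATE = 1 };

/* Models __nptl_last_event: head of the pending-event list. */
static _Atomic(struct model_thread *) model_last_event = NULL;

/* Assumption: a single flag models "a debugger wants creation events". */
static atomic_bool model_debugger_wants_events = false;

/* Step 1: models the report_thread_creation check. */
static bool model_should_report(void)
{
    return atomic_load(&model_debugger_wants_events);
}

/* Step 2: models the enqueue loop. Nothing here looks at the flag again,
   so a detach between step 1 and step 2 goes unnoticed. */
static void model_enqueue(struct model_thread *pd, int eventnum)
{
    pd->eventnum = eventnum;
    struct model_thread *head = atomic_load(&model_last_event);
    do {
        pd->nextevent = head;
        /* On failure, head is refreshed with the current list head. */
    } while (!atomic_compare_exchange_weak(&model_last_event, &head, pd));
}
```

If the flag is cleared between model_should_report returning true and the
call to model_enqueue, the descriptor still ends up at the head of the list
even though nobody is left to consume it.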
Later, when the thread dies, if it has a glibc-managed stack and that stack is
reused, its event buffer is cleared, but __nptl_last_event (or a thread in the
chain reachable from __nptl_last_event) still holds a stale pointer to it.
So if another GDB then attaches, as soon as any thread pushes another event,
the new GDB fetches the events out of libthread_db with td_ta_event_getmsg.
td_ta_event_getmsg then finds a stale pointer to the reused thread stack in
the event list, with no event recorded, and fails with TD_DBERR:
td_err_e
td_ta_event_getmsg (const td_thragent_t *ta_arg, td_event_msg_t *msg)
{
  ...
  /* If the structure is on the list there better be an event recorded. */
  if ((int) (uintptr_t) eventnum == TD_EVENT_NONE)
    return TD_DBERR;
  ...
And thus GDB's "debugger service failed" error message.
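The failure path can be reproduced in a small single-threaded model. Again
this is only a sketch, not libthread_db code: the sketch_* names and the
error enum are hypothetical, and sketch_getmsg mirrors only the one check
quoted above. It assumes that reusing the stack resets the event number of
the descriptor without unlinking it from the list.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical error codes; SKETCH_DBERR plays the role of TD_DBERR. */
enum sketch_err { SKETCH_OK, SKETCH_NOMSG, SKETCH_DBERR };
enum { SKETCH_EVENT_NONE = 0, SKETCH_CREATE = 1 };

/* Hypothetical stand-in for the thread descriptor. */
struct sketch_thread {
    int eventnum;
    struct sketch_thread *nextevent;
};

/* Models __nptl_last_event. */
static struct sketch_thread *sketch_last_event = NULL;

/* Models stack reuse: the event buffer is cleared, but the list head
   that still points at this descriptor is left untouched. */
static void sketch_reuse_stack(struct sketch_thread *pd)
{
    pd->eventnum = SKETCH_EVENT_NONE;
}

/* Models the consumer side: take the head of the list and insist that
   it has an event recorded. */
static enum sketch_err sketch_getmsg(int *eventnum_out)
{
    struct sketch_thread *pd = sketch_last_event;
    if (pd == NULL)
        return SKETCH_NOMSG;          /* nothing queued */
    if (pd->eventnum == SKETCH_EVENT_NONE)
        return SKETCH_DBERR;          /* on the list, but no event: stale */
    *eventnum_out = pd->eventnum;
    sketch_last_event = pd->nextevent;   /* unlink the consumed entry */
    pd->eventnum = SKETCH_EVENT_NONE;
    pd->nextevent = NULL;
    return SKETCH_OK;
}
```

In this model, queueing a descriptor, calling sketch_reuse_stack on it, and
then calling sketch_getmsg yields SKETCH_DBERR, and the stale entry stays at
the head of the list, so every later call fails the same way.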
If the thread had been allocated on a user-provided stack, then the failure
modes will be even more "interesting", possibly even corrupting the inferior,
as that TD_EVENT_NONE check (and a similar one in td_thr_event_getmsg) might
well be fooled by reading through a dangling pointer.