This is the mail archive of the gdb-patches@sourceware.org mailing list for the GDB project.

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]
Other format:	[Raw text]

Re: [patch] Fix for PR gdb/10757

From: Pedro Alves <pedro at codesourcery dot com>
To: Paul Pluzhnikov <ppluzhnikov at google dot com>
Cc: gdb-patches at sourceware dot org
Date: Sun, 25 Oct 2009 23:54:53 +0000
Subject: Re: [patch] Fix for PR gdb/10757
References: <20091015191037.4091D76761@ppluzhnikov.mtv.corp.google.com> <8ac60eac0910171625n489736e1v94326e4d95b40f98@mail.gmail.com>

On Sunday 18 October 2009 00:25:56, Paul Pluzhnikov wrote:
> 2009-10-17 ?Paul Pluzhnikov ?<ppluzhnikov@google.com>
> 
> ? ? ? ?PR gdb/10757
> ? ? ? ?* linux-thread-db.c (attach_thread): Return success/failure
> ? ? ? ?indicator.
> ? ? ? ?(thread_db_find_new_threads_silently): Retry until no new threads.
> ? ? ? ?(struct callback_data): New.
> ? ? ? ?(find_new_threads_callback): Count new threads, stop iteration
> ? ? ? ?on error.
> ? ? ? ?(find_new_threads_once): New function.
> ? ? ? ?(thread_db_find_new_threads_2): Rename from
> ? ? ? ?thread_db_find_new_threads_1 and adjust.
> ? ? ? ?(thread_db_find_new_threads_1): New function.

Thanks.  This is quite unfortunate, but we get to live with it...

FTR, I've took a peek at nptl/nptl_db, to try to understand
a bit better why this happens (please do correct me if I'm
wrong):

 - First, it is worth remind us while nptl takes locks and barriers
   to protect its internal structures, thread_db/ntpl_db doesn't,
   since it runs in gdb's context.

 - nptl adds new threads to one of the __stack_user or stack_used
   lists.  New nodes are added to the _head_ of the lists.

 - nptl_db/td_ta_thr_iter.c:td_ta_thr_iter iterates over
   these thread lists from head to tail.

From these last two points alone, one can infer that an iteration
over the thread list can miss new threads if they're added just while
the list is being iterated.

Another way to tackle this, would be iterate over 
/proc/pid/task/ and attach to all lwps that way, instead of
relying on thread_db to do the initial iteration over all
threads.  We'd still iterate using the thread_db interface
once afterwards, so to learn the matching pthread
ids of the lwps.

With /proc/pid/task, you'd still need to keep iterating until
no new thread comes out, but, I think we could make the
number of loops unbounded much safely, since we wouldn't be
at risk of reading stale inferior data.  Assuming no kernel
bugs, that is.  Note that this is something that would be
a useful addition to gdb anyway, so that we could be able to
attach to multi-threaded apps that use raw clone instead of
pthreads.

> + ? ? ?else
> +???????/* Problem attaching this thread; perhaps it exited before we
> +??????? ? could attach it?
> +??????? ? This could mean that the thread list inside glibc itself is in
> +??????? ? inconsistent state, and libthread_db could go on looping forever
> +??????? ? (observed with glibc-2.3.6). ?To prevent that, terminate
> +??????? ? iteration: thread_db_find_new_threads_2 will retry. ?*/
> +???????return 1;

Probably related to the fact that gdb can be iterating over
the threads lists, exactly while libc is concurrently removing
a thread from the same lists.  nptl_db could be iterating over
stale inferior pointers.

> ?/* Search for new threads, accessing memory through stopped thread
> - ? PTID. ?*/
> + ? PTID. ?If NUM_LOOPS is non-zero, repeat searching until NUM_LOOP
> + ? searches in a row do not discover any new threads. ?*/
> ?
> ?static void
> -thread_db_find_new_threads_1 (ptid_t ptid)
> +thread_db_find_new_threads_2 (ptid_t ptid, int num_loop)
> ?{
(...)
> +
> + ?if (num_loop != 0)
> + ? ?{
> + ? ? ?/* Require NUM_LOOP successive iterations which do not find any new threads. ?*/

This line too long.

> + ? ? ?for (i = 0, loop = 0; loop < 4; ++i, ++loop)

Err, s/4/num_loop/ ?  A comment explain _why_ we need this is missing
as well.  The patch looks OK otherwise.

Actually, I'd rename the `num_loop' parameter into
say, `until_no_new', and do this instead:

  if (until_no_new)
    {
      /* Require a few successive iterations which do not find any new threads.  */
      for (i = 0, loop = 0; loop < 4; ++i, ++loop)

That would be one way of localizing a bit the detail (and any comments)
of how many iterations you've happened to decide were sufficient in
practice, in one place, as opposed to spread around, like in:

> @@ -597,7 +598,7 @@ thread_db_find_new_threads_silently (pti
>  
>    TRY_CATCH (except, RETURN_MASK_ERROR)
>      {
> -      thread_db_find_new_threads_1 (ptid);
> +      thread_db_find_new_threads_2 (ptid, 4);
>      }

... or any other call sites we may add in the future.

-- 
Pedro Alves

Follow-Ups:
- Re: [patch] Fix for PR gdb/10757
  - From: Paul Pluzhnikov

References:
- [patch] Fix for PR gdb/10757
  - From: Paul Pluzhnikov
- Re: [patch] Fix for PR gdb/10757
  - From: Paul Pluzhnikov

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]