This is the mail archive of the systemtap@sourceware.org mailing list for the systemtap project.



[Bug testsuite/20600] parallel testsuite hang in [nd_]syscall.exp


https://sourceware.org/bugzilla/show_bug.cgi?id=20600

--- Comment #4 from David Smith <dsmith at redhat dot com> ---
Here's an update.

I believe I've tracked this down to _stp_init_time(). I've added lots of
'might_sleep()' calls to that function in my local copy of systemtap, and I got
the following:

====
Sep 21 22:32:40 ibm-p8-01-lp7.lab.eng.rdu.redhat.com kernel: BUG: sleeping function called from invalid context at /usr/local/share/systemtap/runtime/time.c:323
Sep 21 22:32:40 ibm-p8-01-lp7.lab.eng.rdu.redhat.com kernel: in_atomic(): 1, irqs_disabled(): 0, pid: 5960, name: stapio
Sep 21 22:32:40 ibm-p8-01-lp7.lab.eng.rdu.redhat.com kernel: INFO: lockdep is turned off.
====

Here's that section of runtime/time.c (with line numbers):

====
   310      might_sleep();
   311      stp_time = _stp_alloc_percpu(sizeof(stp_time_t));
   312      if (unlikely(stp_time == 0))
   313              return -1;
   314  
   315      might_sleep();
   316  #ifdef STAPCONF_ONEACHCPU_RETRY
   317      ret = on_each_cpu(__stp_init_time, NULL, 0, 1);
   318  #else
   319      ret = on_each_cpu(__stp_init_time, NULL, 1);
   320  #endif
   321  
   322  #ifdef STAPCONF_ADD_TIMER_ON
   323      might_sleep();
   324      for_each_online_cpu(cpu) {
   325          stp_time_t *time = per_cpu_ptr(stp_time, cpu);
   326          add_timer_on(&time->timer, cpu);
   327      }
   328  #endif
====

I believe this means that something in the following line is causing us to
become atomic:

    ret = on_each_cpu(__stp_init_time, NULL, 1);

The might_sleep() at line 315 didn't complain, but the one at line 323 did, so
we became atomic somewhere between those two checks.

On RHEL7, on_each_cpu() looks like the following:

====
static inline int on_each_cpu(smp_call_func_t func, void *info, int wait)
{
        unsigned long flags;

        local_irq_save(flags);
        func(info);
        local_irq_restore(flags);
        return 0;
}
====

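That is consistent with the kernel message above: on_each_cpu() restores the
irq state before returning, and the message reports irqs_disabled(): 0 along
with in_atomic(): 1. So, my guess is that something in __stp_init_time() is
causing us to become atomic.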
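
To make that guess concrete, here is a minimal userspace sketch of the
suspected failure mode. It is only a model under stated assumptions, not
kernel or systemtap code: every model_* name and both callbacks are
hypothetical, and the real cause inside __stp_init_time() has not been
identified. The sketch shows how a callback that disables preemption without
re-enabling it would leave a later might_sleep()-style check reporting
in_atomic(): 1 while irqs_disabled() stays 0.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Userspace stand-ins for kernel state (hypothetical, for illustration). */
static int  model_preempt_count; /* non-zero models in_atomic() == 1     */
static bool model_irqs_off;      /* true models irqs_disabled() == 1     */

static void model_reset(void)
{
    model_preempt_count = 0;
    model_irqs_off = false;
}

static void model_preempt_disable(void) { model_preempt_count++; }

static void model_preempt_enable(void)
{
    assert(model_preempt_count > 0);
    model_preempt_count--;
}

typedef void (*model_call_func_t)(void *info);

/* Same shape as the on_each_cpu() quoted above: irqs are off only for
 * the duration of the call and the previous state is restored after. */
static int model_on_each_cpu(model_call_func_t func, void *info)
{
    bool saved = model_irqs_off;

    model_irqs_off = true;
    func(info);
    model_irqs_off = saved;
    return 0;
}

/* Hypothetical buggy callback: enters a preemption-disabled section
 * and never leaves it. */
static void leaky_callback(void *info)
{
    (void)info;
    model_preempt_disable();
}

/* Hypothetical well-behaved callback: disable and enable are paired. */
static void balanced_callback(void *info)
{
    (void)info;
    model_preempt_disable();
    model_preempt_enable();
}

/* Model of a might_sleep() checkpoint. Returns true (and prints a line
 * loosely shaped like the kernel message) if sleeping would be invalid. */
static bool model_might_sleep(int line)
{
    bool atomic = model_preempt_count != 0;
    bool bad = atomic || model_irqs_off;

    if (bad)
        printf("line %d: in_atomic(): %d, irqs_disabled(): %d\n",
               line, (int)atomic, (int)model_irqs_off);
    return bad;
}
```

Calling model_might_sleep(315), then model_on_each_cpu(leaky_callback, NULL),
then model_might_sleep(323) reproduces the pattern seen above: the first check
is quiet and the second one complains with irqs enabled. Swapping in
balanced_callback keeps both checks quiet.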

