This is the mail archive of the
systemtap@sourceware.org
mailing list for the systemtap project.
Re: context[2] stuck: (null)
- From: Arkady <arkady dot miasnikov at gmail dot com>
- To: systemtap at sourceware dot org
- Date: Tue, 11 Jul 2017 13:30:33 +0300
- Subject: Re: context[2] stuck: (null)
- References: <CANA-60q25-tnw72LjrtgMaavsE=VCee4-66fkOHbKBAHWSNqDA@mail.gmail.com> <CANA-60pAhrCXkzkNS5sY_Cypd-_ky=zfHmxHi-dj06QFkrOQAg@mail.gmail.com> <CANA-60rKy7uxpeF_zSV=nMwGO_oRp_rNHnt-PrWRzG4QivKsrQ@mail.gmail.com>
Is there a chance to get a user-defined hook in the context of
systemtap_module_init()?
Something like this, for example:
diff --git a/translate.cxx b/translate.cxx
index 0b9fc45..ba7088b 100644
--- a/translate.cxx
+++ b/translate.cxx
@@ -1774,6 +1774,10 @@ c_unparser::emit_module_init ()
o->newline() << "#include \"linux/stp_tracepoint.c\"";
o->newline() << "#endif";
+ o->newline() << "#ifdef STAP_NEED_USER_INIT";
+ o->newline() << "static int stp_user_init(void);";
+ o->newline() << "#endif";
+
o->newline();
o->newline() << "static int systemtap_module_init (void) {";
o->newline(1) << "int rc = 0;";
@@ -1922,6 +1926,15 @@ c_unparser::emit_module_init ()
o->newline(1) << "goto out;";
o->indent(-1);
+ // user init hook
+ o->newline() << "#ifdef STAP_NEED_USER_INIT";
+ o->newline() << "rc = stp_user_init();";
+ o->newline() << "if (rc) {";
+ o->newline(1) << "_stp_error (\"couldn't initialize user init\");";
+ o->newline() << "goto out;";
+ o->newline(-1) << "}";
+ o->newline() << "#endif";
+
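The control flow that the patch would generate can be sketched in plain user-space C. This is only an illustration under stated assumptions, not the actual generated module code: module_init_sketch(), user_init_result and probes_registered are hypothetical names, and the "register probes" step is a stand-in for whatever follows the hook in the real systemtap_module_init().

```c
#include <stdio.h>

/* Assumption: the build defines this switch, e.g. with -DSTAP_NEED_USER_INIT. */
#define STAP_NEED_USER_INIT 1

#ifdef STAP_NEED_USER_INIT
static int stp_user_init(void);   /* forward declaration, as in the first hunk */
#endif

/* Hypothetical knobs so the sketch can be exercised from a test. */
static int user_init_result = 0;  /* value stp_user_init() will return */
static int probes_registered = 0; /* set once the later init steps ran */

/* Stand-in for systemtap_module_init(): run the hook, bail out on error. */
static int module_init_sketch(void)
{
    int rc = 0;

    /* ... earlier initialization steps would go here ... */

#ifdef STAP_NEED_USER_INIT
    rc = stp_user_init();
    if (rc) {
        fprintf(stderr, "couldn't initialize user init\n");
        goto out;                 /* skip everything that follows the hook */
    }
#endif

    probes_registered = 1;        /* stand-in for the remaining init work */

out:
    return rc;
}

#ifdef STAP_NEED_USER_INIT
/* User-supplied hook: may block here, since no probe context is held. */
static int stp_user_init(void)
{
    return user_init_result;
}
#endif
```

A non-zero return from stp_user_init() aborts initialization with that error code, the same way the other "goto out" paths in the patch do.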
On Tue, Jul 11, 2017 at 9:46 AM, Arkady <arkady.miasnikov@gmail.com> wrote:
> Update. Some of the system calls I am doing in the begin probe are
> blocking. I understand that it will break things on multicore systems.
> Am I right?
>
> On Tue, Jul 11, 2017 at 9:24 AM, Arkady <arkady.miasnikov@gmail.com> wrote:
>> Update. The failure happens consistently in the same context:
>>
>> "context[1] stuck: (null), line_get=36848, line_put=36914
>> last_err=(null) last_stmt=identifier 'probe_begin'"
>> where line_get and line_put are line numbers in enter_be_probe():
>>
>> 36843 #endif
>> 36844 goto probe_epilogue;
>> 36845 }
>> 36846 if (atomic_read (session_state()) != stp->state)
>> 36847 goto probe_epilogue;
>> 36848 c = _stp_runtime_entryfn_get_context(__LINE__);
>> 36849 if (!c) {
>> 36850 #if !INTERRUPTIBLE
>> 36851 atomic_inc (skipped_count());
>> 36852 #endif
>> 36853 #ifdef STP_TIMING
>> 36854 atomic_inc (skipped_count_reentrant());
>> 36855 #endif
>> 36856 goto probe_epilogue;
>> 36857 }
>> ..................
>> 36907 }
>> 36908 }
>> 36909 probe_epilogue:
>> 36910 if (unlikely (atomic_read (skipped_count()) > MAXSKIPPED)) {
>> 36911 if (unlikely (pseudo_atomic_cmpxchg(session_state(), STAP_SESSION_RUNNING, STAP_SESSION_ERROR) == STAP_SESSION_RUNNING))
>> 36912 _stp_error ("Skipped too many probes, check MAXSKIPPED or try again with stap -t for more details.");
>> 36913 }
>> 36914 _stp_runtime_entryfn_put_context(c, __LINE__);
>> 36915 #if !INTERRUPTIBLE
>> 36916 local_irq_restore (flags);
>> 36917 #endif
>> 36918 #endif // STP_ALIBI
>>
>> On Mon, Jul 10, 2017 at 6:16 PM, Arkady <arkady.miasnikov@gmail.com> wrote:
>>> Hi,
>>>
>>> I am getting a "context[2] stuck: (null)" error. The cause of the error
>>> is likely the "unmanaged" code I have added to the driver. Specifically,
>>> I have a shared memory region (mmap) in the driver. The failure happens
>>> randomly, once every 50-200 module restarts. It happens only on
>>> multicore CPUs, or at least only there often enough to be caught.
>>>
>>> I tried to force the wait function with
>>> STAP_OVERRIDE_STUCK_CONTEXT, but then the kernel panics in one of the
>>> (probably random) probes.
>>>
>>> While debugging the issue I patched the SystemTap source code: I added
>>> an int argument to _stp_runtime_entryfn_get_context(), as in this
>>> commit: https://github.com/larytet/SystemTap/commit/61a284732893fa6f201e07f9f12f5e1820e7c26f
>>> In _stp_runtime_context_wait() I print the source line that called
>>> _stp_runtime_entryfn_get_context().
>>>
>>> The "bad" context is enter_be_probe(). I checked the source code of
>>> enter_be_probe() and there is not much there.
>>>
>>> I have been struggling with this problem for some time and would
>>> greatly appreciate any tips.
>>>
>>> Thank you, Arkady.
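
The line-tracking instrumentation described in the original message (passing __LINE__ into the get/put calls and reporting it for a context that is still busy) can be sketched in user space as follows. This is a simplified model under assumptions, not the SystemTap runtime: struct ctx_sketch, NR_CTX_SKETCH and the *_sketch functions are hypothetical names, and a fixed array stands in for the per-CPU contexts.

```c
#include <stdatomic.h>
#include <stddef.h>

#define NR_CTX_SKETCH 4           /* assumption: one slot per CPU */

struct ctx_sketch {
    atomic_int busy;              /* 1 while a probe handler owns the slot */
    int line_get;                 /* __LINE__ of the last successful get */
    int line_put;                 /* __LINE__ of the last put */
};

/* Static storage, so every field starts out as zero. */
static struct ctx_sketch contexts_sketch[NR_CTX_SKETCH];

/* Claim a slot and remember the caller's line; NULL if it is already busy. */
static struct ctx_sketch *get_context_sketch(int slot, int line)
{
    struct ctx_sketch *c = &contexts_sketch[slot];
    int expected = 0;

    if (!atomic_compare_exchange_strong(&c->busy, &expected, 1))
        return NULL;              /* reentrancy: the caller skips the probe */
    c->line_get = line;
    return c;
}

/* Release a slot and remember where it was released. NULL is a no-op. */
static void put_context_sketch(struct ctx_sketch *c, int line)
{
    if (c) {
        c->line_put = line;
        atomic_store(&c->busy, 0);
    }
}

/* What a shutdown-time wait could check: index of a busy slot, or -1. */
static int find_stuck_context_sketch(void)
{
    for (int i = 0; i < NR_CTX_SKETCH; i++)
        if (atomic_load(&contexts_sketch[i].busy))
            return i;
    return -1;
}
```

In this model a slot reported as stuck carries the line_get of the call that still owns it, while line_put is left over from the previous release. That matches the shape of the "line_get=36848, line_put=36914" message quoted above.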