This is the mail archive of the systemtap@sourceware.org mailing list for the systemtap project.



[Bug runtime/15982] process.end probes broken on RHEL6


https://sourceware.org/bugzilla/show_bug.cgi?id=15982

Josh Stone <jistone at redhat dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jistone at redhat dot com

--- Comment #2 from Josh Stone <jistone at redhat dot com> ---
(In reply to David Smith from comment #1)
> 1) Let the process.end probe fire even when the module's session state
> isn't STAP_SESSION_RUNNING. This works a good bit of the time, but not 100%
> consistently.

I'm guessing this still has a race: process.end has to fire before the module
has executed all the end probes and reached cleanup / unload.

Even if that doesn't race, it breaks the general idea that end probes run in
exclusion, after everything else has finished.  For example, a final report of
probe activity is not so final anymore if a process.end might change data.
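
To make that concern concrete, here is a toy model (not SystemTap runtime
code; the Session class and its fields are hypothetical names made up for
illustration) of an end probe taking a "final" snapshot while a late
process.end handler is still allowed to change the data:

```python
# Toy model only: these names are illustrative, not SystemTap internals.
RUNNING, STOPPING = "running", "stopping"


class Session:
    def __init__(self, allow_late_process_end):
        self.state = RUNNING
        # True models option (1): process.end may fire outside RUNNING.
        self.allow_late_process_end = allow_late_process_end
        self.exits_seen = 0       # data updated by process.end handlers
        self.final_report = None  # snapshot taken by the end probe

    def process_end(self):
        # Normally a handler is skipped once the session stops running.
        if self.state == RUNNING or self.allow_late_process_end:
            self.exits_seen += 1

    def run_end_probe(self):
        self.state = STOPPING
        self.final_report = self.exits_seen


def late_exit_scenario(allow_late_process_end):
    s = Session(allow_late_process_end)
    s.process_end()    # a process exits while the session is running
    s.run_end_probe()  # the end probe takes its "final" report
    s.process_end()    # another process exits during shutdown
    return s.final_report, s.exits_seen
```

With late handlers disallowed, the report and the data agree; with them
allowed, the data moves on after the report was taken.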


> 2) Switch the task_finder from using UTRACE_DEATH (Thread has died) to
> UTRACE_EXIT (Thread exit in progress). The UTRACE_EXIT event happens before
> the signal is sent to the dying thread's parent, so we won't miss the event.

This sounds fine as long as these really are paired, meaning that every
UTRACE_DEATH is preceded by a UTRACE_EXIT and every UTRACE_EXIT is followed by
a UTRACE_DEATH.  That does appear to be true, as far as I can follow the
tracehook_reports in kernel/exit.c.
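
The pairing property can be stated as a small check over a recorded event
stream.  This is only a sketch: the (tid, name) tuples and the "EXIT" /
"DEATH" strings are assumed stand-ins for the two utrace events, not anything
utrace itself provides.

```python
def exit_death_paired(events):
    """Return True if every thread id sees exactly one EXIT, then one DEATH.

    `events` is a list of (tid, name) tuples in the order they were observed,
    where name is "EXIT" or "DEATH" (hypothetical labels for illustration).
    """
    last_seen = {}
    for tid, name in events:
        prev = last_seen.get(tid)
        if name == "EXIT" and prev is None:
            last_seen[tid] = "EXIT"
        elif name == "DEATH" and prev == "EXIT":
            last_seen[tid] = "DEATH"
        else:
            # DEATH without EXIT, a repeated event, or an unknown name.
            return False
    # Any thread still at EXIT never got its matching DEATH.
    return all(state == "DEATH" for state in last_seen.values())
```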

> In the tracepoint-based utrace replacement (for kernels without built-in
> utrace), the 'sched_process_exit' tracepoint (which we use for process.end
> probes) happens in a similar place as the UTRACE_EXIT hook, so this should
> work reasonably well.

The position of the tracepoint is one thing, but it also matters when our
"quiesce" task work actually gets run, right?  That appears to happen shortly
after, via exit_task_work(), still before exit_notify() is called.  OK. :)
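
The ordering I have in mind, as a toy simulation.  The function names mirror
the kernel ones mentioned above, but the bodies are made up for illustration,
and it assumes the handler is deferred as task work queued from the
tracepoint:

```python
def simulate_exit_path():
    """Model the relative order of three steps on the exit path."""
    log = []
    pending_work = []  # stands in for the task's queued task work

    def sched_process_exit_tracepoint():
        log.append("tracepoint")
        # Assumption: the probe handler is deferred as task work.
        pending_work.append(lambda: log.append("process.end handler"))

    def exit_task_work():
        # Run everything queued so far, oldest first.
        while pending_work:
            pending_work.pop(0)()

    def exit_notify():
        log.append("parent notified")

    sched_process_exit_tracepoint()
    exit_task_work()
    exit_notify()
    return log
```

As long as the deferred work is drained before exit_notify(), the handler
finishes before the parent can learn of the exit.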


So (2) seems clearly preferable to me.


