This is the mail archive of the
systemtap@sourceware.org
mailing list for the systemtap project.
RE: Probing for Zombie Processes?
- From: "Stone, Joshua I" <joshua dot i dot stone at intel dot com>
- To: "Nathan DeBardeleben" <ndebard at lanl dot gov>, <systemtap at sources dot redhat dot com>
- Date: Mon, 5 Mar 2007 14:38:23 -0800
- Subject: RE: Probing for Zombie Processes?
Nathan DeBardeleben wrote:
> I love the SystemTap wiki, by the way. The war stories are great - I
> encourage more of them! :)
> On that note, I am having a problem with processes going zombie on me
> - more often than I normally am used to and thought SystemTap might be
> useful in this regard. But, before I dive into it:
>
> 1: has anyone dug into looking at zombie processes?
> 2: anyone have any insight into what might be good areas to probe with
> respect to this
> 3: can anyone think of a better / easier tool that I should look into
> instead?
>
> I'm hoping I might be able to record something more useful than just
> "PID xyz went to state=zombie".
My understanding is that most processes will become zombies for a short
period, until their parent collects their exit status through a wait()
call. So I think when a process gets stuck as a zombie, it's because
the parent did something wrong.
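To see that short-lived zombie state directly, here is a minimal Python sketch. It is not SystemTap and is Linux-only, since it reads /proc. The helper names proc_state and zombie_demo are made up for illustration. The parent forks a child that exits at once, deliberately holds off on waiting, and reads the child's state before and after the wait.

```python
import os
import time


def proc_state(pid):
    """Return the one-letter state from /proc/<pid>/stat, or None if the pid is gone."""
    try:
        with open(f"/proc/{pid}/stat") as f:
            data = f.read()
    except (FileNotFoundError, ProcessLookupError):
        return None
    # The state field is the first token after the parenthesised command name.
    return data.rsplit(")", 1)[1].split()[0]


def zombie_demo(timeout=5.0):
    """Fork a child, observe it before and after the parent waits on it."""
    pid = os.fork()
    if pid == 0:
        os._exit(0)  # child: exit immediately

    # Parent: do NOT wait yet. Poll until the exited child shows up as a zombie.
    deadline = time.monotonic() + timeout
    state = proc_state(pid)
    while state != "Z" and time.monotonic() < deadline:
        time.sleep(0.01)
        state = proc_state(pid)
    before = state

    os.waitpid(pid, 0)  # the parent's wait: this is what reaps the child
    after = proc_state(pid)
    return before, after


if __name__ == "__main__":
    before, after = zombie_demo()
    print("state before wait:", before)
    print("state after wait:", after)
```

On a typical Linux box the child should sit in state Z until the waitpid() call, after which its /proc entry disappears. A parent that never makes that call leaves the child stuck in the first state.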
"process.exit" will tell you when a process has completed execution, and
"process.release" tells you when the process is actually marked for
deletion. If you're stuck as a zombie, you'll see an exit but no
release. You could also probe the function exit_notify, which sets up
the exit state as either EXIT_DEAD or EXIT_ZOMBIE.
The hard thing is that you're really trying to discover something that's
*not* happening -- namely the parent's wait. Perhaps probing the wait
syscalls (wait4, waitid) would be enlightening. You could also use the
signals tapset to look out for the SIGCHLD to the parent.
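For comparison, this is roughly what a well-behaved parent does with that SIGCHLD. It is only a Python sketch of the pattern, not anything from the tapsets. The names reaped, reap_children and fork_and_reap are hypothetical. The handler calls waitpid() in a non-blocking loop so that every exited child is collected.

```python
import os
import signal
import time

reaped = {}  # pid -> exit code (None if the child was killed by a signal)


def reap_children(signum, frame):
    """SIGCHLD handler: collect every child that has already exited."""
    while True:
        try:
            pid, status = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:
            return  # no children left at all
        if pid == 0:
            return  # children exist, but none have exited yet
        reaped[pid] = os.WEXITSTATUS(status) if os.WIFEXITED(status) else None


def fork_and_reap(exit_code=0, timeout=5.0):
    """Fork a child and let the SIGCHLD handler reap it."""
    previous = signal.signal(signal.SIGCHLD, reap_children)
    try:
        pid = os.fork()
        if pid == 0:
            os._exit(exit_code)  # child: exit immediately
        # Parent: the handler runs when SIGCHLD arrives; just wait for its result.
        deadline = time.monotonic() + timeout
        while pid not in reaped and time.monotonic() < deadline:
            time.sleep(0.01)
        return pid, reaped.get(pid)
    finally:
        # Restore whatever handler was in place before (default if unknown).
        signal.signal(signal.SIGCHLD,
                      previous if previous is not None else signal.SIG_DFL)


if __name__ == "__main__":
    pid, code = fork_and_reap(3)
    print("child", pid, "reaped with exit code", code)
```

Note that waitpid(-1, ...) collects any child of the calling process, so this handler is only suitable where that is acceptable. If the parent in question never reaches a call like this after the SIGCHLD, that would be the missing wait.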
If you find a way to make sense of this, it's definitely a good one for
the war stories...
Josh