This is the mail archive of the systemtap@sourceware.org mailing list for the systemtap project.



Re: stap vs pgrps


> The stapgui wanted to SIGTERM stap to finish the run, but this wasn't
> passing the SIGTERM to stapio/staprun.  Thus, while stap would quit, the
> module would still be active.

Normal ways of killing the job would send the signal to the whole pgrp.
Before all these machinations, all of stap's brood would be in the pgrp
that stap started in, so all would get the same signal.  If a user does
"kill %1" in a shell, that's the pgrp (SIGTERM).  If a user hits ^C, that's
the pgrp (SIGINT).  If the tty hangs up, that's the pgrp (SIGHUP).

Say you run a shell script:

     #!/bin/sh

     echo hi
     (sleep 10; echo still here!)
     echo bye

(This is just like stap with its child.)  Run it like this:

     $ sh script & pid=$! ; sleep 2; kill $pid; wait; echo killed $pid; sleep 9
     [1] 3129
     hi
     [1]+  Terminated              sh script
     killed 3129
     still here!
     $

Here your interactive shell ($ prompt) is acting like stapgui.
The shell running "sh script" is acting like stap.

What you see is that "kill $pid" did the system call kill(3129, SIGTERM),
i.e. just the "sh script" (stap) process and not its whole pgrp.  (If you'd
typed "kill %1" the shell would have instead done kill(-3129, SIGTERM),
i.e. the pgrp of the "sh script" job.)  The shell process died, so nothing
was left to go on to "echo bye".  It was completely dead (the parent shell
waited for it before the "echo killed 3129" we see), yet its child was still
running.  To wit, the "echo still here!" comes about 8 seconds later.  (If
not for the trailing "sleep 9", your interactive shell's prompt would have
come back first and "still here!" would arrive in the middle of your later
work.)

So, what would foobargui be complaining about if foobar were implemented as
a shell script that used some background processes and "wait"?  The same
thing, but perhaps it would be immediately obvious that foobargui was the
one in the wrong.  

I have not looked at stapgui's context in detail to recommend what in
particular it ought to be doing instead.  Since I see it's entirely written
in Eclipsese, we may need to consult with our Eclipse experts to work it out.

> I think the main source of this grief is that we use system() to spawn
> processes, which doesn't give us a way to propagate our signals.  

system() does have that property.  I'm just not convinced that stap actually
needs to be in the business of caring about this.

> I'm thinking of something like this instead:
> 
>   int stap_spawn(...) {
>     posix_spawn(&pid, ...);
>     while (1) {
>       pid_ret = waitpid(&pid, &status, 0);
			  ^
>       if (pid_ret == pid)
>         return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
>       if (errno == EINTR)
>         kill(pid, SIGTERM);
>       else
>         return -1;
>     }
>   }
> 
> 
> Does that look sane?

Except for the typo, more or less.

What shells use for exit status is:
	WIFEXITED(status) ? WEXITSTATUS(status) : 128 + WTERMSIG(status)
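
As a small sketch of that convention, assuming a wait status obtained from
system() (the helper names shell_exit_status and shell_style_status_of are
hypothetical, and returning -1 for anything that is neither a normal exit
nor a fatal signal is a choice made for this example):

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <sys/wait.h>

/* Map a wait status to the exit code a shell would report. */
int shell_exit_status(int status)
{
    if (WIFEXITED(status))
        return WEXITSTATUS(status);     /* normal exit: the exit code */
    if (WIFSIGNALED(status))
        return 128 + WTERMSIG(status);  /* killed: 128 + signal number */
    return -1;                          /* stopped/continued: not an exit */
}

/* Run a command with system() and report its status shell-style. */
int shell_style_status_of(const char *command)
{
    int status = system(command);
    if (status == -1)                   /* system() itself failed */
        return -1;
    return shell_exit_status(status);
}
```

With this mapping a caller can tell a child that exited with code 1 apart
from one that was killed by a signal, which a flat -1 does not allow.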

The EINTR check only makes sense if you have used sigaction to install
handlers for the signals of concern without SA_RESTART, so that waitpid
actually fails with EINTR instead of being restarted.

When that's so, you'll also hit this case for signals that aren't fatal,
e.g. for SIGTSTP/SIGCONT after you resume (even if those weren't handled).
So the normal method is to loop, and only do the kill based on testing a
flag set by a signal handler.
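
A minimal sketch of that loop, under stated assumptions: the names
install_signal_flags and spawn_and_forward are hypothetical, the choice of
SIGINT/SIGTERM/SIGHUP as the "signals of concern" is a guess, and the child
is always sent SIGTERM as in the quoted proposal.  This is not stap's actual
code.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <spawn.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>

extern char **environ;

/* Set by the handler; tested by the wait loop. */
static volatile sig_atomic_t pending_signal = 0;

static void note_signal(int sig)
{
    pending_signal = sig;
}

/* Install the flag-setting handler WITHOUT SA_RESTART, so that a blocked
 * waitpid() fails with EINTR when one of these signals arrives. */
int install_signal_flags(void)
{
    static const int sigs[] = { SIGINT, SIGTERM, SIGHUP };
    struct sigaction sa;
    size_t i;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = note_signal;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;                    /* deliberately no SA_RESTART */
    for (i = 0; i < sizeof sigs / sizeof sigs[0]; i++)
        if (sigaction(sigs[i], &sa, NULL) != 0)
            return -1;
    return 0;
}

/* Spawn argv[0] and wait for it, passing SIGTERM on to the child only when
 * the handler has recorded a signal.  A bare EINTR (e.g. after a
 * stop/continue) just retries.  Known limitation of this sketch: a signal
 * landing between the flag test and waitpid() is not noticed until the
 * next interruption. */
int spawn_and_forward(char *const argv[])
{
    pid_t pid;
    int status;

    if (posix_spawnp(&pid, argv[0], NULL, NULL, argv, environ) != 0)
        return -1;

    for (;;) {
        if (pending_signal) {
            pending_signal = 0;
            kill(pid, SIGTERM);
        }
        pid_t ret = waitpid(pid, &status, 0);
        if (ret == pid)
            break;
        if (ret == -1 && errno != EINTR)
            return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status)
                             : 128 + WTERMSIG(status);
}
```

The handler does nothing but record the signal, which keeps it
async-signal-safe; all the real work stays in the loop.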


Thanks,
Roland

