Bug 3591 - need better array/misc. allocation + tests
Summary: need better array/misc. allocation + tests
Status: RESOLVED FIXED
Alias: None
Product: systemtap
Classification: Unclassified
Component: runtime
Version: unspecified
Importance: P1 critical
Target Milestone: ---
Assignee: Martin Hunt
URL:
Keywords:
Duplicates: 3592 3593
Depends on:
Blocks:
 
Reported: 2006-11-25 23:49 UTC by Frank Ch. Eigler
Modified: 2007-07-19 16:24 UTC

See Also:
Host:
Target:
Build:
Last reconfirmed:


Description Frank Ch. Eigler 2006-11-25 23:49:57 UTC
As work already in progress, this bug is simply to track the completion of a
crash-avoiding alternative to array/buffer allocations.
Comment 1 Frank Ch. Eigler 2006-11-26 22:48:23 UTC
*** Bug 3592 has been marked as a duplicate of this bug. ***
Comment 2 Frank Ch. Eigler 2006-11-26 22:48:40 UTC
*** Bug 3593 has been marked as a duplicate of this bug. ***
Comment 3 Frank Ch. Eigler 2006-11-27 04:43:47 UTC
If the intent is that the 2006-11-15 STP_ALLOC_FLAGS-related changes are
sufficient, it would be good to prove this with some tests in the suite.
Comment 4 Martin Hunt 2006-11-27 17:03:12 UTC
Changes to the translator and runtime over the past two weeks have, AFAICT,
fixed the crashing issue.

There is no test case because we have not yet addressed the more general problem
of poor interaction with Linux's overcommitting allocator and oom-killer.  On
one of my test systems I sometimes see oom-killer being invoked and killing
staprun. This leaves an orphaned systemtap module still in memory. This is
unacceptable.

Do you want to leave this BZ open for the test case, or just change the summary?

Comment 5 Frank Ch. Eigler 2006-11-28 02:11:56 UTC
(In reply to comment #4)
> Changes to the translator and runtime over the past two weeks have, AFAICT,
> fixed the crashing issue.

OK, a test case for even mild scenarios would be good.

> There is no test case because we have not yet addressed the more general problem
> of poor interaction with Linux's overcommitting allocator and oom-killer.  On
> one of my test systems I sometimes see oom-killer being invoked and killing
> staprun.

When?  During probe startup?  After?  Well after?

> This leaves an orphaned systemtap module still in memory. This is
> unacceptable.

Really?  Nothing much must break if staprun happens to be killed by an
erroneous kill -9.  The module should be removable cleanly with rmmod
at any time.  The module could self-terminate if it detects staprun
going away suddenly (though I thought it already did that at one point).

> Do you want to leave this BZ open for the test case, or just change the summary?

Both. :-)
Comment 6 Martin Hunt 2006-11-28 16:37:11 UTC
(In reply to comment #5)
> > There is no test case because we have not yet addressed the more general problem
> > of poor interaction with Linux's overcommitting allocator and oom-killer.  On
> > one of my test systems I sometimes see oom-killer being invoked and killing
> > staprun.
> 
> When?  During probe startup?  After?  Well after?

If you set MAXMAPENTRIES too large, it will happen before probe startup, but
only rarely and only on VMware. However, you can imagine that if MAXMAPENTRIES is
set just right, all of systemtap's memory could be allocated successfully. Then
some other app touches memory it thinks it allocated, that memory isn't really
available, and oom-killer gets invoked.
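
For reference, whether allocations can succeed without backing memory depends on
the kernel's vm.overcommit_memory setting. The following is a minimal sketch in C
of how one might inspect that setting; the helper names read_int_file and
overcommit_mode_name are hypothetical and are not part of systemtap.

```c
#include <stdio.h>

/* Hypothetical helper: read one integer from a procfs-style file.
 * Returns -1 if the file cannot be opened or does not start with an integer. */
static int read_int_file(const char *path)
{
    FILE *f = fopen(path, "r");
    int value = -1;

    if (f == NULL)
        return -1;
    if (fscanf(f, "%d", &value) != 1)
        value = -1;
    fclose(f);
    return value;
}

/* Hypothetical helper: map a vm.overcommit_memory value to a short label.
 * 0 = heuristic overcommit, 1 = always overcommit, 2 = strict accounting. */
static const char *overcommit_mode_name(int mode)
{
    switch (mode) {
    case 0:  return "heuristic";
    case 1:  return "always";
    case 2:  return "strict";
    default: return "unknown";
    }
}
```

On a Linux system, overcommit_mode_name(read_int_file("/proc/sys/vm/overcommit_memory"))
reports the current policy. Under the "heuristic" and "always" modes an allocation
can succeed and still lead to the oom-killer being invoked later, when the pages
are actually touched.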

> > This leaves an orphaned systemtap module still in memory. This is
> > unacceptable.
> 
> Really?  Nothing much must break if staprun happens to be killed by an
> erroneous kill -9.

Nothing breaks, except with the caching, we cannot rerun the script.

> The module should be removable cleanly with rmmod
> at any time.  

It can be.

> The module could self-terminate if it detects staprun
> going away suddenly (though I thought it already did that at one point).

Yeah, that's the problem.  If there is a way for a module to unload itself, I
don't know about it.  That's why my preferred approach is to force oom-killer to
kill stap and not staprun. Killing stap would be detected by staprun which would
unload the module and then itself. 
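
A minimal sketch in C of the detection half of this approach: a watcher waits on
another process and reports which signal, if any, killed it. This assumes the
watched process is a child of the watcher, which may not match how stap and
staprun are actually related. The names wait_for_fatal_signal and
spawn_doomed_child are hypothetical. The child kills itself with SIGKILL to stand
in for an oom-killer victim, and the module unload is left as a comment.

```c
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: block until `pid` terminates.
 * Returns the signal number that killed it, 0 if it exited on its own,
 * or -1 if waitpid() failed. */
static int wait_for_fatal_signal(pid_t pid)
{
    int status;
    pid_t r;

    do {
        r = waitpid(pid, &status, 0);
    } while (r == -1 && errno == EINTR);   /* retry if interrupted */

    if (r == -1)
        return -1;
    if (WIFSIGNALED(status))
        return WTERMSIG(status);   /* e.g. SIGKILL from the oom-killer */
    return 0;
}

/* Hypothetical stand-in for a process that gets killed: the child sends
 * itself SIGKILL immediately. Returns the child's pid, or -1 on error. */
static pid_t spawn_doomed_child(void)
{
    pid_t pid = fork();

    if (pid == 0) {
        raise(SIGKILL);
        _exit(1);   /* not reached */
    }
    return pid;
}
```

A watcher would call wait_for_fatal_signal() on the process it supervises and, on
a nonzero result, run its cleanup (in this discussion, unloading the module)
before exiting itself.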


Comment 7 Frank Ch. Eigler 2006-11-28 18:31:48 UTC
> > > This leaves an orphaned systemtap module still in memory. This is
> > > unacceptable.
> > 
> > Really?  Nothing much must break if staprun happens to be killed by an
> > erroneous kill -9.
> 
> Nothing breaks, except with the caching, we cannot rerun the script.

OK, that's not that bad.  staprun could print a better error message,
and suggest rmmod'ing the duplicate.
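
A minimal sketch in C of the check such a message would need: scan a file in
/proc/modules format for a module name. The function module_listed is
hypothetical and not part of staprun, and the module name used with it would be
whatever name the cached module was built with.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: return 1 if `name` is the first field of some line
 * in a /proc/modules-style file, 0 if it is not, -1 if the file is unreadable. */
static int module_listed(const char *path, const char *name)
{
    char buf[512];
    size_t len = strlen(name);
    int at_line_start = 1;   /* false while reading the tail of a long line */
    int found = 0;
    FILE *f = fopen(path, "r");

    if (f == NULL)
        return -1;

    while (fgets(buf, sizeof buf, f) != NULL) {
        /* The module name is followed by a space in /proc/modules. */
        if (at_line_start && strncmp(buf, name, len) == 0 && buf[len] == ' ') {
            found = 1;
            break;
        }
        at_line_start = (strchr(buf, '\n') != NULL);
    }
    fclose(f);
    return found;
}
```

If module_listed("/proc/modules", name) returns 1 before loading, the tool could
report that a module of that name is already loaded and suggest running rmmod on
it, rather than failing with a generic error.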

> > The module could self-terminate if it detects staprun
> > going away suddenly (though I thought it already did that at one point).
> 
> Yeah, that's the problem.  If there is a way for a module to unload itself, I
> don't know about it.  

Right.  At least, we could run the shutdown code to release memory and
unregister the probes, and spit out a printk as an explanation.

> That's why my preferred approach is to force oom-killer to
> kill stap and not staprun. 

Thing is, both stap and staprun will ideally use up rather little core during
actual execution.  If the OOM guy is hungry, both may well get the knife.
Comment 8 Martin Hunt 2007-07-19 16:24:48 UTC
This is fixed except for the discussed oom-killer interaction.  I opened 4815 as
a new PR for just that issue.