This is the mail archive of the systemtap@sourceware.org mailing list for the systemtap project.
RE: offline elfutils processing committed
- From: Martin Hunt <hunt at redhat dot com>
- To: "Stone, Joshua I" <joshua dot i dot stone at intel dot com>
- Cc: "Frank Ch. Eigler" <fche at redhat dot com>, systemtap at sources dot redhat dot com
- Date: Mon, 06 Nov 2006 23:37:09 -0500
- Subject: RE: offline elfutils processing committed
- Organization: Red Hat Inc.
- References: <C56DB814FAA30B418C75310AC4BB279DE35064@scsmsx413.amr.corp.intel.com>
On Mon, 2006-11-06 at 14:15 -0800, Stone, Joshua I wrote:
> On Monday, November 06, 2006 1:18 PM, Martin Hunt wrote:
> > The point is damage control. Systemtap allocates too much memory and
> > oom killer gets active, the first thing it will kill is staprun and
> > that should unload the module (but this seems broken at the moment).
> > So we haven't really hurt the system.
>
> The goal is fine, but I don't think this accomplishes it. My
> understanding is that __alloc_pages will keep calling OOM until it is
> able to satisfy the request -- thus the module is blocked waiting for
> memory. The process might end up something like:
>
> stap module: allocate lots of memory
> __alloc_pages: Not enough memory -> OOM kill something (staprun)
> __alloc_pages: Still not enough memory -> OOM kill other stuff
> __alloc_pages: Yay, now we have enough memory!
> stap module: got some memory
> stap module: Oops, staprun is gone, better exit...
There are two different but related problems. The one you describe is
easily fixed by using the __GFP_NORETRY flag on our allocs. The second
problem is the one I was trying to describe: what happens when
systemtap's allocations succeed, but leave the system in a low-memory
state such that other applications trigger the oom killer when they try
to allocate memory? In this case, we want staprun and the systemtap
module to be the first to be killed. I haven't looked at the sources, but it
seems unlikely to me that the oom killer would be so fast that it would
kill staprun and then kill other processes before the module is also
killed and frees its memory.
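
To illustrate the "first to be killed" idea, here is a rough userspace sketch, not taken from the staprun sources. It assumes a kernel that exposes /proc/<pid>/oom_score_adj (2.6.36 and later; kernels from the time of this thread exposed /proc/<pid>/oom_adj instead, with a different value range). The helper names set_oom_score_adj and prefer_self_as_oom_victim are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper, not part of staprun: write an OOM score
 * adjustment to the given file. Returns 0 on success, -1 on failure. */
int set_oom_score_adj(const char *path, int score)
{
    FILE *f;
    int ok;

    assert(path != NULL);

    /* The kernel accepts -1000 (exempt from the oom killer) through
     * 1000 (strongly preferred victim); reject anything else here. */
    if (score < -1000 || score > 1000)
        return -1;

    f = fopen(path, "w");
    if (f == NULL)
        return -1;

    ok = fprintf(f, "%d\n", score) > 0;
    if (fclose(f) != 0)   /* a failed flush also counts as failure */
        ok = 0;

    return ok ? 0 : -1;
}

/* Volunteer the calling process as the preferred OOM victim. */
int prefer_self_as_oom_victim(void)
{
    return set_oom_score_adj("/proc/self/oom_score_adj", 1000);
}
```

A process like staprun could call prefer_self_as_oom_victim() once at startup. Raising the value normally needs no special privilege, but this only makes the process a preferred victim; the module still has to notice that staprun is gone and free its memory.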
Martin