Summary: | improve translated C code to reduce compile & run time | ||
---|---|---|---|
Product: | systemtap | Reporter: | Martin Hunt <hunt> |
Component: | translator | Assignee: | Frank Ch. Eigler <fche> |
Status: | RESOLVED FIXED | ||
Severity: | normal | ||
Priority: | P1 | ||
Version: | unspecified | ||
Target Milestone: | --- | ||
Host: | Target: | ||
Build: | Last reconfirmed: | 2006-01-23 18:13:16 | |
Bug Depends on: | |||
Bug Blocks: | 2111 | ||
Attachments: | my test case |
Description
Martin Hunt
2005-12-15 18:09:15 UTC
Created attachment 804
my test case
Are you sure you're running cvs systemtap? Graydon made a big improvement in just this area of code a few days ago: bug #1931

Never mind, misunderstood your timings. Needs further study.

This looks decidedly wrong. Offhand I can't tell why. It's possible that we're simply generating too much code (maybe 200 syscalls times a handful of parameter-accessor functions makes "too much code"), but it also looks like we're generating junk as well. Experiments ongoing.

Counterintuitively, it seems like the probe handler bodies are *not* the dominant factor. With all ~500 of them commented out, the compile time is still just as long. Judging by the resulting function/symbol sizes, I infer that the module_init/module_exit functions are stressing the C compiler most, and will therefore look there first.

Patches just committed appear to improve this significantly.

BEFORE
~> time stap -p4 sys.stp
real 1m40.500s
user 1m35.334s
sys 0m5.947s

AFTER
~> time stap -p4 sys.stp
real 0m47.287s
user 0m46.979s
sys 0m1.393s

That was a big improvement. Still, I hope we can eventually improve upon this. I suggest keeping this open at a lowered priority.

Right. I anticipate further improvements are possible along these lines:
- reducing the amount of code generated (duh), particularly:
  - collecting the activity-count additions & especially checks
  - reducing the frequency of last_stmt assignments, and last_error checks
- raising some global variable locking/unlocking code up to the outermost nesting level of probe/function bodies; beyond simplifying the emitted C code, this could reduce potential concurrency but it would kill a bunch of race conditions
- adjusting the kbuild CFLAGS to lessen optimization

*** Bug 1159 has been marked as a duplicate of this bug. ***

*** Bug 1330 has been marked as a duplicate of this bug. ***

will include lock lifting, unused $target elimination, and one or two other optimizations

mostly done; need just lock lifting now

lock lifting done.

Other future improvements are possible; they will be tracked separately.
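For readers unfamiliar with the term, the "lock lifting" discussed above can be sketched roughly as follows. This is not the code the translator emits. The names counted_lock, lock_acquire, lock_release, handler_unlifted, handler_lifted, reset_demo and lock_ops are all hypothetical, and the "lock" is just a counter so that the difference in emitted lock traffic is visible without threads. The sketch only illustrates the idea of raising lock/unlock pairs from each statement to the outermost level of a handler body.

```c
#include <assert.h>

/* Hypothetical stand-in for the lock guarding one script global.
   It counts operations instead of blocking, so the cost is observable. */
struct counted_lock {
    int held;           /* 1 while the lock is held */
    unsigned long ops;  /* total acquire + release calls */
};

static void lock_acquire(struct counted_lock *l)
{
    assert(!l->held);   /* no nested acquisition in this sketch */
    l->held = 1;
    l->ops++;
}

static void lock_release(struct counted_lock *l)
{
    assert(l->held);
    l->held = 0;
    l->ops++;
}

static struct counted_lock global_count_lock;
static long global_count;   /* plays the role of a script-level global */

/* Unlifted: every statement that touches the global locks around itself. */
long handler_unlifted(int n)
{
    long v;
    for (int i = 0; i < n; i++) {
        lock_acquire(&global_count_lock);
        global_count += 1;
        lock_release(&global_count_lock);
    }
    lock_acquire(&global_count_lock);
    v = global_count;
    lock_release(&global_count_lock);
    return v;
}

/* Lifted: one acquire on entry and one release on exit of the body. */
long handler_lifted(int n)
{
    long v;
    lock_acquire(&global_count_lock);
    for (int i = 0; i < n; i++)
        global_count += 1;
    v = global_count;
    lock_release(&global_count_lock);
    return v;
}

/* Helpers for experimenting with the two variants. */
void reset_demo(void)
{
    global_count = 0;
    global_count_lock.held = 0;
    global_count_lock.ops = 0;
}

unsigned long lock_ops(void)
{
    return global_count_lock.ops;
}
```

In the lifted variant the handler body holds the lock for its whole duration. That matches the trade-off noted in the comment above: less emitted code and fewer windows for races between statements, at the cost of potential concurrency.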