Bug 2060

Summary: improve translated C code to reduce compile & run time
Product: systemtap Reporter: Martin Hunt <hunt>
Component: translator Assignee: Frank Ch. Eigler <fche>
Severity: normal    
Priority: P1    
Version: unspecified   
Target Milestone: ---   
Host: Target:
Build: Last reconfirmed: 2006-01-23 18:13:16
Bug Depends on:    
Bug Blocks: 2111    
Attachments: my test case

Description Martin Hunt 2005-12-15 18:09:15 UTC
Typical compile times for simple scripts probing kernel.syscall.* are 1 to 2 minutes.

~> time stap -p2 sys.stp > foo

real    0m1.570s
user    0m1.518s
sys     0m0.052s
~> time stap -p3 sys.stp > foo

real    0m3.282s
user    0m2.136s
sys     0m1.148s
~> wc -l foo
183458 foo
~> time stap -p4 sys.stp

real    1m27.217s
user    1m23.365s
sys     0m4.691s

So we have a 183458 line C file to compile. The context struct itself is over
14000 lines long and includes stuff like:
struct function__module_flags_str_locals {
      int64_t f;
      union {
        struct {
        struct {
        struct {
        struct {
        struct {
        struct {
        struct {
        struct {
        struct {
        struct {
        struct {
        struct {
      string_t __retvalue;
    } function__module_flags_str;

Everything else in the C file looks normal at first glance. Very repetitive,
Comment 1 Martin Hunt 2005-12-15 18:10:00 UTC
Created attachment 804
my test case
Comment 2 Frank Ch. Eigler 2005-12-15 18:16:07 UTC
Are you sure you're running cvs systemtap?
Graydon made a big improvement in just this area of code a few days ago: bug #1931
Comment 3 Frank Ch. Eigler 2005-12-15 18:17:27 UTC
Never mind, misunderstood your timings.
Needs further study.
Comment 4 Graydon Hoare 2005-12-21 01:36:03 UTC
This looks decidedly wrong. Off hand I can't tell why. It's possible that we're
simply generating too much code -- maybe 200 syscalls times a handful of
parameter-accessor functions makes "too much code" -- but it also looks like
we're generating junk as well. 
Comment 5 Frank Ch. Eigler 2006-01-04 21:45:03 UTC
Experiments ongoing.

Counterintuitively, it seems like the probe handler bodies are *not* the
dominant factor.  With all ~500 of them commented out, the compile time is still
just as long.  Judging by the resulting function/symbol sizes, I infer that the
module_init/module_exit functions are stressing the C compiler most, and
therefore will look there first.
Comment 6 Frank Ch. Eigler 2006-01-10 18:52:36 UTC
Patches just committed appear to improve this significantly.
Comment 7 Martin Hunt 2006-01-10 20:34:24 UTC
~> time stap -p4 sys.stp
real    1m40.500s
user    1m35.334s
sys     0m5.947s

~> time stap -p4 sys.stp
real    0m47.287s
user    0m46.979s
sys     0m1.393s

That was a big improvement. Still, I hope we can eventually improve upon this. 
I suggest keeping this open at a lowered priority.
Comment 8 Frank Ch. Eigler 2006-01-10 20:44:42 UTC
Right.  I anticipate further improvements are possible along these lines:

- reducing the amount of code generated (duh), particularly:
  - collecting the activity-count additions & especially checks
  - reducing the frequency of last_stmt assignments, and last_error checks
  - raising some global variable locking/unlocking code up to the outermost
nesting level of probe/function bodies; beyond simplifying the emitted C code,
this could reduce potential concurrency but it would kill a bunch of race conditions
- adjusting the kbuild CFLAGS to lessen optimization
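
To illustrate the lock-lifting item above, here is a minimal sketch of the idea in plain C. It is not the translator's actual output: global_count, lock_global, unlock_global, lock_ops and both probe_body_* functions are hypothetical stand-ins, and the "lock" is only a flag with a counter so that the number of lock operations can be observed.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for a script global and its lock. */
static int64_t global_count;   /* the shared script variable */
static int lock_held;          /* 1 while the lock is taken */
static unsigned lock_ops;      /* counts lock + unlock calls */

static void lock_global(void)
{
    assert(!lock_held);        /* catch accidental double locking */
    lock_held = 1;
    lock_ops++;
}

static void unlock_global(void)
{
    assert(lock_held);         /* catch unlock without lock */
    lock_held = 0;
    lock_ops++;
}

/* Before: each statement that touches the global locks and unlocks
   around itself, so the emitted code repeats the lock boilerplate. */
static void probe_body_per_statement(void)
{
    lock_global();
    global_count++;
    unlock_global();

    lock_global();
    global_count += 2;
    unlock_global();
}

/* After: the lock is lifted to the outermost nesting level of the
   body, so it is taken once and held across all statements. */
static void probe_body_lifted(void)
{
    lock_global();
    global_count++;
    global_count += 2;
    unlock_global();
}
```

Both bodies leave global_count with the same value, but the lifted form emits two lock operations instead of four. The lock is held for longer, which is the concurrency cost mentioned above, while the whole body now sees a consistent view of the variable.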
Comment 9 Frank Ch. Eigler 2006-01-10 20:52:37 UTC
*** Bug 1159 has been marked as a duplicate of this bug. ***
Comment 10 Frank Ch. Eigler 2006-01-10 23:18:42 UTC
*** Bug 1330 has been marked as a duplicate of this bug. ***
Comment 11 Frank Ch. Eigler 2006-01-23 18:13:16 UTC
- will include lock lifting, unused $target elimination, and one or two other
Comment 12 Frank Ch. Eigler 2006-01-24 17:58:40 UTC
mostly done; need just lock lifting now
Comment 13 Frank Ch. Eigler 2006-01-26 23:01:30 UTC
lock lifting done.
other future improvements are possible; will be tracked separately.