This is the mail archive of the systemtap@sourceware.org mailing list for the systemtap project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

[Bug testsuite/18577] New: on rhel7, listing_mode_sanity.exp always gets a failure when doing 'stap -l **'


https://sourceware.org/bugzilla/show_bug.cgi?id=18577

            Bug ID: 18577
           Summary: on rhel7, listing_mode_sanity.exp always gets a
                    failure when doing 'stap -l **'
           Product: systemtap
           Version: unspecified
            Status: NEW
          Severity: normal
          Priority: P2
         Component: testsuite
          Assignee: systemtap at sourceware dot org
          Reporter: dsmith at redhat dot com
  Target Milestone: ---

On all RHEL7 architectures (x86_64, ppc64, and s390x), the
listing_mode_sanity.exp test case always gets a failure when doing 'stap -l
**':

====
FAIL: listing_mode_sanity (stap -l ** exited badly)
====

At first I thought this was because of tapset errors - but the tapsets don't
have any errors on x86_64 RHEL7. So, I went a bit further. The failure occurs
because the command is taking too long. The testsuite uses an rc file to ensure
that systemtap doesn't take too long. So, in effect, when doing a 'stap -l **'
we are really doing the following the following:

====
# time stap -v --rlimit-cpu=300 -l '**'
Pass 1: parsed user script and 112 library script(s) using
219868virt/37520res/2928shr/34884data kb, in 280usr/50sys/330real ms.
Killed

real    5m34.124s
user    5m32.627s
sys     0m1.075s
====

So, how long does doing 'stap -l **' take? On this VM (2 cpus, 2GB of ram), it
takes over 15 minutes:

====
time stap -v -l '**' > stap.log
Pass 1: parsed user script and 112 library script(s) using
219868virt/37520res/2924shr/34884data kb, in 270usr/40sys/325real ms.
Pass 2: analyzed script: 9649 probe(s), 2892 function(s), 122 embed(s), 404
global(s) using 572132virt/388464res/4396shr/387148data kb, in
927390usr/2750sys/931367real ms.

real    15m31.976s
user    15m27.846s
sys     0m2.894s
====

15 minutes is too long. We could work around this by extending the rlimit value
in the testsuite, but really we should look into ways to speed this up. Pass 1
is short, Pass 2 takes too long.

In addition, I see some odd/unexpected output (which may be part of the reason
this is taking so long). For example, in the networking tapset, there is the
following probe alias:

====
probe netdev.receive
        =  kernel.function("netif_receive_skb")
====

This is the only place in the tapsets where the netif_receive_skb() kernel
function is referenced. However, in the 'stap -l **' output I see the
following:

====
kernel.function("netif_receive_skb@net/core/dev.c:3592").call
kernel.function("netif_receive_skb@net/core/dev.c:3592").callee("__net_timestamp@include/linux/skbuff.h:2603")
kernel.function("netif_receive_skb@net/core/dev.c:3592").callee("__stack_chk_fail@kernel/panic.c:483")
kernel.function("netif_receive_skb@net/core/dev.c:3592").callee("enqueue_to_backlog@net/core/dev.c:3096")
kernel.function("netif_receive_skb@net/core/dev.c:3592").callee("get_rps_cpu@net/core/dev.c:2929")
kernel.function("netif_receive_skb@net/core/dev.c:3592").callee("skb_defer_rx_timestamp@net/core/timestamping.c:104")
kernel.function("netif_receive_skb@net/core/dev.c:3592").callee("static_key_false@include/linux/jump_label.h:93")
kernel.function("netif_receive_skb@net/core/dev.c:3592").exported
kernel.function("netif_receive_skb@net/core/dev.c:3592").return
====

I wouldn't have expected to see netif_receive_skb() in the 'stap -l **' output
at all, and certainly not all its variants. In the 'stap -l **' output I see
similar output for several (if not all) kernel functions that a probe alias
uses.

-- 
You are receiving this mail because:
You are the assignee for the bug.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]