This is the mail archive of the systemtap@sourceware.org mailing list for the systemtap project.


Re: [Qemu-devel] Using the qemu tracepoints with SystemTap


On Tue, Sep 13, 2011 at 5:10 PM, William Cohen <wcohen@redhat.com> wrote:
> On 09/13/2011 06:03 AM, Stefan Hajnoczi wrote:
>> On Mon, Sep 12, 2011 at 4:33 PM, William Cohen <wcohen@redhat.com> wrote:
>>> The RHEL-6 version of qemu-kvm makes the tracepoints available to SystemTap. I have been working on useful examples for the SystemTap tracepoints in qemu. There doesn't seem to be a great number of examples showing the utility of the tracepoints in diagnosing problems. However, I came across the following blog entry that had several examples:
>>>
>>> http://blog.vmsplice.net/2011/03/how-to-write-trace-analysis-scripts-for.html
>>>
>>> I reimplemented the VirtqueueRequestTracker example from the blog in SystemTap (the attached virtqueueleaks.stp). I can run it on RHEL-6's qemu-kvm-0.12.1.2-2.160.el6_1.8.x86_64 and, when the script is stopped, it prints the pid and the address of each elem that leaked, like the following:
>>>
>>> $ stap virtqueueleaks.stp
>>> ^C
>>>     pid     elem
>>>   19503  1c4af28
>>>   19503  1c56f88
>>>   19503  1c62fe8
>>>   19503  1c6f048
>>>   19503  1c7b0a8
>>>   19503  1c87108
>>>   19503  1c93168
>>> ...
>>>
>>> I am not that familiar with the internals of qemu. The script seems to indicate qemu is leaking, but is that really the case? If there are resource leaks, what output would help debug those leaks? What enhancements can be done to this script to provide more useful information?
>>
>
> Hi Stefan,
>
> Thanks for the comments.
>
>> Leak tracing always has corner cases :).
>>
>> With virtio-blk this would indicate a leak because it uses a
>> request-response model where the guest initiates I/O and the host
>> responds.  A guest that cleanly shuts down before you exit your
>> SystemTap script should not leak requests for virtio-blk.
>
> I stopped the systemtap script while the guest vm was still running. So when the guest vm cleanly shuts down there should be a series of virtqueue_fill operations that will remove those elements?

In the case of virtio-blk we only see pop and fill when the guest
tells the host to process an I/O request.  That means once the request
is complete we've done the fill and there is no outstanding virtqueue
element anymore.  When a guest shuts down cleanly it will have no
pending virtio-blk I/O requests and hence the pops and fills balance
out (there is no leak).
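
For concreteness, the balance check boils down to something like the sketch
below, which is presumably close to what virtqueueleaks.stp already does.
The qemu-kvm path and the assumption that the elem pointer is the second
argument of both markers are guesses about the local build, so adjust as
needed:

global elems

# Remember each element the device pops off a virtqueue.
probe process("/usr/libexec/qemu-kvm").mark("virtqueue_pop") {
    elems[pid(), $arg2] = gettimeofday_us()
}

# The element goes back to the guest when the device fills it in again.
probe process("/usr/libexec/qemu-kvm").mark("virtqueue_fill") {
    delete elems[pid(), $arg2]
}

# Whatever is still in the table when the script stops was never returned.
probe end {
    printf("%8s %16s\n", "pid", "elem")
    foreach ([p, e] in elems)
        printf("%8d %16x\n", p, e)
}

For virtio-blk a clean guest shutdown should leave that table empty.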

I tried to explain that with virtio-net the situation is different
since the guest gives us a bunch of virtqueue elements that point to
receive packet buffers.  When the guest shuts down it will reset the
virtio device, which will clear all virtqueues and hence the popped
elements are no longer relevant.  I CCed you on a patch that adds a
virtio_set_status() trace event so we see when the guest driver resets
the virtio device.
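
Once that trace event is available the script above can simply forget what
it has seen when the driver resets the device.  A minimal sketch, reusing
the elems table and assuming the marker ends up named virtio_set_status
with the status value as its second argument (that layout follows the
proposed patch and may still change):

# A status write of 0 is a virtio device reset: all virtqueues are cleared,
# so any elements we are still tracking are no longer outstanding.
probe process("/usr/libexec/qemu-kvm").mark("virtio_set_status") {
    if ($arg2 == 0)
        delete elems   # coarse: forgets entries for every traced qemu process
}

With that in place virtio-net should stop reporting the receive buffers that
were popped but never filled before shutdown as leaks.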

The upshot of this is that you can use these trace events to check
that QEMU is returning all virtqueue elements to the guest and not
holding on to (leaking) them.  It's easy for virtio-blk.

> Qemu uses a thread for each virtual processor, but a single thread to handle all IO. It seems like that might be a possible bottleneck. What would be the path of an IO event from guest to host and back to the guest? Is there something that a script could do to gauge the delay due to the qemu IO thread handling multiple processors?

I think it would be fantastic to look at the vm_exit and vm_entry
events.  These are the events that get raised each time a vcpu exits
guest mode and returns to the kvm.ko host kernel module and every time
we re-enter guest mode.  There are a couple of different ways to work
with this data:

1. Average, min, max, std deviation.  Answers what the general
vm_exit->vm_entry latency looks like.
2. Heat map of vm_exit->vm_entry latencies.  We can see more in a
heatmap, including how the latencies change over time and which are
most frequent.
3. Detailed vm_exit->vm_entry trace logs together with the events that
lead to the exit (e.g. I/O register access by guest).  This
information is useful for finding out why certain exits happened but
can already be captured today, so I would focus on 1 & 2 above which
can make use of SystemTap's power.
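
As a starting point for 1 and 2, something along these lines should do the
job.  This is only a rough sketch and assumes the kvm_exit/kvm_entry kernel
tracepoints are visible to stap on the host; each vcpu runs in its own
thread, so latencies are keyed by tid():

global exit_ts, lat, n

# Timestamp the moment a vcpu drops out of guest mode ...
probe kernel.trace("kvm_exit") {
    exit_ts[tid()] = gettimeofday_us()
}

# ... and measure how long it takes that vcpu to get back in.
probe kernel.trace("kvm_entry") {
    t = exit_ts[tid()]
    if (t) {
        lat <<< gettimeofday_us() - t
        n++
        delete exit_ts[tid()]
    }
}

# Running log2 histogram every 5 seconds, a crude heat-map substitute that
# shows which latency buckets dominate as the workload runs.
probe timer.s(5) {
    if (n)
        print(@hist_log(lat))
}

# Overall summary for the whole run (item 1).
probe end {
    if (n)
        printf("vm_exit->vm_entry latency (us): avg %d min %d max %d count %d\n",
               @avg(lat), @min(lat), @max(lat), @count(lat))
}

A per-interval version that resets the aggregate between snapshots would get
closer to a real heat map, but even this shows where the expensive exits are.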

This sort of information can help track down instances where the vcpu
is being held up, either because QEMU's global mutex is held by
another thread or for various other reasons.

Stefan

