This is the mail archive of the
systemtap@sourceware.org
mailing list for the systemtap project.
RE: gettimeofday_ns accuracy
- From: "Rob Donovan" <rob at proivrc dot com>
- To: <systemtap at sourceware dot org>
- Date: Fri, 1 Oct 2010 13:45:03 +0100
- Subject: RE: gettimeofday_ns accuracy
Hi,
I've tested this on the latest snapshot (systemtap-20100925), and it's still
a problem.
Is anyone particularly looking into this or do you need any more info or
anything from me?
Has anyone been able to reproduce it, or is it just me?
Thanks,
Rob.
> -----Original Message-----
> From: Rob Donovan [mailto:rob@proivrc.com]
> Sent: 05 September 2010 17:47
> To: 'Steve Fink'
> Cc: 'Frank Ch. Eigler'; 'systemtap@sourceware.org'
> Subject: RE: gettimeofday_ns accuracy
>
> Hi,
>
> Well, I've done some more testing...
>
> I turned off Power management (by putting "apm=off acpi=off" in
> /etc/grub.conf and rebooting).
>
> GHz still showed 1400, and now gettimeofday_s() always returns '0'.
>
> I don't think that has anything to do with the bug mind you, because the
> other machine I try it on at work doesn't show the GHz incorrectly, and I
still
> get the problem on that computer also, but as I said, only on RHEL6 not
> RHEL5.
>
> One thing I didn't say, is that this problem seems to get worse when lots
of
> processes are running or starting, so once the script has started if you
also
> run 2 commands to create processes, something like..... watch -n 0 "ps
aux |
> grep stap"
>
> Once the stap script is running, I get some of the messages that the time
has
> gone backwards by 1 second, but then as I said after a while it then
starts
> reporting that the difference is many hours.
>
> One thing I've noticed, is that when it does jump to the difference that
is hrs,
> the time that it reports instead of the correct time, is the time that the
script
> started! And then it just gets higher.
>
> Also, when it was reporting 1 second diff, the 'last time' was always on
CPU 3
> mostly, sometimes on 1, and the 'now' time was always on CPU 0 or 2.
Exactly
> when it starts showing a diff of 1hr or more, it switches to the 'last
time' on
> CPU 2 or 0, and the 'now' time on CPU 3 or 1.... not sure if that is
relevant or
> not...
>
> Below is new output, with the CPU field for the 2 times...
>
> Started at Sun Sep 5 14:55:28 2010
> proc_exec: Oh dear, last time was Sun Sep 5 14:55:54 2010 (CPU=3) but now
> its Sun Sep 5 14:55:53 2010 (CPU=0)
> proc_exec: Oh dear, last time was Sun Sep 5 14:55:56 2010 (CPU=3) but now
> its Sun Sep 5 14:55:55 2010 (CPU=2)
> proc_exec: Oh dear, last time was Sun Sep 5 14:56:03 2010 (CPU=3) but now
> its Sun Sep 5 14:56:02 2010 (CPU=0)
> proc_exec: Oh dear, last time was Sun Sep 5 14:56:07 2010 (CPU=3) but now
> its Sun Sep 5 14:56:06 2010 (CPU=0)
> proc_exec: Oh dear, last time was Sun Sep 5 14:56:09 2010 (CPU=3) but now
> its Sun Sep 5 14:56:08 2010 (CPU=2)
> proc_exec: Oh dear, last time was Sun Sep 5 14:56:52 2010 (CPU=3) but now
> its Sun Sep 5 14:56:51 2010 (CPU=0)
> proc_exec: Oh dear, last time was Sun Sep 5 14:56:53 2010 (CPU=3) but now
> its Sun Sep 5 14:56:52 2010 (CPU=0)
> proc_exec: Oh dear, last time was Sun Sep 5 14:56:55 2010 (CPU=3) but now
> its Sun Sep 5 14:56:54 2010 (CPU=0)
> proc_exec: Oh dear, last time was Sun Sep 5 14:57:00 2010 (CPU=3) but now
> its Sun Sep 5 14:56:59 2010 (CPU=2)
> proc_exec: Oh dear, last time was Sun Sep 5 14:57:02 2010 (CPU=3) but now
> its Sun Sep 5 14:57:01 2010 (CPU=2)
> proc_exec: Oh dear, last time was Sun Sep 5 14:57:04 2010 (CPU=3) but now
> its Sun Sep 5 14:57:03 2010 (CPU=2)
> proc_exec: Oh dear, last time was Sun Sep 5 14:57:08 2010 (CPU=1) but now
> its Sun Sep 5 14:57:07 2010 (CPU=2)
> proc_exec: Oh dear, last time was Sun Sep 5 14:57:18 2010 (CPU=3) but now
> its Sun Sep 5 14:57:17 2010 (CPU=0)
> proc_exec: Oh dear, last time was Sun Sep 5 14:57:23 2010 (CPU=3) but now
> its Sun Sep 5 14:57:22 2010 (CPU=0) ....
> ....
> ....
> proc_exec: Oh dear, last time was Sun Sep 5 16:17:49 2010 (CPU=3) but now
> its Sun Sep 5 16:17:48 2010 (CPU=0)
> proc_exec: Oh dear, last time was Sun Sep 5 16:17:49 2010 (CPU=3) but now
> its Sun Sep 5 16:17:48 2010 (CPU=2)
> proc_exec: Oh dear, last time was Sun Sep 5 16:17:49 2010 (CPU=3) but now
> its Sun Sep 5 16:17:48 2010 (CPU=0)
> proc_exec: Oh dear, last time was Sun Sep 5 16:17:49 2010 (CPU=3) but now
> its Sun Sep 5 16:17:48 2010 (CPU=0)
> proc_exec: Oh dear, last time was Sun Sep 5 16:17:49 2010 (CPU=1) but now
> its Sun Sep 5 14:55:28 2010 (CPU=3) <- this is where it starts back to
the time
> the scripts starts!
> proc_exec: Oh dear, last time was Sun Sep 5 16:17:48 2010 (CPU=2) but now
> its Sun Sep 5 14:55:28 2010 (CPU=1)
> proc_exec: Oh dear, last time was Sun Sep 5 16:17:48 2010 (CPU=2) but now
> its Sun Sep 5 14:55:28 2010 (CPU=1)
> proc_exec: Oh dear, last time was Sun Sep 5 16:17:49 2010 (CPU=2) but now
> its Sun Sep 5 14:55:28 2010 (CPU=1)
> proc_exec: Oh dear, last time was Sun Sep 5 16:17:49 2010 (CPU=2) but now
> its Sun Sep 5 14:55:28 2010 (CPU=1)
> proc_exec: Oh dear, last time was Sun Sep 5 16:17:49 2010 (CPU=2) but now
> its Sun Sep 5 14:55:28 2010 (CPU=1)
> proc_exec: Oh dear, last time was Sun Sep 5 16:17:49 2010 (CPU=0) but now
> its Sun Sep 5 14:55:28 2010 (CPU=3)
> proc_exec: Oh dear, last time was Sun Sep 5 16:17:49 2010 (CPU=2) but now
> its Sun Sep 5 14:55:28 2010 (CPU=1)
> proc_exec: Oh dear, last time was Sun Sep 5 16:17:49 2010 (CPU=2) but now
> its Sun Sep 5 14:55:29 2010 (CPU=1)
> proc_exec: Oh dear, last time was Sun Sep 5 16:17:49 2010 (CPU=2) but now
> its Sun Sep 5 14:55:29 2010 (CPU=1)
> proc_exec: Oh dear, last time was Sun Sep 5 16:17:50 2010 (CPU=2) but now
> its Sun Sep 5 14:55:29 2010 (CPU=1)
> proc_exec: Oh dear, last time was Sun Sep 5 16:17:50 2010 (CPU=2) but now
> its Sun Sep 5 14:55:29 2010 (CPU=1)
> proc_exec: Oh dear, last time was Sun Sep 5 16:17:50 2010 (CPU=2) but now
> its Sun Sep 5 14:55:29 2010 (CPU=1)
> proc_exec: Oh dear, last time was Sun Sep 5 16:17:50 2010 (CPU=2) but now
> its Sun Sep 5 14:55:30 2010 (CPU=1)
> proc_exec: Oh dear, last time was Sun Sep 5 16:17:50 2010 (CPU=2) but now
> its Sun Sep 5 14:55:30 2010 (CPU=1)
> proc_exec: Oh dear, last time was Sun Sep 5 16:17:50 2010 (CPU=2) but now
> its Sun Sep 5 14:55:30 2010 (CPU=1)
> proc_exec: Oh dear, last time was Sun Sep 5 16:17:51 2010 (CPU=0) but now
> its Sun Sep 5 14:55:30 2010 (CPU=3)
> proc_exec: Oh dear, last time was Sun Sep 5 16:17:51 2010 (CPU=0) but now
> its Sun Sep 5 14:55:30 2010 (CPU=3)
> proc_exec: Oh dear, last time was Sun Sep 5 16:17:51 2010 (CPU=2) but now
> its Sun Sep 5 14:55:30 2010 (CPU=1)
> proc_exec: Oh dear, last time was Sun Sep 5 16:17:51 2010 (CPU=2) but now
> its Sun Sep 5 14:55:30 2010 (CPU=1)
> proc_exec: Oh dear, last time was Sun Sep 5 16:17:51 2010 (CPU=2) but now
> its Sun Sep 5 14:55:30 2010 (CPU=1)
> proc_exec: Oh dear, last time was Sun Sep 5 16:17:51 2010 (CPU=2) but now
> its Sun Sep 5 14:55:30 2010 (CPU=1)
> proc_exec: Oh dear, last time was Sun Sep 5 16:17:51 2010 (CPU=2) but now
> its Sun Sep 5 14:55:31 2010 (CPU=3)
> proc_exec: Oh dear, last time was Sun Sep 5 16:17:51 2010 (CPU=2) but now
> its Sun Sep 5 14:55:31 2010 (CPU=3)
> proc_exec: Oh dear, last time was Sun Sep 5 16:17:51 2010 (CPU=2) but now
> its Sun Sep 5 14:55:31 2010 (CPU=1)
> proc_exec: Oh dear, last time was Sun Sep 5 16:17:51 2010 (CPU=2) but now
> its Sun Sep 5 14:55:31 2010 (CPU=1)
>
> -----Original Message-----
> From: systemtap-owner@sourceware.org [mailto:systemtap-
> owner@sourceware.org] On Behalf Of Steve Fink
> Sent: 03 September 2010 17:58
> To: Rob Donovan
> Cc: Frank Ch. Eigler; systemtap@sourceware.org
> Subject: Re: gettimeofday_ns accuracy
>
> On Fri, Sep 3, 2010 at 1:56 AM, Rob Donovan <rob@proivrc.com> wrote:
> > Hi,
> >
> > Sorry for the delay...
> >
> > I think I've got a worse problem now with gettimeofday now.
> >
> > If you run a script for a long time, it seems that the
> > gettimeofday_s() sometimes goes backwards by 1 second. And if you
> > leave it running for 30 minutes (or up to 2 hrs sometimes) then some
> > probes are returning with the time 2-3 hrs ago.
> >
> > I have the following script below, and if you start it, it will start
> > off with just a few errors of 1 second, but if you leave it running,
> > it will get worse and worse.
>
> Just speculating - does it still happen if you turn off power management?
>
> Your output says the CPU is a 3.73GHz machine running at 1.4GHz, which I
> think means you have frequency scaling happening. I'm wondering if the
> larger differences are from CPUs getting completely shut down for a while
or
> something.
>
> Can you print out the CPU ids in the output as well?