This is the mail archive of the systemtap@sourceware.org mailing list for the systemtap project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

RE: gettimeofday_ns accuracy


Hi,

I've tested this on the latest snapshot (systemtap-20100925), and it's still
a problem.

Is anyone particularly looking into this or do you need any more info or
anything from me?

Has anyone been able to reproduce it, or is it just me?

Thanks,

Rob.

> -----Original Message-----
> From: Rob Donovan [mailto:rob@proivrc.com]
> Sent: 05 September 2010 17:47
> To: 'Steve Fink'
> Cc: 'Frank Ch. Eigler'; 'systemtap@sourceware.org'
> Subject: RE: gettimeofday_ns accuracy
> 
> Hi,
> 
> Well, I've done some more testing...
> 
> I turned off Power management (by putting "apm=off acpi=off" in
> /etc/grub.conf and rebooting).
> 
> GHz still showed 1400, and now gettimeofday_s() always returns '0'.
> 
> I don't think that has anything to do with the bug mind you, because the
> other machine I try it on at work doesn't show the GHz incorrectly, and I
still
> get the problem on that computer also, but as I said, only on RHEL6 not
> RHEL5.
> 
> One thing I didn't say, is that this problem seems to get worse when lots
of
> processes are running or starting, so once the script has started if you
also
> run 2 commands to create processes, something like.....  watch -n 0 "ps
aux |
> grep stap"
> 
> Once the stap script is running, I get some of the messages that the time
has
> gone backwards by 1 second, but then as I said after a while it then
starts
> reporting that the difference is many hours.
> 
> One thing I've noticed, is that when it does jump to the difference that
is hrs,
> the time that it reports instead of the correct time, is the time that the
script
> started! And then it just gets higher.
> 
> Also, when it was reporting 1 second diff, the 'last time' was always on
CPU 3
> mostly, sometimes on 1, and the 'now' time was always on CPU 0 or 2.
Exactly
> when it starts showing a diff of 1hr or more, it switches to the 'last
time' on
> CPU 2 or 0, and the 'now' time on CPU 3 or 1.... not sure if that is
relevant or
> not...
> 
> Below is new output, with the CPU field for the 2 times...
> 
> Started at Sun Sep  5 14:55:28 2010
> proc_exec: Oh dear, last time was Sun Sep  5 14:55:54 2010 (CPU=3) but now
> its Sun Sep  5 14:55:53 2010 (CPU=0)
> proc_exec: Oh dear, last time was Sun Sep  5 14:55:56 2010 (CPU=3) but now
> its Sun Sep  5 14:55:55 2010 (CPU=2)
> proc_exec: Oh dear, last time was Sun Sep  5 14:56:03 2010 (CPU=3) but now
> its Sun Sep  5 14:56:02 2010 (CPU=0)
> proc_exec: Oh dear, last time was Sun Sep  5 14:56:07 2010 (CPU=3) but now
> its Sun Sep  5 14:56:06 2010 (CPU=0)
> proc_exec: Oh dear, last time was Sun Sep  5 14:56:09 2010 (CPU=3) but now
> its Sun Sep  5 14:56:08 2010 (CPU=2)
> proc_exec: Oh dear, last time was Sun Sep  5 14:56:52 2010 (CPU=3) but now
> its Sun Sep  5 14:56:51 2010 (CPU=0)
> proc_exec: Oh dear, last time was Sun Sep  5 14:56:53 2010 (CPU=3) but now
> its Sun Sep  5 14:56:52 2010 (CPU=0)
> proc_exec: Oh dear, last time was Sun Sep  5 14:56:55 2010 (CPU=3) but now
> its Sun Sep  5 14:56:54 2010 (CPU=0)
> proc_exec: Oh dear, last time was Sun Sep  5 14:57:00 2010 (CPU=3) but now
> its Sun Sep  5 14:56:59 2010 (CPU=2)
> proc_exec: Oh dear, last time was Sun Sep  5 14:57:02 2010 (CPU=3) but now
> its Sun Sep  5 14:57:01 2010 (CPU=2)
> proc_exec: Oh dear, last time was Sun Sep  5 14:57:04 2010 (CPU=3) but now
> its Sun Sep  5 14:57:03 2010 (CPU=2)
> proc_exec: Oh dear, last time was Sun Sep  5 14:57:08 2010 (CPU=1) but now
> its Sun Sep  5 14:57:07 2010 (CPU=2)
> proc_exec: Oh dear, last time was Sun Sep  5 14:57:18 2010 (CPU=3) but now
> its Sun Sep  5 14:57:17 2010 (CPU=0)
> proc_exec: Oh dear, last time was Sun Sep  5 14:57:23 2010 (CPU=3) but now
> its Sun Sep  5 14:57:22 2010 (CPU=0) ....
> ....
> ....
> proc_exec: Oh dear, last time was Sun Sep  5 16:17:49 2010 (CPU=3) but now
> its Sun Sep  5 16:17:48 2010 (CPU=0)
> proc_exec: Oh dear, last time was Sun Sep  5 16:17:49 2010 (CPU=3) but now
> its Sun Sep  5 16:17:48 2010 (CPU=2)
> proc_exec: Oh dear, last time was Sun Sep  5 16:17:49 2010 (CPU=3) but now
> its Sun Sep  5 16:17:48 2010 (CPU=0)
> proc_exec: Oh dear, last time was Sun Sep  5 16:17:49 2010 (CPU=3) but now
> its Sun Sep  5 16:17:48 2010 (CPU=0)
> proc_exec: Oh dear, last time was Sun Sep  5 16:17:49 2010 (CPU=1) but now
> its Sun Sep  5 14:55:28 2010 (CPU=3)   <- this is where it starts back to
the time
> the scripts starts!
> proc_exec: Oh dear, last time was Sun Sep  5 16:17:48 2010 (CPU=2) but now
> its Sun Sep  5 14:55:28 2010 (CPU=1)
> proc_exec: Oh dear, last time was Sun Sep  5 16:17:48 2010 (CPU=2) but now
> its Sun Sep  5 14:55:28 2010 (CPU=1)
> proc_exec: Oh dear, last time was Sun Sep  5 16:17:49 2010 (CPU=2) but now
> its Sun Sep  5 14:55:28 2010 (CPU=1)
> proc_exec: Oh dear, last time was Sun Sep  5 16:17:49 2010 (CPU=2) but now
> its Sun Sep  5 14:55:28 2010 (CPU=1)
> proc_exec: Oh dear, last time was Sun Sep  5 16:17:49 2010 (CPU=2) but now
> its Sun Sep  5 14:55:28 2010 (CPU=1)
> proc_exec: Oh dear, last time was Sun Sep  5 16:17:49 2010 (CPU=0) but now
> its Sun Sep  5 14:55:28 2010 (CPU=3)
> proc_exec: Oh dear, last time was Sun Sep  5 16:17:49 2010 (CPU=2) but now
> its Sun Sep  5 14:55:28 2010 (CPU=1)
> proc_exec: Oh dear, last time was Sun Sep  5 16:17:49 2010 (CPU=2) but now
> its Sun Sep  5 14:55:29 2010 (CPU=1)
> proc_exec: Oh dear, last time was Sun Sep  5 16:17:49 2010 (CPU=2) but now
> its Sun Sep  5 14:55:29 2010 (CPU=1)
> proc_exec: Oh dear, last time was Sun Sep  5 16:17:50 2010 (CPU=2) but now
> its Sun Sep  5 14:55:29 2010 (CPU=1)
> proc_exec: Oh dear, last time was Sun Sep  5 16:17:50 2010 (CPU=2) but now
> its Sun Sep  5 14:55:29 2010 (CPU=1)
> proc_exec: Oh dear, last time was Sun Sep  5 16:17:50 2010 (CPU=2) but now
> its Sun Sep  5 14:55:29 2010 (CPU=1)
> proc_exec: Oh dear, last time was Sun Sep  5 16:17:50 2010 (CPU=2) but now
> its Sun Sep  5 14:55:30 2010 (CPU=1)
> proc_exec: Oh dear, last time was Sun Sep  5 16:17:50 2010 (CPU=2) but now
> its Sun Sep  5 14:55:30 2010 (CPU=1)
> proc_exec: Oh dear, last time was Sun Sep  5 16:17:50 2010 (CPU=2) but now
> its Sun Sep  5 14:55:30 2010 (CPU=1)
> proc_exec: Oh dear, last time was Sun Sep  5 16:17:51 2010 (CPU=0) but now
> its Sun Sep  5 14:55:30 2010 (CPU=3)
> proc_exec: Oh dear, last time was Sun Sep  5 16:17:51 2010 (CPU=0) but now
> its Sun Sep  5 14:55:30 2010 (CPU=3)
> proc_exec: Oh dear, last time was Sun Sep  5 16:17:51 2010 (CPU=2) but now
> its Sun Sep  5 14:55:30 2010 (CPU=1)
> proc_exec: Oh dear, last time was Sun Sep  5 16:17:51 2010 (CPU=2) but now
> its Sun Sep  5 14:55:30 2010 (CPU=1)
> proc_exec: Oh dear, last time was Sun Sep  5 16:17:51 2010 (CPU=2) but now
> its Sun Sep  5 14:55:30 2010 (CPU=1)
> proc_exec: Oh dear, last time was Sun Sep  5 16:17:51 2010 (CPU=2) but now
> its Sun Sep  5 14:55:30 2010 (CPU=1)
> proc_exec: Oh dear, last time was Sun Sep  5 16:17:51 2010 (CPU=2) but now
> its Sun Sep  5 14:55:31 2010 (CPU=3)
> proc_exec: Oh dear, last time was Sun Sep  5 16:17:51 2010 (CPU=2) but now
> its Sun Sep  5 14:55:31 2010 (CPU=3)
> proc_exec: Oh dear, last time was Sun Sep  5 16:17:51 2010 (CPU=2) but now
> its Sun Sep  5 14:55:31 2010 (CPU=1)
> proc_exec: Oh dear, last time was Sun Sep  5 16:17:51 2010 (CPU=2) but now
> its Sun Sep  5 14:55:31 2010 (CPU=1)
> 
> -----Original Message-----
> From: systemtap-owner@sourceware.org [mailto:systemtap-
> owner@sourceware.org] On Behalf Of Steve Fink
> Sent: 03 September 2010 17:58
> To: Rob Donovan
> Cc: Frank Ch. Eigler; systemtap@sourceware.org
> Subject: Re: gettimeofday_ns accuracy
> 
> On Fri, Sep 3, 2010 at 1:56 AM, Rob Donovan <rob@proivrc.com> wrote:
> > Hi,
> >
> > Sorry for the delay...
> >
> > I think I've got a worse problem now with gettimeofday now.
> >
> > If you run a script for a long time, it seems that the
> > gettimeofday_s() sometimes goes backwards by 1 second. And if you
> > leave it running for 30 minutes (or up to 2 hrs sometimes) then some
> > probes are returning with the time 2-3 hrs ago.
> >
> > I have the following script below, and if you start it, it will start
> > off with just a few errors of 1 second, but if you leave it running,
> > it will get worse and worse.
> 
> Just speculating - does it still happen if you turn off power management?
> 
> Your output says the CPU is a 3.73GHz machine running at 1.4GHz, which I
> think means you have frequency scaling happening. I'm wondering if the
> larger differences are from CPUs getting completely shut down for a while
or
> something.
> 
> Can you print out the CPU ids in the output as well?


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]