This is the mail archive of the systemtap@sourceware.org mailing list for the systemtap project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: consistent printf race conditions


Scripts below.  Unfortunately, part of what I'm using this script for is figuring out what the workload is, so I can't provide you with a reliable way to replicate it.  I'll try to include some of the surrounding output on the offchance that might help.  FWIW, this is SystemTap 1.3.

-----
probe syscall.close.return {
  uid = uid()
  track_uid =   uid == 42 // normally does more checks like this
  if ((execname() != "stap" && execname() != "salt-minion" && execname() != "stap-io" && track_uid) && $return == 0) {
    // checks if pid and fd are in an associative array
    if (!is_fd_blacklisted(pid(), $fd)) {
        printf("{\"execname\":\"%s\",\"fd\":%d,\"op\":\"%s\",\"pid\":%d,\"ppid\":%d,\"return\":%d,\"timestamp\":%d,\"uid
\":%d}\n",clean_string(execname()),$fd,clean_string("CLOSE"),pid(),ppid(),$return,gettimeofday_ms(),uid())
    }
    unblacklist_fd(pid(), $fd)
    delete write_fds[pid(), $fd]
    delete read_fds[pid(), $fd]
  }   
}
-----
probe kernel.function("sys_wait4").call {
  uid = uid()
  track_uid =   uid == 0
  if ((execname() != "stap" && execname() != "salt-minion" && execname() != "stap-io" && track_uid)) {
    printf("{\"execname\":\"%s\",\"op\":\"%s\",\"options\":%d,\"pid\":%d,\"ppid\":%d,\"timestamp\":%d,\"uid\":%d}\n",cle
an_string(execname()),clean_string("WAITPID_CALL"),$options,$pid,ppid(),gettimeofday_ms(),uid())
  }
}
-----

Here's some output surrounding one of those errors (changed some values for security):
-----
{"execname":"bash","fd":3,"op":"CLOSE","pid":13562,"ppid":13561,"return":0,"timestamp":1347749678387,"uid":1}
{"execname":"bash","op":"PIPE","pid":13562,"pipe0":3,"pipe1":4,"ppid":13561,"return":0,"timestamp":1347749678387,       "uid":1}
{"execname":"bash","op":"FORK","pid":13562,"ppid":13561,"return":15121,"timestamp":1347749678387,"uid":1}
{"execname":"bash","fd":4,"op":"CLOSE","pid":15121,"ppid":13562,"return":0,"timestamp":1347749678387,"uid":1}
{"execname":"bash","fd":3,"op":"CLOSE","pid":13562,"ppid":13561,"return":0,"timestamp":1347749678388,"uid":1}
{"execname":"bash","fd":4,"op":"CLOSE","pid":13562,"ppid":13561,"return":0,"timestamp":1347749678388,"uid":1}
{"execname":"bash","op":"WAITPID_CALL","options":10,"pid":-1,"ppid":13561,"timestamp":134{"execname":"bash","fd":       3,"op":"CLOSE","pid":15121,"ppid":13562,"return":0,"timestamp":1347749678388,"uid":1}
{"args":"\"ls\" \"--color=tty\" \"-l\" \"t\"\"Output/\"","execname":"bash","filename":"<unknown>","op":"EXECVE_CALL","pid":15121,"ppid":13562,"pwd":"somepath","timestamp":1347749678388,"uid":1}
{"execname":"ls","fd":3,"op":"CLOSE","pid":15121,"ppid":13562,"return":0,"timestamp":1347749678388,"uid":1}
{"execname":"ls","op":"EXECVE_RETURN","pid":15121,"ppid":13562,"return":0,"timestamp":1347749678388,"uid":1}

On Oct 24, 2012, at 7:27 PM, Josh Stone <jistone@redhat.com> wrote:

> On 10/24/2012 01:40 PM, Nicholas Murphy wrote:
>> I've been consistently seeing the following race condition: a printf from a close() system call pre-empts printing of a sys_wait4 hook.  Here's the sys_wait4 printf code:
>> 
>> printf("{\"execname\":\"%s\",\"op\":\"%s\",\"options\":%d,\"pid\":%d,\"ppid\":%d,\"timestamp\":%d,\"uid\":%d}\n",clean_string(execname()),clean_string("WAITPID_CALL"),$options,$pid,ppid(),gettimeofday_ms(),uid())
>> 
>> ...and here's an example of what I end up seeing printed:
>> 
>> {"execname":"bash","op":"WAITPID_CALL","options":10,"pid":-1,"ppid":13561,"timestamp":134{"execname":"bash","fd":3,"op":"CLOSE","pid":15121,"ppid":13562,"return":0,"timestamp":1347749678388,"uid":56441}
>> 
>> I don't see anything in the printf that would cause it to be pre-empted by another printf.  Any idea why I'm seeing this and what I could do to avoid it?  It seems to always be waitpid conflicting with close, so I could perhaps add an access to a common associative array to force synchronization between the two?
> 
> Can you provide a more complete script which demonstrates the error,
> with at least the two close and wait probes?  And what sort of workload
> do you run where it consistently fails?
> 
> I don't know why it would do this, but it will help if I can reproduce
> the corruption locally.
> 
> It would also help if you could attach the output of stap-report.
> 
> Thanks,
> Josh


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]