This is the mail archive of the
systemtap@sourceware.org
mailing list for the systemtap project.
Re: consistent printf race conditions
On 10/24/2012 01:40 PM, Nicholas Murphy wrote:
> I've been consistently seeing the following race condition: a printf from a close() system call pre-empts printing of a sys_wait4 hook. Here's the sys_wait4 printf code:
>
> printf("{\"execname\":\"%s\",\"op\":\"%s\",\"options\":%d,\"pid\":%d,\"ppid\":%d,\"timestamp\":%d,\"uid\":%d}\n",clean_string(execname()),clean_string("WAITPID_CALL"),$options,$pid,ppid(),gettimeofday_ms(),uid())
>
> ...and here's an example of what I end up seeing printed:
>
> {"execname":"bash","op":"WAITPID_CALL","options":10,"pid":-1,"ppid":13561,"timestamp":134{"execname":"bash","fd":3,"op":"CLOSE","pid":15121,"ppid":13562,"return":0,"timestamp":1347749678388,"uid":56441}
>
> I don't see anything in the printf that would cause it to be pre-empted by another printf. Any idea why I'm seeing this and what I could do to avoid it? It seems to always be waitpid conflicting with close, so I could perhaps add an access to a common associative array to force synchronization between the two?
Can you provide a more complete script which demonstrates the error,
with at least the two close and wait probes? And what sort of workload
do you run where it consistently fails?
I don't know why it would do this, but it will help if I can reproduce
the corruption locally.
It would also help if you could attach the output of stap-report.
Thanks,
Josh