This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Unwarranted assumption in tst-waitid, or a kernel bug?
- From: ppluzhnikov at google dot com (Paul Pluzhnikov)
- To: libc-alpha at sourceware dot org
- Cc: ppluzhnikov at google dot com
- Date: Tue, 21 Sep 2010 10:43:28 -0700 (PDT)
- Subject: Unwarranted assumption in tst-waitid, or a kernel bug?
Greetings,
We've recently noticed intermittent failures in posix/tst-waitid and
rt/tst-mqueue5 under newer kernels.
The attached test is distilled from tst-waitid, and
- passes on kernels 2.6.18
- fails after ~30000 iterations on kernels 2.6.26
- fails after ~10 iterations on kernels 2.6.34
In addition to kernels we build ourselves, the failure has been observed on
the "stock" Lucid distribution (2.6.32-24-generic #41-Ubuntu SMP), as well as
Fedora 11 (2.6.29.6-167.fc11.i586) and Fedora 13 (2.6.33.3-85.fc13.i686,
2.6.34.6-54.fc13.x86_64), but only on multi-processor machines.
The test succeeds when built with -DSKIP_SIGSTOP.
Is there some standard that says that the glibc expectation is correct, and
that SIGCHLD *must* be delivered before waitpid() returns?
If not, it seems that tst-waitid should be fixed (e.g. by nanosleep()ing
for 1 usec, though there is probably a better fix).
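
One possible alternative to sleeping, sketched here only as an illustration
and not as the fix glibc adopted: keep SIGCHLD blocked and consume each
notification synchronously with sigtimedwait(), so the test no longer depends
on whether the signal becomes pending before or after waitpid() returns. The
helper names (noop_handler, consume_sigchld, stop_and_kill_once) and the
5-second timeout are hypothetical choices made for this sketch.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <time.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>

/* No-op handler (hypothetical name).  SIGCHLD is consumed synchronously
   below; a non-default disposition just avoids relying on how a blocked,
   default-ignored signal is treated. */
static void noop_handler(int signo) { (void)signo; }

/* Hypothetical helper: with SIGCHLD blocked by the caller, wait up to
   timeout_sec seconds for a SIGCHLD about `pid` with the expected si_code.
   Returns 0 on a match, -1 on timeout or mismatch. */
static int consume_sigchld(pid_t pid, int expected_code, int timeout_sec)
{
    sigset_t set;
    siginfo_t info;
    struct timespec timeout = { timeout_sec, 0 };

    assert(timeout_sec >= 0);
    sigemptyset(&set);
    sigaddset(&set, SIGCHLD);
    for (;;) {
        int signo = sigtimedwait(&set, &info, &timeout);
        if (signo == -1 && errno == EINTR)
            continue;           /* interrupted by another signal; retry */
        if (signo != SIGCHLD)
            return -1;          /* EAGAIN: nothing arrived in time */
        return (info.si_pid == pid && info.si_code == expected_code) ? 0 : -1;
    }
}

/* Hypothetical driver: one stop/kill cycle on a freshly forked child.
   Returns 0 if both SIGCHLD notifications were observed. */
static int stop_and_kill_once(void)
{
    struct sigaction sa;
    sigset_t block, old;
    int status, rc = -1, reaped = 0;
    pid_t pid;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = noop_handler;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGCHLD, &sa, NULL) != 0)
        return -1;

    sigemptyset(&block);
    sigaddset(&block, SIGCHLD);
    if (sigprocmask(SIG_BLOCK, &block, &old) != 0)
        return -1;

    pid = fork();
    if (pid == 0)
        for (;;) pause();       /* child: idle until killed */

    if (pid > 0) {
        /* Stop the child, then wait for the stop notification no matter
           whether it became pending before or after waitpid() returned. */
        if (kill(pid, SIGSTOP) == 0
            && waitpid(pid, &status, WUNTRACED) == pid
            && WIFSTOPPED(status)
            && consume_sigchld(pid, CLD_STOPPED, 5) == 0
            && kill(pid, SIGKILL) == 0) {
            reaped = (waitpid(pid, &status, 0) == pid);
            if (reaped && WIFSIGNALED(status)
                && consume_sigchld(pid, CLD_KILLED, 5) == 0)
                rc = 0;
        }
        if (!reaped) {          /* clean up after an early failure */
            kill(pid, SIGKILL);
            waitpid(pid, &status, 0);
        }
    }
    sigprocmask(SIG_SETMASK, &old, NULL);
    return rc;
}
```

A caller would invoke stop_and_kill_once() in a loop, much as the attached
test loops over fork(), and treat a nonzero return as a lost or mismatched
notification. Because SIGCHLD is a standard signal and does not queue, the
sketch consumes each notification before triggering the next state change.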
Thanks,
--
Paul Pluzhnikov
#include <stdio.h>
#include <unistd.h>
#include <stdlib.h>
#include <signal.h>
#include <string.h>
#include <time.h>
#include <sys/types.h>
#include <sys/wait.h>

int expecting_sigchld;
int fork_counter;

void sighandler(int signo)
{
  if (signo != SIGCHLD) {
    fprintf(stderr, "Unexpected signal %d\n", signo);
    abort();
  }
  if (!expecting_sigchld) {
    fprintf(stderr, "Unexpected SIGCHLD, fork_counter = %d\n", fork_counter);
    abort();
  }
}

#ifndef SKIP_SIGSTOP
# define SKIP_SIGSTOP 0
#endif

int main()
{
  struct sigaction sa;

  memset(&sa, 0, sizeof(sa));
  sa.sa_handler = &sighandler;
  sa.sa_flags = SA_RESTART;
  if (0 != sigaction(SIGCHLD, &sa, NULL)) {
    perror("sigaction");
    abort();
  }

  while (fork_counter++ < 1000000) {
    int pid, status;
    struct timespec ts = { 0, 1000 };  // 1 usec

    switch ((pid = fork())) {
    case -1:
      perror("fork");
      abort();
    case 0:
      // child
      while (1) sleep(3600);
      abort();  // unreached
    default:
      // parent
      expecting_sigchld = 1;
#if !SKIP_SIGSTOP
      kill(pid, SIGSTOP);
      if (pid != waitpid(pid, &status, WUNTRACED)) {
        perror("waitpid");
        abort();
      }
      // A reasonable expectation is that SIGCHLD is delivered
      // before waitpid() returns successfully.
      expecting_sigchld = 0;
      nanosleep(&ts, NULL);  // aborts on Lucid
      expecting_sigchld = 1;
      kill(pid, SIGCONT);
#endif
      expecting_sigchld = 1;
      kill(pid, SIGKILL);
      if (pid != waitpid(pid, &status, 0)) {
        perror("waitpid");
        abort();
      }
      // A reasonable expectation is that SIGCHLD is delivered
      // before waitpid() returns successfully.
      expecting_sigchld = 0;
      break;
    }

    if (fork_counter % 10000 == 0) {
      // Print progress.
      fprintf(stderr, ".");
    }
  }
  fprintf(stderr, "\n");
  return 0;
}