This is the mail archive of the
mailing list for the glibc project.
Re: Unwarranted assumption in tst-waitid, or a kernel bug?
On Tue, Sep 21, 2010 at 8:43 PM, Oleg Nesterov <email@example.com> wrote:
> On 09/21, Roland McGrath wrote:
>> As far as I can tell, Linux has never had a guarantee like this. From a
>> cursory look at the code in a few versions, I think the differences
>> you've seen between kernel versions are due to scheduling changes, not
>> that the actual local constraints in the exit/SIGCHLD/wait code paths
>> have changed at all.
> Paul, I guess that this test-case "fails" after kill(pid, SIGSTOP),
Yes, the failure is:
missing SIGCHLD on stopped
And is coming from line 358 in posix/tst-waitid.c:
334       expecting_sigchld = 1;
335       if (kill (pid, SIGSTOP) != 0)
336         {
337           printf ("kill (%d, SIGSTOP): %m\n", pid);
338           RETURN (EXIT_FAILURE);
339         }
340       pid_t wpid = waitpid (pid, &fail, WUNTRACED);
341       if (wpid < 0)
342         {
343           printf ("waitpid WUNTRACED on stopped: %m\n");
344           RETURN (EXIT_FAILURE);
345         }
346       else if (wpid != pid)
347         {
348           printf ("waitpid WUNTRACED on stopped returned %d != %d
349                   wpid, pid, fail);
350           RETURN (EXIT_FAILURE);
351         }
352       else if (!WIFSTOPPED (fail) || WIFSIGNALED (fail) || WIFEXITED (fail)
353                || WIFCONTINUED (fail) || WSTOPSIG (fail) != SIGSTOP)
354         {
355           printf ("waitpid WUNTRACED on stopped: status %x\n", fail);
356           RETURN (EXIT_FAILURE);
357         }
358       CHECK_SIGCHLD ("stopped", CLD_STOPPED, SIGSTOP);
> I am a bit surprised it never fails on 2.6.18. I think you can add
> a small delay into finish_stop() (before it takes tasklist_lock),
> then I believe it should fail the same way.
You are probably in a better position to confirm this -- I don't usually
build kernels :-)
Anyway, assuming we all agree the assumption is unwarranted, what is
the correct way to fix tst-waitid.c?
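
One possible direction, offered only as a rough sketch and not as the
fix adopted for tst-waitid.c: instead of checking for SIGCHLD right
after waitpid returns, keep SIGCHLD blocked and collect it explicitly
with sigtimedwait, so the check no longer depends on the notification
being queued before waitpid reports the stop. The helper name
stop_child_and_collect_sigchld, the no-op handler, and the 5-second
timeout are all hypothetical choices made for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical no-op handler: installed only so that SIGCHLD has a
   non-default disposition while it is blocked.  */
static void
noop_handler (int sig)
{
  (void) sig;
}

/* Hypothetical helper: stop a freshly forked child and confirm that
   both the wait status and the SIGCHLD notification arrive, without
   assuming any ordering between them.  Returns 0 on success, -1 on
   failure.  */
int
stop_child_and_collect_sigchld (void)
{
  int result = -1;
  sigset_t chld, old_mask;
  struct sigaction sa, old_sa;

  /* Block SIGCHLD so it stays pending until we ask for it.  */
  sigemptyset (&chld);
  sigaddset (&chld, SIGCHLD);
  if (sigprocmask (SIG_BLOCK, &chld, &old_mask) != 0)
    return -1;

  /* No SA_NOCLDSTOP, so stop notifications are generated.  */
  memset (&sa, 0, sizeof sa);
  sa.sa_handler = noop_handler;
  sigemptyset (&sa.sa_mask);
  if (sigaction (SIGCHLD, &sa, &old_sa) != 0)
    {
      sigprocmask (SIG_SETMASK, &old_mask, NULL);
      return -1;
    }

  pid_t pid = fork ();
  if (pid == 0)
    for (;;)
      pause ();			/* Child idles until it is killed.  */

  if (pid > 0)
    {
      int status;
      if (kill (pid, SIGSTOP) == 0
	  && waitpid (pid, &status, WUNTRACED) == pid
	  && WIFSTOPPED (status) && WSTOPSIG (status) == SIGSTOP)
	{
	  /* waitpid may have returned before SIGCHLD was queued, so
	     wait for it with an (arbitrary) timeout instead of
	     expecting it to be there already.  */
	  struct timespec timeout = { 5, 0 };
	  siginfo_t info;
	  int sig;
	  do
	    sig = sigtimedwait (&chld, &info, &timeout);
	  while (sig < 0 && errno == EINTR);

	  if (sig == SIGCHLD && info.si_code == CLD_STOPPED
	      && info.si_pid == pid && info.si_status == SIGSTOP)
	    result = 0;
	}
      /* Clean up the child whether or not the check succeeded.  */
      kill (pid, SIGKILL);
      waitpid (pid, NULL, 0);
    }

  sigaction (SIGCHLD, &old_sa, NULL);
  sigprocmask (SIG_SETMASK, &old_mask, NULL);
  return result;
}
```

A caller would treat a return value of 0 as "stop status and SIGCHLD
both observed". The timeout only bounds how long a lost notification
can hang the test; it is not meant as a statement about how long the
kernel may take.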
And while I have your attention, is it possible for the same problem
to manifest itself in rt/tst-mqueue5.c?
Here the failure is "missing SIGRTMIN" at line 120:
114   /* Parent calls mqsend (q), which should trigger notification.  */
116   (void) pthread_barrier_wait (b3);
118   if (rtmin_cnt != 2)
119     {
120       puts ("SIGRTMIN signal in child did not arrive");
121       result = 1;
122     }
(I have not yet tried to produce a small test case for this, but the
fact that signal delivery also appears to be delayed here makes me
think that it might be the same issue.)