This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: [PATCH] posix: Fix improper assert in Linux posix_spawn (BZ#22273)
> Il giorno 20 ott 2017, alle ore 21:55, Tulio Magno Quites Machado Filho <tuliom@linux.vnet.ibm.com> ha scritto:
>
> Adhemerval Zanella <adhemerval.zanella@linaro.org> writes:
>
>> As noted by Florian Weimer, current Linux posix_spawn implementation
>> can trigger an assert if the auxiliary process is terminated before
>> actually setting the err member:
>>
>> 340 /* Child must set args.err to something non-negative - we rely on
>> 341 the parent and child sharing VM. */
>> 342 args.err = -1;
>> [...]
>> 362 new_pid = CLONE (__spawni_child, STACK (stack, stack_size), stack_size,
>> 363 CLONE_VM | CLONE_VFORK | SIGCHLD, &args);
>> 364
>> 365 if (new_pid > 0)
>> 366 {
>> 367 ec = args.err;
>> 368 assert (ec >= 0);
>>
>> Another possible issue is killing the child between setting the err and
>> actually calling execve. In this case the process will not ran, but
>> posix_spawn also will not report any error:
>>
>> 269
>> 270 args->err = 0;
>> 271 args->exec (args->file, args->argv, args->envp);
>>
>> As suggested by Andreas Schwab, this patch removes the faulty assert
>> and also handles any signal that happens before fork and execve as the
>> spawn was successful (and thus relaying the handling to the caller to
>> figure this out). Different than Florian, I can not see why using
>> atomics to set err would help here, essentially the code runs
>> sequentially (due CLONE_VFORK) and I think it would not be legal the
>> compiler evaluate ec without checking for new_pid result (thus there
>> is no need to compiler barrier).
>>
>> Summarizing the possible scenarios on posix_spawn execution, we
>> have:
>>
>> 1. For default case with a success execution, args.err will be 0, pid
>> will not be collected and it will be reported to caller.
>>
>> 2. For default failure case, args.err will be positive and the it will
>> be collected by the waitpid. An error will be reported to the
>> caller.
>>
>> 3. For the unlikely case where the process was terminated and not
>> collected by a caller signal handler, it will be reported as succeful
>> execution and not be collected by posix_spawn (since args.err will
>> be 0). The caller will need to actually handle this case.
>>
>> 4. For the unlikely case where the process was terminated and collected
>> by caller we have 3 other possible scenarios:
>>
>> 4.1. The auxiliary process was terminated with args.err equal to 0:
>> it will handled as 1. (so it does not matter if we hit the pid
>> reuse race since we won't possible collect an unexpected
>> process).
>>
>> 4.2. The auxiliary process was terminated after execve (due a failure
>> in calling it) and before setting args.err to -1: it will also
>> be handle as 1. but with the issue of not be able to report the
>> caller a possible execve failures.
>>
>> 4.3. The auxiliary process was terminated after args.err is set to -1:
>> this is the case where it will be possible to hit the pid reuse
>> case where we will need to collected the auxiliary pid but we
>> can not be sure if it will be expected one. I think for this
>> case we need to actually change waitpid to use WNOHANG to avoid
>> hanging indefinitely on the call and report an error to caller
>> since we can't differentiate between a default failure as 2.
>> and a possible pid reuse race issue.
>>
>> Checked on x86_64-linux-gnu.
>
> posix/tst-spawn and posix/tst-spawn2 started to fail after this patch.
> Reproduced on aarch64 [1], ppc [2], ppc64 [3] and ppc64le.
>
> posix/tst-spawn.out:
> error: tst-spawn.c:257: not true: WEXITSTATUS (status) == 0
> error: 1 test failures
>
> posix/tst-spawn2.out:
> error: tst-spawn2.c:56: waitpid: No such file or directory)
> error: 1 test failures
>
> [1] http://glibc-buildbot.reserved-bit.com/builders/glibc-aarch64-linux/builds/760
> [2] http://glibc-buildbot.reserved-bit.com/builders/glibc-ppc-linux/builds/152
> [3] http://glibc-buildbot.reserved-bit.com/builders/glibc-power8-linux/builds/728
>
> --
> Tulio Magno
Right, I only indeed checked on x86_64. I will check this out.