This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: RFC: clone with CLONE_VM behavior



On 15-04-2016 01:56, Carlos O'Donell wrote:
> On 04/13/2016 09:32 AM, Adhemerval Zanella wrote:
>> In fact, pthread does not really require it, since it will use CLONE_THREAD
>> and the glibc clone implementation won't change THERAD_SELF pid/tid field.
>> So the question is if glibc really should change pid/tid value for
>> CLONE_VM flag.
>>
>> My impression is to allow programs that might use this flag to be able
>> to use the raise function and force pthread call to fail (due invalid
>> handle), however I am not convinced it is the right path to take
>> (and with proposed solution on BZ#15368 I think using cached value is
>> not an option).
> 
> Florian and I discussed this briefly, and I think we came to some more
> dire conclusions. After a cup of coffee and further discussions with
> Adhemerval and Szabolcs Nagy I realized things are not so bad. I would
> still like to see Florian review this list and agree with me.
> 
> I went through several questions when thinking about this issue, and so
> I wrote them all down, even though only (e) applies.
> 
> (a) Non-problem: vfork.
> 
> All glibc vfork implementations set tcb->pthread->pid to -PID during the vfork
> operation. Calls to getpid or getppid in a signal handler at this point will
> return valid results from having called getpid or getppid from the kernel
> (because they see a negative value in the cache), and will *not* refresh the 
> tcb->pthread->tid cache (tid is non-zero) because doing so would result in
> the cache having a stale value in the child and a signal handler could observe
> that.
> 
> Corollary: All async-signal-safe functions that operate on the tid/pid cache
>            directly must understand -PID/-TID.
> 
> (b) Non-problem: failed exec after vfork changes parent errno value.
> 
> A failed exec after vfork sets errno, and that sets the parent's errno, and
> thus vfork technically sets errno but is not described as setting errno.
> POSIX says that since vfork doesn't describe setting errno that the value
> is unspecified, and so this is OK from a standards perspective that calling
> vfork may change the parent's errno.

As you stated vfork it not an issue here, since it will use either the vfork
syscall or clone directly, without calling the __clone.

> 
> (c) Non-problem: pthread_join from another thread to a vfork-ing thread.
> 
> A vfork-ing thread will have it's child write -PID into it's own PID cache,
> but the TID remains intact. Therefore another thread may still call
> pthread_join on the thread that is vfork-ing and wait for it to exit.
> 
> (d) Problem: raise is not async-signal safe.
> 
> Still true. See https://sourceware.org/bugzilla/show_bug.cgi?id=15368
> Though with (e) fixed there is one less scenario to worry about.

This BZ is on my radar and the proposed solution, although it is considering
the default implementation (nptl/raise.c), instead of the Linux one
(unix/sysv/linux/{pt-}raise.c).

I think the approach outline in the bug report is the correct one,
although slower (it much lock all signals, execute a gettid).

> 
> (e) Problem: pthread_join from another thread to a posix_spawn'ing thread.
> 
> The test nptl/tst-exec1.c calls pthread_join on a thread using posix_spawn,
> and this results in a testsuite failure given the new posix_spawn
> implementation. The thread spawning the child using posix_spawn is able
> to accelerate the spawn with clone, but that results in tid/pid caches
> set to -1 and thus the pthread_join from the other thread fails, and the
> test fails.
> 
> The real problem is that __clone sets tid/pid to -1 in child using
> parent memory. 
> 
> All implmentations of clone.S in glibc write -1 to tid/pid in the
> tcb->pthread->pid/tid fields. This modifies the tid/pid of the parent.
> It would seem that the purpose of this is to reset the tid/pid cache
> such that a signal handler calling gepid/getpid in the child will
> call the syscall to get the new correct results. The consequence is
> that any external parallel thread calling pthread_join will see the
> value of -1 in tid and be unable to wait on the thread calling vfork
> (there is no way to get the right tid value anymore). You should be
> able to join on the thread calling posix_spawn. I expect it was an
> oversight that pthread_join fails to work in this case e.g. vfork
> from a thread was not considered.
> 
> There are two theoretical solutions, but only one is practical:
> 
> (e.1) Remove setting tid/pid to -1 in __clone and provide users with a clone
>       that does *nothing* to the parent.
> 
>       This would fix a complaint from ASAN which has them using a direct clone
>       syscall because our clone wrapper breaks the parent's tid/pid with CLONE_VM
>       but not CLONE_THREAD (they use this for a detached pseudo-monitor thread).
> 
>       https://github.com/llvm-mirror/compiler-rt/blob/master/lib/sanitizer_common/sanitizer_linux.cc#L881
> 
>       Verify that posix_spawn{p} does not modify tid/pid. The child created by
>       posix_spawn{p} will have default signal dispositions so no signal handler
>       exists that can observe the wrong tid/pid.
> 
>       Change the tst-pid* tests in glibc to verify tid/pid does not change.

This is the approach I am working currently and the one I think the correct way.
As a side note, even with this modification ASAN will need to carry one its
own clone implementation because it requires the clone syscall to not modify
anything in its own pthread, not even the pid/tid in case of success.

To really fix it for ASAN it will require to remove the new pid/tid write
and move it for fork() code and fix the places in GLIBC where it calls
clone directly (sysdeps/unix/sysv/linux/system.c).

> 
> (e.2) Create a fake tcb->pthread.
> 
>       In the parent create a tcbhead_t, and struct pthread, and set them up with
>       enough data they are valid, and reference the parent's TLS data or a copy
>       of the parent's TLS data.
> 
>       Call clone with with CLONE_SETTLS pointing at the tcbhead_t copy, with
>       CLONE_PARENT_SETTID pointing at the pid, and CLONE_CHILD_SETTID pointing
>       at the tid. This will mean that you enter the child with unique data
>       for struct pthread, tcbhead_t, and unique and correct values for pid/tid
>       which are the same.
> 
>       You can immediately take a signal and the returned values from getpid
>       and getppid in a signal handler would be valid.
> 
>       It is unclear exactly how hard it will be to get a valid working tcbhead_t
>       or struct pthread setup and initialized for everything to operate
>       correctly.

I do not really this approach because it add much more complexity than the
implementation really requires to setup a cloned task (just check TLS_INIT_TP)
to execute posix_spawn.

> 
> In practice (1) is the most reasonable.
> 
> The only problem is if we have an expectation that user code can call clone
> or __clone2 and have getpid and getppid work correctly in the child? I think
> the answer to that is: No we don't have any such expectation because clone
> is designed for low-level runtime implementation.
> 
> I support removing the -1 tid/pid reset in clone/__clone2 (all clone.S assembly
> files for all machines).
> 
> Other comments?

I think the main problem is here is if we rellay should support direct clone
syscall. GLIBC requires some internal state update for all the clone internal
usage (either in thread creation, fork, and now posix_spawn) so pursing it
might be problematic.

As Szabolcs has pointed out in IRC, clone usage outside GLIBC are *very*
specific and contains some very dubious assumptions. For instance systemtap [1]
do uses clone directly with CLONE_VM, but does not seem to rely on getpid
to work.

[1] http://sources.debian.net/src/systemtap/3.0-2/testsuite/systemtap.clone/dtrace_clone.c/?hl=27#L27




Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]