This is the mail archive of the cygwin-developers mailing list for the Cygwin project.


Re: How to make child of failed fork exit cleanly?

From: Ryan Johnson <ryan dot johnson at cs dot utoronto dot ca>
To: cygwin-developers at cygwin dot com
Date: Tue, 03 May 2011 19:03:07 -0400
Subject: Re: How to make child of failed fork exit cleanly?
References: <4DC02339.8060606@cs.utoronto.ca> <20110503184125.GA26180@calimero.vinschen.de>

On 03/05/2011 2:41 PM, Corinna Vinschen wrote:

On May 3 11:46, Ryan Johnson wrote:

Hi all,

I'm working on some changes to fork() which would detect early the
case where a parent-child pair have unresolvable differences in
address space layout (e.g. thread stacks, heaps, or
statically-linked dlls which moved).

Detecting the problem turned out to be pretty easy, but making the
child exit cleanly is not. This leads to two questions, followed by
what I have figured out so far while attempting to answer them
myself.

1. What's the best way to make a child process notify the parent
that the fork() cannot succeed, and exit cleanly?

Usually by using some helpful status code which then can be recognized
by child_info::proc_retry.

Wonderful. I copied the example of heap.cc in calling fork_info->handle_failure(), and it works beautifully... except when the child seg faults.

Given that the cause of the fork failure is known (rather than some
surprise or bug), I propose that the messages go to some strace
channel (a new one for fork, perhaps?) and that the child exit
without attempting to generate a dump file (especially since dump

Sounds ok to me, if you're really sure that the situation is not
recoverable.

I wish it were recoverable, because it's a huge pain for applications which pull in many statically-linked dlls and then fork (emacs, gcc, python, ...). Unfortunately, I know of no way to force-unload a dll brought in by the NT loader. If one of those dlls lands in the wrong place, we're stuck even if we figure out how to get rid of whatever heap/stack/file-map was in the way.

generation itself has a tendency to cause crashes). It would also be
good, in cases where the parent is the reason for fork failures, to
prevent Windows from respawning the process so many times (though it
is admittedly handy when the child was the problem and the fork
succeeds on the nth try).

See above. That's handled in child_info::proc_retry.

I'll keep that in mind, but for now I'm leaving it alone. Technically it's always possible the fork could succeed, and I don't know how effectively I could identify a bad parent in the general case (other than seeing that fork fails repeatedly).
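
As a rough illustration of the status-code idea discussed above, here is a minimal sketch of a parent choosing between retrying and giving up based on how the child exited. Every name in it (ChildStatus, ParentAction, decide) is hypothetical and made up for this example; it is not the actual logic of child_info::proc_retry, and the notion of a distinct "layout mismatch" status is only an assumption about how a known-unrecoverable failure might be reported.

```cpp
// Hypothetical child exit statuses; the real codes are not shown in this thread.
enum class ChildStatus {
  Ok,                // fork completed in the child
  TransientFailure,  // assumed: child-side problem, a retry might succeed
  LayoutMismatch     // assumed: known address-space conflict reported by the child
};

enum class ParentAction { Proceed, Retry, GiveUp };

// Decide what the parent does after the child reports its status.
// retries_left is the number of respawn attempts still allowed.
ParentAction decide(ChildStatus status, int retries_left) {
  switch (status) {
    case ChildStatus::Ok:
      return ParentAction::Proceed;
    case ChildStatus::LayoutMismatch:
      // Treated here as not worth retrying; this is a policy choice,
      // not something the thread has settled.
      return ParentAction::GiveUp;
    case ChildStatus::TransientFailure:
      return retries_left > 0 ? ParentAction::Retry : ParentAction::GiveUp;
  }
  return ParentAction::GiveUp;  // unreachable for valid enum values
}
```

Whether a layout mismatch should really map to an immediate give-up is exactly the open question: the fork could in principle still succeed on a later attempt.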

All of this still leaves the question of
how to exit the child process "properly", though. Is it necessary to
wait for dll initialization to finish first, for example?

I'm not sure I understand the question.  How do you know which
DLL is already initialized and which isn't?

I'm talking about a call to dll_list::alloc, due to a DLL_LINK which did not map to its parent's address. At this point we know the fork has failed and there's no point continuing to try.

When this happens there are several possibilities: {windows, cygwin dll} x {DLL_LINK, DLL_LOAD} x {match or not match parent base addr}.

- We know no DLL_LOAD has been mapped, let alone initialized, but AFAICT that currently doesn't stop cygwin dll finalizers (copied over from the parent) from running. Maybe I missed something here because I'd expect this to cause far more trouble than it seems to in practice (I'm still testing the statically-linked version of my toy cygwin-breaker).
- For windows DLL_LINK, the initializers may or may not have already run, but they're supposed to load/unload correctly even if none of their dependencies are available, so I wouldn't expect any trouble from them. Incidentally, AFAICT we don't care whether the parent and child bases match. I can't find any code that checks whether such dlls loaded at the same address, and they don't end up in the dll_list.
- For cygwin DLL_LINK, the initializers did not run (because of in_forkee=1), but the initialized data has been copied over from the parent. Currently we run the finalizers regardless of whether the child's base addresses match those of the parent. Mayhem results, even for matched dlls (see my other email -- perhaps libgcc_s tries to follow invalid pointers into mismatched dlls?).
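
The case matrix above can be written down as a small lookup, which may make it easier to see which combinations are actually distinct. This is only a sketch: Owner, LinkKind and child_state are hypothetical names invented for illustration, and the returned descriptions restate the observations in this message rather than anything read out of the Cygwin sources.

```cpp
#include <string>

enum class Owner { Windows, Cygwin };
enum class LinkKind { Link, Load };  // stands in for DLL_LINK / DLL_LOAD

// Describe the state of one dll in the child at the point where the
// fork is found to have failed, per the cases listed in the message.
std::string child_state(Owner owner, LinkKind kind, bool base_matches) {
  if (kind == LinkKind::Load)
    return "not mapped";  // no DLL_LOAD has been mapped yet
  if (owner == Owner::Windows)
    // Not tracked in dll_list; base address is not compared with the parent's.
    return "mapped; initializers may have run; base not checked";
  // Cygwin DLL_LINK: initializers skipped, data copied from the parent.
  return base_matches ? "data copied; initializers skipped; base matches"
                      : "data copied; initializers skipped; base mismatch";
}
```

Of the eight combinations, only the cygwin DLL_LINK ones depend on whether the base address matched, which is where the trouble with finalizers comes from.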

In other words, if we realize partway through forking that it's not going to work, we're in trouble (and "partway" here means everything between the parent's call to CreateProcess and assigning in_forkee=false). Some dlls are inconsistent and none of their finalizers are safe to run; others appear consistent but have been poisoned by state from inconsistent dlls (libgcc_s!), so only some unknown subset of finalizers can be run safely. Still other dlls really are consistent, and it's arguably bad if we don't run their finalizers, but I don't know how to identify them.

For the moment I've just disabled all finalizers if in_forkee=1, on the premise that it's better to risk not running a valid finalizer than to risk running an invalid one. That made the access violations go away, though I still occasionally see an error closing the pinfo handle which causes the child to abort and the parent to seg fault (why???).
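
The workaround amounts to a guard of roughly the following shape. This is a toy model, not the actual patch: run_finalizers and the plain bool standing in for in_forkee are hypothetical, and real dll finalizers are not stored in a std::vector like this.

```cpp
#include <functional>
#include <vector>

// Run the registered finalizers in reverse registration order, unless the
// process is still a not-yet-complete fork child. Returns how many ran.
int run_finalizers(const std::vector<std::function<void()>>& finalizers,
                   bool in_forkee) {
  if (in_forkee)
    return 0;  // fork never completed: skip all of them, valid or not
  int ran = 0;
  for (auto it = finalizers.rbegin(); it != finalizers.rend(); ++it) {
    (*it)();
    ++ran;
  }
  return ran;
}
```

The guard is deliberately all-or-nothing: it gives up on finalizers that would have been safe in exchange for never running one against inconsistent state.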

Thoughts?
Ryan

Follow-Ups:
- Re: How to make child of failed fork exit cleanly?
  - From: Corinna Vinschen

References:
- How to make child of failed fork exit cleanly?
  - From: Ryan Johnson
- Re: How to make child of failed fork exit cleanly?
  - From: Corinna Vinschen
