This is the mail archive of the cygwin@cygwin.com mailing list for the Cygwin project.



Catalog of TCP socket problems


During the last year a number of TCP socket problems have been
reported, mainly on Win98/ME. This message catalogs them
and gives references to the discussion threads.
The second part discusses some proposed solutions.

Win98/ME
1) CLOSE_WAIT / WSAENOBUFS
http://support.microsoft.com/default.aspx?scid=kb;EN-US;q229658
   Application level fix:  fcntl("close on fork")
http://cygwin.com/ml/cygwin-patches/2002-q2/msg00039.html
   Cygwin level fix:       Corinna's socket/pid bookkeeping
http://cygwin.com/ml/cygwin-patches/2002-q2/msg00049.html

2) Steve Chew ssh -R / persisting listen sockets 
http://sources.redhat.com/ml/cygwin/2002-04/msg00515.html
   Application level fix: make socket blocking before close
   Cygwin level fix:      make socket blocking before close
http://cygwin.com/ml/cygwin-patches/2002-q2/msg00107.html

3) Unexpected exit from ssh or other "forked workers"
http://cygwin.com/ml/cygwin-patches/2002-q2/msg00102.html
   Application level fix:  fcntl("close on fork")
   Cygwin level fix: (???) do not duplicate "listen" sockets after 
                           an accept() has succeeded

4) Jonathan Kamens (below), with an extra read() hanging while waiting for EOF
http://cygwin.com/ml/cygwin-patches/2002-q2/msg00117.html
   Application level fix:  shutdown()
   Cygwin level fix:       Corinna's socket/pid bookkeeping

5) Steve Chew ssh -R when no server is present
http://sources.redhat.com/ml/cygwin/2002-04/msg00515.html
   Fix:                    ????????

NT
1) Jonathan Kamens socketpair() / linger on close hack
http://cygwin.com/ml/cygwin/2001-07/msg00758.html
   Application level fix:  shutdown()
http://cygwin.com/ml/cygwin/2001-07/msg00815.html
   Cygwin level fix:       Corinna's socket/pid bookkeeping

2) Apache CLOSE_WAIT
http://sources.redhat.com/ml/cygwin/2001-10/msg01171.html
    Fix:                    ???????

**********************************************************************
As discussed in http://cygwin.com/ml/cygwin/2001-07/msg00815.html,
the best solution to NT problem #1 and Win98/ME problem #4 is to have
Cygwin issue shutdown() on the last close(). This was dismissed for now.
The "bookkeeping" solution is based on processes and may be easier
to implement. It also helps on Win98/ME. Its drawback is that a read()
waiting for EOF returns only when all processes holding a copy of the
socket have exited, not when the last close() occurs.
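To illustrate the application level shutdown() fix listed in the
catalog, here is a minimal sketch. The helper name
close_with_shutdown() is made up for this example and is not part of
Cygwin. It assumes the application knows it is the last writer, because
shutdown() affects every copy of the socket, including copies held by
other processes.

```c
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: signal EOF to the peer explicitly, then
 * release the descriptor. Returns 0 on success, -1 if either
 * shutdown() or close() failed. */
int close_with_shutdown(int fd)
{
    /* After SHUT_WR the peer's read() sees EOF even if another
     * process still holds a copy of this socket. */
    int rc = shutdown(fd, SHUT_WR);

    if (close(fd) < 0)
        rc = -1;
    return rc;
}
```

A writer would call close_with_shutdown(fd) where it previously called
close(fd). The reader waiting for EOF then no longer depends on every
inherited copy of the socket being closed.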

As I see it, there are three main cases to consider in a bookkeeping
solution, depending on how much interprocess communication is required.

1) PID_A is a long-lived process. It opens a socket and forks PID_B.
PID_B forks other processes.
When PID_B exits, all its subprocesses have already terminated.
In that case it is enough for Cygwin in PID_A to really close
the socket when PID_B terminates, if it has already been close()d
in PID_A.
This can be accomplished without changes to the Cygwin interprocess
communication mechanism, only local bookkeeping is required.
It probably covers 90% of the applications (sshd, inetd (I think), 
qpopper, Jonathan Kamens example...).
Looks like an excellent benefit/work ratio.
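The local bookkeeping for case 1 could look roughly like the sketch
below. This is only an illustration of the idea, not Corinna's patch:
struct sock_record and the functions record_close() and
record_child_exit() are names invented here, and a real implementation
would live inside Cygwin's fd handling and track more than one child.

```c
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical per-socket record kept locally in PID_A. */
struct sock_record {
    int   fd;          /* underlying socket descriptor */
    pid_t child;       /* PID_B, which inherited a copy at fork() */
    int   app_closed;  /* application already asked for close() */
    int   child_done;  /* PID_B has terminated */
    int   released;    /* the real close() has been issued */
};

/* Issue the real close() once both conditions hold. */
static void maybe_release(struct sock_record *r)
{
    if (r->app_closed && r->child_done && !r->released) {
        close(r->fd);
        r->released = 1;
    }
}

/* Called where the application calls close(): only note the request. */
void record_close(struct sock_record *r)
{
    r->app_closed = 1;
    maybe_release(r);
}

/* Called when PID_A learns that a child has exited, e.g. after
 * waitpid() returns its pid. */
void record_child_exit(struct sock_record *r, pid_t pid)
{
    if (pid == r->child) {
        r->child_done = 1;
        maybe_release(r);
    }
}
```

The order of the two events does not matter: whichever arrives second
triggers the real close. No message from PID_B is needed, which is why
this case requires no change to interprocess communication.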

2) Same as in 1), but some subprocesses are still running (with
ppid = 1) when PID_B exits. I see two solutions:
2a) PID_B does something like reparenting the subprocesses, making
PID_A wait for them and close the socket after they terminate.
2b) PID_B signals to PID_A that it has logically exited, but keeps
running in an "angel" state until the subprocesses are done.

3) PID_A exits while PID_B is still alive. If so, some
kind of "angel" state would be necessary.
By the way, I just tried that on WinME. PID_A does socketpair(),
and children B & C use it. Bug Win98/ME #4 occurs as expected.
In addition, if the parent is gone when a child calls write(),
the child gets "Socket operation on non-socket". But if the parent is
gone and the reader has closed its (useless) write socket, then the
write() succeeds. The list above is already incomplete :(

Regarding a solution to Win98/ME #2, I think the easiest is to split
the Cygwin socket close into two cases:
a) NT: keep "linger on close" for now, it helps with #1.
b) Win98/ME: set the socket to blocking, if it is not already.
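At the application level, "make socket blocking before close" can be
sketched with plain fcntl() calls as below. The helper names
make_blocking() and close_blocking() are invented for this example.
How Cygwin would do the same thing internally on top of Winsock is not
shown here.

```c
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: clear O_NONBLOCK if it is set.
 * Returns 0 on success, -1 on error. */
int make_blocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);

    if (flags < 0)
        return -1;
    if (!(flags & O_NONBLOCK))
        return 0;                      /* already blocking */
    return fcntl(fd, F_SETFL, flags & ~O_NONBLOCK) < 0 ? -1 : 0;
}

/* Hypothetical helper: switch to blocking mode, then close.
 * A failure to change the mode is ignored so that the descriptor
 * is always released. */
int close_blocking(int fd)
{
    (void)make_blocking(fd);
    return close(fd);
}
```

An application hit by the persisting listen socket problem would call
close_blocking(listen_fd) instead of close(listen_fd).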

Finally, having Cygwin work around MS bugs is much better than having
applications do it. However, if Cygwin doesn't do it, having hooks
(e.g. "close on fork") to fix applications is better than nothing.
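Without such a hook, an application can approximate "close on fork" by
hand: the child drops its inherited copy of the socket immediately
after fork(). The sketch below shows that idea only. fork_closing() is
a name invented here, and it does not reproduce the interface of the
hook proposed in the patches thread.

```c
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: fork(), and in the child immediately close a
 * descriptor the child does not need (e.g. the listen socket in a
 * forked worker). Returns fork()'s result unchanged. */
pid_t fork_closing(int unwanted_fd)
{
    pid_t pid = fork();

    if (pid == 0)
        close(unwanted_fd);   /* child: drop the inherited copy */
    return pid;
}
```

A server would call fork_closing(listen_fd) after accept() so that the
worker never keeps the listen socket open.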

Pierre




