This is the mail archive of the cygwin-developers@cygwin.com mailing list for the Cygwin project.



Re: TCP connections can occasionally fail because of a winsock bug

On Thu, Nov 15, 2001 at 08:00:18PM -0700, robert bowman wrote:
> On Thursday 15 November 2001 14:21, you wrote:
> > I've dug deeply enough into this to determine that I believe the
> > problem is caused by a bug in winsock.  I can get the problem to
> > manifest itself completely independently from Cygwin.  See the full
> > description in the attached program, which one of my coworkers with an
> > MSDN subscription is going to forward to Microsoft to see what they
> > have to say about it.
> 
> For what it's worth, we recently encountered this problem in the ONC RPC 
> library. The original Sun code, and any revision I've been able to find, 
> binds a local port even on the TCP protocol. The same thing happens, with the 
> bind not failing, and the failure occurring on the connect. 
> 
> We depend on RPC heavily, and would see delays on startup when the initial 
> clnt_create would fail repeatedly. The RPC attempts to use a pool of local 
> ports, and will increment and retry if the bind fails -- but it doesn't.
> 
> This is not a cygwin issue; we are using the MKS/DataFocus NutCracker 
> toolkit. DataFocus provided the ported ONC RPC code but does not support it.  
> We have been tinkering with it in-house. The bind can be eliminated for some 
> improvement, in this case. 
> 
> There are other issues we are dealing with. I've forwarded a couple of the 
> emails to another programmer at work who is also working on NT/2000 socket 
> issues.
> 
> Interestingly enough, on Linux, the bind also fails unless the process has 
> root privileges. However, the code only iterates on EADDRINUSE and the return 
> is not checked, so the connect succeeds. 
> 
> I also wrote a native testcase with the WSA calls and got the same results. 
> I did note that the OS expires the port eventually, but it takes 5 to 20 
> minutes. 
> 
> I believe the root of the problem is that both the remote host address and 
> local port are used to determine if the connection is unique. bind would fail 
> if anything other than INADDR_ANY is used, so at the time of the bind it isn't 
> known if the combination is unique. Only once the host address is known, in 
> connect, can the combination fail.
> 
> Our problem was exacerbated by the fact several apps are typically started at 
> the same time on one station, and they are all trying to make RPC connections 
> to the server machine. The ONC RPC algo uses the pid to calculate which port 
> to try first; with several clients starting and making several connections, 
> there would be groups of used ports; if a connection timed out, and the next 
> attempt moved into a cluster of ports being used by another app, the 
> clnt_create would fail many times, before it finally iterated into fresh 
> territory.
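
A minimal Python sketch of the retry loop described above, under stated
assumptions: the name connect_from_port_pool, the port range starting at
40000, and the pool size of 200 are hypothetical, and unprivileged ports are
used rather than whichever ports the ONC RPC code actually tries.  Unlike the
behaviour described in the quoted message, it treats an address-in-use error
from connect() the same way as one from bind() and moves on to the next port.

```python
import errno
import os
import socket


def connect_from_port_pool(server, first_port=40000, pool_size=200):
    """Try local ports from a pool until both bind() and connect() succeed.

    first_port and pool_size are illustrative values, not taken from ONC RPC.
    """
    # The pid picks the starting offset, as the quoted message describes.
    start = os.getpid() % pool_size
    for i in range(pool_size):
        port = first_port + (start + i) % pool_size
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            # Bind to the wildcard address: the remote host is not known yet,
            # so a (local port, remote address) clash cannot be detected here.
            sock.bind(("", port))
            # The clash, if any, can only surface once the remote is known.
            sock.connect(server)
            return sock
        except OSError as exc:
            sock.close()
            # Retry only on "address in use"-style errors; re-raise the rest.
            if exc.errno not in (errno.EADDRINUSE, errno.EADDRNOTAVAIL):
                raise
    raise OSError(errno.EADDRINUSE, "no usable local port in the pool")
```

Checking the result of connect() as well as bind() is the point of the
sketch; whether that is enough to avoid the long startup delays would need
testing on the affected systems.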

Thanks for that interesting description.  There's the SO_REUSEADDR
option to setsockopt().  I wonder if that could help.  It's
considered somewhat dangerous, though.
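
For reference, a minimal Python sketch of setting that option before the
bind.  The helper name make_reusable_socket is hypothetical, and whether the
option actually avoids the connect failure discussed in this thread is
untested here.  One reason for caution is that on Windows SO_REUSEADDR can
allow a second socket to bind a port that another socket already holds.

```python
import socket


def make_reusable_socket(local_port):
    """Create a TCP socket with SO_REUSEADDR set, bound to local_port."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # The option must be set before bind() to have any effect on the bind.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    # Wildcard address; passing port 0 lets the OS choose a free port.
    sock.bind(("", local_port))
    return sock
```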

Corinna

-- 
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Developer                                mailto:cygwin@cygwin.com
Red Hat, Inc.

