This is the mail archive of the cygwin-developers@cygwin.com mailing list for the Cygwin project.

TCP connections can occasionally fail because of a winsock bug

From: Jonathan Kamens <jik at curl dot com>
To: cygwin-developers at cygwin dot com
Date: 15 Nov 2001 16:21:56 -0500
Subject: TCP connections can occasionally fail because of a winsock bug

If you run "ssh <any-host> date" over and over again with the current
openssh package from Cygwin installed, it will occasionally fail.
You're more likely to get it to happen if you run a bunch of loops,
e.g.:

  HOST=<whatever>
  while ssh $HOST date > /dev/null; do true; done &
  while ssh $HOST date > /dev/null; do true; done &
  while ssh $HOST date > /dev/null; do true; done &
  while ssh $HOST date > /dev/null; do true; done &
  while ssh $HOST date > /dev/null; do true; done &
  while ssh $HOST date > /dev/null; do true; done &
  while ssh $HOST date > /dev/null; do true; done &
  while ssh $HOST date > /dev/null; do true; done &

Eventually, all the loops will exit because they'll all encounter a
failure.  This will happen more quickly on SMP machines, but it'll
eventually happen on uniprocessor machines too (although you may need
to run more loops concurrently to make it happen).

I've dug into this deeply enough to believe that the problem is caused
by a bug in winsock.  I can get the problem to manifest itself
completely independently of Cygwin.  See the full
description in the attached program, which one of my coworkers with an
MSDN subscription is going to forward to Microsoft to see what they
have to say about it.

This problem is in no way unique to ssh.  Any program which uses
connect() to connect to remote hosts could occasionally fail.  Note,
furthermore, that the problem still occurs if the client program
doesn't bind() the socket before the connect() -- the only reason I do
the bind() in the sample program below is to make it possible to print
out the local port number after a failure.

I don't think there's anything that Cygwin can do about this.  I can't
imagine the disgustingness that would ensue if we attempted to code a
workaround for this failure.  Yech.  I'm just sending this message to
cygwin-developers as a heads-up so that if anybody encounters any
mysterious TCP connection failures with Cygwin in the future, you'll
know that there's a possibility that the failures have nothing to do
with Cygwin.

The reason why this started happening in ssh when it wasn't happening
before is this change which was checked into the OpenSSH source tree
on August 6:

   - markus@cvs.openbsd.org 2001/07/25 14:35:18
     [readconf.c ssh.1 ssh.c sshconnect.c]
     cleanup connect(); connection_attempts 4 -> 1; from 
     eivind@freebsd.org

That is, before August 6, ssh would attempt to connect four times by
default before giving up; after August 6, it only attempts to connect
once.  We should probably do something in the Cygwin ssh package to
work around the problem; I'll send a separate message to cygwin-apps
about that.
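
To make the difference between four attempts and one concrete, here is
a minimal sketch of a bounded retry loop.  It is not the OpenSSH code;
retry_attempts(), attempt_fn and flaky_attempt() are hypothetical names
made up for illustration.  In a real client, each attempt would
presumably create a fresh socket and call connect() on it, since a
socket whose connect() has failed should not be reused.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback type: returns 0 on success, nonzero on failure. */
typedef int (*attempt_fn)(void *ctx);

/*
 * Call attempt() up to max_attempts times, stopping at the first success.
 * Returns the 1-based number of the attempt that succeeded, or 0 if
 * every attempt failed.
 */
int retry_attempts(attempt_fn attempt, void *ctx, int max_attempts)
{
  int n;

  assert(attempt != NULL && max_attempts > 0);
  for (n = 1; n <= max_attempts; n++) {
    if (attempt(ctx) == 0)
      return n;
  }
  return 0;
}

/*
 * Stand-in for "socket() + connect()" (an assumption for this sketch):
 * fails while the counter that ctx points to is positive, then succeeds.
 */
int flaky_attempt(void *ctx)
{
  int *remaining = ctx;

  if (*remaining > 0) {
    (*remaining)--;
    return -1;
  }
  return 0;
}
```

With a budget of four attempts, a connection that fails once or twice
because of a bad local port still gets through; with a budget of one,
the first WSAEADDRINUSE is fatal, which matches what the ssh loops
above run into.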

  Jonathan Kamens

/*

  This program illustrates a bug in the Windows winsock layer.  This
  bug manifests itself both on Windows NT 4.0 and Windows 2000.  The
  bug is that the winsock layer is apparently willing to assign a
  local port number to a socket which is actually already in use, such
  that when the program later tries to connect() the socket to a
  remote port, the connection fails with WSAEADDRINUSE.

  When you run the program, it will keep creating and closing socket
  connections over and over again.  You will notice in the output that
  once it gets WSAEADDRINUSE for a particular local port, it is never
  given that port again by winsock; apparently, winsock keeps track
  *after the fact* of which local ports are in use, but doesn't notice
  before the fact.  Eventually, bind will fail rather than connect,
  with WSAENOBUFS rather than WSAEADDRINUSE, and then the program will
  exit.

  If you run it again immediately after it exits, you will notice that
  it ends up using many of the ports which it was previously avoiding
  using.  So, apparently, once winsock decides that a port is in use,
  it stops using it in that process even when it becomes available
  again.  It could be argued that that's a second bug, in addition to
  the first bug that it assigns an in-use port in the first place.
*/

/*

  With Visual Studio, compile this program as "cl doecho.c
  ws2_32.lib".

  With Cygwin, compile this program with "gcc -mno-cygwin -mwindows
  doecho.c -lws2_32" if you want a native Windows version or "gcc
  doecho.c -lws2_32" for a Cygwin version.  Both the native Windows
  and Cygwin versions illustrate the bug.

  With Linux, compile the program with "gcc doecho.c".  You can then
  run it essentially forever and it will never print any errors, thus
  illustrating that there is a bug in winsock that isn't in Linux.

*/

/*

  Run the program with two arguments -- an IP address in dotted quad
  notation, and a port number.

  Separately from this program, you need to run a listener on the
  specified address and port.  All the listener should do is accept
  and close connections.  Something like this Perl script:
   
  use Socket;

  ($port = shift) || die;

  socket(ACCEPTOR, AF_INET, SOCK_STREAM, 0) || die;

  $iaddr = INADDR_ANY;
  $sockaddr = sockaddr_in($port, $iaddr);

  setsockopt(ACCEPTOR, SOL_SOCKET, SO_REUSEADDR, pack("l", 1));

  bind(ACCEPTOR, $sockaddr) || die;
  listen(ACCEPTOR, SOMAXCONN) || die;

  while (accept(CLIENT, ACCEPTOR)) {
      close(CLIENT);
  }

*/

#ifndef WIN32
#ifdef _MSC_VER
#define WIN32
#endif
#endif

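#include <sys/types.h>
#include <stdio.h>
#include <stdlib.h>    /* exit(), strtol() */
#include <string.h>
#ifdef WIN32
#include <winsock2.h>
#else
#include <unistd.h>    /* close() */
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h> /* inet_addr(), htons(), ntohs() */
#endif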

#ifndef WIN32
#define wsaperror perror
#else
void wsaperror(char *);
#endif

int main(int argc, char *argv[])
{
#ifdef WIN32
  WSADATA wsaData;
#endif
  int succeeded = 0;
  struct sockaddr_in addr = {0};
  int i;
  char *endptr = NULL;

  if (argc != 3) {
    fprintf(stderr, "Must specify IP address and port arguments\n");
    exit(1);
  }

  addr.sin_family = AF_INET;
  if ((addr.sin_addr.s_addr = inet_addr(argv[1])) == INADDR_NONE) {
    fprintf(stderr, "First argument must be IP address\n");
    exit(1);
  }
  addr.sin_port = strtol(argv[2], &endptr, 10);
  if (! (*argv[2] && endptr && ! *endptr)) {
    fprintf(stderr, "Second argument must be port number\n");
    exit(1);
  }
  addr.sin_port = htons(addr.sin_port);

#ifdef WIN32
  if (WSAStartup(MAKEWORD(2, 2), &wsaData) != 0) {
    wsaperror("WSAStartup");
    exit(1);
  }
#endif
  while (1) {
    int s = socket(AF_INET, SOCK_STREAM, 0);
    struct sockaddr_in local_addr = {0};
#ifdef WIN32
    int name_len = sizeof(local_addr);
#else
    socklen_t name_len = sizeof(local_addr);
#endif

    if (s < 0) {
      wsaperror("socket");
      break;
    }

    i = 1;
    if (setsockopt(s, SOL_SOCKET, SO_REUSEADDR, (char *) &i, sizeof(i)) < 0) {
      wsaperror("setsockopt");
      break;
    }

    local_addr.sin_family = AF_INET;
    local_addr.sin_addr.s_addr = INADDR_ANY;

    if (bind(s, (struct sockaddr *) &local_addr, sizeof(local_addr)) < 0) {
      wsaperror("bind");
      break;
    }

    if (getsockname(s, (struct sockaddr *) &local_addr, &name_len) < 0) {
      wsaperror("getsockname after bind");
      break;
    }

    if (connect(s, (struct sockaddr *) &addr, sizeof(addr)) < 0) {
      fprintf(stderr, "Local port failed: %d\n", ntohs(local_addr.sin_port));
      wsaperror("connect");
      continue;
    }

    succeeded++;

    if (getsockname(s, (struct sockaddr *) &local_addr, &name_len) < 0) {
      wsaperror("getsockname");
      break;
    }

    if (send(s, "foo\n", 4, 0) < 0) {
      wsaperror("send");
      break;
    }

/*      shutdown(s, 2); */

#ifdef WIN32
    closesocket(s);
#else
    close(s);
#endif

    fprintf(stderr, "%d\n", ntohs(local_addr.sin_port));
    fflush(stderr);
  }

  fprintf(stderr, "%d connections succeeded\n", succeeded);

#ifdef WIN32
  WSACleanup();
#endif
  exit(1);
}

#ifdef WIN32
void wsaperror(char *str) 
{
  int err = WSAGetLastError();
  char *errstr;

  switch (err) {
  case WSAEADDRINUSE:
    errstr = "WSAEADDRINUSE";
    break;
  case WSAENOBUFS:
    errstr = "WSAENOBUFS";
    break;
  default:
    errstr = "Unknown error";
    break;
  }
  
  fprintf(stderr, "%s: Error %d (%s)\n", str, err, errstr);
}
#endif
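
For anyone without Perl handy, the listener described in the program's
comments could be sketched in C roughly as follows.  This is only a
sketch for POSIX-style systems (Linux or Cygwin), not something from
the original test setup; open_listener() and accept_and_close() are
hypothetical names.

```c
#include <assert.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/*
 * Create a TCP socket listening on all interfaces on the given port.
 * Returns the listening descriptor, or -1 on failure.
 */
int open_listener(unsigned short port)
{
  struct sockaddr_in sa;
  int one = 1;
  int fd = socket(AF_INET, SOCK_STREAM, 0);

  if (fd < 0)
    return -1;

  /* Same SO_REUSEADDR setting as in the Perl script. */
  setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &one, sizeof(one));

  memset(&sa, 0, sizeof(sa));
  sa.sin_family = AF_INET;
  sa.sin_addr.s_addr = htonl(INADDR_ANY);
  sa.sin_port = htons(port);

  if (bind(fd, (struct sockaddr *) &sa, sizeof(sa)) < 0 ||
      listen(fd, SOMAXCONN) < 0) {
    close(fd);
    return -1;
  }
  return fd;
}

/* Accept connections and close each one at once, until accept() fails. */
void accept_and_close(int fd)
{
  int client;

  assert(fd >= 0);
  while ((client = accept(fd, NULL, NULL)) >= 0)
    close(client);
}
```

A small main() would parse the port argument, pass it to
open_listener(), and hand the resulting descriptor to
accept_and_close().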

Follow-Ups:
- Re: TCP connections can occasionally fail because of a winsock bug
  - From: robert bowman
