This is the mail archive of the ecos-bugs@sourceware.org mailing list for the eCos project.



[Bug 1000738] Redboot networking problem


http://bugs.ecos.sourceware.org/show_bug.cgi?id=1000738


Andrew Lunn <andrew.lunn@ascom.ch> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |andrew.lunn@ascom.ch




--- Comment #1 from Andrew Lunn <andrew.lunn@ascom.ch>  2009-04-16 16:12:27 ---
There is a race condition with closing the socket and opening the next socket.

The normal code path is:

http_client.c opens the first socket and transfers data. Once finished it calls
http_stream_close() which calls __tcp_abort(). __tcp_abort() starts a timer
with a delay of 1ms. After that 1ms delay the function do_abort() is called
which sends a TCP ACK and RST packet and then unlinks the socket structure from
the linked list of sockets.

The race happens because the socket structure is a member of the static
singleton http_stream in http_client.c. What I think is happening is that after
the http_stream_close(), you are starting a second HTTP transfer before the
1ms delay has expired. This results in the http_stream->sock structure being
added to the linked list a second time while it is still on the list, messing
up the list pointers and so giving your endless loop. When you delay your next
HTTP transfer for a short while, longer than 1ms, the socket gets removed from
the list before it is added again and everybody is happy.
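
To illustrate how linking the same structure twice can produce an endless
loop, here is a minimal standalone sketch. It is not the RedBoot code: sock_t,
link_sock(), unlink_sock() and count_socks() are hypothetical names, and
head insertion into a singly linked list is an assumption about how a socket
gets linked on open.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the socket structure; only the list link matters. */
typedef struct sock {
    struct sock *next;
} sock_t;

static sock_t *sock_list = NULL;

/* Head insertion (assumed): the new entry points at the old head. */
static void link_sock(sock_t *s)
{
    assert(s != NULL);
    s->next = sock_list;
    sock_list = s;
}

/* Remove s from the list if it is present. */
static void unlink_sock(sock_t *s)
{
    sock_t **pp;

    for (pp = &sock_list; *pp != NULL; pp = &(*pp)->next) {
        if (*pp == s) {
            *pp = s->next;
            s->next = NULL;
            return;
        }
    }
}

/* Walk the list; return -1 if it has not ended after 'limit' entries. */
static int count_socks(int limit)
{
    int n = 0;
    sock_t *p;

    for (p = sock_list; p != NULL; p = p->next) {
        if (++n > limit)
            return -1;
    }
    return n;
}

/* Second open happens before the delayed unlink: s ends up pointing at itself. */
int demo_double_link(void)
{
    static sock_t s;

    sock_list = NULL;
    link_sock(&s);
    link_sock(&s);          /* s.next == &s, so a list walk never terminates */
    return count_socks(100);
}

/* Unlink completes before the second open: the list stays well formed. */
int demo_unlink_first(void)
{
    static sock_t s;

    sock_list = NULL;
    link_sock(&s);
    unlink_sock(&s);
    link_sock(&s);
    return count_socks(100);
}
```

In this sketch demo_double_link() hits the walk limit because the entry's next
pointer refers back to itself, while demo_unlink_first() sees a list of one
entry. Any code that walks the list without such a limit would spin forever in
the first case.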

How to solve this problem? __tcp_open has code like:

    // Send off the SYN packet to open the connection
    tcp_send(s, TCP_FLAG_SYN, 0);
    // Wait for connection to establish
    while (s->state != _ESTABLISHED) {
        if (s->state == _CLOSED) {
            diag_printf("TCP open - host closed connection\n");
            return -1;
        }
        if (--timeout <= 0) {
            diag_printf("TCP open - connection timed out\n");
            return -1;
        }
        MS_TICKS_DELAY();
        __tcp_poll();
    }
    return 0;

Maybe abort needs something similar:

void
__tcp_abort(tcp_socket_t *s, unsigned long delay)
{
    int timeout = 10;

    __timer_set(&abort_timer, delay, do_abort, s);

    // Wait for the timer to fire do_abort() and the socket to reach _CLOSED
    while (s->state != _CLOSED) {
        if (--timeout <= 0) {
            diag_printf("TCP close - connection failed to close\n");
            return;
        }
        MS_TICKS_DELAY();
        __tcp_poll();
    }
}
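
One thing worth checking with a wait loop of this shape is that the poll
budget is larger than the abort delay, otherwise the loop gives up before the
timer has fired. The following standalone simulation is only a sketch of that
interaction. sim_sock_t, sim_poll() and sim_abort_wait() are hypothetical
names, and it assumes each poll round advances time by one timer tick.

```c
#include <assert.h>

enum { SIM_OPEN, SIM_CLOSED };

/* Hypothetical model of a socket with a pending abort timer. */
typedef struct {
    int state;
    int ticks_left;     /* ticks until the simulated abort timer fires */
} sim_sock_t;

/* Stand-in for one delay-and-poll round: advance one tick, fire the timer
 * when it expires, which marks the socket closed. */
static void sim_poll(sim_sock_t *s)
{
    if (s->state != SIM_CLOSED && --s->ticks_left <= 0)
        s->state = SIM_CLOSED;
}

/* Same loop shape as the proposed wait: returns 0 once the socket is closed,
 * -1 if the poll budget runs out first. */
static int sim_abort_wait(sim_sock_t *s, int delay_ticks, int max_polls)
{
    s->ticks_left = delay_ticks;
    while (s->state != SIM_CLOSED) {
        if (--max_polls <= 0)
            return -1;
        sim_poll(s);
    }
    return 0;
}

/* Run the wait on a fresh socket. */
int sim_run(int delay_ticks, int max_polls)
{
    sim_sock_t s = { SIM_OPEN, 0 };

    assert(delay_ticks > 0 && max_polls > 0);
    return sim_abort_wait(&s, delay_ticks, max_polls);
}
```

Under these assumptions sim_run(1, 10) returns 0, matching the 1ms delay used
by the HTTP client, while sim_run(20, 10) returns -1 because the budget of 10
polls is used up before the 20-tick timer expires. If callers pass larger
delays, the timeout might need to be derived from the delay instead of being a
fixed value.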


It also looks like there could be a second similar race condition when the
connection breaks. The code calls __tcp_close(&s->sock) and returns. Maybe a
call to __tcp_close_wait() is needed?



