This is the mail archive of the ecos-bugs@sourceware.org mailing list for the eCos project.
[Bug 1000738] Redboot networking problem
- From: bugzilla-daemon at ecoscentric dot com
- To: ecos-bugs at ecos dot sourceware dot org
- Date: Thu, 16 Apr 2009 16:12:30 +0100
- Subject: [Bug 1000738] Redboot networking problem
- References: <bug-1000738-13@http.bugs.ecos.sourceware.org/>
http://bugs.ecos.sourceware.org/show_bug.cgi?id=1000738
Andrew Lunn <andrew.lunn@ascom.ch> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |andrew.lunn@ascom.ch
--- Comment #1 from Andrew Lunn <andrew.lunn@ascom.ch> 2009-04-16 16:12:27 ---
There is a race condition between closing one socket and opening the next.

The normal code path is:
http_client.c opens the first socket and transfers data. Once finished, it calls
http_stream_close(), which calls __tcp_abort(). __tcp_abort() starts a timer
with a delay of 1ms. After that 1ms delay, the function do_abort() is called,
which sends a TCP ACK and RST packet and then unlinks the socket structure from
the linked list of sockets.

The race happens because the socket structure is a member of the static
singleton http_stream in http_client.c. What I think is happening is that after
http_stream_close() you start a second HTTP transfer before the 1ms delay has
expired. The http_stream->sock structure is then added to the linked list a
second time, which corrupts the list pointers and gives your endless loop. When
you delay the next HTTP transfer for a little longer than 1ms, the socket is
removed from the list before it is added again, and everybody is happy.
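
To illustrate why linking the same structure twice can produce an endless loop,
here is a minimal stand-alone sketch. It is not the RedBoot code: struct sock,
sock_list, link_sock(), unlink_sock() and the two relink_*() helpers are
hypothetical names, and it assumes a singly linked list where new entries are
pushed onto the head without checking whether they are already linked.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a socket list entry. */
struct sock {
    int id;
    struct sock *next;
};

static struct sock *sock_list = NULL;

/* Push onto the head of the list; no check for "already linked". */
static void link_sock(struct sock *s)
{
    assert(s != NULL);
    s->next = sock_list;
    sock_list = s;
}

/* Remove s from the list if it is present. */
static void unlink_sock(struct sock *s)
{
    struct sock **pp;

    for (pp = &sock_list; *pp != NULL; pp = &(*pp)->next) {
        if (*pp == s) {
            *pp = s->next;
            s->next = NULL;
            return;
        }
    }
}

/* Walk the list for at most `limit` steps.
 * Returns 1 if the end was reached, 0 if the walk was cut off. */
static int list_terminates(int limit)
{
    struct sock *p = sock_list;

    while (p != NULL && limit-- > 0)
        p = p->next;
    return p == NULL;
}

/* Racy order: the second link happens before the deferred unlink. */
int relink_before_unlink(void)
{
    static struct sock singleton = { 1, NULL };

    sock_list = NULL;
    singleton.next = NULL;
    link_sock(&singleton);
    link_sock(&singleton);      /* singleton.next now points at itself */
    return list_terminates(1000);
}

/* Safe order: the unlink has run before the structure is reused. */
int relink_after_unlink(void)
{
    static struct sock singleton = { 1, NULL };

    sock_list = NULL;
    singleton.next = NULL;
    link_sock(&singleton);
    unlink_sock(&singleton);
    link_sock(&singleton);
    return list_terminates(1000);
}
```

In this sketch relink_before_unlink() returns 0, because the single entry ends
up pointing at itself and any unbounded walk of the list would never finish,
while relink_after_unlink() returns 1.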

How to solve this problem? __tcp_open has code like:

    // Send off the SYN packet to open the connection
    tcp_send(s, TCP_FLAG_SYN, 0);
    // Wait for connection to establish
    while (s->state != _ESTABLISHED) {
        if (s->state == _CLOSED) {
            diag_printf("TCP open - host closed connection\n");
            return -1;
        }
        if (--timeout <= 0) {
            diag_printf("TCP open - connection timed out\n");
            return -1;
        }
        MS_TICKS_DELAY();
        __tcp_poll();
    }
    return 0;

Maybe abort needs something similar:

    void
    __tcp_abort(tcp_socket_t *s, unsigned long delay)
    {
        int timeout = 10;

        __timer_set(&abort_timer, delay, do_abort, s);
        while (s->state != _CLOSED) {
            if (--timeout <= 0) {
                diag_printf("TCP close - connection failed to close\n");
                return;
            }
            MS_TICKS_DELAY();
            __tcp_poll();
        }
    }

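
The effect of waiting inside abort can be tried out in isolation with a small
simulation. This is only a sketch under stated assumptions, not RedBoot code:
every sim_* name is hypothetical, one call to sim_poll() stands in for one
MS_TICKS_DELAY() plus __tcp_poll() step, and the deferred callback is assumed to
both close and unlink the socket.

```c
#include <stddef.h>

enum sim_state { SIM_ESTABLISHED, SIM_CLOSED };

struct sim_sock {
    enum sim_state state;
    int linked;                 /* 1 while on the imaginary socket list */
};

/* One pending deferred action, standing in for the abort timer. */
static struct sim_sock *pending = NULL;
static int ticks_left = 0;

/* Stand-in for the timer callback: close and unlink the socket. */
static void sim_do_abort(struct sim_sock *s)
{
    s->state = SIM_CLOSED;
    s->linked = 0;
}

/* Stand-in for one delay-and-poll step: advance time by one tick. */
void sim_poll(void)
{
    if (pending != NULL && --ticks_left <= 0) {
        struct sim_sock *s = pending;

        pending = NULL;
        sim_do_abort(s);
    }
}

/* Fire-and-forget abort: returns while the socket is still linked. */
void sim_abort_async(struct sim_sock *s, int delay)
{
    pending = s;
    ticks_left = delay;
}

/* Blocking abort: polls until the socket is closed or the retry
 * budget runs out. Returns 0 on success, -1 on timeout. */
int sim_abort_wait(struct sim_sock *s, int delay)
{
    int timeout = 10;

    sim_abort_async(s, delay);
    while (s->state != SIM_CLOSED) {
        if (--timeout <= 0)
            return -1;
        sim_poll();
    }
    return 0;
}
```

With a delay of 1 tick, sim_abort_wait() returns 0 and the socket is already
unlinked when the caller gets control back, so it can be reused straight away.
With a delay of 20 ticks it returns -1 and the socket is still linked, which
suggests that in this model the retry budget has to be larger than the timer
delay for the wait to help.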
It also looks like there could be a second similar race condition when the
connection breaks. The code calls __tcp_close(&s->sock) and returns. Maybe a
call to __tcp_close_wait() is needed?