This is the mail archive of the ecos-discuss@sources.redhat.com mailing list for the eCos project.



Re: accept() behaviour (out of file descriptors)


Thomas BINDER <Thomas.Binder@frequentis.com> writes:

> Unfortunately the problem has a little deeper impact. We are not
> talking about regular use of (lots of) file descriptors here. Think
> about the case where file descriptors are consumed erroneously. In
> our application the thread (telnet server) that waits for incoming
> connections would suddenly run into an endless loop and some of the
> other threads (those with lower or equal prio) would not get the CPU
> any longer. Now go ahead and find the real problem :-). Increasing
> the number of filedescriptors does not help either.

But this sort of thing should only happen during development. Anything
that causes an application to eat up all the file descriptors during
deployment is a bug.

> 
> Now, one could certainly argue that a telnet server should sleep for
> a certain period when accept fails. But what about a Web-Server
> (which we also use in a different project)? Is it a good idea to
> sleep between consecutive (failed) accepts? From a quick look at the
> eCos Web-Server I believe that this problem is also not properly
> handled there (consecutive array lookup with index -1).
>

Adding a delay to the loop while debugging the problem may allow other
threads to run and print an error message and so help you eliminate
the bug. But it does not need to be there permanently.
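
A minimal sketch of such a debugging delay, assuming a POSIX-style
socket layer with nanosleep() available. The helper name
accept_or_yield and its delay_ms parameter are hypothetical and not
part of eCos; on an eCos target without the POSIX package, the
kernel's cyg_thread_delay() could play the same role.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>
#include <time.h>

/* Hypothetical helper: make one accept() attempt. If it fails, sleep
 * for delay_ms so that threads of lower or equal priority get the CPU,
 * then return -1 with the errno from accept() preserved. */
int accept_or_yield(int listen_fd, long delay_ms)
{
    assert(delay_ms >= 0);

    int fd = accept(listen_fd, NULL, NULL);
    if (fd < 0) {
        int saved_errno = errno;
        struct timespec ts = {
            .tv_sec  = delay_ms / 1000,
            .tv_nsec = (delay_ms % 1000) * 1000000L
        };
        nanosleep(&ts, NULL);   /* give other threads a chance to run */
        errno = saved_errno;    /* report the accept() failure, not the sleep */
        return -1;
    }
    return fd;
}
```

A server loop would call this in place of a bare accept() and log
errno whenever it returns -1, which makes a descriptor leak visible
instead of starving the threads that could report it.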

> How do you suggest to use accept() in eCos?
>

I suggest you find the bug that is eating up all your file descriptors
and fix that, rather than worry about a symptom. There will always be
obscure corner cases where eCos will behave slightly differently from
Linux or BSD. This is a consequence of being an embedded OS rather
than a fully-featured general purpose OS. We have to make compromises
in things like the amount of resource we devote to certain aspects, or
the complexity of the code we use to implement them. If we made the
effort to fully duplicate the behaviour of Linux/BSD we would end up
just as large and complex.


> > I don't like that at all. It breaks the layering and would make the
> > introduction of different network stacks difficult.
>
> I am afraid I don't understand that. All network stacks use
> callbacks (into mempools) to allocate/de-allocate resources (mbufs,
> sockets). What's the catch of using a callback to allocate a file
> descriptor / pointer (as the original FreeBSD stack does)? What else
> was the FIXME originally meant for?


Those callbacks are into other parts of the same package. BSD is just
one big lump of code with very loose interfaces between modules; Linux
is even worse and doesn't seem to have any clean interfaces at all.

One of the compromises we have made in the design of eCos is to keep
the interface between the FILEIO package and network stacks
simple. The FILEIO package deals entirely with file descriptors; the
network stacks know nothing about them. This gives us the freedom to
reimplement or even eliminate code and data if we want. Moving this
knowledge down into the stack makes the interface more complex,
exposes routines that were never intended to be an API and makes the
task of porting a network stack to eCos more onerous. All of this to
fix one obscure corner case that is itself merely a symptom of a more
serious application bug.
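
The layering described above can be illustrated with a toy sketch.
This is not the FILEIO code; every name here (stack_ops, upper_accept,
fd_table, toy_accept, MAX_FDS) is hypothetical, and the sketch assumes
the upper layer reserves a descriptor before it consults the stack.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define MAX_FDS 4   /* hypothetical, deliberately tiny descriptor table */

/* What a stack exposes upward. It hands back an opaque connection and
 * never sees a descriptor number. */
struct stack_ops {
    int (*accept_conn)(void *listener, void **new_conn);
};

static void *fd_table[MAX_FDS];   /* owned solely by the upper layer */

static int fd_alloc(void)
{
    for (int i = 0; i < MAX_FDS; i++)
        if (fd_table[i] == NULL)
            return i;
    return -1;
}

/* Upper-layer accept: reserve a descriptor, then ask the stack. */
int upper_accept(const struct stack_ops *ops, void *listener)
{
    assert(ops != NULL && ops->accept_conn != NULL);

    int fd = fd_alloc();
    if (fd < 0) {
        errno = EMFILE;           /* the stack is never called */
        return -1;
    }
    void *conn = NULL;
    int err = ops->accept_conn(listener, &conn);
    if (err != 0) {
        errno = err;
        return -1;
    }
    fd_table[fd] = conn;
    return fd;
}

/* Toy stack used only for illustration: always has a connection ready. */
static int toy_calls;
static int toy_conn;

static int toy_accept(void *listener, void **new_conn)
{
    (void)listener;
    toy_calls++;
    *new_conn = &toy_conn;
    return 0;
}
```

Under that assumption, a pending connection stays queued in the stack
once the table is full, so a caller that retries immediately will fail
again each time. The stack side stays free of descriptor bookkeeping,
which is the simplicity being argued for.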


-- 
Nick Garnett                    eCos Kernel Architect
http://www.ecoscentric.com      The eCos and RedBoot experts


