This is the mail archive of the ecos-discuss@sourceware.cygnus.com mailing list for the eCos project.



TCP/IP checksum routine performance



I've been playing with various system performance issues for
the past couple days. According to the results of running
nc_test_slave, the checksum routine (in_cksum.c) is taking more
time than all of the other instrumented routines combined.
This isn't very surprising.

One should get a pretty good performance improvement by writing
a checksum routine in ARM assembly since you could use the
add-with-carry instruction to sum 32 bits at a time and the
load-multiple instruction to read 32 bytes into registers at a
time.  The timing results for a somewhat artificial test loop
that does a checksum on a chain of 4 mbufs of varying lengths:

   assembly routine:   5.04 seconds
   C routine w/ -O0:  10.62 seconds
   C routine w/ -O1:  15.95 seconds
   C routine w/ -O2:  15.95 seconds
   C routine w/ -O3:  15.95 seconds
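
For readers without an ARM assembler handy, the 32-bits-at-a-time
idea can be approximated in portable C.  This is a minimal sketch,
not the in_cksum.c code or the assembly routine timed above:
cksum32() is a hypothetical name, it works on one flat buffer
rather than an mbuf chain, and it stands in for add-with-carry by
summing into a 64-bit accumulator and folding the carries back in
at the end.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: Internet checksum (RFC 1071) over one flat
   buffer, summing 32 bits per step instead of 16. */
uint16_t cksum32(const uint8_t *p, size_t len)
{
    /* 64-bit accumulator: carries out of bit 31 are kept instead of
       lost, which is what ADC gives you in assembly. */
    uint64_t sum = 0;

    /* Main loop: read each 32-bit word big-endian so the result
       does not depend on host byte order. */
    while (len >= 4) {
        sum += ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
               ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
        p += 4;
        len -= 4;
    }

    /* Leftover 16-bit word, if any. */
    if (len >= 2) {
        sum += ((uint32_t)p[0] << 8) | p[1];
        p += 2;
        len -= 2;
    }

    /* Odd trailing byte is padded with a zero low byte. */
    if (len)
        sum += (uint32_t)p[0] << 8;

    /* Fold end-around carries down to 16 bits. */
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);

    return (uint16_t)~sum;
}
```

Folding a sum of 32-bit words gives the same answer as summing
16-bit words because 2^16 = 1 (mod 2^16 - 1), so each 32-bit word
hi*2^16 + lo contributes the same as hi + lo.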

I thought maybe the assembly routine would turn out to be a bit
more of an improvement, but it's not too bad.  I'm not sure how
well my test case matches reality or how well tuned my assembly
routine is.  For example, I could tune the assembly routine to
be optimal for the lengths of IP or TCP headers.  I'm guessing
that the performance difference between C and assembly code
would be greater with cache disabled (I think I'll try that
next.)
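
As an illustration of tuning for a fixed length, here is a hedged
C sketch for the common case of a 20-byte IPv4 header with no
options: the length loop disappears and the header is summed as
five 32-bit big-endian words.  ip_hdr_cksum20() and be32() are
hypothetical names.  It assumes the caller zeroes the checksum
field before generating a checksum; run over a received header
with the field filled in, a valid header yields 0.

```c
#include <stdint.h>

/* Read 4 bytes as a big-endian 32-bit value, independent of host
   byte order. */
static uint32_t be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Hypothetical specialization: Internet checksum of an option-less
   20-byte IPv4 header, fully unrolled (no length loop). */
uint16_t ip_hdr_cksum20(const uint8_t hdr[20])
{
    /* Five 32-bit terms; the 64-bit accumulator keeps every carry. */
    uint64_t sum = (uint64_t)be32(hdr) + be32(hdr + 4) + be32(hdr + 8) +
                   be32(hdr + 12) + be32(hdr + 16);

    /* Fold end-around carries until the sum fits in 16 bits. */
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);

    return (uint16_t)~sum;
}
```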

The really odd thing is that the code generated by gcc -O[123]
is 50% slower than the code generated with -O0.  At first I
thought there was something wrong with my test procedure, but
when I looked at the assembly language generated with
optimization on, it's indeed about 50% longer.  The C version
of the routine is an impressive piece of work considering it's
completely portable and, at -O0, takes only about twice as long
as my carefully crafted assembly language.  It does, however,
seem to confuse the gcc optimizer something awful.

So, the executive summary for ARM users is: you can probably
improve your TCP/IP performance noticeably by compiling
in_cksum.c with optimization turned off.

-- 
Grant Edwards
grante@visi.com
