This is the mail archive of the
ecos-devel@sourceware.org
mailing list for the eCos project.
Re: NAND technical review
Ross Younger wrote:
Jonathan Larmour wrote:
I wonder if Ross has any performance data for E he could contribute?
I have done a little benchmarking and so have _some_ numbers to hand, but
the goalposts are moving and my figures are a bit old and must be treated
with caution...
On the EA LPC2468 board (Samsung K9 NAND chip), with the state of my code on
July 8, compiling with -O2 and asserts off, my NAND benchmarker reported
average page read times[*] of 3578us per page, programming 2680us, and
erasing 1848us. These stack up against the fastest-possible raw chip times
(which I computed from the "typical" times on the datasheet) of 88.5, 363.5
and 2000us.
To double check, you mean reading was slowest, programming was faster and
erasing was fastest, even apparently faster than what may be the
theoretical fastest time? (I use the term "fast" advisedly, mark).
Are you sure there isn't a problem with your driver to cause such figures? :-)
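
As a quick sanity check on the figures quoted above, the ratio of each measured time to its datasheet-derived minimum can be tabulated. This is only an illustrative sketch: the numbers are the ones quoted in this thread, and the helper name `overhead_ratio` is invented here for illustration.

```python
# Figures quoted above, in microseconds per operation.
measured_us = {"read": 3578.0, "program": 2680.0, "erase": 1848.0}
datasheet_us = {"read": 88.5, "program": 363.5, "erase": 2000.0}


def overhead_ratio(op):
    """Measured time divided by the datasheet-derived minimum for one operation."""
    return measured_us[op] / datasheet_us[op]


for op in measured_us:
    # A ratio below 1.0 means the measurement beat the "typical" datasheet time.
    print(f"{op:8s} measured/raw = {overhead_ratio(op):6.2f}")
```

Reads come out at roughly 40 times the raw chip time and programming at roughly 7 times, while the erase ratio is below 1, which is the oddity being queried.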
This led to a YAFFS throughput data rate, on a recently-erased NAND array,
of up to 480kB/s in reading and 578kB/s in writing. (Actual rates vary
depending on the size of chunk you pass to read() and write().)
I wonder if Rutger has the ability to compare with his YAFFS throughput.
OTOH, as you say, the controller plays a large part, and there's no common
ground with R so it's entirely possible no comparison can be fair for
either implementation.
The board is based on the Samsung S3C2410X ucontroller and carries the same
Samsung K9 NAND chip as on the EA LPC2468. Now, this CPU has a dedicated
NAND controller with hardware ECC... After I taught the library to use h/w
ECC I immediately saw a 46% speedup on reads and 38% on writes when compared
with software ECC. I've also added an option to do a partial loop unroll in
the read and write cycles which gives a further 4% boost on reads and 15% on
writes.
Just to be sure, are the differences measured by these percentages purely
in terms of overall data throughput per time?
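
The distinction matters because the same percentage means quite different things depending on whether it describes throughput gained or time saved. A minimal sketch of the conversion, assuming a fixed amount of data transferred (both function names are hypothetical, introduced only for this illustration):

```python
def time_saved_if_throughput_gain(gain):
    """Fraction of time saved when throughput rises by `gain` (0.46 means +46%)."""
    return 1.0 - 1.0 / (1.0 + gain)


def throughput_gain_if_time_saved(saved):
    """Fractional throughput increase when elapsed time falls by `saved`."""
    return 1.0 / (1.0 - saved) - 1.0


# The quoted 46% read figure, under each interpretation.
print(f"46% more throughput -> {time_saved_if_throughput_gain(0.46):.1%} less time")
print(f"46% less time       -> {throughput_gain_if_time_saved(0.46):.1%} more throughput")
```

Read as a throughput gain, 46% corresponds to about 32% less time per byte; read as a time reduction, it corresponds to about 85% more throughput.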
I'm very interested that the software changes you made had such a
relatively large effect on performance. If that's true, it argues against
the possibility that waiting for hardware (the NAND chip) dominates the
time (which would mean the software components of the overall time are
lost in the noise). Instead, the software latency required in setting up
the next operation can be noticeable, which was my concern with R in my
mail of 2009-10-15 to which you're replying.
The current (work-in-progress) numbers I have from the benchmarker
are 452us per page read, 623us per write and 1934us per erase; YAFFS
throughput is similarly impressive at 4690 kB/s in reads and 3432 kB/s in
writes. (Charles Manning has stated publicly several times that if you want
YAFFS to be fast, you should start by looking at the speed of your NAND driver.)
Hmm, as opposed to what though? YAFFS itself isn't able to change much.
Of course, we're not comparing apples with apples here; the S3C2410X is an
ARM9 whose CPU clock runs at 200MHz, whereas the EA LPC2468 is an ARM7TDMI
running at just 48MHz. Even so, the speed-up given by hardware ECC
demonstrates that option to be a no-brainer.
Hence my surprise at E not having support, even in principle, before! But
clearly you're at the stage where stuff is nearly working. I look forward
to a code drop, as the APIs would benefit from comparison with R's. It
looks like R has considered a variety of interesting ECC hardware so it
would be interesting to see if E's could cope.
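
For readers wondering why software ECC costs so much CPU time, the following is a deliberately simplified Hamming-style sketch. It is not the ECC layout used by E, R or YAFFS, and every name in it is hypothetical. It corrects a single flipped bit and detects a double flip in a data block, and it assumes the stored code itself is undamaged. The point is that the software path has to visit every bit of every page, which is the work a hardware ECC unit does for free during the transfer.

```python
def compute_ecc(block):
    """Return (index_xor, parity) accumulated over every set bit in the block."""
    index_xor = 0
    parity = 0
    for byte_pos, byte in enumerate(block):
        for bit in range(8):
            if (byte >> bit) & 1:
                index_xor ^= byte_pos * 8 + bit  # XOR of set-bit positions
                parity ^= 1                      # overall parity of the block
    return index_xor, parity


def check_and_correct(block, stored_ecc):
    """Return (status, data); status is 'ok', 'corrected' or 'uncorrectable'."""
    index_xor, parity = compute_ecc(block)
    syndrome = index_xor ^ stored_ecc[0]
    parity_diff = parity ^ stored_ecc[1]
    if syndrome == 0 and parity_diff == 0:
        return "ok", bytes(block)
    if parity_diff == 1 and syndrome < len(block) * 8:
        # An odd parity change means one flipped bit; the syndrome is its position.
        fixed = bytearray(block)
        fixed[syndrome // 8] ^= 1 << (syndrome % 8)
        return "corrected", bytes(fixed)
    return "uncorrectable", bytes(block)


page = bytes(range(256))          # stand-in for 256 bytes of page data
ecc = compute_ecc(page)
damaged = bytearray(page)
damaged[100] ^= 0x10              # flip one bit, as a NAND read error might
status, repaired = check_and_correct(damaged, ecc)
print(status, repaired == page)
```

Production schemes are table-driven and work a byte or word at a time rather than bit by bit, but they still touch all the data on every read and write.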
BTW: Some profiling and souping up is on my todo list, and some more
benchmarking will probably happen at that time. When I implement hardware
ECC support on the STM3210E I intend to produce some before and after numbers.
Just as an aside, you may find that improving eCos more generally to have
e.g. assembler-optimised implementations of memcpy/memmove/memset (and
possibly others) may improve the performance of these and other things
across the board. GCC's intrinsics can only do so much. (FAOD, actual
implementations to use, at least to start with, can be found in newlib.)
Jifl
--
--["No sense being pessimistic, it wouldn't work anyway"]-- Opinions==mine