This is the mail archive of the ecos-devel@sourceware.org mailing list for the eCos project.



Re: NAND technical review


Hi Ross,

First, thanks very much for all this. There's quite a bit to digest, but only because it's extremely useful. Sorry for the number of questions I have; they're not meant to be inquisitorial, but obviously I need to get to the bottom of certain issues.

I've added Rutger to the CC as he may be able to comment on some of the issues I raise.

You can assume tacit acceptance/understanding of whatever I haven't commented on.

Ross Younger wrote:
Here goes with a comparison between the two in something close to their
current states (my 26/08 push to bugzilla 1000770, and Rutger's r659).

FWIW, Rutger is now up to r666.


However, not all chips are quite the same. The ONFI initiative is an attempt
to standardise chip protocols and most new chips should comply with it. A
number of chips on the market are _nearly_ ONFI-compliant: deviations
typically occur over the format of the ReadID response and that of an
address. I believe that older chips did their own thing entirely.

Good ONFI support should be the highest priority as that's the way everything is likely to go, although we do need the others too. OTOH, my experience of NOR flash chip interfaces is that standard specs are all well and good, but manufacturers still like to add their own touches. So I suspect ONFI will probably correspond to a common subset of functionality, but more would need to be done to improve support for individual chips in due course.


It can be beneficial to be able to set up the ready/busy line as an
interrupt source, as opposed to having to poll it. Whilst there is an
overhead involved in context-switching, if other application threads have
much to do it may be advantageous overall for the thread waiting for the
NAND to sleep until woken by interrupt.

Personally I would expect use as an interrupt line as the main role of the ready line.


Of course, it is possible to put multiple chips on a board. In that case
there needs to be a way to route between them; I would expect this to be
done with the Chip Select line, addressed either by different MMIO addresses
or a separate GPIO or CPLD step. Theoretically, multiple chips could be
hooked up in parallel to give something that looks like a 16 or 32-bit
"wide" chip, but I have never encountered this in the NAND world, and it
would impose a certain extra level of complexity on the driver.

Have you found that on-chip (SoC) NAND controllers permit such a configuration? If not, I would assume that it's not an expected hardware configuration. Rutger's layer does allow multiple chips per controller, but AFAICT that's just in the straightforward way.


What problems would you see, if any, using your layer with the same controller and two completely different chips, of different geometry? Can you still have a common codebase with other (different) platforms?

Is anyone aware of NAND chips with different sized blocks? Analogous to bootblocks with NOR (I haven't, but others will undoubtedly have seen more parts than I). Although it's possible that even if they're not around or common now, they may be in future. Unfortunately from what I can tell neither layer would be able to support that directly, although I think it may be possible for the eCosCentric layer to allow the driver to pretend there is a different NAND chip. Do you think so too?

2. Application interface -----------------------------------------------

Both layers have broadly similar application interfaces.

In both layers, an application must first use a `lookup' call which provides
a pointer to a device context struct. In Rutger's layer, devices are
identified by device number; in eCosCentric's, by a textual name set in the
board HAL.

A device number does seem to be a bit limiting, and less deterministic. OTOH, a textual name arguably adds a little extra complexity.


I note Rutger's layer needs an explicit init call, whereas yours DTRT using a constructor, which is good.

The basic operations required are reading a page, programming a page and
erasing a block, and both layers provide these.

However I believe Rutger's supports partial page writes (use of 'column'), whereas I don't believe eCosCentric's does.


The page-oriented operations optionally allow read/write of the page spare
area. These operations also automatically calculate and check an ECC, if the
device has been configured to do so. Rutger's layer has an extra hook in
place where an application may explicitly request the use of cached reading
and writing where the device supports this.

That seems like a useful potential optimisation, exploiting underlying capabilities. Any reason you didn't implement this?


I could also believe that NAND controllers can optimise by doing multiple block reads, where this hint would prove useful too.

Both layers also support the necessary ancillary operations of querying the
status of a block in the bad-block table, and marking a block as bad.

Does your implementation _require_ a BBT in its current implementation? For simpler NAND usage, it may be overkill e.g. an application where the number of rewrites is very small, so the factory bad markers may be considered sufficient.
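
As a rough illustration of the simpler, BBT-free usage described above, a factory-marker check could be sketched as below. The function name is hypothetical and belongs to neither layer. The marker position is an assumption based on the common x8 convention (spare byte 0 on large-page parts, spare byte 5 on small-page parts, read from the first page of the block); some parts use the second or last page instead, so the datasheet has the final word.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper, not part of either NAND layer.
 * `spare` holds the spare (OOB) bytes read from the first page of a block.
 * Returns 1 if a factory bad marker is present, 0 if the block looks good,
 * and -1 if too few spare bytes were supplied to decide.
 */
int spare_has_factory_bad_marker(const uint8_t *spare, size_t spare_len,
                                 uint32_t page_size)
{
    /* Assumed convention: byte 5 for small-page (<= 512 bytes), else byte 0. */
    size_t marker = (page_size <= 512) ? 5 : 0;

    if (spare == NULL || spare_len <= marker)
        return -1;

    /* A good block leaves the marker byte erased (0xFF). */
    return (spare[marker] != 0xFF) ? 1 : 0;
}
```

An application with very few rewrites could run such a check once per block at startup and skip marked blocks, at the cost of not tracking blocks that go bad in the field.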


(a) Partitions
[snip]
R's interface does not have such a facility. It appears that, in the event
that the flash is shared between two or more logical regions, it's up to
higher-level code to be configured with the correct block ranges to use.

In yours, the block ranges must be configured in CDL. Is there much difference? I can see an advantage in writing platform-independent test programs. But in applications within products possibly less so. Especially since the flash geometry, including size, can be programmatically queried.


If there was to be a single firmware supporting multiple board revisions/configurations (as can definitely happen), which could include different sizes of NAND, I think R's implementation would be able to adapt better than E's, as the high-level program can divide up the sizes based on what it sees.
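
To make the single-firmware scenario concrete, here is a minimal sketch of a high-level program dividing up the device from the size it detects at run time. All names (`part_range`, `split_device`) are hypothetical and do not come from either layer; the total block count is assumed to be whatever the layer's geometry query reports.

```c
#include <stdint.h>

/* Hypothetical description of a contiguous run of erase blocks. */
struct part_range {
    uint32_t first_block;
    uint32_t num_blocks;
};

/*
 * Reserve a fixed-size boot region at the start of the device and give
 * everything else to the filesystem, whatever size the fitted chip has.
 * Returns 0 on success, -1 if the device is too small for the boot region.
 */
int split_device(uint32_t total_blocks, uint32_t boot_blocks,
                 struct part_range *boot, struct part_range *fs)
{
    if (boot_blocks == 0 || boot_blocks >= total_blocks)
        return -1;

    boot->first_block = 0;
    boot->num_blocks  = boot_blocks;
    fs->first_block   = boot_blocks;
    fs->num_blocks    = total_blocks - boot_blocks;
    return 0;
}
```

The same image would then give a 1024-block part a 1008-block filesystem region and a 2048-block part a 2032-block one, with no CDL change. The sketch ignores bad blocks inside the boot region, which a real product would have to allow for.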

(b) Dynamic memory allocation

R's layer mandates the provision of malloc and free, or compatible
functions. These must be provided to the cyg_nand_init() call.

That's unfortunate - that limits its use in smaller boot loaders - a key application.


E's doesn't; instead it declares a small number of static buffers.

I assume everything is keyed off CYGNUM_NAND_PAGEBUFFER, and there are no other variables. Again I'm thinking of the scenario of single firmware - different board revs. Can you confirm?


Andrew Lunn opined on 6/3/09 that R's requirement for malloc is not a major
issue because the memory needs of that layer are well-bounded; I think I
broadly agree, though the situation is not ideal in that it forces somebody
who wants to use a lean, mean eCos configuration to work around.

The overhead of including something like malloc/free in the image may compare badly with the amount of memory R's needs to allocate in the first place. I also note that if R's implementation has program verifies enabled it allocates and frees a page _every_ time. If nothing else this could lead to heap fragmentation.
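
For comparison, a program verify that avoids the heap altogether could be sketched along these lines. This is an illustration under stated assumptions, not code from either layer: `PAGE_BYTES`, `page_read_fn`, `verify_page` and the RAM-backed `fake_read` stand-in are all hypothetical names, and the page size is assumed to have a compile-time upper bound.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PAGE_BYTES 2048            /* assumed upper bound on the page size */

/* One static read-back buffer: no malloc/free per programmed page. */
static uint8_t verify_buf[PAGE_BYTES];

/* Hypothetical driver hook: read `len` bytes of `page`; 0 on success. */
typedef int (*page_read_fn)(uint32_t page, uint8_t *dst, size_t len);

/* Returns 0 if the page matches, 1 on mismatch, -1 on error. */
int verify_page(page_read_fn rd, uint32_t page,
                const uint8_t *expected, size_t len)
{
    if (len > sizeof verify_buf)
        return -1;
    if (rd(page, verify_buf, len) != 0)
        return -1;
    return (memcmp(verify_buf, expected, len) == 0) ? 0 : 1;
}

/* RAM-backed stand-in for a real page read, for illustration only. */
uint8_t fake_page[PAGE_BYTES];

int fake_read(uint32_t page, uint8_t *dst, size_t len)
{
    (void)page;                    /* the stand-in models a single page */
    memcpy(dst, fake_page, len);   /* caller guarantees len <= PAGE_BYTES */
    return 0;
}
```

The trade-off is that the buffer is permanently resident and shared, so calls would need to be serialised by whatever lock already protects the device.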


OTOH your implementation doesn't support program verifies in the higher level anyway (I note your code comment about it being unnecessary as the device should report a successful program - your faith in correct hardware behaviour is considerable :-) ).

Also note that if you're going to run a full file system like YAFFS, you
can't avoid needing malloc, but in an application making simpler use of
NAND, it's an overhead that you may prefer to avoid.

It's true that YAFFS is likely to be the most common application though.


3. Driver model --------------------------------------------------------

[snip]

In eCosCentric's layer, a NAND driver is a single abstraction covering chip init and querying the factory-bad status as well as the high level functions (reading a page, etc). It is left to the driver to determine the sequence of commands to send. How the driver interacts with the device is considered to be a contract only between the driver and the relevant platform HAL, so is not formally abstracted by the NAND layer.

Indeed it's not dissimilar to the existing NOR flash layer.


- R's model shares the command sequence logic amongst all chips,
differentiating only between small- and large-page devices. (I do not know
whether this is correct for all current chips, though going forwards seems
less likely to be an issue as fully-ONFI-compliant chips become the norm.)

Hmm. Nevertheless, this is a concern for me with R's. I'm concerned it may be too prescriptive to be robustly future-proof.


If multiple chips of different types are present in a build, E's model
potentially duplicates code (though this could be worked around; also, an
ONFI driver ought to be written).

Worked around in a way likely to increase single-device footprint though. Shame about the lack of an ONFI driver, although I guess the parts still aren't widely used, which can't help. The Samsung K9 is close at least.


- A corollary of arguably inconsequential import: R's model forces the synth
driver to emulate an entire NAND chip and its protocol. E's synth doesn't
need to.

One could say that makes it a more realistic emulation. But yes I can see disadvantages with a somewhat rigid world view. Thinking out loud, I wonder if Rutger's layer could work with something like Samsung OneNAND.


- E's high-level driver interface makes it harder to add new functions
later, necessitating a change to that API (H2 above). R's does not; the
requisite logic would only need to be added to the ANC. It is not thought
that more than a handful such changes will ever be required, and it may be
possible to maintain backwards compatibility. (As a case in point, support
for hardware ECC is currently work-in-progress within eCosCentric, and does
require such a change, but now is not the right time to discuss that.)

In my view allowing hardware ECC support is a vital part of an API. If an API doesn't permit exploiting hardware ECC that would be quite a negative. R's does appear to. OTOH I can't imagine it being a difficult thing to add in yours. In fact, because of the requirement for the drivers to call CYG_NAND_FUNS, it doesn't seem difficult at all to be backwardly compatible. Am I right? Nevertheless, it would be unfortunate to have an API which already needs its low level driver interface updating to a rev 2.
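
The reasoning about backward compatibility can be sketched as follows. To be clear, this is not the real CYG_NAND_FUNS macro or the real function table, whose definitions are not reproduced here; every `demo_` / `DEMO_` name is hypothetical. The point is only that when drivers instantiate their function table through a macro, the layer can add a member and have the old macro supply a default.

```c
#include <stddef.h>

/* Hypothetical function table; the last member is the "rev 2" addition. */
struct demo_nand_fns {
    int (*read_page)(unsigned page);
    int (*write_page)(unsigned page);
    int (*hw_ecc)(unsigned page);      /* NULL means: use software ECC */
};

/* Original macro: existing driver sources keep compiling unchanged. */
#define DEMO_NAND_FUNS(name, rd, wr) \
    const struct demo_nand_fns name = { (rd), (wr), NULL }

/* Extended macro for drivers that provide hardware ECC. */
#define DEMO_NAND_FUNS_V2(name, rd, wr, ecc) \
    const struct demo_nand_fns name = { (rd), (wr), (ecc) }

/* Dummy driver entry points, for illustration only. */
int demo_read(unsigned page)   { (void)page; return 0; }
int demo_write(unsigned page)  { (void)page; return 0; }
int demo_hw_ecc(unsigned page) { (void)page; return 1; }

DEMO_NAND_FUNS(old_driver, demo_read, demo_write);
DEMO_NAND_FUNS_V2(new_driver, demo_read, demo_write, demo_hw_ecc);

/* Layer-side check: fall back to software ECC when the hook is absent. */
int demo_uses_hw_ecc(const struct demo_nand_fns *f)
{
    return f->hw_ecc != NULL;
}
```

Source compatibility is preserved this way; binary compatibility with already-compiled drivers is not, since the structure layout changes.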


Incidentally I note Rutger has a "Samsung" ECC implementation, whereas you support Samsung K9 chips, but use the normal ECC algorithm. Did Samsung change their practice?

4. Feature/implementation differences ------------------------------------

(I don't consider these to be significant issues; whilst noteworthy, I don't
think they would take much effort to resolve.)

(a) Documentation

The two layers' documentation differ in their depth and layout; these are
difficult for me to compare objectively, and I would suggest that a fresh
pair of eyes compare them.

Your documentation does appear very thorough and well-structured (although the Samsung and EA LPC2468 docs really should be broken out into their own packages). Rutger's does also seem fine though so I don't think there's a strong difference either way.


I can only offer the comment that I documented the E layer bearing in mind
what I considered to be missing from the R layer documentation: it was not
clear how the controller and chip layers inter-related, nor where to start
in creating a driver. (I also had a lot less experience of NAND chips then
than I do now, and what I need to know now is different from what a newbie
would.)

It's possible that those layer interrelations were at the level where really the code would be the better guide. Although there's always room for improvement.


That being said, experience shows that the best "documentation" for driver internals (i.e. beneath the application API) is in fact real concrete drivers, which brings us to...

(b) Availability of drivers

R provides support for:
- One board: BlackFin EZ-Kit BF548 (which is not in anoncvs?)
- One chip: the ST Micro 0xG chip (large page, x8 and x16 present but
presumably only tested on the x8 chip on the BlackFin board?)
- A synthetic controller/chip package
- A template for a GPIO-based controller (untested, intended as an example only)

I seem to remember rumours of the existence of a driver for a further
chip+board combination, but I haven't seen it.

E provides support for:
- Two boards: Embedded Artists LPC2468 (very well tested); STM3210E (largely
complete, based on work by Simon K; some enhancements planned)
- Two chips: Samsung K9 family (large page, only x8 done so far); ST-Micro
NANDxxxx3A (small page, x8) (based on work by Simon K)
- Synthetic target. This offers more features than R's: bad block injection,
logging, and a GUI interface via the synth I/O auxiliary.
- Further (customer-confidential) board ports.

I would certainly appreciate feedback from anyone who has used R's layer. What you say would seem to imply that both small page and ONFI are untested in R's layer.


(c) RedBoot support

E have added some commands for NAND operations and tested on the EA LPC2468
board. (YAFFS support works via the existing RB fileio layer; nothing really
needed to be done.)

I think that patch needs some work (I can go into detail if you like), but its presence is still a positive thing.


(d) Degree of testing

There are presumably differences of coverage here; both E and R assert they
have carried out stress tests. Properly comparing the depth of the two would
be a job for fresh eyes.

E have:
- a handful of unit and functional tests of the NAND layer, and a benchmarker
- a number of YAFFS functional tests, one of which includes benchmarking,
and a further severe YAFFS stress test: these indirectly test the NAND
layer. (The latter has been run under the synth driver with bad-block
injection turned on, and has revealed some subtle bugs which we probably
wouldn't otherwise have caught.)
- the ability to run continual test cycles in their test farm

Bad block injection sounds like an extremely useful feature. I infer from the latter that we're now talking about many hours of testing?


I'd need feedback from Rutger as to what level of testing has been done with his.

5. Works in progress -----------------------------------------------------

I can of course only comment on eCosCentric's plans, but the following work
is in the pipeline:

* Expansion of the device interface to better allow efficient hardware ECC
support (in progress)

Rough ETA? All I'm interested in knowing is whether the device interface changes for this are likely to be concluded within the timeframe of this discussion.


* Partition addressing: make addressing relative to the start of the
partition, once and for all

That's quite a major API change, which seems problematic to me.
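
For reference, the translation implied by partition-relative addressing is small; the concern is the change in meaning of every block number passed through the existing calls. A minimal sketch, with hypothetical names (`demo_partition`, `part_to_abs_block`) that are not taken from the eCosCentric layer:

```c
#include <stdint.h>

/* Hypothetical partition: inclusive range of absolute block numbers. */
struct demo_partition {
    uint32_t first;
    uint32_t last;
};

/*
 * Convert a partition-relative block number into an absolute one.
 * Returns 0 and stores the result in *abs_block on success, or -1 if the
 * partition is malformed or `rel` falls outside it.
 */
int part_to_abs_block(const struct demo_partition *p, uint32_t rel,
                      uint32_t *abs_block)
{
    if (p->last < p->first || rel > p->last - p->first)
        return -1;

    *abs_block = p->first + rel;
    return 0;
}
```

Code written against absolute numbering would keep compiling after such a change but silently address different blocks, unless the partition happens to start at block 0.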


* Part-page read support (would provide a big speed-up to parts of YAFFS2
inbandTags mode as needed by small-page devices like that on the STM3210E)

Do you foresee this happening within any particular timeframe? Do you expect the changes to be backwardly compatible?


If you got this far, well done! Since you say you'll be away, you may prefer to reply to this email in sections rather than sucking up your time and doing it all at once.

Thanks in advance.

Jifl
--
--["No sense being pessimistic, it wouldn't work anyway"]-- Opinions==mine

