next up previous
Next: JFFS2 Up: JFFS : The Journalling Previous: Introduction

Subsections

JFFS Version 1

The design goals of JFFS are largely determined by the characteristics of flash technology and of the devices in which it is expected to be used -- as embedded and battery-powered devices are often treated by users as simple appliances, we must ensure reliable operation when the system is uncleanly shut down.

Storage Format

The original JFFS is a purely log-structured file system [LFS]. Nodes containing data and metadata are stored on the flash chips sequentially, progressing strictly linearly through the storage space available.

In JFFS v1, there is only one type of node in the log; a structure known as struct jffs_raw_inode. Each such node is associated with a single inode. It starts with a common header containing the inode number of the inode to which it belongs and all the current file system metadata for that inode, and may also carry a variable amount of data.

There is a total ordering between the all the nodes belonging to any individual inode, which is maintained by storing a version number in each node. Each node is written with a version higher than all previous nodes belonging to the same inode. The version is an unsigned 32-bit field, allowing for 4 milliard nodes to be written for each inode during the life span of the file system. Because the limited lifetime of flash chips means this number is extremely unlikely to be reached, this limitation is deemed to be acceptable.

Similarly, the inode number is stored in a 32-bit field, and inode numbers are never reused. The same logic applies to the acceptability of this limitation, especially as it is possible to remove this restriction without breaking backwards compatibility of JFFS file systems, if it becomes necessary.

In addition to the normal inode metadata such as uid, gid, mtime, atime, mtime etc., each JFFS v1 raw node also contains the name of the inode to which it belongs and the inode number of the parent inode.1

Each node may also contain an amount of data, and if data are present the node will also record the offset in the file at which these data should appear. For reasons which are discussed later, there is a restriction on the maximum size of physical nodes, so large files will have many nodes associated with them, each node carrying data for a different range within the file.

Nodes containing data for a range in the inode which is also covered by a later node are said to be obsoleted, as are nodes which contain no data, where the metadata they contain has been outdated by a later node. Space taken by obsoleted nodes is referred to as ``dirty space''.

Special inodes such as character or block devices and symbolic links which have extra information associated with them represent this information -- the device numbers or symlink target string -- in the data part of the JFFS node, in the same manner as regular files represent their data, with the exception that there may be only one non-obsolete node for each such special inode at any time. Because symbolic links and especially device nodes have small amounts of such data, and because the data in these inodes are always required all at once rather than by reading separate ranges, it is simpler to ensure that the data are not fragmented into many different nodes on the flash.

Inode deletion is performed by setting a deleted flag in the inode metadata. All later nodes associated with the deleted inode are marked with the same flag, and when the last file handle referring to the deleted inode is closed, all its nodes become obsolete.

Operation

The entire medium is scanned at mount time, each node being read and interpreted. The data stored in the raw nodes provide sufficient information to rebuild the entire directory hierarchy and a complete map for each inode of the physical location on the medium of each range of data.

JFFS v1 stores all this information at all times while the file system is mounted. Each directory lookup can be satisfied immediately from data structures held in-core, and file reads can be performed by reading immediately from the appropriate locations on the medium into the supplied buffer.

Metadata changes such as ownership or permissions changes are performed by simply writing a new node to the end of the log recording the appropriate new metadata. File writes are similar; differing only in that the node written will have data associated with it.

Garbage Collection

The principles of operation so far are extremely simple. The JFFS code happily writes out new jffs_raw_inode structures to the medium to mark each change made to the filesystem...until, that is, it runs out of space.

At that point, the system needs to begin to reclaim the dirty space which contains old nodes which have been obsoleted by subsequent writes.

The oldest node in the log is known as the head, and new nodes are added to the tail of the log. In a clean filesystem which on which garbage collection has never been triggered, the head of the log will be at the very beginning of the flash. As the tail approaches the end of the flash, garbage collection will be triggered to make space.

Garbage collection will happen either in the context of a kernel thread which attempts to make space before it is actually required, or in the context of a user process which finds insufficient free space on the medium to perform a requested write. In either case, garbage collection will only continue if there is dirty space which can be reclaimed. If there is not enough dirty space to ensure that garbage collection will improve the situation, the kernel thread will sleep, and writes will fail with $-$ENOSPC errors.

The goal of the garbage collection code is to erase the first flash block in the log. At each pass, the node at the head of the log is examined. If the node is obsolete, it is skipped and the head moves on to the next node.2 If the node is still valid, it must be rendered obsolete. The garbage collection code does so by writing out a new data or metadata node to the tail of the log.

The new node written will contain the currently valid data for at least the range covered by the original node. If there is sufficient free space, the garbage collection code may write a larger node than the one being obsoleted, in order to improve storage efficiency by merging many small nodes into fewer, larger nodes.

If the node being obsoleted is already partially obsoleted by later nodes which cover only part of the same range of data, some of the data written to the new node will obviously differ from the data contained in the original.

In this way, the garbage collection code progresses the head of the log through the flash until a complete erase block is rendered obsolete, at which point it is erased and becomes available for reuse by the tail of the log.

Housekeeping

The JFFS file system requires a certain amount of space to be available between the head and the tail of the log at all times, in order to ensure that it is always possible to proceed with garbage collection by writing out new nodes as described above.

A simplified analysis of this situation is as follows:

In order to be able to erase the next block from the head of the log, there must be sufficient space to write out new nodes to obsolete all the nodes in that block. The worst case is that all nodes in the block are valid, the first node starts at the very beginning of the block, and the final node starts just before the end of the block and extends into the subsequent block.

By restricting the maximum size of a data node to half the size of the flash erase sector, we limit the amount of free space required in this worst case scenario to one and a half times the size of the flash sectors in use.

In fact, the above is only an approximation -- it ignores the fact that a name is stored with each node on the flash, and that renaming a file to a longer name will cause all nodes belonging to that file to grow when they are garbage collected.3

The precise amount of space which is required in order to ensure that garbage collection can continue is not formally proven and may not even be bounded with the current algorithms.

Empirical results show that a value of four flash sectors seems to be sufficient, while the previous default of two flash sectors would occasionally lead to the tail of the log reaching the head and complete deadlock of the file system.

Evolution

The original version of JFFS was used by Axis in their embedded devices in a relatively limited fashion, on 2.0 version of the Linux kernel.

After the code was released, it was ported to the 2.3 development kernels by a developer in Sweden. Subsequently, Red Hat, Inc. were asked to port it to the 2.2 series and provide commercial support for a contract customer.

Although the design of the file system was impressive, certain parts of the implementation appeared not to have been tested by its use in Axis' products. Writing data anywhere other than at the end of a file did not work, and deleting a file while a process had a valid file descriptor for it would cause a kernel oops.

After some weeks of reliability and compliance testing, JFFS reached stability. It is a credit to the clarity and quality of the original code that it was possible to become familiar with it and bring it to the current state in a relatively short space of time.

However, during this time it became apparent that there were a few serious flaws in the original implementation of the filesystem:

Garbage collection
would proceed linearly through the medium, writing out new nodes to allow it to erase the oldest block in the log, even if the block being garbage collected contained only clean nodes.

In the relatively common situation where a 16 MiB file system contained 12 MiB of static data -- libraries and program executables, 2 MiB of slack space and 2 MiB of dynamic data, the garbage collection would move the 12 MiB of static data from one place on the flash to another on every pass through the medium. JFFS provided perfect wear levelling -- each block was erased exactly the same number of times -- but this meant that the blocks were also erased more often than was necessary.

Wear levelling must be provided, by occasionally picking on a clean block and moving its contents. But that should be an occasional event, not the normal behaviour.

Compression
was not supported by JFFS. Because of the cost of flash chips and the constant desire to squeeze more functionality into embedded devices, compression was a very important requirement for a large proportion of potential users of JFFS.

Hard links
were also not supported by the original version of the filesystem. While this lack was not particularly limiting, it was annoying, as was the fact that file names were stored with each jffs_raw_inode, potentially leading to unbounded space expansion upon renames.


next up previous
Next: JFFS2 Up: JFFS : The Journalling Previous: Introduction
David Woodhouse 2001-10-10