This is the mail archive of the
cluster-cvs@sourceware.org
mailing list for the cluster project.
cluster: STABLE3 - doc: remove old gfs docs
- From: David Teigland <teigland at fedoraproject dot org>
- To: cluster-cvs-relay at redhat dot com
- Date: Thu, 23 Jul 2009 17:52:22 +0000 (UTC)
- Subject: cluster: STABLE3 - doc: remove old gfs docs
Gitweb: http://git.fedorahosted.org/git/cluster.git?p=cluster.git;a=commitdiff;h=6f83ab05ad47c5fe242d7840388db955c20c4ece
Commit: 6f83ab05ad47c5fe242d7840388db955c20c4ece
Parent: ab181b7303ccd66ab4bd67a08ed06136b4a20a93
Author: David Teigland <teigland@redhat.com>
AuthorDate: Thu Jul 23 12:42:43 2009 -0500
Committer: David Teigland <teigland@redhat.com>
CommitterDate: Thu Jul 23 12:44:03 2009 -0500
doc: remove old gfs docs
Signed-off-by: David Teigland <teigland@redhat.com>
---
doc/Makefile | 5 +-
doc/gfs2.txt | 45 ---------------
doc/journaling.txt | 155 --------------------------------------------------
doc/min-gfs.txt | 159 ----------------------------------------------------
4 files changed, 1 insertions(+), 363 deletions(-)
diff --git a/doc/Makefile b/doc/Makefile
index 10a076c..2aeb0b9 100644
--- a/doc/Makefile
+++ b/doc/Makefile
@@ -1,7 +1,4 @@
-DOCS = gfs2.txt \
- journaling.txt \
- min-gfs.txt \
- usage.txt \
+DOCS = usage.txt \
COPYING.applications \
COPYING.libraries \
COPYRIGHT \
diff --git a/doc/gfs2.txt b/doc/gfs2.txt
deleted file mode 100644
index 88f0143..0000000
--- a/doc/gfs2.txt
+++ /dev/null
@@ -1,45 +0,0 @@
-Global File System
-------------------
-
-http://sources.redhat.com/cluster/
-
-GFS is a cluster file system. It allows a cluster of computers to
-simultaneously use a block device that is shared between them (with FC,
-iSCSI, NBD, etc). GFS reads and writes to the block device like a local
-file system, but also uses a lock module to allow the computers to coordinate
-their I/O so that file system consistency is maintained. One of the nifty
-features of GFS is perfect consistency -- changes made to the file system
-on one machine show up immediately on all other machines in the cluster.
-
-GFS uses interchangeable inter-node locking mechanisms.  Different lock
-modules can plug into GFS and each file system selects the appropriate
-lock module at mount time. Lock modules include:
-
- lock_nolock -- does no real locking and allows gfs to be used as a
- local file system
-
- lock_dlm -- uses a distributed lock manager (dlm) for inter-node locking
- The dlm is found at linux/fs/dlm/
-
-In addition to interfacing with an external locking manager, a gfs lock
-module is responsible for interacting with external cluster management
-systems. Lock_dlm depends on user space cluster management systems found
-at the URL above.
-
-To use gfs as a local file system, no external clustering systems are
-needed, simply:
-
- $ gfs2_mkfs -p lock_nolock -j 1 /dev/block_device
- $ mount -t gfs2 /dev/block_device /dir
-
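For the clustered case, a minimal sketch for comparison (the cluster name
"mycluster" and file system name "myfs" are placeholders; the cluster name must
match the one used by the cluster manager, and -j should match the number of
nodes that will mount):

   $ gfs2_mkfs -p lock_dlm -t mycluster:myfs -j 2 /dev/block_device
   $ mount -t gfs2 /dev/block_device /dir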
-GFS2 is not on-disk compatible with previous versions of GFS.
-
-The following man pages can be found at the URL above:
- gfs2_mkfs to make a filesystem
- gfs2_fsck to repair a filesystem
- gfs2_grow to expand a filesystem online
- gfs2_jadd to add journals to a filesystem online
- gfs2_tool to manipulate, examine and tune a filesystem
- gfs2_quota to examine and change quota values in a filesystem
- mount.gfs2 to find mount options
-
diff --git a/doc/journaling.txt b/doc/journaling.txt
deleted file mode 100644
index e89eefa..0000000
--- a/doc/journaling.txt
+++ /dev/null
@@ -1,155 +0,0 @@
-o Journaling & Replay
-
-The fundamental problem with a journaled cluster filesystem is
-handling journal replay with multiple journals. A single block of
-metadata can be modified sequentially by many different nodes in the
-cluster. As the block is modified by each node, it gets logged in the
-journal for each node. If care is not taken, it's possible to get
-into a situation where a journal replay can actually corrupt a
-filesystem. The error scenario is:
-
-1) Node A modifies a metadata block by putting an updated copy into its
- incore log.
-2) Node B wants to read and modify the block so it requests the lock
- and a blocking callback is sent to Node A.
-3) Node A flushes its incore log to disk, and then syncs out the
- metadata block to its inplace location.
-4) Node A then releases the lock.
-5) Node B reads in the block and puts a modified copy into its ondisk
- log and then the inplace block location.
-6) Node A crashes.
-
-At this point, Node A's journal needs to be replayed. Since there is
-a newer version of the block inplace, if that block is replayed, the
-filesystem will be corrupted. There are a few different ways of
-avoiding this problem.
-
-1) Generation Numbers (GFS1)
-
- Each metadata block has a header in it that contains a 64-bit
- generation number. As each block is logged into a journal, the
- generation number is incremented. This provides a strict ordering
- of the different versions of the block as they are logged in the FS'
- different journals. When journal replay happens, a block in the
- journal is not replayed if the generation number in the journal is less
- than the generation number in place. This ensures that a newer
- version of a block is never replaced with an older version. So,
- this solution basically allows multiple copies of the same block in
- different journals, but it allows you to always know which is the
- correct one.
-
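  A minimal sketch of that replay test (illustrative C only; the names and
  types here are not the actual GFS1 code):

      #include <stdint.h>

      struct meta_header {
              uint64_t generation;    /* bumped each time the block is logged */
      };

      /* Replay a journaled block only if it is newer than the copy already
       * at its inplace location; otherwise skip it. */
      static int should_replay(const struct meta_header *journaled,
                               const struct meta_header *inplace)
      {
              return journaled->generation > inplace->generation;
      }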
- Pros:
-
- A) This method allows the fastest callbacks. To release a lock,
- the incore log for the lock must be flushed and then the inplace
- data and metadata must be synced. That's it. The sync
- operations involved are: start the log body and wait for it to
- become stable on the disk, synchronously write the commit block,
- start the inplace metadata and wait for it to become stable on
- the disk.
-
- Cons:
-
- A) Maintaining the generation numbers is expensive. All newly
- allocated metadata blocks must be read off the disk in order to
- figure out what the previous value of the generation number was.
- When deallocating metadata, extra work and care must be taken to
- make sure dirty data isn't thrown away in such a way that the
- generation numbers lose their meaning.
- B) You can't continue to modify the filesystem during journal
- replay. Basically, replay of a block is a read-modify-write
- operation: the block is read from disk, the generation number is
- compared, and (maybe) the new version is written out. Replay
- requires that the R-M-W operation is atomic with respect to
- other R-M-W operations that might be happening (say by a normal
- I/O process). Since journal replay doesn't (and can't) play by
- the normal metadata locking rules, you can't count on them to
- protect replay. Hence, GFS1 quiesces all writes on a filesystem
- before starting replay. This provides the mutual exclusion
- required, but it's slow and unnecessarily interrupts service on
- the whole cluster.
-
-2) Total Metadata Sync (OCFS2)
-
- This method is really simple in that it uses exactly the same
- infrastructure that a local journaled filesystem uses. Every time
- a node receives a callback, it stops all metadata modification,
- syncs out the whole incore journal, syncs out any dirty data, marks
- the journal as being clean (unmounted), and then releases the lock.
- Because the journal is marked as clean, recovery won't look at any
- of the journaled blocks in it; a valid copy of any particular block
- exists in only one journal at a time, and that journal is always the
- one belonging to the node that modified the block last.
-
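  A rough sketch of that callback sequence (all helper names here are made up
  for illustration; they are not OCFS2 functions):

      /* Stubs standing in for the real operations described above. */
      static void stop_metadata_updates(void) { /* block new transactions */ }
      static void flush_incore_journal(void)  { /* log body + commit block */ }
      static void sync_dirty_blocks(void)     { /* write back dirty data/metadata */ }
      static void mark_journal_clean(void)    { /* write the "unmounted" marker */ }
      static void release_lock(void)          { /* grant the lock to the requester */ }

      static void on_blocking_callback(void)
      {
              stop_metadata_updates();
              flush_incore_journal();
              sync_dirty_blocks();
              mark_journal_clean();   /* recovery will now skip this journal */
              release_lock();
      }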
- Pros:
-
- A) Very simple to implement.
- B) You can reuse journaling code from other places (such as JBD).
- C) No quiesce necessary for replay.
- D) No need for generation numbers sprinkled throughout the metadata.
-
- Cons:
-
- A) This method has the slowest possible callbacks. The sync
- operations are: stop all metadata operations, start and wait for
- the log body, write the log commit block, start and wait for all
- the FS' dirty metadata, write an unmount block. Writing the
- metadata for the whole filesystem can be particularly expensive
- because it can be scattered all over the disk and there can be a
- whole journal's worth of it.
-
-3) Revocation of a lock's buffers (GFS2)
-
- This method prevents a block from appearing in more than one
- journal by canceling out the metadata blocks in the journal that
- belong to the lock being released. Journaling works very similarly
- to a local filesystem or to #2 above.
-
- The biggest difference is that you have to keep track of buffers in the
- active region of the ondisk journal, even after the inplace blocks
- have been written back. This is done in GFS2 by adding a second
- part to the Active Items List. The first part (in GFS2 called
- AIL1) contains a list of all the blocks which have been logged to
- the journal, but not written back to their inplace location. Once
- an item in AIL1 has been written back to its inplace location, it
- is moved to AIL2. Once the tail of the log moves past the block's
- transaction in the log, it can be removed from AIL2.
-
- When a callback occurs, the log is flushed to the disk and the
- metadata for the lock is synced to disk. At this point, any
- metadata blocks for the lock that are in the current active region
- of the log will be in the AIL2 list. We then build a transaction
- that contains revoke tags for each buffer in the AIL2 list that
- belongs to that lock.
-
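  A minimal sketch of building that revoke transaction (the types and the
  helper below are hypothetical, not the GFS2 source):

      #include <stdint.h>

      struct buffer { uint64_t blkno; struct buffer *next; };
      struct transaction;
      void add_revoke_tag(struct transaction *tr, uint64_t blkno);  /* assumed helper */

      /* Walk the lock's AIL2 buffers -- logged, already written back in place,
       * but still inside the journal's active region -- and revoke each one so
       * that replay will ignore the stale journal copies. */
      static void revoke_lock_buffers(struct buffer *ail2_head, struct transaction *tr)
      {
              struct buffer *b;

              for (b = ail2_head; b != NULL; b = b->next)
                      add_revoke_tag(tr, b->blkno);
      }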
- Pros:
-
- A) No quiesce necessary for replay.
- B) No need for generation numbers sprinkled throughout the
- metadata.
- C) The sync operations are: stop all metadata operations, start and
- wait for the log body, write the log commit block, start and
- wait for all the FS' dirty metadata, start and wait for the log
- body of a transaction that revokes any of the lock's metadata
- buffers in the journal's active region, and write the commit
- block for that transaction.
-
- Cons:
-
- A) Recovery takes two passes, one to find all the revoke tags in
- the log and one to replay the metadata blocks using the revoke
- tags as a filter. This is necessary for a local filesystem and
- the total sync method, too. It's just that there will probably
- be more tags.
-
-Comparing #2 and #3, both do extra I/O during a lock callback to make
-sure that any metadata blocks in the log for that lock will be
-removed. I believe #2 will be slower because syncing out all the
-dirty metadata for the entire filesystem requires lots of little,
-scattered I/O across the whole disk. The extra I/O done by #3 is a
-log write to the disk. So, not only should it be less I/O, but it
-should also be better suited to get good performance out of the disk
-subsystem.
-
-KWP 07/06/05
-
diff --git a/doc/min-gfs.txt b/doc/min-gfs.txt
deleted file mode 100644
index af1399c..0000000
--- a/doc/min-gfs.txt
+++ /dev/null
@@ -1,159 +0,0 @@
-
-Minimum GFS HowTo
------------------
-
-The following gfs configuration requires a minimum amount of hardware and
-no expensive storage system. It's the cheapest and quickest way to "play"
-with gfs.
-
-
- ---------- ----------
- | GNBD | | GNBD |
- | client | | client | <-- these nodes use gfs
- | node2 | | node3 |
- ---------- ----------
- | |
- ------------------ IP network
- |
- ----------
- | GNBD |
- | server | <-- this node doesn't use gfs
- | node1 |
- ----------
-
-- There are three machines to use, with hostnames node1, node2, and node3
-
-- node1 has an extra disk /dev/sda1 to use for gfs
- (this could be hda1 or an lvm LV or an md device)
-
-- node1 will use gnbd to export this disk to node2 and node3
-
-- Node1 cannot use gfs; it only acts as a gnbd server.
- (Node1 will /not/ actually be part of the cluster since it is only
- running the gnbd server.)
-
-- Only node2 and node3 will be in the cluster and use gfs.
- (A two-node cluster is a special case for cman, noted in the config below.)
-
-- There's not much point in using clvm in this setup, so it's left out.
-
-- Download the "cluster" source tree.
-
-- Build and install from the cluster source tree. (The kernel components
- are not required on node1, which will only need the gnbd_serv program.)
-
- cd cluster
- ./configure --kernel_src=/path/to/kernel
- make; make install
-
-- Create /etc/cluster/cluster.conf on node2 with the following contents:
-
-<?xml version="1.0"?>
-<cluster name="gamma" config_version="1">
-
-<cman two_node="1" expected_votes="1">
-</cman>
-
-<clusternodes>
-<clusternode name="node2">
- <fence>
- <method name="single">
- <device name="gnbd" ipaddr="node2"/>
- </method>
- </fence>
-</clusternode>
-
-<clusternode name="node3">
- <fence>
- <method name="single">
- <device name="gnbd" ipaddr="node3"/>
- </method>
- </fence>
-</clusternode>
-</clusternodes>
-
-<fencedevices>
- <fencedevice name="gnbd" agent="fence_gnbd" servers="node1"/>
-</fencedevices>
-
-</cluster>
-
-
-- load kernel modules on nodes
-
-node2 and node3> modprobe gnbd
-node2 and node3> modprobe gfs
-node2 and node3> modprobe lock_dlm
-
-- run the following commands
-
-node1> gnbd_serv -n
-node1> gnbd_export -c -d /dev/sda1 -e global_disk
-
-node2 and node3> gnbd_import -n -i node1
-node2 and node3> ccsd
-node2 and node3> cman_tool join
-node2 and node3> fence_tool join
-
-node2> gfs_mkfs -p lock_dlm -t gamma:gfs1 -j 2 /dev/gnbd/global_disk
-
-node2 and node3> mount -t gfs /dev/gnbd/global_disk /mnt
-
-- that's it; you now have a gfs file system mounted on node2 and node3
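To sanity-check the result (these are existing cluster/gfs commands, but the
exact output varies by version):

node2> cman_tool nodes            # both cluster nodes should show as members
node2> mount | grep gfs           # the gfs mount on /mnt should be listed
node2> df -h /mnt                 # size should match the exported disk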
-
-
-Appendix A
-----------
-
-To use manual fencing instead of gnbd fencing, the cluster.conf file
-would look like this:
-
-<?xml version="1.0"?>
-<cluster name="gamma" config_version="1">
-
-<cman two_node="1" expected_votes="1">
-</cman>
-
-<clusternodes>
-<clusternode name="node2">
- <fence>
- <method name="single">
- <device name="manual" ipaddr="node2"/>
- </method>
- </fence>
-</clusternode>
-
-<clusternode name="node3">
- <fence>
- <method name="single">
- <device name="manual" ipaddr="node3"/>
- </method>
- </fence>
-</clusternode>
-</clusternodes>
-
-<fencedevices>
- <fencedevice name="manual" agent="fence_manual"/>
-</fencedevices>
-
-</cluster>
-
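Note that manual fencing blocks recovery until an operator confirms the failed
node is really down and acknowledges it; with this agent the acknowledgement is
given by running fence_ack_manual on a surviving node (the -n form below is an
assumption, check fence_ack_manual(8) for the exact syntax in your version):

node2> fence_ack_manual -n node3   # only after verifying node3 is powered off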
-
-FAQ
----
-
-- Why can't node1 use gfs, too?
-
-You might be able to make it work, but we recommend that you not try.
-This software was not intended or designed to allow that kind of usage.
-
-- Isn't node1 a single point of failure? How do I avoid that?
-
-Yes it is. For the time being, there's no way to avoid that, apart from
-not using gnbd, of course. Eventually, there will be a way to avoid this
-using cluster mirroring.
-
-- More info is available from:
- http://sources.redhat.com/cluster/gnbd/gnbd_usage.txt
- http://sources.redhat.com/cluster/doc/usage.txt
-