Ability to handle transient device failures

When a device goes out of sync, we want to keep track of the out-of-sync regions and resynchronize only those regions when the device comes back. We would like an LVM tool to resynchronize out-of-sync devices automatically (in practice, by requesting the mirror target to do the actual work). The LVM mirror monitoring daemon would request resynchronization of out-of-sync devices at exponentially increasing back-off intervals.

[The trick is detecting what is transient and what is not... It would be simple to make dmeventd do something more intelligent. The final act of suspending/resuming would have the side-effect of restarting synchronization - giving what you want. This can be done in user space on version 1 of the mirroring code, I think.]
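The exponential back-off schedule for resynchronization requests could look roughly like the sketch below. It is only an illustration: `backoff_intervals`, `resync_with_backoff`, and the `request_resync` callback are hypothetical names, and the initial delay, growth factor, and cap are assumed values, not anything taken from LVM or dmeventd.

```python
import time
from itertools import islice


def backoff_intervals(initial=1.0, factor=2.0, maximum=300.0):
    """Yield delays in seconds: initial, initial*factor, ... capped at maximum."""
    delay = initial
    while True:
        yield min(delay, maximum)
        delay = min(delay * factor, maximum)


def resync_with_backoff(request_resync, max_attempts=8, sleep=time.sleep, **schedule):
    """Call request_resync() until it returns True or attempts run out.

    request_resync is a hypothetical callback that asks the mirror target
    to resynchronize and reports whether the device is back in sync.
    Returns the attempt number that succeeded, or None if none did.
    """
    delays = islice(backoff_intervals(**schedule), max_attempts)
    for attempt, delay in enumerate(delays, start=1):
        if request_resync():
            return attempt
        sleep(delay)  # wait longer after each failed attempt
    return None
```

A device that never comes back within `max_attempts` would then be treated as a permanent failure rather than a transient one, which is one possible answer to the "what is transient" question raised above.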

Select the most recently used master device as master at boot time if the mirror is not in sync

Attached DRBD document pages relevant to their UUID implementation. I talked to a co-author of the project, and he suggested using 'device-id with generation number' instead of UUID to avoid UUID collisions. He pointed out that 'device-id with data generation number' is a better method...


Method 1:

This requires saving the synchronization state of the mirror, as well as the master device (if the mirror is not in sync), in persistent storage. This can be done by the kernel module itself or by a user-level daemon as part of processing the "sync/nosync" event. For simplicity, assume that there are two devices in the mirror, devA and devB. If devB goes out of sync, the mirror state is recorded as "out of sync" and devA is stored as the master device. On a reboot, we can detect that devA is the master if we find both devA and devB. Also, if we find only devB, we can detect that devB was not the master, because devB's metadata indicates that both devA and devB were active, which is not the case. If we really need to extend this to an N-way mirror, we should keep the following 'mirror metadata':

  1. Number of devices in the mirror, that is "N" (already part of LVM metadata)
  2. Number of sync devices in the mirror (if this is NOT equal to N, the mirror is out of sync)
  3. List of sync devices (devices that are in sync, any one could be a master)

If we find X devices at start up and they all list those X devices as being in sync, then they must have the latest data.
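The start-up rule above can be sketched as a small check over the metadata read from each device that was found. This is a minimal illustration under assumptions: `MirrorMeta`, `has_latest_data`, and `consistent_sync_sets` are hypothetical names, and the second function is one possible reading of how to pick a master when some of the found devices are stale.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MirrorMeta:
    n_devices: int           # item 1: N, the number of devices in the mirror
    sync_devices: frozenset  # item 3: in-sync device names (item 2 is its length)


def has_latest_data(found):
    """found maps device name -> MirrorMeta read from that device.

    True if every found device lists exactly the found set as in sync.
    """
    present = frozenset(found)
    return bool(present) and all(m.sync_devices == present for m in found.values())


def consistent_sync_sets(found):
    """Return every sync list whose members are all present and all agree on it.

    Assumption: any member of such a set could serve as master. More than one
    set in the result would mean the metadata is ambiguous.
    """
    present = frozenset(found)
    result = set()
    for meta in found.values():
        s = meta.sync_devices
        if s and s <= present and all(found[d].sync_devices == s for d in s):
            result.add(s)
    return result
```

In the two-device example, after devB drops out devA records `{devA}` while devB still carries the stale list `{devA, devB}`. Finding both devices yields `{devA}` as the only consistent set, and finding only devB yields nothing, matching the reasoning above.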

Method 2:

Have a state on the device that indicates whether the device has the 'latest data'. This would be set to FALSE at start up and set to TRUE at shutdown time if the device really has the latest data.
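As a rough sketch of this flag-based method, the life cycle could be modelled as below. The names `Device`, `start_up`, and `shut_down` are hypothetical, and the flag is held in memory here, whereas a real implementation would have to persist it on the device itself.

```python
from dataclasses import dataclass


@dataclass
class Device:
    name: str
    has_latest: bool = False  # the per-device 'latest data' state


def start_up(devices):
    """Return the names of devices flagged as holding the latest data,
    then clear every flag so that a crash leaves no device marked."""
    trusted = [d.name for d in devices if d.has_latest]
    for d in devices:
        d.has_latest = False
    return trusted


def shut_down(devices, in_sync):
    """At clean shutdown, set the flag only on devices that are in sync."""
    for d in devices:
        d.has_latest = d.name in in_sync
```

After a clean shutdown with only devA in sync, the next `start_up` returns `["devA"]`. A second `start_up` without an intervening `shut_down` (that is, after a crash) returns an empty list, so no device can be trusted on the flag alone.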

The first method can be implemented by the LVM tools, but the kernel DM target has to suspend all I/O until LVM finishes its metadata update. The second method is easier to implement, but may require destroying and recreating the mirror metadata to prevent a device from showing up as having up-to-date data after multiple reboot cycles.

Ability to have at least two log devices, so that the log device is not a single point of failure. The log devices can be used in replicated mode or hot/spare standby mode.

Ability to operate in read/write mode after reboot even if there is only one mirror device detected.

"vgreduce --removemissing" removes any logical volumes that are partly on any missing device. It should keep mirrored logical volumes provided it can find at least one mirror device. "lvm --partial" can try read-write for mirrored logical volumes.

None: MirrorDesignV2-discussion (last edited 2008-01-10 19:42:45 by localhost)