This is the mail archive of the cluster-cvs@sourceware.org mailing list for the cluster.

Cluster Project branch, RHEL5, updated. cmirror_1_1_15-48-gafb6cf2

From: teigland at sourceware dot org
To: cluster-cvs at sources dot redhat dot com, cluster-devel at redhat dot com
Date: 16 Apr 2008 14:28:22 -0000
Subject: Cluster Project branch, RHEL5, updated. cmirror_1_1_15-48-gafb6cf2

This is an automated email from the git hooks/post-receive script. It was
generated because a ref change was pushed to the repository containing
the project "Cluster Project".

http://sources.redhat.com/git/gitweb.cgi?p=cluster.git;a=commitdiff;h=afb6cf25e46a7afc40f97367e26719b29cd0983d

The branch, RHEL5 has been updated
       via  afb6cf25e46a7afc40f97367e26719b29cd0983d (commit)
      from  0847ffdaf607aafd538e949c91eb47f2a06c4335 (commit)

Those revisions listed above that are new to this repository have
not appeared on any other notification email; so we list those
revisions in full, below.

- Log -----------------------------------------------------------------
commit afb6cf25e46a7afc40f97367e26719b29cd0983d
Author: David Teigland <teigland@redhat.com>
Date:   Wed Apr 16 09:22:27 2008 -0500

    gfs_controld: retry recovery for withdrawn journal
    
    bz 442451
    
    This is unfortunate, but seems to be the best solution available.  The
    problem, described more fully in the bz, is that when gfs_controld tries
    to do recovery on a journal for a withdraw, the withdrawing node may not
    yet have cleared its dlm locks.  This means the journal lock may still be
    held by the withdrawing node, causing all the recovering node(s) to fail
    acquiring it, and no one does the recovery.  The solution is for all
    recovering nodes to retry recovery of a withdrawn journal until they
    succeed.  Only the first to get the journal lock will actually recover
    it; the others will see that it is recovered and report success.
    
    Signed-off-by: David Teigland <teigland@redhat.com>

-----------------------------------------------------------------------

Summary of changes:
 group/gfs_controld/recover.c |   19 +++++++++++++++++++
 1 files changed, 19 insertions(+), 0 deletions(-)

diff --git a/group/gfs_controld/recover.c b/group/gfs_controld/recover.c
index 9ce3aa7..52d96ff 100644
--- a/group/gfs_controld/recover.c
+++ b/group/gfs_controld/recover.c
@@ -1913,6 +1913,25 @@ int kernel_recovery_done(char *table)
 
 	switch (atoi(buf)) {
 	case LM_RD_GAVEUP:
+		/*
+		 * This is unfortunate; it's needed for bz 442451 where
+		 * gfs-kernel fails to acquire the journal lock on all nodes
+		 * because a withdrawing node has not yet called
+		 * dlm_release_lockspace() to free its journal lock.  With
+		 * this, all nodes should repeatedly try to recover the
+		 * journal of the withdrawn node until the withdrawing node
+		 * clears its dlm locks, and gfs on each of the remaining nodes
+		 * succeeds in doing the recovery.
+		 */
+
+		if (memb->withdrawing) {
+			log_group(mg, "recovery_done jid %d nodeid %d retry "
+				  "for withdraw", memb->jid, memb->nodeid);
+			memb->tell_gfs_to_recover = 1;
+			memb->wait_gfs_recover_done = 0;
+			usleep(500000);
+		}
+
 		memb->local_recovery_status = RS_GAVEUP;
 		ss = "gaveup";
 		break;

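The retry behaviour described in the log message can be illustrated with a
small standalone simulation. This is only a sketch under stated assumptions,
not code from gfs_controld: the names sim_journal, try_recover and
recover_with_retry are hypothetical, the journal lock is modelled as a
simple counter, and the max_attempts bound is added purely so the example
terminates (the patch itself has no such bound and sleeps 500 ms between
tries).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins; none of these names exist in gfs_controld. */
enum recovery_result { RECOVERY_GAVEUP, RECOVERY_DONE };

struct sim_journal {
	int lock_held_for;  /* attempts left until the withdrawing node drops its lock */
	bool recovered;     /* set once some node has replayed the journal */
	int recover_count;  /* number of times the journal was actually replayed */
};

/* One recovery attempt: fails while the withdrawing node still holds the lock. */
enum recovery_result try_recover(struct sim_journal *j)
{
	if (j->lock_held_for > 0) {
		j->lock_held_for--;
		return RECOVERY_GAVEUP;
	}
	if (!j->recovered) {
		/* First node to get the lock does the real work. */
		j->recovered = true;
		j->recover_count++;
	}
	/* Later nodes see the journal is already recovered and report success. */
	return RECOVERY_DONE;
}

/*
 * Retry only when the failed member is withdrawing, mirroring the
 * memb->withdrawing check in the patch.  Returns the attempt number that
 * succeeded, or -1 if recovery was given up.
 */
int recover_with_retry(struct sim_journal *j, bool withdrawing, int max_attempts)
{
	assert(j && max_attempts > 0);

	for (int attempt = 1; attempt <= max_attempts; attempt++) {
		if (try_recover(j) == RECOVERY_DONE)
			return attempt;
		if (!withdrawing)
			return -1;  /* ordinary failure: give up as before */
		/* The patch calls usleep(500000) here before retrying. */
	}
	return -1;
}
```

With a lock that is released after three failed attempts, the first
recovering node succeeds on its fourth try and replays the journal once;
any node that tries afterwards succeeds immediately without replaying it
again.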

hooks/post-receive
--
Cluster Project