This is the mail archive of the
cluster-cvs@sourceware.org
mailing list for the cluster.
cluster/gfs-kernel/src/dlm mount.c
- From: teigland at sourceware dot org
- To: cluster-cvs at sources dot redhat dot com
- Date: 14 Jan 2008 15:35:30 -0000
- Subject: cluster/gfs-kernel/src/dlm mount.c
CVSROOT: /cvs/cluster
Module name: cluster
Branch: RHEL4
Changes by: teigland@sourceware.org 2008-01-14 15:35:30
Modified files:
gfs-kernel/src/dlm: mount.c
Log message:
bz 324881
It's easy to tell if you've hit this bug, because a message like this will
always appear in /var/log/messages:
    SM: 02000378 ignoring service callback id=2000144 event=1324
If you look at /proc/cluster/lock_dlm/debug on this node at this point,
you'll see something like this at the end, which shows what the problem
is:
    others_may_mount start_done 1322 b
The event_id that others_may_mount uses when calling kcl_start_done()
is incorrect; it's using 1322 when it should be 1324.
I believe the fix is for others_may_mount() to read the event_id
after taking the umount_lock semaphore, which serializes
others_may_mount() with a start callback from the lock_dlm thread.
In this case, I believe the start callback is changing the event_id
after others_may_mount() reads it, and before others_may_mount() gets
the umount_lock semaphore.
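
The ordering problem described above can be sketched in user-space C.
This is only an illustration under stated assumptions, not the actual
change to mount.c: struct mount_state, start_callback(),
report_start_done() and the two *_racy/*_fixed functions are hypothetical
names, a pthread mutex stands in for the umount_lock semaphore, and the
pending_event argument simulates a start callback that arrives just
before the lock is acquired.

```c
#include <pthread.h>

/* Hypothetical stand-in for the per-mount state; not the real structure. */
struct mount_state {
    pthread_mutex_t umount_lock; /* models the umount_lock semaphore */
    int event_id;                /* updated by the start callback */
    int reported_id;             /* last id handed to "start_done" */
};

void mount_state_init(struct mount_state *ms, int event_id)
{
    pthread_mutex_init(&ms->umount_lock, NULL);
    ms->event_id = event_id;
    ms->reported_id = -1;
}

/* Models a start callback from the lock_dlm thread changing event_id. */
void start_callback(struct mount_state *ms, int new_event_id)
{
    pthread_mutex_lock(&ms->umount_lock);
    ms->event_id = new_event_id;
    pthread_mutex_unlock(&ms->umount_lock);
}

/* Hypothetical stand-in for the kcl_start_done() call: records the id. */
static void report_start_done(struct mount_state *ms, int event_id)
{
    ms->reported_id = event_id;
}

/* Racy ordering: event_id is read BEFORE the lock is taken, so a start
 * callback that runs in between leaves us holding a stale id. */
void others_may_mount_racy(struct mount_state *ms, int pending_event)
{
    int id = ms->event_id;                 /* read too early */

    if (pending_event)                     /* simulated callback in the window */
        start_callback(ms, pending_event);

    pthread_mutex_lock(&ms->umount_lock);
    report_start_done(ms, id);             /* reports the stale id */
    pthread_mutex_unlock(&ms->umount_lock);
}

/* Fixed ordering: event_id is read AFTER the lock is taken, so any
 * callback that ran earlier has already published its new id. */
void others_may_mount_fixed(struct mount_state *ms, int pending_event)
{
    int id;

    if (pending_event)                     /* same simulated callback */
        start_callback(ms, pending_event);

    pthread_mutex_lock(&ms->umount_lock);
    id = ms->event_id;                     /* read under the lock */
    report_start_done(ms, id);
    pthread_mutex_unlock(&ms->umount_lock);
}
```

With event_id starting at 1322 and a simulated callback setting it to
1324, the racy ordering reports 1322 while the fixed ordering reports
1324, which mirrors the mismatch seen in the log lines above.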
Patches:
http://sourceware.org/cgi-bin/cvsweb.cgi/cluster/gfs-kernel/src/dlm/mount.c.diff?cvsroot=cluster&only_with_tag=RHEL4&r1=1.11.2.3&r2=1.11.2.4