Commit 1416bd50 authored by Alexander Aring's avatar Alexander Aring Committed by David Teigland
Browse files

dlm: fix recovery pending middle conversion



During a workload involving conversions between lock modes PR and CW,
lock recovery can create a "conversion deadlock" state between locks
that have been recovered.  When this occurs, kernel warning messages
are logged, e.g.

  "dlm: WARN: pending deadlock 1e node 0 2 1bf21"

  "dlm: receive_rcom_lock_args 2e middle convert gr 3 rq 2 remote 2 1e"

After this occurs, the deadlocked conversions both appear on the convert
queue of the resource being locked, and the conversion requests do not
complete.

Outside of recovery, conversions that would produce a deadlock are
resolved immediately, and return -EDEADLK.  The locks are not placed
on the convert queue in the deadlocked state.

To fix this problem, an lkb under conversion between PR/CW is rebuilt
during recovery on a new master's granted queue, with the currently
granted mode, rather than being rebuilt on the new master's convert
queue, with the currently granted mode and the newly requested mode.
The in-progress convert is then resent to the new master after
recovery, so the conversion deadlock will be processed outside of
the recovery context and handled as described above.

Signed-off-by: default avatarAlexander Aring <aahringo@redhat.com>
Signed-off-by: default avatarDavid Teigland <teigland@redhat.com>
parent 24d479d2
Loading
Loading
Loading
Loading
+1 −18
Original line number Diff line number Diff line
@@ -5014,25 +5014,8 @@ void dlm_receive_buffer(const union dlm_packet *p, int nodeid)
static void recover_convert_waiter(struct dlm_ls *ls, struct dlm_lkb *lkb,
				   struct dlm_message *ms_local)
{
	if (middle_conversion(lkb)) {
		log_rinfo(ls, "%s %x middle convert in progress", __func__,
			 lkb->lkb_id);

		/* We sent this lock to the new master. The new master will
		 * tell us when it's granted.  We no longer need a reply, so
		 * use a fake reply to put the lkb into the right state.
		 */
		hold_lkb(lkb);
		memset(ms_local, 0, sizeof(struct dlm_message));
		ms_local->m_type = cpu_to_le32(DLM_MSG_CONVERT_REPLY);
		ms_local->m_result = cpu_to_le32(to_dlm_errno(-EINPROGRESS));
		ms_local->m_header.h_nodeid = cpu_to_le32(lkb->lkb_nodeid);
		_receive_convert_reply(lkb, ms_local, true);
		unhold_lkb(lkb);

	} else if (lkb->lkb_rqmode >= lkb->lkb_grmode) {
	if (middle_conversion(lkb) || lkb->lkb_rqmode >= lkb->lkb_grmode)
		set_bit(DLM_IFL_RESEND_BIT, &lkb->lkb_iflags);
	}

	/* lkb->lkb_rqmode < lkb->lkb_grmode shouldn't happen since down
	   conversions are async; there's no reply from the remote master */