Commit 2cdd64cb authored by Sean Christopherson's avatar Sean Christopherson
Browse files

KVM: Disallow binding multiple irqfds to an eventfd with a priority waiter



Disallow binding an irqfd to an eventfd that already has a priority waiter,
i.e. to an eventfd that already has an attached irqfd.  KVM always
operates in exclusive mode for EPOLL_IN (unconditionally returns '1'),
i.e. only the first waiter will be notified.

KVM already disallows binding multiple irqfds to an eventfd in a single
VM, but doesn't guard against multiple VMs binding to an eventfd.  Adding
the extra protection reduces the pain of a userspace VMM bug, e.g. if
userspace fails to de-assign before re-assigning when transferring state
for intra-host migration, then the migration will explicitly fail as
opposed to dropping IRQs on the destination VM.

Temporarily keep KVM's manual check on irqfds.items, but add a WARN, e.g.
to allow sanity checking the waitqueue enforcement.

Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: David Matlack <dmatlack@google.com>
Tested-by: default avatarK Prateek Nayak <kprateek.nayak@amd.com>
Acked-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20250522235223.3178519-10-seanjc@google.com


Signed-off-by: default avatarSean Christopherson <seanjc@google.com>
parent 0d09582b
Loading
Loading
Loading
Loading
+37 −18
Original line number Diff line number Diff line
@@ -291,38 +291,57 @@ static void kvm_irqfd_register(struct file *file, wait_queue_head_t *wqh,
	struct kvm_kernel_irqfd *tmp;
	struct kvm *kvm = p->kvm;

	/*
	 * Note, irqfds.lock protects the irqfd's irq_entry, i.e. its routing,
	 * and irqfds.items.  It does NOT protect registering with the eventfd.
	 */
	spin_lock_irq(&kvm->irqfds.lock);

	list_for_each_entry(tmp, &kvm->irqfds.items, list) {
		if (irqfd->eventfd != tmp->eventfd)
			continue;
		/* This fd is used for another irq already. */
		p->ret = -EBUSY;
		spin_unlock_irq(&kvm->irqfds.lock);
		return;
	}

	/*
	 * Initialize the routing information prior to adding the irqfd to the
	 * eventfd's waitqueue, as irqfd_wakeup() can be invoked as soon as the
	 * irqfd is registered.
	 */
	irqfd_update(kvm, irqfd);

	list_add_tail(&irqfd->list, &kvm->irqfds.items);

	/*
	 * Add the irqfd as a priority waiter on the eventfd, with a custom
	 * wake-up handler, so that KVM *and only KVM* is notified whenever the
	 * underlying eventfd is signaled.  Temporarily lie to lockdep about
	 * holding irqfds.lock to avoid a false positive regarding potential
	 * deadlock with irqfd_wakeup() (see irqfd_wakeup() for details).
	 * underlying eventfd is signaled.
	 */
	init_waitqueue_func_entry(&irqfd->wait, irqfd_wakeup);

	/*
	 * Temporarily lie to lockdep about holding irqfds.lock to avoid a
	 * false positive regarding potential deadlock with irqfd_wakeup()
	 * (see irqfd_wakeup() for details).
	 *
	 * Adding to the wait queue will fail if there is already a priority
	 * waiter, i.e. if the eventfd is associated with another irqfd (in any
	 * VM).  Note, kvm_irqfd_deassign() waits for all in-flight shutdown
	 * jobs to complete, i.e. ensures the irqfd has been removed from the
	 * eventfd's waitqueue before returning to userspace.
	 */
	spin_release(&kvm->irqfds.lock.dep_map, _RET_IP_);
	irqfd->wait.flags |= WQ_FLAG_EXCLUSIVE;
	add_wait_queue_priority(wqh, &irqfd->wait);
	p->ret = add_wait_queue_priority_exclusive(wqh, &irqfd->wait);
	spin_acquire(&kvm->irqfds.lock.dep_map, 0, 0, _RET_IP_);
	if (p->ret)
		goto out;

	spin_unlock_irq(&kvm->irqfds.lock);
	list_for_each_entry(tmp, &kvm->irqfds.items, list) {
		if (irqfd->eventfd != tmp->eventfd)
			continue;

	p->ret = 0;
		WARN_ON_ONCE(1);
		/* This fd is used for another irq already. */
		p->ret = -EBUSY;
		goto out;
	}

	list_add_tail(&irqfd->list, &kvm->irqfds.items);

out:
	spin_unlock_irq(&kvm->irqfds.lock);
}

#if IS_ENABLED(CONFIG_HAVE_KVM_IRQ_BYPASS)