Commit c09dd2bb authored by Paolo Bonzini

Merge branch 'kvm-redo-enable-virt' into HEAD



Register KVM's cpuhp and syscore callbacks when enabling virtualization in
hardware, as the sole purpose of said callbacks is to disable and re-enable
virtualization as needed.

The primary motivation for this series is to simplify dealing with enabling
virtualization for Intel's TDX, which needs to enable virtualization
when kvm-intel.ko is loaded, i.e. long before the first VM is created.

That said, this is a nice cleanup on its own.  By registering the callbacks
on-demand, the callbacks themselves don't need to check kvm_usage_count,
because their very existence implies a non-zero count.
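
The on-demand registration idea can be illustrated with a small userspace model in C. This is only a sketch under stated assumptions: every name here (model_enable_virtualization(), suspend_cb, and so on) is hypothetical and is not KVM's code or API. It shows why a callback that is registered only while virtualization is enabled needs no usage-count check of its own.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace model; none of these names come from KVM. */
typedef void (*virt_cb_t)(void);

static int hw_enabled;        /* models "virtualization is on in hardware" */
static virt_cb_t suspend_cb;  /* NULL while no callback is registered */
static virt_cb_t resume_cb;

/*
 * The callbacks act unconditionally.  They are only reachable while
 * registered, and they are only registered while there is a user.
 */
static void model_suspend(void) { hw_enabled = 0; }
static void model_resume(void)  { hw_enabled = 1; }

/* Enable hardware, then register the callbacks that depend on it. */
static void model_enable_virtualization(void)
{
	hw_enabled = 1;
	suspend_cb = model_suspend;
	resume_cb = model_resume;
}

/* Unregister the callbacks first, then disable hardware. */
static void model_disable_virtualization(void)
{
	assert(suspend_cb != NULL);
	suspend_cb = NULL;
	resume_cb = NULL;
	hw_enabled = 0;
}

/* Models the core suspend/resume path: it only calls what is registered. */
static void model_system_suspend(void) { if (suspend_cb) suspend_cb(); }
static void model_system_resume(void)  { if (resume_cb) resume_cb(); }
```

In this model, a resume that arrives while nothing is registered leaves hw_enabled at 0, which is the behavior the old count check inside the callback would have provided.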

Patch 1 (re)adds a dedicated lock for kvm_usage_count.  This avoids a
lock ordering issue between cpus_read_lock() and kvm_lock.  The lock
ordering issue still exists in very rare cases, and will be fixed for
good by switching vm_list to an (S)RCU-protected list.
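
As a rough illustration of the dedicated lock, the following single-threaded C sketch models the nesting documented in this series (the usage lock taken outside the CPU hotplug read lock) together with the 0=>1 and 1=>0 transitions. All names and the rank-based order check are assumptions made for the example; they are not kernel primitives and do not reflect how KVM or lockdep is implemented.

```c
#include <assert.h>

/*
 * Hypothetical lock-order model.  A lower rank must be taken first.
 * RANK_USAGE stands in for kvm_usage_lock, RANK_HOTPLUG for
 * cpus_read_lock().
 */
enum { RANK_NONE = 0, RANK_USAGE = 1, RANK_HOTPLUG = 2 };

static int held_rank = RANK_NONE;  /* innermost rank currently held */
static int usage_count;
static int hw_enable_calls;
static int hw_disable_calls;

/* Acquire: assert the documented order, return the rank to restore. */
static int sketch_lock(int rank)
{
	int prev = held_rank;

	assert(rank > held_rank);
	held_rank = rank;
	return prev;
}

static void sketch_unlock(int prev)
{
	held_rank = prev;
}

/* Take a reference; only the 0=>1 transition touches "hardware". */
static void sketch_get_virt(void)
{
	int outer = sketch_lock(RANK_USAGE);

	if (usage_count++ == 0) {
		int inner = sketch_lock(RANK_HOTPLUG);

		hw_enable_calls++;
		sketch_unlock(inner);
	}
	sketch_unlock(outer);
}

/* Drop a reference; only the 1=>0 transition touches "hardware". */
static void sketch_put_virt(void)
{
	int outer = sketch_lock(RANK_USAGE);

	assert(usage_count > 0);
	if (--usage_count == 0) {
		int inner = sketch_lock(RANK_HOTPLUG);

		hw_disable_calls++;
		sketch_unlock(inner);
	}
	sketch_unlock(outer);
}
```

Because the count has its own lock in this model, the hotplug lock can be taken while the count is protected without involving any lock that guards unrelated state.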

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
parents 55f50b2f 590b09b1
+17 −0
@@ -2648,6 +2648,23 @@

			Default is Y (on).

	kvm.enable_virt_at_load=[KVM,ARM64,LOONGARCH,MIPS,RISCV,X86]
			If enabled, KVM will enable virtualization in hardware
			when KVM is loaded, and disable virtualization when KVM
			is unloaded (if KVM is built as a module).

			If disabled, KVM will dynamically enable and disable
			virtualization on-demand when creating and destroying
			VMs, i.e. on the 0=>1 and 1=>0 transitions of the
			number of VMs.

			Enabling virtualization at module load avoids potential
			latency for creation of the 0=>1 VM, as KVM serializes
			virtualization enabling across all online CPUs.  The
			"cost" of enabling virtualization when KVM is loaded,
			is that doing so may interfere with using out-of-tree
			hypervisors that want to "own" virtualization hardware.

	kvm.enable_vmware_backdoor=[KVM] Support VMware backdoor PV interface.
				   Default is false (don't support).

+23 −8
@@ -11,6 +11,8 @@ The acquisition orders for mutexes are as follows:

- cpus_read_lock() is taken outside kvm_lock

- kvm_usage_lock is taken outside cpus_read_lock()

- kvm->lock is taken outside vcpu->mutex

- kvm->lock is taken outside kvm->slots_lock and kvm->irq_lock
@@ -24,6 +26,12 @@ The acquisition orders for mutexes are as follows:
  are taken on the waiting side when modifying memslots, so MMU notifiers
  must not take either kvm->slots_lock or kvm->slots_arch_lock.

cpus_read_lock() vs kvm_lock:
- Taking cpus_read_lock() outside of kvm_lock is problematic, despite that
  being the official ordering, as it is quite easy to unknowingly trigger
  cpus_read_lock() while holding kvm_lock.  Use caution when walking vm_list,
  e.g. avoid complex operations when possible.

For SRCU:

- ``synchronize_srcu(&kvm->srcu)`` is called inside critical sections
@@ -227,10 +235,16 @@ time it will be set using the Dirty tracking mechanism described above.
:Type:		mutex
:Arch:		any
:Protects:	- vm_list
		- kvm_usage_count

``kvm_usage_lock``
^^^^^^^^^^^^^^^^^^

:Type:		mutex
:Arch:		any
:Protects:	- kvm_usage_count
		- hardware virtualization enable/disable
:Comment:	KVM also disables CPU hotplug via cpus_read_lock() during
		enable/disable.
:Comment:	Exists to allow taking cpus_read_lock() while kvm_usage_count is
		protected, which simplifies the virtualization enabling logic.

``kvm->mn_invalidate_lock``
^^^^^^^^^^^^^^^^^^^^^^^^^^^
@@ -290,11 +304,12 @@ time it will be set using the Dirty tracking mechanism described above.
		wakeup.

``vendor_module_lock``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
^^^^^^^^^^^^^^^^^^^^^^
:Type:		mutex
:Arch:		x86
:Protects:	loading a vendor module (kvm_amd or kvm_intel)
:Comment:	Exists because using kvm_lock leads to deadlock.  cpu_hotplug_lock is
    taken outside of kvm_lock, e.g. in KVM's CPU online/offline callbacks, and
    many operations need to take cpu_hotplug_lock when loading a vendor module,
    e.g. updating static calls.
:Comment:	Exists because using kvm_lock leads to deadlock.  kvm_lock is taken
    in notifiers, e.g. __kvmclock_cpufreq_notifier(), that may be invoked while
    cpu_hotplug_lock is held, e.g. from cpufreq_boost_trigger_state(), and many
    operations need to take cpu_hotplug_lock when loading a vendor module, e.g.
    updating static calls.
+3 −3
@@ -2164,7 +2164,7 @@ static void cpu_hyp_uninit(void *discard)
	}
}

int kvm_arch_hardware_enable(void)
int kvm_arch_enable_virtualization_cpu(void)
{
	/*
	 * Most calls to this function are made with migration
@@ -2184,7 +2184,7 @@ int kvm_arch_hardware_enable(void)
	return 0;
}

void kvm_arch_hardware_disable(void)
void kvm_arch_disable_virtualization_cpu(void)
{
	kvm_timer_cpu_down();
	kvm_vgic_cpu_down();
@@ -2380,7 +2380,7 @@ static int __init do_pkvm_init(u32 hyp_va_bits)

	/*
	 * The stub hypercalls are now disabled, so set our local flag to
	 * prevent a later re-init attempt in kvm_arch_hardware_enable().
	 * prevent a later re-init attempt in kvm_arch_enable_virtualization_cpu().
	 */
	__this_cpu_write(kvm_hyp_initialized, 1);
	preempt_enable();
+2 −2
@@ -261,7 +261,7 @@ long kvm_arch_dev_ioctl(struct file *filp,
	return -ENOIOCTLCMD;
}

int kvm_arch_hardware_enable(void)
int kvm_arch_enable_virtualization_cpu(void)
{
	unsigned long env, gcfg = 0;

@@ -300,7 +300,7 @@ int kvm_arch_hardware_enable(void)
	return 0;
}

void kvm_arch_hardware_disable(void)
void kvm_arch_disable_virtualization_cpu(void)
{
	write_csr_gcfg(0);
	write_csr_gstat(0);
+2 −2
@@ -728,8 +728,8 @@ struct kvm_mips_callbacks {
	int (*handle_fpe)(struct kvm_vcpu *vcpu);
	int (*handle_msa_disabled)(struct kvm_vcpu *vcpu);
	int (*handle_guest_exit)(struct kvm_vcpu *vcpu);
	int (*hardware_enable)(void);
	void (*hardware_disable)(void);
	int (*enable_virtualization_cpu)(void);
	void (*disable_virtualization_cpu)(void);
	int (*check_extension)(struct kvm *kvm, long ext);
	int (*vcpu_init)(struct kvm_vcpu *vcpu);
	void (*vcpu_uninit)(struct kvm_vcpu *vcpu);