Commit 11e8c7e9 authored by Linus Torvalds's avatar Linus Torvalds
Browse files
Pull kvm fixes from Paolo Bonzini:
 "Quite a large pull request, partly due to skipping last week and
  therefore having material from ~all submaintainers in this one. About
  a fourth of it is a new selftest, and a couple more changes are large
  in number of files touched (fixing a -Wflex-array-member-not-at-end
  compiler warning) or lines changed (reformatting of a table in the API
  documentation, thanks rST).

  But who am I kidding---it's a lot of commits and there are a lot of
  bugs being fixed here, some of them on the nastier side like the
  RISC-V ones.

  ARM:

   - Correctly handle deactivation of interrupts that were activated
     from LRs. Since EOIcount only denotes deactivation of interrupts
     that are not present in an LR, start EOIcount deactivation walk
     *after* the last irq that made it into an LR

   - Avoid calling into the stubs to probe for ICH_VTR_EL2.TDS when pKVM
     is already enabled -- not only thhis isn't possible (pKVM will
     reject the call), but it is also useless: this can only happen for
     a CPU that has already booted once, and the capability will not
     change

   - Fix a couple of low-severity bugs in our S2 fault handling path,
     affecting the recently introduced LS64 handling and the even more
     esoteric handling of hwpoison in a nested context

   - Address yet another syzkaller finding in the vgic initialisation,
     where we would end-up destroying an uninitialised vgic with nasty
     consequences

   - Address an annoying case of pKVM failing to boot when some of the
     memblock regions that the host is faulting in are not page-aligned

   - Inject some sanity in the NV stage-2 walker by checking the limits
     against the advertised PA size, and correctly report the resulting
     faults

  PPC:

   - Fix a PPC e500 build error due to a long-standing wart that was
     exposed by the recent conversion to kmalloc_obj(); rip out all the
     ugliness that led to the wart

  RISC-V:

   - Prevent speculative out-of-bounds access using array_index_nospec()
     in APLIC interrupt handling, ONE_REG regiser access, AIA CSR
     access, float register access, and PMU counter access

   - Fix potential use-after-free issues in kvm_riscv_gstage_get_leaf(),
     kvm_riscv_aia_aplic_has_attr(), and kvm_riscv_aia_imsic_has_attr()

   - Fix potential null pointer dereference in
     kvm_riscv_vcpu_aia_rmw_topei()

   - Fix off-by-one array access in SBI PMU

   - Skip THP support check during dirty logging

   - Fix error code returned for Smstateen and Ssaia ONE_REG interface

   - Check host Ssaia extension when creating AIA irqchip

  x86:

   - Fix cases where CPUID mitigation features were incorrectly marked
     as available whenever the kernel used scattered feature words for
     them

   - Validate _all_ GVAs, rather than just the first GVA, when
     processing a range of GVAs for Hyper-V's TLB flush hypercalls

   - Fix a brown paper bug in add_atomic_switch_msr()

   - Use hlist_for_each_entry_srcu() when traversing mask_notifier_list,
     to fix a lockdep warning; KVM doesn't hold RCU, just irq_srcu

   - Ensure AVIC VMCB fields are initialized if the VM has an in-kernel
     local APIC (and AVIC is enabled at the module level)

   - Update CR8 write interception when AVIC is (de)activated, to fix a
     bug where the guest can run in perpetuity with the CR8 intercept
     enabled

   - Add a quirk to skip the consistency check on FREEZE_IN_SMM, i.e. to
     allow L1 hypervisors to set FREEZE_IN_SMM. This reverts (by
     default) an unintentional tightening of userspace ABI in 6.17, and
     provides some amount of backwards compatibility with hypervisors
     who want to freeze PMCs on VM-Entry

   - Validate the VMCS/VMCB on return to a nested guest from SMM,
     because either userspace or the guest could stash invalid values in
     memory and trigger the processor's consistency checks

  Generic:

   - Remove a subtle pseudo-overlay of kvm_stats_desc, which, aside from
     being unnecessary and confusing, triggered compiler warnings due to
     -Wflex-array-member-not-at-end

   - Document that vcpu->mutex is take outside of kvm->slots_lock and
     kvm->slots_arch_lock, which is intentional and desirable despite
     being rather unintuitive

  Selftests:

   - Increase the maximum number of NUMA nodes in the guest_memfd
     selftest to 64 (from 8)"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (43 commits)
  KVM: selftests: Verify SEV+ guests can read and write EFER, CR0, CR4, and CR8
  Documentation: kvm: fix formatting of the quirks table
  KVM: x86: clarify leave_smm() return value
  selftests: kvm: add a test that VMX validates controls on RSM
  selftests: kvm: extract common functionality out of smm_test.c
  KVM: SVM: check validity of VMCB controls when returning from SMM
  KVM: VMX: check validity of VMCS controls when returning from SMM
  KVM: SVM: Set/clear CR8 write interception when AVIC is (de)activated
  KVM: SVM: Initialize AVIC VMCB fields if AVIC is enabled with in-kernel APIC
  KVM: x86: Introduce KVM_X86_QUIRK_VMCS12_ALLOW_FREEZE_IN_SMM
  KVM: x86: Fix SRCU list traversal in kvm_fire_mask_notifiers()
  KVM: VMX: Fix a wrong MSR update in add_atomic_switch_msr()
  KVM: x86: hyper-v: Validate all GVAs during PV TLB flush
  KVM: x86: synthesize CPUID bits only if CPU capability is set
  KVM: PPC: e500: Rip out "struct tlbe_ref"
  KVM: PPC: e500: Fix build error due to using kmalloc_obj() with wrong type
  KVM: selftests: Increase 'maxnode' for guest_memfd tests
  KVM: arm64: pkvm: Don't reprobe for ICH_VTR_EL2.TDS on CPU hotplug
  KVM: arm64: vgic: Pick EOIcount deactivations from AP-list tail
  KVM: arm64: Remove the redundant ISB in __kvm_at_s1e2()
  ...
parents 4f3df2e5 d2ea4ff1
Loading
Loading
Loading
Loading
+117 −109
Original line number Diff line number Diff line
@@ -8435,7 +8435,7 @@ KVM_CHECK_EXTENSION.

The valid bits in cap.args[0] are:

=================================== ============================================
========================================   ================================================
KVM_X86_QUIRK_LINT0_REENABLED              By default, the reset value for the LVT
                                           LINT0 register is 0x700 (APIC_MODE_EXTINT).
                                           When this quirk is disabled, the reset value
@@ -8543,7 +8543,15 @@ KVM_X86_QUIRK_IGNORE_GUEST_PAT By default, on Intel platforms, KVM ignores
                                           guest software, for example if it does not
                                           expose a bochs graphics device (which is
                                           known to have had a buggy driver).
=================================== ============================================

KVM_X86_QUIRK_VMCS12_ALLOW_FREEZE_IN_SMM   By default, KVM relaxes the consistency
                                           check for GUEST_IA32_DEBUGCTL in vmcs12
                                           to allow FREEZE_IN_SMM to be set.  When
                                           this quirk is disabled, KVM requires this
                                           bit to be cleared.  Note that the vmcs02
                                           bit is still completely controlled by the
                                           host, regardless of the quirk setting.
========================================   ================================================

7.32 KVM_CAP_MAX_VCPU_ID
------------------------
+2 −0
Original line number Diff line number Diff line
@@ -17,6 +17,8 @@ The acquisition orders for mutexes are as follows:

- kvm->lock is taken outside kvm->slots_lock and kvm->irq_lock

- vcpu->mutex is taken outside kvm->slots_lock and kvm->slots_arch_lock

- kvm->slots_lock is taken outside kvm->irq_lock, though acquiring
  them together is quite rare.

+3 −0
Original line number Diff line number Diff line
@@ -784,6 +784,9 @@ struct kvm_host_data {
	/* Number of debug breakpoints/watchpoints for this CPU (minus 1) */
	unsigned int debug_brps;
	unsigned int debug_wrps;

	/* Last vgic_irq part of the AP list recorded in an LR */
	struct vgic_irq *last_lr_irq;
};

struct kvm_host_psci_config {
+9 −0
Original line number Diff line number Diff line
@@ -2345,6 +2345,15 @@ static bool can_trap_icv_dir_el1(const struct arm64_cpu_capabilities *entry,
	    !is_midr_in_range_list(has_vgic_v3))
		return false;

	/*
	 * pKVM prevents late onlining of CPUs. This means that whatever
	 * state the capability is in after deprivilege cannot be affected
	 * by a new CPU booting -- this is garanteed to be a CPU we have
	 * already seen, and the cap is therefore unchanged.
	 */
	if (system_capabilities_finalized() && is_protected_kvm_enabled())
		return cpus_have_final_cap(ARM64_HAS_ICH_HCR_EL2_TDIR);

	if (is_kernel_in_hyp_mode())
		res.a1 = read_sysreg_s(SYS_ICH_VTR_EL2);
	else
+0 −2
Original line number Diff line number Diff line
@@ -1504,8 +1504,6 @@ int __kvm_at_s1e2(struct kvm_vcpu *vcpu, u32 op, u64 vaddr)
			fail = true;
		}

		isb();

		if (!fail)
			par = read_sysreg_par();

Loading