Commit edb0e8f6 authored by Linus Torvalds
Pull kvm updates from Paolo Bonzini:
 "ARM:

   - Nested virtualization support for VGICv3, giving the nested
     hypervisor control of the VGIC hardware when running an L2 VM

   - Removal of 'late' nested virtualization feature register masking,
     making the supported feature set directly visible to userspace

   - Support for emulating FEAT_PMUv3 on Apple silicon, taking advantage
     of an IMPLEMENTATION DEFINED trap that covers all PMUv3 registers

   - Paravirtual interface for discovering the set of CPU
     implementations where a VM may run, addressing a longstanding issue
     of guest CPU errata awareness in big-little systems and
     cross-implementation VM migration

   - Userspace control of the registers responsible for identifying a
     particular CPU implementation (MIDR_EL1, REVIDR_EL1, AIDR_EL1),
     allowing VMs to be migrated cross-implementation

   - pKVM updates, including support for tracking stage-2 page table
     allocations in the protected hypervisor in the 'SecPageTable' stat

   - Fixes to vPMU, ensuring that userspace updates to the vPMU after
     KVM_RUN are reflected into the backing perf events

  LoongArch:

   - Remove unnecessary header include path

   - Assume constant PGD during VM context switch

   - Add perf events support for guest VM

  RISC-V:

   - Disable the kernel perf counter during configure

   - KVM selftests improvements for PMU

   - Fix warning at the time of KVM module removal

  x86:

   - Add support for aging of SPTEs without holding mmu_lock.

     Not taking mmu_lock allows multiple aging actions to run in
     parallel, and more importantly avoids stalling vCPUs. This includes
     an implementation of per-rmap-entry locking; aging the gfn is done
     with only a per-rmap single-bin spinlock taken, whereas locking an
     rmap for write requires taking both the per-rmap spinlock and the
     mmu_lock.

     Note that this slightly decreases the accuracy of accessed-page
     information, because changes to the SPTE outside aging might not
     use atomic operations even if they could race against a clear of
     the Accessed bit.

     This is deliberate because KVM and mm/ tolerate false
     positives/negatives for accessed information, and testing has shown
     that reducing the latency of aging is far more beneficial to
     overall system performance than providing "perfect" young/old
     information.

   - Defer runtime CPUID updates until KVM emulates a CPUID instruction,
     to coalesce updates when multiple pieces of vCPU state are
     changing, e.g. as part of a nested transition

   - Fix a variety of nested emulation bugs, and add VMX support for
     synthesizing nested VM-Exit on interception (instead of injecting
     #UD into L2)

   - Drop "support" for async page faults for protected guests that do
     not set SEND_ALWAYS (i.e. that only want async page faults at CPL3)

   - Bring a bit of sanity to x86's VM teardown code, which has
     accumulated a lot of cruft over the years. Particularly, destroy
     vCPUs before the MMU, despite the latter being a VM-wide operation

   - Add common secure TSC infrastructure for use within SNP and in the
     future TDX

   - Block KVM_CAP_SYNC_REGS if guest state is protected. It does not
     make sense to use the capability if the relevant registers are not
     available for reading or writing

   - Don't take kvm->lock when iterating over vCPUs in the suspend
     notifier to fix a largely theoretical deadlock

   - Use the vCPU's actual Xen PV clock information when starting the
     Xen timer, as the cached state in arch.hv_clock can be stale/bogus

   - Fix a bug where KVM could bleed PVCLOCK_GUEST_STOPPED across
     different PV clocks; restrict PVCLOCK_GUEST_STOPPED to kvmclock, as
     KVM's suspend notifier only accounts for kvmclock, and there's no
     evidence that the flag is actually supported by Xen guests

   - Clean up the per-vCPU "cache" of its reference pvclock, and instead
     only track the vCPU's TSC scaling (multiplier+shift) metadata (which
     is moderately expensive to compute, and rarely changes for modern
     setups)

   - Don't write to the Xen hypercall page on MSR writes that are
     initiated by the host (userspace or KVM) to fix a class of bugs
     where KVM can write to guest memory at unexpected times, e.g.
     during vCPU creation if userspace has set the Xen hypercall MSR
     index to collide with an MSR that KVM emulates

   - Restrict the Xen hypercall MSR index to the unofficial synthetic
     range to reduce the set of possible collisions with MSRs that are
     emulated by KVM (collisions can still happen as KVM emulates
     Hyper-V MSRs, which also reside in the synthetic range)

   - Clean up and optimize KVM's handling of Xen MSR writes and
     xen_hvm_config

   - Update Xen TSC leaves during CPUID emulation instead of modifying
     the CPUID entries when updating PV clocks; there is no guarantee PV
     clocks will be updated between TSC frequency changes and CPUID
     emulation, and guest reads of the TSC leaves should be rare, i.e.
     are not a hot path

  x86 (Intel):

   - Fix a bug where KVM unnecessarily reads XFD_ERR from hardware and
     thus modifies the vCPU's XFD_ERR on a #NM due to CR0.TS=1

   - Pass XFD_ERR as the payload when injecting #NM, as a preparatory
     step for upcoming FRED virtualization support

   - Decouple the EPT entry RWX protection bit macros from the EPT
     Violation bits, both as a general cleanup and in anticipation of
     adding support for emulating Mode-Based Execution Control (MBEC)

   - Reject KVM_RUN if userspace manages to gain control and stuff
     invalid guest state while KVM is in the middle of emulating nested
     VM-Enter

   - Add a macro to handle KVM's sanity checks on entry/exit VMCS
     control pairs in anticipation of adding sanity checks for secondary
     exit controls (the primary field is out of bits)

  x86 (AMD):

   - Ensure the PSP driver is initialized when both the PSP and KVM
     modules are built-in (the initcall framework doesn't handle
     dependencies)

   - Use long-term pins when registering encrypted memory regions, so
     that the pages are migrated out of MIGRATE_CMA/ZONE_MOVABLE and
     don't lead to excessive fragmentation

   - Add macros and helpers for setting GHCB return/error codes

   - Add support for Idle HLT interception, which elides interception if
     the vCPU has a pending, unmasked virtual IRQ when HLT is executed

   - Fix a bug in INVPCID emulation where KVM fails to check for a
     non-canonical address

   - Don't attempt VMRUN for SEV-ES+ guests if the vCPU's VMSA is
     invalid, e.g. because the vCPU was "destroyed" via SNP's AP
     Creation hypercall

   - Reject SNP AP Creation if the requested SEV features for the vCPU
     don't match the VM's configured set of features

  Selftests:

   - Fix the Intel PMU counters test again; add a data load and do
     CLFLUSH{OPT} on the data instead of executing code. The theory is
     that modern Intel CPUs have learned new code prefetching tricks
     that bypass the PMU counters

   - Fix a flaw in the Intel PMU counters test where it asserts that an
     event is counting correctly without actually knowing what the event
     counts on the underlying hardware

   - Fix a variety of flaws, bugs, and false failures/passes in
     dirty_log_test, and improve its coverage by collecting all dirty
     entries on each iteration

   - Fix a few minor bugs related to handling of stats FDs

   - Add infrastructure to make vCPU and VM stats FDs available to tests
     by default (open the FDs during VM/vCPU creation)

   - Relax an assertion on the number of HLT exits in the xAPIC IPI test
     when running on a CPU that supports AMD's Idle HLT (which elides
     interception of HLT if a virtual IRQ is pending and unmasked)"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (216 commits)
  RISC-V: KVM: Optimize comments in kvm_riscv_vcpu_isa_disable_allowed
  RISC-V: KVM: Teardown riscv specific bits after kvm_exit
  LoongArch: KVM: Register perf callbacks for guest
  LoongArch: KVM: Implement arch-specific functions for guest perf
  LoongArch: KVM: Add stub for kvm_arch_vcpu_preempted_in_kernel()
  LoongArch: KVM: Remove PGD saving during VM context switch
  LoongArch: KVM: Remove unnecessary header include path
  KVM: arm64: Tear down vGIC on failed vCPU creation
  KVM: arm64: PMU: Reload when resetting
  KVM: arm64: PMU: Reload when user modifies registers
  KVM: arm64: PMU: Fix SET_ONE_REG for vPMC regs
  KVM: arm64: PMU: Assume PMU presence in pmu-emul.c
  KVM: arm64: PMU: Set raw values from user to PM{C,I}NTEN{SET,CLR}, PMOVS{SET,CLR}
  KVM: arm64: Create each pKVM hyp vcpu after its corresponding host vcpu
  KVM: arm64: Factor out pKVM hyp vcpu creation to separate function
  KVM: arm64: Initialize HCRX_EL2 traps in pKVM
  KVM: arm64: Factor out setting HCRX_EL2 traps into separate function
  KVM: x86: block KVM_CAP_SYNC_REGS if guest state is protected
  KVM: x86: Add infrastructure for secure TSC
  KVM: x86: Push down setting vcpu.arch.user_set_tsc
  ...
parents 27bd3ce4 782f9fea
+22 −0
@@ -1000,6 +1000,10 @@ blobs in userspace. When the guest writes the MSR, kvm copies one
page of a blob (32- or 64-bit, depending on the vcpu mode) to guest
memory.

The MSR index must be in the range [0x40000000, 0x4fffffff], i.e. must reside
in the range that is unofficially reserved for use by hypervisors.  The min/max
values are enumerated via KVM_XEN_MSR_MIN_INDEX and KVM_XEN_MSR_MAX_INDEX.
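
As an illustration of the range rule above, userspace could validate a candidate MSR index before passing it to KVM. This is a minimal sketch, not KVM's implementation: the ``EXAMPLE_`` macros and ``example_xen_msr_index_ok()`` are hypothetical names, and the bounds are copied from the text above rather than taken from the uAPI header.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Local stand-ins for KVM_XEN_MSR_MIN_INDEX / KVM_XEN_MSR_MAX_INDEX.
 * The values come from the documented range [0x40000000, 0x4fffffff].
 */
#define EXAMPLE_XEN_MSR_MIN_INDEX 0x40000000u
#define EXAMPLE_XEN_MSR_MAX_INDEX 0x4fffffffu

static_assert(EXAMPLE_XEN_MSR_MIN_INDEX <= EXAMPLE_XEN_MSR_MAX_INDEX,
	      "the documented range must be non-empty");

/* Return true if msr lies inside the inclusive synthetic range. */
static inline bool example_xen_msr_index_ok(uint32_t msr)
{
	return msr >= EXAMPLE_XEN_MSR_MIN_INDEX &&
	       msr <= EXAMPLE_XEN_MSR_MAX_INDEX;
}
```

Both endpoints are accepted because the documented range is inclusive; code built against the real headers would presumably use the enumerated macros instead of local copies.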

::

  struct kvm_xen_hvm_config {
@@ -8258,6 +8262,24 @@ KVM exits with the register state of either the L1 or L2 guest
depending on which executed at the time of an exit. Userspace must
take care to differentiate between these cases.

7.37 KVM_CAP_ARM_WRITABLE_IMP_ID_REGS
-------------------------------------

:Architectures: arm64
:Target: VM
:Parameters: None
:Returns: 0 on success, -EINVAL if vCPUs have been created before enabling this
          capability.

This capability changes the behavior of the registers that identify a PE
implementation of the Arm architecture: MIDR_EL1, REVIDR_EL1, and AIDR_EL1.
By default, these registers are visible to userspace but treated as invariant.

When this capability is enabled, KVM allows userspace to change the
aforementioned registers before the first KVM_RUN. These registers are VM
scoped, meaning that the same set of values is presented on all vCPUs in a
given VM.

8. Other capabilities.
======================

+14 −1
@@ -116,7 +116,7 @@ The pseudo-firmware bitmap register are as follows:
      ARM DEN0057A.

* KVM_REG_ARM_VENDOR_HYP_BMAP:
    Controls the bitmap of the Vendor specific Hypervisor Service Calls.
    Controls the bitmap of the Vendor specific Hypervisor Service Calls[0-63].

  The following bits are accepted:

@@ -127,6 +127,19 @@ The pseudo-firmware bitmap register are as follows:
    Bit-1: KVM_REG_ARM_VENDOR_HYP_BIT_PTP:
      The bit represents the Precision Time Protocol KVM service.

* KVM_REG_ARM_VENDOR_HYP_BMAP_2:
    Controls the bitmap of the Vendor specific Hypervisor Service Calls[64-127].

  The following bits are accepted:

    Bit-0: KVM_REG_ARM_VENDOR_HYP_BIT_DISCOVER_IMPL_VER
      This represents the ARM_SMCCC_VENDOR_HYP_KVM_DISCOVER_IMPL_VER_FUNC_ID
      function-id. This is reset to 0.

    Bit-1: KVM_REG_ARM_VENDOR_HYP_BIT_DISCOVER_IMPL_CPUS
      This represents the ARM_SMCCC_VENDOR_HYP_KVM_DISCOVER_IMPL_CPUS_FUNC_ID
      function-id. This is reset to 0.
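
The split between the two bitmap registers can be sketched as a mapping from an SMCCC function ID to a register and bit. The sketch assumes the bit position is the function number modulo 64, which is inferred from the [0-63] and [64-127] ranges above and from the two function IDs 0xC6000040 and 0xC6000041; all ``example_`` names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* SMCCC carries the function number in bits [15:0] of the function ID. */
#define EXAMPLE_SMCCC_FUNC_NUM(id) ((uint32_t)(id) & 0xffffu)

/* 0xC6000040 is function number 64, the first one covered by BMAP_2. */
static_assert(EXAMPLE_SMCCC_FUNC_NUM(0xC6000040u) == 64,
	      "DISCOVER_IMPL_VER is vendor hyp function 64");

struct example_bmap_slot {
	int reg;      /* 0: ..._VENDOR_HYP_BMAP, 1: ..._VENDOR_HYP_BMAP_2, -1: none */
	unsigned bit; /* bit index inside the selected 64-bit bitmap */
};

/* Map a vendor hypervisor function ID to its bitmap register and bit. */
static inline struct example_bmap_slot example_vendor_hyp_slot(uint32_t func_id)
{
	uint32_t num = EXAMPLE_SMCCC_FUNC_NUM(func_id);
	struct example_bmap_slot slot = { -1, 0 };

	if (num < 128) {
		slot.reg = (int)(num / 64);
		slot.bit = num % 64;
	}
	return slot;
}
```

Under that assumption, 0xC6000040 and 0xC6000041 land on bits 0 and 1 of the second bitmap, matching the bit assignments listed above.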

Errors:

    =======  =============================================================
+59 −0
@@ -142,3 +142,62 @@ region is equal to the memory protection granule advertised by
|                     |          |    +---------------------------------------------+
|                     |          |    | ``INVALID_PARAMETER (-3)``                  |
+---------------------+----------+----+---------------------------------------------+

``ARM_SMCCC_VENDOR_HYP_KVM_DISCOVER_IMPL_VER_FUNC_ID``
-------------------------------------------------------
Request the target CPU implementation version information and the number of target
implementations for the Guest VM.

+---------------------+-------------------------------------------------------------+
| Presence:           | Optional;  KVM/ARM64 Guests only                            |
+---------------------+-------------------------------------------------------------+
| Calling convention: | HVC64                                                       |
+---------------------+----------+--------------------------------------------------+
| Function ID:        | (uint32) | 0xC6000040                                       |
+---------------------+----------+--------------------------------------------------+
| Arguments:          | None                                                        |
+---------------------+----------+----+---------------------------------------------+
| Return Values:      | (int64)  | R0 | ``SUCCESS (0)``                             |
|                     |          |    +---------------------------------------------+
|                     |          |    | ``NOT_SUPPORTED (-1)``                      |
|                     +----------+----+---------------------------------------------+
|                     | (uint64) | R1 | Bits [63:32] Reserved/Must be zero          |
|                     |          |    +---------------------------------------------+
|                     |          |    | Bits [31:16] Major version                  |
|                     |          |    +---------------------------------------------+
|                     |          |    | Bits [15:0] Minor version                   |
|                     +----------+----+---------------------------------------------+
|                     | (uint64) | R2 | Number of target implementations            |
|                     +----------+----+---------------------------------------------+
|                     | (uint64) | R3 | Reserved / Must be zero                     |
+---------------------+----------+----+---------------------------------------------+
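
A guest could unpack the R1 value from the table above roughly as follows. This is only a sketch of the bit layout; ``struct example_impl_ver`` and ``example_decode_impl_ver()`` are hypothetical helpers, and rejecting a value with nonzero reserved bits is a choice made for this example, not behavior stated in the documentation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct example_impl_ver {
	uint16_t major; /* R1 bits [31:16] */
	uint16_t minor; /* R1 bits [15:0]  */
};

/*
 * Decode R1 as returned by the DISCOVER_IMPL_VER call.
 * Returns false if the reserved bits [63:32] are not zero.
 */
static inline bool example_decode_impl_ver(uint64_t r1,
					   struct example_impl_ver *out)
{
	assert(out != NULL);

	if (r1 >> 32)
		return false;

	out->major = (uint16_t)(r1 >> 16);
	out->minor = (uint16_t)(r1 & 0xffffu);
	return true;
}
```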

``ARM_SMCCC_VENDOR_HYP_KVM_DISCOVER_IMPL_CPUS_FUNC_ID``
-------------------------------------------------------

Request the target CPU implementation information for the Guest VM. The Guest kernel
will use this information to enable the associated errata.

+---------------------+-------------------------------------------------------------+
| Presence:           | Optional;  KVM/ARM64 Guests only                            |
+---------------------+-------------------------------------------------------------+
| Calling convention: | HVC64                                                       |
+---------------------+----------+--------------------------------------------------+
| Function ID:        | (uint32) | 0xC6000041                                       |
+---------------------+----------+----+---------------------------------------------+
| Arguments:          | (uint64) | R1 | selected implementation index               |
|                     +----------+----+---------------------------------------------+
|                     | (uint64) | R2 | Reserved / Must be zero                     |
|                     +----------+----+---------------------------------------------+
|                     | (uint64) | R3 | Reserved / Must be zero                     |
+---------------------+----------+----+---------------------------------------------+
| Return Values:      | (int64)  | R0 | ``SUCCESS (0)``                             |
|                     |          |    +---------------------------------------------+
|                     |          |    | ``INVALID_PARAMETER (-3)``                  |
|                     +----------+----+---------------------------------------------+
|                     | (uint64) | R1 | MIDR_EL1 of the selected implementation     |
|                     +----------+----+---------------------------------------------+
|                     | (uint64) | R2 | REVIDR_EL1 of the selected implementation   |
|                     +----------+----+---------------------------------------------+
|                     | (uint64) | R3 | AIDR_EL1  of the selected implementation    |
+---------------------+----------+----+---------------------------------------------+
+4 −1
@@ -126,7 +126,8 @@ KVM_DEV_ARM_VGIC_GRP_ITS_REGS
ITS Restore Sequence:
---------------------

The following ordering must be followed when restoring the GIC and the ITS:
The following ordering must be followed when restoring the GIC, ITS, and
KVM_IRQFD assignments:

a) restore all guest memory and create vcpus
b) restore all redistributors
@@ -139,6 +140,8 @@ d) restore the ITS in the following order:
     3. Load the ITS table data (KVM_DEV_ARM_ITS_RESTORE_TABLES)
     4. Restore GITS_CTLR

e) restore KVM_IRQFD assignments for MSIs

Then vcpus can be started.

ITS Table ABI REV0:
+11 −1
@@ -291,8 +291,18 @@ Groups:
      |    Aff3    |    Aff2    |    Aff1    |    Aff0    |

  Errors:

    =======  =============================================
    -EINVAL  vINTID is not multiple of 32 or info field is
	     not VGIC_LEVEL_INFO_LINE_LEVEL
    =======  =============================================

  KVM_DEV_ARM_VGIC_GRP_MAINT_IRQ
   Attributes:

    The attr field of kvm_device_attr encodes the following values:

      bits:     | 31   ....    5 | 4  ....  0 |
      values:   |      RES0      |   vINTID   |

    The vINTID specifies which interrupt is generated when the vGIC
    must generate a maintenance interrupt. This must be a PPI.
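
The attr encoding above can be illustrated with a small helper that builds the value for ``kvm_device_attr.attr``. It is a sketch under two assumptions: PPIs occupy INTIDs 16 to 31, as in the GIC architecture, and the ``example_`` names are hypothetical rather than part of the KVM uAPI.

```c
#include <assert.h>
#include <stdint.h>

#define EXAMPLE_VINTID_MASK 0x1fu /* attr bits [4:0]; bits [31:5] are RES0 */
#define EXAMPLE_PPI_FIRST   16u   /* first PPI INTID (assumed from the GIC spec) */
#define EXAMPLE_PPI_LAST    31u   /* last PPI INTID (assumed from the GIC spec) */

static_assert(EXAMPLE_PPI_LAST <= EXAMPLE_VINTID_MASK,
	      "every PPI must fit in the 5-bit vINTID field");

/*
 * Build the attr value for KVM_DEV_ARM_VGIC_GRP_MAINT_IRQ.
 * Returns 0 and stores the encoded value on success, or -1 if intid
 * is not a PPI.
 */
static inline int example_maint_irq_attr(uint32_t intid, uint64_t *attr)
{
	if (intid < EXAMPLE_PPI_FIRST || intid > EXAMPLE_PPI_LAST)
		return -1;

	*attr = intid & EXAMPLE_VINTID_MASK;
	return 0;
}
```

Checking the PPI range before encoding keeps the RES0 bits clear, since every accepted INTID fits in the low five bits.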