Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm (2c9b3512)

Documentation/admin-guide/kernel-parameters.txt

+21 −3

Original line number	Diff line number	Diff line
		@@ -2722,6 +2722,24 @@
		[KVM,ARM,EARLY] Allow use of GICv4 for direct
		injection of LPIs.

		kvm-arm.wfe_trap_policy=
		[KVM,ARM] Control when to set WFE instruction trap for
		KVM VMs. Traps are allowed but not guaranteed by the
		CPU architecture.

		trap: set WFE instruction trap

		notrap: clear WFE instruction trap

		kvm-arm.wfi_trap_policy=
		[KVM,ARM] Control when to set WFI instruction trap for
		KVM VMs. Traps are allowed but not guaranteed by the
		CPU architecture.

		trap: set WFI instruction trap

		notrap: clear WFI instruction trap

		kvm_cma_resv_ratio=n [PPC,EARLY]
		Reserves given percentage from system memory area for
		contiguous memory allocation for KVM hash pagetable
		@@ -4036,9 +4054,9 @@
		prediction) vulnerability. System may allow data
		leaks with this option.

		no-steal-acc [X86,PV_OPS,ARM64,PPC/PSERIES,RISCV,EARLY] Disable
		paravirtualized steal time accounting. steal time is
		computed, but won't influence scheduler behaviour
		no-steal-acc [X86,PV_OPS,ARM64,PPC/PSERIES,RISCV,LOONGARCH,EARLY]
		Disable paravirtualized steal time accounting. steal time
		is computed, but won't influence scheduler behaviour

		nosync [HW,M68K] Disables sync negotiation for all devices.

Documentation/virt/coco/sev-guest.rst

+19 −0

		@@ -176,6 +176,25 @@ to SNP_CONFIG command defined in the SEV-SNP spec. The current values of
		the firmware parameters affected by this command can be queried via
		SNP_PLATFORM_STATUS.

		2.7 SNP_VLEK_LOAD
		-----------------
		:Technology: sev-snp
		:Type: hypervisor ioctl cmd
		:Parameters (in): struct sev_user_data_snp_vlek_load
		:Returns (out): 0 on success, -negative on error

		When requesting an attestation report, a guest can specify whether it
		wants SNP firmware to sign the report using a Versioned Chip
		Endorsement Key (VCEK), which is derived from chip-unique secrets, or a
		Versioned Loaded Endorsement Key (VLEK), which is obtained from an AMD
		Key Derivation Service (KDS) and derived from seeds allocated to
		enrolled cloud service providers.

		In the case of VLEK keys, the SNP_VLEK_LOAD SNP command is used to load
		them into the system after obtaining them from the KDS, and corresponds
		closely to the SNP_VLEK_LOAD firmware command specified in the SEV-SNP
		spec.

		3. SEV-SNP CPUID Enforcement
		============================

Documentation/virt/kvm/api.rst

+138 −31

		@@ -891,12 +891,12 @@ like this::

		The irq_type field has the following values:

		- irq_type[0]:
		- KVM_ARM_IRQ_TYPE_CPU:
		out-of-kernel GIC: irq_id 0 is IRQ, irq_id 1 is FIQ
		- irq_type[1]:
		- KVM_ARM_IRQ_TYPE_SPI:
		in-kernel GIC: SPI, irq_id between 32 and 1019 (incl.)
		(the vcpu_index field is ignored)
		- irq_type[2]:
		- KVM_ARM_IRQ_TYPE_PPI:
		in-kernel GIC: PPI, irq_id between 16 and 31 (incl.)

		(The irq_id field thus corresponds nicely to the IRQ ID in the ARM GIC specs)
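The irq_id ranges listed above can be checked before issuing the ioctl. The following is a minimal sketch, assuming the three type constants carry the values 0, 1 and 2 as the renamed list entries suggest. The `EXAMPLE_`/`example_` names are hypothetical local stand-ins, not the uapi macros.

```c
#include <stdbool.h>
#include <stdint.h>

/* Local stand-ins for KVM_ARM_IRQ_TYPE_{CPU,SPI,PPI}; values assumed
 * to be 0, 1 and 2, matching the irq_type[N] entries they replace. */
enum example_irq_type {
    EXAMPLE_IRQ_TYPE_CPU = 0, /* out-of-kernel GIC: irq_id 0 = IRQ, 1 = FIQ */
    EXAMPLE_IRQ_TYPE_SPI = 1, /* in-kernel GIC: irq_id 32..1019 (incl.) */
    EXAMPLE_IRQ_TYPE_PPI = 2, /* in-kernel GIC: irq_id 16..31 (incl.) */
};

/* Returns true if irq_id lies in the range documented for the type. */
static bool example_irq_id_valid(enum example_irq_type type, uint32_t irq_id)
{
    switch (type) {
    case EXAMPLE_IRQ_TYPE_CPU:
        return irq_id <= 1;
    case EXAMPLE_IRQ_TYPE_SPI:
        return irq_id >= 32 && irq_id <= 1019;
    case EXAMPLE_IRQ_TYPE_PPI:
        return irq_id >= 16 && irq_id <= 31;
    }
    return false; /* unknown irq_type */
}
```

A caller might use this as a cheap sanity check in a VMM before packing the irq field. KVM performs its own validation regardless.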
		@@ -1403,6 +1403,12 @@ Instead, an abort (data abort if the cause of the page-table update
		was a load or a store, instruction abort if it was an instruction
		fetch) is injected in the guest.

		S390:
		^^^^^

		Returns -EINVAL if the VM has the KVM_VM_S390_UCONTROL flag set.
		Returns -EINVAL if called on a protected VM.

		4.36 KVM_SET_TSS_ADDR
		---------------------

		@@ -1921,7 +1927,7 @@ flags:

		If KVM_MSI_VALID_DEVID is set, devid contains a unique device identifier
		for the device that wrote the MSI message. For PCI, this is usually a
		BFD identifier in the lower 16 bits.
		BDF identifier in the lower 16 bits.

		On x86, address_hi is ignored unless the KVM_X2APIC_API_USE_32BIT_IDS
		feature of KVM_CAP_X2APIC_API capability is enabled. If it is enabled,
		@@ -2989,7 +2995,7 @@ flags:

		If KVM_MSI_VALID_DEVID is set, devid contains a unique device identifier
		for the device that wrote the MSI message. For PCI, this is usually a
		BFD identifier in the lower 16 bits.
		BDF identifier in the lower 16 bits.

		On x86, address_hi is ignored unless the KVM_X2APIC_API_USE_32BIT_IDS
		feature of KVM_CAP_X2APIC_API capability is enabled. If it is enabled,
		@@ -6276,6 +6282,12 @@ state. At VM creation time, all memory is shared, i.e. the PRIVATE attribute
		is '0' for all gfns. Userspace can control whether memory is shared/private by
		toggling KVM_MEMORY_ATTRIBUTE_PRIVATE via KVM_SET_MEMORY_ATTRIBUTES as needed.

		S390:
		^^^^^

		Returns -EINVAL if the VM has the KVM_VM_S390_UCONTROL flag set.
		Returns -EINVAL if called on a protected VM.

		4.141 KVM_SET_MEMORY_ATTRIBUTES
		-------------------------------

		@@ -6355,6 +6367,61 @@ a single guest_memfd file, but the bound ranges must not overlap).

		See KVM_SET_USER_MEMORY_REGION2 for additional details.

		4.143 KVM_PRE_FAULT_MEMORY
		--------------------------

		:Capability: KVM_CAP_PRE_FAULT_MEMORY
		:Architectures: none
		:Type: vcpu ioctl
		:Parameters: struct kvm_pre_fault_memory (in/out)
		:Returns: 0 if at least one page is processed, < 0 on error

		Errors:

		========== ===============================================================
		EINVAL     The specified `gpa` and `size` were invalid (e.g. not
		           page aligned, causes an overflow, or size is zero).
		ENOENT     The specified `gpa` is outside defined memslots.
		EINTR      An unmasked signal is pending and no page was processed.
		EFAULT     The parameter address was invalid.
		EOPNOTSUPP Mapping memory for a GPA is unsupported by the
		           hypervisor, and/or for the current vCPU state/mode.
		EIO        unexpected error conditions (also causes a WARN)
		========== ===============================================================

		::

		  struct kvm_pre_fault_memory {
			/* in/out */
			__u64 gpa;
			__u64 size;
			/* in */
			__u64 flags;
			__u64 padding[5];
		  };

		KVM_PRE_FAULT_MEMORY populates KVM's stage-2 page tables used to map memory
		for the current vCPU state. KVM maps memory as if the vCPU generated a
		stage-2 read page fault, e.g. faults in memory as needed, but doesn't break
		CoW. However, KVM does not mark any newly created stage-2 PTE as Accessed.

		In some cases, multiple vCPUs might share the page tables. In this
		case, the ioctl can be called in parallel.

		When the ioctl returns, the input values are updated to point to the
		remaining range. If `size` > 0 on return, the caller can just issue
		the ioctl again with the same `struct kvm_pre_fault_memory` argument.

		Shadow page tables cannot support this ioctl because they
		are indexed by virtual address or nested guest physical address.
		Calling this ioctl when the guest is using shadow page tables (for
		example because it is running a nested guest with nested page tables)
		will fail with `EOPNOTSUPP` even if `KVM_CHECK_EXTENSION` reports
		the capability to be present.

		`flags` must currently be zero.
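The "reissue while `size` > 0" behaviour described above can be sketched as a retry loop. This is an illustration under stated assumptions, not the reference usage. The struct is a local mirror of the layout shown above. `example_pre_fault_fn` is a hypothetical callback standing in for `ioctl(vcpu_fd, KVM_PRE_FAULT_MEMORY, &range)`. `example_fake_pre_fault` is a made-up stand-in that processes one 4 KiB page per call so the loop can run without a VM. Retrying on `EINTR` is a design choice of this sketch.

```c
#include <errno.h>
#include <stdint.h>

/* Local mirror of the documented layout; real code would use
 * struct kvm_pre_fault_memory from <linux/kvm.h>. */
struct example_pre_fault_memory {
    uint64_t gpa;        /* in/out */
    uint64_t size;       /* in/out */
    uint64_t flags;      /* in: must currently be zero */
    uint64_t padding[5];
};

/* Hypothetical stand-in for the ioctl: returns 0, or -1 with errno set. */
typedef int (*example_pre_fault_fn)(void *ctx,
                                    struct example_pre_fault_memory *range);

/* Pre-fault [gpa, gpa + size). Returns 0 or a negative errno value.
 * A zero size returns 0 without calling; the ioctl itself would
 * report EINVAL for that case. */
static int example_pre_fault_all(example_pre_fault_fn call, void *ctx,
                                 uint64_t gpa, uint64_t size)
{
    struct example_pre_fault_memory range = { .gpa = gpa, .size = size };

    while (range.size > 0) {
        if (call(ctx, &range) == 0)
            continue;   /* gpa/size now describe the remaining range */
        if (errno == EINTR)
            continue;   /* no page was processed: try again */
        return -errno;
    }
    return 0;
}

/* Made-up backend for demonstration: maps one 4 KiB page per call. */
struct example_fake_state { unsigned calls; };

static int example_fake_pre_fault(void *ctx,
                                  struct example_pre_fault_memory *range)
{
    struct example_fake_state *st = ctx;

    if (range->flags != 0 || range->size == 0 ||
        (range->gpa | range->size) % 4096 != 0) {
        errno = EINVAL;
        return -1;
    }
    st->calls++;
    range->gpa += 4096;
    range->size -= 4096;
    return 0;
}
```

Because the kernel updates `gpa` and `size` in place, the loop passes the same struct back unchanged. In a real VMM the callback body would be the ioctl on the vCPU file descriptor.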


		5. The kvm_run structure
		========================

		@@ -6421,7 +6488,10 @@ affect the device's behavior. Current defined flags::
		/* x86, set if the VCPU is in system management mode */
		#define KVM_RUN_X86_SMM (1 << 0)
		/* x86, set if bus lock detected in VM */
		#define KVM_RUN_BUS_LOCK (1 << 1)
		#define KVM_RUN_X86_BUS_LOCK (1 << 1)
		/* x86, set if the VCPU is executing a nested (L2) guest */
		#define KVM_RUN_X86_GUEST_MODE (1 << 2)

		/* arm64, set for KVM_EXIT_DEBUG */
		#define KVM_DEBUG_ARCH_HSR_HIGH_VALID (1 << 0)

		@@ -7767,29 +7837,31 @@ Valid bits in args[0] are::
		#define KVM_BUS_LOCK_DETECTION_OFF (1 << 0)
		#define KVM_BUS_LOCK_DETECTION_EXIT (1 << 1)

		Enabling this capability on a VM provides userspace with a way to select
		a policy to handle the bus locks detected in guest. Userspace can obtain
		the supported modes from the result of KVM_CHECK_EXTENSION and define it
		through the KVM_ENABLE_CAP.
		Enabling this capability on a VM provides userspace with a way to select a
		policy to handle the bus locks detected in guest. Userspace can obtain the
		supported modes from the result of KVM_CHECK_EXTENSION and define it through
		the KVM_ENABLE_CAP. The supported modes are mutually exclusive.

		KVM_BUS_LOCK_DETECTION_OFF and KVM_BUS_LOCK_DETECTION_EXIT are supported
		currently and mutually exclusive with each other. More bits can be added in
		the future.
		This capability allows userspace to force VM exits on bus locks detected in the
		guest, irrespective of whether the host has enabled split-lock detection
		(which triggers an #AC exception that KVM intercepts). This capability is
		intended to mitigate attacks where a malicious/buggy guest can exploit bus
		locks to degrade the performance of the whole system.

		With KVM_BUS_LOCK_DETECTION_OFF set, bus locks in guest will not cause vm exits
		so that no additional actions are needed. This is the default mode.
		If KVM_BUS_LOCK_DETECTION_OFF is set, KVM doesn't force guest bus locks to VM
		exit, although the host kernel's split-lock #AC detection still applies, if
		enabled.

		With KVM_BUS_LOCK_DETECTION_EXIT set, vm exits happen when bus lock detected
		in VM. KVM just exits to userspace when handling them. Userspace can enforce
		its own throttling or other policy based mitigations.
		If KVM_BUS_LOCK_DETECTION_EXIT is set, KVM enables a CPU feature that ensures
		bus locks in the guest trigger a VM exit, and KVM exits to userspace for all
		such VM exits, e.g. to allow userspace to throttle the offending guest and/or
		apply some other policy-based mitigation. When exiting to userspace, KVM sets
		KVM_RUN_X86_BUS_LOCK in vcpu-run->flags, and conditionally sets the exit_reason
		to KVM_EXIT_X86_BUS_LOCK.

		This capability is aimed to address the thread that VM can exploit bus locks to
		degree the performance of the whole system. Once the userspace enable this
		capability and select the KVM_BUS_LOCK_DETECTION_EXIT mode, KVM will set the
		KVM_RUN_BUS_LOCK flag in vcpu-run->flags field and exit to userspace. Concerning
		the bus lock vm exit can be preempted by a higher priority VM exit, the exit
		notifications to userspace can be KVM_EXIT_BUS_LOCK or other reasons.
		KVM_RUN_BUS_LOCK flag is used to distinguish between them.
		Note! Detected bus locks may be coincident with other exits to userspace, i.e.
		KVM_RUN_X86_BUS_LOCK should be checked regardless of the primary exit reason if
		userspace wants to take action on all detected bus locks.
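Since a detected bus lock can coincide with any exit reason, userspace would test the flag after every return from KVM_RUN instead of switching on the exit reason alone. A minimal sketch follows, assuming the bit position from the `#define` shown earlier in this file. The `EXAMPLE_`/`example_` names and the counter struct are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Same bit as KVM_RUN_X86_BUS_LOCK (1 << 1); local stand-in. */
#define EXAMPLE_RUN_X86_BUS_LOCK (1u << 1)

/* Hypothetical per-vCPU accounting a VMM might use for throttling. */
struct example_bus_lock_stats {
    uint64_t detected;
};

static bool example_bus_lock_seen(uint16_t run_flags)
{
    return (run_flags & EXAMPLE_RUN_X86_BUS_LOCK) != 0;
}

/* Call once per KVM_RUN return, before dispatching on the exit reason. */
static void example_account_exit(struct example_bus_lock_stats *stats,
                                 uint16_t run_flags)
{
    if (example_bus_lock_seen(run_flags))
        stats->detected++;
}
```

Keeping the check outside the exit-reason switch means a bus lock reported alongside, say, an I/O exit is still counted.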

		7.23 KVM_CAP_PPC_DAWR1
		----------------------
		@@ -7905,10 +7977,10 @@ perform a bulk copy of tags to/from the guest.
		7.29 KVM_CAP_VM_MOVE_ENC_CONTEXT_FROM
		-------------------------------------

		Architectures: x86 SEV enabled
		Type: vm
		Parameters: args[0] is the fd of the source vm
		Returns: 0 on success
		:Architectures: x86 SEV enabled
		:Type: vm
		:Parameters: args[0] is the fd of the source vm
		:Returns: 0 on success

		This capability enables userspace to migrate the encryption context from the VM
		indicated by the fd to the VM this is called on.
		@@ -7956,7 +8028,11 @@ The valid bits in cap.args[0] are:
		When this quirk is disabled, the reset value
		is 0x10000 (APIC_LVT_MASKED).

		KVM_X86_QUIRK_CD_NW_CLEARED By default, KVM clears CR0.CD and CR0.NW.
		KVM_X86_QUIRK_CD_NW_CLEARED By default, KVM clears CR0.CD and CR0.NW on
		AMD CPUs to work around buggy guest firmware
		that runs in perpetuity with CR0.CD, i.e.
		with caches in "no fill" mode.

		When this quirk is disabled, KVM does not
		change the value of CR0.CD and CR0.NW.

		@@ -8073,6 +8149,37 @@ error/annotated fault.

		See KVM_EXIT_MEMORY_FAULT for more information.

		7.35 KVM_CAP_X86_APIC_BUS_CYCLES_NS
		-----------------------------------

		:Architectures: x86
		:Target: VM
		:Parameters: args[0] is the desired APIC bus clock rate, in nanoseconds
		:Returns: 0 on success, -EINVAL if args[0] contains an invalid value for the
		frequency or if any vCPUs have been created, -ENXIO if a virtual
		local APIC has not been created using KVM_CREATE_IRQCHIP.

		This capability sets the VM's APIC bus clock frequency, used by KVM's in-kernel
		virtual APIC when emulating APIC timers. KVM's default value can be retrieved
		by KVM_CHECK_EXTENSION.

		Note: Userspace is responsible for correctly configuring CPUID 0x15, a.k.a. the
		core crystal clock frequency, if a non-zero CPUID 0x15 is exposed to the guest.
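The text above speaks of both a clock rate and nanoseconds. Reading args[0] as the length of one APIC bus cycle in nanoseconds (an assumption based on the capability name and the stated unit), the corresponding frequency is just the reciprocal. The helper name below is hypothetical.

```c
#include <stdint.h>

/* Frequency in Hz of a bus whose cycle lasts period_ns nanoseconds.
 * Returns 0 for a zero period, which has no meaningful frequency.
 * Integer division truncates for periods that do not divide 1e9. */
static uint64_t example_apic_bus_hz(uint64_t period_ns)
{
    return period_ns ? UINT64_C(1000000000) / period_ns : 0;
}
```

For instance, a 40 ns cycle corresponds to 25 MHz. A VMM exposing CPUID 0x15 would want the crystal frequency it reports there to agree with whatever period it configures.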

		7.36 KVM_CAP_X86_GUEST_MODE
		------------------------------

		:Architectures: x86
		:Returns: Informational only, -EINVAL on direct KVM_ENABLE_CAP.

		The presence of this capability indicates that KVM_RUN will update the
		KVM_RUN_X86_GUEST_MODE bit in kvm_run.flags to indicate whether the
		vCPU was executing nested guest code when it exited.

		KVM exits with the register state of either the L1 or L2 guest
		depending on which executed at the time of an exit. Userspace must
		take care to differentiate between these cases.
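One way userspace might differentiate the two cases is to derive the nesting level from the flag before interpreting any register state. A minimal sketch, assuming the bit position from the `#define` shown earlier in this file; the names are hypothetical.

```c
#include <stdint.h>

/* Same bit as KVM_RUN_X86_GUEST_MODE (1 << 2); local stand-in. */
#define EXAMPLE_RUN_X86_GUEST_MODE (1u << 2)

/* Returns 2 if the vCPU exited while running a nested (L2) guest,
 * otherwise 1. The caller can use the result to label or route the
 * register state it reads after the exit. */
static int example_exit_level(uint16_t run_flags)
{
    return (run_flags & EXAMPLE_RUN_X86_GUEST_MODE) ? 2 : 1;
}
```

The flag is only meaningful when KVM_CHECK_EXTENSION reports KVM_CAP_X86_GUEST_MODE; on kernels without it the bit is never set.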

		8. Other capabilities.
		======================

Documentation/virt/kvm/devices/arm-vgic.rst

+1 −1

		@@ -31,7 +31,7 @@ Groups:
		KVM_VGIC_V2_ADDR_TYPE_CPU (rw, 64-bit)
		Base address in the guest physical address space of the GIC virtual cpu
		interface register mappings. Only valid for KVM_DEV_TYPE_ARM_VGIC_V2.
		This address needs to be 4K aligned and the region covers 4 KByte.
		This address needs to be 4K aligned and the region covers 8 KByte.

		Errors:

Documentation/virt/kvm/halt-polling.rst

+6 −6

		@@ -79,11 +79,11 @@ adjustment of the polling interval.
		Module Parameters
		=================

		The kvm module has 3 tuneable module parameters to adjust the global max
		polling interval as well as the rate at which the polling interval is grown and
		shrunk. These variables are defined in include/linux/kvm_host.h and as module
		parameters in virt/kvm/kvm_main.c, or arch/powerpc/kvm/book3s_hv.c in the
		powerpc kvm-hv case.
		The kvm module has 4 tunable module parameters to adjust the global max polling
		interval, the initial value (to grow from 0), and the rate at which the polling
		interval is grown and shrunk. These variables are defined in
		include/linux/kvm_host.h and as module parameters in virt/kvm/kvm_main.c, or
		arch/powerpc/kvm/book3s_hv.c in the powerpc kvm-hv case.

		+-----------------------+---------------------------+-------------------------+
		|Module Parameter       |   Description             |     Default Value       |
		@@ -105,7 +105,7 @@ powerpc kvm-hv case.
		|                       | grow_halt_poll_ns()       |                         |
		|                       | function.                 |                         |
		+-----------------------+---------------------------+-------------------------+
		|halt_poll_ns_shrink    | The value by which the    | 0                       |
		|halt_poll_ns_shrink    | The value by which the    | 2                       |
		|                       | halt polling interval is  |                         |
		|                       | divided in the            |                         |
		|                       | shrink_halt_poll_ns()     |                         |