Commit 4215ee0d authored by Paolo Bonzini

Merge tag 'kvm-x86-svm-6.20' of https://github.com/kvm-x86/linux into HEAD

KVM SVM changes for 6.20

 - Drop a user-triggerable WARN on nested_svm_load_cr3() failure.

 - Add support for virtualizing ERAPS.  Note, correct virtualization of ERAPS
   relies on an upcoming, publicly announced change in the APM to reduce the
   set of conditions where hardware (i.e. KVM) *must* flush the RAP.

 - Ignore nSVM intercepts for instructions that are not supported according to
   L1's virtual CPU model.

 - Add support for expedited writes to the fast MMIO bus, a la VMX's fastpath
   for EPT Misconfig.

 - Don't set GIF when clearing EFER.SVME, as GIF exists independently of SVM,
   and allow userspace to restore nested state with GIF=0.

 - Treat exit_code as an unsigned 64-bit value through all of KVM.

 - Add support for fetching SNP certificates from userspace.

 - Fix a bug where KVM would use vmcb02 instead of vmcb01 when emulating VMLOAD
   or VMSAVE on behalf of L2.

 - Misc fixes and cleanups.
parents 687603fb 20c3c410
+44 −0
@@ -7382,6 +7382,50 @@ Please note that the kernel is allowed to use the kvm_run structure as the
primary storage for certain register types. Therefore, the kernel may use the
values in kvm_run even if the corresponding bit in kvm_dirty_regs is not set.

::

		/* KVM_EXIT_SNP_REQ_CERTS */
		struct kvm_exit_snp_req_certs {
			__u64 gpa;
			__u64 npages;
			__u64 ret;
		};

KVM_EXIT_SNP_REQ_CERTS indicates an SEV-SNP guest with certificate-fetching
enabled (see KVM_SEV_SNP_ENABLE_REQ_CERTS) has generated an Extended Guest
Request NAE #VMGEXIT (SNP_GUEST_REQUEST) with message type MSG_REPORT_REQ,
i.e. has requested an attestation report from firmware, and would like the
certificate data corresponding to the attestation report signature to be
provided by the hypervisor as part of the request.

To allow for userspace to provide the certificate, the 'gpa' and 'npages'
are forwarded verbatim from the guest request (the RAX and RBX GHCB fields
respectively).  'ret' is not an "output" from KVM, and is always '0' on
exit.  KVM verifies the 'gpa' is 4KiB aligned prior to exiting to userspace,
but otherwise the information from the guest isn't validated.

Upon the next KVM_RUN, e.g. after userspace has serviced the request (or not),
KVM will complete the #VMGEXIT, using the 'ret' field to determine whether to
signal success or failure to the guest, and on failure, what reason code will
be communicated via SW_EXITINFO2.  If 'ret' is set to an unsupported value (see
the table below), KVM_RUN will fail with -EINVAL.  For a 'ret' of 'ENOSPC', KVM
also consumes the 'npages' field, i.e. userspace can use the field to inform
the guest of the number of pages needed to hold all the certificate data.
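
As a rough illustration of the 'ret' and 'npages' handling described above, a VMM might size-check its certificate blob before completing the exit. This is only a sketch: ``struct snp_req_certs`` is a hypothetical userspace mirror of ``struct kvm_exit_snp_req_certs``, ``snp_req_certs_prepare()`` is an invented helper name, and the 4KiB page size is assumed from the alignment check KVM performs on 'gpa'. Copying the blob into guest memory is VMM-specific and not shown.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 'npages' counts 4KiB pages, matching KVM's alignment check on 'gpa'. */
#define SNP_CERT_PAGE_SIZE 4096u

/* Hypothetical userspace mirror of struct kvm_exit_snp_req_certs. */
struct snp_req_certs {
	uint64_t gpa;
	uint64_t npages;
	uint64_t ret;
};

/*
 * Decide how to complete the request for a certificate blob of cert_len bytes.
 * Returns 1 if the blob fits in the guest buffer (caller copies it to 'gpa'),
 * or 0 if the guest must retry with the page count written back to 'npages'.
 */
int snp_req_certs_prepare(struct snp_req_certs *req, size_t cert_len)
{
	uint64_t needed = ((uint64_t)cert_len + SNP_CERT_PAGE_SIZE - 1) /
			  SNP_CERT_PAGE_SIZE;

	if (needed > req->npages) {
		req->npages = needed;	/* consumed by KVM only for ENOSPC */
		req->ret = ENOSPC;
		return 0;
	}

	req->ret = 0;
	return 1;
}
```

Other failure paths (for example a blob that cannot be read) would set 'ret' to one of the remaining supported values instead.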

The supported 'ret' values and their respective SW_EXITINFO2 encodings:

  ======     =============================================================
  0          0x0, i.e. success.  KVM will emit an SNP_GUEST_REQUEST command
             to SNP firmware.
  ENOSPC     0x0000000100000000, i.e. not enough guest pages to hold the
             certificate table and certificate data.  KVM will also set the
             RBX field in the GHCB to 'npages'.
  EAGAIN     0x0000000200000000, i.e. the host is busy and the guest should
             retry the request.
  EIO        0xffffffff00000000, for all other errors (this return code is
             a KVM-defined hypervisor value, as allowed by the GHCB)
  ======     =============================================================
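
The mapping in the table can be written out as a small lookup. The following is an illustrative sketch only, not KVM's in-kernel code; ``snp_req_certs_exitinfo2()`` is a hypothetical name, and the ``-EINVAL`` return simply mirrors the documented KVM_RUN behavior for unsupported values.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/*
 * Translate a userspace-provided 'ret' into the SW_EXITINFO2 encoding listed
 * in the table. Returns 0 and stores the encoding in *exitinfo2, or -EINVAL
 * if 'ret' is not one of the supported values.
 */
int snp_req_certs_exitinfo2(uint64_t ret, uint64_t *exitinfo2)
{
	switch (ret) {
	case 0:
		*exitinfo2 = 0x0;			/* success */
		return 0;
	case ENOSPC:
		*exitinfo2 = 0x0000000100000000ULL;	/* buffer too small */
		return 0;
	case EAGAIN:
		*exitinfo2 = 0x0000000200000000ULL;	/* host busy, retry */
		return 0;
	case EIO:
		*exitinfo2 = 0xffffffff00000000ULL;	/* all other errors */
		return 0;
	default:
		return -EINVAL;
	}
}
```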


.. _cap_enable:

+51 −1
@@ -572,6 +572,52 @@ Returns: 0 on success, -negative on error
See SNP_LAUNCH_FINISH in the SEV-SNP specification [snp-fw-abi]_ for further
details on the input parameters in ``struct kvm_sev_snp_launch_finish``.

21. KVM_SEV_SNP_ENABLE_REQ_CERTS
--------------------------------

The KVM_SEV_SNP_ENABLE_REQ_CERTS command will configure KVM to exit to
userspace with a ``KVM_EXIT_SNP_REQ_CERTS`` exit type as part of handling
a guest attestation report, which will allow userspace to provide a
certificate corresponding to the endorsement key used by firmware to sign
that attestation report.

Returns: 0 on success, -negative on error

NOTE: The endorsement key used by firmware may change as a result of
management activities like updating SEV-SNP firmware or loading new
endorsement keys, so some care should be taken to keep the returned
certificate data in sync with the actual endorsement key in use by
firmware at the time the attestation request is sent to SNP firmware. The
recommended scheme to do this is to use file locking (e.g. via fcntl()'s
F_OFD_SETLK) in the following manner:

  - Prior to obtaining/providing certificate data as part of servicing an
    exit type of ``KVM_EXIT_SNP_REQ_CERTS``, the VMM should obtain a
    shared/read or exclusive/write lock on the certificate blob file before
    reading it and returning it to KVM, and continue to hold the lock until
    the attestation request is actually sent to firmware. To facilitate
    this, the VMM can set the ``immediate_exit`` flag of kvm_run just after
    supplying the certificate data, and just before resuming the vCPU.
    This will ensure the vCPU will exit again to userspace with ``-EINTR``
    after it finishes fetching the attestation report from firmware, at
    which point the VMM can safely drop the file lock.

  - Tools/libraries that perform updates to SNP firmware TCB values or
    endorsement keys (e.g. via /dev/sev interfaces such as ``SNP_COMMIT``,
    ``SNP_SET_CONFIG``, or ``SNP_VLEK_LOAD``, see
    Documentation/virt/coco/sev-guest.rst for more details) in such a way
    that the certificate blob needs to be updated, should similarly take an
    exclusive lock on the certificate blob for the duration of any updates
    to endorsement keys or the certificate blob contents to ensure that
    VMMs using the above scheme will not return certificate blob data that
    is out of sync with the endorsement key used by firmware at the time
    the attestation request is actually issued.

This scheme is recommended so that tools can use a fairly generic/natural
approach to synchronizing firmware/certificate updates via file-locking,
which should make it easier to maintain interoperability across
tools/VMMs/vendors.
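
The locking half of the scheme above could look roughly like the following. It is a minimal sketch, assuming a Linux host and a non-blocking whole-file lock; ``cert_blob_setlk()`` is a hypothetical helper name, and the choice of a non-blocking ``F_OFD_SETLK`` (rather than a blocking variant) is an assumption, not something the scheme mandates.

```c
#define _GNU_SOURCE		/* F_OFD_SETLK is Linux-specific */
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/*
 * Take (F_RDLCK or F_WRLCK) or drop (F_UNLCK) an open-file-description lock
 * covering the whole certificate blob file. Non-blocking: returns 0 on
 * success and -1 with errno set if a conflicting lock is held elsewhere.
 */
int cert_blob_setlk(int fd, short type)
{
	struct flock fl;

	memset(&fl, 0, sizeof(fl));
	fl.l_type = type;
	fl.l_whence = SEEK_SET;
	fl.l_start = 0;
	fl.l_len = 0;		/* 0 means "to end of file", i.e. whole file */
	fl.l_pid = 0;		/* must be 0 for OFD locks */

	return fcntl(fd, F_OFD_SETLK, &fl);
}
```

A VMM following the scheme would take the shared lock before reading the blob, set ``immediate_exit`` and call KVM_RUN, and drop the lock with ``F_UNLCK`` once KVM_RUN returns ``-EINTR``. An update tool would take ``F_WRLCK`` on its own descriptor for the duration of the firmware and blob update.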

Device attribute API
====================

@@ -579,11 +625,15 @@ Attributes of the SEV implementation can be retrieved through the
``KVM_HAS_DEVICE_ATTR`` and ``KVM_GET_DEVICE_ATTR`` ioctls on the ``/dev/kvm``
device node, using group ``KVM_X86_GRP_SEV``.

Currently only one attribute is implemented:
The following attributes are currently implemented:

* ``KVM_X86_SEV_VMSA_FEATURES``: return the set of all bits that
  are accepted in the ``vmsa_features`` of ``KVM_SEV_INIT2``.

* ``KVM_X86_SEV_SNP_REQ_CERTS``: return a value of 1 if the kernel supports the
  ``KVM_EXIT_SNP_REQ_CERTS`` exit, which allows for fetching endorsement key
  certificates from userspace for each SNP attestation request the guest issues.

Firmware Management
===================

+1 −0
@@ -472,6 +472,7 @@
#define X86_FEATURE_GP_ON_USER_CPUID	(20*32+17) /* User CPUID faulting */

#define X86_FEATURE_PREFETCHI		(20*32+20) /* Prefetch Data/Instruction to Cache Level */
#define X86_FEATURE_ERAPS		(20*32+24) /* Enhanced Return Address Predictor Security */
#define X86_FEATURE_SBPB		(20*32+27) /* Selective Branch Prediction Barrier */
#define X86_FEATURE_IBPB_BRTYPE		(20*32+28) /* MSR_PRED_CMD[IBPB] flushes all branch type predictions */
#define X86_FEATURE_SRSO_NO		(20*32+29) /* CPU is not affected by SRSO */
+8 −0
@@ -195,7 +195,15 @@ enum kvm_reg {

	VCPU_EXREG_PDPTR = NR_VCPU_REGS,
	VCPU_EXREG_CR0,
	/*
	 * Alias AMD's ERAPS (not a real register) to CR3 so that common code
	 * can trigger emulation of the RAP (Return Address Predictor) with
	 * minimal support required in common code.  Piggyback CR3 as the RAP
	 * is cleared on writes to CR3, i.e. marking CR3 dirty will naturally
	 * mark ERAPS dirty as well.
	 */
	VCPU_EXREG_CR3,
	VCPU_EXREG_ERAPS = VCPU_EXREG_CR3,
	VCPU_EXREG_CR4,
	VCPU_EXREG_RFLAGS,
	VCPU_EXREG_SEGMENTS,
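
The effect of the alias described in the comment can be seen in isolation with a toy bitmap. This is a sketch under stated assumptions: ``enum demo_reg`` and the ``demo_*`` helpers are hypothetical stand-ins, not KVM's actual register-tracking code, and the enumerator values are illustrative only.

```c
#include <assert.h>
#include <stdint.h>

/* Trimmed, hypothetical stand-in for enum kvm_reg. */
enum demo_reg {
	DEMO_EXREG_CR0,
	DEMO_EXREG_CR3,
	DEMO_EXREG_ERAPS = DEMO_EXREG_CR3,	/* alias, same enumerator value */
	DEMO_EXREG_CR4,				/* continues from CR3 + 1 */
};

/* Mark a register dirty in a one-bit-per-register bitmap. */
void demo_mark_dirty(uint32_t *dirty, enum demo_reg reg)
{
	*dirty |= UINT32_C(1) << reg;
}

/* Return 1 if the register's bit is set in the bitmap, 0 otherwise. */
int demo_is_dirty(uint32_t dirty, enum demo_reg reg)
{
	return !!(dirty & (UINT32_C(1) << reg));
}
```

Because the two enumerators share a value, marking CR3 dirty is indistinguishable from marking ERAPS dirty, and the alias does not shift the numbering of the entries that follow.
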
+6 −3
@@ -131,13 +131,13 @@ struct __attribute__ ((__packed__)) vmcb_control_area {
	u64 tsc_offset;
	u32 asid;
	u8 tlb_ctl;
	u8 reserved_2[3];
	u8 erap_ctl;
	u8 reserved_2[2];
	u32 int_ctl;
	u32 int_vector;
	u32 int_state;
	u8 reserved_3[4];
	u32 exit_code;
	u32 exit_code_hi;
	u64 exit_code;
	u64 exit_info_1;
	u64 exit_info_2;
	u32 exit_int_info;
@@ -182,6 +182,9 @@ struct __attribute__ ((__packed__)) vmcb_control_area {
#define TLB_CONTROL_FLUSH_ASID 3
#define TLB_CONTROL_FLUSH_ASID_LOCAL 7

#define ERAP_CONTROL_ALLOW_LARGER_RAP BIT(0)
#define ERAP_CONTROL_CLEAR_RAP BIT(1)

#define V_TPR_MASK 0x0f

#define V_IRQ_SHIFT 8
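
One way to sanity-check that the ``vmcb_control_area`` changes keep the layout intact is to compare offsets on a fragment of the structure before and after. The sketch below is illustrative only: ``struct ctl_old`` and ``struct ctl_new`` are hypothetical excerpts covering just the fields visible in the hunk, and ``combine_exit_code()`` is an invented helper showing how the former low/high halves relate to the 64-bit value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical excerpt of the layout before the change. */
struct __attribute__((__packed__)) ctl_old {
	uint64_t tsc_offset;
	uint32_t asid;
	uint8_t  tlb_ctl;
	uint8_t  reserved_2[3];
	uint32_t int_ctl;
	uint32_t int_vector;
	uint32_t int_state;
	uint8_t  reserved_3[4];
	uint32_t exit_code;
	uint32_t exit_code_hi;
	uint64_t exit_info_1;
};

/* Hypothetical excerpt of the layout after the change. */
struct __attribute__((__packed__)) ctl_new {
	uint64_t tsc_offset;
	uint32_t asid;
	uint8_t  tlb_ctl;
	uint8_t  erap_ctl;	/* takes the first formerly reserved byte */
	uint8_t  reserved_2[2];
	uint32_t int_ctl;
	uint32_t int_vector;
	uint32_t int_state;
	uint8_t  reserved_3[4];
	uint64_t exit_code;	/* replaces exit_code + exit_code_hi */
	uint64_t exit_info_1;
};

/* Combine the former 32-bit halves into the single 64-bit exit code. */
uint64_t combine_exit_code(uint32_t lo, uint32_t hi)
{
	return ((uint64_t)hi << 32) | lo;
}
```

In this excerpt ``erap_ctl`` lands at byte 13, directly after ``tlb_ctl``, and every later field keeps its offset.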