Commit 9913212b authored by Paolo Bonzini

Merge branch 'kvm-tdx-interrupts' into HEAD

Introduces support for interrupt handling for TDX guests, including
virtual interrupt injection and VM-Exits caused by vectored events.

Injection
=========

TDX supports non-NMI interrupt injection only by posted interrupt. Posted
interrupt descriptors (PIDs) are allocated in shared memory, so KVM
can update them directly. To post pending interrupts in the PID, KVM
can generate a self-IPI with the notification vector prior to TD entry.
Because TDX guest state is protected, KVM can't get the interrupt status
of a TDX guest. For now, assume the interrupt is always allowed. A later
patch set will let TDX guests call TDVMCALL with HLT, which passes
the interrupt block flag, so that whether an interrupt is allowed in HLT
will be checked against the interrupt block flag.
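
As a rough illustration of the posting step described above, the following
is a minimal sketch and not KVM's implementation. The struct layout and the
names pid_model, pid_post_vector and pid_test_vector are hypothetical, and
the updates are deliberately non-atomic; a real PID is shared with hardware
and has to be updated with atomic bit operations.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified model of a posted-interrupt descriptor (PID). */
struct pid_model {
	uint64_t pir[4];	/* 256 request bits, one per interrupt vector */
	bool on;		/* "outstanding notification" flag */
};

/*
 * Record a pending vector in the PID. Returns true when the caller still
 * has to send the notification (e.g. the self-IPI before TD entry), false
 * when a notification is already outstanding.
 */
static bool pid_post_vector(struct pid_model *pid, uint8_t vector)
{
	pid->pir[vector / 64] |= UINT64_C(1) << (vector % 64);
	if (pid->on)
		return false;
	pid->on = true;
	return true;
}

/* Check whether a vector is currently posted in the PID. */
static bool pid_test_vector(const struct pid_model *pid, uint8_t vector)
{
	return (pid->pir[vector / 64] >> (vector % 64)) & 1;
}
```

Only the first post after the flag is cleared asks for a notification; later
posts just add bits to the request bitmap.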

For NMIs, KVM can request the TDX module to inject an NMI into a TDX vCPU
by setting the PEND_NMI TDVPS field to 1. Following that, KVM can call
TDH.VP.ENTER to run the vCPU and the TDX module will attempt to inject
the NMI as soon as possible.  The PEND_NMI TDVPS field is a 1-bit field,
i.e. KVM can only pend one NMI in the TDX module. Also, TDX doesn't
allow KVM to request an NMI-window exit directly. When there is already
one NMI pending in the TDX module, i.e. it has not been delivered to
the TDX guest yet, and there is also an NMI pending in KVM, collapse the
pending NMI in KVM into the one pending in the TDX module.  Such a
collapse is OK because on x86 bare metal, multiple NMIs can also collapse
into one NMI, e.g. when NMIs are blocked by an SMI.  It's the OS's
responsibility to poll all NMI sources in the NMI handler to avoid
missing some NMI events. More details can be found in the changelog of
the patch "KVM: TDX: Implement methods to inject NMI".
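
The collapse rule can be pictured with a small model. This is a sketch
under assumptions, not the code from the patch: nmi_model and
nmi_prepare_entry are hypothetical names, and the boolean merely stands in
for the 1-bit PEND_NMI TDVPS field.

```c
#include <stdbool.h>

/* Hypothetical model of the NMI state split between KVM and the TDX module. */
struct nmi_model {
	bool module_pend_nmi;		/* stands in for the 1-bit PEND_NMI field */
	unsigned int kvm_pending;	/* NMIs queued in KVM, not yet handed over */
};

/*
 * Hand pending NMIs to the (modelled) TDX module before entering the vCPU.
 * Because the field is a single bit, any number of NMIs queued in KVM, plus
 * one possibly still pending in the module, end up as one pending NMI.
 */
static void nmi_prepare_entry(struct nmi_model *m)
{
	if (!m->kvm_pending)
		return;
	m->module_pend_nmi = true;	/* setting an already-set bit collapses */
	m->kvm_pending = 0;
}
```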

TDX doesn't support system-management mode (SMM) and system-management
interrupt (SMI) in guest TDs because the TDX module doesn't provide a way
for the VMM to inject an SMI into a guest TD or switch a guest vCPU into
SMM.  SMI requests return -ENOTTY, as with CONFIG_KVM_SMM=n.  Likewise,
INIT and SIPI events are not used and are blocked for TDX guests;
TDX defines its own vCPU creation and initialization sequence, which
is done on the host via SEAMCALLs at TD build time.

VM-exit for external events
===========================

Similar to the VMX case, external interrupts and NMIs are handled with
interrupts off: in the .handle_exit_irqoff() callback for external
interrupts and in the noinstr region for NMIs.  Just like VMX, NMI remains
blocked after exiting from a TDX guest for NMI-induced exits.

Machine check, which is handled in the .handle_exit_irqoff() callback, is
the only exception type KVM handles for TDX guests. For other exceptions,
because TDX guest state is protected, exceptions in TDX guests can't be
intercepted. The TDX VMM isn't supposed to handle these exceptions. Exit to
userspace with KVM_EXIT_EXCEPTION if an unexpected exception occurs.
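
The exception policy above amounts to a two-way decision. The sketch below
only illustrates that decision; exc_action and classify_exception_exit are
hypothetical names, and the only fact assumed beyond the text is that #MC
is architectural vector 18.

```c
#include <stdint.h>

#define MC_VECTOR	18	/* architectural vector of #MC (machine check) */

/* Hypothetical outcome of an exception-induced exit from a TDX guest. */
enum exc_action {
	EXC_HANDLE_MACHINE_CHECK,	/* handled with interrupts off */
	EXC_EXIT_TO_USERSPACE,		/* unexpected: KVM_EXIT_EXCEPTION */
};

/* Machine check is the only exception handled; anything else is unexpected. */
static enum exc_action classify_exception_exit(uint8_t vector)
{
	if (vector == MC_VECTOR)
		return EXC_HANDLE_MACHINE_CHECK;
	return EXC_EXIT_TO_USERSPACE;
}
```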

Host SMIs also cause an exit to KVM.  This is needed because in SEAM
root mode (TDX module) all interrupts are blocked.  An SMI can be "I/O
SMI" or "other SMI".  For TDX, there will be no I/O SMI because I/O
instructions inside a TDX guest trigger #VE and the guest needs to use
TDVMCALL to request the VMM to do I/O emulation.  The only case of interest
for "other SMI" is an #MC occurring in the guest when MCE-SMI morphing
is enabled in the host firmware.  Such an "MSMI" is marked by having bit 0
set in the exit qualification; MSMI exits are fatal for the TD and
are eventually handled by the kernel machine check handler (7911f145
x86/mce: Implement recovery for errors in TDX/SEAM non-root mode),
which marks the page as poisoned.  It is not possible right now to
pass machine check exceptions to the guest.

SMIs other than machine check SMIs are handled just by leaving SEAM
root mode and KVM doesn't need to do anything.
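
The SMI cases above can be summarized as a small classifier. This is a
sketch under assumptions rather than the handler added by the series:
EXIT_REASON_OTHER_SMI uses the value from the diff in this commit, while
MSMI_QUAL_BIT, smi_exit_action and classify_other_smi are hypothetical
names chosen for illustration.

```c
#include <stdint.h>

#define EXIT_REASON_OTHER_SMI	6			/* value from this commit's diff */
#define MSMI_QUAL_BIT		(UINT64_C(1) << 0)	/* hypothetical name for bit 0 */

/* Hypothetical outcome of inspecting an exit for the "other SMI" case. */
enum smi_exit_action {
	SMI_EXIT_NOT_APPLICABLE,	/* exit reason is not "other SMI" */
	SMI_EXIT_NOTHING_TO_DO,		/* plain SMI: leaving SEAM mode was enough */
	SMI_EXIT_MACHINE_CHECK,		/* MSMI: goes to the machine check handler */
};

static enum smi_exit_action classify_other_smi(uint32_t exit_reason,
					       uint64_t exit_qualification)
{
	if (exit_reason != EXIT_REASON_OTHER_SMI)
		return SMI_EXIT_NOT_APPLICABLE;
	/* Bit 0 of the exit qualification marks a machine-check SMI (MSMI). */
	if (exit_qualification & MSMI_QUAL_BIT)
		return SMI_EXIT_MACHINE_CHECK;
	return SMI_EXIT_NOTHING_TO_DO;
}
```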
parents 4d2dc9a2 6c441e4d
+1 −0
@@ -116,6 +116,7 @@ KVM_X86_OP_OPTIONAL(pi_start_assignment)
KVM_X86_OP_OPTIONAL(apicv_pre_state_restore)
KVM_X86_OP_OPTIONAL(apicv_post_state_restore)
KVM_X86_OP_OPTIONAL_RET0(dy_apicv_has_pending_interrupt)
KVM_X86_OP_OPTIONAL(protected_apic_has_interrupt)
KVM_X86_OP_OPTIONAL(set_hv_timer)
KVM_X86_OP_OPTIONAL(cancel_hv_timer)
KVM_X86_OP(setup_mce)
+1 −0
@@ -1842,6 +1842,7 @@ struct kvm_x86_ops {
	void (*apicv_pre_state_restore)(struct kvm_vcpu *vcpu);
	void (*apicv_post_state_restore)(struct kvm_vcpu *vcpu);
	bool (*dy_apicv_has_pending_interrupt)(struct kvm_vcpu *vcpu);
	bool (*protected_apic_has_interrupt)(struct kvm_vcpu *vcpu);

	int (*set_hv_timer)(struct kvm_vcpu *vcpu, u64 guest_deadline_tsc,
			    bool *expired);
+5 −0
@@ -81,6 +81,11 @@ static inline bool pi_test_sn(struct pi_desc *pi_desc)
	return test_bit(POSTED_INTR_SN, (unsigned long *)&pi_desc->control);
}

static inline bool pi_test_pir(int vector, struct pi_desc *pi_desc)
{
	return test_bit(vector, (unsigned long *)pi_desc->pir);
}

/* Non-atomic helpers */
static inline void __pi_set_sn(struct pi_desc *pi_desc)
{
+1 −0
@@ -34,6 +34,7 @@
#define EXIT_REASON_TRIPLE_FAULT        2
#define EXIT_REASON_INIT_SIGNAL			3
#define EXIT_REASON_SIPI_SIGNAL         4
#define EXIT_REASON_OTHER_SMI           6

#define EXIT_REASON_INTERRUPT_WINDOW    7
#define EXIT_REASON_NMI_WINDOW          8
+3 −0
@@ -100,6 +100,9 @@ int kvm_cpu_has_interrupt(struct kvm_vcpu *v)
	if (kvm_cpu_has_extint(v))
		return 1;

	if (lapic_in_kernel(v) && v->arch.apic->guest_apic_protected)
		return kvm_x86_call(protected_apic_has_interrupt)(v);

	return kvm_apic_has_interrupt(v) != -1;	/* LAPIC */
}
EXPORT_SYMBOL_GPL(kvm_cpu_has_interrupt);