From 3ccbf6f47098f5d5e247d1b7739d0fd90802187b Mon Sep 17 00:00:00 2001
From: Sean Christopherson
Date: Fri, 22 Aug 2025 15:03:47 +0800
Subject: [PATCH] KVM: x86/mmu: Return -EAGAIN if userspace deletes/moves memslot during prefault

Return -EAGAIN if userspace attempts to delete or move a memslot while
also prefaulting memory for that same memslot, i.e. force userspace to
retry instead of trying to handle the scenario entirely within KVM.

Unlike KVM_RUN, which needs to handle the scenario entirely within KVM
because userspace has come to depend on such behavior, KVM_PRE_FAULT_MEMORY
can return -EAGAIN without breaking userspace as this scenario can't have
ever worked (and there's no sane use case for prefaulting to a memslot
that's being deleted/moved).

And also unlike KVM_RUN, the prefault path doesn't naturally guarantee
forward progress.  E.g. to handle such a scenario, KVM would need to drop
and reacquire SRCU to break the deadlock between the memslot update
(synchronizes SRCU) and the prefault (waits for the memslot update to
complete).

However, dropping SRCU creates more problems, as completing the memslot
update will bump the memslot generation, which in turn will invalidate the
MMU root.  To handle that, prefaulting would need to handle pending
KVM_REQ_MMU_FREE_OBSOLETE_ROOTS requests and do kvm_mmu_reload() prior to
mapping each individual page.  I.e. to fully handle this scenario,
prefaulting would eventually need to look a lot like vcpu_enter_guest().

Given that there's no reasonable use case and practically zero risk of
breaking userspace, punt the problem to userspace and avoid adding
unnecessary complexity to the prefault path.

Note, TDX's guest_memfd post-populate path is unaffected as slots_lock is
held for the entire duration of populate(), i.e. any memslot modifications
will be fully serialized against TDX's flavor of prefaulting.

Reported-by: Reinette Chatre
Closes: https://lore.kernel.org/all/20250519023737.30360-1-yan.y.zhao@intel.com
Debugged-by: Yan Zhao
Reviewed-by: Binbin Wu
Link: https://lore.kernel.org/r/20250822070347.26451-1-yan.y.zhao@intel.com
Signed-off-by: Sean Christopherson
---
 arch/x86/kvm/mmu/mmu.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 92ff15969a36..53be6c2a5b0c 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4653,10 +4653,16 @@ static int kvm_mmu_faultin_pfn(struct kvm_vcpu *vcpu,
 	/*
 	 * Retry the page fault if the gfn hit a memslot that is being deleted
 	 * or moved.  This ensures any existing SPTEs for the old memslot will
-	 * be zapped before KVM inserts a new MMIO SPTE for the gfn.
+	 * be zapped before KVM inserts a new MMIO SPTE for the gfn.  Punt the
+	 * error to userspace if this is a prefault, as KVM's prefaulting ABI
+	 * doesn't provide the same forward progress guarantees as KVM_RUN.
 	 */
-	if (slot->flags & KVM_MEMSLOT_INVALID)
+	if (slot->flags & KVM_MEMSLOT_INVALID) {
+		if (fault->prefetch)
+			return -EAGAIN;
+
 		return RET_PF_RETRY;
+	}
 
 	if (slot->id == APIC_ACCESS_PAGE_PRIVATE_MEMSLOT) {
 		/*
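
For context, a minimal userspace sketch of how a VMM might consume the new
-EAGAIN from KVM_PRE_FAULT_MEMORY.  This is not part of the patch: the helper
name, its parameters, and the spin-until-done retry policy are illustrative
assumptions; struct kvm_pre_fault_memory and the KVM_PRE_FAULT_MEMORY ioctl
are the existing vCPU UAPI from <linux/kvm.h>.

#include <errno.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/* Hypothetical helper: prefault [gpa, gpa + size) on the given vCPU fd. */
static int prefault_range(int vcpu_fd, uint64_t gpa, uint64_t size)
{
	struct kvm_pre_fault_memory range = {
		.gpa = gpa,
		.size = size,
	};

	while (range.size) {
		/*
		 * KVM copies back the leftover range (gpa/size are advanced
		 * past whatever was mapped), so simply re-issue the ioctl
		 * until the whole range has been processed.
		 */
		if (!ioctl(vcpu_fd, KVM_PRE_FAULT_MEMORY, &range))
			continue;

		/*
		 * With this patch, EAGAIN means the target memslot was being
		 * deleted or moved.  A real VMM would coordinate with its
		 * memslot-update path before retrying; spinning here is only
		 * for brevity.  EINTR (pending signal) is also retried.
		 */
		if (errno == EAGAIN || errno == EINTR)
			continue;

		return -errno;
	}

	return 0;
}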