Commit e9a2bba4 authored by Paolo Bonzini's avatar Paolo Bonzini
Browse files

Merge tag 'kvm-x86-xen-6.9' of https://github.com/kvm-x86/linux into HEAD

KVM Xen and pfncache changes for 6.9:

 - Rip out the half-baked support for using gfn_to_pfn caches to manage pages
   that are "mapped" into guests via physical addresses.

 - Add support for using gfn_to_pfn caches with only a host virtual address,
   i.e. to bypass the "gfn" stage of the cache.  The primary use case is
   overlay pages, where the guest may change the gfn used to reference the
   overlay page, but the backing hva+pfn remains the same.

 - Add an ioctl() to allow mapping Xen's shared_info page using an hva instead
   of a gpa, so that userspace doesn't need to reconfigure and invalidate the
   cache/mapping if the guest changes the gpa (but userspace keeps the resolved
   hva the same).

 - When possible, use a single host TSC value when computing the deadline for
   Xen timers in order to improve the accuracy of the timer emulation.

 - Inject pending upcall events when the vCPU software-enables its APIC to fix
   a bug where an upcall can be lost (and to follow Xen's behavior).

 - Fall back to the slow path instead of warning if "fast" IRQ delivery of Xen
   events fails, e.g. if the guest has aliased xAPIC IDs.

 - Extend gfn_to_pfn_cache's mutex to cover (de)activation (in addition to
   refresh), and drop a now-redundant acquisition of xen_lock (that was
   protecting the shared_info cache) to fix a deadlock due to recursively
   acquiring xen_lock.
parents e9025cdd 7a36d680
Loading
Loading
Loading
Loading
+41 −12
Original line number Diff line number Diff line
@@ -372,7 +372,7 @@ The bits in the dirty bitmap are cleared before the ioctl returns, unless
KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2 is enabled.  For more information,
see the description of the capability.

Note that the Xen shared info page, if configured, shall always be assumed
Note that the Xen shared_info page, if configured, shall always be assumed
to be dirty. KVM will not explicitly mark it such.


@@ -5487,8 +5487,9 @@ KVM_PV_ASYNC_CLEANUP_PERFORM
		__u8 long_mode;
		__u8 vector;
		__u8 runstate_update_flag;
		struct {
		union {
			__u64 gfn;
			__u64 hva;
		} shared_info;
		struct {
			__u32 send_port;
@@ -5516,19 +5517,20 @@ type values:

KVM_XEN_ATTR_TYPE_LONG_MODE
  Sets the ABI mode of the VM to 32-bit or 64-bit (long mode). This
  determines the layout of the shared info pages exposed to the VM.
  determines the layout of the shared_info page exposed to the VM.

KVM_XEN_ATTR_TYPE_SHARED_INFO
  Sets the guest physical frame number at which the Xen "shared info"
  Sets the guest physical frame number at which the Xen shared_info
  page resides. Note that although Xen places vcpu_info for the first
  32 vCPUs in the shared_info page, KVM does not automatically do so
  and instead requires that KVM_XEN_VCPU_ATTR_TYPE_VCPU_INFO be used
  explicitly even when the vcpu_info for a given vCPU resides at the
  "default" location in the shared_info page. This is because KVM may
  not be aware of the Xen CPU id which is used as the index into the
  vcpu_info[] array, so may know the correct default location.

  Note that the shared info page may be constantly written to by KVM;
  and instead requires that KVM_XEN_VCPU_ATTR_TYPE_VCPU_INFO or
  KVM_XEN_VCPU_ATTR_TYPE_VCPU_INFO_HVA be used explicitly even when
  the vcpu_info for a given vCPU resides at the "default" location
  in the shared_info page. This is because KVM may not be aware of
  the Xen CPU id which is used as the index into the vcpu_info[]
  array, so may know the correct default location.

  Note that the shared_info page may be constantly written to by KVM;
  it contains the event channel bitmap used to deliver interrupts to
  a Xen guest, amongst other things. It is exempt from dirty tracking
  mechanisms — KVM will not explicitly mark the page as dirty each
@@ -5537,9 +5539,21 @@ KVM_XEN_ATTR_TYPE_SHARED_INFO
  any vCPU has been running or any event channel interrupts can be
  routed to the guest.

  Setting the gfn to KVM_XEN_INVALID_GFN will disable the shared info
  Setting the gfn to KVM_XEN_INVALID_GFN will disable the shared_info
  page.

KVM_XEN_ATTR_TYPE_SHARED_INFO_HVA
  If the KVM_XEN_HVM_CONFIG_SHARED_INFO_HVA flag is also set in the
  Xen capabilities, then this attribute may be used to set the
  userspace address at which the shared_info page resides, which
  will always be fixed in the VMM regardless of where it is mapped
  in guest physical address space. This attribute should be used in
  preference to KVM_XEN_ATTR_TYPE_SHARED_INFO as it avoids
  unnecessary invalidation of an internal cache when the page is
  re-mapped in guest physcial address space.

  Setting the hva to zero will disable the shared_info page.

KVM_XEN_ATTR_TYPE_UPCALL_VECTOR
  Sets the exception vector used to deliver Xen event channel upcalls.
  This is the HVM-wide vector injected directly by the hypervisor
@@ -5636,6 +5650,21 @@ KVM_XEN_VCPU_ATTR_TYPE_VCPU_INFO
  on dirty logging. Setting the gpa to KVM_XEN_INVALID_GPA will disable
  the vcpu_info.

KVM_XEN_VCPU_ATTR_TYPE_VCPU_INFO_HVA
  If the KVM_XEN_HVM_CONFIG_SHARED_INFO_HVA flag is also set in the
  Xen capabilities, then this attribute may be used to set the
  userspace address of the vcpu_info for a given vCPU. It should
  only be used when the vcpu_info resides at the "default" location
  in the shared_info page. In this case it is safe to assume the
  userspace address will not change, because the shared_info page is
  an overlay on guest memory and remains at a fixed host address
  regardless of where it is mapped in guest physical address space
  and hence unnecessary invalidation of an internal cache may be
  avoided if the guest memory layout is modified.
  If the vcpu_info does not reside at the "default" location then
  it is not guaranteed to remain at the same host address and
  hence the aforementioned cache invalidation is required.

KVM_XEN_VCPU_ATTR_TYPE_VCPU_TIME_INFO
  Sets the guest physical address of an additional pvclock structure
  for a given vCPU. This is typically used for guest vsyscall support.
+1 −1
Original line number Diff line number Diff line
@@ -102,7 +102,7 @@ static int __diag_page_ref_service(struct kvm_vcpu *vcpu)
		    parm.token_addr & 7 || parm.zarch != 0x8000000000000000ULL)
			return kvm_s390_inject_program_int(vcpu, PGM_SPECIFICATION);

		if (kvm_is_error_gpa(vcpu->kvm, parm.token_addr))
		if (!kvm_is_gpa_in_memslot(vcpu->kvm, parm.token_addr))
			return kvm_s390_inject_program_int(vcpu, PGM_ADDRESSING);

		vcpu->arch.pfault_token = parm.token_addr;
+7 −7
Original line number Diff line number Diff line
@@ -664,7 +664,7 @@ static unsigned long guest_translate(struct kvm_vcpu *vcpu, unsigned long gva,
	case ASCE_TYPE_REGION1:	{
		union region1_table_entry rfte;

		if (kvm_is_error_gpa(vcpu->kvm, ptr))
		if (!kvm_is_gpa_in_memslot(vcpu->kvm, ptr))
			return PGM_ADDRESSING;
		if (deref_table(vcpu->kvm, ptr, &rfte.val))
			return -EFAULT;
@@ -682,7 +682,7 @@ static unsigned long guest_translate(struct kvm_vcpu *vcpu, unsigned long gva,
	case ASCE_TYPE_REGION2: {
		union region2_table_entry rste;

		if (kvm_is_error_gpa(vcpu->kvm, ptr))
		if (!kvm_is_gpa_in_memslot(vcpu->kvm, ptr))
			return PGM_ADDRESSING;
		if (deref_table(vcpu->kvm, ptr, &rste.val))
			return -EFAULT;
@@ -700,7 +700,7 @@ static unsigned long guest_translate(struct kvm_vcpu *vcpu, unsigned long gva,
	case ASCE_TYPE_REGION3: {
		union region3_table_entry rtte;

		if (kvm_is_error_gpa(vcpu->kvm, ptr))
		if (!kvm_is_gpa_in_memslot(vcpu->kvm, ptr))
			return PGM_ADDRESSING;
		if (deref_table(vcpu->kvm, ptr, &rtte.val))
			return -EFAULT;
@@ -728,7 +728,7 @@ static unsigned long guest_translate(struct kvm_vcpu *vcpu, unsigned long gva,
	case ASCE_TYPE_SEGMENT: {
		union segment_table_entry ste;

		if (kvm_is_error_gpa(vcpu->kvm, ptr))
		if (!kvm_is_gpa_in_memslot(vcpu->kvm, ptr))
			return PGM_ADDRESSING;
		if (deref_table(vcpu->kvm, ptr, &ste.val))
			return -EFAULT;
@@ -748,7 +748,7 @@ static unsigned long guest_translate(struct kvm_vcpu *vcpu, unsigned long gva,
		ptr = ste.fc0.pto * (PAGE_SIZE / 2) + vaddr.px * 8;
	}
	}
	if (kvm_is_error_gpa(vcpu->kvm, ptr))
	if (!kvm_is_gpa_in_memslot(vcpu->kvm, ptr))
		return PGM_ADDRESSING;
	if (deref_table(vcpu->kvm, ptr, &pte.val))
		return -EFAULT;
@@ -770,7 +770,7 @@ static unsigned long guest_translate(struct kvm_vcpu *vcpu, unsigned long gva,
		*prot = PROT_TYPE_IEP;
		return PGM_PROTECTION;
	}
	if (kvm_is_error_gpa(vcpu->kvm, raddr.addr))
	if (!kvm_is_gpa_in_memslot(vcpu->kvm, raddr.addr))
		return PGM_ADDRESSING;
	*gpa = raddr.addr;
	return 0;
@@ -957,7 +957,7 @@ static int guest_range_to_gpas(struct kvm_vcpu *vcpu, unsigned long ga, u8 ar,
				return rc;
		} else {
			gpa = kvm_s390_real_to_abs(vcpu, ga);
			if (kvm_is_error_gpa(vcpu->kvm, gpa)) {
			if (!kvm_is_gpa_in_memslot(vcpu->kvm, gpa)) {
				rc = PGM_ADDRESSING;
				prot = PROT_NONE;
			}
+2 −2
Original line number Diff line number Diff line
@@ -2878,7 +2878,7 @@ static int kvm_s390_vm_mem_op_abs(struct kvm *kvm, struct kvm_s390_mem_op *mop)

	srcu_idx = srcu_read_lock(&kvm->srcu);

	if (kvm_is_error_gpa(kvm, mop->gaddr)) {
	if (!kvm_is_gpa_in_memslot(kvm, mop->gaddr)) {
		r = PGM_ADDRESSING;
		goto out_unlock;
	}
@@ -2940,7 +2940,7 @@ static int kvm_s390_vm_mem_op_cmpxchg(struct kvm *kvm, struct kvm_s390_mem_op *m

	srcu_idx = srcu_read_lock(&kvm->srcu);

	if (kvm_is_error_gpa(kvm, mop->gaddr)) {
	if (!kvm_is_gpa_in_memslot(kvm, mop->gaddr)) {
		r = PGM_ADDRESSING;
		goto out_unlock;
	}
+2 −2
Original line number Diff line number Diff line
@@ -149,7 +149,7 @@ static int handle_set_prefix(struct kvm_vcpu *vcpu)
	 * first page, since address is 8k aligned and memory pieces are always
	 * at least 1MB aligned and have at least a size of 1MB.
	 */
	if (kvm_is_error_gpa(vcpu->kvm, address))
	if (!kvm_is_gpa_in_memslot(vcpu->kvm, address))
		return kvm_s390_inject_program_int(vcpu, PGM_ADDRESSING);

	kvm_s390_set_prefix(vcpu, address);
@@ -464,7 +464,7 @@ static int handle_test_block(struct kvm_vcpu *vcpu)
		return kvm_s390_inject_prog_irq(vcpu, &vcpu->arch.pgm);
	addr = kvm_s390_real_to_abs(vcpu, addr);

	if (kvm_is_error_gpa(vcpu->kvm, addr))
	if (!kvm_is_gpa_in_memslot(vcpu->kvm, addr))
		return kvm_s390_inject_program_int(vcpu, PGM_ADDRESSING);
	/*
	 * We don't expect errors on modern systems, and do not care
Loading