Commit 0d5fd7a9 authored by Nicolin Chen, committed by Joerg Roedel

iommu: Fix nested pci_dev_reset_iommu_prepare/done()



Shuai found that cxl_reset_bus_function() calls pci_reset_bus_function()
internally while both are calling pci_dev_reset_iommu_prepare/done().

As pci_dev_reset_iommu_prepare() doesn't support re-entry, the inner call
will trigger a WARN_ON and return -EBUSY, resulting in failing the entire
device reset.
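
A minimal userspace sketch of that failure, for illustration only. The
old_* names are hypothetical stand-ins rather than kernel symbols, and the
real helpers also allocate and attach a blocking domain, which this model
leaves out:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct group_device; only the flag matters. */
struct old_gdev {
	bool blocked;
};

/* Models the pre-fix check: a second prepare() on a blocked device fails. */
static int old_prepare(struct old_gdev *gdev)
{
	if (gdev->blocked)
		return -EBUSY;	/* the kernel also fires WARN_ON() here */
	gdev->blocked = true;
	return 0;
}

static void old_done(struct old_gdev *gdev)
{
	gdev->blocked = false;
}

/* Inner reset, modeled on pci_reset_bus_function() taking its own pair. */
static int old_inner_reset(struct old_gdev *gdev)
{
	int ret = old_prepare(gdev);

	if (ret)
		return ret;
	/* ... the bus reset itself would happen here ... */
	old_done(gdev);
	return 0;
}

/* Outer reset, modeled on cxl_reset_bus_function() wrapping the inner one. */
static int old_outer_reset(struct old_gdev *gdev)
{
	int ret = old_prepare(gdev);

	if (ret)
		return ret;
	ret = old_inner_reset(gdev);	/* nested prepare() returns -EBUSY */
	old_done(gdev);
	return ret;
}
```

In this model old_inner_reset() succeeds when called directly, while
old_outer_reset() always propagates -EBUSY from the nested prepare().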

On the other hand, removing the outer calls in the PCI callers is unsafe.
As pointed out by Kevin, device-specific quirks like reset_hinic_vf_dev()
execute custom firmware waits after their inner pcie_flr() completes. If
the IOMMU protection relies solely on the inner reset, the IOMMU will be
unblocked prematurely while the device is still resetting.
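
The ordering problem can be sketched as a toy timeline. Everything below is
a hypothetical model, not kernel code; it assumes the inner reset carries
its own prepare()/done() pair and that the quirk waits on firmware
afterwards:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical device model; none of these names exist in the kernel. */
struct quirk_dev {
	bool iommu_blocked;		/* true between prepare() and done() */
	bool blocked_during_fw_wait;	/* snapshot taken in the firmware wait */
};

/* Inner reset that carries its own prepare()/done() pair. */
static void model_inner_flr(struct quirk_dev *dev)
{
	dev->iommu_blocked = true;	/* inner prepare() */
	/* ... function-level reset ... */
	dev->iommu_blocked = false;	/* inner done() */
}

/* Quirk-style reset with the outer prepare()/done() pair removed. */
static void model_quirk_reset_inner_only(struct quirk_dev *dev)
{
	model_inner_flr(dev);
	/* Firmware is still settling, but the IOMMU is already unblocked. */
	dev->blocked_during_fw_wait = dev->iommu_blocked;
}
```

The snapshot ends up false: the firmware wait runs with no IOMMU
protection, which is the window the outer pair is there to cover.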

Instead, fix this by making pci_dev_reset_iommu_prepare/done() reentrant.

Introduce gdev->reset_depth to handle the re-entries on the same device.
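
A simplified userspace model of the counting scheme, assuming a
hypothetical fail_attach knob in place of the real blocking-domain
allocation and attach steps. The depth_* names are illustrative only:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct group_device. */
struct depth_gdev {
	unsigned int reset_depth;
	bool blocked;
	bool fail_attach;	/* hypothetical knob to force a setup failure */
};

static int depth_prepare(struct depth_gdev *gdev)
{
	/* Nested call: the device is already blocked, so only count it. */
	if (gdev->reset_depth++)
		return 0;

	if (gdev->fail_attach) {
		gdev->reset_depth--;	/* undo the count so a retry starts clean */
		return -ENOMEM;
	}
	gdev->blocked = true;
	return 0;
}

static void depth_done(struct depth_gdev *gdev)
{
	if (gdev->reset_depth == 0)
		return;		/* unbalanced done(); the kernel WARNs here */
	if (--gdev->reset_depth)
		return;		/* an outer caller still holds the reset */
	gdev->blocked = false;
}
```

Only the outermost prepare() does the blocking work and only the matching
outermost done() undoes it, so the device stays blocked across any nested
pairs in between.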

Fixes: c279e839 ("iommu: Introduce pci_dev_reset_iommu_prepare/done()")
Cc: stable@vger.kernel.org
Reported-by: Shuai Xue <xueshuai@linux.alibaba.com>
Closes: https://lore.kernel.org/all/absKsk7qQOwzhpzv@Asurada-Nvidia/
Suggested-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Shuai Xue <xueshuai@linux.alibaba.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
parent 1615e889
+13 −6
@@ -83,6 +83,7 @@ struct group_device {
 	 *  - Device is undergoing a reset
 	 */
 	bool blocked;
+	unsigned int reset_depth;
 };

 /* Iterate over each struct group_device in a struct iommu_group */
@@ -4045,21 +4046,24 @@ int pci_dev_reset_iommu_prepare(struct pci_dev *pdev)
 	if (WARN_ON(!gdev))
 		return -ENODEV;

-	/* Re-entry is not allowed */
-	if (WARN_ON(gdev->blocked))
-		return -EBUSY;
+	if (gdev->reset_depth++)
+		return 0;

 	ret = __iommu_group_alloc_blocking_domain(group);
-	if (ret)
+	if (ret) {
+		gdev->reset_depth--;
 		return ret;
+	}

 	/* Stage RID domain at blocking_domain while retaining group->domain */
 	if (group->domain != group->blocking_domain) {
 		ret = __iommu_attach_device(group->blocking_domain, &pdev->dev,
 					    group->domain);
-		if (ret)
+		if (ret) {
+			gdev->reset_depth--;
 			return ret;
+		}
 	}

 	/*
 	 * Update gdev->blocked upon the domain change, as it is used to return
@@ -4118,7 +4122,10 @@ void pci_dev_reset_iommu_done(struct pci_dev *pdev)
 	if (WARN_ON(!gdev))
 		return;

-	if (!gdev->blocked)
+	/* Unbalanced done() calls would underflow the counter */
+	if (WARN_ON(gdev->reset_depth == 0))
+		return;
+	if (--gdev->reset_depth)
 		return;

 	if (WARN_ON(!group->blocking_domain))