Shuai found that cxl_reset_bus_function() calls pci_reset_bus_function()
internally while both are calling pci_dev_reset_iommu_prepare/done(). As
pci_dev_reset_iommu_prepare() doesn't support re-entry, the inner call will
trigger a WARN_ON and return -EBUSY, resulting in failing the entire device
reset.

On the other hand, removing the outer calls in the PCI callers is unsafe. As
pointed out by Kevin, device-specific quirks like reset_hinic_vf_dev() execute
custom firmware waits after their inner pcie_flr() completes. If the IOMMU
protection relies solely on the inner reset, the IOMMU will be unblocked
prematurely while the device is still resetting.

Instead, fix this by making pci_dev_reset_iommu_prepare/done() reentrant.
Introduce gdev->reset_depth to handle the re-entries on the same device.

Fixes: c279e839 ("iommu: Introduce pci_dev_reset_iommu_prepare/done()")
Cc: stable@vger.kernel.org
Reported-by: Shuai Xue <xueshuai@linux.alibaba.com>
Closes: https://lore.kernel.org/all/absKsk7qQOwzhpzv@Asurada-Nvidia/
Suggested-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Shuai Xue <xueshuai@linux.alibaba.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
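
For illustration only, the depth-counting idea behind gdev->reset_depth can be
sketched in plain userspace C. This is not the kernel code from this patch:
struct sketch_gdev, its blocked flag, and every sketch_*() function are
hypothetical stand-ins, and locking and error handling are omitted. It only
shows why a nested prepare/done pair no longer fails and why the IOMMU stays
blocked until the outermost done().

```c
#include <stdbool.h>

/* Hypothetical stand-in for the per-device IOMMU group state. */
struct sketch_gdev {
	unsigned int reset_depth;	/* nesting level of prepare() calls */
	bool blocked;			/* true while the device is blocked */
};

/* Only the outermost prepare() blocks; nested calls just bump the depth. */
static int sketch_reset_iommu_prepare(struct sketch_gdev *gdev)
{
	if (gdev->reset_depth++ > 0)
		return 0;		/* re-entry: already blocked */
	gdev->blocked = true;
	return 0;
}

/* Only the outermost done() unblocks; nested calls just drop the depth. */
static void sketch_reset_iommu_done(struct sketch_gdev *gdev)
{
	if (gdev->reset_depth == 0)
		return;			/* unbalanced done(): nothing to undo */
	if (--gdev->reset_depth > 0)
		return;			/* an outer reset is still in progress */
	gdev->blocked = false;
}

/* Inner reset (think of an FLR) that brackets itself with prepare/done. */
static int sketch_inner_reset(struct sketch_gdev *gdev)
{
	int ret = sketch_reset_iommu_prepare(gdev);

	if (ret)
		return ret;
	/* the actual reset would be issued here */
	sketch_reset_iommu_done(gdev);
	return 0;
}

/*
 * Outer reset that wraps the inner one and then performs a device-specific
 * wait. The blocked state seen during that wait is reported to the caller.
 */
static int sketch_outer_reset(struct sketch_gdev *gdev,
			      bool *blocked_during_wait)
{
	int ret = sketch_reset_iommu_prepare(gdev);

	if (ret)
		return ret;
	ret = sketch_inner_reset(gdev);
	/* custom firmware wait would run here, after the inner reset */
	*blocked_during_wait = gdev->blocked;
	sketch_reset_iommu_done(gdev);
	return ret;
}
```

Calling sketch_outer_reset() on a zero-initialized struct sketch_gdev succeeds,
reports the device as still blocked during the post-reset wait, and leaves
reset_depth at 0 with the device unblocked afterwards.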