Commit 5b2c214a, authored by Keith Busch, committed by Christoph Hellwig

nvme-pci: try function level reset on init failure



NVMe devices from multiple vendors appear to get stuck in a reset state
that we can't get out of with an NVMe level Controller Reset. The kernel
would report these with messages that look like:

  Device not ready; aborting reset, CSTS=0x1

These have historically required a power cycle to make them usable
again, but in many cases, a PCIe FLR is sufficient to restart operation
without a power cycle. Try it if the initial controller reset fails
during any nvme reset attempt.
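
For reference, the CSTS value in the message above can be decoded by bit
field. This is a minimal userspace sketch, not kernel code: the bit
positions (RDY in bit 0, CFS in bit 1, SHST in bits 3:2) follow the NVMe
base specification, while the macro and function names are hypothetical
and chosen only for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* CSTS bit layout per the NVMe base specification. */
#define CSTS_RDY	(1u << 0)	/* controller ready */
#define CSTS_CFS	(1u << 1)	/* controller fatal status */
#define CSTS_SHST_SHIFT	2		/* shutdown status, bits 3:2 */
#define CSTS_SHST_MASK	0x3u

/* Hypothetical helpers, named for this example only. */
static bool csts_ready(uint32_t csts)
{
	return (csts & CSTS_RDY) != 0;
}

static bool csts_fatal(uint32_t csts)
{
	return (csts & CSTS_CFS) != 0;
}

static unsigned int csts_shst(uint32_t csts)
{
	return (csts >> CSTS_SHST_SHIFT) & CSTS_SHST_MASK;
}

/*
 * After CC.EN has been cleared, a healthy controller is expected to
 * drop RDY to 0. If RDY is still set once the wait times out, the
 * disable did not take effect.
 */
static bool csts_stuck_after_disable(uint32_t csts)
{
	return csts_ready(csts);
}
```

With CSTS=0x1, RDY is still set while CFS and SHST are clear, so the
controller has not reported a fatal error; it simply never completed the
transition that the disable request asked for.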

Signed-off-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Nitesh Shetty <nj.shetty@samsung.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
parent 746d0ac5
+22 −2
@@ -2064,9 +2064,29 @@ static int nvme_pci_configure_admin_queue(struct nvme_dev *dev)
	 * might be pointing at!
	 */
	result = nvme_disable_ctrl(&dev->ctrl, false);
	if (result < 0) {
		struct pci_dev *pdev = to_pci_dev(dev->dev);

		/*
		 * The NVMe Controller Reset method did not get an expected
		 * CSTS.RDY transition, so something with the device appears to
		 * be stuck. Use the lower level and bigger hammer PCIe
		 * Function Level Reset to attempt restoring the device to its
		 * initial state, and try again.
		 */
		result = pcie_reset_flr(pdev, false);
		if (result < 0)
			return result;

		pci_restore_state(pdev);
		result = nvme_disable_ctrl(&dev->ctrl, false);
		if (result < 0)
			return result;

		dev_info(dev->ctrl.device,
			"controller reset completed after pcie flr\n");
	}

	result = nvme_alloc_queue(dev, 0, NVME_AQ_DEPTH);
	if (result)
		return result;
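
The control flow of the hunk above (try the controller-level disable,
fall back to a function level reset, then retry once) can be exercised
outside the kernel with a small model. The sketch below is an
illustration under stated assumptions, not the driver's code:
`struct mock_ctrl` and every `mock_*` function are hypothetical
stand-ins, and the error codes are assumed to mirror the kernel's
behavior (-ENODEV for a disable that times out, -ENOTTY when FLR is not
available).

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of a controller; not a kernel structure. */
struct mock_ctrl {
	bool stuck;		/* CSTS.RDY refuses to clear */
	bool flr_supported;	/* device can perform a function level reset */
	bool flr_unsticks;	/* an FLR clears the stuck state */
	int flr_count;		/* number of FLRs actually issued */
};

/* Stand-in for nvme_disable_ctrl(): 0 on success, negative on timeout. */
static int mock_disable_ctrl(struct mock_ctrl *c)
{
	return c->stuck ? -ENODEV : 0;
}

/* Stand-in for pcie_reset_flr(): assumed to fail when FLR is unavailable. */
static int mock_reset_flr(struct mock_ctrl *c)
{
	if (!c->flr_supported)
		return -ENOTTY;

	c->flr_count++;
	if (c->flr_unsticks)
		c->stuck = false;
	return 0;
}

/* Mirrors the shape of the patched error path: disable, FLR, disable again. */
static int mock_configure_admin_queue(struct mock_ctrl *c)
{
	int result = mock_disable_ctrl(c);

	if (result < 0) {
		result = mock_reset_flr(c);
		if (result < 0)
			return result;

		/* The real code restores saved PCI config state here. */
		result = mock_disable_ctrl(c);
		if (result < 0)
			return result;
	}

	return 0;
}
```

In this model a healthy controller never triggers an FLR, a stuck but
recoverable one succeeds after exactly one FLR, and a controller that
stays stuck still fails with the error from the second disable attempt,
so the fallback adds one retry rather than an unbounded loop.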