+1043
−1
File changed.
Preview size limit exceeded, changes collapsed.
Loading
Background: loading qla2xxx with "ql2xtgt_tape_enable=1" enables Sequence Level Error Recovery (SLER), which is most commonly used for tape drives. With SLER enabled, if there is a recoverable I/O error during a SCSI command, a Sequence Retransmission Request (SRR) will be used to retry the I/O at a low-level completely within the driver without propagating the error to the upper levels of the SCSI stack. SRR support was removed in 2017 by commit 2c39b5ca ("qla2xxx: Remove SRR code"). Add it back, new and improved. The old removed SRR code used sequence numbers to correlate the SRR CTIOs with SRR immediate notify messages. I don't see how that would work reliably with MSI-X interrupts and multiple queues. So instead use the exchange address to find the command associated with the immediate notify (qlt_srr_to_cmd). The old removed SRR code had a function qlt_check_srr_debug() to simulate a SRR, but it didn't work for me. Instead I just used fiber optic attenuators attached to the FC cable to reduce the strength of the signal and induce errors. Unfortunately this only worked for inducing SRRs on Data-Out (write) commands, so that is all I was able to test. The code to build a new scatterlist for a SRR with nonzero offset has been improved to reduce memory requirements and has been well-tested. However it does not support protection information. When a single cmd gets multiple SRRs, the old removed SRR code would restore the data buffer from the values in cmd->se_cmd before processing the new SRR. That might be needed if the offset for the new SRR was lower than the offset for the previous SRR, but I am not sure if that can happen. In my testing, when a single cmd gets multiple SRRs, the SRR offset always increases or stays the same. But in case it can decrease, I added the function qlt_restore_orig_sg(). If this is not supposed to happen then qlt_restore_orig_sg() can be removed to simplify the code. I ran into some HBA firmware bugs with QLE269x, QLE27xx, and QLE28xx firmware 9.05.xx - 9.08.xx where a SRR would cause the HBA to misbehave badly. Since SRRs are rare and therefore difficult to test, I figured it would be worth checking for the buggy firmware and disabling SLER with a warning instead of letting others run into the same problem on the rare occasion that they get a SRR. This turned out to be difficult because the firmware version isn't known in the normal NVRAM config routine, so I added a second NVRAM config routine that is called after the firmware version is known. Signed-off-by:Tony Battersby <tonyb@cybernetics.com> Link: https://patch.msgid.link/654b7181-b79e-40ed-a15b-6d6e441a5d5f@cybernetics.com Signed-off-by:
Martin K. Petersen <martin.petersen@oracle.com>
File changed.
Preview size limit exceeded, changes collapsed.