drm/amdgpu: Fix two reset triggered in a row

Some times a hang GPU causes multiple reset sources to schedule resets.
The second source will be able to trigger an unnecessary reset if they
schedule after we call amdgpu_device_stop_pending_resets.

Move amdgpu_device_stop_pending_resets to after the reset is done. Since
at this point the GPU is supposedly in a good state, any reset scheduled
after this point would be a legitimate reset.

Remove unnecessary and incorrect checks for amdgpu_in_reset that was
kinda serving this purpose.

Signed-off-by: Yunxiang Li <Yunxiang.Li@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This commit is contained in:
Yunxiang Li
2024-04-22 14:59:02 -04:00
committed by Alex Deucher
parent f5007c67fc
commit f4322b9f8a
5 changed files with 14 additions and 13 deletions

View File

@@ -560,7 +560,7 @@ static int xgpu_vi_mailbox_rcv_irq(struct amdgpu_device *adev,
r = xgpu_vi_mailbox_rcv_msg(adev, IDH_FLR_NOTIFICATION);
/* only handle FLR_NOTIFY now */
if (!r && !amdgpu_in_reset(adev))
if (!r)
WARN_ONCE(!amdgpu_reset_domain_schedule(adev->reset_domain,
&adev->virt.flr_work),
"Failed to queue work! at %s",