drm/amdgpu: fix sriov host flr handler

We send back the ready to reset message before we stop anything. This is
wrong. Move it to when we are actually ready for the FLR to happen.

In the current state since we take tens of seconds to stop everything,
it is very likely that host would give up waiting and reset the GPU
before we send ready, so it would be the same as before. But this gets
rid of the hack with reset_domain locking and also let us tell how slow
ready to reset actually is from the host. The ready to reset speed can
be improved later.

Signed-off-by: Yunxiang Li <Yunxiang.Li@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Emily Deng <Emily.Deng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This commit is contained in:
Yunxiang Li
2024-05-24 16:22:28 -04:00
committed by Alex Deucher
parent b3948ad1ac
commit 5c0a1cdd17
6 changed files with 58 additions and 60 deletions

View File

@@ -5070,6 +5070,8 @@ static int amdgpu_device_reset_sriov(struct amdgpu_device *adev,
struct amdgpu_hive_info *hive = NULL;
if (test_bit(AMDGPU_HOST_FLR, &reset_context->flags)) {
amdgpu_virt_ready_to_reset(adev);
amdgpu_virt_wait_reset(adev);
clear_bit(AMDGPU_HOST_FLR, &reset_context->flags);
r = amdgpu_virt_request_full_gpu(adev, true);
} else {