drm/amdgpu: Improve SDMA reset logic with guilty queue tracking

This patch includes the remaining improvements to the SDMA reset logic:
- Added `gfx_guilty` and `page_guilty` flags to track guilty queues.
- Updated the reset and resume functions to handle the guilty state.
- Cached the `rptr` before reset.

v2:
   1. Replace the caller with a guilty bool.
   If the queue is the guilty one, set the rptr and wptr to the saved wptr value;
   otherwise, set the rptr and wptr to the saved rptr value. (Alex)
   2. Cache the rptr before the reset. (Alex)

v3: Keeping intermediate variables like u64 rwptr simplifies restoring rptr/wptr. (Lijo)
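
The restore rule from v2 and the intermediate `rwptr` variable from v3 can be sketched as below. This is a minimal standalone illustration, not the driver code: `struct queue_state`, `pick_rwptr()` and `restore_queue()` are hypothetical names, and plain struct fields stand in for the hardware ring-buffer registers that the real resume path would write.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for one SDMA queue (gfx or page). */
struct queue_state {
	uint64_t cached_rptr;	/* read pointer saved before the reset */
	uint64_t cached_wptr;	/* write pointer saved before the reset */
	uint64_t rptr;		/* value restored after the reset */
	uint64_t wptr;		/* value restored after the reset */
};

/* Choose the single value both pointers are restored to (v2 rule). */
static uint64_t pick_rwptr(const struct queue_state *q, bool guilty)
{
	/* Guilty queue: use the saved wptr. Otherwise: use the saved rptr. */
	return guilty ? q->cached_wptr : q->cached_rptr;
}

/* Restore rptr and wptr through one intermediate variable (v3 note). */
static void restore_queue(struct queue_state *q, bool guilty)
{
	uint64_t rwptr = pick_rwptr(q, guilty);

	q->rptr = rwptr;
	q->wptr = rwptr;
}
```

Calling `restore_queue(&q, true)` leaves both pointers at the cached wptr, and `restore_queue(&q, false)` leaves both at the cached rptr. Choosing the value once in `rwptr` keeps the guilty check in one place instead of repeating it per register.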

Suggested-by: Alex Deucher <alexander.deucher@amd.com>
Suggested-by: Jiadong Zhu <Jiadong.Zhu@amd.com>
Signed-off-by: Jesse Zhang <jesse.zhang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Jesse.zhang@amd.com
2025-02-20 14:43:59 +08:00
committed by Alex Deucher
parent 0ad649321a
commit fdbfaaaae0
3 changed files with 61 additions and 14 deletions

@@ -475,6 +475,10 @@ void amdgpu_sdma_register_on_reset_callbacks(struct amdgpu_device *adev, struct
 	if (!funcs)
 		return;
+	/* Ensure the reset_callback_list is initialized */
+	if (!adev->sdma.reset_callback_list.next) {
+		INIT_LIST_HEAD(&adev->sdma.reset_callback_list);
+	}
 	/* Initialize the list node in the callback structure */
 	INIT_LIST_HEAD(&funcs->list);
@@ -517,7 +521,7 @@ int amdgpu_sdma_reset_engine(struct amdgpu_device *adev, uint32_t instance_id, b
 	 */
 	if (!amdgpu_ring_sched_ready(gfx_ring)) {
 		drm_sched_wqueue_stop(&gfx_ring->sched);
-		gfx_sched_stopped = true;;
+		gfx_sched_stopped = true;
 	}
 	if (adev->sdma.has_page_queue && !amdgpu_ring_sched_ready(page_ring)) {