drm/amdgpu: enable watchdog feature for SQ of aldebaran

SQ's watchdog timer monitors forward progress, a mask of which waves
caused the watchdog timeout is recorded into ras status registers and
then trigger a system fatal error event.

v2:
1. change *query_timeout_status to *query_sq_timeout_status.
2. move query_sq_timeout_status into amdgpu_ras_do_recovery.
3. add module parameters to enable/disable fatal error event and modify
the watchdog timer.

v3:
1. remove unused parameters of *enable_watchdog_timer

Signed-off-by: Dennis Li <Dennis.Li@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This commit is contained in:
Dennis Li
2021-03-05 16:30:54 -05:00
committed by Alex Deucher
parent 4abc2567f0
commit 88f8575bca
8 changed files with 174 additions and 0 deletions

View File

@@ -226,6 +226,8 @@ struct amdgpu_gfx_funcs {
void (*init_spm_golden)(struct amdgpu_device *adev);
void (*query_ras_error_status) (struct amdgpu_device *adev);
void (*update_perfmon_mgcg)(struct amdgpu_device *adev, bool enable);
void (*enable_watchdog_timer)(struct amdgpu_device *adev);
void (*query_sq_timeout_status)(struct amdgpu_device *adev);
};
struct sq_work {