drm/amdgpu: refine create and release logic of hive info

Change to dynamically create and release hive info object,
which help driver support more hives in the future.

v2:
Change to save hive object pointer in adev, to avoid locking
xgmi_mutex every time when calling amdgpu_get_xgmi_hive.

v3:
1. Change type of hive object pointer in adev from void* to
amdgpu_hive_info*.
2. remove unnecessary variable initialization.

Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Dennis Li <Dennis.Li@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This commit is contained in:
Dennis Li
2020-08-18 18:44:17 +08:00
committed by Alex Deucher
parent aac891685d
commit d95e8e97e2
5 changed files with 132 additions and 107 deletions

View File

@@ -1554,9 +1554,10 @@ static void amdgpu_ras_do_recovery(struct work_struct *work)
struct amdgpu_device *remote_adev = NULL;
struct amdgpu_device *adev = ras->adev;
struct list_head device_list, *device_list_handle = NULL;
struct amdgpu_hive_info *hive = amdgpu_get_xgmi_hive(adev, false);
if (!ras->disable_ras_err_cnt_harvest) {
struct amdgpu_hive_info *hive = amdgpu_get_xgmi_hive(adev);
/* Build list of devices to query RAS related errors */
if (hive && adev->gmc.xgmi.num_physical_nodes > 1) {
device_list_handle = &hive->device_list;
@@ -1569,6 +1570,8 @@ static void amdgpu_ras_do_recovery(struct work_struct *work)
list_for_each_entry(remote_adev,
device_list_handle, gmc.xgmi.head)
amdgpu_ras_log_on_err_counter(remote_adev);
amdgpu_put_xgmi_hive(hive);
}
if (amdgpu_device_should_recover_gpu(ras->adev))