drm/amdkfd: restore userptr ignore bad address error

The userptr can be unmapped by application and still registered to
driver, restore userptr work return user pages will get -EFAULT bad
address error. Pretend this error as succeed. GPU access this userptr
will have VM fault later, it is better than application soft hangs with
stalled user mode queues.

v2: squash in warning fix (Alex)

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This commit is contained in:
Philip Yang
2021-10-26 11:59:28 -04:00
committed by Alex Deucher
parent 68daadf3d6
commit 3b8a23ae52
2 changed files with 20 additions and 10 deletions

View File

@@ -2041,19 +2041,26 @@ static int update_invalid_user_pages(struct amdkfd_process_info *process_info,
/* Get updated user pages */
ret = amdgpu_ttm_tt_get_user_pages(bo, bo->tbo.ttm->pages);
if (ret) {
pr_debug("%s: Failed to get user pages: %d\n",
__func__, ret);
pr_debug("Failed %d to get user pages\n", ret);
/* Return error -EBUSY or -ENOMEM, retry restore */
return ret;
/* Return -EFAULT bad address error as success. It will
* fail later with a VM fault if the GPU tries to access
* it. Better than hanging indefinitely with stalled
* user mode queues.
*
* Return other error -EBUSY or -ENOMEM to retry restore
*/
if (ret != -EFAULT)
return ret;
} else {
/*
* FIXME: Cannot ignore the return code, must hold
* notifier_lock
*/
amdgpu_ttm_tt_get_user_pages_done(bo->tbo.ttm);
}
/*
* FIXME: Cannot ignore the return code, must hold
* notifier_lock
*/
amdgpu_ttm_tt_get_user_pages_done(bo->tbo.ttm);
/* Mark the BO as valid unless it was invalidated
* again concurrently.
*/