drm/amdkfd: Remove arbitrary timeout for hmm_range_fault

On systems with khugepaged enabled and use cases with THP buffers,
hmm_range_fault may take > 15 seconds to return -EBUSY. The arbitrary
timeout value is not accurate and causes memory allocation failures.

Remove the arbitrary timeout value and return -EAGAIN to the application
if hmm_range_fault returns -EBUSY; userspace libdrm and Thunk will then
call the ioctl again.
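
For illustration, the userspace retry described above can be sketched as
a loop that reissues the call while it fails with EAGAIN (or EINTR),
which is the pattern libdrm's drmIoctl() wrapper follows. This is a
minimal sketch, not the libdrm or Thunk source: retry_on_eagain(),
ioctl_like_fn and fake_busy_call() are hypothetical names introduced
here, and the fake call stands in for a real ioctl so the example is
self-contained.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for an ioctl-style call: returns 0 on success,
 * or -1 with errno set on failure. */
typedef int (*ioctl_like_fn)(void *arg);

/* Reissue the call while it reports a transient failure
 * (EINTR or EAGAIN); return the final result otherwise. */
static int retry_on_eagain(ioctl_like_fn fn, void *arg)
{
	int ret;

	do {
		ret = fn(arg);
	} while (ret == -1 && (errno == EINTR || errno == EAGAIN));

	return ret;
}

/* Fake call for illustration only: fails with EAGAIN until the counter
 * pointed to by arg reaches zero, then succeeds. */
static int fake_busy_call(void *arg)
{
	int *remaining = arg;

	if (*remaining > 0) {
		(*remaining)--;
		errno = EAGAIN;
		return -1;
	}
	return 0;
}
```

With such a loop in userspace, the kernel does not need to guess how
long hmm_range_fault may stay busy; the caller simply retries until the
call succeeds or fails with a different error.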

Change the EAGAIN message to a debug message, as this is not an error.

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Philip Yang
2024-04-30 13:51:51 -04:00
committed by Alex Deucher
parent 10f624ef23
commit 9095e55440
3 changed files with 8 additions and 14 deletions

@@ -1088,7 +1088,10 @@ static int init_user_pages(struct kgd_mem *mem, uint64_t user_addr,
 	ret = amdgpu_ttm_tt_get_user_pages(bo, bo->tbo.ttm->pages, &range);
 	if (ret) {
-		pr_err("%s: Failed to get user pages: %d\n", __func__, ret);
+		if (ret == -EAGAIN)
+			pr_debug("Failed to get user pages, try again\n");
+		else
+			pr_err("%s: Failed to get user pages: %d\n", __func__, ret);
 		goto unregister_out;
 	}