Commit a6149f03 authored Oct 30, 2023 by Matthew Brost Committed by Luben Tuikov Nov 01, 2023

drm/sched: Convert drm scheduler to use a work queue rather than kthread



In Xe, the new Intel GPU driver, a choice has been made to have a 1 to 1
mapping between a drm_gpu_scheduler and drm_sched_entity. At first this
seems a bit odd, but let us explain the reasoning below.

1. In Xe the submission order from multiple drm_sched_entity is not
guaranteed to be the same as the completion order, even if targeting the
same hardware engine. This is because in Xe we have a firmware scheduler,
the GuC, which is allowed to reorder, timeslice, and preempt submissions.
If using a shared drm_gpu_scheduler across multiple drm_sched_entity, the
TDR falls apart as the TDR expects submission order == completion order.
Using a dedicated drm_gpu_scheduler per drm_sched_entity solves this
problem.

2. In Xe submissions are done via programming a ring buffer (circular
buffer). A drm_gpu_scheduler provides a limit on the number of jobs; if
the limit on the number of jobs is set to RING_SIZE / MAX_SIZE_PER_JOB,
we get flow control on the ring for free.
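
The arithmetic behind point 2 can be sketched in plain userspace C. This is only an illustration: the concrete values of RING_SIZE and MAX_SIZE_PER_JOB and the helper names ring_job_limit(), ring_worst_case_bytes() and ring_can_submit() are assumptions made for the example, not part of Xe or the DRM scheduler.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed example values; the commit message names these quantities
 * but does not give their sizes. */
#define RING_SIZE        (16 * 1024) /* bytes available in the ring */
#define MAX_SIZE_PER_JOB 256         /* worst-case bytes one job writes */

/* Hypothetical helper: the job limit a driver would hand to the scheduler. */
static inline size_t ring_job_limit(size_t ring_size, size_t max_job_size)
{
	assert(max_job_size > 0 && max_job_size <= ring_size);
	return ring_size / max_job_size;
}

/* Worst-case ring usage when in_flight jobs are outstanding. */
static inline size_t ring_worst_case_bytes(size_t in_flight, size_t max_job_size)
{
	return in_flight * max_job_size;
}

/* Mirrors the scheduler admitting a new job only while below the limit. */
static inline int ring_can_submit(size_t in_flight, size_t limit)
{
	return in_flight < limit;
}
```

Because the scheduler never has more than ring_job_limit() jobs in flight, the worst-case usage can never exceed the ring size, so the driver needs no separate check for free ring space.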

A problem with this design is that currently a drm_gpu_scheduler uses a
kthread for submission / job cleanup. This doesn't scale if a large
number of drm_gpu_scheduler are used. To work around the scaling issue,
use a worker rather than a kthread for submission / job cleanup.
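
As a rough userspace analogy of the worker model (not the kernel implementation), the sketch below lets several toy schedulers share one work queue instead of each owning a thread. Each work invocation handles a single job and requeues itself if more jobs are pending, rather than looping. All names here (struct toy_sched, wq_enqueue(), wq_drain(), toy_submit()) are hypothetical and exist only for this example.

```c
#include <assert.h>
#include <stddef.h>

#define WQ_DEPTH 16 /* enough for up to 16 schedulers in this toy */

struct work;
typedef void (*work_fn)(struct work *);

struct work {
	work_fn fn;
	int queued; /* a work item sits in the queue at most once */
};

struct workqueue {
	struct work *items[WQ_DEPTH];
	size_t head, tail;
};

struct toy_sched {
	struct work run_job; /* first member, so a work pointer casts back */
	struct workqueue *wq;
	int pending_jobs;
	int completed_jobs;
};

static void wq_enqueue(struct workqueue *wq, struct work *w)
{
	if (w->queued)
		return; /* already pending, nothing to do */
	w->queued = 1;
	wq->items[wq->tail++ % WQ_DEPTH] = w;
}

/* Stand-in for the worker pool: run queued work items in FIFO order. */
static void wq_drain(struct workqueue *wq)
{
	while (wq->head != wq->tail) {
		struct work *w = wq->items[wq->head++ % WQ_DEPTH];

		w->queued = 0; /* clear before running so fn may requeue */
		w->fn(w);
	}
}

/* Handle exactly one job per invocation, then requeue if work remains. */
static void toy_run_job(struct work *w)
{
	struct toy_sched *s = (struct toy_sched *)w;

	if (s->pending_jobs == 0)
		return;
	s->pending_jobs--;
	s->completed_jobs++;
	if (s->pending_jobs > 0)
		wq_enqueue(s->wq, w);
}

static void toy_sched_init(struct toy_sched *s, struct workqueue *wq)
{
	s->run_job.fn = toy_run_job;
	s->run_job.queued = 0;
	s->wq = wq;
	s->pending_jobs = 0;
	s->completed_jobs = 0;
}

static void toy_submit(struct toy_sched *s)
{
	s->pending_jobs++;
	wq_enqueue(s->wq, &s->run_job);
}
```

With this shape an idle scheduler costs only a small struct, and because no work item loops, schedulers sharing a queue take turns one job at a time.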

v2:
  - (Rob Clark) Fix msm build
  - Pass in run work queue
v3:
  - (Boris) don't have loop in worker
v4:
  - (Tvrtko) break out submit ready, stop, start helpers into own patch
v5:
  - (Boris) default to ordered work queue
v6:
  - (Luben / checkpatch) fix alignment in msm_ringbuffer.c
  - (Luben) s/drm_sched_submit_queue/drm_sched_wqueue_enqueue
  - (Luben) Update comment for drm_sched_wqueue_enqueue
  - (Luben) Positive check for submit_wq in drm_sched_init
  - (Luben) s/alloc_submit_wq/own_submit_wq
v7:
  - (Luben) s/drm_sched_wqueue_enqueue/drm_sched_run_job_queue
v8:
  - (Luben) Adjust var names / comments

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Luben Tuikov <luben.tuikov@amd.com>
Link: https://lore.kernel.org/r/20231031032439.1558703-3-matthew.brost@intel.com


Signed-off-by: Luben Tuikov <ltuikov89@gmail.com>

parent 35963cf2

drivers/gpu/drm/amd/amdgpu/amdgpu_device.c

+1 −1

@@ -2279,7 +2279,7 @@ static int amdgpu_device_init_schedulers(struct amdgpu_device *adev)
 			break;
 		}

-		r = drm_sched_init(&ring->sched, &amdgpu_sched_ops,
+		r = drm_sched_init(&ring->sched, &amdgpu_sched_ops, NULL,
 				   DRM_SCHED_PRIORITY_COUNT,
 				   ring->num_hw_submission, 0,
 				   timeout, adev->reset_domain->wq,

drivers/gpu/drm/etnaviv/etnaviv_sched.c

+1 −1

@@ -134,7 +134,7 @@ int etnaviv_sched_init(struct etnaviv_gpu *gpu)
 {
 	int ret;

-	ret = drm_sched_init(&gpu->sched, &etnaviv_sched_ops,
+	ret = drm_sched_init(&gpu->sched, &etnaviv_sched_ops, NULL,
 			     DRM_SCHED_PRIORITY_COUNT,
 			     etnaviv_hw_jobs_limit, etnaviv_job_hang_limit,
 			     msecs_to_jiffies(500), NULL, NULL,

drivers/gpu/drm/lima/lima_sched.c

+1 −1

@@ -488,7 +488,7 @@ int lima_sched_pipe_init(struct lima_sched_pipe *pipe, const char *name)

 	INIT_WORK(&pipe->recover_work, lima_sched_recover_work);

-	return drm_sched_init(&pipe->base, &lima_sched_ops,
+	return drm_sched_init(&pipe->base, &lima_sched_ops, NULL,
 			      DRM_SCHED_PRIORITY_COUNT,
 			      1,
 			      lima_job_hang_limit,

drivers/gpu/drm/msm/msm_ringbuffer.c

+1 −1

@@ -94,7 +94,7 @@ struct msm_ringbuffer *msm_ringbuffer_new(struct msm_gpu *gpu, int id,
 	/* currently managing hangcheck ourselves: */
 	sched_timeout = MAX_SCHEDULE_TIMEOUT;

-	ret = drm_sched_init(&ring->sched, &msm_sched_ops,
+	ret = drm_sched_init(&ring->sched, &msm_sched_ops, NULL,
 			     DRM_SCHED_PRIORITY_COUNT,
 			     num_hw_submissions, 0, sched_timeout,
 			     NULL, NULL, to_msm_bo(ring->bo)->name, gpu->dev->dev);

drivers/gpu/drm/nouveau/nouveau_sched.c

+1 −1

@@ -429,7 +429,7 @@ int nouveau_sched_init(struct nouveau_drm *drm)
 	if (!drm->sched_wq)
 		return -ENOMEM;

-	return drm_sched_init(sched, &nouveau_sched_ops,
+	return drm_sched_init(sched, &nouveau_sched_ops, NULL,
 			      DRM_SCHED_PRIORITY_COUNT,
 			      NOUVEAU_SCHED_HW_SUBMISSIONS, 0, job_hang_limit,
 			      NULL, NULL, "nouveau_sched", drm->dev->dev);