Commit 79104bec authored by Fernand Sieber, committed by Peter Zijlstra

sched/fair: Forfeit vruntime on yield



If a task yields, the scheduler may decide to pick it again. The task in
turn may decide to yield immediately or shortly after, leading to a tight
loop of yields.

If there's another runnable task at this point, the deadline will be
increased by the slice at each loop. This can cause the deadline to run away
pretty quickly, leading to elevated run delays later on as the task
doesn't get picked again. The reason the scheduler can pick the same task
again and again despite its deadline increasing is that it may be the
only eligible task at that point.

Fix this by making the task forfeit its remaining vruntime and pushing
the deadline one slice ahead. This implements yield behavior more
authentically.
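
As a rough illustration of the difference, here is a minimal user-space
sketch, not kernel code. The struct toy_entity type and the toy_* helpers are
hypothetical names invented for this example, and the slice is assumed to be
already expressed in virtual time (unit weight), so calc_delta_fair() is not
modelled. It compares the gap between deadline and vruntime after a run of
back-to-back yields under the old and new behavior.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical toy entity; NOT the kernel's struct sched_entity. */
struct toy_entity {
	uint64_t vruntime;
	uint64_t deadline;
	uint64_t slice;	/* assumed to be in virtual time units already */
};

/* Old behavior: each yield only pushes the deadline out by one slice. */
static void toy_yield_old(struct toy_entity *se)
{
	se->deadline += se->slice;
}

/*
 * New behavior: forfeit the remaining vruntime up to the current
 * deadline, then push the deadline one slice ahead.
 */
static void toy_yield_new(struct toy_entity *se)
{
	se->vruntime = se->deadline;
	se->deadline += se->slice;
}

/* Distance between deadline and vruntime after n back-to-back yields. */
static uint64_t toy_gap_after(void (*yield)(struct toy_entity *), int n)
{
	struct toy_entity se = { .vruntime = 0, .deadline = 3, .slice = 3 };

	assert(n >= 0);
	for (int i = 0; i < n; i++)
		yield(&se);

	return se.deadline - se.vruntime;
}
```

In this toy model, toy_gap_after(toy_yield_old, n) grows linearly with n,
while toy_gap_after(toy_yield_new, n) stays at one slice, because vruntime
follows the deadline on every yield.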

We limit the forfeiting to eligible tasks. This is because core scheduling
prefers running ineligible tasks rather than force idling. As such, without
the condition, we can end up in a yield loop which makes the vruntime
increase rapidly, leading to anomalous run delays later down the line.
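
The effect of the eligibility condition can be sketched with another toy
model. Everything here is an assumption made for illustration: struct toy_se
and the toy_* helpers are hypothetical, there are exactly two equal-weight
entities, and "eligible" is simplified to "vruntime not above the plain mean
of both vruntimes", which only approximates what entity_eligible() computes
in the kernel. The second entity never runs, mimicking a picker that keeps
selecting the yielding task even when it is ineligible.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical toy entity; NOT the kernel's struct sched_entity. */
struct toy_se {
	int64_t vruntime;
	int64_t deadline;
	int64_t slice;	/* assumed to be in virtual time units already */
};

/* Simplified eligibility: vruntime is not ahead of the two-task mean. */
static bool toy_eligible(const struct toy_se *se, const struct toy_se *other)
{
	int64_t avg = (se->vruntime + other->vruntime) / 2;

	return se->vruntime <= avg;
}

/* Forfeit on yield, optionally gated by the eligibility check. */
static void toy_yield(struct toy_se *se, const struct toy_se *other,
		      bool check_eligible)
{
	if (!check_eligible || toy_eligible(se, other)) {
		se->vruntime = se->deadline;
		se->deadline += se->slice;
	}
}

/* vruntime of the yielding task after n yields while "other" never runs. */
static int64_t toy_vruntime_after(bool check_eligible, int n)
{
	struct toy_se a = { .vruntime = 0, .deadline = 3, .slice = 3 };
	struct toy_se b = { .vruntime = 0, .deadline = 3, .slice = 3 };

	for (int i = 0; i < n; i++)
		toy_yield(&a, &b, check_eligible);

	return a.vruntime;
}
```

With the check, the yielding task forfeits once, becomes ineligible, and its
vruntime stops moving. Without the check, every yield in the loop advances
vruntime by another slice.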

Fixes: 147f3efa ("sched/fair: Implement an EEVDF-like scheduling policy")
Signed-off-by: Fernand Sieber <sieberf@amazon.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20250401123622.584018-1-sieberf@amazon.com
Link: https://lore.kernel.org/r/20250911095113.203439-1-sieberf@amazon.com
Link: https://lore.kernel.org/r/20250916140228.452231-1-sieberf@amazon.com
parent 3a866087
+13 −1
@@ -9007,7 +9007,19 @@ static void yield_task_fair(struct rq *rq)
	 */
	rq_clock_skip_update(rq);

	/*
	 * Forfeit the remaining vruntime, only if the entity is eligible. This
	 * condition is necessary because in core scheduling we prefer to run
	 * ineligible tasks rather than force idling. If this happens we may
	 * end up in a loop where the core scheduler picks the yielding task,
	 * which yields immediately again; without the condition the vruntime
	 * ends up quickly running away.
	 */
	if (entity_eligible(cfs_rq, se)) {
		se->vruntime = se->deadline;
		se->deadline += calc_delta_fair(se->slice, se);
		update_min_vruntime(cfs_rq);
	}
}

static bool yield_to_task_fair(struct rq *rq, struct task_struct *p)