Commit 2004cef1 authored by Linus Torvalds's avatar Linus Torvalds
Browse files

Merge tag 'sched-core-2024-09-19' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull scheduler updates from Ingo Molnar:

 - Implement the SCHED_DEADLINE server infrastructure - Daniel Bristot
   de Oliveira's last major contribution to the kernel:

     "SCHED_DEADLINE servers can help fixing starvation issues of low
      priority tasks (e.g., SCHED_OTHER) when higher priority tasks
      monopolize CPU cycles. Today we have RT Throttling; DEADLINE
      servers should be able to replace and improve that."

   (Daniel Bristot de Oliveira, Peter Zijlstra, Joel Fernandes, Youssef
   Esmat, Huang Shijie)

 - Preparatory changes for sched_ext integration:
     - Use set_next_task(.first) where required
     - Fix up set_next_task() implementations
     - Clean up DL server vs. core sched
     - Split up put_prev_task_balance()
     - Rework pick_next_task()
     - Combine the last put_prev_task() and the first set_next_task()
     - Rework dl_server
     - Add put_prev_task(.next)

   (Peter Zijlstra, with a fix by Tejun Heo)

 - Complete the EEVDF transition and refine EEVDF scheduling:
     - Implement delayed dequeue
     - Allow shorter slices to wakeup-preempt
     - Use sched_attr::sched_runtime to set request/slice suggestion
     - Document the new feature flags
     - Remove unused and duplicate-functionality fields
     - Simplify & unify pick_next_task_fair()
     - Misc debuggability enhancements

   (Peter Zijlstra, with fixes/cleanups by Dietmar Eggemann, Valentin
   Schneider and Chuyi Zhou)

 - Initialize the vruntime of a new task when it is first enqueued,
   resulting in significant decrease in latency of newly woken tasks
   (Zhang Qiao)

 - Introduce SM_IDLE and an idle re-entry fast-path in __schedule()
   (K Prateek Nayak, Peter Zijlstra)

 - Clean up and clarify the usage of Clean up usage of rt_task()
   (Qais Yousef)

 - Preempt SCHED_IDLE entities in strict cgroup hierarchies
   (Tianchen Ding)

 - Clarify the documentation of time units for deadline scheduler
   parameters (Christian Loehle)

 - Remove the HZ_BW chicken-bit feature flag introduced a year ago,
   the original change seems to be working fine (Phil Auld)

 - Misc fixes and cleanups (Chen Yu, Dan Carpenter, Huang Shijie,
   Peilin He, Qais Yousefm and Vincent Guittot)

* tag 'sched-core-2024-09-19' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (64 commits)
  sched/cpufreq: Use NSEC_PER_MSEC for deadline task
  cpufreq/cppc: Use NSEC_PER_MSEC for deadline task
  sched/deadline: Clarify nanoseconds in uapi
  sched/deadline: Convert schedtool example to chrt
  sched/debug: Fix the runnable tasks output
  sched: Fix sched_delayed vs sched_core
  kernel/sched: Fix util_est accounting for DELAY_DEQUEUE
  kthread: Fix task state in kthread worker if being frozen
  sched/pelt: Use rq_clock_task() for hw_pressure
  sched/fair: Move effective_cpu_util() and effective_cpu_util() in fair.c
  sched/core: Introduce SM_IDLE and an idle re-entry fast-path in __schedule()
  sched: Add put_prev_task(.next)
  sched: Rework dl_server
  sched: Combine the last put_prev_task() and the first set_next_task()
  sched: Rework pick_next_task()
  sched: Split up put_prev_task_balance()
  sched: Clean up DL server vs core sched
  sched: Fixup set_next_task() implementations
  sched: Use set_next_task(.first) where required
  sched/fair: Properly deactivate sched_delayed task upon class change
  ...
parents 509d2cd1 bc9057da
Loading
Loading
Loading
Loading
+6 −8
Original line number Diff line number Diff line
@@ -749,21 +749,19 @@ Appendix A. Test suite
 of the command line options. Please refer to rt-app documentation for more
 details (`<rt-app-sources>/doc/*.json`).

 The second testing application is a modification of schedtool, called
 schedtool-dl, which can be used to setup SCHED_DEADLINE parameters for a
 certain pid/application. schedtool-dl is available at:
 https://github.com/scheduler-tools/schedtool-dl.git.
 The second testing application is done using chrt which has support
 for SCHED_DEADLINE.

 The usage is straightforward::

  # schedtool -E -t 10000000:100000000 -e ./my_cpuhog_app
  # chrt -d -T 10000000 -D 100000000 0 ./my_cpuhog_app

 With this, my_cpuhog_app is put to run inside a SCHED_DEADLINE reservation
 of 10ms every 100ms (note that parameters are expressed in microseconds).
 You can also use schedtool to create a reservation for an already running
 of 10ms every 100ms (note that parameters are expressed in nanoseconds).
 You can also use chrt to create a reservation for an already running
 application, given that you know its pid::

  # schedtool -E -t 10000000:100000000 my_app_pid
  # chrt -d -T 10000000 -D 100000000 -p 0 my_app_pid

Appendix B. Minimal main()
==========================
+3 −3
Original line number Diff line number Diff line
@@ -224,9 +224,9 @@ static void __init cppc_freq_invariance_init(void)
		 * Fake (unused) bandwidth; workaround to "fix"
		 * priority inheritance.
		 */
		.sched_runtime	= 1000000,
		.sched_deadline = 10000000,
		.sched_period	= 10000000,
		.sched_runtime	= NSEC_PER_MSEC,
		.sched_deadline = 10 * NSEC_PER_MSEC,
		.sched_period	= 10 * NSEC_PER_MSEC,
	};
	int ret;

+1 −1
Original line number Diff line number Diff line
@@ -335,7 +335,7 @@ static inline bool six_owner_running(struct six_lock *lock)
	 */
	rcu_read_lock();
	struct task_struct *owner = READ_ONCE(lock->owner);
	bool ret = owner ? owner_on_cpu(owner) : !rt_task(current);
	bool ret = owner ? owner_on_cpu(owner) : !rt_or_dl_task(current);
	rcu_read_unlock();

	return ret;
+1 −1
Original line number Diff line number Diff line
@@ -2626,7 +2626,7 @@ static ssize_t timerslack_ns_write(struct file *file, const char __user *buf,
	}

	task_lock(p);
	if (task_is_realtime(p))
	if (rt_or_dl_task_policy(p))
		slack_ns = 0;
	else if (slack_ns == 0)
		slack_ns = p->default_timer_slack_ns;
+1 −1
Original line number Diff line number Diff line
@@ -40,7 +40,7 @@ static inline int task_nice_ioclass(struct task_struct *task)
{
	if (task->policy == SCHED_IDLE)
		return IOPRIO_CLASS_IDLE;
	else if (task_is_realtime(task))
	else if (rt_or_dl_task_policy(task))
		return IOPRIO_CLASS_RT;
	else
		return IOPRIO_CLASS_BE;
Loading