Commit 71ba9a5c authored by Kuba Piecuch, committed by Tejun Heo

sched_ext: Documentation: improve accuracy of task lifecycle pseudo-code



* Add ops.quiescent() and ops.runnable() to the sched_change path.
  When a queued task has one of its scheduling properties changed
  (e.g. nice, affinity), it goes through dequeue() -> quiescent() ->
  (property change callback, e.g. ops.set_weight()) -> runnable() ->
  enqueue().

* Change && to || in ops.enqueue() condition. We want to enqueue tasks
  that have a non-zero slice and are not in any DSQ.

* Call ops.dispatch() and ops.dequeue() only for tasks that have had
  ops.enqueue() called. This is to account for tasks direct-dispatched
  from ops.select_cpu().

* Add a note explaining that the pseudo-code provides a simplified view
  of the task lifecycle and list some examples of cases that the
  pseudo-code does not account for.
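
The effect of the ``&&`` to ``||`` change in the second bullet can be checked with a small standalone truth table. The sketch below is plain userspace C, not kernel code: ``struct toy_task``, ``old_cond()``, ``new_cond()`` and ``print_truth_table()`` are hypothetical names introduced only for illustration, and the two predicates simply transcribe the pseudo-code condition before and after this change.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for the two facts the pseudo-code condition tests. */
struct toy_task {
    bool in_dsq;          /* task is currently sitting in some DSQ */
    unsigned long slice;  /* remaining slice; 0 means exhausted */
};

/* Condition before the change: both parts must hold. */
bool old_cond(const struct toy_task *t)
{
    return !t->in_dsq && t->slice == 0;
}

/* Condition after the change: either part is enough. */
bool new_cond(const struct toy_task *t)
{
    return !t->in_dsq || t->slice == 0;
}

/* Prints all four combinations and returns how many rows differ. */
int print_truth_table(void)
{
    int differing = 0;

    printf("in_dsq  slice  old(&&)  new(||)\n");
    for (int in_dsq = 0; in_dsq <= 1; in_dsq++) {
        for (unsigned long slice = 0; slice <= 1; slice++) {
            struct toy_task t = { .in_dsq = in_dsq, .slice = slice };
            bool o = old_cond(&t);
            bool n = new_cond(&t);

            /* && implies ||, so the new condition never drops a case. */
            assert(!o || n);
            if (o != n)
                differing++;
            printf("%-6d  %-5lu  %-7d  %d\n", in_dsq, slice, o, n);
        }
    }
    return differing;
}
```

Calling ``print_truth_table()`` from a ``main()`` prints the four rows. The two rows where the columns differ are a task that is in no DSQ but still has slice left (the case the bullet names) and a task that is in a DSQ with a zero slice.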

Fixes: a4f61f0a ("sched_ext: Documentation: Add ops.dequeue() to task lifecycle")
Signed-off-by: Kuba Piecuch <jpiecuch@google.com>
Reviewed-by: Andrea Righi <arighi@nvidia.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
parent ff1befcb
+36 −7
@@ -408,8 +408,8 @@ for more information.
Task Lifecycle
--------------

-The following pseudo-code summarizes the entire lifecycle of a task managed
-by a sched_ext scheduler:
+The following pseudo-code presents a rough overview of the entire lifecycle
+of a task managed by a sched_ext scheduler:

.. code-block:: c

@@ -423,20 +423,25 @@ by a sched_ext scheduler:
        ops.runnable();         /* Task becomes ready to run */

        while (task_is_runnable(task)) {
-            if (task is not in a DSQ && task->scx.slice == 0) {
+            if (task is not in a DSQ || task->scx.slice == 0) {
                ops.enqueue();  /* Task can be added to a DSQ */

                /* Task property change (i.e., affinity, nice, etc.)? */
                if (sched_change(task)) {
                    ops.dequeue(); /* Exiting BPF scheduler custody */
+                    ops.quiescent();
+
+                    /* Property change callback, e.g. ops.set_weight() */
+
+                    ops.runnable();
                    continue;
                }
-            }

+                /* Any usable CPU becomes available */

+                ops.dispatch();     /* Task is moved to a local DSQ */
+                ops.dequeue();      /* Exiting BPF scheduler custody */
+            }

            ops.running();      /* Task starts running on its assigned CPU */

@@ -456,6 +461,30 @@ by a sched_ext scheduler:
    ops.disable();              /* Disable BPF scheduling for the task */
    ops.exit_task();            /* Task is destroyed */

+Note that the above pseudo-code does not cover all possible state transitions
+and edge cases, to name a few examples:
+
+* ``ops.dispatch()`` may fail to move the task to a local DSQ due to a racing
+  property change on that task, in which case ``ops.dispatch()`` will be
+  retried.
+
+* The task may be direct-dispatched to a local DSQ from ``ops.enqueue()``,
+  in which case ``ops.dispatch()`` and ``ops.dequeue()`` are skipped and we go
+  straight to ``ops.running()``.
+
+* Property changes may occur at virtually any point during the task's lifecycle,
+  not just when the task is queued and waiting to be dispatched. For example,
+  changing a property of a running task will lead to the callback sequence
+  ``ops.stopping()`` -> ``ops.quiescent()`` -> (property change callback) ->
+  ``ops.runnable()`` -> ``ops.running()``.
+
+* A sched_ext task can be preempted by a task from a higher-priority scheduling
+  class, in which case it will exit the tick-dispatch loop even though it is runnable
+  and has a non-zero slice.
+
+See the "Scheduling Cycle" section for a more detailed description of how
+a freshly woken up task gets on a CPU.
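
The two property-change sequences described in this patch (one for a queued task in the commit message, one for a running task in the added note) can be lined up side by side with a toy tracer. This is a minimal userspace sketch, not kernel or BPF code: ``enum toy_state``, ``trace()`` and ``property_change_trace()`` are hypothetical names, and the callback order is copied from the text above rather than derived from the scheduler core.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical task states; only the two cases the text describes. */
enum toy_state { TOY_QUEUED, TOY_RUNNING };

/* Appends one callback name to the trace, separated by " -> ". */
static void trace(char *buf, size_t len, const char *cb)
{
    if (buf[0] != '\0')
        strncat(buf, " -> ", len - strlen(buf) - 1);
    strncat(buf, cb, len - strlen(buf) - 1);
}

/*
 * Writes the callback sequence for a property change into buf.
 * prop_cb names the property change callback, e.g. "ops.set_weight()".
 */
void property_change_trace(enum toy_state state, const char *prop_cb,
                           char *buf, size_t len)
{
    assert(len > 0);
    buf[0] = '\0';

    /* Leave the current state: a queued task is dequeued, a running one stops. */
    trace(buf, len, state == TOY_QUEUED ? "ops.dequeue()" : "ops.stopping()");
    trace(buf, len, "ops.quiescent()");
    trace(buf, len, prop_cb);
    trace(buf, len, "ops.runnable()");
    /* Return to the state the task was in before the change. */
    trace(buf, len, state == TOY_QUEUED ? "ops.enqueue()" : "ops.running()");
}
```

Both sequences share the ``ops.quiescent()`` -> (property change callback) -> ``ops.runnable()`` core and differ only in the first and last callback, which is the symmetry the sketch is meant to make visible.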

Where to Look
=============