Commit 88221ac0 authored by Linus Torvalds's avatar Linus Torvalds
Pull latency tracing updates from Steven Rostedt:

 - Add some trace events to osnoise and timerlat sample generation

   This adds more information to the osnoise and timerlat tracers as
   well as allows BPF programs to be attached to these locations to
   extract even more data.

 - Fix to DECLARE_TRACE_CONDITION() macro

   It wasn't used but now will be and it happened to be broken causing
   the build to fail.

 - Add scheduler specification monitors to runtime verifier (RV)

   This is a continuation of Daniel Bristot's work.

   RV allows monitors to run and react concurrently. Running the
   cumulative model is equivalent to running single components using the
   same reactors, with the advantage that it's easier to point out which
   specification failed in case of error.

   This update introduces nested monitors to RV, in short, the sysfs
   monitor folder will contain a monitor named sched, which is nothing
   but an empty container for other monitors. Controlling the sched
   monitor (enable, disable, set reactors) controls all nested monitors.

   The following scheduling monitors are added:

     - sco: scheduling context operations
       Monitor to ensure sched_set_state happens only in thread context

     - tss: task switch while scheduling
       Monitor to ensure sched_switch happens only in scheduling context

     - snroc: set non runnable on its own context
       Monitor to ensure set_state happens only in the respective task's context

     - scpd: schedule called with preemption disabled
       Monitor to ensure schedule is called with preemption disabled

     - snep: schedule does not enable preempt
       Monitor to ensure schedule does not enable preempt

     - sncid: schedule not called with interrupt disabled
       Monitor to ensure schedule is not called with interrupt disabled

* tag 'trace-latency-v6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  tools/rv: Allow rv list to filter for container
  Documentation/rv: Add docs for the sched monitors
  verification/dot2k: Add support for nested monitors
  tools/rv: Add support for nested monitors
  rv: Add scpd, snep and sncid per-cpu monitors
  rv: Add snroc per-task monitor
  rv: Add sco and tss per-cpu monitors
  rv: Add option for nested monitors and include sched
  sched: Add sched tracepoints for RV task model
  rv: Add license identifiers to monitor files
  tracing: Fix DECLARE_TRACE_CONDITION
  trace/osnoise: Add trace events for samples
parents 31eb415b 4ffef957
+69 −0
.. SPDX-License-Identifier: GPL-2.0

============
rv-mon-sched
============
-----------------------------
Scheduler monitors collection
-----------------------------

:Manual section: 1

SYNOPSIS
========

**rv mon sched** [*OPTIONS*]

**rv mon <NESTED_MONITOR>** [*OPTIONS*]

**rv mon sched:<NESTED_MONITOR>** [*OPTIONS*]

DESCRIPTION
===========

The scheduler monitor collection is a container for several monitors to model
the behaviour of the scheduler. Each monitor describes a specification that
the scheduler should follow.

As a monitor container, it enables all nested monitors and configures them
according to OPTIONS.
Nested monitors can also be activated independently, either by name alone or
with the sched: prefix. For example, to enable only the tss monitor, use either of:

    # rv mon sched:tss

    # rv mon tss

See kernel documentation for further information about this monitor:
<https://docs.kernel.org/trace/rv/monitor_sched.html>

OPTIONS
=======

.. include:: common_ikm.rst

NESTED MONITOR
==============

The available nested monitors are:
  * scpd: schedule called with preemption disabled
  * snep: schedule does not enable preempt
  * sncid: schedule not called with interrupt disabled
  * snroc: set non runnable on its own context
  * sco: scheduling context operations
  * tss: task switch while scheduling

SEE ALSO
========

**rv**\(1), **rv-mon**\(1)

Linux kernel *RV* documentation:
<https://www.kernel.org/doc/html/latest/trace/rv/index.html>

AUTHOR
======

Written by Gabriele Monaco <gmonaco@redhat.com>

.. include:: common_appendix.rst
+171 −0
Scheduler monitors
==================

- Name: sched
- Type: container for multiple monitors
- Author: Gabriele Monaco <gmonaco@redhat.com>, Daniel Bristot de Oliveira <bristot@kernel.org>

Description
-----------

Monitors describing complex systems, such as the scheduler, can easily grow to
the point where they are just hard to understand because of the many possible
state transitions.
Often it is possible to break such descriptions into smaller monitors,
sharing some or all events. Enabling those smaller monitors concurrently is,
in fact, testing the system as if we had one single larger monitor.
Splitting a model into multiple specifications not only makes it easier to
understand, but also gives more clues about the cause when errors occur.

The sched monitor is a set of specifications to describe the scheduler behaviour.
It includes several per-cpu and per-task monitors that work independently to verify
different specifications the scheduler should follow.

To make this system as straightforward as possible, sched specifications are *nested*
monitors, whereas sched itself is a *container*.
From the interface perspective, sched includes other monitors as sub-directories.
Enabling or disabling sched, or setting its reactors, propagates the change to all
nested monitors; single monitors can still be used independently.
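
The propagation described above can be pictured with a small C sketch. It is not
the kernel implementation: `struct toy_monitor` and `toy_set_enabled()` are
hypothetical names invented for illustration. The only idea assumed from this
series is that a nested monitor records its container as a parent.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified monitor record; NOT the kernel's struct rv_monitor. */
struct toy_monitor {
	const char *name;
	struct toy_monitor *parent;	/* NULL for containers and top-level monitors */
	bool enabled;
};

/*
 * Enable or disable one monitor. If the target is a container, the same
 * setting is applied to every monitor that names it as parent. Acting on a
 * nested monitor touches only that monitor.
 */
static void toy_set_enabled(struct toy_monitor *mons, size_t n,
			    struct toy_monitor *target, bool on)
{
	target->enabled = on;
	for (size_t i = 0; i < n; i++) {
		if (mons[i].parent == target)
			mons[i].enabled = on;
	}
}
```

In this sketch, enabling the container switches on all of its children, while a
later call on a single child changes only that child.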

It is important that future modules are built after their container (sched, in
this case), otherwise the linker would not respect the order and the nesting
wouldn't work as expected.
To do so, simply add them after sched in the Makefile.

Specifications
--------------

The specifications included in sched are currently a work in progress, adapting the ones
defined by Daniel Bristot in [1].

Currently we include the following:

Monitor tss
~~~~~~~~~~~

The task switch while scheduling (tss) monitor ensures a task switch happens
only in scheduling context, that is inside a call to `__schedule`::

                     |
                     |
                     v
                   +-----------------+
                   |     thread      | <+
                   +-----------------+  |
                     |                  |
                     | schedule_entry   | schedule_exit
                     v                  |
    sched_switch                        |
  +---------------                      |
  |                       sched         |
  +-------------->                     -+
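
The diagram above reads as a two-state deterministic automaton. A minimal C
sketch of it follows, assuming a plain transition table; the `TSS_*` names and
`tss_handle_event()` are illustrative and are not the identifiers generated for
the in-kernel monitor.

```c
#include <stdbool.h>

/* States and events named after the tss diagram. */
enum tss_state { TSS_THREAD, TSS_SCHED, TSS_STATE_MAX };
enum tss_event {
	TSS_SCHEDULE_ENTRY,
	TSS_SCHEDULE_EXIT,
	TSS_SCHED_SWITCH,
	TSS_EVENT_MAX
};

#define TSS_INVALID (-1)

/* tss_next[state][event]: next state, or TSS_INVALID where no edge exists. */
static const int tss_next[TSS_STATE_MAX][TSS_EVENT_MAX] = {
	[TSS_THREAD] = { TSS_SCHED, TSS_INVALID, TSS_INVALID },
	[TSS_SCHED]  = { TSS_INVALID, TSS_THREAD, TSS_SCHED },
};

/*
 * Feed one event to the automaton. Returns true and updates *state when the
 * transition exists; returns false (a violation) and leaves *state unchanged.
 */
static bool tss_handle_event(enum tss_state *state, enum tss_event event)
{
	int next = tss_next[*state][event];

	if (next == TSS_INVALID)
		return false;
	*state = (enum tss_state)next;
	return true;
}
```

Starting in `TSS_THREAD`, a `TSS_SCHED_SWITCH` event is rejected, which is
exactly the property the monitor checks: a task switch outside a
schedule_entry/schedule_exit pair is a violation.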

Monitor sco
~~~~~~~~~~~

The scheduling context operations (sco) monitor ensures changes in a task state
happen only in thread context::


                        |
                        |
                        v
    sched_set_state   +------------------+
  +------------------ |                  |
  |                   |  thread_context  |
  +-----------------> |                  | <+
                      +------------------+  |
                        |                   |
                        | schedule_entry    | schedule_exit
                        v                   |
                                            |
                       scheduling_context  -+

Monitor snroc
~~~~~~~~~~~~~

The set non runnable on its own context (snroc) monitor ensures changes in a
task state happen only in the respective task's context. This is a per-task
monitor::

                        |
                        |
                        v
                      +------------------+
                      |  other_context   | <+
                      +------------------+  |
                        |                   |
                        | sched_switch_in   | sched_switch_out
                        v                   |
    sched_set_state                         |
  +------------------                       |
  |                       own_context       |
  +----------------->                      -+
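
What makes this monitor per-task is that every task carries its own copy of the
automaton state. The sketch below illustrates that idea under simplifying
assumptions: tasks are small integer indexes into a fixed table
(`TOY_MAX_TASKS` is hypothetical), a context switch is treated as
sched_switch_out for the outgoing task and sched_switch_in for the incoming
one, and violations on the switch events themselves are not checked.

```c
#include <stdbool.h>

#define TOY_MAX_TASKS 8	/* hypothetical table size, for the sketch only */

enum snroc_state { SNROC_OTHER_CONTEXT, SNROC_OWN_CONTEXT };

/* One automaton state per task; zero-initialised, so all start in other_context. */
static enum snroc_state snroc_task_state[TOY_MAX_TASKS];

/* Context switch: prev takes sched_switch_out, next takes sched_switch_in. */
static void snroc_sched_switch(int prev, int next)
{
	snroc_task_state[prev] = SNROC_OTHER_CONTEXT;
	snroc_task_state[next] = SNROC_OWN_CONTEXT;
}

/* sched_set_state on a task is accepted only while that task is in own_context. */
static bool snroc_set_state_allowed(int task)
{
	return snroc_task_state[task] == SNROC_OWN_CONTEXT;
}
```

A state change for a task that is not currently running is therefore reported
as a violation, while the same change from the running task is accepted.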

Monitor scpd
~~~~~~~~~~~~

The schedule called with preemption disabled (scpd) monitor ensures schedule is
called with preemption disabled::

                       |
                       |
                       v
                     +------------------+
                     |    cant_sched    | <+
                     +------------------+  |
                       |                   |
                       | preempt_disable   | preempt_enable
                       v                   |
    schedule_entry                         |
    schedule_exit                          |
  +-----------------      can_sched        |
  |                                        |
  +---------------->                      -+

Monitor snep
~~~~~~~~~~~~

The schedule does not enable preempt (snep) monitor ensures a schedule call
does not enable preemption::

                        |
                        |
                        v
    preempt_disable   +------------------------+
    preempt_enable    |                        |
  +------------------ | non_scheduling_context |
  |                   |                        |
  +-----------------> |                        | <+
                      +------------------------+  |
                        |                         |
                        | schedule_entry          | schedule_exit
                        v                         |
                                                  |
                          scheduling_contex      -+

Monitor sncid
~~~~~~~~~~~~~

The schedule not called with interrupt disabled (sncid) monitor ensures
schedule is not called with interrupts disabled::

                       |
                       |
                       v
    schedule_entry   +--------------+
    schedule_exit    |              |
  +----------------- |  can_sched   |
  |                  |              |
  +----------------> |              | <+
                     +--------------+  |
                       |               |
                       | irq_disable   | irq_enable
                       v               |
                                       |
                        cant_sched    -+

References
----------

[1] - https://bristot.me/linux-task-model
+2 −2
@@ -7,7 +7,7 @@
#ifndef _LINUX_RV_H
#define _LINUX_RV_H

-#define MAX_DA_NAME_LEN	24
+#define MAX_DA_NAME_LEN	32

#ifdef CONFIG_RV
/*
@@ -56,7 +56,7 @@ struct rv_monitor {

bool rv_monitoring_on(void);
int rv_unregister_monitor(struct rv_monitor *monitor);
-int rv_register_monitor(struct rv_monitor *monitor);
+int rv_register_monitor(struct rv_monitor *monitor, struct rv_monitor *parent);
int rv_get_task_monitor_slot(void);
void rv_put_task_monitor_slot(int slot);

+16 −0
@@ -46,6 +46,7 @@
#include <linux/rv.h>
#include <linux/livepatch_sched.h>
#include <linux/uidgid_types.h>
#include <linux/tracepoint-defs.h>
#include <asm/kmap_size.h>

/* task_struct member predeclarations (sorted alphabetically): */
@@ -187,6 +188,12 @@ struct user_event_mm;
# define debug_rtlock_wait_restore_state()	do { } while (0)
#endif

#define trace_set_current_state(state_value)                     \
	do {                                                     \
		if (tracepoint_enabled(sched_set_state_tp))      \
			__trace_set_current_state(state_value); \
	} while (0)

/*
 * set_current_state() includes a barrier so that the write of current->__state
 * is correctly serialised wrt the caller's subsequent test of whether to
@@ -227,12 +234,14 @@ struct user_event_mm;
#define __set_current_state(state_value)				\
	do {								\
		debug_normal_state_change((state_value));		\
		trace_set_current_state(state_value);			\
		WRITE_ONCE(current->__state, (state_value));		\
	} while (0)

#define set_current_state(state_value)					\
	do {								\
		debug_normal_state_change((state_value));		\
		trace_set_current_state(state_value);			\
		smp_store_mb(current->__state, (state_value));		\
	} while (0)

@@ -248,6 +257,7 @@ struct user_event_mm;
									\
		raw_spin_lock_irqsave(&current->pi_lock, flags);	\
		debug_special_state_change((state_value));		\
		trace_set_current_state(state_value);			\
		WRITE_ONCE(current->__state, (state_value));		\
		raw_spin_unlock_irqrestore(&current->pi_lock, flags);	\
	} while (0)
@@ -283,6 +293,7 @@ struct user_event_mm;
		raw_spin_lock(&current->pi_lock);			\
		current->saved_state = current->__state;		\
		debug_rtlock_wait_set_state();				\
		trace_set_current_state(TASK_RTLOCK_WAIT);		\
		WRITE_ONCE(current->__state, TASK_RTLOCK_WAIT);		\
		raw_spin_unlock(&current->pi_lock);			\
	} while (0);
@@ -292,6 +303,7 @@ struct user_event_mm;
		lockdep_assert_irqs_disabled();				\
		raw_spin_lock(&current->pi_lock);			\
		debug_rtlock_wait_restore_state();			\
		trace_set_current_state(current->saved_state);		\
		WRITE_ONCE(current->__state, current->saved_state);	\
		current->saved_state = TASK_RUNNING;			\
		raw_spin_unlock(&current->pi_lock);			\
@@ -328,6 +340,10 @@ extern void io_schedule_finish(int token);
extern long io_schedule_timeout(long timeout);
extern void io_schedule(void);

/* wrapper function to trace from this header file */
DECLARE_TRACEPOINT(sched_set_state_tp);
extern void __trace_set_current_state(int state_value);

/**
 * struct prev_cputime - snapshot of system and user cputime
 * @utime: time spent in user mode
+7 −0
@@ -76,6 +76,10 @@
#define DECLARE_TRACE(name, proto, args)	\
	DEFINE_TRACE(name, PARAMS(proto), PARAMS(args))

#undef DECLARE_TRACE_CONDITION
#define DECLARE_TRACE_CONDITION(name, proto, args, cond)	\
	DEFINE_TRACE(name, PARAMS(proto), PARAMS(args))

/* If requested, create helpers for calling these tracepoints from Rust. */
#ifdef CREATE_RUST_TRACE_POINTS
#undef DEFINE_RUST_DO_TRACE
@@ -108,6 +112,8 @@
/* Make all open coded DECLARE_TRACE nops */
#undef DECLARE_TRACE
#define DECLARE_TRACE(name, proto, args)
#undef DECLARE_TRACE_CONDITION
#define DECLARE_TRACE_CONDITION(name, proto, args, cond)

#ifdef TRACEPOINTS_ENABLED
#include <trace/trace_events.h>
@@ -129,6 +135,7 @@
#undef DEFINE_EVENT_CONDITION
#undef TRACE_HEADER_MULTI_READ
#undef DECLARE_TRACE
#undef DECLARE_TRACE_CONDITION

/* Only undef what we defined in this file */
#ifdef UNDEF_TRACE_INCLUDE_FILE