Commit d348c223 authored by Linus Torvalds
Pull power management updates from Rafael Wysocki:
 "There are quite a few interesting things here, including new hardware
  support, new features, some bug fixes and documentation updates. In
  addition, there is the usual bunch of minor fixes and cleanups all
  over.

  In the new hardware support category, there are intel_pstate and
  intel_rapl driver updates to support new processors, Panther Lake,
  Wildcat Lake, Nova Lake, and Diamond Rapids in the OOB mode, OPP and
  bandwidth allocation support in the tegra186 cpufreq driver, and
  JH7110S SOC support in dt-platdev cpufreq.

  The new features are the PM QoS CPU latency limit for suspend-to-idle,
  the netlink support for the energy model management, support for
  terminating system suspend via a wakeup event during the sync of file
  systems, configurable number of hibernation compression threads, the
  runtime PM auto-cleanup macros, and the "poweroff" PM event that is
  expected to be used during system shutdown.

  Bugs are mostly fixed in cpuidle governors, but there are also fixes
  elsewhere, like in the amd-pstate cpufreq driver.

  Documentation updates include, but are not limited to, a new doc on
  debugging shutdown hangs, cross-referencing fixes and cleanups in the
  intel_pstate documentation, and updates of comments in the core
  hibernation code.

  Specifics:

   - Introduce and document a QoS limit on CPU exit latency during
     wakeup from suspend-to-idle (Ulf Hansson)

   - Add support for building libcpupower statically (Zuo An)

   - Add support for sending netlink notifications to user space on
     energy model updates (Changwoo Min, Peng Fan)

   - Minor improvements to the Rust OPP interface (Tamir Duberstein)

   - Fixes to scope-based pointers in the OPP library (Viresh Kumar)

   - Use residency threshold in polling state override decisions in the
     menu cpuidle governor (Aboorva Devarajan)

   - Add sanity check for exit latency and target residency in the
     cpuidle core (Rafael Wysocki)

   - Use this_cpu_ptr() where possible in the teo governor (Christian
     Loehle)

   - Rework the handling of tick wakeups in the teo cpuidle governor to
     increase the likelihood of stopping the scheduler tick in the cases
     when tick wakeups can be counted as non-timer ones (Rafael Wysocki)

   - Fix a reverse condition in the teo cpuidle governor and drop a
     misguided target residency check from it (Rafael Wysocki)

   - Clean up multiple minor defects in the teo cpuidle governor (Rafael
     Wysocki)

   - Update header inclusion to make it follow the Include What You Use
     principle (Andy Shevchenko)

   - Enable MSR-based RAPL PMU support in the intel_rapl power capping
     driver and arrange for using it on the Panther Lake and Wildcat
     Lake processors (Kuppuswamy Sathyanarayanan)

   - Add support for Nova Lake and Wildcat Lake processors to the
     intel_rapl power capping driver (Kaushlendra Kumar, Srinivas
     Pandruvada)

   - Add OPP and bandwidth support for Tegra186 (Aaron Kling)

   - Optimizations for parameter array handling in the amd-pstate
     cpufreq driver (Mario Limonciello)

   - Fix for mode changes with offline CPUs in the amd-pstate cpufreq
     driver (Gautham Shenoy)

   - Preserve freq_table_sorted across suspend/hibernate in the cpufreq
     core (Zihuan Zhang)

   - Adjust energy model rules for Intel hybrid platforms in the
     intel_pstate cpufreq driver and improve printing of debug messages
     in it (Rafael Wysocki)

   - Replace deprecated strcpy() in cpufreq_unregister_governor()
     (Thorsten Blum)

   - Fix duplicate hyperlink target errors in the intel_pstate cpufreq
     driver documentation and use :ref: directive for internal linking
     in it (Swaraj Gaikwad, Bagas Sanjaya)

   - Add Diamond Rapids OOB mode support to the intel_pstate cpufreq
     driver (Kuppuswamy Sathyanarayanan)

   - Use mutex guard for driver locking in the intel_pstate driver and
     eliminate some code duplication from it (Rafael Wysocki)

   - Replace udelay() with usleep_range() in ACPI cpufreq (Kaushlendra
     Kumar)

   - Minor improvements to various cpufreq drivers (Christian Marangi,
     Hal Feng, Jie Zhan, Marco Crivellari, Miaoqian Lin, and Shuhao Fu)

   - Replace snprintf() with scnprintf() in show_trace_dev_match()
     (Kaushlendra Kumar)

   - Fix memory allocation error handling in pm_vt_switch_required()
     (Malaya Kumar Rout)

   - Introduce CALL_PM_OP() macro and use it to simplify code in generic
     PM operations (Kaushlendra Kumar)

   - Add module param to backtrace all CPUs in the device power
     management watchdog (Sergey Senozhatsky)

   - Rework message printing in swsusp_save() (Rafael Wysocki)

   - Make it possible to change the number of hibernation compression
     threads (Xueqin Luo)

   - Clarify that only cgroup1 freezer uses PM freezer (Tejun Heo)

   - Add document on debugging shutdown hangs to PM documentation and
     correct a mistaken configuration option in it (Mario Limonciello)

   - Shut down wakeup source timer before removing the wakeup source
     from the list (Kaushlendra Kumar, Rafael Wysocki)

   - Introduce new PMSG_POWEROFF event for system shutdown handling with
     the help of PM device callbacks (Mario Limonciello)

   - Make pm_test delay interruptible by wakeup events (Riwen Lu)

   - Clean up kernel-doc comment style usage in the core hibernation
     code and remove unhelpful comments from it (Sunday Adelodun, Rafael
     Wysocki)

   - Add support for handling wakeup events and aborting the suspend
     process while it is syncing file systems (Samuel Wu, Rafael
     Wysocki)

   - Add WQ_UNBOUND to pm_wq workqueue (Marco Crivellari)

   - Add runtime PM wrapper macros for ACQUIRE()/ACQUIRE_ERR() and use
     them in the PCI core and the ACPI TAD driver (Rafael Wysocki)

   - Improve runtime PM in the ACPI TAD driver (Rafael Wysocki)

   - Update pm_runtime_allow/forbid() documentation (Rafael Wysocki)

   - Fix typos in runtime.c comments (Malaya Kumar Rout)

   - Move governor.h from devfreq under include/linux/ and rename to
     devfreq-governor.h to allow devfreq governors to be defined outside
     drivers/devfreq/ (Dmitry Baryshkov)

   - Use min() to improve readability in tegra30-devfreq.c (Thorsten
     Blum)

   - Fix potential use-after-free issue of OPP handling in
     hisi_uncore_freq.c (Pengjie Zhang)

   - Fix typo in DFSO_DOWNDIFFERENTIAL macro name in
     governor_simpleondemand.c in devfreq (Riwen Lu)"

* tag 'pm-6.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (96 commits)
  PM / devfreq: Fix typo in DFSO_DOWNDIFFERENTIAL macro name
  cpuidle: Warn instead of bailing out if target residency check fails
  cpuidle: Update header inclusion
  Documentation: power/cpuidle: Document the CPU system wakeup latency QoS
  cpuidle: Respect the CPU system wakeup QoS limit for cpuidle
  sched: idle: Respect the CPU system wakeup QoS limit for s2idle
  pmdomain: Respect the CPU system wakeup QoS limit for cpuidle
  pmdomain: Respect the CPU system wakeup QoS limit for s2idle
  PM: QoS: Introduce a CPU system wakeup QoS limit
  cpuidle: governors: teo: Add missing space to the description
  PM: hibernate: Extra cleanup of comments in swap handling code
  PM / devfreq: tegra30: use min to simplify actmon_cpu_to_emc_rate
  PM / devfreq: hisi: Fix potential UAF in OPP handling
  PM / devfreq: Move governor.h to a public header location
  powercap: intel_rapl: Enable MSR-based RAPL PMU support
  powercap: intel_rapl: Prepare read_raw() interface for atomic-context callers
  cpufreq: qcom-nvmem: fix compilation warning for qcom_cpufreq_ipq806x_match_list
  PM: sleep: Call pm_sleep_fs_sync() instead of ksys_sync_helper()
  PM: sleep: Add support for wakeup during filesystem sync
  cpufreq: ACPI: Replace udelay() with usleep_range()
  ...
parents 959bfe49 7cede21e
+16 −0
@@ -454,3 +454,19 @@ Description:
		disables it.  Reads from the file return the current value.
		The default is "1" if the build-time "SUSPEND_SKIP_SYNC" config
		flag is unset, or "0" otherwise.

What:           /sys/power/hibernate_compression_threads
Date:           October 2025
Contact:        <luoxueqin@kylinos.cn>
Description:
                Controls the number of threads used for compression
                and decompression of hibernation images.

                The value can be adjusted at runtime to balance
                performance and CPU utilization.

                The change takes effect on the next hibernation or
                resume operation.

                Minimum value: 1
                Default value: 3
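
As an illustration of the interface documented in this hunk, the following Python sketch reads and updates the thread count. The helper names and the `path` parameter are hypothetical; the only facts taken from the text are the sysfs path, the minimum of 1, and the default of 3. Writing the real file would require root and a kernel that carries this change.

```python
from pathlib import Path

# Path documented in the ABI entry above.
SYSFS_PATH = Path("/sys/power/hibernate_compression_threads")
MIN_THREADS = 1      # documented minimum
DEFAULT_THREADS = 3  # documented default


def validate_threads(value: int) -> int:
    """Reject values below the documented minimum before touching sysfs."""
    if value < MIN_THREADS:
        raise ValueError(f"thread count must be >= {MIN_THREADS}, got {value}")
    return value


def read_threads(path: Path = SYSFS_PATH) -> int:
    """Return the current setting as an integer."""
    return int(path.read_text().strip())


def write_threads(value: int, path: Path = SYSFS_PATH) -> None:
    """Store a new setting; per the text, it applies to the next
    hibernation or resume operation."""
    path.write_text(f"{validate_threads(value)}\n")
```

The `path` argument exists only so the helpers can be exercised against an ordinary file; in real use the default would be left in place.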
+10 −0
@@ -1907,6 +1907,16 @@
			/sys/power/pm_test). Only available when CONFIG_PM_DEBUG
			is set. Default value is 5.

	hibernate_compression_threads=
			[HIBERNATION]
			Set the number of threads used for compressing or decompressing
			hibernation images.

			Format: <integer>
			Default: 3
			Minimum: 1
			Example: hibernate_compression_threads=4

	highmem=nn[KMG]	[KNL,BOOT,EARLY] forces the highmem zone to have an exact
			size of <nn>. This works even on boxes that have no
			highmem otherwise. This also works to reduce highmem
+9 −0
@@ -580,6 +580,15 @@ the given CPU as the upper limit for the exit latency of the idle states that
they are allowed to select for that CPU.  They should never select any idle
states with exit latency beyond that limit.

While the above CPU QoS constraints apply to CPU idle time management, user
space may also request a CPU system wakeup latency QoS limit, via the
`cpu_wakeup_latency` file.  This QoS constraint is respected when selecting a
suitable idle state for the CPUs, while entering the system-wide suspend-to-idle
sleep state, but also to the regular CPU idle time management.

Note that, the management of the `cpu_wakeup_latency` file works according to
the 'cpu_dma_latency' file from user space point of view.  Moreover, the unit
is also microseconds.
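
The hunk above says the `cpu_wakeup_latency` file is managed like `cpu_dma_latency`. For `cpu_dma_latency`, a process opens the device node, writes a signed 32-bit value in microseconds, and the request stays in force for as long as the file descriptor remains open. A minimal Python sketch of that pattern follows; the `/dev/cpu_wakeup_latency` path and the helper names are assumptions made for illustration and are not stated in this diff.

```python
import os
import struct
from contextlib import contextmanager

# Assumed device node, by analogy with /dev/cpu_dma_latency.
DEFAULT_NODE = "/dev/cpu_wakeup_latency"


def encode_latency_us(latency_us: int) -> bytes:
    """Pack the limit (microseconds) as a native-endian signed 32-bit integer."""
    return struct.pack("=i", latency_us)


@contextmanager
def wakeup_latency_limit(latency_us: int, node: str = DEFAULT_NODE):
    """Hold a latency request for the duration of the with-block."""
    fd = os.open(node, os.O_WRONLY)
    try:
        os.write(fd, encode_latency_us(latency_us))
        yield fd
    finally:
        # Closing the descriptor is what drops the request.
        os.close(fd)
```

Under these assumptions, wrapping a suspend-sensitive section in `with wakeup_latency_limit(100):` would request a 100 microsecond limit until the block exits.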

Idle States Control Via Kernel Command Line
===========================================
+74 −59
@@ -48,8 +48,9 @@ only way to pass early-configuration-time parameters to it is via the kernel
command line.  However, its configuration can be adjusted via ``sysfs`` to a
great extent.  In some configurations it even is possible to unregister it via
``sysfs`` which allows another ``CPUFreq`` scaling driver to be loaded and
registered (see `below <status_attr_>`_).
registered (see :ref:`below <status_attr>`).

.. _operation_modes:

Operation Modes
===============
@@ -62,6 +63,8 @@ a certain performance scaling algorithm. Which of them will be in effect
depends on what kernel command line options are used and on the capabilities of
the processor.

.. _active_mode:

Active Mode
-----------

@@ -94,6 +97,8 @@ Which of the P-state selection algorithms is used by default depends on the
Namely, if that option is set, the ``performance`` algorithm will be used by
default, and the other one will be used by default if it is not set.

.. _active_mode_hwp:

Active Mode With HWP
~~~~~~~~~~~~~~~~~~~~

@@ -123,7 +128,7 @@ Energy-Performance Bias (EPB) knob (otherwise), which means that the processor's
internal P-state selection logic is expected to focus entirely on performance.

This will override the EPP/EPB setting coming from the ``sysfs`` interface
(see `Energy vs Performance Hints`_ below).  Moreover, any attempts to change
(see :ref:`energy_performance_hints` below).  Moreover, any attempts to change
the EPP/EPB to a value different from 0 ("performance") via ``sysfs`` in this
configuration will be rejected.

@@ -192,6 +197,8 @@ This is the default P-state selection algorithm if the
:c:macro:`CONFIG_CPU_FREQ_DEFAULT_GOV_PERFORMANCE` kernel configuration option
is not set.

.. _passive_mode:

Passive Mode
------------

@@ -289,12 +296,12 @@ Unlike ``_PSS`` objects in the ACPI tables, ``intel_pstate`` always exposes
the entire range of available P-states, including the whole turbo range, to the
``CPUFreq`` core and (in the passive mode) to generic scaling governors.  This
generally causes turbo P-states to be set more often when ``intel_pstate`` is
used relative to ACPI-based CPU performance scaling (see `below <acpi-cpufreq_>`_
for more information).
used relative to ACPI-based CPU performance scaling (see
:ref:`below <acpi-cpufreq>` for more information).

Moreover, since ``intel_pstate`` always knows what the real turbo threshold is
(even if the Configurable TDP feature is enabled in the processor), its
``no_turbo`` attribute in ``sysfs`` (described `below <no_turbo_attr_>`_) should
``no_turbo`` attribute in ``sysfs`` (described :ref:`below <no_turbo_attr>`) should
work as expected in all cases (that is, if set to disable turbo P-states, it
always should prevent ``intel_pstate`` from using them).

@@ -307,12 +314,12 @@ pieces of information on it to be known, including:

 * The minimum supported P-state.

 * The maximum supported `non-turbo P-state <turbo_>`_.
 * The maximum supported :ref:`non-turbo P-state <turbo>`.

 * Whether or not turbo P-states are supported at all.

 * The maximum supported `one-core turbo P-state <turbo_>`_ (if turbo P-states
   are supported).
 * The maximum supported :ref:`one-core turbo P-state <turbo>` (if turbo
   P-states are supported).

 * The scaling formula to translate the driver's internal representation
   of P-states into frequencies and the other way around.
@@ -400,10 +407,10 @@ Energy-Aware Scheduling Support

If ``CONFIG_ENERGY_MODEL`` has been set during kernel configuration and
``intel_pstate`` runs on a hybrid processor without SMT, in addition to enabling
`CAS <CAS_>`_ it registers an Energy Model for the processor.  This allows the
:ref:`CAS` it registers an Energy Model for the processor.  This allows the
Energy-Aware Scheduling (EAS) support to be enabled in the CPU scheduler if
``schedutil`` is used as the  ``CPUFreq`` governor which requires ``intel_pstate``
to operate in the `passive mode <Passive Mode_>`_.
to operate in the :ref:`passive mode <passive_mode>`.

The Energy Model registered by ``intel_pstate`` is artificial (that is, it is
based on abstract cost values and it does not include any real power numbers)
@@ -432,6 +439,8 @@ the ``energy_model`` directory in ``debugfs`` (typlically mounted on
User Space Interface in ``sysfs``
=================================

.. _global_attributes:

Global Attributes
-----------------

@@ -444,8 +453,8 @@ argument is passed to the kernel in the command line.

``max_perf_pct``
	Maximum P-state the driver is allowed to set in percent of the
	maximum supported performance level (the highest supported `turbo
	P-state <turbo_>`_).
	maximum supported performance level (the highest supported :ref:`turbo
	P-state <turbo>`).

	This attribute will not be exposed if the
	``intel_pstate=per_cpu_perf_limits`` argument is present in the kernel
@@ -453,8 +462,8 @@ argument is passed to the kernel in the command line.

``min_perf_pct``
	Minimum P-state the driver is allowed to set in percent of the
	maximum supported performance level (the highest supported `turbo
	P-state <turbo_>`_).
	maximum supported performance level (the highest supported :ref:`turbo
	P-state <turbo>`).

	This attribute will not be exposed if the
	``intel_pstate=per_cpu_perf_limits`` argument is present in the kernel
@@ -463,18 +472,18 @@ argument is passed to the kernel in the command line.
``num_pstates``
	Number of P-states supported by the processor (between 0 and 255
	inclusive) including both turbo and non-turbo P-states (see
	`Turbo P-states Support`_).
	:ref:`turbo`).

	This attribute is present only if the value exposed by it is the same
	for all of the CPUs in the system.

	The value of this attribute is not affected by the ``no_turbo``
	setting described `below <no_turbo_attr_>`_.
	setting described :ref:`below <no_turbo_attr>`.

	This attribute is read-only.

``turbo_pct``
	Ratio of the `turbo range <turbo_>`_ size to the size of the entire
	Ratio of the :ref:`turbo range <turbo>` size to the size of the entire
	range of supported P-states, in percent.

	This attribute is present only if the value exposed by it is the same
@@ -486,7 +495,7 @@ argument is passed to the kernel in the command line.

``no_turbo``
	If set (equal to 1), the driver is not allowed to set any turbo P-states
	(see `Turbo P-states Support`_).  If unset (equal to 0, which is the
	(see :ref:`turbo`).  If unset (equal to 0, which is the
	default), turbo P-states can be set by the driver.
	[Note that ``intel_pstate`` does not support the general ``boost``
	attribute (supported by some other scaling drivers) which is replaced
@@ -495,11 +504,11 @@ argument is passed to the kernel in the command line.
	This attribute does not affect the maximum supported frequency value
	supplied to the ``CPUFreq`` core and exposed via the policy interface,
	but it affects the maximum possible value of per-policy P-state	limits
	(see `Interpretation of Policy Attributes`_ below for details).
	(see :ref:`policy_attributes_interpretation` below for details).

``hwp_dynamic_boost``
	This attribute is only present if ``intel_pstate`` works in the
	`active mode with the HWP feature enabled <Active Mode With HWP_>`_ in
	:ref:`active mode with the HWP feature enabled <active_mode_hwp>` in
	the processor.  If set (equal to 1), it causes the minimum P-state limit
	to be increased dynamically for a short time whenever a task previously
	waiting on I/O is selected to run on a given logical CPU (the purpose
@@ -514,12 +523,12 @@ argument is passed to the kernel in the command line.
	Operation mode of the driver: "active", "passive" or "off".

	"active"
		The driver is functional and in the `active mode
		<Active Mode_>`_.
		The driver is functional and in the :ref:`active mode
		<active_mode>`.

	"passive"
		The driver is functional and in the `passive mode
		<Passive Mode_>`_.
		The driver is functional and in the :ref:`passive mode
		<passive_mode>`.

	"off"
		The driver is not functional (it is not registered as a scaling
@@ -547,13 +556,15 @@ argument is passed to the kernel in the command line.
	attribute to "1" enables the energy-efficiency optimizations and setting
	to "0" disables them.

.. _policy_attributes_interpretation:

Interpretation of Policy Attributes
-----------------------------------

The interpretation of some ``CPUFreq`` policy attributes described in
Documentation/admin-guide/pm/cpufreq.rst is special with ``intel_pstate``
as the current scaling driver and it generally depends on the driver's
`operation mode <Operation Modes_>`_.
:ref:`operation mode <operation_modes>`.

First of all, the values of the ``cpuinfo_max_freq``, ``cpuinfo_min_freq`` and
``scaling_cur_freq`` attributes are produced by applying a processor-specific
@@ -562,9 +573,10 @@ Also, the values of the ``scaling_max_freq`` and ``scaling_min_freq``
attributes are capped by the frequency corresponding to the maximum P-state that
the driver is allowed to set.

If the ``no_turbo`` `global attribute <no_turbo_attr_>`_ is set, the driver is
not allowed to use turbo P-states, so the maximum value of ``scaling_max_freq``
and ``scaling_min_freq`` is limited to the maximum non-turbo P-state frequency.
If the ``no_turbo`` :ref:`global attribute <no_turbo_attr>` is set, the driver
is not allowed to use turbo P-states, so the maximum value of
``scaling_max_freq`` and ``scaling_min_freq`` is limited to the maximum
non-turbo P-state frequency.
Accordingly, setting ``no_turbo`` causes ``scaling_max_freq`` and
``scaling_min_freq`` to go down to that value if they were above it before.
However, the old values of ``scaling_max_freq`` and ``scaling_min_freq`` will be
@@ -576,7 +588,7 @@ and ``scaling_min_freq`` corresponds to the maximum supported turbo P-state,
which also is the value of ``cpuinfo_max_freq`` in either case.

Next, the following policy attributes have special meaning if
``intel_pstate`` works in the `active mode <Active Mode_>`_:
``intel_pstate`` works in the :ref:`active mode <active_mode>`:

``scaling_available_governors``
	List of P-state selection algorithms provided by ``intel_pstate``.
@@ -597,20 +609,22 @@ processor:
	Shows the base frequency of the CPU. Any frequency above this will be
	in the turbo frequency range.

The meaning of these attributes in the `passive mode <Passive Mode_>`_ is the
The meaning of these attributes in the :ref:`passive mode <passive_mode>` is the
same as for other scaling drivers.

Additionally, the value of the ``scaling_driver`` attribute for ``intel_pstate``
depends on the operation mode of the driver.  Namely, it is either
"intel_pstate" (in the `active mode <Active Mode_>`_) or "intel_cpufreq" (in the
`passive mode <Passive Mode_>`_).
"intel_pstate" (in the :ref:`active mode <active_mode>`) or "intel_cpufreq"
(in the :ref:`passive mode <passive_mode>`).

.. _pstate_limits_coordination:

Coordination of P-State Limits
------------------------------

``intel_pstate`` allows P-state limits to be set in two ways: with the help of
the ``max_perf_pct`` and ``min_perf_pct`` `global attributes
<Global Attributes_>`_ or via the ``scaling_max_freq`` and ``scaling_min_freq``
the ``max_perf_pct`` and ``min_perf_pct`` :ref:`global attributes
<global_attributes>` or via the ``scaling_max_freq`` and ``scaling_min_freq``
``CPUFreq`` policy attributes.  The coordination between those limits is based
on the following rules, regardless of the current operation mode of the driver:

@@ -632,17 +646,18 @@ on the following rules, regardless of the current operation mode of the driver:

 3. The global and per-policy limits can be set independently.

In the `active mode with the HWP feature enabled <Active Mode With HWP_>`_, the
In the :ref:`active mode with the HWP feature enabled <active_mode_hwp>`, the
resulting effective values are written into hardware registers whenever the
limits change in order to request its internal P-state selection logic to always
set P-states within these limits.  Otherwise, the limits are taken into account
by scaling governors (in the `passive mode <Passive Mode_>`_) and by the driver
every time before setting a new P-state for a CPU.
by scaling governors (in the :ref:`passive mode <passive_mode>`) and by the
driver every time before setting a new P-state for a CPU.

Additionally, if the ``intel_pstate=per_cpu_perf_limits`` command line argument
is passed to the kernel, ``max_perf_pct`` and ``min_perf_pct`` are not exposed
at all and the only way to set the limits is by using the policy attributes.

.. _energy_performance_hints:

Energy vs Performance Hints
---------------------------
@@ -702,9 +717,9 @@ output.
On those systems each ``_PSS`` object returns a list of P-states supported by
the corresponding CPU which basically is a subset of the P-states range that can
be used by ``intel_pstate`` on the same system, with one exception: the whole
`turbo range <turbo_>`_ is represented by one item in it (the topmost one).  By
convention, the frequency returned by ``_PSS`` for that item is greater by 1 MHz
than the frequency of the highest non-turbo P-state listed by it, but the
:ref:`turbo range <turbo>` is represented by one item in it (the topmost one).
By convention, the frequency returned by ``_PSS`` for that item is greater by
1 MHz than the frequency of the highest non-turbo P-state listed by it, but the
corresponding P-state representation (following the hardware specification)
returned for it matches the maximum supported turbo P-state (or is the
special value 255 meaning essentially "go as high as you can get").
@@ -730,18 +745,18 @@ benefit from running at turbo frequencies will be given non-turbo P-states
instead.

One more issue related to that may appear on systems supporting the
`Configurable TDP feature <turbo_>`_ allowing the platform firmware to set the
turbo threshold.  Namely, if that is not coordinated with the lists of P-states
returned by ``_PSS`` properly, there may be more than one item corresponding to
a turbo P-state in those lists and there may be a problem with avoiding the
turbo range (if desirable or necessary).  Usually, to avoid using turbo
P-states overall, ``acpi-cpufreq`` simply avoids using the topmost state listed
by ``_PSS``, but that is not sufficient when there are other turbo P-states in
the list returned by it.
:ref:`Configurable TDP feature <turbo>` allowing the platform firmware to set
the turbo threshold.  Namely, if that is not coordinated with the lists of
P-states returned by ``_PSS`` properly, there may be more than one item
corresponding to a turbo P-state in those lists and there may be a problem with
avoiding the turbo range (if desirable or necessary).  Usually, to avoid using
turbo P-states overall, ``acpi-cpufreq`` simply avoids using the topmost state
listed by ``_PSS``, but that is not sufficient when there are other turbo
P-states in the list returned by it.

Apart from the above, ``acpi-cpufreq`` works like ``intel_pstate`` in the
`passive mode <Passive Mode_>`_, except that the number of P-states it can set
is limited to the ones listed by the ACPI ``_PSS`` objects.
:ref:`passive mode <passive_mode>`, except that the number of P-states it can
set is limited to the ones listed by the ACPI ``_PSS`` objects.


Kernel Command Line Options for ``intel_pstate``
@@ -756,11 +771,11 @@ of them have to be prepended with the ``intel_pstate=`` prefix.
	processor is supported by it.

``active``
	Register ``intel_pstate`` in the `active mode <Active Mode_>`_ to start
	with.
	Register ``intel_pstate`` in the :ref:`active mode <active_mode>` to
        start with.

``passive``
	Register ``intel_pstate`` in the `passive mode <Passive Mode_>`_ to
	Register ``intel_pstate`` in the :ref:`passive mode <passive_mode>` to
	start with.

``force``
@@ -793,12 +808,12 @@ of them have to be prepended with the ``intel_pstate=`` prefix.
	and this option has no effect.

``per_cpu_perf_limits``
	Use per-logical-CPU P-State limits (see `Coordination of P-state
	Limits`_ for details).
	Use per-logical-CPU P-State limits (see
        :ref:`pstate_limits_coordination` for details).

``no_cas``
	Do not enable `capacity-aware scheduling <CAS_>`_ which is enabled by
	default on hybrid systems without SMT.
	Do not enable :ref:`capacity-aware scheduling <CAS>` which is enabled
        by default on hybrid systems without SMT.

Diagnostics and Tuning
======================
@@ -810,7 +825,7 @@ There are two static trace events that can be used for ``intel_pstate``
diagnostics.  One of them is the ``cpu_frequency`` trace event generally used
by ``CPUFreq``, and the other one is the ``pstate_sample`` trace event specific
to ``intel_pstate``.  Both of them are triggered by ``intel_pstate`` only if
it works in the `active mode <Active Mode_>`_.
it works in the :ref:`active mode <active_mode>`.

The following sequence of shell commands can be used to enable them and see
their output (if the kernel is generally configured to support event tracing)::
@@ -822,7 +837,7 @@ their output (if the kernel is generally configured to support event tracing)::
 gnome-terminal--4510  [001] ..s.  1177.680733: pstate_sample: core_busy=107 scaled=94 from=26 to=26 mperf=1143818 aperf=1230607 tsc=29838618 freq=2474476
 cat-5235  [002] ..s.  1177.681723: cpu_frequency: state=2900000 cpu_id=2

If ``intel_pstate`` works in the `passive mode <Passive Mode_>`_, the
If ``intel_pstate`` works in the :ref:`passive mode <passive_mode>`, the
``cpu_frequency`` trace event will be triggered either by the ``schedutil``
scaling governor (for the policies it is attached to), or by the ``CPUFreq``
core (for the policies with other scaling governors).
+113 −0
# SPDX-License-Identifier: ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause)

name: em

doc: |
  Energy model netlink interface to notify its changes.

protocol: genetlink

uapi-header: linux/energy_model.h

attribute-sets:
  -
    name: pds
    attributes:
      -
        name: pd
        type: nest
        nested-attributes: pd
        multi-attr: true
  -
    name: pd
    attributes:
      -
        name: pad
        type: pad
      -
        name: pd-id
        type: u32
      -
        name: flags
        type: u64
      -
        name: cpus
        type: string
  -
    name: pd-table
    attributes:
      -
        name: pd-id
        type: u32
      -
        name: ps
        type: nest
        nested-attributes: ps
        multi-attr: true
  -
    name: ps
    attributes:
      -
        name: pad
        type: pad
      -
        name: performance
        type: u64
      -
        name: frequency
        type: u64
      -
        name: power
        type: u64
      -
        name: cost
        type: u64
      -
        name: flags
        type: u64

operations:
  list:
    -
      name: get-pds
      attribute-set: pds
      doc: Get the list of information for all performance domains.
      do:
        reply:
          attributes:
            - pd
    -
      name: get-pd-table
      attribute-set: pd-table
      doc: Get the energy model table of a performance domain.
      do:
        request:
          attributes:
            - pd-id
        reply:
          attributes:
            - pd-id
            - ps
    -
      name: pd-created
      doc: A performance domain is created.
      notify: get-pd-table
      mcgrp: event
    -
      name: pd-updated
      doc: A performance domain is updated.
      notify: get-pd-table
      mcgrp: event
    -
      name: pd-deleted
      doc: A performance domain is deleted.
      attribute-set: pd-table
      event:
        attributes:
            - pd-id
      mcgrp: event

mcast-groups:
  list:
    -
      name: event
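
To illustrate how a listener on the `event` multicast group might consume the notifications defined in this spec, here is a small Python sketch. It assumes messages have already been decoded into dictionaries keyed by the spec's attribute names; the `(name, msg)` calling convention and the `EnergyModelCache` class are hypothetical, and no netlink socket handling is shown. It relies only on what the spec states: `pd-created` and `pd-updated` reuse the `get-pd-table` reply (`pd-id` plus a list of `ps` entries), while `pd-deleted` carries just `pd-id`.

```python
from typing import Any, Dict, List

PerfState = Dict[str, int]


class EnergyModelCache:
    """Keeps the latest performance-state table per performance domain."""

    def __init__(self) -> None:
        self.tables: Dict[int, List[PerfState]] = {}

    def handle(self, name: str, msg: Dict[str, Any]) -> None:
        pd_id = msg["pd-id"]
        if name in ("pd-created", "pd-updated"):
            # Both notifications reuse the get-pd-table reply layout.
            self.tables[pd_id] = list(msg.get("ps", []))
        elif name == "pd-deleted":
            # Only pd-id is present; forget the domain if it was known.
            self.tables.pop(pd_id, None)
        else:
            raise ValueError(f"unexpected event: {name}")
```

Treating creation and update identically keeps the cache correct even if the listener starts after a domain was created and only ever sees an update for it.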