Commit 7d20aa5c authored by Linus Torvalds
Pull power management updates from Rafael Wysocki:
 "These are dominated by cpufreq updates which in turn are dominated by
  updates related to boost support in the core and drivers and
  amd-pstate driver optimizations.

  Apart from the above, there are some cpuidle updates including a
  rework of the most recent idle intervals handling in the venerable
  menu governor that leads to significant improvements in some
  performance benchmarks, as the governor is now more likely to predict
  a shorter idle duration in some cases, and there are updates of the
  core device power management code, mostly related to system suspend
  and resume, that should help to avoid potential issues arising when
  the drivers of devices depending on one another want to use different
  optimizations.

  There is also a usual collection of assorted fixes and cleanups,
  including removal of some unused code.

  Specifics:

   - Manage sysfs attributes and boost frequencies efficiently from
     cpufreq core to reduce boilerplate code in drivers (Viresh Kumar)

   - Minor cleanups to cpufreq drivers (Aaron Kling, Benjamin Schneider,
     Dhananjay Ugwekar, Imran Shaik, zuoqian)

   - Migrate some cpufreq drivers to using for_each_present_cpu() (Jacky
     Bai)

   - cpufreq-qcom-hw DT binding fixes (Krzysztof Kozlowski)

   - Use str_enable_disable() helper in cpufreq_online() (Lifeng Zheng)

   - Optimize the amd-pstate driver through code reorganization, a
     locking overhaul and tracing adjustments so that call paths no
     longer issue the same writes multiple times or cache variables
     needlessly (Mario Limonciello, Dhananjay Ugwekar)

   - Make it possible to avoid enabling capacity-aware scheduling (CAS)
     in the intel_pstate driver and move its check for out-of-band (OOB)
     platform handling so that OOB is detected before HWP availability
     is checked (Rafael Wysocki)

   - Fix dbs_update() to avoid inadvertent conversions of negative
     integer values to unsigned int, which cause CPU frequency selection
     to be inaccurate in some cases when the "conservative" cpufreq
     governor is in use (Jie Zhan)

   - Update the handling of the most recent idle intervals in the menu
     cpuidle governor to prevent useful information from being discarded
     by it in some cases and improve the prediction accuracy (Rafael
     Wysocki)

   - Make it possible to tell the intel_idle driver to ignore its
     built-in table of idle states for the given processor, clean up the
     handling of auto-demotion disabling on Baytrail and Cherrytrail
     chips in it, and update its MAINTAINERS entry (David Arcari, Artem
     Bityutskiy, Rafael Wysocki)

   - Make some cpuidle drivers use for_each_present_cpu() instead of
     for_each_possible_cpu() during initialization to avoid issues
     occurring when nosmp or maxcpus=0 are used (Jacky Bai)

   - Clean up the Energy Model handling code somewhat (Rafael Wysocki)

   - Use kfree_rcu() to simplify the handling of runtime Energy Model
     updates (Li RongQing)

   - Add an entry for the Energy Model framework to MAINTAINERS as
     properly maintained (Lukasz Luba)

   - Address RCU-related sparse warnings in the Energy Model code
     (Rafael Wysocki)

   - Remove ENERGY_MODEL dependency on SMP and allow it to be selected
     when DEVFREQ is set without CPUFREQ so it can be used on a wider
     range of systems (Jeson Gao)

   - Unify error handling during runtime suspend and runtime resume in
     the core to help drivers implement more consistent runtime PM
     error handling (Rafael Wysocki)

   - Drop a redundant check from pm_runtime_force_resume() and rearrange
     documentation related to __pm_runtime_disable() (Rafael Wysocki)

   - Rework the handling of the "smart suspend" driver flag in the PM
     core to avoid issues that may occur when drivers using it depend on
     some other drivers and clean up the related PM core code (Rafael
     Wysocki, Colin Ian King)

   - Fix the handling of devices with the power.direct_complete flag set
     if device_suspend() returns an error for at least one device to
     avoid situations in which some of them may not be resumed (Rafael
     Wysocki)

   - Use mutex_trylock() in hibernate_compressor_param_set() to avoid a
     possible deadlock that may occur if the "compressor" hibernation
     module parameter is accessed during the registration of a new
     ieee80211 device (Lizhi Xu)

   - Suppress the sleeping parent warning in device_pm_add() when new
     children are added under a device with power.direct_complete set
     after it has been processed by device_resume() (Xu Yang)

   - Remove needless return statements from three void functions
     related to system wakeup (Zijun Hu)

   - Replace deprecated kmap_atomic() with kmap_local_page() in the
     hibernation core code (David Reaver)

   - Remove unused helper functions related to system sleep (David Alan
     Gilbert)

   - Clean up s2idle_enter() so that it does not needlessly lock and
     unlock CPU offline, and update comments in it (Ulf Hansson)

   - Clean up broken white space in dpm_wait_for_children() (Geert
     Uytterhoeven)

   - Update the cpupower utility to fix library versioning and memory
     leaks in error legs, remove hard-coded values, and implement CPU
     physical core querying (Thomas Renninger, John B. Wyatt IV, Shuah
     Khan, Yiwei Lin, Zhongqiu Han)"
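The dbs_update() item above concerns a classic C pitfall: a negative signed value converted to unsigned int wraps around to a huge number, so a later comparison or subtraction quietly yields a wrong load figure. The following is a minimal userspace sketch of that failure mode, not the kernel's dbs_update() code; load_buggy(), load_fixed() and their parameters are hypothetical, and clamping negative deltas to zero is just one possible remedy.

```c
#include <assert.h>

/*
 * Hypothetical helpers computing the busy percentage of a sampling window.
 * idle_delta may be negative if the idle-time counter appears to go
 * backwards.  Assumes time_elapsed < UINT_MAX / 100 so 100 * x cannot wrap.
 */
static unsigned int load_buggy(unsigned int time_elapsed, int idle_delta)
{
	/* BUG: a negative idle_delta wraps to a value close to UINT_MAX. */
	unsigned int idle_time = idle_delta;

	if (time_elapsed == 0 || idle_time > time_elapsed)
		return 0; /* the window looks completely idle */
	return 100 * (time_elapsed - idle_time) / time_elapsed;
}

static unsigned int load_fixed(unsigned int time_elapsed, int idle_delta)
{
	/* Clamp negative deltas to zero before converting to unsigned. */
	unsigned int idle_time = idle_delta > 0 ? (unsigned int)idle_delta : 0;
	unsigned int load;

	if (time_elapsed == 0 || idle_time > time_elapsed)
		return 0;
	load = 100 * (time_elapsed - idle_time) / time_elapsed;
	assert(load <= 100); /* sanity check: busy time never exceeds the window */
	return load;
}
```

With time_elapsed = 10000 and idle_delta = -5, load_buggy() reports 0% for a window that was in fact fully busy, whereas load_fixed() reports 100%; a governor fed the former would tend to lower the frequency instead of raising it.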
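The hibernate_compressor_param_set() item relies on a common deadlock-avoidance pattern: when a setter may run while another code path already holds a lock it needs, it can try the lock and fail fast instead of blocking. Below is a minimal userspace analogue built on POSIX pthread_mutex_trylock(); transition_lock, compressor and compressor_param_set() are hypothetical stand-ins rather than kernel symbols, and returning -EBUSY is this sketch's choice of error code.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <string.h>

/* Hypothetical stand-in for the lock held across system transitions. */
static pthread_mutex_t transition_lock = PTHREAD_MUTEX_INITIALIZER;
/* Hypothetical parameter storage with an example default value. */
static char compressor[16] = "lzo";

static int compressor_param_set(const char *val)
{
	/* Fail fast instead of waiting for a lock another path may hold. */
	if (pthread_mutex_trylock(&transition_lock) != 0)
		return -EBUSY;

	strncpy(compressor, val, sizeof(compressor) - 1);
	compressor[sizeof(compressor) - 1] = '\0';

	pthread_mutex_unlock(&transition_lock);
	return 0;
}
```

The trade-off is that a write attempted while the lock is held fails with -EBUSY and must be retried by the caller, rather than hanging.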

* tag 'pm-6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (139 commits)
  PM: sleep: Fix bit masking operation
  dt-bindings: cpufreq: cpufreq-qcom-hw: Narrow properties on SDX75, SA8775p and SM8650
  dt-bindings: cpufreq: cpufreq-qcom-hw: Drop redundant minItems:1
  dt-bindings: cpufreq: cpufreq-qcom-hw: Add missing constraint for interrupt-names
  dt-bindings: cpufreq: cpufreq-qcom-hw: Add QCS8300 compatible
  cpufreq: Init cpufreq only for present CPUs
  PM: sleep: Fix handling devices with direct_complete set on errors
  cpuidle: Init cpuidle only for present CPUs
  PM: clk: Remove unused pm_clk_remove()
  PM: sleep: core: Fix indentation in dpm_wait_for_children()
  PM: s2idle: Extend comment in s2idle_enter()
  PM: s2idle: Drop redundant locks when entering s2idle
  PM: sleep: Remove unused pm_generic_ wrappers
  cpufreq: tegra186: Share policy per cluster
  cpupower: Make lib versioning scheme more obvious and fix version link
  PM: EM: Rework the depends on for CONFIG_ENERGY_MODEL
  PM: EM: Address RCU-related sparse warnings
  cpupower: Implement CPU physical core querying
  pm: cpupower: remove hard-coded topology depth values
  pm: cpupower: Fix cmd_monitor() error legs to free cpu_topology
  ...
parents 21e0ff5b c5a55e42
+3 −0
@@ -2314,6 +2314,9 @@
			per_cpu_perf_limits
			  Allow per-logical-CPU P-State performance control limits using
			  cpufreq sysfs interface
			no_cas
			  Do not enable capacity-aware scheduling (CAS) on
			  hybrid systems

	intremap=	[X86-64,Intel-IOMMU,EARLY]
			on	enable Interrupt Remapping (default)
+17 −12
@@ -275,20 +275,25 @@ values and, when predicting the idle duration next time, it computes the average
and variance of them.  If the variance is small (smaller than 400 square
milliseconds) or it is small relative to the average (the average is greater
than 6 times the standard deviation), the average is regarded as the "typical
interval" value.  Otherwise, the longest of the saved observed idle duration
interval" value.  Otherwise, either the longest or the shortest (depending on
which one is farther from the average) of the saved observed idle duration
values is discarded and the computation is repeated for the remaining ones.

Again, if the variance of them is small (in the above sense), the average is
taken as the "typical interval" value and so on, until either the "typical
interval" is determined or too many data points are disregarded, in which case
the "typical interval" is assumed to equal "infinity" (the maximum unsigned
integer value).

If the "typical interval" computed this way is long enough, the governor obtains
the time until the closest timer event with the assumption that the scheduler
tick will be stopped.  That time, referred to as the *sleep length* in what follows,
is the upper bound on the time before the next CPU wakeup.  It is used to determine
the sleep length range, which in turn is needed to get the sleep length correction
factor.
interval" is determined or too many data points are disregarded.  In the latter
case, if the size of the set of data points still under consideration is
sufficiently large, the next idle duration is not likely to be above the largest
idle duration value still in that set, so that value is taken as the predicted
next idle duration.  Finally, if the set of data points still under
consideration is too small, no prediction is made.

If the preliminary prediction of the next idle duration computed this way is
long enough, the governor obtains the time until the closest timer event with
the assumption that the scheduler tick will be stopped.  That time, referred to
as the *sleep length* in what follows, is the upper bound on the time before the
next CPU wakeup.  It is used to determine the sleep length range, which in turn
is needed to get the sleep length correction factor.

The ``menu`` governor maintains an array containing several correction factor
values that correspond to different sleep length ranges organized so that each
@@ -302,7 +307,7 @@ to 1 the correction factor becomes (it must fall between 0 and 1 inclusive).
The sleep length is multiplied by the correction factor for the range that it
falls into to obtain an approximation of the predicted idle duration that is
compared to the "typical interval" determined previously and the minimum of
the two is taken as the idle duration prediction.
the two is taken as the final idle duration prediction.

If the "typical interval" value is small, which means that the CPU is likely
to be woken up soon enough, the sleep length computation is skipped as it may
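The outlier-rejection procedure described in the menu governor documentation above can be sketched in userspace C as follows. This illustrates the documented steps and is not the kernel's menu.c code: the helper names, the sample count of 8, the rule that discarding stops once only 3/4 of the samples remain, the "sufficiently large" threshold of half the samples, and the 1/1024 fixed-point correction factor are all assumptions made for the sketch. UINT_MAX stands for "no prediction".

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

#define INTERVALS 8 /* assumed number of saved idle durations */

/*
 * Returns the "typical interval", the largest remaining sample when too
 * many samples have been discarded but enough are left, or UINT_MAX for
 * "no prediction".  Assumes samples stay well below 2^28 so that the
 * 64-bit sums of squares cannot overflow.
 */
static unsigned int typical_interval(const unsigned int intervals[INTERVALS])
{
	int64_t min_thresh = -1, max_thresh = UINT_MAX;

	for (;;) {
		uint64_t sum = 0, sum_sq = 0, avg, avg_sq, mean_sq, variance;
		unsigned int max = 0, min = UINT_MAX, n = 0;

		for (int i = 0; i < INTERVALS; i++) {
			int64_t v = intervals[i];

			/* Skip samples discarded in earlier rounds. */
			if (v <= min_thresh || v >= max_thresh)
				continue;
			n++;
			sum += v;
			sum_sq += (uint64_t)v * v;
			if (v > max)
				max = v;
			if (v < min)
				min = v;
		}
		if (n == 0)
			return UINT_MAX;

		avg = sum / n;
		avg_sq = avg * avg;
		mean_sq = sum_sq / n;
		variance = mean_sq > avg_sq ? mean_sq - avg_sq : 0;

		/* Small variance, or average > 6 * stddev (avg^2 > 36 * var). */
		if (variance < 400 || avg_sq > 36 * variance)
			return (unsigned int)avg;

		/* Too many samples discarded: use the maximum or give up. */
		if (n * 4 <= INTERVALS * 3)
			return n >= INTERVALS / 2 ? max : UINT_MAX;

		/* Discard the extreme that lies farther from the average. */
		if (avg - min > max - avg)
			min_thresh = min;
		else
			max_thresh = max;
	}
}

/* Scale the sleep length by a correction factor (1024 == 1.0), take the min. */
static uint64_t final_prediction(uint64_t sleep_length, unsigned int factor,
				 unsigned int typical)
{
	uint64_t predicted;

	assert(factor <= 1024); /* the factor must lie between 0 and 1 */
	predicted = sleep_length * factor / 1024;
	return predicted < typical ? predicted : typical;
}
```

For example, the samples {10, 10, 10, 10, 10, 10, 10, 5000} yield 10 once the 5000 outlier is discarded, while the evenly spread samples {1000, 2000, ..., 8000} never reach a small enough variance, so after two rounds of discarding the largest remaining value, 6000, is returned.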
+13 −5
@@ -192,11 +192,19 @@ even if they have been enumerated (see :ref:`cpu-pm-qos` in
Documentation/admin-guide/pm/cpuidle.rst).
Setting ``max_cstate`` to 0 causes the ``intel_idle`` initialization to fail.

The ``no_acpi`` and ``use_acpi`` module parameters (recognized by ``intel_idle``
if the kernel has been configured with ACPI support) can be set to make the
driver ignore the system's ACPI tables entirely or use them for all of the
recognized processor models, respectively (they both are unset by default and
``use_acpi`` has no effect if ``no_acpi`` is set).
The ``no_acpi``, ``use_acpi`` and ``no_native`` module parameters are
recognized by ``intel_idle`` if the kernel has been configured with ACPI
support.  If ACPI is not configured, these flags have no effect on
functionality.

``no_acpi`` - Do not use ACPI at all.  Only native mode is available, no
ACPI mode.

``use_acpi`` - No-op in ACPI mode; in native mode, the driver consults the
ACPI tables for the on/off status of C-states.

``no_native`` - Work only in ACPI mode, no native mode available (ignore
all custom tables).

The value of the ``states_off`` module parameter (0 by default) represents a
list of idle states to be disabled by default in the form of a bitmask.
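The interaction of the three ACPI-related parameters and the states_off bitmask can be summarized with a small C model. It is a simplification, not the driver's code: the enum and function names are hypothetical, the model assumes ACPI support is configured, it assumes that processors without a built-in table fall back to ACPI, it ignores per-model defaults, and it assumes bit i of states_off refers to idle state i.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Hypothetical outcomes; these names are not taken from the driver. */
enum idle_source {
	IDLE_NATIVE,           /* built-in table only */
	IDLE_NATIVE_ACPI_MASK, /* built-in table, ACPI gives on/off status */
	IDLE_ACPI,             /* states taken from the ACPI tables */
	IDLE_NONE,             /* no usable source of idle states */
};

/*
 * Simplified model: no_native hides the built-in table, no_acpi hides
 * ACPI, and use_acpi only matters in native mode when ACPI is allowed.
 */
static enum idle_source pick_source(bool model_known, bool no_acpi,
				    bool use_acpi, bool no_native)
{
	if (model_known && !no_native)
		return (use_acpi && !no_acpi) ? IDLE_NATIVE_ACPI_MASK : IDLE_NATIVE;
	return no_acpi ? IDLE_NONE : IDLE_ACPI;
}

/* Assumes bit idx of states_off set means idle state idx starts disabled. */
static bool state_off_by_default(unsigned int states_off, unsigned int idx)
{
	if (idx >= sizeof(states_off) * CHAR_BIT)
		return false; /* index outside the mask */
	return (states_off >> idx) & 1u;
}
```

Under these assumptions, states_off=6 (binary 110) would disable states 1 and 2 by default, and setting both no_acpi and no_native leaves the model with no source of idle states.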
+3 −0
@@ -696,6 +696,9 @@ of them have to be prepended with the ``intel_pstate=`` prefix.
	Use per-logical-CPU P-State limits (see `Coordination of P-state
	Limits`_ for details).

``no_cas``
	Do not enable capacity-aware scheduling (CAS) which is enabled by
	default on hybrid systems.

Diagnostics and Tuning
======================
+31 −4
@@ -34,6 +34,7 @@ properties:
      - description: v2 of CPUFREQ HW (EPSS)
        items:
          - enum:
              - qcom,qcs8300-cpufreq-epss
              - qcom,qdu1000-cpufreq-epss
              - qcom,sa8255p-cpufreq-epss
              - qcom,sa8775p-cpufreq-epss
@@ -111,22 +112,20 @@ allOf:
            enum:
              - qcom,qcm2290-cpufreq-hw
              - qcom,sar2130p-cpufreq-epss
              - qcom,sdx75-cpufreq-epss
    then:
      properties:
        reg:
          minItems: 1
          maxItems: 1

        reg-names:
          minItems: 1
          maxItems: 1

        interrupts:
          minItems: 1
          maxItems: 1

        interrupt-names:
          minItems: 1
          maxItems: 1

  - if:
      properties:
@@ -135,6 +134,7 @@ allOf:
            enum:
              - qcom,qdu1000-cpufreq-epss
              - qcom,sa8255p-cpufreq-epss
              - qcom,sa8775p-cpufreq-epss
              - qcom,sc7180-cpufreq-hw
              - qcom,sc8180x-cpufreq-hw
              - qcom,sc8280xp-cpufreq-epss
@@ -160,12 +160,14 @@ allOf:

        interrupt-names:
          minItems: 2
          maxItems: 2

  - if:
      properties:
        compatible:
          contains:
            enum:
              - qcom,qcs8300-cpufreq-epss
              - qcom,sc7280-cpufreq-epss
              - qcom,sm8250-cpufreq-epss
              - qcom,sm8350-cpufreq-epss
@@ -187,6 +189,7 @@ allOf:

        interrupt-names:
          minItems: 3
          maxItems: 3

  - if:
      properties:
@@ -211,7 +214,31 @@ allOf:

        interrupt-names:
          minItems: 2
          maxItems: 2

  - if:
      properties:
        compatible:
          contains:
            enum:
              - qcom,sm8650-cpufreq-epss
    then:
      properties:
        reg:
          minItems: 4
          maxItems: 4

        reg-names:
          minItems: 4
          maxItems: 4

        interrupts:
          minItems: 4
          maxItems: 4

        interrupt-names:
          minItems: 4
          maxItems: 4

examples:
  - |