Merge tag 'pm-6.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm (9b1b3dcd)

Documentation/admin-guide/pm/cpufreq.rst

+1 −1

Original line number	Diff line number	Diff line
		@@ -439,7 +439,7 @@ This governor exposes only one tunable:
		``rate_limit_us``
		Minimum time (in microseconds) that has to pass between two consecutive
		runs of governor computations (default: 1.5 times the scaling driver's
		-transition latency or the maximum 2ms).
		+transition latency or 1ms if the driver does not provide a latency value).

		The purpose of this tunable is to reduce the scheduler context overhead
		of the governor which might be excessive without it.
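
Reviewer note: the new default described in this hunk can be illustrated with a short sketch. The function name, the microsecond units of the argument, and the use of 0 to mean "the driver provided no latency value" are assumptions made for this example only; this is not the kernel's implementation.

```c
#include <stdint.h>

/* 1 ms fallback, taken from the updated documentation text. */
#define FALLBACK_RATE_LIMIT_US 1000u

/* Hypothetical helper, not a kernel function. */
static uint32_t default_rate_limit_us(uint32_t transition_latency_us)
{
	/* Assumption: 0 stands for "no latency value provided". */
	if (transition_latency_us == 0)
		return FALLBACK_RATE_LIMIT_US;

	/* 1.5 times the latency, in integer arithmetic (rounds down). */
	return transition_latency_us + transition_latency_us / 2;
}
```

With these assumptions, a 200 us transition latency yields a 300 us rate limit, and a missing latency falls back to 1000 us.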

Documentation/devicetree/bindings/cpufreq/cpufreq-qcom-hw.yaml

+2 −0

		@@ -35,6 +35,7 @@ properties:
		- description: v2 of CPUFREQ HW (EPSS)
		items:
		- enum:
		+- qcom,milos-cpufreq-epss
		- qcom,qcs8300-cpufreq-epss
		- qcom,qdu1000-cpufreq-epss
		- qcom,sa8255p-cpufreq-epss
		@@ -169,6 +170,7 @@ allOf:
		compatible:
		contains:
		enum:
		+- qcom,milos-cpufreq-epss
		- qcom,qcs8300-cpufreq-epss
		- qcom,sc7280-cpufreq-epss
		- qcom,sm8250-cpufreq-epss

Documentation/power/energy-model.rst

+9 −9

		@@ -14,8 +14,8 @@ subsystems willing to use that information to make energy-aware decisions.
		The source of the information about the power consumed by devices can vary greatly
		from one platform to another. These power costs can be estimated using
		devicetree data in some cases. In others, the firmware will know better.
		-Alternatively, userspace might be best positioned. And so on. In order to avoid
		-each and every client subsystem to re-implement support for each and every
		+Alternatively, userspace might be best positioned. In order to avoid
		+having each and every client subsystem re-implement support for each and every
		possible source of information on its own, the EM framework intervenes as an
		abstraction layer which standardizes the format of power cost tables in the
		kernel, hence enabling to avoid redundant work.
		@@ -32,7 +32,7 @@ be found in the Intelligent Power Allocation in
		Documentation/driver-api/thermal/power_allocator.rst.
		Kernel subsystems might implement automatic detection to check whether EM
		registered devices have inconsistent scale (based on EM internal flag).
		-Important thing to keep in mind is that when the power values are expressed in
		+An important thing to keep in mind is that when the power values are expressed in
		an 'abstract scale' deriving real energy in micro-Joules would not be possible.

		The figure below depicts an example of drivers (Arm-specific here, but the
		@@ -82,7 +82,7 @@ using kref mechanism. The device driver which provided the new EM at runtime,
		should call EM API to free it safely when it's no longer needed. The EM
		framework will handle the clean-up when it's possible.

		-The kernel code which want to modify the EM values is protected from concurrent
		+The kernel code which wants to modify the EM values is protected from concurrent
		access using a mutex. Therefore, the device driver code must run in sleeping
		context when it tries to modify the EM.

		@@ -113,7 +113,7 @@ Registration of 'advanced' EM
		~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

		The 'advanced' EM gets its name due to the fact that the driver is allowed
		-to provide more precised power model. It's not limited to some implemented math
		+to provide a more precise power model. It's not limited to some implemented math
		formula in the framework (like it is in 'simple' EM case). It can better reflect
		the real power measurements performed for each performance state. Thus, this
		registration method should be preferred in case considering EM static power
		@@ -172,7 +172,7 @@ Registration of 'simple' EM
		~~~~~~~~~~~~~~~~~~~~~~~~~~~

		The 'simple' EM is registered using the framework helper function
		-cpufreq_register_em_with_opp(). It implements a power model which is tight to
		+cpufreq_register_em_with_opp(). It implements a power model which is tied to a
		math formula::

		Power = C * V^2 * f
		@@ -251,7 +251,7 @@ It returns the 'struct em_perf_state' pointer which is an array of performance
		states in ascending order.
		This function must be called in the RCU read lock section (after the
		rcu_read_lock()). When the EM table is not needed anymore there is a need to
		-call rcu_real_unlock(). In this way the EM safely uses the RCU read section
		+call rcu_read_unlock(). In this way the EM safely uses the RCU read section
		and protects the users. It also allows the EM framework to manage the memory
		and free it. More details how to use it can be found in Section 3.2 in the
		example driver.
		@@ -308,12 +308,12 @@ EM framework::
		05
		06 /* Use the 'foo' protocol to ceil the frequency */
		07 freq = foo_get_freq_ceil(dev, *KHz);
		-08 if (freq < 0);
		+08 if (freq < 0)
		09 return freq;
		10
		11 /* Estimate the power cost for the dev at the relevant freq. */
		12 power = foo_estimate_power(dev, freq);
		-13 if (power < 0);
		+13 if (power < 0)
		14 return power;
		15
		16 /* Return the values to the EM framework */
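
Reviewer note: the 'simple' EM formula quoted in the hunk above (Power = C * V^2 * f) can be sketched as a plain function. The helper name and the use of unitless doubles are assumptions for illustration; the diff does not state units, and this is not the framework's code.

```c
/*
 * Hypothetical helper illustrating Power = C * V^2 * f.
 * c: capacitance-like coefficient, v: voltage, f: frequency.
 * Units are left abstract on purpose.
 */
static double simple_em_power(double c, double v, double f)
{
	return c * v * v * f;
}
```

Because voltage enters squared, doubling the voltage at a fixed frequency quadruples the estimated power, while doubling the frequency alone only doubles it.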

Documentation/power/runtime_pm.rst

+3 −4

		@@ -712,10 +712,9 @@ out the following operations:
		* During system suspend pm_runtime_get_noresume() is called for every device
		right before executing the subsystem-level .prepare() callback for it and
		pm_runtime_barrier() is called for every device right before executing the
		-subsystem-level .suspend() callback for it. In addition to that the PM core
		-calls __pm_runtime_disable() with 'false' as the second argument for every
		-device right before executing the subsystem-level .suspend_late() callback
		-for it.
		+subsystem-level .suspend() callback for it. In addition to that, the PM
		+core disables runtime PM for every device right before executing the
		+subsystem-level .suspend_late() callback for it.

		* During system resume pm_runtime_enable() and pm_runtime_put() are called for
		every device right after executing the subsystem-level .resume_early()

Documentation/scheduler/sched-energy.rst

+4 −4

		@@ -244,7 +244,7 @@ Example 2.


		From these calculations, the Case 1 has the lowest total energy. So CPU 1
		-is be the best candidate from an energy-efficiency standpoint.
		+is the best candidate from an energy-efficiency standpoint.

		Big CPUs are generally more power hungry than the little ones and are thus used
		mainly when a task doesn't fit the littles. However, little CPUs aren't always
		@@ -252,7 +252,7 @@ necessarily more energy-efficient than big CPUs. For some systems, the high OPPs
		of the little CPUs can be less energy-efficient than the lowest OPPs of the
		bigs, for example. So, if the little CPUs happen to have enough utilization at
		a specific point in time, a small task waking up at that moment could be better
		-of executing on the big side in order to save energy, even though it would fit
		+off executing on the big side in order to save energy, even though it would fit
		on the little side.

		And even in the case where all OPPs of the big CPUs are less energy-efficient
		@@ -285,7 +285,7 @@ much that can be done by the scheduler to save energy without severely harming
		throughput. In order to avoid hurting performance with EAS, CPUs are flagged as
		'over-utilized' as soon as they are used at more than 80% of their compute
		capacity. As long as no CPUs are over-utilized in a root domain, load balancing
		-is disabled and EAS overridess the wake-up balancing code. EAS is likely to load
		+is disabled and EAS overrides the wake-up balancing code. EAS is likely to load
		the most energy efficient CPUs of the system more than the others if that can be
		done without harming throughput. So, the load-balancer is disabled to prevent
		it from breaking the energy-efficient task placement found by EAS. It is safe to
		@@ -385,7 +385,7 @@ Using EAS with any other governor than schedutil is not supported.
		6.5 Scale-invariant utilization signals
		^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

		-In order to make accurate prediction across CPUs and for all performance
		+In order to make accurate predictions across CPUs and for all performance
		states, EAS needs frequency-invariant and CPU-invariant PELT signals. These can
		be obtained using the architecture-defined arch_scale{cpu,freq}_capacity()
		callbacks.
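
Reviewer note: the 'over-utilized' rule quoted in the @@ -285 hunk above (a CPU used at more than 80% of its compute capacity) can be sketched as an integer comparison. The function name and argument types are assumptions for this example; it is not the scheduler's code.

```c
#include <stdbool.h>

/*
 * Hypothetical check, not a kernel function.
 * util / capacity > 80%  is rewritten as  util * 5 > capacity * 4
 * to stay in integer arithmetic.
 */
static bool is_over_utilized(unsigned long util, unsigned long capacity)
{
	return util * 5 > capacity * 4;
}
```

For a capacity of 1024, a utilization of 819 stays under the threshold and 820 crosses it.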