Commit 67da125e authored by Linus Torvalds
Pull RCU updates from Paul McKenney:
 "Documentation updates:

   - Update whatisRCU.rst and checklist.rst for recent RCU API additions

   - Fix RCU documentation formatting and typos

   - Replace dead Ottawa Linux Symposium links in RTFP.txt

  Miscellaneous RCU updates:

   - Document that rcu_barrier() hurries RCU_LAZY callbacks

   - Remove redundant interrupt disabling from
     rcu_preempt_deferred_qs_handler()

   - Move list_for_each_rcu from list.h to rculist.h, and adjust the
     include directive in kernel/cgroup/dmem.c accordingly

   - Make initial set of changes to accommodate upcoming
     system_percpu_wq changes

  SRCU updates:

   - Create an srcu_read_lock_fast_notrace() for eventual use in
     tracing, including adding guards

   - Document the reliance on per-CPU operations as implicit RCU readers
     in __srcu_read_{,un}lock_fast()

   - Document the srcu_flip() function's memory-barrier D's relationship
     to SRCU-fast readers

   - Remove a redundant preempt_disable() and preempt_enable() pair from
     srcu_gp_start_if_needed()

  Torture-test updates:

   - Fix jitter.sh spin time so that it actually varies as advertised.
     It is still quite coarse-grained, but at least it does now vary

   - Update torture.sh help text to include the not-so-new --do-normal
     parameter, which permits (for example) testing KCSAN kernels
     without doing non-debug kernels

   - Fix a number of false-positive diagnostics that were being
     triggered by rcutorture starting before boot completed. Running
     multiple near-CPU-bound rcutorture processes when there is only the
     boot CPU is after all a bit excessive

   - Substitute kcalloc() for kzalloc()

   - Remove a redundant kfree() and NULL out kfree()ed objects"

* tag 'rcu.2025.09.26a' of git://git.kernel.org/pub/scm/linux/kernel/git/rcu/linux: (31 commits)
  rcu: WQ_UNBOUND added to sync_wq workqueue
  rcu: WQ_PERCPU added to alloc_workqueue users
  rcu: replace use of system_wq with system_percpu_wq
  refperf: Set reader_tasks to NULL after kfree()
  refperf: Remove redundant kfree() after torture_stop_kthread()
  srcu/tiny: Remove preempt_disable/enable() in srcu_gp_start_if_needed()
  srcu: Document srcu_flip() memory-barrier D relation to SRCU-fast
  srcu: Document __srcu_read_{,un}lock_fast() implicit RCU readers
  rculist: move list_for_each_rcu() to where it belongs
  refscale: Use kcalloc() instead of kzalloc()
  rcutorture: Use kcalloc() instead of kzalloc()
  docs: rcu: Replace multiple dead OLS links in RTFP.txt
  doc: Fix typo in RCU's torture.rst documentation
  Documentation: RCU: Retitle toctree index
  Documentation: RCU: Reduce toctree depth
  Documentation: RCU: Wrap kvm-remote.sh rerun snippet in literal code block
  rcu: docs: Requirements.rst: Abide by conventions of kernel documentation
  doc: Add RCU guards to checklist.rst
  doc: Update whatisRCU.rst for recent RCU API additions
  rcutorture: Delay forward-progress testing until boot completes
  ...
parents 48e3694a 1d289fc5
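
The kcalloc() substitutions and the "NULL out kfree()ed objects" change listed above follow two common defensive patterns. The sketch below is a userspace analogue only, not the kernel code from these commits: ``demo_calloc()`` and ``demo_free_ints()`` are hypothetical names, and calloc()/free() stand in for kcalloc()/kfree(). It assumes the point of kcalloc() over an open-coded ``n * size`` passed to kzalloc() is the multiplication-overflow check.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical userspace analogue of kcalloc(): returns zeroed memory for
 * n elements of the given size, or NULL if n * size would overflow.
 */
static void *demo_calloc(size_t n, size_t size)
{
	if (size != 0 && n > SIZE_MAX / size)
		return NULL;	/* an open-coded n * size would wrap here */
	return calloc(n, size);
}

/* Free an array and clear the caller's pointer so a stale value cannot be reused. */
static void demo_free_ints(int **pp)
{
	assert(pp != NULL);
	free(*pp);
	*pp = NULL;
}
```

A caller would allocate with ``demo_calloc(count, sizeof(*array))`` and release with ``demo_free_ints(&array)``, after which a second cleanup pass sees NULL rather than a dangling pointer.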
+24 −28
@@ -1973,9 +1973,7 @@ code, and the FQS loop, all of which refer to or modify this bookkeeping.
Note that grace period initialization (rcu_gp_init()) must carefully sequence
CPU hotplug scanning with grace period state changes. For example, the
following race could occur in rcu_gp_init() if rcu_seq_start() were to happen
-after the CPU hotplug scanning.
-
-.. code-block:: none
+after the CPU hotplug scanning::

   CPU0 (rcu_gp_init)                   CPU1                          CPU2
   ---------------------                ----                          ----
@@ -2008,22 +2006,22 @@ after the CPU hotplug scanning.
                                                                      kfree(r1);
                                        r2 = *r0; // USE-AFTER-FREE!

-By incrementing gp_seq first, CPU1's RCU read-side critical section
+By incrementing ``gp_seq`` first, CPU1's RCU read-side critical section
is guaranteed to not be missed by CPU2.

-**Concurrent Quiescent State Reporting for Offline CPUs**
+Concurrent Quiescent State Reporting for Offline CPUs
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

RCU must ensure that CPUs going offline report quiescent states to avoid
blocking grace periods. This requires careful synchronization to handle
race conditions

-**Race condition causing Offline CPU to hang GP**
-
-A race between CPU offlining and new GP initialization (gp_init) may occur
-because `rcu_report_qs_rnp()` in `rcutree_report_cpu_dead()` must temporarily
-release the `rcu_node` lock to wake the RCU grace-period kthread:
+Race condition causing Offline CPU to hang GP
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

-.. code-block:: none
+A race between CPU offlining and new GP initialization (gp_init()) may occur
+because rcu_report_qs_rnp() in rcutree_report_cpu_dead() must temporarily
+release the ``rcu_node`` lock to wake the RCU grace-period kthread::

   CPU1 (going offline)                 CPU0 (GP kthread)
   --------------------                 -----------------
@@ -2044,15 +2042,14 @@ release the `rcu_node` lock to wake the RCU grace-period kthread:
       // Reacquire lock (but too late)
     rnp->qsmaskinitnext &= ~mask       // Finally clears bit

-Without `ofl_lock`, the new grace period includes the offline CPU and waits
+Without ``ofl_lock``, the new grace period includes the offline CPU and waits
forever for its quiescent state causing a GP hang.

-**A solution with ofl_lock**
+A solution with ofl_lock
+^^^^^^^^^^^^^^^^^^^^^^^^

-The `ofl_lock` (offline lock) prevents `rcu_gp_init()` from running during
-the vulnerable window when `rcu_report_qs_rnp()` has released `rnp->lock`:
-
-.. code-block:: none
+The ``ofl_lock`` (offline lock) prevents rcu_gp_init() from running during
+the vulnerable window when rcu_report_qs_rnp() has released ``rnp->lock``::

   CPU0 (rcu_gp_init)                   CPU1 (rcutree_report_cpu_dead)
   ------------------                   ------------------------------
@@ -2065,21 +2062,20 @@ the vulnerable window when `rcu_report_qs_rnp()` has released `rnp->lock`:
       arch_spin_unlock(&ofl_lock) ---> // Now CPU1 can proceed
   }                                    // But snapshot already taken

-**Another race causing GP hangs in rcu_gpu_init(): Reporting QS for Now-offline CPUs**
+Another race causing GP hangs in rcu_gpu_init(): Reporting QS for Now-offline CPUs
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

After the first loop takes an atomic snapshot of online CPUs, as shown above,
-the second loop in `rcu_gp_init()` detects CPUs that went offline between
-releasing `ofl_lock` and acquiring the per-node `rnp->lock`. This detection is
-crucial because:
+the second loop in rcu_gp_init() detects CPUs that went offline between
+releasing ``ofl_lock`` and acquiring the per-node ``rnp->lock``.
+This detection is crucial because:

1. The CPU might have gone offline after the snapshot but before the second loop
2. The offline CPU cannot report its own QS if it's already dead
3. Without this detection, the grace period would wait forever for CPUs that
   are now offline.

-The second loop performs this detection safely:
-
-.. code-block:: none
+The second loop performs this detection safely::

   rcu_for_each_node_breadth_first(rnp) {
       raw_spin_lock_irqsave_rcu_node(rnp, flags);
@@ -2093,10 +2089,10 @@ The second loop performs this detection safely:
   }

This approach ensures atomicity: quiescent state reporting for offline CPUs
-happens either in `rcu_gp_init()` (second loop) or in `rcutree_report_cpu_dead()`,
-never both and never neither. The `rnp->lock` held throughout the sequence
-prevents races - `rcutree_report_cpu_dead()` also acquires this lock when
-clearing `qsmaskinitnext`, ensuring mutual exclusion.
+happens either in rcu_gp_init() (second loop) or in rcutree_report_cpu_dead(),
+never both and never neither. The ``rnp->lock`` held throughout the sequence
+prevents races - rcutree_report_cpu_dead() also acquires this lock when
+clearing ``qsmaskinitnext``, ensuring mutual exclusion.

Scheduler and RCU
~~~~~~~~~~~~~~~~~
+3 −3
@@ -641,7 +641,7 @@ Orran Krieger and Rusty Russell and Dipankar Sarma and Maneesh Soni"
,Month="July"
,Year="2001"
,note="Available:
-\url{http://www.linuxsymposium.org/2001/abstracts/readcopy.php}
+\url{https://kernel.org/doc/ols/2001/read-copy.pdf}
\url{http://www.rdrop.com/users/paulmck/RCU/rclock_OLS.2001.05.01c.pdf}
[Viewed June 23, 2004]"
,annotation={
@@ -1480,7 +1480,7 @@ Suparna Bhattacharya"
,Year="2006"
,pages="v2 123-138"
,note="Available:
-\url{http://www.linuxsymposium.org/2006/view_abstract.php?content_key=184}
+\url{https://kernel.org/doc/ols/2006/ols2006v2-pages-131-146.pdf}
\url{http://www.rdrop.com/users/paulmck/RCU/OLSrtRCU.2006.08.11a.pdf}
[Viewed January 1, 2007]"
,annotation={
@@ -1511,7 +1511,7 @@ Canis Rufus and Zoicon5 and Anome and Hal Eisen"
,Year="2006"
,pages="v2 249-254"
,note="Available:
-\url{http://www.linuxsymposium.org/2006/view_abstract.php?content_key=184}
+\url{https://kernel.org/doc/ols/2006/ols2006v2-pages-249-262.pdf}
[Viewed January 11, 2009]"
,annotation={
	Uses RCU-protected radix tree for a lockless page cache.
+19 −8
@@ -69,7 +69,13 @@ over a rather long period of time, but improvements are always welcome!
	Explicit disabling of preemption (preempt_disable(), for example)
	can serve as rcu_read_lock_sched(), but is less readable and
	prevents lockdep from detecting locking issues.  Acquiring a
-	spinlock also enters an RCU read-side critical section.
+	raw spinlock also enters an RCU read-side critical section.
+
+	The guard(rcu)() and scoped_guard(rcu) primitives designate
+	the remainder of the current scope or the next statement,
+	respectively, as the RCU read-side critical section.  Use of
+	these guards can be less error-prone than rcu_read_lock(),
+	rcu_read_unlock(), and friends.

	Please note that you *cannot* rely on code known to be built
	only in non-preemptible kernels.  Such code can and will break,
@@ -405,9 +411,11 @@ over a rather long period of time, but improvements are always welcome!
13.	Unlike most flavors of RCU, it *is* permissible to block in an
	SRCU read-side critical section (demarked by srcu_read_lock()
	and srcu_read_unlock()), hence the "SRCU": "sleepable RCU".
-	Please note that if you don't need to sleep in read-side critical
-	sections, you should be using RCU rather than SRCU, because RCU
-	is almost always faster and easier to use than is SRCU.
+	As with RCU, guard(srcu)() and scoped_guard(srcu) forms are
+	available, and often provide greater ease of use.  Please note
+	that if you don't need to sleep in read-side critical sections,
+	you should be using RCU rather than SRCU, because RCU is almost
+	always faster and easier to use than is SRCU.

	Also unlike other forms of RCU, explicit initialization and
	cleanup is required either at build time via DEFINE_SRCU()
@@ -443,10 +451,13 @@ over a rather long period of time, but improvements are always welcome!
	real-time workloads than is synchronize_rcu_expedited().

	It is also permissible to sleep in RCU Tasks Trace read-side
-	critical section, which are delimited by rcu_read_lock_trace() and
-	rcu_read_unlock_trace().  However, this is a specialized flavor
-	of RCU, and you should not use it without first checking with
-	its current users.  In most cases, you should instead use SRCU.
+	critical section, which are delimited by rcu_read_lock_trace()
+	and rcu_read_unlock_trace().  However, this is a specialized
+	flavor of RCU, and you should not use it without first checking
+	with its current users.  In most cases, you should instead
+	use SRCU.  As with RCU and SRCU, guard(rcu_tasks_trace)() and
+	scoped_guard(rcu_tasks_trace) are available, and often provide
+	greater ease of use.

	Note that rcu_assign_pointer() relates to SRCU just as it does to
	other forms of RCU, but instead of rcu_dereference() you should
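
The guard() forms added to the checklist in the hunks above rely on compiler-driven scope-based cleanup. The following is a minimal userspace sketch of that idea using the GCC/Clang ``cleanup`` attribute. It is not the kernel implementation: every ``demo_`` name is hypothetical, and a plain counter stands in for the RCU read-side nesting state.

```c
#include <assert.h>

/* Hypothetical stand-in for the read-side nesting depth. */
static int demo_read_depth;

static void demo_read_lock(void)
{
	demo_read_depth++;
}

static void demo_read_unlock(void)
{
	assert(demo_read_depth > 0);
	demo_read_depth--;
}

/* Cleanup handler: runs automatically when the guard variable leaves scope. */
static void demo_guard_exit(int *unused)
{
	(void)unused;
	demo_read_unlock();
}

/* Enter the critical section now and leave it at the end of the enclosing scope. */
#define DEMO_GUARD() \
	int demo_guard_var __attribute__((cleanup(demo_guard_exit), unused)) = \
		(demo_read_lock(), 0)

/* Every return path drops the guard, so no explicit unlock is needed. */
static int demo_lookup(int key)
{
	DEMO_GUARD();

	if (key < 0)
		return -1;		/* early return still unlocks */
	return demo_read_depth;		/* read while the guard is held */
}
```

The design point is that the unlock is tied to scope exit rather than to each return statement, which is why the checklist text describes the guard forms as less error-prone than paired lock and unlock calls.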
+3 −3
.. SPDX-License-Identifier: GPL-2.0

-.. _rcu_concepts:
+.. _rcu_handbook:

============
-RCU concepts
+RCU Handbook
============

.. toctree::
-   :maxdepth: 3
+   :maxdepth: 2

   checklist
   lockdep
+2 −2
@@ -344,7 +344,7 @@ painstaking and error-prone.

And this is why the kvm-remote.sh script exists.

-If you the following command works::
+If the following command works::

	ssh system0 date

@@ -364,7 +364,7 @@ systems must come first.
The kvm.sh ``--dryrun scenarios`` argument is useful for working out
how many scenarios may be run in one batch across a group of systems.

-You can also re-run a previous remote run in a manner similar to kvm.sh:
+You can also re-run a previous remote run in a manner similar to kvm.sh::

	kvm-remote.sh "system0 system1 system2 system3 system4 system5" \
		tools/testing/selftests/rcutorture/res/2022.11.03-11.26.28-remote \