Commit 98e7dcbb authored by Linus Torvalds
Pull RCU updates from Frederic Weisbecker:
 "SRCU:

   - Properly handle SRCU readers within IRQ-disabled sections in Tiny
     SRCU

   - Prepare to reimplement RCU Tasks Trace on top of SRCU-fast:

      - Introduce an API to expedite a grace period, and test it through
        rcutorture

      - Split SRCU-fast into two flavours: SRCU-fast and SRCU-fast-updown.

        Both still target faster readers (without full barriers on LOCK
        and UNLOCK) at the expense of a heavier write side (using full
        RCU grace-period ordering instead of simply full ordering)
        compared to "traditional" non-fast SRCU. However, the two
        flavours are going to be optimized in different ways:

          - SRCU-fast will become the basis for reimplementing
            RCU-TASK-TRACE, as part of a consolidation. Since
            RCU-TASK-TRACE must be NMI-safe, SRCU-fast must be as well.

          - SRCU-fast-updown will be needed by the uretprobes code in
            order to get rid of the read-side memory barriers while
            still allowing a reader to be entered at task level and
            exited in a timer handler. It is semaphore-like in that it
            can have different owners between LOCK and UNLOCK. However,
            it is not NMI-safe.

        The actual optimizations are work in progress for the next
        cycle. Only the new interfaces are added for now, along with
        related torture and scalability test code.

   - Create, document, debug, and torture new proper initializers for
     SRCU-fast: DEFINE_SRCU_FAST() and init_srcu_struct_fast()

     These let the write side use the proper ordering (either full
     ordering or full RCU grace-period ordering) right away, without
     waiting for the read side to indicate which one to use.

     This also optimizes the read side by moving the flavour debug
     checks under a debug config option and by removing a costly RmW
     operation from the first call of the read-side functions.

   - Make some diagnostic functions tracing safe

  Refscale:

   - Add performance testing for common context synchronizations
     (preemption, IRQ, softirq) and per-CPU increments. These are
     relevant comparisons against the SRCU-fast read-side APIs,
     especially as those APIs are planned to synchronize more
     tracing fast-path code.

  Miscellaneous:

   - To prepare the layout for deferring nohz_full work to user exit,
     the context tracking state must shrink the counter of transitions
     to/from RCU not watching. The only possible hazard is that
     wrap-around becomes easier to trigger, slightly delaying grace
     periods when it happens. This should be a rare event, but
     debugging and torture code is added to test that assumption.

   - Fix a memory leak in the locktorture module

   - Annotate accesses in rculist_nulls.h to prevent KCSAN warnings.
     In recent discussions, we also concluded that all those
     WRITE_ONCE() and READ_ONCE() calls in the list APIs deserve
     appropriate comments. Expect those in the next cycle.

   - Provide a script to apply several configs to several commits with
     torture

   - Allow torture to reuse a build directory to avoid needless
     rebuilds

   - Various cleanups"
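The wrap-around hazard mentioned in the nohz_full bullet can be illustrated with a small userspace model. This is a sketch under stated assumptions, not the kernel's code: `struct watch_counter`, `watch_transition()` and `watch_qs_since()` are hypothetical names; the counter is assumed to be bumped on every transition to or from RCU not watching (odd meaning watching, as in the removed context-tracking comment); and the grace-period side is assumed to report a quiescent state when its snapshot was even or when the counter has changed since the snapshot.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical model of a width-bit "RCU watching" counter.
 * Assumption: it is bumped on every transition to or from RCU not
 * watching, and an odd value means RCU is watching.
 */
struct watch_counter {
	unsigned int width;	/* counter bits, e.g. 2 under torture */
	uint32_t value;		/* always below 2^width */
};

/* One transition to or from RCU not watching; wraps modulo 2^width. */
static void watch_transition(struct watch_counter *c)
{
	uint32_t mask;

	assert(c->width >= 1 && c->width <= 32);
	mask = c->width == 32 ? UINT32_MAX : (UINT32_C(1) << c->width) - 1;
	c->value = (c->value + 1) & mask;
}

/*
 * Grace-period side (assumed rule): the CPU passed through a quiescent
 * state since @snap if RCU was not watching at snapshot time (even
 * value) or if the counter has changed since then.
 */
static bool watch_qs_since(const struct watch_counter *c, uint32_t snap)
{
	if (!(snap & 1))
		return true;
	return c->value != snap;
}
```

With a 2-bit counter, four transitions between two checks bring the value back to the snapshot, so this model misses the quiescent state and the grace period has to wait for a later check. A wider counter only runs into the same case after far more transitions, which is why the shrink is expected to be a rare source of delay.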

* tag 'rcu.release.v6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/rcu/linux: (29 commits)
  refscale: Add SRCU-fast-updown readers
  refscale: Exercise DEFINE_STATIC_SRCU_FAST() and init_srcu_struct_fast()
  rcutorture: Make srcu{,d}_torture_init() announce the SRCU type
  srcu: Create an SRCU-fast-updown API
  refscale: Do not disable interrupts for tests involving local_bh_enable()
  refscale: Add non-atomic per-CPU increment readers
  refscale: Add this_cpu_inc() readers
  refscale: Add preempt_disable() readers
  refscale: Add local_bh_disable() readers
  refscale: Add local_irq_disable() and local_irq_save() readers
  torture: Permit negative kvm.sh --kconfig numeric arguments
  srcu: Add SRCU_READ_FLAVOR_FAST_UPDOWN CPP macro
  rcu: Mark diagnostic functions as notrace
  rcutorture: Make TREE04 use CONFIG_RCU_DYNTICKS_TORTURE
  rcutorture: Remove redundant rcutorture_one_extend() from rcu_torture_one_read()
  rcutorture: Permit kvm-again.sh to re-use the build directory
  torture: Add kvm-series.sh to test commit/scenario combination
  rcu: use WRITE_ONCE() for ->next and ->pprev of hlist_nulls
  locktorture: Fix memory leak in param_set_cpumask()
  doc: Update for SRCU-fast definitions and initialization
  ...
parents b687034b 9a08942f
+17 −16
@@ -2637,15 +2637,16 @@ synchronize_srcu() for some other domain ``ss1``, and if an
that was held across as ``ss``-domain synchronize_srcu(), deadlock
would again be possible. Such a deadlock cycle could extend across an
arbitrarily large number of different SRCU domains. Again, with great
-power comes great responsibility.
+power comes great responsibility, though lockdep is now able to detect
+this sort of deadlock.

-Unlike the other RCU flavors, SRCU read-side critical sections can run
-on idle and even offline CPUs. This ability requires that
-srcu_read_lock() and srcu_read_unlock() contain memory barriers,
-which means that SRCU readers will run a bit slower than would RCU
-readers. It also motivates the smp_mb__after_srcu_read_unlock() API,
-which, in combination with srcu_read_unlock(), guarantees a full
-memory barrier.
+Unlike the other RCU flavors, SRCU read-side critical sections can run on
+idle and even offline CPUs, with the exception of srcu_read_lock_fast()
+and friends.  This ability requires that srcu_read_lock() and
+srcu_read_unlock() contain memory barriers, which means that SRCU
+readers will run a bit slower than would RCU readers. It also motivates
+the smp_mb__after_srcu_read_unlock() API, which, in combination with
+srcu_read_unlock(), guarantees a full memory barrier.

Also unlike other RCU flavors, synchronize_srcu() may **not** be
invoked from CPU-hotplug notifiers, due to the fact that SRCU grace
@@ -2681,15 +2682,15 @@ run some tests first. SRCU just might need a few adjustment to deal with
that sort of load. Of course, your mileage may vary based on the speed
of your CPUs and the size of your memory.

-The `SRCU
-API <https://lwn.net/Articles/609973/#RCU%20Per-Flavor%20API%20Table>`__
+The `SRCU API
+<https://lwn.net/Articles/609973/#RCU%20Per-Flavor%20API%20Table>`__
includes srcu_read_lock(), srcu_read_unlock(),
-srcu_dereference(), srcu_dereference_check(),
-synchronize_srcu(), synchronize_srcu_expedited(),
-call_srcu(), srcu_barrier(), and srcu_read_lock_held(). It
-also includes DEFINE_SRCU(), DEFINE_STATIC_SRCU(), and
-init_srcu_struct() APIs for defining and initializing
-``srcu_struct`` structures.
+srcu_dereference(), srcu_dereference_check(), synchronize_srcu(),
+synchronize_srcu_expedited(), call_srcu(), srcu_barrier(),
+and srcu_read_lock_held(). It also includes DEFINE_SRCU(),
+DEFINE_STATIC_SRCU(), DEFINE_SRCU_FAST(), DEFINE_STATIC_SRCU_FAST(),
+init_srcu_struct(), and init_srcu_struct_fast() APIs for defining and
+initializing ``srcu_struct`` structures.

More recently, the SRCU API has added polling interfaces:

+7 −5
@@ -417,11 +417,13 @@ over a rather long period of time, but improvements are always welcome!
	you should be using RCU rather than SRCU, because RCU is almost
	always faster and easier to use than is SRCU.

-	Also unlike other forms of RCU, explicit initialization and
-	cleanup is required either at build time via DEFINE_SRCU()
-	or DEFINE_STATIC_SRCU() or at runtime via init_srcu_struct()
-	and cleanup_srcu_struct().  These last two are passed a
-	"struct srcu_struct" that defines the scope of a given
+	Also unlike other forms of RCU, explicit initialization
+	and cleanup is required either at build time via
+	DEFINE_SRCU(), DEFINE_STATIC_SRCU(), DEFINE_SRCU_FAST(),
+	or DEFINE_STATIC_SRCU_FAST() or at runtime via either
+	init_srcu_struct() or init_srcu_struct_fast() and
+	cleanup_srcu_struct().	These last three are passed a
+	`struct srcu_struct` that defines the scope of a given
	SRCU domain.  Once initialized, the srcu_struct is passed
	to srcu_read_lock(), srcu_read_unlock() synchronize_srcu(),
	synchronize_srcu_expedited(), and call_srcu().	A given
+3 −0
@@ -1227,7 +1227,10 @@ SRCU: Initialization/cleanup/ordering::

	DEFINE_SRCU
	DEFINE_STATIC_SRCU
+	DEFINE_SRCU_FAST        // for srcu_read_lock_fast() and friends
+	DEFINE_STATIC_SRCU_FAST // for srcu_read_lock_fast() and friends
	init_srcu_struct
+	init_srcu_struct_fast
	cleanup_srcu_struct
	smp_mb__after_srcu_read_unlock

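The reasoning behind DEFINE_SRCU_FAST() and init_srcu_struct_fast() (recording the reader flavour at initialization instead of on the first read-side call) can be sketched with a toy userspace model. Everything in it is hypothetical: `struct toy_srcu`, the `TOY_*` values and the functions are illustrative names, not the kernel's SRCU_READ_FLAVOR_* machinery. In this toy, a writer that has not yet seen any reader falls back to plain full ordering, and the window where a first reader races with an in-flight grace period is deliberately not modeled.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical reader flavours (not the kernel's SRCU_READ_FLAVOR_* values). */
enum toy_flavor {
	TOY_FLAVOR_UNKNOWN = 0,	/* no reader has announced itself yet */
	TOY_FLAVOR_NORMAL  = 1,	/* srcu_read_lock()-style readers (unused below) */
	TOY_FLAVOR_FAST    = 2,	/* srcu_read_lock_fast()-style readers */
};

/* The two write-side orderings named in the merge message. */
enum toy_ordering {
	TOY_ORDER_FULL,		/* "simply full ordering" */
	TOY_ORDER_RCU_GP,	/* full RCU grace-period ordering */
};

struct toy_srcu {
	atomic_int flavor;
};

/* Models init_srcu_struct(): the flavour stays unknown until a reader runs. */
static void toy_init(struct toy_srcu *s)
{
	atomic_init(&s->flavor, TOY_FLAVOR_UNKNOWN);
}

/* Models init_srcu_struct_fast(): the flavour is recorded up front. */
static void toy_init_fast(struct toy_srcu *s)
{
	atomic_init(&s->flavor, TOY_FLAVOR_FAST);
}

/*
 * Flavour check done on entry by a fast reader. Returns true when it
 * had to fall back to a read-modify-write to record the flavour.
 */
static bool toy_fast_reader_check(struct toy_srcu *s)
{
	int expected = TOY_FLAVOR_UNKNOWN;

	/* Flavour already known: a plain load is enough. */
	if (atomic_load_explicit(&s->flavor, memory_order_acquire) != TOY_FLAVOR_UNKNOWN)
		return false;
	/* First reader of a lazily initialized domain: costly RmW. */
	atomic_compare_exchange_strong(&s->flavor, &expected, TOY_FLAVOR_FAST);
	return true;
}

/* The write side picks its ordering from whatever flavour is recorded. */
static enum toy_ordering toy_writer_ordering(struct toy_srcu *s)
{
	int flavor = atomic_load(&s->flavor);

	assert(flavor >= TOY_FLAVOR_UNKNOWN && flavor <= TOY_FLAVOR_FAST);
	return flavor == TOY_FLAVOR_FAST ? TOY_ORDER_RCU_GP : TOY_ORDER_FULL;
}
```

In this model the eager initializer lets the writer choose the heavier ordering before any reader has run, and turns the reader's flavour check into a plain load from the very first call; the lazily initialized domain pays one RmW on its first fast reader, which is the first-call cost the merge message says the new initializers remove.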
+37 −7
@@ -18,12 +18,6 @@ enum ctx_state {
	CT_STATE_MAX		= 4,
};

-/* Odd value for watching, else even. */
-#define CT_RCU_WATCHING CT_STATE_MAX
-
-#define CT_STATE_MASK (CT_STATE_MAX - 1)
-#define CT_RCU_WATCHING_MASK (~CT_STATE_MASK)
-
struct context_tracking {
#ifdef CONFIG_CONTEXT_TRACKING_USER
	/*
@@ -44,9 +38,45 @@ struct context_tracking {
#endif
};

+/*
+ * We cram two different things within the same atomic variable:
+ *
+ *                     CT_RCU_WATCHING_START  CT_STATE_START
+ *                                |                |
+ *                                v                v
+ *     MSB [ RCU watching counter ][ context_state ] LSB
+ *         ^                       ^
+ *         |                       |
+ * CT_RCU_WATCHING_END        CT_STATE_END
+ *
+ * Bits are used from the LSB upwards, so unused bits (if any) will always be in
+ * upper bits of the variable.
+ */
#ifdef CONFIG_CONTEXT_TRACKING
+#define CT_SIZE (sizeof(((struct context_tracking *)0)->state) * BITS_PER_BYTE)
+
+#define CT_STATE_WIDTH bits_per(CT_STATE_MAX - 1)
+#define CT_STATE_START 0
+#define CT_STATE_END   (CT_STATE_START + CT_STATE_WIDTH - 1)
+
+#define CT_RCU_WATCHING_MAX_WIDTH (CT_SIZE - CT_STATE_WIDTH)
+#define CT_RCU_WATCHING_WIDTH     (IS_ENABLED(CONFIG_RCU_DYNTICKS_TORTURE) ? 2 : CT_RCU_WATCHING_MAX_WIDTH)
+#define CT_RCU_WATCHING_START     (CT_STATE_END + 1)
+#define CT_RCU_WATCHING_END       (CT_RCU_WATCHING_START + CT_RCU_WATCHING_WIDTH - 1)
+#define CT_RCU_WATCHING           BIT(CT_RCU_WATCHING_START)
+
+#define CT_STATE_MASK        GENMASK(CT_STATE_END,        CT_STATE_START)
+#define CT_RCU_WATCHING_MASK GENMASK(CT_RCU_WATCHING_END, CT_RCU_WATCHING_START)
+
+#define CT_UNUSED_WIDTH (CT_RCU_WATCHING_MAX_WIDTH - CT_RCU_WATCHING_WIDTH)
+
+static_assert(CT_STATE_WIDTH        +
+	      CT_RCU_WATCHING_WIDTH +
+	      CT_UNUSED_WIDTH       ==
+	      CT_SIZE);
+
DECLARE_PER_CPU(struct context_tracking, context_tracking);
-#endif
+#endif	/* CONFIG_CONTEXT_TRACKING */

#ifdef CONFIG_CONTEXT_TRACKING_USER
static __always_inline int __ct_state(void)
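To see what the new context-tracking macros evaluate to, their arithmetic can be reproduced in userspace. This sketch assumes ->state is a 32-bit atomic_t (the header derives CT_SIZE with sizeof(); here it is hardcoded to 32), uses local stand-ins `genmask32()` and `bits_per()` for the kernel's GENMASK() and bits_per(), and computes the values at runtime through a hypothetical `struct ct_layout` rather than preprocessor macros.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: ->state is a 32-bit atomic_t, so CT_SIZE is 32 here. */
#define CT_SIZE      32u
#define CT_STATE_MAX 4u		/* CT_STATE_MAX from enum ctx_state */

/* Local stand-in for GENMASK() on a 32-bit word; needs l <= h <= 31. */
static uint32_t genmask32(unsigned int h, unsigned int l)
{
	assert(l <= h && h <= 31);
	return (UINT32_MAX << l) & (UINT32_MAX >> (31 - h));
}

/* Local stand-in for bits_per(): number of bits needed to represent n. */
static unsigned int bits_per(uint32_t n)
{
	unsigned int bits = 1;

	while (n >>= 1)
		bits++;
	return bits;
}

/* Hypothetical runtime mirror of the CT_* layout macros. */
struct ct_layout {
	unsigned int state_width;	/* CT_STATE_WIDTH */
	unsigned int watching_start;	/* CT_RCU_WATCHING_START */
	unsigned int watching_width;	/* CT_RCU_WATCHING_WIDTH */
	unsigned int unused_width;	/* CT_UNUSED_WIDTH */
	uint32_t state_mask;		/* CT_STATE_MASK */
	uint32_t watching_mask;		/* CT_RCU_WATCHING_MASK */
	uint32_t watching;		/* CT_RCU_WATCHING */
};

/* dynticks_torture models CONFIG_RCU_DYNTICKS_TORTURE. */
static struct ct_layout ct_compute_layout(bool dynticks_torture)
{
	struct ct_layout l;
	unsigned int max_width;

	l.state_width = bits_per(CT_STATE_MAX - 1);
	max_width = CT_SIZE - l.state_width;	/* CT_RCU_WATCHING_MAX_WIDTH */
	l.watching_width = dynticks_torture ? 2 : max_width;
	l.watching_start = l.state_width;	/* CT_STATE_END + 1 with CT_STATE_START == 0 */
	l.unused_width = max_width - l.watching_width;
	l.state_mask = genmask32(l.state_width - 1, 0);
	l.watching_mask = genmask32(l.watching_start + l.watching_width - 1,
				    l.watching_start);
	l.watching = UINT32_C(1) << l.watching_start;
	return l;
}

/* Extract the RCU watching counter field from a raw state word. */
static uint32_t ct_watching_counter(const struct ct_layout *l, uint32_t state)
{
	return (state & l->watching_mask) >> l->watching_start;
}
```

Without CONFIG_RCU_DYNTICKS_TORTURE, the computed CT_RCU_WATCHING (0x4) and CT_RCU_WATCHING_MASK (0xfffffffc) match the removed definitions (CT_STATE_MAX and ~CT_STATE_MASK on a 32-bit word). With it, the counter occupies only bits 2 and 3, so four increments of CT_RCU_WATCHING bring the field back to zero; in this sketch the carry lands in the unused upper bits, which the mask ignores.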
+1 −1
@@ -109,7 +109,7 @@ extern void srcu_init_notifier_head(struct srcu_notifier_head *nh);
		.mutex = __MUTEX_INITIALIZER(name.mutex),	\
		.head = NULL,					\
		.srcuu = __SRCU_USAGE_INIT(name.srcuu),		\
-		.srcu = __SRCU_STRUCT_INIT(name.srcu, name.srcuu, pcpu), \
+		.srcu = __SRCU_STRUCT_INIT(name.srcu, name.srcuu, pcpu, 0), \
	}

#define ATOMIC_NOTIFIER_HEAD(name)				\