Commit 041aa7a8 authored by Mark Rutland, committed by Thomas Gleixner

entry: Split preemption from irqentry_exit_to_kernel_mode()

Some architecture-specific work needs to be performed between the state
management for exception entry/exit and the "real" work to handle the
exception. For example, arm64 needs to manipulate a number of exception
masking bits, with different exceptions requiring different masking.

Generally this can all be hidden in the architecture code, but for arm64
the current structure of irqentry_exit_to_kernel_mode() makes this
particularly difficult to handle in a way that is correct, maintainable,
and efficient.

The gory details are described in the thread surrounding:

  https://lore.kernel.org/lkml/acPAzdtjK5w-rNqC@J2N7QTR9R3/

The summary is:

* Currently, irqentry_exit_to_kernel_mode() handles both involuntary
  preemption AND state management necessary for exception return.

* When scheduling (including involuntary preemption), arm64 needs to
  have all arm64-specific exceptions unmasked, though regular interrupts
  must be masked.

* Prior to the state management for exception return, arm64 needs to
  mask a number of arm64-specific exceptions, and perform some work with
  these exceptions masked (with RCU watching, etc).

While in theory it is possible to handle this with a new arch_*() hook
called somewhere under irqentry_exit_to_kernel_mode(), this is fragile
and complicated, and doesn't match the flow used for exception return to
user mode, which has a separate 'prepare' step (where preemption can
occur) prior to the state management.

To solve this, refactor irqentry_exit_to_kernel_mode() to match the
style of {irqentry,syscall}_exit_to_user_mode(), moving preemption logic
into a new irqentry_exit_to_kernel_mode_preempt() function, and moving
state management into a new irqentry_exit_to_kernel_mode_after_preempt()
function. The existing irqentry_exit_to_kernel_mode() is left as a
caller of both of these, avoiding the need to modify existing callers.
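
As a rough illustration of the intended flow (not the actual arm64 code),
the following userspace C model stubs out the two new functions and shows
an architecture doing its own exception masking between them. The pt_regs
and irqentry_state_t definitions are reduced stand-ins, the stub bodies
only record that they ran, and arch_mask_exceptions() and
arch_exit_to_kernel_mode() are hypothetical names that do not appear in
this patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Reduced userspace stand-ins for the kernel types (assumption). */
struct pt_regs { bool irqs_disabled; };
typedef struct { bool exit_rcu; } irqentry_state_t;

/* Trace of the steps taken, so the ordering can be inspected. */
enum step { STEP_PREEMPT, STEP_ARCH_MASK, STEP_STATE };
#define TRACE_MAX 8
static enum step trace[TRACE_MAX];
static size_t trace_len;

static void record(enum step s)
{
	assert(trace_len < TRACE_MAX);
	trace[trace_len++] = s;
}

/*
 * Stub: mirrors the early return of the real function, then records
 * that the preemption point was reached instead of rescheduling.
 */
static void irqentry_exit_to_kernel_mode_preempt(struct pt_regs *regs,
						 irqentry_state_t state)
{
	if (regs->irqs_disabled || state.exit_rcu)
		return;
	record(STEP_PREEMPT);
}

/* Stub: stands in for the tracing, lockdep and RCU state transitions. */
static void irqentry_exit_to_kernel_mode_after_preempt(struct pt_regs *regs,
						       irqentry_state_t state)
{
	(void)regs;
	(void)state;
	record(STEP_STATE);
}

/* Hypothetical: architecture-specific exception masking. */
static void arch_mask_exceptions(void)
{
	record(STEP_ARCH_MASK);
}

/*
 * Hypothetical architecture exit path: preempt while exceptions are
 * still unmasked, mask them, then perform the state management.
 */
static void arch_exit_to_kernel_mode(struct pt_regs *regs,
				     irqentry_state_t state)
{
	irqentry_exit_to_kernel_mode_preempt(regs, state);
	arch_mask_exceptions();
	irqentry_exit_to_kernel_mode_after_preempt(regs, state);
}
```

In this model, calling arch_exit_to_kernel_mode() for a context that had
interrupts enabled records the preemption step, then the masking step,
then the state management. With regs->irqs_disabled set, the preemption
step is skipped, mirroring the early return in
irqentry_exit_to_kernel_mode_preempt().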

There should be no functional change as a result of this change.

[ tglx: Updated kernel doc ]

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Reviewed-by: Jinjie Ruan <ruanjinjie@huawei.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20260407131650.3813777-6-mark.rutland@arm.com
parent c5538d01
+59 −14
@@ -438,24 +438,46 @@ static __always_inline irqentry_state_t irqentry_enter_from_kernel_mode(struct p
}

/**
 * irqentry_exit_to_kernel_mode - Run preempt checks and establish state after
 *				  invoking the interrupt handler
 * irqentry_exit_to_kernel_mode_preempt - Run preempt checks on return to kernel mode
 * @regs:	Pointer to current's pt_regs
 * @state:	Return value from matching call to irqentry_enter_from_kernel_mode()
 *
 * This is the counterpart of irqentry_enter_from_kernel_mode() and runs the
 * necessary preemption check if possible and required. It returns to the caller
 * with interrupts disabled and the correct state vs. tracing, lockdep and RCU
 * required to return to the interrupted context.
 * This is to be invoked before irqentry_exit_to_kernel_mode_after_preempt() to
 * allow kernel preemption on return from interrupt.
 *
 * Must be invoked with interrupts disabled and CPU state which allows kernel
 * preemption.
 *
 * It is the last action before returning to the low level ASM code which just
 * needs to return.
 * After returning from this function, the caller can modify CPU state before
 * invoking irqentry_exit_to_kernel_mode_after_preempt(), which is required to
 * re-establish the tracing, lockdep and RCU state for returning to the
 * interrupted context.
 */
static __always_inline void irqentry_exit_to_kernel_mode(struct pt_regs *regs,
static inline void irqentry_exit_to_kernel_mode_preempt(struct pt_regs *regs,
							irqentry_state_t state)
{
	lockdep_assert_irqs_disabled();
	if (regs_irqs_disabled(regs) || state.exit_rcu)
		return;

	if (IS_ENABLED(CONFIG_PREEMPTION))
		irqentry_exit_cond_resched();
}

/**
 * irqentry_exit_to_kernel_mode_after_preempt - Establish trace, lockdep and RCU state
 * @regs:	Pointer to current's pt_regs
 * @state:	Return value from matching call to irqentry_enter_from_kernel_mode()
 *
 * This is to be invoked after irqentry_exit_to_kernel_mode_preempt() and before
 * actually returning to the interrupted context.
 *
 * There are no requirements for the CPU state other than being able to complete
 * the tracing, lockdep and RCU state transitions. After this function returns
 * the caller must return directly to the interrupted context.
 */
static __always_inline void
irqentry_exit_to_kernel_mode_after_preempt(struct pt_regs *regs, irqentry_state_t state)
{
	if (!regs_irqs_disabled(regs)) {
		/*
		 * If RCU was not watching on entry this needs to be done
@@ -474,9 +496,6 @@ static __always_inline void irqentry_exit_to_kernel_mode(struct pt_regs *regs,
		}

		instrumentation_begin();
		if (IS_ENABLED(CONFIG_PREEMPTION))
			irqentry_exit_cond_resched();

		/* Covers both tracing and lockdep */
		trace_hardirqs_on();
		instrumentation_end();
@@ -490,6 +509,32 @@ static __always_inline void irqentry_exit_to_kernel_mode(struct pt_regs *regs,
	}
}

/**
 * irqentry_exit_to_kernel_mode - Run preempt checks and establish state after
 *				  invoking the interrupt handler
 * @regs:	Pointer to current's pt_regs
 * @state:	Return value from matching call to irqentry_enter_from_kernel_mode()
 *
 * This is the counterpart of irqentry_enter_from_kernel_mode() and combines
 * the calls to irqentry_exit_to_kernel_mode_preempt() and
 * irqentry_exit_to_kernel_mode_after_preempt().
 *
 * The requirement for the CPU state is that it can schedule. After the function
 * returns the tracing, lockdep and RCU state transitions are completed and the
 * caller must return directly to the interrupted context.
 */
static __always_inline void irqentry_exit_to_kernel_mode(struct pt_regs *regs,
							 irqentry_state_t state)
{
	lockdep_assert_irqs_disabled();

	instrumentation_begin();
	irqentry_exit_to_kernel_mode_preempt(regs, state);
	instrumentation_end();

	irqentry_exit_to_kernel_mode_after_preempt(regs, state);
}

/**
 * irqentry_enter - Handle state tracking on ordinary interrupt entries
 * @regs:	Pointer to pt_regs of interrupted context