Commit 0a096ab7 authored by Kevin Brodsky, committed by Andrew Morton

mm: introduce generic lazy_mmu helpers

The implementation of the lazy MMU mode is currently entirely
arch-specific; core code directly calls arch helpers:
arch_{enter,leave}_lazy_mmu_mode().

We are about to introduce support for nested lazy MMU sections.  As things
stand we'd have to duplicate that logic in every arch implementing
lazy_mmu - adding to a fair amount of logic already duplicated across
lazy_mmu implementations.

This patch therefore introduces a new generic layer that calls the
existing arch_* helpers. Two pairs of calls are introduced:

* lazy_mmu_mode_enable() ... lazy_mmu_mode_disable()
    This is the standard case where the mode is enabled for a given
    block of code by surrounding it with enable() and disable()
    calls.

* lazy_mmu_mode_pause() ... lazy_mmu_mode_resume()
    This is for situations where the mode is temporarily disabled
    by first calling pause() and then resume() (e.g. to prevent any
    batching from occurring in a critical section).

The documentation in <linux/pgtable.h> will be updated in a subsequent
patch.

No functional change should be introduced at this stage.  The
implementation of enable()/resume() and disable()/pause() is currently
identical, but nesting support will change that.
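The layering described above can be sketched in user space as follows. This is only an illustration under stated assumptions, not the kernel implementation: the arch helpers are replaced by stubs that append to a hypothetical `call_log` buffer, and `demo_lazy_mmu_section()` is an invented caller. It shows that, at this stage, enable()/resume() and disable()/pause() forward to the same arch helper.

```c
#include <string.h>

/* Hypothetical log of arch calls, used only by this demo. */
static char call_log[64];

/* Stubs standing in for the real arch helpers (assumption). */
static void arch_enter_lazy_mmu_mode(void) { strcat(call_log, "E"); }
static void arch_leave_lazy_mmu_mode(void) { strcat(call_log, "L"); }

/* Generic layer: each helper forwards to one existing arch helper. */
static inline void lazy_mmu_mode_enable(void)  { arch_enter_lazy_mmu_mode(); }
static inline void lazy_mmu_mode_disable(void) { arch_leave_lazy_mmu_mode(); }
static inline void lazy_mmu_mode_pause(void)   { arch_leave_lazy_mmu_mode(); }
static inline void lazy_mmu_mode_resume(void)  { arch_enter_lazy_mmu_mode(); }

/* Hypothetical caller: a lazy MMU section with a paused region inside. */
static const char *demo_lazy_mmu_section(void)
{
	call_log[0] = '\0';

	lazy_mmu_mode_enable();
	/* ... page table updates that may be batched ... */
	lazy_mmu_mode_pause();
	/* ... critical section where batching must not occur ... */
	lazy_mmu_mode_resume();
	/* ... more updates that may be batched ... */
	lazy_mmu_mode_disable();

	return call_log;
}
```

Calling `demo_lazy_mmu_section()` records the sequence enter, leave, enter, leave, which is what the converted call sites produced before the conversion as well.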

Most of the call sites have been updated using the following Coccinelle
script:

@@
@@
{
...
- arch_enter_lazy_mmu_mode();
+ lazy_mmu_mode_enable();
...
- arch_leave_lazy_mmu_mode();
+ lazy_mmu_mode_disable();
...
}

@@
@@
{
...
- arch_leave_lazy_mmu_mode();
+ lazy_mmu_mode_pause();
...
- arch_enter_lazy_mmu_mode();
+ lazy_mmu_mode_resume();
...
}

A couple of notes regarding x86:

* Xen is currently the only case where explicit handling is required
  for lazy MMU when context-switching. This is purely an
  implementation detail and using the generic lazy_mmu_mode_*
  functions would cause trouble when nesting support is introduced,
  because the generic functions must be called from the current task.
  For that reason we still use arch_leave() and arch_enter() there.

* x86 calls arch_flush_lazy_mmu_mode() unconditionally in a few
  places, but only defines it if PARAVIRT_XXL is selected, and we
  are removing the fallback in <linux/pgtable.h>. Add a new fallback
  definition to <asm/pgtable.h> to keep things building.
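The fallback pattern mentioned in the second note can be sketched as below. This is a stand-alone illustration, assuming a made-up `DEMO_PARAVIRT_XXL` macro in place of the real config option and a hypothetical `flush_count` counter; it only shows why an empty inline definition keeps unconditional callers building.

```c
/* Hypothetical counter, only so the demo has something observable. */
static int flush_count;

#ifdef DEMO_PARAVIRT_XXL
/* Stand-in for the paravirt-provided definition (assumption). */
static inline void arch_flush_lazy_mmu_mode(void) { flush_count++; }
#else
/* Fallback: an empty definition so callers still compile. */
static inline void arch_flush_lazy_mmu_mode(void) {}
#endif

/* Hypothetical caller that flushes unconditionally. */
static int demo_unconditional_flush(void)
{
	arch_flush_lazy_mmu_mode();
	return flush_count;
}
```

Built without `DEMO_PARAVIRT_XXL`, the call compiles to nothing and the counter is left untouched.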

Link: https://lkml.kernel.org/r/20251215150323.2218608-8-kevin.brodsky@arm.com


Signed-off-by: Kevin Brodsky <kevin.brodsky@arm.com>
Acked-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Reviewed-by: Yeoreum Yun <yeoreum.yun@arm.com>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Andreas Larsson <andreas@gaisler.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: David Hildenbrand (Red Hat) <david@kernel.org>
Cc: David S. Miller <davem@davemloft.net>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jann Horn <jannh@google.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Venkat Rao Bagalkote <venkat88@linux.ibm.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
parent 7303ecbf
+4 −4
@@ -800,7 +800,7 @@ int split_kernel_leaf_mapping(unsigned long start, unsigned long end)
		return -EINVAL;

	mutex_lock(&pgtable_split_lock);
-	arch_enter_lazy_mmu_mode();
+	lazy_mmu_mode_enable();

	/*
	 * The split_kernel_leaf_mapping_locked() may sleep, it is not a
@@ -822,7 +822,7 @@ int split_kernel_leaf_mapping(unsigned long start, unsigned long end)
			ret = split_kernel_leaf_mapping_locked(end);
	}

-	arch_leave_lazy_mmu_mode();
+	lazy_mmu_mode_disable();
	mutex_unlock(&pgtable_split_lock);
	return ret;
}
@@ -883,10 +883,10 @@ static int range_split_to_ptes(unsigned long start, unsigned long end, gfp_t gfp
{
	int ret;

-	arch_enter_lazy_mmu_mode();
+	lazy_mmu_mode_enable();
	ret = walk_kernel_page_table_range_lockless(start, end,
					&split_to_ptes_ops, NULL, &gfp);
-	arch_leave_lazy_mmu_mode();
+	lazy_mmu_mode_disable();

	return ret;
}
+2 −2
@@ -110,7 +110,7 @@ static int update_range_prot(unsigned long start, unsigned long size,
	if (WARN_ON_ONCE(ret))
		return ret;

-	arch_enter_lazy_mmu_mode();
+	lazy_mmu_mode_enable();

	/*
	 * The caller must ensure that the range we are operating on does not
@@ -119,7 +119,7 @@ static int update_range_prot(unsigned long start, unsigned long size,
	 */
	ret = walk_kernel_page_table_range_lockless(start, start + size,
						    &pageattr_ops, NULL, &data);
-	arch_leave_lazy_mmu_mode();
+	lazy_mmu_mode_disable();

	return ret;
}
+4 −4
@@ -205,7 +205,7 @@ void __flush_hash_table_range(unsigned long start, unsigned long end)
	 * way to do things but is fine for our needs here.
	 */
	local_irq_save(flags);
-	arch_enter_lazy_mmu_mode();
+	lazy_mmu_mode_enable();
	for (; start < end; start += PAGE_SIZE) {
		pte_t *ptep = find_init_mm_pte(start, &hugepage_shift);
		unsigned long pte;
@@ -217,7 +217,7 @@ void __flush_hash_table_range(unsigned long start, unsigned long end)
			continue;
		hpte_need_flush(&init_mm, start, ptep, pte, hugepage_shift);
	}
-	arch_leave_lazy_mmu_mode();
+	lazy_mmu_mode_disable();
	local_irq_restore(flags);
}

@@ -237,7 +237,7 @@ void flush_hash_table_pmd_range(struct mm_struct *mm, pmd_t *pmd, unsigned long
	 * way to do things but is fine for our needs here.
	 */
	local_irq_save(flags);
-	arch_enter_lazy_mmu_mode();
+	lazy_mmu_mode_enable();
	start_pte = pte_offset_map(pmd, addr);
	if (!start_pte)
		goto out;
@@ -249,6 +249,6 @@ void flush_hash_table_pmd_range(struct mm_struct *mm, pmd_t *pmd, unsigned long
	}
	pte_unmap(start_pte);
out:
-	arch_leave_lazy_mmu_mode();
+	lazy_mmu_mode_disable();
	local_irq_restore(flags);
}
+2 −2
@@ -73,13 +73,13 @@ static void hpte_flush_range(struct mm_struct *mm, unsigned long addr,
	pte = pte_offset_map_lock(mm, pmd, addr, &ptl);
	if (!pte)
		return;
-	arch_enter_lazy_mmu_mode();
+	lazy_mmu_mode_enable();
	for (; npages > 0; --npages) {
		pte_update(mm, addr, pte, 0, 0, 0);
		addr += PAGE_SIZE;
		++pte;
	}
-	arch_leave_lazy_mmu_mode();
+	lazy_mmu_mode_disable();
	pte_unmap_unlock(pte - 1, ptl);
}

+1 −0
@@ -118,6 +118,7 @@ extern pmdval_t early_pmd_flags;
#define __pte(x)	native_make_pte(x)

#define arch_end_context_switch(prev)	do {} while(0)
+static inline void arch_flush_lazy_mmu_mode(void) {}
#endif	/* CONFIG_PARAVIRT_XXL */

static inline pmd_t pmd_set_flags(pmd_t pmd, pmdval_t set)