linux-cryptodev-2.6

mirror of https://git.kernel.org/pub/scm/linux/kernel/git/herbert/cryptodev-2.6.git synced 2026-04-18 03:23:53 -04:00

Author	SHA1	Message	Date
Rohan McLure	d79f9c9cf7	mm: provide address parameter to p{te,md,ud}_user_accessible_page() On several powerpc platforms, a page table entry may not imply whether the relevant mapping is for userspace or kernelspace. Instead, such platforms infer this by the address which is being accessed. Add an additional address argument to each of these routines in order to provide support for page table check on powerpc. [ajd@linux.ibm.com: rebase on arm64 changes] Link: https://lkml.kernel.org/r/20251219-pgtable_check_v18rebase-v18-9-755bc151a50b@linux.ibm.com Signed-off-by: Rohan McLure <rmclure@linux.ibm.com> Signed-off-by: Andrew Donnellan <ajd@linux.ibm.com> Reviewed-by: Pasha Tatashin <pasha.tatashin@soleen.com> Acked-by: Ingo Molnar <mingo@kernel.org> # x86 Acked-by: Alexandre Ghiti <alexghiti@rivosinc.com> # riscv Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Alexandre Ghiti <alex@ghiti.fr> Cc: Alistair Popple <apopple@nvidia.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: "Christophe Leroy (CS GROUP)" <chleroy@kernel.org> Cc: David Hildenbrand <david@kernel.org> Cc: Donet Tom <donettom@linux.ibm.com> Cc: Guo Weikang <guoweikang.kernel@gmail.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Kevin Brodsky <kevin.brodsky@arm.com> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Magnus Lindholm <linmag7@gmail.com> Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Nicholas Miehlbradt <nicholas@linux.ibm.com> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Paul Mackerras <paulus@ozlabs.org> Cc: Qi Zheng <zhengqi.arch@bytedance.com> Cc: "Ritesh Harjani (IBM)" <ritesh.list@gmail.com> Cc: Sweet Tea Dorminy <sweettea-kernel@dorminy.me> Cc: Thomas Huth <thuth@redhat.com> Cc: "Vishal Moola (Oracle)" <vishal.moola@gmail.com> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-01-26 20:02:35 -08:00
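The difference between encoding user/kernel ownership in the entry and inferring it from the address can be sketched in plain user-space C. This is only an illustrative model, not the kernel implementation: `model_pte_t`, the `MODEL_*` constants, the address split, and both helper names are hypothetical and chosen for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical constants for illustration only; real layouts are arch-specific. */
#define MODEL_KERNEL_START 0xc000000000000000ULL /* assumed user/kernel split */
#define MODEL_PTE_PRESENT  0x1ULL
#define MODEL_PTE_USER     0x2ULL

typedef struct { uint64_t val; } model_pte_t;

/* Style A: the entry itself carries a "user" bit, so addr is not needed. */
bool model_pte_user_accessible_by_bit(model_pte_t pte, uint64_t addr)
{
	(void)addr; /* unused on platforms that encode ownership in the entry */
	return (pte.val & MODEL_PTE_PRESENT) && (pte.val & MODEL_PTE_USER);
}

/* Style B: the entry has no "user" bit, so ownership is inferred from addr. */
bool model_pte_user_accessible_by_addr(model_pte_t pte, uint64_t addr)
{
	return (pte.val & MODEL_PTE_PRESENT) && addr < MODEL_KERNEL_START;
}
```

Both helpers take the same two arguments, which is why a generic caller has to pass the address even on platforms that ignore it.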
Rohan McLure	d7b4b67eb6	mm/page_table_check: reinstate address parameter in [__]page_table_check_pte_clear() This reverts commit `aa232204c4` ("mm/page_table_check: remove unused parameter in [__]page_table_check_pte_clear"). Reinstate previously unused parameters for the purpose of supporting powerpc platforms, as many do not encode user/kernel ownership of the page in the pte, but instead in the address of the access. [ajd@linux.ibm.com: rebase, fix additional occurrence and loop handling] Link: https://lkml.kernel.org/r/20251219-pgtable_check_v18rebase-v18-8-755bc151a50b@linux.ibm.com Signed-off-by: Rohan McLure <rmclure@linux.ibm.com> Signed-off-by: Andrew Donnellan <ajd@linux.ibm.com> Reviewed-by: Pasha Tatashin <pasha.tatashin@soleen.com> Acked-by: Ingo Molnar <mingo@kernel.org> # x86 Acked-by: Alexandre Ghiti <alexghiti@rivosinc.com> # riscv Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Alexandre Ghiti <alex@ghiti.fr> Cc: Alistair Popple <apopple@nvidia.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: "Christophe Leroy (CS GROUP)" <chleroy@kernel.org> Cc: David Hildenbrand <david@kernel.org> Cc: Donet Tom <donettom@linux.ibm.com> Cc: Guo Weikang <guoweikang.kernel@gmail.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Kevin Brodsky <kevin.brodsky@arm.com> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Magnus Lindholm <linmag7@gmail.com> Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Nicholas Miehlbradt <nicholas@linux.ibm.com> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Paul Mackerras <paulus@ozlabs.org> Cc: Qi Zheng <zhengqi.arch@bytedance.com> Cc: "Ritesh Harjani (IBM)" <ritesh.list@gmail.com> Cc: Sweet Tea Dorminy <sweettea-kernel@dorminy.me> Cc: Thomas Huth <thuth@redhat.com> Cc: "Vishal Moola (Oracle)" <vishal.moola@gmail.com> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-01-26 20:02:35 -08:00
Rohan McLure	649ec9e3d0	mm/page_table_check: reinstate address parameter in [__]page_table_check_pmd_clear() This reverts commit `1831414cd7` ("mm/page_table_check: remove unused parameter in [__]page_table_check_pmd_clear"). Reinstate previously unused parameters for the purpose of supporting powerpc platforms, as many do not encode user/kernel ownership of the page in the pte, but instead in the address of the access. [ajd@linux.ibm.com: rebase on arm64 changes] Link: https://lkml.kernel.org/r/20251219-pgtable_check_v18rebase-v18-7-755bc151a50b@linux.ibm.com Signed-off-by: Rohan McLure <rmclure@linux.ibm.com> Signed-off-by: Andrew Donnellan <ajd@linux.ibm.com> Reviewed-by: Pasha Tatashin <pasha.tatashin@soleen.com> Acked-by: Ingo Molnar <mingo@kernel.org> # x86 Acked-by: Alexandre Ghiti <alexghiti@rivosinc.com> # riscv Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Alexandre Ghiti <alex@ghiti.fr> Cc: Alistair Popple <apopple@nvidia.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: "Christophe Leroy (CS GROUP)" <chleroy@kernel.org> Cc: David Hildenbrand <david@kernel.org> Cc: Donet Tom <donettom@linux.ibm.com> Cc: Guo Weikang <guoweikang.kernel@gmail.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Kevin Brodsky <kevin.brodsky@arm.com> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Magnus Lindholm <linmag7@gmail.com> Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Nicholas Miehlbradt <nicholas@linux.ibm.com> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Paul Mackerras <paulus@ozlabs.org> Cc: Qi Zheng <zhengqi.arch@bytedance.com> Cc: "Ritesh Harjani (IBM)" <ritesh.list@gmail.com> Cc: Sweet Tea Dorminy <sweettea-kernel@dorminy.me> Cc: Thomas Huth <thuth@redhat.com> Cc: "Vishal Moola (Oracle)" <vishal.moola@gmail.com> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-01-26 20:02:35 -08:00
Rohan McLure	2e6ac078ce	mm/page_table_check: reinstate address parameter in [__]page_table_check_pud_clear() This reverts commit `931c38e164` ("mm/page_table_check: remove unused parameter in [__]page_table_check_pud_clear"). Reinstate previously unused parameters for the purpose of supporting powerpc platforms, as many do not encode user/kernel ownership of the page in the pte, but instead in the address of the access. [ajd@linux.ibm.com: rebase on arm64 changes] Link: https://lkml.kernel.org/r/20251219-pgtable_check_v18rebase-v18-6-755bc151a50b@linux.ibm.com Signed-off-by: Rohan McLure <rmclure@linux.ibm.com> Signed-off-by: Andrew Donnellan <ajd@linux.ibm.com> Reviewed-by: Pasha Tatashin <pasha.tatashin@soleen.com> Acked-by: Ingo Molnar <mingo@kernel.org> # x86 Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Alexandre Ghiti <alex@ghiti.fr> Cc: Alexandre Ghiti <alexghiti@rivosinc.com> Cc: Alistair Popple <apopple@nvidia.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: "Christophe Leroy (CS GROUP)" <chleroy@kernel.org> Cc: David Hildenbrand <david@kernel.org> Cc: Donet Tom <donettom@linux.ibm.com> Cc: Guo Weikang <guoweikang.kernel@gmail.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Kevin Brodsky <kevin.brodsky@arm.com> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Magnus Lindholm <linmag7@gmail.com> Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Nicholas Miehlbradt <nicholas@linux.ibm.com> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Paul Mackerras <paulus@ozlabs.org> Cc: Qi Zheng <zhengqi.arch@bytedance.com> Cc: "Ritesh Harjani (IBM)" <ritesh.list@gmail.com> Cc: Sweet Tea Dorminy <sweettea-kernel@dorminy.me> Cc: Thomas Huth <thuth@redhat.com> Cc: "Vishal Moola (Oracle)" <vishal.moola@gmail.com> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-01-26 20:02:34 -08:00
Rohan McLure	0a5ae44831	mm/page_table_check: provide addr parameter to page_table_check_ptes_set() To provide support for powerpc platforms, provide an addr parameter to the __page_table_check_ptes_set() and page_table_check_ptes_set() routines. This parameter is needed on some powerpc platforms which do not encode whether a mapping is for user or kernel in the pte. On such platforms, this can be inferred from the addr parameter. [ajd@linux.ibm.com: rebase on arm64 + riscv changes, update commit message] Link: https://lkml.kernel.org/r/20251219-pgtable_check_v18rebase-v18-5-755bc151a50b@linux.ibm.com Signed-off-by: Rohan McLure <rmclure@linux.ibm.com> Reviewed-by: Pasha Tatashin <pasha.tatashin@soleen.com> Acked-by: Alexandre Ghiti <alexghiti@rivosinc.com> # riscv Signed-off-by: Andrew Donnellan <ajd@linux.ibm.com> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Alexandre Ghiti <alex@ghiti.fr> Cc: Alistair Popple <apopple@nvidia.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: "Christophe Leroy (CS GROUP)" <chleroy@kernel.org> Cc: David Hildenbrand <david@kernel.org> Cc: Donet Tom <donettom@linux.ibm.com> Cc: Guo Weikang <guoweikang.kernel@gmail.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Kevin Brodsky <kevin.brodsky@arm.com> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Magnus Lindholm <linmag7@gmail.com> Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Nicholas Miehlbradt <nicholas@linux.ibm.com> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Paul Mackerras <paulus@ozlabs.org> Cc: Qi Zheng <zhengqi.arch@bytedance.com> Cc: "Ritesh Harjani (IBM)" <ritesh.list@gmail.com> Cc: Sweet Tea Dorminy <sweettea-kernel@dorminy.me> Cc: Thomas Huth <thuth@redhat.com> Cc: "Vishal Moola (Oracle)" <vishal.moola@gmail.com> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-01-26 20:02:34 -08:00
Rohan McLure	6e2d8f9fc4	mm/page_table_check: reinstate address parameter in [__]page_table_check_pmd[s]_set() This reverts commit `a3b837130b` ("mm/page_table_check: remove unused parameter in [__]page_table_check_pmd_set"). Reinstate previously unused parameters for the purpose of supporting powerpc platforms, as many do not encode user/kernel ownership of the page in the pte, but instead in the address of the access. Apply this to __page_table_check_pmds_set(), page_table_check_pmds_set(), and the page_table_check_pmd_set() wrapper macro. [ajd@linux.ibm.com: rebase on arm64 + riscv changes, update commit message] Link: https://lkml.kernel.org/r/20251219-pgtable_check_v18rebase-v18-4-755bc151a50b@linux.ibm.com Signed-off-by: Rohan McLure <rmclure@linux.ibm.com> Signed-off-by: Andrew Donnellan <ajd@linux.ibm.com> Reviewed-by: Pasha Tatashin <pasha.tatashin@soleen.com> Acked-by: Ingo Molnar <mingo@kernel.org> # x86 Acked-by: Alexandre Ghiti <alexghiti@rivosinc.com> # riscv Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Alexandre Ghiti <alex@ghiti.fr> Cc: Alistair Popple <apopple@nvidia.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: "Christophe Leroy (CS GROUP)" <chleroy@kernel.org> Cc: David Hildenbrand <david@kernel.org> Cc: Donet Tom <donettom@linux.ibm.com> Cc: Guo Weikang <guoweikang.kernel@gmail.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Kevin Brodsky <kevin.brodsky@arm.com> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Magnus Lindholm <linmag7@gmail.com> Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Nicholas Miehlbradt <nicholas@linux.ibm.com> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Paul Mackerras <paulus@ozlabs.org> Cc: Qi Zheng <zhengqi.arch@bytedance.com> Cc: "Ritesh Harjani (IBM)" <ritesh.list@gmail.com> Cc: Sweet Tea Dorminy <sweettea-kernel@dorminy.me> Cc: Thomas Huth <thuth@redhat.com> Cc: "Vishal Moola (Oracle)" <vishal.moola@gmail.com> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-01-26 20:02:34 -08:00
Rohan McLure	c4a0c5ff85	mm/page_table_check: reinstate address parameter in [__]page_table_check_pud[s]_set() This reverts commit `6d144436d9` ("mm/page_table_check: remove unused parameter in [__]page_table_check_pud_set"). Reinstate previously unused parameters for the purpose of supporting powerpc platforms, as many do not encode user/kernel ownership of the page in the pte, but instead in the address of the access. Apply this to __page_table_check_puds_set(), page_table_check_puds_set() and the page_table_check_pud_set() wrapper macro. [ajd@linux.ibm.com: rebase on riscv + arm64 changes, update commit message] Link: https://lkml.kernel.org/r/20251219-pgtable_check_v18rebase-v18-3-755bc151a50b@linux.ibm.com Signed-off-by: Rohan McLure <rmclure@linux.ibm.com> Signed-off-by: Andrew Donnellan <ajd@linux.ibm.com> Reviewed-by: Pasha Tatashin <pasha.tatashin@soleen.com> Acked-by: Ingo Molnar <mingo@kernel.org> # x86 Acked-by: Alexandre Ghiti <alexghiti@rivosinc.com> # riscv Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Alexandre Ghiti <alex@ghiti.fr> Cc: Alistair Popple <apopple@nvidia.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: "Christophe Leroy (CS GROUP)" <chleroy@kernel.org> Cc: David Hildenbrand <david@kernel.org> Cc: Donet Tom <donettom@linux.ibm.com> Cc: Guo Weikang <guoweikang.kernel@gmail.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Kevin Brodsky <kevin.brodsky@arm.com> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Magnus Lindholm <linmag7@gmail.com> Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Nicholas Miehlbradt <nicholas@linux.ibm.com> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Paul Mackerras <paulus@ozlabs.org> Cc: Qi Zheng <zhengqi.arch@bytedance.com> Cc: "Ritesh Harjani (IBM)" <ritesh.list@gmail.com> Cc: Sweet Tea Dorminy <sweettea-kernel@dorminy.me> Cc: Thomas Huth <thuth@redhat.com> Cc: "Vishal Moola (Oracle)" <vishal.moola@gmail.com> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-01-26 20:02:33 -08:00
Andrew Donnellan	ee329c29fd	arm64/mm: add addr parameter to __ptep_get_and_clear_anysz() To provide support for page table check on powerpc, we need to reinstate the address parameter in several functions, including page_table_check_{pte,pmd,pud}_clear(). In preparation for this, add the addr parameter to arm64's __ptep_get_and_clear_anysz() and change its callsites accordingly. Link: https://lkml.kernel.org/r/20251219-pgtable_check_v18rebase-v18-2-755bc151a50b@linux.ibm.com Signed-off-by: Andrew Donnellan <ajd@linux.ibm.com> Reviewed-by: Pasha Tatashin <pasha.tatashin@soleen.com> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Alexandre Ghiti <alex@ghiti.fr> Cc: Alexandre Ghiti <alexghiti@rivosinc.com> Cc: Alistair Popple <apopple@nvidia.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: "Christophe Leroy (CS GROUP)" <chleroy@kernel.org> Cc: David Hildenbrand <david@kernel.org> Cc: Donet Tom <donettom@linux.ibm.com> Cc: Guo Weikang <guoweikang.kernel@gmail.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Kevin Brodsky <kevin.brodsky@arm.com> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Magnus Lindholm <linmag7@gmail.com> Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Nicholas Miehlbradt <nicholas@linux.ibm.com> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Paul Mackerras <paulus@ozlabs.org> Cc: Qi Zheng <zhengqi.arch@bytedance.com> Cc: "Ritesh Harjani (IBM)" <ritesh.list@gmail.com> Cc: Rohan McLure <rmclure@linux.ibm.com> Cc: Sweet Tea Dorminy <sweettea-kernel@dorminy.me> Cc: Thomas Huth <thuth@redhat.com> Cc: "Vishal Moola (Oracle)" <vishal.moola@gmail.com> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-01-26 20:02:33 -08:00
Andrew Donnellan	9ac4941ace	arm64/mm: add addr parameter to __set_ptes_anysz() Patch series "Support page table check on PowerPC", v18. Support page table check on PowerPC. Page table check tracks the usage of page table entries at each level to ensure that anonymous mappings have at most one writable consumer, and likewise that file-backed mappings are not simultaneously also anonymous mappings. In order to support this infrastructure, a number of helpers or stubs must be defined or updated for all powerpc platforms. Additionally, we separate set_pte_at() and set_pte_at_unchecked(), to allow for internal, uninstrumented mappings. On some PowerPC platforms, implementing {pte,pmd,pud}_user_accessible_page() requires the address. We revert previous changes that removed the address parameter from various interfaces, and add it to some other interfaces, in order to allow this. For now, we don't allow page table check alongside HUGETLB_PAGE, due to the arch-specific complexity of set_huge_page_at(). (I'm sure I could figure this out, but I have to get this version on this list before I leave my job.) This series was initially written by Rohan McLure, who has left IBM and is no longer working on powerpc. This patch (of 18): To provide support for page table check on powerpc, we need to reinstate the address parameter in several functions, including page_table_check_{ptes,pmds,puds}_set(). In preparation for this, add the addr parameter to arm64's __set_ptes_anysz() and change its callsites accordingly. Link: https://lkml.kernel.org/r/20251219-pgtable_check_v18rebase-v18-0-755bc151a50b@linux.ibm.com Link: https://lkml.kernel.org/r/20251219-pgtable_check_v18rebase-v18-1-755bc151a50b@linux.ibm.com Signed-off-by: Andrew Donnellan <ajd@linux.ibm.com> Reviewed-by: Pasha Tatashin <pasha.tatashin@soleen.com> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Alexandre Ghiti <alex@ghiti.fr> Cc: Alistair Popple <apopple@nvidia.com> Cc: "Christophe Leroy (CS GROUP)" <chleroy@kernel.org> Cc: David Hildenbrand <david@kernel.org> Cc: Donet Tom <donettom@linux.ibm.com> Cc: Guo Weikang <guoweikang.kernel@gmail.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Kevin Brodsky <kevin.brodsky@arm.com> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Magnus Lindholm <linmag7@gmail.com> Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Nicholas Miehlbradt <nicholas@linux.ibm.com> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Paul Mackerras <paulus@ozlabs.org> Cc: Qi Zheng <zhengqi.arch@bytedance.com> Cc: "Ritesh Harjani (IBM)" <ritesh.list@gmail.com> Cc: Sweet Tea Dorminy <sweettea-kernel@dorminy.me> Cc: Thomas Huth <thuth@redhat.com> Cc: "Vishal Moola (Oracle)" <vishal.moola@gmail.com> Cc: Zi Yan <ziy@nvidia.com> Cc: Alexandre Ghiti <alexghiti@rivosinc.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Ingo Molnar <mingo@kernel.org> Cc: Rohan McLure <rmclure@linux.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-01-26 20:02:33 -08:00
Kevin Brodsky	4dd9b4d7a8	arm64: mm: replace TIF_LAZY_MMU with is_lazy_mmu_mode_active() The generic lazy_mmu layer now tracks whether a task is in lazy MMU mode. As a result we no longer need a TIF flag for that purpose - let's use the new is_lazy_mmu_mode_active() helper instead. The explicit check for in_interrupt() is no longer necessary either as is_lazy_mmu_mode_active() always returns false in interrupt context. Link: https://lkml.kernel.org/r/20251215150323.2218608-11-kevin.brodsky@arm.com Signed-off-by: Kevin Brodsky <kevin.brodsky@arm.com> Acked-by: David Hildenbrand (Red Hat) <david@kernel.org> Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Reviewed-by: Yeoreum Yun <yeoreum.yun@arm.com> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Andreas Larsson <andreas@gaisler.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: David Hildenbrand <david@redhat.com> Cc: David S. Miller <davem@davemloft.net> Cc: David Woodhouse <dwmw2@infradead.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jann Horn <jannh@google.com> Cc: Juergen Gross <jgross@suse.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Venkat Rao Bagalkote <venkat88@linux.ibm.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-01-20 19:24:34 -08:00
Kevin Brodsky	5ab2467495	mm: enable lazy_mmu sections to nest Despite recent efforts to prevent lazy_mmu sections from nesting, it remains difficult to ensure that it never occurs - and in fact it does occur on arm64 in certain situations (CONFIG_DEBUG_PAGEALLOC). Commit `1ef3095b14` ("arm64/mm: Permit lazy_mmu_mode to be nested") made nesting tolerable on arm64, but without truly supporting it: the inner call to leave() disables the batching optimisation before the outer section ends. This patch actually enables lazy_mmu sections to nest by tracking the nesting level in task_struct, in a similar fashion to e.g. pagefault_{enable,disable}(). This is fully handled by the generic lazy_mmu helpers that were recently introduced. lazy_mmu sections were not initially intended to nest, so we need to clarify the semantics w.r.t. the arch_*_lazy_mmu_mode() callbacks. This patch takes the following approach: * The outermost calls to lazy_mmu_mode_{enable,disable}() trigger calls to arch_{enter,leave}_lazy_mmu_mode() - this is unchanged. * Nested calls to lazy_mmu_mode_{enable,disable}() are not forwarded to the arch via arch_{enter,leave} - lazy MMU remains enabled so the assumption is that these callbacks are not relevant. However, existing code may rely on a call to disable() to flush any batched state, regardless of nesting. arch_flush_lazy_mmu_mode() is therefore called in that situation. A separate interface was recently introduced to temporarily pause the lazy MMU mode: lazy_mmu_mode_{pause,resume}(). pause() fully exits the mode regardless of the nesting level, and resume() restores the mode at the same nesting level. pause()/resume() are themselves allowed to nest, so we actually store two nesting levels in task_struct: enable_count and pause_count. A new helper is_lazy_mmu_mode_active() is introduced to determine whether we are currently in lazy MMU mode; this will be used in subsequent patches to replace the various ways arch's currently track whether the mode is enabled. In summary (enable/pause represent the values after the call): lazy_mmu_mode_enable() -> arch_enter() enable=1 pause=0; lazy_mmu_mode_enable() -> ø enable=2 pause=0; lazy_mmu_mode_pause() -> arch_leave() enable=2 pause=1; lazy_mmu_mode_resume() -> arch_enter() enable=2 pause=0; lazy_mmu_mode_disable() -> arch_flush() enable=1 pause=0; lazy_mmu_mode_disable() -> arch_leave() enable=0 pause=0. Note: is_lazy_mmu_mode_active() is added to <linux/sched.h> to allow arch headers included by <linux/pgtable.h> to use it. Link: https://lkml.kernel.org/r/20251215150323.2218608-10-kevin.brodsky@arm.com Signed-off-by: Kevin Brodsky <kevin.brodsky@arm.com> Acked-by: David Hildenbrand (Red Hat) <david@kernel.org> Reviewed-by: Yeoreum Yun <yeoreum.yun@arm.com> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Andreas Larsson <andreas@gaisler.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: David Hildenbrand <david@redhat.com> Cc: David S. Miller <davem@davemloft.net> Cc: David Woodhouse <dwmw2@infradead.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jann Horn <jannh@google.com> Cc: Juergen Gross <jgross@suse.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Venkat Rao Bagalkote <venkat88@linux.ibm.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-01-20 19:24:34 -08:00
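The nesting rules summarised in the commit above can be modelled in a few lines of user-space C. This is a sketch under stated assumptions, not the kernel code: the struct, the `model_*` function names, and the call counters are hypothetical, and ignoring enable/disable while paused is an assumption that the commit message does not spell out.

```c
#include <assert.h>

/* Hypothetical model of the two per-task nesting levels named in the commit. */
struct lazy_mmu_model {
	unsigned int enable_count;
	unsigned int pause_count;
	/* How often each arch callback would have been invoked. */
	unsigned int arch_enter_calls;
	unsigned int arch_leave_calls;
	unsigned int arch_flush_calls;
};

/* Active means: inside at least one section and not currently paused. */
int model_is_active(const struct lazy_mmu_model *s)
{
	return s->enable_count > 0 && s->pause_count == 0;
}

void model_enable(struct lazy_mmu_model *s)
{
	if (s->pause_count > 0)
		return; /* assumption: calls made while paused are ignored */
	if (s->enable_count++ == 0)
		s->arch_enter_calls++; /* only the outermost enable reaches the arch */
}

void model_disable(struct lazy_mmu_model *s)
{
	if (s->pause_count > 0)
		return; /* assumption: calls made while paused are ignored */
	assert(s->enable_count > 0);
	if (--s->enable_count == 0)
		s->arch_leave_calls++; /* outermost disable leaves the mode */
	else
		s->arch_flush_calls++; /* nested disable only flushes batched state */
}

void model_pause(struct lazy_mmu_model *s)
{
	/* Only the outermost pause exits the mode, and only if it was enabled. */
	if (s->pause_count++ == 0 && s->enable_count > 0)
		s->arch_leave_calls++;
}

void model_resume(struct lazy_mmu_model *s)
{
	assert(s->pause_count > 0);
	/* Only the outermost resume re-enters, at the same enable nesting level. */
	if (--s->pause_count == 0 && s->enable_count > 0)
		s->arch_enter_calls++;
}
```

Replaying the six calls from the summary (enable, enable, pause, resume, disable, disable) on a zero-initialised model leaves both counts at 0, with two enter calls, two leave calls, and one flush call.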
Kevin Brodsky	9273dfaeac	mm: bail out of lazy_mmu_mode_* in interrupt context The lazy MMU mode cannot be used in interrupt context. This is documented in <linux/pgtable.h>, but isn't consistently handled across architectures. arm64 ensures that calls to lazy_mmu_mode_* have no effect in interrupt context, because such calls do occur in certain configurations - see commit `b81c688426` ("arm64/mm: Disable barrier batching in interrupt contexts"). Other architectures do not check this situation, most likely because it hasn't occurred so far. Let's handle this in the new generic lazy_mmu layer, in the same fashion as arm64: bail out of lazy_mmu_mode_* if in_interrupt(). Also remove the arm64 handling that is now redundant. Both arm64 and x86/Xen also ensure that any lazy MMU optimisation is disabled while in interrupt (see queue_pte_barriers() and xen_get_lazy_mode() respectively). This will be handled in the generic layer in a subsequent patch. Link: https://lkml.kernel.org/r/20251215150323.2218608-9-kevin.brodsky@arm.com Signed-off-by: Kevin Brodsky <kevin.brodsky@arm.com> Acked-by: David Hildenbrand (Red Hat) <david@kernel.org> Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Reviewed-by: Yeoreum Yun <yeoreum.yun@arm.com> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Andreas Larsson <andreas@gaisler.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: David Hildenbrand <david@redhat.com> Cc: David S. Miller <davem@davemloft.net> Cc: David Woodhouse <dwmw2@infradead.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jann Horn <jannh@google.com> Cc: Juergen Gross <jgross@suse.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Venkat Rao Bagalkote <venkat88@linux.ibm.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-01-20 19:24:34 -08:00
Kevin Brodsky	7303ecbfe4	mm: introduce CONFIG_ARCH_HAS_LAZY_MMU_MODE Architectures currently opt in for implementing lazy_mmu helpers by defining __HAVE_ARCH_ENTER_LAZY_MMU_MODE. In preparation for introducing a generic lazy_mmu layer that will require storage in task_struct, let's switch to a cleaner approach: instead of defining a macro, select a CONFIG option. This patch introduces CONFIG_ARCH_HAS_LAZY_MMU_MODE and has each arch select it when it implements lazy_mmu helpers. __HAVE_ARCH_ENTER_LAZY_MMU_MODE is removed and <linux/pgtable.h> relies on the new CONFIG instead. On x86, lazy_mmu helpers are only implemented if PARAVIRT_XXL is selected. This creates some complications in arch/x86/boot/, because a few files manually undefine PARAVIRT* options. As a result <asm/paravirt.h> does not define the lazy_mmu helpers, but this breaks the build as <linux/pgtable.h> only defines them if !CONFIG_ARCH_HAS_LAZY_MMU_MODE. There does not seem to be a clean way out of this - let's just undefine that new CONFIG too. Link: https://lkml.kernel.org/r/20251215150323.2218608-7-kevin.brodsky@arm.com Signed-off-by: Kevin Brodsky <kevin.brodsky@arm.com> Acked-by: David Hildenbrand <david@redhat.com> Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Reviewed-by: Ryan Roberts <ryan.roberts@arm.com> Reviewed-by: Yeoreum Yun <yeoreum.yun@arm.com> Acked-by: Andreas Larsson <andreas@gaisler.com> [sparc] Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: David Hildenbrand (Red Hat) <david@kernel.org> Cc: David S. Miller <davem@davemloft.net> Cc: David Woodhouse <dwmw2@infradead.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jann Horn <jannh@google.com> Cc: Juergen Gross <jgross@suse.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Venkat Rao Bagalkote <venkat88@linux.ibm.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-01-20 19:24:33 -08:00
Linus Torvalds	44fc84337b	Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 updates from Catalin Marinas: "These are the arm64 updates for 6.19. The biggest part is the Arm MPAM driver under drivers/resctrl/. There's a patch touching mm/ to handle spurious faults for huge pmd (similar to the pte version). The corresponding arm64 part allows us to avoid the TLB maintenance if a (huge) page is reused after a write fault. There's EFI refactoring to allow runtime services with preemption enabled and the rest is the usual perf/PMU updates and several cleanups/typos. Summary: Core features: - Basic Arm MPAM (Memory system resource Partitioning And Monitoring) driver under drivers/resctrl/ which makes use of the fs/resctrl/ API Perf and PMU: - Avoid cycle counter on multi-threaded CPUs - Extend CSPMU device probing and add additional filtering support for NVIDIA implementations - Add support for the PMUs on the NoC S3 interconnect - Add additional compatible strings for new Cortex and C1 CPUs - Add support for data source filtering to the SPE driver - Add support for i.MX8QM and "DB" PMU in the imx PMU driver Memory management: - Avoid broadcast TLBI if page reused in write fault - Elide TLB invalidation if the old PTE was not valid - Drop redundant cpu_set_*_tcr_t0sz() macros - Propagate pgtable_alloc() errors outside of __create_pgd_mapping() - Propagate return value from __change_memory_common() ACPI and EFI: - Call EFI runtime services without disabling preemption - Remove unused ACPI function Miscellaneous: - ptrace support to disable streaming on SME-only systems - Improve sysreg generation to include a 'Prefix' descriptor - Replace __ASSEMBLY__ with __ASSEMBLER__ - Align register dumps in the kselftest zt-test - Remove some no longer used macros/functions - Various spelling corrections" * tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (94 commits) arm64/mm: Document why linear map split failure upon vm_reset_perms is not problematic arm64/pageattr: Propagate return value from __change_memory_common arm64/sysreg: Remove unused define ARM64_FEATURE_FIELD_BITS KVM: arm64: selftests: Consider all 7 possible levels of cache KVM: arm64: selftests: Remove ARM64_FEATURE_FIELD_BITS and its last user arm64: atomics: lse: Remove unused parameters from ATOMIC_FETCH_OP_AND macros Documentation/arm64: Fix the typo of register names ACPI: GTDT: Get rid of acpi_arch_timer_mem_init() perf: arm_spe: Add support for filtering on data source perf: Add perf_event_attr::config4 perf/imx_ddr: Add support for PMU in DB (system interconnects) perf/imx_ddr: Get and enable optional clks perf/imx_ddr: Move ida_alloc() from ddr_perf_init() to ddr_perf_probe() dt-bindings: perf: fsl-imx-ddr: Add compatible string for i.MX8QM, i.MX8QXP and i.MX8DXL arm64: remove duplicate ARCH_HAS_MEM_ENCRYPT arm64: mm: use untagged address to calculate page index MAINTAINERS: new entry for MPAM Driver arm_mpam: Add kunit tests for props_mismatch() arm_mpam: Add kunit test for bitmap reset arm_mpam: Add helper to reset saved mbwu state ...	2025-12-02 17:03:55 -08:00
Catalin Marinas	17c05cb0ef	Merge branches 'for-next/misc', 'for-next/kselftest', 'for-next/efi-preempt', 'for-next/assembler-macro', 'for-next/typos', 'for-next/sme-ptrace-disable', 'for-next/local-tlbi-page-reused', 'for-next/mpam', 'for-next/acpi' and 'for-next/documentation', remote-tracking branch 'arm64/for-next/perf' into for-next/core * arm64/for-next/perf: perf: arm_spe: Add support for filtering on data source perf: Add perf_event_attr::config4 perf/imx_ddr: Add support for PMU in DB (system interconnects) perf/imx_ddr: Get and enable optional clks perf/imx_ddr: Move ida_alloc() from ddr_perf_init() to ddr_perf_probe() dt-bindings: perf: fsl-imx-ddr: Add compatible string for i.MX8QM, i.MX8QXP and i.MX8DXL arch_topology: Provide a stub topology_core_has_smt() for !CONFIG_GENERIC_ARCH_TOPOLOGY perf/arm-ni: Fix and optimise register offset calculation perf: arm_pmuv3: Add new Cortex and C1 CPU PMUs perf: arm_cspmu: fix error handling in arm_cspmu_impl_unregister() perf/arm-ni: Add NoC S3 support perf/arm_cspmu: nvidia: Add pmevfiltr2 support perf/arm_cspmu: nvidia: Add revision id matching perf/arm_cspmu: Add pmpidr support perf/arm_cspmu: Add callback to reset filter config perf: arm_pmuv3: Don't use PMCCNTR_EL0 on SMT cores * for-next/misc: : Miscellaneous patches arm64: atomics: lse: Remove unused parameters from ATOMIC_FETCH_OP_AND macros arm64: remove duplicate ARCH_HAS_MEM_ENCRYPT arm64: mm: use untagged address to calculate page index arm64: mm: make linear mapping permission update more robust for patial range arm64/mm: Elide TLB flush in certain pte protection transitions arm64/mm: Rename try_pgd_pgtable_alloc_init_mm arm64/mm: Allow __create_pgd_mapping() to propagate pgtable_alloc() errors arm64: add unlikely hint to MTE async fault check in el0_svc_common arm64: acpi: add newline to deferred APEI warning arm64: entry: Clean out some indirection arm64/mm: Ensure PGD_SIZE is aligned to 64 bytes when PA_BITS = 52 arm64/mm: Drop 
cpu_set_[default\|idmap]_tcr_t0sz() arm64: remove unused ARCH_PFN_OFFSET arm64: use SOFTIRQ_ON_OWN_STACK for enabling softirq stack arm64: Remove assertion on CONFIG_VMAP_STACK * for-next/kselftest: : arm64 kselftest patches kselftest/arm64: Align zt-test register dumps * for-next/efi-preempt: : arm64: Make EFI calls preemptible arm64/efi: Call EFI runtime services without disabling preemption arm64/efi: Move uaccess en/disable out of efi_set_pgd() arm64/efi: Drop efi_rt_lock spinlock from EFI arch wrapper arm64/fpsimd: Permit kernel mode NEON with IRQs off arm64/fpsimd: Don't warn when EFI execution context is preemptible efi/runtime-wrappers: Keep track of the efi_runtime_lock owner efi: Add missing static initializer for efi_mm::cpus_allowed_lock * for-next/assembler-macro: : arm64: Replace __ASSEMBLY__ with __ASSEMBLER__ in headers arm64: Replace __ASSEMBLY__ with __ASSEMBLER__ in non-uapi headers arm64: Replace __ASSEMBLY__ with __ASSEMBLER__ in uapi headers * for-next/typos: : Random typo/spelling fixes arm64: Fix double word in comments arm64: Fix typos and spelling errors in comments * for-next/sme-ptrace-disable: : Support disabling streaming mode via ptrace on SME only systems kselftest/arm64: Cover disabling streaming mode without SVE in fp-ptrace kselftst/arm64: Test NT_ARM_SVE FPSIMD format writes on non-SVE systems arm64/sme: Support disabling streaming mode via ptrace on SME only systems * for-next/local-tlbi-page-reused: : arm64, mm: avoid TLBI broadcast if page reused in write fault arm64, tlbflush: don't TLBI broadcast if page reused in write fault mm: add spurious fault fixing support for huge pmd * for-next/mpam: (34 commits) : Basic Arm MPAM driver (more to follow) MAINTAINERS: new entry for MPAM Driver arm_mpam: Add kunit tests for props_mismatch() arm_mpam: Add kunit test for bitmap reset arm_mpam: Add helper to reset saved mbwu state arm_mpam: Use long MBWU counters if supported arm_mpam: Probe for long/lwd mbwu counters arm_mpam: Consider 
overflow in bandwidth counter state arm_mpam: Track bandwidth counter state for power management arm_mpam: Add mpam_msmon_read() to read monitor value arm_mpam: Add helpers to allocate monitors arm_mpam: Probe and reset the rest of the features arm_mpam: Allow configuration to be applied and restored during cpu online arm_mpam: Use a static key to indicate when mpam is enabled arm_mpam: Register and enable IRQs arm_mpam: Extend reset logic to allow devices to be reset any time arm_mpam: Add a helper to touch an MSC from any CPU arm_mpam: Reset MSC controls from cpuhp callbacks arm_mpam: Merge supported features during mpam_enable() into mpam_class arm_mpam: Probe the hardware features resctrl supports arm_mpam: Add helpers for managing the locking around the mon_sel registers ... * for-next/acpi: : arm64 acpi updates ACPI: GTDT: Get rid of acpi_arch_timer_mem_init() * for-next/documentation: : arm64 Documentation updates Documentation/arm64: Fix the typo of register names	2025-11-28 15:47:12 +00:00
Huang Ying	cb1fa2e999	arm64, tlbflush: don't TLBI broadcast if page reused in write fault A multi-threaded customer workload with a large memory footprint uses fork()/exec() to run some external programs every few tens of seconds. When running the workload on an arm64 server machine, it's observed that quite a few CPU cycles are spent in the TLB flushing functions. When running the workload on an x86_64 server machine, they are not. This causes the performance on arm64 to be much worse than that on x86_64. While the workload is running, after fork()/exec() write-protects all pages in the parent process, memory writes in the parent process will cause a write protection fault. Then the page fault handler will make the PTE/PDE writable if the page can be reused, which is almost always true in the workload. On arm64, to avoid the write protection fault on other CPUs, the page fault handler flushes the TLB globally with TLBI broadcast after changing the PTE/PDE. However, this isn't always necessary. Firstly, it's safe to leave some stale read-only TLB entries as long as they will be flushed eventually. Secondly, it's quite possible that the original read-only PTE/PDEs aren't cached in remote TLBs at all if the memory footprint is large. In fact, on x86_64, the page fault handler doesn't flush the remote TLB in this situation, which benefits the performance a lot. To improve the performance on arm64, make the write protection fault handler flush the TLB locally instead of globally via TLBI broadcast after making the PTE/PDE writable. If there are stale read-only TLB entries in the remote CPUs, the page fault handler on these CPUs will regard the page fault as spurious and flush the stale TLB entries. To test the patchset, make the usemem.c from vm-scalability (https://git.kernel.org/pub/scm/linux/kernel/git/wfg/vm-scalability.git) support calling fork()/exec() periodically. 
To mimic the behavior of the customer workload, run usemem with 4 threads, access 100GB memory, and call fork()/exec() every 40 seconds. Test results show that with the patchset the score of usemem improves ~40.6%. The cycles% of TLB flush functions reduces from ~50.5% to ~0.3% in perf profile. Signed-off-by: Huang Ying <ying.huang@linux.alibaba.com> Reviewed-by: Ryan Roberts <ryan.roberts@arm.com> Reviewed-by: Barry Song <baohua@kernel.org> Acked-by: Zi Yan <ziy@nvidia.com> Cc: Will Deacon <will@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: David Hildenbrand <david@redhat.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Yang Shi <yang@os.amperecomputing.com> Cc: Christoph Lameter (Ampere) <cl@gentwo.org> Cc: Dev Jain <dev.jain@arm.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: Kevin Brodsky <kevin.brodsky@arm.com> Cc: Yin Fengwei <fengwei_yin@linux.alibaba.com> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-kernel@vger.kernel.org Cc: linux-mm@kvack.org Reviewed-by: David Hildenbrand (Red Hat) <david@kernel.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2025-11-19 16:01:48 +00:00
mrigendrachaubey	96ac403ea2	arm64: Fix typos and spelling errors in comments This patch corrects several minor typographical and spelling errors in comments across multiple arm64 source files. No functional changes. Signed-off-by: mrigendrachaubey <mrigendra.chaubey@gmail.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2025-11-12 17:06:21 +00:00
Thomas Huth	287d163322	arm64: Replace __ASSEMBLY__ with __ASSEMBLER__ in non-uapi headers While the GCC and Clang compilers already define __ASSEMBLER__ automatically when compiling assembly code, __ASSEMBLY__ is a macro that only gets defined by the Makefiles in the kernel. This can be very confusing when switching between userspace and kernelspace coding, or when dealing with uapi headers that rather should use __ASSEMBLER__ instead. So let's standardize now on the __ASSEMBLER__ macro that is provided by the compilers. This is a mostly mechanical patch (done with a simple "sed -i" statement), except for the following files where comments with mis-spelled macros were tweaked manually: arch/arm64/include/asm/stacktrace/frame.h arch/arm64/include/asm/kvm_ptrauth.h arch/arm64/include/asm/debug-monitors.h arch/arm64/include/asm/esr.h arch/arm64/include/asm/scs.h arch/arm64/include/asm/memory.h Signed-off-by: Thomas Huth <thuth@redhat.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2025-11-11 19:35:59 +00:00
Huang Ying	143937ca51	arm64, mm: avoid always making PTE dirty in pte_mkwrite() Current pte_mkwrite_novma() makes PTE dirty unconditionally. This may mark some pages that are never written dirty wrongly. For example, do_swap_page() may map the exclusive pages with writable and clean PTEs if the VMA is writable and the page fault is for read access. However, current pte_mkwrite_novma() implementation always dirties the PTE. This may cause unnecessary disk writing if the pages are never written before being reclaimed. So, change pte_mkwrite_novma() to clear the PTE_RDONLY bit only if the PTE_DIRTY bit is set to make it possible to make the PTE writable and clean. The current behavior was introduced in commit `73e86cb03c` ("arm64: Move PTE_RDONLY bit handling out of set_pte_at()"). Before that, pte_mkwrite() only sets the PTE_WRITE bit, while set_pte_at() only clears the PTE_RDONLY bit if both the PTE_WRITE and the PTE_DIRTY bits are set. To test the performance impact of the patch, on an arm64 server machine, run 16 redis-server processes on socket 1 and 16 memtier_benchmark processes on socket 0 with mostly get transactions (that is, redis-server will mostly read memory only). The memory footprint of redis-server is larger than the available memory, so swap out/in will be triggered. Test results show that the patch can avoid most swapping out because the pages are mostly clean. And the benchmark throughput improves ~23.9% in the test. 
Fixes: `73e86cb03c` ("arm64: Move PTE_RDONLY bit handling out of set_pte_at()") Signed-off-by: Huang Ying <ying.huang@linux.alibaba.com> Cc: Will Deacon <will@kernel.org> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Gavin Shan <gshan@redhat.com> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Yicong Yang <yangyicong@hisilicon.com> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-kernel@vger.kernel.org Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2025-10-21 15:00:25 +01:00
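The pte_mkwrite_novma() rule described in 143937ca51 can be sketched in user-space C. This is only an illustration, not the kernel's code: the `pte_t` typedef, the bit positions, and the `sketch_` helper name are assumptions made up for the example. Only the rule itself (set the write bit, and clear PTE_RDONLY only when PTE_DIRTY is already set) is taken from the commit message.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the arm64 PTE bits; the positions are illustrative. */
typedef uint64_t pte_t;
#define PTE_WRITE  (1ULL << 0)
#define PTE_RDONLY (1ULL << 1)
#define PTE_DIRTY  (1ULL << 2)

/*
 * Mark a PTE writable. The read-only bit is dropped only when the PTE is
 * already dirty, so a clean PTE stays writable-and-clean instead of being
 * treated as dirty.
 */
static pte_t sketch_pte_mkwrite_novma(pte_t pte)
{
    pte |= PTE_WRITE;
    if (pte & PTE_DIRTY)
        pte &= ~PTE_RDONLY;
    return pte;
}
```

In this sketch a clean read-only entry keeps PTE_RDONLY after the call, which is what allows the page to be reclaimed without being written back if it is never modified.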
Yang Shi	a166563e7e	arm64: mm: support large block mapping when rodata=full When rodata=full is specified, kernel linear mapping has to be mapped at PTE level since large page table can't be split due to break-before-make rule on ARM64. This resulted in a couple of problems: - performance degradation - more TLB pressure - memory waste for kernel page table With FEAT_BBM level 2 support, splitting large block page table to smaller ones doesn't need to make the page table entry invalid anymore. This allows kernel split large block mapping on the fly. Add kernel page table split support and use large block mapping by default when FEAT_BBM level 2 is supported for rodata=full. When changing permissions for kernel linear mapping, the page table will be split to smaller size. The machine without FEAT_BBM level 2 will fallback to have kernel linear mapping PTE-mapped when rodata=full. With this we saw significant performance boost with some benchmarks and much less memory consumption on my AmpereOne machine (192 cores, 1P) with 256GB memory. * Memory use after boot Before: MemTotal: 258988984 kB MemFree: 254821700 kB After: MemTotal: 259505132 kB MemFree: 255410264 kB Around 500MB more memory are free to use. The larger the machine, the more memory saved. * Memcached We saw performance degradation when running Memcached benchmark with rodata=full vs rodata=on. Our profiling pointed to kernel TLB pressure. With this patchset we saw ops/sec is increased by around 3.5%, P99 latency is reduced by around 9.6%. The gain mainly came from reduced kernel TLB misses. The kernel TLB MPKI is reduced by 28.5%. The benchmark data is now on par with rodata=on too. * Disk encryption (dm-crypt) benchmark Ran fio benchmark with the below command on a 128G ramdisk (ext4) with disk encryption (by dm-crypt). 
fio --directory=/data --random_generator=lfsr --norandommap \ --randrepeat 1 --status-interval=999 --rw=write --bs=4k --loops=1 \ --ioengine=sync --iodepth=1 --numjobs=1 --fsync_on_close=1 \ --group_reporting --thread --name=iops-test-job --eta-newline=1 \ --size 100G The IOPS is increased by 90% - 150% (the variance is high, but the worst number of good case is around 90% more than the best number of bad case). The bandwidth is increased and the avg clat is reduced proportionally. * Sequential file read Read 100G file sequentially on XFS (xfs_io read with page cache populated). The bandwidth is increased by 150%. Co-developed-by: Ryan Roberts <ryan.roberts@arm.com> Signed-off-by: Ryan Roberts <ryan.roberts@arm.com> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Yang Shi <yang@os.amperecomputing.com> Signed-off-by: Will Deacon <will@kernel.org>	2025-09-18 21:36:37 +01:00
Dev Jain	7efa1cd5f8	arm64: add batched versions of ptep_modify_prot_start/commit Override the generic definition of modify_prot_start_ptes() to use get_and_clear_full_ptes(). This helper does a TLBI only for the starting and ending contpte block of the range, whereas the current implementation will call ptep_get_and_clear() for every contpte block, thus doing a TLBI on every contpte block. Therefore, we have a performance win. The arm64 definition of pte_accessible() allows us to batch in the errata specific case: #define pte_accessible(mm, pte) \ (mm_tlb_flush_pending(mm) ? pte_present(pte) : pte_valid(pte)) All ptes are obviously present in the folio batch, and they are also valid. Override the generic definition of modify_prot_commit_ptes() to simply use set_ptes() to map the new ptes into the pagetable. Link: https://lkml.kernel.org/r/20250718090244.21092-8-dev.jain@arm.com Signed-off-by: Dev Jain <dev.jain@arm.com> Reviewed-by: Ryan Roberts <ryan.roberts@arm.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Barry Song <baohua@kernel.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: David Hildenbrand <david@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jann Horn <jannh@google.com> Cc: Joey Gouly <joey.gouly@arm.com> Cc: Kevin Brodsky <kevin.brodsky@arm.com> Cc: Lance Yang <ioworker0@gmail.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Peter Xu <peterx@redhat.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Will Deacon <will@kernel.org> Cc: Yang Shi <yang@os.amperecomputing.com> Cc: Yicong Yang <yangyicong@hisilicon.com> Cc: Zhenhua Huang <quic_zhenhuah@quicinc.com> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-07-24 19:12:41 -07:00
Alistair Popple	d438d27341	mm: remove devmap related functions and page table bits Now that DAX and all other reference counts to ZONE_DEVICE pages are managed normally there is no need for the special devmap PTE/PMD/PUD page table bits. So drop all references to these, freeing up a software defined page table bit on architectures supporting it. Link: https://lkml.kernel.org/r/6389398c32cc9daa3dfcaa9f79c7972525d310ce.1750323463.git-series.apopple@nvidia.com Signed-off-by: Alistair Popple <apopple@nvidia.com> Acked-by: Will Deacon <will@kernel.org> # arm64 Acked-by: David Hildenbrand <david@redhat.com> Suggested-by: Chunyan Zhang <zhang.lyra@gmail.com> Reviewed-by: Björn Töpel <bjorn@rivosinc.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Cc: Balbir Singh <balbirs@nvidia.com> Cc: Björn Töpel <bjorn@kernel.org> Cc: Christoph Hellwig <hch@lst.de> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Deepak Gupta <debug@rivosinc.com> Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com> Cc: Inki Dae <m.szyprowski@samsung.com> Cc: John Groves <john@groves.net> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-07-09 22:42:18 -07:00
Ryan Roberts	38b0ece6d7	mm/filemap: allow arch to request folio size for exec memory Change the readahead config so that if it is being requested for an executable mapping, do a synchronous read into a set of folios with an arch-specified order and in a naturally aligned manner. We no longer center the read on the faulting page but simply align it down to the previous natural boundary. Additionally, we don't bother with an asynchronous part. On arm64 if memory is physically contiguous and naturally aligned to the "contpte" size, we can use contpte mappings, which improves utilization of the TLB. When paired with the "multi-size THP" feature, this works well to reduce dTLB pressure. However iTLB pressure is still high due to executable mappings having a low likelihood of being in the required folio size and mapping alignment, even when the filesystem supports readahead into large folios (e.g. XFS). The reason for the low likelihood is that the current readahead algorithm starts with an order-0 folio and increases the folio order by 2 every time the readahead mark is hit. But most executable memory tends to be accessed randomly and so the readahead mark is rarely hit and most executable folios remain order-0. So let's special-case the read(ahead) logic for executable mappings. The trade-off is performance improvement (due to more efficient storage of the translations in iTLB) vs potential for making reclaim more difficult (due to the folios being larger so if a part of the folio is hot the whole thing is considered hot). But executable memory is a small portion of the overall system memory so I doubt this will even register from a reclaim perspective. I've chosen 64K folio size for arm64 which benefits both the 4K and 16K base page size configs. Crucially the same amount of data is still read (usually 128K) so I'm not expecting any read amplification issues. I don't anticipate any write amplification because text is always RO. 
Note that the text region of an ELF file could be populated into the page cache for other reasons than taking a fault in a mmapped area. The most common case is due to the loader read()ing the header which can be shared with the beginning of text. So some text will still remain in small folios, but this simple, best effort change provides good performance improvements as is. Confine this special-case approach to the bounds of the VMA. This prevents wasting memory for any padding that might exist in the file between sections. Previously the padding would have been contained in order-0 folios and would be easy to reclaim. But now it would be part of a larger folio so more difficult to reclaim. Solve this by simply not reading it into memory in the first place. Benchmarking ============ The below shows pgbench and redis benchmarks on Graviton3 arm64 system. First, confirmation that this patch causes more text to be contained in 64K folios: +----------------------+---------------+---------------+---------------+ \| File-backed folios by\| system boot \| pgbench \| redis \| \| size as percentage of+-------+-------+-------+-------+-------+-------+ \| all mapped text mem \|before \| after \|before \| after \|before \| after \| +======================+=======+=======+=======+=======+=======+=======+ \| base-page-4kB \| 78% \| 30% \| 78% \| 11% \| 73% \| 14% \| \| thp-aligned-8kB \| 1% \| 0% \| 0% \| 0% \| 1% \| 0% \| \| thp-aligned-16kB \| 17% \| 4% \| 17% \| 3% \| 20% \| 4% \| \| thp-aligned-32kB \| 1% \| 1% \| 1% \| 2% \| 1% \| 1% \| \| thp-aligned-64kB \| 3% \| 63% \| 3% \| 81% \| 4% \| 77% \| \| thp-aligned-128kB \| 0% \| 1% \| 1% \| 1% \| 1% \| 2% \| \| thp-unaligned-64kB \| 0% \| 0% \| 0% \| 1% \| 0% \| 1% \| \| thp-unaligned-128kB \| 0% \| 1% \| 0% \| 0% \| 0% \| 0% \| \| thp-partial \| 0% \| 0% \| 0% \| 1% \| 0% \| 1% \| +----------------------+-------+-------+-------+-------+-------+-------+ \| cont-aligned-64kB \| 4% \| 65% \| 4% \| 83% \| 6% \| 79% \| 
+----------------------+-------+-------+-------+-------+-------+-------+ The above shows that for both workloads (each isolated with cgroups) as well as the general system state after boot, the amount of text backed by 4K and 16K folios reduces and the amount backed by 64K folios increases significantly. And the amount of text that is contpte-mapped significantly increases (see last row). And this is reflected in performance improvement. "(I)" indicates a statistically significant improvement. Note TPS and Reqs/sec are rates so bigger is better, ms is time so smaller is better: +-------------+-------------------------------------------+------------+ \| Benchmark \| Result Class \| Improvemnt \| +=============+===========================================+============+ \| pts/pgbench \| Scale: 1 Clients: 1 RO (TPS) \| (I) 3.47% \| \| \| Scale: 1 Clients: 1 RO - Latency (ms) \| -2.88% \| \| \| Scale: 1 Clients: 250 RO (TPS) \| (I) 5.02% \| \| \| Scale: 1 Clients: 250 RO - Latency (ms) \| (I) -4.79% \| \| \| Scale: 1 Clients: 1000 RO (TPS) \| (I) 6.16% \| \| \| Scale: 1 Clients: 1000 RO - Latency (ms) \| (I) -5.82% \| \| \| Scale: 100 Clients: 1 RO (TPS) \| 2.51% \| \| \| Scale: 100 Clients: 1 RO - Latency (ms) \| -3.51% \| \| \| Scale: 100 Clients: 250 RO (TPS) \| (I) 4.75% \| \| \| Scale: 100 Clients: 250 RO - Latency (ms) \| (I) -4.44% \| \| \| Scale: 100 Clients: 1000 RO (TPS) \| (I) 6.34% \| \| \| Scale: 100 Clients: 1000 RO - Latency (ms)\| (I) -5.95% \| +-------------+-------------------------------------------+------------+ \| pts/redis \| Test: GET Connections: 50 (Reqs/sec) \| (I) 3.20% \| \| \| Test: GET Connections: 1000 (Reqs/sec) \| (I) 2.55% \| \| \| Test: LPOP Connections: 50 (Reqs/sec) \| (I) 4.59% \| \| \| Test: LPOP Connections: 1000 (Reqs/sec) \| (I) 4.81% \| \| \| Test: LPUSH Connections: 50 (Reqs/sec) \| (I) 5.31% \| \| \| Test: LPUSH Connections: 1000 (Reqs/sec) \| (I) 4.36% \| \| \| Test: SADD Connections: 50 (Reqs/sec) \| (I) 2.64% \| \| \| 
Test: SADD Connections: 1000 (Reqs/sec) \| (I) 4.15% \| \| \| Test: SET Connections: 50 (Reqs/sec) \| (I) 3.11% \| \| \| Test: SET Connections: 1000 (Reqs/sec) \| (I) 3.36% \| +-------------+-------------------------------------------+------------+ [ryan.roberts@arm.com: fix use-after-free] Link: https://lkml.kernel.org/r/ea7f9da7-9a9f-4b85-9d0a-35b320f5ed25@arm.com [ryan.roberts@arm.com: use the vma_pages() helper instead of open-coding] Link: https://lkml.kernel.org/r/0e0f674b-3b7e-494f-ae7a-fc9dbb98dad4@arm.com Link: https://lkml.kernel.org/r/20250609092729.274960-6-ryan.roberts@arm.com Signed-off-by: Ryan Roberts <ryan.roberts@arm.com> Reviewed-by: Jan Kara <jack@suse.cz> Acked-by: Will Deacon <will@kernel.org> Cc: Chaitanya S Prakash <chaitanyas.prakash@arm.com> Cc: David Hildenbrand <david@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-07-09 22:42:03 -07:00
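The "align down to the previous natural boundary" and "confine to the bounds of the VMA" steps from 38b0ece6d7 can be sketched as plain index arithmetic. This is a minimal sketch under assumptions: the function names and the `vma_first_index` parameter are hypothetical, and the kernel's actual readahead code is not reproduced here. With 4K base pages a 64K folio is order 4 (16 pages); with 16K base pages it is order 2.

```c
/* Round a page index down to the start of its naturally aligned 2^order block. */
static unsigned long sketch_align_down(unsigned long index, unsigned int order)
{
    return index & ~((1UL << order) - 1);
}

/*
 * Pick the first page index to read for a fault at fault_index:
 * align down to the folio boundary, but never start before the first
 * page index covered by the VMA (hypothetical parameter).
 */
static unsigned long sketch_exec_read_start(unsigned long fault_index,
                                            unsigned int order,
                                            unsigned long vma_first_index)
{
    unsigned long start = sketch_align_down(fault_index, order);

    return start < vma_first_index ? vma_first_index : start;
}
```

For example, a fault on page index 37 with order 4 would start the read at index 32, unless the VMA only begins at index 35, in which case the sketch starts at 35 so that file padding outside the mapping is not pulled in.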
Magnus Lindholm	403d1338a4	mm: pgtable: fix pte_swp_exclusive Make pte_swp_exclusive return bool instead of int. This will better reflect how pte_swp_exclusive is actually used in the code. This fixes swapon/swapoff problems on Alpha due to pte_swp_exclusive not returning correct values when the _PAGE_SWP_EXCLUSIVE bit resides in the upper 32 bits of the PTE (as on Alpha). Suggested-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Magnus Lindholm <linmag7@gmail.com> Cc: Sam James <sam@gentoo.org> Link: https://lore.kernel.org/lkml/20250218175735.19882-2-linmag7@gmail.com/ Link: https://lore.kernel.org/lkml/20250602041118.GA2675383@ZenIV/ [ Applied as the 'sed' script Al suggested - Linus ] Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2025-06-11 14:52:08 -07:00
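Why the return type matters in 403d1338a4 can be shown with a small user-space sketch. The flag value and the function names below are assumptions for illustration; the commit message only says that the bit lives in the upper 32 bits of the PTE. Narrowing a 64-bit mask result to `int` is implementation-defined in C, and on common compilers it keeps only the low 32 bits, whereas conversion to `bool` tests for any non-zero bit.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical swap-exclusive flag placed above bit 31. */
#define SKETCH_SWP_EXCLUSIVE (1ULL << 39)

/* Old shape: the masked value is narrowed to int, which drops the high bits. */
static int sketch_swp_exclusive_int(uint64_t pte)
{
    return (int)(pte & SKETCH_SWP_EXCLUSIVE);
}

/* New shape: conversion to bool yields true for any non-zero masked value. */
static bool sketch_swp_exclusive_bool(uint64_t pte)
{
    return (pte & SKETCH_SWP_EXCLUSIVE) != 0;
}
```

With the flag set, the `int` variant is expected to report 0 on typical GCC and Clang targets, while the `bool` variant reports true.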
Linus Torvalds	00c010e130	Merge tag 'mm-stable-2025-05-31-14-50' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: - "Add folio_mk_pte()" from Matthew Wilcox simplifies the act of creating a pte which addresses the first page in a folio and reduces the amount of plumbing which architectures must implement to provide this. - "Misc folio patches for 6.16" from Matthew Wilcox is a shower of largely unrelated folio infrastructure changes which clean things up and better prepare us for future work. - "memory,x86,acpi: hotplug memory alignment advisement" from Gregory Price adds early-init code to prevent x86 from leaving physical memory unused when physical address regions are not aligned to memory block size. - "mm/compaction: allow more aggressive proactive compaction" from Michal Clapinski provides some tuning of the (sadly, hard-coded (more sadly, not auto-tuned)) thresholds for our invocation of proactive compaction. In a simple test case, the reduction of a guest VM's memory consumption was dramatic. - "Minor cleanups and improvements to swap freeing code" from Kemeng Shi provides some code cleanups and a small efficiency improvement to this part of our swap handling code. - "ptrace: introduce PTRACE_SET_SYSCALL_INFO API" from Dmitry Levin adds the ability for a ptracer to modify syscall arguments. At this time we can alter only "system call information that are used by strace system call tampering, namely, syscall number, syscall arguments, and syscall return value". This series should have been incorporated into mm.git's "non-MM" branch, but I goofed. - "fs/proc: extend the PAGEMAP_SCAN ioctl to report guard regions" from Andrei Vagin extends the info returned by the PAGEMAP_SCAN ioctl against /proc/pid/pagemap. This permits CRIU to more efficiently get at the info about guard regions. - "Fix parameter passed to page_mapcount_is_type()" from Gavin Shan implements that fix. 
No runtime effect is expected because validate_page_before_insert() happens to fix up this error. - "kernel/events/uprobes: uprobe_write_opcode() rewrite" from David Hildenbrand basically brings uprobe text poking into the current decade. Remove a bunch of hand-rolled implementation in favor of using more current facilities. - "mm/ptdump: Drop assumption that pxd_val() is u64" from Anshuman Khandual provides enhancements and generalizations to the pte dumping code. This might be needed when 128-bit Page Table Descriptors are enabled for ARM. - "Always call constructor for kernel page tables" from Kevin Brodsky ensures that the ctor/dtor is always called for kernel pgtables, as it already is for user pgtables. This permits the addition of more functionality such as "insert hooks to protect page tables". This change does result in various architectures performing unnecessary work, but this is fixed up where it is anticipated to occur. - "Rust support for mm_struct, vm_area_struct, and mmap" from Alice Ryhl adds plumbing to permit Rust access to core MM structures. - "fix incorrectly disallowed anonymous VMA merges" from Lorenzo Stoakes takes advantage of some VMA merging opportunities which we've been missing for 15 years. - "mm/madvise: batch tlb flushes for MADV_DONTNEED and MADV_FREE" from SeongJae Park optimizes process_madvise()'s TLB flushing. Instead of flushing each address range in the provided iovec, we batch the flushing across all the iovec entries. The syscall's cost was approximately halved with a microbenchmark which was designed to load this particular operation. - "Track node vacancy to reduce worst case allocation counts" from Sidhartha Kumar makes the maple tree smarter about its node preallocation. stress-ng mmap performance increased by single-digit percentages and the amount of unnecessarily preallocated memory was dramatically reduced. 
- "mm/gup: Minor fix, cleanup and improvements" from Baoquan He removes a few unnecessary things which Baoquan noted when reading the code. - "Enhance sysfs handling for memory hotplug in weighted interleave" from Rakie Kim "enhances the weighted interleave policy in the memory management subsystem by improving sysfs handling, fixing memory leaks, and introducing dynamic sysfs updates for memory hotplug support". Fixes things on error paths which we are unlikely to hit. - "mm/damon: auto-tune DAMOS for NUMA setups including tiered memory" from SeongJae Park introduces new DAMOS quota goal metrics which eliminate the manual tuning which is required when utilizing DAMON for memory tiering. - "mm/vmalloc.c: code cleanup and improvements" from Baoquan He provides cleanups and small efficiency improvements which Baoquan found via code inspection. - "vmscan: enforce mems_effective during demotion" from Gregory Price changes reclaim to respect cpuset.mems_effective during demotion when possible. Presently, reclaim explicitly ignores cpuset.mems_effective when demoting, which may cause the cpuset settings to be violated. This is useful for isolating workloads on a multi-tenant system from certain classes of memory more consistently. - "Clean up split_huge_pmd_locked() and remove unnecessary folio pointers" from Gavin Guo provides minor cleanups and efficiency gains in the huge page splitting and migrating code. - "Use kmem_cache for memcg alloc" from Huan Yang creates a slab cache for `struct mem_cgroup', yielding improved memory utilization. - "add max arg to swappiness in memory.reclaim and lru_gen" from Zhongkun He adds a new "max" argument to the "swappiness=" argument for memory.reclaim and MGLRU's lru_gen. This directs proactive reclaim to reclaim from only anon folios rather than file-backed folios. 
- "kexec: introduce Kexec HandOver (KHO)" from Mike Rapoport is the first step on the path to permitting the kernel to maintain existing VMs while replacing the host kernel via file-based kexec. At this time only memblock's reserve_mem is preserved. - "mm: Introduce for_each_valid_pfn()" from David Woodhouse provides and uses a smarter way of looping over a pfn range, by skipping ranges of invalid pfns. - "sched/numa: Skip VMA scanning on memory pinned to one NUMA node via cpuset.mems" from Libo Chen removes a lot of pointless VMA scanning when a task is pinned to a single NUMA node. Dramatic performance benefits were seen in some real world cases. - "JFS: Implement migrate_folio for jfs_metapage_aops" from Shivank Garg addresses a warning which occurs during memory compaction when using JFS. - "move all VMA allocation, freeing and duplication logic to mm" from Lorenzo Stoakes moves some VMA code from kernel/fork.c into the more appropriate mm/vma.c. - "mm, swap: clean up swap cache mapping helper" from Kairui Song provides code consolidation and cleanups related to the folio_index() function. - "mm/gup: Cleanup memfd_pin_folios()" from Vishal Moola does that. - "memcg: Fix test_memcg_min/low test failures" from Waiman Long addresses some bogus failures which are being reported by the test_memcontrol selftest. - "eliminate mmap() retry merge, add .mmap_prepare hook" from Lorenzo Stoakes commences the deprecation of file_operations.mmap() in favor of the new file_operations.mmap_prepare(). The latter is more restrictive and prevents drivers from messing with things in ways which, amongst other problems, may defeat VMA merging. - "memcg: decouple memcg and objcg stocks" from Shakeel Butt decouples the per-cpu memcg charge cache from the objcg's one. This is a step along the way to making memcg and objcg charging NMI-safe, which is a BPF requirement. 
- "mm/damon: minor fixups and improvements for code, tests, and documents" from SeongJae Park is yet another batch of miscellaneous DAMON changes. Fix and improve minor problems in code, tests and documents. - "memcg: make memcg stats irq safe" from Shakeel Butt converts memcg stats to be irq safe. Another step along the way to making memcg charging and stats updates NMI-safe, a BPF requirement. - "Let unmap_hugepage_range() and several related functions take folio instead of page" from Fan Ni provides folio conversions in the hugetlb code. * tag 'mm-stable-2025-05-31-14-50' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (285 commits) mm: pcp: increase pcp->free_count threshold to trigger free_high mm/hugetlb: convert use of struct page to folio in __unmap_hugepage_range() mm/hugetlb: refactor __unmap_hugepage_range() to take folio instead of page mm/hugetlb: refactor unmap_hugepage_range() to take folio instead of page mm/hugetlb: pass folio instead of page to unmap_ref_private() memcg: objcg stock trylock without irq disabling memcg: no stock lock for cpu hot-unplug memcg: make __mod_memcg_lruvec_state re-entrant safe against irqs memcg: make count_memcg_events re-entrant safe against irqs memcg: make mod_memcg_state re-entrant safe against irqs memcg: move preempt disable to callers of memcg_rstat_updated memcg: memcg_rstat_updated re-entrant safe against irqs mm: khugepaged: decouple SHMEM and file folios' collapse selftests/eventfd: correct test name and improve messages alloc_tag: check mem_profiling_support in alloc_tag_init Docs/damon: update titles and brief introductions to explain DAMOS selftests/damon/_damon_sysfs: read tried regions directories in order mm/damon/tests/core-kunit: add a test for damos_set_filters_default_reject() mm/damon/paddr: remove unused variable, folio_list, in damon_pa_stat() mm/damon/sysfs-schemes: fix wrong comment on damons_sysfs_quota_goal_metric_strs ...	2025-05-31 15:44:16 -07:00
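The "mm: Introduce for_each_valid_pfn()" item in the merge summary above describes looping over a pfn range while skipping ranges of invalid pfns. The following is a minimal user-space sketch of that idea only, not the kernel's implementation: the section size, the `section_valid[]` table, and every `model_` name are hypothetical stand-ins chosen for illustration.

```c
#include <stdbool.h>

/* Hypothetical, deliberately tiny layout: 4 sections of 8 pfns each. */
#define PFNS_PER_SECTION 8UL
#define NR_SECTIONS      4UL

/* Assumed validity map: sections 0 and 3 are populated, 1 and 2 are holes. */
static const bool section_valid[NR_SECTIONS] = { true, false, false, true };

static bool model_pfn_valid(unsigned long pfn)
{
	unsigned long sec = pfn / PFNS_PER_SECTION;

	return sec < NR_SECTIONS && section_valid[sec];
}

/*
 * Return the first valid pfn in [pfn, end), or end if there is none.
 * An invalid pfn causes a jump to the start of the next section, so a
 * hole is skipped in one step instead of being probed pfn by pfn.
 */
static unsigned long model_first_valid_pfn(unsigned long pfn, unsigned long end)
{
	while (pfn < end && !model_pfn_valid(pfn))
		pfn = (pfn / PFNS_PER_SECTION + 1) * PFNS_PER_SECTION;

	return pfn < end ? pfn : end;
}

/* Iterate over the valid pfns in [start, end) only. */
#define model_for_each_valid_pfn(pfn, start, end)               \
	for ((pfn) = model_first_valid_pfn((start), (end));     \
	     (pfn) < (end);                                     \
	     (pfn) = model_first_valid_pfn((pfn) + 1, (end)))

/* Example consumer: count how many pfns the iterator visits. */
static unsigned long count_valid(unsigned long start, unsigned long end)
{
	unsigned long pfn, n = 0;

	model_for_each_valid_pfn(pfn, start, end)
		n++;

	return n;
}
```

In this model a caller never sees the pfns inside the two empty sections; the loop body runs only for pfns that `model_pfn_valid()` accepts.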
Ard Biesheuvel	93d0d6f8a6	arm64/boot: Move init_pgdir[] and init_idmap_pgdir[] into __pi_ namespace init_pgdir[] is only referenced from the startup code, but lives after BSS in the linker map. Before tightening the rules about accessing BSS from startup code, move init_pgdir[] into the __pi_ namespace, so it does not need to be exported explicitly. For symmetry, do the same with init_idmap_pgdir[], although it lives before BSS. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Tested-by: Yeoreum Yun <yeoreum.yun@arm.com> Reviewed-by: Yeoreum Yun <yeoreum.yun@arm.com> Link: https://lore.kernel.org/r/20250508114328.2460610-6-ardb+git@google.com Signed-off-by: Will Deacon <will@kernel.org>	2025-05-16 16:05:21 +01:00
Gavin Shan	13c63ce358	arm64: mm: Drop redundant check in pmd_trans_huge() pmd_val(pmd) is redundant because a positive pmd_present(pmd) ensures a positive pmd_val(pmd), according to their definitions shown below. #define pmd_val(x) ((x).pmd) #define pmd_present(pmd) pte_present(pmd_pte(pmd)) #define pte_present(pte) (pte_valid(pte) || pte_present_invalid(pte)) #define pte_valid(pte) (!!(pte_val(pte) & PTE_VALID)) #define pte_present_invalid(pte) \ ((pte_val(pte) & (PTE_VALID | PTE_PRESENT_INVALID)) == PTE_PRESENT_INVALID) pte_present() can't be positive unless either the PTE_VALID or the PTE_PRESENT_INVALID flag is set. In that case, pmd_val(pmd) must be positive as well. So let's drop the redundant pmd_val(pmd) check; no functional change is intended. Signed-off-by: Gavin Shan <gshan@redhat.com> Reviewed-by: Dev Jain <dev.jain@arm.com> Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Reviewed-by: Ryan Roberts <ryan.roberts@arm.com> Link: https://lore.kernel.org/r/20250508085251.204282-1-gshan@redhat.com Signed-off-by: Will Deacon <will@kernel.org>	2025-05-16 15:10:13 +01:00
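The argument in the entry above (13c63ce358) can be checked mechanically outside the kernel. The sketch below re-creates the quoted macros as small C functions and adds one helper that tests the implication "present implies nonzero". The bit positions chosen for PTE_VALID and PTE_PRESENT_INVALID are assumptions for illustration, not values taken from the arm64 headers, and `present_implies_nonzero()` is a hypothetical helper.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative bit positions only; the real values are defined by arm64. */
#define PTE_VALID           (UINT64_C(1) << 0)
#define PTE_PRESENT_INVALID (UINT64_C(1) << 11)

typedef struct { uint64_t pte; } pte_t;
typedef struct { uint64_t pmd; } pmd_t;

#define pte_val(x) ((x).pte)
#define pmd_val(x) ((x).pmd)

/* Reinterpret a pmd entry as a pte entry, as the quoted macros do. */
static inline pte_t pmd_pte(pmd_t pmd)
{
	return (pte_t){ .pte = pmd_val(pmd) };
}

static inline bool pte_valid(pte_t pte)
{
	return !!(pte_val(pte) & PTE_VALID);
}

static inline bool pte_present_invalid(pte_t pte)
{
	return (pte_val(pte) & (PTE_VALID | PTE_PRESENT_INVALID)) ==
	       PTE_PRESENT_INVALID;
}

static inline bool pte_present(pte_t pte)
{
	return pte_valid(pte) || pte_present_invalid(pte);
}

static inline bool pmd_present(pmd_t pmd)
{
	return pte_present(pmd_pte(pmd));
}

/* Hypothetical helper: true when "present implies nonzero" holds for raw. */
static bool present_implies_nonzero(uint64_t raw)
{
	pmd_t pmd = { .pmd = raw };

	return !pmd_present(pmd) || pmd_val(pmd) != 0;
}
```

Because both ways of being present require at least one flag bit to be set, no raw value can make `pmd_present()` true while `pmd_val()` is zero, which is why the extra check adds nothing.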
Ryan Roberts	1ef3095b14	arm64/mm: Permit lazy_mmu_mode to be nested lazy_mmu_mode is not supposed to permit nesting. But in practice this does happen with CONFIG_DEBUG_PAGEALLOC, where a page allocation inside a lazy_mmu_mode section (such as zap_pte_range()) will change permissions on the linear map with apply_to_page_range(), which re-enters lazy_mmu_mode (see stack trace below). The warning checking that nesting was not happening was previously being triggered due to this. So let's relax by removing the warning and tolerate nesting in the arm64 implementation. The first (inner) call to arch_leave_lazy_mmu_mode() will flush and clear the flag such that the remainder of the work in the outer nest behaves as if outside of lazy mmu mode. This is safe and keeps tracking simple. Code review suggests powerpc deals with this issue in the same way. ------------[ cut here ]------------ WARNING: CPU: 6 PID: 1 at arch/arm64/include/asm/pgtable.h:89 __apply_to_page_range+0x85c/0x9f8 Modules linked in: ip_tables x_tables ipv6 CPU: 6 UID: 0 PID: 1 Comm: systemd Not tainted 6.15.0-rc5-00075-g676795fe9cf6 #1 PREEMPT Hardware name: QEMU KVM Virtual Machine, BIOS 2024.08-4 10/25/2024 pstate: 40400005 (nZcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) pc : __apply_to_page_range+0x85c/0x9f8 lr : __apply_to_page_range+0x2b4/0x9f8 sp : ffff80008009b3c0 x29: ffff80008009b460 x28: ffff0000c43a3000 x27: ffff0001ff62b108 x26: ffff0000c43a4000 x25: 0000000000000001 x24: 0010000000000001 x23: ffffbf24c9c209c0 x22: ffff80008009b4d0 x21: ffffbf24c74a3b20 x20: ffff0000c43a3000 x19: ffff0001ff609d18 x18: 0000000000000001 x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000003 x14: 0000000000000028 x13: ffffbf24c97c1000 x12: ffff0000c43a3fff x11: ffffbf24cacc9a70 x10: ffff0000c43a3fff x9 : ffff0001fffff018 x8 : 0000000000000012 x7 : ffff0000c43a4000 x6 : ffff0000c43a4000 x5 : ffffbf24c9c209c0 x4 : ffff0000c43a3fff x3 : ffff0001ff609000 x2 : 0000000000000d18 x1 : ffff0000c03e8000 x0 : 
0000000080000000 Call trace: __apply_to_page_range+0x85c/0x9f8 (P) apply_to_page_range+0x14/0x20 set_memory_valid+0x5c/0xd8 __kernel_map_pages+0x84/0xc0 get_page_from_freelist+0x1110/0x1340 __alloc_frozen_pages_noprof+0x114/0x1178 alloc_pages_mpol+0xb8/0x1d0 alloc_frozen_pages_noprof+0x48/0xc0 alloc_pages_noprof+0x10/0x60 get_free_pages_noprof+0x14/0x90 __tlb_remove_folio_pages_size.isra.0+0xe4/0x140 __tlb_remove_folio_pages+0x10/0x20 unmap_page_range+0xa1c/0x14c0 unmap_single_vma.isra.0+0x48/0x90 unmap_vmas+0xe0/0x200 vms_clear_ptes+0xf4/0x140 vms_complete_munmap_vmas+0x7c/0x208 do_vmi_align_munmap+0x180/0x1a8 do_vmi_munmap+0xac/0x188 __vm_munmap+0xe0/0x1e0 __arm64_sys_munmap+0x20/0x38 invoke_syscall+0x48/0x104 el0_svc_common.constprop.0+0x40/0xe0 do_el0_svc+0x1c/0x28 el0_svc+0x4c/0x16c el0t_64_sync_handler+0x10c/0x140 el0t_64_sync+0x198/0x19c irq event stamp: 281312 hardirqs last enabled at (281311): [<ffffbf24c780fd04>] bad_range+0x164/0x1c0 hardirqs last disabled at (281312): [<ffffbf24c89c4550>] el1_dbg+0x24/0x98 softirqs last enabled at (281054): [<ffffbf24c752d99c>] handle_softirqs+0x4cc/0x518 softirqs last disabled at (281019): [<ffffbf24c7450694>] __do_softirq+0x14/0x20 ---[ end trace 0000000000000000 ]--- Fixes: `5fdd05efa1` ("arm64/mm: Batch barriers when updating kernel mappings") Reported-by: Catalin Marinas <catalin.marinas@arm.com> Closes: https://lore.kernel.org/linux-arm-kernel/aCH0TLRQslXHin5Q@arm.com/ Signed-off-by: Ryan Roberts <ryan.roberts@arm.com> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Link: https://lore.kernel.org/r/20250512150333.5589-1-ryan.roberts@arm.com Signed-off-by: Will Deacon <will@kernel.org>	2025-05-14 13:28:40 +01:00
Ryan Roberts	b81c688426	arm64/mm: Disable barrier batching in interrupt contexts Commit `5fdd05efa1` ("arm64/mm: Batch barriers when updating kernel mappings") enabled arm64 kernels to track "lazy mmu mode" using TIF flags in order to defer barriers until exiting the mode. At the same time, it added warnings to check that pte manipulations were never performed in interrupt context, because the tracking implementation could not deal with nesting. But it turns out that some debug features (e.g. KFENCE, DEBUG_PAGEALLOC) do manipulate ptes in softirq context, which triggered the warnings. So let's take the simplest and safest route and disable the batching optimization in interrupt contexts. This makes these users no worse off than prior to the optimization. Additionally the known offenders are debug features that only manipulate a single PTE, so there is no performance gain anyway. There may be some obscure case of encrypted/decrypted DMA with the dma_free_coherent called from an interrupt context, but again, this is no worse off than prior to the commit. Some options for supporting nesting were considered, but there is a difficult to solve problem if any code manipulates ptes within interrupt context but outside of a lazy mmu region. If this case exists, the code would expect the updates to be immediate, but because the task context may have already been in lazy mmu mode, the updates would be deferred, which could cause incorrect behaviour. This problem is avoided by always ensuring updates within interrupt context are immediate. 
Fixes: `5fdd05efa1` ("arm64/mm: Batch barriers when updating kernel mappings") Reported-by: syzbot+5c0d9392e042f41d45c5@syzkaller.appspotmail.com Closes: https://lore.kernel.org/linux-arm-kernel/681f2a09.050a0220.f2294.0006.GAE@google.com/ Signed-off-by: Ryan Roberts <ryan.roberts@arm.com> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Link: https://lore.kernel.org/r/20250512102242.4156463-1-ryan.roberts@arm.com Signed-off-by: Will Deacon <will@kernel.org>	2025-05-14 13:27:55 +01:00
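The rule described in the entry above (b81c688426) is that updates made in interrupt context always get their barriers immediately, while task context may still batch them. A small user-space model makes the interaction visible. This is a sketch under stated assumptions, not the kernel code: `struct cpu_model` and all function names are hypothetical, and barriers are represented by counters.

```c
#include <stdbool.h>

/* Hypothetical model state; counters stand in for real barriers. */
struct cpu_model {
	bool in_interrupt;      /* true while "running" an interrupt handler */
	bool lazy_mmu;          /* task context is inside lazy mmu mode */
	bool pending;           /* barriers owed when lazy mmu mode ends */
	unsigned int immediate; /* barriers emitted right after an update */
	unsigned int deferred;  /* barriers emitted on leaving lazy mode */
};

static void enter_lazy(struct cpu_model *c)
{
	if (c->in_interrupt)
		return; /* batching is disabled in interrupt context */
	c->lazy_mmu = true;
}

static void leave_lazy(struct cpu_model *c)
{
	if (c->in_interrupt)
		return; /* never touch the task's lazy state from here */
	if (c->pending)
		c->deferred++;
	c->lazy_mmu = false;
	c->pending = false;
}

/* Model of writing one valid kernel-space entry. */
static void update_kernel_pte(struct cpu_model *c)
{
	if (c->in_interrupt || !c->lazy_mmu) {
		c->immediate++; /* update is visible straight away */
		return;
	}
	c->pending = true;      /* defer until leave_lazy() */
}
```

In this model an interrupt that arrives while the task is in lazy mode neither defers its own update nor disturbs the task's pending state, which corresponds to the "no worse off than prior to the optimization" property the commit message describes.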
Matthew Wilcox (Oracle)	5071ea3d7b	arch: remove mk_pmd() There are now no callers of mk_huge_pmd() and mk_pmd(). Remove them. Link: https://lkml.kernel.org/r/20250402181709.2386022-12-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Zi Yan <ziy@nvidia.com> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Andreas Larsson <andreas@gaisler.com> Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David Hildenbrand <david@redhat.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Johannes Berg <johannes@sipsolutions.net> Cc: Muchun Song <muchun.song@linux.dev> Cc: Richard Weinberger <richard@nod.at> Cc: <x86@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-05-11 17:48:04 -07:00
Matthew Wilcox (Oracle)	cb5b13cd6c	mm: introduce a common definition of mk_pte() Most architectures simply call pfn_pte(). Centralise that as the normal definition and remove the definition of mk_pte() from the architectures which have either that exact definition or something similar. Link: https://lkml.kernel.org/r/20250402181709.2386022-3-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> # m68k Acked-by: David Hildenbrand <david@redhat.com> Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com> # s390 Cc: Zi Yan <ziy@nvidia.com> Cc: Andreas Larsson <andreas@gaisler.com> Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Johannes Berg <johannes@sipsolutions.net> Cc: Muchun Song <muchun.song@linux.dev> Cc: Richard Weinberger <richard@nod.at> Cc: <x86@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-05-11 17:48:02 -07:00
Ryan Roberts	5fdd05efa1	arm64/mm: Batch barriers when updating kernel mappings Because the kernel can't tolerate page faults for kernel mappings, when setting a valid, kernel space pte (or pmd/pud/p4d/pgd), it emits a dsb(ishst) to ensure that the store to the pgtable is observed by the table walker immediately. Additionally it emits an isb() to ensure that any already speculatively determined invalid mapping fault gets canceled. We can improve the performance of vmalloc operations by batching these barriers until the end of a set of entry updates. arch_enter_lazy_mmu_mode() and arch_leave_lazy_mmu_mode() provide the required hooks. vmalloc improves by up to 30% as a result. Two new TIF_ flags are created; TIF_LAZY_MMU tells us if the task is in the lazy mode and can therefore defer any barriers until exit from the lazy mode. TIF_LAZY_MMU_PENDING is used to remember if any pte operation was performed while in the lazy mode that required barriers. Then when leaving lazy mode, if that flag is set, we emit the barriers. Since arch_enter_lazy_mmu_mode() and arch_leave_lazy_mmu_mode() are used for both user and kernel mappings, we need the second flag to avoid emitting barriers unnecessarily if only user mappings were updated. Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Ryan Roberts <ryan.roberts@arm.com> Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Tested-by: Luiz Capitulino <luizcap@redhat.com> Link: https://lore.kernel.org/r/20250422081822.1836315-12-ryan.roberts@arm.com Signed-off-by: Will Deacon <will@kernel.org>	2025-05-09 13:43:08 +01:00
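The two-flag scheme described in the entry above (5fdd05efa1) can be modelled in a few lines. The sketch below is a user-space illustration under stated assumptions rather than the arm64 implementation: the two booleans stand in for TIF_LAZY_MMU and TIF_LAZY_MMU_PENDING, a counter stands in for the dsb(ishst)/isb() pair, and `struct task_model` and the function names are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical per-task state mirroring the two flags in the commit. */
struct task_model {
	bool lazy_mmu;         /* models TIF_LAZY_MMU */
	bool lazy_mmu_pending; /* models TIF_LAZY_MMU_PENDING */
	unsigned int barriers; /* counts modelled dsb(ishst)+isb() pairs */
};

static void emit_barriers(struct task_model *t)
{
	t->barriers++;
}

static void enter_lazy_mmu(struct task_model *t)
{
	t->lazy_mmu = true;
}

/* Called after a valid kernel-space entry has been written. */
static void queue_barriers(struct task_model *t)
{
	if (t->lazy_mmu)
		t->lazy_mmu_pending = true; /* remember, emit on exit */
	else
		emit_barriers(t);           /* not batching: emit now */
}

static void leave_lazy_mmu(struct task_model *t)
{
	if (t->lazy_mmu_pending)
		emit_barriers(t);
	t->lazy_mmu = false;
	t->lazy_mmu_pending = false;
}

/* Hypothetical driver: write nr kernel entries, batched or not. */
static unsigned int update_kernel_entries(unsigned int nr, bool batched)
{
	struct task_model t = { 0 };
	unsigned int i;

	if (batched)
		enter_lazy_mmu(&t);
	for (i = 0; i < nr; i++)
		queue_barriers(&t);
	if (batched)
		leave_lazy_mmu(&t);

	return t.barriers;
}
```

Calling `update_kernel_entries(0, true)` models a lazy section that touched only user mappings: the pending flag is never set, so no barriers are emitted on exit, which is the reason the commit gives for needing the second flag.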
Ryan Roberts	f89b399e8d	arm64/mm: Hoist barriers out of set_ptes_anysz() loop set_ptes_anysz() previously called __set_pte() for each PTE in the range, which would conditionally issue a DSB and ISB to make the new PTE value immediately visible to the table walker if the new PTE was valid and for kernel space. We can do better than this; let's hoist those barriers out of the loop so that they are only issued once at the end of the loop. We then reduce the cost by the number of PTEs in the range. Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Signed-off-by: Ryan Roberts <ryan.roberts@arm.com> Tested-by: Luiz Capitulino <luizcap@redhat.com> Link: https://lore.kernel.org/r/20250422081822.1836315-7-ryan.roberts@arm.com Signed-off-by: Will Deacon <will@kernel.org>	2025-05-09 13:43:07 +01:00
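The hoisting described in the entry above (f89b399e8d) is a loop transformation, and its effect on barrier count is easy to show in a model. The sketch below is illustrative only: the `MODEL_` flag bits, the barrier counter, and both function names are hypothetical, and the "valid and kernel-space" test is a simplified stand-in for whatever check the real code performs.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag bits for this model only. */
#define MODEL_PTE_VALID (UINT64_C(1) << 0)
#define MODEL_PTE_USER  (UINT64_C(1) << 6)

/* Counts modelled DSB+ISB pairs. */
static unsigned long barrier_count;

static void model_barriers(void)
{
	barrier_count++;
}

/* Barriers are only needed for valid, kernel-space entries. */
static bool needs_barriers(uint64_t pte)
{
	return (pte & MODEL_PTE_VALID) && !(pte & MODEL_PTE_USER);
}

/* Before: a conditional barrier pair after every entry. */
static void set_ptes_per_entry(uint64_t *ptep, uint64_t pte,
			       size_t nr, uint64_t step)
{
	for (size_t i = 0; i < nr; i++) {
		ptep[i] = pte;
		if (needs_barriers(pte))
			model_barriers();
		pte += step; /* step must not disturb the flag bits */
	}
}

/* After: the same stores, with one barrier pair at the end. */
static void set_ptes_hoisted(uint64_t *ptep, uint64_t pte,
			     size_t nr, uint64_t step)
{
	bool need = needs_barriers(pte); /* flags are shared by the range */

	for (size_t i = 0; i < nr; i++) {
		ptep[i] = pte;
		pte += step;
	}
	if (need && nr)
		model_barriers();
}
```

For a range of nr valid kernel entries the first version counts nr barrier pairs and the second counts one, matching the commit's statement that the cost is reduced by the number of PTEs in the range.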
Ryan Roberts	ef493d2343	arm64/mm: Refactor __set_ptes() and __ptep_get_and_clear() Refactor __set_ptes(), set_pmd_at() and set_pud_at() so that they are all a thin wrapper around a new common __set_ptes_anysz(), which takes pgsize parameter. Additionally, refactor __ptep_get_and_clear() and pmdp_huge_get_and_clear() to use a new common __ptep_get_and_clear_anysz() which also takes a pgsize parameter. These changes will permit the huge_pte API to efficiently batch-set pgtable entries and take advantage of the future barrier optimizations. Additionally since the new _anysz() helpers call the correct page_table_check__set() API based on pgsize, this means that huge_ptes will be able to get proper coverage. Currently the huge_pte API always uses the pte API which assumes an entry only covers a single page. Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Ryan Roberts <ryan.roberts@arm.com> Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Tested-by: Luiz Capitulino <luizcap@redhat.com> Link: https://lore.kernel.org/r/20250422081822.1836315-5-ryan.roberts@arm.com Signed-off-by: Will Deacon <will@kernel.org>	2025-05-09 13:43:07 +01:00
Peter Xu	0fff2aa96f	arm64: mm: Drop dead code for pud special bit handling Keith Busch observed some incorrect macros defined in arm64 code [1]. It turns out the two lines should never be needed and won't be exposed to anyone, because aarch64 doesn't select HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD, hence ARCH_SUPPORTS_PUD_PFNMAP is always N. The only archs that support THP PUDs so far are x86 and powerpc. Instead of fixing the lines (with no way to test it..), remove the two lines that are in reality dead code, to avoid confusing readers. Fixes tag is attached to reflect where the wrong macros were introduced, but explicitly not copying stable, because there's no real issue to be fixed. So it's only about removing the dead code so far. [1] https://lore.kernel.org/all/Z9tDjOk-JdV_fCY4@kbusch-mbp.dhcp.thefacebook.com/#t Cc: Alex Williamson <alex.williamson@redhat.com> Cc: Donald Dutile <ddutile@redhat.com> Cc: Will Deacon <will@kernel.org> Fixes: `3e509c9b03` ("mm/arm64: support large pfn mappings") Reported-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Donald Dutile <ddutile@redhat.com> Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Link: https://lore.kernel.org/r/20250320183405.12659-1-peterx@redhat.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2025-03-28 19:21:18 +00:00
Catalin Marinas	8cc14fdcc1	Merge branches 'for-next/amuv1-avg-freq', 'for-next/pkey_unrestricted', 'for-next/sysreg', 'for-next/misc', 'for-next/pgtable-cleanups', 'for-next/kselftest', 'for-next/uaccess-mops', 'for-next/pie-poe-cleanup', 'for-next/cputype-kryo', 'for-next/cca-dma-address', 'for-next/drop-pxd_table_bit' and 'for-next/spectre-bhb-assume-vulnerable', remote-tracking branch 'arm64/for-next/perf' into for-next/core * arm64/for-next/perf: perf/arm_cspmu: Fix missing io.h include perf/arm_cspmu: Add PMEVFILT2R support perf/arm_cspmu: Generalise event filtering perf/arm_cspmu: Move register definitons to header drivers/perf: apple_m1: Support host/guest event filtering drivers/perf: apple_m1: Refactor event select/filter configuration perf/dwc_pcie: fix duplicate pci_dev devices perf/dwc_pcie: fix some unreleased resources perf/arm-cmn: Minor event type housekeeping perf: arm_pmu: Move PMUv3-specific data perf: apple_m1: Don't disable counter in m1_pmu_enable_event() perf: arm_v7_pmu: Don't disable counter in (armv7\|krait_\|scorpion_)pmu_enable_event() perf: arm_v7_pmu: Drop obvious comments for enabling/disabling counters and interrupts perf: arm_pmuv3: Don't disable counter in armv8pmu_enable_event() perf: arm_pmu: Don't disable counter in armpmu_add() perf: arm_pmuv3: Call kvm_vcpu_pmu_resync_el0() before enabling counters perf: arm_pmuv3: Add support for ARM Rainier PMU * for-next/amuv1-avg-freq: : Add support for AArch64 AMUv1-based average freq arm64: Utilize for_each_cpu_wrap for reference lookup arm64: Update AMU-based freq scale factor on entering idle arm64: Provide an AMU-based version of arch_freq_get_on_cpu cpufreq: Introduce an optional cpuinfo_avg_freq sysfs entry cpufreq: Allow arch_freq_get_on_cpu to return an error arch_topology: init capacity_freq_ref to 0 * for-next/pkey_unrestricted: : mm/pkey: Add PKEY_UNRESTRICTED macro selftest/powerpc/mm/pkey: fix build-break introduced by commit `00894c3fc9` selftests/powerpc: Use 
PKEY_UNRESTRICTED macro selftests/mm: Use PKEY_UNRESTRICTED macro mm/pkey: Add PKEY_UNRESTRICTED macro * for-next/sysreg: : arm64 sysreg updates arm64/sysreg: Enforce whole word match for open/close tokens arm64/sysreg: Fix unbalanced closing block arm64/sysreg: Add register fields for HFGWTR2_EL2 arm64/sysreg: Add register fields for HFGRTR2_EL2 arm64/sysreg: Add register fields for HFGITR2_EL2 arm64/sysreg: Add register fields for HDFGWTR2_EL2 arm64/sysreg: Add register fields for HDFGRTR2_EL2 arm64/sysreg: Update register fields for ID_AA64MMFR0_EL1 * for-next/misc: : Miscellaneous arm64 patches arm64: mm: Don't use %pK through printk arm64/fpsimd: Remove unused declaration fpsimd_kvm_prepare() * for-next/pgtable-cleanups: : arm64 pgtable accessors cleanup arm64/mm: Define PTDESC_ORDER arm64/kernel: Always use level 2 or higher for early mappings arm64/hugetlb: Consistently use pud_sect_supported() arm64/mm: Convert __pte_to_phys() and __phys_to_pte_val() as functions * for-next/kselftest: : arm64 kselftest updates kselftest/arm64: mte: Skip the hugetlb tests if MTE not supported on such mappings kselftest/arm64: mte: Use the correct naming for tag check modes in check_hugetlb_options.c * for-next/uaccess-mops: : Implement the uaccess memory copy/set using MOPS instructions arm64: lib: Use MOPS for usercopy routines arm64: mm: Handle PAN faults on uaccess CPY* instructions arm64: extable: Add fixup handling for uaccess CPY* instructions * for-next/pie-poe-cleanup: : PIE/POE helpers cleanup arm64/sysreg: Move POR_EL0_INIT to asm/por.h arm64/sysreg: Rename POE_RXW to POE_RWX arm64/sysreg: Improve PIR/POR helpers * for-next/cputype-kryo: : Add cputype info for some Qualcomm Kryo cores arm64: cputype: Add comments about Qualcomm Kryo 5XX and 6XX cores arm64: cputype: Add QCOM_CPU_PART_KRYO_3XX_GOLD * for-next/cca-dma-address: : Fix DMA address for devices used in realms with Arm CCA arm64: realm: Use aliased addresses for device DMA to shared buffers dma: Introduce 
generic dma_addr_crypted helpers dma: Fix encryption bit clearing for dma_to_phys for-next/drop-pxd_table_bit: : Drop the arm64 PXD_TABLE_BIT (clean-up in preparation for 128-bit PTEs) arm64/mm: Drop PXD_TABLE_BIT arm64/mm: Check pmd_table() in pmd_trans_huge() arm64/mm: Check PUD_TYPE_TABLE in pud_bad() arm64/mm: Check PXD_TYPE_TABLE in [p4d\|pgd]_bad() arm64/mm: Clear PXX_TYPE_MASK and set PXD_TYPE_SECT in [pmd\|pud]_mkhuge() arm64/mm: Clear PXX_TYPE_MASK in mk_[pmd\|pud]_sect_prot() arm64/ptdump: Test PMD_TYPE_MASK for block mapping KVM: arm64: ptdump: Test PMD_TYPE_MASK for block mapping * for-next/spectre-bhb-assume-vulnerable: : Rework Spectre BHB mitigations to not assume "safe" arm64: errata: Add newer ARM cores to the spectre_bhb_loop_affected() lists arm64: cputype: Add MIDR_CORTEX_A76AE arm64: errata: Add KRYO 2XX/3XX/4XX silver cores to Spectre BHB safe list arm64: errata: Assume that unknown CPUs _are_ vulnerable to Spectre BHB arm64: errata: Add QCOM_KRYO_4XX_GOLD to the spectre_bhb_k24_list	2025-03-25 19:32:03 +00:00
Ryan Roberts	d1770e9098	arm64/mm: Check pmd_table() in pmd_trans_huge() Check for pmd_table() in pmd_trans_huge() rather than just checking for the PMD_TABLE_BIT. But ensure all present-invalid entries are handled correctly by always setting PTE_VALID before checking with pmd_table(). Cc: Will Deacon <will@kernel.org> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Ryan Roberts <ryan.roberts@arm.com> Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Link: https://lore.kernel.org/r/20250221044227.1145393-8-anshuman.khandual@arm.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2025-03-12 12:21:00 +00:00
Ryan Roberts	bfb1d2b902	arm64/mm: Check PUD_TYPE_TABLE in pud_bad() pud_bad() is currently defined in terms of pud_table(). Although for some configs, pud_table() is hard-coded to true i.e. when using 64K base pages or when page table levels are less than 3. pud_bad() is intended to check that the pud is configured correctly. Hence let's open-code the same check that the full version of pud_table() uses into pud_bad(). Then it always performs the check regardless of the config. Cc: Will Deacon <will@kernel.org> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Ryan Roberts <ryan.roberts@arm.com> Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Link: https://lore.kernel.org/r/20250221044227.1145393-7-anshuman.khandual@arm.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2025-03-12 12:21:00 +00:00
Anshuman Khandual	4fa8a9c0fc	arm64/mm: Check PXD_TYPE_TABLE in [p4d|pgd]_bad() In [p4d|pgd]_bad(), check the PXD_TYPE_MASK bits of a page table entry against PXD_TYPE_TABLE when determining whether it is a table entry, instead of checking only PXD_TABLE_BIT. Cc: Will Deacon <will@kernel.org> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Link: https://lore.kernel.org/r/20250221044227.1145393-6-anshuman.khandual@arm.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2025-03-12 12:21:00 +00:00
Anshuman Khandual	1601df9e36	arm64/mm: Clear PXX_TYPE_MASK and set PXD_TYPE_SECT in [pmd|pud]_mkhuge() Clear PXX_TYPE_MASK in [pmd|pud]_mkhuge() while creating section mappings instead of just the PXX_TABLE_BIT and also set PXD_TYPE_SECT. Also ensure PTE_VALID does not get modified in these helpers, because present-invalid entries should preserve their state across these helpers. Cc: Will Deacon <will@kernel.org> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Link: https://lore.kernel.org/r/20250221044227.1145393-5-anshuman.khandual@arm.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2025-03-12 12:21:00 +00:00
Anshuman Khandual	dba9548010	arm64/mm: Clear PXX_TYPE_MASK in mk_[pmd|pud]_sect_prot() Clear PXX_TYPE_MASK bits in mk_[pmd|pud]_sect_prot() while creating section mappings instead of just clearing the PXX_TABLE_BIT. Cc: Will Deacon <will@kernel.org> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Link: https://lore.kernel.org/r/20250221044227.1145393-4-anshuman.khandual@arm.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2025-03-12 12:20:59 +00:00
Anshuman Khandual	2d7872f3ae	arm64/mm: Convert __pte_to_phys() and __phys_to_pte_val() as functions When CONFIG_ARM64_PA_BITS_52 is enabled, page table helpers __pte_to_phys() and __phys_to_pte_val() are functions which return phys_addr_t and pteval_t respectively as expected. Without this config, however, they are defined as macros and their return types are implicit. Until now this has worked out correctly as both pte_t and phys_addr_t data types have been 64 bits. But with the introduction of 128 bit page tables, pte_t becomes 128 bits. Hence this ends up with incorrect widths after the conversions, which leads to compiler warnings. Fix these warnings by converting __pte_to_phys() and __phys_to_pte_val() into functions, where the return types are handled explicitly. Cc: Will Deacon <will@kernel.org> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Reviewed-by: Ryan Roberts <ryan.roberts@arm.com> Link: https://lore.kernel.org/r/20250227022412.2015835-1-anshuman.khandual@arm.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2025-03-05 18:16:15 +00:00
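The difference between an implicitly typed macro and an explicitly typed function, which the entry above (2d7872f3ae) relies on, can be shown with `sizeof`. The sketch below is a scaled-down model so that it builds with any C99 compiler: a 64-bit type stands in for the widened page table entry value and a 32-bit type stands in for the physical address. All `model_` names and the mask value are hypothetical and are not taken from the arm64 headers.

```c
#include <stdint.h>

/* Scaled-down stand-ins: the entry value is wider than the address. */
typedef uint64_t model_pteval_t;    /* plays the role of a widened pteval_t */
typedef uint32_t model_phys_addr_t; /* plays the role of phys_addr_t */

/* Hypothetical address field within the entry value. */
#define MODEL_ADDR_MASK UINT64_C(0x00000000fffff000)

/*
 * Macro form: the result has whatever type the expression yields, so it
 * silently follows the width of the entry value.
 */
#define model_pte_to_phys_macro(val) ((val) & MODEL_ADDR_MASK)

/* Function form: the return type is stated, and the narrowing is explicit. */
static inline model_phys_addr_t model_pte_to_phys(model_pteval_t val)
{
	return (model_phys_addr_t)(val & MODEL_ADDR_MASK);
}

/* Function form of the reverse conversion, widening explicitly. */
static inline model_pteval_t model_phys_to_pte_val(model_phys_addr_t phys)
{
	return (model_pteval_t)phys;
}
```

In this model the macro's result is as wide as the entry value while the function's result is always address-sized, which mirrors why making the return types explicit removes the width mismatch once the entry type grows.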
Will Deacon	602ffd4ce3	Merge branch 'for-next/mm' into for-next/core * for-next/mm: arm64: mm: Test for pmd_sect() in vmemmap_check_pmd() arm64/mm: Replace open encodings with PXD_TABLE_BIT arm64/mm: Rename pte_mkpresent() as pte_mkvalid() arm64: Kconfig: force ARM64_PAN=y when enabling TTBR0 sw PAN arm64/kvm: Avoid invalid physical addresses to signal owner updates arm64/kvm: Configure HYP TCR.PS/DS based on host stage1 arm64/mm: Override PARange for !LPA2 and use it consistently arm64/mm: Reduce PA space to 48 bits when LPA2 is not enabled	2025-01-17 13:52:33 +00:00
Anshuman Khandual	fe2169f556	arm64/mm: Replace open encodings with PXD_TABLE_BIT [pgd|p4d]_bad() helpers have open encodings for their respective table bits which can be replaced with corresponding macros. This makes things clearer, thus improving their readability as well. Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Reviewed-by: Ryan Roberts <ryan.roberts@arm.com> Reviewed-by: Gavin Shan <gshan@redhat.com> Link: https://lore.kernel.org/r/20250107015529.798319-1-anshuman.khandual@arm.com Signed-off-by: Will Deacon <will@kernel.org>	2025-01-07 16:47:45 +00:00
Anshuman Khandual	1692265830	arm64/mm: Rename pte_mkpresent() as pte_mkvalid() pte_present() is no longer synonymous with pte_valid(), as it also tests for pte_present_invalid(). Hence pte_mkpresent() is misleading, because all it does is make an entry mapped, by setting PTE_VALID. Hence rename the helper as pte_mkvalid(), which reflects its functionality appropriately. Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Link: https://lore.kernel.org/r/20250107023016.829416-1-anshuman.khandual@arm.com Signed-off-by: Will Deacon <will@kernel.org>	2025-01-07 16:47:33 +00:00
Zhu Jun	e281bd2299	arm64: asm: Fix typo in pgtable.h The word 'trasferring' is wrong, so fix it. Signed-off-by: Zhu Jun <zhujun2@cmss.chinamobile.com> Link: https://lore.kernel.org/r/20241203093323.7831-1-zhujun2@cmss.chinamobile.com Signed-off-by: Will Deacon <will@kernel.org>	2024-12-10 11:34:45 +00:00
Anshuman Khandual	a0e33f528e	arm64/mm: Replace open encodings with PXD_TABLE_BIT [pgd|p4d]_bad() helpers have open encodings for their respective table bits which can be replaced with corresponding macros. This makes things clearer, thus improving their readability as well. Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Reviewed-by: Ryan Roberts <ryan.roberts@arm.com> Reviewed-by: Gavin Shan <gshan@redhat.com> Link: https://lore.kernel.org/r/20241202083850.73207-1-anshuman.khandual@arm.com Signed-off-by: Will Deacon <will@kernel.org>	2024-12-10 11:32:57 +00:00
Catalin Marinas	5a4332062e	Merge branches 'for-next/gcs', 'for-next/probes', 'for-next/asm-offsets', 'for-next/tlb', 'for-next/misc', 'for-next/mte', 'for-next/sysreg', 'for-next/stacktrace', 'for-next/hwcap3', 'for-next/kselftest', 'for-next/crc32', 'for-next/guest-cca', 'for-next/haft' and 'for-next/scs', remote-tracking branch 'arm64/for-next/perf' into for-next/core * arm64/for-next/perf: perf: Switch back to struct platform_driver::remove() perf: arm_pmuv3: Add support for Samsung Mongoose PMU dt-bindings: arm: pmu: Add Samsung Mongoose core compatible perf/dwc_pcie: Fix typos in event names perf/dwc_pcie: Add support for Ampere SoCs ARM: pmuv3: Add missing write_pmuacr() perf/marvell: Marvell PEM performance monitor support perf/arm_pmuv3: Add PMUv3.9 per counter EL0 access control perf/dwc_pcie: Convert the events with mixed case to lowercase perf/cxlpmu: Support missing events in 3.1 spec perf: imx_perf: add support for i.MX91 platform dt-bindings: perf: fsl-imx-ddr: Add i.MX91 compatible drivers perf: remove unused field pmu_node * for-next/gcs: (42 commits) : arm64 Guarded Control Stack user-space support kselftest/arm64: Fix missing printf() argument in gcs/gcs-stress.c arm64/gcs: Fix outdated ptrace documentation kselftest/arm64: Ensure stable names for GCS stress test results kselftest/arm64: Validate that GCS push and write permissions work kselftest/arm64: Enable GCS for the FP stress tests kselftest/arm64: Add a GCS stress test kselftest/arm64: Add GCS signal tests kselftest/arm64: Add test coverage for GCS mode locking kselftest/arm64: Add a GCS test program built with the system libc kselftest/arm64: Add very basic GCS test program kselftest/arm64: Always run signals tests with GCS enabled kselftest/arm64: Allow signals tests to specify an expected si_code kselftest/arm64: Add framework support for GCS to signal handling tests kselftest/arm64: Add GCS as a detected feature in the signal tests kselftest/arm64: Verify the GCS hwcap arm64: Add 
Kconfig for Guarded Control Stack (GCS) arm64/ptrace: Expose GCS via ptrace and core files arm64/signal: Expose GCS state in signal frames arm64/signal: Set up and restore the GCS context for signal handlers arm64/mm: Implement map_shadow_stack() ... * for-next/probes: : Various arm64 uprobes/kprobes cleanups arm64: insn: Simulate nop instruction for better uprobe performance arm64: probes: Remove probe_opcode_t arm64: probes: Cleanup kprobes endianness conversions arm64: probes: Move kprobes-specific fields arm64: probes: Fix uprobes for big-endian kernels arm64: probes: Fix simulate_ldr_literal() arm64: probes: Remove broken LDR (literal) uprobe support for-next/asm-offsets: : arm64 asm-offsets.c cleanup (remove unused offsets) arm64: asm-offsets: remove PREEMPT_DISABLE_OFFSET arm64: asm-offsets: remove DMA_{TO,FROM}_DEVICE arm64: asm-offsets: remove VM_EXEC and PAGE_SZ arm64: asm-offsets: remove MM_CONTEXT_ID arm64: asm-offsets: remove COMPAT_{RT_,SIGFRAME_REGS_OFFSET arm64: asm-offsets: remove VMA_VM_* arm64: asm-offsets: remove TSK_ACTIVE_MM * for-next/tlb: : TLB flushing optimisations arm64: optimize flush tlb kernel range arm64: tlbflush: add __flush_tlb_range_limit_excess() * for-next/misc: : Miscellaneous patches arm64: tls: Fix context-switching of tpidrro_el0 when kpti is enabled arm64/ptrace: Clarify documentation of VL configuration via ptrace acpi/arm64: remove unnecessary cast arm64/mm: Change protval as 'pteval_t' in map_range() arm64: uprobes: Optimize cache flushes for xol slot acpi/arm64: Adjust error handling procedure in gtdt_parse_timer_block() arm64: fix .data.rel.ro size assertion when CONFIG_LTO_CLANG arm64/ptdump: Test both PTE_TABLE_BIT and PTE_VALID for block mappings arm64/mm: Sanity check PTE address before runtime P4D/PUD folding arm64/mm: Drop setting PTE_TYPE_PAGE in pte_mkcont() ACPI: GTDT: Tighten the check for the array of platform timer structures arm64/fpsimd: Fix a typo arm64: Expose ID_AA64ISAR1_EL1.XS to sanitised feature 
consumers arm64: Return early when break handler is found on linked-list arm64/mm: Re-organize arch_make_huge_pte() arm64/mm: Drop _PROT_SECT_DEFAULT arm64: Add command-line override for ID_AA64MMFR0_EL1.ECV arm64: head: Drop SWAPPER_TABLE_SHIFT arm64: cpufeature: add POE to cpucap_is_possible() arm64/mm: Change pgattr_change_is_safe() arguments as pteval_t * for-next/mte: : Various MTE improvements selftests: arm64: add hugetlb mte tests hugetlb: arm64: add mte support * for-next/sysreg: : arm64 sysreg updates arm64/sysreg: Update ID_AA64MMFR1_EL1 to DDI0601 2024-09 * for-next/stacktrace: : arm64 stacktrace improvements arm64: preserve pt_regs::stackframe during exec() arm64: stacktrace: unwind exception boundaries arm64: stacktrace: split unwind_consume_stack() arm64: stacktrace: report recovered PCs arm64: stacktrace: report source of unwind data arm64: stacktrace: move dump_backtrace() to kunwind_stack_walk() arm64: use a common struct frame_record arm64: pt_regs: swap 'unused' and 'pmr' fields arm64: pt_regs: rename "pmr_save" -> "pmr" arm64: pt_regs: remove stale big-endian layout arm64: pt_regs: assert pt_regs is a multiple of 16 bytes for-next/hwcap3: : Add AT_HWCAP3 support for arm64 (also wire up AT_HWCAP4) arm64: Support AT_HWCAP3 binfmt_elf: Wire up AT_HWCAP3 at AT_HWCAP4 * for-next/kselftest: (30 commits) : arm64 kselftest fixes/cleanups kselftest/arm64: Try harder to generate different keys during PAC tests kselftest/arm64: Don't leak pipe fds in pac.exec_sign_all() kselftest/arm64: Corrupt P0 in the irritator when testing SSVE kselftest/arm64: Add FPMR coverage to fp-ptrace kselftest/arm64: Expand the set of ZA writes fp-ptrace does kselftets/arm64: Use flag bits for features in fp-ptrace assembler code kselftest/arm64: Enable build of PAC tests with LLVM=1 kselftest/arm64: Check that SVCR is 0 in signal handlers kselftest/arm64: Fix printf() compiler warnings in the arm64 syscall-abi.c tests kselftest/arm64: Fix printf() warning in the arm64 MTE 
prctl() test kselftest/arm64: Fix printf() compiler warnings in the arm64 fp tests kselftest/arm64: Fix build with stricter assemblers kselftest/arm64: Test signal handler state modification in fp-stress kselftest/arm64: Provide a SIGUSR1 handler in the kernel mode FP stress test kselftest/arm64: Implement irritators for ZA and ZT kselftest/arm64: Remove unused ADRs from irritator handlers kselftest/arm64: Correct misleading comments on fp-stress irritators kselftest/arm64: Poll less often while waiting for fp-stress children kselftest/arm64: Increase frequency of signal delivery in fp-stress kselftest/arm64: Fix encoding for SVE B16B16 test ... * for-next/crc32: : Optimise CRC32 using PMULL instructions arm64/crc32: Implement 4-way interleave using PMULL arm64/crc32: Reorganize bit/byte ordering macros arm64/lib: Handle CRC-32 alternative in C code * for-next/guest-cca: : Support for running Linux as a guest in Arm CCA arm64: Document Arm Confidential Compute virt: arm-cca-guest: TSM_REPORT support for realms arm64: Enable memory encrypt for Realms arm64: mm: Avoid TLBI when marking pages as valid arm64: Enforce bounce buffers for realm DMA efi: arm64: Map Device with Prot Shared arm64: rsi: Map unprotected MMIO as decrypted arm64: rsi: Add support for checking whether an MMIO is protected arm64: realm: Query IPA size from the RMM arm64: Detect if in a realm and set RIPAS RAM arm64: rsi: Add RSI definitions * for-next/haft: : Support for arm64 FEAT_HAFT arm64: pgtable: Warn unexpected pmdp_test_and_clear_young() arm64: Enable ARCH_HAS_NONLEAF_PMD_YOUNG arm64: Add support for FEAT_HAFT arm64: setup: name 'tcr2' register arm64/sysreg: Update ID_AA64MMFR1_EL1 register * for-next/scs: : Dynamic shadow call stack fixes arm64/scs: Drop unused prototype __pi_scs_patch_vmlinux() arm64/scs: Deal with 64-bit relative offsets in FDE frames arm64/scs: Fix handling of DWARF augmentation data in CIE/FDE frames	2024-11-14 12:07:16 +00:00
Yicong Yang	b349a5a2b6	arm64: pgtable: Warn unexpected pmdp_test_and_clear_young() Young bit operation on PMD table entry is only supported if FEAT_HAFT enabled system wide. Add a warning for notifying the misbehaviour. Signed-off-by: Yicong Yang <yangyicong@hisilicon.com> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Link: https://lore.kernel.org/r/20241102104235.62560-6-yangyicong@huawei.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2024-11-05 13:21:14 +00:00
Yicong Yang	62df5870eb	arm64: Enable ARCH_HAS_NONLEAF_PMD_YOUNG With the support of FEAT_HAFT, the NONLEAF_PMD_YOUNG can be enabled on arm64 since the hardware is capable of updating the AF flag for PMD table descriptor. Since the AF bit of the table descriptor shares the same bit position in block descriptors, we only need to implement arch_has_hw_nonleaf_pmd_young() and select related configs. The related pmd_young test/update operations keeps the same with and already implemented for transparent page support. Currently ARCH_HAS_NONLEAF_PMD_YOUNG is used to improve the efficiency of lru-gen aging. Signed-off-by: Yicong Yang <yangyicong@hisilicon.com> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Link: https://lore.kernel.org/r/20241102104235.62560-5-yangyicong@huawei.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2024-11-05 13:21:14 +00:00

376 Commits