38964 Commits

Author SHA1 Message Date
Linus Torvalds
32a92f8c89 Convert more 'alloc_obj' cases to default GFP_KERNEL arguments
This converts some of the visually simpler cases that have been split
over multiple lines.  I only did the ones that are easy to verify the
resulting diff by having just that final GFP_KERNEL argument on the next
line.

Somebody should probably do a proper coccinelle script for this, but for
me the trivial script actually resulted in an assertion failure in the
middle of the script.  I probably had made it a bit _too_ trivial.

So after fighting that far a while I decided to just do some of the
syntactically simpler cases with variations of the previous 'sed'
scripts.

The more syntactically complex multi-line cases would mostly really want
whitespace cleanup anyway.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2026-02-21 20:03:00 -08:00
Linus Torvalds
323bbfcf1e Convert 'alloc_flex' family to use the new default GFP_KERNEL argument
This is the exact same thing as the 'alloc_obj()' version, only much
smaller because there are a lot fewer users of the *alloc_flex()
interface.

As with alloc_obj() version, this was done entirely with mindless brute
force, using the same script, except using 'flex' in the pattern rather
than 'objs*'.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2026-02-21 17:09:51 -08:00
Linus Torvalds
bf4afc53b7 Convert 'alloc_obj' family to use the new default GFP_KERNEL argument
This was done entirely with mindless brute force, using

    git grep -l '\<k[vmz]*alloc_objs*(.*, GFP_KERNEL)' |
        xargs sed -i 's/\(alloc_objs*(.*\), GFP_KERNEL)/\1)/'

to convert the new alloc_obj() users that had a simple GFP_KERNEL
argument to just drop that argument.

Note that due to the extreme simplicity of the scripting, any slightly
more complex cases spread over multiple lines would not be triggered:
they definitely exist, but this covers the vast bulk of the cases, and
the resulting diff is also then easier to check automatically.

For the same reason the 'flex' versions will be done as a separate
conversion.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2026-02-21 17:09:51 -08:00
Kees Cook
69050f8d6d treewide: Replace kmalloc with kmalloc_obj for non-scalar types
This is the result of running the Coccinelle script from
scripts/coccinelle/api/kmalloc_objs.cocci. The script is designed to
avoid scalar types (which need careful case-by-case checking), and
instead replace kmalloc-family calls that allocate struct or union
object instances:

Single allocations:	kmalloc(sizeof(TYPE), ...)
are replaced with:	kmalloc_obj(TYPE, ...)

Array allocations:	kmalloc_array(COUNT, sizeof(TYPE), ...)
are replaced with:	kmalloc_objs(TYPE, COUNT, ...)

Flex array allocations:	kmalloc(struct_size(PTR, FAM, COUNT), ...)
are replaced with:	kmalloc_flex(*PTR, FAM, COUNT, ...)

(where TYPE may also be *VAR)

The resulting allocations no longer return "void *", instead returning
"TYPE *".

Signed-off-by: Kees Cook <kees@kernel.org>
2026-02-21 01:02:28 -08:00
Linus Torvalds
43257b2ebd Merge tag 'kmalloc_obj-prep-v7.0-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull kmalloc_obj prep from Kees Cook:
 "Fixes for return types to prepare for the kmalloc_obj treewide
  conversion, that haven't yet appeared during the merge window:
  dm-crypt, dm-zoned, drm/msm, and arm64 kvm"

* tag 'kmalloc_obj-prep-v7.0-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
  KVM: arm64: vgic: Handle const qualifier from gic_kvm_info allocation type
  drm/msm: Adjust msm_iommu_pagetable_prealloc_allocate() allocation type
  dm: dm-zoned: Adjust dmz_load_mapping() allocation type
  dm-crypt: Adjust crypt_alloc_tfms_aead() allocation type
2026-02-20 12:51:07 -08:00
Linus Torvalds
a27a5c0f08 Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 fixes from Will Deacon:
 "Two arm64 fixes: one fixes a warning that started showing up with
  gcc 16 and the other fixes a lockup in udelay() when running on a
  vCPU loaded on a CPU with the new-fangled WFIT instruction:

   - Fix compiler warning from huge_pte_clear() with GCC 16

   - Fix hang in udelay() on systems with WFIT by consistently using the
     virtual counter to calculate the delta"

* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
  arm64: hugetlbpage: avoid unused-but-set-parameter warning (gcc-16)
  arm64: Force the use of CNTVCT_EL0 in __delay()
2026-02-20 09:44:39 -08:00
Kees Cook
c732084c89 KVM: arm64: vgic: Handle const qualifier from gic_kvm_info allocation type
In preparation for making the kmalloc family of allocators type aware,
we need to make sure that the returned type from the allocation matches
the type of the variable being assigned. (Before, the allocator would
always return "void *", which can be implicitly cast to any pointer type.)

The assigned type is "struct gic_kvm_info", but the returned type,
while matching, is const qualified. To get them exactly matching, just
use the dereferenced pointer for the sizeof().

Link: https://patch.msgid.link/20260206223022.it.052-kees@kernel.org
Signed-off-by: Kees Cook <kees@kernel.org>
2026-02-19 10:14:02 -08:00
Arnd Bergmann
729a2e8e9a arm64: hugetlbpage: avoid unused-but-set-parameter warning (gcc-16)
gcc-16 warns about an instance that older compilers did not:

arch/arm64/mm/hugetlbpage.c: In function 'huge_pte_clear':
arch/arm64/mm/hugetlbpage.c:369:57: error: parameter 'addr' set but not used [-Werror=unused-but-set-parameter=]

The issue here is that __pte_clear() does not actually use its second
argument, but when CONFIG_ARM64_CONTPTE is enabled it still gets
updated.

Replace the macro with an inline function to let the compiler see
the argument getting passed down.

Suggested-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Dev Jain <dev.jain@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
2026-02-19 12:33:52 +00:00
Marc Zyngier
29cc0f3aa7 arm64: Force the use of CNTVCT_EL0 in __delay()
Quentin forwards a report from Hyesoo Yu, describing an interesting
problem with the use of WFxT in __delay() when a vcpu is loaded and
that KVM is *not* in VHE mode (either nVHE or hVHE).

In this case, CNTVOFF_EL2 is set to a non-zero value to reflect the
state of the guest virtual counter. At the same time, __delay() is
using get_cycles() to read the counter value, which is indirected to
reading CNTPCT_EL0.

The core of the issue is that WFxT is using the *virtual* counter,
while the kernel is using the physical counter, and that the offset
introduces a really bad discrepancy between the two.

Fix this by forcing the use of CNTVCT_EL0, making __delay() consistent
irrespective of the value of CNTVOFF_EL2.

Reported-by: Hyesoo Yu <hyesoo.yu@samsung.com>
Reported-by: Quentin Perret <qperret@google.com>
Reviewed-by: Quentin Perret <qperret@google.com>
Fixes: 7d26b0516a ("arm64: Use WFxT for __delay() when possible")
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/ktosachvft2cgqd5qkukn275ugmhy6xrhxur4zqpdxlfr3qh5h@o3zrfnsq63od
Cc: stable@vger.kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
2026-02-19 12:32:54 +00:00
Linus Torvalds
eeccf287a2 Merge tag 'mm-stable-2026-02-18-19-48' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull more MM  updates from Andrew Morton:

 - "mm/vmscan: fix demotion targets checks in reclaim/demotion" fixes a
   couple of issues in the demotion code - pages were failed demotion
   and were finding themselves demoted into disallowed nodes (Bing Jiao)

 - "Remove XA_ZERO from error recovery of dup_mmap()" fixes a rare
   mapledtree race and performs a number of cleanups (Liam Howlett)

 - "mm: add bitmap VMA flag helpers and convert all mmap_prepare to use
   them" implements a lot of cleanups following on from the conversion
   of the VMA flags into a bitmap (Lorenzo Stoakes)

 - "support batch checking of references and unmapping for large folios"
   implements batching to greatly improve the performance of reclaiming
   clean file-backed large folios (Baolin Wang)

 - "selftests/mm: add memory failure selftests" does as claimed (Miaohe
   Lin)

* tag 'mm-stable-2026-02-18-19-48' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (36 commits)
  mm/page_alloc: clear page->private in free_pages_prepare()
  selftests/mm: add memory failure dirty pagecache test
  selftests/mm: add memory failure clean pagecache test
  selftests/mm: add memory failure anonymous page test
  mm: rmap: support batched unmapping for file large folios
  arm64: mm: implement the architecture-specific clear_flush_young_ptes()
  arm64: mm: support batch clearing of the young flag for large folios
  arm64: mm: factor out the address and ptep alignment into a new helper
  mm: rmap: support batched checks of the references for large folios
  tools/testing/vma: add VMA userland tests for VMA flag functions
  tools/testing/vma: separate out vma_internal.h into logical headers
  tools/testing/vma: separate VMA userland tests into separate files
  mm: make vm_area_desc utilise vma_flags_t only
  mm: update all remaining mmap_prepare users to use vma_flags_t
  mm: update shmem_[kernel]_file_*() functions to use vma_flags_t
  mm: update secretmem to use VMA flags on mmap_prepare
  mm: update hugetlbfs to use VMA flags on mmap_prepare
  mm: add basic VMA flag operation helper functions
  tools: bitmap: add missing bitmap_[subset(), andnot()]
  mm: add mk_vma_flags() bitmap flag macro helper
  ...
2026-02-18 20:50:32 -08:00
Linus Torvalds
cb5573868e Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull KVM updates from Paolo Bonzini:
 "Loongarch:

   - Add more CPUCFG mask bits

   - Improve feature detection

   - Add lazy load support for FPU and binary translation (LBT) register
     state

   - Fix return value for memory reads from and writes to in-kernel
     devices

   - Add support for detecting preemption from within a guest

   - Add KVM steal time test case to tools/selftests

  ARM:

   - Add support for FEAT_IDST, allowing ID registers that are not
     implemented to be reported as a normal trap rather than as an UNDEF
     exception

   - Add sanitisation of the VTCR_EL2 register, fixing a number of
     UXN/PXN/XN bugs in the process

   - Full handling of RESx bits, instead of only RES0, and resulting in
     SCTLR_EL2 being added to the list of sanitised registers

   - More pKVM fixes for features that are not supposed to be exposed to
     guests

   - Make sure that MTE being disabled on the pKVM host doesn't give it
     the ability to attack the hypervisor

   - Allow pKVM's host stage-2 mappings to use the Force Write Back
     version of the memory attributes by using the "pass-through'
     encoding

   - Fix trapping of ICC_DIR_EL1 on GICv5 hosts emulating GICv3 for the
     guest

   - Preliminary work for guest GICv5 support

   - A bunch of debugfs fixes, removing pointless custom iterators
     stored in guest data structures

   - A small set of FPSIMD cleanups

   - Selftest fixes addressing the incorrect alignment of page
     allocation

   - Other assorted low-impact fixes and spelling fixes

  RISC-V:

   - Fixes for issues discoverd by KVM API fuzzing in
     kvm_riscv_aia_imsic_has_attr(), kvm_riscv_aia_imsic_rw_attr(), and
     kvm_riscv_vcpu_aia_imsic_update()

   - Allow Zalasr, Zilsd and Zclsd extensions for Guest/VM

   - Transparent huge page support for hypervisor page tables

   - Adjust the number of available guest irq files based on MMIO
     register sizes found in the device tree or the ACPI tables

   - Add RISC-V specific paging modes to KVM selftests

   - Detect paging mode at runtime for selftests

  s390:

   - Performance improvement for vSIE (aka nested virtualization)

   - Completely new memory management. s390 was a special snowflake that
     enlisted help from the architecture's page table management to
     build hypervisor page tables, in particular enabling sharing the
     last level of page tables. This however was a lot of code (~3K
     lines) in order to support KVM, and also blocked several features.
     The biggest advantages is that the page size of userspace is
     completely independent of the page size used by the guest:
     userspace can mix normal pages, THPs and hugetlbfs as it sees fit,
     and in fact transparent hugepages were not possible before. It's
     also now possible to have nested guests and guests with huge pages
     running on the same host

   - Maintainership change for s390 vfio-pci

   - Small quality of life improvement for protected guests

  x86:

   - Add support for giving the guest full ownership of PMU hardware
     (contexted switched around the fastpath run loop) and allowing
     direct access to data MSRs and PMCs (restricted by the vPMU model).

     KVM still intercepts access to control registers, e.g. to enforce
     event filtering and to prevent the guest from profiling sensitive
     host state. This is more accurate, since it has no risk of
     contention and thus dropped events, and also has significantly less
     overhead.

     For more information, see the commit message for merge commit
     bf2c3138ae ("Merge tag 'kvm-x86-pmu-6.20' ...")

   - Disallow changing the virtual CPU model if L2 is active, for all
     the same reasons KVM disallows change the model after the first
     KVM_RUN

   - Fix a bug where KVM would incorrectly reject host accesses to PV
     MSRs when running with KVM_CAP_ENFORCE_PV_FEATURE_CPUID enabled,
     even if those were advertised as supported to userspace,

   - Fix a bug with protected guest state (SEV-ES/SNP and TDX) VMs,
     where KVM would attempt to read CR3 configuring an async #PF entry

   - Fail the build if EXPORT_SYMBOL_GPL or EXPORT_SYMBOL is used in KVM
     (for x86 only) to enforce usage of EXPORT_SYMBOL_FOR_KVM_INTERNAL.
     Only a few exports that are intended for external usage, and those
     are allowed explicitly

   - When checking nested events after a vCPU is unblocked, ignore
     -EBUSY instead of WARNing. Userspace can sometimes put the vCPU
     into what should be an impossible state, and spurious exit to
     userspace on -EBUSY does not really do anything to solve the issue

   - Also throw in the towel and drop the WARN on INIT/SIPI being
     blocked when vCPU is in Wait-For-SIPI, which also resulted in
     playing whack-a-mole with syzkaller stuffing architecturally
     impossible states into KVM

   - Add support for new Intel instructions that don't require anything
     beyond enumerating feature flags to userspace

   - Grab SRCU when reading PDPTRs in KVM_GET_SREGS2

   - Add WARNs to guard against modifying KVM's CPU caps outside of the
     intended setup flow, as nested VMX in particular is sensitive to
     unexpected changes in KVM's golden configuration

   - Add a quirk to allow userspace to opt-in to actually suppress EOI
     broadcasts when the suppression feature is enabled by the guest
     (currently limited to split IRQCHIP, i.e. userspace I/O APIC).
     Sadly, simply fixing KVM to honor Suppress EOI Broadcasts isn't an
     option as some userspaces have come to rely on KVM's buggy behavior
     (KVM advertises Supress EOI Broadcast irrespective of whether or
     not userspace I/O APIC supports Directed EOIs)

   - Clean up KVM's handling of marking mapped vCPU pages dirty

   - Drop a pile of *ancient* sanity checks hidden behind in KVM's
     unused ASSERT() macro, most of which could be trivially triggered
     by the guest and/or user, and all of which were useless

   - Fold "struct dest_map" into its sole user, "struct rtc_status", to
     make it more obvious what the weird parameter is used for, and to
     allow fropping these RTC shenanigans if CONFIG_KVM_IOAPIC=n

   - Bury all of ioapic.h, i8254.h and related ioctls (including
     KVM_CREATE_IRQCHIP) behind CONFIG_KVM_IOAPIC=y

   - Add a regression test for recent APICv update fixes

   - Handle "hardware APIC ISR", a.k.a. SVI, updates in
     kvm_apic_update_apicv() to consolidate the updates, and to
     co-locate SVI updates with the updates for KVM's own cache of ISR
     information

   - Drop a dead function declaration

   - Minor cleanups

  x86 (Intel):

   - Rework KVM's handling of VMCS updates while L2 is active to
     temporarily switch to vmcs01 instead of deferring the update until
     the next nested VM-Exit.

     The deferred updates approach directly contributed to several bugs,
     was proving to be a maintenance burden due to the difficulty in
     auditing the correctness of deferred updates, and was polluting
     "struct nested_vmx" with a growing pile of booleans

   - Fix an SGX bug where KVM would incorrectly try to handle EPCM page
     faults, and instead always reflect them into the guest. Since KVM
     doesn't shadow EPCM entries, EPCM violations cannot be due to KVM
     interference and can't be resolved by KVM

   - Fix a bug where KVM would register its posted interrupt wakeup
     handler even if loading kvm-intel.ko ultimately failed

   - Disallow access to vmcb12 fields that aren't fully supported,
     mostly to avoid weirdness and complexity for FRED and other
     features, where KVM wants enable VMCS shadowing for fields that
     conditionally exist

   - Print out the "bad" offsets and values if kvm-intel.ko refuses to
     load (or refuses to online a CPU) due to a VMCS config mismatch

  x86 (AMD):

   - Drop a user-triggerable WARN on nested_svm_load_cr3() failure

   - Add support for virtualizing ERAPS. Note, correct virtualization of
     ERAPS relies on an upcoming, publicly announced change in the APM
     to reduce the set of conditions where hardware (i.e. KVM) *must*
     flush the RAP

   - Ignore nSVM intercepts for instructions that are not supported
     according to L1's virtual CPU model

   - Add support for expedited writes to the fast MMIO bus, a la VMX's
     fastpath for EPT Misconfig

   - Don't set GIF when clearing EFER.SVME, as GIF exists independently
     of SVM, and allow userspace to restore nested state with GIF=0

   - Treat exit_code as an unsigned 64-bit value through all of KVM

   - Add support for fetching SNP certificates from userspace

   - Fix a bug where KVM would use vmcb02 instead of vmcb01 when
     emulating VMLOAD or VMSAVE on behalf of L2

   - Misc fixes and cleanups

  x86 selftests:

   - Add a regression test for TPR<=>CR8 synchronization and IRQ masking

   - Overhaul selftest's MMU infrastructure to genericize stage-2 MMU
     support, and extend x86's infrastructure to support EPT and NPT
     (for L2 guests)

   - Extend several nested VMX tests to also cover nested SVM

   - Add a selftest for nested VMLOAD/VMSAVE

   - Rework the nested dirty log test, originally added as a regression
     test for PML where KVM logged L2 GPAs instead of L1 GPAs, to
     improve test coverage and to hopefully make the test easier to
     understand and maintain

  guest_memfd:

   - Remove kvm_gmem_populate()'s preparation tracking and half-baked
     hugepage handling. SEV/SNP was the only user of the tracking and it
     can do it via the RMP

   - Retroactively document and enforce (for SNP) that
     KVM_SEV_SNP_LAUNCH_UPDATE and KVM_TDX_INIT_MEM_REGION require the
     source page to be 4KiB aligned, to avoid non-trivial complexity for
     something that no known VMM seems to be doing and to avoid an API
     special case for in-place conversion, which simply can't support
     unaligned sources

   - When populating guest_memfd memory, GUP the source page in common
     code and pass the refcounted page to the vendor callback, instead
     of letting vendor code do the heavy lifting. Doing so avoids a
     looming deadlock bug with in-place due an AB-BA conflict betwee
     mmap_lock and guest_memfd's filemap invalidate lock

  Generic:

   - Fix a bug where KVM would ignore the vCPU's selected address space
     when creating a vCPU-specific mapping of guest memory. Actually
     this bug could not be hit even on x86, the only architecture with
     multiple address spaces, but it's a bug nevertheless"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (267 commits)
  KVM: s390: Increase permitted SE header size to 1 MiB
  MAINTAINERS: Replace backup for s390 vfio-pci
  KVM: s390: vsie: Fix race in acquire_gmap_shadow()
  KVM: s390: vsie: Fix race in walk_guest_tables()
  KVM: s390: Use guest address to mark guest page dirty
  irqchip/riscv-imsic: Adjust the number of available guest irq files
  RISC-V: KVM: Transparent huge page support
  RISC-V: KVM: selftests: Add Zalasr extensions to get-reg-list test
  RISC-V: KVM: Allow Zalasr extensions for Guest/VM
  KVM: riscv: selftests: Add riscv vm satp modes
  KVM: riscv: selftests: add Zilsd and Zclsd extension to get-reg-list test
  riscv: KVM: allow Zilsd and Zclsd extensions for Guest/VM
  RISC-V: KVM: Skip IMSIC update if vCPU IMSIC state is not initialized
  RISC-V: KVM: Fix null pointer dereference in kvm_riscv_aia_imsic_rw_attr()
  RISC-V: KVM: Fix null pointer dereference in kvm_riscv_aia_imsic_has_attr()
  RISC-V: KVM: Remove unnecessary 'ret' assignment
  KVM: s390: Add explicit padding to struct kvm_s390_keyop
  KVM: LoongArch: selftests: Add steal time test case
  LoongArch: KVM: Add paravirt vcpu_is_preempted() support in guest side
  LoongArch: KVM: Add paravirt preempt feature in hypervisor side
  ...
2026-02-13 11:31:15 -08:00
Baolin Wang
07f440c23a arm64: mm: implement the architecture-specific clear_flush_young_ptes()
Implement the Arm64 architecture-specific clear_flush_young_ptes() to
enable batched checking of young flags and TLB flushing, improving
performance during large folio reclamation.

Performance testing:
Allocate 10G clean file-backed folios by mmap() in a memory cgroup, and
try to reclaim 8G file-backed folios via the memory.reclaim interface.  I
can observe 33% performance improvement on my Arm64 32-core server (and
10%+ improvement on my X86 machine).  Meanwhile, the hotspot
folio_check_references() dropped from approximately 35% to around 5%.

W/o patchset:
real	0m1.518s
user	0m0.000s
sys	0m1.518s

W/ patchset:
real	0m1.018s
user	0m0.000s
sys	0m1.018s

Link: https://lkml.kernel.org/r/ce749fbae3e900e733fa104a16fcb3ca9fe4f9bd.1770645603.git.baolin.wang@linux.alibaba.com
Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Reviewed-by: Ryan Roberts <ryan.roberts@arm.com>
Reviewed-by: David Hildenbrand (Arm) <david@kernel.org>
Cc: Barry Song <baohua@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Harry Yoo <harry.yoo@oracle.com>
Cc: Jann Horn <jannh@google.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-02-12 15:43:00 -08:00
Baolin Wang
6f0e114217 arm64: mm: support batch clearing of the young flag for large folios
Currently, contpte_ptep_test_and_clear_young() and
contpte_ptep_clear_flush_young() only clear the young flag and flush TLBs
for PTEs within the contiguous range.  To support batch PTE operations for
other sized large folios in the following patches, adding a new parameter
to specify the number of PTEs that map consecutive pages of the same large
folio in a single VMA and a single page table.

While we are at it, rename the functions to maintain consistency with
other contpte_*() functions.

Link: https://lkml.kernel.org/r/5644250dcc0417278c266ad37118d27f541fd052.1770645603.git.baolin.wang@linux.alibaba.com
Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Reviewed-by: Ryan Roberts <ryan.roberts@arm.com>
Reviewed-by: David Hildenbrand (Arm) <david@kernel.org>
Cc: Barry Song <baohua@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Harry Yoo <harry.yoo@oracle.com>
Cc: Jann Horn <jannh@google.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-02-12 15:43:00 -08:00
Baolin Wang
67d59bddfc arm64: mm: factor out the address and ptep alignment into a new helper
Factor out the contpte block's address and ptep alignment into a new
helper, and will be reused in the following patch.

No functional changes.

Link: https://lkml.kernel.org/r/8076d12cb244b2d9e91119b44dc6d5e4ad9c00af.1770645603.git.baolin.wang@linux.alibaba.com
Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Reviewed-by: David Hildenbrand (Arm) <david@kernel.org>
Reviewed-by: Ryan Roberts <ryan.roberts@arm.com>
Cc: Barry Song <baohua@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Harry Yoo <harry.yoo@oracle.com>
Cc: Jann Horn <jannh@google.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-02-12 15:43:00 -08:00
Linus Torvalds
136114e0ab Merge tag 'mm-nonmm-stable-2026-02-12-10-48' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull non-MM updates from Andrew Morton:

 - "ocfs2: give ocfs2 the ability to reclaim suballocator free bg" saves
   disk space by teaching ocfs2 to reclaim suballocator block group
   space (Heming Zhao)

 - "Add ARRAY_END(), and use it to fix off-by-one bugs" adds the
   ARRAY_END() macro and uses it in various places (Alejandro Colomar)

 - "vmcoreinfo: support VMCOREINFO_BYTES larger than PAGE_SIZE" makes
   the vmcore code future-safe, if VMCOREINFO_BYTES ever exceeds the
   page size (Pnina Feder)

 - "kallsyms: Prevent invalid access when showing module buildid" cleans
   up kallsyms code related to module buildid and fixes an invalid
   access crash when printing backtraces (Petr Mladek)

 - "Address page fault in ima_restore_measurement_list()" fixes a
   kexec-related crash that can occur when booting the second-stage
   kernel on x86 (Harshit Mogalapalli)

 - "kho: ABI headers and Documentation updates" updates the kexec
   handover ABI documentation (Mike Rapoport)

 - "Align atomic storage" adds the __aligned attribute to atomic_t and
   atomic64_t definitions to get natural alignment of both types on
   csky, m68k, microblaze, nios2, openrisc and sh (Finn Thain)

 - "kho: clean up page initialization logic" simplifies the page
   initialization logic in kho_restore_page() (Pratyush Yadav)

 - "Unload linux/kernel.h" moves several things out of kernel.h and into
   more appropriate places (Yury Norov)

 - "don't abuse task_struct.group_leader" removes the usage of
   ->group_leader when it is "obviously unnecessary" (Oleg Nesterov)

 - "list private v2 & luo flb" adds some infrastructure improvements to
   the live update orchestrator (Pasha Tatashin)

* tag 'mm-nonmm-stable-2026-02-12-10-48' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (107 commits)
  watchdog/hardlockup: simplify perf event probe and remove per-cpu dependency
  procfs: fix missing RCU protection when reading real_parent in do_task_stat()
  watchdog/softlockup: fix sample ring index wrap in need_counting_irqs()
  kcsan, compiler_types: avoid duplicate type issues in BPF Type Format
  kho: fix doc for kho_restore_pages()
  tests/liveupdate: add in-kernel liveupdate test
  liveupdate: luo_flb: introduce File-Lifecycle-Bound global state
  liveupdate: luo_file: Use private list
  list: add kunit test for private list primitives
  list: add primitives for private list manipulations
  delayacct: fix uapi timespec64 definition
  panic: add panic_force_cpu= parameter to redirect panic to a specific CPU
  netclassid: use thread_group_leader(p) in update_classid_task()
  RDMA/umem: don't abuse current->group_leader
  drm/pan*: don't abuse current->group_leader
  drm/amd: kill the outdated "Only the pthreads threading model is supported" checks
  drm/amdgpu: don't abuse current->group_leader
  android/binder: use same_thread_group(proc->tsk, current) in binder_mmap()
  android/binder: don't abuse current->group_leader
  kho: skip memoryless NUMA nodes when reserving scratch areas
  ...
2026-02-12 12:13:01 -08:00
Linus Torvalds
4cff5c05e0 Merge tag 'mm-stable-2026-02-11-19-22' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull MM updates from Andrew Morton:

 - "powerpc/64s: do not re-activate batched TLB flush" makes
   arch_{enter|leave}_lazy_mmu_mode() nest properly (Alexander Gordeev)

   It adds a generic enter/leave layer and switches architectures to use
   it. Various hacks were removed in the process.

 - "zram: introduce compressed data writeback" implements data
   compression for zram writeback (Richard Chang and Sergey Senozhatsky)

 - "mm: folio_zero_user: clear page ranges" adds clearing of contiguous
   page ranges for hugepages. Large improvements during demand faulting
   are demonstrated (David Hildenbrand)

 - "memcg cleanups" tidies up some memcg code (Chen Ridong)

 - "mm/damon: introduce {,max_}nr_snapshots and tracepoint for damos
   stats" improves DAMOS stat's provided information, deterministic
   control, and readability (SeongJae Park)

 - "selftests/mm: hugetlb cgroup charging: robustness fixes" fixes a few
   issues in the hugetlb cgroup charging selftests (Li Wang)

 - "Fix va_high_addr_switch.sh test failure - again" addresses several
   issues in the va_high_addr_switch test (Chunyu Hu)

 - "mm/damon/tests/core-kunit: extend existing test scenarios" improves
   the KUnit test coverage for DAMON (Shu Anzai)

 - "mm/khugepaged: fix dirty page handling for MADV_COLLAPSE" fixes a
   glitch in khugepaged which was causing madvise(MADV_COLLAPSE) to
   transiently return -EAGAIN (Shivank Garg)

 - "arch, mm: consolidate hugetlb early reservation" reworks and
   consolidates a pile of straggly code related to reservation of
   hugetlb memory from bootmem and creation of CMA areas for hugetlb
   (Mike Rapoport)

 - "mm: clean up anon_vma implementation" cleans up the anon_vma
   implementation in various ways (Lorenzo Stoakes)

 - "tweaks for __alloc_pages_slowpath()" does a little streamlining of
   the page allocator's slowpath code (Vlastimil Babka)

 - "memcg: separate private and public ID namespaces" cleans up the
   memcg ID code and prevents the internal-only private IDs from being
   exposed to userspace (Shakeel Butt)

 - "mm: hugetlb: allocate frozen gigantic folio" cleans up the
   allocation of frozen folios and avoids some atomic refcount
   operations (Kefeng Wang)

 - "mm/damon: advance DAMOS-based LRU sorting" improves DAMOS's movement
   of memory betewwn the active and inactive LRUs and adds auto-tuning
   of the ratio-based quotas and of monitoring intervals (SeongJae Park)

 - "Support page table check on PowerPC" makes
   CONFIG_PAGE_TABLE_CHECK_ENFORCED work on powerpc (Andrew Donnellan)

 - "nodemask: align nodes_and{,not} with underlying bitmap ops" makes
   nodes_and() and nodes_andnot() propagate the return values from the
   underlying bit operations, enabling some cleanup in calling code
   (Yury Norov)

 - "mm/damon: hide kdamond and kdamond_lock from API callers" cleans up
   some DAMON internal interfaces (SeongJae Park)

 - "mm/khugepaged: cleanups and scan limit fix" does some cleanup work
   in khupaged and fixes a scan limit accounting issue (Shivank Garg)

 - "mm: balloon infrastructure cleanups" goes to town on the balloon
   infrastructure and its page migration function. Mainly cleanups, also
   some locking simplification (David Hildenbrand)

 - "mm/vmscan: add tracepoint and reason for kswapd_failures reset" adds
   additional tracepoints to the page reclaim code (Jiayuan Chen)

 - "Replace wq users and add WQ_PERCPU to alloc_workqueue() users" is
   part of Marco's kernel-wide migration from the legacy workqueue APIs
   over to the preferred unbound workqueues (Marco Crivellari)

 - "Various mm kselftests improvements/fixes" provides various unrelated
   improvements/fixes for the mm kselftests (Kevin Brodsky)

 - "mm: accelerate gigantic folio allocation" greatly speeds up gigantic
   folio allocation, mainly by avoiding unnecessary work in
   pfn_range_valid_contig() (Kefeng Wang)

 - "selftests/damon: improve leak detection and wss estimation
   reliability" improves the reliability of two of the DAMON selftests
   (SeongJae Park)

 - "mm/damon: cleanup kdamond, damon_call(), damos filter and
   DAMON_MIN_REGION" does some cleanup work in the core DAMON code
   (SeongJae Park)

 - "Docs/mm/damon: update intro, modules, maintainer profile, and misc"
   performs maintenance work on the DAMON documentation (SeongJae Park)

 - "mm: add and use vma_assert_stabilised() helper" refactors and cleans
   up the core VMA code. The main aim here is to be able to use the mmap
   write lock's lockdep state to perform various assertions regarding
   the locking which the VMA code requires (Lorenzo Stoakes)

 - "mm, swap: swap table phase II: unify swapin use" removes some old
   swap code (swap cache bypassing and swap synchronization) which
   wasn't working very well. Various other cleanups and simplifications
   were made. The end result is a 20% speedup in one benchmark (Kairui
   Song)

 - "enable PT_RECLAIM on more 64-bit architectures" makes PT_RECLAIM
   available on 64-bit alpha, loongarch, mips, parisc, and um. Various
   cleanups were performed along the way (Qi Zheng)

* tag 'mm-stable-2026-02-11-19-22' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (325 commits)
  mm/memory: handle non-split locks correctly in zap_empty_pte_table()
  mm: move pte table reclaim code to memory.c
  mm: make PT_RECLAIM depends on MMU_GATHER_RCU_TABLE_FREE
  mm: convert __HAVE_ARCH_TLB_REMOVE_TABLE to CONFIG_HAVE_ARCH_TLB_REMOVE_TABLE config
  um: mm: enable MMU_GATHER_RCU_TABLE_FREE
  parisc: mm: enable MMU_GATHER_RCU_TABLE_FREE
  mips: mm: enable MMU_GATHER_RCU_TABLE_FREE
  LoongArch: mm: enable MMU_GATHER_RCU_TABLE_FREE
  alpha: mm: enable MMU_GATHER_RCU_TABLE_FREE
  mm: change mm/pt_reclaim.c to use asm/tlb.h instead of asm-generic/tlb.h
  mm/damon/stat: remove __read_mostly from memory_idle_ms_percentiles
  zsmalloc: make common caches global
  mm: add SPDX id lines to some mm source files
  mm/zswap: use %pe to print error pointers
  mm/vmscan: use %pe to print error pointers
  mm/readahead: fix typo in comment
  mm: khugepaged: fix NR_FILE_PAGES and NR_SHMEM in collapse_file()
  mm: refactor vma_map_pages to use vm_insert_pages
  mm/damon: unify address range representation with damon_addr_range
  mm/cma: replace snprintf with strscpy in cma_new_area
  ...
2026-02-12 11:32:37 -08:00
Paolo Bonzini
bf2c3138ae Merge tag 'kvm-x86-pmu-6.20' of https://github.com/kvm-x86/linux into HEAD
KVM mediated PMU support for 6.20

Add support for mediated PMUs, where KVM gives the guest full ownership of PMU
hardware (contexted switched around the fastpath run loop) and allows direct
access to data MSRs and PMCs (restricted by the vPMU model), but intercepts
access to control registers, e.g. to enforce event filtering and to prevent the
guest from profiling sensitive host state.

To keep overall complexity reasonable, mediated PMU usage is all or nothing
for a given instance of KVM (controlled via module param).  The Mediated PMU
is disabled default, partly to maintain backwards compatilibity for existing
setup, partly because there are tradeoffs when running with a mediated PMU that
may be non-starters for some use cases, e.g. the host loses the ability to
profile guests with mediated PMUs, the fastpath run loop is also a blind spot,
entry/exit transitions are more expensive, etc.

Versus the emulated PMU, where KVM is "just another perf user", the mediated
PMU delivers more accurate profiling and monitoring (no risk of contention and
thus dropped events), with significantly less overhead (fewer exits and faster
emulation/programming of event selectors) E.g. when running Specint-2017 on
a single-socket Sapphire Rapids with 56 cores and no-SMT, and using perf from
within the guest:

  Perf command:
  a. basic-sampling: perf record -F 1000 -e 6-instructions  -a --overwrite
  b. multiplex-sampling: perf record -F 1000 -e 10-instructions -a --overwrite

  Guest performance overhead:
  ---------------------------------------------------------------------------
  | Test case          | emulated vPMU | all passthrough | passthrough with |
  |                    |               |                 | event filters    |
  ---------------------------------------------------------------------------
  | basic-sampling     |   33.62%      |    4.24%        |   6.21%          |
  ---------------------------------------------------------------------------
  | multiplex-sampling |   79.32%      |    7.34%        |   10.45%         |
  ---------------------------------------------------------------------------
2026-02-11 12:45:40 -05:00
Linus Torvalds
6589b3d76d Merge tag 'soc-dt-7.0' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc
Pull SoC devicetree updates from Arnd Bergmann:
 "There are a handful of new SoCs this time, all of these are more or
  less related to chips in a wider family:

   - SpacemiT Key Stone K3 is an 8-core risc-v chip, and the first
     widely available RVA23 implementation. Note that this is entirely
     unrelated with the similarly named Texas Instruments K3 chip family
     that follwed the TI Keystone2 SoC.

   - The Realtek Kent family of SoCs contains three chip models
     rtd1501s, rtd1861b and rtd1920s, and is related to their earlier
     Set-top-box and NAS products such as rtd1619, but is built on newer
     Arm Cortex-A78 cores.

   - The Qualcomm Milos family includes the Snapdragon 7s Gen 3 (SM7635)
     mobile phone SoC built around Armv9 Kryo cores of the Arm
     Cortex-A720 generation. This one is used in the Fairphone Gen 6

   - Qualcomm Kaanapali is a new SoC based around eight high performance
     Oryon CPU cores

   - NXP i.MX8QP and i.MX952 are both feature reduced versions of chips
     we already support, i.e. the i.MX8QM and i.MX952, with fewer CPU
     cores and I/O interfaces.

  As part of a cleanup, a number of SoC specific devicetree files got
  removed because they did not have a single board using the .dtsi files
  and they were never compile tested as a result: Samsung s3c6400, ST
  spear320s, ST stm32mp21xc/stm32mp23xc/stm32mp25xc, Renesas
  r8a779m0/r8a779m2/r8a779m4/r8a779m6/r8a779m7/r8a779m8/r8a779mb/
  r9a07g044c1/r9a07g044l1/r9a07g054l1/r9a09g047e37, and TI
  am3703/am3715. All of these could be restored easily if a new board
  gets merged.

  Broadcom/Cavium/Marvell ThunderX2 gets removed along with its only
  machine, as all remaining users are assumed to be using ACPI based
  firmware.

  A relatively small number of 43 boards get added this time, and almost
  all of them for arm64. Aside from the reference boards for the newly
  added SoCs, this includes:

   - Three server boards use 32-bit ASpeed BMCs

   - One more reference board for 32-bit Microchip LAN9668

   - 64-bit Arm single-board computers based on Amlogic s905y4, CIX
     sky1, NXP ls1028a/imx8mn/imx8mp/imx91/imx93/imx95, Qualcomm
     qcs6490/qrb2210 and Rockchip rk3568/rk3588s

   - Carrier board for SOMs using Intel agilex5, Marvell Armada 7020,
     NXP iMX8QP, Mediatek mt8370/mt8390 and rockchip rk3588

   - Two mobile phones using Snapdragon 845

   - A gaming device and a NAS box, both based on Rockchips rk356x

  On top of the newly added boards and SoCs, there is a lot of
  background activity going into cleanups, in particular towards getting
  a warning-free dtc build, and the usual work on adding support for
  more hardware on the previously added machines"

* tag 'soc-dt-7.0' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (757 commits)
  dt-bindings: intel: Add Agilex eMMC support
  arm64: dts: socfpga: agilex: add emmc support
  arm64: dts: intel: agilex5: Add simple-bus node on top of dma controller node
  ARM: dts: socfpga: fix dtbs_check warning for fpga-region
  ARM: dts: socfpga: add #address-cells and #size-cells for sram node
  dt-bindings: altera: document syscon as fallback for sys-mgr
  arm64: dts: altera: Use lowercase hex
  dt-bindings: arm: altera: combine Intel's SoCFPGA into altera.yaml
  arm64: dts: socfpga: agilex5: Add IOMMUS property for ethernet nodes
  arm64: dts: socfpga: agilex5: add support for modular board
  dt-bindings: intel: Add Agilex5 SoCFPGA modular board
  arm64: dts: socfpga: agilex5: Add dma-coherent property
  arm64: dts: realtek: Add Kent SoC and EVB device trees
  dt-bindings: arm: realtek: Add Kent Soc family compatibles
  ARM: dts: samsung: Drop s3c6400.dtsi
  ARM: dts: nuvoton: Minor whitespace cleanup
  MAINTAINERS: Add Falcon DB
  arm64: dts: a7k: add COM Express boards
  ARM: dts: microchip: Drop usb_a9g20-dab-mmx.dtsi
  arm64: dts: rockchip: Fix rk3588 PCIe range mappings
  ...
2026-02-10 21:11:08 -08:00
Linus Torvalds
f7fae9b4d3 Merge tag 'soc-defconfig-7.0' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc
Pull SoC defconfig updates from Arnd Bergmann:
 "These are the usual updates, enabling mode newly merged device drivers
  for various Arm and RISC-V based platforms in the defconfig files.

  The Renesas and NXP defconfig files also get a refresh for modified
  Kconfig options"

* tag 'soc-defconfig-7.0' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc:
  riscv: defconfig: spacemit: k3: enable clock support
  ARM: defconfig: turn off CONFIG_EXPERT
  ARM: defconfig: move entries
  arm64: defconfig: Enable configurations for Kontron SMARC-sAM67
  ARM: imx_v4_v5_defconfig: update for v6.19-rc1
  arm64: defconfig: Enable Apple Silicon drivers
  arm64: select APPLE_PMGR_PWRSTATE for ARCH_APPLE
  arm64: defconfig: Enable Mediatek HDMIv2 driver
  ARM: shmobile: defconfig: Refresh for v6.19-rc1
  arm64: defconfig: Enable PCIe for the Renesas RZ/G3S SoC
  arm64: defconfig: Enable RZ/G3E USB3 PHY driver
  arm64: defconfig: Enable EC drivers for Qualcomm-based laptops
  arm64: defconfig: Enable options for Qualcomm Milos SoC
  ARM: imx_v6_v7_defconfig: enable EPD regulator needed for Kobo Clara 2e
  ARM: imx_v6_v7_defconfig: Configure CONFIG_SND_SOC_FSL_ASOC_CARD as module
  ARM: multi_v7_defconfig: enable DA9052 and MC13XXX
  arm64: defconfig: enable clocks, interconnect and pinctrl for Qualcomm Kaanapali
  arm64: defconfig: Drop duplicate CONFIG_OMAP_USB2 entry
  arm64: defconfig: Enable missing AMD/Xilinx drivers
2026-02-10 20:44:10 -08:00
Linus Torvalds
57cb845067 Merge tag 'x86_paravirt_for_v7.0_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 paravirt updates from Borislav Petkov:

 - A nice cleanup to the paravirt code containing a unification of the
   paravirt clock interface, taming the include hell by splitting the
   pv_ops structure and removing of a bunch of obsolete code (Juergen
   Gross)

* tag 'x86_paravirt_for_v7.0_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (23 commits)
  x86/paravirt: Use XOR r32,r32 to clear register in pv_vcpu_is_preempted()
  x86/paravirt: Remove trailing semicolons from alternative asm templates
  x86/pvlocks: Move paravirt spinlock functions into own header
  x86/paravirt: Specify pv_ops array in paravirt macros
  x86/paravirt: Allow pv-calls outside paravirt.h
  objtool: Allow multiple pv_ops arrays
  x86/xen: Drop xen_mmu_ops
  x86/xen: Drop xen_cpu_ops
  x86/xen: Drop xen_irq_ops
  x86/paravirt: Move pv_native_*() prototypes to paravirt.c
  x86/paravirt: Introduce new paravirt-base.h header
  x86/paravirt: Move paravirt_sched_clock() related code into tsc.c
  x86/paravirt: Use common code for paravirt_steal_clock()
  riscv/paravirt: Use common code for paravirt_steal_clock()
  loongarch/paravirt: Use common code for paravirt_steal_clock()
  arm64/paravirt: Use common code for paravirt_steal_clock()
  arm/paravirt: Use common code for paravirt_steal_clock()
  sched: Move clock related paravirt code to kernel/sched
  paravirt: Remove asm/paravirt_api_clock.h
  x86/paravirt: Move thunk macros to paravirt_types.h
  ...
2026-02-10 19:01:45 -08:00
Linus Torvalds
f1c538ca81 Merge tag 'timers-vdso-2026-02-09' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull VDSO updates from Thomas Gleixner:

 - Provide the missing 64-bit variant of clock_getres()

   This allows the extension of CONFIG_COMPAT_32BIT_TIME to the vDSO and
   finally the removal of 32-bit time types from the kernel and UAPI.

 - Remove the useless and broken getcpu_cache from the VDSO

   The intention was to provide a trivial way to retrieve the CPU number
   from the VDSO, but as the VDSO data is per process there is no way to
   make it work.

 - Switch get/put_unaligned() from packed struct to memcpy()

   The packed struct violates strict aliasing rules which requires to
   pass -fno-strict-aliasing to the compiler. As this are scalar values
   __builtin_memcpy() turns them into simple loads and stores

 - Use __typeof_unqual__() for __unqual_scalar_typeof()

   The get/put_unaligned() changes triggered a new sparse warning when
   __beNN types are used with get/put_unaligned() as sparse builds add a
   special 'bitwise' attribute to them which prevents sparse to evaluate
   the Generic in __unqual_scalar_typeof().

   Newer sparse versions support __typeof_unqual__() which avoids the
   problem, but requires a recent sparse install. So this adds a sanity
   check to sparse builds, which validates that sparse is available and
   capable of handling it.

 - Force inline __cvdso_clock_getres_common()

   Compilers sometimes un-inline agressively, which results in function
   call overhead and problems with automatic stack variable
   initialization.

   Interestingly enough the force inlining results in smaller code than
   the un-inlined variant produced by GCC when optimizing for size.

* tag 'timers-vdso-2026-02-09' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  vdso/gettimeofday: Force inlining of __cvdso_clock_getres_common()
  x86/percpu: Make CONFIG_USE_X86_SEG_SUPPORT work with sparse
  compiler: Use __typeof_unqual__() for __unqual_scalar_typeof()
  powerpc/vdso: Provide clock_getres_time64()
  tools headers: Remove unneeded ignoring of warnings in unaligned.h
  tools headers: Update the linux/unaligned.h copy with the kernel sources
  vdso: Switch get/put_unaligned() from packed struct to memcpy()
  parisc: Inline a type punning version of get_unaligned_le32()
  vdso: Remove struct getcpu_cache
  MIPS: vdso: Provide getres_time64() for 32-bit ABIs
  arm64: vdso32: Provide clock_getres_time64()
  ARM: VDSO: Provide clock_getres_time64()
  ARM: VDSO: Patch out __vdso_clock_getres() if unavailable
  x86/vdso: Provide clock_getres_time64() for x86-32
  selftests: vDSO: vdso_test_abi: Add test for clock_getres_time64()
  selftests: vDSO: vdso_test_abi: Use UAPI system call numbers
  selftests: vDSO: vdso_config: Add configurations for clock_getres_time64()
  vdso: Add prototype for __vdso_clock_getres_time64()
2026-02-10 17:02:23 -08:00
Linus Torvalds
dc855b7771 Merge tag 'irq-drivers-2026-02-09' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq chip driver updates from Thomas Gleixner:

 - Add support for the Renesas RZ/V2N SoC

 - Add a new driver for the Renesas RZ/[TN]2H SoCs

 - Preserve the register state of the RISCV APLIC interrupt controller
   accross suspend/resume

 - Reinitialize the RISCV IMSIC registers after suspend/resume

 - Make the various Loongson interrupt chip drivers 32/64-bit aware

 - Handle the number of hardware interrupts in the SIFIVE PLIC driver
   correctly

   The hardware interrupt 0 is reserved which resulted in inconsistent
   accounting. That went unnoticed as the off by one is only noticable
   when the number of device interrupts is a multiple of 32

 - The usual device tree updates, cleanups and improvements all over the
   place

* tag 'irq-drivers-2026-02-09' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (24 commits)
  irqchip/gic-v5: Fix spelling mistake "ouside" -> "outside"
  dt-bindings: interrupt-controller: sifive,plic: Clarify the riscv,ndev meaning in PLIC
  irqchip/sifive-plic: Handle number of hardware interrupts correctly
  irqchip/aspeed-scu-ic: Remove unused variable mask
  irqchip/ti-sci-intr: Allow parsing interrupt-types per-line
  dt-bindings: interrupt-controller: ti,sci-intr: Per-line interrupt-types
  irqchip/renesas-rzv2h: Add suspend/resume support
  irqchip/aslint-sswi: Fix error check of of_io_request_and_map() result
  irqchip: Allow LoongArch irqchip drivers on both 32BIT/64BIT
  irqchip/loongson-pch-pic: Adjust irqchip driver for 32BIT/64BIT
  irqchip/loongson-pch-msi: Adjust irqchip driver for 32BIT/64BIT
  irqchip/loongson-htvec: Adjust irqchip driver for 32BIT/64BIT
  irqchip/loongson-eiointc: Adjust irqchip driver for 32BIT/64BIT
  irqchip/loongson-liointc: Adjust irqchip driver for 32BIT/64BIT
  irqchip/loongarch-avec: Adjust irqchip driver for 32BIT/64BIT
  irqchip/riscv-aplic: Preserve APLIC states across suspend/resume
  irqchip/riscv-imsic: Add a CPU pm notifier to restore the IMSIC on exit
  arm64: dts: renesas: r9a09g087: Add ICU support
  arm64: dts: renesas: r9a09g077: Add ICU support
  irqchip: Add RZ/{T2H,N2H} Interrupt Controller (ICU) driver
  ...
2026-02-10 14:01:40 -08:00
Linus Torvalds
36ae1c45b2 Merge tag 'sched-core-2026-02-09' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler updates from Ingo Molnar:
 "Scheduler Kconfig space updates:

   - Further consolidate configurable preemption modes (Peter Zijlstra)

     Reduce the number of architectures that are allowed to offer
     PREEMPT_NONE and PREEMPT_VOLUNTARY, reducing the number of
     preemption models from four to just two: 'full' and 'lazy' on
     up-to-date architectures (arm64, loongarch, powerpc, riscv, s390,
     x86).

     None and voluntary are only available as legacy features on
     platforms that don't implement lazy preemption yet, or which don't
     even support preemption.

     The goal is to eventually remove cond_resched() and voluntary
     preemption altogether.

  RSEQ based 'scheduler time slice extension' support (Thomas Gleixner
  and Peter Zijlstra):

  This allows a thread to request a time slice extension when it enters
  a critical section to avoid contention on a resource when the thread
  is scheduled out inside of the critical section.

   - Add fields and constants for time slice extension
   - Provide static branch for time slice extensions
   - Add statistics for time slice extensions
   - Add prctl() to enable time slice extensions
   - Implement sys_rseq_slice_yield()
   - Implement syscall entry work for time slice extensions
   - Implement time slice extension enforcement timer
   - Reset slice extension when scheduled
   - Implement rseq_grant_slice_extension()
   - entry: Hook up rseq time slice extension
   - selftests: Implement time slice extension test
   - Allow registering RSEQ with slice extension
   - Move slice_ext_nsec to debugfs
   - Lower default slice extension
   - selftests/rseq: Add rseq slice histogram script

  Scheduler performance/scalability improvements:

   - Update rq->avg_idle when a task is moved to an idle CPU, which
     improves the scalability of various workloads (Shubhang Kaushik)

   - Reorder fields in 'struct rq' for better caching (Blake Jones)

   - Fair scheduler SMP NOHZ balancing code speedups (Shrikanth Hegde):
      - Move checking for nohz cpus after time check
      - Change likelyhood of nohz.nr_cpus
      - Remove nohz.nr_cpus and use weight of cpumask instead

   - Avoid false sharing for sched_clock_irqtime (Wangyang Guo)

   - Cleanups (Yury Norov):
      - Drop useless cpumask_empty() in find_energy_efficient_cpu()
      - Simplify task_numa_find_cpu()
      - Use cpumask_weight_and() in sched_balance_find_dst_group()

  DL scheduler updates:

   - Add a deadline server for sched_ext tasks (by Andrea Righi and Joel
     Fernandes, with fixes by Peter Zijlstra)

  RT scheduler updates:

   - Skip currently executing CPU in rto_next_cpu() (Chen Jinghuang)

  Entry code updates and performance improvements (Jinjie Ruan)

  This is part of the scheduler tree in this cycle due to inter-
  dependencies with the RSEQ based time slice extension work:

    - Remove unused syscall argument from syscall_trace_enter()
    - Rework syscall_exit_to_user_mode_work() for architecture reuse
    - Add arch_ptrace_report_syscall_entry/exit()
    - Inline syscall_exit_work() and syscall_trace_enter()

  Scheduler core updates (Peter Zijlstra):

   - Rework sched_class::wakeup_preempt() and rq_modified_*()
   - Avoid rq->lock bouncing in sched_balance_newidle()
   - Rename rcu_dereference_check_sched_domain() =>
            rcu_dereference_sched_domain()
   - <linux/compiler_types.h>: Add the __signed_scalar_typeof() helper

  Fair scheduler updates/refactoring (Peter Zijlstra and Ingo Molnar):

   - Fold the sched_avg update
   - Change rcu_dereference_check_sched_domain() to rcu-sched
   - Switch to rcu_dereference_all()
   - Remove superfluous rcu_read_lock()
   - Limit hrtick work
   - Join two #ifdef CONFIG_FAIR_GROUP_SCHED blocks
   - Clean up comments in 'struct cfs_rq'
   - Separate se->vlag from se->vprot
   - Rename cfs_rq::avg_load to cfs_rq::sum_weight
   - Rename cfs_rq::avg_vruntime to ::sum_w_vruntime & helper functions
   - Introduce and use the vruntime_cmp() and vruntime_op() wrappers for
     wrapped-signed aritmetics
   - Sort out 'blocked_load*' namespace noise

  Scheduler debugging code updates:

   - Export hidden tracepoints to modules (Gabriele Monaco)

   - Convert copy_from_user() + kstrtouint() to kstrtouint_from_user()
     (Fushuai Wang)

   - Add assertions to QUEUE_CLASS (Peter Zijlstra)

   - hrtimer: Fix tracing oddity (Thomas Gleixner)

  Misc fixes and cleanups:

   - Re-evaluate scheduling when migrating queued tasks out of throttled
     cgroups (Zicheng Qu)

   - Remove task_struct->faults_disabled_mapping (Christoph Hellwig)

   - Fix math notation errors in avg_vruntime comment (Zhan Xusheng)

   - sched/cpufreq: Use %pe format for PTR_ERR() printing
     (zenghongling)"

* tag 'sched-core-2026-02-09' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (64 commits)
  sched: Re-evaluate scheduling when migrating queued tasks out of throttled cgroups
  sched/cpufreq: Use %pe format for PTR_ERR() printing
  sched/rt: Skip currently executing CPU in rto_next_cpu()
  sched/clock: Avoid false sharing for sched_clock_irqtime
  selftests/sched_ext: Add test for DL server total_bw consistency
  selftests/sched_ext: Add test for sched_ext dl_server
  sched/debug: Fix dl_server (re)start conditions
  sched/debug: Add support to change sched_ext server params
  sched_ext: Add a DL server for sched_ext tasks
  sched/debug: Stop and start server based on if it was active
  sched/debug: Fix updating of ppos on server write ops
  sched/deadline: Clear the defer params
  entry: Inline syscall_exit_work() and syscall_trace_enter()
  entry: Add arch_ptrace_report_syscall_entry/exit()
  entry: Rework syscall_exit_to_user_mode_work() for architecture reuse
  entry: Remove unused syscall argument from syscall_trace_enter()
  sched: remove task_struct->faults_disabled_mapping
  sched: Update rq->avg_idle when a task is moved to an idle CPU
  selftests/rseq: Add rseq slice histogram script
  hrtimer: Fix trace oddity
  ...
2026-02-10 12:50:10 -08:00
Linus Torvalds
4d84667627 Merge tag 'perf-core-2026-02-09' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull performance event updates from Ingo Molnar:
 "x86 PMU driver updates:

   - Add support for the core PMU for Intel Diamond Rapids (DMR) CPUs
     (Dapeng Mi)

     Compared to previous iterations of the Intel PMU code, there's been
     a lot of changes, which center around three main areas:

      - Introduce the OFF-MODULE RESPONSE (OMR) facility to replace the
        Off-Core Response (OCR) facility

      - New PEBS data source encoding layout

      - Support the new "RDPMC user disable" feature

   - Likewise, a large series adds uncore PMU support for Intel Diamond
     Rapids (DMR) CPUs (Zide Chen)

     This centers around these four main areas:

      - DMR may have two Integrated I/O and Memory Hub (IMH) dies,
        separate from the compute tile (CBB) dies. Each CBB and each IMH
        die has its own discovery domain.

      - Unlike prior CPUs that retrieve the global discovery table
        portal exclusively via PCI or MSR, DMR uses PCI for IMH PMON
        discovery and MSR for CBB PMON discovery.

      - DMR introduces several new PMON types: SCA, HAMVF, D2D_ULA, UBR,
        PCIE4, CRS, CPC, ITC, OTC, CMS, and PCIE6.

      - IIO free-running counters in DMR are MMIO-based, unlike SPR.

   - Also add support for Add missing PMON units for Intel Panther Lake,
     and support Nova Lake (NVL), which largely maps to Panther Lake.
     (Zide Chen)

   - KVM integration: Add support for mediated vPMUs (by Kan Liang and
     Sean Christopherson, with fixes and cleanups by Peter Zijlstra,
     Sandipan Das and Mingwei Zhang)

   - Add Intel cstate driver to support for Wildcat Lake (WCL) CPUs,
     which are a low-power variant of Panther Lake (Zide Chen)

   - Add core, cstate and MSR PMU support for the Airmont NP Intel CPU
     (aka MaxLinear Lightning Mountain), which maps to the existing
     Airmont code (Martin Schiller)

  Performance enhancements:

   - Speed up kexec shutdown by avoiding unnecessary cross CPU calls
     (Jan H. Schönherr)

   - Fix slow perf_event_task_exit() with LBR callstacks (Namhyung Kim)

  User-space stack unwinding support:

   - Various cleanups and refactorings in preparation to generalize the
     unwinding code for other architectures (Jens Remus)

  Uprobes updates:

   - Transition from kmap_atomic to kmap_local_page (Keke Ming)

   - Fix incorrect lockdep condition in filter_chain() (Breno Leitao)

   - Fix XOL allocation failure for 32-bit tasks (Oleg Nesterov)

  Misc fixes and cleanups:

   - s390: Remove kvm_types.h from Kbuild (Randy Dunlap)

   - x86/intel/uncore: Convert comma to semicolon (Chen Ni)

   - x86/uncore: Clean up const mismatch (Greg Kroah-Hartman)

   - x86/ibs: Fix typo in dc_l2tlb_miss comment (Xiang-Bin Shi)"

* tag 'perf-core-2026-02-09' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (58 commits)
  s390: remove kvm_types.h from Kbuild
  uprobes: Fix incorrect lockdep condition in filter_chain()
  x86/ibs: Fix typo in dc_l2tlb_miss comment
  x86/uprobes: Fix XOL allocation failure for 32-bit tasks
  perf/x86/intel/uncore: Convert comma to semicolon
  perf/x86/intel: Add support for rdpmc user disable feature
  perf/x86: Use macros to replace magic numbers in attr_rdpmc
  perf/x86/intel: Add core PMU support for Novalake
  perf/x86/intel: Add support for PEBS memory auxiliary info field in NVL
  perf/x86/intel: Add core PMU support for DMR
  perf/x86/intel: Add support for PEBS memory auxiliary info field in DMR
  perf/x86/intel: Support the 4 new OMR MSRs introduced in DMR and NVL
  perf/core: Fix slow perf_event_task_exit() with LBR callstacks
  perf/core: Speed up kexec shutdown by avoiding unnecessary cross CPU calls
  uprobes: use kmap_local_page() for temporary page mappings
  arm/uprobes: use kmap_local_page() in arch_uprobe_copy_ixol()
  mips/uprobes: use kmap_local_page() in arch_uprobe_copy_ixol()
  arm64/uprobes: use kmap_local_page() in arch_uprobe_copy_ixol()
  riscv/uprobes: use kmap_local_page() in arch_uprobe_copy_ixol()
  perf/x86/intel/uncore: Add Nova Lake support
  ...
2026-02-10 12:00:46 -08:00
Linus Torvalds
f17b474e36 Merge tag 'bpf-next-7.0' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Pull bpf updates from Alexei Starovoitov:

 - Support associating BPF program with struct_ops (Amery Hung)

 - Switch BPF local storage to rqspinlock and remove recursion detection
   counters which were causing false positives (Amery Hung)

 - Fix live registers marking for indirect jumps (Anton Protopopov)

 - Introduce execution context detection BPF helpers (Changwoo Min)

 - Improve verifier precision for 32bit sign extension pattern
   (Cupertino Miranda)

 - Optimize BTF type lookup by sorting vmlinux BTF and doing binary
   search (Donglin Peng)

 - Allow states pruning for misc/invalid slots in iterator loops (Eduard
   Zingerman)

 - In preparation for ASAN support in BPF arenas teach libbpf to move
   global BPF variables to the end of the region and enable arena kfuncs
   while holding locks (Emil Tsalapatis)

 - Introduce support for implicit arguments in kfuncs and migrate a
   number of them to new API. This is a prerequisite for cgroup
   sub-schedulers in sched-ext (Ihor Solodrai)

 - Fix incorrect copied_seq calculation in sockmap (Jiayuan Chen)

 - Fix ORC stack unwind from kprobe_multi (Jiri Olsa)

 - Speed up fentry attach by using single ftrace direct ops in BPF
   trampolines (Jiri Olsa)

 - Require frozen map for calculating map hash (KP Singh)

 - Fix lock entry creation in TAS fallback in rqspinlock (Kumar
   Kartikeya Dwivedi)

 - Allow user space to select cpu in lookup/update operations on per-cpu
   array and hash maps (Leon Hwang)

 - Make kfuncs return trusted pointers by default (Matt Bobrowski)

 - Introduce "fsession" support where single BPF program is executed
   upon entry and exit from traced kernel function (Menglong Dong)

 - Allow bpf_timer and bpf_wq use in all programs types (Mykyta
   Yatsenko, Andrii Nakryiko, Kumar Kartikeya Dwivedi, Alexei
   Starovoitov)

 - Make KF_TRUSTED_ARGS the default for all kfuncs and clean up their
   definition across the tree (Puranjay Mohan)

 - Allow BPF arena calls from non-sleepable context (Puranjay Mohan)

 - Improve register id comparison logic in the verifier and extend
   linked registers with negative offsets (Puranjay Mohan)

 - In preparation for BPF-OOM introduce kfuncs to access memcg events
   (Roman Gushchin)

 - Use CFI compatible destructor kfunc type (Sami Tolvanen)

 - Add bitwise tracking for BPF_END in the verifier (Tianci Cao)

 - Add range tracking for BPF_DIV and BPF_MOD in the verifier (Yazhou
   Tang)

 - Make BPF selftests work with 64k page size (Yonghong Song)

* tag 'bpf-next-7.0' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (268 commits)
  selftests/bpf: Fix outdated test on storage->smap
  selftests/bpf: Choose another percpu variable in bpf for btf_dump test
  selftests/bpf: Remove test_task_storage_map_stress_lookup
  selftests/bpf: Update task_local_storage/task_storage_nodeadlock test
  selftests/bpf: Update task_local_storage/recursion test
  selftests/bpf: Update sk_storage_omem_uncharge test
  bpf: Switch to bpf_selem_unlink_nofail in bpf_local_storage_{map_free, destroy}
  bpf: Support lockless unlink when freeing map or local storage
  bpf: Prepare for bpf_selem_unlink_nofail()
  bpf: Remove unused percpu counter from bpf_local_storage_map_free
  bpf: Remove cgroup local storage percpu counter
  bpf: Remove task local storage percpu counter
  bpf: Change local_storage->lock and b->lock to rqspinlock
  bpf: Convert bpf_selem_unlink to failable
  bpf: Convert bpf_selem_link_map to failable
  bpf: Convert bpf_selem_unlink_map to failable
  bpf: Select bpf_local_storage_map_bucket based on bpf_local_storage
  selftests/xsk: fix number of Tx frags in invalid packet
  selftests/xsk: properly handle batch ending in the middle of a packet
  bpf: Prevent reentrance into call_rcu_tasks_trace()
  ...
2026-02-10 11:26:21 -08:00
Linus Torvalds
13d83ea9d8 Merge tag 'libcrypto-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux
Pull crypto library updates from Eric Biggers:

 - Add support for verifying ML-DSA signatures.

   ML-DSA (Module-Lattice-Based Digital Signature Algorithm) is a
   recently-standardized post-quantum (quantum-resistant) signature
   algorithm. It was known as Dilithium pre-standardization.

   The first use case in the kernel will be module signing. But there
   are also other users of RSA and ECDSA signatures in the kernel that
   might want to upgrade to ML-DSA eventually.

 - Improve the AES library:

     - Make the AES key expansion and single block encryption and
       decryption functions use the architecture-optimized AES code.
       Enable these optimizations by default.

     - Support preparing an AES key for encryption-only, using about
       half as much memory as a bidirectional key.

     - Replace the existing two generic implementations of AES with a
       single one.

 - Simplify how Adiantum message hashing is implemented. Remove the
   "nhpoly1305" crypto_shash in favor of direct lib/crypto/ support for
   NH hashing, and enable optimizations by default.

* tag 'libcrypto-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux: (53 commits)
  lib/crypto: mldsa: Clarify the documentation for mldsa_verify() slightly
  lib/crypto: aes: Drop 'volatile' from aes_sbox and aes_inv_sbox
  lib/crypto: aes: Remove old AES en/decryption functions
  lib/crypto: aesgcm: Use new AES library API
  lib/crypto: aescfb: Use new AES library API
  crypto: omap - Use new AES library API
  crypto: inside-secure - Use new AES library API
  crypto: drbg - Use new AES library API
  crypto: crypto4xx - Use new AES library API
  crypto: chelsio - Use new AES library API
  crypto: ccp - Use new AES library API
  crypto: x86/aes-gcm - Use new AES library API
  crypto: arm64/ghash - Use new AES library API
  crypto: arm/ghash - Use new AES library API
  staging: rtl8723bs: core: Use new AES library API
  net: phy: mscc: macsec: Use new AES library API
  chelsio: Use new AES library API
  Bluetooth: SMP: Use new AES library API
  crypto: x86/aes - Remove the superseded AES-NI crypto_cipher
  lib/crypto: x86/aes: Add AES-NI optimization
  ...
2026-02-10 08:31:09 -08:00
Linus Torvalds
0c61526621 Merge tag 'efi-next-for-v7.0' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi
Pull EFI updates from Ard Biesheuvel:

 - Quirk the broken EFI framebuffer geometry on the Valve Steam Deck

 - Capture the EDID information of the primary display also on non-x86
   EFI systems when booting via the EFI stub.

* tag 'efi-next-for-v7.0' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi:
  efi: Support EDID information
  sysfb: Move edid_info into sysfb_primary_display
  sysfb: Pass sysfb_primary_display to devices
  sysfb: Replace screen_info with sysfb_primary_display
  sysfb: Add struct sysfb_display_info
  efi: sysfb_efi: Reduce number of references to global screen_info
  efi: earlycon: Reduce number of references to global screen_info
  efi: sysfb_efi: Fix efidrmfb and simpledrmfb on Valve Steam Deck
  efi: sysfb_efi: Convert swap width and height quirk to a callback
  efi: sysfb_efi: Fix lfb_linelength calculation when applying quirks
  efi: sysfb_efi: Replace open coded swap with the macro
2026-02-09 20:49:19 -08:00
Linus Torvalds
45bf4bc87c Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 updates from Will Deacon:
 "There's a little less than normal, probably due to LPC & Christmas/New
  Year meaning that a few series weren't quite ready or reviewed in
  time. It's still useful across the board, despite the only real
  feature being support for the LS64 feature enabling 64-byte atomic
  accesses to endpoints that support it.

  ACPI:
   - Add interrupt signalling support to the AGDI handler
   - Add Catalin and myself to the arm64 ACPI MAINTAINERS entry

  CPU features:
   - Drop Kconfig options for PAN and LSE (these are detected at runtime)
   - Add support for 64-byte single-copy atomic instructions (LS64/LS64V)
   - Reduce MTE overhead when executing in the kernel on Ampere CPUs
   - Ensure POR_EL0 value exposed via ptrace is up-to-date
   - Fix error handling on GCS allocation failure

  CPU frequency:
   - Add CPU hotplug support to the FIE setup in the AMU driver

  Entry code:
   - Minor optimisations and cleanups to the syscall entry path
   - Preparatory rework for moving to the generic syscall entry code

  Hardware errata:
   - Work around Spectre-BHB on TSV110 processors
   - Work around broken CMO propagation on some systems with the SI-L1
     interconnect

  Miscellaneous:
   - Disable branch profiling for arch/arm64/ to avoid issues with
     noinstr
   - Minor fixes and cleanups (kexec + ubsan, WARN_ONCE() instead of
     WARN_ON(), reduction of boolean expression)
   - Fix custom __READ_ONCE() implementation for LTO builds when
     operating on non-atomic types

  Perf and PMUs:
   - Support for CMN-600AE
   - Be stricter about supported hardware in the CMN driver
   - Support for DSU-110 and DSU-120
   - Support for the cycles event in the DSU driver (alongside the
     dedicated cycles counter)
   - Use IRQF_NO_THREAD instead of IRQF_ONESHOT in the cxlpmu driver
   - Use !bitmap_empty() as a faster alternative to bitmap_weight()
   - Fix SPE error handling when failing to resume profiling

  Selftests:
   - Add support for the FORCE_TARGETS option to the arm64 kselftests
   - Avoid nolibc-specific my_syscall() function
   - Add basic test for the LS64 HWCAP
   - Extend fp-pidbench to cover additional workload patterns"

* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (43 commits)
  perf/arm-cmn: Reject unsupported hardware configurations
  perf: arm_spe: Properly set hw.state on failures
  arm64/gcs: Fix error handling in arch_set_shadow_stack_status()
  arm64: Fix non-atomic __READ_ONCE() with CONFIG_LTO=y
  arm64: poe: fix stale POR_EL0 values for ptrace
  kselftest/arm64: Raise default number of loops in fp-pidbench
  kselftest/arm64: Add a no-SVE loop after SVE in fp-pidbench
  perf/cxlpmu: Replace IRQF_ONESHOT with IRQF_NO_THREAD
  arm64: mte: Set TCMA1 whenever MTE is present in the kernel
  arm64/ptrace: Return early for ptrace_report_syscall_entry() error
  arm64/ptrace: Split report_syscall()
  arm64: Remove unused _TIF_WORK_MASK
  kselftest/arm64: Add missing file in .gitignore
  arm64: errata: Workaround for SI L1 downstream coherency issue
  kselftest/arm64: Add HWCAP test for FEAT_LS64
  arm64: Add support for FEAT_{LS64, LS64_V}
  KVM: arm64: Enable FEAT_{LS64, LS64_V} in the supported guest
  arm64: Provide basic EL2 setup for FEAT_{LS64, LS64_V} usage at EL0/1
  KVM: arm64: Handle DABT caused by LS64* instructions on unsupported memory
  KVM: arm64: Add documentation for KVM_EXIT_ARM_LDST64B
  ...
2026-02-09 20:28:45 -08:00
Linus Torvalds
d16738a4e7 Merge tag 'kthread-for-7.0' of git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks
Pull kthread updates from Frederic Weisbecker:
 "The kthread code provides an infrastructure which manages the
  preferred affinity of unbound kthreads (node or custom cpumask)
  against housekeeping (CPU isolation) constraints and CPU hotplug
  events.

  One crucial missing piece is the handling of cpuset: when an isolated
  partition is created, deleted, or its CPUs updated, all the unbound
  kthreads in the top cpuset become indifferently affine to _all_ the
  non-isolated CPUs, possibly breaking their preferred affinity along
  the way.

  Solve this with performing the kthreads affinity update from cpuset to
  the kthreads consolidated relevant code instead so that preferred
  affinities are honoured and applied against the updated cpuset
  isolated partitions.

  The dispatch of the new isolated cpumasks to timers, workqueues and
  kthreads is performed by housekeeping, as per the nice Tejun's
  suggestion.

  As a welcome side effect, HK_TYPE_DOMAIN then integrates both the set
  from boot defined domain isolation (through isolcpus=) and cpuset
  isolated partitions. Housekeeping cpumasks are now modifiable with a
  specific RCU based synchronization. A big step toward making
  nohz_full= also mutable through cpuset in the future"

* tag 'kthread-for-7.0' of git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks: (33 commits)
  doc: Add housekeeping documentation
  kthread: Document kthread_affine_preferred()
  kthread: Comment on the purpose and placement of kthread_affine_node() call
  kthread: Honour kthreads preferred affinity after cpuset changes
  sched/arm64: Move fallback task cpumask to HK_TYPE_DOMAIN
  sched: Switch the fallback task allowed cpumask to HK_TYPE_DOMAIN
  kthread: Rely on HK_TYPE_DOMAIN for preferred affinity management
  kthread: Include kthreadd to the managed affinity list
  kthread: Include unbound kthreads in the managed affinity list
  kthread: Refine naming of affinity related fields
  PCI: Remove superfluous HK_TYPE_WQ check
  sched/isolation: Remove HK_TYPE_TICK test from cpu_is_isolated()
  cpuset: Remove cpuset_cpu_is_isolated()
  timers/migration: Remove superfluous cpuset isolation test
  cpuset: Propagate cpuset isolation update to timers through housekeeping
  cpuset: Propagate cpuset isolation update to workqueue through housekeeping
  PCI: Flush PCI probe workqueue on cpuset isolated partition change
  sched/isolation: Flush vmstat workqueues on cpuset isolated partition change
  sched/isolation: Flush memcg workqueues on cpuset isolated partition change
  cpuset: Update HK_TYPE_DOMAIN cpumask from cpuset
  ...
2026-02-09 19:57:30 -08:00
Paolo Bonzini
5490063269 Merge tag 'kvmarm-7.0' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD
KVM/arm64 updates for 7.0

- Add support for FEAT_IDST, allowing ID registers that are not
  implemented to be reported as a normal trap rather than as an UNDEF
  exception.

- Add sanitisation of the VTCR_EL2 register, fixing a number of
  UXN/PXN/XN bugs in the process.

- Full handling of RESx bits, instead of only RES0, and resulting in
  SCTLR_EL2 being added to the list of sanitised registers.

- More pKVM fixes for features that are not supposed to be exposed to
  guests.

- Make sure that MTE being disabled on the pKVM host doesn't give it
  the ability to attack the hypervisor.

- Allow pKVM's host stage-2 mappings to use the Force Write Back
  version of the memory attributes by using the "pass-through'
  encoding.

- Fix trapping of ICC_DIR_EL1 on GICv5 hosts emulating GICv3 for the
  guest.

- Preliminary work for guest GICv5 support.

- A bunch of debugfs fixes, removing pointless custom iterators stored
  in guest data structures.

- A small set of FPSIMD cleanups.

- Selftest fixes addressing the incorrect alignment of page
  allocation.

- Other assorted low-impact fixes and spelling fixes.
2026-02-09 18:18:19 +01:00
Marc Zyngier
6316366129 Merge branch kvm-arm64/misc-6.20 into kvmarm-master/next
* kvm-arm64/misc-6.20:
  : .
  : Misc KVM/arm64 changes for 6.20
  :
  : - Trivial FPSIMD cleanups
  :
  : - Calculate hyp VA size only once, avoiding potential mapping issues when
  :   VA bits is smaller than expected
  :
  : - Silence sparse warning for the HYP stack base
  :
  : - Fix error checking when handling FFA_VERSION
  :
  : - Add missing trap configuration for DBGWCR15_EL1
  :
  : - Don't try to deal with nested S2 when NV isn't enabled for a guest
  :
  : - Various spelling fixes
  : .
  KVM: arm64: nv: Avoid NV stage-2 code when NV is not supported
  KVM: arm64: Fix various comments
  KVM: arm64: nv: Add trap config for DBGWCR<15>_EL1
  KVM: arm64: Fix error checking for FFA_VERSION
  KVM: arm64: Fix missing <asm/stackpage/nvhe.h> include
  KVM: arm64: Calculate hyp VA size only once
  KVM: arm64: Remove ISB after writing FPEXC32_EL2
  KVM: arm64: Shuffle KVM_HOST_DATA_FLAG_* indices
  KVM: arm64: Fix comment in fpsimd_lazy_switch_to_host()

Signed-off-by: Marc Zyngier <maz@kernel.org>
2026-02-05 09:17:58 +00:00
Marc Zyngier
1df3f01ebf Merge branch kvm-arm64/resx into kvmarm-master/next
* kvm-arm64/resx:
  : .
  : Add infrastructure to deal with the full gamut of RESx bits
  : for NV. As a result, it is now possible to have the expected
  : semantics for some bits such as SCTLR_EL2.SPAN.
  : .
  KVM: arm64: Add debugfs file dumping computed RESx values
  KVM: arm64: Add sanitisation to SCTLR_EL2
  KVM: arm64: Remove all traces of HCR_EL2.MIOCNCE
  KVM: arm64: Remove all traces of FEAT_TME
  KVM: arm64: Simplify handling of full register invalid constraint
  KVM: arm64: Get rid of FIXED_VALUE altogether
  KVM: arm64: Simplify handling of HCR_EL2.E2H RESx
  KVM: arm64: Move RESx into individual register descriptors
  KVM: arm64: Add RES1_WHEN_E2Hx constraints as configuration flags
  KVM: arm64: Add REQUIRES_E2H1 constraint as configuration flags
  KVM: arm64: Simplify FIXED_VALUE handling
  KVM: arm64: Convert HCR_EL2.RW to AS_RES1
  KVM: arm64: Correctly handle SCTLR_EL1 RES1 bits for unsupported features
  KVM: arm64: Allow RES1 bits to be inferred from configuration
  KVM: arm64: Inherit RESx bits from FGT register descriptors
  KVM: arm64: Extend unified RESx handling to runtime sanitisation
  KVM: arm64: Introduce data structure tracking both RES0 and RES1 bits
  KVM: arm64: Introduce standalone FGU computing primitive
  KVM: arm64: Remove duplicate configuration for SCTLR_EL1.{EE,E0E}
  arm64: Convert SCTLR_EL2 to sysreg infrastructure

Signed-off-by: Marc Zyngier <maz@kernel.org>
2026-02-05 09:17:48 +00:00
Marc Zyngier
3ef5ba663a Merge branch kvm-arm64/debugfs-fixes into kvmarm-master/next
* kvm-arm64/debugfs-fixes:
  : .
  : Cleanup of the debugfs iterator, which are way more complicated
  : than they ought to be, courtesy of Fuad Tabba. From the cover letter:
  :
  : "This series refactors the debugfs implementations for `idregs` and
  :  `vgic-state` to use standard `seq_file` iterator patterns.
  :
  :  The existing implementations relied on storing iterator state within
  :  global VM structures (`kvm_arch` and `vgic_dist`). This approach
  :  prevented concurrent reads of the debugfs files (returning -EBUSY) and
  :  created improper dependencies between transient file operations and
  :  long-lived VM state."
  : .
  KVM: arm64: Use standard seq_file iterator for vgic-debug debugfs
  KVM: arm64: Reimplement vgic-debug XArray iteration
  KVM: arm64: Use standard seq_file iterator for idregs debugfs

Signed-off-by: Marc Zyngier <maz@kernel.org>
2026-02-05 09:17:44 +00:00
Marc Zyngier
47e89febd3 Merge branch kvm-arm64/gicv5-prologue into kvmarm-master/next
* kvm-arm64/gicv5-prologue:
  : .
  : Prologue to GICv5 support, courtesy of Sascha Bischoff.
  :
  : This is preliminary work that sets the scene for the full-blow
  : support.
  : .
  irqchip/gic-v5: Check if impl is virt capable
  KVM: arm64: gic: Set vgic_model before initing private IRQs
  arm64/sysreg: Drop ICH_HFGRTR_EL2.ICC_HAPR_EL1 and make RES1
  KVM: arm64: gic-v3: Switch vGIC-v3 to use generated ICH_VMCR_EL2

Signed-off-by: Marc Zyngier <maz@kernel.org>
2026-02-05 09:17:30 +00:00
Marc Zyngier
1c880ea3ef Merge branch kvm-arm64/gicv3-tdir-fixes into kvmarm-master/next
* kvm-arm64/gicv3-tdir-fixes:
  : .
  : Address two trapping-related issues when running legacy (i.e. GICv3)
  : guests on GICv5 hosts, courtesy of Sascha Bischoff.
  : .
  KVM: arm64: Correct test for ICH_HCR_EL2_TDIR cap for GICv5 hosts
  KVM: arm64: gic: Enable GICv3 CPUIF trapping on GICv5 hosts if required

Signed-off-by: Marc Zyngier <maz@kernel.org>
2026-02-05 09:17:04 +00:00
Marc Zyngier
26e013b3fc Merge branch kvm-arm64/fwb-for-all into kvmarm-master/next
* kvm-arm64/fwb-for-all:
  : .
  : Allow pKVM's host stage-2 mappings to use the Force Write Back version
  : of the memory attributes by using the "pass-through' encoding.
  :
  : This avoids having two separate encodings for S2 on a given platform.
  : .
  KVM: arm64: Simplify PAGE_S2_MEMATTR
  KVM: arm64: Kill KVM_PGTABLE_S2_NOFWB
  KVM: arm64: Switch pKVM host S2 over to KVM_PGTABLE_S2_AS_S1
  KVM: arm64: Add KVM_PGTABLE_S2_AS_S1 flag
  arm64: Add MT_S2{,_FWB}_AS_S1 encodings

Signed-off-by: Marc Zyngier <maz@kernel.org>
2026-02-05 09:16:42 +00:00
Marc Zyngier
183ac2b2ad Merge branch kvm-arm64/pkvm-no-mte into kvmarm-master/next
* kvm-arm64/pkvm-no-mte:
  : .
  : pKVM updates preventing the host from using MTE-related system
  : sysrem registers when the feature is disabled from the kernel
  : command-line (arm64.nomte), courtesy of Fuad Taba.
  :
  : From the cover letter:
  :
  : "If MTE is supported by the hardware (and is enabled at EL3), it remains
  : available to lower exception levels by default. Disabling it in the host
  : kernel (e.g., via 'arm64.nomte') only stops the kernel from advertising
  : the feature; it does not physically disable MTE in the hardware.
  :
  : The ability to disable MTE in the host kernel is used by some systems,
  : such as Android, so that the physical memory otherwise used as tag
  : storage can be used for other things (i.e. treated just like the rest of
  : memory). In this scenario, a malicious host could still access tags in
  : pages donated to a guest using MTE instructions (e.g., STG and LDG),
  : bypassing the kernel's configuration."
  : .
  KVM: arm64: Use kvm_has_mte() in pKVM trap initialization
  KVM: arm64: Inject UNDEF when accessing MTE sysregs with MTE disabled
  KVM: arm64: Trap MTE access and discovery when MTE is disabled
  KVM: arm64: Remove dead code resetting HCR_EL2 for pKVM

Signed-off-by: Marc Zyngier <maz@kernel.org>
2026-02-05 09:16:31 +00:00
Marc Zyngier
edba407843 KVM: arm64: Add debugfs file dumping computed RESx values
Computing RESx values is hard. Verifying that they are correct is
harder. Add a debugfs file called "resx" that will dump all the RESx
values for a given VM.

I found it useful, maybe you will too.

Co-developed-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Fuad Tabba <tabba@google.com>
Tested-by: Fuad Tabba <tabba@google.com>
Link: https://patch.msgid.link/20260202184329.2724080-21-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
2026-02-05 09:02:13 +00:00
Marc Zyngier
e8ef27900c KVM: arm64: Add sanitisation to SCTLR_EL2
Sanitise SCTLR_EL2 the usual way. The most important aspect of
this is that we benefit from SCTLR_EL2.SPAN being RES1 when
HCR_EL2.E2H==0.

Reviewed-by: Fuad Tabba <tabba@google.com>
Tested-by: Fuad Tabba <tabba@google.com>
Link: https://patch.msgid.link/20260202184329.2724080-20-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
2026-02-05 09:02:13 +00:00
Marc Zyngier
fb40cb15e8 KVM: arm64: Remove all traces of HCR_EL2.MIOCNCE
MIOCNCE had the potential to eat your data, and also was never
implemented by anyone. It's been retrospectively removed from
the architecture, and we're happy to follow that lead.

Reviewed-by: Fuad Tabba <tabba@google.com>
Tested-by: Fuad Tabba <tabba@google.com>
Link: https://patch.msgid.link/20260202184329.2724080-19-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
2026-02-05 09:02:13 +00:00
Marc Zyngier
d65bf6e317 KVM: arm64: Remove all traces of FEAT_TME
FEAT_TME has been dropped from the architecture. Retrospectively.
I'm sure someone is crying somewhere, but most of us won't.

Clean-up time.

Reviewed-by: Fuad Tabba <tabba@google.com>
Tested-by: Fuad Tabba <tabba@google.com>
Link: https://patch.msgid.link/20260202184329.2724080-18-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
2026-02-05 09:02:13 +00:00
Marc Zyngier
d784cfe697 KVM: arm64: Simplify handling of full register invalid constraint
Now that we embed the RESx bits in the register description, it becomes
easier to deal with registers that are simply not valid, as their
existence is not satisfied by the configuration (SCTLR2_ELx without
FEAT_SCTLR2, for example). Such registers essentially become RES0 for
any bit that wasn't already advertised as RESx.

Reviewed-by: Fuad Tabba <tabba@google.com>
Tested-by: Fuad Tabba <tabba@google.com>
Link: https://patch.msgid.link/20260202184329.2724080-17-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
2026-02-05 09:02:13 +00:00
Marc Zyngier
ab1f377b4c KVM: arm64: Get rid of FIXED_VALUE altogether
We have now killed every occurrences of FIXED_VALUE, and we can therefore
drop the whole infrastructure. Good riddance.

Reviewed-by: Fuad Tabba <tabba@google.com>
Tested-by: Fuad Tabba <tabba@google.com>
Link: https://patch.msgid.link/20260202184329.2724080-16-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
2026-02-05 09:02:13 +00:00
Marc Zyngier
f01e3429cf KVM: arm64: Simplify handling of HCR_EL2.E2H RESx
Now that we can link the RESx behaviour with the value of HCR_EL2.E2H,
we can trivially express the tautological constraint that makes E2H
a reserved value at all times.

Fun, isn't it?

Reviewed-by: Fuad Tabba <tabba@google.com>
Tested-by: Fuad Tabba <tabba@google.com>
Link: https://patch.msgid.link/20260202184329.2724080-15-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
2026-02-05 09:02:13 +00:00
Marc Zyngier
d2f629aa75 KVM: arm64: Move RESx into individual register descriptors
Instead of hacking the RES1 bits at runtime, move them into the
register descriptors. This makes it significantly nicer.

Reviewed-by: Fuad Tabba <tabba@google.com>
Tested-by: Fuad Tabba <tabba@google.com>
Link: https://patch.msgid.link/20260202184329.2724080-14-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
2026-02-05 09:02:13 +00:00
Marc Zyngier
d406fcb203 KVM: arm64: Add RES1_WHEN_E2Hx constraints as configuration flags
"Thanks" to VHE, SCTLR_EL2 radically changes shape depending on the
value of HCR_EL2.E2H, as a lot of the bits that didn't have much
meaning with E2H=0 start impacting EL0 with E2H=1.

This has a direct impact on the RESx behaviour of these bits, and
we need a way to express them.

For this purpose, introduce two new constaints that, when the
controlling feature is not present, force the field to RES1 depending
on the value of E2H. Note that RES0 is still implicit,

This allows diverging RESx values depending on the value of E2H,
something that is required by a bunch of SCTLR_EL2 bits.

Reviewed-by: Fuad Tabba <tabba@google.com>
Tested-by: Fuad Tabba <tabba@google.com>
Link: https://patch.msgid.link/20260202184329.2724080-13-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
2026-02-05 09:02:12 +00:00
Marc Zyngier
ad90512f12 KVM: arm64: Add REQUIRES_E2H1 constraint as configuration flags
A bunch of EL2 configuration are very similar to their EL1 counterpart,
with the added constraint that HCR_EL2.E2H being 1.

For us, this means HCR_EL2.E2H being RES1, which is something we can
statically evaluate.

Add a REQUIRES_E2H1 constraint, which allows us to express conditions
in a much simpler way (without extra code). Existing occurrences are
converted, before we add a lot more.

Reviewed-by: Fuad Tabba <tabba@google.com>
Tested-by: Fuad Tabba <tabba@google.com>
Link: https://patch.msgid.link/20260202184329.2724080-12-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
2026-02-05 09:01:41 +00:00
Marc Zyngier
8d94458263 KVM: arm64: Simplify FIXED_VALUE handling
The FIXED_VALUE qualifier (mostly used for HCR_EL2) is pointlessly
complicated, as it tries to piggy-back on the previous RES0 handling
while being done in a different phase, on different data.

Instead, make it an integral part of the RESx computation, and allow
it to directly set RESx bits. This is much easier to understand.

It also paves the way for some additional changes to that will allow
the full removal of the FIXED_VALUE handling.

Reviewed-by: Fuad Tabba <tabba@google.com>
Tested-by: Fuad Tabba <tabba@google.com>
Link: https://patch.msgid.link/20260202184329.2724080-11-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
2026-02-05 09:01:41 +00:00
Marc Zyngier
fb86207bdc KVM: arm64: Convert HCR_EL2.RW to AS_RES1
Now that we have the AS_RES1 constraint, it becomes trivial to express
the HCR_EL2.RW behaviour.

Reviewed-by: Fuad Tabba <tabba@google.com>
Tested-by: Fuad Tabba <tabba@google.com>
Link: https://patch.msgid.link/20260202184329.2724080-10-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
2026-02-05 09:01:41 +00:00
Marc Zyngier
c27b8b7aab KVM: arm64: Correctly handle SCTLR_EL1 RES1 bits for unsupported features
A bunch of SCTLR_EL1 bits must be set to RES1 when the controlling
feature is not present. Add the AS_RES1 qualifier where needed.

Reviewed-by: Fuad Tabba <tabba@google.com>
Tested-by: Fuad Tabba <tabba@google.com>
Link: https://patch.msgid.link/20260202184329.2724080-9-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
2026-02-05 09:01:41 +00:00