linux-cryptodev-2.6

mirror of https://git.kernel.org/pub/scm/linux/kernel/git/herbert/cryptodev-2.6.git synced 2026-04-27 03:49:57 -04:00

Author	SHA1	Message	Date
Lu Baolu	e37d5a2d60	iommu/sva: invalidate stale IOTLB entries for kernel address space Introduce a new IOMMU interface to flush IOTLB paging cache entries for the CPU kernel address space. This interface is invoked from the x86 architecture code that manages combined user and kernel page tables, specifically before any kernel page table page is freed and reused. This addresses the main issue with vfree() which is a common occurrence and can be triggered by unprivileged users. While this resolves the primary problem, it doesn't address some extremely rare case related to memory unplug of memory that was present as reserved memory at boot, which cannot be triggered by unprivileged users. The discussion can be found at the link below. Enable SVA on x86 architecture since the IOMMU can now receive notification to flush the paging cache before freeing the CPU kernel page table pages. Link: https://lkml.kernel.org/r/20251022082635.2462433-9-baolu.lu@linux.intel.com Link: https://lore.kernel.org/linux-iommu/04983c62-3b1d-40d4-93ae-34ca04b827e5@intel.com/ Co-developed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Suggested-by: Jann Horn <jannh@google.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Vasant Hegde <vasant.hegde@amd.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Cc: Alistair Popple <apopple@nvidia.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Betkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David Hildenbrand <david@redhat.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jean-Philippe Brucker <jean-philippe@linaro.org> Cc: Joerg Roedel <joro@8bytes.org> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Mike Rapoport (Microsoft) <rppt@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Robin Murohy <robin.murphy@arm.com> Cc: Thomas Gleinxer <tglx@linutronix.de> Cc: "Uladzislau Rezki (Sony)" <urezki@gmail.com> Cc: Vinicius Costa Gomes <vinicius.gomes@intel.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Will Deacon <will@kernel.org> Cc: Yi Lai <yi1.lai@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-11-16 17:28:18 -08:00
Lu Baolu	72f98ef9a4	iommu: disable SVA when CONFIG_X86 is set Patch series "Fix stale IOTLB entries for kernel address space", v7. This proposes a fix for a security vulnerability related to IOMMU Shared Virtual Addressing (SVA). In an SVA context, an IOMMU can cache kernel page table entries. When a kernel page table page is freed and reallocated for another purpose, the IOMMU might still hold stale, incorrect entries. This can be exploited to cause a use-after-free or write-after-free condition, potentially leading to privilege escalation or data corruption. This solution introduces a deferred freeing mechanism for kernel page table pages, which provides a safe window to notify the IOMMU to invalidate its caches before the page is reused. This patch (of 8): In the IOMMU Shared Virtual Addressing (SVA) context, the IOMMU hardware shares and walks the CPU's page tables. The x86 architecture maps the kernel's virtual address space into the upper portion of every process's page table. Consequently, in an SVA context, the IOMMU hardware can walk and cache kernel page table entries. The Linux kernel currently lacks a notification mechanism for kernel page table changes, specifically when page table pages are freed and reused. The IOMMU driver is only notified of changes to user virtual address mappings. This can cause the IOMMU's internal caches to retain stale entries for kernel VA. Use-After-Free (UAF) and Write-After-Free (WAF) conditions arise when kernel page table pages are freed and later reallocated. The IOMMU could misinterpret the new data as valid page table entries. The IOMMU might then walk into attacker-controlled memory, leading to arbitrary physical memory DMA access or privilege escalation. This is also a Write-After-Free issue, as the IOMMU will potentially continue to write Accessed and Dirty bits to the freed memory while attempting to walk the stale page tables. Currently, SVA contexts are unprivileged and cannot access kernel mappings. However, the IOMMU will still walk kernel-only page tables all the way down to the leaf entries, where it realizes the mapping is for the kernel and errors out. This means the IOMMU still caches these intermediate page table entries, making the described vulnerability a real concern. Disable SVA on x86 architecture until the IOMMU can receive notification to flush the paging cache before freeing the CPU kernel page table pages. Link: https://lkml.kernel.org/r/20251022082635.2462433-1-baolu.lu@linux.intel.com Link: https://lkml.kernel.org/r/20251022082635.2462433-2-baolu.lu@linux.intel.com Fixes: `26b25a2b98` ("iommu: Bind process address spaces to devices") Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Suggested-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Cc: Alistair Popple <apopple@nvidia.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Betkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@intel.com> Cc: David Hildenbrand <david@redhat.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jann Horn <jannh@google.com> Cc: Jean-Philippe Brucker <jean-philippe@linaro.org> Cc: Joerg Roedel <joro@8bytes.org> Cc: Kevin Tian <kevin.tian@intel.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Mike Rapoport <rppt@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Robin Murohy <robin.murphy@arm.com> Cc: Thomas Gleinxer <tglx@linutronix.de> Cc: "Uladzislau Rezki (Sony)" <urezki@gmail.com> Cc: Vasant Hegde <vasant.hegde@amd.com> Cc: Vinicius Costa Gomes <vinicius.gomes@intel.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Will Deacon <will@kernel.org> Cc: Yi Lai <yi1.lai@intel.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-11-16 17:28:16 -08:00
Jason Gunthorpe	afb47765f9	iommufd: Make vfio_compat's unmap succeed if the range is already empty iommufd returns ENOENT when attempting to unmap a range that is already empty, while vfio type1 returns success. Fix vfio_compat to match. Fixes: `d624d6652a` ("iommufd: vfio container FD ioctl compatibility") Link: https://patch.msgid.link/r/0-v1-76be45eff0be+5d-iommufd_unmap_compat_jgg@nvidia.com Reviewed-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Alex Mastro <amastro@fb.com> Reported-by: Alex Mastro <amastro@fb.com> Closes: https://lore.kernel.org/r/aP0S5ZF9l3sWkJ1G@devgpu012.nha5.facebook.com Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2025-11-05 15:11:26 -04:00
Jason Gunthorpe	cb30dfa75d	iommufd: Don't overflow during division for dirty tracking If pgshift is 63 then BITS_PER_TYPE(bitmap->bitmap) pgsize will overflow to 0 and this triggers divide by 0. In this case the index should just be 0, so reorganize things to divide by shift and avoid hitting any overflows. Link: https://patch.msgid.link/r/0-v1-663679b57226+172-iommufd_dirty_div0_jgg@nvidia.com Cc: stable@vger.kernel.org Fixes: `58ccf0190d` ("vfio: Add an IOVA bitmap support") Reviewed-by: Joao Martins <joao.m.martins@oracle.com> Reviewed-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reported-by: syzbot+093a8a8b859472e6c257@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=093a8a8b859472e6c257 Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2025-10-20 19:58:37 -03:00
Linus Torvalds	e56ebe27a0	Merge tag 'for-linus-iommufd' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd Pull iommufd updates from Jason Gunthorpe: "Two minor fixes: - Make the selftest work again on x86 platforms with iommus enabled - Fix a compiler warning in the userspace kselftest" * tag 'for-linus-iommufd' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd: iommufd: Register iommufd mock devices with fwspec iommu/selftest: prevent use of uninitialized variable	2025-10-03 18:18:48 -07:00
Linus Torvalds	bed0653fe2	Merge tag 'iommu-updates-v6.18' of git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux Pull iommu updates from Joerg Roedel: - Inte VT-d: - IOMMU driver updated to the latest VT-d specification - Don't enable PRS if PDS isn't supported - Replace snprintf with scnprintf - Fix legacy mode page table dump through debugfs - Miscellaneous cleanups - AMD-Vi: - Support kdump boot when SNP is enabled - Apple-DART: - 4-level page-table support - RISC-V IOMMU: - ACPI support - Small number of miscellaneous cleanups and fixes * tag 'iommu-updates-v6.18' of git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux: (22 commits) iommu/vt-d: Disallow dirty tracking if incoherent page walk iommu/vt-d: debugfs: Avoid dumping context command register iommu/vt-d: Removal of Advanced Fault Logging iommu/vt-d: PRS isn't usable if PDS isn't supported iommu/vt-d: Remove LPIG from page group response descriptor iommu/vt-d: Drop unused cap_super_offset() iommu/vt-d: debugfs: Fix legacy mode page table dump logic iommu/vt-d: Replace snprintf with scnprintf in dmar_latency_snapshot() iommu/io-pgtable-dart: Fix off by one error in table index check iommu/riscv: Add ACPI support ACPI: scan: Add support for RISC-V in acpi_iommu_configure_id() ACPI: RISC-V: Add support for RIMT iommu/omap: Use int type to store negative error codes iommu/apple-dart: Clear stream error indicator bits for T8110 DARTs iommu/amd: Skip enabling command/event buffers for kdump crypto: ccp: Skip SEV and SNP INIT for kdump boot iommu/amd: Reuse device table for kdump iommu/amd: Add support to remap/unmap IOMMU buffers for kdump iommu/apple-dart: Add 4-level page table support iommu/io-pgtable-dart: Add 4-level page table support ...	2025-10-03 18:00:11 -07:00
Linus Torvalds	a498d59c46	Merge tag 'dma-mapping-6.18-2025-09-30' of git://git.kernel.org/pub/scm/linux/kernel/git/mszyprowski/linux Pull dma-mapping updates from Marek Szyprowski: - Refactoring of DMA mapping API to physical addresses as the primary interface instead of page+offset parameters This gets much closer to Matthew Wilcox's long term wish for struct-pageless IO to cacheable DRAM and is supporting memdesc project which seeks to substantially transform how struct page works. An advantage of this approach is the possibility of introducing DMA_ATTR_MMIO, which covers existing 'dma_map_resource' flow in the common paths, what in turn lets to use recently introduced dma_iova_link() API to map PCI P2P MMIO without creating struct page Developped by Leon Romanovsky and Jason Gunthorpe - Minor clean-up by Petr Tesarik and Qianfeng Rong * tag 'dma-mapping-6.18-2025-09-30' of git://git.kernel.org/pub/scm/linux/kernel/git/mszyprowski/linux: kmsan: fix missed kmsan_handle_dma() signature conversion mm/hmm: properly take MMIO path mm/hmm: migrate to physical address-based DMA mapping API dma-mapping: export new dma_map_phys() interface xen: swiotlb: Open code map_resource callback dma-mapping: implement DMA_ATTR_MMIO for dma_(un)map_page_attrs() kmsan: convert kmsan_handle_dma to use physical addresses dma-mapping: convert dma_direct_map_page to be phys_addr_t based iommu/dma: implement DMA_ATTR_MMIO for iommu_dma_(un)map_phys() iommu/dma: rename iommu_dma_map_page to iommu_dma_map_phys dma-mapping: rename trace_dma_map_page to trace_dma_map_phys dma-debug: refactor to use physical addresses for page mapping iommu/dma: implement DMA_ATTR_MMIO for dma_iova_link(). dma-mapping: introduce new DMA attribute to indicate MMIO memory swiotlb: Remove redundant __GFP_NOWARN dma-direct: clean up the logic in __dma_direct_alloc_pages()	2025-10-03 17:41:12 -07:00
Guixin Liu	2a918911ed	iommufd: Register iommufd mock devices with fwspec Since the bus ops were retired the iommu subsystem changed to using fwspec to match the iommu driver to the iommu device. If a device has a NULL fwspec then it is matched to the first iommu driver with a NULL fwspec, effectively disabling support for systems with more than one non-fwspec iommu driver. Thus, if the iommufd selfest are run in an x86 system that registers a non-fwspec iommu driver they fail to bind their mock devices to the mock iommu driver. Fix this by allocating a software fwnode for mock iommu driver's iommu_device, and set it to the device which mock iommu driver created. This is done by adding a new helper iommu_mock_device_add() which abuses the internals of the fwspec system to establish a fwspec before the device is added and is careful not to leak it. A matching dummy fwspec is automatically added to the mock iommu driver. Test by "make -C toosl/testing/selftests TARGETS=iommu run_tests": PASSED: 229 / 229 tests passed. In addition, this issue is also can be found on amd platform, and also tested on a amd machine. Link: https://patch.msgid.link/r/20250925054730.3877-1-kanie@linux.alibaba.com Fixes: `17de3f5fdd` ("iommu: Retire bus ops") Signed-off-by: Guixin Liu <kanie@linux.alibaba.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Tested-by: Qinyun Tan <qinyuntan@linux.alibaba.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2025-09-30 09:54:12 -03:00
Joerg Roedel	5f4b8c03f4	Merge branches 'apple/dart', 'ti/omap', 'riscv', 'intel/vt-d' and 'amd/amd-vi' into next	2025-09-26 10:03:33 +02:00
Lu Baolu	57f55048e5	iommu/vt-d: Disallow dirty tracking if incoherent page walk Dirty page tracking relies on the IOMMU atomically updating the dirty bit in the paging-structure entry. For this operation to succeed, the paging- structure memory must be coherent between the IOMMU and the CPU. In another word, if the iommu page walk is incoherent, dirty page tracking doesn't work. The Intel VT-d specification, Section 3.10 "Snoop Behavior" states: "Remapping hardware encountering the need to atomically update A/EA/D bits in a paging-structure entry that is not snooped will result in a non- recoverable fault." To prevent an IOMMU from being incorrectly configured for dirty page tracking when it is operating in an incoherent mode, mark SSADS as supported only when both ecap_slads and ecap_smpwc are supported. Fixes: `f35f22cc76` ("iommu/vt-d: Access/Dirty bit support for SS domains") Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/20250924083447.123224-1-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2025-09-26 10:02:26 +02:00
Linus Torvalds	d4c7fccfa7	Merge tag 'for-linus-iommufd' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd Pull iommufd fixes from Jason Gunthorpe: "Fix two user triggerable use-after-free issues: - Possible race UAF setting up mmaps - Syzkaller found UAF when erroring an file descriptor creation ioctl due to the fput() work queue" * tag 'for-linus-iommufd' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd: iommufd/selftest: Update the fail_nth limit iommufd: WARN if an object is aborted with an elevated refcount iommufd: Fix race during abort for file descriptors iommufd: Fix refcounting race during mmap	2025-09-22 11:16:14 -07:00
Jason Gunthorpe	53d0584eeb	iommufd: WARN if an object is aborted with an elevated refcount If something holds a refcount then it is at risk of UAFing. For abort paths we expect the caller to never share the object with a parallel thread and to clean up any refcounts it obtained on its own. Add the missing dec inside iommufd_hwpt_paging_alloc() during error unwind by making iommufd_hw_pagetable_attach/detach() proper pairs. Link: https://patch.msgid.link/r/2-v1-02cd136829df+31-iommufd_syz_fput_jgg@nvidia.com Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Nicolin Chen <nicolinc@nvidia.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2025-09-19 10:34:49 -03:00
Jason Gunthorpe	4e034bf045	iommufd: Fix race during abort for file descriptors fput() doesn't actually call file_operations release() synchronously, it puts the file on a work queue and it will be released eventually. This is normally fine, except for iommufd the file and the iommufd_object are tied to gether. The file has the object as it's private_data and holds a users refcount, while the object is expected to remain alive as long as the file is. When the allocation of a new object aborts before installing the file it will fput() the file and then go on to immediately kfree() the obj. This causes a UAF once the workqueue completes the fput() and tries to decrement the users refcount. Fix this by putting the core code in charge of the file lifetime, and call __fput_sync() during abort to ensure that release() is called before kfree. __fput_sync() is a bit too tricky to open code in all the object implementations. Instead the objects tell the core code where the file pointer is and the core will take care of the life cycle. If the object is successfully allocated then the file will hold a users refcount and the iommufd_object cannot be destroyed. It is worth noting that close(); ioctl(IOMMU_DESTROY); doesn't have an issue because close() is already using a synchronous version of fput(). The UAF looks like this: BUG: KASAN: slab-use-after-free in iommufd_eventq_fops_release+0x45/0xc0 drivers/iommu/iommufd/eventq.c:376 Write of size 4 at addr ffff888059c97804 by task syz.0.46/6164 CPU: 0 UID: 0 PID: 6164 Comm: syz.0.46 Not tainted syzkaller #0 PREEMPT(full) Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 08/18/2025 Call Trace: <TASK> __dump_stack lib/dump_stack.c:94 [inline] dump_stack_lvl+0x116/0x1f0 lib/dump_stack.c:120 print_address_description mm/kasan/report.c:378 [inline] print_report+0xcd/0x630 mm/kasan/report.c:482 kasan_report+0xe0/0x110 mm/kasan/report.c:595 check_region_inline mm/kasan/generic.c:183 [inline] kasan_check_range+0x100/0x1b0 mm/kasan/generic.c:189 instrument_atomic_read_write include/linux/instrumented.h:96 [inline] atomic_fetch_sub_release include/linux/atomic/atomic-instrumented.h:400 [inline] __refcount_dec include/linux/refcount.h:455 [inline] refcount_dec include/linux/refcount.h:476 [inline] iommufd_eventq_fops_release+0x45/0xc0 drivers/iommu/iommufd/eventq.c:376 __fput+0x402/0xb70 fs/file_table.c:468 task_work_run+0x14d/0x240 kernel/task_work.c:227 resume_user_mode_work include/linux/resume_user_mode.h:50 [inline] exit_to_user_mode_loop+0xeb/0x110 kernel/entry/common.c:43 exit_to_user_mode_prepare include/linux/irq-entry-common.h:225 [inline] syscall_exit_to_user_mode_work include/linux/entry-common.h:175 [inline] syscall_exit_to_user_mode include/linux/entry-common.h:210 [inline] do_syscall_64+0x41c/0x4c0 arch/x86/entry/syscall_64.c:100 entry_SYSCALL_64_after_hwframe+0x77/0x7f Link: https://patch.msgid.link/r/1-v1-02cd136829df+31-iommufd_syz_fput_jgg@nvidia.com Cc: stable@vger.kernel.org Fixes: `07838f7fd5` ("iommufd: Add iommufd fault object") Reviewed-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Nirmoy Das <nirmoyd@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Reported-by: syzbot+80620e2d0d0a33b09f93@syzkaller.appspotmail.com Closes: https://lore.kernel.org/r/68c8583d.050a0220.2ff435.03a2.GAE@google.com Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2025-09-19 10:34:49 -03:00
Jason Gunthorpe	7a425ec75d	iommufd: Fix refcounting race during mmap The owner object of the imap can be destroyed while the imap remains in the mtree. So access to the imap pointer without holding locks is racy with destruction. The imap is safe to access outside the lock once a users refcount is obtained, the owner object cannot start destruction until users is 0. Thus the users refcount should not be obtained at the end of iommufd_fops_mmap() but instead inside the mtree lock held around the mtree_load(). Move the refcount there and use refcount_inc_not_zero() as we can have a 0 refcount inside the mtree during destruction races. Link: https://patch.msgid.link/r/0-v1-e6faace50971+3cc-iommufd_mmap_fix_jgg@nvidia.com Cc: stable@vger.kernel.org Fixes: `56e9a0d8e5` ("iommufd: Add mmap interface") Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2025-09-19 10:34:49 -03:00
Lu Baolu	5bd5ab53e7	iommu/vt-d: debugfs: Avoid dumping context command register The register-based cache invalidation interface is in the process of being replaced by the queued invalidation interface. The VT-d architecture allows hardware implementations with a queued invalidation interface to not implement the registers used for cache invalidation. Currently, the debugfs interface dumps the Context Command Register unconditionally, which is not reasonable. Remove it to avoid potential access to non-present registers. Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Link: https://lore.kernel.org/r/20250917025051.143853-1-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2025-09-19 09:43:21 +02:00
Lu Baolu	7d4e404102	iommu/vt-d: Removal of Advanced Fault Logging The advanced fault logging has been removed from the specification since v4.0. Linux doesn't implement advanced fault logging functionality, but it currently dumps the advanced logging registers through debugfs. Remove the dumping of these advanced fault logging registers through debugfs to avoid potential access to non-present registers. Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Link: https://lore.kernel.org/r/20250917024850.143801-1-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2025-09-19 09:43:21 +02:00
Lu Baolu	5ef7e24c74	iommu/vt-d: PRS isn't usable if PDS isn't supported The specification, Section 7.10, "Software Steps to Drain Page Requests & Responses," requires software to submit an Invalidation Wait Descriptor (inv_wait_dsc) with the Page-request Drain (PD=1) flag set, along with the Invalidation Wait Completion Status Write flag (SW=1). It then waits for the Invalidation Wait Descriptor's completion. However, the PD field in the Invalidation Wait Descriptor is optional, as stated in Section 6.5.2.9, "Invalidation Wait Descriptor": "Page-request Drain (PD): Remapping hardware implementations reporting Page-request draining as not supported (PDS = 0 in ECAP_REG) treat this field as reserved." This implies that if the IOMMU doesn't support the PDS capability, software can't drain page requests and group responses as expected. Do not enable PCI/PRI if the IOMMU doesn't support PDS. Reported-by: Joel Granados <joel.granados@kernel.org> Closes: https://lore.kernel.org/r/20250909-jag-pds-v1-1-ad8cba0e494e@kernel.org Fixes: `66ac4db36f` ("iommu/vt-d: Add page request draining support") Cc: stable@vger.kernel.org Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Link: https://lore.kernel.org/r/20250915062946.120196-1-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2025-09-19 09:43:21 +02:00
Lu Baolu	4402e8f39d	iommu/vt-d: Remove LPIG from page group response descriptor Bit 66 in the page group response descriptor used to be the LPIG (Last Page in Group), but it was marked as Reserved since Specification 4.0. Remove programming on this bit to make it consistent with the latest specification. Existing hardware all treats bit 66 of the page group response descriptor as "ignored", therefore this change doesn't break any existing hardware. Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Link: https://lore.kernel.org/r/20250901053943.1708490-1-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2025-09-19 09:43:20 +02:00
Yury Norov (NVIDIA)	4c48101364	iommu/vt-d: Drop unused cap_super_offset() The macro is unused. Drop the dead code. Signed-off-by: Yury Norov (NVIDIA) <yury.norov@gmail.com> Link: https://lore.kernel.org/r/20250913015024.81186-1-yury.norov@gmail.com Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2025-09-19 09:43:20 +02:00
Vineeth Pillai (Google)	fbe6070c73	iommu/vt-d: debugfs: Fix legacy mode page table dump logic In legacy mode, SSPTPTR is ignored if TT is not 00b or 01b. SSPTPTR maybe uninitialized or zero in that case and may cause oops like: Oops: general protection fault, probably for non-canonical address 0xf00087d3f000f000: 0000 [#1] SMP NOPTI CPU: 2 UID: 0 PID: 786 Comm: cat Not tainted 6.16.0 #191 PREEMPT(voluntary) Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.17.0-5.fc42 04/01/2014 RIP: 0010:pgtable_walk_level+0x98/0x150 RSP: 0018:ffffc90000f279c0 EFLAGS: 00010206 RAX: 0000000040000000 RBX: ffffc90000f27ab0 RCX: 000000000000001e RDX: 0000000000000003 RSI: f00087d3f000f000 RDI: f00087d3f0010000 RBP: ffffc90000f27a00 R08: ffffc90000f27a98 R09: 0000000000000002 R10: 0000000000000000 R11: 0000000000000000 R12: f00087d3f000f000 R13: 0000000000000000 R14: 0000000040000000 R15: ffffc90000f27a98 FS: 0000764566dcb740(0000) GS:ffff8881f812c000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000764566d44000 CR3: 0000000109d81003 CR4: 0000000000772ef0 PKRU: 55555554 Call Trace: <TASK> pgtable_walk_level+0x88/0x150 domain_translation_struct_show.isra.0+0x2d9/0x300 dev_domain_translation_struct_show+0x20/0x40 seq_read_iter+0x12d/0x490 ... Avoid walking the page table if TT is not 00b or 01b. Fixes: `2b437e8045` ("iommu/vt-d: debugfs: Support dumping a specified page table") Signed-off-by: Vineeth Pillai (Google) <vineeth@bitbyteword.org> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Link: https://lore.kernel.org/r/20250814163153.634680-1-vineeth@bitbyteword.org Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2025-09-19 09:43:20 +02:00
Seyediman Seyedarab	75c02a0376	iommu/vt-d: Replace snprintf with scnprintf in dmar_latency_snapshot() snprintf() returns the number of bytes that would have been written, not the number actually written. Using this for offset tracking can cause buffer overruns if truncation occurs. Replace snprintf() with scnprintf() to ensure the offset stays within bounds. Since scnprintf() never returns a negative value, and zero is not possible in this context because 'bytes' starts at 0 and 'size - bytes' is DEBUG_BUFFER_SIZE in the first call, which is large enough to hold the string literals used, the return value is always positive. An integer overflow is also completely out of reach here due to the small and fixed buffer size. The error check in latency_show_one() is therefore unnecessary. Remove it and make dmar_latency_snapshot() return void. Signed-off-by: Seyediman Seyedarab <ImanDevel@gmail.com> Link: https://lore.kernel.org/r/20250731225048.131364-1-ImanDevel@gmail.com Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2025-09-19 09:43:19 +02:00
Vasant Hegde	1e56310b40	iommu/amd/pgtbl: Fix possible race while increase page table level The AMD IOMMU host page table implementation supports dynamic page table levels (up to 6 levels), starting with a 3-level configuration that expands based on IOVA address. The kernel maintains a root pointer and current page table level to enable proper page table walks in alloc_pte()/fetch_pte() operations. The IOMMU IOVA allocator initially starts with 32-bit address and onces its exhuasted it switches to 64-bit address (max address is determined based on IOMMU and device DMA capability). To support larger IOVA, AMD IOMMU driver increases page table level. But in unmap path (iommu_v1_unmap_pages()), fetch_pte() reads pgtable->[root/mode] without lock. So its possible that in exteme corner case, when increase_address_space() is updating pgtable->[root/mode], fetch_pte() reads wrong page table level (pgtable->mode). It does compare the value with level encoded in page table and returns NULL. This will result is iommu_unmap ops to fail and upper layer may retry/log WARN_ON. CPU 0 CPU 1 ------ ------ map pages unmap pages alloc_pte() -> increase_address_space() iommu_v1_unmap_pages() -> fetch_pte() pgtable->root = pte (new root value) READ pgtable->[mode/root] Reads new root, old mode Updates mode (pgtable->mode += 1) Since Page table level updates are infrequent and already synchronized with a spinlock, implement seqcount to enable lock-free read operations on the read path. Fixes: `754265bcab` ("iommu/amd: Fix race in increase_address_space()") Reported-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com> Cc: stable@vger.kernel.org Cc: Joao Martins <joao.m.martins@oracle.com> Cc: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Signed-off-by: Vasant Hegde <vasant.hegde@amd.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2025-09-19 09:39:40 +02:00
Vasant Hegde	a0c17ed907	iommu/amd: Fix alias device DTE setting Commit `7bea695ada` restructured DTE flag handling but inadvertently changed the alias device configuration logic. This may cause incorrect DTE settings for certain devices. Add alias flag check before calling set_dev_entry_from_acpi(). Also move the device iteration loop inside the alias check to restrict execution to cases where alias devices are present. Fixes: `7bea695ada` ("iommu/amd: Introduce struct ivhd_dte_flags to store persistent DTE flags") Cc: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Signed-off-by: Vasant Hegde <vasant.hegde@amd.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2025-09-13 08:11:30 +02:00
Janne Grunau	a2d2e6ea18	iommu/io-pgtable-dart: Fix off by one error in table index check The check for the dart table index allowed values of (1 << data->tbl_bits) while only as many entries are initialized in apple_dart_alloc_pgtable. This results in an array out of bounds access when data->tbl_bits is at its maximal value of 2. When data->tbl_bits is 0 or 1 an unset (initialized to zero) data->pgd[] entry is used. In both cases the value is used as pointer to read page table entries and results in dereferencing invalid pointers. There is no prior check that the passed iova is inside the iommu's IAS so invalid values can be passed from driver's calling iommu_map(). Fixes: `74a0e72f03` ("iommu/io-pgtable-dart: Add 4-level page table support") Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Closes: https://lore.kernel.org/asahi/aMACFlJjrZHs_Yf-@stanley.mountain/ Signed-off-by: Janne Grunau <j@jannau.net> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2025-09-13 08:07:25 +02:00
Leon Romanovsky	f7326196a7	dma-mapping: export new dma_map_phys() interface Introduce new DMA mapping functions dma_map_phys() and dma_unmap_phys() that operate directly on physical addresses instead of page+offset parameters. This provides a more efficient interface for drivers that already have physical addresses available. The new functions are implemented as the primary mapping layer, with the existing dma_map_page_attrs()/dma_map_resource() and dma_unmap_page_attrs()/dma_unmap_resource() functions converted to simple wrappers around the phys-based implementations. In case dma_map_page_attrs(), the struct page is converted to physical address with help of page_to_phys() function and dma_map_resource() provides physical address as is together with addition of DMA_ATTR_MMIO attribute. The old page-based API is preserved in mapping.c to ensure that existing code won't be affected by changing EXPORT_SYMBOL to EXPORT_SYMBOL_GPL variant for dma_map_phys(). Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com> Link: https://lore.kernel.org/r/54cc52af91777906bbe4a386113437ba0bcfba9c.1757423202.git.leonro@nvidia.com	2025-09-12 00:18:21 +02:00
Leon Romanovsky	f9374de14c	iommu/dma: implement DMA_ATTR_MMIO for iommu_dma_(un)map_phys() Make iommu_dma_map_phys() and iommu_dma_unmap_phys() respect DMA_ATTR_MMIO. DMA_ATTR_MMIO makes the functions behave the same as iommu_dma_(un)map_resource(): - No swiotlb is possible - No cache flushing is done (ATTR_MMIO should not be cached memory) - prot for iommu_map() has IOMMU_MMIO not IOMMU_CACHE This is preparation for replacing iommu_dma_map_resource() callers with iommu_dma_map_phys(DMA_ATTR_MMIO) and removing iommu_dma_(un)map_resource(). Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com> Link: https://lore.kernel.org/r/acc255bee358fec9c7da6b2a5904ee50abcd09f1.1757423202.git.leonro@nvidia.com	2025-09-12 00:18:20 +02:00
Leon Romanovsky	513559f737	iommu/dma: rename iommu_dma_map_page to iommu_dma_map_phys Rename the IOMMU DMA mapping functions to better reflect their actual calling convention. The functions iommu_dma_map_page() and iommu_dma_unmap_page() are renamed to iommu_dma_map_phys() and iommu_dma_unmap_phys() respectively, as they already operate on physical addresses rather than page structures. The calling convention changes from accepting (struct page *page, unsigned long offset) to (phys_addr_t phys), which eliminates the need for page-to-physical address conversion within the functions. This renaming prepares for the broader DMA API conversion from page-based to physical address-based mapping throughout the kernel. All callers are updated to pass physical addresses directly, including dma_map_page_attrs(), scatterlist mapping functions, and DMA page allocation helpers. The change simplifies the code by removing the page_to_phys() + offset calculation that was previously done inside the IOMMU functions. Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com> Link: https://lore.kernel.org/r/ed172f95f8f57782beae04f782813366894e98df.1757423202.git.leonro@nvidia.com	2025-09-12 00:18:20 +02:00
Leon Romanovsky	c288d657dd	iommu/dma: implement DMA_ATTR_MMIO for dma_iova_link(). This will replace the hacky use of DMA_ATTR_SKIP_CPU_SYNC to avoid touching the possibly non-KVA MMIO memory. Also correct the incorrect caching attribute for the IOMMU, MMIO memory should not be cachable inside the IOMMU mapping or it can possibly create system problems. Set IOMMU_MMIO for DMA_ATTR_MMIO. Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com> Link: https://lore.kernel.org/r/17ba63991aeaf8a80d5aca9ba5d028f1daa58f62.1757423202.git.leonro@nvidia.com	2025-09-12 00:08:07 +02:00
Niklas Schnelle	9ffaf52290	iommu/s390: Make attach succeed when the device was surprise removed When a PCI device is removed with surprise hotplug, there may still be attempts to attach the device to the default domain as part of tear down via (__iommu_release_dma_ownership()), or because the removal happens during probe (__iommu_probe_device()). In both cases zpci_register_ioat() fails with a cc value indicating that the device handle is invalid. This is because the device is no longer part of the instance as far as the hypervisor is concerned. Currently this leads to an error return and s390_iommu_attach_device() fails. This triggers the WARN_ON() in __iommu_group_set_domain_nofail() because attaching to the default domain must never fail. With the device fenced by the hypervisor no DMAs to or from memory are possible and the IOMMU translations have no effect. Proceed as if the registration was successful and let the hotplug event handling clean up the device. This is similar to how devices in the error state are handled since commit `59bbf59679` ("iommu/s390: Make attach succeed even if the device is in error state") except that for removal the domain will not be registered later. This approach was also previously discussed at the link. Handle both cases, error state and removal, in a helper which checks if the error needs to be propagated or ignored. Avoid magic number condition codes by using the pre-existing, but never used, defines for PCI load/store condition codes and rename them to reflect that they apply to all PCI instructions. Cc: stable@vger.kernel.org # v6.2 Link: https://lore.kernel.org/linux-iommu/20240808194155.GD1985367@ziepe.ca/ Suggested-by: Jason Gunthorpe <jgg@ziepe.ca> Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com> Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com> Reviewed-by: Benjamin Block <bblock@linux.ibm.com> Link: https://lore.kernel.org/r/20250904-iommu_succeed_attach_removed-v1-1-e7f333d2f80f@linux.ibm.com Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2025-09-05 15:11:09 +02:00
Eugene Koira	dce043c07c	iommu/vt-d: Fix __domain_mapping()'s usage of switch_to_super_page() switch_to_super_page() assumes the memory range it's working on is aligned to the target large page level. Unfortunately, __domain_mapping() doesn't take this into account when using it, and will pass unaligned ranges ultimately freeing a PTE range larger than expected. Take for example a mapping with the following iov_pfn range [0x3fe400, 0x4c0600), which should be backed by the following mappings: iov_pfn [0x3fe400, 0x3fffff] covered by 2MiB pages iov_pfn [0x400000, 0x4bffff] covered by 1GiB pages iov_pfn [0x4c0000, 0x4c05ff] covered by 2MiB pages Under this circumstance, __domain_mapping() will pass [0x400000, 0x4c05ff] to switch_to_super_page() at a 1 GiB granularity, which will in turn free PTEs all the way to iov_pfn 0x4fffff. Mitigate this by rounding down the iov_pfn range passed to switch_to_super_page() in __domain_mapping() to the target large page level. Additionally add range alignment checks to switch_to_super_page. Fixes: `9906b9352a` ("iommu/vt-d: Avoid duplicate removing in __domain_mapping()") Signed-off-by: Eugene Koira <eugkoira@amazon.com> Cc: stable@vger.kernel.org Reviewed-by: Nicolas Saenz Julienne <nsaenz@amazon.com> Reviewed-by: David Woodhouse <dwmw@amazon.co.uk> Link: https://lore.kernel.org/r/20250826143816.38686-1-eugkoira@amazon.com Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2025-09-05 15:11:08 +02:00
Matthew Rosato	b3506e9bcc	iommu/s390: Fix memory corruption when using identity domain zpci_get_iommu_ctrs() returns counter information to be reported as part of device statistics; these counters are stored as part of the s390_domain. The problem, however, is that the identity domain is not backed by an s390_domain and so the conversion via to_s390_domain() yields a bad address that is zero'd initially and read on-demand later via a sysfs read. These counters aren't necessary for the identity domain; just return NULL in this case. This issue was discovered via KASAN with reports that look like: BUG: KASAN: global-out-of-bounds in zpci_fmb_enable_device when using the identity domain for a device on s390. Cc: stable@vger.kernel.org Fixes: `64af12c6ec` ("iommu/s390: implement iommu passthrough via identity domain") Reported-by: Cam Miller <cam@linux.ibm.com> Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com> Tested-by: Cam Miller <cam@linux.ibm.com> Reviewed-by: Farhan Ali <alifm@linux.ibm.com> Reviewed-by: Niklas Schnelle <schnelle@linux.ibm.com> Link: https://lore.kernel.org/r/20250827210828.274527-1-mjrosato@linux.ibm.com Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2025-09-05 15:11:07 +02:00
Zhen Ni	923b70581c	iommu/amd: Fix ivrs_base memleak in early_amd_iommu_init() Fix a permanent ACPI table memory leak in early_amd_iommu_init() when CMPXCHG16B feature is not supported Fixes: `82582f85ed` ("iommu/amd: Disable AMD IOMMU if CMPXCHG16B feature is not supported") Cc: stable@vger.kernel.org Signed-off-by: Zhen Ni <zhen.ni@easystack.cn> Reviewed-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Link: https://lore.kernel.org/r/20250822024915.673427-1-zhen.ni@easystack.cn Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2025-09-05 15:11:07 +02:00
Sunil V L	4f901b3dce	iommu/riscv: Add ACPI support RISC-V IO Mapping Table (RIMT) provides the information about the IOMMU to the OS in ACPI. Add support for ACPI in RISC-V IOMMU drivers by using RIMT data. The changes at high level are, a) Register the IOMMU with RIMT data structures. b) Enable probing of platform IOMMU in ACPI way using the ACPIID defined for the RISC-V IOMMU in the BRS spec [1]. Configure the MSI domain if the platform IOMMU uses MSIs. [1] - https://github.com/riscv-non-isa/riscv-brs/blob/main/acpi-id.adoc Signed-off-by: Sunil V L <sunilvl@ventanamicro.com> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Acked-by: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20250818045807.763922-4-sunilvl@ventanamicro.com Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2025-09-05 15:06:06 +02:00
Qianfeng Rong	e520b2520c	iommu/omap: Use int type to store negative error codes Change the 'ret' variable from u32 to int to store negative error codes or zero; Storing the negative error codes in unsigned type, doesn't cause an issue at runtime but it's ugly. Additionally, assigning negative error codes to unsigned type may trigger a GCC warning when the -Wsign-conversion flag is enabled. No effect on runtime. Signed-off-by: Qianfeng Rong <rongqianfeng@vivo.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/20250829140219.121783-1-rongqianfeng@vivo.com Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2025-09-05 14:58:26 +02:00
Hector Martin	ecf6508923	iommu/apple-dart: Clear stream error indicator bits for T8110 DARTs These registers exist and at least on the t602x variant the IRQ only clears when theses are cleared. Signed-off-by: Hector Martin <marcan@marcan.st> Signed-off-by: Janne Grunau <j@jannau.net> Reviewed-by: Sven Peter <sven@kernel.org> Reviewed-by: Neal Gompa <neal@gompa.dev> Link: https://lore.kernel.org/r/20250826-dart-t8110-stream-error-v1-1-e33395112014@jannau.net Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2025-09-05 14:47:16 +02:00
Ashish Kalra	9be15fbfc6	iommu/amd: Skip enabling command/event buffers for kdump After a panic if SNP is enabled in the previous kernel then the kdump kernel boots with IOMMU SNP enforcement still enabled. IOMMU command buffers and event buffer registers remain locked and exclusive to the previous kernel. Attempts to enable command and event buffers in the kdump kernel will fail, as hardware ignores writes to the locked MMIO registers as per AMD IOMMU spec Section 2.12.2.1. Skip enabling command buffers and event buffers for kdump boot as they are already enabled in the previous kernel. Reviewed-by: Vasant Hegde <vasant.hegde@amd.com> Tested-by: Sairaj Kodilkar <sarunkod@amd.com> Signed-off-by: Ashish Kalra <ashish.kalra@amd.com> Link: https://lore.kernel.org/r/576445eb4f168b467b0fc789079b650ca7c5b037.1756157913.git.ashish.kalra@amd.com Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2025-09-05 14:45:17 +02:00
Ashish Kalra	38e5f33ee3	iommu/amd: Reuse device table for kdump After a panic if SNP is enabled in the previous kernel then the kdump kernel boots with IOMMU SNP enforcement still enabled. IOMMU device table register is locked and exclusive to the previous kernel. Attempts to copy old device table from the previous kernel fails in kdump kernel as hardware ignores writes to the locked device table base address register as per AMD IOMMU spec Section 2.12.2.1. This causes the IOMMU driver (OS) and the hardware to reference different memory locations. As a result, the IOMMU hardware cannot process the command which results in repeated "Completion-Wait loop timed out" errors and a second kernel panic: "Kernel panic - not syncing: timer doesn't work through Interrupt-remapped IO-APIC". Reuse device table instead of copying device table in case of kdump boot and remove all copying device table code. Reviewed-by: Vasant Hegde <vasant.hegde@amd.com> Tested-by: Sairaj Kodilkar <sarunkod@amd.com> Signed-off-by: Ashish Kalra <ashish.kalra@amd.com> Link: https://lore.kernel.org/r/3a31036fb2f7323e6b1a1a1921ac777e9f7bdddc.1756157913.git.ashish.kalra@amd.com Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2025-09-05 14:44:32 +02:00
Ashish Kalra	f32fe7cb01	iommu/amd: Add support to remap/unmap IOMMU buffers for kdump After a panic if SNP is enabled in the previous kernel then the kdump kernel boots with IOMMU SNP enforcement still enabled. IOMMU completion wait buffers (CWBs), command buffers and event buffer registers remain locked and exclusive to the previous kernel. Attempts to allocate and use new buffers in the kdump kernel fail, as hardware ignores writes to the locked MMIO registers as per AMD IOMMU spec Section 2.12.2.1. This results in repeated "Completion-Wait loop timed out" errors and a second kernel panic: "Kernel panic - not syncing: timer doesn't work through Interrupt-remapped IO-APIC" The list of MMIO registers locked and which ignore writes after failed SNP shutdown are mentioned in the AMD IOMMU specifications below: Section 2.12.2.1. https://docs.amd.com/v/u/en-US/48882_3.10_PUB Reuse the pages of the previous kernel for completion wait buffers, command buffers, event buffers and memremap them during kdump boot and essentially work with an already enabled IOMMU configuration and re-using the previous kernel’s data structures. Reusing of command buffers and event buffers is now done for kdump boot irrespective of SNP being enabled during kdump. Re-use of completion wait buffers is only done when SNP is enabled as the exclusion base register is used for the completion wait buffer (CWB) address only when SNP is enabled. Reviewed-by: Vasant Hegde <vasant.hegde@amd.com> Tested-by: Sairaj Kodilkar <sarunkod@amd.com> Signed-off-by: Ashish Kalra <ashish.kalra@amd.com> Link: https://lore.kernel.org/r/ff04b381a8fe774b175c23c1a336b28bc1396511.1756157913.git.ashish.kalra@amd.com Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2025-09-05 14:44:30 +02:00
Hector Martin	5229bd5f9e	iommu/apple-dart: Add 4-level page table support The T8110 variant DART implementation on T602x SoCs indicates an IAS of 42, which requires an extra page table level. The extra level is optional, but let's implement it. Later it might be useful to restrict this based on the actual attached devices, since most won't need that much address space anyway. Signed-off-by: Hector Martin <marcan@marcan.st> Signed-off-by: Janne Grunau <j@jannau.net> Reviewed-by: Sven Peter <sven@kernel.org> Reviewed-by: Neal Gompa <neal@gompa.dev> Link: https://lore.kernel.org/r/20250821-apple-dart-4levels-v2-3-e39af79daa37@jannau.net Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2025-09-05 14:27:32 +02:00
Hector Martin	74a0e72f03	iommu/io-pgtable-dart: Add 4-level page table support DARTs on t602x SoCs are of the t8110 variant but have an IAS of 42, which means optional support for an extra page table level. Refactor the PTE management to support an arbitrary level count, and then calculate how many levels we need for any given configuration. Signed-off-by: Hector Martin <marcan@marcan.st> Signed-off-by: Janne Grunau <j@jannau.net> Reviewed-by: Sven Peter <sven@kernel.org> Reviewed-by: Neal Gompa <neal@gompa.dev> Link: https://lore.kernel.org/r/20250821-apple-dart-4levels-v2-2-e39af79daa37@jannau.net Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2025-09-05 14:27:31 +02:00
Hector Martin	1268890086	iommu/apple-dart: Make the hw register fields u32s The registers are 32-bit and the offsets definitely don't need 64 bits either, these should've been u32s. Signed-off-by: Hector Martin <marcan@marcan.st> Signed-off-by: Janne Grunau <j@jannau.net> Reviewed-by: Sven Peter <sven@kernel.org> Reviewed-by: Neal Gompa <neal@gompa.dev> Link: https://lore.kernel.org/r/20250821-apple-dart-4levels-v2-1-e39af79daa37@jannau.net Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2025-09-05 14:27:31 +02:00
Xichao Zhao	d3d3b60427	iommu/amd: use str_plural() to simplify the code Use the string choice helper function str_plural() to simplify the code. Signed-off-by: Xichao Zhao <zhao.xichao@vivo.com> Reviewed-by: Ankit Soni <Ankit.Soni@amd.com> Link: https://lore.kernel.org/r/20250818070556.458271-1-zhao.xichao@vivo.com Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2025-09-05 14:25:32 +02:00
Linus Torvalds	471b25a2fc	Merge tag 'for-linus-iommufd' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd Pull iommufd fixes from Jason Gunthorpe: "Two very minor fixes: - Fix mismatched kvalloc()/kfree() - Spelling fixes in documentation" * tag 'for-linus-iommufd' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd: iommufd: Fix spelling errors in iommufd.rst iommufd: viommu: free memory allocated by kvcalloc() using kvfree()	2025-08-22 17:24:48 -04:00
XianLiang Huang	99d4d1a070	iommu/riscv: prevent NULL deref in iova_to_phys The riscv_iommu_pte_fetch() function returns either NULL for unmapped/never-mapped iova, or a valid leaf pte pointer that requires no further validation. riscv_iommu_iova_to_phys() failed to handle NULL returns. Prevent null pointer dereference in riscv_iommu_iova_to_phys(), and remove the pte validation. Fixes: `488ffbf181` ("iommu/riscv: Paging domain support") Cc: Tomasz Jeznach <tjeznach@rivosinc.com> Signed-off-by: XianLiang Huang <huangxianliang@lanxincomputing.com> Link: https://lore.kernel.org/r/20250820072248.312-1-huangxianliang@lanxincomputing.com Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2025-08-22 08:51:49 +02:00
Robin Murphy	72b6f7cd89	iommu/virtio: Make instance lookup robust Much like arm-smmu in commit `7d835134d4` ("iommu/arm-smmu: Make instance lookup robust"), virtio-iommu appears to have the same issue where iommu_device_register() makes the IOMMU instance visible to other API callers (including itself) straight away, but internally the instance isn't ready to recognise itself for viommu_probe_device() to work correctly until after viommu_probe() has returned. This matters a lot more now that bus_iommu_probe() has the DT/VIOT knowledge to probe client devices the way that was always intended. Tweak the lookup and initialisation in much the same way as for arm-smmu, to ensure that what we register is functional and ready to go. Cc: stable@vger.kernel.org Fixes: `bcb81ac6ae` ("iommu: Get DT/ACPI parsing into the proper probe path") Signed-off-by: Robin Murphy <robin.murphy@arm.com> Tested-by: Eric Auger <eric.auger@redhat.com> Link: https://lore.kernel.org/r/308911aaa1f5be32a3a709996c7bd6cf71d30f33.1755190036.git.robin.murphy@arm.com Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2025-08-22 08:43:23 +02:00
Nicolin Chen	685ca577b4	iommu/arm-smmu-v3: Fix smmu_domain->nr_ats_masters decrement The arm_smmu_attach_commit() updates master->ats_enabled before calling arm_smmu_remove_master_domain() that is supposed to clean up everything in the old domain, including the old domain's nr_ats_masters. So, it is supposed to use the old ats_enabled state of the device, not an updated state. This isn't a problem if switching between two domains where: - old ats_enabled = false; new ats_enabled = false - old ats_enabled = true; new ats_enabled = true but can fail cases where: - old ats_enabled = false; new ats_enabled = true (old domain should keep the counter but incorrectly decreased it) - old ats_enabled = true; new ats_enabled = false (old domain needed to decrease the counter but incorrectly missed it) Update master->ats_enabled after arm_smmu_remove_master_domain() to fix this. Fixes: `7497f4211f` ("iommu/arm-smmu-v3: Make changing domains be hitless for ATS") Cc: stable@vger.kernel.org Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Acked-by: Will Deacon <will@kernel.org> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Pranjal Shrivastava <praan@google.com> Link: https://lore.kernel.org/r/20250801030127.2006979-1-nicolinc@nvidia.com Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2025-08-22 08:41:20 +02:00
Akhilesh Patil	8fe8a09204	iommufd: viommu: free memory allocated by kvcalloc() using kvfree() Use kvfree() instead of kfree() to free pages allocated by kvcalloc() in iommufs_hw_queue_alloc_phys() to fix potential memory corruption. Ensure the memory is properly freed, as kvcalloc may internally use vmalloc or kmalloc depending on available memory in the system. Fixes: `2238ddc2b0` ("iommufd/viommu: Add IOMMUFD_CMD_HW_QUEUE_ALLOC ioctl") Link: https://patch.msgid.link/r/aJifyVV2PL6WGEs6@bhairav-test.ee.iitb.ac.in Signed-off-by: Akhilesh Patil <akhilesh@ee.iitb.ac.in> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Pranjal Shrivastava <praan@google.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2025-08-18 11:10:40 -03:00
Nicolin Chen	41f0200c71	iommu/tegra241-cmdqv: Fix missing cpu_to_le64 at lvcmdq_err_map Sparse reported a warning: drivers/iommu/arm/arm-smmu-v3/tegra241-cmdqv.c:305:47: sparse: expected restricted __le64 sparse: got unsigned long long Add cpu_to_le64() to fix that. Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202508142105.Jb5Smjsg-lkp@intel.com/ Suggested-by: Pranjal Shrivastava <praan@google.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Link: https://lore.kernel.org/r/20250814193039.2265813-1-nicolinc@nvidia.com Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2025-08-15 12:02:24 +02:00
Kees Cook	8503d0fcb1	iommu/amd: Avoid stack buffer overflow from kernel cmdline While the kernel command line is considered trusted in most environments, avoid writing 1 byte past the end of "acpiid" if the "str" argument is maximum length. Reported-by: Simcha Kosman <simcha.kosman@cyberark.com> Closes: https://lore.kernel.org/all/AS8P193MB2271C4B24BCEDA31830F37AE84A52@AS8P193MB2271.EURP193.PROD.OUTLOOK.COM Fixes: `b6b26d86c6` ("iommu/amd: Add a length limitation for the ivrs_acpihid command-line parameter") Signed-off-by: Kees Cook <kees@kernel.org> Reviewed-by: Ankit Soni <Ankit.Soni@amd.com> Link: https://lore.kernel.org/r/20250804154023.work.970-kees@kernel.org Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2025-08-15 11:50:47 +02:00
Linus Torvalds	0bd0a41a51	Merge tag 'pci-v6.17-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci Pull PCI updates from Bjorn Helgaas: "Enumeration: - Allow built-in drivers, not just modular drivers, to use async initial probing (Lukas Wunner) - Support Immediate Readiness even on devices with no PM Capability (Sean Christopherson) - Consolidate definition of PCIE_RESET_CONFIG_WAIT_MS (100ms), the required delay between a reset and sending config requests to a device (Niklas Cassel) - Add pci_is_display() to check for "Display" base class and use it in ALSA hda, vfio, vga_switcheroo, vt-d (Mario Limonciello) - Allow 'isolated PCI functions' (multi-function devices without a function 0) for LoongArch, similar to s390 and jailhouse (Huacai Chen) Power control: - Add ability to enable optional slot clock for cases where the PCIe host controller and the slot are supplied by different clocks (Marek Vasut) PCIe native device hotplug: - Fix runtime PM ref imbalance on Hot-Plug Capable ports caused by misinterpreting a config read failure after a device has been removed (Lukas Wunner) - Avoid creating a useless PCIe port service device for pciehp if the slot is handled by the ACPI hotplug driver (Lukas Wunner) - Ignore ACPI hotplug slots when calculating depth of pciehp hotplug ports (Lukas Wunner) Virtualization: - Save VF resizable BAR state and restore it after reset (Michał Winiarski) - Allow IOV resources (VF BARs) to be resized (Michał Winiarski) - Add pci_iov_vf_bar_set_size() so drivers can control VF BAR size (Michał Winiarski) Endpoint framework: - Add RC-to-EP doorbell support using platform MSI controller, including a test case (Frank Li) - Allow BAR assignment via configfs so platforms have flexibility in determining BAR usage (Jerome Brunet) Native PCIe controller drivers: - Convert amazon,al-alpine-v[23]-pcie, apm,xgene-pcie, axis,artpec6-pcie, marvell,armada-3700-pcie, st,spear1340-pcie to DT schema format (Rob Herring) - Use dev_fwnode() instead of of_fwnode_handle() to remove OF dependency in altera (fixes an unused variable), designware-host, mediatek, mediatek-gen3, mobiveil, plda, xilinx, xilinx-dma, xilinx-nwl (Jiri Slaby, Arnd Bergmann) - Convert aardvark, altera, brcmstb, designware-host, iproc, mediatek, mediatek-gen3, mobiveil, plda, rcar-host, vmd, xilinx, xilinx-dma, xilinx-nwl from using pci_msi_create_irq_domain() to using msi_create_parent_irq_domain() instead; this makes the interrupt controller per-PCI device, allows dynamic allocation of vectors after initialization, and allows support of IMS (Nam Cao) APM X-Gene PCIe controller driver: - Rewrite MSI handling to MSI CPU affinity, drop useless CPU hotplug bits, use device-managed memory allocations, and clean things up (Marc Zyngier) - Probe xgene-msi as a standard platform driver rather than a subsys_initcall (Marc Zyngier) Broadcom STB PCIe controller driver: - Add optional DT 'num-lanes' property and if present, use it to override the Maximum Link Width advertised in Link Capabilities (Jim Quinlan) Cadence PCIe controller driver: - Use PCIe Message routing types from the PCI core rather than defining private ones (Hans Zhang) Freescale i.MX6 PCIe controller driver: - Add IMX8MQ_EP third 64-bit BAR in epc_features (Richard Zhu) - Add IMX8MM_EP and IMX8MP_EP fixed 256-byte BAR 4 in epc_features (Richard Zhu) - Configure LUT for MSI/IOMMU in Endpoint mode so Root Complex can trigger doorbel on Endpoint (Frank Li) - Remove apps_reset (LTSSM_EN) from imx_pcie_{assert,deassert}_core_reset(), which fixes a hotplug regression on i.MX8MM (Richard Zhu) - Delay Endpoint link start until configfs 'start' written (Richard Zhu) Intel VMD host bridge driver: - Add Intel Panther Lake (PTL)-H/P/U Vendor ID (George D Sworo) Qualcomm PCIe controller driver: - Add DT binding and driver support for SA8255p, which supports ECAM for Configuration Space access (Mayank Rana) - Update DT binding and driver to describe PHYs and per-Root Port resets in a Root Port stanza and deprecate describing them in the host bridge; this makes it possible to support multiple Root Ports in the future (Krishna Chaitanya Chundru) - Add Qualcomm QCS615 to SM8150 DT binding (Ziyue Zhang) - Add Qualcomm QCS8300 to SA8775p DT binding (Ziyue Zhang) - Drop TBU and ref clocks from Qualcomm SM8150 and SC8180x DT bindings (Konrad Dybcio) - Document 'link_down' reset in Qualcomm SA8775P DT binding (Ziyue Zhang) - Add required PCIE_RESET_CONFIG_WAIT_MS delay after Link up IRQ (Niklas Cassel) Rockchip PCIe controller driver: - Drop unused PCIe Message routing and code definitions (Hans Zhang) - Remove several unused header includes (Hans Zhang) - Use standard PCIe config register definitions instead of rockchip-specific redefinitions (Geraldo Nascimento) - Set Target Link Speed to 5.0 GT/s before retraining so we have a chance to train at a higher speed (Geraldo Nascimento) Rockchip DesignWare PCIe controller driver: - Prevent race between link training and register update via DBI by inhibiting link training after hot reset and link down (Wilfred Mallawa) - Add required PCIE_RESET_CONFIG_WAIT_MS delay after Link up IRQ (Niklas Cassel) Sophgo PCIe controller driver: - Add DT binding and driver for Sophgo SG2044 PCIe controller driver in Root Complex mode (Inochi Amaoto) Synopsys DesignWare PCIe controller driver: - Add required PCIE_RESET_CONFIG_WAIT_MS after waiting for Link up on Ports that support > 5.0 GT/s. Slower Ports still rely on the not-quite-correct PCIE_LINK_WAIT_SLEEP_MS 90ms default delay while waiting for the Link (Niklas Cassel)" * tag 'pci-v6.17-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci: (116 commits) dt-bindings: PCI: qcom,pcie-sa8775p: Document 'link_down' reset dt-bindings: PCI: Remove 83xx-512x-pci.txt dt-bindings: PCI: Convert amazon,al-alpine-v[23]-pcie to DT schema dt-bindings: PCI: Convert marvell,armada-3700-pcie to DT schema dt-bindings: PCI: Convert apm,xgene-pcie to DT schema dt-bindings: PCI: Convert axis,artpec6-pcie to DT schema dt-bindings: PCI: Convert st,spear1340-pcie to DT schema PCI: Move is_pciehp check out of pciehp_is_native() PCI: pciehp: Use is_pciehp instead of is_hotplug_bridge PCI/portdrv: Use is_pciehp instead of is_hotplug_bridge PCI/ACPI: Fix runtime PM ref imbalance on Hot-Plug Capable ports selftests: pci_endpoint: Add doorbell test case misc: pci_endpoint_test: Add doorbell test case PCI: endpoint: pci-epf-test: Add doorbell test support PCI: endpoint: Add pci_epf_align_inbound_addr() helper for inbound address alignment PCI: endpoint: pci-ep-msi: Add checks for MSI parent and mutability PCI: endpoint: Add RC-to-EP doorbell support using platform MSI controller PCI: dwc: Add Sophgo SG2044 PCIe controller driver in Root Complex mode PCI: vmd: Switch to msi_create_parent_irq_domain() PCI: vmd: Convert to lock guards ...	2025-08-01 13:59:07 -07:00

1 2 3 4 5 ...

6031 Commits