linux-cryptodev-2.6

mirror of https://git.kernel.org/pub/scm/linux/kernel/git/herbert/cryptodev-2.6.git synced 2026-04-19 03:53:51 -04:00

Author	SHA1	Message	Date
Mario Limonciello (AMD)	28695ca09d	drm/amd: Clean up kfd node on surprise disconnect When an eGPU is unplugged the KFD topology should also be destroyed for that GPU. This never happens because the fini_sw callbacks never get to run. Run them manually before calling amdgpu_device_ip_fini_early() when a device has already been disconnected. This location is intentionally chosen to make sure that the kfd locking refcount doesn't get incremented unintentionally. Cc: kent.russell@amd.com Closes: https://community.frame.work/t/amd-egpu-on-linux/8691/33 Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org> Reviewed-by: Kent Russell <kent.russell@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `6a23e7b433`) Cc: stable@vger.kernel.org	2026-01-14 14:51:36 -05:00
Mario Limonciello (AMD)	6b2989ac5e	Reapply "Revert "drm/amd: Skip power ungate during suspend for VPE"" Skipping power ungate exposed some scenarios that will fail like below: ``` amdgpu: Register(0) [regVPEC_QUEUE_RESET_REQ] failed to reach value 0x00000000 != 0x00000001n amdgpu 0000:c1:00.0: amdgpu: VPE queue reset failed ... amdgpu: [drm] ERROR wait_for_completion_timeout timeout! ``` The underlying s2idle issue that prompted this commit is going to be fixed in BIOS. This reverts commit `2a6c826cfe`. This was lost in the 6.19 merge so reapply it. Fixes: `2a6c826cfe` ("drm/amd: Skip power ungate during suspend for VPE") Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org> Acked-by: Alex Deucher <alexander.deucher@amd.com> Reported-by: Konstantin <answer2019@yandex.ru> Closes: https://bugzilla.kernel.org/show_bug.cgi?id=220812 Reported-by: Matthew Schwartz <matthew.schwartz@linux.dev> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `3925683515`)	2026-01-07 17:24:16 -05:00
Perry Yuan	0de604d035	drm/amd/pm: Disable MMIO access during SMU Mode 1 reset During Mode 1 reset, the ASIC undergoes a reset cycle and becomes temporarily inaccessible via PCIe. Any attempt to access MMIO registers during this window (e.g., from interrupt handlers or other driver threads) can result in uncompleted PCIe transactions, leading to NMI panics or system hangs. To prevent this, set the `no_hw_access` flag to true immediately after triggering the reset. This signals other driver components to skip register accesses while the device is offline. A memory barrier `smp_mb()` is added to ensure the flag update is globally visible to all cores before the driver enters the sleep/wait state. Signed-off-by: Perry Yuan <perry.yuan@amd.com> Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `7edb503fe4`)	2026-01-07 17:24:10 -05:00
Alex Deucher	77f7325301	drm/amdgpu: fix a job->pasid access race in gpu recovery Avoid a possible UAF in GPU recovery due to a race between the sched timeout callback and the tdr work queue. The gpu recovery function calls drm_sched_stop() and later drm_sched_start(). drm_sched_start() restarts the tdr queue which will eventually free the job. If the tdr queue frees the job before time out callback completes, the job will be freed and we'll get a UAF when accessing the pasid. Cache it early to avoid the UAF. Example KASAN trace: [ 493.058141] BUG: KASAN: slab-use-after-free in amdgpu_device_gpu_recover+0x968/0x990 [amdgpu] [ 493.067530] Read of size 4 at addr ffff88b0ce3f794c by task kworker/u128:1/323 [ 493.074892] [ 493.076485] CPU: 9 UID: 0 PID: 323 Comm: kworker/u128:1 Tainted: G E 6.16.0-1289896.2.zuul.bf4f11df81c1410bbe901c4373305a31 #1 PREEMPT(voluntary) [ 493.076493] Tainted: [E]=UNSIGNED_MODULE [ 493.076495] Hardware name: TYAN B8021G88V2HR-2T/S8021GM2NR-2T, BIOS V1.03.B10 04/01/2019 [ 493.076500] Workqueue: amdgpu-reset-dev drm_sched_job_timedout [gpu_sched] [ 493.076512] Call Trace: [ 493.076515] <TASK> [ 493.076518] dump_stack_lvl+0x64/0x80 [ 493.076529] print_report+0xce/0x630 [ 493.076536] ? _raw_spin_lock_irqsave+0x86/0xd0 [ 493.076541] ? __pfx__raw_spin_lock_irqsave+0x10/0x10 [ 493.076545] ? amdgpu_device_gpu_recover+0x968/0x990 [amdgpu] [ 493.077253] kasan_report+0xb8/0xf0 [ 493.077258] ? amdgpu_device_gpu_recover+0x968/0x990 [amdgpu] [ 493.077965] amdgpu_device_gpu_recover+0x968/0x990 [amdgpu] [ 493.078672] ? __pfx_amdgpu_device_gpu_recover+0x10/0x10 [amdgpu] [ 493.079378] ? amdgpu_coredump+0x1fd/0x4c0 [amdgpu] [ 493.080111] amdgpu_job_timedout+0x642/0x1400 [amdgpu] [ 493.080903] ? pick_task_fair+0x24e/0x330 [ 493.080910] ? __pfx_amdgpu_job_timedout+0x10/0x10 [amdgpu] [ 493.081702] ? _raw_spin_lock+0x75/0xc0 [ 493.081708] ? __pfx__raw_spin_lock+0x10/0x10 [ 493.081712] drm_sched_job_timedout+0x1b0/0x4b0 [gpu_sched] [ 493.081721] ? __pfx__raw_spin_lock_irq+0x10/0x10 [ 493.081725] process_one_work+0x679/0xff0 [ 493.081732] worker_thread+0x6ce/0xfd0 [ 493.081736] ? __pfx_worker_thread+0x10/0x10 [ 493.081739] kthread+0x376/0x730 [ 493.081744] ? __pfx_kthread+0x10/0x10 [ 493.081748] ? __pfx__raw_spin_lock_irq+0x10/0x10 [ 493.081751] ? __pfx_kthread+0x10/0x10 [ 493.081755] ret_from_fork+0x247/0x330 [ 493.081761] ? __pfx_kthread+0x10/0x10 [ 493.081764] ret_from_fork_asm+0x1a/0x30 [ 493.081771] </TASK> Fixes: `a72002cb18` ("drm/amdgpu: Make use of drm_wedge_task_info") Link: https://github.com/HansKristian-Work/vkd3d-proton/pull/2670 Cc: SRINIVASAN.SHANMUGAM@amd.com Cc: vitaly.prosyak@amd.com Cc: christian.koenig@amd.com Suggested-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `20880a3fd5`)	2025-12-16 14:16:08 -05:00
Linus Torvalds	43dfc13ca9	Merge tag 'pci-v6.19-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci Pull PCI updates from Bjorn Helgaas: "Enumeration: - Enable host bridge emulation for PCI_DOMAINS_GENERIC platforms (Dan Williams) - Switch vmd from custom domain number allocator to the common allocator to prevent a potential race with new non-VMD buses (Dan Williams) - Enable Precision Time Measurement (PTM) only if device advertises support for a relevant role, to prevent invalid PTM Requests that cause ACS violations that are reported as AER Uncorrectable Non-Fatal errors (Mika Westerberg) Resource management: - Prevent resource tree corruption when BAR resize fails (Ilpo Järvinen) - Restore BARs to the original size if a BAR resize fails (Ilpo Järvinen) - Remove BAR release from BAR resize attempts by the xe, i915, and amdgpu drivers so the PCI core can restore BARs if the resize fails (Ilpo Järvinen) - Move Resizable BAR code to rebar.c (Ilpo Järvinen) - Add pci_rebar_size_supported() and use it in i915 and xe (Ilpo Järvinen) - Add pci_rebar_get_max_size() and use it in xe and amdgpu (Ilpo Järvinen) Power management and error handling: - For drivers using PCI legacy suspend, save config state at suspend so that state (not any earlier state from enumeration, probe, or error recovery) will be restored when resuming (Lukas Wunner) - For devices with no driver or a driver that lacks power management, save config state at hibernate so that state (not any earlier state from enumeration, probe, or error recovery) will be restored when resuming (Lukas Wunner) - Save device config space on device addition, before driver binding, so error recovery works more reliably (Lukas Wunner) - Drop pci_save_state() from several drivers that no longer need it since the PCI core always does it and pci_restore_state() no longer invalidates the saved state (Lukas Wunner) - Document use of pci_save_state() by drivers to capture the state they want restored during error recovery (Lukas Wunner) Power control: - Add a struct pci_ops.assert_perst() function pointer to assert/deassert PCIe PERST# and implement it for the qcom driver (Krishna Chaitanya Chundru) - Add DT binding and pwrctrl driver for the Toshiba TC9563 PCIe switch, which must be held in reset after poweron so the pwrctrl driver can configure the switch via I2C before bringing up the links (Krishna Chaitanya Chundru) Endpoint framework: - Convert the endpoint doorbell test to use a threaded IRQ to fix a 'sleeping while atomic' issue (Bhanu Seshu Kumar Valluri) - Add endpoint VNTB MSI doorbell support to reduce latency between host and endpoint (Frank Li) New native PCIe controller drivers: - Add CIX Sky1 host controller DT binding and driver (Hans Zhang) - Add NXP S32G host controller DT binding and driver (Vincent Guittot) - Add Renesas RZ/G3S host controller DT binding and driver (Claudiu Beznea) - Add SpacemiT K1 host controller DT binding and driver (Alex Elder) Amlogic Meson PCIe controller driver: - Update DT binding to name DBI region 'dbi', not 'elbi', and update driver to support both (Manivannan Sadhasivam) Apple PCIe controller driver: - Move struct pci_host_bridge allocation from pci_host_common_init() to callers, which significantly simplifies pcie-apple (Marc Zyngier) Broadcom STB PCIe controller driver: - Disable advertising ASPM L0s support correctly (Jim Quinlan) - Add a panic/die handler to print diagnostic info in case PCIe caused an unrecoverable abort (Jim Quinlan) Cadence PCIe controller driver: - Add module support for Cadence platform host and endpoint controller driver (Manikandan K Pillai) - Split headers into 'legacy' (LGA) and 'high perf' (HPA) to prepare for new CIX Sky1 driver (Manikandan K Pillai) MediaTek PCIe controller driver: - Convert DT binding to YAML schema (Christian Marangi) - Add Airoha AN7583 DT compatible and driver support (Christian Marangi) Qualcomm PCIe controller driver: - Add Qualcomm Kaanapali to SM8550 DT binding (Qiang Yu) - Add required 'power-domains' and 'resets' to qcom sa8775p, sc7280, sc8280xp, sm8150, sm8250, sm8350, sm8450, sm8550, x1e80100 DT schemas (Krzysztof Kozlowski) - Look up OPP using both frequency and data rate (not just frequency) so RPMh votes can account for both (Krishna Chaitanya Chundru) Rockchip DesignWare PCIe controller driver: - Add Rockchip RK3528 compatible strings in DT binding (Yao Zi) STMicroelectronics STM32MP25 PCIe controller driver: - Fix a race between link training and endpoint register initialization (Christian Bruel) - Align endpoint allocations to match the ATU requirements (Christian Bruel) Synopsys DesignWare PCIe controller driver: - Clear L1 PM Substate Capability 'Supported' bits unless glue driver says it's supported, which prevents users from enabling non-working L1SS. Currently only qcom and tegra194 support L1SS (Bjorn Helgaas) - Remove now-superfluous L1SS disable code from tegra194 (Bjorn Helgaas) - Configure L1SS support in dw-rockchip when DT says 'supports-clkreq' (Shawn Lin) TI Keystone PCIe controller driver: - Fail the probe instead of silently succeeding if ks_pcie_of_data didn't specify Root Complex or Endpoint mode (Siddharth Vadapalli) - Make keystone buildable as a loadable module, except on ARM32 where hook_fault_code() is __init (Siddharth Vadapalli)" * tag 'pci-v6.19-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci: (100 commits) MAINTAINERS: Add Manivannan Sadhasivam as PCI/pwrctrl maintainer MAINTAINERS: Add CIX Sky1 PCIe controller driver maintainer PCI: sky1: Add PCIe host support for CIX Sky1 dt-bindings: PCI: Add CIX Sky1 PCIe Root Complex bindings PCI: cadence: Add support for High Perf Architecture (HPA) controller MAINTAINERS: Add NXP S32G PCIe controller driver maintainer PCI: s32g: Add NXP S32G PCIe controller driver (RC) PCI: dwc: Add register and bitfield definitions dt-bindings: PCI: s32g: Add NXP S32G PCIe controller PCI: Add Renesas RZ/G3S host controller driver PCI: host-generic: Move bridge allocation outside of pci_host_common_init() dt-bindings: PCI: Add Renesas RZ/G3S PCIe controller binding PCI: Validate pci_rebar_size_supported() input Documentation: PCI: Amend error recovery doc with pci_save_state() rules treewide: Drop pci_save_state() after pci_restore_state() PCI/ERR: Ensure error recoverability at all times PCI/PM: Stop needlessly clearing state_saved on enumeration and thaw PCI/PM: Reinstate clearing state_saved in legacy and !PM codepaths PCI: dw-rockchip: Configure L1SS support PCI: tegra194: Remove unnecessary L1SS disable code ...	2025-12-04 17:29:41 -08:00
Linus Torvalds	6dfafbd029	Merge tag 'drm-next-2025-12-03' of https://gitlab.freedesktop.org/drm/kernel Pull drm updates from Dave Airlie: "There was a rather late merge of a new color pipeline feature, that some userspace projects are blocked on, and has seen a lot of work in amdgpu. This should have seen some time in -next. There is additional support for this for Intel, that if it arrives in the next day or two I'll pass it on in another pull request and you can decide if you want to take it. Highlights: - Arm Ethos NPU accelerator driver - new DRM color pipeline support - amdgpu will now run discrete SI/CIK cards instead of radeon, which enables vulkan support in userspace - msm gets gen8 gpu support - initial Xe3P support in xe Full detail summary: New driver: - Arm Ethos-U65/U85 accel driver Core: - support the drm color pipeline in vkms/amdgfx - add support for drm colorop pipeline - add COLOR PIPELINE plane property - add DRM_CLIENT_CAP_PLANE_COLOR_PIPELINE - throttle dirty worker with vblank - use drm_for_each_bridge_in_chain_scoped in drm's bridge code - Ensure drm_client_modeset tests are enabled in UML - add simulated vblank interrupt - use in drivers - dumb buffer sizing helper - move freeing of drm client memory to driver - crtc sharpness strength property - stop using system_wq in scheduler/drivers - support emergency restore in drm-client Rust: - make slice::as_flattened usable on all supported rustc - add FromBytes::from_bytes_prefix() method - remove redundant device ptr from Rust GEM object - Change how AlwaysRefCounted is implemented for GEM objects gpuvm: - Add deferred vm_bo cleanup to GPUVM (for rust) atomic: - cleanup and improve state handling interfaces buddy: - optimize block management dma-buf: - heaps: Create heap per CMA reserved location - improve userspace documentation dp: - add POST_LT_ADJ_REQ training sequence - DPCD dSC quirk for synaptics panamera devices - helpers to query branch DSC max throughput ttm: - Rename ttm_bo_put to ttm_bo_fini - allow page protection flags on risc-v - rework pipelined eviction fence handling amdgpu: - enable amdgpu by default for SI/CI dGPUs - enable DC by default on SI - refactor CIK/SI enablement - add ABM KMS property - Re-enable DM idle optimizations - DC Analog encoders support - Powerplay fixes for fiji/iceland - Enable DC on bonaire by default - HMM cleanup - Add new RAS framework - DML2.1 updates - YCbCr420 fixes - DC FP fixes - DMUB fixes - LTTPR fixes - DTBCLK fixes - DMU cursor offload handling - Userq validation improvements - Unify shutdown callback handling - Suspend improvements - Power limit code cleanup - SR-IOV fixes - AUX backlight fixes - DCN 3.5 fixes - HDMI compliance fixes - DCN 4.0.1 cursor updates - DCN interrupt fix - DC KMS full update improvements - Add additional HDCP traces - DCN 3.2 fixes - DP MST fixes - Add support for new SR-IOV mailbox interface - UQ reset support - HDP flush rework - VCE1 support amdkfd: - HMM cleanups - Relax checks on save area overallocations - Fix GPU mappings after prefetch radeon: - refactor CIK/SI enablement xe: - Initial Xe3P support - panic support on VRAM for display - fix stolen size check - Loosen used tracking restriction - New SR-IOV debugfs structure and debugfs updates - Hide the GPU madvise flag behind a VM_BIND flag - Always expose VRAM provisioning data on discrete GPUs - Allow VRAM mappings for userptr when used with SVM - Allow pinning of p2p dma-buf - Use per-tile debugfs where appropriate - Add documentation for Execution Queues - PF improvements - VF migration recovery redesign work - User / Kernel VRAM partitioning - Update Tile-based messages - Allow configfs to disable specific GT types - VF provisioning and migration improvements - use SVM range helpers in PT layer - Initial CRI support - access VF registers using dedicated MMIO view - limit number of jobs per exec queue - add sriov_admin sysfs tree - more crescent island specific support - debugfs residency counter - SRIOV migration work - runtime registers for GFX 35 i915: - add initial Xe3p_LPD display version 35 support - Enable LNL+ content adaptive sharpness filter - Use optimized VRR guardband - Enable Xe3p LT PHY - enable FBC support for Xe3p_LPD display - add display 30.02 firmware support - refactor SKL+ watermark latency setup - refactor fbdev handling - call i915/xe runtime PM via function pointers - refactor i915/xe stolen memory/display interfaces - use display version instead of gfx version in display code - extend i915_display_info with Type-C port details - lots of display cleanups/refactorings - set O_LARGEFILE in __create_shmem - skuip guc communication warning on reset - fix time conversions - defeature DRRS on LNL+ - refactor intel_frontbuffer split between i915/xe/display - convert inteL_rom interfaces to struct drm_device - unify display register polling interfaces - aovid lock inversion when pinning to GGTT on CHV/BXT+VTD panel: - Add KD116N3730A08/A12, chromebook mt8189 - JT101TM023, LQ079L1SX01, - GLD070WX3-SL01 MIPI DSI - Samsung LTL106AL0, Samsung LTL106AL01 - Raystar RFF500F-AWH-DNN - Winstar WF70A8SYJHLNGA - Wanchanglong w552946aaa - Samsung SOFEF00 - Lenovo X13s panel - ilitek-ili9881c - add rpi 5" support - visionx-rm69299 - add backlight support - edp - support AUI B116XAN02.0 bridge: - improve ref counting - ti-sn65dsi86 - add support for DP mode with HPD - synopsis: support CEC, init timer with correct freq - ASL CS5263 DP-to-HDMI bridge support nova-core: - introduce bitfield! macro - introduce safe integer converters - GSP inits to fully booted state on Ampere - Use more future-proof register for GPU identification nova-drm: - select NOVA_CORE - 64-bit only nouveau: - improve reclocking on tegra 186+ - add large page and compression support msm: - GPU: - Gen8 support: A840 (Kaanapali) and X2-85 (Glymur) - A612 support - MDSS: - Added support for Glymur and QCS8300 platforms - DPU: - Enabled Quad-Pipe support, unlocking higher resolutions support - Added support for Glymur platform - Documented DPU on QCS8300 platform as supported - DisplayPort: - Added support for Glymur platform - Added support lame remapping inside DP block - Documented DisplayPort controller on QCS8300 and SM6150/QCS615 as supported tegra: - NVJPG driver panfrost: - display JM contexts over debugfs - export JM contexts to userspace - improve error and job handling panthor: - support custom ASN_HASH for mt8196 - support mali-G1 GPU - flush shmem write before mapping buffers uncached - make timeout per-queue instead of per-job mediatek: - MT8195/88 HDMIv2/DDCv2 support rockchip: - dsi: add support for RK3368 amdxdna: - enhance runtime PM - last hardware error reading uapi - support firmware debug output - add resource and telemetry data uapi - preemption support imx: - add driver for HDMI TX Parallel audio interface ivpu: - add support for user-managed preemption buffer - add userptr support - update JSM firware API to 3.33.0 - add better alloc/free warnings - fix page fault in unbind all bos - rework bind/unbind of imported buffers - enable MCA ECC signalling - split fw runtime and global memory buffers - add fdinfo memory statistics tidss: - convert to drm logging - logging cleanup ast: - refactor generation init paths - add per chip generation detect_tx_chip - set quirks for each chip model atmel-hlcdc: - set LCDC_ATTRE register in plane disable - set correct values for plane scaler solomon: - use drm helper for get_modes and move_valid sitronix: - fix output position when clearing screens qaic: - support dma-buf exports - support new firmware's READ_DATA implementation - sahara AIC200 image table update - add sysfs support - add coredump support - add uevents support - PM support sun4i: - layer refactors to decouple plane from output - improve DE33 support vc4: - switch to generic CEC helpers komeda: - use drm_ logging functions vkms: - configfs support for display configuration vgem: - fix fence timer deadlock etnaviv: - add HWDB entry for GC8000 Nano Ultra VIP r6205" * tag 'drm-next-2025-12-03' of https://gitlab.freedesktop.org/drm/kernel: (1869 commits) Revert "drm/amd: Skip power ungate during suspend for VPE" drm/amdgpu: use common defines for HUB faults drm/amdgpu/gmc12: add amdgpu_vm_handle_fault() handling drm/amdgpu/gmc11: add amdgpu_vm_handle_fault() handling drm/amdgpu: use static ids for ACP platform devs drm/amdgpu/sdma6: Update SDMA 6.0.3 FW version to include UMQ protected-fence fix drm/amdgpu: Forward VMID reservation errors drm/amdgpu/gmc8: Delegate VM faults to soft IRQ handler ring drm/amdgpu/gmc7: Delegate VM faults to soft IRQ handler ring drm/amdgpu/gmc6: Delegate VM faults to soft IRQ handler ring drm/amdgpu/gmc6: Cache VM fault info drm/amdgpu/gmc6: Don't print MC client as it's unknown drm/amdgpu/cz_ih: Enable soft IRQ handler ring drm/amdgpu/tonga_ih: Enable soft IRQ handler ring drm/amdgpu/iceland_ih: Enable soft IRQ handler ring drm/amdgpu/cik_ih: Enable soft IRQ handler ring drm/amdgpu/si_ih: Enable soft IRQ handler ring drm/amd/display: fix typo in display_mode_core_structs.h drm/amd/display: fix Smart Power OLED not working after S4 drm/amd/display: Move RGB-type check for audio sync to DCE HW sequence ...	2025-12-04 08:53:30 -08:00
Mario Limonciello (AMD)	3925683515	Revert "drm/amd: Skip power ungate during suspend for VPE" Skipping power ungate exposed some scenarios that will fail like below: ``` amdgpu: Register(0) [regVPEC_QUEUE_RESET_REQ] failed to reach value 0x00000000 != 0x00000001n amdgpu 0000:c1:00.0: amdgpu: VPE queue reset failed ... amdgpu: [drm] ERROR wait_for_completion_timeout timeout! ``` The underlying s2idle issue that prompted this commit is going to be fixed in BIOS. This reverts commit `2a6c826cfe`. Fixes: `2a6c826cfe` ("drm/amd: Skip power ungate during suspend for VPE") Cc: stable@vger.kernel.org Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org> Acked-by: Alex Deucher <alexander.deucher@amd.com> Reported-by: Konstantin <answer2019@yandex.ru> Closes: https://bugzilla.kernel.org/show_bug.cgi?id=220812 Reported-by: Matthew Schwartz <matthew.schwartz@linux.dev> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-12-02 11:02:07 -05:00
Alex Deucher	7fa666ab07	drm/amdgpu: fix cyan_skillfish2 gpu info fw handling If the board supports IP discovery, we don't need to parse the gpu info firmware. Backport to 6.18. Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4721 Fixes: `fa819e3a7c` ("drm/amdgpu: add support for cyan skillfish gpu_info") Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `5427e32fa3`)	2025-11-26 12:34:16 -05:00
Alex Deucher	5427e32fa3	drm/amdgpu: fix cyan_skillfish2 gpu info fw handling If the board supports IP discovery, we don't need to parse the gpu info firmware. Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4721 Fixes: `fa819e3a7c` ("drm/amdgpu: add support for cyan skillfish gpu_info") Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-11-26 11:50:43 -05:00
Rodrigo Siqueira	34355e6183	drm/amdgpu: Fix GFX hang on SteamDeck when amdgpu is reloaded When trying to unload amdgpu in the SteamDeck (TTY mode), the following set of errors happens and the system gets unstable: [..] [drm] Initialized amdgpu 3.64.0 for 0000:04:00.0 on minor 0 amdgpu 0000:04:00.0: [drm:amdgpu_ib_ring_tests [amdgpu]] ERROR IB test failed on gfx_0.0.0 (-110). amdgpu 0000:04:00.0: amdgpu: ib ring test failed (-110). [..] amdgpu 0000:04:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x0000001E SMN_C2PMSG_82:0x00000000 amdgpu 0000:04:00.0: amdgpu: Failed to disable gfxoff! amdgpu 0000:04:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x0000001E SMN_C2PMSG_82:0x00000000 amdgpu 0000:04:00.0: amdgpu: Failed to disable gfxoff! [..] When the driver initializes the GPU, the PSP validates all the firmware loaded, and after that, it is not possible to load any other firmware unless the device is reset. What is happening in the load/unload situation is that PSP halts the GC engine because it suspects that something is amiss. To address this issue, this commit ensures that the GPU is reset (mode 2 reset) in the unload sequence. Acked-by: Christian König <christian.koenig@amd.com> Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org> Suggested-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Rodrigo Siqueira <siqueira@igalia.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-11-24 12:34:40 -05:00
Mario Limonciello	31ab31433c	drm/amd: Skip power ungate during suspend for VPE During the suspend sequence VPE is already going to be power gated as part of vpe_suspend(). It's unnecessary to call during calls to amdgpu_device_set_pg_state(). It actually can expose a race condition with the firmware if s0i3 sequence starts as well. Drop these calls. Cc: Peyton.Lee@amd.com Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `2a6c826cfe`) Cc: stable@vger.kernel.org	2025-11-19 18:08:36 -05:00
Mario Limonciello	2a6c826cfe	drm/amd: Skip power ungate during suspend for VPE During the suspend sequence VPE is already going to be power gated as part of vpe_suspend(). It's unnecessary to call during calls to amdgpu_device_set_pg_state(). It actually can expose a race condition with the firmware if s0i3 sequence starts as well. Drop these calls. Cc: Peyton.Lee@amd.com Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-11-19 17:33:58 -05:00
Ilpo Järvinen	c7df7059e3	drm/amdgpu: Use pci_rebar_get_max_size() Use pci_rebar_get_max_size() to simplify amdgpu_device_resize_fb_bar(). Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Christian König <christian.koenig@amd.com> Link: https://patch.msgid.link/20251113180053.27944-11-ilpo.jarvinen@linux.intel.com	2025-11-14 12:34:21 -06:00
Ilpo Järvinen	db92e3fef5	drm/amdgpu: Remove driver side BAR release before resize PCI core handles releasing device's resources and their rollback in case of failure of a BAR resizing operation. Releasing resource prior to calling pci_resize_resource() prevents PCI core from restoring the BARs as they were. Remove driver-side release of BARs from the amdgpu driver. Also remove the driver initiated assignment as pci_resize_resource() should try to assign as much as possible. If the driver side call manages to get more required resources assigned in some scenario, such a problem should be fixed inside pci_resize_resource() instead. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Tested-by: Alex Bennée <alex.bennee@linaro.org> # AVA, AMD GPU Link: https://patch.msgid.link/20251113162628.5946-11-ilpo.jarvinen@linux.intel.com	2025-11-14 12:34:20 -06:00
Ilpo Järvinen	337b1b566d	PCI: Fix restoring BARs on BAR resize rollback path BAR resize operation is implemented in the pci_resize_resource() and pbus_reassign_bridge_resources() functions. pci_resize_resource() can be called either from __resource_resize_store() from sysfs or directly by the driver for the Endpoint Device. The pci_resize_resource() requires that caller has released the device resources that share the bridge window with the BAR to be resized as otherwise the bridge window is pinned in place and cannot be changed. pbus_reassign_bridge_resources() rolls back resources if the resize operation fails, but rollback is performed only for the bridge windows. Because releasing the device resources are done by the caller of the BAR resize interface, these functions performing the BAR resize do not have access to the device resources as they were before the resize. pbus_reassign_bridge_resources() could try __pci_bridge_assign_resources() after rolling back the bridge windows as they were, however, it will not guarantee the resource are assigned due to differences in how FW and the kernel assign the resources (alignment of the start address and tail). To perform rollback robustly, the BAR resize interface has to be altered to also release the device resources that share the bridge window with the BAR to be resized. Also, remove restoring from the entries failed list as saved list should now contain both the bridge windows and device resources so the extra restore is duplicated work. Some drivers (currently only amdgpu) want to prevent releasing some resources. Add exclude_bars param to pci_resize_resource() and make amdgpu pass its register BAR (BAR 2 or 5), which should never be released during resize operation. Normally 64-bit prefetchable resources do not share a bridge window with the 32-bit only register BAR, but there are various fallbacks in the resource assignment logic which may make the resources share the bridge window in rare cases. This change (together with the driver side changes) is to counter the resource releases that had to be done to prevent resource tree corruption in the ("PCI: Release assigned resource before restoring them") change. As such, it likely restores functionality in cases where device resources were released to avoid resource tree conflicts which appeared to be "working" when such conflicts were not correctly detected by the kernel. Reported-by: Simon Richter <Simon.Richter@hogyros.de> Link: https://lore.kernel.org/linux-pci/f9a8c975-f5d3-4dd2-988e-4371a1433a60@hogyros.de/ Reported-by: Alex Bennée <alex.bennee@linaro.org> Link: https://lore.kernel.org/linux-pci/874irqop6b.fsf@draig.linaro.org/ Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> [bhelgaas: squash amdgpu BAR selection from https://lore.kernel.org/r/20251114103053.13778-1-ilpo.jarvinen@linux.intel.com] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Tested-by: Alex Bennée <alex.bennee@linaro.org> # AVA, AMD GPU Reviewed-by: Christian König <christian.koenig@amd.com> Link: https://patch.msgid.link/20251113162628.5946-7-ilpo.jarvinen@linux.intel.com	2025-11-14 12:33:14 -06:00
Timur Kristóf	2da2e952a7	drm/amdgpu: Use DC by default on SI dGPUs Now that DC supports analog connectors, it has reached feature parity with the legacy non-DC display driver on SI dGPUs. Use the DC display driver by default on SI dGPUs, unless it is explicitly disabled using the amdgpu.dc=0 module parameter. DC brings proper support for DP/HDMI audio, DP MST, 10-bit colors, some HDR features, atomic modesetting, etc. Also clarify the comment about what is missing to have full DC support for CIK APUs. Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-11-11 21:54:17 -05:00
YiPeng Chai	be031770bf	drm/amd/ras: Fix the issue of incorrect function call When amdgpu_device_health_check fails, amdgpu_ras_pre_reset will not be called and therefore amdgpu_ras_post_reset cannot be called either. Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-11-06 09:57:17 -05:00
Alex Deucher	c81f5cebe8	drm/amdgpu: Drop PMFW RLC notifier from amdgpu_device_suspend() For S3 on vangogh, PMFW needs to be notified before the driver powers down RLC. This already happens in smu_disable_dpms() so drop the superfluous call in amdgpu_device_suspend(). Co-developed-by: Mario Limonciello (AMD) <superm1@kernel.org> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `960e30a61e`)	2025-11-04 13:28:20 -05:00
YiPeng Chai	d95ca7f515	drm/amdgpu: suspend ras module before gpu reset During gpu reset, all GPU-related resources are inaccessible. To avoid affecting ras functionality, suspend ras module before gpu reset and resume it after gpu reset is complete. V2: Rename functions to avoid misunderstanding. V3: Move flush_delayed_work to amdgpu_ras_process_pause, Move schedule_delayed_work to amdgpu_ras_process_unpause. V4: Rename functions. V5: Move the function to amdgpu_ras.c. Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Acked-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-11-04 11:53:59 -05:00
YiPeng Chai	2f46c547e4	drm/amdgpu: Add ras ip block name Add ras ip block name. Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-11-04 11:53:22 -05:00
Jesse.Zhang	290f46cf57	drm/amdgpu: Implement user queue reset functionality This patch adds robust reset handling for user queues (userq) to improve recovery from queue failures. The key components include: 1. Queue detection and reset logic: - amdgpu_userq_detect_and_reset_queues() identifies failed queues - Per-IP detect_and_reset callbacks for targeted recovery - Falls back to full GPU reset when needed 2. Reset infrastructure: - Adds userq_reset_work workqueue for async reset handling - Implements pre/post reset handlers for queue state management - Integrates with existing GPU reset framework 3. Error handling improvements: - Enhanced state tracking with HUNG state - Automatic reset triggering on critical failures - VRAM loss handling during recovery 4. Integration points: - Added to device init/reset paths - Called during queue destroy, suspend, and isolation events - Handles both individual queue and full GPU resets The reset functionality works with both gfx/compute and sdma queues, providing better resilience against queue failures while minimizing disruption to unaffected queues. v2: add detection and reset calls when preemption/unmaped fails. add a per device userq counter for each user queue type.(Alex) v3: make sure we hold the adev->userq_mutex when we call amdgpu_userq_detect_and_reset_queues. (Alex) warn if the adev->userq_mutex is not held. v4: make sure we have all of the uqm->userq_mutex held. warn if the uqm->userq_mutex is not held. v5: Use array for user queue type counters.(Alex) all of the uqm->userq_mutex need to be held when calling detect and reset. (Alex) v6: fix lock dep warning in amdgpu_userq_fence_dence_driver_process v7: add the queue types in an array and use a loop in amdgpu_userq_detect_and_reset_queues (Lijo) v8: remove atomic_set(&userq_mgr->userq_count[i], 0). it should already be 0 since we kzalloc the structure (Alex) v9: For consistency with kernel queues, We may want something like: amdgpu_userq_is_reset_type_supported (Alex) Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-11-04 11:53:05 -05:00
Mario Limonciello (AMD)	72b0b75d60	drm/amd: Unwind for failed device suspend If device suspend has failed, add a recovery flow that will attempt to unwind the suspend and get things back up and running. Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4627 Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-11-04 11:53:04 -05:00
Mario Limonciello (AMD)	1d61121872	drm/amd: Add an unwind for failures in amdgpu_device_ip_suspend_phase2() If any hardware IPs involved with the second phase of suspend fail, unwind all steps to restore back to original state. Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-11-04 11:53:04 -05:00
Mario Limonciello (AMD)	6f4208f9d9	drm/amd: Add an unwind for failures in amdgpu_device_ip_suspend_phase1() If any hardware IPs involved with the first phase of suspend fail, unwind all steps to restore back to original state. Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-11-04 11:53:04 -05:00
Alex Deucher	960e30a61e	drm/amdgpu: Drop PMFW RLC notifier from amdgpu_device_suspend() For S3 on vangogh, PMFW needs to be notified before the driver powers down RLC. This already happens in smu_disable_dpms() so drop the superfluous call in amdgpu_device_suspend(). Co-developed-by: Mario Limonciello (AMD) <superm1@kernel.org> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-11-04 11:52:48 -05:00
Asad Kamal	d80391dd03	drm/amdgpu: Remove invalidate and flush hdp macros Remove amdgpu_asic_flush_hdp & amdgpu_asic_invalidate_hdp functions and directly use the mapped ones Signed-off-by: Asad Kamal <asad.kamal@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-11-04 11:33:54 -05:00
Simona Vetter	f67d54e96b	Merge tag 'amd-drm-next-6.19-2025-10-29' of https://gitlab.freedesktop.org/agd5f/linux into drm-next amd-drm-next-6.19-2025-10-29: amdgpu: - VPE idle handler fix - Re-enable DM idle optimizations - DCN3.0 fix - SMU fix - Powerplay fixes for fiji/iceland - License copy-pasta fixes - HDP eDP panel fix - Vblank fix - RAS fixes - SR-IOV updates - SMU 13 VCN reset fix - DMUB fixes - DC frame limit fix - Additional DC underflow logging - DCN 3.1.5 fixes - DC Analog encoders support - Enable DC on bonaire by default - UserQ fixes - Remove redundant pm_runtime_mark_last_busy() calls amdkfd: - Process cleanup fix - Misc fixes radeon: - devm migration fixes - Remove redundant pm_runtime_mark_last_busy() calls UAPI - Add ABM KMS property Proposed kwin changes: https://invent.kde.org/plasma/kwin/-/merge_requests/6028 Signed-off-by: Simona Vetter <simona.vetter@ffwll.ch> From: Alex Deucher <alexander.deucher@amd.com> Link: https://patch.msgid.link/20251029205713.9480-1-alexander.deucher@amd.com	2025-10-31 22:08:24 +01:00
Simona Vetter	119348477d	Merge tag 'amd-drm-next-6.19-2025-10-24' of https://gitlab.freedesktop.org/agd5f/linux into drm-next amd-drm-next-6.19-2025-10-24: amdgpu: - HMM cleanup - Add new RAS framework - DML2.1 updates - YCbCr420 fixes - DC FP fixes - DMUB fixes - LTTPR fixes - DTBCLK fixes - DMU cursor offload handling - Userq validation improvements - Misc code cleanups - Unify shutdown callback handling - Suspend improvements - Power limit code cleanup - Fence cleanup - IP Discovery cleanup - SR-IOV fixes - AUX backlight fixes - DCN 3.5 fixes - HDMI compliance fixes - DCN 4.0.1 cursor updates - DCN interrupt fix - DC KMS full update improvements - Add additional HDCP traces - DCN 3.2 fixes - DP MST fixes - Add support for new SR-IOV mailbox interface Signed-off-by: Simona Vetter <simona.vetter@ffwll.ch> From: Alex Deucher <alexander.deucher@amd.com> Link: https://lore.kernel.org/r/20251024175249.58099-1-alexander.deucher@amd.com	2025-10-31 18:33:43 +01:00
Timur Kristóf	f44aad39b6	drm/amdgpu: Use DC by default for Bonaire Now that DC supports analog connectors, there is nothing stopping us from using it by default on Bonaire. Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-10-28 10:10:13 -04:00
Jesse.Zhang	f18719ef4b	drm/amdgpu: Convert amdgpu userqueue management from IDR to XArray This commit refactors the AMDGPU userqueue management subsystem to replace IDR (ID Allocation) with XArray for improved performance, scalability, and maintainability. The changes address several issues with the previous IDR implementation and provide better locking semantics. Key changes: 1. Global XArray Introduction: - Added `userq_doorbell_xa` to `struct amdgpu_device` for global queue tracking - Uses doorbell_index as key for efficient global lookup - Replaces the previous `userq_mgr_list` linked list approach 2. Per-process XArray Conversion: - Replaced `userq_idr` with `userq_mgr_xa` in `struct amdgpu_userq_mgr` - Maintains per-process queue tracking with queue_id as key - Uses XA_FLAGS_ALLOC for automatic ID allocation 3. Locking Improvements: - Removed global `userq_mutex` from `struct amdgpu_device` - Replaced with fine-grained XArray locking using XArray's internal spinlocks 4. Runtime Idle Check Optimization: - Updated `amdgpu_runtime_idle_check_userq()` to use xa_empty 5. Queue Management Functions: - Converted all IDR operations to equivalent XArray functions: - `idr_alloc()` → `xa_alloc()` - `idr_find()` → `xa_load()` - `idr_remove()` → `xa_erase()` - `idr_for_each()` → `xa_for_each()` Benefits: - Performance: XArray provides better scalability for large numbers of queues - Memory Efficiency: Reduced memory overhead compared to IDR - Thread Safety: Improved locking semantics with XArray's internal spinlocks v2: rename userq_global_xa/userq_xa to userq_doorbell_xa/userq_mgr_xa Remove xa_lock and use its own lock. v3: Set queue->userq_mgr = uq_mgr in amdgpu_userq_create() v4: use xa_store_irq (Christian) hold the read side of the reset lock while creating/destroying queues and the manager data structure. (Chritian) Acked-by: Alex Deucher <alexander.deucher@amd.com> Suggested-by: Christian König <christian.koenig@amd.com> Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-10-28 09:59:22 -04:00
Simona Vetter	098456f314	Merge tag 'drm-misc-next-2025-10-21' of https://gitlab.freedesktop.org/drm/misc/kernel into drm-next drm-misc-next for v6.19: UAPI Changes: amdxdna: - Support reading last hardware error Cross-subsystem Changes: dma-buf: - heaps: Create heap per CMA reserved location; Improve user-space documentation Core Changes: atomic: - Clean up and improve state-handling interfaces, update drivers bridge: - Improve ref counting buddy: - Optimize block management Driver Changes: amdxdna: - Fix runtime power management - Support firmware debug output ast: - Set quirks for each chip model atmel-hlcdc: - Set LCDC_ATTRE register in plane disable - Set correct values for plane scaler bochs: - Use vblank timer bridge: - synopsis: Support CEC; Init timer with correct frequency cirrus-qemu: - Use vblank timer imx: - Clean up ivu: - Update JSM API to 3.33.0 - Reset engine on more job errors - Return correct error codes for jobs komeda: - Use drm_ logging functions panel: - edp: Support AUO B116XAN02.0 panfrost: - Embed struct drm_driver in Panfrost device - Improve error handling - Clean up job handling panthor: - Support custom ASN_HASH for mt8196 renesas: - rz-du: Fix dependencies rockchip: - dsi: Add support for RK3368 - Fix LUT size for RK3386 sitronix: - Fix output position when clearing screens qaic: - Support dma-buf exports - Support new firmware's READ_DATA implementation - Replace kcalloc with memdup - Replace snprintf() with sysfs_emit() - Avoid overflows in arithmetics - Clean up - Fixes qxl: - Use vblank timer rockchip: - Clean up mode-setting code vgem: - Fix fence timer deadlock virtgpu: - Use vblank timer Signed-off-by: Simona Vetter <simona.vetter@ffwll.ch> From: Thomas Zimmermann <tzimmermann@suse.de> Link: https://lore.kernel.org/r/20251021111837.GA40643@linux.fritz.box	2025-10-24 13:25:20 +02:00
Ellen Pan	07009df649	drm/amdgpu: Introduce SRIOV critical regions v2 during VF init 1. Introduced amdgpu_virt_init_critical_region during VF init. - VFs use init_data_header_offset and init_data_header_size_kb transmitted via PF2VF mailbox to fetch the offset of critical regions' offsets/sizes in VRAM and save to adev->virt.crit_region_offsets and adev->virt.crit_region_sizes_kb. Signed-off-by: Ellen Pan <yunru.pan@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-10-20 18:28:05 -04:00
Victor Zhao	6169b555db	drm/amdgpu: use GPU_HDP_FLUSH for sriov Currently SRIOV runtime will use kiq to write HDP_MEM_FLUSH_CNTL for hdp flush. This register need to be write from CPU for nbif to aware, otherwise it will not work. Implement amdgpu_kiq_hdp_flush and use kiq to do gpu hdp flush during sriov runtime. v2: - fallback to amdgpu_asic_flush_hdp when amdgpu_kiq_hdp_flush failed - add function amdgpu_mes_hdp_flush v3: - changed returned error Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Victor Zhao <Victor.Zhao@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-10-20 18:25:41 -04:00
Mario Limonciello	152dca4ea7	drm/amd: Add a helper to tell whether an IP block HW is enabled There is already a helper for telling if a block is valid, but if IP handling wants to check if it's HW is enabled no such helper exists. Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-10-20 18:25:31 -04:00
Thomas Zimmermann	7910d69376	drm/client: Remove holds_console_lock parameter from suspend/resume No caller of the client resume/suspend helpers holds the console lock. The last such cases were removed from radeon in the patch series at [1]. Now remove the related parameter and the TODO items. v2: - update placeholders for CONFIG_DRM_CLIENT=n Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de> Link: https://patchwork.freedesktop.org/series/151624/ # [1] Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Acked-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Reviewed-by: Petr Vorel <pvorel@suse.cz> Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com> Acked-by: Danilo Krummrich <dakr@kernel.org> Reviewed-by: Lyude Paul <lyude@redhat.com> Reviewed-by: Jocelyn Falempe <jfalempe@redhat.com> Link: https://lore.kernel.org/r/20251001143709.419736-1-tzimmermann@suse.de	2025-10-18 17:35:09 +02:00
Lijo Lazar	1cbac73d1a	drm/amdgpu: Move reset-on-init sequence earlier Complete reset-on-init sequence before sysfs interfaces are created. Devices get properly initiaized only after reset, and then only sysfs interfaces should be made available. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Asad Kamal <asad.kamal@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-10-13 14:14:35 -04:00
Lijo Lazar	9e2096baab	drm/amdgpu: Add amdgpu_discovery_info Add amdgpu_discovery_info structure to keep all discovery related information. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-10-13 14:14:35 -04:00
Lijo Lazar	aa6674f2da	drm/amdgpu: Reorganize sysfs ini/fini calls Aggregate sysfs ini/fini calls into separate functions. No functional change. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Asad Kamal <asad.kamal@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-10-13 14:14:35 -04:00
Alex Deucher	db36632ea5	drm/amdgpu: clean up and unify hw fence handling Decouple the amdgpu fence from the amdgpu_job structure. This lets us clean up the separate fence ops for the embedded fence and other fences. This also allows us to allocate the vm fence up front when we allocate the job. v2: Additional cleanup suggested by Christian v3: Additional cleanups suggested by Christian v4: Additional cleanups suggested by David and vm fence fix v5: cast seqno (David) Cc: David.Wu3@amd.com Cc: christian.koenig@amd.com Tested-by: David (Ming Qiang) Wu <David.Wu3@amd.com> Reviewed-by: David (Ming Qiang) Wu <David.Wu3@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-10-13 14:14:35 -04:00
Mario Limonciello	1f3cca7794	drm/amd: Pass userq suspend failures up to caller If a userq failed to suspend the rest of the suspend sequence may have problems. Pass the error code up to the caller for a decision on what to do. Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-10-13 14:14:34 -04:00
Mario Limonciello	173360fe49	drm/amd: Pass IP suspend errors up to callers If IP suspend fails the callers should be notified so that they can potentially react. Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-10-13 14:14:34 -04:00
Mario Limonciello	b7ff2e7924	drm/amd: Don't always set IP block HW status to false amdgpu_device_ip_suspend_phase2() calls amdgpu_ip_block_suspend() which already sets HW block status to false when succeeding with IP suspend. Remove the explicit call in amdgpu_device_ip_suspend_phase2() so that the status is accurate. Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-10-13 14:14:34 -04:00
Mario Limonciello	f35f254178	drm/amd: Remove comment about handling errors in amdgpu_device_ip_suspend_phase1() Error handling was introduced in commit `e095026f00` ("drm/amdgpu: validate suspend before function call") so the comment about TODO is no longer needed. Fixes: `e095026f00` ("drm/amdgpu: validate suspend before function call") Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-10-13 14:14:34 -04:00
Mario Limonciello	6062ede680	drm/amd: Stop exporting amdgpu_device_ip_suspend() outside amdgpu_device amdgpu_device_ip_suspend() doesn't have a caller outside of amdgpu_device.c. Make it static. No intended functional changes. Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-10-13 14:14:34 -04:00
Christian König	1bea57ea75	drm/amdgpu: reduce queue timeout to 2 seconds v2 There has been multiple complains that 10 seconds are usually to long. The original requirement for longer timeout came from compute tests on AMDVLK, since that is no longer a topic reduce the timeout back to 2 seconds for all queues. While at it also remove any special handling for compute queues under SRIOV or pass through. v2: fix checkpatch warning. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-10-13 14:14:33 -04:00
Timur Kristóf	7bdd91abf0	drm/amd: Disable ASPM on SI Enabling ASPM causes randoms hangs on Tahiti and Oland on Zen4. It's unclear if this is a platform-specific or GPU-specific issue. Disable ASPM on SI for the time being. Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-10-13 14:14:14 -04:00
Lijo Lazar	2e97663760	drm/amdgpu: Report individual reset error If reinitialization of one of the GPUs fails after reset, it logs failure on all subsequent GPUs eventhough they have resumed successfully. A sample log where only device at 0000:95:00.0 had a failure - amdgpu 0000:15:00.0: amdgpu: GPU reset(19) succeeded! amdgpu 0000:65:00.0: amdgpu: GPU reset(19) succeeded! amdgpu 0000:75:00.0: amdgpu: GPU reset(19) succeeded! amdgpu 0000:85:00.0: amdgpu: GPU reset(19) succeeded! amdgpu 0000:95:00.0: amdgpu: GPU reset(19) failed amdgpu 0000:e5:00.0: amdgpu: GPU reset(19) failed amdgpu 0000:f5:00.0: amdgpu: GPU reset(19) failed amdgpu 0000:05:00.0: amdgpu: GPU reset(19) failed amdgpu 0000:15:00.0: amdgpu: GPU reset end with ret = -5 To avoid confusion, report the error for each device separately and return the first error as the overall result. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Asad Kamal <asad.kamal@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-10-07 14:09:19 -04:00
Lijo Lazar	9b608fe948	drm/amdgpu: Check swus/ds for switch state save For saving switch state, check if the GPU is having SWUS/DS architecture. Otherwise, skip saving. Reported-by: Roman Elshin <roman.elshin@gmail.com> Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4602 Fixes: `1dd2fa0e00` ("drm/amdgpu: Save and restore switch state") Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-10-07 14:09:19 -04:00
Alex Deucher	f8b367e6fa	drm/amdgpu: suspend KFD and KGD user queues for S0ix We need to make sure the user queues are preempted so GFX can enter gfxoff. Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org> Tested-by: David Perry <david.perry@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-09-18 09:43:30 -04:00
Mario Limonciello (AMD)	531df041f2	drm/amd: Avoid evicting resources at S5 Normally resources are evicted on dGPUs at suspend or hibernate and on APUs at hibernate. These steps are unnecessary when using the S4 callbacks to put the system into S5. Cc: AceLan Kao <acelan.kao@canonical.com> Cc: Kai-Heng Feng <kaihengf@nvidia.com> Cc: Mark Pearson <mpearson-lenovo@squebb.ca> Cc: Denis Benato <benato.denis96@gmail.com> Cc: Merthan Karakaş <m3rthn.k@gmail.com> Tested-by: Eric Naim <dnaim@cachyos.org> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-09-15 17:02:39 -04:00

1 2 3 4 5 ...

1550 Commits