Commit 1faeeb31 authored by Dave Airlie

Merge tag 'amd-drm-next-6.16-2025-05-09' of https://gitlab.freedesktop.org/agd5f/linux into drm-next

amd-drm-next-6.16-2025-05-09:

amdgpu:
- IPS fixes
- DSC cleanup
- DC Scaling updates
- DC FP fixes
- Fused I2C-over-AUX updates
- SubVP fixes
- Freesync fix
- DMUB AUX fixes
- VCN fix
- Hibernation fixes
- HDP fixes
- DCN 2.1 fixes
- DPIA fixes
- DMUB updates
- Use drm_file_err in amdgpu
- Enforce isolation updates
- Use new dma_fence helpers
- USERQ fixes
- Documentation updates
- Misc code cleanups
- SR-IOV updates
- RAS updates
- PSP 12 cleanups

amdkfd:
- Update error messages for SDMA
- Userptr updates

drm:
- Add drm_file_err function

dma-buf:
- Add a helper to sort and deduplicate dma_fence arrays

From: Alex Deucher <alexander.deucher@amd.com>
Link: https://lore.kernel.org/r/20250509230951.3871914-1-alexander.deucher@amd.com


Signed-off-by: Dave Airlie <airlied@redhat.com>
parents f9fa0122 afc6053d
+23 −0
=================================================
 AMD Hardware Components Information per Product
=================================================

This page lists each AMD product name and the versions of the hardware
components it contains.

Accelerated Processing Units (APU) Info
---------------------------------------

.. csv-table::
   :header-rows: 1
   :widths: 3, 2, 2, 1, 1, 1, 1
   :file: ./apu-asic-info-table.csv

Discrete GPU Info
-----------------

.. csv-table::
   :header-rows: 1
   :widths: 3, 2, 2, 1, 1, 1
   :file: ./dgpu-asic-info-table.csv
+75 −0
@@ -12,18 +12,39 @@ we have a dedicated glossary for Display Core at
      The number of CUs that are active on the system.  The number of active
      CUs may be less than SE * SH * CU depending on the board configuration.

    BACO
      Bus Alive, Chip Off

    BOCO
      Bus Off, Chip Off

    CE
      Constant Engine

    CIK
      Sea Islands

    CB
      Color Buffer

    CP
      Command Processor

    CPLIB
      Content Protection Library

    CS
      Command Submission

    CSB
      Clear State Indirect Buffer

    CU
      Compute Unit

    DB
      Depth Buffer

    DFS
      Digital Frequency Synthesizer

@@ -33,6 +54,9 @@ we have a dedicated glossary for Display Core at
    EOP
      End Of Pipe/Pipeline

    FLR
      Function Level Reset

    GART
      Graphics Address Remapping Table.  This is the name we use for the GPUVM
      page table used by the GPU kernel driver.  It remaps system resources
@@ -45,6 +69,12 @@ we have a dedicated glossary for Display Core at
    GC
      Graphics and Compute

    GDS
      Global Data Share

    GE
      Geometry Engine

    GMC
      Graphics Memory Controller

@@ -80,6 +110,9 @@ we have a dedicated glossary for Display Core at
    KCQ
      Kernel Compute Queue

    KFD
      Kernel Fusion Driver

    KGQ
      Kernel Graphics Queue

@@ -89,6 +122,9 @@ we have a dedicated glossary for Display Core at
    MC
      Memory Controller

    MCBP
      Mid Command Buffer Preemption

    ME
      MicroEngine (Graphics)

@@ -104,6 +140,9 @@ we have a dedicated glossary for Display Core at
    MQD
      Memory Queue Descriptor

    PA
      Primitive Assembler / Physical Address

    PFP
      Pre-Fetch Parser (Graphics)

@@ -113,24 +152,39 @@ we have a dedicated glossary for Display Core at
    PSP
      Platform Security Processor

    RB
      Render Backend.  Some people call these ROPs.

    RLC
      RunList Controller. This name is a remnant of past ages and doesn't have
      much meaning today. It's a group of general-purpose helper engines for
      the GFX block. It's involved in GFX power management and SR-IOV, among
      other things.

    SC
      Scan Converter

    SDMA
      System DMA

    SE
      Shader Engine

    SGPR
      Scalar General-Purpose Registers

    SH
      SHader array

    SI
      Southern Islands

    SMU/SMC
      System Management Unit / System Management Controller

    SPI
      Shader Processor Input

    SRLC
      Save/Restore List Control

@@ -143,12 +197,21 @@ we have a dedicated glossary for Display Core at
    SS
      Spread Spectrum

    SX
      Shader Export

    TA
      Trusted Application

    TC
      Texture Cache

    TOC
      Table of Contents

    UMSCH
      User Mode Scheduler

    UVD
      Unified Video Decoder

@@ -158,5 +221,17 @@ we have a dedicated glossary for Display Core at
    VCN
      Video Codec Next

    VGPR
      Vector General-Purpose Registers

    VMID
      Virtual Memory ID

    VPE
      Video Processing Engine

    XCC
      Accelerator Core Complex

    XCP
      Accelerator Core Partition
+2 −0
@@ -13,3 +13,5 @@ Ryzen 7x20 series, Mendocino, 3.1.6, 10.3.7, 3.1.1, 5.2.7, 13.0.8
Ryzen 7x40 series, Phoenix, 3.1.4, 11.0.1 / 11.0.4, 4.0.2, 6.0.1, 13.0.4 / 13.0.11
Ryzen 8x40 series, Hawk Point, 3.1.4, 11.0.1 / 11.0.4, 4.0.2, 6.0.1, 13.0.4 / 13.0.11
Ryzen AI 300 series, Strix Point, 3.5.0, 11.5.0, 4.0.5, 6.1.0, 14.0.0
Ryzen AI 350 series, Krackan Point, 3.5.0, 11.5.2, 4.0.5, 6.1.2, 14.0.4
Ryzen AI Max 300 series, Strix Halo, 3.5.1, 11.5.1, 4.0.6, 6.1.1, 14.0.1
+210 −0
==============
AMDGPU DebugFS
==============

The amdgpu driver provides a number of debugfs files to aid in debugging
issues in the driver.  These are usually found in
/sys/kernel/debug/dri/<num>.
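As a minimal sketch of locating these files, the Python helper below walks the
debugfs DRM directory and collects the ``amdgpu_*`` entries for each device.
The function name ``list_amdgpu_debugfs`` and its ``root`` parameter are
illustrative and not part of the driver.  It assumes debugfs is mounted at the
usual location and that the caller has permission to read it (typically root).

```python
import os


def list_amdgpu_debugfs(root="/sys/kernel/debug/dri"):
    """Map each device directory under root to its sorted amdgpu_* entries."""
    result = {}
    if not os.path.isdir(root):
        # debugfs not mounted, or not readable by this user
        return result
    for num in sorted(os.listdir(root)):
        dev_dir = os.path.join(root, num)
        if not os.path.isdir(dev_dir):
            continue
        names = sorted(n for n in os.listdir(dev_dir) if n.startswith("amdgpu_"))
        if names:
            # only report devices that actually expose amdgpu files
            result[num] = names
    return result
```

Calling ``list_amdgpu_debugfs()`` with no arguments inspects the default
location.  Passing a different ``root`` is mainly useful for testing.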

DebugFS Files
=============

amdgpu_benchmark
----------------

Run benchmarks using the DMA engine the driver uses for GPU memory paging.
Write a number to the file to run the test.  The results are written to the
kernel log.  VRAM is on-device memory (dGPUs) or a carve-out (APUs), and GTT
(Graphics Translation Tables) is system memory that is accessible by the GPU.
The following tests are available:

- 1: simple test, VRAM to GTT and GTT to VRAM
- 2: simple test, VRAM to VRAM
- 3: GTT to VRAM, buffer size sweep, powers of 2
- 4: VRAM to GTT, buffer size sweep, powers of 2
- 5: VRAM to VRAM, buffer size sweep, powers of 2
- 6: GTT to VRAM, buffer size sweep, common display sizes
- 7: VRAM to GTT, buffer size sweep, common display sizes
- 8: VRAM to VRAM, buffer size sweep, common display sizes
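
The test selection above can be sketched in Python as follows.  The helper name
``run_benchmark`` and the default device directory ``/sys/kernel/debug/dri/0``
are assumptions for illustration.  The actual ``<num>`` depends on the system,
and writing to the file normally requires root.

```python
import os

VALID_TESTS = range(1, 9)  # test numbers 1 through 8 from the list above


def run_benchmark(test_id, device_dir="/sys/kernel/debug/dri/0"):
    """Write test_id to amdgpu_benchmark; the results appear in the kernel log."""
    if test_id not in VALID_TESTS:
        raise ValueError(f"test id must be between 1 and 8, got {test_id!r}")
    path = os.path.join(device_dir, "amdgpu_benchmark")
    with open(path, "w") as f:
        f.write(f"{test_id}\n")
    return path
```

After ``run_benchmark(1)`` returns, the timing results would be read from the
kernel log (for example with ``dmesg``), not from the debugfs file itself.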

amdgpu_test_ib
--------------

Read this file to run simple IB (Indirect Buffer) tests on all kernel managed
rings.  IBs are command buffers usually generated by userspace applications
which are submitted to the kernel for execution on a particular GPU engine.
This just runs the simple IB tests included in the kernel.  These tests
are engine specific and verify that IB submission works.

amdgpu_discovery
----------------

Provides raw access to the IP discovery binary provided by the GPU.  Read this
file to access the raw binary.  This is useful for verifying the contents of
the IP discovery table.  It is chip specific.

amdgpu_vbios
------------

Provides raw access to the ROM binary image from the GPU.  Read this file to
access the raw binary.  This is useful for verifying the contents of the
video BIOS ROM.  It is board specific.
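
A quick sanity check on the dumped image can be sketched as below.  This
assumes the image follows the standard PCI expansion ROM layout, which begins
with the two signature bytes ``0x55 0xAA``.  The helper names and the default
path (device ``0``) are illustrative only.

```python
PCI_ROM_SIGNATURE = b"\x55\xaa"  # first two bytes of a PCI expansion ROM


def read_vbios(path="/sys/kernel/debug/dri/0/amdgpu_vbios"):
    """Return the raw ROM image exposed by the debugfs file."""
    with open(path, "rb") as f:
        return f.read()


def has_rom_signature(image):
    """Check that the image starts with the PCI expansion ROM signature."""
    return image[:2] == PCI_ROM_SIGNATURE
```

A failed signature check only indicates that the dump does not look like a PCI
expansion ROM.  It says nothing about whether the rest of the image is valid.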

amdgpu_evict_gtt
----------------

Evict all buffers from the GTT memory pool.  Read this file to evict all
buffers from this pool.

amdgpu_evict_vram
-----------------

Evict all buffers from the VRAM memory pool.  Read this file to evict all
buffers from this pool.

amdgpu_gpu_recover
------------------

Trigger a GPU reset.  Read this file to trigger a reset of the entire GPU.
All work currently running on the GPU will be lost.

amdgpu_ring_<name>
------------------

Provides read access to the kernel managed ring buffers for each ring <name>.
These are useful for debugging problems on a particular ring.  The ring buffer
is how the CPU sends commands to the GPU.  The CPU writes commands into the
buffer and then asks the GPU engine to process it.  This is the raw binary
contents of the ring buffer.  Use a tool like UMR to decode the rings into human
readable form.
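
For a quick look without UMR, the raw dump can be split into dwords as in the
sketch below.  It assumes the file starts with a 12-byte header holding the
read pointer, the write pointer, and the driver's copy of the write pointer,
followed by the ring contents as 32-bit words in host byte order.  This layout
is an assumption here and should be checked against the driver source for the
kernel in use.  The helper name ``split_ring_dump`` is illustrative.

```python
import struct


def split_ring_dump(data):
    """Split a raw ring dump into (rptr, wptr, driver_wptr) and a dword list."""
    if len(data) < 12 or len(data) % 4:
        raise ValueError("expected a 12-byte header followed by whole dwords")
    # assumed header layout: rptr, wptr, driver's copy of wptr
    header = struct.unpack_from("=3I", data, 0)
    body = [dword for (dword,) in struct.iter_unpack("=I", data[12:])]
    return header, body
```

This only separates the pointers from the packet stream.  Decoding the packets
themselves is engine specific and is what a tool like UMR is for.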

amdgpu_mqd_<name>
-----------------

Provides read access to the kernel managed MQD (Memory Queue Descriptor) for
ring <name> managed by the kernel driver.  MQDs define the features of the ring
and are used to store the ring's state when it is not connected to hardware.
The driver writes the requested ring features and metadata (GPU addresses of
the ring itself and associated buffers) to the MQD and the firmware uses the MQD
to populate the hardware when the ring is mapped to a hardware slot.  Only
available on engines which use MQDs.  This provides access to the raw MQD
binary.

amdgpu_error_<name>
-------------------

Provides an interface to set an error code on the dma fences associated with
ring <name>.  The error code specified is propagated to all fences associated
with the ring.  Use this to inject a fence error into a ring.

amdgpu_pm_info
--------------

Provides human readable information about the power management features
and state of the GPU.  This includes current GFX clock, Memory clock,
voltages, average SoC power, temperature, GFX load, Memory load, SMU
feature mask, VCN power state, clock and power gating features.

amdgpu_firmware_info
--------------------

Lists the firmware versions for all firmwares used by the GPU.  Only
entries with a non-0 version are valid.  If the version is 0, the firmware
is not valid for the GPU.

amdgpu_fence_info
-----------------

Shows the last signalled and emitted fence sequence numbers for each
kernel driver managed ring.  Fences are associated with submissions
to the engine.  Emitted fences have been submitted to the ring
and signalled fences have been signalled by the GPU.  Rings with a
larger emitted fence value have outstanding work that is still being
processed by the engine that owns that ring.  When the emitted and
signalled fence values are equal, the ring is idle.
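
The idle check described above can be sketched as follows.  The input is a
plain mapping from ring name to a ``(signalled, emitted)`` pair.  How those
numbers are parsed out of the file is left open, since the exact text format is
not described here.  The function name ``busy_rings`` and the ring names in the
usage note are illustrative.

```python
def busy_rings(fences):
    """Return the sorted names of rings whose emitted and signalled values differ.

    fences maps a ring name to a (signalled, emitted) pair of sequence numbers.
    """
    return sorted(
        name
        for name, (signalled, emitted) in fences.items()
        if emitted != signalled  # equal values mean the ring is idle
    )
```

For example, ``busy_rings({"gfx": (10, 12), "sdma0": (7, 7)})`` reports only
``gfx`` as having outstanding work.  The sketch compares for inequality rather
than "greater than" so that it does not depend on how sequence numbers wrap.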

amdgpu_gem_info
---------------

Lists all of the PIDs using the GPU and the GPU buffers that they have
allocated.  This lists the buffer size, pool (VRAM, GTT, etc.), and buffer
attributes (CPU access required, CPU cache attributes, etc.).

amdgpu_vm_info
--------------

Lists all of the PIDs using the GPU and the GPU buffers that they have
allocated as well as the status of those buffers relative to that process'
GPU virtual address space (e.g., evicted, idle, invalidated, etc.).

amdgpu_sa_info
--------------

Prints out all of the suballocations (sa) by the suballocation manager in the
kernel driver.  Prints the GPU address, size, and fence info associated
with each suballocation.  The suballocations are used internally within
the kernel driver for various things.

amdgpu_<pool>_mm
----------------

Prints TTM information about the memory pool <pool>.

amdgpu_vram
-----------

Provides direct access to VRAM.  Used by tools like UMR to inspect
objects in VRAM.

amdgpu_iomem
------------

Provides direct access to GTT memory.  Used by tools like UMR to inspect
GTT memory.

amdgpu_regs_*
-------------

Provides direct access to various register apertures on the GPU.  Used
by tools like UMR to access GPU registers.

amdgpu_regs2
------------

Provides an IOCTL interface used by UMR for interacting with GPU registers.


amdgpu_sensors
--------------

Provides an interface to query GPU power metrics (temperature, average
power, etc.).  Used by tools like UMR to query GPU power metrics.


amdgpu_gca_config
-----------------

Provides an interface to query GPU details (Graphics/Compute Array config,
PCI config, GPU family, etc.).  Used by tools like UMR to query GPU details.

amdgpu_wave
-----------

Used to query GFX/compute wave information from the hardware.  Used by tools
like UMR to query GFX/compute wave information.

amdgpu_gpr
----------

Used to query GFX/compute GPR (General Purpose Register) information from the
hardware.  Used by tools like UMR to query GPRs when debugging shaders.

amdgpu_gprwave
--------------

Provides an IOCTL interface used by UMR for interacting with shader waves.

amdgpu_fw_attestation
---------------------

Provides an interface for reading back firmware attestation records.
+7 −0
@@ -2,6 +2,13 @@
 GPU Debugging
===============

General Debugging Options
=========================

The DebugFS section provides documentation on a number of files to aid in
debugging issues on the GPU.


GPUVM Debugging
===============
