Commit ba0f4c4c authored by Dave Airlie

Merge tag 'nova-next-v6.17-2025-07-18' of https://gitlab.freedesktop.org/drm/nova into drm-next



Nova changes for v6.17

DMA:

  - Merge topic/dma-features-2025-06-23 from alloc tree.

    - Clarify wording and be consistent in 'coherent' nomenclature.

    - Convert the read!() / write!() macros to return a Result.

    - Add as_slice() / write() methods in CoherentAllocation.

    - Fix doc-comment of dma_handle().

    - Expose count() and size() in CoherentAllocation and add the
      corresponding type invariants.

    - Implement CoherentAllocation::dma_handle_with_offset().

nova-core:

  - Various register!() macro improvements.

  - Custom Sleep / Delay helpers (until the actual abstractions land).

  - Add DMA object abstraction.

  - VBIOS

    - Image parser / iterator.

    - PMU table look up in FWSEC.

    - FWSEC ucode extraction.

  - Register sysmem flush page.

  - Falcon

    - Generic falcon boot code and HAL (Ampere).

    - GSP / SEC2 specific code.

  - FWSEC-FRTS

    - Compute layout of FRTS region (FbLayout and HAL).

    - Load into GSP falcon and execute.

  - Add Documentation for VBIOS layout, Devinit process, Fwsec operation
    and layout, Falcon basics.

  - Update and annotate TODO list.

  - Add Alexandre Courbot as co-maintainer.

Rust:

  - Make ETIMEDOUT error available.

  - Add size constants up to SZ_2G.

Signed-off-by: Dave Airlie <airlied@redhat.com>

From: "Danilo Krummrich" <dakr@kernel.org>
Link: https://lore.kernel.org/r/DBFKLDMUGZD9.Z93GN2N5B0FI@kernel.org
parents acab5fbd 14ae91a8
+61 −0
.. SPDX-License-Identifier: GPL-2.0

==================================
Device Initialization (devinit)
==================================
The devinit process is complex and subject to change. This document gives a high-level,
conceptual overview, using the Ampere GPU family as an example, to aid in understanding
the corresponding kernel code.

Device initialization (devinit) is a sequence of register read/write operations that runs
after a GPU reset. It configures the GPU hardware and must complete before the GPU can
be used.

The devinit engine is an interpreter program that typically runs on the PMU (Power Management
Unit) microcontroller of the GPU. This interpreter executes a "script" of initialization
commands. The devinit engine itself is part of the VBIOS ROM in the same ROM image as the
FWSEC (Firmware Security) image (see fwsec.rst and vbios.rst) and it runs before the
nova-core driver is even loaded. On an Ampere GPU, the devinit ucode is separate from the
FWSEC ucode. It is launched by FWSEC, which runs on the GSP in 'heavy-secure' mode, while
devinit runs on the PMU in 'light-secure' mode.

Key Functions of devinit
------------------------
devinit performs several critical tasks:

1. Programming VRAM memory controller timings
2. Power sequencing
3. Clock and PLL (Phase-Locked Loop) configuration
4. Thermal management

Low-level Firmware Initialization Flow
--------------------------------------
Upon reset, several microcontrollers on the GPU (such as PMU, SEC2, GSP, etc.) run GPU
firmware (gfw) code to set up the GPU and its core parameters. Most of the GPU is
considered unusable until this initialization process completes.

These low-level GPU firmware components are typically:

1. Located in the VBIOS ROM in the same ROM partition (see vbios.rst and fwsec.rst).
2. Executed in sequence on different microcontrollers:

   - The devinit engine typically but not necessarily runs on the PMU.
   - On an Ampere GPU, the FWSEC typically runs on the GSP (GPU System Processor) in
     heavy-secure mode.

Before the driver can proceed with further initialization, it must wait for a signal
indicating that core initialization is complete (known as GFW_BOOT). This signal is
asserted by the FWSEC running on the GSP in heavy-secure mode.

Runtime Considerations
----------------------
The devinit sequence also needs to run during suspend/resume operations at runtime, not
just during initial boot, because it is critical to power management.

Security and Access Control
---------------------------
The initialization process involves careful privilege management. For example, the
driver must check privilege level masks before accessing certain completion status
registers. Some registers are accessible only after secure firmware (FWSEC) lowers the
privilege level to allow CPU (LS/light-secure) access. This is the case, for example,
when receiving the GFW_BOOT signal.
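
The wait for GFW_BOOT described above can be sketched as a bounded polling loop that
checks the privilege mask before trusting the completion status. This is only an
illustration: the ``GfwBootRegs`` trait, ``FakeRegs`` and ``wait_gfw_boot`` are
hypothetical names, not the nova-core API, and the register accesses are simulated.

```rust
use std::cell::Cell;

/// Hypothetical view of the state the driver inspects; not the nova-core API.
pub trait GfwBootRegs {
    /// Whether the privilege level mask currently lets the CPU read the status.
    fn cpu_read_allowed(&self) -> bool;
    /// Whether the status reports completion. Only valid once reads are allowed.
    fn boot_completed(&self) -> bool;
}

#[derive(Debug, PartialEq, Eq)]
pub enum WaitError {
    TimedOut,
}

/// Polls up to `max_polls` times and returns the number of polls that were needed.
pub fn wait_gfw_boot<R: GfwBootRegs>(regs: &R, max_polls: u32) -> Result<u32, WaitError> {
    for poll in 1..=max_polls {
        // Check the privilege mask first; the status is meaningless before that.
        if regs.cpu_read_allowed() && regs.boot_completed() {
            return Ok(poll);
        }
        // A real driver would sleep here between polls.
    }
    Err(WaitError::TimedOut)
}

/// Simulated hardware: reads unlock on poll `unlock_at`, boot completes on poll `done_at`.
pub struct FakeRegs {
    polls: Cell<u32>,
    unlock_at: u32,
    done_at: u32,
}

impl FakeRegs {
    pub fn new(unlock_at: u32, done_at: u32) -> Self {
        Self { polls: Cell::new(0), unlock_at, done_at }
    }
}

impl GfwBootRegs for FakeRegs {
    fn cpu_read_allowed(&self) -> bool {
        // Each call models one poll of the privilege level mask.
        let n = self.polls.get() + 1;
        self.polls.set(n);
        n >= self.unlock_at
    }

    fn boot_completed(&self) -> bool {
        self.polls.get() >= self.done_at
    }
}
```

Ordering the two checks this way means a status value read while the register is still
protected is never treated as a completion signal.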
+158 −0
.. SPDX-License-Identifier: GPL-2.0

==============================
Falcon (FAst Logic Controller)
==============================
The following sections describe the Falcon core and the ucode running on it.
The descriptions are based on Ampere and earlier GPU designs. They should mostly
apply to future designs as well, but everything is subject to change. The
overview is tailored towards understanding how the nova-core driver interacts
with the Falcon.

NVIDIA GPUs embed small RISC-like microcontrollers called Falcon cores, which
handle secure firmware tasks, initialization, and power management. Modern
NVIDIA GPUs may have multiple such Falcon instances (e.g., GSP (the GPU system
processor) and SEC2 (the security engine)) and also may integrate a RISC-V core.
This core is capable of running both RISC-V and Falcon code.

The code running on the Falcon cores is also called 'ucode', and will be
referred to as such in the following sections.

Falcons have separate instruction and data memories (IMEM/DMEM) and provide a
small DMA engine (via the FBIF - "Frame Buffer Interface") to load code from
system memory. The nova-core driver must reset and configure the Falcon, load
its firmware via DMA, and start its CPU.
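
The four driver-side steps named above can be sketched as a fixed sequence that stops
at the first failure. The ``BootStep`` enum, the ``FalconHw`` trait and ``RecordingHw``
are hypothetical names used only for illustration; real hardware access is replaced by
a mock that records the steps it was asked to perform.

```rust
/// The driver-side steps, in the order given in the text.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum BootStep {
    Reset,
    Configure,
    DmaLoad,
    StartCpu,
}

/// Hypothetical hardware abstraction; not the nova-core API.
pub trait FalconHw {
    /// Performs one step, returning the failing step on error.
    fn step(&mut self, step: BootStep) -> Result<(), BootStep>;
}

/// Runs all steps in order and stops at the first one that fails.
pub fn boot_falcon<H: FalconHw>(hw: &mut H) -> Result<(), BootStep> {
    for step in [
        BootStep::Reset,
        BootStep::Configure,
        BootStep::DmaLoad,
        BootStep::StartCpu,
    ] {
        hw.step(step)?;
    }
    Ok(())
}

/// Mock hardware that records completed steps and can fail on a chosen one.
pub struct RecordingHw {
    pub log: Vec<BootStep>,
    pub fail_on: Option<BootStep>,
}

impl FalconHw for RecordingHw {
    fn step(&mut self, step: BootStep) -> Result<(), BootStep> {
        if self.fail_on == Some(step) {
            return Err(step);
        }
        self.log.push(step);
        Ok(())
    }
}
```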

Falcon security levels
======================
Falcons can run in Non-secure (NS), Light Secure (LS), or Heavy Secure (HS)
modes.

Heavy Secured (HS) also known as Privilege Level 3 (PL3)
--------------------------------------------------------
HS ucode is the most trusted code and has access to nearly everything on
the chip. The HS binary includes a signature that is verified at boot.
This signature verification is done by the hardware itself, thus establishing a
root of trust. For example, the FWSEC-FRTS command (see fwsec.rst) runs on the
GSP in HS mode. FRTS, which involves setting up and loading content into the WPR
(Write Protect Region), has to be done by the HS ucode and cannot be done by the
host CPU or LS ucode.

Light Secured (LS or PL2) and Non Secured (NS or PL0)
-----------------------------------------------------
These modes are less secure than HS. Like HS, the LS or NS ucode binary also
typically includes a signature in it. To load firmware in LS or NS mode onto a
Falcon, another Falcon needs to be running in HS mode, which also establishes the
root of trust. For example, in the case of an Ampere GPU, the CPU runs the "Booter"
ucode in HS mode on the SEC2 Falcon, which then authenticates and runs the
run-time GSP binary (GSP-RM) in LS mode on the GSP Falcon. Similarly, as an
example, after reset on an Ampere, FWSEC runs on the GSP which then loads the
devinit engine onto the PMU in LS mode.

Root of trust establishment
---------------------------
To establish a root of trust, the code running on a Falcon must be immutable and
hardwired into a read-only memory (ROM). This follows industry norms for
verification of firmware. This code is called the Boot ROM (BROM). The nova-core
driver on the CPU communicates with Falcon's Boot ROM through various Falcon
registers prefixed with "BROM" (see regs.rs).

After the nova-core driver reads the necessary ucode from VBIOS, it programs the
BROM and DMA registers to trigger the Falcon to load the HS ucode from system
memory into the Falcon's IMEM/DMEM. Once the HS ucode is loaded, it is verified
by the Falcon's Boot ROM.

Once the verified HS code is running on a Falcon, it can verify and load other
LS/NS ucode binaries onto other Falcons and start them. Signature verification
works the same way as for HS ucode, except that the HS ucode, rather than the
hardware (BROM), computes the signature.

The root of trust is therefore established as follows::

     Hardware (Boot ROM running on the Falcon) -> HS ucode -> LS/NS ucode.

On an Ampere GPU, for example, the boot verification flow is::

     Hardware (Boot ROM running on the SEC2) ->
          HS ucode (Booter running on the SEC2) ->
               LS ucode (GSP-RM running on the GSP)
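
The security modes and the chain of trust can be summarized in a small model. It is a
simplification under stated assumptions: the type and function names are hypothetical,
the privilege levels are the PL values given in the headings above, and ``may_load``
encodes only the typical case described in this document (the CPU loads HS ucode, HS
ucode loads LS/NS ucode).

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SecurityMode {
    NonSecure,
    LightSecure,
    HeavySecure,
}

/// Who checks the signature of a ucode binary.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Verifier {
    BootRom,
    HsUcode,
}

/// Who places a ucode binary onto a Falcon.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Loader {
    Cpu,
    HsUcode,
}

impl SecurityMode {
    /// Privilege level as named in the text: NS = PL0, LS = PL2, HS = PL3.
    pub fn privilege_level(self) -> u8 {
        match self {
            SecurityMode::NonSecure => 0,
            SecurityMode::LightSecure => 2,
            SecurityMode::HeavySecure => 3,
        }
    }

    /// HS ucode is verified by the Boot ROM; LS/NS ucode by already-verified HS ucode.
    pub fn verifier(self) -> Verifier {
        match self {
            SecurityMode::HeavySecure => Verifier::BootRom,
            SecurityMode::LightSecure | SecurityMode::NonSecure => Verifier::HsUcode,
        }
    }
}

/// Simplified rule (an assumption of this sketch): the CPU loads only HS ucode,
/// and HS ucode loads only LS/NS ucode.
pub fn may_load(loader: Loader, mode: SecurityMode) -> bool {
    match loader {
        Loader::Cpu => mode == SecurityMode::HeavySecure,
        Loader::HsUcode => mode != SecurityMode::HeavySecure,
    }
}
```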

.. note::
     While the CPU can load HS ucode onto a Falcon microcontroller and have it
     verified by the hardware and run, the CPU itself typically does not load
     LS or NS ucode and run it. Loading of LS or NS ucode is done mainly by the
     HS ucode. For example, on an Ampere GPU, after the Booter ucode runs on the
     SEC2 in HS mode and loads the GSP-RM binary onto the GSP, it needs to run
     the "SEC2-RTOS" ucode at runtime. This presents a problem: there is no
     component to load the SEC2-RTOS ucode onto the SEC2. The CPU cannot load
     LS code, and GSP-RM must run in LS mode. To overcome this, the GSP is
     temporarily made to run HS ucode (which is itself loaded by the CPU via
     the nova-core driver using a "GSP-provided sequencer") which then loads
     the SEC2-RTOS ucode onto the SEC2 in LS mode. The GSP then resumes
     running its own GSP-RM LS ucode.

Falcon memory subsystem and DMA engine
======================================
Falcons have separate instruction and data memories (IMEM/DMEM) and contain a
small DMA engine called FBDMA (Framebuffer DMA), which performs DMA transfers
between the IMEM/DMEM inside the Falcon and external memory via the FBIF
(Framebuffer Interface).

DMA transfers are possible from the Falcon's memory to both the system memory
and the framebuffer memory (VRAM).

To perform a DMA via the FBDMA, the FBIF is configured to decide how the memory
is accessed (also known as the aperture type). In the nova-core driver, this is
determined by the ``FalconFbifTarget`` enum.

The IO-PMP (Input/Output Physical Memory Protection) unit in the Falcon
controls access by the FBDMA to the external memory.

A conceptual diagram (not exact) of the Falcon and its memory subsystem follows::

               External Memory (Framebuffer / System DRAM)
                              ^  |
                              |  |
                              |  v
     +-----------------------------------------------------+
     |                           |                         |
     |   +---------------+       |                         |
     |   |     FBIF      |-------+                         |  FALCON
     |   | (FrameBuffer  |   Memory Interface              |  PROCESSOR
     |   |  InterFace)   |                                 |
     |   |  Apertures    |                                 |
     |   |  Configures   |                                 |
     |   |  mem access   |                                 |
     |   +-------^-------+                                 |
     |           |                                         |
     |           | FBDMA uses configured FBIF apertures    |
     |           | to access External Memory
     |           |
     |   +-------v--------+      +---------------+
     |   |    FBDMA       |  cfg |     RISC      |
     |   | (FrameBuffer   |<---->|     CORE      |----->. Direct Core Access
     |   |  DMA Engine)   |      |               |      |
     |   | - Master dev.  |      | (can run both |      |
     |   +-------^--------+      | Falcon and    |      |
     |           |        cfg--->| RISC-V code)  |      |
     |           |        /      |               |      |
     |           |        |      +---------------+      |    +------------+
     |           |        |                             |    |   BROM     |
     |           |        |                             <--->| (Boot ROM) |
     |           |       /                              |    +------------+
     |           |      v                               |
     |   +---------------+                              |
     |   |    IO-PMP     | Controls access by FBDMA     |
     |   | (IO Physical  | and other IO Masters         |
     |   | Memory Protect)                              |
     |   +-------^-------+                              |
     |           |                                      |
     |           | Protected Access Path for FBDMA      |
     |           v                                      |
     |   +---------------------------------------+      |
     |   |       Memory                          |      |
     |   |   +---------------+  +------------+   |      |
     |   |   |    IMEM       |  |    DMEM    |   |<-----+
     |   |   | (Instruction  |  |   (Data    |   |
     |   |   |  Memory)      |  |   Memory)  |   |
     |   |   +---------------+  +------------+   |
     |   +---------------------------------------+
     +-----------------------------------------------------+
+181 −0
.. SPDX-License-Identifier: (GPL-2.0+ OR MIT)

=========================
FWSEC (Firmware Security)
=========================
This document briefly and conceptually describes the FWSEC (Firmware Security) image
and its role in the GPU boot sequence. The information is current only as of the Ampere
GPU family and is subject to change. The concepts described should nevertheless be useful
for understanding the kernel code that deals with FWSEC. All the information is derived
from publicly available sources such as public drivers and documentation.

The role of FWSEC is to provide a secure boot process. It runs in
'Heavy-secure' mode, and performs firmware verification after a GPU reset
before loading various ucode images onto other microcontrollers on the GPU,
such as the PMU and GSP.

FWSEC itself is an application stored in the VBIOS ROM in the FWSEC partition of
ROM (see vbios.rst for more details). It contains different commands like FRTS
(Firmware Runtime Services) and SB (Secure Booting other microcontrollers after
reset and loading them with other non-FWSEC ucode). The kernel driver only needs
to perform FRTS, since Secure Boot (SB) has already completed by the time the driver
is loaded.

The FRTS command carves out the WPR2 region (Write Protected Region), which contains
data required for power management. Once set up, only HS mode ucode can access it
(see falcon.rst for privilege levels).

The FWSEC image is located in the VBIOS ROM in the partition of the ROM that contains
various ucode images (also known as applications) -- one of them being FWSEC. For how
it is extracted, see vbios.rst and the vbios.rs source code.

The Falcon data for each ucode image (including the FWSEC image) is a combination
of headers, data sections (DMEM) and instruction code sections (IMEM). All these
ucode images are stored in the same ROM partition, and the PMU table is used to look
up the application to load based on its application ID (see vbios.rs).
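
A lookup by application ID can be sketched as a scan over fixed-size entries. The
sketch below is not the vbios.rs implementation. It assumes that the first four header
bytes are version, header_size, entry_size and entry_count (the order used in the
layout diagram in this document), that each entry starts with a one-byte app_id and a
one-byte target_id followed by a 32-bit data offset, and that multi-byte fields are
little-endian. ``PmuEntry`` and ``find_pmu_entry`` are hypothetical names.

```rust
/// Application ID of FWSEC, as shown in the layout diagram.
pub const APP_ID_FWSEC: u8 = 0x85;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct PmuEntry {
    pub app_id: u8,
    pub target_id: u8,
    /// Offset of the application's Falcon data.
    pub data: u32,
}

/// Scans the table for `app_id`. Returns `None` if the table is truncated,
/// malformed, or has no matching entry.
pub fn find_pmu_entry(table: &[u8], app_id: u8) -> Option<PmuEntry> {
    // Assumed header layout: version, header_size, entry_size, entry_count.
    let header_size = usize::from(*table.get(1)?);
    let entry_size = usize::from(*table.get(2)?);
    let entry_count = usize::from(*table.get(3)?);
    if entry_size < 6 {
        return None;
    }
    for i in 0..entry_count {
        let start = header_size + i * entry_size;
        let raw = table.get(start..start + 6)?;
        let entry = PmuEntry {
            app_id: raw[0],
            target_id: raw[1],
            // Little-endian is an assumption of this sketch.
            data: u32::from_le_bytes([raw[2], raw[3], raw[4], raw[5]]),
        };
        if entry.app_id == app_id {
            return Some(entry);
        }
    }
    None
}
```

Reading the sizes from the header, rather than hard-coding them, lets the scan skip
over header or entry fields it does not know about.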

For the nova-core driver, the FWSEC contains an 'application interface' called
DMEMMAPPER. This interface is used to execute the 'FWSEC-FRTS' command, among others.
On Ampere, FWSEC runs on the GSP in heavy-secure mode and executes FRTS.
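
Locating the DMEMMAPPER interface can be sketched as a walk over the application
interface table, followed by a signature check at the offset it yields. The sketch
takes the field sizes from the layout diagram in this document (4-byte header, 8-byte
entries, ID 4, signature "DMAP") and makes further assumptions: each entry is a 32-bit
id followed by a 32-bit DMEM offset, values are little-endian, offsets are relative to
the start of DMEM, and the signature bytes appear in the order D, M, A, P.
``find_dmemmapper`` is a hypothetical helper, not a nova-core function.

```rust
/// Interface ID of the DMEM mapper, as shown in the layout diagram.
pub const APPIF_ID_DMEMMAPPER: u32 = 0x04;

/// Reads a little-endian u32 at `off`, or `None` if out of bounds.
fn read_u32_le(buf: &[u8], off: usize) -> Option<u32> {
    let b = buf.get(off..off.checked_add(4)?)?;
    Some(u32::from_le_bytes([b[0], b[1], b[2], b[3]]))
}

/// Returns the DMEM offset of the DMEMMAPPER structure, if present and
/// carrying the expected signature.
pub fn find_dmemmapper(dmem: &[u8], interface_offset: usize) -> Option<usize> {
    // Assumed header layout: version, header_size, entry_size, entry_count.
    let hdr = dmem.get(interface_offset..interface_offset.checked_add(4)?)?;
    let header_size = usize::from(hdr[1]);
    let entry_size = usize::from(hdr[2]);
    let entry_count = usize::from(hdr[3]);
    if entry_size < 8 {
        return None;
    }
    for i in 0..entry_count {
        let entry = interface_offset + header_size + i * entry_size;
        if read_u32_le(dmem, entry)? != APPIF_ID_DMEMMAPPER {
            continue;
        }
        let mapper = read_u32_le(dmem, entry + 4)? as usize;
        let sig = dmem.get(mapper..mapper.checked_add(4)?)?;
        // Byte order of the signature is an assumption of this sketch.
        return if sig == b"DMAP" { Some(mapper) } else { None };
    }
    None
}
```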

FWSEC Memory Layout
-------------------
The memory layout of the FWSEC image is as follows::

   +---------------------------------------------------------------+
   |                         FWSEC ROM image (type 0xE0)           |
   |                                                               |
   |  +---------------------------------+                          |
   |  |     PMU Falcon Ucode Table      |                          |
   |  |     (PmuLookupTable)            |                          |
   |  |  +-------------------------+    |                          |
   |  |  | Table Header            |    |                          |
   |  |  | - version: 0x01         |    |                          |
   |  |  | - header_size: 6        |    |                          |
   |  |  | - entry_size: 6         |    |                          |
   |  |  | - entry_count: N        |    |                          |
   |  |  | - desc_version:3(unused)|    |                          |
   |  |  +-------------------------+    |                          |
   |  |         ...                     |                          |
   |  |  +-------------------------+    |                          |
   |  |  | Entry for FWSEC (0x85)  |    |                          |
   |  |  | (PmuLookupTableEntry)   |    |                          |
   |  |  | - app_id: 0x85 (FWSEC)  |----|----+                     |
   |  |  | - target_id: 0x01 (PMU) |    |    |                     |
   |  |  | - data: offset ---------|----|----|---+ look up FWSEC   |
   |  |  +-------------------------+    |    |   |                 |
   |  +---------------------------------+    |   |                 |
   |                                         |   |                 |
   |                                         |   |                 |
   |  +---------------------------------+    |   |                 |
   |  |     FWSEC Ucode Component       |<---+   |                 |
   |  |     (aka Falcon data)           |        |                 |
   |  |  +-------------------------+    |        |                 |
   |  |  | FalconUCodeDescV3       |<---|--------+                 |
   |  |  | - hdr                   |    |                          |
   |  |  | - stored_size           |    |                          |
   |  |  | - pkc_data_offset       |    |                          |
   |  |  | - interface_offset -----|----|----------------+         |
   |  |  | - imem_phys_base        |    |                |         |
   |  |  | - imem_load_size        |    |                |         |
   |  |  | - imem_virt_base        |    |                |         |
   |  |  | - dmem_phys_base        |    |                |         |
   |  |  | - dmem_load_size        |    |                |         |
   |  |  | - engine_id_mask        |    |                |         |
   |  |  | - ucode_id              |    |                |         |
   |  |  | - signature_count       |    |    look up sig |         |
   |  |  | - signature_versions --------------+          |         |
   |  |  +-------------------------+    |     |          |         |
   |  |         (no gap)                |     |          |         |
   |  |  +-------------------------+    |     |          |         |
   |  |  | Signatures Section      |<---|-----+          |         |
   |  |  | (384 bytes per sig)     |    |                |         |
   |  |  | - RSA-3K Signature 1    |    |                |         |
   |  |  | - RSA-3K Signature 2    |    |                |         |
   |  |  |   ...                   |    |                |         |
   |  |  +-------------------------+    |                |         |
   |  |                                 |                |         |
   |  |  +-------------------------+    |                |         |
   |  |  | IMEM Section (Code)     |    |                |         |
   |  |  |                         |    |                |         |
   |  |  | Contains instruction    |    |                |         |
   |  |  | code etc.               |    |                |         |
   |  |  +-------------------------+    |                |         |
   |  |                                 |                |         |
   |  |  +-------------------------+    |                |         |
   |  |  | DMEM Section (Data)     |    |                |         |
   |  |  |                         |    |                |         |
   |  |  | +---------------------+ |    |                |         |
   |  |  | | Application         | |<---|----------------+         |
   |  |  | | Interface Table     | |    |                          |
   |  |  | | (FalconAppifHdrV1)  | |    |                          |
   |  |  | | Header:             | |    |                          |
   |  |  | | - version: 0x01     | |    |                          |
   |  |  | | - header_size: 4    | |    |                          |
   |  |  | | - entry_size: 8     | |    |                          |
   |  |  | | - entry_count: N    | |    |                          |
   |  |  | |                     | |    |                          |
   |  |  | | Entries:            | |    |                          |
   |  |  | | +-----------------+ | |    |                          |
   |  |  | | | DEVINIT (ID 1)  | | |    |                          |
   |  |  | | | - id: 0x01      | | |    |                          |
   |  |  | | | - dmemOffset X -|-|-|----+                          |
   |  |  | | +-----------------+ | |    |                          |
   |  |  | | +-----------------+ | |    |                          |
   |  |  | | | DMEMMAPPER(ID 4)| | |    |                          |
   |  |  | | | - id: 0x04      | | |    | Used only for DevInit    |
   |  |  | | |  (NVFW_FALCON_  | | |    | application (not FWSEC)  |
   |  |  | | |   APPIF_ID_DMEMMAPPER)   |                          |
   |  |  | | | - dmemOffset Y -|-|-|----|-----+                    |
   |  |  | | +-----------------+ | |    |     |                    |
   |  |  | +---------------------+ |    |     |                    |
   |  |  |                         |    |     |                    |
   |  |  | +---------------------+ |    |     |                    |
   |  |  | | DEVINIT Engine      |<|----+     | Used by FWSEC      |
   |  |  | | Interface           | |    |     |         app.       |
   |  |  | +---------------------+ |    |     |                    |
   |  |  |                         |    |     |                    |
   |  |  | +---------------------+ |    |     |                    |
   |  |  | | DMEM Mapper (ID 4)  |<|----+-----+                    |
   |  |  | | (FalconAppifDmemmapperV3)  |                          |
   |  |  | | - signature: "DMAP" | |    |                          |
   |  |  | | - version: 0x0003   | |    |                          |
   |  |  | | - Size: 64 bytes    | |    |                          |
   |  |  | | - cmd_in_buffer_off | |----|------------+             |
   |  |  | | - cmd_in_buffer_size| |    |            |             |
   |  |  | | - cmd_out_buffer_off| |----|------------|-----+       |
   |  |  | | - cmd_out_buffer_sz | |    |            |     |       |
   |  |  | | - init_cmd          | |    |            |     |       |
   |  |  | | - features          | |    |            |     |       |
   |  |  | | - cmd_mask0/1       | |    |            |     |       |
   |  |  | +---------------------+ |    |            |     |       |
   |  |  |                         |    |            |     |       |
   |  |  | +---------------------+ |    |            |     |       |
   |  |  | | Command Input Buffer|<|----|------------+     |       |
   |  |  | | - Command data      | |    |                  |       |
   |  |  | | - Arguments         | |    |                  |       |
   |  |  | +---------------------+ |    |                  |       |
   |  |  |                         |    |                  |       |
   |  |  | +---------------------+ |    |                  |       |
   |  |  | | Command Output      |<|----|------------------+       |
   |  |  | | Buffer              | |    |                          |
   |  |  | | - Results           | |    |                          |
   |  |  | | - Status            | |    |                          |
   |  |  | +---------------------+ |    |                          |
   |  |  +-------------------------+    |                          |
   |  +---------------------------------+                          |
   |                                                               |
   +---------------------------------------------------------------+
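
The diagram above suggests how section offsets within the FWSEC ucode component could
be derived from the descriptor fields. The following is a minimal sketch, assuming that
the signatures start right after the descriptor (the "no gap" in the diagram), that the
IMEM section directly follows the signatures and that the DMEM section directly follows
the IMEM section. Those last two points are assumptions, as are the names
``UcodeLayout`` and ``ucode_layout``. All offsets are relative to the start of the
descriptor.

```rust
use std::ops::Range;

/// Size of one RSA-3K signature in bytes, as stated in the diagram.
pub const SIG_SIZE: usize = 384;

#[derive(Debug, PartialEq, Eq)]
pub struct UcodeLayout {
    pub signatures: Range<usize>,
    pub imem: Range<usize>,
    pub dmem: Range<usize>,
}

/// Computes byte ranges for each section, or `None` on arithmetic overflow.
pub fn ucode_layout(
    desc_size: usize,
    signature_count: usize,
    imem_load_size: usize,
    dmem_load_size: usize,
) -> Option<UcodeLayout> {
    // Signatures follow the descriptor without a gap.
    let sig_start = desc_size;
    let sig_end = sig_start.checked_add(signature_count.checked_mul(SIG_SIZE)?)?;
    // Assumed: IMEM follows the signatures, DMEM follows IMEM.
    let imem_end = sig_end.checked_add(imem_load_size)?;
    let dmem_end = imem_end.checked_add(dmem_load_size)?;
    Some(UcodeLayout {
        signatures: sig_start..sig_end,
        imem: sig_end..imem_end,
        dmem: imem_end..dmem_end,
    })
}
```

The checked arithmetic matters because every size comes from ROM contents and should
not be trusted to be well-formed.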

.. note::
   This uses a GA-102 Ampere GPU as an example and could vary for future GPUs.

.. note::
   The FWSEC image also plays a role in memory scrubbing (ECC initialization) and VPR
   (Video Protected Region) initialization. Before the nova-core driver is even
   loaded, the FWSEC image is running on the GSP in heavy-secure mode. After the devinit
   sequence completes, it does VRAM memory scrubbing (ECC initialization). On consumer
   GPUs, it scrubs only part of memory and then initiates 'async scrubbing'. Before this
   async scrubbing completes, the unscrubbed VRAM cannot be used for allocation (thus DRM
   memory allocators need to wait for this scrubbing to complete).
+57 −50

File changed.


+181 −0

File added.

