Commit c34f9868 authored by Lucas De Marchi

drm/xe: Update workaround documentation



Bring it up to reality, better documenting the existing batch buffers,
OOB rules and fixing some typos.

Bspec: 60122
Reviewed-by: Stuart Summers <stuart.summers@intel.com>
Link: https://lore.kernel.org/r/20250911-wa-bb-cmds-v4-1-c8f7e48f7eae@intel.com


Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
parent 4e1d3b5e
+32 −13
@@ -39,7 +39,8 @@
 *   Register Immediate commands) once when initializing the device and saved in
 *   the default context. That default context is then used on every context
 *   creation to have a "primed golden context", i.e. a context image that
- *   already contains the changes needed to all the registers.
+ *   already contains the changes needed to all the registers. See
+ *   drivers/gpu/drm/xe/xe_lrc.c for default context handling.
 *
 * - Engine workarounds: the list of these WAs is applied whenever the specific
 *   engine is reset. It's also possible that a set of engine classes share a
@@ -48,10 +49,10 @@
 *   them need to keeep the workaround programming: the approach taken in the
 *   driver is to tie those workarounds to the first compute/render engine that
 *   is registered.  When executing with GuC submission, engine resets are
- *   outside of kernel driver control, hence the list of registers involved in
+ *   outside of kernel driver control, hence the list of registers involved is
 *   written once, on engine initialization, and then passed to GuC, that
 *   saves/restores their values before/after the reset takes place. See
- *   ``drivers/gpu/drm/xe/xe_guc_ads.c`` for reference.
+ *   drivers/gpu/drm/xe/xe_guc_ads.c for reference.
 *
 * - GT workarounds: the list of these WAs is applied whenever these registers
 *   revert to their default values: on GPU reset, suspend/resume [1]_, etc.
@@ -66,21 +67,39 @@
 *   hardware on every HW context restore. These buffers are created and
 *   programmed in the default context so the hardware always go through those
 *   programming sequences when switching contexts. The support for workaround
- *   batchbuffers is enabled these hardware mechanisms:
+ *   batchbuffers is enabled via these hardware mechanisms:
 *
- *   #. INDIRECT_CTX: A batchbuffer and an offset are provided in the default
- *      context, pointing the hardware to jump to that location when that offset
- *      is reached in the context restore. Workaround batchbuffer in the driver
- *      currently uses this mechanism for all platforms.
+ *   #. INDIRECT_CTX (also known as **mid context restore bb**): A batchbuffer
+ *      and an offset are provided in the default context, pointing the hardware
+ *      to jump to that location when that offset is reached in the context
+ *      restore.  When a context is being restored, this is executed after the
+ *      ring context, in the middle (or beginning) of the engine context image.
 *
- *   #. BB_PER_CTX_PTR: A batchbuffer is provided in the default context,
- *      pointing the hardware to a buffer to continue executing after the
- *      engine registers are restored in a context restore sequence. This is
- *      currently not used in the driver.
+ *   #. BB_PER_CTX_PTR (also known as **post context restore bb**): A
+ *      batchbuffer is provided in the default context, pointing the hardware to
+ *      a buffer to continue executing after the engine registers are restored
+ *      in a context restore sequence.
 *
+ *   Below is the timeline for a context restore sequence:
+ *
+ *   .. code::
+ *
+ *                        INDIRECT_CTX_OFFSET
+ *                   |----------->|
+ *      .------------.------------.-------------.------------.--------------.-----------.
+ *      |Ring        | Engine     | Mid-context | Engine     | Post-context | Ring      |
+ *      |Restore     | Restore (1)| BB Restore  | Restore (2)| BB Restore   | Execution |
+ *      `------------'------------'-------------'------------'--------------'-----------'
+ *
 * - Other/OOB:  There are WAs that, due to their nature, cannot be applied from
 *   a central place. Those are peppered around the rest of the code, as needed.
- *   Workarounds related to the display IP are the main example.
+ *   There's a central place to control which workarounds are enabled:
+ *   drivers/gpu/drm/xe/xe_wa_oob.rules for GT workarounds and
+ *   drivers/gpu/drm/xe/xe_device_wa_oob.rules for device/SoC workarounds.
+ *   These files only record which workarounds are enabled: during early device
+ *   initialization those rules are evaluated and recorded by the driver. Then
+ *   later the driver checks with ``XE_GT_WA()`` and ``XE_DEVICE_WA()`` to
+ *   implement them.
 *
 * .. [1] Technically, some registers are powercontext saved & restored, so they
 *    survive a suspend/resume. In practice, writing them again is not too