Commit 7b91683e authored by Dave Airlie

Merge tag 'drm-misc-next-2025-02-20' of https://gitlab.freedesktop.org/drm/misc/kernel into drm-next

drm-misc-next for v6.15:

UAPI Changes:

device-wedged events:
- Let drivers notify userspace of hung-up devices via uevent

Cross-subsystem Changes:

media:
- cec: tda998x: Import driver from DRM

Core Changes:

- Cleanups

atomic-helper:
- async-flip: Support on arbitrary planes
- writeback: Fix use-after-free error
- Document atomic-state history
- Plenty of cleanups to callback parameter names

doc:
- Test for kernel-doc errors

format-helper:
- Support ARGB8888-to-ARGB4444 pixel-format conversion
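
The ARGB8888-to-ARGB4444 conversion keeps four bits per channel, so each pixel is reduced from 32 to 16 bits. A minimal C sketch, assuming plain truncation (the low nibble of every channel is dropped). The function names are hypothetical, and the in-kernel format helper may differ in details such as endianness handling or line/pitch iteration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Convert one ARGB8888 pixel (A in bits 31..24, R 23..16, G 15..8, B 7..0)
 * to ARGB4444 (A in bits 15..12, R 11..8, G 7..4, B 3..0) by keeping the
 * upper nibble of each channel. Plain truncation is an assumption here.
 */
static uint16_t argb8888_to_argb4444(uint32_t pix)
{
	return (uint16_t)(((pix & 0xf0000000u) >> 16) | /* A: 31..28 -> 15..12 */
			  ((pix & 0x00f00000u) >> 12) | /* R: 23..20 -> 11..8 */
			  ((pix & 0x0000f000u) >> 8) |  /* G: 15..12 -> 7..4 */
			  ((pix & 0x000000f0u) >> 4));  /* B: 7..4 -> 3..0 */
}

/* Convert a line of n pixels from src into dst (hypothetical helper). */
static void argb8888_to_argb4444_line(uint16_t *dst, const uint32_t *src,
				      size_t n)
{
	assert(n == 0 || (dst != NULL && src != NULL));
	for (size_t i = 0; i < n; i++)
		dst[i] = argb8888_to_argb4444(src[i]);
}
```

For example, 0x80FF8000 (half-transparent orange) becomes 0x8F80 under this truncation scheme.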

panel-orientation-quirks:
- Add quirks for AYANEO 2S, AYA NEO Flip DS and KB, AYA NEO Slide, GPD Win 2,
  OneXPlayer Mini (Intel)

sched:
- Add parameter struct for init

Driver Changes:

amdgpu:
- Support device-wedged event
- Support async pageflips on overlay planes

amdxdna:
- Refactoring

ast:
- Refactor cursor handling

bridge:
- Pass full atomic state to various callbacks
- analogix-dp: Cleanups
- cdns-mhdp8546: Fix clock enable/disable
- nwl-dsi: Set bridge type
- panel: Cleanups
- ti-sn65dsi83: Add error recovery; Set bridge type

i2c:
- tda998x: Drop unused platform_data; Split driver into separate media and bridge drivers
- Remove the obsolete directory

i915:
- Support device-wedged event

nouveau:
- Fixes

panel:
- visionox-r66451: Use multi-style MIPI-DSI functions

v3d:
- Handle clock

vkms:
- Fix use-after-free error

xe:
- Support device-wedged event

xlnx:
- Use mutex guards
- Cleanups

Signed-off-by: Dave Airlie <airlied@redhat.com>

From: Thomas Zimmermann <tzimmermann@suse.de>
Link: https://patchwork.freedesktop.org/patch/msgid/20250220085321.GA184551@linux.fritz.box
parents 0ed1356a e82e1a0c
+3 −0
@@ -35,6 +35,9 @@ properties:
  vcc-supply:
    description: A 1.8V power supply (see regulator/regulator.yaml).

  interrupts:
    maxItems: 1

  ports:
    $ref: /schemas/graph.yaml#/properties/ports

+113 −3
@@ -371,9 +371,119 @@ Reporting causes of resets

Apart from propagating the reset through the stack so apps can recover, it's
really useful for driver developers to learn more about what caused the reset in
the first place. DRM devices should make use of devcoredump to store relevant
information about the reset, so this information can be added to user bug
reports.
the first place. For this, drivers can make use of devcoredump to store relevant
information about the reset and send device wedged event with ``none`` recovery
method (as explained in "Device Wedging" chapter) to notify userspace, so this
information can be collected and added to user bug reports.

Device Wedging
==============

Drivers can optionally make use of device wedged event (implemented as
drm_dev_wedged_event() in DRM subsystem), which notifies userspace of 'wedged'
(hanged/unusable) state of the DRM device through a uevent. This is useful
especially in cases where the device is no longer operating as expected and has
become unrecoverable from driver context. Purpose of this implementation is to
provide drivers a generic way to recover the device with the help of userspace
intervention, without taking any drastic measures (like resetting or
re-enumerating the full bus, on which the underlying physical device is sitting)
in the driver.

A 'wedged' device is basically a device that is declared dead by the driver
after exhausting all possible attempts to recover it from driver context. The
uevent is the notification that is sent to userspace along with a hint about
what could possibly be attempted to recover the device from userspace and bring
it back to usable state. Different drivers may have different ideas of a
'wedged' device depending on hardware implementation of the underlying physical
device, and hence the vendor agnostic nature of the event. It is up to the
drivers to decide when they see the need for device recovery and how they want
to recover from the available methods.

Driver prerequisites
--------------------

The driver, before opting for recovery, needs to make sure that the 'wedged'
device doesn't harm the system as a whole by taking care of the prerequisites.
Necessary actions must include disabling DMA to system memory as well as any
communication channels with other devices. Further, the driver must ensure
that all dma_fences are signalled and any device state that the core kernel
might depend on is cleaned up. All existing mmaps should be invalidated and
page faults should be redirected to a dummy page. Once the event is sent, the
device must be kept in 'wedged' state until the recovery is performed. New
accesses to the device (IOCTLs) should be rejected, preferably with an error
code that resembles the type of failure the device has encountered. This will
signify the reason for wedging, which can be reported to the application if
needed.

Recovery
--------

Current implementation defines three recovery methods, out of which, drivers
can use any one, multiple or none. Method(s) of choice will be sent in the
uevent environment as ``WEDGED=<method1>[,..,<methodN>]`` in order of less to
more side-effects. If driver is unsure about recovery or method is unknown
(like soft/hard system reboot, firmware flashing, physical device replacement
or any other procedure which can't be attempted on the fly), ``WEDGED=unknown``
will be sent instead.

Userspace consumers can parse this event and attempt recovery as per the
following expectations.

    =============== ========================================
    Recovery method Consumer expectations
    =============== ========================================
    none            optional telemetry collection
    rebind          unbind + bind driver
    bus-reset       unbind + bus reset/re-enumeration + bind
    unknown         consumer policy
    =============== ========================================

The only exception to this is ``WEDGED=none``, which signifies that the device
was temporarily 'wedged' at some point but was recovered from driver context
using device specific methods like reset. No explicit recovery is expected from
the consumer in this case, but it can still take additional steps like gathering
telemetry information (devcoredump, syslog). This is useful because the first
hang is usually the most critical one which can result in consequential hangs or
complete wedging.

Consumer prerequisites
----------------------

It is the responsibility of the consumer to make sure that the device or its
resources are not in use by any process before attempting recovery. With IOCTLs
erroring out, all device memory should be unmapped and file descriptors should
be closed to prevent leaks or undefined behaviour. The idea here is to clear the
device of all user context beforehand and set the stage for a clean recovery.

Example
-------

Udev rule::

    SUBSYSTEM=="drm", ENV{WEDGED}=="rebind", DEVPATH=="*/drm/card[0-9]",
    RUN+="/path/to/rebind.sh $env{DEVPATH}"

Recovery script::

    #!/bin/sh

    DEVPATH=$(readlink -f /sys/$1/device)
    DEVICE=$(basename $DEVPATH)
    DRIVER=$(readlink -f $DEVPATH/driver)

    echo -n $DEVICE > $DRIVER/unbind
    echo -n $DEVICE > $DRIVER/bind

Customization
-------------

Although basic recovery is possible with a simple script, consumers can define
custom policies around recovery. For example, if the driver supports multiple
recovery methods, consumers can opt for the suitable one depending on scenarios
like repeat offences or vendor specific failures. Consumers can also choose to
have the device available for debugging or telemetry collection and base their
recovery decision on the findings. This is useful especially when the driver is
unsure about recovery or method is unknown.

.. _drm_driver_ioctl:
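
The Device Wedging documentation above defines the uevent value as a comma-separated list ``WEDGED=<method1>[,..,<methodN>]`` ordered from fewer to more side effects. A userspace consumer can therefore pick the first listed method it knows how to perform. A minimal C sketch, assuming the consumer already has the value string (for example from the uevent environment); the enum, the function names and the "supported methods" bitmask are hypothetical and not part of any kernel or udev API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Recovery methods from the table above (hypothetical names). */
enum wedged_method {
	WEDGED_INVALID = -1,
	WEDGED_NONE,
	WEDGED_REBIND,
	WEDGED_BUS_RESET,
	WEDGED_UNKNOWN,
};

/* Map one token (not NUL-terminated, len bytes) to a method. */
static enum wedged_method wedged_method_from_token(const char *tok, size_t len)
{
	static const struct {
		const char *name;
		enum wedged_method m;
	} map[] = {
		{ "none", WEDGED_NONE },
		{ "rebind", WEDGED_REBIND },
		{ "bus-reset", WEDGED_BUS_RESET },
		{ "unknown", WEDGED_UNKNOWN },
	};

	for (size_t i = 0; i < sizeof(map) / sizeof(map[0]); i++)
		if (strlen(map[i].name) == len &&
		    strncmp(map[i].name, tok, len) == 0)
			return map[i].m;
	return WEDGED_INVALID;
}

/*
 * Return the first method in the list whose bit is set in supported_mask.
 * Because methods are listed in order of increasing side effects, this is
 * the least disruptive method the consumer can perform. Unrecognized
 * tokens are skipped; WEDGED_INVALID means nothing usable was found.
 */
static enum wedged_method wedged_pick_method(const char *value,
					     unsigned int supported_mask)
{
	const char *p = value;

	assert(value != NULL);
	while (*p) {
		size_t len = strcspn(p, ",");
		enum wedged_method m = wedged_method_from_token(p, len);

		if (m != WEDGED_INVALID && (supported_mask & (1u << m)))
			return m;
		p += len;
		if (*p == ',')
			p++;
	}
	return WEDGED_INVALID;
}
```

A consumer that can only do bus resets would pass ``1u << WEDGED_BUS_RESET`` and skip ``rebind`` even when the driver lists it first; how to handle ``unknown`` remains consumer policy.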

+1 −0
@@ -97,3 +97,4 @@ obj-$(CONFIG_SAMPLES) += samples/
obj-$(CONFIG_NET)	+= net/
obj-y			+= virt/
obj-y			+= $(ARCH_DRIVERS)
obj-$(CONFIG_DRM_HEADER_TEST)	+= include/
+3 −2
@@ -8007,6 +8007,8 @@ F: include/drm/drm_privacy_screen*
DRM TTM SUBSYSTEM
M:	Christian Koenig <christian.koenig@amd.com>
M:	Huang Rui <ray.huang@amd.com>
R:	Matthew Auld <matthew.auld@intel.com>
R:	Matthew Brost <matthew.brost@intel.com>
L:	dri-devel@lists.freedesktop.org
S:	Maintained
T:	git https://gitlab.freedesktop.org/drm/misc/kernel.git
@@ -17120,8 +17122,7 @@ M: Russell King <linux@armlinux.org.uk>
S:	Maintained
T:	git git://git.armlinux.org.uk/~rmk/linux-arm.git drm-tda998x-devel
T:	git git://git.armlinux.org.uk/~rmk/linux-arm.git drm-tda998x-fixes
F:	drivers/gpu/drm/i2c/tda998x_drv.c
F:	include/drm/i2c/tda998x.h
F:	drivers/gpu/drm/bridge/tda998x_drv.c
F:	include/dt-bindings/display/tda998x.h
K:	"nxp,tda998x"
+25 −16
@@ -34,6 +34,8 @@ static void aie2_job_release(struct kref *ref)

	job = container_of(ref, struct amdxdna_sched_job, refcnt);
	amdxdna_sched_job_cleanup(job);
	atomic64_inc(&job->hwctx->job_free_cnt);
	wake_up(&job->hwctx->priv->job_free_wq);
	if (job->out_fence)
		dma_fence_put(job->out_fence);
	kfree(job);
@@ -134,7 +136,8 @@ static void aie2_hwctx_wait_for_idle(struct amdxdna_hwctx *hwctx)
	if (!fence)
		return;

	dma_fence_wait(fence, false);
	/* Wait up to 2 seconds for fw to finish all pending requests */
	dma_fence_wait_timeout(fence, false, msecs_to_jiffies(2000));
	dma_fence_put(fence);
}

@@ -516,6 +519,14 @@ int aie2_hwctx_init(struct amdxdna_hwctx *hwctx)
{
	struct amdxdna_client *client = hwctx->client;
	struct amdxdna_dev *xdna = client->xdna;
	const struct drm_sched_init_args args = {
		.ops = &sched_ops,
		.num_rqs = DRM_SCHED_PRIORITY_COUNT,
		.credit_limit = HWCTX_MAX_CMDS,
		.timeout = msecs_to_jiffies(HWCTX_MAX_TIMEOUT),
		.name = hwctx->name,
		.dev = xdna->ddev.dev,
	};
	struct drm_gpu_scheduler *sched;
	struct amdxdna_hwctx_priv *priv;
	struct amdxdna_gem_obj *heap;
@@ -573,9 +584,7 @@ int aie2_hwctx_init(struct amdxdna_hwctx *hwctx)
	might_lock(&priv->io_lock);
	fs_reclaim_release(GFP_KERNEL);

	ret = drm_sched_init(sched, &sched_ops, NULL, DRM_SCHED_PRIORITY_COUNT,
			     HWCTX_MAX_CMDS, 0, msecs_to_jiffies(HWCTX_MAX_TIMEOUT),
			     NULL, NULL, hwctx->name, xdna->ddev.dev);
	ret = drm_sched_init(sched, &args);
	if (ret) {
		XDNA_ERR(xdna, "Failed to init DRM scheduler. ret %d", ret);
		goto free_cmd_bufs;
@@ -616,6 +625,7 @@ int aie2_hwctx_init(struct amdxdna_hwctx *hwctx)
	hwctx->status = HWCTX_STAT_INIT;
	ndev = xdna->dev_handle;
	ndev->hwctx_num++;
	init_waitqueue_head(&priv->job_free_wq);

	XDNA_DBG(xdna, "hwctx %s init completed", hwctx->name);

@@ -652,25 +662,23 @@ void aie2_hwctx_fini(struct amdxdna_hwctx *hwctx)
	xdna = hwctx->client->xdna;
	ndev = xdna->dev_handle;
	ndev->hwctx_num--;
	drm_sched_wqueue_stop(&hwctx->priv->sched);

	/* Now, scheduler will not send command to device. */
	XDNA_DBG(xdna, "%s sequence number %lld", hwctx->name, hwctx->priv->seq);
	drm_sched_entity_destroy(&hwctx->priv->entity);

	aie2_hwctx_wait_for_idle(hwctx);

	/* Request fw to destroy hwctx and cancel the rest pending requests */
	aie2_release_resource(hwctx);

	/*
	 * All submitted commands are aborted.
	 * Restart scheduler queues to cleanup jobs. The amdxdna_sched_job_run()
	 * will return NODEV if it is called.
	 */
	drm_sched_wqueue_start(&hwctx->priv->sched);
	/* Wait for all submitted jobs to be completed or canceled */
	wait_event(hwctx->priv->job_free_wq,
		   atomic64_read(&hwctx->job_submit_cnt) ==
		   atomic64_read(&hwctx->job_free_cnt));

	aie2_hwctx_wait_for_idle(hwctx);
	drm_sched_entity_destroy(&hwctx->priv->entity);
	drm_sched_fini(&hwctx->priv->sched);
	aie2_ctx_syncobj_destroy(hwctx);

	XDNA_DBG(xdna, "%s sequence number %lld", hwctx->name, hwctx->priv->seq);

	for (idx = 0; idx < ARRAY_SIZE(hwctx->priv->cmd_buf); idx++)
		drm_gem_object_put(to_gobj(hwctx->priv->cmd_buf[idx]));
	amdxdna_gem_unpin(hwctx->priv->heap);
@@ -879,6 +887,7 @@ int aie2_cmd_submit(struct amdxdna_hwctx *hwctx, struct amdxdna_sched_job *job,
	drm_gem_unlock_reservations(job->bos, job->bo_cnt, &acquire_ctx);

	aie2_job_put(job);
	atomic64_inc(&hwctx->job_submit_cnt);

	return 0;

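The amdxdna hunks above add a drain step to hardware-context teardown: aie2_cmd_submit() increments job_submit_cnt, aie2_job_release() increments job_free_cnt and wakes job_free_wq, and aie2_hwctx_fini() uses wait_event() until both counters match. The same pattern can be sketched in userspace C, assuming a pthread mutex and condition variable stand in for atomic64_t plus wait_event()/wake_up(); the mutex is needed because a condition variable, unlike wait_event(), does not guard against lost wakeups on its own. All names below are hypothetical; this illustrates the pattern and is not the driver code.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/*
 * Hypothetical userspace analogue of the job_submit_cnt / job_free_cnt
 * pair and the job_free_wq wait queue.
 */
struct job_tracker {
	pthread_mutex_t lock;   /* protects both counters */
	pthread_cond_t free_wq; /* signalled whenever a job is freed */
	long long submit_cnt;
	long long free_cnt;
};

static void tracker_init(struct job_tracker *t)
{
	int rc = pthread_mutex_init(&t->lock, NULL);

	assert(rc == 0);
	rc = pthread_cond_init(&t->free_wq, NULL);
	assert(rc == 0);
	(void)rc;
	t->submit_cnt = 0;
	t->free_cnt = 0;
}

/* Submission path: count the job once it has been queued. */
static void tracker_job_submitted(struct job_tracker *t)
{
	pthread_mutex_lock(&t->lock);
	t->submit_cnt++;
	pthread_mutex_unlock(&t->lock);
}

/* Release path: count the freed job and wake any waiter. */
static void tracker_job_freed(struct job_tracker *t)
{
	pthread_mutex_lock(&t->lock);
	t->free_cnt++;
	pthread_cond_broadcast(&t->free_wq);
	pthread_mutex_unlock(&t->lock);
}

/* Teardown path: block until every submitted job has been freed. */
static void tracker_wait_idle(struct job_tracker *t)
{
	pthread_mutex_lock(&t->lock);
	while (t->submit_cnt != t->free_cnt)
		pthread_cond_wait(&t->free_wq, &t->lock);
	pthread_mutex_unlock(&t->lock);
}

/* Demo worker standing in for the scheduler's job cleanup. */
struct free_worker_arg {
	struct job_tracker *t;
	int n;
};

static void *free_worker(void *p)
{
	struct free_worker_arg *a = p;

	for (int i = 0; i < a->n; i++)
		tracker_job_freed(a->t);
	return NULL;
}
```

Counting submissions and frees separately lets teardown wait for jobs that are still in flight without having to track each job individually.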