Commit 3b569119 authored by Matthew Brost

drm/xe/vf: Workaround for race condition in GuC firmware during VF pause



A race condition exists where a paused VF's H2G request can be processed
and subsequently rejected. This rejection results in a FAST_REQ failure
being delivered to the KMD, which then terminates the CT via a dead
worker and triggers a GT reset, an undesirable outcome.

This workaround mitigates the issue by checking if a VF post-migration
recovery is in progress and aborting these adverse actions accordingly.
The GuC firmware will address this bug in an upcoming release. Once that
version is available and VF migration depends on it, this workaround can
be safely removed.

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Tomasz Lis <tomasz.lis@intel.com>
Link: https://lore.kernel.org/r/20251008214532.3442967-30-matthew.brost@intel.com
parent 1521fad9
+4 −0
@@ -1398,6 +1398,10 @@ static int parse_g2h_response(struct xe_guc_ct *ct, u32 *msg, u32 len)

		fast_req_report(ct, fence);

		/* FIXME: W/A race in the GuC, will get in firmware soon */
		if (xe_gt_recovery_pending(gt))
			return 0;

		CT_DEAD(ct, NULL, PARSE_G2H_RESPONSE);

		return -EPROTO;
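
For readers without the driver tree at hand, the control flow of the hunk above can be modelled in isolation. This is only a rough sketch: struct mock_gt, struct mock_ct and handle_fast_req_failure() are hypothetical stand-ins invented for illustration, the recovery_pending flag is assumed to mirror what xe_gt_recovery_pending(gt) reports, and the dead flag is assumed to approximate the effect of CT_DEAD(). None of this is the actual xe code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins for driver state; not the real xe structures. */
struct mock_gt {
	bool recovery_pending;	/* assumed to mirror xe_gt_recovery_pending(gt) */
};

struct mock_ct {
	struct mock_gt *gt;
	bool dead;		/* assumed to approximate the effect of CT_DEAD() */
};

/* Models only the FAST_REQ failure branch shown in the hunk. */
static int handle_fast_req_failure(struct mock_ct *ct)
{
	assert(ct && ct->gt);

	/*
	 * Workaround: while VF post-migration recovery is pending, the
	 * rejection is treated as a side effect of the firmware race and
	 * is ignored instead of killing the CT.
	 */
	if (ct->gt->recovery_pending)
		return 0;

	/* Normal path: mark the CT dead and report a protocol error. */
	ct->dead = true;
	return -EPROTO;
}
```

With recovery_pending set, the model returns 0 and leaves the CT alive; with it cleared, the CT is marked dead and -EPROTO is returned, matching the pre-patch behaviour. The early return sits after the failure has been reported, so in the real hunk fast_req_report() still runs in both cases.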