The expected flow of operations when using PXP is to query the PXP
status and wait for it to transition to "ready" before attempting to
create an exec_queue. This flow is followed by the Mesa driver, but
there is no guarantee that an incorrectly coded (or malicious) app
will not attempt to create the queue first without querying the status.
Therefore, we need to clarify what the expected behavior of the queue
creation ioctl is in this scenario.
Currently, the ioctl always fails with an -EBUSY code no matter the
error, but for consistency it is better to distinguish between "failed
to init" (-EIO) and "not ready" (-EBUSY), the same way the query ioctl
does. Note that, while this is a change in the return code of an ioctl,
the behavior of the ioctl in this particular corner case was not clearly
spec'd, so no one should have been relying on it (and we know that Mesa,
which is the only known userspace for this, didn't).
v2: Minor rework of the doc (Rodrigo)
Fixes: 72d479601d ("drm/xe/pxp/uapi: Add userspace and LRC support for PXP-using queues")
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: John Harrison <John.C.Harrison@Intel.com>
Cc: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Acked-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://lore.kernel.org/r/20250522225401.3953243-7-daniele.ceraolospurio@intel.com
(cherry picked from commit 21784ca960)
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Customer is reporting a really subtle issue where we get random DMAR
faults, hangs and other nasties for kernel migration jobs when stressing
stuff like s2idle/s3/s4. The explosions seems to happen somewhere
after resuming the system with splats looking something like:
PM: suspend exit
rfkill: input handler disabled
xe 0000:00:02.0: [drm] GT0: Engine reset: engine_class=bcs, logical_mask: 0x2, guc_id=0
xe 0000:00:02.0: [drm] GT0: Timedout job: seqno=24496, lrc_seqno=24496, guc_id=0, flags=0x13 in no process [-1]
xe 0000:00:02.0: [drm] GT0: Kernel-submitted job timed out
The likely cause appears to be a race between suspend cancelling the
worker that processes the free_job()'s, such that we still have pending
jobs to be freed after the cancel. Following from this, on resume the
pending_list will now contain at least one already complete job, but it
looks like we call drm_sched_resubmit_jobs(), which will then call
run_job() on everything still on the pending_list. But if the job was
already complete, then all the resources tied to the job, like the bb
itself, any memory that is being accessed, the iommu mappings etc. might
be long gone since those are usually tied to the fence signalling.
This scenario can be seen in ftrace when running a slightly modified
xe_pm IGT (kernel was only modified to inject artificial latency into
free_job to make the race easier to hit):
xe_sched_job_run: dev=0000:00:02.0, fence=0xffff888276cc8540, seqno=0, lrc_seqno=0, gt=0, guc_id=0, batch_addr=0x000000146910 ...
xe_exec_queue_stop: dev=0000:00:02.0, 3:0x2, gt=0, width=1, guc_id=0, guc_state=0x0, flags=0x13
xe_exec_queue_stop: dev=0000:00:02.0, 3:0x2, gt=0, width=1, guc_id=1, guc_state=0x0, flags=0x4
xe_exec_queue_stop: dev=0000:00:02.0, 4:0x1, gt=1, width=1, guc_id=0, guc_state=0x0, flags=0x3
xe_exec_queue_stop: dev=0000:00:02.0, 1:0x1, gt=1, width=1, guc_id=1, guc_state=0x0, flags=0x3
xe_exec_queue_stop: dev=0000:00:02.0, 4:0x1, gt=1, width=1, guc_id=2, guc_state=0x0, flags=0x3
xe_exec_queue_resubmit: dev=0000:00:02.0, 3:0x2, gt=0, width=1, guc_id=0, guc_state=0x0, flags=0x13
xe_sched_job_run: dev=0000:00:02.0, fence=0xffff888276cc8540, seqno=0, lrc_seqno=0, gt=0, guc_id=0, batch_addr=0x000000146910 ...
.....
xe_exec_queue_memory_cat_error: dev=0000:00:02.0, 3:0x2, gt=0, width=1, guc_id=0, guc_state=0x3, flags=0x13
So the job_run() is clearly triggered twice for the same job, even
though the first must have already signalled to completion during
suspend. We can also see a CAT error after the re-submit.
To prevent this only resubmit jobs on the pending_list that have not yet
signalled.
v2:
- Make sure to re-arm the fence callbacks with sched_start().
v3 (Matt B):
- Stop using drm_sched_resubmit_jobs(), which appears to be deprecated
and just open-code a simple loop such that we skip calling run_job()
on anything already signalled.
Link: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/4856
Fixes: dd08ebf6c3 ("drm/xe: Introduce a new DRM driver for Intel GPUs")
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: William Tseng <william.tseng@intel.com>
Cc: <stable@vger.kernel.org> # v6.8+
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Tejas Upadhyay <tejas.upadhyay@intel.com>
Link: https://lore.kernel.org/r/20250528113328.289392-2-matthew.auld@intel.com
(cherry picked from commit 38fafa9f39)
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
For preempt_fence mode VM's we're rejecting eviction of
shared bos during VM_BIND. However, since we do this in the
move() callback, we're getting an eviction failure warning from
TTM. The TTM callback intended for these things is
eviction_valuable().
However, the latter doesn't pass in the struct ttm_operation_ctx
needed to determine whether the caller needs this.
Instead, attach the needed information to the vm under the
vm->resv, until we've been able to update TTM to provide the
needed information. And add sufficient lockdep checks to prevent
misuse and races.
v2:
- Fix a copy-paste error in xe_vm_clear_validating()
v3:
- Fix kerneldoc errors.
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Fixes: 0af944f0e3 ("drm/xe: Reject BO eviction if BO is bound to current VM")
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://lore.kernel.org/r/20250528164105.234718-1-thomas.hellstrom@linux.intel.com
(cherry picked from commit 9d5558649f)
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
The XE driver can be built with or without VSEC support, but fails to link as
built-in if vsec is in a loadable module:
x86_64-linux-ld: vmlinux.o: in function `xe_vsec_init':
(.text+0x1e83e16): undefined reference to `intel_vsec_register'
The normal fix for this is to add a 'depends on INTEL_VSEC || !INTEL_VSEC',
forcing XE to be a loadable module as well, but that causes a circular
dependency:
symbol DRM_XE depends on INTEL_VSEC
symbol INTEL_VSEC depends on X86_PLATFORM_DEVICES
symbol X86_PLATFORM_DEVICES is selected by DRM_XE
The problem here is selecting a symbol from another subsystem, so change
that as well and rephrase the 'select' into the corresponding dependency.
Since X86_PLATFORM_DEVICES is 'default y', there is no change to
defconfig builds here.
Fixes: 0c45e76fcc ("drm/xe/vsec: Support BMG devices")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/20250529172355.2395634-2-lucas.demarchi@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
(cherry picked from commit e4931f8be3)
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Add support to manage power limits using pcode mailbox commands
for supported platforms.
v2:
- Address review comments. (Badal)
- Use mailbox commands instead of registers to manage power limits
for BMG.
- Clamp the maximum power limit to GPU firmware default value.
v3:
- Clamp power limit in write also for platforms with mailbox support.
v4:
- Remove unnecessary debug prints. (Badal)
v5:
- Update description of variable pl1_on_boot to fix kernel-doc error.
v6:
- Improve commit message, refer to BIOS as GPU firmware.
- Change macro READ_PL_FROM_BIOS to READ_PL_FROM_FW.
- Rectify drm_warn to drm_info.
Signed-off-by: Karthik Poosa <karthik.poosa@intel.com>
Fixes: e90f7a58e6 ("drm/xe/hwmon: Add HWMON support for BMG")
Reviewed-by: Badal Nilawar <badal.nilawar@intel.com>
Link: https://lore.kernel.org/r/20250529163458.2354509-2-karthik.poosa@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
(cherry picked from commit 7596d839f6)
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Non-display related:
- Fix undefined reference to `intel_pxp_gsccs_is_ready_for_sessions'
Display related:
- More work towards display separation (Jani)
- Stop writing VRR_CTL_IGN_MAX_SHIFT for MTL onwards (Jouni)
- DSC checks for 3 engines (Ankit)
- Add link rate and lane count to i915_display_info (Khaled)
- PSR fixes and workaround for underrun on idle (Jouni)
- LOBF enablement and ALMP fixes (Animesh)
- Clean up VGA plane handling (Ville)
- Use an intel_connector pointer everywhere (Imre)
- Fix warning for coffeelake on SunrisePoint PCH (Jiajia)
- Rework/Correction on minimum hblank calculation (Arun)
- Dmesg clean up (Jani)
- Add a couple of simple display workarounds (Ankit, Vinod)
- Refactor HDCP GSC (Jani)
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://lore.kernel.org/r/aByyL3bEufPu79OM@intel.com
Observe that i915->irq_lock is no longer used to protect anything
outside of display. Make it a display thing.
This allows us to remove the ugly #define irq_lock irq.lock hack from xe
compat header.
Note that this is slightly more subtle than it first looks. For i915,
there's no functional change here. The lock is moved. However, for xe,
we'll now have *two* locks, xe->irq.lock and display->irq.lock. These
should protect different things, though. Indeed, nesting in the past
would've lead to a deadlock because they were the same lock.
With the i915 references gone, we can make a handful more files
independent of i915_drv.h.
Reviewed-by: Gustavo Sousa <gustavo.sousa@intel.com>
Link: https://lore.kernel.org/r/6d8d2ce0f34a9c7361a5e2fcf96bb32a34c57e76.1746536745.git.jani.nikula@intel.com
[Jani: Fixed a comment while applying.]
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
The workqueue used for the reset worker is marked as WQ_MEM_RECLAIM,
while the GSC one isn't (and can't be as we need to do memory
allocations in the gsc worker). Therefore, we can't flush the latter
from the former.
The reason why we had such a flush was to avoid interrupting either
the GSC FW load or in progress GSC proxy operations. GSC proxy
operations fall into 2 categories:
1) GSC proxy init: this only happens once immediately after GSC FW load
and does not support being interrupted. The only way to recover from
an interruption of the proxy init is to do an FLR and re-load the GSC.
2) GSC proxy request: this can happen in response to a request that
the driver sends to the GSC. If this is interrupted, the GSC FW will
timeout and the driver request will be failed, but overall the GSC
will keep working fine.
Flushing the work allowed us to avoid interruption in both cases (unless
the hang came from the GSC engine itself, in which case we're toast
anyway). However, a failure on a proxy request is tolerable if we're in
a scenario where we're triggering a GT reset (i.e., something is already
gone pretty wrong), so what we really need to avoid is interrupting
the init flow, which we can do by polling on the register that reports
when the proxy init is complete (as that ensure us that all the load and
init operations have been completed).
Note that during suspend we still want to do a flush of the worker to
make sure it completes any operations involving the HW before the power
is cut.
v2: fix spelling in commit msg, rename waiter function (Julia)
Fixes: dd0e89e5ed ("drm/xe/gsc: GSC FW load")
Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/4830
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: John Harrison <John.C.Harrison@Intel.com>
Cc: Alan Previn <alan.previn.teres.alexis@intel.com>
Cc: <stable@vger.kernel.org> # v6.8+
Reviewed-by: Julia Filipchuk <julia.filipchuk@intel.com>
Link: https://lore.kernel.org/r/20250502155104.2201469-1-daniele.ceraolospurio@intel.com
The gsc_hdcp_ops is duplicated and initialized exactly the same way in
two different places (for i915 and xe), and requires forward
declarations for all the hooks. Deduplicate, and make the functions
static.
There are slight differences in the i915 and xe implementations of
intel_hdcp_gsc_init() and intel_hdcp_gsc_fini(). Take the best of both,
and improve.
We need to expose intel_hdcp_gsc_hdcp2_init() and
intel_hdcp_gsc_free_message() for this, and create the latter for xe.
Cc: Suraj Kandpal <suraj.kandpal@intel.com>
Reviewed-by: Suraj Kandpal <suraj.kandpal@intel.com>
Link: https://lore.kernel.org/r/21e7871b35d4c7d13f016b5ecb4f10e5be72c531.1745524803.git.jani.nikula@intel.com
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
The device core dumps are copied in 1.5GB chunks, which leads to a
link-time error on 32-bit builds because of the 64-bit division not
getting trivially turned into mask and shift operations:
ERROR: modpost: "__moddi3" [drivers/gpu/drm/xe/xe.ko] undefined!
On top of this, I noticed that the ALIGN_DOWN() usage here cannot
work because that is only defined for power-of-two alignments.
Change ALIGN_DOWN into an explicit div_u64_rem() that avoids the
link error and hopefully produces the right results.
Doing a 1.5GB kvmalloc() does seem a bit suspicious as well, e.g.
this will clearly fail on any 32-bit platform and is also likely
to run out of memory on 64-bit systems under memory pressure, so
using a much smaller power-of-two chunk size might be a good idea
instead.
v2:
- Always call div_u64_rem (Matt)
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202504251238.JsNgFeFc-lkp@intel.com/
Fixes: c4a2e5f865 ("drm/xe: Add devcoredump chunking")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://lore.kernel.org/r/20250501012545.1045247-1-matthew.brost@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Use a separate lock in the polling function eu_stall_data_buf_poll()
instead of eu_stall->stream_lock. This would prevent a possible
circular locking dependency leading to a deadlock as described below.
This would also require additional locking with the new lock in
the read function.
<4> [787.192986] ======================================================
<4> [787.192988] WARNING: possible circular locking dependency detected
<4> [787.192991] 6.14.0-rc7-xe+ #1 Tainted: G U
<4> [787.192993] ------------------------------------------------------
<4> [787.192994] xe_eu_stall/20093 is trying to acquire lock:
<4> [787.192996] ffff88819847e2c0 ((work_completion)
(&(&stream->buf_poll_work)->work)), at: __flush_work+0x1f8/0x5e0
<4> [787.193005] but task is already holding lock:
<4> [787.193007] ffff88814ce83ba8 (>->eu_stall->stream_lock){3:3},
at: xe_eu_stall_stream_ioctl+0x41/0x6a0 [xe]
<4> [787.193090] which lock already depends on the new lock.
<4> [787.193093] the existing dependency chain (in reverse order) is:
<4> [787.193095]
-> #1 (>->eu_stall->stream_lock){+.+.}-{3:3}:
<4> [787.193099] __mutex_lock+0xb4/0xe40
<4> [787.193104] mutex_lock_nested+0x1b/0x30
<4> [787.193106] eu_stall_data_buf_poll_work_fn+0x44/0x1d0 [xe]
<4> [787.193155] process_one_work+0x21c/0x740
<4> [787.193159] worker_thread+0x1db/0x3c0
<4> [787.193161] kthread+0x10d/0x270
<4> [787.193164] ret_from_fork+0x44/0x70
<4> [787.193168] ret_from_fork_asm+0x1a/0x30
<4> [787.193172]
-> #0 ((work_completion)(&(&stream->buf_poll_work)->work)){+.+.}-{0:0}:
<4> [787.193176] __lock_acquire+0x1637/0x2810
<4> [787.193180] lock_acquire+0xc9/0x300
<4> [787.193183] __flush_work+0x219/0x5e0
<4> [787.193186] cancel_delayed_work_sync+0x87/0x90
<4> [787.193189] xe_eu_stall_disable_locked+0x9a/0x260 [xe]
<4> [787.193237] xe_eu_stall_stream_ioctl+0x5b/0x6a0 [xe]
<4> [787.193285] __x64_sys_ioctl+0xa4/0xe0
<4> [787.193289] x64_sys_call+0x131e/0x2650
<4> [787.193292] do_syscall_64+0x91/0x180
<4> [787.193295] entry_SYSCALL_64_after_hwframe+0x76/0x7e
<4> [787.193299]
other info that might help us debug this:
<4> [787.193302] Possible unsafe locking scenario:
<4> [787.193304] CPU0 CPU1
<4> [787.193305] ---- ----
<4> [787.193306] lock(>->eu_stall->stream_lock);
<4> [787.193308] lock((work_completion)
(&(&stream->buf_poll_work)->work));
<4> [787.193311] lock(>->eu_stall->stream_lock);
<4> [787.193313] lock((work_completion)
(&(&stream->buf_poll_work)->work));
<4> [787.193315]
*** DEADLOCK ***
Fixes: 760edec939 ("drm/xe/eustall: Add support to read() and poll() EU stall data")
Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/4598
Signed-off-by: Harish Chegondi <harish.chegondi@intel.com>
Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Link: https://lore.kernel.org/r/c896932fca84f79db2df5942911997ed77b2b9b6.1744934656.git.harish.chegondi@intel.com
Core Changes:
Fix drm_gpusvm kernel-doc (Lucas)
Driver Changes:
- Release guc ids before cancelling work (Tejas)
- Remove a duplicated pc_start_call (Rodrigo)
- Fix an incorrect assert in previous userptr fixes (Thomas)
- Remove gen11 assertions and prefixes (Lucas)
- Drop sentinels from arg to xe_rtp_process_to_src (Lucas)
- Temporarily disable D3Cold on BMG (Rodrigo)
- Fix MOCS debugfs LNCF readout (Tvrtko)
- Some ring flush cleanups (Tvrtko)
- Use unsigned int for alignment in fb pinning code (Tvrtko)
- Retry and wait longer for GuC PC start (Rodrigo)
- Recognize 3DSTATE_COARSE_PIXEL in LRC dumps (Matt Roper)
- Remove reduntant check in xe_vm_create_ioctl() (Xin)
- A bunch of SRIOV updates (Michal)
- Add stats for SVM page-faults (Francois)
- Fix an UAF (Harish)
- Expose fan speed (Raag)
- Fix exporting xe buffer objects multiple times (Tomasz)
- Apply a workaround (Vinay)
- Simplify pinned bo iteration (Thomas)
- Remove an incorrect "static" keywork (Lucas)
- Add support for separate firmware files on each GT (Lucas)
- Survivability handling fixes (Lucas)
- Allow to inject error in early probe (Lucas)
- Fix unmet direct dependencies warning (Yue Haibing)
- More error injection during probe (Francois)
- Coding style fix (Maarten)
- Additional stats support (Riana)
- Add fault injection for xe_oa_alloc_regs (Nakshrtra)
- Add a BMG PCI ID (Matt Roper)
- Some SVM fixes and preliminary SVM multi-device work (Thomas)
- Switch the migrate code from drm managed to dev managed (Aradhya)
- Fix an out-of-bounds shift when invalidating TLB (Thomas)
- Ensure fixed_slice_mode gets set after ccs_mode change (Niranjana)
- Use local fence in error path of xe_migrate_clear (Matthew Brost)
- More Workarounds (Julia)
- Define sysfs_ops on all directories (Tejas)
- Set power state to D3Cold during s2idle/s3 (Badal)
- Devcoredump output fix (John)
- Avoid plain 64-bit division (Arnd Bergmann)
- Reword a debug message (John)
- Don't print a hwconfig error message when forcing execlists (Stuart)
- Restore an error code to avoid a smatch warning (Rodrigo)
- Invalidate L3 read-only cachelines for geometry streams too (Kenneth)
- Make PPHWSP size explicit in xe_gt_lrc_size() (Gustavo)
- Add GT frequency events (Vinay)
- Fix xe_pt_stage_bind_walk kerneldoc (Thomas)
- Add a workaround (Aradhya)
- Rework pinned save/restore (Matthew Auld, Matthew Brost)
- Allow non-contig VRAM kernel BO (Matthew Auld)
- Support non-contig VRAM provisioning for SRIOV (Matthew Auld)
- Allow scratch-pages for unmapped parts of page-faulting VMs. (Oak)
- Ensure XE_BO_FLAG_CPU_ADDR_MIRROR had a unique value (Matt Roper)
- Fix taking an invalid lock on wedge (Lucas)
- Configs and documentation for survivability mode (Riana)
- Remove an unused macro (Shuicheng)
- Work around a page-fault full error (Matt Brost)
- Enable a SRIOV workaround (John)
- Bump the recommended GuC version (John)
- Allow to drop VRAM resizing (Lucas)
- Don't expose privileged debugfs files if VF (Michal)
- Don't show GGTT/LMEM debugfs files under media GT (Michal)
- Adjust ring-buffer emission for maximum possible size (Tvrtko)
- Fix notifier vs folio lock deadlock (Matthew Auld)
- Stop relying on placement for dma-buf unmap Matthew Auld)
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Thomas Hellstrom <thomas.hellstrom@linux.intel.com>
Link: https://lore.kernel.org/r/aADWaEFKVmxSnDLo@fedora
Add migrate layer functions to access VRAM and update
xe_ttm_access_memory to use for non-visible access and large (more than
16k) BO access. 8G devcoreump on BMG observed 3 minute CPU copy time vs.
3s GPU copy time.
v4:
- Fix non-page aligned accesses
- Add support for small / unaligned access
- Update commit message indicating migrate used for large accesses (Auld)
- Fix warning in xe_res_cursor for non-zero offset
v5:
- Fix 32 bit build (CI)
v6:
- Rebase and use SVM migration copy functions
v7:
- Fix build error (CI)
v8:
- Remove ifdef around VRAM copy functions (CI)
- Use break statement in dma unmmaping (Jonathan)
- Use if/else rather than goto (Jonathan)
- Use single return point (Jonathan)
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
Link: https://lore.kernel.org/r/20250423171725.597955-3-matthew.brost@intel.com
Chunk devcoredump into 1.5G pieces to avoid hitting the kvmalloc limit
of 2G. Simple algorithm reads 1.5G at time in xe_devcoredump_read
callback as needed.
Some memory allocations are changed to GFP_ATOMIC as they done in
xe_devcoredump_read which holds lock in the path of reclaim. The
allocations are small, so in practice should never fail.
v2:
- Update commit message wrt gfp atomic (John H)
v6:
- Drop GFP_ATOMIC change for hwconfig (John H)
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
Link: https://lore.kernel.org/r/20250423171725.597955-2-matthew.brost@intel.com
This flag used to be used in the old memory tracking code, that
code got migrated into the vmwgfx driver[1], and then got removed
from the tree[2], but this piece got left behind.
[1] f07069da6b ("drm/ttm: move memory accounting into vmwgfx v4")
[2] 8aadeb8ad8 ("drm/vmwgfx: Remove the dedicated memory accounting")
Cleanup the dead code.
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
When an attribute group is created with sysfs_create_group() or
sysfs_create_files() the ->sysfs_ops() callback is set to
kobj_sysfs_ops, which sets the ->show() callback to kobj_attr_show().
kobj_attr_show() uses container_of() to get the ->show() callback
from the attribute it was passed, meaning the ->show() callback needs
to be the same type as the ->show() callback in 'struct kobj_attribute'.
However, cur_freq_show() has the type of the ->show() callback in
'struct device_attribute', which causes a CFI violation when opening the
'id' sysfs node under gtidle/freq/throttle. This happens to work because
the layout of 'struct kobj_attribute' and 'struct device_attribute' are
the same, so the container_of() cast happens to allow the ->show()
callback to still work.
Changed the type of cur_freq_show() and few more functions to match the
->show() callback in 'struct kobj_attributes' to resolve the CFI
violation.
CFI failure seen while accessing sysfs files under
/sys/class/drm/card0/device/tile0/gt*/gtidle/*
/sys/class/drm/card0/device/tile0/gt*/freq0/*
/sys/class/drm/card0/device/tile0/gt*/freq0/throttle/*
[ 2599.618075] RIP: 0010:__cfi_cur_freq_show+0xd/0x10 [xe]
[ 2599.624452] Code: 44 c1 44 89 fa e8 03 95 39 f2 48 98 5b 41 5e 41 5f 5d c3 c9
[ 2599.646638] RSP: 0018:ffffbe438ead7d10 EFLAGS: 00010286
[ 2599.652823] RAX: ffff9f7d8b3845d8 RBX: ffff9f7dee8c95d8 RCX: 0000000000000000
[ 2599.661246] RDX: ffff9f7e6f439000 RSI: ffffffffc13ada30 RDI: ffff9f7d975d4b00
[ 2599.669669] RBP: ffffbe438ead7d18 R08: 0000000000001000 R09: ffff9f7e6f439000
[ 2599.678092] R10: 00000000e07304a6 R11: ffffffffc1241ca0 R12: ffffffffb4836ea0
[ 2599.688435] R13: ffff9f7e45fb1180 R14: ffff9f7d975d4b00 R15: ffff9f7e6f439000
[ 2599.696860] FS: 000076b02b66cfc0(0000) GS:ffff9f80ef400000(0000) knlGS:00000
[ 2599.706412] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2599.713196] CR2: 00005f80d94641a9 CR3: 00000001e44ec006 CR4: 0000000100f72ef0
[ 2599.721618] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 2599.730041] DR3: 0000000000000000 DR6: 00000000ffff07f0 DR7: 0000000000000400
[ 2599.738464] PKRU: 55555554
[ 2599.741655] Call Trace:
[ 2599.744541] <TASK>
[ 2599.747017] ? __die_body+0x69/0xb0
[ 2599.751151] ? die+0xa9/0xd0
[ 2599.754548] ? do_trap+0x89/0x160
[ 2599.758476] ? __cfi_cur_freq_show+0xd/0x10 [xe b37985c94829727668bd7c5b33c1]
[ 2599.768315] ? handle_invalid_op+0x69/0x90
[ 2599.773167] ? __cfi_cur_freq_show+0xd/0x10 [xe b37985c94829727668bd7c5b33c1]
[ 2599.783010] ? exc_invalid_op+0x36/0x60
[ 2599.787552] ? fred_hwexc+0x123/0x1a0
[ 2599.791873] ? fred_entry_from_kernel+0x7b/0xd0
[ 2599.797219] ? asm_fred_entrypoint_kernel+0x45/0x70
[ 2599.802976] ? act_freq_show+0x70/0x70 [xe b37985c94829727668bd7c5b33c1d9998]
[ 2599.812301] ? __cfi_cur_freq_show+0xd/0x10 [xe b37985c94829727668bd7c5b33c1]
[ 2599.822137] ? __kmalloc_node_noprof+0x1f3/0x420
[ 2599.827594] ? __kvmalloc_node_noprof+0xcb/0x180
[ 2599.833045] ? kobj_attr_show+0x22/0x40
[ 2599.837571] sysfs_kf_seq_show+0xa8/0x110
[ 2599.842302] kernfs_seq_show+0x38/0x50
Signed-off-by: Jeevaka Prabu Badrappan <jeevaka.badrappan@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://lore.kernel.org/r/20250422171852.85558-1-jeevaka.badrappan@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Userspace is still alive and kicking at this point so actually moving
pinned stuff here is tricky. However, we can instead pre-allocate the
backup storage upfront from the notifier, such that we scoop up as much
as we can, and then leave the final .suspend() to do the actual copy (or
allocate anything that we missed). That way the bulk of our allocations
will hopefully be done outside the more restrictive .suspend().
We do need to be extra careful though, since the pinned handling can now
race with PM notifier, like something becoming unpinned after we prepare
it from the notifier.
v2 (Thomas):
- Fix kernel doc and drop the pin as soon as we are done with the
restore, instead of deferring to later.
Suggested-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Link: https://lore.kernel.org/r/20250416150913.434369-8-matthew.auld@intel.com
We end up needing to grab both locks together anyway and keep them held
until we complete the copy or add the fence. Plus the backup_obj is
short lived and tied to the parent object, so seems reasonable to share
the same dma-resv. This will simplify the locking here, and in follow
up patches.
v2:
- Hold reference to the parent bo to be sure the shared dma-resv can't
go out of scope too soon. (Thomas)
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Link: https://lore.kernel.org/r/20250416150913.434369-7-matthew.auld@intel.com
Calculating the DSS id (index of a steered register) currently
requires reading state from the hwconfig table and that currently
requires dynamically allocating memory. The GuC based register capture
(for dev core dumps) includes this index as part of the register name
in the dump. However, it was calculating said index at the time of the
dump for every dump. That is wasteful. It also breaks anyone trying to
do the dump at a time when memory allocations are not allowed.
So rather than calculating on every print, just calculate at start of
day when creating the register list in the first place.
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://lore.kernel.org/r/20250417213303.3021243-1-John.C.Harrison@Intel.com