Commit 8823eaef authored by Breno Leitao, committed by Tejun Heo

workqueue: Show all busy workers in stall diagnostics



show_cpu_pool_hog() only prints workers whose task is currently running
on the CPU (task_is_running()).  This misses workers that are busy
processing a work item but are sleeping or blocked — for example, a
worker that clears PF_WQ_WORKER and enters wait_event_idle().  Such a
worker still occupies a pool slot and prevents progress, yet produces
an empty backtrace section in the watchdog output.

This is happening on real arm64 systems, where
toggle_allocation_gate() IPIs every single CPU in the machine (which
lacks NMI), causing workqueue stalls that show empty backtraces because
toggle_allocation_gate() is sleeping in wait_event_idle().

Remove the task_is_running() filter so every in-flight worker in the
pool's busy_hash is dumped.  The busy_hash is protected by pool->lock,
which is already held.
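
The effect of dropping the filter can be sketched in a small userspace
model. This is only an illustration, not kernel code: struct
model_worker, enum model_state and both count_dumped_*() helpers are
hypothetical names, with the busy flag standing in for membership in
busy_hash and the state field standing in for what task_is_running()
tests.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; none of these are kernel types. */
enum model_state { MODEL_RUNNING, MODEL_SLEEPING };

struct model_worker {
	const char *name;
	enum model_state state;	/* models the task_is_running() test */
	bool busy;		/* models membership in pool->busy_hash */
};

/* Old policy: dump a busy worker only if it is also running. */
static size_t count_dumped_old(const struct model_worker *w, size_t n)
{
	size_t dumped = 0;

	for (size_t i = 0; i < n; i++)
		if (w[i].busy && w[i].state == MODEL_RUNNING)
			dumped++;
	return dumped;
}

/* New policy: dump every busy worker, whatever its scheduler state. */
static size_t count_dumped_new(const struct model_worker *w, size_t n)
{
	size_t dumped = 0;

	for (size_t i = 0; i < n; i++)
		if (w[i].busy)
			dumped++;
	return dumped;
}
```

For a modelled pool whose only busy worker is sleeping, the old policy
dumps nothing (the empty backtrace section described above), while the
new policy dumps that worker.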

Signed-off-by: Breno Leitao <leitao@debian.org>
Acked-by: Song Liu <song@kernel.org>
Signed-off-by: Tejun Heo <tj@kernel.org>
parent e8e14ac7
+13 −15
@@ -7583,9 +7583,9 @@ MODULE_PARM_DESC(panic_on_stall_time, "Panic if stall exceeds this many seconds

/*
 * Show workers that might prevent the processing of pending work items.
- * The only candidates are CPU-bound workers in the running state.
- * Pending work items should be handled by another idle worker
- * in all other situations.
+ * A busy worker that is not running on the CPU (e.g. sleeping in
+ * wait_event_idle() with PF_WQ_WORKER cleared) can stall the pool just as
+ * effectively as a CPU-bound one, so dump every in-flight worker.
 */
static void show_cpu_pool_hog(struct worker_pool *pool)
{
@@ -7596,7 +7596,6 @@ static void show_cpu_pool_hog(struct worker_pool *pool)
	raw_spin_lock_irqsave(&pool->lock, irq_flags);

	hash_for_each(pool->busy_hash, bkt, worker, hentry) {
-		if (task_is_running(worker->task)) {
		/*
		 * Defer printing to avoid deadlocks in console
		 * drivers that queue work while holding locks
@@ -7609,7 +7608,6 @@ static void show_cpu_pool_hog(struct worker_pool *pool)

		printk_deferred_exit();
	}
-	}

	raw_spin_unlock_irqrestore(&pool->lock, irq_flags);
}
@@ -7619,7 +7617,7 @@ static void show_cpu_pools_hogs(void)
	struct worker_pool *pool;
	int pi;

-	pr_info("Showing backtraces of running workers in stalled CPU-bound worker pools:\n");
+	pr_info("Showing backtraces of busy workers in stalled CPU-bound worker pools:\n");

	rcu_read_lock();