Commit af13e5e4 authored by Peter Zijlstra

sched: Fix the do_set_cpus_allowed() locking fix



Commit abfc0107 ("sched: Fix do_set_cpus_allowed() locking")
overlooked that __balance_push_cpu_stop() calls select_fallback_rq()
with rq->lock held. As a result, set_cpus_allowed_force() ends up
taking rq->lock recursively and the machine locks up.
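
The lockup described above can be illustrated in userspace. The following
is only a sketch under stated assumptions: it uses POSIX threads rather
than kernel spinlocks, and every name in it (rq_lock_demo,
callee_takes_lock(), buggy_order()) is hypothetical. An error-checking
mutex is used so that the recursive acquisition is reported as EDEADLK
instead of hanging, which is what a raw spinlock would do.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical stand-in for rq->lock. */
static pthread_mutex_t rq_lock_demo;

/* Make the mutex error-checking so a relock by the owner returns EDEADLK. */
static void rq_lock_demo_init(void)
{
	pthread_mutexattr_t attr;
	int err;

	err = pthread_mutexattr_init(&attr);
	assert(err == 0);
	err = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
	assert(err == 0);
	err = pthread_mutex_init(&rq_lock_demo, &attr);
	assert(err == 0);
	pthread_mutexattr_destroy(&attr);
}

/* Hypothetical callee that takes the lock itself. */
static int callee_takes_lock(void)
{
	int err = pthread_mutex_lock(&rq_lock_demo);

	if (err)
		return err;	/* EDEADLK if the caller already holds the lock */
	pthread_mutex_unlock(&rq_lock_demo);
	return 0;
}

/* Buggy ordering: the helper is called with the lock already held. */
static int buggy_order(void)
{
	int err;

	pthread_mutex_lock(&rq_lock_demo);
	err = callee_takes_lock();
	pthread_mutex_unlock(&rq_lock_demo);
	return err;
}
```

After rq_lock_demo_init(), buggy_order() returns EDEADLK, while calling
callee_takes_lock() on its own succeeds and returns 0.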

Run select_fallback_rq() earlier, without holding rq->lock. This opens
up a race window where a task could get migrated out from under us, but
that is harmless: we want the task migrated anyway.

select_fallback_rq() itself will not be subject to concurrency as it
will be fully serialized by p->pi_lock, so there is no chance of
set_cpus_allowed_force() getting called with different arguments and
selecting different fallback CPUs for one task.
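
The ordering described above can be sketched in userspace as follows. This
is a minimal illustration, not the kernel code: it assumes POSIX threads,
and all names (pi_lock_sketch, rq_lock_sketch, pick_fallback_sketch(),
push_task_sketch()) are hypothetical. The outer mutex plays the role of
p->pi_lock and serializes the selection; the inner mutex plays the role of
rq->lock and is only taken after the selection has finished.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>

/* Hypothetical stand-ins for p->pi_lock (outer) and rq->lock (inner). */
static pthread_mutex_t pi_lock_sketch = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t rq_lock_sketch;
static int allowed_cpu_sketch = -1;	/* protected by rq_lock_sketch */

/* Error-checking inner mutex: a recursive lock fails instead of hanging. */
static void sketch_init(void)
{
	pthread_mutexattr_t attr;
	int err;

	err = pthread_mutexattr_init(&attr);
	assert(err == 0);
	err = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
	assert(err == 0);
	err = pthread_mutex_init(&rq_lock_sketch, &attr);
	assert(err == 0);
	pthread_mutexattr_destroy(&attr);
}

/* Selection helper that needs the inner lock to update shared state. */
static int pick_fallback_sketch(int cur_cpu)
{
	int cpu = cur_cpu + 1;	/* arbitrary placeholder policy */

	if (pthread_mutex_lock(&rq_lock_sketch))
		return -1;	/* would be a recursive acquisition */
	allowed_cpu_sketch = cpu;
	pthread_mutex_unlock(&rq_lock_sketch);
	return cpu;
}

/* Fixed ordering: select under the outer lock, then take the inner lock. */
static int push_task_sketch(int cur_cpu)
{
	int cpu;

	pthread_mutex_lock(&pi_lock_sketch);
	cpu = pick_fallback_sketch(cur_cpu);	/* inner lock not held yet */
	assert(cpu >= 0);

	pthread_mutex_lock(&rq_lock_sketch);
	/* A real implementation would recheck the task and migrate it here. */
	pthread_mutex_unlock(&rq_lock_sketch);
	pthread_mutex_unlock(&pi_lock_sketch);
	return cpu;
}
```

Because every caller holds the outer lock for the whole operation, two
selections for the same task cannot interleave in this sketch, even though
the inner lock is dropped between the selection and the migration step.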

Fixes: abfc0107 ("sched: Fix do_set_cpus_allowed() locking")
Reported-by: Jan Polensky <japo@linux.ibm.com>
Reported-by: kernel test robot <oliver.sang@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Jan Polensky <japo@linux.ibm.com>
Closes: https://lore.kernel.org/oe-lkp/202510271206.24495a68-lkp@intel.com
Link: https://patch.msgid.link/20251027110133.GI3245006@noisy.programming.kicks-ass.net
parent 73cbcfe2
+7 −10
@@ -8044,18 +8044,15 @@ static int __balance_push_cpu_stop(void *arg)
 	struct rq_flags rf;
 	int cpu;

-	raw_spin_lock_irq(&p->pi_lock);
-	rq_lock(rq, &rf);
-
-	update_rq_clock(rq);
-
-	if (task_rq(p) == rq && task_on_rq_queued(p)) {
+	scoped_guard (raw_spinlock_irq, &p->pi_lock) {
 		cpu = select_fallback_rq(rq->cpu, p);
-		rq = __migrate_task(rq, &rf, p, cpu);
-	}

-	rq_unlock(rq, &rf);
-	raw_spin_unlock_irq(&p->pi_lock);
+		rq_lock(rq, &rf);
+		update_rq_clock(rq);
+		if (task_rq(p) == rq && task_on_rq_queued(p))
+			rq = __migrate_task(rq, &rf, p, cpu);
+		rq_unlock(rq, &rf);
+	}

 	put_task_struct(p);