Unverified Commit 56ce6c8b authored by Christian Brauner's avatar Christian Brauner
Browse files

Merge patch series "fs: replace wq users and add WQ_PERCPU to alloc_workqueue() users"

Marco Crivellari <marco.crivellari@suse.com> says:

Below is a summary of a discussion about the Workqueue API and cpu isolation
considerations. Details and more information are available here:

        "workqueue: Always use wq_select_unbound_cpu() for WORK_CPU_UNBOUND."
        https://lore.kernel.org/all/20250221112003.1dSuoGyc@linutronix.de/

=== Current situation: problems ===

Let's consider a nohz_full system with isolated CPUs: wq_unbound_cpumask is
set to the housekeeping CPUs, for !WQ_UNBOUND the local CPU is selected.

This leads to different scenarios if a work item is scheduled on an isolated
CPU where "delay" value is 0 or greater then 0:
        schedule_delayed_work(, 0);

This will be handled by __queue_work() that will queue the work item on the
current local (isolated) CPU, while:

        schedule_delayed_work(, 1);

Will move the timer on an housekeeping CPU, and schedule the work there.

Currently if a user enqueue a work item using schedule_delayed_work() the
used wq is "system_wq" (per-cpu wq) while queue_delayed_work() use
WORK_CPU_UNBOUND (used when a cpu is not specified). The same applies to
schedule_work() that is using system_wq and queue_work(), that makes use
again of WORK_CPU_UNBOUND.

This lack of consistentcy cannot be addressed without refactoring the API.

=== Plan and future plans ===

This patchset is the first stone on a refactoring needed in order to
address the points aforementioned; it will have a positive impact also
on the cpu isolation, in the long term, moving away percpu workqueue in
favor to an unbound model.

These are the main steps:
1)  API refactoring (that this patch is introducing)
    -   Make more clear and uniform the system wq names, both per-cpu and
        unbound. This to avoid any possible confusion on what should be
        used.

    -   Introduction of WQ_PERCPU: this flag is the complement of WQ_UNBOUND,
        introduced in this patchset and used on all the callers that are not
        currently using WQ_UNBOUND.

        WQ_UNBOUND will be removed in a future release cycle.

        Most users don't need to be per-cpu, because they don't have
        locality requirements, because of that, a next future step will be
        make "unbound" the default behavior.

2)  Check who really needs to be per-cpu
    -   Remove the WQ_PERCPU flag when is not strictly required.

3)  Add a new API (prefer local cpu)
    -   There are users that don't require a local execution, like mentioned
        above; despite that, local execution yeld to performance gain.

        This new API will prefer the local execution, without requiring it.

=== Introduced Changes by this series ===

1) [P 1-2] Replace use of system_wq and system_unbound_wq

        system_wq is a per-CPU workqueue, but his name is not clear.
        system_unbound_wq is to be used when locality is not required.

        Because of that, system_wq has been renamed in system_percpu_wq, and
        system_unbound_wq has been renamed in system_dfl_wq.

2) [P 3] add WQ_PERCPU to remaining alloc_workqueue() users

        Every alloc_workqueue() caller should use one among WQ_PERCPU or
        WQ_UNBOUND. This is actually enforced warning if both or none of them
        are present at the same time.

        WQ_UNBOUND will be removed in a next release cycle.

* patches from https://lore.kernel.org/20250916082906.77439-1-marco.crivellari@suse.com

:
  fs: WQ_PERCPU added to alloc_workqueue users
  fs: replace use of system_wq with system_percpu_wq
  fs: replace use of system_unbound_wq with system_dfl_wq

Signed-off-by: default avatarChristian Brauner <brauner@kernel.org>
parents 8f5ae30d 69635d7f
Loading
Loading
Loading
Loading
+2 −2
Original line number Diff line number Diff line
@@ -42,7 +42,7 @@ static void afs_volume_init_callback(struct afs_volume *volume)
	list_for_each_entry(vnode, &volume->open_mmaps, cb_mmap_link) {
		if (vnode->cb_v_check != atomic_read(&volume->cb_v_break)) {
			afs_clear_cb_promise(vnode, afs_cb_promise_clear_vol_init_cb);
			queue_work(system_unbound_wq, &vnode->cb_work);
			queue_work(system_dfl_wq, &vnode->cb_work);
		}
	}

@@ -90,7 +90,7 @@ void __afs_break_callback(struct afs_vnode *vnode, enum afs_cb_break_reason reas
		if (reason != afs_cb_break_for_deleted &&
		    vnode->status.type == AFS_FTYPE_FILE &&
		    atomic_read(&vnode->cb_nr_mmap))
			queue_work(system_unbound_wq, &vnode->cb_work);
			queue_work(system_dfl_wq, &vnode->cb_work);

		trace_afs_cb_break(&vnode->fid, vnode->cb_break, reason, true);
	} else {
+2 −2
Original line number Diff line number Diff line
@@ -169,13 +169,13 @@ static int __init afs_init(void)

	printk(KERN_INFO "kAFS: Red Hat AFS client v0.1 registering.\n");

	afs_wq = alloc_workqueue("afs", 0, 0);
	afs_wq = alloc_workqueue("afs", WQ_PERCPU, 0);
	if (!afs_wq)
		goto error_afs_wq;
	afs_async_calls = alloc_workqueue("kafsd", WQ_MEM_RECLAIM | WQ_UNBOUND, 0);
	if (!afs_async_calls)
		goto error_async;
	afs_lock_manager = alloc_workqueue("kafs_lockd", WQ_MEM_RECLAIM, 0);
	afs_lock_manager = alloc_workqueue("kafs_lockd", WQ_MEM_RECLAIM | WQ_PERCPU, 0);
	if (!afs_lock_manager)
		goto error_lockmgr;

+1 −1
Original line number Diff line number Diff line
@@ -172,7 +172,7 @@ static void afs_issue_write_worker(struct work_struct *work)
void afs_issue_write(struct netfs_io_subrequest *subreq)
{
	subreq->work.func = afs_issue_write_worker;
	if (!queue_work(system_unbound_wq, &subreq->work))
	if (!queue_work(system_dfl_wq, &subreq->work))
		WARN_ON_ONCE(1);
}

+1 −1
Original line number Diff line number Diff line
@@ -636,7 +636,7 @@ static void free_ioctx_reqs(struct percpu_ref *ref)

	/* Synchronize against RCU protected table->table[] dereferences */
	INIT_RCU_WORK(&ctx->free_rwork, free_ioctx);
	queue_rcu_work(system_wq, &ctx->free_rwork);
	queue_rcu_work(system_percpu_wq, &ctx->free_rwork);
}

/*
+1 −1
Original line number Diff line number Diff line
@@ -827,7 +827,7 @@ int bch2_journal_keys_to_write_buffer_end(struct bch_fs *c, struct journal_keys_

	if (bch2_btree_write_buffer_should_flush(c) &&
	    __enumerated_ref_tryget(&c->writes, BCH_WRITE_REF_btree_write_buffer) &&
	    !queue_work(system_unbound_wq, &c->btree_write_buffer.flush_work))
	    !queue_work(system_dfl_wq, &c->btree_write_buffer.flush_work))
		enumerated_ref_put(&c->writes, BCH_WRITE_REF_btree_write_buffer);

	if (dst->wb == &wb->flushing)
Loading