Commit 0ac20437 authored by Caleb Sander Mateos, committed by Jakub Kicinski

mlx5/core: Schedule EQ comp tasklet only if necessary

Currently, the mlx5_eq_comp_int() interrupt handler schedules a tasklet
to call mlx5_cq_tasklet_cb() if it processes any completions. For CQs
whose completions don't need to be processed in tasklet context, this
adds unnecessary overhead. In a heavy TCP workload, we see 4% of CPU
time spent on the tasklet_trylock() in tasklet_action_common(), with a
smaller amount spent on the atomic operations in tasklet_schedule() and
tasklet_clear_sched() and on locking the spinlock in mlx5_cq_tasklet_cb().
TCP completions are handled by mlx5e_completion_event(), which schedules
NAPI to poll the queue, so they don't need tasklet processing.
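The pre-patch scheduling decision can be sketched as a small user-space C model. Everything in it is hypothetical: `struct model_cq`, `cq_comp()`, `eq_comp_int_old()` and the `tasklet_schedules` counter stand in for the driver's CQ, its per-CQ comp handler, mlx5_eq_comp_int() and tasklet_schedule(). EQE parsing, locking and the real tasklet are left out. The model shows only that the tasklet is scheduled once per interrupt that processed any EQE, even when every EQE belonged to a NAPI-driven CQ.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical user-space model of the pre-patch mlx5_eq_comp_int();
 * none of these names are kernel APIs. */

enum cq_kind { CQ_NAPI, CQ_TASKLET };

struct model_cq {
	enum cq_kind kind;
	int napi_polls;       /* NAPI polls requested for this CQ */
	int tasklet_enqueues; /* times this CQ was queued for the tasklet */
};

static int tasklet_schedules; /* counts stand-in tasklet_schedule() calls */

/* Per-CQ completion handler: NAPI CQs request a poll, others are queued
 * for tasklet processing. */
static void cq_comp(struct model_cq *cq)
{
	if (cq->kind == CQ_NAPI)
		cq->napi_polls++;
	else
		cq->tasklet_enqueues++;
}

/* Old behavior: one tasklet_schedule() per interrupt that saw any EQE,
 * regardless of which kind of CQ the EQEs belonged to. */
static void eq_comp_int_old(struct model_cq **eqes, size_t n)
{
	long last = -1; /* plays the role of the old "u32 cqn = -1" sentinel */

	for (size_t i = 0; i < n; i++) {
		cq_comp(eqes[i]);
		last = (long)i;
	}
	if (last != -1)
		tasklet_schedules++;
}
```

Feeding it three EQEs that all target NAPI CQs still raises `tasklet_schedules` to 1 without queueing any CQ for the tasklet, which is the wasted tasklet work described above.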

Schedule the tasklet in mlx5_add_cq_to_tasklet() instead to avoid this
overhead. mlx5_add_cq_to_tasklet() is responsible for enqueuing the CQs
to be processed in tasklet context, so it can schedule the tasklet. CQs
that need tasklet processing have their interrupt comp handler set to
mlx5_add_cq_to_tasklet(), so they will schedule the tasklet. CQs that
don't need tasklet processing won't schedule the tasklet. To avoid
scheduling the tasklet multiple times during the same interrupt, only
schedule the tasklet in mlx5_add_cq_to_tasklet() if the tasklet work
queue was empty before the new CQ was pushed to it.
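A minimal single-threaded sketch of the new rule, assuming a fixed-size array in place of the kernel's list_head and omitting the spinlock (in the driver, the lock makes the empty check and the push atomic with respect to the tasklet). `struct model_tasklet_ctx`, `add_cq_to_tasklet()` and `eq_comp_int_new()` are hypothetical names. The tasklet is scheduled only when the pending list goes from empty to non-empty, and a CQ that is already queued is not queued again.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical single-threaded model of the patched scheme. In the driver
 * the empty check and the push run under tasklet_ctx->lock; this model has
 * no concurrency, so the lock is omitted. */

#define MAX_PENDING 16

struct model_cq {
	bool uses_tasklet; /* comp handler is the "add to tasklet" path */
	bool queued;       /* models a non-empty cq->tasklet_ctx.list */
	int napi_polls;
};

struct model_tasklet_ctx {
	struct model_cq *pending[MAX_PENDING]; /* models tasklet_ctx->list */
	size_t npending;
	int schedules;                          /* tasklet_schedule() calls */
};

/* Models mlx5_add_cq_to_tasklet(): queue the CQ at most once, and schedule
 * the tasklet only on the empty -> non-empty transition of the list. */
static void add_cq_to_tasklet(struct model_tasklet_ctx *ctx,
			      struct model_cq *cq)
{
	bool schedule_tasklet = false;

	if (!cq->queued) {
		schedule_tasklet = (ctx->npending == 0);
		assert(ctx->npending < MAX_PENDING);
		ctx->pending[ctx->npending++] = cq;
		cq->queued = true;
	}
	if (schedule_tasklet)
		ctx->schedules++;
}

/* Models the patched mlx5_eq_comp_int(): dispatch each EQE to its CQ's
 * handler, with no unconditional tasklet_schedule() at the end. */
static void eq_comp_int_new(struct model_tasklet_ctx *ctx,
			    struct model_cq **eqes, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		struct model_cq *cq = eqes[i];

		if (cq->uses_tasklet)
			add_cq_to_tasklet(ctx, cq);
		else
			cq->napi_polls++; /* e.g. a NAPI-driven Ethernet CQ */
	}
}
```

An interrupt carrying only NAPI completions leaves `schedules` at 0, while one that queues two distinct tasklet CQs (one of them hit twice) schedules exactly once.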

The additional branch in mlx5_add_cq_to_tasklet(), called for each EQE,
may add a small cost for the userspace InfiniBand CQs whose completions
are processed in tasklet context. But this seems worth it to avoid the
tasklet overhead for CQs that don't need it.

Note that the mlx4 driver works the same way: it schedules the tasklet
in mlx4_add_cq_to_tasklet() and only if the work queue was empty before.

Signed-off-by: Caleb Sander Mateos <csander@purestorage.com>
Reviewed-by: Parav Pandit <parav@nvidia.com>
Acked-by: Tariq Toukan <tariqt@nvidia.com>
Acked-by: Saeed Mahameed <saeedm@nvidia.com>
Link: https://patch.msgid.link/20241105204000.1807095-1-csander@purestorage.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
parent e4e3fd0a
+11 −0
@@ -71,6 +71,7 @@ static void mlx5_add_cq_to_tasklet(struct mlx5_core_cq *cq,
 {
 	unsigned long flags;
 	struct mlx5_eq_tasklet *tasklet_ctx = cq->tasklet_ctx.priv;
+	bool schedule_tasklet = false;
 
 	spin_lock_irqsave(&tasklet_ctx->lock, flags);
 	/* When migrating CQs between EQs will be implemented, please note
@@ -80,9 +81,19 @@ static void mlx5_add_cq_to_tasklet(struct mlx5_core_cq *cq,
 	 */
 	if (list_empty_careful(&cq->tasklet_ctx.list)) {
 		mlx5_cq_hold(cq);
+		/* If the tasklet CQ work list isn't empty, mlx5_cq_tasklet_cb()
+		 * is scheduled/running and hasn't processed the list yet, so it
+		 * will see this added CQ when it runs. If the list is empty,
+		 * the tasklet needs to be scheduled to pick up the CQ. The
+		 * spinlock avoids any race with the tasklet accessing the list.
+		 */
+		schedule_tasklet = list_empty(&tasklet_ctx->list);
 		list_add_tail(&cq->tasklet_ctx.list, &tasklet_ctx->list);
 	}
 	spin_unlock_irqrestore(&tasklet_ctx->lock, flags);
+
+	if (schedule_tasklet)
+		tasklet_schedule(&tasklet_ctx->task);
 }
 
 /* Callers must verify outbox status in case of err */
+1 −4
@@ -114,10 +114,10 @@ static int mlx5_eq_comp_int(struct notifier_block *nb,
 	struct mlx5_eq *eq = &eq_comp->core;
 	struct mlx5_eqe *eqe;
 	int num_eqes = 0;
-	u32 cqn = -1;
 
 	while ((eqe = next_eqe_sw(eq))) {
 		struct mlx5_core_cq *cq;
+		u32 cqn;
 
 		/* Make sure we read EQ entry contents after we've
 		 * checked the ownership bit.
@@ -144,9 +144,6 @@ static int mlx5_eq_comp_int(struct notifier_block *nb,
 
 	eq_update_ci(eq, 1);
 
-	if (cqn != -1)
-		tasklet_schedule(&eq_comp->tasklet_ctx.task);
-
 	return 0;
 }
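The comment added in mlx5_add_cq_to_tasklet() relies on how the consumer side drains the list. The sketch below models both sides under an assumption taken from that comment: mlx5_cq_tasklet_cb() takes the pending CQs off the list under the same lock before handling them. It does not model any other detail of the real callback. All names (`struct model_ctx`, `model_add_cq()`, `model_tasklet_run()`, `model_tasklet_schedule()`) are hypothetical, and the lock appears only as comments because the model is single-threaded. The assert in `model_add_cq()` checks the property the patch depends on: whenever CQs are waiting on the list, a tasklet run is pending, so no CQ is stranded.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical single-threaded model of the producer/consumer pair. */

#define MAX_CQS 8

struct model_cq {
	bool queued;     /* models a non-empty cq->tasklet_ctx.list */
	int completions; /* times the tasklet ran this CQ's comp handler */
};

struct model_ctx {
	struct model_cq *list[MAX_CQS]; /* models tasklet_ctx->list */
	size_t len;
	bool scheduled;                 /* a tasklet run is pending */
	int schedule_calls;
};

static void model_tasklet_schedule(struct model_ctx *ctx)
{
	ctx->schedule_calls++;
	ctx->scheduled = true;
}

/* Producer side, following the patched mlx5_add_cq_to_tasklet(). */
static void model_add_cq(struct model_ctx *ctx, struct model_cq *cq)
{
	bool schedule_tasklet = false;

	/* spin_lock_irqsave() in the driver */
	if (!cq->queued) {
		schedule_tasklet = (ctx->len == 0);
		assert(ctx->len < MAX_CQS);
		ctx->list[ctx->len++] = cq;
		cq->queued = true;
	}
	/* spin_unlock_irqrestore() in the driver */

	if (schedule_tasklet)
		model_tasklet_schedule(ctx);
	/* Invariant: queued CQs always have a pending tasklet run. */
	assert(ctx->len == 0 || ctx->scheduled);
}

/* Consumer side: detach the whole list (under the lock in the driver),
 * then run each CQ's handler while the shared list is empty again. */
static void model_tasklet_run(struct model_ctx *ctx)
{
	struct model_cq *batch[MAX_CQS];
	size_t n;

	assert(ctx->scheduled);
	ctx->scheduled = false;

	/* lock */
	n = ctx->len;
	for (size_t i = 0; i < n; i++)
		batch[i] = ctx->list[i];
	ctx->len = 0;
	/* unlock */

	for (size_t i = 0; i < n; i++) {
		batch[i]->queued = false;
		batch[i]->completions++;
	}
}
```

After a run drains the list, the next CQ added sees an empty list and schedules the tasklet again, so each batch of queued CQs costs a single schedule call.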