Commit 86b721bb authored by Jakub Kicinski

Merge branch 'net-use-skb_attempt_defer_free-in-napi_consume_skb'

Eric Dumazet says:

====================
net: use skb_attempt_defer_free() in napi_consume_skb()

There is a lack of NUMA awareness and, more generally, a lack
of slab cache affinity on the TX completion path.

Modern drivers use napi_consume_skb(), hoping to cache sk_buff
in per-cpu caches so that they can be recycled in the RX path.

Only use these per-cpu caches if the skb was allocated on the same cpu;
otherwise use skb_attempt_defer_free() so that the skb
is freed on the original cpu.

This removes contention on SLUB spinlocks and data structures,
and makes sure that recycled sk_buffs have correct NUMA locality.

After this series, I get a ~50% improvement for a UDP TX workload
on an AMD EPYC 9B45 (IDPF 200Gbit NIC with 32 TX queues).

I will later refactor skb_attempt_defer_free()
so that it no longer has to care about skb_shared() and skb_release_head_state().
====================

Link: https://patch.msgid.link/20251106202935.1776179-1-edumazet@google.com


Signed-off-by: Jakub Kicinski <kuba@kernel.org>
parents fd9557c3 b6178585
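
The dispatch rule described in the cover letter can be modeled in user space. The sketch below is only an illustration under stated assumptions: `struct skb_model`, `consume_skb_model()`, `defer_count[]` and the `FREE_*` names are hypothetical and do not exist in the kernel, the limit of 128 mirrors the new `skb_defer_max` default, and the fallback taken when the per-cpu list is full is a simplification that this diff does not show.

```c
#include <assert.h>

#define NR_CPUS_MODEL        4    /* hypothetical: number of cpus in the model */
#define SKB_DEFER_MAX_MODEL  128  /* mirrors the new skb_defer_max default */

/* Outcome of one napi_consume_skb()-like call in this model. */
enum free_path {
	FREE_LOCAL_CACHE,  /* same cpu: recycle through the local per-cpu cache */
	FREE_DEFERRED,     /* other cpu: queue on the allocating cpu's defer list */
	FREE_IMMEDIATE,    /* other cpu, list full: free right away (assumed fallback) */
	FREE_UNREF_ONLY,   /* shared skb: only drop one reference */
};

/* Hypothetical stand-in for the two sk_buff fields the check looks at. */
struct skb_model {
	int alloc_cpu;  /* cpu that allocated the buffer */
	int users;      /* reference count; > 1 means shared */
};

/* Per-cpu count of buffers waiting to be freed by their allocating cpu. */
static int defer_count[NR_CPUS_MODEL];

static enum free_path consume_skb_model(struct skb_model *skb, int this_cpu)
{
	assert(skb->alloc_cpu >= 0 && skb->alloc_cpu < NR_CPUS_MODEL);

	/* New fast path: unshared skb completed on a cpu other than its allocator. */
	if (skb->alloc_cpu != this_cpu && skb->users == 1) {
		if (defer_count[skb->alloc_cpu] < SKB_DEFER_MAX_MODEL) {
			defer_count[skb->alloc_cpu]++;
			return FREE_DEFERRED;
		}
		return FREE_IMMEDIATE;
	}

	/* Pre-existing path: drop a reference, free only when it was the last one. */
	if (skb->users > 1) {
		skb->users--;
		return FREE_UNREF_ONLY;
	}
	return FREE_LOCAL_CACHE;
}
```

In this model, a buffer with `alloc_cpu == 1` completed on cpu 0 returns `FREE_DEFERRED` and bumps `defer_count[1]`, while the same buffer completed on cpu 1 returns `FREE_LOCAL_CACHE`.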
+2 −2
@@ -355,9 +355,9 @@ skb_defer_max
-------------

Max size (in skbs) of the per-cpu list of skbs being freed
-by the cpu which allocated them. Used by TCP stack so far.
+by the cpu which allocated them.

-Default: 64
+Default: 128

optmem_max
----------
+1 −1
@@ -20,7 +20,7 @@ struct net_hotdata net_hotdata __cacheline_aligned = {
	.dev_tx_weight = 64,
	.dev_rx_weight = 64,
	.sysctl_max_skb_frags = MAX_SKB_FRAGS,
-	.sysctl_skb_defer_max = 64,
+	.sysctl_skb_defer_max = 128,
	.sysctl_mem_pcpu_rsv = SK_MEMORY_PCPU_RESERVE
};
EXPORT_SYMBOL(net_hotdata);
+8 −4
@@ -1149,11 +1149,10 @@ void skb_release_head_state(struct sk_buff *skb)
				skb);

#endif
+		skb->destructor = NULL;
	}
-#if IS_ENABLED(CONFIG_NF_CONNTRACK)
-	nf_conntrack_put(skb_nfct(skb));
-#endif
-	skb_ext_put(skb);
+	nf_reset_ct(skb);
+	skb_ext_reset(skb);
}

/* Free everything but the sk_buff shell. */
@@ -1477,6 +1476,11 @@ void napi_consume_skb(struct sk_buff *skb, int budget)

	DEBUG_NET_WARN_ON_ONCE(!in_softirq());

+	if (skb->alloc_cpu != smp_processor_id() && !skb_shared(skb)) {
+		skb_release_head_state(skb);
+		return skb_attempt_defer_free(skb);
+	}
+
	if (!skb_unref(skb))
		return;