+1
−1
+8
−4
Loading
Eric Dumazet says: ==================== net: use skb_attempt_defer_free() in napi_consume_skb() There is a lack of NUMA awareness and more generally lack of slab caches affinity on TX completion path. Modern drivers are using napi_consume_skb(), hoping to cache sk_buff in per-cpu caches so that they can be recycled in RX path. Only use this if the skb was allocated on the same cpu, otherwise use skb_attempt_defer_free() so that the skb is freed on the original cpu. This removes contention on SLUB spinlocks and data structures, and this makes sure that recycled sk_buff have correct NUMA locality. After this series, I get ~50% improvement for an UDP tx workload on an AMD EPYC 9B45 (IDPF 200Gbit NIC with 32 TX queues). I will later refactor skb_attempt_defer_free() to no longer have to care of skb_shared() and skb_release_head_state(). ==================== Link: https://patch.msgid.link/20251106202935.1776179-1-edumazet@google.com Signed-off-by:Jakub Kicinski <kuba@kernel.org>