File changed.
Preview size limit exceeded, changes collapsed.
Loading
Previously the HW-GRO code was using a separate page_pool for the header buffer. The pages of the header buffer were replenished via UMR. This mechanism has some drawbacks: - Reference counting on the page_pool page frags is not cheap. - UMRs have HW overhead for updating and also for access. Especially for the KLM type which was previously used. - UMR code for headers is complex. This patch switches to using a static memory area (static MTT MKEY) for the header buffer and does a header memcpy. This happens only once per GRO session. The SKB is allocated from the per-cpu NAPI SKB cache. Performance numbers for x86: +---------------------------------------------------------+ | Test | Baseline | Header Copy | Change | |---------------------+------------+-------------+--------| | iperf3 oncpu | 59.5 Gbps | 64.00 Gbps | 7 % | | iperf3 offcpu | 102.5 Gbps | 104.20 Gbps | 2 % | | kperf oncpu | 115.0 Gbps | 130.00 Gbps | 12 % | | XDP_DROP (skb mode) | 3.9 Mpps | 3.9 Mpps | 0 % | +---------------------------------------------------------+ Notes on test: - System: Intel(R) Xeon(R) Platinum 8380 CPU @ 2.30GHz - oncpu: NAPI and application running on same CPU - offcpu: NAPI and application running on different CPUs - MTU: 1500 - iperf3 tests are single stream, 60s with IPv6 (for slightly larger headers) - kperf version [1] [1] git://git.kernel.dk/kperf.git Suggested-by:Eric Dumazet <edumazet@google.com> Signed-off-by:
Dragos Tatulea <dtatulea@nvidia.com> Signed-off-by:
Tariq Toukan <tariqt@nvidia.com> Reviewed-by:
Jacob Keller <jacob.e.keller@intel.com> Link: https://patch.msgid.link/20260204200345.1724098-1-tariqt@nvidia.com Signed-off-by:
Jakub Kicinski <kuba@kernel.org>
File changed.
Preview size limit exceeded, changes collapsed.