Commit 753c8608 authored by Jakub Kicinski
Daniel Borkmann says:

====================
pull-request: bpf-next 2023-11-30

We've added 30 non-merge commits during the last 7 day(s) which contain
a total of 58 files changed, 1598 insertions(+), 154 deletions(-).

The main changes are:

1) Add initial TX metadata implementation for AF_XDP with support in mlx5
   and stmmac drivers. Two types of offloads are supported right now, that
   is, TX timestamp and TX checksum offload, from Stanislav Fomichev with
   stmmac implementation from Song Yoong Siang.

2) Change BPF verifier logic to validate global subprograms lazily instead
   of unconditionally before the main program, so they can be guarded using
   BPF CO-RE techniques, from Andrii Nakryiko.

3) Add BPF link_info support for uprobe multi link along with bpftool
   integration for the latter, from Jiri Olsa.

4) Use pkg-config in BPF selftests to determine ld flags which is
   in particular needed for linking statically, from Akihiko Odaki.

5) Fix a few BPF selftest failures to adapt to the upcoming LLVM18,
   from Yonghong Song.

* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (30 commits)
  bpf/tests: Remove duplicate JSGT tests
  selftests/bpf: Add TX side to xdp_hw_metadata
  selftests/bpf: Convert xdp_hw_metadata to XDP_USE_NEED_WAKEUP
  selftests/bpf: Add TX side to xdp_metadata
  selftests/bpf: Add csum helpers
  selftests/xsk: Support tx_metadata_len
  xsk: Add option to calculate TX checksum in SW
  xsk: Validate xsk_tx_metadata flags
  xsk: Document tx_metadata_len layout
  net: stmmac: Add Tx HWTS support to XDP ZC
  net/mlx5e: Implement AF_XDP TX timestamp and checksum offload
  tools: ynl: Print xsk-features from the sample
  xsk: Add TX timestamp and TX checksum offload support
  xsk: Support tx_metadata_len
  selftests/bpf: Use pkg-config for libelf
  selftests/bpf: Override PKG_CONFIG for static builds
  selftests/bpf: Choose pkg-config for the target
  bpftool: Add support to display uprobe_multi links
  selftests/bpf: Add link_info test for uprobe_multi link
  selftests/bpf: Use bpf_link__destroy in fill_link_info tests
  ...
====================

Conflicts:

Documentation/netlink/specs/netdev.yaml:
  839ff60d ("net: page_pool: add nlspec for basic access to page pools")
  48eb03dd ("xsk: Add TX timestamp and TX checksum offload support")
https://lore.kernel.org/all/20231201094705.1ee3cab8@canb.auug.org.au/

While at it also regen, tree is dirty after:
  48eb03dd ("xsk: Add TX timestamp and TX checksum offload support")
looks like code wasn't re-rendered after "render-max" was removed.

Link: https://lore.kernel.org/r/20231130145708.32573-1-daniel@iogearbox.net


Signed-off-by: Jakub Kicinski <kuba@kernel.org>
parents 975f2d73 f690ff91
+18 −1
@@ -45,7 +45,6 @@ definitions:
  -
    type: flags
    name: xdp-rx-metadata
    render-max: true
    entries:
      -
        name: timestamp
@@ -55,6 +54,18 @@ definitions:
        name: hash
        doc:
          Device is capable of exposing receive packet hash via bpf_xdp_metadata_rx_hash().
  -
    type: flags
    name: xsk-flags
    entries:
      -
        name: tx-timestamp
        doc:
          HW timestamping egress packets is supported by the driver.
      -
        name: tx-checksum
        doc:
          L3 checksum HW offload is supported by the driver.

attribute-sets:
  -
@@ -86,6 +97,11 @@ attribute-sets:
             See Documentation/networking/xdp-rx-metadata.rst for more details.
        type: u64
        enum: xdp-rx-metadata
      -
        name: xsk-features
        doc: Bitmask of enabled AF_XDP features.
        type: u64
        enum: xsk-flags
  -
    name: page-pool
    attributes:
@@ -209,6 +225,7 @@ operations:
            - xdp-features
            - xdp-zc-max-segs
            - xdp-rx-metadata-features
            - xsk-features
      dump:
        reply: *dev-all
    -
+1 −0
@@ -124,6 +124,7 @@ Contents:
   xfrm_sync
   xfrm_sysctl
   xdp-rx-metadata
   xsk-tx-metadata

.. only::  subproject and html

+2 −0
.. SPDX-License-Identifier: GPL-2.0

===============
XDP RX Metadata
===============
+79 −0
==================
AF_XDP TX Metadata
==================

This document describes how to enable offloads when transmitting packets
via :doc:`af_xdp`. Refer to :doc:`xdp-rx-metadata` on how to access similar
metadata on the receive side.

General Design
==============

The headroom for the metadata is reserved via ``tx_metadata_len`` in
``struct xdp_umem_reg``. The metadata length is therefore the same for
every socket that shares the same umem. The metadata layout is a fixed UAPI;
refer to ``struct xsk_tx_metadata`` in ``include/uapi/linux/if_xdp.h``.
Thus, generally, the ``tx_metadata_len`` field above should contain
``sizeof(struct xsk_tx_metadata)``.

The headroom and the metadata itself should be located right before
``xdp_desc->addr`` in the umem frame. Within a frame, the metadata
layout is as follows::

           tx_metadata_len
     /                         \
    +-----------------+---------+----------------------------+
    | xsk_tx_metadata | padding |          payload           |
    +-----------------+---------+----------------------------+
                                ^
                                |
                          xdp_desc->addr

An AF_XDP application can request headrooms larger than ``sizeof(struct
xsk_tx_metadata)``. The kernel will ignore the padding (and will still
use ``xdp_desc->addr - tx_metadata_len`` to locate
the ``xsk_tx_metadata``). For the frames that shouldn't carry
any metadata (i.e., the ones that don't have ``XDP_TX_METADATA`` option),
the metadata area is ignored by the kernel as well.
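
The address arithmetic above can be sketched in a few lines of C. This is
only an illustration: ``struct demo_tx_metadata`` is a hypothetical local
stand-in for the UAPI ``xsk_tx_metadata`` layout (real code should include
``include/uapi/linux/if_xdp.h``), and ``demo_tx_metadata_at()`` is a helper
name invented for this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical local stand-in for the UAPI metadata layout, so that the
 * sketch builds without recent kernel headers. */
struct demo_tx_metadata {
	uint64_t flags;
	union {
		struct {
			uint16_t csum_start;
			uint16_t csum_offset;
		} request;
		struct {
			uint64_t tx_timestamp;
		} completion;
	};
};

/* Locate the metadata of a frame.
 *
 * umem_area:       start of the umem mapping
 * addr:            value that goes into xdp_desc->addr (start of payload)
 * tx_metadata_len: headroom size registered with the umem
 *
 * The metadata starts tx_metadata_len bytes before the payload; any
 * padding sits between the metadata and the payload. The caller is
 * assumed to pick addr so that the result is suitably aligned.
 */
static inline struct demo_tx_metadata *
demo_tx_metadata_at(void *umem_area, uint64_t addr, uint32_t tx_metadata_len)
{
	assert(tx_metadata_len >= sizeof(struct demo_tx_metadata));
	assert(addr >= tx_metadata_len);

	return (struct demo_tx_metadata *)
		((uint8_t *)umem_area + addr - tx_metadata_len);
}
```

With a 32-byte headroom and a payload at offset 256, the metadata would
start at offset 224 and the last 16 bytes of the headroom would be padding.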

The flags field enables the particular offload:

- ``XDP_TXMD_FLAGS_TIMESTAMP``: requests the device to put the transmission
  timestamp into the ``tx_timestamp`` field of ``struct xsk_tx_metadata``.
- ``XDP_TXMD_FLAGS_CHECKSUM``: requests the device to calculate the L4
  checksum. ``csum_start`` specifies the byte offset where the checksumming
  should start and ``csum_offset`` specifies the byte offset where the
  device should store the computed checksum.

Besides the flags above, in order to trigger the offloads, the first
packet's ``struct xdp_desc`` descriptor should set ``XDP_TX_METADATA``
bit in the ``options`` field. Also note that in a multi-buffer packet
only the first chunk should carry the metadata.
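
A minimal sketch of filling in a request, assuming the metadata and the
descriptor have already been located. All ``demo_``/``DEMO_`` names are
hypothetical stand-ins defined locally so the snippet is self-contained;
the bit positions are assumptions and should be taken from
``include/uapi/linux/if_xdp.h`` in real code.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumed bit values, mirroring XDP_TXMD_FLAGS_* and XDP_TX_METADATA. */
#define DEMO_TXMD_FLAGS_TIMESTAMP	(1ULL << 0)
#define DEMO_TXMD_FLAGS_CHECKSUM	(1ULL << 1)
#define DEMO_TX_METADATA		(1U << 1)

/* Hypothetical local stand-ins for the UAPI structures. */
struct demo_tx_metadata {
	uint64_t flags;
	union {
		struct {
			uint16_t csum_start;
			uint16_t csum_offset;
		} request;
		struct {
			uint64_t tx_timestamp;
		} completion;
	};
};

struct demo_desc {
	uint64_t addr;
	uint32_t len;
	uint32_t options;
};

/* Request both offloads for the packet described by desc. In a
 * multi-buffer packet this should be done for the first chunk only. */
static void demo_request_offloads(struct demo_tx_metadata *meta,
				  struct demo_desc *desc,
				  uint16_t csum_start, uint16_t csum_offset)
{
	assert(meta && desc);

	memset(meta, 0, sizeof(*meta));
	meta->flags = DEMO_TXMD_FLAGS_TIMESTAMP | DEMO_TXMD_FLAGS_CHECKSUM;
	meta->request.csum_start = csum_start;
	meta->request.csum_offset = csum_offset;

	/* Without this bit the kernel ignores the metadata area. */
	desc->options |= DEMO_TX_METADATA;
}
```

For a UDP packet over IPv4 without IP options on Ethernet, for example,
``csum_start`` would be 34 (14-byte Ethernet header plus 20-byte IP header)
and ``csum_offset`` would be 6 (position of the checksum field in the UDP
header).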

Software TX Checksum
====================

For development and testing purposes it's possible to pass the
``XDP_UMEM_TX_SW_CSUM`` flag to the ``XDP_UMEM_REG`` UMEM registration call.
In this case, when running in ``XDP_COPY`` mode, the TX checksum
is calculated on the CPU. Do not enable this option in production because
it will negatively affect performance.
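
To make the meaning of ``csum_start`` and ``csum_offset`` concrete, the
following sketch computes a 16-bit ones' complement checksum on the CPU.
It is not the kernel's implementation; the function names are hypothetical,
and the sketch simply sums whatever is in the checksum field, so a
pseudo-header sum (if the protocol needs one) is assumed to have been
placed there by the caller.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 16-bit ones' complement checksum over len bytes, big-endian words. */
static uint16_t demo_csum(const uint8_t *data, size_t len)
{
	uint32_t sum = 0;
	size_t i;

	for (i = 0; i + 1 < len; i += 2)
		sum += ((uint32_t)data[i] << 8) | data[i + 1];
	if (i < len)			/* odd trailing byte */
		sum += (uint32_t)data[i] << 8;

	while (sum >> 16)		/* fold carries back in */
		sum = (sum & 0xffff) + (sum >> 16);

	return (uint16_t)~sum;
}

/* Checksum everything from csum_start to the end of the packet and store
 * the result csum_offset bytes after csum_start. */
static void demo_sw_csum(uint8_t *pkt, size_t len,
			 uint16_t csum_start, uint16_t csum_offset)
{
	size_t field = (size_t)csum_start + csum_offset;
	uint16_t csum;

	assert(field + 2 <= len);

	csum = demo_csum(pkt + csum_start, len - csum_start);
	pkt[field] = (uint8_t)(csum >> 8);
	pkt[field + 1] = (uint8_t)(csum & 0xff);
}
```

If the checksum field was zero beforehand, running ``demo_csum()`` over the
same range again after ``demo_sw_csum()`` yields zero, which is the usual
receiver-side validity check.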

Querying Device Capabilities
============================

Every device exports its offload capabilities via the netlink netdev family.
Refer to the ``xsk-flags`` features bitmask in
``Documentation/netlink/specs/netdev.yaml``.

- ``tx-timestamp``: device supports ``XDP_TXMD_FLAGS_TIMESTAMP``
- ``tx-checksum``: device supports ``XDP_TXMD_FLAGS_CHECKSUM``

See ``tools/net/ynl/samples/netdev.c`` on how to query this information.
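
Once the ``xsk-features`` bitmask has been read, an application may want
to drop the offloads the device does not advertise. The sketch below
assumes that the ``xsk-flags`` entries are numbered from bit 0 in the
order they appear in the spec (``tx-timestamp`` first, ``tx-checksum``
second); all ``DEMO_`` constants and the helper are hypothetical names
for illustration only.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed bit positions of the xsk-flags entries. */
#define DEMO_XSK_FLAGS_TX_TIMESTAMP	(1ULL << 0)
#define DEMO_XSK_FLAGS_TX_CHECKSUM	(1ULL << 1)

/* Assumed values of the per-packet request flags. */
#define DEMO_TXMD_FLAGS_TIMESTAMP	(1ULL << 0)
#define DEMO_TXMD_FLAGS_CHECKSUM	(1ULL << 1)

/* Return the subset of the wanted request flags that the device
 * advertises in its xsk-features bitmask. */
static uint64_t demo_supported_txmd_flags(uint64_t xsk_features,
					  uint64_t wanted)
{
	uint64_t supported = 0;

	/* Only the two request flags handled below are understood here. */
	assert((wanted & ~(DEMO_TXMD_FLAGS_TIMESTAMP |
			   DEMO_TXMD_FLAGS_CHECKSUM)) == 0);

	if (xsk_features & DEMO_XSK_FLAGS_TX_TIMESTAMP)
		supported |= DEMO_TXMD_FLAGS_TIMESTAMP;
	if (xsk_features & DEMO_XSK_FLAGS_TX_CHECKSUM)
		supported |= DEMO_TXMD_FLAGS_CHECKSUM;

	return wanted & supported;
}
```

An application could then fall back to computing checksums itself when
the checksum flag is missing from the result.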

Example
=======

See ``tools/testing/selftests/bpf/xdp_hw_metadata.c`` for an example
program that handles TX metadata. Also see https://github.com/fomichev/xskgen
for a more bare-bones example.
+3 −1
@@ -484,10 +484,12 @@ struct mlx5e_xdp_info_fifo {

struct mlx5e_xdpsq;
struct mlx5e_xmit_data;
struct xsk_tx_metadata;
typedef int (*mlx5e_fp_xmit_xdp_frame_check)(struct mlx5e_xdpsq *);
typedef bool (*mlx5e_fp_xmit_xdp_frame)(struct mlx5e_xdpsq *,
					struct mlx5e_xmit_data *,
					int);
					int,
					struct xsk_tx_metadata *);

struct mlx5e_xdpsq {
	/* data path */