Commit 7f7a958a authored by Martin KaFai Lau

Merge branch 'add-a-dynptr-type-for-skb-metadata-for-tc-bpf'

Jakub Sitnicki says:

====================
Add a dynptr type for skb metadata for TC BPF

TL;DR
-----

This is the first step in an effort which aims to enable skb metadata
access for all BPF programs which operate on an skb context.

By skb metadata we mean the custom metadata area which can be allocated
from an XDP program with the bpf_xdp_adjust_meta helper [1]. Network stack
code accesses it using the skb_metadata_* helpers.
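
As a rough illustration of that layout, here is a userspace sketch (not
kernel code). It assumes only what is stated above and in the diff comments:
the metadata sits directly in front of the packet data, and its start is the
metadata end minus the metadata length. The names struct model_buf,
model_grow_meta, model_meta_end and model_meta_start are hypothetical and
exist only for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace stand-in for a packet buffer; not a kernel type. */
struct model_buf {
	uint8_t mem[256];	/* headroom, then metadata, then packet bytes */
	size_t data_off;	/* offset of the first packet byte */
	size_t meta_len;	/* metadata bytes directly in front of the data */
};

/*
 * Grow the metadata area by len bytes toward the headroom. This loosely
 * models what an XDP program asks for with bpf_xdp_adjust_meta(); the real
 * helper has additional constraints that are not modelled here.
 */
static int model_grow_meta(struct model_buf *b, size_t len)
{
	if (len > b->data_off - b->meta_len)
		return -1;	/* not enough headroom left */
	b->meta_len += len;
	return 0;
}

/* Model of skb_metadata_end(): metadata ends where the packet data starts. */
static uint8_t *model_meta_end(struct model_buf *b)
{
	return b->mem + b->data_off;
}

/* Model of skb_metadata_end() - skb_metadata_len(): start of the metadata. */
static uint8_t *model_meta_start(struct model_buf *b)
{
	return model_meta_end(b) - b->meta_len;
}
```

In this model, a buffer with data_off = 64 that grows its metadata by 8 bytes
ends up with the metadata occupying offsets 56 through 63.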

Changelog
---------
Changes in v7:
- Make dynptr read-only for cloned skbs for now. (Martin)
- Extend tests for skb clones to cover writes to metadata.
- Drop Jesse's review stamp for patch 2 due to an update.
- Link to v6: https://lore.kernel.org/r/20250804-skb-metadata-thru-dynptr-v6-0-05da400bfa4b@cloudflare.com

Changes in v6:
- Enable CONFIG_NET_ACT_MIRRED for bpf selftests to fix CI failure
- Switch from u32 to matchall classifier, which bpf selftests already use
- Link to v5: https://lore.kernel.org/r/20250731-skb-metadata-thru-dynptr-v5-0-f02f6b5688dc@cloudflare.com

Changes in v5:
- Invalidate skb payload and metadata slices on write to metadata. (Martin)
- Drop redundant bounds check in bpf_skb_meta_*(). (Martin)
- Check for unexpected flags in __bpf_dynptr_write(). (Martin)
- Fold bpf_skb_meta_{load,store}_bytes() into callers.
- Add a test for metadata access when an skb clone has been modified.
- Drop Eduard's Ack for patch 3. Patch updated.
- Keep Eduard's Ack for patches 4-8.
- Add Jesse's stamp from an internal review.
- Link to v4: https://lore.kernel.org/r/20250723-skb-metadata-thru-dynptr-v4-0-a0fed48bcd37@cloudflare.com

Changes in v4:
- Kill bpf_dynptr_from_skb_meta_rdonly. Not needed for now. (Martin)
- Add a test to cover passing OOB offsets to dynptr ops. (Eduard)
- Factor out bounds checks from bpf_dynptr_{read,write,slice}. (Eduard)
- Squash patches:
      bpf: Enable read access to skb metadata with bpf_dynptr_read
      bpf: Enable write access to skb metadata with bpf_dynptr_write
      bpf: Enable read-write access to skb metadata with dynptr slice
- Kept Eduard's Acks for v3 on unchanged patches.
- Link to v3: https://lore.kernel.org/r/20250721-skb-metadata-thru-dynptr-v3-0-e92be5534174@cloudflare.com

Changes in v3:
- Add a kfunc set for skb metadata access. Limited to TC BPF. (Martin)
- Drop patches related to skb metadata access outside of TC BPF:
      net: Clear skb metadata on handover from device to protocol
      selftests/bpf: Cover lack of access to skb metadata at ip layer
      selftests/bpf: Count successful bpf program runs
- Link to v2: https://lore.kernel.org/r/20250716-skb-metadata-thru-dynptr-v2-0-5f580447e1df@cloudflare.com

Changes in v2:
- Switch to a dedicated dynptr type for skb metadata (Andrii)
- Add verifier test coverage since we now touch its code
- Add missing test coverage for bpf_dynptr_adjust and access at an offset
- Link to v1: https://lore.kernel.org/r/20250630-skb-metadata-thru-dynptr-v1-0-f17da13625d8@cloudflare.com

Overview
--------

Today, the skb metadata is accessible only to BPF TC ingress programs
through the __sk_buff->data_meta pointer. We propose a three-step plan to
make skb metadata available to all other BPF programs which operate on skb
objects:

 1) Add a dynptr type for skb metadata (this patch set)

    This is a preparatory step, but it also stands on its own. Here we
    enable access to the skb metadata through a bpf_dynptr, the same way we
    can already access the skb payload today.

    As the next step (2), we want to relocate the metadata as the skb
    travels through the network stack in order to persist it. That will
    require a safe way to access the metadata area irrespective of its
    location.

    This is where the dynptr [2] comes into play. It solves exactly that
    problem. A dynptr to skb metadata can be backed by a memory area that
    resides in a different location depending on the code path.

 2) Persist skb metadata past the TC hook (future)

    Having the metadata in front of the packet headers as the skb travels
    through the network stack is problematic - see the discussion of
    alternative approaches below. Hence, we plan to relocate it as
    necessary past the TC hook.

    Where to relocate it? We don't know yet. There are a couple of
    options: (i) move it to the top of skb headroom, or (ii) allocate
    dedicated memory for it.  They are not mutually exclusive. The right
    solution might be a mix.

    When to relocate it? That is also an open question. It could be done
    during device to protocol handover or lazily when headers get pushed or
    headroom gets resized.

 3) skb dynptr for sockops, sk_lookup, etc. (future)

    There are BPF program types that don't operate on an __sk_buff context,
    but either have, or could have, access to the skb itself. As a final
    touch, we want to provide a way to create an skb metadata dynptr for
    these program types.
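
The property step 1 relies on, access that keeps working wherever the
metadata currently lives, can be sketched in userspace as follows. This is a
simplified model under stated assumptions, not the kernel's struct
bpf_dynptr_kern: struct meta_dynptr and the meta_dynptr_* functions are
hypothetical names, and the extra level of indirection stands in for the
kernel resolving the metadata location at access time.

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of a metadata dynptr; not a kernel structure. */
struct meta_dynptr {
	uint8_t **base;		/* owner's current metadata pointer */
	uint32_t *len;		/* owner's current metadata length */
	uint32_t offset;	/* start offset inside the metadata area */
};

/* Reject accesses that would run past the end of the metadata area. */
static int meta_dynptr_check(const struct meta_dynptr *p, uint32_t off,
			     uint32_t len)
{
	uint64_t end = (uint64_t)p->offset + off + len;

	return end <= *p->len ? 0 : -E2BIG;
}

/* Copy len bytes out of the metadata, wherever it currently lives. */
static int meta_dynptr_read(void *dst, uint32_t len,
			    const struct meta_dynptr *p, uint32_t off)
{
	int err = meta_dynptr_check(p, off, len);

	if (err)
		return err;
	memmove(dst, *p->base + p->offset + off, len);
	return 0;
}

/* Copy len bytes into the metadata; no flags are accepted in this model. */
static int meta_dynptr_write(const struct meta_dynptr *p, uint32_t off,
			     const void *src, uint32_t len, uint64_t flags)
{
	int err = meta_dynptr_check(p, off, len);

	if (err)
		return err;
	if (flags)
		return -EINVAL;
	memmove(*p->base + p->offset + off, src, len);
	return 0;
}
```

If the owner copies the metadata elsewhere and updates the pointer behind
base, later reads and writes through the same handle reach the new location
without the caller changing anything. In BPF programs the corresponding
handle would be obtained with the bpf_dynptr_from_skb_meta kfunc added by
this series.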

TIMTOWTDI
---------

Alternative approaches which we considered:

* Keep the metadata always in front of skb->data

We think it is a bad idea for two reasons, outlined below. Nevertheless we
are open to it, if necessary.

 1) Performance concerns

    It would require the network stack to move the metadata on each header
    pull/push - see skb_reorder_vlan_header() [3] for an example. While
    doable, there is an expected performance overhead.

 2) Potential for bugs

    In addition to updating skb_push/pull and pskb_expand_head, we would
    need to audit any code paths which operate on the skb->data pointer
    directly without going through the helpers. This creates a "known
    unknown" risk.

* Design a new custom metadata area from scratch

We have tried that in Arthur's patch set [4]. One of the outcomes of the
discussion there was that we don't want to have two places to store custom
metadata. Hence the change of approach to make the existing custom metadata
area work.
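
To make the performance concern of the first alternative concrete, here is a
userspace sketch of what "move the metadata on each header push" means. It
is an illustration only: struct pkt_model and push_keep_meta are
hypothetical names, and the real network stack paths are more involved.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical packet model: headroom, then metadata, then packet data. */
struct pkt_model {
	uint8_t buf[128];
	size_t data_off;	/* offset of the first packet byte */
	size_t meta_len;	/* metadata bytes directly in front of the data */
};

/*
 * Make room for hdr_len bytes of new header in front of the data while
 * keeping the metadata adjacent to the new data start. Returns the number
 * of metadata bytes that had to be moved, or -1 if headroom is too small.
 */
static long push_keep_meta(struct pkt_model *p, size_t hdr_len)
{
	size_t meta_off = p->data_off - p->meta_len;

	if (hdr_len > meta_off)
		return -1;

	/* The extra work: every push relocates the whole metadata area. */
	memmove(p->buf + meta_off - hdr_len, p->buf + meta_off, p->meta_len);
	p->data_off -= hdr_len;
	return (long)p->meta_len;
}
```

In this model each push costs one memmove of meta_len bytes on top of the
header write itself, which is the kind of per-packet overhead the first
alternative would add.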

-jkbs

[1] https://docs.ebpf.io/linux/helper-function/bpf_xdp_adjust_meta/
[2] https://docs.ebpf.io/linux/concepts/dynptrs/
[3] https://elixir.bootlin.com/linux/v6.16-rc6/source/net/core/skbuff.c#L6211
[4] https://lore.kernel.org/all/20250422-afabre-traits-010-rfc2-v2-0-92bcc6b146c9@arthurfabre.com/
====================

Link: https://patch.msgid.link/20250814-skb-metadata-thru-dynptr-v7-0-8a39e636e0fb@cloudflare.com


Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
parents 8f5ae30d 403fae59
+6 −1
@@ -767,12 +767,15 @@ enum bpf_type_flag {
	 */
	MEM_WRITE		= BIT(18 + BPF_BASE_TYPE_BITS),

	/* DYNPTR points to skb_metadata_end()-skb_metadata_len() */
	DYNPTR_TYPE_SKB_META	= BIT(19 + BPF_BASE_TYPE_BITS),

	__BPF_TYPE_FLAG_MAX,
	__BPF_TYPE_LAST_FLAG	= __BPF_TYPE_FLAG_MAX - 1,
};

#define DYNPTR_TYPE_FLAG_MASK	(DYNPTR_TYPE_LOCAL | DYNPTR_TYPE_RINGBUF | DYNPTR_TYPE_SKB \
-				 | DYNPTR_TYPE_XDP)
+				 | DYNPTR_TYPE_XDP | DYNPTR_TYPE_SKB_META)

/* Max number of base types. */
#define BPF_BASE_TYPE_LIMIT	(1UL << BPF_BASE_TYPE_BITS)
@@ -1358,6 +1361,8 @@ enum bpf_dynptr_type {
	BPF_DYNPTR_TYPE_SKB,
	/* Underlying data is a xdp_buff */
	BPF_DYNPTR_TYPE_XDP,
	/* Points to skb_metadata_end()-skb_metadata_len() */
	BPF_DYNPTR_TYPE_SKB_META,
};

int bpf_dynptr_check_size(u32 size);
+6 −0
@@ -1784,6 +1784,7 @@ int __bpf_xdp_store_bytes(struct xdp_buff *xdp, u32 offset, void *buf, u32 len);
void *bpf_xdp_pointer(struct xdp_buff *xdp, u32 offset, u32 len);
void bpf_xdp_copy_buf(struct xdp_buff *xdp, unsigned long off,
		      void *buf, unsigned long len, bool flush);
void *bpf_skb_meta_pointer(struct sk_buff *skb, u32 offset);
#else /* CONFIG_NET */
static inline int __bpf_skb_load_bytes(const struct sk_buff *skb, u32 offset,
				       void *to, u32 len)
@@ -1818,6 +1819,11 @@ static inline void bpf_xdp_copy_buf(struct xdp_buff *xdp, unsigned long off, voi
				    unsigned long len, bool flush)
{
}

static inline void *bpf_skb_meta_pointer(struct sk_buff *skb, u32 offset)
{
	return NULL;
}
#endif /* CONFIG_NET */

#endif /* __LINUX_FILTER_H__ */
+11 −0
@@ -1780,6 +1780,9 @@ static int __bpf_dynptr_read(void *dst, u32 len, const struct bpf_dynptr_kern *s
		return __bpf_skb_load_bytes(src->data, src->offset + offset, dst, len);
	case BPF_DYNPTR_TYPE_XDP:
		return __bpf_xdp_load_bytes(src->data, src->offset + offset, dst, len);
	case BPF_DYNPTR_TYPE_SKB_META:
		memmove(dst, bpf_skb_meta_pointer(src->data, src->offset + offset), len);
		return 0;
	default:
		WARN_ONCE(true, "bpf_dynptr_read: unknown dynptr type %d\n", type);
		return -EFAULT;
@@ -1836,6 +1839,11 @@ int __bpf_dynptr_write(const struct bpf_dynptr_kern *dst, u32 offset, void *src,
		if (flags)
			return -EINVAL;
		return __bpf_xdp_store_bytes(dst->data, dst->offset + offset, src, len);
	case BPF_DYNPTR_TYPE_SKB_META:
		if (flags)
			return -EINVAL;
		memmove(bpf_skb_meta_pointer(dst->data, dst->offset + offset), src, len);
		return 0;
	default:
		WARN_ONCE(true, "bpf_dynptr_write: unknown dynptr type %d\n", type);
		return -EFAULT;
@@ -1882,6 +1890,7 @@ BPF_CALL_3(bpf_dynptr_data, const struct bpf_dynptr_kern *, ptr, u32, offset, u3
		return (unsigned long)(ptr->data + ptr->offset + offset);
	case BPF_DYNPTR_TYPE_SKB:
	case BPF_DYNPTR_TYPE_XDP:
	case BPF_DYNPTR_TYPE_SKB_META:
		/* skb and xdp dynptrs should use bpf_dynptr_slice / bpf_dynptr_slice_rdwr */
		return 0;
	default:
@@ -2710,6 +2719,8 @@ __bpf_kfunc void *bpf_dynptr_slice(const struct bpf_dynptr *p, u32 offset,
		bpf_xdp_copy_buf(ptr->data, ptr->offset + offset, buffer__opt, len, false);
		return buffer__opt;
	}
	case BPF_DYNPTR_TYPE_SKB_META:
		return bpf_skb_meta_pointer(ptr->data, ptr->offset + offset);
	default:
		WARN_ONCE(true, "unknown dynptr type %d\n", type);
		return NULL;
+2 −0
@@ -498,6 +498,8 @@ const char *dynptr_type_str(enum bpf_dynptr_type type)
		return "skb";
	case BPF_DYNPTR_TYPE_XDP:
		return "xdp";
	case BPF_DYNPTR_TYPE_SKB_META:
		return "skb_meta";
	case BPF_DYNPTR_TYPE_INVALID:
		return "<invalid>";
	default:
+13 −2
@@ -674,6 +674,8 @@ static enum bpf_dynptr_type arg_to_dynptr_type(enum bpf_arg_type arg_type)
		return BPF_DYNPTR_TYPE_SKB;
	case DYNPTR_TYPE_XDP:
		return BPF_DYNPTR_TYPE_XDP;
	case DYNPTR_TYPE_SKB_META:
		return BPF_DYNPTR_TYPE_SKB_META;
	default:
		return BPF_DYNPTR_TYPE_INVALID;
	}
@@ -690,6 +692,8 @@ static enum bpf_type_flag get_dynptr_type_flag(enum bpf_dynptr_type type)
		return DYNPTR_TYPE_SKB;
	case BPF_DYNPTR_TYPE_XDP:
		return DYNPTR_TYPE_XDP;
	case BPF_DYNPTR_TYPE_SKB_META:
		return DYNPTR_TYPE_SKB_META;
	default:
		return 0;
	}
@@ -2274,7 +2278,8 @@ static bool reg_is_pkt_pointer_any(const struct bpf_reg_state *reg)
static bool reg_is_dynptr_slice_pkt(const struct bpf_reg_state *reg)
{
	return base_type(reg->type) == PTR_TO_MEM &&
-		(reg->type & DYNPTR_TYPE_SKB || reg->type & DYNPTR_TYPE_XDP);
+	       (reg->type &
+		(DYNPTR_TYPE_SKB | DYNPTR_TYPE_XDP | DYNPTR_TYPE_SKB_META));
}
/* Unmodified PTR_TO_PACKET[_META,_END] register from ctx access. */
@@ -11641,7 +11646,8 @@ static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn
		if (dynptr_type == BPF_DYNPTR_TYPE_INVALID)
			return -EFAULT;
-		if (dynptr_type == BPF_DYNPTR_TYPE_SKB)
+		if (dynptr_type == BPF_DYNPTR_TYPE_SKB ||
+		    dynptr_type == BPF_DYNPTR_TYPE_SKB_META)
			/* this will trigger clear_all_pkt_pointers(), which will
			 * invalidate all dynptr slices associated with the skb
			 */
@@ -12228,6 +12234,7 @@ enum special_kfunc_type {
	KF_bpf_rbtree_right,
	KF_bpf_dynptr_from_skb,
	KF_bpf_dynptr_from_xdp,
	KF_bpf_dynptr_from_skb_meta,
	KF_bpf_dynptr_slice,
	KF_bpf_dynptr_slice_rdwr,
	KF_bpf_dynptr_clone,
@@ -12277,9 +12284,11 @@ BTF_ID(func, bpf_rbtree_right)
#ifdef CONFIG_NET
BTF_ID(func, bpf_dynptr_from_skb)
BTF_ID(func, bpf_dynptr_from_xdp)
BTF_ID(func, bpf_dynptr_from_skb_meta)
#else
BTF_ID_UNUSED
BTF_ID_UNUSED
BTF_ID_UNUSED
#endif
BTF_ID(func, bpf_dynptr_slice)
BTF_ID(func, bpf_dynptr_slice_rdwr)
@@ -13253,6 +13262,8 @@ static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_
				dynptr_arg_type |= DYNPTR_TYPE_SKB;
			} else if (meta->func_id == special_kfunc_list[KF_bpf_dynptr_from_xdp]) {
				dynptr_arg_type |= DYNPTR_TYPE_XDP;
			} else if (meta->func_id == special_kfunc_list[KF_bpf_dynptr_from_skb_meta]) {
				dynptr_arg_type |= DYNPTR_TYPE_SKB_META;
			} else if (meta->func_id == special_kfunc_list[KF_bpf_dynptr_clone] &&
				   (dynptr_arg_type & MEM_UNINIT)) {
				enum bpf_dynptr_type parent_type = meta->initialized_dynptr.type;