Commit abd0c0f6 authored by Martin KaFai Lau's avatar Martin KaFai Lau
Browse files

Merge branch 'make-tc-bpf-helpers-preserve-skb-metadata'

Jakub Sitnicki says:

====================
Make TC BPF helpers preserve skb metadata

Changes in v4:
- Fix copy-paste bug in check_metadata() test helper (AI review)
- Add "out of scope" section (at the bottom)
- Link to v3: https://lore.kernel.org/r/20251026-skb-meta-rx-path-v3-0-37cceebb95d3@cloudflare.com

Changes in v3:
- Use the already existing BPF_STREAM_STDERR const in tests (Martin)
- Unclone skb head on bpf_dynptr_write to skb metadata (patch 3) (Martin)
- Swap order of patches 1 & 2 to refer to skb_postpush_data_move() in docs
- Mention in skb_data_move() docs how to move just the metadata
- Note in pskb_expand_head() docs to move metadata after skb_push() (Jakub)
- Link to v2: https://lore.kernel.org/r/20251019-skb-meta-rx-path-v2-0-f9a58f3eb6d6@cloudflare.com

Changes in v2:
- Tweak WARN_ON_ONCE check in skb_data_move() (patch 2)
- Convert all tests to verify skb metadata in BPF (patches 9-10)
- Add test coverage for modified BPF helpers (patches 12-15)
- Link to RFCv1: https://lore.kernel.org/r/20250929-skb-meta-rx-path-v1-0-de700a7ab1cb@cloudflare.com

This patch set continues our work [1] to allow BPF programs and user-space
applications to attach multiple bytes of metadata to packets via the
XDP/skb metadata area.

The focus of this patch set it to ensure that skb metadata remains intact
when packets pass through a chain of TC BPF programs that call helpers
which operate on skb head.

Currently, several helpers that either adjust the skb->data pointer or
reallocate skb->head do not preserve metadata at its expected location,
that is immediately in front of the MAC header. These are:

- bpf_skb_adjust_room
- bpf_skb_change_head
- bpf_skb_change_proto
- bpf_skb_change_tail
- bpf_skb_vlan_pop
- bpf_skb_vlan_push

In TC BPF context, metadata must be moved whenever skb->data changes to
keep the skb->data_meta pointer valid. I don't see any way around
it. Creative ideas how to avoid that would be very welcome.

With that in mind, we can patch the helpers in at least two different ways:

1. Integrate metadata move into header move

   Replace the existing memmove, which follows skb_push/pull, with a helper
   that moves both headers and metadata in a single call. This avoids an
   extra memmove but reduces transparency.

        skb_pull(skb, len);
-       memmove(skb->data, skb->data - len, n);
+       skb_postpull_data_move(skb, len, n);
        skb->mac_header += len;

        skb_push(skb, len)
-       memmove(skb->data, skb->data + len, n);
+       skb_postpush_data_move(skb, len, n);
        skb->mac_header -= len;

2. Move metadata separately

   Add a dedicated metadata move after the header move. This is more
   explicit but costs an additional memmove.

        skb_pull(skb, len);
        memmove(skb->data, skb->data - len, n);
+       skb_metadata_postpull_move(skb, len);
        skb->mac_header += len;

        skb_push(skb, len)
+       skb_metadata_postpush_move(skb, len);
        memmove(skb->data, skb->data + len, n);
        skb->mac_header -= len;

This patch set implements option (1), expecting that "you can have just one
memmove" will be the most obvious feedback, while readability is a,
somewhat subjective, matter of taste, which I don't claim to have ;-)

The structure of the patch set is as follows:

- patches 1-4 prepare ground for safe-proofing the BPF helpers
- patches 5-9 modify the BPF helpers to preserve skb metadata
- patches 10-11 prepare ground for metadata tests with BPF helper calls
- patches 12-16 adapt and expand tests to cover the made changes

Out of scope for this series:
- safe-proofing tunnel & tagging devices - VLAN, GRE, ...
  (next in line, in development preview at [2])
- metadata access after packet foward
  (to do after Rx path - once metadata reliably reaches sk_filter)

Thanks,
-jkbs

[1] https://lore.kernel.org/all/20250814-skb-metadata-thru-dynptr-v7-0-8a39e636e0fb@cloudflare.com/
[2] https://github.com/jsitnicki/linux/commits/skb-meta/safeproof-netdevs/
====================

Link: https://patch.msgid.link/20251105-skb-meta-rx-path-v4-0-5ceb08a9b37b@cloudflare.com


Signed-off-by: default avatarMartin KaFai Lau <martin.lau@kernel.org>
parents a0c3aefb d2c5cca3
Loading
Loading
Loading
Loading
+9 −0
Original line number Diff line number Diff line
@@ -1781,6 +1781,8 @@ int __bpf_xdp_store_bytes(struct xdp_buff *xdp, u32 offset, void *buf, u32 len);
void *bpf_xdp_pointer(struct xdp_buff *xdp, u32 offset, u32 len);
void bpf_xdp_copy_buf(struct xdp_buff *xdp, unsigned long off,
		      void *buf, unsigned long len, bool flush);
int __bpf_skb_meta_store_bytes(struct sk_buff *skb, u32 offset,
			       const void *from, u32 len, u64 flags);
void *bpf_skb_meta_pointer(struct sk_buff *skb, u32 offset);
#else /* CONFIG_NET */
static inline int __bpf_skb_load_bytes(const struct sk_buff *skb, u32 offset,
@@ -1817,6 +1819,13 @@ static inline void bpf_xdp_copy_buf(struct xdp_buff *xdp, unsigned long off, voi
{
}

static inline int __bpf_skb_meta_store_bytes(struct sk_buff *skb, u32 offset,
					     const void *from, u32 len,
					     u64 flags)
{
	return -EOPNOTSUPP;
}

static inline void *bpf_skb_meta_pointer(struct sk_buff *skb, u32 offset)
{
	return ERR_PTR(-EOPNOTSUPP);
+6 −7
Original line number Diff line number Diff line
@@ -355,16 +355,17 @@ static inline int __vlan_insert_inner_tag(struct sk_buff *skb,
					  __be16 vlan_proto, u16 vlan_tci,
					  unsigned int mac_len)
{
	const u8 meta_len = mac_len > ETH_TLEN ? skb_metadata_len(skb) : 0;
	struct vlan_ethhdr *veth;

	if (skb_cow_head(skb, VLAN_HLEN) < 0)
	if (skb_cow_head(skb, meta_len + VLAN_HLEN) < 0)
		return -ENOMEM;

	skb_push(skb, VLAN_HLEN);

	/* Move the mac header sans proto to the beginning of the new header. */
	if (likely(mac_len > ETH_TLEN))
		memmove(skb->data, skb->data + VLAN_HLEN, mac_len - ETH_TLEN);
		skb_postpush_data_move(skb, VLAN_HLEN, mac_len - ETH_TLEN);
	if (skb_mac_header_was_set(skb))
		skb->mac_header -= VLAN_HLEN;

@@ -731,18 +732,16 @@ static inline void vlan_set_encap_proto(struct sk_buff *skb,
 *
 * Expects the skb to contain a VLAN tag in the payload, and to have skb->data
 * pointing at the MAC header.
 *
 * Returns: a new pointer to skb->data, or NULL on failure to pull.
 */
static inline void *vlan_remove_tag(struct sk_buff *skb, u16 *vlan_tci)
static inline void vlan_remove_tag(struct sk_buff *skb, u16 *vlan_tci)
{
	struct vlan_hdr *vhdr = (struct vlan_hdr *)(skb->data + ETH_HLEN);

	*vlan_tci = ntohs(vhdr->h_vlan_TCI);

	memmove(skb->data + VLAN_HLEN, skb->data, 2 * ETH_ALEN);
	vlan_set_encap_proto(skb, vhdr);
	return __skb_pull(skb, VLAN_HLEN);
	__skb_pull(skb, VLAN_HLEN);
	skb_postpull_data_move(skb, VLAN_HLEN, 2 * ETH_ALEN);
}

/**
+75 −0
Original line number Diff line number Diff line
@@ -4564,6 +4564,81 @@ static inline void skb_metadata_clear(struct sk_buff *skb)
	skb_metadata_set(skb, 0);
}

/**
 * skb_data_move - Move packet data and metadata after skb_push() or skb_pull().
 * @skb: packet to operate on
 * @len: number of bytes pushed or pulled from &sk_buff->data
 * @n: number of bytes to memmove() from pre-push/pull &sk_buff->data
 *
 * Moves @n bytes of packet data, can be zero, and all bytes of skb metadata.
 *
 * Assumes metadata is located immediately before &sk_buff->data prior to the
 * push/pull, and that sufficient headroom exists to hold it after an
 * skb_push(). Otherwise, metadata is cleared and a one-time warning is issued.
 *
 * Prefer skb_postpull_data_move() or skb_postpush_data_move() to calling this
 * helper directly.
 */
static inline void skb_data_move(struct sk_buff *skb, const int len,
				 const unsigned int n)
{
	const u8 meta_len = skb_metadata_len(skb);
	u8 *meta, *meta_end;

	if (!len || (!n && !meta_len))
		return;

	if (!meta_len)
		goto no_metadata;

	meta_end = skb_metadata_end(skb);
	meta = meta_end - meta_len;

	if (WARN_ON_ONCE(meta_end + len != skb->data ||
			 meta_len > skb_headroom(skb))) {
		skb_metadata_clear(skb);
		goto no_metadata;
	}

	memmove(meta + len, meta, meta_len + n);
	return;

no_metadata:
	memmove(skb->data, skb->data - len, n);
}

/**
 * skb_postpull_data_move - Move packet data and metadata after skb_pull().
 * @skb: packet to operate on
 * @len: number of bytes pulled from &sk_buff->data
 * @n: number of bytes to memmove() from pre-pull &sk_buff->data
 *
 * See skb_data_move() for details.
 */
static inline void skb_postpull_data_move(struct sk_buff *skb,
					  const unsigned int len,
					  const unsigned int n)
{
	DEBUG_NET_WARN_ON_ONCE(len > INT_MAX);
	skb_data_move(skb, len, n);
}

/**
 * skb_postpush_data_move - Move packet data and metadata after skb_push().
 * @skb: packet to operate on
 * @len: number of bytes pushed onto &sk_buff->data
 * @n: number of bytes to memmove() from pre-push &sk_buff->data
 *
 * See skb_data_move() for details.
 */
static inline void skb_postpush_data_move(struct sk_buff *skb,
					  const unsigned int len,
					  const unsigned int n)
{
	DEBUG_NET_WARN_ON_ONCE(len > INT_MAX);
	skb_data_move(skb, -len, n);
}

struct sk_buff *skb_clone_sk(struct sk_buff *skb);

#ifdef CONFIG_NETWORK_PHY_TIMESTAMPING
+2 −4
Original line number Diff line number Diff line
@@ -1842,10 +1842,8 @@ int __bpf_dynptr_write(const struct bpf_dynptr_kern *dst, u32 offset, void *src,
			return -EINVAL;
		return __bpf_xdp_store_bytes(dst->data, dst->offset + offset, src, len);
	case BPF_DYNPTR_TYPE_SKB_META:
		if (flags)
			return -EINVAL;
		memmove(bpf_skb_meta_pointer(dst->data, dst->offset + offset), src, len);
		return 0;
		return __bpf_skb_meta_store_bytes(dst->data, dst->offset + offset, src,
						  len, flags);
	default:
		WARN_ONCE(true, "bpf_dynptr_write: unknown dynptr type %d\n", type);
		return -EFAULT;
+22 −12
Original line number Diff line number Diff line
@@ -3253,11 +3253,11 @@ static void bpf_skb_change_protocol(struct sk_buff *skb, u16 proto)

static int bpf_skb_generic_push(struct sk_buff *skb, u32 off, u32 len)
{
	/* Caller already did skb_cow() with len as headroom,
	/* Caller already did skb_cow() with meta_len+len as headroom,
	 * so no need to do it here.
	 */
	skb_push(skb, len);
	memmove(skb->data, skb->data + len, off);
	skb_postpush_data_move(skb, len, off);
	memset(skb->data + off, 0, len);

	/* No skb_postpush_rcsum(skb, skb->data + off, len)
@@ -3281,7 +3281,7 @@ static int bpf_skb_generic_pop(struct sk_buff *skb, u32 off, u32 len)
	old_data = skb->data;
	__skb_pull(skb, len);
	skb_postpull_rcsum(skb, old_data + off, len);
	memmove(skb->data, old_data, off);
	skb_postpull_data_move(skb, len, off);

	return 0;
}
@@ -3326,10 +3326,11 @@ static int bpf_skb_net_hdr_pop(struct sk_buff *skb, u32 off, u32 len)
static int bpf_skb_proto_4_to_6(struct sk_buff *skb)
{
	const u32 len_diff = sizeof(struct ipv6hdr) - sizeof(struct iphdr);
	const u8 meta_len = skb_metadata_len(skb);
	u32 off = skb_mac_header_len(skb);
	int ret;

	ret = skb_cow(skb, len_diff);
	ret = skb_cow(skb, meta_len + len_diff);
	if (unlikely(ret < 0))
		return ret;

@@ -3489,6 +3490,7 @@ static int bpf_skb_net_grow(struct sk_buff *skb, u32 off, u32 len_diff,
	u8 inner_mac_len = flags >> BPF_ADJ_ROOM_ENCAP_L2_SHIFT;
	bool encap = flags & BPF_F_ADJ_ROOM_ENCAP_L3_MASK;
	u16 mac_len = 0, inner_net = 0, inner_trans = 0;
	const u8 meta_len = skb_metadata_len(skb);
	unsigned int gso_type = SKB_GSO_DODGY;
	int ret;

@@ -3499,7 +3501,7 @@ static int bpf_skb_net_grow(struct sk_buff *skb, u32 off, u32 len_diff,
			return -ENOTSUPP;
	}

	ret = skb_cow_head(skb, len_diff);
	ret = skb_cow_head(skb, meta_len + len_diff);
	if (unlikely(ret < 0))
		return ret;

@@ -3873,6 +3875,7 @@ static const struct bpf_func_proto sk_skb_change_tail_proto = {
static inline int __bpf_skb_change_head(struct sk_buff *skb, u32 head_room,
					u64 flags)
{
	const u8 meta_len = skb_metadata_len(skb);
	u32 max_len = BPF_SKB_MAX_LEN;
	u32 new_len = skb->len + head_room;
	int ret;
@@ -3882,7 +3885,7 @@ static inline int __bpf_skb_change_head(struct sk_buff *skb, u32 head_room,
		     new_len < skb->len))
		return -EINVAL;

	ret = skb_cow(skb, head_room);
	ret = skb_cow(skb, meta_len + head_room);
	if (likely(!ret)) {
		/* Idea for this helper is that we currently only
		 * allow to expand on mac header. This means that
@@ -3894,6 +3897,7 @@ static inline int __bpf_skb_change_head(struct sk_buff *skb, u32 head_room,
		 * for redirection into L2 device.
		 */
		__skb_push(skb, head_room);
		skb_postpush_data_move(skb, head_room, 0);
		memset(skb->data, 0, head_room);
		skb_reset_mac_header(skb);
		skb_reset_mac_len(skb);
@@ -12102,6 +12106,18 @@ void *bpf_skb_meta_pointer(struct sk_buff *skb, u32 offset)
	return skb_metadata_end(skb) - skb_metadata_len(skb) + offset;
}

int __bpf_skb_meta_store_bytes(struct sk_buff *skb, u32 offset,
			       const void *from, u32 len, u64 flags)
{
	if (unlikely(flags))
		return -EINVAL;
	if (unlikely(bpf_try_make_writable(skb, 0)))
		return -EFAULT;

	memmove(bpf_skb_meta_pointer(skb, offset), from, len);
	return 0;
}

__bpf_kfunc_start_defs();
__bpf_kfunc int bpf_dynptr_from_skb(struct __sk_buff *s, u64 flags,
				    struct bpf_dynptr *ptr__uninit)
@@ -12129,9 +12145,6 @@ __bpf_kfunc int bpf_dynptr_from_skb(struct __sk_buff *s, u64 flags,
 * XDP context with bpf_xdp_adjust_meta(). Serves as an alternative to
 * &__sk_buff->data_meta.
 *
 * If passed @skb_ is a clone which shares the data with the original, the
 * dynptr will be read-only. This limitation may be lifted in the future.
 *
 * Return:
 * * %0         - dynptr ready to use
 * * %-EINVAL   - invalid flags, dynptr set to null
@@ -12149,9 +12162,6 @@ __bpf_kfunc int bpf_dynptr_from_skb_meta(struct __sk_buff *skb_, u64 flags,

	bpf_dynptr_init(ptr, skb, BPF_DYNPTR_TYPE_SKB_META, 0, skb_metadata_len(skb));

	if (skb_cloned(skb))
		bpf_dynptr_set_rdonly(ptr);

	return 0;
}

Loading