Commit 5d9fb42f authored by Andrii Nakryiko

Merge branch 'support-associating-bpf-programs-with-struct_ops'

Amery Hung says:

====================
Support associating BPF programs with struct_ops

Hi,

This patchset adds a new BPF command BPF_PROG_ASSOC_STRUCT_OPS to
the bpf() syscall to allow associating a BPF program with a struct_ops.
The command is introduced to address an emerging need from struct_ops
users. As the number of subsystems adopting struct_ops grows, more
users are building their struct_ops-based solution with some help from
other BPF programs. For example, scx_layer uses a syscall program as
a user space trigger to refresh layers [0]. It also uses a tracing
program to infer whether a task is using a GPU and needs to be
prioritized [1]. In these use cases, when there are multiple struct_ops
instances, the struct_ops kfuncs called from different BPF programs,
whether struct_ops or not, need to be able to refer to a specific one,
which currently is not possible.

The new BPF command will allow users to explicitly associate a BPF
program with a struct_ops map. The libbpf wrapper can be called after
loading programs and before attaching programs and struct_ops.
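
As a rough illustration of the raw syscall path (not the libbpf
wrapper), the sketch below fills the three fields the command takes.
struct assoc_attr, assoc_attr_fill() and assoc_struct_ops_raw() are
hypothetical names used only here. The struct merely mirrors the
prog_assoc_struct_ops member of union bpf_attr, and the command number
is passed in by the caller rather than hardcoded, on the assumption that
it comes from a uapi header that already defines
BPF_PROG_ASSOC_STRUCT_OPS.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Local mirror of the prog_assoc_struct_ops member of union bpf_attr. */
struct assoc_attr {
	uint32_t map_fd;	/* fd of the struct_ops map */
	uint32_t prog_fd;	/* fd of the BPF program */
	uint32_t flags;		/* must be zero, otherwise the kernel rejects it */
};

/* Zero the attribute block and set the two file descriptors. */
void assoc_attr_fill(struct assoc_attr *attr, int prog_fd, int map_fd)
{
	assert(attr != NULL);
	memset(attr, 0, sizeof(*attr));
	attr->map_fd = (uint32_t)map_fd;
	attr->prog_fd = (uint32_t)prog_fd;
}

/*
 * Hypothetical raw wrapper. `cmd` is expected to be the value of
 * BPF_PROG_ASSOC_STRUCT_OPS from a uapi header that defines it.
 * Returns 0 on success or -1 with errno set, as bpf(2) does.
 */
int assoc_struct_ops_raw(int cmd, int prog_fd, int map_fd)
{
	struct assoc_attr attr;

	assoc_attr_fill(&attr, prog_fd, map_fd);
	return (int)syscall(SYS_bpf, cmd, &attr, sizeof(attr));
}
```

A loader would presumably issue this after the program and the map are
loaded and before anything is attached, matching the ordering described
above.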

Internally, it will set prog->aux->st_ops_assoc to the struct_ops
map. struct_ops kfuncs can then get the associated struct_ops struct
by calling bpf_prog_get_assoc_struct_ops() with prog->aux, which can
be acquired from a "__prog" argument. The value of the special
argument will be fixed up by the verifier during verification.
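
To make the lookup concrete, here is a small user space model of what a
kfunc does with the aux pointer. It is only a sketch: the model_* types
and MODEL_PTR_POISON are stand-ins invented for this example, RCU is
left out entirely, and model_kfunc() is a hypothetical kfunc body rather
than any real one.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel types; names are illustrative only. */
struct model_st_map {
	int kdata;			/* stands in for kvalue.data */
};

struct model_prog_aux {
	struct model_st_map *st_ops_assoc;
};

/* Sentinel playing the role of BPF_PTR_POISON in this model. */
static struct model_st_map model_poison_obj;
#define MODEL_PTR_POISON (&model_poison_obj)

/* Filters out "no association" and "ambiguous association" (poison). */
void *model_get_assoc_struct_ops(const struct model_prog_aux *aux)
{
	struct model_st_map *assoc;

	assert(aux != NULL);
	assoc = aux->st_ops_assoc;
	if (!assoc || assoc == MODEL_PTR_POISON)
		return NULL;
	return &assoc->kdata;
}

/* Hypothetical kfunc body: use the caller's struct_ops or report failure. */
int model_kfunc(const struct model_prog_aux *aux)
{
	int *kdata = model_get_assoc_struct_ops(aux);

	if (!kdata)
		return -1;		/* nothing unambiguous to act on */
	return *kdata;
}
```

In the kernel the aux pointer is not passed by the BPF program itself;
it arrives through the "__prog" argument that the verifier fixes up.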

The command conceptually associates the implementation of BPF programs
with a struct_ops map, not the attachment. A program associated with the
map will take a refcount of it so that st_ops_assoc always points to a
valid struct_ops struct. struct_ops implementers can use the helper
bpf_prog_get_assoc_struct_ops() to get the pointer. The returned
struct_ops, if not NULL, is guaranteed to be valid and initialized.
However, it is not guaranteed that the struct_ops is attached. The
struct_ops implementers still need to take steps to track and check the
state of the struct_ops in kdata if the use case demands the struct_ops
to be attached.
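
The association rules can be modeled in user space as follows. This is
a sketch under stated assumptions, not the kernel code: the model_*
names are invented for the example, locking and RCU are omitted, and the
`attached` flag stands for whatever state a struct_ops implementer
chooses to keep in kdata.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

struct model_map {
	int refcnt;
	bool attached;			/* hypothetical state tracked in kdata */
};

struct model_prog {
	bool is_struct_ops;
	struct model_map *assoc;
};

static struct model_map model_poison;	/* plays the role of BPF_PTR_POISON */

/* Associate prog with map; returns 0 or a negative errno value. */
int model_assoc(struct model_prog *prog, struct model_map *map)
{
	assert(prog && map);
	if (prog->assoc == map)
		return 0;		/* same map again: no extra refcount */
	if (prog->assoc) {
		if (!prog->is_struct_ops)
			return -EBUSY;	/* non-struct_ops program: one map only */
		prog->assoc = &model_poison;	/* reused in a second map */
		return 0;
	}
	if (!prog->is_struct_ops)
		map->refcnt++;		/* keeps the map alive for this program */
	prog->assoc = map;
	return 0;
}

/* Drop the association and the reference it may hold. */
void model_disassoc(struct model_prog *prog)
{
	if (!prog->assoc || prog->assoc == &model_poison)
		return;
	if (!prog->is_struct_ops)
		prog->assoc->refcnt--;
	prog->assoc = NULL;
}

/* Hypothetical kfunc-side check: a valid pointer does not imply attachment. */
bool model_can_use(const struct model_prog *prog)
{
	const struct model_map *m = prog->assoc;

	return m && m != &model_poison && m->attached;
}
```

The point of model_can_use() is that a non-NULL association alone says
nothing about attachment, so a use case that needs an attached
struct_ops has to check its own state first.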

We can also consider supporting association of struct_ops links with BPF
programs, which on one hand makes the struct_ops implementer's job easier,
but might complicate the libbpf workflow and does not apply to legacy
struct_ops attachment.

[0] https://github.com/sched-ext/scx/blob/main/scheds/rust/scx_layered/src/bpf/main.bpf.c#L557
[1] https://github.com/sched-ext/scx/blob/main/scheds/rust/scx_layered/src/bpf/main.bpf.c#L754
---
v7 -> v8
   - Fix libbpf return (Andrii)
   - Follow kfunc _impl suffix naming convention in selftest (Alexei)
   Link: https://lore.kernel.org/bpf/20251121231352.4032020-1-ameryhung@gmail.com/

v6 -> v7
   - Drop the guarantee that bpf_prog_get_assoc_struct_ops() will always return
     an initialized struct_ops (Martin)
   - Minor misc. changes in selftests
   Link: https://lore.kernel.org/bpf/20251114221741.317631-1-ameryhung@gmail.com/

v5 -> v6
   - Drop refcnt bumping for async callbacks and add RCU annotation (Martin)
   - Fix libbpf bug and update comments (Andrii)
   - Fix refcount bug in bpf_prog_assoc_struct_ops() (AI)
   Link: https://lore.kernel.org/bpf/20251104172652.1746988-1-ameryhung@gmail.com/

v4 -> v5
   - Simplify the API for getting associated struct_ops and don't
     expose struct_ops map lifecycle management (Andrii, Alexei)
   Link: https://lore.kernel.org/bpf/20251024212914.1474337-1-ameryhung@gmail.com/

v3 -> v4
   - Fix potential dangling pointer in timer callback. Protect
     st_ops_assoc with RCU. The get helper now needs to be paired with
     bpf_struct_ops_put()
   - The command should only increase refcount once for a program
     (Andrii)
   - Test a struct_ops program reused in two struct_ops maps
   - Test getting associated struct_ops in timer callback
   Link: https://lore.kernel.org/bpf/20251017215627.722338-1-ameryhung@gmail.com/

v2 -> v3
   - Change the type of st_ops_assoc from void* (i.e., kdata) to bpf_map
     (Andrii)
   - Fix a bug that clears BPF_PTR_POISON when a struct_ops map is freed
     (Andrii)
   - Return NULL if the map is not fully initialized (Martin)
   - Move struct_ops map refcount inc/dec into internal helpers (Martin)
   - Add libbpf API, bpf_program__assoc_struct_ops (Andrii)
   Link: https://lore.kernel.org/bpf/20251016204503.3203690-1-ameryhung@gmail.com/

v1 -> v2
   - Poison st_ops_assoc when reusing the program in more than one
     struct_ops map and add a helper to access the pointer (Andrii)
   - Minor style and naming changes (Andrii)
   Link: https://lore.kernel.org/bpf/20251010174953.2884682-1-ameryhung@gmail.com/

---
====================

Link: https://patch.msgid.link/20251203233748.668365-1-ameryhung@gmail.com


Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
parents 81f88f6a 0e841d19
+16 −0
@@ -1739,6 +1739,8 @@ struct bpf_prog_aux {
		struct rcu_head	rcu;
	};
	struct bpf_stream stream[2];
	struct mutex st_ops_assoc_mutex;
	struct bpf_map __rcu *st_ops_assoc;
};

struct bpf_prog {
@@ -2041,6 +2043,9 @@ static inline void bpf_module_put(const void *data, struct module *owner)
		module_put(owner);
}
int bpf_struct_ops_link_create(union bpf_attr *attr);
int bpf_prog_assoc_struct_ops(struct bpf_prog *prog, struct bpf_map *map);
void bpf_prog_disassoc_struct_ops(struct bpf_prog *prog);
void *bpf_prog_get_assoc_struct_ops(const struct bpf_prog_aux *aux);
u32 bpf_struct_ops_id(const void *kdata);

#ifdef CONFIG_NET
@@ -2088,6 +2093,17 @@ static inline int bpf_struct_ops_link_create(union bpf_attr *attr)
{
	return -EOPNOTSUPP;
}
static inline int bpf_prog_assoc_struct_ops(struct bpf_prog *prog, struct bpf_map *map)
{
	return -EOPNOTSUPP;
}
static inline void bpf_prog_disassoc_struct_ops(struct bpf_prog *prog)
{
}
static inline void *bpf_prog_get_assoc_struct_ops(const struct bpf_prog_aux *aux)
{
	return NULL;
}
static inline void bpf_map_struct_ops_info_fill(struct bpf_map_info *info, struct bpf_map *map)
{
}
+17 −0
@@ -918,6 +918,16 @@ union bpf_iter_link_info {
 *		Number of bytes read from the stream on success, or -1 if an
 *		error occurred (in which case, *errno* is set appropriately).
 *
 * BPF_PROG_ASSOC_STRUCT_OPS
 * 	Description
 * 		Associate a BPF program with a struct_ops map. The struct_ops
 * 		map is identified by *map_fd* and the BPF program is
 * 		identified by *prog_fd*.
 *
 * 	Return
 * 		0 on success or -1 if an error occurred (in which case,
 * 		*errno* is set appropriately).
 *
 * NOTES
 *	eBPF objects (maps and programs) can be shared between processes.
 *
@@ -974,6 +984,7 @@ enum bpf_cmd {
	BPF_PROG_BIND_MAP,
	BPF_TOKEN_CREATE,
	BPF_PROG_STREAM_READ_BY_FD,
	BPF_PROG_ASSOC_STRUCT_OPS,
	__MAX_BPF_CMD,
};

@@ -1894,6 +1905,12 @@ union bpf_attr {
		__u32		prog_fd;
	} prog_stream_read;

	struct {
		__u32		map_fd;
		__u32		prog_fd;
		__u32		flags;
	} prog_assoc_struct_ops;

} __attribute__((aligned(8)));

/* The description below is an attempt at providing documentation to eBPF
+88 −0
@@ -533,6 +533,17 @@ static void bpf_struct_ops_map_put_progs(struct bpf_struct_ops_map *st_map)
	}
}

static void bpf_struct_ops_map_dissoc_progs(struct bpf_struct_ops_map *st_map)
{
	u32 i;

	for (i = 0; i < st_map->funcs_cnt; i++) {
		if (!st_map->links[i])
			break;
		bpf_prog_disassoc_struct_ops(st_map->links[i]->prog);
	}
}

static void bpf_struct_ops_map_free_image(struct bpf_struct_ops_map *st_map)
{
	int i;
@@ -801,6 +812,9 @@ static long bpf_struct_ops_map_update_elem(struct bpf_map *map, void *key,
			goto reset_unlock;
		}

		/* Poison pointer on error instead of return for backward compatibility */
		bpf_prog_assoc_struct_ops(prog, &st_map->map);

		link = kzalloc(sizeof(*link), GFP_USER);
		if (!link) {
			bpf_prog_put(prog);
@@ -980,6 +994,8 @@ static void bpf_struct_ops_map_free(struct bpf_map *map)
	if (btf_is_module(st_map->btf))
		module_put(st_map->st_ops_desc->st_ops->owner);

	bpf_struct_ops_map_dissoc_progs(st_map);

	bpf_struct_ops_map_del_ksyms(st_map);

	/* The struct_ops's function may switch to another struct_ops.
@@ -1396,6 +1412,78 @@ int bpf_struct_ops_link_create(union bpf_attr *attr)
	return err;
}

int bpf_prog_assoc_struct_ops(struct bpf_prog *prog, struct bpf_map *map)
{
	struct bpf_map *st_ops_assoc;

	guard(mutex)(&prog->aux->st_ops_assoc_mutex);

	st_ops_assoc = rcu_dereference_protected(prog->aux->st_ops_assoc,
						 lockdep_is_held(&prog->aux->st_ops_assoc_mutex));
	if (st_ops_assoc && st_ops_assoc == map)
		return 0;

	if (st_ops_assoc) {
		if (prog->type != BPF_PROG_TYPE_STRUCT_OPS)
			return -EBUSY;

		rcu_assign_pointer(prog->aux->st_ops_assoc, BPF_PTR_POISON);
	} else {
		/*
		 * struct_ops map does not track associated non-struct_ops programs.
		 * Bump the refcount to make sure st_ops_assoc is always valid.
		 */
		if (prog->type != BPF_PROG_TYPE_STRUCT_OPS)
			bpf_map_inc(map);

		rcu_assign_pointer(prog->aux->st_ops_assoc, map);
	}

	return 0;
}

void bpf_prog_disassoc_struct_ops(struct bpf_prog *prog)
{
	struct bpf_map *st_ops_assoc;

	guard(mutex)(&prog->aux->st_ops_assoc_mutex);

	st_ops_assoc = rcu_dereference_protected(prog->aux->st_ops_assoc,
						 lockdep_is_held(&prog->aux->st_ops_assoc_mutex));
	if (!st_ops_assoc || st_ops_assoc == BPF_PTR_POISON)
		return;

	if (prog->type != BPF_PROG_TYPE_STRUCT_OPS)
		bpf_map_put(st_ops_assoc);

	RCU_INIT_POINTER(prog->aux->st_ops_assoc, NULL);
}

/*
 * Get a reference to the struct_ops struct (i.e., kdata) associated with a
 * program. Should only be called in BPF program context (e.g., in a kfunc).
 *
 * If the returned pointer is not NULL, it must point to a valid struct_ops.
 * The struct_ops map is not guaranteed to be initialized nor attached.
 * Kernel struct_ops implementers are responsible for tracking and checking
 * the state of the struct_ops if the use case requires an initialized or
 * attached struct_ops.
 */
void *bpf_prog_get_assoc_struct_ops(const struct bpf_prog_aux *aux)
{
	struct bpf_struct_ops_map *st_map;
	struct bpf_map *st_ops_assoc;

	st_ops_assoc = rcu_dereference_check(aux->st_ops_assoc, bpf_rcu_lock_held());
	if (!st_ops_assoc || st_ops_assoc == BPF_PTR_POISON)
		return NULL;

	st_map = (struct bpf_struct_ops_map *)st_ops_assoc;

	return &st_map->kvalue.data;
}
EXPORT_SYMBOL_GPL(bpf_prog_get_assoc_struct_ops);

void bpf_map_struct_ops_info_fill(struct bpf_map_info *info, struct bpf_map *map)
{
	struct bpf_struct_ops_map *st_map = (struct bpf_struct_ops_map *)map;
+3 −0
@@ -136,6 +136,7 @@ struct bpf_prog *bpf_prog_alloc_no_stats(unsigned int size, gfp_t gfp_extra_flag
	mutex_init(&fp->aux->used_maps_mutex);
	mutex_init(&fp->aux->ext_mutex);
	mutex_init(&fp->aux->dst_mutex);
	mutex_init(&fp->aux->st_ops_assoc_mutex);

#ifdef CONFIG_BPF_SYSCALL
	bpf_prog_stream_init(fp);
@@ -286,6 +287,7 @@ void __bpf_prog_free(struct bpf_prog *fp)
	if (fp->aux) {
		mutex_destroy(&fp->aux->used_maps_mutex);
		mutex_destroy(&fp->aux->dst_mutex);
		mutex_destroy(&fp->aux->st_ops_assoc_mutex);
		kfree(fp->aux->poke_tab);
		kfree(fp->aux);
	}
@@ -2896,6 +2898,7 @@ static void bpf_prog_free_deferred(struct work_struct *work)
#endif
	bpf_free_used_maps(aux);
	bpf_free_used_btfs(aux);
	bpf_prog_disassoc_struct_ops(aux->prog);
	if (bpf_prog_is_dev_bound(aux))
		bpf_prog_dev_bound_destroy(aux->prog);
#ifdef CONFIG_PERF_EVENTS
+46 −0
@@ -6122,6 +6122,49 @@ static int prog_stream_read(union bpf_attr *attr)
	return ret;
}

#define BPF_PROG_ASSOC_STRUCT_OPS_LAST_FIELD prog_assoc_struct_ops.prog_fd

static int prog_assoc_struct_ops(union bpf_attr *attr)
{
	struct bpf_prog *prog;
	struct bpf_map *map;
	int ret;

	if (CHECK_ATTR(BPF_PROG_ASSOC_STRUCT_OPS))
		return -EINVAL;

	if (attr->prog_assoc_struct_ops.flags)
		return -EINVAL;

	prog = bpf_prog_get(attr->prog_assoc_struct_ops.prog_fd);
	if (IS_ERR(prog))
		return PTR_ERR(prog);

	if (prog->type == BPF_PROG_TYPE_STRUCT_OPS) {
		ret = -EINVAL;
		goto put_prog;
	}

	map = bpf_map_get(attr->prog_assoc_struct_ops.map_fd);
	if (IS_ERR(map)) {
		ret = PTR_ERR(map);
		goto put_prog;
	}

	if (map->map_type != BPF_MAP_TYPE_STRUCT_OPS) {
		ret = -EINVAL;
		goto put_map;
	}

	ret = bpf_prog_assoc_struct_ops(prog, map);

put_map:
	bpf_map_put(map);
put_prog:
	bpf_prog_put(prog);
	return ret;
}

static int __sys_bpf(enum bpf_cmd cmd, bpfptr_t uattr, unsigned int size)
{
	union bpf_attr attr;
@@ -6261,6 +6304,9 @@ static int __sys_bpf(enum bpf_cmd cmd, bpfptr_t uattr, unsigned int size)
	case BPF_PROG_STREAM_READ_BY_FD:
		err = prog_stream_read(&attr);
		break;
	case BPF_PROG_ASSOC_STRUCT_OPS:
		err = prog_assoc_struct_ops(&attr);
		break;
	default:
		err = -EINVAL;
		break;