Commit c390adfd authored by Alexei Starovoitov

Merge branch 'bpf-fsession-support'

Menglong Dong says:

====================
bpf: fsession support

overall
-------
Sometimes we need to hook both the entry and the exit of a function with
TRACING. Currently this requires defining both a FENTRY and a FEXIT
program for the target function, which is inconvenient.

This series therefore adds tracing session (fsession) support to TRACING.
It is similar to kprobe session: a single BPF program can hook both the
entry and the exit of a function.
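The session idea can be shown with a minimal userspace sketch in plain C. It is not kernel or BPF code. struct session_ctx, session_handler() and call_traced() are hypothetical stand-ins for the program context, the fsession program and the trampoline. The is_return field plays the role that bpf_session_is_return() plays for a real fsession program.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: one handler runs before and after the traced
 * function, and is_return tells the two invocations apart. */
struct session_ctx {
	long arg;        /* argument passed to the traced function */
	bool is_return;  /* false on entry, true on exit */
};

static int entries, exits;

/* A single handler covers both halves of the session. */
static int session_handler(struct session_ctx *ctx)
{
	if (ctx->is_return)
		exits++;
	else
		entries++;
	return 0;
}

/* The function being traced. */
static long traced_func(long x)
{
	return x * 2;
}

/* Stand-in for the trampoline: handler, traced function, handler again. */
static long call_traced(long x)
{
	struct session_ctx ctx = { .arg = x, .is_return = false };
	long ret;

	session_handler(&ctx);
	ret = traced_func(ctx.arg);
	ctx.is_return = true;
	session_handler(&ctx);
	return ret;
}
```

call_traced(21) returns 42 after session_handler() has run once with is_return false and once with it true. With FENTRY plus FEXIT, the same bookkeeping would need two separate programs.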

bpf_get_func_ret() may also be called in the fentry part of a tracing
session. There it always returns 0, because the trampoline clears the
return-value slot before the fentry programs run, so this is safe.
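The following sketch models why reading the return value at entry is harmless. It assumes a single 8-byte return-value slot, which the patch zeroes at [rbp - 8] before the fentry programs run whenever an fsession program is attached. ret_slot, model_get_func_ret() and run_session() are hypothetical names, not kernel APIs.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical return-value slot; starts with stale stack contents. */
static uint64_t ret_slot = 0xdeadbeef;

/* Stand-in for bpf_get_func_ret(): report whatever the slot holds. */
static uint64_t model_get_func_ret(void)
{
	return ret_slot;
}

static uint64_t seen_at_entry, seen_at_exit;

/* Clear the slot, run the "entry" read, call fn, run the "exit" read. */
static uint64_t run_session(uint64_t (*fn)(uint64_t), uint64_t arg)
{
	ret_slot = 0;                        /* cleared before fentry */
	seen_at_entry = model_get_func_ret();
	ret_slot = fn(arg);                  /* traced function's result */
	seen_at_exit = model_get_func_ret();
	return ret_slot;
}

static uint64_t add_one(uint64_t x)
{
	return x + 1;
}
```

Without the clearing store, seen_at_entry would report the stale 0xdeadbeef instead of 0.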

Session cookies are also supported, through the kfunc bpf_session_cookie().
To limit stack usage, the maximum number of cookies is 4.
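Per-program cookies can be modeled roughly as follows. The sketch assumes that each program calling bpf_session_cookie() owns one 8-byte slot, which is zeroed before entry and read again at exit. MAX_SESSION_COOKIES mirrors the limit of 4 stated above. struct model_prog and the helpers are hypothetical. Where the kernel actually enforces the limit is not shown in this excerpt.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Limit taken from the cover letter; everything else is hypothetical. */
#define MAX_SESSION_COOKIES 4

struct model_prog {
	bool uses_cookie; /* analogous to prog->call_session_cookie */
	int cookie_slot;  /* assigned 8-byte slot, or -1 if none */
};

/* One 8-byte slot per cookie user, shared by entry and exit. */
static uint64_t cookies[MAX_SESSION_COOKIES];

/* Give each cookie-using program its own slot. Returns the number of
 * slots used, or -1 once more than MAX_SESSION_COOKIES are needed. */
static int assign_cookie_slots(struct model_prog *progs, int n)
{
	int used = 0;

	for (int i = 0; i < n; i++) {
		progs[i].cookie_slot = -1;
		if (!progs[i].uses_cookie)
			continue;
		if (used == MAX_SESSION_COOKIES)
			return -1;
		progs[i].cookie_slot = used++;
	}
	return used;
}

/* Zero every slot before the entry half runs. */
static void clear_cookies(void)
{
	memset(cookies, 0, sizeof(cookies));
}

/* Stand-in for bpf_session_cookie(): the program's own slot. */
static uint64_t *model_session_cookie(const struct model_prog *p)
{
	assert(p->cookie_slot >= 0);
	return &cookies[p->cookie_slot];
}
```

Each program sees only its own slot. A value stored on entry is still there at exit because nothing in between rewrites it.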

kfunc design
------------
To stay consistent with the existing kfuncs, we don't introduce new kfuncs
for fsession. Instead, we reuse the existing kfuncs bpf_session_cookie()
and bpf_session_is_return().

The current prototypes of bpf_session_cookie() and bpf_session_is_return()
don't meet our needs, so we change them by adding a "void *ctx" argument
to both.

For fsession, the verifier inlines bpf_session_cookie() and
bpf_session_is_return() directly, so no new functions need to be
introduced for them.
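Both kfuncs can be inlined because their answers are encoded in a single metadata word. The trampoline stores this word right below the program's ctx, in the func_meta_off slot of the trampoline diff below. The sketch decodes that word in plain C. The two shift constants come from the patch. COOKIE_INDEX_MASK (an 8-bit index width) and the model_* helpers are assumptions for illustration, not the verifier's actual instruction sequence.

```c
#include <assert.h>
#include <stdint.h>

/* Shift values as defined in the patch. */
#define BPF_TRAMP_COOKIE_INDEX_SHIFT 8
#define BPF_TRAMP_IS_RETURN_SHIFT    63

/* Assumption: the cookie index fits in 8 bits. */
#define COOKIE_INDEX_MASK 0xffULL

/* ctx points at the first saved argument; the metadata word lives in
 * the 8-byte slot directly below it, i.e. ctx[-1]. */
static int model_session_is_return(const uint64_t *ctx)
{
	return (int)((ctx[-1] >> BPF_TRAMP_IS_RETURN_SHIFT) & 1);
}

/* The cookie index counts 8-byte slots downwards from ctx. */
static uint64_t *model_session_cookie(uint64_t *ctx)
{
	uint64_t idx = (ctx[-1] >> BPF_TRAMP_COOKIE_INDEX_SHIFT) & COOKIE_INDEX_MASK;

	return ctx - idx;
}
```

Without the mask, the is_return bit would leak into the shifted index once it is set for the exit half.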

architecture
------------
fsession support is architecture-specific, so -EOPNOTSUPP is returned on
architectures that don't support it yet. This series implements it only
for x86_64; other architectures will follow later.

Changes v12 -> v13:
* fix the selftest failures on !x86_64 in the 11th patch
* v12: https://lore.kernel.org/bpf/20260124033119.28682-1-dongml2@chinatelecom.cn/

Changes v11 -> v12:
* update the variable "delta" in the 2nd patch
* improve the fsession testcase by adding the 11th patch, which tests
  bpf_get_func_* for fsession
* v11: https://lore.kernel.org/bpf/20260123073532.238985-1-dongml2@chinatelecom.cn/

Changes v10 -> v11:
* rebase and fix the conflicts in the 2nd patch
* use "volatile" in the 11th patch
* rename BPF_TRAMP_SHIFT_* to BPF_TRAMP_*_SHIFT
* v10: https://lore.kernel.org/bpf/20260115112246.221082-1-dongml2@chinatelecom.cn/

Changes v9 -> v10:
* 1st patch: some small adjustments, such as using a switch in
  bpf_prog_has_trampoline()
* 2nd patch: some adjustments to the commit log and comment
* 3rd patch:
  - drop the declaration of bpf_session_is_return() and
    bpf_session_cookie()
  - use vmlinux.h instead of bpf_kfuncs.h in uprobe_multi_session.c,
    kprobe_multi_session_cookie.c and uprobe_multi_session_cookie.c
* 4th patch:
  - some adjustments to the comment and commit log
  - rename the prefix from BPF_TRAMP_M_ to BPF_TRAMP_SHIFT_
  - remove the definition of BPF_TRAMP_M_NR_ARGS
  - check the program type in bpf_session_filter()
* 5th patch: some adjustments to the commit log
* 6th patch:
  - add a "reg" parameter to emit_store_stack_imm64()
  - use the positive offset in emit_store_stack_imm64()
* 7th patch:
  - use "|" for func_meta instead of "+"
  - pass the "func_meta_off" to invoke_bpf() explicitly, instead of
    computing it with "stack_size + 8"
  - pass the "cookie_off" to invoke_bpf() instead of computing the current
    cookie index with "func_meta"
* 8th patch:
  - split the bpftool modification into a separate patch
* v9: https://lore.kernel.org/bpf/20260110141115.537055-1-dongml2@chinatelecom.cn/

Changes v8 -> v9:
* remove the definition of bpf_fsession_cookie and bpf_fsession_is_return
  in the 4th and 5th patch
* rename emit_st_r0_imm64() to emit_store_stack_imm64() in the 6th patch
* v8: https://lore.kernel.org/bpf/20260108022450.88086-1-dongml2@chinatelecom.cn/

Changes v7 -> v8:
* use the last byte of nr_args for bpf_get_func_arg_cnt() in the 2nd patch
* v7: https://lore.kernel.org/bpf/20260107064352.291069-1-dongml2@chinatelecom.cn/
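Because the former nr_args slot now carries extra bits, only its low byte can be treated as the argument count. The following C sketch packs and unpacks that word. The shift values are taken from the patch, and make_func_meta() and model_get_func_arg_cnt() are hypothetical helpers.

```c
#include <assert.h>
#include <stdint.h>

/* Shift values as defined in the patch. */
#define BPF_TRAMP_COOKIE_INDEX_SHIFT 8
#define BPF_TRAMP_IS_RETURN_SHIFT    63

/* Pack the metadata word as the trampoline code in the patch does:
 * argument count in the low bits, cookie index and is_return OR-ed in. */
static uint64_t make_func_meta(uint64_t nr_regs, uint64_t cookie_idx, int is_return)
{
	uint64_t meta = nr_regs;

	meta |= cookie_idx << BPF_TRAMP_COOKIE_INDEX_SHIFT;
	if (is_return)
		meta |= 1ULL << BPF_TRAMP_IS_RETURN_SHIFT;
	return meta;
}

/* Model of bpf_get_func_arg_cnt() after the v8 change: low byte only. */
static uint64_t model_get_func_arg_cnt(uint64_t func_meta)
{
	return func_meta & 0xff;
}
```

The v9 -> v10 switch from "+" to "|" when building func_meta is consistent with this layout. OR-ing a bit that is already set leaves the word unchanged. Adding 1ULL << 63 to a word that already has bit 63 set wraps around and clears that bit.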

Changes v6 -> v7:
* change the prototypes of bpf_session_cookie() and bpf_session_is_return(),
  and reuse them instead of introducing new kfuncs for fsession.
* v6: https://lore.kernel.org/bpf/20260104122814.183732-1-dongml2@chinatelecom.cn/

Changes v5 -> v6:
* No changes in this version, just a rebase to deal with conflicts.
* v5: https://lore.kernel.org/bpf/20251224130735.201422-1-dongml2@chinatelecom.cn/

Changes v4 -> v5:
* use fsession terminology consistently in all patches
* 1st patch:
  - use a more explicit approach in __bpf_trampoline_link_prog()
* 4th patch:
  - remove "cookie_cnt" in struct bpf_trampoline
* 6th patch:
  - rename nr_regs to func_md
  - define cookie_off on a separate line
* 7th patch:
  - remove the handling of BPF_TRACE_SESSION in the legacy fallback path for
    BPF_RAW_TRACEPOINT_OPEN
* v4: https://lore.kernel.org/bpf/20251217095445.218428-1-dongml2@chinatelecom.cn/

Changes v3 -> v4:
* instead of adding a new hlist to progs_hlist in the trampoline, add the
  BPF program to both the fentry hlist and the fexit hlist.
* introduce the 2nd patch, which reuses the nr_args field on the stack to
  store all the information we need (except the session cookies).
* limit the maximum number of cookies to 4.
* remove the logic that skips fexit if the fentry returns non-zero.
* v3: https://lore.kernel.org/bpf/20251026030143.23807-1-dongml2@chinatelecom.cn/
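In the patch, struct bpf_fsession_link pairs a tracing link with an embedded fexit link. This is what lets one program sit on both the fentry and the fexit lists. The following reduced userspace model illustrates the idea. struct tramp_link, struct tramp_links, link_session() and session_from_fexit() are hypothetical, and container_of() is redefined locally rather than taken from kernel headers.

```c
#include <assert.h>
#include <stddef.h>

/* Reduced stand-in for a trampoline link. */
struct tramp_link {
	const char *list;          /* which trampoline list holds this link */
	unsigned long long cookie; /* BPF cookie given at attach time */
};

/* One program, two links: entry half and exit half. */
struct fsession_link {
	struct tramp_link link;  /* on the fentry list */
	struct tramp_link fexit; /* on the fexit list */
};

struct tramp_links {
	struct tramp_link *links[8];
	int nr_links;
};

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical attach step: register both halves of one session. */
static void link_session(struct tramp_links *fentry, struct tramp_links *fexit,
			 struct fsession_link *s, unsigned long long cookie)
{
	s->link = (struct tramp_link){ .list = "fentry", .cookie = cookie };
	s->fexit = (struct tramp_link){ .list = "fexit", .cookie = cookie };
	assert(fentry->nr_links < 8 && fexit->nr_links < 8);
	fentry->links[fentry->nr_links++] = &s->link;
	fexit->links[fexit->nr_links++] = &s->fexit;
}

/* Recover the owning session from the entry found on the fexit list. */
static struct fsession_link *session_from_fexit(struct tramp_link *l)
{
	return container_of(l, struct fsession_link, fexit);
}
```

The attach-time cookie is copied into both halves, mirroring fslink->fexit.cookie = bpf_cookie in the syscall hunk.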

Changes v2 -> v3:
* squash some patches:
  - the 2 patches for the kfuncs bpf_tracing_is_exit() and
    bpf_fsession_cookie() are merged into the second patch.
  - the testcases for fsession are also squashed.
* fix the CI error by moving the testcase for bpf_get_func_ip to
  fsession_test.c
* v2: https://lore.kernel.org/bpf/20251022080159.553805-1-dongml2@chinatelecom.cn/

Changes v1 -> v2:
* session cookie support.
  In this version, session cookies are implemented, and the kfunc
  bpf_fsession_cookie() is added.
* restructure the layout of the stack.
  In this version, the session data stored on the stack is rearranged and
  placed after the return value, so that bpf_get_func_ip() is not broken.
* testcase enhancement.
  Some nits in the testcase suggested by Jiri are fixed. Testcases for
  get_func_ip and session cookies are added as well.
* v1: https://lore.kernel.org/bpf/20251018142124.783206-1-dongml2@chinatelecom.cn/
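The stack layout built by __arch_prepare_bpf_trampoline() can be reproduced arithmetically. This sketch follows the order of the stack_size additions in the x86_64 hunk below. The parameter base stands for whatever precedes the argument registers (8 when the return-value slot is reserved), and the slots past rbx_off are omitted. compute_layout() and first_cookie_index() are hypothetical helpers.

```c
#include <assert.h>

/* Offsets below rbp, as named in the trampoline code. */
struct tramp_layout {
	int regs_off, func_meta_off, ip_off, cookie_off, rbx_off;
};

/* Replay the stack_size additions in the order the patch performs them. */
static struct tramp_layout compute_layout(int base, int nr_regs, int ip_arg,
					  int cookie_cnt)
{
	struct tramp_layout l;
	int stack_size = base;

	stack_size += nr_regs * 8;     /* saved argument registers */
	l.regs_off = stack_size;       /* program's ctx pointer */
	stack_size += 8;               /* function metadata, e.g. regs count */
	l.func_meta_off = stack_size;
	if (ip_arg)
		stack_size += 8;       /* room for IP address argument */
	l.ip_off = stack_size;
	stack_size += cookie_cnt * 8;  /* room for session cookies */
	l.cookie_off = stack_size;
	stack_size += 8;               /* saved rbx */
	l.rbx_off = stack_size;
	return l;
}

/* Index invoke_bpf() encodes for the first cookie user; the patch passes
 * regs_off as invoke_bpf()'s stack_size argument. */
static int first_cookie_index(const struct tramp_layout *l)
{
	return (l->cookie_off - l->regs_off) / 8;
}
```

Take two argument registers, an IP slot and two cookie users. The metadata word stays at ctx - 8 and the IP at ctx - 16 no matter how many cookies follow. Assuming bpf_get_func_ip() keeps reading ctx - 16, that is why placing the cookies further down leaves it intact. The first cookie user's index is (56 - 24) / 8 = 4, i.e. ctx - 32, which is exactly rbp - cookie_off.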
====================

Link: https://patch.msgid.link/20260124062008.8657-1-dongml2@chinatelecom.cn


Signed-off-by: Alexei Starovoitov <ast@kernel.org>
parents c7900f22 cb4bfacf
+53 −23
@@ -1300,6 +1300,16 @@ static void emit_st_r12(u8 **pprog, u32 size, u32 dst_reg, int off, int imm)
	emit_st_index(pprog, size, dst_reg, X86_REG_R12, off, imm);
}

+static void emit_store_stack_imm64(u8 **pprog, int reg, int stack_off, u64 imm64)
+{
+	/*
+	 * mov reg, imm64
+	 * mov QWORD PTR [rbp + stack_off], reg
+	 */
+	emit_mov_imm64(pprog, reg, imm64 >> 32, (u32) imm64);
+	emit_stx(pprog, BPF_DW, BPF_REG_FP, reg, stack_off);
+}
+
static int emit_atomic_rmw(u8 **pprog, u32 atomic_op,
			   u32 dst_reg, u32 src_reg, s16 off, u8 bpf_size)
{
@@ -3084,13 +3094,19 @@ static int emit_cond_near_jump(u8 **pprog, void *func, void *ip, u8 jmp_cond)

static int invoke_bpf(const struct btf_func_model *m, u8 **pprog,
		      struct bpf_tramp_links *tl, int stack_size,
-		      int run_ctx_off, bool save_ret,
-		      void *image, void *rw_image)
+		      int run_ctx_off, int func_meta_off, bool save_ret,
+		      void *image, void *rw_image, u64 func_meta,
+		      int cookie_off)
{
-	int i;
+	int i, cur_cookie = (cookie_off - stack_size) / 8;
	u8 *prog = *pprog;

	for (i = 0; i < tl->nr_links; i++) {
+		if (tl->links[i]->link.prog->call_session_cookie) {
+			emit_store_stack_imm64(&prog, BPF_REG_0, -func_meta_off,
+				func_meta | (cur_cookie << BPF_TRAMP_COOKIE_INDEX_SHIFT));
+			cur_cookie--;
+		}
		if (invoke_bpf_prog(m, &prog, tl->links[i], stack_size,
				    run_ctx_off, save_ret, image, rw_image))
			return -EINVAL;
@@ -3208,12 +3224,14 @@ static int __arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *rw_im
					 void *func_addr)
{
	int i, ret, nr_regs = m->nr_args, stack_size = 0;
-	int regs_off, nregs_off, ip_off, run_ctx_off, arg_stack_off, rbx_off;
+	int regs_off, func_meta_off, ip_off, run_ctx_off, arg_stack_off, rbx_off;
	struct bpf_tramp_links *fentry = &tlinks[BPF_TRAMP_FENTRY];
	struct bpf_tramp_links *fexit = &tlinks[BPF_TRAMP_FEXIT];
	struct bpf_tramp_links *fmod_ret = &tlinks[BPF_TRAMP_MODIFY_RETURN];
	void *orig_call = func_addr;
+	int cookie_off, cookie_cnt;
	u8 **branches = NULL;
+	u64 func_meta;
	u8 *prog;
	bool save_ret;

@@ -3249,7 +3267,7 @@ static int __arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *rw_im
	 *                 [ ...             ]
	 * RBP - regs_off  [ reg_arg1        ]  program's ctx pointer
	 *
-	 * RBP - nregs_off [ regs count	     ]  always
+	 * RBP - func_meta_off [ regs count, etc ]  always
	 *
	 * RBP - ip_off    [ traced function ]  BPF_TRAMP_F_IP_ARG flag
	 *
@@ -3272,15 +3290,20 @@ static int __arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *rw_im
	stack_size += nr_regs * 8;
	regs_off = stack_size;

-	/* regs count  */
+	/* function matedata, such as regs count  */
	stack_size += 8;
-	nregs_off = stack_size;
+	func_meta_off = stack_size;

	if (flags & BPF_TRAMP_F_IP_ARG)
		stack_size += 8; /* room for IP address argument */

	ip_off = stack_size;

+	cookie_cnt = bpf_fsession_cookie_cnt(tlinks);
+	/* room for session cookies */
+	stack_size += cookie_cnt * 8;
+	cookie_off = stack_size;
+
	stack_size += 8;
	rbx_off = stack_size;

@@ -3348,20 +3371,13 @@ static int __arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *rw_im
	/* mov QWORD PTR [rbp - rbx_off], rbx */
	emit_stx(&prog, BPF_DW, BPF_REG_FP, BPF_REG_6, -rbx_off);

-	/* Store number of argument registers of the traced function:
-	 *   mov rax, nr_regs
-	 *   mov QWORD PTR [rbp - nregs_off], rax
-	 */
-	emit_mov_imm64(&prog, BPF_REG_0, 0, (u32) nr_regs);
-	emit_stx(&prog, BPF_DW, BPF_REG_FP, BPF_REG_0, -nregs_off);
+	func_meta = nr_regs;
+	/* Store number of argument registers of the traced function */
+	emit_store_stack_imm64(&prog, BPF_REG_0, -func_meta_off, func_meta);

	if (flags & BPF_TRAMP_F_IP_ARG) {
-		/* Store IP address of the traced function:
-		 * movabsq rax, func_addr
-		 * mov QWORD PTR [rbp - ip_off], rax
-		 */
-		emit_mov_imm64(&prog, BPF_REG_0, (long) func_addr >> 32, (u32) (long) func_addr);
-		emit_stx(&prog, BPF_DW, BPF_REG_FP, BPF_REG_0, -ip_off);
+		/* Store IP address of the traced function */
+		emit_store_stack_imm64(&prog, BPF_REG_0, -ip_off, (long)func_addr);
	}

	save_args(m, &prog, regs_off, false, flags);
@@ -3376,9 +3392,18 @@ static int __arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *rw_im
		}
	}

+	if (bpf_fsession_cnt(tlinks)) {
+		/* clear all the session cookies' value */
+		for (int i = 0; i < cookie_cnt; i++)
+			emit_store_stack_imm64(&prog, BPF_REG_0, -cookie_off + 8 * i, 0);
+		/* clear the return value to make sure fentry always get 0 */
+		emit_store_stack_imm64(&prog, BPF_REG_0, -8, 0);
+	}
+
	if (fentry->nr_links) {
-		if (invoke_bpf(m, &prog, fentry, regs_off, run_ctx_off,
-			       flags & BPF_TRAMP_F_RET_FENTRY_RET, image, rw_image))
+		if (invoke_bpf(m, &prog, fentry, regs_off, run_ctx_off, func_meta_off,
+			       flags & BPF_TRAMP_F_RET_FENTRY_RET, image, rw_image,
+			       func_meta, cookie_off))
			return -EINVAL;
	}

@@ -3438,9 +3463,14 @@ static int __arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *rw_im
		}
	}

+	/* set the "is_return" flag for fsession */
+	func_meta |= (1ULL << BPF_TRAMP_IS_RETURN_SHIFT);
+	if (bpf_fsession_cnt(tlinks))
+		emit_store_stack_imm64(&prog, BPF_REG_0, -func_meta_off, func_meta);
+
	if (fexit->nr_links) {
-		if (invoke_bpf(m, &prog, fexit, regs_off, run_ctx_off,
-			       false, image, rw_image)) {
+		if (invoke_bpf(m, &prog, fexit, regs_off, run_ctx_off, func_meta_off,
+			       false, image, rw_image, func_meta, cookie_off)) {
			ret = -EINVAL;
			goto cleanup;
		}
+36 −0
@@ -1229,6 +1229,9 @@ enum {
#endif
};

+#define BPF_TRAMP_COOKIE_INDEX_SHIFT	8
+#define BPF_TRAMP_IS_RETURN_SHIFT	63
+
struct bpf_tramp_links {
	struct bpf_tramp_link *links[BPF_MAX_TRAMP_LINKS];
	int nr_links;
@@ -1309,6 +1312,7 @@ enum bpf_tramp_prog_type {
	BPF_TRAMP_MODIFY_RETURN,
	BPF_TRAMP_MAX,
	BPF_TRAMP_REPLACE, /* more than MAX */
+	BPF_TRAMP_FSESSION,
};

struct bpf_tramp_image {
@@ -1779,6 +1783,7 @@ struct bpf_prog {
				enforce_expected_attach_type:1, /* Enforce expected_attach_type checking at attach time */
				call_get_stack:1, /* Do we call bpf_get_stack() or bpf_get_stackid() */
				call_get_func_ip:1, /* Do we call get_func_ip() */
+				call_session_cookie:1, /* Do we call bpf_session_cookie() */
				tstamp_type_access:1, /* Accessed __sk_buff->tstamp_type */
				sleepable:1;	/* BPF program is sleepable */
	enum bpf_prog_type	type;		/* Type of BPF program */
@@ -1875,6 +1880,11 @@ struct bpf_tracing_link {
	struct bpf_prog *tgt_prog;
};

+struct bpf_fsession_link {
+	struct bpf_tracing_link link;
+	struct bpf_tramp_link fexit;
+};
+
struct bpf_raw_tp_link {
	struct bpf_link link;
	struct bpf_raw_event_map *btp;
@@ -2169,6 +2179,32 @@ static inline void bpf_struct_ops_desc_release(struct bpf_struct_ops_desc *st_op

#endif

+static inline int bpf_fsession_cnt(struct bpf_tramp_links *links)
+{
+	struct bpf_tramp_links fentries = links[BPF_TRAMP_FENTRY];
+	int cnt = 0;
+
+	for (int i = 0; i < links[BPF_TRAMP_FENTRY].nr_links; i++) {
+		if (fentries.links[i]->link.prog->expected_attach_type == BPF_TRACE_FSESSION)
+			cnt++;
+	}
+
+	return cnt;
+}
+
+static inline int bpf_fsession_cookie_cnt(struct bpf_tramp_links *links)
+{
+	struct bpf_tramp_links fentries = links[BPF_TRAMP_FENTRY];
+	int cnt = 0;
+
+	for (int i = 0; i < links[BPF_TRAMP_FENTRY].nr_links; i++) {
+		if (fentries.links[i]->link.prog->call_session_cookie)
+			cnt++;
+	}
+
+	return cnt;
+}
+
int bpf_prog_ctx_arg_info_init(struct bpf_prog *prog,
			       const struct bpf_ctx_arg_aux *info, u32 cnt);

+1 −0
@@ -1145,6 +1145,7 @@ enum bpf_attach_type {
	BPF_NETKIT_PEER,
	BPF_TRACE_KPROBE_SESSION,
	BPF_TRACE_UPROBE_SESSION,
+	BPF_TRACE_FSESSION,
	__MAX_BPF_ATTACH_TYPE
};

+2 −0
@@ -6219,6 +6219,7 @@ static int btf_validate_prog_ctx_type(struct bpf_verifier_log *log, const struct
		case BPF_TRACE_FENTRY:
		case BPF_TRACE_FEXIT:
		case BPF_MODIFY_RETURN:
+		case BPF_TRACE_FSESSION:
			/* allow u64* as ctx */
			if (btf_is_int(t) && t->size == 8)
				return 0;
@@ -6820,6 +6821,7 @@ bool btf_ctx_access(int off, int size, enum bpf_access_type type,
			fallthrough;
		case BPF_LSM_CGROUP:
		case BPF_TRACE_FEXIT:
+		case BPF_TRACE_FSESSION:
			/* When LSM programs are attached to void LSM hooks
			 * they use FEXIT trampolines and when attached to
			 * int LSM hooks, they use MODIFY_RETURN trampolines.
+17 −1
@@ -3577,6 +3577,7 @@ static int bpf_tracing_prog_attach(struct bpf_prog *prog,
	case BPF_PROG_TYPE_TRACING:
		if (prog->expected_attach_type != BPF_TRACE_FENTRY &&
		    prog->expected_attach_type != BPF_TRACE_FEXIT &&
+		    prog->expected_attach_type != BPF_TRACE_FSESSION &&
		    prog->expected_attach_type != BPF_MODIFY_RETURN) {
			err = -EINVAL;
			goto out_put_prog;
@@ -3626,7 +3627,21 @@ static int bpf_tracing_prog_attach(struct bpf_prog *prog,
		key = bpf_trampoline_compute_key(tgt_prog, NULL, btf_id);
	}

-	link = kzalloc(sizeof(*link), GFP_USER);
+	if (prog->expected_attach_type == BPF_TRACE_FSESSION) {
+		struct bpf_fsession_link *fslink;
+
+		fslink = kzalloc(sizeof(*fslink), GFP_USER);
+		if (fslink) {
+			bpf_link_init(&fslink->fexit.link, BPF_LINK_TYPE_TRACING,
+				      &bpf_tracing_link_lops, prog, attach_type);
+			fslink->fexit.cookie = bpf_cookie;
+			link = &fslink->link;
+		} else {
+			link = NULL;
+		}
+	} else {
+		link = kzalloc(sizeof(*link), GFP_USER);
+	}
	if (!link) {
		err = -ENOMEM;
		goto out_put_prog;
@@ -4350,6 +4365,7 @@ attach_type_to_prog_type(enum bpf_attach_type attach_type)
	case BPF_TRACE_RAW_TP:
	case BPF_TRACE_FENTRY:
	case BPF_TRACE_FEXIT:
+	case BPF_TRACE_FSESSION:
	case BPF_MODIFY_RETURN:
		return BPF_PROG_TYPE_TRACING;
	case BPF_LSM_MAC: