Commit 7b769adc authored by Paolo Abeni
Daniel Borkmann says:

====================
pull-request: bpf-next 2024-07-08

The following pull-request contains BPF updates for your *net-next* tree.

We've added 102 non-merge commits during the last 28 day(s) which contain
a total of 127 files changed, 4606 insertions(+), 980 deletions(-).

The main changes are:

1) Support resilient split BTF which cuts down on duplication and makes BTF
   as compact as possible wrt BTF from modules, from Alan Maguire & Eduard Zingerman.

2) Add support for dumping kfunc prototypes from BTF which enables both detecting
   and dumping compilable prototypes for kfuncs, from Daniel Xu.

3) Batch of s390x BPF JIT improvements to add support for BPF arena and to implement
   support for BPF exceptions, from Ilya Leoshkevich.

4) Batch of riscv64 BPF JIT improvements in particular to add 12-argument support
   for BPF trampolines and to utilize bpf_prog_pack for the latter, from Pu Lehui.

5) Extend BPF test infrastructure to add a CHECKSUM_COMPLETE validation option
   for skbs and add coverage along with it, from Vadim Fedorenko.

6) Inline bpf_get_current_task/_btf() helpers in the arm64 BPF JIT which gives
   a small 1% performance improvement in micro-benchmarks, from Puranjay Mohan.

7) Extend the BPF verifier to track the delta between linked registers in order
   to better deal with recent LLVM code optimizations, from Alexei Starovoitov.

8) Fix bpf_wq_set_callback_impl() kfunc signature where the third argument should
   have been a pointer to the map value, from Benjamin Tissoires.

9) Extend BPF selftests to add regular expression support for test output matching
   and adjust some of the selftests when compiled under gcc, from Cupertino Miranda.

10) Simplify task_file_seq_get_next() and remove an unnecessary loop which always
    iterates exactly once anyway, from Dan Carpenter.

11) Add the capability to offload the netfilter flowtable in XDP layer through
    kfuncs, from Florian Westphal & Lorenzo Bianconi.

12) Various cleanups in networking helpers in BPF selftests to shave off a few
    lines of open-coded functions on client/server handling, from Geliang Tang.

13) Properly propagate prog->aux->tail_call_reachable out of BPF verifier, so
    that x86 JIT does not need to implement detection, from Leon Hwang.

14) Fix BPF verifier to add a missing check_func_arg_reg_off() to prevent an
    out-of-bounds memory access for dynpointers, from Matt Bobrowski.

15) Fix bpf_session_cookie() kfunc to return __u64 instead of long pointer as
    it might lead to problems on 32-bit archs, from Jiri Olsa.

16) Enhance traffic validation and dynamic batch size support in xsk selftests,
    from Tushar Vyavahare.

bpf-next-for-netdev

* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (102 commits)
  selftests/bpf: DENYLIST.aarch64: Remove fexit_sleep
  selftests/bpf: amend for wrong bpf_wq_set_callback_impl signature
  bpf: helpers: fix bpf_wq_set_callback_impl signature
  libbpf: Add NULL checks to bpf_object__{prev_map,next_map}
  selftests/bpf: Remove exceptions tests from DENYLIST.s390x
  s390/bpf: Implement exceptions
  s390/bpf: Change seen_reg to a mask
  bpf: Remove unnecessary loop in task_file_seq_get_next()
  riscv, bpf: Optimize stack usage of trampoline
  bpf, devmap: Add .map_alloc_check
  selftests/bpf: Remove arena tests from DENYLIST.s390x
  selftests/bpf: Add UAF tests for arena atomics
  selftests/bpf: Introduce __arena_global
  s390/bpf: Support arena atomics
  s390/bpf: Enable arena
  s390/bpf: Support address space cast instruction
  s390/bpf: Support BPF_PROBE_MEM32
  s390/bpf: Land on the next JITed instruction after exception
  s390/bpf: Introduce pre- and post- probe functions
  s390/bpf: Get rid of get_probe_mem_regno()
  ...
====================

Link: https://patch.msgid.link/20240708221438.10974-1-daniel@iogearbox.net


Signed-off-by: Paolo Abeni <pabeni@redhat.com>
parents 870a1dbc 90dc9460
+45 −35
@@ -5,12 +5,19 @@
BPF Instruction Set Architecture (ISA)
======================================

eBPF (which is no longer an acronym for anything), also commonly
eBPF, also commonly
referred to as BPF, is a technology with origins in the Linux kernel
that can run untrusted programs in a privileged context such as an
operating system kernel. This document specifies the BPF instruction
set architecture (ISA).

As a historical note, BPF originally stood for Berkeley Packet Filter,
but now that it can do so much more than packet filtering, the acronym
no longer makes sense. BPF is now considered a standalone term that
does not stand for anything.  The original BPF is sometimes referred to
as cBPF (classic BPF) to distinguish it from the now widely deployed
eBPF (extended BPF).

Documentation conventions
=========================

@@ -18,7 +25,7 @@ The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
"OPTIONAL" in this document are to be interpreted as described in
BCP 14 `<https://www.rfc-editor.org/info/rfc2119>`_
`RFC8174 <https://www.rfc-editor.org/info/rfc8174>`_
`<https://www.rfc-editor.org/info/rfc8174>`_
when, and only when, they appear in all capitals, as shown here.

For brevity and consistency, this document refers to families
@@ -59,24 +66,18 @@ numbers.

Functions
---------
* htobe16: Takes an unsigned 16-bit number in host-endian format and
  returns the equivalent number as an unsigned 16-bit number in big-endian
  format.
* htobe32: Takes an unsigned 32-bit number in host-endian format and
  returns the equivalent number as an unsigned 32-bit number in big-endian
  format.
* htobe64: Takes an unsigned 64-bit number in host-endian format and
  returns the equivalent number as an unsigned 64-bit number in big-endian
  format.
* htole16: Takes an unsigned 16-bit number in host-endian format and
  returns the equivalent number as an unsigned 16-bit number in little-endian
  format.
* htole32: Takes an unsigned 32-bit number in host-endian format and
  returns the equivalent number as an unsigned 32-bit number in little-endian
  format.
* htole64: Takes an unsigned 64-bit number in host-endian format and
  returns the equivalent number as an unsigned 64-bit number in little-endian
  format.

The following byteswap functions are direction-agnostic.  That is,
the same function is used for conversion in either direction discussed
below.

* be16: Takes an unsigned 16-bit number and converts it between
  host byte order and big-endian
  (`IEN137 <https://www.rfc-editor.org/ien/ien137.txt>`_) byte order.
* be32: Takes an unsigned 32-bit number and converts it between
  host byte order and big-endian byte order.
* be64: Takes an unsigned 64-bit number and converts it between
  host byte order and big-endian byte order.
* bswap16: Takes an unsigned 16-bit number in either big- or little-endian
  format and returns the equivalent number with the same bit width but
  opposite endianness.
@@ -86,7 +87,12 @@ Functions
* bswap64: Takes an unsigned 64-bit number in either big- or little-endian
  format and returns the equivalent number with the same bit width but
  opposite endianness.

* le16: Takes an unsigned 16-bit number and converts it between
  host byte order and little-endian byte order.
* le32: Takes an unsigned 32-bit number and converts it between
  host byte order and little-endian byte order.
* le64: Takes an unsigned 64-bit number and converts it between
  host byte order and little-endian byte order.
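
As an illustration of the direction-agnostic wording above, the 16-bit
variants could be modelled in portable C roughly as follows. This is only a
sketch: the ``sketch_`` names and the run-time endianness probe are
assumptions made for this example and are not part of the specification or
of any kernel API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Unconditionally reverse the two bytes of a 16-bit value (bswap16). */
uint16_t sketch_bswap16(uint16_t v)
{
	return (uint16_t)((v << 8) | (v >> 8));
}

/* Probe host byte order by looking at the first byte of a known value. */
bool sketch_host_is_little_endian(void)
{
	const uint16_t probe = 1;

	return *(const uint8_t *)&probe == 1;
}

/*
 * be16: convert between host byte order and big-endian.
 * The same function serves both directions: it is either the
 * identity (big-endian host) or a byte swap (little-endian host).
 */
uint16_t sketch_be16(uint16_t v)
{
	return sketch_host_is_little_endian() ? sketch_bswap16(v) : v;
}

/* le16: convert between host byte order and little-endian. */
uint16_t sketch_le16(uint16_t v)
{
	return sketch_host_is_little_endian() ? v : sketch_bswap16(v);
}
```

Because each conversion is either the identity or a swap, applying it twice
returns the original value, which is why separate "host to" and "from host"
functions are unnecessary.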

Definitions
-----------
@@ -437,8 +443,8 @@ and MUST be set to 0.
  =====  ========  =====  =================================================
  class  source    value  description
  =====  ========  =====  =================================================
  ALU    TO_LE     0      convert between host byte order and little endian
  ALU    TO_BE     1      convert between host byte order and big endian
  ALU    LE        0      convert between host byte order and little endian
  ALU    BE        1      convert between host byte order and big endian
  ALU64  Reserved  0      do byte swap unconditionally
  =====  ========  =====  =================================================

@@ -449,19 +455,19 @@ conformance group.

Examples:

``{END, TO_LE, ALU}`` with 'imm' = 16/32/64 means::
``{END, LE, ALU}`` with 'imm' = 16/32/64 means::

  dst = htole16(dst)
  dst = htole32(dst)
  dst = htole64(dst)
  dst = le16(dst)
  dst = le32(dst)
  dst = le64(dst)

``{END, TO_BE, ALU}`` with 'imm' = 16/32/64 means::
``{END, BE, ALU}`` with 'imm' = 16/32/64 means::

  dst = htobe16(dst)
  dst = htobe32(dst)
  dst = htobe64(dst)
  dst = be16(dst)
  dst = be32(dst)
  dst = be64(dst)

``{END, TO_LE, ALU64}`` with 'imm' = 16/32/64 means::
``{END, TO, ALU64}`` with 'imm' = 16/32/64 means::

  dst = bswap16(dst)
  dst = bswap32(dst)
@@ -541,13 +547,17 @@ Helper functions are a concept whereby BPF programs can call into a
set of function calls exposed by the underlying platform.

Historically, each helper function was identified by a static ID
encoded in the 'imm' field.  The available helper functions may differ
for each program type, but static IDs are unique across all program types.
encoded in the 'imm' field.  Further documentation of helper functions
is outside the scope of this document and standardization is left for
future work, but use is widely deployed and more information can be
found in platform-specific documentation (e.g., Linux kernel documentation).

Platforms that support the BPF Type Format (BTF) support identifying
a helper function by a BTF ID encoded in the 'imm' field, where the BTF ID
identifies the helper name and type.  Further documentation of BTF
is outside the scope of this document and is left for future work.
is outside the scope of this document and standardization is left for
future work, but use is widely deployed and more information can be
found in platform-specific documentation (e.g., Linux kernel documentation).

Program-local functions
~~~~~~~~~~~~~~~~~~~~~~~
+10 −2
@@ -1244,6 +1244,13 @@ static int build_insn(const struct bpf_insn *insn, struct jit_ctx *ctx,
			break;
		}

		/* Implement helper call to bpf_get_current_task/_btf() inline */
		if (insn->src_reg == 0 && (insn->imm == BPF_FUNC_get_current_task ||
					   insn->imm == BPF_FUNC_get_current_task_btf)) {
			emit(A64_MRS_SP_EL0(r0), ctx);
			break;
		}

		ret = bpf_jit_get_func_addr(ctx->prog, insn, extra_pass,
					    &func_addr, &func_addr_fixed);
		if (ret < 0)
@@ -1829,8 +1836,7 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *prog)
			prog->jited_len = 0;
			goto out_free_hdr;
		}
		if (WARN_ON(bpf_jit_binary_pack_finalize(prog, ro_header,
							 header))) {
		if (WARN_ON(bpf_jit_binary_pack_finalize(ro_header, header))) {
			/* ro_header has been freed */
			ro_header = NULL;
			prog = orig_prog;
@@ -2581,6 +2587,8 @@ bool bpf_jit_inlines_helper_call(s32 imm)
{
	switch (imm) {
	case BPF_FUNC_get_smp_processor_id:
	case BPF_FUNC_get_current_task:
	case BPF_FUNC_get_current_task_btf:
		return true;
	default:
		return false;
+2 −2
@@ -225,7 +225,7 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp)
	fp->jited_len = proglen + FUNCTION_DESCR_SIZE;

	if (!fp->is_func || extra_pass) {
		if (bpf_jit_binary_pack_finalize(fp, fhdr, hdr)) {
		if (bpf_jit_binary_pack_finalize(fhdr, hdr)) {
			fp = org_fp;
			goto out_addrs;
		}
@@ -348,7 +348,7 @@ void bpf_jit_free(struct bpf_prog *fp)
		 * before freeing it.
		 */
		if (jit_data) {
			bpf_jit_binary_pack_finalize(fp, jit_data->fhdr, jit_data->hdr);
			bpf_jit_binary_pack_finalize(jit_data->fhdr, jit_data->hdr);
			kvfree(jit_data->addrs);
			kfree(jit_data);
		}
+87 −36
@@ -15,7 +15,10 @@
#include <asm/percpu.h>
#include "bpf_jit.h"

#define RV_MAX_REG_ARGS 8
#define RV_FENTRY_NINSNS 2
/* imm that allows emit_imm to emit max count insns */
#define RV_MAX_COUNT_IMM 0x7FFF7FF7FF7FF7FF

#define RV_REG_TCC RV_REG_A6
#define RV_REG_TCC_SAVED RV_REG_S6 /* Store A6 in S6 if program do calls */
@@ -690,26 +693,45 @@ int bpf_arch_text_poke(void *ip, enum bpf_text_poke_type poke_type,
	return ret;
}

static void store_args(int nregs, int args_off, struct rv_jit_context *ctx)
static void store_args(int nr_arg_slots, int args_off, struct rv_jit_context *ctx)
{
	int i;

	for (i = 0; i < nregs; i++) {
	for (i = 0; i < nr_arg_slots; i++) {
		if (i < RV_MAX_REG_ARGS) {
			emit_sd(RV_REG_FP, -args_off, RV_REG_A0 + i, ctx);
		} else {
			/* skip slots for T0 and FP of traced function */
			emit_ld(RV_REG_T1, 16 + (i - RV_MAX_REG_ARGS) * 8, RV_REG_FP, ctx);
			emit_sd(RV_REG_FP, -args_off, RV_REG_T1, ctx);
		}
		args_off -= 8;
	}
}

static void restore_args(int nregs, int args_off, struct rv_jit_context *ctx)
static void restore_args(int nr_reg_args, int args_off, struct rv_jit_context *ctx)
{
	int i;

	for (i = 0; i < nregs; i++) {
	for (i = 0; i < nr_reg_args; i++) {
		emit_ld(RV_REG_A0 + i, -args_off, RV_REG_FP, ctx);
		args_off -= 8;
	}
}

static void restore_stack_args(int nr_stack_args, int args_off, int stk_arg_off,
			       struct rv_jit_context *ctx)
{
	int i;

	for (i = 0; i < nr_stack_args; i++) {
		emit_ld(RV_REG_T1, -(args_off - RV_MAX_REG_ARGS * 8), RV_REG_FP, ctx);
		emit_sd(RV_REG_FP, -stk_arg_off, RV_REG_T1, ctx);
		args_off -= 8;
		stk_arg_off -= 8;
	}
}
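
The slot arithmetic used by store_args() and by the stack-size computation in
this patch can be sketched as standalone C. This is a hedged illustration
only: the function names and MAX_REG_ARGS are hypothetical stand-ins (the
latter mirrors RV_MAX_REG_ARGS), and the sketch computes offsets rather than
emitting any instructions.

```c
#include <assert.h>

#define MAX_REG_ARGS 8	/* stand-in for RV_MAX_REG_ARGS in the patch */

/* Number of 8-byte slots needed for arguments of the given byte sizes. */
int count_arg_slots(const int *arg_size, int nr_args)
{
	int i, slots = 0;

	for (i = 0; i < nr_args; i++)
		slots += (arg_size[i] + 7) / 8;	/* round_up(size, 8) / 8 */
	return slots;
}

/*
 * FP-relative offset a stack-passed slot is loaded from. The first
 * 16 bytes above FP are skipped, matching the "skip slots for T0 and
 * FP of traced function" comment in store_args().
 */
int stack_slot_src_off(int slot)
{
	assert(slot >= MAX_REG_ARGS);	/* only slots beyond a0..a7 live on the stack */
	return 16 + (slot - MAX_REG_ARGS) * 8;
}

/* Extra bytes reserved for outgoing stack arguments when calling the original function. */
int stack_arg_bytes(int nr_arg_slots)
{
	return nr_arg_slots > MAX_REG_ARGS ?
	       (nr_arg_slots - MAX_REG_ARGS) * 8 : 0;
}
```

For example, eight 8-byte arguments followed by one 16-byte struct and one
4-byte integer occupy 11 slots, so three slots (24 bytes) are passed on the
stack.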

static int invoke_bpf_prog(struct bpf_tramp_link *l, int args_off, int retval_off,
			   int run_ctx_off, bool save_ret, struct rv_jit_context *ctx)
{
@@ -782,8 +804,8 @@ static int __arch_prepare_bpf_trampoline(struct bpf_tramp_image *im,
{
	int i, ret, offset;
	int *branches_off = NULL;
	int stack_size = 0, nregs = m->nr_args;
	int retval_off, args_off, nregs_off, ip_off, run_ctx_off, sreg_off;
	int stack_size = 0, nr_arg_slots = 0;
	int retval_off, args_off, nregs_off, ip_off, run_ctx_off, sreg_off, stk_arg_off;
	struct bpf_tramp_links *fentry = &tlinks[BPF_TRAMP_FENTRY];
	struct bpf_tramp_links *fexit = &tlinks[BPF_TRAMP_FEXIT];
	struct bpf_tramp_links *fmod_ret = &tlinks[BPF_TRAMP_MODIFY_RETURN];
@@ -829,20 +851,21 @@ static int __arch_prepare_bpf_trampoline(struct bpf_tramp_image *im,
	 * FP - sreg_off    [ callee saved reg	]
	 *
	 *		    [ pads              ] pads for 16 bytes alignment
	 *
	 *		    [ stack_argN        ]
	 *		    [ ...               ]
	 * FP - stk_arg_off [ stack_arg1        ] BPF_TRAMP_F_CALL_ORIG
	 */

	if (flags & (BPF_TRAMP_F_ORIG_STACK | BPF_TRAMP_F_SHARE_IPMODIFY))
		return -ENOTSUPP;

	/* extra regiters for struct arguments */
	for (i = 0; i < m->nr_args; i++)
		if (m->arg_flags[i] & BTF_FMODEL_STRUCT_ARG)
			nregs += round_up(m->arg_size[i], 8) / 8 - 1;

	/* 8 arguments passed by registers */
	if (nregs > 8)
	if (m->nr_args > MAX_BPF_FUNC_ARGS)
		return -ENOTSUPP;

	for (i = 0; i < m->nr_args; i++)
		nr_arg_slots += round_up(m->arg_size[i], 8) / 8;

	/* room of trampoline frame to store return address and frame pointer */
	stack_size += 16;

@@ -852,7 +875,7 @@ static int __arch_prepare_bpf_trampoline(struct bpf_tramp_image *im,
		retval_off = stack_size;
	}

	stack_size += nregs * 8;
	stack_size += nr_arg_slots * 8;
	args_off = stack_size;

	stack_size += 8;
@@ -869,8 +892,14 @@ static int __arch_prepare_bpf_trampoline(struct bpf_tramp_image *im,
	stack_size += 8;
	sreg_off = stack_size;

	if ((flags & BPF_TRAMP_F_CALL_ORIG) && (nr_arg_slots - RV_MAX_REG_ARGS > 0))
		stack_size += (nr_arg_slots - RV_MAX_REG_ARGS) * 8;

	stack_size = round_up(stack_size, STACK_ALIGN);

	/* room for args on stack must be at the top of stack */
	stk_arg_off = stack_size;

	if (!is_struct_ops) {
		/* For the trampoline called from function entry,
		 * the frame of traced function and the frame of
@@ -906,17 +935,17 @@ static int __arch_prepare_bpf_trampoline(struct bpf_tramp_image *im,
		emit_sd(RV_REG_FP, -ip_off, RV_REG_T1, ctx);
	}

	emit_li(RV_REG_T1, nregs, ctx);
	emit_li(RV_REG_T1, nr_arg_slots, ctx);
	emit_sd(RV_REG_FP, -nregs_off, RV_REG_T1, ctx);

	store_args(nregs, args_off, ctx);
	store_args(nr_arg_slots, args_off, ctx);

	/* skip to actual body of traced function */
	if (flags & BPF_TRAMP_F_SKIP_FRAME)
		orig_call += RV_FENTRY_NINSNS * 4;

	if (flags & BPF_TRAMP_F_CALL_ORIG) {
		emit_imm(RV_REG_A0, (const s64)im, ctx);
		emit_imm(RV_REG_A0, ctx->insns ? (const s64)im : RV_MAX_COUNT_IMM, ctx);
		ret = emit_call((const u64)__bpf_tramp_enter, true, ctx);
		if (ret)
			return ret;
@@ -949,13 +978,14 @@ static int __arch_prepare_bpf_trampoline(struct bpf_tramp_image *im,
	}

	if (flags & BPF_TRAMP_F_CALL_ORIG) {
		restore_args(nregs, args_off, ctx);
		restore_args(min_t(int, nr_arg_slots, RV_MAX_REG_ARGS), args_off, ctx);
		restore_stack_args(nr_arg_slots - RV_MAX_REG_ARGS, args_off, stk_arg_off, ctx);
		ret = emit_call((const u64)orig_call, true, ctx);
		if (ret)
			goto out;
		emit_sd(RV_REG_FP, -retval_off, RV_REG_A0, ctx);
		emit_sd(RV_REG_FP, -(retval_off - 8), regmap[BPF_REG_0], ctx);
		im->ip_after_call = ctx->insns + ctx->ninsns;
		im->ip_after_call = ctx->ro_insns + ctx->ninsns;
		/* 2 nops reserved for auipc+jalr pair */
		emit(rv_nop(), ctx);
		emit(rv_nop(), ctx);
@@ -976,15 +1006,15 @@ static int __arch_prepare_bpf_trampoline(struct bpf_tramp_image *im,
	}

	if (flags & BPF_TRAMP_F_CALL_ORIG) {
		im->ip_epilogue = ctx->insns + ctx->ninsns;
		emit_imm(RV_REG_A0, (const s64)im, ctx);
		im->ip_epilogue = ctx->ro_insns + ctx->ninsns;
		emit_imm(RV_REG_A0, ctx->insns ? (const s64)im : RV_MAX_COUNT_IMM, ctx);
		ret = emit_call((const u64)__bpf_tramp_exit, true, ctx);
		if (ret)
			goto out;
	}

	if (flags & BPF_TRAMP_F_RESTORE_REGS)
		restore_args(nregs, args_off, ctx);
		restore_args(min_t(int, nr_arg_slots, RV_MAX_REG_ARGS), args_off, ctx);

	if (save_ret) {
		emit_ld(RV_REG_A0, -retval_off, RV_REG_FP, ctx);
@@ -1039,31 +1069,52 @@ int arch_bpf_trampoline_size(const struct btf_func_model *m, u32 flags,
	return ret < 0 ? ret : ninsns_rvoff(ctx.ninsns);
}

int arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *image,
				void *image_end, const struct btf_func_model *m,
void *arch_alloc_bpf_trampoline(unsigned int size)
{
	return bpf_prog_pack_alloc(size, bpf_fill_ill_insns);
}

void arch_free_bpf_trampoline(void *image, unsigned int size)
{
	bpf_prog_pack_free(image, size);
}

int arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *ro_image,
				void *ro_image_end, const struct btf_func_model *m,
				u32 flags, struct bpf_tramp_links *tlinks,
				void *func_addr)
{
	int ret;
	void *image, *res;
	struct rv_jit_context ctx;
	u32 size = ro_image_end - ro_image;

	image = kvmalloc(size, GFP_KERNEL);
	if (!image)
		return -ENOMEM;

	ctx.ninsns = 0;
	/*
	 * The bpf_int_jit_compile() uses a RW buffer (ctx.insns) to write the
	 * JITed instructions and later copies it to a RX region (ctx.ro_insns).
	 * It also uses ctx.ro_insns to calculate offsets for jumps etc. As the
	 * trampoline image uses the same memory area for writing and execution,
	 * both ctx.insns and ctx.ro_insns can be set to image.
	 */
	ctx.insns = image;
	ctx.ro_insns = image;
	ctx.ro_insns = ro_image;
	ret = __arch_prepare_bpf_trampoline(im, m, tlinks, func_addr, flags, &ctx);
	if (ret < 0)
		return ret;
		goto out;

	bpf_flush_icache(ctx.insns, ctx.insns + ctx.ninsns);
	if (WARN_ON(size < ninsns_rvoff(ctx.ninsns))) {
		ret = -E2BIG;
		goto out;
	}

	return ninsns_rvoff(ret);
	res = bpf_arch_text_copy(ro_image, image, size);
	if (IS_ERR(res)) {
		ret = PTR_ERR(res);
		goto out;
	}

	bpf_flush_icache(ro_image, ro_image_end);
out:
	kvfree(image);
	return ret < 0 ? ret : size;
}
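
The new arch_prepare_bpf_trampoline() flow above (emit into a temporary
writable buffer, check the size, copy into the final image, then free the
buffer) can be illustrated with a userspace analogy. This is a sketch under
stated assumptions, not kernel code: build_image(), emit_fn and
emit_four_bytes() are hypothetical names, and calloc()/memcpy() merely stand
in for kvmalloc() and bpf_arch_text_copy().

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical emitter: writes at most cap bytes into buf and returns
 * the number of bytes the full output needs, or a negative errno.
 */
typedef int (*emit_fn)(unsigned char *buf, size_t cap);

/* Example emitter used for illustration: needs four bytes of 0x13. */
int emit_four_bytes(unsigned char *buf, size_t cap)
{
	size_t i;

	for (i = 0; i < 4 && i < cap; i++)
		buf[i] = 0x13;
	return 4;
}

/* Build into a scratch buffer first; touch final_image only on success. */
int build_image(unsigned char *final_image, size_t size, emit_fn emit)
{
	unsigned char *scratch;
	int ret;

	scratch = calloc(1, size);
	if (!scratch)
		return -ENOMEM;

	ret = emit(scratch, size);
	if (ret < 0)
		goto out;

	/* Output does not fit in the space reserved for the image. */
	if ((size_t)ret > size) {
		ret = -E2BIG;
		goto out;
	}

	memcpy(final_image, scratch, size);
	ret = (int)size;
out:
	free(scratch);	/* single exit path frees the scratch buffer */
	return ret;
}
```

Routing every error through one `out` label keeps the scratch buffer from
leaking, which matches the reason the patch replaces the early `return ret`
with `goto out`.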

int bpf_jit_emit_insn(const struct bpf_insn *insn, struct rv_jit_context *ctx,
+2 −3
@@ -178,8 +178,7 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *prog)
	prog->jited_len = prog_size - cfi_get_offset();

	if (!prog->is_func || extra_pass) {
		if (WARN_ON(bpf_jit_binary_pack_finalize(prog, jit_data->ro_header,
							 jit_data->header))) {
		if (WARN_ON(bpf_jit_binary_pack_finalize(jit_data->ro_header, jit_data->header))) {
			/* ro_header has been freed */
			jit_data->ro_header = NULL;
			prog = orig_prog;
@@ -258,7 +257,7 @@ void bpf_jit_free(struct bpf_prog *prog)
		 * before freeing it.
		 */
		if (jit_data) {
			bpf_jit_binary_pack_finalize(prog, jit_data->ro_header, jit_data->header);
			bpf_jit_binary_pack_finalize(jit_data->ro_header, jit_data->header);
			kfree(jit_data);
		}
		hdr = bpf_jit_binary_pack_hdr(prog);