Commit 89de2db1 authored by Jakub Kicinski
Daniel Borkmann says:

====================
pull-request: bpf-next 2024-04-29

We've added 147 non-merge commits during the last 32 day(s) which contain
a total of 158 files changed, 9400 insertions(+), 2213 deletions(-).

The main changes are:

1) Add an internal-only BPF per-CPU instruction for resolving per-CPU
   memory addresses and implement support in x86 BPF JIT. This allows
   inlining per-CPU array and hashmap lookups
   and the bpf_get_smp_processor_id() helper, from Andrii Nakryiko.

2) Add BPF link support for sk_msg and sk_skb programs, from Yonghong Song.

3) Optimize x86 BPF JIT's emit_mov_imm64, and add support for various
   atomics in bpf_arena which can be JITed as a single x86 instruction,
   from Alexei Starovoitov.

4) Add support for passing mark with bpf_fib_lookup helper,
   from Anton Protopopov.

5) Add a new bpf_wq API for deferring events and refactor sleepable
   bpf_timer code to keep common code where possible,
   from Benjamin Tissoires.

6) Fix BPF_PROG_TEST_RUN infra with regards to bpf_dummy_struct_ops programs
   to check when NULL is passed for non-NULLable parameters,
   from Eduard Zingerman.

7) Harden the BPF verifier's and/or/xor value tracking,
   from Harishankar Vishwanathan.

8) Introduce crypto kfuncs to make BPF programs able to utilize the kernel
   crypto subsystem, from Vadim Fedorenko.

9) Various improvements to the BPF instruction set standardization doc,
   from Dave Thaler.

10) Extend libbpf APIs to partially consume items from the BPF ringbuffer,
    from Andrea Righi.

11) Bigger batch of BPF selftests refactoring to use common network helpers
    and to drop duplicate code, from Geliang Tang.

12) Support bpf_tail_call_static() helper for BPF programs with GCC 13,
    from Jose E. Marchesi.

13) Add bpf_preempt_{disable,enable}() kfuncs in order to allow a BPF
    program to have code sections where preemption is disabled,
    from Kumar Kartikeya Dwivedi.

14) Allow invoking BPF kfuncs from BPF_PROG_TYPE_SYSCALL programs,
    from David Vernet.

15) Extend the BPF verifier to allow different input maps for a given
    bpf_for_each_map_elem() helper call in a BPF program, from Philo Lu.

16) Add support for PROBE_MEM32 and bpf_addr_space_cast instructions
    for riscv64 and arm64 JITs to enable BPF Arena, from Puranjay Mohan.

17) Shut up a false-positive KMSAN splat in interpreter mode by unpoisoning
    the stack memory, from Martin KaFai Lau.

18) Improve xsk selftest coverage with new tests on maximum and minimum
    hardware ring size configurations, from Tushar Vyavahare.

19) Various ReST man pages fixes as well as documentation and bash completion
    improvements for bpftool, from Rameez Rehman & Quentin Monnet.

20) Fix libbpf with regards to dumping subsequent char arrays,
    from Quentin Deslandes.

* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (147 commits)
  bpf, docs: Clarify PC use in instruction-set.rst
  bpf_helpers.h: Define bpf_tail_call_static when building with GCC
  bpf, docs: Add introduction for use in the ISA Internet Draft
  selftests/bpf: extend BPF_SOCK_OPS_RTT_CB test for srtt and mrtt_us
  bpf: add mrtt and srtt as BPF_SOCK_OPS_RTT_CB args
  selftests/bpf: dummy_st_ops should reject 0 for non-nullable params
  bpf: check bpf_dummy_struct_ops program params for test runs
  selftests/bpf: do not pass NULL for non-nullable params in dummy_st_ops
  selftests/bpf: adjust dummy_st_ops_success to detect additional error
  bpf: mark bpf_dummy_struct_ops.test_1 parameter as nullable
  selftests/bpf: Add ring_buffer__consume_n test.
  bpf: Add bpf_guard_preempt() convenience macro
  selftests: bpf: crypto: add benchmark for crypto functions
  selftests: bpf: crypto skcipher algo selftests
  bpf: crypto: add skcipher to bpf crypto
  bpf: make common crypto API for TC/XDP programs
  bpf: update the comment for BTF_FIELDS_MAX
  selftests/bpf: Fix wq test.
  selftests/bpf: Use make_sockaddr in test_sock_addr
  selftests/bpf: Use connect_to_addr in test_sock_addr
  ...
====================

Link: https://lore.kernel.org/r/20240429131657.19423-1-daniel@iogearbox.net


Signed-off-by: Jakub Kicinski <kuba@kernel.org>
parents b3f1a08f 07801a24
+62 −47
@@ -5,7 +5,11 @@
BPF Instruction Set Architecture (ISA)
======================================

This document specifies the BPF instruction set architecture (ISA).
eBPF (which is no longer an acronym for anything), also commonly
referred to as BPF, is a technology with origins in the Linux kernel
that can run untrusted programs in a privileged context such as an
operating system kernel. This document specifies the BPF instruction
set architecture (ISA).

Documentation conventions
=========================
@@ -43,7 +47,7 @@ a type's signedness (`S`) and bit width (`N`), respectively.
  ===== =========

For example, `u32` is a type whose valid values are all the 32-bit unsigned
numbers and `s16` is a types whose valid values are all the 16-bit signed
numbers and `s16` is a type whose valid values are all the 16-bit signed
numbers.

Functions
@@ -108,7 +112,7 @@ conformance group means it must support all instructions in that conformance
group.

The use of named conformance groups enables interoperability between a runtime
that executes instructions, and tools as such compilers that generate
that executes instructions, and tools such as compilers that generate
instructions for the runtime.  Thus, capability discovery in terms of
conformance groups might be done manually by users or automatically by tools.

@@ -181,10 +185,13 @@ A basic instruction is encoded as follows::
    (`64-bit immediate instructions`_ reuse this field for other purposes)

  **dst_reg**
    destination register number (0-10)
    destination register number (0-10), unless otherwise specified
    (future instructions might reuse this field for other purposes)

**offset**
  signed integer offset used with pointer arithmetic
  signed integer offset used with pointer arithmetic, except where
  otherwise specified (some arithmetic instructions reuse this field
  for other purposes)

**imm**
  signed integer immediate value
@@ -228,10 +235,12 @@ This is depicted in the following figure::
  operation to perform, encoded as explained above

**regs**
  The source and destination register numbers, encoded as explained above
  The source and destination register numbers (unless otherwise
  specified), encoded as explained above

**offset**
  signed integer offset used with pointer arithmetic
  signed integer offset used with pointer arithmetic, unless
  otherwise specified

**imm**
  signed integer immediate value
@@ -342,8 +351,8 @@ where '(u32)' indicates that the upper 32 bits are zeroed.

  dst = dst ^ imm

Note that most instructions have instruction offset of 0. Only three instructions
(``SDIV``, ``SMOD``, ``MOVSX``) have a non-zero offset.
Note that most arithmetic instructions have 'offset' set to 0. Only three instructions
(``SDIV``, ``SMOD``, ``MOVSX``) have a non-zero 'offset'.

Division, multiplication, and modulo operations for ``ALU`` are part
of the "divmul32" conformance group, and division, multiplication, and
@@ -365,15 +374,15 @@ Note that there are varying definitions of the signed modulo operation
when the dividend or divisor are negative, where implementations often
vary by language such that Python, Ruby, etc.  differ from C, Go, Java,
etc. This specification requires that signed modulo use truncated division
(where -13 % 3 == -1) as implemented in C, Go, etc.:
(where -13 % 3 == -1) as implemented in C, Go, etc.::

   a % n = a - n * trunc(a / n)
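
The truncated-division rule above can be checked with a small C sketch. The
function names ``trunc_mod`` and ``floor_mod`` are illustrative only and do not
appear in the specification; the asserts are preconditions of this sketch and
say nothing about how a BPF runtime handles those inputs.

```c
#include <assert.h>
#include <stdint.h>

/* Signed modulo via truncated division: a % n = a - n * trunc(a / n).
 * C integer division already truncates toward zero (C99 and later). */
static int64_t trunc_mod(int64_t a, int64_t n)
{
	assert(n != 0);                       /* sketch-only precondition */
	assert(!(a == INT64_MIN && n == -1)); /* a / n would overflow in C */
	return a - n * (a / n);
}

/* Floored modulo, the Python/Ruby flavor, shown only for contrast. */
static int64_t floor_mod(int64_t a, int64_t n)
{
	int64_t r = trunc_mod(a, n);

	/* Shift the remainder when it is non-zero and its sign differs from n. */
	if (r != 0 && ((r < 0) != (n < 0)))
		r += n;
	return r;
}
```

With these definitions, ``trunc_mod(-13, 3)`` evaluates to -1, matching the
value required by the text, whereas ``floor_mod(-13, 3)`` evaluates to 2.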

The ``MOVSX`` instruction does a move operation with sign extension.
``{MOVSX, X, ALU}`` :term:`sign extends<Sign Extend>` 8-bit and 16-bit operands into 32
bit operands, and zeroes the remaining upper 32 bits.
``{MOVSX, X, ALU}`` :term:`sign extends<Sign Extend>` 8-bit and 16-bit operands into
32-bit operands, and zeroes the remaining upper 32 bits.
``{MOVSX, X, ALU64}`` :term:`sign extends<Sign Extend>` 8-bit, 16-bit, and 32-bit
operands into 64 bit operands.  Unlike other arithmetic instructions,
operands into 64-bit operands.  Unlike other arithmetic instructions,
``MOVSX`` is only defined for register source operands (``X``).

The ``NEG`` instruction is only defined when the source bit is clear
@@ -411,19 +420,19 @@ conformance group.

Examples:

``{END, TO_LE, ALU}`` with imm = 16/32/64 means::
``{END, TO_LE, ALU}`` with 'imm' = 16/32/64 means::

  dst = htole16(dst)
  dst = htole32(dst)
  dst = htole64(dst)

``{END, TO_BE, ALU}`` with imm = 16/32/64 means::
``{END, TO_BE, ALU}`` with 'imm' = 16/32/64 means::

  dst = htobe16(dst)
  dst = htobe32(dst)
  dst = htobe64(dst)

``{END, TO_LE, ALU64}`` with imm = 16/32/64 means::
``{END, TO_LE, ALU64}`` with 'imm' = 16/32/64 means::

  dst = bswap16(dst)
  dst = bswap32(dst)
@@ -438,9 +447,9 @@ otherwise identical operations, and indicates the base64 conformance
group unless otherwise specified.
The 'code' field encodes the operation as below:

========  =====  =======  ===============================  ===================================================
========  =====  =======  =================================  ===================================================
code      value  src_reg  description                        notes
========  =====  =======  ===============================  ===================================================
========  =====  =======  =================================  ===================================================
JA        0x0    0x0      PC += offset                       {JA, K, JMP} only
JA        0x0    0x0      PC += imm                          {JA, K, JMP32} only
JEQ       0x1    any      PC += offset if dst == src
@@ -450,7 +459,7 @@ JSET 0x4 any PC += offset if dst & src
JNE       0x5    any      PC += offset if dst != src
JSGT      0x6    any      PC += offset if dst > src          signed
JSGE      0x7    any      PC += offset if dst >= src         signed
CALL      0x8    0x0      call helper function by address  {CALL, K, JMP} only, see `Helper functions`_
CALL      0x8    0x0      call helper function by static ID  {CALL, K, JMP} only, see `Helper functions`_
CALL      0x8    0x1      call PC += imm                     {CALL, K, JMP} only, see `Program-local functions`_
CALL      0x8    0x2      call helper function by BTF ID     {CALL, K, JMP} only, see `Helper functions`_
EXIT      0x9    0x0      return                             {CALL, K, JMP} only
@@ -458,7 +467,13 @@ JLT 0xa any PC += offset if dst < src unsigned
JLE       0xb    any      PC += offset if dst <= src         unsigned
JSLT      0xc    any      PC += offset if dst < src          signed
JSLE      0xd    any      PC += offset if dst <= src         signed
========  =====  =======  ===============================  ===================================================
========  =====  =======  =================================  ===================================================

where 'PC' denotes the program counter, and the offset to increment by
is in units of 64-bit instructions relative to the instruction following
the jump instruction.  Thus 'PC += 1' skips execution of the next
instruction if it's a basic instruction or results in undefined behavior
if the next instruction is a 128-bit wide instruction.
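
As a rough illustration of the offset rule described above, the following C
sketch computes the slot reached by a taken jump. The helper name
``jump_target`` and the use of a slot index for 'PC' are assumptions made for
this example, not part of the specification.

```c
#include <stdint.h>

/* Slot index (in units of 64-bit instructions) reached by a taken jump
 * located at slot 'pc' with signed displacement 'off'.  The displacement
 * is relative to the instruction that follows the jump, hence the '+ 1'. */
static int64_t jump_target(int64_t pc, int32_t off)
{
	return pc + 1 + (int64_t)off;
}
```

Under this reading, a displacement of 0 falls through to the next instruction,
a displacement of 1 skips one 64-bit slot, and a displacement of -1 jumps back
to the jump instruction itself.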

The BPF program needs to store the return value into register R0 before doing an
``EXIT``.
@@ -475,7 +490,7 @@ where 's>=' indicates a signed '>=' comparison.

  gotol +imm

where 'imm' means the branch offset comes from insn 'imm' field.
where 'imm' means the branch offset comes from the 'imm' field.

Note that there are two flavors of ``JA`` instructions. The
``JMP`` class permits a 16-bit jump offset specified by the 'offset'
@@ -493,26 +508,26 @@ Helper functions
Helper functions are a concept whereby BPF programs can call into a
set of function calls exposed by the underlying platform.

Historically, each helper function was identified by an address
encoded in the imm field.  The available helper functions may differ
for each program type, but address values are unique across all program types.
Historically, each helper function was identified by a static ID
encoded in the 'imm' field.  The available helper functions may differ
for each program type, but static IDs are unique across all program types.

Platforms that support the BPF Type Format (BTF) support identifying
a helper function by a BTF ID encoded in the imm field, where the BTF ID
a helper function by a BTF ID encoded in the 'imm' field, where the BTF ID
identifies the helper name and type.

Program-local functions
~~~~~~~~~~~~~~~~~~~~~~~
Program-local functions are functions exposed by the same BPF program as the
caller, and are referenced by offset from the call instruction, similar to
``JA``.  The offset is encoded in the imm field of the call instruction.
A ``EXIT`` within the program-local function will return to the caller.
``JA``.  The offset is encoded in the 'imm' field of the call instruction.
An ``EXIT`` within the program-local function will return to the caller.

Load and store instructions
===========================

For load and store instructions (``LD``, ``LDX``, ``ST``, and ``STX``), the
8-bit 'opcode' field is divided as::
8-bit 'opcode' field is divided as follows::

  +-+-+-+-+-+-+-+-+
  |mode |sz |class|
@@ -580,7 +595,7 @@ instructions that transfer data between a register and memory.

  dst = *(signed size *) (src + offset)

Where size is one of: ``B``, ``H``, or ``W``, and
Where '<size>' is one of: ``B``, ``H``, or ``W``, and
'signed size' is one of: s8, s16, or s32.

Atomic operations
@@ -662,11 +677,11 @@ src_reg pseudocode imm type dst type
=======  =========================================  ===========  ==============
0x0      dst = (next_imm << 32) | imm               integer      integer
0x1      dst = map_by_fd(imm)                       map fd       map
0x2      dst = map_val(map_by_fd(imm)) + next_imm   map fd       data pointer
0x3      dst = var_addr(imm)                        variable id  data pointer
0x4      dst = code_addr(imm)                       integer      code pointer
0x2      dst = map_val(map_by_fd(imm)) + next_imm   map fd       data address
0x3      dst = var_addr(imm)                        variable id  data address
0x4      dst = code_addr(imm)                       integer      code address
0x5      dst = map_by_idx(imm)                      map index    map
0x6      dst = map_val(map_by_idx(imm)) + next_imm  map index    data pointer
0x6      dst = map_val(map_by_idx(imm)) + next_imm  map index    data address
=======  =========================================  ===========  ==============

where
+8 −0
@@ -3822,6 +3822,14 @@ F: kernel/bpf/tnum.c
F:	kernel/bpf/trampoline.c
F:	kernel/bpf/verifier.c
BPF [CRYPTO]
M:	Vadim Fedorenko <vadim.fedorenko@linux.dev>
L:	bpf@vger.kernel.org
S:	Maintained
F:	crypto/bpf_crypto_skcipher.c
F:	include/linux/bpf_crypto.h
F:	kernel/bpf/crypto.c
BPF [DOCUMENTATION] (Related to Standardization)
R:	David Vernet <void@manifault.com>
L:	bpf@vger.kernel.org
+76 −10
@@ -29,6 +29,7 @@
#define TCALL_CNT (MAX_BPF_JIT_REG + 2)
#define TMP_REG_3 (MAX_BPF_JIT_REG + 3)
#define FP_BOTTOM (MAX_BPF_JIT_REG + 4)
#define ARENA_VM_START (MAX_BPF_JIT_REG + 5)

#define check_imm(bits, imm) do {				\
	if ((((imm) > 0) && ((imm) >> (bits))) ||		\
@@ -67,6 +68,8 @@ static const int bpf2a64[] = {
	/* temporary register for blinding constants */
	[BPF_REG_AX] = A64_R(9),
	[FP_BOTTOM] = A64_R(27),
	/* callee saved register for kern_vm_start address */
	[ARENA_VM_START] = A64_R(28),
};

struct jit_ctx {
@@ -79,6 +82,7 @@ struct jit_ctx {
	__le32 *ro_image;
	u32 stack_size;
	int fpb_offset;
	u64 user_vm_start;
};

struct bpf_plt {
@@ -295,7 +299,7 @@ static bool is_lsi_offset(int offset, int scale)
#define PROLOGUE_OFFSET (BTI_INSNS + 2 + PAC_INSNS + 8)

static int build_prologue(struct jit_ctx *ctx, bool ebpf_from_cbpf,
			  bool is_exception_cb)
			  bool is_exception_cb, u64 arena_vm_start)
{
	const struct bpf_prog *prog = ctx->prog;
	const bool is_main_prog = !bpf_is_subprog(prog);
@@ -306,6 +310,7 @@ static int build_prologue(struct jit_ctx *ctx, bool ebpf_from_cbpf,
	const u8 fp = bpf2a64[BPF_REG_FP];
	const u8 tcc = bpf2a64[TCALL_CNT];
	const u8 fpb = bpf2a64[FP_BOTTOM];
	const u8 arena_vm_base = bpf2a64[ARENA_VM_START];
	const int idx0 = ctx->idx;
	int cur_offset;

@@ -411,6 +416,10 @@ static int build_prologue(struct jit_ctx *ctx, bool ebpf_from_cbpf,

	/* Set up function call stack */
	emit(A64_SUB_I(1, A64_SP, A64_SP, ctx->stack_size), ctx);

	if (arena_vm_start)
		emit_a64_mov_i64(arena_vm_base, arena_vm_start, ctx);

	return 0;
}

@@ -738,6 +747,7 @@ static void build_epilogue(struct jit_ctx *ctx, bool is_exception_cb)

#define BPF_FIXUP_OFFSET_MASK	GENMASK(26, 0)
#define BPF_FIXUP_REG_MASK	GENMASK(31, 27)
#define DONT_CLEAR 5 /* Unused ARM64 register from BPF's POV */

bool ex_handler_bpf(const struct exception_table_entry *ex,
		    struct pt_regs *regs)
@@ -745,6 +755,7 @@ bool ex_handler_bpf(const struct exception_table_entry *ex,
	off_t offset = FIELD_GET(BPF_FIXUP_OFFSET_MASK, ex->fixup);
	int dst_reg = FIELD_GET(BPF_FIXUP_REG_MASK, ex->fixup);

	if (dst_reg != DONT_CLEAR)
		regs->regs[dst_reg] = 0;
	regs->pc = (unsigned long)&ex->fixup - offset;
	return true;
@@ -765,7 +776,8 @@ static int add_exception_handler(const struct bpf_insn *insn,
		return 0;

	if (BPF_MODE(insn->code) != BPF_PROBE_MEM &&
		BPF_MODE(insn->code) != BPF_PROBE_MEMSX)
		BPF_MODE(insn->code) != BPF_PROBE_MEMSX &&
			BPF_MODE(insn->code) != BPF_PROBE_MEM32)
		return 0;

	if (!ctx->prog->aux->extable ||
@@ -810,6 +822,9 @@ static int add_exception_handler(const struct bpf_insn *insn,

	ex->insn = ins_offset;

	if (BPF_CLASS(insn->code) != BPF_LDX)
		dst_reg = DONT_CLEAR;

	ex->fixup = FIELD_PREP(BPF_FIXUP_OFFSET_MASK, fixup_offset) |
		    FIELD_PREP(BPF_FIXUP_REG_MASK, dst_reg);

@@ -829,12 +844,13 @@ static int build_insn(const struct bpf_insn *insn, struct jit_ctx *ctx,
		      bool extra_pass)
{
	const u8 code = insn->code;
	const u8 dst = bpf2a64[insn->dst_reg];
	const u8 src = bpf2a64[insn->src_reg];
	u8 dst = bpf2a64[insn->dst_reg];
	u8 src = bpf2a64[insn->src_reg];
	const u8 tmp = bpf2a64[TMP_REG_1];
	const u8 tmp2 = bpf2a64[TMP_REG_2];
	const u8 fp = bpf2a64[BPF_REG_FP];
	const u8 fpb = bpf2a64[FP_BOTTOM];
	const u8 arena_vm_base = bpf2a64[ARENA_VM_START];
	const s16 off = insn->off;
	const s32 imm = insn->imm;
	const int i = insn - ctx->prog->insnsi;
@@ -853,6 +869,15 @@ static int build_insn(const struct bpf_insn *insn, struct jit_ctx *ctx,
	/* dst = src */
	case BPF_ALU | BPF_MOV | BPF_X:
	case BPF_ALU64 | BPF_MOV | BPF_X:
		if (insn_is_cast_user(insn)) {
			emit(A64_MOV(0, tmp, src), ctx); // 32-bit mov clears the upper 32 bits
			emit_a64_mov_i(0, dst, ctx->user_vm_start >> 32, ctx);
			emit(A64_LSL(1, dst, dst, 32), ctx);
			emit(A64_CBZ(1, tmp, 2), ctx);
			emit(A64_ORR(1, tmp, dst, tmp), ctx);
			emit(A64_MOV(1, dst, tmp), ctx);
			break;
		}
		switch (insn->off) {
		case 0:
			emit(A64_MOV(is64, dst, src), ctx);
@@ -1237,7 +1262,15 @@ static int build_insn(const struct bpf_insn *insn, struct jit_ctx *ctx,
	case BPF_LDX | BPF_PROBE_MEMSX | BPF_B:
	case BPF_LDX | BPF_PROBE_MEMSX | BPF_H:
	case BPF_LDX | BPF_PROBE_MEMSX | BPF_W:
		if (ctx->fpb_offset > 0 && src == fp) {
	case BPF_LDX | BPF_PROBE_MEM32 | BPF_B:
	case BPF_LDX | BPF_PROBE_MEM32 | BPF_H:
	case BPF_LDX | BPF_PROBE_MEM32 | BPF_W:
	case BPF_LDX | BPF_PROBE_MEM32 | BPF_DW:
		if (BPF_MODE(insn->code) == BPF_PROBE_MEM32) {
			emit(A64_ADD(1, tmp2, src, arena_vm_base), ctx);
			src = tmp2;
		}
		if (ctx->fpb_offset > 0 && src == fp && BPF_MODE(insn->code) != BPF_PROBE_MEM32) {
			src_adj = fpb;
			off_adj = off + ctx->fpb_offset;
		} else {
@@ -1322,7 +1355,15 @@ static int build_insn(const struct bpf_insn *insn, struct jit_ctx *ctx,
	case BPF_ST | BPF_MEM | BPF_H:
	case BPF_ST | BPF_MEM | BPF_B:
	case BPF_ST | BPF_MEM | BPF_DW:
		if (ctx->fpb_offset > 0 && dst == fp) {
	case BPF_ST | BPF_PROBE_MEM32 | BPF_B:
	case BPF_ST | BPF_PROBE_MEM32 | BPF_H:
	case BPF_ST | BPF_PROBE_MEM32 | BPF_W:
	case BPF_ST | BPF_PROBE_MEM32 | BPF_DW:
		if (BPF_MODE(insn->code) == BPF_PROBE_MEM32) {
			emit(A64_ADD(1, tmp2, dst, arena_vm_base), ctx);
			dst = tmp2;
		}
		if (ctx->fpb_offset > 0 && dst == fp && BPF_MODE(insn->code) != BPF_PROBE_MEM32) {
			dst_adj = fpb;
			off_adj = off + ctx->fpb_offset;
		} else {
@@ -1365,6 +1406,10 @@ static int build_insn(const struct bpf_insn *insn, struct jit_ctx *ctx,
			}
			break;
		}

		ret = add_exception_handler(insn, ctx, dst);
		if (ret)
			return ret;
		break;

	/* STX: *(size *)(dst + off) = src */
@@ -1372,7 +1417,15 @@ static int build_insn(const struct bpf_insn *insn, struct jit_ctx *ctx,
	case BPF_STX | BPF_MEM | BPF_H:
	case BPF_STX | BPF_MEM | BPF_B:
	case BPF_STX | BPF_MEM | BPF_DW:
		if (ctx->fpb_offset > 0 && dst == fp) {
	case BPF_STX | BPF_PROBE_MEM32 | BPF_B:
	case BPF_STX | BPF_PROBE_MEM32 | BPF_H:
	case BPF_STX | BPF_PROBE_MEM32 | BPF_W:
	case BPF_STX | BPF_PROBE_MEM32 | BPF_DW:
		if (BPF_MODE(insn->code) == BPF_PROBE_MEM32) {
			emit(A64_ADD(1, tmp2, dst, arena_vm_base), ctx);
			dst = tmp2;
		}
		if (ctx->fpb_offset > 0 && dst == fp && BPF_MODE(insn->code) != BPF_PROBE_MEM32) {
			dst_adj = fpb;
			off_adj = off + ctx->fpb_offset;
		} else {
@@ -1413,6 +1466,10 @@ static int build_insn(const struct bpf_insn *insn, struct jit_ctx *ctx,
			}
			break;
		}

		ret = add_exception_handler(insn, ctx, dst);
		if (ret)
			return ret;
		break;

	case BPF_STX | BPF_ATOMIC | BPF_W:
@@ -1594,6 +1651,7 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *prog)
	bool tmp_blinded = false;
	bool extra_pass = false;
	struct jit_ctx ctx;
	u64 arena_vm_start;
	u8 *image_ptr;
	u8 *ro_image_ptr;

@@ -1611,6 +1669,7 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *prog)
		prog = tmp;
	}

	arena_vm_start = bpf_arena_get_kern_vm_start(prog->aux->arena);
	jit_data = prog->aux->jit_data;
	if (!jit_data) {
		jit_data = kzalloc(sizeof(*jit_data), GFP_KERNEL);
@@ -1641,6 +1700,7 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *prog)
	}

	ctx.fpb_offset = find_fpb_offset(prog);
	ctx.user_vm_start = bpf_arena_get_user_vm_start(prog->aux->arena);

	/*
	 * 1. Initial fake pass to compute ctx->idx and ctx->offset.
@@ -1648,7 +1708,8 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *prog)
	 * BPF line info needs ctx->offset[i] to be the offset of
	 * instruction[i] in jited image, so build prologue first.
	 */
	if (build_prologue(&ctx, was_classic, prog->aux->exception_cb)) {
	if (build_prologue(&ctx, was_classic, prog->aux->exception_cb,
			   arena_vm_start)) {
		prog = orig_prog;
		goto out_off;
	}
@@ -1696,7 +1757,7 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *prog)
	ctx.idx = 0;
	ctx.exentry_idx = 0;

	build_prologue(&ctx, was_classic, prog->aux->exception_cb);
	build_prologue(&ctx, was_classic, prog->aux->exception_cb, arena_vm_start);

	if (build_body(&ctx, extra_pass)) {
		prog = orig_prog;
@@ -2461,6 +2522,11 @@ bool bpf_jit_supports_exceptions(void)
	return true;
}

bool bpf_jit_supports_arena(void)
{
	return true;
}

void bpf_jit_free(struct bpf_prog *prog)
{
	if (prog->jited) {
+2 −0
@@ -81,6 +81,8 @@ struct rv_jit_context {
	int nexentries;
	unsigned long flags;
	int stack_size;
	u64 arena_vm_start;
	u64 user_vm_start;
};

/* Convert from ninsns to bytes. */
+201 −2
@@ -18,6 +18,7 @@

#define RV_REG_TCC RV_REG_A6
#define RV_REG_TCC_SAVED RV_REG_S6 /* Store A6 in S6 if program do calls */
#define RV_REG_ARENA RV_REG_S7 /* For storing arena_vm_start */

static const int regmap[] = {
	[BPF_REG_0] =	RV_REG_A5,
@@ -255,6 +256,10 @@ static void __build_epilogue(bool is_tail_call, struct rv_jit_context *ctx)
		emit_ld(RV_REG_S6, store_offset, RV_REG_SP, ctx);
		store_offset -= 8;
	}
	if (ctx->arena_vm_start) {
		emit_ld(RV_REG_ARENA, store_offset, RV_REG_SP, ctx);
		store_offset -= 8;
	}

	emit_addi(RV_REG_SP, RV_REG_SP, stack_adjust, ctx);
	/* Set return value. */
@@ -548,6 +553,7 @@ static void emit_atomic(u8 rd, u8 rs, s16 off, s32 imm, bool is64,

#define BPF_FIXUP_OFFSET_MASK   GENMASK(26, 0)
#define BPF_FIXUP_REG_MASK      GENMASK(31, 27)
#define REG_DONT_CLEAR_MARKER	0	/* RV_REG_ZERO unused in pt_regmap */

bool ex_handler_bpf(const struct exception_table_entry *ex,
		    struct pt_regs *regs)
@@ -555,6 +561,7 @@ bool ex_handler_bpf(const struct exception_table_entry *ex,
	off_t offset = FIELD_GET(BPF_FIXUP_OFFSET_MASK, ex->fixup);
	int regs_offset = FIELD_GET(BPF_FIXUP_REG_MASK, ex->fixup);

	if (regs_offset != REG_DONT_CLEAR_MARKER)
		*(unsigned long *)((void *)regs + pt_regmap[regs_offset]) = 0;
	regs->epc = (unsigned long)&ex->fixup - offset;

@@ -572,7 +579,8 @@ static int add_exception_handler(const struct bpf_insn *insn,
	off_t fixup_offset;

	if (!ctx->insns || !ctx->ro_insns || !ctx->prog->aux->extable ||
	    (BPF_MODE(insn->code) != BPF_PROBE_MEM && BPF_MODE(insn->code) != BPF_PROBE_MEMSX))
	    (BPF_MODE(insn->code) != BPF_PROBE_MEM && BPF_MODE(insn->code) != BPF_PROBE_MEMSX &&
	     BPF_MODE(insn->code) != BPF_PROBE_MEM32))
		return 0;

	if (WARN_ON_ONCE(ctx->nexentries >= ctx->prog->aux->num_exentries))
@@ -1073,6 +1081,15 @@ int bpf_jit_emit_insn(const struct bpf_insn *insn, struct rv_jit_context *ctx,
	/* dst = src */
	case BPF_ALU | BPF_MOV | BPF_X:
	case BPF_ALU64 | BPF_MOV | BPF_X:
		if (insn_is_cast_user(insn)) {
			emit_mv(RV_REG_T1, rs, ctx);
			emit_zextw(RV_REG_T1, RV_REG_T1, ctx);
			emit_imm(rd, (ctx->user_vm_start >> 32) << 32, ctx);
			emit(rv_beq(RV_REG_T1, RV_REG_ZERO, 4), ctx);
			emit_or(RV_REG_T1, rd, RV_REG_T1, ctx);
			emit_mv(rd, RV_REG_T1, ctx);
			break;
		}
		if (imm == 1) {
			/* Special mov32 for zext */
			emit_zextw(rd, rd, ctx);
@@ -1539,6 +1556,11 @@ int bpf_jit_emit_insn(const struct bpf_insn *insn, struct rv_jit_context *ctx,
	case BPF_LDX | BPF_PROBE_MEMSX | BPF_B:
	case BPF_LDX | BPF_PROBE_MEMSX | BPF_H:
	case BPF_LDX | BPF_PROBE_MEMSX | BPF_W:
	/* LDX | PROBE_MEM32: dst = *(unsigned size *)(src + RV_REG_ARENA + off) */
	case BPF_LDX | BPF_PROBE_MEM32 | BPF_B:
	case BPF_LDX | BPF_PROBE_MEM32 | BPF_H:
	case BPF_LDX | BPF_PROBE_MEM32 | BPF_W:
	case BPF_LDX | BPF_PROBE_MEM32 | BPF_DW:
	{
		int insn_len, insns_start;
		bool sign_ext;
@@ -1546,6 +1568,11 @@ int bpf_jit_emit_insn(const struct bpf_insn *insn, struct rv_jit_context *ctx,
		sign_ext = BPF_MODE(insn->code) == BPF_MEMSX ||
			   BPF_MODE(insn->code) == BPF_PROBE_MEMSX;

		if (BPF_MODE(insn->code) == BPF_PROBE_MEM32) {
			emit_add(RV_REG_T2, rs, RV_REG_ARENA, ctx);
			rs = RV_REG_T2;
		}

		switch (BPF_SIZE(code)) {
		case BPF_B:
			if (is_12b_int(off)) {
@@ -1682,6 +1709,86 @@ int bpf_jit_emit_insn(const struct bpf_insn *insn, struct rv_jit_context *ctx,
		emit_sd(RV_REG_T2, 0, RV_REG_T1, ctx);
		break;

	case BPF_ST | BPF_PROBE_MEM32 | BPF_B:
	case BPF_ST | BPF_PROBE_MEM32 | BPF_H:
	case BPF_ST | BPF_PROBE_MEM32 | BPF_W:
	case BPF_ST | BPF_PROBE_MEM32 | BPF_DW:
	{
		int insn_len, insns_start;

		emit_add(RV_REG_T3, rd, RV_REG_ARENA, ctx);
		rd = RV_REG_T3;

		/* Load imm to a register then store it */
		emit_imm(RV_REG_T1, imm, ctx);

		switch (BPF_SIZE(code)) {
		case BPF_B:
			if (is_12b_int(off)) {
				insns_start = ctx->ninsns;
				emit(rv_sb(rd, off, RV_REG_T1), ctx);
				insn_len = ctx->ninsns - insns_start;
				break;
			}

			emit_imm(RV_REG_T2, off, ctx);
			emit_add(RV_REG_T2, RV_REG_T2, rd, ctx);
			insns_start = ctx->ninsns;
			emit(rv_sb(RV_REG_T2, 0, RV_REG_T1), ctx);
			insn_len = ctx->ninsns - insns_start;
			break;
		case BPF_H:
			if (is_12b_int(off)) {
				insns_start = ctx->ninsns;
				emit(rv_sh(rd, off, RV_REG_T1), ctx);
				insn_len = ctx->ninsns - insns_start;
				break;
			}

			emit_imm(RV_REG_T2, off, ctx);
			emit_add(RV_REG_T2, RV_REG_T2, rd, ctx);
			insns_start = ctx->ninsns;
			emit(rv_sh(RV_REG_T2, 0, RV_REG_T1), ctx);
			insn_len = ctx->ninsns - insns_start;
			break;
		case BPF_W:
			if (is_12b_int(off)) {
				insns_start = ctx->ninsns;
				emit_sw(rd, off, RV_REG_T1, ctx);
				insn_len = ctx->ninsns - insns_start;
				break;
			}

			emit_imm(RV_REG_T2, off, ctx);
			emit_add(RV_REG_T2, RV_REG_T2, rd, ctx);
			insns_start = ctx->ninsns;
			emit_sw(RV_REG_T2, 0, RV_REG_T1, ctx);
			insn_len = ctx->ninsns - insns_start;
			break;
		case BPF_DW:
			if (is_12b_int(off)) {
				insns_start = ctx->ninsns;
				emit_sd(rd, off, RV_REG_T1, ctx);
				insn_len = ctx->ninsns - insns_start;
				break;
			}

			emit_imm(RV_REG_T2, off, ctx);
			emit_add(RV_REG_T2, RV_REG_T2, rd, ctx);
			insns_start = ctx->ninsns;
			emit_sd(RV_REG_T2, 0, RV_REG_T1, ctx);
			insn_len = ctx->ninsns - insns_start;
			break;
		}

		ret = add_exception_handler(insn, ctx, REG_DONT_CLEAR_MARKER,
					    insn_len);
		if (ret)
			return ret;

		break;
	}

	/* STX: *(size *)(dst + off) = src */
	case BPF_STX | BPF_MEM | BPF_B:
		if (is_12b_int(off)) {
@@ -1728,6 +1835,84 @@ int bpf_jit_emit_insn(const struct bpf_insn *insn, struct rv_jit_context *ctx,
		emit_atomic(rd, rs, off, imm,
			    BPF_SIZE(code) == BPF_DW, ctx);
		break;

	case BPF_STX | BPF_PROBE_MEM32 | BPF_B:
	case BPF_STX | BPF_PROBE_MEM32 | BPF_H:
	case BPF_STX | BPF_PROBE_MEM32 | BPF_W:
	case BPF_STX | BPF_PROBE_MEM32 | BPF_DW:
	{
		int insn_len, insns_start;

		emit_add(RV_REG_T2, rd, RV_REG_ARENA, ctx);
		rd = RV_REG_T2;

		switch (BPF_SIZE(code)) {
		case BPF_B:
			if (is_12b_int(off)) {
				insns_start = ctx->ninsns;
				emit(rv_sb(rd, off, rs), ctx);
				insn_len = ctx->ninsns - insns_start;
				break;
			}

			emit_imm(RV_REG_T1, off, ctx);
			emit_add(RV_REG_T1, RV_REG_T1, rd, ctx);
			insns_start = ctx->ninsns;
			emit(rv_sb(RV_REG_T1, 0, rs), ctx);
			insn_len = ctx->ninsns - insns_start;
			break;
		case BPF_H:
			if (is_12b_int(off)) {
				insns_start = ctx->ninsns;
				emit(rv_sh(rd, off, rs), ctx);
				insn_len = ctx->ninsns - insns_start;
				break;
			}

			emit_imm(RV_REG_T1, off, ctx);
			emit_add(RV_REG_T1, RV_REG_T1, rd, ctx);
			insns_start = ctx->ninsns;
			emit(rv_sh(RV_REG_T1, 0, rs), ctx);
			insn_len = ctx->ninsns - insns_start;
			break;
		case BPF_W:
			if (is_12b_int(off)) {
				insns_start = ctx->ninsns;
				emit_sw(rd, off, rs, ctx);
				insn_len = ctx->ninsns - insns_start;
				break;
			}

			emit_imm(RV_REG_T1, off, ctx);
			emit_add(RV_REG_T1, RV_REG_T1, rd, ctx);
			insns_start = ctx->ninsns;
			emit_sw(RV_REG_T1, 0, rs, ctx);
			insn_len = ctx->ninsns - insns_start;
			break;
		case BPF_DW:
			if (is_12b_int(off)) {
				insns_start = ctx->ninsns;
				emit_sd(rd, off, rs, ctx);
				insn_len = ctx->ninsns - insns_start;
				break;
			}

			emit_imm(RV_REG_T1, off, ctx);
			emit_add(RV_REG_T1, RV_REG_T1, rd, ctx);
			insns_start = ctx->ninsns;
			emit_sd(RV_REG_T1, 0, rs, ctx);
			insn_len = ctx->ninsns - insns_start;
			break;
		}

		ret = add_exception_handler(insn, ctx, REG_DONT_CLEAR_MARKER,
					    insn_len);
		if (ret)
			return ret;

		break;
	}

	default:
		pr_err("bpf-jit: unknown opcode %02x\n", code);
		return -EINVAL;
@@ -1759,6 +1944,8 @@ void bpf_jit_build_prologue(struct rv_jit_context *ctx, bool is_subprog)
		stack_adjust += 8;
	if (seen_reg(RV_REG_S6, ctx))
		stack_adjust += 8;
	if (ctx->arena_vm_start)
		stack_adjust += 8;

	stack_adjust = round_up(stack_adjust, 16);
	stack_adjust += bpf_stack_adjust;
@@ -1810,6 +1997,10 @@ void bpf_jit_build_prologue(struct rv_jit_context *ctx, bool is_subprog)
		emit_sd(RV_REG_SP, store_offset, RV_REG_S6, ctx);
		store_offset -= 8;
	}
	if (ctx->arena_vm_start) {
		emit_sd(RV_REG_SP, store_offset, RV_REG_ARENA, ctx);
		store_offset -= 8;
	}

	emit_addi(RV_REG_FP, RV_REG_SP, stack_adjust, ctx);

@@ -1823,6 +2014,9 @@ void bpf_jit_build_prologue(struct rv_jit_context *ctx, bool is_subprog)
		emit_mv(RV_REG_TCC_SAVED, RV_REG_TCC, ctx);

	ctx->stack_size = stack_adjust;

	if (ctx->arena_vm_start)
		emit_imm(RV_REG_ARENA, ctx->arena_vm_start, ctx);
}

void bpf_jit_build_epilogue(struct rv_jit_context *ctx)
@@ -1839,3 +2033,8 @@ bool bpf_jit_supports_ptr_xchg(void)
{
	return true;
}

bool bpf_jit_supports_arena(void)
{
	return true;
}
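
The instruction sequences emitted for ``insn_is_cast_user()`` in the arm64 and
riscv64 hunks above appear to compute the same value, which can be written in
plain C roughly as follows. This is one reading of the emitted code, not kernel
source; the function name ``cast_user`` is hypothetical.

```c
#include <stdint.h>

/* Sketch of the user address-space cast as the JIT sequences read:
 * keep the low 32 bits of the source, and if they are non-zero, combine
 * them with the upper 32 bits of user_vm_start.  A zero low half yields 0,
 * so a NULL arena pointer stays NULL after the cast. */
static uint64_t cast_user(uint64_t src, uint64_t user_vm_start)
{
	uint64_t lo = (uint32_t)src; /* 32-bit move clears the upper half */

	if (lo == 0)
		return 0;
	return (user_vm_start & 0xffffffff00000000ULL) | lo;
}
```

For example, with a ``user_vm_start`` of 0x7f0000000000, a source value whose
low 32 bits are 0x12345678 would map to 0x7f0012345678 under this sketch.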