Commit 4b3529ed authored by Jakub Kicinski
Daniel Borkmann says:

====================
pull-request: bpf-next 2024-05-28

We've added 23 non-merge commits during the last 11 day(s) which contain
a total of 45 files changed, 696 insertions(+), 277 deletions(-).

The main changes are:

1) Rename skb's mono_delivery_time to tstamp_type for extensibility
   and add SKB_CLOCK_TAI type support to bpf_skb_set_tstamp(),
   from Abhishek Chauhan.

2) Add netfilter CT zone ID and direction to bpf_ct_opts so that arbitrary
   CT zones can be used from XDP/tc BPF netfilter CT helper functions,
   from Brad Cowie.

3) Several tweaks to the instruction-set.rst IETF doc to address
   the Last Call review comments, from Dave Thaler.

4) Small batch of riscv64 BPF JIT optimizations in order to emit more
   compressed instructions to the JITed image for better icache efficiency,
   from Xiao Wang.

5) Sort bpftool C dump output from BTF, aiming to simplify vmlinux.h
   diffing and forcing more natural type definitions ordering,
   from Mykyta Yatsenko.

6) Use DEV_STATS_INC() macro in BPF redirect helpers to silence
   a syzbot/KCSAN race report for the tx_errors counter,
   from Jiang Yunshui.

7) Un-constify bpf_func_info in bpftool to fix compilation with LLVM 17+
   which started treating const structs as constants and thus breaking
   full BTF program name resolution, from Ivan Babrou.

8) Fix up BPF program numbers in test_sockmap selftest in order to reduce
   some of the test-internal array sizes, from Geliang Tang.

9) Small cleanup in Makefile.btf script to use test-ge check for v1.25-only
   pahole, from Alan Maguire.

10) Fix bpftool's make dependencies for vmlinux.h in order to avoid needless
    rebuilds in some corner cases, from Artem Savkov.

* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (23 commits)
  bpf, net: Use DEV_STAT_INC()
  bpf, docs: Fix instruction.rst indentation
  bpf, docs: Clarify call local offset
  bpf, docs: Add table captions
  bpf, docs: clarify sign extension of 64-bit use of 32-bit imm
  bpf, docs: Use RFC 2119 language for ISA requirements
  bpf, docs: Move sentence about returning R0 to abi.rst
  bpf: constify member bpf_sysctl_kern:: Table
  riscv, bpf: Try RVC for reg move within BPF_CMPXCHG JIT
  riscv, bpf: Use STACK_ALIGN macro for size rounding up
  riscv, bpf: Optimize zextw insn with Zba extension
  selftests/bpf: Handle forwarding of UDP CLOCK_TAI packets
  net: Add additional bit to support clockid_t timestamp type
  net: Rename mono_delivery_time to tstamp_type for scalabilty
  selftests/bpf: Update tests for new ct zone opts for nf_conntrack kfuncs
  net: netfilter: Make ct zone opts configurable for bpf ct helpers
  selftests/bpf: Fix prog numbers in test_sockmap
  bpf: Remove unused variable "prev_state"
  bpftool: Un-const bpf_func_info to fix it for llvm 17 and newer
  bpf: Fix order of args in call to bpf_map_kvcalloc
  ...
====================

Link: https://lore.kernel.org/r/20240528105924.30905-1-daniel@iogearbox.net


Signed-off-by: Jakub Kicinski <kuba@kernel.org>
parents c30ff5f3 d9cbd834
+3 −0
@@ -23,3 +23,6 @@ The BPF calling convention is defined as:

R0 - R5 are scratch registers and BPF programs needs to spill/fill them if
necessary across calls.

The BPF program needs to store the return value into register R0 before doing an
``EXIT``.
+152 −109
@@ -14,6 +14,13 @@ set architecture (ISA).
Documentation conventions
=========================

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
"OPTIONAL" in this document are to be interpreted as described in
BCP 14 `<https://www.rfc-editor.org/info/rfc2119>`_
`RFC8174 <https://www.rfc-editor.org/info/rfc8174>`_
when, and only when, they appear in all capitals, as shown here.

For brevity and consistency, this document refers to families
of types using a shorthand syntax and refers to several expository,
mnemonic functions when describing the semantics of instructions.
@@ -25,7 +32,7 @@ Types
This document refers to integer types with the notation `SN` to specify
a type's signedness (`S`) and bit width (`N`), respectively.

.. table:: Meaning of signedness notation.
.. table:: Meaning of signedness notation

  ==== =========
  S    Meaning
@@ -34,7 +41,7 @@ a type's signedness (`S`) and bit width (`N`), respectively.
  s    signed
  ==== =========

.. table:: Meaning of bit-width notation.
.. table:: Meaning of bit-width notation

  ===== =========
  N     Bit width
@@ -106,9 +113,9 @@ Conformance groups

An implementation does not need to support all instructions specified in this
document (e.g., deprecated instructions).  Instead, a number of conformance
groups are specified.  An implementation must support the base32 conformance
group and may support additional conformance groups, where supporting a
conformance group means it must support all instructions in that conformance
groups are specified.  An implementation MUST support the base32 conformance
group and MAY support additional conformance groups, where supporting a
conformance group means it MUST support all instructions in that conformance
group.

The use of named conformance groups enables interoperability between a runtime
@@ -209,7 +216,7 @@ For example::
  07     1       0        00 00  11 22 33 44  r1 += 0x11223344 // big

Note that most instructions do not use all of the fields.
Unused fields shall be cleared to zero.
Unused fields SHALL be cleared to zero.

Wide instruction encoding
--------------------------
@@ -256,6 +263,8 @@ Instruction classes

The three least significant bits of the 'opcode' field store the instruction class:

.. table:: Instruction class

  =====  =====  ===============================  ===================================
  class  value  description                      reference
  =====  =====  ===============================  ===================================
@@ -285,6 +294,8 @@ For arithmetic and jump instructions (``ALU``, ``ALU64``, ``JMP`` and
**s (source)**
  the source operand location, which unless otherwise specified is one of:

  .. table:: Source operand location

    ======  =====  ==============================================
    source  value  description
    ======  =====  ==============================================
@@ -305,6 +316,8 @@ The 'code' field encodes the operation as below, where 'src' refers to the
the source operand and 'dst' refers to the value of the destination
register.

.. table:: Arithmetic instructions

  =====  =====  =======  ==========================================================
  name   code   offset   description
  =====  =====  =======  ==========================================================
@@ -374,7 +387,7 @@ interpreted as a 64-bit signed value.
Note that there are varying definitions of the signed modulo operation
when the dividend or divisor are negative, where implementations often
vary by language such that Python, Ruby, etc.  differ from C, Go, Java,
etc. This specification requires that signed modulo use truncated division
etc. This specification requires that signed modulo MUST use truncated division
(where -13 % 3 == -1) as implemented in C, Go, etc.::

   a % n = a - n * trunc(a / n)
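
Reviewer note: the truncated-division rule quoted in this hunk can be checked with a small C sketch. The helper names `smod_trunc` and `smod_floor` are hypothetical and are not part of the kernel or the specification. The sketch covers only the sign convention; division by zero and the `INT64_MIN / -1` overflow case are deliberately left out.

```c
#include <stdint.h>

/* Truncated remainder, written out as a - n * trunc(a / n).
 * C's '/' on signed integers truncates toward zero (C99 and later),
 * so a / n plays the role of trunc(a / n) here.
 * Hypothetical helper; n == 0 and INT64_MIN / -1 are not handled.
 */
static int64_t smod_trunc(int64_t a, int64_t n)
{
	return a - n * (a / n);
}

/* Floored remainder (the Python/Ruby convention), shown for contrast:
 * the result takes the sign of the divisor instead of the dividend.
 */
static int64_t smod_floor(int64_t a, int64_t n)
{
	int64_t r = a % n;

	return (r != 0 && ((r < 0) != (n < 0))) ? r + n : r;
}
```

With these definitions, `smod_trunc(-13, 3)` gives -1, matching the value stated in the document, while the floored variant gives 2.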
@@ -386,6 +399,19 @@ The ``MOVSX`` instruction does a move operation with sign extension.
operands into 64-bit operands.  Unlike other arithmetic instructions,
``MOVSX`` is only defined for register source operands (``X``).

``{MOV, K, ALU64}`` means::

  dst = (s64)imm

``{MOV, X, ALU}`` means::

  dst = (u32)src

``{MOVSX, X, ALU}`` with 'offset' 8 means::

  dst = (u32)(s32)(s8)src


The ``NEG`` instruction is only defined when the source bit is clear
(``K``).
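
Reviewer note: the three pseudocode lines added in this hunk map directly onto C casts. The sketch below is illustrative only; the function names are hypothetical, and the narrowing cast to `int8_t` relies on the two's-complement wrap-around that GCC and Clang provide (it is implementation-defined before C23).

```c
#include <stdint.h>

/* {MOV, K, ALU64}: dst = (s64)imm
 * The 32-bit immediate is sign-extended to 64 bits.
 */
static uint64_t mov_k_alu64(int32_t imm)
{
	return (uint64_t)(int64_t)imm;
}

/* {MOV, X, ALU}: dst = (u32)src
 * Only the low 32 bits survive; the upper 32 bits become zero.
 */
static uint64_t mov_x_alu(uint64_t src)
{
	return (uint32_t)src;
}

/* {MOVSX, X, ALU} with 'offset' 8: dst = (u32)(s32)(s8)src
 * The low byte is sign-extended to 32 bits, then zero-extended to 64.
 */
static uint64_t movsx8_x_alu(uint64_t src)
{
	return (uint32_t)(int32_t)(int8_t)src;
}
```

For example, `movsx8_x_alu(0x80)` yields `0xffffff80` with the upper 32 bits clear, whereas `mov_k_alu64(-1)` sets all 64 bits.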

@@ -404,7 +430,9 @@ only and do not use a separate source register or immediate value.
For ``ALU``, the 1-bit source operand field in the opcode is used to
select what byte order the operation converts from or to. For
``ALU64``, the 1-bit source operand field in the opcode is reserved
and must be set to 0.
and MUST be set to 0.

.. table:: Byte swap instructions

  =====  ========  =====  =================================================
  class  source    value  description
@@ -448,6 +476,8 @@ otherwise identical operations, and indicates the base64 conformance
group unless otherwise specified.
The 'code' field encodes the operation as below:

.. table:: Jump instructions

  ========  =====  =======  =================================  ===================================================
  code      value  src_reg  description                        notes
  ========  =====  =======  =================================  ===================================================
@@ -476,9 +506,6 @@ the jump instruction. Thus 'PC += 1' skips execution of the next
instruction if it's a basic instruction or results in undefined behavior
if the next instruction is a 128-bit wide instruction.

The BPF program needs to store the return value into register R0 before doing an
``EXIT``.

Example:

``{JSGE, X, JMP32}`` means::
@@ -487,6 +514,10 @@ Example:

where 's>=' indicates a signed '>=' comparison.

``{JLE, K, JMP}`` means::

  if dst <= (u64)(s64)imm goto +offset

``{JA, K, JMP32}`` means::

  gotol +imm
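
Reviewer note: the `{JLE, K, JMP}` example added above hinges on the immediate being sign-extended before the unsigned comparison. A minimal C sketch of the branch condition follows; `jle_k_jmp_taken` is a hypothetical name used only for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* {JLE, K, JMP}: if dst <= (u64)(s64)imm goto +offset
 * The 32-bit immediate is sign-extended to 64 bits first and only then
 * reinterpreted as unsigned, so imm == -1 compares as UINT64_MAX.
 */
static bool jle_k_jmp_taken(uint64_t dst, int32_t imm)
{
	return dst <= (uint64_t)(int64_t)imm;
}
```

The difference from zero extension shows up for negative immediates: with `imm = -1`, a `dst` of `0x100000000` takes the branch, which it would not if the immediate were widened to `0xffffffff`.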
@@ -515,14 +546,16 @@ for each program type, but static IDs are unique across all program types.

Platforms that support the BPF Type Format (BTF) support identifying
a helper function by a BTF ID encoded in the 'imm' field, where the BTF ID
identifies the helper name and type.
identifies the helper name and type.  Further documentation of BTF
is outside the scope of this document and is left for future work.

Program-local functions
~~~~~~~~~~~~~~~~~~~~~~~
Program-local functions are functions exposed by the same BPF program as the
caller, and are referenced by offset from the call instruction, similar to
``JA``.  The offset is encoded in the 'imm' field of the call instruction.
An ``EXIT`` within the program-local function will return to the caller.
caller, and are referenced by offset from the instruction following the call
instruction, similar to ``JA``.  The offset is encoded in the 'imm' field of
the call instruction. An ``EXIT`` within the program-local function will
return to the caller.
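
Reviewer note: the clarified wording means the call target is computed from the slot after the call, not from the call itself. The sketch below assumes offsets are counted in 64-bit instruction slots, as for jump offsets; `local_call_target` and its index-based view of the program are hypothetical.

```c
#include <stdint.h>

/* Index of the first instruction of a program-local function.
 * call_idx: index of the call instruction (in instruction slots).
 * imm:      the call's 'imm' field, relative to the instruction
 *           FOLLOWING the call, hence the "+ 1".
 * Assumption: both values are counted in 64-bit instruction slots.
 */
static int64_t local_call_target(int64_t call_idx, int32_t imm)
{
	return call_idx + 1 + (int64_t)imm;
}
```

Under this reading, an `imm` of 0 targets the instruction immediately after the call, and negative values reach functions placed earlier in the program.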

Load and store instructions
===========================
@@ -537,6 +570,8 @@ For load and store instructions (``LD``, ``LDX``, ``ST``, and ``STX``), the
**mode**
  The mode modifier is one of:

  .. table:: Mode modifier

    =============  =====  ====================================  =============
    mode modifier  value  description                           reference
    =============  =====  ====================================  =============
@@ -551,6 +586,8 @@ For load and store instructions (``LD``, ``LDX``, ``ST``, and ``STX``), the
**sz (size)**
  The size modifier is one of:

  .. table:: Size modifier

    ====  =====  =====================
    size  value  description
    ====  =====  =====================
@@ -619,6 +656,8 @@ The 'imm' field is used to encode the actual atomic operation.
Simple atomic operation use a subset of the values defined to encode
arithmetic operations in the 'imm' field to encode the atomic operation:

.. table:: Simple atomic operations

  ========  =====  ===========
  imm       value  description
  ========  =====  ===========
@@ -640,6 +679,8 @@ XOR 0xa0 atomic xor
In addition to the simple atomic operations, there also is a modifier and
two complex atomic operations:

.. table:: Complex atomic operations

  ===========  ================  ===========================
  imm          value             description
  ===========  ================  ===========================
@@ -673,6 +714,8 @@ The following table defines a set of ``{IMM, DW, LD}`` instructions
with opcode subtypes in the 'src_reg' field, using new terms such as "map"
defined further below:

.. table:: 64-bit immediate instructions

  =======  =========================================  ===========  ==============
  src_reg  pseudocode                                 imm type     dst type
  =======  =========================================  ===========  ==============
@@ -725,5 +768,5 @@ carried over from classic BPF. These instructions used an instruction
class of ``LD``, a size modifier of ``W``, ``H``, or ``B``, and a
mode modifier of ``ABS`` or ``IND``.  The 'dst_reg' and 'offset' fields were
set to zero, and 'src_reg' was set to zero for ``ABS``.  However, these
instructions are deprecated and should no longer be used.  All legacy packet
instructions are deprecated and SHOULD no longer be used.  All legacy packet
access instructions belong to the "packet" conformance group.
+12 −0
@@ -604,6 +604,18 @@ config TOOLCHAIN_HAS_VECTOR_CRYPTO
	def_bool $(as-instr, .option arch$(comma) +v$(comma) +zvkb)
	depends on AS_HAS_OPTION_ARCH

config RISCV_ISA_ZBA
	bool "Zba extension support for bit manipulation instructions"
	default y
	help
	   Add support for enabling optimisations in the kernel when the Zba
	   extension is detected at boot.

	   The Zba extension provides instructions to accelerate the generation
	   of addresses that index into arrays of basic data types.

	   If you don't know what to do here, say Y.

config RISCV_ISA_ZBB
	bool "Zbb extension support for bit manipulation instructions"
	depends on TOOLCHAIN_HAS_ZBB
+18 −0
@@ -18,6 +18,11 @@ static inline bool rvc_enabled(void)
	return IS_ENABLED(CONFIG_RISCV_ISA_C);
}

static inline bool rvzba_enabled(void)
{
	return IS_ENABLED(CONFIG_RISCV_ISA_ZBA) && riscv_has_extension_likely(RISCV_ISA_EXT_ZBA);
}

static inline bool rvzbb_enabled(void)
{
	return IS_ENABLED(CONFIG_RISCV_ISA_ZBB) && riscv_has_extension_likely(RISCV_ISA_EXT_ZBB);
@@ -939,6 +944,14 @@ static inline u16 rvc_sdsp(u32 imm9, u8 rs2)
	return rv_css_insn(0x7, imm, rs2, 0x2);
}

/* RV64-only ZBA instructions. */

static inline u32 rvzba_zextw(u8 rd, u8 rs1)
{
	/* add.uw rd, rs1, ZERO */
	return rv_r_insn(0x04, RV_REG_ZERO, rs1, 0, rd, 0x3b);
}

#endif /* __riscv_xlen == 64 */

/* Helper functions that emit RVC instructions when possible. */
@@ -1161,6 +1174,11 @@ static inline void emit_zexth(u8 rd, u8 rs, struct rv_jit_context *ctx)

static inline void emit_zextw(u8 rd, u8 rs, struct rv_jit_context *ctx)
{
	if (rvzba_enabled()) {
		emit(rvzba_zextw(rd, rs), ctx);
		return;
	}

	emit_slli(rd, rs, 32, ctx);
	emit_srli(rd, rd, 32, ctx);
}
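
Reviewer note: the Zba path replaces the two-instruction `slli`/`srli` pair with a single `add.uw rd, rs1, zero`. The standalone sketch below packs the R-type fields in the same argument order as the `rv_r_insn()` call in the diff and models both sequences on plain integers. All names here are hypothetical stand-ins, not the kernel helpers themselves.

```c
#include <stdint.h>

/* R-type packing: funct7 | rs2 | rs1 | funct3 | rd | opcode. */
static uint32_t r_insn(uint32_t funct7, uint32_t rs2, uint32_t rs1,
		       uint32_t funct3, uint32_t rd, uint32_t opcode)
{
	return (funct7 << 25) | (rs2 << 20) | (rs1 << 15) |
	       (funct3 << 12) | (rd << 7) | opcode;
}

/* add.uw rd, rs1, x0: same constants as rvzba_zextw() in the diff. */
static uint32_t zba_zextw(uint32_t rd, uint32_t rs1)
{
	return r_insn(0x04, 0 /* x0 */, rs1, 0, rd, 0x3b);
}

/* Fallback semantics: slli rd, rs, 32 followed by srli rd, rd, 32. */
static uint64_t zextw_shift_pair(uint64_t x)
{
	return (x << 32) >> 32;
}

/* add.uw semantics: rs2 + zero-extended low 32 bits of rs1, rs2 = x0. */
static uint64_t zextw_add_uw(uint64_t x)
{
	return 0 + (uint64_t)(uint32_t)x;
}
```

Both models return the low 32 bits of the input with the upper half cleared, which is why the JIT can substitute one for the other whenever `rvzba_enabled()` is true.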
+7 −5
@@ -537,8 +537,10 @@ static void emit_atomic(u8 rd, u8 rs, s16 off, s32 imm, bool is64,
	/* r0 = atomic_cmpxchg(dst_reg + off16, r0, src_reg); */
	case BPF_CMPXCHG:
		r0 = bpf_to_rv_reg(BPF_REG_0, ctx);
		emit(is64 ? rv_addi(RV_REG_T2, r0, 0) :
		     rv_addiw(RV_REG_T2, r0, 0), ctx);
		if (is64)
			emit_mv(RV_REG_T2, r0, ctx);
		else
			emit_addiw(RV_REG_T2, r0, 0, ctx);
		emit(is64 ? rv_lr_d(r0, 0, rd, 0, 0) :
		     rv_lr_w(r0, 0, rd, 0, 0), ctx);
		jmp_offset = ninsns_rvoff(8);
@@ -868,7 +870,7 @@ static int __arch_prepare_bpf_trampoline(struct bpf_tramp_image *im,
	stack_size += 8;
	sreg_off = stack_size;

	stack_size = round_up(stack_size, 16);
	stack_size = round_up(stack_size, STACK_ALIGN);

	if (!is_struct_ops) {
		/* For the trampoline called from function entry,
@@ -1960,7 +1962,7 @@ void bpf_jit_build_prologue(struct rv_jit_context *ctx, bool is_subprog)
{
	int i, stack_adjust = 0, store_offset, bpf_stack_adjust;

	bpf_stack_adjust = round_up(ctx->prog->aux->stack_depth, 16);
	bpf_stack_adjust = round_up(ctx->prog->aux->stack_depth, STACK_ALIGN);
	if (bpf_stack_adjust)
		mark_fp(ctx);

@@ -1982,7 +1984,7 @@ void bpf_jit_build_prologue(struct rv_jit_context *ctx, bool is_subprog)
	if (ctx->arena_vm_start)
		stack_adjust += 8;

	stack_adjust = round_up(stack_adjust, 16);
	stack_adjust = round_up(stack_adjust, STACK_ALIGN);
	stack_adjust += bpf_stack_adjust;

	store_offset = stack_adjust - 8;
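
Reviewer note: the `STACK_ALIGN` substitution is behavior-preserving only if the macro equals the literal it replaces. The sketch below assumes `STACK_ALIGN` is 16, as the replaced constants suggest, and uses a hypothetical `round_up_pow2()` helper in place of the kernel's `round_up()` macro to show the rounding applied to the stack sizes.

```c
/* Assumed value; the diff replaces the literal 16 with this macro. */
#define STACK_ALIGN 16

/* Round x up to the next multiple of align.
 * Valid only when align is a power of two and x is non-negative.
 * Hypothetical stand-in for the kernel's round_up() macro.
 */
static int round_up_pow2(int x, int align)
{
	return (x + align - 1) & ~(align - 1);
}
```

For instance, a BPF stack depth of 40 bytes rounds to 48, and a depth that is already a multiple of 16 is left unchanged.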