Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next (4b2765ae) · Commits · git / linux-net

Documentation/bpf/kfuncs.rst

+4 −4

Original line number	Diff line number	Diff line
		@@ -177,10 +177,10 @@ In addition to kfuncs' arguments, verifier may need more information about the
		type of kfunc(s) being registered with the BPF subsystem. To do so, we define
		flags on a set of kfuncs as follows::

		BTF_SET8_START(bpf_task_set)
		BTF_KFUNCS_START(bpf_task_set)
		BTF_ID_FLAGS(func, bpf_get_task_pid, KF_ACQUIRE \| KF_RET_NULL)
		BTF_ID_FLAGS(func, bpf_put_pid, KF_RELEASE)
		BTF_SET8_END(bpf_task_set)
		BTF_KFUNCS_END(bpf_task_set)

		This set encodes the BTF ID of each kfunc listed above, and encodes the flags
		along with it. Ofcourse, it is also allowed to specify no flags.
		@@ -347,10 +347,10 @@ Once the kfunc is prepared for use, the final step to making it visible is
		registering it with the BPF subsystem. Registration is done per BPF program
		type. An example is shown below::

		BTF_SET8_START(bpf_task_set)
		BTF_KFUNCS_START(bpf_task_set)
		BTF_ID_FLAGS(func, bpf_get_task_pid, KF_ACQUIRE \| KF_RET_NULL)
		BTF_ID_FLAGS(func, bpf_put_pid, KF_RELEASE)
		BTF_SET8_END(bpf_task_set)
		BTF_KFUNCS_END(bpf_task_set)

		static const struct btf_kfunc_id_set bpf_task_kfunc_set = {
		.owner = THIS_MODULE,

Documentation/bpf/map_lpm_trie.rst

+1 −1

Original line number	Diff line number	Diff line
		@@ -17,7 +17,7 @@ significant byte.

		LPM tries may be created with a maximum prefix length that is a multiple
		of 8, in the range from 8 to 2048. The key used for lookup and update
		operations is a ``struct bpf_lpm_trie_key``, extended by
		operations is a ``struct bpf_lpm_trie_key_u8``, extended by
		``max_prefixlen/8`` bytes.

		- For IPv4 addresses the data length is 4 bytes

Documentation/bpf/standardization/instruction-set.rst

+89 −66

Original line number	Diff line number	Diff line
		.. contents::
		.. sectnum::

		=======================================
		BPF Instruction Set Specification, v1.0
		=======================================
		======================================
		BPF Instruction Set Architecture (ISA)
		======================================

		This document specifies version 1.0 of the BPF instruction set.
		This document specifies the BPF instruction set architecture (ISA).

		Documentation conventions
		=========================
		@@ -102,7 +102,7 @@ Conformance groups

		An implementation does not need to support all instructions specified in this
		document (e.g., deprecated instructions). Instead, a number of conformance
		groups are specified. An implementation must support the "basic" conformance
		groups are specified. An implementation must support the base32 conformance
		group and may support additional conformance groups, where supporting a
		conformance group means it must support all instructions in that conformance
		group.
		@@ -112,12 +112,22 @@ that executes instructions, and tools as such compilers that generate
		instructions for the runtime. Thus, capability discovery in terms of
		conformance groups might be done manually by users or automatically by tools.

		Each conformance group has a short ASCII label (e.g., "basic") that
		Each conformance group has a short ASCII label (e.g., "base32") that
		corresponds to a set of instructions that are mandatory. That is, each
		instruction has one or more conformance groups of which it is a member.

		The "basic" conformance group includes all instructions defined in this
		This document defines the following conformance groups:

		* base32: includes all instructions defined in this
		specification unless otherwise noted.
		* base64: includes base32, plus instructions explicitly noted
		as being in the base64 conformance group.
		* atomic32: includes 32-bit atomic operation instructions (see `Atomic operations`_).
		* atomic64: includes atomic32, plus 64-bit atomic operation instructions.
		* divmul32: includes 32-bit division, multiplication, and modulo instructions.
		* divmul64: includes divmul32, plus 64-bit division, multiplication,
		and modulo instructions.
		* legacy: deprecated packet access instructions.

		Instruction encoding
		====================
		@@ -166,10 +176,10 @@ Note that most instructions do not use all of the fields.
		Unused fields shall be cleared to zero.

		As discussed below in `64-bit immediate instructions`_, a 64-bit immediate
		instruction uses a 64-bit immediate value that is constructed as follows.
		instruction uses two 32-bit immediate values that are constructed as follows.
		The 64 bits following the basic instruction contain a pseudo instruction
		using the same format but with opcode, dst_reg, src_reg, and offset all set to zero,
		and imm containing the high 32 bits of the immediate value.
		using the same format but with 'opcode', 'dst_reg', 'src_reg', and 'offset' all
		set to zero, and imm containing the high 32 bits of the immediate value.

		This is depicted in the following figure::

		@@ -181,13 +191,8 @@ This is depicted in the following figure::
		'--------------'
		pseudo instruction

		Thus the 64-bit immediate value is constructed as follows:

		imm64 = (next_imm << 32) \| imm

		where 'next_imm' refers to the imm value of the pseudo instruction
		following the basic instruction. The unused bytes in the pseudo
		instruction are reserved and shall be cleared to zero.
		Here, the imm value of the pseudo instruction is called 'next_imm'. The unused
		bytes in the pseudo instruction are reserved and shall be cleared to zero.

		Instruction classes
		-------------------
		@@ -239,7 +244,8 @@ Arithmetic instructions
		-----------------------

		``BPF_ALU`` uses 32-bit wide operands while ``BPF_ALU64`` uses 64-bit wide operands for
		otherwise identical operations.
		otherwise identical operations. ``BPF_ALU64`` instructions belong to the
		base64 conformance group unless noted otherwise.
		The 'code' field encodes the operation as below, where 'src' and 'dst' refer
		to the values of the source and destination registers, respectively.

		@@ -284,15 +290,19 @@ where '(u32)' indicates that the upper 32 bits are zeroed.

		``BPF_XOR \| BPF_K \| BPF_ALU`` means::

		dst = (u32) dst ^ (u32) imm32
		dst = (u32) dst ^ (u32) imm

		``BPF_XOR \| BPF_K \| BPF_ALU64`` means::

		dst = dst ^ imm32
		dst = dst ^ imm

		Note that most instructions have instruction offset of 0. Only three instructions
		(``BPF_SDIV``, ``BPF_SMOD``, ``BPF_MOVSX``) have a non-zero offset.

		Division, multiplication, and modulo operations for ``BPF_ALU`` are part
		of the "divmul32" conformance group, and division, multiplication, and
		modulo operations for ``BPF_ALU64`` are part of the "divmul64" conformance
		group.
		The division and modulo operations support both unsigned and signed flavors.

		For unsigned operations (``BPF_DIV`` and ``BPF_MOD``), for ``BPF_ALU``,
		@@ -349,7 +359,9 @@ BPF_ALU64 Reserved 0x00 do byte swap unconditionally
		========= ========= ===== =================================================

		The 'imm' field encodes the width of the swap operations. The following widths
		are supported: 16, 32 and 64.
		are supported: 16, 32 and 64. Width 64 operations belong to the base64
		conformance group and other swap operations belong to the base32
		conformance group.

		Examples:

		@@ -374,13 +386,15 @@ Examples:
		Jump instructions
		-----------------

		``BPF_JMP32`` uses 32-bit wide operands while ``BPF_JMP`` uses 64-bit wide operands for
		otherwise identical operations.
		``BPF_JMP32`` uses 32-bit wide operands and indicates the base32
		conformance group, while ``BPF_JMP`` uses 64-bit wide operands for
		otherwise identical operations, and indicates the base64 conformance
		group unless otherwise specified.
		The 'code' field encodes the operation as below:

		======== ===== === =============================== =============================================
		code value src description notes
		======== ===== === =============================== =============================================
		======== ===== ======= =============================== =============================================
		code value src_reg description notes
		======== ===== ======= =============================== =============================================
		BPF_JA 0x0 0x0 PC += offset BPF_JMP \| BPF_K only
		BPF_JA 0x0 0x0 PC += imm BPF_JMP32 \| BPF_K only
		BPF_JEQ 0x1 any PC += offset if dst == src
		@@ -398,7 +412,7 @@ BPF_JLT 0xa any PC += offset if dst < src unsigned
		BPF_JLE 0xb any PC += offset if dst <= src unsigned
		BPF_JSLT 0xc any PC += offset if dst < src signed
		BPF_JSLE 0xd any PC += offset if dst <= src signed
		======== ===== === =============================== =============================================
		======== ===== ======= =============================== =============================================

		The BPF program needs to store the return value into register R0 before doing a
		``BPF_EXIT``.
		@@ -424,6 +438,9 @@ specified by the 'imm' field. A > 16-bit conditional jump may be
		converted to a < 16-bit conditional jump plus a 32-bit unconditional
		jump.

		All ``BPF_CALL`` and ``BPF_JA`` instructions belong to the
		base32 conformance group.

		Helper functions
		~~~~~~~~~~~~~~~~

		@@ -481,6 +498,8 @@ The size modifier is one of:
		BPF_DW 0x18 double word (8 bytes)
		============= ===== =====================

		Instructions using ``BPF_DW`` belong to the base64 conformance group.

		Regular load and store operations
		---------------------------------

		@@ -493,7 +512,7 @@ instructions that transfer data between a register and memory.

		``BPF_MEM \| <size> \| BPF_ST`` means::

		(size ) (dst + offset) = imm32
		(size ) (dst + offset) = imm

		``BPF_MEM \| <size> \| BPF_LDX`` means::

		@@ -525,8 +544,10 @@ by other BPF programs or means outside of this specification.
		All atomic operations supported by BPF are encoded as store operations
		that use the ``BPF_ATOMIC`` mode modifier as follows:

		* ``BPF_ATOMIC \| BPF_W \| BPF_STX`` for 32-bit operations
		* ``BPF_ATOMIC \| BPF_DW \| BPF_STX`` for 64-bit operations
		* ``BPF_ATOMIC \| BPF_W \| BPF_STX`` for 32-bit operations, which are
		part of the "atomic32" conformance group.
		* ``BPF_ATOMIC \| BPF_DW \| BPF_STX`` for 64-bit operations, which are
		part of the "atomic64" conformance group.
		* 8-bit and 16-bit wide atomic operations are not supported.

		The 'imm' field is used to encode the actual atomic operation.
		@@ -547,7 +568,7 @@ BPF_XOR 0xa0 atomic xor

		(u32 )(dst + offset) += src

		``BPF_ATOMIC \| BPF_DW \| BPF_STX`` with 'imm' = BPF ADD means::
		``BPF_ATOMIC \| BPF_DW \| BPF_STX`` with 'imm' = BPF_ADD means::

		(u64 )(dst + offset) += src

		@@ -580,24 +601,24 @@ and loaded back to ``R0``.
		-----------------------------

		Instructions with the ``BPF_IMM`` 'mode' modifier use the wide instruction
		encoding defined in `Instruction encoding`_, and use the 'src' field of the
		encoding defined in `Instruction encoding`_, and use the 'src_reg' field of the
		basic instruction to hold an opcode subtype.

		The following table defines a set of ``BPF_IMM \| BPF_DW \| BPF_LD`` instructions
		with opcode subtypes in the 'src' field, using new terms such as "map"
		with opcode subtypes in the 'src_reg' field, using new terms such as "map"
		defined further below:

		========================= ====== === ========================================= =========== ==============
		opcode construction opcode src pseudocode imm type dst type
		========================= ====== === ========================================= =========== ==============
		BPF_IMM \| BPF_DW \| BPF_LD 0x18 0x0 dst = imm64 integer integer
		========================= ====== ======= ========================================= =========== ==============
		opcode construction opcode src_reg pseudocode imm type dst type
		========================= ====== ======= ========================================= =========== ==============
		BPF_IMM \| BPF_DW \| BPF_LD 0x18 0x0 dst = (next_imm << 32) \| imm integer integer
		BPF_IMM \| BPF_DW \| BPF_LD 0x18 0x1 dst = map_by_fd(imm) map fd map
		BPF_IMM \| BPF_DW \| BPF_LD 0x18 0x2 dst = map_val(map_by_fd(imm)) + next_imm map fd data pointer
		BPF_IMM \| BPF_DW \| BPF_LD 0x18 0x3 dst = var_addr(imm) variable id data pointer
		BPF_IMM \| BPF_DW \| BPF_LD 0x18 0x4 dst = code_addr(imm) integer code pointer
		BPF_IMM \| BPF_DW \| BPF_LD 0x18 0x5 dst = map_by_idx(imm) map index map
		BPF_IMM \| BPF_DW \| BPF_LD 0x18 0x6 dst = map_val(map_by_idx(imm)) + next_imm map index data pointer
		========================= ====== === ========================================= =========== ==============
		========================= ====== ======= ========================================= =========== ==============

		where

		@@ -635,7 +656,9 @@ Legacy BPF Packet access instructions
		-------------------------------------

		BPF previously introduced special instructions for access to packet data that were
		carried over from classic BPF. However, these instructions are
		deprecated and should no longer be used. All legacy packet access
		instructions belong to the "legacy" conformance group instead of the "basic"
		conformance group.
		carried over from classic BPF. These instructions used an instruction
		class of BPF_LD, a size modifier of BPF_W, BPF_H, or BPF_B, and a
		mode modifier of BPF_ABS or BPF_IND. The 'dst_reg' and 'offset' fields were
		set to zero, and 'src_reg' was set to zero for BPF_ABS. However, these
		instructions are deprecated and should no longer be used. All legacy packet
		access instructions belong to the "legacy" conformance group.

Documentation/networking/af_xdp.rst

+19 −14

Original line number	Diff line number	Diff line
		@@ -329,23 +329,24 @@ XDP_SHARED_UMEM option and provide the initial socket's fd in the
		sxdp_shared_umem_fd field as you registered the UMEM on that
		socket. These two sockets will now share one and the same UMEM.

		There is no need to supply an XDP program like the one in the previous
		case where sockets were bound to the same queue id and
		device. Instead, use the NIC's packet steering capabilities to steer
		the packets to the right queue. In the previous example, there is only
		one queue shared among sockets, so the NIC cannot do this steering. It
		can only steer between queues.

		In libbpf, you need to use the xsk_socket__create_shared() API as it
		takes a reference to a FILL ring and a COMPLETION ring that will be
		created for you and bound to the shared UMEM. You can use this
		function for all the sockets you create, or you can use it for the
		second and following ones and use xsk_socket__create() for the first
		one. Both methods yield the same result.
		In this case, it is possible to use the NIC's packet steering
		capabilities to steer the packets to the right queue. This is not
		possible in the previous example as there is only one queue shared
		among sockets, so the NIC cannot do this steering as it can only steer
		between queues.

		In libxdp (or libbpf prior to version 1.0), you need to use the
		xsk_socket__create_shared() API as it takes a reference to a FILL ring
		and a COMPLETION ring that will be created for you and bound to the
		shared UMEM. You can use this function for all the sockets you create,
		or you can use it for the second and following ones and use
		xsk_socket__create() for the first one. Both methods yield the same
		result.

		Note that a UMEM can be shared between sockets on the same queue id
		and device, as well as between queues on the same device and between
		devices at the same time.
		devices at the same time. It is also possible to redirect to any
		socket as long as it is bound to the same umem with XDP_SHARED_UMEM.

		XDP_USE_NEED_WAKEUP bind flag
		-----------------------------
		@@ -822,6 +823,10 @@ A: The short answer is no, that is not supported at the moment. The
		switch, or other distribution mechanism, in your NIC to direct
		traffic to the correct queue id and socket.

		Note that if you are using the XDP_SHARED_UMEM option, it is
		possible to switch traffic between any socket bound to the same
		umem.

		Q: My packets are sometimes corrupted. What is wrong?

		A: Care has to be taken not to feed the same buffer in the UMEM into

arch/arm64/include/asm/patching.h

+2 −0

Original line number	Diff line number	Diff line
		@@ -8,6 +8,8 @@ int aarch64_insn_read(void addr, u32 insnp);
		int aarch64_insn_write(void *addr, u32 insn);

		int aarch64_insn_write_literal_u64(void *addr, u64 val);
		void aarch64_insn_set(void dst, u32 insn, size_t len);
		void aarch64_insn_copy(void dst, void *src, size_t len);

		int aarch64_insn_patch_text_nosync(void *addr, u32 insn);
		int aarch64_insn_patch_text(void *addrs[], u32 insns[], int cnt);