Commit 1b294a1f authored by Linus Torvalds
Pull networking updates from Jakub Kicinski:
 "Core & protocols:

   - Complete rework of garbage collection of AF_UNIX sockets.

     AF_UNIX is prone to forming reference count cycles due to fd
     passing functionality. New method based on Tarjan's Strongly
     Connected Components algorithm should be both faster and remove a
     lot of workarounds we accumulated over the years.

   - Add TCP fraglist GRO support, allowing chaining multiple TCP
     packets and forwarding them together. Useful for small switches /
     routers which lack basic checksum offload in some scenarios (e.g.
     PPPoE).

   - Support using SMP threads for handling packet backlog i.e. packet
     processing from software interfaces and old drivers which don't use
     NAPI. This helps move the processing out of the softirq jumble.

   - Continue work of converting from rtnl lock to RCU protection.

     Don't require rtnl lock when reading: IPv6 routing FIB, IPv6
     address labels, netdev threaded NAPI sysfs files, bonding driver's
     sysfs files, MPLS devconf, IPv4 FIB rules, netns IDs, tcp metrics,
     TC Qdiscs, neighbor entries, ARP entries via ioctl(SIOCGARP), a lot
     of the link information available via rtnetlink.

   - Small optimizations from Eric to UDP wake up handling, memory
     accounting, RPS/RFS implementation, TCP packet sizing etc.

   - Allow direct page recycling in the bulk API used by XDP, for +2%
     PPS.

   - Support peek with an offset on TCP sockets.

   - Add MPTCP APIs for querying last time packets were received/sent/acked
     and whether MPTCP "upgrade" succeeded on a TCP socket.

   - Add intra-node communication shortcut to improve SMC performance.

   - Add IPv6 (and IPv{4,6}-over-IPv{4,6}) support to the GTP protocol
     driver.

   - Add HSR-SAN (RedBOX) mode of operation to the HSR protocol driver.

   - Add reset reasons for tracing what caused a TCP reset to be sent.

   - Introduce direction attribute for xfrm (IPSec) states. State can be
     used either for input or output packet processing.

  Things we sprinkled into general kernel code:

   - Add bitmap_{read,write}(), bitmap_size(), expose BYTES_TO_BITS().

     This required touch-ups and renaming of a few existing users.

   - Add Endian-dependent __counted_by_{le,be} annotations.

   - Make building selftests "quieter" by printing summaries like
     "CC object.o" rather than full commands with all the arguments.

  Netfilter:

   - Use GFP_KERNEL to clone elements, to deal better with OOM
     situations and avoid failures in the .commit step.

  BPF:

   - Add eBPF JIT for ARCv2 CPUs.

   - Support attaching kprobe BPF programs through kprobe_multi link in
     a session mode, meaning, a BPF program is attached to both function
     entry and return, the entry program can decide if the return
     program gets executed and the entry program can share u64 cookie
     value with return program. "Session mode" is a common use-case for
     tetragon and bpftrace.

   - Add the ability to specify and retrieve BPF cookie for raw
     tracepoint programs in order to ease migration from classic to raw
     tracepoints.

   - Add an internal-only BPF per-CPU instruction for resolving per-CPU
     memory addresses and implement support in x86, ARM64 and RISC-V
     JITs. This allows inlining functions which need to access per-CPU
     state.

   - Optimize x86 BPF JIT's emit_mov_imm64, and add support for various
     atomics in bpf_arena which can be JITed as a single x86
     instruction. Support BPF arena on ARM64.

   - Add a new bpf_wq API for deferring events and refactor
     process-context bpf_timer code to keep common code where possible.

   - Harden the BPF verifier's and/or/xor value tracking.

   - Introduce crypto kfuncs to let BPF programs call kernel crypto
     APIs.

   - Support bpf_tail_call_static() helper for BPF programs with GCC 13.

   - Add bpf_preempt_{disable,enable}() kfuncs in order to allow a BPF
     program to have code sections where preemption is disabled.

  Driver API:

   - Skip software TC processing completely if all installed rules are
     marked as HW-only, instead of checking the HW-only flag rule by
     rule.

   - Add support for configuring PoE (Power over Ethernet), similar to
     the already existing support for PoDL (Power over Data Line)
     config.

   - Initial bits of a queue control API, for now allowing a single
     queue to be reset without disturbing packet flow to other queues.

   - Common (ethtool) statistics for hardware timestamping.

  Tests and tooling:

   - Remove the need to create a config file to run the net forwarding
     tests so that a naive "make run_tests" can exercise them.

   - Define a method of writing tests which require an external endpoint
     to communicate with (to send/receive data towards the test
     machine). Add a few such tests.

   - Create a shared code library for writing Python tests. Expose the
     YAML Netlink library from tools/ to the tests for easy Netlink
     access.

   - Move netfilter tests under net/, extend them, separate performance
     tests from correctness tests, and iron out issues found by running
     them "on every commit".

   - Refactor BPF selftests to use common network helpers.

   - Further work filling in YAML definitions of Netlink messages for:
     nftables, team driver, bonding interfaces, vlan interfaces, VF
     info, TC u32 mark, TC police action.

   - Teach Python YAML Netlink to decode attribute policies.

   - Extend the definition of the "indexed array" construct in the specs
     to cover arrays of scalars rather than just nests.

   - Add hyperlinks between definitions in generated Netlink docs.

  Drivers:

   - Make sure unsupported flower control flags are rejected by drivers,
     and make more drivers report errors directly to the application
     rather than dmesg (large number of driver changes from Asbjørn
     Sloth Tønnesen).

   - Ethernet high-speed NICs:
      - Broadcom (bnxt):
         - support multiple RSS contexts and steering traffic to them
         - support XDP metadata
         - make page pool allocations more NUMA aware
      - Intel (100G, ice, idpf):
         - extract datapath code common among Intel drivers into a library
         - use fewer resources in switchdev by sharing queues with the PF
         - add PFCP filter support
         - add Ethernet filter support
         - use a spinlock instead of HW lock in PTP clock ops
         - support 5 layer Tx scheduler topology
      - nVidia/Mellanox:
         - 800G link modes and 100G SerDes speeds
         - per-queue IRQ coalescing configuration
      - Marvell Octeon:
         - support offloading TC packet mark action

   - Ethernet NICs consumer, embedded and virtual:
      - stop lying about skb->truesize in USB Ethernet drivers, it
        messes up TCP memory calculations
      - Google cloud vNIC:
         - support changing ring size via ethtool
         - support ring reset using the queue control API
      - VirtIO net:
         - expose flow hash from RSS to XDP
         - per-queue statistics
         - add selftests
      - Synopsys (stmmac):
         - support controllers which require an RX clock signal from the
           MII bus to perform their hardware initialization
      - TI:
         - icssg_prueth: support ICSSG-based Ethernet on AM65x SR1.0 devices
         - icssg_prueth: add SW TX / RX Coalescing based on hrtimers
         - cpsw: minimal XDP support
      - Renesas (ravb):
         - support describing the MDIO bus
      - Realtek (r8169):
         - add support for RTL8168M
      - Microchip Sparx5:
         - matchall and flower actions mirred and redirect

   - Ethernet switches:
      - nVidia/Mellanox:
         - improve events processing performance
      - Marvell:
         - add support for MV88E6250 family internal PHYs
      - Microchip:
         - add DCB and DSCP mapping support for KSZ switches
         - vsc73xx: convert to PHYLINK
      - Realtek:
         - rtl8226b/rtl8221b: add C45 instances and SerDes switching

   - Many driver changes related to PHYLIB and PHYLINK deprecated API
     cleanup

   - Ethernet PHYs:
      - Add a new driver for Airoha EN8811H 2.5 Gigabit PHY.
      - micrel: lan8814: add support for PPS out and external timestamp trigger

   - WiFi:
      - Disable Wireless Extensions (WEXT) in all Wi-Fi 7 devices
        drivers. Modern devices can only be configured using nl80211.
      - mac80211/cfg80211
         - handle color change per link for WiFi 7 Multi-Link Operation
      - Intel (iwlwifi):
         - don't support puncturing in 5 GHz
         - support monitor mode on passive channels
         - BZ-W device support
         - P2P with HE/EHT support
         - re-add support for firmware API 90
         - provide channel survey information for Automatic Channel Selection
      - MediaTek (mt76):
         - mt7921 LED control
         - mt7925 EHT radiotap support
         - mt7920e PCI support
      - Qualcomm (ath11k):
         - P2P support for QCA6390, WCN6855 and QCA2066
         - support hibernation
         - ieee80211-freq-limit Device Tree property support
      - Qualcomm (ath12k):
         - refactoring in preparation of multi-link support
         - suspend and hibernation support
         - ACPI support
         - debugfs support, including dfs_simulate_radar support
      - RealTek:
         - rtw88: RTL8723CS SDIO device support
         - rtw89: RTL8922AE Wi-Fi 7 PCI device support
         - rtw89: complete features of new WiFi 7 chip 8922AE including
           BT-coexistence and Wake-on-WLAN
         - rtw89: use BIOS ACPI settings to set TX power and channels
         - rtl8xxxu: enable Management Frame Protection (MFP) support

   - Bluetooth:
      - support for Intel BlazarI and Filmore Peak2 (BE201)
      - support for MediaTek MT7921S SDIO
      - initial support for Intel PCIe BT driver
      - remove HCI_AMP support"

* tag 'net-next-6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1827 commits)
  selftests: netfilter: fix packetdrill conntrack testcase
  net: gro: fix napi_gro_cb zeroed alignment
  Bluetooth: btintel_pcie: Refactor and code cleanup
  Bluetooth: btintel_pcie: Fix warning reported by sparse
  Bluetooth: hci_core: Fix not handling hdev->le_num_of_adv_sets=1
  Bluetooth: btintel: Fix compiler warning for multi_v7_defconfig config
  Bluetooth: btintel_pcie: Fix compiler warnings
  Bluetooth: btintel_pcie: Add *setup* function to download firmware
  Bluetooth: btintel_pcie: Add support for PCIe transport
  Bluetooth: btintel: Export few static functions
  Bluetooth: HCI: Remove HCI_AMP support
  Bluetooth: L2CAP: Fix div-by-zero in l2cap_le_flowctl_init()
  Bluetooth: qca: Fix error code in qca_read_fw_build_info()
  Bluetooth: hci_conn: Use __counted_by() and avoid -Wfamnae warning
  Bluetooth: btintel: Add support for Filmore Peak2 (BE201)
  Bluetooth: btintel: Add support for BlazarI
  LE Create Connection command timeout increased to 20 secs
  dt-bindings: net: bluetooth: Add MediaTek MT7921S SDIO Bluetooth
  Bluetooth: compute LE flow credits based on recvbuf space
  Bluetooth: hci_sync: Use cmd->num_cis instead of magic number
  ...
parents b850dc20 654de42f
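
The commit message attributes the AF_UNIX garbage-collection rework to Tarjan's Strongly Connected Components algorithm. As a rough illustration only (this is not the kernel's code), the sketch below runs the textbook algorithm on a small directed graph in which an edge stands for "this socket has an fd of that socket in flight". The graph, the socket names and the function name `tarjan_scc` are assumptions made for the example.

```python
def tarjan_scc(graph):
    """Return the strongly connected components of a directed graph.

    graph maps each node to an iterable of successor nodes.
    """
    index = {}        # discovery order of each visited node
    lowlink = {}      # smallest index reachable from the node's subtree
    stack = []        # nodes whose component is not yet complete
    on_stack = set()
    components = []
    counter = 0

    def visit(node):
        nonlocal counter
        index[node] = lowlink[node] = counter
        counter += 1
        stack.append(node)
        on_stack.add(node)

        for succ in graph.get(node, ()):
            if succ not in index:
                visit(succ)
                lowlink[node] = min(lowlink[node], lowlink[succ])
            elif succ in on_stack:
                lowlink[node] = min(lowlink[node], index[succ])

        # node is the root of a component: pop that component off the stack
        if lowlink[node] == index[node]:
            component = []
            while True:
                member = stack.pop()
                on_stack.discard(member)
                component.append(member)
                if member == node:
                    break
            components.append(component)

    for node in graph:
        if node not in index:
            visit(node)
    return components


# Hypothetical in-flight fd graph: a and b hold each other's fds, c points at a.
inflight = {"a": ["b"], "b": ["a"], "c": ["a"]}
cycles = [sorted(c) for c in tarjan_scc(inflight) if len(c) > 1]
```

Only a component with more than one member (or a node with a self-loop) can contain a reference cycle. How the kernel decides whether such a component is actually garbage is not described in the commit message.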
+1 −0
@@ -72,6 +72,7 @@ two flavors of JITs, the newer eBPF JIT currently supported on:
  - riscv64
  - riscv32
  - loongarch64
+  - arc

And the older cBPF JIT supported on the following archs:

+62 −47
@@ -5,7 +5,11 @@
BPF Instruction Set Architecture (ISA)
======================================

-This document specifies the BPF instruction set architecture (ISA).
+eBPF (which is no longer an acronym for anything), also commonly
+referred to as BPF, is a technology with origins in the Linux kernel
+that can run untrusted programs in a privileged context such as an
+operating system kernel. This document specifies the BPF instruction
+set architecture (ISA).

Documentation conventions
=========================
@@ -43,7 +47,7 @@ a type's signedness (`S`) and bit width (`N`), respectively.
  ===== =========

For example, `u32` is a type whose valid values are all the 32-bit unsigned
-numbers and `s16` is a types whose valid values are all the 16-bit signed
+numbers and `s16` is a type whose valid values are all the 16-bit signed
numbers.

Functions
@@ -108,7 +112,7 @@ conformance group means it must support all instructions in that conformance
group.

The use of named conformance groups enables interoperability between a runtime
-that executes instructions, and tools as such compilers that generate
+that executes instructions, and tools such as compilers that generate
instructions for the runtime.  Thus, capability discovery in terms of
conformance groups might be done manually by users or automatically by tools.

@@ -181,10 +185,13 @@ A basic instruction is encoded as follows::
    (`64-bit immediate instructions`_ reuse this field for other purposes)

  **dst_reg**
-    destination register number (0-10)
+    destination register number (0-10), unless otherwise specified
+    (future instructions might reuse this field for other purposes)

**offset**
-  signed integer offset used with pointer arithmetic
+  signed integer offset used with pointer arithmetic, except where
+  otherwise specified (some arithmetic instructions reuse this field
+  for other purposes)

**imm**
  signed integer immediate value
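
As a hedged illustration of the field list above, the sketch below splits one 8-byte basic instruction into its opcode, register numbers, 'offset' and 'imm'. It assumes a little-endian host, where the destination register sits in the low four bits of the register byte and the source register in the high four bits. The helper name `decode_insn` and the sample bytes are made up for the example.

```python
import struct


def decode_insn(raw):
    """Split one 8-byte basic instruction into its fields (little-endian)."""
    opcode, regs, offset, imm = struct.unpack("<BBhi", raw)
    return {
        "opcode": opcode,
        "dst_reg": regs & 0x0F,   # low nibble on little-endian hosts
        "src_reg": regs >> 4,     # high nibble on little-endian hosts
        "offset": offset,         # signed 16-bit field
        "imm": imm,               # signed 32-bit field
    }


# 0xb7 is the opcode of a 64-bit move of an immediate; these bytes encode "r0 = 42".
mov_r0_42 = bytes([0xB7, 0x00, 0x00, 0x00, 0x2A, 0x00, 0x00, 0x00])
fields = decode_insn(mov_r0_42)
```

On a big-endian host the two register nibbles and the byte order of 'offset' and 'imm' differ, so the format string and the masks would need to change.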
@@ -228,10 +235,12 @@ This is depicted in the following figure::
  operation to perform, encoded as explained above

**regs**
-  The source and destination register numbers, encoded as explained above
+  The source and destination register numbers (unless otherwise
+  specified), encoded as explained above

**offset**
-  signed integer offset used with pointer arithmetic
+  signed integer offset used with pointer arithmetic, unless
+  otherwise specified

**imm**
  signed integer immediate value
@@ -342,8 +351,8 @@ where '(u32)' indicates that the upper 32 bits are zeroed.

  dst = dst ^ imm

-Note that most instructions have instruction offset of 0. Only three instructions
-(``SDIV``, ``SMOD``, ``MOVSX``) have a non-zero offset.
+Note that most arithmetic instructions have 'offset' set to 0. Only three instructions
+(``SDIV``, ``SMOD``, ``MOVSX``) have a non-zero 'offset'.

Division, multiplication, and modulo operations for ``ALU`` are part
of the "divmul32" conformance group, and division, multiplication, and
@@ -365,15 +374,15 @@ Note that there are varying definitions of the signed modulo operation
when the dividend or divisor are negative, where implementations often
vary by language such that Python, Ruby, etc.  differ from C, Go, Java,
etc. This specification requires that signed modulo use truncated division
-(where -13 % 3 == -1) as implemented in C, Go, etc.:
+(where -13 % 3 == -1) as implemented in C, Go, etc.::

   a % n = a - n * trunc(a / n)
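
The formula above can be checked with a short sketch. Python's native `%` floors the quotient, so it is a convenient way to see both conventions side by side. The helper name `smod` is an assumption for the example, and a zero divisor is deliberately not handled here.

```python
def smod(a, n):
    """Signed modulo using truncated division: a - n * trunc(a / n)."""
    quotient = abs(a) // abs(n)      # magnitude of the truncated quotient
    if (a < 0) != (n < 0):
        quotient = -quotient         # truncation rounds toward zero
    return a - n * quotient


truncated = smod(-13, 3)   # convention required by the text above
floored = -13 % 3          # Python's native operator floors instead
```

With truncated division the result takes the sign of the dividend, whereas Python's floored result takes the sign of the divisor.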

The ``MOVSX`` instruction does a move operation with sign extension.
-``{MOVSX, X, ALU}`` :term:`sign extends<Sign Extend>` 8-bit and 16-bit operands into 32
-bit operands, and zeroes the remaining upper 32 bits.
+``{MOVSX, X, ALU}`` :term:`sign extends<Sign Extend>` 8-bit and 16-bit operands into
+32-bit operands, and zeroes the remaining upper 32 bits.
``{MOVSX, X, ALU64}`` :term:`sign extends<Sign Extend>` 8-bit, 16-bit, and 32-bit
-operands into 64 bit operands.  Unlike other arithmetic instructions,
+operands into 64-bit operands.  Unlike other arithmetic instructions,
``MOVSX`` is only defined for register source operands (``X``).
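
The two ``MOVSX`` variants described above can be modelled on plain integers as follows. This is a minimal sketch: the function names are hypothetical, and the operand width is passed as an ordinary parameter rather than decoded from an instruction.

```python
def sign_extend(value, from_bits, to_bits):
    """Sign extend the low from_bits of value to a to_bits-wide result."""
    value &= (1 << from_bits) - 1
    if value & (1 << (from_bits - 1)):   # sign bit of the narrow operand
        value -= 1 << from_bits
    return value & ((1 << to_bits) - 1)


def movsx_alu(src, from_bits):
    # 8- or 16-bit operand extended to 32 bits; the upper 32 bits stay zero.
    return sign_extend(src, from_bits, 32)


def movsx_alu64(src, from_bits):
    # 8-, 16- or 32-bit operand extended to the full 64 bits.
    return sign_extend(src, from_bits, 64)
```

For the byte 0x80, `movsx_alu` yields a value whose upper 32 bits are zero, while `movsx_alu64` fills all upper bits with the sign.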

The ``NEG`` instruction is only defined when the source bit is clear
@@ -411,19 +420,19 @@ conformance group.

Examples:

-``{END, TO_LE, ALU}`` with imm = 16/32/64 means::
+``{END, TO_LE, ALU}`` with 'imm' = 16/32/64 means::

  dst = htole16(dst)
  dst = htole32(dst)
  dst = htole64(dst)

-``{END, TO_BE, ALU}`` with imm = 16/32/64 means::
+``{END, TO_BE, ALU}`` with 'imm' = 16/32/64 means::

  dst = htobe16(dst)
  dst = htobe32(dst)
  dst = htobe64(dst)

-``{END, TO_LE, ALU64}`` with imm = 16/32/64 means::
+``{END, TO_LE, ALU64}`` with 'imm' = 16/32/64 means::

  dst = bswap16(dst)
  dst = bswap32(dst)
@@ -438,9 +447,9 @@ otherwise identical operations, and indicates the base64 conformance
group unless otherwise specified.
The 'code' field encodes the operation as below:

-========  =====  =======  ===============================  ===================================================
+========  =====  =======  =================================  ===================================================
code      value  src_reg  description                        notes
-========  =====  =======  ===============================  ===================================================
+========  =====  =======  =================================  ===================================================
JA        0x0    0x0      PC += offset                       {JA, K, JMP} only
JA        0x0    0x0      PC += imm                          {JA, K, JMP32} only
JEQ       0x1    any      PC += offset if dst == src
@@ -450,7 +459,7 @@ JSET 0x4 any PC += offset if dst & src
JNE       0x5    any      PC += offset if dst != src
JSGT      0x6    any      PC += offset if dst > src          signed
JSGE      0x7    any      PC += offset if dst >= src         signed
-CALL      0x8    0x0      call helper function by address  {CALL, K, JMP} only, see `Helper functions`_
+CALL      0x8    0x0      call helper function by static ID  {CALL, K, JMP} only, see `Helper functions`_
CALL      0x8    0x1      call PC += imm                     {CALL, K, JMP} only, see `Program-local functions`_
CALL      0x8    0x2      call helper function by BTF ID     {CALL, K, JMP} only, see `Helper functions`_
EXIT      0x9    0x0      return                             {CALL, K, JMP} only
@@ -458,7 +467,13 @@ JLT 0xa any PC += offset if dst < src unsigned
JLE       0xb    any      PC += offset if dst <= src         unsigned
JSLT      0xc    any      PC += offset if dst < src          signed
JSLE      0xd    any      PC += offset if dst <= src         signed
-========  =====  =======  ===============================  ===================================================
+========  =====  =======  =================================  ===================================================
+
+where 'PC' denotes the program counter, and the offset to increment by
+is in units of 64-bit instructions relative to the instruction following
+the jump instruction.  Thus 'PC += 1' skips execution of the next
+instruction if it's a basic instruction or results in undefined behavior
+if the next instruction is a 128-bit wide instruction.
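
The offset arithmetic in the paragraph above can be sketched with instruction slots numbered from 0, each slot being 64 bits wide. The function names and the `wide_second_halves` set are assumptions for illustration; they are not part of the specification.

```python
def jump_target(pc, offset):
    """Slot index executed after a taken jump located at slot pc.

    The offset is relative to the instruction following the jump.
    """
    return pc + 1 + offset


def is_valid_target(target, wide_second_halves):
    """Reject targets that land in the second half of a 128-bit instruction.

    wide_second_halves is a set holding the slot index of the second
    64-bit half of every wide instruction in the program.
    """
    return target not in wide_second_halves
```

For a jump in slot 0 with an offset of 1, the target is slot 2. If slots 1 and 2 together hold one 128-bit instruction, that target falls in its second half, which is the undefined case the text describes.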

The BPF program needs to store the return value into register R0 before doing an
``EXIT``.
@@ -475,7 +490,7 @@ where 's>=' indicates a signed '>=' comparison.

  gotol +imm

-where 'imm' means the branch offset comes from insn 'imm' field.
+where 'imm' means the branch offset comes from the 'imm' field.

Note that there are two flavors of ``JA`` instructions. The
``JMP`` class permits a 16-bit jump offset specified by the 'offset'
@@ -493,26 +508,26 @@ Helper functions
Helper functions are a concept whereby BPF programs can call into a
set of function calls exposed by the underlying platform.

-Historically, each helper function was identified by an address
-encoded in the imm field.  The available helper functions may differ
-for each program type, but address values are unique across all program types.
+Historically, each helper function was identified by a static ID
+encoded in the 'imm' field.  The available helper functions may differ
+for each program type, but static IDs are unique across all program types.

Platforms that support the BPF Type Format (BTF) support identifying
-a helper function by a BTF ID encoded in the imm field, where the BTF ID
+a helper function by a BTF ID encoded in the 'imm' field, where the BTF ID
identifies the helper name and type.

Program-local functions
~~~~~~~~~~~~~~~~~~~~~~~
Program-local functions are functions exposed by the same BPF program as the
caller, and are referenced by offset from the call instruction, similar to
-``JA``.  The offset is encoded in the imm field of the call instruction.
-A ``EXIT`` within the program-local function will return to the caller.
+``JA``.  The offset is encoded in the 'imm' field of the call instruction.
+An ``EXIT`` within the program-local function will return to the caller.

Load and store instructions
===========================

For load and store instructions (``LD``, ``LDX``, ``ST``, and ``STX``), the
-8-bit 'opcode' field is divided as::
+8-bit 'opcode' field is divided as follows::

  +-+-+-+-+-+-+-+-+
  |mode |sz |class|
@@ -580,7 +595,7 @@ instructions that transfer data between a register and memory.

  dst = *(signed size *) (src + offset)

-Where size is one of: ``B``, ``H``, or ``W``, and
+Where '<size>' is one of: ``B``, ``H``, or ``W``, and
'signed size' is one of: s8, s16, or s32.

Atomic operations
@@ -662,11 +677,11 @@ src_reg pseudocode imm type dst type
=======  =========================================  ===========  ==============
0x0      dst = (next_imm << 32) | imm               integer      integer
0x1      dst = map_by_fd(imm)                       map fd       map
-0x2      dst = map_val(map_by_fd(imm)) + next_imm   map fd       data pointer
-0x3      dst = var_addr(imm)                        variable id  data pointer
-0x4      dst = code_addr(imm)                       integer      code pointer
+0x2      dst = map_val(map_by_fd(imm)) + next_imm   map fd       data address
+0x3      dst = var_addr(imm)                        variable id  data address
+0x4      dst = code_addr(imm)                       integer      code address
0x5      dst = map_by_idx(imm)                      map index    map
-0x6      dst = map_val(map_by_idx(imm)) + next_imm  map index    data pointer
+0x6      dst = map_val(map_by_idx(imm)) + next_imm  map index    data address
=======  =========================================  ===========  ==============
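
For the first row of the table (src_reg 0x0), the pseudocode ``dst = (next_imm << 32) | imm`` can be sketched as below. The sketch assumes that each 32-bit half is treated as an unsigned quantity when the two are combined, so that a negative 'imm' does not spill into the upper half. The helper name `ld_imm64` is hypothetical.

```python
MASK32 = 0xFFFFFFFF


def ld_imm64(imm, next_imm):
    """Combine the two 32-bit immediates of a wide instruction (src_reg 0x0)."""
    return ((next_imm & MASK32) << 32) | (imm & MASK32)


# 'imm' carries the low half and 'next_imm' the high half of the 64-bit value.
value = ld_imm64(imm=-1, next_imm=0x1234)
```

The other rows of the table depend on map file descriptors, variable ids or code addresses supplied by the platform, so they are not modelled here.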

where
+2 −0
@@ -75,6 +75,8 @@ if major >= 3:
            "__rcu",
            "__user",
            "__force",
+            "__counted_by_le",
+            "__counted_by_be",

            # include/linux/compiler_attributes.h:
            "__alias",
+56 −0
# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
%YAML 1.2
---
$id: http://devicetree.org/schemas/net/airoha,en8811h.yaml#
$schema: http://devicetree.org/meta-schemas/core.yaml#

title: Airoha EN8811H PHY

maintainers:
  - Eric Woudstra <ericwouds@gmail.com>

description:
  The Airoha EN8811H PHY has the ability to reverse polarity
  on the lines to and/or from the MAC. It is reversed by
  the booleans in the devicetree node of the phy.

allOf:
  - $ref: ethernet-phy.yaml#

properties:
  compatible:
    enum:
      - ethernet-phy-id03a2.a411

  reg:
    maxItems: 1

  airoha,pnswap-rx:
    type: boolean
    description:
      Reverse rx polarity of the SERDES. This is the receiving
      side of the lines from the MAC towards the EN8811H.

  airoha,pnswap-tx:
    type: boolean
    description:
      Reverse tx polarity of SERDES. This is the transmitting
      side of the lines from EN8811H towards the MAC.

required:
  - reg

unevaluatedProperties: false

examples:
  - |
    mdio {
        #address-cells = <1>;
        #size-cells = <0>;

        ethernet-phy@1 {
            compatible = "ethernet-phy-id03a2.a411";
            reg = <1>;
            airoha,pnswap-rx;
        };
    };
+55 −0
# SPDX-License-Identifier: (GPL-2.0 OR BSD-2-Clause)
%YAML 1.2
---
$id: http://devicetree.org/schemas/net/bluetooth/mediatek,mt7921s-bluetooth.yaml#
$schema: http://devicetree.org/meta-schemas/core.yaml#

title: MediaTek MT7921S Bluetooth

maintainers:
  - Sean Wang <sean.wang@mediatek.com>

description:
  MT7921S is an SDIO-attached dual-radio WiFi+Bluetooth Combo chip; each
  function is its own SDIO function on a shared SDIO interface. The chip
  has two dedicated reset lines, one for each function core.
  This binding only covers the Bluetooth SDIO function, with one device
  node describing only this SDIO function.

allOf:
  - $ref: bluetooth-controller.yaml#

properties:
  compatible:
    enum:
      - mediatek,mt7921s-bluetooth

  reg:
    const: 2

  reset-gpios:
    maxItems: 1
    description:
      An active-low reset line for the Bluetooth core; on typical M.2
      key E modules this is the W_DISABLE2# pin.

required:
  - compatible
  - reg

unevaluatedProperties: false

examples:
  - |
    #include <dt-bindings/gpio/gpio.h>

    mmc {
        #address-cells = <1>;
        #size-cells = <0>;

        bluetooth@2 {
            compatible = "mediatek,mt7921s-bluetooth";
            reg = <2>;
            reset-gpios = <&pio 8 GPIO_ACTIVE_LOW>;
        };
    };