Commit Graph

1429046 Commits

Author SHA1 Message Date
Marek Behún
eeee5a710f net: sfp: Fix Ubiquiti U-Fiber Instant SFP module on mvneta
In commit 8110633db4 ("net: sfp-bus: allow SFP quirks to override
Autoneg and pause bits") we moved the setting of Autoneg and pause bits
before the call to SFP quirk when parsing SFP module support.

Since the quirk for Ubiquiti U-Fiber Instant SFP module zeroes the
support bits and sets 1000baseX_Full only, the above mentioned commit
changed the overall computed support from
  1000baseX_Full, Autoneg, Pause, Asym_Pause
to just
  1000baseX_Full.

This broke the SFP module for mvneta, which requires Autoneg for
1000baseX since commit c762b7fac1 ("net: mvneta: deny disabling
autoneg for 802.3z modes").

Fix this by setting back the Autoneg, Pause and Asym_Pause bits in the
quirk.

Fixes: 8110633db4 ("net: sfp-bus: allow SFP quirks to override Autoneg and pause bits")
Signed-off-by: Marek Behún <kabel@kernel.org>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/20260326122038.2489589-1-kabel@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-03-27 20:55:52 -07:00
Xiang Mei
5d17af9eb2 selftests/tc-testing: add test for HFSC divide-by-zero in rtsc_min()
Add a regression test for the divide-by-zero in rtsc_min() triggered
when m2sm() converts a large m1 value (e.g. 32gbit) to a u64 scaled
slope reaching 2^32. rtsc_min() stores the difference of two such u64
values (sm1 - sm2) in a u32 variable `dsm`, truncating 2^32 to zero
and causing a divide-by-zero oops in the concave-curve intersection
path. The test configures an HFSC class with m1=32gbit d=1ms m2=0bit,
sends a packet to activate the class, waits for it to drain and go
idle, then sends another packet to trigger reactivation through
rtsc_min().

Signed-off-by: Xiang Mei <xmei5@asu.edu>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Reviewed-by: Victor Nogueira <victor@mojatatu.com>
Link: https://patch.msgid.link/20260326204310.1549327-2-xmei5@asu.edu
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-03-27 20:41:11 -07:00
Xiang Mei
4576100b8c net/sched: sch_hfsc: fix divide-by-zero in rtsc_min()
m2sm() converts a u32 slope to a u64 scaled value.  For large inputs
(e.g. m1=4000000000), the result can reach 2^32.  rtsc_min() stores
the difference of two such u64 values in a u32 variable `dsm` and
uses it as a divisor.  When the difference is exactly 2^32 the
truncation yields zero, causing a divide-by-zero oops in the
concave-curve intersection path:

  Oops: divide error: 0000
  RIP: 0010:rtsc_min (net/sched/sch_hfsc.c:601)
  Call Trace:
   init_ed (net/sched/sch_hfsc.c:629)
   hfsc_enqueue (net/sched/sch_hfsc.c:1569)
   [...]

Widen `dsm` to u64 and replace do_div() with div64_u64() so the full
difference is preserved.

Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Reported-by: Weiming Shi <bestswngs@gmail.com>
Signed-off-by: Xiang Mei <xmei5@asu.edu>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Link: https://patch.msgid.link/20260326204310.1549327-1-xmei5@asu.edu
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-03-27 20:41:11 -07:00
Jakub Kicinski
1a6fdb3518 Merge branch 'bridge-vxlan-harden-nd-option-parsing-paths'
Yang Yang says:

====================
bridge/vxlan: harden ND option parsing paths

This series hardens ND option parsing in bridge and vxlan paths.

Patch 1 linearizes the request skb in br_nd_send() before walking ND
options. Patch 2 adds explicit ND option length validation in
br_nd_send(). Patch 3 adds matching ND option length validation in
vxlan_na_create().
====================

Link: https://patch.msgid.link/20260326034441.2037420-1-n05ec@lzu.edu.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-03-27 20:37:17 -07:00
afa9a05e6c vxlan: validate ND option lengths in vxlan_na_create
vxlan_na_create() walks ND options according to option-provided
lengths. A malformed option can make the parser advance beyond the
computed option span or use a too-short source LLADDR option payload.

Validate option lengths against the remaining NS option area before
advancing, and only read source LLADDR when the option is large enough
for an Ethernet address.

Fixes: 4b29dba9c0 ("vxlan: fix nonfunctional neigh_reduce()")
Cc: stable@vger.kernel.org
Reported-by: Yifan Wu <yifanwucs@gmail.com>
Reported-by: Juefei Pu <tomapufckgml@gmail.com>
Tested-by: Ao Zhou <n05ec@lzu.edu.cn>
Co-developed-by: Yuan Tan <tanyuan98@outlook.com>
Signed-off-by: Yuan Tan <tanyuan98@outlook.com>
Suggested-by: Xin Liu <bird@lzu.edu.cn>
Signed-off-by: Yang Yang <n05ec@lzu.edu.cn>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://patch.msgid.link/20260326034441.2037420-4-n05ec@lzu.edu.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-03-27 20:37:14 -07:00
850837965a bridge: br_nd_send: validate ND option lengths
br_nd_send() walks ND options according to option-provided lengths.
A malformed option can make the parser advance beyond the computed
option span or use a too-short source LLADDR option payload.

Validate option lengths against the remaining NS option area before
advancing, and only read source LLADDR when the option is large enough
for an Ethernet address.

Fixes: ed842faeb2 ("bridge: suppress nd pkts on BR_NEIGH_SUPPRESS ports")
Cc: stable@vger.kernel.org
Reported-by: Yifan Wu <yifanwucs@gmail.com>
Reported-by: Juefei Pu <tomapufckgml@gmail.com>
Tested-by: Ao Zhou <n05ec@lzu.edu.cn>
Co-developed-by: Yuan Tan <tanyuan98@outlook.com>
Signed-off-by: Yuan Tan <tanyuan98@outlook.com>
Suggested-by: Xin Liu <bird@lzu.edu.cn>
Signed-off-by: Yang Yang <n05ec@lzu.edu.cn>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://patch.msgid.link/20260326034441.2037420-3-n05ec@lzu.edu.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-03-27 20:37:14 -07:00
a01aee7caf bridge: br_nd_send: linearize skb before parsing ND options
br_nd_send() parses neighbour discovery options from ns->opt[] and
assumes that these options are in the linear part of request.

Its callers only guarantee that the ICMPv6 header and target address
are available, so the option area can still be non-linear. Parsing
ns->opt[] in that case can access data past the linear buffer.

Linearize request before option parsing and derive ns from the linear
network header.

Fixes: ed842faeb2 ("bridge: suppress nd pkts on BR_NEIGH_SUPPRESS ports")
Reported-by: Yifan Wu <yifanwucs@gmail.com>
Reported-by: Juefei Pu <tomapufckgml@gmail.com>
Tested-by: Ao Zhou <n05ec@lzu.edu.cn>
Co-developed-by: Yuan Tan <tanyuan98@outlook.com>
Signed-off-by: Yuan Tan <tanyuan98@outlook.com>
Suggested-by: Xin Liu <bird@lzu.edu.cn>
Signed-off-by: Yang Yang <n05ec@lzu.edu.cn>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://patch.msgid.link/20260326034441.2037420-2-n05ec@lzu.edu.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-03-27 20:37:14 -07:00
Jakub Kicinski
c11c731a68 Merge branch 'fix-page-fragment-handling-when-page_size-4k'
Dimitri Daskalakis says:

====================
Fix page fragment handling when PAGE_SIZE > 4K

FBNIC operates on fixed size descriptors (4K). When the OS supports pages
larger than 4K, we fragment the page across multiple descriptors.

While performance testing, I found several issues with our page fragment
handling, resulting in low throughput and potential RX stalls.
====================

Link: https://patch.msgid.link/20260324195123.3486219-1-dimitri.daskalakis1@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-03-27 20:30:26 -07:00
Dimitri Daskalakis
f3567dd428 eth: fbnic: Fix debugfs output for BDQ's with page frags
The rings size_mask represents the number of pages, so we need to
determine the number of page frags when dumping the descriptors.

Fixes: df04373b0d ("eth fbnic: Add debugfs hooks for tx/rx rings")
Signed-off-by: Dimitri Daskalakis <daskald@meta.com>
Link: https://patch.msgid.link/20260324195123.3486219-3-dimitri.daskalakis1@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-03-27 20:30:23 -07:00
Dimitri Daskalakis
b38c55320b eth: fbnic: Account for page fragments when updating BDQ tail
FBNIC supports fixed size buffers of 4K. When PAGE_SIZE > 4K, we
fragment the page across multiple descriptors (FBNIC_BD_FRAG_COUNT).
When refilling the BDQ, the correct number of entries are populated,
but tail was only incremented by one. So on a system with 64K pages,
HW would get one descriptor refilled for every 16 we populate.

Additionally, we program the ring size in the HW when enabling the BDQ.
This was not accounting for page fragments, so on systems with 64K pages,
the HW used 1/16th of the ring.

Fixes: 0cb4c0a137 ("eth: fbnic: Implement Rx queue alloc/start/stop/free")
Signed-off-by: Dimitri Daskalakis <daskald@meta.com>
Link: https://patch.msgid.link/20260324195123.3486219-2-dimitri.daskalakis1@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-03-27 20:30:23 -07:00
Eric Dumazet
2edfa31769 ip6_tunnel: clear skb2->cb[] in ip4ip6_err()
Oskar Kjos reported the following problem.

ip4ip6_err() calls icmp_send() on a cloned skb whose cb[] was written
by the IPv6 receive path as struct inet6_skb_parm. icmp_send() passes
IPCB(skb2) to __ip_options_echo(), which interprets that cb[] region
as struct inet_skb_parm (IPv4). The layouts differ: inet6_skb_parm.nhoff
at offset 14 overlaps inet_skb_parm.opt.rr, producing a non-zero rr
value. __ip_options_echo() then reads optlen from attacker-controlled
packet data at sptr[rr+1] and copies that many bytes into dopt->__data,
a fixed 40-byte stack buffer (IP_OPTIONS_DATA_FIXED_SIZE).

To fix this we clear skb2->cb[], as suggested by Oskar Kjos.

Also add minimal IPv4 header validation (version == 4, ihl >= 5).

Fixes: c4d3efafcc ("[IPV6] IP6TUNNEL: Add support to IPv4 over IPv6 tunnel.")
Reported-by: Oskar Kjos <oskar.kjos@hotmail.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/20260326155138.2429480-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-03-27 20:24:12 -07:00
Eric Dumazet
86ab3e5567 ipv6: icmp: clear skb2->cb[] in ip6_err_gen_icmpv6_unreach()
Sashiko AI-review observed:

  In ip6_err_gen_icmpv6_unreach(), the skb is an outer IPv4 ICMP error packet
  where its cb contains an IPv4 inet_skb_parm. When skb is cloned into skb2
  and passed to icmp6_send(), it uses IP6CB(skb2).

  IP6CB interprets the IPv4 inet_skb_parm as an inet6_skb_parm. The cipso
  offset in inet_skb_parm.opt directly overlaps with dsthao in inet6_skb_parm
  at offset 18.

  If an attacker sends a forged ICMPv4 error with a CIPSO IP option, dsthao
  would be a non-zero offset. Inside icmp6_send(), mip6_addr_swap() is called
  and uses ipv6_find_tlv(skb, opt->dsthao, IPV6_TLV_HAO).

  This would scan the inner, attacker-controlled IPv6 packet starting at that
  offset, potentially returning a fake TLV without checking if the remaining
  packet length can hold the full 18-byte struct ipv6_destopt_hao.

  Could mip6_addr_swap() then perform a 16-byte swap that extends past the end
  of the packet data into skb_shared_info?

  Should the cb array also be cleared in ip6_err_gen_icmpv6_unreach() and
  ip6ip6_err() to prevent this?

This patch implements the first suggestion.

I am not sure if ip6ip6_err() needs to be changed.
A separate patch would be better anyway.

Fixes: ca15a078bd ("sit: generate icmpv6 error when receiving icmpv4 error")
Reported-by: Ido Schimmel <idosch@nvidia.com>
Closes: https://sashiko.dev/#/patchset/20260326155138.2429480-1-edumazet%40google.com
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Oskar Kjos <oskar.kjos@hotmail.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/20260326202608.2976021-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-03-27 20:22:46 -07:00
David Carlier
5597dd284f net: ti: icssg-prueth: fix missing data copy and wrong recycle in ZC RX dispatch
emac_dispatch_skb_zc() allocates a new skb via napi_alloc_skb() but
never copies the packet data from the XDP buffer into it. The skb is
passed up the stack containing uninitialized heap memory instead of
the actual received packet, leaking kernel heap contents to userspace.

Copy the received packet data from the XDP buffer into the skb using
skb_copy_to_linear_data().

Additionally, remove the skb_mark_for_recycle() call since the skb is
backed by the NAPI page frag allocator, not page_pool. Marking a
non-page_pool skb for recycle causes the free path to return pages to
a page_pool that does not own them, corrupting page_pool state.

The non-ZC path (emac_rx_packet) does not have these issues because it
uses napi_build_skb() to wrap the existing page_pool page directly,
requiring no copy, and correctly marks for recycle since the page comes
from page_pool_dev_alloc_pages().

Fixes: 7a64bb388d ("net: ti: icssg-prueth: Add AF_XDP zero copy for RX")
Signed-off-by: David Carlier <devnexen@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2026-03-27 12:08:26 +00:00
Thomas Bogendoerfer
bb417456c7 tg3: Fix race for querying speed/duplex
When driver signals carrier up via netif_carrier_on() its internal
link_up state isn't updated immediately. This leads to inconsistent
speed/duplex in /proc/net/bonding/bondX where the speed and duplex
is shown as unknown while ethtool shows correct values. Fix this by
using netif_carrier_ok() for link checking in get_ksettings function.

Fixes: 84421b99ce ("tg3: Update link_up flag for phylib devices")
Signed-off-by: Thomas Bogendoerfer <tbogendoerfer@suse.de>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2026-03-27 12:06:38 +00:00
Pengpeng Hou
5e67ba9bb5 net/ipv6: ioam6: prevent schema length wraparound in trace fill
ioam6_fill_trace_data() stores the schema contribution to the trace
length in a u8. With bit 22 enabled and the largest schema payload,
sclen becomes 1 + 1020 / 4, wraps from 256 to 0, and bypasses the
remaining-space check. __ioam6_fill_trace_data() then positions the
write cursor without reserving the schema area but still copies the
4-byte schema header and the full schema payload, overrunning the trace
buffer.

Keep sclen in an unsigned int so the remaining-space check and the write
cursor calculation both see the full schema length.

Fixes: 8c6f6fa677 ("ipv6: ioam: IOAM Generic Netlink API")
Signed-off-by: Pengpeng Hou <pengpeng@iscas.ac.cn>
Reviewed-by: Justin Iurman <justin.iurman@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2026-03-27 12:05:36 +00:00
Yochai Eisenrich
ae05340cca net: ipv6: ndisc: fix ndisc_ra_useropt to initialize nduseropt_padX fields to zero to prevent an info-leak
When processing Router Advertisements with user options the kernel
builds an RTM_NEWNDUSEROPT netlink message. The nduseroptmsg struct
has three padding fields that are never zeroed and can leak kernel data

The fix is simple, just zeroes the padding fields.

Fixes: 31910575a9 ("[IPv6]: Export userland ND options through netlink (RDNSS support)")
Signed-off-by: Yochai Eisenrich <echelonh@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20260324224925.2437775-1-echelonh@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-03-26 20:38:35 -07:00
Jiayuan Chen
2428083101 net: qrtr: replace qrtr_tx_flow radix_tree with xarray to fix memory leak
__radix_tree_create() allocates and links intermediate nodes into the
tree one by one. If a subsequent allocation fails, the already-linked
nodes remain in the tree with no corresponding leaf entry. These orphaned
internal nodes are never reclaimed because radix_tree_for_each_slot()
only visits slots containing leaf values.

The radix_tree API is deprecated in favor of xarray. As suggested by
Matthew Wilcox, migrate qrtr_tx_flow from radix_tree to xarray instead
of fixing the radix_tree itself [1]. xarray properly handles cleanup of
internal nodes — xa_destroy() frees all internal xarray nodes when the
qrtr_node is released, preventing the leak.

[1] https://lore.kernel.org/all/20260225071623.41275-1-jiayuan.chen@linux.dev/T/
Reported-by: syzbot+006987d1be3586e13555@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/000000000000bfba3a060bf4ffcf@google.com/T/
Fixes: 5fdeb0d372 ("net: qrtr: Implement outgoing flow control")
Signed-off-by: Jiayuan Chen <jiayuan.chen@shopee.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20260324080645.290197-1-jiayuan.chen@linux.dev
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-03-26 20:22:38 -07:00
Jakub Kicinski
04f272188f Merge branch 'net-enetc-safely-reinitialize-tx-bd-ring-when-it-has-unsent-frames'
Wei Fang says:

====================
net: enetc: safely reinitialize TX BD ring when it has unsent frames

Currently the driver does not reset the producer index register (PIR) and
consumer index register (CIR) when initializing a TX BD ring. The driver
only reads the PIR and CIR and initializes the software indexes. If the
TX BD ring is reinitialized when it still contains unsent frames, its PIR
and CIR will not be equal after the reinitialization. However, the BDs
between CIR and PIR have been freed and become invalid and this can lead
to a hardware malfunction, causing the TX BD ring will not work properly.

Since the PIR and CIR are sofeware-configurable on ENETC v4. Therefore,
the driver must reset them if they are not equal when reinitializing
the TX BD ring.

However, resetting the PIR and CIR alone is insufficient, it cannot
completely solve the problem. When a link-down event occurs while the TX
BD ring is transmitting frames, subsequent reinitialization of the TX BD
ring may cause it to malfunction. Because enetc4_pl_mac_link_down() only
clears PMa_COMMAND_CONFIG[TX_EN] to disable MAC transmit data path. It
doesn't set PORT[TXDIS] to 1 to flush the TX BD ring. Therefore, it is
not safe to reinitialize the TX BD ring at this point.

To safely reinitialize the TX BD ring after a link-down event, we checked
with the NETC IP team, a proper Ethernet MAC graceful stop is necessary.
Therefore, add the Ethernet MAC graceful stop to the link-down event
handler enetc4_pl_mac_link_down(). Note that this patch set is not
applicable to ENETC v1 (LS1028A).
====================

Link: https://patch.msgid.link/20260324062121.2745033-1-wei.fang@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-03-26 20:19:09 -07:00
Wei Fang
f2df9567b1 net: enetc: do not access non-existent registers on pseudo MAC
The ENETC4_PM_IEVENT and ENETC4_PM_CMD_CFG registers do not exist on the
ENETC pseudo MAC, so the driver should prevent from accessing them.

Fixes: 5175c1e4ad ("net: enetc: add basic support for the ENETC with pseudo MAC for i.MX94")
Signed-off-by: Wei Fang <wei.fang@nxp.com>
Tested-by: Claudiu Manoil <claudiu.manoil@nxp.com>
Reviewed-by: Claudiu Manoil <claudiu.manoil@nxp.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20260324062121.2745033-4-wei.fang@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-03-26 20:19:06 -07:00
Wei Fang
2725d84efe net: enetc: add graceful stop to safely reinitialize the TX Ring
For ENETC v4, the PIR and CIR will be reset if they are not equal when
reinitializing the TX BD ring. However, resetting the PIR and CIR alone
is insufficient. When a link-down event occurs while the TX BD ring is
transmitting frames, subsequent reinitialization of the TX BD ring may
cause it to malfunction. For example, the below steps can reproduce the
problem.

1. Unplug the cable when the TX BD ring is busy transmitting frames.
2. Disable the network interface (ifconfig eth0 down).
3. Re-enable the network interface (ifconfig eth0 up).
4. Plug in the cable, the TX BD ring may fail to transmit packets.

When the link-down event occurs, enetc4_pl_mac_link_down() only clears
PMa_COMMAND_CONFIG[TX_EN] to disable MAC transmit data path. It doesn't
set PORT[TXDIS] to 1 to flush the TX BD ring. Therefore, reinitializing
the TX BD ring at this point is unsafe. To safely reinitialize the TX BD
ring after a link-down event, we checked with the NETC IP team, a proper
Ethernet MAC graceful stop is necessary. Therefore, add the Ethernet MAC
graceful stop to the link-down event handler enetc4_pl_mac_link_down().

Fixes: 99100d0d99 ("net: enetc: add preliminary support for i.MX95 ENETC PF")
Signed-off-by: Wei Fang <wei.fang@nxp.com>
Reviewed-by: Claudiu Manoil <claudiu.manoil@nxp.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20260324062121.2745033-3-wei.fang@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-03-26 20:19:06 -07:00
Wei Fang
0239fd701d net: enetc: reset PIR and CIR if they are not equal when initializing TX ring
Currently the driver does not reset the producer index register (PIR) and
consumer index register (CIR) when initializing a TX BD ring. The driver
only reads the PIR and CIR and initializes the software indexes. If the
TX BD ring is reinitialized when it still contains unsent frames, its PIR
and CIR will not be equal after the reinitialization. However, the BDs
between CIR and PIR have been freed and become invalid and this can lead
to a hardware malfunction, causing the TX BD ring will not work properly.

For ENETC v4, it supports software to set the PIR and CIR, so the driver
can reset these two registers if they are not equal when reinitializing
the TX BD ring. Therefore, add this solution for ENETC v4. Note that this
patch does not work for ENETC v1 because it does not support software to
set the PIR and CIR.

Fixes: 99100d0d99 ("net: enetc: add preliminary support for i.MX95 ENETC PF")
Signed-off-by: Wei Fang <wei.fang@nxp.com>
Reviewed-by: Claudiu Manoil <claudiu.manoil@nxp.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20260324062121.2745033-2-wei.fang@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-03-26 20:19:05 -07:00
Buday Csaba
e8e44c98f7 net: fec: fix the PTP periodic output sysfs interface
When the PPS channel configuration was implemented, the channel
index for the periodic outputs was configured as the hardware
channel number.

The sysfs interface uses a logical channel index, and rejects numbers
greater than `n_per_out` (see period_store() in ptp_sysfs.c).
That property was left at 1, since the driver implements channel
selection, not simultaneous operation of multiple PTP hardware timer
channels.

A second check in fec_ptp_enable() returns -EOPNOTSUPP when the two
channel numbers disagree, making channels 1..3 unusable from sysfs.

Fix by removing this redundant check in the FEC PTP driver.

Fixes: 566c2d8388 ("net: fec: make PPS channel configurable")
Signed-off-by: Buday Csaba <buday.csaba@prolan.hu>
Link: https://patch.msgid.link/8ec2afe88423c2231f9cf8044d212ce57846670e.1774359059.git.buday.csaba@prolan.hu
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-03-26 20:10:15 -07:00
Qingfang Deng
57a04a13aa netdevsim: fix build if SKB_EXTENSIONS=n
__skb_ext_put() is not declared if SKB_EXTENSIONS is not enabled, which
causes a build error:

drivers/net/netdevsim/netdev.c: In function 'nsim_forward_skb':
drivers/net/netdevsim/netdev.c:114:25: error: implicit declaration of function '__skb_ext_put'; did you mean 'skb_ext_put'? [-Werror=implicit-function-declaration]
  114 |                         __skb_ext_put(psp_ext);
      |                         ^~~~~~~~~~~~~
      |                         skb_ext_put
cc1: some warnings being treated as errors

Add a stub to fix the build.

Fixes: 7d9351435e ("netdevsim: drop PSP ext ref on forward failure")
Signed-off-by: Qingfang Deng <dqfext@gmail.com>
Link: https://patch.msgid.link/20260324140857.783-1-dqfext@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-03-26 20:08:58 -07:00
Sven Eckelmann (Plasma Cloud)
976ff48c2a net: ethernet: mtk_ppe: avoid NULL deref when gmac0 is disabled
If the gmac0 is disabled, the precheck for a valid ingress device will
cause a NULL pointer deref and crash the system. This happens because
eth->netdev[0] will be NULL but the code will directly try to access
netdev_ops.

Instead of just checking for the first net_device, it must be checked if
any of the mtk_eth net_devices is matching the netdev_ops of the ingress
device.

Cc: stable@vger.kernel.org
Fixes: 73cfd947db ("net: ethernet: mtk_eth_soc: ppe: prevent ppe update for non-mtk devices")
Signed-off-by: Sven Eckelmann (Plasma Cloud) <se@simonwunderlich.de>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20260324-wed-crash-gmac0-disabled-v1-1-3bc388aee565@simonwunderlich.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-03-26 19:01:25 -07:00
Dipayaan Roy
f73896b419 net: mana: Fix RX skb truesize accounting
MANA passes rxq->alloc_size to napi_build_skb() for all RX buffers.
It is correct for fragment-backed RX buffers, where alloc_size matches
the actual backing allocation used for each packet buffer. However, in
the non-fragment RX path mana allocates a full page, or a higher-order
page, per RX buffer. In that case alloc_size only reflects the usable
packet area and not the actual backing memory.

This causes napi_build_skb() to underestimate the skb backing allocation
in the single-buffer RX path, so skb->truesize is derived from a value
smaller than the real RX buffer allocation.

Fix this by updating alloc_size in the non-fragment RX path to the
actual backing allocation size before it is passed to napi_build_skb().

Fixes: 730ff06d3f ("net: mana: Use page pool fragments for RX buffers instead of full pages to improve memory efficiency.")
Signed-off-by: Dipayaan Roy <dipayanroy@linux.microsoft.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Link: https://patch.msgid.link/acLUhLpLum6qrD/N@linuxonhyperv3.guj3yctzbm1etfxqx2vob5hsef.xx.internal.cloudapp.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-03-26 18:57:07 -07:00
Sabrina Dubroca
629ec78ef8 mpls: add seqcount to protect the platform_label{,s} pair
The RCU-protected codepaths (mpls_forward, mpls_dump_routes) can have
an inconsistent view of platform_labels vs platform_label in case of a
concurrent resize (resize_platform_label_table, under
platform_mutex). This can lead to OOB accesses.

This patch adds a seqcount, so that we get a consistent snapshot.

Note that mpls_label_ok is also susceptible to this, so the check
against RTA_DST in rtm_to_route_config, done outside platform_mutex,
is not sufficient. This value gets passed to mpls_label_ok once more
in both mpls_route_add and mpls_route_del, so there is no issue, but
that additional check must not be removed.

Reported-by: Yuan Tan <tanyuan98@outlook.com>
Reported-by: Yifan Wu <yifanwucs@gmail.com>
Reported-by: Juefei Pu <tomapufckgml@gmail.com>
Reported-by: Xin Liu <bird@lzu.edu.cn>
Fixes: 7720c01f3f ("mpls: Add a sysctl to control the size of the mpls label table")
Fixes: dde1b38e87 ("mpls: Convert mpls_dump_routes() to RCU.")
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Link: https://patch.msgid.link/cd8fca15e3eb7e212b094064cd83652e20fd9d31.1774284088.git.sd@queasysnail.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-03-26 18:32:14 -07:00
Jakub Kicinski
45dbf8fcea Merge tag 'wireless-2026-03-26' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless
Johannes Berg says:

====================
Couple more fixes:

 - virt_wifi: remove SET_NETDEV_DEV to avoid UAF on teardown
 - iwlwifi:
   - fix (some) devices that don't have 6 GHz (WiFi6E)
   - fix potential OOB read of firmware notification
   - set WiFi generation for firmware to avoid packet drops
   - fix multi-link scan timing
 - wilc1000: fix integer overflow
 - ath11k/ath12k: fix TID during A-MPDU session teardown
 - wl1251: don't trust firmware TX status response index

* tag 'wireless-2026-03-26' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless:
  wifi: virt_wifi: remove SET_NETDEV_DEV to avoid use-after-free
  wifi: iwlwifi: mvm: fix potential out-of-bounds read in iwl_mvm_nd_match_info_handler()
  wifi: wl1251: validate packet IDs before indexing tx_frames
  wifi: wilc1000: fix u8 overflow in SSID scan buffer size calculation
  wifi: ath12k: Pass the correct value of each TID during a stop AMPDU session
  wifi: ath11k: Pass the correct value of each TID during a stop AMPDU session
  wifi: iwlwifi: mld: correctly set wifi generation data
  wifi: iwlwifi: mvm: don't send a 6E related command when not supported
  wifi: iwlwifi: mld: Fix MLO scan timing
====================

Link: https://patch.msgid.link/20260326093329.77815-3-johannes@sipsolutions.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-03-26 17:51:39 -07:00
Linus Torvalds
453a4a5f97 Merge tag 'net-7.0-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Paolo Abeni:
 "Including fixes from Bluetooth, CAN, IPsec and Netfilter.

  Notably, this includes the fix for the Bluetooth regression that you
  were notified about. I'm not aware of any other pending regressions.

  Current release - regressions:

    - bluetooth:
       - fix stack-out-of-bounds read in l2cap_ecred_conn_req
       - fix regressions caused by reusing ident

    - netfilter: revisit array resize logic

    - eth: ice: set max queues in alloc_etherdev_mqs()

  Previous releases - regressions:

    - core: correctly handle tunneled traffic on IPV6_CSUM GSO fallback

    - bluetooth:
       - fix dangling pointer on mgmt_add_adv_patterns_monitor_complete
       - fix deadlock in l2cap_conn_del()

    - sched: codel: fix stale state for empty flows in fq_codel

    - ipv6: remove permanent routes from tb6_gc_hlist when all exceptions expire.

    - xfrm: fix skb_put() panic on non-linear skb during reassembly

    - openvswitch:
       - avoid releasing netdev before teardown completes
       - validate MPLS set/set_masked payload length

    - eth: iavf: fix out-of-bounds writes in iavf_get_ethtool_stats()

  Previous releases - always broken:

    - bluetooth: fix null-ptr-deref on l2cap_sock_ready_cb

    - udp: fix wildcard bind conflict check when using hash2

    - netfilter: fix use of uninitialized rtp_addr in process_sdp

    - tls: Purge async_hold in tls_decrypt_async_wait()

    - xfrm:
       - prevent policy_hthresh.work from racing with netns teardown
       - fix skb leak with espintcp and async crypto

    - smc: fix double-free of smc_spd_priv when tee() duplicates splice pipe buffer

    - can:
       - add missing error handling to call can_ctrlmode_changelink()
       - fix OOB heap access in cgw_csum_crc8_rel()

    - eth:
       - mana: fix use-after-free in add_adev() error path
       - virtio-net: fix for VIRTIO_NET_F_GUEST_HDRLEN
       - bcmasp: fix double free of WoL irq"

* tag 'net-7.0-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (90 commits)
  net: macb: use the current queue number for stats
  netfilter: ctnetlink: use netlink policy range checks
  netfilter: nf_conntrack_sip: fix use of uninitialized rtp_addr in process_sdp
  netfilter: nf_conntrack_expect: skip expectations in other netns via proc
  netfilter: nf_conntrack_expect: store netns and zone in expectation
  netfilter: ctnetlink: ensure safe access to master conntrack
  netfilter: nf_conntrack_expect: use expect->helper
  netfilter: nf_conntrack_expect: honor expectation helper field
  netfilter: nft_set_rbtree: revisit array resize logic
  netfilter: ip6t_rt: reject oversized addrnr in rt_mt6_check()
  netfilter: nfnetlink_log: fix uninitialized padding leak in NFULA_PAYLOAD
  tls: Purge async_hold in tls_decrypt_async_wait()
  selftests: netfilter: nft_concat_range.sh: add check for flush+reload bug
  netfilter: nft_set_pipapo_avx2: don't return non-matching entry on expiry
  Bluetooth: btusb: clamp SCO altsetting table indices
  Bluetooth: L2CAP: Fix ERTM re-init and zero pdu_len infinite loop
  Bluetooth: L2CAP: Fix deadlock in l2cap_conn_del()
  Bluetooth: btintel: serialize btintel_hw_error() with hci_req_sync_lock
  Bluetooth: L2CAP: Fix send LE flow credits in ACL link
  net: mana: fix use-after-free in add_adev() error path
  ...
2026-03-26 09:53:08 -07:00
Linus Torvalds
75c78a4faa Merge tag 'pinctrl-v7.0-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl
Pull pin control fixes from Linus Walleij:

 - Implement .get_direction() in the spmi-gpio gpio_chip

   Recent changes makes this start to print warnings and it's not nice,
   let's just fix it

 - Clamp the return value of gpio_get() in the Renesas RZA1 driver

 - Add the GPIO_GENERIC dependency to the STM32 HDP driver

 - Modify the Mediatek driver to accept devices that do not use external
   interrupts (EINT) at all

 - Fix flag propagation in the Sunxi driver, so that we can fix an issue
   with uninitialized pins in a follow-up patch using said flags

* tag 'pinctrl-v7.0-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
  pinctrl: sunxi: fix gpiochip_lock_as_irq() failure when pinmux is unknown
  pinctrl: sunxi: pass down flags to pinctrl routines
  pinctrl: mediatek: common: Fix probe failure for devices without EINT
  pinctrl: stm32: fix HDP driver dependency on GPIO_GENERIC
  pinctrl: renesas: rza1: Normalize return value of gpio_get()
  pinctrl: qcom: spmi-gpio: implement .get_direction()
  pinctrl: renesas: rzt2h: Fix invalid wait context
  pinctrl: renesas: rzt2h: Fix device node leak in rzt2h_gpio_register()
2026-03-26 08:35:51 -07:00
Linus Torvalds
dabb83ecf4 Merge tag 'dma-mapping-7.0-2026-03-25' of git://git.kernel.org/pub/scm/linux/kernel/git/mszyprowski/linux
Pull dma-mapping fixes from Marek Szyprowski:
 "A set of fixes for DMA-mapping subsystem, which resolve false-
  positive warnings from KMSAN and DMA-API debug (Shigeru Yoshida
  and Leon Romanovsky) as well as a simple build fix (Miguel Ojeda)"

* tag 'dma-mapping-7.0-2026-03-25' of git://git.kernel.org/pub/scm/linux/kernel/git/mszyprowski/linux:
  dma-mapping: add missing `inline` for `dma_free_attrs`
  mm/hmm: Indicate that HMM requires DMA coherency
  RDMA/umem: Tell DMA mapping that UMEM requires coherency
  iommu/dma: add support for DMA_ATTR_REQUIRE_COHERENT attribute
  dma-direct: prevent SWIOTLB path when DMA_ATTR_REQUIRE_COHERENT is set
  dma-mapping: Introduce DMA require coherency attribute
  dma-mapping: Clarify valid conditions for CPU cache line overlap
  dma-mapping: handle DMA_ATTR_CPU_CACHE_CLEAN in trace output
  dma-debug: Allow multiple invocations of overlapping entries
  dma: swiotlb: add KMSAN annotations to swiotlb_bounce()
2026-03-26 08:22:07 -07:00
Paolo Abeni
db472c34a7 Merge tag 'nf-26-03-26' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf
Pablo Neira Ayuso says:

====================
Netfilter for net

This is v3, I kept back an ipset fix and another to tigthen the xtables
interface to reject invalid combinations with the NFPROTO_ARP family.
They need a bit more discussion. I fixed the issues reported by AI on
patch 9 (add #ifdef to access ct zone, update nf_conntrack_broadcast
and patch 10 (use better Fixes: tag). Thanks!

The following patchset contains Netfilter fixes for *net*.

Note that most bugs fixed here stem from 2.6 days, the large PR is not
due to an increase in regressions.

1) Fix incorrect reject of set updates with nf_tables pipapo set
   avx2 backend.  This comes with a regression test in patch 2.
   From Florian Westphal.

2) nfnetlink_log needs to zero padding to prevent infoleak to userspace,
   from Weiming Shi.

3) xtables ip6t_rt module never validated that addrnr length is within the
   allowed array boundary. Reject bogus values.  From Ren Wei.

4) Fix high memory usage in rbtree set backend that was unwanted side-effect
   of the recently added binary search blob. From Pablo Neira Ayuso.

5) Patches 5 to 10, also from Pablo, address long-standing RCU safety bugs
   in conntracks handling of expectations: We can never safely defer
   a conntrack extension area without holding a reference. Yet expectation
   handling does so in multiple places.  Fix this by avoiding the need to
   look into the master conntrack to begin with and by extending locked
   sections in a few places.

11) Fix use of uninitialized rtp_addr in the sip conntrack helper,
    also from Weiming Shi.

12) Add stricter netlink policy checks in ctnetlink, from David Carlier.
    This avoids undefined behaviour when userspace provides huge wscale
    value.

netfilter pull request 26-03-26

* tag 'nf-26-03-26' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
  netfilter: ctnetlink: use netlink policy range checks
  netfilter: nf_conntrack_sip: fix use of uninitialized rtp_addr in process_sdp
  netfilter: nf_conntrack_expect: skip expectations in other netns via proc
  netfilter: nf_conntrack_expect: store netns and zone in expectation
  netfilter: ctnetlink: ensure safe access to master conntrack
  netfilter: nf_conntrack_expect: use expect->helper
  netfilter: nf_conntrack_expect: honor expectation helper field
  netfilter: nft_set_rbtree: revisit array resize logic
  netfilter: ip6t_rt: reject oversized addrnr in rt_mt6_check()
  netfilter: nfnetlink_log: fix uninitialized padding leak in NFULA_PAYLOAD
  selftests: netfilter: nft_concat_range.sh: add check for flush+reload bug
  netfilter: nft_set_pipapo_avx2: don't return non-matching entry on expiry
====================

Link: https://patch.msgid.link/20260326125153.685915-1-pablo@netfilter.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-03-26 15:38:14 +01:00
Paolo Abeni
deec4f7b41 Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue
Tony Nguyen says:

====================
For ice:
Michal corrects call to alloc_etherdev_mqs() to provide maximum number
of queues supported rather than currently allocated number of queues.

Petr Oros fixes issues related to some ethtool operations in switchdev
mode.

For iavf:
Kohei Enju corrects number of reported queues for ethtool statistics to
absolute max as using current number could race and cause out-of-bounds
issues.

For idpf:
Josh NULLs cdev_info pointer after freeing to prevent possible subsequent
improper access. He also defers setting of refillqs value until after
allocation to prevent possible NULL pointer dereference.

* '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
  idpf: only assign num refillqs if allocation was successful
  idpf: clear stale cdev_info ptr
  iavf: fix out-of-bounds writes in iavf_get_ethtool_stats()
  ice: use ice_update_eth_stats() for representor stats
  ice: fix inverted ready check for VF representors
  ice: set max queues in alloc_etherdev_mqs()
====================

Link: https://patch.msgid.link/20260323205843.624704-1-anthony.l.nguyen@intel.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-03-26 15:14:52 +01:00
Paolo Valerio
72d96e4e24 net: macb: use the current queue number for stats
There's a potential mismatch between the memory reserved for statistics
and the amount of memory written.

gem_get_sset_count() correctly computes the number of stats based on the
active queues, whereas gem_get_ethtool_stats() indiscriminately copies
data using the maximum number of queues, and in the case the number of
active queues is less than MACB_MAX_QUEUES, this results in a OOB write
as observed in the KASAN splat.

==================================================================
BUG: KASAN: vmalloc-out-of-bounds in gem_get_ethtool_stats+0x54/0x78
  [macb]
Write of size 760 at addr ffff80008080b000 by task ethtool/1027

CPU: [...]
Tainted: [E]=UNSIGNED_MODULE
Hardware name: raspberrypi rpi/rpi, BIOS 2025.10 10/01/2025
Call trace:
 show_stack+0x20/0x38 (C)
 dump_stack_lvl+0x80/0xf8
 print_report+0x384/0x5e0
 kasan_report+0xa0/0xf0
 kasan_check_range+0xe8/0x190
 __asan_memcpy+0x54/0x98
 gem_get_ethtool_stats+0x54/0x78 [macb
   926c13f3af83b0c6fe64badb21ec87d5e93fcf65]
 dev_ethtool+0x1220/0x38c0
 dev_ioctl+0x4ac/0xca8
 sock_do_ioctl+0x170/0x1d8
 sock_ioctl+0x484/0x5d8
 __arm64_sys_ioctl+0x12c/0x1b8
 invoke_syscall+0xd4/0x258
 el0_svc_common.constprop.0+0xb4/0x240
 do_el0_svc+0x48/0x68
 el0_svc+0x40/0xf8
 el0t_64_sync_handler+0xa0/0xe8
 el0t_64_sync+0x1b0/0x1b8

The buggy address belongs to a 1-page vmalloc region starting at
  0xffff80008080b000 allocated at dev_ethtool+0x11f0/0x38c0
The buggy address belongs to the physical page:
page: refcount:1 mapcount:0 mapping:0000000000000000
  index:0xffff00000a333000 pfn:0xa333
flags: 0x7fffc000000000(node=0|zone=0|lastcpupid=0x1ffff)
raw: 007fffc000000000 0000000000000000 dead000000000122 0000000000000000
raw: ffff00000a333000 0000000000000000 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
 ffff80008080b080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 ffff80008080b100: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>ffff80008080b180: 00 00 00 00 00 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8
                                  ^
 ffff80008080b200: f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8
 ffff80008080b280: f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8
==================================================================

Fix it by making sure the copied size only considers the active number of
queues.

Fixes: 512286bbd4 ("net: macb: Added some queue statistics")
Signed-off-by: Paolo Valerio <pvalerio@redhat.com>
Reviewed-by: Nicolai Buchwitz <nb@tipi-net.de>
Link: https://patch.msgid.link/20260323191634.2185840-1-pvalerio@redhat.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-03-26 13:48:21 +01:00
Paolo Abeni
aa637b2cf3 Merge tag 'for-net-2026-03-25' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth
Luiz Augusto von Dentz says:

====================
bluetooth pull request for net:

 - L2CAP: Fix deadlock in l2cap_conn_del()
 - L2CAP: Fix ERTM re-init and zero pdu_len infinite loop
 - L2CAP: Fix send LE flow credits in ACL link
 - btintel: serialize btintel_hw_error() with hci_req_sync_lock
 - btusb: clamp SCO altsetting table indices

* tag 'for-net-2026-03-25' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth:
  Bluetooth: btusb: clamp SCO altsetting table indices
  Bluetooth: L2CAP: Fix ERTM re-init and zero pdu_len infinite loop
  Bluetooth: L2CAP: Fix deadlock in l2cap_conn_del()
  Bluetooth: btintel: serialize btintel_hw_error() with hci_req_sync_lock
  Bluetooth: L2CAP: Fix send LE flow credits in ACL link
====================

Link: https://patch.msgid.link/20260325194358.618892-1-luiz.dentz@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-03-26 13:46:55 +01:00
David Carlier
8f15b5071b netfilter: ctnetlink: use netlink policy range checks
Replace manual range and mask validations with netlink policy
annotations in ctnetlink code paths, so that the netlink core rejects
invalid values early and can generate extack errors.

- CTA_PROTOINFO_TCP_STATE: reject values > TCP_CONNTRACK_SYN_SENT2 at
  policy level, removing the manual >= TCP_CONNTRACK_MAX check.
- CTA_PROTOINFO_TCP_WSCALE_ORIGINAL/REPLY: reject values > TCP_MAX_WSCALE
  (14). The normal TCP option parsing path already clamps to this value,
  but the ctnetlink path accepted 0-255, causing undefined behavior when
  used as a u32 shift count.
- CTA_FILTER_ORIG_FLAGS/REPLY_FLAGS: use NLA_POLICY_MASK with
  CTA_FILTER_F_ALL, removing the manual mask checks.
- CTA_EXPECT_FLAGS: use NLA_POLICY_MASK with NF_CT_EXPECT_MASK, adding
  a new mask define grouping all valid expect flags.

Extracted from a broader nf-next patch by Florian Westphal, scoped to
ctnetlink for the fixes tree.

Fixes: c8e2078cfe ("[NETFILTER]: ctnetlink: add support for internal tcp connection tracking flags handling")
Signed-off-by: David Carlier <devnexen@gmail.com>
Co-developed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2026-03-26 13:28:17 +01:00
Weiming Shi
6a2b724460 netfilter: nf_conntrack_sip: fix use of uninitialized rtp_addr in process_sdp
process_sdp() declares union nf_inet_addr rtp_addr on the stack and
passes it to the nf_nat_sip sdp_session hook after walking the SDP
media descriptions. However rtp_addr is only initialized inside the
media loop when a recognized media type with a non-zero port is found.

If the SDP body contains no m= lines, only inactive media sections
(m=audio 0 ...) or only unrecognized media types, rtp_addr is never
assigned. Despite that, the function still calls hooks->sdp_session()
with &rtp_addr, causing nf_nat_sdp_session() to format the stale stack
value as an IP address and rewrite the SDP session owner and connection
lines with it.

With CONFIG_INIT_STACK_ALL_ZERO (default on most distributions) this
results in the session-level o= and c= addresses being rewritten to
0.0.0.0 for inactive SDP sessions. Without stack auto-init the
rewritten address is whatever happened to be on the stack.

Fix this by pre-initializing rtp_addr from the session-level connection
address (caddr) when available, and tracking via a have_rtp_addr flag
whether any valid address was established. Skip the sdp_session hook
entirely when no valid address exists.

Fixes: 4ab9e64e5e ("[NETFILTER]: nf_nat_sip: split up SDP mangling")
Reported-by: Xiang Mei <xmei5@asu.edu>
Signed-off-by: Weiming Shi <bestswngs@gmail.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2026-03-26 13:28:17 +01:00
Pablo Neira Ayuso
3db5647984 netfilter: nf_conntrack_expect: skip expectations in other netns via proc
Skip expectations that do not reside in this netns.

Similar to e77e6ff502 ("netfilter: conntrack: do not dump other netns's
conntrack entries via proc").

Fixes: 9b03f38d04 ("netfilter: netns nf_conntrack: per-netns expectations")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2026-03-26 13:28:03 +01:00
Pablo Neira Ayuso
02a3231b6d netfilter: nf_conntrack_expect: store netns and zone in expectation
__nf_ct_expect_find() and nf_ct_expect_find_get() are called under
rcu_read_lock() but they dereference the master conntrack via
exp->master.

Since the expectation does not hold a reference on the master conntrack,
this could be dying conntrack or different recycled conntrack than the
real master due to SLAB_TYPESAFE_RCU.

Store the netns, the master_tuple and the zone in struct
nf_conntrack_expect as a safety measure.

This patch is required by the follow up fix not to dump expectations
that do not belong to this netns.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2026-03-26 13:24:40 +01:00
Pablo Neira Ayuso
bffcaad9af netfilter: ctnetlink: ensure safe access to master conntrack
Holding reference on the expectation is not sufficient, the master
conntrack object can just go away, making exp->master invalid.

To access exp->master safely:

- Grab the nf_conntrack_expect_lock, this gets serialized with
  clean_from_lists() which also holds this lock when the master
  conntrack goes away.

- Hold reference on master conntrack via nf_conntrack_find_get().
  Not so easy since the master tuple to look up for the master conntrack
  is not available in the existing problematic paths.

This patch goes for extending the nf_conntrack_expect_lock section
to address this issue for simplicity, in the cases that are described
below this is just slightly extending the lock section.

The add expectation command already holds a reference to the master
conntrack from ctnetlink_create_expect().

However, the delete expectation command needs to grab the spinlock
before looking up for the expectation. Expand the existing spinlock
section to address this to cover the expectation lookup. Note that,
the nf_ct_expect_iterate_net() calls already grabs the spinlock while
iterating over the expectation table, which is correct.

The get expectation command needs to grab the spinlock to ensure master
conntrack does not go away. This also expands the existing spinlock
section to cover the expectation lookup too. I needed to move the
netlink skb allocation out of the spinlock to keep it GFP_KERNEL.

For the expectation events, the IPEXP_DESTROY event is already delivered
under the spinlock, just move the delivery of IPEXP_NEW under the
spinlock too because the master conntrack event cache is reached through
exp->master.

While at it, add lockdep notations to help identify what codepaths need
to grab the spinlock.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2026-03-26 13:18:32 +01:00
Pablo Neira Ayuso
f017941060 netfilter: nf_conntrack_expect: use expect->helper
Use expect->helper in ctnetlink and /proc to dump the helper name.
Using nfct_help() without holding a reference to the master conntrack
is unsafe.

Use exp->master->helper in ctnetlink path if userspace does not provide
an explicit helper when creating an expectation to retain the existing
behaviour. The ctnetlink expectation path holds the reference on the
master conntrack and nf_conntrack_expect lock and the nfnetlink glue
path refers to the master ct that is attached to the skb.

Reported-by: Hyunwoo Kim <imv4bel@gmail.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2026-03-26 13:18:31 +01:00
Pablo Neira Ayuso
9c42bc9db9 netfilter: nf_conntrack_expect: honor expectation helper field
The expectation helper field is mostly unused. As a result, the
netfilter codebase relies on accessing the helper through exp->master.

Always set on the expectation helper field so it can be used to reach
the helper.

nf_ct_expect_init() is called from packet path where the skb owns
the ct object, therefore accessing exp->master for the newly created
expectation is safe. This saves a lot of updates in all callsites
to pass the ct object as parameter to nf_ct_expect_init().

This is a preparation patches for follow up fixes.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2026-03-26 13:18:31 +01:00
Pablo Neira Ayuso
fafdd92b9e netfilter: nft_set_rbtree: revisit array resize logic
Chris Arges reports high memory consumption with thousands of
containers, this patch revisits the array allocation logic.

For anonymous sets, start by 16 slots (which takes 256 bytes on x86_64).
Expand it by x2 until threshold of 512 slots is reached, over that
threshold, expand it by x1.5.

For non-anonymous set, start by 1024 slots in the array (which takes 16
Kbytes initially on x86_64). Expand it by x1.5.

Use set->ndeact to subtract deactivated elements when calculating the
number of the slots in the array, otherwise the array size array gets
increased artifically. Add special case shrink logic to deal with flush
set too.

The shrink logic is skipped by anonymous sets.

Use check_add_overflow() to calculate the new array size.

Add a WARN_ON_ONCE check to make sure elements fit into the new array
size.

Reported-by: Chris Arges <carges@cloudflare.com>
Fixes: 7e43e0a114 ("netfilter: nft_set_rbtree: translate rbtree to array for binary search")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2026-03-26 13:18:31 +01:00
9d3f027327 netfilter: ip6t_rt: reject oversized addrnr in rt_mt6_check()
Reject rt match rules whose addrnr exceeds IP6T_RT_HOPS.

rt_mt6() expects addrnr to stay within the bounds of rtinfo->addrs[].
Validate addrnr during rule installation so malformed rules are rejected
before the match logic can use an out-of-range value.

Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Reported-by: Yifan Wu <yifanwucs@gmail.com>
Reported-by: Juefei Pu <tomapufckgml@gmail.com>
Co-developed-by: Yuan Tan <yuantan098@gmail.com>
Signed-off-by: Yuan Tan <yuantan098@gmail.com>
Suggested-by: Xin Liu <bird@lzu.edu.cn>
Tested-by: Yuhang Zheng <z1652074432@gmail.com>
Signed-off-by: Ren Wei <n05ec@lzu.edu.cn>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2026-03-26 13:18:31 +01:00
Weiming Shi
52025ebaa2 netfilter: nfnetlink_log: fix uninitialized padding leak in NFULA_PAYLOAD
__build_packet_message() manually constructs the NFULA_PAYLOAD netlink
attribute using skb_put() and skb_copy_bits(), bypassing the standard
nla_reserve()/nla_put() helpers. While nla_total_size(data_len) bytes
are allocated (including NLA alignment padding), only data_len bytes
of actual packet data are copied. The trailing nla_padlen(data_len)
bytes (1-3 when data_len is not 4-byte aligned) are never initialized,
leaking stale heap contents to userspace via the NFLOG netlink socket.

Replace the manual attribute construction with nla_reserve(), which
handles the tailroom check, header setup, and padding zeroing via
__nla_reserve(). The subsequent skb_copy_bits() fills in the payload
data on top of the properly initialized attribute.

Fixes: df6fb868d6 ("[NETFILTER]: nfnetlink: convert to generic netlink attribute functions")
Reported-by: Xiang Mei <xmei5@asu.edu>
Signed-off-by: Weiming Shi <bestswngs@gmail.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2026-03-26 13:15:46 +01:00
Chuck Lever
84a8335d83 tls: Purge async_hold in tls_decrypt_async_wait()
The async_hold queue pins encrypted input skbs while
the AEAD engine references their scatterlist data. Once
tls_decrypt_async_wait() returns, every AEAD operation
has completed and the engine no longer references those
skbs, so they can be freed unconditionally.

A subsequent patch adds batch async decryption to
tls_sw_read_sock(), introducing a new call site that
must drain pending AEAD operations and release held
skbs. Move __skb_queue_purge(&ctx->async_hold) into
tls_decrypt_async_wait() so the purge is centralized
and every caller -- recvmsg's drain path, the -EBUSY
fallback in tls_do_decryption(), and the new read_sock
batch path -- releases held skbs on synchronization
without each site managing the purge independently.

This fixes a leak when tls_strp_msg_hold() fails part-way through,
after having added some cloned skbs to the async_hold
queue. tls_decrypt_sg() will then call tls_decrypt_async_wait() to
process all pending decrypts, and drop back to synchronous mode, but
tls_sw_recvmsg() only flushes the async_hold queue when one record has
been processed in "fully-async" mode, which may not be the case here.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reported-by: Yiming Qian <yimingqian591@gmail.com>
Fixes: b8a6ff84ab ("tls: wait for pending async decryptions if tls_strp_msg_hold fails")
Link: https://patch.msgid.link/20260324-tls-read-sock-v5-1-5408befe5774@oracle.com
[pabeni@redhat.com: added leak comment]
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-03-26 09:55:53 +01:00
Linus Torvalds
0138af2472 Merge tag 'erofs-for-7.0-rc6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs
Pull erofs fixes from Gao Xiang:

 - Mark I/Os as failed when encountering short reads on file-backed
   mounts

 - Label GFP_NOIO in the BIO completion when the completion is in the
   process context, and directly call into the decompression to avoid
   deadlocks

 - Improve Kconfig descriptions to better highlight the overall efforts

 - Fix .fadvise() for page cache sharing

* tag 'erofs-for-7.0-rc6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs:
  erofs: fix .fadvise() for page cache sharing
  erofs: update the Kconfig description
  erofs: add GFP_NOIO in the bio completion if needed
  erofs: set fileio bio failed in short read case
2026-03-25 18:41:35 -07:00
Linus Torvalds
aba9da0905 Merge tag 'rcu-fixes.v7.0-20260325a' of git://git.kernel.org/pub/scm/linux/kernel/git/rcu/linux
Pull RCU fixes from Boqun Feng:
 "Fix a regression introduced by commit c27cea4416 ("rcu: Re-implement
  RCU Tasks Trace in terms of SRCU-fast"): BPF contexts can run with
  preemption disabled or scheduler locks held, so call_srcu() must work
  in all such contexts.

  Fix this by converting SRCU's spinlocks to raw spinlocks and avoiding
  scheduler lock acquisition in call_srcu() by deferring to an irq_work
  (similar to call_rcu_tasks_generic()), for both tree SRCU and tiny
  SRCU.

  Also fix a follow-on lockdep splat caused by srcu_node allocation
  under the newly introduced raw spinlock by deferring the allocation to
  grace-period worker context"

* tag 'rcu-fixes.v7.0-20260325a' of git://git.kernel.org/pub/scm/linux/kernel/git/rcu/linux:
  srcu: Use irq_work to start GP in tiny SRCU
  rcu: Use an intermediate irq_work to start process_srcu()
  srcu: Push srcu_node allocation to GP when non-preemptible
  srcu: Use raw spinlocks so call_srcu() can be used under preempt_disable()
2026-03-25 18:14:19 -07:00
Linus Torvalds
d2a43e7f89 Merge tag 'hardening-v7.0-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull hardening fixes from Kees Cook:

 - fix required Clang version for CC_HAS_COUNTED_BY_PTR (Nathan
   Chancellor)

 - update Coccinelle script used for kmalloc_obj

* tag 'hardening-v7.0-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
  init/Kconfig: Require a release version of clang-22 for CC_HAS_COUNTED_BY_PTR
  coccinelle: kmalloc_obj: Remove default GFP_KERNEL arg
2026-03-25 14:47:18 -07:00
Linus Torvalds
51088b9d5f Merge tag 'platform-drivers-x86-v7.0-3' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86
Pull x86 platform driver fixes from Ilpo Järvinen:
 "Fixes and New HW Support. The trivial drop of unused gz_chain_head is
  not exactly fixes material but it allows other work to avoid problems
  so I decided to take it in along with the fixes.

   - amd/hsmp: Fix typo in error message

   - asus-armoury: Add support for G614FP, GA503QM, GZ302EAC, and GZ302EAC

   - asus-nb-wmi: Add DMI quirk for ASUS ROG Flow Z13-KJP GZ302EAC

   - hp-wmi: Support for Omen 16-k0xxx, 16-wf1xxx, 16-xf0xxx

   - intel-hid: Disable wakeup_mode during hibernation

   - ISST:
      - Check HWP support before MSR access
      - Correct locked bit width

   - lenovo: wmi-gamezone: Drop unused gz_chain_head

   - olpc-xo175-ec: Fix overflow error message"

* tag 'platform-drivers-x86-v7.0-3' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86:
  platform/x86: ISST: Correct locked bit width
  platform/x86: intel-hid: disable wakeup_mode during hibernation
  platform/x86: asus-armoury: add support for GZ302EA and GZ302EAC
  platform/x86: asus-nb-wmi: add DMI quirk for ASUS ROG Flow Z13-KJP GZ302EAC
  platform/x86/amd/hsmp: Fix typo in error message
  platform/olpc: olpc-xo175-ec: Fix overflow error message to print inlen
  platform/x86: lenovo: wmi-gamezone: Drop gz_chain_head
  platform/x86: ISST: Check HWP support before MSR access
  platform/x86: hp-wmi: Add support for Omen 16-k0xxx (8A4D)
  platform/x86: hp-wmi: Add support for Omen 16-wf1xxx (8C76)
  platform/x86: hp-wmi: Add Omen 16-xf0xxx (8BCA) support
  platform/x86: asus-armoury: add support for G614FP
  platform/x86: asus-armoury: add support for GA503QM
  MAINTAINERS: change email address of Denis Benato
2026-03-25 14:43:06 -07:00
Florian Westphal
6caefcd949 selftests: netfilter: nft_concat_range.sh: add check for flush+reload bug
This test will fail without
the preceding commit ("netfilter: nft_set_pipapo_avx2: fix match retart if found element is expired"):

  reject overlapping range on add       0s                              [ OK ]
  reload with flush                 /dev/stdin:59:32-52: Error: Could not process rule: File exists
add element inet filter test { 10.0.0.29 . 10.0.2.29 }

Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2026-03-25 21:40:47 +01:00