linux-net

mirror of https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net.git/ synced 2026-04-03 23:37:40 -04:00

Author	SHA1	Message	Date
Lorenzo Bianconi	cedc1bf327	net: airoha: Delay offloading until all net_devices are fully registered Netfilter flowtable can theoretically try to offload flower rules as soon as a net_device is registered while all the other ones are not registered or initialized, triggering a possible NULL pointer dereferencing of qdma pointer in airoha_ppe_set_cpu_port routine. Moreover, if register_netdev() fails for a particular net_device, there is a small race if Netfilter tries to offload flowtable rules before all the net_devices are properly unregistered in airoha_probe() error patch, triggering a NULL pointer dereferencing in airoha_ppe_set_cpu_port routine. In order to avoid any possible race, delay offloading until all net_devices are registered in the networking subsystem. Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Link: https://patch.msgid.link/20260329-airoha-regiser-race-fix-v2-1-f4ebb139277b@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-03-30 17:58:40 -07:00
Yochai Eisenrich	e6e3eb5ee8	net: sched: cls_api: fix tc_chain_fill_node to initialize tcm_info to zero to prevent an info-leak When building netlink messages, tc_chain_fill_node() never initializes the tcm_info field of struct tcmsg. Since the allocation is not zeroed, kernel heap memory is leaked to userspace through this 4-byte field. The fix simply zeroes tcm_info alongside the other fields that are already initialized. Fixes: `32a4f5ecd7` ("net: sched: introduce chain object to uapi") Signed-off-by: Yochai Eisenrich <echelonh@gmail.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Link: https://patch.msgid.link/20260328211436.1010152-1-echelonh@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-03-30 17:56:40 -07:00
Guoyu Su	ddc748a391	net: use skb_header_pointer() for TCPv4 GSO frag_off check Syzbot reported a KMSAN uninit-value warning in gso_features_check() called from netif_skb_features() [1]. gso_features_check() reads iph->frag_off to decide whether to clear mangleid_features. Accessing the IPv4 header via ip_hdr()/inner_ip_hdr() can rely on skb header offsets that are not always safe for direct dereference on packets injected from PF_PACKET paths. Use skb_header_pointer() for the TCPv4 frag_off check so the header read is robust whether data is already linear or needs copying. [1] https://syzkaller.appspot.com/bug?extid=1543a7d954d9c6d00407 Link: https://lore.kernel.org/netdev/willemdebruijn.kernel.1a9f35039caab@gmail.com/ Fixes: `cbc53e08a7` ("GSO: Add GSO type for fixed IPv4 ID") Reported-by: syzbot+1543a7d954d9c6d00407@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=1543a7d954d9c6d00407 Tested-by: syzbot+1543a7d954d9c6d00407@syzkaller.appspotmail.com Signed-off-by: Guoyu Su <yss2813483011xxl@gmail.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20260327153507.39742-1-yss2813483011xxl@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-03-30 17:35:21 -07:00
Lorenzo Bianconi	514aac3599	net: airoha: Add missing cleanup bits in airoha_qdma_cleanup_rx_queue() In order to properly cleanup hw rx QDMA queues and bring the device to the initial state, reset rx DMA queue head/tail index. Moreover, reset queued DMA descriptor fields. Fixes: `23020f0493` ("net: airoha: Introduce ethernet support for EN7581 SoC") Tested-by: Madhur Agrawal <Madhur.Agrawal@airoha.com> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Link: https://patch.msgid.link/20260327-airoha_qdma_cleanup_rx_queue-fix-v1-1-369d6ab1511a@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-03-30 17:29:16 -07:00
Paolo Abeni	fd63f18597	ipv6: prevent possible UaF in addrconf_permanent_addr() The mentioned helper try to warn the user about an exceptional condition, but the message is delivered too late, accessing the ipv6 after its possible deletion. Reorder the statement to avoid the possible UaF; while at it, place the warning outside the idev->lock as it needs no protection. Reported-by: Jakub Kicinski <kuba@kernel.org> Closes: https://sashiko.dev/#/patchset/8c8bfe2e1a324e501f0e15fef404a77443fd8caf.1774365668.git.pabeni%40redhat.com Fixes: `f1705ec197` ("net: ipv6: Make address flushing on ifdown optional") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Link: https://patch.msgid.link/ef973c3a8cb4f8f1787ed469f3e5391b9fe95aa0.1774601542.git.pabeni@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-03-30 17:25:10 -07:00
Jakub Kicinski	dc9e9d61e3	Merge branch 'net-enetc-add-more-checks-to-enetc_set_rxfh' Wei Fang says: ==================== net: enetc: add more checks to enetc_set_rxfh() ENETC only supports Toeplitz algorithm, and VFs do not support setting the RSS key, but enetc_set_rxfh() does not check these constraints and silently accepts unsupported configurations. This may mislead users or tools into believing that the requested RSS settings have been successfully applied. So add checks to reject unsupported hash functions and RSS key updates on VFs, and return "-EOPNOTSUPP" to user space. ==================== Link: https://patch.msgid.link/20260326075233.3628047-1-wei.fang@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-03-27 20:56:49 -07:00
Wei Fang	a142d13916	net: enetc: do not allow VF to configure the RSS key VFs do not have privilege to configure the RSS key because the registers are owned by the PF. Currently, if VF attempts to configure the RSS key, enetc_set_rxfh() simply skips the configuration and does not generate a warning, which may mislead users into thinking the feature is supported. To improve this situation, add a check to reject RSS key configuration on VFs. Fixes: `d382563f54` ("enetc: Add RFS and RSS support") Signed-off-by: Wei Fang <wei.fang@nxp.com> Reviewed-by: Clark Wang <xiaoning.wang@nxp.com> Reviewed-by: Claudiu Manoil <claudiu.manoil@nxp.com> Link: https://patch.msgid.link/20260326075233.3628047-3-wei.fang@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-03-27 20:56:46 -07:00
Wei Fang	d389954a6c	net: enetc: check whether the RSS algorithm is Toeplitz Both ENETC v1 and v4 only provide Toeplitz RSS support. This patch adds a validation check to reject attempts to configure other RSS algorithms, avoiding misleading configuration options for users. Fixes: `d382563f54` ("enetc: Add RFS and RSS support") Signed-off-by: Wei Fang <wei.fang@nxp.com> Reviewed-by: Clark Wang <xiaoning.wang@nxp.com> Reviewed-by: Claudiu Manoil <claudiu.manoil@nxp.com> Link: https://patch.msgid.link/20260326075233.3628047-2-wei.fang@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-03-27 20:56:45 -07:00
Marek Behún	eeee5a710f	net: sfp: Fix Ubiquiti U-Fiber Instant SFP module on mvneta In commit `8110633db4` ("net: sfp-bus: allow SFP quirks to override Autoneg and pause bits") we moved the setting of Autoneg and pause bits before the call to SFP quirk when parsing SFP module support. Since the quirk for Ubiquiti U-Fiber Instant SFP module zeroes the support bits and sets 1000baseX_Full only, the above mentioned commit changed the overall computed support from 1000baseX_Full, Autoneg, Pause, Asym_Pause to just 1000baseX_Full. This broke the SFP module for mvneta, which requires Autoneg for 1000baseX since commit `c762b7fac1` ("net: mvneta: deny disabling autoneg for 802.3z modes"). Fix this by setting back the Autoneg, Pause and Asym_Pause bits in the quirk. Fixes: `8110633db4` ("net: sfp-bus: allow SFP quirks to override Autoneg and pause bits") Signed-off-by: Marek Behún <kabel@kernel.org> Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://patch.msgid.link/20260326122038.2489589-1-kabel@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-03-27 20:55:52 -07:00
Xiang Mei	5d17af9eb2	selftests/tc-testing: add test for HFSC divide-by-zero in rtsc_min() Add a regression test for the divide-by-zero in rtsc_min() triggered when m2sm() converts a large m1 value (e.g. 32gbit) to a u64 scaled slope reaching 2^32. rtsc_min() stores the difference of two such u64 values (sm1 - sm2) in a u32 variable `dsm`, truncating 2^32 to zero and causing a divide-by-zero oops in the concave-curve intersection path. The test configures an HFSC class with m1=32gbit d=1ms m2=0bit, sends a packet to activate the class, waits for it to drain and go idle, then sends another packet to trigger reactivation through rtsc_min(). Signed-off-by: Xiang Mei <xmei5@asu.edu> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Reviewed-by: Victor Nogueira <victor@mojatatu.com> Link: https://patch.msgid.link/20260326204310.1549327-2-xmei5@asu.edu Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-03-27 20:41:11 -07:00
Xiang Mei	4576100b8c	net/sched: sch_hfsc: fix divide-by-zero in rtsc_min() m2sm() converts a u32 slope to a u64 scaled value. For large inputs (e.g. m1=4000000000), the result can reach 2^32. rtsc_min() stores the difference of two such u64 values in a u32 variable `dsm` and uses it as a divisor. When the difference is exactly 2^32 the truncation yields zero, causing a divide-by-zero oops in the concave-curve intersection path: Oops: divide error: 0000 RIP: 0010:rtsc_min (net/sched/sch_hfsc.c:601) Call Trace: init_ed (net/sched/sch_hfsc.c:629) hfsc_enqueue (net/sched/sch_hfsc.c:1569) [...] Widen `dsm` to u64 and replace do_div() with div64_u64() so the full difference is preserved. Fixes: `1da177e4c3` ("Linux-2.6.12-rc2") Reported-by: Weiming Shi <bestswngs@gmail.com> Signed-off-by: Xiang Mei <xmei5@asu.edu> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Link: https://patch.msgid.link/20260326204310.1549327-1-xmei5@asu.edu Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-03-27 20:41:11 -07:00
Jakub Kicinski	1a6fdb3518	Merge branch 'bridge-vxlan-harden-nd-option-parsing-paths' Yang Yang says: ==================== bridge/vxlan: harden ND option parsing paths This series hardens ND option parsing in bridge and vxlan paths. Patch 1 linearizes the request skb in br_nd_send() before walking ND options. Patch 2 adds explicit ND option length validation in br_nd_send(). Patch 3 adds matching ND option length validation in vxlan_na_create(). ==================== Link: https://patch.msgid.link/20260326034441.2037420-1-n05ec@lzu.edu.cn Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-03-27 20:37:17 -07:00
Yang Yang	afa9a05e6c	vxlan: validate ND option lengths in vxlan_na_create vxlan_na_create() walks ND options according to option-provided lengths. A malformed option can make the parser advance beyond the computed option span or use a too-short source LLADDR option payload. Validate option lengths against the remaining NS option area before advancing, and only read source LLADDR when the option is large enough for an Ethernet address. Fixes: `4b29dba9c0` ("vxlan: fix nonfunctional neigh_reduce()") Cc: stable@vger.kernel.org Reported-by: Yifan Wu <yifanwucs@gmail.com> Reported-by: Juefei Pu <tomapufckgml@gmail.com> Tested-by: Ao Zhou <n05ec@lzu.edu.cn> Co-developed-by: Yuan Tan <tanyuan98@outlook.com> Signed-off-by: Yuan Tan <tanyuan98@outlook.com> Suggested-by: Xin Liu <bird@lzu.edu.cn> Signed-off-by: Yang Yang <n05ec@lzu.edu.cn> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Link: https://patch.msgid.link/20260326034441.2037420-4-n05ec@lzu.edu.cn Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-03-27 20:37:14 -07:00
Yang Yang	850837965a	bridge: br_nd_send: validate ND option lengths br_nd_send() walks ND options according to option-provided lengths. A malformed option can make the parser advance beyond the computed option span or use a too-short source LLADDR option payload. Validate option lengths against the remaining NS option area before advancing, and only read source LLADDR when the option is large enough for an Ethernet address. Fixes: `ed842faeb2` ("bridge: suppress nd pkts on BR_NEIGH_SUPPRESS ports") Cc: stable@vger.kernel.org Reported-by: Yifan Wu <yifanwucs@gmail.com> Reported-by: Juefei Pu <tomapufckgml@gmail.com> Tested-by: Ao Zhou <n05ec@lzu.edu.cn> Co-developed-by: Yuan Tan <tanyuan98@outlook.com> Signed-off-by: Yuan Tan <tanyuan98@outlook.com> Suggested-by: Xin Liu <bird@lzu.edu.cn> Signed-off-by: Yang Yang <n05ec@lzu.edu.cn> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Link: https://patch.msgid.link/20260326034441.2037420-3-n05ec@lzu.edu.cn Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-03-27 20:37:14 -07:00
Yang Yang	a01aee7caf	bridge: br_nd_send: linearize skb before parsing ND options br_nd_send() parses neighbour discovery options from ns->opt[] and assumes that these options are in the linear part of request. Its callers only guarantee that the ICMPv6 header and target address are available, so the option area can still be non-linear. Parsing ns->opt[] in that case can access data past the linear buffer. Linearize request before option parsing and derive ns from the linear network header. Fixes: `ed842faeb2` ("bridge: suppress nd pkts on BR_NEIGH_SUPPRESS ports") Reported-by: Yifan Wu <yifanwucs@gmail.com> Reported-by: Juefei Pu <tomapufckgml@gmail.com> Tested-by: Ao Zhou <n05ec@lzu.edu.cn> Co-developed-by: Yuan Tan <tanyuan98@outlook.com> Signed-off-by: Yuan Tan <tanyuan98@outlook.com> Suggested-by: Xin Liu <bird@lzu.edu.cn> Signed-off-by: Yang Yang <n05ec@lzu.edu.cn> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Link: https://patch.msgid.link/20260326034441.2037420-2-n05ec@lzu.edu.cn Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-03-27 20:37:14 -07:00
Jakub Kicinski	c11c731a68	Merge branch 'fix-page-fragment-handling-when-page_size-4k' Dimitri Daskalakis says: ==================== Fix page fragment handling when PAGE_SIZE > 4K FBNIC operates on fixed size descriptors (4K). When the OS supports pages larger than 4K, we fragment the page across multiple descriptors. While performance testing, I found several issues with our page fragment handling, resulting in low throughput and potential RX stalls. ==================== Link: https://patch.msgid.link/20260324195123.3486219-1-dimitri.daskalakis1@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-03-27 20:30:26 -07:00
Dimitri Daskalakis	f3567dd428	eth: fbnic: Fix debugfs output for BDQ's with page frags The rings size_mask represents the number of pages, so we need to determine the number of page frags when dumping the descriptors. Fixes: `df04373b0d` ("eth fbnic: Add debugfs hooks for tx/rx rings") Signed-off-by: Dimitri Daskalakis <daskald@meta.com> Link: https://patch.msgid.link/20260324195123.3486219-3-dimitri.daskalakis1@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-03-27 20:30:23 -07:00
Dimitri Daskalakis	b38c55320b	eth: fbnic: Account for page fragments when updating BDQ tail FBNIC supports fixed size buffers of 4K. When PAGE_SIZE > 4K, we fragment the page across multiple descriptors (FBNIC_BD_FRAG_COUNT). When refilling the BDQ, the correct number of entries are populated, but tail was only incremented by one. So on a system with 64K pages, HW would get one descriptor refilled for every 16 we populate. Additionally, we program the ring size in the HW when enabling the BDQ. This was not accounting for page fragments, so on systems with 64K pages, the HW used 1/16th of the ring. Fixes: `0cb4c0a137` ("eth: fbnic: Implement Rx queue alloc/start/stop/free") Signed-off-by: Dimitri Daskalakis <daskald@meta.com> Link: https://patch.msgid.link/20260324195123.3486219-2-dimitri.daskalakis1@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-03-27 20:30:23 -07:00
Eric Dumazet	2edfa31769	ip6_tunnel: clear skb2->cb[] in ip4ip6_err() Oskar Kjos reported the following problem. ip4ip6_err() calls icmp_send() on a cloned skb whose cb[] was written by the IPv6 receive path as struct inet6_skb_parm. icmp_send() passes IPCB(skb2) to __ip_options_echo(), which interprets that cb[] region as struct inet_skb_parm (IPv4). The layouts differ: inet6_skb_parm.nhoff at offset 14 overlaps inet_skb_parm.opt.rr, producing a non-zero rr value. __ip_options_echo() then reads optlen from attacker-controlled packet data at sptr[rr+1] and copies that many bytes into dopt->__data, a fixed 40-byte stack buffer (IP_OPTIONS_DATA_FIXED_SIZE). To fix this we clear skb2->cb[], as suggested by Oskar Kjos. Also add minimal IPv4 header validation (version == 4, ihl >= 5). Fixes: `c4d3efafcc` ("[IPV6] IP6TUNNEL: Add support to IPv4 over IPv6 tunnel.") Reported-by: Oskar Kjos <oskar.kjos@hotmail.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Link: https://patch.msgid.link/20260326155138.2429480-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-03-27 20:24:12 -07:00
Eric Dumazet	86ab3e5567	ipv6: icmp: clear skb2->cb[] in ip6_err_gen_icmpv6_unreach() Sashiko AI-review observed: In ip6_err_gen_icmpv6_unreach(), the skb is an outer IPv4 ICMP error packet where its cb contains an IPv4 inet_skb_parm. When skb is cloned into skb2 and passed to icmp6_send(), it uses IP6CB(skb2). IP6CB interprets the IPv4 inet_skb_parm as an inet6_skb_parm. The cipso offset in inet_skb_parm.opt directly overlaps with dsthao in inet6_skb_parm at offset 18. If an attacker sends a forged ICMPv4 error with a CIPSO IP option, dsthao would be a non-zero offset. Inside icmp6_send(), mip6_addr_swap() is called and uses ipv6_find_tlv(skb, opt->dsthao, IPV6_TLV_HAO). This would scan the inner, attacker-controlled IPv6 packet starting at that offset, potentially returning a fake TLV without checking if the remaining packet length can hold the full 18-byte struct ipv6_destopt_hao. Could mip6_addr_swap() then perform a 16-byte swap that extends past the end of the packet data into skb_shared_info? Should the cb array also be cleared in ip6_err_gen_icmpv6_unreach() and ip6ip6_err() to prevent this? This patch implements the first suggestion. I am not sure if ip6ip6_err() needs to be changed. A separate patch would be better anyway. Fixes: `ca15a078bd` ("sit: generate icmpv6 error when receiving icmpv4 error") Reported-by: Ido Schimmel <idosch@nvidia.com> Closes: https://sashiko.dev/#/patchset/20260326155138.2429480-1-edumazet%40google.com Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Oskar Kjos <oskar.kjos@hotmail.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Link: https://patch.msgid.link/20260326202608.2976021-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-03-27 20:22:46 -07:00
David Carlier	5597dd284f	net: ti: icssg-prueth: fix missing data copy and wrong recycle in ZC RX dispatch emac_dispatch_skb_zc() allocates a new skb via napi_alloc_skb() but never copies the packet data from the XDP buffer into it. The skb is passed up the stack containing uninitialized heap memory instead of the actual received packet, leaking kernel heap contents to userspace. Copy the received packet data from the XDP buffer into the skb using skb_copy_to_linear_data(). Additionally, remove the skb_mark_for_recycle() call since the skb is backed by the NAPI page frag allocator, not page_pool. Marking a non-page_pool skb for recycle causes the free path to return pages to a page_pool that does not own them, corrupting page_pool state. The non-ZC path (emac_rx_packet) does not have these issues because it uses napi_build_skb() to wrap the existing page_pool page directly, requiring no copy, and correctly marks for recycle since the page comes from page_pool_dev_alloc_pages(). Fixes: `7a64bb388d` ("net: ti: icssg-prueth: Add AF_XDP zero copy for RX") Signed-off-by: David Carlier <devnexen@gmail.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2026-03-27 12:08:26 +00:00
Thomas Bogendoerfer	bb417456c7	tg3: Fix race for querying speed/duplex When driver signals carrier up via netif_carrier_on() its internal link_up state isn't updated immediately. This leads to inconsistent speed/duplex in /proc/net/bonding/bondX where the speed and duplex is shown as unknown while ethtool shows correct values. Fix this by using netif_carrier_ok() for link checking in get_ksettings function. Fixes: `84421b99ce` ("tg3: Update link_up flag for phylib devices") Signed-off-by: Thomas Bogendoerfer <tbogendoerfer@suse.de> Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2026-03-27 12:06:38 +00:00
Pengpeng Hou	5e67ba9bb5	net/ipv6: ioam6: prevent schema length wraparound in trace fill ioam6_fill_trace_data() stores the schema contribution to the trace length in a u8. With bit 22 enabled and the largest schema payload, sclen becomes 1 + 1020 / 4, wraps from 256 to 0, and bypasses the remaining-space check. __ioam6_fill_trace_data() then positions the write cursor without reserving the schema area but still copies the 4-byte schema header and the full schema payload, overrunning the trace buffer. Keep sclen in an unsigned int so the remaining-space check and the write cursor calculation both see the full schema length. Fixes: `8c6f6fa677` ("ipv6: ioam: IOAM Generic Netlink API") Signed-off-by: Pengpeng Hou <pengpeng@iscas.ac.cn> Reviewed-by: Justin Iurman <justin.iurman@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2026-03-27 12:05:36 +00:00
Yochai Eisenrich	ae05340cca	net: ipv6: ndisc: fix ndisc_ra_useropt to initialize nduseropt_padX fields to zero to prevent an info-leak When processing Router Advertisements with user options the kernel builds an RTM_NEWNDUSEROPT netlink message. The nduseroptmsg struct has three padding fields that are never zeroed and can leak kernel data The fix is simple, just zeroes the padding fields. Fixes: `31910575a9` ("[IPv6]: Export userland ND options through netlink (RDNSS support)") Signed-off-by: Yochai Eisenrich <echelonh@gmail.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20260324224925.2437775-1-echelonh@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-03-26 20:38:35 -07:00
Jiayuan Chen	2428083101	net: qrtr: replace qrtr_tx_flow radix_tree with xarray to fix memory leak __radix_tree_create() allocates and links intermediate nodes into the tree one by one. If a subsequent allocation fails, the already-linked nodes remain in the tree with no corresponding leaf entry. These orphaned internal nodes are never reclaimed because radix_tree_for_each_slot() only visits slots containing leaf values. The radix_tree API is deprecated in favor of xarray. As suggested by Matthew Wilcox, migrate qrtr_tx_flow from radix_tree to xarray instead of fixing the radix_tree itself [1]. xarray properly handles cleanup of internal nodes — xa_destroy() frees all internal xarray nodes when the qrtr_node is released, preventing the leak. [1] https://lore.kernel.org/all/20260225071623.41275-1-jiayuan.chen@linux.dev/T/ Reported-by: syzbot+006987d1be3586e13555@syzkaller.appspotmail.com Closes: https://lore.kernel.org/all/000000000000bfba3a060bf4ffcf@google.com/T/ Fixes: `5fdeb0d372` ("net: qrtr: Implement outgoing flow control") Signed-off-by: Jiayuan Chen <jiayuan.chen@shopee.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20260324080645.290197-1-jiayuan.chen@linux.dev Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-03-26 20:22:38 -07:00
Jakub Kicinski	04f272188f	Merge branch 'net-enetc-safely-reinitialize-tx-bd-ring-when-it-has-unsent-frames' Wei Fang says: ==================== net: enetc: safely reinitialize TX BD ring when it has unsent frames Currently the driver does not reset the producer index register (PIR) and consumer index register (CIR) when initializing a TX BD ring. The driver only reads the PIR and CIR and initializes the software indexes. If the TX BD ring is reinitialized when it still contains unsent frames, its PIR and CIR will not be equal after the reinitialization. However, the BDs between CIR and PIR have been freed and become invalid and this can lead to a hardware malfunction, causing the TX BD ring will not work properly. Since the PIR and CIR are sofeware-configurable on ENETC v4. Therefore, the driver must reset them if they are not equal when reinitializing the TX BD ring. However, resetting the PIR and CIR alone is insufficient, it cannot completely solve the problem. When a link-down event occurs while the TX BD ring is transmitting frames, subsequent reinitialization of the TX BD ring may cause it to malfunction. Because enetc4_pl_mac_link_down() only clears PMa_COMMAND_CONFIG[TX_EN] to disable MAC transmit data path. It doesn't set PORT[TXDIS] to 1 to flush the TX BD ring. Therefore, it is not safe to reinitialize the TX BD ring at this point. To safely reinitialize the TX BD ring after a link-down event, we checked with the NETC IP team, a proper Ethernet MAC graceful stop is necessary. Therefore, add the Ethernet MAC graceful stop to the link-down event handler enetc4_pl_mac_link_down(). Note that this patch set is not applicable to ENETC v1 (LS1028A). ==================== Link: https://patch.msgid.link/20260324062121.2745033-1-wei.fang@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-03-26 20:19:09 -07:00
Wei Fang	f2df9567b1	net: enetc: do not access non-existent registers on pseudo MAC The ENETC4_PM_IEVENT and ENETC4_PM_CMD_CFG registers do not exist on the ENETC pseudo MAC, so the driver should prevent from accessing them. Fixes: `5175c1e4ad` ("net: enetc: add basic support for the ENETC with pseudo MAC for i.MX94") Signed-off-by: Wei Fang <wei.fang@nxp.com> Tested-by: Claudiu Manoil <claudiu.manoil@nxp.com> Reviewed-by: Claudiu Manoil <claudiu.manoil@nxp.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20260324062121.2745033-4-wei.fang@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-03-26 20:19:06 -07:00
Wei Fang	2725d84efe	net: enetc: add graceful stop to safely reinitialize the TX Ring For ENETC v4, the PIR and CIR will be reset if they are not equal when reinitializing the TX BD ring. However, resetting the PIR and CIR alone is insufficient. When a link-down event occurs while the TX BD ring is transmitting frames, subsequent reinitialization of the TX BD ring may cause it to malfunction. For example, the below steps can reproduce the problem. 1. Unplug the cable when the TX BD ring is busy transmitting frames. 2. Disable the network interface (ifconfig eth0 down). 3. Re-enable the network interface (ifconfig eth0 up). 4. Plug in the cable, the TX BD ring may fail to transmit packets. When the link-down event occurs, enetc4_pl_mac_link_down() only clears PMa_COMMAND_CONFIG[TX_EN] to disable MAC transmit data path. It doesn't set PORT[TXDIS] to 1 to flush the TX BD ring. Therefore, reinitializing the TX BD ring at this point is unsafe. To safely reinitialize the TX BD ring after a link-down event, we checked with the NETC IP team, a proper Ethernet MAC graceful stop is necessary. Therefore, add the Ethernet MAC graceful stop to the link-down event handler enetc4_pl_mac_link_down(). Fixes: `99100d0d99` ("net: enetc: add preliminary support for i.MX95 ENETC PF") Signed-off-by: Wei Fang <wei.fang@nxp.com> Reviewed-by: Claudiu Manoil <claudiu.manoil@nxp.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20260324062121.2745033-3-wei.fang@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-03-26 20:19:06 -07:00
Wei Fang	0239fd701d	net: enetc: reset PIR and CIR if they are not equal when initializing TX ring Currently the driver does not reset the producer index register (PIR) and consumer index register (CIR) when initializing a TX BD ring. The driver only reads the PIR and CIR and initializes the software indexes. If the TX BD ring is reinitialized when it still contains unsent frames, its PIR and CIR will not be equal after the reinitialization. However, the BDs between CIR and PIR have been freed and become invalid and this can lead to a hardware malfunction, causing the TX BD ring will not work properly. For ENETC v4, it supports software to set the PIR and CIR, so the driver can reset these two registers if they are not equal when reinitializing the TX BD ring. Therefore, add this solution for ENETC v4. Note that this patch does not work for ENETC v1 because it does not support software to set the PIR and CIR. Fixes: `99100d0d99` ("net: enetc: add preliminary support for i.MX95 ENETC PF") Signed-off-by: Wei Fang <wei.fang@nxp.com> Reviewed-by: Claudiu Manoil <claudiu.manoil@nxp.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20260324062121.2745033-2-wei.fang@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-03-26 20:19:05 -07:00
Buday Csaba	e8e44c98f7	net: fec: fix the PTP periodic output sysfs interface When the PPS channel configuration was implemented, the channel index for the periodic outputs was configured as the hardware channel number. The sysfs interface uses a logical channel index, and rejects numbers greater than `n_per_out` (see period_store() in ptp_sysfs.c). That property was left at 1, since the driver implements channel selection, not simultaneous operation of multiple PTP hardware timer channels. A second check in fec_ptp_enable() returns -EOPNOTSUPP when the two channel numbers disagree, making channels 1..3 unusable from sysfs. Fix by removing this redundant check in the FEC PTP driver. Fixes: `566c2d8388` ("net: fec: make PPS channel configurable") Signed-off-by: Buday Csaba <buday.csaba@prolan.hu> Link: https://patch.msgid.link/8ec2afe88423c2231f9cf8044d212ce57846670e.1774359059.git.buday.csaba@prolan.hu Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-03-26 20:10:15 -07:00
Qingfang Deng	57a04a13aa	netdevsim: fix build if SKB_EXTENSIONS=n __skb_ext_put() is not declared if SKB_EXTENSIONS is not enabled, which causes a build error: drivers/net/netdevsim/netdev.c: In function 'nsim_forward_skb': drivers/net/netdevsim/netdev.c:114:25: error: implicit declaration of function '__skb_ext_put'; did you mean 'skb_ext_put'? [-Werror=implicit-function-declaration] 114 \| __skb_ext_put(psp_ext); \| ^~~~~~~~~~~~~ \| skb_ext_put cc1: some warnings being treated as errors Add a stub to fix the build. Fixes: `7d9351435e` ("netdevsim: drop PSP ext ref on forward failure") Signed-off-by: Qingfang Deng <dqfext@gmail.com> Link: https://patch.msgid.link/20260324140857.783-1-dqfext@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-03-26 20:08:58 -07:00
Sven Eckelmann (Plasma Cloud)	976ff48c2a	net: ethernet: mtk_ppe: avoid NULL deref when gmac0 is disabled If the gmac0 is disabled, the precheck for a valid ingress device will cause a NULL pointer deref and crash the system. This happens because eth->netdev[0] will be NULL but the code will directly try to access netdev_ops. Instead of just checking for the first net_device, it must be checked if any of the mtk_eth net_devices is matching the netdev_ops of the ingress device. Cc: stable@vger.kernel.org Fixes: `73cfd947db` ("net: ethernet: mtk_eth_soc: ppe: prevent ppe update for non-mtk devices") Signed-off-by: Sven Eckelmann (Plasma Cloud) <se@simonwunderlich.de> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20260324-wed-crash-gmac0-disabled-v1-1-3bc388aee565@simonwunderlich.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-03-26 19:01:25 -07:00
Dipayaan Roy	f73896b419	net: mana: Fix RX skb truesize accounting MANA passes rxq->alloc_size to napi_build_skb() for all RX buffers. It is correct for fragment-backed RX buffers, where alloc_size matches the actual backing allocation used for each packet buffer. However, in the non-fragment RX path mana allocates a full page, or a higher-order page, per RX buffer. In that case alloc_size only reflects the usable packet area and not the actual backing memory. This causes napi_build_skb() to underestimate the skb backing allocation in the single-buffer RX path, so skb->truesize is derived from a value smaller than the real RX buffer allocation. Fix this by updating alloc_size in the non-fragment RX path to the actual backing allocation size before it is passed to napi_build_skb(). Fixes: `730ff06d3f` ("net: mana: Use page pool fragments for RX buffers instead of full pages to improve memory efficiency.") Signed-off-by: Dipayaan Roy <dipayanroy@linux.microsoft.com> Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com> Link: https://patch.msgid.link/acLUhLpLum6qrD/N@linuxonhyperv3.guj3yctzbm1etfxqx2vob5hsef.xx.internal.cloudapp.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-03-26 18:57:07 -07:00
Sabrina Dubroca	629ec78ef8	mpls: add seqcount to protect the platform_label{,s} pair The RCU-protected codepaths (mpls_forward, mpls_dump_routes) can have an inconsistent view of platform_labels vs platform_label in case of a concurrent resize (resize_platform_label_table, under platform_mutex). This can lead to OOB accesses. This patch adds a seqcount, so that we get a consistent snapshot. Note that mpls_label_ok is also susceptible to this, so the check against RTA_DST in rtm_to_route_config, done outside platform_mutex, is not sufficient. This value gets passed to mpls_label_ok once more in both mpls_route_add and mpls_route_del, so there is no issue, but that additional check must not be removed. Reported-by: Yuan Tan <tanyuan98@outlook.com> Reported-by: Yifan Wu <yifanwucs@gmail.com> Reported-by: Juefei Pu <tomapufckgml@gmail.com> Reported-by: Xin Liu <bird@lzu.edu.cn> Fixes: `7720c01f3f` ("mpls: Add a sysctl to control the size of the mpls label table") Fixes: `dde1b38e87` ("mpls: Convert mpls_dump_routes() to RCU.") Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Link: https://patch.msgid.link/cd8fca15e3eb7e212b094064cd83652e20fd9d31.1774284088.git.sd@queasysnail.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-03-26 18:32:14 -07:00
Jakub Kicinski	45dbf8fcea	Merge tag 'wireless-2026-03-26' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless Johannes Berg says: ==================== Couple more fixes: - virt_wifi: remove SET_NETDEV_DEV to avoid UAF on teardown - iwlwifi: - fix (some) devices that don't have 6 GHz (WiFi6E) - fix potential OOB read of firmware notification - set WiFi generation for firmware to avoid packet drops - fix multi-link scan timing - wilc1000: fix integer overflow - ath11k/ath12k: fix TID during A-MPDU session teardown - wl1251: don't trust firmware TX status response index * tag 'wireless-2026-03-26' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless: wifi: virt_wifi: remove SET_NETDEV_DEV to avoid use-after-free wifi: iwlwifi: mvm: fix potential out-of-bounds read in iwl_mvm_nd_match_info_handler() wifi: wl1251: validate packet IDs before indexing tx_frames wifi: wilc1000: fix u8 overflow in SSID scan buffer size calculation wifi: ath12k: Pass the correct value of each TID during a stop AMPDU session wifi: ath11k: Pass the correct value of each TID during a stop AMPDU session wifi: iwlwifi: mld: correctly set wifi generation data wifi: iwlwifi: mvm: don't send a 6E related command when not supported wifi: iwlwifi: mld: Fix MLO scan timing ==================== Link: https://patch.msgid.link/20260326093329.77815-3-johannes@sipsolutions.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-03-26 17:51:39 -07:00
Linus Torvalds	453a4a5f97	Merge tag 'net-7.0-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Paolo Abeni: "Including fixes from Bluetooth, CAN, IPsec and Netfilter. Notably, this includes the fix for the Bluetooth regression that you were notified about. I'm not aware of any other pending regressions. Current release - regressions: - bluetooth: - fix stack-out-of-bounds read in l2cap_ecred_conn_req - fix regressions caused by reusing ident - netfilter: revisit array resize logic - eth: ice: set max queues in alloc_etherdev_mqs() Previous releases - regressions: - core: correctly handle tunneled traffic on IPV6_CSUM GSO fallback - bluetooth: - fix dangling pointer on mgmt_add_adv_patterns_monitor_complete - fix deadlock in l2cap_conn_del() - sched: codel: fix stale state for empty flows in fq_codel - ipv6: remove permanent routes from tb6_gc_hlist when all exceptions expire. - xfrm: fix skb_put() panic on non-linear skb during reassembly - openvswitch: - avoid releasing netdev before teardown completes - validate MPLS set/set_masked payload length - eth: iavf: fix out-of-bounds writes in iavf_get_ethtool_stats() Previous releases - always broken: - bluetooth: fix null-ptr-deref on l2cap_sock_ready_cb - udp: fix wildcard bind conflict check when using hash2 - netfilter: fix use of uninitialized rtp_addr in process_sdp - tls: Purge async_hold in tls_decrypt_async_wait() - xfrm: - prevent policy_hthresh.work from racing with netns teardown - fix skb leak with espintcp and async crypto - smc: fix double-free of smc_spd_priv when tee() duplicates splice pipe buffer - can: - add missing error handling to call can_ctrlmode_changelink() - fix OOB heap access in cgw_csum_crc8_rel() - eth: - mana: fix use-after-free in add_adev() error path - virtio-net: fix for VIRTIO_NET_F_GUEST_HDRLEN - bcmasp: fix double free of WoL irq" * tag 'net-7.0-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (90 commits) net: macb: use the current queue number for stats netfilter: ctnetlink: use netlink policy range checks netfilter: nf_conntrack_sip: fix use of uninitialized rtp_addr in process_sdp netfilter: nf_conntrack_expect: skip expectations in other netns via proc netfilter: nf_conntrack_expect: store netns and zone in expectation netfilter: ctnetlink: ensure safe access to master conntrack netfilter: nf_conntrack_expect: use expect->helper netfilter: nf_conntrack_expect: honor expectation helper field netfilter: nft_set_rbtree: revisit array resize logic netfilter: ip6t_rt: reject oversized addrnr in rt_mt6_check() netfilter: nfnetlink_log: fix uninitialized padding leak in NFULA_PAYLOAD tls: Purge async_hold in tls_decrypt_async_wait() selftests: netfilter: nft_concat_range.sh: add check for flush+reload bug netfilter: nft_set_pipapo_avx2: don't return non-matching entry on expiry Bluetooth: btusb: clamp SCO altsetting table indices Bluetooth: L2CAP: Fix ERTM re-init and zero pdu_len infinite loop Bluetooth: L2CAP: Fix deadlock in l2cap_conn_del() Bluetooth: btintel: serialize btintel_hw_error() with hci_req_sync_lock Bluetooth: L2CAP: Fix send LE flow credits in ACL link net: mana: fix use-after-free in add_adev() error path ...	2026-03-26 09:53:08 -07:00
Linus Torvalds	75c78a4faa	Merge tag 'pinctrl-v7.0-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl Pull pin control fixes from Linus Walleij: - Implement .get_direction() in the spmi-gpio gpio_chip Recent changes makes this start to print warnings and it's not nice, let's just fix it - Clamp the return value of gpio_get() in the Renesas RZA1 driver - Add the GPIO_GENERIC dependency to the STM32 HDP driver - Modify the Mediatek driver to accept devices that do not use external interrupts (EINT) at all - Fix flag propagation in the Sunxi driver, so that we can fix an issue with uninitialized pins in a follow-up patch using said flags * tag 'pinctrl-v7.0-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl: pinctrl: sunxi: fix gpiochip_lock_as_irq() failure when pinmux is unknown pinctrl: sunxi: pass down flags to pinctrl routines pinctrl: mediatek: common: Fix probe failure for devices without EINT pinctrl: stm32: fix HDP driver dependency on GPIO_GENERIC pinctrl: renesas: rza1: Normalize return value of gpio_get() pinctrl: qcom: spmi-gpio: implement .get_direction() pinctrl: renesas: rzt2h: Fix invalid wait context pinctrl: renesas: rzt2h: Fix device node leak in rzt2h_gpio_register()	2026-03-26 08:35:51 -07:00
Linus Torvalds	dabb83ecf4	Merge tag 'dma-mapping-7.0-2026-03-25' of git://git.kernel.org/pub/scm/linux/kernel/git/mszyprowski/linux Pull dma-mapping fixes from Marek Szyprowski: "A set of fixes for DMA-mapping subsystem, which resolve false- positive warnings from KMSAN and DMA-API debug (Shigeru Yoshida and Leon Romanovsky) as well as a simple build fix (Miguel Ojeda)" * tag 'dma-mapping-7.0-2026-03-25' of git://git.kernel.org/pub/scm/linux/kernel/git/mszyprowski/linux: dma-mapping: add missing `inline` for `dma_free_attrs` mm/hmm: Indicate that HMM requires DMA coherency RDMA/umem: Tell DMA mapping that UMEM requires coherency iommu/dma: add support for DMA_ATTR_REQUIRE_COHERENT attribute dma-direct: prevent SWIOTLB path when DMA_ATTR_REQUIRE_COHERENT is set dma-mapping: Introduce DMA require coherency attribute dma-mapping: Clarify valid conditions for CPU cache line overlap dma-mapping: handle DMA_ATTR_CPU_CACHE_CLEAN in trace output dma-debug: Allow multiple invocations of overlapping entries dma: swiotlb: add KMSAN annotations to swiotlb_bounce()	2026-03-26 08:22:07 -07:00
Paolo Abeni	db472c34a7	Merge tag 'nf-26-03-26' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf Pablo Neira Ayuso says: ==================== Netfilter for net This is v3, I kept back an ipset fix and another to tigthen the xtables interface to reject invalid combinations with the NFPROTO_ARP family. They need a bit more discussion. I fixed the issues reported by AI on patch 9 (add #ifdef to access ct zone, update nf_conntrack_broadcast and patch 10 (use better Fixes: tag). Thanks! The following patchset contains Netfilter fixes for net. Note that most bugs fixed here stem from 2.6 days, the large PR is not due to an increase in regressions. 1) Fix incorrect reject of set updates with nf_tables pipapo set avx2 backend. This comes with a regression test in patch 2. From Florian Westphal. 2) nfnetlink_log needs to zero padding to prevent infoleak to userspace, from Weiming Shi. 3) xtables ip6t_rt module never validated that addrnr length is within the allowed array boundary. Reject bogus values. From Ren Wei. 4) Fix high memory usage in rbtree set backend that was unwanted side-effect of the recently added binary search blob. From Pablo Neira Ayuso. 5) Patches 5 to 10, also from Pablo, address long-standing RCU safety bugs in conntracks handling of expectations: We can never safely defer a conntrack extension area without holding a reference. Yet expectation handling does so in multiple places. Fix this by avoiding the need to look into the master conntrack to begin with and by extending locked sections in a few places. 11) Fix use of uninitialized rtp_addr in the sip conntrack helper, also from Weiming Shi. 12) Add stricter netlink policy checks in ctnetlink, from David Carlier. This avoids undefined behaviour when userspace provides huge wscale value. netfilter pull request 26-03-26 * tag 'nf-26-03-26' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf: netfilter: ctnetlink: use netlink policy range checks netfilter: nf_conntrack_sip: fix use of uninitialized rtp_addr in process_sdp netfilter: nf_conntrack_expect: skip expectations in other netns via proc netfilter: nf_conntrack_expect: store netns and zone in expectation netfilter: ctnetlink: ensure safe access to master conntrack netfilter: nf_conntrack_expect: use expect->helper netfilter: nf_conntrack_expect: honor expectation helper field netfilter: nft_set_rbtree: revisit array resize logic netfilter: ip6t_rt: reject oversized addrnr in rt_mt6_check() netfilter: nfnetlink_log: fix uninitialized padding leak in NFULA_PAYLOAD selftests: netfilter: nft_concat_range.sh: add check for flush+reload bug netfilter: nft_set_pipapo_avx2: don't return non-matching entry on expiry ==================== Link: https://patch.msgid.link/20260326125153.685915-1-pablo@netfilter.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2026-03-26 15:38:14 +01:00
Paolo Abeni	deec4f7b41	Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue Tony Nguyen says: ==================== For ice: Michal corrects call to alloc_etherdev_mqs() to provide maximum number of queues supported rather than currently allocated number of queues. Petr Oros fixes issues related to some ethtool operations in switchdev mode. For iavf: Kohei Enju corrects number of reported queues for ethtool statistics to absolute max as using current number could race and cause out-of-bounds issues. For idpf: Josh NULLs cdev_info pointer after freeing to prevent possible subsequent improper access. He also defers setting of refillqs value until after allocation to prevent possible NULL pointer dereference. * '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue: idpf: only assign num refillqs if allocation was successful idpf: clear stale cdev_info ptr iavf: fix out-of-bounds writes in iavf_get_ethtool_stats() ice: use ice_update_eth_stats() for representor stats ice: fix inverted ready check for VF representors ice: set max queues in alloc_etherdev_mqs() ==================== Link: https://patch.msgid.link/20260323205843.624704-1-anthony.l.nguyen@intel.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2026-03-26 15:14:52 +01:00
Paolo Valerio	72d96e4e24	net: macb: use the current queue number for stats There's a potential mismatch between the memory reserved for statistics and the amount of memory written. gem_get_sset_count() correctly computes the number of stats based on the active queues, whereas gem_get_ethtool_stats() indiscriminately copies data using the maximum number of queues, and in the case the number of active queues is less than MACB_MAX_QUEUES, this results in a OOB write as observed in the KASAN splat. ================================================================== BUG: KASAN: vmalloc-out-of-bounds in gem_get_ethtool_stats+0x54/0x78 [macb] Write of size 760 at addr ffff80008080b000 by task ethtool/1027 CPU: [...] Tainted: [E]=UNSIGNED_MODULE Hardware name: raspberrypi rpi/rpi, BIOS 2025.10 10/01/2025 Call trace: show_stack+0x20/0x38 (C) dump_stack_lvl+0x80/0xf8 print_report+0x384/0x5e0 kasan_report+0xa0/0xf0 kasan_check_range+0xe8/0x190 __asan_memcpy+0x54/0x98 gem_get_ethtool_stats+0x54/0x78 [macb 926c13f3af83b0c6fe64badb21ec87d5e93fcf65] dev_ethtool+0x1220/0x38c0 dev_ioctl+0x4ac/0xca8 sock_do_ioctl+0x170/0x1d8 sock_ioctl+0x484/0x5d8 __arm64_sys_ioctl+0x12c/0x1b8 invoke_syscall+0xd4/0x258 el0_svc_common.constprop.0+0xb4/0x240 do_el0_svc+0x48/0x68 el0_svc+0x40/0xf8 el0t_64_sync_handler+0xa0/0xe8 el0t_64_sync+0x1b0/0x1b8 The buggy address belongs to a 1-page vmalloc region starting at 0xffff80008080b000 allocated at dev_ethtool+0x11f0/0x38c0 The buggy address belongs to the physical page: page: refcount:1 mapcount:0 mapping:0000000000000000 index:0xffff00000a333000 pfn:0xa333 flags: 0x7fffc000000000(node=0\|zone=0\|lastcpupid=0x1ffff) raw: 007fffc000000000 0000000000000000 dead000000000122 0000000000000000 raw: ffff00000a333000 0000000000000000 00000001ffffffff 0000000000000000 page dumped because: kasan: bad access detected Memory state around the buggy address: ffff80008080b080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ffff80008080b100: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >ffff80008080b180: 00 00 00 00 00 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 ^ ffff80008080b200: f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 ffff80008080b280: f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 ================================================================== Fix it by making sure the copied size only considers the active number of queues. Fixes: `512286bbd4` ("net: macb: Added some queue statistics") Signed-off-by: Paolo Valerio <pvalerio@redhat.com> Reviewed-by: Nicolai Buchwitz <nb@tipi-net.de> Link: https://patch.msgid.link/20260323191634.2185840-1-pvalerio@redhat.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2026-03-26 13:48:21 +01:00
Paolo Abeni	aa637b2cf3	Merge tag 'for-net-2026-03-25' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth Luiz Augusto von Dentz says: ==================== bluetooth pull request for net: - L2CAP: Fix deadlock in l2cap_conn_del() - L2CAP: Fix ERTM re-init and zero pdu_len infinite loop - L2CAP: Fix send LE flow credits in ACL link - btintel: serialize btintel_hw_error() with hci_req_sync_lock - btusb: clamp SCO altsetting table indices * tag 'for-net-2026-03-25' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth: Bluetooth: btusb: clamp SCO altsetting table indices Bluetooth: L2CAP: Fix ERTM re-init and zero pdu_len infinite loop Bluetooth: L2CAP: Fix deadlock in l2cap_conn_del() Bluetooth: btintel: serialize btintel_hw_error() with hci_req_sync_lock Bluetooth: L2CAP: Fix send LE flow credits in ACL link ==================== Link: https://patch.msgid.link/20260325194358.618892-1-luiz.dentz@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2026-03-26 13:46:55 +01:00
David Carlier	8f15b5071b	netfilter: ctnetlink: use netlink policy range checks Replace manual range and mask validations with netlink policy annotations in ctnetlink code paths, so that the netlink core rejects invalid values early and can generate extack errors. - CTA_PROTOINFO_TCP_STATE: reject values > TCP_CONNTRACK_SYN_SENT2 at policy level, removing the manual >= TCP_CONNTRACK_MAX check. - CTA_PROTOINFO_TCP_WSCALE_ORIGINAL/REPLY: reject values > TCP_MAX_WSCALE (14). The normal TCP option parsing path already clamps to this value, but the ctnetlink path accepted 0-255, causing undefined behavior when used as a u32 shift count. - CTA_FILTER_ORIG_FLAGS/REPLY_FLAGS: use NLA_POLICY_MASK with CTA_FILTER_F_ALL, removing the manual mask checks. - CTA_EXPECT_FLAGS: use NLA_POLICY_MASK with NF_CT_EXPECT_MASK, adding a new mask define grouping all valid expect flags. Extracted from a broader nf-next patch by Florian Westphal, scoped to ctnetlink for the fixes tree. Fixes: `c8e2078cfe` ("[NETFILTER]: ctnetlink: add support for internal tcp connection tracking flags handling") Signed-off-by: David Carlier <devnexen@gmail.com> Co-developed-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2026-03-26 13:28:17 +01:00
Weiming Shi	6a2b724460	netfilter: nf_conntrack_sip: fix use of uninitialized rtp_addr in process_sdp process_sdp() declares union nf_inet_addr rtp_addr on the stack and passes it to the nf_nat_sip sdp_session hook after walking the SDP media descriptions. However rtp_addr is only initialized inside the media loop when a recognized media type with a non-zero port is found. If the SDP body contains no m= lines, only inactive media sections (m=audio 0 ...) or only unrecognized media types, rtp_addr is never assigned. Despite that, the function still calls hooks->sdp_session() with &rtp_addr, causing nf_nat_sdp_session() to format the stale stack value as an IP address and rewrite the SDP session owner and connection lines with it. With CONFIG_INIT_STACK_ALL_ZERO (default on most distributions) this results in the session-level o= and c= addresses being rewritten to 0.0.0.0 for inactive SDP sessions. Without stack auto-init the rewritten address is whatever happened to be on the stack. Fix this by pre-initializing rtp_addr from the session-level connection address (caddr) when available, and tracking via a have_rtp_addr flag whether any valid address was established. Skip the sdp_session hook entirely when no valid address exists. Fixes: `4ab9e64e5e` ("[NETFILTER]: nf_nat_sip: split up SDP mangling") Reported-by: Xiang Mei <xmei5@asu.edu> Signed-off-by: Weiming Shi <bestswngs@gmail.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2026-03-26 13:28:17 +01:00
Pablo Neira Ayuso	3db5647984	netfilter: nf_conntrack_expect: skip expectations in other netns via proc Skip expectations that do not reside in this netns. Similar to `e77e6ff502` ("netfilter: conntrack: do not dump other netns's conntrack entries via proc"). Fixes: `9b03f38d04` ("netfilter: netns nf_conntrack: per-netns expectations") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2026-03-26 13:28:03 +01:00
Pablo Neira Ayuso	02a3231b6d	netfilter: nf_conntrack_expect: store netns and zone in expectation __nf_ct_expect_find() and nf_ct_expect_find_get() are called under rcu_read_lock() but they dereference the master conntrack via exp->master. Since the expectation does not hold a reference on the master conntrack, this could be dying conntrack or different recycled conntrack than the real master due to SLAB_TYPESAFE_RCU. Store the netns, the master_tuple and the zone in struct nf_conntrack_expect as a safety measure. This patch is required by the follow up fix not to dump expectations that do not belong to this netns. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2026-03-26 13:24:40 +01:00
Pablo Neira Ayuso	bffcaad9af	netfilter: ctnetlink: ensure safe access to master conntrack Holding reference on the expectation is not sufficient, the master conntrack object can just go away, making exp->master invalid. To access exp->master safely: - Grab the nf_conntrack_expect_lock, this gets serialized with clean_from_lists() which also holds this lock when the master conntrack goes away. - Hold reference on master conntrack via nf_conntrack_find_get(). Not so easy since the master tuple to look up for the master conntrack is not available in the existing problematic paths. This patch goes for extending the nf_conntrack_expect_lock section to address this issue for simplicity, in the cases that are described below this is just slightly extending the lock section. The add expectation command already holds a reference to the master conntrack from ctnetlink_create_expect(). However, the delete expectation command needs to grab the spinlock before looking up for the expectation. Expand the existing spinlock section to address this to cover the expectation lookup. Note that, the nf_ct_expect_iterate_net() calls already grabs the spinlock while iterating over the expectation table, which is correct. The get expectation command needs to grab the spinlock to ensure master conntrack does not go away. This also expands the existing spinlock section to cover the expectation lookup too. I needed to move the netlink skb allocation out of the spinlock to keep it GFP_KERNEL. For the expectation events, the IPEXP_DESTROY event is already delivered under the spinlock, just move the delivery of IPEXP_NEW under the spinlock too because the master conntrack event cache is reached through exp->master. While at it, add lockdep notations to help identify what codepaths need to grab the spinlock. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2026-03-26 13:18:32 +01:00
Pablo Neira Ayuso	f017941060	netfilter: nf_conntrack_expect: use expect->helper Use expect->helper in ctnetlink and /proc to dump the helper name. Using nfct_help() without holding a reference to the master conntrack is unsafe. Use exp->master->helper in ctnetlink path if userspace does not provide an explicit helper when creating an expectation to retain the existing behaviour. The ctnetlink expectation path holds the reference on the master conntrack and nf_conntrack_expect lock and the nfnetlink glue path refers to the master ct that is attached to the skb. Reported-by: Hyunwoo Kim <imv4bel@gmail.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2026-03-26 13:18:31 +01:00
Pablo Neira Ayuso	9c42bc9db9	netfilter: nf_conntrack_expect: honor expectation helper field The expectation helper field is mostly unused. As a result, the netfilter codebase relies on accessing the helper through exp->master. Always set on the expectation helper field so it can be used to reach the helper. nf_ct_expect_init() is called from packet path where the skb owns the ct object, therefore accessing exp->master for the newly created expectation is safe. This saves a lot of updates in all callsites to pass the ct object as parameter to nf_ct_expect_init(). This is a preparation patches for follow up fixes. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2026-03-26 13:18:31 +01:00
Pablo Neira Ayuso	fafdd92b9e	netfilter: nft_set_rbtree: revisit array resize logic Chris Arges reports high memory consumption with thousands of containers, this patch revisits the array allocation logic. For anonymous sets, start by 16 slots (which takes 256 bytes on x86_64). Expand it by x2 until threshold of 512 slots is reached, over that threshold, expand it by x1.5. For non-anonymous set, start by 1024 slots in the array (which takes 16 Kbytes initially on x86_64). Expand it by x1.5. Use set->ndeact to subtract deactivated elements when calculating the number of the slots in the array, otherwise the array size array gets increased artifically. Add special case shrink logic to deal with flush set too. The shrink logic is skipped by anonymous sets. Use check_add_overflow() to calculate the new array size. Add a WARN_ON_ONCE check to make sure elements fit into the new array size. Reported-by: Chris Arges <carges@cloudflare.com> Fixes: `7e43e0a114` ("netfilter: nft_set_rbtree: translate rbtree to array for binary search") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2026-03-26 13:18:31 +01:00

1 2 3 4 5 ...

1429054 Commits