linux-net

mirror of https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net.git/ synced 2026-04-03 23:37:40 -04:00

Author	SHA1	Message	Date
Yiqi Sun	fde29fd934	ipv4: icmp: fix null-ptr-deref in icmp_build_probe() ipv6_stub->ipv6_dev_find() may return ERR_PTR(-EAFNOSUPPORT) when the IPv6 stack is not active (CONFIG_IPV6=m and not loaded), and passing this error pointer to dev_hold() will cause a kernel crash with null-ptr-deref. Instead, silently discard the request. RFC 8335 does not appear to define a specific response for the case where an IPv6 interface identifier is syntactically valid but the implementation cannot perform the lookup at runtime, and silently dropping the request may safer than misreporting "No Such Interface". Fixes: `d329ea5bd8` ("icmp: add response to RFC 8335 PROBE messages") Signed-off-by: Yiqi Sun <sunyiqixm@gmail.com> Link: https://patch.msgid.link/20260402070419.2291578-1-sunyiqixm@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-03 15:46:17 -07:00
Fernando Fernandez Mancera	14cf0cd353	ipv4: nexthop: allocate skb dynamically in rtm_get_nexthop() When querying a nexthop object via RTM_GETNEXTHOP, the kernel currently allocates a fixed-size skb using NLMSG_GOODSIZE. While sufficient for single nexthops and small Equal-Cost Multi-Path groups, this fixed allocation fails for large nexthop groups like 512 nexthops. This results in the following warning splat: WARNING: net/ipv4/nexthop.c:3395 at rtm_get_nexthop+0x176/0x1c0, CPU#20: rep/4608 [...] RIP: 0010:rtm_get_nexthop (net/ipv4/nexthop.c:3395) [...] Call Trace: <TASK> rtnetlink_rcv_msg (net/core/rtnetlink.c:6989) netlink_rcv_skb (net/netlink/af_netlink.c:2550) netlink_unicast (net/netlink/af_netlink.c:1319 net/netlink/af_netlink.c:1344) netlink_sendmsg (net/netlink/af_netlink.c:1894) ____sys_sendmsg (net/socket.c:721 net/socket.c:736 net/socket.c:2585) ___sys_sendmsg (net/socket.c:2641) __sys_sendmsg (net/socket.c:2671) do_syscall_64 (arch/x86/entry/syscall_64.c:63 arch/x86/entry/syscall_64.c:94) entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130) </TASK> Fix this by allocating the size dynamically using nh_nlmsg_size() and using nlmsg_new(), this is consistent with nexthop_notify() behavior. In addition, adjust nh_nlmsg_size_grp() so it calculates the size needed based on flags passed. While at it, also add the size of NHA_FDB for nexthop group size calculation as it was missing too. This cannot be reproduced via iproute2 as the group size is currently limited and the command fails as follows: addattr_l ERROR: message exceeded bound of 1048 Fixes: `430a049190` ("nexthop: Add support for nexthop groups") Reported-by: Yiming Qian <yimingqian591@gmail.com> Closes: https://lore.kernel.org/netdev/CAL_bE8Li2h4KO+AQFXW4S6Yb_u5X4oSKnkywW+LPFjuErhqELA@mail.gmail.com/ Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Link: https://patch.msgid.link/20260402072613.25262-2-fmancera@suse.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-03 15:34:27 -07:00
Fernando Fernandez Mancera	06aaf04ca8	ipv4: nexthop: avoid duplicate NHA_HW_STATS_ENABLE on nexthop group dump Currently NHA_HW_STATS_ENABLE is included twice everytime a dump of nexthop group is performed with NHA_OP_FLAG_DUMP_STATS. As all the stats querying were moved to nla_put_nh_group_stats(), leave only that instance of the attribute querying. Fixes: `5072ae00ae` ("net: nexthop: Expose nexthop group HW stats to user space") Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Link: https://patch.msgid.link/20260402072613.25262-1-fmancera@suse.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-03 15:34:27 -07:00
Oleh Konko	48a5fe3877	tipc: fix bc_ackers underflow on duplicate GRP_ACK_MSG The GRP_ACK_MSG handler in tipc_group_proto_rcv() currently decrements bc_ackers on every inbound group ACK, even when the same member has already acknowledged the current broadcast round. Because bc_ackers is a u16, a duplicate ACK received after the last legitimate ACK wraps the counter to 65535. Once wrapped, tipc_group_bc_cong() keeps reporting congestion and later group broadcasts on the affected socket stay blocked until the group is recreated. Fix this by ignoring duplicate or stale ACKs before touching bc_acked or bc_ackers. This makes repeated GRP_ACK_MSG handling idempotent and prevents the underflow path. Fixes: `2f487712b8` ("tipc: guarantee that group broadcast doesn't bypass group unicast") Cc: stable@vger.kernel.org Signed-off-by: Oleh Konko <security@1seal.org> Reviewed-by: Tung Nguyen <tung.quang.nguyen@est.tech> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/41a4833f368641218e444fdcff822039.security@1seal.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-03 15:31:17 -07:00
Nikolaos Gkarlis	7b735ef812	rtnetlink: add missing netlink_ns_capable() check for peer netns rtnl_newlink() lacks a CAP_NET_ADMIN capability check on the peer network namespace when creating paired devices (veth, vxcan, netkit). This allows an unprivileged user with a user namespace to create interfaces in arbitrary network namespaces, including init_net. Add a netlink_ns_capable() check for CAP_NET_ADMIN in the peer namespace before allowing device creation to proceed. Fixes: `81adee47df` ("net: Support specifying the network namespace upon device creation.") Signed-off-by: Nikolaos Gkarlis <nickgarlis@gmail.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20260402181432.4126920-1-nickgarlis@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-03 15:07:18 -07:00
Zijing Yin	1979645e18	bridge: guard local VLAN-0 FDB helpers against NULL vlan group When CONFIG_BRIDGE_VLAN_FILTERING is not set, br_vlan_group() and nbp_vlan_group() return NULL (br_private.h stub definitions). The BR_BOOLOPT_FDB_LOCAL_VLAN_0 toggle code is compiled unconditionally and reaches br_fdb_delete_locals_per_vlan_port() and br_fdb_insert_locals_per_vlan_port(), where the NULL vlan group pointer is dereferenced via list_for_each_entry(v, &vg->vlan_list, vlist). The observed crash is in the delete path, triggered when creating a bridge with IFLA_BR_MULTI_BOOLOPT containing BR_BOOLOPT_FDB_LOCAL_VLAN_0 via RTM_NEWLINK. The insert helper has the same bug pattern. Oops: general protection fault, probably for non-canonical address 0xdffffc0000000056: 0000 [#1] KASAN NOPTI KASAN: null-ptr-deref in range [0x00000000000002b0-0x00000000000002b7] RIP: 0010:br_fdb_delete_locals_per_vlan+0x2b9/0x310 Call Trace: br_fdb_toggle_local_vlan_0+0x452/0x4c0 br_toggle_fdb_local_vlan_0+0x31/0x80 net/bridge/br.c:276 br_boolopt_toggle net/bridge/br.c:313 br_boolopt_multi_toggle net/bridge/br.c:364 br_changelink net/bridge/br_netlink.c:1542 br_dev_newlink net/bridge/br_netlink.c:1575 Add NULL checks for the vlan group pointer in both helpers, returning early when there are no VLANs to iterate. This matches the existing pattern used by other bridge FDB functions such as br_fdb_add() and br_fdb_delete(). Fixes: `21446c06b4` ("net: bridge: Introduce UAPI for BR_BOOLOPT_FDB_LOCAL_VLAN_0") Signed-off-by: Zijing Yin <yzjaurora@gmail.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Link: https://patch.msgid.link/20260402140153.3925663-1-yzjaurora@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-03 14:45:51 -07:00
Eric Dumazet	4e65a8b8da	ipv6: ioam: fix potential NULL dereferences in __ioam6_fill_trace_data() We need to check __in6_dev_get() for possible NULL value, as suggested by Yiming Qian. Also add skb_dst_dev_rcu() instead of skb_dst_dev(), and two missing READ_ONCE(). Note that @dev can't be NULL. Fixes: `9ee11f0fff` ("ipv6: ioam: Data plane support for Pre-allocated Trace") Reported-by: Yiming Qian <yimingqian591@gmail.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Justin Iurman <justin.iurman@gmail.com> Link: https://patch.msgid.link/20260402101732.1188059-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-03 14:44:43 -07:00
Ruide Cao	c842743d07	net: sched: act_csum: validate nested VLAN headers tcf_csum_act() walks nested VLAN headers directly from skb->data when an skb still carries in-payload VLAN tags. The current code reads vlan->h_vlan_encapsulated_proto and then pulls VLAN_HLEN bytes without first ensuring that the full VLAN header is present in the linear area. If only part of an inner VLAN header is linearized, accessing h_vlan_encapsulated_proto reads past the linear area, and the following skb_pull(VLAN_HLEN) may violate skb invariants. Fix this by requiring pskb_may_pull(skb, VLAN_HLEN) before accessing and pulling each nested VLAN header. If the header still is not fully available, drop the packet through the existing error path. Fixes: `2ecba2d1e4` ("net: sched: act_csum: Fix csum calc for tagged packets") Reported-by: Yifan Wu <yifanwucs@gmail.com> Reported-by: Juefei Pu <tomapufckgml@gmail.com> Co-developed-by: Yuan Tan <yuantan098@gmail.com> Signed-off-by: Yuan Tan <yuantan098@gmail.com> Suggested-by: Xin Liu <bird@lzu.edu.cn> Tested-by: Ren Wei <enjou1224z@gmail.com> Signed-off-by: Ruide Cao <caoruide123@gmail.com> Signed-off-by: Ren Wei <n05ec@lzu.edu.cn> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/22df2fcb49f410203eafa5d97963dd36089f4ecf.1774892775.git.caoruide123@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-03 14:34:56 -07:00
Eric Dumazet	4e45337556	ipv6: avoid overflows in ip6_datagram_send_ctl() Yiming Qian reported : <quote> I believe I found a locally triggerable kernel bug in the IPv6 sendmsg ancillary-data path that can panic the kernel via `skb_under_panic()` (local DoS). The core issue is a mismatch between: - a 16-bit length accumulator (`struct ipv6_txoptions::opt_flen`, type `__u16`) and - a pointer to the last provided destination-options header (`opt->dst1opt`) when multiple `IPV6_DSTOPTS` control messages (cmsgs) are provided. - `include/net/ipv6.h`: - `struct ipv6_txoptions::opt_flen` is `__u16` (wrap possible). (lines 291-307, especially 298) - `net/ipv6/datagram.c:ip6_datagram_send_ctl()`: - Accepts repeated `IPV6_DSTOPTS` and accumulates into `opt_flen` without rejecting duplicates. (lines 909-933) - `net/ipv6/ip6_output.c:__ip6_append_data()`: - Uses `opt->opt_flen + opt->opt_nflen` to compute header sizes/headroom decisions. (lines 1448-1466, especially 1463-1465) - `net/ipv6/ip6_output.c:__ip6_make_skb()`: - Calls `ipv6_push_frag_opts()` if `opt->opt_flen` is non-zero. (lines 1930-1934) - `net/ipv6/exthdrs.c:ipv6_push_frag_opts()` / `ipv6_push_exthdr()`: - Push size comes from `ipv6_optlen(opt->dst1opt)` (based on the pointed-to header). (lines 1179-1185 and 1206-1211) 1. `opt_flen` is a 16-bit accumulator: - `include/net/ipv6.h:298` defines `__u16 opt_flen; /* after fragment hdr /`. 2. `ip6_datagram_send_ctl()` accepts repeated* `IPV6_DSTOPTS` cmsgs and increments `opt_flen` each time: - In `net/ipv6/datagram.c:909-933`, for `IPV6_DSTOPTS`: - It computes `len = ((hdr->hdrlen + 1) << 3);` - It checks `CAP_NET_RAW` using `ns_capable(net->user_ns, CAP_NET_RAW)`. (line 922) - Then it does: - `opt->opt_flen += len;` (line 927) - `opt->dst1opt = hdr;` (line 928) There is no duplicate rejection here (unlike the legacy `IPV6_2292DSTOPTS` path which rejects duplicates at `net/ipv6/datagram.c:901-904`). If enough large `IPV6_DSTOPTS` cmsgs are provided, `opt_flen` wraps while `dst1opt` still points to a large (2048-byte) destination-options header. In the attached PoC (`poc.c`): - 32 cmsgs with `hdrlen=255` => `len = (255+1)8 = 2048` - 1 cmsg with `hdrlen=0` => `len = 8` - Total increment: `322048 + 8 = 65544`, so `(__u16)opt_flen == 8` - The last cmsg is 2048 bytes, so `dst1opt` points to a 2048-byte header. 3. The transmit path sizes headers using the wrapped `opt_flen`: - In `net/ipv6/ip6_output.c:1463-1465`: - `headersize = sizeof(struct ipv6hdr) + (opt ? opt->opt_flen + opt->opt_nflen : 0) + ...;` With wrapped `opt_flen`, `headersize`/headroom decisions underestimate what will be pushed later. 4. When building the final skb, the actual push length comes from `dst1opt` and is not limited by wrapped `opt_flen`: - In `net/ipv6/ip6_output.c:1930-1934`: - `if (opt->opt_flen) proto = ipv6_push_frag_opts(skb, opt, proto);` - In `net/ipv6/exthdrs.c:1206-1211`, `ipv6_push_frag_opts()` pushes `dst1opt` via `ipv6_push_exthdr()`. - In `net/ipv6/exthdrs.c:1179-1184`, `ipv6_push_exthdr()` does: - `skb_push(skb, ipv6_optlen(opt));` - `memcpy(h, opt, ipv6_optlen(opt));` With insufficient headroom, `skb_push()` underflows and triggers `skb_under_panic()` -> `BUG()`: - `net/core/skbuff.c:2669-2675` (`skb_push()` calls `skb_under_panic()`) - `net/core/skbuff.c:207-214` (`skb_panic()` ends in `BUG()`) - The `IPV6_DSTOPTS` cmsg path requires `CAP_NET_RAW` in the target netns user namespace (`ns_capable(net->user_ns, CAP_NET_RAW)`). - Root (or any task with `CAP_NET_RAW`) can trigger this without user namespaces. - An unprivileged `uid=1000` user can trigger this if unprivileged user namespaces are enabled and it can create a userns+netns to obtain namespaced `CAP_NET_RAW` (the attached PoC does this). - Local denial of service: kernel BUG/panic (system crash). - Reproducible with a small userspace PoC. </quote> This patch does not reject duplicated options, as this might break some user applications. Instead, it makes sure to adjust opt_flen and opt_nflen to correctly reflect the size of the current option headers, preventing the overflows and the potential for panics. This applies to IPV6_DSTOPTS, IPV6_HOPOPTS, and IPV6_RTHDR. Specifically: When a new IPV6_DSTOPTS is processed, the length of the old opt->dst1opt is subtracted from opt->opt_flen before adding the new length. When a new IPV6_HOPOPTS is processed, the length of the old opt->dst0opt is subtracted from opt->opt_nflen. When a new Routing Header (IPV6_RTHDR or IPV6_2292RTHDR) is processed, the length of the old opt->srcrt is subtracted from opt->opt_nflen. In the special case within IPV6_2292RTHDR handling where dst1opt is moved to dst0opt, the length of the old opt->dst0opt is subtracted from opt->opt_nflen before the new one is added. Fixes: `333fad5364` ("[IPV6]: Support several new sockopt / ancillary data in Advanced API (RFC3542).") Reported-by: Yiming Qian <yimingqian591@gmail.com> Closes: https://lore.kernel.org/netdev/CAL_bE8JNzawgr5OX5m+3jnQDHry2XxhQT5=jThW1zDPtUikRYA@mail.gmail.com/ Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20260401154721.3740056-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-02 08:25:22 -07:00
Luka Gejak	2e3514e63b	net: hsr: fix VLAN add unwind on slave errors When vlan_vid_add() fails for a secondary slave, the error path calls vlan_vid_del() on the failing port instead of the peer slave that had already succeeded. This results in asymmetric VLAN state across the HSR pair. Fix this by switching to a centralized unwind path that removes the VID from any slave device that was already programmed. Fixes: `1a8a63a530` ("net: hsr: Add VLAN CTAG filter support") Signed-off-by: Luka Gejak <luka.gejak@linux.dev> Link: https://patch.msgid.link/20260401092243.52121-3-luka.gejak@linux.dev Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-02 08:23:49 -07:00
Luka Gejak	f5df2990c3	net: hsr: serialize seq_blocks merge across nodes During node merging, hsr_handle_sup_frame() walks node_curr->seq_blocks to update node_real without holding node_curr->seq_out_lock. This allows concurrent mutations from duplicate registration paths, risking inconsistent state or XArray/bitmap corruption. Fix this by locking both nodes' seq_out_lock during the merge. To prevent ABBA deadlocks, locks are acquired in order of memory address. Reviewed-by: Felix Maurer <fmaurer@redhat.com> Fixes: `415e636751` ("hsr: Implement more robust duplicate discard for PRP") Signed-off-by: Luka Gejak <luka.gejak@linux.dev> Link: https://patch.msgid.link/20260401092243.52121-2-luka.gejak@linux.dev Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-02 08:23:49 -07:00
Stefano Garzarella	b18c833888	vsock: initialize child_ns_mode_locked in vsock_net_init() The `child_ns_mode_locked` field lives in `struct net`, which persists across vsock module reloads. When the module is unloaded and reloaded, `vsock_net_init()` resets `mode` and `child_ns_mode` back to their default values, but does not reset `child_ns_mode_locked`. The stale lock from the previous module load causes subsequent writes to `child_ns_mode` to silently fail: `vsock_net_set_child_mode()` sees the old lock, skips updating the actual value, and returns success when the requested mode matches the stale lock. The sysctl handler reports no error, but `child_ns_mode` remains unchanged. Steps to reproduce: $ modprobe vsock $ echo local > /proc/sys/net/vsock/child_ns_mode $ cat /proc/sys/net/vsock/child_ns_mode local $ modprobe -r vsock $ modprobe vsock $ echo local > /proc/sys/net/vsock/child_ns_mode $ cat /proc/sys/net/vsock/child_ns_mode global <--- expected "local" Fix this by initializing `child_ns_mode_locked` to 0 (unlocked) in `vsock_net_init()`, so the write-once mechanism works correctly after module reload. Fixes: `102eab95f0` ("vsock: lock down child_ns_mode as write-once") Reported-by: Jin Liu <jinl@redhat.com> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Reviewed-by: Bobby Eshleman <bobbyeshleman@meta.com> Link: https://patch.msgid.link/20260401092153.28462-1-sgarzare@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-02 08:18:56 -07:00
Xiang Mei	1a280dd4bd	net/sched: cls_flow: fix NULL pointer dereference on shared blocks flow_change() calls tcf_block_q() and dereferences q->handle to derive a default baseclass. Shared blocks leave block->q NULL, causing a NULL deref when a flow filter without a fully qualified baseclass is created on a shared block. Check tcf_block_shared() before accessing block->q and return -EINVAL for shared blocks. This avoids the null-deref shown below: ======================================================================= KASAN: null-ptr-deref in range [0x0000000000000038-0x000000000000003f] RIP: 0010:flow_change (net/sched/cls_flow.c:508) Call Trace: tc_new_tfilter (net/sched/cls_api.c:2432) rtnetlink_rcv_msg (net/core/rtnetlink.c:6980) [...] ======================================================================= Fixes: `1abf272022` ("net: sched: tcindex, fw, flow: use tcf_block_q helper to get struct Qdisc") Reported-by: Weiming Shi <bestswngs@gmail.com> Signed-off-by: Xiang Mei <xmei5@asu.edu> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Link: https://patch.msgid.link/20260331050217.504278-2-xmei5@asu.edu Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2026-04-02 15:08:42 +02:00
Xiang Mei	faeea8bbf6	net/sched: cls_fw: fix NULL pointer dereference on shared blocks The old-method path in fw_classify() calls tcf_block_q() and dereferences q->handle. Shared blocks leave block->q NULL, causing a NULL deref when an empty cls_fw filter is attached to a shared block and a packet with a nonzero major skb mark is classified. Reject the configuration in fw_change() when the old method (no TCA_OPTIONS) is used on a shared block, since fw_classify()'s old-method path needs block->q which is NULL for shared blocks. The fixed null-ptr-deref calling stack: KASAN: null-ptr-deref in range [0x0000000000000038-0x000000000000003f] RIP: 0010:fw_classify (net/sched/cls_fw.c:81) Call Trace: tcf_classify (./include/net/tc_wrapper.h:197 net/sched/cls_api.c:1764 net/sched/cls_api.c:1860) tc_run (net/core/dev.c:4401) __dev_queue_xmit (net/core/dev.c:4535 net/core/dev.c:4790) Fixes: `1abf272022` ("net: sched: tcindex, fw, flow: use tcf_block_q helper to get struct Qdisc") Reported-by: Weiming Shi <bestswngs@gmail.com> Signed-off-by: Xiang Mei <xmei5@asu.edu> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Link: https://patch.msgid.link/20260331050217.504278-1-xmei5@asu.edu Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2026-04-02 15:08:41 +02:00
Martin Schiller	a1822cb524	net/x25: Fix overflow when accumulating packets Add a check to ensure that `x25_sock.fraglen` does not overflow. The `fraglen` also needs to be resetted when purging `fragment_queue` in `x25_clear_queues()`. Fixes: `1da177e4c3` ("Linux-2.6.12-rc2") Suggested-by: Yiming Qian <yimingqian591@gmail.com> Signed-off-by: Martin Schiller <ms@dev.tdt.de> Link: https://patch.msgid.link/20260331-x25_fraglen-v4-2-3e69f18464b4@dev.tdt.de Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2026-04-02 13:36:08 +02:00
Martin Schiller	d10a26aa4d	net/x25: Fix potential double free of skb When alloc_skb fails in x25_queue_rx_frame it calls kfree_skb(skb) at line 48 and returns 1 (error). This error propagates back through the call chain: x25_queue_rx_frame returns 1 \| v x25_state3_machine receives the return value 1 and takes the else branch at line 278, setting queued=0 and returning 0 \| v x25_process_rx_frame returns queued=0 \| v x25_backlog_rcv at line 452 sees queued=0 and calls kfree_skb(skb) again This would free the same skb twice. Looking at x25_backlog_rcv: net/x25/x25_in.c:x25_backlog_rcv() { ... queued = x25_process_rx_frame(sk, skb); ... if (!queued) kfree_skb(skb); } Fixes: `1da177e4c3` ("Linux-2.6.12-rc2") Signed-off-by: Martin Schiller <ms@dev.tdt.de> Link: https://patch.msgid.link/20260331-x25_fraglen-v4-1-3e69f18464b4@dev.tdt.de Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2026-04-02 13:36:08 +02:00
Yucheng Lu	d64cb81dcb	net/sched: sch_netem: fix out-of-bounds access in packet corruption In netem_enqueue(), the packet corruption logic uses get_random_u32_below(skb_headlen(skb)) to select an index for modifying skb->data. When an AF_PACKET TX_RING sends fully non-linear packets over an IPIP tunnel, skb_headlen(skb) evaluates to 0. Passing 0 to get_random_u32_below() takes the variable-ceil slow path which returns an unconstrained 32-bit random integer. Using this unconstrained value as an offset into skb->data results in an out-of-bounds memory access. Fix this by verifying skb_headlen(skb) is non-zero before attempting to corrupt the linear data area. Fully non-linear packets will silently bypass the corruption logic. Fixes: `c865e5d99e` ("[PKT_SCHED] netem: packet corruption option") Reported-by: Yifan Wu <yifanwucs@gmail.com> Reported-by: Juefei Pu <tomapufckgml@gmail.com> Signed-off-by: Yuan Tan <tanyuan98@outlook.com> Signed-off-by: Xin Liu <bird@lzu.edu.cn> Signed-off-by: Yuhang Zheng <z1652074432@gmail.com> Signed-off-by: Yucheng Lu <kanolyc@gmail.com> Reviewed-by: Stephen Hemminger <stephen@networkplumber.org> Link: https://patch.msgid.link/45435c0935df877853a81e6d06205ac738ec65fa.1774941614.git.kanolyc@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-01 19:24:20 -07:00
Jakub Kicinski	aba53ccf05	Merge tag 'nf-26-04-01' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf Pablo Neira Ayuso says: ==================== Netfilter fixes for net The following patchset contains Netfilter fixes for net. Note that most of the bugs fixed here are >5 years old. The large PR is not due to an increase in regressions. 1) Flowtable hardware offload support in IPv6 can lead to out-of-bounds when populating the rule action array when combined with double-tagged vlan. Bump the maximum number of actions from 16 to 24 and check that such limit is never reached, otherwise bail out. This bugs stems from the original flowtable hardware offload support. 2) nfnetlink_log does not include the netlink header size of the trailing NLMSG_DONE message when calculating the skb size. From Florian Westphal. 3) Reject names in xt_cgroup and xt_rateest extensions which are not nul-terminated. Also from Florian. 4) Use nla_strcmp in ipset lookup by set name, since IPSET_ATTR_NAME and IPSET_ATTR_NAMEREF are of NLA_STRING type. From Florian Westphal. 5) When unregistering conntrack helpers, pass the helper that is going away so the expectation cleanup is done accordingly, otherwise UaF is possible when accessing expectation that refer to the helper that is gone. From Qi Tang. 6) Zero expectation NAT fields to address leaking kernel memory through the expectation netlink dump when unset. Also from Qi Tang. 7) Use the master conntrack helper when creating expectations via ctnetlink, ignore the suggested helper through CTA_EXPECT_HELP_NAME. This allows to address a possible read of kernel memory off the expectation object boundary. 8) Fix incorrect release of the hash bucket logic in ipset when the bucket is empty, leading to shrinking the hash bucket to size 0 which deals to out-of-bound write in next element additions. From Yifan Wu. 9) Allow the use of x_tables extensions that explicitly declare NFPROTO_ARP support only. This is to avoid an incorrect hook number validation due to non-overlapping arp and inet hook number definitions. 10) Reject immediate NF_QUEUE verdict in nf_tables. The userspace nft tool always uses the nft_queue expression for queueing. This ensures this verdict cannot be used for the arp family, which does supported this. * tag 'nf-26-04-01' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf: netfilter: nf_tables: reject immediate NF_QUEUE verdict netfilter: x_tables: restrict xt_check_match/xt_check_target extensions for NFPROTO_ARP netfilter: ipset: drop logically empty buckets in mtype_del netfilter: ctnetlink: ignore explicit helper on new expectations netfilter: ctnetlink: zero expect NAT fields when CTA_EXPECT_NAT absent netfilter: nf_conntrack_helper: pass helper to expect cleanup netfilter: ipset: use nla_strcmp for IPSET_ATTR_NAME attr netfilter: x_tables: ensure names are nul-terminated netfilter: nfnetlink_log: account for netlink header size netfilter: flowtable: strictly check for maximum number of actions ==================== Link: https://patch.msgid.link/20260401103646.1015423-1-pablo@netfilter.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-01 19:19:36 -07:00
Jakub Kicinski	6d6be7070e	Merge tag 'for-net-2026-04-01' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth Luiz Augusto von Dentz says: ==================== bluetooth pull request for net: - hci_sync: Fix UAF in le_read_features_complete - hci_sync: call destroy in hci_cmd_sync_run if immediate - hci_sync: hci_cmd_sync_queue_once() return -EEXIST if exists - hci_sync: fix leaks when hci_cmd_sync_queue_once fails - hci_sync: fix stack buffer overflow in hci_le_big_create_sync - hci_conn: fix potential UAF in set_cig_params_sync - hci_event: fix potential UAF in hci_le_remote_conn_param_req_evt - hci_event: move wake reason storage into validated event handlers - SMP: force responder MITM requirements before building the pairing response - SMP: derive legacy responder STK authentication from MITM state - MGMT: validate LTK enc_size on load - MGMT: validate mesh send advertising payload length - SCO: fix race conditions in sco_sock_connect() - hci_h4: Fix race during initialization * tag 'for-net-2026-04-01' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth: Bluetooth: hci_sync: fix stack buffer overflow in hci_le_big_create_sync Bluetooth: SMP: derive legacy responder STK authentication from MITM state Bluetooth: SMP: force responder MITM requirements before building the pairing response Bluetooth: MGMT: validate mesh send advertising payload length Bluetooth: hci_event: fix potential UAF in hci_le_remote_conn_param_req_evt Bluetooth: hci_conn: fix potential UAF in set_cig_params_sync Bluetooth: MGMT: validate LTK enc_size on load Bluetooth: hci_h4: Fix race during initialization Bluetooth: hci_sync: Fix UAF in le_read_features_complete Bluetooth: hci_sync: fix leaks when hci_cmd_sync_queue_once fails Bluetooth: hci_sync: hci_cmd_sync_queue_once() return -EEXIST if exists Bluetooth: hci_event: move wake reason storage into validated event handlers Bluetooth: SCO: fix race conditions in sco_sock_connect() Bluetooth: hci_sync: call destroy in hci_cmd_sync_run if immediate ==================== Link: https://patch.msgid.link/20260401205834.2189162-1-luiz.dentz@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-01 19:12:23 -07:00
Weiming Shi	a54ecccfae	rds: ib: reject FRMR registration before IB connection is established rds_ib_get_mr() extracts the rds_ib_connection from conn->c_transport_data and passes it to rds_ib_reg_frmr() for FRWR memory registration. On a fresh outgoing connection, ic is allocated in rds_ib_conn_alloc() with i_cm_id = NULL because the connection worker has not yet called rds_ib_conn_path_connect() to create the rdma_cm_id. When sendmsg() with RDS_CMSG_RDMA_MAP is called on such a connection, the sendmsg path parses the control message before any connection establishment, allowing rds_ib_post_reg_frmr() to dereference ic->i_cm_id->qp and crash the kernel. The existing guard in rds_ib_reg_frmr() only checks for !ic (added in commit `9e630bcb77`), which does not catch this case since ic is allocated early and is always non-NULL once the connection object exists. KASAN: null-ptr-deref in range [0x0000000000000010-0x0000000000000017] RIP: 0010:rds_ib_post_reg_frmr+0x50e/0x920 Call Trace: rds_ib_post_reg_frmr (net/rds/ib_frmr.c:167) rds_ib_map_frmr (net/rds/ib_frmr.c:252) rds_ib_reg_frmr (net/rds/ib_frmr.c:430) rds_ib_get_mr (net/rds/ib_rdma.c:615) __rds_rdma_map (net/rds/rdma.c:295) rds_cmsg_rdma_map (net/rds/rdma.c:860) rds_sendmsg (net/rds/send.c:1363) ____sys_sendmsg do_syscall_64 Add a check in rds_ib_get_mr() that verifies ic, i_cm_id, and qp are all non-NULL before proceeding with FRMR registration, mirroring the guard already present in rds_ib_post_inv(). Return -ENODEV when the connection is not ready, which the existing error handling in rds_cmsg_send() converts to -EAGAIN for userspace retry and triggers rds_conn_connect_if_down() to start the connection worker. Fixes: `1659185fb4` ("RDS: IB: Support Fastreg MR (FRMR) memory registration mode") Reported-by: Xiang Mei <xmei5@asu.edu> Signed-off-by: Weiming Shi <bestswngs@gmail.com> Reviewed-by: Allison Henderson <achender@kernel.org> Link: https://patch.msgid.link/20260330163237.2752440-2-bestswngs@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-01 17:52:40 -07:00
Hangbin Liu	ffb5a4843c	ipv6: fix data race in fib6_metric_set() using cmpxchg fib6_metric_set() may be called concurrently from softirq context without holding the FIB table lock. A typical path is: ndisc_router_discovery() spin_unlock_bh(&table->tb6_lock) <- lock released fib6_metric_set(rt, RTAX_HOPLIMIT, ...) <- lockless call When two CPUs process Router Advertisement packets for the same router simultaneously, they can both arrive at fib6_metric_set() with the same fib6_info pointer whose fib6_metrics still points to dst_default_metrics. if (f6i->fib6_metrics == &dst_default_metrics) { /* both CPUs: true / struct dst_metrics p = kzalloc_obj(p, GFP_ATOMIC); refcount_set(&p->refcnt, 1); f6i->fib6_metrics = p; / CPU1 overwrites CPU0's p -> p0 leaked */ } The dst_metrics allocated by the losing CPU has refcnt=1 but no pointer to it anywhere in memory, producing a kmemleak report: unreferenced object 0xff1100025aca1400 (size 96): comm "softirq", pid 0, jiffies 4299271239 backtrace: kmalloc_trace+0x28a/0x380 fib6_metric_set+0xcd/0x180 ndisc_router_discovery+0x12dc/0x24b0 icmpv6_rcv+0xc16/0x1360 Fix this by: - Set val for p->metrics before published via cmpxchg() so the metrics value is ready before the pointer becomes visible to other CPUs. - Replace the plain pointer store with cmpxchg() and free the allocation safely when competition failed. - Add READ_ONCE()/WRITE_ONCE() for metrics[] setting in the non-default metrics path to prevent compiler-based data races. Fixes: `d4ead6b34b` ("net/ipv6: move metrics from dst to rt6_info") Reported-by: Fei Liu <feliu@redhat.com> Reviewed-by: Jiayuan Chen <jiayuan.chen@linux.dev> Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20260331-b4-fib6_metric_set-kmemleak-v3-1-88d27f4d8825@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-01 17:44:35 -07:00
hkbinbin	bc39a09473	Bluetooth: hci_sync: fix stack buffer overflow in hci_le_big_create_sync hci_le_big_create_sync() uses DEFINE_FLEX to allocate a struct hci_cp_le_big_create_sync on the stack with room for 0x11 (17) BIS entries. However, conn->num_bis can hold up to HCI_MAX_ISO_BIS (31) entries — validated against ISO_MAX_NUM_BIS (0x1f) in the caller hci_conn_big_create_sync(). When conn->num_bis is between 18 and 31, the memcpy that copies conn->bis into cp->bis writes up to 14 bytes past the stack buffer, corrupting adjacent stack memory. This is trivially reproducible: binding an ISO socket with bc_num_bis = ISO_MAX_NUM_BIS (31) and calling listen() will eventually trigger hci_le_big_create_sync() from the HCI command sync worker, causing a KASAN-detectable stack-out-of-bounds write: BUG: KASAN: stack-out-of-bounds in hci_le_big_create_sync+0x256/0x3b0 Write of size 31 at addr ffffc90000487b48 by task kworker/u9:0/71 Fix this by changing the DEFINE_FLEX count from the incorrect 0x11 to HCI_MAX_ISO_BIS, which matches the maximum number of BIS entries that conn->bis can actually carry. Fixes: `42ecf19471` ("Bluetooth: ISO: Do not emit LE BIG Create Sync if previous is pending") Cc: stable@vger.kernel.org Signed-off-by: hkbinbin <hkbinbinbin@gmail.com> Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>	2026-04-01 16:48:28 -04:00
Oleh Konko	20756fec2f	Bluetooth: SMP: derive legacy responder STK authentication from MITM state The legacy responder path in smp_random() currently labels the stored STK as authenticated whenever pending_sec_level is BT_SECURITY_HIGH. That reflects what the local service requested, not what the pairing flow actually achieved. For Just Works/Confirm legacy pairing, SMP_FLAG_MITM_AUTH stays clear and the resulting STK should remain unauthenticated even if the local side requested HIGH security. Use the established MITM state when storing the responder STK so the key metadata matches the pairing result. This also keeps the legacy path aligned with the Secure Connections code, which already treats JUST_WORKS/JUST_CFM as unauthenticated. Fixes: `fff3490f47` ("Bluetooth: Fix setting correct authentication information for SMP STK") Cc: stable@vger.kernel.org Signed-off-by: Oleh Konko <security@1seal.org> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>	2026-04-01 16:48:06 -04:00
Oleh Konko	d05111bfe3	Bluetooth: SMP: force responder MITM requirements before building the pairing response smp_cmd_pairing_req() currently builds the pairing response from the initiator auth_req before enforcing the local BT_SECURITY_HIGH requirement. If the initiator omits SMP_AUTH_MITM, the response can also omit it even though the local side still requires MITM. tk_request() then sees an auth value without SMP_AUTH_MITM and may select JUST_CFM, making method selection inconsistent with the pairing policy the responder already enforces. When the local side requires HIGH security, first verify that MITM can be achieved from the IO capabilities and then force SMP_AUTH_MITM in the response in both rsp.auth_req and auth. This keeps the responder auth bits and later method selection aligned. Fixes: `2b64d153a0` ("Bluetooth: Add MITM mechanism to LE-SMP") Cc: stable@vger.kernel.org Suggested-by: Luiz Augusto von Dentz <luiz.dentz@gmail.com> Signed-off-by: Oleh Konko <security@1seal.org> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>	2026-04-01 16:47:43 -04:00
Keenan Dong	bda93eec78	Bluetooth: MGMT: validate mesh send advertising payload length mesh_send() currently bounds MGMT_OP_MESH_SEND by total command length, but it never verifies that the bytes supplied for the flexible adv_data[] array actually match the embedded adv_data_len field. MGMT_MESH_SEND_SIZE only covers the fixed header, so a truncated command can still pass the existing 20..50 byte range check and later drive the async mesh send path past the end of the queued command buffer. Keep rejecting zero-length and oversized advertising payloads, but validate adv_data_len explicitly and require the command length to exactly match the flexible array size before queueing the request. Fixes: `b338d91703` ("Bluetooth: Implement support for Mesh") Reported-by: Keenan Dong <keenanat2000@gmail.com> Signed-off-by: Keenan Dong <keenanat2000@gmail.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>	2026-04-01 16:47:19 -04:00
Pauli Virtanen	b255531b27	Bluetooth: hci_event: fix potential UAF in hci_le_remote_conn_param_req_evt hci_conn lookup and field access must be covered by hdev lock in hci_le_remote_conn_param_req_evt, otherwise it's possible it is freed concurrently. Extend the hci_dev_lock critical section to cover all conn usage. Fixes: `95118dd4ed` ("Bluetooth: hci_event: Use of a function table to handle LE subevents") Signed-off-by: Pauli Virtanen <pav@iki.fi> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>	2026-04-01 16:46:55 -04:00
Pauli Virtanen	a2639a7f0f	Bluetooth: hci_conn: fix potential UAF in set_cig_params_sync hci_conn lookup and field access must be covered by hdev lock in set_cig_params_sync, otherwise it's possible it is freed concurrently. Take hdev lock to prevent hci_conn from being deleted or modified concurrently. Just RCU lock is not suitable here, as we also want to avoid "tearing" in the configuration. Fixes: `a091289218` ("Bluetooth: hci_conn: Fix hci_le_set_cig_params") Signed-off-by: Pauli Virtanen <pav@iki.fi> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>	2026-04-01 16:46:33 -04:00
Keenan Dong	b8dbe9648d	Bluetooth: MGMT: validate LTK enc_size on load Load Long Term Keys stores the user-provided enc_size and later uses it to size fixed-size stack operations when replying to LE LTK requests. An enc_size larger than the 16-byte key buffer can therefore overflow the reply stack buffer. Reject oversized enc_size values while validating the management LTK record so invalid keys never reach the stored key state. Fixes: `346af67b8d` ("Bluetooth: Add MGMT handlers for dealing with SMP LTK's") Reported-by: Keenan Dong <keenanat2000@gmail.com> Signed-off-by: Keenan Dong <keenanat2000@gmail.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>	2026-04-01 16:46:09 -04:00
Luiz Augusto von Dentz	035c25007c	Bluetooth: hci_sync: Fix UAF in le_read_features_complete This fixes the following backtrace caused by hci_conn being freed before le_read_features_complete but after hci_le_read_remote_features_sync so hci_conn_del -> hci_cmd_sync_dequeue is not able to prevent it: ================================================================== BUG: KASAN: slab-use-after-free in instrument_atomic_read_write include/linux/instrumented.h:96 [inline] BUG: KASAN: slab-use-after-free in atomic_dec_and_test include/linux/atomic/atomic-instrumented.h:1383 [inline] BUG: KASAN: slab-use-after-free in hci_conn_drop include/net/bluetooth/hci_core.h:1688 [inline] BUG: KASAN: slab-use-after-free in le_read_features_complete+0x5b/0x340 net/bluetooth/hci_sync.c:7344 Write of size 4 at addr ffff8880796b0010 by task kworker/u9:0/52 CPU: 0 UID: 0 PID: 52 Comm: kworker/u9:0 Not tainted syzkaller #0 PREEMPT(full) Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/25/2025 Workqueue: hci0 hci_cmd_sync_work Call Trace: <TASK> __dump_stack lib/dump_stack.c:94 [inline] dump_stack_lvl+0x116/0x1f0 lib/dump_stack.c:120 print_address_description mm/kasan/report.c:378 [inline] print_report+0xcd/0x630 mm/kasan/report.c:482 kasan_report+0xe0/0x110 mm/kasan/report.c:595 check_region_inline mm/kasan/generic.c:194 [inline] kasan_check_range+0x100/0x1b0 mm/kasan/generic.c:200 instrument_atomic_read_write include/linux/instrumented.h:96 [inline] atomic_dec_and_test include/linux/atomic/atomic-instrumented.h:1383 [inline] hci_conn_drop include/net/bluetooth/hci_core.h:1688 [inline] le_read_features_complete+0x5b/0x340 net/bluetooth/hci_sync.c:7344 hci_cmd_sync_work+0x1ff/0x430 net/bluetooth/hci_sync.c:334 process_one_work+0x9ba/0x1b20 kernel/workqueue.c:3257 process_scheduled_works kernel/workqueue.c:3340 [inline] worker_thread+0x6c8/0xf10 kernel/workqueue.c:3421 kthread+0x3c5/0x780 kernel/kthread.c:463 ret_from_fork+0x983/0xb10 arch/x86/kernel/process.c:158 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:246 </TASK> Allocated by task 5932: kasan_save_stack+0x33/0x60 mm/kasan/common.c:56 kasan_save_track+0x14/0x30 mm/kasan/common.c:77 poison_kmalloc_redzone mm/kasan/common.c:400 [inline] __kasan_kmalloc+0xaa/0xb0 mm/kasan/common.c:417 kmalloc_noprof include/linux/slab.h:957 [inline] kzalloc_noprof include/linux/slab.h:1094 [inline] __hci_conn_add+0xf8/0x1c70 net/bluetooth/hci_conn.c:963 hci_conn_add_unset+0x76/0x100 net/bluetooth/hci_conn.c:1084 le_conn_complete_evt+0x639/0x1f20 net/bluetooth/hci_event.c:5714 hci_le_enh_conn_complete_evt+0x23d/0x380 net/bluetooth/hci_event.c:5861 hci_le_meta_evt+0x357/0x5e0 net/bluetooth/hci_event.c:7408 hci_event_func net/bluetooth/hci_event.c:7716 [inline] hci_event_packet+0x685/0x11c0 net/bluetooth/hci_event.c:7773 hci_rx_work+0x2c9/0xeb0 net/bluetooth/hci_core.c:4076 process_one_work+0x9ba/0x1b20 kernel/workqueue.c:3257 process_scheduled_works kernel/workqueue.c:3340 [inline] worker_thread+0x6c8/0xf10 kernel/workqueue.c:3421 kthread+0x3c5/0x780 kernel/kthread.c:463 ret_from_fork+0x983/0xb10 arch/x86/kernel/process.c:158 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:246 Freed by task 5932: kasan_save_stack+0x33/0x60 mm/kasan/common.c:56 kasan_save_track+0x14/0x30 mm/kasan/common.c:77 __kasan_save_free_info+0x3b/0x60 mm/kasan/generic.c:587 kasan_save_free_info mm/kasan/kasan.h:406 [inline] poison_slab_object mm/kasan/common.c:252 [inline] __kasan_slab_free+0x5f/0x80 mm/kasan/common.c:284 kasan_slab_free include/linux/kasan.h:234 [inline] slab_free_hook mm/slub.c:2540 [inline] slab_free mm/slub.c:6663 [inline] kfree+0x2f8/0x6e0 mm/slub.c:6871 device_release+0xa4/0x240 drivers/base/core.c:2565 kobject_cleanup lib/kobject.c:689 [inline] kobject_release lib/kobject.c:720 [inline] kref_put include/linux/kref.h:65 [inline] kobject_put+0x1e7/0x590 lib/kobject.c:737 put_device drivers/base/core.c:3797 [inline] device_unregister+0x2f/0xc0 drivers/base/core.c:3920 hci_conn_del_sysfs+0xb4/0x180 net/bluetooth/hci_sysfs.c:79 hci_conn_cleanup net/bluetooth/hci_conn.c:173 [inline] hci_conn_del+0x657/0x1180 net/bluetooth/hci_conn.c:1234 hci_disconn_complete_evt+0x410/0xa00 net/bluetooth/hci_event.c:3451 hci_event_func net/bluetooth/hci_event.c:7719 [inline] hci_event_packet+0xa10/0x11c0 net/bluetooth/hci_event.c:7773 hci_rx_work+0x2c9/0xeb0 net/bluetooth/hci_core.c:4076 process_one_work+0x9ba/0x1b20 kernel/workqueue.c:3257 process_scheduled_works kernel/workqueue.c:3340 [inline] worker_thread+0x6c8/0xf10 kernel/workqueue.c:3421 kthread+0x3c5/0x780 kernel/kthread.c:463 ret_from_fork+0x983/0xb10 arch/x86/kernel/process.c:158 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:246 The buggy address belongs to the object at ffff8880796b0000 which belongs to the cache kmalloc-8k of size 8192 The buggy address is located 16 bytes inside of freed 8192-byte region [ffff8880796b0000, ffff8880796b2000) The buggy address belongs to the physical page: page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x796b0 head: order:3 mapcount:0 entire_mapcount:0 nr_pages_mapped:0 pincount:0 anon flags: 0xfff00000000040(head\|node=0\|zone=1\|lastcpupid=0x7ff) page_type: f5(slab) raw: 00fff00000000040 ffff88813ff27280 0000000000000000 0000000000000001 raw: 0000000000000000 0000000000020002 00000000f5000000 0000000000000000 head: 00fff00000000040 ffff88813ff27280 0000000000000000 0000000000000001 head: 0000000000000000 0000000000020002 00000000f5000000 0000000000000000 head: 00fff00000000003 ffffea0001e5ac01 00000000ffffffff 00000000ffffffff head: ffffffffffffffff 0000000000000000 00000000ffffffff 0000000000000008 page dumped because: kasan: bad access detected page_owner tracks the page as allocated page last allocated via order 3, migratetype Unmovable, gfp_mask 0xd2040(__GFP_IO\|__GFP_NOWARN\|__GFP_NORETRY\|__GFP_COMP\|__GFP_NOMEMALLOC), pid 5657, tgid 5657 (dhcpcd-run-hook), ts 79819636908, free_ts 79814310558 set_page_owner include/linux/page_owner.h:32 [inline] post_alloc_hook+0x1af/0x220 mm/page_alloc.c:1845 prep_new_page mm/page_alloc.c:1853 [inline] get_page_from_freelist+0xd0b/0x31a0 mm/page_alloc.c:3879 __alloc_frozen_pages_noprof+0x25f/0x2440 mm/page_alloc.c:5183 alloc_pages_mpol+0x1fb/0x550 mm/mempolicy.c:2416 alloc_slab_page mm/slub.c:3075 [inline] allocate_slab mm/slub.c:3248 [inline] new_slab+0x2c3/0x430 mm/slub.c:3302 ___slab_alloc+0xe18/0x1c90 mm/slub.c:4651 __slab_alloc.constprop.0+0x63/0x110 mm/slub.c:4774 __slab_alloc_node mm/slub.c:4850 [inline] slab_alloc_node mm/slub.c:5246 [inline] __kmalloc_cache_noprof+0x477/0x800 mm/slub.c:5766 kmalloc_noprof include/linux/slab.h:957 [inline] kzalloc_noprof include/linux/slab.h:1094 [inline] tomoyo_print_bprm security/tomoyo/audit.c:26 [inline] tomoyo_init_log+0xc8a/0x2140 security/tomoyo/audit.c:264 tomoyo_supervisor+0x302/0x13b0 security/tomoyo/common.c:2198 tomoyo_audit_env_log security/tomoyo/environ.c:36 [inline] tomoyo_env_perm+0x191/0x200 security/tomoyo/environ.c:63 tomoyo_environ security/tomoyo/domain.c:672 [inline] tomoyo_find_next_domain+0xec1/0x20b0 security/tomoyo/domain.c:888 tomoyo_bprm_check_security security/tomoyo/tomoyo.c:102 [inline] tomoyo_bprm_check_security+0x12d/0x1d0 security/tomoyo/tomoyo.c:92 security_bprm_check+0x1b9/0x1e0 security/security.c:794 search_binary_handler fs/exec.c:1659 [inline] exec_binprm fs/exec.c:1701 [inline] bprm_execve fs/exec.c:1753 [inline] bprm_execve+0x81e/0x1620 fs/exec.c:1729 do_execveat_common.isra.0+0x4a5/0x610 fs/exec.c:1859 page last free pid 5657 tgid 5657 stack trace: reset_page_owner include/linux/page_owner.h:25 [inline] free_pages_prepare mm/page_alloc.c:1394 [inline] __free_frozen_pages+0x7df/0x1160 mm/page_alloc.c:2901 discard_slab mm/slub.c:3346 [inline] __put_partials+0x130/0x170 mm/slub.c:3886 qlink_free mm/kasan/quarantine.c:163 [inline] qlist_free_all+0x4c/0xf0 mm/kasan/quarantine.c:179 kasan_quarantine_reduce+0x195/0x1e0 mm/kasan/quarantine.c:286 __kasan_slab_alloc+0x69/0x90 mm/kasan/common.c:352 kasan_slab_alloc include/linux/kasan.h:252 [inline] slab_post_alloc_hook mm/slub.c:4948 [inline] slab_alloc_node mm/slub.c:5258 [inline] __kmalloc_cache_noprof+0x274/0x800 mm/slub.c:5766 kmalloc_noprof include/linux/slab.h:957 [inline] tomoyo_print_header security/tomoyo/audit.c:156 [inline] tomoyo_init_log+0x197/0x2140 security/tomoyo/audit.c:255 tomoyo_supervisor+0x302/0x13b0 security/tomoyo/common.c:2198 tomoyo_audit_env_log security/tomoyo/environ.c:36 [inline] tomoyo_env_perm+0x191/0x200 security/tomoyo/environ.c:63 tomoyo_environ security/tomoyo/domain.c:672 [inline] tomoyo_find_next_domain+0xec1/0x20b0 security/tomoyo/domain.c:888 tomoyo_bprm_check_security security/tomoyo/tomoyo.c:102 [inline] tomoyo_bprm_check_security+0x12d/0x1d0 security/tomoyo/tomoyo.c:92 security_bprm_check+0x1b9/0x1e0 security/security.c:794 search_binary_handler fs/exec.c:1659 [inline] exec_binprm fs/exec.c:1701 [inline] bprm_execve fs/exec.c:1753 [inline] bprm_execve+0x81e/0x1620 fs/exec.c:1729 do_execveat_common.isra.0+0x4a5/0x610 fs/exec.c:1859 do_execve fs/exec.c:1933 [inline] __do_sys_execve fs/exec.c:2009 [inline] __se_sys_execve fs/exec.c:2004 [inline] __x64_sys_execve+0x8e/0xb0 fs/exec.c:2004 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline] do_syscall_64+0xcd/0xf80 arch/x86/entry/syscall_64.c:94 Memory state around the buggy address: ffff8880796aff00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc ffff8880796aff80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc >ffff8880796b0000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ^ ffff8880796b0080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ffff8880796b0100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ================================================================== Fixes: `a106e50be7` ("Bluetooth: HCI: Add support for LL Extended Feature Set") Reported-by: syzbot+87badbb9094e008e0685@syzkaller.appspotmail.com Tested-by: syzbot+87badbb9094e008e0685@syzkaller.appspotmail.com Closes: https://syzbot.org/bug?extid=87badbb9094e008e0685 Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Signed-off-by: Pauli Virtanen <pav@iki.fi>	2026-04-01 16:45:24 -04:00
Pauli Virtanen	aca377208e	Bluetooth: hci_sync: fix leaks when hci_cmd_sync_queue_once fails When hci_cmd_sync_queue_once() returns with error, the destroy callback will not be called. Fix leaking references / memory on these failures. Signed-off-by: Pauli Virtanen <pav@iki.fi> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>	2026-04-01 16:45:00 -04:00
Pauli Virtanen	2969554bcf	Bluetooth: hci_sync: hci_cmd_sync_queue_once() return -EEXIST if exists hci_cmd_sync_queue_once() needs to indicate whether a queue item was added, so caller can know if callbacks are called, so it can avoid leaking resources. Change the function to return -EEXIST if queue item already exists. Modify all callsites to handle that. Signed-off-by: Pauli Virtanen <pav@iki.fi> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>	2026-04-01 16:44:38 -04:00
Oleh Konko	2b2bf47cd7	Bluetooth: hci_event: move wake reason storage into validated event handlers hci_store_wake_reason() is called from hci_event_packet() immediately after stripping the HCI event header but before hci_event_func() enforces the per-event minimum payload length from hci_ev_table. This means a short HCI event frame can reach bacpy() before any bounds check runs. Rather than duplicating skb parsing and per-event length checks inside hci_store_wake_reason(), move wake-address storage into the individual event handlers after their existing event-length validation has succeeded. Convert hci_store_wake_reason() into a small helper that only stores an already-validated bdaddr while the caller holds hci_dev_lock(). Use the same helper after hci_event_func() with a NULL address to preserve the existing unexpected-wake fallback semantics when no validated event handler records a wake address. Annotate the helper with __must_hold(&hdev->lock) and add lockdep_assert_held(&hdev->lock) so future call paths keep the lock contract explicit. Call the helper from hci_conn_request_evt(), hci_conn_complete_evt(), hci_sync_conn_complete_evt(), le_conn_complete_evt(), hci_le_adv_report_evt(), hci_le_ext_adv_report_evt(), hci_le_direct_adv_report_evt(), hci_le_pa_sync_established_evt(), and hci_le_past_received_evt(). Fixes: `2f20216c1d` ("Bluetooth: Emit controller suspend and resume events") Cc: stable@vger.kernel.org Signed-off-by: Oleh Konko <security@1seal.org> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>	2026-04-01 16:44:15 -04:00
Cen Zhang	8a5b0135d4	Bluetooth: SCO: fix race conditions in sco_sock_connect() sco_sock_connect() checks sk_state and sk_type without holding the socket lock. Two concurrent connect() syscalls on the same socket can both pass the check and enter sco_connect(), leading to use-after-free. The buggy scenario involves three participants and was confirmed with additional logging instrumentation: Thread A (connect): HCI disconnect: Thread B (connect): sco_sock_connect(sk) sco_sock_connect(sk) sk_state==BT_OPEN sk_state==BT_OPEN (pass, no lock) (pass, no lock) sco_connect(sk): sco_connect(sk): hci_dev_lock hci_dev_lock hci_connect_sco <- blocked -> hcon1 sco_conn_add->conn1 lock_sock(sk) sco_chan_add: conn1->sk = sk sk->conn = conn1 sk_state=BT_CONNECT release_sock hci_dev_unlock hci_dev_lock sco_conn_del: lock_sock(sk) sco_chan_del: sk->conn=NULL conn1->sk=NULL sk_state= BT_CLOSED SOCK_ZAPPED release_sock hci_dev_unlock (unblocked) hci_connect_sco -> hcon2 sco_conn_add -> conn2 lock_sock(sk) sco_chan_add: sk->conn=conn2 sk_state= BT_CONNECT // zombie sk! release_sock hci_dev_unlock Thread B revives a BT_CLOSED + SOCK_ZAPPED socket back to BT_CONNECT. Subsequent cleanup triggers double sock_put() and use-after-free. Meanwhile conn1 is leaked as it was orphaned when sco_conn_del() cleared the association. Fix this by: - Moving lock_sock() before the sk_state/sk_type checks in sco_sock_connect() to serialize concurrent connect attempts - Fixing the sk_type != SOCK_SEQPACKET check to actually return the error instead of just assigning it - Adding a state re-check in sco_connect() after lock_sock() to catch state changes during the window between the locks - Adding sco_pi(sk)->conn check in sco_chan_add() to prevent double-attach of a socket to multiple connections - Adding hci_conn_drop() on sco_chan_add failure to prevent HCI connection leaks Fixes: `9a8ec9e8eb` ("Bluetooth: SCO: Fix possible circular locking dependency on sco_connect_cfm") Signed-off-by: Cen Zhang <zzzccc427@gmail.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>	2026-04-01 16:43:53 -04:00
Pauli Virtanen	a834a0b66e	Bluetooth: hci_sync: call destroy in hci_cmd_sync_run if immediate hci_cmd_sync_run() may run the work immediately if called from existing sync work (otherwise it queues a new sync work). In this case it fails to call the destroy() function. On immediate run, make it behave same way as if item was queued successfully: call destroy, and return 0. The only callsite is hci_abort_conn() via hci_cmd_sync_run_once(), and this changes its return value. However, its return value is not used except as the return value for hci_disconnect(), and nothing uses the return value of hci_disconnect(). Hence there should be no behavior change anywhere. Fixes: `c898f6d7b0` ("Bluetooth: hci_sync: Introduce hci_cmd_sync_run/hci_cmd_sync_run_once") Signed-off-by: Pauli Virtanen <pav@iki.fi> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>	2026-04-01 16:39:43 -04:00
Pablo Neira Ayuso	da107398cb	netfilter: nf_tables: reject immediate NF_QUEUE verdict nft_queue is always used from userspace nftables to deliver the NF_QUEUE verdict. Immediately emitting an NF_QUEUE verdict is never used by the userspace nft tools, so reject immediate NF_QUEUE verdicts. The arp family does not provide queue support, but such an immediate verdict is still reachable. Globally reject NF_QUEUE immediate verdicts to address this issue. Fixes: `f342de4e2f` ("netfilter: nf_tables: reject QUEUE/DROP verdict parameters") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2026-04-01 11:55:30 +02:00
Pablo Neira Ayuso	3d5d488f11	netfilter: x_tables: restrict xt_check_match/xt_check_target extensions for NFPROTO_ARP Weiming Shi says: xt_match and xt_target structs registered with NFPROTO_UNSPEC can be loaded by any protocol family through nft_compat. When such a match/target sets .hooks to restrict which hooks it may run on, the bitmask uses NF_INET_* constants. This is only correct for families whose hook layout matches NF_INET_*: IPv4, IPv6, INET, and bridge all share the same five hooks (PRE_ROUTING ... POST_ROUTING). ARP only has three hooks (IN=0, OUT=1, FORWARD=2) with different semantics. Because NF_ARP_OUT == 1 == NF_INET_LOCAL_IN, the .hooks validation silently passes for the wrong reasons, allowing matches to run on ARP chains where the hook assumptions (e.g. state->in being set on input hooks) do not hold. This leads to NULL pointer dereferences; xt_devgroup is one concrete example: Oops: general protection fault, probably for non-canonical address 0xdffffc0000000044: 0000 [#1] SMP KASAN NOPTI KASAN: null-ptr-deref in range [0x0000000000000220-0x0000000000000227] RIP: 0010:devgroup_mt+0xff/0x350 Call Trace: <TASK> nft_match_eval (net/netfilter/nft_compat.c:407) nft_do_chain (net/netfilter/nf_tables_core.c:285) nft_do_chain_arp (net/netfilter/nft_chain_filter.c:61) nf_hook_slow (net/netfilter/core.c:623) arp_xmit (net/ipv4/arp.c:666) </TASK> Kernel panic - not syncing: Fatal exception in interrupt Fix it by restricting arptables to NFPROTO_ARP extensions only. Note that arptables-legacy only supports: - arpt_CLASSIFY - arpt_mangle - arpt_MARK that provide explicit NFPROTO_ARP match/target declarations. Fixes: `9291747f11` ("netfilter: xtables: add device group match") Reported-by: Xiang Mei <xmei5@asu.edu> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2026-04-01 11:55:29 +02:00
Yifan Wu	9862ef9ab0	netfilter: ipset: drop logically empty buckets in mtype_del mtype_del() counts empty slots below n->pos in k, but it only drops the bucket when both n->pos and k are zero. This misses buckets whose live entries have all been removed while n->pos still points past deleted slots. Treat a bucket as empty when all positions below n->pos are unused and release it directly instead of shrinking it further. Fixes: `8af1c6fbd9` ("netfilter: ipset: Fix forceadd evaluation path") Cc: stable@vger.kernel.org Reported-by: Juefei Pu <tomapufckgml@gmail.com> Reported-by: Xin Liu <dstsmallbird@foxmail.com> Signed-off-by: Yifan Wu <yifanwucs@gmail.com> Co-developed-by: Yuan Tan <yuantan098@gmail.com> Signed-off-by: Yuan Tan <yuantan098@gmail.com> Reviewed-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2026-04-01 11:55:29 +02:00
Pablo Neira Ayuso	917b61fa20	netfilter: ctnetlink: ignore explicit helper on new expectations Use the existing master conntrack helper, anything else is not really supported and it just makes validation more complicated, so just ignore what helper userspace suggests for this expectation. This was uncovered when validating CTA_EXPECT_CLASS via different helper provided by userspace than the existing master conntrack helper: BUG: KASAN: slab-out-of-bounds in nf_ct_expect_related_report+0x2479/0x27c0 Read of size 4 at addr ffff8880043fe408 by task poc/102 Call Trace: nf_ct_expect_related_report+0x2479/0x27c0 ctnetlink_create_expect+0x22b/0x3b0 ctnetlink_new_expect+0x4bd/0x5c0 nfnetlink_rcv_msg+0x67a/0x950 netlink_rcv_skb+0x120/0x350 Allowing to read kernel memory bytes off the expectation boundary. CTA_EXPECT_HELP_NAME is still used to offer the helper name to userspace via netlink dump. Fixes: `bd07793705` ("netfilter: nfnetlink_queue: allow to attach expectations to conntracks") Reported-by: Qi Tang <tpluszz77@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2026-04-01 11:55:29 +02:00
Qi Tang	35177c6877	netfilter: ctnetlink: zero expect NAT fields when CTA_EXPECT_NAT absent ctnetlink_alloc_expect() allocates expectations from a non-zeroing slab cache via nf_ct_expect_alloc(). When CTA_EXPECT_NAT is not present in the netlink message, saved_addr and saved_proto are never initialized. Stale data from a previous slab occupant can then be dumped to userspace by ctnetlink_exp_dump_expect(), which checks these fields to decide whether to emit CTA_EXPECT_NAT. The safe sibling nf_ct_expect_init(), used by the packet path, explicitly zeroes these fields. Zero saved_addr, saved_proto and dir in the else branch, guarded by IS_ENABLED(CONFIG_NF_NAT) since these fields only exist when NAT is enabled. Confirmed by priming the expect slab with NAT-bearing expectations, freeing them, creating a new expectation without CTA_EXPECT_NAT, and observing that the ctnetlink dump emits a spurious CTA_EXPECT_NAT containing stale data from the prior allocation. Fixes: `076a0ca026` ("netfilter: ctnetlink: add NAT support for expectations") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Qi Tang <tpluszz77@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2026-04-01 11:55:29 +02:00
Qi Tang	a242a9ae58	netfilter: nf_conntrack_helper: pass helper to expect cleanup nf_conntrack_helper_unregister() calls nf_ct_expect_iterate_destroy() to remove expectations belonging to the helper being unregistered. However, it passes NULL instead of the helper pointer as the data argument, so expect_iter_me() never matches any expectation and all of them survive the cleanup. After unregister returns, nfnl_cthelper_del() frees the helper object immediately. Subsequent expectation dumps or packet-driven init_conntrack() calls then dereference the freed exp->helper, causing a use-after-free. Pass the actual helper pointer so expectations referencing it are properly destroyed before the helper object is freed. BUG: KASAN: slab-use-after-free in string+0x38f/0x430 Read of size 1 at addr ffff888003b14d20 by task poc/103 Call Trace: string+0x38f/0x430 vsnprintf+0x3cc/0x1170 seq_printf+0x17a/0x240 exp_seq_show+0x2e5/0x560 seq_read_iter+0x419/0x1280 proc_reg_read+0x1ac/0x270 vfs_read+0x179/0x930 ksys_read+0xef/0x1c0 Freed by task 103: The buggy address is located 32 bytes inside of freed 192-byte region [ffff888003b14d00, ffff888003b14dc0) Fixes: `ac7b848390` ("netfilter: expect: add and use nf_ct_expect_iterate helpers") Signed-off-by: Qi Tang <tpluszz77@gmail.com> Reviewed-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2026-04-01 11:55:29 +02:00
Florian Westphal	b7e8590987	netfilter: ipset: use nla_strcmp for IPSET_ATTR_NAME attr IPSET_ATTR_NAME and IPSET_ATTR_NAMEREF are of NLA_STRING type, they cannot be treated like a c-string. They either have to be switched to NLA_NUL_STRING, or the compare operations need to use the nla functions. Fixes: `f830837f0e` ("netfilter: ipset: list:set set type support") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2026-04-01 11:55:29 +02:00
Florian Westphal	a958a4f90d	netfilter: x_tables: ensure names are nul-terminated Reject names that lack a \0 character before feeding them to functions that expect c-strings. Fixes tag is the most recent commit that needs this change. Fixes: `c38c4597e4` ("netfilter: implement xt_cgroup cgroup2 path match") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2026-04-01 11:55:29 +02:00
Florian Westphal	6d52a4a052	netfilter: nfnetlink_log: account for netlink header size This is a followup to an old bug fix: NLMSG_DONE needs to account for the netlink header size, not just the attribute size. This can result in a WARN splat + drop of the netlink message, but other than this there are no ill effects. Fixes: `9dfa1dfe4d` ("netfilter: nf_log: account for size of NLMSG_DONE attribute") Reported-by: Yiming Qian <yimingqian591@gmail.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2026-04-01 11:55:29 +02:00
Pablo Neira Ayuso	76522fcdbc	netfilter: flowtable: strictly check for maximum number of actions The maximum number of flowtable hardware offload actions in IPv6 is: * ethernet mangling (4 payload actions, 2 for each ethernet address) * SNAT (4 payload actions) * DNAT (4 payload actions) * Double VLAN (4 vlan actions, 2 for popping vlan, and 2 for pushing) for QinQ. * Redirect (1 action) Which makes 17, while the maximum is 16. But act_ct supports for tunnels actions too. Note that payload action operates at 32-bit word level, so mangling an IPv6 address takes 4 payload actions. Update flow_action_entry_next() calls to check for the maximum number of supported actions. While at it, rise the maximum number of actions per flow from 16 to 24 so this works fine with IPv6 setups. Fixes: `c29f74e0df` ("netfilter: nf_flow_table: hardware offload support") Reported-by: Hyunwoo Kim <imv4bel@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2026-04-01 11:50:14 +02:00
Li Xiasong	5dd8025a49	mptcp: fix soft lockup in mptcp_recvmsg() syzbot reported a soft lockup in mptcp_recvmsg() [0]. When receiving data with MSG_PEEK \| MSG_WAITALL flags, the skb is not removed from the sk_receive_queue. This causes sk_wait_data() to always find available data and never perform actual waiting, leading to a soft lockup. Fix this by adding a 'last' parameter to track the last peeked skb. This allows sk_wait_data() to make informed waiting decisions and prevent infinite loops when MSG_PEEK is used. [0]: watchdog: BUG: soft lockup - CPU#2 stuck for 156s! [server:1963] Modules linked in: CPU: 2 UID: 0 PID: 1963 Comm: server Not tainted 6.19.0-rc8 #61 PREEMPT(none) Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014 RIP: 0010:sk_wait_data+0x15/0x190 Code: 80 00 00 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa 41 56 41 55 41 54 49 89 f4 55 48 89 d5 53 48 89 fb <48> 83 ec 30 65 48 8b 05 17 a4 6b 01 48 89 44 24 28 31 c0 65 48 8b RSP: 0018:ffffc90000603ca0 EFLAGS: 00000246 RAX: 0000000000000000 RBX: ffff888102bf0800 RCX: 0000000000000001 RDX: 0000000000000000 RSI: ffffc90000603d18 RDI: ffff888102bf0800 RBP: 0000000000000000 R08: 0000000000000002 R09: 0000000000000101 R10: 0000000000000000 R11: 0000000000000075 R12: ffffc90000603d18 R13: ffff888102bf0800 R14: ffff888102bf0800 R15: 0000000000000000 FS: 00007f6e38b8c4c0(0000) GS:ffff8881b877e000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000055aa7bff1680 CR3: 0000000105cbe000 CR4: 00000000000006f0 Call Trace: <TASK> mptcp_recvmsg+0x547/0x8c0 net/mptcp/protocol.c:2329 inet_recvmsg+0x11f/0x130 net/ipv4/af_inet.c:891 sock_recvmsg+0x94/0xc0 net/socket.c:1100 __sys_recvfrom+0xb2/0x130 net/socket.c:2256 __x64_sys_recvfrom+0x1f/0x30 net/socket.c:2267 do_syscall_64+0x59/0x2d0 arch/x86/entry/syscall_64.c:94 entry_SYSCALL_64_after_hwframe+0x76/0x7e arch/x86/entry/entry_64.S:131 RIP: 0033:0x7f6e386a4a1d Code: 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 8d 05 f1 de 2c 00 41 89 ca 8b 00 85 c0 75 20 45 31 c9 45 31 c0 b8 2d 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 6b f3 c3 66 0f 1f 84 00 00 00 00 00 41 56 41 RSP: 002b:00007ffc3c4bb078 EFLAGS: 00000246 ORIG_RAX: 000000000000002d RAX: ffffffffffffffda RBX: 000000000000861e RCX: 00007f6e386a4a1d RDX: 00000000000003ff RSI: 00007ffc3c4bb150 RDI: 0000000000000004 RBP: 00007ffc3c4bb570 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000103 R11: 0000000000000246 R12: 00005605dbc00be0 R13: 00007ffc3c4bb650 R14: 0000000000000000 R15: 0000000000000000 </TASK> Fixes: `8e04ce45a8` ("mptcp: fix MSG_PEEK stream corruption") Signed-off-by: Li Xiasong <lixiasong1@huawei.com> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20260330120335.659027-1-lixiasong1@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-03-31 18:58:37 -07:00
Zhengchuan Liang	9ca562bb8e	net: ipv6: flowlabel: defer exclusive option free until RCU teardown `ip6fl_seq_show()` walks the global flowlabel hash under the seq-file RCU read-side lock and prints `fl->opt->opt_nflen` when an option block is present. Exclusive flowlabels currently free `fl->opt` as soon as `fl->users` drops to zero in `fl_release()`. However, the surrounding `struct ip6_flowlabel` remains visible in the global hash table until later garbage collection removes it and `fl_free_rcu()` finally tears it down. A concurrent `/proc/net/ip6_flowlabel` reader can therefore race that early `kfree()` and dereference freed option state, triggering a crash in `ip6fl_seq_show()`. Fix this by keeping `fl->opt` alive until `fl_free_rcu()`. That matches the lifetime already required for the enclosing flowlabel while readers can still reach it under RCU. Fixes: `d3aedd5ebd` ("ipv6 flowlabel: Convert hash list to RCU.") Reported-by: Yifan Wu <yifanwucs@gmail.com> Reported-by: Juefei Pu <tomapufckgml@gmail.com> Co-developed-by: Yuan Tan <yuantan098@gmail.com> Signed-off-by: Yuan Tan <yuantan098@gmail.com> Suggested-by: Xin Liu <bird@lzu.edu.cn> Tested-by: Ren Wei <enjou1224z@gmail.com> Signed-off-by: Zhengchuan Liang <zcliangcn@gmail.com> Signed-off-by: Ren Wei <n05ec@lzu.edu.cn> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/07351f0ec47bcee289576f39f9354f4a64add6e4.1774855883.git.zcliangcn@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-03-31 15:44:29 -07:00
Xiang Mei	fa6e249633	bridge: mrp: reject zero test interval to avoid OOM panic br_mrp_start_test() and br_mrp_start_in_test() accept the user-supplied interval value from netlink without validation. When interval is 0, usecs_to_jiffies(0) yields 0, causing the delayed work (br_mrp_test_work_expired / br_mrp_in_test_work_expired) to reschedule itself with zero delay. This creates a tight loop on system_percpu_wq that allocates and transmits MRP test frames at maximum rate, exhausting all system memory and causing a kernel panic via OOM deadlock. The same zero-interval issue applies to br_mrp_start_in_test_parse() for interconnect test frames. Use NLA_POLICY_MIN(NLA_U32, 1) in the nla_policy tables for both IFLA_BRIDGE_MRP_START_TEST_INTERVAL and IFLA_BRIDGE_MRP_START_IN_TEST_INTERVAL, so zero is rejected at the netlink attribute parsing layer before the value ever reaches the workqueue scheduling code. This is consistent with how other bridge subsystems (br_fdb, br_mst) enforce range constraints on netlink attributes. Fixes: `20f6a05ef6` ("bridge: mrp: Rework the MRP netlink interface") Fixes: `7ab1748e4c` ("bridge: mrp: Extend MRP netlink interface for configuring MRP interconnect") Reported-by: Weiming Shi <bestswngs@gmail.com> Signed-off-by: Xiang Mei <xmei5@asu.edu> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Link: https://patch.msgid.link/20260328063000.1845376-1-xmei5@asu.edu Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2026-03-31 16:11:24 +02:00
Yochai Eisenrich	e6e3eb5ee8	net: sched: cls_api: fix tc_chain_fill_node to initialize tcm_info to zero to prevent an info-leak When building netlink messages, tc_chain_fill_node() never initializes the tcm_info field of struct tcmsg. Since the allocation is not zeroed, kernel heap memory is leaked to userspace through this 4-byte field. The fix simply zeroes tcm_info alongside the other fields that are already initialized. Fixes: `32a4f5ecd7` ("net: sched: introduce chain object to uapi") Signed-off-by: Yochai Eisenrich <echelonh@gmail.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Link: https://patch.msgid.link/20260328211436.1010152-1-echelonh@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-03-30 17:56:40 -07:00
Guoyu Su	ddc748a391	net: use skb_header_pointer() for TCPv4 GSO frag_off check Syzbot reported a KMSAN uninit-value warning in gso_features_check() called from netif_skb_features() [1]. gso_features_check() reads iph->frag_off to decide whether to clear mangleid_features. Accessing the IPv4 header via ip_hdr()/inner_ip_hdr() can rely on skb header offsets that are not always safe for direct dereference on packets injected from PF_PACKET paths. Use skb_header_pointer() for the TCPv4 frag_off check so the header read is robust whether data is already linear or needs copying. [1] https://syzkaller.appspot.com/bug?extid=1543a7d954d9c6d00407 Link: https://lore.kernel.org/netdev/willemdebruijn.kernel.1a9f35039caab@gmail.com/ Fixes: `cbc53e08a7` ("GSO: Add GSO type for fixed IPv4 ID") Reported-by: syzbot+1543a7d954d9c6d00407@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=1543a7d954d9c6d00407 Tested-by: syzbot+1543a7d954d9c6d00407@syzkaller.appspotmail.com Signed-off-by: Guoyu Su <yss2813483011xxl@gmail.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20260327153507.39742-1-yss2813483011xxl@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-03-30 17:35:21 -07:00
Paolo Abeni	fd63f18597	ipv6: prevent possible UaF in addrconf_permanent_addr() The mentioned helper try to warn the user about an exceptional condition, but the message is delivered too late, accessing the ipv6 after its possible deletion. Reorder the statement to avoid the possible UaF; while at it, place the warning outside the idev->lock as it needs no protection. Reported-by: Jakub Kicinski <kuba@kernel.org> Closes: https://sashiko.dev/#/patchset/8c8bfe2e1a324e501f0e15fef404a77443fd8caf.1774365668.git.pabeni%40redhat.com Fixes: `f1705ec197` ("net: ipv6: Make address flushing on ifdown optional") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Link: https://patch.msgid.link/ef973c3a8cb4f8f1787ed469f3e5391b9fe95aa0.1774601542.git.pabeni@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-03-30 17:25:10 -07:00

1 2 3 4 5 ...

83434 Commits