summaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)Author
2022-11-02net, neigh: Fix null-ptr-deref in neigh_table_clear()Chen Zhongjin
When IPv6 module gets initialized but hits an error in the middle, kenel panic with: KASAN: null-ptr-deref in range [0x0000000000000598-0x000000000000059f] CPU: 1 PID: 361 Comm: insmod Hardware name: QEMU Standard PC (i440FX + PIIX, 1996) RIP: 0010:__neigh_ifdown.isra.0+0x24b/0x370 RSP: 0018:ffff888012677908 EFLAGS: 00000202 ... Call Trace: <TASK> neigh_table_clear+0x94/0x2d0 ndisc_cleanup+0x27/0x40 [ipv6] inet6_init+0x21c/0x2cb [ipv6] do_one_initcall+0xd3/0x4d0 do_init_module+0x1ae/0x670 ... Kernel panic - not syncing: Fatal exception When ipv6 initialization fails, it will try to cleanup and calls: neigh_table_clear() neigh_ifdown(tbl, NULL) pneigh_queue_purge(&tbl->proxy_queue, dev_net(dev == NULL)) # dev_net(NULL) triggers null-ptr-deref. Fix it by passing NULL to pneigh_queue_purge() in neigh_ifdown() if dev is NULL, to make kernel not panic immediately. Fixes: 66ba215cb513 ("neigh: fix possible DoS due to net iface start/stop loop") Signed-off-by: Chen Zhongjin <chenzhongjin@huawei.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Denis V. Lunev <den@openvz.org> Link: https://lore.kernel.org/r/20221101121552.21890-1-chenzhongjin@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-11-02net/smc: Fix possible leaked pernet namespace in smc_init()Chen Zhongjin
In smc_init(), register_pernet_subsys(&smc_net_stat_ops) is called without any error handling. If it fails, registering of &smc_net_ops won't be reverted. And if smc_nl_init() fails, &smc_net_stat_ops itself won't be reverted. This leaves wild ops in subsystem linkedlist and when another module tries to call register_pernet_operations() it triggers page fault: BUG: unable to handle page fault for address: fffffbfff81b964c RIP: 0010:register_pernet_operations+0x1b9/0x5f0 Call Trace: <TASK> register_pernet_subsys+0x29/0x40 ebtables_init+0x58/0x1000 [ebtables] ... Fixes: 194730a9beb5 ("net/smc: Make SMC statistics network namespace aware") Signed-off-by: Chen Zhongjin <chenzhongjin@huawei.com> Reviewed-by: Tony Lu <tonylu@linux.alibaba.com> Reviewed-by: Wenjia Zhang <wenjia@linux.ibm.com> Link: https://lore.kernel.org/r/20221101093722.127223-1-chenzhongjin@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-11-02Merge git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nfJakub Kicinski
Pablo Neira Ayuso says: ==================== Netfilter/IPVS fixes for net 1) netlink socket notifier might win race to release objects that are already pending to be released via commit release path, reported by syzbot. 2) No need to postpone flow rule release to commit release path, this triggered the syzbot report, complementary fix to previous patch. 3) Use explicit signed chars in IPVS to unbreak arm, from Jason A. Donenfeld. 4) Missing check for proc entry creation failure in IPVS, from Zhengchao Shao. 5) Incorrect error path handling when BPF NAT fails to register, from Chen Zhongjin. 6) Prevent huge memory allocation in ipset hash types, from Jozsef Kadlecsik. Except the incorrect BPF NAT error path which is broken in 6.1-rc, anything else has been broken for several releases. * git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf: netfilter: ipset: enforce documented limit to prevent allocating huge memory netfilter: nf_nat: Fix possible memory leak in nf_nat_init() ipvs: fix WARNING in ip_vs_app_net_cleanup() ipvs: fix WARNING in __ip_vs_cleanup_batch() ipvs: use explicitly signed chars netfilter: nf_tables: release flow rule object from commit path netfilter: nf_tables: netlink notifier might race to release objects ==================== Link: https://lore.kernel.org/r/20221102184659.2502-1-pablo@netfilter.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-11-02Bluetooth: L2CAP: Fix attempting to access uninitialized memoryLuiz Augusto von Dentz
On l2cap_parse_conf_req the variable efs is only initialized if remote_efs has been set. CVE: CVE-2022-42895 CC: stable@vger.kernel.org Reported-by: Tamás Koczka <poprdi@google.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Reviewed-by: Tedd Ho-Jeong An <tedd.an@intel.com>
2022-11-02Bluetooth: L2CAP: Fix l2cap_global_chan_by_psmLuiz Augusto von Dentz
l2cap_global_chan_by_psm shall not return fixed channels as they are not meant to be connected by (S)PSM. Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Reviewed-by: Tedd Ho-Jeong An <tedd.an@intel.com>
2022-11-02Bluetooth: L2CAP: Fix accepting connection request for invalid SPSMLuiz Augusto von Dentz
The Bluetooth spec states that the valid range for SPSM is from 0x0001-0x00ff so it is invalid to accept values outside of this range: BLUETOOTH CORE SPECIFICATION Version 5.3 | Vol 3, Part A page 1059: Table 4.15: L2CAP_LE_CREDIT_BASED_CONNECTION_REQ SPSM ranges CVE: CVE-2022-42896 CC: stable@vger.kernel.org Reported-by: Tamás Koczka <poprdi@google.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Reviewed-by: Tedd Ho-Jeong An <tedd.an@intel.com>
2022-11-02Bluetooth: hci_conn: Fix not restoring ISO buffer count on disconnectLuiz Augusto von Dentz
When disconnecting an ISO link the controller may not generate HCI_EV_NUM_COMP_PKTS for unacked packets which needs to be restored in hci_conn_del otherwise the host would assume they are still in use and would not be able to use all the buffers available. Fixes: 26afbd826ee3 ("Bluetooth: Add initial implementation of CIS connections") Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Tested-by: Frédéric Danis <frederic.danis@collabora.com>
2022-11-02Bluetooth: L2CAP: Fix memory leak in vhci_writeHawkins Jiawei
Syzkaller reports a memory leak as follows: ==================================== BUG: memory leak unreferenced object 0xffff88810d81ac00 (size 240): [...] hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<ffffffff838733d9>] __alloc_skb+0x1f9/0x270 net/core/skbuff.c:418 [<ffffffff833f742f>] alloc_skb include/linux/skbuff.h:1257 [inline] [<ffffffff833f742f>] bt_skb_alloc include/net/bluetooth/bluetooth.h:469 [inline] [<ffffffff833f742f>] vhci_get_user drivers/bluetooth/hci_vhci.c:391 [inline] [<ffffffff833f742f>] vhci_write+0x5f/0x230 drivers/bluetooth/hci_vhci.c:511 [<ffffffff815e398d>] call_write_iter include/linux/fs.h:2192 [inline] [<ffffffff815e398d>] new_sync_write fs/read_write.c:491 [inline] [<ffffffff815e398d>] vfs_write+0x42d/0x540 fs/read_write.c:578 [<ffffffff815e3cdd>] ksys_write+0x9d/0x160 fs/read_write.c:631 [<ffffffff845e0645>] do_syscall_x64 arch/x86/entry/common.c:50 [inline] [<ffffffff845e0645>] do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80 [<ffffffff84600087>] entry_SYSCALL_64_after_hwframe+0x63/0xcd ==================================== HCI core will uses hci_rx_work() to process frame, which is queued to the hdev->rx_q tail in hci_recv_frame() by HCI driver. Yet the problem is that, HCI core may not free the skb after handling ACL data packets. To be more specific, when start fragment does not contain the L2CAP length, HCI core just copies skb into conn->rx_skb and finishes frame process in l2cap_recv_acldata(), without freeing the skb, which triggers the above memory leak. This patch solves it by releasing the relative skb, after processing the above case in l2cap_recv_acldata(). Fixes: 4d7ea8ee90e4 ("Bluetooth: L2CAP: Fix handling fragmented length") Link: https://lore.kernel.org/all/0000000000000d0b1905e6aaef64@google.com/ Reported-and-tested-by: syzbot+8f819e36e01022991cfa@syzkaller.appspotmail.com Signed-off-by: Hawkins Jiawei <yin31149@gmail.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2022-11-02Bluetooth: L2CAP: fix use-after-free in l2cap_conn_del()Zhengchao Shao
When l2cap_recv_frame() is invoked to receive data, and the cid is L2CAP_CID_A2MP, if the channel does not exist, it will create a channel. However, after a channel is created, the hold operation of the channel is not performed. In this case, the value of channel reference counting is 1. As a result, after hci_error_reset() is triggered, l2cap_conn_del() invokes the close hook function of A2MP to release the channel. Then l2cap_chan_unlock(chan) will trigger UAF issue. The process is as follows: Receive data: l2cap_data_channel() a2mp_channel_create() --->channel ref is 2 l2cap_chan_put() --->channel ref is 1 Triger event: hci_error_reset() hci_dev_do_close() ... l2cap_disconn_cfm() l2cap_conn_del() l2cap_chan_hold() --->channel ref is 2 l2cap_chan_del() --->channel ref is 1 a2mp_chan_close_cb() --->channel ref is 0, release channel l2cap_chan_unlock() --->UAF of channel The detailed Call Trace is as follows: BUG: KASAN: use-after-free in __mutex_unlock_slowpath+0xa6/0x5e0 Read of size 8 at addr ffff8880160664b8 by task kworker/u11:1/7593 Workqueue: hci0 hci_error_reset Call Trace: <TASK> dump_stack_lvl+0xcd/0x134 print_report.cold+0x2ba/0x719 kasan_report+0xb1/0x1e0 kasan_check_range+0x140/0x190 __mutex_unlock_slowpath+0xa6/0x5e0 l2cap_conn_del+0x404/0x7b0 l2cap_disconn_cfm+0x8c/0xc0 hci_conn_hash_flush+0x11f/0x260 hci_dev_close_sync+0x5f5/0x11f0 hci_dev_do_close+0x2d/0x70 hci_error_reset+0x9e/0x140 process_one_work+0x98a/0x1620 worker_thread+0x665/0x1080 kthread+0x2e4/0x3a0 ret_from_fork+0x1f/0x30 </TASK> Allocated by task 7593: kasan_save_stack+0x1e/0x40 __kasan_kmalloc+0xa9/0xd0 l2cap_chan_create+0x40/0x930 amp_mgr_create+0x96/0x990 a2mp_channel_create+0x7d/0x150 l2cap_recv_frame+0x51b8/0x9a70 l2cap_recv_acldata+0xaa3/0xc00 hci_rx_work+0x702/0x1220 process_one_work+0x98a/0x1620 worker_thread+0x665/0x1080 kthread+0x2e4/0x3a0 ret_from_fork+0x1f/0x30 Freed by task 7593: kasan_save_stack+0x1e/0x40 kasan_set_track+0x21/0x30 kasan_set_free_info+0x20/0x30 ____kasan_slab_free+0x167/0x1c0 slab_free_freelist_hook+0x89/0x1c0 kfree+0xe2/0x580 l2cap_chan_put+0x22a/0x2d0 l2cap_conn_del+0x3fc/0x7b0 l2cap_disconn_cfm+0x8c/0xc0 hci_conn_hash_flush+0x11f/0x260 hci_dev_close_sync+0x5f5/0x11f0 hci_dev_do_close+0x2d/0x70 hci_error_reset+0x9e/0x140 process_one_work+0x98a/0x1620 worker_thread+0x665/0x1080 kthread+0x2e4/0x3a0 ret_from_fork+0x1f/0x30 Last potentially related work creation: kasan_save_stack+0x1e/0x40 __kasan_record_aux_stack+0xbe/0xd0 call_rcu+0x99/0x740 netlink_release+0xe6a/0x1cf0 __sock_release+0xcd/0x280 sock_close+0x18/0x20 __fput+0x27c/0xa90 task_work_run+0xdd/0x1a0 exit_to_user_mode_prepare+0x23c/0x250 syscall_exit_to_user_mode+0x19/0x50 do_syscall_64+0x42/0x80 entry_SYSCALL_64_after_hwframe+0x63/0xcd Second to last potentially related work creation: kasan_save_stack+0x1e/0x40 __kasan_record_aux_stack+0xbe/0xd0 call_rcu+0x99/0x740 netlink_release+0xe6a/0x1cf0 __sock_release+0xcd/0x280 sock_close+0x18/0x20 __fput+0x27c/0xa90 task_work_run+0xdd/0x1a0 exit_to_user_mode_prepare+0x23c/0x250 syscall_exit_to_user_mode+0x19/0x50 do_syscall_64+0x42/0x80 entry_SYSCALL_64_after_hwframe+0x63/0xcd Fixes: d0be8347c623 ("Bluetooth: L2CAP: Fix use-after-free caused by l2cap_chan_put") Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2022-11-02Bluetooth: hci_conn: Fix CIS connection dst_type handlingPauli Virtanen
hci_connect_cis and iso_connect_cis call hci_bind_cis inconsistently with dst_type being either ISO socket address type or the HCI type, but these values cannot be mixed like this. Fix this by using only the HCI type. CIS connection dst_type was also not initialized in hci_bind_cis, even though it is used in hci_conn_hash_lookup_cis to find existing connections. Set the value in hci_bind_cis, so that existing CIS connections are found e.g. when doing deferred socket connections, also when dst_type is not 0 (ADDR_LE_DEV_PUBLIC). Fixes: 26afbd826ee3 ("Bluetooth: Add initial implementation of CIS connections") Signed-off-by: Pauli Virtanen <pav@iki.fi> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2022-11-02Bluetooth: L2CAP: Fix use-after-free caused by l2cap_reassemble_sduMaxim Mikityanskiy
Fix the race condition between the following two flows that run in parallel: 1. l2cap_reassemble_sdu -> chan->ops->recv (l2cap_sock_recv_cb) -> __sock_queue_rcv_skb. 2. bt_sock_recvmsg -> skb_recv_datagram, skb_free_datagram. An SKB can be queued by the first flow and immediately dequeued and freed by the second flow, therefore the callers of l2cap_reassemble_sdu can't use the SKB after that function returns. However, some places continue accessing struct l2cap_ctrl that resides in the SKB's CB for a short time after l2cap_reassemble_sdu returns, leading to a use-after-free condition (the stack trace is below, line numbers for kernel 5.19.8). Fix it by keeping a local copy of struct l2cap_ctrl. BUG: KASAN: use-after-free in l2cap_rx_state_recv (net/bluetooth/l2cap_core.c:6906) bluetooth Read of size 1 at addr ffff88812025f2f0 by task kworker/u17:3/43169 Workqueue: hci0 hci_rx_work [bluetooth] Call Trace: <TASK> dump_stack_lvl (lib/dump_stack.c:107 (discriminator 4)) print_report.cold (mm/kasan/report.c:314 mm/kasan/report.c:429) ? l2cap_rx_state_recv (net/bluetooth/l2cap_core.c:6906) bluetooth kasan_report (mm/kasan/report.c:162 mm/kasan/report.c:493) ? l2cap_rx_state_recv (net/bluetooth/l2cap_core.c:6906) bluetooth l2cap_rx_state_recv (net/bluetooth/l2cap_core.c:6906) bluetooth l2cap_rx (net/bluetooth/l2cap_core.c:7236 net/bluetooth/l2cap_core.c:7271) bluetooth ret_from_fork (arch/x86/entry/entry_64.S:306) </TASK> Allocated by task 43169: kasan_save_stack (mm/kasan/common.c:39) __kasan_slab_alloc (mm/kasan/common.c:45 mm/kasan/common.c:436 mm/kasan/common.c:469) kmem_cache_alloc_node (mm/slab.h:750 mm/slub.c:3243 mm/slub.c:3293) __alloc_skb (net/core/skbuff.c:414) l2cap_recv_frag (./include/net/bluetooth/bluetooth.h:425 net/bluetooth/l2cap_core.c:8329) bluetooth l2cap_recv_acldata (net/bluetooth/l2cap_core.c:8442) bluetooth hci_rx_work (net/bluetooth/hci_core.c:3642 net/bluetooth/hci_core.c:3832) bluetooth process_one_work (kernel/workqueue.c:2289) worker_thread (./include/linux/list.h:292 kernel/workqueue.c:2437) kthread (kernel/kthread.c:376) ret_from_fork (arch/x86/entry/entry_64.S:306) Freed by task 27920: kasan_save_stack (mm/kasan/common.c:39) kasan_set_track (mm/kasan/common.c:45) kasan_set_free_info (mm/kasan/generic.c:372) ____kasan_slab_free (mm/kasan/common.c:368 mm/kasan/common.c:328) slab_free_freelist_hook (mm/slub.c:1780) kmem_cache_free (mm/slub.c:3536 mm/slub.c:3553) skb_free_datagram (./include/net/sock.h:1578 ./include/net/sock.h:1639 net/core/datagram.c:323) bt_sock_recvmsg (net/bluetooth/af_bluetooth.c:295) bluetooth l2cap_sock_recvmsg (net/bluetooth/l2cap_sock.c:1212) bluetooth sock_read_iter (net/socket.c:1087) new_sync_read (./include/linux/fs.h:2052 fs/read_write.c:401) vfs_read (fs/read_write.c:482) ksys_read (fs/read_write.c:620) do_syscall_64 (arch/x86/entry/common.c:50 arch/x86/entry/common.c:80) entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:120) Link: https://lore.kernel.org/linux-bluetooth/CAKErNvoqga1WcmoR3-0875esY6TVWFQDandbVZncSiuGPBQXLA@mail.gmail.com/T/#u Fixes: d2a7ac5d5d3a ("Bluetooth: Add the ERTM receive state machine") Fixes: 4b51dae96731 ("Bluetooth: Add streaming mode receive and incoming packet classifier") Signed-off-by: Maxim Mikityanskiy <maxtram95@gmail.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2022-11-02netfilter: ipset: enforce documented limit to prevent allocating huge memoryJozsef Kadlecsik
Daniel Xu reported that the hash:net,iface type of the ipset subsystem does not limit adding the same network with different interfaces to a set, which can lead to huge memory usage or allocation failure. The quick reproducer is $ ipset create ACL.IN.ALL_PERMIT hash:net,iface hashsize 1048576 timeout 0 $ for i in $(seq 0 100); do /sbin/ipset add ACL.IN.ALL_PERMIT 0.0.0.0/0,kaf_$i timeout 0 -exist; done The backtrace when vmalloc fails: [Tue Oct 25 00:13:08 2022] ipset: vmalloc error: size 1073741848, exceeds total pages <...> [Tue Oct 25 00:13:08 2022] Call Trace: [Tue Oct 25 00:13:08 2022] <TASK> [Tue Oct 25 00:13:08 2022] dump_stack_lvl+0x48/0x60 [Tue Oct 25 00:13:08 2022] warn_alloc+0x155/0x180 [Tue Oct 25 00:13:08 2022] __vmalloc_node_range+0x72a/0x760 [Tue Oct 25 00:13:08 2022] ? hash_netiface4_add+0x7c0/0xb20 [Tue Oct 25 00:13:08 2022] ? __kmalloc_large_node+0x4a/0x90 [Tue Oct 25 00:13:08 2022] kvmalloc_node+0xa6/0xd0 [Tue Oct 25 00:13:08 2022] ? hash_netiface4_resize+0x99/0x710 <...> The fix is to enforce the limit documented in the ipset(8) manpage: > The internal restriction of the hash:net,iface set type is that the same > network prefix cannot be stored with more than 64 different interfaces > in a single set. Fixes: ccf0a4b7fc68 ("netfilter: ipset: Add bucketsize parameter to all hash types") Reported-by: Daniel Xu <dxu@dxuuu.xyz> Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-11-02Merge tag 'nfs-for-6.1-2' of git://git.linux-nfs.org/projects/anna/linux-nfsLinus Torvalds
Pull NFS client bugfixes from Anna Schumaker: - Fix some coccicheck warnings - Avoid memcpy() run-time warning - Fix up various state reclaim / RECLAIM_COMPLETE errors - Fix a null pointer dereference in sysfs - Fix LOCK races - Fix gss_unwrap_resp_integ() crasher - Fix zero length clones - Fix memleak when allocate slot fails * tag 'nfs-for-6.1-2' of git://git.linux-nfs.org/projects/anna/linux-nfs: nfs4: Fix kmemleak when allocate slot failed NFSv4.2: Fixup CLONE dest file size for zero-length count SUNRPC: Fix crasher in gss_unwrap_resp_integ() NFSv4: Retry LOCK on OLD_STATEID during delegation return SUNRPC: Fix null-ptr-deref when xps sysfs alloc failed NFSv4.1: We must always send RECLAIM_COMPLETE after a reboot NFSv4.1: Handle RECLAIM_COMPLETE trunking errors NFSv4: Fix a potential state reclaim deadlock NFS: Avoid memcpy() run-time warning for struct sockaddr overflows nfs: Remove redundant null checks before kfree
2022-11-02Merge tag 'for-netdev' of ↵Jakub Kicinski
https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Daniel Borkmann says: ==================== bpf-next 2022-11-02 We've added 70 non-merge commits during the last 14 day(s) which contain a total of 96 files changed, 3203 insertions(+), 640 deletions(-). The main changes are: 1) Make cgroup local storage available to non-cgroup attached BPF programs such as tc BPF ones, from Yonghong Song. 2) Avoid unnecessary deadlock detection and failures wrt BPF task storage helpers, from Martin KaFai Lau. 3) Add LLVM disassembler as default library for dumping JITed code in bpftool, from Quentin Monnet. 4) Various kprobe_multi_link fixes related to kernel modules, from Jiri Olsa. 5) Optimize x86-64 JIT with emitting BMI2-based shift instructions, from Jie Meng. 6) Improve BPF verifier's memory type compatibility for map key/value arguments, from Dave Marchevsky. 7) Only create mmap-able data section maps in libbpf when data is exposed via skeletons, from Andrii Nakryiko. 8) Add an autoattach option for bpftool to load all object assets, from Wang Yufen. 9) Various memory handling fixes for libbpf and BPF selftests, from Xu Kuohai. 10) Initial support for BPF selftest's vmtest.sh on arm64, from Manu Bretelle. 11) Improve libbpf's BTF handling to dedup identical structs, from Alan Maguire. 12) Add BPF CI and denylist documentation for BPF selftests, from Daniel Müller. 13) Check BPF cpumap max_entries before doing allocation work, from Florian Lehner. * tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (70 commits) samples/bpf: Fix typo in README bpf: Remove the obsolte u64_stats_fetch_*_irq() users. bpf: check max_entries before allocating memory bpf: Fix a typo in comment for DFS algorithm bpftool: Fix spelling mistake "disasembler" -> "disassembler" selftests/bpf: Fix bpftool synctypes checking failure selftests/bpf: Panic on hard/soft lockup docs/bpf: Add documentation for new cgroup local storage selftests/bpf: Add test cgrp_local_storage to DENYLIST.s390x selftests/bpf: Add selftests for new cgroup local storage selftests/bpf: Fix test test_libbpf_str/bpf_map_type_str bpftool: Support new cgroup local storage libbpf: Support new cgroup local storage bpf: Implement cgroup storage available to non-cgroup-attached bpf progs bpf: Refactor some inode/task/sk storage functions for reuse bpf: Make struct cgroup btf id global selftests/bpf: Tracing prog can still do lookup under busy lock selftests/bpf: Ensure no task storage failure for bpf_lsm.s prog due to deadlock detection bpf: Add new bpf_task_storage_delete proto with no deadlock detection bpf: bpf_task_storage_delete_recur does lookup first before the deadlock check ... ==================== Link: https://lore.kernel.org/r/20221102062120.5724-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-11-02nfc: Add KCOV annotationsDmitry Vyukov
Add remote KCOV annotations for NFC processing that is done in background threads. This enables efficient coverage-guided fuzzing of the NFC subsystem. The intention is to add annotations to background threads that process skb's that were allocated in syscall context (thus have a KCOV handle associated with the current fuzz test). This includes nci_recv_frame() that is called by the virtual nci driver in the syscall context. Signed-off-by: Dmitry Vyukov <dvyukov@google.com> Cc: Bongsu Jeon <bongsu.jeon@samsung.com> Cc: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Cc: netdev@vger.kernel.org Signed-off-by: David S. Miller <davem@davemloft.net>
2022-11-02rose: Fix NULL pointer dereference in rose_send_frame()Zhang Qilong
The syzkaller reported an issue: KASAN: null-ptr-deref in range [0x0000000000000380-0x0000000000000387] CPU: 0 PID: 4069 Comm: kworker/0:15 Not tainted 6.0.0-syzkaller-02734-g0326074ff465 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/22/2022 Workqueue: rcu_gp srcu_invoke_callbacks RIP: 0010:rose_send_frame+0x1dd/0x2f0 net/rose/rose_link.c:101 Call Trace: <IRQ> rose_transmit_clear_request+0x1d5/0x290 net/rose/rose_link.c:255 rose_rx_call_request+0x4c0/0x1bc0 net/rose/af_rose.c:1009 rose_loopback_timer+0x19e/0x590 net/rose/rose_loopback.c:111 call_timer_fn+0x1a0/0x6b0 kernel/time/timer.c:1474 expire_timers kernel/time/timer.c:1519 [inline] __run_timers.part.0+0x674/0xa80 kernel/time/timer.c:1790 __run_timers kernel/time/timer.c:1768 [inline] run_timer_softirq+0xb3/0x1d0 kernel/time/timer.c:1803 __do_softirq+0x1d0/0x9c8 kernel/softirq.c:571 [...] </IRQ> It triggers NULL pointer dereference when 'neigh->dev->dev_addr' is called in the rose_send_frame(). It's the first occurrence of the `neigh` is in rose_loopback_timer() as `rose_loopback_neigh', and the 'dev' in 'rose_loopback_neigh' is initialized sa nullptr. It had been fixed by commit 3b3fd068c56e3fbea30090859216a368398e39bf ("rose: Fix Null pointer dereference in rose_send_frame()") ever. But it's introduced by commit 3c53cd65dece47dd1f9d3a809f32e59d1d87b2b8 ("rose: check NULL rose_loopback_neigh->loopback") again. We fix it by add NULL check in rose_transmit_clear_request(). When the 'dev' in 'neigh' is NULL, we don't reply the request and just clear it. syzkaller don't provide repro, and I provide a syz repro like: r0 = syz_init_net_socket$bt_sco(0x1f, 0x5, 0x2) ioctl$sock_inet_SIOCSIFFLAGS(r0, 0x8914, &(0x7f0000000180)={'rose0\x00', 0x201}) r1 = syz_init_net_socket$rose(0xb, 0x5, 0x0) bind$rose(r1, &(0x7f00000000c0)=@full={0xb, @dev, @null, 0x0, [@null, @null, @netrom, @netrom, @default, @null]}, 0x40) connect$rose(r1, &(0x7f0000000240)=@short={0xb, @dev={0xbb, 0xbb, 0xbb, 0x1, 0x0}, @remote={0xcc, 0xcc, 0xcc, 0xcc, 0xcc, 0xcc, 0x1}, 0x1, @netrom={0xbb, 0xbb, 0xbb, 0xbb, 0xbb, 0x0, 0x0}}, 0x1c) Fixes: 3c53cd65dece ("rose: check NULL rose_loopback_neigh->loopback") Signed-off-by: Zhang Qilong <zhangqilong3@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-11-02netfilter: nf_nat: Fix possible memory leak in nf_nat_init()Chen Zhongjin
In nf_nat_init(), register_nf_nat_bpf() can fail and return directly without any error handling. Then nf_nat_bysource will leak and registering of &nat_net_ops, &follow_master_nat and nf_nat_hook won't be reverted. This leaves wild ops in linkedlists and when another module tries to call register_pernet_operations() or nf_ct_helper_expectfn_register() it triggers page fault: BUG: unable to handle page fault for address: fffffbfff81b964c RIP: 0010:register_pernet_operations+0x1b9/0x5f0 Call Trace: <TASK> register_pernet_subsys+0x29/0x40 ebtables_init+0x58/0x1000 [ebtables] ... Fixes: 820dc0523e05 ("net: netfilter: move bpf_ct_set_nat_info kfunc in nf_nat_bpf.c") Signed-off-by: Chen Zhongjin <chenzhongjin@huawei.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-11-02wifi: mac80211: Set TWT Information Frame Disabled bit as 1Howard Hsu
The TWT Information Frame Disabled bit of control field of TWT Setup frame shall be set to 1 since handling TWT Information frame is not supported by current mac80211 implementation. Fixes: f5a4c24e689f ("mac80211: introduce individual TWT support in AP mode") Signed-off-by: Howard Hsu <howard-yh.hsu@mediatek.com> Link: https://lore.kernel.org/r/20221027015653.1448-1-howard-yh.hsu@mediatek.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2022-11-02wifi: mac80211: Fix ack frame idr leak when mesh has no routeNicolas Cavallari
When trying to transmit an data frame with tx_status to a destination that have no route in the mesh, then it is dropped without recrediting the ack_status_frames idr. Once it is exhausted, wpa_supplicant starts failing to do SAE with NL80211_CMD_FRAME and logs "nl80211: Frame command failed". Use ieee80211_free_txskb() instead of kfree_skb() to fix it. Signed-off-by: Nicolas Cavallari <nicolas.cavallari@green-communications.fr> Link: https://lore.kernel.org/r/20221027140133.1504-1-nicolas.cavallari@green-communications.fr Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2022-11-02wifi: mac80211: fix general-protection-fault in ieee80211_subif_start_xmit()Zhengchao Shao
When device is running and the interface status is changed, the gpf issue is triggered. The problem triggering process is as follows: Thread A: Thread B ieee80211_runtime_change_iftype() process_one_work() ... ... ieee80211_do_stop() ... ... ... sdata->bss = NULL ... ... ieee80211_subif_start_xmit() ieee80211_multicast_to_unicast //!sdata->bss->multicast_to_unicast cause gpf issue When the interface status is changed, the sending queue continues to send packets. After the bss is set to NULL, the bss is accessed. As a result, this causes a general-protection-fault issue. The following is the stack information: general protection fault, probably for non-canonical address 0xdffffc000000002f: 0000 [#1] PREEMPT SMP KASAN KASAN: null-ptr-deref in range [0x0000000000000178-0x000000000000017f] Workqueue: mld mld_ifc_work RIP: 0010:ieee80211_subif_start_xmit+0x25b/0x1310 Call Trace: <TASK> dev_hard_start_xmit+0x1be/0x990 __dev_queue_xmit+0x2c9a/0x3b60 ip6_finish_output2+0xf92/0x1520 ip6_finish_output+0x6af/0x11e0 ip6_output+0x1ed/0x540 mld_sendpack+0xa09/0xe70 mld_ifc_work+0x71c/0xdb0 process_one_work+0x9bf/0x1710 worker_thread+0x665/0x1080 kthread+0x2e4/0x3a0 ret_from_fork+0x1f/0x30 </TASK> Fixes: f856373e2f31 ("wifi: mac80211: do not wake queues on a vif that is being stopped") Reported-by: syzbot+c6e8fca81c294fd5620a@syzkaller.appspotmail.com Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com> Link: https://lore.kernel.org/r/20221026063959.177813-1-shaozhengchao@huawei.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2022-11-02ipvs: fix WARNING in ip_vs_app_net_cleanup()Zhengchao Shao
During the initialization of ip_vs_app_net_init(), if file ip_vs_app fails to be created, the initialization is successful by default. Therefore, the ip_vs_app file doesn't be found during the remove in ip_vs_app_net_cleanup(). It will cause WRNING. The following is the stack information: name 'ip_vs_app' WARNING: CPU: 1 PID: 9 at fs/proc/generic.c:712 remove_proc_entry+0x389/0x460 Modules linked in: Workqueue: netns cleanup_net RIP: 0010:remove_proc_entry+0x389/0x460 Call Trace: <TASK> ops_exit_list+0x125/0x170 cleanup_net+0x4ea/0xb00 process_one_work+0x9bf/0x1710 worker_thread+0x665/0x1080 kthread+0x2e4/0x3a0 ret_from_fork+0x1f/0x30 </TASK> Fixes: 457c4cbc5a3d ("[NET]: Make /proc/net per network namespace") Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-11-02ipvs: fix WARNING in __ip_vs_cleanup_batch()Zhengchao Shao
During the initialization of ip_vs_conn_net_init(), if file ip_vs_conn or ip_vs_conn_sync fails to be created, the initialization is successful by default. Therefore, the ip_vs_conn or ip_vs_conn_sync file doesn't be found during the remove. The following is the stack information: name 'ip_vs_conn_sync' WARNING: CPU: 3 PID: 9 at fs/proc/generic.c:712 remove_proc_entry+0x389/0x460 Modules linked in: Workqueue: netns cleanup_net RIP: 0010:remove_proc_entry+0x389/0x460 Call Trace: <TASK> __ip_vs_cleanup_batch+0x7d/0x120 ops_exit_list+0x125/0x170 cleanup_net+0x4ea/0xb00 process_one_work+0x9bf/0x1710 worker_thread+0x665/0x1080 kthread+0x2e4/0x3a0 ret_from_fork+0x1f/0x30 </TASK> Fixes: 61b1ab4583e2 ("IPVS: netns, add basic init per netns.") Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-11-02ipvs: use explicitly signed charsJason A. Donenfeld
The `char` type with no explicit sign is sometimes signed and sometimes unsigned. This code will break on platforms such as arm, where char is unsigned. So mark it here as explicitly signed, so that the todrop_counter decrement and subsequent comparison is correct. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-11-01netlink: introduce bigendian integer typesFlorian Westphal
Jakub reported that the addition of the "network_byte_order" member in struct nla_policy increases size of 32bit platforms. Instead of scraping the bit from elsewhere Johannes suggested to add explicit NLA_BE types instead, so do this here. NLA_POLICY_MAX_BE() macro is removed again, there is no need for it: NLA_POLICY_MAX(NLA_BE.., ..) will do the right thing. NLA_BE64 can be added later. Fixes: 08724ef69907 ("netlink: introduce NLA_POLICY_MAX_BE") Reported-by: Jakub Kicinski <kuba@kernel.org> Suggested-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: Florian Westphal <fw@strlen.de> Link: https://lore.kernel.org/r/20221031123407.9158-1-fw@strlen.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-11-01tcp: refine tcp_prune_ofo_queue() logicEric Dumazet
After commits 36a6503fedda ("tcp: refine tcp_prune_ofo_queue() to not drop all packets") and 72cd43ba64fc1 ("tcp: free batches of packets in tcp_prune_ofo_queue()") tcp_prune_ofo_queue() drops a fraction of ooo queue, to make room for incoming packet. However it makes no sense to drop packets that are before the incoming packet, in sequence space. In order to recover from packet losses faster, it makes more sense to only drop ooo packets which are after the incoming packet. Tested: packetdrill test: 0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3 +0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0 +0 setsockopt(3, SOL_SOCKET, SO_RCVBUF, [3800], 4) = 0 +0 bind(3, ..., ...) = 0 +0 listen(3, 1) = 0 +0 < S 0:0(0) win 32792 <mss 1000,sackOK,nop,nop,nop,wscale 7> +0 > S. 0:0(0) ack 1 <mss 1460,nop,nop,sackOK,nop,wscale 0> +.1 < . 1:1(0) ack 1 win 1024 +0 accept(3, ..., ...) = 4 +.01 < . 200:300(100) ack 1 win 1024 +0 > . 1:1(0) ack 1 <nop,nop, sack 200:300> +.01 < . 400:500(100) ack 1 win 1024 +0 > . 1:1(0) ack 1 <nop,nop, sack 400:500 200:300> +.01 < . 600:700(100) ack 1 win 1024 +0 > . 1:1(0) ack 1 <nop,nop, sack 600:700 400:500 200:300> +.01 < . 800:900(100) ack 1 win 1024 +0 > . 1:1(0) ack 1 <nop,nop, sack 800:900 600:700 400:500 200:300> +.01 < . 1000:1100(100) ack 1 win 1024 +0 > . 1:1(0) ack 1 <nop,nop, sack 1000:1100 800:900 600:700 400:500> +.01 < . 1200:1300(100) ack 1 win 1024 +0 > . 1:1(0) ack 1 <nop,nop, sack 1200:1300 1000:1100 800:900 600:700> // this packet is dropped because we have no room left. +.01 < . 1400:1500(100) ack 1 win 1024 +.01 < . 1:200(199) ack 1 win 1024 // Make sure kernel did not drop 200:300 sequence +0 > . 1:1(0) ack 300 <nop,nop, sack 1200:1300 1000:1100 800:900 600:700> // Make room, since our RCVBUF is very small +0 read(4, ..., 299) = 299 +.01 < . 300:400(100) ack 1 win 1024 +0 > . 1:1(0) ack 500 <nop,nop, sack 1200:1300 1000:1100 800:900 600:700> +.01 < . 500:600(100) ack 1 win 1024 +0 > . 1:1(0) ack 700 <nop,nop, sack 1200:1300 1000:1100 800:900> +0 read(4, ..., 400) = 400 +.01 < . 700:800(100) ack 1 win 1024 +0 > . 1:1(0) ack 900 <nop,nop, sack 1200:1300 1000:1100> +.01 < . 900:1000(100) ack 1 win 1024 +0 > . 1:1(0) ack 1100 <nop,nop, sack 1200:1300> +.01 < . 1100:1200(100) ack 1 win 1024 // This checks that 1200:1300 has not been removed from ooo queue +0 > . 1:1(0) ack 1300 Suggested-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Acked-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://lore.kernel.org/r/20221101035234.3910189-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-11-01net: core: inet[46]_pton strlen len typesDr. David Alan Gilbert
inet[46]_pton check the input length against a sane length limit (INET[6]_ADDRSTRLEN), but the strlen value gets truncated due to being stored in an int, so there's a theoretical potential for a >4G string to pass the limit test. Use size_t since that's what strlen actually returns. I've had a hunt for callers that could hit this, but I've not managed to find anything that doesn't get checked with some other limit first; but it's possible that I've missed something in the depth of the storage target paths. Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org> Link: https://lore.kernel.org/r/20221029014604.114024-1-linux@treblig.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-11-01bpf, sockmap: Fix the sk->sk_forward_alloc warning of sk_stream_kill_queuesWang Yufen
When running `test_sockmap` selftests, the following warning appears: WARNING: CPU: 2 PID: 197 at net/core/stream.c:205 sk_stream_kill_queues+0xd3/0xf0 Call Trace: <TASK> inet_csk_destroy_sock+0x55/0x110 tcp_rcv_state_process+0xd28/0x1380 ? tcp_v4_do_rcv+0x77/0x2c0 tcp_v4_do_rcv+0x77/0x2c0 __release_sock+0x106/0x130 __tcp_close+0x1a7/0x4e0 tcp_close+0x20/0x70 inet_release+0x3c/0x80 __sock_release+0x3a/0xb0 sock_close+0x14/0x20 __fput+0xa3/0x260 task_work_run+0x59/0xb0 exit_to_user_mode_prepare+0x1b3/0x1c0 syscall_exit_to_user_mode+0x19/0x50 do_syscall_64+0x48/0x90 entry_SYSCALL_64_after_hwframe+0x44/0xae The root case is in commit 84472b436e76 ("bpf, sockmap: Fix more uncharged while msg has more_data"), where I used msg->sg.size to replace the tosend, causing breakage: if (msg->apply_bytes && msg->apply_bytes < tosend) tosend = psock->apply_bytes; Fixes: 84472b436e76 ("bpf, sockmap: Fix more uncharged while msg has more_data") Reported-by: Jakub Sitnicki <jakub@cloudflare.com> Signed-off-by: Wang Yufen <wangyufen@huawei.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: John Fastabend <john.fastabend@gmail.com> Acked-by: Jakub Sitnicki <jakub@cloudflare.com> Link: https://lore.kernel.org/bpf/1667266296-8794-1-git-send-email-wangyufen@huawei.com
2022-11-01netfilter: nf_tables: release flow rule object from commit pathPablo Neira Ayuso
No need to postpone this to the commit release path, since no packets are walking over this object, this is accessed from control plane only. This helped uncovered UAF triggered by races with the netlink notifier. Fixes: 9dd732e0bdf5 ("netfilter: nf_tables: memleak flow rule from commit path") Reported-by: syzbot+8f747f62763bc6c32916@syzkaller.appspotmail.com Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-11-01netfilter: nf_tables: netlink notifier might race to release objectsPablo Neira Ayuso
commit release path is invoked via call_rcu and it runs lockless to release the objects after rcu grace period. The netlink notifier handler might win race to remove objects that the transaction context is still referencing from the commit release path. Call rcu_barrier() to ensure pending rcu callbacks run to completion if the list of transactions to be destroyed is not empty. Fixes: 6001a930ce03 ("netfilter: nftables: introduce table ownership") Reported-by: syzbot+8f747f62763bc6c32916@syzkaller.appspotmail.com Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-10-31net: dropreason: add SKB_DROP_REASON_FRAG_TOO_FAREric Dumazet
IPv4 reassembly unit can decide to drop frags based on /proc/sys/net/ipv4/ipfrag_max_dist sysctl. Add a specific drop reason to track this specific and weird case. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-31net: dropreason: add SKB_DROP_REASON_FRAG_REASM_TIMEOUTEric Dumazet
Used to track skbs freed after a timeout happened in a reassmbly unit. Passing a @reason argument to inet_frag_rbtree_purge() allows to use correct consumed status for frags that have been successfully re-assembled. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-31net: dropreason: add SKB_DROP_REASON_DUP_FRAGEric Dumazet
This is used to track when a duplicate segment received by various reassembly units is dropped. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-31net: dropreason: propagate drop_reason to skb_release_data()Eric Dumazet
When an skb with a frag list is consumed, we currently pretend all skbs in the frag list were dropped. In order to fix this, add a @reason argument to skb_release_data() and skb_release_all(). Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-31net: dropreason: add SKB_CONSUMED reasonEric Dumazet
This will allow to simply use in the future: kfree_skb_reason(skb, reason); Instead of repeating sequences like: if (dropped) kfree_skb_reason(skb, reason); else consume_skb(skb); For instance, following patch in the series is adding @reason to skb_release_data() and skb_release_all(), so that we can propagate a meaningful @reason whenever consume_skb()/kfree_skb() have to take care of a potential frag_list. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-31rtnetlink: Honour NLM_F_ECHO flag in rtnl_delete_linkHangbin Liu
This patch use the new helper unregister_netdevice_many_notify() for rtnl_delete_link(), so that the kernel could reply unicast when userspace set NLM_F_ECHO flag to request the new created interface info. At the same time, the parameters of rtnl_delete_link() need to be updated since we need nlmsghdr and portid info. Suggested-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Reviewed-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-31rtnetlink: Honour NLM_F_ECHO flag in rtnl_newlink_createHangbin Liu
This patch pass the netlink header message in rtnl_newlink_create() to the new updated rtnl_configure_link(), so that the kernel could reply unicast when userspace set NLM_F_ECHO flag to request the new created interface info. Suggested-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Reviewed-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-31net: add new helper unregister_netdevice_many_notifyHangbin Liu
Add new helper unregister_netdevice_many_notify(), pass netlink message header and portid, which could be used to notify userspace when flag NLM_F_ECHO is set. Make the unregister_netdevice_many() as a wrapper of new function unregister_netdevice_many_notify(). Suggested-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Reviewed-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-31rtnetlink: pass netlink message header and portid to rtnl_configure_link()Hangbin Liu
This patch pass netlink message header and portid to rtnl_configure_link() All the functions in this call chain need to add the parameters so we can use them in the last call rtnl_notify(), and notify the userspace about the new link info if NLM_F_ECHO flag is set. - rtnl_configure_link() - __dev_notify_flags() - rtmsg_ifinfo() - rtmsg_ifinfo_event() - rtmsg_ifinfo_build_skb() - rtmsg_ifinfo_send() - rtnl_notify() Also move __dev_notify_flags() declaration to net/core/dev.h, as Jakub suggested. Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Reviewed-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-31net: sched: Fix use after free in red_enqueue()Dan Carpenter
We can't use "skb" again after passing it to qdisc_enqueue(). This is basically identical to commit 2f09707d0c97 ("sch_sfb: Also store skb len before calling child enqueue"). Fixes: d7f4f332f082 ("sch_red: update backlog as well") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-10-31netlink: split up copies in the ack constructionJakub Kicinski
Clean up the use of unsafe_memcpy() by adding a flexible array at the end of netlink message header and splitting up the header and data copies. Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-10-28net/packet: add PACKET_FANOUT_FLAG_IGNORE_OUTGOINGWillem de Bruijn
Extend packet socket option PACKET_IGNORE_OUTGOING to fanout groups. The socket option sets ptype.ignore_outgoing, which makes dev_queue_xmit_nit skip the socket. When the socket joins a fanout group, the option is not reflected in the struct ptype of the group. dev_queue_xmit_nit only tests the fanout ptype, so the flag is ignored once a socket joins a fanout group. Inheriting the option from a socket would change established behavior. Different sockets in the group can set different flags, and can also change them at runtime. Testing in packet_rcv_fanout defeats the purpose of the original patch, which is to avoid skb_clone in dev_queue_xmit_nit (esp. for MSG_ZEROCOPY packets). Instead, introduce a new fanout group flag with the same behavior. Tested with https://github.com/wdebruij/kerneltools/blob/master/tests/test_psock_fanout_ignore_outgoing.c Signed-off-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20221027211014.3581513-1-willemdebruijn.kernel@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-28net: dsa: fall back to default tagger if we can't load the one from DTVladimir Oltean
DSA tagging protocol drivers can be changed at runtime through sysfs and at probe time through the device tree (support for the latter was added later). When changing through sysfs, it is assumed that the module for the new tagging protocol was already loaded into the kernel (in fact this is only a concern for Ocelot/Felix switches, where we have tag_ocelot.ko and tag_ocelot_8021q.ko; for every other switch, the default and alternative protocols are compiled within the same .ko, so there is nothing for the user to load). The kernel cannot currently call request_module(), because it has no way of constructing the modalias name of the tagging protocol driver ("dsa_tag-%d", where the number is one of DSA_TAG_PROTO_*_VALUE). The device tree only contains the string name of the tagging protocol ("ocelot-8021q"), and the only mapping between the string and the DSA_TAG_PROTO_OCELOT_8021Q_VALUE is present in tag_ocelot_8021q.ko. So this is a chicken-and-egg situation and dsa_core.ko has nothing based on which it can automatically request the insertion of the module. As a consequence, if CONFIG_NET_DSA_TAG_OCELOT_8021Q is built as module, the switch will forever defer probing. The long-term solution is to make DSA call request_module() somehow, but that probably needs some refactoring. What we can do to keep operating with existing device tree blobs is to cancel the attempt to change the tagging protocol with the one specified there, and to remain operating with the default one. Depending on the situation, the default protocol might still allow some functionality (in the case of ocelot, it does), and it's better to have that than to fail to probe. Fixes: deff710703d8 ("net: dsa: Allow default tag protocol to be overridden from DT") Link: https://lore.kernel.org/lkml/20221027113248.420216-1-michael@walle.cc/ Reported-by: Heiko Thiery <heiko.thiery@gmail.com> Reported-by: Michael Walle <michael@walle.cc> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Tested-by: Michael Walle <michael@walle.cc> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Link: https://lore.kernel.org/r/20221027145439.3086017-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-28net: also flag accepted sockets supporting msghdr originated zerocopyStefan Metzmacher
Without this only the client initiated tcp sockets have SOCK_SUPPORT_ZC. The listening socket on the server also has it, but the accepted connections didn't, which meant IORING_OP_SEND[MSG]_ZC will always fails with -EOPNOTSUPP. Fixes: e993ffe3da4b ("net: flag sockets supporting msghdr originated zerocopy") Cc: <stable@vger.kernel.org> # 6.0 CC: Jens Axboe <axboe@kernel.dk> Link: https://lore.kernel.org/io-uring/20221024141503.22b4e251@kernel.org/T/#m38aa19b0b825758fb97860a38ad13122051f9dda Signed-off-by: Stefan Metzmacher <metze@samba.org> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-28net/ulp: remove SOCK_SUPPORT_ZC from tls socketsPavel Begunkov
Remove SOCK_SUPPORT_ZC when we're setting ulp as it might not support msghdr::ubuf_info, e.g. like TLS replacing ->sk_prot with a new set of handlers. Cc: <stable@vger.kernel.org> # 6.0 Reported-by: Jakub Kicinski <kuba@kernel.org> Fixes: e993ffe3da4bc ("net: flag sockets supporting msghdr originated zerocopy") Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-28net: remove SOCK_SUPPORT_ZC from sockmapPavel Begunkov
sockmap replaces ->sk_prot with its own callbacks, we should remove SOCK_SUPPORT_ZC as the new proto doesn't support msghdr::ubuf_info. Cc: <stable@vger.kernel.org> # 6.0 Reported-by: Jakub Kicinski <kuba@kernel.org> Fixes: e993ffe3da4bc ("net: flag sockets supporting msghdr originated zerocopy") Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-28udp: advertise ipv6 udp support for msghdr::ubuf_infoPavel Begunkov
Mark udp ipv6 as supporting msghdr::ubuf_info. In the original commit SOCK_SUPPORT_ZC was supposed to be set by a udp_init_sock() call from udp6_init_sock(), but d38afeec26ed4 ("tcp/udp: Call inet6_destroy_sock() in IPv6 ...") removed it and so ipv6 udp misses the flag. Cc: <stable@vger.kernel.org> # 6.0 Fixes: e993ffe3da4bc ("net: flag sockets supporting msghdr originated zerocopy") Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-28net: openvswitch: add missing .resv_start_opJakub Kicinski
I missed one of the families in OvS when annotating .resv_start_op. This triggers the warning added in commit ce48ebdd5651 ("genetlink: limit the use of validation workarounds to old ops"). Reported-by: syzbot+40eb8c0447c0e47a7e9b@syzkaller.appspotmail.com Fixes: 9c5d03d36251 ("genetlink: start to validate reserved header bytes") Link: https://lore.kernel.org/r/20221028032501.2724270-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-28net: Remove the obsolte u64_stats_fetch_*_irq() users (net).Thomas Gleixner
Now that the 32bit UP oddity is gone and 32bit uses always a sequence count, there is no need for the fetch_irq() variants anymore. Convert to the regular interface. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-28Kalle Valo says:Jakub Kicinski
==================== pull-request: wireless-next-2022-10-28 First set of patches v6.2. mac80211 refactoring continues for Wi-Fi 7. All mac80211 driver are now converted to use internal TX queues, this might cause some regressions so we wanted to do this early in the cycle. Note: wireless tree was merged[1] to wireless-next to avoid some conflicts with mac80211 patches between the trees. Unfortunately there are still two smaller conflicts in net/mac80211/util.c which Stephen also reported[2]. In the first conflict initialise scratch_len to "params->scratch_len ?: 3 * params->len" (note number 3, not 2!) and in the second conflict take the version which uses elems->scratch_pos. [1] https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next.git/commit/?id=dfd2d876b3fda1790bc0239ba4c6967e25d16e91 [2] https://lore.kernel.org/all/20221020032340.5cf101c0@canb.auug.org.au/ mac80211 - preparation for Wi-Fi 7 Multi-Link Operation (MLO) continues - add API to show the link STAs in debugfs - all mac80211 drivers are now using mac80211 internal TX queues (iTXQs) rtw89 - support 8852BE rtl8xxxu - support RTL8188FU brmfmac - support two station interfaces concurrently bcma - support SPROM rev 11 ==================== Link: https://lore.kernel.org/r/20221028132943.304ECC433B5@smtp.kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-28tcp: add rcv_wnd and plb_rehash to TCP_INFOMubashir Adnan Qureshi
rcv_wnd can be useful to diagnose TCP performance where receiver window becomes the bottleneck. rehash reports the PLB and timeout triggered rehash attempts by the TCP connection. Signed-off-by: Mubashir Adnan Qureshi <mubashirq@google.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>