Age | Commit message (Collapse) | Author |
|
Cross-merge networking fixes after downstream PR.
Conflicts:
drivers/net/ethernet/ti/icssg/icssg_prueth.c
net/mac80211/chan.c
89884459a0b9 ("wifi: mac80211: fix idle calculation with multi-link")
87f5500285fb ("wifi: mac80211: simplify ieee80211_assign_link_chanctx()")
https://lore.kernel.org/all/20240422105623.7b1fbda2@canb.auug.org.au/
net/unix/garbage.c
1971d13ffa84 ("af_unix: Suppress false-positive lockdep splat for spin_lock() in __unix_gc().")
4090fa373f0e ("af_unix: Replace garbage collection algorithm.")
drivers/net/ethernet/ti/icssg/icssg_prueth.c
drivers/net/ethernet/ti/icssg/icssg_common.c
4dcd0e83ea1d ("net: ti: icssg-prueth: Fix signedness bug in prueth_init_rx_chns()")
e2dc7bfd677f ("net: ti: icssg-prueth: Move common functions into a separate file")
No adjacent changes.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
While testing TCP performance with latest trees,
I saw suspect SOCKET_BACKLOG drops.
tcp_add_backlog() computes its limit with :
limit = (u32)READ_ONCE(sk->sk_rcvbuf) +
(u32)(READ_ONCE(sk->sk_sndbuf) >> 1);
limit += 64 * 1024;
This does not take into account that sk->sk_backlog.len
is reset only at the very end of __release_sock().
Both sk->sk_backlog.len and sk->sk_rmem_alloc could reach
sk_rcvbuf in normal conditions.
We should double sk->sk_rcvbuf contribution in the formula
to absorb bubbles in the backlog, which happen more often
for very fast flows.
This change maintains decent protection against abuses.
Fixes: c377411f2494 ("net: sk_add_backlog() take rmem_alloc into account")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20240423125620.3309458-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next
Kalle Valo says:
====================
wireless-next patches for v6.10
The second "new features" pull request for v6.10 with changes both in
stack and in drivers. This time the pull request is rather small and
nothing special standing out except maybe that we have several
kernel-doc fixes. Great to see that we are getting warning free
wireless code (until new warnings are added).
Major changes:
rtl8xxxu:
* enable Management Frame Protection (MFP) support
rtw88:
* disable unsupported interface type of mesh point for all chips, and only
support station mode for SDIO chips.
* tag 'wireless-next-2024-04-24' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next: (63 commits)
wifi: mac80211: handle link ID during management Tx
wifi: mac80211: handle sdata->u.ap.active flag with MLO
wifi: cfg80211: add return docs for regulatory functions
wifi: cfg80211: make some regulatory functions void
wifi: mac80211: add return docs for sta_info_flush()
wifi: mac80211: keep mac80211 consistent on link activation failure
wifi: mac80211: simplify ieee80211_assign_link_chanctx()
wifi: mac80211: reserve chanctx during find
wifi: cfg80211: fix cfg80211 function kernel-doc
wifi: mac80211_hwsim: Use wider regulatory for custom for 6GHz tests
wifi: iwlwifi: mvm: Don't allow EMLSR when the RSSI is low
wifi: iwlwifi: mvm: disable EMLSR when we suspend with wowlan
wifi: iwlwifi: mvm: get periodic statistics in EMLSR
wifi: iwlwifi: mvm: don't recompute EMLSR mode in can_activate_links
wifi: iwlwifi: mvm: implement EMLSR prevention mechanism.
wifi: iwlwifi: mvm: exit EMLSR upon missed beacon
wifi: iwlwifi: mvm: init vif works only once
wifi: iwlwifi: mvm: Add helper functions to update EMLSR status
wifi: iwlwifi: mvm: Implement new link selection algorithm
wifi: iwlwifi: mvm: move EMLSR/links code
...
====================
Link: https://lore.kernel.org/r/20240424100122.217AEC113CE@smtp.kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Including fixes from netfilter, wireless and bluetooth.
Nothing major, regression fixes are mostly in drivers, two more of
those are flowing towards us thru various trees. I wish some of the
changes went into -rc5, we'll try to keep an eye on frequency of PRs
from sub-trees.
Also disproportional number of fixes for bugs added in v6.4, strange
coincidence.
Current release - regressions:
- igc: fix LED-related deadlock on driver unbind
- wifi: mac80211: small fixes to recent clean up of the connection
process
- Revert "wifi: iwlwifi: bump FW API to 90 for BZ/SC devices", kernel
doesn't have all the code to deal with that version, yet
- Bluetooth:
- set power_ctrl_enabled on NULL returned by gpiod_get_optional()
- qca: fix invalid device address check, again
- eth: ravb: fix registered interrupt names
Current release - new code bugs:
- wifi: mac80211: check EHT/TTLM action frame length
Previous releases - regressions:
- fix sk_memory_allocated_{add|sub} for architectures where
__this_cpu_{add|sub}* are not IRQ-safe
- dsa: mv88e6xx: fix link setup for 88E6250
Previous releases - always broken:
- ip: validate dev returned from __in_dev_get_rcu(), prevent possible
null-derefs in a few places
- switch number of for_each_rcu() loops using call_rcu() on the
iterator to for_each_safe()
- macsec: fix isolation of broadcast traffic in presence of offload
- vxlan: drop packets from invalid source address
- eth: mlxsw: trap and ACL programming fixes
- eth: bnxt: PCIe error recovery fixes, fix counting dropped packets
- Bluetooth:
- lots of fixes for the command submission rework from v6.4
- qca: fix NULL-deref on non-serdev suspend
Misc:
- tools: ynl: don't ignore errors in NLMSG_DONE messages"
* tag 'net-6.9-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (88 commits)
af_unix: Suppress false-positive lockdep splat for spin_lock() in __unix_gc().
net: b44: set pause params only when interface is up
tls: fix lockless read of strp->msg_ready in ->poll
dpll: fix dpll_pin_on_pin_register() for multiple parent pins
net: ravb: Fix registered interrupt names
octeontx2-af: fix the double free in rvu_npc_freemem()
net: ethernet: ti: am65-cpts: Fix PTPv1 message type on TX packets
ice: fix LAG and VF lock dependency in ice_reset_vf()
iavf: Fix TC config comparison with existing adapter TC config
i40e: Report MFS in decimal base instead of hex
i40e: Do not use WQ_MEM_RECLAIM flag for workqueue
net: ti: icssg-prueth: Fix signedness bug in prueth_init_rx_chns()
net/mlx5e: Advertise mlx5 ethernet driver updates sk_buff md_dst for MACsec
macsec: Detect if Rx skb is macsec-related for offloading devices that update md_dst
ethernet: Add helper for assigning packet type when dest address does not match device address
macsec: Enable devices to advertise whether they update sk_buff md_dst during offloads
net: phy: dp83869: Fix MII mode failure
netfilter: nf_tables: honor table dormant flag from netdev release event path
eth: bnxt: fix counting packets discarded due to OOM and netpoll
igc: Fix LED-related deadlock on driver unbind
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf
Pablo Neira Ayuso says:
====================
Netfilter/IPVS fixes for net
The following patchset contains two Netfilter/IPVS fixes for net:
Patch #1 fixes SCTP checksumming for IPVS with gso packets,
from Ismael Luceno.
Patch #2 honor dormant flag from netdev event path to fix a possible
double hook unregistration.
* tag 'nf-24-04-25' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
netfilter: nf_tables: honor table dormant flag from netdev release event path
ipvs: Fix checksumming on GSO of SCTP packets
====================
Link: https://lore.kernel.org/r/20240425090149.1359547-1-pablo@netfilter.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
syzbot reported a lockdep splat regarding unix_gc_lock and
unix_state_lock().
One is called from recvmsg() for a connected socket, and another
is called from GC for TCP_LISTEN socket.
So, the splat is false-positive.
Let's add a dedicated lock class for the latter to suppress the splat.
Note that this change is not necessary for net-next.git as the issue
is only applied to the old GC impl.
[0]:
WARNING: possible circular locking dependency detected
6.9.0-rc5-syzkaller-00007-g4d2008430ce8 #0 Not tainted
-----------------------------------------------------
kworker/u8:1/11 is trying to acquire lock:
ffff88807cea4e70 (&u->lock){+.+.}-{2:2}, at: spin_lock include/linux/spinlock.h:351 [inline]
ffff88807cea4e70 (&u->lock){+.+.}-{2:2}, at: __unix_gc+0x40e/0xf70 net/unix/garbage.c:302
but task is already holding lock:
ffffffff8f6ab638 (unix_gc_lock){+.+.}-{2:2}, at: spin_lock include/linux/spinlock.h:351 [inline]
ffffffff8f6ab638 (unix_gc_lock){+.+.}-{2:2}, at: __unix_gc+0x117/0xf70 net/unix/garbage.c:261
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #1 (unix_gc_lock){+.+.}-{2:2}:
lock_acquire+0x1ed/0x550 kernel/locking/lockdep.c:5754
__raw_spin_lock include/linux/spinlock_api_smp.h:133 [inline]
_raw_spin_lock+0x2e/0x40 kernel/locking/spinlock.c:154
spin_lock include/linux/spinlock.h:351 [inline]
unix_notinflight+0x13d/0x390 net/unix/garbage.c:140
unix_detach_fds net/unix/af_unix.c:1819 [inline]
unix_destruct_scm+0x221/0x350 net/unix/af_unix.c:1876
skb_release_head_state+0x100/0x250 net/core/skbuff.c:1188
skb_release_all net/core/skbuff.c:1200 [inline]
__kfree_skb net/core/skbuff.c:1216 [inline]
kfree_skb_reason+0x16d/0x3b0 net/core/skbuff.c:1252
kfree_skb include/linux/skbuff.h:1262 [inline]
manage_oob net/unix/af_unix.c:2672 [inline]
unix_stream_read_generic+0x1125/0x2700 net/unix/af_unix.c:2749
unix_stream_splice_read+0x239/0x320 net/unix/af_unix.c:2981
do_splice_read fs/splice.c:985 [inline]
splice_file_to_pipe+0x299/0x500 fs/splice.c:1295
do_splice+0xf2d/0x1880 fs/splice.c:1379
__do_splice fs/splice.c:1436 [inline]
__do_sys_splice fs/splice.c:1652 [inline]
__se_sys_splice+0x331/0x4a0 fs/splice.c:1634
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xf5/0x240 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x77/0x7f
-> #0 (&u->lock){+.+.}-{2:2}:
check_prev_add kernel/locking/lockdep.c:3134 [inline]
check_prevs_add kernel/locking/lockdep.c:3253 [inline]
validate_chain+0x18cb/0x58e0 kernel/locking/lockdep.c:3869
__lock_acquire+0x1346/0x1fd0 kernel/locking/lockdep.c:5137
lock_acquire+0x1ed/0x550 kernel/locking/lockdep.c:5754
__raw_spin_lock include/linux/spinlock_api_smp.h:133 [inline]
_raw_spin_lock+0x2e/0x40 kernel/locking/spinlock.c:154
spin_lock include/linux/spinlock.h:351 [inline]
__unix_gc+0x40e/0xf70 net/unix/garbage.c:302
process_one_work kernel/workqueue.c:3254 [inline]
process_scheduled_works+0xa10/0x17c0 kernel/workqueue.c:3335
worker_thread+0x86d/0xd70 kernel/workqueue.c:3416
kthread+0x2f0/0x390 kernel/kthread.c:388
ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147
ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0 CPU1
---- ----
lock(unix_gc_lock);
lock(&u->lock);
lock(unix_gc_lock);
lock(&u->lock);
*** DEADLOCK ***
3 locks held by kworker/u8:1/11:
#0: ffff888015089148 ((wq_completion)events_unbound){+.+.}-{0:0}, at: process_one_work kernel/workqueue.c:3229 [inline]
#0: ffff888015089148 ((wq_completion)events_unbound){+.+.}-{0:0}, at: process_scheduled_works+0x8e0/0x17c0 kernel/workqueue.c:3335
#1: ffffc90000107d00 (unix_gc_work){+.+.}-{0:0}, at: process_one_work kernel/workqueue.c:3230 [inline]
#1: ffffc90000107d00 (unix_gc_work){+.+.}-{0:0}, at: process_scheduled_works+0x91b/0x17c0 kernel/workqueue.c:3335
#2: ffffffff8f6ab638 (unix_gc_lock){+.+.}-{2:2}, at: spin_lock include/linux/spinlock.h:351 [inline]
#2: ffffffff8f6ab638 (unix_gc_lock){+.+.}-{2:2}, at: __unix_gc+0x117/0xf70 net/unix/garbage.c:261
stack backtrace:
CPU: 0 PID: 11 Comm: kworker/u8:1 Not tainted 6.9.0-rc5-syzkaller-00007-g4d2008430ce8 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 03/27/2024
Workqueue: events_unbound __unix_gc
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x241/0x360 lib/dump_stack.c:114
check_noncircular+0x36a/0x4a0 kernel/locking/lockdep.c:2187
check_prev_add kernel/locking/lockdep.c:3134 [inline]
check_prevs_add kernel/locking/lockdep.c:3253 [inline]
validate_chain+0x18cb/0x58e0 kernel/locking/lockdep.c:3869
__lock_acquire+0x1346/0x1fd0 kernel/locking/lockdep.c:5137
lock_acquire+0x1ed/0x550 kernel/locking/lockdep.c:5754
__raw_spin_lock include/linux/spinlock_api_smp.h:133 [inline]
_raw_spin_lock+0x2e/0x40 kernel/locking/spinlock.c:154
spin_lock include/linux/spinlock.h:351 [inline]
__unix_gc+0x40e/0xf70 net/unix/garbage.c:302
process_one_work kernel/workqueue.c:3254 [inline]
process_scheduled_works+0xa10/0x17c0 kernel/workqueue.c:3335
worker_thread+0x86d/0xd70 kernel/workqueue.c:3416
kthread+0x2f0/0x390 kernel/kthread.c:388
ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147
ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244
</TASK>
Fixes: 47d8ac011fe1 ("af_unix: Fix garbage collector racing against connect()")
Reported-and-tested-by: syzbot+fa379358c28cc87cc307@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=fa379358c28cc87cc307
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://lore.kernel.org/r/20240424170443.9832-1-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
tls_sk_poll is called without locking the socket, and needs to read
strp->msg_ready (via tls_strp_msg_ready). Convert msg_ready to a bool
and use READ_ONCE/WRITE_ONCE where needed. The remaining reads are
only performed when the socket is locked.
Fixes: 121dca784fc0 ("tls: suppress wakeups unless we have a full record")
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Link: https://lore.kernel.org/r/0b7ee062319037cf86af6b317b3d72f7bfcd2e97.1713797701.git.sd@queasysnail.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
match device address
Enable reuse of logic in eth_type_trans for determining packet type.
Suggested-by: Sabrina Dubroca <sd@queasysnail.net>
Cc: stable@vger.kernel.org
Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Link: https://lore.kernel.org/r/20240423181319.115860-3-rrameshbabu@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
In br_fill_forward_path(), f->dst is checked not to be NULL, then
immediately read using READ_ONCE and checked again. The first check is
useless, so this patch aims to remove the redundant check of f->dst.
Signed-off-by: linke li <lilinke99@qq.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless
Johannes berg says:
====================
Fixes for the current cycle:
* ath11k: convert to correct RCU iteration of IPv6 addresses
* iwlwifi: link ID, FW API version, scanning and PASN fixes
* cfg80211: NULL-deref and tracing fixes
* mac80211: connection mode, mesh fast-TX, multi-link and
various other small fixes
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Check for table dormant flag otherwise netdev release event path tries
to unregister an already unregistered hook.
[524854.857999] ------------[ cut here ]------------
[524854.858010] WARNING: CPU: 0 PID: 3386599 at net/netfilter/core.c:501 __nf_unregister_net_hook+0x21a/0x260
[...]
[524854.858848] CPU: 0 PID: 3386599 Comm: kworker/u32:2 Not tainted 6.9.0-rc3+ #365
[524854.858869] Workqueue: netns cleanup_net
[524854.858886] RIP: 0010:__nf_unregister_net_hook+0x21a/0x260
[524854.858903] Code: 24 e8 aa 73 83 ff 48 63 43 1c 83 f8 01 0f 85 3d ff ff ff e8 98 d1 f0 ff 48 8b 3c 24 e8 8f 73 83 ff 48 63 43 1c e9 26 ff ff ff <0f> 0b 48 83 c4 18 48 c7 c7 00 68 e9 82 5b 5d 41 5c 41 5d 41 5e 41
[524854.858914] RSP: 0018:ffff8881e36d79e0 EFLAGS: 00010246
[524854.858926] RAX: 0000000000000000 RBX: ffff8881339ae790 RCX: ffffffff81ba524a
[524854.858936] RDX: dffffc0000000000 RSI: 0000000000000008 RDI: ffff8881c8a16438
[524854.858945] RBP: ffff8881c8a16438 R08: 0000000000000001 R09: ffffed103c6daf34
[524854.858954] R10: ffff8881e36d79a7 R11: 0000000000000000 R12: 0000000000000005
[524854.858962] R13: ffff8881c8a16000 R14: 0000000000000000 R15: ffff8881351b5a00
[524854.858971] FS: 0000000000000000(0000) GS:ffff888390800000(0000) knlGS:0000000000000000
[524854.858982] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[524854.858991] CR2: 00007fc9be0f16f4 CR3: 00000001437cc004 CR4: 00000000001706f0
[524854.859000] Call Trace:
[524854.859006] <TASK>
[524854.859013] ? __warn+0x9f/0x1a0
[524854.859027] ? __nf_unregister_net_hook+0x21a/0x260
[524854.859044] ? report_bug+0x1b1/0x1e0
[524854.859060] ? handle_bug+0x3c/0x70
[524854.859071] ? exc_invalid_op+0x17/0x40
[524854.859083] ? asm_exc_invalid_op+0x1a/0x20
[524854.859100] ? __nf_unregister_net_hook+0x6a/0x260
[524854.859116] ? __nf_unregister_net_hook+0x21a/0x260
[524854.859135] nf_tables_netdev_event+0x337/0x390 [nf_tables]
[524854.859304] ? __pfx_nf_tables_netdev_event+0x10/0x10 [nf_tables]
[524854.859461] ? packet_notifier+0xb3/0x360
[524854.859476] ? _raw_spin_unlock_irqrestore+0x11/0x40
[524854.859489] ? dcbnl_netdevice_event+0x35/0x140
[524854.859507] ? __pfx_nf_tables_netdev_event+0x10/0x10 [nf_tables]
[524854.859661] notifier_call_chain+0x7d/0x140
[524854.859677] unregister_netdevice_many_notify+0x5e1/0xae0
Fixes: d54725cd11a5 ("netfilter: nf_tables: support for multiple devices per netdev hook")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Marking TCP_SKB_CB(skb)->sacked with TCPCB_EVER_RETRANS after the
traceopint (trace_tcp_retransmit_skb), then we can get the
retransmission efficiency by counting skbs w/ and w/o TCPCB_EVER_RETRANS
mark in this tracepoint.
We have discussed to achieve this with BPF_SOCK_OPS in [0], and using
tracepoint is thought to be a better solution.
[0]
https://lore.kernel.org/all/20240417124622.35333-1-lulie@linux.alibaba.com/
Signed-off-by: Philo Lu <lulie@linux.alibaba.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth
Luiz Augusto von Dentz says:
====================
bluetooth pull request for net:
- qca: set power_ctrl_enabled on NULL returned by gpiod_get_optional()
- hci_sync: Using hci_cmd_sync_submit when removing Adv Monitor
- qca: fix invalid device address check
- hci_sync: Use advertised PHYs on hci_le_ext_create_conn_sync
- Fix type of len in {l2cap,sco}_sock_getsockopt_old()
- btusb: mediatek: Fix double free of skb in coredump
- btusb: Add Realtek RTL8852BE support ID 0x0bda:0x4853
- btusb: Fix triggering coredump implementation for QCA
* tag 'for-net-2024-04-24' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth:
Bluetooth: qca: set power_ctrl_enabled on NULL returned by gpiod_get_optional()
Bluetooth: hci_sync: Using hci_cmd_sync_submit when removing Adv Monitor
Bluetooth: qca: fix NULL-deref on non-serdev setup
Bluetooth: qca: fix NULL-deref on non-serdev suspend
Bluetooth: btusb: mediatek: Fix double free of skb in coredump
Bluetooth: MGMT: Fix failing to MGMT_OP_ADD_UUID/MGMT_OP_REMOVE_UUID
Bluetooth: qca: fix invalid device address check
Bluetooth: hci_event: Fix sending HCI_OP_READ_ENC_KEY_SIZE
Bluetooth: btusb: Fix triggering coredump implementation for QCA
Bluetooth: btusb: Add Realtek RTL8852BE support ID 0x0bda:0x4853
Bluetooth: hci_sync: Use advertised PHYs on hci_le_ext_create_conn_sync
Bluetooth: Fix type of len in {l2cap,sco}_sock_getsockopt_old()
====================
Link: https://lore.kernel.org/r/20240424204102.2319483-1-luiz.dentz@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue
Tony Nguyen says:
====================
ice: Support 5 layer Tx scheduler topology
Mateusz Polchlopek says:
For performance reasons there is a need to have support for selectable
Tx scheduler topology. Currently firmware supports only the default
9-layer and 5-layer topology. This patch series enables switch from
default to 5-layer topology, if user decides to opt-in.
* '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue:
ice: Document tx_scheduling_layers parameter
ice: Add tx_scheduling_layers devlink param
ice: Enable switching default Tx scheduler topology
ice: Adjust the VSI/Aggregator layers
ice: Support 5 layer topology
devlink: extend devlink_param *set pointer
====================
Link: https://lore.kernel.org/r/20240422203913.225151-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Since kfree_rcu, which is called in the hlist_for_each_entry_rcu traversal
of ovs_ct_limit_exit, is not part of the RCU read critical section, it
is possible that the RCU grace period will pass during the traversal and
the key will be free.
To prevent this, it should be changed to hlist_for_each_entry_safe.
Fixes: 11efd5cb04a1 ("openvswitch: Support conntrack zone limit")
Signed-off-by: Hyunwoo Kim <v4bel@theori.io>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Aaron Conole <aconole@redhat.com>
Link: https://lore.kernel.org/r/ZiYvzQN/Ry5oeFQW@v4bel-B760M-AORUS-ELITE-AX
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
dev_get_by_name will provide a reference on the netdev. So ensure that
the reference of netdev is released after completed.
Fixes: 2540088b836f ("net: openvswitch: Check vport netdev name")
Signed-off-by: Jun Gu <jun.gu@easystack.cn>
Reviewed-by: Aaron Conole <aconole@redhat.com>
Link: https://lore.kernel.org/r/20240423073751.52706-1-jun.gu@easystack.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
It was observed in the wild that pairs of consecutive packets would leave
the IPVS with the same wrong checksum, and the issue only went away when
disabling GSO.
IPVS needs to avoid computing the SCTP checksum when using GSO.
Fixes: 90017accff61 ("sctp: Add GSO support")
Co-developed-by: Firo Yang <firo.yang@suse.com>
Signed-off-by: Ismael Luceno <iluceno@suse.de>
Tested-by: Andreas Taschner <andreas.taschner@suse.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Since the d883a4669a1de be introduced in v6.4, bluetooth daemon
got the following failed message of MGMT_OP_REMOVE_ADV_MONITOR
command when controller is power-off:
bluetoothd[20976]:
src/adapter.c:reset_adv_monitors_complete() Failed to reset Adv
Monitors: Failed>
Normally this situation is happened when the bluetoothd deamon
be started manually after system booting. Which means that
bluetoothd received MGMT_EV_INDEX_ADDED event after kernel
runs hci_power_off().
Base on doc/mgmt-api.txt, the MGMT_OP_REMOVE_ADV_MONITOR command
can be used when the controller is not powered. This patch changes
the code in remove_adv_monitor() to use hci_cmd_sync_submit()
instead of hci_cmd_sync_queue().
Fixes: d883a4669a1de ("Bluetooth: hci_sync: Only allow hci_cmd_sync_queue if running")
Cc: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Cc: Manish Mandlik <mmandlik@google.com>
Cc: Archie Pusaka <apusaka@chromium.org>
Cc: Miao-chen Chou <mcchou@chromium.org>
Signed-off-by: Chun-Yi Lee <jlee@suse.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
|
|
These commands don't require the adapter to be up and running so don't
use hci_cmd_sync_queue which would check that flag, instead use
hci_cmd_sync_submit which would ensure mgmt_class_complete is set
properly regardless if any command was actually run or not.
Link: https://github.com/bluez/bluez/issues/809
Fixes: d883a4669a1d ("Bluetooth: hci_sync: Only allow hci_cmd_sync_queue if running")
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
|
|
The code shall always check if HCI_QUIRK_BROKEN_READ_ENC_KEY_SIZE has
been set before attempting to use HCI_OP_READ_ENC_KEY_SIZE.
Fixes: c569242cd492 ("Bluetooth: hci_event: set the conn encrypted before conn establishes")
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
|
|
The extended advertising reports do report the PHYs so this store then
in hci_conn so it can be later used in hci_le_ext_create_conn_sync to
narrow the PHYs to be scanned since the controller will also perform a
scan having a smaller set of PHYs shall reduce the time it takes to
find and connect peers.
Fixes: 288c90224eec ("Bluetooth: Enable all supported LE PHY by default")
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
|
|
After an innocuous optimization change in LLVM main (19.0.0), x86_64
allmodconfig (which enables CONFIG_KCSAN / -fsanitize=thread) fails to
build due to the checks in check_copy_size():
In file included from net/bluetooth/sco.c:27:
In file included from include/linux/module.h:13:
In file included from include/linux/stat.h:19:
In file included from include/linux/time.h:60:
In file included from include/linux/time32.h:13:
In file included from include/linux/timex.h:67:
In file included from arch/x86/include/asm/timex.h:6:
In file included from arch/x86/include/asm/tsc.h:10:
In file included from arch/x86/include/asm/msr.h:15:
In file included from include/linux/percpu.h:7:
In file included from include/linux/smp.h:118:
include/linux/thread_info.h:244:4: error: call to '__bad_copy_from'
declared with 'error' attribute: copy source size is too small
244 | __bad_copy_from();
| ^
The same exact error occurs in l2cap_sock.c. The copy_to_user()
statements that are failing come from l2cap_sock_getsockopt_old() and
sco_sock_getsockopt_old(). This does not occur with GCC with or without
KCSAN or Clang without KCSAN enabled.
len is defined as an 'int' because it is assigned from
'__user int *optlen'. However, it is clamped against the result of
sizeof(), which has a type of 'size_t' ('unsigned long' for 64-bit
platforms). This is done with min_t() because min() requires compatible
types, which results in both len and the result of sizeof() being casted
to 'unsigned int', meaning len changes signs and the result of sizeof()
is truncated. From there, len is passed to copy_to_user(), which has a
third parameter type of 'unsigned long', so it is widened and changes
signs again. This excessive casting in combination with the KCSAN
instrumentation causes LLVM to fail to eliminate the __bad_copy_from()
call, failing the build.
The official recommendation from LLVM developers is to consistently use
long types for all size variables to avoid the unnecessary casting in
the first place. Change the type of len to size_t in both
l2cap_sock_getsockopt_old() and sco_sock_getsockopt_old(). This clears
up the error while allowing min_t() to be replaced with min(), resulting
in simpler code with no casts and fewer implicit conversions. While len
is a different type than optlen now, it should result in no functional
change because the result of sizeof() will clamp all values of optlen in
the same manner as before.
Cc: stable@vger.kernel.org
Closes: https://github.com/ClangBuiltLinux/linux/issues/2007
Link: https://github.com/llvm/llvm-project/issues/85647
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Reviewed-by: Justin Stitt <justinstitt@google.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
|
|
There are several functions taking pointers to data they don't modify.
This includes statistics fetching, page and page_pool parameters, etc.
Constify the pointers, so that call sites will be able to pass const
pointers as well.
No functional changes, no visible changes in functions sizes.
Reviewed-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
It is impossible to use init_dummy_netdev together with alloc_netdev()
as the 'setup' argument.
This is because alloc_netdev() initializes some fields in the net_device
structure, and later init_dummy_netdev() memzero them all. This causes
some problems as reported here:
https://lore.kernel.org/all/20240322082336.49f110cc@kernel.org/
Split the init_dummy_netdev() function in two. Create a new function called
init_dummy_netdev_core() that does not memzero the net_device structure.
Then have init_dummy_netdev() memzero-ing and calling
init_dummy_netdev_core(), keeping the old behaviour.
init_dummy_netdev_core() is the new function that could be called as an
argument for alloc_netdev().
Also, create a helper to allocate and initialize dummy net devices,
leveraging init_dummy_netdev_core() as the setup argument. This function
basically simplify the allocation of dummy devices, by allocating and
initializing it. Freeing the device continue to be done through
free_netdev()
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Breno Leitao <leitao@debian.org>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
For dummy devices, exit earlier at free_netdev() instead of executing
the whole function. This is necessary, because dummy devices are
special, and shouldn't have the second part of the function executed.
Otherwise reg_state, which is NETREG_DUMMY, will be overwritten and
there will be no way to identify that this is a dummy device. Also, this
device do not need the final put_device(), since dummy devices are not
registered (through register_netdevice()), where the device reference is
increased (at netdev_register_kobject()/device_add()).
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Breno Leitao <leitao@debian.org>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Fix bad grammar in description of init_dummy_netdev() function. This
topic showed up in the review of the "allocate dummy device dynamically"
patch set.
Suggested-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Breno Leitao <leitao@debian.org>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Since call_rcu, which is called in the hlist_for_each_entry_rcu traversal
of tcp_ao_connect_init, is not part of the RCU read critical section, it
is possible that the RCU grace period will pass during the traversal and
the key will be free.
To prevent this, it should be changed to hlist_for_each_entry_safe.
Fixes: 7c2ffaf21bd6 ("net/tcp: Calculate TCP-AO traffic keys")
Signed-off-by: Hyunwoo Kim <v4bel@theori.io>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Acked-by: Dmitry Safonov <0x7f454c46@gmail.com>
Link: https://lore.kernel.org/r/ZiYu9NJ/ClR8uSkH@v4bel-B760M-AORUS-ELITE-AX
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
If we no longer hold RTNL, we must use netdev_master_upper_dev_get_rcu()
instead of netdev_master_upper_dev_get().
Fixes: ba0f78069423 ("neighbour: no longer hold RTNL in neigh_dump_info()")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Link: https://lore.kernel.org/r/20240421185753.1808077-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
syzbot was able to trigger a NULL deref in fib_validate_source()
in an old tree [1].
It appears the bug exists in latest trees.
All calls to __in_dev_get_rcu() must be checked for a NULL result.
[1]
general protection fault, probably for non-canonical address 0xdffffc0000000000: 0000 [#1] SMP KASAN
KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007]
CPU: 2 PID: 3257 Comm: syz-executor.3 Not tainted 5.10.0-syzkaller #0
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2~bpo12+1 04/01/2014
RIP: 0010:fib_validate_source+0xbf/0x15a0 net/ipv4/fib_frontend.c:425
Code: 18 f2 f2 f2 f2 42 c7 44 20 23 f3 f3 f3 f3 48 89 44 24 78 42 c6 44 20 27 f3 e8 5d 88 48 fc 4c 89 e8 48 c1 e8 03 48 89 44 24 18 <42> 80 3c 20 00 74 08 4c 89 ef e8 d2 15 98 fc 48 89 5c 24 10 41 bf
RSP: 0018:ffffc900015fee40 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff88800f7a4000 RCX: ffff88800f4f90c0
RDX: 0000000000000000 RSI: 0000000004001eac RDI: ffff8880160c64c0
RBP: ffffc900015ff060 R08: 0000000000000000 R09: ffff88800f7a4000
R10: 0000000000000002 R11: ffff88800f4f90c0 R12: dffffc0000000000
R13: 0000000000000000 R14: 0000000000000000 R15: ffff88800f7a4000
FS: 00007f938acfe6c0(0000) GS:ffff888058c00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f938acddd58 CR3: 000000001248e000 CR4: 0000000000352ef0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
ip_route_use_hint+0x410/0x9b0 net/ipv4/route.c:2231
ip_rcv_finish_core+0x2c4/0x1a30 net/ipv4/ip_input.c:327
ip_list_rcv_finish net/ipv4/ip_input.c:612 [inline]
ip_sublist_rcv+0x3ed/0xe50 net/ipv4/ip_input.c:638
ip_list_rcv+0x422/0x470 net/ipv4/ip_input.c:673
__netif_receive_skb_list_ptype net/core/dev.c:5572 [inline]
__netif_receive_skb_list_core+0x6b1/0x890 net/core/dev.c:5620
__netif_receive_skb_list net/core/dev.c:5672 [inline]
netif_receive_skb_list_internal+0x9f9/0xdc0 net/core/dev.c:5764
netif_receive_skb_list+0x55/0x3e0 net/core/dev.c:5816
xdp_recv_frames net/bpf/test_run.c:257 [inline]
xdp_test_run_batch net/bpf/test_run.c:335 [inline]
bpf_test_run_xdp_live+0x1818/0x1d00 net/bpf/test_run.c:363
bpf_prog_test_run_xdp+0x81f/0x1170 net/bpf/test_run.c:1376
bpf_prog_test_run+0x349/0x3c0 kernel/bpf/syscall.c:3736
__sys_bpf+0x45c/0x710 kernel/bpf/syscall.c:5115
__do_sys_bpf kernel/bpf/syscall.c:5201 [inline]
__se_sys_bpf kernel/bpf/syscall.c:5199 [inline]
__x64_sys_bpf+0x7c/0x90 kernel/bpf/syscall.c:5199
Fixes: 02b24941619f ("ipv4: use dst hint for ipv4 list receive")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Link: https://lore.kernel.org/r/20240421184326.1704930-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Note that when this commit message refers to netlink dump
it only means the actual dumping part, the parsing / dump
start is handled by the same code as "doit".
Commit 4a19edb60d02 ("netlink: Pass extack to dump handlers")
added support for returning extack messages from dump handlers,
but left out other extack info, e.g. bad attribute.
This used to be fine because until YNL we had little practical
use for the machine readable attributes, and only messages were
used in practice.
YNL flips the preference 180 degrees, it's now much more useful
to point to a bad attr with NL_SET_BAD_ATTR() than type
an English message saying "attribute XYZ is $reason-why-bad".
Support all of extack. The fact that extack only gets added if
it fits remains unaddressed.
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20240420023543.3300306-4-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Next change will need them in netlink_dump_done(), pure move.
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20240420023543.3300306-3-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Having to filter the right ifindex in the tests is a bit tedious.
Add support for dumping qstats for a single ifindex.
Reviewed-by: David Ahern <dsahern@kernel.org>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20240420023543.3300306-2-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
syzbot reported use-after-free in unix_del_edges(). [0]
What the repro does is basically repeat the following quickly.
1. pass a fd of an AF_UNIX socket to itself
socketpair(AF_UNIX, SOCK_DGRAM, 0, [3, 4]) = 0
sendmsg(3, {..., msg_control=[{cmsg_len=20, cmsg_level=SOL_SOCKET,
cmsg_type=SCM_RIGHTS, cmsg_data=[4]}], ...}, 0) = 0
2. pass other fds of AF_UNIX sockets to the socket above
socketpair(AF_UNIX, SOCK_SEQPACKET, 0, [5, 6]) = 0
sendmsg(3, {..., msg_control=[{cmsg_len=48, cmsg_level=SOL_SOCKET,
cmsg_type=SCM_RIGHTS, cmsg_data=[5, 6]}], ...}, 0) = 0
3. close all sockets
Here, two skb are created, and every unix_edge->successor is the first
socket. Then, __unix_gc() will garbage-collect the two skb:
(a) free skb with self-referencing fd
(b) free skb holding other sockets
After (a), the self-referencing socket will be scheduled to be freed
later by the delayed_fput() task.
syzbot repeated the sequences above (1. ~ 3.) quickly and triggered
the task concurrently while GC was running.
So, at (b), the socket was already freed, and accessing it was illegal.
unix_del_edges() accesses the receiver socket as edge->successor to
optimise GC. However, we should not do it during GC.
Garbage-collecting sockets does not change the shape of the rest
of the graph, so we need not call unix_update_graph() to update
unix_graph_grouped when we purge skb.
However, if we clean up all loops in the unix_walk_scc_fast() path,
unix_graph_maybe_cyclic remains unchanged (true), and __unix_gc()
will call unix_walk_scc_fast() continuously even though there is no
socket to garbage-collect.
To keep that optimisation while fixing UAF, let's add the same
updating logic of unix_graph_maybe_cyclic in unix_walk_scc_fast()
as done in unix_walk_scc() and __unix_walk_scc().
Note that when unix_del_edges() is called from other places, the
receiver socket is always alive:
- sendmsg: the successor's sk_refcnt is bumped by sock_hold()
unix_find_other() for SOCK_DGRAM, connect() for SOCK_STREAM
- recvmsg: the successor is the receiver, and its fd is alive
[0]:
BUG: KASAN: slab-use-after-free in unix_edge_successor net/unix/garbage.c:109 [inline]
BUG: KASAN: slab-use-after-free in unix_del_edge net/unix/garbage.c:165 [inline]
BUG: KASAN: slab-use-after-free in unix_del_edges+0x148/0x630 net/unix/garbage.c:237
Read of size 8 at addr ffff888079c6e640 by task kworker/u8:6/1099
CPU: 0 PID: 1099 Comm: kworker/u8:6 Not tainted 6.9.0-rc4-next-20240418-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 03/27/2024
Workqueue: events_unbound __unix_gc
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x241/0x360 lib/dump_stack.c:114
print_address_description mm/kasan/report.c:377 [inline]
print_report+0x169/0x550 mm/kasan/report.c:488
kasan_report+0x143/0x180 mm/kasan/report.c:601
unix_edge_successor net/unix/garbage.c:109 [inline]
unix_del_edge net/unix/garbage.c:165 [inline]
unix_del_edges+0x148/0x630 net/unix/garbage.c:237
unix_destroy_fpl+0x59/0x210 net/unix/garbage.c:298
unix_detach_fds net/unix/af_unix.c:1811 [inline]
unix_destruct_scm+0x13e/0x210 net/unix/af_unix.c:1826
skb_release_head_state+0x100/0x250 net/core/skbuff.c:1127
skb_release_all net/core/skbuff.c:1138 [inline]
__kfree_skb net/core/skbuff.c:1154 [inline]
kfree_skb_reason+0x16d/0x3b0 net/core/skbuff.c:1190
__skb_queue_purge_reason include/linux/skbuff.h:3251 [inline]
__skb_queue_purge include/linux/skbuff.h:3256 [inline]
__unix_gc+0x1732/0x1830 net/unix/garbage.c:575
process_one_work kernel/workqueue.c:3218 [inline]
process_scheduled_works+0xa2c/0x1830 kernel/workqueue.c:3299
worker_thread+0x86d/0xd70 kernel/workqueue.c:3380
kthread+0x2f0/0x390 kernel/kthread.c:389
ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147
ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244
</TASK>
Allocated by task 14427:
kasan_save_stack mm/kasan/common.c:47 [inline]
kasan_save_track+0x3f/0x80 mm/kasan/common.c:68
unpoison_slab_object mm/kasan/common.c:312 [inline]
__kasan_slab_alloc+0x66/0x80 mm/kasan/common.c:338
kasan_slab_alloc include/linux/kasan.h:201 [inline]
slab_post_alloc_hook mm/slub.c:3897 [inline]
slab_alloc_node mm/slub.c:3957 [inline]
kmem_cache_alloc_noprof+0x135/0x290 mm/slub.c:3964
sk_prot_alloc+0x58/0x210 net/core/sock.c:2074
sk_alloc+0x38/0x370 net/core/sock.c:2133
unix_create1+0xb4/0x770
unix_create+0x14e/0x200 net/unix/af_unix.c:1034
__sock_create+0x490/0x920 net/socket.c:1571
sock_create net/socket.c:1622 [inline]
__sys_socketpair+0x33e/0x720 net/socket.c:1773
__do_sys_socketpair net/socket.c:1822 [inline]
__se_sys_socketpair net/socket.c:1819 [inline]
__x64_sys_socketpair+0x9b/0xb0 net/socket.c:1819
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xf5/0x240 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x77/0x7f
Freed by task 1805:
kasan_save_stack mm/kasan/common.c:47 [inline]
kasan_save_track+0x3f/0x80 mm/kasan/common.c:68
kasan_save_free_info+0x40/0x50 mm/kasan/generic.c:579
poison_slab_object+0xe0/0x150 mm/kasan/common.c:240
__kasan_slab_free+0x37/0x60 mm/kasan/common.c:256
kasan_slab_free include/linux/kasan.h:184 [inline]
slab_free_hook mm/slub.c:2190 [inline]
slab_free mm/slub.c:4393 [inline]
kmem_cache_free+0x145/0x340 mm/slub.c:4468
sk_prot_free net/core/sock.c:2114 [inline]
__sk_destruct+0x467/0x5f0 net/core/sock.c:2208
sock_put include/net/sock.h:1948 [inline]
unix_release_sock+0xa8b/0xd20 net/unix/af_unix.c:665
unix_release+0x91/0xc0 net/unix/af_unix.c:1049
__sock_release net/socket.c:659 [inline]
sock_close+0xbc/0x240 net/socket.c:1421
__fput+0x406/0x8b0 fs/file_table.c:422
delayed_fput+0x59/0x80 fs/file_table.c:445
process_one_work kernel/workqueue.c:3218 [inline]
process_scheduled_works+0xa2c/0x1830 kernel/workqueue.c:3299
worker_thread+0x86d/0xd70 kernel/workqueue.c:3380
kthread+0x2f0/0x390 kernel/kthread.c:389
ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147
ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244
The buggy address belongs to the object at ffff888079c6e000
which belongs to the cache UNIX of size 1920
The buggy address is located 1600 bytes inside of
freed 1920-byte region [ffff888079c6e000, ffff888079c6e780)
Reported-by: syzbot+f3f3eef1d2100200e593@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=f3f3eef1d2100200e593
Fixes: 77e5593aebba ("af_unix: Skip GC if no cycle exists.")
Fixes: fd86344823b5 ("af_unix: Try not to hold unix_gc_lock during accept().")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://lore.kernel.org/r/20240419235102.31707-1-kuniyu@amazon.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
The dev_tracker is added to ax25_cb in ax25_bind(). When the
ax25 device is detaching, the dev_tracker of ax25_cb should be
deallocated in ax25_kill_by_device() instead of the dev_tracker
of ax25_dev. The log reported by ref_tracker is shown below:
[ 80.884935] ref_tracker: reference already released.
[ 80.885150] ref_tracker: allocated in:
[ 80.885349] ax25_dev_device_up+0x105/0x540
[ 80.885730] ax25_device_event+0xa4/0x420
[ 80.885730] notifier_call_chain+0xc9/0x1e0
[ 80.885730] __dev_notify_flags+0x138/0x280
[ 80.885730] dev_change_flags+0xd7/0x180
[ 80.885730] dev_ifsioc+0x6a9/0xa30
[ 80.885730] dev_ioctl+0x4d8/0xd90
[ 80.885730] sock_do_ioctl+0x1c2/0x2d0
[ 80.885730] sock_ioctl+0x38b/0x4f0
[ 80.885730] __se_sys_ioctl+0xad/0xf0
[ 80.885730] do_syscall_64+0xc4/0x1b0
[ 80.885730] entry_SYSCALL_64_after_hwframe+0x67/0x6f
[ 80.885730] ref_tracker: freed in:
[ 80.885730] ax25_device_event+0x272/0x420
[ 80.885730] notifier_call_chain+0xc9/0x1e0
[ 80.885730] dev_close_many+0x272/0x370
[ 80.885730] unregister_netdevice_many_notify+0x3b5/0x1180
[ 80.885730] unregister_netdev+0xcf/0x120
[ 80.885730] sixpack_close+0x11f/0x1b0
[ 80.885730] tty_ldisc_kill+0xcb/0x190
[ 80.885730] tty_ldisc_hangup+0x338/0x3d0
[ 80.885730] __tty_hangup+0x504/0x740
[ 80.885730] tty_release+0x46e/0xd80
[ 80.885730] __fput+0x37f/0x770
[ 80.885730] __x64_sys_close+0x7b/0xb0
[ 80.885730] do_syscall_64+0xc4/0x1b0
[ 80.885730] entry_SYSCALL_64_after_hwframe+0x67/0x6f
[ 80.893739] ------------[ cut here ]------------
[ 80.894030] WARNING: CPU: 2 PID: 140 at lib/ref_tracker.c:255 ref_tracker_free+0x47b/0x6b0
[ 80.894297] Modules linked in:
[ 80.894929] CPU: 2 PID: 140 Comm: ax25_conn_rel_6 Not tainted 6.9.0-rc4-g8cd26fd90c1a #11
[ 80.895190] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qem4
[ 80.895514] RIP: 0010:ref_tracker_free+0x47b/0x6b0
[ 80.895808] Code: 83 c5 18 4c 89 eb 48 c1 eb 03 8a 04 13 84 c0 0f 85 df 01 00 00 41 83 7d 00 00 75 4b 4c 89 ff 9
[ 80.896171] RSP: 0018:ffff888009edf8c0 EFLAGS: 00000286
[ 80.896339] RAX: 1ffff1100141ac00 RBX: 1ffff1100149463b RCX: dffffc0000000000
[ 80.896502] RDX: 0000000000000001 RSI: 0000000000000246 RDI: ffff88800a0d6518
[ 80.896925] RBP: ffff888009edf9b0 R08: ffff88806d3288d3 R09: 1ffff1100da6511a
[ 80.897212] R10: dffffc0000000000 R11: ffffed100da6511b R12: ffff88800a4a31d4
[ 80.897859] R13: ffff88800a4a31d8 R14: dffffc0000000000 R15: ffff88800a0d6518
[ 80.898279] FS: 00007fd88b7fe700(0000) GS:ffff88806d300000(0000) knlGS:0000000000000000
[ 80.899436] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 80.900181] CR2: 00007fd88c001d48 CR3: 000000000993e000 CR4: 00000000000006f0
...
[ 80.935774] ref_tracker: sp%d@000000000bb9df3d has 1/1 users at
[ 80.935774] ax25_bind+0x424/0x4e0
[ 80.935774] __sys_bind+0x1d9/0x270
[ 80.935774] __x64_sys_bind+0x75/0x80
[ 80.935774] do_syscall_64+0xc4/0x1b0
[ 80.935774] entry_SYSCALL_64_after_hwframe+0x67/0x6f
Change ax25_dev->dev_tracker to the dev_tracker of ax25_cb
in order to mitigate the bug.
Fixes: feef318c855a ("ax25: fix UAF bugs of net_device caused by rebinding operation")
Signed-off-by: Duoming Zhou <duoming@zju.edu.cn>
Link: https://lore.kernel.org/r/20240419020456.29826-1-duoming@zju.edu.cn
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Ensure that the provided netdev name is not one of its aliases to
prevent unnecessary creation and destruction of the vport by
ovs-vswitchd.
Signed-off-by: Jun Gu <jun.gu@easystack.cn>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Link: https://lore.kernel.org/r/20240419061425.132723-1-jun.gu@easystack.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The NLM_F_ACK flag is ignored for nfnetlink batch begin and end
messages. This is a problem for ynl which wants to receive an ack for
every message it sends, not just the commands in between the begin/end
messages.
Add processing for ACKs for begin/end messages and provide responses
when requested.
I have checked that iproute2, pyroute2 and systemd are unaffected by
this change since none of them use NLM_F_ACK for batch begin/end.
Signed-off-by: Donald Hunter <donald.hunter@gmail.com>
Link: https://lore.kernel.org/r/20240418104737.77914-5-donald.hunter@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Pavel Begunkov says:
====================
implement io_uring notification (ubuf_info) stacking (net part)
To have per request buffer notifications each zerocopy io_uring send
request allocates a new ubuf_info. However, as an skb can carry only
one uarg, it may force the stack to create many small skbs hurting
performance in many ways.
The patchset implements notification, i.e. an io_uring's ubuf_info
extension, stacking. It attempts to link ubuf_info's into a list,
allowing to have multiple of them per skb.
liburing/examples/send-zerocopy shows up 6 times performance improvement
for TCP with 4KB bytes per send, and levels it with MSG_ZEROCOPY. Without
the patchset it requires much larger sends to utilise all potential.
bytes | before | after (Kqps)
1200 | 195 | 1023
4000 | 193 | 1386
8000 | 154 | 1058
====================
Link: https://lore.kernel.org/all/cover.1713369317.git.asml.silence@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
At the moment an skb can only have one ubuf_info associated with it,
which might be a performance problem for zerocopy sends in cases like
TCP via io_uring. Add a callback for assigning ubuf_info to skb, this
way we will implement smarter assignment later like linking ubuf_info
together.
Note, it's an optional callback, which should be compatible with
skb_zcopy_set(), that's because the net stack might potentially decide
to clone an skb and take another reference to ubuf_info whenever it
wishes. Also, a correct implementation should always be able to bind to
an skb without prior ubuf_info, otherwise we could end up in a situation
when the send would not be able to progress.
Reviewed-by: Jens Axboe <axboe@kernel.dk>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://lore.kernel.org/all/b7918aadffeb787c84c9e72e34c729dc04f3a45d.1713369317.git.asml.silence@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
We'll need to associate additional callbacks with ubuf_info, introduce
a structure holding ubuf_info callbacks. Apart from a more smarter
io_uring notification management introduced in next patches, it can be
used to generalise msg_zerocopy_put_abort() and also store
->sg_from_iter, which is currently passed in struct msghdr.
Reviewed-by: Jens Axboe <axboe@kernel.dk>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://lore.kernel.org/all/a62015541de49c0e2a8a0377a1d5d0a5aeb07016.1713369317.git.asml.silence@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
While investigating TCP performance, I found that TCP would
sometimes send big skbs followed by a single MSS skb,
in a 'locked' pattern.
For instance, BIG TCP is enabled, MSS is set to have 4096 bytes
of payload per segment. gso_max_size is set to 181000.
This means that an optimal TCP packet size should contain
44 * 4096 = 180224 bytes of payload,
However, I was seeing packets sizes interleaved in this pattern:
172032, 8192, 172032, 8192, 172032, 8192, <repeat>
tcp_tso_should_defer() heuristic is defeated, because after a split of
a packet in write queue for whatever reason (this might be a too small
CWND or a small enough pacing_rate),
the leftover packet in the queue is smaller than the optimal size.
It is time to try to make 'leftover packets' bigger so that
tcp_tso_should_defer() can give its full potential.
After this patch, we can see the following output:
14:13:34.009273 IP6 sender > receiver: Flags [P.], seq 4048380:4098360, ack 1, win 256, options [nop,nop,TS val 3425678144 ecr 1561784500], length 49980
14:13:34.010272 IP6 sender > receiver: Flags [P.], seq 4098360:4148340, ack 1, win 256, options [nop,nop,TS val 3425678145 ecr 1561784501], length 49980
14:13:34.011271 IP6 sender > receiver: Flags [P.], seq 4148340:4198320, ack 1, win 256, options [nop,nop,TS val 3425678146 ecr 1561784502], length 49980
14:13:34.012271 IP6 sender > receiver: Flags [P.], seq 4198320:4248300, ack 1, win 256, options [nop,nop,TS val 3425678147 ecr 1561784503], length 49980
14:13:34.013272 IP6 sender > receiver: Flags [P.], seq 4248300:4298280, ack 1, win 256, options [nop,nop,TS val 3425678148 ecr 1561784504], length 49980
14:13:34.014271 IP6 sender > receiver: Flags [P.], seq 4298280:4348260, ack 1, win 256, options [nop,nop,TS val 3425678149 ecr 1561784505], length 49980
14:13:34.015272 IP6 sender > receiver: Flags [P.], seq 4348260:4398240, ack 1, win 256, options [nop,nop,TS val 3425678150 ecr 1561784506], length 49980
14:13:34.016270 IP6 sender > receiver: Flags [P.], seq 4398240:4448220, ack 1, win 256, options [nop,nop,TS val 3425678151 ecr 1561784507], length 49980
14:13:34.017269 IP6 sender > receiver: Flags [P.], seq 4448220:4498200, ack 1, win 256, options [nop,nop,TS val 3425678152 ecr 1561784508], length 49980
14:13:34.018276 IP6 sender > receiver: Flags [P.], seq 4498200:4548180, ack 1, win 256, options [nop,nop,TS val 3425678153 ecr 1561784509], length 49980
14:13:34.019259 IP6 sender > receiver: Flags [P.], seq 4548180:4598160, ack 1, win 256, options [nop,nop,TS val 3425678154 ecr 1561784510], length 49980
With 200 concurrent flows on a 100Gbit NIC, we can see a reduction
of TSO packets (and ACK packets) of about 30 %.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20240418214600.1291486-4-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
tcp_write_xmit() calls tcp_init_tso_segs()
to set gso_size and gso_segs on the packet.
tcp_init_tso_segs() requires the stack to maintain
an up to date tcp_skb_pcount(), and this makes sense
for packets in rtx queue. Not so much for packets
still in the write queue.
In the following patch, we don't want to deal with
tcp_skb_pcount() when moving payload from 2nd
skb to 1st skb in the write queue.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20240418214600.1291486-3-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
tcp_cwnd_test() has a special handing for the last packet in
the write queue if it is smaller than one MSS and has the FIN flag.
This is in violation of TCP RFC, and seems quite dubious.
This packet can be sent only if the current CWND is bigger
than the number of packets in flight.
Making tcp_cwnd_test() result independent of the first skb
in the write queue is needed for the last patch of the series.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20240418214600.1291486-2-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Extend devlink_param *set function pointer to take extack as a param.
Sometimes it is needed to pass information to the end user from set
function. It is more proper to use for that netlink instead of passing
message to dmesg.
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Signed-off-by: Mateusz Polchlopek <mateusz.polchlopek@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux
Pull nfsd fix from Chuck Lever:
- Fix an NFS/RDMA performance regression in v6.9-rc
* tag 'nfsd-6.9-4' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
Revert "svcrdma: Add Write chunk WRs to the RPC's Send WR chain"
|
|
br_info_notify is a void function. There is no need to return.
Fixes: b6d0425b816e ("bridge: cfm: Netlink Notifications.")
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
After commit 1eeb50435739 ("tcp/dccp: do not care about
families in inet_twsk_purge()") tcp_twsk_purge() is
no longer potentially called from a module.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
First problem is a double call to __in_dev_get_rcu(), because
the second one could return NULL.
if (__in_dev_get_rcu(dev) && __in_dev_get_rcu(dev)->ifa_list)
Second problem is a read from dev->ip6_ptr with no NULL check:
if (!list_empty(&rcu_dereference(dev->ip6_ptr)->addr_list))
Use the correct RCU API to fix these.
v2: add missing include <net/addrconf.h>
Fixes: d329ea5bd884 ("icmp: add response to RFC 8335 PROBE messages")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Andreas Roeseler <andreas.a.roeseler@gmail.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
To be able to constify instances of struct ctl_tables it is necessary to
remove ways through which non-const versions are exposed from the
sysctl core.
One of these is the ctl_table_arg member of struct ctl_table_header.
Constify this reference as a prerequisite for the full constification of
struct ctl_table instances.
No functional change.
Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Performance regression reported with NFS/RDMA using Omnipath,
bisected to commit e084ee673c77 ("svcrdma: Add Write chunk WRs to
the RPC's Send WR chain").
Tracing on the server reports:
nfsd-7771 [060] 1758.891809: svcrdma_sq_post_err:
cq.id=205 cid=226 sc_sq_avail=13643/851 status=-12
sq_post_err reports ENOMEM, and the rdma->sc_sq_avail (13643) is
larger than rdma->sc_sq_depth (851). The number of available Send
Queue entries is always supposed to be smaller than the Send Queue
depth. That seems like a Send Queue accounting bug in svcrdma.
As it's getting to be late in the 6.9-rc cycle, revert this commit.
It can be revisited in a subsequent kernel release.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=218743
Fixes: e084ee673c77 ("svcrdma: Add Write chunk WRs to the RPC's Send WR chain")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
If "udp_cmsg_send()" returned 0 (i.e. only UDP cmsg),
"connected" should not be set to 0. Otherwise it stops
the connected socket from using the cached route.
Fixes: 2e8de8576343 ("udp: add gso segment cmsg")
Signed-off-by: Yick Xie <yick.xie@gmail.com>
Cc: stable@vger.kernel.org
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://lore.kernel.org/r/20240418170610.867084-1-yick.xie@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|