summaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)Author
2020-10-21rtnetlink: fix data overflow in rtnl_calcit()Di Zhu
"ip addr show" command execute error when we have a physical network card with a large number of VFs The return value of if_nlmsg_size() in rtnl_calcit() will exceed range of u16 data type when any network cards has a larger number of VFs. rtnl_vfinfo_size() will significant increase needed dump size when the value of num_vfs is larger. Eventually we get a wrong value of min_ifinfo_dump_size because of overflow which decides the memory size needed by netlink dump and netlink_dump() will return -EMSGSIZE because of not enough memory was allocated. So fix it by promoting min_dump_alloc data type to u32 to avoid whole netlink message size overflow and it's also align with the data type of struct netlink_callback{}.min_dump_alloc which is assigned by return value of rtnl_calcit() Signed-off-by: Di Zhu <zhudi21@huawei.com> Link: https://lore.kernel.org/r/20201021020053.1401-1-zhudi21@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-21mptcp: depends on IPV6 but not as a moduleMatthieu Baerts
Like TCP, MPTCP cannot be compiled as a module. Obviously, MPTCP IPv6' support also depends on CONFIG_IPV6. But not all functions from IPv6 code are exported. To simplify the code and reduce modifications outside MPTCP, it was decided from the beginning to support MPTCP with IPv6 only if CONFIG_IPV6 was built inlined. That's also why CONFIG_MPTCP_IPV6 was created. More modifications are needed to support CONFIG_IPV6=m. Even if it was not explicit, until recently, we were forcing CONFIG_IPV6 to be built-in because we had "select IPV6" in Kconfig. Now that we have "depends on IPV6", we have to explicitly set "IPV6=y" to force CONFIG_IPV6 not to be built as a module. In other words, we can now only have CONFIG_MPTCP_IPV6=y if CONFIG_IPV6=y. Note that the new dependency might hide the fact IPv6 is not supported in MPTCP even if we have CONFIG_IPV6=m. But selecting IPV6 like we did before was forcing it to be built-in while it was maybe not what the user wants. Reported-by: Geert Uytterhoeven <geert@linux-m68k.org> Fixes: 010b430d5df5 ("mptcp: MPTCP_IPV6 should depend on IPV6 instead of selecting it") Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Link: https://lore.kernel.org/r/20201021105154.628257-1-matthieu.baerts@tessares.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-20mpls: load mpls_gso after mpls_iptunnelAlexander Ovechkin
mpls_iptunnel is used only for mpls encapsuation, and if encaplusated packet is larger than MTU we need mpls_gso for segmentation. Signed-off-by: Alexander Ovechkin <ovov@yandex-team.ru> Acked-by: Dmitry Yakunin <zeil@yandex-team.ru> Reviewed-by: David Ahern <dsahern@gmail.com> Link: https://lore.kernel.org/r/20201020114333.26866-1-ovov@yandex-team.ru Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-20net/sched: act_tunnel_key: fix OOB write in case of IPv6 ERSPAN tunnelsDavide Caratti
the following command # tc action add action tunnel_key \ > set src_ip 2001:db8::1 dst_ip 2001:db8::2 id 10 erspan_opts 1:6789:0:0 generates the following splat: BUG: KASAN: slab-out-of-bounds in tunnel_key_copy_opts+0xcc9/0x1010 [act_tunnel_key] Write of size 4 at addr ffff88813f5f1cc8 by task tc/873 CPU: 2 PID: 873 Comm: tc Not tainted 5.9.0+ #282 Hardware name: Red Hat KVM, BIOS 1.11.1-4.module+el8.1.0+4066+0f1aadab 04/01/2014 Call Trace: dump_stack+0x99/0xcb print_address_description.constprop.7+0x1e/0x230 kasan_report.cold.13+0x37/0x7c tunnel_key_copy_opts+0xcc9/0x1010 [act_tunnel_key] tunnel_key_init+0x160c/0x1f40 [act_tunnel_key] tcf_action_init_1+0x5b5/0x850 tcf_action_init+0x15d/0x370 tcf_action_add+0xd9/0x2f0 tc_ctl_action+0x29b/0x3a0 rtnetlink_rcv_msg+0x341/0x8d0 netlink_rcv_skb+0x120/0x380 netlink_unicast+0x439/0x630 netlink_sendmsg+0x719/0xbf0 sock_sendmsg+0xe2/0x110 ____sys_sendmsg+0x5ba/0x890 ___sys_sendmsg+0xe9/0x160 __sys_sendmsg+0xd3/0x170 do_syscall_64+0x33/0x40 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x7f872a96b338 Code: 89 02 48 c7 c0 ff ff ff ff eb b5 0f 1f 80 00 00 00 00 f3 0f 1e fa 48 8d 05 25 43 2c 00 8b 00 85 c0 75 17 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 58 c3 0f 1f 80 00 00 00 00 41 54 41 89 d4 55 RSP: 002b:00007ffffe367518 EFLAGS: 00000246 ORIG_RAX: 000000000000002e RAX: ffffffffffffffda RBX: 000000005f8f5aed RCX: 00007f872a96b338 RDX: 0000000000000000 RSI: 00007ffffe367580 RDI: 0000000000000003 RBP: 0000000000000000 R08: 0000000000000001 R09: 000000000000001c R10: 000000000000000b R11: 0000000000000246 R12: 0000000000000001 R13: 0000000000686760 R14: 0000000000000601 R15: 0000000000000000 Allocated by task 873: kasan_save_stack+0x19/0x40 __kasan_kmalloc.constprop.7+0xc1/0xd0 __kmalloc+0x151/0x310 metadata_dst_alloc+0x20/0x40 tunnel_key_init+0xfff/0x1f40 [act_tunnel_key] tcf_action_init_1+0x5b5/0x850 tcf_action_init+0x15d/0x370 tcf_action_add+0xd9/0x2f0 tc_ctl_action+0x29b/0x3a0 rtnetlink_rcv_msg+0x341/0x8d0 netlink_rcv_skb+0x120/0x380 netlink_unicast+0x439/0x630 netlink_sendmsg+0x719/0xbf0 sock_sendmsg+0xe2/0x110 ____sys_sendmsg+0x5ba/0x890 ___sys_sendmsg+0xe9/0x160 __sys_sendmsg+0xd3/0x170 do_syscall_64+0x33/0x40 entry_SYSCALL_64_after_hwframe+0x44/0xa9 The buggy address belongs to the object at ffff88813f5f1c00 which belongs to the cache kmalloc-256 of size 256 The buggy address is located 200 bytes inside of 256-byte region [ffff88813f5f1c00, ffff88813f5f1d00) The buggy address belongs to the page: page:0000000011b48a19 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x13f5f0 head:0000000011b48a19 order:1 compound_mapcount:0 flags: 0x17ffffc0010200(slab|head) raw: 0017ffffc0010200 0000000000000000 0000000d00000001 ffff888107c43400 raw: 0000000000000000 0000000080100010 00000001ffffffff 0000000000000000 page dumped because: kasan: bad access detected Memory state around the buggy address: ffff88813f5f1b80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc ffff88813f5f1c00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >ffff88813f5f1c80: 00 00 00 00 00 00 00 00 00 fc fc fc fc fc fc fc ^ ffff88813f5f1d00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc ffff88813f5f1d80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc using IPv6 tunnels, act_tunnel_key allocates a fixed amount of memory for the tunnel metadata, but then it expects additional bytes to store tunnel specific metadata with tunnel_key_copy_opts(). Fix the arguments of __ipv6_tun_set_dst(), so that 'md_size' contains the size previously computed by tunnel_key_get_opts_len(), like it's done for IPv4 tunnels. Fixes: 0ed5269f9e41 ("net/sched: add tunnel option support to act_tunnel_key") Reported-by: Shuang Li <shuali@redhat.com> Signed-off-by: Davide Caratti <dcaratti@redhat.com> Acked-by: Cong Wang <xiyou.wangcong@gmail.com> Link: https://lore.kernel.org/r/36ebe969f6d13ff59912d6464a4356fe6f103766.1603231100.git.dcaratti@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-20net/sched: act_gate: Unlock ->tcfa_lock in tc_setup_flow_action()Guillaume Nault
We need to jump to the "err_out_locked" label when tcf_gate_get_entries() fails. Otherwise, tc_setup_flow_action() exits with ->tcfa_lock still held. Fixes: d29bdd69ecdd ("net: schedule: add action gate offloading") Signed-off-by: Guillaume Nault <gnault@redhat.com> Acked-by: Cong Wang <xiyou.wangcong@gmail.com> Link: https://lore.kernel.org/r/12f60e385584c52c22863701c0185e40ab08a7a7.1603207948.git.gnault@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-20mptcp: MPTCP_IPV6 should depend on IPV6 instead of selecting itGeert Uytterhoeven
MPTCP_IPV6 selects IPV6, thus enabling an optional feature the user may not want to enable. Fix this by making MPTCP_IPV6 depend on IPV6, like is done for all other IPv6 features. Fixes: f870fa0b5768842c ("mptcp: Add MPTCP socket stubs") Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net> Link: https://lore.kernel.org/r/20201020073839.29226-1-geert@linux-m68k.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-20nfc: Ensure presence of NFC_ATTR_FIRMWARE_NAME attribute in ↵Defang Bo
nfc_genl_fw_download() Check that the NFC_ATTR_FIRMWARE_NAME attributes are provided by the netlink client prior to accessing them.This prevents potential unhandled NULL pointer dereference exceptions which can be triggered by malicious user-mode programs, if they omit one or both of these attributes. Similar to commit a0323b979f81 ("nfc: Ensure presence of required attributes in the activate_target handler"). Fixes: 9674da8759df ("NFC: Add firmware upload netlink command") Signed-off-by: Defang Bo <bodefang@126.com> Link: https://lore.kernel.org/r/1603107538-4744-1-git-send-email-bodefang@126.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-20mptcp: MPTCP_KUNIT_TESTS should depend on MPTCP instead of selecting itGeert Uytterhoeven
MPTCP_KUNIT_TESTS selects MPTCP, thus enabling an optional feature the user may not want to enable. Fix this by making the test depend on MPTCP instead. Fixes: a00a582203dbc43e ("mptcp: move crypto test to KUNIT") Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net> Link: https://lore.kernel.org/r/20201019113240.11516-1-geert@linux-m68k.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-20mptcp: move mptcp_options_received's port initializationGeliang Tang
Move mptcp_options_received's port initialization from mptcp_parse_option to mptcp_get_options, put it together with the other fields initializations of mptcp_options_received. Signed-off-by: Geliang Tang <geliangtang@gmail.com> Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-20mptcp: initialize mptcp_options_received's ahmacGeliang Tang
Initialize mptcp_options_received's ahmac to zero, otherwise it will be a random number when receiving ADD_ADDR suboption with echo-flag=1. Fixes: 3df523ab582c5 ("mptcp: Add ADD_ADDR handling") Signed-off-by: Geliang Tang <geliangtang@gmail.com> Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-20net/sched: act_ct: Fix adding udp port mangle operationRoi Dayan
Need to use the udp header type and not tcp. Fixes: 9c26ba9b1f45 ("net/sched: act_ct: Instantiate flow table entry actions") Signed-off-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Paul Blakey <paulb@nvidia.com> Link: https://lore.kernel.org/r/20201019090244.3015186-1-roid@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-19nexthop: Fix performance regression in nexthop deletionIdo Schimmel
While insertion of 16k nexthops all using the same netdev ('dummy10') takes less than a second, deletion takes about 130 seconds: # time -p ip -b nexthop.batch real 0.29 user 0.01 sys 0.15 # time -p ip link set dev dummy10 down real 131.03 user 0.06 sys 0.52 This is because of repeated calls to synchronize_rcu() whenever a nexthop is removed from a nexthop group: # /usr/share/bcc/tools/offcputime -p `pgrep -nx ip` -K ... b'finish_task_switch' b'schedule' b'schedule_timeout' b'wait_for_completion' b'__wait_rcu_gp' b'synchronize_rcu.part.0' b'synchronize_rcu' b'__remove_nexthop' b'remove_nexthop' b'nexthop_flush_dev' b'nh_netdev_event' b'raw_notifier_call_chain' b'call_netdevice_notifiers_info' b'__dev_notify_flags' b'dev_change_flags' b'do_setlink' b'__rtnl_newlink' b'rtnl_newlink' b'rtnetlink_rcv_msg' b'netlink_rcv_skb' b'rtnetlink_rcv' b'netlink_unicast' b'netlink_sendmsg' b'____sys_sendmsg' b'___sys_sendmsg' b'__sys_sendmsg' b'__x64_sys_sendmsg' b'do_syscall_64' b'entry_SYSCALL_64_after_hwframe' - ip (277) 126554955 Since nexthops are always deleted under RTNL, synchronize_net() can be used instead. It will call synchronize_rcu_expedited() which only blocks for several microseconds as opposed to multiple milliseconds like synchronize_rcu(). With this patch deletion of 16k nexthops takes less than a second: # time -p ip link set dev dummy10 down real 0.12 user 0.00 sys 0.04 Tested with fib_nexthops.sh which includes torture tests that prompted the initial change: # ./fib_nexthops.sh ... Tests passed: 134 Tests failed: 0 Fixes: 90f33bffa382 ("nexthops: don't modify published nexthop groups") Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Reviewed-by: David Ahern <dsahern@gmail.com> Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com> Link: https://lore.kernel.org/r/20201016172914.643282-1-idosch@idosch.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-19net: dsa: tag_ksz: KSZ8795 and KSZ9477 also use tail tagsChristian Eggers
The Marvell 88E6060 uses tag_trailer.c and the KSZ8795, KSZ9477 and KSZ9893 switches also use tail tags. Fixes: 7a6ffe764be3 ("net: dsa: point out the tail taggers") Signed-off-by: Christian Eggers <ceggers@arri.de> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Link: https://lore.kernel.org/r/20201016171603.10587-1-ceggers@arri.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-18net: core: use list_del_init() instead of list_del() in netdev_run_todo()Taehee Yoo
dev->unlink_list is reused unless dev is deleted. So, list_del() should not be used. Due to using list_del(), dev->unlink_list can't be reused so that dev->nested_level update logic doesn't work. In order to fix this bug, list_del_init() should be used instead of list_del(). Test commands: ip link add bond0 type bond ip link add bond1 type bond ip link set bond0 master bond1 ip link set bond0 nomaster ip link set bond1 master bond0 ip link set bond1 nomaster Splat looks like: [ 255.750458][ T1030] ============================================ [ 255.751967][ T1030] WARNING: possible recursive locking detected [ 255.753435][ T1030] 5.9.0-rc8+ #772 Not tainted [ 255.754553][ T1030] -------------------------------------------- [ 255.756047][ T1030] ip/1030 is trying to acquire lock: [ 255.757304][ T1030] ffff88811782a280 (&dev_addr_list_lock_key/1){+...}-{2:2}, at: dev_mc_sync_multiple+0xc2/0x150 [ 255.760056][ T1030] [ 255.760056][ T1030] but task is already holding lock: [ 255.761862][ T1030] ffff88811130a280 (&dev_addr_list_lock_key/1){+...}-{2:2}, at: bond_enslave+0x3d4d/0x43e0 [bonding] [ 255.764581][ T1030] [ 255.764581][ T1030] other info that might help us debug this: [ 255.766645][ T1030] Possible unsafe locking scenario: [ 255.766645][ T1030] [ 255.768566][ T1030] CPU0 [ 255.769415][ T1030] ---- [ 255.770259][ T1030] lock(&dev_addr_list_lock_key/1); [ 255.771629][ T1030] lock(&dev_addr_list_lock_key/1); [ 255.772994][ T1030] [ 255.772994][ T1030] *** DEADLOCK *** [ 255.772994][ T1030] [ 255.775091][ T1030] May be due to missing lock nesting notation [ 255.775091][ T1030] [ 255.777182][ T1030] 2 locks held by ip/1030: [ 255.778299][ T1030] #0: ffffffffb1f63250 (rtnl_mutex){+.+.}-{3:3}, at: rtnetlink_rcv_msg+0x2e4/0x8b0 [ 255.780600][ T1030] #1: ffff88811130a280 (&dev_addr_list_lock_key/1){+...}-{2:2}, at: bond_enslave+0x3d4d/0x43e0 [bonding] [ 255.783411][ T1030] [ 255.783411][ T1030] stack backtrace: [ 255.784874][ T1030] CPU: 7 PID: 1030 Comm: ip Not tainted 5.9.0-rc8+ #772 [ 255.786595][ T1030] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014 [ 255.789030][ T1030] Call Trace: [ 255.789850][ T1030] dump_stack+0x99/0xd0 [ 255.790882][ T1030] __lock_acquire.cold.71+0x166/0x3cc [ 255.792285][ T1030] ? register_lock_class+0x1a30/0x1a30 [ 255.793619][ T1030] ? rcu_read_lock_sched_held+0x91/0xc0 [ 255.794963][ T1030] ? rcu_read_lock_bh_held+0xa0/0xa0 [ 255.796246][ T1030] lock_acquire+0x1b8/0x850 [ 255.797332][ T1030] ? dev_mc_sync_multiple+0xc2/0x150 [ 255.798624][ T1030] ? bond_enslave+0x3d4d/0x43e0 [bonding] [ 255.800039][ T1030] ? check_flags+0x50/0x50 [ 255.801143][ T1030] ? lock_contended+0xd80/0xd80 [ 255.802341][ T1030] _raw_spin_lock_nested+0x2e/0x70 [ 255.803592][ T1030] ? dev_mc_sync_multiple+0xc2/0x150 [ 255.804897][ T1030] dev_mc_sync_multiple+0xc2/0x150 [ 255.806168][ T1030] bond_enslave+0x3d58/0x43e0 [bonding] [ 255.807542][ T1030] ? __lock_acquire+0xe53/0x51b0 [ 255.808824][ T1030] ? bond_update_slave_arr+0xdc0/0xdc0 [bonding] [ 255.810451][ T1030] ? check_chain_key+0x236/0x5e0 [ 255.811742][ T1030] ? mutex_is_locked+0x13/0x50 [ 255.812910][ T1030] ? rtnl_is_locked+0x11/0x20 [ 255.814061][ T1030] ? netdev_master_upper_dev_get+0xf/0x120 [ 255.815553][ T1030] do_setlink+0x94c/0x3040 [ ... ] Reported-by: syzbot+4a0f7bc34e3997a6c7df@syzkaller.appspotmail.com Fixes: 1fc70edb7d7b ("net: core: add nested_level variable in net_device") Signed-off-by: Taehee Yoo <ap420073@gmail.com> Link: https://lore.kernel.org/r/20201015162606.9377-1-ap420073@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-18net: openvswitch: fix to make sure flow_lookup() is not preemptedEelco Chaudron
The flow_lookup() function uses per CPU variables, which must be called with BH disabled. However, this is fine in the general NAPI use case where the local BH is disabled. But, it's also called from the netlink context. The below patch makes sure that even in the netlink path, the BH is disabled. In addition, u64_stats_update_begin() requires a lock to ensure one writer which is not ensured here. Making it per-CPU and disabling NAPI (softirq) ensures that there is always only one writer. Fixes: eac87c413bf9 ("net: openvswitch: reorder masks array based on usage") Reported-by: Juri Lelli <jlelli@redhat.com> Signed-off-by: Eelco Chaudron <echaudro@redhat.com> Link: https://lore.kernel.org/r/160295903253.7789.826736662555102345.stgit@ebuild Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-16icmp: randomize the global rate limiterEric Dumazet
Keyu Man reported that the ICMP rate limiter could be used by attackers to get useful signal. Details will be provided in an upcoming academic publication. Our solution is to add some noise, so that the attackers no longer can get help from the predictable token bucket limiter. Fixes: 4cdf507d5452 ("icmp: add a global rate limitation") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Keyu Man <kman001@ucr.edu> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-16tipc: fix incorrect setting window for bcast linkHoang Huu Le
In commit 16ad3f4022bb ("tipc: introduce variable window congestion control"), we applied the algorithm to select window size from minimum window to the configured maximum window for unicast link, and, besides we chose to keep the window size for broadcast link unchanged and equal (i.e fix window 50) However, when setting maximum window variable via command, the window variable was re-initialized to unexpect value (i.e 32). We fix this by updating the fix window for broadcast as we stated. Fixes: 16ad3f4022bb ("tipc: introduce variable window congestion control") Acked-by: Jon Maloy <jmaloy@redhat.com> Signed-off-by: Hoang Huu Le <hoang.h.le@dektech.com.au> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-16tipc: re-configure queue limit for broadcast linkHoang Huu Le
The queue limit of the broadcast link is being calculated base on initial MTU. However, when MTU value changed (e.g manual changing MTU on NIC device, MTU negotiation etc.,) we do not re-calculate queue limit. This gives throughput does not reflect with the change. So fix it by calling the function to re-calculate queue limit of the broadcast link. Acked-by: Jon Maloy <jmaloy@redhat.com> Signed-off-by: Hoang Huu Le <hoang.h.le@dektech.com.au> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-15Merge tag 'net-next-5.10' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next Pull networking updates from Jakub Kicinski: - Add redirect_neigh() BPF packet redirect helper, allowing to limit stack traversal in common container configs and improving TCP back-pressure. Daniel reports ~10Gbps => ~15Gbps single stream TCP performance gain. - Expand netlink policy support and improve policy export to user space. (Ge)netlink core performs request validation according to declared policies. Expand the expressiveness of those policies (min/max length and bitmasks). Allow dumping policies for particular commands. This is used for feature discovery by user space (instead of kernel version parsing or trial and error). - Support IGMPv3/MLDv2 multicast listener discovery protocols in bridge. - Allow more than 255 IPv4 multicast interfaces. - Add support for Type of Service (ToS) reflection in SYN/SYN-ACK packets of TCPv6. - In Multi-patch TCP (MPTCP) support concurrent transmission of data on multiple subflows in a load balancing scenario. Enhance advertising addresses via the RM_ADDR/ADD_ADDR options. - Support SMC-Dv2 version of SMC, which enables multi-subnet deployments. - Allow more calls to same peer in RxRPC. - Support two new Controller Area Network (CAN) protocols - CAN-FD and ISO 15765-2:2016. - Add xfrm/IPsec compat layer, solving the 32bit user space on 64bit kernel problem. - Add TC actions for implementing MPLS L2 VPNs. - Improve nexthop code - e.g. handle various corner cases when nexthop objects are removed from groups better, skip unnecessary notifications and make it easier to offload nexthops into HW by converting to a blocking notifier. - Support adding and consuming TCP header options by BPF programs, opening the doors for easy experimental and deployment-specific TCP option use. - Reorganize TCP congestion control (CC) initialization to simplify life of TCP CC implemented in BPF. - Add support for shipping BPF programs with the kernel and loading them early on boot via the User Mode Driver mechanism, hence reusing all the user space infra we have. - Support sleepable BPF programs, initially targeting LSM and tracing. - Add bpf_d_path() helper for returning full path for given 'struct path'. - Make bpf_tail_call compatible with bpf-to-bpf calls. - Allow BPF programs to call map_update_elem on sockmaps. - Add BPF Type Format (BTF) support for type and enum discovery, as well as support for using BTF within the kernel itself (current use is for pretty printing structures). - Support listing and getting information about bpf_links via the bpf syscall. - Enhance kernel interfaces around NIC firmware update. Allow specifying overwrite mask to control if settings etc. are reset during update; report expected max time operation may take to users; support firmware activation without machine reboot incl. limits of how much impact reset may have (e.g. dropping link or not). - Extend ethtool configuration interface to report IEEE-standard counters, to limit the need for per-vendor logic in user space. - Adopt or extend devlink use for debug, monitoring, fw update in many drivers (dsa loop, ice, ionic, sja1105, qed, mlxsw, mv88e6xxx, dpaa2-eth). - In mlxsw expose critical and emergency SFP module temperature alarms. Refactor port buffer handling to make the defaults more suitable and support setting these values explicitly via the DCBNL interface. - Add XDP support for Intel's igb driver. - Support offloading TC flower classification and filtering rules to mscc_ocelot switches. - Add PTP support for Marvell Octeontx2 and PP2.2 hardware, as well as fixed interval period pulse generator and one-step timestamping in dpaa-eth. - Add support for various auth offloads in WiFi APs, e.g. SAE (WPA3) offload. - Add Lynx PHY/PCS MDIO module, and convert various drivers which have this HW to use it. Convert mvpp2 to split PCS. - Support Marvell Prestera 98DX3255 24-port switch ASICs, as well as 7-port Mediatek MT7531 IP. - Add initial support for QCA6390 and IPQ6018 in ath11k WiFi driver, and wcn3680 support in wcn36xx. - Improve performance for packets which don't require much offloads on recent Mellanox NICs by 20% by making multiple packets share a descriptor entry. - Move chelsio inline crypto drivers (for TLS and IPsec) from the crypto subtree to drivers/net. Move MDIO drivers out of the phy directory. - Clean up a lot of W=1 warnings, reportedly the actively developed subsections of networking drivers should now build W=1 warning free. - Make sure drivers don't use in_interrupt() to dynamically adapt their code. Convert tasklets to use new tasklet_setup API (sadly this conversion is not yet complete). * tag 'net-next-5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2583 commits) Revert "bpfilter: Fix build error with CONFIG_BPFILTER_UMH" net, sockmap: Don't call bpf_prog_put() on NULL pointer bpf, selftest: Fix flaky tcp_hdr_options test when adding addr to lo bpf, sockmap: Add locking annotations to iterator netfilter: nftables: allow re-computing sctp CRC-32C in 'payload' statements net: fix pos incrementment in ipv6_route_seq_next net/smc: fix invalid return code in smcd_new_buf_create() net/smc: fix valid DMBE buffer sizes net/smc: fix use-after-free of delayed events bpfilter: Fix build error with CONFIG_BPFILTER_UMH cxgb4/ch_ipsec: Replace the module name to ch_ipsec from chcr net: sched: Fix suspicious RCU usage while accessing tcf_tunnel_info bpf: Fix register equivalence tracking. rxrpc: Fix loss of final ack on shutdown rxrpc: Fix bundle counting for exclusive connections netfilter: restore NF_INET_NUMHOOKS ibmveth: Identify ingress large send packets. ibmveth: Switch order of ibmveth_helper calls. cxgb4: handle 4-tuple PEDIT to NAT mode translation selftests: Add VRF route leaking tests ...
2020-10-15Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-nextJakub Kicinski
Daniel Borkmann says: ==================== pull-request: bpf-next 2020-10-15 The main changes are: 1) Fix register equivalence tracking in verifier, from Alexei Starovoitov. 2) Fix sockmap error path to not call bpf_prog_put() with NULL, from Alex Dewar. 3) Fix sockmap to add locking annotations to iterator, from Lorenz Bauer. 4) Fix tcp_hdr_options test to use loopback address, from Martin KaFai Lau. ==================== Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-15Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
Minor conflicts in net/mptcp/protocol.h and tools/testing/selftests/net/Makefile. In both cases code was added on both sides in the same place so just keep both. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-15Revert "bpfilter: Fix build error with CONFIG_BPFILTER_UMH"Jakub Kicinski
This reverts commit 1d273fcc2c29343e59658276b77b02e5897a3123. Alexei points out there's nothing implying headers will be built and therefore exist under usr/include, so this fix doesn't make much sense. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-15net, sockmap: Don't call bpf_prog_put() on NULL pointerAlex Dewar
If bpf_prog_inc_not_zero() fails for skb_parser, then bpf_prog_put() is called unconditionally on skb_verdict, even though it may be NULL. Fix and tidy up error path. Fixes: 743df8b7749f ("bpf, sockmap: Check skb_verdict and skb_parser programs explicitly") Addresses-Coverity-ID: 1497799: Null pointer dereferences (FORWARD_NULL) Signed-off-by: Alex Dewar <alex.dewar90@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Jakub Sitnicki <jakub@cloudflare.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20201012170952.60750-1-alex.dewar90@gmail.com
2020-10-15bpf, sockmap: Add locking annotations to iteratorLorenz Bauer
The sparse checker currently outputs the following warnings: include/linux/rcupdate.h:632:9: sparse: sparse: context imbalance in 'sock_hash_seq_start' - wrong count at exit include/linux/rcupdate.h:632:9: sparse: sparse: context imbalance in 'sock_map_seq_start' - wrong count at exit Add the necessary __acquires and __release annotations to make the iterator locking schema palatable to sparse. Also add __must_hold for good measure. The kernel codebase uses both __acquires(rcu) and __acquires(RCU). I couldn't find any guidance which one is preferred, so I used what is easier to type out. Fixes: 0365351524d7 ("net: Allow iterating sockmap and sockhash") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Lorenz Bauer <lmb@cloudflare.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: John Fastabend <john.fastabend@gmail.com> Acked-by: Jakub Sitnicki <jakub@cloudflare.com> Link: https://lore.kernel.org/bpf/20201012091850.67452-1-lmb@cloudflare.com
2020-10-15netfilter: nftables: allow re-computing sctp CRC-32C in 'payload' statementsDavide Caratti
nftables payload statements are used to mangle SCTP headers, but they can only replace the Internet Checksum. As a consequence, nftables rules that mangle sport/dport/vtag in SCTP headers potentially generate packets that are discarded by the receiver, unless the CRC-32C is "offloaded" (e.g the rule mangles a skb having 'ip_summed' equal to 'CHECKSUM_PARTIAL'. Fix this extending uAPI definitions and L4 checksum update function, in a way that userspace programs (e.g. nft) can instruct the kernel to compute CRC-32C in SCTP headers. Also ensure that LIBCRC32C is built if NF_TABLES is 'y' or 'm' in the kernel build configuration. Signed-off-by: Davide Caratti <dcaratti@redhat.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-15net: fix pos incrementment in ipv6_route_seq_nextYonghong Song
Commit 4fc427e05158 ("ipv6_route_seq_next should increase position index") tried to fix the issue where seq_file pos is not increased if a NULL element is returned with seq_ops->next(). See bug https://bugzilla.kernel.org/show_bug.cgi?id=206283 The commit effectively does: - increase pos for all seq_ops->start() - increase pos for all seq_ops->next() For ipv6_route, increasing pos for all seq_ops->next() is correct. But increasing pos for seq_ops->start() is not correct since pos is used to determine how many items to skip during seq_ops->start(): iter->skip = *pos; seq_ops->start() just fetches the *current* pos item. The item can be skipped only after seq_ops->show() which essentially is the beginning of seq_ops->next(). For example, I have 7 ipv6 route entries, root@arch-fb-vm1:~/net-next dd if=/proc/net/ipv6_route bs=4096 00000000000000000000000000000000 40 00000000000000000000000000000000 00 00000000000000000000000000000000 00000400 00000001 00000000 00000001 eth0 fe800000000000000000000000000000 40 00000000000000000000000000000000 00 00000000000000000000000000000000 00000100 00000001 00000000 00000001 eth0 00000000000000000000000000000000 00 00000000000000000000000000000000 00 00000000000000000000000000000000 ffffffff 00000001 00000000 00200200 lo 00000000000000000000000000000001 80 00000000000000000000000000000000 00 00000000000000000000000000000000 00000000 00000003 00000000 80200001 lo fe800000000000002050e3fffebd3be8 80 00000000000000000000000000000000 00 00000000000000000000000000000000 00000000 00000002 00000000 80200001 eth0 ff000000000000000000000000000000 08 00000000000000000000000000000000 00 00000000000000000000000000000000 00000100 00000004 00000000 00000001 eth0 00000000000000000000000000000000 00 00000000000000000000000000000000 00 00000000000000000000000000000000 ffffffff 00000001 00000000 00200200 lo 0+1 records in 0+1 records out 1050 bytes (1.0 kB, 1.0 KiB) copied, 0.00707908 s, 148 kB/s root@arch-fb-vm1:~/net-next In the above, I specify buffer size 4096, so all records can be returned to user space with a single trip to the kernel. If I use buffer size 128, since each record size is 149, internally kernel seq_read() will read 149 into its internal buffer and return the data to user space in two read() syscalls. Then user read() syscall will trigger next seq_ops->start(). Since the current implementation increased pos even for seq_ops->start(), it will skip record #2, #4 and #6, assuming the first record is #1. root@arch-fb-vm1:~/net-next dd if=/proc/net/ipv6_route bs=128 00000000000000000000000000000000 40 00000000000000000000000000000000 00 00000000000000000000000000000000 00000400 00000001 00000000 00000001 eth0 00000000000000000000000000000000 00 00000000000000000000000000000000 00 00000000000000000000000000000000 ffffffff 00000001 00000000 00200200 lo fe800000000000002050e3fffebd3be8 80 00000000000000000000000000000000 00 00000000000000000000000000000000 00000000 00000002 00000000 80200001 eth0 00000000000000000000000000000000 00 00000000000000000000000000000000 00 00000000000000000000000000000000 ffffffff 00000001 00000000 00200200 lo 4+1 records in 4+1 records out 600 bytes copied, 0.00127758 s, 470 kB/s To fix the problem, create a fake pos pointer so seq_ops->start() won't actually increase seq_file pos. With this fix, the above `dd` command with `bs=128` will show correct result. Fixes: 4fc427e05158 ("ipv6_route_seq_next should increase position index") Cc: Alexei Starovoitov <ast@kernel.org> Suggested-by: Vasily Averin <vvs@virtuozzo.com> Reviewed-by: Vasily Averin <vvs@virtuozzo.com> Signed-off-by: Yonghong Song <yhs@fb.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Acked-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-15net/smc: fix invalid return code in smcd_new_buf_create()Karsten Graul
smc_ism_register_dmb() returns error codes set by the ISM driver which are not guaranteed to be negative or in the errno range. Such values would not be handled by ERR_PTR() and finally the return code will be used as a memory address. Fix that by using a valid negative errno value with ERR_PTR(). Fixes: 72b7f6c48708 ("net/smc: unique reason code for exceeded max dmb count") Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-15net/smc: fix valid DMBE buffer sizesKarsten Graul
The SMCD_DMBE_SIZES should include all valid DMBE buffer sizes, so the correct value is 6 which means 1MB. With 7 the registration of an ISM buffer would always fail because of the invalid size requested. Fix that and set the value to 6. Fixes: c6ba7c9ba43d ("net/smc: add base infrastructure for SMC-D and ISM") Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-15net/smc: fix use-after-free of delayed eventsKarsten Graul
When a delayed event is enqueued then the event worker will send this event the next time it is running and no other flow is currently active. The event handler is called for the delayed event, and the pointer to the event keeps set in lgr->delayed_event. This pointer is cleared later in the processing by smc_llc_flow_start(). This can lead to a use-after-free condition when the processing does not reach smc_llc_flow_start(), but frees the event because of an error situation. Then the delayed_event pointer is still set but the event is freed. Fix this by always clearing the delayed event pointer when the event is provided to the event handler for processing, and remove the code to clear it in smc_llc_flow_start(). Fixes: 555da9af827d ("net/smc: add event-based llc_flow framework") Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-15bpfilter: Fix build error with CONFIG_BPFILTER_UMHYueHaibing
IF CONFIG_BPFILTER_UMH is set, building fails: In file included from /usr/include/sys/socket.h:33:0, from net/bpfilter/main.c:6: /usr/include/bits/socket.h:390:10: fatal error: asm/socket.h: No such file or directory #include <asm/socket.h> ^~~~~~~~~~~~~~ compilation terminated. scripts/Makefile.userprogs:43: recipe for target 'net/bpfilter/main.o' failed make[2]: *** [net/bpfilter/main.o] Error 1 Add missing include path to fix this. Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-15rxrpc: Fix loss of final ack on shutdownDavid Howells
Fix the loss of transmission of a call's final ack when a socket gets shut down. This means that the server will retransmit the last data packet or send a ping ack and then get an ICMP indicating the port got closed. The server will then view this as a failure. Fixes: 3136ef49a14c ("rxrpc: Delay terminal ACK transmission on a client call") Signed-off-by: David Howells <dhowells@redhat.com>
2020-10-15rxrpc: Fix bundle counting for exclusive connectionsDavid Howells
Fix rxrpc_unbundle_conn() to not drop the bundle usage count when cleaning up an exclusive connection. Based on the suggested fix from Hillf Danton. Fixes: 245500d853e9 ("rxrpc: Rewrite the client connection manager") Reported-by: syzbot+d57aaf84dd8a550e6d91@syzkaller.appspotmail.com Signed-off-by: David Howells <dhowells@redhat.com> cc: Hillf Danton <hdanton@sina.com>
2020-10-14netfilter: restore NF_INET_NUMHOOKSPablo Neira Ayuso
This definition is used by the iptables legacy UAPI, restore it. Fixes: d3519cb89f6d ("netfilter: nf_tables: add inet ingress support") Reported-by: Jason A. Donenfeld <Jason@zx2c4.com> Tested-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-14ipv6/icmp: l3mdev: Perform icmp error route lookup on source device routing ↵Mathieu Desnoyers
table (v2) As per RFC4443, the destination address field for ICMPv6 error messages is copied from the source address field of the invoking packet. In configurations with Virtual Routing and Forwarding tables, looking up which routing table to use for sending ICMPv6 error messages is currently done by using the destination net_device. If the source and destination interfaces are within separate VRFs, or one in the global routing table and the other in a VRF, looking up the source address of the invoking packet in the destination interface's routing table will fail if the destination interface's routing table contains no route to the invoking packet's source address. One observable effect of this issue is that traceroute6 does not work in the following cases: - Route leaking between global routing table and VRF - Route leaking between VRFs Use the source device routing table when sending ICMPv6 error messages. [ In the context of ipv4, it has been pointed out that a similar issue may exist with ICMP errors triggered when forwarding between network namespaces. It would be worthwhile to investigate whether ipv6 has similar issues, but is outside of the scope of this investigation. ] [ Testing shows that similar issues exist with ipv6 unreachable / fragmentation needed messages. However, investigation of this additional failure mode is beyond this investigation's scope. ] Link: https://tools.ietf.org/html/rfc4443 Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-14ipv4/icmp: l3mdev: Perform icmp error route lookup on source device routing ↵Mathieu Desnoyers
table (v2) As per RFC792, ICMP errors should be sent to the source host. However, in configurations with Virtual Routing and Forwarding tables, looking up which routing table to use is currently done by using the destination net_device. commit 9d1a6c4ea43e ("net: icmp_route_lookup should use rt dev to determine L3 domain") changes the interface passed to l3mdev_master_ifindex() and inet_addr_type_dev_table() from skb_in->dev to skb_dst(skb_in)->dev. This effectively uses the destination device rather than the source device for choosing which routing table should be used to lookup where to send the ICMP error. Therefore, if the source and destination interfaces are within separate VRFs, or one in the global routing table and the other in a VRF, looking up the source host in the destination interface's routing table will fail if the destination interface's routing table contains no route to the source host. One observable effect of this issue is that traceroute does not work in the following cases: - Route leaking between global routing table and VRF - Route leaking between VRFs Preferably use the source device routing table when sending ICMP error messages. If no source device is set, fall-back on the destination device routing table. Else, use the main routing table (index 0). [ It has been pointed out that a similar issue may exist with ICMP errors triggered when forwarding between network namespaces. It would be worthwhile to investigate, but is outside of the scope of this investigation. ] [ It has also been pointed out that a similar issue exists with unreachable / fragmentation needed messages, which can be triggered by changing the MTU of eth1 in r1 to 1400 and running: ip netns exec h1 ping -s 1450 -Mdo -c1 172.16.2.2 Some investigation points to raw_icmp_error() and raw_err() as being involved in this last scenario. The focus of this patch is TTL expired ICMP messages, which go through icmp_route_lookup. Investigation of failure modes related to raw_icmp_error() is beyond this investigation's scope. ] Fixes: 9d1a6c4ea43e ("net: icmp_route_lookup should use rt dev to determine L3 domain") Link: https://tools.ietf.org/html/rfc792 Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-13Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nfJakub Kicinski
Pablo Neira Ayuso says: ==================== Netfilter fixes for net The following patchset contains Netfilter fixes for net: 1) Extend nf_queue selftest to cover re-queueing, non-gso mode and delayed queueing, from Florian Westphal. 2) Clear skb->tstamp in IPVS forwarding path, from Julian Anastasov. 3) Provide netlink extended error reporting for EEXIST case. 4) Missing VLAN offload tag and proto in log target. ==================== Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-13ip_gre: set dev->hard_header_len and dev->needed_headroom properlyCong Wang
GRE tunnel has its own header_ops, ipgre_header_ops, and sets it conditionally. When it is set, it assumes the outer IP header is already created before ipgre_xmit(). This is not true when we send packets through a raw packet socket, where L2 headers are supposed to be constructed by user. Packet socket calls dev_validate_header() to validate the header. But GRE tunnel does not set dev->hard_header_len, so that check can be simply bypassed, therefore uninit memory could be passed down to ipgre_xmit(). Similar for dev->needed_headroom. dev->hard_header_len is supposed to be the length of the header created by dev->header_ops->create(), so it should be used whenever header_ops is set, and dev->needed_headroom should be used when it is not set. Reported-and-tested-by: syzbot+4a2c52677a8a1aa283cb@syzkaller.appspotmail.com Cc: William Tu <u9012063@gmail.com> Acked-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Acked-by: Xie He <xie.he.0141@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-13xfrm: use new function dev_fetch_sw_netstatsHeiner Kallweit
Simplify the code by using new function dev_fetch_sw_netstats(). Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Link: https://lore.kernel.org/r/a6b816f4-bbf2-9db0-d59a-7e4e9cc808fe@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-13net: openvswitch: use new function dev_fetch_sw_netstatsHeiner Kallweit
Simplify the code by using new function dev_fetch_sw_netstats(). Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Link: https://lore.kernel.org/r/5e52dc91-97b1-82b0-214b-65d404e4a2ec@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-13mac80211: use new function dev_fetch_sw_netstatsHeiner Kallweit
Simplify the code by using new function dev_fetch_sw_netstats(). Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Link: https://lore.kernel.org/r/93dda477-70ae-0ccf-71b4-bfebb66c9beb@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-13iptunnel: use new function dev_fetch_sw_netstatsHeiner Kallweit
Simplify the code by using new function dev_fetch_sw_netstats(). Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Link: https://lore.kernel.org/r/050f9a83-b195-a3d6-edbd-91a59040be21@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-13net: dsa: use new function dev_fetch_sw_netstatsHeiner Kallweit
Simplify the code by using new function dev_fetch_sw_netstats(). Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Tested-by: Vladimir Oltean <olteanv@gmail.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Link: https://lore.kernel.org/r/b6047017-8226-6b7e-a3cd-064e69fdfa27@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-13net: bridge: use new function dev_fetch_sw_netstatsHeiner Kallweit
Simplify the code by using new function dev_fetch_sw_netstats(). Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Link: https://lore.kernel.org/r/d1c3ff29-5691-9d54-d164-16421905fa59@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-13net: add function dev_fetch_sw_netstats for fetching pcpu_sw_netstatsHeiner Kallweit
In several places the same code is used to populate rtnl_link_stats64 fields with data from pcpu_sw_netstats. Therefore factor out this code to a new function dev_fetch_sw_netstats(). v2: - constify argument netstats - don't ignore netstats being NULL or an ERRPTR - switch to EXPORT_SYMBOL_GPL Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Link: https://lore.kernel.org/r/6d16a338-52f5-df69-0020-6bc771a7d498@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-13net/af_unix: Remove unused old_pid variableOr Cohen
Commit 109f6e39fa07c48f5801 ("af_unix: Allow SO_PEERCRED to work across namespaces.") introduced the old_pid variable in unix_listen, but it's never used. Remove the declaration and the call to put_pid. Signed-off-by: Or Cohen <orcohen@paloaltonetworks.com> Link: https://lore.kernel.org/r/20201011153527.18628-1-orcohen@paloaltonetworks.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-13socket: don't clear SOCK_TSTAMP_NEW when SO_TIMESTAMPNS is disabledChristian Eggers
SOCK_TSTAMP_NEW (timespec64 instead of timespec) is also used for hardware time stamps (configured via SO_TIMESTAMPING_NEW). User space (ptp4l) first configures hardware time stamping via SO_TIMESTAMPING_NEW which sets SOCK_TSTAMP_NEW. In the next step, ptp4l disables SO_TIMESTAMPNS(_NEW) (software time stamps), but this must not switch hardware time stamps back to "32 bit mode". This problem happens on 32 bit platforms were the libc has already switched to struct timespec64 (from SO_TIMExxx_OLD to SO_TIMExxx_NEW socket options). ptp4l complains with "missing timestamp on transmitted peer delay request" because the wrong format is received (and discarded). Fixes: 887feae36aee ("socket: Add SO_TIMESTAMP[NS]_NEW") Fixes: 783da70e8396 ("net: add sock_enable_timestamps") Signed-off-by: Christian Eggers <ceggers@arri.de> Acked-by: Willem de Bruijn <willemb@google.com> Acked-by: Deepa Dinamani <deepa.kernel@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-13socket: fix option SO_TIMESTAMPING_NEWChristian Eggers
The comparison of optname with SO_TIMESTAMPING_NEW is wrong way around, so SOCK_TSTAMP_NEW will first be set and than reset again. Additionally move it out of the test for SOF_TIMESTAMPING_RX_SOFTWARE as this seems unrelated. This problem happens on 32 bit platforms were the libc has already switched to struct timespec64 (from SO_TIMExxx_OLD to SO_TIMExxx_NEW socket options). ptp4l complains with "missing timestamp on transmitted peer delay request" because the wrong format is received (and discarded). Fixes: 9718475e6908 ("socket: Add SO_TIMESTAMPING_NEW") Signed-off-by: Christian Eggers <ceggers@arri.de> Reviewed-by: Willem de Bruijn <willemdebruijn.kernel@gmail.com> Reviewed-by: Deepa Dinamani <deepa.kernel@gmail.com> Acked-by: Willem de Bruijn <willemb@google.com> Acked-by: Deepa Dinamani <deepa.kernel@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-13net/tls: use semicolons rather than commas to separate statementsJulia Lawall
Replace commas with semicolons. Commas introduce unnecessary variability in the code structure and are hard to see. What is done is essentially described by the following Coccinelle semantic patch (http://coccinelle.lip6.fr/): // <smpl> @@ expression e1,e2; @@ e1 -, +; e2 ... when any // </smpl> Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr> Link: https://lore.kernel.org/r/1602412498-32025-6-git-send-email-Julia.Lawall@inria.fr Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-13net/ipv6: use semicolons rather than commas to separate statementsJulia Lawall
Replace commas with semicolons. Commas introduce unnecessary variability in the code structure and are hard to see. What is done is essentially described by the following Coccinelle semantic patch (http://coccinelle.lip6.fr/): // <smpl> @@ expression e1,e2; @@ e1 -, +; e2 ... when any // </smpl> Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr> Acked-by: Paul Moore <paul@paul-moore.com> Link: https://lore.kernel.org/r/1602412498-32025-5-git-send-email-Julia.Lawall@inria.fr Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-13tcp: use semicolons rather than commas to separate statementsJulia Lawall
Replace commas with semicolons. Commas introduce unnecessary variability in the code structure and are hard to see. What is done is essentially described by the following Coccinelle semantic patch (http://coccinelle.lip6.fr/): // <smpl> @@ expression e1,e2; @@ e1 -, +; e2 ... when any // </smpl> Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr> Link: https://lore.kernel.org/r/1602412498-32025-4-git-send-email-Julia.Lawall@inria.fr Signed-off-by: Jakub Kicinski <kuba@kernel.org>