From 223f5f79f2ce8facd9d77dd44e9f403343630bfc Mon Sep 17 00:00:00 2001
From: Kui-Feng Lee
Date: Fri, 23 Jun 2023 18:45:59 -0700
Subject: bpf, net: Check skb ownership against full socket.

Check the ownership of an skb against full sockets instead of
request_sock.

The filters were called only if an skb is owned by the sock that the skb
is sent out through. In other words, skb->sk should point to the sock
that the skb is sent through on egress. However, the filters would miss
SYN/ACK skbs that are owned by a request_sock but sent through the
listener sock, that is, the socket listening for incoming connections.
Since the listener socket is also the full socket of the request socket,
we should use the full socket as the owner socket of an skb instead.

What is the ownership check for?
================================

BPF_CGROUP_RUN_PROG_INET_EGRESS() checked sk == skb->sk to ensure the
ownership of an skb. Alexei referred to a mailing list conversation [0]
that took place a few years ago. In that conversation, Daniel Borkmann
stated that:

    Wouldn't that mean however, when you go through stacked devices that
    you'd run the same eBPF cgroup program for skb->sk multiple times?

According to what Daniel said, the ownership check mentioned earlier
presumably prevents multiple calls of egress filters caused by an skb. A
test that reproduces this scenario shows that the BPF cgroup egress
programs can be called multiple times for one skb if this ownership
check is not there. So, we cannot just remove this check.

Test Stacked Devices
====================

We use L2TP to build an environment of stacked devices. L2TP (Layer 2
Tunneling Protocol) is a tunneling protocol used to support virtual
private networks (VPNs). It relays encapsulated packets, for example in
UDP, to its peer by using a socket. Using L2TP, packets are first sent
through the IP stack and should then arrive at an L2TP device. The
device will expand its skb header to encapsulate the packet.
The skb will be sent back to the IP stack using the socket that was made
for the L2TP session. After that, the routing process will occur once
more, but this time for a new destination.

We changed tools/testing/selftests/net/l2tp.sh to set up a test
environment using L2TP. The run_ping() function in l2tp.sh is where the
main change occurred.

    run_ping()
    {
        local desc="$1"

        sleep 10
        run_cmd host-1 ${ping6} -s 227 -c 4 -i 10 -I fc00:101::1 fc00:101::2
        log_test $? 0 "IPv6 route through L2TP tunnel ${desc}"
        sleep 10
    }

The test will use L2TP devices to send PING messages. These messages will
have a message size of 227 bytes as a special label to distinguish them.
This is not an ideal solution, but works.

During the execution of the test script, bpftrace was attached to
ip6_finish_output() and l2tp_xmit_skb():

    bpftrace -e '
    kfunc:ip6_finish_output {
        time("%H:%M:%S: ");
        printf("ip6_finish_output skb=%p skb->len=%d cgroup=%p sk=%p skb->sk=%p\n",
               args->skb, args->skb->len,
               args->sk->sk_cgrp_data.cgroup, args->sk, args->skb->sk);
    }
    kfunc:l2tp_xmit_skb {
        time("%H:%M:%S: ");
        printf("l2tp_xmit_skb skb=%p sk=%p\n",
               args->skb, args->session->tunnel->sock);
    }'

The following is part of the output messages printed by bpftrace:

    16:35:20: ip6_finish_output skb=0xffff888103d8e600 skb->len=275 cgroup=0xffff88810741f800 sk=0xffff888105f3b900 skb->sk=0xffff888105f3b900
    16:35:20: l2tp_xmit_skb skb=0xffff888103d8e600 sk=0xffff888103dd6300
    16:35:20: ip6_finish_output skb=0xffff888103d8e600 skb->len=337 cgroup=0xffff88810741f800 sk=0xffff888103dd6300 skb->sk=0xffff888105f3b900
    16:35:20: ip6_finish_output skb=0xffff888103d8e600 skb->len=337 cgroup=(nil) sk=(nil) skb->sk=(nil)
    16:35:20: ip6_finish_output skb=0xffff888103d8e000 skb->len=275 cgroup=0xffffffff837741d0 sk=0xffff888101fe0000 skb->sk=0xffff888101fe0000
    16:35:20: l2tp_xmit_skb skb=0xffff888103d8e000 sk=0xffff888103483180
    16:35:20: ip6_finish_output skb=0xffff888103d8e000 skb->len=337 cgroup=0xffff88810741f800 sk=0xffff888103483180 skb->sk=0xffff888101fe0000
    16:35:20: ip6_finish_output skb=0xffff888103d8e000 skb->len=337 cgroup=(nil) sk=(nil) skb->sk=(nil)

The first four entries describe a PING message that was sent using the
ping command, whereas the following four entries describe the response
received. It can be observed that multiple sockets are used to send one
skb, including the socket used by the L2TP session. Based on this
information, it seems that the ownership check is designed to avoid
multiple calls of egress filters caused by a single skb.

[0] https://lore.kernel.org/all/58193E9D.7040201@iogearbox.net/

Signed-off-by: Kui-Feng Lee
Signed-off-by: Daniel Borkmann
Link: https://lore.kernel.org/bpf/20230624014600.576756-2-kuifeng@meta.com
---
 include/linux/bpf-cgroup.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

(limited to 'include')

diff --git a/include/linux/bpf-cgroup.h b/include/linux/bpf-cgroup.h
index 57e9e109257e..8506690dbb9c 100644
--- a/include/linux/bpf-cgroup.h
+++ b/include/linux/bpf-cgroup.h
@@ -199,9 +199,9 @@ static inline bool cgroup_bpf_sock_enabled(struct sock *sk,
 #define BPF_CGROUP_RUN_PROG_INET_EGRESS(sk, skb) \
 ({ \
 	int __ret = 0; \
-	if (cgroup_bpf_enabled(CGROUP_INET_EGRESS) && sk && sk == skb->sk) { \
+	if (cgroup_bpf_enabled(CGROUP_INET_EGRESS) && sk) { \
 		typeof(sk) __sk = sk_to_full_sk(sk); \
-		if (sk_fullsock(__sk) && \
+		if (sk_fullsock(__sk) && __sk == skb_to_full_sk(skb) && \
 		    cgroup_bpf_sock_enabled(__sk, CGROUP_INET_EGRESS)) \
 			__ret = __cgroup_bpf_run_filter_skb(__sk, skb, \
 						      CGROUP_INET_EGRESS); \
--
cgit v1.2.3

From 25954730461af01f66afa9e17036b051986b007e Mon Sep 17 00:00:00 2001
From: Anton Protopopov
Date: Thu, 6 Jul 2023 13:39:28 +0000
Subject: bpf: add percpu stats for bpf_map elements insertions/deletions

Add generic percpu stats for bpf_map elements insertions/deletions in
order to keep track of both the current (approximate) number of elements
in a map and per-cpu statistics on update/delete operations.
To expose these stats a particular map implementation should initialize
the counter and adjust it as needed using the 'bpf_map_*_elem_count'
helpers provided by this commit.

Signed-off-by: Anton Protopopov
Link: https://lore.kernel.org/r/20230706133932.45883-2-aspsk@isovalent.com
Signed-off-by: Alexei Starovoitov
---
 include/linux/bpf.h | 30 ++++++++++++++++++++++++++++++
 1 file changed, 30 insertions(+)

(limited to 'include')

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index f58895830ada..360433f14496 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -275,6 +275,7 @@ struct bpf_map {
 	} owner;
 	bool bypass_spec_v1;
 	bool frozen; /* write-once; write-protected by freeze_mutex */
+	s64 __percpu *elem_count;
 };

 static inline const char *btf_field_type_name(enum btf_field_type type)
@@ -2040,6 +2041,35 @@ bpf_map_alloc_percpu(const struct bpf_map *map, size_t size, size_t align,
 }
 #endif

+static inline int
+bpf_map_init_elem_count(struct bpf_map *map)
+{
+	size_t size = sizeof(*map->elem_count), align = size;
+	gfp_t flags = GFP_USER | __GFP_NOWARN;
+
+	map->elem_count = bpf_map_alloc_percpu(map, size, align, flags);
+	if (!map->elem_count)
+		return -ENOMEM;
+
+	return 0;
+}
+
+static inline void
+bpf_map_free_elem_count(struct bpf_map *map)
+{
+	free_percpu(map->elem_count);
+}
+
+static inline void bpf_map_inc_elem_count(struct bpf_map *map)
+{
+	this_cpu_inc(*map->elem_count);
+}
+
+static inline void bpf_map_dec_elem_count(struct bpf_map *map)
+{
+	this_cpu_dec(*map->elem_count);
+}
+
 extern int sysctl_unprivileged_bpf_disabled;

 static inline bool bpf_allow_ptr_leaks(void)
--
cgit v1.2.3

From 7ac8d0d2619256cc13eaf4a889b3177a1607b02d Mon Sep 17 00:00:00 2001
From: Yafang Shao
Date: Sun, 9 Jul 2023 02:56:21 +0000
Subject: bpf: Support ->fill_link_info for kprobe_multi

With the addition of support for fill_link_info to the kprobe_multi
link, users will gain the ability to inspect it conveniently using
`bpftool link show`.
This enhancement provides valuable information to the user, including
the count of probed functions and their respective addresses. It's
important to note that if the kptr_restrict setting does not permit it,
the probed addresses will not be exposed, ensuring security.

Signed-off-by: Yafang Shao
Acked-by: Jiri Olsa
Acked-by: Andrii Nakryiko
Link: https://lore.kernel.org/r/20230709025630.3735-2-laoar.shao@gmail.com
Signed-off-by: Alexei Starovoitov
---
 include/uapi/linux/bpf.h | 5 +++++
 1 file changed, 5 insertions(+)

(limited to 'include')

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 60a9d59beeab..a4e881c64e0f 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -6439,6 +6439,11 @@ struct bpf_link_info {
 			__s32 priority;
 			__u32 flags;
 		} netfilter;
+		struct {
+			__aligned_u64 addrs;
+			__u32 count; /* in/out: kprobe_multi function count */
+			__u32 flags;
+		} kprobe_multi;
 	};
 } __attribute__((aligned(8)));

--
cgit v1.2.3

From 5125e757e62f6c1d5478db4c2b61a744060ddf3f Mon Sep 17 00:00:00 2001
From: Yafang Shao
Date: Sun, 9 Jul 2023 02:56:25 +0000
Subject: bpf: Clear the probe_addr for uprobe

To avoid returning uninitialized or random values when querying the file
descriptor (fd) and accessing probe_addr, it is necessary to clear the
variable prior to its use.
Fixes: 41bdc4b40ed6 ("bpf: introduce bpf subcommand BPF_TASK_FD_QUERY")
Signed-off-by: Yafang Shao
Acked-by: Yonghong Song
Acked-by: Jiri Olsa
Link: https://lore.kernel.org/r/20230709025630.3735-6-laoar.shao@gmail.com
Signed-off-by: Alexei Starovoitov
---
 include/linux/trace_events.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

(limited to 'include')

diff --git a/include/linux/trace_events.h b/include/linux/trace_events.h
index 7c4a0b72334e..36de9ebec440 100644
--- a/include/linux/trace_events.h
+++ b/include/linux/trace_events.h
@@ -864,7 +864,8 @@ extern int perf_uprobe_init(struct perf_event *event,
 extern void perf_uprobe_destroy(struct perf_event *event);
 extern int bpf_get_uprobe_info(const struct perf_event *event,
 			       u32 *fd_type, const char **filename,
-			       u64 *probe_offset, bool perf_type_tracepoint);
+			       u64 *probe_offset, u64 *probe_addr,
+			       bool perf_type_tracepoint);
 #endif
 extern int ftrace_profile_set_filter(struct perf_event *event, int event_id,
 				     char *filter_str);
--
cgit v1.2.3

From 1b715e1b0ec531fae72cd6698fe1c98affa436f8 Mon Sep 17 00:00:00 2001
From: Yafang Shao
Date: Sun, 9 Jul 2023 02:56:28 +0000
Subject: bpf: Support ->fill_link_info for perf_event

By introducing support for ->fill_link_info to the perf_event link,
users gain the ability to inspect it using `bpftool link show`. While
the current approach involves accessing this information via `bpftool
perf show`, consolidating link information for all link types in one
place offers greater convenience. Additionally, this patch extends
support to the generic perf event, which is not currently accommodated
by `bpftool perf show`. Only the perf type and config are exposed to
userspace; other attributes such as sample_period and sample_freq are
ignored. It's important to note that if kptr_restrict does not permit
it, the probed address will not be exposed, maintaining security
measures.

A new enum bpf_perf_event_type is introduced to help the user understand
which struct is relevant.
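
As a rough illustration of how userspace might use the new enum to decide
which member of the perf_event union to read, here is a minimal sketch.
It is not part of the patch: demo_perf_event_member() is a hypothetical
helper, and the enum is copied locally only so that the sketch builds
without an updated <linux/bpf.h>.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Local copy of the enum values introduced by this patch (assumption:
 * the installed uapi headers may not carry them yet).
 */
enum bpf_perf_event_type {
	BPF_PERF_EVENT_UNSPEC = 0,
	BPF_PERF_EVENT_UPROBE = 1,
	BPF_PERF_EVENT_URETPROBE = 2,
	BPF_PERF_EVENT_KPROBE = 3,
	BPF_PERF_EVENT_KRETPROBE = 4,
	BPF_PERF_EVENT_TRACEPOINT = 5,
	BPF_PERF_EVENT_EVENT = 6,
};

/*
 * Hypothetical helper: return the name of the bpf_link_info.perf_event
 * union member that is relevant for the given type, or NULL when the
 * type is unspecified or unknown.
 */
static inline const char *demo_perf_event_member(unsigned int type)
{
	switch (type) {
	case BPF_PERF_EVENT_UPROBE:
	case BPF_PERF_EVENT_URETPROBE:
		return "uprobe";
	case BPF_PERF_EVENT_KPROBE:
	case BPF_PERF_EVENT_KRETPROBE:
		return "kprobe";
	case BPF_PERF_EVENT_TRACEPOINT:
		return "tracepoint";
	case BPF_PERF_EVENT_EVENT:
		return "event";
	default:
		return NULL;
	}
}
```

A real consumer would switch on info.perf_event.type in the same way and
then read the matching struct instead of returning its name.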
Signed-off-by: Yafang Shao
Acked-by: Jiri Olsa
Link: https://lore.kernel.org/r/20230709025630.3735-9-laoar.shao@gmail.com
Signed-off-by: Alexei Starovoitov
---
 include/uapi/linux/bpf.h | 35 +++++++++++++++++++++++++++++++++++
 1 file changed, 35 insertions(+)

(limited to 'include')

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index a4e881c64e0f..600d0caebbd8 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -1057,6 +1057,16 @@ enum bpf_link_type {
 	MAX_BPF_LINK_TYPE,
 };

+enum bpf_perf_event_type {
+	BPF_PERF_EVENT_UNSPEC = 0,
+	BPF_PERF_EVENT_UPROBE = 1,
+	BPF_PERF_EVENT_URETPROBE = 2,
+	BPF_PERF_EVENT_KPROBE = 3,
+	BPF_PERF_EVENT_KRETPROBE = 4,
+	BPF_PERF_EVENT_TRACEPOINT = 5,
+	BPF_PERF_EVENT_EVENT = 6,
+};
+
 /* cgroup-bpf attach flags used in BPF_PROG_ATTACH command
  *
  * NONE(default): No further bpf programs allowed in the subtree.
@@ -6444,6 +6454,31 @@ struct bpf_link_info {
 			__u32 count; /* in/out: kprobe_multi function count */
 			__u32 flags;
 		} kprobe_multi;
+		struct {
+			__u32 type; /* enum bpf_perf_event_type */
+			__u32 :32;
+			union {
+				struct {
+					__aligned_u64 file_name; /* in/out */
+					__u32 name_len;
+					__u32 offset; /* offset from file_name */
+				} uprobe; /* BPF_PERF_EVENT_UPROBE, BPF_PERF_EVENT_URETPROBE */
+				struct {
+					__aligned_u64 func_name; /* in/out */
+					__u32 name_len;
+					__u32 offset; /* offset from func_name */
+					__u64 addr;
+				} kprobe; /* BPF_PERF_EVENT_KPROBE, BPF_PERF_EVENT_KRETPROBE */
+				struct {
+					__aligned_u64 tp_name; /* in/out */
+					__u32 name_len;
+				} tracepoint; /* BPF_PERF_EVENT_TRACEPOINT */
+				struct {
+					__u64 config;
+					__u32 type;
+				} event; /* BPF_PERF_EVENT_EVENT */
+			};
+		} perf_event;
 	};
 } __attribute__((aligned(8)));

--
cgit v1.2.3

From 43a89baecfe200cb4530f42b9fcf904925d6d14a Mon Sep 17 00:00:00 2001
From: "Paul E. McKenney"
Date: Wed, 5 Jul 2023 20:34:43 -0700
Subject: rcu: Export rcu_request_urgent_qs_task()

If a CPU is executing a long series of non-sleeping system calls, RCU
grace periods can be delayed for on the order of a couple hundred
milliseconds. This is normally not a problem, but if each system call
does a call_rcu(), those callbacks can stack up. RCU will eventually
notice this callback storm, but use of rcu_request_urgent_qs_task()
allows the code invoking call_rcu() to give RCU a heads up.

This function is not for general use, not yet, anyway.

Reported-by: Alexei Starovoitov
Signed-off-by: Paul E. McKenney
Signed-off-by: Alexei Starovoitov
Signed-off-by: Daniel Borkmann
Link: https://lore.kernel.org/bpf/20230706033447.54696-11-alexei.starovoitov@gmail.com
---
 include/linux/rcutiny.h | 2 ++
 include/linux/rcutree.h | 1 +
 2 files changed, 3 insertions(+)

(limited to 'include')

diff --git a/include/linux/rcutiny.h b/include/linux/rcutiny.h
index 7f17acf29dda..7b949292908a 100644
--- a/include/linux/rcutiny.h
+++ b/include/linux/rcutiny.h
@@ -138,6 +138,8 @@ static inline int rcu_needs_cpu(void)
 	return 0;
 }

+static inline void rcu_request_urgent_qs_task(struct task_struct *t) { }
+
 /*
  * Take advantage of the fact that there is only one CPU, which
  * allows us to ignore virtualization-based context switches.
diff --git a/include/linux/rcutree.h b/include/linux/rcutree.h
index 56bccb5a8fde..126f6b418f6a 100644
--- a/include/linux/rcutree.h
+++ b/include/linux/rcutree.h
@@ -21,6 +21,7 @@ void rcu_softirq_qs(void);
 void rcu_note_context_switch(bool preempt);
 int rcu_needs_cpu(void);
 void rcu_cpu_stall_reset(void);
+void rcu_request_urgent_qs_task(struct task_struct *t);

 /*
  * Note a virtualization-based context switch. This is simply a
--
cgit v1.2.3

From 5af6807bdb10d1af9d412d7d6c177ba8440adffb Mon Sep 17 00:00:00 2001
From: Alexei Starovoitov
Date: Wed, 5 Jul 2023 20:34:45 -0700
Subject: bpf: Introduce bpf_mem_free_rcu() similar to kfree_rcu().
Introduce bpf_mem_[cache_]free_rcu() similar to kfree_rcu().
Unlike bpf_mem_[cache_]free(), which links objects into the per-cpu free
list for immediate reuse, the _rcu() flavor waits for an RCU grace
period and then moves objects into the free_by_rcu_ttrace list, where
they wait for an RCU task trace grace period before being freed into
slab.

The life cycle of objects:
alloc: dequeue free_llist
free: enqueue free_llist
free_rcu: enqueue free_by_rcu -> waiting_for_gp
free_llist above high watermark -> free_by_rcu_ttrace
after RCU GP waiting_for_gp -> free_by_rcu_ttrace
free_by_rcu_ttrace -> waiting_for_gp_ttrace -> slab

Signed-off-by: Alexei Starovoitov
Signed-off-by: Daniel Borkmann
Acked-by: Hou Tao
Link: https://lore.kernel.org/bpf/20230706033447.54696-13-alexei.starovoitov@gmail.com
---
 include/linux/bpf_mem_alloc.h | 2 ++
 1 file changed, 2 insertions(+)

(limited to 'include')

diff --git a/include/linux/bpf_mem_alloc.h b/include/linux/bpf_mem_alloc.h
index 3929be5743f4..d644bbb298af 100644
--- a/include/linux/bpf_mem_alloc.h
+++ b/include/linux/bpf_mem_alloc.h
@@ -27,10 +27,12 @@ void bpf_mem_alloc_destroy(struct bpf_mem_alloc *ma);
 /* kmalloc/kfree equivalent: */
 void *bpf_mem_alloc(struct bpf_mem_alloc *ma, size_t size);
 void bpf_mem_free(struct bpf_mem_alloc *ma, void *ptr);
+void bpf_mem_free_rcu(struct bpf_mem_alloc *ma, void *ptr);

 /* kmem_cache_alloc/free equivalent: */
 void *bpf_mem_cache_alloc(struct bpf_mem_alloc *ma);
 void bpf_mem_cache_free(struct bpf_mem_alloc *ma, void *ptr);
+void bpf_mem_cache_free_rcu(struct bpf_mem_alloc *ma, void *ptr);
 void bpf_mem_cache_raw_free(void *ptr);
 void *bpf_mem_cache_alloc_flags(struct bpf_mem_alloc *ma, gfp_t flags);

--
cgit v1.2.3

From d26979f1cef71d6fa036f6cedfa0059715092503 Mon Sep 17 00:00:00 2001
From: Bartosz Golaszewski
Date: Mon, 10 Jul 2023 10:59:50 +0200
Subject: net: stmmac: replace the has_integrated_pcs field with a flag

struct plat_stmmacenet_data contains several boolean fields that could
be easily replaced with a
common integer 'flags' bitfield and bit defines. Start the process with
the has_integrated_pcs field.

Signed-off-by: Bartosz Golaszewski
Reviewed-by: Andrew Halaney
Link: https://lore.kernel.org/r/20230710090001.303225-2-brgl@bgdev.pl
Reviewed-by: Simon Horman
Signed-off-by: Jakub Kicinski
---
 include/linux/stmmac.h | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

(limited to 'include')

diff --git a/include/linux/stmmac.h b/include/linux/stmmac.h
index 06090538fe2d..8e7511071ef1 100644
--- a/include/linux/stmmac.h
+++ b/include/linux/stmmac.h
@@ -204,6 +204,8 @@ struct dwmac4_addrs {
 	u32 mtl_low_cred_offset;
 };

+#define STMMAC_FLAG_HAS_INTEGRATED_PCS	BIT(0)
+
 struct plat_stmmacenet_data {
 	int bus_id;
 	int phy_addr;
@@ -293,6 +295,6 @@ struct plat_stmmacenet_data {
 	bool sph_disable;
 	bool serdes_up_after_phy_linkup;
 	const struct dwmac4_addrs *dwmac4_addrs;
-	bool has_integrated_pcs;
+	unsigned int flags;
 };
 #endif
--
cgit v1.2.3

From 309efe6eb499d04b7c09e57298c453b602efe3fd Mon Sep 17 00:00:00 2001
From: Bartosz Golaszewski
Date: Mon, 10 Jul 2023 10:59:51 +0200
Subject: net: stmmac: replace the sph_disable field with a flag

Drop the boolean field of the plat_stmmacenet_data structure in favor of
a simple bitfield flag.
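
The bool-to-flag conversion that this series applies field by field can
be sketched in plain userspace C as below. This is only an illustration
under stated assumptions, not driver code: BIT() is redefined locally,
and struct demo_plat_data, the DEMO_FLAG_* names and the demo_*_flag()
helpers are hypothetical stand-ins for plat_stmmacenet_data and the
STMMAC_FLAG_* defines.

```c
#include <assert.h>
#include <stdbool.h>

/* Local stand-in for the kernel's BIT() macro (assumption: userspace build). */
#define BIT(n) (1U << (n))

/* Hypothetical flag names mirroring the STMMAC_FLAG_* pattern. */
#define DEMO_FLAG_HAS_INTEGRATED_PCS	BIT(0)
#define DEMO_FLAG_SPH_DISABLE		BIT(1)

/* Hypothetical stand-in for struct plat_stmmacenet_data. */
struct demo_plat_data {
	unsigned int flags;	/* one word replaces one bool member per feature */
};

/* Set a flag: the equivalent of "plat->sph_disable = true". */
static inline void demo_set_flag(struct demo_plat_data *plat, unsigned int flag)
{
	plat->flags |= flag;
}

/* Clear a flag: the equivalent of "plat->sph_disable = false". */
static inline void demo_clear_flag(struct demo_plat_data *plat, unsigned int flag)
{
	plat->flags &= ~flag;
}

/* Test a flag: the equivalent of "if (plat->sph_disable)". */
static inline bool demo_has_flag(const struct demo_plat_data *plat,
				 unsigned int flag)
{
	return (plat->flags & flag) != 0;
}
```

Each later patch in the series then only needs to add one more define
and remove one bool member, which is why the remaining diffs are so
small.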
Signed-off-by: Bartosz Golaszewski
Reviewed-by: Andrew Halaney
Link: https://lore.kernel.org/r/20230710090001.303225-3-brgl@bgdev.pl
Reviewed-by: Simon Horman
Signed-off-by: Jakub Kicinski
---
 include/linux/stmmac.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

(limited to 'include')

diff --git a/include/linux/stmmac.h b/include/linux/stmmac.h
index 8e7511071ef1..1b02f866316c 100644
--- a/include/linux/stmmac.h
+++ b/include/linux/stmmac.h
@@ -205,6 +205,7 @@ struct dwmac4_addrs {
 };

 #define STMMAC_FLAG_HAS_INTEGRATED_PCS	BIT(0)
+#define STMMAC_FLAG_SPH_DISABLE	BIT(1)

 struct plat_stmmacenet_data {
 	int bus_id;
@@ -292,7 +293,6 @@ struct plat_stmmacenet_data {
 	int msi_rx_base_vec;
 	int msi_tx_base_vec;
 	bool use_phy_wol;
-	bool sph_disable;
 	bool serdes_up_after_phy_linkup;
 	const struct dwmac4_addrs *dwmac4_addrs;
 	unsigned int flags;
--
cgit v1.2.3

From fd1d62d80ebc1a68ba700e92c9da9d443c08f371 Mon Sep 17 00:00:00 2001
From: Bartosz Golaszewski
Date: Mon, 10 Jul 2023 10:59:52 +0200
Subject: net: stmmac: replace the use_phy_wol field with a flag

Drop the boolean field of the plat_stmmacenet_data structure in favor of
a simple bitfield flag.
Signed-off-by: Bartosz Golaszewski
Reviewed-by: Andrew Halaney
Link: https://lore.kernel.org/r/20230710090001.303225-4-brgl@bgdev.pl
Reviewed-by: Simon Horman
Signed-off-by: Jakub Kicinski
---
 include/linux/stmmac.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

(limited to 'include')

diff --git a/include/linux/stmmac.h b/include/linux/stmmac.h
index 1b02f866316c..15fb07cc89c8 100644
--- a/include/linux/stmmac.h
+++ b/include/linux/stmmac.h
@@ -206,6 +206,7 @@ struct dwmac4_addrs {

 #define STMMAC_FLAG_HAS_INTEGRATED_PCS	BIT(0)
 #define STMMAC_FLAG_SPH_DISABLE	BIT(1)
+#define STMMAC_FLAG_USE_PHY_WOL	BIT(2)

 struct plat_stmmacenet_data {
 	int bus_id;
@@ -292,7 +293,6 @@ struct plat_stmmacenet_data {
 	int msi_sfty_ue_vec;
 	int msi_rx_base_vec;
 	int msi_tx_base_vec;
-	bool use_phy_wol;
 	bool serdes_up_after_phy_linkup;
 	const struct dwmac4_addrs *dwmac4_addrs;
 	unsigned int flags;
--
cgit v1.2.3

From d8daff284e305409fd640feda7345d0221d782ce Mon Sep 17 00:00:00 2001
From: Bartosz Golaszewski
Date: Mon, 10 Jul 2023 10:59:53 +0200
Subject: net: stmmac: replace the has_sun8i field with a flag

Drop the boolean field of the plat_stmmacenet_data structure in favor of
a simple bitfield flag.
Signed-off-by: Bartosz Golaszewski
Reviewed-by: Andrew Halaney
Link: https://lore.kernel.org/r/20230710090001.303225-5-brgl@bgdev.pl
Reviewed-by: Simon Horman
Signed-off-by: Jakub Kicinski
---
 include/linux/stmmac.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

(limited to 'include')

diff --git a/include/linux/stmmac.h b/include/linux/stmmac.h
index 15fb07cc89c8..66dcf84d024a 100644
--- a/include/linux/stmmac.h
+++ b/include/linux/stmmac.h
@@ -207,6 +207,7 @@ struct dwmac4_addrs {
 #define STMMAC_FLAG_HAS_INTEGRATED_PCS	BIT(0)
 #define STMMAC_FLAG_SPH_DISABLE	BIT(1)
 #define STMMAC_FLAG_USE_PHY_WOL	BIT(2)
+#define STMMAC_FLAG_HAS_SUN8I	BIT(3)

 struct plat_stmmacenet_data {
 	int bus_id;
@@ -270,7 +271,6 @@ struct plat_stmmacenet_data {
 	struct reset_control *stmmac_ahb_rst;
 	struct stmmac_axi *axi;
 	int has_gmac4;
-	bool has_sun8i;
 	bool tso_en;
 	int rss_en;
 	int mac_port_sel_speed;
--
cgit v1.2.3

From 68861a3bcc1caf5c15a56e02090310271fd085e1 Mon Sep 17 00:00:00 2001
From: Bartosz Golaszewski
Date: Mon, 10 Jul 2023 10:59:54 +0200
Subject: net: stmmac: replace the tso_en field with a flag

Drop the boolean field of the plat_stmmacenet_data structure in favor of
a simple bitfield flag.
Signed-off-by: Bartosz Golaszewski
Reviewed-by: Andrew Halaney
Link: https://lore.kernel.org/r/20230710090001.303225-6-brgl@bgdev.pl
Reviewed-by: Simon Horman
Signed-off-by: Jakub Kicinski
---
 include/linux/stmmac.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

(limited to 'include')

diff --git a/include/linux/stmmac.h b/include/linux/stmmac.h
index 66dcf84d024a..47ae29a98835 100644
--- a/include/linux/stmmac.h
+++ b/include/linux/stmmac.h
@@ -208,6 +208,7 @@ struct dwmac4_addrs {
 #define STMMAC_FLAG_SPH_DISABLE	BIT(1)
 #define STMMAC_FLAG_USE_PHY_WOL	BIT(2)
 #define STMMAC_FLAG_HAS_SUN8I	BIT(3)
+#define STMMAC_FLAG_TSO_EN	BIT(4)

 struct plat_stmmacenet_data {
 	int bus_id;
@@ -271,7 +272,6 @@ struct plat_stmmacenet_data {
 	struct reset_control *stmmac_ahb_rst;
 	struct stmmac_axi *axi;
 	int has_gmac4;
-	bool tso_en;
 	int rss_en;
 	int mac_port_sel_speed;
 	bool en_tx_lpi_clockgating;
--
cgit v1.2.3

From efe92571bfc30f251f6f2fa7828aa5f239736abf Mon Sep 17 00:00:00 2001
From: Bartosz Golaszewski
Date: Mon, 10 Jul 2023 10:59:55 +0200
Subject: net: stmmac: replace the serdes_up_after_phy_linkup field with a flag

Drop the boolean field of the plat_stmmacenet_data structure in favor of
a simple bitfield flag.
Signed-off-by: Bartosz Golaszewski
Reviewed-by: Andrew Halaney
Link: https://lore.kernel.org/r/20230710090001.303225-7-brgl@bgdev.pl
Reviewed-by: Simon Horman
Signed-off-by: Jakub Kicinski
---
 include/linux/stmmac.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

(limited to 'include')

diff --git a/include/linux/stmmac.h b/include/linux/stmmac.h
index 47ae29a98835..aeb3e75dc748 100644
--- a/include/linux/stmmac.h
+++ b/include/linux/stmmac.h
@@ -209,6 +209,7 @@ struct dwmac4_addrs {
 #define STMMAC_FLAG_USE_PHY_WOL	BIT(2)
 #define STMMAC_FLAG_HAS_SUN8I	BIT(3)
 #define STMMAC_FLAG_TSO_EN	BIT(4)
+#define STMMAC_FLAG_SERDES_UP_AFTER_PHY_LINKUP	BIT(5)

 struct plat_stmmacenet_data {
 	int bus_id;
@@ -293,7 +294,6 @@ struct plat_stmmacenet_data {
 	int msi_sfty_ue_vec;
 	int msi_rx_base_vec;
 	int msi_tx_base_vec;
-	bool serdes_up_after_phy_linkup;
 	const struct dwmac4_addrs *dwmac4_addrs;
 	unsigned int flags;
 };
--
cgit v1.2.3

From fc02152bdbb28bd46df66ddcf4f469760c1b8df8 Mon Sep 17 00:00:00 2001
From: Bartosz Golaszewski
Date: Mon, 10 Jul 2023 10:59:56 +0200
Subject: net: stmmac: replace the vlan_fail_q_en field with a flag

Drop the boolean field of the plat_stmmacenet_data structure in favor of
a simple bitfield flag.
Signed-off-by: Bartosz Golaszewski
Reviewed-by: Andrew Halaney
Link: https://lore.kernel.org/r/20230710090001.303225-8-brgl@bgdev.pl
Reviewed-by: Simon Horman
Signed-off-by: Jakub Kicinski
---
 include/linux/stmmac.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

(limited to 'include')

diff --git a/include/linux/stmmac.h b/include/linux/stmmac.h
index aeb3e75dc748..155cb11b1c8a 100644
--- a/include/linux/stmmac.h
+++ b/include/linux/stmmac.h
@@ -210,6 +210,7 @@ struct dwmac4_addrs {
 #define STMMAC_FLAG_HAS_SUN8I	BIT(3)
 #define STMMAC_FLAG_TSO_EN	BIT(4)
 #define STMMAC_FLAG_SERDES_UP_AFTER_PHY_LINKUP	BIT(5)
+#define STMMAC_FLAG_VLAN_FAIL_Q_EN	BIT(6)

 struct plat_stmmacenet_data {
 	int bus_id;
@@ -278,7 +279,6 @@ struct plat_stmmacenet_data {
 	bool en_tx_lpi_clockgating;
 	bool rx_clk_runs_in_lpi;
 	int has_xgmac;
-	bool vlan_fail_q_en;
 	u8 vlan_fail_q;
 	unsigned int eee_usecs_rate;
 	struct pci_dev *pdev;
--
cgit v1.2.3

From 956c3f09b9c4cc9567ca11c84007545b939e61aa Mon Sep 17 00:00:00 2001
From: Bartosz Golaszewski
Date: Mon, 10 Jul 2023 10:59:57 +0200
Subject: net: stmmac: replace the multi_msi_en field with a flag

Drop the boolean field of the plat_stmmacenet_data structure in favor of
a simple bitfield flag.
Signed-off-by: Bartosz Golaszewski
Reviewed-by: Andrew Halaney
Link: https://lore.kernel.org/r/20230710090001.303225-9-brgl@bgdev.pl
Reviewed-by: Simon Horman
Signed-off-by: Jakub Kicinski
---
 include/linux/stmmac.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

(limited to 'include')

diff --git a/include/linux/stmmac.h b/include/linux/stmmac.h
index 155cb11b1c8a..3365b8071686 100644
--- a/include/linux/stmmac.h
+++ b/include/linux/stmmac.h
@@ -211,6 +211,7 @@ struct dwmac4_addrs {
 #define STMMAC_FLAG_TSO_EN	BIT(4)
 #define STMMAC_FLAG_SERDES_UP_AFTER_PHY_LINKUP	BIT(5)
 #define STMMAC_FLAG_VLAN_FAIL_Q_EN	BIT(6)
+#define STMMAC_FLAG_MULTI_MSI_EN	BIT(7)

 struct plat_stmmacenet_data {
 	int bus_id;
@@ -286,7 +287,6 @@ struct plat_stmmacenet_data {
 	int ext_snapshot_num;
 	bool int_snapshot_en;
 	bool ext_snapshot_en;
-	bool multi_msi_en;
 	int msi_mac_vec;
 	int msi_wol_vec;
 	int msi_lpi_vec;
--
cgit v1.2.3

From aa5513f5d95f6b5311de859f8f466a09863bedf6 Mon Sep 17 00:00:00 2001
From: Bartosz Golaszewski
Date: Mon, 10 Jul 2023 10:59:58 +0200
Subject: net: stmmac: replace the ext_snapshot_en field with a flag

Drop the boolean field of the plat_stmmacenet_data structure in favor of
a simple bitfield flag.
Signed-off-by: Bartosz Golaszewski
Reviewed-by: Andrew Halaney
Link: https://lore.kernel.org/r/20230710090001.303225-10-brgl@bgdev.pl
Reviewed-by: Simon Horman
Signed-off-by: Jakub Kicinski
---
 include/linux/stmmac.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

(limited to 'include')

diff --git a/include/linux/stmmac.h b/include/linux/stmmac.h
index 3365b8071686..0a77e8b05d3a 100644
--- a/include/linux/stmmac.h
+++ b/include/linux/stmmac.h
@@ -212,6 +212,7 @@ struct dwmac4_addrs {
 #define STMMAC_FLAG_SERDES_UP_AFTER_PHY_LINKUP	BIT(5)
 #define STMMAC_FLAG_VLAN_FAIL_Q_EN	BIT(6)
 #define STMMAC_FLAG_MULTI_MSI_EN	BIT(7)
+#define STMMAC_FLAG_EXT_SNAPSHOT_EN	BIT(8)

 struct plat_stmmacenet_data {
 	int bus_id;
@@ -286,7 +287,6 @@ struct plat_stmmacenet_data {
 	int int_snapshot_num;
 	int ext_snapshot_num;
 	bool int_snapshot_en;
-	bool ext_snapshot_en;
 	int msi_mac_vec;
 	int msi_wol_vec;
 	int msi_lpi_vec;
--
cgit v1.2.3

From 621ba7ad7891b381baf9ebf3da4ec4e95c86ea4e Mon Sep 17 00:00:00 2001
From: Bartosz Golaszewski
Date: Mon, 10 Jul 2023 10:59:59 +0200
Subject: net: stmmac: replace the int_snapshot_en field with a flag

Drop the boolean field of the plat_stmmacenet_data structure in favor of
a simple bitfield flag.
Signed-off-by: Bartosz Golaszewski
Reviewed-by: Andrew Halaney
Link: https://lore.kernel.org/r/20230710090001.303225-11-brgl@bgdev.pl
Reviewed-by: Simon Horman
Signed-off-by: Jakub Kicinski
---
 include/linux/stmmac.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

(limited to 'include')

diff --git a/include/linux/stmmac.h b/include/linux/stmmac.h
index 0a77e8b05d3a..47708ddd57fd 100644
--- a/include/linux/stmmac.h
+++ b/include/linux/stmmac.h
@@ -213,6 +213,7 @@ struct dwmac4_addrs {
 #define STMMAC_FLAG_VLAN_FAIL_Q_EN	BIT(6)
 #define STMMAC_FLAG_MULTI_MSI_EN	BIT(7)
 #define STMMAC_FLAG_EXT_SNAPSHOT_EN	BIT(8)
+#define STMMAC_FLAG_INT_SNAPSHOT_EN	BIT(9)

 struct plat_stmmacenet_data {
 	int bus_id;
@@ -286,7 +287,6 @@ struct plat_stmmacenet_data {
 	struct pci_dev *pdev;
 	int int_snapshot_num;
 	int ext_snapshot_num;
-	bool int_snapshot_en;
 	int msi_mac_vec;
 	int msi_wol_vec;
 	int msi_lpi_vec;
--
cgit v1.2.3

From 743dd1db85f40be1e2c7416c83f0289aaa260ceb Mon Sep 17 00:00:00 2001
From: Bartosz Golaszewski
Date: Mon, 10 Jul 2023 11:00:00 +0200
Subject: net: stmmac: replace the rx_clk_runs_in_lpi field with a flag

Drop the boolean field of the plat_stmmacenet_data structure in favor of
a simple bitfield flag.
Signed-off-by: Bartosz Golaszewski
Reviewed-by: Andrew Halaney
Link: https://lore.kernel.org/r/20230710090001.303225-12-brgl@bgdev.pl
Reviewed-by: Simon Horman
Signed-off-by: Jakub Kicinski
---
 include/linux/stmmac.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

(limited to 'include')

diff --git a/include/linux/stmmac.h b/include/linux/stmmac.h
index 47708ddd57fd..c3769dad8238 100644
--- a/include/linux/stmmac.h
+++ b/include/linux/stmmac.h
@@ -214,6 +214,7 @@ struct dwmac4_addrs {
 #define STMMAC_FLAG_MULTI_MSI_EN	BIT(7)
 #define STMMAC_FLAG_EXT_SNAPSHOT_EN	BIT(8)
 #define STMMAC_FLAG_INT_SNAPSHOT_EN	BIT(9)
+#define STMMAC_FLAG_RX_CLK_RUNS_IN_LPI	BIT(10)

 struct plat_stmmacenet_data {
 	int bus_id;
@@ -280,7 +281,6 @@ struct plat_stmmacenet_data {
 	int rss_en;
 	int mac_port_sel_speed;
 	bool en_tx_lpi_clockgating;
-	bool rx_clk_runs_in_lpi;
 	int has_xgmac;
 	u8 vlan_fail_q;
 	unsigned int eee_usecs_rate;
--
cgit v1.2.3

From 9d0c0d5ebd635f914ab2ab691b68e8754fbe0a57 Mon Sep 17 00:00:00 2001
From: Bartosz Golaszewski
Date: Mon, 10 Jul 2023 11:00:01 +0200
Subject: net: stmmac: replace the en_tx_lpi_clockgating field with a flag

Drop the boolean field of the plat_stmmacenet_data structure in favor of
a simple bitfield flag.
Signed-off-by: Bartosz Golaszewski
Reviewed-by: Andrew Halaney
Link: https://lore.kernel.org/r/20230710090001.303225-13-brgl@bgdev.pl
Reviewed-by: Simon Horman
Signed-off-by: Jakub Kicinski
---
 include/linux/stmmac.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

(limited to 'include')

diff --git a/include/linux/stmmac.h b/include/linux/stmmac.h
index c3769dad8238..ef67dba775d0 100644
--- a/include/linux/stmmac.h
+++ b/include/linux/stmmac.h
@@ -215,6 +215,7 @@ struct dwmac4_addrs {
 #define STMMAC_FLAG_EXT_SNAPSHOT_EN	BIT(8)
 #define STMMAC_FLAG_INT_SNAPSHOT_EN	BIT(9)
 #define STMMAC_FLAG_RX_CLK_RUNS_IN_LPI	BIT(10)
+#define STMMAC_FLAG_EN_TX_LPI_CLOCKGATING	BIT(11)

 struct plat_stmmacenet_data {
 	int bus_id;
@@ -280,7 +281,6 @@ struct plat_stmmacenet_data {
 	int has_gmac4;
 	int rss_en;
 	int mac_port_sel_speed;
-	bool en_tx_lpi_clockgating;
 	int has_xgmac;
 	u8 vlan_fail_q;
 	unsigned int eee_usecs_rate;
--
cgit v1.2.3

From 5b52ad34f9487b2c2d1e60fe37e5bd5656b4dac8 Mon Sep 17 00:00:00 2001
From: Guillaume Nault
Date: Tue, 11 Jul 2023 15:06:08 +0200
Subject: security: Constify sk in the sk_getsecid hook.

The sk_getsecid hook shouldn't need to modify its socket argument.
Make it const so that callers of security_sk_classify_flow() can use a
const struct sock *.

Signed-off-by: Guillaume Nault
Reviewed-by: Simon Horman
Signed-off-by: David S. Miller
---
 include/linux/lsm_hook_defs.h | 2 +-
 include/linux/security.h      | 5 +++--
 2 files changed, 4 insertions(+), 3 deletions(-)

(limited to 'include')

diff --git a/include/linux/lsm_hook_defs.h b/include/linux/lsm_hook_defs.h
index 7308a1a7599b..4f2621e87634 100644
--- a/include/linux/lsm_hook_defs.h
+++ b/include/linux/lsm_hook_defs.h
@@ -316,7 +316,7 @@ LSM_HOOK(int, 0, sk_alloc_security, struct sock *sk, int family, gfp_t priority)
 LSM_HOOK(void, LSM_RET_VOID, sk_free_security, struct sock *sk)
 LSM_HOOK(void, LSM_RET_VOID, sk_clone_security, const struct sock *sk,
 	 struct sock *newsk)
-LSM_HOOK(void, LSM_RET_VOID, sk_getsecid, struct sock *sk, u32 *secid)
+LSM_HOOK(void, LSM_RET_VOID, sk_getsecid, const struct sock *sk, u32 *secid)
 LSM_HOOK(void, LSM_RET_VOID, sock_graft, struct sock *sk, struct socket *parent)
 LSM_HOOK(int, 0, inet_conn_request, const struct sock *sk, struct sk_buff *skb,
 	 struct request_sock *req)
diff --git a/include/linux/security.h b/include/linux/security.h
index 32828502f09e..994cf099d9ac 100644
--- a/include/linux/security.h
+++ b/include/linux/security.h
@@ -1439,7 +1439,8 @@ int security_socket_getpeersec_dgram(struct socket *sock, struct sk_buff *skb, u
 int security_sk_alloc(struct sock *sk, int family, gfp_t priority);
 void security_sk_free(struct sock *sk);
 void security_sk_clone(const struct sock *sk, struct sock *newsk);
-void security_sk_classify_flow(struct sock *sk, struct flowi_common *flic);
+void security_sk_classify_flow(const struct sock *sk,
+			       struct flowi_common *flic);
 void security_req_classify_flow(const struct request_sock *req,
 				struct flowi_common *flic);
 void security_sock_graft(struct sock*sk, struct socket *parent);
@@ -1597,7 +1598,7 @@ static inline void security_sk_clone(const struct sock *sk, struct sock *newsk)
 {
 }

-static inline void security_sk_classify_flow(struct sock *sk,
+static inline void security_sk_classify_flow(const struct sock *sk,
 					     struct flowi_common *flic)
 {
 }
--
cgit v1.2.3

From 8d6eba33a2726e463eeab5c42b8eb373d7462127 Mon Sep 17 00:00:00 2001
From: Guillaume Nault
Date: Tue, 11 Jul 2023 15:06:14 +0200
Subject: ipv4: Constify the sk parameter of ip_route_output_*().

These functions don't need to modify the socket, so let's allow the
callers to pass a const struct sock *.

Signed-off-by: Guillaume Nault
Reviewed-by: Simon Horman
Reviewed-by: David Ahern
Signed-off-by: David S. Miller
---
 include/net/route.h | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

(limited to 'include')

diff --git a/include/net/route.h b/include/net/route.h
index 5a5c726472bd..d8d150155195 100644
--- a/include/net/route.h
+++ b/include/net/route.h
@@ -163,7 +163,7 @@ static inline struct rtable *ip_route_output(struct net *net, __be32 daddr,
 }

 static inline struct rtable *ip_route_output_ports(struct net *net, struct flowi4 *fl4,
-						   struct sock *sk,
+						   const struct sock *sk,
 						   __be32 daddr, __be32 saddr,
 						   __be16 dport, __be16 sport,
 						   __u8 proto, __u8 tos, int oif)
@@ -309,7 +309,7 @@ static inline void ip_route_connect_init(struct flowi4 *fl4, __be32 dst,
 static inline struct rtable *ip_route_connect(struct flowi4 *fl4, __be32 dst,
 					      __be32 src, int oif, u8 protocol,
 					      __be16 sport, __be16 dport,
-					      struct sock *sk)
+					      const struct sock *sk)
 {
 	struct net *net = sock_net(sk);
 	struct rtable *rt;
@@ -330,7 +330,7 @@ static inline struct rtable *ip_route_connect(struct flowi4 *fl4, __be32 dst,
 static inline struct rtable *ip_route_newports(struct flowi4 *fl4, struct rtable *rt,
 					       __be16 orig_sport, __be16 orig_dport,
 					       __be16 sport, __be16 dport,
-					       struct sock *sk)
+					       const struct sock *sk)
 {
 	if (sport != orig_sport || dport != orig_dport) {
 		fl4->fl4_dport = dport;
--
cgit v1.2.3

From 5bc67a854cb4982aa7746e8d2206a00b46a9cc0f Mon Sep 17 00:00:00 2001
From: Guillaume Nault
Date: Tue, 11 Jul 2023 15:06:21 +0200
Subject: ipv6: Constify the sk parameter of several helper functions.
icmpv6_flow_init(), ip6_datagram_flow_key_init() and ip6_mc_hdr() don't need to modify their sk argument. Make that explicit using const. Signed-off-by: Guillaume Nault Reviewed-by: Simon Horman Reviewed-by: David Ahern Signed-off-by: David S. Miller --- include/linux/icmpv6.h | 10 ++++------ 1 file changed, 4 insertions(+), 6 deletions(-) (limited to 'include') diff --git a/include/linux/icmpv6.h b/include/linux/icmpv6.h index db0f4fcfdaf4..e3b3b0fa2a8f 100644 --- a/include/linux/icmpv6.h +++ b/include/linux/icmpv6.h @@ -85,12 +85,10 @@ extern void icmpv6_param_prob_reason(struct sk_buff *skb, struct flowi6; struct in6_addr; -extern void icmpv6_flow_init(struct sock *sk, - struct flowi6 *fl6, - u8 type, - const struct in6_addr *saddr, - const struct in6_addr *daddr, - int oif); + +void icmpv6_flow_init(const struct sock *sk, struct flowi6 *fl6, u8 type, + const struct in6_addr *saddr, + const struct in6_addr *daddr, int oif); static inline void icmpv6_param_prob(struct sk_buff *skb, u8 code, int pos) { -- cgit v1.2.3 From 90ef0a7b0622c62758b2638604927867775479ea Mon Sep 17 00:00:00 2001 From: "Russell King (Oracle)" Date: Thu, 13 Jul 2023 09:42:07 +0100 Subject: net: phylink: add pcs_enable()/pcs_disable() methods Add phylink PCS enable/disable callbacks that will allow us to place IEEE 802.3 register compliant PCS in power-down mode while not being used. Signed-off-by: Russell King (Oracle) Signed-off-by: David S. Miller --- include/linux/phylink.h | 16 ++++++++++++++++ 1 file changed, 16 insertions(+) (limited to 'include') diff --git a/include/linux/phylink.h b/include/linux/phylink.h index 1817940a3418..8e2fdd48a935 100644 --- a/include/linux/phylink.h +++ b/include/linux/phylink.h @@ -535,6 +535,8 @@ struct phylink_pcs { /** * struct phylink_pcs_ops - MAC PCS operations structure. * @pcs_validate: validate the link configuration. + * @pcs_enable: enable the PCS. + * @pcs_disable: disable the PCS. 
* @pcs_get_state: read the current MAC PCS link state from the hardware. * @pcs_config: configure the MAC PCS for the selected mode and state. * @pcs_an_restart: restart 802.3z BaseX autonegotiation. @@ -544,6 +546,8 @@ struct phylink_pcs { struct phylink_pcs_ops { int (*pcs_validate)(struct phylink_pcs *pcs, unsigned long *supported, const struct phylink_link_state *state); + int (*pcs_enable)(struct phylink_pcs *pcs); + void (*pcs_disable)(struct phylink_pcs *pcs); void (*pcs_get_state)(struct phylink_pcs *pcs, struct phylink_link_state *state); int (*pcs_config)(struct phylink_pcs *pcs, unsigned int neg_mode, @@ -573,6 +577,18 @@ struct phylink_pcs_ops { int pcs_validate(struct phylink_pcs *pcs, unsigned long *supported, const struct phylink_link_state *state); +/** + * pcs_enable() - enable the PCS. + * @pcs: a pointer to a &struct phylink_pcs. + */ +int pcs_enable(struct phylink_pcs *pcs); + +/** + * pcs_disable() - disable the PCS. + * @pcs: a pointer to a &struct phylink_pcs. + */ +void pcs_disable(struct phylink_pcs *pcs); + /** * pcs_get_state() - Read the current inband link state from the hardware * @pcs: a pointer to a &struct phylink_pcs. -- cgit v1.2.3 From aee6098822ed8a298ad817da8339ba4c7ea381fe Mon Sep 17 00:00:00 2001 From: "Russell King (Oracle)" Date: Thu, 13 Jul 2023 09:42:12 +0100 Subject: net: phylink: add pcs_pre_config()/pcs_post_config() methods Add hooks that are called before and after the mac_config() call, which will be needed to deal with errata workarounds for the Marvell 88e639x DSA switches. Reviewed-by: Andrew Lunn Signed-off-by: Russell King (Oracle) Signed-off-by: David S. Miller --- include/linux/phylink.h | 6 ++++++ 1 file changed, 6 insertions(+) (limited to 'include') diff --git a/include/linux/phylink.h b/include/linux/phylink.h index 8e2fdd48a935..99fc2fa60695 100644 --- a/include/linux/phylink.h +++ b/include/linux/phylink.h @@ -537,6 +537,8 @@ struct phylink_pcs { * @pcs_validate: validate the link configuration. 
* @pcs_enable: enable the PCS. * @pcs_disable: disable the PCS. + * @pcs_pre_config: pre-mac_config method (for errata) + * @pcs_post_config: post-mac_config method (for arrata) * @pcs_get_state: read the current MAC PCS link state from the hardware. * @pcs_config: configure the MAC PCS for the selected mode and state. * @pcs_an_restart: restart 802.3z BaseX autonegotiation. @@ -548,6 +550,10 @@ struct phylink_pcs_ops { const struct phylink_link_state *state); int (*pcs_enable)(struct phylink_pcs *pcs); void (*pcs_disable)(struct phylink_pcs *pcs); + void (*pcs_pre_config)(struct phylink_pcs *pcs, + phy_interface_t interface); + int (*pcs_post_config)(struct phylink_pcs *pcs, + phy_interface_t interface); void (*pcs_get_state)(struct phylink_pcs *pcs, struct phylink_link_state *state); int (*pcs_config)(struct phylink_pcs *pcs, unsigned int neg_mode, -- cgit v1.2.3 From 24699cc1ff3e633d7c3a0d3ef394243db11757ec Mon Sep 17 00:00:00 2001 From: "Russell King (Oracle)" Date: Thu, 13 Jul 2023 09:42:17 +0100 Subject: net: phylink: add support for PCS link change notifications Add a function, phylink_pcs_change(), which can be used by PCS drivers to notify phylink about changes to the PCS link state. Signed-off-by: Russell King (Oracle) Signed-off-by: David S.
Miller --- include/linux/phylink.h | 7 +++++++ 1 file changed, 7 insertions(+) (limited to 'include') diff --git a/include/linux/phylink.h b/include/linux/phylink.h index 99fc2fa60695..b28aa3eef7d5 100644 --- a/include/linux/phylink.h +++ b/include/linux/phylink.h @@ -9,6 +9,7 @@ struct device_node; struct ethtool_cmd; struct fwnode_handle; struct net_device; +struct phylink; enum { MLO_PAUSE_NONE, @@ -520,14 +521,19 @@ struct phylink_pcs_ops; /** * struct phylink_pcs - PHYLINK PCS instance * @ops: a pointer to the &struct phylink_pcs_ops structure + * @phylink: pointer to &struct phylink_config * @neg_mode: provide PCS neg mode via "mode" argument * @poll: poll the PCS for link changes * * This structure is designed to be embedded within the PCS private data, * and will be passed between phylink and the PCS. + * + * The @phylink member is private to phylink and must not be touched by + * the PCS driver. */ struct phylink_pcs { const struct phylink_pcs_ops *ops; + struct phylink *phylink; bool neg_mode; bool poll; }; @@ -699,6 +705,7 @@ int phylink_fwnode_phy_connect(struct phylink *pl, void phylink_disconnect_phy(struct phylink *); void phylink_mac_change(struct phylink *, bool up); +void phylink_pcs_change(struct phylink_pcs *, bool up); void phylink_start(struct phylink *); void phylink_stop(struct phylink *); -- cgit v1.2.3 From e6a45700e7e19b1c945ee56feab429ff8489370b Mon Sep 17 00:00:00 2001 From: "Russell King (Oracle)" Date: Thu, 13 Jul 2023 09:42:22 +0100 Subject: net: mdio: add unlocked mdiobus and mdiodev bus accessors Add the following unlocked accessors to complete the set: __mdiobus_modify() __mdiodev_read() __mdiodev_write() __mdiodev_modify() __mdiodev_modify_changed() which we will need for Marvell DSA PCS conversion. Reviewed-by: Andrew Lunn Signed-off-by: Russell King (Oracle) Signed-off-by: David S. 
Miller --- include/linux/mdio.h | 26 ++++++++++++++++++++++++++ 1 file changed, 26 insertions(+) (limited to 'include') diff --git a/include/linux/mdio.h b/include/linux/mdio.h index c1b7008826e5..8fa23bdcedbf 100644 --- a/include/linux/mdio.h +++ b/include/linux/mdio.h @@ -537,6 +537,8 @@ static inline void mii_c73_mod_linkmode(unsigned long *adv, u16 *lpa) int __mdiobus_read(struct mii_bus *bus, int addr, u32 regnum); int __mdiobus_write(struct mii_bus *bus, int addr, u32 regnum, u16 val); +int __mdiobus_modify(struct mii_bus *bus, int addr, u32 regnum, u16 mask, + u16 set); int __mdiobus_modify_changed(struct mii_bus *bus, int addr, u32 regnum, u16 mask, u16 set); @@ -564,6 +566,30 @@ int mdiobus_c45_modify(struct mii_bus *bus, int addr, int devad, u32 regnum, int mdiobus_c45_modify_changed(struct mii_bus *bus, int addr, int devad, u32 regnum, u16 mask, u16 set); +static inline int __mdiodev_read(struct mdio_device *mdiodev, u32 regnum) +{ + return __mdiobus_read(mdiodev->bus, mdiodev->addr, regnum); +} + +static inline int __mdiodev_write(struct mdio_device *mdiodev, u32 regnum, + u16 val) +{ + return __mdiobus_write(mdiodev->bus, mdiodev->addr, regnum, val); +} + +static inline int __mdiodev_modify(struct mdio_device *mdiodev, u32 regnum, + u16 mask, u16 set) +{ + return __mdiobus_modify(mdiodev->bus, mdiodev->addr, regnum, mask, set); +} + +static inline int __mdiodev_modify_changed(struct mdio_device *mdiodev, + u32 regnum, u16 mask, u16 set) +{ + return __mdiobus_modify_changed(mdiodev->bus, mdiodev->addr, regnum, + mask, set); +} + static inline int mdiodev_read(struct mdio_device *mdiodev, u32 regnum) { return mdiobus_read(mdiodev->bus, mdiodev->addr, regnum); -- cgit v1.2.3 From 9fa0bba012c2dd6d2b0db893314a4cc252a91b5f Mon Sep 17 00:00:00 2001 From: Florian Fainelli Date: Thu, 13 Jul 2023 15:19:05 -0700 Subject: net: phy: bcm7xxx: Add EPHY entry for 74165 74165 is a 16nm process SoC with a 10/100 integrated Ethernet PHY, utilize the recently defined 16nm 
EPHY macro to configure that PHY. Reviewed-by: Simon Horman Reviewed-by: Andrew Lunn Signed-off-by: Florian Fainelli Signed-off-by: Justin Chen Signed-off-by: David S. Miller --- include/linux/brcmphy.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include') diff --git a/include/linux/brcmphy.h b/include/linux/brcmphy.h index 5d732f48f787..c55810a43541 100644 --- a/include/linux/brcmphy.h +++ b/include/linux/brcmphy.h @@ -44,6 +44,7 @@ #define PHY_ID_BCM7366 0x600d8490 #define PHY_ID_BCM7346 0x600d8650 #define PHY_ID_BCM7362 0x600d84b0 +#define PHY_ID_BCM74165 0x359052c0 #define PHY_ID_BCM7425 0x600d86b0 #define PHY_ID_BCM7429 0x600d8730 #define PHY_ID_BCM7435 0x600d8750 -- cgit v1.2.3 From a88dd7538461b2daf6883e5392957b21270638ed Mon Sep 17 00:00:00 2001 From: "Russell King (Oracle)" Date: Fri, 14 Jul 2023 10:12:07 +0100 Subject: net: dsa: remove legacy_pre_march2020 detection All drivers are now updated for the March 2020 changes, and no longer make use of the mac_pcs_get_state() or mac_an_restart() operations, which are now NULL across all DSA drivers. All DSA drivers don't look at speed, duplex, pause or advertisement in their phylink_mac_config() method either. Remove support for these operations from DSA, and stop marking DSA as a legacy driver by default. 
Signed-off-by: Russell King (Oracle) Reviewed-by: Florian Fainelli Reviewed-by: Vladimir Oltean Signed-off-by: Paolo Abeni --- include/net/dsa.h | 3 --- 1 file changed, 3 deletions(-) (limited to 'include') diff --git a/include/net/dsa.h b/include/net/dsa.h index d309ee7ed04b..0b9c6aa27047 100644 --- a/include/net/dsa.h +++ b/include/net/dsa.h @@ -873,8 +873,6 @@ struct dsa_switch_ops { struct phylink_pcs *(*phylink_mac_select_pcs)(struct dsa_switch *ds, int port, phy_interface_t iface); - int (*phylink_mac_link_state)(struct dsa_switch *ds, int port, - struct phylink_link_state *state); int (*phylink_mac_prepare)(struct dsa_switch *ds, int port, unsigned int mode, phy_interface_t interface); @@ -884,7 +882,6 @@ struct dsa_switch_ops { int (*phylink_mac_finish)(struct dsa_switch *ds, int port, unsigned int mode, phy_interface_t interface); - void (*phylink_mac_an_restart)(struct dsa_switch *ds, int port); void (*phylink_mac_link_down)(struct dsa_switch *ds, int port, unsigned int mode, phy_interface_t interface); -- cgit v1.2.3 From 76226787e137962750241bb29a9572dfc10d9eb1 Mon Sep 17 00:00:00 2001 From: "Russell King (Oracle)" Date: Fri, 14 Jul 2023 10:12:17 +0100 Subject: net: phylink: remove legacy mac_an_restart() method The mac_an_restart() method is now completely unused, and has been superseded by phylink_pcs support. Remove this method. Since phylink_pcs_mac_an_restart() now only deals with the PCS, rename the function to remove the _mac infix. Signed-off-by: Russell King (Oracle) Reviewed-by: Florian Fainelli Signed-off-by: Paolo Abeni --- include/linux/phylink.h | 12 ------------ 1 file changed, 12 deletions(-) (limited to 'include') diff --git a/include/linux/phylink.h b/include/linux/phylink.h index b28aa3eef7d5..9e861c8316d0 100644 --- a/include/linux/phylink.h +++ b/include/linux/phylink.h @@ -234,7 +234,6 @@ struct phylink_config { * @mac_prepare: prepare for a major reconfiguration of the interface. 
* @mac_config: configure the MAC for the selected mode and state. * @mac_finish: finish a major reconfiguration of the interface. - * @mac_an_restart: restart 802.3z BaseX autonegotiation. * @mac_link_down: take the link down. * @mac_link_up: allow the link to come up. * @@ -254,7 +253,6 @@ struct phylink_mac_ops { const struct phylink_link_state *state); int (*mac_finish)(struct phylink_config *config, unsigned int mode, phy_interface_t iface); - void (*mac_an_restart)(struct phylink_config *config); void (*mac_link_down)(struct phylink_config *config, unsigned int mode, phy_interface_t interface); void (*mac_link_up)(struct phylink_config *config, @@ -459,16 +457,6 @@ void mac_config(struct phylink_config *config, unsigned int mode, int mac_finish(struct phylink_config *config, unsigned int mode, phy_interface_t iface); -/** - * mac_an_restart() - restart 802.3z BaseX autonegotiation - * @config: a pointer to a &struct phylink_config. - * - * Note: This is a legacy method. This function will not be called unless - * legacy_pre_march2020 is set in &struct phylink_config and there is no - * PCS attached. - */ -void mac_an_restart(struct phylink_config *config); - /** * mac_link_down() - take the link down * @config: a pointer to a &struct phylink_config. -- cgit v1.2.3 From 0a1f7bfe35a3e1302529fa900bf0574a5dfc8ea6 Mon Sep 17 00:00:00 2001 From: Dave Marchevsky Date: Tue, 18 Jul 2023 01:38:09 -0700 Subject: bpf: Introduce internal definitions for UAPI-opaque bpf_{rb,list}_node Structs bpf_rb_node and bpf_list_node are opaquely defined in uapi/linux/bpf.h, as BPF program writers are not expected to touch their fields - nor does the verifier allow them to do so. Currently these structs are simple wrappers around structs rb_node and list_head and linked_list / rbtree implementation just casts and passes to library functions for those data structures. 
Later patches in this series, though, will add an "owner" field to bpf_{rb,list}_node, such that they're not just wrapping an underlying node type. Moreover, the bpf linked_list and rbtree implementations will deal with these owner pointers directly in a few different places. To avoid having to do void *owner = (void*)bpf_list_node + sizeof(struct list_head) with opaque UAPI node types, add bpf_{list,rb}_node_kern struct definitions to internal headers and modify linked_list and rbtree to use the internal types where appropriate. Signed-off-by: Dave Marchevsky Link: https://lore.kernel.org/r/20230718083813.3416104-3-davemarchevsky@fb.com Signed-off-by: Alexei Starovoitov --- include/linux/bpf.h | 10 ++++++++++ 1 file changed, 10 insertions(+) (limited to 'include') diff --git a/include/linux/bpf.h b/include/linux/bpf.h index 360433f14496..511ed49c3fe9 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -228,6 +228,16 @@ struct btf_record { struct btf_field fields[]; }; +/* Non-opaque version of bpf_rb_node in uapi/linux/bpf.h */ +struct bpf_rb_node_kern { + struct rb_node rb_node; +} __attribute__((aligned(8))); + +/* Non-opaque version of bpf_list_node in uapi/linux/bpf.h */ +struct bpf_list_node_kern { + struct list_head list_head; +} __attribute__((aligned(8))); + struct bpf_map { /* The first two cachelines with read-mostly members of which some * are also accessed in fast-path (e.g. ops, max_entries). -- cgit v1.2.3 From c3c510ce431cd99fa10dcd50d995c8e89330ee5b Mon Sep 17 00:00:00 2001 From: Dave Marchevsky Date: Tue, 18 Jul 2023 01:38:10 -0700 Subject: bpf: Add 'owner' field to bpf_{list,rb}_node As described by Kumar in [0], in shared ownership scenarios it is necessary to do runtime tracking of {rb,list} node ownership - and synchronize updates using this ownership information - in order to prevent races. This patch adds an 'owner' field to struct bpf_list_node and bpf_rb_node to implement such runtime tracking. 
The owner field is a void * that describes the ownership state of a node. It can have the following values: NULL - the node is not owned by any data structure BPF_PTR_POISON - the node is in the process of being added to a data structure ptr_to_root - the pointee is a data structure 'root' (bpf_rb_root / bpf_list_head) which owns this node The field is initially NULL (set by bpf_obj_init_field default behavior) and transitions states in the following sequence: Insertion: NULL -> BPF_PTR_POISON -> ptr_to_root Removal: ptr_to_root -> NULL Before a node has been successfully inserted, it is not protected by any root's lock, and therefore two programs can attempt to add the same node to different roots simultaneously. For this reason the intermediate BPF_PTR_POISON state is necessary. For removal, the node is protected by some root's lock so this intermediate hop isn't necessary. Note that bpf_list_pop_{front,back} helpers don't need to check owner before removing as the node-to-be-removed is not passed in as input and is instead taken directly from the list. Do the check anyways and WARN_ON_ONCE in this unexpected scenario. Selftest changes in this patch are entirely mechanical: some BTF tests have hardcoded struct sizes for structs that contain bpf_{list,rb}_node fields, those were adjusted to account for the new sizes. Selftest additions to validate the owner field are added in a further patch in the series. 
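The insertion and removal sequences described above can be sketched in userspace C. This is only an illustration under stated assumptions: it uses C11 atomics in place of the kernel's primitives, `OWNER_POISON` is a hypothetical stand-in for BPF_PTR_POISON, and `struct node`, `struct root`, `node_insert()` and `node_remove()` are made-up names rather than anything from the patch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's BPF_PTR_POISON sentinel. */
static char poison_marker;
#define OWNER_POISON ((void *)&poison_marker)

/* Hypothetical node: owner is NULL, OWNER_POISON, or the owning root. */
struct node {
	_Atomic(void *) owner;
};

/* Hypothetical root; the counter stands in for a real list or rbtree. */
struct root {
	int nr_nodes;
};

/* Insertion: NULL -> OWNER_POISON -> root.
 * Returns false if the node is already owned or mid-insertion elsewhere.
 */
static bool node_insert(struct node *n, struct root *r)
{
	void *expected = NULL;

	/* Claim the node; only one concurrent inserter can win this step. */
	if (!atomic_compare_exchange_strong(&n->owner, &expected, OWNER_POISON))
		return false;

	/* The real code links the node here, under the root's lock. */
	r->nr_nodes++;
	atomic_store(&n->owner, (void *)r);
	return true;
}

/* Removal: root -> NULL, with no intermediate state.
 * Returns false if @r does not own @n.
 */
static bool node_remove(struct node *n, struct root *r)
{
	if (atomic_load(&n->owner) != (void *)r)
		return false;

	assert(r->nr_nodes > 0);
	r->nr_nodes--;
	atomic_store(&n->owner, NULL);
	return true;
}
```

A second node_insert() on an owned node fails because the compare-and-exchange no longer sees NULL, which is the race the intermediate state is meant to close.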
[0]: https://lore.kernel.org/bpf/d7hyspcow5wtjcmw4fugdgyp3fwhljwuscp3xyut5qnwivyeru@ysdq543otzv2 Signed-off-by: Dave Marchevsky Suggested-by: Kumar Kartikeya Dwivedi Link: https://lore.kernel.org/r/20230718083813.3416104-4-davemarchevsky@fb.com Signed-off-by: Alexei Starovoitov --- include/linux/bpf.h | 2 ++ include/uapi/linux/bpf.h | 2 ++ 2 files changed, 4 insertions(+) (limited to 'include') diff --git a/include/linux/bpf.h b/include/linux/bpf.h index 511ed49c3fe9..ceaa8c23287f 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -231,11 +231,13 @@ struct btf_record { /* Non-opaque version of bpf_rb_node in uapi/linux/bpf.h */ struct bpf_rb_node_kern { struct rb_node rb_node; + void *owner; } __attribute__((aligned(8))); /* Non-opaque version of bpf_list_node in uapi/linux/bpf.h */ struct bpf_list_node_kern { struct list_head list_head; + void *owner; } __attribute__((aligned(8))); struct bpf_map { diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h index 600d0caebbd8..9ed59896ebc5 100644 --- a/include/uapi/linux/bpf.h +++ b/include/uapi/linux/bpf.h @@ -7052,6 +7052,7 @@ struct bpf_list_head { struct bpf_list_node { __u64 :64; __u64 :64; + __u64 :64; } __attribute__((aligned(8))); struct bpf_rb_root { @@ -7063,6 +7064,7 @@ struct bpf_rb_node { __u64 :64; __u64 :64; __u64 :64; + __u64 :64; } __attribute__((aligned(8))); struct bpf_refcount { -- cgit v1.2.3 From dfa2f0483360d4d6f2324405464c9f281156bd87 Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Mon, 17 Jul 2023 15:29:17 +0000 Subject: tcp: get rid of sysctl_tcp_adv_win_scale With modern NIC drivers shifting to full page allocations per received frame, we face the following issue: TCP has one per-netns sysctl used to tweak how to translate a memory use into an expected payload (RWIN), in RX path. tcp_win_from_space() implementation is limited to few cases. For hosts dealing with various MSS, we either under estimate or over estimate the RWIN we send to the remote peers. 
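As a rough illustration of the two translations involved, the sketch below puts the sysctl-driven formula that this patch removes next to the per-socket ratio that its diff introduces. It is a userspace approximation, not the kernel code: the function names are local to the example, and the truesize passed to `scaling_ratio()` is a caller-supplied assumption, since the kernel's SKB_TRUESIZE() depends on structure sizes that are not reproduced here.

```c
#include <assert.h>
#include <stdint.h>

#define RMEM_TO_WIN_SCALE 8 /* mirrors TCP_RMEM_TO_WIN_SCALE in the diff */

/* Legacy mapping driven by the per-netns tcp_adv_win_scale sysctl. */
static int legacy_win_from_space(int space, int adv_win_scale)
{
	return adv_win_scale <= 0 ? space >> (-adv_win_scale)
				  : space - (space >> adv_win_scale);
}

/* Per-socket mapping: ratio is payload/truesize scaled by 2^8. */
static int win_from_space(int space, uint8_t ratio)
{
	int64_t scaled = (int64_t)space * ratio;

	return (int)(scaled >> RMEM_TO_WIN_SCALE);
}

/* Inverse of win_from_space(). */
static int space_from_win(int win, uint8_t ratio)
{
	assert(ratio != 0);
	uint64_t val = (uint64_t)win << RMEM_TO_WIN_SCALE;

	return (int)(val / ratio);
}

/* Ratio for a given payload size and (assumed) truesize of the buffer. */
static uint8_t scaling_ratio(uint32_t payload, uint32_t truesize)
{
	assert(truesize != 0 && payload < truesize);
	uint64_t val = (uint64_t)payload << RMEM_TO_WIN_SCALE;

	return (uint8_t)(val / truesize);
}
```

With a sysctl value of 1 the legacy formula always advertises half of the space, whatever the real payload-to-truesize ratio is; the per-socket ratio lets the advertised window follow the memory cost that is actually observed.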
For instance with the default sysctl_tcp_adv_win_scale value, we expect to store 50% of payload per allocated chunk of memory. For the typical use of MTU=1500 traffic, and order-0 pages allocations by NIC drivers, we are sending too big RWIN, leading to potential tcp collapse operations, which are extremely expensive and source of latency spikes. This patch makes sysctl_tcp_adv_win_scale obsolete, and instead uses a per socket scaling factor, so that we can precisely adjust the RWIN based on effective skb->len/skb->truesize ratio. This patch alone can double TCP receive performance when receivers are too slow to drain their receive queue, or by allowing a bigger RWIN when MSS is close to PAGE_SIZE. Signed-off-by: Eric Dumazet Acked-by: Soheil Hassas Yeganeh Link: https://lore.kernel.org/r/20230717152917.751987-1-edumazet@google.com Signed-off-by: Jakub Kicinski --- include/linux/tcp.h | 4 +++- include/net/netns/ipv4.h | 2 +- include/net/tcp.h | 24 ++++++++++++++++++++---- 3 files changed, 24 insertions(+), 6 deletions(-) (limited to 'include') diff --git a/include/linux/tcp.h b/include/linux/tcp.h index b4c08ac86983..fbcb0ce13171 100644 --- a/include/linux/tcp.h +++ b/include/linux/tcp.h @@ -172,6 +172,8 @@ static inline struct tcp_request_sock *tcp_rsk(const struct request_sock *req) return (struct tcp_request_sock *)req; } +#define TCP_RMEM_TO_WIN_SCALE 8 + struct tcp_sock { /* inet_connection_sock has to be the first member of tcp_sock */ struct inet_connection_sock inet_conn; @@ -238,7 +240,7 @@ struct tcp_sock { u32 window_clamp; /* Maximal window to advertise */ u32 rcv_ssthresh; /* Current window clamp */ - + u8 scaling_ratio; /* see tcp_win_from_space() */ /* Information of the most recently (s)acked skb */ struct tcp_rack { u64 mstamp; /* (Re)sent time of the skb */ diff --git a/include/net/netns/ipv4.h b/include/net/netns/ipv4.h index f00374718159..7a41c4791536 100644 --- a/include/net/netns/ipv4.h +++ b/include/net/netns/ipv4.h @@ -152,7 +152,7 @@ struct 
netns_ipv4 { u8 sysctl_tcp_abort_on_overflow; u8 sysctl_tcp_fack; /* obsolete */ int sysctl_tcp_max_reordering; - int sysctl_tcp_adv_win_scale; + int sysctl_tcp_adv_win_scale; /* obsolete */ u8 sysctl_tcp_dsack; u8 sysctl_tcp_app_win; u8 sysctl_tcp_frto; diff --git a/include/net/tcp.h b/include/net/tcp.h index 226bce6d1e8c..2104a71c75ba 100644 --- a/include/net/tcp.h +++ b/include/net/tcp.h @@ -1434,11 +1434,27 @@ void tcp_select_initial_window(const struct sock *sk, int __space, static inline int tcp_win_from_space(const struct sock *sk, int space) { - int tcp_adv_win_scale = READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_adv_win_scale); + s64 scaled_space = (s64)space * tcp_sk(sk)->scaling_ratio; - return tcp_adv_win_scale <= 0 ? - (space>>(-tcp_adv_win_scale)) : - space - (space>>tcp_adv_win_scale); + return scaled_space >> TCP_RMEM_TO_WIN_SCALE; +} + +/* inverse of tcp_win_from_space() */ +static inline int tcp_space_from_win(const struct sock *sk, int win) +{ + u64 val = (u64)win << TCP_RMEM_TO_WIN_SCALE; + + do_div(val, tcp_sk(sk)->scaling_ratio); + return val; +} + +static inline void tcp_scaling_ratio_init(struct sock *sk) +{ + /* Assume a conservative default of 1200 bytes of payload per 4K page. + * This may be adjusted later in tcp_measure_rcv_mss(). + */ + tcp_sk(sk)->scaling_ratio = (1200 << TCP_RMEM_TO_WIN_SCALE) / + SKB_TRUESIZE(4096); } /* Note: caller must be prepared to deal with negative returns */ -- cgit v1.2.3 From 8bb5e82589f0141a990d3fd21d5b79a73a8c6c7b Mon Sep 17 00:00:00 2001 From: Ido Schimmel Date: Mon, 17 Jul 2023 11:12:26 +0300 Subject: ip_tunnels: Add nexthop ID field to ip_tunnel_key Extend the ip_tunnel_key structure with a field indicating the ID of the nexthop object via which the skb should be routed. The field is going to be populated in subsequent patches by the bridge driver in order to indicate to the VXLAN driver which FDB nexthop object to use in order to reach the target host. 
Signed-off-by: Ido Schimmel Reviewed-by: Nikolay Aleksandrov Signed-off-by: David S. Miller --- include/net/ip_tunnels.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include') diff --git a/include/net/ip_tunnels.h b/include/net/ip_tunnels.h index ed4b6ad3fcac..e8750b4ef7e1 100644 --- a/include/net/ip_tunnels.h +++ b/include/net/ip_tunnels.h @@ -52,6 +52,7 @@ struct ip_tunnel_key { u8 tos; /* TOS for IPv4, TC for IPv6 */ u8 ttl; /* TTL for IPv4, HL for IPv6 */ __be32 label; /* Flow Label for IPv6 */ + u32 nhid; __be16 tp_src; __be16 tp_dst; __u8 flow_flags; -- cgit v1.2.3 From 29cfb2aaa4425a608651a05b9b875bc445394443 Mon Sep 17 00:00:00 2001 From: Ido Schimmel Date: Mon, 17 Jul 2023 11:12:28 +0300 Subject: bridge: Add backup nexthop ID support Add a new bridge port attribute that allows attaching a nexthop object ID to an skb that is redirected to a backup bridge port with VLAN tunneling enabled. Specifically, when redirecting a known unicast packet, read the backup nexthop ID from the bridge port that lost its carrier and set it in the bridge control block of the skb before forwarding it via the backup port. Note that reading the ID from the bridge port should not result in a cache miss as the ID is added next to the 'backup_port' field that was already accessed. After this change, the 'state' field still stays on the first cache line, together with other data path related fields such as 'flags and 'vlgrp': struct net_bridge_port { struct net_bridge * br; /* 0 8 */ struct net_device * dev; /* 8 8 */ netdevice_tracker dev_tracker; /* 16 0 */ struct list_head list; /* 16 16 */ long unsigned int flags; /* 32 8 */ struct net_bridge_vlan_group * vlgrp; /* 40 8 */ struct net_bridge_port * backup_port; /* 48 8 */ u32 backup_nhid; /* 56 4 */ u8 priority; /* 60 1 */ u8 state; /* 61 1 */ u16 port_no; /* 62 2 */ /* --- cacheline 1 boundary (64 bytes) --- */ [...] 
} __attribute__((__aligned__(8))); When forwarding an skb via a bridge port that has VLAN tunneling enabled, check if the backup nexthop ID stored in the bridge control block is valid (i.e., not zero). If so, instead of attaching the pre-allocated metadata (that only has the tunnel key set), allocate a new metadata, set both the tunnel key and the nexthop object ID and attach it to the skb. By default, do not dump the new attribute to user space as a value of zero is an invalid nexthop object ID. The above is useful for EVPN multihoming. When one of the links composing an Ethernet Segment (ES) fails, traffic needs to be redirected towards the host via one of the other ES peers. For example, if a host is multihomed to three different VTEPs, the backup port of each ES link needs to be set to the VXLAN device and the backup nexthop ID needs to point to an FDB nexthop group that includes the IP addresses of the other two VTEPs. The VXLAN driver will extract the ID from the metadata of the redirected skb, calculate its flow hash and forward it towards one of the other VTEPs. If the ID does not exist, or represents an invalid nexthop object, the VXLAN driver will drop the skb. This relieves the bridge driver from the need to validate the ID. Signed-off-by: Ido Schimmel Acked-by: Nikolay Aleksandrov Signed-off-by: David S. 
Miller --- include/uapi/linux/if_link.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include') diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h index 0f6a0fe09bdb..ce3117df9cec 100644 --- a/include/uapi/linux/if_link.h +++ b/include/uapi/linux/if_link.h @@ -570,6 +570,7 @@ enum { IFLA_BRPORT_MCAST_N_GROUPS, IFLA_BRPORT_MCAST_MAX_GROUPS, IFLA_BRPORT_NEIGH_VLAN_SUPPRESS, + IFLA_BRPORT_BACKUP_NHID, __IFLA_BRPORT_MAX }; #define IFLA_BRPORT_MAX (__IFLA_BRPORT_MAX - 1) -- cgit v1.2.3 From 5ba190c29cf92f157bd63c9909c7050d6dc43df7 Mon Sep 17 00:00:00 2001 From: Anton Protopopov Date: Wed, 19 Jul 2023 09:29:50 +0000 Subject: bpf: consider CONST_PTR_TO_MAP as trusted pointer to struct bpf_map Add the BTF id of struct bpf_map to the reg2btf_ids array. This makes kfuncs treat values of the CONST_PTR_TO_MAP type as trusted. This, in turn, allows users to execute trusted kfuncs which accept `struct bpf_map *` arguments from non-tracing programs. While exporting the btf_bpf_map_id variable, save some bytes by defining it as BTF_ID_LIST_GLOBAL_SINGLE (which is u32[1]) and not as BTF_ID_LIST (which is u32[64]).
Signed-off-by: Anton Protopopov Link: https://lore.kernel.org/r/20230719092952.41202-3-aspsk@isovalent.com Signed-off-by: Alexei Starovoitov --- include/linux/btf_ids.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include') diff --git a/include/linux/btf_ids.h b/include/linux/btf_ids.h index 00950cc03bff..a3462a9b8e18 100644 --- a/include/linux/btf_ids.h +++ b/include/linux/btf_ids.h @@ -267,5 +267,6 @@ MAX_BTF_TRACING_TYPE, extern u32 btf_tracing_ids[]; extern u32 bpf_cgroup_btf_id[]; extern u32 bpf_local_storage_map_btf_id[]; +extern u32 btf_bpf_map_id[]; #endif -- cgit v1.2.3 From 63a64a56bc3f77c74085047ee45356ac850da3e8 Mon Sep 17 00:00:00 2001 From: Tirthendu Sarkar Date: Wed, 19 Jul 2023 15:23:58 +0200 Subject: xsk: prepare 'options' in xdp_desc for multi-buffer use Use the 'options' field in xdp_desc as a packet continuity marker. Since 'options' field was unused till now and was expected to be set to 0, the 'eop' descriptor will have it set to 0, while the non-eop descriptors will have to set it to 1. This ensures legacy applications continue to work without needing any change for single-buffer packets. Add helper functions and extend xskq_prod_reserve_desc() to use the 'options' field. Signed-off-by: Tirthendu Sarkar Link: https://lore.kernel.org/r/20230719132421.584801-2-maciej.fijalkowski@intel.com Signed-off-by: Alexei Starovoitov --- include/uapi/linux/if_xdp.h | 7 +++++++ 1 file changed, 7 insertions(+) (limited to 'include') diff --git a/include/uapi/linux/if_xdp.h b/include/uapi/linux/if_xdp.h index a78a8096f4ce..434f313dc26c 100644 --- a/include/uapi/linux/if_xdp.h +++ b/include/uapi/linux/if_xdp.h @@ -108,4 +108,11 @@ struct xdp_desc { /* UMEM descriptor is __u64 */ +/* Flag indicating that the packet continues with the buffer pointed out by the + * next frame in the ring. The end of the packet is signalled by setting this + * bit to zero. 
For single buffer packets, every descriptor has 'options' set + * to 0 and this maintains backward compatibility. + */ +#define XDP_PKT_CONTD (1 << 0) + #endif /* _LINUX_IF_XDP_H */ -- cgit v1.2.3 From 81470b5c3c6649eef8e5f282cd06793f788ae165 Mon Sep 17 00:00:00 2001 From: Tirthendu Sarkar Date: Wed, 19 Jul 2023 15:23:59 +0200 Subject: xsk: introduce XSK_USE_SG bind flag for xsk socket As of now xsk core drops any xdp_buff with data size greater than the xsk frame_size as set by the af_xdp application. With multi-buffer support introduced in the next patch, xsk core can now split those buffers into multiple descriptors provided the af_xdp application can handle them. Such capability of the application needs to be independent of the xdp_prog's frag support capability since there are cases where even a single xdp_buffer may need to be split into multiple descriptors owing to a smaller xsk frame size. For example, with NIC rx_buffer size set to 4kB, a 3kB packet will consist of a single buffer and so will be sent as such to the AF_XDP layer irrespective of the 'xdp.frags' capability of the XDP program. Now if the xsk frame size is set to 2kB by the AF_XDP application, then the packet will need to be split into 2 descriptors if the AF_XDP application can handle multi-buffer, else it needs to be dropped. Applications can now advertise their frag handling capability to xsk core so that xsk core can decide if it should drop or split xdp_buffs that exceed xsk frame size. This is done using a new 'XSK_USE_SG' bind flag for the xdp socket.
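The drop-or-split decision described above can be sketched as follows. This is a simplified userspace illustration, not the xsk core logic: `descs_for_buffer()` is a hypothetical helper, `use_sg` stands for whether the application bound its socket with the new flag (which the if_xdp.h hunk of this patch names XDP_USE_SG), and headroom or alignment details are ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Number of Rx descriptors needed to deliver one buffer of @len bytes
 * into frames of @frame_size bytes, or 0 when the buffer must be dropped.
 */
static uint32_t descs_for_buffer(uint32_t len, uint32_t frame_size, bool use_sg)
{
	assert(frame_size != 0);

	/* Fits in one frame: always a single descriptor. */
	if (len <= frame_size)
		return 1;

	/* Too large and the application cannot handle multiple descriptors. */
	if (!use_sg)
		return 0;

	/* Split across as many frames as needed (round up). */
	return (len + frame_size - 1) / frame_size;
}
```

For the case in the message, a 3kB packet with a 2kB frame size yields two descriptors when the application advertises multi-buffer support and a drop otherwise.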
Signed-off-by: Tirthendu Sarkar Link: https://lore.kernel.org/r/20230719132421.584801-3-maciej.fijalkowski@intel.com Signed-off-by: Alexei Starovoitov --- include/net/xdp_sock.h | 1 + include/uapi/linux/if_xdp.h | 6 ++++++ 2 files changed, 7 insertions(+) (limited to 'include') diff --git a/include/net/xdp_sock.h b/include/net/xdp_sock.h index e96a1151ec75..36b0411a0d1b 100644 --- a/include/net/xdp_sock.h +++ b/include/net/xdp_sock.h @@ -52,6 +52,7 @@ struct xdp_sock { struct xsk_buff_pool *pool; u16 queue_id; bool zc; + bool sg; enum { XSK_READY = 0, XSK_BOUND, diff --git a/include/uapi/linux/if_xdp.h b/include/uapi/linux/if_xdp.h index 434f313dc26c..8d48863472b9 100644 --- a/include/uapi/linux/if_xdp.h +++ b/include/uapi/linux/if_xdp.h @@ -25,6 +25,12 @@ * application. */ #define XDP_USE_NEED_WAKEUP (1 << 3) +/* By setting this option, userspace application indicates that it can + * handle multiple descriptors per packet thus enabling AF_XDP to split + * multi-buffer XDP frames into multiple Rx descriptors. Without this set + * such frames will be dropped. + */ +#define XDP_USE_SG (1 << 4) /* Flags for xsk_umem_config flags */ #define XDP_UMEM_UNALIGNED_CHUNK_FLAG (1 << 0) -- cgit v1.2.3 From b7f72a30e9ac2555b05afc6cfddc9dbc98e1eb8d Mon Sep 17 00:00:00 2001 From: Tirthendu Sarkar Date: Wed, 19 Jul 2023 15:24:03 +0200 Subject: xsk: introduce wrappers and helpers for supporting multi-buffer in Tx path In the Tx path, xsk core reserves space in the completion queue for each desc to be transmitted, and the address contained in the desc is stored in the skb destructor arg. After successful transmission the skb destructor submits the addr, marking completion. To handle multiple descriptors per packet, now along with reserving space for each descriptor, the corresponding address is also stored in the completion queue. The number of pending descriptors is stored in the skb destructor arg and is used by the skb destructor to update completions.
Introduce 'skb' in xdp_sock to store a partially built packet when __xsk_generic_xmit() must return before it sees the EOP descriptor for the current packet so that packet building can resume in the next call of __xsk_generic_xmit(). Helper functions are introduced to set and get the pending descriptors in the skb destructor arg. Also, wrappers are introduced for storing descriptor addresses, submitting and cancelling (for unsuccessful transmissions) the number of completions. Signed-off-by: Tirthendu Sarkar Link: https://lore.kernel.org/r/20230719132421.584801-7-maciej.fijalkowski@intel.com Signed-off-by: Alexei Starovoitov --- include/net/xdp_sock.h | 6 ++++++ 1 file changed, 6 insertions(+) (limited to 'include') diff --git a/include/net/xdp_sock.h b/include/net/xdp_sock.h index 36b0411a0d1b..1617af380162 100644 --- a/include/net/xdp_sock.h +++ b/include/net/xdp_sock.h @@ -68,6 +68,12 @@ struct xdp_sock { u64 rx_dropped; u64 rx_queue_full; + /* When __xsk_generic_xmit() must return before it sees the EOP descriptor for the current + * packet, the partially built skb is saved here so that packet building can resume in next + * call of __xsk_generic_xmit(). + */ + struct sk_buff *skb; + struct list_head map_list; /* Protects map_list */ spinlock_t map_list_lock; -- cgit v1.2.3 From 1b725b0c8163cfd2d9ab22057df46b487981cfea Mon Sep 17 00:00:00 2001 From: Maciej Fijalkowski Date: Wed, 19 Jul 2023 15:24:04 +0200 Subject: xsk: allow core/drivers to test EOP bit Drivers are used to checking for the EOP bit, whereas AF_XDP operates on inverted logic: user space indicates that the current frag is not the last one and the packet continues. For AF_XDP core needs, add xp_mb_desc() that will simply test XDP_PKT_CONTD from xdp_desc::options, but in order to preserve the drivers' default behavior, introduce an interface for ZC drivers that will negate the xp_mb_desc() result and therefore make it easier to test the EOP bit during production of HW Tx descriptors.
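The inverted logic described above can be illustrated with a small user-space sketch. The struct and the sketch_* helpers are local, hypothetical stand-ins that mirror the shape of xp_mb_desc() and xsk_is_eop_desc() from this patch; sketch_count_packets() is purely illustrative and has no counterpart in the series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Local copy of the flag; the real define lives in include/uapi/linux/if_xdp.h. */
#define XDP_PKT_CONTD (1 << 0)

/* Local stand-in for struct xdp_desc, for illustration only. */
struct sketch_xdp_desc {
	uint64_t addr;
	uint32_t len;
	uint32_t options;
};

/* AF_XDP view: true when more descriptors of the same packet follow. */
static bool sketch_mb_desc(const struct sketch_xdp_desc *desc)
{
	return desc->options & XDP_PKT_CONTD;
}

/* Driver view: true when this descriptor is the last one of its packet. */
static bool sketch_is_eop_desc(const struct sketch_xdp_desc *desc)
{
	return !sketch_mb_desc(desc);
}

/* Hypothetical helper: count the complete packets in a run of descriptors. */
static size_t sketch_count_packets(const struct sketch_xdp_desc *descs, size_t n)
{
	size_t i, pkts = 0;

	assert(descs != NULL || n == 0);

	for (i = 0; i < n; i++)
		if (sketch_is_eop_desc(&descs[i]))
			pkts++;

	return pkts;
}
```

A legacy single-buffer application leaves 'options' at 0 in every descriptor, so each descriptor is its own end-of-packet and the count equals the number of descriptors.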
Signed-off-by: Maciej Fijalkowski Link: https://lore.kernel.org/r/20230719132421.584801-8-maciej.fijalkowski@intel.com Signed-off-by: Alexei Starovoitov --- include/net/xdp_sock_drv.h | 10 ++++++++++ include/net/xsk_buff_pool.h | 5 +++++ 2 files changed, 15 insertions(+) (limited to 'include') diff --git a/include/net/xdp_sock_drv.h b/include/net/xdp_sock_drv.h index c243f906ebed..0d34cdb5567d 100644 --- a/include/net/xdp_sock_drv.h +++ b/include/net/xdp_sock_drv.h @@ -89,6 +89,11 @@ static inline struct xdp_buff *xsk_buff_alloc(struct xsk_buff_pool *pool) return xp_alloc(pool); } +static inline bool xsk_is_eop_desc(struct xdp_desc *desc) +{ + return !xp_mb_desc(desc); +} + /* Returns as many entries as possible up to max. 0 <= N <= max. */ static inline u32 xsk_buff_alloc_batch(struct xsk_buff_pool *pool, struct xdp_buff **xdp, u32 max) { @@ -241,6 +246,11 @@ static inline struct xdp_buff *xsk_buff_alloc(struct xsk_buff_pool *pool) return NULL; } +static inline bool xsk_is_eop_desc(struct xdp_desc *desc) +{ + return false; +} + static inline u32 xsk_buff_alloc_batch(struct xsk_buff_pool *pool, struct xdp_buff **xdp, u32 max) { return 0; diff --git a/include/net/xsk_buff_pool.h b/include/net/xsk_buff_pool.h index a8d7b8a3688a..4dcca163e076 100644 --- a/include/net/xsk_buff_pool.h +++ b/include/net/xsk_buff_pool.h @@ -184,6 +184,11 @@ static inline bool xp_desc_crosses_non_contig_pg(struct xsk_buff_pool *pool, !(pool->dma_pages[addr >> PAGE_SHIFT] & XSK_NEXT_PG_CONTIG_MASK); } +static inline bool xp_mb_desc(struct xdp_desc *desc) +{ + return desc->options & XDP_PKT_CONTD; +} + static inline u64 xp_aligned_extract_addr(struct xsk_buff_pool *pool, u64 addr) { return addr & pool->chunk_mask; -- cgit v1.2.3 From 13ce2daa259a3bfbc9a5aeeee8b9a87058703731 Mon Sep 17 00:00:00 2001 From: Maciej Fijalkowski Date: Wed, 19 Jul 2023 15:24:07 +0200 Subject: xsk: add new netlink attribute dedicated for ZC max frags Introduce new netlink attribute NETDEV_A_DEV_XDP_ZC_MAX_SEGS that 
will carry maximum fragments that underlying ZC driver is able to handle on TX side. It is going to be included in netlink response only when driver supports ZC. Any value higher than 1 implies multi-buffer ZC support on underlying device. Signed-off-by: Maciej Fijalkowski Link: https://lore.kernel.org/r/20230719132421.584801-11-maciej.fijalkowski@intel.com Signed-off-by: Alexei Starovoitov --- include/linux/netdevice.h | 1 + include/uapi/linux/netdev.h | 1 + 2 files changed, 2 insertions(+) (limited to 'include') diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index b828c7a75be2..b12477ea4032 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -2250,6 +2250,7 @@ struct net_device { #define GRO_MAX_SIZE (8 * 65535u) unsigned int gro_max_size; unsigned int gro_ipv4_max_size; + unsigned int xdp_zc_max_segs; rx_handler_func_t __rcu *rx_handler; void __rcu *rx_handler_data; diff --git a/include/uapi/linux/netdev.h b/include/uapi/linux/netdev.h index 639524b59930..bf71698a1e82 100644 --- a/include/uapi/linux/netdev.h +++ b/include/uapi/linux/netdev.h @@ -41,6 +41,7 @@ enum { NETDEV_A_DEV_IFINDEX = 1, NETDEV_A_DEV_PAD, NETDEV_A_DEV_XDP_FEATURES, + NETDEV_A_DEV_XDP_ZC_MAX_SEGS, __NETDEV_A_DEV_MAX, NETDEV_A_DEV_MAX = (__NETDEV_A_DEV_MAX - 1) -- cgit v1.2.3 From 24ea50127ecf0efe819c1f6230add27abc6ca9d9 Mon Sep 17 00:00:00 2001 From: Maciej Fijalkowski Date: Wed, 19 Jul 2023 15:24:08 +0200 Subject: xsk: support mbuf on ZC RX Given that skb_shared_info relies on skb_frag_t, in order to support xskb chaining, introduce xdp_buff_xsk::xskb_list_node and xsk_buff_pool::xskb_list. This is needed so ZC drivers can add frags as xskb nodes which will make it possible to handle it both when producing AF_XDP Rx descriptors as well as freeing/recycling all the frags that a single frame carries. Speaking of latter, update xsk_buff_free() to take care of list nodes. 
For the former (adding as frags), introduce xsk_buff_add_frag() for ZC drivers usage that is going to be used to add a frag to xskb list from pool. xsk_buff_get_frag() will be utilized by XDP_TX and, on contrary, will return xdp_buff. One of the previous patches added a wrapper for ZC Rx so implement xskb list walk and production of Rx descriptors there. On bind() path, bail out if socket wants to use ZC multi-buffer but underlying netdev does not support it. Signed-off-by: Maciej Fijalkowski Link: https://lore.kernel.org/r/20230719132421.584801-12-maciej.fijalkowski@intel.com Signed-off-by: Alexei Starovoitov --- include/net/xdp_sock_drv.h | 44 ++++++++++++++++++++++++++++++++++++++++++++ include/net/xsk_buff_pool.h | 2 ++ 2 files changed, 46 insertions(+) (limited to 'include') diff --git a/include/net/xdp_sock_drv.h b/include/net/xdp_sock_drv.h index 0d34cdb5567d..1f6fc8c7a84c 100644 --- a/include/net/xdp_sock_drv.h +++ b/include/net/xdp_sock_drv.h @@ -108,10 +108,45 @@ static inline bool xsk_buff_can_alloc(struct xsk_buff_pool *pool, u32 count) static inline void xsk_buff_free(struct xdp_buff *xdp) { struct xdp_buff_xsk *xskb = container_of(xdp, struct xdp_buff_xsk, xdp); + struct list_head *xskb_list = &xskb->pool->xskb_list; + struct xdp_buff_xsk *pos, *tmp; + if (likely(!xdp_buff_has_frags(xdp))) + goto out; + + list_for_each_entry_safe(pos, tmp, xskb_list, xskb_list_node) { + list_del(&pos->xskb_list_node); + xp_free(pos); + } + + xdp_get_shared_info_from_buff(xdp)->nr_frags = 0; +out: xp_free(xskb); } +static inline void xsk_buff_add_frag(struct xdp_buff *xdp) +{ + struct xdp_buff_xsk *frag = container_of(xdp, struct xdp_buff_xsk, xdp); + + list_add_tail(&frag->xskb_list_node, &frag->pool->xskb_list); +} + +static inline struct xdp_buff *xsk_buff_get_frag(struct xdp_buff *first) +{ + struct xdp_buff_xsk *xskb = container_of(first, struct xdp_buff_xsk, xdp); + struct xdp_buff *ret = NULL; + struct xdp_buff_xsk *frag; + + frag = 
list_first_entry_or_null(&xskb->pool->xskb_list, + struct xdp_buff_xsk, xskb_list_node); + if (frag) { + list_del(&frag->xskb_list_node); + ret = &frag->xdp; + } + + return ret; +} + static inline void xsk_buff_set_size(struct xdp_buff *xdp, u32 size) { xdp->data = xdp->data_hard_start + XDP_PACKET_HEADROOM; @@ -265,6 +300,15 @@ static inline void xsk_buff_free(struct xdp_buff *xdp) { } +static inline void xsk_buff_add_frag(struct xdp_buff *xdp) +{ +} + +static inline struct xdp_buff *xsk_buff_get_frag(struct xdp_buff *first) +{ + return NULL; +} + static inline void xsk_buff_set_size(struct xdp_buff *xdp, u32 size) { } diff --git a/include/net/xsk_buff_pool.h b/include/net/xsk_buff_pool.h index 4dcca163e076..b0bdff26fc88 100644 --- a/include/net/xsk_buff_pool.h +++ b/include/net/xsk_buff_pool.h @@ -29,6 +29,7 @@ struct xdp_buff_xsk { struct xsk_buff_pool *pool; u64 orig_addr; struct list_head free_list_node; + struct list_head xskb_list_node; }; #define XSK_CHECK_PRIV_TYPE(t) BUILD_BUG_ON(sizeof(t) > offsetofend(struct xdp_buff_xsk, cb)) @@ -54,6 +55,7 @@ struct xsk_buff_pool { struct xdp_umem *umem; struct work_struct work; struct list_head free_list; + struct list_head xskb_list; u32 heads_cnt; u16 queue_id; -- cgit v1.2.3 From 053c8e1f235dc3f69d13375b32f4209228e1cb96 Mon Sep 17 00:00:00 2001 From: Daniel Borkmann Date: Wed, 19 Jul 2023 16:08:51 +0200 Subject: bpf: Add generic attach/detach/query API for multi-progs This adds a generic layer called bpf_mprog which can be reused by different attachment layers to enable multi-program attachment and dependency resolution. In-kernel users of the bpf_mprog don't need to care about the dependency resolution internals, they can just consume it with few API calls. The initial idea of having a generic API sparked out of discussion [0] from an earlier revision of this work where tc's priority was reused and exposed via BPF uapi as a way to coordinate dependencies among tc BPF programs, similar as-is for classic tc BPF. 
The feedback was that priority provides a bad user experience and is hard to use [1], e.g.: I cannot help but feel that priority logic copy-paste from old tc, netfilter and friends is done because "that's how things were done in the past". [...] Priority gets exposed everywhere in uapi all the way to bpftool when it's right there for users to understand. And that's the main problem with it. The user don't want to and don't need to be aware of it, but uapi forces them to pick the priority. [...] Your cover letter [0] example proves that in real life different service pick the same priority. They simply don't know any better. Priority is an unnecessary magic that apps _have_ to pick, so they just copy-paste and everyone ends up using the same. The course of the discussion showed more and more the need for a generic, reusable API where the "same look and feel" can be applied for various other program types beyond just tc BPF, for example XDP today does not have multi- program support in kernel, but also there was interest around this API for improving management of cgroup program types. Such common multi-program management concept is useful for BPF management daemons or user space BPF applications coordinating internally about their attachments. 
Both from Cilium and Meta side [2], we've collected the following requirements for a generic attach/detach/query API for multi-progs which has been implemented as part of this work: - Support prog-based attach/detach and link API - Dependency directives (can also be combined): - BPF_F_{BEFORE,AFTER} with relative_{fd,id} which can be {prog,link,none} - BPF_F_ID flag as {fd,id} toggle; the rationale for id is so that user space application does not need CAP_SYS_ADMIN to retrieve foreign fds via bpf_*_get_fd_by_id() - BPF_F_LINK flag as {prog,link} toggle - If relative_{fd,id} is none, then BPF_F_BEFORE will just prepend, and BPF_F_AFTER will just append for attaching - Enforced only at attach time - BPF_F_REPLACE with replace_bpf_fd which can be prog, links have their own infra for replacing their internal prog - If no flags are set, then it's default append behavior for attaching - Internal revision counter and optionally being able to pass expected_revision - User space application can query current state with revision, and pass it along for attachment to assert current state before doing updates - Query also gets extension for link_ids array and link_attach_flags: - prog_ids are always filled with program IDs - link_ids are filled with link IDs when link was used, otherwise 0 - {prog,link}_attach_flags for holding {prog,link}-specific flags - Must be easy to integrate/reuse for in-kernel users The uapi-side changes needed for supporting bpf_mprog are rather minimal, consisting of the additions of the attachment flags, revision counter, and expanding existing union with relative_{fd,id} member. The bpf_mprog framework consists of an bpf_mprog_entry object which holds an array of bpf_mprog_fp (fast-path structure). The bpf_mprog_cp (control-path structure) is part of bpf_mprog_bundle. Both have been separated, so that fast-path gets efficient packing of bpf_prog pointers for maximum cache efficiency. 
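The ordering directives and the revision check from the requirements list above can be modelled in a few lines of user-space C. This is a rough sketch under stated assumptions, not the bpf_mprog implementation: all sketch_* names are hypothetical, plain integer ids stand in for program pointers, BPF_F_REPLACE and links are left out, and treating an expected revision of 0 as "no check" is an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_MAX 8 /* small stand-in for BPF_MPROG_MAX */

enum sketch_where { SKETCH_BEFORE, SKETCH_AFTER };

struct sketch_entry {
	unsigned int ids[SKETCH_MAX]; /* zero-terminated array of program ids */
	size_t count;
	uint64_t revision;
};

static void sketch_init(struct sketch_entry *e)
{
	memset(e, 0, sizeof(*e));
	e->revision = 1;
}

/*
 * Insert id before or after relative_id. A relative_id of 0 means "none":
 * BEFORE then prepends and AFTER appends. Returns 0 on success and -1 on
 * a stale revision, a full array, or an unknown relative_id.
 */
static int sketch_attach(struct sketch_entry *e, unsigned int id,
			 enum sketch_where where, unsigned int relative_id,
			 uint64_t expected_revision)
{
	size_t idx;

	assert(e != NULL && id != 0);

	if (expected_revision && expected_revision != e->revision)
		return -1;
	if (e->count >= SKETCH_MAX - 1) /* keep one slot for the terminator */
		return -1;

	if (relative_id == 0) {
		idx = (where == SKETCH_BEFORE) ? 0 : e->count;
	} else {
		for (idx = 0; idx < e->count; idx++)
			if (e->ids[idx] == relative_id)
				break;
		if (idx == e->count)
			return -1;
		if (where == SKETCH_AFTER)
			idx++;
	}

	/* Open a gap at idx, then place the new id there. */
	memmove(&e->ids[idx + 1], &e->ids[idx],
		(e->count - idx) * sizeof(e->ids[0]));
	e->ids[idx] = id;
	e->count++;
	e->revision++;
	return 0;
}
```

Passing the revision obtained from an earlier query lets a caller detect that another attachment happened in between, in which case the attach is refused rather than applied against a changed ordering.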
Also, array has been chosen instead of linked list or other structures to remove unnecessary indirections for a fast point-to-entry in tc for BPF. The bpf_mprog_entry comes as a pair via bpf_mprog_bundle so that in case of updates the peer bpf_mprog_entry is populated and then just swapped which avoids additional allocations that could otherwise fail, for example, in detach case. bpf_mprog_{fp,cp} arrays are currently static, but they could be converted to dynamic allocation if necessary at a point in future. Locking is deferred to the in-kernel user of bpf_mprog, for example, in case of tcx which uses this API in the next patch, it piggybacks on rtnl. An extensive test suite for checking all aspects of this API for prog-based attach/detach and link API comes as BPF selftests in this series. Thanks also to Andrii Nakryiko for early API discussions wrt Meta's BPF prog management. [0] https://lore.kernel.org/bpf/20221004231143.19190-1-daniel@iogearbox.net [1] https://lore.kernel.org/bpf/CAADnVQ+gEY3FjCR=+DmjDR4gp5bOYZUFJQXj4agKFHT9CQPZBw@mail.gmail.com [2] http://vger.kernel.org/bpfconf2023_material/tcx_meta_netdev_borkmann.pdf Signed-off-by: Daniel Borkmann Link: https://lore.kernel.org/r/20230719140858.13224-2-daniel@iogearbox.net Signed-off-by: Alexei Starovoitov --- include/linux/bpf_mprog.h | 318 ++++++++++++++++++++++++++++++++++++++++++++++ include/uapi/linux/bpf.h | 36 ++++-- 2 files changed, 346 insertions(+), 8 deletions(-) create mode 100644 include/linux/bpf_mprog.h (limited to 'include') diff --git a/include/linux/bpf_mprog.h b/include/linux/bpf_mprog.h new file mode 100644 index 000000000000..6feefec43422 --- /dev/null +++ b/include/linux/bpf_mprog.h @@ -0,0 +1,318 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* Copyright (c) 2023 Isovalent */ +#ifndef __BPF_MPROG_H +#define __BPF_MPROG_H + +#include + +/* bpf_mprog framework: + * + * bpf_mprog is a generic layer for multi-program attachment. 
In-kernel users + * of the bpf_mprog don't need to care about the dependency resolution + * internals, they can just consume it with few API calls. Currently available + * dependency directives are BPF_F_{BEFORE,AFTER} which enable insertion of + * a BPF program or BPF link relative to an existing BPF program or BPF link + * inside the multi-program array as well as prepend and append behavior if + * no relative object was specified, see corresponding selftests for concrete + * examples (e.g. tc_links and tc_opts test cases of test_progs). + * + * Usage of bpf_mprog_{attach,detach,query}() core APIs with pseudo code: + * + * Attach case: + * + * struct bpf_mprog_entry *entry, *entry_new; + * int ret; + * + * // bpf_mprog user-side lock + * // fetch active @entry from attach location + * [...] + * ret = bpf_mprog_attach(entry, &entry_new, [...]); + * if (!ret) { + * if (entry != entry_new) { + * // swap @entry to @entry_new at attach location + * // ensure there are no inflight users of @entry: + * synchronize_rcu(); + * } + * bpf_mprog_commit(entry); + * } else { + * // error path, bail out, propagate @ret + * } + * // bpf_mprog user-side unlock + * + * Detach case: + * + * struct bpf_mprog_entry *entry, *entry_new; + * int ret; + * + * // bpf_mprog user-side lock + * // fetch active @entry from attach location + * [...] 
+ * ret = bpf_mprog_detach(entry, &entry_new, [...]); + * if (!ret) { + * // all (*) marked is optional and depends on the use-case + * // whether bpf_mprog_bundle should be freed or not + * if (!bpf_mprog_total(entry_new)) (*) + * entry_new = NULL (*) + * // swap @entry to @entry_new at attach location + * // ensure there are no inflight users of @entry: + * synchronize_rcu(); + * bpf_mprog_commit(entry); + * if (!entry_new) (*) + * // free bpf_mprog_bundle (*) + * } else { + * // error path, bail out, propagate @ret + * } + * // bpf_mprog user-side unlock + * + * Query case: + * + * struct bpf_mprog_entry *entry; + * int ret; + * + * // bpf_mprog user-side lock + * // fetch active @entry from attach location + * [...] + * ret = bpf_mprog_query(attr, uattr, entry); + * // bpf_mprog user-side unlock + * + * Data/fast path: + * + * struct bpf_mprog_entry *entry; + * struct bpf_mprog_fp *fp; + * struct bpf_prog *prog; + * int ret = [...]; + * + * rcu_read_lock(); + * // fetch active @entry from attach location + * [...] + * bpf_mprog_foreach_prog(entry, fp, prog) { + * ret = bpf_prog_run(prog, [...]); + * // process @ret from program + * } + * [...] + * rcu_read_unlock(); + * + * bpf_mprog locking considerations: + * + * bpf_mprog_{attach,detach,query}() must be protected by an external lock + * (like RTNL in case of tcx). + * + * bpf_mprog_entry pointer can be an __rcu annotated pointer (in case of tcx + * the netdevice has tcx_ingress and tcx_egress __rcu pointer) which gets + * updated via rcu_assign_pointer() pointing to the active bpf_mprog_entry of + * the bpf_mprog_bundle. + * + * Fast path accesses the active bpf_mprog_entry within RCU critical section + * (in case of tcx it runs in NAPI which provides RCU protection there, + * other users might need explicit rcu_read_lock()). The bpf_mprog_commit() + * assumes that for the old bpf_mprog_entry there are no inflight users + * anymore. 
+ * + * The READ_ONCE()/WRITE_ONCE() pairing for bpf_mprog_fp's prog access is for + * the replacement case where we don't swap the bpf_mprog_entry. + */ + +#define bpf_mprog_foreach_tuple(entry, fp, cp, t) \ + for (fp = &entry->fp_items[0], cp = &entry->parent->cp_items[0];\ + ({ \ + t.prog = READ_ONCE(fp->prog); \ + t.link = cp->link; \ + t.prog; \ + }); \ + fp++, cp++) + +#define bpf_mprog_foreach_prog(entry, fp, p) \ + for (fp = &entry->fp_items[0]; \ + (p = READ_ONCE(fp->prog)); \ + fp++) + +#define BPF_MPROG_MAX 64 + +struct bpf_mprog_fp { + struct bpf_prog *prog; +}; + +struct bpf_mprog_cp { + struct bpf_link *link; +}; + +struct bpf_mprog_entry { + struct bpf_mprog_fp fp_items[BPF_MPROG_MAX]; + struct bpf_mprog_bundle *parent; +}; + +struct bpf_mprog_bundle { + struct bpf_mprog_entry a; + struct bpf_mprog_entry b; + struct bpf_mprog_cp cp_items[BPF_MPROG_MAX]; + struct bpf_prog *ref; + atomic64_t revision; + u32 count; +}; + +struct bpf_tuple { + struct bpf_prog *prog; + struct bpf_link *link; +}; + +static inline struct bpf_mprog_entry * +bpf_mprog_peer(const struct bpf_mprog_entry *entry) +{ + if (entry == &entry->parent->a) + return &entry->parent->b; + else + return &entry->parent->a; +} + +static inline void bpf_mprog_bundle_init(struct bpf_mprog_bundle *bundle) +{ + BUILD_BUG_ON(sizeof(bundle->a.fp_items[0]) > sizeof(u64)); + BUILD_BUG_ON(ARRAY_SIZE(bundle->a.fp_items) != + ARRAY_SIZE(bundle->cp_items)); + + memset(bundle, 0, sizeof(*bundle)); + atomic64_set(&bundle->revision, 1); + bundle->a.parent = bundle; + bundle->b.parent = bundle; +} + +static inline void bpf_mprog_inc(struct bpf_mprog_entry *entry) +{ + entry->parent->count++; +} + +static inline void bpf_mprog_dec(struct bpf_mprog_entry *entry) +{ + entry->parent->count--; +} + +static inline int bpf_mprog_max(void) +{ + return ARRAY_SIZE(((struct bpf_mprog_entry *)NULL)->fp_items) - 1; +} + +static inline int bpf_mprog_total(struct bpf_mprog_entry *entry) +{ + int total = 
entry->parent->count; + + WARN_ON_ONCE(total > bpf_mprog_max()); + return total; +} + +static inline bool bpf_mprog_exists(struct bpf_mprog_entry *entry, + struct bpf_prog *prog) +{ + const struct bpf_mprog_fp *fp; + const struct bpf_prog *tmp; + + bpf_mprog_foreach_prog(entry, fp, tmp) { + if (tmp == prog) + return true; + } + return false; +} + +static inline void bpf_mprog_mark_for_release(struct bpf_mprog_entry *entry, + struct bpf_tuple *tuple) +{ + WARN_ON_ONCE(entry->parent->ref); + if (!tuple->link) + entry->parent->ref = tuple->prog; +} + +static inline void bpf_mprog_complete_release(struct bpf_mprog_entry *entry) +{ + /* In the non-link case prog deletions can only drop the reference + * to the prog after the bpf_mprog_entry got swapped and the + * bpf_mprog ensured that there are no inflight users anymore. + * + * Paired with bpf_mprog_mark_for_release(). + */ + if (entry->parent->ref) { + bpf_prog_put(entry->parent->ref); + entry->parent->ref = NULL; + } +} + +static inline void bpf_mprog_revision_new(struct bpf_mprog_entry *entry) +{ + atomic64_inc(&entry->parent->revision); +} + +static inline void bpf_mprog_commit(struct bpf_mprog_entry *entry) +{ + bpf_mprog_complete_release(entry); + bpf_mprog_revision_new(entry); +} + +static inline u64 bpf_mprog_revision(struct bpf_mprog_entry *entry) +{ + return atomic64_read(&entry->parent->revision); +} + +static inline void bpf_mprog_entry_copy(struct bpf_mprog_entry *dst, + struct bpf_mprog_entry *src) +{ + memcpy(dst->fp_items, src->fp_items, sizeof(src->fp_items)); +} + +static inline void bpf_mprog_entry_grow(struct bpf_mprog_entry *entry, int idx) +{ + int total = bpf_mprog_total(entry); + + memmove(entry->fp_items + idx + 1, + entry->fp_items + idx, + (total - idx) * sizeof(struct bpf_mprog_fp)); + + memmove(entry->parent->cp_items + idx + 1, + entry->parent->cp_items + idx, + (total - idx) * sizeof(struct bpf_mprog_cp)); +} + +static inline void bpf_mprog_entry_shrink(struct bpf_mprog_entry *entry, 
int idx) +{ + /* Total array size is needed in this case to enure the NULL + * entry is copied at the end. + */ + int total = ARRAY_SIZE(entry->fp_items); + + memmove(entry->fp_items + idx, + entry->fp_items + idx + 1, + (total - idx - 1) * sizeof(struct bpf_mprog_fp)); + + memmove(entry->parent->cp_items + idx, + entry->parent->cp_items + idx + 1, + (total - idx - 1) * sizeof(struct bpf_mprog_cp)); +} + +static inline void bpf_mprog_read(struct bpf_mprog_entry *entry, u32 idx, + struct bpf_mprog_fp **fp, + struct bpf_mprog_cp **cp) +{ + *fp = &entry->fp_items[idx]; + *cp = &entry->parent->cp_items[idx]; +} + +static inline void bpf_mprog_write(struct bpf_mprog_fp *fp, + struct bpf_mprog_cp *cp, + struct bpf_tuple *tuple) +{ + WRITE_ONCE(fp->prog, tuple->prog); + cp->link = tuple->link; +} + +int bpf_mprog_attach(struct bpf_mprog_entry *entry, + struct bpf_mprog_entry **entry_new, + struct bpf_prog *prog_new, struct bpf_link *link, + struct bpf_prog *prog_old, + u32 flags, u32 id_or_fd, u64 revision); + +int bpf_mprog_detach(struct bpf_mprog_entry *entry, + struct bpf_mprog_entry **entry_new, + struct bpf_prog *prog, struct bpf_link *link, + u32 flags, u32 id_or_fd, u64 revision); + +int bpf_mprog_query(const union bpf_attr *attr, union bpf_attr __user *uattr, + struct bpf_mprog_entry *entry); + +#endif /* __BPF_MPROG_H */ diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h index 9ed59896ebc5..d4c07e435336 100644 --- a/include/uapi/linux/bpf.h +++ b/include/uapi/linux/bpf.h @@ -1113,7 +1113,12 @@ enum bpf_perf_event_type { */ #define BPF_F_ALLOW_OVERRIDE (1U << 0) #define BPF_F_ALLOW_MULTI (1U << 1) +/* Generic attachment flags. 
*/ #define BPF_F_REPLACE (1U << 2) +#define BPF_F_BEFORE (1U << 3) +#define BPF_F_AFTER (1U << 4) +#define BPF_F_ID (1U << 5) +#define BPF_F_LINK BPF_F_LINK /* 1 << 13 */ /* If BPF_F_STRICT_ALIGNMENT is used in BPF_PROG_LOAD command, the * verifier will perform strict alignment checking as if the kernel @@ -1444,14 +1449,19 @@ union bpf_attr { }; struct { /* anonymous struct used by BPF_PROG_ATTACH/DETACH commands */ - __u32 target_fd; /* container object to attach to */ - __u32 attach_bpf_fd; /* eBPF program to attach */ + union { + __u32 target_fd; /* target object to attach to or ... */ + __u32 target_ifindex; /* target ifindex */ + }; + __u32 attach_bpf_fd; __u32 attach_type; __u32 attach_flags; - __u32 replace_bpf_fd; /* previously attached eBPF - * program to replace if - * BPF_F_REPLACE is used - */ + __u32 replace_bpf_fd; + union { + __u32 relative_fd; + __u32 relative_id; + }; + __u64 expected_revision; }; struct { /* anonymous struct used by BPF_PROG_TEST_RUN command */ @@ -1497,16 +1507,26 @@ union bpf_attr { } info; struct { /* anonymous struct used by BPF_PROG_QUERY command */ - __u32 target_fd; /* container object to query */ + union { + __u32 target_fd; /* target object to query or ... */ + __u32 target_ifindex; /* target ifindex */ + }; __u32 attach_type; __u32 query_flags; __u32 attach_flags; __aligned_u64 prog_ids; - __u32 prog_cnt; + union { + __u32 prog_cnt; + __u32 count; + }; + __u32 :32; /* output: per-program attach_flags. * not allowed to be set during effective query. 
*/ __aligned_u64 prog_attach_flags; + __aligned_u64 link_ids; + __aligned_u64 link_attach_flags; + __u64 revision; } query; struct { /* anonymous struct used by BPF_RAW_TRACEPOINT_OPEN command */ -- cgit v1.2.3 From e420bed025071a623d2720a92bc2245c84757ecb Mon Sep 17 00:00:00 2001 From: Daniel Borkmann Date: Wed, 19 Jul 2023 16:08:52 +0200 Subject: bpf: Add fd-based tcx multi-prog infra with link support This work refactors and adds a lightweight extension ("tcx") to the tc BPF ingress and egress data path side for allowing BPF program management based on fds via bpf() syscall through the newly added generic multi-prog API. The main goal behind this work which we also presented at LPC [0] last year and a recent update at LSF/MM/BPF this year [3] is to support long-awaited BPF link functionality for tc BPF programs, which allows for a model of safe ownership and program detachment. Given the rise in tc BPF users in cloud native environments, this becomes necessary to avoid hard to debug incidents either through stale leftover programs or 3rd party applications accidentally stepping on each others toes. As a recap, a BPF link represents the attachment of a BPF program to a BPF hook point. The BPF link holds a single reference to keep BPF program alive. Moreover, hook points do not reference a BPF link, only the application's fd or pinning does. A BPF link holds meta-data specific to attachment and implements operations for link creation, (atomic) BPF program update, detachment and introspection. The motivation for BPF links for tc BPF programs is multi-fold, for example: - From Meta: "It's especially important for applications that are deployed fleet-wide and that don't "control" hosts they are deployed to. If such application crashes and no one notices and does anything about that, BPF program will keep running draining resources or even just, say, dropping packets. We at FB had outages due to such permanent BPF attachment semantics. 
With fd-based BPF link we are getting a framework, which allows safe, auto-detachable behavior by default, unless application explicitly opts in by pinning the BPF link." [1] - From Cilium-side the tc BPF programs we attach to host-facing veth devices and phys devices build the core datapath for Kubernetes Pods, and they implement forwarding, load-balancing, policy, EDT-management, etc, within BPF. Currently there is no concept of 'safe' ownership, e.g. we've recently experienced hard-to-debug issues in a user's staging environment where another Kubernetes application using tc BPF attached to the same prio/handle of cls_bpf, accidentally wiping all Cilium-based BPF programs from underneath it. The goal is to establish a clear/safe ownership model via links which cannot accidentally be overridden. [0,2] BPF links for tc can co-exist with non-link attachments, and the semantics are in line also with XDP links: BPF links cannot replace other BPF links, BPF links cannot replace non-BPF links, non-BPF links cannot replace BPF links and lastly only non-BPF links can replace non-BPF links. In case of Cilium, this would solve mentioned issue of safe ownership model as 3rd party applications would not be able to accidentally wipe Cilium programs, even if they are not BPF link aware. Earlier attempts [4] have tried to integrate BPF links into core tc machinery to solve cls_bpf, which has been intrusive to the generic tc kernel API with extensions only specific to cls_bpf and suboptimal/complex since cls_bpf could be wiped from the qdisc also. Locking a tc BPF program in place this way, is getting into layering hacks given the two object models are vastly different. We instead implemented the tcx (tc 'express') layer which is an fd-based tc BPF attach API, so that the BPF link implementation blends in naturally similar to other link types which are fd-based and without the need for changing core tc internal APIs. 
BPF programs for tc can then be successively migrated from classic cls_bpf to the new tc BPF link without needing to change the program's source code, just the BPF loader mechanics for attaching is sufficient. For the current tc framework, there is no change in behavior with this change and neither does this change touch on tc core kernel APIs. The gist of this patch is that the ingress and egress hook have a lightweight, qdisc-less extension for BPF to attach its tc BPF programs, in other words, a minimal entry point for tc BPF. The name tcx has been suggested from discussion of earlier revisions of this work as a good fit, and to more easily differ between the classic cls_bpf attachment and the fd-based one. For the ingress and egress tcx points, the device holds a cache-friendly array with program pointers which is separated from control plane (slow-path) data. Earlier versions of this work used priority to determine ordering and expression of dependencies similar as with classic tc, but it was challenged that for something more future-proof a better user experience is required. Hence this resulted in the design and development of the generic attach/detach/query API for multi-progs. See prior patch with its discussion on the API design. tcx is the first user and later we plan to integrate also others, for example, one candidate is multi-prog support for XDP which would benefit and have the same 'look and feel' from API perspective. The goal with tcx is to have maximum compatibility to existing tc BPF programs, so they don't need to be rewritten specifically. Compatibility to call into classic tcf_classify() is also provided in order to allow successive migration or both to cleanly co-exist where needed given its all one logical tc layer and the tcx plus classic tc cls/act build one logical overall processing pipeline. 
tcx supports the simplified return codes TCX_NEXT which is non-terminating (go to next program) and terminating ones with TCX_PASS, TCX_DROP, TCX_REDIRECT. The fd-based API is behind a static key, so that when unused the code is also not entered. The struct tcx_entry's program array is currently static, but could be made dynamic if necessary at a point in future. The a/b pair swap design has been chosen so that for detachment there are no allocations which otherwise could fail. The work has been tested with tc-testing selftest suite which all passes, as well as the tc BPF tests from the BPF CI, and also with Cilium's L4LB. Thanks also to Nikolay Aleksandrov and Martin Lau for in-depth early reviews of this work. [0] https://lpc.events/event/16/contributions/1353/ [1] https://lore.kernel.org/bpf/CAEf4BzbokCJN33Nw_kg82sO=xppXnKWEncGTWCTB9vGCmLB6pw@mail.gmail.com [2] https://colocatedeventseu2023.sched.com/event/1Jo6O/tales-from-an-ebpf-programs-murder-mystery-hemanth-malla-guillaume-fournier-datadog [3] http://vger.kernel.org/bpfconf2023_material/tcx_meta_netdev_borkmann.pdf [4] https://lore.kernel.org/bpf/20210604063116.234316-1-memxor@gmail.com Signed-off-by: Daniel Borkmann Acked-by: Jakub Kicinski Link: https://lore.kernel.org/r/20230719140858.13224-3-daniel@iogearbox.net Signed-off-by: Alexei Starovoitov --- include/linux/bpf_mprog.h | 9 ++ include/linux/netdevice.h | 15 ++-- include/linux/skbuff.h | 4 +- include/net/sch_generic.h | 2 +- include/net/tcx.h | 206 ++++++++++++++++++++++++++++++++++++++++++++++ include/uapi/linux/bpf.h | 34 +++++++- 6 files changed, 254 insertions(+), 16 deletions(-) create mode 100644 include/net/tcx.h (limited to 'include') diff --git a/include/linux/bpf_mprog.h b/include/linux/bpf_mprog.h index 6feefec43422..2b429488f840 100644 --- a/include/linux/bpf_mprog.h +++ b/include/linux/bpf_mprog.h @@ -315,4 +315,13 @@ int bpf_mprog_detach(struct bpf_mprog_entry *entry, int bpf_mprog_query(const union bpf_attr *attr, union bpf_attr 
__user *uattr, struct bpf_mprog_entry *entry); +static inline bool bpf_mprog_supported(enum bpf_prog_type type) +{ + switch (type) { + case BPF_PROG_TYPE_SCHED_CLS: + return true; + default: + return false; + } +} #endif /* __BPF_MPROG_H */ diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index b12477ea4032..3800d0479698 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -1930,8 +1930,7 @@ enum netdev_ml_priv_type { * * @rx_handler: handler for received packets * @rx_handler_data: XXX: need comments on this one - * @miniq_ingress: ingress/clsact qdisc specific data for - * ingress processing + * @tcx_ingress: BPF & clsact qdisc specific data for ingress processing * @ingress_queue: XXX: need comments on this one * @nf_hooks_ingress: netfilter hooks executed for ingress packets * @broadcast: hw bcast address @@ -1952,8 +1951,7 @@ enum netdev_ml_priv_type { * @xps_maps: all CPUs/RXQs maps for XPS device * * @xps_maps: XXX: need comments on this one - * @miniq_egress: clsact qdisc specific data for - * egress processing + * @tcx_egress: BPF & clsact qdisc specific data for egress processing * @nf_hooks_egress: netfilter hooks executed for egress packets * @qdisc_hash: qdisc hash table * @watchdog_timeo: Represents the timeout that is used by @@ -2253,9 +2251,8 @@ struct net_device { unsigned int xdp_zc_max_segs; rx_handler_func_t __rcu *rx_handler; void __rcu *rx_handler_data; - -#ifdef CONFIG_NET_CLS_ACT - struct mini_Qdisc __rcu *miniq_ingress; +#ifdef CONFIG_NET_XGRESS + struct bpf_mprog_entry __rcu *tcx_ingress; #endif struct netdev_queue __rcu *ingress_queue; #ifdef CONFIG_NETFILTER_INGRESS @@ -2283,8 +2280,8 @@ struct net_device { #ifdef CONFIG_XPS struct xps_dev_maps __rcu *xps_maps[XPS_MAPS_MAX]; #endif -#ifdef CONFIG_NET_CLS_ACT - struct mini_Qdisc __rcu *miniq_egress; +#ifdef CONFIG_NET_XGRESS + struct bpf_mprog_entry __rcu *tcx_egress; #endif #ifdef CONFIG_NETFILTER_EGRESS struct nf_hook_entries __rcu 
*nf_hooks_egress; diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index 91ed66952580..ed83f1c5fc1f 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -944,7 +944,7 @@ struct sk_buff { __u8 __mono_tc_offset[0]; /* public: */ __u8 mono_delivery_time:1; /* See SKB_MONO_DELIVERY_TIME_MASK */ -#ifdef CONFIG_NET_CLS_ACT +#ifdef CONFIG_NET_XGRESS __u8 tc_at_ingress:1; /* See TC_AT_INGRESS_MASK */ __u8 tc_skip_classify:1; #endif @@ -993,7 +993,7 @@ struct sk_buff { __u8 csum_not_inet:1; #endif -#ifdef CONFIG_NET_SCHED +#if defined(CONFIG_NET_SCHED) || defined(CONFIG_NET_XGRESS) __u16 tc_index; /* traffic control index */ #endif diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h index e92f73bb3198..15be2d96b06d 100644 --- a/include/net/sch_generic.h +++ b/include/net/sch_generic.h @@ -703,7 +703,7 @@ int skb_do_redirect(struct sk_buff *); static inline bool skb_at_tc_ingress(const struct sk_buff *skb) { -#ifdef CONFIG_NET_CLS_ACT +#ifdef CONFIG_NET_XGRESS return skb->tc_at_ingress; #else return false; diff --git a/include/net/tcx.h b/include/net/tcx.h new file mode 100644 index 000000000000..264f147953ba --- /dev/null +++ b/include/net/tcx.h @@ -0,0 +1,206 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* Copyright (c) 2023 Isovalent */ +#ifndef __NET_TCX_H +#define __NET_TCX_H + +#include +#include + +#include + +struct mini_Qdisc; + +struct tcx_entry { + struct mini_Qdisc __rcu *miniq; + struct bpf_mprog_bundle bundle; + bool miniq_active; + struct rcu_head rcu; +}; + +struct tcx_link { + struct bpf_link link; + struct net_device *dev; + u32 location; +}; + +static inline void tcx_set_ingress(struct sk_buff *skb, bool ingress) +{ +#ifdef CONFIG_NET_XGRESS + skb->tc_at_ingress = ingress; +#endif +} + +#ifdef CONFIG_NET_XGRESS +static inline struct tcx_entry *tcx_entry(struct bpf_mprog_entry *entry) +{ + struct bpf_mprog_bundle *bundle = entry->parent; + + return container_of(bundle, struct tcx_entry, bundle); +} + +static inline 
struct tcx_link *tcx_link(struct bpf_link *link) +{ + return container_of(link, struct tcx_link, link); +} + +static inline const struct tcx_link *tcx_link_const(const struct bpf_link *link) +{ + return tcx_link((struct bpf_link *)link); +} + +void tcx_inc(void); +void tcx_dec(void); + +static inline void tcx_entry_sync(void) +{ + /* bpf_mprog_entry got a/b swapped, therefore ensure that + * there are no inflight users on the old one anymore. + */ + synchronize_rcu(); +} + +static inline void +tcx_entry_update(struct net_device *dev, struct bpf_mprog_entry *entry, + bool ingress) +{ + ASSERT_RTNL(); + if (ingress) + rcu_assign_pointer(dev->tcx_ingress, entry); + else + rcu_assign_pointer(dev->tcx_egress, entry); +} + +static inline struct bpf_mprog_entry * +tcx_entry_fetch(struct net_device *dev, bool ingress) +{ + ASSERT_RTNL(); + if (ingress) + return rcu_dereference_rtnl(dev->tcx_ingress); + else + return rcu_dereference_rtnl(dev->tcx_egress); +} + +static inline struct bpf_mprog_entry *tcx_entry_create(void) +{ + struct tcx_entry *tcx = kzalloc(sizeof(*tcx), GFP_KERNEL); + + if (tcx) { + bpf_mprog_bundle_init(&tcx->bundle); + return &tcx->bundle.a; + } + return NULL; +} + +static inline void tcx_entry_free(struct bpf_mprog_entry *entry) +{ + kfree_rcu(tcx_entry(entry), rcu); +} + +static inline struct bpf_mprog_entry * +tcx_entry_fetch_or_create(struct net_device *dev, bool ingress, bool *created) +{ + struct bpf_mprog_entry *entry = tcx_entry_fetch(dev, ingress); + + *created = false; + if (!entry) { + entry = tcx_entry_create(); + if (!entry) + return NULL; + *created = true; + } + return entry; +} + +static inline void tcx_skeys_inc(bool ingress) +{ + tcx_inc(); + if (ingress) + net_inc_ingress_queue(); + else + net_inc_egress_queue(); +} + +static inline void tcx_skeys_dec(bool ingress) +{ + if (ingress) + net_dec_ingress_queue(); + else + net_dec_egress_queue(); + tcx_dec(); +} + +static inline void tcx_miniq_set_active(struct bpf_mprog_entry *entry, + 
const bool active) +{ + ASSERT_RTNL(); + tcx_entry(entry)->miniq_active = active; +} + +static inline bool tcx_entry_is_active(struct bpf_mprog_entry *entry) +{ + ASSERT_RTNL(); + return bpf_mprog_total(entry) || tcx_entry(entry)->miniq_active; +} + +static inline enum tcx_action_base tcx_action_code(struct sk_buff *skb, + int code) +{ + switch (code) { + case TCX_PASS: + skb->tc_index = qdisc_skb_cb(skb)->tc_classid; + fallthrough; + case TCX_DROP: + case TCX_REDIRECT: + return code; + case TCX_NEXT: + default: + return TCX_NEXT; + } +} +#endif /* CONFIG_NET_XGRESS */ + +#if defined(CONFIG_NET_XGRESS) && defined(CONFIG_BPF_SYSCALL) +int tcx_prog_attach(const union bpf_attr *attr, struct bpf_prog *prog); +int tcx_link_attach(const union bpf_attr *attr, struct bpf_prog *prog); +int tcx_prog_detach(const union bpf_attr *attr, struct bpf_prog *prog); +void tcx_uninstall(struct net_device *dev, bool ingress); + +int tcx_prog_query(const union bpf_attr *attr, + union bpf_attr __user *uattr); + +static inline void dev_tcx_uninstall(struct net_device *dev) +{ + ASSERT_RTNL(); + tcx_uninstall(dev, true); + tcx_uninstall(dev, false); +} +#else +static inline int tcx_prog_attach(const union bpf_attr *attr, + struct bpf_prog *prog) +{ + return -EINVAL; +} + +static inline int tcx_link_attach(const union bpf_attr *attr, + struct bpf_prog *prog) +{ + return -EINVAL; +} + +static inline int tcx_prog_detach(const union bpf_attr *attr, + struct bpf_prog *prog) +{ + return -EINVAL; +} + +static inline int tcx_prog_query(const union bpf_attr *attr, + union bpf_attr __user *uattr) +{ + return -EINVAL; +} + +static inline void dev_tcx_uninstall(struct net_device *dev) +{ +} +#endif /* CONFIG_NET_XGRESS && CONFIG_BPF_SYSCALL */ +#endif /* __NET_TCX_H */ diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h index d4c07e435336..739c15906a65 100644 --- a/include/uapi/linux/bpf.h +++ b/include/uapi/linux/bpf.h @@ -1036,6 +1036,8 @@ enum bpf_attach_type { BPF_LSM_CGROUP, 
BPF_STRUCT_OPS, BPF_NETFILTER, + BPF_TCX_INGRESS, + BPF_TCX_EGRESS, __MAX_BPF_ATTACH_TYPE }; @@ -1053,7 +1055,7 @@ enum bpf_link_type { BPF_LINK_TYPE_KPROBE_MULTI = 8, BPF_LINK_TYPE_STRUCT_OPS = 9, BPF_LINK_TYPE_NETFILTER = 10, - + BPF_LINK_TYPE_TCX = 11, MAX_BPF_LINK_TYPE, }; @@ -1569,13 +1571,13 @@ union bpf_attr { __u32 map_fd; /* struct_ops to attach */ }; union { - __u32 target_fd; /* object to attach to */ - __u32 target_ifindex; /* target ifindex */ + __u32 target_fd; /* target object to attach to or ... */ + __u32 target_ifindex; /* target ifindex */ }; __u32 attach_type; /* attach type */ __u32 flags; /* extra flags */ union { - __u32 target_btf_id; /* btf_id of target to attach to */ + __u32 target_btf_id; /* btf_id of target to attach to */ struct { __aligned_u64 iter_info; /* extra bpf_iter_link_info */ __u32 iter_info_len; /* iter_info length */ @@ -1609,6 +1611,13 @@ union bpf_attr { __s32 priority; __u32 flags; } netfilter; + struct { + union { + __u32 relative_fd; + __u32 relative_id; + }; + __u64 expected_revision; + } tcx; }; } link_create; @@ -6217,6 +6226,19 @@ struct bpf_sock_tuple { }; }; +/* (Simplified) user return codes for tcx prog type. + * A valid tcx program must return one of these defined values. All other + * return codes are reserved for future use. Must remain compatible with + * their TC_ACT_* counter-parts. For compatibility in behavior, unknown + * return codes are mapped to TCX_NEXT. + */ +enum tcx_action_base { + TCX_NEXT = -1, + TCX_PASS = 0, + TCX_DROP = 2, + TCX_REDIRECT = 7, +}; + struct bpf_xdp_sock { __u32 queue_id; }; @@ -6499,6 +6521,10 @@ struct bpf_link_info { } event; /* BPF_PERF_EVENT_EVENT */ }; } perf_event; + struct { + __u32 ifindex; + __u32 attach_type; + } tcx; }; } __attribute__((aligned(8))); -- cgit v1.2.3 From 6f5a630d7c57cd79b1f526a95e757311e32d41e5 Mon Sep 17 00:00:00 2001 From: Alexei Starovoitov Date: Tue, 18 Jul 2023 16:40:21 -0700 Subject: bpf, net: Introduce skb_pointer_if_linear(). 
Network drivers always call skb_header_pointer() with non-null buffer. Remove !buffer check to prevent accidental misuse of skb_header_pointer(). Introduce skb_pointer_if_linear() instead. Reported-by: Jakub Kicinski Acked-by: Jakub Kicinski Link: https://lore.kernel.org/r/20230718234021.43640-1-alexei.starovoitov@gmail.com Signed-off-by: Alexei Starovoitov --- include/linux/skbuff.h | 10 +++++++++- 1 file changed, 9 insertions(+), 1 deletion(-) (limited to 'include') diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index ed83f1c5fc1f..faaba050f843 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -4023,7 +4023,7 @@ __skb_header_pointer(const struct sk_buff *skb, int offset, int len, if (likely(hlen - offset >= len)) return (void *)data + offset; - if (!skb || !buffer || unlikely(skb_copy_bits(skb, offset, buffer, len) < 0)) + if (!skb || unlikely(skb_copy_bits(skb, offset, buffer, len) < 0)) return NULL; return buffer; @@ -4036,6 +4036,14 @@ skb_header_pointer(const struct sk_buff *skb, int offset, int len, void *buffer) skb_headlen(skb), buffer); } +static inline void * __must_check +skb_pointer_if_linear(const struct sk_buff *skb, int offset, int len) +{ + if (likely(skb_headlen(skb) - offset >= len)) + return skb->data + offset; + return NULL; +} + /** * skb_needs_linearize - check if we need to linearize a given skb * depending on the given device features. -- cgit v1.2.3 From 730b9051b8bce5eabd1f5b67dfd090c37d5dabea Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Tue, 18 Jul 2023 16:16:20 +0000 Subject: tcp: remove tcp_send_partial() This function does not exist. 
Signed-off-by: Eric Dumazet Link: https://lore.kernel.org/r/20230718161620.1391951-1-edumazet@google.com Signed-off-by: Jakub Kicinski --- include/net/tcp.h | 1 - 1 file changed, 1 deletion(-) (limited to 'include') diff --git a/include/net/tcp.h b/include/net/tcp.h index 2104a71c75ba..8d1f1af5e02a 100644 --- a/include/net/tcp.h +++ b/include/net/tcp.h @@ -606,7 +606,6 @@ int tcp_fragment(struct sock *sk, enum tcp_queue tcp_queue, unsigned int mss_now, gfp_t gfp); void tcp_send_probe0(struct sock *); -void tcp_send_partial(struct sock *); int tcp_write_wakeup(struct sock *, int mib); void tcp_send_fin(struct sock *sk); void tcp_send_active_reset(struct sock *sk, gfp_t priority); -- cgit v1.2.3 From 03b123debcbc8db987bda17ed8412cc011064c22 Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Tue, 18 Jul 2023 16:20:49 +0000 Subject: tcp: tcp_enter_quickack_mode() should be static After commit d2ccd7bc8acd ("tcp: avoid resetting ACK timer in DCTCP"), tcp_enter_quickack_mode() is only used from net/ipv4/tcp_input.c. 
Fixes: d2ccd7bc8acd ("tcp: avoid resetting ACK timer in DCTCP") Signed-off-by: Eric Dumazet Cc: Yuchung Cheng Cc: Neal Cardwell Link: https://lore.kernel.org/r/20230718162049.1444938-1-edumazet@google.com Signed-off-by: Jakub Kicinski --- include/net/tcp.h | 1 - 1 file changed, 1 deletion(-) (limited to 'include') diff --git a/include/net/tcp.h b/include/net/tcp.h index 8d1f1af5e02a..c5fb90079920 100644 --- a/include/net/tcp.h +++ b/include/net/tcp.h @@ -350,7 +350,6 @@ ssize_t tcp_splice_read(struct socket *sk, loff_t *ppos, struct sk_buff *tcp_stream_alloc_skb(struct sock *sk, gfp_t gfp, bool force_schedule); -void tcp_enter_quickack_mode(struct sock *sk, unsigned int max_quickacks); static inline void tcp_dec_quickack_mode(struct sock *sk, const unsigned int pkts) { -- cgit v1.2.3 From 4914109a8e1e494c6aa9852f9e84ec77a5fc643f Mon Sep 17 00:00:00 2001 From: Xin Long Date: Sun, 16 Jul 2023 17:09:17 -0400 Subject: netfilter: allow exp not to be removed in nf_ct_find_expectation Currently nf_conntrack_in() calling nf_ct_find_expectation() will remove the exp from the hash table. However, in some scenarios, we expect the exp not to be removed when the created ct will not be confirmed, like in OVS and TC conntrack in the following patches. This patch allows exp not to be removed by setting IPS_CONFIRMED in the status of the tmpl.
Signed-off-by: Xin Long Acked-by: Aaron Conole Acked-by: Florian Westphal Signed-off-by: Paolo Abeni --- include/net/netfilter/nf_conntrack_expect.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include') diff --git a/include/net/netfilter/nf_conntrack_expect.h b/include/net/netfilter/nf_conntrack_expect.h index cf0d81be5a96..165e7a03b8e9 100644 --- a/include/net/netfilter/nf_conntrack_expect.h +++ b/include/net/netfilter/nf_conntrack_expect.h @@ -100,7 +100,7 @@ nf_ct_expect_find_get(struct net *net, struct nf_conntrack_expect * nf_ct_find_expectation(struct net *net, const struct nf_conntrack_zone *zone, - const struct nf_conntrack_tuple *tuple); + const struct nf_conntrack_tuple *tuple, bool unlink); void nf_ct_unlink_expect_report(struct nf_conntrack_expect *exp, u32 portid, int report); -- cgit v1.2.3 From 6f1c646d88c591a8139997c5591c1385cbc3d4e1 Mon Sep 17 00:00:00 2001 From: Stefan Eichenberger Date: Wed, 19 Jul 2023 08:42:54 +0200 Subject: net: phy: add registers to support 1000BASE-T1 Add registers and definitions to support 1000BASE-T1. This includes the PCS Control and Status registers (3.2304 and 3.2305) as well as some missing bits on the PMA/PMD extended ability register (1.18) and PMA/PMD CTRL (1.2100) register. 
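As a rough illustration of how the new definitions might be consumed, the sketch below decodes already-read register values with the bit masks this patch adds to include/uapi/linux/mdio.h. The `toy_` helpers are hypothetical and take a raw 16-bit register value; how that value is read from the PHY is deliberately left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit masks as added by this patch. */
#define MDIO_PMA_PMD_BT1_B1000_ABLE	0x0002	/* 1000BASE-T1 Ability */
#define MDIO_PCS_1000BT1_STAT_LINK	0x0004	/* PCS Link is up */
#define MDIO_PCS_1000BT1_STAT_FAULT	0x0080	/* There is a fault condition */

/* Hypothetical helper: does the extended ability register (1.18) report 1000BASE-T1? */
static bool toy_supports_1000bt1(uint16_t ext_ability)
{
	return (ext_ability & MDIO_PMA_PMD_BT1_B1000_ABLE) != 0;
}

/* Hypothetical helper: does the PCS status register (3.2305) report link up? */
static bool toy_1000bt1_link_up(uint16_t pcs_stat)
{
	return (pcs_stat & MDIO_PCS_1000BT1_STAT_LINK) != 0;
}

/* Hypothetical helper: does the PCS status register (3.2305) report a fault? */
static bool toy_1000bt1_has_fault(uint16_t pcs_stat)
{
	return (pcs_stat & MDIO_PCS_1000BT1_STAT_FAULT) != 0;
}
```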
Signed-off-by: Stefan Eichenberger Reviewed-by: Andrew Lunn Signed-off-by: Paolo Abeni --- include/uapi/linux/mdio.h | 18 +++++++++++++++++- 1 file changed, 17 insertions(+), 1 deletion(-) (limited to 'include') diff --git a/include/uapi/linux/mdio.h b/include/uapi/linux/mdio.h index b826598d1e94..d03863da180e 100644 --- a/include/uapi/linux/mdio.h +++ b/include/uapi/linux/mdio.h @@ -82,6 +82,8 @@ #define MDIO_AN_10BT1_AN_CTRL 526 /* 10BASE-T1 AN control register */ #define MDIO_AN_10BT1_AN_STAT 527 /* 10BASE-T1 AN status register */ #define MDIO_PMA_PMD_BT1_CTRL 2100 /* BASE-T1 PMA/PMD control register */ +#define MDIO_PCS_1000BT1_CTRL 2304 /* 1000BASE-T1 PCS control register */ +#define MDIO_PCS_1000BT1_STAT 2305 /* 1000BASE-T1 PCS status register */ /* LASI (Link Alarm Status Interrupt) registers, defined by XENPAK MSA. */ #define MDIO_PMA_LASI_RXCTRL 0x9000 /* RX_ALARM control */ @@ -332,6 +334,8 @@ #define MDIO_PCS_10T1L_CTRL_RESET 0x8000 /* PCS reset */ /* BASE-T1 PMA/PMD extended ability register. 
*/ +#define MDIO_PMA_PMD_BT1_B100_ABLE 0x0001 /* 100BASE-T1 Ability */ +#define MDIO_PMA_PMD_BT1_B1000_ABLE 0x0002 /* 1000BASE-T1 Ability */ #define MDIO_PMA_PMD_BT1_B10L_ABLE 0x0004 /* 10BASE-T1L Ability */ /* BASE-T1 auto-negotiation advertisement register [15:0] */ @@ -373,7 +377,19 @@ #define MDIO_AN_10BT1_AN_STAT_LPA_EEE_T1L 0x4000 /* 10BASE-T1L LP EEE ability advertisement */ /* BASE-T1 PMA/PMD control register */ -#define MDIO_PMA_PMD_BT1_CTRL_CFG_MST 0x4000 /* MASTER-SLAVE config value */ +#define MDIO_PMA_PMD_BT1_CTRL_STRAP 0x000F /* Type selection (Strap) */ +#define MDIO_PMA_PMD_BT1_CTRL_STRAP_B1000 0x0001 /* Select 1000BASE-T1 */ +#define MDIO_PMA_PMD_BT1_CTRL_CFG_MST 0x4000 /* MASTER-SLAVE config value */ + +/* 1000BASE-T1 PCS control register */ +#define MDIO_PCS_1000BT1_CTRL_LOW_POWER 0x0800 /* Low power mode */ +#define MDIO_PCS_1000BT1_CTRL_DISABLE_TX 0x4000 /* Global PMA transmit disable */ +#define MDIO_PCS_1000BT1_CTRL_RESET 0x8000 /* Software reset value */ + +/* 1000BASE-T1 PCS status register */ +#define MDIO_PCS_1000BT1_STAT_LINK 0x0004 /* PCS Link is up */ +#define MDIO_PCS_1000BT1_STAT_FAULT 0x0080 /* There is a fault condition */ + /* EEE Supported/Advertisement/LP Advertisement registers. * -- cgit v1.2.3 From eba2e4c2faef618b6e33dfdba918c76727f891b5 Mon Sep 17 00:00:00 2001 From: Stefan Eichenberger Date: Wed, 19 Jul 2023 08:42:56 +0200 Subject: net: phy: c45: add a separate function to read BASE-T1 abilities Add a separate function to read the BASE-T1 abilities. Some PHYs do not indicate the availability of the extended BASE-T1 ability register, so this function must be called separately. 
Signed-off-by: Stefan Eichenberger Reviewed-by: Andrew Lunn Signed-off-by: Paolo Abeni --- include/linux/phy.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include') diff --git a/include/linux/phy.h b/include/linux/phy.h index 11c1e91563d4..b254848a9c99 100644 --- a/include/linux/phy.h +++ b/include/linux/phy.h @@ -1826,6 +1826,7 @@ int genphy_c45_an_config_aneg(struct phy_device *phydev); int genphy_c45_an_disable_aneg(struct phy_device *phydev); int genphy_c45_read_mdix(struct phy_device *phydev); int genphy_c45_pma_read_abilities(struct phy_device *phydev); +int genphy_c45_pma_baset1_read_abilities(struct phy_device *phydev); int genphy_c45_read_eee_abilities(struct phy_device *phydev); int genphy_c45_pma_baset1_read_master_slave(struct phy_device *phydev); int genphy_c45_read_status(struct phy_device *phydev); -- cgit v1.2.3 From 00f11ac71708d2e5e434aa2ef9249f95b5e7e313 Mon Sep 17 00:00:00 2001 From: Stefan Eichenberger Date: Wed, 19 Jul 2023 08:42:58 +0200 Subject: net: phy: marvell-88q2xxx: add driver for the Marvell 88Q2110 PHY Add a driver for the Marvell 88Q2110. This driver allows to detect the link, switch between 100BASE-T1 and 1000BASE-T1 and switch between master and slave mode. Autonegotiation supported by the PHY does not yet work. 
Signed-off-by: Stefan Eichenberger Reviewed-by: Andrew Lunn Signed-off-by: Paolo Abeni --- include/linux/marvell_phy.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include') diff --git a/include/linux/marvell_phy.h b/include/linux/marvell_phy.h index 0f06c2287b52..9b54c4f0677f 100644 --- a/include/linux/marvell_phy.h +++ b/include/linux/marvell_phy.h @@ -25,6 +25,7 @@ #define MARVELL_PHY_ID_88X3310 0x002b09a0 #define MARVELL_PHY_ID_88E2110 0x002b09b0 #define MARVELL_PHY_ID_88X2222 0x01410f10 +#define MARVELL_PHY_ID_88Q2110 0x002b0980 /* Marvel 88E1111 in Finisar SFP module with modified PHY ID */ #define MARVELL_PHY_ID_88E1111_FINISAR 0x01ff0cc0 -- cgit v1.2.3 From b44693495af8f309b8ddec4b30833085d1c2d0c4 Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Wed, 19 Jul 2023 06:47:54 +0000 Subject: tcp: add TCP_OLD_SEQUENCE drop reason tcp_sequence() uses two conditions to decide to drop a packet, and we currently report generic TCP_INVALID_SEQUENCE drop reason. Duplicates are common, we need to distinguish them from the other case. I chose to not reuse TCP_OLD_DATA, and instead added TCP_OLD_SEQUENCE drop reason. 
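The distinction between the two drop reasons can be sketched in userspace as follows. This assumes the two conditions are "the segment ends before the left edge of the acceptable window" (old or duplicate data) and "the segment starts beyond the right edge" (not acceptable); the window edges are passed in directly, and all `toy_` names are hypothetical rather than the kernel's.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the skb drop reasons discussed above. */
enum toy_drop_reason {
	TOY_NOT_DROPPED,
	TOY_TCP_OLD_SEQUENCE,     /* duplicate or already-seen data */
	TOY_TCP_INVALID_SEQUENCE, /* beyond the acceptable window */
};

/* Wraparound-safe "a is before b" comparison in 32-bit sequence space. */
static bool toy_before(uint32_t a, uint32_t b)
{
	return (int32_t)(a - b) < 0;
}

/* Classify a segment [seq, end_seq] against the window [win_left, win_right]. */
static enum toy_drop_reason
toy_tcp_sequence(uint32_t seq, uint32_t end_seq,
		 uint32_t win_left, uint32_t win_right)
{
	if (toy_before(end_seq, win_left))
		return TOY_TCP_OLD_SEQUENCE;
	if (toy_before(win_right, seq))
		return TOY_TCP_INVALID_SEQUENCE;
	return TOY_NOT_DROPPED;
}
```

Reporting the first case separately lets tooling filter out common duplicates and focus on the rarer out-of-window segments.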
Signed-off-by: Eric Dumazet Link: https://lore.kernel.org/r/20230719064754.2794106-1-edumazet@google.com Signed-off-by: Paolo Abeni --- include/net/dropreason-core.h | 3 +++ 1 file changed, 3 insertions(+) (limited to 'include') diff --git a/include/net/dropreason-core.h b/include/net/dropreason-core.h index a2b953b57689..f291a3b0f9e5 100644 --- a/include/net/dropreason-core.h +++ b/include/net/dropreason-core.h @@ -30,6 +30,7 @@ FN(TCP_OVERWINDOW) \ FN(TCP_OFOMERGE) \ FN(TCP_RFC7323_PAWS) \ + FN(TCP_OLD_SEQUENCE) \ FN(TCP_INVALID_SEQUENCE) \ FN(TCP_RESET) \ FN(TCP_INVALID_SYN) \ @@ -188,6 +189,8 @@ enum skb_drop_reason { * LINUX_MIB_PAWSESTABREJECTED */ SKB_DROP_REASON_TCP_RFC7323_PAWS, + /** @SKB_DROP_REASON_TCP_OLD_SEQUENCE: Old SEQ field (duplicate packet) */ + SKB_DROP_REASON_TCP_OLD_SEQUENCE, /** @SKB_DROP_REASON_TCP_INVALID_SEQUENCE: Not acceptable SEQ field */ SKB_DROP_REASON_TCP_INVALID_SEQUENCE, /** @SKB_DROP_REASON_TCP_RESET: Invalid RST packet */ -- cgit v1.2.3 From f2e2857b352277a451e2f91409e461fa7ebf2d15 Mon Sep 17 00:00:00 2001 From: Petr Machata Date: Wed, 19 Jul 2023 13:01:17 +0200 Subject: net: switchdev: Add a helper to replay objects on a bridge port When a front panel joins a bridge via another netdevice (typically a LAG), the driver needs to learn about the objects configured on the bridge port. When the bridge port is offloaded by the driver for the first time, this can be achieved by passing a notifier to switchdev_bridge_port_offload(). The notifier is then invoked for the individual objects (such as VLANs) configured on the bridge, and can look for the interesting ones. Calling switchdev_bridge_port_offload() when the second port joins the bridge lower is unnecessary, but the replay is still needed. To that end, add a new function, switchdev_bridge_port_replay(), which does only the replay part of the _offload() function in exactly the same way as that function. 
Cc: Jiri Pirko Cc: Ivan Vecera Cc: Roopa Prabhu Cc: Nikolay Aleksandrov Cc: bridge@lists.linux-foundation.org Signed-off-by: Petr Machata Reviewed-by: Danielle Ratson Signed-off-by: David S. Miller --- include/net/switchdev.h | 6 ++++++ 1 file changed, 6 insertions(+) (limited to 'include') diff --git a/include/net/switchdev.h b/include/net/switchdev.h index ca0312b78294..4d324e2a2eef 100644 --- a/include/net/switchdev.h +++ b/include/net/switchdev.h @@ -231,6 +231,7 @@ enum switchdev_notifier_type { SWITCHDEV_BRPORT_OFFLOADED, SWITCHDEV_BRPORT_UNOFFLOADED, + SWITCHDEV_BRPORT_REPLAY, }; struct switchdev_notifier_info { @@ -299,6 +300,11 @@ void switchdev_bridge_port_unoffload(struct net_device *brport_dev, const void *ctx, struct notifier_block *atomic_nb, struct notifier_block *blocking_nb); +int switchdev_bridge_port_replay(struct net_device *brport_dev, + struct net_device *dev, const void *ctx, + struct notifier_block *atomic_nb, + struct notifier_block *blocking_nb, + struct netlink_ext_ack *extack); void switchdev_deferred_process(void); int switchdev_port_attr_set(struct net_device *dev, -- cgit v1.2.3 From 9fe63d5f1da939855bdfaebfd9e95c96938b6411 Mon Sep 17 00:00:00 2001 From: Naveen Mamindlapalli Date: Wed, 19 Jul 2023 16:34:41 +0530 Subject: sch_htb: Allow HTB quantum parameter in offload mode The current implementation of HTB offload returns the EINVAL error for quantum parameter. This patch removes the error returning checks for 'quantum' parameter and populates its value to tc_htb_qopt_offload structure such that driver can use the same. Add quantum parameter check in mlx5 driver, as mlx5 devices are not capable of supporting the quantum parameter when htb offload is used. Report error if quantum parameter is set to a non-default value. Signed-off-by: Naveen Mamindlapalli Signed-off-by: Hariprasad Kelam Signed-off-by: David S. 
Miller --- include/net/pkt_cls.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include') diff --git a/include/net/pkt_cls.h b/include/net/pkt_cls.h index a2ea45c7b53e..139cd09828af 100644 --- a/include/net/pkt_cls.h +++ b/include/net/pkt_cls.h @@ -866,6 +866,7 @@ struct tc_htb_qopt_offload { u32 parent_classid; u16 classid; u16 qid; + u32 quantum; u64 rate; u64 ceil; u8 prio; -- cgit v1.2.3 From 535b9c61bdef6017228c708128b7849a476f8da5 Mon Sep 17 00:00:00 2001 From: Jakub Kicinski Date: Wed, 19 Jul 2023 18:04:08 -0700 Subject: net: page_pool: hide page_pool_release_page() There seems to be no user calling page_pool_release_page() for legit reasons, all the users simply haven't been converted to skb-based recycling, yet. Previous changes converted them. Update the docs, and unexport the function. Link: https://lore.kernel.org/r/20230720010409.1967072-4-kuba@kernel.org Reviewed-by: Alexander Lobakin Signed-off-by: Jakub Kicinski --- include/net/page_pool.h | 10 ++-------- 1 file changed, 2 insertions(+), 8 deletions(-) (limited to 'include') diff --git a/include/net/page_pool.h b/include/net/page_pool.h index 126f9e294389..f1d5cc1fa13b 100644 --- a/include/net/page_pool.h +++ b/include/net/page_pool.h @@ -18,9 +18,8 @@ * * API keeps track of in-flight pages, in-order to let API user know * when it is safe to dealloactor page_pool object. Thus, API users - * must make sure to call page_pool_release_page() when a page is - * "leaving" the page_pool. Or call page_pool_put_page() where - * appropiate. For maintaining correct accounting. + * must call page_pool_put_page() where appropriate and only attach + * the page to a page_pool-aware objects, like skbs marked for recycling. 
* * API user must only call page_pool_put_page() once on a page, as it * will either recycle the page, or in case of elevated refcnt, it @@ -251,7 +250,6 @@ void page_pool_unlink_napi(struct page_pool *pool); void page_pool_destroy(struct page_pool *pool); void page_pool_use_xdp_mem(struct page_pool *pool, void (*disconnect)(void *), struct xdp_mem_info *mem); -void page_pool_release_page(struct page_pool *pool, struct page *page); void page_pool_put_page_bulk(struct page_pool *pool, void **data, int count); #else @@ -268,10 +266,6 @@ static inline void page_pool_use_xdp_mem(struct page_pool *pool, struct xdp_mem_info *mem) { } -static inline void page_pool_release_page(struct page_pool *pool, - struct page *page) -{ -} static inline void page_pool_put_page_bulk(struct page_pool *pool, void **data, int count) -- cgit v1.2.3 From a3377386b56420d78a4c0a931a40f9a25c3ca2bd Mon Sep 17 00:00:00 2001 From: Anjali Kulkarni Date: Wed, 19 Jul 2023 13:18:16 -0700 Subject: netlink: Reverse the patch which removed filtering To use filtering at the connector & cn_proc layers, we need to enable filtering in the netlink layer. This reverses the patch which removed netlink filtering - commit ID for that patch: 549017aa1bb7 (netlink: remove netlink_broadcast_filtered). Signed-off-by: Anjali Kulkarni Reviewed-by: Liam R. Howlett Acked-by: Jakub Kicinski Signed-off-by: David S. 
Miller --- include/linux/netlink.h | 5 +++++ 1 file changed, 5 insertions(+) (limited to 'include') diff --git a/include/linux/netlink.h b/include/linux/netlink.h index 9eec3f4f5351..3a6563681b50 100644 --- a/include/linux/netlink.h +++ b/include/linux/netlink.h @@ -227,6 +227,11 @@ bool netlink_strict_get_check(struct sk_buff *skb); int netlink_unicast(struct sock *ssk, struct sk_buff *skb, __u32 portid, int nonblock); int netlink_broadcast(struct sock *ssk, struct sk_buff *skb, __u32 portid, __u32 group, gfp_t allocation); +int netlink_broadcast_filtered(struct sock *ssk, struct sk_buff *skb, + __u32 portid, __u32 group, gfp_t allocation, + int (*filter)(struct sock *dsk, + struct sk_buff *skb, void *data), + void *filter_data); int netlink_set_err(struct sock *ssk, __u32 portid, __u32 group, int code); int netlink_register_notifier(struct notifier_block *nb); int netlink_unregister_notifier(struct notifier_block *nb); -- cgit v1.2.3 From a4c9a56e6a2cdeeab7caef1f496b7bfefd95b50e Mon Sep 17 00:00:00 2001 From: Anjali Kulkarni Date: Wed, 19 Jul 2023 13:18:17 -0700 Subject: netlink: Add new netlink_release function A new function netlink_release is added in netlink_sock to store the protocol's release function. This is called when the socket is deleted. This can be supplied by the protocol via the release function in netlink_kernel_cfg. This is being added for the NETLINK_CONNECTOR protocol, so it can free its data when the socket is deleted. Signed-off-by: Anjali Kulkarni Reviewed-by: Liam R. Howlett Acked-by: Jakub Kicinski Signed-off-by: David S.
Miller --- include/linux/netlink.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include') diff --git a/include/linux/netlink.h b/include/linux/netlink.h index 3a6563681b50..75d7de34c908 100644 --- a/include/linux/netlink.h +++ b/include/linux/netlink.h @@ -50,6 +50,7 @@ struct netlink_kernel_cfg { struct mutex *cb_mutex; int (*bind)(struct net *net, int group); void (*unbind)(struct net *net, int group); + void (*release) (struct sock *sk, unsigned long *groups); }; struct sock *__netlink_kernel_create(struct net *net, int unit, -- cgit v1.2.3 From 2aa1f7a1f47ce8dac7593af605aaa859b3cf3bb1 Mon Sep 17 00:00:00 2001 From: Anjali Kulkarni Date: Wed, 19 Jul 2023 13:18:18 -0700 Subject: connector/cn_proc: Add filtering to fix some bugs The current proc connector code has the following bugs: if there is more than one listener for the proc connector messages, and one of them deregisters from listening using PROC_CN_MCAST_IGNORE, it will still get all proc connector messages as long as there is another listener. Another issue is that if one client calls PROC_CN_MCAST_LISTEN and another one calls PROC_CN_MCAST_IGNORE, then both will end up not getting any messages. This patch adds filtering and drops the packet if the client has sent PROC_CN_MCAST_IGNORE. This data is stored in the client socket's sk_user_data. In addition, we only increment or decrement proc_event_num_listeners once per client. This fixes the above issues. cn_release is the release function added for NETLINK_CONNECTOR. It uses the newly added netlink_release function added to netlink_sock. It will free sk_user_data. Signed-off-by: Anjali Kulkarni Reviewed-by: Liam R. Howlett Signed-off-by: David S.
Miller --- include/linux/connector.h | 8 +++++++- include/uapi/linux/cn_proc.h | 43 +++++++++++++++++++++++++------------------ 2 files changed, 32 insertions(+), 19 deletions(-) (limited to 'include') diff --git a/include/linux/connector.h b/include/linux/connector.h index 487350bb19c3..cec2d99ae902 100644 --- a/include/linux/connector.h +++ b/include/linux/connector.h @@ -90,13 +90,19 @@ void cn_del_callback(const struct cb_id *id); * If @group is not zero, then message will be delivered * to the specified group. * @gfp_mask: GFP mask. + * @filter: Filter function to be used at netlink layer. + * @filter_data:Filter data to be supplied to the filter function * * It can be safely called from softirq context, but may silently * fail under strong memory pressure. * * If there are no listeners for given group %-ESRCH can be returned. */ -int cn_netlink_send_mult(struct cn_msg *msg, u16 len, u32 portid, u32 group, gfp_t gfp_mask); +int cn_netlink_send_mult(struct cn_msg *msg, u16 len, u32 portid, + u32 group, gfp_t gfp_mask, + int (*filter)(struct sock *dsk, struct sk_buff *skb, + void *data), + void *filter_data); /** * cn_netlink_send - Sends message to the specified groups. 
diff --git a/include/uapi/linux/cn_proc.h b/include/uapi/linux/cn_proc.h index db210625cee8..6a06fb424313 100644 --- a/include/uapi/linux/cn_proc.h +++ b/include/uapi/linux/cn_proc.h @@ -30,6 +30,30 @@ enum proc_cn_mcast_op { PROC_CN_MCAST_IGNORE = 2 }; +enum proc_cn_event { + /* Use successive bits so the enums can be used to record + * sets of events as well + */ + PROC_EVENT_NONE = 0x00000000, + PROC_EVENT_FORK = 0x00000001, + PROC_EVENT_EXEC = 0x00000002, + PROC_EVENT_UID = 0x00000004, + PROC_EVENT_GID = 0x00000040, + PROC_EVENT_SID = 0x00000080, + PROC_EVENT_PTRACE = 0x00000100, + PROC_EVENT_COMM = 0x00000200, + /* "next" should be 0x00000400 */ + /* "last" is the last process event: exit, + * while "next to last" is coredumping event + */ + PROC_EVENT_COREDUMP = 0x40000000, + PROC_EVENT_EXIT = 0x80000000 +}; + +struct proc_input { + enum proc_cn_mcast_op mcast_op; +}; + /* * From the user's point of view, the process * ID is the thread group ID and thread ID is the internal @@ -44,24 +68,7 @@ enum proc_cn_mcast_op { */ struct proc_event { - enum what { - /* Use successive bits so the enums can be used to record - * sets of events as well - */ - PROC_EVENT_NONE = 0x00000000, - PROC_EVENT_FORK = 0x00000001, - PROC_EVENT_EXEC = 0x00000002, - PROC_EVENT_UID = 0x00000004, - PROC_EVENT_GID = 0x00000040, - PROC_EVENT_SID = 0x00000080, - PROC_EVENT_PTRACE = 0x00000100, - PROC_EVENT_COMM = 0x00000200, - /* "next" should be 0x00000400 */ - /* "last" is the last process event: exit, - * while "next to last" is coredumping event */ - PROC_EVENT_COREDUMP = 0x40000000, - PROC_EVENT_EXIT = 0x80000000 - } what; + enum proc_cn_event what; __u32 cpu; __u64 __attribute__((aligned(8))) timestamp_ns; /* Number of nano seconds since system boot */ -- cgit v1.2.3 From 743acf351bae1ff7ff4aaadd6a406d4d6091d90b Mon Sep 17 00:00:00 2001 From: Anjali Kulkarni Date: Wed, 19 Jul 2023 13:18:19 -0700 Subject: connector/cn_proc: Performance improvements This patch adds the capability to 
filter messages sent by the proc connector on the event type supplied in the message from the client to the connector. The client can register to listen for an event type given in struct proc_input. This event-based filtering will greatly enhance performance - handling 8K exits takes about 70ms, whereas 8K-forks + 8K-exits takes about 150ms & handling 8K-forks + 8K-exits + 8K-execs takes 200ms. There are currently 9 different types of events, and we need to listen to all of them. Also, measuring the time using pidfds for monitoring 8K process exits took much longer - 200ms, as compared to 70ms using only exit notifications of proc connector. We also add a new event type - PROC_EVENT_NONZERO_EXIT, which is only sent by the kernel to a listening application when an exiting process has a non-zero exit status. This will help clients like Oracle DB, where a monitoring process wants notifications for non-zero process exits so it can clean up after them. This kind of new event could also be useful to other applications like Google's lmkd daemon, which needs a killed process's exit notification. The patch takes care that existing clients using the old mechanism of not sending the event type work without any changes. The cn_filter function checks whether the event type being notified via the proc connector matches the event type requested by the client, before sending (matches) or dropping (does not match) a packet. Signed-off-by: Anjali Kulkarni Reviewed-by: Liam R. Howlett Signed-off-by: David S.
Miller --- include/uapi/linux/cn_proc.h | 19 +++++++++++++++++++ 1 file changed, 19 insertions(+) (limited to 'include') diff --git a/include/uapi/linux/cn_proc.h b/include/uapi/linux/cn_proc.h index 6a06fb424313..f2afb7cc4926 100644 --- a/include/uapi/linux/cn_proc.h +++ b/include/uapi/linux/cn_proc.h @@ -30,6 +30,15 @@ enum proc_cn_mcast_op { PROC_CN_MCAST_IGNORE = 2 }; +#define PROC_EVENT_ALL (PROC_EVENT_FORK | PROC_EVENT_EXEC | PROC_EVENT_UID | \ + PROC_EVENT_GID | PROC_EVENT_SID | PROC_EVENT_PTRACE | \ + PROC_EVENT_COMM | PROC_EVENT_NONZERO_EXIT | \ + PROC_EVENT_COREDUMP | PROC_EVENT_EXIT) + +/* + * If you add an entry in proc_cn_event, make sure you add it in + * PROC_EVENT_ALL above as well. + */ enum proc_cn_event { /* Use successive bits so the enums can be used to record * sets of events as well @@ -45,15 +54,25 @@ enum proc_cn_event { /* "next" should be 0x00000400 */ /* "last" is the last process event: exit, * while "next to last" is coredumping event + * before that is report only if process dies + * with non-zero exit status */ + PROC_EVENT_NONZERO_EXIT = 0x20000000, PROC_EVENT_COREDUMP = 0x40000000, PROC_EVENT_EXIT = 0x80000000 }; struct proc_input { enum proc_cn_mcast_op mcast_op; + enum proc_cn_event event_type; }; +static inline enum proc_cn_event valid_event(enum proc_cn_event ev_type) +{ + ev_type &= PROC_EVENT_ALL; + return ev_type; +} + /* * From the user's point of view, the process * ID is the thread group ID and thread ID is the internal -- cgit v1.2.3 From 1671bcfd76fdc0b9e65153cf759153083755fe4c Mon Sep 17 00:00:00 2001 From: Patrick Rohr Date: Wed, 19 Jul 2023 07:52:13 -0700 Subject: net: add sysctl accept_ra_min_rtr_lft MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit This change adds a new sysctl accept_ra_min_rtr_lft to specify the minimum acceptable router lifetime in an RA. If the received RA router lifetime is less than the configured value (and not 0), the RA is ignored. 
This is useful for mobile devices, whose battery life can be impacted by networks that configure RAs with a short lifetime. On such networks, the device should never gain IPv6 provisioning and should attempt to drop RAs via hardware offload, if available. Signed-off-by: Patrick Rohr Cc: Maciej Żenczykowski Cc: Lorenzo Colitti Signed-off-by: David S. Miller --- include/linux/ipv6.h | 1 + include/uapi/linux/ipv6.h | 1 + 2 files changed, 2 insertions(+) (limited to 'include') diff --git a/include/linux/ipv6.h b/include/linux/ipv6.h index 839247a4f48e..ed3d110c2eb5 100644 --- a/include/linux/ipv6.h +++ b/include/linux/ipv6.h @@ -33,6 +33,7 @@ struct ipv6_devconf { __s32 accept_ra_defrtr; __u32 ra_defrtr_metric; __s32 accept_ra_min_hop_limit; + __s32 accept_ra_min_rtr_lft; __s32 accept_ra_pinfo; __s32 ignore_routes_with_linkdown; #ifdef CONFIG_IPV6_ROUTER_PREF diff --git a/include/uapi/linux/ipv6.h b/include/uapi/linux/ipv6.h index ac56605fe9bc..8b6bcbf6ed4a 100644 --- a/include/uapi/linux/ipv6.h +++ b/include/uapi/linux/ipv6.h @@ -198,6 +198,7 @@ enum { DEVCONF_IOAM6_ID_WIDE, DEVCONF_NDISC_EVICT_NOCARRIER, DEVCONF_ACCEPT_UNTRACKED_NA, + DEVCONF_ACCEPT_RA_MIN_RTR_LFT, DEVCONF_MAX }; -- cgit v1.2.3 From f5f80e32de12fad2813d37270e8364a03e6d3ef0 Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Thu, 20 Jul 2023 11:09:01 +0000 Subject: ipv6: remove hard coded limitation on ipv6_pinfo IPv6 inet sockets are supposed to have a "struct ipv6_pinfo" field at the end of their definition, so that inet6_sk_generic() can derive from socket size the offset of the "struct ipv6_pinfo". This is very fragile, and prevents adding bigger alignment in sockets, because inet6_sk_generic() does not work if the compiler adds padding after the ipv6_pinfo component. 
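The fragility described above can be illustrated with a small user-space sketch. Everything here is hypothetical (struct demo_pinfo, struct demo_sock and both helpers are made-up stand-ins, not kernel code); it only assumes that the new per-protocol offset field is presumably filled from an offsetof() of the inet6 member. Once a member with larger alignment follows the pinfo area, the size-based derivation points at the wrong bytes while the recorded offset still works:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct ipv6_pinfo. */
struct demo_pinfo {
	uint32_t flow_label;
	uint16_t hop_limit;
};

/* Hypothetical stand-in for a protocol socket such as tcp6_sock. */
struct demo_sock {
	uint64_t base;                    /* stands in for the embedded base socket */
	struct demo_pinfo inet6;          /* no longer the last bytes of the object */
	_Alignas(64) uint8_t hot_data[8]; /* over-aligned member forces padding */
};

/* Old scheme (assumed shape): pinfo is taken to be the tail of the object. */
static inline void *pinfo_from_size(void *sk, size_t obj_size)
{
	return (uint8_t *)sk + obj_size - sizeof(struct demo_pinfo);
}

/* New scheme (assumed shape): use an offset recorded once per protocol. */
static inline void *pinfo_from_offset(void *sk, size_t pinfo_offset)
{
	return (uint8_t *)sk + pinfo_offset;
}
```

With these definitions, pinfo_from_offset(&s, offsetof(struct demo_sock, inet6)) yields &s.inet6 for any layout, whereas pinfo_from_size(&s, sizeof(s)) does not once padding or trailing members exist.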
We are currently working on a patch series to reorganize TCP structures for better data locality and found issues similar to the one fixed in commit f5d547676ca0 ("tcp: fix tcp_inet6_sk() for 32bit kernels") Alternative would be to force an alignment on "struct ipv6_pinfo", greater or equal to __alignof__(any ipv6 sock) to ensure there is no padding. This does not look great. v2: fix typo in mptcp_proto_v6_init() (Paolo) Signed-off-by: Eric Dumazet Cc: Chao Wu Cc: Wei Wang Cc: Coco Li Cc: YiFei Zhu Reviewed-by: Simon Horman Signed-off-by: David S. Miller --- include/linux/ipv6.h | 15 ++++----------- include/net/sock.h | 1 + 2 files changed, 5 insertions(+), 11 deletions(-) (limited to 'include') diff --git a/include/linux/ipv6.h b/include/linux/ipv6.h index ed3d110c2eb5..0295b47c10a3 100644 --- a/include/linux/ipv6.h +++ b/include/linux/ipv6.h @@ -200,14 +200,7 @@ struct inet6_cork { u8 tclass; }; -/** - * struct ipv6_pinfo - ipv6 private area - * - * In the struct sock hierarchy (tcp6_sock, upd6_sock, etc) - * this _must_ be the last member, so that inet6_sk_generic - * is able to calculate its offset from the base struct sock - * by using the struct proto->slab_obj_size member. 
-acme - */ +/* struct ipv6_pinfo - ipv6 private area */ struct ipv6_pinfo { struct in6_addr saddr; struct in6_pktinfo sticky_pktinfo; @@ -307,19 +300,19 @@ struct raw6_sock { __u32 offset; /* checksum offset */ struct icmp6_filter filter; __u32 ip6mr_table; - /* ipv6_pinfo has to be the last member of raw6_sock, see inet6_sk_generic */ + struct ipv6_pinfo inet6; }; struct udp6_sock { struct udp_sock udp; - /* ipv6_pinfo has to be the last member of udp6_sock, see inet6_sk_generic */ + struct ipv6_pinfo inet6; }; struct tcp6_sock { struct tcp_sock tcp; - /* ipv6_pinfo has to be the last member of tcp6_sock, see inet6_sk_generic */ + struct ipv6_pinfo inet6; }; diff --git a/include/net/sock.h b/include/net/sock.h index 2eb916d1ff64..7ae44bf866af 100644 --- a/include/net/sock.h +++ b/include/net/sock.h @@ -1339,6 +1339,7 @@ struct proto { struct kmem_cache *slab; unsigned int obj_size; + unsigned int ipv6_pinfo_offset; slab_flags_t slab_flags; unsigned int useroffset; /* Usercopy region offset */ unsigned int usersize; /* Usercopy region size */ -- cgit v1.2.3 From a097627dcaddb90f55c1780da348bf9b56f6b4bd Mon Sep 17 00:00:00 2001 From: Maciej Fijalkowski Date: Fri, 21 Jul 2023 16:58:08 +0200 Subject: net: add missing net_device::xdp_zc_max_segs description Cited commit under 'Fixes' tag introduced new member to struct net_device without providing description of it - fix it. 
Reported-by: Stephen Rothwell Closes: https://lore.kernel.org/all/20230720141613.61488b9e@canb.auug.org.au/ Fixes: 13ce2daa259a ("xsk: add new netlink attribute dedicated for ZC max frags") Signed-off-by: Maciej Fijalkowski Reviewed-by: Simon Horman Tested-by: Simon Horman # build-tested Link: https://lore.kernel.org/r/20230721145808.596298-1-maciej.fijalkowski@intel.com Signed-off-by: Jakub Kicinski --- include/linux/netdevice.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include') diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index 3800d0479698..11652e464f5d 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -2043,6 +2043,8 @@ enum netdev_ml_priv_type { * receive offload (GRO) * @gro_ipv4_max_size: Maximum size of aggregated packet in generic * receive offload (GRO), for IPv4. + * @xdp_zc_max_segs: Maximum number of segments supported by AF_XDP + * zero copy driver * * @dev_addr_shadow: Copy of @dev_addr to catch direct writes. * @linkwatch_dev_tracker: refcount tracker used by linkwatch. -- cgit v1.2.3 From b8dc6d6ce93142ccd4c976003bb6c25d63aac2ce Mon Sep 17 00:00:00 2001 From: Paolo Abeni Date: Thu, 20 Jul 2023 20:47:50 +0200 Subject: mptcp: fix rcv buffer auto-tuning The MPTCP code uses the assumption that the tcp_win_from_space() helper does not use any TCP-specific field, and thus works correctly operating on an MPTCP socket. The commit dfa2f0483360 ("tcp: get rid of sysctl_tcp_adv_win_scale") broke that assumption, and as a consequence most MPTCP connections stall on zero-window events due to auto-tuning changing the rcv buffer size quite randomly. Address the issue by syncing the MPTCP auto-tuning code with the TCP one again. To achieve that, factor out the window size logic into socket-independent helpers, and reuse them in mptcp_rcv_space_adjust(). The MPTCP-level scaling_ratio is selected as the minimum one from all the subflows, as a worst-case estimate.
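As a rough user-space illustration of the socket-independent helpers and the worst-case ratio selection described above, consider the following sketch. The demo_* names are hypothetical, and the shift value of 8 is an assumption made for the example; the real constant lives in include/net/tcp.h.

```c
#include <stdint.h>

/* Assumed value for illustration; the kernel defines its own constant. */
#define DEMO_RMEM_TO_WIN_SCALE 8

/* Window advertised for a given amount of buffer space (space >= 0). */
static inline int demo_win_from_space(uint8_t scaling_ratio, int space)
{
	int64_t scaled_space = (int64_t)space * scaling_ratio;

	return (int)(scaled_space >> DEMO_RMEM_TO_WIN_SCALE);
}

/* Inverse mapping; the caller must pass a non-zero ratio and win >= 0. */
static inline int demo_space_from_win(uint8_t scaling_ratio, int win)
{
	uint64_t val = (uint64_t)win << DEMO_RMEM_TO_WIN_SCALE;

	return (int)(val / scaling_ratio);
}

/* Worst-case estimate: the smallest ratio among all subflows. */
static inline uint8_t demo_min_ratio(const uint8_t *ratios, int n)
{
	uint8_t min = UINT8_MAX;

	for (int i = 0; i < n; i++)
		if (ratios[i] < min)
			min = ratios[i];
	return min;
}
```

Because the helpers take the ratio as a plain argument, a connection-level caller can feed them demo_min_ratio() of its subflows instead of reaching into a TCP-specific socket field.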
Fixes: dfa2f0483360 ("tcp: get rid of sysctl_tcp_adv_win_scale") Signed-off-by: Paolo Abeni Co-developed-by: Matthieu Baerts Signed-off-by: Matthieu Baerts Reviewed-by: Eric Dumazet Acked-by: Soheil Hassas Yeganeh Link: https://lore.kernel.org/r/20230720-upstream-net-next-20230720-mptcp-fix-rcv-buffer-auto-tuning-v1-1-175ef12b8380@tessares.net Signed-off-by: Jakub Kicinski --- include/net/tcp.h | 20 +++++++++++++++----- 1 file changed, 15 insertions(+), 5 deletions(-) (limited to 'include') diff --git a/include/net/tcp.h b/include/net/tcp.h index d17cb8ab4c48..6ebf54992ffe 100644 --- a/include/net/tcp.h +++ b/include/net/tcp.h @@ -1430,22 +1430,32 @@ void tcp_select_initial_window(const struct sock *sk, int __space, __u32 *window_clamp, int wscale_ok, __u8 *rcv_wscale, __u32 init_rcv_wnd); -static inline int tcp_win_from_space(const struct sock *sk, int space) +static inline int __tcp_win_from_space(u8 scaling_ratio, int space) { - s64 scaled_space = (s64)space * tcp_sk(sk)->scaling_ratio; + s64 scaled_space = (s64)space * scaling_ratio; return scaled_space >> TCP_RMEM_TO_WIN_SCALE; } -/* inverse of tcp_win_from_space() */ -static inline int tcp_space_from_win(const struct sock *sk, int win) +static inline int tcp_win_from_space(const struct sock *sk, int space) +{ + return __tcp_win_from_space(tcp_sk(sk)->scaling_ratio, space); +} + +/* inverse of __tcp_win_from_space() */ +static inline int __tcp_space_from_win(u8 scaling_ratio, int win) { u64 val = (u64)win << TCP_RMEM_TO_WIN_SCALE; - do_div(val, tcp_sk(sk)->scaling_ratio); + do_div(val, scaling_ratio); return val; } +static inline int tcp_space_from_win(const struct sock *sk, int win) +{ + return __tcp_space_from_win(tcp_sk(sk)->scaling_ratio, win); +} + static inline void tcp_scaling_ratio_init(struct sock *sk) { /* Assume a conservative default of 1200 bytes of payload per 4K page. 
-- cgit v1.2.3 From 4d72c3bb60dd9d5ea180f157bac72b4458112282 Mon Sep 17 00:00:00 2001 From: "Russell King (Oracle)" Date: Sat, 22 Jul 2023 21:32:59 +0100 Subject: net: phylink: strip out pre-March 2020 legacy code Strip out all the pre-March 2020 legacy code from phylink now that the last user of it is gone. Reviewed-by: Daniel Golle Tested-by: Daniel Golle Tested-by: Frank Wunderlich Signed-off-by: Russell King (Oracle) Signed-off-by: Paolo Abeni --- include/linux/phylink.h | 45 ++++++--------------------------------------- 1 file changed, 6 insertions(+), 39 deletions(-) (limited to 'include') diff --git a/include/linux/phylink.h b/include/linux/phylink.h index 9e861c8316d0..789c516c6b4a 100644 --- a/include/linux/phylink.h +++ b/include/linux/phylink.h @@ -201,8 +201,6 @@ enum phylink_op_type { * struct phylink_config - PHYLINK configuration structure * @dev: a pointer to a struct device associated with the MAC * @type: operation type of PHYLINK instance - * @legacy_pre_march2020: driver has not been updated for March 2020 updates - * (See commit 7cceb599d15d ("net: phylink: avoid mac_config calls") * @poll_fixed_state: if true, starts link_poll, * if MAC link is at %MLO_AN_FIXED mode. * @mac_managed_pm: if true, indicate the MAC driver is responsible for PHY PM. @@ -216,7 +214,6 @@ enum phylink_op_type { struct phylink_config { struct device *dev; enum phylink_op_type type; - bool legacy_pre_march2020; bool poll_fixed_state; bool mac_managed_pm; bool ovr_an_inband; @@ -230,7 +227,6 @@ struct phylink_config { * struct phylink_mac_ops - MAC operations structure. * @validate: Validate and update the link configuration. * @mac_select_pcs: Select a PCS for the interface mode. - * @mac_pcs_get_state: Read the current link state from the hardware. * @mac_prepare: prepare for a major reconfiguration of the interface. * @mac_config: configure the MAC for the selected mode and state. * @mac_finish: finish a major reconfiguration of the interface. 
@@ -245,8 +241,6 @@ struct phylink_mac_ops { struct phylink_link_state *state); struct phylink_pcs *(*mac_select_pcs)(struct phylink_config *config, phy_interface_t interface); - void (*mac_pcs_get_state)(struct phylink_config *config, - struct phylink_link_state *state); int (*mac_prepare)(struct phylink_config *config, unsigned int mode, phy_interface_t iface); void (*mac_config)(struct phylink_config *config, unsigned int mode, @@ -312,25 +306,6 @@ void validate(struct phylink_config *config, unsigned long *supported, struct phylink_pcs *mac_select_pcs(struct phylink_config *config, phy_interface_t interface); -/** - * mac_pcs_get_state() - Read the current inband link state from the hardware - * @config: a pointer to a &struct phylink_config. - * @state: a pointer to a &struct phylink_link_state. - * - * Read the current inband link state from the MAC PCS, reporting the - * current speed in @state->speed, duplex mode in @state->duplex, pause - * mode in @state->pause using the %MLO_PAUSE_RX and %MLO_PAUSE_TX bits, - * negotiation completion state in @state->an_complete, and link up state - * in @state->link. If possible, @state->lp_advertising should also be - * populated. - * - * Note: This is a legacy method. This function will not be called unless - * legacy_pre_march2020 is set in &struct phylink_config and there is no - * PCS attached. - */ -void mac_pcs_get_state(struct phylink_config *config, - struct phylink_link_state *state); - /** * mac_prepare() - prepare to change the PHY interface mode * @config: a pointer to a &struct phylink_config. @@ -367,17 +342,9 @@ int mac_prepare(struct phylink_config *config, unsigned int mode, * guaranteed to be correct, and so any mac_config() implementation must * never reference these fields. 
* - * Note: For legacy March 2020 drivers (drivers with legacy_pre_march2020 set - * in their &phylnk_config and which don't have a PCS), this function will be - * called on each link up event, and to also change the in-band advert. For - * non-legacy drivers, it will only be called to reconfigure the MAC for a - * "major" change in e.g. interface mode. It will not be called for changes - * in speed, duplex or pause modes or to change the in-band advertisement. - * In any case, it is strongly preferred that speed, duplex and pause settings - * are handled in the mac_link_up() method and not in this method. - * - * (this requires a rewrite - please refer to mac_link_up() for situations - * where the PCS and MAC are not tightly integrated.) + * This will only be called to reconfigure the MAC for a "major" change in + * e.g. interface mode. It will not be called for changes in speed, duplex + * or pause modes or to change the in-band advertisement. * * In all negotiation modes, as defined by @mode, @state->pause indicates the * pause settings which should be applied as follows. If %MLO_PAUSE_AN is not @@ -409,7 +376,7 @@ int mac_prepare(struct phylink_config *config, unsigned int mode, * 1000base-X or Cisco SGMII mode depending on the @state->interface * mode). In both cases, link state management (whether the link * is up or not) is performed by the MAC, and reported via the - * mac_pcs_get_state() callback. Changes in link state must be made + * pcs_get_state() callback. Changes in link state must be made * by calling phylink_mac_change(). * * Interface mode specific details are mentioned below. @@ -601,8 +568,8 @@ void pcs_disable(struct phylink_pcs *pcs); * in @state->link. If possible, @state->lp_advertising should also be * populated. * - * When present, this overrides mac_pcs_get_state() in &struct - * phylink_mac_ops. + * When present, this overrides pcs_get_state() in &struct + * phylink_pcs_ops. 
*/ void pcs_get_state(struct phylink_pcs *pcs, struct phylink_link_state *state); -- cgit v1.2.3 From 57266281271add0132bea0b83ef0d6f32704c402 Mon Sep 17 00:00:00 2001 From: Leon Romanovsky Date: Wed, 19 Jul 2023 12:26:53 +0300 Subject: net/mlx5: Add relevant capabilities bits to support NAT-T Provide an ability to check if flow steering supports UDP encapsulation and decapsulation of IPsec ESP packets. Signed-off-by: Leon Romanovsky Signed-off-by: Paolo Abeni --- include/linux/mlx5/mlx5_ifc.h | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-) (limited to 'include') diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h index 33344a71c3e3..b3ad6b9852ec 100644 --- a/include/linux/mlx5/mlx5_ifc.h +++ b/include/linux/mlx5/mlx5_ifc.h @@ -464,10 +464,10 @@ struct mlx5_ifc_flow_table_prop_layout_bits { u8 reformat_add_esp_trasport[0x1]; u8 reformat_l2_to_l3_esp_tunnel[0x1]; - u8 reserved_at_42[0x1]; + u8 reformat_add_esp_transport_over_udp[0x1]; u8 reformat_del_esp_trasport[0x1]; u8 reformat_l3_esp_tunnel_to_l2[0x1]; - u8 reserved_at_45[0x1]; + u8 reformat_del_esp_transport_over_udp[0x1]; u8 execute_aso[0x1]; u8 reserved_at_47[0x19]; @@ -6665,9 +6665,12 @@ enum mlx5_reformat_ctx_type { MLX5_REFORMAT_TYPE_L2_TO_L3_TUNNEL = 0x4, MLX5_REFORMAT_TYPE_ADD_ESP_TRANSPORT_OVER_IPV4 = 0x5, MLX5_REFORMAT_TYPE_L2_TO_L3_ESP_TUNNEL = 0x6, + MLX5_REFORMAT_TYPE_ADD_ESP_TRANSPORT_OVER_UDPV4 = 0x7, MLX5_REFORMAT_TYPE_DEL_ESP_TRANSPORT = 0x8, MLX5_REFORMAT_TYPE_L3_ESP_TUNNEL_TO_L2 = 0x9, + MLX5_REFORMAT_TYPE_DEL_ESP_TRANSPORT_OVER_UDP = 0xa, MLX5_REFORMAT_TYPE_ADD_ESP_TRANSPORT_OVER_IPV6 = 0xb, + MLX5_REFORMAT_TYPE_ADD_ESP_TRANSPORT_OVER_UDPV6 = 0xc, MLX5_REFORMAT_TYPE_INSERT_HDR = 0xf, MLX5_REFORMAT_TYPE_REMOVE_HDR = 0x10, MLX5_REFORMAT_TYPE_ADD_MACSEC = 0x11, -- cgit v1.2.3 From ce796e60b3b196b61fcc565df195443cbb846ef0 Mon Sep 17 00:00:00 2001 From: Lorenz Bauer Date: Thu, 20 Jul 2023 17:30:07 +0200 Subject: net: export inet_lookup_reuseport and 
inet6_lookup_reuseport Rename the existing reuseport helpers for IPv4 and IPv6 so that they can be invoked in the follow up commit. Export them so that building DCCP and IPv6 as a module works. No change in functionality. Reviewed-by: Kuniyuki Iwashima Signed-off-by: Lorenz Bauer Link: https://lore.kernel.org/r/20230720-so-reuseport-v6-3-7021b683cdae@isovalent.com Signed-off-by: Martin KaFai Lau --- include/net/inet6_hashtables.h | 7 +++++++ include/net/inet_hashtables.h | 5 +++++ 2 files changed, 12 insertions(+) (limited to 'include') diff --git a/include/net/inet6_hashtables.h b/include/net/inet6_hashtables.h index 56f1286583d3..032ddab48f8f 100644 --- a/include/net/inet6_hashtables.h +++ b/include/net/inet6_hashtables.h @@ -48,6 +48,13 @@ struct sock *__inet6_lookup_established(struct net *net, const u16 hnum, const int dif, const int sdif); +struct sock *inet6_lookup_reuseport(struct net *net, struct sock *sk, + struct sk_buff *skb, int doff, + const struct in6_addr *saddr, + __be16 sport, + const struct in6_addr *daddr, + unsigned short hnum); + struct sock *inet6_lookup_listener(struct net *net, struct inet_hashinfo *hashinfo, struct sk_buff *skb, int doff, diff --git a/include/net/inet_hashtables.h b/include/net/inet_hashtables.h index 99bd823e97f6..8734f3488f5d 100644 --- a/include/net/inet_hashtables.h +++ b/include/net/inet_hashtables.h @@ -379,6 +379,11 @@ struct sock *__inet_lookup_established(struct net *net, const __be32 daddr, const u16 hnum, const int dif, const int sdif); +struct sock *inet_lookup_reuseport(struct net *net, struct sock *sk, + struct sk_buff *skb, int doff, + __be32 saddr, __be16 sport, + __be32 daddr, unsigned short hnum); + static inline struct sock * inet_lookup_established(struct net *net, struct inet_hashinfo *hashinfo, const __be32 saddr, const __be16 sport, -- cgit v1.2.3 From 0f495f7617229772403e683033abc473f0f0553c Mon Sep 17 00:00:00 2001 From: Lorenz Bauer Date: Thu, 20 Jul 2023 17:30:08 +0200 Subject: net: remove 
duplicate reuseport_lookup functions There are currently four copies of reuseport_lookup: one each for (TCP, UDP)x(IPv4, IPv6). This forces us to duplicate all callers of those functions as well. This is already the case for sk_lookup helpers (inet,inet6,udp4,udp6)_lookup_run_bpf. There are two differences between the reuseport_lookup helpers: 1. They call different hash functions depending on protocol 2. UDP reuseport_lookup checks that sk_state != TCP_ESTABLISHED Move the check for sk_state into the caller and use the INDIRECT_CALL infrastructure to cut down the helpers to one per IP version. Reviewed-by: Kuniyuki Iwashima Signed-off-by: Lorenz Bauer Link: https://lore.kernel.org/r/20230720-so-reuseport-v6-4-7021b683cdae@isovalent.com Signed-off-by: Martin KaFai Lau --- include/net/inet6_hashtables.h | 11 ++++++++++- include/net/inet_hashtables.h | 15 ++++++++++----- 2 files changed, 20 insertions(+), 6 deletions(-) (limited to 'include') diff --git a/include/net/inet6_hashtables.h b/include/net/inet6_hashtables.h index 032ddab48f8f..f89320b6fee3 100644 --- a/include/net/inet6_hashtables.h +++ b/include/net/inet6_hashtables.h @@ -48,12 +48,21 @@ struct sock *__inet6_lookup_established(struct net *net, const u16 hnum, const int dif, const int sdif); +typedef u32 (inet6_ehashfn_t)(const struct net *net, + const struct in6_addr *laddr, const u16 lport, + const struct in6_addr *faddr, const __be16 fport); + +inet6_ehashfn_t inet6_ehashfn; + +INDIRECT_CALLABLE_DECLARE(inet6_ehashfn_t udp6_ehashfn); + struct sock *inet6_lookup_reuseport(struct net *net, struct sock *sk, struct sk_buff *skb, int doff, const struct in6_addr *saddr, __be16 sport, const struct in6_addr *daddr, - unsigned short hnum); + unsigned short hnum, + inet6_ehashfn_t *ehashfn); struct sock *inet6_lookup_listener(struct net *net, struct inet_hashinfo *hashinfo, diff --git a/include/net/inet_hashtables.h b/include/net/inet_hashtables.h index 8734f3488f5d..ddfa2e67fdb5 100644 --- 
a/include/net/inet_hashtables.h +++ b/include/net/inet_hashtables.h @@ -379,10 +379,19 @@ struct sock *__inet_lookup_established(struct net *net, const __be32 daddr, const u16 hnum, const int dif, const int sdif); +typedef u32 (inet_ehashfn_t)(const struct net *net, + const __be32 laddr, const __u16 lport, + const __be32 faddr, const __be16 fport); + +inet_ehashfn_t inet_ehashfn; + +INDIRECT_CALLABLE_DECLARE(inet_ehashfn_t udp_ehashfn); + struct sock *inet_lookup_reuseport(struct net *net, struct sock *sk, struct sk_buff *skb, int doff, __be32 saddr, __be16 sport, - __be32 daddr, unsigned short hnum); + __be32 daddr, unsigned short hnum, + inet_ehashfn_t *ehashfn); static inline struct sock * inet_lookup_established(struct net *net, struct inet_hashinfo *hashinfo, @@ -453,10 +462,6 @@ static inline struct sock *__inet_lookup_skb(struct inet_hashinfo *hashinfo, refcounted); } -u32 inet6_ehashfn(const struct net *net, - const struct in6_addr *laddr, const u16 lport, - const struct in6_addr *faddr, const __be16 fport); - static inline void sk_daddr_set(struct sock *sk, __be32 addr) { sk->sk_daddr = addr; /* alias of inet_daddr */ -- cgit v1.2.3 From 6c886db2e78ce1dee163d07240467770a235f33e Mon Sep 17 00:00:00 2001 From: Lorenz Bauer Date: Thu, 20 Jul 2023 17:30:10 +0200 Subject: net: remove duplicate sk_lookup helpers Now that inet[6]_lookup_reuseport are parameterised on the ehashfn we can remove two sk_lookup helpers. 
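The idea of parameterising a single helper on the hash function can be sketched in user space as follows. This is only an illustration under stated assumptions: demo_ehashfn_t mimics the style of the inet_ehashfn_t typedef, the two hash functions are made up, and the modulo-based selection is a simplification rather than what the kernel does.

```c
#include <stdint.h>

/* Function type in the style of the inet_ehashfn_t typedef (hypothetical). */
typedef uint32_t (demo_ehashfn_t)(uint32_t laddr, uint16_t lport,
				  uint32_t faddr, uint16_t fport);

/* Made-up hash standing in for the TCP variant. */
static uint32_t demo_tcp_hash(uint32_t laddr, uint16_t lport,
			      uint32_t faddr, uint16_t fport)
{
	return (laddr ^ faddr) * 2654435761u + (((uint32_t)lport << 16) | fport);
}

/* Made-up hash standing in for the UDP variant. */
static uint32_t demo_udp_hash(uint32_t laddr, uint16_t lport,
			      uint32_t faddr, uint16_t fport)
{
	return (laddr + faddr) ^ (((uint32_t)fport << 16) | lport);
}

/* One helper for both protocols: the caller supplies the hash function.
 * n_socks must be non-zero; modulo selection is a simplification.
 */
static unsigned int demo_pick_reuseport(unsigned int n_socks,
					uint32_t laddr, uint16_t lport,
					uint32_t faddr, uint16_t fport,
					demo_ehashfn_t *ehashfn)
{
	return ehashfn(laddr, lport, faddr, fport) % n_socks;
}
```

A TCP-style caller would pass demo_tcp_hash and a UDP-style caller demo_udp_hash, so no per-protocol copy of the selection helper is needed.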
Reviewed-by: Kuniyuki Iwashima Signed-off-by: Lorenz Bauer Link: https://lore.kernel.org/r/20230720-so-reuseport-v6-6-7021b683cdae@isovalent.com Signed-off-by: Martin KaFai Lau --- include/net/inet6_hashtables.h | 9 +++++++++ include/net/inet_hashtables.h | 7 +++++++ 2 files changed, 16 insertions(+) (limited to 'include') diff --git a/include/net/inet6_hashtables.h b/include/net/inet6_hashtables.h index f89320b6fee3..a6722d6ef80f 100644 --- a/include/net/inet6_hashtables.h +++ b/include/net/inet6_hashtables.h @@ -73,6 +73,15 @@ struct sock *inet6_lookup_listener(struct net *net, const unsigned short hnum, const int dif, const int sdif); +struct sock *inet6_lookup_run_sk_lookup(struct net *net, + int protocol, + struct sk_buff *skb, int doff, + const struct in6_addr *saddr, + const __be16 sport, + const struct in6_addr *daddr, + const u16 hnum, const int dif, + inet6_ehashfn_t *ehashfn); + static inline struct sock *__inet6_lookup(struct net *net, struct inet_hashinfo *hashinfo, struct sk_buff *skb, int doff, diff --git a/include/net/inet_hashtables.h b/include/net/inet_hashtables.h index ddfa2e67fdb5..c0532cc7587f 100644 --- a/include/net/inet_hashtables.h +++ b/include/net/inet_hashtables.h @@ -393,6 +393,13 @@ struct sock *inet_lookup_reuseport(struct net *net, struct sock *sk, __be32 daddr, unsigned short hnum, inet_ehashfn_t *ehashfn); +struct sock *inet_lookup_run_sk_lookup(struct net *net, + int protocol, + struct sk_buff *skb, int doff, + __be32 saddr, __be16 sport, + __be32 daddr, u16 hnum, const int dif, + inet_ehashfn_t *ehashfn); + static inline struct sock * inet_lookup_established(struct net *net, struct inet_hashinfo *hashinfo, const __be32 saddr, const __be16 sport, -- cgit v1.2.3 From 9c02bec95954252c3c01bfbb3f7560e0b95ca955 Mon Sep 17 00:00:00 2001 From: Lorenz Bauer Date: Thu, 20 Jul 2023 17:30:11 +0200 Subject: bpf, net: Support SO_REUSEPORT sockets with bpf_sk_assign Currently the bpf_sk_assign helper in tc BPF context refuses SO_REUSEPORT 
sockets. This means we can't use the helper to steer traffic to Envoy, which configures SO_REUSEPORT on its sockets. In turn, we're blocked from removing TPROXY from our setup. The reason that bpf_sk_assign refuses such sockets is that the bpf_sk_lookup helpers don't execute SK_REUSEPORT programs. Instead, one of the reuseport sockets is selected by hash. This could cause dispatch to the "wrong" socket:
    sk = bpf_sk_lookup_tcp(...) // select SO_REUSEPORT by hash
    bpf_sk_assign(skb, sk) // SK_REUSEPORT wasn't executed
Fixing this isn't as simple as invoking SK_REUSEPORT from the lookup helpers, unfortunately. In the tc context, L2 headers are at the start of the skb, while SK_REUSEPORT expects L3 headers instead. Instead, we execute the SK_REUSEPORT program when the assigned socket is pulled out of the skb, further up the stack. This creates some trickiness with regard to refcounting as bpf_sk_assign will put both refcounted and RCU freed sockets in skb->sk. reuseport sockets are RCU freed. We can infer that the sk_assigned socket is RCU freed if the reuseport lookup succeeds, but convincing yourself of this fact isn't straightforward. Therefore we defensively check refcounting on the sk_assign sock even though it's probably not required in practice.
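The decision of when the reuseport selection is re-run on a socket pulled out of the skb can be modelled with a small user-space sketch. The enum values, struct demo_sk and the helper name are hypothetical; the conditions follow the checks visible in the inet_steal_sock() and inet6_steal_sock() helpers added by this patch (only BPF-assigned TCP listeners and UDP sockets in the closed state qualify).

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the protocol and socket state values. */
enum demo_proto { DEMO_TCP, DEMO_UDP, DEMO_OTHER };
enum demo_state { DEMO_ESTABLISHED, DEMO_LISTEN, DEMO_CLOSE };

struct demo_sk {
	enum demo_proto proto;
	enum demo_state state;
};

/* Returns true when a reuseport lookup should replace the stolen socket.
 * prefetched models "the socket was assigned from BPF".
 */
static bool demo_needs_reuseport_lookup(const struct demo_sk *sk,
					bool prefetched)
{
	if (sk == NULL || !prefetched)
		return false;

	switch (sk->proto) {
	case DEMO_TCP:
		return sk->state == DEMO_LISTEN; /* only listeners are reselected */
	case DEMO_UDP:
		return sk->state == DEMO_CLOSE;  /* only sockets in the closed state */
	default:
		return false;
	}
}
```

Sockets that were not assigned from BPF, and connected or established sockets, are returned unchanged in this model, which matches the early returns in the helpers.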
Fixes: 8e368dc72e86 ("bpf: Fix use of sk->sk_reuseport from sk_assign") Fixes: cf7fbe660f2d ("bpf: Add socket assign support") Co-developed-by: Daniel Borkmann Signed-off-by: Daniel Borkmann Cc: Joe Stringer Link: https://lore.kernel.org/bpf/CACAyw98+qycmpQzKupquhkxbvWK4OFyDuuLMBNROnfWMZxUWeA@mail.gmail.com/ Reviewed-by: Kuniyuki Iwashima Signed-off-by: Lorenz Bauer Link: https://lore.kernel.org/r/20230720-so-reuseport-v6-7-7021b683cdae@isovalent.com Signed-off-by: Martin KaFai Lau --- include/net/inet6_hashtables.h | 56 ++++++++++++++++++++++++++++++++++++++---- include/net/inet_hashtables.h | 49 ++++++++++++++++++++++++++++++++++-- include/net/sock.h | 7 ++++-- include/uapi/linux/bpf.h | 3 --- 4 files changed, 103 insertions(+), 12 deletions(-) (limited to 'include') diff --git a/include/net/inet6_hashtables.h b/include/net/inet6_hashtables.h index a6722d6ef80f..284b5ce7205d 100644 --- a/include/net/inet6_hashtables.h +++ b/include/net/inet6_hashtables.h @@ -103,6 +103,46 @@ static inline struct sock *__inet6_lookup(struct net *net, daddr, hnum, dif, sdif); } +static inline +struct sock *inet6_steal_sock(struct net *net, struct sk_buff *skb, int doff, + const struct in6_addr *saddr, const __be16 sport, + const struct in6_addr *daddr, const __be16 dport, + bool *refcounted, inet6_ehashfn_t *ehashfn) +{ + struct sock *sk, *reuse_sk; + bool prefetched; + + sk = skb_steal_sock(skb, refcounted, &prefetched); + if (!sk) + return NULL; + + if (!prefetched) + return sk; + + if (sk->sk_protocol == IPPROTO_TCP) { + if (sk->sk_state != TCP_LISTEN) + return sk; + } else if (sk->sk_protocol == IPPROTO_UDP) { + if (sk->sk_state != TCP_CLOSE) + return sk; + } else { + return sk; + } + + reuse_sk = inet6_lookup_reuseport(net, sk, skb, doff, + saddr, sport, daddr, ntohs(dport), + ehashfn); + if (!reuse_sk) + return sk; + + /* We've chosen a new reuseport sock which is never refcounted. This + * implies that sk also isn't refcounted. 
+ */ + WARN_ON_ONCE(*refcounted); + + return reuse_sk; +} + static inline struct sock *__inet6_lookup_skb(struct inet_hashinfo *hashinfo, struct sk_buff *skb, int doff, const __be16 sport, @@ -110,14 +150,20 @@ static inline struct sock *__inet6_lookup_skb(struct inet_hashinfo *hashinfo, int iif, int sdif, bool *refcounted) { - struct sock *sk = skb_steal_sock(skb, refcounted); - + struct net *net = dev_net(skb_dst(skb)->dev); + const struct ipv6hdr *ip6h = ipv6_hdr(skb); + struct sock *sk; + + sk = inet6_steal_sock(net, skb, doff, &ip6h->saddr, sport, &ip6h->daddr, dport, + refcounted, inet6_ehashfn); + if (IS_ERR(sk)) + return NULL; if (sk) return sk; - return __inet6_lookup(dev_net(skb_dst(skb)->dev), hashinfo, skb, - doff, &ipv6_hdr(skb)->saddr, sport, - &ipv6_hdr(skb)->daddr, ntohs(dport), + return __inet6_lookup(net, hashinfo, skb, + doff, &ip6h->saddr, sport, + &ip6h->daddr, ntohs(dport), iif, sdif, refcounted); } diff --git a/include/net/inet_hashtables.h b/include/net/inet_hashtables.h index c0532cc7587f..1177effabed3 100644 --- a/include/net/inet_hashtables.h +++ b/include/net/inet_hashtables.h @@ -449,6 +449,46 @@ static inline struct sock *inet_lookup(struct net *net, return sk; } +static inline +struct sock *inet_steal_sock(struct net *net, struct sk_buff *skb, int doff, + const __be32 saddr, const __be16 sport, + const __be32 daddr, const __be16 dport, + bool *refcounted, inet_ehashfn_t *ehashfn) +{ + struct sock *sk, *reuse_sk; + bool prefetched; + + sk = skb_steal_sock(skb, refcounted, &prefetched); + if (!sk) + return NULL; + + if (!prefetched) + return sk; + + if (sk->sk_protocol == IPPROTO_TCP) { + if (sk->sk_state != TCP_LISTEN) + return sk; + } else if (sk->sk_protocol == IPPROTO_UDP) { + if (sk->sk_state != TCP_CLOSE) + return sk; + } else { + return sk; + } + + reuse_sk = inet_lookup_reuseport(net, sk, skb, doff, + saddr, sport, daddr, ntohs(dport), + ehashfn); + if (!reuse_sk) + return sk; + + /* We've chosen a new reuseport sock which is 
never refcounted. This + * implies that sk also isn't refcounted. + */ + WARN_ON_ONCE(*refcounted); + + return reuse_sk; +} + static inline struct sock *__inet_lookup_skb(struct inet_hashinfo *hashinfo, struct sk_buff *skb, int doff, @@ -457,13 +497,18 @@ static inline struct sock *__inet_lookup_skb(struct inet_hashinfo *hashinfo, const int sdif, bool *refcounted) { - struct sock *sk = skb_steal_sock(skb, refcounted); + struct net *net = dev_net(skb_dst(skb)->dev); const struct iphdr *iph = ip_hdr(skb); + struct sock *sk; + sk = inet_steal_sock(net, skb, doff, iph->saddr, sport, iph->daddr, dport, + refcounted, inet_ehashfn); + if (IS_ERR(sk)) + return NULL; if (sk) return sk; - return __inet_lookup(dev_net(skb_dst(skb)->dev), hashinfo, skb, + return __inet_lookup(net, hashinfo, skb, doff, iph->saddr, sport, iph->daddr, dport, inet_iif(skb), sdif, refcounted); diff --git a/include/net/sock.h b/include/net/sock.h index 7ae44bf866af..74cbfb15d289 100644 --- a/include/net/sock.h +++ b/include/net/sock.h @@ -2815,20 +2815,23 @@ sk_is_refcounted(struct sock *sk) * skb_steal_sock - steal a socket from an sk_buff * @skb: sk_buff to steal the socket from * @refcounted: is set to true if the socket is reference-counted + * @prefetched: is set to true if the socket was assigned from bpf */ static inline struct sock * -skb_steal_sock(struct sk_buff *skb, bool *refcounted) +skb_steal_sock(struct sk_buff *skb, bool *refcounted, bool *prefetched) { if (skb->sk) { struct sock *sk = skb->sk; *refcounted = true; - if (skb_sk_is_prefetched(skb)) + *prefetched = skb_sk_is_prefetched(skb); + if (*prefetched) *refcounted = sk_is_refcounted(sk); skb->destructor = NULL; skb->sk = NULL; return sk; } + *prefetched = false; *refcounted = false; return NULL; } diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h index 739c15906a65..7fc98f4b63e9 100644 --- a/include/uapi/linux/bpf.h +++ b/include/uapi/linux/bpf.h @@ -4198,9 +4198,6 @@ union bpf_attr { * **-EOPNOTSUPP** if the 
operation is not supported, for example * a call from outside of TC ingress. * - * **-ESOCKTNOSUPPORT** if the socket type is not supported - * (reuseport). - * * long bpf_sk_assign(struct bpf_sk_lookup *ctx, struct bpf_sock *sk, u64 flags) * Description * Helper is overloaded depending on BPF program type. This -- cgit v1.2.3 From 2303fae130640874823ea1bc7ec65c3cd074a7eb Mon Sep 17 00:00:00 2001 From: Peter Seiderer Date: Mon, 24 Jul 2023 18:22:55 +0200 Subject: net: skbuff: remove unused HAVE_HW_TIME_STAMP feature define Remove unused HAVE_HW_TIME_STAMP feature define (introduced by commit ac45f602ee3d ("net: infrastructure for hardware time stamping"). Signed-off-by: Peter Seiderer Reviewed-by: Simon Horman Signed-off-by: David S. Miller --- include/linux/skbuff.h | 2 -- 1 file changed, 2 deletions(-) (limited to 'include') diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index faaba050f843..16a49ba534e4 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -441,8 +441,6 @@ static inline bool skb_frag_must_loop(struct page *p) copied += p_len, p++, p_off = 0, \ p_len = min_t(u32, f_len - copied, PAGE_SIZE)) \ -#define HAVE_HW_TIME_STAMP - /** * struct skb_shared_hwtstamps - hardware time stamps * @hwtstamp: hardware time stamp transformed into duration -- cgit v1.2.3 From 8b305ee2a91c3c4c89cb82ea940265b247eb0a13 Mon Sep 17 00:00:00 2001 From: Tristram Ha Date: Tue, 25 Jul 2023 16:54:30 -0700 Subject: net: phy: smsc: add WoL support to LAN8740/LAN8742 PHYs Microchip LAN8740/LAN8742 PHYs support basic unicast, broadcast, and Magic Packet WoL. They have one pattern filter matching up to 128 bytes of frame data, which can be used to implement ARP or multicast WoL. ARP WoL matches any ARP frame with broadcast address. Multicast WoL matches any multicast frame. 
Signed-off-by: Tristram Ha Reviewed-by: Florian Fainelli Reviewed-by: Simon Horman Link: https://lore.kernel.org/r/1690329270-2873-1-git-send-email-Tristram.Ha@microchip.com Signed-off-by: Jakub Kicinski --- include/linux/smscphy.h | 34 ++++++++++++++++++++++++++++++++++ 1 file changed, 34 insertions(+) (limited to 'include') diff --git a/include/linux/smscphy.h b/include/linux/smscphy.h index e1c88627755a..1a6a851d2cf8 100644 --- a/include/linux/smscphy.h +++ b/include/linux/smscphy.h @@ -38,4 +38,38 @@ int smsc_phy_set_tunable(struct phy_device *phydev, struct ethtool_tunable *tuna, const void *data); int smsc_phy_probe(struct phy_device *phydev); +#define MII_LAN874X_PHY_MMD_WOL_WUCSR 0x8010 +#define MII_LAN874X_PHY_MMD_WOL_WUF_CFGA 0x8011 +#define MII_LAN874X_PHY_MMD_WOL_WUF_CFGB 0x8012 +#define MII_LAN874X_PHY_MMD_WOL_WUF_MASK0 0x8021 +#define MII_LAN874X_PHY_MMD_WOL_WUF_MASK1 0x8022 +#define MII_LAN874X_PHY_MMD_WOL_WUF_MASK2 0x8023 +#define MII_LAN874X_PHY_MMD_WOL_WUF_MASK3 0x8024 +#define MII_LAN874X_PHY_MMD_WOL_WUF_MASK4 0x8025 +#define MII_LAN874X_PHY_MMD_WOL_WUF_MASK5 0x8026 +#define MII_LAN874X_PHY_MMD_WOL_WUF_MASK6 0x8027 +#define MII_LAN874X_PHY_MMD_WOL_WUF_MASK7 0x8028 +#define MII_LAN874X_PHY_MMD_WOL_RX_ADDRA 0x8061 +#define MII_LAN874X_PHY_MMD_WOL_RX_ADDRB 0x8062 +#define MII_LAN874X_PHY_MMD_WOL_RX_ADDRC 0x8063 +#define MII_LAN874X_PHY_MMD_MCFGR 0x8064 + +#define MII_LAN874X_PHY_PME1_SET (2 << 13) +#define MII_LAN874X_PHY_PME2_SET (2 << 11) +#define MII_LAN874X_PHY_PME_SELF_CLEAR BIT(9) +#define MII_LAN874X_PHY_WOL_PFDA_FR BIT(7) +#define MII_LAN874X_PHY_WOL_WUFR BIT(6) +#define MII_LAN874X_PHY_WOL_MPR BIT(5) +#define MII_LAN874X_PHY_WOL_BCAST_FR BIT(4) +#define MII_LAN874X_PHY_WOL_PFDAEN BIT(3) +#define MII_LAN874X_PHY_WOL_WUEN BIT(2) +#define MII_LAN874X_PHY_WOL_MPEN BIT(1) +#define MII_LAN874X_PHY_WOL_BCSTEN BIT(0) + +#define MII_LAN874X_PHY_WOL_FILTER_EN BIT(15) +#define MII_LAN874X_PHY_WOL_FILTER_MCASTTEN BIT(9) +#define 
MII_LAN874X_PHY_WOL_FILTER_BCSTEN BIT(8) + +#define MII_LAN874X_PHY_PME_SELF_CLEAR_DELAY 0x1000 /* 81 milliseconds */ + #endif /* __LINUX_SMSCPHY_H__ */ -- cgit v1.2.3 From 5fac9b7c16c50c6c7699517f582b56e3743f453a Mon Sep 17 00:00:00 2001 From: Florian Westphal Date: Tue, 18 Jul 2023 09:52:29 +0200 Subject: netlink: allow be16 and be32 types in all uint policy checks __NLA_IS_BEINT_TYPE(tp) isn't useful. NLA_BE16/32 are identical to NLA_U16/32, the only difference is that it tells the netlink validation functions that byteorder conversion might be needed before comparing the value to the policy min/max ones. After this change all policy macros that can be used with UINT types, such as NLA_POLICY_MASK() can also be used with NLA_BE16/32. This will be used to validate nf_tables flag attributes which are in bigendian byte order. Signed-off-by: Florian Westphal --- include/net/netlink.h | 10 +++------- 1 file changed, 3 insertions(+), 7 deletions(-) (limited to 'include') diff --git a/include/net/netlink.h b/include/net/netlink.h index b12cd957abb4..8a7cd1170e1f 100644 --- a/include/net/netlink.h +++ b/include/net/netlink.h @@ -375,12 +375,11 @@ struct nla_policy { #define NLA_POLICY_BITFIELD32(valid) \ { .type = NLA_BITFIELD32, .bitfield32_valid = valid } -#define __NLA_IS_UINT_TYPE(tp) \ - (tp == NLA_U8 || tp == NLA_U16 || tp == NLA_U32 || tp == NLA_U64) +#define __NLA_IS_UINT_TYPE(tp) \ + (tp == NLA_U8 || tp == NLA_U16 || tp == NLA_U32 || \ + tp == NLA_U64 || tp == NLA_BE16 || tp == NLA_BE32) #define __NLA_IS_SINT_TYPE(tp) \ (tp == NLA_S8 || tp == NLA_S16 || tp == NLA_S32 || tp == NLA_S64) -#define __NLA_IS_BEINT_TYPE(tp) \ - (tp == NLA_BE16 || tp == NLA_BE32) #define __NLA_ENSURE(condition) BUILD_BUG_ON_ZERO(!(condition)) #define NLA_ENSURE_UINT_TYPE(tp) \ @@ -394,7 +393,6 @@ struct nla_policy { #define NLA_ENSURE_INT_OR_BINARY_TYPE(tp) \ (__NLA_ENSURE(__NLA_IS_UINT_TYPE(tp) || \ __NLA_IS_SINT_TYPE(tp) || \ - __NLA_IS_BEINT_TYPE(tp) || \ tp == NLA_MSECS || \ tp == 
NLA_BINARY) + tp) #define NLA_ENSURE_NO_VALIDATION_PTR(tp) \ @@ -402,8 +400,6 @@ struct nla_policy { tp != NLA_REJECT && \ tp != NLA_NESTED && \ tp != NLA_NESTED_ARRAY) + tp) -#define NLA_ENSURE_BEINT_TYPE(tp) \ - (__NLA_ENSURE(__NLA_IS_BEINT_TYPE(tp)) + tp) #define NLA_POLICY_RANGE(tp, _min, _max) { \ .type = NLA_ENSURE_INT_OR_BINARY_TYPE(tp), \ -- cgit v1.2.3 From 88d162b479815f5d6b6a4ff5fdb07aec9dc6280c Mon Sep 17 00:00:00 2001 From: Roi Dayan Date: Thu, 4 May 2023 12:14:00 +0300 Subject: net/mlx5: Devcom, Infrastructure changes Update devcom infrastructure to be more generic, without depending on max supported ports definition or a device guid, and also more encapsulated so callers don't need to pass the register devcom component id per event call. Signed-off-by: Eli Cohen Signed-off-by: Roi Dayan Reviewed-by: Shay Drory Signed-off-by: Saeed Mahameed --- include/linux/mlx5/driver.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'include') diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h index 25d0528f9219..56dd3dfe2304 100644 --- a/include/linux/mlx5/driver.h +++ b/include/linux/mlx5/driver.h @@ -501,7 +501,7 @@ struct mlx5_events; struct mlx5_mpfs; struct mlx5_eswitch; struct mlx5_lag; -struct mlx5_devcom; +struct mlx5_devcom_dev; struct mlx5_fw_reset; struct mlx5_eq_table; struct mlx5_irq_table; @@ -618,7 +618,7 @@ struct mlx5_priv { struct mlx5_core_sriov sriov; struct mlx5_lag *lag; u32 flags; - struct mlx5_devcom *devcom; + struct mlx5_devcom_dev *devc; struct mlx5_fw_reset *fw_reset; struct mlx5_core_roce roce; struct mlx5_fc_stats fc_stats; -- cgit v1.2.3 From 58db72869a9f8e01910844ca145efc2ea91bbbf9 Mon Sep 17 00:00:00 2001 From: Shay Drory Date: Wed, 18 Jan 2023 16:52:17 +0200 Subject: net/mlx5: Re-organize mlx5_cmd struct Downstream patch will split mlx5_cmd_init() to probe and reload routines. 
As a preparation, organize mlx5_cmd struct so that any field that will be used in the reload routine are grouped at new nested struct. Signed-off-by: Shay Drory Reviewed-by: Moshe Shemesh Signed-off-by: Saeed Mahameed --- include/linux/mlx5/driver.h | 21 +++++++++++---------- 1 file changed, 11 insertions(+), 10 deletions(-) (limited to 'include') diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h index 56dd3dfe2304..39c5f4087c39 100644 --- a/include/linux/mlx5/driver.h +++ b/include/linux/mlx5/driver.h @@ -287,18 +287,23 @@ struct mlx5_cmd_stats { struct mlx5_cmd { struct mlx5_nb nb; + /* members which needs to be queried or reinitialized each reload */ + struct { + u16 cmdif_rev; + u8 log_sz; + u8 log_stride; + int max_reg_cmds; + unsigned long bitmask; + struct semaphore sem; + struct semaphore pages_sem; + struct semaphore throttle_sem; + } vars; enum mlx5_cmdif_state state; void *cmd_alloc_buf; dma_addr_t alloc_dma; int alloc_size; void *cmd_buf; dma_addr_t dma; - u16 cmdif_rev; - u8 log_sz; - u8 log_stride; - int max_reg_cmds; - int events; - u32 __iomem *vector; /* protect command queue allocations */ @@ -308,12 +313,8 @@ struct mlx5_cmd { */ spinlock_t token_lock; u8 token; - unsigned long bitmask; char wq_name[MLX5_CMD_WQ_MAX_NAME]; struct workqueue_struct *wq; - struct semaphore sem; - struct semaphore pages_sem; - struct semaphore throttle_sem; int mode; u16 allowed_opcode; struct mlx5_cmd_work_ent *ent_arr[MLX5_MAX_COMMANDS]; -- cgit v1.2.3 From b90ebfc018b087ba1e4981b298b58733236ff296 Mon Sep 17 00:00:00 2001 From: Shay Drory Date: Thu, 19 Jan 2023 09:10:50 +0200 Subject: net/mlx5: Allocate command stats with xarray Command stats is an array with more than 2K entries, which amounts to ~180KB. This is way more than actually needed, as only ~190 entries are being used. Therefore, replace the array with xarray. 
Signed-off-by: Shay Drory Reviewed-by: Moshe Shemesh Signed-off-by: Saeed Mahameed --- include/linux/mlx5/driver.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include') diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h index 39c5f4087c39..f21703fb75fd 100644 --- a/include/linux/mlx5/driver.h +++ b/include/linux/mlx5/driver.h @@ -322,7 +322,7 @@ struct mlx5_cmd { struct mlx5_cmd_debug dbg; struct cmd_msg_cache cache[MLX5_NUM_COMMAND_CACHES]; int checksum_disabled; - struct mlx5_cmd_stats stats[MLX5_CMD_OP_MAX]; + struct xarray stats; }; struct mlx5_cmd_mailbox { -- cgit v1.2.3 From d0358c1a37db4c46b9a1cd6c1b36e5a24ff970f9 Mon Sep 17 00:00:00 2001 From: YueHaibing Date: Wed, 26 Jul 2023 22:37:15 +0800 Subject: net: Remove unused declaration dev_restart() This declaration is not used, so remove it. Signed-off-by: YueHaibing Reviewed-by: Simon Horman Link: https://lore.kernel.org/r/20230726143715.24700-1-yuehaibing@huawei.com Signed-off-by: Jakub Kicinski --- include/linux/netdevice.h | 2 -- 1 file changed, 2 deletions(-) (limited to 'include') diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index 11652e464f5d..32a4cdf37dd4 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -3130,8 +3130,6 @@ struct net_device *netdev_get_by_name(struct net *net, const char *name, netdevice_tracker *tracker, gfp_t gfp); struct net_device *dev_get_by_index_rcu(struct net *net, int ifindex); struct net_device *dev_get_by_napi_id(unsigned int napi_id); -int dev_restart(struct net_device *dev); - static inline int dev_hard_header(struct sk_buff *skb, struct net_device *dev, unsigned short type, -- cgit v1.2.3 From 994650353cae9e10cca7fd61de79c911aa8ed287 Mon Sep 17 00:00:00 2001 From: YueHaibing Date: Wed, 26 Jul 2023 22:40:54 +0800 Subject: net: datalink: Remove unused declarations These declarations have been unused since the IPX protocol was removed.
Signed-off-by: YueHaibing Reviewed-by: Simon Horman Link: https://lore.kernel.org/r/20230726144054.28780-1-yuehaibing@huawei.com Signed-off-by: Jakub Kicinski --- include/net/datalink.h | 2 -- 1 file changed, 2 deletions(-) (limited to 'include') diff --git a/include/net/datalink.h b/include/net/datalink.h index c837ffc7ebf8..6c529a40e00d 100644 --- a/include/net/datalink.h +++ b/include/net/datalink.h @@ -23,6 +23,4 @@ struct datalink_proto { struct list_head node; }; -struct datalink_proto *make_EII_client(void); -void destroy_EII_client(struct datalink_proto *dl); #endif -- cgit v1.2.3 From 1f9a1ea821ff25353a0e80d971e7958cd55b47a3 Mon Sep 17 00:00:00 2001 From: Yonghong Song Date: Thu, 27 Jul 2023 18:11:56 -0700 Subject: bpf: Support new sign-extension load insns Add interpreter/jit support for new sign-extension load insns which adds a new mode (BPF_MEMSX). Also add verifier support to recognize these insns and to do proper verification with new insns. In verifier, besides to deduce proper bounds for the dst_reg, probed memory access is also properly handled. Acked-by: Eduard Zingerman Signed-off-by: Yonghong Song Link: https://lore.kernel.org/r/20230728011156.3711870-1-yonghong.song@linux.dev Signed-off-by: Alexei Starovoitov --- include/linux/filter.h | 3 +++ include/uapi/linux/bpf.h | 1 + 2 files changed, 4 insertions(+) (limited to 'include') diff --git a/include/linux/filter.h b/include/linux/filter.h index f69114083ec7..a93242b5516b 100644 --- a/include/linux/filter.h +++ b/include/linux/filter.h @@ -69,6 +69,9 @@ struct ctl_table_header; /* unused opcode to mark special load instruction. Same as BPF_ABS */ #define BPF_PROBE_MEM 0x20 +/* unused opcode to mark special ldsx instruction. 
Same as BPF_IND */ +#define BPF_PROBE_MEMSX 0x40 + /* unused opcode to mark call to interpreter with arguments */ #define BPF_CALL_ARGS 0xe0 diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h index 7fc98f4b63e9..14fd26b09e4b 100644 --- a/include/uapi/linux/bpf.h +++ b/include/uapi/linux/bpf.h @@ -19,6 +19,7 @@ /* ld/ldx fields */ #define BPF_DW 0x18 /* double word (64-bit) */ +#define BPF_MEMSX 0x80 /* load with sign extension */ #define BPF_ATOMIC 0xc0 /* atomic memory ops - op type in immediate */ #define BPF_XADD 0xc0 /* exclusive add - legacy name */ -- cgit v1.2.3 From 7058e3a31ee4b9240cccab5bc13c1afbfa3d16a0 Mon Sep 17 00:00:00 2001 From: Yonghong Song Date: Thu, 27 Jul 2023 18:12:25 -0700 Subject: bpf: Fix jit blinding with new sdiv/smov insns Handle new insns properly in bpf_jit_blind_insn() function. Acked-by: Eduard Zingerman Signed-off-by: Yonghong Song Link: https://lore.kernel.org/r/20230728011225.3715812-1-yonghong.song@linux.dev Signed-off-by: Alexei Starovoitov --- include/linux/filter.h | 14 ++++++++++---- 1 file changed, 10 insertions(+), 4 deletions(-) (limited to 'include') diff --git a/include/linux/filter.h b/include/linux/filter.h index a93242b5516b..f5eabe3fa5e8 100644 --- a/include/linux/filter.h +++ b/include/linux/filter.h @@ -93,22 +93,28 @@ struct ctl_table_header; /* ALU ops on registers, bpf_add|sub|...: dst_reg += src_reg */ -#define BPF_ALU64_REG(OP, DST, SRC) \ +#define BPF_ALU64_REG_OFF(OP, DST, SRC, OFF) \ ((struct bpf_insn) { \ .code = BPF_ALU64 | BPF_OP(OP) | BPF_X, \ .dst_reg = DST, \ .src_reg = SRC, \ - .off = 0, \ + .off = OFF, \ .imm = 0 }) -#define BPF_ALU32_REG(OP, DST, SRC) \ +#define BPF_ALU64_REG(OP, DST, SRC) \ + BPF_ALU64_REG_OFF(OP, DST, SRC, 0) + +#define BPF_ALU32_REG_OFF(OP, DST, SRC, OFF) \ ((struct bpf_insn) { \ .code = BPF_ALU | BPF_OP(OP) | BPF_X, \ .dst_reg = DST, \ .src_reg = SRC, \ - .off = 0, \ + .off = OFF, \ .imm = 0 }) +#define BPF_ALU32_REG(OP, DST, SRC) \ + BPF_ALU32_REG_OFF(OP, DST, 
SRC, 0) + /* ALU ops on immediates, bpf_add|sub|...: dst_reg += imm32 */ #define BPF_ALU64_IMM(OP, DST, IMM) \ -- cgit v1.2.3 From d928d14be6514af9da37009c2091270c5c714366 Mon Sep 17 00:00:00 2001 From: Andrew Halaney Date: Tue, 25 Jul 2023 16:04:25 -0500 Subject: net: stmmac: Make ptp_clk_freq_config variable type explicit The priv variable is _always_ of type (struct stmmac_priv *), so let's stop using (void *) since it isn't abstracting anything. Reviewed-by: Simon Horman Signed-off-by: Andrew Halaney Link: https://lore.kernel.org/r/20230725211853.895832-3-ahalaney@redhat.com Signed-off-by: Jakub Kicinski --- include/linux/stmmac.h | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) (limited to 'include') diff --git a/include/linux/stmmac.h b/include/linux/stmmac.h index ef67dba775d0..3d0702510224 100644 --- a/include/linux/stmmac.h +++ b/include/linux/stmmac.h @@ -76,6 +76,8 @@ | DMA_AXI_BLEN_32 | DMA_AXI_BLEN_64 \ | DMA_AXI_BLEN_128 | DMA_AXI_BLEN_256) +struct stmmac_priv; + /* Platfrom data for platform device structure's platform_data field */ struct stmmac_mdio_bus_data { @@ -258,7 +260,7 @@ struct plat_stmmacenet_data { int (*serdes_powerup)(struct net_device *ndev, void *priv); void (*serdes_powerdown)(struct net_device *ndev, void *priv); void (*speed_mode_2500)(struct net_device *ndev, void *priv); - void (*ptp_clk_freq_config)(void *priv); + void (*ptp_clk_freq_config)(struct stmmac_priv *priv); int (*init)(struct platform_device *pdev, void *priv); void (*exit)(struct platform_device *pdev, void *priv); struct mac_device_info *(*setup)(void *priv); -- cgit v1.2.3 From 2e3df4a3b3178006d530f4c4d0e91f3d96cddb3c Mon Sep 17 00:00:00 2001 From: Marc Kleine-Budde Date: Fri, 2 Jun 2023 09:36:38 +0200 Subject: can: rx-offload: rename rx_offload_get_echo_skb() -> can_rx_offload_get_echo_skb_queue_timestamp() Rename the rx_offload_get_echo_skb() function to can_rx_offload_get_echo_skb_queue_timestamp(), since it inserts the echo skb into the rx-offload queue 
sorted by timestamp. This is a preparation for adding can_rx_offload_get_echo_skb_queue_tail(), which adds the echo skb to the end of the queue. This is intended for devices that do not support timestamps. Link: https://lore.kernel.org/all/20230718-gs_usb-rx-offload-v2-1-716e542d14d5@pengutronix.de Signed-off-by: Marc Kleine-Budde --- include/linux/can/rx-offload.h | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) (limited to 'include') diff --git a/include/linux/can/rx-offload.h b/include/linux/can/rx-offload.h index c205c51d79c9..e3b4199732c6 100644 --- a/include/linux/can/rx-offload.h +++ b/include/linux/can/rx-offload.h @@ -44,9 +44,9 @@ int can_rx_offload_irq_offload_timestamp(struct can_rx_offload *offload, int can_rx_offload_irq_offload_fifo(struct can_rx_offload *offload); int can_rx_offload_queue_timestamp(struct can_rx_offload *offload, struct sk_buff *skb, u32 timestamp); -unsigned int can_rx_offload_get_echo_skb(struct can_rx_offload *offload, - unsigned int idx, u32 timestamp, - unsigned int *frame_len_ptr); +unsigned int can_rx_offload_get_echo_skb_queue_timestamp(struct can_rx_offload *offload, + unsigned int idx, u32 timestamp, + unsigned int *frame_len_ptr); int can_rx_offload_queue_tail(struct can_rx_offload *offload, struct sk_buff *skb); void can_rx_offload_irq_finish(struct can_rx_offload *offload); -- cgit v1.2.3 From 8e0e2950c9ef48f7f40a7175048744ec2390b16e Mon Sep 17 00:00:00 2001 From: Marc Kleine-Budde Date: Mon, 3 Jul 2023 18:18:19 +0200 Subject: can: rx-offload: add can_rx_offload_get_echo_skb_queue_tail() Add can_rx_offload_get_echo_skb_queue_tail(). This function adds the echo skb to the end of the rx-offload queue. This is intended for devices without timestamp support.
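The difference between the two queueing policies can be illustrated with a userspace model. This is a sketch only, not the rx-offload implementation: `struct frame_model`, `struct queue_model`, `queue_timestamp()` and `queue_tail()` are hypothetical names, the queue is a small fixed array, and timestamp wraparound is deliberately ignored.

```c
#include <stddef.h>
#include <stdint.h>

#define QUEUE_MAX 8

/* Hypothetical stand-ins for a queued frame and the offload queue. */
struct frame_model {
	uint32_t timestamp;
	int id;
};

struct queue_model {
	struct frame_model items[QUEUE_MAX];
	size_t len;
};

/* Append the frame without looking at its timestamp. */
static int queue_tail(struct queue_model *q, struct frame_model f)
{
	if (q->len == QUEUE_MAX)
		return -1;
	q->items[q->len++] = f;
	return 0;
}

/* Insert the frame so the queue stays ordered by timestamp.
 * Frames with equal timestamps keep their arrival order.
 * Wraparound of the 32-bit timestamp is not handled in this sketch.
 */
static int queue_timestamp(struct queue_model *q, struct frame_model f)
{
	size_t pos;

	if (q->len == QUEUE_MAX)
		return -1;

	pos = q->len;
	while (pos > 0 && q->items[pos - 1].timestamp > f.timestamp) {
		q->items[pos] = q->items[pos - 1];
		pos--;
	}
	q->items[pos] = f;
	q->len++;
	return 0;
}
```

Feeding frames with timestamps 30, 10, 20 through `queue_timestamp()` leaves them ordered 10, 20, 30, whereas `queue_tail()` keeps the arrival order, which is the only sensible option when the hardware provides no timestamp.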
Link: https://lore.kernel.org/all/20230718-gs_usb-rx-offload-v2-2-716e542d14d5@pengutronix.de Signed-off-by: Marc Kleine-Budde --- include/linux/can/rx-offload.h | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) (limited to 'include') diff --git a/include/linux/can/rx-offload.h b/include/linux/can/rx-offload.h index e3b4199732c6..d29bb4521947 100644 --- a/include/linux/can/rx-offload.h +++ b/include/linux/can/rx-offload.h @@ -3,7 +3,7 @@ * linux/can/rx-offload.h * * Copyright (c) 2014 David Jander, Protonic Holland - * Copyright (c) 2014-2017 Pengutronix, Marc Kleine-Budde + * Copyright (c) 2014-2017, 2023 Pengutronix, Marc Kleine-Budde */ #ifndef _CAN_RX_OFFLOAD_H @@ -49,6 +49,9 @@ unsigned int can_rx_offload_get_echo_skb_queue_timestamp(struct can_rx_offload * unsigned int *frame_len_ptr); int can_rx_offload_queue_tail(struct can_rx_offload *offload, struct sk_buff *skb); +unsigned int can_rx_offload_get_echo_skb_queue_tail(struct can_rx_offload *offload, + unsigned int idx, + unsigned int *frame_len_ptr); void can_rx_offload_irq_finish(struct can_rx_offload *offload); void can_rx_offload_threaded_irq_finish(struct can_rx_offload *offload); void can_rx_offload_del(struct can_rx_offload *offload); -- cgit v1.2.3 From 7f6c40391a048c5d0f593f285bee45f7f98a3ca4 Mon Sep 17 00:00:00 2001 From: Hangbin Liu Date: Wed, 26 Jul 2023 10:39:05 +0800 Subject: IPv6: add extack info for IPv6 address add/delete Add extack info for IPv6 address add/delete, which would be useful for users to understand the problem without having to read kernel code. Suggested-by: Beniamino Galvani Reviewed-by: Ido Schimmel Reviewed-by: David Ahern Signed-off-by: Hangbin Liu Reviewed-by: Simon Horman Signed-off-by: David S. 
Miller --- include/net/ip6_route.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include') diff --git a/include/net/ip6_route.h b/include/net/ip6_route.h index 3556595ce59a..b32539bb0fb0 100644 --- a/include/net/ip6_route.h +++ b/include/net/ip6_route.h @@ -156,7 +156,7 @@ void fib6_force_start_gc(struct net *net); struct fib6_info *addrconf_f6i_alloc(struct net *net, struct inet6_dev *idev, const struct in6_addr *addr, bool anycast, - gfp_t gfp_flags); + gfp_t gfp_flags, struct netlink_ext_ack *extack); struct rt6_info *ip6_dst_alloc(struct net *net, struct net_device *dev, int flags); -- cgit v1.2.3 From 25b5a2a1905fd8631c73596f98793f7494f29f2a Mon Sep 17 00:00:00 2001 From: Stanislav Fomichev Date: Thu, 27 Jul 2023 09:30:00 -0700 Subject: ynl: regenerate all headers Also add support to pass topdir to ynl-regen.sh (Jakub) and call it from the makefile to update the UAPI headers. Signed-off-by: Stanislav Fomichev Co-developed-by: Jakub Kicinski Reviewed-by: Jakub Kicinski Link: https://lore.kernel.org/r/20230727163001.3952878-4-sdf@google.com Signed-off-by: Jakub Kicinski --- include/uapi/linux/netdev.h | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) (limited to 'include') diff --git a/include/uapi/linux/netdev.h b/include/uapi/linux/netdev.h index bf71698a1e82..c1634b95c223 100644 --- a/include/uapi/linux/netdev.h +++ b/include/uapi/linux/netdev.h @@ -11,7 +11,7 @@ /** * enum netdev_xdp_act - * @NETDEV_XDP_ACT_BASIC: XDP feautues set supported by all drivers + * @NETDEV_XDP_ACT_BASIC: XDP features set supported by all drivers * (XDP_ABORTED, XDP_DROP, XDP_PASS, XDP_TX) * @NETDEV_XDP_ACT_REDIRECT: The netdev supports XDP_REDIRECT * @NETDEV_XDP_ACT_NDO_XMIT: This feature informs if netdev implements @@ -34,6 +34,7 @@ enum netdev_xdp_act { NETDEV_XDP_ACT_RX_SG = 32, NETDEV_XDP_ACT_NDO_XMIT_SG = 64, + /* private: */ NETDEV_XDP_ACT_MASK = 127, }; -- cgit v1.2.3 From 759ab1edb56c88906830fd6b2e7b12514dd32758 Mon Sep 17 00:00:00 2001 From: 
Jakub Kicinski Date: Wed, 26 Jul 2023 11:55:29 -0700 Subject: net: store netdevs in an xarray Iterating over the netdev hash table for netlink dumps is hard. Dumps are done in "chunks" so we need to save the position after each chunk, so we know where to restart from. Because netdevs are stored in a hash table we remember which bucket we were in and how many devices we dumped. Since we don't hold any locks across the "chunks" - devices may come and go while we're dumping. If that happens we may miss a device (if device is deleted from the bucket we were in). We indicate to user space that this may have happened by setting NLM_F_DUMP_INTR. User space is supposed to dump again (I think) if it sees that. Somehow I doubt most user space gets this right.. To illustrate let's look at an example: System state: start: # [A, B, C] del: B # [A, C] with the hash table we may dump [A, B], missing C completely even tho it existed both before and after the "del B". Add an xarray and use it to allocate ifindexes. This way we can iterate ifindexes in order, without the worry that we'll skip one. We may still generate a dump of a state which "never existed", for example for a set of values and sequence of ops: System state: start: # [A, B] add: C # [A, C, B] del: B # [A, C] we may generate a dump of [A], if C got an index between A and B. System has never been in such state. But I'm 90% sure that's perfectly fine, important part is that we can't _miss_ devices which exist before and after. User space which wants to mirror kernel's state subscribes to notifications and does periodic dumps so it will know that C exists from the notification about its creation or from the next dump (next dump is _guaranteed_ to include C, if it doesn't get removed). To avoid any perf regressions keep the hash table for now. 
Most net namespaces have very few devices and microbenchmarking 1M lookups on Skylake I get the following results (not counting loopback to number of devs): #devs | hash | xa | delta 2 | 18.3 | 20.1 | + 9.8% 16 | 18.3 | 20.1 | + 9.5% 64 | 18.3 | 26.3 | +43.8% 128 | 20.4 | 26.3 | +28.6% 256 | 20.0 | 26.4 | +32.1% 1024 | 26.6 | 26.7 | + 0.2% 8192 |541.3 | 33.5 | -93.8% No surprises since the hash table has 256 entries. The microbenchmark scans indexes in order, if the pattern is more random xa starts to win at 512 devices already. But that's a lot of devices, in practice. Reviewed-by: Leon Romanovsky Link: https://lore.kernel.org/r/20230726185530.2247698-2-kuba@kernel.org Signed-off-by: Jakub Kicinski --- include/net/net_namespace.h | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) (limited to 'include') diff --git a/include/net/net_namespace.h b/include/net/net_namespace.h index 78beaa765c73..9f6add96de2d 100644 --- a/include/net/net_namespace.h +++ b/include/net/net_namespace.h @@ -42,6 +42,7 @@ #include #include #include +#include struct user_namespace; struct proc_dir_entry; @@ -69,7 +70,7 @@ struct net { atomic_t dev_unreg_count; unsigned int dev_base_seq; /* protected by rtnl_mutex */ - int ifindex; + u32 ifindex; spinlock_t nsid_lock; atomic_t fnhe_genid; @@ -110,6 +111,7 @@ struct net { struct hlist_head *dev_name_head; struct hlist_head *dev_index_head; + struct xarray dev_by_index; struct raw_notifier_head netdev_chain; /* Note that @hash_mix can be read millions times per second, -- cgit v1.2.3 From 84e00d9bd4e472bd9b145ed40dbd132dd7a15462 Mon Sep 17 00:00:00 2001 From: Jakub Kicinski Date: Wed, 26 Jul 2023 11:55:30 -0700 Subject: net: convert some netlink netdev iterators to depend on the xarray Reap the benefits of easier iteration thanks to the xarray. Convert just the genetlink ones, those are easier to test. 
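Why an ordered index makes chunked dumps easy to resume can be sketched in userspace. The model below is an assumption-laden illustration, not the xarray API: `struct dev_model`, `struct dev_table` and `dump_chunk()` are hypothetical names, and a plain array indexed by ifindex stands in for the xarray.

```c
#include <stddef.h>

#define MAX_IFINDEX 16

/* Hypothetical stand-ins for a netdev and the per-namespace index. */
struct dev_model {
	const char *name;
};

struct dev_table {
	struct dev_model *by_index[MAX_IFINDEX]; /* NULL means no device at that ifindex */
};

/* Emit up to max device names in ifindex order, starting at *cursor.
 * On return *cursor is the first index not yet visited, so the next
 * call resumes there regardless of devices added or removed behind it.
 */
static size_t dump_chunk(const struct dev_table *t, size_t *cursor,
			 const char **out, size_t max)
{
	size_t n = 0;

	while (*cursor < MAX_IFINDEX && n < max) {
		const struct dev_model *d = t->by_index[*cursor];

		if (d)
			out[n++] = d->name;
		(*cursor)++;
	}
	return n;
}
```

With devices A, B, C at indexes 1, 2, 3, a first chunk of two entries returns A and B. If B is then deleted, the next chunk still returns C, because the cursor is an index rather than a count of entries already seen in a bucket.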
Reviewed-by: Leon Romanovsky Link: https://lore.kernel.org/r/20230726185530.2247698-3-kuba@kernel.org Signed-off-by: Jakub Kicinski --- include/linux/netdevice.h | 3 +++ 1 file changed, 3 insertions(+) (limited to 'include') diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index 32a4cdf37dd4..84c36a7f873f 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -3016,6 +3016,9 @@ extern rwlock_t dev_base_lock; /* Device list lock */ if (netdev_master_upper_dev_get_rcu(slave) == (bond)) #define net_device_entry(lh) list_entry(lh, struct net_device, dev_list) +#define for_each_netdev_dump(net, d, ifindex) \ + xa_for_each_start(&(net)->dev_by_index, (ifindex), (d), (ifindex)) + static inline struct net_device *next_net_device(struct net_device *dev) { struct list_head *lh; -- cgit v1.2.3 From 5027d54a9c30bc7ec808360378e2b4753f053f25 Mon Sep 17 00:00:00 2001 From: Patrick Rohr Date: Wed, 26 Jul 2023 16:07:01 -0700 Subject: net: change accept_ra_min_rtr_lft to affect all RA lifetimes MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit accept_ra_min_rtr_lft only considered the lifetime of the default route and discarded entire RAs accordingly. This change renames accept_ra_min_rtr_lft to accept_ra_min_lft, and applies the value to individual RA sections; in particular, router lifetime, PIO preferred lifetime, and RIO lifetime. If any of those lifetimes are lower than the configured value, the specific RA section is ignored. In order for the sysctl to be useful to Android, it should really apply to all lifetimes in the RA, since that is what determines the minimum frequency at which RAs must be processed by the kernel. Android uses hardware offloads to drop RAs for a fraction of the minimum of all lifetimes present in the RA (some networks have very frequent RAs (5s) with high lifetimes (2h)). 
Despite this, we have encountered networks that set the router lifetime to 30s which results in very frequent CPU wakeups. Instead of disabling IPv6 (and dropping IPv6 ethertype in the WiFi firmware) entirely on such networks, it seems better to ignore the misconfigured routers while still processing RAs from other IPv6 routers on the same network (i.e. to support IoT applications). The previous implementation dropped the entire RA based on router lifetime. This turned out to be hard to expand to the other lifetimes present in the RA in a consistent manner; dropping the entire RA based on RIO/PIO lifetimes would essentially require parsing the whole thing twice. Fixes: 1671bcfd76fd ("net: add sysctl accept_ra_min_rtr_lft") Cc: Lorenzo Colitti Signed-off-by: Patrick Rohr Reviewed-by: Maciej Żenczykowski Reviewed-by: David Ahern Link: https://lore.kernel.org/r/20230726230701.919212-1-prohr@google.com Signed-off-by: Jakub Kicinski --- include/linux/ipv6.h | 2 +- include/uapi/linux/ipv6.h | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) (limited to 'include') diff --git a/include/linux/ipv6.h b/include/linux/ipv6.h index 0295b47c10a3..5883551b1ee8 100644 --- a/include/linux/ipv6.h +++ b/include/linux/ipv6.h @@ -33,7 +33,7 @@ struct ipv6_devconf { __s32 accept_ra_defrtr; __u32 ra_defrtr_metric; __s32 accept_ra_min_hop_limit; - __s32 accept_ra_min_rtr_lft; + __s32 accept_ra_min_lft; __s32 accept_ra_pinfo; __s32 ignore_routes_with_linkdown; #ifdef CONFIG_IPV6_ROUTER_PREF diff --git a/include/uapi/linux/ipv6.h b/include/uapi/linux/ipv6.h index 8b6bcbf6ed4a..cf592d7b630f 100644 --- a/include/uapi/linux/ipv6.h +++ b/include/uapi/linux/ipv6.h @@ -198,7 +198,7 @@ enum { DEVCONF_IOAM6_ID_WIDE, DEVCONF_NDISC_EVICT_NOCARRIER, DEVCONF_ACCEPT_UNTRACKED_NA, - DEVCONF_ACCEPT_RA_MIN_RTR_LFT, + DEVCONF_ACCEPT_RA_MIN_LFT, DEVCONF_MAX }; -- cgit v1.2.3 From 6a7eccef47b205ae66371a26d36dfb2529835075 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Thu, 27 Jul 2023 13:35:23 -0400 
Subject: net/tls: Move TLS protocol elements to a separate header Kernel TLS consumers will need definitions of various parts of the TLS protocol, but often do not need the function declarations and other infrastructure provided in . Break out existing standardized protocol elements into a separate header, and make room for a few more elements in subsequent patches. Signed-off-by: Chuck Lever Link: https://lore.kernel.org/r/169047931374.5241.7713175865185969309.stgit@oracle-102.nfsv4bat.org Signed-off-by: Jakub Kicinski --- include/net/tls.h | 4 ---- include/net/tls_prot.h | 26 ++++++++++++++++++++++++++ 2 files changed, 26 insertions(+), 4 deletions(-) create mode 100644 include/net/tls_prot.h (limited to 'include') diff --git a/include/net/tls.h b/include/net/tls.h index 5e71dd3df8ca..06fca9160346 100644 --- a/include/net/tls.h +++ b/include/net/tls.h @@ -69,10 +69,6 @@ extern const struct tls_cipher_size_desc tls_cipher_size_desc[]; #define TLS_CRYPTO_INFO_READY(info) ((info)->cipher_type) -#define TLS_RECORD_TYPE_ALERT 0x15 -#define TLS_RECORD_TYPE_HANDSHAKE 0x16 -#define TLS_RECORD_TYPE_DATA 0x17 - #define TLS_AAD_SPACE_SIZE 13 #define MAX_IV_SIZE 16 diff --git a/include/net/tls_prot.h b/include/net/tls_prot.h new file mode 100644 index 000000000000..47d6cfd1619e --- /dev/null +++ b/include/net/tls_prot.h @@ -0,0 +1,26 @@ +/* SPDX-License-Identifier: GPL-2.0 OR BSD-3-Clause */ +/* + * Copyright (c) 2023, Oracle and/or its affiliates. 
+ *
+ * TLS Protocol definitions
+ *
+ * From https://www.iana.org/assignments/tls-parameters/tls-parameters.xhtml
+ */
+
+#ifndef _TLS_PROT_H
+#define _TLS_PROT_H
+
+/*
+ * TLS Record protocol: ContentType
+ */
+enum {
+	TLS_RECORD_TYPE_CHANGE_CIPHER_SPEC = 20,
+	TLS_RECORD_TYPE_ALERT = 21,
+	TLS_RECORD_TYPE_HANDSHAKE = 22,
+	TLS_RECORD_TYPE_DATA = 23,
+	TLS_RECORD_TYPE_HEARTBEAT = 24,
+	TLS_RECORD_TYPE_TLS12_CID = 25,
+	TLS_RECORD_TYPE_ACK = 26,
+};
+
+#endif /* _TLS_PROT_H */
-- 
cgit v1.2.3

From 0257427146e84af365612508ace9d0d87dfb7d7a Mon Sep 17 00:00:00 2001
From: Chuck Lever
Date: Thu, 27 Jul 2023 13:35:50 -0400
Subject: net/tls: Add TLS Alert definitions

I'm about to add support for kernel handshake API consumers to send
TLS Alerts, so introduce the needed protocol definitions in the new
header tls_prot.h.

This presages support for Closure alerts. Also, support for alerts
is a prerequisite for handling session re-keying, where one peer will
signal the need for a re-key by sending a TLS Alert.
Reviewed-by: Hannes Reinecke Signed-off-by: Chuck Lever Link: https://lore.kernel.org/r/169047934064.5241.8377890858495063518.stgit@oracle-102.nfsv4bat.org Signed-off-by: Jakub Kicinski --- include/net/tls_prot.h | 42 ++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 42 insertions(+) (limited to 'include') diff --git a/include/net/tls_prot.h b/include/net/tls_prot.h index 47d6cfd1619e..68a40756440b 100644 --- a/include/net/tls_prot.h +++ b/include/net/tls_prot.h @@ -23,4 +23,46 @@ enum { TLS_RECORD_TYPE_ACK = 26, }; +/* + * TLS Alert protocol: AlertLevel + */ +enum { + TLS_ALERT_LEVEL_WARNING = 1, + TLS_ALERT_LEVEL_FATAL = 2, +}; + +/* + * TLS Alert protocol: AlertDescription + */ +enum { + TLS_ALERT_DESC_CLOSE_NOTIFY = 0, + TLS_ALERT_DESC_UNEXPECTED_MESSAGE = 10, + TLS_ALERT_DESC_BAD_RECORD_MAC = 20, + TLS_ALERT_DESC_RECORD_OVERFLOW = 22, + TLS_ALERT_DESC_HANDSHAKE_FAILURE = 40, + TLS_ALERT_DESC_BAD_CERTIFICATE = 42, + TLS_ALERT_DESC_UNSUPPORTED_CERTIFICATE = 43, + TLS_ALERT_DESC_CERTIFICATE_REVOKED = 44, + TLS_ALERT_DESC_CERTIFICATE_EXPIRED = 45, + TLS_ALERT_DESC_CERTIFICATE_UNKNOWN = 46, + TLS_ALERT_DESC_ILLEGAL_PARAMETER = 47, + TLS_ALERT_DESC_UNKNOWN_CA = 48, + TLS_ALERT_DESC_ACCESS_DENIED = 49, + TLS_ALERT_DESC_DECODE_ERROR = 50, + TLS_ALERT_DESC_DECRYPT_ERROR = 51, + TLS_ALERT_DESC_TOO_MANY_CIDS_REQUESTED = 52, + TLS_ALERT_DESC_PROTOCOL_VERSION = 70, + TLS_ALERT_DESC_INSUFFICIENT_SECURITY = 71, + TLS_ALERT_DESC_INTERNAL_ERROR = 80, + TLS_ALERT_DESC_INAPPROPRIATE_FALLBACK = 86, + TLS_ALERT_DESC_USER_CANCELED = 90, + TLS_ALERT_DESC_MISSING_EXTENSION = 109, + TLS_ALERT_DESC_UNSUPPORTED_EXTENSION = 110, + TLS_ALERT_DESC_UNRECOGNIZED_NAME = 112, + TLS_ALERT_DESC_BAD_CERTIFICATE_STATUS_RESPONSE = 113, + TLS_ALERT_DESC_UNKNOWN_PSK_IDENTITY = 115, + TLS_ALERT_DESC_CERTIFICATE_REQUIRED = 116, + TLS_ALERT_DESC_NO_APPLICATION_PROTOCOL = 120, +}; + #endif /* _TLS_PROT_H */ -- cgit v1.2.3 From 35b1b538d422fd765d88fbdaaa6e06ee466d9f93 Mon Sep 17 00:00:00 2001 
From: Chuck Lever Date: Thu, 27 Jul 2023 13:36:17 -0400 Subject: net/handshake: Add API for sending TLS Closure alerts This helper sends an alert only if a TLS session was established. Signed-off-by: Chuck Lever Link: https://lore.kernel.org/r/169047936730.5241.618595693821012638.stgit@oracle-102.nfsv4bat.org Signed-off-by: Jakub Kicinski --- include/net/handshake.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include') diff --git a/include/net/handshake.h b/include/net/handshake.h index 2e26e436e85f..bb88dfa6e3c9 100644 --- a/include/net/handshake.h +++ b/include/net/handshake.h @@ -40,5 +40,6 @@ int tls_server_hello_x509(const struct tls_handshake_args *args, gfp_t flags); int tls_server_hello_psk(const struct tls_handshake_args *args, gfp_t flags); bool tls_handshake_cancel(struct sock *sk); +void tls_handshake_close(struct socket *sock); #endif /* _NET_HANDSHAKE_H */ -- cgit v1.2.3 From 39d0e38dcced8d4da92cd11f3ff618bacc42d8a9 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Thu, 27 Jul 2023 13:37:10 -0400 Subject: net/handshake: Add helpers for parsing incoming TLS Alerts Kernel TLS consumers can replace common TLS Alert parsing code with these helpers. 
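
For illustration only, the kind of open-coded parsing a consumer would
otherwise carry can be sketched in userspace C. This is a minimal
sketch, not the code added by this patch: the sketch_* names and the
flat buffer interface are assumptions made for the example (the kernel
helpers take a struct msghdr), and only the numeric values come from
tls_prot.h. It relies on the TLS Alert payload being two bytes,
AlertLevel followed by AlertDescription.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Values as defined in tls_prot.h earlier in this series. */
enum {
	TLS_RECORD_TYPE_ALERT = 21,
};

enum {
	TLS_ALERT_DESC_CLOSE_NOTIFY = 0,
};

/*
 * Hypothetical helper, not the kernel's tls_alert_recv(): split a
 * two-byte Alert payload into its level and description fields.
 * Returns false when the payload is too short to hold both.
 */
static bool sketch_alert_parse(const uint8_t *buf, size_t len,
			       uint8_t *level, uint8_t *description)
{
	if (len < 2)
		return false;
	*level = buf[0];
	*description = buf[1];
	return true;
}

/*
 * Hypothetical helper: true when the record is an Alert whose
 * description is close_notify, i.e. the peer is closing the session.
 */
static bool sketch_alert_is_closure(uint8_t record_type,
				    const uint8_t *buf, size_t len)
{
	uint8_t level, description;

	if (record_type != TLS_RECORD_TYPE_ALERT)
		return false;
	if (!sketch_alert_parse(buf, len, &level, &description))
		return false;
	return description == TLS_ALERT_DESC_CLOSE_NOTIFY;
}
```

A consumer would first check the record type taken from the control
message and only then look at the payload, which is the same split the
two helper declarations in this patch suggest.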
Signed-off-by: Chuck Lever Link: https://lore.kernel.org/r/169047942074.5241.13791647439480672048.stgit@oracle-102.nfsv4bat.org Signed-off-by: Jakub Kicinski --- include/net/handshake.h | 4 ++++ 1 file changed, 4 insertions(+) (limited to 'include') diff --git a/include/net/handshake.h b/include/net/handshake.h index bb88dfa6e3c9..8ebd4f9ed26e 100644 --- a/include/net/handshake.h +++ b/include/net/handshake.h @@ -42,4 +42,8 @@ int tls_server_hello_psk(const struct tls_handshake_args *args, gfp_t flags); bool tls_handshake_cancel(struct sock *sk); void tls_handshake_close(struct socket *sock); +u8 tls_get_record_type(const struct sock *sk, const struct cmsghdr *msg); +void tls_alert_recv(const struct sock *sk, const struct msghdr *msg, + u8 *level, u8 *description); + #endif /* _NET_HANDSHAKE_H */ -- cgit v1.2.3 From b470985c76df6d53a9454670fb7551e1197f55e2 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Thu, 27 Jul 2023 13:38:04 -0400 Subject: net/handshake: Trace events for TLS Alert helpers Add observability for the new TLS Alert infrastructure. 
Reviewed-by: Hannes Reinecke Signed-off-by: Chuck Lever Link: https://lore.kernel.org/r/169047947409.5241.14548832149596892717.stgit@oracle-102.nfsv4bat.org Signed-off-by: Jakub Kicinski --- include/trace/events/handshake.h | 160 +++++++++++++++++++++++++++++++++++++++ 1 file changed, 160 insertions(+) (limited to 'include') diff --git a/include/trace/events/handshake.h b/include/trace/events/handshake.h index 8dadcab5f12a..bdd8a03cf5ba 100644 --- a/include/trace/events/handshake.h +++ b/include/trace/events/handshake.h @@ -6,7 +6,86 @@ #define _TRACE_HANDSHAKE_H #include +#include #include +#include + +#define TLS_RECORD_TYPE_LIST \ + record_type(CHANGE_CIPHER_SPEC) \ + record_type(ALERT) \ + record_type(HANDSHAKE) \ + record_type(DATA) \ + record_type(HEARTBEAT) \ + record_type(TLS12_CID) \ + record_type_end(ACK) + +#undef record_type +#undef record_type_end +#define record_type(x) TRACE_DEFINE_ENUM(TLS_RECORD_TYPE_##x); +#define record_type_end(x) TRACE_DEFINE_ENUM(TLS_RECORD_TYPE_##x); + +TLS_RECORD_TYPE_LIST + +#undef record_type +#undef record_type_end +#define record_type(x) { TLS_RECORD_TYPE_##x, #x }, +#define record_type_end(x) { TLS_RECORD_TYPE_##x, #x } + +#define show_tls_content_type(type) \ + __print_symbolic(type, TLS_RECORD_TYPE_LIST) + +TRACE_DEFINE_ENUM(TLS_ALERT_LEVEL_WARNING); +TRACE_DEFINE_ENUM(TLS_ALERT_LEVEL_FATAL); + +#define show_tls_alert_level(level) \ + __print_symbolic(level, \ + { TLS_ALERT_LEVEL_WARNING, "Warning" }, \ + { TLS_ALERT_LEVEL_FATAL, "Fatal" }) + +#define TLS_ALERT_DESCRIPTION_LIST \ + alert_description(CLOSE_NOTIFY) \ + alert_description(UNEXPECTED_MESSAGE) \ + alert_description(BAD_RECORD_MAC) \ + alert_description(RECORD_OVERFLOW) \ + alert_description(HANDSHAKE_FAILURE) \ + alert_description(BAD_CERTIFICATE) \ + alert_description(UNSUPPORTED_CERTIFICATE) \ + alert_description(CERTIFICATE_REVOKED) \ + alert_description(CERTIFICATE_EXPIRED) \ + alert_description(CERTIFICATE_UNKNOWN) \ + 
alert_description(ILLEGAL_PARAMETER) \ + alert_description(UNKNOWN_CA) \ + alert_description(ACCESS_DENIED) \ + alert_description(DECODE_ERROR) \ + alert_description(DECRYPT_ERROR) \ + alert_description(TOO_MANY_CIDS_REQUESTED) \ + alert_description(PROTOCOL_VERSION) \ + alert_description(INSUFFICIENT_SECURITY) \ + alert_description(INTERNAL_ERROR) \ + alert_description(INAPPROPRIATE_FALLBACK) \ + alert_description(USER_CANCELED) \ + alert_description(MISSING_EXTENSION) \ + alert_description(UNSUPPORTED_EXTENSION) \ + alert_description(UNRECOGNIZED_NAME) \ + alert_description(BAD_CERTIFICATE_STATUS_RESPONSE) \ + alert_description(UNKNOWN_PSK_IDENTITY) \ + alert_description(CERTIFICATE_REQUIRED) \ + alert_description_end(NO_APPLICATION_PROTOCOL) + +#undef alert_description +#undef alert_description_end +#define alert_description(x) TRACE_DEFINE_ENUM(TLS_ALERT_DESC_##x); +#define alert_description_end(x) TRACE_DEFINE_ENUM(TLS_ALERT_DESC_##x); + +TLS_ALERT_DESCRIPTION_LIST + +#undef alert_description +#undef alert_description_end +#define alert_description(x) { TLS_ALERT_DESC_##x, #x }, +#define alert_description_end(x) { TLS_ALERT_DESC_##x, #x } + +#define show_tls_alert_description(desc) \ + __print_symbolic(desc, TLS_ALERT_DESCRIPTION_LIST) DECLARE_EVENT_CLASS(handshake_event_class, TP_PROTO( @@ -106,6 +185,47 @@ DECLARE_EVENT_CLASS(handshake_error_class, ), \ TP_ARGS(net, req, sk, err)) +DECLARE_EVENT_CLASS(handshake_alert_class, + TP_PROTO( + const struct sock *sk, + unsigned char level, + unsigned char description + ), + TP_ARGS(sk, level, description), + TP_STRUCT__entry( + /* sockaddr_in6 is always bigger than sockaddr_in */ + __array(__u8, saddr, sizeof(struct sockaddr_in6)) + __array(__u8, daddr, sizeof(struct sockaddr_in6)) + __field(unsigned int, netns_ino) + __field(unsigned long, level) + __field(unsigned long, description) + ), + TP_fast_assign( + const struct inet_sock *inet = inet_sk(sk); + + memset(__entry->saddr, 0, sizeof(struct sockaddr_in6)); + 
memset(__entry->daddr, 0, sizeof(struct sockaddr_in6)); + TP_STORE_ADDR_PORTS(__entry, inet, sk); + + __entry->netns_ino = sock_net(sk)->ns.inum; + __entry->level = level; + __entry->description = description; + ), + TP_printk("src=%pISpc dest=%pISpc %s: %s", + __entry->saddr, __entry->daddr, + show_tls_alert_level(__entry->level), + show_tls_alert_description(__entry->description) + ) +); +#define DEFINE_HANDSHAKE_ALERT(name) \ + DEFINE_EVENT(handshake_alert_class, name, \ + TP_PROTO( \ + const struct sock *sk, \ + unsigned char level, \ + unsigned char description \ + ), \ + TP_ARGS(sk, level, description)) + /* * Request lifetime events @@ -154,6 +274,46 @@ DEFINE_HANDSHAKE_ERROR(handshake_cmd_accept_err); DEFINE_HANDSHAKE_FD_EVENT(handshake_cmd_done); DEFINE_HANDSHAKE_ERROR(handshake_cmd_done_err); +/* + * TLS Record events + */ + +TRACE_EVENT(tls_contenttype, + TP_PROTO( + const struct sock *sk, + unsigned char type + ), + TP_ARGS(sk, type), + TP_STRUCT__entry( + /* sockaddr_in6 is always bigger than sockaddr_in */ + __array(__u8, saddr, sizeof(struct sockaddr_in6)) + __array(__u8, daddr, sizeof(struct sockaddr_in6)) + __field(unsigned int, netns_ino) + __field(unsigned long, type) + ), + TP_fast_assign( + const struct inet_sock *inet = inet_sk(sk); + + memset(__entry->saddr, 0, sizeof(struct sockaddr_in6)); + memset(__entry->daddr, 0, sizeof(struct sockaddr_in6)); + TP_STORE_ADDR_PORTS(__entry, inet, sk); + + __entry->netns_ino = sock_net(sk)->ns.inum; + __entry->type = type; + ), + TP_printk("src=%pISpc dest=%pISpc %s", + __entry->saddr, __entry->daddr, + show_tls_content_type(__entry->type) + ) +); + +/* + * TLS Alert events + */ + +DEFINE_HANDSHAKE_ALERT(tls_alert_send); +DEFINE_HANDSHAKE_ALERT(tls_alert_recv); + #endif /* _TRACE_HANDSHAKE_H */ #include -- cgit v1.2.3 From 9abddac583d68e16258d5e0b95dc1b3ca1886173 Mon Sep 17 00:00:00 2001 From: Daniel Xu Date: Fri, 21 Jul 2023 14:22:45 -0600 Subject: netfilter: defrag: Add glue hooks for enabling/disabling 
defrag

We want to be able to enable/disable IP packet defrag from core
bpf/netfilter code. In other words, execute code from core that could
possibly be built as a module.

To help avoid symbol resolution errors, use glue hooks that the modules
will register callbacks with during module init.

Signed-off-by: Daniel Xu
Reviewed-by: Florian Westphal
Link: https://lore.kernel.org/r/f6a8824052441b72afe5285acedbd634bd3384c1.1689970773.git.dxu@dxuuu.xyz
Signed-off-by: Alexei Starovoitov
---
 include/linux/netfilter.h | 10 ++++++++++
 1 file changed, 10 insertions(+)

(limited to 'include')

diff --git a/include/linux/netfilter.h b/include/linux/netfilter.h
index d4fed4c508ca..d68644b7c299 100644
--- a/include/linux/netfilter.h
+++ b/include/linux/netfilter.h
@@ -11,6 +11,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -481,6 +482,15 @@ struct nfnl_ct_hook {
 };
 extern const struct nfnl_ct_hook __rcu *nfnl_ct_hook;
+struct nf_defrag_hook {
+	struct module *owner;
+	int (*enable)(struct net *net);
+	void (*disable)(struct net *net);
+};
+
+extern const struct nf_defrag_hook __rcu *nf_defrag_v4_hook;
+extern const struct nf_defrag_hook __rcu *nf_defrag_v6_hook;
+
 /*
  * nf_skb_duplicated - TEE target has sent a packet
  *
-- 
cgit v1.2.3

From 91721c2d02d3a0141df8a4787c7079b89b0d0607 Mon Sep 17 00:00:00 2001
From: Daniel Xu
Date: Fri, 21 Jul 2023 14:22:46 -0600
Subject: netfilter: bpf: Support BPF_F_NETFILTER_IP_DEFRAG in netfilter link

This commit adds support for enabling IP defrag using pre-existing
netfilter defrag support. Basically all the flag does is bump a refcnt
while the link is active. Checks are also added to ensure the prog
requesting defrag support is run _after_ netfilter defrag hooks.

We also take care to avoid any issues w.r.t. module unloading -- while
defrag is active on a link, the module is prevented from unloading.
Signed-off-by: Daniel Xu Reviewed-by: Florian Westphal Link: https://lore.kernel.org/r/5cff26f97e55161b7d56b09ddcf5f8888a5add1d.1689970773.git.dxu@dxuuu.xyz Signed-off-by: Alexei Starovoitov --- include/uapi/linux/bpf.h | 5 +++++ 1 file changed, 5 insertions(+) (limited to 'include') diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h index 14fd26b09e4b..70da85200695 100644 --- a/include/uapi/linux/bpf.h +++ b/include/uapi/linux/bpf.h @@ -1188,6 +1188,11 @@ enum bpf_perf_event_type { */ #define BPF_F_KPROBE_MULTI_RETURN (1U << 0) +/* link_create.netfilter.flags used in LINK_CREATE command for + * BPF_PROG_TYPE_NETFILTER to enable IP packet defragmentation. + */ +#define BPF_F_NETFILTER_IP_DEFRAG (1U << 0) + /* When BPF ldimm64's insn[0].src_reg != 0 then this can have * the following extensions: * -- cgit v1.2.3 From 61c5145317a23b3ee03d3aa01d46df57d75e4dee Mon Sep 17 00:00:00 2001 From: YueHaibing Date: Wed, 26 Jul 2023 22:38:16 +0800 Subject: bonding: 3ad: Remove unused declaration bond_3ad_update_lacp_active() This is not used since commit 3a755cd8b7c6 ("bonding: add new option lacp_active") Signed-off-by: YueHaibing Acked-by: Hangbin Liu Link: https://lore.kernel.org/r/20230726143816.15280-1-yuehaibing@huawei.com Signed-off-by: Jakub Kicinski --- include/net/bond_3ad.h | 1 - 1 file changed, 1 deletion(-) (limited to 'include') diff --git a/include/net/bond_3ad.h b/include/net/bond_3ad.h index a016f275cb01..c5e57c6bd873 100644 --- a/include/net/bond_3ad.h +++ b/include/net/bond_3ad.h @@ -301,7 +301,6 @@ int __bond_3ad_get_active_agg_info(struct bonding *bond, int bond_3ad_lacpdu_recv(const struct sk_buff *skb, struct bonding *bond, struct slave *slave); int bond_3ad_set_carrier(struct bonding *bond); -void bond_3ad_update_lacp_active(struct bonding *bond); void bond_3ad_update_lacp_rate(struct bonding *bond); void bond_3ad_update_ad_actor_settings(struct bonding *bond); int bond_3ad_stats_fill(struct sk_buff *skb, struct bond_3ad_stats *stats); -- 
cgit v1.2.3 From 2b3082c6ef3b0104d822f6f18d2afbe5fc9a5c2c Mon Sep 17 00:00:00 2001 From: Ratheesh Kannoth Date: Sat, 29 Jul 2023 04:52:15 +0530 Subject: net: flow_dissector: Use 64bits for used_keys As 32bits of dissector->used_keys are exhausted, increase the size to 64bits. This is base change for ESP/AH flow dissector patch. Please find patch and discussions at https://lore.kernel.org/netdev/ZMDNjD46BvZ5zp5I@corigine.com/T/#t Signed-off-by: Ratheesh Kannoth Reviewed-by: Petr Machata # for mlxsw Tested-by: Petr Machata Reviewed-by: Martin Habets Reviewed-by: Simon Horman Reviewed-by: Vladimir Oltean Signed-off-by: David S. Miller --- include/net/flow_dissector.h | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) (limited to 'include') diff --git a/include/net/flow_dissector.h b/include/net/flow_dissector.h index 8664ed4fbbdf..830f06b2f36d 100644 --- a/include/net/flow_dissector.h +++ b/include/net/flow_dissector.h @@ -370,7 +370,8 @@ struct flow_dissector_key { }; struct flow_dissector { - unsigned int used_keys; /* each bit repesents presence of one key id */ + unsigned long long used_keys; + /* each bit represents presence of one key id */ unsigned short int offset[FLOW_DISSECTOR_KEY_MAX]; }; @@ -430,7 +431,7 @@ void skb_flow_get_icmp_tci(const struct sk_buff *skb, static inline bool dissector_uses_key(const struct flow_dissector *flow_dissector, enum flow_dissector_key_id key_id) { - return flow_dissector->used_keys & (1 << key_id); + return flow_dissector->used_keys & (1ULL << key_id); } static inline void *skb_flow_dissector_target(struct flow_dissector *flow_dissector, -- cgit v1.2.3 From 2628d40899d1acb5120993bef651595787ddaa8e Mon Sep 17 00:00:00 2001 From: Yue Haibing Date: Fri, 28 Jul 2023 21:21:13 +0800 Subject: devlink: Remove unused extern declaration devlink_port_region_destroy() devlink_port_region_destroy() is never implemented since commit 544e7c33ec2f ("net: devlink: Add support for port regions"). 
Signed-off-by: Yue Haibing
Reviewed-by: Simon Horman
Link: https://lore.kernel.org/r/20230728132113.32888-1-yuehaibing@huawei.com
Signed-off-by: Jakub Kicinski
---
 include/net/devlink.h | 2 --
 1 file changed, 2 deletions(-)

(limited to 'include')

diff --git a/include/net/devlink.h b/include/net/devlink.h
index 0cdb4b16e5b5..a1a8e1b6e7df 100644
--- a/include/net/devlink.h
+++ b/include/net/devlink.h
@@ -1790,8 +1790,6 @@ devlink_port_region_create(struct devlink_port *port,
 			   u32 region_max_snapshots, u64 region_size);
 void devl_region_destroy(struct devlink_region *region);
 void devlink_region_destroy(struct devlink_region *region);
-void devlink_port_region_destroy(struct devlink_region *region);
-
 int devlink_region_snapshot_id_get(struct devlink *devlink, u32 *id);
 void devlink_region_snapshot_id_put(struct devlink *devlink, u32 id);
 int devlink_region_snapshot_create(struct devlink_region *region,
-- 
cgit v1.2.3

From 68223f96997e8ac2bb1751a72a211d1551a0dbcd Mon Sep 17 00:00:00 2001
From: Yue Haibing
Date: Sat, 29 Jul 2023 20:26:44 +0800
Subject: tcp: Remove unused function declarations

Commit 8a59f9d1e3d4 ("sock: Introduce sk->sk_prot->psock_update_sk_prot()")
left behind the tcp_bpf_get_proto() declaration, and the
tcp_v4_tw_remember_stamp() function was removed in ccb7c410ddc0
("timewait_sock: Create and use getpeer op.").
Since commit 686989700cab ("tcp: simplify tcp_mark_skb_lost"), the
tcp_skb_mark_lost_uncond_verify() declaration is not used anymore.
Signed-off-by: Yue Haibing Reviewed-by: Simon Horman Reviewed-by: Eric Dumazet Link: https://lore.kernel.org/r/20230729122644.10648-1-yuehaibing@huawei.com Signed-off-by: Jakub Kicinski --- include/net/tcp.h | 3 --- 1 file changed, 3 deletions(-) (limited to 'include') diff --git a/include/net/tcp.h b/include/net/tcp.h index 6ebf54992ffe..6d77c08d83b7 100644 --- a/include/net/tcp.h +++ b/include/net/tcp.h @@ -323,7 +323,6 @@ int tcp_v4_early_demux(struct sk_buff *skb); int tcp_v4_rcv(struct sk_buff *skb); void tcp_remove_empty_skb(struct sock *sk); -int tcp_v4_tw_remember_stamp(struct inet_timewait_sock *tw); int tcp_sendmsg(struct sock *sk, struct msghdr *msg, size_t size); int tcp_sendmsg_locked(struct sock *sk, struct msghdr *msg, size_t size); int tcp_sendmsg_fastopen(struct sock *sk, struct msghdr *msg, int *copied, @@ -622,7 +621,6 @@ void tcp_skb_collapse_tstamp(struct sk_buff *skb, void tcp_rearm_rto(struct sock *sk); void tcp_synack_rtt_meas(struct sock *sk, struct request_sock *req); void tcp_reset(struct sock *sk, struct sk_buff *skb); -void tcp_skb_mark_lost_uncond_verify(struct tcp_sock *tp, struct sk_buff *skb); void tcp_fin(struct sock *sk); void tcp_check_space(struct sock *sk); void tcp_sack_compress_send_ack(struct sock *sk); @@ -2360,7 +2358,6 @@ struct sk_msg; struct sk_psock; #ifdef CONFIG_BPF_SYSCALL -struct proto *tcp_bpf_get_proto(struct sock *sk, struct sk_psock *psock); int tcp_bpf_update_proto(struct sock *sk, struct sk_psock *psock, bool restore); void tcp_bpf_clone(const struct sock *sk, struct sock *newsk); #endif /* CONFIG_BPF_SYSCALL */ -- cgit v1.2.3 From 079082c60affefeb9d2bd4176a4f2b390a9ccfda Mon Sep 17 00:00:00 2001 From: Martin KaFai Lau Date: Fri, 28 Jul 2023 23:47:17 +0200 Subject: tcx: Fix splat during dev unregister During unregister_netdevice_many_notify(), the ordering of our concerned function calls is like this: unregister_netdevice_many_notify dev_shutdown qdisc_put clsact_destroy tcx_uninstall The syzbot reproducer 
triggered a case that the qdisc refcnt is not zero during dev_shutdown(). tcx_uninstall() will then WARN_ON_ONCE(tcx_entry(entry)->miniq_active) because the miniq is still active and the entry should not be freed. The latter assumed that qdisc destruction happens before tcx teardown. This fix is to avoid tcx_uninstall() doing tcx_entry_free() when the miniq is still alive and let the clsact_destroy() do the free later, so that we do not assume any specific ordering for either of them. If still active, tcx_uninstall() does clear the entry when flushing out the prog/link. clsact_destroy() will then notice the "!tcx_entry_is_active()" and then does the tcx_entry_free() eventually. Fixes: e420bed02507 ("bpf: Add fd-based tcx multi-prog infra with link support") Reported-by: syzbot+376a289e86a0fd02b9ba@syzkaller.appspotmail.com Reported-by: Leon Romanovsky Signed-off-by: Martin KaFai Lau Co-developed-by: Daniel Borkmann Signed-off-by: Daniel Borkmann Tested-by: syzbot+376a289e86a0fd02b9ba@syzkaller.appspotmail.com Tested-by: Leon Romanovsky Link: https://lore.kernel.org/r/222255fe07cb58f15ee662e7ee78328af5b438e4.1690549248.git.daniel@iogearbox.net Signed-off-by: Jakub Kicinski --- include/linux/bpf_mprog.h | 16 ++++++++++++++++ 1 file changed, 16 insertions(+) (limited to 'include') diff --git a/include/linux/bpf_mprog.h b/include/linux/bpf_mprog.h index 2b429488f840..929225f7b095 100644 --- a/include/linux/bpf_mprog.h +++ b/include/linux/bpf_mprog.h @@ -256,6 +256,22 @@ static inline void bpf_mprog_entry_copy(struct bpf_mprog_entry *dst, memcpy(dst->fp_items, src->fp_items, sizeof(src->fp_items)); } +static inline void bpf_mprog_entry_clear(struct bpf_mprog_entry *dst) +{ + memset(dst->fp_items, 0, sizeof(dst->fp_items)); +} + +static inline void bpf_mprog_clear_all(struct bpf_mprog_entry *entry, + struct bpf_mprog_entry **entry_new) +{ + struct bpf_mprog_entry *peer; + + peer = bpf_mprog_peer(entry); + bpf_mprog_entry_clear(peer); + peer->parent->count = 0; + 
*entry_new = peer;
+}
+
 static inline void bpf_mprog_entry_grow(struct bpf_mprog_entry *entry, int idx)
 {
 	int total = bpf_mprog_total(entry);
-- 
cgit v1.2.3

From 8798481b667fa7c9bbd5aa843bf1557ada699964 Mon Sep 17 00:00:00 2001
From: Pedro Tammela
Date: Fri, 28 Jul 2023 12:35:33 -0300
Subject: net/sched: wrap open coded Qdics class filter counter

The 'filter_cnt' counter is used to control a Qdisc class lifetime.
Each filter referencing this class by its id will eventually
increment/decrement this counter in their respective
'add/update/delete' routines.
As these operations are always serialized under rtnl lock, we don't
need an atomic type like 'refcount_t'.

It also means that we lose the overflow/underflow checks already
present in refcount_t, which are valuable for hunting down bugs
where the unsigned counter wraps around, as they help automated tools
like syzkaller scream in such situations.

Wrap the open coded increment/decrement into helper functions and
add overflow checks to the operations.
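
The wrapping behaviour the helpers guard against can be sketched in
userspace C as follows. This is only an illustration under stated
assumptions: struct sketch_class and the sketch_* functions are
hypothetical stand-ins for the helpers in the diff below, the compiler
builtins are used directly where the patch uses check_add_overflow()
and check_sub_overflow(), and a boolean return stands in for WARN().

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for struct Qdisc_class_common. */
struct sketch_class {
	unsigned int filter_cnt;
};

/* A class is in use while at least one filter references it. */
static bool sketch_class_in_use(const struct sketch_class *cl)
{
	return cl->filter_cnt > 0;
}

/*
 * Take a reference. Returns false when the increment wrapped around;
 * the wrapped value is still stored, as in the patch.
 */
static bool sketch_class_get(struct sketch_class *cl)
{
	unsigned int res;
	bool overflow = __builtin_add_overflow(cl->filter_cnt, 1u, &res);

	cl->filter_cnt = res;
	return !overflow;
}

/*
 * Drop a reference. Returns false when the counter was already zero
 * and the decrement wrapped to UINT_MAX.
 */
static bool sketch_class_put(struct sketch_class *cl)
{
	unsigned int res;
	bool underflow = __builtin_sub_overflow(cl->filter_cnt, 1u, &res);

	cl->filter_cnt = res;
	return !underflow;
}
```

An unbalanced put on a counter that is already zero is the case worth
catching: the plain decrement would silently leave the class looking
heavily referenced, while the checked version reports it at the point
where it happens.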
Acked-by: Jamal Hadi Salim Signed-off-by: Pedro Tammela Reviewed-by: Simon Horman Signed-off-by: Paolo Abeni --- include/net/sch_generic.h | 26 ++++++++++++++++++++++++++ 1 file changed, 26 insertions(+) (limited to 'include') diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h index 15be2d96b06d..f232512505f8 100644 --- a/include/net/sch_generic.h +++ b/include/net/sch_generic.h @@ -599,6 +599,7 @@ get_default_qdisc_ops(const struct net_device *dev, int ntx) struct Qdisc_class_common { u32 classid; + unsigned int filter_cnt; struct hlist_node hnode; }; @@ -633,6 +634,31 @@ qdisc_class_find(const struct Qdisc_class_hash *hash, u32 id) return NULL; } +static inline bool qdisc_class_in_use(const struct Qdisc_class_common *cl) +{ + return cl->filter_cnt > 0; +} + +static inline void qdisc_class_get(struct Qdisc_class_common *cl) +{ + unsigned int res; + + if (check_add_overflow(cl->filter_cnt, 1, &res)) + WARN(1, "Qdisc class overflow"); + + cl->filter_cnt = res; +} + +static inline void qdisc_class_put(struct Qdisc_class_common *cl) +{ + unsigned int res; + + if (check_sub_overflow(cl->filter_cnt, 1, &res)) + WARN(1, "Qdisc class underflow"); + + cl->filter_cnt = res; +} + static inline int tc_classid_to_hwtc(struct net_device *dev, u32 classid) { u32 hwtc = TC_H_MIN(classid) - TC_H_MIN_PRIORITY; -- cgit v1.2.3 From 999d0863ff6416dd33c904f496bf363c8afb6540 Mon Sep 17 00:00:00 2001 From: Yue Haibing Date: Mon, 31 Jul 2023 22:04:37 +0800 Subject: inet6: Remove unused function declaration udpv6_connect() This is never implemented since the beginning of git history. 
Signed-off-by: Yue Haibing Reviewed-by: Kuniyuki Iwashima Link: https://lore.kernel.org/r/20230731140437.37056-1-yuehaibing@huawei.com Signed-off-by: Jakub Kicinski --- include/net/transp_v6.h | 2 -- 1 file changed, 2 deletions(-) (limited to 'include') diff --git a/include/net/transp_v6.h b/include/net/transp_v6.h index d27b1caf3753..1a97e3f32029 100644 --- a/include/net/transp_v6.h +++ b/include/net/transp_v6.h @@ -33,8 +33,6 @@ void udplitev6_exit(void); int tcpv6_init(void); void tcpv6_exit(void); -int udpv6_connect(struct sock *sk, struct sockaddr *uaddr, int addr_len); - /* this does all the common and the specific ctl work */ void ip6_datagram_recv_ctl(struct sock *sk, struct msghdr *msg, struct sk_buff *skb); -- cgit v1.2.3 From 394bd87764b615b0fc17d34127a1cc7da76ff49f Mon Sep 17 00:00:00 2001 From: Gavin Li Date: Mon, 31 Jul 2023 10:06:55 +0300 Subject: virtio_net: support per queue interrupt coalesce command Add interrupt_coalesce config in send_queue and receive_queue to cache user config. Send per virtqueue interrupt moderation config to underlying device in order to have more efficient interrupt moderation and cpu utilization of guest VM. Additionally, address all the VQs when updating the global configuration, as now the individual VQs configuration can diverge from the global configuration. Signed-off-by: Gavin Li Reviewed-by: Dragos Tatulea Reviewed-by: Jiri Pirko Acked-by: Michael S. 
Tsirkin Reviewed-by: Heng Qi Acked-by: Jason Wang Link: https://lore.kernel.org/r/20230731070656.96411-3-gavinl@nvidia.com Signed-off-by: Jakub Kicinski --- include/uapi/linux/virtio_net.h | 14 ++++++++++++++ 1 file changed, 14 insertions(+) (limited to 'include') diff --git a/include/uapi/linux/virtio_net.h b/include/uapi/linux/virtio_net.h index 12c1c9699935..cc65ef0f3c3e 100644 --- a/include/uapi/linux/virtio_net.h +++ b/include/uapi/linux/virtio_net.h @@ -56,6 +56,7 @@ #define VIRTIO_NET_F_MQ 22 /* Device supports Receive Flow * Steering */ #define VIRTIO_NET_F_CTRL_MAC_ADDR 23 /* Set MAC address */ +#define VIRTIO_NET_F_VQ_NOTF_COAL 52 /* Device supports virtqueue notification coalescing */ #define VIRTIO_NET_F_NOTF_COAL 53 /* Device supports notifications coalescing */ #define VIRTIO_NET_F_GUEST_USO4 54 /* Guest can handle USOv4 in. */ #define VIRTIO_NET_F_GUEST_USO6 55 /* Guest can handle USOv6 in. */ @@ -391,5 +392,18 @@ struct virtio_net_ctrl_coal_rx { }; #define VIRTIO_NET_CTRL_NOTF_COAL_RX_SET 1 +#define VIRTIO_NET_CTRL_NOTF_COAL_VQ_SET 2 +#define VIRTIO_NET_CTRL_NOTF_COAL_VQ_GET 3 + +struct virtio_net_ctrl_coal { + __le32 max_packets; + __le32 max_usecs; +}; + +struct virtio_net_ctrl_coal_vq { + __le16 vqn; + __le16 reserved; + struct virtio_net_ctrl_coal coal; +}; #endif /* _UAPI_LINUX_VIRTIO_NET_H */ -- cgit v1.2.3 From a57c34a80cbe15e36e12d42a4ddc5160a5bbb1a4 Mon Sep 17 00:00:00 2001 From: Ratheesh Kannoth Date: Tue, 1 Aug 2023 07:10:58 +0530 Subject: net: flow_dissector: Add IPSEC dissector Support for dissecting IPSEC field SPI (which is 32bits in size) for ESP and AH packets. Signed-off-by: Ratheesh Kannoth Signed-off-by: David S. 
Miller --- include/net/flow_dissector.h | 9 +++++++++ 1 file changed, 9 insertions(+) (limited to 'include') diff --git a/include/net/flow_dissector.h b/include/net/flow_dissector.h index 830f06b2f36d..1a7131d6cb0e 100644 --- a/include/net/flow_dissector.h +++ b/include/net/flow_dissector.h @@ -301,6 +301,14 @@ struct flow_dissector_key_l2tpv3 { __be32 session_id; }; +/** + * struct flow_dissector_key_ipsec: + * @spi: identifier for a ipsec connection + */ +struct flow_dissector_key_ipsec { + __be32 spi; +}; + /** * struct flow_dissector_key_cfm * @mdl_ver: maintenance domain level (mdl) and cfm protocol version @@ -354,6 +362,7 @@ enum flow_dissector_key_id { FLOW_DISSECTOR_KEY_PPPOE, /* struct flow_dissector_key_pppoe */ FLOW_DISSECTOR_KEY_L2TPV3, /* struct flow_dissector_key_l2tpv3 */ FLOW_DISSECTOR_KEY_CFM, /* struct flow_dissector_key_cfm */ + FLOW_DISSECTOR_KEY_IPSEC, /* struct flow_dissector_key_ipsec */ FLOW_DISSECTOR_KEY_MAX, }; -- cgit v1.2.3 From 4c13eda757e3ca72f523d07ed9e5f3e72b374299 Mon Sep 17 00:00:00 2001 From: Ratheesh Kannoth Date: Tue, 1 Aug 2023 07:10:59 +0530 Subject: tc: flower: support for SPI tc flower rules support to classify ESP/AH packets matching SPI field. Signed-off-by: Ratheesh Kannoth Signed-off-by: David S. Miller --- include/uapi/linux/pkt_cls.h | 3 +++ 1 file changed, 3 insertions(+) (limited to 'include') diff --git a/include/uapi/linux/pkt_cls.h b/include/uapi/linux/pkt_cls.h index 7865f5a9885b..75506f157340 100644 --- a/include/uapi/linux/pkt_cls.h +++ b/include/uapi/linux/pkt_cls.h @@ -598,6 +598,9 @@ enum { TCA_FLOWER_KEY_CFM, /* nested */ + TCA_FLOWER_KEY_SPI, /* be32 */ + TCA_FLOWER_KEY_SPI_MASK, /* be32 */ + __TCA_FLOWER_MAX, }; -- cgit v1.2.3 From c8915d7329d6e9164c5c847dc1c56a2c0437053f Mon Sep 17 00:00:00 2001 From: Ratheesh Kannoth Date: Tue, 1 Aug 2023 07:11:00 +0530 Subject: tc: flower: Enable offload support IPSEC SPI field. This patch enables offload for TC classifier flower rules which matches against SPI field. 
Signed-off-by: Ratheesh Kannoth
Signed-off-by: David S. Miller
---
 include/net/flow_offload.h | 6 ++++++
 1 file changed, 6 insertions(+)

(limited to 'include')

diff --git a/include/net/flow_offload.h b/include/net/flow_offload.h
index 118082eae48c..9efa9a59e81f 100644
--- a/include/net/flow_offload.h
+++ b/include/net/flow_offload.h
@@ -64,6 +64,10 @@ struct flow_match_tcp {
 	struct flow_dissector_key_tcp *key, *mask;
 };
 
+struct flow_match_ipsec {
+	struct flow_dissector_key_ipsec *key, *mask;
+};
+
 struct flow_match_mpls {
 	struct flow_dissector_key_mpls *key, *mask;
 };
@@ -116,6 +120,8 @@ void flow_rule_match_ports_range(const struct flow_rule *rule,
 				 struct flow_match_ports_range *out);
 void flow_rule_match_tcp(const struct flow_rule *rule,
 			 struct flow_match_tcp *out);
+void flow_rule_match_ipsec(const struct flow_rule *rule,
+			   struct flow_match_ipsec *out);
 void flow_rule_match_icmp(const struct flow_rule *rule,
 			  struct flow_match_icmp *out);
 void flow_rule_match_mpls(const struct flow_rule *rule,
--
cgit v1.2.3

From 9e63a99c566faf2a4b8f6257ea7fea82106612cc Mon Sep 17 00:00:00 2001
From: Yue Haibing
Date: Tue, 1 Aug 2023 21:39:02 +0800
Subject: udp: Remove unused function declaration udp_bpf_get_proto()

Commit 8a59f9d1e3d4 ("sock: Introduce sk->sk_prot->psock_update_sk_prot()")
left this declaration behind.
Signed-off-by: Yue Haibing
Reviewed-by: Willem de Bruijn
Link: https://lore.kernel.org/r/20230801133902.3660-1-yuehaibing@huawei.com
Signed-off-by: Jakub Kicinski
---
 include/net/udp.h | 1 -
 1 file changed, 1 deletion(-)

(limited to 'include')

diff --git a/include/net/udp.h b/include/net/udp.h
index 4d13424f8f72..5a8421cd9083 100644
--- a/include/net/udp.h
+++ b/include/net/udp.h
@@ -529,7 +529,6 @@ static inline void udp_post_segment_fix_csum(struct sk_buff *skb)
 
 #ifdef CONFIG_BPF_SYSCALL
 struct sk_psock;
-struct proto *udp_bpf_get_proto(struct sock *sk, struct sk_psock *psock);
 int udp_bpf_update_proto(struct sock *sk, struct sk_psock *psock, bool restore);
 #endif
 
--
cgit v1.2.3

From 2fca1b5ef898b131c430eed6f07b8972fb6c9673 Mon Sep 17 00:00:00 2001
From: Yue Haibing
Date: Tue, 1 Aug 2023 22:31:29 +0800
Subject: ila: Remove unnecessary file net/ila.h

Commit 642c2c95585d ("ila: xlat changes") removed the ila_xlat_outgoing()
and ila_xlat_incoming() functions, which made this file unnecessary.
Signed-off-by: Yue Haibing
Reviewed-by: Larysa Zaremba
Reviewed-by: David Ahern
Link: https://lore.kernel.org/r/20230801143129.40652-1-yuehaibing@huawei.com
Signed-off-by: Jakub Kicinski
---
 include/net/ila.h | 16 ----------------
 1 file changed, 16 deletions(-)
 delete mode 100644 include/net/ila.h

(limited to 'include')

diff --git a/include/net/ila.h b/include/net/ila.h
deleted file mode 100644
index 73ebe5eab272..000000000000
--- a/include/net/ila.h
+++ /dev/null
@@ -1,16 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0-or-later */
-/*
- * ILA kernel interface
- *
- * Copyright (c) 2015 Tom Herbert
- */
-
-#ifndef _NET_ILA_H
-#define _NET_ILA_H
-
-struct sk_buff;
-
-int ila_xlat_outgoing(struct sk_buff *skb);
-int ila_xlat_incoming(struct sk_buff *skb);
-
-#endif /* _NET_ILA_H */
--
cgit v1.2.3

From f85b1c7da776d0cb2b4509bdd7f406fe5607930b Mon Sep 17 00:00:00 2001
From: Yue Haibing
Date: Tue, 1 Aug 2023 22:42:09 +0800
Subject: net: switchdev: Remove unused typedef switchdev_obj_dump_cb_t()

Commit 29ab586c3d83 ("net: switchdev: Remove bridge bypass support from
switchdev") left this typedef unused.
Signed-off-by: Yue Haibing
Reviewed-by: Simon Horman
Link: https://lore.kernel.org/r/20230801144209.27512-1-yuehaibing@huawei.com
Signed-off-by: Jakub Kicinski
---
 include/net/switchdev.h | 2 --
 1 file changed, 2 deletions(-)

(limited to 'include')

diff --git a/include/net/switchdev.h b/include/net/switchdev.h
index 4d324e2a2eef..0294cfec9c37 100644
--- a/include/net/switchdev.h
+++ b/include/net/switchdev.h
@@ -201,8 +201,6 @@ struct switchdev_obj_in_state_mrp {
 #define SWITCHDEV_OBJ_IN_STATE_MRP(OBJ) \
 	container_of((OBJ), struct switchdev_obj_in_state_mrp, obj)
 
-typedef int switchdev_obj_dump_cb_t(struct switchdev_obj *obj);
-
 struct switchdev_brport {
 	struct net_device *dev;
 	const void *ctx;
--
cgit v1.2.3

From 6a5a148aaf14747570cc634f9cdfcb0393f5617f Mon Sep 17 00:00:00 2001
From: Arnd Bergmann
Date: Tue, 1 Aug 2023 13:13:58 +0200
Subject: bpf: fix bpf_probe_read_kernel prototype mismatch

bpf_probe_read_kernel() has a __weak definition in core.c and another
definition with an incompatible prototype in kernel/trace/bpf_trace.c,
when CONFIG_BPF_EVENTS is enabled.

Since the two are incompatible, there cannot be a shared declaration in
a header file, but the lack of a prototype causes a W=1 warning:

kernel/bpf/core.c:1638:12: error: no previous prototype for 'bpf_probe_read_kernel' [-Werror=missing-prototypes]

On 32-bit architectures, the local prototype

u64 __weak bpf_probe_read_kernel(void *dst, u32 size, const void *unsafe_ptr)

passes arguments in different registers than the one in bpf_trace.c

BPF_CALL_3(bpf_probe_read_kernel, void *, dst, u32, size,
	   const void *, unsafe_ptr)

which uses 64-bit arguments in pairs of registers.

As both versions of the function are fairly simple and only really
differ in one line, just move them into a header file as an inline
function that does not add any overhead for the bpf_trace.c callers
and actually avoids a function call for the other one.
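As a rough userspace model of the shared-helper idea described above
(not the kernel code itself), both callers can funnel into one inline
function that zero-fills the destination when the copy fails. The
names model_copy_nofault() and model_probe_read_common() are
hypothetical, and treating a NULL source as the faulting case is an
assumption made only so the sketch can run outside the kernel.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for a fault-tolerant copy: this userspace model treats a
 * NULL source as the faulting case. Hypothetical, for illustration.
 */
static int model_copy_nofault(void *dst, const void *src, size_t size)
{
	if (!src)
		return -EFAULT;
	memcpy(dst, src, size);
	return 0;
}

/* One inline helper shared by both callers: on failure the destination
 * is zero-filled, so the caller never sees stale bytes.
 */
static inline int model_probe_read_common(void *dst, size_t size,
					  const void *unsafe_ptr)
{
	int ret = model_copy_nofault(dst, unsafe_ptr, size);

	if (ret < 0)
		memset(dst, 0, size);
	return ret;
}
```

Because the helper is inline and has a single prototype, each caller
can keep its own calling convention and simply wrap it.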
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/all/ac25cb0f-b804-1649-3afb-1dc6138c2716@iogearbox.net/
Signed-off-by: Arnd Bergmann
Acked-by: Yonghong Song
Link: https://lore.kernel.org/r/20230801111449.185301-1-arnd@kernel.org
Signed-off-by: Alexei Starovoitov
---
 include/linux/bpf.h | 12 ++++++++++++
 1 file changed, 12 insertions(+)

(limited to 'include')

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index ceaa8c23287f..abe75063630b 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -2661,6 +2661,18 @@ static inline void bpf_dynptr_set_rdonly(struct bpf_dynptr_kern *ptr)
 }
 #endif /* CONFIG_BPF_SYSCALL */
 
+static __always_inline int
+bpf_probe_read_kernel_common(void *dst, u32 size, const void *unsafe_ptr)
+{
+	int ret = -EFAULT;
+
+	if (IS_ENABLED(CONFIG_BPF_EVENTS))
+		ret = copy_from_kernel_nofault(dst, unsafe_ptr, size);
+	if (unlikely(ret < 0))
+		memset(dst, 0, size);
+	return ret;
+}
+
 void __bpf_free_used_btfs(struct bpf_prog_aux *aux,
 			  struct btf_mod_pair *used_btfs, u32 len);
 
--
cgit v1.2.3

From bf4ea1d0b2cb2251f9e5619c81daa98591087c33 Mon Sep 17 00:00:00 2001
From: Leon Hwang
Date: Tue, 1 Aug 2023 22:26:20 +0800
Subject: bpf, xdp: Add tracepoint to xdp attaching failure

When an error happens in dev_xdp_attach(), there should be a way to
report the error message to users, as the netlink approach does.

To avoid breaking uapi, adding a tracepoint in bpf_xdp_link_attach() is
an appropriate way to notify users of the error message.

Hence, bpf libraries are able to retrieve the error message through
this tracepoint, and then report it to users.
Signed-off-by: Leon Hwang
Link: https://lore.kernel.org/r/20230801142621.7925-2-hffilwlqm@gmail.com
Signed-off-by: Alexei Starovoitov
---
 include/trace/events/xdp.h | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

(limited to 'include')

diff --git a/include/trace/events/xdp.h b/include/trace/events/xdp.h
index c40fc97f9417..cd89f1d5ce7b 100644
--- a/include/trace/events/xdp.h
+++ b/include/trace/events/xdp.h
@@ -404,6 +404,23 @@ TRACE_EVENT(mem_return_failed,
 	)
 );
 
+TRACE_EVENT(bpf_xdp_link_attach_failed,
+
+	TP_PROTO(const char *msg),
+
+	TP_ARGS(msg),
+
+	TP_STRUCT__entry(
+		__string(msg, msg)
+	),
+
+	TP_fast_assign(
+		__assign_str(msg, msg);
+	),
+
+	TP_printk("errmsg=%s", __get_str(msg))
+);
+
 #endif /* _TRACE_XDP_H */
 
 #include
--
cgit v1.2.3

From 1762f132d54200ffa008e86f9f6c96ab4ee3fb71 Mon Sep 17 00:00:00 2001
From: Jianbo Liu
Date: Mon, 31 Jul 2023 14:28:16 +0300
Subject: net/mlx5e: Support IPsec packet offload for RX in switchdev mode

As decryption must be done first, add a new prio for IPsec offload in
FDB, and put it just lower than the BYPASS prio and higher than the TC
prio. Three levels are added for RX. The first one is for ip xfrm
policy. The SA table is created in the second level for ip xfrm state.
The status table is created in the last level to check the decryption
result. On success, packets continue to the next processing step;
otherwise they are dropped. For now, the setting of reg c1 is removed
for switchdev mode, and the datapath process will be added in the next
patch.
Signed-off-by: Jianbo Liu
Signed-off-by: Leon Romanovsky
Link: https://lore.kernel.org/r/c91063554cf643fb50b99cf093e8a9bf11729de5.1690802064.git.leon@kernel.org
Signed-off-by: Jakub Kicinski
---
 include/linux/mlx5/fs.h | 1 +
 1 file changed, 1 insertion(+)

(limited to 'include')

diff --git a/include/linux/mlx5/fs.h b/include/linux/mlx5/fs.h
index 2cb404c7ea13..6b1fa94f69c8 100644
--- a/include/linux/mlx5/fs.h
+++ b/include/linux/mlx5/fs.h
@@ -109,6 +109,7 @@ enum mlx5_flow_namespace_type {
 
 enum {
 	FDB_BYPASS_PATH,
+	FDB_CRYPTO_INGRESS,
 	FDB_TC_OFFLOAD,
 	FDB_FT_OFFLOAD,
 	FDB_TC_MISS,
--
cgit v1.2.3

From 91bafc638ed4128eaca074fe7e88a5444db14325 Mon Sep 17 00:00:00 2001
From: Jianbo Liu
Date: Mon, 31 Jul 2023 14:28:17 +0300
Subject: net/mlx5e: Handle IPsec offload for RX datapath in switchdev mode

Reuse the tun opts bits in reg c1 to pass the IPsec obj id to the
datapath. As this is only for RX SA and there are only 11 bits, an
xarray is used to map the IPsec obj id to an index between 1 and 0x7ff,
which is written to reg c1 in place of the obj id.
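The mapping step above can be pictured with a small userspace model: a
wide object id is swapped for a short index that fits in 11 bits, and
the index is later translated back. This is only a sketch under stated
assumptions: a flat table stands in for the xarray, and all model_*
names are hypothetical rather than taken from the driver.

```c
#include <stdint.h>

#define MODEL_ID_BITS	11
#define MODEL_ID_MAX	((1u << MODEL_ID_BITS) - 1)	/* 0x7ff */

/* Flat table standing in for the xarray: slot i holds the obj id
 * mapped to index i. Index 0 is never handed out.
 */
struct model_id_map {
	uint32_t obj_id[MODEL_ID_MAX + 1];
	uint8_t used[MODEL_ID_MAX + 1];
};

/* Returns the first free index in [1, 0x7ff], or 0 when the table
 * is full.
 */
static uint32_t model_map_obj_id(struct model_id_map *map, uint32_t obj_id)
{
	for (uint32_t idx = 1; idx <= MODEL_ID_MAX; idx++) {
		if (!map->used[idx]) {
			map->used[idx] = 1;
			map->obj_id[idx] = obj_id;
			return idx;
		}
	}
	return 0;
}

/* Reverse lookup for a value read back from the register.
 * Returns 0 on success, -1 for an unknown or out-of-range index.
 */
static int model_lookup_obj_id(const struct model_id_map *map, uint32_t idx,
			       uint32_t *obj_id)
{
	if (idx == 0 || idx > MODEL_ID_MAX || !map->used[idx])
		return -1;
	*obj_id = map->obj_id[idx];
	return 0;
}
```

Keeping 0 out of the index range leaves a value free to mean "no
mapped id", which is why the range starts at 1.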
Signed-off-by: Jianbo Liu
Signed-off-by: Leon Romanovsky
Link: https://lore.kernel.org/r/43d60fbcc9cd672a97d7e2a2f7fe6a3d9e9a776d.1690802064.git.leon@kernel.org
Signed-off-by: Jakub Kicinski
---
 include/linux/mlx5/eswitch.h | 3 +++
 1 file changed, 3 insertions(+)

(limited to 'include')

diff --git a/include/linux/mlx5/eswitch.h b/include/linux/mlx5/eswitch.h
index e2701ed0200e..950d2431a53c 100644
--- a/include/linux/mlx5/eswitch.h
+++ b/include/linux/mlx5/eswitch.h
@@ -144,6 +144,9 @@ u32 mlx5_eswitch_get_vport_metadata_for_set(struct mlx5_eswitch *esw,
 	GENMASK(31 - ESW_TUN_ID_BITS - ESW_RESERVED_BITS, \
 		ESW_TUN_OPTS_OFFSET + 1)
 
+/* reuse tun_opts for the mapped ipsec obj id when tun_id is 0 (invalid) */
+#define ESW_IPSEC_RX_MAPPED_ID_MASK GENMASK(ESW_TUN_OPTS_BITS - 1, 0)
+
 u8 mlx5_eswitch_mode(const struct mlx5_core_dev *dev);
 u16 mlx5_eswitch_get_total_vports(const struct mlx5_core_dev *dev);
 struct mlx5_core_dev *mlx5_eswitch_get_core_dev(struct mlx5_eswitch *esw);
--
cgit v1.2.3

From c6c2bf5db4ea14b316af1fd03cc6c5c61f751f79 Mon Sep 17 00:00:00 2001
From: Jianbo Liu
Date: Mon, 31 Jul 2023 14:28:19 +0300
Subject: net/mlx5e: Support IPsec packet offload for TX in switchdev mode

The IPsec encryption is done last, so add a new prio for IPsec offload
in FDB, and put it just lower than the slow path prio and higher than
the per-vport prio. Three levels are added for TX. The first one is for
ip xfrm policy. The SA table is created in the second level for ip xfrm
state. The status table is created in the last level to count the
number of packets encrypted.

The rules which forward packets to the uplink are changed to forward
them to the IPsec TX tables first. These rules are restored after those
tables are destroyed, which is done immediately when there is no
reference to them, just as in legacy mode.

The support for slow path is added here, by refreshing uplink's
channels. But, the handling for TC fast path, which is more
complicated, will be added later.
Besides, reg c4 is used instead to match reqid.

Signed-off-by: Jianbo Liu
Signed-off-by: Leon Romanovsky
Link: https://lore.kernel.org/r/cfd0e6ffaf0b8c55ebaa9fb0649b7c504b6b8ec6.1690802064.git.leon@kernel.org
Signed-off-by: Jakub Kicinski
---
 include/linux/mlx5/fs.h | 1 +
 1 file changed, 1 insertion(+)

(limited to 'include')

diff --git a/include/linux/mlx5/fs.h b/include/linux/mlx5/fs.h
index 6b1fa94f69c8..c302ec34255b 100644
--- a/include/linux/mlx5/fs.h
+++ b/include/linux/mlx5/fs.h
@@ -115,6 +115,7 @@ enum {
 	FDB_TC_MISS,
 	FDB_BR_OFFLOAD,
 	FDB_SLOW_PATH,
+	FDB_CRYPTO_EGRESS,
 	FDB_PER_VPORT,
 };
 
--
cgit v1.2.3

From c8e350e62fc51f3fda28f166fc402f4fb539f528 Mon Sep 17 00:00:00 2001
From: Jianbo Liu
Date: Mon, 31 Jul 2023 14:28:24 +0300
Subject: net/mlx5e: Make TC and IPsec offloads mutually exclusive on a netdev

For IPsec packet offload mode, the order of TC offload and IPsec
offload on the same netdevice is not aligned with the order in the
non-offload software. For example, for RX, the software performs TC
first and then IPsec transformation, but the implementation for offload
does that in the opposite way.

To resolve the difference for now, either IPsec offload or TC offload,
not both, is allowed for a specific interface.
Signed-off-by: Jianbo Liu
Signed-off-by: Leon Romanovsky
Link: https://lore.kernel.org/r/8e2e5e3b0984d785066e8663aaf97b3ba1bb873f.1690802064.git.leon@kernel.org
Signed-off-by: Jakub Kicinski
---
 include/linux/mlx5/driver.h | 2 ++
 1 file changed, 2 insertions(+)

(limited to 'include')

diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h
index f21703fb75fd..fa70c25423b2 100644
--- a/include/linux/mlx5/driver.h
+++ b/include/linux/mlx5/driver.h
@@ -806,6 +806,8 @@ struct mlx5_core_dev {
 	u32 vsc_addr;
 	struct mlx5_hv_vhca *hv_vhca;
 	struct mlx5_thermal *thermal;
+	u64 num_block_tc;
+	u64 num_block_ipsec;
 };
 
 struct mlx5_db {
--
cgit v1.2.3

From 49c467dca39df9a3674854969cc5a8eb7170682d Mon Sep 17 00:00:00 2001
From: Yue Haibing
Date: Mon, 31 Jul 2023 22:10:30 +0800
Subject: sctp: Remove unused function declarations

These declarations have never been implemented since the beginning of
git history.

Signed-off-by: Yue Haibing
Reviewed-by: Simon Horman
Acked-by: Xin Long
Link: https://lore.kernel.org/r/20230731141030.32772-1-yuehaibing@huawei.com
Signed-off-by: Jakub Kicinski
---
 include/net/sctp/sm.h | 3 ---
 include/net/sctp/structs.h | 2 --
 2 files changed, 5 deletions(-)

(limited to 'include')

diff --git a/include/net/sctp/sm.h b/include/net/sctp/sm.h
index f37c7a558d6d..64c42bd56bb2 100644
--- a/include/net/sctp/sm.h
+++ b/include/net/sctp/sm.h
@@ -156,7 +156,6 @@ sctp_state_fn_t sctp_sf_do_6_2_sack;
 sctp_state_fn_t sctp_sf_autoclose_timer_expire;
 
 /* Prototypes for utility support functions. */
-__u8 sctp_get_chunk_type(struct sctp_chunk *chunk);
 const struct sctp_sm_table_entry *sctp_sm_lookup_event(
 					struct net *net,
 					enum sctp_event_type event_type,
@@ -166,8 +165,6 @@ int sctp_chunk_iif(const struct sctp_chunk *);
 struct sctp_association *sctp_make_temp_asoc(const struct sctp_endpoint *,
 					     struct sctp_chunk *,
 					     gfp_t gfp);
-__u32 sctp_generate_verification_tag(void);
-void sctp_populate_tie_tags(__u8 *cookie, __u32 curTag, __u32 hisTag);
 
 /* Prototypes for chunk-building functions. */
 struct sctp_chunk *sctp_make_init(const struct sctp_association *asoc,
diff --git a/include/net/sctp/structs.h b/include/net/sctp/structs.h
index 5c72d1864dd6..5a24d6d8522a 100644
--- a/include/net/sctp/structs.h
+++ b/include/net/sctp/structs.h
@@ -1122,8 +1122,6 @@ void sctp_outq_free(struct sctp_outq*);
 void sctp_outq_tail(struct sctp_outq *, struct sctp_chunk *chunk, gfp_t);
 int sctp_outq_sack(struct sctp_outq *, struct sctp_chunk *);
 int sctp_outq_is_empty(const struct sctp_outq *);
-void sctp_outq_restart(struct sctp_outq *);
-
 void sctp_retransmit(struct sctp_outq *q, struct sctp_transport *transport,
 		     enum sctp_retransmit_reason reason);
 void sctp_retransmit_mark(struct sctp_outq *, struct sctp_transport *, __u8);
--
cgit v1.2.3

From 66f7223039c0cef81221e9779520479895995815 Mon Sep 17 00:00:00 2001
From: Maxim Georgiev
Date: Tue, 1 Aug 2023 17:28:13 +0300
Subject: net: add NDOs for configuring hardware timestamping

The current hardware timestamping API for NICs requires implementing
.ndo_eth_ioctl() for SIOCGHWTSTAMP and SIOCSHWTSTAMP.

That API has some boilerplate such as request parameter translation
between user and kernel address spaces, handling possible translation
failures correctly, etc. Since it is the same all across the board, it
would be desirable to handle it through generic code.

Here we introduce .ndo_hwtstamp_get() and .ndo_hwtstamp_set(), which
implement that boilerplate and allow drivers to just act upon requests.
Suggested-by: Jakub Kicinski
Signed-off-by: Maxim Georgiev
Signed-off-by: Vladimir Oltean
Reviewed-by: Jacob Keller
Tested-by: Horatiu Vultur
Link: https://lore.kernel.org/r/20230801142824.1772134-2-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski
---
 include/linux/net_tstamp.h | 8 ++++++++
 include/linux/netdevice.h | 16 ++++++++++++++++
 2 files changed, 24 insertions(+)

(limited to 'include')

diff --git a/include/linux/net_tstamp.h b/include/linux/net_tstamp.h
index fd67f3cc0c4b..7c59824f43f5 100644
--- a/include/linux/net_tstamp.h
+++ b/include/linux/net_tstamp.h
@@ -30,4 +30,12 @@ static inline void hwtstamp_config_to_kernel(struct kernel_hwtstamp_config *kern
 	kernel_cfg->rx_filter = cfg->rx_filter;
 }
 
+static inline void hwtstamp_config_from_kernel(struct hwtstamp_config *cfg,
+					       const struct kernel_hwtstamp_config *kernel_cfg)
+{
+	cfg->flags = kernel_cfg->flags;
+	cfg->tx_type = kernel_cfg->tx_type;
+	cfg->rx_filter = kernel_cfg->rx_filter;
+}
+
 #endif /* _LINUX_NET_TIMESTAMPING_H_ */
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 84c36a7f873f..08a0b8d45dc9 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -57,6 +57,7 @@
 struct netpoll_info;
 struct device;
 struct ethtool_ops;
+struct kernel_hwtstamp_config;
 struct phy_device;
 struct dsa_port;
 struct ip_tunnel_parm;
@@ -1418,6 +1419,16 @@ struct netdev_net_notifier {
  *	Get hardware timestamp based on normal/adjustable time or free running
  *	cycle counter. This function is required if physical clock supports a
  *	free running cycle counter.
+ *
+ * int (*ndo_hwtstamp_get)(struct net_device *dev,
+ *			   struct kernel_hwtstamp_config *kernel_config);
+ *	Get the currently configured hardware timestamping parameters for the
+ *	NIC device.
+ *
+ * int (*ndo_hwtstamp_set)(struct net_device *dev,
+ *			   struct kernel_hwtstamp_config *kernel_config,
+ *			   struct netlink_ext_ack *extack);
+ *	Change the hardware timestamping parameters for NIC device.
  */
 struct net_device_ops {
 	int (*ndo_init)(struct net_device *dev);
@@ -1652,6 +1663,11 @@ struct net_device_ops {
 	ktime_t (*ndo_get_tstamp)(struct net_device *dev,
 				  const struct skb_shared_hwtstamps *hwtstamps,
 				  bool cycles);
+	int (*ndo_hwtstamp_get)(struct net_device *dev,
+				struct kernel_hwtstamp_config *kernel_config);
+	int (*ndo_hwtstamp_set)(struct net_device *dev,
+				struct kernel_hwtstamp_config *kernel_config,
+				struct netlink_ext_ack *extack);
 };
 
 struct xdp_metadata_ops {
--
cgit v1.2.3

From e47d01fea663b1fe58d0a493efc9ed667f70242e Mon Sep 17 00:00:00 2001
From: Maxim Georgiev
Date: Tue, 1 Aug 2023 17:28:14 +0300
Subject: net: add hwtstamping helpers for stackable net devices

The stackable net devices with hwtstamping support (vlan, macvlan,
bonding) only pass the hwtstamping ops to the lower (real) device.

These drivers are the first that need to be converted to the new
timestamping API, because if they aren't prepared to handle that, then
no real device driver can be converted to the new API either.

After studying what vlan_dev_ioctl(), macvlan_eth_ioctl() and
bond_eth_ioctl() have in common, here we propose two generic
implementations of ndo_hwtstamp_get() and ndo_hwtstamp_set() which can
be called by those 3 drivers, with "dev" being their lower device.

These helpers cover both cases, when the lower driver is converted to
the new API or unconverted.

We need some hacks in case of an unconverted driver, namely to stuff
some pointers in struct kernel_hwtstamp_config which shouldn't have
been there (since the new API isn't supposed to need it). These will be
removed once all drivers have been converted to the new API.
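The converted/unconverted split described above can be modelled in a
few lines of userspace C. This is a sketch only: struct model_dev, its
two callbacks and model_hwtstamp_get_lower() are hypothetical, and the
assumption that the legacy path marks the request as already copied
back follows the description of @copied_to_user, not a reading of the
actual helper.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

struct model_cfg {
	int rx_filter;
	bool copied_to_user;	/* set when the legacy path did the copy */
};

struct model_dev {
	/* new-style operation, NULL for an unconverted driver */
	int (*hwtstamp_get)(struct model_dev *dev, struct model_cfg *cfg);
	/* legacy ioctl-style operation, may also be NULL */
	int (*legacy_ioctl)(struct model_dev *dev, struct model_cfg *cfg);
};

/* Ask the lower device: prefer the new operation, otherwise fall back
 * to the legacy one and remember that it handled the user copy itself.
 */
static int model_hwtstamp_get_lower(struct model_dev *lower,
				    struct model_cfg *cfg)
{
	int err;

	if (lower->hwtstamp_get)
		return lower->hwtstamp_get(lower, cfg);
	if (!lower->legacy_ioctl)
		return -EOPNOTSUPP;

	err = lower->legacy_ioctl(lower, cfg);
	if (!err)
		cfg->copied_to_user = true;
	return err;
}

/* Two toy lower drivers used to exercise both paths. */
static int toy_new_get(struct model_dev *dev, struct model_cfg *cfg)
{
	(void)dev;
	cfg->rx_filter = 1;
	return 0;
}

static int toy_legacy_ioctl(struct model_dev *dev, struct model_cfg *cfg)
{
	(void)dev;
	cfg->rx_filter = 2;
	return 0;
}
```

A stacked driver would call the helper with its lower device and can
then skip its own copy-out step whenever copied_to_user is set.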
Signed-off-by: Maxim Georgiev
Signed-off-by: Vladimir Oltean
Reviewed-by: Jacob Keller
Link: https://lore.kernel.org/r/20230801142824.1772134-3-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski
---
 include/linux/net_tstamp.h | 6 ++++++
 include/linux/netdevice.h | 5 +++++
 2 files changed, 11 insertions(+)

(limited to 'include')

diff --git a/include/linux/net_tstamp.h b/include/linux/net_tstamp.h
index 7c59824f43f5..03e922814851 100644
--- a/include/linux/net_tstamp.h
+++ b/include/linux/net_tstamp.h
@@ -11,6 +11,10 @@
  * @flags: see struct hwtstamp_config
  * @tx_type: see struct hwtstamp_config
  * @rx_filter: see struct hwtstamp_config
+ * @ifr: pointer to ifreq structure from the original ioctl request, to pass to
+ *	a legacy implementation of a lower driver
+ * @copied_to_user: request was passed to a legacy implementation which already
+ *	copied the ioctl request back to user space
  *
  * Prefer using this structure for in-kernel processing of hardware
  * timestamping configuration, over the inextensible struct hwtstamp_config
@@ -20,6 +24,8 @@ struct kernel_hwtstamp_config {
 	int flags;
 	int tx_type;
 	int rx_filter;
+	struct ifreq *ifr;
+	bool copied_to_user;
 };
 
 static inline void hwtstamp_config_to_kernel(struct kernel_hwtstamp_config *kernel_cfg,
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 08a0b8d45dc9..23e335f245cf 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -3950,6 +3950,11 @@ int put_user_ifreq(struct ifreq *ifr, void __user *arg);
 int dev_ioctl(struct net *net, unsigned int cmd, struct ifreq *ifr,
 		void __user *data, bool *need_copyout);
 int dev_ifconf(struct net *net, struct ifconf __user *ifc);
+int generic_hwtstamp_get_lower(struct net_device *dev,
+			       struct kernel_hwtstamp_config *kernel_cfg);
+int generic_hwtstamp_set_lower(struct net_device *dev,
+			       struct kernel_hwtstamp_config *kernel_cfg,
+			       struct netlink_ext_ack *extack);
 int dev_ethtool(struct net *net, struct ifreq *ifr, void __user *userdata);
 unsigned int dev_get_flags(const struct net_device *);
 int __dev_change_flags(struct net_device *dev, unsigned int flags,
--
cgit v1.2.3

From 60495b6622ca67f5180343b89bd932d28d23f63a Mon Sep 17 00:00:00 2001
From: Vladimir Oltean
Date: Tue, 1 Aug 2023 17:28:23 +0300
Subject: net: phy: provide phylib stubs for hardware timestamping operations

net/core/dev_ioctl.c (built-in code) will want to call phy_mii_ioctl()
for hardware timestamping purposes. This is not directly possible,
because phy_mii_ioctl() is a symbol provided under CONFIG_PHYLIB.

Do something similar to what was done in DSA in commit 5a17818682cf
("net: dsa: replace NETDEV_PRE_CHANGE_HWTSTAMP notifier with a stub"),
and arrange some indirect calls to phy_mii_ioctl() through a stub
structure containing function pointers, that's provided by phylib as
built-in even when CONFIG_PHYLIB=m, and which phy_init() populates at
runtime (module insertion).

Note: maybe the ownership of the ethtool_phy_ops singleton is backwards,
and the methods exposed by that should be later merged into phylib_stubs.
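As a minimal userspace sketch of the stub arrangement described above:
always-present code owns a pointer to a table of function pointers, a
provider that may or may not be loaded fills it in, and callers go
through a wrapper that copes with the table being absent. All model_*
and provider_* names are hypothetical, and the sketch has no locking,
whereas the kernel code would need the pointer to be protected.

```c
#include <errno.h>
#include <stddef.h>

struct model_cfg {
	int rx_filter;
};

/* Table of operations that the optional provider fills in. */
struct model_stubs {
	int (*hwtstamp_get)(struct model_cfg *cfg);
};

/* Lives in always-present code; NULL until a provider registers. */
static const struct model_stubs *model_stubs;

/* Caller-side wrapper: safe to call whether or not the provider
 * has registered its table.
 */
static int model_hwtstamp_get(struct model_cfg *cfg)
{
	if (!model_stubs)
		return -EOPNOTSUPP;

	return model_stubs->hwtstamp_get(cfg);
}

/* Provider side: the real implementation plus its table. */
static int provider_hwtstamp_get(struct model_cfg *cfg)
{
	cfg->rx_filter = 1;
	return 0;
}

static const struct model_stubs provider_stubs = {
	.hwtstamp_get = provider_hwtstamp_get,
};

/* Called when the provider is loaded and unloaded, respectively. */
static void model_register_stubs(void)
{
	model_stubs = &provider_stubs;
}

static void model_unregister_stubs(void)
{
	model_stubs = NULL;
}
```

The caller never links against the provider's symbols directly, which
is the point of the indirection when the provider can be a module.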
Signed-off-by: Vladimir Oltean
Link: https://lore.kernel.org/r/20230801142824.1772134-12-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski
---
 include/linux/phy.h | 7 +++++
 include/linux/phylib_stubs.h | 68 ++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 75 insertions(+)
 create mode 100644 include/linux/phylib_stubs.h

(limited to 'include')

diff --git a/include/linux/phy.h b/include/linux/phy.h
index b254848a9c99..ba08b0e60279 100644
--- a/include/linux/phy.h
+++ b/include/linux/phy.h
@@ -298,6 +298,7 @@ static inline const char *phy_modes(phy_interface_t interface)
 #define MII_BUS_ID_SIZE 61
 
 struct device;
+struct kernel_hwtstamp_config;
 struct phylink;
 struct sfp_bus;
 struct sfp_upstream_ops;
@@ -1955,6 +1956,12 @@ int phy_ethtool_set_plca_cfg(struct phy_device *phydev,
 int phy_ethtool_get_plca_status(struct phy_device *phydev,
 				struct phy_plca_status *plca_st);
 
+int __phy_hwtstamp_get(struct phy_device *phydev,
+		       struct kernel_hwtstamp_config *config);
+int __phy_hwtstamp_set(struct phy_device *phydev,
+		       struct kernel_hwtstamp_config *config,
+		       struct netlink_ext_ack *extack);
+
 static inline int phy_package_read(struct phy_device *phydev, u32 regnum)
 {
 	struct phy_package_shared *shared = phydev->shared;
diff --git a/include/linux/phylib_stubs.h b/include/linux/phylib_stubs.h
new file mode 100644
index 000000000000..1279f48c8a70
--- /dev/null
+++ b/include/linux/phylib_stubs.h
@@ -0,0 +1,68 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/*
+ * Stubs for the Network PHY library
+ */
+
+#include
+
+struct kernel_hwtstamp_config;
+struct netlink_ext_ack;
+struct phy_device;
+
+#if IS_ENABLED(CONFIG_PHYLIB)
+
+extern const struct phylib_stubs *phylib_stubs;
+
+struct phylib_stubs {
+	int (*hwtstamp_get)(struct phy_device *phydev,
+			    struct kernel_hwtstamp_config *config);
+	int (*hwtstamp_set)(struct phy_device *phydev,
+			    struct kernel_hwtstamp_config *config,
+			    struct netlink_ext_ack *extack);
+};
+
+static inline int phy_hwtstamp_get(struct phy_device *phydev,
+				   struct kernel_hwtstamp_config *config)
+{
+	/* phylib_register_stubs() and phylib_unregister_stubs()
+	 * also run under rtnl_lock().
+	 */
+	ASSERT_RTNL();
+
+	if (!phylib_stubs)
+		return -EOPNOTSUPP;
+
+	return phylib_stubs->hwtstamp_get(phydev, config);
+}
+
+static inline int phy_hwtstamp_set(struct phy_device *phydev,
+				   struct kernel_hwtstamp_config *config,
+				   struct netlink_ext_ack *extack)
+{
+	/* phylib_register_stubs() and phylib_unregister_stubs()
+	 * also run under rtnl_lock().
+	 */
+	ASSERT_RTNL();
+
+	if (!phylib_stubs)
+		return -EOPNOTSUPP;
+
+	return phylib_stubs->hwtstamp_set(phydev, config, extack);
+}
+
+#else
+
+static inline int phy_hwtstamp_get(struct phy_device *phydev,
+				   struct kernel_hwtstamp_config *config)
+{
+	return -EOPNOTSUPP;
+}
+
+static inline int phy_hwtstamp_set(struct phy_device *phydev,
+				   struct kernel_hwtstamp_config *config,
+				   struct netlink_ext_ack *extack)
+{
+	return -EOPNOTSUPP;
+}
+
+#endif
--
cgit v1.2.3

From fd770e856e226f80fe6e1dc9d1861bcb135cdf0b Mon Sep 17 00:00:00 2001
From: Vladimir Oltean
Date: Tue, 1 Aug 2023 17:28:24 +0300
Subject: net: remove phy_has_hwtstamp() -> phy_mii_ioctl() decision from converted drivers

It is desirable that the new .ndo_hwtstamp_set() API gives more
uniformity, less overhead and future flexibility w.r.t. the PHY
timestamping behavior.

Currently there are some drivers which allow PHY timestamping through
the procedure mentioned in Documentation/networking/timestamping.rst.
They don't do anything locally if phy_has_hwtstamp() is set, except for
lan966x which installs PTP packet traps.

Centralize that behavior in a new dev_set_hwtstamp_phylib() function,
which calls either phy_mii_ioctl() for the phylib PHY, or
.ndo_hwtstamp_set() of the netdev, based on a single policy (currently
simplistic: phy_has_hwtstamp()).

Any driver converted to .ndo_hwtstamp_set() will automatically opt into
the centralized phylib timestamping policy. Unconverted drivers still
get to choose whether they let the PHY handle timestamping or not.

Netdev drivers with integrated PHY drivers that don't use phylib
presumably don't set dev->phydev, and those will always see
HWTSTAMP_SOURCE_NETDEV requests even when converted. The timestamping
policy will remain 100% up to them.

Signed-off-by: Vladimir Oltean
Reviewed-by: Jacob Keller
Tested-by: Horatiu Vultur
Link: https://lore.kernel.org/r/20230801142824.1772134-13-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski
---
 include/linux/net_tstamp.h | 16 ++++++++++++++++
 include/linux/netdevice.h | 4 ++++
 2 files changed, 20 insertions(+)

(limited to 'include')

diff --git a/include/linux/net_tstamp.h b/include/linux/net_tstamp.h
index 03e922814851..eb01c37e71e0 100644
--- a/include/linux/net_tstamp.h
+++ b/include/linux/net_tstamp.h
@@ -5,6 +5,11 @@
 
 #include
 
+enum hwtstamp_source {
+	HWTSTAMP_SOURCE_NETDEV,
+	HWTSTAMP_SOURCE_PHYLIB,
+};
+
 /**
  * struct kernel_hwtstamp_config - Kernel copy of struct hwtstamp_config
  *
@@ -15,6 +20,8 @@
  *	a legacy implementation of a lower driver
  * @copied_to_user: request was passed to a legacy implementation which already
  *	copied the ioctl request back to user space
+ * @source: indication whether timestamps should come from the netdev or from
+ *	an attached phylib PHY
  *
  * Prefer using this structure for in-kernel processing of hardware
  * timestamping configuration, over the inextensible struct hwtstamp_config
@@ -26,6 +33,7 @@ struct kernel_hwtstamp_config {
 	int rx_filter;
 	struct ifreq *ifr;
 	bool copied_to_user;
+	enum hwtstamp_source source;
 };
 
 static inline void hwtstamp_config_to_kernel(struct kernel_hwtstamp_config *kernel_cfg,
@@ -44,4 +52,12 @@ static inline void hwtstamp_config_from_kernel(struct hwtstamp_config *cfg,
 	cfg->rx_filter = kernel_cfg->rx_filter;
 }
 
+static inline bool kernel_hwtstamp_config_changed(const struct kernel_hwtstamp_config *a,
+						  const struct kernel_hwtstamp_config *b)
+{
+	return a->flags != b->flags ||
+	       a->tx_type != b->tx_type ||
+	       a->rx_filter != b->rx_filter;
+}
+
 #endif /* _LINUX_NET_TIMESTAMPING_H_ */
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 23e335f245cf..85d594460c66 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1724,6 +1724,9 @@ struct xdp_metadata_ops {
  * @IFF_TX_SKB_NO_LINEAR: device/driver is capable of xmitting frames with
  *	skb_headlen(skb) == 0 (data starts from frag0)
  * @IFF_CHANGE_PROTO_DOWN: device supports setting carrier via IFLA_PROTO_DOWN
+ * @IFF_SEE_ALL_HWTSTAMP_REQUESTS: device wants to see calls to
+ *	ndo_hwtstamp_set() for all timestamp requests regardless of source,
+ *	even if those aren't HWTSTAMP_SOURCE_NETDEV.
  */
 enum netdev_priv_flags {
 	IFF_802_1Q_VLAN = 1<<0,
@@ -1759,6 +1762,7 @@ enum netdev_priv_flags {
 	IFF_NO_ADDRCONF = BIT_ULL(30),
 	IFF_TX_SKB_NO_LINEAR = BIT_ULL(31),
 	IFF_CHANGE_PROTO_DOWN = BIT_ULL(32),
+	IFF_SEE_ALL_HWTSTAMP_REQUESTS = BIT_ULL(33),
 };
 
 #define IFF_802_1Q_VLAN IFF_802_1Q_VLAN
--
cgit v1.2.3

From f11e5bd159b08976db9e7a9eabbf0318dfe5429d Mon Sep 17 00:00:00 2001
From: Mateusz Kowalski
Date: Tue, 1 Aug 2023 14:37:50 +0200
Subject: bonding: support balance-alb with openvswitch

Commit d5410ac7b0ba ("net:bonding:support balance-alb interface with
vlan to bridge") introduced support for balance-alb mode for interfaces
connected to the linux bridge by fixing the missing matching of the MAC
entry in FDB. In our testing we discovered that it still does not work
when the bond is connected to the OVS bridge, as shown in the diagram
below:

eth1(mac:eth1_mac)--bond0(balance-alb,mac:eth0_mac)--eth0(mac:eth0_mac)
                             |
                   bond0.150(mac:eth0_mac)
                             |
                   ovs_bridge(ip:bridge_ip,mac:eth0_mac)

This patch fixes it by checking not only if the device is a bridge but
also if it is an openvswitch.
Signed-off-by: Mateusz Kowalski
Reviewed-by: Simon Horman
Link: https://lore.kernel.org/r/9fe7297c-609e-208b-c77b-3ceef6eb51a4@redhat.com
Signed-off-by: Paolo Abeni
---
 include/linux/netdevice.h | 5 +++++
 1 file changed, 5 insertions(+)

(limited to 'include')

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 85d594460c66..4176a738177b 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -5128,6 +5128,11 @@ static inline bool netif_is_ovs_port(const struct net_device *dev)
 	return dev->priv_flags & IFF_OVS_DATAPATH;
 }
 
+static inline bool netif_is_any_bridge_master(const struct net_device *dev)
+{
+	return netif_is_bridge_master(dev) || netif_is_ovs_master(dev);
+}
+
 static inline bool netif_is_any_bridge_port(const struct net_device *dev)
 {
 	return netif_is_bridge_port(dev) || netif_is_ovs_port(dev);
--
cgit v1.2.3

From 92272ec4107ef4f826b694a1338562c007e09821 Mon Sep 17 00:00:00 2001
From: Jakub Kicinski
Date: Wed, 2 Aug 2023 18:02:28 -0700
Subject: eth: add missing xdp.h includes in drivers

A handful of drivers currently expect to get xdp.h by virtue of
including netdevice.h. This will soon no longer be the case, so add
explicit includes.
Reviewed-by: Wei Fang
Reviewed-by: Gerhard Engleder
Signed-off-by: Jakub Kicinski
Acked-by: Jesper Dangaard Brouer
Link: https://lore.kernel.org/r/20230803010230.1755386-2-kuba@kernel.org
Signed-off-by: Martin KaFai Lau
---
 include/net/mana/mana.h | 2 ++
 1 file changed, 2 insertions(+)

(limited to 'include')

diff --git a/include/net/mana/mana.h b/include/net/mana/mana.h
index 024ad8ddb27e..1ccdca03e166 100644
--- a/include/net/mana/mana.h
+++ b/include/net/mana/mana.h
@@ -4,6 +4,8 @@
 #ifndef _MANA_H
 #define _MANA_H
 
+#include
+
 #include "gdma.h"
 #include "hw_channel.h"
 
--
cgit v1.2.3

From 49e47a5b6145d86c30022fe0e949bbb24bae28ba Mon Sep 17 00:00:00 2001
From: Jakub Kicinski
Date: Wed, 2 Aug 2023 18:02:29 -0700
Subject: net: move struct netdev_rx_queue out of netdevice.h

struct netdev_rx_queue is touched in only a few places and having it
defined in netdevice.h brings in the dependency on xdp.h, because
struct xdp_rxq_info gets embedded in struct netdev_rx_queue.

In prep for removal of xdp.h from netdevice.h move all the
netdev_rx_queue stuff to a new header.

We could technically break the new header up to avoid the sysfs.h
include but it's so rarely included it doesn't seem to be worth it at
this point.

Reviewed-by: Amritha Nambiar
Signed-off-by: Jakub Kicinski
Acked-by: Jesper Dangaard Brouer
Link: https://lore.kernel.org/r/20230803010230.1755386-3-kuba@kernel.org
Signed-off-by: Martin KaFai Lau
---
 include/linux/netdevice.h     | 44 -----------------------------------
 include/net/netdev_rx_queue.h | 53 +++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 53 insertions(+), 44 deletions(-)
 create mode 100644 include/net/netdev_rx_queue.h

(limited to 'include')

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 3800d0479698..5563c8a210b5 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -782,32 +782,6 @@ bool rps_may_expire_flow(struct net_device *dev, u16 rxq_index, u32 flow_id,
 #endif
 #endif /* CONFIG_RPS */
 
-/* This structure contains an instance of an RX queue. */
-struct netdev_rx_queue {
-	struct xdp_rxq_info xdp_rxq;
-#ifdef CONFIG_RPS
-	struct rps_map __rcu *rps_map;
-	struct rps_dev_flow_table __rcu *rps_flow_table;
-#endif
-	struct kobject kobj;
-	struct net_device *dev;
-	netdevice_tracker dev_tracker;
-
-#ifdef CONFIG_XDP_SOCKETS
-	struct xsk_buff_pool *pool;
-#endif
-} ____cacheline_aligned_in_smp;
-
-/*
- * RX queue sysfs structures and functions.
- */
-struct rx_queue_attribute {
-	struct attribute attr;
-	ssize_t (*show)(struct netdev_rx_queue *queue, char *buf);
-	ssize_t (*store)(struct netdev_rx_queue *queue,
-			 const char *buf, size_t len);
-};
-
 /* XPS map type and offset of the xps map within net_device->xps_maps[]. */
 enum xps_map_type {
 	XPS_CPUS = 0,
@@ -3828,24 +3802,6 @@ static inline int netif_set_real_num_rx_queues(struct net_device *dev,
 int netif_set_real_num_queues(struct net_device *dev,
 			      unsigned int txq, unsigned int rxq);
 
-static inline struct netdev_rx_queue *
-__netif_get_rx_queue(struct net_device *dev, unsigned int rxq)
-{
-	return dev->_rx + rxq;
-}
-
-#ifdef CONFIG_SYSFS
-static inline unsigned int get_netdev_rx_queue_index(
-		struct netdev_rx_queue *queue)
-{
-	struct net_device *dev = queue->dev;
-	int index = queue - dev->_rx;
-
-	BUG_ON(index >= dev->num_rx_queues);
-	return index;
-}
-#endif
-
 int netif_get_num_default_rss_queues(void);
 
 void dev_kfree_skb_irq_reason(struct sk_buff *skb, enum skb_drop_reason reason);
diff --git a/include/net/netdev_rx_queue.h b/include/net/netdev_rx_queue.h
new file mode 100644
index 000000000000..cdcafb30d437
--- /dev/null
+++ b/include/net/netdev_rx_queue.h
@@ -0,0 +1,53 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _LINUX_NETDEV_RX_QUEUE_H
+#define _LINUX_NETDEV_RX_QUEUE_H
+
+#include
+#include
+#include
+#include
+
+/* This structure contains an instance of an RX queue. */
+struct netdev_rx_queue {
+	struct xdp_rxq_info xdp_rxq;
+#ifdef CONFIG_RPS
+	struct rps_map __rcu *rps_map;
+	struct rps_dev_flow_table __rcu *rps_flow_table;
+#endif
+	struct kobject kobj;
+	struct net_device *dev;
+	netdevice_tracker dev_tracker;
+
+#ifdef CONFIG_XDP_SOCKETS
+	struct xsk_buff_pool *pool;
+#endif
+} ____cacheline_aligned_in_smp;
+
+/*
+ * RX queue sysfs structures and functions.
+ */
+struct rx_queue_attribute {
+	struct attribute attr;
+	ssize_t (*show)(struct netdev_rx_queue *queue, char *buf);
+	ssize_t (*store)(struct netdev_rx_queue *queue,
+			 const char *buf, size_t len);
+};
+
+static inline struct netdev_rx_queue *
+__netif_get_rx_queue(struct net_device *dev, unsigned int rxq)
+{
+	return dev->_rx + rxq;
+}
+
+#ifdef CONFIG_SYSFS
+static inline unsigned int
+get_netdev_rx_queue_index(struct netdev_rx_queue *queue)
+{
+	struct net_device *dev = queue->dev;
+	int index = queue - dev->_rx;
+
+	BUG_ON(index >= dev->num_rx_queues);
+	return index;
+}
+#endif
+#endif
-- 
cgit v1.2.3

From 680ee0456a5712309db9ec2692e908ea1d6b1644 Mon Sep 17 00:00:00 2001
From: Jakub Kicinski
Date: Wed, 2 Aug 2023 18:02:30 -0700
Subject: net: invert the netdevice.h vs xdp.h dependency

xdp.h is far more specific and is included in only 67 other
files vs netdevice.h's 1538 include sites.
Make xdp.h include netdevice.h, instead of the other way around.
This decreases the incremental allmodconfig build size when
xdp.h is touched from 5947 to 662 objects.

Move bpf_prog_run_xdp() to xdp.h; this seems appropriate, and
filter.h is a mega-header in its own right, so it's nice
to avoid xdp.h getting included there as well.

The only unfortunate part is that the typedef for xdp_features_t
has to move to netdevice.h, since it's embedded in struct net_device.
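
Editorial illustration (not part of the patch): the constraint in the last
paragraph follows from how C treats incomplete types. A forward declaration
is enough for a type that is only referenced through a pointer, whereas a
member embedded by value needs the complete type at the point of use. The
sketch below is stand-alone userspace code written under those assumptions.
struct fake_net_device and the fake_* helpers are hypothetical, and uint32_t
stands in for the kernel's u32.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Pointer-only use: an incomplete (forward-declared) type suffices. */
struct xdp_metadata_ops;

/* Embedded by value: the typedef must be complete before the struct. */
typedef uint32_t xdp_features_t;

/* Hypothetical stand-in for struct net_device. */
struct fake_net_device {
	const struct xdp_metadata_ops *xdp_metadata_ops;	/* pointer member */
	xdp_features_t xdp_features;				/* by-value member */
};

/* Stores a feature mask in the by-value member. */
static void fake_set_features(struct fake_net_device *dev, xdp_features_t val)
{
	dev->xdp_features = val;
}

/* Comparing the pointer never needs the struct definition. */
static int fake_has_metadata_ops(const struct fake_net_device *dev)
{
	return dev->xdp_metadata_ops != NULL;
}
```

This matches the shape of the netdevice.h hunk in this patch, which adds
forward declarations for struct xdp_frame and struct xdp_metadata_ops but has
to carry the full typedef for xdp_features_t.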
Signed-off-by: Jakub Kicinski Acked-by: Jesper Dangaard Brouer Link: https://lore.kernel.org/r/20230803010230.1755386-4-kuba@kernel.org Signed-off-by: Martin KaFai Lau --- include/linux/filter.h | 17 ----------------- include/linux/netdevice.h | 11 ++++------- include/net/busy_poll.h | 1 + include/net/xdp.h | 29 +++++++++++++++++++++++++---- include/trace/events/xdp.h | 1 + 5 files changed, 31 insertions(+), 28 deletions(-) (limited to 'include') diff --git a/include/linux/filter.h b/include/linux/filter.h index f5eabe3fa5e8..2d6fe30bad5f 100644 --- a/include/linux/filter.h +++ b/include/linux/filter.h @@ -774,23 +774,6 @@ DECLARE_STATIC_KEY_FALSE(bpf_master_redirect_enabled_key); u32 xdp_master_redirect(struct xdp_buff *xdp); -static __always_inline u32 bpf_prog_run_xdp(const struct bpf_prog *prog, - struct xdp_buff *xdp) -{ - /* Driver XDP hooks are invoked within a single NAPI poll cycle and thus - * under local_bh_disable(), which provides the needed RCU protection - * for accessing map entries. 
- */ - u32 act = __bpf_prog_run(prog, xdp, BPF_DISPATCHER_FUNC(xdp)); - - if (static_branch_unlikely(&bpf_master_redirect_enabled_key)) { - if (act == XDP_TX && netif_is_bond_slave(xdp->rxq->dev)) - act = xdp_master_redirect(xdp); - } - - return act; -} - void bpf_prog_change_xdp(struct bpf_prog *prev_prog, struct bpf_prog *prog); static inline u32 bpf_prog_insn_size(const struct bpf_prog *prog) diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index 5563c8a210b5..d8ed85183fe4 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -40,7 +40,6 @@ #include #endif #include -#include #include #include @@ -76,8 +75,12 @@ struct udp_tunnel_nic_info; struct udp_tunnel_nic; struct bpf_prog; struct xdp_buff; +struct xdp_frame; +struct xdp_metadata_ops; struct xdp_md; +typedef u32 xdp_features_t; + void synchronize_net(void); void netdev_set_default_ethtool_ops(struct net_device *dev, const struct ethtool_ops *ops); @@ -1628,12 +1631,6 @@ struct net_device_ops { bool cycles); }; -struct xdp_metadata_ops { - int (*xmo_rx_timestamp)(const struct xdp_md *ctx, u64 *timestamp); - int (*xmo_rx_hash)(const struct xdp_md *ctx, u32 *hash, - enum xdp_rss_hash_type *rss_type); -}; - /** * enum netdev_priv_flags - &struct net_device priv_flags * diff --git a/include/net/busy_poll.h b/include/net/busy_poll.h index f90f0021f5f2..4dabeb6c76d3 100644 --- a/include/net/busy_poll.h +++ b/include/net/busy_poll.h @@ -16,6 +16,7 @@ #include #include #include +#include /* 0 - Reserved to indicate value not set * 1..NR_CPUS - Reserved for sender_cpu diff --git a/include/net/xdp.h b/include/net/xdp.h index d1c5381fc95f..de08c8e0d134 100644 --- a/include/net/xdp.h +++ b/include/net/xdp.h @@ -6,9 +6,10 @@ #ifndef __LINUX_NET_XDP_H__ #define __LINUX_NET_XDP_H__ -#include /* skb_shared_info */ -#include #include +#include +#include +#include /* skb_shared_info */ /** * DOC: XDP RX-queue information @@ -45,8 +46,6 @@ enum xdp_mem_type { MEM_TYPE_MAX, }; -typedef u32 
xdp_features_t; - /* XDP flags for ndo_xdp_xmit */ #define XDP_XMIT_FLUSH (1U << 0) /* doorbell signal consumer */ #define XDP_XMIT_FLAGS_MASK XDP_XMIT_FLUSH @@ -443,6 +442,12 @@ enum xdp_rss_hash_type { XDP_RSS_TYPE_L4_IPV6_SCTP_EX = XDP_RSS_TYPE_L4_IPV6_SCTP | XDP_RSS_L3_DYNHDR, }; +struct xdp_metadata_ops { + int (*xmo_rx_timestamp)(const struct xdp_md *ctx, u64 *timestamp); + int (*xmo_rx_hash)(const struct xdp_md *ctx, u32 *hash, + enum xdp_rss_hash_type *rss_type); +}; + #ifdef CONFIG_NET u32 bpf_xdp_metadata_kfunc_id(int id); bool bpf_dev_bound_kfunc_id(u32 btf_id); @@ -474,4 +479,20 @@ static inline void xdp_clear_features_flag(struct net_device *dev) xdp_set_features_flag(dev, 0); } +static __always_inline u32 bpf_prog_run_xdp(const struct bpf_prog *prog, + struct xdp_buff *xdp) +{ + /* Driver XDP hooks are invoked within a single NAPI poll cycle and thus + * under local_bh_disable(), which provides the needed RCU protection + * for accessing map entries. + */ + u32 act = __bpf_prog_run(prog, xdp, BPF_DISPATCHER_FUNC(xdp)); + + if (static_branch_unlikely(&bpf_master_redirect_enabled_key)) { + if (act == XDP_TX && netif_is_bond_slave(xdp->rxq->dev)) + act = xdp_master_redirect(xdp); + } + + return act; +} #endif /* __LINUX_NET_XDP_H__ */ diff --git a/include/trace/events/xdp.h b/include/trace/events/xdp.h index cd89f1d5ce7b..9adc2bdf2f94 100644 --- a/include/trace/events/xdp.h +++ b/include/trace/events/xdp.h @@ -9,6 +9,7 @@ #include #include #include +#include #define __XDP_ACT_MAP(FN) \ FN(ABORTED) \ -- cgit v1.2.3 From 82e896d992fa631cda1f63239fd47b3ab781ffa6 Mon Sep 17 00:00:00 2001 From: Jakub Kicinski Date: Wed, 2 Aug 2023 09:18:21 -0700 Subject: docs: net: page_pool: use kdoc to avoid duplicating the information All struct members of the driver-facing APIs are documented twice, in the code and under Documentation. This is a bit tedious. I also get the feeling that a lot of developers will read the header when coding, rather than the doc. 
Bring the two a little closer together by using kdoc for structs and functions. Using kdoc also gives us links (mentioning a function or struct in the text gets replaced by a link to its doc). Reviewed-by: Randy Dunlap Tested-by: Randy Dunlap Acked-by: Jesper Dangaard Brouer Link: https://lore.kernel.org/r/20230802161821.3621985-3-kuba@kernel.org Signed-off-by: Jakub Kicinski --- include/net/page_pool.h | 134 +++++++++++++++++++++++++++++++++++++----------- 1 file changed, 104 insertions(+), 30 deletions(-) (limited to 'include') diff --git a/include/net/page_pool.h b/include/net/page_pool.h index f1d5cc1fa13b..73d4f786418d 100644 --- a/include/net/page_pool.h +++ b/include/net/page_pool.h @@ -70,47 +70,76 @@ struct pp_alloc_cache { struct page *cache[PP_ALLOC_CACHE_SIZE]; }; +/** + * struct page_pool_params - page pool parameters + * @flags: PP_FLAG_DMA_MAP, PP_FLAG_DMA_SYNC_DEV, PP_FLAG_PAGE_FRAG + * @order: 2^order pages on allocation + * @pool_size: size of the ptr_ring + * @nid: NUMA node id to allocate from pages from + * @dev: device, for DMA pre-mapping purposes + * @napi: NAPI which is the sole consumer of pages, otherwise NULL + * @dma_dir: DMA mapping direction + * @max_len: max DMA sync memory size for PP_FLAG_DMA_SYNC_DEV + * @offset: DMA sync address offset for PP_FLAG_DMA_SYNC_DEV + */ struct page_pool_params { unsigned int flags; unsigned int order; unsigned int pool_size; - int nid; /* Numa node id to allocate from pages from */ - struct device *dev; /* device, for DMA pre-mapping purposes */ - struct napi_struct *napi; /* Sole consumer of pages, otherwise NULL */ - enum dma_data_direction dma_dir; /* DMA mapping direction */ - unsigned int max_len; /* max DMA sync memory size */ - unsigned int offset; /* DMA addr offset */ + int nid; + struct device *dev; + struct napi_struct *napi; + enum dma_data_direction dma_dir; + unsigned int max_len; + unsigned int offset; +/* private: used by test code only */ void (*init_callback)(struct page *page, void 
*arg); void *init_arg; }; #ifdef CONFIG_PAGE_POOL_STATS +/** + * struct page_pool_alloc_stats - allocation statistics + * @fast: successful fast path allocations + * @slow: slow path order-0 allocations + * @slow_high_order: slow path high order allocations + * @empty: ptr ring is empty, so a slow path allocation was forced + * @refill: an allocation which triggered a refill of the cache + * @waive: pages obtained from the ptr ring that cannot be added to + * the cache due to a NUMA mismatch + */ struct page_pool_alloc_stats { - u64 fast; /* fast path allocations */ - u64 slow; /* slow-path order 0 allocations */ - u64 slow_high_order; /* slow-path high order allocations */ - u64 empty; /* failed refills due to empty ptr ring, forcing - * slow path allocation - */ - u64 refill; /* allocations via successful refill */ - u64 waive; /* failed refills due to numa zone mismatch */ + u64 fast; + u64 slow; + u64 slow_high_order; + u64 empty; + u64 refill; + u64 waive; }; +/** + * struct page_pool_recycle_stats - recycling (freeing) statistics + * @cached: recycling placed page in the page pool cache + * @cache_full: page pool cache was full + * @ring: page placed into the ptr ring + * @ring_full: page released from page pool because the ptr ring was full + * @released_refcnt: page released (and not recycled) because refcnt > 1 + */ struct page_pool_recycle_stats { - u64 cached; /* recycling placed page in the cache. */ - u64 cache_full; /* cache was full */ - u64 ring; /* recycling placed page back into ptr ring */ - u64 ring_full; /* page was released from page-pool because - * PTR ring was full. - */ - u64 released_refcnt; /* page released because of elevated - * refcnt - */ + u64 cached; + u64 cache_full; + u64 ring; + u64 ring_full; + u64 released_refcnt; }; -/* This struct wraps the above stats structs so users of the - * page_pool_get_stats API can pass a single argument when requesting the - * stats for the page pool. 
+/** + * struct page_pool_stats - combined page pool use statistics + * @alloc_stats: see struct page_pool_alloc_stats + * @recycle_stats: see struct page_pool_recycle_stats + * + * Wrapper struct for combining page pool stats with different storage + * requirements. */ struct page_pool_stats { struct page_pool_alloc_stats alloc_stats; @@ -211,6 +240,12 @@ struct page_pool { struct page *page_pool_alloc_pages(struct page_pool *pool, gfp_t gfp); +/** + * page_pool_dev_alloc_pages() - allocate a page. + * @pool: pool from which to allocate + * + * Get a page from the page allocator or page_pool caches. + */ static inline struct page *page_pool_dev_alloc_pages(struct page_pool *pool) { gfp_t gfp = (GFP_ATOMIC | __GFP_NOWARN); @@ -230,8 +265,12 @@ static inline struct page *page_pool_dev_alloc_frag(struct page_pool *pool, return page_pool_alloc_frag(pool, offset, size, gfp); } -/* get the stored dma direction. A driver might decide to treat this locally and - * avoid the extra cache line from page_pool to determine the direction +/** + * page_pool_get_dma_dir() - Retrieve the stored DMA direction. + * @pool: pool from which page was allocated + * + * Get the stored dma direction. A driver might decide to store this locally + * and avoid the extra cache line from page_pool to determine the direction. */ static inline enum dma_data_direction page_pool_get_dma_dir(struct page_pool *pool) @@ -321,6 +360,19 @@ static inline bool page_pool_is_last_frag(struct page_pool *pool, (page_pool_defrag_page(page, 1) == 0); } +/** + * page_pool_put_page() - release a reference to a page pool page + * @pool: pool from which page was allocated + * @page: page to release a reference on + * @dma_sync_size: how much of the page may have been touched by the device + * @allow_direct: released by the consumer, allow lockless caching + * + * The outcome of this depends on the page refcnt. If the driver bumps + * the refcnt > 1 this will unmap the page. 
If the page refcnt is 1 + * the allocator owns the page and will try to recycle it in one of the pool + * caches. If PP_FLAG_DMA_SYNC_DEV is set, the page will be synced for_device + * using dma_sync_single_range_for_device(). + */ static inline void page_pool_put_page(struct page_pool *pool, struct page *page, unsigned int dma_sync_size, @@ -337,14 +389,29 @@ static inline void page_pool_put_page(struct page_pool *pool, #endif } -/* Same as above but will try to sync the entire area pool->max_len */ +/** + * page_pool_put_full_page() - release a reference on a page pool page + * @pool: pool from which page was allocated + * @page: page to release a reference on + * @allow_direct: released by the consumer, allow lockless caching + * + * Similar to page_pool_put_page(), but will DMA sync the entire memory area + * as configured in &page_pool_params.max_len. + */ static inline void page_pool_put_full_page(struct page_pool *pool, struct page *page, bool allow_direct) { page_pool_put_page(pool, page, -1, allow_direct); } -/* Same as above but the caller must guarantee safe context. e.g NAPI */ +/** + * page_pool_recycle_direct() - release a reference on a page pool page + * @pool: pool from which page was allocated + * @page: page to release a reference on + * + * Similar to page_pool_put_full_page() but caller must guarantee safe context + * (e.g NAPI), since it will recycle the page directly into the pool fast cache. + */ static inline void page_pool_recycle_direct(struct page_pool *pool, struct page *page) { @@ -354,6 +421,13 @@ static inline void page_pool_recycle_direct(struct page_pool *pool, #define PAGE_POOL_DMA_USE_PP_FRAG_COUNT \ (sizeof(dma_addr_t) > sizeof(unsigned long)) +/** + * page_pool_get_dma_addr() - Retrieve the stored DMA address. + * @page: page allocated from a page pool + * + * Fetch the DMA address of the page. The page pool to which the page belongs + * must had been created with PP_FLAG_DMA_MAP. 
+ */
 static inline dma_addr_t page_pool_get_dma_addr(struct page *page)
 {
 	dma_addr_t ret = page->dma_addr;
-- 
cgit v1.2.3

From 992725ff32f534eaa13adfa37f933d4fbf1aa040 Mon Sep 17 00:00:00 2001
From: Yue Haibing
Date: Wed, 2 Aug 2023 21:07:16 +0800
Subject: net: Space.h: Remove unused function declarations

Commit 5aa83a4c0a15 ("[PATCH] remove two obsolete net drivers")
removed fmv18x_probe(), and commit 01f4685797a5 ("eth: amd: remove
NI6510 support (ni65)") left the ni65_probe() declaration behind.
Commit a10079c66290 ("staging: remove hp100 driver") removed the hp100
driver, so the hp100_probe() declaration is no longer used.

sonic_probe() and iph5526_probe() have never been implemented since the
beginning of git history.

Signed-off-by: Yue Haibing
Link: https://lore.kernel.org/r/20230802130716.37308-1-yuehaibing@huawei.com
Signed-off-by: Jakub Kicinski
---
 include/net/Space.h | 7 -------
 1 file changed, 7 deletions(-)

(limited to 'include')

diff --git a/include/net/Space.h b/include/net/Space.h
index 08ca9cef0213..c29f3d51c078 100644
--- a/include/net/Space.h
+++ b/include/net/Space.h
@@ -3,18 +3,11 @@
  * ethernet adaptor have the name "eth[0123...]".
  */
 
-struct net_device *hp100_probe(int unit);
 struct net_device *ultra_probe(int unit);
 struct net_device *wd_probe(int unit);
 struct net_device *ne_probe(int unit);
-struct net_device *fmv18x_probe(int unit);
-struct net_device *ni65_probe(int unit);
-struct net_device *sonic_probe(int unit);
 struct net_device *smc_init(int unit);
 struct net_device *cs89x0_probe(int unit);
 struct net_device *tc515_probe(int unit);
 struct net_device *lance_probe(int unit);
 struct net_device *cops_probe(int unit);
-
-/* Fibre Channel adapters */
-int iph5526_probe(struct net_device *dev);
-- 
cgit v1.2.3

From 62c1bff593b7e30041d0273b835af9fd6f5ee737 Mon Sep 17 00:00:00 2001
From: Souradeep Chakrabarti
Date: Wed, 2 Aug 2023 04:07:40 -0700
Subject: net: mana: Configure hwc timeout from hardware

At present, the hwc timeout value is a fixed value.
This patch sets the hwc timeout from the hardware. It now uses a new hardware capability GDMA_DRV_CAP_FLAG_1_HWC_TIMEOUT_RECONFIG to query and set the value in hwc_timeout. Signed-off-by: Souradeep Chakrabarti Reviewed-by: Jesse Brandeburg Signed-off-by: David S. Miller --- include/net/mana/gdma.h | 20 +++++++++++++++++++- include/net/mana/hw_channel.h | 5 +++++ 2 files changed, 24 insertions(+), 1 deletion(-) (limited to 'include') diff --git a/include/net/mana/gdma.h b/include/net/mana/gdma.h index 96c120160f15..88b6ef7ce1a6 100644 --- a/include/net/mana/gdma.h +++ b/include/net/mana/gdma.h @@ -33,6 +33,7 @@ enum gdma_request_type { GDMA_DESTROY_PD = 30, GDMA_CREATE_MR = 31, GDMA_DESTROY_MR = 32, + GDMA_QUERY_HWC_TIMEOUT = 84, /* 0x54 */ }; #define GDMA_RESOURCE_DOORBELL_PAGE 27 @@ -57,6 +58,8 @@ enum gdma_eqe_type { GDMA_EQE_HWC_INIT_EQ_ID_DB = 129, GDMA_EQE_HWC_INIT_DATA = 130, GDMA_EQE_HWC_INIT_DONE = 131, + GDMA_EQE_HWC_SOC_RECONFIG = 132, + GDMA_EQE_HWC_SOC_RECONFIG_DATA = 133, }; enum { @@ -531,10 +534,12 @@ enum { * so the driver is able to reliably support features like busy_poll. 
*/ #define GDMA_DRV_CAP_FLAG_1_NAPI_WKDONE_FIX BIT(2) +#define GDMA_DRV_CAP_FLAG_1_HWC_TIMEOUT_RECONFIG BIT(3) #define GDMA_DRV_CAP_FLAGS1 \ (GDMA_DRV_CAP_FLAG_1_EQ_SHARING_MULTI_VPORT | \ - GDMA_DRV_CAP_FLAG_1_NAPI_WKDONE_FIX) + GDMA_DRV_CAP_FLAG_1_NAPI_WKDONE_FIX | \ + GDMA_DRV_CAP_FLAG_1_HWC_TIMEOUT_RECONFIG) #define GDMA_DRV_CAP_FLAGS2 0 @@ -664,6 +669,19 @@ struct gdma_disable_queue_req { u32 alloc_res_id_on_creation; }; /* HW DATA */ +/* GDMA_QUERY_HWC_TIMEOUT */ +struct gdma_query_hwc_timeout_req { + struct gdma_req_hdr hdr; + u32 timeout_ms; + u32 reserved; +}; + +struct gdma_query_hwc_timeout_resp { + struct gdma_resp_hdr hdr; + u32 timeout_ms; + u32 reserved; +}; + enum atb_page_size { ATB_PAGE_SIZE_4K, ATB_PAGE_SIZE_8K, diff --git a/include/net/mana/hw_channel.h b/include/net/mana/hw_channel.h index 6a757a6e2732..3d3b5c881bc1 100644 --- a/include/net/mana/hw_channel.h +++ b/include/net/mana/hw_channel.h @@ -23,6 +23,10 @@ #define HWC_INIT_DATA_PF_DEST_RQ_ID 10 #define HWC_INIT_DATA_PF_DEST_CQ_ID 11 +#define HWC_DATA_CFG_HWC_TIMEOUT 1 + +#define HW_CHANNEL_WAIT_RESOURCE_TIMEOUT_MS 30000 + /* Structures labeled with "HW DATA" are exchanged with the hardware. All of * them are naturally aligned and hence don't need __packed. 
  */
@@ -182,6 +186,7 @@ struct hw_channel_context {
 
 	u32 pf_dest_vrq_id;
 	u32 pf_dest_vrcq_id;
+	u32 hwc_timeout;
 
 	struct hwc_caller_ctx *caller_ctx;
 };
-- 
cgit v1.2.3

From 6f5ca184cbef6d2b78772a350a3ed8be696b54a2 Mon Sep 17 00:00:00 2001
From: Eric Dumazet
Date: Thu, 3 Aug 2023 07:53:34 +0000
Subject: tcp/dccp: cache line align inet_hashinfo

I have seen tcp_hashinfo starting at a non optimal location,
forcing input handlers to pull two cache lines instead of one,
and sharing a cache line that was dirtied more than necessary:

ffffffff83680600 b tcp_orphan_timer
ffffffff83680628 b tcp_orphan_cache
ffffffff8368062c b tcp_enable_tx_delay.__tcp_tx_delay_enabled
ffffffff83680630 B tcp_hashinfo
ffffffff83680680 b tcp_cong_list_lock

After this patch, ehash, ehash_locks, ehash_mask and ehash_locks_mask
are located in a read-only cache line.

Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller
---
 include/net/inet_hashtables.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

(limited to 'include')

diff --git a/include/net/inet_hashtables.h b/include/net/inet_hashtables.h
index 1177effabed3..843557223414 100644
--- a/include/net/inet_hashtables.h
+++ b/include/net/inet_hashtables.h
@@ -177,7 +177,7 @@ struct inet_hashinfo {
 	struct inet_listen_hashbucket *lhash2;
 
 	bool pernet;
-};
+} ____cacheline_aligned_in_smp;
 
 static inline struct inet_hashinfo *tcp_or_dccp_get_hashinfo(const struct sock *sk)
 {
-- 
cgit v1.2.3

From 7740bb882fdea16b2f3a4d3804827b910f44206c Mon Sep 17 00:00:00 2001
From: Eric Dumazet
Date: Thu, 3 Aug 2023 07:14:26 +0000
Subject: net: vlan: update wrong comments

vlan_insert_tag() and friends do not allocate a new skb.
However, they might allocate a new skb->head.
Update their comments to better describe their behavior.

Signed-off-by: Eric Dumazet
Reviewed-by: Simon Horman
Signed-off-by: David S. Miller
---
 include/linux/if_vlan.h | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

(limited to 'include')

diff --git a/include/linux/if_vlan.h b/include/linux/if_vlan.h
index 6ba71957851e..3028af87716e 100644
--- a/include/linux/if_vlan.h
+++ b/include/linux/if_vlan.h
@@ -408,7 +408,7 @@ static inline int __vlan_insert_tag(struct sk_buff *skb,
  * @mac_len: MAC header length including outer vlan headers
  *
  * Inserts the VLAN tag into @skb as part of the payload at offset mac_len
- * Returns a VLAN tagged skb. If a new skb is created, @skb is freed.
+ * Returns a VLAN tagged skb. This might change skb->head.
  *
  * Following the skb_unshare() example, in case of error, the calling function
  * doesn't have to worry about freeing the original skb.
@@ -437,7 +437,7 @@ static inline struct sk_buff *vlan_insert_inner_tag(struct sk_buff *skb,
  * @vlan_tci: VLAN TCI to insert
  *
  * Inserts the VLAN tag into @skb as part of the payload
- * Returns a VLAN tagged skb. If a new skb is created, @skb is freed.
+ * Returns a VLAN tagged skb. This might change skb->head.
  *
  * Following the skb_unshare() example, in case of error, the calling function
  * doesn't have to worry about freeing the original skb.
@@ -457,7 +457,7 @@ static inline struct sk_buff *vlan_insert_tag(struct sk_buff *skb,
  * @vlan_tci: VLAN TCI to insert
  *
  * Inserts the VLAN tag into @skb as part of the payload
- * Returns a VLAN tagged skb. If a new skb is created, @skb is freed.
+ * Returns a VLAN tagged skb. This might change skb->head.
  *
  * Following the skb_unshare() example, in case of error, the calling function
  * doesn't have to worry about freeing the original skb.
-- 
cgit v1.2.3

From 8a60a041eada0fbfdc7b6b7a10fdf68ae6a840ce Mon Sep 17 00:00:00 2001
From: Kui-Feng Lee
Date: Thu, 3 Aug 2023 17:51:01 -0700
Subject: bpf: fix inconsistent return types of bpf_xdp_copy_buf().

Fix inconsistent return types in two implementations of
bpf_xdp_copy_buf(). There are two implementations: one is an empty
stub whose return type does not match that of the actual implementation.

Suggested-by: Alexei Starovoitov
Signed-off-by: Kui-Feng Lee
Acked-by: Yonghong Song
Link: https://lore.kernel.org/r/20230804005101.1534505-1-thinker.li@gmail.com
Signed-off-by: Martin KaFai Lau
---
 include/linux/filter.h | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

(limited to 'include')

diff --git a/include/linux/filter.h b/include/linux/filter.h
index 2d6fe30bad5f..761af6b3cf2b 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -1572,10 +1572,9 @@ static inline void *bpf_xdp_pointer(struct xdp_buff *xdp, u32 offset, u32 len)
 	return NULL;
 }
 
-static inline void *bpf_xdp_copy_buf(struct xdp_buff *xdp, unsigned long off, void *buf,
-				     unsigned long len, bool flush)
+static inline void bpf_xdp_copy_buf(struct xdp_buff *xdp, unsigned long off, void *buf,
+				    unsigned long len, bool flush)
 {
-	return NULL;
 }
 #endif /* CONFIG_NET */
 
-- 
cgit v1.2.3

From 57ecc157b68eac54ee7cdc039e0df8f7f7bd5bc7 Mon Sep 17 00:00:00 2001
From: Yue Haibing
Date: Thu, 3 Aug 2023 21:47:47 +0800
Subject: net: llc: Remove unused function declarations

llc_conn_ac_send_i_rsp_as_ack() and llc_conn_ev_sendack_tmr_exp()
have never been implemented since the beginning of git history.

Signed-off-by: Yue Haibing
Reviewed-by: Simon Horman
Link: https://lore.kernel.org/r/20230803134747.41512-1-yuehaibing@huawei.com
Signed-off-by: Jakub Kicinski
---
 include/net/llc_c_ac.h | 1 -
 include/net/llc_c_ev.h | 1 -
 2 files changed, 2 deletions(-)

(limited to 'include')

diff --git a/include/net/llc_c_ac.h b/include/net/llc_c_ac.h
index 3e1f76786d7b..7620a9196922 100644
--- a/include/net/llc_c_ac.h
+++ b/include/net/llc_c_ac.h
@@ -175,7 +175,6 @@ int llc_conn_ac_send_ack_if_needed(struct sock *sk, struct sk_buff *skb);
 int llc_conn_ac_adjust_npta_by_rr(struct sock *sk, struct sk_buff *skb);
 int llc_conn_ac_adjust_npta_by_rnr(struct sock *sk, struct sk_buff *skb);
 int llc_conn_ac_rst_sendack_flag(struct sock *sk, struct sk_buff *skb);
-int llc_conn_ac_send_i_rsp_as_ack(struct sock *sk, struct sk_buff *skb);
 int llc_conn_ac_send_i_as_ack(struct sock *sk, struct sk_buff *skb);
 
 void llc_conn_busy_tmr_cb(struct timer_list *t);
diff --git a/include/net/llc_c_ev.h b/include/net/llc_c_ev.h
index 3948cf111dd0..241889955157 100644
--- a/include/net/llc_c_ev.h
+++ b/include/net/llc_c_ev.h
@@ -158,7 +158,6 @@ int llc_conn_ev_p_tmr_exp(struct sock *sk, struct sk_buff *skb);
 int llc_conn_ev_ack_tmr_exp(struct sock *sk, struct sk_buff *skb);
 int llc_conn_ev_rej_tmr_exp(struct sock *sk, struct sk_buff *skb);
 int llc_conn_ev_busy_tmr_exp(struct sock *sk, struct sk_buff *skb);
-int llc_conn_ev_sendack_tmr_exp(struct sock *sk, struct sk_buff *skb);
 /* NOT_USED functions and their variations */
 int llc_conn_ev_rx_xxx_cmd_pbit_set_1(struct sock *sk, struct sk_buff *skb);
 int llc_conn_ev_rx_xxx_rsp_fbit_set_1(struct sock *sk, struct sk_buff *skb);
-- 
cgit v1.2.3

From 2f0e807bc2f105435be902997a385ebab86d1a7c Mon Sep 17 00:00:00 2001
From: Yue Haibing
Date: Thu, 3 Aug 2023 21:54:24 +0800
Subject: net: 802: Remove unused function declarations

Commit d8d9ba8dc9c7 ("net: 802: remove dead leftover after ipx driver
removal") removed these implementations but left the declarations.

Signed-off-by: Yue Haibing
Reviewed-by: Simon Horman
Link: https://lore.kernel.org/r/20230803135424.41664-1-yuehaibing@huawei.com
Signed-off-by: Jakub Kicinski
---
 include/net/p8022.h | 3 ---
 1 file changed, 3 deletions(-)

(limited to 'include')

diff --git a/include/net/p8022.h b/include/net/p8022.h
index b690ffcad66b..a29e224ac498 100644
--- a/include/net/p8022.h
+++ b/include/net/p8022.h
@@ -13,7 +13,4 @@ register_8022_client(unsigned char type,
 				 struct packet_type *pt,
 				 struct net_device *orig_dev));
 void unregister_8022_client(struct datalink_proto *proto);
-
-struct datalink_proto *make_8023_client(void);
-void destroy_8023_client(struct datalink_proto *dl);
 #endif
-- 
cgit v1.2.3

From 781486e415dc701466a25e49fb8bcc28362d80bf Mon Sep 17 00:00:00 2001
From: Yue Haibing
Date: Thu, 3 Aug 2023 21:45:07 +0800
Subject: af_vsock: Remove unused declaration vsock_release_pending()/vsock_init_tap()

Commit d021c344051a ("VSOCK: Introduce VM Sockets") declared but never
implemented vsock_release_pending(). Also, vsock_init_tap() has never
been implemented since its introduction in commit 531b374834c8
("VSOCK: Add vsockmon tap functions").
Signed-off-by: Yue Haibing Reviewed-by: Simon Horman Reviewed-by: Stefano Garzarella Link: https://lore.kernel.org/r/20230803134507.22660-1-yuehaibing@huawei.com Signed-off-by: Jakub Kicinski --- include/net/af_vsock.h | 2 -- 1 file changed, 2 deletions(-) (limited to 'include') diff --git a/include/net/af_vsock.h b/include/net/af_vsock.h index 0e7504a42925..b01cf9ac2437 100644 --- a/include/net/af_vsock.h +++ b/include/net/af_vsock.h @@ -201,7 +201,6 @@ static inline bool __vsock_in_connected_table(struct vsock_sock *vsk) return !list_empty(&vsk->connected_table); } -void vsock_release_pending(struct sock *pending); void vsock_add_pending(struct sock *listener, struct sock *pending); void vsock_remove_pending(struct sock *listener, struct sock *pending); void vsock_enqueue_accept(struct sock *listener, struct sock *connected); @@ -225,7 +224,6 @@ struct vsock_tap { struct list_head list; }; -int vsock_init_tap(void); int vsock_add_tap(struct vsock_tap *vt); int vsock_remove_tap(struct vsock_tap *vt); void vsock_deliver_tap(struct sk_buff *build_skb(void *opaque), void *opaque); -- cgit v1.2.3 From d58f2e15aa0c07f6f03ec71f64d7697ca43d04a1 Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Fri, 4 Aug 2023 14:46:12 +0000 Subject: tcp: set TCP_USER_TIMEOUT locklessly icsk->icsk_user_timeout can be set locklessly, if all read sides use READ_ONCE(). Signed-off-by: Eric Dumazet Acked-by: Soheil Hassas Yeganeh Signed-off-by: David S. 
Miller --- include/linux/tcp.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include') diff --git a/include/linux/tcp.h b/include/linux/tcp.h index d16abdb3541a..3c5efeeb024f 100644 --- a/include/linux/tcp.h +++ b/include/linux/tcp.h @@ -564,6 +564,6 @@ void __tcp_sock_set_nodelay(struct sock *sk, bool on); void tcp_sock_set_nodelay(struct sock *sk); void tcp_sock_set_quickack(struct sock *sk, int val); int tcp_sock_set_syncnt(struct sock *sk, int val); -void tcp_sock_set_user_timeout(struct sock *sk, u32 val); +int tcp_sock_set_user_timeout(struct sock *sk, int val); #endif /* _LINUX_TCP_H */ -- cgit v1.2.3 From b1d13f7a3b5396503e6869ed627bb4eeab9b524f Mon Sep 17 00:00:00 2001 From: Haiyang Zhang Date: Fri, 4 Aug 2023 13:33:53 -0700 Subject: net: mana: Add page pool for RX buffers Add page pool for RX buffers for faster buffer cycle and reduce CPU usage. The standard page pool API is used. With iperf and 128 threads test, this patch improved the throughput by 12-15%, and decreased the IRQ associated CPU's usage from 99-100% to 10-50%. Signed-off-by: Haiyang Zhang Reviewed-by: Jesse Brandeburg Signed-off-by: David S. Miller --- include/net/mana/mana.h | 3 +++ 1 file changed, 3 insertions(+) (limited to 'include') diff --git a/include/net/mana/mana.h b/include/net/mana/mana.h index 1ccdca03e166..879990101c9f 100644 --- a/include/net/mana/mana.h +++ b/include/net/mana/mana.h @@ -282,6 +282,7 @@ struct mana_recv_buf_oob { struct gdma_wqe_request wqe_req; void *buf_va; + bool from_pool; /* allocated from a page pool */ /* SGL of the buffer going to be sent has part of the work request. */ u32 num_sge; @@ -332,6 +333,8 @@ struct mana_rxq { bool xdp_flush; int xdp_rc; /* XDP redirect return code */ + struct page_pool *page_pool; + /* MUST BE THE LAST MEMBER: * Each receive buffer has an associated mana_recv_buf_oob. 
*/ -- cgit v1.2.3 From 047551cd305ce2f51f8cf16ed5638f1e120f90af Mon Sep 17 00:00:00 2001 From: Yue Haibing Date: Sat, 5 Aug 2023 18:50:33 +0800 Subject: neighbour: Remove unused function declaration pneigh_for_each() pneigh_for_each() has never been implemented since the beginning of git history. Signed-off-by: Yue Haibing Reviewed-by: David Ahern Signed-off-by: David S. Miller --- include/net/neighbour.h | 2 -- 1 file changed, 2 deletions(-) (limited to 'include') diff --git a/include/net/neighbour.h b/include/net/neighbour.h index f6a8ecc6b1fa..6da68886fabb 100644 --- a/include/net/neighbour.h +++ b/include/net/neighbour.h @@ -394,8 +394,6 @@ void neigh_for_each(struct neigh_table *tbl, void __neigh_for_each_release(struct neigh_table *tbl, int (*cb)(struct neighbour *)); int neigh_xmit(int fam, struct net_device *, const void *, struct sk_buff *); -void pneigh_for_each(struct neigh_table *tbl, - void (*cb)(struct pneigh_entry *)); struct neigh_seq_state { struct seq_net_private p; -- cgit v1.2.3 From 992b47851be9125c1de480c8f34cc0ea7eb58daf Mon Sep 17 00:00:00 2001 From: Yue Haibing Date: Sat, 5 Aug 2023 18:52:08 +0800 Subject: net: pkt_cls: Remove unused inline helpers Commit acb674428c3d ("net: sched: introduce per-block callbacks") implemented these but never used them. Signed-off-by: Yue Haibing Signed-off-by: David S.
Miller --- include/net/pkt_cls.h | 13 ------------- 1 file changed, 13 deletions(-) (limited to 'include') diff --git a/include/net/pkt_cls.h b/include/net/pkt_cls.h index 139cd09828af..f308e8268651 100644 --- a/include/net/pkt_cls.h +++ b/include/net/pkt_cls.h @@ -138,19 +138,6 @@ static inline struct Qdisc *tcf_block_q(struct tcf_block *block) return NULL; } -static inline -int tc_setup_cb_block_register(struct tcf_block *block, flow_setup_cb_t *cb, - void *cb_priv) -{ - return 0; -} - -static inline -void tc_setup_cb_block_unregister(struct tcf_block *block, flow_setup_cb_t *cb, - void *cb_priv) -{ -} - static inline int tcf_classify(struct sk_buff *skb, const struct tcf_block *block, const struct tcf_proto *tp, -- cgit v1.2.3 From 2c6af36beb2e7e0066a74bc930e562f66ad88bb6 Mon Sep 17 00:00:00 2001 From: Yue Haibing Date: Sat, 5 Aug 2023 18:53:54 +0800 Subject: ndisc: Remove unused ndisc_ifinfo_sysctl_strategy() declaration Commit f8572d8f2a2b ("sysctl net: Remove unused binary sysctl code") left behind this declaration. Signed-off-by: Yue Haibing Reviewed-by: David Ahern Signed-off-by: David S. 
Miller --- include/net/ndisc.h | 3 --- 1 file changed, 3 deletions(-) (limited to 'include') diff --git a/include/net/ndisc.h b/include/net/ndisc.h index 52eae0943433..9bbdf6eaa942 100644 --- a/include/net/ndisc.h +++ b/include/net/ndisc.h @@ -488,9 +488,6 @@ void igmp6_event_report(struct sk_buff *skb); #ifdef CONFIG_SYSCTL int ndisc_ifinfo_sysctl_change(struct ctl_table *ctl, int write, void *buffer, size_t *lenp, loff_t *ppos); -int ndisc_ifinfo_sysctl_strategy(struct ctl_table *ctl, - void __user *oldval, size_t __user *oldlenp, - void __user *newval, size_t newlen); #endif void inet6_ifinfo_notify(int event, struct inet6_dev *idev); -- cgit v1.2.3 From cc97777c80fdfabe12997581131872a03fdcf683 Mon Sep 17 00:00:00 2001 From: Yue Haibing Date: Sat, 5 Aug 2023 19:00:09 +0800 Subject: udp/udplite: Remove unused function declarations udp{,lite}_get_port() Commit 6ba5a3c52da0 ("[UDP]: Make full use of proto.h.udp_hash innovation.") removed these implementations but leave declarations. Signed-off-by: Yue Haibing Reviewed-by: Willem de Bruijn Signed-off-by: David S. 
Miller --- include/net/udp.h | 3 --- include/net/udplite.h | 2 -- 2 files changed, 5 deletions(-) (limited to 'include') diff --git a/include/net/udp.h b/include/net/udp.h index 5a8421cd9083..488a6d2babcc 100644 --- a/include/net/udp.h +++ b/include/net/udp.h @@ -273,9 +273,6 @@ static inline struct sk_buff *skb_recv_udp(struct sock *sk, unsigned int flags, int udp_v4_early_demux(struct sk_buff *skb); bool udp_sk_rx_dst_set(struct sock *sk, struct dst_entry *dst); -int udp_get_port(struct sock *sk, unsigned short snum, - int (*saddr_cmp)(const struct sock *, - const struct sock *)); int udp_err(struct sk_buff *, u32); int udp_abort(struct sock *sk, int err); int udp_sendmsg(struct sock *sk, struct msghdr *msg, size_t len); diff --git a/include/net/udplite.h b/include/net/udplite.h index 299c14ce2bb9..bd33ff2b8f42 100644 --- a/include/net/udplite.h +++ b/include/net/udplite.h @@ -81,6 +81,4 @@ static inline __wsum udplite_csum(struct sk_buff *skb) } void udplite4_register(void); -int udplite_get_port(struct sock *sk, unsigned short snum, - int (*scmp)(const struct sock *, const struct sock *)); #endif /* _UDPLITE_H */ -- cgit v1.2.3 From f3147015fa0769cf1dcbfdb9040ad380cc4daeb5 Mon Sep 17 00:00:00 2001 From: Maher Sanalla Date: Mon, 12 Jun 2023 11:58:14 +0300 Subject: net/mlx5: Add IRQ vector to CPU lookup function Currently, once driver load completes, IRQ requests have been performed for all vectors. However, as we move to support dynamic creation of EQs, this will not be the case as some IRQs will not exist at this stage. Thus, in such a case, use the default CPU to IRQ mapping, which is the serial mapping based on IRQ vector index. Meaning, the n'th vector gets mapped to the n'th CPU. Introduce an API function mlx5_comp_vector_cpu() that takes an IRQ index and provides the corresponding CPU mapping. It utilizes the existing IRQ affinity if defined, or resorts to the default serialized CPU mapping otherwise.
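The fallback described above can be illustrated with a minimal user-space sketch. It is not the mlx5 implementation: struct demo_irq, demo_comp_vector_get_cpu() and the nr_cpus parameter are hypothetical names made up for illustration, and wrapping the vector index by the CPU count is an assumption of this sketch rather than something the commit message states.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a per-vector IRQ record; not an mlx5 type. */
struct demo_irq {
	int has_affinity;	/* non-zero once an affinity has been assigned */
	int affinity_cpu;	/* CPU taken from that affinity */
};

/*
 * Return the CPU for a completion vector: the IRQ's own affinity when the
 * IRQ already exists and has one, otherwise the serial mapping in which the
 * n'th vector maps to the n'th CPU (wrapped by nr_cpus, an assumption here).
 */
static int demo_comp_vector_get_cpu(const struct demo_irq *irqs, size_t nr_irqs,
				    int vector, int nr_cpus)
{
	assert(vector >= 0 && nr_cpus > 0);

	if ((size_t)vector < nr_irqs && irqs[vector].has_affinity)
		return irqs[vector].affinity_cpu;

	return vector % nr_cpus;
}
```

A caller would pass whatever IRQs have been requested so far; vectors whose IRQ does not exist yet simply take the serial mapping.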
Signed-off-by: Maher Sanalla Reviewed-by: Shay Drory Reviewed-by: Moshe Shemesh Signed-off-by: Saeed Mahameed --- include/linux/mlx5/driver.h | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) (limited to 'include') diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h index fa70c25423b2..e686722fa4ca 100644 --- a/include/linux/mlx5/driver.h +++ b/include/linux/mlx5/driver.h @@ -1109,8 +1109,7 @@ int mlx5_alloc_bfreg(struct mlx5_core_dev *mdev, struct mlx5_sq_bfreg *bfreg, void mlx5_free_bfreg(struct mlx5_core_dev *mdev, struct mlx5_sq_bfreg *bfreg); unsigned int mlx5_comp_vectors_count(struct mlx5_core_dev *dev); -struct cpumask * -mlx5_comp_irq_get_affinity_mask(struct mlx5_core_dev *dev, int vector); +int mlx5_comp_vector_get_cpu(struct mlx5_core_dev *dev, int vector); unsigned int mlx5_core_reserved_gids_count(struct mlx5_core_dev *dev); int mlx5_core_roce_gid_set(struct mlx5_core_dev *dev, unsigned int index, u8 roce_version, u8 roce_l3_type, const u8 *gid, -- cgit v1.2.3 From 674dd4e2e04e7a62bfacf28129e0808f33395bdf Mon Sep 17 00:00:00 2001 From: Maher Sanalla Date: Thu, 22 Jun 2023 18:52:44 +0300 Subject: net/mlx5: Rename mlx5_comp_vectors_count() to mlx5_comp_vectors_max() To accurately represent its purpose, rename the function that retrieves the value of maximum vectors from mlx5_comp_vectors_count() to mlx5_comp_vectors_max(). 
Signed-off-by: Maher Sanalla Reviewed-by: Shay Drory Reviewed-by: Moshe Shemesh Signed-off-by: Saeed Mahameed --- include/linux/mlx5/driver.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include') diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h index e686722fa4ca..43c4fd26c69a 100644 --- a/include/linux/mlx5/driver.h +++ b/include/linux/mlx5/driver.h @@ -1108,7 +1108,7 @@ int mlx5_alloc_bfreg(struct mlx5_core_dev *mdev, struct mlx5_sq_bfreg *bfreg, bool map_wc, bool fast_path); void mlx5_free_bfreg(struct mlx5_core_dev *mdev, struct mlx5_sq_bfreg *bfreg); -unsigned int mlx5_comp_vectors_count(struct mlx5_core_dev *dev); +unsigned int mlx5_comp_vectors_max(struct mlx5_core_dev *dev); int mlx5_comp_vector_get_cpu(struct mlx5_core_dev *dev, int vector); unsigned int mlx5_core_reserved_gids_count(struct mlx5_core_dev *dev); int mlx5_core_roce_gid_set(struct mlx5_core_dev *dev, unsigned int index, -- cgit v1.2.3 From f14c1a14e63227a65faa68237687784a6dd2e922 Mon Sep 17 00:00:00 2001 From: Maher Sanalla Date: Mon, 12 Jun 2023 10:13:50 +0300 Subject: net/mlx5: Allocate completion EQs dynamically This commit enables the dynamic allocation of EQs at runtime, allowing for more flexibility in managing completion EQs and reducing the memory overhead of driver load. Whenever a CQ is created for a given vector index, the driver will lookup to see if there is an already mapped completion EQ for that vector, if so, utilize it. Otherwise, allocate a new EQ on demand and then utilize it for the CQ completion events. Add a protection lock to the EQ table to protect from concurrent EQ creation attempts. While at it, replace mlx5_vector2irqn()/mlx5_vector2eqn() with mlx5_comp_eqn_get() and mlx5_comp_irqn_get() which will allocate an EQ on demand if no EQ is found for the given vector. 
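The lookup-or-create behaviour described above can be sketched in user space as follows. This is only an illustration under stated assumptions: struct demo_eq, struct demo_eq_table and demo_comp_eqn_get() are hypothetical names, a pthread mutex stands in for the EQ table lock, and malloc() stands in for real EQ creation.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

#define DEMO_MAX_VECTORS 8

/* Hypothetical EQ object and table; these are not the mlx5 structures. */
struct demo_eq {
	int eqn;
};

struct demo_eq_table {
	pthread_mutex_t lock;	/* serializes concurrent EQ creation */
	struct demo_eq *comp_eqs[DEMO_MAX_VECTORS];
	int next_eqn;
};

/*
 * Look up the EQ mapped to vecidx and create it on first use.
 * Returns 0 and stores the EQ number in *eqn, or -1 on failure.
 */
static int demo_comp_eqn_get(struct demo_eq_table *t, unsigned int vecidx,
			     int *eqn)
{
	struct demo_eq *eq;
	int ret = 0;

	assert(t && eqn);
	if (vecidx >= DEMO_MAX_VECTORS)
		return -1;

	pthread_mutex_lock(&t->lock);
	eq = t->comp_eqs[vecidx];
	if (!eq) {
		/* No EQ mapped to this vector yet: allocate one on demand. */
		eq = malloc(sizeof(*eq));
		if (!eq) {
			ret = -1;
			goto out;
		}
		eq->eqn = t->next_eqn++;
		t->comp_eqs[vecidx] = eq;
	}
	*eqn = eq->eqn;
out:
	pthread_mutex_unlock(&t->lock);
	return ret;
}
```

Holding the lock across both the lookup and the creation is what keeps two callers asking for the same vector from each creating an EQ.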
Signed-off-by: Maher Sanalla Reviewed-by: Shay Drory Reviewed-by: Moshe Shemesh Signed-off-by: Saeed Mahameed --- include/linux/mlx5/driver.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include') diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h index 43c4fd26c69a..3e1017d764b7 100644 --- a/include/linux/mlx5/driver.h +++ b/include/linux/mlx5/driver.h @@ -1058,7 +1058,7 @@ void mlx5_unregister_debugfs(void); void mlx5_fill_page_frag_array_perm(struct mlx5_frag_buf *buf, __be64 *pas, u8 perm); void mlx5_fill_page_frag_array(struct mlx5_frag_buf *frag_buf, __be64 *pas); -int mlx5_vector2eqn(struct mlx5_core_dev *dev, int vector, int *eqn); +int mlx5_comp_eqn_get(struct mlx5_core_dev *dev, u16 vecidx, int *eqn); int mlx5_core_attach_mcg(struct mlx5_core_dev *dev, union ib_gid *mgid, u32 qpn); int mlx5_core_detach_mcg(struct mlx5_core_dev *dev, union ib_gid *mgid, u32 qpn); -- cgit v1.2.3 From 26cfb838aa002a5c03319f7fce87e9313e794351 Mon Sep 17 00:00:00 2001 From: Johannes Zink Date: Tue, 1 Aug 2023 17:44:29 +0200 Subject: net: stmmac: correct MAC propagation delay The IEEE1588 Standard specifies that the timestamps of Packets must be captured when the PTP message timestamp point (leading edge of first octet after the start of frame delimiter) crosses the boundary between the node and the network. As the MAC latches the timestamp at an internal point, the captured timestamp must be corrected for the additional data transmission latency, as described in the publicly available datasheet [1]. This patch only corrects for the MAC-Internal delay, which can be read out from the MAC_Ingress_Timestamp_Latency register on DWMAC version 5, since the Phy framework currently does not support querying the Phy ingress and egress latency. The Clock Domain Crossing Circuits errors as indicated in [1] are already being accounted for in the stmmac_get_tx_hwtstamp() function and are not corrected here.
As the Latency varies for different link speeds and MII modes of operation, the correction value needs to be updated on each link state change. As the delay also causes a phase shift in the timestamp counter compared to the rest of the network, this correction will also reduce phase error when generating PPS outputs from the timestamp counter. Since the correction registers may be unavailable on some hardware and no feature bits are documented for dynamic detection of the MAC propagation delay readout, introduce a feature bit to explicitly enable MAC delay Correction in the gluecode driver. [1] i.MX8MP Reference Manual, rev.1 Section 11.7.2.5.3 "Timestamp correction" Signed-off-by: Johannes Zink Link: https://lore.kernel.org/r/20230719-stmmac_correct_mac_delay-v2-1-3366f38ee9a6@pengutronix.de Link: https://lore.kernel.org/r/20230719-stmmac_correct_mac_delay-v3-1-61e63427735e@pengutronix.de Signed-off-by: Jakub Kicinski --- include/linux/stmmac.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include') diff --git a/include/linux/stmmac.h b/include/linux/stmmac.h index 3d0702510224..652404c03944 100644 --- a/include/linux/stmmac.h +++ b/include/linux/stmmac.h @@ -218,6 +218,7 @@ struct dwmac4_addrs { #define STMMAC_FLAG_INT_SNAPSHOT_EN BIT(9) #define STMMAC_FLAG_RX_CLK_RUNS_IN_LPI BIT(10) #define STMMAC_FLAG_EN_TX_LPI_CLOCKGATING BIT(11) +#define STMMAC_FLAG_HWTSTAMP_CORRECT_LATENCY BIT(12) struct plat_stmmacenet_data { int bus_id; -- cgit v1.2.3 From a9ca9f9ceff382b58b488248f0c0da9e157f5d06 Mon Sep 17 00:00:00 2001 From: Yunsheng Lin Date: Fri, 4 Aug 2023 20:05:24 +0200 Subject: page_pool: split types and declarations from page_pool.h Split types and pure function declarations from page_pool.h and add them in page_pool/types.h, so that C sources can include page_pool.h and headers should generally only include page_pool/types.h as suggested by Jakub. Rename page_pool.h to page_pool/helpers.h to have both in one place.
Signed-off-by: Yunsheng Lin Suggested-by: Jakub Kicinski Signed-off-by: Alexander Lobakin Reviewed-by: Alexander Duyck Link: https://lore.kernel.org/r/20230804180529.2483231-2-aleksander.lobakin@intel.com [Jakub: change microsoft/mana, fix kdoc paths in Documentation] Signed-off-by: Jakub Kicinski --- include/linux/skbuff.h | 2 +- include/net/page_pool.h | 470 --------------------------------------- include/net/page_pool/helpers.h | 238 ++++++++++++++++++++ include/net/page_pool/types.h | 238 ++++++++++++++++++++ include/trace/events/page_pool.h | 2 +- 5 files changed, 478 insertions(+), 472 deletions(-) delete mode 100644 include/net/page_pool.h create mode 100644 include/net/page_pool/helpers.h create mode 100644 include/net/page_pool/types.h (limited to 'include') diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index 16a49ba534e4..888e3d7e74c1 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -32,7 +32,7 @@ #include #include #include -#include +#include #if IS_ENABLED(CONFIG_NF_CONNTRACK) #include #endif diff --git a/include/net/page_pool.h b/include/net/page_pool.h deleted file mode 100644 index 73d4f786418d..000000000000 --- a/include/net/page_pool.h +++ /dev/null @@ -1,470 +0,0 @@ -/* SPDX-License-Identifier: GPL-2.0 - * - * page_pool.h - * Author: Jesper Dangaard Brouer - * Copyright (C) 2016 Red Hat, Inc. - */ - -/** - * DOC: page_pool allocator - * - * This page_pool allocator is optimized for the XDP mode that - * uses one-frame-per-page, but have fallbacks that act like the - * regular page allocator APIs. - * - * Basic use involve replacing alloc_pages() calls with the - * page_pool_alloc_pages() call. Drivers should likely use - * page_pool_dev_alloc_pages() replacing dev_alloc_pages(). - * - * API keeps track of in-flight pages, in-order to let API user know - * when it is safe to dealloactor page_pool object. 
Thus, API users - * must call page_pool_put_page() where appropriate and only attach - * the page to a page_pool-aware objects, like skbs marked for recycling. - * - * API user must only call page_pool_put_page() once on a page, as it - * will either recycle the page, or in case of elevated refcnt, it - * will release the DMA mapping and in-flight state accounting. We - * hope to lift this requirement in the future. - */ -#ifndef _NET_PAGE_POOL_H -#define _NET_PAGE_POOL_H - -#include /* Needed by ptr_ring */ -#include -#include - -#define PP_FLAG_DMA_MAP BIT(0) /* Should page_pool do the DMA - * map/unmap - */ -#define PP_FLAG_DMA_SYNC_DEV BIT(1) /* If set all pages that the driver gets - * from page_pool will be - * DMA-synced-for-device according to - * the length provided by the device - * driver. - * Please note DMA-sync-for-CPU is still - * device driver responsibility - */ -#define PP_FLAG_PAGE_FRAG BIT(2) /* for page frag feature */ -#define PP_FLAG_ALL (PP_FLAG_DMA_MAP |\ - PP_FLAG_DMA_SYNC_DEV |\ - PP_FLAG_PAGE_FRAG) - -/* - * Fast allocation side cache array/stack - * - * The cache size and refill watermark is related to the network - * use-case. The NAPI budget is 64 packets. After a NAPI poll the RX - * ring is usually refilled and the max consumed elements will be 64, - * thus a natural max size of objects needed in the cache. - * - * Keeping room for more objects, is due to XDP_DROP use-case. As - * XDP_DROP allows the opportunity to recycle objects directly into - * this array, as it shares the same softirq/NAPI protection. If - * cache is already full (or partly full) then the XDP_DROP recycles - * would have to take a slower code path. 
- */ -#define PP_ALLOC_CACHE_SIZE 128 -#define PP_ALLOC_CACHE_REFILL 64 -struct pp_alloc_cache { - u32 count; - struct page *cache[PP_ALLOC_CACHE_SIZE]; -}; - -/** - * struct page_pool_params - page pool parameters - * @flags: PP_FLAG_DMA_MAP, PP_FLAG_DMA_SYNC_DEV, PP_FLAG_PAGE_FRAG - * @order: 2^order pages on allocation - * @pool_size: size of the ptr_ring - * @nid: NUMA node id to allocate from pages from - * @dev: device, for DMA pre-mapping purposes - * @napi: NAPI which is the sole consumer of pages, otherwise NULL - * @dma_dir: DMA mapping direction - * @max_len: max DMA sync memory size for PP_FLAG_DMA_SYNC_DEV - * @offset: DMA sync address offset for PP_FLAG_DMA_SYNC_DEV - */ -struct page_pool_params { - unsigned int flags; - unsigned int order; - unsigned int pool_size; - int nid; - struct device *dev; - struct napi_struct *napi; - enum dma_data_direction dma_dir; - unsigned int max_len; - unsigned int offset; -/* private: used by test code only */ - void (*init_callback)(struct page *page, void *arg); - void *init_arg; -}; - -#ifdef CONFIG_PAGE_POOL_STATS -/** - * struct page_pool_alloc_stats - allocation statistics - * @fast: successful fast path allocations - * @slow: slow path order-0 allocations - * @slow_high_order: slow path high order allocations - * @empty: ptr ring is empty, so a slow path allocation was forced - * @refill: an allocation which triggered a refill of the cache - * @waive: pages obtained from the ptr ring that cannot be added to - * the cache due to a NUMA mismatch - */ -struct page_pool_alloc_stats { - u64 fast; - u64 slow; - u64 slow_high_order; - u64 empty; - u64 refill; - u64 waive; -}; - -/** - * struct page_pool_recycle_stats - recycling (freeing) statistics - * @cached: recycling placed page in the page pool cache - * @cache_full: page pool cache was full - * @ring: page placed into the ptr ring - * @ring_full: page released from page pool because the ptr ring was full - * @released_refcnt: page released (and not recycled) 
because refcnt > 1 - */ -struct page_pool_recycle_stats { - u64 cached; - u64 cache_full; - u64 ring; - u64 ring_full; - u64 released_refcnt; -}; - -/** - * struct page_pool_stats - combined page pool use statistics - * @alloc_stats: see struct page_pool_alloc_stats - * @recycle_stats: see struct page_pool_recycle_stats - * - * Wrapper struct for combining page pool stats with different storage - * requirements. - */ -struct page_pool_stats { - struct page_pool_alloc_stats alloc_stats; - struct page_pool_recycle_stats recycle_stats; -}; - -int page_pool_ethtool_stats_get_count(void); -u8 *page_pool_ethtool_stats_get_strings(u8 *data); -u64 *page_pool_ethtool_stats_get(u64 *data, void *stats); - -/* - * Drivers that wish to harvest page pool stats and report them to users - * (perhaps via ethtool, debugfs, or another mechanism) can allocate a - * struct page_pool_stats call page_pool_get_stats to get stats for the specified pool. - */ -bool page_pool_get_stats(struct page_pool *pool, - struct page_pool_stats *stats); -#else - -static inline int page_pool_ethtool_stats_get_count(void) -{ - return 0; -} - -static inline u8 *page_pool_ethtool_stats_get_strings(u8 *data) -{ - return data; -} - -static inline u64 *page_pool_ethtool_stats_get(u64 *data, void *stats) -{ - return data; -} - -#endif - -struct page_pool { - struct page_pool_params p; - - struct delayed_work release_dw; - void (*disconnect)(void *); - unsigned long defer_start; - unsigned long defer_warn; - - u32 pages_state_hold_cnt; - unsigned int frag_offset; - struct page *frag_page; - long frag_users; - -#ifdef CONFIG_PAGE_POOL_STATS - /* these stats are incremented while in softirq context */ - struct page_pool_alloc_stats alloc_stats; -#endif - u32 xdp_mem_id; - - /* - * Data structure for allocation side - * - * Drivers allocation side usually already perform some kind - * of resource protection. Piggyback on this protection, and - * require driver to protect allocation side. 
- * - * For NIC drivers this means, allocate a page_pool per - * RX-queue. As the RX-queue is already protected by - * Softirq/BH scheduling and napi_schedule. NAPI schedule - * guarantee that a single napi_struct will only be scheduled - * on a single CPU (see napi_schedule). - */ - struct pp_alloc_cache alloc ____cacheline_aligned_in_smp; - - /* Data structure for storing recycled pages. - * - * Returning/freeing pages is more complicated synchronization - * wise, because free's can happen on remote CPUs, with no - * association with allocation resource. - * - * Use ptr_ring, as it separates consumer and producer - * effeciently, it a way that doesn't bounce cache-lines. - * - * TODO: Implement bulk return pages into this structure. - */ - struct ptr_ring ring; - -#ifdef CONFIG_PAGE_POOL_STATS - /* recycle stats are per-cpu to avoid locking */ - struct page_pool_recycle_stats __percpu *recycle_stats; -#endif - atomic_t pages_state_release_cnt; - - /* A page_pool is strictly tied to a single RX-queue being - * protected by NAPI, due to above pp_alloc_cache. This - * refcnt serves purpose is to simplify drivers error handling. - */ - refcount_t user_cnt; - - u64 destroy_cnt; -}; - -struct page *page_pool_alloc_pages(struct page_pool *pool, gfp_t gfp); - -/** - * page_pool_dev_alloc_pages() - allocate a page. - * @pool: pool from which to allocate - * - * Get a page from the page allocator or page_pool caches. 
- */ -static inline struct page *page_pool_dev_alloc_pages(struct page_pool *pool) -{ - gfp_t gfp = (GFP_ATOMIC | __GFP_NOWARN); - - return page_pool_alloc_pages(pool, gfp); -} - -struct page *page_pool_alloc_frag(struct page_pool *pool, unsigned int *offset, - unsigned int size, gfp_t gfp); - -static inline struct page *page_pool_dev_alloc_frag(struct page_pool *pool, - unsigned int *offset, - unsigned int size) -{ - gfp_t gfp = (GFP_ATOMIC | __GFP_NOWARN); - - return page_pool_alloc_frag(pool, offset, size, gfp); -} - -/** - * page_pool_get_dma_dir() - Retrieve the stored DMA direction. - * @pool: pool from which page was allocated - * - * Get the stored dma direction. A driver might decide to store this locally - * and avoid the extra cache line from page_pool to determine the direction. - */ -static -inline enum dma_data_direction page_pool_get_dma_dir(struct page_pool *pool) -{ - return pool->p.dma_dir; -} - -bool page_pool_return_skb_page(struct page *page, bool napi_safe); - -struct page_pool *page_pool_create(const struct page_pool_params *params); - -struct xdp_mem_info; - -#ifdef CONFIG_PAGE_POOL -void page_pool_unlink_napi(struct page_pool *pool); -void page_pool_destroy(struct page_pool *pool); -void page_pool_use_xdp_mem(struct page_pool *pool, void (*disconnect)(void *), - struct xdp_mem_info *mem); -void page_pool_put_page_bulk(struct page_pool *pool, void **data, - int count); -#else -static inline void page_pool_unlink_napi(struct page_pool *pool) -{ -} - -static inline void page_pool_destroy(struct page_pool *pool) -{ -} - -static inline void page_pool_use_xdp_mem(struct page_pool *pool, - void (*disconnect)(void *), - struct xdp_mem_info *mem) -{ -} - -static inline void page_pool_put_page_bulk(struct page_pool *pool, void **data, - int count) -{ -} -#endif - -void page_pool_put_defragged_page(struct page_pool *pool, struct page *page, - unsigned int dma_sync_size, - bool allow_direct); - -/* pp_frag_count represents the number of writers who can 
update the page - * either by updating skb->data or via DMA mappings for the device. - * We can't rely on the page refcnt for that as we don't know who might be - * holding page references and we can't reliably destroy or sync DMA mappings - * of the fragments. - * - * When pp_frag_count reaches 0 we can either recycle the page if the page - * refcnt is 1 or return it back to the memory allocator and destroy any - * mappings we have. - */ -static inline void page_pool_fragment_page(struct page *page, long nr) -{ - atomic_long_set(&page->pp_frag_count, nr); -} - -static inline long page_pool_defrag_page(struct page *page, long nr) -{ - long ret; - - /* If nr == pp_frag_count then we have cleared all remaining - * references to the page. No need to actually overwrite it, instead - * we can leave this to be overwritten by the calling function. - * - * The main advantage to doing this is that an atomic_read is - * generally a much cheaper operation than an atomic update, - * especially when dealing with a page that may be partitioned - * into only 2 or 3 pieces. - */ - if (atomic_long_read(&page->pp_frag_count) == nr) - return 0; - - ret = atomic_long_sub_return(nr, &page->pp_frag_count); - WARN_ON(ret < 0); - return ret; -} - -static inline bool page_pool_is_last_frag(struct page_pool *pool, - struct page *page) -{ - /* If fragments aren't enabled or count is 0 we were the last user */ - return !(pool->p.flags & PP_FLAG_PAGE_FRAG) || - (page_pool_defrag_page(page, 1) == 0); -} - -/** - * page_pool_put_page() - release a reference to a page pool page - * @pool: pool from which page was allocated - * @page: page to release a reference on - * @dma_sync_size: how much of the page may have been touched by the device - * @allow_direct: released by the consumer, allow lockless caching - * - * The outcome of this depends on the page refcnt. If the driver bumps - * the refcnt > 1 this will unmap the page. 
If the page refcnt is 1 - * the allocator owns the page and will try to recycle it in one of the pool - * caches. If PP_FLAG_DMA_SYNC_DEV is set, the page will be synced for_device - * using dma_sync_single_range_for_device(). - */ -static inline void page_pool_put_page(struct page_pool *pool, - struct page *page, - unsigned int dma_sync_size, - bool allow_direct) -{ - /* When page_pool isn't compiled-in, net/core/xdp.c doesn't - * allow registering MEM_TYPE_PAGE_POOL, but shield linker. - */ -#ifdef CONFIG_PAGE_POOL - if (!page_pool_is_last_frag(pool, page)) - return; - - page_pool_put_defragged_page(pool, page, dma_sync_size, allow_direct); -#endif -} - -/** - * page_pool_put_full_page() - release a reference on a page pool page - * @pool: pool from which page was allocated - * @page: page to release a reference on - * @allow_direct: released by the consumer, allow lockless caching - * - * Similar to page_pool_put_page(), but will DMA sync the entire memory area - * as configured in &page_pool_params.max_len. - */ -static inline void page_pool_put_full_page(struct page_pool *pool, - struct page *page, bool allow_direct) -{ - page_pool_put_page(pool, page, -1, allow_direct); -} - -/** - * page_pool_recycle_direct() - release a reference on a page pool page - * @pool: pool from which page was allocated - * @page: page to release a reference on - * - * Similar to page_pool_put_full_page() but caller must guarantee safe context - * (e.g NAPI), since it will recycle the page directly into the pool fast cache. - */ -static inline void page_pool_recycle_direct(struct page_pool *pool, - struct page *page) -{ - page_pool_put_full_page(pool, page, true); -} - -#define PAGE_POOL_DMA_USE_PP_FRAG_COUNT \ - (sizeof(dma_addr_t) > sizeof(unsigned long)) - -/** - * page_pool_get_dma_addr() - Retrieve the stored DMA address. - * @page: page allocated from a page pool - * - * Fetch the DMA address of the page. 
The page pool to which the page belongs - * must had been created with PP_FLAG_DMA_MAP. - */ -static inline dma_addr_t page_pool_get_dma_addr(struct page *page) -{ - dma_addr_t ret = page->dma_addr; - - if (PAGE_POOL_DMA_USE_PP_FRAG_COUNT) - ret |= (dma_addr_t)page->dma_addr_upper << 16 << 16; - - return ret; -} - -static inline void page_pool_set_dma_addr(struct page *page, dma_addr_t addr) -{ - page->dma_addr = addr; - if (PAGE_POOL_DMA_USE_PP_FRAG_COUNT) - page->dma_addr_upper = upper_32_bits(addr); -} - -static inline bool is_page_pool_compiled_in(void) -{ -#ifdef CONFIG_PAGE_POOL - return true; -#else - return false; -#endif -} - -static inline bool page_pool_put(struct page_pool *pool) -{ - return refcount_dec_and_test(&pool->user_cnt); -} - -/* Caller must provide appropriate safe context, e.g. NAPI. */ -void page_pool_update_nid(struct page_pool *pool, int new_nid); -static inline void page_pool_nid_changed(struct page_pool *pool, int new_nid) -{ - if (unlikely(pool->p.nid != new_nid)) - page_pool_update_nid(pool, new_nid); -} - -#endif /* _NET_PAGE_POOL_H */ diff --git a/include/net/page_pool/helpers.h b/include/net/page_pool/helpers.h new file mode 100644 index 000000000000..78df91804c87 --- /dev/null +++ b/include/net/page_pool/helpers.h @@ -0,0 +1,238 @@ +/* SPDX-License-Identifier: GPL-2.0 + * + * page_pool/helpers.h + * Author: Jesper Dangaard Brouer + * Copyright (C) 2016 Red Hat, Inc. + */ + +/** + * DOC: page_pool allocator + * + * This page_pool allocator is optimized for the XDP mode that + * uses one-frame-per-page, but have fallbacks that act like the + * regular page allocator APIs. + * + * Basic use involve replacing alloc_pages() calls with the + * page_pool_alloc_pages() call. Drivers should likely use + * page_pool_dev_alloc_pages() replacing dev_alloc_pages(). + * + * API keeps track of in-flight pages, in-order to let API user know + * when it is safe to dealloactor page_pool object. 
Thus, API users + * must call page_pool_put_page() where appropriate and only attach + * the page to a page_pool-aware objects, like skbs marked for recycling. + * + * API user must only call page_pool_put_page() once on a page, as it + * will either recycle the page, or in case of elevated refcnt, it + * will release the DMA mapping and in-flight state accounting. We + * hope to lift this requirement in the future. + */ +#ifndef _NET_PAGE_POOL_HELPERS_H +#define _NET_PAGE_POOL_HELPERS_H + +#include + +#ifdef CONFIG_PAGE_POOL_STATS +int page_pool_ethtool_stats_get_count(void); +u8 *page_pool_ethtool_stats_get_strings(u8 *data); +u64 *page_pool_ethtool_stats_get(u64 *data, void *stats); + +/* + * Drivers that wish to harvest page pool stats and report them to users + * (perhaps via ethtool, debugfs, or another mechanism) can allocate a + * struct page_pool_stats call page_pool_get_stats to get stats for the specified pool. + */ +bool page_pool_get_stats(struct page_pool *pool, + struct page_pool_stats *stats); +#else +static inline int page_pool_ethtool_stats_get_count(void) +{ + return 0; +} + +static inline u8 *page_pool_ethtool_stats_get_strings(u8 *data) +{ + return data; +} + +static inline u64 *page_pool_ethtool_stats_get(u64 *data, void *stats) +{ + return data; +} +#endif + +/** + * page_pool_dev_alloc_pages() - allocate a page. + * @pool: pool from which to allocate + * + * Get a page from the page allocator or page_pool caches. + */ +static inline struct page *page_pool_dev_alloc_pages(struct page_pool *pool) +{ + gfp_t gfp = (GFP_ATOMIC | __GFP_NOWARN); + + return page_pool_alloc_pages(pool, gfp); +} + +static inline struct page *page_pool_dev_alloc_frag(struct page_pool *pool, + unsigned int *offset, + unsigned int size) +{ + gfp_t gfp = (GFP_ATOMIC | __GFP_NOWARN); + + return page_pool_alloc_frag(pool, offset, size, gfp); +} + +/** + * page_pool_get_dma_dir() - Retrieve the stored DMA direction. 
+ * @pool: pool from which page was allocated + * + * Get the stored dma direction. A driver might decide to store this locally + * and avoid the extra cache line from page_pool to determine the direction. + */ +static +inline enum dma_data_direction page_pool_get_dma_dir(struct page_pool *pool) +{ + return pool->p.dma_dir; +} + +/* pp_frag_count represents the number of writers who can update the page + * either by updating skb->data or via DMA mappings for the device. + * We can't rely on the page refcnt for that as we don't know who might be + * holding page references and we can't reliably destroy or sync DMA mappings + * of the fragments. + * + * When pp_frag_count reaches 0 we can either recycle the page if the page + * refcnt is 1 or return it back to the memory allocator and destroy any + * mappings we have. + */ +static inline void page_pool_fragment_page(struct page *page, long nr) +{ + atomic_long_set(&page->pp_frag_count, nr); +} + +static inline long page_pool_defrag_page(struct page *page, long nr) +{ + long ret; + + /* If nr == pp_frag_count then we have cleared all remaining + * references to the page. No need to actually overwrite it, instead + * we can leave this to be overwritten by the calling function. + * + * The main advantage to doing this is that an atomic_read is + * generally a much cheaper operation than an atomic update, + * especially when dealing with a page that may be partitioned + * into only 2 or 3 pieces. 
+ */ + if (atomic_long_read(&page->pp_frag_count) == nr) + return 0; + + ret = atomic_long_sub_return(nr, &page->pp_frag_count); + WARN_ON(ret < 0); + return ret; +} + +static inline bool page_pool_is_last_frag(struct page_pool *pool, + struct page *page) +{ + /* If fragments aren't enabled or count is 0 we were the last user */ + return !(pool->p.flags & PP_FLAG_PAGE_FRAG) || + (page_pool_defrag_page(page, 1) == 0); +} + +/** + * page_pool_put_page() - release a reference to a page pool page + * @pool: pool from which page was allocated + * @page: page to release a reference on + * @dma_sync_size: how much of the page may have been touched by the device + * @allow_direct: released by the consumer, allow lockless caching + * + * The outcome of this depends on the page refcnt. If the driver bumps + * the refcnt > 1 this will unmap the page. If the page refcnt is 1 + * the allocator owns the page and will try to recycle it in one of the pool + * caches. If PP_FLAG_DMA_SYNC_DEV is set, the page will be synced for_device + * using dma_sync_single_range_for_device(). + */ +static inline void page_pool_put_page(struct page_pool *pool, + struct page *page, + unsigned int dma_sync_size, + bool allow_direct) +{ + /* When page_pool isn't compiled-in, net/core/xdp.c doesn't + * allow registering MEM_TYPE_PAGE_POOL, but shield linker. + */ +#ifdef CONFIG_PAGE_POOL + if (!page_pool_is_last_frag(pool, page)) + return; + + page_pool_put_defragged_page(pool, page, dma_sync_size, allow_direct); +#endif +} + +/** + * page_pool_put_full_page() - release a reference on a page pool page + * @pool: pool from which page was allocated + * @page: page to release a reference on + * @allow_direct: released by the consumer, allow lockless caching + * + * Similar to page_pool_put_page(), but will DMA sync the entire memory area + * as configured in &page_pool_params.max_len. 
+ */ +static inline void page_pool_put_full_page(struct page_pool *pool, + struct page *page, bool allow_direct) +{ + page_pool_put_page(pool, page, -1, allow_direct); +} + +/** + * page_pool_recycle_direct() - release a reference on a page pool page + * @pool: pool from which page was allocated + * @page: page to release a reference on + * + * Similar to page_pool_put_full_page() but caller must guarantee safe context + * (e.g NAPI), since it will recycle the page directly into the pool fast cache. + */ +static inline void page_pool_recycle_direct(struct page_pool *pool, + struct page *page) +{ + page_pool_put_full_page(pool, page, true); +} + +#define PAGE_POOL_DMA_USE_PP_FRAG_COUNT \ + (sizeof(dma_addr_t) > sizeof(unsigned long)) + +/** + * page_pool_get_dma_addr() - Retrieve the stored DMA address. + * @page: page allocated from a page pool + * + * Fetch the DMA address of the page. The page pool to which the page belongs + * must had been created with PP_FLAG_DMA_MAP. + */ +static inline dma_addr_t page_pool_get_dma_addr(struct page *page) +{ + dma_addr_t ret = page->dma_addr; + + if (PAGE_POOL_DMA_USE_PP_FRAG_COUNT) + ret |= (dma_addr_t)page->dma_addr_upper << 16 << 16; + + return ret; +} + +static inline void page_pool_set_dma_addr(struct page *page, dma_addr_t addr) +{ + page->dma_addr = addr; + if (PAGE_POOL_DMA_USE_PP_FRAG_COUNT) + page->dma_addr_upper = upper_32_bits(addr); +} + +static inline bool page_pool_put(struct page_pool *pool) +{ + return refcount_dec_and_test(&pool->user_cnt); +} + +static inline void page_pool_nid_changed(struct page_pool *pool, int new_nid) +{ + if (unlikely(pool->p.nid != new_nid)) + page_pool_update_nid(pool, new_nid); +} + +#endif /* _NET_PAGE_POOL_HELPERS_H */ diff --git a/include/net/page_pool/types.h b/include/net/page_pool/types.h new file mode 100644 index 000000000000..9ac39191bed7 --- /dev/null +++ b/include/net/page_pool/types.h @@ -0,0 +1,238 @@ +/* SPDX-License-Identifier: GPL-2.0 */ + +#ifndef 
_NET_PAGE_POOL_TYPES_H +#define _NET_PAGE_POOL_TYPES_H + +#include +#include + +#define PP_FLAG_DMA_MAP BIT(0) /* Should page_pool do the DMA + * map/unmap + */ +#define PP_FLAG_DMA_SYNC_DEV BIT(1) /* If set all pages that the driver gets + * from page_pool will be + * DMA-synced-for-device according to + * the length provided by the device + * driver. + * Please note DMA-sync-for-CPU is still + * device driver responsibility + */ +#define PP_FLAG_PAGE_FRAG BIT(2) /* for page frag feature */ +#define PP_FLAG_ALL (PP_FLAG_DMA_MAP |\ + PP_FLAG_DMA_SYNC_DEV |\ + PP_FLAG_PAGE_FRAG) + +/* + * Fast allocation side cache array/stack + * + * The cache size and refill watermark is related to the network + * use-case. The NAPI budget is 64 packets. After a NAPI poll the RX + * ring is usually refilled and the max consumed elements will be 64, + * thus a natural max size of objects needed in the cache. + * + * Keeping room for more objects, is due to XDP_DROP use-case. As + * XDP_DROP allows the opportunity to recycle objects directly into + * this array, as it shares the same softirq/NAPI protection. If + * cache is already full (or partly full) then the XDP_DROP recycles + * would have to take a slower code path. 
+ */ +#define PP_ALLOC_CACHE_SIZE 128 +#define PP_ALLOC_CACHE_REFILL 64 +struct pp_alloc_cache { + u32 count; + struct page *cache[PP_ALLOC_CACHE_SIZE]; +}; + +/** + * struct page_pool_params - page pool parameters + * @flags: PP_FLAG_DMA_MAP, PP_FLAG_DMA_SYNC_DEV, PP_FLAG_PAGE_FRAG + * @order: 2^order pages on allocation + * @pool_size: size of the ptr_ring + * @nid: NUMA node id to allocate from pages from + * @dev: device, for DMA pre-mapping purposes + * @napi: NAPI which is the sole consumer of pages, otherwise NULL + * @dma_dir: DMA mapping direction + * @max_len: max DMA sync memory size for PP_FLAG_DMA_SYNC_DEV + * @offset: DMA sync address offset for PP_FLAG_DMA_SYNC_DEV + */ +struct page_pool_params { + unsigned int flags; + unsigned int order; + unsigned int pool_size; + int nid; + struct device *dev; + struct napi_struct *napi; + enum dma_data_direction dma_dir; + unsigned int max_len; + unsigned int offset; +/* private: used by test code only */ + void (*init_callback)(struct page *page, void *arg); + void *init_arg; +}; + +#ifdef CONFIG_PAGE_POOL_STATS +/** + * struct page_pool_alloc_stats - allocation statistics + * @fast: successful fast path allocations + * @slow: slow path order-0 allocations + * @slow_high_order: slow path high order allocations + * @empty: ptr ring is empty, so a slow path allocation was forced + * @refill: an allocation which triggered a refill of the cache + * @waive: pages obtained from the ptr ring that cannot be added to + * the cache due to a NUMA mismatch + */ +struct page_pool_alloc_stats { + u64 fast; + u64 slow; + u64 slow_high_order; + u64 empty; + u64 refill; + u64 waive; +}; + +/** + * struct page_pool_recycle_stats - recycling (freeing) statistics + * @cached: recycling placed page in the page pool cache + * @cache_full: page pool cache was full + * @ring: page placed into the ptr ring + * @ring_full: page released from page pool because the ptr ring was full + * @released_refcnt: page released (and not recycled) 
because refcnt > 1 + */ +struct page_pool_recycle_stats { + u64 cached; + u64 cache_full; + u64 ring; + u64 ring_full; + u64 released_refcnt; +}; + +/** + * struct page_pool_stats - combined page pool use statistics + * @alloc_stats: see struct page_pool_alloc_stats + * @recycle_stats: see struct page_pool_recycle_stats + * + * Wrapper struct for combining page pool stats with different storage + * requirements. + */ +struct page_pool_stats { + struct page_pool_alloc_stats alloc_stats; + struct page_pool_recycle_stats recycle_stats; +}; +#endif + +struct page_pool { + struct page_pool_params p; + + struct delayed_work release_dw; + void (*disconnect)(void *pool); + unsigned long defer_start; + unsigned long defer_warn; + + u32 pages_state_hold_cnt; + unsigned int frag_offset; + struct page *frag_page; + long frag_users; + +#ifdef CONFIG_PAGE_POOL_STATS + /* these stats are incremented while in softirq context */ + struct page_pool_alloc_stats alloc_stats; +#endif + u32 xdp_mem_id; + + /* + * Data structure for allocation side + * + * Drivers allocation side usually already perform some kind + * of resource protection. Piggyback on this protection, and + * require driver to protect allocation side. + * + * For NIC drivers this means, allocate a page_pool per + * RX-queue. As the RX-queue is already protected by + * Softirq/BH scheduling and napi_schedule. NAPI schedule + * guarantee that a single napi_struct will only be scheduled + * on a single CPU (see napi_schedule). + */ + struct pp_alloc_cache alloc ____cacheline_aligned_in_smp; + + /* Data structure for storing recycled pages. + * + * Returning/freeing pages is more complicated synchronization + * wise, because free's can happen on remote CPUs, with no + * association with allocation resource. + * + * Use ptr_ring, as it separates consumer and producer + * efficiently, it a way that doesn't bounce cache-lines. + * + * TODO: Implement bulk return pages into this structure. 
+ */ + struct ptr_ring ring; + +#ifdef CONFIG_PAGE_POOL_STATS + /* recycle stats are per-cpu to avoid locking */ + struct page_pool_recycle_stats __percpu *recycle_stats; +#endif + atomic_t pages_state_release_cnt; + + /* A page_pool is strictly tied to a single RX-queue being + * protected by NAPI, due to above pp_alloc_cache. This + * refcnt serves purpose is to simplify drivers error handling. + */ + refcount_t user_cnt; + + u64 destroy_cnt; +}; + +struct page *page_pool_alloc_pages(struct page_pool *pool, gfp_t gfp); +struct page *page_pool_alloc_frag(struct page_pool *pool, unsigned int *offset, + unsigned int size, gfp_t gfp); +bool page_pool_return_skb_page(struct page *page, bool napi_safe); + +struct page_pool *page_pool_create(const struct page_pool_params *params); + +struct xdp_mem_info; + +#ifdef CONFIG_PAGE_POOL +void page_pool_unlink_napi(struct page_pool *pool); +void page_pool_destroy(struct page_pool *pool); +void page_pool_use_xdp_mem(struct page_pool *pool, void (*disconnect)(void *), + struct xdp_mem_info *mem); +void page_pool_put_page_bulk(struct page_pool *pool, void **data, + int count); +#else +static inline void page_pool_unlink_napi(struct page_pool *pool) +{ +} + +static inline void page_pool_destroy(struct page_pool *pool) +{ +} + +static inline void page_pool_use_xdp_mem(struct page_pool *pool, + void (*disconnect)(void *), + struct xdp_mem_info *mem) +{ +} + +static inline void page_pool_put_page_bulk(struct page_pool *pool, void **data, + int count) +{ +} +#endif + +void page_pool_put_defragged_page(struct page_pool *pool, struct page *page, + unsigned int dma_sync_size, + bool allow_direct); + +static inline bool is_page_pool_compiled_in(void) +{ +#ifdef CONFIG_PAGE_POOL + return true; +#else + return false; +#endif +} + +/* Caller must provide appropriate safe context, e.g. NAPI. 
*/ +void page_pool_update_nid(struct page_pool *pool, int new_nid); + +#endif /* _NET_PAGE_POOL_H */ diff --git a/include/trace/events/page_pool.h b/include/trace/events/page_pool.h index ca534501158b..6834356b2d2a 100644 --- a/include/trace/events/page_pool.h +++ b/include/trace/events/page_pool.h @@ -9,7 +9,7 @@ #include #include -#include +#include TRACE_EVENT(page_pool_release, -- cgit v1.2.3 From 75eaf63ea7afeafd026ffef03bdc69e31f10829b Mon Sep 17 00:00:00 2001 From: Alexander Lobakin Date: Fri, 4 Aug 2023 20:05:25 +0200 Subject: net: skbuff: don't include to Currently, touching triggers a rebuild of more than half of the kernel. That's because it's included in . And each new include to page_pool/types.h adds more [useless] data for the toolchain to process per each source file from that pile. In commit 6a5bcd84e886 ("page_pool: Allow drivers to hint on SKB recycling"), Matteo included it to be able to call a couple of functions defined there. Then, in commit 57f05bc2ab24 ("page_pool: keep pp info as long as page pool owns the page") one of the calls was removed, so only one was left. It's the call to page_pool_return_skb_page() in napi_frag_unref(). The function is external and doesn't have any dependencies. Having very niche page_pool_types.h included only for that looks like an overkill. As %PP_SIGNATURE is not local to page_pool.c (was only in the early submissions), nothing holds this function there. Teleport page_pool_return_skb_page() to skbuff.c, just next to the main consumer, skb_pp_recycle(), and rename it to napi_pp_put_page(), as it doesn't work with skbs at all and the former name tells nothing. The #if guards here are only to not compile and have it in the vmlinux when not needed -- both call sites are already guarded. Now, touching page_pool_types.h only triggers rebuilding of the drivers using it and a couple of core networking files. 
Suggested-by: Jakub Kicinski # make skbuff.h less heavy Suggested-by: Alexander Duyck # move to skbuff.c Signed-off-by: Alexander Lobakin Reviewed-by: Alexander Duyck Link: https://lore.kernel.org/r/20230804180529.2483231-3-aleksander.lobakin@intel.com Signed-off-by: Jakub Kicinski --- include/linux/skbuff.h | 5 +++-- include/net/page_pool/types.h | 2 -- 2 files changed, 3 insertions(+), 4 deletions(-) (limited to 'include') diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index 888e3d7e74c1..aa57e2eca33b 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -32,7 +32,6 @@ #include #include #include -#include #if IS_ENABLED(CONFIG_NF_CONNTRACK) #include #endif @@ -3421,13 +3420,15 @@ static inline void skb_frag_ref(struct sk_buff *skb, int f) __skb_frag_ref(&skb_shinfo(skb)->frags[f]); } +bool napi_pp_put_page(struct page *page, bool napi_safe); + static inline void napi_frag_unref(skb_frag_t *frag, bool recycle, bool napi_safe) { struct page *page = skb_frag_page(frag); #ifdef CONFIG_PAGE_POOL - if (recycle && page_pool_return_skb_page(page, napi_safe)) + if (recycle && napi_pp_put_page(page, napi_safe)) return; #endif put_page(page); diff --git a/include/net/page_pool/types.h b/include/net/page_pool/types.h index 9ac39191bed7..fcb846523398 100644 --- a/include/net/page_pool/types.h +++ b/include/net/page_pool/types.h @@ -185,8 +185,6 @@ struct page_pool { struct page *page_pool_alloc_pages(struct page_pool *pool, gfp_t gfp); struct page *page_pool_alloc_frag(struct page_pool *pool, unsigned int *offset, unsigned int size, gfp_t gfp); -bool page_pool_return_skb_page(struct page *page, bool napi_safe); - struct page_pool *page_pool_create(const struct page_pool_params *params); struct xdp_mem_info; -- cgit v1.2.3 From 06d0fbdad612cb8def19065cf1fa14fc34dba9f8 Mon Sep 17 00:00:00 2001 From: Alexander Lobakin Date: Fri, 4 Aug 2023 20:05:26 +0200 Subject: page_pool: place frag_* fields in one cacheline On x86_64, frag_* fields of struct 
page_pool are scattered across two cachelines despite their total size of 24 bytes. All three fields are used in pretty much the same places, but the last field, ::frag_users, is pushed out to the next CL, provoking unwanted false sharing on the hotpath (frags allocation code). There are some holes and cold members to move around. Move frag_* one block up, placing them right after &page_pool_params, exactly at the beginning of CL2. This doesn't do anything meaningful to the second block, as those are some destroy-path cold structures, and doesn't do anything to ::alloc_stats, which still starts at a 200-byte offset, 8 bytes after CL3 (still fitting into 1 cacheline). On my setup, this yields 1-2% more Mpps when using PP frags actively. When it comes to 32-bit architectures with 32-byte CL: &page_pool_params plus ::pad is 44 bytes, the block taken care of is 16 bytes within one CL, so there should be at least no regressions from the actual change. ::pages_state_hold_cnt is not related directly to that triple, but is currently paired with ::frag_offset, and decoupling them would mean either two 4-byte holes or more invasive layout changes.
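The layout argument above can be checked mechanically. The following is a minimal userspace sketch, not kernel code: struct mock_params and struct mock_pool are hypothetical stand-ins (the real &page_pool_params is assumed here to occupy 64 bytes, as the text implies for x86_64), and CL_SIZE is an assumed 64-byte cacheline. Note that cl_index() is zero-based, so the "CL2" of the description corresponds to index 1.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define CL_SIZE 64	/* assumed cacheline size, as on x86_64 */

/* Hypothetical stand-in for the 64 bytes of &page_pool_params. */
struct mock_params {
	unsigned char bytes[CL_SIZE];
};

/* Hypothetical mock of the head of struct page_pool after the move. */
struct mock_pool {
	struct mock_params p;
	long frag_users;
	void *frag_page;
	unsigned int frag_offset;
	unsigned int pages_state_hold_cnt;
};

/* Zero-based index of the cacheline that holds a given byte offset. */
static size_t cl_index(size_t offset)
{
	return offset / CL_SIZE;
}

/* True if frag_users..frag_offset start and end in the same cacheline. */
static bool frag_block_in_one_cl(void)
{
	size_t first = offsetof(struct mock_pool, frag_users);
	size_t last = offsetof(struct mock_pool, frag_offset) +
		      sizeof(unsigned int) - 1;

	return cl_index(first) == cl_index(last);
}

/* Aborts if the mock layout ever splits the frag_* block. */
static void check_frag_layout(void)
{
	assert(frag_block_in_one_cl());
}
```

In the kernel tree itself, pahole on the built object is the usual way to inspect the real layout; the sketch only illustrates the offset arithmetic.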
Signed-off-by: Alexander Lobakin Reviewed-by: Alexander Duyck Link: https://lore.kernel.org/r/20230804180529.2483231-4-aleksander.lobakin@intel.com Signed-off-by: Jakub Kicinski --- include/net/page_pool/types.h | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) (limited to 'include') diff --git a/include/net/page_pool/types.h b/include/net/page_pool/types.h index fcb846523398..887e7946a597 100644 --- a/include/net/page_pool/types.h +++ b/include/net/page_pool/types.h @@ -123,16 +123,16 @@ struct page_pool_stats { struct page_pool { struct page_pool_params p; + long frag_users; + struct page *frag_page; + unsigned int frag_offset; + u32 pages_state_hold_cnt; + struct delayed_work release_dw; void (*disconnect)(void *pool); unsigned long defer_start; unsigned long defer_warn; - u32 pages_state_hold_cnt; - unsigned int frag_offset; - struct page *frag_page; - long frag_users; - #ifdef CONFIG_PAGE_POOL_STATS /* these stats are incremented while in softirq context */ struct page_pool_alloc_stats alloc_stats; -- cgit v1.2.3 From ff4e538c8c3e675a15e1e49509c55951832e0451 Mon Sep 17 00:00:00 2001 From: Jakub Kicinski Date: Fri, 4 Aug 2023 20:05:28 +0200 Subject: page_pool: add a lockdep check for recycling in hardirq Page pool use in hardirq is prohibited, add debug checks to catch misuses. IIRC we previously discussed using DEBUG_NET_WARN_ON_ONCE() for this, but there were concerns that people will have DEBUG_NET enabled in perf testing. 
I don't think anyone enables lockdep in perf testing, so use lockdep to avoid pushback and arguing :) Acked-by: Jesper Dangaard Brouer Signed-off-by: Alexander Lobakin Reviewed-by: Alexander Duyck Link: https://lore.kernel.org/r/20230804180529.2483231-6-aleksander.lobakin@intel.com Signed-off-by: Jakub Kicinski --- include/linux/lockdep.h | 7 +++++++ 1 file changed, 7 insertions(+) (limited to 'include') diff --git a/include/linux/lockdep.h b/include/linux/lockdep.h index 310f85903c91..dc2844b071c2 100644 --- a/include/linux/lockdep.h +++ b/include/linux/lockdep.h @@ -625,6 +625,12 @@ do { \ WARN_ON_ONCE(__lockdep_enabled && !this_cpu_read(hardirq_context)); \ } while (0) +#define lockdep_assert_no_hardirq() \ +do { \ + WARN_ON_ONCE(__lockdep_enabled && (this_cpu_read(hardirq_context) || \ + !this_cpu_read(hardirqs_enabled))); \ +} while (0) + #define lockdep_assert_preemption_enabled() \ do { \ WARN_ON_ONCE(IS_ENABLED(CONFIG_PREEMPT_COUNT) && \ @@ -659,6 +665,7 @@ do { \ # define lockdep_assert_irqs_enabled() do { } while (0) # define lockdep_assert_irqs_disabled() do { } while (0) # define lockdep_assert_in_irq() do { } while (0) +# define lockdep_assert_no_hardirq() do { } while (0) # define lockdep_assert_preemption_enabled() do { } while (0) # define lockdep_assert_preemption_disabled() do { } while (0) -- cgit v1.2.3 From a3c485a5d8d47af5d2d1a0e5c3b7a1ed223669f9 Mon Sep 17 00:00:00 2001 From: Jiri Olsa Date: Mon, 7 Aug 2023 10:59:54 +0200 Subject: bpf: Add support for bpf_get_func_ip helper for uprobe program Adding support for bpf_get_func_ip helper for uprobe program to return probed address for both uprobe and return uprobe. We discussed this in [1] and agreed that uprobe can have special use of bpf_get_func_ip helper that differs from kprobe. 
The kprobe bpf_get_func_ip returns: - address of the function if the probe is attached on function entry, for both kprobe and return kprobe - 0 if the probe is not attached on function entry The uprobe bpf_get_func_ip returns: - address of the probe for both uprobe and return uprobe The reason for this semantic change is that the kernel can't really tell whether the probed user space address is a function entry. The uprobe program is actually a kprobe type program attached as uprobe. One of the consequences of this design is that uprobes do not have their own set of helpers, but share them with kprobes. As we need different functionality for the bpf_get_func_ip helper for uprobe, I'm adding a bool value to bpf_trace_run_ctx, so the helper can detect that it's executed in uprobe context and call specific code. The is_uprobe bool is set to true in bpf_prog_run_array_sleepable, which is currently used only for executing bpf programs in uprobe. Renaming bpf_prog_run_array_sleepable to bpf_prog_run_array_uprobe to reflect that it's only used for uprobes and that it sets run_ctx.is_uprobe, as suggested by Yafang Shao.
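The return-value rules described above can be modelled in a few lines. This is a userspace sketch under stated assumptions, not the kernel helper: struct mock_trace_run_ctx and mock_get_func_ip() are hypothetical names, and only the is_uprobe flag mirrors what the patch adds to bpf_trace_run_ctx. The caller is assumed to already know the probe address and the function entry address.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the run context; only is_uprobe mirrors the patch. */
struct mock_trace_run_ctx {
	uint64_t bpf_cookie;
	bool is_uprobe;
};

/*
 * Model of the documented semantics:
 *  - uprobe / return uprobe: always the probe address
 *  - kprobe / return kprobe: function address when attached at entry, else 0
 */
static uint64_t mock_get_func_ip(const struct mock_trace_run_ctx *ctx,
				 uint64_t probe_addr, uint64_t func_entry)
{
	if (ctx->is_uprobe)
		return probe_addr;

	return probe_addr == func_entry ? func_entry : 0;
}

/* Exercises the three cases listed in the description. */
static void mock_self_check(void)
{
	struct mock_trace_run_ctx k = { .bpf_cookie = 0, .is_uprobe = false };
	struct mock_trace_run_ctx u = { .bpf_cookie = 0, .is_uprobe = true };

	assert(mock_get_func_ip(&k, 0x1000, 0x1000) == 0x1000);
	assert(mock_get_func_ip(&k, 0x1004, 0x1000) == 0);
	assert(mock_get_func_ip(&u, 0x1004, 0x1000) == 0x1004);
}
```

Keeping the flag in the run context, rather than adding a separate helper, matches the point made above that uprobe programs share the kprobe helper set.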
Suggested-by: Andrii Nakryiko Tested-by: Alan Maguire [1] https://lore.kernel.org/bpf/CAEf4BzZ=xLVkG5eurEuvLU79wAMtwho7ReR+XJAgwhFF4M-7Cg@mail.gmail.com/ Signed-off-by: Jiri Olsa Tested-by: Viktor Malik Acked-by: Yonghong Song Link: https://lore.kernel.org/r/20230807085956.2344866-2-jolsa@kernel.org Signed-off-by: Martin KaFai Lau --- include/linux/bpf.h | 9 +++++++-- include/uapi/linux/bpf.h | 7 ++++++- 2 files changed, 13 insertions(+), 3 deletions(-) (limited to 'include') diff --git a/include/linux/bpf.h b/include/linux/bpf.h index abe75063630b..db3fe5a61b05 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -1819,6 +1819,7 @@ struct bpf_cg_run_ctx { struct bpf_trace_run_ctx { struct bpf_run_ctx run_ctx; u64 bpf_cookie; + bool is_uprobe; }; struct bpf_tramp_run_ctx { @@ -1867,6 +1868,8 @@ bpf_prog_run_array(const struct bpf_prog_array *array, if (unlikely(!array)) return ret; + run_ctx.is_uprobe = false; + migrate_disable(); old_run_ctx = bpf_set_run_ctx(&run_ctx.run_ctx); item = &array->items[0]; @@ -1891,8 +1894,8 @@ bpf_prog_run_array(const struct bpf_prog_array *array, * rcu-protected dynamically sized maps. 
*/ static __always_inline u32 -bpf_prog_run_array_sleepable(const struct bpf_prog_array __rcu *array_rcu, - const void *ctx, bpf_prog_run_fn run_prog) +bpf_prog_run_array_uprobe(const struct bpf_prog_array __rcu *array_rcu, + const void *ctx, bpf_prog_run_fn run_prog) { const struct bpf_prog_array_item *item; const struct bpf_prog *prog; @@ -1906,6 +1909,8 @@ bpf_prog_run_array_sleepable(const struct bpf_prog_array __rcu *array_rcu, rcu_read_lock_trace(); migrate_disable(); + run_ctx.is_uprobe = true; + array = rcu_dereference_check(array_rcu, rcu_read_lock_trace_held()); if (unlikely(!array)) goto out; diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h index 70da85200695..d21deb46f49f 100644 --- a/include/uapi/linux/bpf.h +++ b/include/uapi/linux/bpf.h @@ -5086,9 +5086,14 @@ union bpf_attr { * u64 bpf_get_func_ip(void *ctx) * Description * Get address of the traced function (for tracing and kprobe programs). + * + * When called for kprobe program attached as uprobe it returns + * probe address for both entry and return uprobe. + * * Return - * Address of the traced function. + * Address of the traced function for kprobe. * 0 for kprobes placed within the function (not at the entry). + * Address of the probe for uprobe and return uprobe. * * u64 bpf_get_attach_cookie(void *ctx) * Description -- cgit v1.2.3 From 29cfda963f899da403d6bc5a3abe19d2e0be0cf4 Mon Sep 17 00:00:00 2001 From: Yue Haibing Date: Wed, 2 Aug 2023 21:09:57 +0800 Subject: netfilter: gre: Remove unused function declaration nf_ct_gre_keymap_flush() Commit a23f89a99906 ("netfilter: conntrack: nf_ct_gre_keymap_flush() removal") left this unused; remove it.
Signed-off-by: Yue Haibing Signed-off-by: Florian Westphal --- include/linux/netfilter/nf_conntrack_proto_gre.h | 1 - 1 file changed, 1 deletion(-) (limited to 'include') diff --git a/include/linux/netfilter/nf_conntrack_proto_gre.h b/include/linux/netfilter/nf_conntrack_proto_gre.h index f33aa6021364..34ce5d2f37a2 100644 --- a/include/linux/netfilter/nf_conntrack_proto_gre.h +++ b/include/linux/netfilter/nf_conntrack_proto_gre.h @@ -25,7 +25,6 @@ struct nf_ct_gre_keymap { int nf_ct_gre_keymap_add(struct nf_conn *ct, enum ip_conntrack_dir dir, struct nf_conntrack_tuple *t); -void nf_ct_gre_keymap_flush(struct net *net); /* delete keymap entries */ void nf_ct_gre_keymap_destroy(struct nf_conn *ct); -- cgit v1.2.3 From 529f63fa11eba5fbe448fbe537b3576edd9fd277 Mon Sep 17 00:00:00 2001 From: Yue Haibing Date: Wed, 2 Aug 2023 21:15:49 +0800 Subject: netfilter: helper: Remove unused function declarations Commit b118509076b3 ("netfilter: remove nf_conntrack_helper sysctl and modparam toggles") left these declarations unused.
Signed-off-by: Yue Haibing Reviewed-by: Simon Horman Signed-off-by: Florian Westphal --- include/net/netfilter/nf_conntrack_helper.h | 3 --- 1 file changed, 3 deletions(-) (limited to 'include') diff --git a/include/net/netfilter/nf_conntrack_helper.h b/include/net/netfilter/nf_conntrack_helper.h index f30b1694b690..de2f956abf34 100644 --- a/include/net/netfilter/nf_conntrack_helper.h +++ b/include/net/netfilter/nf_conntrack_helper.h @@ -136,8 +136,6 @@ static inline void *nfct_help_data(const struct nf_conn *ct) return (void *)help->data; } -void nf_conntrack_helper_pernet_init(struct net *net); - int nf_conntrack_helper_init(void); void nf_conntrack_helper_fini(void); @@ -182,5 +180,4 @@ void nf_nat_helper_unregister(struct nf_conntrack_nat_helper *nat); int nf_nat_helper_try_module_get(const char *name, u16 l3num, u8 protonum); void nf_nat_helper_put(struct nf_conntrack_helper *helper); -void nf_ct_set_auto_assign_helper_warned(struct net *net); #endif /*_NF_CONNTRACK_HELPER_H*/ -- cgit v1.2.3 From 172af3eab05f096122d7c239ab9a11b38b5e5c90 Mon Sep 17 00:00:00 2001 From: Yue Haibing Date: Fri, 4 Aug 2023 21:41:49 +0800 Subject: netfilter: conntrack: Remove unused function declarations Commit 1015c3de23ee ("netfilter: conntrack: remove extension register api") left nf_conntrack_acct_fini() and nf_conntrack_labels_init() unused; remove them. And commit a0ae2562c6c4 ("netfilter: conntrack: remove l3proto abstraction") left behind nf_ct_l3proto_try_module_get() and nf_ct_l3proto_module_put().
Signed-off-by: Yue Haibing Reviewed-by: Simon Horman Signed-off-by: Florian Westphal --- include/net/netfilter/nf_conntrack.h | 4 ---- include/net/netfilter/nf_conntrack_acct.h | 2 -- include/net/netfilter/nf_conntrack_labels.h | 1 - 3 files changed, 7 deletions(-) (limited to 'include') diff --git a/include/net/netfilter/nf_conntrack.h b/include/net/netfilter/nf_conntrack.h index a72028dbef0c..4085765c3370 100644 --- a/include/net/netfilter/nf_conntrack.h +++ b/include/net/netfilter/nf_conntrack.h @@ -190,10 +190,6 @@ static inline void nf_ct_put(struct nf_conn *ct) nf_ct_destroy(&ct->ct_general); } -/* Protocol module loading */ -int nf_ct_l3proto_try_module_get(unsigned short l3proto); -void nf_ct_l3proto_module_put(unsigned short l3proto); - /* load module; enable/disable conntrack in this namespace */ int nf_ct_netns_get(struct net *net, u8 nfproto); void nf_ct_netns_put(struct net *net, u8 nfproto); diff --git a/include/net/netfilter/nf_conntrack_acct.h b/include/net/netfilter/nf_conntrack_acct.h index 4b2b7f8914ea..a120685cac93 100644 --- a/include/net/netfilter/nf_conntrack_acct.h +++ b/include/net/netfilter/nf_conntrack_acct.h @@ -78,6 +78,4 @@ static inline void nf_ct_acct_update(struct nf_conn *ct, u32 dir, void nf_conntrack_acct_pernet_init(struct net *net); -void nf_conntrack_acct_fini(void); - #endif /* _NF_CONNTRACK_ACCT_H */ diff --git a/include/net/netfilter/nf_conntrack_labels.h b/include/net/netfilter/nf_conntrack_labels.h index 66bab6c60d12..fcb19a4e8f2b 100644 --- a/include/net/netfilter/nf_conntrack_labels.h +++ b/include/net/netfilter/nf_conntrack_labels.h @@ -52,7 +52,6 @@ int nf_connlabels_replace(struct nf_conn *ct, const u32 *data, const u32 *mask, unsigned int words); #ifdef CONFIG_NF_CONNTRACK_LABELS -int nf_conntrack_labels_init(void); int nf_connlabels_get(struct net *net, unsigned int bit); void nf_connlabels_put(struct net *net); #else -- cgit v1.2.3 From 61e9ab294b39e5e7c040884b65d06f52e06ac40f Mon Sep 17 00:00:00 2001 From: Yue 
Haibing Date: Mon, 7 Aug 2023 22:25:26 +0800 Subject: netfilter: h323: Remove unused function declarations Commit f587de0e2feb ("[NETFILTER]: nf_conntrack/nf_nat: add H.323 helper port") declared but never implemented these. Signed-off-by: Yue Haibing Signed-off-by: Florian Westphal --- include/linux/netfilter/nf_conntrack_h323.h | 4 ---- 1 file changed, 4 deletions(-) (limited to 'include') diff --git a/include/linux/netfilter/nf_conntrack_h323.h b/include/linux/netfilter/nf_conntrack_h323.h index 9e937f64a1ad..81286c499325 100644 --- a/include/linux/netfilter/nf_conntrack_h323.h +++ b/include/linux/netfilter/nf_conntrack_h323.h @@ -34,10 +34,6 @@ struct nf_ct_h323_master { int get_h225_addr(struct nf_conn *ct, unsigned char *data, TransportAddress *taddr, union nf_inet_addr *addr, __be16 *port); -void nf_conntrack_h245_expect(struct nf_conn *new, - struct nf_conntrack_expect *this); -void nf_conntrack_q931_expect(struct nf_conn *new, - struct nf_conntrack_expect *this); struct nfct_h323_nat_hooks { int (*set_h245_addr)(struct sk_buff *skb, unsigned int protoff, -- cgit v1.2.3 From 26bbbef8ff4047f857ac2d7353eb19e46100ce18 Mon Sep 17 00:00:00 2001 From: Christophe Leroy Date: Fri, 4 Aug 2023 15:30:13 +0200 Subject: net: fs_enet: Remove fs_get_id() fs_get_id() hasn't been used since commit b219108cbace ("fs_enet: Remove !CONFIG_PPC_CPM_NEW_BINDING code") Remove it. 
Signed-off-by: Christophe Leroy Reviewed-by: Simon Horman Link: https://lore.kernel.org/r/7a53b88cc40302fcbea59554f5e7067e3594ad53.1691155346.git.christophe.leroy@csgroup.eu Signed-off-by: Jakub Kicinski --- include/linux/fs_enet_pd.h | 11 ----------- 1 file changed, 11 deletions(-) (limited to 'include') diff --git a/include/linux/fs_enet_pd.h b/include/linux/fs_enet_pd.h index 77d783f71527..2351c3d9404d 100644 --- a/include/linux/fs_enet_pd.h +++ b/include/linux/fs_enet_pd.h @@ -151,15 +151,4 @@ struct fs_mii_fec_platform_info { u32 mii_speed; }; -static inline int fs_get_id(struct fs_platform_info *fpi) -{ - if(strstr(fpi->fs_type, "SCC")) - return fs_scc_index2id(fpi->fs_no); - if(strstr(fpi->fs_type, "FCC")) - return fs_fcc_index2id(fpi->fs_no); - if(strstr(fpi->fs_type, "FEC")) - return fs_fec_index2id(fpi->fs_no); - return fpi->fs_no; -} - #endif -- cgit v1.2.3 From caaf482e265415778de74e262a6f153dd2f18fa4 Mon Sep 17 00:00:00 2001 From: Christophe Leroy Date: Fri, 4 Aug 2023 15:30:14 +0200 Subject: net: fs_enet: Remove unused fields in fs_platform_info struct Since commit 3dd82a1ea724 ("[POWERPC] CPM: Always use new binding.") many fields of fs_platform_info structure are not used anymore. Remove them. 
Signed-off-by: Christophe Leroy Reviewed-by: Simon Horman Link: https://lore.kernel.org/r/2e584fcd75e21a0f7e7d5f942eebdc067b2f82f9.1691155346.git.christophe.leroy@csgroup.eu Signed-off-by: Jakub Kicinski --- include/linux/fs_enet_pd.h | 19 ------------------- 1 file changed, 19 deletions(-) (limited to 'include') diff --git a/include/linux/fs_enet_pd.h b/include/linux/fs_enet_pd.h index 2351c3d9404d..a1905e41c167 100644 --- a/include/linux/fs_enet_pd.h +++ b/include/linux/fs_enet_pd.h @@ -111,33 +111,14 @@ struct fs_mii_bb_platform_info { }; struct fs_platform_info { - - void(*init_ioports)(struct fs_platform_info *); /* device specific information */ - int fs_no; /* controller index */ - char fs_type[4]; /* controller type */ - - u32 cp_page; /* CPM page */ - u32 cp_block; /* CPM sblock */ u32 cp_command; /* CPM page/sblock/mcn */ - u32 clk_trx; /* some stuff for pins & mux configuration*/ - u32 clk_rx; - u32 clk_tx; - u32 clk_route; - u32 clk_mask; - - u32 mem_offset; u32 dpram_offset; - u32 fcc_regs_c; - u32 device_flags; - struct device_node *phy_node; - const struct fs_mii_bus_info *bus_info; int rx_ring, tx_ring; /* number of buffers on rx */ - __u8 macaddr[ETH_ALEN]; /* mac address */ int rx_copybreak; /* limit we copy small frames */ int napi_weight; /* NAPI weight */ -- cgit v1.2.3 From 9359a48c65a3c2723d469ba11889c56387e4395a Mon Sep 17 00:00:00 2001 From: Christophe Leroy Date: Fri, 4 Aug 2023 15:30:15 +0200 Subject: net: fs_enet: Remove has_phy field in fs_platform_info struct Since commit 3dd82a1ea724 ("[POWERPC] CPM: Always use new binding.") has_phy field is never set. Remove dead code and remove the field. 
Signed-off-by: Christophe Leroy Reviewed-by: Simon Horman Link: https://lore.kernel.org/r/bb5264e09e18f0ce8a0dbee399926a59f33cb248.1691155346.git.christophe.leroy@csgroup.eu Signed-off-by: Jakub Kicinski --- include/linux/fs_enet_pd.h | 1 - 1 file changed, 1 deletion(-) (limited to 'include') diff --git a/include/linux/fs_enet_pd.h b/include/linux/fs_enet_pd.h index a1905e41c167..2b351b676467 100644 --- a/include/linux/fs_enet_pd.h +++ b/include/linux/fs_enet_pd.h @@ -123,7 +123,6 @@ struct fs_platform_info { int napi_weight; /* NAPI weight */ int use_rmii; /* use RMII mode */ - int has_phy; /* if the network is phy container as well...*/ struct clk *clk_per; /* 'per' clock for register access */ }; -- cgit v1.2.3 From 7a76918371fe04667528142554a3f26a8879e8ab Mon Sep 17 00:00:00 2001 From: Christophe Leroy Date: Fri, 4 Aug 2023 15:30:17 +0200 Subject: net: fs_enet: Move struct fs_platform_info into fs_enet.h struct fs_platform_info is only used in fs_enet ethernet driver since commit 3dd82a1ea724 ("[POWERPC] CPM: Always use new binding."). Stale prototypes using fs_platform_info were left over in arch/powerpc/sysdev/fsl_soc.c but they are now removed by previous patch. 
Move struct fs_platform_info into fs_enet.h Signed-off-by: Christophe Leroy Reviewed-by: Simon Horman Link: https://lore.kernel.org/r/f882d6b0b7075d0d8435310634ceaa2cc8e9938f.1691155347.git.christophe.leroy@csgroup.eu Signed-off-by: Jakub Kicinski --- include/linux/fs_enet_pd.h | 16 ---------------- 1 file changed, 16 deletions(-) (limited to 'include') diff --git a/include/linux/fs_enet_pd.h b/include/linux/fs_enet_pd.h index 2b351b676467..7c9897dab558 100644 --- a/include/linux/fs_enet_pd.h +++ b/include/linux/fs_enet_pd.h @@ -110,22 +110,6 @@ struct fs_mii_bb_platform_info { int irq[32]; /* irqs per phy's */ }; -struct fs_platform_info { - /* device specific information */ - u32 cp_command; /* CPM page/sblock/mcn */ - - u32 dpram_offset; - - struct device_node *phy_node; - - int rx_ring, tx_ring; /* number of buffers on rx */ - int rx_copybreak; /* limit we copy small frames */ - int napi_weight; /* NAPI weight */ - - int use_rmii; /* use RMII mode */ - - struct clk *clk_per; /* 'per' clock for register access */ -}; struct fs_mii_fec_platform_info { u32 irq[32]; u32 mii_speed; -- cgit v1.2.3 From 7149b38dc7cbc77548afd65e4bb4d4b11dca8670 Mon Sep 17 00:00:00 2001 From: Christophe Leroy Date: Fri, 4 Aug 2023 15:30:19 +0200 Subject: net: fs_enet: Remove linux/fs_enet_pd.h linux/fs_enet_pd.h is not used anymore. Remove it. 
Signed-off-by: Christophe Leroy Reviewed-by: Simon Horman Link: https://lore.kernel.org/r/5be102791c987792ad127b15543ee6715394cf67.1691155347.git.christophe.leroy@csgroup.eu Signed-off-by: Jakub Kicinski --- include/linux/fs_enet_pd.h | 118 --------------------------------------------- 1 file changed, 118 deletions(-) delete mode 100644 include/linux/fs_enet_pd.h (limited to 'include') diff --git a/include/linux/fs_enet_pd.h b/include/linux/fs_enet_pd.h deleted file mode 100644 index 7c9897dab558..000000000000 --- a/include/linux/fs_enet_pd.h +++ /dev/null @@ -1,118 +0,0 @@ -/* - * Platform information definitions for the - * universal Freescale Ethernet driver. - * - * Copyright (c) 2003 Intracom S.A. - * by Pantelis Antoniou - * - * 2005 (c) MontaVista Software, Inc. - * Vitaly Bordug - * - * This file is licensed under the terms of the GNU General Public License - * version 2. This program is licensed "as is" without any warranty of any - * kind, whether express or implied. - */ - -#ifndef FS_ENET_PD_H -#define FS_ENET_PD_H - -#include -#include -#include -#include -#include - -#define FS_ENET_NAME "fs_enet" - -enum fs_id { - fsid_fec1, - fsid_fec2, - fsid_fcc1, - fsid_fcc2, - fsid_fcc3, - fsid_scc1, - fsid_scc2, - fsid_scc3, - fsid_scc4, -}; - -#define FS_MAX_INDEX 9 - -static inline int fs_get_fec_index(enum fs_id id) -{ - if (id >= fsid_fec1 && id <= fsid_fec2) - return id - fsid_fec1; - return -1; -} - -static inline int fs_get_fcc_index(enum fs_id id) -{ - if (id >= fsid_fcc1 && id <= fsid_fcc3) - return id - fsid_fcc1; - return -1; -} - -static inline int fs_get_scc_index(enum fs_id id) -{ - if (id >= fsid_scc1 && id <= fsid_scc4) - return id - fsid_scc1; - return -1; -} - -static inline int fs_fec_index2id(int index) -{ - int id = fsid_fec1 + index - 1; - if (id >= fsid_fec1 && id <= fsid_fec2) - return id; - return FS_MAX_INDEX; - } - -static inline int fs_fcc_index2id(int index) -{ - int id = fsid_fcc1 + index - 1; - if (id >= fsid_fcc1 && id <= 
fsid_fcc3) - return id; - return FS_MAX_INDEX; -} - -static inline int fs_scc_index2id(int index) -{ - int id = fsid_scc1 + index - 1; - if (id >= fsid_scc1 && id <= fsid_scc4) - return id; - return FS_MAX_INDEX; -} - -enum fs_mii_method { - fsmii_fixed, - fsmii_fec, - fsmii_bitbang, -}; - -enum fs_ioport { - fsiop_porta, - fsiop_portb, - fsiop_portc, - fsiop_portd, - fsiop_porte, -}; - -struct fs_mii_bit { - u32 offset; - u8 bit; - u8 polarity; -}; -struct fs_mii_bb_platform_info { - struct fs_mii_bit mdio_dir; - struct fs_mii_bit mdio_dat; - struct fs_mii_bit mdc_dat; - int delay; /* delay in us */ - int irq[32]; /* irqs per phy's */ -}; - -struct fs_mii_fec_platform_info { - u32 irq[32]; - u32 mii_speed; -}; - -#endif -- cgit v1.2.3 From de3ecc4fd8bf201c5cd02dc49687fb1506cebb45 Mon Sep 17 00:00:00 2001 From: Zhengchao Shao Date: Mon, 7 Aug 2023 09:25:54 +0800 Subject: team: change the init function in the team_option structure to void Because the init function in the team_option structure always returns 0, change it to return void and remove the redundant code.
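As a rough illustration of why the return type can go, here is a minimal user-space sketch, not the kernel code. The names `struct opt`, `struct opt_info`, `opt_init_enabled()` and `opt_inst_add()` are hypothetical stand-ins for team_option and its callers. Once the callback cannot fail, the caller has nothing to check after invoking it.

```c
#include <stddef.h>

/* Hypothetical stand-in for per-instance option state. */
struct opt_info {
	int enabled;
};

/* Hypothetical stand-in for struct team_option. */
struct opt {
	/* init cannot fail, so it returns void instead of a constant 0. */
	void (*init)(struct opt_info *info);
};

/* Example init callback: only sets a default value. */
static void opt_init_enabled(struct opt_info *info)
{
	info->enabled = 1;
}

/* Caller: no return value from init, hence no error path after the call. */
static int opt_inst_add(const struct opt *o, struct opt_info *info)
{
	if (!o || !info)
		return -1;
	if (o->init)
		o->init(info);
	return 0;
}
```

The only failure left in the caller of this sketch is its own argument check, which mirrors the "remove redundant code" part of the change.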
Signed-off-by: Zhengchao Shao
Reviewed-by: Hangbin Liu
Reviewed-by: Jiri Pirko
Link: https://lore.kernel.org/r/20230807012556.3146071-4-shaozhengchao@huawei.com
Signed-off-by: Jakub Kicinski
---
 include/linux/if_team.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

(limited to 'include')

diff --git a/include/linux/if_team.h b/include/linux/if_team.h
index 8de6b6e67829..fc01c3cfe86d 100644
--- a/include/linux/if_team.h
+++ b/include/linux/if_team.h
@@ -162,7 +162,7 @@ struct team_option {
 	bool per_port;
 	unsigned int array_size; /* != 0 means the option is array */
 	enum team_option_type type;
-	int (*init)(struct team *team, struct team_option_inst_info *info);
+	void (*init)(struct team *team, struct team_option_inst_info *info);
 	int (*getter)(struct team *team, struct team_gsetter_ctx *ctx);
 	int (*setter)(struct team *team, struct team_gsetter_ctx *ctx);
 };
-- 
cgit v1.2.3

From c3b41f4c7b7ce573f379c0053e3c86e722562659 Mon Sep 17 00:00:00 2001
From: Zhengchao Shao
Date: Mon, 7 Aug 2023 09:25:55 +0800
Subject: team: change the getter function in the team_option structure to void

Because the getter function in the team_option structure always returns 0, change it to return void and remove the redundant code.
Signed-off-by: Zhengchao Shao
Reviewed-by: Hangbin Liu
Reviewed-by: Jiri Pirko
Link: https://lore.kernel.org/r/20230807012556.3146071-5-shaozhengchao@huawei.com
Signed-off-by: Jakub Kicinski
---
 include/linux/if_team.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

(limited to 'include')

diff --git a/include/linux/if_team.h b/include/linux/if_team.h
index fc01c3cfe86d..1b9b15a492fa 100644
--- a/include/linux/if_team.h
+++ b/include/linux/if_team.h
@@ -163,7 +163,7 @@ struct team_option {
 	unsigned int array_size; /* != 0 means the option is array */
 	enum team_option_type type;
 	void (*init)(struct team *team, struct team_option_inst_info *info);
-	int (*getter)(struct team *team, struct team_gsetter_ctx *ctx);
+	void (*getter)(struct team *team, struct team_gsetter_ctx *ctx);
 	int (*setter)(struct team *team, struct team_gsetter_ctx *ctx);
 };
 
-- 
cgit v1.2.3

From 209bccbac9e6baa68308e1e236992ae3873e49dc Mon Sep 17 00:00:00 2001
From: Yue Haibing
Date: Mon, 7 Aug 2023 22:21:11 +0800
Subject: net: fq: Remove unused typedef fq_flow_get_default_t

Commit bf9009bf21b5 ("net/fq_impl: drop get_default_func, move default flow to fq_tin") removed its last user, so it can be removed.
Signed-off-by: Yue Haibing Reviewed-by: Simon Horman Link: https://lore.kernel.org/r/20230807142111.33524-1-yuehaibing@huawei.com Signed-off-by: Jakub Kicinski --- include/net/fq.h | 5 ----- 1 file changed, 5 deletions(-) (limited to 'include') diff --git a/include/net/fq.h b/include/net/fq.h index 07b5aff6ec58..99fbe4127b95 100644 --- a/include/net/fq.h +++ b/include/net/fq.h @@ -98,9 +98,4 @@ typedef bool fq_skb_filter_t(struct fq *, struct sk_buff *, void *); -typedef struct fq_flow *fq_flow_get_default_t(struct fq *, - struct fq_tin *, - int idx, - struct sk_buff *); - #endif -- cgit v1.2.3 From b876b71a6ac24265848cff4f208e96bf82c32b29 Mon Sep 17 00:00:00 2001 From: Yue Haibing Date: Mon, 7 Aug 2023 22:32:14 +0800 Subject: devlink: Remove unused devlink_dpipe_table_resource_set() declaration Commit f655dacb59ac ("net: devlink: remove unused locked functions") removed this function but left the declaration. Signed-off-by: Yue Haibing Reviewed-by: Jiri Pirko Link: https://lore.kernel.org/r/20230807143214.46648-1-yuehaibing@huawei.com Signed-off-by: Jakub Kicinski --- include/net/devlink.h | 3 --- 1 file changed, 3 deletions(-) (limited to 'include') diff --git a/include/net/devlink.h b/include/net/devlink.h index a1a8e1b6e7df..f7fec0791acc 100644 --- a/include/net/devlink.h +++ b/include/net/devlink.h @@ -1743,9 +1743,6 @@ int devl_resource_size_get(struct devlink *devlink, int devl_dpipe_table_resource_set(struct devlink *devlink, const char *table_name, u64 resource_id, u64 resource_units); -int devlink_dpipe_table_resource_set(struct devlink *devlink, - const char *table_name, u64 resource_id, - u64 resource_units); void devl_resource_occ_get_register(struct devlink *devlink, u64 resource_id, devlink_resource_occ_get_t *occ_get, -- cgit v1.2.3 From 2c2b88748fd5028a7771a864d46b1b78fb436a07 Mon Sep 17 00:00:00 2001 From: Jakub Kicinski Date: Mon, 7 Aug 2023 14:00:51 -0700 Subject: docs: net: page_pool: de-duplicate the intro comment In commit 82e896d992fa ("docs: net:
page_pool: use kdoc to avoid duplicating the information") I shied away from using the DOC: comments when moving to kdoc for documenting page_pool API, because I wasn't sure how familiar people are with it. Turns out there is already a DOC: comment for the intro, which is the same in both places, modulo what looks like minor rewording. Use the version from Documentation/ but keep the contents with the code. Acked-by: Jesper Dangaard Brouer Link: https://lore.kernel.org/r/20230807210051.1014580-1-kuba@kernel.org Signed-off-by: Jakub Kicinski --- include/net/page_pool/helpers.h | 24 ++++++++++++------------ 1 file changed, 12 insertions(+), 12 deletions(-) (limited to 'include') diff --git a/include/net/page_pool/helpers.h b/include/net/page_pool/helpers.h index 78df91804c87..94231533a369 100644 --- a/include/net/page_pool/helpers.h +++ b/include/net/page_pool/helpers.h @@ -8,23 +8,23 @@ /** * DOC: page_pool allocator * - * This page_pool allocator is optimized for the XDP mode that - * uses one-frame-per-page, but have fallbacks that act like the + * The page_pool allocator is optimized for the XDP mode that + * uses one frame per-page, but it can fallback on the * regular page allocator APIs. * - * Basic use involve replacing alloc_pages() calls with the - * page_pool_alloc_pages() call. Drivers should likely use + * Basic use involves replacing alloc_pages() calls with the + * page_pool_alloc_pages() call. Drivers should use * page_pool_dev_alloc_pages() replacing dev_alloc_pages(). * - * API keeps track of in-flight pages, in-order to let API user know - * when it is safe to dealloactor page_pool object. Thus, API users - * must call page_pool_put_page() where appropriate and only attach - * the page to a page_pool-aware objects, like skbs marked for recycling. + * API keeps track of in-flight pages, in order to let API user know + * when it is safe to free a page_pool object. 
Thus, API users + * must call page_pool_put_page() to free the page, or attach + * the page to a page_pool-aware objects like skbs marked with + * skb_mark_for_recycle(). * - * API user must only call page_pool_put_page() once on a page, as it - * will either recycle the page, or in case of elevated refcnt, it - * will release the DMA mapping and in-flight state accounting. We - * hope to lift this requirement in the future. + * API user must call page_pool_put_page() once on a page, as it + * will either recycle the page, or in case of refcnt > 1, it will + * release the DMA mapping and in-flight state accounting. */ #ifndef _NET_PAGE_POOL_HELPERS_H #define _NET_PAGE_POOL_HELPERS_H -- cgit v1.2.3 From 2adbb7637fd1fcec93f4680ddb5ddbbd1a91aefb Mon Sep 17 00:00:00 2001 From: Yue Haibing Date: Tue, 8 Aug 2023 22:57:41 +0800 Subject: bpf: btf: Remove two unused function declarations Commit db559117828d ("bpf: Consolidate spin_lock, timer management into btf_record") removed the implementations but left the declarations.
Signed-off-by: Yue Haibing Link: https://lore.kernel.org/r/20230808145741.33292-1-yuehaibing@huawei.com Signed-off-by: Martin KaFai Lau --- include/linux/btf.h | 2 -- 1 file changed, 2 deletions(-) (limited to 'include') diff --git a/include/linux/btf.h b/include/linux/btf.h index cac9f304e27a..df64cc642074 100644 --- a/include/linux/btf.h +++ b/include/linux/btf.h @@ -204,8 +204,6 @@ u32 btf_nr_types(const struct btf *btf); bool btf_member_is_reg_int(const struct btf *btf, const struct btf_type *s, const struct btf_member *m, u32 expected_offset, u32 expected_size); -int btf_find_spin_lock(const struct btf *btf, const struct btf_type *t); -int btf_find_timer(const struct btf *btf, const struct btf_type *t); struct btf_record *btf_parse_fields(const struct btf *btf, const struct btf_type *t, u32 field_mask, u32 value_size); int btf_check_and_fixup_fields(const struct btf *btf, struct btf_record *rec); -- cgit v1.2.3 From 90ed8d3dc34bb36b5fb59671924d1ac35f01e75d Mon Sep 17 00:00:00 2001 From: Yue Haibing Date: Tue, 8 Aug 2023 22:46:10 +0800 Subject: net: phy: Remove two unused function declarations Commit 1e2dc14509fd ("net: ethtool: Add helpers for reporting test results") declared but never implemented these functions.
Signed-off-by: Yue Haibing Reviewed-by: Andrew Lunn Link: https://lore.kernel.org/r/20230808144610.19096-1-yuehaibing@huawei.com Signed-off-by: Jakub Kicinski --- include/linux/phy.h | 4 ---- 1 file changed, 4 deletions(-) (limited to 'include') diff --git a/include/linux/phy.h b/include/linux/phy.h index ba08b0e60279..b963ce22e7c7 100644 --- a/include/linux/phy.h +++ b/include/linux/phy.h @@ -1732,10 +1732,6 @@ int phy_start_cable_test_tdr(struct phy_device *phydev, } #endif -int phy_cable_test_result(struct phy_device *phydev, u8 pair, u16 result); -int phy_cable_test_fault_length(struct phy_device *phydev, u8 pair, - u16 cm); - static inline void phy_device_reset(struct phy_device *phydev, int value) { mdio_device_reset(&phydev->mdio, value); -- cgit v1.2.3 From a76728719c85eaa8440c353490c639655d60d28f Mon Sep 17 00:00:00 2001 From: Yue Haibing Date: Tue, 8 Aug 2023 22:59:55 +0800 Subject: net: switchdev: Remove unused declaration switchdev_port_fwd_mark_set() Commit 6bc506b4fb06 ("bridge: switchdev: Add forward mark support for stacked devices") removed the implementation but left the declaration.
Signed-off-by: Yue Haibing Reviewed-by: Simon Horman Reviewed-by: Petr Machata Link: https://lore.kernel.org/r/20230808145955.2176-1-yuehaibing@huawei.com Signed-off-by: Jakub Kicinski --- include/net/switchdev.h | 4 ---- 1 file changed, 4 deletions(-) (limited to 'include') diff --git a/include/net/switchdev.h b/include/net/switchdev.h index 0294cfec9c37..a43062d4c734 100644 --- a/include/net/switchdev.h +++ b/include/net/switchdev.h @@ -326,10 +326,6 @@ int call_switchdev_blocking_notifiers(unsigned long val, struct net_device *dev, struct switchdev_notifier_info *info, struct netlink_ext_ack *extack); -void switchdev_port_fwd_mark_set(struct net_device *dev, - struct net_device *group_dev, - bool joining); - int switchdev_handle_fdb_event_to_device(struct net_device *dev, unsigned long event, const struct switchdev_notifier_fdb_info *fdb_info, bool (*check_cb)(const struct net_device *dev), -- cgit v1.2.3 From 1ded5e5a5931bb8b31e15b63b655fe232e3416b2 Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Tue, 8 Aug 2023 13:58:09 +0000 Subject: net: annotate data-races around sock->ops IPV6_ADDRFORM socket option is evil, because it can change sock->ops while other threads might read it. Same issue for sk->sk_family being set to AF_INET. Adding READ_ONCE() over sock->ops reads is needed for sockets that might be impacted by IPV6_ADDRFORM. Note that mptcp_is_tcpsk() can also overwrite sock->ops. 
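The access pattern the annotation implies can be sketched in user space as follows. This is only a model under stated assumptions: `READ_ONCE()` here is a simplified approximation of the kernel macro built on GNU C `__typeof__`, and `struct socket_model`, `struct proto_ops_model`, `send_v6()` and `sock_send()` are hypothetical names, not kernel APIs. The point is to load the ops pointer exactly once and use only the local copy afterwards.

```c
#include <stddef.h>

/* Simplified approximation of the kernel's READ_ONCE(): a single volatile
 * load, so the compiler cannot silently re-read the pointer later. */
#define READ_ONCE(x) (*(const volatile __typeof__(x) *)&(x))

/* Hypothetical stand-in for struct proto_ops. */
struct proto_ops_model {
	int (*sendmsg)(int len);
};

/* Hypothetical stand-in for struct socket. */
struct socket_model {
	const struct proto_ops_model *ops; /* may be swapped by another thread */
};

static int send_v6(int len)
{
	return len + 6;
}

static const struct proto_ops_model v6_ops = { send_v6 };

static int sock_send(struct socket_model *sock, int len)
{
	/* Snapshot the pointer once; every later use goes through the copy. */
	const struct proto_ops_model *ops = READ_ONCE(sock->ops);

	return ops->sendmsg(len);
}
```

Without the single load, a concurrent writer could change the pointer between two reads of `sock->ops` inside one function, so different parts of the function might act on different ops tables.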
Adding annotations for all sk->sk_family reads will require more patches :/ BUG: KCSAN: data-race in ____sys_sendmsg / do_ipv6_setsockopt write to 0xffff888109f24ca0 of 8 bytes by task 4470 on cpu 0: do_ipv6_setsockopt+0x2c5e/0x2ce0 net/ipv6/ipv6_sockglue.c:491 ipv6_setsockopt+0x57/0x130 net/ipv6/ipv6_sockglue.c:1012 udpv6_setsockopt+0x95/0xa0 net/ipv6/udp.c:1690 sock_common_setsockopt+0x61/0x70 net/core/sock.c:3663 __sys_setsockopt+0x1c3/0x230 net/socket.c:2273 __do_sys_setsockopt net/socket.c:2284 [inline] __se_sys_setsockopt net/socket.c:2281 [inline] __x64_sys_setsockopt+0x66/0x80 net/socket.c:2281 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd read to 0xffff888109f24ca0 of 8 bytes by task 4469 on cpu 1: sock_sendmsg_nosec net/socket.c:724 [inline] sock_sendmsg net/socket.c:747 [inline] ____sys_sendmsg+0x349/0x4c0 net/socket.c:2503 ___sys_sendmsg net/socket.c:2557 [inline] __sys_sendmmsg+0x263/0x500 net/socket.c:2643 __do_sys_sendmmsg net/socket.c:2672 [inline] __se_sys_sendmmsg net/socket.c:2669 [inline] __x64_sys_sendmmsg+0x57/0x60 net/socket.c:2669 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd value changed: 0xffffffff850e32b8 -> 0xffffffff850da890 Reported by Kernel Concurrency Sanitizer on: CPU: 1 PID: 4469 Comm: syz-executor.1 Not tainted 6.4.0-rc5-syzkaller-00313-g4c605260bc60 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 05/25/2023 Reported-by: syzbot Signed-off-by: Eric Dumazet Reviewed-by: Kuniyuki Iwashima Link: https://lore.kernel.org/r/20230808135809.2300241-1-edumazet@google.com Signed-off-by: Jakub Kicinski --- include/linux/net.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include') diff --git a/include/linux/net.h b/include/linux/net.h index 41c608c1b02c..c9b4a63791a4 100644 --- 
a/include/linux/net.h +++ b/include/linux/net.h @@ -123,7 +123,7 @@ struct socket { struct file *file; struct sock *sk; - const struct proto_ops *ops; + const struct proto_ops *ops; /* Might change with IPV6_ADDRFORM or MPTCP. */ struct socket_wq wq; }; -- cgit v1.2.3 From 1f507e80c700e31e358bf4213dc7e4dd614c7c72 Mon Sep 17 00:00:00 2001 From: Adham Faris Date: Mon, 7 Aug 2023 11:05:07 -0700 Subject: net/mlx5: Expose NIC temperature via hardware monitoring kernel API MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Expose NIC temperature by implementing the hwmon kernel API, which makes the current thermal zone kernel API redundant. For each one of the supported and exposed thermal diode sensors, expose the following attributes: 1) Input temperature. 2) Highest temperature. 3) Temperature label: Depends on the firmware capability; if the firmware doesn't support sensor naming, the fallback naming convention is: "sensorX", where X is the HW spec (MTMP register) sensor index. 4) Temperature critical max value: refers to the high threshold of Warning Event. Will be exposed as `tempY_crit` hwmon attribute (RO attribute). For example, for ConnectX5 HCAs this temperature value will be 105 Celsius, 10 degrees lower than the HW shutdown temperature. 5) Temperature reset history: resets highest temperature. For example, a dual-port ConnectX5 NIC with a single IC thermal diode sensor will have 2 hwmon directories (one for each PCI function) under "/sys/class/hwmon/hwmon[X,Y]". Listing one of the directories above (hwmonX/Y) generates the corresponding output below: $ grep -H -d skip .
/sys/class/hwmon/hwmon0/* Output ======================================================================= /sys/class/hwmon/hwmon0/name:mlx5 /sys/class/hwmon/hwmon0/temp1_crit:105000 /sys/class/hwmon/hwmon0/temp1_highest:48000 /sys/class/hwmon/hwmon0/temp1_input:46000 /sys/class/hwmon/hwmon0/temp1_label:asic grep: /sys/class/hwmon/hwmon0/temp1_reset_history: Permission denied In addition, displaying the sensors data via lm_sensors generates the corresponding output below: $ sensors Output ======================================================================= mlx5-pci-0800 Adapter: PCI adapter asic: +46.0°C (crit = +105.0°C, highest = +48.0°C) mlx5-pci-0801 Adapter: PCI adapter asic: +46.0°C (crit = +105.0°C, highest = +48.0°C) CC: Jean Delvare Signed-off-by: Adham Faris Reviewed-by: Tariq Toukan Reviewed-by: Gal Pressman Signed-off-by: Saeed Mahameed Acked-by: Guenter Roeck Reviewed-by: Simon Horman Link: https://lore.kernel.org/r/20230807180507.22984-3-saeed@kernel.org Signed-off-by: Jakub Kicinski --- include/linux/mlx5/driver.h | 3 ++- include/linux/mlx5/mlx5_ifc.h | 14 +++++++++++++- 2 files changed, 15 insertions(+), 2 deletions(-) (limited to 'include') diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h index 3e1017d764b7..e1c7e502a4fc 100644 --- a/include/linux/mlx5/driver.h +++ b/include/linux/mlx5/driver.h @@ -134,6 +134,7 @@ enum { MLX5_REG_PCAM = 0x507f, MLX5_REG_NODE_DESC = 0x6001, MLX5_REG_HOST_ENDIANNESS = 0x7004, + MLX5_REG_MTCAP = 0x9009, MLX5_REG_MTMP = 0x900A, MLX5_REG_MCIA = 0x9014, MLX5_REG_MFRL = 0x9028, @@ -805,7 +806,7 @@ struct mlx5_core_dev { struct mlx5_rsc_dump *rsc_dump; u32 vsc_addr; struct mlx5_hv_vhca *hv_vhca; - struct mlx5_thermal *thermal; + struct mlx5_hwmon *hwmon; u64 num_block_tc; u64 num_block_ipsec; }; diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h index b3ad6b9852ec..87fd6f9ed82c 100644 --- a/include/linux/mlx5/mlx5_ifc.h +++ b/include/linux/mlx5/mlx5_ifc.h @@ -10196,7 +10196,9 
@@ struct mlx5_ifc_mcam_access_reg_bits { u8 mrtc[0x1]; u8 regs_44_to_32[0xd]; - u8 regs_31_to_0[0x20]; + u8 regs_31_to_10[0x16]; + u8 mtmp[0x1]; + u8 regs_8_to_0[0x9]; }; struct mlx5_ifc_mcam_access_reg_bits1 { @@ -10949,6 +10951,15 @@ struct mlx5_ifc_mrtc_reg_bits { u8 time_l[0x20]; }; +struct mlx5_ifc_mtcap_reg_bits { + u8 reserved_at_0[0x19]; + u8 sensor_count[0x7]; + + u8 reserved_at_20[0x20]; + + u8 sensor_map[0x40]; +}; + struct mlx5_ifc_mtmp_reg_bits { u8 reserved_at_0[0x14]; u8 sensor_index[0xc]; @@ -11036,6 +11047,7 @@ union mlx5_ifc_ports_control_registers_document_bits { struct mlx5_ifc_mfrl_reg_bits mfrl_reg; struct mlx5_ifc_mtutc_reg_bits mtutc_reg; struct mlx5_ifc_mrtc_reg_bits mrtc_reg; + struct mlx5_ifc_mtcap_reg_bits mtcap_reg; struct mlx5_ifc_mtmp_reg_bits mtmp_reg; u8 reserved_at_0[0x60e0]; }; -- cgit v1.2.3 From 40b0425f8ba17c32cf7182975032a3999c364dfc Mon Sep 17 00:00:00 2001 From: Vladimir Oltean Date: Mon, 7 Aug 2023 22:33:19 +0300 Subject: net: ptp: create a mock-up PTP Hardware Clock driver There are several cases where virtual net devices may benefit from having a PTP clock, and these have to do with testing. I can see at least netdevsim and veth as potential users of a common mock-up PTP hardware clock driver. The proposed idea is to create an object which emulates PTP clock operations on top of the unadjustable CLOCK_MONOTONIC_RAW plus a software-controlled time domain via a timecounter/cyclecounter and then link that PHC to the netdevsim device. The driver is fully functional for its intended purpose, and it successfully passes the PTP selftests. 
$ cd tools/testing/selftests/ptp/ $ ./phc.sh /dev/ptp2 TEST: settime [ OK ] TEST: adjtime [ OK ] TEST: adjfreq [ OK ] Signed-off-by: Vladimir Oltean Link: https://lore.kernel.org/r/20230807193324.4128292-7-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski --- include/linux/ptp_mock.h | 38 ++++++++++++++++++++++++++++++++++++++ 1 file changed, 38 insertions(+) create mode 100644 include/linux/ptp_mock.h (limited to 'include') diff --git a/include/linux/ptp_mock.h b/include/linux/ptp_mock.h new file mode 100644 index 000000000000..72eb401034d9 --- /dev/null +++ b/include/linux/ptp_mock.h @@ -0,0 +1,38 @@ +/* SPDX-License-Identifier: GPL-2.0-or-later */ +/* + * Mock-up PTP Hardware Clock driver for virtual network devices + * + * Copyright 2023 NXP + */ + +#ifndef _PTP_MOCK_H_ +#define _PTP_MOCK_H_ + +struct device; +struct mock_phc; + +#if IS_ENABLED(CONFIG_PTP_1588_CLOCK_MOCK) + +struct mock_phc *mock_phc_create(struct device *dev); +void mock_phc_destroy(struct mock_phc *phc); +int mock_phc_index(struct mock_phc *phc); + +#else + +static inline struct mock_phc *mock_phc_create(struct device *dev) +{ + return NULL; +} + +static inline void mock_phc_destroy(struct mock_phc *phc) +{ +} + +static inline int mock_phc_index(struct mock_phc *phc) +{ + return -1; +} + +#endif + +#endif /* _PTP_MOCK_H_ */ -- cgit v1.2.3 From 1fc04a0b973392df5975901f56addc913d2c8f4d Mon Sep 17 00:00:00 2001 From: Shenwei Wang Date: Mon, 7 Aug 2023 11:07:15 -0500 Subject: net: stmmac: add new mode parameter for fix_mac_speed A mode parameter has been added to the callback function of fix_mac_speed to indicate the physical layer type. 
The mode can be one of the following:

  MLO_AN_PHY    - Conventional PHY
  MLO_AN_FIXED  - Fixed-link mode
  MLO_AN_INBAND - In-band protocol

Signed-off-by: Shenwei Wang
Link: https://lore.kernel.org/r/20230807160716.259072-2-shenwei.wang@nxp.com
Signed-off-by: Jakub Kicinski
---
 include/linux/stmmac.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

(limited to 'include')

diff --git a/include/linux/stmmac.h b/include/linux/stmmac.h
index 652404c03944..784277d666eb 100644
--- a/include/linux/stmmac.h
+++ b/include/linux/stmmac.h
@@ -256,7 +256,7 @@ struct plat_stmmacenet_data {
 	u8 tx_sched_algorithm;
 	struct stmmac_rxq_cfg rx_queues_cfg[MTL_MAX_RX_QUEUES];
 	struct stmmac_txq_cfg tx_queues_cfg[MTL_MAX_TX_QUEUES];
-	void (*fix_mac_speed)(void *priv, unsigned int speed);
+	void (*fix_mac_speed)(void *priv, unsigned int speed, unsigned int mode);
 	int (*fix_soc_reset)(void *priv, void __iomem *ioaddr);
 	int (*serdes_powerup)(struct net_device *ndev, void *priv);
 	void (*serdes_powerdown)(struct net_device *ndev, void *priv);
-- 
cgit v1.2.3

From 1dcc03c9a7a824a31eaaecdfaa03542fb25feb6c Mon Sep 17 00:00:00 2001
From: Andrew Lunn
Date: Tue, 8 Aug 2023 23:04:34 +0200
Subject: net: phy: phy_device: Call into the PHY driver to set LED offload

Linux LEDs can be requested to perform hardware accelerated blinking to indicate link, RX, TX etc. Pass the rules for blinking to the PHY driver, if it implements the ops needed to determine whether a given pattern can be offloaded, to offload it, and to report what the current offload is. Additionally implement the op needed to get what device the LED is for.
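The contract between the three offload ops can be sketched in user space as below. This is a simplified model, not driver code: the `RULE_*` bits, `HW_CAPABLE`, `struct phy_model` and the `model_` functions are hypothetical, and the LED index argument of the real ops is omitted for brevity. The sketch assumes a single LED whose hardware can signal only link and RX.

```c
#include <errno.h>

/* Hypothetical rule bits; a real driver would use the netdev trigger flags. */
#define RULE_LINK (1UL << 0)
#define RULE_TX   (1UL << 1)
#define RULE_RX   (1UL << 2)

/* Assumption for this sketch: the hardware can blink for link and RX only. */
#define HW_CAPABLE (RULE_LINK | RULE_RX)

/* Hypothetical PHY state holding the currently offloaded rules. */
struct phy_model {
	unsigned long hw_rules;
};

/* Return 0 if the hardware can implement the rules, -EOPNOTSUPP otherwise. */
static int model_led_hw_is_supported(struct phy_model *phy, unsigned long rules)
{
	(void)phy;
	return (rules & ~HW_CAPABLE) ? -EOPNOTSUPP : 0;
}

/* Program the rules into the (modelled) hardware after validating them. */
static int model_led_hw_control_set(struct phy_model *phy, unsigned long rules)
{
	int err = model_led_hw_is_supported(phy, rules);

	if (err)
		return err;
	phy->hw_rules = rules;
	return 0;
}

/* Report how the (modelled) hardware is currently driving the LED. */
static int model_led_hw_control_get(struct phy_model *phy, unsigned long *rules)
{
	*rules = phy->hw_rules;
	return 0;
}
```

In this model an unsupported pattern is rejected with -EOPNOTSUPP and the previously programmed rules stay in place. In the kernel, falling back to software blinking is handled by the LED core and is outside this sketch.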
Reviewed-by: Simon Horman Signed-off-by: Andrew Lunn Tested-by: Daniel Golle Link: https://lore.kernel.org/r/20230808210436.838995-3-andrew@lunn.ch Signed-off-by: Jakub Kicinski --- include/linux/phy.h | 33 +++++++++++++++++++++++++++++++++ 1 file changed, 33 insertions(+) (limited to 'include') diff --git a/include/linux/phy.h b/include/linux/phy.h index b963ce22e7c7..3c1ceedd1b77 100644 --- a/include/linux/phy.h +++ b/include/linux/phy.h @@ -1105,6 +1105,39 @@ struct phy_driver { int (*led_blink_set)(struct phy_device *dev, u8 index, unsigned long *delay_on, unsigned long *delay_off); + /** + * @led_hw_is_supported: Can the HW support the given rules. + * @dev: PHY device which has the LED + * @index: Which LED of the PHY device + * @rules The core is interested in these rules + * + * Return 0 if yes, -EOPNOTSUPP if not, or an error code. + */ + int (*led_hw_is_supported)(struct phy_device *dev, u8 index, + unsigned long rules); + /** + * @led_hw_control_set: Set the HW to control the LED + * @dev: PHY device which has the LED + * @index: Which LED of the PHY device + * @rules The rules used to control the LED + * + * Returns 0, or a an error code. + */ + int (*led_hw_control_set)(struct phy_device *dev, u8 index, + unsigned long rules); + /** + * @led_hw_control_get: Get how the HW is controlling the LED + * @dev: PHY device which has the LED + * @index: Which LED of the PHY device + * @rules Pointer to the rules used to control the LED + * + * Set *@rules to how the HW is currently blinking. Returns 0 + * on success, or a error code if the current blinking cannot + * be represented in rules, or some other error happens. 
+ */ + int (*led_hw_control_get)(struct phy_device *dev, u8 index, + unsigned long *rules); + }; #define to_phy_driver(d) container_of(to_mdio_common_driver(d), \ struct phy_driver, mdiodrv) -- cgit v1.2.3 From 4a8d287909c905107d88a9835141ad592aedae75 Mon Sep 17 00:00:00 2001 From: Yue Haibing Date: Wed, 9 Aug 2023 21:49:43 +0800 Subject: net: caif: Remove unused declaration cfsrvl_ctrlcmd() Commit 43e369210108 ("caif: Move refcount from service layer to sock and dev.") declared but never implemented this. Signed-off-by: Yue Haibing Reviewed-by: Simon Horman Link: https://lore.kernel.org/r/20230809134943.37844-1-yuehaibing@huawei.com Signed-off-by: Jakub Kicinski --- include/net/caif/cfsrvl.h | 3 --- 1 file changed, 3 deletions(-) (limited to 'include') diff --git a/include/net/caif/cfsrvl.h b/include/net/caif/cfsrvl.h index bd5440977f7f..5ee7b322e18b 100644 --- a/include/net/caif/cfsrvl.h +++ b/include/net/caif/cfsrvl.h @@ -33,9 +33,6 @@ struct cflayer *cfrfml_create(u8 linkid, struct dev_info *dev_info, int mtu_size); struct cflayer *cfdbgl_create(u8 linkid, struct dev_info *dev_info); -void cfsrvl_ctrlcmd(struct cflayer *layr, enum caif_ctrlcmd ctrl, - int phyid); - bool cfsrvl_phyid_match(struct cflayer *layer, int phyid); void cfsrvl_init(struct cfsrvl *service, -- cgit v1.2.3 From afa2420cff5448eb225e88543e01b0b34b9a43cd Mon Sep 17 00:00:00 2001 From: Yue Haibing Date: Wed, 9 Aug 2023 22:23:23 +0800 Subject: sctp: Remove unused declaration sctp_backlog_migrate() Commit 61c9fed41638 ("[SCTP]: A better solution to fix the race between sctp_peeloff() and sctp_rcv().") removed the implementation but left declaration in place. Remove it. 
Signed-off-by: Yue Haibing Acked-by: Xin Long Link: https://lore.kernel.org/r/20230809142323.9428-1-yuehaibing@huawei.com Signed-off-by: Jakub Kicinski --- include/net/sctp/sctp.h | 2 -- 1 file changed, 2 deletions(-) (limited to 'include') diff --git a/include/net/sctp/sctp.h b/include/net/sctp/sctp.h index 2a67100b2a17..a2310fa995f6 100644 --- a/include/net/sctp/sctp.h +++ b/include/net/sctp/sctp.h @@ -148,8 +148,6 @@ void sctp_icmp_redirect(struct sock *, struct sctp_transport *, void sctp_icmp_proto_unreachable(struct sock *sk, struct sctp_association *asoc, struct sctp_transport *t); -void sctp_backlog_migrate(struct sctp_association *assoc, - struct sock *oldsk, struct sock *newsk); int sctp_transport_hashtable_init(void); void sctp_transport_hashtable_destroy(void); int sctp_hash_transport(struct sctp_transport *t); -- cgit v1.2.3 From ac3899c6229649737b9d5cb86e417c98243883dc Mon Sep 17 00:00:00 2001 From: Shradha Gupta Date: Wed, 9 Aug 2023 21:15:22 -0700 Subject: net: mana: Add gdma stats to ethtool output for mana Extended performance counter stats in 'ethtool -S ' for MANA VF to include GDMA tx LSO packets and bytes count. Tested-on: Ubuntu22 Testcases: 1. LISA testcase: PERF-NETWORK-TCP-THROUGHPUT-MULTICONNECTION-NTTTCP-Synthetic 2. LISA testcase: PERF-NETWORK-TCP-THROUGHPUT-MULTICONNECTION-NTTTCP-SRIOV 3. Validated the GDMA stat packets and byte counters Signed-off-by: Shradha Gupta Reviewed-by: Pavan Chebbi Signed-off-by: David S. 
Miller --- include/net/mana/mana.h | 87 +++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 87 insertions(+) (limited to 'include') diff --git a/include/net/mana/mana.h b/include/net/mana/mana.h index 879990101c9f..9f70b4332238 100644 --- a/include/net/mana/mana.h +++ b/include/net/mana/mana.h @@ -352,6 +352,13 @@ struct mana_tx_qp { struct mana_ethtool_stats { u64 stop_queue; u64 wake_queue; + u64 hc_tx_bytes; + u64 hc_tx_ucast_pkts; + u64 hc_tx_ucast_bytes; + u64 hc_tx_bcast_pkts; + u64 hc_tx_bcast_bytes; + u64 hc_tx_mcast_pkts; + u64 hc_tx_mcast_bytes; u64 tx_cqe_err; u64 tx_cqe_unknown_type; u64 rx_coalesced_err; @@ -442,6 +449,7 @@ u32 mana_run_xdp(struct net_device *ndev, struct mana_rxq *rxq, struct bpf_prog *mana_xdp_get(struct mana_port_context *apc); void mana_chn_setxdp(struct mana_port_context *apc, struct bpf_prog *prog); int mana_bpf(struct net_device *ndev, struct netdev_bpf *bpf); +void mana_query_gf_stats(struct mana_port_context *apc); extern const struct ethtool_ops mana_ethtool_ops; @@ -583,6 +591,49 @@ struct mana_fence_rq_resp { struct gdma_resp_hdr hdr; }; /* HW DATA */ +/* Query stats RQ */ +struct mana_query_gf_stat_req { + struct gdma_req_hdr hdr; + u64 req_stats; +}; /* HW DATA */ + +struct mana_query_gf_stat_resp { + struct gdma_resp_hdr hdr; + u64 reported_stats; + /* rx errors/discards */ + u64 discard_rx_nowqe; + u64 err_rx_vport_disabled; + /* rx bytes/packets */ + u64 hc_rx_bytes; + u64 hc_rx_ucast_pkts; + u64 hc_rx_ucast_bytes; + u64 hc_rx_bcast_pkts; + u64 hc_rx_bcast_bytes; + u64 hc_rx_mcast_pkts; + u64 hc_rx_mcast_bytes; + /* tx errors */ + u64 err_tx_gf_disabled; + u64 err_tx_vport_disabled; + u64 err_tx_inval_vport_offset_pkt; + u64 err_tx_vlan_enforcement; + u64 err_tx_ethtype_enforcement; + u64 err_tx_SA_enforecement; + u64 err_tx_SQPDID_enforcement; + u64 err_tx_CQPDID_enforcement; + u64 err_tx_mtu_violation; + u64 err_tx_inval_oob; + /* tx bytes/packets */ + u64 hc_tx_bytes; + u64 hc_tx_ucast_pkts; + u64 
hc_tx_ucast_bytes; + u64 hc_tx_bcast_pkts; + u64 hc_tx_bcast_bytes; + u64 hc_tx_mcast_pkts; + u64 hc_tx_mcast_bytes; + /* tx error */ + u64 err_tx_gdma; +}; /* HW DATA */ + /* Configure vPort Rx Steering */ struct mana_cfg_rx_steer_req_v2 { struct gdma_req_hdr hdr; @@ -662,6 +713,42 @@ struct mana_deregister_filter_resp { struct gdma_resp_hdr hdr; }; /* HW DATA */ +/* Requested GF stats Flags */ +/* Rx discards/Errors */ +#define STATISTICS_FLAGS_RX_DISCARDS_NO_WQE 0x0000000000000001 +#define STATISTICS_FLAGS_RX_ERRORS_VPORT_DISABLED 0x0000000000000002 +/* Rx bytes/pkts */ +#define STATISTICS_FLAGS_HC_RX_BYTES 0x0000000000000004 +#define STATISTICS_FLAGS_HC_RX_UCAST_PACKETS 0x0000000000000008 +#define STATISTICS_FLAGS_HC_RX_UCAST_BYTES 0x0000000000000010 +#define STATISTICS_FLAGS_HC_RX_MCAST_PACKETS 0x0000000000000020 +#define STATISTICS_FLAGS_HC_RX_MCAST_BYTES 0x0000000000000040 +#define STATISTICS_FLAGS_HC_RX_BCAST_PACKETS 0x0000000000000080 +#define STATISTICS_FLAGS_HC_RX_BCAST_BYTES 0x0000000000000100 +/* Tx errors */ +#define STATISTICS_FLAGS_TX_ERRORS_GF_DISABLED 0x0000000000000200 +#define STATISTICS_FLAGS_TX_ERRORS_VPORT_DISABLED 0x0000000000000400 +#define STATISTICS_FLAGS_TX_ERRORS_INVAL_VPORT_OFFSET_PACKETS \ + 0x0000000000000800 +#define STATISTICS_FLAGS_TX_ERRORS_VLAN_ENFORCEMENT 0x0000000000001000 +#define STATISTICS_FLAGS_TX_ERRORS_ETH_TYPE_ENFORCEMENT \ + 0x0000000000002000 +#define STATISTICS_FLAGS_TX_ERRORS_SA_ENFORCEMENT 0x0000000000004000 +#define STATISTICS_FLAGS_TX_ERRORS_SQPDID_ENFORCEMENT 0x0000000000008000 +#define STATISTICS_FLAGS_TX_ERRORS_CQPDID_ENFORCEMENT 0x0000000000010000 +#define STATISTICS_FLAGS_TX_ERRORS_MTU_VIOLATION 0x0000000000020000 +#define STATISTICS_FLAGS_TX_ERRORS_INVALID_OOB 0x0000000000040000 +/* Tx bytes/pkts */ +#define STATISTICS_FLAGS_HC_TX_BYTES 0x0000000000080000 +#define STATISTICS_FLAGS_HC_TX_UCAST_PACKETS 0x0000000000100000 +#define STATISTICS_FLAGS_HC_TX_UCAST_BYTES 0x0000000000200000 +#define 
STATISTICS_FLAGS_HC_TX_MCAST_PACKETS 0x0000000000400000 +#define STATISTICS_FLAGS_HC_TX_MCAST_BYTES 0x0000000000800000 +#define STATISTICS_FLAGS_HC_TX_BCAST_PACKETS 0x0000000001000000 +#define STATISTICS_FLAGS_HC_TX_BCAST_BYTES 0x0000000002000000 +/* Tx error */ +#define STATISTICS_FLAGS_TX_ERRORS_GDMA_ERROR 0x0000000004000000 + #define MANA_MAX_NUM_QUEUES 64 #define MANA_SHORT_VPORT_OFFSET_MAX ((1U << 8) - 1) -- cgit v1.2.3 From ae75336131337f926d61b7fd86a0cca3146a7620 Mon Sep 17 00:00:00 2001 From: Claudia Draghicescu Date: Wed, 10 May 2023 16:45:57 +0300 Subject: Bluetooth: Check for ISO support in controller This patch checks for ISO_BROADCASTER and ISO_SYNC_RECEIVER in controller. Signed-off-by: Claudia Draghicescu Signed-off-by: Luiz Augusto von Dentz --- include/net/bluetooth/hci.h | 1 + include/net/bluetooth/hci_core.h | 1 + include/net/bluetooth/mgmt.h | 2 ++ 3 files changed, 4 insertions(+) (limited to 'include') diff --git a/include/net/bluetooth/hci.h b/include/net/bluetooth/hci.h index 872dcb91a540..ab2f8f1817cf 100644 --- a/include/net/bluetooth/hci.h +++ b/include/net/bluetooth/hci.h @@ -577,6 +577,7 @@ enum { #define HCI_LE_CIS_CENTRAL 0x10 #define HCI_LE_CIS_PERIPHERAL 0x20 #define HCI_LE_ISO_BROADCASTER 0x40 +#define HCI_LE_ISO_SYNC_RECEIVER 0x80 /* Connection modes */ #define HCI_CM_ACTIVE 0x0000 diff --git a/include/net/bluetooth/hci_core.h b/include/net/bluetooth/hci_core.h index e01d52cb668c..da871581ef87 100644 --- a/include/net/bluetooth/hci_core.h +++ b/include/net/bluetooth/hci_core.h @@ -1765,6 +1765,7 @@ void hci_conn_del_sysfs(struct hci_conn *conn); #define cis_peripheral_capable(dev) \ ((dev)->le_features[3] & HCI_LE_CIS_PERIPHERAL) #define bis_capable(dev) ((dev)->le_features[3] & HCI_LE_ISO_BROADCASTER) +#define sync_recv_capable(dev) ((dev)->le_features[3] & HCI_LE_ISO_SYNC_RECEIVER) #define mws_transport_config_capable(dev) (((dev)->commands[30] & 0x08) && \ (!test_bit(HCI_QUIRK_BROKEN_MWS_TRANSPORT_CONFIG, &(dev)->quirks))) diff 
--git a/include/net/bluetooth/mgmt.h b/include/net/bluetooth/mgmt.h index 5e68b3dd4422..d382679efd2b 100644 --- a/include/net/bluetooth/mgmt.h +++ b/include/net/bluetooth/mgmt.h @@ -111,6 +111,8 @@ struct mgmt_rp_read_index_list { #define MGMT_SETTING_WIDEBAND_SPEECH BIT(17) #define MGMT_SETTING_CIS_CENTRAL BIT(18) #define MGMT_SETTING_CIS_PERIPHERAL BIT(19) +#define MGMT_SETTING_ISO_BROADCASTER BIT(20) +#define MGMT_SETTING_ISO_SYNC_RECEIVER BIT(21) #define MGMT_OP_READ_INFO 0x0004 #define MGMT_READ_INFO_SIZE 0 -- cgit v1.2.3 From a0bfde167b506423111ddb8cd71930497a40fc54 Mon Sep 17 00:00:00 2001 From: Iulia Tanasescu Date: Tue, 30 May 2023 17:21:59 +0300 Subject: Bluetooth: ISO: Add support for connecting multiple BISes It is required for some configurations to have multiple BISes as part of the same BIG. Similar to the flow implemented for unicast, DEFER_SETUP will also be used to bind multiple BISes for the same BIG, before starting Periodic Advertising and creating the BIG. The user will have to open a new socket for each BIS. By setting the BT_DEFER_SETUP socket option and calling connect, a new connection will be added for the BIG and advertising handle set by the socket QoS parameters. Since all BISes will be bound for the same BIG and advertising handle, the socket QoS options and base parameters should match for all connections. By calling connect on a socket that does not have the BT_DEFER_SETUP option set, periodic advertising will be started and the BIG will be created, with a BIS for each previously bound connection. Since a BIG cannot be reconfigured with additional BISes after creation, no more connections can be bound for the BIG after the start periodic advertising and create BIG commands have been queued. The bis_cleanup function has also been updated, so that the advertising set and the BIG will not be terminated unless there are no more bound or connected BISes. 
The HCI_CONN_BIG_CREATED connection flag has been added to indicate that the BIG has been successfully created. This flag is checked at bis_cleanup, so that the BIG is only terminated if the HCI_LE_Create_BIG_Complete has been received. This implementation has been tested on hardware, using the "isotest" tool with an additional command line option, to specify the number of BISes to create as part of the desired BIG: tools/isotest -i hci0 -s 00:00:00:00:00:00 -N 2 -G 1 -T 1 The btmon log shows that a BIG containing 2 BISes has been created: < HCI Command: LE Create Broadcast Isochronous Group (0x08|0x0068) plen 31 Handle: 0x01 Advertising Handle: 0x01 Number of BIS: 2 SDU Interval: 10000 us (0x002710) Maximum SDU size: 40 Maximum Latency: 10 ms (0x000a) RTN: 0x02 PHY: LE 2M (0x02) Packing: Sequential (0x00) Framing: Unframed (0x00) Encryption: 0x00 Broadcast Code: 00000000000000000000000000000000 > HCI Event: Command Status (0x0f) plen 4 LE Create Broadcast Isochronous Group (0x08|0x0068) ncmd 1 Status: Success (0x00) > HCI Event: LE Meta Event (0x3e) plen 23 LE Broadcast Isochronous Group Complete (0x1b) Status: Success (0x00) Handle: 0x01 BIG Synchronization Delay: 1974 us (0x0007b6) Transport Latency: 1974 us (0x0007b6) PHY: LE 2M (0x02) NSE: 3 BN: 1 PTO: 1 IRC: 3 Maximum PDU: 40 ISO Interval: 10.00 msec (0x0008) Connection Handle #0: 10 Connection Handle #1: 11 < HCI Command: LE Setup Isochronous Data Path (0x08|0x006e) plen 13 Handle: 10 Data Path Direction: Input (Host to Controller) (0x00) Data Path: HCI (0x00) Coding Format: Transparent (0x03) Company Codec ID: Ericsson Technology Licensing (0) Vendor Codec ID: 0 Controller Delay: 0 us (0x000000) Codec Configuration Length: 0 Codec Configuration: > HCI Event: Command Complete (0x0e) plen 6 LE Setup Isochronous Data Path (0x08|0x006e) ncmd 1 Status: Success (0x00) Handle: 10 < HCI Command: LE Setup Isochronous Data Path (0x08|0x006e) plen 13 Handle: 11 Data Path Direction: Input (Host to Controller) (0x00) 
Data Path: HCI (0x00) Coding Format: Transparent (0x03) Company Codec ID: Ericsson Technology Licensing (0) Vendor Codec ID: 0 Controller Delay: 0 us (0x000000) Codec Configuration Length: 0 Codec Configuration: > HCI Event: Command Complete (0x0e) plen 6 LE Setup Isochronous Data Path (0x08|0x006e) ncmd 1 Status: Success (0x00) Handle: 11 < ISO Data TX: Handle 10 flags 0x02 dlen 44 < ISO Data TX: Handle 11 flags 0x02 dlen 44 > HCI Event: Number of Completed Packets (0x13) plen 5 Num handles: 1 Handle: 10 Count: 1 > HCI Event: Number of Completed Packets (0x13) plen 5 Num handles: 1 Handle: 11 Count: 1 Signed-off-by: Iulia Tanasescu Signed-off-by: Luiz Augusto von Dentz --- include/net/bluetooth/hci_core.h | 30 ++++++++++++++++++++++++++++++ 1 file changed, 30 insertions(+) (limited to 'include') diff --git a/include/net/bluetooth/hci_core.h b/include/net/bluetooth/hci_core.h index da871581ef87..c0bb58f1e86f 100644 --- a/include/net/bluetooth/hci_core.h +++ b/include/net/bluetooth/hci_core.h @@ -974,6 +974,7 @@ enum { HCI_CONN_SCANNING, HCI_CONN_AUTH_FAILURE, HCI_CONN_PER_ADV, + HCI_CONN_BIG_CREATED, }; static inline bool hci_conn_ssp_enabled(struct hci_conn *conn) @@ -1115,6 +1116,32 @@ static inline struct hci_conn *hci_conn_hash_lookup_bis(struct hci_dev *hdev, return NULL; } +static inline struct hci_conn * +hci_conn_hash_lookup_per_adv_bis(struct hci_dev *hdev, + bdaddr_t *ba, + __u8 big, __u8 bis) +{ + struct hci_conn_hash *h = &hdev->conn_hash; + struct hci_conn *c; + + rcu_read_lock(); + + list_for_each_entry_rcu(c, &h->list, list) { + if (bacmp(&c->dst, ba) || c->type != ISO_LINK || + !test_bit(HCI_CONN_PER_ADV, &c->flags)) + continue; + + if (c->iso_qos.bcast.big == big && + c->iso_qos.bcast.bis == bis) { + rcu_read_unlock(); + return c; + } + } + rcu_read_unlock(); + + return NULL; +} + static inline struct hci_conn *hci_conn_hash_lookup_handle(struct hci_dev *hdev, __u16 handle) { @@ -1351,6 +1378,9 @@ struct hci_conn *hci_connect_sco(struct hci_dev 
*hdev, int type, bdaddr_t *dst, __u16 setting, struct bt_codec *codec); struct hci_conn *hci_bind_cis(struct hci_dev *hdev, bdaddr_t *dst, __u8 dst_type, struct bt_iso_qos *qos); +struct hci_conn *hci_bind_bis(struct hci_dev *hdev, bdaddr_t *dst, + struct bt_iso_qos *qos, + __u8 base_len, __u8 *base); struct hci_conn *hci_connect_cis(struct hci_dev *hdev, bdaddr_t *dst, __u8 dst_type, struct bt_iso_qos *qos); struct hci_conn *hci_connect_bis(struct hci_dev *hdev, bdaddr_t *dst, -- cgit v1.2.3 From 7f74563e6140e42b4ffae62adbef7a65967a3f98 Mon Sep 17 00:00:00 2001 From: Pauli Virtanen Date: Thu, 1 Jun 2023 09:34:46 +0300 Subject: Bluetooth: ISO: do not emit new LE Create CIS if previous is pending LE Create CIS command shall not be sent before all CIS Established events from its previous invocation have been processed. Currently it is sent via hci_sync but that only waits for the first event, but there can be multiple. Make it wait for all events, and simplify the CIS creation as follows: Add new flag HCI_CONN_CREATE_CIS, which is set if Create CIS has been sent for the connection but it is not yet completed. Make BT_CONNECT state to mean the connection wants Create CIS. On events after which new Create CIS may need to be sent, send it if possible and some connections need it. These events are: hci_connect_cis, iso_connect_cfm, hci_cs_le_create_cis, hci_le_cis_estabilished_evt. The Create CIS status/completion events shall queue new Create CIS only if at least one of the connections transitions away from BT_CONNECT, so that we don't loop if controller is sending bogus events. This fixes sending multiple CIS Create for the same CIS in the "ISO AC 6(i) - Success" BlueZ test case: < HCI Command: LE Create Co.. 
(0x08|0x0064) plen 9 #129 [hci0] Number of CIS: 2 CIS Handle: 257 ACL Handle: 42 CIS Handle: 258 ACL Handle: 42 > HCI Event: Command Status (0x0f) plen 4 #130 [hci0] LE Create Connected Isochronous Stream (0x08|0x0064) ncmd 1 Status: Success (0x00) > HCI Event: LE Meta Event (0x3e) plen 29 #131 [hci0] LE Connected Isochronous Stream Established (0x19) Status: Success (0x00) Connection Handle: 257 ... < HCI Command: LE Setup Is.. (0x08|0x006e) plen 13 #132 [hci0] ... > HCI Event: Command Complete (0x0e) plen 6 #133 [hci0] LE Setup Isochronous Data Path (0x08|0x006e) ncmd 1 ... < HCI Command: LE Create Co.. (0x08|0x0064) plen 5 #134 [hci0] Number of CIS: 1 CIS Handle: 258 ACL Handle: 42 > HCI Event: Command Status (0x0f) plen 4 #135 [hci0] LE Create Connected Isochronous Stream (0x08|0x0064) ncmd 1 Status: ACL Connection Already Exists (0x0b) > HCI Event: LE Meta Event (0x3e) plen 29 #136 [hci0] LE Connected Isochronous Stream Established (0x19) Status: Success (0x00) Connection Handle: 258 ... 
Fixes: c09b80be6ffc ("Bluetooth: hci_conn: Fix not waiting for HCI_EVT_LE_CIS_ESTABLISHED") Signed-off-by: Pauli Virtanen Signed-off-by: Luiz Augusto von Dentz --- include/net/bluetooth/hci_core.h | 4 +++- include/net/bluetooth/hci_sync.h | 2 +- 2 files changed, 4 insertions(+), 2 deletions(-) (limited to 'include') diff --git a/include/net/bluetooth/hci_core.h b/include/net/bluetooth/hci_core.h index c0bb58f1e86f..ad39d09e9bd6 100644 --- a/include/net/bluetooth/hci_core.h +++ b/include/net/bluetooth/hci_core.h @@ -975,6 +975,7 @@ enum { HCI_CONN_AUTH_FAILURE, HCI_CONN_PER_ADV, HCI_CONN_BIG_CREATED, + HCI_CONN_CREATE_CIS, }; static inline bool hci_conn_ssp_enabled(struct hci_conn *conn) @@ -1351,7 +1352,8 @@ int hci_disconnect(struct hci_conn *conn, __u8 reason); bool hci_setup_sync(struct hci_conn *conn, __u16 handle); void hci_sco_setup(struct hci_conn *conn, __u8 status); bool hci_iso_setup_path(struct hci_conn *conn); -int hci_le_create_cis(struct hci_conn *conn); +int hci_le_create_cis_pending(struct hci_dev *hdev); +int hci_conn_check_create_cis(struct hci_conn *conn); struct hci_conn *hci_conn_add(struct hci_dev *hdev, int type, bdaddr_t *dst, u8 role); diff --git a/include/net/bluetooth/hci_sync.h b/include/net/bluetooth/hci_sync.h index 2495be4d8b82..b516a0f4a55b 100644 --- a/include/net/bluetooth/hci_sync.h +++ b/include/net/bluetooth/hci_sync.h @@ -124,7 +124,7 @@ int hci_abort_conn_sync(struct hci_dev *hdev, struct hci_conn *conn, u8 reason); int hci_le_create_conn_sync(struct hci_dev *hdev, struct hci_conn *conn); -int hci_le_create_cis_sync(struct hci_dev *hdev, struct hci_conn *conn); +int hci_le_create_cis_sync(struct hci_dev *hdev); int hci_le_remove_cig_sync(struct hci_dev *hdev, u8 handle); -- cgit v1.2.3 From 6bfa273e533d7b25eee3d74e28a7fe8e6a8e7a93 Mon Sep 17 00:00:00 2001 From: Luiz Augusto von Dentz Date: Thu, 25 May 2023 16:46:41 -0700 Subject: Bluetooth: Consolidate code around sk_alloc into a helper function This consolidates code around 
sk_alloc into bt_sock_alloc which does take care of common initialization. Signed-off-by: Luiz Augusto von Dentz --- include/net/bluetooth/bluetooth.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include') diff --git a/include/net/bluetooth/bluetooth.h b/include/net/bluetooth/bluetooth.h index af729859385e..60689a07b82c 100644 --- a/include/net/bluetooth/bluetooth.h +++ b/include/net/bluetooth/bluetooth.h @@ -400,6 +400,8 @@ int bt_sock_register(int proto, const struct net_proto_family *ops); void bt_sock_unregister(int proto); void bt_sock_link(struct bt_sock_list *l, struct sock *s); void bt_sock_unlink(struct bt_sock_list *l, struct sock *s); +struct sock *bt_sock_alloc(struct net *net, struct socket *sock, + struct proto *prot, int proto, gfp_t prio, int kern); int bt_sock_recvmsg(struct socket *sock, struct msghdr *msg, size_t len, int flags); int bt_sock_stream_recvmsg(struct socket *sock, struct msghdr *msg, -- cgit v1.2.3 From 69ae5065061c53ded4f49e821646b9bc60ab302a Mon Sep 17 00:00:00 2001 From: Luiz Augusto von Dentz Date: Thu, 25 May 2023 16:46:43 -0700 Subject: Bluetooth: hci_sock: Forward credentials to monitor This stores scm_creds into hci_skb_cb so they can be properly forwarded to the likes of btmon which is then able to print information about the process who is originating the traffic: bluetoothd[35]: @ MGMT Command: Rea.. (0x0001) plen 0 {0x0001} @ MGMT Event: Command Complete (0x0001) plen 6 {0x0001} Read Management Version Information (0x0001) plen 3 bluetoothd[35]: < ACL Data T.. 
flags 0x00 dlen 41
      ATT: Write Command (0x52) len 36
        Handle: 0x0043 Type: ASE Control Point (0x2bc6)
          Data: 020203000110270000022800020a00409c0001000110270000022800020a00409c00
            Opcode: QoS Configuration (0x02)
            Number of ASE(s): 2
            ASE: #0
            ASE ID: 0x03
            CIG ID: 0x00
            CIS ID: 0x01
            SDU Interval: 10000 usec
            Framing: Unframed (0x00)
            PHY: 0x02
            LE 2M PHY (0x02)
            Max SDU: 40
            RTN: 2
            Max Transport Latency: 10
            Presentation Delay: 40000 us
            ASE: #1
            ASE ID: 0x01
            CIG ID: 0x00
            CIS ID: 0x01
            SDU Interval: 10000 usec
            Framing: Unframed (0x00)
            PHY: 0x02
            LE 2M PHY (0x02)
            Max SDU: 40
            RTN: 2
            Max Transport Latency: 10
            Presentation Delay: 40000 us

Signed-off-by: Luiz Augusto von Dentz
---
 include/net/bluetooth/bluetooth.h | 1 +
 1 file changed, 1 insertion(+)

(limited to 'include')

diff --git a/include/net/bluetooth/bluetooth.h b/include/net/bluetooth/bluetooth.h
index 60689a07b82c..34998ae8ed78 100644
--- a/include/net/bluetooth/bluetooth.h
+++ b/include/net/bluetooth/bluetooth.h
@@ -471,6 +471,7 @@ struct bt_skb_cb {
 		struct sco_ctrl sco;
 		struct hci_ctrl hci;
 		struct mgmt_ctrl mgmt;
+		struct scm_creds creds;
 	};
 };
 #define bt_cb(skb) ((struct bt_skb_cb *)((skb)->cb))
--
cgit v1.2.3

From 6a42e9bfd17f7135d59701f93942a3392da482f4 Mon Sep 17 00:00:00 2001
From: Iulia Tanasescu
Date: Mon, 19 Jun 2023 17:53:16 +0300
Subject: Bluetooth: ISO: Support multiple BIGs

This adds support for creating multiple BIGs. According to the spec, each
BIG shall have a unique handle, and each BIG should be associated with a
different advertising handle. Otherwise, the LE Create BIG command will
fail, with error code Command Disallowed (for reusing a BIG handle), or
Unknown Advertising Identifier (for reusing an advertising handle).
The btmon snippet below shows an exercise for creating two BIGs for the same controller, by opening two isotest instances with the following command: tools/isotest -i hci0 -s 00:00:00:00:00:00 < HCI Command: LE Create Broadcast Isochronous Group (0x08|0x0068) plen 31 Handle: 0x00 Advertising Handle: 0x01 Number of BIS: 1 SDU Interval: 10000 us (0x002710) Maximum SDU size: 40 Maximum Latency: 10 ms (0x000a) RTN: 0x02 PHY: LE 2M (0x02) Packing: Sequential (0x00) Framing: Unframed (0x00) Encryption: 0x00 Broadcast Code: 00000000000000000000000000000000 > HCI Event: Command Status (0x0f) plen 4 LE Create Broadcast Isochronous Group (0x08|0x0068) ncmd 1 Status: Success (0x00) > HCI Event: LE Meta Event (0x3e) plen 21 LE Broadcast Isochronous Group Complete (0x1b) Status: Success (0x00) Handle: 0x00 BIG Synchronization Delay: 912 us (0x000390) Transport Latency: 912 us (0x000390) PHY: LE 2M (0x02) NSE: 3 BN: 1 PTO: 1 IRC: 3 Maximum PDU: 40 ISO Interval: 10.00 msec (0x0008) Connection Handle #0: 10 < HCI Command: LE Create Broadcast Isochronous Group (0x08|0x0068) Handle: 0x01 Advertising Handle: 0x02 Number of BIS: 1 SDU Interval: 10000 us (0x002710) Maximum SDU size: 40 Maximum Latency: 10 ms (0x000a) RTN: 0x02 PHY: LE 2M (0x02) Packing: Sequential (0x00) Framing: Unframed (0x00) Encryption: 0x00 Broadcast Code: 00000000000000000000000000000000 > HCI Event: Command Status (0x0f) plen 4 LE Create Broadcast Isochronous Group (0x08|0x0068) ncmd 1 Status: Success (0x00) Signed-off-by: Iulia Tanasescu Signed-off-by: Luiz Augusto von Dentz --- include/net/bluetooth/hci_core.h | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) (limited to 'include') diff --git a/include/net/bluetooth/hci_core.h b/include/net/bluetooth/hci_core.h index ad39d09e9bd6..9140d4a80e38 100644 --- a/include/net/bluetooth/hci_core.h +++ b/include/net/bluetooth/hci_core.h @@ -1095,8 +1095,7 @@ static inline __u8 hci_conn_lookup_type(struct hci_dev *hdev, __u16 handle) } static inline struct 
hci_conn *hci_conn_hash_lookup_bis(struct hci_dev *hdev,
-							bdaddr_t *ba,
-							__u8 big, __u8 bis)
+							bdaddr_t *ba, __u8 bis)
 {
 	struct hci_conn_hash *h = &hdev->conn_hash;
 	struct hci_conn *c;
@@ -1107,7 +1106,7 @@ static inline struct hci_conn *hci_conn_hash_lookup_bis(struct hci_dev *hdev,
 		if (bacmp(&c->dst, ba) || c->type != ISO_LINK)
 			continue;

-		if (c->iso_qos.bcast.big == big && c->iso_qos.bcast.bis == bis) {
+		if (c->iso_qos.bcast.bis == bis) {
 			rcu_read_unlock();
 			return c;
 		}
--
cgit v1.2.3

From 9e14606d8f38ea52a38c27692a9c1513c987a5da Mon Sep 17 00:00:00 2001
From: Hilda Wu
Date: Wed, 21 Jun 2023 18:00:31 +0800
Subject: Bluetooth: msft: Extended monitor tracking by address filter

Since the number of devices that can be tracked per condition is limited,
this feature adds support for tracking multiple devices concurrently.
When a pattern monitor detects a device, this feature issues an address
monitor to track that device, so that the pattern monitor can keep
monitoring for new devices.

This feature adds an address filter when it receives an LE monitor device
event whose monitor handle belongs to a pattern and the controller has
started monitoring the device. It also cancels the monitor advertisement
from address filters when it receives an LE monitor device event
indicating that the controller has stopped monitoring the device
specified by an address and monitor handle.

Below is an example showing the feature adding an address filter.

//Add MSFT pattern monitor
< HCI Command: Vendor (0x3f|0x00f0) plen 14  #142 [hci0] 55.552420
        03 b8 a4 03 ff 01 01 06 09 05 5f 52 45 46  .........._REF
> HCI Event: Command Complete (0x0e) plen 6  #143 [hci0] 55.653960
      Vendor (0x3f|0x00f0) ncmd 2
        Status: Success (0x00)
        03 00
//Got event from the pattern monitor
> HCI Event: Vendor (0xff) plen 18  #148 [hci0] 58.384953
        23 79 54 33 77 88 97 68 02 00 fb c1 29 eb 27 b8  #yT3w..h....).'.
        00 01  ..
//Add MSFT address monitor (Sample address: B8:27:EB:29:C1:FB) < HCI Command: Vendor (0x3f|0x00f0) plen 13 #149 [hci0] 58.385067 03 b8 a4 03 ff 04 00 fb c1 29 eb 27 b8 .........).'. //Report to userspace about found device (ADV Monitor Device Found) @ MGMT Event: Unknown (0x002f) plen 38 {0x0003} [hci0] 58.680042 01 00 fb c1 29 eb 27 b8 01 ce 00 00 00 00 16 00 ....).'......... 0a 09 4b 45 59 42 44 5f 52 45 46 02 01 06 03 19 ..KEYBD_REF..... c1 03 03 03 12 18 ...... //Got event from address monitor > HCI Event: Vendor (0xff) plen 18 #152 [hci0] 58.672956 23 79 54 33 77 88 97 68 02 00 fb c1 29 eb 27 b8 #yT3w..h....).'. 01 01 Signed-off-by: Alex Lu Signed-off-by: Hilda Wu Reviewed-by: Simon Horman Signed-off-by: Luiz Augusto von Dentz --- include/net/bluetooth/hci.h | 10 ++++++++++ 1 file changed, 10 insertions(+) (limited to 'include') diff --git a/include/net/bluetooth/hci.h b/include/net/bluetooth/hci.h index ab2f8f1817cf..5723405b833e 100644 --- a/include/net/bluetooth/hci.h +++ b/include/net/bluetooth/hci.h @@ -309,6 +309,16 @@ enum { * to support it. */ HCI_QUIRK_BROKEN_SET_RPA_TIMEOUT, + + /* When this quirk is set, MSFT extension monitor tracking by + * address filter is supported. Since tracking quantity of each + * pattern is limited, this feature supports tracking multiple + * devices concurrently if controller supports multiple + * address filters. + * + * This quirk must be set before hci_register_dev is called. 
+	 */
+	HCI_QUIRK_USE_MSFT_EXT_ADDRESS_FILTER,
 };

 /* HCI device flags */
--
cgit v1.2.3

From a13f316e90fdb1fb6df6582e845aa9b3270f3581 Mon Sep 17 00:00:00 2001
From: Luiz Augusto von Dentz
Date: Mon, 26 Jun 2023 17:25:06 -0700
Subject: Bluetooth: hci_conn: Consolidate code for aborting connections

This consolidates code for aborting connections using hci_cmd_sync_queue
so it is synchronized with other threads. Because some commands may block
the cmd_sync_queue while waiting for specific events, this attempts to
cancel those requests by using hci_cmd_sync_cancel.

Signed-off-by: Luiz Augusto von Dentz
---
 include/net/bluetooth/hci_core.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

(limited to 'include')

diff --git a/include/net/bluetooth/hci_core.h b/include/net/bluetooth/hci_core.h
index 9140d4a80e38..2dd59e3a51b3 100644
--- a/include/net/bluetooth/hci_core.h
+++ b/include/net/bluetooth/hci_core.h
@@ -739,6 +739,7 @@ struct hci_conn {
 	unsigned long flags;

 	enum conn_reasons conn_reason;
+	__u8 abort_reason;

 	__u32 clock;
 	__u16 clock_accuracy;
@@ -758,7 +759,6 @@ struct hci_conn {
 	struct delayed_work auto_accept_work;
 	struct delayed_work idle_work;
 	struct delayed_work le_conn_timeout;
-	struct work_struct le_scan_cleanup;

 	struct device dev;
 	struct dentry *debugfs;
--
cgit v1.2.3

From 9f78191cc9f1b34c2e2afd7b554a83bf034092dd Mon Sep 17 00:00:00 2001
From: Luiz Augusto von Dentz
Date: Wed, 28 Jun 2023 12:15:53 -0700
Subject: Bluetooth: hci_conn: Always allocate unique handles

This attempts to always allocate a unique handle for connections so they
can be properly aborted by the likes of hci_abort_conn. To do so, it uses
the invalid range as a pool of unset handles; that way, if userspace is
trying to create multiple connections at once, each will be given a
unique handle which will be considered unset.
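
As an illustration only (not part of this patch), the idea of drawing
placeholder handles from the invalid range can be sketched in plain C.
HCI_CONN_HANDLE_MAX and the HCI_CONN_HANDLE_UNSET() test mirror the
definitions in the diff that follows, except that the macro argument is
parenthesized here. demo_alloc_unset_handle() and its in_use array are
hypothetical; they only stand in for whatever bookkeeping the kernel
really does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define HCI_CONN_HANDLE_MAX 0x0eff
/* Any value above the valid range is treated as "unset". */
#define HCI_CONN_HANDLE_UNSET(_handle) ((_handle) > HCI_CONN_HANDLE_MAX)

/* Hypothetical helper: pick the lowest placeholder handle above
 * HCI_CONN_HANDLE_MAX that no existing connection is using.
 * Returns false if the whole invalid range is taken.
 */
static bool demo_alloc_unset_handle(const uint16_t *in_use, size_t n,
				    uint16_t *out)
{
	for (uint32_t h = HCI_CONN_HANDLE_MAX + 1; h <= 0xffff; h++) {
		bool taken = false;

		for (size_t i = 0; i < n; i++) {
			if (in_use[i] == h) {
				taken = true;
				break;
			}
		}

		if (!taken) {
			*out = (uint16_t)h;
			return true;
		}
	}

	return false;
}

/* Usage: two pending connections already hold placeholders, one
 * connection has a real handle (0x002a).
 */
static void demo_usage(void)
{
	const uint16_t in_use[] = { 0x0f00, 0x0f01, 0x002a };
	uint16_t handle = 0;

	assert(demo_alloc_unset_handle(in_use, 3, &handle));
	assert(HCI_CONN_HANDLE_UNSET(handle));	/* still counts as unset */
	assert(!HCI_CONN_HANDLE_UNSET(0x002a));	/* real handles do not */
}
```

Each pending connection gets a distinct value, yet every one of them
still tests as unset, which is the property the commit message describes.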
Signed-off-by: Luiz Augusto von Dentz
---
 include/net/bluetooth/hci_core.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

(limited to 'include')

diff --git a/include/net/bluetooth/hci_core.h b/include/net/bluetooth/hci_core.h
index 2dd59e3a51b3..491ab83ccafc 100644
--- a/include/net/bluetooth/hci_core.h
+++ b/include/net/bluetooth/hci_core.h
@@ -321,8 +321,8 @@ struct adv_monitor {

 #define HCI_MAX_SHORT_NAME_LENGTH 10

-#define HCI_CONN_HANDLE_UNSET 0xffff
 #define HCI_CONN_HANDLE_MAX 0x0eff
+#define HCI_CONN_HANDLE_UNSET(_handle) (_handle > HCI_CONN_HANDLE_MAX)

 /* Min encryption key size to match with SMP */
 #define HCI_MIN_ENC_KEY_SIZE 7
--
cgit v1.2.3

From f777d88278170410b06a1f6633f3b9375a4ddd6b Mon Sep 17 00:00:00 2001
From: Iulia Tanasescu
Date: Mon, 3 Jul 2023 10:02:38 +0300
Subject: Bluetooth: ISO: Notify user space about failed bis connections

Some use cases require the user to be informed if BIG synchronization
fails. This commit makes it so that even if the BIG sync established
event arrives with error status, a new hconn is added for each BIS, and
the iso layer is notified about the failed connections.

Unsuccessful bis connections will be marked using the
HCI_CONN_BIG_SYNC_FAILED flag. From the iso layer, the POLLERR event is
triggered on the newly allocated bis sockets, before adding them to the
accept list of the parent socket.

From user space, a new fd for each failed bis connection will be obtained
by calling accept. The user should check for the POLLERR event on the new
socket, to determine if the connection was successful or not.

The HCI_CONN_BIG_SYNC flag has been added to mark whether the BIG sync
has been successfully established. This flag is checked at bis cleanup,
so the HCI LE BIG Terminate Sync command is only issued if needed.

The BT_SK_BIG_SYNC flag indicates if BIG create sync has been called for
a listening socket, to avoid issuing the command every time a BIGInfo
advertising report is received.
Signed-off-by: Iulia Tanasescu
Signed-off-by: Luiz Augusto von Dentz
---
 include/net/bluetooth/hci_core.h | 25 +++++++++++++++++++++++++
 1 file changed, 25 insertions(+)

(limited to 'include')

diff --git a/include/net/bluetooth/hci_core.h b/include/net/bluetooth/hci_core.h
index 491ab83ccafc..105c1c394f82 100644
--- a/include/net/bluetooth/hci_core.h
+++ b/include/net/bluetooth/hci_core.h
@@ -976,6 +976,8 @@ enum {
 	HCI_CONN_PER_ADV,
 	HCI_CONN_BIG_CREATED,
 	HCI_CONN_CREATE_CIS,
+	HCI_CONN_BIG_SYNC,
+	HCI_CONN_BIG_SYNC_FAILED,
 };

 static inline bool hci_conn_ssp_enabled(struct hci_conn *conn)
@@ -1286,6 +1288,29 @@ static inline struct hci_conn *hci_conn_hash_lookup_big(struct hci_dev *hdev,
 	return NULL;
 }

+static inline struct hci_conn *hci_conn_hash_lookup_big_any_dst(struct hci_dev *hdev,
+							__u8 handle)
+{
+	struct hci_conn_hash *h = &hdev->conn_hash;
+	struct hci_conn *c;
+
+	rcu_read_lock();
+
+	list_for_each_entry_rcu(c, &h->list, list) {
+		if (c->type != ISO_LINK)
+			continue;
+
+		if (handle == c->iso_qos.bcast.big) {
+			rcu_read_unlock();
+			return c;
+		}
+	}
+
+	rcu_read_unlock();
+
+	return NULL;
+}
+
 static inline struct hci_conn *hci_conn_hash_lookup_state(struct hci_dev *hdev,
 							__u8 type, __u16 state)
 {
--
cgit v1.2.3

From 112b5090c21905531314fee41f691f0317bbf4f6 Mon Sep 17 00:00:00 2001
From: Luiz Augusto von Dentz
Date: Thu, 6 Jul 2023 12:06:32 -0700
Subject: Bluetooth: MGMT: Fix always using HCI_MAX_AD_LENGTH

HCI_MAX_AD_LENGTH shall only be used if the controller doesn't support
extended advertising, otherwise HCI_MAX_EXT_AD_LENGTH shall be used
instead.
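
As a rough, standalone sketch of that rule (not the kernel code itself):
the ext_adv_capable() and max_adv_len() macros below are copied from the
diff that follows, while struct demo_dev, demo_adv_data_fits() and the
three numeric constants are assumptions made only so the example
compiles on its own. The kernel headers remain the authority for the
real values.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Values assumed for illustration only. */
#define HCI_LE_EXT_ADV		0x10
#define HCI_MAX_AD_LENGTH	31
#define HCI_MAX_EXT_AD_LENGTH	251

/* Hypothetical stand-in for the relevant part of struct hci_dev. */
struct demo_dev {
	uint8_t le_features[8];
};

/* Extended advertising support */
#define ext_adv_capable(dev) (((dev)->le_features[1] & HCI_LE_EXT_ADV))

/* Maximum advertising length */
#define max_adv_len(dev) \
	(ext_adv_capable(dev) ? HCI_MAX_EXT_AD_LENGTH : HCI_MAX_AD_LENGTH)

/* Hypothetical length check a caller might perform before accepting
 * advertising data from userspace.
 */
static bool demo_adv_data_fits(const struct demo_dev *dev, size_t len)
{
	return len <= (size_t)max_adv_len(dev);
}

/* Usage: the same 32-byte payload is rejected on a legacy-only
 * controller and accepted on one that advertises extended support.
 */
static void demo_usage(void)
{
	struct demo_dev legacy = { .le_features = { 0 } };
	struct demo_dev ext = { .le_features = { 0, HCI_LE_EXT_ADV } };

	assert(!demo_adv_data_fits(&legacy, 32));
	assert(demo_adv_data_fits(&ext, 32));
}
```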
Signed-off-by: Luiz Augusto von Dentz
---
 include/net/bluetooth/hci_core.h | 4 ++++
 1 file changed, 4 insertions(+)

(limited to 'include')

diff --git a/include/net/bluetooth/hci_core.h b/include/net/bluetooth/hci_core.h
index 105c1c394f82..8200a6689b39 100644
--- a/include/net/bluetooth/hci_core.h
+++ b/include/net/bluetooth/hci_core.h
@@ -1801,6 +1801,10 @@ void hci_conn_del_sysfs(struct hci_conn *conn);
 /* Extended advertising support */
 #define ext_adv_capable(dev) (((dev)->le_features[1] & HCI_LE_EXT_ADV))

+/* Maximum advertising length */
+#define max_adv_len(dev) \
+	(ext_adv_capable(dev) ? HCI_MAX_EXT_AD_LENGTH : HCI_MAX_AD_LENGTH)
+
 /* BLUETOOTH CORE SPECIFICATION Version 5.3 | Vol 4, Part E page 1789:
  *
  * C24: Mandatory if the LE Controller supports Connection State and either
--
cgit v1.2.3

From 3f19ffb2f924db5b0925c77818d18ac1f6f08a44 Mon Sep 17 00:00:00 2001
From: Luiz Augusto von Dentz
Date: Thu, 13 Jul 2023 13:41:31 -0700
Subject: Bluetooth: af_bluetooth: Make BT_PKT_STATUS generic

This makes the handling of BT_PKT_STATUS more generic so it can be
reused by sockets other than SCO like BT_DEFER_SETUP, etc.
Signed-off-by: Luiz Augusto von Dentz
---
 include/net/bluetooth/bluetooth.h | 8 +++-----
 include/net/bluetooth/sco.h | 2 --
 2 files changed, 3 insertions(+), 7 deletions(-)

(limited to 'include')

diff --git a/include/net/bluetooth/bluetooth.h b/include/net/bluetooth/bluetooth.h
index 34998ae8ed78..aa90adc3b2a4 100644
--- a/include/net/bluetooth/bluetooth.h
+++ b/include/net/bluetooth/bluetooth.h
@@ -386,6 +386,7 @@ struct bt_sock {
 enum {
 	BT_SK_DEFER_SETUP,
 	BT_SK_SUSPEND,
+	BT_SK_PKT_STATUS
 };

 struct bt_sock_list {
@@ -432,10 +433,6 @@ struct l2cap_ctrl {
 	struct l2cap_chan *chan;
 };

-struct sco_ctrl {
-	u8 pkt_status;
-};
-
 struct hci_dev;

 typedef void (*hci_req_complete_t)(struct hci_dev *hdev, u8 status, u16 opcode);
@@ -466,9 +463,9 @@ struct bt_skb_cb {
 	u8 force_active;
 	u16 expect;
 	u8 incoming:1;
+	u8 pkt_status:2;
 	union {
 		struct l2cap_ctrl l2cap;
-		struct sco_ctrl sco;
 		struct hci_ctrl hci;
 		struct mgmt_ctrl mgmt;
 		struct scm_creds creds;
@@ -477,6 +474,7 @@ struct bt_skb_cb {
 #define bt_cb(skb) ((struct bt_skb_cb *)((skb)->cb))

 #define hci_skb_pkt_type(skb) bt_cb((skb))->pkt_type
+#define hci_skb_pkt_status(skb) bt_cb((skb))->pkt_status
 #define hci_skb_expect(skb) bt_cb((skb))->expect
 #define hci_skb_opcode(skb) bt_cb((skb))->hci.opcode
 #define hci_skb_event(skb) bt_cb((skb))->hci.req_event
diff --git a/include/net/bluetooth/sco.h b/include/net/bluetooth/sco.h
index 1aa2e14b6c94..f40ddb4264fc 100644
--- a/include/net/bluetooth/sco.h
+++ b/include/net/bluetooth/sco.h
@@ -46,6 +46,4 @@ struct sco_conninfo {
 	__u8 dev_class[3];
 };

-#define SCO_CMSG_PKT_STATUS 0x01
-
 #endif /* __SCO_H */
--
cgit v1.2.3

From 16e3b6429159795a87add7584eb100b19aa1d70b Mon Sep 17 00:00:00 2001
From: Luiz Augusto von Dentz
Date: Thu, 3 Aug 2023 14:49:14 -0700
Subject: Bluetooth: hci_conn: Fix modifying handle while aborting

This introduces hci_conn_set_handle which takes care of verifying the
conditions where the hci_conn handle can be modified, including when
hci_conn_abort has been
called, and also checks that the handle is valid.

Signed-off-by: Luiz Augusto von Dentz
---
 include/net/bluetooth/hci_core.h | 1 +
 1 file changed, 1 insertion(+)

(limited to 'include')

diff --git a/include/net/bluetooth/hci_core.h b/include/net/bluetooth/hci_core.h
index 8200a6689b39..d2a3a2a9fd7d 100644
--- a/include/net/bluetooth/hci_core.h
+++ b/include/net/bluetooth/hci_core.h
@@ -1425,6 +1425,7 @@ int hci_conn_switch_role(struct hci_conn *conn, __u8 role);
 void hci_conn_enter_active_mode(struct hci_conn *conn, __u8 force_active);

 void hci_conn_failed(struct hci_conn *conn, u8 status);
+u8 hci_conn_set_handle(struct hci_conn *conn, u16 handle);

 /*
  * hci_conn_get() and hci_conn_put() are used to control the life-time of an
--
cgit v1.2.3

From f88670161eb205f842989df555d0dd2f9fe2d4b5 Mon Sep 17 00:00:00 2001
From: Luiz Augusto von Dentz
Date: Fri, 4 Aug 2023 11:03:43 -0700
Subject: Bluetooth: hci_core: Make hci_is_le_conn_scanning public

This moves hci_is_le_conn_scanning to hci_core.h so it can be used by
different files without having to duplicate its code.
Signed-off-by: Luiz Augusto von Dentz --- include/net/bluetooth/hci_core.h | 21 +++++++++++++++++++++ 1 file changed, 21 insertions(+) (limited to 'include') diff --git a/include/net/bluetooth/hci_core.h b/include/net/bluetooth/hci_core.h index d2a3a2a9fd7d..f4462c325e2a 100644 --- a/include/net/bluetooth/hci_core.h +++ b/include/net/bluetooth/hci_core.h @@ -1372,6 +1372,27 @@ static inline struct hci_conn *hci_lookup_le_connect(struct hci_dev *hdev) return NULL; } +/* Returns true if an le connection is in the scanning state */ +static inline bool hci_is_le_conn_scanning(struct hci_dev *hdev) +{ + struct hci_conn_hash *h = &hdev->conn_hash; + struct hci_conn *c; + + rcu_read_lock(); + + list_for_each_entry_rcu(c, &h->list, list) { + if (c->type == LE_LINK && c->state == BT_CONNECT && + test_bit(HCI_CONN_SCANNING, &c->flags)) { + rcu_read_unlock(); + return true; + } + } + + rcu_read_unlock(); + + return false; +} + int hci_disconnect(struct hci_conn *conn, __u8 reason); bool hci_setup_sync(struct hci_conn *conn, __u16 handle); void hci_sco_setup(struct hci_conn *conn, __u8 status); -- cgit v1.2.3 From a1f6c3aef13c9e7f8d459bd464e9e34da1342c0c Mon Sep 17 00:00:00 2001 From: Luiz Augusto von Dentz Date: Fri, 4 Aug 2023 16:23:41 -0700 Subject: Bluetooth: hci_sync: Introduce PTR_UINT/UINT_PTR macros This introduces PTR_UINT/UINT_PTR macros and replace the use of PTR_ERR/ERR_PTR. 
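As a rough illustration of what the new macros do, here is a standalone userspace sketch (not kernel code) that packs a connection handle into a `void *` callback argument and recovers it on the other side. The macro bodies mirror the ones this patch adds to hci_sync.h, with extra parentheses around the arguments added for this sketch; `handle_from_data()` is a hypothetical consumer that exists only for the example.

```c
#include <stdint.h>

/*
 * Same shape as the macros this patch adds to hci_sync.h. The extra
 * parentheses around the arguments are an addition for this sketch.
 */
#define UINT_PTR(_handle)	((void *)((uintptr_t)(_handle)))
#define PTR_UINT(_ptr)		((uintptr_t)((void *)(_ptr)))

/*
 * Hypothetical consumer: a callback that receives an opaque 'data'
 * pointer which actually carries a 16-bit connection handle rather
 * than the address of an object.
 */
static uint16_t handle_from_data(void *data)
{
	return (uint16_t)PTR_UINT(data);
}
```

A caller would pass `UINT_PTR(handle)` where a `void *data` argument is expected. Unlike ERR_PTR/PTR_ERR, which are meant for error codes, nothing here suggests that the carried value is an errno.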
Signed-off-by: Luiz Augusto von Dentz --- include/net/bluetooth/hci_sync.h | 3 +++ 1 file changed, 3 insertions(+) (limited to 'include') diff --git a/include/net/bluetooth/hci_sync.h b/include/net/bluetooth/hci_sync.h index b516a0f4a55b..57eeb07aeb25 100644 --- a/include/net/bluetooth/hci_sync.h +++ b/include/net/bluetooth/hci_sync.h @@ -5,6 +5,9 @@ * Copyright (C) 2021 Intel Corporation */ +#define UINT_PTR(_handle) ((void *)((uintptr_t)_handle)) +#define PTR_UINT(_ptr) ((uintptr_t)((void *)_ptr)) + typedef int (*hci_cmd_sync_work_func_t)(struct hci_dev *hdev, void *data); typedef void (*hci_cmd_sync_work_destroy_t)(struct hci_dev *hdev, void *data, int err); -- cgit v1.2.3 From b5793de3cfaefef34a1fc9305c9fe3dbcd0ac792 Mon Sep 17 00:00:00 2001 From: Pauli Virtanen Date: Sat, 5 Aug 2023 19:08:42 +0300 Subject: Bluetooth: hci_conn: avoid checking uninitialized CIG/CIS ids The CIS/CIG ids of ISO connections are defined only when the connection is unicast. Fix the lookup functions to check for unicast first. Ensure CIG/CIS IDs have valid value also in state BT_OPEN. 
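The unicast test used by this fix can be sketched in isolation. The following userspace model assumes, as the diff in this patch implies, that a non-unicast ISO connection carries the all-zero address BDADDR_ANY as its destination. The struct layout, the `_DEMO` link type values and `is_unicast_iso()` are simplified stand-ins for illustration, not the kernel's definitions.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Simplified stand-in for the kernel's 6-byte Bluetooth address type. */
typedef struct { uint8_t b[6]; } bdaddr_t;

/* Placeholder link type values; the real constants live in the kernel. */
enum { ACL_LINK_DEMO = 1, ISO_LINK_DEMO = 2 };

/* Only the two fields the lookup filter inspects. */
struct conn_demo {
	int type;
	bdaddr_t dst;
};

static const bdaddr_t bdaddr_any_demo = { { 0, 0, 0, 0, 0, 0 } };

/* Like the kernel's bacmp(): returns 0 when both addresses are equal. */
static int bacmp_demo(const bdaddr_t *a, const bdaddr_t *b)
{
	return memcmp(a, b, sizeof(*a));
}

/*
 * True only for ISO links with a concrete peer address. Only for such
 * connections is it meaningful to compare the CIG/CIS ids afterwards.
 */
static bool is_unicast_iso(const struct conn_demo *c)
{
	return c->type == ISO_LINK_DEMO &&
	       bacmp_demo(&c->dst, &bdaddr_any_demo) != 0;
}
```

In the patched lookup loops, the equivalent of `!is_unicast_iso(c)` is the condition under which an entry is skipped with `continue`.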
Signed-off-by: Pauli Virtanen Signed-off-by: Luiz Augusto von Dentz --- include/net/bluetooth/hci_core.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'include') diff --git a/include/net/bluetooth/hci_core.h b/include/net/bluetooth/hci_core.h index f4462c325e2a..c53d74236e3a 100644 --- a/include/net/bluetooth/hci_core.h +++ b/include/net/bluetooth/hci_core.h @@ -1219,7 +1219,7 @@ static inline struct hci_conn *hci_conn_hash_lookup_cis(struct hci_dev *hdev, rcu_read_lock(); list_for_each_entry_rcu(c, &h->list, list) { - if (c->type != ISO_LINK) + if (c->type != ISO_LINK || !bacmp(&c->dst, BDADDR_ANY)) continue; /* Match CIG ID if set */ @@ -1251,7 +1251,7 @@ static inline struct hci_conn *hci_conn_hash_lookup_cig(struct hci_dev *hdev, rcu_read_lock(); list_for_each_entry_rcu(c, &h->list, list) { - if (c->type != ISO_LINK) + if (c->type != ISO_LINK || !bacmp(&c->dst, BDADDR_ANY)) continue; if (handle == c->iso_qos.ucast.cig) { -- cgit v1.2.3 From 01b8539655635288dcd46366806abfacbb9b1f6c Mon Sep 17 00:00:00 2001 From: Yue Haibing Date: Wed, 9 Aug 2023 22:05:56 +0800 Subject: bpf: Remove unused declaration bpf_link_new_file() Commit a3b80e107894 ("bpf: Allocate ID for bpf_link") removed the implementation but not the declaration. 
Signed-off-by: Yue Haibing Link: https://lore.kernel.org/r/20230809140556.45836-1-yuehaibing@huawei.com Signed-off-by: Martin KaFai Lau --- include/linux/bpf.h | 1 - 1 file changed, 1 deletion(-) (limited to 'include') diff --git a/include/linux/bpf.h b/include/linux/bpf.h index db3fe5a61b05..cfabbcf47bdb 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -2120,7 +2120,6 @@ void bpf_link_cleanup(struct bpf_link_primer *primer); void bpf_link_inc(struct bpf_link *link); void bpf_link_put(struct bpf_link *link); int bpf_link_new_fd(struct bpf_link *link); -struct file *bpf_link_new_file(struct bpf_link *link, int *reserved_fd); struct bpf_link *bpf_link_get_from_fd(u32 ufd); struct bpf_link *bpf_link_get_curr_or_next(u32 *id); -- cgit v1.2.3 From e2142825c120d4317abf7160a0fc34b3de532586 Mon Sep 17 00:00:00 2001 From: Menglong Dong Date: Fri, 11 Aug 2023 10:55:27 +0800 Subject: net: tcp: send zero-window ACK when no memory Currently, an skb is dropped when there is no memory, which makes the client keep retransmitting until it times out; this is unfriendly to users. In this patch, we reply with a zero-window ACK in this case to update the snd_wnd of the sender to 0. The sender then won't time out the connection and will probe the zero window with its retransmits. Signed-off-by: Menglong Dong Reviewed-by: Eric Dumazet Signed-off-by: David S. 
Miller --- include/net/inet_connection_sock.h | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) (limited to 'include') diff --git a/include/net/inet_connection_sock.h b/include/net/inet_connection_sock.h index c2b15f7e5516..be3c858a2ebb 100644 --- a/include/net/inet_connection_sock.h +++ b/include/net/inet_connection_sock.h @@ -164,7 +164,8 @@ enum inet_csk_ack_state_t { ICSK_ACK_TIMER = 2, ICSK_ACK_PUSHED = 4, ICSK_ACK_PUSHED2 = 8, - ICSK_ACK_NOW = 16 /* Send the next ACK immediately (once) */ + ICSK_ACK_NOW = 16, /* Send the next ACK immediately (once) */ + ICSK_ACK_NOMEM = 32, }; void inet_csk_init_xmit_timers(struct sock *sk, -- cgit v1.2.3 From f614a29d6ca6962139b0eb36b985e3dda80258a6 Mon Sep 17 00:00:00 2001 From: Jörn-Thorben Hinz Date: Fri, 11 Aug 2023 19:33:57 +0200 Subject: net: Remove leftover include from nftables.h MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Commit db3685b4046f ("net: remove obsolete members from struct net") removed the uses of struct list_head from this header, without removing the corresponding included header. Signed-off-by: Jörn-Thorben Hinz Reviewed-by: Simon Horman Signed-off-by: David S. Miller --- include/net/netns/nftables.h | 2 -- 1 file changed, 2 deletions(-) (limited to 'include') diff --git a/include/net/netns/nftables.h b/include/net/netns/nftables.h index 8c77832d0240..cc8060c017d5 100644 --- a/include/net/netns/nftables.h +++ b/include/net/netns/nftables.h @@ -2,8 +2,6 @@ #ifndef _NETNS_NFTABLES_H_ #define _NETNS_NFTABLES_H_ -#include <linux/list.h> - struct netns_nftables { u8 gencursor; }; -- cgit v1.2.3 From e6d360ff87f005e5b28edc26cb43718244ae7e73 Mon Sep 17 00:00:00 2001 From: Paolo Abeni Date: Fri, 11 Aug 2023 17:57:17 +0200 Subject: net: factor out inet{,6}_bind_sk helpers The mptcp protocol maintains an additional socket just to easily invoke a few stream operations on the first subflow. One of them is bind(). 
Factor out the helpers operating directly on the struct sock, to allow getting rid of the above dependency in the next patch without duplicating the existing code. No functional changes intended. Signed-off-by: Paolo Abeni Acked-by: Mat Martineau Signed-off-by: Matthieu Baerts Signed-off-by: David S. Miller --- include/net/inet_common.h | 1 + include/net/ipv6.h | 1 + 2 files changed, 2 insertions(+) (limited to 'include') diff --git a/include/net/inet_common.h b/include/net/inet_common.h index b86b8e21de7f..8e97de700991 100644 --- a/include/net/inet_common.h +++ b/include/net/inet_common.h @@ -42,6 +42,7 @@ int inet_shutdown(struct socket *sock, int how); int inet_listen(struct socket *sock, int backlog); void inet_sock_destruct(struct sock *sk); int inet_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len); +int inet_bind_sk(struct sock *sk, struct sockaddr *uaddr, int addr_len); /* Don't allocate port at this moment, defer to connect. */ #define BIND_FORCE_ADDRESS_NO_PORT (1 << 0) /* Grab and release socket lock. */ diff --git a/include/net/ipv6.h b/include/net/ipv6.h index 2acc4c808d45..22643ffc2df8 100644 --- a/include/net/ipv6.h +++ b/include/net/ipv6.h @@ -1216,6 +1216,7 @@ void inet6_cleanup_sock(struct sock *sk); void inet6_sock_destruct(struct sock *sk); int inet6_release(struct socket *sock); int inet6_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len); +int inet6_bind_sk(struct sock *sk, struct sockaddr *uaddr, int addr_len); int inet6_getname(struct socket *sock, struct sockaddr *uaddr, int peer); int inet6_ioctl(struct socket *sock, unsigned int cmd, unsigned long arg); -- cgit v1.2.3 From 71a9a874cd6beb074896519f762741fbc448f5be Mon Sep 17 00:00:00 2001 From: Paolo Abeni Date: Fri, 11 Aug 2023 17:57:19 +0200 Subject: net: factor out __inet_listen_sk() helper The mptcp protocol maintains an additional socket just to easily invoke a few stream operations on the first subflow. One of them is inet_listen(). 
Factor out a helper operating directly on the (locked) struct sock, to allow getting rid of the above dependency in the next patch without duplicating the existing code. No functional changes intended. Signed-off-by: Paolo Abeni Acked-by: Mat Martineau Signed-off-by: Matthieu Baerts Signed-off-by: David S. Miller --- include/net/inet_common.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include') diff --git a/include/net/inet_common.h b/include/net/inet_common.h index 8e97de700991..f50a644d87a9 100644 --- a/include/net/inet_common.h +++ b/include/net/inet_common.h @@ -40,6 +40,7 @@ int inet_recvmsg(struct socket *sock, struct msghdr *msg, size_t size, int flags); int inet_shutdown(struct socket *sock, int how); int inet_listen(struct socket *sock, int backlog); +int __inet_listen_sk(struct sock *sk, int backlog); void inet_sock_destruct(struct sock *sk); int inet_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len); int inet_bind_sk(struct sock *sk, struct sockaddr *uaddr, int addr_len); -- cgit v1.2.3 From 9d802da40b7c820deb9c60fc394457ea565cafc8 Mon Sep 17 00:00:00 2001 From: Adrian Moreno Date: Fri, 11 Aug 2023 16:12:48 +0200 Subject: net: openvswitch: add last-action drop reason Create a new drop reason subsystem for openvswitch and add the first drop reason to represent last-action drops. Last-action drops happen when a flow has an empty action list or there is no action that consumes the packet (output, userspace, recirc, etc). It is the most common way in which OVS drops packets. Implementation-wise, most of these skb-consuming actions already call "consume_skb" internally and return directly from within the do_execute_actions() loop so with minimal changes we can assume that any skb that exits the loop normally is a packet drop. Signed-off-by: Adrian Moreno Signed-off-by: David S. 
Miller --- include/net/dropreason.h | 6 ++++++ 1 file changed, 6 insertions(+) (limited to 'include') diff --git a/include/net/dropreason.h b/include/net/dropreason.h index 685fb37df8e8..56cb7be92244 100644 --- a/include/net/dropreason.h +++ b/include/net/dropreason.h @@ -23,6 +23,12 @@ enum skb_drop_reason_subsys { */ SKB_DROP_REASON_SUBSYS_MAC80211_MONITOR, + /** + * @SKB_DROP_REASON_SUBSYS_OPENVSWITCH: openvswitch drop reasons, + * see net/openvswitch/drop.h + */ + SKB_DROP_REASON_SUBSYS_OPENVSWITCH, + /** @SKB_DROP_REASON_SUBSYS_NUM: number of subsystems defined */ SKB_DROP_REASON_SUBSYS_NUM }; -- cgit v1.2.3 From e7bc7db9ba463e763ac6113279cade19da9cb939 Mon Sep 17 00:00:00 2001 From: Eric Garver Date: Fri, 11 Aug 2023 16:12:50 +0200 Subject: net: openvswitch: add explicit drop action From: Eric Garver This adds an explicit drop action. This is used by OVS to drop packets for which it cannot determine what to do. An explicit action in the kernel allows passing the reason _why_ the packet is being dropped or zero to indicate no particular error happened (i.e: OVS intentionally dropped the packet). Since the error codes coming from userspace mean nothing for the kernel, we squash all of them into only two drop reasons: - OVS_DROP_EXPLICIT_WITH_ERROR to indicate a non-zero value was passed - OVS_DROP_EXPLICIT to indicate a zero value was passed (no error) e.g. trace all OVS dropped skbs # perf trace -e skb:kfree_skb --filter="reason >= 0x30000" [..] 106.023 ping/2465 skb:kfree_skb(skbaddr: 0xffffa0e8765f2000, \ location:0xffffffffc0d9b462, protocol: 2048, reason: 196611) reason: 196611 --> 0x30003 (OVS_DROP_EXPLICIT) Also, this patch allows ovs-dpctl.py to add explicit drop actions as: "drop" -> implicit empty-action drop "drop(0)" -> explicit non-error action drop "drop(42)" -> explicit error action drop Signed-off-by: Eric Garver Co-developed-by: Adrian Moreno Signed-off-by: Adrian Moreno Signed-off-by: David S. 
Miller --- include/uapi/linux/openvswitch.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include') diff --git a/include/uapi/linux/openvswitch.h b/include/uapi/linux/openvswitch.h index e94870e77ee9..efc82c318fa2 100644 --- a/include/uapi/linux/openvswitch.h +++ b/include/uapi/linux/openvswitch.h @@ -965,6 +965,7 @@ struct check_pkt_len_arg { * start of the packet or at the start of the l3 header depending on the value * of l3 tunnel flag in the tun_flags field of OVS_ACTION_ATTR_ADD_MPLS * argument. + * @OVS_ACTION_ATTR_DROP: Explicit drop action. * * Only a single header can be set with a single %OVS_ACTION_ATTR_SET. Not all * fields within a header are modifiable, e.g. the IPv4 protocol and fragment @@ -1002,6 +1003,7 @@ enum ovs_action_attr { OVS_ACTION_ATTR_CHECK_PKT_LEN, /* Nested OVS_CHECK_PKT_LEN_ATTR_*. */ OVS_ACTION_ATTR_ADD_MPLS, /* struct ovs_action_add_mpls. */ OVS_ACTION_ATTR_DEC_TTL, /* Nested OVS_DEC_TTL_ATTR_*. */ + OVS_ACTION_ATTR_DROP, /* u32 error code. */ __OVS_ACTION_ATTR_MAX, /* Nothing past this will be accepted * from userspace. */ -- cgit v1.2.3 From 83b5f0253b1ef352f4333c4fb2d24eff23045f6b Mon Sep 17 00:00:00 2001 From: Gabor Juhos Date: Fri, 11 Aug 2023 13:10:07 +0200 Subject: net: phy: Introduce PSGMII PHY interface mode The PSGMII interface is similar to QSGMII. The main difference is that the PSGMII interface combines five SGMII lines into a single link while in QSGMII only four lines are combined. Similarly to the QSGMII, this interface mode might also needs special handling within the MAC driver. It is commonly used by Qualcomm with their QCA807x PHY series and modern WiSoC-s. Add definitions for the PHY layer to allow to express this type of connection between the MAC and PHY. Signed-off-by: Gabor Juhos Signed-off-by: Robert Marko Signed-off-by: David S. 
Miller --- include/linux/phy.h | 4 ++++ 1 file changed, 4 insertions(+) (limited to 'include') diff --git a/include/linux/phy.h b/include/linux/phy.h index 3c1ceedd1b77..1351b802ffcf 100644 --- a/include/linux/phy.h +++ b/include/linux/phy.h @@ -110,6 +110,7 @@ extern const int phy_10gbit_features_array[1]; * @PHY_INTERFACE_MODE_XGMII: 10 gigabit media-independent interface * @PHY_INTERFACE_MODE_XLGMII:40 gigabit media-independent interface * @PHY_INTERFACE_MODE_MOCA: Multimedia over Coax + * @PHY_INTERFACE_MODE_PSGMII: Penta SGMII * @PHY_INTERFACE_MODE_QSGMII: Quad SGMII * @PHY_INTERFACE_MODE_TRGMII: Turbo RGMII * @PHY_INTERFACE_MODE_100BASEX: 100 BaseX @@ -147,6 +148,7 @@ typedef enum { PHY_INTERFACE_MODE_XGMII, PHY_INTERFACE_MODE_XLGMII, PHY_INTERFACE_MODE_MOCA, + PHY_INTERFACE_MODE_PSGMII, PHY_INTERFACE_MODE_QSGMII, PHY_INTERFACE_MODE_TRGMII, PHY_INTERFACE_MODE_100BASEX, @@ -254,6 +256,8 @@ static inline const char *phy_modes(phy_interface_t interface) return "xlgmii"; case PHY_INTERFACE_MODE_MOCA: return "moca"; + case PHY_INTERFACE_MODE_PSGMII: + return "psgmii"; case PHY_INTERFACE_MODE_QSGMII: return "qsgmii"; case PHY_INTERFACE_MODE_TRGMII: -- cgit v1.2.3 From a9f168e4c6e1f623dcf2640b9d76a4f61b9731e5 Mon Sep 17 00:00:00 2001 From: Moshe Shemesh Date: Wed, 31 May 2023 13:50:21 +0300 Subject: net/mlx5: Check with FW that sync reset completed successfully Even if the PF driver had no error on his part of the sync reset flow, the firmware can see wider picture as it syncs all the PFs in the flow. So add at end of sync reset flow check with firmware by reading MFRL register and initialization segment that the flow had no issue from firmware point of view too. 
Signed-off-by: Moshe Shemesh Reviewed-by: Shay Drory Signed-off-by: Saeed Mahameed --- include/linux/mlx5/mlx5_ifc.h | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) (limited to 'include') diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h index 87fd6f9ed82c..9aed7e9b9f29 100644 --- a/include/linux/mlx5/mlx5_ifc.h +++ b/include/linux/mlx5/mlx5_ifc.h @@ -10858,8 +10858,9 @@ enum { MLX5_MFRL_REG_RESET_STATE_IDLE = 0, MLX5_MFRL_REG_RESET_STATE_IN_NEGOTIATION = 1, MLX5_MFRL_REG_RESET_STATE_RESET_IN_PROGRESS = 2, - MLX5_MFRL_REG_RESET_STATE_TIMEOUT = 3, + MLX5_MFRL_REG_RESET_STATE_NEG_TIMEOUT = 3, MLX5_MFRL_REG_RESET_STATE_NACK = 4, + MLX5_MFRL_REG_RESET_STATE_UNLOAD_TIMEOUT = 5, }; enum { -- cgit v1.2.3 From 0b4eb603d635ca47c1c372f69b4b96672e4c2c05 Mon Sep 17 00:00:00 2001 From: Shay Drory Date: Sun, 2 Jan 2022 14:57:39 +0200 Subject: net/mlx5: Remove unused CAPs mlx5 driver queries the device for VECTOR_CALC and SHAMPO caps, but there isn't any user who requires them. As well as, MLX5_MCAM_REGS_0x9080_0x90FF is queried but not used. Thus, drop all usages and definitions of the mentioned caps above. 
Signed-off-by: Shay Drory Reviewed-by: Maher Sanalla Signed-off-by: Saeed Mahameed --- include/linux/mlx5/device.h | 17 +---------------- include/linux/mlx5/mlx5_ifc.h | 43 ------------------------------------------- 2 files changed, 1 insertion(+), 59 deletions(-) (limited to 'include') diff --git a/include/linux/mlx5/device.h b/include/linux/mlx5/device.h index 80cc12a9a531..5041965250f6 100644 --- a/include/linux/mlx5/device.h +++ b/include/linux/mlx5/device.h @@ -1208,9 +1208,7 @@ enum mlx5_cap_type { MLX5_CAP_FLOW_TABLE, MLX5_CAP_ESWITCH_FLOW_TABLE, MLX5_CAP_ESWITCH, - MLX5_CAP_RESERVED, - MLX5_CAP_VECTOR_CALC, - MLX5_CAP_QOS, + MLX5_CAP_QOS = 0xc, MLX5_CAP_DEBUG, MLX5_CAP_RESERVED_14, MLX5_CAP_DEV_MEM, @@ -1220,7 +1218,6 @@ enum mlx5_cap_type { MLX5_CAP_DEV_EVENT = 0x14, MLX5_CAP_IPSEC, MLX5_CAP_CRYPTO = 0x1a, - MLX5_CAP_DEV_SHAMPO = 0x1d, MLX5_CAP_MACSEC = 0x1f, MLX5_CAP_GENERAL_2 = 0x20, MLX5_CAP_PORT_SELECTION = 0x25, @@ -1239,7 +1236,6 @@ enum mlx5_pcam_feature_groups { enum mlx5_mcam_reg_groups { MLX5_MCAM_REGS_FIRST_128 = 0x0, - MLX5_MCAM_REGS_0x9080_0x90FF = 0x1, MLX5_MCAM_REGS_0x9100_0x917F = 0x2, MLX5_MCAM_REGS_NUM = 0x3, }; @@ -1416,10 +1412,6 @@ enum mlx5_qcam_feature_groups { #define MLX5_CAP_ODP_MAX(mdev, cap)\ MLX5_GET(odp_cap, mdev->caps.hca[MLX5_CAP_ODP]->max, cap) -#define MLX5_CAP_VECTOR_CALC(mdev, cap) \ - MLX5_GET(vector_calc_cap, \ - mdev->caps.hca[MLX5_CAP_VECTOR_CALC]->cur, cap) - #define MLX5_CAP_QOS(mdev, cap)\ MLX5_GET(qos_cap, mdev->caps.hca[MLX5_CAP_QOS]->cur, cap) @@ -1436,10 +1428,6 @@ enum mlx5_qcam_feature_groups { MLX5_GET(mcam_reg, (mdev)->caps.mcam[MLX5_MCAM_REGS_FIRST_128], \ mng_access_reg_cap_mask.access_regs.reg) -#define MLX5_CAP_MCAM_REG1(mdev, reg) \ - MLX5_GET(mcam_reg, (mdev)->caps.mcam[MLX5_MCAM_REGS_0x9080_0x90FF], \ - mng_access_reg_cap_mask.access_regs1.reg) - #define MLX5_CAP_MCAM_REG2(mdev, reg) \ MLX5_GET(mcam_reg, (mdev)->caps.mcam[MLX5_MCAM_REGS_0x9100_0x917F], \ mng_access_reg_cap_mask.access_regs2.reg) @@ 
-1485,9 +1473,6 @@ enum mlx5_qcam_feature_groups { #define MLX5_CAP_CRYPTO(mdev, cap)\ MLX5_GET(crypto_cap, (mdev)->caps.hca[MLX5_CAP_CRYPTO]->cur, cap) -#define MLX5_CAP_DEV_SHAMPO(mdev, cap)\ - MLX5_GET(shampo_cap, mdev->caps.hca_cur[MLX5_CAP_DEV_SHAMPO], cap) - #define MLX5_CAP_MACSEC(mdev, cap)\ MLX5_GET(macsec_cap, (mdev)->caps.hca[MLX5_CAP_MACSEC]->cur, cap) diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h index 9aed7e9b9f29..08dcb1f43be7 100644 --- a/include/linux/mlx5/mlx5_ifc.h +++ b/include/linux/mlx5/mlx5_ifc.h @@ -1314,33 +1314,6 @@ struct mlx5_ifc_odp_cap_bits { u8 reserved_at_120[0x6E0]; }; -struct mlx5_ifc_calc_op { - u8 reserved_at_0[0x10]; - u8 reserved_at_10[0x9]; - u8 op_swap_endianness[0x1]; - u8 op_min[0x1]; - u8 op_xor[0x1]; - u8 op_or[0x1]; - u8 op_and[0x1]; - u8 op_max[0x1]; - u8 op_add[0x1]; -}; - -struct mlx5_ifc_vector_calc_cap_bits { - u8 calc_matrix[0x1]; - u8 reserved_at_1[0x1f]; - u8 reserved_at_20[0x8]; - u8 max_vec_count[0x8]; - u8 reserved_at_30[0xd]; - u8 max_chunk_size[0x3]; - struct mlx5_ifc_calc_op calc0; - struct mlx5_ifc_calc_op calc1; - struct mlx5_ifc_calc_op calc2; - struct mlx5_ifc_calc_op calc3; - - u8 reserved_at_c0[0x720]; -}; - struct mlx5_ifc_tls_cap_bits { u8 tls_1_2_aes_gcm_128[0x1]; u8 tls_1_3_aes_gcm_128[0x1]; @@ -3435,20 +3408,6 @@ struct mlx5_ifc_roce_addr_layout_bits { u8 reserved_at_e0[0x20]; }; -struct mlx5_ifc_shampo_cap_bits { - u8 reserved_at_0[0x3]; - u8 shampo_log_max_reservation_size[0x5]; - u8 reserved_at_8[0x3]; - u8 shampo_log_min_reservation_size[0x5]; - u8 shampo_min_mss_size[0x10]; - - u8 reserved_at_20[0x3]; - u8 shampo_max_log_headers_entry_size[0x5]; - u8 reserved_at_28[0x18]; - - u8 reserved_at_40[0x7c0]; -}; - struct mlx5_ifc_crypto_cap_bits { u8 reserved_at_0[0x3]; u8 synchronize_dek[0x1]; @@ -3484,14 +3443,12 @@ union mlx5_ifc_hca_cap_union_bits { struct mlx5_ifc_flow_table_eswitch_cap_bits flow_table_eswitch_cap; struct mlx5_ifc_e_switch_cap_bits e_switch_cap; 
struct mlx5_ifc_port_selection_cap_bits port_selection_cap; - struct mlx5_ifc_vector_calc_cap_bits vector_calc_cap; struct mlx5_ifc_qos_cap_bits qos_cap; struct mlx5_ifc_debug_cap_bits debug_cap; struct mlx5_ifc_fpga_cap_bits fpga_cap; struct mlx5_ifc_tls_cap_bits tls_cap; struct mlx5_ifc_device_mem_cap_bits device_mem_cap; struct mlx5_ifc_virtio_emulation_cap_bits virtio_emulation_cap; - struct mlx5_ifc_shampo_cap_bits shampo_cap; struct mlx5_ifc_macsec_cap_bits macsec_cap; struct mlx5_ifc_crypto_cap_bits crypto_cap; u8 reserved_at_0[0x8000]; -- cgit v1.2.3 From a41cb59117fa12ee17cda5e5c781eecfcb15dc0f Mon Sep 17 00:00:00 2001 From: Shay Drory Date: Tue, 11 Jul 2023 15:56:08 +0300 Subject: net/mlx5: Remove unused MAX HCA capabilities Each device cap has two modes: MAX and CUR. The driver maintains a cache of both modes of the capabilities. For most device caps, the MAX cap mode is never used. Hence, remove all driver queries of the MAX mode of the said caps as well as their helper MACROs. 
Signed-off-by: Shay Drory Reviewed-by: Maher Sanalla Signed-off-by: Saeed Mahameed --- include/linux/mlx5/device.h | 52 --------------------------------------------- include/linux/mlx5/driver.h | 1 - 2 files changed, 53 deletions(-) (limited to 'include') diff --git a/include/linux/mlx5/device.h b/include/linux/mlx5/device.h index 5041965250f6..93399802ba77 100644 --- a/include/linux/mlx5/device.h +++ b/include/linux/mlx5/device.h @@ -1275,10 +1275,6 @@ enum mlx5_qcam_feature_groups { MLX5_GET(per_protocol_networking_offload_caps,\ mdev->caps.hca[MLX5_CAP_ETHERNET_OFFLOADS]->cur, cap) -#define MLX5_CAP_ETH_MAX(mdev, cap) \ - MLX5_GET(per_protocol_networking_offload_caps,\ - mdev->caps.hca[MLX5_CAP_ETHERNET_OFFLOADS]->max, cap) - #define MLX5_CAP_IPOIB_ENHANCED(mdev, cap) \ MLX5_GET(per_protocol_networking_offload_caps,\ mdev->caps.hca[MLX5_CAP_IPOIB_ENHANCED_OFFLOADS]->cur, cap) @@ -1301,77 +1297,40 @@ enum mlx5_qcam_feature_groups { #define MLX5_CAP64_FLOWTABLE(mdev, cap) \ MLX5_GET64(flow_table_nic_cap, (mdev)->caps.hca[MLX5_CAP_FLOW_TABLE]->cur, cap) -#define MLX5_CAP_FLOWTABLE_MAX(mdev, cap) \ - MLX5_GET(flow_table_nic_cap, mdev->caps.hca[MLX5_CAP_FLOW_TABLE]->max, cap) - #define MLX5_CAP_FLOWTABLE_NIC_RX(mdev, cap) \ MLX5_CAP_FLOWTABLE(mdev, flow_table_properties_nic_receive.cap) -#define MLX5_CAP_FLOWTABLE_NIC_RX_MAX(mdev, cap) \ - MLX5_CAP_FLOWTABLE_MAX(mdev, flow_table_properties_nic_receive.cap) - #define MLX5_CAP_FLOWTABLE_NIC_TX(mdev, cap) \ MLX5_CAP_FLOWTABLE(mdev, flow_table_properties_nic_transmit.cap) -#define MLX5_CAP_FLOWTABLE_NIC_TX_MAX(mdev, cap) \ - MLX5_CAP_FLOWTABLE_MAX(mdev, flow_table_properties_nic_transmit.cap) - #define MLX5_CAP_FLOWTABLE_SNIFFER_RX(mdev, cap) \ MLX5_CAP_FLOWTABLE(mdev, flow_table_properties_nic_receive_sniffer.cap) -#define MLX5_CAP_FLOWTABLE_SNIFFER_RX_MAX(mdev, cap) \ - MLX5_CAP_FLOWTABLE_MAX(mdev, flow_table_properties_nic_receive_sniffer.cap) - #define MLX5_CAP_FLOWTABLE_SNIFFER_TX(mdev, cap) \ 
MLX5_CAP_FLOWTABLE(mdev, flow_table_properties_nic_transmit_sniffer.cap) -#define MLX5_CAP_FLOWTABLE_SNIFFER_TX_MAX(mdev, cap) \ - MLX5_CAP_FLOWTABLE_MAX(mdev, flow_table_properties_nic_transmit_sniffer.cap) - #define MLX5_CAP_FLOWTABLE_RDMA_RX(mdev, cap) \ MLX5_CAP_FLOWTABLE(mdev, flow_table_properties_nic_receive_rdma.cap) -#define MLX5_CAP_FLOWTABLE_RDMA_RX_MAX(mdev, cap) \ - MLX5_CAP_FLOWTABLE_MAX(mdev, flow_table_properties_nic_receive_rdma.cap) - #define MLX5_CAP_FLOWTABLE_RDMA_TX(mdev, cap) \ MLX5_CAP_FLOWTABLE(mdev, flow_table_properties_nic_transmit_rdma.cap) -#define MLX5_CAP_FLOWTABLE_RDMA_TX_MAX(mdev, cap) \ - MLX5_CAP_FLOWTABLE_MAX(mdev, flow_table_properties_nic_transmit_rdma.cap) - #define MLX5_CAP_ESW_FLOWTABLE(mdev, cap) \ MLX5_GET(flow_table_eswitch_cap, \ mdev->caps.hca[MLX5_CAP_ESWITCH_FLOW_TABLE]->cur, cap) -#define MLX5_CAP_ESW_FLOWTABLE_MAX(mdev, cap) \ - MLX5_GET(flow_table_eswitch_cap, \ - mdev->caps.hca[MLX5_CAP_ESWITCH_FLOW_TABLE]->max, cap) - #define MLX5_CAP_ESW_FLOWTABLE_FDB(mdev, cap) \ MLX5_CAP_ESW_FLOWTABLE(mdev, flow_table_properties_nic_esw_fdb.cap) -#define MLX5_CAP_ESW_FLOWTABLE_FDB_MAX(mdev, cap) \ - MLX5_CAP_ESW_FLOWTABLE_MAX(mdev, flow_table_properties_nic_esw_fdb.cap) - #define MLX5_CAP_ESW_EGRESS_ACL(mdev, cap) \ MLX5_CAP_ESW_FLOWTABLE(mdev, flow_table_properties_esw_acl_egress.cap) -#define MLX5_CAP_ESW_EGRESS_ACL_MAX(mdev, cap) \ - MLX5_CAP_ESW_FLOWTABLE_MAX(mdev, flow_table_properties_esw_acl_egress.cap) - #define MLX5_CAP_ESW_INGRESS_ACL(mdev, cap) \ MLX5_CAP_ESW_FLOWTABLE(mdev, flow_table_properties_esw_acl_ingress.cap) -#define MLX5_CAP_ESW_INGRESS_ACL_MAX(mdev, cap) \ - MLX5_CAP_ESW_FLOWTABLE_MAX(mdev, flow_table_properties_esw_acl_ingress.cap) - #define MLX5_CAP_ESW_FT_FIELD_SUPPORT_2(mdev, cap) \ MLX5_CAP_ESW_FLOWTABLE(mdev, ft_field_support_2_esw_fdb.cap) -#define MLX5_CAP_ESW_FT_FIELD_SUPPORT_2_MAX(mdev, cap) \ - MLX5_CAP_ESW_FLOWTABLE_MAX(mdev, ft_field_support_2_esw_fdb.cap) - #define MLX5_CAP_ESW(mdev, cap) \ 
MLX5_GET(e_switch_cap, \ mdev->caps.hca[MLX5_CAP_ESWITCH]->cur, cap) @@ -1380,10 +1339,6 @@ enum mlx5_qcam_feature_groups { MLX5_GET64(flow_table_eswitch_cap, \ (mdev)->caps.hca[MLX5_CAP_ESWITCH_FLOW_TABLE]->cur, cap) -#define MLX5_CAP_ESW_MAX(mdev, cap) \ - MLX5_GET(e_switch_cap, \ - mdev->caps.hca[MLX5_CAP_ESWITCH]->max, cap) - #define MLX5_CAP_PORT_SELECTION(mdev, cap) \ MLX5_GET(port_selection_cap, \ mdev->caps.hca[MLX5_CAP_PORT_SELECTION]->cur, cap) @@ -1396,16 +1351,9 @@ enum mlx5_qcam_feature_groups { MLX5_GET(adv_virtualization_cap, \ mdev->caps.hca[MLX5_CAP_ADV_VIRTUALIZATION]->cur, cap) -#define MLX5_CAP_ADV_VIRTUALIZATION_MAX(mdev, cap) \ - MLX5_GET(adv_virtualization_cap, \ - mdev->caps.hca[MLX5_CAP_ADV_VIRTUALIZATION]->max, cap) - #define MLX5_CAP_FLOWTABLE_PORT_SELECTION(mdev, cap) \ MLX5_CAP_PORT_SELECTION(mdev, flow_table_properties_port_selection.cap) -#define MLX5_CAP_FLOWTABLE_PORT_SELECTION_MAX(mdev, cap) \ - MLX5_CAP_PORT_SELECTION_MAX(mdev, flow_table_properties_port_selection.cap) - #define MLX5_CAP_ODP(mdev, cap)\ MLX5_GET(odp_cap, mdev->caps.hca[MLX5_CAP_ODP]->cur, cap) diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h index e1c7e502a4fc..c9d82e74daaa 100644 --- a/include/linux/mlx5/driver.h +++ b/include/linux/mlx5/driver.h @@ -1022,7 +1022,6 @@ bool mlx5_cmd_is_down(struct mlx5_core_dev *dev); void mlx5_core_uplink_netdev_set(struct mlx5_core_dev *mdev, struct net_device *netdev); void mlx5_core_uplink_netdev_event_replay(struct mlx5_core_dev *mdev); -int mlx5_core_get_caps(struct mlx5_core_dev *dev, enum mlx5_cap_type cap_type); void mlx5_health_cleanup(struct mlx5_core_dev *dev); int mlx5_health_init(struct mlx5_core_dev *dev); void mlx5_start_health_poll(struct mlx5_core_dev *dev); -- cgit v1.2.3 From bb48cf1679d294d4fd3bfaa88289ed9004cbb025 Mon Sep 17 00:00:00 2001 From: David Vernet Date: Mon, 14 Aug 2023 13:59:08 -0500 Subject: bpf: Document struct bpf_struct_ops fields Subsystems that want to implement a struct 
bpf_struct_ops structure to enable struct_ops maps must currently reverse engineer how the structure works. Given that this is meant to be a way for subsystem maintainers to extend their subsystems using BPF, let's document it to make it a bit easier on them. Signed-off-by: David Vernet Link: https://lore.kernel.org/r/20230814185908.700553-3-void@manifault.com Signed-off-by: Martin KaFai Lau --- include/linux/bpf.h | 47 +++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 47 insertions(+) (limited to 'include') diff --git a/include/linux/bpf.h b/include/linux/bpf.h index cfabbcf47bdb..eced6400f778 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -1550,6 +1550,53 @@ struct bpf_struct_ops_value; struct btf_member; #define BPF_STRUCT_OPS_MAX_NR_MEMBERS 64 +/** + * struct bpf_struct_ops - A structure of callbacks allowing a subsystem to + * define a BPF_MAP_TYPE_STRUCT_OPS map type composed + * of BPF_PROG_TYPE_STRUCT_OPS progs. + * @verifier_ops: A structure of callbacks that are invoked by the verifier + * when determining whether the struct_ops progs in the + * struct_ops map are valid. + * @init: A callback that is invoked a single time, and before any other + * callback, to initialize the structure. A nonzero return value means + * the subsystem could not be initialized. + * @check_member: When defined, a callback invoked by the verifier to allow + * the subsystem to determine if an entry in the struct_ops map + * is valid. A nonzero return value means that the map is + * invalid and should be rejected by the verifier. + * @init_member: A callback that is invoked for each member of the struct_ops + * map to allow the subsystem to initialize the member. A nonzero + * value means the member could not be initialized. This callback + * is exclusive with the @type, @type_id, @value_type, and + * @value_id fields. + * @reg: A callback that is invoked when the struct_ops map has been + * initialized and is being attached to. 
Zero means the struct_ops map + * has been successfully registered and is live. A nonzero return value + * means the struct_ops map could not be registered. + * @unreg: A callback that is invoked when the struct_ops map should be + * unregistered. + * @update: A callback that is invoked when the live struct_ops map is being + * updated to contain new values. This callback is only invoked when + * the struct_ops map is loaded with BPF_F_LINK. If not defined, the + * it is assumed that the struct_ops map cannot be updated. + * @validate: A callback that is invoked after all of the members have been + * initialized. This callback should perform static checks on the + * map, meaning that it should either fail or succeed + * deterministically. A struct_ops map that has been validated may + * not necessarily succeed in being registered if the call to @reg + * fails. For example, a valid struct_ops map may be loaded, but + * then fail to be registered due to there being another active + * struct_ops map on the system in the subsystem already. For this + * reason, if this callback is not defined, the check is skipped as + * the struct_ops map will have final verification performed in + * @reg. + * @type: BTF type. + * @value_type: Value type. + * @name: The name of the struct bpf_struct_ops object. + * @func_models: Func models + * @type_id: BTF type id. + * @value_id: BTF value id. + */ struct bpf_struct_ops { const struct bpf_verifier_ops *verifier_ops; int (*init)(struct btf *btf); -- cgit v1.2.3 From 8897562f67b3e61ad736cd5c9f307447d33280e4 Mon Sep 17 00:00:00 2001 From: Lorenz Bauer Date: Tue, 15 Aug 2023 09:53:41 +0100 Subject: net: Fix slab-out-of-bounds in inet[6]_steal_sock Kumar reported a KASAN splat in tcp_v6_rcv: bash-5.2# ./test_progs -t btf_skc_cls_ingress ... 
[ 51.810085] BUG: KASAN: slab-out-of-bounds in tcp_v6_rcv+0x2d7d/0x3440
[ 51.810458] Read of size 2 at addr ffff8881053f038c by task test_progs/226

The problem is that inet[6]_steal_sock accesses sk->sk_protocol without
accounting for request or timewait sockets. To fix this we can't just
check sock_common->skc_reuseport since that flag is present on timewait
sockets.

Instead, add a fullsock check to avoid the out-of-bounds access of
sk_protocol.

Fixes: 9c02bec95954 ("bpf, net: Support SO_REUSEPORT sockets with bpf_sk_assign")
Reported-by: Kumar Kartikeya Dwivedi
Signed-off-by: Lorenz Bauer
Link: https://lore.kernel.org/r/20230815-bpf-next-v2-1-95126eaa4c1b@isovalent.com
Signed-off-by: Martin KaFai Lau
---
 include/net/inet6_hashtables.h | 2 +-
 include/net/inet_hashtables.h  | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

(limited to 'include')

diff --git a/include/net/inet6_hashtables.h b/include/net/inet6_hashtables.h
index 284b5ce7205d..533a7337865a 100644
--- a/include/net/inet6_hashtables.h
+++ b/include/net/inet6_hashtables.h
@@ -116,7 +116,7 @@ struct sock *inet6_steal_sock(struct net *net, struct sk_buff *skb, int doff,
 	if (!sk)
 		return NULL;

-	if (!prefetched)
+	if (!prefetched || !sk_fullsock(sk))
 		return sk;

 	if (sk->sk_protocol == IPPROTO_TCP) {
diff --git a/include/net/inet_hashtables.h b/include/net/inet_hashtables.h
index 843557223414..3ecfeadbfa06 100644
--- a/include/net/inet_hashtables.h
+++ b/include/net/inet_hashtables.h
@@ -462,7 +462,7 @@ struct sock *inet_steal_sock(struct net *net, struct sk_buff *skb, int doff,
 	if (!sk)
 		return NULL;

-	if (!prefetched)
+	if (!prefetched || !sk_fullsock(sk))
 		return sk;

 	if (sk->sk_protocol == IPPROTO_TCP) {
--
cgit v1.2.3

From fde9bd4a4d41b65a936d65eb416c1de27cb562f1 Mon Sep 17 00:00:00 2001
From: Jakub Kicinski
Date: Mon, 14 Aug 2023 14:47:15 -0700
Subject: genetlink: make genl_info->nlhdr const

struct netlink_callback has a const nlh pointer, make the pointer in
struct genl_info const as well, to make
copying between the two easier.

Reviewed-by: Jiri Pirko
Link: https://lore.kernel.org/r/20230814214723.2924989-3-kuba@kernel.org
Signed-off-by: Jakub Kicinski
---
 include/net/genetlink.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

(limited to 'include')

diff --git a/include/net/genetlink.h b/include/net/genetlink.h
index ed4622dd4828..0366d0925596 100644
--- a/include/net/genetlink.h
+++ b/include/net/genetlink.h
@@ -104,7 +104,7 @@ struct genl_family {
 struct genl_info {
 	u32 snd_seq;
 	u32 snd_portid;
-	struct nlmsghdr * nlhdr;
+	const struct nlmsghdr * nlhdr;
 	struct genlmsghdr * genlhdr;
 	void * userhdr;
 	struct nlattr ** attrs;
--
cgit v1.2.3

From bffcc6882a1bb2be8c9420184966f4c2c822078e Mon Sep 17 00:00:00 2001
From: Jakub Kicinski
Date: Mon, 14 Aug 2023 14:47:16 -0700
Subject: genetlink: remove userhdr from struct genl_info

Only three families use info->userhdr today and going forward
we discourage using fixed headers in new families.
So having the pointer to the user header in struct genl_info
is overkill. Compute the header pointer at runtime.
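
As a rough illustration of computing the header pointer at runtime, the
user-space sketch below mirrors the idea. The demo_* types and
DEMO_GENL_HDRLEN are hypothetical stand-ins, not the kernel definitions;
the only assumption carried over is that the family-specific header
starts right after the aligned generic netlink header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Mock of the fixed generic netlink header (cmd, version, reserved). */
struct demo_genlmsghdr {
	uint8_t cmd;
	uint8_t version;
	uint16_t reserved;
};

/* Hypothetical constant standing in for the aligned header length. */
#define DEMO_GENL_HDRLEN 4

/* Mock request info: no cached user header pointer is stored. */
struct demo_genl_info {
	struct demo_genlmsghdr *genlhdr;
};

/* Derive the user header location from genlhdr on every call. */
static inline void *demo_genl_info_userhdr(const struct demo_genl_info *info)
{
	assert(info && info->genlhdr);
	return (uint8_t *)info->genlhdr + DEMO_GENL_HDRLEN;
}
```

The trade-off sketched here is one pointer addition per lookup in
exchange for one less field to keep in sync in the info struct.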
Reviewed-by: Johannes Berg Reviewed-by: Jiri Pirko Reviewed-by: Aaron Conole Link: https://lore.kernel.org/r/20230814214723.2924989-4-kuba@kernel.org Signed-off-by: Jakub Kicinski --- include/net/genetlink.h | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-) (limited to 'include') diff --git a/include/net/genetlink.h b/include/net/genetlink.h index 0366d0925596..9dc21ec15734 100644 --- a/include/net/genetlink.h +++ b/include/net/genetlink.h @@ -95,7 +95,6 @@ struct genl_family { * @snd_portid: netlink portid of sender * @nlhdr: netlink message header * @genlhdr: generic netlink message header - * @userhdr: user specific header * @attrs: netlink attributes * @_net: network namespace * @user_ptr: user pointers @@ -106,7 +105,6 @@ struct genl_info { u32 snd_portid; const struct nlmsghdr * nlhdr; struct genlmsghdr * genlhdr; - void * userhdr; struct nlattr ** attrs; possible_net_t _net; void * user_ptr[2]; @@ -123,6 +121,11 @@ static inline void genl_info_net_set(struct genl_info *info, struct net *net) write_pnet(&info->_net, net); } +static inline void *genl_info_userhdr(const struct genl_info *info) +{ + return (u8 *)info->genlhdr + GENL_HDRLEN; +} + #define GENL_SET_ERR_MSG(info, msg) NL_SET_ERR_MSG((info)->extack, msg) #define GENL_SET_ERR_MSG_FMT(info, msg, args...) \ -- cgit v1.2.3 From 9272af109fe65d1a13f28c5c13777b62d3e97e8c Mon Sep 17 00:00:00 2001 From: Jakub Kicinski Date: Mon, 14 Aug 2023 14:47:17 -0700 Subject: genetlink: add struct genl_info to struct genl_dumpit_info Netlink GET implementations must currently juggle struct genl_info and struct netlink_callback, depending on whether they were called from doit or dumpit. Add genl_info to the dump state and populate the fields. This way implementations can simply pass struct genl_info around. 
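
A minimal user-space sketch of the resulting pattern, assuming mock
demo_* types in place of the kernel structures: the dump state embeds
the request info, so one helper can serve both the doit and the dumpit
path. All names below are hypothetical and only model the shape of the
change.

```c
#include <assert.h>
#include <stddef.h>

/* Mock request info shared by doit and dumpit handlers. */
struct demo_genl_info {
	unsigned int snd_portid;
	unsigned int snd_seq;
};

/* Mock of the netlink callback: data points at the dump state. */
struct demo_netlink_callback {
	void *data;
};

/* Mock dump state that now carries a copy of the request info. */
struct demo_genl_dumpit_info {
	struct demo_genl_info info;
};

/* Fetch the request info from the dump state. */
static inline const struct demo_genl_info *
demo_genl_info_dump(const struct demo_netlink_callback *cb)
{
	const struct demo_genl_dumpit_info *dump = cb->data;

	assert(dump);
	return &dump->info;
}

/* Shared helper: only needs the info struct, whatever the caller is. */
static unsigned int demo_reply_portid(const struct demo_genl_info *info)
{
	return info->snd_portid;
}

static unsigned int demo_doit(const struct demo_genl_info *info)
{
	return demo_reply_portid(info);
}

static unsigned int demo_dumpit(const struct demo_netlink_callback *cb)
{
	return demo_reply_portid(demo_genl_info_dump(cb));
}
```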
Reviewed-by: Jiri Pirko Link: https://lore.kernel.org/r/20230814214723.2924989-5-kuba@kernel.org Signed-off-by: Jakub Kicinski --- include/net/genetlink.h | 8 ++++++++ 1 file changed, 8 insertions(+) (limited to 'include') diff --git a/include/net/genetlink.h b/include/net/genetlink.h index 9dc21ec15734..86c8eaaa3a43 100644 --- a/include/net/genetlink.h +++ b/include/net/genetlink.h @@ -250,11 +250,13 @@ struct genl_split_ops { * @family: generic netlink family - for internal genl code usage * @op: generic netlink ops - for internal genl code usage * @attrs: netlink attributes + * @info: struct genl_info describing the request */ struct genl_dumpit_info { const struct genl_family *family; struct genl_split_ops op; struct nlattr **attrs; + struct genl_info info; }; static inline const struct genl_dumpit_info * @@ -263,6 +265,12 @@ genl_dumpit_info(struct netlink_callback *cb) return cb->data; } +static inline const struct genl_info * +genl_info_dump(struct netlink_callback *cb) +{ + return &genl_dumpit_info(cb)->info; +} + int genl_register_family(struct genl_family *family); int genl_unregister_family(const struct genl_family *family); void genl_notify(const struct genl_family *family, struct sk_buff *skb, -- cgit v1.2.3 From 7288dd2fd4888c85c687f8ded69c280938d1a7b6 Mon Sep 17 00:00:00 2001 From: Jakub Kicinski Date: Mon, 14 Aug 2023 14:47:18 -0700 Subject: genetlink: use attrs from struct genl_info Since dumps carry struct genl_info now, use the attrs pointer from genl_info and remove the one in struct genl_dumpit_info. 
Reviewed-by: Johannes Berg Reviewed-by: Miquel Raynal Reviewed-by: Jiri Pirko Link: https://lore.kernel.org/r/20230814214723.2924989-6-kuba@kernel.org Signed-off-by: Jakub Kicinski --- include/net/genetlink.h | 1 - 1 file changed, 1 deletion(-) (limited to 'include') diff --git a/include/net/genetlink.h b/include/net/genetlink.h index 86c8eaaa3a43..a8a15b9c22c8 100644 --- a/include/net/genetlink.h +++ b/include/net/genetlink.h @@ -255,7 +255,6 @@ struct genl_split_ops { struct genl_dumpit_info { const struct genl_family *family; struct genl_split_ops op; - struct nlattr **attrs; struct genl_info info; }; -- cgit v1.2.3 From 5c670a010de46687ed27553602d8131ce4d7a9fb Mon Sep 17 00:00:00 2001 From: Jakub Kicinski Date: Mon, 14 Aug 2023 14:47:19 -0700 Subject: genetlink: add a family pointer to struct genl_info Having family in struct genl_info is quite useful. It cuts down the number of arguments which need to be passed to helpers which already take struct genl_info. Reviewed-by: Jiri Pirko Link: https://lore.kernel.org/r/20230814214723.2924989-7-kuba@kernel.org Signed-off-by: Jakub Kicinski --- include/net/genetlink.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'include') diff --git a/include/net/genetlink.h b/include/net/genetlink.h index a8a15b9c22c8..6b858c4cba5b 100644 --- a/include/net/genetlink.h +++ b/include/net/genetlink.h @@ -93,6 +93,7 @@ struct genl_family { * struct genl_info - receiving information * @snd_seq: sending sequence number * @snd_portid: netlink portid of sender + * @family: generic netlink family * @nlhdr: netlink message header * @genlhdr: generic netlink message header * @attrs: netlink attributes @@ -103,6 +104,7 @@ struct genl_family { struct genl_info { u32 snd_seq; u32 snd_portid; + const struct genl_family *family; const struct nlmsghdr * nlhdr; struct genlmsghdr * genlhdr; struct nlattr ** attrs; @@ -247,13 +249,11 @@ struct genl_split_ops { /** * struct genl_dumpit_info - info that is available during dumpit 
op call - * @family: generic netlink family - for internal genl code usage * @op: generic netlink ops - for internal genl code usage * @attrs: netlink attributes * @info: struct genl_info describing the request */ struct genl_dumpit_info { - const struct genl_family *family; struct genl_split_ops op; struct genl_info info; }; -- cgit v1.2.3 From 5aa51d9f889ca61201044b28677eb60236b0d746 Mon Sep 17 00:00:00 2001 From: Jakub Kicinski Date: Mon, 14 Aug 2023 14:47:20 -0700 Subject: genetlink: add genlmsg_iput() API Add some APIs and helpers required for convenient construction of replies and notifications based on struct genl_info. Reviewed-by: Jiri Pirko Link: https://lore.kernel.org/r/20230814214723.2924989-8-kuba@kernel.org Signed-off-by: Jakub Kicinski --- include/net/genetlink.h | 54 ++++++++++++++++++++++++++++++++++++++++++++++++- 1 file changed, 53 insertions(+), 1 deletion(-) (limited to 'include') diff --git a/include/net/genetlink.h b/include/net/genetlink.h index 6b858c4cba5b..e18a4c0d69ee 100644 --- a/include/net/genetlink.h +++ b/include/net/genetlink.h @@ -113,7 +113,7 @@ struct genl_info { struct netlink_ext_ack *extack; }; -static inline struct net *genl_info_net(struct genl_info *info) +static inline struct net *genl_info_net(const struct genl_info *info) { return read_pnet(&info->_net); } @@ -270,6 +270,32 @@ genl_info_dump(struct netlink_callback *cb) return &genl_dumpit_info(cb)->info; } +/** + * genl_info_init_ntf() - initialize genl_info for notifications + * @info: genl_info struct to set up + * @family: pointer to the genetlink family + * @cmd: command to be used in the notification + * + * Initialize a locally declared struct genl_info to pass to various APIs. + * Intended to be used when creating notifications. 
+ */ +static inline void +genl_info_init_ntf(struct genl_info *info, const struct genl_family *family, + u8 cmd) +{ + struct genlmsghdr *hdr = (void *) &info->user_ptr[0]; + + memset(info, 0, sizeof(*info)); + info->family = family; + info->genlhdr = hdr; + hdr->cmd = cmd; +} + +static inline bool genl_info_is_ntf(const struct genl_info *info) +{ + return !info->nlhdr; +} + int genl_register_family(struct genl_family *family); int genl_unregister_family(const struct genl_family *family); void genl_notify(const struct genl_family *family, struct sk_buff *skb, @@ -278,6 +304,32 @@ void genl_notify(const struct genl_family *family, struct sk_buff *skb, void *genlmsg_put(struct sk_buff *skb, u32 portid, u32 seq, const struct genl_family *family, int flags, u8 cmd); +static inline void * +__genlmsg_iput(struct sk_buff *skb, const struct genl_info *info, int flags) +{ + return genlmsg_put(skb, info->snd_portid, info->snd_seq, info->family, + flags, info->genlhdr->cmd); +} + +/** + * genlmsg_iput - start genetlink message based on genl_info + * @skb: skb in which message header will be placed + * @info: genl_info as provided to do/dump handlers + * + * Convenience wrapper which starts a genetlink message based on + * information in user request. @info should be either the struct passed + * by genetlink core to do/dump handlers (when constructing replies to + * such requests) or a struct initialized by genl_info_init_ntf() + * when constructing notifications. + * + * Returns pointer to new genetlink header. 
+ */ +static inline void * +genlmsg_iput(struct sk_buff *skb, const struct genl_info *info) +{ + return __genlmsg_iput(skb, info, 0); +} + /** * genlmsg_nlhdr - Obtain netlink header from user specified header * @user_hdr: user header as returned from genlmsg_put() -- cgit v1.2.3 From c274af2242693f59f00851b3660a21b10bcd76cc Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Wed, 16 Aug 2023 08:15:33 +0000 Subject: inet: introduce inet->inet_flags Various inet fields are currently racy. do_ip_setsockopt() and do_ip_getsockopt() are mostly holding the socket lock, but some (fast) paths do not. Use a new inet->inet_flags to hold atomic bits in the series. Remove inet->cmsg_flags, and use instead 9 bits from inet_flags. Signed-off-by: Eric Dumazet Acked-by: Soheil Hassas Yeganeh Reviewed-by: Simon Horman Signed-off-by: David S. Miller --- include/net/inet_sock.h | 55 +++++++++++++++++++++++++++++++++++++++---------- 1 file changed, 44 insertions(+), 11 deletions(-) (limited to 'include') diff --git a/include/net/inet_sock.h b/include/net/inet_sock.h index 0bb32bfc6183..e3b35b0015f3 100644 --- a/include/net/inet_sock.h +++ b/include/net/inet_sock.h @@ -194,6 +194,7 @@ struct rtable; * @inet_rcv_saddr - Bound local IPv4 addr * @inet_dport - Destination port * @inet_num - Local port + * @inet_flags - various atomic flags * @inet_saddr - Sending source * @uc_ttl - Unicast TTL * @inet_sport - Source port @@ -218,11 +219,11 @@ struct inet_sock { #define inet_dport sk.__sk_common.skc_dport #define inet_num sk.__sk_common.skc_num + unsigned long inet_flags; __be32 inet_saddr; __s16 uc_ttl; - __u16 cmsg_flags; - struct ip_options_rcu __rcu *inet_opt; __be16 inet_sport; + struct ip_options_rcu __rcu *inet_opt; __u16 inet_id; __u8 tos; @@ -259,16 +260,48 @@ struct inet_sock { #define IPCORK_OPT 1 /* ip-options has been held in ipcork.opt */ #define IPCORK_ALLFRAG 2 /* always fragment (for ipv6 for now) */ +enum { + INET_FLAGS_PKTINFO = 0, + INET_FLAGS_TTL = 1, + INET_FLAGS_TOS = 
2, + INET_FLAGS_RECVOPTS = 3, + INET_FLAGS_RETOPTS = 4, + INET_FLAGS_PASSSEC = 5, + INET_FLAGS_ORIGDSTADDR = 6, + INET_FLAGS_CHECKSUM = 7, + INET_FLAGS_RECVFRAGSIZE = 8, +}; + /* cmsg flags for inet */ -#define IP_CMSG_PKTINFO BIT(0) -#define IP_CMSG_TTL BIT(1) -#define IP_CMSG_TOS BIT(2) -#define IP_CMSG_RECVOPTS BIT(3) -#define IP_CMSG_RETOPTS BIT(4) -#define IP_CMSG_PASSSEC BIT(5) -#define IP_CMSG_ORIGDSTADDR BIT(6) -#define IP_CMSG_CHECKSUM BIT(7) -#define IP_CMSG_RECVFRAGSIZE BIT(8) +#define IP_CMSG_PKTINFO BIT(INET_FLAGS_PKTINFO) +#define IP_CMSG_TTL BIT(INET_FLAGS_TTL) +#define IP_CMSG_TOS BIT(INET_FLAGS_TOS) +#define IP_CMSG_RECVOPTS BIT(INET_FLAGS_RECVOPTS) +#define IP_CMSG_RETOPTS BIT(INET_FLAGS_RETOPTS) +#define IP_CMSG_PASSSEC BIT(INET_FLAGS_PASSSEC) +#define IP_CMSG_ORIGDSTADDR BIT(INET_FLAGS_ORIGDSTADDR) +#define IP_CMSG_CHECKSUM BIT(INET_FLAGS_CHECKSUM) +#define IP_CMSG_RECVFRAGSIZE BIT(INET_FLAGS_RECVFRAGSIZE) + +#define IP_CMSG_ALL (IP_CMSG_PKTINFO | IP_CMSG_TTL | \ + IP_CMSG_TOS | IP_CMSG_RECVOPTS | \ + IP_CMSG_RETOPTS | IP_CMSG_PASSSEC | \ + IP_CMSG_ORIGDSTADDR | IP_CMSG_CHECKSUM | \ + IP_CMSG_RECVFRAGSIZE) + +static inline unsigned long inet_cmsg_flags(const struct inet_sock *inet) +{ + return READ_ONCE(inet->inet_flags) & IP_CMSG_ALL; +} + +#define inet_test_bit(nr, sk) \ + test_bit(INET_FLAGS_##nr, &inet_sk(sk)->inet_flags) +#define inet_set_bit(nr, sk) \ + set_bit(INET_FLAGS_##nr, &inet_sk(sk)->inet_flags) +#define inet_clear_bit(nr, sk) \ + clear_bit(INET_FLAGS_##nr, &inet_sk(sk)->inet_flags) +#define inet_assign_bit(nr, sk, val) \ + assign_bit(INET_FLAGS_##nr, &inet_sk(sk)->inet_flags, val) static inline bool sk_is_inet(struct sock *sk) { -- cgit v1.2.3 From 6b5f43ea08150e7ff72f734545101c58489ead5b Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Wed, 16 Aug 2023 08:15:35 +0000 Subject: inet: move inet->recverr to inet->inet_flags IP_RECVERR socket option can now be set/get without locking the socket. 
This patch potentially avoids data races around inet->recverr.

Signed-off-by: Eric Dumazet
Acked-by: Soheil Hassas Yeganeh
Reviewed-by: Simon Horman
Signed-off-by: David S. Miller
---
 include/net/inet_sock.h | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

(limited to 'include')

diff --git a/include/net/inet_sock.h b/include/net/inet_sock.h
index e3b35b0015f3..552188aa5a2d 100644
--- a/include/net/inet_sock.h
+++ b/include/net/inet_sock.h
@@ -230,8 +230,7 @@ struct inet_sock {
 	__u8 min_ttl;
 	__u8 mc_ttl;
 	__u8 pmtudisc;
-	__u8 recverr:1,
-	     is_icsk:1,
+	__u8 is_icsk:1,
 	     freebind:1,
 	     hdrincl:1,
 	     mc_loop:1,
@@ -270,6 +269,8 @@ enum {
 	INET_FLAGS_ORIGDSTADDR = 6,
 	INET_FLAGS_CHECKSUM = 7,
 	INET_FLAGS_RECVFRAGSIZE = 8,
+
+	INET_FLAGS_RECVERR = 9,
 };

 /* cmsg flags for inet */
--
cgit v1.2.3

From 8e8cfb114d9f1e2efddb892f993c1ad61635c85e Mon Sep 17 00:00:00 2001
From: Eric Dumazet
Date: Wed, 16 Aug 2023 08:15:36 +0000
Subject: inet: move inet->recverr_rfc4884 to inet->inet_flags

IP_RECVERR_RFC4884 socket option can now be set/read
without locking the socket.

Signed-off-by: Eric Dumazet
Acked-by: Soheil Hassas Yeganeh
Reviewed-by: Simon Horman
Signed-off-by: David S.
Miller --- include/net/inet_sock.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include') diff --git a/include/net/inet_sock.h b/include/net/inet_sock.h index 552188aa5a2d..c01f1f64a861 100644 --- a/include/net/inet_sock.h +++ b/include/net/inet_sock.h @@ -238,7 +238,6 @@ struct inet_sock { mc_all:1, nodefrag:1; __u8 bind_address_no_port:1, - recverr_rfc4884:1, defer_connect:1; /* Indicates that fastopen_connect is set * and cookie exists so we defer connect * until first data frame is written @@ -271,6 +270,7 @@ enum { INET_FLAGS_RECVFRAGSIZE = 8, INET_FLAGS_RECVERR = 9, + INET_FLAGS_RECVERR_RFC4884 = 10, }; /* cmsg flags for inet */ -- cgit v1.2.3 From 3f7e753206bb20fc098b44ec40001befd1fe18d1 Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Wed, 16 Aug 2023 08:15:37 +0000 Subject: inet: move inet->freebind to inet->inet_flags IP_FREEBIND socket option can now be set/read without locking the socket. Signed-off-by: Eric Dumazet Acked-by: Soheil Hassas Yeganeh Reviewed-by: Simon Horman Reviewed-by: Matthieu Baerts Signed-off-by: David S. 
Miller --- include/net/inet_sock.h | 5 +++-- include/net/ipv6.h | 3 ++- 2 files changed, 5 insertions(+), 3 deletions(-) (limited to 'include') diff --git a/include/net/inet_sock.h b/include/net/inet_sock.h index c01f1f64a861..d6ba963534b4 100644 --- a/include/net/inet_sock.h +++ b/include/net/inet_sock.h @@ -231,7 +231,6 @@ struct inet_sock { __u8 mc_ttl; __u8 pmtudisc; __u8 is_icsk:1, - freebind:1, hdrincl:1, mc_loop:1, transparent:1, @@ -271,6 +270,7 @@ enum { INET_FLAGS_RECVERR = 9, INET_FLAGS_RECVERR_RFC4884 = 10, + INET_FLAGS_FREEBIND = 11, }; /* cmsg flags for inet */ @@ -423,7 +423,8 @@ static inline bool inet_can_nonlocal_bind(struct net *net, struct inet_sock *inet) { return READ_ONCE(net->ipv4.sysctl_ip_nonlocal_bind) || - inet->freebind || inet->transparent; + test_bit(INET_FLAGS_FREEBIND, &inet->inet_flags) || + inet->transparent; } static inline bool inet_addr_valid_or_nonlocal(struct net *net, diff --git a/include/net/ipv6.h b/include/net/ipv6.h index 22643ffc2df8..fd570d77588e 100644 --- a/include/net/ipv6.h +++ b/include/net/ipv6.h @@ -937,7 +937,8 @@ static inline bool ipv6_can_nonlocal_bind(struct net *net, struct inet_sock *inet) { return net->ipv6.sysctl.ip_nonlocal_bind || - inet->freebind || inet->transparent; + test_bit(INET_FLAGS_FREEBIND, &inet->inet_flags) || + inet->transparent; } /* Sysctl settings for net ipv6.auto_flowlabels */ -- cgit v1.2.3 From cafbe182a467bf6799242fd7468438cf1ab833dc Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Wed, 16 Aug 2023 08:15:38 +0000 Subject: inet: move inet->hdrincl to inet->inet_flags IP_HDRINCL socket option can now be set/read without locking the socket. Signed-off-by: Eric Dumazet Acked-by: Soheil Hassas Yeganeh Reviewed-by: Simon Horman Signed-off-by: David S. 
Miller --- include/net/inet_sock.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'include') diff --git a/include/net/inet_sock.h b/include/net/inet_sock.h index d6ba963534b4..ad1895e32e7d 100644 --- a/include/net/inet_sock.h +++ b/include/net/inet_sock.h @@ -231,7 +231,6 @@ struct inet_sock { __u8 mc_ttl; __u8 pmtudisc; __u8 is_icsk:1, - hdrincl:1, mc_loop:1, transparent:1, mc_all:1, @@ -271,6 +270,7 @@ enum { INET_FLAGS_RECVERR = 9, INET_FLAGS_RECVERR_RFC4884 = 10, INET_FLAGS_FREEBIND = 11, + INET_FLAGS_HDRINCL = 12, }; /* cmsg flags for inet */ @@ -397,7 +397,7 @@ static inline __u8 inet_sk_flowi_flags(const struct sock *sk) { __u8 flags = 0; - if (inet_sk(sk)->transparent || inet_sk(sk)->hdrincl) + if (inet_sk(sk)->transparent || inet_test_bit(HDRINCL, sk)) flags |= FLOWI_FLAG_ANYSRC; return flags; } -- cgit v1.2.3 From b09bde5c3554d58cb711dac55a10ecc56d436966 Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Wed, 16 Aug 2023 08:15:39 +0000 Subject: inet: move inet->mc_loop to inet->inet_frags IP_MULTICAST_LOOP socket option can now be set/read without locking the socket. v3: fix build bot error reported in ipvs set_mcast_loop() Signed-off-by: Eric Dumazet Acked-by: Soheil Hassas Yeganeh Reviewed-by: Simon Horman Signed-off-by: David S. 
Miller --- include/net/inet_sock.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include') diff --git a/include/net/inet_sock.h b/include/net/inet_sock.h index ad1895e32e7d..6c4eeca59f60 100644 --- a/include/net/inet_sock.h +++ b/include/net/inet_sock.h @@ -231,7 +231,6 @@ struct inet_sock { __u8 mc_ttl; __u8 pmtudisc; __u8 is_icsk:1, - mc_loop:1, transparent:1, mc_all:1, nodefrag:1; @@ -271,6 +270,7 @@ enum { INET_FLAGS_RECVERR_RFC4884 = 10, INET_FLAGS_FREEBIND = 11, INET_FLAGS_HDRINCL = 12, + INET_FLAGS_MC_LOOP = 13, }; /* cmsg flags for inet */ -- cgit v1.2.3 From 307b4ac6dc18436076cdd314aa3e556be077bf2d Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Wed, 16 Aug 2023 08:15:40 +0000 Subject: inet: move inet->mc_all to inet->inet_frags IP_MULTICAST_ALL socket option can now be set/read without locking the socket. Signed-off-by: Eric Dumazet Acked-by: Soheil Hassas Yeganeh Reviewed-by: Simon Horman Signed-off-by: David S. Miller --- include/net/inet_sock.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include') diff --git a/include/net/inet_sock.h b/include/net/inet_sock.h index 6c4eeca59f60..fffd34fa6a7c 100644 --- a/include/net/inet_sock.h +++ b/include/net/inet_sock.h @@ -232,7 +232,6 @@ struct inet_sock { __u8 pmtudisc; __u8 is_icsk:1, transparent:1, - mc_all:1, nodefrag:1; __u8 bind_address_no_port:1, defer_connect:1; /* Indicates that fastopen_connect is set @@ -271,6 +270,7 @@ enum { INET_FLAGS_FREEBIND = 11, INET_FLAGS_HDRINCL = 12, INET_FLAGS_MC_LOOP = 13, + INET_FLAGS_MC_ALL = 14, }; /* cmsg flags for inet */ -- cgit v1.2.3 From 4bd0623f04eef65c0a324000fad73c4d3a677f8e Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Wed, 16 Aug 2023 08:15:41 +0000 Subject: inet: move inet->transparent to inet->inet_flags IP_TRANSPARENT socket option can now be set/read without locking the socket. 
v2: removed unused issk variable in mptcp_setsockopt_sol_ip_set_transparent() v4: rebased after commit 3f326a821b99 ("mptcp: change the mpc check helper to return a sk") Signed-off-by: Eric Dumazet Cc: Paolo Abeni Acked-by: Soheil Hassas Yeganeh Reviewed-by: Simon Horman Reviewed-by: Matthieu Baerts Signed-off-by: David S. Miller --- include/net/inet_sock.h | 6 +++--- include/net/ipv6.h | 2 +- include/net/route.h | 2 +- include/net/tcp.h | 2 +- 4 files changed, 6 insertions(+), 6 deletions(-) (limited to 'include') diff --git a/include/net/inet_sock.h b/include/net/inet_sock.h index fffd34fa6a7c..cefd9a60dc6d 100644 --- a/include/net/inet_sock.h +++ b/include/net/inet_sock.h @@ -231,7 +231,6 @@ struct inet_sock { __u8 mc_ttl; __u8 pmtudisc; __u8 is_icsk:1, - transparent:1, nodefrag:1; __u8 bind_address_no_port:1, defer_connect:1; /* Indicates that fastopen_connect is set @@ -271,6 +270,7 @@ enum { INET_FLAGS_HDRINCL = 12, INET_FLAGS_MC_LOOP = 13, INET_FLAGS_MC_ALL = 14, + INET_FLAGS_TRANSPARENT = 15, }; /* cmsg flags for inet */ @@ -397,7 +397,7 @@ static inline __u8 inet_sk_flowi_flags(const struct sock *sk) { __u8 flags = 0; - if (inet_sk(sk)->transparent || inet_test_bit(HDRINCL, sk)) + if (inet_test_bit(TRANSPARENT, sk) || inet_test_bit(HDRINCL, sk)) flags |= FLOWI_FLAG_ANYSRC; return flags; } @@ -424,7 +424,7 @@ static inline bool inet_can_nonlocal_bind(struct net *net, { return READ_ONCE(net->ipv4.sysctl_ip_nonlocal_bind) || test_bit(INET_FLAGS_FREEBIND, &inet->inet_flags) || - inet->transparent; + test_bit(INET_FLAGS_TRANSPARENT, &inet->inet_flags); } static inline bool inet_addr_valid_or_nonlocal(struct net *net, diff --git a/include/net/ipv6.h b/include/net/ipv6.h index fd570d77588e..d40d8238d4c2 100644 --- a/include/net/ipv6.h +++ b/include/net/ipv6.h @@ -938,7 +938,7 @@ static inline bool ipv6_can_nonlocal_bind(struct net *net, { return net->ipv6.sysctl.ip_nonlocal_bind || test_bit(INET_FLAGS_FREEBIND, &inet->inet_flags) || - inet->transparent; + 
test_bit(INET_FLAGS_TRANSPARENT, &inet->inet_flags); } /* Sysctl settings for net ipv6.auto_flowlabels */ diff --git a/include/net/route.h b/include/net/route.h index d9ca98d2366f..51a45b1887b5 100644 --- a/include/net/route.h +++ b/include/net/route.h @@ -298,7 +298,7 @@ static inline void ip_route_connect_init(struct flowi4 *fl4, __be32 dst, { __u8 flow_flags = 0; - if (inet_sk(sk)->transparent) + if (inet_test_bit(TRANSPARENT, sk)) flow_flags |= FLOWI_FLAG_ANYSRC; flowi4_init_output(fl4, oif, READ_ONCE(sk->sk_mark), ip_sock_rt_tos(sk), diff --git a/include/net/tcp.h b/include/net/tcp.h index 6d77c08d83b7..07b21d9a9620 100644 --- a/include/net/tcp.h +++ b/include/net/tcp.h @@ -2031,7 +2031,7 @@ static inline bool inet_sk_transparent(const struct sock *sk) case TCP_NEW_SYN_RECV: return inet_rsk(inet_reqsk(sk))->no_srccheck; } - return inet_sk(sk)->transparent; + return inet_test_bit(TRANSPARENT, sk); } /* Determines whether this is a thin stream (which may suffer from -- cgit v1.2.3 From b1c0356a58577ce0a583e3044bf801877fc0b5de Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Wed, 16 Aug 2023 08:15:42 +0000 Subject: inet: move inet->is_icsk to inet->inet_flags We move single bit fields to inet->inet_flags to avoid races. Signed-off-by: Eric Dumazet Acked-by: Soheil Hassas Yeganeh Reviewed-by: Simon Horman Signed-off-by: David S. 
Miller --- include/net/inet_connection_sock.h | 4 ++-- include/net/inet_sock.h | 5 ++--- 2 files changed, 4 insertions(+), 5 deletions(-) (limited to 'include') diff --git a/include/net/inet_connection_sock.h b/include/net/inet_connection_sock.h index be3c858a2ebb..5d2fcc137b88 100644 --- a/include/net/inet_connection_sock.h +++ b/include/net/inet_connection_sock.h @@ -342,9 +342,9 @@ static inline bool inet_csk_in_pingpong_mode(struct sock *sk) return inet_csk(sk)->icsk_ack.pingpong >= TCP_PINGPONG_THRESH; } -static inline bool inet_csk_has_ulp(struct sock *sk) +static inline bool inet_csk_has_ulp(const struct sock *sk) { - return inet_sk(sk)->is_icsk && !!inet_csk(sk)->icsk_ulp_ops; + return inet_test_bit(IS_ICSK, sk) && !!inet_csk(sk)->icsk_ulp_ops; } #endif /* _INET_CONNECTION_SOCK_H */ diff --git a/include/net/inet_sock.h b/include/net/inet_sock.h index cefd9a60dc6d..38f7fc1c4dac 100644 --- a/include/net/inet_sock.h +++ b/include/net/inet_sock.h @@ -201,7 +201,6 @@ struct rtable; * @inet_id - ID counter for DF pkts * @tos - TOS * @mc_ttl - Multicasting TTL - * @is_icsk - is this an inet_connection_sock? * @uc_index - Unicast outgoing device index * @mc_index - Multicast device index * @mc_list - Group array @@ -230,8 +229,7 @@ struct inet_sock { __u8 min_ttl; __u8 mc_ttl; __u8 pmtudisc; - __u8 is_icsk:1, - nodefrag:1; + __u8 nodefrag:1; __u8 bind_address_no_port:1, defer_connect:1; /* Indicates that fastopen_connect is set * and cookie exists so we defer connect @@ -271,6 +269,7 @@ enum { INET_FLAGS_MC_LOOP = 13, INET_FLAGS_MC_ALL = 14, INET_FLAGS_TRANSPARENT = 15, + INET_FLAGS_IS_ICSK = 16, }; /* cmsg flags for inet */ -- cgit v1.2.3 From f04b8d3478a3a63f17d8dc0dc6da16a828b48df0 Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Wed, 16 Aug 2023 08:15:43 +0000 Subject: inet: move inet->nodefrag to inet->inet_flags IP_NODEFRAG socket option can now be set/read without locking the socket. 
Signed-off-by: Eric Dumazet Acked-by: Soheil Hassas Yeganeh Reviewed-by: Simon Horman Signed-off-by: David S. Miller --- include/net/inet_sock.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include') diff --git a/include/net/inet_sock.h b/include/net/inet_sock.h index 38f7fc1c4dac..0e6e1b017efb 100644 --- a/include/net/inet_sock.h +++ b/include/net/inet_sock.h @@ -229,7 +229,6 @@ struct inet_sock { __u8 min_ttl; __u8 mc_ttl; __u8 pmtudisc; - __u8 nodefrag:1; __u8 bind_address_no_port:1, defer_connect:1; /* Indicates that fastopen_connect is set * and cookie exists so we defer connect @@ -270,6 +269,7 @@ enum { INET_FLAGS_MC_ALL = 14, INET_FLAGS_TRANSPARENT = 15, INET_FLAGS_IS_ICSK = 16, + INET_FLAGS_NODEFRAG = 17, }; /* cmsg flags for inet */ -- cgit v1.2.3 From ca571e2eb7eb6202aa367e01c811aab023272f38 Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Wed, 16 Aug 2023 08:15:44 +0000 Subject: inet: move inet->bind_address_no_port to inet->inet_flags IP_BIND_ADDRESS_NO_PORT socket option can now be set/read without locking the socket. Signed-off-by: Eric Dumazet Acked-by: Soheil Hassas Yeganeh Reviewed-by: Simon Horman Signed-off-by: David S. 
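
For readers unfamiliar with the pattern this series applies, here is a
hedged user-space analogy using C11 atomics. It is not the kernel
implementation: the demo_* names are hypothetical, and atomic_fetch_or()
and atomic_fetch_and() merely stand in for the kernel bit helpers. The
point it tries to show is that each flag update is a single atomic
read-modify-write on one word, so no socket lock is needed to keep
neighbouring bits intact.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical bit numbers, chosen to echo the enum in the patch. */
enum {
	DEMO_FLAGS_RECVERR = 9,
	DEMO_FLAGS_NODEFRAG = 17,
};

/* Mock socket: all single-bit options share one atomic word. */
struct demo_inet_sock {
	atomic_ulong flags;
};

static inline void demo_set_bit(int nr, struct demo_inet_sock *inet)
{
	assert(nr >= 0 && nr < 32);
	atomic_fetch_or(&inet->flags, 1UL << nr);
}

static inline void demo_clear_bit(int nr, struct demo_inet_sock *inet)
{
	assert(nr >= 0 && nr < 32);
	atomic_fetch_and(&inet->flags, ~(1UL << nr));
}

static inline bool demo_test_bit(int nr, struct demo_inet_sock *inet)
{
	assert(nr >= 0 && nr < 32);
	return (atomic_load(&inet->flags) >> nr) & 1UL;
}

/* Set or clear one bit depending on val, as a setsockopt path might. */
static inline void demo_assign_bit(int nr, struct demo_inet_sock *inet,
				   bool val)
{
	if (val)
		demo_set_bit(nr, inet);
	else
		demo_clear_bit(nr, inet);
}
```

By contrast, a plain C bit field shares its storage unit with its
neighbours, so two unlocked writers to different fields can overwrite
each other's update.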
Miller --- include/net/inet_sock.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'include') diff --git a/include/net/inet_sock.h b/include/net/inet_sock.h index 0e6e1b017efb..5eca2e70cbb2 100644 --- a/include/net/inet_sock.h +++ b/include/net/inet_sock.h @@ -229,8 +229,7 @@ struct inet_sock { __u8 min_ttl; __u8 mc_ttl; __u8 pmtudisc; - __u8 bind_address_no_port:1, - defer_connect:1; /* Indicates that fastopen_connect is set + __u8 defer_connect:1; /* Indicates that fastopen_connect is set * and cookie exists so we defer connect * until first data frame is written */ @@ -270,6 +269,7 @@ enum { INET_FLAGS_TRANSPARENT = 15, INET_FLAGS_IS_ICSK = 16, INET_FLAGS_NODEFRAG = 17, + INET_FLAGS_BIND_ADDRESS_NO_PORT = 18, }; /* cmsg flags for inet */ -- cgit v1.2.3 From 08e39c0dfa29f233f5a621f7d274b793a080c769 Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Wed, 16 Aug 2023 08:15:45 +0000 Subject: inet: move inet->defer_connect to inet->inet_flags Make room in struct inet_sock by removing this bit field, using one available bit in inet_flags instead. Also move local_port_range to fill the resulting hole, saving 8 bytes on 64bit arches. Signed-off-by: Eric Dumazet Acked-by: Soheil Hassas Yeganeh Reviewed-by: Simon Horman Reviewed-by: Matthieu Baerts Signed-off-by: David S. 
Miller
---
 include/net/inet_sock.h | 10 ++++------
 1 file changed, 4 insertions(+), 6 deletions(-)

(limited to 'include')

diff --git a/include/net/inet_sock.h b/include/net/inet_sock.h
index 5eca2e70cbb2..acbb93d7607a 100644
--- a/include/net/inet_sock.h
+++ b/include/net/inet_sock.h
@@ -229,21 +229,18 @@ struct inet_sock {
 	__u8 min_ttl;
 	__u8 mc_ttl;
 	__u8 pmtudisc;
-	__u8 defer_connect:1; /* Indicates that fastopen_connect is set
-			       * and cookie exists so we defer connect
-			       * until first data frame is written
-			       */
 	__u8 rcv_tos;
 	__u8 convert_csum;
 	int uc_index;
 	int mc_index;
 	__be32 mc_addr;
-	struct ip_mc_socklist __rcu *mc_list;
-	struct inet_cork_full cork;
 	struct {
 		__u16 lo;
 		__u16 hi;
 	} local_port_range;
+
+	struct ip_mc_socklist __rcu *mc_list;
+	struct inet_cork_full cork;
 };

 #define IPCORK_OPT 1 /* ip-options has been held in ipcork.opt */
@@ -270,6 +267,7 @@ enum {
 	INET_FLAGS_IS_ICSK = 16,
 	INET_FLAGS_NODEFRAG = 17,
 	INET_FLAGS_BIND_ADDRESS_NO_PORT = 18,
+	INET_FLAGS_DEFER_CONNECT = 19,
 };

 /* cmsg flags for inet */
--
cgit v1.2.3

From ac8a52962164a50e693fa021d3564d7745b83a7f Mon Sep 17 00:00:00 2001
From: Abel Wu
Date: Mon, 14 Aug 2023 15:09:11 +0800
Subject: net-memcg: Fix scope of sockmem pressure indicators

There are currently two indicators of socket memory pressure inside
struct mem_cgroup, socket_pressure and tcpmem_pressure, indicating
memory reclaim pressure in memcg->memory and ->tcpmem respectively.

In legacy mode (cgroupv1), socket memory is charged to ->tcpmem, which
is independent of ->memory, so socket_pressure has nothing to do with
the socket's pressure at all. Taking socket_pressure into consideration
in legacy mode can even make things worse, as pressure in ->memory can
lead to premature reclamation/throttling in the socket.

In the default mode (cgroupv2), socket memory is charged to ->memory,
and ->tcpmem/->tcpmem_pressure are simply not used.
So {socket,tcpmem}_pressure are only used in default/legacy mode
respectively for indicating socket memory pressure. This patch fixes
the pieces of code that make mixed use of both.

Fixes: 8e8ae645249b ("mm: memcontrol: hook up vmpressure to socket pressure")
Signed-off-by: Abel Wu
Acked-by: Shakeel Butt
Signed-off-by: David S. Miller
---
 include/linux/memcontrol.h | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

(limited to 'include')

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 5818af8eca5a..dbf26bc89dd4 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -284,6 +284,11 @@ struct mem_cgroup {
 	atomic_long_t memory_events[MEMCG_NR_MEMORY_EVENTS];
 	atomic_long_t memory_events_local[MEMCG_NR_MEMORY_EVENTS];

+	/*
+	 * Hint of reclaim pressure for socket memroy management. Note
+	 * that this indicator should NOT be used in legacy cgroup mode
+	 * where socket memory is accounted/charged separately.
+	 */
 	unsigned long socket_pressure;

 	/* Legacy tcp memory accounting */
@@ -1727,8 +1732,8 @@ void mem_cgroup_sk_alloc(struct sock *sk);
 void mem_cgroup_sk_free(struct sock *sk);
 static inline bool mem_cgroup_under_socket_pressure(struct mem_cgroup *memcg)
 {
-	if (!cgroup_subsys_on_dfl(memory_cgrp_subsys) && memcg->tcpmem_pressure)
-		return true;
+	if (!cgroup_subsys_on_dfl(memory_cgrp_subsys))
+		return !!memcg->tcpmem_pressure;
 	do {
 		if (time_before(jiffies, READ_ONCE(memcg->socket_pressure)))
 			return true;
--
cgit v1.2.3

From 3dec89b14d37ee635e772636dad3f09f78f1ab87 Mon Sep 17 00:00:00 2001
From: Kui-Feng Lee
Date: Tue, 15 Aug 2023 11:07:05 -0700
Subject: net/ipv6: Remove expired routes with a separated list of routes.

FIB6 GC walks trees of fib6_tables to remove expired routes. Walking a tree
can be expensive if the number of routes in a table is big, even if most of
them are permanent. Checking routes in a separated list of routes having
expiration will avoid this potential issue.
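
As a rough illustration of the idea (not the kernel code), the userspace
sketch below models a table that links only routes with an expiration onto a
separate list, so GC never visits permanent routes. All names here
(struct route, struct table, route_set_expires(), route_clean_expires(),
table_gc()) are made up for this example; locking is omitted and a plain
integer comparison stands in for jiffies handling.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: these are not the kernel's fib6 structures. */
struct route {
	unsigned long expires;	 /* 0 means the route is permanent */
	struct route *gc_next;	 /* next route on the GC candidate list */
	struct route **gc_pprev; /* address of the pointer pointing at us */
	bool removed;		 /* set once GC has expired the route */
};

struct table {
	struct route *gc_head;	 /* only routes with an expiration */
};

/* Set an expiration time; link the route into the GC list only once. */
static void route_set_expires(struct table *tb, struct route *rt,
			      unsigned long expires)
{
	if (!rt->gc_pprev) {
		rt->gc_next = tb->gc_head;
		if (tb->gc_head)
			tb->gc_head->gc_pprev = &rt->gc_next;
		tb->gc_head = rt;
		rt->gc_pprev = &tb->gc_head;
	}
	rt->expires = expires;
}

/* Make the route permanent again and unlink it from the GC list. */
static void route_clean_expires(struct route *rt)
{
	if (rt->gc_pprev) {
		*rt->gc_pprev = rt->gc_next;
		if (rt->gc_next)
			rt->gc_next->gc_pprev = rt->gc_pprev;
		rt->gc_next = NULL;
		rt->gc_pprev = NULL;
	}
	rt->expires = 0;
}

/* Walk only the GC candidates and expire those that are due. */
static size_t table_gc(struct table *tb, unsigned long now)
{
	struct route *rt = tb->gc_head, *next;
	size_t expired = 0;

	while (rt) {
		next = rt->gc_next;	/* saved: rt may be unlinked below */
		if (rt->expires <= now) {
			route_clean_expires(rt);
			rt->removed = true;
			expired++;
		}
		rt = next;
	}
	return expired;
}
```

The cost of table_gc() depends on the number of routes that carry an
expiration, not on the total number of routes in the table.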
Signed-off-by: Kui-Feng Lee
Reviewed-by: David Ahern
Signed-off-by: David S. Miller
---
 include/net/ip6_fib.h | 64 ++++++++++++++++++++++++++++++++++++++++-----------
 1 file changed, 51 insertions(+), 13 deletions(-)

(limited to 'include')

diff --git a/include/net/ip6_fib.h b/include/net/ip6_fib.h
index 05e6f756feaf..c9ff23cf313e 100644
--- a/include/net/ip6_fib.h
+++ b/include/net/ip6_fib.h
@@ -179,6 +179,9 @@ struct fib6_info {

 	refcount_t fib6_ref;
 	unsigned long expires;
+
+	struct hlist_node gc_link;
+
 	struct dst_metrics *fib6_metrics;
 #define fib6_pmtu fib6_metrics->metrics[RTAX_MTU-1]

@@ -247,19 +250,6 @@ static inline bool fib6_requires_src(const struct fib6_info *rt)
 	return rt->fib6_src.plen > 0;
 }

-static inline void fib6_clean_expires(struct fib6_info *f6i)
-{
-	f6i->fib6_flags &= ~RTF_EXPIRES;
-	f6i->expires = 0;
-}
-
-static inline void fib6_set_expires(struct fib6_info *f6i,
-				    unsigned long expires)
-{
-	f6i->expires = expires;
-	f6i->fib6_flags |= RTF_EXPIRES;
-}
-
 static inline bool fib6_check_expired(const struct fib6_info *f6i)
 {
 	if (f6i->fib6_flags & RTF_EXPIRES)
@@ -267,6 +257,11 @@ static inline bool fib6_check_expired(const struct fib6_info *f6i)
 	return false;
 }

+static inline bool fib6_has_expires(const struct fib6_info *f6i)
+{
+	return f6i->fib6_flags & RTF_EXPIRES;
+}
+
 /* Function to safely get fn->fn_sernum for passed in rt
  * and store result in passed in cookie.
  * Return true if we can get cookie safely
@@ -388,6 +383,7 @@ struct fib6_table {
 	struct inet_peer_base tb6_peers;
 	unsigned int flags;
 	unsigned int fib_seq;
+	struct hlist_head tb6_gc_hlist; /* GC candidates */
 #define RT6_TABLE_HAS_DFLT_ROUTER BIT(0)
 };

@@ -504,6 +500,48 @@ void fib6_gc_cleanup(void);

 int fib6_init(void);

+/* fib6_info must be locked by the caller, and fib6_info->fib6_table can be
+ * NULL.
+ */
+static inline void fib6_set_expires_locked(struct fib6_info *f6i,
+					   unsigned long expires)
+{
+	struct fib6_table *tb6;
+
+	tb6 = f6i->fib6_table;
+	f6i->expires = expires;
+	if (tb6 && !fib6_has_expires(f6i))
+		hlist_add_head(&f6i->gc_link, &tb6->tb6_gc_hlist);
+	f6i->fib6_flags |= RTF_EXPIRES;
+}
+
+/* fib6_info must be locked by the caller, and fib6_info->fib6_table can be
+ * NULL. If fib6_table is NULL, the fib6_info will no be inserted into the
+ * list of GC candidates until it is inserted into a table.
+ */
+static inline void fib6_set_expires(struct fib6_info *f6i,
+				    unsigned long expires)
+{
+	spin_lock_bh(&f6i->fib6_table->tb6_lock);
+	fib6_set_expires_locked(f6i, expires);
+	spin_unlock_bh(&f6i->fib6_table->tb6_lock);
+}
+
+static inline void fib6_clean_expires_locked(struct fib6_info *f6i)
+{
+	if (fib6_has_expires(f6i))
+		hlist_del_init(&f6i->gc_link);
+	f6i->fib6_flags &= ~RTF_EXPIRES;
+	f6i->expires = 0;
+}
+
+static inline void fib6_clean_expires(struct fib6_info *f6i)
+{
+	spin_lock_bh(&f6i->fib6_table->tb6_lock);
+	fib6_clean_expires_locked(f6i);
+	spin_unlock_bh(&f6i->fib6_table->tb6_lock);
+}
+
 struct ipv6_route_iter {
 	struct seq_net_private p;
 	struct fib6_walker w;
--
cgit v1.2.3

From dd2e84bb3804103ad1c26a21deb4b35b0e166746 Mon Sep 17 00:00:00 2001
From: Alexander Lobakin
Date: Fri, 28 Jul 2023 17:52:05 +0200
Subject: virtchnl: fix fake 1-elem arrays in structs allocated as `nents + 1` - 1

The two most problematic virtchnl structures are virtchnl_rss_key and
virtchnl_rss_lut. Their "flex" arrays have the type of u8, thus, when
allocating / checking, the actual size is calculated as `sizeof +
nents - 1 byte`. But their sizeof() is not 1 byte larger than the size
of such structure with proper flex array, it's two bytes larger due to
the padding. That said, their size is always 1 byte larger unless
there are no tail elements -- then it's +2 bytes.
Add virtchnl_struct_size() macro which will handle this case (and later
other cases as well).
Make its calling conv the same as we call
struct_size() to allow it to be drop-in, even though it's unlikely to
become possible to switch to generic API. The macro will calculate a
proper size of a structure with a flex array at the end, so that it
becomes transparent for the compilers, but add the difference from the
old values, so that the real size of sorta-ABI-messages doesn't change.
Use it on the allocation side in IAVF and the receiving side (defined
as static inline in virtchnl.h) for the mentioned two structures.

Signed-off-by: Alexander Lobakin
Reviewed-by: Kees Cook
Tested-by: Rafal Romanowski
Signed-off-by: Tony Nguyen
---
 include/linux/avf/virtchnl.h | 31 +++++++++++++++++++++++--------
 1 file changed, 23 insertions(+), 8 deletions(-)

(limited to 'include')

diff --git a/include/linux/avf/virtchnl.h b/include/linux/avf/virtchnl.h
index c15221dcb75e..3ab207c14809 100644
--- a/include/linux/avf/virtchnl.h
+++ b/include/linux/avf/virtchnl.h
@@ -866,18 +866,20 @@ VIRTCHNL_CHECK_STRUCT_LEN(4, virtchnl_promisc_info);
 struct virtchnl_rss_key {
 	u16 vsi_id;
 	u16 key_len;
-	u8 key[1]; /* RSS hash key, packed bytes */
+	u8 key[]; /* RSS hash key, packed bytes */
 };

-VIRTCHNL_CHECK_STRUCT_LEN(6, virtchnl_rss_key);
+VIRTCHNL_CHECK_STRUCT_LEN(4, virtchnl_rss_key);
+#define virtchnl_rss_key_LEGACY_SIZEOF 6

 struct virtchnl_rss_lut {
 	u16 vsi_id;
 	u16 lut_entries;
-	u8 lut[1]; /* RSS lookup table */
+	u8 lut[]; /* RSS lookup table */
 };

-VIRTCHNL_CHECK_STRUCT_LEN(6, virtchnl_rss_lut);
+VIRTCHNL_CHECK_STRUCT_LEN(4, virtchnl_rss_lut);
+#define virtchnl_rss_lut_LEGACY_SIZEOF 6

 /* VIRTCHNL_OP_GET_RSS_HENA_CAPS
  * VIRTCHNL_OP_SET_RSS_HENA
@@ -1367,6 +1369,17 @@ struct virtchnl_fdir_del {

 VIRTCHNL_CHECK_STRUCT_LEN(12, virtchnl_fdir_del);

+#define __vss_byone(p, member, count, old) \
+	(struct_size(p, member, count) + (old - 1 - struct_size(p, member, 0)))
+
+#define __vss(type, func, p, member, count) \
+	struct type: func(p, member, count, type##_LEGACY_SIZEOF)
+
+#define virtchnl_struct_size(p, m, c) \
+	_Generic(*p, \
+		 __vss(virtchnl_rss_key, __vss_byone, p, m, c), \
+		 __vss(virtchnl_rss_lut, __vss_byone, p, m, c))
+
 /**
  * virtchnl_vc_validate_vf_msg
  * @ver: Virtchnl version info
@@ -1479,19 +1492,21 @@ virtchnl_vc_validate_vf_msg(struct virtchnl_version_info *ver, u32 v_opcode,
 		}
 		break;
 	case VIRTCHNL_OP_CONFIG_RSS_KEY:
-		valid_len = sizeof(struct virtchnl_rss_key);
+		valid_len = virtchnl_rss_key_LEGACY_SIZEOF;
 		if (msglen >= valid_len) {
 			struct virtchnl_rss_key *vrk =
 				(struct virtchnl_rss_key *)msg;
-			valid_len += vrk->key_len - 1;
+			valid_len = virtchnl_struct_size(vrk, key,
+							 vrk->key_len);
 		}
 		break;
 	case VIRTCHNL_OP_CONFIG_RSS_LUT:
-		valid_len = sizeof(struct virtchnl_rss_lut);
+		valid_len = virtchnl_rss_lut_LEGACY_SIZEOF;
 		if (msglen >= valid_len) {
 			struct virtchnl_rss_lut *vrl =
 				(struct virtchnl_rss_lut *)msg;
-			valid_len += vrl->lut_entries - 1;
+			valid_len = virtchnl_struct_size(vrl, lut,
+							 vrl->lut_entries);
 		}
 		break;
 	case VIRTCHNL_OP_GET_RSS_HENA_CAPS:
--
cgit v1.2.3

From 5e7f59fa07f86f554c301c7a383bba54d5ef9819 Mon Sep 17 00:00:00 2001
From: Alexander Lobakin
Date: Fri, 28 Jul 2023 17:52:06 +0200
Subject: virtchnl: fix fake 1-elem arrays in structures allocated as `nents + 1`

There are five virtchnl structures, which are allocated and checked in
the code as `nents + 1`, meaning that they always have memory for one
excessive element regardless of their actual number. This comes from
that their sizeof() includes space for 1 element and then they get
allocated via struct_size() or its open-coded equivalents, passing
the actual number of elements.
Expand virtchnl_struct_size() to handle such structures and replace
those 1-elem arrays with proper flex ones. Also fix several places
which open-code %IAVF_VIRTCHNL_VF_RESOURCE_SIZE. Finally, let the
virtchnl_ether_addr_list size be computed automatically when there's
not enough space for the whole list, otherwise we have to open-code
reverse struct_size() logics.
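
The size bookkeeping above can be checked with a small userspace sketch.
It is only an illustration: the struct and helper names are made up, the
usual ABI (2-byte aligned u16) is assumed, and struct_size() is open-coded
without the overflow saturation the kernel helper provides. The vlan list
pair models the `nents + 1` case handled by __vss_full here, the rss key
pair models the __vss_byone case from the previous patch.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the old and new virtchnl_vlan_filter_list. */
struct vlan_list_old {
	uint16_t vsi_id;
	uint16_t num_elements;
	uint16_t vlan_id[1];	/* fake 1-elem array: sizeof() counts one slot */
};

struct vlan_list_new {
	uint16_t vsi_id;
	uint16_t num_elements;
	uint16_t vlan_id[];	/* proper flexible array member */
};

/* Hypothetical stand-ins for the old and new virtchnl_rss_key. */
struct rss_key_old {
	uint16_t vsi_id;
	uint16_t key_len;
	uint8_t key[1];		/* 5 bytes of members, padded to 6 */
};

struct rss_key_new {
	uint16_t vsi_id;
	uint16_t key_len;
	uint8_t key[];
};

/* Same role as VIRTCHNL_CHECK_STRUCT_LEN() and the *_LEGACY_SIZEOF defines. */
_Static_assert(sizeof(struct vlan_list_old) == 6, "legacy vlan list size");
_Static_assert(sizeof(struct vlan_list_new) == 4, "flex vlan list size");
_Static_assert(sizeof(struct rss_key_old) == 6, "legacy rss key size");
_Static_assert(sizeof(struct rss_key_new) == 4, "flex rss key size");

/* Old open-coded length: sizeof() plus n elements, i.e. `nents + 1`. */
static size_t vlan_len_old(size_t n)
{
	return sizeof(struct vlan_list_old) + n * sizeof(uint16_t);
}

/* __vss_full-style length: flex size plus (legacy sizeof - header size). */
static size_t vlan_len_new(size_t n)
{
	size_t hdr = sizeof(struct vlan_list_new);

	return hdr + n * sizeof(uint16_t) + (6 - hdr);
}

/* Old open-coded length for the u8 case: sizeof() + n - 1. */
static size_t key_len_old(size_t n)
{
	return sizeof(struct rss_key_old) + n - 1;
}

/* __vss_byone-style length: flex size plus (legacy sizeof - 1 - header). */
static size_t key_len_new(size_t n)
{
	size_t hdr = sizeof(struct rss_key_new);

	return hdr + n + (6 - 1 - hdr);
}
```

For every element count the old and new helpers should agree, which is the
property that keeps the on-wire message sizes unchanged.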
Signed-off-by: Alexander Lobakin
Reviewed-by: Kees Cook
Tested-by: Rafal Romanowski
Signed-off-by: Tony Nguyen
---
 include/linux/avf/virtchnl.h | 57 +++++++++++++++++++++++++++-----------------
 1 file changed, 35 insertions(+), 22 deletions(-)

(limited to 'include')

diff --git a/include/linux/avf/virtchnl.h b/include/linux/avf/virtchnl.h
index 3ab207c14809..c1a20b533fc0 100644
--- a/include/linux/avf/virtchnl.h
+++ b/include/linux/avf/virtchnl.h
@@ -268,10 +268,11 @@ struct virtchnl_vf_resource {
 	u32 rss_key_size;
 	u32 rss_lut_size;

-	struct virtchnl_vsi_resource vsi_res[1];
+	struct virtchnl_vsi_resource vsi_res[];
 };

-VIRTCHNL_CHECK_STRUCT_LEN(36, virtchnl_vf_resource);
+VIRTCHNL_CHECK_STRUCT_LEN(20, virtchnl_vf_resource);
+#define virtchnl_vf_resource_LEGACY_SIZEOF 36

 /* VIRTCHNL_OP_CONFIG_TX_QUEUE
  * VF sends this message to set up parameters for one TX queue.
@@ -340,10 +341,11 @@ struct virtchnl_vsi_queue_config_info {
 	u16 vsi_id;
 	u16 num_queue_pairs;
 	u32 pad;
-	struct virtchnl_queue_pair_info qpair[1];
+	struct virtchnl_queue_pair_info qpair[];
 };

-VIRTCHNL_CHECK_STRUCT_LEN(72, virtchnl_vsi_queue_config_info);
+VIRTCHNL_CHECK_STRUCT_LEN(8, virtchnl_vsi_queue_config_info);
+#define virtchnl_vsi_queue_config_info_LEGACY_SIZEOF 72

 /* VIRTCHNL_OP_REQUEST_QUEUES
  * VF sends this message to request the PF to allocate additional queues to
@@ -385,10 +387,11 @@ VIRTCHNL_CHECK_STRUCT_LEN(12, virtchnl_vector_map);

 struct virtchnl_irq_map_info {
 	u16 num_vectors;
-	struct virtchnl_vector_map vecmap[1];
+	struct virtchnl_vector_map vecmap[];
 };

-VIRTCHNL_CHECK_STRUCT_LEN(14, virtchnl_irq_map_info);
+VIRTCHNL_CHECK_STRUCT_LEN(2, virtchnl_irq_map_info);
+#define virtchnl_irq_map_info_LEGACY_SIZEOF 14

 /* VIRTCHNL_OP_ENABLE_QUEUES
  * VIRTCHNL_OP_DISABLE_QUEUES
@@ -459,10 +462,11 @@ VIRTCHNL_CHECK_STRUCT_LEN(8, virtchnl_ether_addr);
 struct virtchnl_ether_addr_list {
 	u16 vsi_id;
 	u16 num_elements;
-	struct virtchnl_ether_addr list[1];
+	struct virtchnl_ether_addr list[];
 };
-VIRTCHNL_CHECK_STRUCT_LEN(12, virtchnl_ether_addr_list);
+VIRTCHNL_CHECK_STRUCT_LEN(4, virtchnl_ether_addr_list);
+#define virtchnl_ether_addr_list_LEGACY_SIZEOF 12

 /* VIRTCHNL_OP_ADD_VLAN
  * VF sends this message to add one or more VLAN tag filters for receives.
@@ -481,10 +485,11 @@ VIRTCHNL_CHECK_STRUCT_LEN(12, virtchnl_ether_addr_list);
 struct virtchnl_vlan_filter_list {
 	u16 vsi_id;
 	u16 num_elements;
-	u16 vlan_id[1];
+	u16 vlan_id[];
 };

-VIRTCHNL_CHECK_STRUCT_LEN(6, virtchnl_vlan_filter_list);
+VIRTCHNL_CHECK_STRUCT_LEN(4, virtchnl_vlan_filter_list);
+#define virtchnl_vlan_filter_list_LEGACY_SIZEOF 6

 /* This enum is used for all of the VIRTCHNL_VF_OFFLOAD_VLAN_V2_CAPS related
  * structures and opcodes.
@@ -1372,11 +1377,19 @@ VIRTCHNL_CHECK_STRUCT_LEN(12, virtchnl_fdir_del);
 #define __vss_byone(p, member, count, old) \
 	(struct_size(p, member, count) + (old - 1 - struct_size(p, member, 0)))

+#define __vss_full(p, member, count, old) \
+	(struct_size(p, member, count) + (old - struct_size(p, member, 0)))
+
 #define __vss(type, func, p, member, count) \
 	struct type: func(p, member, count, type##_LEGACY_SIZEOF)

 #define virtchnl_struct_size(p, m, c) \
 	_Generic(*p, \
+		 __vss(virtchnl_vf_resource, __vss_full, p, m, c), \
+		 __vss(virtchnl_vsi_queue_config_info, __vss_full, p, m, c), \
+		 __vss(virtchnl_irq_map_info, __vss_full, p, m, c), \
+		 __vss(virtchnl_ether_addr_list, __vss_full, p, m, c), \
+		 __vss(virtchnl_vlan_filter_list, __vss_full, p, m, c), \
 		 __vss(virtchnl_rss_key, __vss_byone, p, m, c), \
 		 __vss(virtchnl_rss_lut, __vss_byone, p, m, c))

@@ -1414,24 +1427,23 @@ virtchnl_vc_validate_vf_msg(struct virtchnl_version_info *ver, u32 v_opcode,
 		valid_len = sizeof(struct virtchnl_rxq_info);
 		break;
 	case VIRTCHNL_OP_CONFIG_VSI_QUEUES:
-		valid_len = sizeof(struct virtchnl_vsi_queue_config_info);
+		valid_len = virtchnl_vsi_queue_config_info_LEGACY_SIZEOF;
 		if (msglen >= valid_len) {
 			struct virtchnl_vsi_queue_config_info *vqc =
 				(struct virtchnl_vsi_queue_config_info *)msg;
-			valid_len += (vqc->num_queue_pairs *
-				      sizeof(struct
-					     virtchnl_queue_pair_info));
+			valid_len = virtchnl_struct_size(vqc, qpair,
+							 vqc->num_queue_pairs);
 			if (vqc->num_queue_pairs == 0)
 				err_msg_format = true;
 		}
 		break;
 	case VIRTCHNL_OP_CONFIG_IRQ_MAP:
-		valid_len = sizeof(struct virtchnl_irq_map_info);
+		valid_len = virtchnl_irq_map_info_LEGACY_SIZEOF;
 		if (msglen >= valid_len) {
 			struct virtchnl_irq_map_info *vimi =
 				(struct virtchnl_irq_map_info *)msg;
-			valid_len += (vimi->num_vectors *
-				      sizeof(struct virtchnl_vector_map));
+			valid_len = virtchnl_struct_size(vimi, vecmap,
+							 vimi->num_vectors);
 			if (vimi->num_vectors == 0)
 				err_msg_format = true;
 		}
@@ -1442,23 +1454,24 @@ virtchnl_vc_validate_vf_msg(struct virtchnl_version_info *ver, u32 v_opcode,
 		break;
 	case VIRTCHNL_OP_ADD_ETH_ADDR:
 	case VIRTCHNL_OP_DEL_ETH_ADDR:
-		valid_len = sizeof(struct virtchnl_ether_addr_list);
+		valid_len = virtchnl_ether_addr_list_LEGACY_SIZEOF;
 		if (msglen >= valid_len) {
 			struct virtchnl_ether_addr_list *veal =
 				(struct virtchnl_ether_addr_list *)msg;
-			valid_len += veal->num_elements *
-				     sizeof(struct virtchnl_ether_addr);
+			valid_len = virtchnl_struct_size(veal, list,
+							 veal->num_elements);
 			if (veal->num_elements == 0)
 				err_msg_format = true;
 		}
 		break;
 	case VIRTCHNL_OP_ADD_VLAN:
 	case VIRTCHNL_OP_DEL_VLAN:
-		valid_len = sizeof(struct virtchnl_vlan_filter_list);
+		valid_len = virtchnl_vlan_filter_list_LEGACY_SIZEOF;
 		if (msglen >= valid_len) {
 			struct virtchnl_vlan_filter_list *vfl =
 				(struct virtchnl_vlan_filter_list *)msg;
-			valid_len += vfl->num_elements * sizeof(u16);
+			valid_len = virtchnl_struct_size(vfl, vlan_id,
+							 vfl->num_elements);
 			if (vfl->num_elements == 0)
 				err_msg_format = true;
 		}
--
cgit v1.2.3

From b0654e64dbaf62f565b5f2b4fbd92202e88dcba3 Mon Sep 17 00:00:00 2001
From: Alexander Lobakin
Date: Fri, 28 Jul 2023 17:52:07 +0200
Subject: virtchnl: fix fake 1-elem arrays for structures allocated as `nents`

Finally, fix 3 structures which are allocated technically correctly, i.e.
the calculated size equals the one that struct_size() would return,
except for sizeof(). For &virtchnl_vlan_filter_list_v2, use the same
approach when there is not enough space as taken previously for
&virtchnl_vlan_filter_list, i.e. let the maximum size be calculated
automatically instead of trying to guesstimate it using maths.

Signed-off-by: Alexander Lobakin
Reviewed-by: Kees Cook
Tested-by: Rafal Romanowski
Signed-off-by: Tony Nguyen
---
 include/linux/avf/virtchnl.h | 39 ++++++++++++++++++++++++---------------
 1 file changed, 24 insertions(+), 15 deletions(-)

(limited to 'include')

diff --git a/include/linux/avf/virtchnl.h b/include/linux/avf/virtchnl.h
index c1a20b533fc0..d0807ad43f93 100644
--- a/include/linux/avf/virtchnl.h
+++ b/include/linux/avf/virtchnl.h
@@ -716,10 +716,11 @@ struct virtchnl_vlan_filter_list_v2 {
 	u16 vport_id;
 	u16 num_elements;
 	u8 pad[4];
-	struct virtchnl_vlan_filter filters[1];
+	struct virtchnl_vlan_filter filters[];
 };

-VIRTCHNL_CHECK_STRUCT_LEN(40, virtchnl_vlan_filter_list_v2);
+VIRTCHNL_CHECK_STRUCT_LEN(8, virtchnl_vlan_filter_list_v2);
+#define virtchnl_vlan_filter_list_v2_LEGACY_SIZEOF 40

 /* VIRTCHNL_OP_ENABLE_VLAN_STRIPPING_V2
  * VIRTCHNL_OP_DISABLE_VLAN_STRIPPING_V2
@@ -918,10 +919,11 @@ VIRTCHNL_CHECK_STRUCT_LEN(16, virtchnl_channel_info);
 struct virtchnl_tc_info {
 	u32 num_tc;
 	u32 pad;
-	struct virtchnl_channel_info list[1];
+	struct virtchnl_channel_info list[];
 };

-VIRTCHNL_CHECK_STRUCT_LEN(24, virtchnl_tc_info);
+VIRTCHNL_CHECK_STRUCT_LEN(8, virtchnl_tc_info);
+#define virtchnl_tc_info_LEGACY_SIZEOF 24

 /* VIRTCHNL_ADD_CLOUD_FILTER
  * VIRTCHNL_DEL_CLOUD_FILTER
@@ -1059,10 +1061,11 @@ VIRTCHNL_CHECK_STRUCT_LEN(12, virtchnl_rdma_qv_info);

 struct virtchnl_rdma_qvlist_info {
 	u32 num_vectors;
-	struct virtchnl_rdma_qv_info qv_info[1];
+	struct virtchnl_rdma_qv_info qv_info[];
 };

-VIRTCHNL_CHECK_STRUCT_LEN(16, virtchnl_rdma_qvlist_info);
+VIRTCHNL_CHECK_STRUCT_LEN(4, virtchnl_rdma_qvlist_info);
+#define virtchnl_rdma_qvlist_info_LEGACY_SIZEOF 16

 /* VF reset states - these are written into the RSTAT register:
  * VFGEN_RSTAT on the VF
@@ -1377,6 +1380,9 @@ VIRTCHNL_CHECK_STRUCT_LEN(12, virtchnl_fdir_del);
 #define __vss_byone(p, member, count, old) \
 	(struct_size(p, member, count) + (old - 1 - struct_size(p, member, 0)))

+#define __vss_byelem(p, member, count, old) \
+	(struct_size(p, member, count - 1) + (old - struct_size(p, member, 0)))
+
 #define __vss_full(p, member, count, old) \
 	(struct_size(p, member, count) + (old - struct_size(p, member, 0)))

@@ -1390,6 +1396,9 @@ VIRTCHNL_CHECK_STRUCT_LEN(12, virtchnl_fdir_del);
 		 __vss(virtchnl_irq_map_info, __vss_full, p, m, c), \
 		 __vss(virtchnl_ether_addr_list, __vss_full, p, m, c), \
 		 __vss(virtchnl_vlan_filter_list, __vss_full, p, m, c), \
+		 __vss(virtchnl_vlan_filter_list_v2, __vss_byelem, p, m, c), \
+		 __vss(virtchnl_tc_info, __vss_byelem, p, m, c), \
+		 __vss(virtchnl_rdma_qvlist_info, __vss_byelem, p, m, c), \
 		 __vss(virtchnl_rss_key, __vss_byone, p, m, c), \
 		 __vss(virtchnl_rss_lut, __vss_byone, p, m, c))

@@ -1495,13 +1504,13 @@ virtchnl_vc_validate_vf_msg(struct virtchnl_version_info *ver, u32 v_opcode,
 	case VIRTCHNL_OP_RELEASE_RDMA_IRQ_MAP:
 		break;
 	case VIRTCHNL_OP_CONFIG_RDMA_IRQ_MAP:
-		valid_len = sizeof(struct virtchnl_rdma_qvlist_info);
+		valid_len = virtchnl_rdma_qvlist_info_LEGACY_SIZEOF;
 		if (msglen >= valid_len) {
 			struct virtchnl_rdma_qvlist_info *qv =
 				(struct virtchnl_rdma_qvlist_info *)msg;

-			valid_len += ((qv->num_vectors - 1) *
-				      sizeof(struct virtchnl_rdma_qv_info));
+			valid_len = virtchnl_struct_size(qv, qv_info,
+							 qv->num_vectors);
 		}
 		break;
 	case VIRTCHNL_OP_CONFIG_RSS_KEY:
@@ -1534,12 +1543,12 @@ virtchnl_vc_validate_vf_msg(struct virtchnl_version_info *ver, u32 v_opcode,
 		valid_len = sizeof(struct virtchnl_vf_res_request);
 		break;
 	case VIRTCHNL_OP_ENABLE_CHANNELS:
-		valid_len = sizeof(struct virtchnl_tc_info);
+		valid_len = virtchnl_tc_info_LEGACY_SIZEOF;
 		if (msglen >= valid_len) {
 			struct virtchnl_tc_info *vti =
 				(struct virtchnl_tc_info *)msg;
-			valid_len += (vti->num_tc - 1) *
-				     sizeof(struct virtchnl_channel_info);
+			valid_len = virtchnl_struct_size(vti, list,
+							 vti->num_tc);
 			if (vti->num_tc == 0)
 				err_msg_format = true;
 		}
@@ -1566,13 +1575,13 @@ virtchnl_vc_validate_vf_msg(struct virtchnl_version_info *ver, u32 v_opcode,
 		break;
 	case VIRTCHNL_OP_ADD_VLAN_V2:
 	case VIRTCHNL_OP_DEL_VLAN_V2:
-		valid_len = sizeof(struct virtchnl_vlan_filter_list_v2);
+		valid_len = virtchnl_vlan_filter_list_v2_LEGACY_SIZEOF;
 		if (msglen >= valid_len) {
 			struct virtchnl_vlan_filter_list_v2 *vfl =
 				(struct virtchnl_vlan_filter_list_v2 *)msg;

-			valid_len += (vfl->num_elements - 1) *
-				     sizeof(struct virtchnl_vlan_filter);
+			valid_len = virtchnl_struct_size(vfl, filters,
+							 vfl->num_elements);

 			if (vfl->num_elements == 0) {
 				err_msg_format = true;
--
cgit v1.2.3

From 4072d97ddc447ce9dd8f7a39cdf6f92d2031bb01 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Fran=C3=A7ois=20Michel?=
Date: Tue, 15 Aug 2023 11:23:38 +0200
Subject: netem: add prng attribute to netem_sched_data
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Add prng attribute to struct netem_sched_data and
allow setting the seed of the PRNG through netlink
using the new TCA_NETEM_PRNG_SEED attribute.
The PRNG attribute is not actually used yet.
Signed-off-by: François Michel
Reviewed-by: Simon Horman
Acked-by: Stephen Hemminger
Link: https://lore.kernel.org/r/20230815092348.1449179-2-francois.michel@uclouvain.be
Signed-off-by: Jakub Kicinski
---
 include/uapi/linux/pkt_sched.h | 1 +
 1 file changed, 1 insertion(+)

(limited to 'include')

diff --git a/include/uapi/linux/pkt_sched.h b/include/uapi/linux/pkt_sched.h
index 00f6ff0aff1f..3f85ae578056 100644
--- a/include/uapi/linux/pkt_sched.h
+++ b/include/uapi/linux/pkt_sched.h
@@ -603,6 +603,7 @@ enum {
 	TCA_NETEM_JITTER64,
 	TCA_NETEM_SLOT,
 	TCA_NETEM_SLOT_DIST,
+	TCA_NETEM_PRNG_SEED,
 	__TCA_NETEM_MAX,
 };

--
cgit v1.2.3

From a171fbec88a2c730b108c7147ac5e7b2f5a02b47 Mon Sep 17 00:00:00 2001
From: Yan Zhai
Date: Thu, 17 Aug 2023 19:58:14 -0700
Subject: lwt: Check LWTUNNEL_XMIT_CONTINUE strictly

LWTUNNEL_XMIT_CONTINUE is implicitly assumed in ip(6)_finish_output2,
such that any positive return value from a xmit hook could cause
unexpected continue behavior, despite that related skb may have been
freed. This could be error-prone for future xmit hook ops. One of the
possible errors is to return statuses of dst_output directly.

To make the code safer, redefine LWTUNNEL_XMIT_CONTINUE value to
distinguish from dst_output statuses and check the continue
condition explicitly.
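
A minimal userspace sketch of the hazard, for illustration only. The
NET_XMIT_* values are the ones from linux/netdevice.h; the two helper
functions are an assumed, simplified model of the caller-side checks
described above, not the actual ip(6)_finish_output2 code, and the
OLD_/NEW_ prefixed names are made up for this example.

```c
#include <stdbool.h>

/* dst_output()-style statuses, values as in linux/netdevice.h. */
#define NET_XMIT_SUCCESS	0x00
#define NET_XMIT_DROP		0x01
#define NET_XMIT_CN		0x02

/* Enum values before and after this patch (names prefixed for the demo). */
enum { OLD_LWTUNNEL_XMIT_DONE, OLD_LWTUNNEL_XMIT_CONTINUE };
enum { NEW_LWTUNNEL_XMIT_DONE, NEW_LWTUNNEL_XMIT_CONTINUE = 0x100 };

/* Assumed model of the implicit check: anything positive means "continue". */
static bool old_caller_continues(int res)
{
	return !(res < 0 || res == OLD_LWTUNNEL_XMIT_DONE);
}

/* Assumed model of the strict check: only the dedicated value continues. */
static bool new_caller_continues(int res)
{
	return res == NEW_LWTUNNEL_XMIT_CONTINUE;
}
```

With the old numbering, a hook that returns NET_XMIT_DROP (1) after the skb
was freed is indistinguishable from LWTUNNEL_XMIT_CONTINUE (1), so the
caller would keep using the skb; with 0x100 and an explicit comparison the
two cases no longer overlap.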
Fixes: 3a0af8fd61f9 ("bpf: BPF for lightweight tunnel infrastructure")
Suggested-by: Dan Carpenter
Signed-off-by: Yan Zhai
Signed-off-by: Daniel Borkmann
Link: https://lore.kernel.org/bpf/96b939b85eda00e8df4f7c080f770970a4c5f698.1692326837.git.yan@cloudflare.com
---
 include/net/lwtunnel.h | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

(limited to 'include')

diff --git a/include/net/lwtunnel.h b/include/net/lwtunnel.h
index 6f15e6fa154e..53bd2d02a4f0 100644
--- a/include/net/lwtunnel.h
+++ b/include/net/lwtunnel.h
@@ -16,9 +16,12 @@
 #define LWTUNNEL_STATE_INPUT_REDIRECT BIT(1)
 #define LWTUNNEL_STATE_XMIT_REDIRECT BIT(2)

+/* LWTUNNEL_XMIT_CONTINUE should be distinguishable from dst_output return
+ * values (NET_XMIT_xxx and NETDEV_TX_xxx in linux/netdevice.h) for safety.
+ */
 enum {
 	LWTUNNEL_XMIT_DONE,
-	LWTUNNEL_XMIT_CONTINUE,
+	LWTUNNEL_XMIT_CONTINUE = 0x100,
 };


--
cgit v1.2.3

From bbed596c74a527e0d0d30bc56732f26407f12d6e Mon Sep 17 00:00:00 2001
From: Guangguan Wang
Date: Thu, 17 Aug 2023 21:20:32 +0800
Subject: net/smc: Extend SMCR v2 linkgroup netlink attribute

Add SMC_NLA_LGR_R_V2_MAX_CONNS and SMC_NLA_LGR_R_V2_MAX_LINKS
to SMCR v2 linkgroup netlink attribute SMC_NLA_LGR_R_V2 for
linkgroup's detail info showing.

Signed-off-by: Guangguan Wang
Reviewed-by: Jan Karcher
Signed-off-by: David S. Miller
---
 include/uapi/linux/smc.h | 2 ++
 1 file changed, 2 insertions(+)

(limited to 'include')

diff --git a/include/uapi/linux/smc.h b/include/uapi/linux/smc.h
index bb4dacca31e7..837fcd4b0abc 100644
--- a/include/uapi/linux/smc.h
+++ b/include/uapi/linux/smc.h
@@ -107,6 +107,8 @@ enum {
 enum {
 	SMC_NLA_LGR_R_V2_UNSPEC,
 	SMC_NLA_LGR_R_V2_DIRECT, /* u8 */
+	SMC_NLA_LGR_R_V2_MAX_CONNS, /* u8 */
+	SMC_NLA_LGR_R_V2_MAX_LINKS, /* u8 */
 	__SMC_NLA_LGR_R_V2_MAX,
 	SMC_NLA_LGR_R_V2_MAX = __SMC_NLA_LGR_R_V2_MAX - 1
 };
--
cgit v1.2.3

From 4025d3e73abde4f65f4b04d4b1d8449b00e31473 Mon Sep 17 00:00:00 2001
From: Eric Dumazet
Date: Fri, 18 Aug 2023 09:40:39 +0000
Subject: net: add skb_queue_purge_reason and __skb_queue_purge_reason

skb_queue_purge() and __skb_queue_purge() become wrappers
around the new generic functions.

New SKB_DROP_REASON_QUEUE_PURGE drop reason is added,
but users can start adding more specific reasons.

Signed-off-by: Eric Dumazet
Reviewed-by: Simon Horman
Signed-off-by: David S. Miller
---
 include/linux/skbuff.h | 23 +++++++++++++++++++----
 include/net/dropreason-core.h | 3 +++
 2 files changed, 22 insertions(+), 4 deletions(-)

(limited to 'include')

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index aa57e2eca33b..9aec136bc690 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -3149,20 +3149,35 @@ static inline int skb_orphan_frags_rx(struct sk_buff *skb, gfp_t gfp_mask)
 }

 /**
- * __skb_queue_purge - empty a list
+ * __skb_queue_purge_reason - empty a list
  * @list: list to empty
+ * @reason: drop reason
  *
  * Delete all buffers on an &sk_buff list. Each buffer is removed from
  * the list and one reference dropped. This function does not take the
  * list lock and the caller must hold the relevant locks to use it.
  */
-static inline void __skb_queue_purge(struct sk_buff_head *list)
+static inline void __skb_queue_purge_reason(struct sk_buff_head *list,
+					    enum skb_drop_reason reason)
 {
 	struct sk_buff *skb;
+
 	while ((skb = __skb_dequeue(list)) != NULL)
-		kfree_skb(skb);
+		kfree_skb_reason(skb, reason);
+}
+
+static inline void __skb_queue_purge(struct sk_buff_head *list)
+{
+	__skb_queue_purge_reason(list, SKB_DROP_REASON_QUEUE_PURGE);
+}
+
+void skb_queue_purge_reason(struct sk_buff_head *list,
+			    enum skb_drop_reason reason);
+
+static inline void skb_queue_purge(struct sk_buff_head *list)
+{
+	skb_queue_purge_reason(list, SKB_DROP_REASON_QUEUE_PURGE);
 }
-void skb_queue_purge(struct sk_buff_head *list);

 unsigned int skb_rbtree_purge(struct rb_root *root);

diff --git a/include/net/dropreason-core.h b/include/net/dropreason-core.h
index f291a3b0f9e5..a587e83fc169 100644
--- a/include/net/dropreason-core.h
+++ b/include/net/dropreason-core.h
@@ -79,6 +79,7 @@
 	FN(IPV6_NDISC_BAD_CODE) \
 	FN(IPV6_NDISC_BAD_OPTIONS) \
 	FN(IPV6_NDISC_NS_OTHERHOST) \
+	FN(QUEUE_PURGE) \
 	FNe(MAX)

 /**
@@ -342,6 +343,8 @@ enum skb_drop_reason {
 	 * for another host.
 	 */
 	SKB_DROP_REASON_IPV6_NDISC_NS_OTHERHOST,
+	/** @SKB_DROP_REASON_QUEUE_PURGE: bulk free. */
+	SKB_DROP_REASON_QUEUE_PURGE,
 	/**
 	 * @SKB_DROP_REASON_MAX: the maximum of core drop reasons, which
 	 * shouldn't be used as a real 'reason' - only for tracing code gen
--
cgit v1.2.3

From f132fdd9dc81e045bcf95005d418a31fbe9d942f Mon Sep 17 00:00:00 2001
From: Patrisious Haddad
Date: Mon, 2 May 2022 14:40:56 +0300
Subject: macsec: add functions to get macsec real netdevice and check offload

Given a macsec net_device add two functions to return the real net_device
for that device, and check if that macsec device is offloaded or not.
This is needed for auxiliary drivers that implement MACsec offload, but
have flows which are triggered over the macsec net_device, this allows
the drivers in such cases to verify if the device is offloaded or not,
and to access the real device of that macsec device, which would
belong to the driver, and would be needed for the offload procedure.

Signed-off-by: Patrisious Haddad
Reviewed-by: Raed Salem
Reviewed-by: Mark Zhang
Signed-off-by: Leon Romanovsky
---
 include/net/macsec.h | 2 ++
 1 file changed, 2 insertions(+)

(limited to 'include')

diff --git a/include/net/macsec.h b/include/net/macsec.h
index 441ed8fd4b5f..75a6f4863c83 100644
--- a/include/net/macsec.h
+++ b/include/net/macsec.h
@@ -312,6 +312,8 @@ static inline bool macsec_send_sci(const struct macsec_secy *secy)
 	return tx_sc->send_sci ||
 		(secy->n_rx_sc > 1 && !tx_sc->end_station && !tx_sc->scb);
 }
+struct net_device *macsec_get_real_dev(const struct net_device *dev);
+bool macsec_netdev_is_offloaded(struct net_device *dev);

 static inline void *macsec_netdev_priv(const struct net_device *dev)
 {
--
cgit v1.2.3

From 2e92f669b86d70a26a77b088c74e1cb6d26322e1 Mon Sep 17 00:00:00 2001
From: Patrisious Haddad
Date: Tue, 6 Dec 2022 14:33:49 +0200
Subject: net/mlx5e: Move MACsec flow steering and statistics database from ethernet to core

Since now MACsec flow steering (macsec_fs) and MACsec statistics (stats)
are maintained by the core driver, move their data as well to be saved
inside core structures instead of staying part of ethernet MACsec database.

In addition cleanup all MACsec stats functions from the ethernet MACsec code
and move what's needed to be part of macsec_fs instead.
Signed-off-by: Patrisious Haddad
Signed-off-by: Leon Romanovsky
---
 include/linux/mlx5/driver.h | 3 +++
 1 file changed, 3 insertions(+)

(limited to 'include')

diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h
index 25d0528f9219..0e80241c270f 100644
--- a/include/linux/mlx5/driver.h
+++ b/include/linux/mlx5/driver.h
@@ -805,6 +805,9 @@ struct mlx5_core_dev {
 	u32 vsc_addr;
 	struct mlx5_hv_vhca *hv_vhca;
 	struct mlx5_thermal *thermal;
+#ifdef CONFIG_MLX5_MACSEC
+	struct mlx5_macsec_fs *macsec_fs;
+#endif
 };

 struct mlx5_db {
--
cgit v1.2.3

From 758ce14aee825f8f3ca8f76c9991c108094cae8b Mon Sep 17 00:00:00 2001
From: Patrisious Haddad
Date: Tue, 3 May 2022 08:37:48 +0300
Subject: RDMA/mlx5: Implement MACsec gid addition and deletion

Handle MACsec IP ambiguity issue, since mlx5 hw can't support
programming both the MACsec and the physical gid when they have the same
IP address, because it wouldn't know to whom to steer the traffic.
Hence in such case we delete the physical gid from the hw gid table,
which would then cause all traffic sent over it to fail, and we'll only
be able to send traffic over the MACsec gid.
Signed-off-by: Patrisious Haddad Reviewed-by: Raed Salem Reviewed-by: Mark Zhang Signed-off-by: Leon Romanovsky --- include/linux/mlx5/driver.h | 44 ++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 44 insertions(+) (limited to 'include') diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h index 0e80241c270f..21954e7aeeba 100644 --- a/include/linux/mlx5/driver.h +++ b/include/linux/mlx5/driver.h @@ -1323,6 +1323,50 @@ static inline bool mlx5_get_roce_state(struct mlx5_core_dev *dev) return mlx5_is_roce_on(dev); } +static inline bool mlx5e_is_macsec_device(const struct mlx5_core_dev *mdev) +{ + if (!(MLX5_CAP_GEN_64(mdev, general_obj_types) & + MLX5_GENERAL_OBJ_TYPES_CAP_MACSEC_OFFLOAD)) + return false; + + if (!MLX5_CAP_GEN(mdev, log_max_dek)) + return false; + + if (!MLX5_CAP_MACSEC(mdev, log_max_macsec_offload)) + return false; + + if (!MLX5_CAP_FLOWTABLE_NIC_RX(mdev, macsec_decrypt) || + !MLX5_CAP_FLOWTABLE_NIC_RX(mdev, reformat_remove_macsec)) + return false; + + if (!MLX5_CAP_FLOWTABLE_NIC_TX(mdev, macsec_encrypt) || + !MLX5_CAP_FLOWTABLE_NIC_TX(mdev, reformat_add_macsec)) + return false; + + if (!MLX5_CAP_MACSEC(mdev, macsec_crypto_esp_aes_gcm_128_encrypt) && + !MLX5_CAP_MACSEC(mdev, macsec_crypto_esp_aes_gcm_256_encrypt)) + return false; + + if (!MLX5_CAP_MACSEC(mdev, macsec_crypto_esp_aes_gcm_128_decrypt) && + !MLX5_CAP_MACSEC(mdev, macsec_crypto_esp_aes_gcm_256_decrypt)) + return false; + + return true; +} + +#define NIC_RDMA_BOTH_DIRS_CAPS (MLX5_FT_NIC_RX_2_NIC_RX_RDMA | MLX5_FT_NIC_TX_RDMA_2_NIC_TX) + +static inline bool mlx5_is_macsec_roce_supported(struct mlx5_core_dev *mdev) +{ + if (((MLX5_CAP_GEN_2(mdev, flow_table_type_2_type) & + NIC_RDMA_BOTH_DIRS_CAPS) != NIC_RDMA_BOTH_DIRS_CAPS) || + !MLX5_CAP_FLOWTABLE_RDMA_TX(mdev, max_modify_header_actions) || + !mlx5e_is_macsec_device(mdev)) + return false; + + return true; +} + enum { MLX5_OCTWORD = 16, }; -- cgit v1.2.3 From afcb21d5a89b40c3062aa48d39ab5340abf7dcd8 Mon Sep 17 
00:00:00 2001 From: Patrisious Haddad Date: Thu, 11 Aug 2022 05:20:02 +0300 Subject: net/mlx5: Add MACsec priorities in RDMA namespaces Add MACsec flow steering priorities in RDMA namespaces. This allows adding tables/rules to forward RoCEv2 traffic to the MACsec crypto tables in the NIC_TX domain, and to accept RoCEv2 traffic from the NIC_RX domain. Signed-off-by: Patrisious Haddad Reviewed-by: Maor Gottlieb Signed-off-by: Leon Romanovsky --- include/linux/mlx5/fs.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include') diff --git a/include/linux/mlx5/fs.h b/include/linux/mlx5/fs.h index 2cb404c7ea13..091ff1240959 100644 --- a/include/linux/mlx5/fs.h +++ b/include/linux/mlx5/fs.h @@ -105,6 +105,8 @@ enum mlx5_flow_namespace_type { MLX5_FLOW_NAMESPACE_RDMA_TX_COUNTERS, MLX5_FLOW_NAMESPACE_RDMA_RX_IPSEC, MLX5_FLOW_NAMESPACE_RDMA_TX_IPSEC, + MLX5_FLOW_NAMESPACE_RDMA_RX_MACSEC, + MLX5_FLOW_NAMESPACE_RDMA_TX_MACSEC, }; enum { -- cgit v1.2.3 From ac7ea1c78f0edb910af1c17a3fb01345944bcb39 Mon Sep 17 00:00:00 2001 From: Patrisious Haddad Date: Thu, 13 Apr 2023 12:04:42 +0300 Subject: net/mlx5: Add RoCE MACsec steering infrastructure in core Add all the core steering helper functions needed to set up RoCE steering rules, covering addition and deletion of both RX and TX rules. Also export these functions so they are ready to use from the IB driver: functions that delete all rules, which is needed when a GID is deleted, functions that delete a specific rule when an SA is deleted, and matching functions for rule addition. These functions are used in a later patch by the IB driver to trigger rule addition/deletion when needed.
Signed-off-by: Patrisious Haddad Signed-off-by: Leon Romanovsky --- include/linux/mlx5/device.h | 2 ++ include/linux/mlx5/driver.h | 2 ++ include/linux/mlx5/macsec.h | 32 ++++++++++++++++++++++++++++++++ 3 files changed, 36 insertions(+) create mode 100644 include/linux/mlx5/macsec.h (limited to 'include') diff --git a/include/linux/mlx5/device.h b/include/linux/mlx5/device.h index 80cc12a9a531..ca93a5ef9dac 100644 --- a/include/linux/mlx5/device.h +++ b/include/linux/mlx5/device.h @@ -364,6 +364,8 @@ enum mlx5_event { enum mlx5_driver_event { MLX5_DRIVER_EVENT_TYPE_TRAP = 0, MLX5_DRIVER_EVENT_UPLINK_NETDEV, + MLX5_DRIVER_EVENT_MACSEC_SA_ADDED, + MLX5_DRIVER_EVENT_MACSEC_SA_DELETED, }; enum { diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h index 21954e7aeeba..21014bc34516 100644 --- a/include/linux/mlx5/driver.h +++ b/include/linux/mlx5/driver.h @@ -807,6 +807,8 @@ struct mlx5_core_dev { struct mlx5_thermal *thermal; #ifdef CONFIG_MLX5_MACSEC struct mlx5_macsec_fs *macsec_fs; + /* MACsec notifier chain to sync MACsec core and IB database */ + struct blocking_notifier_head macsec_nh; #endif }; diff --git a/include/linux/mlx5/macsec.h b/include/linux/mlx5/macsec.h new file mode 100644 index 000000000000..f7ff4c2a95d0 --- /dev/null +++ b/include/linux/mlx5/macsec.h @@ -0,0 +1,32 @@ +/* SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB */ +/* Copyright (c) 2022, NVIDIA CORPORATION & AFFILIATES. 
*/ + +#ifndef MLX5_MACSEC_H +#define MLX5_MACSEC_H + +#ifdef CONFIG_MLX5_MACSEC +struct mlx5_macsec_event_data { + struct mlx5_macsec_fs *macsec_fs; + void *macdev; + u32 fs_id; + bool is_tx; +}; + +int mlx5_macsec_add_roce_rule(void *macdev, const struct sockaddr *addr, u16 gid_idx, + struct list_head *tx_rules_list, struct list_head *rx_rules_list, + struct mlx5_macsec_fs *macsec_fs); + +void mlx5_macsec_del_roce_rule(u16 gid_idx, struct mlx5_macsec_fs *macsec_fs, + struct list_head *tx_rules_list, struct list_head *rx_rules_list); + +void mlx5_macsec_add_roce_sa_rules(u32 fs_id, const struct sockaddr *addr, u16 gid_idx, + struct list_head *tx_rules_list, + struct list_head *rx_rules_list, + struct mlx5_macsec_fs *macsec_fs, bool is_tx); + +void mlx5_macsec_del_roce_sa_rules(u32 fs_id, struct mlx5_macsec_fs *macsec_fs, + struct list_head *tx_rules_list, + struct list_head *rx_rules_list, bool is_tx); + +#endif +#endif /* MLX5_MACSEC_H */ -- cgit v1.2.3 From 58dbd6428a6819e55a3c52ec60126b5d00804a38 Mon Sep 17 00:00:00 2001 From: Patrisious Haddad Date: Thu, 13 Apr 2023 12:04:59 +0300 Subject: RDMA/mlx5: Handles RoCE MACsec steering rules addition and deletion Add RoCE MACsec rules when a gid is added for the MACsec netdevice and handle their cleanup when the gid is removed or the MACsec SA is deleted. Also support alias IP for the MACsec device, as long as we don't have more ips than what the gid table can hold. In addition handle the case where a gid is added but there are still no SAs added for the MACsec device, so the rules are added later on when the SAs are added. 
Signed-off-by: Patrisious Haddad Signed-off-by: Leon Romanovsky --- include/linux/mlx5/driver.h | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) (limited to 'include') diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h index 21014bc34516..728bcd6d184c 100644 --- a/include/linux/mlx5/driver.h +++ b/include/linux/mlx5/driver.h @@ -1325,6 +1325,7 @@ static inline bool mlx5_get_roce_state(struct mlx5_core_dev *dev) return mlx5_is_roce_on(dev); } +#ifdef CONFIG_MLX5_MACSEC static inline bool mlx5e_is_macsec_device(const struct mlx5_core_dev *mdev) { if (!(MLX5_CAP_GEN_64(mdev, general_obj_types) & @@ -1363,11 +1364,12 @@ static inline bool mlx5_is_macsec_roce_supported(struct mlx5_core_dev *mdev) if (((MLX5_CAP_GEN_2(mdev, flow_table_type_2_type) & NIC_RDMA_BOTH_DIRS_CAPS) != NIC_RDMA_BOTH_DIRS_CAPS) || !MLX5_CAP_FLOWTABLE_RDMA_TX(mdev, max_modify_header_actions) || - !mlx5e_is_macsec_device(mdev)) + !mlx5e_is_macsec_device(mdev) || !mdev->macsec_fs) return false; return true; } +#endif enum { MLX5_OCTWORD = 16, -- cgit v1.2.3 From 0f158b32a9b146c9b86783efccfd9ed02c744623 Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Fri, 18 Aug 2023 17:41:45 +0000 Subject: net: selectively purge error queue in IP_RECVERR / IPV6_RECVERR Setting IP_RECVERR and IPV6_RECVERR options to zero currently purges the socket error queue, which was probably not expected for zerocopy and tx_timestamp users. I discovered this issue while preparing commit 6b5f43ea0815 ("inet: move inet->recverr to inet->inet_flags"), I presume this change does not need to be backported to stable kernels. Add skb_errqueue_purge() helper to purge error messages only. Signed-off-by: Eric Dumazet Cc: Willem de Bruijn Cc: Soheil Hassas Yeganeh Reviewed-by: Willem de Bruijn Signed-off-by: David S. 
Miller --- include/linux/skbuff.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include') diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index 9aec136bc690..4174c4b82d13 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -3180,6 +3180,7 @@ static inline void skb_queue_purge(struct sk_buff_head *list) } unsigned int skb_rbtree_purge(struct rb_root *root); +void skb_errqueue_purge(struct sk_buff_head *list); void *__netdev_alloc_frag_align(unsigned int fragsz, unsigned int align_mask); -- cgit v1.2.3 From c5487f8d91868eeab17a59cf4d164ea113f90252 Mon Sep 17 00:00:00 2001 From: Jiri Olsa Date: Wed, 9 Aug 2023 10:34:13 +0200 Subject: bpf: Switch BPF_F_KPROBE_MULTI_RETURN macro to enum Switch the BPF_F_KPROBE_MULTI_RETURN macro to an anonymous enum so that it shows up in vmlinux.h. There is no functional change compared to having it as a macro. Acked-by: Yafang Shao Suggested-by: Andrii Nakryiko Signed-off-by: Jiri Olsa Acked-by: Yonghong Song Link: https://lore.kernel.org/r/20230809083440.3209381-2-jolsa@kernel.org Signed-off-by: Alexei Starovoitov --- include/uapi/linux/bpf.h | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) (limited to 'include') diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h index d21deb46f49f..a4e55e5e84a7 100644 --- a/include/uapi/linux/bpf.h +++ b/include/uapi/linux/bpf.h @@ -1186,7 +1186,9 @@ enum bpf_perf_event_type { /* link_create.kprobe_multi.flags used in LINK_CREATE command for * BPF_TRACE_KPROBE_MULTI attach type to create return probe. */ -#define BPF_F_KPROBE_MULTI_RETURN (1U << 0) +enum { + BPF_F_KPROBE_MULTI_RETURN = (1U << 0) +}; /* link_create.netfilter.flags used in LINK_CREATE command for * BPF_PROG_TYPE_NETFILTER to enable IP packet defragmentation.
-- cgit v1.2.3 From 89ae89f53d201143560f1e9ed4bfa62eee34f88e Mon Sep 17 00:00:00 2001 From: Jiri Olsa Date: Wed, 9 Aug 2023 10:34:15 +0200 Subject: bpf: Add multi uprobe link Add a new multi uprobe link that allows attaching a BPF program to multiple uprobes. The uprobes to attach are specified via the new uprobe_multi member of the link_create union: struct { __aligned_u64 path; __aligned_u64 offsets; __aligned_u64 ref_ctr_offsets; __u32 cnt; __u32 flags; } uprobe_multi; Uprobes are defined for a single binary, specified in path, and multiple call sites, specified in the offsets array, with optional reference counters specified in the ref_ctr_offsets array. All specified arrays have a length of 'cnt'. For now, 'flags' supports a single bit that marks the uprobes as return probes. Acked-by: Andrii Nakryiko Acked-by: Yafang Shao Signed-off-by: Jiri Olsa Acked-by: Yonghong Song Link: https://lore.kernel.org/r/20230809083440.3209381-4-jolsa@kernel.org Signed-off-by: Alexei Starovoitov --- include/linux/trace_events.h | 6 ++++++ include/uapi/linux/bpf.h | 16 ++++++++++++++++ 2 files changed, 22 insertions(+) (limited to 'include') diff --git a/include/linux/trace_events.h b/include/linux/trace_events.h index e66d04dbe56a..5b85cf18c350 100644 --- a/include/linux/trace_events.h +++ b/include/linux/trace_events.h @@ -752,6 +752,7 @@ int bpf_get_perf_event_info(const struct perf_event *event, u32 *prog_id, u32 *fd_type, const char **buf, u64 *probe_offset, u64 *probe_addr); int bpf_kprobe_multi_link_attach(const union bpf_attr *attr, struct bpf_prog *prog); +int bpf_uprobe_multi_link_attach(const union bpf_attr *attr, struct bpf_prog *prog); #else static inline unsigned int trace_call_bpf(struct trace_event_call *call, void *ctx) { @@ -798,6 +799,11 @@ bpf_kprobe_multi_link_attach(const union bpf_attr *attr, struct bpf_prog *prog) { return -EOPNOTSUPP; } +static inline int +bpf_uprobe_multi_link_attach(const union bpf_attr *attr, struct bpf_prog *prog) +{ + return -EOPNOTSUPP; +} #endif enum { diff --git
a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h index a4e55e5e84a7..e48780951fc7 100644 --- a/include/uapi/linux/bpf.h +++ b/include/uapi/linux/bpf.h @@ -1039,6 +1039,7 @@ enum bpf_attach_type { BPF_NETFILTER, BPF_TCX_INGRESS, BPF_TCX_EGRESS, + BPF_TRACE_UPROBE_MULTI, __MAX_BPF_ATTACH_TYPE }; @@ -1057,6 +1058,7 @@ enum bpf_link_type { BPF_LINK_TYPE_STRUCT_OPS = 9, BPF_LINK_TYPE_NETFILTER = 10, BPF_LINK_TYPE_TCX = 11, + BPF_LINK_TYPE_UPROBE_MULTI = 12, MAX_BPF_LINK_TYPE, }; @@ -1190,6 +1192,13 @@ enum { BPF_F_KPROBE_MULTI_RETURN = (1U << 0) }; +/* link_create.uprobe_multi.flags used in LINK_CREATE command for + * BPF_TRACE_UPROBE_MULTI attach type to create return probe. + */ +enum { + BPF_F_UPROBE_MULTI_RETURN = (1U << 0) +}; + /* link_create.netfilter.flags used in LINK_CREATE command for * BPF_PROG_TYPE_NETFILTER to enable IP packet defragmentation. */ @@ -1626,6 +1635,13 @@ union bpf_attr { }; __u64 expected_revision; } tcx; + struct { + __aligned_u64 path; + __aligned_u64 offsets; + __aligned_u64 ref_ctr_offsets; + __u32 cnt; + __u32 flags; + } uprobe_multi; }; } link_create; -- cgit v1.2.3 From 0b779b61f651851df5c5c42938a6c441eb1b5100 Mon Sep 17 00:00:00 2001 From: Jiri Olsa Date: Wed, 9 Aug 2023 10:34:16 +0200 Subject: bpf: Add cookies support for uprobe_multi link Add support for specifying a cookies array for the uprobe_multi link. The cookies array shares indexes and length with the other uprobe_multi arrays (offsets/ref_ctr_offsets). The cookies[i] value defines the cookie for the i-th uprobe and is returned by the bpf_get_attach_cookie helper when called from an eBPF program attached to that specific uprobe.
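For illustration only (not part of this patch), the index-sharing rule above can be sketched in plain user-space C. The struct below is a hypothetical local mirror of the uprobe_multi fields, and fill_uprobe_multi() and cookie_for() are made-up helper names; a real program would fill union bpf_attr from <linux/bpf.h> instead. Returning 0 when no cookies array is supplied is an assumption of this model.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of link_create.uprobe_multi, for illustration only. */
struct uprobe_multi_args {
	uint64_t path;            /* user pointer to the binary path string */
	uint64_t offsets;         /* user pointer to an array of cnt offsets */
	uint64_t ref_ctr_offsets; /* optional array of cnt entries, 0 if unused */
	uint64_t cookies;         /* optional array of cnt entries, 0 if unused */
	uint32_t cnt;             /* length shared by all arrays above */
	uint32_t flags;
};

static uint64_t ptr_to_u64(const void *p)
{
	return (uint64_t)(uintptr_t)p;
}

/* Fill the arguments so that cookies[i] belongs to offsets[i]. */
static void fill_uprobe_multi(struct uprobe_multi_args *args, const char *path,
			      const uint64_t *offsets, const uint64_t *cookies,
			      uint32_t cnt)
{
	assert(args && path && offsets && cnt > 0);
	args->path = ptr_to_u64(path);
	args->offsets = ptr_to_u64(offsets);
	args->ref_ctr_offsets = 0;
	args->cookies = ptr_to_u64(cookies);
	args->cnt = cnt;
	args->flags = 0;
}

/* Model of the cookie a program attached to uprobe i would observe. */
static uint64_t cookie_for(const struct uprobe_multi_args *args, uint32_t i)
{
	const uint64_t *cookies = (const uint64_t *)(uintptr_t)args->cookies;

	if (!cookies || i >= args->cnt)
		return 0; /* assumption of this sketch: no cookie reads as 0 */
	return cookies[i];
}
```

With offsets {0x1000, 0x2000} and cookies {11, 22}, cookie_for() yields 11 for index 0 and 22 for index 1 in this model.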
Acked-by: Andrii Nakryiko Acked-by: Yafang Shao Signed-off-by: Jiri Olsa Acked-by: Yonghong Song Link: https://lore.kernel.org/r/20230809083440.3209381-5-jolsa@kernel.org Signed-off-by: Alexei Starovoitov --- include/uapi/linux/bpf.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include') diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h index e48780951fc7..d7f4f50b1e58 100644 --- a/include/uapi/linux/bpf.h +++ b/include/uapi/linux/bpf.h @@ -1639,6 +1639,7 @@ union bpf_attr { __aligned_u64 path; __aligned_u64 offsets; __aligned_u64 ref_ctr_offsets; + __aligned_u64 cookies; __u32 cnt; __u32 flags; } uprobe_multi; -- cgit v1.2.3 From b733eeade4204423711793595c3c8d78a2fa8b2e Mon Sep 17 00:00:00 2001 From: Jiri Olsa Date: Wed, 9 Aug 2023 10:34:17 +0200 Subject: bpf: Add pid filter support for uprobe_multi link Adding support to specify pid for uprobe_multi link and the uprobes are created only for task with given pid value. Using the consumer.filter filter callback for that, so the task gets filtered during the uprobe installation. We still need to check the task during runtime in the uprobe handler, because the handler could get executed if there's another system wide consumer on the same uprobe (thanks Oleg for the insight). 
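For illustration only (not part of this patch), the two-stage check described above can be modeled in plain C. All names here (task_model, link_model, link_filter, link_handler) are hypothetical, and comparing bare pid values is a simplification assumed for the sketch, not the kernel's actual test.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for a task and a uprobe_multi link. */
struct task_model {
	int pid;
};

struct link_model {
	int pid;            /* 0 means "no pid filter" in this sketch */
	unsigned long hits; /* how many times the program was run */
};

/* Install-time filter: should the uprobe be installed for this task? */
static bool link_filter(const struct link_model *link,
			const struct task_model *task)
{
	assert(link && task);
	return link->pid == 0 || link->pid == task->pid;
}

/*
 * Run-time handler: the probe may already be installed in other tasks by
 * another, system-wide consumer, so the filter is checked again before
 * the program is counted as run.
 */
static bool link_handler(struct link_model *link,
			 const struct task_model *current_task)
{
	if (!link_filter(link, current_task))
		return false;
	link->hits++;
	return true;
}
```

In this model a link with pid 42 ignores a hit coming from task 7 even though the probe fired, which is the situation the run-time check guards against.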
Cc: Oleg Nesterov Reviewed-by: Oleg Nesterov Signed-off-by: Jiri Olsa Acked-by: Yonghong Song Link: https://lore.kernel.org/r/20230809083440.3209381-6-jolsa@kernel.org Signed-off-by: Alexei Starovoitov --- include/uapi/linux/bpf.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include') diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h index d7f4f50b1e58..8790b3962e4b 100644 --- a/include/uapi/linux/bpf.h +++ b/include/uapi/linux/bpf.h @@ -1642,6 +1642,7 @@ union bpf_attr { __aligned_u64 cookies; __u32 cnt; __u32 flags; + __u32 pid; } uprobe_multi; }; } link_create; -- cgit v1.2.3 From 93ca82447c3eab149b5b2588ab8bd9aa2eacef7c Mon Sep 17 00:00:00 2001 From: Kees Cook Date: Thu, 17 Aug 2023 14:15:23 -0700 Subject: wifi: cfg80211: Annotate struct cfg80211_acl_data with __counted_by Prepare for the coming implementation by GCC and Clang of the __counted_by attribute. Flexible array members annotated with __counted_by can have their accesses bounds-checked at run-time checking via CONFIG_UBSAN_BOUNDS (for array indexing) and CONFIG_FORTIFY_SOURCE (for strcpy/memcpy-family functions). As found with Coccinelle[1], add __counted_by for struct cfg80211_acl_data. Additionally, since the element count member must be set before accessing the annotated flexible array member, move its initialization earlier. [1] https://github.com/kees/kernel-tools/blob/trunk/coccinelle/examples/counted_by.cocci Cc: Johannes Berg Cc: "David S. Miller" Cc: Eric Dumazet Cc: Jakub Kicinski Cc: Paolo Abeni Cc: linux-wireless@vger.kernel.org Cc: netdev@vger.kernel.org Signed-off-by: Kees Cook Reviewed-by: Justin Stitt Reviewed-by: Gustavo A. R. 
Silva Reviewed-by: Jeff Johnson Link: https://lore.kernel.org/r/20230817211531.4193219-1-keescook@chromium.org Signed-off-by: Johannes Berg --- include/net/cfg80211.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include') diff --git a/include/net/cfg80211.h b/include/net/cfg80211.h index d6fa7c8767ad..eb73b5af5d04 100644 --- a/include/net/cfg80211.h +++ b/include/net/cfg80211.h @@ -1282,7 +1282,7 @@ struct cfg80211_acl_data { int n_acl_entries; /* Keep it last */ - struct mac_address mac_addrs[]; + struct mac_address mac_addrs[] __counted_by(n_acl_entries); }; /** -- cgit v1.2.3 From c14679d7005a4d26289eadb4f5fe499aca78ef7c Mon Sep 17 00:00:00 2001 From: Kees Cook Date: Thu, 17 Aug 2023 14:15:25 -0700 Subject: wifi: cfg80211: Annotate struct cfg80211_mbssid_elems with __counted_by Prepare for the coming implementation by GCC and Clang of the __counted_by attribute. Flexible array members annotated with __counted_by can have their accesses bounds-checked at run-time checking via CONFIG_UBSAN_BOUNDS (for array indexing) and CONFIG_FORTIFY_SOURCE (for strcpy/memcpy-family functions). As found with Coccinelle[1], add __counted_by for struct cfg80211_mbssid_elems. Additionally, since the element count member must be set before accessing the annotated flexible array member, move its initialization earlier. [1] https://github.com/kees/kernel-tools/blob/trunk/coccinelle/examples/counted_by.cocci Cc: Johannes Berg Cc: "David S. Miller" Cc: Eric Dumazet Cc: Jakub Kicinski Cc: Paolo Abeni Cc: linux-wireless@vger.kernel.org Cc: netdev@vger.kernel.org Signed-off-by: Kees Cook Reviewed-by: Gustavo A. R. 
Silva Reviewed-by: Jeff Johnson Link: https://lore.kernel.org/r/20230817211531.4193219-3-keescook@chromium.org Signed-off-by: Johannes Berg --- include/net/cfg80211.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include') diff --git a/include/net/cfg80211.h b/include/net/cfg80211.h index eb73b5af5d04..5c7d091b3925 100644 --- a/include/net/cfg80211.h +++ b/include/net/cfg80211.h @@ -1187,7 +1187,7 @@ struct cfg80211_mbssid_elems { struct { const u8 *data; size_t len; - } elem[]; + } elem[] __counted_by(cnt); }; /** -- cgit v1.2.3 From 342bc7c9e877847d602c4a67c3135051df07cc4d Mon Sep 17 00:00:00 2001 From: Kees Cook Date: Thu, 17 Aug 2023 14:15:26 -0700 Subject: wifi: cfg80211: Annotate struct cfg80211_pmsr_request with __counted_by Prepare for the coming implementation by GCC and Clang of the __counted_by attribute. Flexible array members annotated with __counted_by can have their accesses bounds-checked at run-time checking via CONFIG_UBSAN_BOUNDS (for array indexing) and CONFIG_FORTIFY_SOURCE (for strcpy/memcpy-family functions). As found with Coccinelle[1], add __counted_by for struct cfg80211_pmsr_request. Additionally, since the element count member must be set before accessing the annotated flexible array member, move its initialization earlier. [1] https://github.com/kees/kernel-tools/blob/trunk/coccinelle/examples/counted_by.cocci Cc: Johannes Berg Cc: "David S. Miller" Cc: Eric Dumazet Cc: Jakub Kicinski Cc: Paolo Abeni Cc: linux-wireless@vger.kernel.org Cc: netdev@vger.kernel.org Signed-off-by: Kees Cook Reviewed-by: Gustavo A. R. 
Silva Reviewed-by: Jeff Johnson Link: https://lore.kernel.org/r/20230817211531.4193219-4-keescook@chromium.org Signed-off-by: Johannes Berg --- include/net/cfg80211.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include') diff --git a/include/net/cfg80211.h b/include/net/cfg80211.h index 5c7d091b3925..e9ca4726a732 100644 --- a/include/net/cfg80211.h +++ b/include/net/cfg80211.h @@ -3948,7 +3948,7 @@ struct cfg80211_pmsr_request { struct list_head list; - struct cfg80211_pmsr_request_peer peers[]; + struct cfg80211_pmsr_request_peer peers[] __counted_by(n_peers); }; /** -- cgit v1.2.3 From 7b6d7087031b69f0694cbef4485616c905664feb Mon Sep 17 00:00:00 2001 From: Kees Cook Date: Thu, 17 Aug 2023 14:15:27 -0700 Subject: wifi: cfg80211: Annotate struct cfg80211_rnr_elems with __counted_by Prepare for the coming implementation by GCC and Clang of the __counted_by attribute. Flexible array members annotated with __counted_by can have their accesses bounds-checked at run-time checking via CONFIG_UBSAN_BOUNDS (for array indexing) and CONFIG_FORTIFY_SOURCE (for strcpy/memcpy-family functions). As found with Coccinelle[1], add __counted_by for struct cfg80211_rnr_elems. Additionally, since the element count member must be set before accessing the annotated flexible array member, move its initialization earlier. [1] https://github.com/kees/kernel-tools/blob/trunk/coccinelle/examples/counted_by.cocci Cc: Johannes Berg Cc: "David S. Miller" Cc: Eric Dumazet Cc: Jakub Kicinski Cc: Paolo Abeni Cc: linux-wireless@vger.kernel.org Cc: netdev@vger.kernel.org Signed-off-by: Kees Cook Reviewed-by: Gustavo A. R. 
Silva Reviewed-by: Jeff Johnson Link: https://lore.kernel.org/r/20230817211531.4193219-5-keescook@chromium.org Signed-off-by: Johannes Berg --- include/net/cfg80211.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include') diff --git a/include/net/cfg80211.h b/include/net/cfg80211.h index e9ca4726a732..6efe216c01d2 100644 --- a/include/net/cfg80211.h +++ b/include/net/cfg80211.h @@ -1204,7 +1204,7 @@ struct cfg80211_rnr_elems { struct { const u8 *data; size_t len; - } elem[]; + } elem[] __counted_by(cnt); }; /** -- cgit v1.2.3 From e3eac9f32ec04112b39e01b574ac739382469bf9 Mon Sep 17 00:00:00 2001 From: Kees Cook Date: Thu, 17 Aug 2023 14:15:28 -0700 Subject: wifi: cfg80211: Annotate struct cfg80211_scan_request with __counted_by Prepare for the coming implementation by GCC and Clang of the __counted_by attribute. Flexible array members annotated with __counted_by can have their accesses bounds-checked at run-time checking via CONFIG_UBSAN_BOUNDS (for array indexing) and CONFIG_FORTIFY_SOURCE (for strcpy/memcpy-family functions). As found with Coccinelle[1], add __counted_by for struct cfg80211_scan_request. Additionally, since the element count member must be set before accessing the annotated flexible array member, move its initialization earlier. [1] https://github.com/kees/kernel-tools/blob/trunk/coccinelle/examples/counted_by.cocci Cc: Johannes Berg Cc: "David S. Miller" Cc: Eric Dumazet Cc: Jakub Kicinski Cc: Paolo Abeni Cc: linux-wireless@vger.kernel.org Cc: netdev@vger.kernel.org Signed-off-by: Kees Cook Reviewed-by: Gustavo A. R. 
Silva Reviewed-by: Jeff Johnson Link: https://lore.kernel.org/r/20230817211531.4193219-6-keescook@chromium.org Signed-off-by: Johannes Berg --- include/net/cfg80211.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include') diff --git a/include/net/cfg80211.h b/include/net/cfg80211.h index 6efe216c01d2..a2afc94a5408 100644 --- a/include/net/cfg80211.h +++ b/include/net/cfg80211.h @@ -2544,7 +2544,7 @@ struct cfg80211_scan_request { struct cfg80211_scan_6ghz_params *scan_6ghz_params; /* keep last */ - struct ieee80211_channel *channels[]; + struct ieee80211_channel *channels[] __counted_by(n_channels); }; static inline void get_random_mask_addr(u8 *buf, const u8 *addr, const u8 *mask) -- cgit v1.2.3 From 545d3523dff0cf61ccd65da1c352f7c7c21fcfb0 Mon Sep 17 00:00:00 2001 From: Kees Cook Date: Thu, 17 Aug 2023 14:15:29 -0700 Subject: wifi: cfg80211: Annotate struct cfg80211_tid_config with __counted_by Prepare for the coming implementation by GCC and Clang of the __counted_by attribute. Flexible array members annotated with __counted_by can have their accesses bounds-checked at run-time checking via CONFIG_UBSAN_BOUNDS (for array indexing) and CONFIG_FORTIFY_SOURCE (for strcpy/memcpy-family functions). As found with Coccinelle[1], add __counted_by for struct cfg80211_tid_config. [1] https://github.com/kees/kernel-tools/blob/trunk/coccinelle/examples/counted_by.cocci Cc: Johannes Berg Cc: "David S. Miller" Cc: Eric Dumazet Cc: Jakub Kicinski Cc: Paolo Abeni Cc: linux-wireless@vger.kernel.org Cc: netdev@vger.kernel.org Signed-off-by: Kees Cook Reviewed-by: Gustavo A. R. 
Silva Reviewed-by: Jeff Johnson Link: https://lore.kernel.org/r/20230817211531.4193219-7-keescook@chromium.org Signed-off-by: Johannes Berg --- include/net/cfg80211.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include') diff --git a/include/net/cfg80211.h b/include/net/cfg80211.h index a2afc94a5408..423fe9b85cb0 100644 --- a/include/net/cfg80211.h +++ b/include/net/cfg80211.h @@ -811,7 +811,7 @@ struct cfg80211_tid_cfg { struct cfg80211_tid_config { const u8 *peer; u32 n_tid_conf; - struct cfg80211_tid_cfg tid_conf[]; + struct cfg80211_tid_cfg tid_conf[] __counted_by(n_tid_conf); }; /** -- cgit v1.2.3 From 43c2817225fce05701f062a996255007481935e2 Mon Sep 17 00:00:00 2001 From: Zhengchao Shao Date: Mon, 21 Aug 2023 16:41:04 +0800 Subject: net: remove unnecessary input parameter 'how' in ifdown function Wherever the ifdown function in the dst_ops structure is invoked, the input parameter 'how' is always true. Among the current implementations of the ifdown interface, ip6_dst_ifdown does not use 'how' at all, while xfrm6_dst_ifdown and xfrm4_dst_ifdown receive it as 'unregister'. Since the test for 'unregister' being false in xfrm6_dst_ifdown and xfrm4_dst_ifdown never succeeds, remove the input parameter 'how' from the ifdown function.
Signed-off-by: Zhengchao Shao Reviewed-by: Simon Horman Link: https://lore.kernel.org/r/20230821084104.3812233-1-shaozhengchao@huawei.com Signed-off-by: Paolo Abeni --- include/net/dst_ops.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include') diff --git a/include/net/dst_ops.h b/include/net/dst_ops.h index 632086b2f644..6d1c8541183d 100644 --- a/include/net/dst_ops.h +++ b/include/net/dst_ops.h @@ -23,7 +23,7 @@ struct dst_ops { u32 * (*cow_metrics)(struct dst_entry *, unsigned long); void (*destroy)(struct dst_entry *); void (*ifdown)(struct dst_entry *, - struct net_device *dev, int how); + struct net_device *dev); struct dst_entry * (*negative_advice)(struct dst_entry *); void (*link_failure)(struct sk_buff *); void (*update_pmtu)(struct dst_entry *dst, struct sock *sk, -- cgit v1.2.3 From a7ed3465daa240bdf01a5420f64336fee879c09d Mon Sep 17 00:00:00 2001 From: "GONG, Ruiqi" Date: Wed, 9 Aug 2023 15:45:03 +0800 Subject: netfilter: ebtables: fix fortify warnings in size_entry_mwt() MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit When compiling with gcc 13 and CONFIG_FORTIFY_SOURCE=y, the following warning appears: In function ‘fortify_memcpy_chk’, inlined from ‘size_entry_mwt’ at net/bridge/netfilter/ebtables.c:2118:2: ./include/linux/fortify-string.h:592:25: error: call to ‘__read_overflow2_field’ declared with attribute warning: detected read beyond size of field (2nd parameter); maybe use struct_group()? [-Werror=attribute-warning] 592 | __read_overflow2_field(q_size_field, size); | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ The compiler is complaining about: memcpy(&offsets[1], &entry->watchers_offset, sizeof(offsets) - sizeof(offsets[0])); where memcpy reads beyond &entry->watchers_offset to copy {watchers,target,next}_offset all together into offsets[]. Silence the warning by wrapping these three members up via struct_group(). Signed-off-by: GONG, Ruiqi Reviewed-by: Gustavo A. R.
Silva Reviewed-by: Kees Cook Signed-off-by: Florian Westphal --- include/uapi/linux/netfilter_bridge/ebtables.h | 14 ++++++++------ 1 file changed, 8 insertions(+), 6 deletions(-) (limited to 'include') diff --git a/include/uapi/linux/netfilter_bridge/ebtables.h b/include/uapi/linux/netfilter_bridge/ebtables.h index a494cf43a755..b0caad82b693 100644 --- a/include/uapi/linux/netfilter_bridge/ebtables.h +++ b/include/uapi/linux/netfilter_bridge/ebtables.h @@ -182,12 +182,14 @@ struct ebt_entry { unsigned char sourcemsk[ETH_ALEN]; unsigned char destmac[ETH_ALEN]; unsigned char destmsk[ETH_ALEN]; - /* sizeof ebt_entry + matches */ - unsigned int watchers_offset; - /* sizeof ebt_entry + matches + watchers */ - unsigned int target_offset; - /* sizeof ebt_entry + matches + watchers + target */ - unsigned int next_offset; + __struct_group(/* no tag */, offsets, /* no attrs */, + /* sizeof ebt_entry + matches */ + unsigned int watchers_offset; + /* sizeof ebt_entry + matches + watchers */ + unsigned int target_offset; + /* sizeof ebt_entry + matches + watchers + target */ + unsigned int next_offset; + ); unsigned char elems[0] __attribute__ ((aligned (__alignof__(struct ebt_replace)))); }; -- cgit v1.2.3 From a2f02c9920b2cc3c6cc1f2c2aee37354e6edd801 Mon Sep 17 00:00:00 2001 From: "GONG, Ruiqi" Date: Wed, 9 Aug 2023 15:51:36 +0800 Subject: netfilter: ebtables: replace zero-length array members As suggested by Kees[1], replace the old-style 0-element array members of multiple structs in ebtables.h with modern C99 flexible array. 
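For illustration only (not part of this patch), a minimal user-space sketch of the C99 flexible array member pattern that replaces the old zero-length array idiom. struct blob and blob_new() are hypothetical names invented for this sketch and do not appear in ebtables.h.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical struct: a length field followed by trailing bytes. */
struct blob {
	unsigned int size;
	unsigned char data[]; /* C99 flexible array member, formerly data[0] */
};

/* Allocate a blob with room for size trailing bytes and copy src into it. */
static struct blob *blob_new(const void *src, unsigned int size)
{
	struct blob *b;

	assert(src != NULL || size == 0);

	/* sizeof(*b) covers only the fixed part; the array adds size bytes. */
	b = malloc(sizeof(*b) + size);
	if (!b)
		return NULL;

	b->size = size;
	if (size)
		memcpy(b->data, src, size);
	return b;
}
```

The allocation size is computed the same way as with data[0]; what changes is that the compiler now treats the member as a trailing array of unknown length rather than an array of zero elements.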
[1]: https://lore.kernel.org/all/5E8E0F9C-EE3F-4B0D-B827-DC47397E2A4A@kernel.org/ [ fw@strlen.de: keep struct ebt_entry_target as-is, causes compiler warning: "variable sized type 'struct ebt_entry_target' not at the end of a struct or class is a GNU extension" ] Link: https://github.com/KSPP/linux/issues/21 Signed-off-by: GONG, Ruiqi Reviewed-by: Kees Cook Signed-off-by: Florian Westphal --- include/uapi/linux/netfilter_bridge/ebtables.h | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) (limited to 'include') diff --git a/include/uapi/linux/netfilter_bridge/ebtables.h b/include/uapi/linux/netfilter_bridge/ebtables.h index b0caad82b693..4ff328f3d339 100644 --- a/include/uapi/linux/netfilter_bridge/ebtables.h +++ b/include/uapi/linux/netfilter_bridge/ebtables.h @@ -87,7 +87,7 @@ struct ebt_entries { /* nr. of entries */ unsigned int nentries; /* entry list */ - char data[0] __attribute__ ((aligned (__alignof__(struct ebt_replace)))); + char data[] __attribute__ ((aligned (__alignof__(struct ebt_replace)))); }; /* used for the bitmask of struct ebt_entry */ @@ -129,7 +129,7 @@ struct ebt_entry_match { } u; /* size of data */ unsigned int match_size; - unsigned char data[0] __attribute__ ((aligned (__alignof__(struct ebt_replace)))); + unsigned char data[] __attribute__ ((aligned (__alignof__(struct ebt_replace)))); }; struct ebt_entry_watcher { @@ -142,7 +142,7 @@ struct ebt_entry_watcher { } u; /* size of data */ unsigned int watcher_size; - unsigned char data[0] __attribute__ ((aligned (__alignof__(struct ebt_replace)))); + unsigned char data[] __attribute__ ((aligned (__alignof__(struct ebt_replace)))); }; struct ebt_entry_target { @@ -190,7 +190,7 @@ struct ebt_entry { /* sizeof ebt_entry + matches + watchers + target */ unsigned int next_offset; ); - unsigned char elems[0] __attribute__ ((aligned (__alignof__(struct ebt_replace)))); + unsigned char elems[] __attribute__ ((aligned (__alignof__(struct ebt_replace)))); }; static __inline__ struct 
ebt_entry_target * -- cgit v1.2.3 From 49e62a0462a2d669e35f97eb3e0a6f2aab20518d Mon Sep 17 00:00:00 2001 From: Yue Haibing Date: Mon, 21 Aug 2023 21:02:18 +0800 Subject: net: mscc: ocelot: Remove unused declarations Commit 6c30384eb1de ("net: mscc: ocelot: register devlink ports") declared but never implemented ocelot_devlink_init() and ocelot_devlink_teardown(). Commit 2096805497e2 ("net: mscc: ocelot: automatically detect VCAP constants") declared but never implemented ocelot_detect_vcap_constants(). Commit 403ffc2c34de ("net: mscc: ocelot: add support for preemptible traffic classes") declared but never implemented ocelot_port_update_preemptible_tcs(). Signed-off-by: Yue Haibing Reviewed-by: Vladimir Oltean Link: https://lore.kernel.org/r/20230821130218.19096-1-yuehaibing@huawei.com Signed-off-by: Jakub Kicinski --- include/soc/mscc/ocelot.h | 1 - 1 file changed, 1 deletion(-) (limited to 'include') diff --git a/include/soc/mscc/ocelot.h b/include/soc/mscc/ocelot.h index a8c2817335b9..1e1b40f4e664 100644 --- a/include/soc/mscc/ocelot.h +++ b/include/soc/mscc/ocelot.h @@ -1165,7 +1165,6 @@ int ocelot_port_get_mm(struct ocelot *ocelot, int port, struct ethtool_mm_state *state); int ocelot_port_mqprio(struct ocelot *ocelot, int port, struct tc_mqprio_qopt_offload *mqprio); -void ocelot_port_update_preemptible_tcs(struct ocelot *ocelot, int port); #if IS_ENABLED(CONFIG_BRIDGE_MRP) int ocelot_mrp_add(struct ocelot *ocelot, int port, -- cgit v1.2.3 From 7bdfda42f043fce2e590252f264e0832b31a3e92 Mon Sep 17 00:00:00 2001 From: Yue Haibing Date: Mon, 7 Aug 2023 22:50:32 +0800 Subject: wifi: wext: Remove unused declaration dev_get_wireless_info() Commit 556829657397 ("[NL80211]: add netlink interface to cfg80211") declared this function but never implemented it, so remove the declaration. Commit 11433ee450eb ("[WEXT]: Move to net/wireless") renamed net/core/wireless.c to net/wireless/wext.c, and commit 3d23e349d807 ("wext: refactor") then refactored wext.c into wext-core.c, so fix the wext comment accordingly.
Signed-off-by: Yue Haibing Reviewed-by: Simon Horman Link: https://lore.kernel.org/r/20230807145032.44768-1-yuehaibing@huawei.com Signed-off-by: Johannes Berg --- include/net/iw_handler.h | 11 ++--------- 1 file changed, 2 insertions(+), 9 deletions(-) (limited to 'include') diff --git a/include/net/iw_handler.h b/include/net/iw_handler.h index d2ea5863eedc..b2cf243ebe44 100644 --- a/include/net/iw_handler.h +++ b/include/net/iw_handler.h @@ -426,17 +426,10 @@ struct iw_public_data { /**************************** PROTOTYPES ****************************/ /* - * Functions part of the Wireless Extensions (defined in net/core/wireless.c). - * Those may be called only within the kernel. + * Functions part of the Wireless Extensions (defined in net/wireless/wext-core.c). + * Those may be called by driver modules. */ -/* First : function strictly used inside the kernel */ - -/* Handle /proc/net/wireless, called in net/code/dev.c */ -int dev_get_wireless_info(char *buffer, char **start, off_t offset, int length); - -/* Second : functions that may be called by driver modules */ - /* Send a single event to user space */ void wireless_send_event(struct net_device *dev, unsigned int cmd, union iwreq_data *wrqu, const char *extra); -- cgit v1.2.3 From 1dcf396b4223143fcd3ef6d5e2acdbb6f7bea2e5 Mon Sep 17 00:00:00 2001 From: Dmitry Antipov Date: Thu, 13 Jul 2023 16:29:36 +0300 Subject: wifi: cfg80211: improve documentation for flag fields Fix and hopefully improve documentation for 'flag' fields of a few types by adding references to relevant enumerations. 
Signed-off-by: Dmitry Antipov
Link: https://lore.kernel.org/r/20230713132957.275859-1-dmantipov@yandex.ru
Signed-off-by: Johannes Berg
---
 include/net/cfg80211.h | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

(limited to 'include')

diff --git a/include/net/cfg80211.h b/include/net/cfg80211.h
index 423fe9b85cb0..070ac0c62bd4 100644
--- a/include/net/cfg80211.h
+++ b/include/net/cfg80211.h
@@ -263,7 +263,7 @@ enum ieee80211_privacy {
  * are only for driver use when pointers to this structure are
  * passed around.
  *
- * @flags: rate-specific flags
+ * @flags: rate-specific flags from &enum ieee80211_rate_flags
  * @bitrate: bitrate in units of 100 Kbps
  * @hw_value: driver/hardware value for this rate
  * @hw_value_short: driver/hardware value for this rate when
@@ -1353,7 +1353,7 @@ struct cfg80211_unsol_bcast_probe_resp {
  * @twt_responder: Enable Target Wait Time
  * @he_required: stations must support HE
  * @sae_h2e_required: stations must support direct H2E technique in SAE
- * @flags: flags, as defined in enum cfg80211_ap_settings_flags
+ * @flags: flags, as defined in &enum nl80211_ap_settings_flags
  * @he_obss_pd: OBSS Packet Detection settings
  * @he_oper: HE operation IE (or %NULL if HE isn't enabled)
  * @fils_discovery: FILS discovery transmission parameters
@@ -2156,7 +2156,7 @@ enum mpath_info_flags {
  * @sn: target sequence number
  * @metric: metric (cost) of this mesh path
  * @exptime: expiration time for the mesh path from now, in msecs
- * @flags: mesh path flags
+ * @flags: mesh path flags from &enum mesh_path_flags
  * @discovery_timeout: total mesh path discovery timeout, in msecs
  * @discovery_retries: mesh path discovery retries
  * @generation: generation number for nl80211 dumps.
@@ -2496,7 +2496,7 @@ struct cfg80211_scan_6ghz_params {
  * the actual dwell time may be shorter.
  * @duration_mandatory: if set, the scan duration must be as specified by the
  * %duration field.
- * @flags: bit field of flags controlling operation
+ * @flags: control flags from &enum nl80211_scan_flags
  * @rates: bitmap of rates to advertise for each band
  * @wiphy: the wiphy this was for
  * @scan_start: time (in jiffies) when the scan started
@@ -2616,7 +2616,7 @@ struct cfg80211_bss_select_adjust {
  * @scan_width: channel width for scanning
  * @ie: optional information element(s) to add into Probe Request or %NULL
  * @ie_len: length of ie in octets
- * @flags: bit field of flags controlling operation
+ * @flags: control flags from &enum nl80211_scan_flags
  * @match_sets: sets of parameters to be matched for a scan result
  * entry to be considered valid and to be passed to the host
  * (others are filtered out).
@@ -8118,7 +8118,7 @@ void cfg80211_conn_failed(struct net_device *dev, const u8 *mac_addr,
  * @link_id: the ID of the link the frame was received on
  * @buf: Management frame (header + body)
  * @len: length of the frame data
- * @flags: flags, as defined in enum nl80211_rxmgmt_flags
+ * @flags: flags, as defined in &enum nl80211_rxmgmt_flags
  * @rx_tstamp: Hardware timestamp of frame RX in nanoseconds
  * @ack_tstamp: Hardware timestamp of ack TX in nanoseconds
  */
-- cgit v1.2.3

From a49a0d4e573e22f47218668ee4137cdcdc391652 Mon Sep 17 00:00:00 2001
From: Randy Dunlap
Date: Mon, 10 Jul 2023 16:03:02 -0700
Subject: wifi: cfg80211: remove dead/unused enum value

Drop an unused (extra) enum value to prevent a kernel-doc warning.

cfg80211.h:1492: warning: Excess enum value 'STATION_PARAM_APPLY_STA_TXPOWER' description in 'station_parameters_apply_mask'

Fixes: 2d8b08fef0af ("wifi: cfg80211: fix kernel-doc warnings all over the file")
Signed-off-by: Randy Dunlap
Cc: "David S. Miller"
Cc: Eric Dumazet
Cc: Jakub Kicinski
Cc: Paolo Abeni
Cc: Johannes Berg
Cc: linux-wireless@vger.kernel.org
Cc: Mauro Carvalho Chehab
Link: https://lore.kernel.org/r/20230710230312.31197-3-rdunlap@infradead.org
Signed-off-by: Johannes Berg
---
 include/net/cfg80211.h | 1 -
 1 file changed, 1 deletion(-)

(limited to 'include')

diff --git a/include/net/cfg80211.h b/include/net/cfg80211.h
index 070ac0c62bd4..3a4b684f89bf 100644
--- a/include/net/cfg80211.h
+++ b/include/net/cfg80211.h
@@ -1482,7 +1482,6 @@ struct iface_combination_params {
  * @STATION_PARAM_APPLY_UAPSD: apply new uAPSD parameters (uapsd_queues, max_sp)
  * @STATION_PARAM_APPLY_CAPABILITY: apply new capability
  * @STATION_PARAM_APPLY_PLINK_STATE: apply new plink state
- * @STATION_PARAM_APPLY_STA_TXPOWER: apply tx power for STA
  *
  * Not all station parameters have in-band "no change" signalling,
  * for those that don't these flags will are used.
-- cgit v1.2.3

From 266a5cd768da7e243cd26bc6840ce48cad1c907e Mon Sep 17 00:00:00 2001
From: Randy Dunlap
Date: Mon, 10 Jul 2023 16:03:06 -0700
Subject: wifi: radiotap: fix kernel-doc notation warnings

Fix a typo (82011 -> 80211) to prevent a kernel-doc warning.
Add one missing function parameter description to prevent a kernel-doc
warning.

ieee80211_radiotap.h:52: warning: expecting prototype for struct ieee82011_radiotap_header. Prototype was for struct ieee80211_radiotap_header instead
ieee80211_radiotap.h:581: warning: Function parameter or member 'data' not described in 'ieee80211_get_radiotap_len'

Fixes: 42f82e2e62ae ("wireless: radiotap: rewrite the radiotap header file")
Signed-off-by: Randy Dunlap
Cc: "David S. Miller"
Cc: Eric Dumazet
Cc: Jakub Kicinski
Cc: Paolo Abeni
Cc: Johannes Berg
Cc: linux-wireless@vger.kernel.org
Link: https://lore.kernel.org/r/20230710230312.31197-7-rdunlap@infradead.org
Signed-off-by: Johannes Berg
---
 include/net/ieee80211_radiotap.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

(limited to 'include')

diff --git a/include/net/ieee80211_radiotap.h b/include/net/ieee80211_radiotap.h
index c4722a9963de..2338f8d2a8b3 100644
--- a/include/net/ieee80211_radiotap.h
+++ b/include/net/ieee80211_radiotap.h
@@ -21,7 +21,7 @@
 #include

 /**
- * struct ieee82011_radiotap_header - base radiotap header
+ * struct ieee80211_radiotap_header - base radiotap header
  */
 struct ieee80211_radiotap_header {
         /**
@@ -575,6 +575,7 @@ enum ieee80211_radiotap_eht_usig_tb {

 /**
  * ieee80211_get_radiotap_len - get radiotap header length
+ * @data: pointer to the header
  */
 static inline u16 ieee80211_get_radiotap_len(const char *data)
 {
-- cgit v1.2.3

From c6662a4b3ecf03aa832125e9705a185402cd3b17 Mon Sep 17 00:00:00 2001
From: Randy Dunlap
Date: Mon, 10 Jul 2023 16:03:09 -0700
Subject: wifi: mac80211: fix kernel-doc notation warning

Add description for struct member 'agg' to prevent a kernel-doc warning.

mac80211.h:2289: warning: Function parameter or member 'agg' not described in 'ieee80211_link_sta'

Fixes: 4c51541ddb78 ("wifi: mac80211: keep A-MSDU data in sta and per-link")
Signed-off-by: Randy Dunlap
Cc: "David S. Miller"
Cc: Eric Dumazet
Cc: Jakub Kicinski
Cc: Paolo Abeni
Cc: Johannes Berg
Cc: Benjamin Berg
Cc: linux-wireless@vger.kernel.org
Link: https://lore.kernel.org/r/20230710230312.31197-10-rdunlap@infradead.org
[reword the kernel-doc comment]
Signed-off-by: Johannes Berg
---
 include/net/mac80211.h | 1 +
 1 file changed, 1 insertion(+)

(limited to 'include')

diff --git a/include/net/mac80211.h b/include/net/mac80211.h
index 3a8a2d2c58c3..d7fa0a55067e 100644
--- a/include/net/mac80211.h
+++ b/include/net/mac80211.h
@@ -2259,6 +2259,7 @@ struct ieee80211_sta_aggregates {
  * @he_cap: HE capabilities of this STA
  * @he_6ghz_capa: on 6 GHz, holds the HE 6 GHz band capabilities
  * @eht_cap: EHT capabilities of this STA
+ * @agg: per-link data for multi-link aggregation
  * @bandwidth: current bandwidth the station can receive with
  * @rx_nss: in HT/VHT, the maximum number of spatial streams the
  * station can receive at the moment, changed by operating mode
-- cgit v1.2.3

From a7a2ef0c4b3efbd7d6f3fabd87dbbc0b3f2de5af Mon Sep 17 00:00:00 2001
From: Arnd Bergmann
Date: Fri, 23 Jun 2023 17:24:00 +0200
Subject: mac80211: make ieee80211_tx_info padding explicit

While looking at a bug, I got rather confused by the layout of the
'status' field in ieee80211_tx_info. Apparently, the intention is that
status_driver_data[] is used for driver specific data, and fills up the
size of the union to 40 bytes, just like the other ones.

This is indeed what actually happens, but only because of the
combination of two mistakes:

 - "void *status_driver_data[18 / sizeof(void *)];" is intended
   to be 18 bytes long but is actually two bytes shorter because of
   rounding-down in the division, to a multiple of the pointer
   size (4 bytes or 8 bytes).

 - The other fields combined are intended to be 22 bytes long, but
   are actually 24 bytes because of padding in front of the
   unaligned tx_time member, and in front of the pointer array.

The two mistakes cancel out.
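The arithmetic behind the two mistakes can be checked with a small
standalone sketch. This is not kernel code: the struct names, the opaque
16-byte head[] member (a stand-in for the members that precede this hunk,
sized from the 22-byte figure above minus the 6 bytes of members shown in
the diff), and verify_tx_status_layout() are all hypothetical. It also
assumes a common ABI where pointers are 4 or 8 bytes and naturally aligned.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Old layout: padding is implicit and the array size rounds down. */
struct tx_status_old {
	uint8_t head[16];	/* hypothetical stand-in for earlier members */
	uint8_t ampdu_ack_len;
	uint8_t ampdu_len;
	uint8_t antenna;
	/* 1 byte of implicit padding: tx_time needs 2-byte alignment */
	uint16_t tx_time;
	uint8_t flags;
	/* 1 byte of implicit padding: the pointer array needs alignment */
	void *status_driver_data[18 / sizeof(void *)];
};

/* New layout: the same bytes, with the padding spelled out. */
struct tx_status_new {
	uint8_t head[16];	/* hypothetical stand-in for earlier members */
	uint8_t ampdu_ack_len;
	uint8_t ampdu_len;
	uint8_t antenna;
	uint8_t pad;
	uint16_t tx_time;
	uint8_t flags;
	uint8_t pad2;
	void *status_driver_data[16 / sizeof(void *)];
};

/* The patch must not move any member or change the overall size. */
_Static_assert(sizeof(struct tx_status_old) == sizeof(struct tx_status_new),
	       "explicit padding must not change the struct size");
_Static_assert(offsetof(struct tx_status_old, tx_time) ==
	       offsetof(struct tx_status_new, tx_time),
	       "tx_time must stay at the same offset");
_Static_assert(offsetof(struct tx_status_old, status_driver_data) ==
	       offsetof(struct tx_status_new, status_driver_data),
	       "status_driver_data must stay at the same offset");

void verify_tx_status_layout(void)
{
	/* 18 / sizeof(void *) rounds down: 2 * 8 or 4 * 4 bytes, not 18. */
	assert(sizeof(((struct tx_status_old *)0)->status_driver_data) == 16);
	assert(sizeof(((struct tx_status_new *)0)->status_driver_data) == 16);
	/* 22 declared bytes + 2 padding bytes + 16 array bytes = 40. */
	assert(sizeof(struct tx_status_old) == 40);
}
```

The _Static_assert lines fail the build if the two layouts ever diverge,
while verify_tx_status_layout() can be called from any test harness to
confirm the 16-byte array size and the 40-byte total on the target ABI.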
So the size ends up fine, but it seems more helpful to make this explicit,
by having a multiple of 8 bytes in the size calculation and explicitly
describing the padding.

Fixes: ea5907db2a9cc ("mac80211: fix struct ieee80211_tx_info size")
Fixes: 02219b3abca59 ("mac80211: add WMM admission control support")
Signed-off-by: Arnd Bergmann
Reviewed-by: Kees Cook
Link: https://lore.kernel.org/r/20230623152443.2296825-2-arnd@kernel.org
Signed-off-by: Johannes Berg
---
 include/net/mac80211.h | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

(limited to 'include')

diff --git a/include/net/mac80211.h b/include/net/mac80211.h
index d7fa0a55067e..f1420e5af5de 100644
--- a/include/net/mac80211.h
+++ b/include/net/mac80211.h
@@ -1192,9 +1192,11 @@ struct ieee80211_tx_info {
                         u8 ampdu_ack_len;
                         u8 ampdu_len;
                         u8 antenna;
+                        u8 pad;
                         u16 tx_time;
                         u8 flags;
-                        void *status_driver_data[18 / sizeof(void *)];
+                        u8 pad2;
+                        void *status_driver_data[16 / sizeof(void *)];
                 } status;
                 struct {
                         struct ieee80211_tx_rate driver_rates[
-- cgit v1.2.3

From 9e261e6da0a814f4ee1856ab06b19c25190aeffb Mon Sep 17 00:00:00 2001
From: Jeff Johnson
Date: Tue, 22 Aug 2023 09:30:17 -0700
Subject: wifi: Fix ieee80211.h kernel-doc issues

The kernel-doc script identified multiple issues in ieee80211.h, so fix
them. In the process update some references to the latest applicable
specification.
include/linux/ieee80211.h:848: warning: Function parameter or member 'count' not described in 'ieee80211_quiet_ie' include/linux/ieee80211.h:848: warning: Function parameter or member 'period' not described in 'ieee80211_quiet_ie' include/linux/ieee80211.h:848: warning: Function parameter or member 'duration' not described in 'ieee80211_quiet_ie' include/linux/ieee80211.h:848: warning: Function parameter or member 'offset' not described in 'ieee80211_quiet_ie' include/linux/ieee80211.h:860: warning: Function parameter or member 'token' not described in 'ieee80211_msrment_ie' include/linux/ieee80211.h:860: warning: Function parameter or member 'mode' not described in 'ieee80211_msrment_ie' include/linux/ieee80211.h:860: warning: Function parameter or member 'type' not described in 'ieee80211_msrment_ie' include/linux/ieee80211.h:860: warning: Function parameter or member 'request' not described in 'ieee80211_msrment_ie' include/linux/ieee80211.h:871: warning: Function parameter or member 'mode' not described in 'ieee80211_channel_sw_ie' include/linux/ieee80211.h:871: warning: Function parameter or member 'new_ch_num' not described in 'ieee80211_channel_sw_ie' include/linux/ieee80211.h:871: warning: Function parameter or member 'count' not described in 'ieee80211_channel_sw_ie' include/linux/ieee80211.h:883: warning: Function parameter or member 'mode' not described in 'ieee80211_ext_chansw_ie' include/linux/ieee80211.h:883: warning: Function parameter or member 'new_operating_class' not described in 'ieee80211_ext_chansw_ie' include/linux/ieee80211.h:883: warning: Function parameter or member 'new_ch_num' not described in 'ieee80211_ext_chansw_ie' include/linux/ieee80211.h:883: warning: Function parameter or member 'count' not described in 'ieee80211_ext_chansw_ie' include/linux/ieee80211.h:905: warning: Function parameter or member 'mesh_ttl' not described in 'ieee80211_mesh_chansw_params_ie' include/linux/ieee80211.h:905: warning: Function parameter or member 
'mesh_flags' not described in 'ieee80211_mesh_chansw_params_ie' include/linux/ieee80211.h:905: warning: Function parameter or member 'mesh_reason' not described in 'ieee80211_mesh_chansw_params_ie' include/linux/ieee80211.h:905: warning: Function parameter or member 'mesh_pre_value' not described in 'ieee80211_mesh_chansw_params_ie' include/linux/ieee80211.h:913: warning: Function parameter or member 'new_channel_width' not described in 'ieee80211_wide_bw_chansw_ie' include/linux/ieee80211.h:913: warning: Function parameter or member 'new_center_freq_seg0' not described in 'ieee80211_wide_bw_chansw_ie' include/linux/ieee80211.h:913: warning: Function parameter or member 'new_center_freq_seg1' not described in 'ieee80211_wide_bw_chansw_ie' include/linux/ieee80211.h:926: warning: expecting prototype for struct ieee80211_tim. Prototype was for struct ieee80211_tim_ie instead include/linux/ieee80211.h:941: warning: Function parameter or member 'meshconf_psel' not described in 'ieee80211_meshconf_ie' include/linux/ieee80211.h:941: warning: Function parameter or member 'meshconf_pmetric' not described in 'ieee80211_meshconf_ie' include/linux/ieee80211.h:941: warning: Function parameter or member 'meshconf_congest' not described in 'ieee80211_meshconf_ie' include/linux/ieee80211.h:941: warning: Function parameter or member 'meshconf_synch' not described in 'ieee80211_meshconf_ie' include/linux/ieee80211.h:941: warning: Function parameter or member 'meshconf_auth' not described in 'ieee80211_meshconf_ie' include/linux/ieee80211.h:941: warning: Function parameter or member 'meshconf_form' not described in 'ieee80211_meshconf_ie' include/linux/ieee80211.h:941: warning: Function parameter or member 'meshconf_cap' not described in 'ieee80211_meshconf_ie' include/linux/ieee80211.h:964: warning: This comment starts with '/**', but isn't a kernel-doc comment. 
Refer Documentation/doc-guide/kernel-doc.rst * mesh channel switch parameters element's flag indicator include/linux/ieee80211.h:984: warning: Function parameter or member 'rann_flags' not described in 'ieee80211_rann_ie' include/linux/ieee80211.h:984: warning: Function parameter or member 'rann_hopcount' not described in 'ieee80211_rann_ie' include/linux/ieee80211.h:984: warning: Function parameter or member 'rann_ttl' not described in 'ieee80211_rann_ie' include/linux/ieee80211.h:984: warning: Function parameter or member 'rann_addr' not described in 'ieee80211_rann_ie' include/linux/ieee80211.h:984: warning: Function parameter or member 'rann_seq' not described in 'ieee80211_rann_ie' include/linux/ieee80211.h:984: warning: Function parameter or member 'rann_interval' not described in 'ieee80211_rann_ie' include/linux/ieee80211.h:984: warning: Function parameter or member 'rann_metric' not described in 'ieee80211_rann_ie' include/linux/ieee80211.h:1019: warning: expecting prototype for enum ieee80211_opmode_bits. 
Prototype was for enum ieee80211_vht_opmode_bits instead include/linux/ieee80211.h:1052: warning: Function parameter or member 'tx_power' not described in 'ieee80211_tpc_report_ie' include/linux/ieee80211.h:1052: warning: Function parameter or member 'link_margin' not described in 'ieee80211_tpc_report_ie' include/linux/ieee80211.h:1073: warning: Function parameter or member 'compat_info' not described in 'ieee80211_s1g_bcn_compat_ie' include/linux/ieee80211.h:1073: warning: Function parameter or member 'beacon_int' not described in 'ieee80211_s1g_bcn_compat_ie' include/linux/ieee80211.h:1073: warning: Function parameter or member 'tsf_completion' not described in 'ieee80211_s1g_bcn_compat_ie' include/linux/ieee80211.h:1086: warning: Function parameter or member 'ch_width' not described in 'ieee80211_s1g_oper_ie' include/linux/ieee80211.h:1086: warning: Function parameter or member 'oper_class' not described in 'ieee80211_s1g_oper_ie' include/linux/ieee80211.h:1086: warning: Function parameter or member 'primary_ch' not described in 'ieee80211_s1g_oper_ie' include/linux/ieee80211.h:1086: warning: Function parameter or member 'oper_ch' not described in 'ieee80211_s1g_oper_ie' include/linux/ieee80211.h:1086: warning: Function parameter or member 'basic_mcs_nss' not described in 'ieee80211_s1g_oper_ie' include/linux/ieee80211.h:1097: warning: Function parameter or member 'aid' not described in 'ieee80211_aid_response_ie' include/linux/ieee80211.h:1097: warning: Function parameter or member 'switch_count' not described in 'ieee80211_aid_response_ie' include/linux/ieee80211.h:1097: warning: Function parameter or member 'response_int' not described in 'ieee80211_aid_response_ie' include/linux/ieee80211.h:1519: warning: Enum value 'IEEE80211_P2P_ATTR_STATUS' not described in enum 'ieee80211_p2p_attr_id' include/linux/ieee80211.h:1519: warning: Enum value 'IEEE80211_P2P_ATTR_MINOR_REASON' not described in enum 'ieee80211_p2p_attr_id' include/linux/ieee80211.h:1519: 
warning: Enum value 'IEEE80211_P2P_ATTR_CAPABILITY' not described in enum 'ieee80211_p2p_attr_id' include/linux/ieee80211.h:1519: warning: Enum value 'IEEE80211_P2P_ATTR_DEVICE_ID' not described in enum 'ieee80211_p2p_attr_id' include/linux/ieee80211.h:1519: warning: Enum value 'IEEE80211_P2P_ATTR_GO_INTENT' not described in enum 'ieee80211_p2p_attr_id' include/linux/ieee80211.h:1519: warning: Enum value 'IEEE80211_P2P_ATTR_GO_CONFIG_TIMEOUT' not described in enum 'ieee80211_p2p_attr_id' include/linux/ieee80211.h:1519: warning: Enum value 'IEEE80211_P2P_ATTR_LISTEN_CHANNEL' not described in enum 'ieee80211_p2p_attr_id' include/linux/ieee80211.h:1519: warning: Enum value 'IEEE80211_P2P_ATTR_GROUP_BSSID' not described in enum 'ieee80211_p2p_attr_id' include/linux/ieee80211.h:1519: warning: Enum value 'IEEE80211_P2P_ATTR_EXT_LISTEN_TIMING' not described in enum 'ieee80211_p2p_attr_id' include/linux/ieee80211.h:1519: warning: Enum value 'IEEE80211_P2P_ATTR_INTENDED_IFACE_ADDR' not described in enum 'ieee80211_p2p_attr_id' include/linux/ieee80211.h:1519: warning: Enum value 'IEEE80211_P2P_ATTR_MANAGABILITY' not described in enum 'ieee80211_p2p_attr_id' include/linux/ieee80211.h:1519: warning: Enum value 'IEEE80211_P2P_ATTR_CHANNEL_LIST' not described in enum 'ieee80211_p2p_attr_id' include/linux/ieee80211.h:1519: warning: Enum value 'IEEE80211_P2P_ATTR_ABSENCE_NOTICE' not described in enum 'ieee80211_p2p_attr_id' include/linux/ieee80211.h:1519: warning: Enum value 'IEEE80211_P2P_ATTR_DEVICE_INFO' not described in enum 'ieee80211_p2p_attr_id' include/linux/ieee80211.h:1519: warning: Enum value 'IEEE80211_P2P_ATTR_GROUP_INFO' not described in enum 'ieee80211_p2p_attr_id' include/linux/ieee80211.h:1519: warning: Enum value 'IEEE80211_P2P_ATTR_GROUP_ID' not described in enum 'ieee80211_p2p_attr_id' include/linux/ieee80211.h:1519: warning: Enum value 'IEEE80211_P2P_ATTR_INTERFACE' not described in enum 'ieee80211_p2p_attr_id' include/linux/ieee80211.h:1519: warning: Enum 
value 'IEEE80211_P2P_ATTR_OPER_CHANNEL' not described in enum 'ieee80211_p2p_attr_id' include/linux/ieee80211.h:1519: warning: Enum value 'IEEE80211_P2P_ATTR_INVITE_FLAGS' not described in enum 'ieee80211_p2p_attr_id' include/linux/ieee80211.h:1519: warning: Enum value 'IEEE80211_P2P_ATTR_VENDOR_SPECIFIC' not described in enum 'ieee80211_p2p_attr_id' include/linux/ieee80211.h:1519: warning: Enum value 'IEEE80211_P2P_ATTR_MAX' not described in enum 'ieee80211_p2p_attr_id' include/linux/ieee80211.h:1554: warning: Function parameter or member 'frame_control' not described in 'ieee80211_bar' include/linux/ieee80211.h:1554: warning: Function parameter or member 'duration' not described in 'ieee80211_bar' include/linux/ieee80211.h:1554: warning: Function parameter or member 'ra' not described in 'ieee80211_bar' include/linux/ieee80211.h:1554: warning: Function parameter or member 'ta' not described in 'ieee80211_bar' include/linux/ieee80211.h:1554: warning: Function parameter or member 'control' not described in 'ieee80211_bar' include/linux/ieee80211.h:1554: warning: Function parameter or member 'start_seq_num' not described in 'ieee80211_bar' include/linux/ieee80211.h:1579: warning: Function parameter or member 'reserved' not described in 'ieee80211_mcs_info' include/linux/ieee80211.h:1618: warning: Function parameter or member 'cap_info' not described in 'ieee80211_ht_cap' include/linux/ieee80211.h:1618: warning: Function parameter or member 'ampdu_params_info' not described in 'ieee80211_ht_cap' include/linux/ieee80211.h:1618: warning: Function parameter or member 'mcs' not described in 'ieee80211_ht_cap' include/linux/ieee80211.h:1618: warning: Function parameter or member 'extended_ht_cap_info' not described in 'ieee80211_ht_cap' include/linux/ieee80211.h:1618: warning: Function parameter or member 'tx_BF_cap_info' not described in 'ieee80211_ht_cap' include/linux/ieee80211.h:1618: warning: Function parameter or member 'antenna_selection_info' not described in 
'ieee80211_ht_cap' include/linux/ieee80211.h:1704: warning: Function parameter or member 'primary_chan' not described in 'ieee80211_ht_operation' include/linux/ieee80211.h:1704: warning: Function parameter or member 'ht_param' not described in 'ieee80211_ht_operation' include/linux/ieee80211.h:1704: warning: Function parameter or member 'operation_mode' not described in 'ieee80211_ht_operation' include/linux/ieee80211.h:1704: warning: Function parameter or member 'stbc_param' not described in 'ieee80211_ht_operation' include/linux/ieee80211.h:1704: warning: Function parameter or member 'basic_set' not described in 'ieee80211_ht_operation' include/linux/ieee80211.h:1872: warning: Function parameter or member 'mac_cap_info' not described in 'ieee80211_he_cap_elem' include/linux/ieee80211.h:1872: warning: Function parameter or member 'phy_cap_info' not described in 'ieee80211_he_cap_elem' include/linux/ieee80211.h:1936: warning: Function parameter or member 'he_oper_params' not described in 'ieee80211_he_operation' include/linux/ieee80211.h:1936: warning: Function parameter or member 'he_mcs_nss_set' not described in 'ieee80211_he_operation' include/linux/ieee80211.h:1936: warning: Function parameter or member 'optional' not described in 'ieee80211_he_operation' include/linux/ieee80211.h:1948: warning: Function parameter or member 'he_sr_control' not described in 'ieee80211_he_spr' include/linux/ieee80211.h:1948: warning: Function parameter or member 'optional' not described in 'ieee80211_he_spr' include/linux/ieee80211.h:1960: warning: Function parameter or member 'aifsn' not described in 'ieee80211_he_mu_edca_param_ac_rec' include/linux/ieee80211.h:1960: warning: Function parameter or member 'ecw_min_max' not described in 'ieee80211_he_mu_edca_param_ac_rec' include/linux/ieee80211.h:1960: warning: Function parameter or member 'mu_edca_timer' not described in 'ieee80211_he_mu_edca_param_ac_rec' include/linux/ieee80211.h:1974: warning: Function parameter or member 
'mu_qos_info' not described in 'ieee80211_mu_edca_param_set' include/linux/ieee80211.h:1974: warning: Function parameter or member 'ac_be' not described in 'ieee80211_mu_edca_param_set' include/linux/ieee80211.h:1974: warning: Function parameter or member 'ac_bk' not described in 'ieee80211_mu_edca_param_set' include/linux/ieee80211.h:1974: warning: Function parameter or member 'ac_vi' not described in 'ieee80211_mu_edca_param_set' include/linux/ieee80211.h:1974: warning: Function parameter or member 'ac_vo' not described in 'ieee80211_mu_edca_param_set' include/linux/ieee80211.h:2194: warning: Enum value 'IEEE80211_REG_LPI_AP' not described in enum 'ieee80211_ap_reg_power' include/linux/ieee80211.h:2194: warning: Enum value 'IEEE80211_REG_SP_AP' not described in enum 'ieee80211_ap_reg_power' include/linux/ieee80211.h:2194: warning: Enum value 'IEEE80211_REG_VLP_AP' not described in enum 'ieee80211_ap_reg_power' include/linux/ieee80211.h:2194: warning: Excess enum value 'IEEE80211_REG_SP' description in 'ieee80211_ap_reg_power' include/linux/ieee80211.h:2194: warning: Excess enum value 'IEEE80211_REG_VLP' description in 'ieee80211_ap_reg_power' include/linux/ieee80211.h:2194: warning: Excess enum value 'IEEE80211_REG_LPI' description in 'ieee80211_ap_reg_power' include/linux/ieee80211.h:2577: warning: cannot understand function prototype: 'struct ieee80211_he_6ghz_oper ' include/linux/ieee80211.h:2624: warning: Function parameter or member 'tx_power_info' not described in 'ieee80211_tx_pwr_env' include/linux/ieee80211.h:2624: warning: Function parameter or member 'tx_power' not described in 'ieee80211_tx_pwr_env' include/linux/ieee80211.h:4485: warning: expecting prototype for RSNX Capabilities(). Prototype was for WLAN_RSNX_CAPA_PROTECTED_TWT() instead include/linux/ieee80211.h:4734: warning: expecting prototype for ieee80211_mle_get_eml_sync_delay(). 
Prototype was for ieee80211_mle_get_eml_med_sync_delay() instead 117 warnings as Errors Signed-off-by: Jeff Johnson Link: https://lore.kernel.org/r/20230822-kerneldoc-v1-1-0d42ce5029bf@quicinc.com Signed-off-by: Johannes Berg --- include/linux/ieee80211.h | 235 ++++++++++++++++++++++++++++++++++------------ 1 file changed, 177 insertions(+), 58 deletions(-) (limited to 'include') diff --git a/include/linux/ieee80211.h b/include/linux/ieee80211.h index 4b998090898e..bd2f6e19c357 100644 --- a/include/linux/ieee80211.h +++ b/include/linux/ieee80211.h @@ -836,9 +836,14 @@ enum ieee80211_preq_target_flags { }; /** - * struct ieee80211_quiet_ie + * struct ieee80211_quiet_ie - Quiet element + * @count: Quiet Count + * @period: Quiet Period + * @duration: Quiet Duration + * @offset: Quiet Offset * - * This structure refers to "Quiet information element" + * This structure represents the payload of the "Quiet element" as + * described in IEEE Std 802.11-2020 section 9.4.2.22. */ struct ieee80211_quiet_ie { u8 count; @@ -848,9 +853,15 @@ struct ieee80211_quiet_ie { } __packed; /** - * struct ieee80211_msrment_ie + * struct ieee80211_msrment_ie - Measurement element + * @token: Measurement Token + * @mode: Measurement Report Mode + * @type: Measurement Type + * @request: Measurement Request or Measurement Report * - * This structure refers to "Measurement Request/Report information element" + * This structure represents the payload of both the "Measurement + * Request element" and the "Measurement Report element" as described + * in IEEE Std 802.11-2020 sections 9.4.2.20 and 9.4.2.21. 
*/ struct ieee80211_msrment_ie { u8 token; @@ -860,9 +871,14 @@ struct ieee80211_msrment_ie { } __packed; /** - * struct ieee80211_channel_sw_ie + * struct ieee80211_channel_sw_ie - Channel Switch Announcement element + * @mode: Channel Switch Mode + * @new_ch_num: New Channel Number + * @count: Channel Switch Count * - * This structure refers to "Channel Switch Announcement information element" + * This structure represents the payload of the "Channel Switch + * Announcement element" as described in IEEE Std 802.11-2020 section + * 9.4.2.18. */ struct ieee80211_channel_sw_ie { u8 mode; @@ -871,9 +887,14 @@ struct ieee80211_channel_sw_ie { } __packed; /** - * struct ieee80211_ext_chansw_ie + * struct ieee80211_ext_chansw_ie - Extended Channel Switch Announcement element + * @mode: Channel Switch Mode + * @new_operating_class: New Operating Class + * @new_ch_num: New Channel Number + * @count: Channel Switch Count * - * This structure represents the "Extended Channel Switch Announcement element" + * This structure represents the "Extended Channel Switch Announcement + * element" as described in IEEE Std 802.11-2020 section 9.4.2.52. */ struct ieee80211_ext_chansw_ie { u8 mode; @@ -894,8 +915,14 @@ struct ieee80211_sec_chan_offs_ie { /** * struct ieee80211_mesh_chansw_params_ie - mesh channel switch parameters IE + * @mesh_ttl: Time To Live + * @mesh_flags: Flags + * @mesh_reason: Reason Code + * @mesh_pre_value: Precedence Value * - * This structure represents the "Mesh Channel Switch Paramters element" + * This structure represents the payload of the "Mesh Channel Switch + * Parameters element" as described in IEEE Std 802.11-2020 section + * 9.4.2.102. 
*/ struct ieee80211_mesh_chansw_params_ie { u8 mesh_ttl; @@ -906,6 +933,13 @@ struct ieee80211_mesh_chansw_params_ie { /** * struct ieee80211_wide_bw_chansw_ie - wide bandwidth channel switch IE + * @new_channel_width: New Channel Width + * @new_center_freq_seg0: New Channel Center Frequency Segment 0 + * @new_center_freq_seg1: New Channel Center Frequency Segment 1 + * + * This structure represents the payload of the "Wide Bandwidth + * Channel Switch element" as described in IEEE Std 802.11-2020 + * section 9.4.2.160. */ struct ieee80211_wide_bw_chansw_ie { u8 new_channel_width; @@ -913,9 +947,14 @@ struct ieee80211_wide_bw_chansw_ie { } __packed; /** - * struct ieee80211_tim + * struct ieee80211_tim_ie - Traffic Indication Map information element + * @dtim_count: DTIM Count + * @dtim_period: DTIM Period + * @bitmap_ctrl: Bitmap Control + * @virtual_map: Partial Virtual Bitmap * - * This structure refers to "Traffic Indication Map information element" + * This structure represents the payload of the "TIM element" as + * described in IEEE Std 802.11-2020 section 9.4.2.5. */ struct ieee80211_tim_ie { u8 dtim_count; @@ -926,9 +965,17 @@ struct ieee80211_tim_ie { } __packed; /** - * struct ieee80211_meshconf_ie + * struct ieee80211_meshconf_ie - Mesh Configuration element + * @meshconf_psel: Active Path Selection Protocol Identifier + * @meshconf_pmetric: Active Path Selection Metric Identifier + * @meshconf_congest: Congestion Control Mode Identifier + * @meshconf_synch: Synchronization Method Identifier + * @meshconf_auth: Authentication Protocol Identifier + * @meshconf_form: Mesh Formation Info + * @meshconf_cap: Mesh Capability (see &enum mesh_config_capab_flags) * - * This structure refers to "Mesh Configuration information element" + * This structure represents the payload of the "Mesh Configuration + * element" as described in IEEE Std 802.11-2020 section 9.4.2.97. 
*/ struct ieee80211_meshconf_ie { u8 meshconf_psel; @@ -950,6 +997,9 @@ struct ieee80211_meshconf_ie { * is ongoing * @IEEE80211_MESHCONF_CAPAB_POWER_SAVE_LEVEL: STA is in deep sleep mode or has * neighbors in deep sleep mode + * + * Enumerates the "Mesh Capability" as described in IEEE Std + * 802.11-2020 section 9.4.2.97.7. */ enum mesh_config_capab_flags { IEEE80211_MESHCONF_CAPAB_ACCEPT_PLINKS = 0x01, @@ -960,7 +1010,7 @@ enum mesh_config_capab_flags { #define IEEE80211_MESHCONF_FORM_CONNECTED_TO_GATE 0x1 -/** +/* * mesh channel switch parameters element's flag indicator * */ @@ -969,9 +1019,17 @@ enum mesh_config_capab_flags { #define WLAN_EID_CHAN_SWITCH_PARAM_REASON BIT(2) /** - * struct ieee80211_rann_ie + * struct ieee80211_rann_ie - RANN (root announcement) element + * @rann_flags: Flags + * @rann_hopcount: Hop Count + * @rann_ttl: Element TTL + * @rann_addr: Root Mesh STA Address + * @rann_seq: HWMP Sequence Number + * @rann_interval: Interval + * @rann_metric: Metric * - * This structure refers to "Root Announcement information element" + * This structure represents the payload of the "RANN element" as + * described in IEEE Std 802.11-2020 section 9.4.2.111. 
*/ struct ieee80211_rann_ie { u8 rann_flags; @@ -993,7 +1051,7 @@ enum ieee80211_ht_chanwidth_values { }; /** - * enum ieee80211_opmode_bits - VHT operating mode field bits + * enum ieee80211_vht_opmode_bits - VHT operating mode field bits * @IEEE80211_OPMODE_NOTIF_CHANWIDTH_MASK: channel width mask * @IEEE80211_OPMODE_NOTIF_CHANWIDTH_20MHZ: 20 MHz channel width * @IEEE80211_OPMODE_NOTIF_CHANWIDTH_40MHZ: 40 MHz channel width @@ -1042,9 +1100,12 @@ enum ieee80211_s1g_chanwidth { #define WLAN_USER_POSITION_LEN 16 /** - * struct ieee80211_tpc_report_ie + * struct ieee80211_tpc_report_ie - TPC Report element + * @tx_power: Transmit Power + * @link_margin: Link Margin * - * This structure refers to "TPC Report element" + * This structure represents the payload of the "TPC Report element" as + * described in IEEE Std 802.11-2020 section 9.4.2.16. */ struct ieee80211_tpc_report_ie { u8 tx_power; @@ -1062,9 +1123,14 @@ struct ieee80211_addba_ext_ie { } __packed; /** - * struct ieee80211_s1g_bcn_compat_ie + * struct ieee80211_s1g_bcn_compat_ie - S1G Beacon Compatibility element + * @compat_info: Compatibility Information + * @beacon_int: Beacon Interval + * @tsf_completion: TSF Completion * - * S1G Beacon Compatibility element + * This structure represents the payload of the "S1G Beacon + * Compatibility element" as described in IEEE Std 802.11-2020 section + * 9.4.2.196. 
*/ struct ieee80211_s1g_bcn_compat_ie { __le16 compat_info; @@ -1073,9 +1139,15 @@ struct ieee80211_s1g_bcn_compat_ie { } __packed; /** - * struct ieee80211_s1g_oper_ie + * struct ieee80211_s1g_oper_ie - S1G Operation element + * @ch_width: S1G Operation Information Channel Width + * @oper_class: S1G Operation Information Operating Class + * @primary_ch: S1G Operation Information Primary Channel Number + * @oper_ch: S1G Operation Information Channel Center Frequency + * @basic_mcs_nss: Basic S1G-MCS and NSS Set * - * S1G Operation element + * This structure represents the payload of the "S1G Operation + * element" as described in IEEE Std 802.11-2020 section 9.4.2.212. */ struct ieee80211_s1g_oper_ie { u8 ch_width; @@ -1086,9 +1158,13 @@ struct ieee80211_s1g_oper_ie { } __packed; /** - * struct ieee80211_aid_response_ie + * struct ieee80211_aid_response_ie - AID Response element + * @aid: AID/Group AID + * @switch_count: AID Switch Count + * @response_int: AID Response Interval * - * AID Response element + * This structure represents the payload of the "AID Response element" + * as described in IEEE Std 802.11-2020 section 9.4.2.194. */ struct ieee80211_aid_response_ie { __le16 aid; @@ -1489,7 +1565,7 @@ struct ieee80211_tdls_data { /* * Peer-to-Peer IE attribute related definitions. */ -/** +/* * enum ieee80211_p2p_attr_id - identifies type of peer-to-peer attribute. 
*/ enum ieee80211_p2p_attr_id { @@ -1539,11 +1615,17 @@ struct ieee80211_p2p_noa_attr { #define IEEE80211_P2P_OPPPS_CTWINDOW_MASK 0x7F /** - * struct ieee80211_bar - HT Block Ack Request + * struct ieee80211_bar - Block Ack Request frame format + * @frame_control: Frame Control + * @duration: Duration + * @ra: RA + * @ta: TA + * @control: BAR Control + * @start_seq_num: Starting Sequence Number (see Figure 9-37) * - * This structure refers to "HT BlockAckReq" as - * described in 802.11n draft section 7.2.1.7.1 - */ + * This structure represents the "BlockAckReq frame format" + * as described in IEEE Std 802.11-2020 section 9.3.1.7. +*/ struct ieee80211_bar { __le16 frame_control; __le16 duration; @@ -1563,13 +1645,17 @@ struct ieee80211_bar { #define IEEE80211_HT_MCS_MASK_LEN 10 /** - * struct ieee80211_mcs_info - MCS information + * struct ieee80211_mcs_info - Supported MCS Set field * @rx_mask: RX mask * @rx_highest: highest supported RX rate. If set represents * the highest supported RX data rate in units of 1 Mbps. * If this field is 0 this value should not be used to * consider the highest RX data rate supported. * @tx_params: TX parameters + * @reserved: Reserved bits + * + * This structure represents the "Supported MCS Set field" as + * described in IEEE Std 802.11-2020 section 9.4.2.55.4. 
*/ struct ieee80211_mcs_info { u8 rx_mask[IEEE80211_HT_MCS_MASK_LEN]; @@ -1600,10 +1686,16 @@ struct ieee80211_mcs_info { (IEEE80211_HT_MCS_UNEQUAL_MODULATION_START / 8) /** - * struct ieee80211_ht_cap - HT capabilities + * struct ieee80211_ht_cap - HT capabilities element + * @cap_info: HT Capability Information + * @ampdu_params_info: A-MPDU Parameters + * @mcs: Supported MCS Set + * @extended_ht_cap_info: HT Extended Capabilities + * @tx_BF_cap_info: Transmit Beamforming Capabilities + * @antenna_selection_info: ASEL Capability * - * This structure is the "HT capabilities element" as - * described in 802.11n D5.0 7.3.2.57 + * This structure represents the payload of the "HT Capabilities + * element" as described in IEEE Std 802.11-2020 section 9.4.2.55. */ struct ieee80211_ht_cap { __le16 cap_info; @@ -1691,9 +1783,14 @@ enum ieee80211_min_mpdu_spacing { /** * struct ieee80211_ht_operation - HT operation IE + * @primary_chan: Primary Channel + * @ht_param: HT Operation Information parameters + * @operation_mode: HT Operation Information operation mode + * @stbc_param: HT Operation Information STBC params + * @basic_set: Basic HT-MCS Set * - * This structure is the "HT operation element" as - * described in 802.11n-2009 7.3.2.57 + * This structure represents the payload of the "HT Operation + * element" as described in IEEE Std 802.11-2020 section 9.4.2.56. */ struct ieee80211_ht_operation { u8 primary_chan; @@ -1862,9 +1959,12 @@ struct ieee80211_vht_operation { /** * struct ieee80211_he_cap_elem - HE capabilities element + * @mac_cap_info: HE MAC Capabilities Information + * @phy_cap_info: HE PHY Capabilities Information * - * This structure is the "HE capabilities element" fixed fields as - * described in P802.11ax_D4.0 section 9.4.2.242.2 and 9.4.2.242.3 + * This structure represents the fixed fields of the payload of the + * "HE capabilities element" as described in IEEE Std 802.11ax-2021 + * sections 9.4.2.248.2 and 9.4.2.248.3. 
*/ struct ieee80211_he_cap_elem { u8 mac_cap_info[6]; @@ -1923,35 +2023,45 @@ struct ieee80211_he_mcs_nss_supp { } __packed; /** - * struct ieee80211_he_operation - HE capabilities element + * struct ieee80211_he_operation - HE Operation element + * @he_oper_params: HE Operation Parameters + BSS Color Information + * @he_mcs_nss_set: Basic HE-MCS And NSS Set + * @optional: Optional fields VHT Operation Information, Max Co-Hosted + * BSSID Indicator, and 6 GHz Operation Information * - * This structure is the "HE operation element" fields as - * described in P802.11ax_D4.0 section 9.4.2.243 + * This structure represents the payload of the "HE Operation + * element" as described in IEEE Std 802.11ax-2021 section 9.4.2.249. */ struct ieee80211_he_operation { __le32 he_oper_params; __le16 he_mcs_nss_set; - /* Optional 0,1,3,4,5,7 or 8 bytes: depends on @he_oper_params */ u8 optional[]; } __packed; /** - * struct ieee80211_he_spr - HE spatial reuse element + * struct ieee80211_he_spr - Spatial Reuse Parameter Set element + * @he_sr_control: SR Control + * @optional: Optional fields Non-SRG OBSS PD Max Offset, SRG OBSS PD + * Min Offset, SRG OBSS PD Max Offset, SRG BSS Color + * Bitmap, and SRG Partial BSSID Bitmap * - * This structure is the "HE spatial reuse element" element as - * described in P802.11ax_D4.0 section 9.4.2.241 + * This structure represents the payload of the "Spatial Reuse + * Parameter Set element" as described in IEEE Std 802.11ax-2021 + * section 9.4.2.252. 
*/ struct ieee80211_he_spr { u8 he_sr_control; - /* Optional 0 to 19 bytes: depends on @he_sr_control */ u8 optional[]; } __packed; /** * struct ieee80211_he_mu_edca_param_ac_rec - MU AC Parameter Record field + * @aifsn: ACI/AIFSN + * @ecw_min_max: ECWmin/ECWmax + * @mu_edca_timer: MU EDCA Timer * - * This structure is the "MU AC Parameter Record" fields as - * described in P802.11ax_D4.0 section 9.4.2.245 + * This structure represents the "MU AC Parameter Record" as described + * in IEEE Std 802.11ax-2021 section 9.4.2.251, Figure 9-788p. */ struct ieee80211_he_mu_edca_param_ac_rec { u8 aifsn; @@ -1961,9 +2071,14 @@ struct ieee80211_he_mu_edca_param_ac_rec { /** * struct ieee80211_mu_edca_param_set - MU EDCA Parameter Set element + * @mu_qos_info: QoS Info + * @ac_be: MU AC_BE Parameter Record + * @ac_bk: MU AC_BK Parameter Record + * @ac_vi: MU AC_VI Parameter Record + * @ac_vo: MU AC_VO Parameter Record * - * This structure is the "MU EDCA Parameter Set element" fields as - * described in P802.11ax_D4.0 section 9.4.2.245 + * This structure represents the payload of the "MU EDCA Parameter Set + * element" as described in IEEE Std 802.11ax-2021 section 9.4.2.251. 
*/ struct ieee80211_mu_edca_param_set { u8 mu_qos_info; @@ -2177,9 +2292,9 @@ int ieee80211_get_vht_max_nss(struct ieee80211_vht_cap *cap, * enum ieee80211_ap_reg_power - regulatory power for a Access Point * * @IEEE80211_REG_UNSET_AP: Access Point has no regulatory power mode - * @IEEE80211_REG_LPI: Indoor Access Point - * @IEEE80211_REG_SP: Standard power Access Point - * @IEEE80211_REG_VLP: Very low power Access Point + * @IEEE80211_REG_LPI_AP: Indoor Access Point + * @IEEE80211_REG_SP_AP: Standard power Access Point + * @IEEE80211_REG_VLP_AP: Very low power Access Point * @IEEE80211_REG_AP_POWER_AFTER_LAST: internal * @IEEE80211_REG_AP_POWER_MAX: maximum value */ @@ -2567,7 +2682,7 @@ static inline bool ieee80211_he_capa_size_ok(const u8 *data, u8 len) #define IEEE80211_6GHZ_CTRL_REG_SP_AP 1 /** - * ieee80211_he_6ghz_oper - HE 6 GHz operation Information field + * struct ieee80211_he_6ghz_oper - HE 6 GHz operation Information field * @primary: primary channel * @control: control flags * @ccfs0: channel center frequency segment 0 @@ -2614,9 +2729,13 @@ enum ieee80211_tx_power_intrpt_type { }; /** - * struct ieee80211_tx_pwr_env + * struct ieee80211_tx_pwr_env - Transmit Power Envelope + * @tx_power_info: Transmit Power Information field + * @tx_power: Maximum Transmit Power field * - * This structure represents the "Transmit Power Envelope element" + * This structure represents the payload of the "Transmit Power + * Envelope element" as described in IEEE Std 802.11ax-2021 section + * 9.4.2.161 */ struct ieee80211_tx_pwr_env { u8 tx_power_info; @@ -4478,7 +4597,7 @@ static inline bool for_each_element_completed(const struct element *element, return (const u8 *)element == (const u8 *)data + datalen; } -/** +/* * RSNX Capabilities: * bits 0-3: Field length (n-1) */ @@ -4721,7 +4840,7 @@ ieee80211_mle_get_bss_param_ch_cnt(const struct ieee80211_multi_link_elem *mle) } /** - * ieee80211_mle_get_eml_sync_delay - returns the medium sync delay + * 
ieee80211_mle_get_eml_med_sync_delay - returns the medium sync delay * @data: pointer to the multi link EHT IE * * The element is assumed to be of the correct type (BASIC) and big enough, -- cgit v1.2.3 From 740ebe35bd3f5c4ff8ec60e5e521e47ea8f5492c Mon Sep 17 00:00:00 2001 From: Geliang Tang Date: Mon, 21 Aug 2023 15:25:14 -0700 Subject: mptcp: add struct mptcp_sched_ops This patch defines struct mptcp_sched_ops, which has three struct members, name, owner and list, and three function pointers: init(), release() and get_subflow(). The scheduler function get_subflow() has a struct mptcp_sched_data parameter, which contains a reinject flag (whether the call is for a retransmission), a subflow count and an array of mptcp_subflow_context pointers. Add the scheduler registering, unregistering and finding functions to add, delete and find a packet scheduler on the global list mptcp_sched_list. Acked-by: Paolo Abeni Reviewed-by: Mat Martineau Signed-off-by: Geliang Tang Signed-off-by: Mat Martineau Link: https://lore.kernel.org/r/20230821-upstream-net-next-20230818-v1-3-0c860fb256a8@kernel.org Signed-off-by: Jakub Kicinski --- include/net/mptcp.h | 21 +++++++++++++++++++++ 1 file changed, 21 insertions(+) (limited to 'include') diff --git a/include/net/mptcp.h b/include/net/mptcp.h index 3c5c68618fcc..fb996124b3d5 100644 --- a/include/net/mptcp.h +++ b/include/net/mptcp.h @@ -96,6 +96,27 @@ struct mptcp_out_options { #endif }; +#define MPTCP_SCHED_NAME_MAX 16 +#define MPTCP_SUBFLOWS_MAX 8 + +struct mptcp_sched_data { + bool reinject; + u8 subflows; + struct mptcp_subflow_context *contexts[MPTCP_SUBFLOWS_MAX]; +}; + +struct mptcp_sched_ops { + int (*get_subflow)(struct mptcp_sock *msk, + struct mptcp_sched_data *data); + + char name[MPTCP_SCHED_NAME_MAX]; + struct module *owner; + struct list_head list; + + void (*init)(struct mptcp_sock *msk); + void (*release)(struct mptcp_sock *msk); +} ____cacheline_aligned_in_smp; + #ifdef CONFIG_MPTCP void mptcp_init(void); -- cgit v1.2.3 From 
eb6603246ab9a3787e9603f0fd6a321b46623e46 Mon Sep 17 00:00:00 2001 From: Yue Haibing Date: Mon, 21 Aug 2023 21:00:02 +0800 Subject: qed/qede: Remove unused declarations Commit 8cd160a29415 ("qede: convert to new udp_tunnel_nic infra") removed qede_udp_tunnel_{add,del}() but not the declarations. Commit 0ebcebbef1cc ("qed: Read device port count from the shmem") removed qed_device_num_engines() but not its declaration. Commit 1e128c81290a ("qed: Add support for hardware offloaded FCoE.") declared but never implemented qed_fcoe_set_pf_params(). Signed-off-by: Yue Haibing Reviewed-by: Simon Horman Signed-off-by: David S. Miller --- include/linux/qed/qed_fcoe_if.h | 3 --- 1 file changed, 3 deletions(-) (limited to 'include') diff --git a/include/linux/qed/qed_fcoe_if.h b/include/linux/qed/qed_fcoe_if.h index 90e3045b2dcb..0d3b6ed21628 100644 --- a/include/linux/qed/qed_fcoe_if.h +++ b/include/linux/qed/qed_fcoe_if.h @@ -67,9 +67,6 @@ struct qed_fcoe_cb_ops { u32 (*get_login_failures)(void *cookie); }; -void qed_fcoe_set_pf_params(struct qed_dev *cdev, - struct qed_fcoe_pf_params *params); - /** * struct qed_fcoe_ops - qed FCoE operations. * @common: common operations pointer -- cgit v1.2.3 From 71ab55a9af80fcffe1f42b9b8dba4d0e3dd0c351 Mon Sep 17 00:00:00 2001 From: Petr Pavlu Date: Mon, 21 Aug 2023 15:12:15 +0200 Subject: mlx4: Get rid of the mlx4_interface.get_dev callback Simplify the mlx4 driver interface by removing mlx4_get_protocol_dev() and the associated mlx4_interface.get_dev callbacks. This is done in preparation to use an auxiliary bus to model the mlx4 driver structure. The change is motivated by the following situation: * The mlx4_en interface is being initialized by mlx4_en_add() and mlx4_en_activate(). * The latter activate function calls mlx4_en_init_netdev() -> register_netdev() to register a new net_device. * A netdev event NETDEV_REGISTER is raised for the device. 
* The netdev notifier mlx4_ib_netdev_event() is called and it invokes mlx4_ib_scan_netdevs() -> mlx4_get_protocol_dev() -> mlx4_en_get_netdev() [via mlx4_interface.get_dev]. This chain creates a problem when mlx4_en gets switched to be an auxiliary driver. It contains two device calls which would both need to take a respective device lock. Avoid this situation by updating mlx4_ib_scan_netdevs() to no longer call mlx4_get_protocol_dev() but instead to utilize the information passed in net_device.parent and net_device.dev_port. This data is sufficient to determine that an updated port is one that the mlx4_ib driver should take care of and to keep mlx4_ib_dev.iboe.netdevs up to date. Following that, update mlx4_ib_get_netdev() to also not call mlx4_get_protocol_dev() and instead scan all current netdevs to find a matching one. Note that mlx4_ib_get_netdev() is called early from ib_register_device() and cannot use data tracked in mlx4_ib_dev.iboe.netdevs, which is not yet set at that point. Finally, remove function mlx4_get_protocol_dev() and the mlx4_interface.get_dev callbacks (only mlx4_en_get_netdev()) as they became unused. Signed-off-by: Petr Pavlu Tested-by: Leon Romanovsky Reviewed-by: Leon Romanovsky Acked-by: Tariq Toukan Signed-off-by: David S. 
Miller --- include/linux/mlx4/driver.h | 3 --- 1 file changed, 3 deletions(-) (limited to 'include') diff --git a/include/linux/mlx4/driver.h b/include/linux/mlx4/driver.h index 1834c8fad12e..923951e19300 100644 --- a/include/linux/mlx4/driver.h +++ b/include/linux/mlx4/driver.h @@ -59,7 +59,6 @@ struct mlx4_interface { void (*remove)(struct mlx4_dev *dev, void *context); void (*event) (struct mlx4_dev *dev, void *context, enum mlx4_dev_event event, unsigned long param); - void * (*get_dev)(struct mlx4_dev *dev, void *context, u8 port); void (*activate)(struct mlx4_dev *dev, void *context); struct list_head list; enum mlx4_protocol protocol; @@ -88,8 +87,6 @@ struct mlx4_port_map { int mlx4_port_map_set(struct mlx4_dev *dev, struct mlx4_port_map *v2p); -void *mlx4_get_protocol_dev(struct mlx4_dev *dev, enum mlx4_protocol proto, int port); - struct devlink_port *mlx4_get_devlink_port(struct mlx4_dev *dev, int port); #endif /* MLX4_DRIVER_H */ -- cgit v1.2.3 From 7ba189ac52acab44129f09b302069e5a86fd92c2 Mon Sep 17 00:00:00 2001 From: Petr Pavlu Date: Mon, 21 Aug 2023 15:12:17 +0200 Subject: mlx4: Use 'void *' as the event param of mlx4_dispatch_event() Function mlx4_dispatch_event() takes an 'unsigned long' as its event parameter. The actual value is none (MLX4_DEV_EVENT_CATASTROPHIC_ERROR), a pointer to mlx4_eqe (MLX4_DEV_EVENT_PORT_MGMT_CHANGE), or a 32-bit integer (remaining events). In preparation to switch mlx4_en and mlx4_ib to be an auxiliary device, the mlx4_interface.event callback is replaced with a notifier and function mlx4_dispatch_event() gets updated to invoke atomic_notifier_call_chain(). This requires forwarding the input 'param' value from the former function to the latter. A problem is that the notifier call takes 'void *' as its 'param' value, compared to 'unsigned long' used by mlx4_dispatch_event(). Re-passing the value would need either punning it to 'void *' or passing down the address of the input 'param'. 
Both approaches create a number of unnecessary casts. Change instead the input 'param' of mlx4_dispatch_event() from 'unsigned long' to 'void *'. A mlx4_eqe pointer can be passed directly, callers using an int value are adjusted to pass its address. Signed-off-by: Petr Pavlu Reviewed-by: Leon Romanovsky Signed-off-by: David S. Miller --- include/linux/mlx4/driver.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include') diff --git a/include/linux/mlx4/driver.h b/include/linux/mlx4/driver.h index 923951e19300..032d7f5bfef6 100644 --- a/include/linux/mlx4/driver.h +++ b/include/linux/mlx4/driver.h @@ -58,7 +58,7 @@ struct mlx4_interface { void * (*add) (struct mlx4_dev *dev); void (*remove)(struct mlx4_dev *dev, void *context); void (*event) (struct mlx4_dev *dev, void *context, - enum mlx4_dev_event event, unsigned long param); + enum mlx4_dev_event event, void *param); void (*activate)(struct mlx4_dev *dev, void *context); struct list_head list; enum mlx4_protocol protocol; -- cgit v1.2.3 From 73d68002a02efd370dba6b8fc570427326e36d1a Mon Sep 17 00:00:00 2001 From: Petr Pavlu Date: Mon, 21 Aug 2023 15:12:18 +0200 Subject: mlx4: Replace the mlx4_interface.event callback with a notifier Use a notifier to implement mlx4_dispatch_event() in preparation to switch mlx4_en and mlx4_ib to be an auxiliary device. A problem is that if the mlx4_interface.event callback was replaced with something as mlx4_adrv.event then the implementation of mlx4_dispatch_event() would need to acquire a lock on a given device before executing this callback. That is necessary because otherwise there is no guarantee that the associated driver cannot get unbound when the callback is running. However, taking this lock is not possible because mlx4_dispatch_event() can be invoked from the hardirq context. Using an atomic notifier allows the driver to accurately record when it wants to receive these events and solves this problem. 
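As a rough illustration of the register/call/unregister pattern described above, the following is a minimal userspace model of a notifier chain that forwards a 'void *' parameter to each handler. It is only a sketch: all toy_* names are hypothetical, it is not the kernel's notifier API, and it models neither the locking nor the RCU protection that an atomic notifier chain relies on.

```c
#include <stddef.h>

/* Userspace model only: the toy_* names are hypothetical and are not
 * the kernel's notifier API. No locking or RCU is modelled here. */
struct toy_notifier_block {
	int (*notifier_call)(struct toy_notifier_block *nb,
			     unsigned long event, void *param);
	struct toy_notifier_block *next;
};

struct toy_notifier_head {
	struct toy_notifier_block *head;
};

/* Add a handler at the front of the chain. */
static void toy_chain_register(struct toy_notifier_head *nh,
			       struct toy_notifier_block *nb)
{
	nb->next = nh->head;
	nh->head = nb;
}

/* Remove a handler; returns 0 on success, -1 if it was not registered. */
static int toy_chain_unregister(struct toy_notifier_head *nh,
				struct toy_notifier_block *nb)
{
	struct toy_notifier_block **pp;

	for (pp = &nh->head; *pp; pp = &(*pp)->next) {
		if (*pp == nb) {
			*pp = nb->next;
			nb->next = NULL;
			return 0;
		}
	}
	return -1;
}

/* Invoke every registered handler; returns how many were called. */
static int toy_chain_call(struct toy_notifier_head *nh,
			  unsigned long event, void *param)
{
	struct toy_notifier_block *nb;
	int called = 0;

	for (nb = nh->head; nb; nb = nb->next) {
		nb->notifier_call(nb, event, param);
		called++;
	}
	return called;
}

/* Example handler: treats param as a pointer to an int port number. */
static int toy_last_port;

static int toy_port_event(struct toy_notifier_block *nb,
			  unsigned long event, void *param)
{
	(void)nb;
	(void)event;
	toy_last_port = *(int *)param;
	return 0;
}
```

In this model a handler sees events only between toy_chain_register() and toy_chain_unregister(), which mirrors the point made above that the driver itself decides when it wants to receive events.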
A handler registration is done by both mlx4_en and mlx4_ib at the end of their mlx4_interface.add callback. This matches the current situation when mlx4_add_device() would enable events for a given device immediately after this callback, by adding the device on the mlx4_priv.list. Signed-off-by: Petr Pavlu Tested-by: Leon Romanovsky Acked-by: Tariq Toukan Reviewed-by: Leon Romanovsky Signed-off-by: David S. Miller --- include/linux/mlx4/driver.h | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-) (limited to 'include') diff --git a/include/linux/mlx4/driver.h b/include/linux/mlx4/driver.h index 032d7f5bfef6..228da8ed7e75 100644 --- a/include/linux/mlx4/driver.h +++ b/include/linux/mlx4/driver.h @@ -34,6 +34,7 @@ #define MLX4_DRIVER_H #include +#include #include struct mlx4_dev; @@ -57,8 +58,6 @@ enum { struct mlx4_interface { void * (*add) (struct mlx4_dev *dev); void (*remove)(struct mlx4_dev *dev, void *context); - void (*event) (struct mlx4_dev *dev, void *context, - enum mlx4_dev_event event, void *param); void (*activate)(struct mlx4_dev *dev, void *context); struct list_head list; enum mlx4_protocol protocol; @@ -87,6 +86,11 @@ struct mlx4_port_map { int mlx4_port_map_set(struct mlx4_dev *dev, struct mlx4_port_map *v2p); +int mlx4_register_event_notifier(struct mlx4_dev *dev, + struct notifier_block *nb); +int mlx4_unregister_event_notifier(struct mlx4_dev *dev, + struct notifier_block *nb); + struct devlink_port *mlx4_get_devlink_port(struct mlx4_dev *dev, int port); #endif /* MLX4_DRIVER_H */ -- cgit v1.2.3 From 13f857111cb23f2a8dbcd0271c3ff1824913d980 Mon Sep 17 00:00:00 2001 From: Petr Pavlu Date: Mon, 21 Aug 2023 15:12:19 +0200 Subject: mlx4: Get rid of the mlx4_interface.activate callback The mlx4_interface.activate callback was introduced in commit 79857cd31fe7 ("net/mlx4: Postpone the registration of net_device"). 
It dealt with a situation when a netdev notifier received a NETDEV_REGISTER event for a new net_device created by mlx4_en but the same device was not yet visible to mlx4_get_protocol_dev(). The callback can be removed now that mlx4_get_protocol_dev() is gone. Signed-off-by: Petr Pavlu Tested-by: Leon Romanovsky Reviewed-by: Leon Romanovsky Acked-by: Tariq Toukan Signed-off-by: David S. Miller --- include/linux/mlx4/driver.h | 1 - 1 file changed, 1 deletion(-) (limited to 'include') diff --git a/include/linux/mlx4/driver.h b/include/linux/mlx4/driver.h index 228da8ed7e75..0f8c9ba4c574 100644 --- a/include/linux/mlx4/driver.h +++ b/include/linux/mlx4/driver.h @@ -58,7 +58,6 @@ enum { struct mlx4_interface { void * (*add) (struct mlx4_dev *dev); void (*remove)(struct mlx4_dev *dev, void *context); - void (*activate)(struct mlx4_dev *dev, void *context); struct list_head list; enum mlx4_protocol protocol; int flags; -- cgit v1.2.3 From e2fb47d4eb5cd245c38c8c57d969ac6b12efc764 Mon Sep 17 00:00:00 2001 From: Petr Pavlu Date: Mon, 21 Aug 2023 15:12:20 +0200 Subject: mlx4: Move the bond work to the core driver Function mlx4_en_queue_bond_work() is used in mlx4_en to start a bond reconfiguration. It gathers data about a new port map setting, takes a reference on the netdev that triggered the change and queues a work object on mlx4_en_priv.mdev.workqueue to perform the operation. The scheduled work is mlx4_en_bond_work() which calls mlx4_bond()/mlx4_unbond() and consequently mlx4_do_bond(). At the same time, function mlx4_change_port_types() in mlx4_core might be invoked to change the port type configuration. As part of its logic, it re-registers the whole device by calling mlx4_unregister_device(), followed by mlx4_register_device(). The two operations can result in concurrent access to the data about currently active interfaces on the device. Functions mlx4_register_device() and mlx4_unregister_device() lock the intf_mutex to gain exclusive access to this data. 
The current implementation of mlx4_do_bond() does not do that, which could result in unexpected behavior. An updated version of mlx4_do_bond() for use with an auxiliary bus locks the intf_mutex when accessing a new auxiliary device array. However, doing so can then result in the following deadlock: * A two-port mlx4 device is configured as an Ethernet bond. * One of the ports is changed from eth to ib, for instance, by writing into a mlx4_port sysfs attribute file. * mlx4_change_port_types() is called to update port types. It invokes mlx4_unregister_device() to unregister the device, which locks the intf_mutex and starts removing all associated interfaces. * Function mlx4_en_remove() gets invoked and starts destroying its first netdev. This triggers mlx4_en_netdev_event(), which recognizes that the configured bond is broken. It runs mlx4_en_queue_bond_work(), which takes a reference on the netdev. Removing the netdev now cannot proceed until the work is completed. * Work function mlx4_en_bond_work() gets scheduled. It calls mlx4_unbond() -> mlx4_do_bond(). The latter function tries to lock the intf_mutex but that is not possible because it is already held by mlx4_unregister_device(). This particular case could possibly be solved by unregistering the mlx4_en_netdev_event() notifier in mlx4_en_remove() earlier, but it seems better to decouple mlx4_en more and break this reference order. Avoid this scenario by recognizing that the bond reconfiguration operates only on a mlx4_dev. The logic to queue and execute the bond work can be moved into the mlx4_core driver. Only a reference on the respective mlx4_dev object needs to be taken during the work's lifetime. This removes a call from mlx4_en that can directly result in needing to lock the intf_mutex; locking it remains a privilege of the core driver. Signed-off-by: Petr Pavlu Tested-by: Leon Romanovsky Reviewed-by: Leon Romanovsky Acked-by: Tariq Toukan Signed-off-by: David S. 
Miller --- include/linux/mlx4/device.h | 13 +++++++++++++ include/linux/mlx4/driver.h | 19 ------------------- 2 files changed, 13 insertions(+), 19 deletions(-) (limited to 'include') diff --git a/include/linux/mlx4/device.h b/include/linux/mlx4/device.h index 6646634a0b9d..049d8a4b044d 100644 --- a/include/linux/mlx4/device.h +++ b/include/linux/mlx4/device.h @@ -1087,6 +1087,19 @@ static inline void *mlx4_buf_offset(struct mlx4_buf *buf, int offset) (offset & (PAGE_SIZE - 1)); } +static inline int mlx4_is_bonded(struct mlx4_dev *dev) +{ + return !!(dev->flags & MLX4_FLAG_BONDED); +} + +static inline int mlx4_is_mf_bonded(struct mlx4_dev *dev) +{ + return (mlx4_is_bonded(dev) && mlx4_is_mfunc(dev)); +} + +int mlx4_queue_bond_work(struct mlx4_dev *dev, int is_bonded, u8 v2p_p1, + u8 v2p_p2); + int mlx4_pd_alloc(struct mlx4_dev *dev, u32 *pdn); void mlx4_pd_free(struct mlx4_dev *dev, u32 pdn); int mlx4_xrcd_alloc(struct mlx4_dev *dev, u32 *xrcdn); diff --git a/include/linux/mlx4/driver.h b/include/linux/mlx4/driver.h index 0f8c9ba4c574..781d5a0c2faa 100644 --- a/include/linux/mlx4/driver.h +++ b/include/linux/mlx4/driver.h @@ -66,25 +66,6 @@ struct mlx4_interface { int mlx4_register_interface(struct mlx4_interface *intf); void mlx4_unregister_interface(struct mlx4_interface *intf); -int mlx4_bond(struct mlx4_dev *dev); -int mlx4_unbond(struct mlx4_dev *dev); -static inline int mlx4_is_bonded(struct mlx4_dev *dev) -{ - return !!(dev->flags & MLX4_FLAG_BONDED); -} - -static inline int mlx4_is_mf_bonded(struct mlx4_dev *dev) -{ - return (mlx4_is_bonded(dev) && mlx4_is_mfunc(dev)); -} - -struct mlx4_port_map { - u8 port1; - u8 port2; -}; - -int mlx4_port_map_set(struct mlx4_dev *dev, struct mlx4_port_map *v2p); - int mlx4_register_event_notifier(struct mlx4_dev *dev, struct notifier_block *nb); int mlx4_unregister_event_notifier(struct mlx4_dev *dev, -- cgit v1.2.3 From 8c2d2b87719bad9c1db4a7919e74f7818a8ca3de Mon Sep 17 00:00:00 2001 From: Petr Pavlu Date: Mon, 21 Aug 
2023 15:12:22 +0200 Subject: mlx4: Register mlx4 devices to an auxiliary virtual bus Add an auxiliary virtual bus to model the mlx4 driver structure. The code is added alongside the current custom device management logic. Subsequent patches switch mlx4_en and mlx4_ib to the auxiliary bus and the old interface is then removed. Structure mlx4_priv gains a new adev dynamic array to keep track of its auxiliary devices. Access to the array is protected by the global mlx4_intf mutex. Functions mlx4_register_device() and mlx4_unregister_device() are updated to expose auxiliary devices on the bus in order to load mlx4_en and/or mlx4_ib. Functions mlx4_register_auxiliary_driver() and mlx4_unregister_auxiliary_driver() are added to substitute mlx4_register_interface() and mlx4_unregister_interface(), respectively. Function mlx4_do_bond() is adjusted to walk over the adev array and re-add a specific auxiliary device if its driver sets the MLX4_INTFF_BONDING flag. Signed-off-by: Petr Pavlu Tested-by: Leon Romanovsky Reviewed-by: Leon Romanovsky Acked-by: Tariq Toukan Signed-off-by: David S. 
Miller --- include/linux/mlx4/device.h | 7 +++++++ include/linux/mlx4/driver.h | 11 +++++++++++ 2 files changed, 18 insertions(+) (limited to 'include') diff --git a/include/linux/mlx4/device.h b/include/linux/mlx4/device.h index 049d8a4b044d..27f42f713c89 100644 --- a/include/linux/mlx4/device.h +++ b/include/linux/mlx4/device.h @@ -33,6 +33,7 @@ #ifndef MLX4_DEVICE_H #define MLX4_DEVICE_H +#include #include #include #include @@ -889,6 +890,12 @@ struct mlx4_dev { u8 uar_page_shift; }; +struct mlx4_adev { + struct auxiliary_device adev; + struct mlx4_dev *mdev; + int idx; +}; + struct mlx4_clock_params { u64 offset; u8 bar; diff --git a/include/linux/mlx4/driver.h b/include/linux/mlx4/driver.h index 781d5a0c2faa..9cf157d381c6 100644 --- a/include/linux/mlx4/driver.h +++ b/include/linux/mlx4/driver.h @@ -34,9 +34,12 @@ #define MLX4_DRIVER_H #include +#include #include #include +#define MLX4_ADEV_NAME "mlx4_core" + struct mlx4_dev; #define MLX4_MAC_MASK 0xffffffffffffULL @@ -63,8 +66,16 @@ struct mlx4_interface { int flags; }; +struct mlx4_adrv { + struct auxiliary_driver adrv; + enum mlx4_protocol protocol; + int flags; +}; + int mlx4_register_interface(struct mlx4_interface *intf); void mlx4_unregister_interface(struct mlx4_interface *intf); +int mlx4_register_auxiliary_driver(struct mlx4_adrv *madrv); +void mlx4_unregister_auxiliary_driver(struct mlx4_adrv *madrv); int mlx4_register_event_notifier(struct mlx4_dev *dev, struct notifier_block *nb); -- cgit v1.2.3 From c138cdb89a14ce00d45e77d3db18263e4b9f9465 Mon Sep 17 00:00:00 2001 From: Petr Pavlu Date: Mon, 21 Aug 2023 15:12:25 +0200 Subject: mlx4: Delete custom device management logic After the conversion to use the auxiliary bus, the custom device management is not needed anymore and can be deleted. Signed-off-by: Petr Pavlu Tested-by: Leon Romanovsky Reviewed-by: Leon Romanovsky Acked-by: Tariq Toukan Signed-off-by: David S. 
Miller --- include/linux/mlx4/driver.h | 10 ---------- 1 file changed, 10 deletions(-) (limited to 'include') diff --git a/include/linux/mlx4/driver.h b/include/linux/mlx4/driver.h index 9cf157d381c6..69825223081f 100644 --- a/include/linux/mlx4/driver.h +++ b/include/linux/mlx4/driver.h @@ -58,22 +58,12 @@ enum { MLX4_INTFF_BONDING = 1 << 0 }; -struct mlx4_interface { - void * (*add) (struct mlx4_dev *dev); - void (*remove)(struct mlx4_dev *dev, void *context); - struct list_head list; - enum mlx4_protocol protocol; - int flags; -}; - struct mlx4_adrv { struct auxiliary_driver adrv; enum mlx4_protocol protocol; int flags; }; -int mlx4_register_interface(struct mlx4_interface *intf); -void mlx4_unregister_interface(struct mlx4_interface *intf); int mlx4_register_auxiliary_driver(struct mlx4_adrv *madrv); void mlx4_unregister_auxiliary_driver(struct mlx4_adrv *madrv); -- cgit v1.2.3 From 59da9885767a75df697c84c06aaf2296e10d85a4 Mon Sep 17 00:00:00 2001 From: Krzysztof Kozlowski Date: Wed, 23 Aug 2023 10:56:32 +0200 Subject: net: dsa: use capital "OR" for multiple licenses in SPDX Documentation/process/license-rules.rst and checkpatch expect the SPDX identifier syntax for multiple licenses to use capital "OR". Correct it to keep consistent format and avoid copy-paste issues. 
Signed-off-by: Krzysztof Kozlowski Reviewed-by: Kurt Kanzenbach Reviewed-by: FLorian Fainelli Link: https://lore.kernel.org/r/20230823085632.116725-1-krzysztof.kozlowski@linaro.org Signed-off-by: Jakub Kicinski --- include/linux/platform_data/hirschmann-hellcreek.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include') diff --git a/include/linux/platform_data/hirschmann-hellcreek.h b/include/linux/platform_data/hirschmann-hellcreek.h index 6a000df5541f..8748680e9e3c 100644 --- a/include/linux/platform_data/hirschmann-hellcreek.h +++ b/include/linux/platform_data/hirschmann-hellcreek.h @@ -1,4 +1,4 @@ -/* SPDX-License-Identifier: (GPL-2.0 or MIT) */ +/* SPDX-License-Identifier: (GPL-2.0 OR MIT) */ /* * Hirschmann Hellcreek TSN switch platform data. * -- cgit v1.2.3 From fbdc4bc47268953c80853489f696e02d61f9a2c6 Mon Sep 17 00:00:00 2001 From: Iulia Tanasescu Date: Thu, 17 Aug 2023 09:44:27 +0300 Subject: Bluetooth: ISO: Use defer setup to separate PA sync and BIG sync This commit implements defer setup support for the Broadcast Sink scenario: By setting defer setup on a broadcast socket before calling listen, the user is able to trigger the PA sync and BIG sync procedures separately. This is useful if the user first wants to synchronize to the periodic advertising transmitted by a Broadcast Source, and trigger the BIG sync procedure later on. If defer setup is set, once a PA sync established event arrives, a new hcon is created and notified to the ISO layer. A child socket associated with the PA sync connection will be added to the accept queue of the listening socket. Once the accept call returns the fd for the PA sync child socket, the user should call read on that fd. This will trigger the BIG create sync procedure, and the PA sync socket will become a listening socket itself. When the BIG sync established event is notified to the ISO layer, the bis connections will be added to the accept queue of the PA sync parent. 
The user should call accept on the PA sync socket to get the final bis connections. Signed-off-by: Iulia Tanasescu Signed-off-by: Luiz Augusto von Dentz --- include/net/bluetooth/hci_core.h | 30 ++++++++++++++++++++++++++++-- 1 file changed, 28 insertions(+), 2 deletions(-) (limited to 'include') diff --git a/include/net/bluetooth/hci_core.h b/include/net/bluetooth/hci_core.h index c53d74236e3a..6fb055e3c595 100644 --- a/include/net/bluetooth/hci_core.h +++ b/include/net/bluetooth/hci_core.h @@ -978,6 +978,8 @@ enum { HCI_CONN_CREATE_CIS, HCI_CONN_BIG_SYNC, HCI_CONN_BIG_SYNC_FAILED, + HCI_CONN_PA_SYNC, + HCI_CONN_PA_SYNC_FAILED, }; static inline bool hci_conn_ssp_enabled(struct hci_conn *conn) @@ -1300,7 +1302,7 @@ static inline struct hci_conn *hci_conn_hash_lookup_big_any_dst(struct hci_dev * if (c->type != ISO_LINK) continue; - if (handle == c->iso_qos.bcast.big) { + if (handle != BT_ISO_QOS_BIG_UNSET && handle == c->iso_qos.bcast.big) { rcu_read_unlock(); return c; } @@ -1311,6 +1313,29 @@ static inline struct hci_conn *hci_conn_hash_lookup_big_any_dst(struct hci_dev * return NULL; } +static inline struct hci_conn * +hci_conn_hash_lookup_pa_sync(struct hci_dev *hdev, __u8 big) +{ + struct hci_conn_hash *h = &hdev->conn_hash; + struct hci_conn *c; + + rcu_read_lock(); + + list_for_each_entry_rcu(c, &h->list, list) { + if (c->type != ISO_LINK || + !test_bit(HCI_CONN_PA_SYNC, &c->flags)) + continue; + + if (c->iso_qos.bcast.big == big) { + rcu_read_unlock(); + return c; + } + } + rcu_read_unlock(); + + return NULL; +} + static inline struct hci_conn *hci_conn_hash_lookup_state(struct hci_dev *hdev, __u8 type, __u16 state) { @@ -1435,7 +1460,8 @@ struct hci_conn *hci_connect_bis(struct hci_dev *hdev, bdaddr_t *dst, __u8 data_len, __u8 *data); int hci_pa_create_sync(struct hci_dev *hdev, bdaddr_t *dst, __u8 dst_type, __u8 sid, struct bt_iso_qos *qos); -int hci_le_big_create_sync(struct hci_dev *hdev, struct bt_iso_qos *qos, +int hci_le_big_create_sync(struct hci_dev 
*hdev, struct hci_conn *hcon, + struct bt_iso_qos *qos, __u16 sync_handle, __u8 num_bis, __u8 bis[]); int hci_conn_check_link_mode(struct hci_conn *conn); int hci_conn_check_secure(struct hci_conn *conn, __u8 sec_level); -- cgit v1.2.3 From db08722fc7d46168fe31d9b8a7b29229dd959f9f Mon Sep 17 00:00:00 2001 From: Luiz Augusto von Dentz Date: Fri, 18 Aug 2023 14:19:27 -0700 Subject: Bluetooth: hci_core: Fix missing instances using HCI_MAX_AD_LENGTH There are a few instances still using HCI_MAX_AD_LENGTH instead of max_adv_len, which detects the actual maximum length depending on whether the controller supports EA or not. Fixes: 112b5090c219 ("Bluetooth: MGMT: Fix always using HCI_MAX_AD_LENGTH") Signed-off-by: Luiz Augusto von Dentz --- include/net/bluetooth/hci_core.h | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) (limited to 'include') diff --git a/include/net/bluetooth/hci_core.h b/include/net/bluetooth/hci_core.h index 6fb055e3c595..6e2988b11f99 100644 --- a/include/net/bluetooth/hci_core.h +++ b/include/net/bluetooth/hci_core.h @@ -83,7 +83,7 @@ struct discovery_state { u8 last_adv_addr_type; s8 last_adv_rssi; u32 last_adv_flags; - u8 last_adv_data[HCI_MAX_AD_LENGTH]; + u8 last_adv_data[HCI_MAX_EXT_AD_LENGTH]; u8 last_adv_data_len; bool report_invalid_rssi; bool result_filtering; @@ -290,7 +290,7 @@ struct adv_pattern { __u8 ad_type; __u8 offset; __u8 length; - __u8 value[HCI_MAX_AD_LENGTH]; + __u8 value[HCI_MAX_EXT_AD_LENGTH]; }; struct adv_rssi_thresholds { @@ -726,7 +726,7 @@ struct hci_conn { __u16 le_conn_interval; __u16 le_conn_latency; __u16 le_supv_timeout; - __u8 le_adv_data[HCI_MAX_AD_LENGTH]; + __u8 le_adv_data[HCI_MAX_EXT_AD_LENGTH]; __u8 le_adv_data_len; __u8 le_per_adv_data[HCI_MAX_PER_AD_LENGTH]; __u8 le_per_adv_data_len; -- cgit v1.2.3 From 9c0826310bfb784c9bac7d1d9454e304185446c5 Mon Sep 17 00:00:00 2001 From: Claudia Draghicescu Date: Fri, 30 Jun 2023 12:59:28 +0300 Subject: Bluetooth: ISO: Add support for
periodic adv reports processing In the case of a Periodic Synchronized Receiver, the PA report received from a Broadcaster contains the BASE, which has information about codec and other parameters of a BIG. This information is stored and the application can retrieve it using getsockopt(BT_ISO_BASE). Signed-off-by: Claudia Draghicescu Signed-off-by: Luiz Augusto von Dentz --- include/net/bluetooth/hci.h | 11 +++++++++++ 1 file changed, 11 insertions(+) (limited to 'include') diff --git a/include/net/bluetooth/hci.h b/include/net/bluetooth/hci.h index 5723405b833e..c58425d4c592 100644 --- a/include/net/bluetooth/hci.h +++ b/include/net/bluetooth/hci.h @@ -2771,6 +2771,17 @@ struct hci_ev_le_enh_conn_complete { __u8 clk_accurancy; } __packed; +#define HCI_EV_LE_PER_ADV_REPORT 0x0f +struct hci_ev_le_per_adv_report { + __le16 sync_handle; + __u8 tx_power; + __u8 rssi; + __u8 cte_type; + __u8 data_status; + __u8 length; + __u8 data[]; +} __packed; + #define HCI_EV_LE_EXT_ADV_SET_TERM 0x12 struct hci_evt_le_ext_adv_set_term { __u8 status; -- cgit v1.2.3 From 253f3399f4c09ce6f4e67350f839be0361b4d5ff Mon Sep 17 00:00:00 2001 From: Luiz Augusto von Dentz Date: Tue, 22 Aug 2023 12:02:03 -0700 Subject: Bluetooth: HCI: Introduce HCI_QUIRK_BROKEN_LE_CODED This introduces HCI_QUIRK_BROKEN_LE_CODED, which indicates that LE Coded PHY shall not be used. It is set for some Intel models that claim to support LE Coded PHY but cause many problems when it is used.
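For illustration, the effect of the quirk on the LE Coded PHY capability check can be modelled in userspace roughly as follows. This is a minimal sketch, not kernel code: every `sketch_*` / `SKETCH_*` name, the feature bit value and the quirk bit index are assumptions made for the example, and the real definitions live in include/net/bluetooth/hci.h and include/net/bluetooth/hci_core.h.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed values for illustration only; the kernel headers are authoritative. */
#define SKETCH_LE_PHY_CODED            0x08 /* feature bit in le_features[1] */
#define SKETCH_QUIRK_BROKEN_LE_CODED   0    /* hypothetical quirk bit index */

/* Hypothetical stand-in for the parts of struct hci_dev used by the check. */
struct sketch_hdev {
	uint8_t le_features[8];
	unsigned long quirks;
};

/* Userspace stand-in for the kernel's test_bit() on a single word. */
static bool sketch_test_bit(unsigned int nr, const unsigned long *addr)
{
	return (*addr >> nr) & 1UL;
}

/*
 * The controller counts as LE Coded capable only if it advertises the
 * feature AND the "broken" quirk is not set for it.
 */
static bool sketch_le_coded_capable(const struct sketch_hdev *dev)
{
	return (dev->le_features[1] & SKETCH_LE_PHY_CODED) &&
	       !sketch_test_bit(SKETCH_QUIRK_BROKEN_LE_CODED, &dev->quirks);
}
```

With this shape, a driver only has to set the quirk bit; every caller of the capability check then stops using LE Coded PHY without further changes.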
Cc: stable@vger.kernel.org # 6.4.y+ Link: https://github.com/bluez/bluez/issues/577 Link: https://github.com/bluez/bluez/issues/582 Link: https://lore.kernel.org/linux-bluetooth/CABBYNZKco-v7wkjHHexxQbgwwSz-S=GZ=dZKbRE1qxT1h4fFbQ@mail.gmail.com/T/# Fixes: 288c90224eec ("Bluetooth: Enable all supported LE PHY by default") Signed-off-by: Luiz Augusto von Dentz --- include/net/bluetooth/hci.h | 10 ++++++++++ include/net/bluetooth/hci_core.h | 4 +++- 2 files changed, 13 insertions(+), 1 deletion(-) (limited to 'include') diff --git a/include/net/bluetooth/hci.h b/include/net/bluetooth/hci.h index c58425d4c592..87d92accc26e 100644 --- a/include/net/bluetooth/hci.h +++ b/include/net/bluetooth/hci.h @@ -319,6 +319,16 @@ enum { * This quirk must be set before hci_register_dev is called. */ HCI_QUIRK_USE_MSFT_EXT_ADDRESS_FILTER, + + /* + * When this quirk is set, LE Coded PHY shall not be used. This is + * required for some Intel controllers which erroneously claim to + * support it but it causes problems with extended scanning. + * + * This quirk can be set before hci_register_dev is called or + * during the hdev->setup vendor callback. 
+ */ + HCI_QUIRK_BROKEN_LE_CODED, }; /* HCI device flags */ diff --git a/include/net/bluetooth/hci_core.h b/include/net/bluetooth/hci_core.h index 6e2988b11f99..e6359f7346f1 100644 --- a/include/net/bluetooth/hci_core.h +++ b/include/net/bluetooth/hci_core.h @@ -1817,7 +1817,9 @@ void hci_conn_del_sysfs(struct hci_conn *conn); #define scan_2m(dev) (((dev)->le_tx_def_phys & HCI_LE_SET_PHY_2M) || \ ((dev)->le_rx_def_phys & HCI_LE_SET_PHY_2M)) -#define le_coded_capable(dev) (((dev)->le_features[1] & HCI_LE_PHY_CODED)) +#define le_coded_capable(dev) (((dev)->le_features[1] & HCI_LE_PHY_CODED) && \ + !test_bit(HCI_QUIRK_BROKEN_LE_CODED, \ + &(dev)->quirks)) #define scan_coded(dev) (((dev)->le_tx_def_phys & HCI_LE_SET_PHY_CODED) || \ ((dev)->le_rx_def_phys & HCI_LE_SET_PHY_CODED)) -- cgit v1.2.3 From d55595f04dcc8bd6f6ff33f451dda8de3f1232da Mon Sep 17 00:00:00 2001 From: Jiawen Wu Date: Wed, 23 Aug 2023 14:19:28 +0800 Subject: net: pcs: xpcs: add specific vendor support for Wangxun 10Gb NICs Since Wangxun 10Gb NICs require some special configuration on the IP of Synopsys Designware XPCS, introduce dev_flag for different vendors. Read the OUI from the device identifier registers to detect Wangxun devices, and skip xpcs_soft_reset() for them to avoid resetting the device identifier registers. Signed-off-by: Jiawen Wu Signed-off-by: David S.
Miller --- include/linux/pcs/pcs-xpcs.h | 7 +++++++ 1 file changed, 7 insertions(+) (limited to 'include') diff --git a/include/linux/pcs/pcs-xpcs.h b/include/linux/pcs/pcs-xpcs.h index ff99cf7a5d0d..f37e8acfd351 100644 --- a/include/linux/pcs/pcs-xpcs.h +++ b/include/linux/pcs/pcs-xpcs.h @@ -20,12 +20,19 @@ #define DW_AN_C37_1000BASEX 4 #define DW_10GBASER 5 +/* device vendor OUI */ +#define DW_OUI_WX 0x0018fc80 + +/* dev_flag */ +#define DW_DEV_TXGBE BIT(0) + struct xpcs_id; struct dw_xpcs { struct mdio_device *mdiodev; const struct xpcs_id *id; struct phylink_pcs pcs; + int dev_flag; }; int xpcs_get_an_mode(struct dw_xpcs *xpcs, phy_interface_t interface); -- cgit v1.2.3 From f629acc6f21043fdc80ae93cdafa6713888db0fd Mon Sep 17 00:00:00 2001 From: Jiawen Wu Date: Wed, 23 Aug 2023 14:19:29 +0800 Subject: net: pcs: xpcs: support to switch mode for Wangxun NICs According to chapter 6 of DesignWare Cores Ethernet PCS (version 3.20a) and custom design manual, add a configuration flow for switching interface mode. If the interface changes, the following sequence is required: 1. wait for VR_XS_PCS_DIG_STS bit(4, 2) [PSEQ_STATE] = 100b (Power-Good) 2. write SR_XS_PCS_CTRL2 to select the PCS type 3. write SR_PMA_CTRL1 and/or SR_XS_PCS_CTRL1 for link speed 4. program PMA registers 5. write VR_XS_PCS_DIG_CTRL1 bit(15) [VR_RST] = 1b (Vendor-Specific Soft Reset) 6. wait for VR_XS_PCS_DIG_CTRL1 bit(15) [VR_RST] to get cleared Only 10GBASE-R/SGMII/1000BASE-X modes are planned for the current Wangxun devices. There is also a quirk for Wangxun devices: the mode is switched even if the interface in phylink state has not changed, since the PCS falls back to the default 10GBASE-R when the ethernet driver (txgbe) does a LAN reset. Signed-off-by: Jiawen Wu Signed-off-by: David S.
Miller --- include/linux/pcs/pcs-xpcs.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include') diff --git a/include/linux/pcs/pcs-xpcs.h b/include/linux/pcs/pcs-xpcs.h index f37e8acfd351..da3a6c30f6d2 100644 --- a/include/linux/pcs/pcs-xpcs.h +++ b/include/linux/pcs/pcs-xpcs.h @@ -32,6 +32,7 @@ struct dw_xpcs { struct mdio_device *mdiodev; const struct xpcs_id *id; struct phylink_pcs pcs; + phy_interface_t interface; int dev_flag; }; -- cgit v1.2.3 From a4f39c9f14a634e4cd35fcd338c239d11fcc73fc Mon Sep 17 00:00:00 2001 From: Nicolas Dichtel Date: Wed, 23 Aug 2023 15:41:02 +0200 Subject: net: handle ARPHRD_PPP in dev_is_mac_header_xmit() The goal is to support a bpf_redirect() from an ethernet device (ingress) to a ppp device (egress). The l2 header is added automatically by the ppp driver, thus the ethernet header should be removed. CC: stable@vger.kernel.org Fixes: 27b29f63058d ("bpf: add bpf_redirect() helper") Signed-off-by: Nicolas Dichtel Tested-by: Siwar Zitouni Reviewed-by: Guillaume Nault Signed-off-by: David S. Miller --- include/linux/if_arp.h | 4 ++++ 1 file changed, 4 insertions(+) (limited to 'include') diff --git a/include/linux/if_arp.h b/include/linux/if_arp.h index 1ed52441972f..10a1e81434cb 100644 --- a/include/linux/if_arp.h +++ b/include/linux/if_arp.h @@ -53,6 +53,10 @@ static inline bool dev_is_mac_header_xmit(const struct net_device *dev) case ARPHRD_NONE: case ARPHRD_RAWIP: case ARPHRD_PIMREG: + /* PPP adds its l2 header automatically in ppp_start_xmit(). + * This makes it look like an l3 device to __bpf_redirect() and tcf_mirred_init(). + */ + case ARPHRD_PPP: return false; default: return true; -- cgit v1.2.3 From 2a6d50b50d6d589d43a90d6ca990b8b811e67701 Mon Sep 17 00:00:00 2001 From: Dave Marchevsky Date: Mon, 21 Aug 2023 12:33:06 -0700 Subject: bpf: Consider non-owning refs trusted Recent discussions around default kptr "trustedness" led to changes such as commit 6fcd486b3a0a ("bpf: Refactor RCU enforcement in the verifier."). 
One of the conclusions of those discussions, as expressed in code and comments in that patch, is that we'd like to move away from 'raw' PTR_TO_BTF_ID without some type flag or other register state indicating trustedness. Although PTR_TRUSTED and PTR_UNTRUSTED flags mark this state explicitly, the verifier currently considers trustedness implied by other register state. For example, owning refs to graph collection nodes must have a nonzero ref_obj_id, so they pass the is_trusted_reg check despite having no explicit PTR_{UN}TRUSTED flag. This patch makes trustedness of non-owning refs to graph collection nodes explicit as well. By definition, non-owning refs are currently trusted. Although the ref has no control over pointee lifetime, due to non-owning ref clobbering rules (see invalidate_non_owning_refs) dereferencing a non-owning ref is safe in the critical section controlled by bpf_spin_lock associated with its owning collection. Note that the previous statement does not hold true for nodes with shared ownership due to the use-after-free issue that this series is addressing. True shared ownership was disabled by commit 7deca5eae833 ("bpf: Disable bpf_refcount_acquire kfunc calls until race conditions are fixed"), though, so the statement holds for now. Further patches in the series will change the trustedness state of non-owning refs before re-enabling bpf_refcount_acquire. Let's add NON_OWN_REF type flag to BPF_REG_TRUSTED_MODIFIERS such that a non-owning ref reg state would pass is_trusted_reg check. Somewhat surprisingly, this doesn't result in any change to user-visible functionality elsewhere in the verifier: graph collection nodes are all marked MEM_ALLOC, which tends to be handled in separate codepaths from "raw" PTR_TO_BTF_ID. Regardless, let's be explicit here and document the current state of things before changing it elsewhere in the series. 
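As a rough illustration of the mask-based reasoning above, the following userspace sketch models a "trusted modifiers" check: a register type passes only if it carries at least one modifier from the trusted mask and none outside it. It is a simplified model under stated assumptions, not the verifier's code; all `sketch_*` / `SKETCH_*` names and the bit positions are placeholders chosen for the example, and the real flags are defined in include/linux/bpf.h.

```c
#include <stdbool.h>
#include <stdint.h>

/* Placeholder layout: low bits hold the base type, higher bits hold flags. */
#define SKETCH_BASE_TYPE_BITS 8
#define SKETCH_BASE_TYPE_MASK ((1u << SKETCH_BASE_TYPE_BITS) - 1)
#define SKETCH_FLAG(n)        (1u << ((n) + SKETCH_BASE_TYPE_BITS))

/* Placeholder flag positions; they do not match the kernel's values. */
#define SKETCH_MEM_ALLOC      SKETCH_FLAG(0)
#define SKETCH_PTR_TRUSTED    SKETCH_FLAG(1)
#define SKETCH_NON_OWN_REF    SKETCH_FLAG(2)
#define SKETCH_PTR_UNTRUSTED  SKETCH_FLAG(3)

/* Trusted-modifier mask before and after adding NON_OWN_REF. */
#define SKETCH_TRUSTED_OLD (SKETCH_MEM_ALLOC | SKETCH_PTR_TRUSTED)
#define SKETCH_TRUSTED_NEW (SKETCH_TRUSTED_OLD | SKETCH_NON_OWN_REF)

/* Strip the base type, leaving only the modifier flags. */
static uint32_t sketch_type_flag(uint32_t type)
{
	return type & ~SKETCH_BASE_TYPE_MASK;
}

/*
 * Trusted by flags alone: at least one modifier from the mask is set and
 * no modifier outside the mask is present.
 */
static bool sketch_flags_trusted(uint32_t type, uint32_t trusted_mask)
{
	uint32_t flags = sketch_type_flag(type);

	return (flags & trusted_mask) && !(flags & ~trusted_mask);
}
```

In this model a type tagged with both the allocation flag and the non-owning flag fails the check under the old mask, because the non-owning flag counts as an unknown modifier, and passes under the new one.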
Signed-off-by: Dave Marchevsky Acked-by: Yonghong Song Link: https://lore.kernel.org/r/20230821193311.3290257-3-davemarchevsky@fb.com Signed-off-by: Alexei Starovoitov --- include/linux/bpf_verifier.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include') diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h index f70f9ac884d2..b6e58dab8e27 100644 --- a/include/linux/bpf_verifier.h +++ b/include/linux/bpf_verifier.h @@ -745,7 +745,7 @@ static inline bool bpf_prog_check_recur(const struct bpf_prog *prog) } } -#define BPF_REG_TRUSTED_MODIFIERS (MEM_ALLOC | PTR_TRUSTED) +#define BPF_REG_TRUSTED_MODIFIERS (MEM_ALLOC | PTR_TRUSTED | NON_OWN_REF) static inline bool bpf_type_has_unsafe_modifiers(u32 type) { -- cgit v1.2.3 From 0816b8c6bf7fc87cec4273dc199e8f0764b9e7b1 Mon Sep 17 00:00:00 2001 From: Dave Marchevsky Date: Mon, 21 Aug 2023 12:33:09 -0700 Subject: bpf: Consider non-owning refs to refcounted nodes RCU protected An earlier patch in the series ensures that the underlying memory of nodes with bpf_refcount - which can have multiple owners - is not reused until RCU grace period has elapsed. This prevents use-after-free with non-owning references that may point to recently-freed memory. While RCU read lock is held, it's safe to dereference such a non-owning ref, as by definition RCU GP couldn't have elapsed and therefore underlying memory couldn't have been reused. From the perspective of verifier "trustedness" non-owning refs to refcounted nodes are now trusted only in RCU CS and therefore should no longer pass is_trusted_reg, but rather is_rcu_reg. Let's mark them MEM_RCU in order to reflect this new state. 
Signed-off-by: Dave Marchevsky Link: https://lore.kernel.org/r/20230821193311.3290257-6-davemarchevsky@fb.com Signed-off-by: Alexei Starovoitov --- include/linux/bpf.h | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) (limited to 'include') diff --git a/include/linux/bpf.h b/include/linux/bpf.h index eced6400f778..12596af59c00 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -653,7 +653,8 @@ enum bpf_type_flag { MEM_RCU = BIT(13 + BPF_BASE_TYPE_BITS), /* Used to tag PTR_TO_BTF_ID | MEM_ALLOC references which are non-owning. - * Currently only valid for linked-list and rbtree nodes. + * Currently only valid for linked-list and rbtree nodes. If the nodes + * have a bpf_refcount_field, they must be tagged MEM_RCU as well. */ NON_OWN_REF = BIT(14 + BPF_BASE_TYPE_BITS), -- cgit v1.2.3 From 70934c7c99ad01778eef83e898df4c624e52492f Mon Sep 17 00:00:00 2001 From: "Russell King (Oracle)" Date: Thu, 24 Aug 2023 14:37:53 +0100 Subject: net: phylink: add phylink_limit_mac_speed() Add a function which can be used to limit the phylink MAC capabilities to an upper speed limit. Signed-off-by: Russell King (Oracle) Link: https://lore.kernel.org/r/E1qZAX3-005pTi-K1@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski --- include/linux/phylink.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include') diff --git a/include/linux/phylink.h b/include/linux/phylink.h index 789c516c6b4a..7d07f8736431 100644 --- a/include/linux/phylink.h +++ b/include/linux/phylink.h @@ -223,6 +223,8 @@ struct phylink_config { unsigned long mac_capabilities; }; +void phylink_limit_mac_speed(struct phylink_config *config, u32 max_speed); + /** * struct phylink_mac_ops - MAC operations structure. * @validate: Validate and update the link configuration. 
-- cgit v1.2.3 From e80af2acdef73d90ce56d6849d96f2a5862cec2c Mon Sep 17 00:00:00 2001 From: "Russell King (Oracle)" Date: Thu, 24 Aug 2023 14:37:58 +0100 Subject: net: stmmac: convert plat->phylink_node to fwnode All users of plat->phylink_node first convert it to a fwnode. Rather than repeatedly convert to a fwnode, store it as a fwnode. To reflect this change, call it plat->port_node instead - it is used for more than just phylink. Signed-off-by: Russell King (Oracle) Link: https://lore.kernel.org/r/E1qZAX8-005pTo-OT@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski --- include/linux/stmmac.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include') diff --git a/include/linux/stmmac.h b/include/linux/stmmac.h index 784277d666eb..b2ccd827bb80 100644 --- a/include/linux/stmmac.h +++ b/include/linux/stmmac.h @@ -227,7 +227,7 @@ struct plat_stmmacenet_data { phy_interface_t phy_interface; struct stmmac_mdio_bus_data *mdio_bus_data; struct device_node *phy_node; - struct device_node *phylink_node; + struct fwnode_handle *port_node; struct device_node *mdio_node; struct stmmac_dma_cfg *dma_cfg; struct stmmac_est *est; -- cgit v1.2.3 From 62b6442c58dc17b168f69b37b398a9cab7cd90c9 Mon Sep 17 00:00:00 2001 From: Dima Chumak Date: Thu, 24 Aug 2023 23:28:29 -0700 Subject: devlink: Expose port function commands to control IPsec crypto offloads Expose port function commands to enable / disable IPsec crypto offloads, this is used to control the port IPsec capabilities. When IPsec crypto is disabled for a function of the port (default), function cannot offload any IPsec crypto operations (Encrypt/Decrypt and XFRM state offloading). When enabled, IPsec crypto operations can be offloaded by the function of the port. 
Example of a PCI VF port which supports IPsec crypto offloads: $ devlink port show pci/0000:06:00.0/1 pci/0000:06:00.0/1: type eth netdev enp6s0pf0vf0 flavour pcivf pfnum 0 vfnum 0 function: hw_addr 00:00:00:00:00:00 roce enable ipsec_crypto disable $ devlink port function set pci/0000:06:00.0/1 ipsec_crypto enable $ devlink port show pci/0000:06:00.0/1 pci/0000:06:00.0/1: type eth netdev enp6s0pf0vf0 flavour pcivf pfnum 0 vfnum 0 function: hw_addr 00:00:00:00:00:00 roce enable ipsec_crypto enable Signed-off-by: Dima Chumak Signed-off-by: Leon Romanovsky Signed-off-by: Saeed Mahameed Reviewed-by: Jiri Pirko Link: https://lore.kernel.org/r/20230825062836.103744-2-saeed@kernel.org Signed-off-by: Jakub Kicinski --- include/net/devlink.h | 15 +++++++++++++++ include/uapi/linux/devlink.h | 2 ++ 2 files changed, 17 insertions(+) (limited to 'include') diff --git a/include/net/devlink.h b/include/net/devlink.h index f7fec0791acc..1cf07a820a0e 100644 --- a/include/net/devlink.h +++ b/include/net/devlink.h @@ -1583,6 +1583,15 @@ void devlink_free(struct devlink *devlink); * Should be used by device drivers set * the admin state of a function managed * by the devlink port. + * @port_fn_ipsec_crypto_get: Callback used to get port function's ipsec_crypto + * capability. Should be used by device drivers + * to report the current state of ipsec_crypto + * capability of a function managed by the devlink + * port. + * @port_fn_ipsec_crypto_set: Callback used to set port function's ipsec_crypto + * capability. Should be used by device drivers to + * enable/disable ipsec_crypto capability of a + * function managed by the devlink port. * * Note: Driver should return -EOPNOTSUPP if it doesn't support * port function (@port_fn_*) handling for a particular port. 
@@ -1620,6 +1629,12 @@ struct devlink_port_ops { int (*port_fn_state_set)(struct devlink_port *port, enum devlink_port_fn_state state, struct netlink_ext_ack *extack); + int (*port_fn_ipsec_crypto_get)(struct devlink_port *devlink_port, + bool *is_enable, + struct netlink_ext_ack *extack); + int (*port_fn_ipsec_crypto_set)(struct devlink_port *devlink_port, + bool enable, + struct netlink_ext_ack *extack); }; void devlink_port_init(struct devlink *devlink, diff --git a/include/uapi/linux/devlink.h b/include/uapi/linux/devlink.h index 3782d4219ac9..f9ae9a058ad2 100644 --- a/include/uapi/linux/devlink.h +++ b/include/uapi/linux/devlink.h @@ -661,6 +661,7 @@ enum devlink_resource_unit { enum devlink_port_fn_attr_cap { DEVLINK_PORT_FN_ATTR_CAP_ROCE_BIT, DEVLINK_PORT_FN_ATTR_CAP_MIGRATABLE_BIT, + DEVLINK_PORT_FN_ATTR_CAP_IPSEC_CRYPTO_BIT, /* Add new caps above */ __DEVLINK_PORT_FN_ATTR_CAPS_MAX, @@ -669,6 +670,7 @@ enum devlink_port_fn_attr_cap { #define DEVLINK_PORT_FN_CAP_ROCE _BITUL(DEVLINK_PORT_FN_ATTR_CAP_ROCE_BIT) #define DEVLINK_PORT_FN_CAP_MIGRATABLE \ _BITUL(DEVLINK_PORT_FN_ATTR_CAP_MIGRATABLE_BIT) +#define DEVLINK_PORT_FN_CAP_IPSEC_CRYPTO _BITUL(DEVLINK_PORT_FN_ATTR_CAP_IPSEC_CRYPTO_BIT) enum devlink_port_function_attr { DEVLINK_PORT_FUNCTION_ATTR_UNSPEC, -- cgit v1.2.3 From 390a24cbc39626a8a38c6d877a59f758fe209f2d Mon Sep 17 00:00:00 2001 From: Dima Chumak Date: Thu, 24 Aug 2023 23:28:30 -0700 Subject: devlink: Expose port function commands to control IPsec packet offloads Expose port function commands to enable / disable IPsec packet offloads, this is used to control the port IPsec capabilities. When IPsec packet is disabled for a function of the port (default), function cannot offload IPsec packet operations (encapsulation and XFRM policy offload). When enabled, IPsec packet operations can be offloaded by the function of the port, which includes crypto operation (Encrypt/Decrypt), IPsec encapsulation and XFRM state and policy offload. 
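The capability bits exposed by these patches can be pictured as a bitmask that is updated with a value/selector pair, where the selector names the bits the request touches and the value gives their new state. The sketch below is a userspace model under that assumption; the `sketch_*` / `SKETCH_*` names are hypothetical and only mirror the ordering of the enum in include/uapi/linux/devlink.h.

```c
#include <stdint.h>

/* Mirrors the order of enum devlink_port_fn_attr_cap; names are local. */
enum sketch_cap_bit {
	SKETCH_CAP_ROCE_BIT,
	SKETCH_CAP_MIGRATABLE_BIT,
	SKETCH_CAP_IPSEC_CRYPTO_BIT,
	SKETCH_CAP_IPSEC_PACKET_BIT,
};

#define SKETCH_BITUL(n)         (1UL << (n))
#define SKETCH_CAP_ROCE         SKETCH_BITUL(SKETCH_CAP_ROCE_BIT)
#define SKETCH_CAP_IPSEC_CRYPTO SKETCH_BITUL(SKETCH_CAP_IPSEC_CRYPTO_BIT)
#define SKETCH_CAP_IPSEC_PACKET SKETCH_BITUL(SKETCH_CAP_IPSEC_PACKET_BIT)

/* Hypothetical request: selector picks the bits, value sets their state. */
struct sketch_caps_req {
	uint32_t value;
	uint32_t selector;
};

/* Apply a request: bits outside the selector keep their current state. */
static uint32_t sketch_apply_caps(uint32_t current, struct sketch_caps_req req)
{
	return (current & ~req.selector) | (req.value & req.selector);
}
```

Under this model, enabling ipsec_packet on a function that already has roce enabled leaves roce untouched, which matches the before/after output in the example that follows.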
Example of a PCI VF port which supports IPsec packet offloads: $ devlink port show pci/0000:06:00.0/1 pci/0000:06:00.0/1: type eth netdev enp6s0pf0vf0 flavour pcivf pfnum 0 vfnum 0 function: hw_addr 00:00:00:00:00:00 roce enable ipsec_packet disable $ devlink port function set pci/0000:06:00.0/1 ipsec_packet enable $ devlink port show pci/0000:06:00.0/1 pci/0000:06:00.0/1: type eth netdev enp6s0pf0vf0 flavour pcivf pfnum 0 vfnum 0 function: hw_addr 00:00:00:00:00:00 roce enable ipsec_packet enable Signed-off-by: Dima Chumak Signed-off-by: Leon Romanovsky Signed-off-by: Saeed Mahameed Reviewed-by: Jiri Pirko Link: https://lore.kernel.org/r/20230825062836.103744-3-saeed@kernel.org Signed-off-by: Jakub Kicinski --- include/net/devlink.h | 15 +++++++++++++++ include/uapi/linux/devlink.h | 2 ++ 2 files changed, 17 insertions(+) (limited to 'include') diff --git a/include/net/devlink.h b/include/net/devlink.h index 1cf07a820a0e..29fd1b4ee654 100644 --- a/include/net/devlink.h +++ b/include/net/devlink.h @@ -1592,6 +1592,15 @@ void devlink_free(struct devlink *devlink); * capability. Should be used by device drivers to * enable/disable ipsec_crypto capability of a * function managed by the devlink port. + * @port_fn_ipsec_packet_get: Callback used to get port function's ipsec_packet + * capability. Should be used by device drivers + * to report the current state of ipsec_packet + * capability of a function managed by the devlink + * port. + * @port_fn_ipsec_packet_set: Callback used to set port function's ipsec_packet + * capability. Should be used by device drivers to + * enable/disable ipsec_packet capability of a + * function managed by the devlink port. * * Note: Driver should return -EOPNOTSUPP if it doesn't support * port function (@port_fn_*) handling for a particular port. 
@@ -1635,6 +1644,12 @@ struct devlink_port_ops { int (*port_fn_ipsec_crypto_set)(struct devlink_port *devlink_port, bool enable, struct netlink_ext_ack *extack); + int (*port_fn_ipsec_packet_get)(struct devlink_port *devlink_port, + bool *is_enable, + struct netlink_ext_ack *extack); + int (*port_fn_ipsec_packet_set)(struct devlink_port *devlink_port, + bool enable, + struct netlink_ext_ack *extack); }; void devlink_port_init(struct devlink *devlink, diff --git a/include/uapi/linux/devlink.h b/include/uapi/linux/devlink.h index f9ae9a058ad2..03875e078be8 100644 --- a/include/uapi/linux/devlink.h +++ b/include/uapi/linux/devlink.h @@ -662,6 +662,7 @@ enum devlink_port_fn_attr_cap { DEVLINK_PORT_FN_ATTR_CAP_ROCE_BIT, DEVLINK_PORT_FN_ATTR_CAP_MIGRATABLE_BIT, DEVLINK_PORT_FN_ATTR_CAP_IPSEC_CRYPTO_BIT, + DEVLINK_PORT_FN_ATTR_CAP_IPSEC_PACKET_BIT, /* Add new caps above */ __DEVLINK_PORT_FN_ATTR_CAPS_MAX, @@ -671,6 +672,7 @@ enum devlink_port_fn_attr_cap { #define DEVLINK_PORT_FN_CAP_MIGRATABLE \ _BITUL(DEVLINK_PORT_FN_ATTR_CAP_MIGRATABLE_BIT) #define DEVLINK_PORT_FN_CAP_IPSEC_CRYPTO _BITUL(DEVLINK_PORT_FN_ATTR_CAP_IPSEC_CRYPTO_BIT) +#define DEVLINK_PORT_FN_CAP_IPSEC_PACKET _BITUL(DEVLINK_PORT_FN_ATTR_CAP_IPSEC_PACKET_BIT) enum devlink_port_function_attr { DEVLINK_PORT_FUNCTION_ATTR_UNSPEC, -- cgit v1.2.3 From 17c8da5a3423c9f3800a256dbaa685bae18e1264 Mon Sep 17 00:00:00 2001 From: Leon Romanovsky Date: Thu, 24 Aug 2023 23:28:33 -0700 Subject: net/mlx5: Add IFC bits to support IPsec enable/disable Add hardware definitions to allow to control IPSec capabilities. 
Signed-off-by: Leon Romanovsky Signed-off-by: Saeed Mahameed Link: https://lore.kernel.org/r/20230825062836.103744-6-saeed@kernel.org Signed-off-by: Jakub Kicinski --- include/linux/mlx5/mlx5_ifc.h | 3 +++ 1 file changed, 3 insertions(+) (limited to 'include') diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h index 08dcb1f43be7..fc3db401f8a2 100644 --- a/include/linux/mlx5/mlx5_ifc.h +++ b/include/linux/mlx5/mlx5_ifc.h @@ -65,9 +65,11 @@ enum { enum { MLX5_SET_HCA_CAP_OP_MOD_GENERAL_DEVICE = 0x0, + MLX5_SET_HCA_CAP_OP_MOD_ETHERNET_OFFLOADS = 0x1, MLX5_SET_HCA_CAP_OP_MOD_ODP = 0x2, MLX5_SET_HCA_CAP_OP_MOD_ATOMIC = 0x3, MLX5_SET_HCA_CAP_OP_MOD_ROCE = 0x4, + MLX5_SET_HCA_CAP_OP_MOD_IPSEC = 0x15, MLX5_SET_HCA_CAP_OP_MOD_GENERAL_DEVICE2 = 0x20, MLX5_SET_HCA_CAP_OP_MOD_PORT_SELECTION = 0x25, }; @@ -3451,6 +3453,7 @@ union mlx5_ifc_hca_cap_union_bits { struct mlx5_ifc_virtio_emulation_cap_bits virtio_emulation_cap; struct mlx5_ifc_macsec_cap_bits macsec_cap; struct mlx5_ifc_crypto_cap_bits crypto_cap; + struct mlx5_ifc_ipsec_cap_bits ipsec_cap; u8 reserved_at_0[0x8000]; }; -- cgit v1.2.3 From 8efd7b17a3b03242c97281d88463ad56e8f551f8 Mon Sep 17 00:00:00 2001 From: Leon Romanovsky Date: Thu, 24 Aug 2023 23:28:34 -0700 Subject: net/mlx5: Provide an interface to block change of IPsec capabilities mlx5 HW can't perform IPsec offload operations on the PF and VFs at the same time. While the previous patches added devlink knobs to change IPsec capabilities dynamically, logic is needed to block such capability changes when IPsec is already configured.
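One way to picture such blocking is a counter of active offloads that gates capability changes. The sketch below is a hypothetical userspace model, not the mlx5 implementation: only the counter name echoes the num_ipsec_offloads field added by this patch, while the struct, the helper functions, the vf_ipsec_enabled flag and the reverse check in sketch_ipsec_offload_add() are assumptions made for the example.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the relevant part of struct mlx5_core_dev. */
struct sketch_core_dev {
	uint64_t num_ipsec_offloads; /* active IPsec offloads on the PF */
	bool vf_ipsec_enabled;       /* hypothetical: a VF owns the capability */
};

/* Register an offload on the PF; assumed to fail while a VF owns IPsec. */
static int sketch_ipsec_offload_add(struct sketch_core_dev *dev)
{
	if (dev->vf_ipsec_enabled)
		return -EBUSY;
	dev->num_ipsec_offloads++;
	return 0;
}

/* Drop one offload; the counter never goes below zero. */
static void sketch_ipsec_offload_del(struct sketch_core_dev *dev)
{
	if (dev->num_ipsec_offloads)
		dev->num_ipsec_offloads--;
}

/* Change the VF capability; blocked while any PF offload is configured. */
static int sketch_set_vf_ipsec_cap(struct sketch_core_dev *dev, bool enable)
{
	if (dev->num_ipsec_offloads)
		return -EBUSY;
	dev->vf_ipsec_enabled = enable;
	return 0;
}
```

The counter keeps the check cheap: the capability setter never needs to walk the configured IPsec state, it only asks whether anything is configured at all.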
Signed-off-by: Leon Romanovsky Signed-off-by: Saeed Mahameed Link: https://lore.kernel.org/r/20230825062836.103744-7-saeed@kernel.org Signed-off-by: Jakub Kicinski --- include/linux/mlx5/driver.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include') diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h index e95f10066eac..3033bbaeac81 100644 --- a/include/linux/mlx5/driver.h +++ b/include/linux/mlx5/driver.h @@ -813,6 +813,7 @@ struct mlx5_core_dev { /* MACsec notifier chain to sync MACsec core and IB database */ struct blocking_notifier_head macsec_nh; #endif + u64 num_ipsec_offloads; }; struct mlx5_db { -- cgit v1.2.3 From fd0fc6fdd8896a195cb7b0210a5ee46774718fc8 Mon Sep 17 00:00:00 2001 From: Sabrina Dubroca Date: Fri, 25 Aug 2023 23:35:09 +0200 Subject: tls: move tls_cipher_size_desc to net/tls/tls.h It's only used in net/tls/*, no need to bloat include/net/tls.h. Signed-off-by: Sabrina Dubroca Link: https://lore.kernel.org/r/dd9fad80415e5b3575b41f56b331871038362eab.1692977948.git.sd@queasysnail.net Signed-off-by: Jakub Kicinski --- include/net/tls.h | 10 ---------- 1 file changed, 10 deletions(-) (limited to 'include') diff --git a/include/net/tls.h b/include/net/tls.h index 06fca9160346..a2b44578dcb7 100644 --- a/include/net/tls.h +++ b/include/net/tls.h @@ -51,16 +51,6 @@ struct tls_rec; -struct tls_cipher_size_desc { - unsigned int iv; - unsigned int key; - unsigned int salt; - unsigned int tag; - unsigned int rec_seq; -}; - -extern const struct tls_cipher_size_desc tls_cipher_size_desc[]; - /* Maximum data size carried in a TLS record */ #define TLS_MAX_PAYLOAD_SIZE ((size_t)1 << 14) -- cgit v1.2.3 From 72f93a3136ee18fd59fa6579f84c07e93424681e Mon Sep 17 00:00:00 2001 From: Antonio Napolitano Date: Sat, 26 Aug 2023 01:05:50 +0200 Subject: r8152: add vendor/device ID pair for D-Link DUB-E250 The D-Link DUB-E250 is an RTL8156 based 2.5G Ethernet controller. Add the vendor and product ID values to the driver. 
This makes Ethernet work with the adapter. Signed-off-by: Antonio Napolitano Link: https://lore.kernel.org/r/CV200KJEEUPC.WPKAHXCQJ05I@mercurius Signed-off-by: Jakub Kicinski --- include/linux/usb/r8152.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include') diff --git a/include/linux/usb/r8152.h b/include/linux/usb/r8152.h index 20d88b1defc3..287e9d83fb8b 100644 --- a/include/linux/usb/r8152.h +++ b/include/linux/usb/r8152.h @@ -29,6 +29,7 @@ #define VENDOR_ID_LINKSYS 0x13b1 #define VENDOR_ID_NVIDIA 0x0955 #define VENDOR_ID_TPLINK 0x2357 +#define VENDOR_ID_DLINK 0x2001 #if IS_REACHABLE(CONFIG_USB_RTL8152) extern u8 rtl8152_get_version(struct usb_interface *intf); -- cgit v1.2.3 From a014c35556b9045ece8426df2b38eb3c5e1c1aa0 Mon Sep 17 00:00:00 2001 From: "Russell King (Oracle)" Date: Sat, 26 Aug 2023 11:02:51 +0100 Subject: net: stmmac: clarify difference between "interface" and "phy_interface" Clarify the difference between "interface" and "phy_interface" in struct plat_stmmacenet_data, both by adding a comment, and also renaming "interface" to be "mac_interface". The difference between these are: MAC ----- optional PCS ----- SerDes ----- optional PHY ----- Media ^ ^ mac_interface phy_interface Note that phylink currently only deals with phy_interface. 
Signed-off-by: Russell King (Oracle) Link: https://lore.kernel.org/r/E1qZq83-005tts-6K@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski --- include/linux/stmmac.h | 15 ++++++++++++++- 1 file changed, 14 insertions(+), 1 deletion(-) (limited to 'include') diff --git a/include/linux/stmmac.h b/include/linux/stmmac.h index b2ccd827bb80..ce89cc3e4913 100644 --- a/include/linux/stmmac.h +++ b/include/linux/stmmac.h @@ -223,7 +223,20 @@ struct dwmac4_addrs { struct plat_stmmacenet_data { int bus_id; int phy_addr; - int interface; + /* MAC ----- optional PCS ----- SerDes ----- optional PHY ----- Media + * ^ ^ + * mac_interface phy_interface + * + * mac_interface is the MAC-side interface, which may be the same + * as phy_interface if there is no intervening PCS. If there is a + * PCS, then mac_interface describes the interface mode between the + * MAC and PCS, and phy_interface describes the interface mode + * between the PCS and PHY. + */ + phy_interface_t mac_interface; + /* phy_interface is the PHY-side interface - the interface used by + * an attached PHY. + */ phy_interface_t phy_interface; struct stmmac_mdio_bus_data *mdio_bus_data; struct device_node *phy_node; -- cgit v1.2.3