summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2022-07-20lockdown: Fix kexec lockdown bypass with ima policyEric Snowberg
The lockdown LSM is primarily used in conjunction with UEFI Secure Boot. This LSM may also be used on machines without UEFI. It can also be enabled when UEFI Secure Boot is disabled. One of lockdown's features is to prevent kexec from loading untrusted kernels. Lockdown can be enabled through a bootparam or after the kernel has booted through securityfs. If IMA appraisal is used with the "ima_appraise=log" boot param, lockdown can be defeated with kexec on any machine when Secure Boot is disabled or unavailable. IMA prevents setting "ima_appraise=log" from the boot param when Secure Boot is enabled, but this does not cover cases where lockdown is used without Secure Boot. To defeat lockdown, boot without Secure Boot and add ima_appraise=log to the kernel command line; then: $ echo "integrity" > /sys/kernel/security/lockdown $ echo "appraise func=KEXEC_KERNEL_CHECK appraise_type=imasig" > \ /sys/kernel/security/ima/policy $ kexec -ls unsigned-kernel Add a call to verify ima appraisal is set to "enforce" whenever lockdown is enabled. This fixes CVE-2022-21505. Cc: stable@vger.kernel.org Fixes: 29d3c1c8dfe7 ("kexec: Allow kexec_file() with appropriate IMA policy when locked down") Signed-off-by: Eric Snowberg <eric.snowberg@oracle.com> Acked-by: Mimi Zohar <zohar@linux.ibm.com> Reviewed-by: John Haxby <john.haxby@oracle.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-07-20spi: bcm2835: bcm2835_spi_handle_err(): fix NULL pointer deref for non DMA ↵Marc Kleine-Budde
transfers In case a IRQ based transfer times out the bcm2835_spi_handle_err() function is called. Since commit 1513ceee70f2 ("spi: bcm2835: Drop dma_pending flag") the TX and RX DMA transfers are unconditionally canceled, leading to NULL pointer derefs if ctlr->dma_tx or ctlr->dma_rx are not set. Fix the NULL pointer deref by checking that ctlr->dma_tx and ctlr->dma_rx are valid pointers before accessing them. Fixes: 1513ceee70f2 ("spi: bcm2835: Drop dma_pending flag") Cc: Lukas Wunner <lukas@wunner.de> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de> Link: https://lore.kernel.org/r/20220719072234.2782764-1-mkl@pengutronix.de Signed-off-by: Mark Brown <broonie@kernel.org>
2022-07-20selftests: gpio: fix include path to kernel headers for out of tree buildsKent Gibson
When building selftests out of the kernel tree the gpio.h the include path is incorrect and the build falls back to the system includes which may be outdated. Add the KHDR_INCLUDES to the CFLAGS to include the gpio.h from the build tree. Fixes: 4f4d0af7b2d9 ("selftests: gpio: restore CFLAGS options") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Kent Gibson <warthog618@gmail.com> Signed-off-by: Bartosz Golaszewski <brgl@bgdev.pl>
2022-07-20Merge tag 'linux-can-fixes-for-5.19-20220720' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can Marc Kleine-Budde says: ==================== this is a pull request of 2 patches for net/master. The first patch is by me and fixes the detection of the mcp251863 in the mcp251xfd driver. The last patch is by Liang He and adds a missing of_node_put() in the rcar_canfd driver. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-20mlxsw: spectrum_router: Fix IPv4 nexthop gateway indicationIdo Schimmel
mlxsw needs to distinguish nexthops with a gateway from connected nexthops in order to write the former to the adjacency table of the device. The check used to rely on the fact that nexthops with a gateway have a 'link' scope whereas connected nexthops have a 'host' scope. This is no longer correct after commit 747c14307214 ("ip: fix dflt addr selection for connected nexthop"). Fix that by instead checking the address family of the gateway IP. This is a more direct way and also consistent with the IPv6 counterpart in mlxsw_sp_rt6_is_gateway(). Cc: stable@vger.kernel.org Fixes: 747c14307214 ("ip: fix dflt addr selection for connected nexthop") Fixes: 597cfe4fc339 ("nexthop: Add support for IPv4 nexthops") Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-20net/sched: cls_api: Fix flow action initializationOz Shlomo
The cited commit refactored the flow action initialization sequence to use an interface method when translating tc action instances to flow offload objects. The refactored version skips the initialization of the generic flow action attributes for tc actions, such as pedit, that allocate more than one offload entry. This can cause potential issues for drivers mapping flow action ids. Populate the generic flow action fields for all the flow action entries. Fixes: c54e1d920f04 ("flow_offload: add ops to tc_action_ops for flow action setup") Signed-off-by: Oz Shlomo <ozsh@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> ---- v1 -> v2: - coalese the generic flow action fields initialization to a single loop Reviewed-by: Baowen Zheng <baowen.zheng@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-20Merge branch 'net-sysctl-races-round-4'David S. Miller
Kuniyuki Iwashima says: ==================== sysctl: Fix data-races around ipv4_net_table (Round 4). This series fixes data-races around 17 knobs after fib_multipath_use_neigh in ipv4_net_table. tcp_fack was skipped because it's obsolete and there's no readers. So, round 5 will start with tcp_dsack, 2 rounds left for 27 knobs. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-20tcp: Fix data-races around sysctl_tcp_max_reordering.Kuniyuki Iwashima
While reading sysctl_tcp_max_reordering, it can be changed concurrently. Thus, we need to add READ_ONCE() to its readers. Fixes: dca145ffaa8d ("tcp: allow for bigger reordering level") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-20tcp: Fix a data-race around sysctl_tcp_abort_on_overflow.Kuniyuki Iwashima
While reading sysctl_tcp_abort_on_overflow, it can be changed concurrently. Thus, we need to add READ_ONCE() to its reader. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-20tcp: Fix a data-race around sysctl_tcp_rfc1337.Kuniyuki Iwashima
While reading sysctl_tcp_rfc1337, it can be changed concurrently. Thus, we need to add READ_ONCE() to its reader. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-20tcp: Fix a data-race around sysctl_tcp_stdurg.Kuniyuki Iwashima
While reading sysctl_tcp_stdurg, it can be changed concurrently. Thus, we need to add READ_ONCE() to its reader. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-20tcp: Fix a data-race around sysctl_tcp_retrans_collapse.Kuniyuki Iwashima
While reading sysctl_tcp_retrans_collapse, it can be changed concurrently. Thus, we need to add READ_ONCE() to its reader. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-20tcp: Fix data-races around sysctl_tcp_slow_start_after_idle.Kuniyuki Iwashima
While reading sysctl_tcp_slow_start_after_idle, it can be changed concurrently. Thus, we need to add READ_ONCE() to its readers. Fixes: 35089bb203f4 ("[TCP]: Add tcp_slow_start_after_idle sysctl.") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-20tcp: Fix a data-race around sysctl_tcp_thin_linear_timeouts.Kuniyuki Iwashima
While reading sysctl_tcp_thin_linear_timeouts, it can be changed concurrently. Thus, we need to add READ_ONCE() to its reader. Fixes: 36e31b0af587 ("net: TCP thin linear timeouts") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-20tcp: Fix data-races around sysctl_tcp_recovery.Kuniyuki Iwashima
While reading sysctl_tcp_recovery, it can be changed concurrently. Thus, we need to add READ_ONCE() to its readers. Fixes: 4f41b1c58a32 ("tcp: use RACK to detect losses") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-20tcp: Fix a data-race around sysctl_tcp_early_retrans.Kuniyuki Iwashima
While reading sysctl_tcp_early_retrans, it can be changed concurrently. Thus, we need to add READ_ONCE() to its reader. Fixes: eed530b6c676 ("tcp: early retransmit") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-20tcp: Fix data-races around sysctl knobs related to SYN option.Kuniyuki Iwashima
While reading these knobs, they can be changed concurrently. Thus, we need to add READ_ONCE() to their readers. - tcp_sack - tcp_window_scaling - tcp_timestamps Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-20udp: Fix a data-race around sysctl_udp_l3mdev_accept.Kuniyuki Iwashima
While reading sysctl_udp_l3mdev_accept, it can be changed concurrently. Thus, we need to add READ_ONCE() to its reader. Fixes: 63a6fff353d0 ("net: Avoid receiving packets with an l3mdev on unbound UDP sockets") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-20ip: Fix data-races around sysctl_ip_prot_sock.Kuniyuki Iwashima
sysctl_ip_prot_sock is accessed concurrently, and there is always a chance of data-race. So, all readers and writers need some basic protection to avoid load/store-tearing. Fixes: 4548b683b781 ("Introduce a sysctl that modifies the value of PROT_SOCK.") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-20ipv4: Fix data-races around sysctl_fib_multipath_hash_fields.Kuniyuki Iwashima
While reading sysctl_fib_multipath_hash_fields, it can be changed concurrently. Thus, we need to add READ_ONCE() to its readers. Fixes: ce5c9c20d364 ("ipv4: Add a sysctl to control multipath hash fields") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-20ipv4: Fix data-races around sysctl_fib_multipath_hash_policy.Kuniyuki Iwashima
While reading sysctl_fib_multipath_hash_policy, it can be changed concurrently. Thus, we need to add READ_ONCE() to its readers. Fixes: bf4e0a3db97e ("net: ipv4: add support for ECMP hash policy choice") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-20ipv4: Fix a data-race around sysctl_fib_multipath_use_neigh.Kuniyuki Iwashima
While reading sysctl_fib_multipath_use_neigh, it can be changed concurrently. Thus, we need to add READ_ONCE() to its reader. Fixes: a6db4494d218 ("net: ipv4: Consider failed nexthops in multipath routes") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-20Merge branch 'master' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec Steffen Klassert says: ==================== pull request (net): ipsec 2022-07-20 1) Fix a policy refcount imbalance in xfrm_bundle_lookup. From Hangyu Hua. 2) Fix some clang -Wformat warnings. Justin Stitt ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-20can: rcar_canfd: Add missing of_node_put() in rcar_canfd_probe()Liang He
We should use of_node_put() for the reference returned by of_get_child_by_name() which has increased the refcount. Fixes: 45721c406dcf ("can: rcar_canfd: Add support for r8a779a0 SoC") Link: https://lore.kernel.org/all/20220712095623.364287-1-windhl@126.com Signed-off-by: Liang He <windhl@126.com> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2022-07-20can: mcp251xfd: fix detection of mcp251863Marc Kleine-Budde
In commit c6f2a617a0a8 ("can: mcp251xfd: add support for mcp251863") support for the mcp251863 was added. However it was not taken into account that the auto detection of the chip model cannot distinguish between mcp2518fd and mcp251863 and would lead to a warning message if the firmware specifies a mcp251863. Fix auto detection: If a mcp2518fd compatible chip is found, keep the mcp251863 if specified by firmware, use mcp2518fd instead. Link: https://lore.kernel.org/all/20220706064835.1848864-1-mkl@pengutronix.de Fixes: c6f2a617a0a8 ("can: mcp251xfd: add support for mcp251863") Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2022-07-20drm/imx/dcss: Add missing of_node_put() in fail pathLiang He
In dcss_dev_create() and dcss_dev_destroy(), we should call of_node_put() in fail path or before the dcss's destroy as of_graph_get_port_by_id() has increased the refcount. Fixes: 9021c317b770 ("drm/imx: Add initial support for DCSS on iMX8MQ") Signed-off-by: Liang He <windhl@126.com> Reviewed-by: Laurentiu Palcu <laurentiu.palcu@oss.nxp.com> Signed-off-by: Laurentiu Palcu <laurentiu.palcu@oss.nxp.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220714081337.374761-1-windhl@126.com
2022-07-19drm/i915/guc: support v69 in parallel to v70Daniele Ceraolo Spurio
This patch re-introduces support for GuC v69 in parallel to v70. As this is a quick fix, v69 has been re-introduced as the single "fallback" guc version in case v70 is not available on disk and only for platforms that are out of force_probe and require the GuC by default. All v69 specific code has been labeled as such for easy identification, and the same was done for all v70 functions for which there is a separate v69 version, to avoid accidentally calling the wrong version via the unlabeled name. When the fallback mode kicks in, a drm_notice message is printed in dmesg to inform the user of the required update. The existing logging of the fetch function has also been updated so that we no longer complain immediately if we can't find a fw and we only throw an error if the fetch of both the base and fallback blobs fails. The plan is to follow this up with a more complex rework to allow for multiple different GuC versions to be supported at the same time. v2: reduce the fallback to platform that require it, switch to firmware_request_nowarn(), improve logs. Fixes: 2584b3549f4c ("drm/i915/guc: Update to GuC version 70.1.1") Link: https://lists.freedesktop.org/archives/intel-gfx/2022-July/301640.html Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: John Harrison <John.C.Harrison@Intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: Matt Roper <matthew.d.roper@intel.com> Cc: Dave Airlie <airlied@gmail.com> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> Acked-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220718230732.1409641-1-daniele.ceraolospurio@intel.com (cherry picked from commit 774ce1510e6ccb9c0752d4aa7a9ff3624b3db3f3) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2022-07-19drm/i915/guc: Support programming the EU priority in the GuC descriptorMatthew Brost
In GuC submission mode the EU priority must be updated by the GuC rather than the driver as the GuC owns the programming of the context descriptor. Given that the GuC code uses the GuC priorities, we can't use a generic function using i915 priorities for both execlists and GuC submission. The existing function has therefore been pushed to the execlists back-end while a new one has been added for GuC. v2: correctly use the GuC prio. Cc: John Harrison <john.c.harrison@intel.com> Cc: Matt Roper <matthew.d.roper@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Aravind Iddamsetty <aravind.iddamsetty@intel.com> Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220504234636.2119794-1-daniele.ceraolospurio@intel.com (cherry picked from commit a5c89f7c43c12c592a882a0ec2a15e9df0011e80) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2022-07-19Merge branch '40GbE' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue Tony Nguyen says: ==================== Intel Wired LAN Driver Updates 2022-07-18 This series contains updates to iavf driver only. Przemyslaw fixes handling of multiple VLAN requests to account for individual errors instead of rejecting them all. He removes incorrect implementations of ETHTOOL_COALESCE_MAX_FRAMES and ETHTOOL_COALESCE_MAX_FRAMES_IRQ. He also corrects an issue with NULL pointer caused by improper handling of dummy receive descriptors. Finally, he corrects debug prints reporting an unknown state. * '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue: iavf: Fix missing state logs iavf: Fix handling of dummy receive descriptors iavf: Disallow changing rx/tx-frames and rx/tx-frames-irq iavf: Fix VLAN_V2 addition/rejection ==================== Link: https://lore.kernel.org/r/20220718174807.4113582-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-19Documentation: fix udp_wmem_min in ip-sysctl.rstXin Long
UDP doesn't support tx memory accounting, and sysctl udp_wmem_min is not really used anywhere. So we should fix the description in ip-sysctl.rst accordingly. Fixes: 95766fff6b9a ("[UDP]: Add memory accounting.") Signed-off-by: Xin Long <lucien.xin@gmail.com> Link: https://lore.kernel.org/r/c880a963d9b1fb5f442ae3c9e4dfa70d45296a16.1658167019.git.lucien.xin@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-19net: ethernet: mtk_ppe: fix possible NULL pointer dereference in ↵Lorenzo Bianconi
mtk_flow_get_wdma_info odev pointer can be NULL in mtk_flow_offload_replace routine according to the flower action rules. Fix possible NULL pointer dereference in mtk_flow_get_wdma_info. Fixes: a333215e10cb5 ("net: ethernet: mtk_eth_soc: implement flow offloading to WED devices") Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Link: https://lore.kernel.org/r/4e1685bc4976e21e364055f6bee86261f8f9ee93.1658137753.git.lorenzo@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-19r8152: fix a WOL issueHayes Wang
This fixes that the platform is waked by an unexpected packet. The size and range of FIFO is different when the device enters S3 state, so it is necessary to correct some settings when suspending. Regardless of jumbo frame, set RMS to 1522 and MTPS to MTPS_DEFAULT. Besides, enable MCU_BORW_EN to update the method of calculating the pointer of data. Then, the hardware could get the correct data. Fixes: 195aae321c82 ("r8152: support new chips") Signed-off-by: Hayes Wang <hayeswang@realtek.com> Link: https://lore.kernel.org/r/20220718082120.10957-391-nic_swsd@realtek.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-19drm/panel-edp: Fix variable typo when saving hpd absent delay from DTNícolas F. R. A. Prado
The value read from the "hpd-absent-delay-ms" property in DT was being saved to the wrong variable, overriding the hpd_reliable delay. Fix the typo. Fixes: 5540cf8f3e8d ("drm/panel-edp: Implement generic "edp-panel"s probed by EDID") Signed-off-by: Nícolas F. R. A. Prado <nfraprado@collabora.com> Reviewed-by: André Almeida <andrealmeid@igalia.com> Reviewed-by: Douglas Anderson <dianders@chromium.org> Signed-off-by: Douglas Anderson <dianders@chromium.org> Link: https://patchwork.freedesktop.org/patch/msgid/20220719203857.1488831-4-nfraprado@collabora.com
2022-07-19virt: sev-guest: Pass the appropriate argument type to iounmap()Tom Lendacky
Fix a sparse warning in sev_guest_probe() where the wrong argument type is provided to iounmap(). Fixes: fce96cf04430 ("virt: Add SEV-SNP guest driver") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/202207150617.jqwQ0Rpz-lkp@intel.com
2022-07-19Merge branch 'md-fixes' of ↵Jens Axboe
https://git.kernel.org/pub/scm/linux/kernel/git/song/md into block-5.19 Pull MD fix from Song. * 'md-fixes' of https://git.kernel.org/pub/scm/linux/kernel/git/song/md: md/raid5: missing error code in setup_conf()
2022-07-19srcu: Make expedited RCU grace periods block even less frequentlyNeeraj Upadhyay
The purpose of commit 282d8998e997 ("srcu: Prevent expedited GPs and blocking readers from consuming CPU") was to prevent a long series of never-blocking expedited SRCU grace periods from blocking kernel-live-patching (KLP) progress. Although it was successful, it also resulted in excessive boot times on certain embedded workloads running under qemu with the "-bios QEMU_EFI.fd" command line. Here "excessive" means increasing the boot time up into the three-to-four minute range. This increase in boot time was due to the more than 6000 back-to-back invocations of synchronize_rcu_expedited() within the KVM host OS, which in turn resulted from qemu's emulation of a long series of MMIO accesses. Commit 640a7d37c3f4 ("srcu: Block less aggressively for expedited grace periods") did not significantly help this particular use case. Zhangfei Gao and Shameerali Kolothum Thodi did experiments varying the value of SRCU_MAX_NODELAY_PHASE with HZ=250 and with various values of non-sleeping per phase counts on a system with preemption enabled, and observed the following boot times: +──────────────────────────+────────────────+ | SRCU_MAX_NODELAY_PHASE | Boot time (s) | +──────────────────────────+────────────────+ | 100 | 30.053 | | 150 | 25.151 | | 200 | 20.704 | | 250 | 15.748 | | 500 | 11.401 | | 1000 | 11.443 | | 10000 | 11.258 | | 1000000 | 11.154 | +──────────────────────────+────────────────+ Analysis on the experiment results show additional improvements with CPU-bound delays approaching one jiffy in duration. This improvement was also seen when number of per-phase iterations were scaled to one jiffy. This commit therefore scales per-grace-period phase number of non-sleeping polls so that non-sleeping polls extend for about one jiffy. In addition, the delay-calculation call to srcu_get_delay() in srcu_gp_end() is replaced with a simple check for an expedited grace period. This change schedules callback invocation immediately after expedited grace periods complete, which results in greatly improved boot times. Testing done by Marc and Zhangfei confirms that this change recovers most of the performance degradation in boottime; for CONFIG_HZ_250 configuration, specifically, boot times improve from 3m50s to 41s on Marc's setup; and from 2m40s to ~9.7s on Zhangfei's setup. In addition to the changes to default per phase delays, this change adds 3 new kernel parameters - srcutree.srcu_max_nodelay, srcutree.srcu_max_nodelay_phase, and srcutree.srcu_retry_check_delay. This allows users to configure the srcu grace period scanning delays in order to more quickly react to additional use cases. Fixes: 640a7d37c3f4 ("srcu: Block less aggressively for expedited grace periods") Fixes: 282d8998e997 ("srcu: Prevent expedited GPs and blocking readers from consuming CPU") Reported-by: Zhangfei Gao <zhangfei.gao@linaro.org> Reported-by: yueluck <yueluck@163.com> Signed-off-by: Neeraj Upadhyay <quic_neeraju@quicinc.com> Tested-by: Marc Zyngier <maz@kernel.org> Tested-by: Zhangfei Gao <zhangfei.gao@linaro.org> Link: https://lore.kernel.org/all/20615615-0013-5adc-584f-2b1d5c03ebfc@linaro.org/ Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2022-07-19srcu: Block less aggressively for expedited grace periodsPaul E. McKenney
Commit 282d8998e997 ("srcu: Prevent expedited GPs and blocking readers from consuming CPU") fixed a problem where a long-running expedited SRCU grace period could block kernel live patching. It did so by giving up on expediting once a given SRCU expedited grace period grew too old. Unfortunately, this added excessive delays to boots of virtual embedded systems specifying "-bios QEMU_EFI.fd" to qemu. This commit therefore makes the transition away from expediting less aggressive, increasing the per-grace-period phase number of non-sleeping polls of readers from one to three and increasing the required grace-period age from one jiffy (actually from zero to one jiffies) to two jiffies (actually from one to two jiffies). Fixes: 282d8998e997 ("srcu: Prevent expedited GPs and blocking readers from consuming CPU") Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Reported-by: Zhangfei Gao <zhangfei.gao@linaro.org> Reported-by: chenxiang (M)" <chenxiang66@hisilicon.com> Cc: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Neeraj Upadhyay <quic_neeraju@quicinc.com> Link: https://lore.kernel.org/all/20615615-0013-5adc-584f-2b1d5c03ebfc@linaro.org/
2022-07-19KVM: x86: Protect the unused bits in MSR exiting flagsAaron Lewis
The flags for KVM_CAP_X86_USER_SPACE_MSR and KVM_X86_SET_MSR_FILTER have no protection for their unused bits. Without protection, future development for these features will be difficult. Add the protection needed to make it possible to extend these features in the future. Signed-off-by: Aaron Lewis <aaronlewis@google.com> Message-Id: <20220714161314.1715227-1-aaronlewis@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-07-19md/raid5: missing error code in setup_conf()Dan Carpenter
Return -ENOMEM if the allocation fails. Don't return success. Fixes: 8fbcba6b999b ("md/raid5: Cleanup setup_conf() error returns") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Logan Gunthorpe <logang@deltatee.com> Signed-off-by: Song Liu <song@kernel.org>
2022-07-19tools headers UAPI: Sync linux/kvm.h with the kernel sourcesPaolo Bonzini
Silence this perf build warning: Warning: Kernel ABI header at 'tools/include/uapi/linux/kvm.h' differs from latest version at 'include/uapi/linux/kvm.h' diff -u tools/include/uapi/linux/kvm.h include/uapi/linux/kvm.h Reported-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-07-19KVM: selftests: Fix target thread to be migrated in rseq_testGavin Shan
In rseq_test, there are two threads, which are vCPU thread and migration worker separately. Unfortunately, the test has the wrong PID passed to sched_setaffinity() in the migration worker. It forces migration on the migration worker because zeroed PID represents the calling thread, which is the migration worker itself. It means the vCPU thread is never enforced to migration and it can migrate at any time, which eventually leads to failure as the following logs show. host# uname -r 5.19.0-rc6-gavin+ host# # cat /proc/cpuinfo | grep processor | tail -n 1 processor : 223 host# pwd /home/gavin/sandbox/linux.main/tools/testing/selftests/kvm host# for i in `seq 1 100`; do \ echo "--------> $i"; ./rseq_test; done --------> 1 --------> 2 --------> 3 --------> 4 --------> 5 --------> 6 ==== Test Assertion Failure ==== rseq_test.c:265: rseq_cpu == cpu pid=3925 tid=3925 errno=4 - Interrupted system call 1 0x0000000000401963: main at rseq_test.c:265 (discriminator 2) 2 0x0000ffffb044affb: ?? ??:0 3 0x0000ffffb044b0c7: ?? ??:0 4 0x0000000000401a6f: _start at ??:? rseq CPU = 4, sched CPU = 27 Fix the issue by passing correct parameter, TID of the vCPU thread, to sched_setaffinity() in the migration worker. Fixes: 61e52f1630f5 ("KVM: selftests: Add a test for KVM_RUN+rseq to detect task migration bugs") Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Gavin Shan <gshan@redhat.com> Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Message-Id: <20220719020830.3479482-1-gshan@redhat.com> Reviewed-by: Andrew Jones <andrew.jones@linux.dev> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-07-19KVM: stats: Fix value for KVM_STATS_UNIT_MAX for boolean statsOliver Upton
commit 1b870fa5573e ("kvm: stats: tell userspace which values are boolean") added a new stat unit (boolean) but failed to raise KVM_STATS_UNIT_MAX. Fix by pointing UNIT_MAX at the new max value of UNIT_BOOLEAN. Fixes: 1b870fa5573e ("kvm: stats: tell userspace which values are boolean") Reported-by: Janis Schoetterl-Glausch <scgl@linux.ibm.com> Signed-off-by: Oliver Upton <oupton@google.com> Message-Id: <20220719125229.2934273-1-oupton@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-07-19Merge branch 'amt-fix-validation-and-synchronization-bugs'Paolo Abeni
Taehee Yoo says: ==================== amt: fix validation and synchronization bugs There are some synchronization issues in the amt module. Especially, an amt gateway doesn't well synchronize its own variables and status(amt->status). It tries to use a workqueue for handles in a single thread. A global lock is also good, but it would occur complex locking complex. In this patchset, only the gateway uses workqueue. The reason why only gateway interface uses workqueue is that gateway should manage its own states and variables a little bit statefully. But relay doesn't need to manage tunnels statefully, stateless is okay. So, relay side message handlers are okay to be called concurrently. But it doesn't mean that no lock is needed. Only amt multicast data message type will not be processed by the work queue because It contains actual multicast data. So, it should be processed immediately. When any amt gateway events are triggered(sending discovery message by delayed_work, sending request message by delayed_work and receiving messages), it stores event and skb into the event queue(amt->events[16]). Then, workqueue processes these events one by one. The first patch is to use the work queue. The second patch is to remove unnecessary lock due to a previous patch. The third patch is to use READ_ONCE() in the amt module. Even if the amt module uses a single thread, some variables (ready4, ready6, amt->status) can be accessed concurrently. The fourth patch is to add missing nonce generation logic when it sends a new request message. The fifth patch is to drop unexpected advertisement messages. advertisement message should be received only after the gateway sends a discovery message first. So, the gateway should drop advertisement messages if it has never sent a discovery message and it also should drop duplicate advertisement messages. Using nonce is good to distinguish whether a received message is an expected message or not. The sixth patch is to drop unexpected query messages. This is the same behavior as the fourth patch. Query messages should be received only after the gateway sends a request message first. The nonce variable is used to distinguish whether it is a reply to a previous request message or not. amt->ready4 and amt->ready6 are used to distinguish duplicate messages. The seventh patch is to drop unexpected multicast data. AMT gateway should not receive multicast data message type before establish between gateway and relay. In order to drop unexpected multicast data messages, it checks amt->status. The last patch is to fix a locking problem on the relay side. amt->nr_tunnels variable is protected by amt->lock. But amt_request_handler() doesn't protect this variable. v2: - Use local_bh_disable() instead of rcu_read_lock_bh() in amt_membership_query_handler. - Fix using uninitialized variables. - Fix unexpectedly start the event_wq after stopping. - Fix possible deadlock in amt_event_work(). - Add a limit variable in amt_event_work() to prevent infinite working. - Rename amt_queue_events() to amt_queue_event(). ==================== Link: https://lore.kernel.org/r/20220717160910.19156-1-ap420073@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-07-19amt: do not use amt->nr_tunnels outside of lockTaehee Yoo
amt->nr_tunnels is protected by amt->lock. But, amt_request_handler() has been using this variable without the amt->lock. So, it expands context of amt->lock in the amt_request_handler() to protect amt->nr_tunnels variable. Fixes: cbc21dc1cfe9 ("amt: add data plane of amt interface") Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-07-19amt: drop unexpected multicast dataTaehee Yoo
AMT gateway interface should not receive unexpected multicast data. Multicast data message type should be received after sending an update message, which means all establishment between gateway and relay is finished. So, amt_multicast_data_handler() checks amt->status. Fixes: cbc21dc1cfe9 ("amt: add data plane of amt interface") Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-07-19amt: drop unexpected query messageTaehee Yoo
AMT gateway interface should not receive unexpected query messages. In order to drop unexpected query messages, it checks nonce. And it also checks ready4 and ready6 variables to drop duplicated messages. Fixes: cbc21dc1cfe9 ("amt: add data plane of amt interface") Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-07-19amt: drop unexpected advertisement messageTaehee Yoo
AMT gateway interface should not receive unexpected advertisement messages. In order to drop these packets, it should check nonce and amt->status. Fixes: cbc21dc1cfe9 ("amt: add data plane of amt interface") Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-07-19amt: add missing regeneration nonce logic in request logicTaehee Yoo
When AMT gateway starts sending a new request message, it should regenerate the nonce variable. Fixes: cbc21dc1cfe9 ("amt: add data plane of amt interface") Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-07-19amt: use READ_ONCE() in amt moduleTaehee Yoo
There are some data races in the amt module. amt->ready4, amt->ready6, and amt->status can be accessed concurrently without locks. So, it uses READ_ONCE() and WRITE_ONCE(). Fixes: cbc21dc1cfe9 ("amt: add data plane of amt interface") Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-07-19amt: remove unnecessary locksTaehee Yoo
By the previous patch, amt gateway handlers are changed to worked by a single thread. So, most locks for gateway are not needed. So, it removes. Fixes: cbc21dc1cfe9 ("amt: add data plane of amt interface") Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>