All supported compilers today (gcc v5.1+ and clang v11+) have support for
-mcmodel=medium. As such, NO_MINIMAL_TOC is no longer being set. Remove
NO_MINIMAL_TOC as well as the fallback to -mminimal-toc.
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Naveen N Rao <naveen@kernel.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240110141237.3179199-1-naveen@kernel.org
|
|
This code was commented out about 19 years ago.
If there are no plans to enable it in the future,
we can remove this dead code.
Signed-off-by: Kunwu Chan <chentao@kylinos.cn>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240126025030.577795-1-chentao@kylinos.cn
|
|
This code was commented out by commit a33a7d7309d7
("[PATCH] spufs: implement mfc access for PPE-side DMA")
about 18 years ago.
If there are no plans to enable it in the future,
we can remove this dead code.
Signed-off-by: Kunwu Chan <chentao@kylinos.cn>
Suggested-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240126021258.574916-1-chentao@kylinos.cn
|
|
This code was commented out by commit 165785e5c0be ("[POWERPC] Cell
iommu support") about 17 years ago.
If there are no plans to enable it in the future,
we can remove this dead code.
Signed-off-by: Kunwu Chan <chentao@kylinos.cn>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240125082637.532826-1-chentao@kylinos.cn
|
|
The first 32k of memory is reserved for interrupt vectors, however for
powerpc64 this might not be enough. Fix this by reserving the maximum
size between 32k and the real size of interrupt vectors.
Signed-off-by: GUO Zihua <guozihua@huawei.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240113080509.1598290-1-guozihua@huawei.com
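
The sizing rule described in this commit can be sketched in a few lines of C. This is only an illustration of "take the larger of 32k and the real vector size"; the names MIN_VECTOR_RESERVE and vector_reserve_size() are hypothetical and do not come from the patch.

```c
/* Hypothetical constant: the 32 KiB that was previously always reserved. */
#define MIN_VECTOR_RESERVE (32UL * 1024)

/*
 * Return how many bytes to reserve at the start of memory: the larger of
 * the 32 KiB floor and the actual size of the interrupt vectors.
 */
static unsigned long vector_reserve_size(unsigned long real_vector_size)
{
    if (real_vector_size > MIN_VECTOR_RESERVE)
        return real_vector_size;
    return MIN_VECTOR_RESERVE;
}
```

With this rule a 4 KiB vector area still reserves 32 KiB, while a 64 KiB vector area reserves the full 64 KiB.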
|
|
Update the node name to align with the binding document.
Signed-off-by: Li Yang <leoyang.li@nxp.com>
Signed-off-by: Frank Li <Frank.Li@nxp.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240119203911.3143928-4-Frank.Li@nxp.com
|
|
Update dts to match dts binding document.
Signed-off-by: Li Yang <leoyang.li@nxp.com>
Signed-off-by: Frank Li <Frank.Li@nxp.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240119203911.3143928-3-Frank.Li@nxp.com
|
|
INTA is shared with the active-low PHY2 interrupt on the P1010RDB-PA
board, so configure P1010RDB-PA's INTA polarity as active-low. On the
P1010RDB-PB board INTA is used separately, so configure P1010RDB-PB's
INTA polarity as active-high. INTx on P1010RDB-PB does not work because
the pcie@0 node fixup is overwritten by the p1010si-post.dtsi file, so
move the pcie@0 node fixup to p1010rdb-pb.dts and p1010rdb-pb_36b.dts.
Signed-off-by: Xiaowei Bao <xiaowei.bao@nxp.com>
Signed-off-by: Li Yang <leoyang.li@nxp.com>
Signed-off-by: Frank Li <Frank.Li@nxp.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240119203911.3143928-2-Frank.Li@nxp.com
|
|
Enable the power management feature in the device trees, including MPC8536,
MPC8544, MPC8548, MPC8572, P1010, P1020, P1021, P1022, P2020, P2041,
P3041, T104X, T1024.
Signed-off-by: Zhao Chenhui <chenhui.zhao@freescale.com>
Signed-off-by: Ran Wang <ran.wang_1@nxp.com>
Signed-off-by: Frank Li <Frank.Li@nxp.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240119203911.3143928-1-Frank.Li@nxp.com
|
|
This commit adds kernel-doc style comments with complete parameter
descriptions for the function smp_startup_cpu().
Signed-off-by: Yang Li <yang.lee@linux.alibaba.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240408053109.96360-2-yang.lee@linux.alibaba.com
|
|
Fix some function names in kernel-doc comments.
Signed-off-by: Yang Li <yang.lee@linux.alibaba.com>
Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240408053109.96360-1-yang.lee@linux.alibaba.com
|
|
Fix the kernel-doc annotation for the 'skip' parameter in the
partial_decompress() function by adding a missing underscore and colon.
Signed-off-by: Yang Li <yang.lee@linux.alibaba.com>
Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240408083916.123369-1-yang.lee@linux.alibaba.com
|
|
Commit ab1a517d55b0 ("powerpc/syscall: Rename syscall_64.c into
interrupt.c") failed to update these three lines:
GCOV_PROFILE_syscall_64.o := n
KCOV_INSTRUMENT_syscall_64.o := n
UBSAN_SANITIZE_syscall_64.o := n
To restore the original behavior, we could replace them with:
GCOV_PROFILE_interrupt.o := n
KCOV_INSTRUMENT_interrupt.o := n
UBSAN_SANITIZE_interrupt.o := n
However, nobody has noticed the functional change in the past three
years, so they were unneeded.
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240216135517.2002749-1-masahiroy@kernel.org
|
|
With recent additions to BPF such as the cpu v4 instructions, the
test_bpf module exhibits the following failures:
test_bpf: #82 ALU_MOVSX | BPF_B jited:1 ret 2 != 1 (0x2 != 0x1)FAIL (1 times)
test_bpf: #83 ALU_MOVSX | BPF_H jited:1 ret 2 != 1 (0x2 != 0x1)FAIL (1 times)
test_bpf: #84 ALU64_MOVSX | BPF_B jited:1 ret 2 != 1 (0x2 != 0x1)FAIL (1 times)
test_bpf: #85 ALU64_MOVSX | BPF_H jited:1 ret 2 != 1 (0x2 != 0x1)FAIL (1 times)
test_bpf: #86 ALU64_MOVSX | BPF_W jited:1 ret 2 != 1 (0x2 != 0x1)FAIL (1 times)
test_bpf: #165 ALU_SDIV_X: -6 / 2 = -3 jited:1 ret 2147483645 != -3 (0x7ffffffd != 0xfffffffd)FAIL (1 times)
test_bpf: #166 ALU_SDIV_K: -6 / 2 = -3 jited:1 ret 2147483645 != -3 (0x7ffffffd != 0xfffffffd)FAIL (1 times)
test_bpf: #169 ALU_SMOD_X: -7 % 2 = -1 jited:1 ret 1 != -1 (0x1 != 0xffffffff)FAIL (1 times)
test_bpf: #170 ALU_SMOD_K: -7 % 2 = -1 jited:1 ret 1 != -1 (0x1 != 0xffffffff)FAIL (1 times)
test_bpf: #172 ALU64_SMOD_K: -7 % 2 = -1 jited:1 ret 1 != -1 (0x1 != 0xffffffff)FAIL (1 times)
test_bpf: #313 BSWAP 16: 0x0123456789abcdef -> 0xefcd
eBPF filter opcode 00d7 (@2) unsupported
jited:0 301 PASS
test_bpf: #314 BSWAP 32: 0x0123456789abcdef -> 0xefcdab89
eBPF filter opcode 00d7 (@2) unsupported
jited:0 555 PASS
test_bpf: #315 BSWAP 64: 0x0123456789abcdef -> 0x67452301
eBPF filter opcode 00d7 (@2) unsupported
jited:0 268 PASS
test_bpf: #316 BSWAP 64: 0x0123456789abcdef >> 32 -> 0xefcdab89
eBPF filter opcode 00d7 (@2) unsupported
jited:0 269 PASS
test_bpf: #317 BSWAP 16: 0xfedcba9876543210 -> 0x1032
eBPF filter opcode 00d7 (@2) unsupported
jited:0 460 PASS
test_bpf: #318 BSWAP 32: 0xfedcba9876543210 -> 0x10325476
eBPF filter opcode 00d7 (@2) unsupported
jited:0 320 PASS
test_bpf: #319 BSWAP 64: 0xfedcba9876543210 -> 0x98badcfe
eBPF filter opcode 00d7 (@2) unsupported
jited:0 222 PASS
test_bpf: #320 BSWAP 64: 0xfedcba9876543210 >> 32 -> 0x10325476
eBPF filter opcode 00d7 (@2) unsupported
jited:0 273 PASS
test_bpf: #344 BPF_LDX_MEMSX | BPF_B
eBPF filter opcode 0091 (@5) unsupported
jited:0 432 PASS
test_bpf: #345 BPF_LDX_MEMSX | BPF_H
eBPF filter opcode 0089 (@5) unsupported
jited:0 381 PASS
test_bpf: #346 BPF_LDX_MEMSX | BPF_W
eBPF filter opcode 0081 (@5) unsupported
jited:0 505 PASS
test_bpf: #490 JMP32_JA: Unconditional jump: if (true) return 1
eBPF filter opcode 0006 (@1) unsupported
jited:0 261 PASS
test_bpf: Summary: 1040 PASSED, 10 FAILED, [924/1038 JIT'ed]
Fix them by adding missing processing.
Fixes: daabb2b098e0 ("bpf/tests: add tests for cpuv4 instructions")
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/91de862dda99d170697eb79ffb478678af7e0b27.1709652689.git.christophe.leroy@csgroup.eu
|
|
There is code that builds with calls to IO accessors even when
CONFIG_PCI=n, but the actual calls are guarded by runtime checks.
If they were not, those calls would fault, because the page at virtual
address zero is (usually) not mapped into the kernel. As Arnd pointed
out, it is possible a large port value could cause the address to be
above mmap_min_addr which would then access userspace, which would be
a bug.
To avoid any such issues, set _IO_BASE to POISON_POINTER_DELTA. That
is a value chosen to point into unmapped space between the kernel and
userspace, so any access will always fault.
Note that on 32-bit POISON_POINTER_DELTA is 0, so the patch only has an
effect on 64-bit.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240503075619.394467-2-mpe@ellerman.id.au
|
|
With -Wextra clang warns about pointer arithmetic using a null pointer.
When building with CONFIG_PCI=n, that triggers a warning in the IO
accessors, eg:
In file included from linux/arch/powerpc/include/asm/io.h:672:
linux/arch/powerpc/include/asm/io-defs.h:23:1: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
23 | DEF_PCI_AC_RET(inb, u8, (unsigned long port), (port), pio, port)
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
...
linux/arch/powerpc/include/asm/io.h:591:53: note: expanded from macro '__do_inb'
591 | #define __do_inb(port) readb((PCI_IO_ADDR)_IO_BASE + port);
| ~~~~~~~~~~~~~~~~~~~~~ ^
That is because when CONFIG_PCI=n, _IO_BASE is defined as 0.
Although _IO_BASE is defined as plain 0, the cast (PCI_IO_ADDR) converts
it to void * before the addition with port happens.
Instead the addition can be done first, and then the cast. The resulting
value will be the same, but avoids the warning, and also avoids void
pointer arithmetic which is apparently non-standard.
Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org>
Closes: https://lore.kernel.org/all/CA+G9fYtEh8zmq8k8wE-8RZwW-Qr927RLTn+KqGnq1F=ptaaNsA@mail.gmail.com
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240503075619.394467-1-mpe@ellerman.id.au
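
The reordering described in this commit (add first, cast afterwards) can be illustrated with a minimal userspace sketch. IO_BASE_SKETCH and port_to_addr() are hypothetical stand-ins for the kernel's _IO_BASE and accessor macros, chosen only to show where the cast moves.

```c
#include <stdint.h>

/* Hypothetical stand-in for _IO_BASE when CONFIG_PCI=n (plain integer 0). */
#define IO_BASE_SKETCH 0UL

/*
 * Do the addition in integer arithmetic, then cast the sum to a pointer
 * once. No arithmetic is ever performed on a null or void pointer, so the
 * warning described above has nothing to trigger on.
 */
static volatile uint8_t *port_to_addr(unsigned long port)
{
    return (volatile uint8_t *)(IO_BASE_SKETCH + port);
}
```

The resulting address is the same as with the cast-then-add form; only the order of the cast and the addition differs.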
|
|
Currently, bpf jit code on powerpc assumes all the bpf functions and
helpers to be part of core kernel text. This is false for kfunc case,
as function addresses may not be part of core kernel text area. So,
add support for addresses that are not within core kernel text area
too, to enable kfunc support. Emit instructions based on whether the
function address is within core kernel text address or not, to retain
optimized instruction sequence where possible.
In case of PCREL, as a bpf function that is not within core kernel
text area is likely to go out of range with relative addressing on
kernel base, use PC relative addressing. If that goes out of range,
load the full address with PPC_LI64().
With addresses that are not within core kernel text area supported,
override bpf_jit_supports_kfunc_call() to enable kfunc support. Also,
override bpf_jit_supports_far_kfunc_call() to enable 64-bit pointers,
as an address offset can be more than 32-bit long on PPC64.
Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240502173205.142794-2-hbathini@linux.ibm.com
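
The selection logic described in this commit can be sketched as a three-way choice. This is a simplified illustration, not the JIT's code: the enum, pick_call_seq(), and the text-range parameters are hypothetical, and the 34-bit signed displacement is assumed from the immediate field width of Power ISA v3.1 prefixed instructions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical labels for the three instruction sequences described above. */
enum call_seq { SEQ_KERNEL_BASE_REL, SEQ_PC_REL, SEQ_LOAD_IMM64 };

/* True if v fits in a signed immediate field of the given width. */
static bool fits_signed(int64_t v, unsigned int bits)
{
    int64_t lim = (int64_t)1 << (bits - 1);

    return v >= -lim && v < lim;
}

/*
 * Pick a sequence for calling func from code located at pc:
 *  - inside core kernel text: keep the optimized kernel-base-relative form
 *  - otherwise, within PC-relative reach: use PC-relative addressing
 *  - otherwise: load the full 64-bit address (what PPC_LI64() is used for)
 */
static enum call_seq pick_call_seq(uint64_t func, uint64_t pc,
                                   uint64_t text_start, uint64_t text_end)
{
    if (func >= text_start && func < text_end)
        return SEQ_KERNEL_BASE_REL;
    if (fits_signed((int64_t)(func - pc), 34))
        return SEQ_PC_REL;
    return SEQ_LOAD_IMM64;
}
```

Ordering the checks this way keeps the shortest sequence for the common case (helpers in core kernel text) and falls back to the full load only when neither relative form can reach the target.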
|
|
With PCREL addressing, there is no kernel TOC, so it is not set up in
the prologue when PCREL addressing is used. But the number of instructions
to skip on a tail call was not adjusted accordingly. That resulted in
non-obvious failures while using tail calls. The 'tailcalls' selftest
crashed the system with the call trace below:
bpf_test_run+0xe8/0x3cc (unreliable)
bpf_prog_test_run_skb+0x348/0x778
__sys_bpf+0xb04/0x2b00
sys_bpf+0x28/0x38
system_call_exception+0x168/0x340
system_call_vectored_common+0x15c/0x2ec
Also, as bpf programs are always module addresses and a bpf helper in
general is a core kernel text address, using PC relative addressing
often fails with "out of range of pcrel address" error. Switch to
using kernel base for relative addressing to handle this better.
Fixes: 7e3a68be42e1 ("powerpc/64: vmlinux support building with PCREL addresing")
Cc: stable@vger.kernel.org # v6.4+
Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240502173205.142794-1-hbathini@linux.ibm.com
|
|
This list was moved many years ago.
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240503121012.3ba5000b@canb.auug.org.au
|
|
Documents how to use the PR_PPC_GET_DEXCR and PR_PPC_SET_DEXCR prctl()'s
for changing a process's DEXCR or its process tree default value.
Signed-off-by: Benjamin Gray <bgray@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240417112325.728010-10-bgray@linux.ibm.com
|
|
Adds a utility to exercise the prctl DEXCR inheritance in the shell.
Supports setting and clearing each aspect.
Signed-off-by: Benjamin Gray <bgray@linux.ibm.com>
[mpe: Use correct SPDX license, use execvp() for usability, print errors]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240417112325.728010-9-bgray@linux.ibm.com
|
|
Now that the DEXCR can be configured with prctl, add a section in
lsdexcr that explains why each aspect is set the way it is.
Signed-off-by: Benjamin Gray <bgray@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240417112325.728010-8-bgray@linux.ibm.com
|
|
Now that a process can control its DEXCR to some extent, make the
hashchk tests more reliable by explicitly setting the local and onexec
NPHIE aspect.
Signed-off-by: Benjamin Gray <bgray@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240417112325.728010-7-bgray@linux.ibm.com
|
|
Some basic tests of the prctl interface of the DEXCR.
Signed-off-by: Benjamin Gray <bgray@linux.ibm.com>
[mpe: Add missing SPDX tag]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240417112325.728010-6-bgray@linux.ibm.com
|
|
Now that we track a DEXCR on a per-task basis, individual tasks are free
to configure it as they like.
The interface is a pair of getter/setter prctl's that work on a single
aspect at a time (multiple aspects at once is more difficult if there
are different rules applied for each aspect, now or in future). The
getter shows the current state of the process config, and the setter
allows setting/clearing the aspect.
Signed-off-by: Benjamin Gray <bgray@linux.ibm.com>
[mpe: Account for PR_RISCV_SET_ICACHE_FLUSH_CTX, shrink some long lines]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240417112325.728010-5-bgray@linux.ibm.com
|
|
Inheriting the DEXCR across exec can have security and usability
concerns. If a program is compiled with hash instructions it generally
expects to run with NPHIE enabled. But if the parent process disables
NPHIE then if it's not careful it will be disabled for any children too
and the protection offered by hash checks is basically worthless.
This patch introduces a per-process reset value that new execs in a
particular process tree are initialized with. This enables fine grained
control over what DEXCR value child processes run with by default.
For example, containers running legacy binaries that expect hash
instructions to act as NOPs could configure the reset value of the
container root to control the default reset value for all members of
the container.
Signed-off-by: Benjamin Gray <bgray@linux.ibm.com>
[mpe: Add missing SPDX tag on dexcr.c]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240417112325.728010-4-bgray@linux.ibm.com
|
|
Add capability to make the DEXCR act as a per-process SPR.
We do not yet have an interface for changing the values per task. We
also expect the kernel to use a single DEXCR value across all tasks
while in privileged state, so there is no need to synchronize after
changing it (the userspace aspects will synchronize upon returning to
userspace).
Signed-off-by: Benjamin Gray <bgray@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240417112325.728010-3-bgray@linux.ibm.com
|
|
The hashchk tests want to verify that the hash key is changed over exec.
It does so by calculating hashes at the same address across an exec.
This is made simpler by disabling PIE functionality, so we can
re-execute ourselves and be using the same addresses in the child.
While -fno-pie is already added, -no-pie is also required.
Fixes: bdb07f35a52f ("selftests/powerpc/dexcr: Add hashst/hashchk test")
Signed-off-by: Benjamin Gray <bgray@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240417112325.728010-2-bgray@linux.ibm.com
|
|
The last function to reference module_bug_list went in 2008's
commit b9754568ef17 ("powerpc: Remove dead module_find_bug code")
but I don't think that was called since 2006's
commit 73c9ceab40b1 ("[POWERPC] Generic BUG for powerpc")
Now that the list has gone, I think we can also clean up the bug
entries in mod_arch_specific.
Lightly boot tested.
Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240503002317.183500-1-linux@treblig.org
|
|
Aneesh's IBM address no longer works, switch to his preferred kernel.org
address.
Acked-by: Aneesh Kumar K.V (Arm) <aneesh.kumar@kernel.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240430044327.49363-1-mpe@ellerman.id.au
|
|
Aneesh is stepping down from powerpc maintenance.
Acked-by: Aneesh Kumar K.V (Arm) <aneesh.kumar@kernel.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240430044228.49015-1-mpe@ellerman.id.au
|
|
The `memory_limit` variable should only be used during boot, enforce
that by marking it initdata.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240422115231.1769984-1-mpe@ellerman.id.au
|
|
Use the new cleanup helpers to replace of_node_put() calls with
__free(device_node) markings, so that nodes are released automatically
when they go out of scope.
Suggested-by: Julia Lawall <julia.lawall@inria.fr>
Signed-off-by: sundar <prosunofficial@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240424150718.5006-1-prosunofficial@gmail.com
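
The scope-based release pattern behind __free(device_node) relies on the compiler's cleanup attribute (GCC and clang). The following userspace sketch shows the mechanism only; struct node, node_put(), and the auto_put_node macro are hypothetical stand-ins for the kernel's device_node, of_node_put(), and __free(device_node).

```c
/* Hypothetical reference-counted object standing in for struct device_node. */
struct node {
    int refcount;
};

static int released; /* counts how many times the cleanup ran */

/* Stand-in for of_node_put(): drop one reference. */
static void node_put(struct node *n)
{
    if (n) {
        n->refcount--;
        released++;
    }
}

/* The cleanup attribute passes a pointer to the annotated variable. */
static void node_put_cleanup(struct node **p)
{
    node_put(*p);
}

#define auto_put_node __attribute__((cleanup(node_put_cleanup)))

static int use_node(struct node *src)
{
    struct node *n auto_put_node = src;

    n->refcount++;          /* simulate taking a reference */
    if (n->refcount > 100)
        return -1;          /* early return: reference is still dropped */
    return 0;               /* normal return: reference is dropped too */
}
```

Every exit path from use_node() runs node_put() exactly once, which is the property that lets explicit of_node_put() calls on each error path be removed.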
|
|
The sources for the powerpc selftests are arranged into sub-directories.
However when the tests are built and installed, the sub-directories are
squashed, losing the structure.
For example, with the current code the result of installing the selftests is:
$ tree tools/testing/selftests/kselftest_install
tools/testing/selftests/kselftest_install
├── kselftest
│ ├── ktap_helpers.sh
│ ├── module.sh
│ ├── prefix.pl
│ └── runner.sh
├── kselftest-list.txt
├── powerpc
│ ├── alignment_handler
│ ├── attr_test
│ ├── back_to_back_ebbs_test
│ ├── bad_accesses
│ ├── bhrb_filter_map_test
│ ├── bhrb_no_crash_wo_pmu_test
│ ├── blacklisted_events_test
│ ├── cache_shape
│ ├── close_clears_pmcc_test
│ ├── context_switch
│ ├── copy_first_unaligned
...
│ ├── settings
...
│ └── wild_bctr
└── run_kselftest.sh
All the powerpc tests are squashed into the single powerpc directory. In
particular, note that there is a single `settings` file, even though
there are multiple settings files in the powerpc selftest sources. One
of the settings files ends up installed, depending on install order,
even if they have different contents.
Similarly if there were two tests with the same name in different
sub-directories they would clobber each other.
Fix it by replicating the directory structure of the source tree into
the install directory. The result being for example:
$ tree tools/testing/selftests/kselftest_install
tools/testing/selftests/kselftest_install
├── kselftest
│ ├── ktap_helpers.sh
│ ├── module.sh
│ ├── prefix.pl
│ └── runner.sh
├── kselftest-list.txt
├── powerpc
│ ├── alignment
│ │ ├── alignment_handler
│ │ └── copy_first_unaligned
│ ├── benchmarks
│ │ ├── context_switch
│ │ ├── exec_target
│ │ ├── fork
│ │ ├── futex_bench
│ │ ├── gettimeofday
│ │ ├── mmap_bench
│ │ ├── null_syscall
│ │ └── settings
...
│ ├── eeh
│ │ ├── eeh-basic.sh
│ │ ├── eeh-functions.sh
│ │ └── settings
...
│ └── vphn
│ └── test-vphn
└── run_kselftest.sh
Note multiple settings files in different sub-directories.
This change also has the effect of changing the names of the tests from
the point of view of the kselftest runner. Before the tests are named
eg:
powerpc:copy_first_unaligned
powerpc:cache_shape
powerpc:reg_access_test
After, the test collection names include the sub-directory:
powerpc/alignment:copy_first_unaligned
powerpc/cache_shape:cache_shape
powerpc/pmu/ebb:reg_access_test
That means whereas previously all powerpc tests could be run with:
$ ./run_kselftest.sh -c powerpc
After the change it's necessary to pass a regex that matches all powerpc
entries, eg:
$ ./run_kselftest.sh -c "powerpc.*"
The latter form also works before and after the change.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240422133453.1793988-2-mpe@ellerman.id.au
|
|
The pmu Makefile has grown more sub directories over the years. Rather
than open coding the rules for each subdir, use for loops.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240422133453.1793988-1-mpe@ellerman.id.au
|
|
The build breaks when executing make with run_tests for sub-folders
under powerpc. This is because the CFLAGS and GIT_VERSION macros are
defined in the Makefile of the top-level powerpc folder.
make: Entering directory '/home/maddy/linux/tools/testing/selftests/powerpc/mm'
gcc hugetlb_vs_thp_test.c ../harness.c ../utils.c -o /home/maddy/selftest_output//hugetlb_vs_thp_test
hugetlb_vs_thp_test.c:6:10: fatal error: utils.h: No such file or directory
6 | #include "utils.h"
| ^~~~~~~~~
compilation terminated.
Fix this by adding flags.mk to each sub-folder Makefile. Also remove
the CFLAGS and GIT_VERSION macros from the powerpc/ folder Makefile since
they are defined in flags.mk.
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240229093711.581230-3-maddy@linux.ibm.com
|
|
When running `make -C powerpc/pmu run_tests` from top level selftests
directory, currently this error is being reported:
make: Entering directory '/home/maddy/linux/tools/testing/selftests/powerpc/pmu'
Makefile:40: warning: overriding recipe for target 'emit_tests'
../../lib.mk:111: warning: ignoring old recipe for target 'emit_tests'
gcc -m64 count_instructions.c ../harness.c event.c lib.c ../utils.c loop.S -o /home/maddy/selftest_output//count_instructions
In file included from count_instructions.c:13:
event.h:12:10: fatal error: utils.h: No such file or directory
12 | #include "utils.h"
| ^~~~~~~~~
compilation terminated.
This is due to a missing include path in CFLAGS. That is, the CFLAGS and
GIT_VERSION macros are defined in the powerpc/ folder Makefile, which
is not involved in this case.
To address the failure when a specific sub-folder test is executed
directly, this patch adds a new rule file called "flags.mk"
under the selftest/powerpc/ folder and links it to all the Makefiles of
the powerpc/pmu sub-folders.
Reported-by: Sachin Sant <sachinp@linux.ibm.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Tested-by: Sachin Sant <sachinp@linux.ibm.com>
[mpe: Fixup ifeq, make GIT_VERSION simply expanded to avoid re-executing git describe]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240229093711.581230-2-maddy@linux.ibm.com
|
|
In some powerpc/ sub-folder Makefiles, CFLAGS are defined before the lib.mk
include. Clean it up by re-ordering the flags to follow the mk
include. This is needed so that sub-folders in powerpc/ can be built on
their own.
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240229093711.581230-1-maddy@linux.ibm.com
|
|
We noticed the following nuisance messages during the boot process:
vio vio: uevent: failed to send synthetic uevent
vio 4000: uevent: failed to send synthetic uevent
vio 4001: uevent: failed to send synthetic uevent
vio 4002: uevent: failed to send synthetic uevent
vio 4004: uevent: failed to send synthetic uevent
They are caused either by vio_register_device_node() failing to set
dev->of_node or by the node missing a "compatible" property. To match
the definition of modalias in modalias_show(), remove the return of
ENODEV in such cases. The failure messages are also suppressed with this
change.
Signed-off-by: Lidong Zhong <lidong.zhong@suse.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240411020450.12725-1-lidong.zhong@suse.com
|
|
plpar_hcall(), plpar_hcall9(), and related functions expect callers to
provide valid result buffers of certain minimum size. Currently this
is communicated only through comments in the code and the compiler has
no idea.
For example, if I write a bug like this:
long retbuf[PLPAR_HCALL_BUFSIZE]; // should be PLPAR_HCALL9_BUFSIZE
plpar_hcall9(H_ALLOCATE_VAS_WINDOW, retbuf, ...);
This compiles with no diagnostics emitted, but likely results in stack
corruption at runtime when plpar_hcall9() stores results past the end
of the array. (To be clear this is a contrived example and I have not
found a real instance yet.)
To make this class of error less likely, we can use explicitly-sized
array parameters instead of pointers in the declarations for the hcall
APIs. When compiled with -Warray-bounds[1], the code above now
provokes a diagnostic like this:
error: array argument is too small;
is of size 32, callee requires at least 72 [-Werror,-Warray-bounds]
60 | plpar_hcall9(H_ALLOCATE_VAS_WINDOW, retbuf,
| ^ ~~~~~~
[1] Enabled for LLVM builds but not GCC for now. See commit
0da6e5fd6c37 ("gcc: disable '-Warray-bounds' for gcc-13 too") and
related changes.
Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240408-pseries-hvcall-retbuf-v1-1-ebc73d7253cf@linux.ibm.com
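
A minimum buffer size can be expressed in a parameter declaration with C99's `static` array-size syntax, as in the sketch below. Whether the patch uses exactly this spelling is an assumption; the names SMALL_BUFSIZE, LARGE_BUFSIZE, and fill_results9() are hypothetical, with sizes chosen to match the 32-byte and 72-byte figures in the diagnostic above on a 64-bit target.

```c
#include <stddef.h>

#define SMALL_BUFSIZE 4 /* hypothetical, plays the role of PLPAR_HCALL_BUFSIZE */
#define LARGE_BUFSIZE 9 /* hypothetical, plays the role of PLPAR_HCALL9_BUFSIZE */

/*
 * The declaration states that callers must pass at least LARGE_BUFSIZE
 * elements. A compiler that checks this can diagnose a call site that
 * passes an array declared with only SMALL_BUFSIZE elements.
 */
static long fill_results9(unsigned long retbuf[static LARGE_BUFSIZE])
{
    for (size_t i = 0; i < LARGE_BUFSIZE; i++)
        retbuf[i] = i;
    return 0;
}
```

A caller that declares `unsigned long buf[LARGE_BUFSIZE]` is accepted silently, while one that declares `unsigned long buf[SMALL_BUFSIZE]` is the class of bug the commit aims to surface at compile time.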
|
|
Erhard reported that kmemleak was showing a warning at boot:
kmemleak: Not scanning unknown object at 0xc00000007f000000
CPU: 0 PID: 0 Comm: swapper Not tainted 5.19.0-rc3-PMacG5+ #2
Call Trace:
.dump_stack_lvl+0x7c/0xc4 (unreliable)
.kmemleak_no_scan+0xe0/0x100
.iommu_init_early_dart+0x2f0/0x924
.pmac_probe+0x1b0/0x20c
.setup_arch+0x1b8/0x674
.start_kernel+0xdc/0xb74
start_here_common+0x1c/0x44
DART table allocated at: (____ptrval____)
Which he bisected to a change in kmemleak, commit
23c2d497de21 ("mm: kmemleak: take a full lowmem check in kmemleak_*_phys()").
Because pmac_probe() is called before mem_topology_setup(), the min/
max PFN variables are still zero. That causes kmemleak_alloc_phys() to
ignore the allocation, because the checks against the PFN fail. Then
kmemleak_no_scan() can't find the allocation and prints a warning.
Given that kmemleak_alloc_phys() is ignoring the allocation to begin
with, there's no need to call kmemleak_no_scan() at all, which avoids
the warning.
Reported-by: Erhard Furtner <erhard_f@mailbox.org>
Closes: https://lore.kernel.org/all/bug-216156-206035@https.bugzilla.kernel.org%2F/
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240419115913.3317575-1-mpe@ellerman.id.au
|
|
When a device is hot removed on powernv, the hotplug driver clears
the device's state. However, on pseries, if a device is removed by
phyp after reaching the error threshold, the kernel remains unaware,
leading to the device not being torn down. This prevents necessary
remediation actions like failover.
Permanently disable the device if the presence check fails.
Also, in eeh_dev_check_failure() we may treat the error as a false
positive if the device has been hot-unplugged, because the get_state call
returns EEH_STATE_NOT_SUPPORT, and we may end up not clearing the device
state. So log the event if the state is not moved to the permanent
failure state.
Signed-off-by: Ganesh Goudar <ganeshgr@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240422075737.1405551-1-ganeshgr@linux.ibm.com
|
|
The patch titled ("powerpc: make fadump resilient with memory add/remove
events") made significant changes to the implementation of fadump,
particularly to elfcorehdr creation and the fadump crash info header
structure. Therefore, update the fadump implementation documentation
to reflect those changes.
The following updates are made to the firmware-assisted dump documentation:
1. The elfcorehdr is no longer stored after fadump HDR in the reserved
dump area. Instead, the second kernel dynamically allocates memory
for the elfcorehdr within the address range from 0 to the boot memory
size. Therefore, update figures 1 and 2 of Memory Reservation during
the first and second kernels to reflect this change.
2. A version field has been added to the fadump header to manage the
future changes to fadump crash info header structure without changing
the fadump header magic number in the future. Therefore, remove the
corresponding TODO from the document.
Signed-off-by: Sourabh Jain <sourabhjain@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240422195932.1583833-4-sourabhjain@linux.ibm.com
|
|
The elfcorehdr describes the CPUs and memory of the crashed kernel to
the kernel that captures the dump, known as the second or fadump kernel.
The elfcorehdr needs to be updated if the system's memory changes due to
memory hotplug or online/offline events.
Currently, memory hotplug events are monitored in userspace by udev
rules, and fadump is re-registered, which recreates the elfcorehdr with
the latest available memory in the system.
However, the previous patch ("powerpc: make fadump resilient with memory
add/remove events") moved the creation of elfcorehdr to the second or
fadump kernel. This eliminates the need to regenerate the elfcorehdr
during memory hotplug or online/offline events.
Create a sysfs entry at /sys/kernel/fadump/hotplug_ready to let
userspace know that fadump re-registration is not required for memory
add/remove events.
Signed-off-by: Sourabh Jain <sourabhjain@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240422195932.1583833-3-sourabhjain@linux.ibm.com
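
A userspace tool could use the new sysfs entry roughly as sketched below to decide whether re-registration is still needed. This is a sketch under assumptions: the helper name is hypothetical, and it assumes the file reads as "1" when the kernel supports hotplug-safe fadump; the path is passed in so the logic can be exercised against any file.

```c
#include <stdio.h>

/*
 * Returns 1 if the file at path starts with '1', 0 if it starts with
 * anything else, and -1 if it cannot be opened (for example on a kernel
 * that does not provide the entry).
 */
static int fadump_hotplug_ready(const char *path)
{
    FILE *f = fopen(path, "r");
    int c;

    if (!f)
        return -1;
    c = fgetc(f);
    fclose(f);
    return c == '1';
}
```

A caller would pass "/sys/kernel/fadump/hotplug_ready" and skip the udev-triggered re-registration only when the result is 1, treating -1 as "older kernel, keep the existing behaviour".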
|
|
Due to changes in memory resources caused by either memory hotplug or
online/offline events, the elfcorehdr, which describes the CPUs and
memory of the crashed kernel to the kernel that collects the dump (known
as second/fadump kernel), becomes outdated. Consequently, attempting
dump collection with an outdated elfcorehdr can lead to failed or
inaccurate dump collection.
Memory hotplug or online/offline events are referred to as memory
add/remove events in the rest of the commit message.
The current solution to address the aforementioned issue is as follows:
Monitor memory add/remove events in userspace using udev rules, and
re-register fadump whenever there are changes in memory resources. This
leads to the creation of a new elfcorehdr with updated system memory
information.
There are several notable issues associated with re-registering fadump
for every memory add/remove event.
1. Bulk memory add/remove events with udev-based fadump re-registration
can lead to race conditions and, more importantly, it creates a wide
window during which fadump is inactive until all memory add/remove
events are settled.
2. Re-registering fadump for every memory add/remove event is
inefficient.
3. The memory for elfcorehdr is allocated based on the memblock regions
available during early boot and remains fixed thereafter. However, if
elfcorehdr is later recreated with additional memblock regions, its
size will increase, potentially leading to memory corruption.
Address the aforementioned challenges by shifting the creation of
elfcorehdr from the first kernel (also referred to as the crashed kernel),
where it was created and frequently recreated for every memory
add/remove event, to the fadump kernel. As a result, the elfcorehdr only
needs to be created once, thus eliminating the necessity to re-register
fadump during memory add/remove events.
At present, the first kernel prepares fadump header and stores it in the
fadump reserved area. The fadump header includes the start address of
the elfcorehdr, crashing CPU details, and other relevant information. In
the event of a crash in the first kernel, the second/fadump kernel boots and
accesses the fadump header prepared by the first kernel. It then
performs the following steps in a platform-specific function
[rtas|opal]_fadump_process:
1. Sanity check for fadump header
2. Update CPU notes in elfcorehdr
Along with the above, update setup_fadump() in fadump.c to create the
elfcorehdr and set its address in the global variable elfcorehdr_addr
so that the vmcore module can process it in the second/fadump kernel.
The section below outlines the information required to create the
elfcorehdr and the changes made to make it available to the fadump
kernel where it is not already available.
To create elfcorehdr, the following crashed kernel information is
required: CPU notes, vmcoreinfo, and memory ranges.
At present, the CPU notes are already prepared in the fadump kernel, so
no changes are needed in that regard. The fadump kernel has access to
all crashed kernel memory regions, including boot memory regions that
are relocated by firmware to fadump reserved areas, so no changes for
that either. However, it is necessary to add new members to the fadump
header, i.e., the 'fadump_crash_info_header' structure, in order to pass
the crashed kernel's vmcoreinfo address and its size to fadump kernel.
In addition to the vmcoreinfo address and size, there are a few other
attributes also added to the fadump_crash_info_header structure.
1. version:
It stores the fadump header version, which is currently set to 1.
This provides flexibility to update the fadump crash info header in
the future without changing the magic number. For each change in the
fadump header, the version will be increased. This will help the
updated kernel determine how to handle kernel dumps from older
kernels. The magic number remains relevant for checking fadump header
corruption.
2. pt_regs_sz/cpu_mask_sz:
Store the sizes of the pt_regs and cpu_mask structures of the first kernel. These
attributes are used to prevent dump processing if the sizes of
pt_regs or cpu_mask structure differ between the first and fadump
kernels.
Note: if either the first/crashed kernel or the second/fadump kernel
does not have the changes introduced here, the kernel fails to collect
the dump and prints a relevant error message on the console.
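The checks described above can be sketched roughly as follows. This is
an illustration under stated assumptions, not the actual kernel code:
the struct is a hypothetical subset of 'fadump_crash_info_header', the
field order, field widths and the vmcoreinfo field names are made up,
and only version, pt_regs_sz and cpu_mask_sz are names taken from this
commit message.

```c
#include <stdint.h>

#define HDR_VERSION_SKETCH 1	/* the message says the version is currently 1 */

/* Hypothetical subset of the header; not the real kernel layout. */
struct crash_info_header_sketch {
	uint64_t magic_number;		/* still used to detect corruption */
	uint32_t version;		/* bumped on every header change */
	uint64_t vmcoreinfo_raddr;	/* assumed name: vmcoreinfo address */
	uint64_t vmcoreinfo_size;	/* assumed name: vmcoreinfo size */
	uint32_t pt_regs_sz;		/* sizeof pt_regs in the first kernel */
	uint32_t cpu_mask_sz;		/* sizeof cpu_mask in the first kernel */
};

/*
 * Returns 1 if the fadump kernel may process a header written by the
 * first kernel, 0 if dump processing should be refused.
 */
int header_is_compatible(const struct crash_info_header_sketch *h,
			 uint64_t expected_magic,
			 uint32_t local_pt_regs_sz,
			 uint32_t local_cpu_mask_sz)
{
	if (h->magic_number != expected_magic)
		return 0;	/* corrupted or foreign header */
	if (h->version != HDR_VERSION_SKETCH)
		return 0;	/* unknown header revision */
	if (h->pt_regs_sz != local_pt_regs_sz)
		return 0;	/* pt_regs layout differs between kernels */
	if (h->cpu_mask_sz != local_cpu_mask_sz)
		return 0;	/* cpu_mask size differs between kernels */
	return 1;
}
```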
Signed-off-by: Sourabh Jain <sourabhjain@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240422195932.1583833-2-sourabhjain@linux.ibm.com
|
|
A couple of minor fixes:
- hcall return values are long. Fix that for h_get_mpp, h_get_ppp and
parse_ppp_data.
- If an hcall fails, the values set should at least be zero rather than
left uninitialized. Fix that for h_get_mpp and h_get_ppp.
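The pattern behind both fixes can be sketched as below. This is not the
lparcfg code: the struct, the function-pointer type and the stub
hypervisor calls are hypothetical stand-ins, and the failure code -2 is
an arbitrary example value.

```c
#include <string.h>

#define H_SUCCESS 0L

/* Hypothetical output structure, standing in for the real ppp data. */
struct ppp_data_sketch {
	unsigned long entitlement;
	unsigned long weight;
};

/* Hypothetical stand-in for a hypervisor call that fills retbuf. */
typedef long (*hcall_fn)(unsigned long *retbuf);

/* Stub that succeeds and returns two values. */
long hcall_ok_stub(unsigned long *retbuf)
{
	retbuf[0] = 12;
	retbuf[1] = 128;
	return H_SUCCESS;
}

/* Stub that fails without touching retbuf. */
long hcall_fail_stub(unsigned long *retbuf)
{
	(void)retbuf;
	return -2L;
}

/* Keep the return code as long and zero the output before the call. */
long get_ppp_sketch(hcall_fn hcall, struct ppp_data_sketch *out)
{
	unsigned long retbuf[2] = { 0 };
	long rc;

	memset(out, 0, sizeof(*out));	/* callers never see garbage */

	rc = hcall(retbuf);
	if (rc == H_SUCCESS) {
		out->entitlement = retbuf[0];
		out->weight = retbuf[1];
	}
	return rc;
}
```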
Signed-off-by: Shrikanth Hegde <sshegde@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240412092047.455483-3-sshegde@linux.ibm.com
|
|
When there are no options specified for lparstat, it is expected to
give reports since LPAR (Logical Partition) boot.
APP (Available Processor Pool) is an indicator of how many cores in the
shared pool are free to use in a Shared Processor LPAR (SPLPAR). APP is
derived using pool_idle_time, which is obtained using the H_PIC call.
The interval-based reports show the correct APP value, while the
since-boot report shows very high APP values. This happens because in
that case APP is obtained by dividing the pool idle time by the LPAR
uptime. Since the pool idle time is reported by the PowerVM hypervisor
since its own boot, it need not align with LPAR boot.
To fix that, export the boot pool idle time in lparcfg. powerpc-utils
will use this info to derive APP as below for since-boot reports.
APP = (pool idle time - boot pool idle time) / (uptime * timebase)
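As a rough illustration of the equation above (not the powerpc-utils
implementation), the calculation could look like the following. The
function and parameter names are hypothetical, the idle times are
assumed to be in timebase ticks, and uptime is assumed to be in seconds
with timebase in ticks per second.

```c
#include <stdint.h>

/*
 * APP = (pool idle time - boot pool idle time) / (uptime * timebase)
 *
 * Returns 0.0 when the inputs cannot produce a meaningful value.
 */
double app_since_boot(uint64_t pool_idle_time,
		      uint64_t boot_pool_idle_time,
		      double uptime_sec,
		      uint64_t timebase_hz)
{
	if (uptime_sec <= 0.0 || timebase_hz == 0)
		return 0.0;
	if (pool_idle_time < boot_pool_idle_time)
		return 0.0;	/* avoid unsigned wrap-around */

	return (double)(pool_idle_time - boot_pool_idle_time) /
	       (uptime_sec * (double)timebase_hz);
}
```

Without the subtraction, the idle time accumulated by the pool before
the LPAR booted is divided by the much shorter LPAR uptime, which is
what inflates the since-boot APP value.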
Results: observe the APP values.
====================== Shared LPAR ================================
lparstat
System Configuration
type=Shared mode=Uncapped smt=8 lcpu=12 mem=15573440 kB cpus=37 ent=12.00
reboot
stress-ng --cpu=$(nproc) -t 600
sleep 600
So in this case APP is expected to be close to 37-6=31.
====== 6.9-rc1 and lparstat 1.3.10 =============
%user %sys %wait %idle physc %entc lbusy app vcsw phint
----- ----- ----- ----- ----- ----- ----- ----- ----- -----
47.48 0.01 0.00 52.51 0.00 0.00 47.49 69099.72 541547 21
=== With this patch and powerpc-utils patch to do the above equation ===
%user %sys %wait %idle physc %entc lbusy app vcsw phint
----- ----- ----- ----- ----- ----- ----- ----- ----- -----
47.48 0.01 0.00 52.51 5.73 47.75 47.49 31.21 541753 21
=====================================================================
Note: the inaccuracy of physc and purr/idle purr is being handled in a
separate patch in the powerpc-utils tree.
Signed-off-by: Shrikanth Hegde <sshegde@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240412092047.455483-2-sshegde@linux.ibm.com
|
|
The memory limit value specified by the user is further updated such
that the value is 16MB aligned. This is because hash translation mode
uses 16MB as the direct mapping page size. Make sure we update the
global variable 'memory_limit' with the 16MB aligned value such that all
kernel components will see the new aligned value of the memory limit.
Signed-off-by: Aneesh Kumar K.V (IBM) <aneesh.kumar@kernel.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240403083611.172833-3-aneesh.kumar@kernel.org
|
|
If the user specifies the memory limit, the kernel should honor it such
that all allocations and reservations are made within the memory limit
specified. fadump was breaking that rule. Remove the code which updates
the memory limit such that fadump reservations are done within the
limit specified.
Signed-off-by: Aneesh Kumar K.V (IBM) <aneesh.kumar@kernel.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240403083611.172833-2-aneesh.kumar@kernel.org
|
|
The value specified for the memory limit is used to set a restriction on
memory usage. It is important to ensure that this restriction is within
the linear map kernel address space range. The hash page table
translation uses a 16MB page size to map the kernel linear map address
space. The htab_bolt_mapping() function aligns down the size of the range
while mapping kernel linear address space. Since the memblock limit is
enforced very early during boot, before we can detect the type of memory
translation (radix vs hash), we align the memory limit value specified
as a kernel parameter to 16MB. This alignment value will work for both
hash and radix translations.
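The alignment itself is simple arithmetic. The sketch below assumes the
limit is rounded down, mirroring the align-down behaviour described for
htab_bolt_mapping(); the macro and function names are illustrative and
this is not the kernel's code.

```c
#include <stdint.h>

#define SZ_16M_SKETCH (16ULL * 1024 * 1024)

/* Round 'limit' down to the nearest multiple of 16MB. */
uint64_t align_limit_down_16m(uint64_t limit)
{
	return limit & ~(SZ_16M_SKETCH - 1);
}
```

For example, a limit of 0x1FFFFFF bytes (just under 32MB) becomes
0x1000000 (16MB), while a limit that is already a multiple of 16MB is
left unchanged.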
Signed-off-by: Aneesh Kumar K.V (IBM) <aneesh.kumar@kernel.org>
Acked-by: Joel Savitz <jsavitz@redhat.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240403083611.172833-1-aneesh.kumar@kernel.org
|