Age | Commit message | Author |
|
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Incorrect bucket state transition in the discard path; when incrementing
a bucket's generation number that had already been discarded, we were
forgetting to check if it should be need_gc_gens, not free.
This was caught by the .invalid checks in the transaction commit path,
causing us to go emergency read only.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
With CONFIG_READ_ONLY_THP_FOR_FS, the Linux kernel supports using THPs
for read-only mmapped files, such as shared libraries. However, the
kernel makes no attempt to actually align those mappings on 2MB
boundaries, which makes it impossible to use those THPs most of the
time. This issue applies to general file mapping THP as well as
existing setups using CONFIG_READ_ONLY_THP_FOR_FS. This is easily
fixed by using thp_get_unmapped_area for the unmapped_area function
in bcachefs, which is what ext2, ext4, fuse, xfs and btrfs all use.
Similar to commit b0c582233a85 ("btrfs: fix alignment of VMA for
memory mapped files on THP").
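The alignment requirement above comes down to simple address arithmetic. The following is a minimal userspace sketch, not the kernel's thp_get_unmapped_area(); the names HPAGE_SIZE, align_up_2m() and is_2m_aligned() are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* 2MB huge page size, as named in the commit message (hypothetical macro). */
#define HPAGE_SIZE (UINT64_C(2) << 20)

/* Round an address up to the next 2MB boundary. */
static uint64_t align_up_2m(uint64_t addr)
{
	return (addr + HPAGE_SIZE - 1) & ~(HPAGE_SIZE - 1);
}

/* True if a mapping starting at addr could be backed by a 2MB page. */
static bool is_2m_aligned(uint64_t addr)
{
	return (addr & (HPAGE_SIZE - 1)) == 0;
}
```

A mapping placed at 0x201000 is page aligned but not 2MB aligned, so under this sketch it would have to move up to 0x400000 before a huge page could cover it.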
Signed-off-by: Youling Tang <tangyouling@kylinos.cn>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
i.e. the start of automatic self healing:
If errors=continue or fix_safe, we now automatically fix simple errors
without user intervention.
New error action option: fix_safe
This replaces the existing errors=ro option, which gets a new slot, i.e.
existing errors=ro users now get errors=fix_safe.
This is currently only enabled for a limited set of errors - initially
just disk accounting; errors we would never not want to fix, and we
don't want to require user intervention (i.e. to make sure a bug report
gets filed).
Errors will still be counted in the superblock, so we (developers) will
still know they've been occurring if a bug report gets filed (as bug
reports typically include the errors superblock section).
Eventually we'll be enabling this for a much wider set of errors, after
we've done thorough error injection testing.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
reference: https://github.com/koverstreet/bcachefs/issues/692
trans->ref is the reference used by the cycle detector, which walks
btree_trans objects of other threads to walk the graph of held locks and
issue wakeups when an abort is required.
We have to wait for the ref to go to 1 before freeing trans->paths or
clearing trans->locking_wait.task.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
this is long running - help users see what's going on
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Missing enum conversion
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
We only have 48 bits for the LRU time field, which is insufficient to
prevent wraparound.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
LRUs only have 48 bits for the time field (i.e. LRU order); thus we need
overflow checks and guards.
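The kind of guard described here can be sketched in plain C. This is an illustration under assumptions, not the bcachefs code: LRU_TIME_BITS reflects the 48 bits named above, while lru_time_fits() and lru_time_set() are hypothetical helpers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Width of the LRU time field, per the commit message. */
#define LRU_TIME_BITS 48
#define LRU_TIME_MAX  ((UINT64_C(1) << LRU_TIME_BITS) - 1)

/* Hypothetical guard: true if the value can be stored without truncation. */
static bool lru_time_fits(uint64_t time)
{
	return time <= LRU_TIME_MAX;
}

/* Hypothetical setter: refuse values that would wrap instead of storing them. */
static int lru_time_set(uint64_t *field, uint64_t time)
{
	if (!lru_time_fits(time))
		return -1;	/* caller reports an error rather than wrapping */
	*field = time;
	return 0;
}
```

With this shape, a time of 1 << 48 is rejected and the previously stored value is left untouched.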
Reported-by: syzbot+df3bf3f088dcaa728857@syzkaller.appspotmail.com
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
We've been moving away from going RW lazily; if we want to go RW we do
that in set_may_go_rw(), and if we didn't go RW we don't need to delete
dead snapshots.
Reported-by: syzbot+4366624c0b5aac4906cf@syzkaller.appspotmail.com
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
We shouldn't be running the journal shutdown sequence if we never fully
initialized the journal.
Reported-by: syzbot+ffd2270f0bca3322ee00@syzkaller.appspotmail.com
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
We can only handle btree IDs up to 62, since the btree id (plus the type
for interior btree nodes) has to fit into a 64 bit bitmask - check for
invalid ones to avoid invalid shifts later.
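A rough sketch of why 62 is the limit, assuming (this encoding is an assumption, not taken from the source) that bit 0 is reserved for interior nodes and the leaf type of btree `id` uses bit id + 1. btree_id_to_mask() is a hypothetical helper.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Largest btree ID the commit message says can be handled. */
#define BTREE_ID_MAX 62

/*
 * Hypothetical encoding: bit 0 is reserved for interior nodes, and leaf
 * nodes of btree `id` use bit id + 1, so id 62 maps to bit 63.
 */
static bool btree_id_to_mask(unsigned id, uint64_t *mask)
{
	if (id > BTREE_ID_MAX)
		return false;	/* would shift by >= 64: undefined behaviour */
	*mask = UINT64_C(1) << (id + 1);	/* 64 bit constant, not 1U */
	return true;
}
```

Rejecting the ID up front keeps the shift count below 64, and building the mask from a 64 bit constant avoids silently truncating the high bits.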
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
these should be 64 bit bitmasks, not 32 bit.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Reported-by: syzbot+9f74cb4006b83e2a3df1@syzkaller.appspotmail.com
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
We can't discard a bucket while it's still open; this needs the
bucket_is_open_safe() version, which takes the open_buckets lock.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
We use 0 size arrays as markers, but ubsan doesn't know that - cast them
to a pointer to fix the splat.
Also, make sure this code gets tested a bit more.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
btree_iter_init() needs to happen before key_cache_init(), to initialize
btree_trans_barrier
Reported-by: syzbot+3cca837c2183f8f6fcaf@syzkaller.appspotmail.com
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc
Pull char/misc driver fixes from Greg KH:
"Here are a number of small char/misc and iio driver fixes for
6.10-rc4. Included in here are the following:
- iio driver fixes for a bunch of reported problems.
- mei driver fixes for a number of reported issues.
- amiga parport driver build fix.
- .editorconfig fix that was causing lots of unintended whitespace
changes to happen to files when they were being edited. Unless we
want to sweep the whole tree and remove all trailing whitespace at
once, this is needed for the .editorconfig file to be able to be
used at all. This change is required because the original
submitters never touched older files in the tree.
- jfs bugfix for a buffer overflow
The jfs bugfix is in here as I didn't know where else to put it, and
it's been ignored for a while as the filesystem seems to be abandoned
and I'm tired of seeing the same issue reported in multiple places.
All of these have been in linux-next with no reported issues"
* tag 'char-misc-6.10-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (25 commits)
.editorconfig: remove trim_trailing_whitespace option
jfs: xattr: fix buffer overflow for invalid xattr
misc: microchip: pci1xxxx: Fix a memory leak in the error handling of gp_aux_bus_probe()
misc: microchip: pci1xxxx: fix double free in the error handling of gp_aux_bus_probe()
parport: amiga: Mark driver struct with __refdata to prevent section mismatch
mei: vsc: Fix wrong invocation of ACPI SID method
mei: vsc: Don't stop/restart mei device during system suspend/resume
mei: me: release irq in mei_me_pci_resume error path
mei: demote client disconnect warning on suspend to debug
iio: inkern: fix channel read regression
iio: imu: inv_mpu6050: stabilized timestamping in interrupt
iio: adc: ad7173: Fix sampling frequency setting
iio: adc: ad7173: Clear append status bit
iio: imu: inv_icm42600: delete unneeded update watermark call
iio: imu: inv_icm42600: stabilized timestamp in interrupt
iio: invensense: fix odr switching to same value
iio: adc: ad7173: Remove index from temp channel
iio: adc: ad7173: Add ad7173_device_info names
iio: adc: ad7173: fix buffers enablement for ad7176-2
iio: temperature: mlx90635: Fix ERR_PTR dereference in mlx90635_probe()
...
|
|
Pull xfs fix from Chandan Babu:
"Ensure xfs incore superblock's allocated inode counter, free inode
counter, and free data block counter are all zero or positive when
they are copied over from xfs_mount->m_[icount,ifree,fdblocks]
respectively"
* tag 'xfs-6.10-fixes-3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
xfs: make sure sb_fdblocks is non-negative
|
|
Pull smb server fixes from Steve French:
"Two small smb3 server fixes:
- set xattr fix
- pathname parsing check fix"
* tag '6.10-rc3-smb3-server-fixes' of git://git.samba.org/ksmbd:
ksmbd: fix missing use of get_write in in smb2_set_ea()
ksmbd: move leading slash check to smb2_get_name()
|
|
Pull NFS client fixes from Trond Myklebust:
"Bugfixes:
- NFSv4.2: Fix a memory leak in nfs4_set_security_label
- NFSv2/v3: abort nfs_atomic_open_v23 if the name is too long.
- NFS: Add appropriate memory barriers to the sillyrename code
- Propagate readlink errors in nfs_symlink_filler
- NFS: don't invalidate dentries on transient errors
- NFS: fix unnecessary synchronous writes in random write workloads
- NFSv4.1: enforce rootpath check when deciding whether or not to trunk
Other:
- Change email address for Trond Myklebust due to email server concerns"
* tag 'nfs-for-6.10-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
NFS: add barriers when testing for NFS_FSDATA_BLOCKED
SUNRPC: return proper error from gss_wrap_req_priv
NFSv4.1 enforce rootpath check in fs_location query
NFS: abort nfs_atomic_open_v23 if name is too long.
nfs: don't invalidate dentries on transient errors
nfs: Avoid flushing many pages with NFS_FILE_SYNC
nfs: propagate readlink errors in nfs_symlink_filler
MAINTAINERS: Change email address for Trond Myklebust
NFSv4: Fix memory leak in nfs4_set_security_label
|
|
Pull bcachefs fixes from Kent Overstreet:
- fix kworker explosion, due to calling submit_bio() (which can block)
from a multithreaded workqueue
- fix error handling in btree node scan
- forward compat fix: kill an old debug assert
- key cache shrinker fixes
This is a partial fix for stalls doing multithreaded creates - there
were various O(n^2) issues the key cache shrinker was hitting [1].
There's more work coming here; I'm working on a patch to delete the
key cache lock, which initial testing shows to be a pretty drastic
performance improvement
- assorted syzbot fixes
Link: https://lore.kernel.org/linux-bcachefs/CAGudoHGenxzk0ZqPXXi1_QDbfqQhGHu+wUwzyS6WmfkUZ1HiXA@mail.gmail.com/ [1]
* tag 'bcachefs-2024-06-12' of https://evilpiepirate.org/git/bcachefs:
bcachefs: Fix rcu_read_lock() leak in drop_extra_replicas
bcachefs: Add missing bch_inode_info.ei_flags init
bcachefs: Add missing synchronize_srcu_expedited() call when shutting down
bcachefs: Check for invalid bucket from bucket_gen(), gc_bucket()
bcachefs: Replace bucket_valid() asserts in bucket lookup with proper checks
bcachefs: Fix snapshot_create_lock lock ordering
bcachefs: Fix refcount leak in check_fix_ptrs()
bcachefs: Leave a buffer in the btree key cache to avoid lock thrashing
bcachefs: Fix reporting of freed objects from key cache shrinker
bcachefs: set sb->s_shrinker->seeks = 0
bcachefs: increase key cache shrinker batch size
bcachefs: Enable automatic shrinking for rhashtables
bcachefs: fix the display format for show-super
bcachefs: fix stack frame size in fsck.c
bcachefs: Delete incorrect BTREE_ID_NR assertion
bcachefs: Fix incorrect error handling found_btree_node_is_readable()
bcachefs: Split out btree_write_submit_wq
|
|
Fix an issue where get_write is not used in smb2_set_ea().
Fixes: 6fc0a265e1b9 ("ksmbd: fix potential circular locking issue in smb2_set_ea()")
Cc: stable@vger.kernel.org
Reported-by: Wang Zhaolong <wangzhaolong1@huawei.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
If the directory name in the root of the share starts with a
character like 镜(0x955c) or Ṝ(0x1e5c), it (and anything inside)
cannot be accessed. The leading slash check must be done after
converting unicode to nls string.
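Both example code points have 0x5c as their low byte, which is the ASCII backslash. One plausible reading of the bug, assuming the name arrives as UTF-16LE and was inspected byte-wise, is sketched below. Both functions are hypothetical, and comparing the whole 16 bit unit merely stands in for checking the converted string.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Naive check on the raw buffer: looks only at the first byte. */
static bool leading_slash_raw(const unsigned char *buf)
{
	return buf[0] == '\\' || buf[0] == '/';
}

/* Check on the full 16 bit code unit, stored little-endian. */
static bool leading_slash_decoded(const unsigned char *buf)
{
	uint16_t unit = (uint16_t)(buf[0] | (buf[1] << 8));

	return unit == '\\' || unit == '/';
}
```

For the bytes 5c 95 (U+955C) the raw check reports a leading slash while the decoded check does not; for a real backslash (5c 00) both agree.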
Cc: stable@vger.kernel.org
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull vfs fixes from Christian Brauner:
"Misc:
- Restore debugfs behavior of ignoring unknown mount options
- Fix kernel doc for netfs_wait_for_outstanding_io()
- Fix struct statx comment after new addition for this cycle
- Fix a check in find_next_fd()
iomap:
- Fix data zeroing behavior when an extent spans the block that
contains i_size
- Restore i_size increasing in iomap_write_end() for now to avoid
stale data exposure on xfs with a realtime device
Cachefiles:
- Remove unneeded fdtable.h include
- Improve trace output for cachefiles_obj_{get,put}_ondemand_fd()
- Remove requests from the request list to prevent accessing already
freed requests
- Fix UAF when issuing restore command while the daemon is still
alive by adding an additional reference count to requests
- Fix UAF by grabbing a reference during xarray lookup with xa_lock()
held
- Simplify error handling in cachefiles_ondemand_daemon_read()
- Add consistency checks for read and open requests to avoid crashes
- Add a spinlock to protect ondemand_id variable which is used to
determine whether an anonymous cachefiles fd has already been
closed
- Make on-demand reads killable allowing to handle broken cachefiles
daemon better
- Flush all requests after the kernel has been marked dead via
CACHEFILES_DEAD to avoid hung-tasks
- Ensure that closed requests are marked as such to avoid reusing
them with a reopen request
- Defer fd_install() until after copy_to_user() succeeded and thereby
get rid of having to use close_fd()
- Ensure that anonymous cachefiles on-demand fds are reused while
they are valid to avoid pinning already freed cookies"
* tag 'vfs-6.10-rc4.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
iomap: Fix iomap_adjust_read_range for plen calculation
iomap: keep on increasing i_size in iomap_write_end()
cachefiles: remove unneeded include of <linux/fdtable.h>
fs/file: fix the check in find_next_fd()
cachefiles: make on-demand read killable
cachefiles: flush all requests after setting CACHEFILES_DEAD
cachefiles: Set object to close if ondemand_id < 0 in copen
cachefiles: defer exposing anon_fd until after copy_to_user() succeeds
cachefiles: never get a new anonymous fd if ondemand_id is valid
cachefiles: add spin_lock for cachefiles_ondemand_info
cachefiles: add consistency check for copen/cread
cachefiles: remove err_put_fd label in cachefiles_ondemand_daemon_read()
cachefiles: fix slab-use-after-free in cachefiles_ondemand_daemon_read()
cachefiles: fix slab-use-after-free in cachefiles_ondemand_get_fd()
cachefiles: remove requests from xarray during flushing requests
cachefiles: add output string to cachefiles_obj_[get|put]_ondemand_fd
statx: Update offset commentary for struct statx
netfs: fix kernel doc for nets_wait_for_outstanding_io()
debugfs: continue to ignore unknown mount options
|
|
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
We use the polling interface to srcu for tracking pending frees; when
shutting down we don't need to wait for an srcu barrier to free them,
but SRCU still gets confused if we shutdown with an outstanding grace
period.
Reported-by: syzbot+6a038377f0a594d7d44e@syzkaller.appspotmail.com
Reported-by: syzbot+0ece6edfd05ed20e32d9@syzkaller.appspotmail.com
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Turn more asserts into proper recoverable error paths.
Reported-by: syzbot+246b47da27f8e7e7d6fb@syzkaller.appspotmail.com
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
The bucket_gens array and gc_buckets array know their own size; we
should be using those members, and returning an error.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
======================================================
WARNING: possible circular locking dependency detected
6.10.0-rc2-ktest-00018-gebd1d148b278 #144 Not tainted
------------------------------------------------------
fio/1345 is trying to acquire lock:
ffff88813e200ab8 (&c->snapshot_create_lock){++++}-{3:3}, at: bch2_truncate+0x76/0xf0
but task is already holding lock:
ffff888105a1fa38 (&sb->s_type->i_mutex_key#13){+.+.}-{3:3}, at: do_truncate+0x7b/0xc0
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #2 (&sb->s_type->i_mutex_key#13){+.+.}-{3:3}:
down_write+0x3d/0xd0
bch2_write_iter+0x1c0/0x10f0
vfs_write+0x24a/0x560
__x64_sys_pwrite64+0x77/0xb0
x64_sys_call+0x17e5/0x1ab0
do_syscall_64+0x68/0x130
entry_SYSCALL_64_after_hwframe+0x4b/0x53
-> #1 (sb_writers#10){.+.+}-{0:0}:
mnt_want_write+0x4a/0x1d0
filename_create+0x69/0x1a0
user_path_create+0x38/0x50
bch2_fs_file_ioctl+0x315/0xbf0
__x64_sys_ioctl+0x297/0xaf0
x64_sys_call+0x10cb/0x1ab0
do_syscall_64+0x68/0x130
entry_SYSCALL_64_after_hwframe+0x4b/0x53
-> #0 (&c->snapshot_create_lock){++++}-{3:3}:
__lock_acquire+0x1445/0x25b0
lock_acquire+0xbd/0x2b0
down_read+0x40/0x180
bch2_truncate+0x76/0xf0
bchfs_truncate+0x240/0x3f0
bch2_setattr+0x7b/0xb0
notify_change+0x322/0x4b0
do_truncate+0x8b/0xc0
do_ftruncate+0x110/0x270
__x64_sys_ftruncate+0x43/0x80
x64_sys_call+0x1373/0x1ab0
do_syscall_64+0x68/0x130
entry_SYSCALL_64_after_hwframe+0x4b/0x53
other info that might help us debug this:
Chain exists of:
&c->snapshot_create_lock --> sb_writers#10 --> &sb->s_type->i_mutex_key#13
Possible unsafe locking scenario:
       CPU0                    CPU1
       ----                    ----
  lock(&sb->s_type->i_mutex_key#13);
                               lock(sb_writers#10);
                               lock(&sb->s_type->i_mutex_key#13);
  rlock(&c->snapshot_create_lock);
*** DEADLOCK ***
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
fsck_err() does a goto fsck_err on error; factor out check_fix_ptr() so
that our error label can drop our device ref.
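The pattern described, a helper whose single error label releases the reference on every exit, can be sketched as follows. This is a simplified userspace illustration; struct dev, dev_get(), dev_put() and check_one_ptr() are hypothetical stand-ins, not the bcachefs functions.

```c
#include <assert.h>

/* Hypothetical device with a plain reference count. */
struct dev { int refs; };

static void dev_get(struct dev *d) { d->refs++; }
static void dev_put(struct dev *d) { d->refs--; }

/* Takes a ref; every exit, including errors, passes the label that drops it. */
static int check_one_ptr(struct dev *d, int corrupt)
{
	int ret = 0;

	dev_get(d);
	if (corrupt) {
		ret = -1;
		goto err;	/* stands in for the goto hidden inside fsck_err() */
	}
	/* ... normal checks would go here ... */
err:
	dev_put(d);
	return ret;
}
```

Because the get and the put live in the same small function, a hidden goto cannot skip the put.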
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
We count objects as freed when we move them to the srcu-pending lists
because we're doing the equivalent of a kfree_srcu(); the only
difference is that managing the pending list ourselves means we can allocate
from the pending list.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
inodes and dentries are still present in the btree node cache, in much
more compact form
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Since the key cache shrinker walks the rhashtable, a mostly empty
rhashtable leads to really nasty reclaim performance issues.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
There are three keys displayed in non-uniform format.
Let's fix them.
[Before]
```
Label: testbcachefs
Version: 1.9: (unknown version)
Version upgrade complete: 0.0: (unknown version)
```
[After]
```
Label: testbcachefs
Version: 1.9: (unknown version)
Version upgrade complete: 0.0: (unknown version)
```
Fixes: 7423330e30ab ("bcachefs: prt_printf() now respects \r\n\t")
Signed-off-by: Hongbo Li <lihongbo22@huawei.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
fsck.c always runs top of the stack so we're not too concerned here;
noinline_for_stack is sufficient
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
for forwards compat we now explicitly allow mounting and using
filesystems with unknown btrees, and we have to walk them for fsck.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
error handling here is slightly odd, which is why we were accidentally
calling evict() on an error pointer
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Split the workqueues for btree read completions and btree write
submissions; we don't want concurrency control on btree read
completions, but we do want concurrency control on write submissions,
else blocking in submit_bio() will cause a ton of kworkers to be
allocated.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
A user with a completely full filesystem experienced an unexpected
shutdown when the filesystem tried to write the superblock during
runtime.
The kernel shows the following dmesg:
[ 8.176281] XFS (dm-4): Metadata corruption detected at xfs_sb_write_verify+0x60/0x120 [xfs], xfs_sb block 0x0
[ 8.177417] XFS (dm-4): Unmount and run xfs_repair
[ 8.178016] XFS (dm-4): First 128 bytes of corrupted metadata buffer:
[ 8.178703] 00000000: 58 46 53 42 00 00 10 00 00 00 00 00 01 90 00 00 XFSB............
[ 8.179487] 00000010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
[ 8.180312] 00000020: cf 12 dc 89 ca 26 45 29 92 e6 e3 8d 3b b8 a2 c3 .....&E)....;...
[ 8.181150] 00000030: 00 00 00 00 01 00 00 06 00 00 00 00 00 00 00 80 ................
[ 8.182003] 00000040: 00 00 00 00 00 00 00 81 00 00 00 00 00 00 00 82 ................
[ 8.182004] 00000050: 00 00 00 01 00 64 00 00 00 00 00 04 00 00 00 00 .....d..........
[ 8.182004] 00000060: 00 00 64 00 b4 a5 02 00 02 00 00 08 00 00 00 00 ..d.............
[ 8.182005] 00000070: 00 00 00 00 00 00 00 00 0c 09 09 03 17 00 00 19 ................
[ 8.182008] XFS (dm-4): Corruption of in-memory data detected. Shutting down filesystem
[ 8.182010] XFS (dm-4): Please unmount the filesystem and rectify the problem(s)
When xfs_log_sb writes the superblock to disk, sb_fdblocks is fetched from
m_fdblocks without any lock. m_fdblocks can go through a positive ->
negative -> positive change when the FS reaches fullness (see
xfs_mod_fdblocks), so there is a chance that sb_fdblocks is negative, and
because sb_fdblocks is of type unsigned long long, it reads as a very large
value. sb_fdblocks being bigger than sb_dblocks is a problem during log
recovery: xfs_validate_sb_write() complains.
Fix:
As sb_fdblocks will be re-calculated during mount when lazysbcount is
enabled, we just need to make xfs_validate_sb_write() happy -- make sure
sb_fdblocks is not negative. This patch also takes care of other percpu
counters in xfs_log_sb.
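The signed-to-unsigned problem and the clamp can be shown in a few lines of C. This is a sketch of the idea only; counter_to_sb() is a hypothetical helper, not the function the patch adds.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: clamp a signed in-memory counter to zero before
 * storing it in an unsigned on-disk field.
 */
static uint64_t counter_to_sb(int64_t counter)
{
	return counter < 0 ? 0 : (uint64_t)counter;
}
```

Without the clamp, a transient value of -5 converted to an unsigned 64 bit field becomes 2^64 - 5, which is far larger than any plausible block count.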
Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
|
|
Pull smb client fixes from Steve French:
"Two small smb3 client fixes:
- fix deadlock in umount
- minor cleanup due to netfs change"
* tag '6.10-rc2-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
cifs: Don't advance the I/O iterator before terminating subrequest
smb: client: fix deadlock in smb2_find_smb_tcon()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull misc fixes from Andrew Morton:
"14 hotfixes, 6 of which are cc:stable.
All except the nilfs2 fix affect MM and all are singletons - see the
changelogs for details"
* tag 'mm-hotfixes-stable-2024-06-07-15-24' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
nilfs2: fix nilfs_empty_dir() misjudgment and long loop on I/O errors
mm: fix xyz_noprof functions calling profiled functions
codetag: avoid race at alloc_slab_obj_exts
mm/hugetlb: do not call vma_add_reservation upon ENOMEM
mm/ksm: fix ksm_zero_pages accounting
mm/ksm: fix ksm_pages_scanned accounting
kmsan: do not wipe out origin when doing partial unpoisoning
vmalloc: check CONFIG_EXECMEM in is_vmalloc_or_module_addr()
mm: page_alloc: fix highatomic typing in multi-block buddies
nilfs2: fix potential kernel bug due to lack of writeback flag waiting
memcg: remove the lockdep assert from __mod_objcg_mlstate()
mm: arm64: fix the out-of-bounds issue in contpte_clear_young_dirty_ptes
mm: huge_mm: fix undefined reference to `mthp_stats' for CONFIG_SYSFS=n
mm: drop the 'anon_' prefix for swap-out mTHP counters
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs fixes from David Sterba:
- fix handling of folio private changes.
The private value holds pointer to our extent buffer structure
representing a metadata range. Release and create of the range was
not properly synchronized when updating the private bit which ended
up in double folio_put, leading to all sorts of breakage
- fix a crash, reported as duplicate key in metadata, but caused by a
race of fsync and size extending write. Requires prealloc target
range + fsync and other conditions (log tree state, timing)
- fix leak of qgroup extent records after transaction abort
* tag 'for-6.10-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: protect folio::private when attaching extent buffer folios
btrfs: fix leak of qgroup extent records after transaction abort
btrfs: fix crash on racing fsync and size-extending write into prealloc
|
|
There's now no need to make sure subreq->io_iter is advanced to match
subreq->transferred before calling one of the netfs subrequest termination
functions as the check has been removed from netfslib and the iterator is reset
prior to retrying a subreq.
Fixes: 3ee1a1fc3981 ("cifs: Cut over to using netfslib")
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Steve French <sfrench@samba.org>
cc: Paulo Alcantara <pc@manguebit.com>
cc: Shyam Prasad N <nspmangalore@gmail.com>
cc: Rohith Surabattula <rohiths.msft@gmail.com>
cc: Jeff Layton <jlayton@kernel.org>
cc: linux-cifs@vger.kernel.org
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
cc: linux-mm@kvack.org
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
Unlock cifs_tcp_ses_lock before calling cifs_put_smb_ses() to avoid such
deadlock.
Cc: stable@vger.kernel.org
Signed-off-by: Enzo Matsumiya <ematsumiya@suse.de>
Reviewed-by: Shyam Prasad N <sprasad@microsoft.com>
Reviewed-by: Paulo Alcantara (Red Hat) <pc@manguebit.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
[BUG]
Since v6.8 there are rare kernel crashes reported by various people,
the common factor is bad page status error messages like this:
BUG: Bad page state in process kswapd0 pfn:d6e840
page: refcount:0 mapcount:0 mapping:000000007512f4f2 index:0x2796c2c7c
pfn:0xd6e840
aops:btree_aops ino:1
flags: 0x17ffffe0000008(uptodate|node=0|zone=2|lastcpupid=0x3fffff)
page_type: 0xffffffff()
raw: 0017ffffe0000008 dead000000000100 dead000000000122 ffff88826d0be4c0
raw: 00000002796c2c7c 0000000000000000 00000000ffffffff 0000000000000000
page dumped because: non-NULL mapping
[CAUSE]
Commit 09e6cef19c9f ("btrfs: refactor alloc_extent_buffer() to
allocate-then-attach method") changes the sequence when allocating a new
extent buffer.
Previously we always called grab_extent_buffer() under
mapping->i_private_lock, to ensure the safety on modification on
folio::private (which is a pointer to extent buffer for regular
sectorsize).
This can lead to the following race:
Thread A is trying to allocate an extent buffer at bytenr X, with 4
4K pages, meanwhile thread B is trying to release the page at X + 4K
(the second page of the extent buffer at X).
Thread A                           | Thread B
-----------------------------------+-------------------------------------
                                   | btree_release_folio()
                                   | | This is for the page at X + 4K,
                                   | | Not page X.
                                   | |
alloc_extent_buffer()              | |- release_extent_buffer()
|- filemap_add_folio() for the     | |  |- atomic_dec_and_test(eb->refs)
|  page at bytenr X (the first     | |  |
|  page).                          | |  |
|  Which returned -EEXIST.         | |  |
|                                  | |  |
|- filemap_lock_folio()            | |  |
|  Returned the first page locked. | |  |
|                                  | |  |
|- grab_extent_buffer()            | |  |
|  |- atomic_inc_not_zero()        | |  |
|  |  Returned false               | |  |
|  |- folio_detach_private()       | |  |- folio_detach_private() for X
|  |- folio_test_private()         | |  |- folio_test_private()
|     Returned true                | |  |  Returned true
|- folio_put()                     | |- folio_put()
Now there are two puts on the same folio at folio X, leading to refcount
underflow of the folio X, and eventually causing the BUG_ON() on the
page->mapping.
The condition is not that easy to hit:
- The release must be triggered for the middle page of an eb
If the release is on the same first page of an eb, page lock would kick
in and prevent the race.
- folio_detach_private() has a very small race window
It's only between folio_test_private() and folio_clear_private().
That's exactly when mapping->i_private_lock is used to prevent such race,
and commit 09e6cef19c9f ("btrfs: refactor alloc_extent_buffer() to
allocate-then-attach method") screwed that up.
At that time, I thought the page lock would kick in as
filemap_release_folio() also requires the page to be locked, but forgot
the filemap_release_folio() only locks one page, not all pages of an
extent buffer.
[FIX]
Move all the code requiring i_private_lock into
attach_eb_folio_to_filemap(), so that everything is done with proper
lock protection.
Furthermore to prevent future problems, add an extra
lockdep_assert_locked() to ensure we're holding the proper lock.
A reproducer that is able to hit the race (takes a few minutes with
instrumented code inserting delays into alloc_extent_buffer()):
  #!/bin/sh
  drop_caches () {
      while(true); do
          echo 3 > /proc/sys/vm/drop_caches
          echo 1 > /proc/sys/vm/compact_memory
      done
  }

  run_tar () {
      while(true); do
          for x in `seq 1 80` ; do
              tar cf /dev/zero /mnt > /dev/null &
          done
          wait
      done
  }

  mkfs.btrfs -f -d single -m single /dev/vda
  mount -o noatime /dev/vda /mnt
  # create 200,000 files, 1K each
  ./simoop -n 200000 -E -f 1k /mnt
  drop_caches &
  (run_tar)
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/linux-btrfs/CAHk-=wgt362nGfScVOOii8cgKn2LVVHeOvOA7OBwg1OwbuJQcw@mail.gmail.com/
Reported-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com>
Link: https://lore.kernel.org/lkml/CABXGCsPktcHQOvKTbPaTwegMExije=Gpgci5NW=hqORo-s7diA@mail.gmail.com/
Reported-by: Toralf Förster <toralf.foerster@gmx.de>
Link: https://lore.kernel.org/linux-btrfs/e8b3311c-9a75-4903-907f-fc0f7a3fe423@gmx.de/
Reported-by: syzbot+f80b066392366b4af85e@syzkaller.appspotmail.com
Fixes: 09e6cef19c9f ("btrfs: refactor alloc_extent_buffer() to allocate-then-attach method")
CC: stable@vger.kernel.org # 6.8+
CC: Chris Mason <clm@fb.com>
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
The error handling in nilfs_empty_dir() when a directory folio/page read
fails is incorrect, as in the old ext2 implementation, and if the
folio/page cannot be read or nilfs_check_folio() fails, it will falsely
determine the directory as empty and corrupt the file system.
In addition, since nilfs_empty_dir() does not immediately return on a
failed folio/page read, but continues to loop, this can cause a long loop
with I/O if i_size of the directory's inode is also corrupted, causing the
log writer thread to wait and hang, as reported by syzbot.
Fix these issues by making nilfs_empty_dir() immediately return a false
value (0) if it fails to get a directory folio/page.
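The corrected control flow can be sketched as a loop that stops at the first failed read. This is a simplified model, not the nilfs2 code; struct dir_page and dir_is_empty() are hypothetical, and a NULL entry stands in for a folio/page that could not be read.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical in-memory stand-in for one directory page. */
struct dir_page { int live_entries; };

/* Returns 1 if empty, 0 otherwise; a failed read counts as "not empty". */
static int dir_is_empty(const struct dir_page *const *pages, size_t npages)
{
	for (size_t i = 0; i < npages; i++) {
		if (!pages[i])
			return 0;	/* read failed: stop, do not claim empty */
		if (pages[i]->live_entries > 0)
			return 0;
	}
	return 1;
}
```

Returning 0 at once avoids both problems named above: the directory is never reported empty on the strength of pages that were not read, and the loop cannot keep issuing I/O across a corrupted size.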
Link: https://lkml.kernel.org/r/20240604134255.7165-1-konishi.ryusuke@gmail.com
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Reported-by: syzbot+c8166c541d3971bf6c87@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=c8166c541d3971bf6c87
Fixes: 2ba466d74ed7 ("nilfs2: directory entry operations")
Tested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|