Age | Commit message (Collapse) | Author |
|
Add a new superblock section to keep counts of errors seen since
filesystem creation: we'll be addingcounters for every distinct fsck
error.
The new superblock section has entries of the for [ id, count,
time_of_last_error ]; this is intended to let us see what errors are
occuring - and getting fixed - via show-super output.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
We now track IO errors per device since filesystem creation.
IO error counts can be viewed in sysfs, or with the 'bcachefs
show-super' command.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
This fixes a use after free - mi is dangling after the resize call.
Additionally, resizing the device's member info section was useless - we
were attempting to preallocate the space required before adding it to
the filesystem superblock, but there's other sections that we should
have been preallocating as well for that to work.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
This fixes an incorrect memcpy() in the recent members_v2 code - a
members_v1 member is BCH_MEMBER_V1_BYTES, not sizeof(struct bch_member).
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
This adds a new btree, rebalance_work, to eliminate scanning required
for finding extents that need work done on them in the background - i.e.
for the background_target and background_compression options.
rebalance_work is a bitset btree, where a KEY_TYPE_set corresponds to an
extent in the extents or reflink btree at the same pos.
A new extent field is added, bch_extent_rebalance, which indicates that
this extent has work that needs to be done in the background - and which
options to use. This allows per-inode options to be propagated to
indirect extents - at least in some circumstances. In this patch,
changing IO options on a file will not propagate the new options to
indirect extents pointed to by that file.
Updating (setting/clearing) the rebalance_work btree is done by the
extent trigger, which looks at the bch_extent_rebalance field.
Scanning is still requrired after changing IO path options - either just
for a given inode, or for the whole filesystem. We indicate that
scanning is required by adding a KEY_TYPE_cookie key to the
rebalance_work btree: the cookie counter is so that we can detect that
scanning is still required when an option has been flipped mid-way
through an existing scan.
Future possible work:
- Propagate options to indirect extents when being changed
- Add other IO path options - nr_replicas, ec, to rebalance_work so
they can be applied in the background when they change
- Add a counter, for bcachefs fs usage output, showing the pending
amount of rebalance work: we'll probably want to do this after the
disk space accounting rewrite (moving it to a new btree)
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
New helper for new rebalance code
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
data_progress_list is gone - it was redundant with moving_context_list
The upcoming rebalance rewrite is going to have it using two different
move_stats objects with the same moving_context, depending on whether
it's scanning or using the rebalance_work btree - this patch plumbs
stats around a bit differently so that will work.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
btree_trans and moving_context are used together, and having the
moving_context owns the transaction object reduces some plumbing.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Prep work for the new rebalance code - we need a few helpers exported.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Since compression options now include compression level, proper
validation is a bit more involved.
This adds bch2_compression_opt_valid(), and plumbs it around
appropriately.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
This fixes an infinite loop when there's a set bit at position >= 32.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
We don't yet repair (split) them, just check.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
The write path may (rarely) see an encoded (checksummed) extent that
exceeds encoded_extent_max - this can happen when we're moving an
existing extent that was not checksummed, but was given a checksum by
bch2_write_rechecksum().
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
We're going to be using bch2_target_to_text() ->
bch2_disk_path_to_text() from bch2_bkey_ptrs_to_text() and
bch2_bkey_ptrs_invalid(), which can be called in any context.
This patch adds the actual label to bch_disk_group_cpu so that it can be
used by bch2_disk_path_to_text, and splits out bch2_disk_path_to_text()
into two variants - like the previous patch, one for when we have a
running filesystem and another for when we only have a superblock.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Previously we just had bch2_opt_target_to_text() which could be passed
either a filesystem object or just a superblock - depending on if we
have a running filesystem or not.
Split these into two functions for clarity.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Upcoming rebalance_work btree will require extent triggers to be
BTREE_TRIGGER_WANTS_OLD_AND_NEW - so to reduce potential confusion,
let's just make all triggers BTREE_TRIGGER_WANTS_OLD_AND_NEW.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
The data move path now correctly picks IO options when inodes in
different snapshots have different options applied.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
We can't mark device superblocks or allocate journal on a device that
isn't online.
That means we may need to do this on every mount, because we may have
formatted a new filesystem and then done the first mount
(bch2_fs_initialize()) in degraded mode.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
This code duplicated initialization already done in
bch2_fs_btree_iter_init().
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
The ca->oldest_gen array needs to be the same size as the bucket_gens
array; ca->mi.nbuckets is updated with only state_lock held, not
gc_lock, so bch2_gc_gens() could race with device resize and allocate
too small of an oldest_gens array.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Shrinkers are now exported to debugfs, so the names can't have slashes
in them.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
More forwards compatibility fixups: having BKEY_TYPE_btree at the end of
the enum conflicts with unnkown btree IDs, this shifts BKEY_TYPE_btree
to slot 0 and fixes things up accordingly.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Since we can run with unknown btree IDs, we can't directly index btree
IDs into fixed size arrays.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Be a bit more careful about when bch2_delete_dead_snapshots needs to
run: it only needs to run synchronously if we're running fsck, and it
only needs to run at all if we have snapshot nodes to delete or if fsck
has noticed that it needs to run.
Also:
Rename BCH_FS_HAVE_DELETED_SNAPSHOTS -> BCH_FS_NEED_DELETE_DEAD_SNAPSHOTS
Kill bch2_delete_dead_snapshots_hook(), move functionality to
bch2_mark_snapshot()
Factor out bch2_check_snapshot_needs_deletion(), to explicitly check
if we need to be running snapshot deletion.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
We must not hold btree locks while taking snapshot_create_lock - this
fixes a lockdep splat.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
As pointed out by Linus, closure_sync() was racy; we could skip blocking
immediately after a get() and a put(), but then that would skip any
barrier corresponding to the other thread's put() barrier.
To fix this, always do the full __closure_sync() sequence whenever any
get() has happened and the closure might have been used by other
threads.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Pull initial bcachefs updates from Kent Overstreet:
"Here's the bcachefs filesystem pull request.
One new patch since last week: the exportfs constants ended up
conflicting with other filesystems that are also getting added to the
global enum, so switched to new constants picked by Amir.
The only new non fs/bcachefs/ patch is the objtool patch that adds
bcachefs functions to the list of noreturns. The patch that exports
osq_lock() has been dropped for now, per Ingo"
* tag 'bcachefs-2023-10-30' of https://evilpiepirate.org/git/bcachefs: (2781 commits)
exportfs: Change bcachefs fid_type enum to avoid conflicts
bcachefs: Refactor memcpy into direct assignment
bcachefs: Fix drop_alloc_keys()
bcachefs: snapshot_create_lock
bcachefs: Fix snapshot skiplists during snapshot deletion
bcachefs: bch2_sb_field_get() refactoring
bcachefs: KEY_TYPE_error now counts towards i_sectors
bcachefs: Fix handling of unknown bkey types
bcachefs: Switch to unsafe_memcpy() in a few places
bcachefs: Use struct_size()
bcachefs: Correctly initialize new buckets on device resize
bcachefs: Fix another smatch complaint
bcachefs: Use strsep() in split_devs()
bcachefs: Add iops fields to bch_member
bcachefs: Rename bch_sb_field_members -> bch_sb_field_members_v1
bcachefs: New superblock section members_v2
bcachefs: Add new helper to retrieve bch_member from sb
bcachefs: bucket_lock() is now a sleepable lock
bcachefs: fix crc32c checksum merge byte order problem
bcachefs: Fix bch2_inode_delete_keys()
...
|
|
The memcpy() in bch2_bkey_append_ptr() is operating on an embedded fake
flexible array which looks to the compiler like it has 0 size. This
causes W=1 builds to emit warnings due to -Wstringop-overflow:
In file included from include/linux/string.h:254,
from include/linux/bitmap.h:11,
from include/linux/cpumask.h:12,
from include/linux/smp.h:13,
from include/linux/lockdep.h:14,
from include/linux/radix-tree.h:14,
from include/linux/backing-dev-defs.h:6,
from fs/bcachefs/bcachefs.h:182:
fs/bcachefs/extents.c: In function 'bch2_bkey_append_ptr':
include/linux/fortify-string.h:57:33: warning: writing 8 bytes into a region of size 0 [-Wstringop-overflow=]
57 | #define __underlying_memcpy __builtin_memcpy
| ^
include/linux/fortify-string.h:648:9: note: in expansion of macro '__underlying_memcpy'
648 | __underlying_##op(p, q, __fortify_size); \
| ^~~~~~~~~~~~~
include/linux/fortify-string.h:693:26: note: in expansion of macro '__fortify_memcpy_chk'
693 | #define memcpy(p, q, s) __fortify_memcpy_chk(p, q, s, \
| ^~~~~~~~~~~~~~~~~~~~
fs/bcachefs/extents.c:235:17: note: in expansion of macro 'memcpy'
235 | memcpy((void *) &k->v + bkey_val_bytes(&k->k),
| ^~~~~~
fs/bcachefs/bcachefs_format.h:287:33: note: destination object 'v' of size 0
287 | struct bch_val v;
| ^
Avoid making any structure changes and just replace the u64 copy into a
direct assignment, side-stepping the entire problem.
Cc: Kent Overstreet <kent.overstreet@linux.dev>
Cc: Brian Foster <bfoster@redhat.com>
Cc: linux-bcachefs@vger.kernel.org
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202309192314.VBsjiIm5-lkp@intel.com/
Link: https://lore.kernel.org/r/20231010235609.work.594-kees@kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
For consistency with the rest of the reconstruct_alloc option, we should
be skipping all alloc keys.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Add a new lock for snapshot creation - this addresses a few races with
logged operations and snapshot deletion.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
In snapshot deleion, we have to pick new skiplist nodes for entries that
point to nodes being deleted.
The function that finds a new skiplist node, skipping over entries being
deleted, was incorrect: if n = 0, but the parent node is being deleted,
we also need to skip over that node.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Instead of using token pasting to generate methods for each superblock
section, just make the type a parameter to bch2_sb_field_get().
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
KEY_TYPE_error is used when all replicas in an extent are marked as
failed; it indicates that data was present, but has been lost.
So that i_sectors doesn't change when replacing extents with
KEY_TYPE_error, we now have to count error keys as allocations - this
fixes fsck errors later.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
min_val_size was U8_MAX for unknown key types, causing us to flag any
known key as invalid - it should have been 0.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
The new fortify checking doesn't work for us in all places; this
switches to unsafe_memcpy() where appropriate to silence a few
warnings/errors.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Use struct_size() instead of hand writing it.
This is less verbose and more robust.
While at it, prepare for the coming implementation by GCC and Clang of the
__counted_by attribute. Flexible array members annotated with __counted_by
can have their accesses bounds-checked at run-time checking via
CONFIG_UBSAN_BOUNDS (for array indexing) and CONFIG_FORTIFY_SOURCE (for
strcpy/memcpy-family functions).
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
bch2_dev_resize() was never updated for the allocator rewrite with
persistent freelists, and it wasn't noticed because the tests weren't
running fsck - oops.
Fix this by running bch2_dev_freespace_init() for the new buckets.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
This should be harmless, but initialize last_seq anyways.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Minor refactoring to fix a smatch complaint.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Signed-off-by: Hunter Shaffer <huntershaffer182456@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Signed-off-by: Hunter Shaffer <huntershaffer182456@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
members_v2 has dynamically resizable entries so that we can extend
bch_member. The members can no longer be accessed with simple array
indexing Instead members_v2_get is used to find a member's exact
location within the array and returns a copy of that member.
Alternatively member_v2_get_mut retrieves a mutable point to a member.
Signed-off-by: Hunter Shaffer <huntershaffer182456@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Prep work for introducing bch_sb_field_members_v2 - introduce new
helpers that will check for members_v2 if it exists, otherwise using v1
Signed-off-by: Hunter Shaffer <huntershaffer182456@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
fsck_err() may sleep - it takes a mutex and may allocate memory, so
bucket_lock() needs to be a sleepable lock.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|