path: root/fs
Age | Commit message | Author
2017-08-16 | btrfs: Introduce a function to check if all chunks are OK for degraded rw mount | Qu Wenruo
Introduce a new function, btrfs_check_rw_degradable(), to check if all chunks in btrfs are OK for degraded rw mount. It provides the new basis for accurate btrfs mount/remount and even runtime degraded mount checks, replacing the old one-size-fits-all method. Btrfs currently uses num_tolerated_disk_barrier_failures to do a global check for tolerated missing devices. Although the one-size-fits-all solution is quite safe, it's too strict if data and metadata have different duplication levels. For example, if one uses Single data and RAID1 metadata on 2 disks, any missing device will make the fs unable to be degraded mounted. But in fact, sometimes all single chunks may be on the existing device, and in that case we should allow it to be rw degraded mounted. Such a case can be easily reproduced using the following script:

    # mkfs.btrfs -f -m raid1 -d single /dev/sdb /dev/sdc
    # wipefs -f /dev/sdc
    # mount /dev/sdb -o degraded,rw

If using btrfs-debug-tree to check /dev/sdb, one should find that the data chunk is only on sdb, so in fact it should allow degraded mount. This patchset introduces a new per-chunk degradable check for btrfs, allows the above case to succeed, and it's quite small anyway.
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com> Signed-off-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: David Sterba <dsterba@suse.com> [ copied text from cover letter with more details about the problem being solved ] Signed-off-by: David Sterba <dsterba@suse.com>
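The difference between a global check and a per-chunk check can be sketched in userspace C. This is only an illustration under stated assumptions: `struct chunk_model`, its fields, and `chunks_rw_degradable()` are hypothetical names, and the code is not the kernel's btrfs_check_rw_degradable().

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one chunk: which devices hold its stripes and how
 * many missing devices its profile tolerates (0 for single, 1 for RAID1). */
struct chunk_model {
	const int *devs;        /* device ids holding stripes of this chunk */
	size_t num_devs;
	int tolerated_failures;
};

/* Return true if every chunk has no more missing devices than its own
 * profile tolerates. dev_missing[id] is true when device id is absent. */
static bool chunks_rw_degradable(const struct chunk_model *chunks,
				 size_t num_chunks, const bool *dev_missing)
{
	for (size_t i = 0; i < num_chunks; i++) {
		int missing = 0;

		for (size_t j = 0; j < chunks[i].num_devs; j++)
			if (dev_missing[chunks[i].devs[j]])
				missing++;
		if (missing > chunks[i].tolerated_failures)
			return false;
	}
	return true;
}
```

With a RAID1 metadata chunk on devices 0 and 1 and a single data chunk on device 0 only, losing device 1 still passes this check, while losing device 0 does not.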
2017-08-16 | Btrfs: report errors when checksum is not found | Liu Bo
When btrfs fails the checksum check, it fills the whole page with "1". However, if %csum_expected is 0 (which means there is no checksum), then for some unknown reason we just pretend that the read is correct, so userspace is left with a confusing result: the read succeeds but returns a page whose content is all "1". This can happen due to a bug in btrfs-convert. Fix it by always returning an error if the checksum doesn't match. Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2017-08-16 | btrfs: Prevent possible ERR_PTR() dereference | Nikolay Borisov
In btrfs_full_stripe_len/btrfs_is_parity_mirror we have similar code which gets the chunk map for a particular range via get_chunk_map. However, get_chunk_map can return an ERR_PTR value and while the 2 callers do catch this with a WARN_ON they then proceed to indiscriminately dereference the extent map. This of course leads to a crash. Fix the offenders by making the dereference conditional on IS_ERR. Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
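The pattern behind this fix can be shown with a small userspace sketch. The `model_*` helpers below are stand-ins modeled on the kernel's ERR_PTR()/IS_ERR()/PTR_ERR(), and `lookup_map()` and `full_stripe_len()` are hypothetical functions, not the btrfs code.

```c
#include <errno.h>
#include <stdint.h>

#define MODEL_MAX_ERRNO 4095

/* Userspace stand-ins modeled on the kernel's ERR_PTR()/IS_ERR()/PTR_ERR(). */
static inline void *model_err_ptr(long err)
{
	return (void *)(intptr_t)err;
}

static inline int model_is_err(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-MODEL_MAX_ERRNO;
}

static inline long model_ptr_err(const void *p)
{
	return (long)(intptr_t)p;
}

struct map_model {
	unsigned long stripe_len;
};

/* Hypothetical lookup: fails with -EINVAL for logical address 0. */
static struct map_model *lookup_map(unsigned long logical)
{
	static struct map_model map = { .stripe_len = 65536 };

	if (logical == 0)
		return model_err_ptr(-EINVAL);
	return &map;
}

/* Only dereference the map when the lookup succeeded. */
static unsigned long full_stripe_len(unsigned long logical, unsigned long dflt)
{
	struct map_model *map = lookup_map(logical);

	if (model_is_err(map))
		return dflt;
	return map->stripe_len;
}
```

The point is that an error-encoded pointer is a non-NULL value, so a warning alone does not stop the later dereference; the access itself has to sit behind the error check.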
2017-08-16 | btrfs: Remove redundant checks from btrfs_alloc_data_chunk_ondemand | Nikolay Borisov
Many commits ago the data space_info in alloc_data_chunk_ondemand used to be acquired from the inode. At that point commit 33b4d47f5e24 ("Btrfs: deal with NULL space info") got introduced to deal with spurious cases where the space info could be null, following a rebalance. Nowadays, however, the space info is referenced directly from the btrfs_fs_info struct which is initialised at filesystem mount time. This makes the null checks redundant, so remove them. Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2017-08-16 | btrfs: Remove redundant argument of flush_space | Nikolay Borisov
All callers of flush_space pass the same number for orig/num_bytes arguments. Let's remove one of the numbers and also modify the trace point to show only a single number - bytes requested. Seems that last point where the two parameters were treated differently is before the ticketed enospc rework. Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2017-08-16 | btrfs: resume qgroup rescan on rw remount | Aleksa Sarai
Several distributions mount the "proper root" as ro during initrd and then remount it as rw before pivot_root(2). Thus, if a rescan had been aborted by a previous shutdown, the rescan would never be resumed. This issue would manifest itself as several btrfs ioctl(2)s causing the entire machine to hang when btrfs_qgroup_wait_for_completion was hit (due to the fs_info->qgroup_rescan_running flag being set but the rescan itself not being resumed). Notably, Docker's btrfs storage driver makes regular use of BTRFS_QUOTA_CTL_DISABLE and BTRFS_IOC_QUOTA_RESCAN_WAIT (causing this problem to be manifested on boot for some machines). Cc: <stable@vger.kernel.org> # v3.11+ Cc: Jeff Mahoney <jeffm@suse.com> Fixes: b382a324b60f ("Btrfs: fix qgroup rescan resume on mount") Signed-off-by: Aleksa Sarai <asarai@suse.de> Reviewed-by: Nikolay Borisov <nborisov@suse.com> Tested-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2017-08-16 | btrfs: clean up extraneous computations in add_delayed_refs | Edmund Nadolski
Repeating the same computation in multiple places is not necessary. Signed-off-by: Edmund Nadolski <enadolski@suse.com> Signed-off-by: Jeff Mahoney <jeffm@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2017-08-16 | btrfs: allow backref search checks for shared extents | Edmund Nadolski
When called with a struct share_check, find_parent_nodes() will detect a shared extent and immediately return with BACKREF_SHARED_FOUND. Signed-off-by: Edmund Nadolski <enadolski@suse.com> Signed-off-by: Jeff Mahoney <jeffm@suse.com> Reviewed-by: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>
2017-08-16 | btrfs: add cond_resched() calls when resolving backrefs | Edmund Nadolski
Since backref resolution is CPU-intensive, the cond_resched calls should help alleviate soft lockup occurrences. Signed-off-by: Edmund Nadolski <enadolski@suse.com> Signed-off-by: Jeff Mahoney <jeffm@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2017-08-16 | btrfs: backref, add tracepoints for prelim_ref insertion and merging | Jeff Mahoney
This patch adds a tracepoint event for prelim_ref insertion and merging. For each, the ref being inserted or merged and the count of tree nodes is issued. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2017-08-16 | btrfs: add a node counter to each of the rbtrees | Jeff Mahoney
This patch adds counters to each of the rbtrees so that we can tell how large they are growing for a given workload. These counters will be exported by tracepoints in the next patch. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2017-08-16 | btrfs: convert preliminary reference tracking to use rbtrees | Edmund Nadolski
It's been known for a while that the use of multiple lists that are periodically merged was an algorithmic problem within btrfs. There are several workloads that don't complete in any reasonable amount of time (e.g. btrfs/130) and others that cause soft lockups. The solution is to use a set of rbtrees that do insertion merging for both indirect and direct refs, with the former converting refs into the latter. The result is a btrfs/130 workload that used to take several hours now takes about half of that. This runtime still isn't acceptable and a future patch will address that by moving the rbtrees higher in the stack so the lookups can be shared across multiple calls to find_parent_nodes. Signed-off-by: Edmund Nadolski <enadolski@suse.com> Signed-off-by: Jeff Mahoney <jeffm@suse.com> Reviewed-by: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>
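"Insertion merging" means that a reference whose key already exists in the tree is folded into the existing node instead of being queued for a later merge pass. The sketch below shows the idea only. It assumes a single integer key and uses a plain unbalanced binary search tree in place of the kernel rbtree; `struct ref_tree`, `ref_tree_insert()` and `ref_tree_count()` are hypothetical names, and the `nodes` field loosely mirrors the per-tree node counter mentioned in the neighbouring commit.

```c
#include <stdint.h>
#include <stdlib.h>

struct ref_node {
	uint64_t key;
	int count;
	struct ref_node *left, *right;
};

struct ref_tree {
	struct ref_node *root;
	unsigned int nodes;     /* number of distinct keys in the tree */
};

/* Insert a reference; merge into an existing node when the keys match.
 * Returns 0 on success, -1 if allocation fails. */
static int ref_tree_insert(struct ref_tree *tree, uint64_t key, int count)
{
	struct ref_node **link = &tree->root;

	while (*link) {
		if (key == (*link)->key) {
			(*link)->count += count;   /* merge, no new node */
			return 0;
		}
		link = key < (*link)->key ? &(*link)->left : &(*link)->right;
	}
	*link = calloc(1, sizeof(**link));
	if (!*link)
		return -1;
	(*link)->key = key;
	(*link)->count = count;
	tree->nodes++;
	return 0;
}

/* Look up the merged count for a key, or 0 if the key is absent. */
static int ref_tree_count(const struct ref_tree *tree, uint64_t key)
{
	const struct ref_node *n = tree->root;

	while (n && n->key != key)
		n = key < n->key ? n->left : n->right;
	return n ? n->count : 0;
}
```

Because duplicates are merged on the way in, the structure never holds more than one node per key, which is what removes the need for periodic list merging.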
2017-08-16 | btrfs: remove ref_tree implementation from backref.c | Edmund Nadolski
Commit afce772e87c3 ("btrfs: fix check_shared for fiemap ioctl") added the ref_tree code in backref.c to reduce backref searching for shared extents under the FIEMAP ioctl. This code will not be compatible with the upcoming rbtree changes for improved backref searching, so this patch removes the ref_tree code. The rbtree changes will provide the equivalent functionality for FIEMAP. The above commit also introduced transaction semantics around calls to btrfs_check_shared() in order to accurately account for delayed refs. This functionality needs to be retained, so a complete revert of the above commit is not desirable. This patch therefore removes the ref_tree portion of the commit as above, however it does not remove the transaction portion. Signed-off-by: Edmund Nadolski <enadolski@suse.com> Signed-off-by: Jeff Mahoney <jeffm@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2017-08-16 | btrfs: btrfs_check_shared should manage its own transaction | Edmund Nadolski
Commit afce772e87c3 ("btrfs: fix check_shared for fiemap ioctl") added transaction semantics around calls to btrfs_check_shared() in order to provide accurate accounting of delayed refs. The transaction management should be done inside btrfs_check_shared(), so that callers do not need to manage transactions individually. Signed-off-by: Edmund Nadolski <enadolski@suse.com> Signed-off-by: Jeff Mahoney <jeffm@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2017-08-16 | btrfs: backref, cleanup __ namespace abuse | Jeff Mahoney
We typically use __ to indicate a helper routine that shouldn't be called directly without understanding the proper context required to do so. We use static functions to indicate that a function is private to a particular C file. The backref code uses static function and __ prefixes on nearly everything, which makes the code difficult to read and establishes a pattern for future code that shouldn't be followed. This patch drops all the unnecessary prefixes. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2017-08-16 | btrfs: backref, add unode_aux_to_inode_list helper | Jeff Mahoney
Replacing the double cast and ternary conditional with a helper makes the code easier on the eyes. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2017-08-16 | btrfs: backref, constify some arguments | Jeff Mahoney
This constifies a few buffers used in the backref code. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2017-08-16 | btrfs: constify tracepoint arguments | Jeff Mahoney
Tracepoint arguments are all read-only. If we mark the arguments as const, we're able to keep or convert those arguments to const where appropriate. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2017-08-16 | btrfs: struct-funcs, constify readers | Jeff Mahoney
We have reader helpers for most of the on-disk structures that use an extent_buffer and pointer as offset into the buffer that are read-only. We should mark them as const and, in turn, allow consumers of these interfaces to mark the buffers const as well. No impact on code, but serves as documentation that a buffer is intended not to be modified. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2017-08-16 | btrfs: remove unused sectorsize member | Nikolay Borisov
The sectorsize member of btrfs_block_group_cache is unused. So remove it, this reduces the number of holes in the struct.

With patch:

    /* size: 856, cachelines: 14, members: 40 */
    /* sum members: 837, holes: 4, sum holes: 19 */
    /* bit holes: 1, sum bit holes: 29 bits */
    /* last cacheline: 24 bytes */

Without patch:

    /* size: 864, cachelines: 14, members: 41 */
    /* sum members: 841, holes: 5, sum holes: 23 */
    /* bit holes: 1, sum bit holes: 29 bits */
    /* last cacheline: 32 bytes */

Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
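How removing one small member can also remove a hole is easier to see on a toy struct. The two structs below are hypothetical and far smaller than btrfs_block_group_cache, and the exact padding depends on the target ABI, so treat the numbers as an assumption about common 64-bit targets.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout with an unused 32-bit member between two 64-bit ones. */
struct with_unused {
	uint64_t start;
	uint32_t sectorsize;    /* unused member */
	uint64_t length;
};

/* Same layout with the unused member removed. */
struct without_unused {
	uint64_t start;
	uint64_t length;
};

/* Bytes of padding (the "hole") between sectorsize and length. */
static size_t hole_after_sectorsize(void)
{
	return offsetof(struct with_unused, length) -
	       (offsetof(struct with_unused, sectorsize) + sizeof(uint32_t));
}
```

On ABIs that align 64-bit integers to 8 bytes, the first struct carries a 4-byte hole after `sectorsize`, so dropping the member saves 8 bytes rather than 4, the same kind of effect the size and hole figures above report.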
2017-08-16 | btrfs: Be explicit about usage of min() | Nikolay Borisov
__btrfs_alloc_chunk contains code which boils down to: ndevs = min(ndevs, devs_max) It's conditional upon devs_max not being 0. However, it cannot really be 0 since it's always set to either BTRFS_MAX_DEVS_SYS_CHUNK or BTRFS_MAX_DEVS(fs_info->chunk_root). So eliminate the condition check and use min explicitly. This has no functional changes. Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
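The equivalence claimed here can be checked with a short sketch. The guarded form below is a guess at the shape of the removed condition, not a copy of __btrfs_alloc_chunk, and both function names are hypothetical.

```c
/* Guarded form: only clamp when devs_max is nonzero. */
static int clamp_guarded(int ndevs, int devs_max)
{
	if (devs_max && ndevs > devs_max)
		ndevs = devs_max;
	return ndevs;
}

/* Explicit form: a plain minimum, valid when devs_max is never 0. */
static int clamp_min(int ndevs, int devs_max)
{
	return ndevs < devs_max ? ndevs : devs_max;
}
```

The two forms agree for every nonzero `devs_max` and differ only when `devs_max` is 0, which is why the argument that it can never be 0 is what makes the change non-functional.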
2017-08-16 | btrfs: Use explicit round_down call rather than open-coding it | Nikolay Borisov
No functional changes. Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
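For readers unfamiliar with the idiom, rounding down to a power-of-two alignment is usually written with a mask. The message does not say which open-coded form was replaced, so the two helpers below are illustrative stand-ins with hypothetical names; the first is modeled on what the kernel's round_down() does for power-of-two alignments.

```c
#include <stdint.h>

/* Mask-based round down; align must be a power of two. */
static inline uint64_t round_down_pow2(uint64_t x, uint64_t align)
{
	return x & ~(align - 1);
}

/* Division-based round down; works for any nonzero align. */
static inline uint64_t round_down_div(uint64_t x, uint64_t align)
{
	return (x / align) * align;
}
```

For power-of-two alignments both give the same result, so naming the operation changes readability, not behavior.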
2017-08-16 | btrfs: convert while loop to list_for_each_entry | Nikolay Borisov
No functional changes, just make the loop a bit more readable Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2017-08-11 | Merge tag 'nfs-for-4.13-5' of git://git.linux-nfs.org/projects/anna/linux-nfs | Linus Torvalds
Pull NFS client fixes from Anna Schumaker: "A few more NFS client bugfixes from me for rc5. Dros has a stable fix for flexfiles to prevent leaking the nfs4_ff_ds_version arrays when freeing a layout, Trond fixed a potential recovery loop situation with the TEST_STATEID operation, and Christoph fixed up the pNFS blocklayout Kconfig options to prevent unsafe use with kernels that don't have large block device support.

Summary:

Stable fix:
 - fix leaking nfs4_ff_ds_version array

Other fixes:
 - improve TEST_STATEID OLD_STATEID handling to prevent recovery loop
 - require 64-bit sector_t for pNFS blocklayout to prevent 32-bit compile errors"

* tag 'nfs-for-4.13-5' of git://git.linux-nfs.org/projects/anna/linux-nfs:
  pnfs/blocklayout: require 64-bit sector_t
  NFSv4: Ignore NFS4ERR_OLD_STATEID in nfs41_check_open_stateid()
  nfs/flexfiles: fix leak of nfs4_ff_ds_version arrays
2017-08-11 | Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse | Linus Torvalds
Pull fuse fixes from Miklos Szeredi: "Fix a few bugs in fuse"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
  fuse: set mapping error in writepage_locked when it fails
  fuse: Dont call set_page_dirty_lock() for ITER_BVEC pages for async_dio
  fuse: initialize the flock flag in fuse_file on allocation
2017-08-11 | pnfs/blocklayout: require 64-bit sector_t | Christoph Hellwig
The blocklayout code does not compile cleanly for a 32-bit sector_t, and also has no reliable checks for device sizes, which makes it unsafe to use with a kernel that doesn't support large block devices. Signed-off-by: Christoph Hellwig <hch@lst.de> Reported-by: Arnd Bergmann <arnd@arndb.de> Fixes: 5c83746a0cf2 ("pnfs/blocklayout: in-kernel GETDEVICEINFO XDR parsing") Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2017-08-11 | fuse: set mapping error in writepage_locked when it fails | Jeff Layton
This ensures that we see errors on fsync when writeback fails. Signed-off-by: Jeff Layton <jlayton@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2017-08-10 | userfaultfd: replace ENOSPC with ESRCH in case mm has gone during copy/zeropage | Mike Rapoport
When the process exit races with an outstanding mcopy_atomic, it would be better to return an ESRCH error. When such a race occurs, the process and its mm are going away, and returning "no such process" to the uffd monitor seems a better fit than ENOSPC. Link: http://lkml.kernel.org/r/1502111545-32305-1-git-send-email-rppt@linux.vnet.ibm.com Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com> Suggested-by: Michal Hocko <mhocko@suse.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com> Cc: Pavel Emelyanov <xemul@virtuozzo.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-08-10 | mm: fix KSM data corruption | Minchan Kim
Nadav reported that KSM can corrupt user data through the TLB batching race [1]. That means data written by the user can be lost. Quote from Nadav Amit:

"For this race we need 4 CPUs:

CPU0: Caches a writable and dirty PTE entry, and uses the stale value for write later.
CPU1: Runs madvise_free on the range that includes the PTE. It would clear the dirty-bit. It batches TLB flushes.
CPU2: Writes 4 to /proc/PID/clear_refs , clearing the PTEs soft-dirty. We care about the fact that it clears the PTE write-bit, and of course, batches TLB flushes.
CPU3: Runs KSM. Our purpose is to pass the following test in write_protect_page():

    if (pte_write(*pvmw.pte) || pte_dirty(*pvmw.pte) ||
        (pte_protnone(*pvmw.pte) && pte_savedwrite(*pvmw.pte)))

Since it will avoid TLB flush. And we want to do it while the PTE is stale. Later, and before replacing the page, we would be able to change the page.

Note that all the operations the CPU1-3 perform can happen in parallel since they only acquire mmap_sem for read.

We start with two identical pages. Everything below regards the same page/PTE.

    CPU0                CPU1            CPU2              CPU3
    ----                ----            ----              ----
    Write the same
    value on page

    [cache PTE as
     dirty in TLB]

                        MADV_FREE
                        pte_mkclean()

                                        4 > clear_refs
                                        pte_wrprotect()

                                                          write_protect_page()
                                                          [ success, no flush ]

                                                          pages_identical()
                                                          [ ok ]

    Write to page
    different value

    [Ok, using stale
     PTE]

                                                          replace_page()

Later, CPU1, CPU2 and CPU3 would flush the TLB, but that is too late. CPU0 already wrote on the page, but KSM ignored this write, and it got lost"

In the above scenario, MADV_FREE is fixed by changing the TLB batching API, including [set|clear]_tlb_flush_pending. What remains is the soft-dirty part. This patch makes soft-dirty use the TLB batching API instead of flush_tlb_mm, and makes KSM check for pending TLB flushes with mm_tlb_flush_pending, so that it flushes the TLB to avoid data loss if other parallel threads have TLB flushes pending.
[1] http://lkml.kernel.org/r/BD3A0EBE-ECF4-41D4-87FA-C755EA9AB6BD@gmail.com Link: http://lkml.kernel.org/r/20170802000818.4760-8-namit@vmware.com Signed-off-by: Minchan Kim <minchan@kernel.org> Signed-off-by: Nadav Amit <namit@vmware.com> Reported-by: Nadav Amit <namit@vmware.com> Tested-by: Nadav Amit <namit@vmware.com> Reviewed-by: Andrea Arcangeli <aarcange@redhat.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Hugh Dickins <hughd@google.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jeff Dike <jdike@addtoit.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Nadav Amit <nadav.amit@gmail.com> Cc: Rik van Riel <riel@redhat.com> Cc: Russell King <linux@armlinux.org.uk> Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-08-10 | mm: fix global NR_SLAB_.*CLAIMABLE counter reads | Johannes Weiner
As Tetsuo points out: "Commit 385386cff4c6 ("mm: vmstat: move slab statistics from zone to node counters") broke "Slab:" field of /proc/meminfo . It shows nearly 0kB" In addition to /proc/meminfo, this problem also affects the slab counters in OOM/allocation failure info dumps, can cause early -ENOMEM from overcommit protection, and can miscalculate image size requirements during suspend-to-disk. This is because the patch in question switched the slab counters from the zone level to the node level, but forgot to update the global accessor functions to read the aggregate node data instead of the aggregate zone data. Use global_node_page_state() to access the global slab counters. Fixes: 385386cff4c6 ("mm: vmstat: move slab statistics from zone to node counters") Link: http://lkml.kernel.org/r/20170801134256.5400-1-hannes@cmpxchg.org Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Reported-by: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Josef Bacik <josef@toxicpanda.com> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: Stefan Agner <stefan@agner.ch> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-08-09 | NFSv4: Ignore NFS4ERR_OLD_STATEID in nfs41_check_open_stateid() | Trond Myklebust
If the call to TEST_STATEID returns NFS4ERR_OLD_STATEID, then it just means we raced with other calls to OPEN. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2017-08-08 | nfs/flexfiles: fix leak of nfs4_ff_ds_version arrays | Weston Andros Adamson
The client was freeing the nfs4_ff_layout_ds, but not the contained nfs4_ff_ds_version array. Signed-off-by: Weston Andros Adamson <dros@primarydata.com> Cc: stable@vger.kernel.org # v4.0+ Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2017-08-07 | Merge tag 'xfs-4.13-fixes-3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux | Linus Torvalds
Pull xfs fixes from Darrick Wong: "I have a couple more bug fixes for you today:

 - fix memory leak when issuing discard
 - fix propagation of the dax inode flag"

* tag 'xfs-4.13-fixes-3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
  xfs: Fix per-inode DAX flag inheritance
  xfs: Fix leak of discard bio
2017-08-06 | Merge tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 | Linus Torvalds
Pull ext4 fixes from Ted Ts'o: "A large number of ext4 bug fixes and cleanups for v4.13"

* tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
  ext4: fix copy paste error in ext4_swap_extents()
  ext4: fix overflow caused by missing cast in ext4_resize_fs()
  ext4, project: expand inode extra size if possible
  ext4: cleanup ext4_expand_extra_isize_ea()
  ext4: restructure ext4_expand_extra_isize
  ext4: fix forgotten xattr lock protection in ext4_expand_extra_isize
  ext4: make xattr inode reads faster
  ext4: inplace xattr block update fails to deduplicate blocks
  ext4: remove unused mode parameter
  ext4: fix warning about stack corruption
  ext4: fix dir_nlink behaviour
  ext4: silence array overflow warning
  ext4: fix SEEK_HOLE/SEEK_DATA for blocksize < pagesize
  ext4: release discard bio after sending discard commands
  ext4: convert swap_inode_data() over to use swap() on most of the fields
  ext4: error should be cleared if ea_inode isn't added to the cache
  ext4: Don't clear SGID when inheriting ACLs
  ext4: preserve i_mode if __ext4_set_acl() fails
  ext4: remove unused metadata accounting variables
  ext4: correct comment references to ext4_ext_direct_IO()
2017-08-06 | ext4: fix copy paste error in ext4_swap_extents() | Maninder Singh
This bug was found by a static code checker tool for copy paste problems. Signed-off-by: Maninder Singh <maninder1.s@samsung.com> Signed-off-by: Vaneet Narang <v.narang@samsung.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2017-08-06 | ext4: fix overflow caused by missing cast in ext4_resize_fs() | Jerry Lee
On a 32-bit platform, the value of n_blocks_count may be wrong when the file system is resized to a size larger than 2^32 blocks. This may cause the superblock to be corrupted with a zero blocks count. Fixes: 1c6bd7173d66 Signed-off-by: Jerry Lee <jerrylee@qnap.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@vger.kernel.org # 3.7+
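The class of bug is a multiplication done in 32 bits whose result is only widened afterwards. A minimal sketch, assuming 32-bit operands on both sides; the function names and the group and block figures are illustrative and not taken from ext4_resize_fs():

```c
#include <stdint.h>

/* Missing cast: the multiply happens in 32 bits and wraps before the
 * result is widened to 64 bits. */
static uint64_t blocks_without_cast(uint32_t groups, uint32_t blocks_per_group)
{
	return groups * blocks_per_group;
}

/* With the cast, one operand is widened first, so the multiply is 64-bit. */
static uint64_t blocks_with_cast(uint32_t groups, uint32_t blocks_per_group)
{
	return (uint64_t)groups * blocks_per_group;
}
```

With 131072 groups of 32768 blocks the true product is exactly 2^32, which the 32-bit multiply wraps to zero, in line with the zero block count described above.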
2017-08-06 | ext4, project: expand inode extra size if possible | Miao Xie
When upgrading from the old format, the first attempt to set a project id on an old file returns EOVERFLOW, but if that file has been dirtied (touch etc.), changing the project id is allowed. This might be confusing for users, so try to expand @i_extra_isize here too. Reported-by: Zhang Yi <yi.zhang@huawei.com> Signed-off-by: Miao Xie <miaoxie@huawei.com> Signed-off-by: Wang Shilong <wshilong@ddn.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2017-08-06 | ext4: cleanup ext4_expand_extra_isize_ea() | Miao Xie
Clean up some goto statements and make ext4_expand_extra_isize_ea() clearer. Signed-off-by: Miao Xie <miaoxie@huawei.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Wang Shilong <wshilong@ddn.com>
2017-08-06 | ext4: restructure ext4_expand_extra_isize | Miao Xie
The current ext4_expand_extra_isize just tries to expand the extra isize; if someone is holding the xattr lock or some check fails, it gives up. So rename it to ext4_try_to_expand_extra_isize. Besides that, clean up unnecessary checks and move some related checks into it. Signed-off-by: Miao Xie <miaoxie@huawei.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Wang Shilong <wshilong@ddn.com>
2017-08-06 | ext4: fix forgotten xattr lock protection in ext4_expand_extra_isize | Miao Xie
We should avoid the contention between the i_extra_isize update and the inline data insertion, so move the xattr trylock in front of i_extra_isize update. Signed-off-by: Miao Xie <miaoxie@huawei.com> Reviewed-by: Wang Shilong <wshilong@ddn.com>
2017-08-06 | ext4: make xattr inode reads faster | Tahsin Erdogan
ext4_xattr_inode_read() currently reads each block sequentially, waiting for each io operation to complete before moving on to the next block. This prevents request merging in the block layer. Add an ext4_bread_batch() function that starts reads for all blocks, then optionally waits for them to complete. Similar logic is used in ext4_find_entry(), so update that code to use the new function. Signed-off-by: Tahsin Erdogan <tahsin@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2017-08-05 | ext4: inplace xattr block update fails to deduplicate blocks | Tahsin Erdogan
When an xattr block has a single reference, block is updated inplace and it is reinserted to the cache. Later, a cache lookup is performed to see whether an existing block has the same contents. This cache lookup will most of the time return the just inserted entry so deduplication is not achieved. Running the following test script will produce two xattr blocks which can be observed in "File ACL: " line of debugfs output:

    mke2fs -b 1024 -I 128 -F -O extent /dev/sdb 1G
    mount /dev/sdb /mnt/sdb
    touch /mnt/sdb/{x,y}
    setfattr -n user.1 -v aaa /mnt/sdb/x
    setfattr -n user.2 -v bbb /mnt/sdb/x
    setfattr -n user.1 -v aaa /mnt/sdb/y
    setfattr -n user.2 -v bbb /mnt/sdb/y
    debugfs -R 'stat x' /dev/sdb | cat
    debugfs -R 'stat y' /dev/sdb | cat

This patch defers the reinsertion to the cache so that we can locate other blocks with the same contents. Signed-off-by: Tahsin Erdogan <tahsin@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Andreas Dilger <adilger@dilger.ca>
2017-08-05 | ext4: remove unused mode parameter | Tahsin Erdogan
ext4_alloc_file_blocks() does not use its mode parameter. Remove it. Signed-off-by: Tahsin Erdogan <tahsin@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2017-08-05 | ext4: fix warning about stack corruption | Arnd Bergmann
After commit 62d1034f53e3 ("fortify: use WARN instead of BUG for now"), we get a warning about possible stack overflow from a memcpy that was not strictly bounded to the size of the local variable:

    inlined from 'ext4_mb_seq_groups_show' at fs/ext4/mballoc.c:2322:2:
    include/linux/string.h:309:9: error: '__builtin_memcpy': writing between 161 and 1116 bytes into a region of size 160 overflows the destination [-Werror=stringop-overflow=]

We actually had a bug here that would have been found by the warning, but it was already fixed last year in commit 30a9d7afe70e ("ext4: fix stack memory corruption with 64k block size"). This replaces the fixed-length structure on the stack with a variable-length structure, using the correct upper bound that tells the compiler that everything is really fine here. I also change the loop count to check for the same upper bound for consistency, but the existing code is already correct here. Note that while clang won't allow certain kinds of variable-length arrays in structures, this particular instance is fine, as the array is at the end of the structure, and the size is strictly bounded. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2017-08-05 | ext4: fix dir_nlink behaviour | Andreas Dilger
The dir_nlink feature has been enabled by default for new ext4 filesystems since e2fsprogs-1.41 in 2008, and was automatically enabled by the kernel for older ext4 filesystems since the dir_nlink feature was added with ext4 in kernel 2.6.28+ when the subdirectory count exceeded EXT4_LINK_MAX-1. Automatically adding file system features such as dir_nlink is generally frowned upon, since it could cause the file system to not be mountable on older kernels, thus preventing the administrator from rolling back to an older kernel if necessary. In this case, the administrator might also want to disable the feature because glibc's fts_read() function does not correctly optimize directory traversal for directories that use an st_nlinks field of 1 to indicate that the number of links in the directory is not tracked by the file system, and could fail to traverse the full directory hierarchy. Fortunately, in the past ten years very few users have complained about incomplete file system traversal by glibc's fts_read(). This commit also changes ext4_inc_count() to allow i_nlinks to reach the full EXT4_LINK_MAX links on the parent directory (including "." and "..") before changing i_links_count to be 1. Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=196405 Signed-off-by: Andreas Dilger <adilger@dilger.ca> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2017-08-05 | ext4: silence array overflow warning | Dan Carpenter
I get a static checker warning:

    fs/ext4/ext4.h:3091 ext4_set_de_type() error: buffer overflow 'ext4_type_by_mode' 15 <= 15

It seems unlikely that we would hit this read overflow in real life, but it's also simple enough to make the array 16 bytes instead of 15. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
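The "15 <= 15" in the warning is an index that can equal the array length. A sketch of how that arises, assuming the usual file-type mask of 0170000 and a shift of 12; the macro names, the table contents and `mode_to_type()` are hypothetical and do not reproduce ext4's table:

```c
#define MODEL_IFMT  0170000u    /* file-type bits of a mode, assumed layout */
#define MODEL_SHIFT 12

/* (mode & MODEL_IFMT) >> MODEL_SHIFT ranges over 0..15, so the table needs
 * 16 entries; sizing it as MODEL_IFMT >> MODEL_SHIFT would give only 15. */
#define TABLE_SIZE ((MODEL_IFMT >> MODEL_SHIFT) + 1)

static const unsigned char type_by_mode[TABLE_SIZE] = {
	[0040000 >> MODEL_SHIFT] = 2,   /* directory, illustrative code */
	[0100000 >> MODEL_SHIFT] = 1,   /* regular file, illustrative code */
};

static unsigned char mode_to_type(unsigned int mode)
{
	return type_by_mode[(mode & MODEL_IFMT) >> MODEL_SHIFT];
}
```

A mode with all four type bits set is not a valid file type, which is why the overflow is unlikely in practice, but sizing the table for the full index range makes the lookup safe for any input.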
2017-08-05 | ext4: fix SEEK_HOLE/SEEK_DATA for blocksize < pagesize | Jan Kara
ext4_find_unwritten_pgoff() does not properly handle a situation when starting index is in the middle of a page and blocksize < pagesize. The following command shows the bug on filesystem with 1k blocksize:

    xfs_io -f -c "falloc 0 4k" \
              -c "pwrite 1k 1k" \
              -c "pwrite 3k 1k" \
              -c "seek -a -r 0" foo

In this example, neither lseek(fd, 1024, SEEK_HOLE) nor lseek(fd, 2048, SEEK_DATA) will return the correct result. Fix the problem by neglecting buffers in a page before starting offset. Reported-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Jan Kara <jack@suse.cz> CC: stable@vger.kernel.org # 3.8+
2017-08-05 | ext4: release discard bio after sending discard commands | Daeho Jeong
We've changed the discard command handling to run in parallel. But in that change I forgot to decrease the usage count of the bio which was used to send the discard request. I'm sorry about that. Fixes: a015434480dc ("ext4: send parallel discards on commit completions") Signed-off-by: Daeho Jeong <daeho.jeong@samsung.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz>
2017-08-04 | xfs: Fix per-inode DAX flag inheritance | Lukas Czerner
According to the commit that implemented per-inode DAX flag: commit 58f88ca2df72 ("xfs: introduce per-inode DAX enablement") the flag is supposed to act as "inherit flag". Currently this only works in the situations where parent directory already has a flag in di_flags set, otherwise inheritance does not work. This is because setting the XFS_DIFLAG2_DAX flag is done in a wrong branch designated for di_flags, not di_flags2. Fix this by moving the code to branch designated for setting di_flags2, which does test for flags in di_flags2. Fixes: 58f88ca2df72 ("xfs: introduce per-inode DAX enablement") Signed-off-by: Lukas Czerner <lczerner@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-08-04 | xfs: Fix leak of discard bio | Jan Kara
The bio describing discard operation is allocated by __blkdev_issue_discard() which returns us a reference to it. That reference is never released and thus we leak this bio. Drop the bio reference once it completes in xlog_discard_endio(). CC: stable@vger.kernel.org Fixes: 4560e78f40cb55bd2ea8f1ef4001c5baa88531c7 Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>