summaryrefslogtreecommitdiff
path: root/lib
AgeCommit message (Collapse)Author
2017-02-20Merge branch 'timers-core-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer updates from Thomas Gleixner: "Nothing exciting, just the usual pile of fixes, updates and cleanups: - A bunch of clocksource driver updates - Removal of CONFIG_TIMER_STATS and the related /proc file - More posix timer slim down work - A scalability enhancement in the tick broadcast code - Math cleanups" * 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (23 commits) hrtimer: Catch invalid clockids again math64, tile: Fix build failure clocksource/drivers/arm_arch_timer:: Mark cyclecounter __ro_after_init timerfd: Protect the might cancel mechanism proper timer_list: Remove useless cast when printing time: Remove CONFIG_TIMER_STATS clocksource/drivers/arm_arch_timer: Work around Hisilicon erratum 161010101 clocksource/drivers/arm_arch_timer: Introduce generic errata handling infrastructure clocksource/drivers/arm_arch_timer: Remove fsl-a008585 parameter clocksource/drivers/arm_arch_timer: Add dt binding for hisilicon-161010101 erratum clocksource/drivers/ostm: Add renesas-ostm timer driver clocksource/drivers/ostm: Document renesas-ostm timer DT bindings clocksource/drivers/tcb_clksrc: Use 32 bit tcb as sched_clock clocksource/drivers/gemini: Add driver for the Cortina Gemini clocksource: add DT bindings for Cortina Gemini clockevents: Add a clkevt-of mechanism like clksrc-of tick/broadcast: Reduce lock cacheline contention timers: Omit POSIX timer stuff from task_struct when disabled x86/timer: Make delay() work during early bootup delay: Add explanation of udelay() inaccuracy ...
2017-02-17Merge branch 'for-4.11/block' into for-4.11/linus-mergeJens Axboe
Signed-off-by: Jens Axboe <axboe@fb.com>
2017-02-17rhashtable: Add nested tablesHerbert Xu
This patch adds code that handles GFP_ATOMIC kmalloc failure on insertion. As we cannot use vmalloc, we solve it by making our hash table nested. That is, we allocate single pages at each level and reach our desired table size by nesting them. When a nested table is created, only a single page is allocated at the top-level. Lower levels are allocated on demand during insertion. Therefore for each insertion to succeed, only two (non-consecutive) pages are needed. After a nested table is created, a rehash will be scheduled in order to switch to a vmalloced table as soon as possible. Also, the rehash code will never rehash into a nested table. If we detect a nested table during a rehash, the rehash will be aborted and a new rehash will be scheduled. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-16usercopy: Adjust tests to deal with SMAP/PANKees Cook
Under SMAP/PAN/etc, we cannot write directly to userspace memory, so this rearranges the test bytes to get written through copy_to_user(). Additionally drops the bad copy_from_user() test that would trigger a memcpy() against userspace on failure. Signed-off-by: Kees Cook <keescook@chromium.org>
2017-02-16usercopy: add testcases to check zeroing on failureHoeun Ryu
During usercopy the destination buffer will be zeroed if copy_from_user() or get_user() fails. This patch adds testcases for it. The destination buffer is set with non-zero value before illegal copy_from_user() or get_user() is executed and the buffer is compared to zero after usercopy is done. Signed-off-by: Hoeun Ryu <hoeun.ryu@gmail.com> [kees: clarified commit log, dropped second kmalloc] Signed-off-by: Kees Cook <keescook@chromium.org>
2017-02-13idr: Add missing __rcu annotationsMatthew Wilcox
Where we use the radix tree iteration macros, we need to annotate 'slot' with __rcu. Make sure we don't forget any new places in the future with the same CFLAGS check used for radix-tree.c. Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
2017-02-13radix-tree: Fix __rcu annotationsMatthew Wilcox
Many places were missing __rcu annotations. A few places needed a few lines of explanation about why it was safe to not use RCU accessors. Add a custom CFLAGS setting to the Makefile to ensure that new patches don't miss RCU annotations. Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
2017-02-13radix-tree: Add rcu_dereference and rcu_assign_pointer callsMatthew Wilcox
Some of these have been missing for many years. Others were recently introduced by me. Fortunately, we have tools that help us find such things. Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
2017-02-13radix_tree_iter_resume: Fix out of bounds errorMatthew Wilcox
The address sanitizer occasionally finds an out of bounds error while running the test-suite. It turned out to be a read of the pointer immediately next to the tree root, but this out of bounds error could have occurred elsewhere. This happens because radix_tree_iter_resume() dereferences 'slot' before checking whether we've come to the end of the chunk. We can just delete this line; the value was never used. Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
2017-02-13radix-tree: Store a pointer to the root in each nodeMatthew Wilcox
Instead of having this mysterious private_data in each radix_tree_node, store a pointer to the root, which can be useful for debugging. This also relieves the mm code from the duty of updating it. Acked-by: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
2017-02-13radix-tree: Chain preallocated nodes through ->parentMatthew Wilcox
Chaining through the ->private_data member means we have to zero ->private_data after removing preallocated nodes from the list. We're about to initialise ->parent anyway, so we can avoid zeroing it. Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
2017-02-13ida: Use exceptional entries for small IDAsMatthew Wilcox
We can use the root entry as a bitmap and save allocating a 128 byte bitmap for an IDA that contains only a few entries (30 on a 32-bit machine, 62 on a 64-bit machine). This costs about 300 bytes of kernel text on x86-64, so as long as 3 IDAs fall into this category, this is a net win for memory consumption. Thanks to Rasmus Villemoes for his work documenting the problem and collecting statistics on IDAs. Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
2017-02-13ida: Move ida_bitmap to a percpu variableMatthew Wilcox
When we preload the IDA, we allocate an IDA bitmap. Instead of storing that preallocated bitmap in the IDA, we store it in a percpu variable. Generally there are more IDAs in the system than CPUs, so this cuts down on the number of preallocated bitmaps that are unused, and about half of the IDA users did not call ida_destroy() so they were leaking IDA bitmaps. Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
2017-02-13Reimplement IDR and IDA using the radix treeMatthew Wilcox
The IDR is very similar to the radix tree. It has some functionality that the radix tree did not have (alloc next free, cyclic allocation, a callback-based for_each, destroy tree), which is readily implementable on top of the radix tree. A few small changes were needed in order to use a tag to represent nodes with free space below them. More extensive changes were needed to support storing NULL as a valid entry in an IDR. Plain radix trees still interpret NULL as a not-present entry. The IDA is reimplemented as a client of the newly enhanced radix tree. As in the current implementation, it uses a bitmap at the last level of the tree. Signed-off-by: Matthew Wilcox <willy@infradead.org> Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com> Tested-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Konstantin Khlebnikov <koct9i@gmail.com> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: Tejun Heo <tj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2017-02-13radix-tree: Add radix_tree_iter_deleteMatthew Wilcox
Factor the deletion code out into __radix_tree_delete() and provide a nice iterator-based wrapper around it. If we free the node, advance the iterator to avoid reading from freed memory. Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
2017-02-13radix-tree: Add radix_tree_iter_tag_clear()Matthew Wilcox
The counterpart to radix_tree_iter_tag_set(), used by the IDR code Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com> Reviewed-by: Rehas Sachdeva <aquannie@gmail.com>
2017-02-13EXPORT_SYMBOL radix_tree_replace_slotSong Liu
It will be used in drivers/md/raid5-cache.c Signed-off-by: Song Liu <songliubraving@fb.com> Signed-off-by: Shaohua Li <shli@fb.com>
2017-02-10time: Remove CONFIG_TIMER_STATSKees Cook
Currently CONFIG_TIMER_STATS exposes process information across namespaces: kernel/time/timer_list.c print_timer(): SEQ_printf(m, ", %s/%d", tmp, timer->start_pid); /proc/timer_list: #11: <0000000000000000>, hrtimer_wakeup, S:01, do_nanosleep, cron/2570 Given that the tracer can give the same information, this patch entirely removes CONFIG_TIMER_STATS. Suggested-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Kees Cook <keescook@chromium.org> Acked-by: John Stultz <john.stultz@linaro.org> Cc: Nicolas Pitre <nicolas.pitre@linaro.org> Cc: linux-doc@vger.kernel.org Cc: Lai Jiangshan <jiangshanlai@gmail.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Xing Gao <xgao01@email.wm.edu> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Jessica Frazelle <me@jessfraz.com> Cc: kernel-hardening@lists.openwall.com Cc: Nicolas Iooss <nicolas.iooss_linux@m4x.org> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: Petr Mladek <pmladek@suse.com> Cc: Richard Cochran <richardcochran@gmail.com> Cc: Tejun Heo <tj@kernel.org> Cc: Michal Marek <mmarek@suse.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Olof Johansson <olof@lixom.net> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: linux-api@vger.kernel.org Cc: Arjan van de Ven <arjan@linux.intel.com> Link: http://lkml.kernel.org/r/20170208192659.GA32582@beast Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-02-10debugobjects: Improve variable namingWaiman Long
As suggested by Ingo, the debug_objects_alloc counter is now renamed to debug_objects_allocated with minor twist in comment and debug output. Signed-off-by: Waiman Long <longman@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1486503630-1501-1-git-send-email-longman@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-02-10refcount_t: Introduce a special purpose refcount typePeter Zijlstra
Provide refcount_t, an atomic_t like primitive built just for refcounting. It provides saturation semantics such that overflow becomes impossible and thereby 'spurious' use-after-free is avoided. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-02-08printk: rename nmi.c and exported apiSergey Senozhatsky
A preparation patch for printk_safe work. No functional change. - rename nmi.c to print_safe.c - add `printk_safe' prefix to some (which used both by printk-safe and printk-nmi) of the exported functions. Link: http://lkml.kernel.org/r/20161227141611.940-3-sergey.senozhatsky@gmail.com Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Jan Kara <jack@suse.cz> Cc: Tejun Heo <tj@kernel.org> Cc: Calvin Owens <calvinowens@fb.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Peter Hurley <peter@hurleysoftware.com> Cc: linux-kernel@vger.kernel.org Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Signed-off-by: Petr Mladek <pmladek@suse.com>
2017-02-06Merge 4.10-rc7 into char-misc-nextGreg Kroah-Hartman
We want the hv and other fixes in here as well to handle merge and testing issues. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-02-05debugobjects: Reduce contention on the global pool_lockWaiman Long
On a large SMP system with many CPUs, the global pool_lock may become a performance bottleneck as all the CPUs that need to allocate or free debug objects have to take the lock. That can sometimes cause soft lockups like: NMI watchdog: BUG: soft lockup - CPU#35 stuck for 22s! [rcuos/1:21] ... RIP: 0010:[<ffffffff817c216b>] [<ffffffff817c216b>] _raw_spin_unlock_irqrestore+0x3b/0x60 ... Call Trace: [<ffffffff813f40d1>] free_object+0x81/0xb0 [<ffffffff813f4f33>] debug_check_no_obj_freed+0x193/0x220 [<ffffffff81101a59>] ? trace_hardirqs_on_caller+0xf9/0x1c0 [<ffffffff81284996>] ? file_free_rcu+0x36/0x60 [<ffffffff81251712>] kmem_cache_free+0xd2/0x380 [<ffffffff81284960>] ? fput+0x90/0x90 [<ffffffff81284996>] file_free_rcu+0x36/0x60 [<ffffffff81124c23>] rcu_nocb_kthread+0x1b3/0x550 [<ffffffff81124b71>] ? rcu_nocb_kthread+0x101/0x550 [<ffffffff81124a70>] ? sync_exp_work_done.constprop.63+0x50/0x50 [<ffffffff810c59d1>] kthread+0x101/0x120 [<ffffffff81101a59>] ? trace_hardirqs_on_caller+0xf9/0x1c0 [<ffffffff817c2d32>] ret_from_fork+0x22/0x50 To reduce the amount of contention on the pool_lock, the actual kmem_cache_free() of the debug objects will be delayed if the pool_lock is busy. This will temporarily increase the amount of free objects available at the free pool when the system is busy. As a result, the number of kmem_cache allocation and freeing is reduced. To further reduce the lock operations free debug objects in batches of four. Signed-off-by: Waiman Long <longman@redhat.com> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: "Du Changbin" <changbin.du@intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Jan Stancek <jstancek@redhat.com> Link: http://lkml.kernel.org/r/1483647425-4135-4-git-send-email-longman@redhat.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-02-04debugobjects: Scale thresholds with # of CPUsWaiman Long
On a large SMP systems with hundreds of CPUs, the current thresholds for allocating and freeing debug objects (256 and 1024 respectively) may not work well. This can cause a lot of needless calls to kmem_aloc() and kmem_free() on those systems. To alleviate this thrashing problem, the object freeing threshold is now increased to "1024 + # of CPUs * 32". Whereas the object allocation threshold is increased to "256 + # of CPUs * 4". That should make the debug objects subsystem scale better with the number of CPUs available in the system. Signed-off-by: Waiman Long <longman@redhat.com> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: "Du Changbin" <changbin.du@intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Jan Stancek <jstancek@redhat.com> Link: http://lkml.kernel.org/r/1483647425-4135-3-git-send-email-longman@redhat.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-02-04debugobjects: Track number of kmem_cache_alloc/kmem_cache_free doneWaiman Long
New debugfs stat counters are added to track the numbers of kmem_cache_alloc() and kmem_cache_free() function calls to get a sense of how the internal debug objects cache management is performing. Signed-off-by: Waiman Long <longman@redhat.com> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: "Du Changbin" <changbin.du@intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Jan Stancek <jstancek@redhat.com> Link: http://lkml.kernel.org/r/1483647425-4135-2-git-send-email-longman@redhat.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-02-03lib: Introduce priority array area managerJiri Pirko
This introduces a infrastructure for management of linear priority areas. Priority order in an array matters, however order of items inside a priority group does not matter. As an initial implementation, L-sort algorithm is used. It is quite trivial. More advanced algorithm called P-sort will be introduced as a follow-up. The infrastructure is prepared for other algos. Alongside this, a testing module is introduced as well. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-02ext4: move halfmd4 into hash.c directlyJason A. Donenfeld
The "half md4" transform should not be used by any new code. And fortunately, it's only used now by ext4. Since ext4 supports several hashing methods, at some point it might be desirable to move to something like SipHash. As an intermediate step, remove half md4 from cryptohash.h and lib, and make it just a local function in ext4's hash.c. There's precedent for doing this; the other function ext can use for its hashes -- TEA -- is also implemented in the same place. Also, by being a local function, this might allow gcc to perform some additional optimizations. Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Reviewed-by: Andreas Dilger <adilger@dilger.ca> Cc: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2017-02-01Merge tag 'drm-misc-next-2017-01-30' of ↵Dave Airlie
git://anongit.freedesktop.org/git/drm-misc into drm-next Another round of -misc stuff: - Noralf debugfs cleanup cleanup (not yet everything, some more driver patches awaiting acks). - More doc work. - edid/infoframe fixes from Ville. - misc 1-patch fixes all over, as usual Noralf needs this for his tinydrm pull request. * tag 'drm-misc-next-2017-01-30' of git://anongit.freedesktop.org/git/drm-misc: (48 commits) drm/vc4: Remove vc4_debugfs_cleanup() dma/fence: Export enable-signaling tracepoint for emission by drivers drm/tilcdc: Remove tilcdc_debugfs_cleanup() drm/tegra: Remove tegra_debugfs_cleanup() drm/sti: Remove drm_debugfs_remove_files() calls drm/radeon: Remove drm_debugfs_remove_files() call drm/omap: Remove omap_debugfs_cleanup() drm/hdlcd: Remove hdlcd_debugfs_cleanup() drm/etnaviv: Remove etnaviv_debugfs_cleanup() drm/etnaviv: allow build with COMPILE_TEST drm/amd/amdgpu: Remove drm_debugfs_remove_files() call drm/prime: Clarify DMA-BUF/GEM Object lifetime drm/ttm: Make sure BOs being swapped out are cacheable drm/atomic: Remove drm_atomic_debugfs_cleanup() drm: drm_minor_register(): Clean up debugfs on failure drm: debugfs: Remove all files automatically on cleanup drm/fourcc: add vivante tiled layout format modifiers drm/edid: Set YQ bits in the AVI infoframe according to CEA-861-F drm/edid: Set AVI infoframe Q even when QS=0 drm/edid: Introduce drm_hdmi_avi_infoframe_quant_range() ...
2017-01-31Merge branch 'for-mingo' of ↵Ingo Molnar
git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into core/rcu Pull RCU changes from Paul E. McKenney: - Dynticks updates, consolidating open-coded counter accesses into a well-defined API - SRCU updates: Simplify algorithm, add formal verification - Documentation updates - Miscellaneous fixes - Torture-test updates Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-01-28Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Two trivial overlapping changes conflicts in MPLS and mlx5. Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-27radix tree: constify some pointersMatthew Wilcox
If we're just getting the value of a tag, or looking up an entry, we won't modify the radix tree, so we can declare these functions as taking a const pointer. Mostly for documentation purposes, though it might help code generation. Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
2017-01-27sbitmap: add helpers for dumping to a seq_fileOmar Sandoval
This is useful debugging information that will be used in the blk-mq debugfs directory. Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Omar Sandoval <osandov@fb.com> Changed 'weight' to 'busy'. Signed-off-by: Jens Axboe <axboe@fb.com>
2017-01-27Merge branch 'master' of ↵Dave Airlie
git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux into drm-next Backmerge Linus master to get the connector locking revert. * 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux: (645 commits) sysctl: fix proc_doulongvec_ms_jiffies_minmax() Revert "drm/probe-helpers: Drop locking from poll_enable" MAINTAINERS: add Dan Streetman to zbud maintainers MAINTAINERS: add Dan Streetman to zswap maintainers mm: do not export ioremap_page_range symbol for external module mn10300: fix build error of missing fpu_save() romfs: use different way to generate fsid for BLOCK or MTD frv: add missing atomic64 operations mm, page_alloc: fix premature OOM when racing with cpuset mems update mm, page_alloc: move cpuset seqcount checking to slowpath mm, page_alloc: fix fast-path race with cpuset update or removal mm, page_alloc: fix check for NULL preferred_zone kernel/panic.c: add missing \n fbdev: color map copying bounds checking frv: add atomic64_add_unless() mm/mempolicy.c: do not put mempolicy before using its nodemask radix-tree: fix private list warnings Documentation/filesystems/proc.txt: add VmPin mm, memcg: do not retry precharge charges proc: add a schedule point in proc_pid_readdir() ...
2017-01-25test_firmware: add test custom fallback triggerLuis R. Rodriguez
We have no custom fallback mechanism test interface. Provide one. This tests both the custom fallback mechanism and cancelling the it. Signed-off-by: Luis R. Rodriguez <mcgrof@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-01-25test_firmware: use device attribute groupsLuis R. Rodriguez
This simplifies init and exit. Signed-off-by: Luis R. Rodriguez <mcgrof@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-01-25test_firmware: move misc_device downLuis R. Rodriguez
This will make further changes easier to review. Signed-off-by: Luis R. Rodriguez <mcgrof@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-01-24mm: do not export ioremap_page_range symbol for external modulezhong jiang
Recently, I've found cases in which ioremap_page_range was used incorrectly, in external modules, leading to crashes. This can be partly attributed to the fact that ioremap_page_range is lower-level, with fewer protections, as compared to the other functions that an external module would typically call. Those include: ioremap_cache ioremap_nocache ioremap_prot ioremap_uc ioremap_wc ioremap_wt ...each of which wraps __ioremap_caller, which in turn provides a safer way to achieve the mapping. Therefore, stop EXPORT-ing ioremap_page_range. Link: http://lkml.kernel.org/r/1485173220-29010-1-git-send-email-zhongjiang@huawei.com Signed-off-by: zhong jiang <zhongjiang@huawei.com> Reviewed-by: John Hubbard <jhubbard@nvidia.com> Suggested-by: John Hubbard <jhubbard@nvidia.com> Acked-by: Michal Hocko <mhocko@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-01-24radix-tree: fix private list warningsMatthew Wilcox
The newly introduced warning in radix_tree_free_nodes() was testing the wrong variable; it should have been 'old' instead of 'node'. Fixes: ea07b862ac8e ("mm: workingset: fix use-after-free in shadow node shrinker") Link: http://lkml.kernel.org/r/20170118163746.GA32495@cmpxchg.org Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-01-24lib/dma-virt: Add dma_virt_opsBart Van Assche
Several RDMA drivers (hfi1, qib and rxe) expect that ib_sge.addr is a virtual address. Provide DMA mapping operations that are suitable for these drivers. Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Joerg Roedel <jroedel@suse.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-01-24lib/dma-noop: Only build dma_noop_ops for s390 and m32rBart Van Assche
Reduce the kernel size by only building dma_noop_ops for those architectures that actually use it. This was suggested by Christoph Hellwig. Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Joerg Roedel <jroedel@suse.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Michael S. Tsirkin <mst@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-01-24lib/dma-noop: Clarify a commentBart Van Assche
The next patch in this series will introduce another set of DMA operations that map 1:1 with memory. Clarify that dma-noop maps to physical addresses. Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Joerg Roedel <jroedel@suse.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-01-24treewide: Constify most dma_map_ops structuresBart Van Assche
Most dma_map_ops structures are never modified. Constify these structures such that these can be write-protected. This patch has been generated as follows: git grep -l 'struct dma_map_ops' | xargs -d\\n sed -i \ -e 's/struct dma_map_ops/const struct dma_map_ops/g' \ -e 's/const struct dma_map_ops {/struct dma_map_ops {/g' \ -e 's/^const struct dma_map_ops;$/struct dma_map_ops;/' \ -e 's/const const struct dma_map_ops /const struct dma_map_ops /g'; sed -i -e 's/const \(struct dma_map_ops intel_dma_ops\)/\1/' \ $(git grep -l 'struct dma_map_ops intel_dma_ops'); sed -i -e 's/const \(struct dma_map_ops dma_iommu_ops\)/\1/' \ $(git grep -l 'struct dma_map_ops' | grep ^arch/powerpc); sed -i -e '/^struct vmd_dev {$/,/^};$/ s/const \(struct dma_map_ops[[:blank:]]dma_ops;\)/\1/' \ -e '/^static void vmd_setup_dma_ops/,/^}$/ s/const \(struct dma_map_ops \*dest\)/\1/' \ -e 's/const \(struct dma_map_ops \*dest = \&vmd->dma_ops\)/\1/' \ drivers/pci/host/*.c sed -i -e '/^void __init pci_iommu_alloc(void)$/,/^}$/ s/dma_ops->/intel_dma_ops./' arch/ia64/kernel/pci-dma.c sed -i -e 's/static const struct dma_map_ops sn_dma_ops/static struct dma_map_ops sn_dma_ops/' arch/ia64/sn/pci/pci_dma.c sed -i -e 's/(const struct dma_map_ops \*)//' drivers/misc/mic/bus/vop_bus.c Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Juergen Gross <jgross@suse.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: linux-arch@vger.kernel.org Cc: linux-kernel@vger.kernel.org Cc: Russell King <linux@armlinux.org.uk> Cc: x86@kernel.org Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-01-23rcu: Enable RCU tracepoints by default to aid in debuggingMatt Fleming
While debugging a performance issue I needed to understand why RCU sofitrqs were firing so frequently. Unfortunately, the RCU callback tracepoints are hidden behind CONFIG_RCU_TRACE which defaults to off in the upstream kernel and is likely to also be disabled in enterprise distribution configs. Enable it by default for CONFIG_TREE_RCU. However, we must keep it disabled for tiny RCU, because it would otherwise pull in a large amount of code that would make tiny RCU less than tiny. I ran some file system metadata intensive workloads (git checkout, FS-Mark) on a variety of machines with this patch and saw no detectable change in performance. Cc: Mel Gorman <mgorman@techsingularity.net> Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2017-01-23lib/prime_numbers: Suppress warn on kmalloc failureChris Wilson
The allocation for the bitmap may become very large, larger than MAX_ORDER, for large requests. We fail gracefully by falling back to trail-division, so disable the warning from kmalloc: 521.961092] WARNING: CPU: 0 PID: 30637 at mm/page_alloc.c:3548 __alloc_pages_slowpath+0x237/0x9a0 [ 521.961105] Modules linked in: i915(+) drm_kms_helper intel_gtt prime_numbers [last unloaded: drm_kms_helper] [ 521.961126] CPU: 0 PID: 30637 Comm: drv_selftest Tainted: G U W 4.10.0-rc3+ #321 [ 521.961137] Hardware name: / , BIOS PYBSWCEL.86A.0027.2015.0507.1758 05/07/2015 [ 521.961148] Call Trace: [ 521.961161] dump_stack+0x4d/0x6f [ 521.961172] __warn+0xc1/0xe0 [ 521.961181] warn_slowpath_null+0x18/0x20 [ 521.961189] __alloc_pages_slowpath+0x237/0x9a0 [ 521.961200] ? sg_init_table+0x1a/0x40 [ 521.961208] ? get_page_from_freelist+0x3fa/0x910 [ 521.961275] ? i915_gem_object_get_sg+0x272/0x2b0 [i915] [ 521.961285] __alloc_pages_nodemask+0x1ea/0x220 [ 521.961295] kmalloc_order+0x1c/0x50 [ 521.961304] __kmalloc+0x115/0x170 [ 521.961314] expand_to_next_prime+0x43/0x180 [prime_numbers] [ 521.961324] next_prime_number+0x47/0xc0 [prime_numbers] [ 521.961377] igt_vma_rotate+0x386/0x590 [i915] [ 521.961429] i915_subtests+0x37/0xc0 [i915] [ 521.961481] i915_vma_mock_selftests+0x3d/0x70 [i915] [ 521.961532] run_selftests+0x16e/0x1f0 [i915] [ 521.961541] ? 0xffffffffa02a4000 [ 521.961592] i915_mock_selftests+0x29/0x40 [i915] [ 521.961638] i915_init+0xa/0x5e [i915] [ 521.961646] ? 0xffffffffa02a4000 [ 521.961655] do_one_initcall+0x3e/0x160 [ 521.961664] ? __vunmap+0x7c/0xc0 [ 521.961672] ? vfree+0x29/0x70 [ 521.961680] ? kmem_cache_alloc+0xcf/0x120 [ 521.961690] do_init_module+0x55/0x1c4 [ 521.961699] load_module+0x1f3f/0x25b0 [ 521.961707] ? __symbol_put+0x40/0x40 [ 521.961716] ? kernel_read_file+0x100/0x190 [ 521.961725] SYSC_finit_module+0xbc/0xf0 [ 521.961734] SyS_finit_module+0x9/0x10 [ 521.961744] entry_SYSCALL_64_fastpath+0x17/0x98 [ 521.961752] RIP: 0033:0x7f111aca4119 [ 521.961760] RSP: 002b:00007ffd8be6cbe8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139 [ 521.961773] RAX: ffffffffffffffda RBX: 0000000000000006 RCX: 00007f111aca4119 [ 521.961781] RDX: 0000000000000000 RSI: 000055dfc18bc8e0 RDI: 0000000000000006 [ 521.961789] RBP: 00007ffd8be6bbe0 R08: 0000000000000000 R09: 0000000000000000 [ 521.961796] R10: 0000000000000006 R11: 0000000000000246 R12: 0000000000000005 [ 521.961805] R13: 000055dfc18bd3a0 R14: 00007ffd8be6bbc0 R15: 0000000000000005 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Daniel Vetter <daniel.vetter@intel.com> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: http://patchwork.freedesktop.org/patch/msgid/20170113235119.22528-1-chris@chris-wilson.co.uk
2017-01-20percpu_counter: percpu_counter_hotcpu_callback() cleanupEric Dumazet
In commit ebd8fef304f9 ("percpu_counter: make percpu_counters_lock irq-safe") we disabled irqs in percpu_counter_hotcpu_callback() We can grab every counter spinlock without having to disable irqs again. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Tejun Heo <tj@kernel.org> Signed-off-by: Tejun Heo <tj@kernel.org>
2017-01-20timerqueue: Use rb_entry_safe() instead of open-coding itGeliang Tang
Signed-off-by: Geliang Tang <geliangtang@gmail.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: John Stultz <john.stultz@linaro.org> Link: http://lkml.kernel.org/r/0d5cf199ac43792df0b6f7e2145545c30fa1dbbe.1482222135.git.geliangtang@gmail.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-01-18sbitmap: fix wakeup hang after sbq resizeOmar Sandoval
When we resize a struct sbitmap_queue, we update the wakeup batch size, but we don't update the wait count in the struct sbq_wait_states. If we resized down from a size which could use a bigger batch size, these counts could be too large and cause us to miss necessary wakeups. To fix this, update the wait counts when we resize (ensuring some careful memory ordering so that it's safe w.r.t. concurrent clears). This also fixes a theoretical issue where two threads could end up bumping the wait count up by the batch size, which could also potentially lead to hangs. Reported-by: Martin Raiber <martin@urbackup.org> Fixes: e3a2b3f931f5 ("blk-mq: allow changing of queue depth through sysfs") Fixes: 2971c35f3588 ("blk-mq: bitmap tag: fix race on blk_mq_bitmap_tags::wake_cnt") Signed-off-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2017-01-18sbitmap: use smp_mb__after_atomic() in sbq_wake_up()Omar Sandoval
We always do an atomic clear_bit() right before we call sbq_wake_up(), so we can use smp_mb__after_atomic(). While we're here, comment the memory barriers in here a little more. Signed-off-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2017-01-17Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
2017-01-17Merge branch 'stable/for-linus-4.10' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/konrad/swiotlb Pull swiotlb fix from Konrad Rzeszutek Wilk: "A tiny fix to make sure that page-sized mappings are page-aligned (and not say straddle two pages). This is important for some drivers (such as NVME)" * 'stable/for-linus-4.10' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/swiotlb: swiotlb: ensure that page-sized mappings are page-aligned