]> git.kernelconcepts.de Git - karo-tx-linux.git/log
karo-tx-linux.git
9 years agofrv: drop _PAGE_FILE and pte_file()-related helpers
Kirill A. Shutemov [Tue, 10 Feb 2015 22:10:31 +0000 (14:10 -0800)]
frv: drop _PAGE_FILE and pte_file()-related helpers

We've replaced remap_file_pages(2) implementation with emulation.  Nobody
creates non-linear mapping anymore.

This patch also increase number of bits availble for swap offset.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
9 years agocris: drop _PAGE_FILE and pte_file()-related helpers
Kirill A. Shutemov [Tue, 10 Feb 2015 22:10:28 +0000 (14:10 -0800)]
cris: drop _PAGE_FILE and pte_file()-related helpers

We've replaced remap_file_pages(2) implementation with emulation.  Nobody
creates non-linear mapping anymore.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Mikael Starvik <starvik@axis.com>
Cc: Jesper Nilsson <jesper.nilsson@axis.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
9 years agoc6x: drop pte_file()
Kirill A. Shutemov [Tue, 10 Feb 2015 22:10:25 +0000 (14:10 -0800)]
c6x: drop pte_file()

We've replaced remap_file_pages(2) implementation with emulation.  Nobody
creates non-linear mapping anymore.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Mark Salter <msalter@redhat.com>
Cc: Aurelien Jacquiot <a-jacquiot@ti.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
9 years agoblackfin: drop pte_file()
Kirill A. Shutemov [Tue, 10 Feb 2015 22:10:23 +0000 (14:10 -0800)]
blackfin: drop pte_file()

We've replaced remap_file_pages(2) implementation with emulation.  Nobody
creates non-linear mapping anymore.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Steven Miao <realmz6@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
9 years agoavr32: drop _PAGE_FILE and pte_file()-related helpers
Kirill A. Shutemov [Tue, 10 Feb 2015 22:10:20 +0000 (14:10 -0800)]
avr32: drop _PAGE_FILE and pte_file()-related helpers

We've replaced remap_file_pages(2) implementation with emulation.  Nobody
creates non-linear mapping anymore.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Haavard Skinnemoen <hskinnemoen@gmail.com>
Acked-by: Hans-Christian Egtvedt <egtvedt@samfundet.no>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
9 years agoarm: drop L_PTE_FILE and pte_file()-related helpers
Kirill A. Shutemov [Tue, 10 Feb 2015 22:10:17 +0000 (14:10 -0800)]
arm: drop L_PTE_FILE and pte_file()-related helpers

We've replaced remap_file_pages(2) implementation with emulation.  Nobody
creates non-linear mapping anymore.

This patch also adjust __SWP_TYPE_SHIFT, effectively increase size of
possible swap file to 128G.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Russell King <linux@arm.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
9 years agoarm64: drop PTE_FILE and pte_file()-related helpers
Kirill A. Shutemov [Tue, 10 Feb 2015 22:10:15 +0000 (14:10 -0800)]
arm64: drop PTE_FILE and pte_file()-related helpers

We've replaced remap_file_pages(2) implementation with emulation.  Nobody
creates non-linear mapping anymore.

This patch also adjust __SWP_TYPE_SHIFT and increase number of bits
availble for swap offset.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
9 years agoarc: drop _PAGE_FILE and pte_file()-related helpers
Kirill A. Shutemov [Tue, 10 Feb 2015 22:10:12 +0000 (14:10 -0800)]
arc: drop _PAGE_FILE and pte_file()-related helpers

We've replaced remap_file_pages(2) implementation with emulation.  Nobody
creates non-linear mapping anymore.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Vineet Gupta <vgupta@synopsys.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
9 years agoalpha: drop _PAGE_FILE and pte_file()-related helpers
Kirill A. Shutemov [Tue, 10 Feb 2015 22:10:09 +0000 (14:10 -0800)]
alpha: drop _PAGE_FILE and pte_file()-related helpers

We've replaced remap_file_pages(2) implementation with emulation.  Nobody
creates non-linear mapping anymore.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Matt Turner <mattst88@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
9 years agoasm-generic: drop unused pte_file* helpers
Kirill A. Shutemov [Tue, 10 Feb 2015 22:10:07 +0000 (14:10 -0800)]
asm-generic: drop unused pte_file* helpers

All users are gone.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
9 years agomm: remove rest usage of VM_NONLINEAR and pte_file()
Kirill A. Shutemov [Tue, 10 Feb 2015 22:10:04 +0000 (14:10 -0800)]
mm: remove rest usage of VM_NONLINEAR and pte_file()

One bit in ->vm_flags is unused now!

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Cc: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
9 years agomm: replace vma->sharead.linear with vma->shared
Kirill A. Shutemov [Tue, 10 Feb 2015 22:10:02 +0000 (14:10 -0800)]
mm: replace vma->sharead.linear with vma->shared

After removing vma->shared.nonlinear we have only one member of
vma->shared union, which doesn't make much sense.

This patch drops the union and move struct vma->shared.linear to
vma->shared.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
9 years agormap: drop support of non-linear mappings
Kirill A. Shutemov [Tue, 10 Feb 2015 22:09:59 +0000 (14:09 -0800)]
rmap: drop support of non-linear mappings

We don't create non-linear mappings anymore.  Let's drop code which
handles them in rmap.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
9 years agoproc: drop handling non-linear mappings
Kirill A. Shutemov [Tue, 10 Feb 2015 22:09:57 +0000 (14:09 -0800)]
proc: drop handling non-linear mappings

We have to handle non-linear mappings for /proc/PID/{smaps,clear_refs}
which is unused now.  Let's drop it.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
9 years agomm: drop vm_ops->remap_pages and generic_file_remap_pages() stub
Kirill A. Shutemov [Tue, 10 Feb 2015 22:09:54 +0000 (14:09 -0800)]
mm: drop vm_ops->remap_pages and generic_file_remap_pages() stub

Nobody uses it anymore.

[akpm@linux-foundation.org: fix filemap_xip.c]
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
9 years agomm: drop support of non-linear mapping from fault codepath
Kirill A. Shutemov [Tue, 10 Feb 2015 22:09:51 +0000 (14:09 -0800)]
mm: drop support of non-linear mapping from fault codepath

We don't create non-linear mappings anymore.  Let's drop code which
handles them on page fault.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
9 years agomm: drop support of non-linear mapping from unmap/zap codepath
Kirill A. Shutemov [Tue, 10 Feb 2015 22:09:49 +0000 (14:09 -0800)]
mm: drop support of non-linear mapping from unmap/zap codepath

We have remap_file_pages(2) emulation in -mm tree for few release cycles
and we plan to have it mainline in v3.20. This patchset removes rest of
VM_NONLINEAR infrastructure.

Patches 1-8 take care about generic code. They are pretty
straight-forward and can be applied without other of patches.

Rest patches removes pte_file()-related stuff from architecture-specific
code. It usually frees up one bit in non-present pte. I've tried to reuse
that bit for swap offset, where I was able to figure out how to do that.

For obvious reason I cannot test all that arch-specific code and would
like to see acks from maintainers.

In total, remap_file_pages(2) required about 1.4K lines of not-so-trivial
kernel code. That's too much for functionality nobody uses.

Tested-by: Felipe Balbi <balbi@ti.com>
This patch (of 38):

We don't create non-linear mappings anymore. Let's drop code which
handles them on unmap/zap.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
9 years agomm: replace remap_file_pages() syscall with emulation
Kirill A. Shutemov [Tue, 10 Feb 2015 22:09:46 +0000 (14:09 -0800)]
mm: replace remap_file_pages() syscall with emulation

remap_file_pages(2) was invented to be able efficiently map parts of
huge file into limited 32-bit virtual address space such as in database
workloads.

Nonlinear mappings are pain to support and it seems there's no
legitimate use-cases nowadays since 64-bit systems are widely available.

Let's drop it and get rid of all these special-cased code.

The patch replaces the syscall with emulation which creates new VMA on
each remap_file_pages(), unless they it can be merged with an adjacent
one.

I didn't find *any* real code that uses remap_file_pages(2) to test
emulation impact on.  I've checked Debian code search and source of all
packages in ALT Linux.  No real users: libc wrappers, mentions in
strace, gdb, valgrind and this kind of stuff.

There are few basic tests in LTP for the syscall.  They work just fine
with emulation.

To test performance impact, I've written small test case which
demonstrate pretty much worst case scenario: map 4G shmfs file, write to
begin of every page pgoff of the page, remap pages in reverse order,
read every page.

The test creates 1 million of VMAs if emulation is in use, so I had to
set vm.max_map_count to 1100000 to avoid -ENOMEM.

Before: 23.3 ( +-  4.31% ) seconds
After: 43.9 ( +-  0.85% ) seconds
Slowdown: 1.88x

I believe we can live with that.

Test case:

        #define _GNU_SOURCE
        #include <assert.h>
        #include <stdlib.h>
        #include <stdio.h>
        #include <sys/mman.h>

        #define MB (1024UL * 1024)
        #define SIZE (4096 * MB)

        int main(int argc, char **argv)
        {
                unsigned long *p;
                long i, pass;

                for (pass = 0; pass < 10; pass++) {
                        p = mmap(NULL, SIZE, PROT_READ|PROT_WRITE,
                                        MAP_SHARED | MAP_ANONYMOUS, -1, 0);
                        if (p == MAP_FAILED) {
                                perror("mmap");
                                return -1;
                        }

                        for (i = 0; i < SIZE / 4096; i++)
                                p[i * 4096 / sizeof(*p)] = i;

                        for (i = 0; i < SIZE / 4096; i++) {
                                if (remap_file_pages(p + i * 4096 / sizeof(*p), 4096,
                                                0, (SIZE - 4096 * (i + 1)) >> 12, 0)) {
                                        perror("remap_file_pages");
                                        return -1;
                                }
                        }

                        for (i = SIZE / 4096 - 1; i >= 0; i--)
                                assert(p[i * 4096 / sizeof(*p)] == SIZE / 4096 - i - 1);

                        munmap(p, SIZE);
                }

                return 0;
        }

[akpm@linux-foundation.org: fix spello]
[sasha.levin@oracle.com: initialize populate before usage]
[sasha.levin@oracle.com: grab file ref to prevent race while mmaping]
Signed-off-by: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Dave Jones <davej@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Armin Rigo <arigo@tunes.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
9 years agomm/vmstat.c: fix/cleanup ifdefs
Andrew Morton [Tue, 10 Feb 2015 22:09:43 +0000 (14:09 -0800)]
mm/vmstat.c: fix/cleanup ifdefs

CONFIG_COMPACTION=y, CONFIG_DEBUG_FS=n:

  mm/vmstat.c:690: warning: 'frag_start' defined but not used
  mm/vmstat.c:702: warning: 'frag_next' defined but not used
  mm/vmstat.c:710: warning: 'frag_stop' defined but not used
  mm/vmstat.c:715: warning: 'walk_zones_in_node' defined but not used

It's all a bit of a tangly mess and it's unclear why CONFIG_COMPACTION
figures in there at all.  Move frag_start/frag_next/frag_stop and
migratetype_names[] into the existing CONFIG_PROC_FS block.

walk_zones_in_node() gets a special ifdef.

Also move the #include lines up to where #include lines live.

[axel.lin@ingics.com: fix build error when !CONFIG_PROC_FS]
Signed-off-by: Axel Lin <axel.lin@ingics.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Tested-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
9 years agomm/slab_common.c: use kmem_cache_free()
Vaishali Thakkar [Tue, 10 Feb 2015 22:09:40 +0000 (14:09 -0800)]
mm/slab_common.c: use kmem_cache_free()

Here, free memory is allocated using kmem_cache_zalloc.  So, use
kmem_cache_free instead of kfree.

This is done using Coccinelle and semantic patch used
is as follows:

@@
expression x,E,c;
@@

 x = \(kmem_cache_alloc\|kmem_cache_zalloc\|kmem_cache_alloc_node\)(c,...)
 ... when != x = E
     when != &x
?-kfree(x)
+kmem_cache_free(c,x)

Signed-off-by: Vaishali Thakkar <vthakkar1994@gmail.com>
Acked-by: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
9 years agomm/slub.c: fix typo in comment
Kim Phillips [Tue, 10 Feb 2015 22:09:37 +0000 (14:09 -0800)]
mm/slub.c: fix typo in comment

Signed-off-by: Kim Phillips <kim.phillips@freescale.com>
Acked-by: Christoph Lameter <cl@linux.com>
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
9 years agomm: don't use compound_head() in virt_to_head_page()
Joonsoo Kim [Tue, 10 Feb 2015 22:09:35 +0000 (14:09 -0800)]
mm: don't use compound_head() in virt_to_head_page()

compound_head() is implemented with assumption that there would be race
condition when checking tail flag.  This assumption is only true when we
try to access arbitrary positioned struct page.

The situation that virt_to_head_page() is called is different case.  We
call virt_to_head_page() only in the range of allocated pages, so there
is no race condition on tail flag.  In this case, we don't need to
handle race condition and we can reduce overhead slightly.  This patch
implements compound_head_fast() which is similar with compound_head()
except tail flag race handling.  And then, virt_to_head_page() uses this
optimized function to improve performance.

I saw 1.8% win in a fast-path loop over kmem_cache_alloc/free, (14.063
ns -> 13.810 ns) if target object is on tail page.

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Acked-by: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
9 years agomm/slub: optimize alloc/free fastpath by removing preemption on/off
Joonsoo Kim [Tue, 10 Feb 2015 22:09:32 +0000 (14:09 -0800)]
mm/slub: optimize alloc/free fastpath by removing preemption on/off

We had to insert a preempt enable/disable in the fastpath a while ago in
order to guarantee that tid and kmem_cache_cpu are retrieved on the same
cpu.  It is the problem only for CONFIG_PREEMPT in which scheduler can
move the process to other cpu during retrieving data.

Now, I reach the solution to remove preempt enable/disable in the
fastpath.  If tid is matched with kmem_cache_cpu's tid after tid and
kmem_cache_cpu are retrieved by separate this_cpu operation, it means
that they are retrieved on the same cpu.  If not matched, we just have
to retry it.

With this guarantee, preemption enable/disable isn't need at all even if
CONFIG_PREEMPT, so this patch removes it.

I saw roughly 5% win in a fast-path loop over kmem_cache_alloc/free in
CONFIG_PREEMPT.  (14.821 ns -> 14.049 ns)

Below is the result of Christoph's slab_test reported by Jesper Dangaard
Brouer.

* Before

 Single thread testing
 =====================
 1. Kmalloc: Repeatedly allocate then free test
 10000 times kmalloc(8) -> 49 cycles kfree -> 62 cycles
 10000 times kmalloc(16) -> 48 cycles kfree -> 64 cycles
 10000 times kmalloc(32) -> 53 cycles kfree -> 70 cycles
 10000 times kmalloc(64) -> 64 cycles kfree -> 77 cycles
 10000 times kmalloc(128) -> 74 cycles kfree -> 84 cycles
 10000 times kmalloc(256) -> 84 cycles kfree -> 114 cycles
 10000 times kmalloc(512) -> 83 cycles kfree -> 116 cycles
 10000 times kmalloc(1024) -> 81 cycles kfree -> 120 cycles
 10000 times kmalloc(2048) -> 104 cycles kfree -> 136 cycles
 10000 times kmalloc(4096) -> 142 cycles kfree -> 165 cycles
 10000 times kmalloc(8192) -> 238 cycles kfree -> 226 cycles
 10000 times kmalloc(16384) -> 403 cycles kfree -> 264 cycles
 2. Kmalloc: alloc/free test
 10000 times kmalloc(8)/kfree -> 68 cycles
 10000 times kmalloc(16)/kfree -> 68 cycles
 10000 times kmalloc(32)/kfree -> 69 cycles
 10000 times kmalloc(64)/kfree -> 68 cycles
 10000 times kmalloc(128)/kfree -> 68 cycles
 10000 times kmalloc(256)/kfree -> 68 cycles
 10000 times kmalloc(512)/kfree -> 74 cycles
 10000 times kmalloc(1024)/kfree -> 75 cycles
 10000 times kmalloc(2048)/kfree -> 74 cycles
 10000 times kmalloc(4096)/kfree -> 74 cycles
 10000 times kmalloc(8192)/kfree -> 75 cycles
 10000 times kmalloc(16384)/kfree -> 510 cycles

* After

 Single thread testing
 =====================
 1. Kmalloc: Repeatedly allocate then free test
 10000 times kmalloc(8) -> 46 cycles kfree -> 61 cycles
 10000 times kmalloc(16) -> 46 cycles kfree -> 63 cycles
 10000 times kmalloc(32) -> 49 cycles kfree -> 69 cycles
 10000 times kmalloc(64) -> 57 cycles kfree -> 76 cycles
 10000 times kmalloc(128) -> 66 cycles kfree -> 83 cycles
 10000 times kmalloc(256) -> 84 cycles kfree -> 110 cycles
 10000 times kmalloc(512) -> 77 cycles kfree -> 114 cycles
 10000 times kmalloc(1024) -> 80 cycles kfree -> 116 cycles
 10000 times kmalloc(2048) -> 102 cycles kfree -> 131 cycles
 10000 times kmalloc(4096) -> 135 cycles kfree -> 163 cycles
 10000 times kmalloc(8192) -> 238 cycles kfree -> 218 cycles
 10000 times kmalloc(16384) -> 399 cycles kfree -> 262 cycles
 2. Kmalloc: alloc/free test
 10000 times kmalloc(8)/kfree -> 65 cycles
 10000 times kmalloc(16)/kfree -> 66 cycles
 10000 times kmalloc(32)/kfree -> 65 cycles
 10000 times kmalloc(64)/kfree -> 66 cycles
 10000 times kmalloc(128)/kfree -> 66 cycles
 10000 times kmalloc(256)/kfree -> 71 cycles
 10000 times kmalloc(512)/kfree -> 72 cycles
 10000 times kmalloc(1024)/kfree -> 71 cycles
 10000 times kmalloc(2048)/kfree -> 71 cycles
 10000 times kmalloc(4096)/kfree -> 71 cycles
 10000 times kmalloc(8192)/kfree -> 65 cycles
 10000 times kmalloc(16384)/kfree -> 511 cycles

Most of the results are better than before.

Note that this change slightly worses performance in !CONFIG_PREEMPT,
roughly 0.3%.  Implementing each case separately would help performance,
but, since it's so marginal, I didn't do that.  This would help
maintanance since we have same code for all cases.

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Acked-by: Christoph Lameter <cl@linux.com>
Tested-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
9 years agofsioctl.c: make generic_block_fiemap() signal-tolerant
Dmitry Monakhov [Tue, 10 Feb 2015 22:09:29 +0000 (14:09 -0800)]
fsioctl.c: make generic_block_fiemap() signal-tolerant

__generic_block_fiemap may spin very long time for large sparse files.

Without this patch an unprivileged user may abuse system resources simply
by spawning a vast number of unkilable busyloops (works on ext2/ext3):

  truncate --size 1T test
  for ((i=0;i<1024;i++))
  do
         filefrag test > /dev/null &
  done

Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
9 years agoo2dlm: fix NULL pointer dereference in o2dlm_blocking_ast_wrapper
Srinivas Eeda [Tue, 10 Feb 2015 22:09:26 +0000 (14:09 -0800)]
o2dlm: fix NULL pointer dereference in o2dlm_blocking_ast_wrapper

A tiny race between BAST and unlock message causes the NULL dereference.

A node sends an unlock request to master and receives a response.  Before
processing the response it receives a BAST from the master.  Since both
requests are processed by different threads it creates a race.  While the
BAST is being processed, lock can get freed by unlock code.

This patch makes bast to return immediately if lock is found but unlock is
pending.  The code should handle this race.  We also have to fix master
node to skip sending BAST after receiving unlock message.

Below is the crash stack

    BUG: unable to handle kernel NULL pointer dereference at 0000000000000048
    IP: o2dlm_blocking_ast_wrapper+0xd/0x16
    dlm_do_local_bast+0x8e/0x97 [ocfs2_dlm]
    dlm_proxy_ast_handler+0x838/0x87e [ocfs2_dlm]
    o2net_process_message+0x395/0x5b8 [ocfs2_nodemanager]
    o2net_rx_until_empty+0x762/0x90d [ocfs2_nodemanager]
    worker_thread+0x14d/0x1ed

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Srinivas Eeda <srinivas.eeda@oracle.com>
Reviewed-by: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Joseph Qi <joseph.qi@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
9 years agoocfs2: prune the dcache before deleting the dentry of directory
alex chen [Tue, 10 Feb 2015 22:09:23 +0000 (14:09 -0800)]
ocfs2: prune the dcache before deleting the dentry of directory

In ocfs2_dentry_convert_worker, we should prune the dcache before deleting
the dentry of directory, otherwise, in the following cases the inode of
directory will still remain in orphan directory until the device being
umounted.

Mount point: /mnt/ocfs2
Node A                              Node B
mkdir /mnt/ocfs2/testdir
  ocfs2_mkdir
  ->ocfs2_mknod
  ->ocfs2_dentry_attach_lock
  ->ocfs2_dentry_lock(dentry, 0)
  ... ...
touch /mnt/ocfs2/testdir/testfile
                                    unlink /mnt/test/testdir/testfile
                                    rmdir /mnt/ocfs2/testdir
                                      ocfs2_unlink
                                      ->ocfs2_remote_dentry_delete
                                      ->ocfs2_dentry_lock(dentry, 1)
                                      ... ...
... ...
ocfs2_downconvert_thread
->ocfs2_unblock_lock
->ocfs2_dentry_convert_worker
->ocfs2_find_local_alias
  ->dget_dlock
->d_delete
Here the dentry can not be
released because the children's
dentry is negative but still exist.
Finally, this inode will still remain
in orphan directory until its children
are destroyed.

So before deleting dentry of directory, we should prune the dcache to
remove unused children of the parent dentry by shrink_dcache_parent().

Signed-off-by: Alex Chen <alex.chen@huawei.com>
Reviewed-by: Joseph Qi <joseph.qi@huawei.com>
Reviewed-by: joyce.xue <xuejiufei@huawei.com>
Reviewed-by: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
9 years agoocfs2: make resv_lock spinlock static
Fabian Frederick [Tue, 10 Feb 2015 22:09:20 +0000 (14:09 -0800)]
ocfs2: make resv_lock spinlock static

resv_lock is only used in reservations.c

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
9 years agoocfs2: remove unreachable code in __ocfs2_recovery_thread()
Daeseok Youn [Tue, 10 Feb 2015 22:09:18 +0000 (14:09 -0800)]
ocfs2: remove unreachable code in __ocfs2_recovery_thread()

Signed-off-by: Daeseok Youn <daeseok.youn@gmail.com>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
9 years agoocfs2: removes mlog_errno() call twice in ocfs2_find_dir_space_el()
Daeseok Youn [Tue, 10 Feb 2015 22:09:15 +0000 (14:09 -0800)]
ocfs2: removes mlog_errno() call twice in ocfs2_find_dir_space_el()

mlog_errno() is called twice when some functions are failed.

Signed-off-by: Daeseok Youn <daeseok.youn@gmail.com>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
9 years agoocfs2: remove unreachable code
Daeseok Youn [Tue, 10 Feb 2015 22:09:12 +0000 (14:09 -0800)]
ocfs2: remove unreachable code

Signed-off-by: Daeseok Youn <daeseok.youn@gmail.com>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
9 years agoocfs2: o2net: silence uninitialized variable warning
Dan Carpenter [Tue, 10 Feb 2015 22:09:10 +0000 (14:09 -0800)]
ocfs2: o2net: silence uninitialized variable warning

Smatch complains that, if o2net_tx_can_proceed() returns false, then "sc"
and "ret" are uninialized or maybe we are re-using the data from previous
iteration.  I do not know if we can hit this bug in real life but checking
the return value is harmless and we may as well silence the static checker
warning.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
9 years agoocfs2: remove pointless assignment from ocfs2_calc_refcount_meta_credits()
Jan Kara [Tue, 10 Feb 2015 22:09:07 +0000 (14:09 -0800)]
ocfs2: remove pointless assignment from ocfs2_calc_refcount_meta_credits()

The assigned value is never used.

Coverity-id 1226847.

Signed-off-by: Jan Kara <jack@suse.cz>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
9 years agoocfs2: add a mount option journal_async_commit on ocfs2 filesystem
alex chen [Tue, 10 Feb 2015 22:09:04 +0000 (14:09 -0800)]
ocfs2: add a mount option journal_async_commit on ocfs2 filesystem

Add a mount option to support JBD2 feature:

JBD2_FEATURE_INCOMPAT_ASYNC_COMMIT.  When this feature is opened, journal
commit block can be written to disk without waiting for descriptor blocks,
which can improve journal commit performance.  This option will enable
'journal_checksum' internally.

Using the fs_mark benchmark, using journal_async_commit shows a 50%
improvement, the files per second go up from 215.2 to 317.5.

test script:
fs_mark  -d  /mnt/ocfs2/  -s  10240  -n  1000

default:
FSUse%        Count         Size    Files/sec     App Overhead
     0         1000        10240        215.2            17878

with journal_async_commit option:
FSUse%        Count         Size    Files/sec     App Overhead
     0         1000        10240        317.5            17881

Signed-off-by: Alex Chen <alex.chen@huawei.com>
Signed-off-by: Weiwei Wang <wangww631@huawei.comm>
Reviewed-by: Joseph Qi <joseph.qi@huawei.com>
Reviewed-by: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
9 years agoocfs2: fix journal commit deadlock in ocfs2_convert_inline_data_to_extents
alex chen [Tue, 10 Feb 2015 22:09:02 +0000 (14:09 -0800)]
ocfs2: fix journal commit deadlock in ocfs2_convert_inline_data_to_extents

Similar to ocfs2_write_end_nolock() which is metioned at commit
136f49b91710 ("ocfs2: fix journal commit deadlock"), we should unlock
pages before ocfs2_commit_trans() in ocfs2_convert_inline_data_to_extents.

Otherwise, it will cause a deadlock with journal commit threads.

Signed-off-by: Alex Chen <alex.chen@huawei.com>
Reviewed-by: Joseph Qi <joseph.qi@huawei.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Mark Fasheh <mfasheh@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
9 years agoocfs2: dlm: dlmdomain: remove unused function
Rickard Strandqvist [Tue, 10 Feb 2015 22:08:59 +0000 (14:08 -0800)]
ocfs2: dlm: dlmdomain: remove unused function

Remove dlm_joined() that is not used anywhere.

This was partially found by using a static code analysis program called
cppcheck.

Signed-off-by: Rickard Strandqvist <rickard_strandqvist@spectrumdigital.se>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
9 years agoocfs2: quota_local: remove unused function
Rickard Strandqvist [Tue, 10 Feb 2015 22:08:56 +0000 (14:08 -0800)]
ocfs2: quota_local: remove unused function

Remove ol_dqblk_file_block() that is not used anywhere.

This was partially found by using a static code analysis program called
cppcheck.

Signed-off-by: Rickard Strandqvist <rickard_strandqvist@spectrumdigital.se>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
9 years agoocfs2: xattr: remove unused function
Rickard Strandqvist [Tue, 10 Feb 2015 22:08:54 +0000 (14:08 -0800)]
ocfs2: xattr: remove unused function

Remove ocfs2_xattr_bucket_get_val() that is not used anywhere.

This was partially found by using a static code analysis program called
cppcheck.

Signed-off-by: Rickard Strandqvist <rickard_strandqvist@spectrumdigital.se>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
9 years agoocfs2: fix snprintf format specifier in dlmdebug.c
alex chen [Tue, 10 Feb 2015 22:08:51 +0000 (14:08 -0800)]
ocfs2: fix snprintf format specifier in dlmdebug.c

Use snprintf format specifier "%lu" instead of "%ld" for argument of type
'unsigned long'.

Signed-off-by: Alex Chen <alex.chen@huawei.com>
Reviewed-by: Joseph Qi <joseph.qi@huawei.com>
Reviewed-by: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
9 years agoocfs2: fix wrong comment
Junxiao Bi [Tue, 10 Feb 2015 22:08:48 +0000 (14:08 -0800)]
ocfs2: fix wrong comment

O2NET_CONN_IDLE_DELAY is not defined, connection attempts will not be
canceled due to timeout.

Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
9 years agoocfs2: fix uninitialized variable access
Junxiao Bi [Tue, 10 Feb 2015 22:08:46 +0000 (14:08 -0800)]
ocfs2: fix uninitialized variable access

Variable "why" is not yet initialized at line 615, fix it.

Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
9 years agoocfs2: remove unnecessary else in ocfs2_set_acl()
Fabian Frederick [Tue, 10 Feb 2015 22:08:43 +0000 (14:08 -0800)]
ocfs2: remove unnecessary else in ocfs2_set_acl()

else is unnecessary after return.

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
9 years agoocfs2/dlm: add missing dlm_lock_put() when recovery master down
Xue jiufei [Tue, 10 Feb 2015 22:08:40 +0000 (14:08 -0800)]
ocfs2/dlm: add missing dlm_lock_put() when recovery master down

When the recovery master is down, the owner of $RECOVERY calls
dlm_do_local_recovery_cleanup() to prune any $RECOVERY entries for dead
nodes.  The lock is in the granted list and the refcount must be 2.  We
should put twice to remove this lock.  Otherwise, it will lead to a memory
leak.

Signed-off-by: joyce.xue <xuejiufei@huawei.com>
Reported-by: yangwenfang <vicky.yangwenfang@huawei.com>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
9 years agosh: eliminate unused irq_reg_{readl,writel} accessors
Kevin Cernekee [Tue, 10 Feb 2015 22:08:38 +0000 (14:08 -0800)]
sh: eliminate unused irq_reg_{readl,writel} accessors

Defining these macros way down in arch/sh/.../irq.c doesn't cause
kernel/irq/generic-chip.c to use them.  As far as I can tell this code has
no effect.

Fixes: 332fd7c4fef5f3b1 ("genirq: Generic chip: Change irq_reg_{readl,writel} arguments")
Signed-off-by: Kevin Cernekee <cernekee@gmail.com>
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Tested-by: Geert Uytterhoeven <geert+renesas@glider.be> (cpp/asm comparison)
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Jason Cooper <jason@lakedaemon.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
9 years agosh: build superh without CONFIG_EXPERT
Rob Landley [Tue, 10 Feb 2015 22:08:35 +0000 (14:08 -0800)]
sh: build superh without CONFIG_EXPERT

What sh4 actually wants is HAVE_PATA_PLATFORM, so select that instead.

Signed-off-by: Rob Landley <rob@landley.net>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Acked-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
9 years agofsnotify: fix handling of renames in audit
Jan Kara [Tue, 10 Feb 2015 22:08:32 +0000 (14:08 -0800)]
fsnotify: fix handling of renames in audit

Commit e9fd702a58c4 ("audit: convert audit watches to use fsnotify
instead of inotify") broke handling of renames in audit.  Audit code
wants to update inode number of an inode corresponding to watched name
in a directory.  When something gets renamed into a directory to a
watched name, inotify previously passed moved inode to audit code
however new fsnotify code passes directory inode where the change
happened.  That confuses audit and it starts watching parent directory
instead of a file in a directory.

This can be observed for example by doing:

  cd /tmp
  touch foo bar
  auditctl -w /tmp/foo
  touch foo
  mv bar foo
  touch foo

In audit log we see events like:

  type=CONFIG_CHANGE msg=audit(1423563584.155:90): auid=1000 ses=2 op="updated rules" path="/tmp/foo" key=(null) list=4 res=1
  ...
  type=PATH msg=audit(1423563584.155:91): item=2 name="bar" inode=1046884 dev=08:0 2 mode=0100644 ouid=0 ogid=0 rdev=00:00 nametype=DELETE
  type=PATH msg=audit(1423563584.155:91): item=3 name="foo" inode=1046842 dev=08:0 2 mode=0100644 ouid=0 ogid=0 rdev=00:00 nametype=DELETE
  type=PATH msg=audit(1423563584.155:91): item=4 name="foo" inode=1046884 dev=08:0 2 mode=0100644 ouid=0 ogid=0 rdev=00:00 nametype=CREATE
  ...

and that's it - we see event for the first touch after creating the
audit rule, we see events for rename but we don't see any event for the
last touch.  However we start seeing events for unrelated stuff
happening in /tmp.

Fix the problem by passing moved inode as data in the FS_MOVED_FROM and
FS_MOVED_TO events instead of the directory where the change happens.
This doesn't introduce any new problems because noone besides
audit_watch.c cares about the passed value:

  fs/notify/fanotify/fanotify.c cares only about FSNOTIFY_EVENT_PATH events.
  fs/notify/dnotify/dnotify.c doesn't care about passed 'data' value at all.
  fs/notify/inotify/inotify_fsnotify.c uses 'data' only for FSNOTIFY_EVENT_PATH.
  kernel/audit_tree.c doesn't care about passed 'data' at all.
  kernel/audit_watch.c expects moved inode as 'data'.

Fixes: e9fd702a58c49db ("audit: convert audit watches to use fsnotify instead of inotify")
Signed-off-by: Jan Kara <jack@suse.cz>
Cc: Paul Moore <paul@paul-moore.com>
Cc: Eric Paris <eparis@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
9 years agoinotify: update documentation to reflect code changes
Zhang Zhen [Tue, 10 Feb 2015 22:08:30 +0000 (14:08 -0800)]
inotify: update documentation to reflect code changes

The inotify interface has changed a lot.  The user interface was too
old, and the kernel interface was removed by Eric Paris in commit:
2dfc1ca inotify: remove inotify in kernel interface.

Signed-off-by: Zhang Zhen <zhenzhang.zhang@huawei.com>
Cc: Wang Kai <morgan.wang@huawei.com>
Cc: Eric Paris <eparis@parisplace.org>
Cc: Robert Love <robert.w.love@intel.com>
Cc: John McCutchan <john@johnmccutchan.com>
Cc: Heinrich Schuchardt <xypron.glpk@gmx.de>
Acked-by: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
9 years agofanotify: don't set FAN_ONDIR implicitly on a marks ignored mask
Lino Sanfilippo [Tue, 10 Feb 2015 22:08:27 +0000 (14:08 -0800)]
fanotify: don't set FAN_ONDIR implicitly on a marks ignored mask

Currently FAN_ONDIR is always set on a mark's ignored mask when the
event mask is extended without FAN_MARK_ONDIR being set.  This may
result in events for directories being ignored unexpectedly for call
sequences like

  fanotify_mark(fd, FAN_MARK_ADD, FAN_OPEN | FAN_ONDIR , AT_FDCWD, "dir");
  fanotify_mark(fd, FAN_MARK_ADD, FAN_CLOSE, AT_FDCWD, "dir");

Also FAN_MARK_ONDIR is only honored when adding events to a mark's mask,
but not for event removal.  Fix both issues by not setting FAN_ONDIR
implicitly on the ignore mask any more.  Instead treat FAN_ONDIR as any
other event flag and require FAN_MARK_ONDIR to be set by the user for
both event mask and ignore mask.  Furthermore take FAN_MARK_ONDIR into
account when set for event removal.

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Lino Sanfilippo <LinoSanfilippo@gmx.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Eric Paris <eparis@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
9 years agofanotify: don't recalculate a marks mask if only the ignored mask changed
Lino Sanfilippo [Tue, 10 Feb 2015 22:08:24 +0000 (14:08 -0800)]
fanotify: don't recalculate a marks mask if only the ignored mask changed

If removing bits from a mark's ignored mask, the concerning
inodes/vfsmounts mask is not affected.  So don't recalculate it.

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Lino Sanfilippo <LinoSanfilippo@gmx.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Eric Paris <eparis@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
9 years agofanotify: only destroy mark when both mask and ignored_mask are cleared
Lino Sanfilippo [Tue, 10 Feb 2015 22:08:21 +0000 (14:08 -0800)]
fanotify: only destroy mark when both mask and ignored_mask are cleared

In fanotify_mark_remove_from_mask() a mark is destroyed if only one of
both bitmasks (mask or ignored_mask) of a mark is cleared.  However the
other mask may still be set and contain information that should not be
lost.  So only destroy a mark if both masks are cleared.

Signed-off-by: Lino Sanfilippo <LinoSanfilippo@gmx.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Eric Paris <eparis@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
9 years agohugetlb, x86: register 1G page size if we can allocate them at runtime
Kirill A. Shutemov [Tue, 10 Feb 2015 22:08:19 +0000 (14:08 -0800)]
hugetlb, x86: register 1G page size if we can allocate them at runtime

After commit 944d9fec8d7a ("hugetlb: add support for gigantic page
allocation at runtime") we can allocate 1G pages at runtime if CMA is
enabled.

Let's register 1G pages into hugetlb even if the user hasn't requested
them explicitly at boot time with hugepagesz=1G.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: Luiz Capitulino <lcapitulino@redhat.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
9 years agoMerge branch 'for-3.20' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata
Linus Torvalds [Tue, 10 Feb 2015 03:40:42 +0000 (19:40 -0800)]
Merge branch 'for-3.20' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata

Pull libata changes from Tejun Heo:
 "Mostly driver-specific changes.  Nothing too noteworthy.

  This pull request contains three merges from for-3.19-fixes.  The
  first two are to pull ahci_xgene and sata_dwc_460ex fix commits which
  are depended upon by later changes.  The last one is to pull in a fix
  patch which missed the v3.19-rc window"

* 'for-3.20' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata: (24 commits)
  ahci_xgene: Fix the dma state machine lockup for the ATA_CMD_SMART PIO mode command.
  ata: libahci: Use of_platform_device_create only if supported
  sata_mv: Delete unnecessary checks before the function call "phy_power_off"
  ata: Delete unnecessary checks before the function call "pci_dev_put"
  ata: pata_platform: fix owner module reference mismatch for scsi host
  ata: ahci_platform: fix owner module reference mismatch for scsi host
  pata_pdc2027x: Use 64-bit timekeeping
  ata: libahci: Fix devres cleanup on failure
  ata: libahci: Allow using multiple regulators
  Documentation: bindings: Add the regulator property to the sub-nodes AHCI bindings
  ata: libahci: Clean-up the ahci_platform_en/disable_phys functions
  sata_rcar: extend PM methods
  sata_dwc_460ex: disable compilation on ARM and ARM64
  ata: libata-core: Remove unused function
  sata_dwc_460ex: convert to devm_kzalloc in ->probe()
  sata_dwc_460ex: remove extra message
  sata_dwc_460ex: use np local variable in ->probe()
  sata_dwc_460ex: fix most of the sparse warnings
  sata_dwc_460ex: enable COMPILE_TEST for the driver
  sata_dwc_460ex: remove redundant dev_set_drvdata
  ...

9 years agoMerge branch 'for-3.20' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq
Linus Torvalds [Tue, 10 Feb 2015 03:21:15 +0000 (19:21 -0800)]
Merge branch 'for-3.20' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq

Pull workqueue changes from Tejun Heo:
 "One cosmetic cleanup patch"

* 'for-3.20' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
  workqueue.h: remove loops of single statement macros

9 years agoMerge branch 'for-3.20' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
Linus Torvalds [Tue, 10 Feb 2015 03:07:45 +0000 (19:07 -0800)]
Merge branch 'for-3.20' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup

Pull cgroup changes from Tejun Heo:
 "Three mostly trivial patches.  The biggest change is that blkio is now
  initialized before memcg which will be needed to make memcg and blkcg
  cooperate on writeback IOs"

* 'for-3.20' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
  cgroup: add dummy css_put() for !CONFIG_CGROUPS
  cgroup: reorder SUBSYS(blkio) in cgroup_subsys.h
  Update of Documentation/cgroups/00-INDEX

9 years agoMerge branch 'for-3.20' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu
Linus Torvalds [Tue, 10 Feb 2015 02:55:45 +0000 (18:55 -0800)]
Merge branch 'for-3.20' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu

Pull percpu changes from Tejun Heo:
 "Nothing too interesting.  One cleanup patch and another to add a
  trivial state check function"

* 'for-3.20' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu:
  percpu_ref: implement percpu_ref_is_dying()
  percpu_ref: remove unnecessary ACCESS_ONCE() in percpu_ref_tryget_live()

9 years agoMerge tag 'edac_for_3.20' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp
Linus Torvalds [Tue, 10 Feb 2015 02:36:19 +0000 (18:36 -0800)]
Merge tag 'edac_for_3.20' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp

Pull EDAC updates from Borislav Petkov:

 - a new synopsys_edac.c driver for the Synopsys DDR controller, from
   Punnaiah Choudary Kalluri.

 - minor fixes/cleanups all around

 - Mauro and I are adding the repo URLs to MAINTAINERS since people
   asked for trees to base upcoming work on.

* tag 'edac_for_3.20' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp:
  EDAC: Add repo URLs to MAINTAINERS
  EDAC, mv64x60_edac: Fix an error code in probe()
  EDAC: edac_mc_sysfs: Make stuff static
  EDAC: Fix the leak of mci->bus->name when bus_register fails
  edac: i5100_edac: Remove unused i5100_recmema_dm_buf_id
  EDAC, synps: Add EDAC support for zynq ddr ecc controller
  mpc85xx_edac: Fix a typo in comments
  EDAC, mce_amd_inj: Fix sparse non static symbol warning

9 years agoMerge branch 'x86-ras-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Tue, 10 Feb 2015 02:22:04 +0000 (18:22 -0800)]
Merge branch 'x86-ras-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 RAS update from Ingo Molnar:
 "The changes in this cycle were:

   - allow mmcfg access to APEI error injection handlers

   - improve MCE error messages

   - smaller cleanups"

* 'x86-ras-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86, mce: Fix sparse errors
  x86, mce: Improve timeout error messages
  ACPI, EINJ: Enhance error injection tolerance level

9 years agoMerge branch 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Tue, 10 Feb 2015 02:16:03 +0000 (18:16 -0800)]
Merge branch 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 mm cleanups from Ingo Molnar:
 "Two cleanups: simplify parse_setup_data() and sanitize_e820_map()
  usage"

* 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86, e820: Clean up sanitize_e820_map() users
  x86, setup: Let early_memremap() handle page alignment

9 years agoMerge branch 'x86-platform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Tue, 10 Feb 2015 02:11:28 +0000 (18:11 -0800)]
Merge branch 'x86-platform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 SoC updates from Ingo Molnar:
 "Various Intel Atom SoC updates (mostly to enhance debuggability), plus
  an apb_timer cleanup"

* 'x86-platform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86: pmc_atom: Expose contents of PSS
  x86: pmc_atom: Clean up init function
  x86: pmc-atom: Remove unused macro
  x86: pmc_atom: don%27t check for NULL twice
  x86: pmc-atom: Assign debugfs node as soon as possible
  x86/platform: Remove unused function from apb_timer.c

9 years agoMerge branch 'x86-fpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Tue, 10 Feb 2015 02:01:52 +0000 (18:01 -0800)]
Merge branch 'x86-fpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 fpu updates from Ingo Molnar:
 "Initial round of kernel_fpu_begin/end cleanups from Oleg Nesterov,
  plus a cleanup from Borislav Petkov"

* 'x86-fpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86, fpu: Fix math_state_restore() race with kernel_fpu_begin()
  x86, fpu: Don't abuse has_fpu in __kernel_fpu_begin/end()
  x86, fpu: Introduce per-cpu in_kernel_fpu state
  x86/fpu: Use a symbolic name for asm operand

9 years agoMerge branch 'x86-efi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Tue, 10 Feb 2015 01:53:53 +0000 (17:53 -0800)]
Merge branch 'x86-efi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull EFI updates from Ingo Molnar:
 "Main changes:

   - Move efivarfs from the misc filesystem section to pseudo filesystem

   - Expose firmware platform size in sysfs

   - Improve robustness of get_memory_map() by removing assumptions on
     the size of efi_memory_desc_t.

  - various cleanups and fixes

  The biggest risk is the get_memory_map() change, which changes the way
  that both the arm64 and x86 EFI boot stub build the early memory map.
  There are no known regressions with it at the moment, BYMMV"

* 'x86-efi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  efi: Don't look for chosen@0 node on DT platforms
  firmware: efi: Remove unneeded guid unparse
  efi/libstub: Call get_memory_map() to obtain map and desc sizes
  efi: Small leak on error in runtime map code
  efi: rtc-efi: Mark UIE as unsupported
  arm64/efi: efistub: Apply __init annotation
  efi: Expose underlying UEFI firmware platform size to userland
  efi: Rename efi_guid_unparse to efi_guid_to_str
  efi: Update the URLs for efibootmgr
  fs: Make efivarfs a pseudo filesystem, built by default with EFI

9 years agoMerge branch 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Tue, 10 Feb 2015 01:50:09 +0000 (17:50 -0800)]
Merge branch 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 cleanups from Ingo Molnar:
 "Misc cleanups"

* 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/rtc: Remove duplicate const specifier
  x86, early_serial_console: Remove unnecessary check
  x86, early_serial_console: Remove unused macro XMTRDY
  x86, setup: Rename BOOT_ISDIGIT_H to BOOT_CTYPE_H
  x86, CPU: Fix trivial printk formatting issues with dmesg

9 years agoMerge branch 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Tue, 10 Feb 2015 01:16:44 +0000 (17:16 -0800)]
Merge branch 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 asm changes from Ingo Molnar:
 "The main changes in this cycle were the x86/entry and sysret
  enhancements from Andy Lutomirski, see merge commits 772a9aca125 and
  b57c0b5175dd for details"

[ Exectutive summary: IST exceptions that interrupt user space will run
  on the regular kernel stack instead of the IST stack.  Which
  simplifies things particularly on return to user space.

  The sysret cleanup ends up simplifying the logic on when we can use
  sysret vs when we have to use iret.                - Linus ]

* 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86_64, entry: Remove the syscall exit audit and schedule optimizations
  x86_64, entry: Use sysret to return to userspace when possible
  x86, traps: Fix ist_enter from userspace
  x86, vdso: teach 'make clean' remove vdso64 binaries
  x86_64 entry: Fix RCX for ptraced syscalls
  x86: entry_64.S: fold SAVE_ARGS_IRQ macro into its sole user
  x86: ia32entry.S: fix wrong symbolic constant usage: R11->ARGOFFSET
  x86: entry_64.S: delete unused code
  x86, mce: Get rid of TIF_MCE_NOTIFY and associated mce tricks
  x86, traps: Add ist_begin_non_atomic and ist_end_non_atomic
  x86: Clean up current_stack_pointer
  x86, traps: Track entry into and exit from IST context
  x86, entry: Switch stacks on a paranoid entry from userspace

9 years agoMerge branch 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Tue, 10 Feb 2015 00:57:56 +0000 (16:57 -0800)]
Merge branch 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 APIC updates from Ingo Molnar:
 "Continued fallout of the conversion of the x86 IRQ code to the
  hierarchical irqdomain framework: more cleanups, simplifications,
  memory allocation behavior enhancements, mainly in the interrupt
  remapping and APIC code"

* 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (44 commits)
  x86, init: Fix UP boot regression on x86_64
  iommu/amd: Fix irq remapping detection logic
  x86/acpi: Make acpi_[un]register_gsi_ioapic() depend on CONFIG_X86_LOCAL_APIC
  x86: Consolidate boot cpu timer setup
  x86/apic: Reuse apic_bsp_setup() for UP APIC setup
  x86/smpboot: Sanitize uniprocessor init
  x86/smpboot: Move apic init code to apic.c
  init: Get rid of x86isms
  x86/apic: Move apic_init_uniprocessor code
  x86/smpboot: Cleanup ioapic handling
  x86/apic: Sanitize ioapic handling
  x86/ioapic: Add proper checks to setp/enable_IO_APIC()
  x86/ioapic: Provide stub functions for IOAPIC%3Dn
  x86/smpboot: Move smpboot inlines to code
  x86/x2apic: Use state information for disable
  x86/x2apic: Split enable and setup function
  x86/x2apic: Disable x2apic from nox2apic setup
  x86/x2apic: Add proper state tracking
  x86/x2apic: Clarify remapping mode for x2apic enablement
  x86/x2apic: Move code in conditional region
  ...

9 years agoMerge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Tue, 10 Feb 2015 00:33:07 +0000 (16:33 -0800)]
Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull timer updates from Ingo Molnar:
 "The main changes in this cycle were:

   - rework hrtimer expiry calculation in hrtimer_interrupt(): the
     previous code had a subtle bug where expiry caching would miss an
     expiry, resulting in occasional bogus (late) expiry of hrtimers.

   - continuing Y2038 fixes

   - ktime division optimization

   - misc smaller fixes and cleanups"

* 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  hrtimer: Make __hrtimer_get_next_event() static
  rtc: Convert rtc_set_ntp_time() to use timespec64
  rtc: Remove redundant rtc_valid_tm() from rtc_hctosys()
  rtc: Modify rtc_hctosys() to address y2038 issues
  rtc: Update rtc-dev to use y2038-safe time interfaces
  rtc: Update interface.c to use y2038-safe time interfaces
  time: Expose get_monotonic_boottime64 for in-kernel use
  time: Expose getboottime64 for in-kernel uses
  ktime: Optimize ktime_divns for constant divisors
  hrtimer: Prevent stale expiry time in hrtimer_interrupt()
  ktime.h: Introduce ktime_ms_delta

9 years agoMerge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Tue, 10 Feb 2015 00:06:06 +0000 (16:06 -0800)]
Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull scheduler updates from Ingo Molnar:
 "The main scheduler changes in this cycle were:

   - various sched/deadline fixes and enhancements

   - rescheduling latency fixes/cleanups

   - rework the rq->clock code to be more consistent and more robust.

   - minor micro-optimizations

   - ->avg.decay_count fixes

   - add a stack overflow check to might_sleep()

   - idle-poll handler fix, possibly resulting in power savings

   - misc smaller updates and fixes"

* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched/Documentation: Remove unneeded word
  sched/wait: Introduce wait_on_bit_timeout()
  sched: Pull resched loop to __schedule() callers
  sched/deadline: Remove cpu_active_mask from cpudl_find()
  sched: Fix hrtick_start() on UP
  sched/deadline: Avoid pointless __setscheduler()
  sched/deadline: Fix stale yield state
  sched/deadline: Fix hrtick for a non-leftmost task
  sched/deadline: Modify cpudl::free_cpus to reflect rd->online
  sched/idle: Add missing checks to the exit condition of cpu_idle_poll()
  sched: Fix missing preemption opportunity
  sched/rt: Reduce rq lock contention by eliminating locking of non-feasible target
  sched/debug: Print rq->clock_task
  sched/core: Rework rq->clock update skips
  sched/core: Validate rq_clock*() serialization
  sched/core: Remove check of p->sched_class
  sched/fair: Fix sched_entity::avg::decay_count initialization
  sched/debug: Fix potential call to __ffs(0) in sched_show_task()
  sched/debug: Check for stack overflow in ___might_sleep()
  sched/fair: Fix the dealing with decay_count in __synchronize_entity_decay()

9 years agoMerge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Mon, 9 Feb 2015 23:43:55 +0000 (15:43 -0800)]
Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull perf updates from Ingo Molnar:
 "Kernel side changes:

   - AMD range breakpoints support:

     Extend breakpoint tools and core to support address range through
     perf event with initial backend support for AMD extended
     breakpoints.

     The syntax is:

         perf record -e mem:addr/len:type

     For example set write breakpoint from 0x1000 to 0x1200 (0x1000 + 512)

         perf record -e mem:0x1000/512:w

   - event throttling/rotating fixes

   - various event group handling fixes, cleanups and general paranoia
     code to be more robust against bugs in the future.

    - kernel stack overhead fixes

  User-visible tooling side changes:

   - Show precise number of samples in at the end of a 'record' session,
     if processing build ids, since we will then traverse the whole
     perf.data file and see all the PERF_RECORD_SAMPLE records,
     otherwise stop showing the previous off-base heuristicly counted
     number of "samples" (Namhyung Kim).

   - Support to read compressed module from build-id cache (Namhyung
     Kim)

   - Enable sampling loads and stores simultaneously in 'perf mem'
     (Stephane Eranian)

   - 'perf diff' output improvements (Namhyung Kim)

   - Fix error reporting for evsel pgfault constructor (Arnaldo Carvalho
     de Melo)

  Tooling side infrastructure changes:

   - Cache eh/debug frame offset for dwarf unwind (Namhyung Kim)

   - Support parsing parameterized events (Cody P Schafer)

   - Add support for IP address formats in libtraceevent (David Ahern)

  Plus other misc fixes"

* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (48 commits)
  perf: Decouple unthrottling and rotating
  perf: Drop module reference on event init failure
  perf: Use POLLIN instead of POLL_IN for perf poll data in flag
  perf: Fix put_event() ctx lock
  perf: Fix move_group() order
  perf: Fix event->ctx locking
  perf: Add a bit of paranoia
  perf symbols: Convert lseek + read to pread
  perf tools: Use perf_data_file__fd() consistently
  perf symbols: Support to read compressed module from build-id cache
  perf evsel: Set attr.task bit for a tracking event
  perf header: Set header version correctly
  perf record: Show precise number of samples
  perf tools: Do not use __perf_session__process_events() directly
  perf callchain: Cache eh/debug frame offset for dwarf unwind
  perf tools: Provide stub for missing pthread_attr_setaffinity_np
  perf evsel: Don't rely on malloc working for sz 0
  tools lib traceevent: Add support for IP address formats
  perf ui/tui: Show fatal error message only if exists
  perf tests: Fix typo in sample-parsing.c
  ...

9 years agoMerge branch 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Mon, 9 Feb 2015 23:24:03 +0000 (15:24 -0800)]
Merge branch 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull core locking updates from Ingo Molnar:
 "The main changes are:

   - mutex, completions and rtmutex micro-optimizations
   - lock debugging fix
   - various cleanups in the MCS and the futex code"

* 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  locking/rtmutex: Optimize setting task running after being blocked
  locking/rwsem: Use task->state helpers
  sched/completion: Add lock-free checking of the blocking case
  sched/completion: Remove unnecessary ->wait.lock serialization when reading completion state
  locking/mutex: Explicitly mark task as running after wakeup
  futex: Fix argument handling in futex_lock_pi() calls
  doc: Fix misnamed FUTEX_CMP_REQUEUE_PI op constants
  locking/Documentation: Update code path
  softirq/preempt: Add missing current->preempt_disable_ip update
  locking/osq: No need for load/acquire when acquire-polling
  locking/mcs: Better differentiate between MCS variants
  locking/mutex: Introduce ww_mutex_set_context_slowpath()
  locking/mutex: Move MCS related comments to proper location
  locking/mutex: Checking the stamp is WW only

9 years agoMerge branch 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Mon, 9 Feb 2015 22:28:42 +0000 (14:28 -0800)]
Merge branch 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull RCU updates from Ingo Molnar:
 "The main RCU changes in this cycle are:

   - Documentation updates.

   - Miscellaneous fixes.

   - Preemptible-RCU fixes, including fixing an old bug in the
     interaction of RCU priority boosting and CPU hotplug.

   - SRCU updates.

   - RCU CPU stall-warning updates.

   - RCU torture-test updates"

* 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (54 commits)
  rcu: Initialize tiny RCU stall-warning timeouts at boot
  rcu: Fix RCU CPU stall detection in tiny implementation
  rcu: Add GP-kthread-starvation checks to CPU stall warnings
  rcu: Make cond_resched_rcu_qs() apply to normal RCU flavors
  rcu: Optionally run grace-period kthreads at real-time priority
  ksoftirqd: Use new cond_resched_rcu_qs() function
  ksoftirqd: Enable IRQs and call cond_resched() before poking RCU
  rcutorture: Add more diagnostics in rcu_barrier() test failure case
  torture: Flag console.log file to prevent holdovers from earlier runs
  torture: Add "-enable-kvm -soundhw pcspk" to qemu command line
  rcutorture: Handle different mpstat versions
  rcutorture: Check from beginning to end of grace period
  rcu: Remove redundant rcu_batches_completed() declaration
  rcutorture: Drop rcu_torture_completed() and friends
  rcu: Provide rcu_batches_completed_sched() for TINY_RCU
  rcutorture: Use unsigned for Reader Batch computations
  rcutorture: Make build-output parsing correctly flag RCU's warnings
  rcu: Make _batches_completed() functions return unsigned long
  rcutorture: Issue warnings on close calls due to Reader Batch blows
  documentation: Fix smp typo in memory-barriers.txt
  ...

9 years agoMerge tag 'regulator-v3.20' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie...
Linus Torvalds [Mon, 9 Feb 2015 21:46:28 +0000 (13:46 -0800)]
Merge tag 'regulator-v3.20' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator

Pull regulator updates from Mark Brown:
 "This has not been a busy release for the regulator framework, though
  we do have the first parts of some ongoing work from Bjorn Andersson
  to allow us to support more complex modern systems with dynamic
  configuration of regulators in suspend and idle states.

   - Support for device-specific properties on regulator nodes when
     using simplified DT parsing in the core from Krzysztof Kozlowski.

   - Restructuring of the load tracking code, intended to support future
     improvements in this area for more complex system designs.

   - New drivers for Maxim MAX77843 and Mediatek MT6397.

   - Lots of smaller fixes and improvements"

* tag 'regulator-v3.20' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator: (29 commits)
  regulator: max77843: Add max77843 regulator driver
  regulator: Fix build breakage on !REGULATOR
  regulator: Build sysfs entries with static attribute groups
  regulator: qcom-rpm: Make it possible to specify supply
  regulator: core: Consolidate drms update handling
  regulator: qcom-rpm: signedness bug in probe()
  regulator: da9211: Add gpio control for enable/disable of buck
  regulator: qcom_rpm: Don't update vreg->uV/mV if rpm_reg_write fails
  regulator: lp872x: Remove **regulators from struct lp872x
  regulator: da9211: fix unmatched of_node
  regulator: Update documentation after renaming function argument
  regulator: axp20x: Migrate to regulator core's simplified DT parsing code
  regulator: axp20x: Fill regulators_node and of_match descriptor fields
  regulator: pfuze100-regulator: add pfuze3000 support
  regulator: max77686: Document gpio properties
  regulator: Allow parsing custom properties when using simplified DT parsing
  regulator: max77686: Add GPIO control
  regulator: Copy config passed during registration
  regulator: tps65023: Constify struct regmap_config and regulator_ops
  regulator: max8649: Constify struct regmap_config and regulator_ops
  ...

9 years agoMerge tag 'spi-v3.20' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi
Linus Torvalds [Mon, 9 Feb 2015 21:36:20 +0000 (13:36 -0800)]
Merge tag 'spi-v3.20' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi

Pull spi updates from Mark Brown:
 "The major highlight this release is a refactoring of the core to allow
  us to run synchronous transfers in the context of the caller when
  there is no contention for the bus.  This improves performance in the
  very common case by eliminating context switches and reducing the
  number of hardware setup and teardown operations we need to perform.

  Other changes:

   - New drivers for DLN-2 USB-SPI adapter and ST SPI controllers.

   - A big round of cleanups, performance and feature improvements for
     the xilinx driver from Ricardo Ribalda Delgado.

   - A wide range of smaller cleanups, fixes and feature improvements
     throughout the subsystem"

* tag 'spi-v3.20' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi: (68 commits)
  spi: mxs: cleanup wait_for_completion return handling
  spi: ti-qspi: cleanup wait_for_completion return handling
  spi: spi-imx: cleanup wait_for_completion handling
  spi: sh-msiof: cleanup wait_for_completion return handling
  spi: match var type to return type of wait_for_completion
  spi: spi-pxa2xx: only include mach/dma.h for legacy DMA
  spi: atmel: cleanup wait_for_completion return handling
  spi: fsl-dspi: Remove possible memory leak of 'chip'
  spi: sh-msiof: Update calculation of frequency dividing
  spi: spidev: Convert buf pointers for 32-bit compat SPI_IOC_MESSAGE(n)
  spi/xilinx: Fix access invalid memory on xilinx_spi_tx
  spi: Revert "spi/xilinx: Remove iowrite/ioread wrappers"
  spi/xilinx: Check number of slaves range
  spi/xilinx: Use polling mode on small transfers
  spi/xilinx: Remove remaining_words driver data variable
  spi/xilinx: Remove iowrite/ioread wrappers
  spi/xilinx: Convert bits_per_word in bytes_per_word
  spi/xilinx: Convert remainding_bytes in remaining words
  spi/xilinx: Make spi_tx and spi_rx simmetric
  spi/xilinx: Remove rx_fn and tx_fn pointer
  ...

9 years agoMerge tag 'regmap-v3.20' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie...
Linus Torvalds [Mon, 9 Feb 2015 20:52:18 +0000 (12:52 -0800)]
Merge tag 'regmap-v3.20' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap

Pull regmap updates from Mark Brown:
 "A very quiet release for regmap this time around:

   - Fix an endianness issue for I2C devices connected via SMBus where
     we were getting two layers both trying to do endianness handling.
   - Use a union to reduce the size of the regmap struct.
   - A couple of smaller fixes"

* tag 'regmap-v3.20' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap:
  regmap: Fix i2c word access when using SMBus access functions
  regmap: Export regmap_get_val_endian
  regmap: ac97: Clean up indentation
  regmap: correct the description of structure element in reg_field
  regmap: Move spinlock_flags into the union

9 years agoMerge tag 'hwmon-for-linus-v3.20' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Mon, 9 Feb 2015 20:34:41 +0000 (12:34 -0800)]
Merge tag 'hwmon-for-linus-v3.20' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging

Pull hwmon updates from Guenter Roeck:
 "Explicit support for ina231 added to ina2xx driver.

  Minor improvements, cleanup and fixes in various drivers"

* tag 'hwmon-for-linus-v3.20' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
  hwmon: (tmp102) add hibernation callbacks
  hwmon: (ads2828) Only keep data in device data structure if needed
  hwmon: (ads2828) Convert to use regmap
  hwmon: (jc42) Allow negative hysteresis temperatures
  hwmon: (adc128d818) Do proper sign extension
  hwmon: (ad7314) Do proper sign extension
  hwmon: (abx500) Fix format string warnings
  hwmon: (jc42) Fix integer overflow when writing hysteresis value
  hwmon: (jc42) Fix integer overflow
  hwmon: (jc42) Use sign_extend32 for sign extension
  hwmon: (ina2xx) Add ina231 compatible string
  hwmon: (ina2xx) use DIV_ROUND_CLOSEST() to avoid rounding errors
  hwmon: (ina2xx) remove an unnecessary dev_get_drvdata() result check
  hwmon: (ina2xx) implement update_interval attribute for ina226
  hwmon: (ina2xx) make shunt resistance configurable at run-time
  hwmon: (ina2xx) don't accept shunt values greater than the calibration factor
  hwmon: (ina2xx) remove a stray new line
  hwmon: (ina2xx) reinitialize the chip in case it's been reset
  hwmon: (nct7802) Constify struct regmap_config

9 years agorandom: Fix fast_mix() function
George Spelvin [Sat, 7 Feb 2015 05:32:06 +0000 (00:32 -0500)]
random: Fix fast_mix() function

There was a bad typo in commit 43759d4f429c ("random: use an improved
fast_mix() function") and I didn't notice because it "looked right", so
I saw what I expected to see when I reviewed it.

Only months later did I look and notice it's not the Threefish-inspired
mix function that I had designed and optimized.

Mea Culpa.  Each input bit still has a chance to affect each output bit,
and the fast pool is spilled *long* before it fills, so it's not a total
disaster, but it's definitely not the intended great improvement.

I'm still working on finding better rotation constants.  These are good
enough, but since it's unrolled twice, it's possible to get better
mixing for free by using eight different constants rather than repeating
the same four.

Signed-off-by: George Spelvin <linux@horizon.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: stable@vger.kernel.org # v3.16+
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
9 years agoMerge branch 'for-3.19-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj...
Tejun Heo [Mon, 9 Feb 2015 12:54:41 +0000 (07:54 -0500)]
Merge branch 'for-3.19-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata into for-3.20

09c32aaa3683 ("ahci_xgene: Fix the dma state machine lockup for the
ATA_CMD_SMART PIO mode command.") missed 3.19 release.  Fold it into
for-3.20.

Signed-off-by: Tejun Heo <tj@kernel.org>
9 years agoEDAC: Add repo URLs to MAINTAINERS
Borislav Petkov [Fri, 6 Feb 2015 16:46:07 +0000 (17:46 +0100)]
EDAC: Add repo URLs to MAINTAINERS

... so that people can base new work ontop.

Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
9 years agoLinux 3.19 v3.19
Linus Torvalds [Mon, 9 Feb 2015 02:54:22 +0000 (18:54 -0800)]
Linux 3.19

9 years agoMerge tag 'nios2-fixes-v3.19-final' of git://git.rocketboards.org/linux-socfpga-next
Linus Torvalds [Mon, 9 Feb 2015 02:45:16 +0000 (18:45 -0800)]
Merge tag 'nios2-fixes-v3.19-final' of git://git.rocketboards.org/linux-socfpga-next

Pull nios2 fix from Ley Foon Tan:
 "This fixes incorrect behavior of some user programs"

* tag 'nios2-fixes-v3.19-final' of git://git.rocketboards.org/linux-socfpga-next:
  nios2: fix unhandled signals

9 years agoMerge git://git.kvack.org/~bcrl/aio-fixes
Linus Torvalds [Mon, 9 Feb 2015 02:27:58 +0000 (18:27 -0800)]
Merge git://git.kvack.org/~bcrl/aio-fixes

Pull aio nested sleep annotation from Ben LaHaise,

* git://git.kvack.org/~bcrl/aio-fixes:
  aio: annotate aio_read_event_ring for sleep patterns

9 years agoMerge tag 'trace-fixes-v3.19-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Mon, 9 Feb 2015 02:08:14 +0000 (18:08 -0800)]
Merge tag 'trace-fixes-v3.19-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull ftrace fixes from Steven Rostedt:
 "During testing Sedat Dilek hit a "suspicious RCU usage" splat that
  pointed out a real bug.  During suspend and resume the tlb_flush
  tracepoint is called when the CPU is going offline.  As the CPU has
  been noted as offline, RCU is ignoring that CPU, which means that it
  can not use RCU protected locks.  When tracepoints are activated, they
  require RCU locking, and if RCU is ignoring a CPU that runs a
  tracepoint, there is a chance that the tracepoint could cause
  corruption.

  The solution was to change the tracepoint into a
  TRACE_EVENT_CONDITION() which allows us to check a condition to
  determine if the tracepoint should be called or not.  If the condition
  is not met, the rcu protected code will not be executed.  By adding
  the condition "cpu_online(smp_processor_id())", this will prevent the
  RCU protected code from being executed if the CPU is marked offline.

  After adding this, another bug was discovered.  As RCU checks rcu
  callers, if a rcu call is not done, there is no check (obviously).  We
  found that tracepoints could be added in RCU ignored locations and not
  have lockdep complain until the tracepoint is activated.  This missed
  places where tracepoints were added in places they should not have
  been.  To fix this, code was added in 3.18 that if lockdep is enabled,
  any tracepoint will still call the rcu checks even if the tracepoint
  is not enabled.  The bug here, is that the check does not take the
  CONDITION into account.  As the condition may prevent tracepoints from
  being activated in RCU ignored areas (as the one patch does), we get
  false positives when we enable lockdep and hit a tracepoint that the
  condition prevents it from being called in a RCU ignored location.

  The fix for this is to add the CONDITION to the rcu checks, even if
  the tracepoint is not enabled"

* tag 'trace-fixes-v3.19-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  x86/tlb/trace: Do not trace on CPU that is offline
  tracing: Add condition check to RCU lockdep checks

9 years agonios2: fix unhandled signals
Chung-Ling Tang [Mon, 9 Feb 2015 01:40:50 +0000 (09:40 +0800)]
nios2: fix unhandled signals

Follow other architectures for user fault handling.

Signed-off-by: Chung-Ling Tang <cltang@codesourcery.com>
Acked-by: Ley Foon Tan <lftan@altera.com>
9 years agoMerge remote-tracking branch 'spi/topic/xilinx' into spi-next
Mark Brown [Sun, 8 Feb 2015 03:17:01 +0000 (11:17 +0800)]
Merge remote-tracking branch 'spi/topic/xilinx' into spi-next

9 years agoMerge remote-tracking branches 'spi/topic/sirf', 'spi/topic/spidev', 'spi/topic/st...
Mark Brown [Sun, 8 Feb 2015 03:16:58 +0000 (11:16 +0800)]
Merge remote-tracking branches 'spi/topic/sirf', 'spi/topic/spidev', 'spi/topic/st-ssc' and 'spi/topic/ti-qspi' into spi-next

9 years agoMerge remote-tracking branches 'spi/topic/orion', 'spi/topic/pxa2xx', 'spi/topic...
Mark Brown [Sun, 8 Feb 2015 03:16:55 +0000 (11:16 +0800)]
Merge remote-tracking branches 'spi/topic/orion', 'spi/topic/pxa2xx', 'spi/topic/qup', 'spi/topic/rockchip' and 'spi/topic/samsung' into spi-next

9 years agoMerge remote-tracking branches 'spi/topic/img-spfi', 'spi/topic/imx', 'spi/topic...
Mark Brown [Sun, 8 Feb 2015 03:16:52 +0000 (11:16 +0800)]
Merge remote-tracking branches 'spi/topic/img-spfi', 'spi/topic/imx', 'spi/topic/inline', 'spi/topic/meson' and 'spi/topic/mxs' into spi-next

9 years agoMerge remote-tracking branches 'spi/topic/falcon', 'spi/topic/fsf', 'spi/topic/fsl...
Mark Brown [Sun, 8 Feb 2015 03:16:46 +0000 (11:16 +0800)]
Merge remote-tracking branches 'spi/topic/falcon', 'spi/topic/fsf', 'spi/topic/fsl', 'spi/topic/fsl-dspi' and 'spi/topic/gpio' into spi-next

9 years agoMerge remote-tracking branches 'spi/topic/atmel', 'spi/topic/config', 'spi/topic...
Mark Brown [Sun, 8 Feb 2015 03:16:43 +0000 (11:16 +0800)]
Merge remote-tracking branches 'spi/topic/atmel', 'spi/topic/config', 'spi/topic/dln2' and 'spi/topic/dw' into spi-next

9 years agoMerge remote-tracking branch 'spi/topic/sh-msiof' into spi-next
Mark Brown [Sun, 8 Feb 2015 03:16:43 +0000 (11:16 +0800)]
Merge remote-tracking branch 'spi/topic/sh-msiof' into spi-next

9 years agoMerge remote-tracking branch 'spi/topic/core' into spi-next
Mark Brown [Sun, 8 Feb 2015 03:16:42 +0000 (11:16 +0800)]
Merge remote-tracking branch 'spi/topic/core' into spi-next

9 years agoMerge tag 'spi-v3.19-rc7' into spi-linus
Mark Brown [Sun, 8 Feb 2015 03:16:38 +0000 (11:16 +0800)]
Merge tag 'spi-v3.19-rc7' into spi-linus

spi: Fixes for v3.19

A couple of driver specific fixes:

 - Disable DMA mode for i.MX6DL chips due to a hardware bug.
 - Don't use devm_kzalloc() outside of bind/unbind paths in the fsl-dspi
   driver, fixing memory leaks.

# gpg: Signature made Thu 05 Feb 2015 05:06:57 HKT using RSA key ID 5D5487D0
# gpg: WARNING: digest algorithm MD5 is deprecated
# gpg: please see http://www.gnupg.org/faq/weak-digest-algos.html for more information
# gpg: Oops: keyid_from_fingerprint: no pubkey
# gpg: key AF88CD16: no public key for trusted key - skipped
# gpg: key AF88CD16 marked as ultimately trusted
# gpg: key 5621E907: no public key for trusted key - skipped
# gpg: key 5621E907 marked as ultimately trusted
# gpg: Good signature from "Mark Brown <broonie@sirena.org.uk>"
# gpg:                 aka "Mark Brown <broonie@debian.org>"
# gpg:                 aka "Mark Brown <broonie@kernel.org>"
# gpg:                 aka "Mark Brown <broonie@tardis.ed.ac.uk>"
# gpg:                 aka "Mark Brown <broonie@linaro.org>"
# gpg:                 aka "Mark Brown <Mark.Brown@linaro.org>"

9 years agoMerge remote-tracking branches 'regulator/topic/rk808', 'regulator/topic/rpm', 'regul...
Mark Brown [Sun, 8 Feb 2015 03:16:30 +0000 (11:16 +0800)]
Merge remote-tracking branches 'regulator/topic/rk808', 'regulator/topic/rpm', 'regulator/topic/rt5033' and 'regulator/topic/tps65023' into regulator-next

9 years agoMerge remote-tracking branches 'regulator/topic/max8649', 'regulator/topic/mode'...
Mark Brown [Sun, 8 Feb 2015 03:16:27 +0000 (11:16 +0800)]
Merge remote-tracking branches 'regulator/topic/max8649', 'regulator/topic/mode', 'regulator/topic/mt6397', 'regulator/topic/pfuze100' and 'regulator/topic/qcom-rpm' into regulator-next

9 years agoMerge remote-tracking branches 'regulator/topic/isl9305', 'regulator/topic/lp872x...
Mark Brown [Sun, 8 Feb 2015 03:16:24 +0000 (11:16 +0800)]
Merge remote-tracking branches 'regulator/topic/isl9305', 'regulator/topic/lp872x', 'regulator/topic/max14577', 'regulator/topic/max7686' and 'regulator/topic/max77843' into regulator-next

9 years agoMerge remote-tracking branches 'regulator/topic/axp20x', 'regulator/topic/da9211...
Mark Brown [Sun, 8 Feb 2015 03:16:23 +0000 (11:16 +0800)]
Merge remote-tracking branches 'regulator/topic/axp20x', 'regulator/topic/da9211' and 'regulator/topic/fan53555' into regulator-next

9 years agoMerge remote-tracking branch 'regulator/topic/dt-cb' into regulator-next
Mark Brown [Sun, 8 Feb 2015 03:16:22 +0000 (11:16 +0800)]
Merge remote-tracking branch 'regulator/topic/dt-cb' into regulator-next

9 years agoMerge remote-tracking branch 'regulator/topic/core' into regulator-next
Mark Brown [Sun, 8 Feb 2015 03:16:21 +0000 (11:16 +0800)]
Merge remote-tracking branch 'regulator/topic/core' into regulator-next

9 years agoMerge remote-tracking branch 'regulator/fix/qcom-rpm' into regulator-linus
Mark Brown [Sun, 8 Feb 2015 03:16:18 +0000 (11:16 +0800)]
Merge remote-tracking branch 'regulator/fix/qcom-rpm' into regulator-linus

9 years agoMerge tag 'regulator-v3.19-rc7' into regulator-linus
Mark Brown [Sun, 8 Feb 2015 03:16:17 +0000 (11:16 +0800)]
Merge tag 'regulator-v3.19-rc7' into regulator-linus

regulator: Fix !REGULATOR builds of systems using system suspend calls

The system suspend calls (used to synchronize between system suspend of
the CPU and it's PMIC) are called from board code but not stubbed when
regulator is disabled causing build failures in such cases.  This patch
fixes those build failures.

# gpg: Signature made Thu 05 Feb 2015 05:10:43 HKT using RSA key ID 5D5487D0
# gpg: WARNING: digest algorithm MD5 is deprecated
# gpg: please see http://www.gnupg.org/faq/weak-digest-algos.html for more information
# gpg: Oops: keyid_from_fingerprint: no pubkey
# gpg: key AF88CD16: no public key for trusted key - skipped
# gpg: key AF88CD16 marked as ultimately trusted
# gpg: key 5621E907: no public key for trusted key - skipped
# gpg: key 5621E907 marked as ultimately trusted
# gpg: Good signature from "Mark Brown <broonie@sirena.org.uk>"
# gpg:                 aka "Mark Brown <broonie@debian.org>"
# gpg:                 aka "Mark Brown <broonie@kernel.org>"
# gpg:                 aka "Mark Brown <broonie@tardis.ed.ac.uk>"
# gpg:                 aka "Mark Brown <broonie@linaro.org>"
# gpg:                 aka "Mark Brown <Mark.Brown@linaro.org>"

9 years agoMerge remote-tracking branches 'regmap/topic/ac97', 'regmap/topic/doc' and 'regmap...
Mark Brown [Sun, 8 Feb 2015 03:16:11 +0000 (11:16 +0800)]
Merge remote-tracking branches 'regmap/topic/ac97', 'regmap/topic/doc' and 'regmap/topic/smbus' into regmap-next

9 years agoMerge remote-tracking branch 'regmap/topic/core' into regmap-next
Mark Brown [Sun, 8 Feb 2015 03:16:10 +0000 (11:16 +0800)]
Merge remote-tracking branch 'regmap/topic/core' into regmap-next

9 years agox86/tlb/trace: Do not trace on CPU that is offline
Steven Rostedt (Red Hat) [Fri, 6 Feb 2015 19:18:19 +0000 (14:18 -0500)]
x86/tlb/trace: Do not trace on CPU that is offline

When taking a CPU down for suspend and resume, a tracepoint may be called
when the CPU has been designated offline. As tracepoints require RCU for
protection, they must not be called if the current CPU is offline.

Unfortunately, trace_tlb_flush() is called in this scenario as was noted
by LOCKDEP:

...

 Disabling non-boot CPUs ...
 intel_pstate CPU 1 exiting

 ===============================
 smpboot: CPU 1 didn't die...
 [ INFO: suspicious RCU usage. ]
 3.19.0-rc7-next-20150204.1-iniza-small #1 Not tainted
 -------------------------------
 include/trace/events/tlb.h:35 suspicious rcu_dereference_check() usage!

 other info that might help us debug this:

 RCU used illegally from offline CPU!
 rcu_scheduler_active = 1, debug_locks = 0
 no locks held by swapper/1/0.

 stack backtrace:
 CPU: 1 PID: 0 Comm: swapper/1 Not tainted 3.19.0-rc7-next-20150204.1-iniza-small #1
 Hardware name: SAMSUNG ELECTRONICS CO., LTD. 530U3BI/530U4BI/530U4BH/530U3BI/530U4BI/530U4BH, BIOS 13XK 03/28/2013
  0000000000000001 ffff88011a44fe18 ffffffff817e370d 0000000000000011
  ffff88011a448290 ffff88011a44fe48 ffffffff810d6847 ffff8800c66b9600
  0000000000000001 ffff88011a44c000 ffffffff81cb3900 ffff88011a44fe78
 Call Trace:
  [<ffffffff817e370d>] dump_stack+0x4c/0x65
  [<ffffffff810d6847>] lockdep_rcu_suspicious+0xe7/0x120
  [<ffffffff810b71a5>] idle_task_exit+0x205/0x2c0
  [<ffffffff81054c4e>] play_dead_common+0xe/0x50
  [<ffffffff81054ca5>] native_play_dead+0x15/0x140
  [<ffffffff8102963f>] arch_cpu_idle_dead+0xf/0x20
  [<ffffffff810cd89e>] cpu_startup_entry+0x37e/0x580
  [<ffffffff81053e20>] start_secondary+0x140/0x150
 intel_pstate CPU 2 exiting

...

By converting the tlb_flush tracepoint to a TRACE_EVENT_CONDITION where the
condition is cpu_online(smp_processor_id()), we can avoid calling RCU protected
code when the CPU is offline.

Link: http://lkml.kernel.org/r/CA+icZUUGiGDoL5NU8RuxKzFjoLjEKRtUWx=JB8B9a0EQv-eGzQ@mail.gmail.com
Cc: stable@vger.kernel.org # 3.17+
Fixes: d17d8f9dedb9 "x86/mm: Add tracepoints for TLB flushes"
Reported-by: Sedat Dilek <sedat.dilek@gmail.com>
Tested-by: Sedat Dilek <sedat.dilek@gmail.com>
Suggested-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Dave Hansen <dave@sr71.net>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>