]> git.kernelconcepts.de Git - karo-tx-linux.git/log
karo-tx-linux.git
11 years agonet/core: remove duplicate statements by do-while loop
Akinobu Mita [Wed, 17 Apr 2013 23:49:21 +0000 (09:49 +1000)]
net/core: remove duplicate statements by do-while loop

Remove duplicate statements by using do-while loop instead of while loop.

- A;
- while (e) {
+ do {
A;
- }
+ } while (e);

Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years agonet/core: rename random32() to prandom_u32()
Akinobu Mita [Wed, 17 Apr 2013 23:49:21 +0000 (09:49 +1000)]
net/core: rename random32() to prandom_u32()

Use preferable function name which implies using a pseudo-random
number generator.

Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years agonet/netfilter: rename random32() to prandom_u32()
Akinobu Mita [Wed, 17 Apr 2013 23:49:20 +0000 (09:49 +1000)]
net/netfilter: rename random32() to prandom_u32()

Use preferable function name which implies using a pseudo-random
number generator.

Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Cc: Pablo Neira Ayuso <pablo@netfilter.org>
Cc: Patrick McHardy <kaber@trash.net>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years agonet/sched: rename random32() to prandom_u32()
Akinobu Mita [Wed, 17 Apr 2013 23:49:20 +0000 (09:49 +1000)]
net/sched: rename random32() to prandom_u32()

Use preferable function name which implies using a pseudo-random
number generator.

Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Cc: Stephen Hemminger <shemminger@vyatta.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years agonet/sunrpc: rename random32() to prandom_u32()
Akinobu Mita [Wed, 17 Apr 2013 23:49:20 +0000 (09:49 +1000)]
net/sunrpc: rename random32() to prandom_u32()

Use preferable function name which implies using a pseudo-random
number generator.

Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Acked-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years agodrivers-net-rename-random32-to-prandom_u32-fix
Andrew Morton [Wed, 17 Apr 2013 23:49:20 +0000 (09:49 +1000)]
drivers-net-rename-random32-to-prandom_u32-fix

Cc: "David S. Miller" <davem@davemloft.net>
Cc: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years agodrivers/net: rename random32() to prandom_u32()
Akinobu Mita [Wed, 17 Apr 2013 23:49:19 +0000 (09:49 +1000)]
drivers/net: rename random32() to prandom_u32()

Use preferable function name which implies using a pseudo-random
number generator.

Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Acked-by: Thomas Sailer <t.sailer@alumni.ethz.ch>
Acked-by: Bing Zhao <bzhao@marvell.com> [mwifiex]
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Michael Chan <mchan@broadcom.com>
Cc: Thomas Sailer <t.sailer@alumni.ethz.ch>
Cc: Jean-Paul Roubelat <jpr@f6fbb.org>
Cc: Bing Zhao <bzhao@marvell.com>
Cc: Brett Rudley <brudley@broadcom.com>
Cc: Arend van Spriel <arend@broadcom.com>
Cc: "Franky (Zhenhui) Lin" <frankyl@broadcom.com>
Cc: Hante Meuleman <meuleman@broadcom.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years agoscsi: rename random32() to prandom_u32()
Akinobu Mita [Wed, 17 Apr 2013 23:49:19 +0000 (09:49 +1000)]
scsi: rename random32() to prandom_u32()

Use preferable function name which implies using a pseudo-random
number generator.

Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Cc: "James E.J. Bottomley" <JBottomley@parallels.com>
Cc: Robert Love <robert.w.love@intel.com>
Cc: James Smart <james.smart@emulex.com>
Cc: Andrew Vasquez <andrew.vasquez@qlogic.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years agolguest: rename random32() to prandom_u32()
Akinobu Mita [Wed, 17 Apr 2013 23:49:19 +0000 (09:49 +1000)]
lguest: rename random32() to prandom_u32()

Use preferable function name which implies using a pseudo-random
number generator.

Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years agouwb: rename random32() to prandom_u32()
Akinobu Mita [Wed, 17 Apr 2013 23:49:18 +0000 (09:49 +1000)]
uwb: rename random32() to prandom_u32()

Use preferable function name which implies using a pseudo-random
number generator.

Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years agovideo/uvesafb: rename random32() to prandom_u32()
Akinobu Mita [Wed, 17 Apr 2013 23:49:18 +0000 (09:49 +1000)]
video/uvesafb: rename random32() to prandom_u32()

Use preferable function name which implies using a pseudo-random
number generator.

Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Cc: Michal Januszewski <spock@gentoo.org>
Cc: Florian Tobias Schandinat <FlorianSchandinat@gmx.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years agommc: rename random32() to prandom_u32()
Akinobu Mita [Wed, 17 Apr 2013 23:49:18 +0000 (09:49 +1000)]
mmc: rename random32() to prandom_u32()

Use preferable function name which implies using a pseudo-random
number generator.

Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Cc: Chris Ball <cjb@laptop.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years agodrbd: rename random32() to prandom_u32()
Akinobu Mita [Wed, 17 Apr 2013 23:49:17 +0000 (09:49 +1000)]
drbd: rename random32() to prandom_u32()

Use preferable function name which implies using a pseudo-random
number generator.

Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years agokernel/: rename random32() to prandom_u32()
Akinobu Mita [Wed, 17 Apr 2013 23:49:17 +0000 (09:49 +1000)]
kernel/: rename random32() to prandom_u32()

Use preferable function name which implies using a pseudo-random
number generator.

Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years agomm/: rename random32() to prandom_u32()
Akinobu Mita [Wed, 17 Apr 2013 23:49:16 +0000 (09:49 +1000)]
mm/: rename random32() to prandom_u32()

Use preferable function name which implies using a pseudo-random
number generator.

Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years agolib/: rename random32() to prandom_u32()
Akinobu Mita [Wed, 17 Apr 2013 23:49:16 +0000 (09:49 +1000)]
lib/: rename random32() to prandom_u32()

Use preferable function name which implies using a pseudo-random
number generator.

Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years agox86: rename random32() to prandom_u32()
Akinobu Mita [Wed, 17 Apr 2013 23:49:16 +0000 (09:49 +1000)]
x86: rename random32() to prandom_u32()

Use preferable function name which implies using a pseudo-random
number generator.

Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years agox86: pageattr-test: remove srandom32 call
Akinobu Mita [Wed, 17 Apr 2013 23:49:16 +0000 (09:49 +1000)]
x86: pageattr-test: remove srandom32 call

pageattr-test calls srandom32() once every test iteration.  But calling
srandom32() after late_initcalls is not meaningfull.  Because the random
states for random32() is mixed by good random numbers in late_initcall
prandom_reseed().

So this removes the call to srandom32().

Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years agouuid: use prandom_bytes()
Akinobu Mita [Wed, 17 Apr 2013 23:49:15 +0000 (09:49 +1000)]
uuid: use prandom_bytes()

Use prandom_bytes() to generate 16 bytes of pseudo-random bytes.

Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Huang Ying <ying.huang@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years agoraid6test: use prandom_bytes()
Akinobu Mita [Wed, 17 Apr 2013 23:49:15 +0000 (09:49 +1000)]
raid6test: use prandom_bytes()

Use prandom_bytes() to generate random bytes for test data.

Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Cc: Dan Williams <djbw@fb.com>
Cc: Vinod Koul <vinod.koul@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years agoaoe: replace kmalloc and then memcpy with kmemdup
Mihnea Dobrescu-Balaur [Wed, 17 Apr 2013 23:49:15 +0000 (09:49 +1000)]
aoe: replace kmalloc and then memcpy with kmemdup

Signed-off-by: Mihnea Dobrescu-Balaur <mihneadb@gmail.com>
Cc: Ed Cashin <ecashin@coraid.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years agonbd: increase default and max request sizes
Michal Belczyk [Wed, 17 Apr 2013 23:49:14 +0000 (09:49 +1000)]
nbd: increase default and max request sizes

Raise the default max request size for nbd to 128KB (from 127KB) to get it
4KB aligned.  This patch also allows the max request size to be increased
(via /sys/block/nbd<x>/queue/max_sectors_kb) to 32MB.

The patch makes nbd network traffic more efficient by:
- reducing request fragmentation (4KB alignment)
- reducing the number of requests (fewer round trips, less network overhead)

Especially in high latency networks, larger request size can make a dramatic

Signed-off-by: Paul Clements <paul.clements@steeleye.com>
Signed-off-by: Michal Belczyk <belczyk@bsd.krakow.pl>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years agopid_namespacec-h-simplify-defines-fix
Andrew Morton [Wed, 17 Apr 2013 23:49:14 +0000 (09:49 +1000)]
pid_namespacec-h-simplify-defines-fix

kernel/pid.c:54:1: warning: "BITS_PER_PAGE" redefined

Cc: Raphael S.Carvalho <raphael.scarv@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years agopid_namespace.c/.h: simplify defines
Raphael S.Carvalho [Wed, 17 Apr 2013 23:49:14 +0000 (09:49 +1000)]
pid_namespace.c/.h: simplify defines

Move BITS_PER_PAGE from pid_namespace.c to pid_namespace.h, since we can
simplify the define PID_MAP_ENTRIES by using the BITS_PER_PAGE.

Signed-off-by: Raphael S.Carvalho <raphael.scarv@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years agokernel-pidc-improve-flow-of-a-loop-inside-alloc_pidmap-fix
Andrew Morton [Wed, 17 Apr 2013 23:49:13 +0000 (09:49 +1000)]
kernel-pidc-improve-flow-of-a-loop-inside-alloc_pidmap-fix

simplify code

Cc: Raphael S. Carvalho <raphael.scarv@gmail.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years agokernel/pid.c: improve flow of a loop inside alloc_pidmap.
Raphael S. Carvalho [Wed, 17 Apr 2013 23:49:13 +0000 (09:49 +1000)]
kernel/pid.c: improve flow of a loop inside alloc_pidmap.

find_next_offset() searches for an available "cleaned bit" in the
respective pid bitmap (page), so returns the offset if found, otherwise it
returns a value equals to BITS_PER_PAGE.

For example, suppose find_next_offset didn't find any available bit, so
there's no purpose to call mk_pid (Wasteful Cpu Cycles).

Therefore, I found it could be better to call mk_pid after the checking
(offset < BITS_PER_PAGE) returned sucessfully!  Another point: If (offset
< BITS_PER_PAGE) results in a "failure", then mk_pid would be called again
afterwards.

Signed-off-by: Raphael S. Carvalho <raphael.scarv@gmail.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years agorbtree_test: add __init/__exit annotations
Davidlohr Bueso [Wed, 17 Apr 2013 23:49:13 +0000 (09:49 +1000)]
rbtree_test: add __init/__exit annotations

Signed-off-by: Davidlohr Bueso <davidlohr.bueso@hp.com>
Reviewed-by: Michel Lespinasse <walken@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years agorbtree_test: add extra rbtree integrity check
Davidlohr Bueso [Wed, 17 Apr 2013 23:49:13 +0000 (09:49 +1000)]
rbtree_test: add extra rbtree integrity check

Account for the rbtree having  2**bh(v)-1 internal nodes.

While this can be seen as a consequence of other checks, Michel states
that it nicely sums up what the other properties are for.

Signed-off-by: Davidlohr Bueso <davidlohr.bueso@hp.com>
Reviewed-by: Michel Lespinasse <walken@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years agomwave: fix info leak in mwave_ioctl()
Dan Carpenter [Wed, 17 Apr 2013 23:49:12 +0000 (09:49 +1000)]
mwave: fix info leak in mwave_ioctl()

Smatch complains that on 64 bit systems, there is a hole in the
MW_ABILITIES struct between ->component_count and ->component_list[].  It
leaks stack information from the mwave_ioctl() function.

I've added a memset() to initialize the struct to zero.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Cc: Greg KH <greg@kroah.com>
Cc: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years agoipc/sem.c: alternatives to preempt_disable()
Manfred Spraul [Wed, 17 Apr 2013 23:49:12 +0000 (09:49 +1000)]
ipc/sem.c: alternatives to preempt_disable()

ipc/sem.c uses a custom wakeup scheme that relies on preempt_disable().
On -RT, this causes increased latencies and debug warnings.

The patch adds two additional schemes:
- one built around a completion - could be better for -RT kernels
- one built around a spinlock - unfortunately it's broken
- and the current one

My preferred solution would be the spinlock implementation: RT would use
premptible spinlocks, mainline normal spinlocks.  Thus both get the
optimal implementation without any special code in ipc/sem.c.
Unfortunately, I don't see how it could be fixed.

Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Mike Galbraith <efault@gmx.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years agoipc-sysv-shared-memory-limited-to-8tib-fix
Andrew Morton [Wed, 17 Apr 2013 23:49:12 +0000 (09:49 +1000)]
ipc-sysv-shared-memory-limited-to-8tib-fix

make ipc/shm.c:newseg()'s numpages size_t, not int

Cc: <stable@vger.kernel.org>
Cc: Alex Thorlton <athorlton@sgi.com>
Cc: Robin Holt <holt@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years agoipc: sysv shared memory limited to 8TiB
Robin Holt [Wed, 17 Apr 2013 23:49:11 +0000 (09:49 +1000)]
ipc: sysv shared memory limited to 8TiB

Trying to run an application which was trying to put data into half of
memory using shmget(), we found that having a shmall value below 8EiB-8TiB
would prevent us from using anything more than 8TiB.  By setting
kernel.shmall greater than 8EiB-8TiB would make the job work.

In the newseg() function, ns->shm_tot which, at 8TiB is INT_MAX.

ipc/shm.c:
 458 static int newseg(struct ipc_namespace *ns, struct ipc_params *params)
 459 {
...
 465         int numpages = (size + PAGE_SIZE -1) >> PAGE_SHIFT;
...
 474         if (ns->shm_tot + numpages > ns->shm_ctlall)
 475                 return -ENOSPC;

Signed-off-by: Robin Holt <holt@sgi.com>
Reported-by: Alex Thorlton <athorlton@sgi.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years agoipc/msg.c: use list_for_each_entry_[safe] for list traversing
Nikola Pajkovsky [Wed, 17 Apr 2013 23:49:11 +0000 (09:49 +1000)]
ipc/msg.c: use list_for_each_entry_[safe] for list traversing

The ipc/msg.c code does its list operations by hand and it open-codes the
accesses, instead of using for_each_entry_[safe].

Signed-off-by: Nikola Pajkovsky <npajkovs@redhat.com>
Cc: Stanislav Kinsbursky <skinsbursky@parallels.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years agoipc, sem: prevent possible deadlock
Davidlohr Bueso [Wed, 17 Apr 2013 23:49:11 +0000 (09:49 +1000)]
ipc, sem: prevent possible deadlock

In semctl_main(), when cmd == GETALL, we're locking sma->sem_perm.lock
(through sem_lock_and_putref), yet after the conditional, we lock it
again.  Unlock sma right after exiting the conditional.

Signed-off-by: Davidlohr Bueso <davidlohr.bueso@hp.com>
Reviewed-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years agofix for sem_lock
Rik van Riel [Wed, 17 Apr 2013 23:49:10 +0000 (09:49 +1000)]
fix for sem_lock

Fix a typo in sem_lock.  Of course we need to unlock the local
semaphore lock before jumping to lock_all, in the rare case that
somebody started a complex operation while we were spinning on
the spinlock.

Can be folded into patch 7/7 before merging

Signed-off-by: Rik van Riel <riel@redhat.com>
Reported-by: Michel Lespinasse <walken@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years agoipc,sem: fine grained locking for semtimedop
Rik van Riel [Wed, 17 Apr 2013 23:49:10 +0000 (09:49 +1000)]
ipc,sem: fine grained locking for semtimedop

Introduce finer grained locking for semtimedop, to handle the common case
of a program wanting to manipulate one semaphore from an array with
multiple semaphores.

If the call is a semop manipulating just one semaphore in an array with
multiple semaphores, only take the lock for that semaphore itself.

If the call needs to manipulate multiple semaphores, or another caller is
in a transaction that manipulates multiple semaphores, the sem_array lock
is taken, as well as all the locks for the individual semaphores.

On a 24 CPU system, performance numbers with the semop-multi
test with N threads and N semaphores, look like this:

vanilla Davidlohr's Davidlohr's + Davidlohr's +
threads patches rwlock patches v3 patches
10 610652 726325 1783589 2142206
20 341570 365699 1520453 1977878
30 288102 307037 1498167 2037995
40 290714 305955 1612665 2256484
50 288620 312890 1733453 2650292
60 289987 306043 1649360 2388008
70 291298 306347 1723167 2717486
80 290948 305662 1729545 2763582
90 290996 306680 1736021 2757524
100 292243 306700 1773700 3059159

Signed-off-by: Rik van Riel <riel@redhat.com>
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Acked-by: Davidlohr Bueso <davidlohr.bueso@hp.com>
Cc: Chegu Vinod <chegu_vinod@hp.com>
Cc: Emmanuel Benisty <benisty.e@gmail.com>
Cc: Jason Low <jason.low2@hp.com>
Reviewed-by: Michel Lespinasse <walken@google.com>
Cc: Peter Hurley <peter@hurleysoftware.com>
Cc: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years agoipc,sem: have only one list in struct sem_queue
Rik van Riel [Wed, 17 Apr 2013 23:49:10 +0000 (09:49 +1000)]
ipc,sem: have only one list in struct sem_queue

Having only one list in struct sem_queue, and only queueing simple
semaphore operations on the list for the semaphore involved, allows us to
introduce finer grained locking for semtimedop.

Signed-off-by: Rik van Riel <riel@redhat.com>
Acked-by: Davidlohr Bueso <davidlohr.bueso@hp.com>
Cc: Chegu Vinod <chegu_vinod@hp.com>
Cc: Emmanuel Benisty <benisty.e@gmail.com>
Cc: Jason Low <jason.low2@hp.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Michel Lespinasse <walken@google.com>
Cc: Peter Hurley <peter@hurleysoftware.com>
Cc: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years agoipcsem-open-code-and-rename-sem_lock-fix
Andrew Morton [Wed, 17 Apr 2013 23:49:10 +0000 (09:49 +1000)]
ipcsem-open-code-and-rename-sem_lock-fix

propagate the ipc_obtain_object() errno out of sem_obtain_lock()

Cc: Chegu Vinod <chegu_vinod@hp.com>
Cc: Davidlohr Bueso <davidlohr.bueso@hp.com>
Cc: Emmanuel Benisty <benisty.e@gmail.com>
Cc: Jason Low <jason.low2@hp.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Michel Lespinasse <walken@google.com>
Cc: Peter Hurley <peter@hurleysoftware.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Rik van Riel <riel@surriel.com>
Cc: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years agoipc,sem: open code and rename sem_lock
Rik van Riel [Wed, 17 Apr 2013 23:49:09 +0000 (09:49 +1000)]
ipc,sem: open code and rename sem_lock

Rename sem_lock() to sem_obtain_lock(), so we can introduce a sem_lock()
later that only locks the sem_array and does nothing else.

Open code the locking from ipc_lock() in sem_obtain_lock() so we can
introduce finer grained locking for the sem_array in the next patch.

Signed-off-by: Rik van Riel <riel@redhat.com>
Acked-by: Davidlohr Bueso <davidlohr.bueso@hp.com>
Cc: Chegu Vinod <chegu_vinod@hp.com>
Cc: Emmanuel Benisty <benisty.e@gmail.com>
Cc: Jason Low <jason.low2@hp.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Michel Lespinasse <walken@google.com>
Cc: Peter Hurley <peter@hurleysoftware.com>
Cc: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years agoipcsem-do-not-hold-ipc-lock-more-than-necessary-fix-checkpatch-fixes
Andrew Morton [Wed, 17 Apr 2013 23:49:09 +0000 (09:49 +1000)]
ipcsem-do-not-hold-ipc-lock-more-than-necessary-fix-checkpatch-fixes

ERROR: space required before the open parenthesis '('
#25: FILE: ipc/sem.c:1101:
+ if(semnum < 0 || semnum >= nsems) {

total: 1 errors, 0 warnings, 12 lines checked

./patches/ipcsem-do-not-hold-ipc-lock-more-than-necessary-fix.patch has style problems, please review.

If any of these errors are false positives, please report
them to the maintainer, see CHECKPATCH in MAINTAINERS.

Please run checkpatch prior to sending patches

Cc: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years agoipc,sem: prevent releasing RCU read lock twice in semctl_main
Sasha Levin [Wed, 17 Apr 2013 23:49:09 +0000 (09:49 +1000)]
ipc,sem: prevent releasing RCU read lock twice in semctl_main

We will release RCU read lock twice if the sem id passed from userspace is
not within the legal range in semctl_main.

Reviewed-by: Rik van Riel <riel@redhat.com>
Reviewed-by: Davidlohr Bueso <davidlohr.bueso@hp.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years agoipc,sem: do not hold ipc lock more than necessary
Davidlohr Bueso [Wed, 17 Apr 2013 23:49:08 +0000 (09:49 +1000)]
ipc,sem: do not hold ipc lock more than necessary

Instead of holding the ipc lock for permissions and security checks, among
others, only acquire it when necessary.

Some numbers....

1) With Rik's semop-multi.c microbenchmark we can see the following
   results:

Baseline (3.9-rc1):
cpus 4, threads: 256, semaphores: 128, test duration: 30 secs
total operations: 151452270, ops/sec 5048409

+  59.40%            a.out  [kernel.kallsyms]  [k] _raw_spin_lock
+   6.14%            a.out  [kernel.kallsyms]  [k] sys_semtimedop
+   3.84%            a.out  [kernel.kallsyms]  [k] avc_has_perm_flags
+   3.64%            a.out  [kernel.kallsyms]  [k] __audit_syscall_exit
+   2.06%            a.out  [kernel.kallsyms]  [k] copy_user_enhanced_fast_string
+   1.86%            a.out  [kernel.kallsyms]  [k] ipc_lock

With this patchset:
cpus 4, threads: 256, semaphores: 128, test duration: 30 secs
total operations: 273156400, ops/sec 9105213

+  18.54%            a.out  [kernel.kallsyms]  [k] _raw_spin_lock
+  11.72%            a.out  [kernel.kallsyms]  [k] sys_semtimedop
+   7.70%            a.out  [kernel.kallsyms]  [k] ipc_has_perm.isra.21
+   6.58%            a.out  [kernel.kallsyms]  [k] avc_has_perm_flags
+   6.54%            a.out  [kernel.kallsyms]  [k] __audit_syscall_exit
+   4.71%            a.out  [kernel.kallsyms]  [k] ipc_obtain_object_check

2) While on an Oracle swingbench DSS (data mining) workload the
   improvements are not as exciting as with Rik's benchmark, we can see
   some positive numbers.  For an 8 socket machine the following are the
   percentages of %sys time incurred in the ipc lock:

Baseline (3.9-rc1):
100 swingbench users: 8,74%
400 swingbench users: 21,86%
800 swingbench users: 84,35%

With this patchset:
100 swingbench users: 8,11%
400 swingbench users: 19,93%
800 swingbench users: 77,69%

[riel@redhat.com: fix two locking bugs]
Signed-off-by: Davidlohr Bueso <davidlohr.bueso@hp.com>
Signed-off-by: Rik van Riel <riel@redhat.com>
Reviewed-by: Chegu Vinod <chegu_vinod@hp.com>
Acked-by: Michel Lespinasse <walken@google.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Jason Low <jason.low2@hp.com>
Cc: Emmanuel Benisty <benisty.e@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Hurley <peter@hurleysoftware.com>
Cc: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years agoipc: introduce lockless pre_down ipcctl
Davidlohr Bueso [Wed, 17 Apr 2013 23:49:08 +0000 (09:49 +1000)]
ipc: introduce lockless pre_down ipcctl

Various forms of ipc use ipcctl_pre_down() to retrieve an ipc object and
check permissions, mostly for IPC_RMID and IPC_SET commands.

Introduce ipcctl_pre_down_nolock(), a lockless version of this function.
The locking version is retained, yet modified to call the nolock version
without affecting its semantics, thus transparent to all ipc callers.

Signed-off-by: Davidlohr Bueso <davidlohr.bueso@hp.com>
Signed-off-by: Rik van Riel <riel@redhat.com>
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Chegu Vinod <chegu_vinod@hp.com>
Cc: Emmanuel Benisty <benisty.e@gmail.com>
Cc: Jason Low <jason.low2@hp.com>
Cc: Michel Lespinasse <walken@google.com>
Cc: Peter Hurley <peter@hurleysoftware.com>
Cc: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years agoipc-introduce-obtaining-a-lockless-ipc-object-fix
Andrew Morton [Wed, 17 Apr 2013 23:49:08 +0000 (09:49 +1000)]
ipc-introduce-obtaining-a-lockless-ipc-object-fix

propagate the ipc_obtain_object() errno from ipc_lock()

Cc: Chegu Vinod <chegu_vinod@hp.com>
Cc: Davidlohr Bueso <davidlohr.bueso@hp.com>
Cc: Emmanuel Benisty <benisty.e@gmail.com>
Cc: Jason Low <jason.low2@hp.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Michel Lespinasse <walken@google.com>
Cc: Peter Hurley <peter@hurleysoftware.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years agoipc: introduce obtaining a lockless ipc object
Davidlohr Bueso [Wed, 17 Apr 2013 23:49:08 +0000 (09:49 +1000)]
ipc: introduce obtaining a lockless ipc object

Through ipc_lock() and therefore ipc_lock_check() we currently return the
locked ipc object.  This is not necessary for all situations and can,
therefore, cause unnecessary ipc lock contention.

Introduce analogous ipc_obtain_object() and ipc_obtain_object_check()
functions that only lookup and return the ipc object.

Both these functions must be called within the RCU read critical section.

Signed-off-by: Davidlohr Bueso <davidlohr.bueso@hp.com>
Signed-off-by: Rik van Riel <riel@redhat.com>
Reviewed-by: Chegu Vinod <chegu_vinod@hp.com>
Acked-by: Michel Lespinasse <walken@google.com>
Cc: Emmanuel Benisty <benisty.e@gmail.com>
Cc: Jason Low <jason.low2@hp.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Hurley <peter@hurleysoftware.com>
Cc: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years agoipc: remove bogus lock comment for ipc_checkid
Davidlohr Bueso [Wed, 17 Apr 2013 23:49:07 +0000 (09:49 +1000)]
ipc: remove bogus lock comment for ipc_checkid

This series makes the sysv semaphore code more scalable, by reducing the
time the semaphore lock is held, and making the locking more scalable for
semaphore arrays with multiple semaphores.

The first four patches were written by Davidlohr Buesso, and reduce the
hold time of the semaphore lock.

The last three patches change the sysv semaphore code locking to be more
fine grained, providing a performance boost when multiple semaphores in a
semaphore array are being manipulated simultaneously.

On a 24 CPU system, performance numbers with the semop-multi
test with N threads and N semaphores, look like this:

vanilla Davidlohr's Davidlohr's + Davidlohr's +
threads patches rwlock patches v3 patches
10 610652 726325 1783589 2142206
20 341570 365699 1520453 1977878
30 288102 307037 1498167 2037995
40 290714 305955 1612665 2256484
50 288620 312890 1733453 2650292
60 289987 306043 1649360 2388008
70 291298 306347 1723167 2717486
80 290948 305662 1729545 2763582
90 290996 306680 1736021 2757524
100 292243 306700 1773700 3059159

This patch:

There is no reason to be holding the ipc lock while reading ipcp->seq,
hence remove misleading comment.

Also simplify the return value for the function.

Signed-off-by: Davidlohr Bueso <davidlohr.bueso@hp.com>
Signed-off-by: Rik van Riel <riel@redhat.com>
Cc: Chegu Vinod <chegu_vinod@hp.com>
Cc: Emmanuel Benisty <benisty.e@gmail.com>
Cc: Jason Low <jason.low2@hp.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Michel Lespinasse <walken@google.com>
Cc: Peter Hurley <peter@hurleysoftware.com>
Cc: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years agoipc/msgutil.c: use linux/uaccess.h
HoSung Jung [Wed, 17 Apr 2013 23:49:07 +0000 (09:49 +1000)]
ipc/msgutil.c: use linux/uaccess.h

Signed-off-by: HoSung Jung <rain6557@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years agoipc: find_msg can be static
Fengguang Wu [Wed, 17 Apr 2013 23:49:07 +0000 (09:49 +1000)]
ipc: find_msg can be static

Fixes sparse warning identified by Fengguang's test robot:
   ipc/msg.c:810:16: sparse: symbol 'find_msg' was not declared. Should it be static?

Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years agoipc: refactor msg list search into separate function
Peter Hurley [Wed, 17 Apr 2013 23:49:06 +0000 (09:49 +1000)]
ipc: refactor msg list search into separate function

Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Acked-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years agoipc: simplify msg list search
Peter Hurley [Wed, 17 Apr 2013 23:49:06 +0000 (09:49 +1000)]
ipc: simplify msg list search

Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Acked-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years agoipc: implement MSG_COPY as a new receive mode
Peter Hurley [Wed, 17 Apr 2013 23:49:06 +0000 (09:49 +1000)]
ipc: implement MSG_COPY as a new receive mode

Teach the helper routines about MSG_COPY so that msgtyp is preserved as
the message number to copy.

The security functions affected by this change were audited and no
additional changes are necessary.

Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Acked-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years agoipc: remove msg handling from queue scan
Peter Hurley [Wed, 17 Apr 2013 23:49:06 +0000 (09:49 +1000)]
ipc: remove msg handling from queue scan

In preparation for refactoring the queue scan into a separate
function, relocate msg copying.

Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Acked-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years agoipc: set EFAULT as default error in load_msg()
Peter Hurley [Wed, 17 Apr 2013 23:49:05 +0000 (09:49 +1000)]
ipc: set EFAULT as default error in load_msg()

Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Acked-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years agoipc: tighten msg copy loops
Peter Hurley [Wed, 17 Apr 2013 23:49:05 +0000 (09:49 +1000)]
ipc: tighten msg copy loops

Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Acked-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years agoipc: separate msg allocation from userspace copy
Peter Hurley [Wed, 17 Apr 2013 23:49:05 +0000 (09:49 +1000)]
ipc: separate msg allocation from userspace copy

Separating msg allocation enables single-block vmalloc
allocation instead.

Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Acked-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years agoipc: clamp with min()
Peter Hurley [Wed, 17 Apr 2013 23:49:04 +0000 (09:49 +1000)]
ipc: clamp with min()

Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Acked-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years agosctp: convert sctp_assoc_set_id() to use idr_alloc_cyclic()
Jeff Layton [Wed, 17 Apr 2013 23:49:04 +0000 (09:49 +1000)]
sctp: convert sctp_assoc_set_id() to use idr_alloc_cyclic()

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Cc: Vlad Yasevich <vyasevich@gmail.com>
Cc: Sridhar Samudrala <sri@us.ibm.com>
Cc: Neil Horman <nhorman@tuxdriver.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years agoinotify: convert inotify_add_to_idr() to use idr_alloc_cyclic()
Jeff Layton [Wed, 17 Apr 2013 23:49:04 +0000 (09:49 +1000)]
inotify: convert inotify_add_to_idr() to use idr_alloc_cyclic()

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Cc: John McCutchan <john@johnmccutchan.com>
Cc: Robert Love <rlove@rlove.org>
Cc: Eric Paris <eparis@parisplace.org>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years agonfsd: convert nfs4_alloc_stid() to use idr_alloc_cyclic()
Jeff Layton [Wed, 17 Apr 2013 23:49:04 +0000 (09:49 +1000)]
nfsd: convert nfs4_alloc_stid() to use idr_alloc_cyclic()

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Acked-by: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years agodrivers/infiniband/hw/mlx4: convert to using idr_alloc_cyclic()
Jeff Layton [Wed, 17 Apr 2013 23:49:03 +0000 (09:49 +1000)]
drivers/infiniband/hw/mlx4: convert to using idr_alloc_cyclic()

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Jack Morgenstein <jackm@dev.mellanox.co.il>
Cc: Or Gerlitz <ogerlitz@mellanox.com>
Cc: Roland Dreier <roland@purestorage.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years agodrivers/infiniband/hw/amso1100: convert to using idr_alloc_cyclic
Jeff Layton [Wed, 17 Apr 2013 23:49:03 +0000 (09:49 +1000)]
drivers/infiniband/hw/amso1100: convert to using idr_alloc_cyclic

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: Tejun Heo <tj@kernel.org>
Cc: Steve Wise <swise@opengridcomputing.com>
Cc: Tom Tucker <tom@opengridcomputing.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years agoidr: introduce idr_alloc_cyclic()
Jeff Layton [Wed, 17 Apr 2013 23:49:03 +0000 (09:49 +1000)]
idr: introduce idr_alloc_cyclic()

As Tejun points out, there are several users of the IDR facility that
attempt to use it in a cyclic fashion.  These users are likely to see
-ENOSPC errors after the counter wraps one or more times however.

This patchset adds a new idr_alloc_cyclic routine and converts several of
these users to it.  Many of these users are in obscure parts of the
kernel, and I don't have a good way to test some of them.  The change is
pretty straightforward though, so hopefully it won't be an issue.

There is one other cyclic user of idr_alloc that I didn't touch in
ipc/util.c.  That one is doing some strange stuff that I didn't quite
understand, but it looks like it should probably be converted later
somehow.

This patch:

Thus spake Tejun Heo:

    Ooh, BTW, the cyclic allocation is broken.  It's prone to -ENOSPC
    after the first wraparound.  There are several cyclic users in the
    kernel and I think it probably would be best to implement cyclic
    support in idr.

This patch does that by adding new idr_alloc_cyclic function that such
users in the kernel can use.  With this, there's no need for a caller to
keep track of the last value used as that's now tracked internally.  This
should prevent the ENOSPC problems that can hit when the "last allocated"
counter exceeds INT_MAX.

Later patches will convert existing cyclic users to the new interface.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: Tejun Heo <tj@kernel.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Eric Paris <eparis@parisplace.org>
Cc: Jack Morgenstein <jackm@dev.mellanox.co.il>
Cc: John McCutchan <john@johnmccutchan.com>
Cc: Neil Horman <nhorman@tuxdriver.com>
Cc: Or Gerlitz <ogerlitz@mellanox.com>
Cc: Robert Love <rlove@rlove.org>
Cc: Roland Dreier <roland@purestorage.com>
Cc: Sridhar Samudrala <sri@us.ibm.com>
Cc: Steve Wise <swise@opengridcomputing.com>
Cc: Tom Tucker <tom@opengridcomputing.com>
Cc: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years agokexec-use-min_t-to-simplify-logic-fix
Andrew Morton [Wed, 17 Apr 2013 23:49:02 +0000 (09:49 +1000)]
kexec-use-min_t-to-simplify-logic-fix

replace min_t with min, remove unneeded casts

Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Joe Perches <joe@perches.com>
Cc: Simon Horman <horms@verge.net.au>
Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years agokexec: Use min() and min_t() to simplify logic
Zhang Yanfei [Wed, 17 Apr 2013 23:49:02 +0000 (09:49 +1000)]
kexec: Use min() and min_t() to simplify logic

Simplify the logic of variable assignments.

Signed-off-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Reviewed-by: Simon Horman <horms@verge.net.au>
Cc: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years agokexec: fix wrong types of some local variables
Zhang Yanfei [Wed, 17 Apr 2013 23:49:02 +0000 (09:49 +1000)]
kexec: fix wrong types of some local variables

The types of the following local variables:

- ubytes/mbytes in kimage_load_crash_segment()/kimage_load_normal_segment()

- r in vmcoreinfo_append_str()

are wrong, so fix them.

Signed-off-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Simon Horman <horms@verge.net.au>
Cc: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years agoexec: do not abuse ->cred_guard_mutex in threadgroup_lock()
Oleg Nesterov [Wed, 17 Apr 2013 23:49:01 +0000 (09:49 +1000)]
exec: do not abuse ->cred_guard_mutex in threadgroup_lock()

threadgroup_lock() takes signal->cred_guard_mutex to ensure that
thread_group_leader() is stable.  This doesn't look nice, the scope of
this lock in do_execve() is huge.

And as Dave pointed out this can lead to deadlock, we have the
following dependencies:

do_execve: cred_guard_mutex -> i_mutex
cgroup_mount: i_mutex -> cgroup_mutex
attach_task_by_pid: cgroup_mutex -> cred_guard_mutex

Change de_thread() to take threadgroup_change_begin() around the
switch-the-leader code and change threadgroup_lock() to avoid
->cred_guard_mutex.

Note that de_thread() can't sleep with ->group_rwsem held, this can
obviously deadlock with the exiting leader if the writer is active, so it
does threadgroup_change_end() before schedule().

Reported-by: Dave Jones <davej@redhat.com>
Acked-by: Tejun Heo <tj@kernel.org>
Acked-by: Li Zefan <lizefan@huawei.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years agoset_task_comm: kill the pointless memset() + wmb()
Oleg Nesterov [Wed, 17 Apr 2013 23:49:01 +0000 (09:49 +1000)]
set_task_comm: kill the pointless memset() + wmb()

set_task_comm() does memset() + wmb() before strlcpy().  This buys nothing
and to add to the confusion, the comment is wrong.

- We do not need memset() to be "safe from non-terminating string
  reads", the final char is always zero and we never change it.

- wmb() is paired with nothing, it cannot prevent from printing
  the mixture of the old/new data unless the reader takes the lock.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: John Stultz <johnstul@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years agofs, proc: truncate /proc/pid/comm writes to first TASK_COMM_LEN bytes
David Rientjes [Wed, 17 Apr 2013 23:49:01 +0000 (09:49 +1000)]
fs, proc: truncate /proc/pid/comm writes to first TASK_COMM_LEN bytes

Currently, a write to a procfs file will return the number of bytes
successfully written.  If the actual string is longer than this, the
remainder of the string will not be be written and userspace will complete
the operation by issuing additional write()s.

Hence

$ echo -n "abcdefghijklmnopqrs" > /proc/self/comm

results in

$ cat /proc/$$/comm
pqrs

since the final four bytes were written with a second write() since
TASK_COMM_LEN == 16.  This is obviously an undesired result and not
equivalent to prctl(PR_SET_NAME).  The implementation should not need to
know the definition of TASK_COMM_LEN.

This patch truncates the string to the first TASK_COMM_LEN bytes and
returns the bytes written as the length of the string written so the
second write() is suppressed.

$ cat /proc/$$/comm
abcdefghijklmno

Signed-off-by: David Rientjes <rientjes@google.com>
Acked-by: John Stultz <john.stultz@linaro.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years agocoredump: change wait_for_dump_helpers() to use wait_event_interruptible()
Oleg Nesterov [Wed, 17 Apr 2013 23:49:00 +0000 (09:49 +1000)]
coredump: change wait_for_dump_helpers() to use wait_event_interruptible()

wait_for_dump_helpers() calls wake_up/kill_fasync from inside the
wait_event-like loop.  This is not needed and in fact this is not strictly
correct, we can/should do this only once after we change pipe->writers.
We could even check if it becomes zero.

Change this code to use use wait_event_interruptible(), this can also help
to make this wait freezable.

With this patch we check pipe->readers without pipe_lock(), this is fine.
Once we see pipe->readers == 1 we know that the handler decremented the
counter, this is all we need.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Mandeep Singh Baines <msb@chromium.org>
Cc: Neil Horman <nhorman@redhat.com>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years agocoredump: factor out the setting of PF_DUMPCORE
Oleg Nesterov [Wed, 17 Apr 2013 23:49:00 +0000 (09:49 +1000)]
coredump: factor out the setting of PF_DUMPCORE

Cleanup.  Every linux_binfmt->core_dump() sets PF_DUMPCORE, move this into
zap_threads() called by do_coredump().

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Mandeep Singh Baines <msb@chromium.org>
Cc: Neil Horman <nhorman@redhat.com>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years agocoredump: introduce dump_interrupted()
Oleg Nesterov [Wed, 17 Apr 2013 23:49:00 +0000 (09:49 +1000)]
coredump: introduce dump_interrupted()

By discussion with Mandeep.

Change dump_write(), dump_seek() and do_coredump() to check
signal_pending() and abort if it is true.  dump_seek() does this only
before f_op->llseek(), otherwise it relies on dump_write().

We need this change to ensure that the coredump won't delay suspend, and
to ensure it reacts to SIGKILL "quickly enough", a core dump can take a
lot of time.  In particular this can help oom-killer.

We add the new trivial helper, dump_interrupted() to add the comments and
to simplify the potential freezer changes.  Perhaps it will have more
callers.

Ideally it should do try_to_freeze() but then we need the unpleasant
changes in dump_write() and wait_for_dump_helpers().  It is not trivial to
change dump_write() to restart if f_op->write() fails because of
freezing().  We need to handle the short writes, we need to clear
TIF_SIGPENDING (and we can't rely on recalc_sigpending() unless we change
it to check PF_DUMPCORE).  And if the buggy f_op->write() sets
TIF_SIGPENDING we can not distinguish this case from the race with
freeze_task() + __thaw_task().

So we simply accept the fact that the freezer can truncate a core-dump but
at least you can reliably suspend.  Hopefully we can tolerate this
unlikely case and the necessary complications doesn't worth a trouble.
But if we decide to make the coredumping freezable later we can do this on
top of this change.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Mandeep Singh Baines <msb@chromium.org>
Cc: Neil Horman <nhorman@redhat.com>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years agocoredump: sanitize the setting of signal->group_exit_code
Oleg Nesterov [Wed, 17 Apr 2013 23:49:00 +0000 (09:49 +1000)]
coredump: sanitize the setting of signal->group_exit_code

Now that the coredumping process can be SIGKILL'ed, the setting of
->group_exit_code in do_coredump() can race with complete_signal() and
SIGKILL or 0x80 can be "lost", or wait(status) can report status ==
SIGKILL | 0x80.

But the main problem is that it is not clear to me what should we do if
binfmt->core_dump() succeeds but SIGKILL was sent, that is why this patch
comes as a separate change.

This patch adds 0x80 if ->core_dump() succeeds and the process was not
killed.  But perhaps we can (should?) re-set ->group_exit_code changed by
SIGKILL back to "siginfo->si_signo |= 0x80" in case when core_dumped == T.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Tested-by: Mandeep Singh Baines <msb@chromium.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Neil Horman <nhorman@redhat.com>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: Roland McGrath <roland@hack.frob.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years agocoredump: ensure that SIGKILL always kills the dumping thread
Oleg Nesterov [Wed, 17 Apr 2013 23:48:59 +0000 (09:48 +1000)]
coredump: ensure that SIGKILL always kills the dumping thread

prepare_signal() blesses SIGKILL sent to the dumping process but this
signal can be "lost" anyway.  The problems is, complete_signal() sees
SIGNAL_GROUP_EXIT and skips the "kill them all" logic.  And even if the
dumping process is single-threaded (so the target is always "correct"),
the group-wide SIGKILL is not recorded in task->pending and thus
__fatal_signal_pending() won't be true.  A multi-threaded case has even
more problems.

And even ignoring all technical details, SIGNAL_GROUP_EXIT doesn't look
right to me.  This coredumping process is not exiting yet, it can do a lot
of work dumping the core.

With this patch the dumping process doesn't have SIGNAL_GROUP_EXIT, we set
signal->group_exit_task instead.  This makes signal_group_exit() true and
thus this should equally close the races with exit/exec/stop but allows to
kill the dumping thread reliably.

Notes:
- It is not clear what should we do with ->group_exit_code
  if the dumper was killed, see the next change.

- we need more (hopefully straightforward) changes to ensure
  that SIGKILL actually interrupts the coredump. Basically we
  need to check __fatal_signal_pending() in dump_write() and
  dump_seek().

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Tested-by: Mandeep Singh Baines <msb@chromium.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Neil Horman <nhorman@redhat.com>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: Roland McGrath <roland@hack.frob.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years agocoredump: only SIGKILL should interrupt the coredumping task
Oleg Nesterov [Wed, 17 Apr 2013 23:48:59 +0000 (09:48 +1000)]
coredump: only SIGKILL should interrupt the coredumping task

There are 2 well known and ancient problems with coredump/signals, and a
lot of related bug reports:

- do_coredump() clears TIF_SIGPENDING but of course this can't help
  if, say, SIGCHLD comes after that.

  In this case the coredump can fail unexpectedly. See for example
  wait_for_dump_helper()->signal_pending() check but there are other
  reasons.

- At the same time, dumping a huge core on the slow media can take a
  lot of time/resources and there is no way to kill the coredumping
  task reliably. In particular this is not oom_kill-friendly.

This patch tries to fix the 1st problem, and makes the preparation for the
next changes.

We add the new SIGNAL_GROUP_COREDUMP flag set by zap_threads() to indicate
that this process dumps the core.  prepare_signal() checks this flag and
nacks any signal except SIGKILL.

Note that this check tries to be conservative, in the long term we should
probably treat the SIGNAL_GROUP_EXIT case equally but this needs more
discussion.  See marc.info/?l=linux-kernel&m=120508897917439

Notes:
- recalc_sigpending() doesn't check SIGNAL_GROUP_COREDUMP.
  The patch assumes that dump_write/etc paths should never
  call it, but we can change it as well.

- There is another source of TIF_SIGPENDING, freezer. This
  will be addressed separately.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Tested-by: Mandeep Singh Baines <msb@chromium.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Neil Horman <nhorman@redhat.com>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: Roland McGrath <roland@hack.frob.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years agokmod: remove call_usermodehelper_fns()
Lucas De Marchi [Wed, 17 Apr 2013 23:48:59 +0000 (09:48 +1000)]
kmod: remove call_usermodehelper_fns()

This function suffers from not being able to determine if the cleanup is
called in case it returns -ENOMEM.  Nobody is using it anymore, so let's
remove it.

Signed-off-by: Lucas De Marchi <lucas.demarchi@profusion.mobi>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: David Howells <dhowells@redhat.com>
Cc: James Morris <james.l.morris@oracle.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Tejun Heo <tj@kernel.org>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years agousermodehelper: split remaining calls to call_usermodehelper_fns()
Lucas De Marchi [Wed, 17 Apr 2013 23:48:58 +0000 (09:48 +1000)]
usermodehelper: split remaining calls to call_usermodehelper_fns()

These are the only users of call_usermodehelper_fns().  This function
suffers from not being able to determine if the cleanup is called.  Even
if in this places the cleanup pointer is NULL, convert them to use the
separate call_usermodehelper_setup() + call_usermodehelper_exec()
functions so we can remove the _fns variant.

Signed-off-by: Lucas De Marchi <lucas.demarchi@profusion.mobi>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: David Howells <dhowells@redhat.com>
Cc: James Morris <james.l.morris@oracle.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Tejun Heo <tj@kernel.org>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years agocoredump: remove trailling whitespace
Lucas De Marchi [Wed, 17 Apr 2013 23:48:58 +0000 (09:48 +1000)]
coredump: remove trailling whitespace

Signed-off-by: Lucas De Marchi <lucas.demarchi@profusion.mobi>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: David Howells <dhowells@redhat.com>
Cc: James Morris <james.l.morris@oracle.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Tejun Heo <tj@kernel.org>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years agoKEYS: split call to call_usermodehelper_fns()
Lucas De Marchi [Wed, 17 Apr 2013 23:48:58 +0000 (09:48 +1000)]
KEYS: split call to call_usermodehelper_fns()

Use call_usermodehelper_setup() + call_usermodehelper_exec() instead of
calling call_usermodehelper_fns().  In case there's an OOM in this last
function the cleanup function may not be called - in this case we would
miss a call to key_put().

Signed-off-by: Lucas De Marchi <lucas.demarchi@profusion.mobi>
Cc: Oleg Nesterov <oleg@redhat.com>
Acked-by: David Howells <dhowells@redhat.com>
Acked-by: James Morris <james.l.morris@oracle.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Tejun Heo <tj@kernel.org>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years agokmod: split call to call_usermodehelper_fns()
Lucas De Marchi [Wed, 17 Apr 2013 23:48:57 +0000 (09:48 +1000)]
kmod: split call to call_usermodehelper_fns()

Use call_usermodehelper_setup() + call_usermodehelper_exec() instead of
calling call_usermodehelper_fns().  In case the latter returns -ENOMEM the
cleanup function may had not been called - in this case we would not free
argv and module_name.

Signed-off-by: Lucas De Marchi <lucas.demarchi@profusion.mobi>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: David Howells <dhowells@redhat.com>
Cc: James Morris <james.l.morris@oracle.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Tejun Heo <tj@kernel.org>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years agousermodehelper-export-_exec-and-_setup-functions-fix
Andrew Morton [Wed, 17 Apr 2013 23:48:57 +0000 (09:48 +1000)]
usermodehelper-export-_exec-and-_setup-functions-fix

export call_usermodehelper_setup() to modules

Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: David Howells <dhowells@redhat.com>
Cc: James Morris <james.l.morris@oracle.com>
Cc: Lucas De Marchi <lucas.demarchi@profusion.mobi>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years agousermodehelper: export call_usermodehelper_exec() and call_usermodehelper_setup()
Lucas De Marchi [Wed, 17 Apr 2013 23:48:57 +0000 (09:48 +1000)]
usermodehelper: export call_usermodehelper_exec() and call_usermodehelper_setup()

call_usermodehelper_setup() + call_usermodehelper_exec() need to be called
instead of call_usermodehelper_fns() when the cleanup function needs to be
called even when an ENOMEM error occurs.  In this case using
call_usermodehelper_fns() the user can't distinguish if the cleanup
function was called or not.

Signed-off-by: Lucas De Marchi <lucas.demarchi@profusion.mobi>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Cc: David Howells <dhowells@redhat.com>
Cc: James Morris <james.l.morris@oracle.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Tejun Heo <tj@kernel.org>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years agoselftest: add a test case for PTRACE_PEEKSIGINFO
Andrey Vagin [Wed, 17 Apr 2013 23:48:57 +0000 (09:48 +1000)]
selftest: add a test case for PTRACE_PEEKSIGINFO

* Dump signals from process-wide and per-thread queues with
  different sizes of buffers.
* Check error paths for buffers with restricted permissions. A part of
  buffer or a whole buffer is for read-only.
* Try to get nonexistent signal.

Signed-off-by: Andrew Vagin <avagin@openvz.org>
Cc: Roland McGrath <roland@redhat.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Dave Jones <davej@redhat.com>
Cc: "Michael Kerrisk (man-pages)" <mtk.manpages@gmail.com>
Cc: Pavel Emelyanov <xemul@parallels.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Pedro Alves <palves@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years agoptrace: add ability to retrieve signals without removing from a queue (v4)
Andrey Vagin [Wed, 17 Apr 2013 23:48:56 +0000 (09:48 +1000)]
ptrace: add ability to retrieve signals without removing from a queue (v4)

This patch adds a new ptrace request PTRACE_PEEKSIGINFO.

This request is used to retrieve information about pending signals
starting with the specified sequence number.  Siginfo_t structures are
copied from the child into the buffer starting at "data".

The argument "addr" is a pointer to struct ptrace_peeksiginfo_args.
struct ptrace_peeksiginfo_args {
u64 off; /* from which siginfo to start */
u32 flags;
s32 nr; /* how may siginfos to take */
};

"nr" has type "s32", because ptrace() returns "long", which has 32 bits on
i386 and a negative values is used for errors.

Currently here is only one flag PTRACE_PEEKSIGINFO_SHARED for dumping
signals from process-wide queue.  If this flag is not set, signals are
read from a per-thread queue.

The request PTRACE_PEEKSIGINFO returns a number of dumped signals.  If a
signal with the specified sequence number doesn't exist, ptrace returns
zero.  The request returns an error, if no signal has been dumped.

Errors:
EINVAL - one or more specified flags are not supported or nr is negative
EFAULT - buf or addr is outside your accessible address space.

A result siginfo contains a kernel part of si_code which usually striped,
but it's required for queuing the same siginfo back during restore of
pending signals.

This functionality is required for checkpointing pending signals.  Pedro
Alves suggested using it in "gdb" to peek at pending signals.  gdb already
uses PTRACE_GETSIGINFO to get the siginfo for the signal which was already
dequeued.  This functionality allows gdb to look at the pending signals
which were not reported yet.

The prototype of this code was developed by Oleg Nesterov.

Signed-off-by: Andrew Vagin <avagin@openvz.org>
Cc: Roland McGrath <roland@redhat.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Dave Jones <davej@redhat.com>
Cc: "Michael Kerrisk (man-pages)" <mtk.manpages@gmail.com>
Cc: Pavel Emelyanov <xemul@parallels.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Pedro Alves <palves@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years agofat: additions to support fat_fallocate
Namjae Jeon [Wed, 17 Apr 2013 23:48:56 +0000 (09:48 +1000)]
fat: additions to support fat_fallocate

Implement preallocation via the fallocate syscall on VFAT partitions.

With FALLOC_FL_KEEP_SIZE, there is no way to distinguish if the mismatch
between i_size and no.  of clusters allocated is a consequence of
fallocate or just plain corruption.  When a non fallocate aware (old)
linux fat driver tries to write to such a file, it throws an error.Also,
fsck detects this as inconsistency and truncates the prealloc'd blocks.

To avoid this, as suggested by OGAWA, remove changes that make fallocate
persistent across mounts and restrict lifetime of blocks from fallocate(2)
to file release.

Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Ravishankar N <ravi.n1@samsung.com>
Signed-off-by: Amit Sahrawat <a.sahrawat@samsung.com>
Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years agoDocumentation: update nfs option in filesystem/vfat.txt
Namjae Jeon [Wed, 17 Apr 2013 23:48:56 +0000 (09:48 +1000)]
Documentation: update nfs option in filesystem/vfat.txt

Add descriptions about 'stale_rw' and 'nostale_ro' nfs options in
filesystem/vfat.txt

Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Ravishankar N <ravi.n1@samsung.com>
Signed-off-by: Amit Sahrawat <a.sahrawat@samsung.com>
Acked-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Acked-by: Rob Landley <rob@landley.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years agofat (exportfs): rebuild directory-inode if fat_dget()
Namjae Jeon [Wed, 17 Apr 2013 23:48:55 +0000 (09:48 +1000)]
fat (exportfs): rebuild directory-inode if fat_dget()

This patch enables rebuilding of directory inodes which are not present in
the cache.This is done by traversing the disk clusters to find the
directory entry of the parent directory and using its i_pos to build the
inode.

The traversal is done by fat_scan_logstart() which is similar to
fat_scan() but matches i_pos values instead of names.fat_scan_logstart()
needs an inode parameter to work, for which a dummy inode is created by
it's caller fat_rebuild_parent().  This dummy inode is destroyed after the
traversal completes.

All this is done  only if the nostale_ro nfs mount option is specified.

Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Ravishankar N <ravi.n1@samsung.com>
Signed-off-by: Amit Sahrawat <a.sahrawat@samsung.com>
Acked-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years agofat (exportfs): rebuild inode if ilookup() fails
Namjae Jeon [Wed, 17 Apr 2013 23:48:55 +0000 (09:48 +1000)]
fat (exportfs): rebuild inode if ilookup() fails

If the cache lookups fail,use the i_pos value to find the directory entry
of the inode and rebuild the inode.Since this involves accessing the FAT
media, do this only if the nostale_ro nfs mount option is specified.

Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Ravishankar N <ravi.n1@samsung.com>
Signed-off-by: Amit Sahrawat <a.sahrawat@samsung.com>
Acked-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years agofat: restructure export_operations
Namjae Jeon [Wed, 17 Apr 2013 23:48:55 +0000 (09:48 +1000)]
fat: restructure export_operations

Define two nfs export_operation structures,one for 'stale_rw' mounts and
the other for 'nostale_ro'.  The latter uses i_pos as a basis for encoding
and decoding file handles.

Also, assign i_pos to kstat->ino.  The logic for rebuilding the inode is
added in the subsequent patches.

Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Ravishankar N <ravi.n1@samsung.com>
Signed-off-by: Amit Sahrawat <a.sahrawat@samsung.com>
Acked-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years agofat: introduce a helper fat_get_blknr_offset()
Namjae Jeon [Wed, 17 Apr 2013 23:48:54 +0000 (09:48 +1000)]
fat: introduce a helper fat_get_blknr_offset()

Introduce helper function to get the block number and offset for a given
i_pos value.  Use it in __fat_write_inode() now and later on in nfs.c

Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Ravishankar N <ravi.n1@samsung.com>
Signed-off-by: Amit Sahrawat <a.sahrawat@samsung.com>
Acked-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years agofat: move fat_i_pos_read to fat.h
Namjae Jeon [Wed, 17 Apr 2013 23:48:54 +0000 (09:48 +1000)]
fat: move fat_i_pos_read to fat.h

Move fat_i_pos_read to fat.h so that it can be called from nfs.c in the
subsequent patches to encode the file handle.

Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Ravishankar N <ravi.n1@samsung.com>
Signed-off-by: Amit Sahrawat <a.sahrawat@samsung.com>
Acked-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years agofat: introduce 2 new values for the -o nfs mount option
Namjae Jeon [Wed, 17 Apr 2013 23:48:54 +0000 (09:48 +1000)]
fat: introduce 2 new values for the -o nfs mount option

This patchset eliminates the client side ESTALE errors when a FAT
partition exported over NFS has its dentries evicted from the cache.  The
idea is to find the on-disk location_'i_pos' of the dirent of the inode
that has been evicted and use it to rebuild the inode.

This patch:

Provide two possible values 'stale_rw' and 'nostale_ro' for the -o nfs
mount option.The first one allows all file operations but does not reduce
ESTALE errors on memory constrained systems.  The second one eliminates
ESTALE errors but mounts the filesystem as read-only.  Not specifying a
value defaults to 'stale_rw'.

Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Ravishankar N <ravi.n1@samsung.com>
Signed-off-by: Amit Sahrawat <a.sahrawat@samsung.com>
Acked-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years agohfsplus: add error propagation to __hfsplus_ext_write_extent()
Alexey Khoroshilov [Wed, 17 Apr 2013 23:48:54 +0000 (09:48 +1000)]
hfsplus: add error propagation to __hfsplus_ext_write_extent()

__hfsplus_ext_write_extent() suppresses errors coming from
hfs_brec_find().  The patch implements error code propagation.

Signed-off-by: Alexey Khoroshilov <khoroshilov@ispras.ru>
Reviewed-by: Vyacheslav Dubeyko <slava@dubeyko.com>
Cc: Hin-Tak Leung <htl10@users.sourceforge.net>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years agohfs/hfsplus: convert printks to pr_<level>
Joe Perches [Wed, 17 Apr 2013 23:48:53 +0000 (09:48 +1000)]
hfs/hfsplus: convert printks to pr_<level>

Use a more current logging style.

Add #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
hfsplus now uses "hfsplus: " for all messages.
Coalesce formats.
Prefix debugging messages too.

Signed-off-by: Joe Perches <joe@perches.com>
Cc: Vyacheslav Dubeyko <slava@dubeyko.com>
Cc: Hin-Tak Leung <htl10@users.sourceforge.net>
Cc: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years agohfs/hfsplus: convert dprint to hfs_dbg
Joe Perches [Wed, 17 Apr 2013 23:48:53 +0000 (09:48 +1000)]
hfs/hfsplus: convert dprint to hfs_dbg

Use a more current logging style.

Rename macro and uses.
Add do {} while (0) to macro.
Add DBG_ to macro.
Add and use hfs_dbg_cont variant where appropriate.

Signed-off-by: Joe Perches <joe@perches.com>
Cc: Vyacheslav Dubeyko <slava@dubeyko.com>
Cc: Hin-Tak Leung <htl10@users.sourceforge.net>
Cc: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years agohfsplus-fix-warnings-in-fs-hfsplus-bfindc-in-function-hfs_find_1st_rec_by_cnid-fix
Andrew Morton [Wed, 17 Apr 2013 23:48:53 +0000 (09:48 +1000)]
hfsplus-fix-warnings-in-fs-hfsplus-bfindc-in-function-hfs_find_1st_rec_by_cnid-fix

make the workaround more explicit

Cc: Vyacheslav Dubeyko <slava@dubeyko.com>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years agohfsplus: fix warnings in fs/hfsplus/bfind.c
Vyacheslav Dubeyko [Wed, 17 Apr 2013 23:48:52 +0000 (09:48 +1000)]
hfsplus: fix warnings in fs/hfsplus/bfind.c

fs/hfsplus/bfind.c: In function 'hfs_find_1st_rec_by_cnid':
(1) include/uapi/linux/swab.h:60:2: warning: 'search_cnid' may be used uninitialized in this function [-Wmaybe-uninitialized]
(2) include/uapi/linux/swab.h:60:2: warning: 'cur_cnid' may be used uninitialized in this function [-Wmaybe-uninitialized]

Signed-off-by: Vyacheslav Dubeyko <slava@dubeyko.com>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years agohfs: add error checking for hfs_find_init()
Alexey Khoroshilov [Wed, 17 Apr 2013 23:48:52 +0000 (09:48 +1000)]
hfs: add error checking for hfs_find_init()

hfs_find_init() may fail with ENOMEM, but there are places,
where the returned value is not checked. The consequences can be
very unpleasant, e.g. kfree uninitialized pointer and
inappropriate mutex unlocking.

The patch adds checks for errors in hfs_find_init().

Found by Linux Driver Verification project (linuxtesting.org).

Signed-off-by: Alexey Khoroshilov <khoroshilov@ispras.ru>
Reviewed-by: Vyacheslav Dubeyko <slava@dubeyko.com>
Cc: Hin-Tak Leung <htl10@users.sourceforge.net>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years agonilfs2: fix issue with flush kernel thread after remount in RO mode because of driver...
Vyacheslav Dubeyko [Wed, 17 Apr 2013 23:48:52 +0000 (09:48 +1000)]
nilfs2: fix issue with flush kernel thread after remount in RO mode because of driver's internal error or metadata corruption

DESCRIPTION:

NILFS2 driver remounts itself in RO mode in the case of metadata
corruption discovering (for example, in the case of broken bmap
discovering).  But, usually, it takes place file system operations before
remounting in RO mode.  Thereby, NILFS2 driver can be in RO mode with
presence of dirty pages in modified inodes' address spaces.  It results in
flush kernel thread's infinite trying to flush dirty pages in RO mode.  As
a result, it is possible to see such side effects as: (1) flush kernel
thread occupies 50% - 99% of CPU time; (2) system can't be shutdowned
without manual power switch off.

SYMPTOMS:
(1) System log contains error message: "Remounting filesystem read-only".
(2) The flush kernel thread occupies 50% - 99% of CPU time.
(3) The system can't be shutdowned without manual power switch off.

REPRODUCTION PATH:
(1) Create volume group with name "unencrypted" by means of vgcreate utility.
(2) Run script (prepared by Anthony Doggett <Anthony2486@interfaces.org.uk>):

----------------[BEGIN SCRIPT]--------------------
#!/bin/bash

VG=unencrypted
#apt-get install nilfs-tools darcs
lvcreate --size 2G --name ntest $VG
mkfs.nilfs2 -b 1024 -B 8192 /dev/mapper/$VG-ntest
mkdir /var/tmp/n
mkdir /var/tmp/n/ntest
mount /dev/mapper/$VG-ntest /var/tmp/n/ntest
mkdir /var/tmp/n/ntest/thedir
cd /var/tmp/n/ntest/thedir
sleep 2
date
darcs init
sleep 2
dmesg|tail -n 5
date
darcs whatsnew || true
date
sleep 2
dmesg|tail -n 5
----------------[END SCRIPT]--------------------

(3) Try to shutdown the system.

REPRODUCIBILITY: 100%

FIX:

The patch implements checking mount state of NILFS2 driver in
nilfs_writepage(), nilfs_writepages() and nilfs_mdt_write_page() methods.
If it is detected the RO mount state then all dirty pages are simply
discarded with warning messages is written in system log.

Signed-off-by: Vyacheslav Dubeyko <slava@dubeyko.com>
Cc: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Cc: Anthony Doggett <Anthony2486@interfaces.org.uk>
Cc: ARAI Shun-ichi <hermes@ceres.dti.ne.jp>
Cc: Piotr Szymaniak <szarpaj@grubelek.pl>
Cc: Zahid Chowdhury <zahid.chowdhury@starsolutions.com>
Cc: Elmer Zhang <freeboy6716@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years agortc: rtc-twl: convert twl4030rtc_driver to dev_pm_ops
Jingoo Han [Wed, 17 Apr 2013 23:48:51 +0000 (09:48 +1000)]
rtc: rtc-twl: convert twl4030rtc_driver to dev_pm_ops

Instead of using legacy suspend/resume methods, using newer dev_pm_ops
structure allows better control over power management.

Signed-off-by: Jingoo Han <jg1.han@samsung.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years agortc: rtc-stmp3xxx: convert stmp3xxx_rtcdrv to dev_pm_ops
Jingoo Han [Wed, 17 Apr 2013 23:48:51 +0000 (09:48 +1000)]
rtc: rtc-stmp3xxx: convert stmp3xxx_rtcdrv to dev_pm_ops

Instead of using legacy suspend/resume methods, using newer dev_pm_ops
structure allows better control over power management.

Signed-off-by: Jingoo Han <jg1.han@samsung.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>