]> git.kernelconcepts.de Git - karo-tx-linux.git/log
karo-tx-linux.git
12 years agoAdd linux-next specific files for 20120417 next-20120417
Stephen Rothwell [Tue, 17 Apr 2012 04:28:34 +0000 (14:28 +1000)]
Add linux-next specific files for 20120417

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
12 years agoRevert "mfd: add irq domain support for max8997 interrupts"
Stephen Rothwell [Tue, 17 Apr 2012 04:15:24 +0000 (14:15 +1000)]
Revert "mfd: add irq domain support for max8997 interrupts"

This reverts commit 98d8618af37728f6e18e84110ddb99987b47dd12.

12 years agoMerge branch 'akpm'
Stephen Rothwell [Tue, 17 Apr 2012 03:49:18 +0000 (13:49 +1000)]
Merge branch 'akpm'

12 years agonotify_change(): check that i_mutex is held
Andrew Morton [Thu, 12 Apr 2012 22:52:42 +0000 (08:52 +1000)]
notify_change(): check that i_mutex is held

Cc: Djalal Harouni <tixxdz@opendz.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agoramoops: fix printk format warnings
Randy Dunlap [Thu, 12 Apr 2012 22:52:41 +0000 (08:52 +1000)]
ramoops: fix printk format warnings

Fix printk format warnings for phys_addr_t type variables:

drivers/char/ramoops.c:246:3: warning: format '%llx' expects type 'long long unsigned int', but argument 3 has type 'phys_addr_t'
drivers/char/ramoops.c:273:2: warning: format '%llx' expects type 'long long unsigned int', but argument 3 has type 'phys_addr_t'

Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Cc: Kees Cook <keescook@chromium.org>
Cc: Greg Kroah-Hartman <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agoramoops: use pstore interface
Kees Cook [Thu, 12 Apr 2012 22:52:41 +0000 (08:52 +1000)]
ramoops: use pstore interface

Instead of using /dev/mem directly and forcing userspace to know (or
extract) where the platform has defined persistent memory, how many slots
it has, the sizes, etc, use the common pstore infrastructure to handle
Oops gathering and extraction.  This presents a much easier to use
filesystem-based view to the memory region.  This also means that any
other tools that are written to understand pstore will automatically be
able to process ramoops too.

Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Marco Stornelli <marco.stornelli@gmail.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Greg Kroah-Hartman <greg@kroah.com>
Cc: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agoc/r: ipc: message queue stealing feature introduced
Stanislav Kinsbursky [Thu, 12 Apr 2012 22:52:40 +0000 (08:52 +1000)]
c/r: ipc: message queue stealing feature introduced

This patch is required for checkpoint/restore in userspace.

IOW, c/r requires some way to get all pending IPC messages without
deleting them for the queue (checkpoint can fail and in this case tasks
will be resumed, so queue have to be valid).

To achieve this, the new operation flag MSG_STEAL for sys_msgrcv() system
call introduced.  If this flag is set, then passed struct msgbuf pointer
will be used for storing array of structures:

struct msgbuf_a {
long mtype;         /* type of message */
int msize;          /* size of message */
char mtext[0];      /* message text */
};

each of which will be followed by corresponding message data.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Cc: Lucas De Marchi <lucas.de.marchi@gmail.com>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Pavel Emelyanov <xemul@parallels.com>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agoc/r: ipc: message queue receive cleanup
Stanislav Kinsbursky [Thu, 12 Apr 2012 22:52:40 +0000 (08:52 +1000)]
c/r: ipc: message queue receive cleanup

Move all message related manipulation into one function msg_fill().
Actually, two functions because of the compat one.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Cc: Lucas De Marchi <lucas.de.marchi@gmail.com>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Pavel Emelyanov <xemul@parallels.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agoc/r: prctl: add ability to get clear_tid_address
Cyrill Gorcunov [Thu, 12 Apr 2012 22:52:39 +0000 (08:52 +1000)]
c/r: prctl: add ability to get clear_tid_address

Zero is written at clear_tid_address when the process exits.  This
functionality is used by pthread_join().

We already have sys_set_tid_address() to change this address for the
current task but there is no way to obtain it from user space.

Without the ability to find this address and dump it we can't restore
pthread'ed apps which call pthread_join() once they have been restored.

This patch introduces the PR_GET_TID_ADDRESS prctl option which allows the
current process to obtain own clear_tid_address.

This feature is available iif CONFIG_CHECKPOINT_RESTORE is set.

[akpm@linux-foundation.org: fix prctl numbering]
Signed-off-by: Andrew Vagin <avagin@openvz.org>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Pedro Alves <palves@redhat.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Pavel Emelyanov <xemul@parallels.com>
Cc: Tejun Heo <tj@kernel.org>
Acked-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agoc/r: prctl: add ability to set new mm_struct::exe_file
Cyrill Gorcunov [Thu, 12 Apr 2012 22:52:39 +0000 (08:52 +1000)]
c/r: prctl: add ability to set new mm_struct::exe_file

When we do restore we would like to have a way to setup a former
mm_struct::exe_file so that /proc/pid/exe would point to the original
executable file a process had at checkpoint time.

For this the PR_SET_MM_EXE_FILE code is introduced.  This option takes a
file descriptor which will be set as a source for new /proc/$pid/exe
symlink.

Note it allows to change /proc/$pid/exe if there are no VM_EXECUTABLE
vmas present for current process, simply because this feature is a special
to C/R and mm::num_exe_file_vmas become meaningless after that.

To minimize the amount of transition the /proc/pid/exe symlink might have,
this feature is implemented in one-shot manner.  Thus once changed the
symlink can't be changed again.  This should help sysadmins to monitor the
symlinks over all process running in a system.

In particular one could make a snapshot of processes and ring alarm if
there unexpected changes of /proc/pid/exe's in a system.

Note -- this feature is available iif CONFIG_CHECKPOINT_RESTORE is set and
the caller must have CAP_SYS_RESOURCE capability granted, otherwise the
request to change symlink will be rejected.

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Pavel Emelyanov <xemul@parallels.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Matt Helsley <matthltc@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agoc/r: prctl: extend PR_SET_MM to set up more mm_struct entries
Cyrill Gorcunov [Thu, 12 Apr 2012 22:52:38 +0000 (08:52 +1000)]
c/r: prctl: extend PR_SET_MM to set up more mm_struct entries

During checkpoint we dump whole process memory to a file and the dump
includes process stack memory.  But among stack data itself, the stack
carries additional parameters such as command line arguments, environment
data and auxiliary vector.

So when we do restore procedure and once we've restored stack data itself
we need to setup mm_struct::arg_start/end, env_start/end, so restored
process would be able to find command line arguments and environment data
it had at checkpoint time.  The same applies to auxiliary vector.

For this reason additional PR_SET_MM_(ARG_START | ARG_END | ENV_START |
ENV_END | AUXV) codes are introduced.

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Acked-by: Kees Cook <keescook@chromium.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Andrew Vagin <avagin@openvz.org>
Cc: Serge Hallyn <serge.hallyn@canonical.com>
Cc: Pavel Emelyanov <xemul@parallels.com>
Cc: Vasiliy Kulikov <segoon@openwall.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agoc/r: procfs: add arg_start/end, env_start/end and exit_code members to /proc/$pid...
Cyrill Gorcunov [Thu, 12 Apr 2012 22:52:38 +0000 (08:52 +1000)]
c/r: procfs: add arg_start/end, env_start/end and exit_code members to /proc/$pid/stat

We would like to have an ability to restore command line arguments and
program environment pointers but first we need to obtain them somehow.
Thus we put these values into /proc/$pid/stat.  The exit_code is needed to
restore zombie tasks.

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Acked-by: Kees Cook <keescook@chromium.org>
Cc: Pavel Emelyanov <xemul@parallels.com>
Cc: Serge Hallyn <serge.hallyn@canonical.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Andrew Vagin <avagin@openvz.org>
Cc: Vasiliy Kulikov <segoon@openwall.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agosyscalls-x86-add-__nr_kcmp-syscall-v8-comment-update-fix
Andrew Morton [Thu, 12 Apr 2012 22:52:37 +0000 (08:52 +1000)]
syscalls-x86-add-__nr_kcmp-syscall-v8-comment-update-fix

tweak comment text

Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agosyscalls-x86-add-__nr_kcmp-syscall-v8 comment update
Cyrill Gorcunov [Thu, 12 Apr 2012 22:52:37 +0000 (08:52 +1000)]
syscalls-x86-add-__nr_kcmp-syscall-v8 comment update

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agosyscalls, x86: add __NR_kcmp syscall
Cyrill Gorcunov [Thu, 12 Apr 2012 22:52:36 +0000 (08:52 +1000)]
syscalls, x86: add __NR_kcmp syscall

While doing the checkpoint-restore in the user space one need to determine
whether various kernel objects (like mm_struct-s of file_struct-s) are
shared between tasks and restore this state.

The 2nd step can be solved by using appropriate CLONE_ flags and the
unshare syscall, while there's currently no ways for solving the 1st one.

One of the ways for checking whether two tasks share e.g.  mm_struct is to
provide some mm_struct ID of a task to its proc file, but showing such
info considered to be not that good for security reasons.

Thus after some debates we end up in conclusion that using that named
'comparison' syscall might be the best candidate.  So here is it --
__NR_kcmp.

It takes up to 5 arguments - the pids of the two tasks (which
characteristics should be compared), the comparison type and (in case of
comparison of files) two file descriptors.

Lookups for pids are done in the caller's PID namespace only.

At moment only x86 is supported and tested.

[akpm@linux-foundation.org: fix up selftests, warnings]
[akpm@linux-foundation.org: include errno.h]
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Pavel Emelyanov <xemul@parallels.com>
Cc: Andrey Vagin <avagin@openvz.org>
Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Glauber Costa <glommer@parallels.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Matt Helsley <matthltc@us.ibm.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Vasiliy Kulikov <segoon@openwall.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Valdis.Kletnieks@vt.edu
Cc: Michal Marek <mmarek@suse.cz>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agofs, proc: introduce /proc/<pid>/task/<tid>/children entry
Cyrill Gorcunov [Thu, 12 Apr 2012 22:52:36 +0000 (08:52 +1000)]
fs, proc: introduce /proc/<pid>/task/<tid>/children entry

When we do checkpoint of a task we need to know the list of children the
task, has but there is no easy and fast way to generate reverse
parent->children chain from arbitrary <pid> (while a parent pid is
provided in "PPid" field of /proc/<pid>/status).

So instead of walking over all pids in the system (creating one big
process tree in memory, just to figure out which children a task has) --
we add explicit /proc/<pid>/task/<tid>/children entry, because the kernel
already has this kind of information but it is not yet exported.

This is a first level children, not the whole process tree.

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Cc: Pavel Emelyanov <xemul@parallels.com>
Cc: Serge Hallyn <serge.hallyn@canonical.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agosysctl: make kernel.ns_last_pid control dependent on CHECKPOINT_RESTORE
Cyrill Gorcunov [Thu, 12 Apr 2012 22:52:35 +0000 (08:52 +1000)]
sysctl: make kernel.ns_last_pid control dependent on CHECKPOINT_RESTORE

For those who doesn't need C/R functionality there is no need to control
last pid, ie the pid for the next fork() call.

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Pavel Emelyanov <xemul@parallels.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agorapidio/tsi721: add DMA engine support
Alexandre Bounine [Thu, 12 Apr 2012 22:52:35 +0000 (08:52 +1000)]
rapidio/tsi721: add DMA engine support

Adds support for DMA Engine API into Tsi721 mport driver.

Includes following changes for Tsi721 driver:
- Modifies BDMA register offset definitions to support per-channel handling
- Separates BDMA channel reserved for RIO Maintenance requests
- Adds DMA Engine callback routines

Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Vinod Koul <vinod.koul@intel.com>
Cc: Li Yang <leoli@freescale.com>
Cc: Matt Porter <mporter@kernel.crashing.org>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agorapidio: add DMA engine support for RIO data transfers
Alexandre Bounine [Thu, 12 Apr 2012 22:52:35 +0000 (08:52 +1000)]
rapidio: add DMA engine support for RIO data transfers

Adds DMA Engine framework support into RapidIO subsystem.

Uses DMA Engine DMA_SLAVE interface to generate data transfers to/from
remote RapidIO target devices.

Introduces RapidIO-specific wrapper for prep_slave_sg() interface with an
extra parameter to pass target specific information.

Uses scatterlist to describe local data buffer.  Address flat data buffer
on a remote side.

Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Acked-by: Vinod Koul <vinod.koul@linux.intel.com>
Cc: Li Yang <leoli@freescale.com>
Cc: Matt Porter <mporter@kernel.crashing.org>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agoipc/sem.c: alternatives to preempt_disable()
Manfred Spraul [Thu, 12 Apr 2012 22:52:34 +0000 (08:52 +1000)]
ipc/sem.c: alternatives to preempt_disable()

ipc/sem.c uses a custom wakeup scheme that relies on preempt_disable().
On -RT, this causes increased latencies and debug warnings.

The patch adds two additional schemes:
- one built around a completion - could be better for -RT kernels
- one built around a spinlock - unfortunately it's broken
- and the current one

My preferred solution would be the spinlock implementation: RT would use
premptible spinlocks, mainline normal spinlocks.  Thus both get the
optimal implementation without any special code in ipc/sem.c.
Unfortunately, I don't see how it could be fixed.

Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Mike Galbraith <efault@gmx.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agoproc: use IS_ERR_OR_NULL()
Cong Wang [Thu, 12 Apr 2012 22:52:34 +0000 (08:52 +1000)]
proc: use IS_ERR_OR_NULL()

Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agoproc: use mm_access() instead of ptrace_may_access()
Cong Wang [Thu, 12 Apr 2012 22:52:33 +0000 (08:52 +1000)]
proc: use mm_access() instead of ptrace_may_access()

mm_access() handles this much better, and avoids some race conditions.

Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agoproc: remove mm_for_maps()
Cong Wang [Thu, 12 Apr 2012 22:52:32 +0000 (08:52 +1000)]
proc: remove mm_for_maps()

mm_for_maps() is a simple wrapper for mm_access(), and the name is
misleading, so just remove it and use mm_access() directly.

Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Acked-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agoproc: unify ptrace_may_access() locking code
Cong Wang [Thu, 12 Apr 2012 22:52:32 +0000 (08:52 +1000)]
proc: unify ptrace_may_access() locking code

Unify mutex_lock+ptrace_may_access code and rename lock_trace() to
task_access_lock(), which better describes what it does.

Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agoproc-clean-up-proc-pid-environ-handling-checkpatch-fixes
Andrew Morton [Thu, 12 Apr 2012 22:52:31 +0000 (08:52 +1000)]
proc-clean-up-proc-pid-environ-handling-checkpatch-fixes

ERROR: "foo* bar" should be "foo *bar"
#26: FILE: fs/proc/base.c:680:
+static int __mem_open(struct inode* inode, struct file* file, unsigned int mode)

ERROR: "foo* bar" should be "foo *bar"
#26: FILE: fs/proc/base.c:680:
+static int __mem_open(struct inode* inode, struct file* file, unsigned int mode)

ERROR: "foo* bar" should be "foo *bar"
#43: FILE: fs/proc/base.c:708:
+static int mem_open(struct inode* inode, struct file* file)

ERROR: "foo* bar" should be "foo *bar"
#43: FILE: fs/proc/base.c:708:
+static int mem_open(struct inode* inode, struct file* file)

ERROR: "foo* bar" should be "foo *bar"
#55: FILE: fs/proc/base.c:809:
+static int environ_open(struct inode* inode, struct file* file)

ERROR: "foo* bar" should be "foo *bar"
#55: FILE: fs/proc/base.c:809:
+static int environ_open(struct inode* inode, struct file* file)

total: 6 errors, 0 warnings, 100 lines checked

./patches/proc-clean-up-proc-pid-environ-handling.patch has style problems, please review.

If any of these errors are false positives, please report
them to the maintainer, see CHECKPATCH in MAINTAINERS.

Please run checkpatch prior to sending patches

Cc: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agoproc: clean up /proc/<pid>/environ handling
Cong Wang [Thu, 12 Apr 2012 22:52:31 +0000 (08:52 +1000)]
proc: clean up /proc/<pid>/environ handling

Similar to e268337dfe2 ("proc: clean up and fix /proc/<pid>/mem
handling"), move the check of permission to open(), this will simplify
read() code.

Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agokmod: avoid deadlock from recursive kmod call
Tetsuo Handa [Thu, 12 Apr 2012 22:52:30 +0000 (08:52 +1000)]
kmod: avoid deadlock from recursive kmod call

The system deadlocks (at least since 2.6.10) when
call_usermodehelper(UMH_WAIT_EXEC) request triggered
call_usermodehelper(UMH_WAIT_PROC) request.

This is because "khelper thread is waiting for the worker thread at
wait_for_completion() in do_fork() since the worker thread was created
with CLONE_VFORK flag" and "the worker thread cannot call complete()
because do_execve() is blocked at UMH_WAIT_PROC request" and "the khelper
thread cannot start processing UMH_WAIT_PROC request because the khelper
thread is waiting for the worker thread at wait_for_completion() in
do_fork()".

In order to avoid deadlock, do not try to call wait_for_completion() in
call_usermodehelper_exec() if the worker thread was created by khelper
thread with CLONE_VFORK flag.

The easiest example to observe this deadlock is to use a corrupted
/sbin/hotplug binary (like shown below).

  # : > /tmp/dummy
  # chmod 755 /tmp/dummy
  # echo /tmp/dummy > /proc/sys/kernel/hotplug
  # modprobe whatever

call_usermodehelper("/tmp/dummy", UMH_WAIT_EXEC) is called from
kobject_uevent_env() in lib/kobject_uevent.c upon loading/unloading a
module.  do_execve("/tmp/dummy") triggers a call to
request_module("binfmt-0000") from search_binary_handler() which in turn
calls call_usermodehelper(UMH_WAIT_PROC).

There are various hooks called during do_execve() operation (e.g.
security_bprm_check(), audit_bprm(), "struct
linux_binfmt"->load_binary()).  If one of such hooks triggers
UMH_WAIT_EXEC, this deadlock will happen even if /sbin/hotplug is not
corrupted.

[akpm@linux-foundation.org: add comment to kmod_thread_locker]
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agokmod: move call_usermodehelper_fns() to .c file and unexport all it's helpers
Boaz Harrosh [Thu, 12 Apr 2012 22:52:30 +0000 (08:52 +1000)]
kmod: move call_usermodehelper_fns() to .c file and unexport all it's helpers

If we move call_usermodehelper_fns() to kmod.c file and EXPORT_SYMBOL it
we can avoid exporting all it's helper functions:
call_usermodehelper_setup
call_usermodehelper_setfns
call_usermodehelper_exec
And make all of them static to kmod.c

Since the optimizer will see all these as a single call site it will
inline them inside call_usermodehelper_fns().  So we loose the call to
_fns but gain 3 calls to the helpers.  (Not that it matters)

Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agokmod: convert two call sites to call_usermodehelper_fns()
Boaz Harrosh [Thu, 12 Apr 2012 22:52:29 +0000 (08:52 +1000)]
kmod: convert two call sites to call_usermodehelper_fns()

Both kernel/sys.c && security/keys/request_key.c where inlining the exact
same code as call_usermodehelper_fns(); So simply convert these sites to
directly use call_usermodehelper_fns().

Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agokmod: unexport call_usermodehelper_freeinfo()
Boaz Harrosh [Thu, 12 Apr 2012 22:52:29 +0000 (08:52 +1000)]
kmod: unexport call_usermodehelper_freeinfo()

call_usermodehelper_freeinfo() is not used outside of kmod.c.  So unexport
it, and make it static to kmod.c

Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agoHPFS: remove PRINTK() macro
Dan Carpenter [Thu, 12 Apr 2012 22:52:28 +0000 (08:52 +1000)]
HPFS: remove PRINTK() macro

The PRINTK() macro isn't really used.  Let's just remove it because it
is ugly and out of date.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Mikulas Patocka <mikulas@artax.karlin.mff.cuni.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agortc-rename-config_rtc_mxc-to-config_rtc_drv_mxc-fix
Andrew Morton [Thu, 12 Apr 2012 22:52:27 +0000 (08:52 +1000)]
rtc-rename-config_rtc_mxc-to-config_rtc_drv_mxc-fix

Cc: Alessandro Zummo <a.zummo@towertech.it>
Cc: Fabio Estevam <fabio.estevam@freescale.com>
Cc: Fabio Estevam <festevam@gmail.com>
Cc: Wolfram Sang <w.sang@pengutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agortc: rename CONFIG_RTC_MXC to CONFIG_RTC_DRV_MXC
Fabio Estevam [Thu, 12 Apr 2012 22:52:27 +0000 (08:52 +1000)]
rtc: rename CONFIG_RTC_MXC to CONFIG_RTC_DRV_MXC

In order to keep consistency with other rtc drivers,rename CONFIG_RTC_MXC
to CONFIG_RTC_DRV_MXC.

Signed-off-by: Fabio Estevam <fabio.estevam@freescale.com>
Acked-by: Wolfram Sang <w.sang@pengutronix.de>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agodrivers/rtc/Kconfig: place RTC_DRV_IMXDI and RTC_MXC under "on-CPU RTC drivers"
Fabio Estevam [Thu, 12 Apr 2012 22:52:26 +0000 (08:52 +1000)]
drivers/rtc/Kconfig: place RTC_DRV_IMXDI and RTC_MXC under "on-CPU RTC drivers"

RTC_DRV_IMXDI and RTC_MXC are on-chip RTC modules, so move them under
"on-CPU RTC drivers" selection menu.

While at it change the dependency of RTC_DRV_IMXDI from ARCH_MX25 to
SOC_IMX25.

Signed-off-by: Fabio Estevam <fabio.estevam@freescale.com>
Acked-by: Wolfram Sang <w.sang@pengutronix.de>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agodrivers/rtc/rtc-pcf8563.c: add RTC_VL_READ/RTC_VL_CLR ioctl feature
Alexander Stein [Thu, 12 Apr 2012 22:52:26 +0000 (08:52 +1000)]
drivers/rtc/rtc-pcf8563.c: add RTC_VL_READ/RTC_VL_CLR ioctl feature

Changes are based on arch/cris/arch-v10/drivers/pcf8563.c

Signed-off-by: Alexander Stein <alexander.stein@systec-electronic.com>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Cc: Mikael Starvik <starvik@axis.com>
Cc: Jesper Nilsson <jesper.nilsson@axis.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agortc: add ioctl to get/clear battery low voltage status
Alexander Stein [Thu, 12 Apr 2012 22:52:25 +0000 (08:52 +1000)]
rtc: add ioctl to get/clear battery low voltage status

Currently there is no generic way to get the RTC battery status within an
application.  So add an ioctl to read the status bit.  The idea is that
the bit is set once a low voltage is detected.  It stays there until it is
reset using the RTC_VL_CLR ioctl.

Signed-off-by: Alexander Stein <alexander.stein@systec-electronic.com>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agodrivers/rtc/rtc-ep93xx.c: convert to use module_platform_driver()
H Hartley Sweeten [Thu, 12 Apr 2012 22:52:25 +0000 (08:52 +1000)]
drivers/rtc/rtc-ep93xx.c: convert to use module_platform_driver()

Use module_platform_driver() to remove the boilerplate code.

Also, change the probe and remove functions to __devinit/__devexit.

Signed-off-by: H Hartley Sweeten <hsweeten@visionengravers.com>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agortc/spear: add Device Tree probing capability
Viresh Kumar [Thu, 12 Apr 2012 22:52:25 +0000 (08:52 +1000)]
rtc/spear: add Device Tree probing capability

SPEAr platforms now support DT and so must convert all drivers support DT.
 This patch adds DT probing support for rtc and updates its documentation
too.

Signed-off-by: Viresh Kumar <viresh.kumar@st.com>
Cc: Stefan Roese <sr@denx.de>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Cc: Rajeev Kumar <rajeev-dlh.kumar@st.com>
Cc: Rob Herring <robherring2@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agocheckpatch: suggest pr_<level> over printk(KERN_<LEVEL>
Joe Perches [Thu, 12 Apr 2012 22:52:24 +0000 (08:52 +1000)]
checkpatch: suggest pr_<level> over printk(KERN_<LEVEL>

Suggest the shorter pr_<level> instead of printk(KERN_<LEVEL>.

Prefer to use pr_<level> over bare printks.
Prefer to use pr_warn over pr_warning.

Signed-off-by: Joe Perches <joe@perches.com>
Cc: Andy Whitcroft <apw@shadowen.org>
Cc: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agolib/test-kstrtox.c: mark const init data with __initconst instead of __initdata
Uwe Kleine-König [Thu, 12 Apr 2012 22:52:24 +0000 (08:52 +1000)]
lib/test-kstrtox.c: mark const init data with __initconst instead of __initdata

As long as there is no other non-const variable marked __initdata in the
same compilation unit it doesn't hurt. If there were one however
compilation would fail with

error: $variablename causes a section type conflict

because a section containing const variables is marked read only and so
cannot contain non-const variables.

Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agolist_debug: WARN for adding something already in the list
Chris Metcalf [Thu, 12 Apr 2012 22:52:23 +0000 (08:52 +1000)]
list_debug: WARN for adding something already in the list

We were bitten by this at one point and added an additional sanity test
for DEBUG_LIST.  You can't validly add a list_head to a list where either
prev or next is the same as the thing you're adding.

Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
Cc: Andi Kleen <andi@firstfloor.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agoleds: simple_strtoul() cleanup
Shuah Khan [Thu, 12 Apr 2012 22:52:23 +0000 (08:52 +1000)]
leds: simple_strtoul() cleanup

led-class.c and ledtrig-timer.c still use simple_strtoul().  Change them
to use kstrtoul() instead of obsolete simple_strtoul().

Also fix the existing int ret declaration to be ssize_t to match the
return type for _store functions in ledtrig-timer.c.

Signed-off-by: Shuah Khan <shuahkhan@gmail.com>
Cc: Joe Perches <joe@perches.com>
Cc: Richard Purdie <rpurdie@rpsys.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agoleds-add-led-driver-for-lm3556-chip-checkpatch-fixes
Andrew Morton [Thu, 12 Apr 2012 22:52:22 +0000 (08:52 +1000)]
leds-add-led-driver-for-lm3556-chip-checkpatch-fixes

WARNING: please write a paragraph that describes the config symbol fully
#35: FILE: drivers/leds/Kconfig:405:
+config LEDS_LM3556

ERROR: "foo * bar" should be "foo *bar"
#204: FILE: drivers/leds/leds-lm3556.c:142:
+static int lm3556_read_reg(struct i2c_client *client, u8 reg, u8 * val)

total: 1 errors, 1 warnings, 736 lines checked

./patches/leds-add-led-driver-for-lm3556-chip.patch has style problems, please review.

If any of these errors are false positives, please report
them to the maintainer, see CHECKPATCH in MAINTAINERS.

Please run checkpatch prior to sending patches

Cc: Geon Si Jeong <gshark.jeong@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agoleds: add LED driver for lm3556 chip
Geon Si Jeong [Thu, 12 Apr 2012 22:52:22 +0000 (08:52 +1000)]
leds: add LED driver for lm3556 chip

A simple driver for the Texas Instruments LM3556 chip.

The LM3556 is a 4 MHz fixed-frequency synchronous boost converter plus
1.5A constant current driver for a high-current white LED.  Datasheet:
www.national.com/ds/LM/LM3556.pdf

Tested on OMAP4430

Signed-off-by: Geon Si Jeong <gshark.jeong@gmail.com>
Cc: Richard Purdie <rpurdie@rpsys.net>
Cc: Daniel Jeong <daniel.jeong@ti.com>
Cc: Mark Brown <broonie@opensource.wolfsonmicro.com>
Cc: Greg KH <greg@kroah.com>
Cc: Wolfram Sang <w.sang@pengutronix.de>
Cc: Shuah Khan <shuahkhan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agoleds-led-module-for-da9052-53-pmic-v2-fix
Andrew Morton [Thu, 12 Apr 2012 22:52:21 +0000 (08:52 +1000)]
leds-led-module-for-da9052-53-pmic-v2-fix

Cc: Ashish Jangam <ashish.jangam@kpitcummins.com>
Cc: David Dajun Chen <dchen@diasemi.com>
Cc: Lars-Peter Clausen <lars@metafoo.de>
Cc: Richard Purdie <rpurdie@rpsys.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agoleds: driver for DA9052/53 PMIC v2
David Dajun Chen [Thu, 12 Apr 2012 22:52:21 +0000 (08:52 +1000)]
leds: driver for DA9052/53 PMIC v2

LED Driver for Dialog Semiconductor DA9052/53 PMICs.

Signed-off-by: David Dajun Chen <dchen@diasemi.com>
Signed-off-by: Ashish Jangam <ashish.jangam@kpitcummins.com>
Reviewed-by: Lars-Peter Clausen <lars@metafoo.de>
Cc: Richard Purdie <rpurdie@rpsys.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agodrivers/leds/leds-lp5521.c: fix lp5521_read() error handling
Dan Carpenter [Thu, 12 Apr 2012 22:52:20 +0000 (08:52 +1000)]
drivers/leds/leds-lp5521.c: fix lp5521_read() error handling

Gcc 4.6.2 complains that:
drivers/leds/leds-lp5521.c: In function `lp5521_load_program':
drivers/leds/leds-lp5521.c:214:21: warning: `mode' may be used uninitialized in this function [-Wuninitialized]
drivers/leds/leds-lp5521.c: In function `lp5521_probe':
drivers/leds/leds-lp5521.c:788:5: warning: `buf' may be used uninitialized in this function [-Wuninitialized]
drivers/leds/leds-lp5521.c:740:6: warning: `ret' may be used uninitialized in this function [-Wuninitialized]

These are real problems if lp5521_read() returns an error.  When that
happens we should handle it, instead of ignoring it or doing a bitwise OR
with all the other error codes and continuing.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Cc: Milo <Milo.Kim@ti.com>
Cc: Richard Purdie <rpurdie@rpsys.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agoblacklight: remove redundant spi driver bus initialization
Lars-Peter Clausen [Thu, 12 Apr 2012 22:52:20 +0000 (08:52 +1000)]
blacklight: remove redundant spi driver bus initialization

In ancient times it was necessary to manually initialize the bus field of
an spi_driver to spi_bus_type.  These days this is done in
spi_driver_register() so we can drop the manual assignment.

The patch was generated using the following coccinelle semantic patch:
// <smpl>
@@
identifier _driver;
@@
struct spi_driver _driver = {
.driver = {
- .bus = &spi_bus_type,
},
};
// </smpl>

Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
Cc: Richard Purdie <rpurdie@rpsys.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agoMAINTAINERS: remove Alessandro
Alessandro Rubini [Thu, 12 Apr 2012 22:52:19 +0000 (08:52 +1000)]
MAINTAINERS: remove Alessandro

I'm definitely not up to date on the subject matter any more
and haven't been able to comment on the messages on the topic.

Signed-off-by: Alessandro Rubini <rubini@ipvvis.unipv.it>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agohamradio/scc: orphan driver in MAINTAINERS
Uwe Kleine-König [Thu, 12 Apr 2012 22:52:19 +0000 (08:52 +1000)]
hamradio/scc: orphan driver in MAINTAINERS

The email address doesn't exist anymore and the website yields a 404.

Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agovsprintf-further-optimize-decimal-conversion-checkpatch-fixes
Andrew Morton [Thu, 12 Apr 2012 22:52:18 +0000 (08:52 +1000)]
vsprintf-further-optimize-decimal-conversion-checkpatch-fixes

WARNING: line over 80 characters
#112: FILE: lib/vsprintf.c:136:
+  * (x * 0x00cd) >> 11     x <       1029 shorter code than * 0x67 (on i386)

ERROR: trailing statements should be on next line
#192: FILE: lib/vsprintf.c:178:
+ if (q == 0) return buf;

ERROR: trailing statements should be on next line
#195: FILE: lib/vsprintf.c:181:
+ if (r == 0) return buf;

ERROR: trailing statements should be on next line
#198: FILE: lib/vsprintf.c:184:
+ if (q == 0) return buf;

ERROR: trailing statements should be on next line
#201: FILE: lib/vsprintf.c:187:
+ if (r == 0) return buf;

ERROR: trailing statements should be on next line
#204: FILE: lib/vsprintf.c:190:
+ if (q == 0) return buf;

ERROR: trailing statements should be on next line
#207: FILE: lib/vsprintf.c:193:
+ if (r == 0) return buf;

ERROR: trailing statements should be on next line
#210: FILE: lib/vsprintf.c:196:
+ if (q == 0) return buf;

ERROR: space prohibited after that '&' (ctx:WxW)
#290: FILE: lib/vsprintf.c:267:
+ d2  = (h      ) & 0xffff;
                  ^

ERROR: space prohibited before that close parenthesis ')'
#290: FILE: lib/vsprintf.c:267:
+ d2  = (h      ) & 0xffff;

total: 9 errors, 1 warnings, 310 lines checked

./patches/vsprintf-further-optimize-decimal-conversion.patch has style problems, please review.

If any of these errors are false positives, please report
them to the maintainer, see CHECKPATCH in MAINTAINERS.

Please run checkpatch prior to sending patches

Cc: Denys Vlasenko <vda.linux@googlemail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agovsprintf-further-optimize-decimal-conversion-v2
Denys Vlasenko [Thu, 12 Apr 2012 22:52:18 +0000 (08:52 +1000)]
vsprintf-further-optimize-decimal-conversion-v2

Here is a new version. I also plugged a hole in num_to_str() -
it was assuming it's safe to call put_dec() for num=0.
(We never tripped over it before because the single caller
of num_to_str() takes care of that case).

Signed-off-by: Denys Vlasenko <vda.linux@googlemail.com>
Cc: Michal Nazarewicz <mina86@mina86.com>
Cc: Douglas W Jones <jones@cs.uiowa.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agovsprintf: further optimize decimal conversion
Denys Vlasenko [Thu, 12 Apr 2012 22:52:18 +0000 (08:52 +1000)]
vsprintf: further optimize decimal conversion

Previous code was using optimizations which were developed to work well
even on narrow-word CPUs (by today's standards).  But Linux runs only on
32-bit and wider CPUs.  We can use that.

First: using 32x32->64 multiply and trivial 32-bit shift, we can correctly
divide by 10 much larger numbers, and thus we can print groups of 9 digits
instead of groups of 5 digits.

Next: there are two algorithms to print larger numbers.  One is generic:
divide by 1000000000 and repeatedly print groups of (up to) 9 digits.
It's conceptually simple, but requires an (unsigned long long) /
1000000000 division.

Second algorithm splits 64-bit unsigned long long into 16-bit chunks,
manipulates them cleverly and generates groups of 4 decimal digits.  It so
happens that it does NOT require long long division.

If long is > 32 bits, division of 64-bit values is relatively easy, and we
will use the first algorithm.  If long long is > 64 bits (strange
architecture with VERY large long long), second algorithm can't be used,
and we again use the first one.

Else (if long is 32 bits and long long is 64 bits) we use second one.

And third: there is a simple optimization which takes fast path not only
for zero as was done before, but for all one-digit numbers.

In all tested cases new code is faster than old one, in many cases by 30%,
in few cases by more than 50% (for example, on x86-32, conversion of
12345678).  Code growth is ~0 in 32-bit case and ~130 bytes in 64-bit
case.

This patch is based upon an original from Michal Nazarewicz.

Signed-off-by: Michal Nazarewicz <mina86@mina86.com>
Signed-off-by: Denys Vlasenko <vda.linux@googlemail.com>
Cc: Douglas W Jones <jones@cs.uiowa.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agoparisc: use set_current_blocked() and block_sigmask()
Matt Fleming [Thu, 12 Apr 2012 22:52:17 +0000 (08:52 +1000)]
parisc: use set_current_blocked() and block_sigmask()

As described in e6fa16ab ("signal: sigprocmask() should do
retarget_shared_pending()") the modification of current->blocked is
incorrect as we need to check whether the signal we're about to block is
pending in the shared queue.

Also, use the new helper function introduced in commit 5e6292c0f28f
("signal: add block_sigmask() for adding sigmask to current->blocked")
which centralises the code for updating current->blocked after
successfully delivering a signal and reduces the amount of duplicate code
across architectures.  In the past some architectures got this code wrong,
so using this helper function should stop that from happening again.

Acked-by: Oleg Nesterov <oleg@redhat.com>
Cc: Kyle McMartin <kyle@mcmartin.ca>
Cc: Helge Deller <deller@gmx.de>
Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agocheckpatch: check for spin_is_locked()
Andi Kleen [Thu, 12 Apr 2012 22:52:16 +0000 (08:52 +1000)]
checkpatch: check for spin_is_locked()

spin_is_locked() is usually misued. In checkpatch.pl:

- warn when it is used at all

- error out when it is asserted on free, because that's usually broken
  (e.g.  doesn't work on on uni processor builds).  Recommend
  lockdep_assert_held() instead.

[joe@perches.com: some improvements]
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Cc: Andy Whitcroft <apw@canonical.com>
Cc: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agoinclude/linux/spinlock.h: add a kerneldoc comment to spin_is_locked() that discourage...
Andi Kleen [Thu, 12 Apr 2012 22:52:16 +0000 (08:52 +1000)]
include/linux/spinlock.h: add a kerneldoc comment to spin_is_locked() that discourages its use

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Cc: Wolfram Sang <w.sang@pengutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agospinlockstxt-add-a-discussion-on-why-spin_is_locked-is-bad-fix
Andrew Morton [Thu, 12 Apr 2012 22:52:15 +0000 (08:52 +1000)]
spinlockstxt-add-a-discussion-on-why-spin_is_locked-is-bad-fix

- s/hold/held in various places

- reindent for 80 cols

- a few grammatical tweaks

Cc: Andi Kleen <ak@linux.intel.com>
Cc: Wolfram Sang <w.sang@pengutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agospinlocks.txt: add a discussion on why spin_is_locked() is bad
Andi Kleen [Thu, 12 Apr 2012 22:52:15 +0000 (08:52 +1000)]
spinlocks.txt: add a discussion on why spin_is_locked() is bad

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Cc: Wolfram Sang <w.sang@pengutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agodrivers/net/ethernet/smsc/smsc911x.h: use lockdep_assert_held() instead of home grown...
Andi Kleen [Thu, 12 Apr 2012 22:52:15 +0000 (08:52 +1000)]
drivers/net/ethernet/smsc/smsc911x.h: use lockdep_assert_held() instead of home grown buggy construct

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Acked-by: David S. Miller <davem@davemloft.net>
Cc: Steve Glendinning <steve.glendinning@smsc.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agodrivers/net/irda/sir_dev.c: remove spin_is_locked()
Andi Kleen [Thu, 12 Apr 2012 22:52:14 +0000 (08:52 +1000)]
drivers/net/irda/sir_dev.c: remove spin_is_locked()

It's hard to imagine how this spin_is_locked() debugging check is not
totally racy.  Remove it.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Cc: Samuel Ortiz <samuel@sortiz.org>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agofutex: use lockdep_assert_held() for lock checking
Andi Kleen [Thu, 12 Apr 2012 22:52:14 +0000 (08:52 +1000)]
futex: use lockdep_assert_held() for lock checking

Use lockdep_assert_held() for lock checking instead of a strange homegrown
variant.  This removes the return for this case, but that is unlikely to
be useful anyway.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Darren Hart <dvhart@linux.intel.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agomm/huge_memory.c: use lockdep_assert_held()
Andi Kleen [Thu, 12 Apr 2012 22:52:13 +0000 (08:52 +1000)]
mm/huge_memory.c: use lockdep_assert_held()

Use lockdep_assert_held() to check for locks instead of an opencoded
variant.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Hugh Dickins <hughd@google.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agoXFS: fix lock ASSERT on UP
Andi Kleen [Thu, 12 Apr 2012 22:52:13 +0000 (08:52 +1000)]
XFS: fix lock ASSERT on UP

ASSERT(!spin_is_locked()) doesn't work on UP builds.  Replace with a
standard lockdep_assert_held()

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Cc: Ben Myers <bpm@sgi.com>
Cc: Alex Elder <elder@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agodrivers/scsi/aha152x.c: remove broken usage of spin_is_locked()
Andi Kleen [Thu, 12 Apr 2012 22:52:12 +0000 (08:52 +1000)]
drivers/scsi/aha152x.c: remove broken usage of spin_is_locked()

Remove racy usage of spin_is_locked. The author seems to have been
unclear on the concept of locking.

This is debug code normally not enabled, but I caught it on a tree sweep.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agosgi-xp: use lockdep_assert_held()
Andi Kleen [Thu, 12 Apr 2012 22:52:12 +0000 (08:52 +1000)]
sgi-xp: use lockdep_assert_held()

!spin_is_locked() is always true on a UP build, so it cannot be used for
assertions.  Replace with lockdep_assert_held().

I realize UP builds are not very likely for this driver, but it's still
better to fix it.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Robin Holt <holt@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agoum/kernel/trap.c: port OOM changes to handle_page_fault()
Kautuk Consul [Thu, 12 Apr 2012 22:52:11 +0000 (08:52 +1000)]
um/kernel/trap.c: port OOM changes to handle_page_fault()

Commit d065bd810b6deb6 ("mm: retry page fault when blocking on disk
transfer") and commit 37b23e0525 ("x86,mm: make pagefault killable")

The above commits introduced changes into the x86 pagefault handler
for making the page fault handler retryable as well as killable.

These changes reduce the mmap_sem hold time, which is crucial
during OOM killer invocation.

Port these changes to um.

Signed-off-by: Kautuk Consul <consul.kautuk@gmail.com>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Richard Weinberger <richard@nod.at>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agofrv: use set_current_blocked() and block_sigmask()
Matt Fleming [Thu, 12 Apr 2012 22:52:10 +0000 (08:52 +1000)]
frv: use set_current_blocked() and block_sigmask()

As described in e6fa16ab ("signal: sigprocmask() should do
retarget_shared_pending()") the modification of current->blocked is
incorrect as we need to check whether the signal we're about to block is
pending in the shared queue.

Also, use the new helper function introduced in commit 5e6292c0f28f
("signal: add block_sigmask() for adding sigmask to current->blocked")
which centralises the code for updating current->blocked after
successfully delivering a signal and reduces the amount of duplicate code
across architectures.  In the past some architectures got this code wrong,
so using this helper function should stop that from happening again.

Acked-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: David Howells <dhowells@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agosecurity/keys/keyctl.c: suppress memory allocation failure warning
Andrew Morton [Thu, 12 Apr 2012 22:52:10 +0000 (08:52 +1000)]
security/keys/keyctl.c: suppress memory allocation failure warning

This allocation may be large.  The code is probing to see if it will
succeed and if not, it falls back to vmalloc().  We should suppress any
page-allocation failure messages when the fallback happens.

Reported-by: Dave Jones <davej@redhat.com>
Cc: David Howells <dhowells@redhat.com>
Cc: James Morris <jmorris@namei.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agomm/memcg: use vm_swappiness from target memory cgroup
Konstantin Khlebnikov [Thu, 12 Apr 2012 22:52:09 +0000 (08:52 +1000)]
mm/memcg: use vm_swappiness from target memory cgroup

Use vm_swappiness from memory cgroup which is triggered this memory
reclaim.  This is more reasonable and allows to kill one argument.

Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujtisu.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Glauber Costa <glommer@parallels.com>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Minchan Kim <minchan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agomemcg: revise the position of threshold index while unregistering event
Sha Zhengju [Thu, 12 Apr 2012 22:52:09 +0000 (08:52 +1000)]
memcg: revise the position of threshold index while unregistering event

Index current_threshold should point to threshold just below or equal to
usage.  See below: http://www.spinics.net/lists/cgroups/msg00844.html

Signed-off-by: Sha Zhengju <handai.szj@taobao.com>
Reviewed-by: Kirill A. Shutemov <kirill@shutemov.name>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agomemcg: make threshold index in the right position
Sha Zhengju [Thu, 12 Apr 2012 22:52:08 +0000 (08:52 +1000)]
memcg: make threshold index in the right position

Index current_threshold may point to threshold that just equal to usage
after last call of __mem_cgroup_threshold.  But after registering a new
event, it will change (pointing to threshold just below usage).  So make
it consistent here.

For example:
now:
threshold array:  3  [5]  7  9   (usage = 6, [index] = 5)

next turn (after calling __mem_cgroup_threshold):
threshold array:  3   5  [7]  9   (usage = 7, [index] = 7)

after registering a new event (threshold = 10):
threshold array:  3  [5]  7  9  10 (usage = 7, [index] = 5)

Signed-off-by: Sha Zhengju <handai.szj@taobao.com>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Reviewed-by: Kirill A. Shutemov <kirill@shutemov.name>
Cc: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agomemcg: remove redundant parentheses
Kirill A. Shutemov [Thu, 12 Apr 2012 22:52:08 +0000 (08:52 +1000)]
memcg: remove redundant parentheses

Signed-off-by: Kirill A. Shutemov <kirill@shutemov.name>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agomemcg: mark stat field of mem_cgroup struct as __percpu
Kirill A. Shutemov [Thu, 12 Apr 2012 22:52:07 +0000 (08:52 +1000)]
memcg: mark stat field of mem_cgroup struct as __percpu

It fixes a lot of sparse warnings.

Signed-off-by: Kirill A. Shutemov <kirill@shutemov.name>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agomemcg: remove unused variable
Kirill A. Shutemov [Thu, 12 Apr 2012 22:52:07 +0000 (08:52 +1000)]
memcg: remove unused variable

mm/memcontrol.c: In function `mc_handle_file_pte':
mm/memcontrol.c:5206:16: warning: variable `inode' set but not used [-Wunused-but-set-variable]

Signed-off-by: Kirill A. Shutemov <kirill@shutemov.name>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agomemcg: mark more functions/variables as static
Kirill A. Shutemov [Thu, 12 Apr 2012 22:52:06 +0000 (08:52 +1000)]
memcg: mark more functions/variables as static

Based on sparse output.

Signed-off-by: Kirill A. Shutemov <kirill@shutemov.name>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agomm/memcg: kill mem_cgroup_lru_del()
Konstantin Khlebnikov [Thu, 12 Apr 2012 22:52:06 +0000 (08:52 +1000)]
mm/memcg: kill mem_cgroup_lru_del()

This patch kills mem_cgroup_lru_del(), we can use
mem_cgroup_lru_del_list() instead.  On 0-order isolation we already have
right lru list id.

Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Glauber Costa <glommer@parallels.com>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Minchan Kim <minchan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agomm: remove lru type checks from __isolate_lru_page()
Konstantin Khlebnikov [Thu, 12 Apr 2012 22:52:06 +0000 (08:52 +1000)]
mm: remove lru type checks from __isolate_lru_page()

After patch "mm: forbid lumpy-reclaim in shrink_active_list()" we can
completely remove anon/file and active/inactive lru type filters from
__isolate_lru_page(), because isolation for 0-order reclaim always
isolates pages from right lru list.  And pages-isolation for lumpy
shrink_inactive_list() or memory-compaction anyway allowed to isolate
pages from all evictable lru lists.

Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Hugh Dickins <hughd@google.com>
Acked-by: Michal Hocko <mhocko@suse.cz>
Cc: Glauber Costa <glommer@parallels.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Minchan Kim <minchan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agomm: mark mm-inline functions as __always_inline
Konstantin Khlebnikov [Thu, 12 Apr 2012 22:52:05 +0000 (08:52 +1000)]
mm: mark mm-inline functions as __always_inline

GCC sometimes ignores "inline" directives even for small and simple functions.
This supposed to be fixed in gcc 4.7, but it was released only yesterday.

Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Glauber Costa <glommer@parallels.com>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Minchan Kim <minchan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agomm-push-lru-index-into-shrink_active_list-fix
Andrew Morton [Thu, 12 Apr 2012 22:52:05 +0000 (08:52 +1000)]
mm-push-lru-index-into-shrink_active_list-fix

fix kerneldoc, per Minchan

Cc: Glauber Costa <glommer@parallels.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Konstantin Khlebnikov <khlebnikov@openvz.org>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Minchan Kim <minchan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agomm: push lru index into shrink_[in]active_list()
Konstantin Khlebnikov [Thu, 12 Apr 2012 22:52:04 +0000 (08:52 +1000)]
mm: push lru index into shrink_[in]active_list()

Let's toss lru index through call stack to isolate_lru_pages(), this is
better than its reconstructing from individual bits.

Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: Hugh Dickins <hughd@google.com>
Cc: Glauber Costa <glommer@parallels.com>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Minchan Kim <minchan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agomm/memcg: move reclaim_stat into lruvec
Hugh Dickins [Thu, 12 Apr 2012 22:52:04 +0000 (08:52 +1000)]
mm/memcg: move reclaim_stat into lruvec

With mem_cgroup_disabled() now explicit, it becomes clear that the
zone_reclaim_stat structure actually belongs in lruvec, per-zone when
memcg is disabled but per-memcg per-zone when it's enabled.

We can delete mem_cgroup_get_reclaim_stat(), and change
update_page_reclaim_stat() to update just the one set of stats, the one
which get_scan_count() will actually use.

Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: Michal Hocko <mhocko@suse.cz>
Reviewed-by: Minchan Kim <minchan@kernel.org>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
Cc: Glauber Costa <glommer@parallels.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agomm/memcg: scanning_global_lru means mem_cgroup_disabled
Hugh Dickins [Thu, 12 Apr 2012 22:52:03 +0000 (08:52 +1000)]
mm/memcg: scanning_global_lru means mem_cgroup_disabled

Although one has to admire the skill with which it has been concealed,
scanning_global_lru(mz) is actually just an interesting way to test
mem_cgroup_disabled().  Too many developer hours have been wasted on
confusing it with global_reclaim(): just use mem_cgroup_disabled().

Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: Glauber Costa <glommer@parallels.com>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Minchan Kim <minchan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agomemcg swap: use mem_cgroup_uncharge_swap()
Hugh Dickins [Thu, 12 Apr 2012 22:52:03 +0000 (08:52 +1000)]
memcg swap: use mem_cgroup_uncharge_swap()

That stuff __mem_cgroup_commit_charge_swapin() does with a swap entry, it
has a name and even a declaration: just use mem_cgroup_uncharge_swap().

Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agomemcg swap: mem_cgroup_move_swap_account never needs fixup
Hugh Dickins [Thu, 12 Apr 2012 22:52:02 +0000 (08:52 +1000)]
memcg swap: mem_cgroup_move_swap_account never needs fixup

The need_fixup arg to mem_cgroup_move_swap_account() is always false,
so just remove it.

Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agomemcg: fix/change behavior of shared anon at moving task
KAMEZAWA Hiroyuki [Thu, 12 Apr 2012 22:52:02 +0000 (08:52 +1000)]
memcg: fix/change behavior of shared anon at moving task

This patch changes memcg's behavior at task_move().

At task_move(), the kernel scans a task's page table and move the changes
for mapped pages from source cgroup to target cgroup.  There has been a
bug at handling shared anonymous pages for a long time.

Before patch:
  - The spec says 'shared anonymous pages are not moved.'
  - The implementation was 'shared anonymoys pages may be moved'.
    If page_mapcount <=2, shared anonymous pages's charge were moved.

After patch:
  - The spec says 'all anonymous pages are moved'.
  - The implementation is 'all anonymous pages are moved'.

Considering usage of memcg, this will not affect user's experience.
'shared anonymous' pages only exists between a tree of processes which
don't do exec().  Moving one of process without exec() seems not sane.
For example, libcgroup will not be affected by this change.  (Anyway, no
one noticed the implementation for a long time...)

Below is a discussion log:

 - current spec/implementation are complex
 - Now, shared file caches are moved
 - It adds unclear check as page_mapcount(). To do correct check,
   we should check swap users, etc.
 - No one notice this implementation behavior. So, no one get benefit
   from the design.
 - In general, once task is moved to a cgroup for running, it will not
   be moved....
 - Finally, we have control knob as memory.move_charge_at_immigrate.

Here is a patch to allow moving shared pages, completely. This makes
memcg simpler and fix current broken code.

Suggested-by: Hugh Dickins <hughd@google.com>
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: Michal Hocko <mhocko@suse.cz>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Glauber Costa <glommer@parallels.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agomm-add-extra-free-kbytes-tunable-update-checkpatch-fixes
Andrew Morton [Thu, 12 Apr 2012 22:52:01 +0000 (08:52 +1000)]
mm-add-extra-free-kbytes-tunable-update-checkpatch-fixes

ERROR: trailing whitespace
#98: FILE: mm/page_alloc.c:5303:
+ * free_kbytes_sysctl_handler - just a wrapper around proc_dointvec() so $

ERROR: trailing whitespace
#103: FILE: mm/page_alloc.c:5307:
+int free_kbytes_sysctl_handler(ctl_table *table, int write, $

ERROR: need consistent spacing around '*' (ctx:WxV)
#103: FILE: mm/page_alloc.c:5307:
+int free_kbytes_sysctl_handler(ctl_table *table, int write,
                                          ^

total: 3 errors, 0 warnings, 69 lines checked

NOTE: whitespace errors detected, you may wish to use scripts/cleanpatch or
      scripts/cleanfile

./patches/mm-add-extra-free-kbytes-tunable-update.patch has style problems, please review.

If any of these errors are false positives, please report
them to the maintainer, see CHECKPATCH in MAINTAINERS.

Please run checkpatch prior to sending patches

Cc: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agomm-add-extra-free-kbytes-tunable-update
Rik van Riel [Thu, 12 Apr 2012 22:52:01 +0000 (08:52 +1000)]
mm-add-extra-free-kbytes-tunable-update

All the fixes suggested by Andrew Morton.   Not much of a changelog
since the patch should probably be folded into
mm-add-extra-free-kbytes-tunable.patch

Thank you for pointing these out, Andrew.

Signed-off-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agomm: add extra free kbytes tunable
Rik van Riel [Thu, 12 Apr 2012 22:52:00 +0000 (08:52 +1000)]
mm: add extra free kbytes tunable

Add a userspace visible knob to tell the VM to keep an extra amount of
memory free, by increasing the gap between each zone's min and low
watermarks.

This is useful for realtime applications that call system calls and have a
bound on the number of allocations that happen in any short time period.
In this application, extra_free_kbytes would be left at an amount equal to
or larger than than the maximum number of allocations that happen in any
burst.

It may also be useful to reduce the memory use of virtual machines
(temporarily?), in a way that does not cause memory fragmentation like
ballooning does.

Testing results from Satoru Moriya:

: I ran some sample workloads and measure memory allocation latency
: (latency of __alloc_page_nodemask()).
: The test is like following:
:
:  - CPU: 1 socket, 4 core
:  - Memory: 4GB
:
:  - Background load:
:    $ dd if=3D/dev/zero of=3D/tmp/tmp1
:    $ dd if=3D/dev/zero of=3D/tmp/tmp2
:    $ dd if=3D/dev/zero of=3D/tmp/tmp3
:
:  - Main load:
:    $ mapped-file-stream 1 $((1024 * 1024 * 640))  --(*)
:
:  (*) This is made by Johannes Weiner
:      https://lkml.org/lkml/2010/8/30/226
:
:      It allocates/access 640MByte memory at a burst.
:
: The result is follwoing:
:
:                                |         |  extra   |
:                                | default |  kbytes  |
: --------------------------------------------------------------
: min_free_kbytes                |    8113 |   8113   |
: extra_free_kbytes              |       0 | 640*1024 | (KB)
: --------------------------------------------------------------
: worst latency                  | 517.762 |  20.775  | (usec)
: --------------------------------------------------------------
: vmstat result                  |         |          |
:  nr_vmscan_write               |       0 |      0   |
:  pgsteal_dma                   |       0 |      0   |
:  pgsteal_dma32                 |  143667 | 144882   |
:  pgsteal_normal                |   31486 |  27001   |
:  pgsteal_movable               |       0 |      0   |
:  pgscan_kswapd_dma             |       0 |      0   |
:  pgscan_kswapd_dma32           |  138617 | 156351   |
:  pgscan_kswapd_normal          |   30593 |  27955   |
:  pgscan_kswapd_movable         |       0 |      0   |
:  pgscan_direct_dma             |       0 |      0   |
:  pgscan_direct_dma32           |    5050 |      0   |
:  pgscan_direct_normal          |     896 |      0   |
:  pgscan_direct_movable         |       0 |      0   |
:  kswapd_steal                  |  169207 | 171883   |
:  kswapd_inodesteal             |       0 |      0   |
:  kswapd_low_wmark_hit_quickly  |      43 |     45   |
:  kswapd_high_wmark_hit_quickly |       1 |      0   |
:  allocstall                    |      32 |      0   |
:
:
: As you can see, in the default case there were 32 direct reclaim
: (allocstal= l) and its worst latency was 517.762 usecs.  This value may be
: larger if a process would sleep or issue I/O in the direct reclaim path.
: OTOH, ii the other case where I add extra free bytes, there were no direct
: reclaim and its worst latency was 20.775 usecs.
:
: In this test case, we can avoid direct reclaim and keep a latency low.

Signed-off-by: Rik van Riel<riel@redhat.com>
Acked-by: Johannes Weiner <jweiner@redhat.com>
Tested-by: Satoru Moriya <satoru.moriya@hds.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agomm: remove swap token code
Rik van Riel [Thu, 12 Apr 2012 22:52:00 +0000 (08:52 +1000)]
mm: remove swap token code

The swap token code no longer fits in with the current VM model.  It does
not play well with cgroups or the better NUMA placement code in
development, since we have only one swap token globally.

It also has the potential to mess with scalability of the system, by
increasing the number of non-reclaimable pages on the active and inactive
anon LRU lists.

Last but not least, the swap token code has been broken for a year without
complaints, as reported by Konstantin Khlebnikov.  This suggests we no
longer have much use for it.

The days of sub-1G memory systems with heavy use of swap are over.  If we
ever need thrashing reducing code in the future, we will have to implement
something that does scale.

Signed-off-by: Rik van Riel <riel@redhat.com>
Cc: Konstantin Khlebnikov <khlebnikov@openvz.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agomm: vmscan: remove reclaim_mode_t
Mel Gorman [Thu, 12 Apr 2012 22:51:59 +0000 (08:51 +1000)]
mm: vmscan: remove reclaim_mode_t

There is little motiviation for reclaim_mode_t once RECLAIM_MODE_[A]SYNC
and lumpy reclaim have been removed.  This patch gets rid of
reclaim_mode_t as well and improves the documentation about what
reclaim/compaction is and when it is triggered.

Signed-off-by: Mel Gorman <mgorman@suse.de>
Cc: Rik van Riel <riel@redhat.com>
Cc: Konstantin Khlebnikov <khlebnikov@openvz.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Andy Whitcroft <apw@shadowen.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agomm: vmscan: remove lumpy reclaim
Mel Gorman [Thu, 12 Apr 2012 22:51:59 +0000 (08:51 +1000)]
mm: vmscan: remove lumpy reclaim

Lumpy reclaim had a purpose but in the mind of some, it was to kick the
system so hard it thrashed.  For others the purpose was to complicate
vmscan.c.  Over time it was giving softer shoes and a nicer attitude but
memory compaction needs to step up and replace it so this patch sends
lumpy reclaim to the farm.

Here are the important notes related to the patch.

1. The tracepoint format changes for isolating LRU pages.

2. This patch stops reclaim/compaction entering sync reclaim as this
   was only intended for lumpy reclaim and an oversight.  Page migration
   has its own logic for stalling on writeback pages if necessary and
   memory compaction is already using it.  This is a behaviour change.

3. RECLAIM_MODE_SYNC no longer exists.  pageout() does not stall on
   PageWriteback with CONFIG_COMPACTION.  It has been this way for a
   while.  I am calling it out in case this is a surpise to people.  This
   behaviour avoids a situation where we wait on a page being written back
   to slow storage like USB.  Currently we depend on wait_iff_congested()
   for throttling if if too many dirty pages are scanned.

4. Reclaim/compaction can no longer queue dirty pages in pageout() if
   the underlying BDI is congested.  Lumpy reclaim used this logic and
   reclaim/compaction was using it in error.  This is a behaviour change.

Signed-off-by: Mel Gorman <mgorman@suse.de>
Cc: Rik van Riel <riel@redhat.com>
Cc: Konstantin Khlebnikov <khlebnikov@openvz.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Andy Whitcroft <apw@shadowen.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agomm, thp: allow fallback when pte_alloc_one() fails for huge pmd
David Rientjes [Thu, 12 Apr 2012 22:51:58 +0000 (08:51 +1000)]
mm, thp: allow fallback when pte_alloc_one() fails for huge pmd

The transparent hugepages feature is careful to not invoke the oom killer
when a hugepage cannot be allocated.

pte_alloc_one() failing in __do_huge_pmd_anonymous_page(), however,
currently results in VM_FAULT_OOM which invokes the pagefault oom killer
to kill a memory-hogging task.

This is unnecessary since it's possible to drop the reference to the
hugepage and fallback to allocating a small page.

Signed-off-by: David Rientjes <rientjes@google.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agomm, thp: remove unnecessary ret variable
David Rientjes [Thu, 12 Apr 2012 22:51:58 +0000 (08:51 +1000)]
mm, thp: remove unnecessary ret variable

The "ret" variable is unnecessary in __do_huge_pmd_anonymous_page(), so
remove it.

Signed-off-by: David Rientjes <rientjes@google.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agomm/hugetlb.c: use long vars instead of int in region_count()
Wang Sheng-Hui [Thu, 12 Apr 2012 22:51:57 +0000 (08:51 +1000)]
mm/hugetlb.c: use long vars instead of int in region_count()

args f & t and fields from & to of struct file_region are defined as long.
 Use long instead of int to type the temp vars.

Signed-off-by: Wang Sheng-Hui <shhuiw@gmail.com>
Acked-by: David Rientjes <rientjes@google.com>
Acked-by: Hillf Danton <dhillf@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agomm/mempolicy.c: use enum value MPOL_REBIND_ONCE in mpol_rebind_policy()
Wang Sheng-Hui [Thu, 12 Apr 2012 22:51:57 +0000 (08:51 +1000)]
mm/mempolicy.c: use enum value MPOL_REBIND_ONCE in mpol_rebind_policy()

We have enum definition in mempolicy.h: MPOL_REBIND_ONCE.  It should
replace the magic number 0 for step comparison in function
mpol_rebind_policy.

Signed-off-by: Wang Sheng-Hui <shhuiw@gmail.com>
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agomm/memory_failure: let the compiler add the function name
Borislav Petkov [Thu, 12 Apr 2012 22:51:56 +0000 (08:51 +1000)]
mm/memory_failure: let the compiler add the function name

These things tend to get out of sync with time so let the compiler
automatically enter the current function name using __func__.

No functional change.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Acked-by: Andi Kleen <andi@firstfloor.org>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agobrlocks/lglocks: cleanups
Andi Kleen [Thu, 12 Apr 2012 22:51:56 +0000 (08:51 +1000)]
brlocks/lglocks: cleanups

lglocks and brlocks are currently generated with some complicated macros
in lglock.h.  But there's no reason to not just use common utility
functions and put all the data into a common data structure.

Since there are at least two users it makes sense to share this code in a
library.  This is also easier maintainable than a macro forest.

This will also make it later possible to dynamically allocate lglocks and
also use them in modules (this would both still need some additional, but
now straightforward, code)

In general the users now look more like normal function calls with
pointers, not magic macros.

The patch is rather large because I move over all users in one go to keep
it bisectable.  This impacts the VFS somewhat in terms of lines changed.
But no actual behaviour change.

[akpm@linux-foundation.org: checkpatch fixes]
[levinsasha928@gmail.com: fix dup_mnt_ns()]
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Sasha Levin <levinsasha928@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agofsnotify: handle subfiles' perm events
Naohiro Aota [Thu, 12 Apr 2012 22:51:56 +0000 (08:51 +1000)]
fsnotify: handle subfiles' perm events

Recently I'm working on fanotify and found the following strange
behaviors.

I wrote a program to set fanotify_mark on "/tmp/block" and FAN_DENY
all events notified.

fanotify_mask = FAN_ALL_EVENTS | FAN_ALL_PERM_EVENTS | FAN_EVENT_ON_CHILD:
$ cd /tmp/block; cat foo
cat: foo: Operation not permitted

Operation on the file is blocked as expected.

But,

fanotify_mask = FAN_ALL_PERM_EVENTS | FAN_EVENT_ON_CHILD:
$ cd /tmp/block; cat foo
aaa

It's not blocked anymore.  This is confusing behavior.  Also reading
commit "fsnotify: call fsnotify_parent in perm events", it seems like
fsnotify should handle subfiles' perm events as well as the other notify
events.

With this patch, regardless of FAN_ALL_EVENTS set or not:
$ cd /tmp/block; cat foo
cat: foo: Operation not permitted

Operation on the file is now blocked properly.

FS_OPEN_PERM and FS_ACCESS_PERM are not listed on FS_EVENTS_POSS_ON_CHILD.
 Due to fsnotify_inode_watches_children() check, if you only specify only
these events as fsnotify_mask, you don't get subfiles' perm events
notified.

This patch add the events to FS_EVENTS_POSS_ON_CHILD to get them notified
even if only these events are specified to fsnotify_mask.

Signed-off-by: Naohiro Aota <naota@elisp.net>
Cc: Eric Paris <eparis@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agofsnotify: remove unused parameter from send_to_group()
Dan Carpenter [Thu, 12 Apr 2012 22:51:55 +0000 (08:51 +1000)]
fsnotify: remove unused parameter from send_to_group()

We don't use "mnt" anymore in send_to_group() after 1968f5eed5 ("fanotify:
use both marks when possible") was applied.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agofs: hardlink creation restrictions
Kees Cook [Thu, 12 Apr 2012 22:51:55 +0000 (08:51 +1000)]
fs: hardlink creation restrictions

On systems that have user-writable directories on the same partition as
system files, a long-standing class of security issues is the
hardlink-based time-of-check-time-of-use race, most commonly seen in
world-writable directories like /tmp.  The common method of exploitation
of this flaw is to cross privilege boundaries when following a given
hardlink (i.e.  a root process follows a hardlink created by another
user).  Additionally, an issue exists where users can "pin" a potentially
vulnerable setuid/setgid file so that an administrator will not actually
upgrade a system fully.

The solution is to permit hardlinks to only be created when the user is
already the existing file's owner, or if they already have read/write
access to the existing file.

Many Linux users are surprised when they learn they can link to files they
have no access to, so this change appears to follow the doctrine of "least
surprise".  Additionally, this change does not violate POSIX, which states
"the implementation may require that the calling process has permission to
access the existing file"[1].

This change is known to break some implementations of the "at" daemon,
though the version used by Fedora and Ubuntu has been fixed[2] for a
while.  Otherwise, the change has been undisruptive while in use in Ubuntu
for the last 1.5 years.

This patch is based on the patch in Openwall and grsecurity.  I have added
a sysctl to enable the protected behavior, documentation, and an audit
notification.

[1] http://pubs.opengroup.org/onlinepubs/9699919799/functions/linkat.html
[2] http://anonscm.debian.org/gitweb/?p=collab-maint/at.git;a=commitdiff;h=f4114656c3a6c6f6070e315ffdf940a49eda3279

[akpm@linux-foundation.org: uninline may_linkat() and audit_log_link_denied()]
Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: Matthew Wilcox <matthew@wil.cx>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Rik van Riel <riel@redhat.com>
Cc: Federica Teodori <federica.teodori@googlemail.com>
Cc: Lucian Adrian Grijincu <lucian.grijincu@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Eric Paris <eparis@redhat.com>
Cc: Randy Dunlap <rdunlap@xenotime.net>
Cc: Dan Rosenberg <drosenberg@vsecurity.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>