git.kernelconcepts.de Git - karo-tx-linux.git/log
karo-tx-linux.git
11 years ago  Add linux-next specific files for 20121205  [next-20121205]
Stephen Rothwell [Wed, 5 Dec 2012 05:47:01 +0000 (16:47 +1100)]
Add linux-next specific files for 20121205

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
11 years ago  Merge branch 'akpm/master'
Stephen Rothwell [Wed, 5 Dec 2012 05:24:58 +0000 (16:24 +1100)]
Merge branch 'akpm/master'

11 years ago  scatterlist-dont-bug-when-we-can-trivially-return-a-proper-error-fix
Andrew Morton [Thu, 29 Nov 2012 03:19:22 +0000 (14:19 +1100)]
scatterlist-dont-bug-when-we-can-trivially-return-a-proper-error-fix

s/E2BIG/EINVAL/

Cc: Nick Bowler <nbowler@elliptictech.com>
Cc: Maxim Levitsky <maximlevitsky@gmail.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years ago  scatterlist: don't BUG when we can trivially return a proper error.
Nick Bowler [Thu, 29 Nov 2012 03:19:22 +0000 (14:19 +1100)]
scatterlist: don't BUG when we can trivially return a proper error.

There is absolutely no reason to crash the kernel when we have a perfectly
good return value already available to use for conveying failure status.

Let's return an error code instead of crashing the kernel: that sounds
like a much better plan.
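
As an illustration of the pattern described above, here is a minimal userspace sketch, not the actual scatterlist change. The names table_alloc, struct table and TABLE_MAX_ENTRIES are hypothetical; the only point is that an oversized request is reported through the existing return value (-EINVAL) instead of aborting.

```c
#include <errno.h>
#include <stddef.h>

#define TABLE_MAX_ENTRIES 8 /* hypothetical limit, for illustration only */

struct table {
    size_t nents;
    int entries[TABLE_MAX_ENTRIES];
};

/*
 * Initialise a table with nents entries.
 * Returns 0 on success, or -EINVAL when the request cannot be satisfied.
 * A crash-on-error variant would call abort() here; the caller already
 * checks the return value, so reporting the failure is enough.
 */
static int table_alloc(struct table *t, size_t nents)
{
    size_t i;

    if (t == NULL || nents > TABLE_MAX_ENTRIES)
        return -EINVAL;

    t->nents = nents;
    for (i = 0; i < nents; i++)
        t->entries[i] = 0;
    return 0;
}
```

A caller can then handle the failure like any other error, for example by propagating the negative value upwards.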

Signed-off-by: Nick Bowler <nbowler@elliptictech.com>
Cc: Maxim Levitsky <maximlevitsky@gmail.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years ago  fs-notify-add-procfs-fdinfo-helper-v7-fix
Andrew Morton [Thu, 29 Nov 2012 03:19:21 +0000 (14:19 +1100)]
fs-notify-add-procfs-fdinfo-helper-v7-fix

s/mark_lock/mark_mutex/

Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Andrey Vagin <avagin@openvz.org>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: James Bottomley <jbottomley@parallels.com>
Cc: Matthew Helsley <matt.helsley@gmail.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Pavel Emelyanov <xemul@parallels.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@onelan.co.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years ago  docs-add-documentation-about-proc-pid-fdinfo-fd-output-fix
Andrew Morton [Thu, 29 Nov 2012 03:19:21 +0000 (14:19 +1100)]
docs-add-documentation-about-proc-pid-fdinfo-fd-output-fix

Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Andrey Vagin <avagin@openvz.org>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: James Bottomley <jbottomley@parallels.com>
Cc: Matthew Helsley <matt.helsley@gmail.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Pavel Emelyanov <xemul@parallels.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@onelan.co.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years ago  docs: add documentation about /proc/<pid>/fdinfo/<fd> output
Cyrill Gorcunov [Thu, 29 Nov 2012 03:19:21 +0000 (14:19 +1100)]
docs: add documentation about /proc/<pid>/fdinfo/<fd> output

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Pavel Emelyanov <xemul@parallels.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Andrey Vagin <avagin@openvz.org>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: James Bottomley <jbottomley@parallels.com>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Matthew Helsley <matt.helsley@gmail.com>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@onelan.co.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years ago  fs, notify: don't forget to provide fhandle for inode fanotify
Cyrill Gorcunov [Thu, 29 Nov 2012 03:19:20 +0000 (14:19 +1100)]
fs, notify: don't forget to provide fhandle for inode fanotify

For inode-based fanotify I forgot to add the fhandle output.  This patch
adds it.

 | pos: 0
 | flags: 02
 | fanotify ino:2 sdev:800013 mask:1 ignored_mask:40000000 fhandle-bytes:8 fhandle-type:1 f_handle:0200000000000000
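
The f_handle field in the output above is a run of hex digits, two per byte, with fhandle-bytes giving the byte count. A minimal userspace sketch of decoding it could look as follows; decode_f_handle and hex_nibble are hypothetical helper names, and the two-digits-per-byte layout is an assumption read off the sample output.

```c
#include <stddef.h>
#include <string.h>

/* Convert one hex digit to its value, or return -1 for a non-hex character. */
static int hex_nibble(char c)
{
    if (c >= '0' && c <= '9')
        return c - '0';
    if (c >= 'a' && c <= 'f')
        return c - 'a' + 10;
    if (c >= 'A' && c <= 'F')
        return c - 'A' + 10;
    return -1;
}

/*
 * Decode the hex string found after "f_handle:" into raw bytes.
 * Returns the number of bytes written to out, or -1 if the string has an
 * odd length, contains a non-hex character, or does not fit in out.
 */
static int decode_f_handle(const char *hex, unsigned char *out, size_t out_len)
{
    size_t n = strlen(hex);
    size_t i;

    if (n % 2 != 0 || n / 2 > out_len)
        return -1;

    for (i = 0; i < n / 2; i++) {
        int hi = hex_nibble(hex[2 * i]);
        int lo = hex_nibble(hex[2 * i + 1]);

        if (hi < 0 || lo < 0)
            return -1;
        out[i] = (unsigned char)((hi << 4) | lo);
    }
    return (int)(n / 2);
}
```

The return value can be compared against the fhandle-bytes field as a sanity check.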

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Pavel Emelyanov <xemul@parallels.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Andrey Vagin <avagin@openvz.org>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: James Bottomley <jbottomley@parallels.com>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Matthew Helsley <matt.helsley@gmail.com>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@onelan.co.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years ago  fs, notify: add missing space after prefix
Cyrill Gorcunov [Thu, 29 Nov 2012 03:19:20 +0000 (14:19 +1100)]
fs, notify: add missing space after prefix

While preparing the first series I accidentally left the "inotify-wd"
token not updated.  This patch fixes it and restores the space between the
prefix and the rest of the line.

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Pavel Emelyanov <xemul@parallels.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Andrey Vagin <avagin@openvz.org>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: James Bottomley <jbottomley@parallels.com>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Matthew Helsley <matt.helsley@gmail.com>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@onelan.co.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years ago  fs, notify: Move bare fdinfo helpers to a header
Cyrill Gorcunov [Thu, 29 Nov 2012 03:19:20 +0000 (14:19 +1100)]
fs, notify: Move bare fdinfo helpers to a header

Otherwise, if the kernel gets built without procfs support,
we will get a build error:

 | fs/notify/inotify/inotify_user.c:333:17: error: 'inotify_show_fdinfo' undeclared here (not in a function)

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Reported-by: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years ago  fs, notify: add procfs fdinfo helper
Cyrill Gorcunov [Thu, 29 Nov 2012 03:19:19 +0000 (14:19 +1100)]
fs, notify: add procfs fdinfo helper

This allows us to print out fsnotify details such as watchee inode, device,
mask and optionally a file handle.

For inotify objects, if the kernel is compiled with exportfs support, the
output will be

 | pos: 0
 | flags: 02000000
 | inotify wd:3 ino:9e7e sdev:800013 mask:800afce ignored_mask:0 fhandle-bytes:8 fhandle-type:1 f_handle:7e9e0000640d1b6d
 | inotify wd:2 ino:a111 sdev:800013 mask:800afce ignored_mask:0 fhandle-bytes:8 fhandle-type:1 f_handle:11a1000020542153
 | inotify wd:1 ino:6b149 sdev:800013 mask:800afce ignored_mask:0 fhandle-bytes:8 fhandle-type:1 f_handle:49b1060023552153

If the kernel is compiled without exportfs support, the file handle
won't be provided, only the inode and device.

 | pos: 0
 | flags: 02000000
 | inotify wd:3 ino:9e7e sdev:800013 mask:800afce ignored_mask:0
 | inotify wd:2 ino:a111 sdev:800013 mask:800afce ignored_mask:0
 | inotify wd:1 ino:6b149 sdev:800013 mask:800afce ignored_mask:0

For fanotify the output looks like

 | pos: 0
 | flags: 02
 | fanotify ino:68f71 sdev:800013 mask:1 ignored_mask:40000000

 | pos: 0
 | flags: 02
 | fanotify mnt_id:13 mask:1 ignored_mask:40000000

To minimize the impact on general fsnotify code, the new functionality
is gathered in the fs/notify/fdinfo.c file.
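
For a reader of this interface, the inotify lines shown above can be picked apart with sscanf. This is a minimal userspace sketch: struct inotify_mark_info and parse_inotify_line are hypothetical names, and treating every field as hexadecimal is an assumption based on the sample output (the " | " prefix is quoting in this message, not part of the file).

```c
#include <stdio.h>

/* Fields of one "inotify ..." line; optional fhandle fields are ignored. */
struct inotify_mark_info {
    unsigned int wd;
    unsigned long ino;
    unsigned int sdev;
    unsigned int mask;
    unsigned int ignored_mask;
};

/*
 * Parse one line of /proc/pid/fdinfo/fd output for an inotify descriptor.
 * Returns 0 when all five fields were read, -1 otherwise (for example for
 * the "pos:" and "flags:" lines that precede the marks).
 */
static int parse_inotify_line(const char *line, struct inotify_mark_info *out)
{
    int n = sscanf(line,
                   "inotify wd:%x ino:%lx sdev:%x mask:%x ignored_mask:%x",
                   &out->wd, &out->ino, &out->sdev,
                   &out->mask, &out->ignored_mask);

    return n == 5 ? 0 : -1;
}
```

Because trailing text is ignored, the same parser accepts lines with and without the fhandle fields.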

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Andrey Vagin <avagin@openvz.org>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: James Bottomley <jbottomley@parallels.com>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Matthew Helsley <matt.helsley@gmail.com>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@onelan.co.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years ago  fs, exportfs: add exportfs_encode_inode_fh() helper
Cyrill Gorcunov [Thu, 29 Nov 2012 03:19:19 +0000 (14:19 +1100)]
fs, exportfs: add exportfs_encode_inode_fh() helper

We will need this helper in the next patch to provide a file handle for
inotify marks in /proc/pid/fdinfo output.

The patch provides a way to use inodes directly when a dentry is not
available (as is the case in the inotify subsystem).

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Andrey Vagin <avagin@openvz.org>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: James Bottomley <jbottomley@parallels.com>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Matthew Helsley <matt.helsley@gmail.com>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@onelan.co.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years ago  fs, exportfs: escape nil dereference if no s_export_op present
Cyrill Gorcunov [Thu, 29 Nov 2012 03:19:19 +0000 (14:19 +1100)]
fs, exportfs: escape nil dereference if no s_export_op present

This routine will be used to generate a file handle in fdinfo output for
the inotify subsystem, where the general export_encode_fh should be used if
no s_export_op is present.  Thus add a test for whether s_export_op is
present inside exportfs_encode_fh itself.

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Andrey Vagin <avagin@openvz.org>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: James Bottomley <jbottomley@parallels.com>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Matthew Helsley <matt.helsley@gmail.com>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@onelan.co.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years ago  fdinfo: show sigmask for signalfd fd
Cyrill Gorcunov [Thu, 29 Nov 2012 03:19:18 +0000 (14:19 +1100)]
fdinfo: show sigmask for signalfd fd

The sigmask is read in a lockless manner for the sake of
code simplicity; thus, if precise data is needed here,
the tasks which refer to the signalfd should be
stopped before the read.

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Andrey Vagin <avagin@openvz.org>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: James Bottomley <jbottomley@parallels.com>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Matthew Helsley <matt.helsley@gmail.com>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@onelan.co.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years ago  fs, epoll: drop enabled field from fdinfo output
Cyrill Gorcunov [Thu, 29 Nov 2012 03:19:18 +0000 (14:19 +1100)]
fs, epoll: drop enabled field from fdinfo output

Once EPOLL_CTL_DISABLE gets merged into mainline I'll bring the "enabled"
field back.  A plain check for rdllink is not enough here and should be
extended, so to avoid confusing readers drop it for a while.

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Pavel Emelyanov <xemul@parallels.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Andrey Vagin <avagin@openvz.org>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: James Bottomley <jbottomley@parallels.com>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Matthew Helsley <matt.helsley@gmail.com>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@onelan.co.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years ago  fs, epoll: add procfs fdinfo helper
Cyrill Gorcunov [Thu, 29 Nov 2012 03:19:18 +0000 (14:19 +1100)]
fs, epoll: add procfs fdinfo helper

This allows us to print out the eventpoll target file descriptor, events and
data.  The /proc/pid/fdinfo/fd output consists of

 | pos: 0
 | flags: 02
 | tfd:        5 events:       1d data: ffffffffffffffff enabled: 1
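
A minimal userspace sketch of reading the tfd line shown above. The names struct epoll_target_info and parse_epoll_line are hypothetical, and the field bases (decimal tfd, hexadecimal events and data) are an assumption based on the sample output. Any trailing field such as "enabled:" is ignored.

```c
#include <stdio.h>

/* One epoll target as printed on a "tfd:" line. */
struct epoll_target_info {
    int tfd;                 /* target file descriptor, read as decimal */
    unsigned int events;     /* event mask, read as hex */
    unsigned long long data; /* user data, read as hex */
};

/*
 * Parse one "tfd: ... events: ... data: ..." line.
 * Whitespace in the format matches any run of padding spaces.
 * Returns 0 on success, -1 if the line does not have this shape.
 */
static int parse_epoll_line(const char *line, struct epoll_target_info *out)
{
    int n = sscanf(line, "tfd: %d events: %x data: %llx",
                   &out->tfd, &out->events, &out->data);

    return n == 3 ? 0 : -1;
}
```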

[avagin@: fix for uninitialized ret variable]

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Andrey Vagin <avagin@openvz.org>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: James Bottomley <jbottomley@parallels.com>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Matthew Helsley <matt.helsley@gmail.com>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@onelan.co.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years ago  fs, eventfd: add procfs fdinfo helper
Cyrill Gorcunov [Thu, 29 Nov 2012 03:19:18 +0000 (14:19 +1100)]
fs, eventfd: add procfs fdinfo helper

This allows us to print out the raw counter value.  The /proc/pid/fdinfo/fd
output is

 | pos: 0
 | flags: 04002
 | eventfd-count:               5a
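
A minimal userspace sketch of reading the counter back from the line shown above. parse_eventfd_count is a hypothetical name, and reading the value as hexadecimal is an assumption based on the "5a" in the sample.

```c
#include <stdio.h>

/*
 * Parse an "eventfd-count: <hex>" line.
 * Returns 0 and stores the counter on success, -1 for any other line.
 */
static int parse_eventfd_count(const char *line, unsigned long long *count)
{
    return sscanf(line, "eventfd-count: %llx", count) == 1 ? 0 : -1;
}
```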

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Andrey Vagin <avagin@openvz.org>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: James Bottomley <jbottomley@parallels.com>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Matthew Helsley <matt.helsley@gmail.com>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@onelan.co.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years ago  procfs: add ability to plug in auxiliary fdinfo providers
Cyrill Gorcunov [Thu, 29 Nov 2012 03:19:17 +0000 (14:19 +1100)]
procfs: add ability to plug in auxiliary fdinfo providers

This patch brings the ability to print out auxiliary data associated
with a file in the procfs interface /proc/pid/fdinfo/fd.

In particular, further patches make eventfd, eventpoll, signalfd
and fsnotify print additional information complete enough
to restore these objects after checkpoint.

To simplify the code we add a show_fdinfo callback inside
struct file_operations (as Al and Pavel are proposing).
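
The idea of an optional per-file callback can be sketched in plain userspace C. Everything below is a mock: struct mock_file, struct mock_file_operations and the function names are hypothetical and do not reproduce the kernel's signatures. The sketch only shows the common part being printed first and the provider-specific part being appended when a callback is set.

```c
#include <stddef.h>
#include <stdio.h>

struct mock_file;

/* Mock operations table; show_fdinfo is optional and may be NULL. */
struct mock_file_operations {
    int (*show_fdinfo)(char *buf, size_t len, const struct mock_file *f);
};

struct mock_file {
    long long pos;
    unsigned int flags;
    const struct mock_file_operations *f_op;
    unsigned long long count; /* private data of the mock eventfd-like file */
};

/* Provider-specific part: prints the counter of the mock eventfd-like file. */
static int mock_eventfd_show_fdinfo(char *buf, size_t len,
                                    const struct mock_file *f)
{
    return snprintf(buf, len, "eventfd-count: %16llx\n", f->count);
}

static const struct mock_file_operations mock_eventfd_fops = {
    .show_fdinfo = mock_eventfd_show_fdinfo,
};

/*
 * Generic part: prints pos and flags, then lets the file's own callback
 * append auxiliary lines. Returns the number of characters written, or -1
 * if the buffer is too small.
 */
static int mock_fdinfo_show(char *buf, size_t len, const struct mock_file *f)
{
    int n = snprintf(buf, len, "pos:\t%lli\nflags:\t0%o\n", f->pos, f->flags);

    if (n < 0 || (size_t)n >= len)
        return -1;

    if (f->f_op && f->f_op->show_fdinfo) {
        int m = f->f_op->show_fdinfo(buf + n, len - (size_t)n, f);

        if (m < 0 || (size_t)m >= len - (size_t)n)
            return -1;
        n += m;
    }
    return n;
}
```

Files without the callback keep the plain two-line output, so existing readers are not affected by the extra provider.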

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Andrey Vagin <avagin@openvz.org>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: James Bottomley <jbottomley@parallels.com>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Matthew Helsley <matt.helsley@gmail.com>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@onelan.co.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years ago  tools/testing/selftests/kcmp/kcmp_test.c: print reason for failure in kcmp_test
Dave Jones [Thu, 29 Nov 2012 03:19:17 +0000 (14:19 +1100)]
tools/testing/selftests/kcmp/kcmp_test.c: print reason for failure in kcmp_test

I was curious why sys_kcmp wasn't working, which led me to the testcase.
It turned out I hadn't enabled CHECKPOINT_RESTORE in the kernel I was
testing.  Add a decoding of errno to the testcase to make that obvious.

Signed-off-by: Dave Jones <davej@redhat.com>
Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years ago  breakpoint selftests: print failure status instead of cause make error
Dave Young [Thu, 29 Nov 2012 03:19:17 +0000 (14:19 +1100)]
breakpoint selftests: print failure status instead of cause make error

If the breakpoint test exits with a non-zero value, it will cause a make
error.  A better way is to just print the test failure status.

Signed-off-by: Dave Young <dyoung@redhat.com>
Reviewed-by: Pekka Enberg <penberg@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years ago  kcmp selftests: print fail status instead of cause make error
Dave Young [Thu, 29 Nov 2012 03:19:16 +0000 (14:19 +1100)]
kcmp selftests: print fail status instead of cause make error

If kcmp_test exits with a non-zero value, it will cause a make error.
A better way is to just print the test failure status.

Signed-off-by: Dave Young <dyoung@redhat.com>
Reviewed-by: Pekka Enberg <penberg@kernel.org>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years ago  kcmp selftests: make run_tests fix
Dave Young [Thu, 29 Nov 2012 03:19:16 +0000 (14:19 +1100)]
kcmp selftests: make run_tests fix

make run_tests needs the target to be run_tests instead of run-tests.
Also, the gcc output should be kcmp_test.  Fix these two issues.

Signed-off-by: Dave Young <dyoung@redhat.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years ago  mem-hotplug selftests: print failure status instead of cause make error
Dave Young [Thu, 29 Nov 2012 03:19:16 +0000 (14:19 +1100)]
mem-hotplug selftests: print failure status instead of cause make error

  bash-4.1$ make -C memory-hotplug run_tests
  make: Entering directory `/home/dave/git/linux-2.6/tools/testing/selftests/memory-hotplug'
  ./on-off-test.sh
  make: execvp: ./on-off-test.sh: Permission denied
  make: *** [run_tests] Error 127
  make: Leaving directory `/home/dave/git/linux-2.6/tools/testing/selftests/memory-hotplug'

After applying the patch:
  bash-4.1$ make -C memory-hotplug run_tests
  make: Entering directory `/home/dave/git/linux-2.6/tools/testing/selftests/memory-hotplug'
  /bin/sh: ./on-off-test.sh: Permission denied
  memory-hotplug selftests: [FAIL]
  make: Leaving directory `/home/dave/git/linux-2.6/tools/testing/selftests/memory-hotplug'

Signed-off-by: Dave Young <dyoung@redhat.com>
Reviewed-by: Pekka Enberg <penberg@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years ago  cpu-hotplug selftests: print failure status instead of cause make error
Dave Young [Thu, 29 Nov 2012 03:19:15 +0000 (14:19 +1100)]
cpu-hotplug selftests: print failure status instead of cause make error

  bash-4.1$ make -C cpu-hotplug run_tests
  make: Entering directory `/home/dave/git/linux-2.6/tools/testing/selftests/cpu-hotplug'
  ./on-off-test.sh
  make: execvp: ./on-off-test.sh: Permission denied
  make: *** [run_tests] Error 127
  make: Leaving directory `/home/dave/git/linux-2.6/tools/testing/selftests/cpu-hotplug'

After applying the patch:
  bash-4.1$ make -C cpu-hotplug run_tests
  make: Entering directory `/home/dave/git/linux-2.6/tools/testing/selftests/cpu-hotplug'
  /bin/sh: ./on-off-test.sh: Permission denied
  cpu-hotplug selftests: [FAIL]
  make: Leaving directory `/home/dave/git/linux-2.6/tools/testing/selftests/cpu-hotplug'

Signed-off-by: Dave Young <dyoung@redhat.com>
Reviewed-by: Pekka Enberg <penberg@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years ago  mqueue selftests: print failure status instead of cause make error
Dave Young [Thu, 29 Nov 2012 03:19:15 +0000 (14:19 +1100)]
mqueue selftests: print failure status instead of cause make error

Original behavior:
  bash-4.1$ make -C mqueue run_tests
  make: Entering directory `/home/dave/git/linux-2.6/tools/testing/selftests/mqueue'
  ./mq_open_tests /test1
  Not running as root, but almost all tests require root in order to modify
  system settings.  Exiting.
  make: *** [run_tests] Error 1
  make: Leaving directory `/home/dave/git/linux-2.6/tools/testing/selftests/mqueue'

After applying the patch:
  bash-4.1$ make -C mqueue run_tests
  make: Entering directory `/home/dave/git/linux-2.6/tools/testing/selftests/mqueue'
  Not running as root, but almost all tests require root in order to modify
  system settings.  Exiting.
  mq_open_tests: [FAIL]
  Not running as root, but almost all tests require root in order to modify
  system settings.  Exiting.
  mq_perf_tests: [FAIL]
  make: Leaving directory `/home/dave/git/linux-2.6/tools/testing/selftests/mqueue'

Signed-off-by: Dave Young <dyoung@redhat.com>
Reviewed-by: Pekka Enberg <penberg@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years ago  vm selftests: print failure status instead of cause make error
Dave Young [Thu, 29 Nov 2012 03:19:15 +0000 (14:19 +1100)]
vm selftests: print failure status instead of cause make error

  bash-4.1$ make -C vm run_tests
  make: Entering directory `/home/dave/git/linux-2.6/tools/testing/selftests/vm'
  /bin/sh ./run_vmtests
  ./run_vmtests: line 24: /proc/sys/vm/nr_hugepages: Permission denied
  Please run this test as root
  make: *** [run_tests] Error 1
  make: Leaving directory `/home/dave/git/linux-2.6/tools/testing/selftests/vm'

After applying the patch:
  bash-4.1$ make -C vm run_tests
  make: Entering directory `/home/dave/git/linux-2.6/tools/testing/selftests/vm'
  ./run_vmtests: line 24: /proc/sys/vm/nr_hugepages: Permission denied
  Please run this test as root
  vmtests: [FAIL]
  make: Leaving directory `/home/dave/git/linux-2.6/tools/testing/selftests/vm'

Signed-off-by: Dave Young <dyoung@redhat.com>
Cc: Pekka Enberg <penberg@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years ago  Documentation/DMA-API-HOWTO.txt: fix typo
Andrew Morton [Thu, 29 Nov 2012 03:19:14 +0000 (14:19 +1100)]
Documentation/DMA-API-HOWTO.txt: fix typo

Noted by Jesper

Cc: Jesper Juhl <jj@chaosbits.net>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Shuah Khan <shuah.khan@hp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years ago  mtd: mtd_stresstest: use prandom_bytes()
Akinobu Mita [Thu, 29 Nov 2012 03:19:14 +0000 (14:19 +1100)]
mtd: mtd_stresstest: use prandom_bytes()

Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Cc: Artem Bityutskiy <dedekind1@gmail.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Laight <david.laight@aculab.com>
Cc: Eilon Greenstein <eilong@broadcom.com>
Cc: Michel Lespinasse <walken@google.com>
Cc: Robert Love <robert.w.love@intel.com>
Cc: Valdis Kletnieks <valdis.kletnieks@vt.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years ago  mtd: mtd_subpagetest: convert to use prandom library
Akinobu Mita [Thu, 29 Nov 2012 03:19:14 +0000 (14:19 +1100)]
mtd: mtd_subpagetest: convert to use prandom library

This removes the home-brewed pseudo-random number generator and uses the
prandom library.

Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Cc: Artem Bityutskiy <dedekind1@gmail.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Laight <david.laight@aculab.com>
Cc: Eilon Greenstein <eilong@broadcom.com>
Cc: Michel Lespinasse <walken@google.com>
Cc: Robert Love <robert.w.love@intel.com>
Cc: Valdis Kletnieks <valdis.kletnieks@vt.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years ago  mtd: mtd_speedtest: use prandom_bytes
Akinobu Mita [Thu, 29 Nov 2012 03:19:13 +0000 (14:19 +1100)]
mtd: mtd_speedtest: use prandom_bytes

Use prandom_bytes instead of the equivalent local function.

Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Cc: Artem Bityutskiy <dedekind1@gmail.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Laight <david.laight@aculab.com>
Cc: Eilon Greenstein <eilong@broadcom.com>
Cc: Michel Lespinasse <walken@google.com>
Cc: Robert Love <robert.w.love@intel.com>
Cc: Valdis Kletnieks <valdis.kletnieks@vt.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years ago  mtd: mtd_pagetest: convert to use prandom library
Akinobu Mita [Thu, 29 Nov 2012 03:19:13 +0000 (14:19 +1100)]
mtd: mtd_pagetest: convert to use prandom library

This removes the home-brewed pseudo-random number generator and uses the
prandom library.

Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Cc: Artem Bityutskiy <dedekind1@gmail.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Laight <david.laight@aculab.com>
Cc: Eilon Greenstein <eilong@broadcom.com>
Cc: Michel Lespinasse <walken@google.com>
Cc: Robert Love <robert.w.love@intel.com>
Cc: Valdis Kletnieks <valdis.kletnieks@vt.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years ago  mtd: mtd_oobtest: convert to use prandom library
Akinobu Mita [Thu, 29 Nov 2012 03:19:13 +0000 (14:19 +1100)]
mtd: mtd_oobtest: convert to use prandom library

This removes the home-brewed pseudo-random number generator and uses the
prandom library.

Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Cc: Artem Bityutskiy <dedekind1@gmail.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Laight <david.laight@aculab.com>
Cc: Eilon Greenstein <eilong@broadcom.com>
Cc: Michel Lespinasse <walken@google.com>
Cc: Robert Love <robert.w.love@intel.com>
Cc: Valdis Kletnieks <valdis.kletnieks@vt.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years ago  mtd: mtd_nandecctest: use prandom_bytes instead of get_random_bytes()
Akinobu Mita [Thu, 29 Nov 2012 03:19:13 +0000 (14:19 +1100)]
mtd: mtd_nandecctest: use prandom_bytes instead of get_random_bytes()

Using prandom_bytes() is enough, because this data is only used
for testing, not for cryptographic purposes.

Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Cc: Artem Bityutskiy <dedekind1@gmail.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Laight <david.laight@aculab.com>
Cc: Eilon Greenstein <eilong@broadcom.com>
Cc: Michel Lespinasse <walken@google.com>
Cc: Robert Love <robert.w.love@intel.com>
Cc: Valdis Kletnieks <valdis.kletnieks@vt.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years ago  ubifs: use prandom_bytes
Akinobu Mita [Thu, 29 Nov 2012 03:19:12 +0000 (14:19 +1100)]
ubifs: use prandom_bytes

This also converts the memory-filling loop to use memset.

Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Cc: Artem Bityutskiy <dedekind1@gmail.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: David Laight <david.laight@aculab.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Eilon Greenstein <eilong@broadcom.com>
Cc: Michel Lespinasse <walken@google.com>
Cc: Robert Love <robert.w.love@intel.com>
Cc: Valdis Kletnieks <valdis.kletnieks@vt.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years ago  mtd: nandsim: use prandom_bytes
Akinobu Mita [Thu, 29 Nov 2012 03:19:12 +0000 (14:19 +1100)]
mtd: nandsim: use prandom_bytes

This also removes an unnecessary memset call whose result is immediately
overwritten with random bytes.

Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Cc: Artem Bityutskiy <dedekind1@gmail.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Laight <david.laight@aculab.com>
Cc: Eilon Greenstein <eilong@broadcom.com>
Cc: Michel Lespinasse <walken@google.com>
Cc: Robert Love <robert.w.love@intel.com>
Cc: Valdis Kletnieks <valdis.kletnieks@vt.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years ago  bnx2x: use prandom_bytes()
Akinobu Mita [Thu, 29 Nov 2012 03:19:12 +0000 (14:19 +1100)]
bnx2x: use prandom_bytes()

Use prandom_bytes() to fill the RSS key with pseudo-random bytes.

Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Cc: Eilon Greenstein <eilong@broadcom.com>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Artem Bityutskiy <dedekind1@gmail.com>
Cc: David Laight <david.laight@aculab.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Michel Lespinasse <walken@google.com>
Cc: Robert Love <robert.w.love@intel.com>
Cc: Valdis Kletnieks <valdis.kletnieks@vt.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years ago  prandom: introduce prandom_bytes() and prandom_bytes_state()
Akinobu Mita [Thu, 29 Nov 2012 03:19:11 +0000 (14:19 +1100)]
prandom: introduce prandom_bytes() and prandom_bytes_state()

Add functions to get the requested number of pseudo-random bytes.

The difference from get_random_bytes() is that it generates pseudo-random
numbers using prandom_u32().  It doesn't consume the entropy pool, and the
sequence is reproducible if the same rnd_state is used.  So it is suitable
for generating random bytes for testing.
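
The reproducibility property described above can be illustrated with a minimal userspace sketch. It is not the kernel implementation: the generator is a plain xorshift32 swapped in for brevity, and struct sketch_rnd_state, sketch_seed_state, sketch_u32_state and sketch_bytes_state are hypothetical names. The sketch only shows a byte-filling helper built on top of a 32-bit generator with explicit state.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Explicit generator state, so a sequence can be replayed from a seed. */
struct sketch_rnd_state {
    uint32_t s;
};

static void sketch_seed_state(struct sketch_rnd_state *st, uint32_t seed)
{
    /* xorshift32 must never hold an all-zero state */
    st->s = seed ? seed : 1u;
}

/* One xorshift32 step (an assumption for this sketch, not the kernel's PRNG). */
static uint32_t sketch_u32_state(struct sketch_rnd_state *st)
{
    uint32_t x = st->s;

    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    st->s = x;
    return x;
}

/* Fill buf with len pseudo-random bytes, up to four bytes per generator step. */
static void sketch_bytes_state(struct sketch_rnd_state *st, void *buf, size_t len)
{
    unsigned char *p = buf;

    while (len > 0) {
        uint32_t r = sketch_u32_state(st);
        size_t n = len < sizeof(r) ? len : sizeof(r);

        memcpy(p, &r, n);
        p += n;
        len -= n;
    }
}
```

Seeding two states with the same value yields identical buffers, which is what makes this style of generator convenient for tests that need to replay a failing pattern.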

Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Artem Bityutskiy <dedekind1@gmail.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Eilon Greenstein <eilong@broadcom.com>
Cc: David Laight <david.laight@aculab.com>
Cc: Michel Lespinasse <walken@google.com>
Cc: Robert Love <robert.w.love@intel.com>
Cc: Valdis Kletnieks <valdis.kletnieks@vt.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years ago  random32: rename random32 to prandom
Akinobu Mita [Thu, 29 Nov 2012 03:19:11 +0000 (14:19 +1100)]
random32: rename random32 to prandom

This renames all random32 functions to have 'prandom_' prefix as follows:

  void prandom_seed(u32 seed);    /* rename from srandom32() */
  u32 prandom_u32(void);          /* rename from random32() */
  void prandom_seed_state(struct rnd_state *state, u64 seed);
                                  /* rename from prandom32_seed() */
  u32 prandom_u32_state(struct rnd_state *state);
                                  /* rename from prandom32() */

The purpose of this renaming is to prevent kernel developers from
assuming, because of the "p" in prandom32(), that only prandom32() uses
a pseudo-random number generator while random32() does not.  Such a
misunderstanding could result in a very embarrassing security exposure.
This concern was expressed by Theodore Ts'o.

Furthermore, I'm going to introduce new functions for getting the
requested number of pseudo-random bytes.  If I continued to use both
the prandom32 and random32 prefixes for these functions, the confusion
would get worse.

As a result of this renaming, "prandom_" is the common prefix for
the pseudo-random number library.

Currently, srandom32() and random32() are preserved because it is
difficult to rename too many users at once.

Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Robert Love <robert.w.love@intel.com>
Cc: Michel Lespinasse <walken@google.com>
Cc: Valdis Kletnieks <valdis.kletnieks@vt.edu>
Cc: David Laight <david.laight@aculab.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Artem Bityutskiy <dedekind1@gmail.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years ago  aoe: return real minor number for static minors
Ed Cashin [Thu, 29 Nov 2012 03:19:11 +0000 (14:19 +1100)]
aoe: return real minor number for static minors

The static minor device number allocator must return the real minor
number, so the value it computes must be multiplied by the supported
number of partitions per aoedev.

Without this fix the support for systems without udev is incomplete, and
the few users of aoe on such systems will have surprising results when
device node names do not match the AoE target.
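
The arithmetic can be sketched as follows.  The constants and helper names
are assumptions for illustration (16 minors per shelf, 16 partitions per
device), not values quoted from the driver.

```c
#include <assert.h>

enum {
	DEMO_NPERSHELF  = 16,	/* assumed AoE minors per shelf (major) */
	DEMO_PARTITIONS = 16,	/* assumed partitions per aoedev */
};

/* Slot index from a hypothetical static allocator: one slot per device. */
static unsigned long demo_static_slot(unsigned long aoemajor,
				      unsigned int aoeminor)
{
	assert(aoeminor < DEMO_NPERSHELF);
	return aoemajor * DEMO_NPERSHELF + aoeminor;
}

/* First system minor of the device: each device owns a block of minors. */
unsigned long demo_sysminor(unsigned long aoemajor, unsigned int aoeminor)
{
	return demo_static_slot(aoemajor, aoeminor) * DEMO_PARTITIONS;
}
```

Without the multiplication, neighbouring devices would be handed
overlapping minor ranges, since each device needs room for its partitions.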

Signed-off-by: Ed Cashin <ecashin@coraid.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years agoaoe: initialize sysminor to avoid compiler warning
Ed Cashin [Thu, 29 Nov 2012 03:19:10 +0000 (14:19 +1100)]
aoe: initialize sysminor to avoid compiler warning

Because the minor_get and related functions use the return values for
errors, the compiler doesn't know that sysminor will always either 1) be
initialized in aoedev_by_aoeaddr by the call to minor_get, or 2) be unused
as the "goto out" is executed.

This patch avoids the compiler warning.

Signed-off-by: Ed Cashin <ecashin@coraid.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years agoaoe: make error messages more specific in static minor allocation
Ed Cashin [Thu, 29 Nov 2012 03:19:10 +0000 (14:19 +1100)]
aoe: make error messages more specific in static minor allocation

For some special-purpose systems where udev isn't present, static
allocation of minor numbers is desirable.  This update distinguishes
different failure scenarios, to help the user understand what went wrong.

Signed-off-by: Ed Cashin <ecashin@coraid.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years agoaoe: remove call to request handler from I/O completion
Ed Cashin [Thu, 29 Nov 2012 03:19:10 +0000 (14:19 +1100)]
aoe: remove call to request handler from I/O completion

There is no need to call the request handler function in the I/O
completion routine.  The user impact of not doing it is a "nicer" aoe
driver that is less susceptible to causing soft lockups.

Signed-off-by: Ed Cashin <ecashin@coraid.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years agoaoe: cleanup: correct comment for aoetgt nout
Ed Cashin [Thu, 29 Nov 2012 03:19:09 +0000 (14:19 +1100)]
aoe: cleanup: correct comment for aoetgt nout

A misplaced comment was attached to the nout member of the aoetgt.  This
change corrects the comment.

Signed-off-by: Ed Cashin <ecashin@coraid.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years agoaoe: increase default cap on outstanding AoE commands in the network
Ed Cashin [Thu, 29 Nov 2012 03:19:09 +0000 (14:19 +1100)]
aoe: increase default cap on outstanding AoE commands in the network

The aoe driver will never be waiting for more than aoe_maxout AoE commands
from a given remote network port on an AoE target.  Increasing the cap
increases performance.  Users can tighten the setting to reduce the amount
of memory used for handling AoE traffic or the network bandwidth used for
AoE.

Signed-off-by: Ed Cashin <ecashin@coraid.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years agoaoe: remove vestigial request queue allocation
Ed Cashin [Thu, 29 Nov 2012 03:19:09 +0000 (14:19 +1100)]
aoe: remove vestigial request queue allocation

Before the aoe driver was an I/O request handler, it was a
make_request-style block driver.  Even so, there was a problem where sysfs
expected a request queue to exist, so one was provided in commit
7135a71b19be1fa ("aoe: allocate unused request_queue for sysfs").

During the transition to the request-handler style, a patch was merged
that was based on a driver without the noop queue, and the noop queue
remained in place after the patch was merged, even though a new functional
queue was introduced by the patch, allocated through blk_init_queue.

The user impact is a memory leak proportional to the number of AoE targets
discovered.  This patch removes the memory leak and cleans up vestiges of
the old do-nothing queue from the aoeblk_gdalloc function.

Signed-off-by: Ed Cashin <ecashin@coraid.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years agoaoe: copy fallback timing information on destination failover
Ed Cashin [Thu, 29 Nov 2012 03:19:08 +0000 (14:19 +1100)]
aoe: copy fallback timing information on destination failover

commit f3b8e07af7744cbb ("aoe: commands in retransmit queue use new
destination on failure") omits the copying of the coarse-grained time when
an AoE command was sent during the failover from one destination MAC
address on the AoE target to another.

The coarse-grained timing is only used when the system time changes or an
unlikely length of time has passed since the sending of the AoE command.
Users will not be impacted unless their system clock is very inaccurate or
something unusual (e.g., 10 GbE link reset) happens during the period when
the aoe driver is handling the failure of a port on the AoE target.  Being
affected will mean that an AoE target could be considered "down" too
eagerly.

Signed-off-by: Ed Cashin <ecashin@coraid.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years agoaoe: update driver-internal version to 64+
Ed Cashin [Thu, 29 Nov 2012 03:19:08 +0000 (14:19 +1100)]
aoe: update driver-internal version to 64+

Signed-off-by: Ed Cashin <ecashin@coraid.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years agoaoe: commands in retransmit queue use new destination on failure
Ed Cashin [Thu, 29 Nov 2012 03:19:08 +0000 (14:19 +1100)]
aoe: commands in retransmit queue use new destination on failure

When one remote MAC address isn't working as a destination for AoE
commands, the frames used to track information associated with the AoE
commands are moved to a new aoetgt (defined by the tuple of {AoE major,
AoE minor, target MAC address}).

This patch makes sure that the frames on the queue for retransmits that
need to be done are updated to use the new destination, so that
retransmits will be sent through a working network path.

Without this change, packets on the retransmit queue will be needlessly
retransmitted to the unresponsive destination MAC, possibly causing
premature target failure before there's time for the retransmit timer to
run again, decide to retransmit again, and finally update the destination
to a working MAC address on the AoE target.

Signed-off-by: Ed Cashin <ecashin@coraid.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years agoaoe: use high-resolution RTTs with fallback to low-res
Ed Cashin [Thu, 29 Nov 2012 03:19:07 +0000 (14:19 +1100)]
aoe: use high-resolution RTTs with fallback to low-res

These changes improve the accuracy of the decision about whether it's time
to retransmit an AoE command by using the microsecond-resolution
gettimeofday instead of jiffies.

Because the system time can jump suddenly, the decision reverts to using
jiffies if the high-resolution time difference is relatively large.
Otherwise the AoE targets could be considered failed inappropriately.
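
A userspace sketch of the fallback idea.  The cutoff value, the function
name and the parameter layout are hypothetical; the driver's own threshold
and helpers are not shown in this message.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_TRUST_LIMIT_US 4000000LL	/* hypothetical cutoff: 4 seconds */

/*
 * Elapsed time since a command was sent, in microseconds.  The
 * high-resolution difference is used unless it looks implausible
 * (negative or very large), in which case the coarse tick counter,
 * which does not follow wall-clock jumps, is used instead.
 */
int64_t demo_elapsed_us(int64_t sent_us, int64_t now_us,
			uint32_t sent_ticks, uint32_t now_ticks, uint32_t hz)
{
	int64_t delta = now_us - sent_us;

	assert(hz > 0 && hz <= 1000000);
	if (delta < 0 || delta > DEMO_TRUST_LIMIT_US) {
		/* Unsigned subtraction stays correct across counter wrap. */
		uint32_t ticks = now_ticks - sent_ticks;

		return (int64_t)ticks * (1000000 / hz);
	}
	return delta;
}
```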

Signed-off-by: Ed Cashin <ecashin@coraid.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years agoaoe: manipulate aoedev network stats under lock
Ed Cashin [Thu, 29 Nov 2012 03:19:07 +0000 (14:19 +1100)]
aoe: manipulate aoedev network stats under lock

With this bugfix in place the calculation of the criterion for "lateness"
is performed under lock.  Without the lock, there is a chance that one of
the non-atomic operations performed on the round trip time statistics
could be incomplete, such that an incorrect lateness criterion would be
calculated.

Without this change, the effect of the bug would be rare, unnecessary but
benign retransmissions.

Signed-off-by: Ed Cashin <ecashin@coraid.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years agoaoe: err device: include MAC addresses for unexpected responses
Ed Cashin [Thu, 29 Nov 2012 03:19:07 +0000 (14:19 +1100)]
aoe: err device: include MAC addresses for unexpected responses

The /dev/etherd/err character device provides low-level information about
normal but sometimes interesting AoE command retransmits and "unexpected
responses", i.e., responses for packets that have already been
retransmitted.

This change adds MAC addresses to the messages about unexpected responses,
so that when they occur, it's easier to determine the network paths to
which they belong.

Signed-off-by: Ed Cashin <ecashin@coraid.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years agoaoe: improve network congestion handling
Ed Cashin [Thu, 29 Nov 2012 03:19:07 +0000 (14:19 +1100)]
aoe: improve network congestion handling

The aoe driver already had some congestion handling, but it was limited in
its ability to cope with the kind of congestion that can arise on more
complex networks such as those involving paths through multiple ethernet
switches.

Some of the lessons from TCP's history of development can be applied to
improving the congestion control and avoidance on AoE storage networks.
These changes use familiar concepts from Van Jacobson's "Congestion
Avoidance and Control" paper from '88, without adding significant
overhead.
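
One of the familiar concepts is the scaled-integer estimator of mean round
trip time and mean deviation.  The sketch below is a generic rendering of
that estimator, not the aoe driver's code; the struct, the function names
and the choice to start the deviation at half the first sample are
assumptions.

```c
#include <assert.h>

struct demo_rtt {
	long sa;	/* smoothed RTT, scaled by 8 */
	long sv;	/* mean deviation, scaled by 4 */
};

void demo_rtt_init(struct demo_rtt *r, long first_us)
{
	assert(first_us >= 0);
	r->sa = first_us << 3;
	r->sv = first_us << 1;	/* deviation starts at first_us / 2 */
}

void demo_rtt_sample(struct demo_rtt *r, long m_us)
{
	long err = m_us - (r->sa >> 3);

	r->sa += err;		/* avg += err / 8 */
	if (err < 0)
		err = -err;
	err -= r->sv >> 2;
	r->sv += err;		/* mdev += (|err| - mdev) / 4 */
}

/* Retransmit timeout: average plus four mean deviations. */
long demo_rtt_timeout(const struct demo_rtt *r)
{
	return (r->sa >> 3) + r->sv;
}
```

Only shifts and additions are needed per sample, which is why this kind of
estimator adds little overhead.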

This patch depends on an upcoming patch that covers the failover case when
AoE commands being retransmitted are transferred from one retransmit queue
to another.  Another upcoming patch increases the timing accuracy.

Signed-off-by: Ed Cashin <ecashin@coraid.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years agoaoe: provide ATA identify device content to user on request
Ed Cashin [Thu, 29 Nov 2012 03:19:06 +0000 (14:19 +1100)]
aoe: provide ATA identify device content to user on request

Make the aoe driver follow expected behavior when the user uses ioctl to
get the ATA device identify information, allowing access to model, serial
number, etc.

Signed-off-by: Ed Cashin <ecashin@coraid.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years agoaoe: update driver-internal version number to 60
Ed Cashin [Thu, 29 Nov 2012 03:19:06 +0000 (14:19 +1100)]
aoe: update driver-internal version number to 60

Signed-off-by: Ed Cashin <ecashin@coraid.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years agoaoe: whitespace cleanup
Ed Cashin [Thu, 29 Nov 2012 03:19:06 +0000 (14:19 +1100)]
aoe: whitespace cleanup

Signed-off-by: Ed Cashin <ecashin@coraid.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years agoaoe: cleanup: remove unused ata_scnt function
Ed Cashin [Thu, 29 Nov 2012 03:19:05 +0000 (14:19 +1100)]
aoe: cleanup: remove unused ata_scnt function

Signed-off-by: Ed Cashin <ecashin@coraid.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years agoaoe: "payload" sysfs file exports per-AoE-command data transfer size
Ed Cashin [Thu, 29 Nov 2012 03:19:05 +0000 (14:19 +1100)]
aoe: "payload" sysfs file exports per-AoE-command data transfer size

The userland aoetools package includes an "aoe-stat" command that can
display a "payload size" column when the aoe driver exports this
information.  Users can quickly see what amount of user data is
transferred inside each AoE command on the network, network headers
excluded.

Signed-off-by: Ed Cashin <ecashin@coraid.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years agoaoe: support larger I/O requests via aoe_maxsectors module param
Ed Cashin [Thu, 29 Nov 2012 03:19:05 +0000 (14:19 +1100)]
aoe: support larger I/O requests via aoe_maxsectors module param

The GPFS filesystem is an example of an aoe user that requires the aoe
driver to support I/O request sizes larger than the default.  Most users
will not need large I/O request sizes, because they would need to be split
up into multiple AoE commands anyway.

Signed-off-by: Ed Cashin <ecashin@coraid.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years agoaoe: support the forgetting (flushing) of a user-specified AoE target
Ed Cashin [Thu, 29 Nov 2012 03:19:04 +0000 (14:19 +1100)]
aoe: support the forgetting (flushing) of a user-specified AoE target

Users sometimes want to cause the aoe driver to forget a particular
previously discovered device when it is no longer online.  The aoetools
provide an "aoe-flush" command that users run to perform this
administrative task.  The changes below provide the support needed in the
driver.

Signed-off-by: Ed Cashin <ecashin@coraid.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years agoaoe: update cap on outstanding commands based on config query response
Ed Cashin [Thu, 29 Nov 2012 03:19:04 +0000 (14:19 +1100)]
aoe: update cap on outstanding commands based on config query response

The ATA over Ethernet config query response contains a "buffer count"
field reflecting the AoE target's capacity to buffer incoming AoE
commands.

By taking the current value of this field into account, we increase
throughput or avoid network congestion, when the value has increased or
decreased, respectively.
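
A small sketch of the idea, with hypothetical names: the per-target cap
follows the advertised buffer count but never exceeds a driver-wide limit
(the "cap" parameter stands in for such a limit).

```c
#include <assert.h>

struct demo_target {
	unsigned int maxout;	/* current cap on outstanding commands */
};

/* Apply the buffer count from a config query response to the target. */
void demo_apply_bufcnt(struct demo_target *t, unsigned int bufcnt,
		       unsigned int cap)
{
	assert(t);
	t->maxout = bufcnt < cap ? bufcnt : cap;
}
```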

Signed-off-by: Ed Cashin <ecashin@coraid.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years agoaoe: avoid using skb member after dev_queue_xmit
Ed Cashin [Thu, 29 Nov 2012 03:19:04 +0000 (14:19 +1100)]
aoe: avoid using skb member after dev_queue_xmit

After calling dev_queue_xmit it is no longer safe to access the
members of the skb.
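
The pattern can be illustrated in userspace with a mock.  The struct and
both functions below are hypothetical stand-ins: the mock transmit call
frees the packet to model the loss of ownership, and the caller saves what
it needs beforehand.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct demo_pkt {
	char ifname[16];	/* interface the packet is sent on */
};

/* Hypothetical transmit call: consumes the packet and reports a drop. */
static int demo_xmit(struct demo_pkt *pkt)
{
	free(pkt);
	return 1;	/* nonzero: the packet was dropped */
}

int demo_send(struct demo_pkt *pkt, char *msg, size_t msglen)
{
	char ifname[sizeof(pkt->ifname)];

	assert(pkt && msg);
	/* Save what the warning needs before giving the packet away. */
	memcpy(ifname, pkt->ifname, sizeof(ifname));
	ifname[sizeof(ifname) - 1] = '\0';

	if (demo_xmit(pkt)) {
		/* pkt is gone here; only the saved copy may be used. */
		snprintf(msg, msglen, "packet could not be sent on %s", ifname);
		return 1;
	}
	return 0;
}
```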

Signed-off-by: Ed Cashin <ecashin@coraid.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years agoaoe-print-warning-regarding-a-common-reason-for-dropped-transmits-v2
Ed Cashin [Thu, 29 Nov 2012 03:19:03 +0000 (14:19 +1100)]
aoe-print-warning-regarding-a-common-reason-for-dropped-transmits-v2

Dropped transmits are not common, but when they do occur, increasing
the transmit queue length often helps.

Signed-off-by: Ed Cashin <ecashin@coraid.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years agoaoe: print warning regarding a common reason for dropped transmits
Ed Cashin [Thu, 29 Nov 2012 03:19:03 +0000 (14:19 +1100)]
aoe: print warning regarding a common reason for dropped transmits

Dropped transmits are not common, but when they do occur, increasing
the transmit queue length often helps.

Signed-off-by: Ed Cashin <ecashin@coraid.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years agoaoe: describe the behavior of the "err" character device
Ed Cashin [Thu, 29 Nov 2012 03:19:03 +0000 (14:19 +1100)]
aoe: describe the behavior of the "err" character device

Signed-off-by: Ed Cashin <ecashin@coraid.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years agoDocumentation/sparse.txt: document context annotations for lock checking
Ed Cashin [Thu, 29 Nov 2012 03:19:03 +0000 (14:19 +1100)]
Documentation/sparse.txt: document context annotations for lock checking

The context feature of sparse is used with the Linux kernel sources to
check for imbalanced uses of locks.  Document the annotations defined in
include/linux/compiler.h that tell sparse what to expect when a lock is
held on function entry, exit, or both.

Signed-off-by: Ed Cashin <ecashin@coraid.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Acked-by: Christopher Li <sparse@chrisli.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years agolinux/compiler.h: add __must_hold macro for functions called with a lock held
Josh Triplett [Thu, 29 Nov 2012 03:19:02 +0000 (14:19 +1100)]
linux/compiler.h: add __must_hold macro for functions called with a lock held

linux/compiler.h has macros to denote functions that acquire or release
locks, but not to denote functions called with a lock held that return
with the lock still held.  Add a __must_hold macro to cover that case.
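
A sketch of how the three kinds of annotation sit side by side.  The
macros expand to nothing outside of sparse, so this compiles as ordinary
C; the integer "lock" and the demo_ functions are stand-ins, and the
attribute spellings under __CHECKER__ are given as an assumption about how
such macros are typically defined.

```c
#include <assert.h>

#ifdef __CHECKER__
# define __acquires(x)	__attribute__((context(x,0,1)))
# define __releases(x)	__attribute__((context(x,1,0)))
# define __must_hold(x)	__attribute__((context(x,1,1)))
#else
# define __acquires(x)
# define __releases(x)
# define __must_hold(x)
#endif

struct demo_counter {
	int locked;	/* stand-in for a real lock */
	long value;
};

void demo_lock(struct demo_counter *c) __acquires(&c->locked)
{
	assert(!c->locked);
	c->locked = 1;
}

void demo_unlock(struct demo_counter *c) __releases(&c->locked)
{
	assert(c->locked);
	c->locked = 0;
}

/* Called with the lock held; returns with the lock still held. */
void demo_bump_locked(struct demo_counter *c, long n) __must_hold(&c->locked)
{
	assert(c->locked);
	c->value += n;
}

void demo_add(struct demo_counter *c, long n)
{
	demo_lock(c);
	demo_bump_locked(c, n);
	demo_unlock(c);
}
```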

Signed-off-by: Josh Triplett <josh@joshtriplett.org>
Reported-by: Ed Cashin <ecashin@coraid.com>
Tested-by: Ed Cashin <ecashin@coraid.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years agopidns: remove unused is_container_init()
Gao feng [Thu, 29 Nov 2012 03:19:02 +0000 (14:19 +1100)]
pidns: remove unused is_container_init()

Since commit 1cdcbec1a3 ("CRED: Neuter sys_capset()") is_container_init()
has no callers.

Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Cc: David Howells <dhowells@redhat.com>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Cc: James Morris <jmorris@namei.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years agoipc/sem.c: alternatives to preempt_disable()
Manfred Spraul [Thu, 29 Nov 2012 03:19:02 +0000 (14:19 +1100)]
ipc/sem.c: alternatives to preempt_disable()

ipc/sem.c uses a custom wakeup scheme that relies on preempt_disable().
On -RT, this causes increased latencies and debug warnings.

The patch adds two additional schemes:
- one built around a completion - could be better for -RT kernels
- one built around a spinlock - unfortunately it's broken
- and the current one

My preferred solution would be the spinlock implementation: RT would use
preemptible spinlocks, mainline normal spinlocks.  Thus both get the
optimal implementation without any special code in ipc/sem.c.
Unfortunately, I don't see how it could be fixed.

Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Mike Galbraith <efault@gmx.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years agoDocumentation/sysctl/kernel.txt: document /proc/sys/shmall
Carlos Alberto Lopez Perez [Thu, 29 Nov 2012 03:19:01 +0000 (14:19 +1100)]
Documentation/sysctl/kernel.txt: document /proc/sys/shmall

Signed-off-by: Carlos Alberto Lopez Perez <clopez@igalia.com>
Cc: Rob Landley <rob@landley.net>
Cc: Larry Finger <Larry.Finger@lwfinger.net>
Cc: Neil Horman <nhorman@tuxdriver.com>
Cc: Mitsuo Hayasaka <mitsuo.hayasaka.hu@hitachi.com>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years agoipc: add more comments to message copying related code
Stanislav Kinsbursky [Thu, 29 Nov 2012 03:19:01 +0000 (14:19 +1100)]
ipc: add more comments to message copying related code

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: James Morris <jmorris@namei.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years agoipc: simplify message copying
Stanislav Kinsbursky [Thu, 29 Nov 2012 03:19:01 +0000 (14:19 +1100)]
ipc: simplify message copying

Remove the redundant and confusing fill_copy().  Also add a check of
copy_msg() for error.  In this case the function has to return instead of
breaking out, because further code interprets any error as EAGAIN.

Also define copy_msg() for the case when CONFIG_CHECKPOINT_RESTORE is
disabled.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: James Morris <jmorris@namei.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years agoipc-convert-prepare_copy-from-macro-to-function-fix
Andrew Morton [Thu, 29 Nov 2012 03:19:00 +0000 (14:19 +1100)]
ipc-convert-prepare_copy-from-macro-to-function-fix

remove __maybe_unused

Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: James Morris <jmorris@namei.org>
Cc: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years agoipc: convert prepare_copy() from macro to function
Stanislav Kinsbursky [Thu, 29 Nov 2012 03:19:00 +0000 (14:19 +1100)]
ipc: convert prepare_copy() from macro to function

This code works if CONFIG_CHECKPOINT_RESTORE is disabled.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: James Morris <jmorris@namei.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years agoipc: simplify free_copy() call
Stanislav Kinsbursky [Thu, 29 Nov 2012 03:19:00 +0000 (14:19 +1100)]
ipc: simplify free_copy() call

Passing and checking of msgflg to free_copy() is redundant.  This patch
sets copy to NULL on declaration instead and checks for non-NULL in
free_copy().

Note: in case of copy allocation failure, error is returned immediately.
So no need to check for IS_ERR() in free_copy().

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: James Morris <jmorris@namei.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years agotest: IPC message queue copy feature test update
Stanislav Kinsbursky [Thu, 29 Nov 2012 03:18:59 +0000 (14:18 +1100)]
test: IPC message queue copy feature test update

This update fixes coding style problems (lines over 80 characters and
others).  Also, it fixes the test to work with the new IPC sysctls (instead
of the experimental API logic, which was thrown away and replaced by
sysctls).

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Cc: Serge Hallyn <serge.hallyn@canonical.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Pavel Emelyanov <xemul@parallels.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years agoselftests: IPC message queue copy feature test
Stanislav Kinsbursky [Thu, 29 Nov 2012 03:18:59 +0000 (14:18 +1100)]
selftests: IPC message queue copy feature test

This test can be used to check whether the kernel supports the IPC message
queue copy and restore features (required by the CRIU project).

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Cc: Serge Hallyn <serge.hallyn@canonical.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Pavel Emelyanov <xemul@parallels.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years agoipc: cleanup do_msgrcv() around MSG_COPY feature
Stanislav Kinsbursky [Thu, 29 Nov 2012 03:18:59 +0000 (14:18 +1100)]
ipc: cleanup do_msgrcv() around MSG_COPY feature

The MSG_COPY feature was developed for the Checkpoint/Restore In Userspace
project and is thus wrapped in the CONFIG_CHECKPOINT_RESTORE macro.  But the
code looks a bit ugly, so this patch is an attempt to clean up do_msgrcv()
a bit and make it look better.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Cc: Serge Hallyn <serge.hallyn@canonical.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Pavel Emelyanov <xemul@parallels.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years agoipc: remove redundant MSG_COPY check
Stanislav Kinsbursky [Thu, 29 Nov 2012 03:18:58 +0000 (14:18 +1100)]
ipc: remove redundant MSG_COPY check

Small cleanup patch.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Cc: Serge Hallyn <serge.hallyn@canonical.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Pavel Emelyanov <xemul@parallels.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years agoipc: introduce message queue copy feature
Stanislav Kinsbursky [Thu, 29 Nov 2012 03:18:58 +0000 (14:18 +1100)]
ipc: introduce message queue copy feature

This patch is required for checkpoint/restore in userspace.

c/r requires some way to get all pending IPC messages without deleting
them from the queue (a checkpoint can fail, in which case the tasks will
be resumed, so the queue has to remain valid).

To achieve this, a new operation flag, MSG_COPY, is introduced for the
sys_msgrcv() system call.  If this flag is specified, mtype is
interpreted as the number of the message to copy.

If MSG_COPY is set, the kernel allocates a dummy message of the passed
size and then uses the new copy_msg() helper function to copy the desired
message (instead of unlinking it from the queue).

Notes:

1) Return -ENOSYS if MSG_COPY is specified, but
   CONFIG_CHECKPOINT_RESTORE is not set.
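
From user space, peeking at a queued message might look like the sketch
below.  The struct and function names are hypothetical, the fallback
definition of MSG_COPY is an assumption for C libraries that do not expose
it, and combining the flag with IPC_NOWAIT is a conservative choice rather
than something this message requires.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/msg.h>

#ifndef MSG_COPY
#define MSG_COPY 040000	/* assumed value when the C library omits it */
#endif

struct demo_msg {
	long mtype;
	char mtext[64];
};

/*
 * Copy the message at position "index" in the queue without removing it.
 * Returns the number of mtext bytes copied, or -1 with errno set (for
 * example ENOSYS when the kernel lacks CONFIG_CHECKPOINT_RESTORE).
 */
ssize_t demo_peek(int qid, long index, struct demo_msg *out)
{
	assert(out);
	return msgrcv(qid, out, sizeof(out->mtext), index,
		      MSG_COPY | IPC_NOWAIT);
}
```

A checkpointing tool could call demo_peek() with index 0, 1, 2, ... until
it fails, leaving the queue intact for the tasks if the checkpoint is
abandoned.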

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Cc: Serge Hallyn <serge.hallyn@canonical.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Pavel Emelyanov <xemul@parallels.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years agoipc-message-queue-receive-cleanup-checkpatch-fixes
Andrew Morton [Thu, 29 Nov 2012 03:18:58 +0000 (14:18 +1100)]
ipc-message-queue-receive-cleanup-checkpatch-fixes

WARNING: line over 80 characters
#33: FILE: include/linux/msg.h:39:
+       long (*msg_fill)(void __user *, struct msg_msg *, size_t ));

ERROR: space prohibited before that close parenthesis ')'
#33: FILE: include/linux/msg.h:39:
+       long (*msg_fill)(void __user *, struct msg_msg *, size_t ));

WARNING: line over 80 characters
#94: FILE: ipc/compat.c:368:
+ return do_msgrcv(first, uptr, second, msgtyp, third, compat_do_msg_fill);

ERROR: space prohibited before that close parenthesis ')'
#142: FILE: ipc/msg.c:774:
+        long (*msg_handler)(void __user *, struct msg_msg *, size_t ))

total: 2 errors, 2 warnings, 165 lines checked

./patches/ipc-message-queue-receive-cleanup.patch has style problems, please review.

If any of these errors are false positives, please report
them to the maintainer, see CHECKPATCH in MAINTAINERS.

Please run checkpatch prior to sending patches

Cc: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years agoipc: message queue receive cleanup
Stanislav Kinsbursky [Thu, 29 Nov 2012 03:18:58 +0000 (14:18 +1100)]
ipc: message queue receive cleanup

Move all message related manipulation into one function msg_fill().
Actually, two functions because of the compat one.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Cc: Serge Hallyn <serge.hallyn@canonical.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Pavel Emelyanov <xemul@parallels.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years agoDocumentation: update sysctl/kernel.txt
Stanislav Kinsbursky [Thu, 29 Nov 2012 03:18:57 +0000 (14:18 +1100)]
Documentation: update sysctl/kernel.txt

Add documentation about new "msg_next_id", "sem_next_id" and "shm_next_id"
sysctls.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Cc: Serge Hallyn <serge.hallyn@canonical.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Pavel Emelyanov <xemul@parallels.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years agoipc: wrap new sysctls for CRIU inside CONFIG_CHECKPOINT_RESTORE
Stanislav Kinsbursky [Thu, 29 Nov 2012 03:18:57 +0000 (14:18 +1100)]
ipc: wrap new sysctls for CRIU inside CONFIG_CHECKPOINT_RESTORE

Wrap "msg_next_id", "sem_next_id" and "shm_next_id" inside
CONFIG_CHECKPOINT_RESTORE macro.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Cc: Serge Hallyn <serge.hallyn@canonical.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Pavel Emelyanov <xemul@parallels.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years agoipc-add-sysctl-to-specify-desired-next-object-id-checkpatch-fixes
Andrew Morton [Thu, 29 Nov 2012 03:18:57 +0000 (14:18 +1100)]
ipc-add-sysctl-to-specify-desired-next-object-id-checkpatch-fixes

ERROR: space required before the open parenthesis '('
#123: FILE: ipc/util.c:285:
+ if(ids->seq > ids->seq_max)

total: 1 errors, 0 warnings, 94 lines checked

./patches/ipc-add-sysctl-to-specify-desired-next-object-id.patch has style problems, please review.

If any of these errors are false positives, please report
them to the maintainer, see CHECKPATCH in MAINTAINERS.

Please run checkpatch prior to sending patches

Cc: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years agoipc: add sysctl to specify desired next object id
Stanislav Kinsbursky [Thu, 29 Nov 2012 03:18:56 +0000 (14:18 +1100)]
ipc: add sysctl to specify desired next object id

Add 3 new variables and sysctls to tune them (one "next_id" variable each
for messages, semaphores and shared memory).  This variable can be used to
set the desired id for the next allocated IPC object.  By default it is
equal to -1 and the old behaviour is preserved.  If this variable is
non-negative, then the desired id will be extracted from it and used as a
start value to search for a free IDR slot.

Notes:

1) This patch doesn't guarantee that the new object will have the desired
   id.  So it's up to user space how to handle a new object with the wrong
   id.

2) After a successful id allocation attempt, "next_id" will be set back
   to -1 (if it was non-negative).
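
The behaviour described above, including both notes, can be modelled with
a toy allocator.  Everything here is hypothetical (a fixed array instead
of an IDR, no sequence numbers); it only mirrors the search-from-hint and
reset-to--1 rules.

```c
#include <assert.h>

#define DEMO_SLOTS 16

struct demo_ids {
	int used[DEMO_SLOTS];
	int next_id;	/* -1: no preference */
};

/* Returns the allocated id, or -1 if no free slot was found. */
int demo_alloc_id(struct demo_ids *ids)
{
	int start, i;

	assert(ids);
	start = ids->next_id >= 0 ? ids->next_id : 0;
	for (i = start; i < DEMO_SLOTS; i++) {
		if (!ids->used[i]) {
			ids->used[i] = 1;
			/* The hint is consumed by a successful allocation. */
			if (ids->next_id >= 0)
				ids->next_id = -1;
			return i;
		}
	}
	return -1;
}
```

If the hinted slot is already taken, the caller gets the next free one
instead, which is the "wrong id" case that user space has to handle.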

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Cc: Serge Hallyn <serge.hallyn@canonical.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Pavel Emelyanov <xemul@parallels.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years agoipc: remove forced assignment of selected message
Stanislav Kinsbursky [Thu, 29 Nov 2012 03:18:56 +0000 (14:18 +1100)]
ipc: remove forced assignment of selected message

This is a cleanup patch. The assignment is redundant.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Cc: Serge Hallyn <serge.hallyn@canonical.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Pavel Emelyanov <xemul@parallels.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years agoexec: use -ELOOP for max recursion depth
Kees Cook [Thu, 29 Nov 2012 03:18:56 +0000 (14:18 +1100)]
exec: use -ELOOP for max recursion depth

To avoid an explosion of request_module calls on a chain of abusive
scripts, fail maximum recursion with -ELOOP instead of -ENOEXEC. As soon
as maximum recursion depth is hit, the error will fail all the way back
up the chain, aborting immediately.

This also has the side-effect of stopping the user's shell from attempting
to reexecute the top-level file as a shell script. As seen in the
dash source:

        if (cmd != path_bshell && errno == ENOEXEC) {
                *argv-- = cmd;
                *argv = cmd = path_bshell;
                goto repeat;
        }

The above logic was designed for running scripts automatically that lacked
the "#!" header, not to re-try failed recursion. On a legitimate -ENOEXEC,
things continue to behave as the shell expects.

Additionally, when tracking recursion, the binfmt handlers should not be
involved. The recursion being tracked is the depth of calls through
search_binary_handler(), so that function should be exclusively responsible
for tracking the depth.
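
A toy model of the error flow.  The depth limit and both function names
are hypothetical; the point is only that -ELOOP, unlike -ENOEXEC, does not
match the shell check quoted above.

```c
#include <assert.h>
#include <errno.h>

#define DEMO_MAX_DEPTH 5	/* hypothetical limit for this sketch */

/* Model a chain of "hops" interpreters ending in a real binary. */
int demo_search_handler(int hops, int depth)
{
	assert(hops >= 0);
	if (depth > DEMO_MAX_DEPTH)
		return -ELOOP;	/* returned unchanged by every caller */
	if (hops == 0)
		return 0;	/* reached something directly executable */
	return demo_search_handler(hops - 1, depth + 1);
}

/* The shell retries the file as a script only on ENOEXEC. */
int demo_shell_retries(int err)
{
	return err == -ENOEXEC;
}
```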

Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: halfdog <me@halfdog.net>
Cc: P J P <ppandit@redhat.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years agoexec: do not leave bprm->interp on stack
Kees Cook [Thu, 29 Nov 2012 03:18:55 +0000 (14:18 +1100)]
exec: do not leave bprm->interp on stack

If a series of scripts are executed, each triggering module loading via
unprintable bytes in the script header, kernel stack contents can leak
into the command line.

Normally execution of binfmt_script and binfmt_misc happens recursively.
However, when modules are enabled, and unprintable bytes exist in the
bprm->buf, execution will restart after attempting to load matching binfmt
modules.  Unfortunately, the logic in binfmt_script and binfmt_misc does
not expect to get restarted.  They leave bprm->interp pointing to their
local stack.  This means on restart bprm->interp is left pointing into
unused stack memory which can then be copied into the userspace argv
areas.

After additional study, it seems that both recursion and restart remain
the desirable way to handle exec with scripts, misc, and modules.  As
such, we need to protect the changes to interp.

This changes the logic to require allocation for any changes to the
bprm->interp.  To avoid adding a new kmalloc to every exec, the default
value is left as-is.  Only when passing through binfmt_script or
binfmt_misc does an allocation take place.
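
A minimal userspace sketch of that rule, assuming a simplified struct and a
hypothetical helper name (change_interp); the kernel code uses its own
types and allocator.  The default interp aliases filename and is never
freed; any later change goes through a heap copy.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Simplified stand-in for the kernel structure (illustration only). */
struct fake_bprm {
	char *filename;	/* default value of interp, not allocated here */
	char *interp;	/* either == filename or a heap copy */
};

/*
 * Hypothetical helper: replace interp with a private heap copy so it can
 * never point into a caller's stack frame.  Returns 0 or -ENOMEM.
 */
static int change_interp(struct fake_bprm *bprm, const char *interp)
{
	size_t len = strlen(interp) + 1;
	char *copy = malloc(len);

	if (!copy)
		return -ENOMEM;
	memcpy(copy, interp, len);

	/* Free a previous copy, but never the default (filename). */
	if (bprm->interp != bprm->filename)
		free(bprm->interp);
	bprm->interp = copy;
	return 0;
}
```

The caller's local buffer can go out of scope or be overwritten afterwards
without affecting bprm->interp.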

For a proof of concept, see DoTest.sh from:
http://www.halfdog.net/Security/2012/LinuxKernelBinfmtScriptStackDataDisclosure/

Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: halfdog <me@halfdog.net>
Cc: P J P <ppandit@redhat.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years ago fork: unshare: remove dead code
Alan Cox [Thu, 29 Nov 2012 03:18:55 +0000 (14:18 +1100)]
fork: unshare: remove dead code

If new_nsproxy is set, we will always call switch_task_namespaces() and
then set new_nsproxy back to NULL, so the reassignment and fall-through
check are redundant.

Signed-off-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years ago proc: pid/status: show all supplementary groups
Artem Bityutskiy [Thu, 29 Nov 2012 03:18:55 +0000 (14:18 +1100)]
proc: pid/status: show all supplementary groups

We display a list of supplementary groups for each process in
/proc/<pid>/status.  However, we show only the first 32 groups, not all of
them.

Although this is rare, processes sometimes do have more than 32
supplementary groups, and this kernel limitation breaks user-space apps
that rely on the group list in /proc/<pid>/status.

Number 32 comes from the internal NGROUPS_SMALL macro which defines the
length for the internal kernel "small" groups buffer.  There is no
apparent reason to limit to this value.

This patch removes the 32 groups printing limit.

The Linux kernel limits the number of supplementary groups to NGROUPS_MAX,
which is currently set to 65536.  This is therefore the maximum number of
groups we may possibly print.
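
The effect can be sketched in userspace as a loop bounded only by the real
group count.  This is an illustration with a made-up function name
(format_groups), not the fs/proc code.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Write every group ID into buf, separated by spaces.  The loop runs up to
 * ngroups, with no cap at 32.  Returns the number of IDs written, which is
 * smaller than ngroups only if buf is too small.
 */
static int format_groups(const unsigned *gids, int ngroups,
			 char *buf, size_t size)
{
	size_t used = 0;
	int g;

	if (size > 0)
		buf[0] = '\0';

	for (g = 0; g < ngroups; g++) {
		int n = snprintf(buf + used, size - used, "%u ", gids[g]);

		/* Stop if the next ID did not fit completely. */
		if (n < 0 || (size_t)n >= size - used)
			break;
		used += (size_t)n;
	}
	return g;
}
```

A reader of /proc/<pid>/status should likewise not assume that the Groups
line holds at most 32 entries.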

Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
Acked-by: Kees Cook <keescook@chromium.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years ago /proc/pid/status: add "Seccomp" field
Kees Cook [Thu, 29 Nov 2012 03:18:54 +0000 (14:18 +1100)]
/proc/pid/status: add "Seccomp" field

It is currently impossible to examine the state of seccomp for a given
process.  While attaching with gdb and attempting "call
prctl(PR_GET_SECCOMP,...)" will work in some situations, it is not
reliable.  If the process is in seccomp mode 1, this query will kill the
process (prctl not allowed).  If the process is in mode 2 with prctl not
allowed, it will similarly be killed.  And in weird cases, if prctl is
filtered to return errno 0, it can look like seccomp is disabled.

When reviewing the state of running processes, there should be a way to
externally examine the seccomp mode.  ("Did this build of Chrome end up
using seccomp?" "Did my distro ship ssh with seccomp enabled?")

This adds the "Seccomp" line to /proc/$pid/status.
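
A small sketch of how an external tool might read the new line.  The
function name is hypothetical, and the sample text in the usage note is
made up; the value is the seccomp mode (0 meaning disabled, 1 and 2 being
the modes discussed above).

```c
#include <stdio.h>
#include <string.h>

/*
 * Scan the text of a /proc/<pid>/status file line by line and return the
 * number on the "Seccomp:" line, or -1 if there is no such line (for
 * example on a kernel without this patch).
 */
static int seccomp_mode_from_status(const char *status)
{
	const char *line = status;

	while (line && *line) {
		int mode;

		/* The space in the format also matches the tab separator. */
		if (sscanf(line, "Seccomp: %d", &mode) == 1)
			return mode;

		/* Advance to the start of the next line, if any. */
		line = strchr(line, '\n');
		if (line)
			line++;
	}
	return -1;
}
```

For the text "Name:\tcat\nSeccomp:\t2\n" this returns 2; for text without
the field it returns -1.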

Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: James Morris <jmorris@namei.org>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years ago procfs-add-vmflags-field-in-smaps-output-v4-fix
Andrew Morton [Thu, 29 Nov 2012 03:18:54 +0000 (14:18 +1100)]
procfs-add-vmflags-field-in-smaps-output-v4-fix

remove unneeded braces per sfr, avoid using bloaty for_each_set_bit()

Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Pavel Emelyanov <xemul@parallels.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years ago procfs: add VmFlags field in smaps output
Cyrill Gorcunov [Thu, 29 Nov 2012 03:18:54 +0000 (14:18 +1100)]
procfs: add VmFlags field in smaps output

During c/r sessions we've found that there is no way at the moment to
fetch some VMA-associated flags, such as those set by mlock() and madvise().

This leads us to a problem -- we don't know if we should call for mlock()
and/or madvise() after restore on the vma area we're bringing back to
life.

This patch introduces a new field into "smaps" output called VmFlags,
where all set flags associated with the particular VMA are shown as
two-letter mnemonics.

[ Strictly speaking for c/r we only need mlock/madvise bits but it has been
  said that providing just a few flags looks somehow inconsistent.  So all
  flags are here now. ]

This feature is made available on CONFIG_CHECKPOINT_RESTORE=n kernels, as
other applications may start to use these fields.

The data is encoded in a somewhat awkward two-letter mnemonic form, to
encourage userspace to be prepared for fields being added or removed in
the future.
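
A sketch of a tolerant userspace reader, assuming the line looks like
"VmFlags: rd wr lo".  The function name is hypothetical, and the example
mnemonics ("rd", "wr", "lo" for a locked area) come from the kernel's
procfs documentation rather than from this message.  Unknown mnemonics are
skipped, so added or removed fields do not break the caller.

```c
#include <string.h>

/*
 * Return 1 if the given mnemonic appears on a "VmFlags:" line, 0 if it
 * does not or if the line is not a VmFlags line at all.
 */
static int vmflags_has(const char *line, const char *mnemonic)
{
	size_t want = strlen(mnemonic);
	const char *p;

	if (strncmp(line, "VmFlags:", 8) != 0)
		return 0;

	p = line + 8;
	while (*p) {
		size_t len;

		p += strspn(p, " \t\n");	/* skip separators */
		len = strcspn(p, " \t\n");	/* length of next token */
		if (len == want && strncmp(p, mnemonic, len) == 0)
			return 1;
		p += len;
	}
	return 0;
}
```

A restore tool could, for instance, call mlock() again only when
vmflags_has(line, "lo") is true.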

[a.p.zijlstra@chello.nl: props to use for_each_set_bit]
[sfr@canb.auug.org.au: props to use array instead of struct]
[akpm@linux-foundation.org: overall redesign and simplification]
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Pavel Emelyanov <xemul@parallels.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years ago proc: don't show nonexistent capabilities
Andrew Vagin [Thu, 29 Nov 2012 03:18:53 +0000 (14:18 +1100)]
proc: don't show nonexistent capabilities

Without this patch it is really hard to interpret a bounding set if
CAP_LAST_CAP is unknown for the current kernel.

Nonexistent capabilities cannot be removed from a bounding set with
prctl.

For example, here is the same bounding set without and with this patch:
CapBnd: ffffffe0fdecffff
CapBnd: 00000000fdecffff

I suggest hiding nonexistent capabilities, for two reasons:
* It is more logical and easier to use.
* It helps to checkpoint-restore capabilities of tasks, because tasks
can be restored on another kernel, where CAP_LAST_CAP is bigger.
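
The masking can be checked with a few lines of C.  This is only a sketch:
the function name is made up, and the value 36 for CAP_LAST_CAP is an
assumption about the kernel that produced the example above.

```c
#include <stdint.h>

/*
 * Keep only bits 0..last_cap of a 64-bit capability set and clear the
 * bits that do not correspond to any existing capability.
 */
static uint64_t mask_caps(uint64_t caps, unsigned last_cap)
{
	/* Avoid shifting by 64, which is undefined. */
	if (last_cap >= 63)
		return caps;

	return caps & ((UINT64_C(1) << (last_cap + 1)) - 1);
}
```

With last_cap = 36 the mask is 0x1fffffffff, and ffffffe0fdecffff becomes
00000000fdecffff, matching the two CapBnd lines.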

Signed-off-by: Andrew Vagin <avagin@openvz.org>
Cc: Andrew G. Morgan <morgan@kernel.org>
Reviewed-by: Serge E. Hallyn <serge.hallyn@canonical.com>
Cc: Pavel Emelyanov <xemul@parallels.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: James Morris <jmorris@namei.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years ago ptrace: introduce PTRACE_O_EXITKILL
Oleg Nesterov [Thu, 29 Nov 2012 03:18:53 +0000 (14:18 +1100)]
ptrace: introduce PTRACE_O_EXITKILL

Ptrace jailers want to be sure that the tracee can never escape
from their control. However, if the tracer dies unexpectedly, the
tracee continues to run in a potentially unsafe mode.

Add the new ptrace option PTRACE_O_EXITKILL. If the tracer exits
it sends SIGKILL to every tracee which has this bit set.

Note that the new option is not equal to the last option << 1, because
currently all options have an event, and the new one starts the eventless
group.  It uses bit 20, chosen arbitrarily, so we have room for 12 more
events, but we can also add new eventless options below this one.
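
The arithmetic behind "12 more events" can be sketched as follows.  The
names are illustrative, and the highest event number (7) is an assumption
taken from the ptrace headers of the time, not from this message.

```c
/* Assumed highest ptrace event number in use (illustration only). */
#define HIGHEST_EVENT	7u
/* Bit chosen for the new eventless option, as described above. */
#define EXITKILL_BIT	20u

/* An option tied to an event uses the bit with the event's number. */
static unsigned option_for_event(unsigned event)
{
	return 1u << event;
}

/* The eventless option sits at a fixed bit, away from the event bits. */
static unsigned exitkill_option(void)
{
	return 1u << EXITKILL_BIT;
}

/* Bits between the last event option and the eventless option. */
static unsigned free_event_bits(void)
{
	return EXITKILL_BIT - (HIGHEST_EVENT + 1u);
}
```

Under these assumptions bits 8 to 19 stay free for future events, which
gives the 12 mentioned above.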

Suggested by Amnon Shiloh.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Tested-by: Amnon Shiloh <u3557@miso.sublimeip.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Serge Hallyn <serge.hallyn@canonical.com>
Cc: Chris Evans <scarybeasts@gmail.com>
Cc: David Howells <dhowells@redhat.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years ago simple_strto*: annotate function as obsolete
Eldad Zack [Thu, 29 Nov 2012 03:18:53 +0000 (14:18 +1100)]
simple_strto*: annotate function as obsolete

Update the documentation for simple_strto* to reflect that these functions
are obsolete and to recommend kstrto* instead.
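
For readers unfamiliar with the difference, here is a userspace analogue of
the stricter style of parsing: the whole string must be a number, and
overflow is reported instead of ignored.  It is a sketch built on strtol()
with a made-up name, not the kernel implementation, and it differs in
details such as leading whitespace.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/*
 * Parse s as an int in the given base.  Returns 0 and stores the value in
 * *res on success, -EINVAL for malformed input, -ERANGE on overflow.
 */
static int strict_strtoint(const char *s, int base, int *res)
{
	char *end;
	long v;

	errno = 0;
	v = strtol(s, &end, base);

	if (end == s)
		return -EINVAL;		/* no digits at all */
	if (*end == '\n')
		end++;			/* tolerate one trailing newline */
	if (*end != '\0')
		return -EINVAL;		/* trailing garbage */
	if (errno == ERANGE || v < INT_MIN || v > INT_MAX)
		return -ERANGE;		/* does not fit in an int */

	*res = (int)v;
	return 0;
}
```

A lenient parser would accept "42abc" as 42; this one rejects it.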

Signed-off-by: Eldad Zack <eldad@fogrefinery.com>
Cc: J. Bruce Fields <bfields@fieldses.org>
Cc: Joe Perches <joe@perches.com>
Cc: Randy Dunlap <rdunlap@xenotime.net>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Rob Landley <rob@landley.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years ago kstrto*: add documentation
Eldad Zack [Thu, 29 Nov 2012 03:18:52 +0000 (14:18 +1100)]
kstrto*: add documentation

As Bruce Fields pointed out, kstrto* is currently lacking kerneldoc
comments.  This patch adds kerneldoc comments to common variants of
kstrto*: kstrto(u)l, kstrto(u)ll and kstrto(u)int.

Signed-off-by: Eldad Zack <eldad@fogrefinery.com>
Cc: J. Bruce Fields <bfields@fieldses.org>
Cc: Joe Perches <joe@perches.com>
Cc: Randy Dunlap <rdunlap@xenotime.net>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Rob Landley <rob@landley.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years ago Documentation: fix Documentation/security/00-INDEX
Jarkko Sakkinen [Thu, 29 Nov 2012 03:18:52 +0000 (14:18 +1100)]
Documentation: fix Documentation/security/00-INDEX

keys-ecryptfs.txt was missing from 00-INDEX.

Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years ago Documentation/DMA-API-HOWTO.txt: minor grammar corrections
Shuah Khan [Thu, 29 Nov 2012 03:18:52 +0000 (14:18 +1100)]
Documentation/DMA-API-HOWTO.txt: minor grammar corrections

Signed-off-by: Shuah Khan <shuah.khan@hp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 years ago fs/fat: strip "cp" prefix from codepage in display
Dave Reisner [Thu, 29 Nov 2012 03:18:52 +0000 (14:18 +1100)]
fs/fat: strip "cp" prefix from codepage in display

The option parsing code expects an unsigned integer for the codepage
option, but prefixes it with "cp" and stores it that way before passing it
to load_nls().  This makes the displayed option in /proc an invalid one.
Strip the prefix when printing so that the displayed option is valid for
reuse.
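
The display side can be sketched in userspace like this.  The function name
and the exact option string are illustrative, not copied from the fat code.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Format the codepage option for display.  The stored charset name looks
 * like "cp437"; the "cp" prefix is skipped so that the printed value is
 * the plain number the option parser accepts.
 */
static int format_codepage_option(const char *charset, char *buf, size_t size)
{
	if (strncmp(charset, "cp", 2) == 0)
		charset += 2;

	return snprintf(buf, size, ",codepage=%s", charset);
}
```

For a stored name of "cp437" the buffer holds ",codepage=437", which can be
passed back as a mount option unchanged.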

Signed-off-by: Dave Reisner <dreisner@archlinux.org>
Acked-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>