git.kernelconcepts.de Git - karo-tx-linux.git/log

pata_sch: use ata_pci_sff_init_one()

pata_sch is standard SFF. No reason to open code init. Use
ata_pci_sff_init_one() instead.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>

pata_sil680: Do our own exec_command posting

Use our own mmio area to avoid PCI posting. This avoids the rather slow
paranoid implementation in the default handler.

Signed-off-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>

libata: Remove excess delay in the tf_load path

We don't need to stall and wait after loading the task file and before
issuing a command, so don't do it. This shows up on profiles and is not
needed.

Signed-off-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>

sata_nv: use ata_pci_sff_activate_host() instead of ata_host_activate()

sata_nv was incorrectly using ata_host_activate() instead of
ata_pci_sff_activate_host() leading to IRQ assignment failure in
legacy mode. Fix it.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Robert Hancock <hancockr@shaw.ca>
Cc: stable@kernel.org
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>

libata: don't flush dcache on slab pages

page_mapping() check this via VM_BUG_ON(PageSlab(page)) so we bug here
with the according debuging turned on.

Future TODO: replace this with a flush_dcache_page_for_pio() API

Signed-off-by: Sebastian Andrzej Siewior <sebastian@breakpoint.cc>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Cc: stable@kernel.org

pata_cmd640: don't read CFR pointlessly

cmd640_hardware_init() reads CFR but doesn't use the value read...

Signed-off-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>

libata: make sff_irq_on() method optional

Now, with the introduction of the sff_set_devctl() method, we can
use it in sff_irq_on() method too -- that way its implementations
in 'pata_bf54x' and 'pata_scc' become virtually identical to
ata_sff_irq_on(). The sff_irq_on() method now becomes quite
superfluous, and the only reason not to remove it completely is
the existence of the 'pata_octeon_cf' driver which implements it
as an empty function. Just make the method optional then, with
ata_sff_irq_on() becoming generic taskfile-bound function, still
global for the 'pata_bf54x' driver to be able to call it from its
thaw() and postreset() methods.

While at it, make the sff_irq_on() method and ata_sff_irq_on() return
'void' as the result is always ignored anyway.

Signed-off-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>

libata: introduce sff_set_devctl() method

The set of libata's taskfile access methods is clearly incomplete as
it lacks a method to write to the device control register -- which
forces drivers like 'pata_bf54x' and 'pata_scc' to implement more
"high level" (and more weighty) methods like freeze() and postreset().

So, introduce the optional sff_set_devctl() method which the drivers
only have to implement if the standard iowrite8() can't be used (just
like the existing sff_check_altstatus() method) and make use of it
in the freeze() and postreset() method implementations (I could also
have used it in softreset() method but it also reads other taskfile
registers without using tf_read() making that quite pointless);
this makes freeze() method implementations in the 'pata_bf54x' and
'pata_scc' methods virtually identical to ata_sff_freeze(), so we
can get rid of them completely.

Signed-off-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>

ahci_platform: properly set up EM messaging

Signed-off-by: Jeff Garzik <jgarzik@redhat.com>

ahci: add "em_buffer" attribute for AHCI hosts

Add "em_buffer" attribute for SATA AHCI hosts to provide a way for
userland to access AHCI EM (enclosure management) buffer directly if the
host supports EM.

AHCI driver should support SGPIO EM messages. However the SATA/AHCI
specs did not define the SGPIO message format filled in EM buffer.
Different HW vendors may have different definitions. The mainly purpose
of this attribute is to solve this issue by allowing HW vendors to
provide userland drivers and tools for their SGPIO initiators.

Signed-off-by: Harry Zhang <harry.zhang@amd.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>

ahci: EM message type auto detect

Detect enclosure management message type automatically at driver
initialization, instead of using module parameter "ahci_em_messages".

Signed-off-by: Harry Zhang <harry.zhang@amd.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>

pata_scc: kill useless check in scc_postreset()

The device control register exists and its address is set by scc_setup_ports(),
hence the check is useless...

Signed-off-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>

pata_scc: make scc_wait_after_reset() static

... since, of course, it's not used outside this driver.

Signed-off-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>

libata: use __ratelimit

Use __ratelimit() instead of its own private rate limit implementation.

Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Cc: Jeff Garzik <jgarzik@pobox.com>
Cc: linux-ide@vger.kernel.org
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>

libata: use longer 0xff wait if parallel scan is enabled

There are some SATA devices which take relatively long to get out of
0xff status after reset.  In libata, this timeout is determined by
ATA_TMOUT_FF_WAIT.  Quantum GoVault is the worst requring about 2s for
reliable detection.  However, because 2s 0xff timeout can introduce
rather long spurious delay during boot, libata has been compromising
at the next longest timeout of 800ms for HHD424020F7SV00 iVDR drive.

Now that parallel scan is in place for common drivers, libata can
afford 2s 0xff timeout.  Use 2s 0xff timeout if parallel scan is
enabled.

Please note that the chance of spurious wait is pretty slim w/ working
SCR access so this will only affect SATA controllers w/o SCR access
which isn't too common these days.

Please read the following thread for more information on the GoVault
drive.

  http://thread.gmane.org/gmane.linux.ide/14545/focus=14663

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Gary Hade <garyhade@us.ibm.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>

libata-sff: kill unused ata_bus_reset()

... since I see no callers of it.

Signed-off-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>

[libata] Disable R_OK (Early ACK) on SII 3726 PMP

In 2009, While running "cache read" performance test of drives behind
SII PMP we encountered a "all 5 drives" timeout on more than 30% of the
machines under test. This patch reduces the rate by a factor of about 70.
Low enough that we didn't care to further investigate the issue.

Performance impact with any sort of "normal" use was ~2%+ CPU and less
than 1% throughput degradation. Worst case impact (cached read) was
6% IOPS reduction. This is with NCQ off (q=1) but I believe FIS based
switching enabled in the SATA driver.

The patch disables "Early ACK" in the 3726 port multiplier.
"Early ACK" is issued when device sends a FIS to the host (via PMP)
and the PMP sends an ACK immediately back to the device - well before
the host gets the response. Under worst case IOPs load (cached read
test) and more than 2 PMPs connected to a 4-port SATA controller,
I suspect the time to service all of the PMPs is exceeding the PMPs
ability to keep track of outstanding FIS it owes the Host. Reducing
the number of PMPs to 2 (or 1) reduces the frequency by several orders
of magnitude. Kudos to Gwendal for initial debugging of this issue.
[Any errors in the description are mine, not his.]

Patch is currently in production on Google servers.

Signed-off-by: Grant Grundler <grundler@google.com>
Signed-off-by: Gwendal Grignou <gwendal@google.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>

libata: update gfp/slab.h includes

Implicit slab.h inclusion via percpu.h is about to go away. Make sure
gfp.h or slab.h is included as necessary.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Jeff Garzik <jgarzik@pobox.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>

ahci: implement AHCI_HFLAG_NO_FPDMA_AA and update NV quirks

It turns out different generations of MCPs have differing quirks.

* MCP 65-73 : FPDMA AA broken, lies about PMP support, forgets to report NCQ
* MCP 77-79 : FPDMA AA broken, lies about PMP support
* MCP 89 : FPDMA AA broken

Instead of turngin off FPDMA AA on all NVIDIAs, implement
HFLAG_NO_FPDMA_AA, define additional board IDs and apply necessary
quirks.

This fixes bko#15481 and the list of quirks is verified by Peer Chen.

http://bugzilla.kernel.org/show_bug.cgi?id=15481

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Peer Chen <pchen@nvidia.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>

pata_mpc52xx: reduce code size by simple change of constant data types

I've prepared a totally simple patch that, if I did it and measured it
correctly, reduces the text size as of the ppc-6xx-size command of
pata-mpc52xx by more than 10%, by reducing the rodata size from 0x4a4
to 0x17e bytes. This is simply done by changing the data types of the
ATA timing constants.

If you are interested at all, and it's worth the trouble, here the
details:

ppc-6xx-size:
     text data bss  dec  hex filename
old: 6532 1068   0 7600 1db0 pata-mpc52xx.o
new: 5718 1068   0 6786 1a82 pata-mpc52xx.o

The (assembler) code itself doesn't really change very much. I double
checked the final results inside mpc52xx-ata-apply-timings() and they
match. The driver is still working fine of course.

Signed-off-by: Roman Fietze <roman.fietze@telemotive.de>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>

ahci: clean up board IDs

ahci over time has grown a number of board IDs and it's a bit of mess
right now.  Clean it up such that,

* board_id_* now live in a separate enum board_ids and numbers are
  assigned automatically.

* Board IDs assigned to features are separated from the ones assigned
  to specific implementations and both are ordered alphabetically.

* For NV MCPs, define per-generation alias board_ids and assign
  matching aliases in the pci id table.  This makes mcp_linux, 67-73
  use board_ahci_mcp65 instead of board_ahci_yesncq.  Both are
  identical in content.

* Kill now unused board_ahci_nopmp and board_ahci_yesncq.

This patch doesn't cause any functional change but will make future
changes to board_ids and quirks much less painful.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Peer Chen <pchen@nvidia.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>

ahci: Turn off DMA engines when there's no device attached

According to section 10.3.1 of the AHCI spec, PxCMD.ST must not be set
unless there's a device attached. Following this saves us a measurable
quantity of power and does not impair hotplug support. Based on a patch
by Kristen Carlson Accardi.

Signed-off-by: Matthew Garrett <mjg@redhat.com>
Cc: Kristen Carlson Accardi <kristen.c.accardi@intel.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>

ahci: Add platform driver

This can be used for AHCI-compatible interfaces implemented inside
System-On-Chip solutions, or AHCI devices connected via localbus.

Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>

ahci: Move generic code into libahci

This patch should contain no functional changes, just moves code
around.

Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>

ahci: Introduce ahci_set_em_messages()

Factor out some ahci_em_messages handling code from ahci_init_one().
We would like to reuse it for non-PCI devices.

Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>

ahci: Factor out PCI specifics from ahci_print_info()

Introduce ahci_pci_print_info() that now handles PCI stuff.

Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>

ahci: Factor out PCI specifics from ahci_init_controller()

Move PCI stuff into ahci_pci_init_controller().

Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>

ahci: Get rid of pci_dev argument in ahci_port_init()

To make the function bus-independand we have to get rid of
"struct pci_dev *", so let's pass just "struct devce *".

Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>

ahci: Factor out PCI specifics from ahci_reset_controller()

Move PCI stuff into ahci_pci_reset_controller().

Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>

ahci: Get rid of pci_dev argument in ahci_save_initial_config()

To make the function generic we have to get rid of "struct pci_dev *",
so let's pass just a "struct devce *".

Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>

ahci: Factor out PCI specifics from ahci_save_initial_config()

Make ahci_save_initial_config() a bit more generic by introducing
force_port_map and mask_port_map arguments.

Move PCI stuff into ahci_pci_save_initial_config().

Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>

ahci: Get rid of host->iomap usage

Currently the driver uses host->iomap to store all the iomapped BARs
of a PCI device (while AHCI devices actually use just a single memory
window).

We're going to teach AHCI to work with non-PCI buses, so there are two
options to make this work:

1. "fake" host->iomap array for non-PCI devices, and place the needed
address at iomap[AHCI_PCI_BAR];
2. Get rid of host->iomap usage, instead introduce a private mmio
field.

This patch implements the second option.

Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>

Merge branch 'for-linus' of git://git.infradead.org/users/eparis/notify

* 'for-linus' of git://git.infradead.org/users/eparis/notify:
  inotify: don't leak user struct on inotify release
  inotify: race use after free/double free in inotify inode marks
  inotify: clean up the inotify_add_watch out path
  Inotify: undefined reference to `anon_inode_getfd'

Manual merge to remove duplicate "select ANON_INODES" from Kconfig file

Merge branch 'davinci-fixes-for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/khilman/linux-davinci

* 'davinci-fixes-for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/khilman/linux-davinci:
DA830: fix USB 2.0 clock entry

DA830: fix USB 2.0 clock entry

DA8xx OHCI driver fails to load due to failing clk_get() call for the USB 2.0
clock. Arrange matching USB 2.0 clock by the clock name instead of the device.
(Adding another CLK() entry for "ohci.0" device won't do -- in the future I'll
also have to enable USB 2.0 clock to configure CPPI 4.1 module, in which case
I won't have any device at all.)

Signed-off-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Signed-off-by: Kevin Hilman <khilman@deeprootsystems.com>

inotify: don't leak user struct on inotify release

inotify_new_group() receives a get_uid-ed user_struct and saves the
reference on group->inotify_data.user. The problem is that free_uid() is
never called on it.

Issue seem to be introduced by 63c882a0 (inotify: reimplement inotify
using fsnotify) after 2.6.30.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Eric Paris <eparis@parisplace.org>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Eric Paris <eparis@redhat.com>

inotify: race use after free/double free in inotify inode marks

There is a race in the inotify add/rm watch code.  A task can find and
remove a mark which doesn't have all of it's references.  This can
result in a use after free/double free situation.

Task A Task B
------------ -----------
inotify_new_watch()
allocate a mark (refcnt == 1)
add it to the idr
inotify_rm_watch()
inotify_remove_from_idr()
  fsnotify_put_mark()
      refcnt hits 0, free
take reference because we are on idr
[at this point it is a use after free]
[time goes on]
refcnt may hit 0 again, double free

The fix is to take the reference BEFORE the object can be found in the
idr.

Signed-off-by: Eric Paris <eparis@redhat.com>
Cc: <stable@kernel.org>

inotify: clean up the inotify_add_watch out path

inotify_add_watch explictly frees the unused inode mark, but it can just
use the generic code. Just do that.

Signed-off-by: Eric Paris <eparis@redhat.com>

Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6:
vhost: fix barrier pairing

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6:
mmap_min_addr check CAP_SYS_RAWIO only for write

Merge branch 'for-linus' of git://git.monstr.eu/linux-2.6-microblaze

* 'for-linus' of git://git.monstr.eu/linux-2.6-microblaze:
  microblaze: Fix module loading on system with WB cache
  microblaze: export assembly functions used by modules
  microblaze: Remove powerpc code from Microblaze port
  microblaze: Remove compilation warnings in cache macro
  microblaze: export assembly functions used by modules
  microblaze: fix get_user/put_user side-effects
  microblaze: re-enable interrupts before calling schedule

Merge branch 'net-2.6' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost

mmap_min_addr check CAP_SYS_RAWIO only for write

Redirecting directly to lsm, here's the patch discussed on lkml:
http://lkml.org/lkml/2010/4/22/219

The mmap_min_addr value is useful information for an admin to see without
being root ("is my system vulnerable to kernel NULL pointer attacks?") and
its setting is trivially easy for an attacker to determine by calling
mmap() in PAGE_SIZE increments starting at 0, so trying to keep it private
has no value.

Only require CAP_SYS_RAWIO if changing the value, not reading it.

Comment from Serge :

  Me, I like to write my passwords with light blue pen on dark blue
  paper, pasted on my window - if you're going to get my password, you're
  gonna get a headache.

Signed-off-by: Kees Cook <kees.cook@canonical.com>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Signed-off-by: James Morris <jmorris@namei.org>
(cherry picked from commit 822cceec7248013821d655545ea45d1c6a9d15b3)

microblaze: Fix module loading on system with WB cache

There is necessary to flush whole dcache. Icache work should be
done in kernel/module.c.

Signed-off-by: Michal Simek <monstr@monstr.eu>

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sameo/mfd-2.6

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sameo/mfd-2.6:
mfd: Clean up after WM83xx AUXADC interrupt if it arrives late

Merge branch 'kvm-updates/2.6.34' of git://git.kernel.org/pub/scm/virt/kvm/kvm

* 'kvm-updates/2.6.34' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: PPC: Keep index within boundaries in kvmppc_44x_emul_tlbwe()
  KVM: VMX: blocked-by-sti must not defer NMI injections
  KVM: x86: Call vcpu_load and vcpu_put in cpuid_update
  KVM: SVM: Fix wrong intercept masks on 32 bit
  KVM: convert ioapic lock to spinlock

Merge git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty-2.6

* git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty-2.6:
  serial: imx.c: fix CTS trigger level lower to avoid lost chars
  tty: Fix unbalanced BKL handling in error path
  serial: mpc52xx_uart: fix null pointer dereference

serial: imx.c: fix CTS trigger level lower to avoid lost chars

The imx CTS trigger level is left at its reset value that is 32
chars. Since the RX FIFO has 32 entries, when CTS is raised, the
FIFO already is full. However, some serial port devices first empty
their TX FIFO before stopping when CTS is raised, resulting in lost
chars.

This patch sets the trigger level lower so that other chars arrive
after CTS is raised, there is still room for 16 of them.

Signed-off-by: Valentin Longchamp<valentin.longchamp@epfl.ch>
Tested-by: Philippe Rétornaz<philippe.retornaz@epfl.ch>
Acked-by: Wolfram Sang<w.sang@pengutronix.de>
Cc: stable <stable@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

tty: Fix unbalanced BKL handling in error path

Arnd noted:

After the "retry_open:" label, we first get the tty_mutex
and then the BKL. However a the end of tty_open, we jump
back to retry_open with the BKL still held. If we run into
this case, the tty_open function will be left with the BKL
still held.

Signed-off-by: Alan Cox <alan@linux.intel.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Cc: stable <stable@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

serial: mpc52xx_uart: fix null pointer dereference

Commit 6acc6833510db8f72b5ef343296d97480555fda9
introduced NULL pointer dereference and kernel crash
on ppc32 machines while booting. Fix this bug now.

Reported-by: Leonardo Chiquitto <leonardo.lists@gmail.com>
Tested-by: Leonardo Chiquitto <leonardo.lists@gmail.com>
Signed-off-by: Anatolij Gustschin <agust@denx.de>
Acked-by: Grant Likely <grant.likely@secretlab.ca>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6:
cifs: guard against hardlinking directories

vfs: Fix O_NOFOLLOW behavior for paths with trailing slashes

According to specification

mkdir d; ln -s d a; open("a/", O_NOFOLLOW | O_RDONLY)

should return success but currently it returns ELOOP. This is a
regression caused by path lookup cleanup patch series.

Fix the code to ignore O_NOFOLLOW in case the provided path has trailing
slashes.

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Reported-by: Marius Tolzmann <tolzmann@molgen.mpg.de>
Acked-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6:
  ALSA: ice1724 - Fix ESI Maya44 capture source control
  ALSA: pcm - Use pgprot_noncached() for MIPS non-coherent archs
  ALSA: virtuoso: fix Xonar D1/DX front panel microphone
  ALSA: hda - Add hp-dv4 model for IDT 92HD71bx
  ALSA: hda - Fix mute-LED GPIO pin for HP dv series
  ALSA: hda: Fix 0 dB for Lenovo models using Conexant CX20549 (Venice)

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
  Input: ad7877 - keep dma rx buffers in seperate cache lines
  Input: psmouse - reset all types of mice before reconnecting
  Input: elantech - use all 3 bytes when checking version
  Input: iforce - fix Guillemot Jet Leader 3D entry
  Input: iforce - add Guillemot Jet Leader Force Feedback

mfd: Clean up after WM83xx AUXADC interrupt if it arrives late

In certain circumstances, especially under heavy load, the AUXADC
completion interrupt may be detected after we've timed out waiting for
it.  That conversion would still succeed but the next conversion will
see the completion that was signalled by the interrupt for the previous
conversion and therefore not wait for the AUXADC conversion to run,
causing it to report failure.

Provide a simple, non-invasive cleanup by using try_wait_for_completion()
to ensure that the completion is not signalled before we wait.  Since
the AUXADC is run within a mutex we know there can only have been at
most one AUXADC interrupt outstanding.  A more involved change should
follow for the next merge window.

Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>

microblaze: export assembly functions used by modules

Export __strncpy_user, memory_size, ioremap_bot for modules.

Signed-off-by: Michal Simek <monstr@monstr.eu>

microblaze: Remove powerpc code from Microblaze port

Remove eeh_add_device_tree_late which is powerpc specific code.

Signed-off-by: Michal Simek <monstr@monstr.eu>

microblaze: Remove compilation warnings in cache macro

CC arch/microblaze/kernel/cpu/cache.o
arch/microblaze/kernel/cpu/cache.c: In function '__invalidate_dcache_range_wb':
arch/microblaze/kernel/cpu/cache.c:398: warning: ISO C90 forbids mixed declarations and code
arch/microblaze/kernel/cpu/cache.c: In function '__flush_dcache_range_wb':
arch/microblaze/kernel/cpu/cache.c:509: warning: ISO C90 forbids mixed declara

Signed-off-by: Michal Simek <monstr@monstr.eu>

microblaze: export assembly functions used by modules

Modules that use copy_{to,from}_user(), memcpy(), and memset() fail to build
in certain circumstances.

Signed-off-by: Steven J. Magnani <steve@digidescorp.com>
Signed-off-by: Michal Simek <monstr@monstr.eu>

Merge branch 'fix/hda' into for-linus

Input: ad7877 - keep dma rx buffers in seperate cache lines

With dma based spi transmission, data corruption is observed
occasionally. With dma buffers located right next to msg and
xfer fields, cache lines correctly flushed in preparation for
dma usage may be polluted again when writing to fields in the
same cache line.

Make sure cache fields used with dma do not share cache lines
with fields changed during dma handling. As both fields are part
of a struct that is allocated via kzalloc, thus cache aligned,
moving the fields to the 1st position and insert padding for
alignment does the job.

Signed-off-by: Oskar Schirmer <os@emlix.com>
Signed-off-by: Daniel Glöckner <dg@emlix.com>
Signed-off-by: Oliver Schneidewind <osw@emlix.com>
Signed-off-by: Johannes Weiner <jw@emlix.com>
Acked-by: Mike Frysinger <vapier@gentoo.org>
[dtor@mail.ru - changed to use ___cacheline_aligned as suggested
by akpm]
Signed-off-by: Dmitry Torokhov <dtor@mail.ru>

Input: psmouse - reset all types of mice before reconnecting

Synaptics hardware requires resetting device after suspend to ram
in order for the device to be operational. The reset lives in
synaptics-specific reconnect handler, but it is not being invoked
if synaptics support is disabled and the device is handled as a
standard PS/2 device (bare or IntelliMouse protocol).

Let's add reset into generic reconnect handler as well.

Signed-off-by: Dmitry Torokhov <dtor@mail.ru>

Input: elantech - use all 3 bytes when checking version

Apparently all 3 bytes returned by ETP_FW_VERSION_QUERY are significant
and should be taken into account when matching hardware version/features.

Tested-by: Eric Piel <eric.piel@tremplin-utc.net>
Signed-off-by: Dmitry Torokhov <dtor@mail.ru>

microblaze: fix get_user/put_user side-effects

The Microblaze implementations of get_user() and (MMU) put_user() evaluate
the address argument more than once. This causes unexpected side-effects for
invocations that include increment operators, i.e. get_user(foo, bar++).

This patch also removes the distinction between MMU and noMMU put_user().

Without the patch:
  $ echo 1234567890 > /proc/sys/kernel/core_pattern
  $ cat /proc/sys/kernel/core_pattern
  12345

Signed-off-by: Steven J. Magnani <steve@digidescorp.com>

microblaze: re-enable interrupts before calling schedule

schedule() should not be called with interrupts disabled.

Signed-off-by: Steven J. Magnani <steve@digidescorp.com>
Signed-off-by: Michal Simek <monstr@monstr.eu>

KVM: PPC: Keep index within boundaries in kvmppc_44x_emul_tlbwe()

An index of KVM44x_GUEST_TLB_SIZE is already one too large.

Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
Acked-by: Hollis Blanchard <hollis@penguinppc.org>
Acked-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

KVM: VMX: blocked-by-sti must not defer NMI injections

As the processor may not consider GUEST_INTR_STATE_STI as a reason for
blocking NMI, it could return immediately with EXIT_REASON_NMI_WINDOW
when we asked for it. But as we consider this state as NMI-blocking, we
can run into an endless loop.

Resolve this by allowing NMI injection if just GUEST_INTR_STATE_STI is
active (originally suggested by Gleb). Intel confirmed that this is
safe, the processor will never complain about NMI injection in this
state.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
KVM-Stable-Tag
Acked-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

KVM: x86: Call vcpu_load and vcpu_put in cpuid_update

cpuid_update may operate VMCS, so vcpu_load() and vcpu_put()
should be called to ensure correctness.

Signed-off-by: Dongxiao Xu <dongxiao.xu@intel.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

KVM: SVM: Fix wrong intercept masks on 32 bit

This patch makes KVM on 32 bit SVM working again by
correcting the masks used for iret interception. With the
wrong masks the upper 32 bits of the intercepts are masked
out which leaves vmrun unintercepted. This is not legal on
svm and the vmrun fails.
Bug was introduced by commits 95ba827313 and 3cfc3092.

Cc: Jan Kiszka <jan.kiszka@siemens.com>
Cc: Gleb Natapov <gleb@redhat.com>
Cc: stable@kernel.org
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: convert ioapic lock to spinlock

kvm_set_irq is used from non sleepable contexes, so convert ioapic from
mutex to spinlock.

KVM-Stable-Tag.
Tested-by: Ralf Bonenkamp <ralf.bonenkamp@swyx.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

Merge branch 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc

* 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc:
powerpc/perf_event: Fix oops due to perf_event_do_pending call
powerpc/swiotlb: Fix off by one in determining boundary of which ops to use

Merge branch 'for-linus' of git://git390.marist.edu/pub/scm/linux-2.6

* 'for-linus' of git://git390.marist.edu/pub/scm/linux-2.6:
  [S390] correct address of _stext with CONFIG_SHARED_KERNEL=y
  [S390] ptrace: fix return value of do_syscall_trace_enter()
  [S390] dasd: fix race between tasklet and dasd_sleep_on

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
  ceph: preserve seq # on requeued messages after transient transport errors
  ceph: fix cap removal races
  ceph: zero unused message header, footer fields
  ceph: fix locking for waking session requests after reconnect
  ceph: resubmit requests on pg mapping change (not just primary change)
  ceph: fix open file counting on snapped inodes when mds returns no caps
  ceph: unregister osd request on failure
  ceph: don't use writeback_control in writepages completion
  ceph: unregister bdi before kill_anon_super releases device name

Merge commit 'kumar/merge' into merge

Revert "PCI: update bridge resources to get more big ranges in PCI assign unssigned"

This reverts commit 977d17bb1749517b353874ccdc9b85abc7a58c2a, because it
can cause problems with some devices not getting any resources at all
when the resource tree is re-allocated.

For an example of this, see

https://bugzilla.kernel.org/show_bug.cgi?id=15960
(originally https://bugtrack.alsa-project.org/alsa-bug/view.php?id=4982)
(lkml thread: http://lkml.org/lkml/2010/4/19/20)

where Peter Henriksson reported his Xonar DX sound card gone, because
the IO port region was no longer allocated.

Reported-bisected-and-tested-by: Peter Henriksson <peter.henriksson@gmail.com>
Requested-by: Andrew Morton <akpm@linux-foundation.org>
Requested-by: Clemens Ladisch <clemens@ladisch.de>
Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

CacheFiles: Fix error handling in cachefiles_determine_cache_security()

cachefiles_determine_cache_security() is expected to return with a
security override in place. However, if set_create_files_as() fails, we
fail to do this. In this case, we should just reinstate the security
override that was set by the caller.

Furthermore, if set_create_files_as() fails, we should dispose of the
new credentials we were in the process of creating.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

rwsem: Test for no active locks in __rwsem_do_wake undo code

If there are no active threasd using a semaphore, it is always correct
to unqueue blocked threads. This seems to be what was intended in the
undo code.

What was done instead, was to look for a sem count of zero - this is an
impossible situation, given that at least one thread is known to be
queued on the semaphore. The code might be correct as written, but it's
hard to reason about and it's not what was intended (otherwise the goto
out would have been unconditional).

Go for checking the active count - the alternative is not worth the
headache.

Signed-off-by: Michel Lespinasse <walken@google.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

vhost: fix barrier pairing

According to memory-barriers.txt, an smp memory barrier in guest
should always be paired with an smp memory barrier in host,
and I quote "a lack of appropriate pairing is almost certainly an
error". In case of vhost, failure to flush out used index
update before looking at the interrupt disable flag
could result in missed interrupts, resulting in
networking hang under stress.

This might happen when flags read bypasses used index write.
So we see interrupts disabled and do not interrupt, at the
same time guest writes flags value to enable interrupt,
reads an old used index value, thinks that
used ring is empty and waits for interrupt.

Note: the barrier we pair with here is in
drivers/virtio/virtio_ring.c, function
vring_enable_cb.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Juan Quintela <quintela@redhat.com>

Inotify: undefined reference to `anon_inode_getfd'

Fix:

fs/built-in.o: In function `sys_inotify_init1':
summary.c:(.text+0x347a4): undefined reference to `anon_inode_getfd'

found by kautobuild with arms bcmring_defconfig, which ends up with
INOTIFY_USER enabled (through the 'default y') but leaves ANON_INODES
unset. However, inotify_user.c uses anon_inode_getfd().

Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Eric Paris <eparis@redhat.com>

ALSA: ice1724 - Fix ESI Maya44 capture source control

The capture source control of maya44 was wrongly coded with the bit
shift instead of the bit mask. Also, the slot for line-in was
wrongly assigned (slot 5 instead of 4).

Reported-by: Alex Chernyshoff <alexdsp@gmail.com>
Cc: <stable@kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>

ALSA: pcm - Use pgprot_noncached() for MIPS non-coherent archs

MIPS non-coherent archs need the noncached pgprot in mmap of PCM buffers.
But, since the coherency needs to be checked dynamically via
plat_device_is_coherent(), we need an ugly check dependent on MIPS
in ALSA core code.

This should be cleaned up in MIPS arch side (e.g. creating
dma_mmap_coherent()) in near future.

Tested-by: Wu Zhangjin <wuzhangjin@gmail.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>

ALSA: virtuoso: fix Xonar D1/DX front panel microphone

Commit 65c3ac885ce9852852b895a4a62212f62cb5f2e9 in 2.6.33 accidentally
left out the initialization of the AC97 codec FMIC2MIC bit, which broke
recording from the front panel microphone.

Signed-off-by: Clemens Ladisch <clemens@ladisch.de>
Cc: <stable@kernel.org>
Signed-off-by: Jaroslav Kysela <perex@perex.cz>
Signed-off-by: Takashi Iwai <tiwai@suse.de>

ALSA: hda - Add hp-dv4 model for IDT 92HD71bx

It turned out that HP dv series have inconsistent the mute-LED GPIO
mapping among various models.  dv4/7 seem to use GPIO 0 while dv 5/6
seem to use GPIO 3.  The previous commit
  26ebe0a28986f4845b2c5bea43ac5cc0b9f27f0a
  ALSA: hda - Fix mute-LED GPIO pin for HP dv series
breaks dv5/6.

This patch adds the new quirk model, hp-dv4, to handle HP dv4/7
separately from HP dv5/6.

Tested-by: Kunal Gangakhedkar <kunal.gangakhedkar@gmail.com> (for dv6-1110ax)
Acked-by: Kunal Gangakhedkar <kunal.gangakhedkar@gmail.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>

[S390] correct address of _stext with CONFIG_SHARED_KERNEL=y

As of git commit 1844c9bc0b2fed3023551c1affe033ab38e90b9a head64.S/head31.S
are not included in head.S anymore but build as an extra object. This breaks
shared kernel support because the .org statement in head64.S/head31.S for
CONFIG_SHARED_KERNEL=y will have a different effect. The end address of the
head.text section in head.o will be added to the .org value, to compensate
for this subtract 0x11000 to get the required value of 0x100000 again.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>

[S390] ptrace: fix return value of do_syscall_trace_enter()

strace may change the system call number, so regs->gprs[2] must not
be read before tracehook_report_syscall_entry(). This fixes a bug
where "strace -f" will hang after a vfork().

Cc: <stable@kernel.org>
Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>

[S390] dasd: fix race between tasklet and dasd_sleep_on

The various dasd_sleep_on functions use a global wait queue when
waiting for a cqr. The wait condition checks the status and devlist
fields of the cqr to determine if it is safe to continue. This
evaluation may return true, although the tasklet has not finished
processing of the cqr and the callback function has not been called
yet. When the callback is finally called, the data in the cqr may
already be invalid. The sleep_on wait condition needs a safe way to
determine if the tasklet has finished processing. Use the
callback_data field of the cqr to store a token, which is set by
the callback function itself.

Cc: <stable@kernel.org>
Signed-off-by: Stefan Weinhuber <wein@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>

powerpc/perf_event: Fix oops due to perf_event_do_pending call

Anton Blanchard found that large POWER systems would occasionally
crash in the exception exit path when profiling with perf_events.
The symptom was that an interrupt would occur late in the exit path
when the MSR[RI] (recoverable interrupt) bit was clear.  Interrupts
should be hard-disabled at this point but they were enabled.  Because
the interrupt was not recoverable the system panicked.

The reason is that the exception exit path was calling
perf_event_do_pending after hard-disabling interrupts, and
perf_event_do_pending will re-enable interrupts.

The simplest and cleanest fix for this is to use the same mechanism
that 32-bit powerpc does, namely to cause a self-IPI by setting the
decrementer to 1.  This means we can remove the tests in the exception
exit path and raw_local_irq_restore.

This also makes sure that the call to perf_event_do_pending from
timer_interrupt() happens within irq_enter/irq_exit.  (Note that
calling perf_event_do_pending from timer_interrupt does not mean that
there is a possible 1/HZ latency; setting the decrementer to 1 ensures
that the timer interrupt will happen immediately, i.e. within one
timebase tick, which is a few nanoseconds or 10s of nanoseconds.)

Signed-off-by: Paul Mackerras <paulus@samba.org>
Cc: stable@kernel.org
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

ceph: preserve seq # on requeued messages after transient transport errors

If the tcp connection drops and we reconnect to reestablish a stateful
session (with the mds), we need to resend previously sent (and possibly
received) messages with the _same_ seq # so that they can be dropped on
the other end if needed. Only assign a new seq once after the message is
queued.

Signed-off-by: Sage Weil <sage@newdream.net>

ceph: fix cap removal races

The iterate_session_caps helper traverses the session caps list and tries
to grab an inode reference. However, the __ceph_remove_cap was clearing
the inode backpointer _before_ removing itself from the session list,
causing a null pointer dereference.

Clear cap->ci under protection of s_cap_lock to avoid the race, and to
tightly couple the list and backpointer state. Use a local flag to
indicate whether we are releasing the cap, as cap->session may be modified
by a racing thread in iterate_session_caps.

Signed-off-by: Sage Weil <sage@newdream.net>

Merge branch 'hwmon-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jdelvare/staging

* 'hwmon-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jdelvare/staging:
hwmon: (applesmc) Correct sysfs fan error handling
hwmon: (asc7621) Bug fixes

Merge branch 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip

* 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
kprobes/x86: Fix removed int3 checking order
perf: Fix static strings treated like dynamic ones

drivers/gpu/drm/i915/i915_irq.c:i915_error_object_create(): use correct kmap-atomic slot

i915_error_object_create() is called from the timer interrupt and hence
can corrupt the KM_USER0 slot. Use KM_IRQ0 instead.

Reported-by: Jaswinder Singh Rajput <jaswinderlinux@gmail.com>
Tested-by: Jaswinder Singh Rajput <jaswinderlinux@gmail.com>
Acked-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Dave Airlie <airlied@linux.ie>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

hp_accel: fix race in device removal

The work queue has to be flushed after the device has been made
inaccessible. The patch closes a window during which a work queue might
remain active after the device is removed and would then lead to ACPI
calls with undefined behavior.

Signed-off-by: Oliver Neukum <oneukum@suse.de>
Acked-by: Eric Piel <eric.piel@tremplin-utc.net>
Acked-by: Pavel Machek <pavel@ucw.cz>
Cc: Pavel Herrmann <morpheus.ibis@gmail.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

mqueue: fix kernel BUG caused by double free() on mq_open()

In case of aborting because we reach the maximum amount of memory which
can be allocated to message queues per user (RLIMIT_MSGQUEUE), we would
try to free the message area twice when bailing out: first by the error
handling code itself, and then later when cleaning up the inode through
delete_inode().

Signed-off-by: André Goddard Rosa <andre.goddard@gmail.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

fbdev: bfin-t350mcqb-fb: fix fbmem allocation with blanking lines

The current allocation does not include the memory required for blanking
lines. So avoid memory corruption when multiple devices are using the DMA
memory near each other.

Signed-off-by: Michael Hennerich <michael.hennerich@analog.com>
Signed-off-by: Mike Frysinger <vapier@gentoo.org>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

memcg: fix css_is_ancestor() RCU locking

Some callers (in memcontrol.c) calls css_is_ancestor() without
rcu_read_lock.  Because css_is_ancestor() has to access RCU protected
data, it should be under rcu_read_lock().

This makes css_is_ancestor() itself does safe access to RCU protected
area.  (At least, "root" can have refcnt==0 if it's not an ancestor of
"child".  So, we need rcu_read_lock().)

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

memcg: fix css_id() RCU locking for real

Commit ad4ba375373937817404fd92239ef4cadbded23b ("memcg: css_id() must be
called under rcu_read_lock()") modifies memcontol.c for fixing RCU check
message.  But Andrew Morton pointed out that the fix doesn't seems sane
and it was just for hidining lockdep messages.

This is a patch for do proper things.  Checking again, all places,
accessing without rcu_read_lock, that commit fixies was intentional....
all callers of css_id() has reference count on it.  So, it's not necessary
to be under rcu_read_lock().

Considering again, we can use rcu_dereference_check for css_id().  We know
css->id is valid if css->refcnt > 0.  (css->id never changes and freed
after css->refcnt going to be 0.)

This patch makes use of rcu_dereference_check() in css_id/depth and remove
unnecessary rcu-read-lock added by the commit.

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

bsdacct: use del_timer_sync() in acct_exit_ns()

acct_exit_ns --> acct_file_reopen deletes timer without check timer
execution on other CPUs. So acct_timeout() can change an unmapped memory.

Signed-off-by: Vitaliy Gusev <vgusev@openvz.org>
Cc: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

rmap: remove anon_vma check in page_address_in_vma()

Currently page_address_in_vma() compares vma->anon_vma and
page_anon_vma(page) for parameter check, but in 2.6.34 a vma can have
multiple anon_vmas with anon_vma_chain, so current check does not work.
(For anonymous page shared by multiple processes, some verified (page,vma)
pairs return -EFAULT wrongly.)

We can go to checking all anon_vmas in the "same_vma" chain, but it needs
to meet lock requirement. Instead, we can remove anon_vma check safely
because page_address_in_vma() assumes that page and vma are already
checked to belong to the identical process.

Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Reviewed-by: Rik van Riel <riel@redhat.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

hugetlbfs: kill applications that use MAP_NORESERVE with SIGBUS instead of OOM-killer

Ordinarily, application using hugetlbfs will create mappings with
reserves.  For shared mappings, these pages are reserved before mmap()
returns success and for private mappings, the caller process is guaranteed
and a child process that cannot get the pages gets killed with sigbus.

An application that uses MAP_NORESERVE gets no reservations and mmap()
will always succeed at the risk the page will not be available at fault
time.  This might be used for example on very large sparse mappings where
the developer is confident the necessary huge pages exist to satisfy all
faults even though the whole mapping cannot be backed by huge pages.
Unfortunately, if an allocation does fail, VM_FAULT_OOM is returned to the
fault handler which proceeds to trigger the OOM-killer.  This is
unhelpful.

Even without hugetlbfs mounted, a user using mmap() can trivially trigger
the OOM-killer because VM_FAULT_OOM is returned (will provide example
program if desired - it's a whopping 24 lines long).  It could be
considered a DOS available to an unprivileged user.

This patch alters hugetlbfs to kill a process that uses MAP_NORESERVE
where huge pages were not available with SIGBUS instead of triggering the
OOM killer.

This change affects hugetlb_cow() as well.  I feel there is a failure case
in there, but I didn't create one.  It would need a fairly specific target
in terms of the faulting application and the hugepage pool size.  The
hugetlb_no_page() path is much easier to hit but both might as well be
closed.

Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>