]> git.kernelconcepts.de Git - karo-tx-linux.git/log
karo-tx-linux.git
9 years agoMerge branch 'fixes' of git://git.infradead.org/users/vkoul/slave-dma
Linus Torvalds [Thu, 2 Apr 2015 18:30:36 +0000 (11:30 -0700)]
Merge branch 'fixes' of git://git.infradead.org/users/vkoul/slave-dma

Pull dmaengine fixes from Vinod Koul:
 "This time we have addition of caps for jz4740 which fixes intentional
  warning at boot.  Then we have memory leak issues in drivers using
  virt-dma by Peter on few drive"

* 'fixes' of git://git.infradead.org/users/vkoul/slave-dma:
  dmaengine: moxart-dma: Fix memory leak when stopping a running transfer
  dmaengine: bcm2835-dma: Fix memory leak when stopping a running transfer
  dmaengine: omap-dma: Fix memory leak when terminating running transfer
  dmaengine: edma: fix memory leak when terminating running transfers
  dmaengine: jz4740: Define capabilities

9 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Linus Torvalds [Thu, 2 Apr 2015 18:09:41 +0000 (11:09 -0700)]
Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net

Pull networking fixes from David Miller:

 1) Fix use-after-free with mac80211 RX A-MPDU reorder timer, from
    Johannes Berg.

 2) iwlwifi leaks memory every module load/unload cycles, fix from Larry
    Finger.

 3) Need to use for_each_netdev_safe() in rtnl_group_changelink()
    otherwise we can crash, from WANG Cong.

 4) mlx4 driver does register_netdev() too early in the probe sequence,
    from Ido Shamay.

 5) Don't allow router discovery hop limit to decrease the interface's
    hop limit, from D.S. Ljungmark.

 6) tx_packets and tx_bytes improperly accounted for certain classes of
    USB network devices, fix from Ben Hutchings.

 7) ip{6}mr_rules_init() mistakenly use plain kfree to release the ipmr
    tables in the error path, they must instead use ip{6}mr_free_table().
    Fix from WANG Cong.

 8) cxgb4 doesn't properly quiesce all RX activity before unregistering
    the netdevice.  Fix from Hariprasad Shenai.

 9) Fix hash corruptions in ipvlan driver, from Jiri Benc.

10) nla_memcpy(), like a real memcpy, should fully initialize the
    destination buffer, even if the source attribute is smaller.  Fix
    from Jiri Benc.

11) Fix wrong error code returned from iucv_sock_sendmsg().  We should
    use whatever sock_alloc_send_skb() put into 'err'.  From Eugene
    Crosser.

12) Fix slab object leak on module unload in TIPC, from Ying Xue.

13) Need a READ_ONCE() when reading the cached RX socket route in
    tcp_v{4,6}_early_demux().  From Michal Kubecek.

14) Still too many problems with TPC support in the ath9k driver, so
    disable it for now.  From Felix Fietkau.

15) When in AP mode the rtlwifi driver can leak DMA mappings, fix from
    Larry Finger.

16) Missing kzalloc() failure check in gs_usb CAN driver, from Colin Ian
    King.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (52 commits)
  cxgb4: Fix to dump devlog, even if FW is crashed
  cxgb4: Firmware macro changes for fw verison 1.13.32.0
  bnx2x: Fix kdump when iommu=on
  bnx2x: Fix kdump on 4-port device
  mac80211: fix RX A-MPDU session reorder timer deletion
  MAINTAINERS: Update Intel Wired Ethernet Driver info
  tipc: fix a slab object leak
  net/usb/r8152: add device id for Lenovo TP USB 3.0 Ethernet
  af_iucv: fix AF_IUCV sendmsg() errno
  openvswitch: Return vport module ref before destruction
  netlink: pad nla_memcpy dest buffer with zeroes
  bonding: Bonding Overriding Configuration logic restored.
  ipvlan: fix check for IP addresses in control path
  ipvlan: do not use rcu operations for address list
  ipvlan: protect against concurrent link removal
  ipvlan: fix addr hash list corruption
  net: fec: setup right value for mdio hold time
  net: tcp6: fix double call of tcp_v6_fill_cb()
  cxgb4vf: Fix sparse warnings
  netns: don't clear nsid too early on removal
  ...

9 years agoMerge tag 'wireless-drivers-for-davem-2015-04-01' of git://git.kernel.org/pub/scm...
David S. Miller [Wed, 1 Apr 2015 18:48:50 +0000 (14:48 -0400)]
Merge tag 'wireless-drivers-for-davem-2015-04-01' of git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers

Kalle Valo says:

====================
iwlwifi:

* fix a memory leak, we leaked memory each time the module
  was loaded.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agoMerge branch 'cxgb4-net'
David S. Miller [Wed, 1 Apr 2015 18:47:21 +0000 (14:47 -0400)]
Merge branch 'cxgb4-net'

Hariprasad Shenai says:

====================
cxgb4 FW macro changes for new FW

Fix to dump device log even in the case of firmware crash. Also
incorporates changes for new FW.

This patch series has been created against net tree and includes patches on
cxgb4 driver.

We have included all the maintainers of respective drivers. Kindly review the
change and let us know in case of any review comments.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agocxgb4: Fix to dump devlog, even if FW is crashed
Hariprasad Shenai [Wed, 1 Apr 2015 16:11:16 +0000 (21:41 +0530)]
cxgb4: Fix to dump devlog, even if FW is crashed

Add new Common Code routines to retrieve Firmware Device Log
parameters from PCIE_FW_PF[7]. The firmware initializes its Device Log very
early on and stores the parameters for its location/size in that register.
Using the parameters from the register allows us to access the Firmware
Device Log even when the firmware crashes very early on or we're not
attached to the firmware

Based on original work by Casey Leedom <leedom@chelsio.com>

Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agocxgb4: Firmware macro changes for fw verison 1.13.32.0
Hariprasad Shenai [Wed, 1 Apr 2015 16:11:15 +0000 (21:41 +0530)]
cxgb4: Firmware macro changes for fw verison 1.13.32.0

Adds new macro and few macro changes for fw version 1.13.32.0 also
changes version string in driver to match 1.13.32.0

Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agoMerge tag 'mac80211-for-davem-2015-04-01' of git://git.kernel.org/pub/scm/linux/kerne...
David S. Miller [Wed, 1 Apr 2015 18:19:22 +0000 (14:19 -0400)]
Merge tag 'mac80211-for-davem-2015-04-01' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211

Johannes Berg says:

====================
This contains just a single fix for a crash I happened to randomly
run into today during testing. It's clearly been around for a while,
but is pretty hard to trigger, even when I tried explicitly (and
modified the code to make it more likely) it rarely did.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agoMerge tag 'iommu-fixes-v4.0-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Wed, 1 Apr 2015 17:29:55 +0000 (10:29 -0700)]
Merge tag 'iommu-fixes-v4.0-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu

Pull IOMMU fixes from Joerg Roedel:
 "This contains fixes for:

   - a VT-d issue where hardware domain-ids might be freed while still
     in use.

   - an ipmmu-vmsa issue where where the device-table was not zero
     terminated

   - unchecked register access issue in the arm-smmu driver"

* tag 'iommu-fixes-v4.0-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
  iommu/vt-d: Remove unused variable
  iommu: ipmmu-vmsa: Add terminating entry for ipmmu_of_ids
  iommu/vt-d: Detach domain *only* from attached iommus
  iommu/arm-smmu: fix ARM_SMMU_FEAT_TRANS_OPS condition

9 years agolguest: now needs PCI_DIRECT.
Rusty Russell [Wed, 1 Apr 2015 06:33:30 +0000 (17:03 +1030)]
lguest: now needs PCI_DIRECT.

Since commit 8e7094694396 ("lguest: add a dummy PCI host bridge.")
lguest uses PCI, but it needs you to frob the ports directly.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Acked-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
9 years agoMerge tag 'lazytime_fix' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
Linus Torvalds [Wed, 1 Apr 2015 17:05:42 +0000 (10:05 -0700)]
Merge tag 'lazytime_fix' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4

Pull lazytime fixes from Ted Ts'o:
 "This fixes a problem in the lazy time patches, which can cause
  frequently updated inods to never have their timestamps updated.

  These changes guarantee that no timestamp on disk will be stale by
  more than 24 hours"

* tag 'lazytime_fix' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
  fs: add dirtytime_expire_seconds sysctl
  fs: make sure the timestamps for lazytime inodes eventually get written

9 years agoMerge branch 'for-4.0' of git://linux-nfs.org/~bfields/linux
Linus Torvalds [Wed, 1 Apr 2015 16:45:47 +0000 (09:45 -0700)]
Merge branch 'for-4.0' of git://linux-nfs.org/~bfields/linux

Pull nfsd fixes from Bruce Fields:
 "Two main issues:

   - We found that turning on pNFS by default (when it's configured at
     build time) was too aggressive, so we want to switch the default
     before the 4.0 release.

   - Recent client changes to increase open parallelism uncovered a
     serious bug lurking in the server's open code.

  Also fix a krb5/selinux regression.

  The rest is mainly smaller pNFS fixes"

* 'for-4.0' of git://linux-nfs.org/~bfields/linux:
  sunrpc: make debugfs file creation failure non-fatal
  nfsd: require an explicit option to enable pNFS
  NFSD: Fix bad update of layout in nfsd4_return_file_layout
  NFSD: Take care the return value from nfsd4_encode_stateid
  NFSD: Printk blocklayout length and offset as format 0x%llx
  nfsd: return correct lockowner when there is a race on hash insert
  nfsd: return correct openowner when there is a race to put one in the hash
  NFSD: Put exports after nfsd4_layout_verify fail
  NFSD: Error out when register_shrinker() fail
  NFSD: Take care the return value from nfsd4_decode_stateid
  NFSD: Check layout type when returning client layouts
  NFSD: restore trace event lost in mismerge

9 years agoMerge branch 'bnx2'
David S. Miller [Wed, 1 Apr 2015 16:30:39 +0000 (12:30 -0400)]
Merge branch 'bnx2'

Yuval Mintz says:

====================
bnx2x: kdump related fixes

This patch series aims to fix bnx2x driver issues when loading in kdump kernel.
Both issues fixed here would be fatal to the device, requiring full reset of
the system in order to recover, preventing the device from serving its purpose
in the kdump environment.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agobnx2x: Fix kdump when iommu=on
Yuval Mintz [Wed, 1 Apr 2015 07:02:20 +0000 (10:02 +0300)]
bnx2x: Fix kdump when iommu=on

When IOMM-vtd is active, once main kernel crashes unfinished DMAE transactions
will be blocked, putting the HW in an error state which will cause further
transactions to timeout.

Current employed logic uses wrong macros, causing the first function to be the
only function that cleanups that error state during its probe/load.

This patch allows all the functions to successfully re-load in kdump kernel.

Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: Ariel Elior <Ariel.Elior@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agobnx2x: Fix kdump on 4-port device
Yuval Mintz [Wed, 1 Apr 2015 07:02:19 +0000 (10:02 +0300)]
bnx2x: Fix kdump on 4-port device

When running in a kdump kernel, it's very likely that due to sync. loss with
management firmware the first PCI function to probe and reach the previous
unload flow would decide it can reset the chip and continue onward. While doing
so, it will only close its own Rx port.

On a 4-port device where 2nd port on engine is a 1g-port, the 2nd port would
allow ingress traffic after the chip is reset [assuming it was active on the
first kernel]. This would later cause a HW attention.

This changes driver flow to close both ports' 1g capabilities during the
previous driver unload flow prior to the chip reset.

Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: Ariel Elior <Ariel.Elior@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agomac80211: fix RX A-MPDU session reorder timer deletion
Johannes Berg [Wed, 1 Apr 2015 12:20:42 +0000 (14:20 +0200)]
mac80211: fix RX A-MPDU session reorder timer deletion

There's an issue with the way the RX A-MPDU reorder timer is
deleted that can cause a kernel crash like this:

 * tid_rx is removed - call_rcu(ieee80211_free_tid_rx)
 * station is destroyed
 * reorder timer fires before ieee80211_free_tid_rx() runs,
   accessing the station, thus potentially crashing due to
   the use-after-free

The station deletion is protected by synchronize_net(), but
that isn't enough -- ieee80211_free_tid_rx() need not have
run when that returns (it deletes the timer.) We could use
rcu_barrier() instead of synchronize_net(), but that's much
more expensive.

Instead, to fix this, add a field tracking that the session
is being deleted. In this case, the only re-arming of the
timer happens with the reorder spinlock held, so make that
code not rearm it if the session is being deleted and also
delete the timer after setting that field. This ensures the
timer cannot fire after ___ieee80211_stop_rx_ba_session()
returns, which fixes the problem.

Cc: stable@vger.kernel.org
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
9 years agoMAINTAINERS: Update Intel Wired Ethernet Driver info
Jeff Kirsher [Thu, 26 Mar 2015 00:01:03 +0000 (17:01 -0700)]
MAINTAINERS: Update Intel Wired Ethernet Driver info

Update the git tree info with a recent change in tree names.  Also
add our new mailing list created solely for Linux kernel patches
and kernel development, as well as the new patchwork project for
tracking patches.  Lastly update the list of "reviewers" since a
couple of developers have moved on to different projects.

Made an update to the section header so that it is more manageable
going forward as we add new drivers.

Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
9 years agotipc: fix a slab object leak
Ying Xue [Wed, 1 Apr 2015 01:42:50 +0000 (09:42 +0800)]
tipc: fix a slab object leak

When remove TIPC module, there is a warning to remind us that a slab
object is leaked like:

root@localhost:~# rmmod tipc
[   19.056226] =============================================================================
[   19.057549] BUG TIPC (Not tainted): Objects remaining in TIPC on kmem_cache_close()
[   19.058736] -----------------------------------------------------------------------------
[   19.058736]
[   19.060287] INFO: Slab 0xffffea0000519a00 objects=23 used=1 fp=0xffff880014668b00 flags=0x100000000004080
[   19.061915] INFO: Object 0xffff880014668000 @offset=0
[   19.062717] kmem_cache_destroy TIPC: Slab cache still has objects

This is because the listening socket of TIPC topology server is not
closed before TIPC proto handler is unregistered with proto_unregister().
However, as the socket is closed in tipc_exit_net() which is called by
unregister_pernet_subsys() during unregistering TIPC namespace operation,
the warning can be eliminated if calling unregister_pernet_subsys() is
moved before calling proto_unregister().

Fixes: e05b31f4bf89 ("tipc: make tipc socket support net namespace")
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agonet/usb/r8152: add device id for Lenovo TP USB 3.0 Ethernet
Christian Hesse [Tue, 31 Mar 2015 12:10:07 +0000 (14:10 +0200)]
net/usb/r8152: add device id for Lenovo TP USB 3.0 Ethernet

This device is sold as 'Lenovo Tinkpad USB 3.0 Ethernet 4X90E51405'.
Chipset is RTL8153 and works with r8152.

Signed-off-by: Christian Hesse <mail@eworm.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agoaf_iucv: fix AF_IUCV sendmsg() errno
Eugene Crosser [Mon, 30 Mar 2015 13:40:42 +0000 (15:40 +0200)]
af_iucv: fix AF_IUCV sendmsg() errno

When sending over AF_IUCV socket, errno was incorrectly set to
ENOMEM even when other values where appropriate, notably EAGAIN.
With this patch, error indicator returned by sock_alloc_send_skb()
is passed to the caller, rather than being overwritten with ENOMEM.

Signed-off-by: Eugene Crosser <Eugene.Crosser@ru.ibm.com>
Signed-off-by: Ursula Braun <ursula.braun@de.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agoopenvswitch: Return vport module ref before destruction
Thomas Graf [Mon, 30 Mar 2015 11:57:41 +0000 (13:57 +0200)]
openvswitch: Return vport module ref before destruction

Return module reference before invoking the respective vport
->destroy() function. This is needed as ovs_vport_del() is not
invoked inside an RCU read side critical section so the kfree
can occur immediately before returning to ovs_vport_del().

Returning the module reference before ->destroy() is safe because
the module unregistration is blocked on ovs_lock which we hold
while destroying the datapath.

Fixes: 62b9c8d0372d ("ovs: Turn vports with dependencies into separate modules")
Reported-by: Pravin Shelar <pshelar@nicira.com>
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agosunrpc: make debugfs file creation failure non-fatal
Jeff Layton [Tue, 31 Mar 2015 16:03:28 +0000 (12:03 -0400)]
sunrpc: make debugfs file creation failure non-fatal

We currently have a problem that SELinux policy is being enforced when
creating debugfs files. If a debugfs file is created as a side effect of
doing some syscall, then that creation can fail if the SELinux policy
for that process prevents it.

This seems wrong. We don't do that for files under /proc, for instance,
so Bruce has proposed a patch to fix that.

While discussing that patch however, Greg K.H. stated:

    "No kernel code should care / fail if a debugfs function fails, so
     please fix up the sunrpc code first."

This patch converts all of the sunrpc debugfs setup code to be void
return functins, and the callers to not look for errors from those
functions.

This should allow rpc_clnt and rpc_xprt creation to work, even if the
kernel fails to create debugfs files for some reason.

Symptoms were failing krb5 mounts on systems using gss-proxy and
selinux.

Fixes: 388f0c776781 "sunrpc: add a debugfs rpc_xprt directory..."
Cc: stable@vger.kernel.org
Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
9 years agonetlink: pad nla_memcpy dest buffer with zeroes
Jiri Benc [Sun, 29 Mar 2015 14:05:28 +0000 (16:05 +0200)]
netlink: pad nla_memcpy dest buffer with zeroes

This is especially important in cases where the kernel allocs a new
structure and expects a field to be set from a netlink attribute. If such
attribute is shorter than expected, the rest of the field is left containing
previous data. When such field is read back by the user space, kernel memory
content is leaked.

Signed-off-by: Jiri Benc <jbenc@redhat.com>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agobonding: Bonding Overriding Configuration logic restored.
Anton Nayshtut [Sun, 29 Mar 2015 11:20:25 +0000 (14:20 +0300)]
bonding: Bonding Overriding Configuration logic restored.

Before commit 3900f29021f0bc7fe9815aa32f1a993b7dfdd402 ("bonding: slight
optimizztion for bond_slave_override()") the override logic was to send packets
with non-zero queue_id through the slave with corresponding queue_id, under two
conditions only - if the slave can transmit and it's up.

The above mentioned commit changed this logic by introducing an additional
condition - whether the bond is active (indirectly, using the slave_can_tx and
later - bond_is_active_slave), that prevents the user from implementing more
complex policies according to the Documentation/networking/bonding.txt.

Signed-off-by: Anton Nayshtut <anton@swortex.com>
Signed-off-by: Alexey Bogoslavsky <alexey@swortex.com>
Signed-off-by: Andy Gospodarek <gospo@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agoMerge branch 'ipvlan-corruptions'
David S. Miller [Tue, 31 Mar 2015 17:28:38 +0000 (13:28 -0400)]
Merge branch 'ipvlan-corruptions'

Jiri Benc says:

====================
ipvlan: list corruption and rcu fixes

This patch set fixes different issues leading to corrupted lists and
incorrect rcu usage.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agoipvlan: fix check for IP addresses in control path
Jiri Benc [Sat, 28 Mar 2015 18:13:25 +0000 (19:13 +0100)]
ipvlan: fix check for IP addresses in control path

When an ipvlan interface is down, its addresses are not on the hash list.
Fix checks for existence of addresses not to depend on the hash list, walk
through all interface addresses instead.

Signed-off-by: Jiri Benc <jbenc@redhat.com>
Acked-by: Mahesh Bandewar <maheshb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agoipvlan: do not use rcu operations for address list
Jiri Benc [Sat, 28 Mar 2015 18:13:24 +0000 (19:13 +0100)]
ipvlan: do not use rcu operations for address list

All accesses to ipvlan->addrs are under rtnl.

Signed-off-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agoipvlan: protect against concurrent link removal
Jiri Benc [Sat, 28 Mar 2015 18:13:23 +0000 (19:13 +0100)]
ipvlan: protect against concurrent link removal

Adding and removing to the 'ipvlans' list is already done using _rcu list
operations.

Signed-off-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agoipvlan: fix addr hash list corruption
Jiri Benc [Sat, 28 Mar 2015 18:13:22 +0000 (19:13 +0100)]
ipvlan: fix addr hash list corruption

When ipvlan interface with IP addresses attached is brought down and then
deleted, the assigned addresses are deleted twice from the address hash
list, first on the interface down and second on the link deletion.
Similarly, when an address is added while the interface is down, it is added
second time once the interface is brought up.

When the interface is down, the addresses should be kept off the hash list
for performance reasons. Ensure this is true, which also fixes the double add
problem. To fix the double free, check whether the address is hashed before
removing it.

Reported-by: Dan Williams <dcbw@redhat.com>
Signed-off-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: Mahesh Bandewar <maheshb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agoMerge tag 'locks-v4.0-5' of git://git.samba.org/jlayton/linux
Linus Torvalds [Mon, 30 Mar 2015 22:13:04 +0000 (15:13 -0700)]
Merge tag 'locks-v4.0-5' of git://git.samba.org/jlayton/linux

Pull file locking fix from Jeff Layton:
 "Another small fix for the lease overhaul"

* tag 'locks-v4.0-5' of git://git.samba.org/jlayton/linux:
  locks: fix file_lock deletion inside loop

9 years agonfsd: require an explicit option to enable pNFS
Christoph Hellwig [Mon, 30 Mar 2015 16:46:29 +0000 (18:46 +0200)]
nfsd: require an explicit option to enable pNFS

Turns out sending out layouts to any client is a bad idea if they
can't get at the storage device, so require explicit admin action
to enable pNFS.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
9 years agoMerge branch 'for-4.0-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj...
Linus Torvalds [Mon, 30 Mar 2015 18:08:37 +0000 (11:08 -0700)]
Merge branch 'for-4.0-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata

Pull libata fixes from Tejun Heo:
 "Nothing exciting.  Two patches to update queued trim blacklist"

* 'for-4.0-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata:
  libata: Blacklist queued TRIM on Samsung SSD 850 Pro
  libata: Update Crucial/Micron blacklist

9 years agodmaengine: moxart-dma: Fix memory leak when stopping a running transfer
Peter Ujfalusi [Fri, 27 Mar 2015 11:35:55 +0000 (13:35 +0200)]
dmaengine: moxart-dma: Fix memory leak when stopping a running transfer

The vd->node is removed from the lists when the transfer started so the
vchan_get_all_descriptors() will not find it. This results memory leak.

Signed-off-by: Peter Ujfalusi <peter.ujfalusi@ti.com>
Signed-off-by: Vinod Koul <vinod.koul@intel.com>
9 years agodmaengine: bcm2835-dma: Fix memory leak when stopping a running transfer
Peter Ujfalusi [Fri, 27 Mar 2015 11:35:53 +0000 (13:35 +0200)]
dmaengine: bcm2835-dma: Fix memory leak when stopping a running transfer

The vd->node is removed from the lists when the transfer started so the
vchan_get_all_descriptors() will not find it. This results memory leak.

Signed-off-by: Peter Ujfalusi <peter.ujfalusi@ti.com>
Acked-by: Stephen Warren <swarren@wwwdotorg.org>
Signed-off-by: Vinod Koul <vinod.koul@intel.com>
9 years agodmaengine: omap-dma: Fix memory leak when terminating running transfer
Peter Ujfalusi [Fri, 27 Mar 2015 11:35:52 +0000 (13:35 +0200)]
dmaengine: omap-dma: Fix memory leak when terminating running transfer

In omap_dma_start_desc the vdesc->node is removed from the virt-dma
framework managed lists (to be precise from the desc_issued list).
If a terminate_all comes before the transfer finishes the omap_desc will
not be freed up because it is not in any of the lists and we stopped the
DMA channel so the transfer will not going to complete.
There is no special sequence for leaking memory when using cyclic (audio)
transfer: with every start and stop of a cyclic transfer the driver leaks
struct omap_desc worth of memory.

Free up the allocated memory directly in omap_dma_terminate_all() since the
framework will not going to do that for us.

Signed-off-by: Peter Ujfalusi <peter.ujfalusi@ti.com>
CC: <stable@vger.kernel.org>
CC: <linux-omap@vger.kernel.org>
Signed-off-by: Vinod Koul <vinod.koul@intel.com>
9 years agodmaengine: edma: fix memory leak when terminating running transfers
Petr Kulhavy [Fri, 27 Mar 2015 11:35:51 +0000 (13:35 +0200)]
dmaengine: edma: fix memory leak when terminating running transfers

If edma_terminate_all() was called while a transfer was running (i.e. after
edma_execute() but before edma_callback()) the echan->edesc was not freed.

This was due to the fact that a running transfer is on none of the
vchan lists: desc_submitted, desc_issued, desc_completed (edma_execute()
removes it from the desc_issued list), so the vchan_dma_desc_free_list()
called at the end of edma_terminate_all() didn't find it and didn't free it.

This bug was found on an AM1808 based hardware (very similar to da850evm,
however using the second MMC/SD controller), where intense operations on the SD
card wasted the device 128MB RAM within a couple of days.

Peter Ujfalusi:
The issue is even more severe since it affects cyclic (audio) transfers as
well. In this case starting/stopping audio will results memory leak.

Signed-off-by: Petr Kulhavy <petr@barix.com>
Signed-off-by: Peter Ujfalusi <peter.ujfalusi@ti.com>
CC: <stable@vger.kernel.org>
CC: <linux-omap@vger.kernel.org>
Signed-off-by: Vinod Koul <vinod.koul@intel.com>
9 years agodmaengine: jz4740: Define capabilities
Lars-Peter Clausen [Sat, 28 Mar 2015 17:05:44 +0000 (18:05 +0100)]
dmaengine: jz4740: Define capabilities

Setup the capabilities of the device/driver, so that users of the DMAengine API
can query them.

Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
Signed-off-by: Vinod Koul <vinod.koul@intel.com>
9 years agoMerge tag 'gpio-v4.0-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux...
Linus Torvalds [Mon, 30 Mar 2015 16:14:41 +0000 (09:14 -0700)]
Merge tag 'gpio-v4.0-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio

Pull late GPIO fixes from Linus Walleij:
 "Here are the (hopefully) last GPIO fixes for v4.0.  Nothing
  controversial whatsoever, just fixes:

   - syscon GPIO fix for Keystone DSP GPIOs

   - pin number translation fix for ACPI GPIO

   - a smallish compiler warning fix on the mpc8xxx driver"

* tag 'gpio-v4.0-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio:
  gpio: syscon: reduce message level when direction reg offset not in dt
  gpiolib: translate pin number in GPIO ACPI callbacks
  gpio: mpc8xxx: remove __initdata annotation for mpc8xxx_gpio_ids[]

9 years agoMerge tag 'iwlwifi-for-kalle-2015-03-30' of https://git.kernel.org/pub/scm/linux...
Kalle Valo [Mon, 30 Mar 2015 06:39:12 +0000 (09:39 +0300)]
Merge tag 'iwlwifi-for-kalle-2015-03-30' of https://git.kernel.org/pub/scm/linux/kernel/git/iwlwifi/iwlwifi-fixes

* fix a memory leak: we leaked memory each time the module
was loaded.

9 years agoLinux 4.0-rc6 v4.0-rc6
Linus Torvalds [Sun, 29 Mar 2015 22:26:31 +0000 (15:26 -0700)]
Linux 4.0-rc6

9 years agoMerge tag 'armsoc-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm...
Linus Torvalds [Sun, 29 Mar 2015 22:09:31 +0000 (15:09 -0700)]
Merge tag 'armsoc-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc

Pull ARM SoC fixes from Olof Johansson:
 "The latest and greatest fixes for ARM platform code.  Worth pointing
  out are:

   - Lines-wise, largest is a PXA fix for dealing with interrupts on DT
     that was quite broken.  It's still newish code so while we could
     have held this off, it seemed appropriate to include now

   - Some GPIO fixes for OMAP platforms added a few lines.  This was
     also fixes for code recently added (this release).

   - Small OMAP timer fix to behave better with partially upstreamed
     platforms, which is quite welcome.

   - Allwinner fixes about operating point control, reducing
     overclocking in some cases for better stability.

  plus a handful of other smaller fixes across the map"

* tag 'armsoc-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
  arm64: juno: Fix misleading name of UART reference clock
  ARM: dts: sunxi: Remove overclocked/overvoltaged OPP
  ARM: dts: sun4i: a10-lime: Override and remove 1008MHz OPP setting
  ARM: socfpga: dts: fix spi1 interrupt
  ARM: dts: Fix gpio interrupts for dm816x
  ARM: dts: dra7: remove ti,hwmod property from pcie phy
  ARM: OMAP: dmtimer: disable pm runtime on remove
  ARM: OMAP: dmtimer: check for pm_runtime_get_sync() failure
  ARM: OMAP2+: Fix socbus family info for AM33xx devices
  ARM: dts: omap3: Add missing dmas for crypto
  ARM: dts: rockchip: disable gmac by default in rk3288.dtsi
  MAINTAINERS: add rockchip regexp to the ARM/Rockchip entry
  ARM: pxa: fix pxa interrupts handling in DT
  ARM: pxa: Fix typo in zeus.c
  ARM: sunxi: Have ARCH_SUNXI select RESET_CONTROLLER for clock driver usage

9 years agoMerge tag 'sunxi-fixes-for-4.0' of https://git.kernel.org/pub/scm/linux/kernel/git...
Olof Johansson [Sun, 29 Mar 2015 21:00:53 +0000 (14:00 -0700)]
Merge tag 'sunxi-fixes-for-4.0' of https://git.kernel.org/pub/scm/linux/kernel/git/mripard/linux into fixes

Allwinner fixes for 4.0

There's a few fixes to merge for 4.0, one to add a select in the machine
Kconfig option to fix a potential build failure, and two fixing cpufreq related
issues.

* tag 'sunxi-fixes-for-4.0' of https://git.kernel.org/pub/scm/linux/kernel/git/mripard/linux:
  ARM: dts: sunxi: Remove overclocked/overvoltaged OPP
  ARM: dts: sun4i: a10-lime: Override and remove 1008MHz OPP setting
  ARM: sunxi: Have ARCH_SUNXI select RESET_CONTROLLER for clock driver usage

Signed-off-by: Olof Johansson <olof@lixom.net>
9 years agoMerge tag 'fixes-v4.0-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind...
Olof Johansson [Sun, 29 Mar 2015 20:58:54 +0000 (13:58 -0700)]
Merge tag 'fixes-v4.0-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap into fixes

Fixes for omaps for the -rc cycle:

- Fix a device tree based booting vs legacy booting regression for
  omap3 crypto hardware by adding the missing DMA channels.

- Fix /sys/bus/soc/devices/soc0/family for am33xx devices.

- Fix two timer issues that can cause hangs if the timer related
  hwmod data is missing like it often initially is for new SoCs.

- Remove pcie hwmods entry from dts as that causes runtime PM to
  fail for the PHYs.

- A paper bag type dts configuration fix for dm816x GPIO
  interrupts that I just noticed. This is most of the changes
  diffstat wise, but as it's a basic feature for connecting
  devices and things work otherwise, it should be fixed.

* tag 'fixes-v4.0-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap:
  ARM: dts: Fix gpio interrupts for dm816x
  ARM: dts: dra7: remove ti,hwmod property from pcie phy
  ARM: OMAP: dmtimer: disable pm runtime on remove
  ARM: OMAP: dmtimer: check for pm_runtime_get_sync() failure
  ARM: OMAP2+: Fix socbus family info for AM33xx devices
  ARM: dts: omap3: Add missing dmas for crypto

Signed-off-by: Olof Johansson <olof@lixom.net>
9 years agoMerge tag 'socfpga_fix_for_v4.0_2' of git://git.rocketboards.org/linux-socfpga-next...
Olof Johansson [Sun, 29 Mar 2015 20:58:04 +0000 (13:58 -0700)]
Merge tag 'socfpga_fix_for_v4.0_2' of git://git.rocketboards.org/linux-socfpga-next into fixes

Late fix for v4.0 on the SoCFPGA platform:
- Fix interrupt number for SPI1 interface

* tag 'socfpga_fix_for_v4.0_2' of git://git.rocketboards.org/linux-socfpga-next:
  ARM: socfpga: dts: fix spi1 interrupt

Signed-off-by: Olof Johansson <olof@lixom.net>
9 years agoarm64: juno: Fix misleading name of UART reference clock
Dave Martin [Tue, 17 Mar 2015 12:35:41 +0000 (12:35 +0000)]
arm64: juno: Fix misleading name of UART reference clock

The UART reference clock speed is 7273.8 kHz, not 72738 kHz.

Dots aren't usually used in node names even though ePAPR permits
them.  However, this can easily be avoided by expressing the
frequency in Hz, not kHz.

This patch changes the name to refclk7273800hz, reflecting the
actual clock speed.

Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Acked-by: Liviu Dudau <Liviu.Dudau@arm.com>
Signed-off-by: Olof Johansson <olof@lixom.net>
9 years agoMerge tag 'fixes-for-v4.0-rc5' of https://github.com/rjarzmik/linux into fixes
Olof Johansson [Sun, 29 Mar 2015 20:47:21 +0000 (13:47 -0700)]
Merge tag 'fixes-for-v4.0-rc5' of https://github.com/rjarzmik/linux into fixes

arm: pxa: fixes for v4.0-rc5

There are only 2 fixes, one for the zeus board about the regulator changes,
where a typo prevented the zeus board from having a working can regulator,
and one regression triggered by the interrupts IRQ shift of 16 affecting all
boards.

* tag 'fixes-for-v4.0-rc5' of https://github.com/rjarzmik/linux:
  ARM: pxa: fix pxa interrupts handling in DT
  ARM: pxa: Fix typo in zeus.c

Signed-off-by: Olof Johansson <olof@lixom.net>
9 years agonet: fec: setup right value for mdio hold time
Uwe Kleine-König [Fri, 27 Mar 2015 10:08:32 +0000 (11:08 +0100)]
net: fec: setup right value for mdio hold time

The FEC modules used on i.MX28 and newer have a register to tune the MDIO
output hold time that should be at least 10 ns. Up to now this value was not
explicitly set and so resulted in less hold time if the fec clock was
faster than 100 MHz.

This was noticed on an i.MX28 machine that uses an input clock of ~150
Mhz which resulted in unreliable communication with a Marvell switch.

Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agonet: tcp6: fix double call of tcp_v6_fill_cb()
Alexey Kodanev [Fri, 27 Mar 2015 09:24:22 +0000 (12:24 +0300)]
net: tcp6: fix double call of tcp_v6_fill_cb()

tcp_v6_fill_cb() will be called twice if socket's state changes from
TCP_TIME_WAIT to TCP_LISTEN. That can result in control buffer data
corruption because in the second tcp_v6_fill_cb() call it's not copying
IP6CB(skb) anymore, but 'seq', 'end_seq', etc., so we can get weird and
unpredictable results. Performance loss of up to 1200% has been observed
in LTP/vxlan03 test.

This can be fixed by copying inet6_skb_parm to the beginning of 'cb'
only if xfrm6_policy_check() and tcp_v6_fill_cb() are going to be
called again.

Fixes: 2dc49d1680b53 ("tcp6: don't move IP6CB before xfrm6_policy_check()")
Signed-off-by: Alexey Kodanev <alexey.kodanev@oracle.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agocxgb4vf: Fix sparse warnings
Hariprasad Shenai [Fri, 27 Mar 2015 05:31:18 +0000 (11:01 +0530)]
cxgb4vf: Fix sparse warnings

Fixes sparse warnings introduced in commit e85c9a7abfa407ed ("cxgb4/cxgb4vf: Add
code to calculate T5 BAR2 Offsets for SGE Queue Registers") and
df64e4d38c904dd3 ("cxgb4/cxgb4vf: Use new interfaces to calculate BAR2 SGE Queue
Register addresses") and few old ones

sparse warnings:
>> drivers/net/ethernet/chelsio/cxgb4vf/sge.c:1006:48: sparse: cast removes
>> address space of expression
>> drivers/net/ethernet/chelsio/cxgb4vf/sge.c:1006:48: sparse: incorrect type in
>> initializer (different address space)
>> drivers/net/ethernet/chelsio/cxgb4vf/sge.c:1020:40: sparse: incorrect type in
>> argument 1 (different base types)

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agonetns: don't clear nsid too early on removal
Nicolas Dichtel [Thu, 26 Mar 2015 16:56:38 +0000 (17:56 +0100)]
netns: don't clear nsid too early on removal

With the current code, ids are removed too early.
Suppose you have an ipip interface that stands in the netns foo and its link
part in the netns bar (so the netns bar has an nsid into the netns foo).
Now, you remove the netns bar:
 - the bar nsid into the netns foo is removed
 - the netns exit method of ipip is called, thus our ipip iface is removed:
   => a netlink message is sent in the netns foo to advertise this deletion
   => this netlink message requests an nsid for bar, thus a new nsid is
      allocated for bar and never removed.

We must remove nsids when we are sure that nobody will refer to netns currently
cleaned.

Fixes: 0c7aecd4bde4 ("netns: add rtnl cmd to add and get peer netns ids")
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agoMerge branch 'cxgb4'
David S. Miller [Sun, 29 Mar 2015 19:24:46 +0000 (12:24 -0700)]
Merge branch 'cxgb4'

Hariprasad Shenai says:

====================
cxgb4: Fixes ingress queue mapping and other fixes

The below series fixes ingress queue mapping by allocating them dynamically to
prevent stack overflow. Disable napi and interrupts before unregistering netdev
to avoid crash while unloading driver when traffic is flowing.

The patches series is created against 'net' tree.
And includes patches on cxgb4 driver.

We have included all the maintainers of respective drivers. Kindly review the
change and let us know in case of any review comments.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agocxgb4: Disable interrupts and napi before unregistering netdev
Hariprasad Shenai [Thu, 26 Mar 2015 04:34:26 +0000 (10:04 +0530)]
cxgb4: Disable interrupts and napi before unregistering netdev

Disable interrupts and quiesce rx before unregistering net device to avoid crash
while unloading driver when traffic is flowing through.

Based on original work by Shameem Khalid <shameem@chelsio.com>

Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agocxgb4: Allocate dynamic mem. for egress and ingress queue maps
Hariprasad Shenai [Thu, 26 Mar 2015 04:34:25 +0000 (10:04 +0530)]
cxgb4: Allocate dynamic mem. for egress and ingress queue maps

QIDs (egress/ingress) from firmware in FW_*_CMD.alloc command
can be anywhere in the range from EQ(IQFLINT)_START to EQ(IQFLINT)_END.
For eg, in the first load eqid can be from 100 to 300.
In the next load it can be from 301 to 500 (assume eq_start is 100 and eq_end is
1000).

The driver was assuming them to always start from EQ(IQFLINT)_START till
MAX_EGRQ(INGQ). This was causing stack overflow and subsequent crash.

Fixed it by dynamically allocating memory (of qsize (x_END - x_START + 1)) for
these structures.

Based on original work by Santosh Rastapur <santosh@chelsio.com>

Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agoipmr,ip6mr: call ip6mr_free_table() on failure path
WANG Cong [Wed, 25 Mar 2015 21:45:03 +0000 (14:45 -0700)]
ipmr,ip6mr: call ip6mr_free_table() on failure path

Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agousbnet: Fix tx_bytes statistic running backward in cdc_ncm
Ben Hutchings [Wed, 25 Mar 2015 20:41:33 +0000 (21:41 +0100)]
usbnet: Fix tx_bytes statistic running backward in cdc_ncm

cdc_ncm disagrees with usbnet about how much framing overhead should
be counted in the tx_bytes statistics, and tries 'fix' this by
decrementing tx_bytes on the transmit path.  But statistics must never
be decremented except due to roll-over; this will thoroughly confuse
user-space.  Also, tx_bytes is only incremented by usbnet in the
completion path.

Fix this by requiring drivers that set FLAG_MULTI_FRAME to set a
tx_bytes delta along with the tx_packets count.

Fixes: beeecd42c3b4 ("net: cdc_ncm/cdc_mbim: adding NCM protocol statistics")
Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
Signed-off-by: Bjørn Mork <bjorn@mork.no>
9 years agousbnet: Fix tx_packets stat for FLAG_MULTI_FRAME drivers
Ben Hutchings [Thu, 26 Feb 2015 19:34:37 +0000 (19:34 +0000)]
usbnet: Fix tx_packets stat for FLAG_MULTI_FRAME drivers

Currently the usbnet core does not update the tx_packets statistic for
drivers with FLAG_MULTI_PACKET and there is no hook in the TX
completion path where they could do this.

cdc_ncm and dependent drivers are bumping tx_packets stat on the
transmit path while asix and sr9800 aren't updating it at all.

Add a packet count in struct skb_data so these drivers can fill it
in, initialise it to 1 for other drivers, and add the packet count
to the tx_packets statistic on completion.

Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
Tested-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agoMerge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sat, 28 Mar 2015 18:25:04 +0000 (11:25 -0700)]
Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 fix from Ingo Molnar:
 "Fix x86 syscall exit code bug that resulted in spurious non-execution
  of TIF-driven user-return worklets, causing big trouble for things
  like KVM that rely on user notifiers for correctness of their vcpu
  model, causing crashes like double faults"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/asm/entry: Check for syscall exit work with IRQs disabled

9 years agoMerge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sat, 28 Mar 2015 18:21:23 +0000 (11:21 -0700)]
Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull timer fixes from Ingo Molnar:
 "Two clocksource driver fixes, and an idle loop RCU warning fix"

* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  clocksource/drivers/sun5i: Fix cpufreq interaction with sched_clock()
  clocksource/drivers: Fix various !CONFIG_HAS_IOMEM build errors
  timers/tick/broadcast-hrtimer: Fix suspicious RCU usage in idle loop

9 years agoMerge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sat, 28 Mar 2015 18:17:32 +0000 (11:17 -0700)]
Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull scheduler fix from Ingo Molnar:
 "A single sched/rt corner case fix for RLIMIT_RTIME correctness"

* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched: Fix RLIMIT_RTTIME when PI-boosting to RT

9 years agoMerge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sat, 28 Mar 2015 18:12:08 +0000 (11:12 -0700)]
Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull perf fix from Ingo Molnar:
 "A perf kernel side fix for a fuzzer triggered lockup"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf: Fix irq_work 'tail' recursion

9 years agoMerge branch 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sat, 28 Mar 2015 18:05:03 +0000 (11:05 -0700)]
Merge branch 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull locking fix from Ingo Molnar:
 "A module unload lockdep race fix"

* 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  lockdep: Fix the module unload key range freeing logic

9 years agoMerge branch 'parisc-4.0-1' of git://git.kernel.org/pub/scm/linux/kernel/git/deller...
Linus Torvalds [Sat, 28 Mar 2015 17:58:53 +0000 (10:58 -0700)]
Merge branch 'parisc-4.0-1' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux

Pull parsic fixes from Helge Deller:
 "One patch from Mikulas fixes a bug on parisc by artifically
  incrementing the counter in pmd_free when the kernel tries to free
  the preallocated pmd.

  Other than that we now prevent that syscalls gets added without
  incrementing __NR_Linux_syscalls and fix the initial pmd setup code
  if a default page size greater than 4k has been selected"

* 'parisc-4.0-1' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
  parisc: Fix pmd code to depend on PT_NLEVELS value, not on CONFIG_64BIT
  parisc: mm: don't count preallocated pmds
  parisc: Add compile-time check when adding new syscalls

9 years agoMerge git://git.kernel.org/pub/scm/virt/kvm/kvm
Linus Torvalds [Sat, 28 Mar 2015 17:54:59 +0000 (10:54 -0700)]
Merge git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm ppc bugfixes from Marcelo Tosatti.

* git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: PPC: Book3S HV: Fix instruction emulation
  KVM: PPC: Book3S HV: Endian fix for accessing VPA yield count
  KVM: PPC: Book3S HV: Fix spinlock/mutex ordering issue in kvmppc_set_lpcr()

9 years agoMerge tag 'arc-4.0-fixes-part-2' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Sat, 28 Mar 2015 17:47:27 +0000 (09:47 -0800)]
Merge tag 'arc-4.0-fixes-part-2' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc

Pull ARC fixes from Vineet Gupta:
 "We found some issues with signal handling taking down the system.  I
  know its late, but these are important and all marked for stable.

  ARC signal handling related fixes uncovered during recent testing of
  NPTL tools"

* tag 'arc-4.0-fixes-part-2' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc:
  ARC: signal handling robustify
  ARC: SA_SIGINFO ucontext regs off-by-one

9 years agoMerge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris...
Linus Torvalds [Sat, 28 Mar 2015 17:41:22 +0000 (10:41 -0700)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security

Pull selinux bugfix from James Morris.

Fix broken return value.

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security:
  selinux: fix sel_write_enforce broken return value

9 years agoMerge git://www.linux-watchdog.org/linux-watchdog
Linus Torvalds [Fri, 27 Mar 2015 21:45:42 +0000 (14:45 -0700)]
Merge git://www.linux-watchdog.org/linux-watchdog

Pull watchdog fixes from Wim Van Sebroeck:

 - mtk_wdt: signedness bug in mtk_wdt_start()

 - imgpdc: Fix NULL pointer dereference during probe and fix the default
   heartbeat

* git://www.linux-watchdog.org/linux-watchdog:
  watchdog: imgpdc: Fix default heartbeat
  watchdog: imgpdc: Fix probe NULL pointer dereference
  watchdog: mtk_wdt: signedness bug in mtk_wdt_start()

9 years agoMerge tag 'sound-4.0-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai...
Linus Torvalds [Fri, 27 Mar 2015 21:38:02 +0000 (14:38 -0700)]
Merge tag 'sound-4.0-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound

Pull sound fixes from Takashi Iwai:
 "Three trivial oneliner fixes for HD-audio.

  Two are device-specific quirks while one is a generic fix for recent
  Realtek codecs"

* tag 'sound-4.0-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
  ALSA: hda - Add one more node in the EAPD supporting candidate list
  ALSA: hda_intel: apply the Seperate stream_tag for Sunrise Point
  ALSA: hda - Add dock support for Thinkpad T450s (17aa:5036)

9 years agolibata: Blacklist queued TRIM on Samsung SSD 850 Pro
Martin K. Petersen [Fri, 27 Mar 2015 19:17:21 +0000 (15:17 -0400)]
libata: Blacklist queued TRIM on Samsung SSD 850 Pro

Blacklist queued TRIM on this drive for now.

Reported-by: Stefan Keller <linux-list@zahlenfresser.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
CC: stable@vger.kernel.org
Signed-off-by: Tejun Heo <tj@kernel.org>
9 years agolibata: Update Crucial/Micron blacklist
Martin K. Petersen [Fri, 27 Mar 2015 19:17:20 +0000 (15:17 -0400)]
libata: Update Crucial/Micron blacklist

Micron has released an updated firmware (MU02) for M510/M550/MX100
drives to fix the issues with queued TRIM. Queued TRIM remains broken on
M500 but is working fine on later drives such as M600 and MX200.

Tweak our blacklist to reflect the above.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=71371
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Cc: stable@vger.kernel.org
Signed-off-by: Tejun Heo <tj@kernel.org>
9 years agolocks: fix file_lock deletion inside loop
Yan, Zheng [Fri, 27 Mar 2015 02:34:20 +0000 (10:34 +0800)]
locks: fix file_lock deletion inside loop

locks_delete_lock_ctx() is called inside the loop, so we
should use list_for_each_entry_safe.

Fixes: 8634b51f6ca2 (locks: convert lease handling to file_lock_context)
Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
9 years agogpio: syscon: reduce message level when direction reg offset not in dt
Grygorii Strashko [Tue, 24 Mar 2015 18:42:42 +0000 (20:42 +0200)]
gpio: syscon: reduce message level when direction reg offset not in dt

Now GPIO syscon driver produces bunch of warnings during the
boot of Kesytone 2 SoCs:
 gpio-syscon soc:keystone_dsp_gpio@02620240: can't read the dir register offset!
 gpio-syscon soc:keystone_dsp_gpio@2620244: can't read the dir register offset!

This message unintentionally was added using dev_err(), but its
actual log level is debug, because third cell of "ti,syscon-dev" is
optional.

Hence change it to dev_dbg() as it should be.

This patch fixes commit:
 5a3e3f8 ("gpio: syscon: retriave syscon node and regs offsets from dt")

Reported-by: Russell King <linux@arm.linux.org.uk>
Tested-by: Murali Karicheri <m-karicheri2@ti.com>
Acked-by: Santosh Shilimkar <ssantosh@kernel.org>
Signed-off-by: Grygorii Strashko <grygorii.strashko@linaro.org>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
9 years agoMerge branch 'upstream' of git://git.infradead.org/users/pcmoore/selinux into for...
James Morris [Fri, 27 Mar 2015 09:33:27 +0000 (20:33 +1100)]
Merge branch 'upstream' of git://git.infradead.org/users/pcmoore/selinux into for-linus

9 years agowatchdog: imgpdc: Fix default heartbeat
James Hogan [Fri, 20 Feb 2015 23:45:45 +0000 (23:45 +0000)]
watchdog: imgpdc: Fix default heartbeat

The IMG PDC watchdog driver heartbeat module parameter has no default so
it is initialised to zero. This results in the following warning during
probe:

imgpdc-wdt 2006000.wdt: Initial timeout out of range! setting max timeout

The module parameter description implies that the default value should
be PDC_WDT_DEF_TIMEOUT, which isn't yet used, so initialise it to that.

Also tweak the heartbeat module parameter description for consistency.

Fixes: 93937669e9b5 ("watchdog: ImgTec PDC Watchdog Timer Driver")
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Ezequiel Garcia <ezequiel.garcia@imgtec.com>
Cc: Naidu Tellapati <Naidu.Tellapati@imgtec.com>
Cc: Jude Abraham <Jude.Abraham@imgtec.com>
Cc: linux-watchdog@vger.kernel.org
Reviewed-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Wim Van Sebroeck <wim@iguana.be>
9 years agowatchdog: imgpdc: Fix probe NULL pointer dereference
James Hogan [Fri, 20 Feb 2015 23:45:44 +0000 (23:45 +0000)]
watchdog: imgpdc: Fix probe NULL pointer dereference

The IMG PDC watchdog probe function calls pdc_wdt_stop() prior to
watchdog_set_drvdata(), causing a NULL pointer dereference when
pdc_wdt_stop() retrieves the struct pdc_wdt_dev pointer using
watchdog_get_drvdata() and reads the register base address through it.

Fix by moving the watchdog_set_drvdata() call earlier, to where various
other pdc_wdt->wdt_dev fields are initialised.

Fixes: 93937669e9b5 ("watchdog: ImgTec PDC Watchdog Timer Driver")
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Ezequiel Garcia <ezequiel.garcia@imgtec.com>
Cc: Naidu Tellapati <Naidu.Tellapati@imgtec.com>
Cc: Jude Abraham <Jude.Abraham@imgtec.com>
Cc: linux-watchdog@vger.kernel.org
Reviewed-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Wim Van Sebroeck <wim@iguana.be>
9 years agowatchdog: mtk_wdt: signedness bug in mtk_wdt_start()
Dan Carpenter [Wed, 11 Feb 2015 10:26:21 +0000 (13:26 +0300)]
watchdog: mtk_wdt: signedness bug in mtk_wdt_start()

"ret" should be signed for the error handling to work correctly.  This
doesn't matter much in real life since mtk_wdt_set_timeout() always
succeeds.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Matthias Brugger <matthias.bgg@gmail.com>
Reviewed-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Wim Van Sebroeck <wim@iguana.be>
9 years agoMerge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux
Linus Torvalds [Thu, 26 Mar 2015 22:04:05 +0000 (15:04 -0700)]
Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux

Pull drm refcounting fixes from Dave Airlie:
 "Here is the complete set of i915 bug/warn/refcounting fixes"

* 'drm-fixes' of git://people.freedesktop.org/~airlied/linux:
  drm/i915: Fixup legacy plane->crtc link for initial fb config
  drm/i915: Fix atomic state when reusing the firmware fb
  drm/i915: Keep ring->active_list and ring->requests_list consistent
  drm/i915: Don't try to reference the fb in get_initial_plane_config()
  drm: Fixup racy refcounting in plane_force_disable

9 years agoMerge tag 'dm-4.0-fix-2' of git://git.kernel.org/pub/scm/linux/kernel/git/device...
Linus Torvalds [Thu, 26 Mar 2015 21:53:47 +0000 (14:53 -0700)]
Merge tag 'dm-4.0-fix-2' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm

Pull device mapper fix from Mike Snitzer:
 "Fix DM core device cleanup regression -- due to a latent race that was
  exposed by the bdi changes that were introduced during the 4.0 merge"

* tag 'dm-4.0-fix-2' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
  dm: fix add_disk() NULL pointer due to race with free_dev()

9 years agoMerge tag 'linux-kselftest-4.0-rc6' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Thu, 26 Mar 2015 21:43:42 +0000 (14:43 -0700)]
Merge tag 'linux-kselftest-4.0-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest

Pull kselftest fix from Shuah Khan.

* tag 'linux-kselftest-4.0-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
  selftests: Fix build failures when invoked from kselftest target

9 years agoMerge tag 'drm-intel-fixes-2015-03-26' of git://anongit.freedesktop.org/drm-intel...
Dave Airlie [Thu, 26 Mar 2015 21:39:45 +0000 (07:39 +1000)]
Merge tag 'drm-intel-fixes-2015-03-26' of git://anongit.freedesktop.org/drm-intel into drm-fixes

This should cover the final warnings in -rc5 with two more backports
from our development branch (drm-intel-next-queued). They're the ones
from Daniel and Damien, with references to the reports.

This is on top of drm-fixes because of the dependency on the two earlier
fixes not yet in Linus' tree.

There's an additional regression fix from Chris.

* tag 'drm-intel-fixes-2015-03-26' of git://anongit.freedesktop.org/drm-intel:
  drm/i915: Fixup legacy plane->crtc link for initial fb config
  drm/i915: Fix atomic state when reusing the firmware fb
  drm/i915: Keep ring->active_list and ring->requests_list consistent

9 years agoMerge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Linus Torvalds [Thu, 26 Mar 2015 21:11:17 +0000 (14:11 -0700)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux

Pull s390 fixes from Martin Schwidefsky:
 "A couple of bug fixes for s390.

  The ftrace comile fix is quite large for a -rc6 release, but it would
  be nice to have it in 4.0"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
  s390/smp: reenable smt after resume
  s390/mm: limit STACK_RND_MASK for compat tasks
  s390/ftrace: fix compile error if CONFIG_KPROBES is disabled
  s390/cpum_sf: add diagnostic sampling event only if it is authorized

9 years agodrm/i915: Fixup legacy plane->crtc link for initial fb config
Daniel Vetter [Wed, 25 Mar 2015 17:30:38 +0000 (18:30 +0100)]
drm/i915: Fixup legacy plane->crtc link for initial fb config

This is a very similar bug in the load detect code fixed in

commit 9128b040eb774e04bc23777b005ace2b66ab2a85
Author: Daniel Vetter <daniel.vetter@ffwll.ch>
Date:   Tue Mar 3 17:31:21 2015 +0100

    drm/i915: Fix modeset state confusion in the load detect code

But this time around it was the initial fb code that forgot to update
the plane->crtc pointer. Otherwise it's the exact same bug, with the
exact same restrains (any set_config call/ioctl that doesn't disable
the pipe papers over the bug for free, so fairly hard to hit in normal
testing). So if you want the full explanation just go read that one
over there - it's rather long ...

Cc: Matt Roper <matthew.d.roper@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Josh Boyer <jwboyer@fedoraproject.org>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Reported-and-tested-by: Josh Boyer <jwboyer@fedoraproject.org>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
[Jani: backported to drm-intel-fixes for v4.0-rc]
Reference: http://mid.gmane.org/CA+5PVA7ChbtJrknqws1qvZcbrg1CW2pQAFkSMURWWgyASRyGXg@mail.gmail.com
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
9 years agodrm/i915: Fix atomic state when reusing the firmware fb
Damien Lespiau [Thu, 5 Feb 2015 19:24:25 +0000 (19:24 +0000)]
drm/i915: Fix atomic state when reusing the firmware fb

Right now, we get a warning when taking over the firmware fb:

  [drm:drm_atomic_plane_check] FB set but no CRTC

with the following backtrace:

  [<ffffffffa010339d>] drm_atomic_check_only+0x35d/0x510 [drm]
  [<ffffffffa0103567>] drm_atomic_commit+0x17/0x60 [drm]
  [<ffffffffa00a6ccd>] drm_atomic_helper_plane_set_property+0x8d/0xd0 [drm_kms_helper]
  [<ffffffffa00f1fed>] drm_mode_plane_set_obj_prop+0x2d/0x90 [drm]
  [<ffffffffa00a8a1b>] restore_fbdev_mode+0x6b/0xf0 [drm_kms_helper]
  [<ffffffffa00aa969>] drm_fb_helper_restore_fbdev_mode_unlocked+0x29/0x80 [drm_kms_helper]
  [<ffffffffa00aa9e2>] drm_fb_helper_set_par+0x22/0x50 [drm_kms_helper]
  [<ffffffffa050a71a>] intel_fbdev_set_par+0x1a/0x60 [i915]
  [<ffffffff813ad444>] fbcon_init+0x4f4/0x580

That's because we update the plane state with the fb from the firmware, but we
never associate the plane to that CRTC.

We don't quite have the full DRM take over from HW state just yet, so
fake enough of the plane atomic state to pass the checks.

v2: Fix the state on which we set the CRTC in the case we're sharing the
    initial fb with another pipe. (Matt)

Signed-off-by: Damien Lespiau <damien.lespiau@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
[Jani: backported to drm-intel-fixes for v4.0-rc]
Reference: http://mid.gmane.org/CA+5PVA7yXH=U757w8V=Zj2U1URG4nYNav20NpjtQ4svVueyPNw@mail.gmail.com
Reference: http://lkml.kernel.org/r/CA+55aFweWR=nDzc2Y=rCtL_H8JfdprQiCimN5dwc+TgyD4Bjsg@mail.gmail.com
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
9 years agoALSA: hda - Add one more node in the EAPD supporting candidate list
Hui Wang [Thu, 26 Mar 2015 09:14:55 +0000 (17:14 +0800)]
ALSA: hda - Add one more node in the EAPD supporting candidate list

We have a HP machine which use the codec node 0x17 connecting the
internal speaker, and from the node capability, we saw the EAPD,
if we don't set the EAPD on for this node, the internal speaker
can't output any sound.

Cc: <stable@vger.kernel.org>
BugLink: https://bugs.launchpad.net/bugs/1436745
Signed-off-by: Hui Wang <hui.wang@canonical.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
9 years agoclocksource/drivers/sun5i: Fix cpufreq interaction with sched_clock()
Maxime Ripard [Thu, 26 Mar 2015 09:27:09 +0000 (10:27 +0100)]
clocksource/drivers/sun5i: Fix cpufreq interaction with sched_clock()

The sun5i timer is used as the sched-clock on certain systems, and ever
since we started using cpufreq, the cpu clock (that is one of the
timer's clock indirect parent) now changes as well, along with the
actual sched_clock() rate.

This is not accurate and not desirable.

We can safely remove the sun5i sched-clock on those systems, since we
have other reliable sched_clock() sources in the system.

Tested-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Maxime Ripard <maxime.ripard@free-electrons.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
[ Improved the changelog. ]
Cc: richard@nod.at
Link: http://lkml.kernel.org/r/1427362029-6511-4-git-send-email-daniel.lezcano@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
9 years agoclocksource/drivers: Fix various !CONFIG_HAS_IOMEM build errors
Richard Weinberger [Thu, 26 Mar 2015 09:27:06 +0000 (10:27 +0100)]
clocksource/drivers: Fix various !CONFIG_HAS_IOMEM build errors

Fix !CONFIG_HAS_IOMEM related build failures in three clocksource drivers.

The build failures have the pattern of:

  drivers/clocksource/sh_cmt.c: In function â€˜sh_cmt_map_memory’: drivers/clocksource/sh_cmt.c:920:2:
  error: implicit declaration of function â€˜ioremap_nocache’ [-Werror=implicit-function-declaration]   cmt->mapbase = ioremap_nocache(mem->start, resource_size(mem));

Signed-off-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Acked-by: Geert Uytterhoeven <geert+renesas@glider.be>
Cc: maxime.ripard@free-electrons.com
Link: http://lkml.kernel.org/r/1427362029-6511-1-git-send-email-daniel.lezcano@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
9 years agodrm/i915: Keep ring->active_list and ring->requests_list consistent
Chris Wilson [Wed, 18 Mar 2015 18:19:22 +0000 (18:19 +0000)]
drm/i915: Keep ring->active_list and ring->requests_list consistent

If we retire requests last, we may use a later seqno and so clear
the requests lists without clearing the active list, leading to
confusion. Hence we should retire requests first for consistency with
the early return. The order used to be important as the lifecycle for
the object on the active list was determined by request->seqno. However,
the requests themselves are now reference counted removing the
constraint from the order of retirement.

Fixes regression from

commit 1b5a433a4dd967b125131da42b89b5cc0d5b1f57
Author: John Harrison <John.C.Harrison@Intel.com>
Date:   Mon Nov 24 18:49:42 2014 +0000

    drm/i915: Convert 'i915_seqno_passed' calls into 'i915_gem_request_completed
'

and a

WARNING: CPU: 0 PID: 1383 at drivers/gpu/drm/i915/i915_gem_evict.c:279 i915_gem_evict_vm+0x10c/0x140()
WARN_ON(!list_empty(&vm->active_list))

Identified by updating WATCH_LISTS:

[drm:i915_verify_lists] *ERROR* blitter ring: active list not empty, but no requests
WARNING: CPU: 0 PID: 681 at drivers/gpu/drm/i915/i915_gem.c:2751 i915_gem_retire_requests_ring+0x149/0x230()
WARN_ON(i915_verify_lists(ring->dev))

Note that this is only a problem in evict_vm where the following happens
after a retire_request has cleaned out all requests, but not all active
bo:
- intel_ring_idle called from i915_gpu_idle notices that no requests are
  outstanding and immediately returns.
- i915_gem_retire_requests_ring called from i915_gem_retire_requests also
  immediately returns when there's no request, still leaving the bo on the
  active list.
- evict_vm hits the WARN_ON(!list_empty(&vm->active_list)) after evicting
  all active objects that there's still stuff left that shouldn't be
  there.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: John Harrison <John.C.Harrison@Intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
9 years agoALSA: hda_intel: apply the Seperate stream_tag for Sunrise Point
Libin Yang [Thu, 26 Mar 2015 05:28:39 +0000 (13:28 +0800)]
ALSA: hda_intel: apply the Seperate stream_tag for Sunrise Point

The total stream number of Sunrise Point's input and output stream
exceeds 15, which will cause some streams do not work because
of the overflow on SDxCTL.STRM field if using the legacy
stream tag allocation method.

This patch uses the new stream tag allocation method by add
the flag AZX_DCAPS_SEPARATE_STREAM_TAG for Skylake platform.

Signed-off-by: Libin Yang <libin.yang@intel.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
9 years agoARC: signal handling robustify
Vineet Gupta [Thu, 26 Mar 2015 05:44:41 +0000 (11:14 +0530)]
ARC: signal handling robustify

A malicious signal handler / restorer can DOS the system by fudging the
user regs saved on stack, causing weird things such as sigreturn returning
to user mode PC but cpu state still being kernel mode....

Ensure that in sigreturn path status32 always has U bit; any other bogosity
(gargbage PC etc) will be taken care of by normal user mode exceptions mechanisms.

Reproducer signal handler:

    void handle_sig(int signo, siginfo_t *info, void *context)
    {
ucontext_t *uc = context;
struct user_regs_struct *regs = &(uc->uc_mcontext.regs);

regs->scratch.status32 = 0;
    }

Before the fix, kernel would go off to weeds like below:

    --------->8-----------
    [ARCLinux]$ ./signal-test
    Path: /signal-test
    CPU: 0 PID: 61 Comm: signal-test Not tainted 4.0.0-rc5+ #65
    task: 8f177880 ti: 5ffe6000 task.ti: 8f15c000

    [ECR   ]: 0x00220200 => Invalid Write @ 0x00000010 by insn @ 0x00010698
    [EFA   ]: 0x00000010
    [BLINK ]: 0x2007c1ee
    [ERET  ]: 0x10698
    [STAT32]: 0x00000000 :                                   <--------
    BTA: 0x00010680  SP: 0x5ffe7e48  FP: 0x00000000
    LPS: 0x20003c6c LPE: 0x20003c70 LPC: 0x00000000
    ...
    --------->8-----------

Reported-by: Alexey Brodkin <abrodkin@synopsys.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
9 years agoARC: SA_SIGINFO ucontext regs off-by-one
Vineet Gupta [Thu, 26 Mar 2015 03:55:44 +0000 (09:25 +0530)]
ARC: SA_SIGINFO ucontext regs off-by-one

The regfile provided to SA_SIGINFO signal handler as ucontext was off by
one due to pt_regs gutter cleanups in 2013.

Before handling signal, user pt_regs are copied onto user_regs_struct and copied
back later. Both structs are binary compatible. This was all fine until
commit 2fa919045b72 (ARC: pt_regs update #2) which removed the empty stack slot
at top of pt_regs (corresponding to first pad) and made the corresponding
fixup in struct user_regs_struct (the pad in there was moved out of
@scratch - not removed altogether as it is part of ptrace ABI)

 struct user_regs_struct {
+       long pad;
        struct {
-               long pad;
                long bta, lp_start, lp_end,....
        } scratch;
 ...
 }

This meant that now user_regs_struct was off by 1 reg w.r.t pt_regs and
signal code needs to user_regs_struct.scratch to reflect it as pt_regs,
which is what this commit does.

This problem was hidden for 2 years, because both save/restore, despite
using wrong location, were using the same location. Only an interim
inspection (reproducer below) exposed the issue.

     void handle_segv(int signo, siginfo_t *info, void *context)
     {
  ucontext_t *uc = context;
struct user_regs_struct *regs = &(uc->uc_mcontext.regs);

printf("regs %x %x\n",               <=== prints 7 8 (vs. 8 9)
               regs->scratch.r8, regs->scratch.r9);
     }

     int main()
     {
struct sigaction sa;

sa.sa_sigaction = handle_segv;
sa.sa_flags = SA_SIGINFO;
sigemptyset(&sa.sa_mask);
sigaction(SIGSEGV, &sa, NULL);

asm volatile(
"mov r7, 7 \n"
"mov r8, 8 \n"
"mov r9, 9 \n"
"mov r10, 10 \n"
:::"r7","r8","r9","r10");

*((unsigned int*)0x10) = 0;
     }

Fixes: 2fa919045b72ec892e "ARC: pt_regs update #2: Remove unused gutter at start of pt_regs"
CC: <stable@vger.kernel.org>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
9 years agoNFSD: Fix bad update of layout in nfsd4_return_file_layout
Kinglong Mee [Sun, 22 Mar 2015 14:17:20 +0000 (22:17 +0800)]
NFSD: Fix bad update of layout in nfsd4_return_file_layout

With return layout as, (seg is return layout, lo is record layout)
seg->offset <= lo->offset and layout_end(seg) < layout_end(lo),
nfsd should update lo's offset to seg's end,
and,
seg->offset > lo->offset and layout_end(seg) >= layout_end(lo),
nfsd should update lo's end to seg's offset.

Fixes: 9cf514ccfa ("nfsd: implement pNFS operations")
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
9 years agoNFSD: Take care the return value from nfsd4_encode_stateid
Kinglong Mee [Sun, 22 Mar 2015 14:17:10 +0000 (22:17 +0800)]
NFSD: Take care the return value from nfsd4_encode_stateid

Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
9 years agoNFSD: Printk blocklayout length and offset as format 0x%llx
Kinglong Mee [Sun, 22 Mar 2015 14:16:40 +0000 (22:16 +0800)]
NFSD: Printk blocklayout length and offset as format 0x%llx

When testing pnfs with nfsd_debug on, nfsd print a negative number
of layout length and foff in nfsd4_block_proc_layoutget as,
"GET: -xxxx:-xxx 2"

Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
9 years agonfsd: return correct lockowner when there is a race on hash insert
J. Bruce Fields [Mon, 23 Mar 2015 15:02:30 +0000 (11:02 -0400)]
nfsd: return correct lockowner when there is a race on hash insert

alloc_init_lock_stateowner can return an already freed entry if there is
a race to put openowners in the hashtable.

Noticed by inspection after Jeff Layton fixed the same bug for open
owners.  Depending on client behavior, this one may be trickier to
trigger in practice.

Fixes: c58c6610ec24 "nfsd: Protect adding/removing lock owners using client_lock"
Cc: <stable@vger.kernel.org>
Cc: Trond Myklebust <trond.myklebust@primarydata.com>
Acked-by: Jeff Layton <jeff.layton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
9 years agonfsd: return correct openowner when there is a race to put one in the hash
Jeff Layton [Mon, 23 Mar 2015 14:53:42 +0000 (10:53 -0400)]
nfsd: return correct openowner when there is a race to put one in the hash

alloc_init_open_stateowner can return an already freed entry if there is
a race to put openowners in the hashtable.

In commit 7ffb588086e9, we changed it so that we allocate and initialize
an openowner, and then check to see if a matching one got stuffed into
the hashtable in the meantime. If it did, then we free the one we just
allocated and take a reference on the one already there. There is a bug
here though. The code will then return the pointer to the one that was
allocated (and has now been freed).

This wasn't evident before as this race almost never occurred. The Linux
kernel client used to serialize requests for a single openowner.  That
has changed now with v4.0 kernels, and this race can now easily occur.

Fixes: 7ffb588086e9
Cc: <stable@vger.kernel.org> # v3.17+
Cc: Trond Myklebust <trond.myklebust@primarydata.com>
Reported-by: Christoph Hellwig <hch@infradead.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
9 years agoMerge tag 'metag-fixes-v4.0-2' of git://git.kernel.org/pub/scm/linux/kernel/git/jhoga...
Linus Torvalds [Wed, 25 Mar 2015 23:52:53 +0000 (16:52 -0700)]
Merge tag 'metag-fixes-v4.0-2' of git://git.kernel.org/pub/scm/linux/kernel/git/jhogan/metag

Pull arch/metag fix from James Hogan:
 "Another metag architecture fix for v4.0

  This is another single fix, for an include dependency problem when
  using ioremap_wc() from asm/io.h without also including asm/pgtable.h"

* tag 'metag-fixes-v4.0-2' of git://git.kernel.org/pub/scm/linux/kernel/git/jhogan/metag:
  metag: Fix ioremap_wc/ioremap_cached build errors

9 years agoMerge branch 'akpm' (patches from Andrew)
Linus Torvalds [Wed, 25 Mar 2015 23:21:17 +0000 (16:21 -0700)]
Merge branch 'akpm' (patches from Andrew)

Merge misc fixes from Andrew Morton:
 "15 fixes"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
  mm: numa: mark huge PTEs young when clearing NUMA hinting faults
  mm: numa: slow PTE scan rate if migration failures occur
  mm: numa: preserve PTE write permissions across a NUMA hinting fault
  mm: numa: group related processes based on VMA flags instead of page table flags
  hfsplus: fix B-tree corruption after insertion at position 0
  MAINTAINERS: add Jan as DMI/SMBIOS support maintainer
  fs/affs/file.c: unlock/release page on error
  mm/page_alloc.c: call kernel_map_pages in unset_migrateype_isolate
  mm/slub: fix lockups on PREEMPT && !SMP kernels
  mm/memory hotplug: postpone the reset of obsolete pgdat
  MAINTAINERS: correct rtc armada38x pattern entry
  mm/pagewalk.c: prevent positive return value of walk_page_test() from being passed to callers
  mm: fix anon_vma->degree underflow in anon_vma endless growing prevention
  drivers/rtc/rtc-mrst: fix suspend/resume
  aoe: update aoe maintainer information

9 years agoMerge tag 'signed-for-4.0' of git://github.com/agraf/linux-2.6
Marcelo Tosatti [Wed, 25 Mar 2015 23:20:31 +0000 (20:20 -0300)]
Merge tag 'signed-for-4.0' of git://github.com/agraf/linux-2.6

Patch queue for 4.0 - 2015-03-25

A few bug fixes for Book3S HV KVM:

  - Fix spinlock ordering
  - Fix idle guests on LE hosts
  - Fix instruction emulation

9 years agomm: numa: mark huge PTEs young when clearing NUMA hinting faults
Mel Gorman [Wed, 25 Mar 2015 22:55:45 +0000 (15:55 -0700)]
mm: numa: mark huge PTEs young when clearing NUMA hinting faults

Base PTEs are marked young when the NUMA hinting information is cleared
but the same does not happen for huge pages which this patch addresses.

Note that migrated pages are not marked young as the base page migration
code does not assume that migrated pages have been referenced.  This
could be addressed but beyond the scope of this series which is aimed at
Dave Chinners shrink workload that is unlikely to be affected by this
issue.

Signed-off-by: Mel Gorman <mgorman@suse.de>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Aneesh Kumar <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
9 years agomm: numa: slow PTE scan rate if migration failures occur
Mel Gorman [Wed, 25 Mar 2015 22:55:42 +0000 (15:55 -0700)]
mm: numa: slow PTE scan rate if migration failures occur

Dave Chinner reported the following on https://lkml.org/lkml/2015/3/1/226

  Across the board the 4.0-rc1 numbers are much slower, and the degradation
  is far worse when using the large memory footprint configs. Perf points
  straight at the cause - this is from 4.0-rc1 on the "-o bhash=101073" config:

   -   56.07%    56.07%  [kernel]            [k] default_send_IPI_mask_sequence_phys
      - default_send_IPI_mask_sequence_phys
         - 99.99% physflat_send_IPI_mask
            - 99.37% native_send_call_func_ipi
                 smp_call_function_many
               - native_flush_tlb_others
                  - 99.85% flush_tlb_page
                       ptep_clear_flush
                       try_to_unmap_one
                       rmap_walk
                       try_to_unmap
                       migrate_pages
                       migrate_misplaced_page
                     - handle_mm_fault
                        - 99.73% __do_page_fault
                             trace_do_page_fault
                             do_async_page_fault
                           + async_page_fault
              0.63% native_send_call_func_single_ipi
                 generic_exec_single
                 smp_call_function_single

This is showing excessive migration activity even though excessive
migrations are meant to get throttled.  Normally, the scan rate is tuned
on a per-task basis depending on the locality of faults.  However, if
migrations fail for any reason then the PTE scanner may scan faster if
the faults continue to be remote.  This means there is higher system CPU
overhead and fault trapping at exactly the time we know that migrations
cannot happen.  This patch tracks when migration failures occur and
slows the PTE scanner.

Signed-off-by: Mel Gorman <mgorman@suse.de>
Reported-by: Dave Chinner <david@fromorbit.com>
Tested-by: Dave Chinner <david@fromorbit.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Aneesh Kumar <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
9 years agomm: numa: preserve PTE write permissions across a NUMA hinting fault
Mel Gorman [Wed, 25 Mar 2015 22:55:40 +0000 (15:55 -0700)]
mm: numa: preserve PTE write permissions across a NUMA hinting fault

Protecting a PTE to trap a NUMA hinting fault clears the writable bit
and further faults are needed after trapping a NUMA hinting fault to set
the writable bit again.  This patch preserves the writable bit when
trapping NUMA hinting faults.  The impact is obvious from the number of
minor faults trapped during the basis balancing benchmark and the system
CPU usage;

  autonumabench
                                             4.0.0-rc4             4.0.0-rc4
                                              baseline              preserve
  Time System-NUMA01                  107.13 (  0.00%)      103.13 (  3.73%)
  Time System-NUMA01_THEADLOCAL       131.87 (  0.00%)       83.30 ( 36.83%)
  Time System-NUMA02                    8.95 (  0.00%)       10.72 (-19.78%)
  Time System-NUMA02_SMT                4.57 (  0.00%)        3.99 ( 12.69%)
  Time Elapsed-NUMA01                 515.78 (  0.00%)      517.26 ( -0.29%)
  Time Elapsed-NUMA01_THEADLOCAL      384.10 (  0.00%)      384.31 ( -0.05%)
  Time Elapsed-NUMA02                  48.86 (  0.00%)       48.78 (  0.16%)
  Time Elapsed-NUMA02_SMT              47.98 (  0.00%)       48.12 ( -0.29%)

               4.0.0-rc4   4.0.0-rc4
                baseline    preserve
  User          44383.95    43971.89
  System          252.61      201.24
  Elapsed         998.68     1000.94

  Minor Faults   2597249     1981230
  Major Faults       365         364

There is a similar drop in system CPU usage using Dave Chinner's xfsrepair
workload

                                      4.0.0-rc4             4.0.0-rc4
                                       baseline              preserve
  Amean    real-xfsrepair      454.14 (  0.00%)      442.36 (  2.60%)
  Amean    syst-xfsrepair      277.20 (  0.00%)      204.68 ( 26.16%)

The patch looks hacky but the alternatives looked worse.  The tidest was
to rewalk the page tables after a hinting fault but it was more complex
than this approach and the performance was worse.  It's not generally
safe to just mark the page writable during the fault if it's a write
fault as it may have been read-only for COW so that approach was
discarded.

Signed-off-by: Mel Gorman <mgorman@suse.de>
Reported-by: Dave Chinner <david@fromorbit.com>
Tested-by: Dave Chinner <david@fromorbit.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Aneesh Kumar <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
9 years agomm: numa: group related processes based on VMA flags instead of page table flags
Mel Gorman [Wed, 25 Mar 2015 22:55:37 +0000 (15:55 -0700)]
mm: numa: group related processes based on VMA flags instead of page table flags

These are three follow-on patches based on the xfsrepair workload Dave
Chinner reported was problematic in 4.0-rc1 due to changes in page table
management -- https://lkml.org/lkml/2015/3/1/226.

Much of the problem was reduced by commit 53da3bc2ba9e ("mm: fix up numa
read-only thread grouping logic") and commit ba68bc0115eb ("mm: thp:
Return the correct value for change_huge_pmd").  It was known that the
performance in 3.19 was still better even if is far less safe.  This
series aims to restore the performance without compromising on safety.

For the test of this mail, I'm comparing 3.19 against 4.0-rc4 and the
three patches applied on top

  autonumabench
                                                3.19.0             4.0.0-rc4             4.0.0-rc4             4.0.0-rc4             4.0.0-rc4
                                               vanilla               vanilla          vmwrite-v5r8         preserve-v5r8         slowscan-v5r8
  Time System-NUMA01                  124.00 (  0.00%)      161.86 (-30.53%)      107.13 ( 13.60%)      103.13 ( 16.83%)      145.01 (-16.94%)
  Time System-NUMA01_THEADLOCAL       115.54 (  0.00%)      107.64 (  6.84%)      131.87 (-14.13%)       83.30 ( 27.90%)       92.35 ( 20.07%)
  Time System-NUMA02                    9.35 (  0.00%)       10.44 (-11.66%)        8.95 (  4.28%)       10.72 (-14.65%)        8.16 ( 12.73%)
  Time System-NUMA02_SMT                3.87 (  0.00%)        4.63 (-19.64%)        4.57 (-18.09%)        3.99 ( -3.10%)        3.36 ( 13.18%)
  Time Elapsed-NUMA01                 570.06 (  0.00%)      567.82 (  0.39%)      515.78 (  9.52%)      517.26 (  9.26%)      543.80 (  4.61%)
  Time Elapsed-NUMA01_THEADLOCAL      393.69 (  0.00%)      384.83 (  2.25%)      384.10 (  2.44%)      384.31 (  2.38%)      380.73 (  3.29%)
  Time Elapsed-NUMA02                  49.09 (  0.00%)       49.33 ( -0.49%)       48.86 (  0.47%)       48.78 (  0.63%)       50.94 ( -3.77%)
  Time Elapsed-NUMA02_SMT              47.51 (  0.00%)       47.15 (  0.76%)       47.98 ( -0.99%)       48.12 ( -1.28%)       49.56 ( -4.31%)

                3.19.0   4.0.0-rc4   4.0.0-rc4   4.0.0-rc4   4.0.0-rc4
               vanilla     vanillavmwrite-v5r8preserve-v5r8slowscan-v5r8
  User        46334.60    46391.94    44383.95    43971.89    44372.12
  System        252.84      284.66      252.61      201.24      249.00
  Elapsed      1062.14     1050.96      998.68     1000.94     1026.78

Overall the system CPU usage is comparable and the test is naturally a
bit variable.  The slowing of the scanner hurts numa01 but on this
machine it is an adverse workload and patches that dramatically help it
often hurt absolutely everything else.

Due to patch 2, the fault activity is interesting

                                  3.19.0   4.0.0-rc4   4.0.0-rc4   4.0.0-rc4   4.0.0-rc4
                                 vanilla     vanillavmwrite-v5r8preserve-v5r8slowscan-v5r8
  Minor Faults                   2097811     2656646     2597249     1981230     1636841
  Major Faults                       362         450         365         364         365

Note the impact preserving the write bit across protection updates and
fault reduces faults.

  NUMA alloc hit                 1229008     1217015     1191660     1178322     1199681
  NUMA alloc miss                      0           0           0           0           0
  NUMA interleave hit                  0           0           0           0           0
  NUMA alloc local               1228514     1216317     1190871     1177448     1199021
  NUMA base PTE updates        245706197   240041607   238195516   244704842   115012800
  NUMA huge PMD updates           479530      468448      464868      477573      224487
  NUMA page range updates      491225557   479886983   476207932   489222218   229950144
  NUMA hint faults                659753      656503      641678      656926      294842
  NUMA hint local faults          381604      373963      360478      337585      186249
  NUMA hint local percent             57          56          56          51          63
  NUMA pages migrated            5412140     6374899     6266530     5277468     5755096
  AutoNUMA cost                    5121%       5083%       4994%       5097%       2388%

Here the impact of slowing the PTE scanner on migratrion failures is
obvious as "NUMA base PTE updates" and "NUMA huge PMD updates" are
massively reduced even though the headline performance is very similar.

As xfsrepair was the reported workload here is the impact of the series
on it.

  xfsrepair
                                         3.19.0             4.0.0-rc4             4.0.0-rc4             4.0.0-rc4             4.0.0-rc4
                                        vanilla               vanilla          vmwrite-v5r8         preserve-v5r8         slowscan-v5r8
  Min      real-fsmark        1183.29 (  0.00%)     1165.73 (  1.48%)     1152.78 (  2.58%)     1153.64 (  2.51%)     1177.62 (  0.48%)
  Min      syst-fsmark        4107.85 (  0.00%)     4027.75 (  1.95%)     3986.74 (  2.95%)     3979.16 (  3.13%)     4048.76 (  1.44%)
  Min      real-xfsrepair      441.51 (  0.00%)      463.96 ( -5.08%)      449.50 ( -1.81%)      440.08 (  0.32%)      439.87 (  0.37%)
  Min      syst-xfsrepair      195.76 (  0.00%)      278.47 (-42.25%)      262.34 (-34.01%)      203.70 ( -4.06%)      143.64 ( 26.62%)
  Amean    real-fsmark        1188.30 (  0.00%)     1177.34 (  0.92%)     1157.97 (  2.55%)     1158.21 (  2.53%)     1182.22 (  0.51%)
  Amean    syst-fsmark        4111.37 (  0.00%)     4055.70 (  1.35%)     3987.19 (  3.02%)     3998.72 (  2.74%)     4061.69 (  1.21%)
  Amean    real-xfsrepair      450.88 (  0.00%)      468.32 ( -3.87%)      454.14 ( -0.72%)      442.36 (  1.89%)      440.59 (  2.28%)
  Amean    syst-xfsrepair      199.66 (  0.00%)      290.60 (-45.55%)      277.20 (-38.84%)      204.68 ( -2.51%)      150.55 ( 24.60%)
  Stddev   real-fsmark           4.12 (  0.00%)       10.82 (-162.29%)       4.14 ( -0.28%)        5.98 (-45.05%)        4.60 (-11.53%)
  Stddev   syst-fsmark           2.63 (  0.00%)       20.32 (-671.82%)       0.37 ( 85.89%)       16.47 (-525.59%)      15.05 (-471.79%)
  Stddev   real-xfsrepair        6.87 (  0.00%)        4.55 ( 33.75%)        3.46 ( 49.58%)        1.78 ( 74.12%)        0.52 ( 92.50%)
  Stddev   syst-xfsrepair        3.02 (  0.00%)       10.30 (-241.37%)      13.17 (-336.37%)       0.71 ( 76.63%)        5.00 (-65.61%)
  CoeffVar real-fsmark           0.35 (  0.00%)        0.92 (-164.73%)       0.36 ( -2.91%)        0.52 (-48.82%)        0.39 (-12.10%)
  CoeffVar syst-fsmark           0.06 (  0.00%)        0.50 (-682.41%)       0.01 ( 85.45%)        0.41 (-543.22%)       0.37 (-478.78%)
  CoeffVar real-xfsrepair        1.52 (  0.00%)        0.97 ( 36.21%)        0.76 ( 49.94%)        0.40 ( 73.62%)        0.12 ( 92.33%)
  CoeffVar syst-xfsrepair        1.51 (  0.00%)        3.54 (-134.54%)       4.75 (-214.31%)       0.34 ( 77.20%)        3.32 (-119.63%)
  Max      real-fsmark        1193.39 (  0.00%)     1191.77 (  0.14%)     1162.90 (  2.55%)     1166.66 (  2.24%)     1188.50 (  0.41%)
  Max      syst-fsmark        4114.18 (  0.00%)     4075.45 (  0.94%)     3987.65 (  3.08%)     4019.45 (  2.30%)     4082.80 (  0.76%)
  Max      real-xfsrepair      457.80 (  0.00%)      474.60 ( -3.67%)      457.82 ( -0.00%)      444.42 (  2.92%)      441.03 (  3.66%)
  Max      syst-xfsrepair      203.11 (  0.00%)      303.65 (-49.50%)      294.35 (-44.92%)      205.33 ( -1.09%)      155.28 ( 23.55%)

The really relevant lines as syst-xfsrepair which is the system CPU
usage when running xfsrepair.  Note that on my machine the overhead was
45% higher on 4.0-rc4 which may be part of what Dave is seeing.  Once we
preserve the write bit across faults, it's only 2.51% higher on average.
With the full series applied, system CPU usage is 24.6% lower on
average.

Again, the impact of preserving the write bit on minor faults is obvious
and the impact of slowing scanning after migration failures is obvious
on the PTE updates.  Note also that the number of pages migrated is much
reduced even though the headline performance is comparable.

                                  3.19.0   4.0.0-rc4   4.0.0-rc4   4.0.0-rc4   4.0.0-rc4
                                 vanilla     vanillavmwrite-v5r8preserve-v5r8slowscan-v5r8
  Minor Faults                 153466827   254507978   249163829   153501373   105737890
  Major Faults                       610         702         690         649         724
  NUMA base PTE updates        217735049   210756527   217729596   216937111   144344993
  NUMA huge PMD updates           129294       85044      106921      127246       79887
  NUMA pages migrated           21938995    29705270    28594162    22687324    16258075

                        3.19.0   4.0.0-rc4   4.0.0-rc4   4.0.0-rc4   4.0.0-rc4
                       vanilla     vanillavmwrite-v5r8preserve-v5r8slowscan-v5r8
  Mean sdb-avgqusz       13.47        2.54        2.55        2.47        2.49
  Mean sdb-avgrqsz      202.32      140.22      139.50      139.02      138.12
  Mean sdb-await         25.92        5.09        5.33        5.02        5.22
  Mean sdb-r_await        4.71        0.19        0.83        0.51        0.11
  Mean sdb-w_await      104.13        5.21        5.38        5.05        5.32
  Mean sdb-svctm          0.59        0.13        0.14        0.13        0.14
  Mean sdb-rrqm           0.16        0.00        0.00        0.00        0.00
  Mean sdb-wrqm           3.59     1799.43     1826.84     1812.21     1785.67
  Max  sdb-avgqusz      111.06       12.13       14.05       11.66       15.60
  Max  sdb-avgrqsz      255.60      190.34      190.01      187.33      191.78
  Max  sdb-await        168.24       39.28       49.22       44.64       65.62
  Max  sdb-r_await      660.00       52.00      280.00       76.00       12.00
  Max  sdb-w_await     7804.00       39.28       49.22       44.64       65.62
  Max  sdb-svctm          4.00        2.82        2.86        1.98        2.84
  Max  sdb-rrqm           8.30        0.00        0.00        0.00        0.00
  Max  sdb-wrqm          34.20     5372.80     5278.60     5386.60     5546.15

FWIW, I also checked SPECjbb in different configurations but it's
similar observations -- minor faults lower, PTE update activity lower
and performance is roughly comparable against 3.19.

This patch (of 3):

Threads that share writable data within pages are grouped together as
related tasks.  This decision is based on whether the PTE is marked
dirty which is subject to timing races between the PTE scanner update
and when the application writes the page.  If the page is file-backed,
then background flushes and sync also affect placement.  This is
unpredictable behaviour which is impossible to reason about so this
patch makes grouping decisions based on the VMA flags.

Signed-off-by: Mel Gorman <mgorman@suse.de>
Reported-by: Dave Chinner <david@fromorbit.com>
Tested-by: Dave Chinner <david@fromorbit.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Aneesh Kumar <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>