Linus Torvalds [Sat, 22 Aug 2015 14:37:41 +0000 (07:37 -0700)]
Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer fixes from Thomas Gleixner:
"Two minimalistic fixes for 4.2 regressions:
- Eric fixed a thinko in the timer_list base switching code caused by
the overhaul of the timer wheel. It can cause a cpu to see the
wrong base for a timer while we move the timer around.
- Guenter fixed a regression for IMX if booted w/o device tree, where
the timer interrupt is not initialized and therefor the machine
fails to boot"
* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
clocksource/imx: Fix boot with non-DT systems
timer: Write timer->flags atomically
While the idea behind get_maintainer seems highly useful it's
unfortunately way to trigger happy to grab people that once had a few
commits to files. For someone like me who does a lot of tree-wide API
work that leads to an incredible amount of Cc spam.
Signed-off-by: Christoph Hellwig <hch@lst.de> Cc: Joe Perches <joe@perches.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Michal Hocko [Fri, 21 Aug 2015 21:11:51 +0000 (14:11 -0700)]
mm: make page pfmemalloc check more robust
Commit c48a11c7ad26 ("netvm: propagate page->pfmemalloc to skb") added
checks for page->pfmemalloc to __skb_fill_page_desc():
if (page->pfmemalloc && !page->mapping)
skb->pfmemalloc = true;
It assumes page->mapping == NULL implies that page->pfmemalloc can be
trusted. However, __delete_from_page_cache() can set set page->mapping
to NULL and leave page->index value alone. Due to being in union, a
non-zero page->index will be interpreted as true page->pfmemalloc.
So the assumption is invalid if the networking code can see such a page.
And it seems it can. We have encountered this with a NFS over loopback
setup when such a page is attached to a new skbuf. There is no copying
going on in this case so the page confuses __skb_fill_page_desc which
interprets the index as pfmemalloc flag and the network stack drops
packets that have been allocated using the reserves unless they are to
be queued on sockets handling the swapping which is the case here and
that leads to hangs when the nfs client waits for a response from the
server which has been dropped and thus never arrive.
The struct page is already heavily packed so rather than finding another
hole to put it in, let's do a trick instead. We can reuse the index
again but define it to an impossible value (-1UL). This is the page
index so it should never see the value that large. Replace all direct
users of page->pfmemalloc by page_is_pfmemalloc which will hide this
nastiness from unspoiled eyes.
The information will get lost if somebody wants to use page->index
obviously but that was the case before and the original code expected
that the information should be persisted somewhere else if that is
really needed (e.g. what SLAB and SLUB do).
[akpm@linux-foundation.org: fix blooper in slub] Fixes: c48a11c7ad26 ("netvm: propagate page->pfmemalloc to skb") Signed-off-by: Michal Hocko <mhocko@suse.com> Debugged-by: Vlastimil Babka <vbabka@suse.com> Debugged-by: Jiri Bohac <jbohac@suse.com> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: David Miller <davem@davemloft.net> Acked-by: Mel Gorman <mgorman@suse.de> Cc: <stable@vger.kernel.org> [3.6+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Fri, 21 Aug 2015 18:18:10 +0000 (11:18 -0700)]
Merge tag 'pci-v4.2-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci
Pull PCI fixes from Bjorn Helgaas:
"These are fixes for ASPM-related NULL pointer dereference crashes on
Sparc and PowerPC and 64-bit PCI address-related HPMC crashes on
PA-RISC. These are both caused by things we merged in the v4.2 merge
window. Details:
Resource management
- Don't use 64-bit bus addresses on PA-RISC
Miscellaneous
- Tolerate hierarchies with no Root Port"
* tag 'pci-v4.2-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
PCI: Don't use 64-bit bus addresses on PA-RISC
PCI: Tolerate hierarchies with no Root Port
Linus Torvalds [Fri, 21 Aug 2015 17:46:56 +0000 (10:46 -0700)]
Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux
Pull drm fixes from Dave Airlie:
"A bunch of i915 fixes, one revert a VBT fix that was a bit premature,
and some braswell feature removal that the hw actually didn't support.
One radeon race fix at boot, and one hlcdc build fix, one fix from
Russell that fixes build as well with new audio features.
This is hopefully all I have until -next"
* 'drm-fixes' of git://people.freedesktop.org/~airlied/linux:
drm/radeon: fix hotplug race at startup
drm/edid: add function to help find SADs
drm/i915: Avoid TP3 on CHV
drm/i915: remove HBR2 from chv supported list
Revert "drm/i915: Add eDP intermediate frequencies for CHV"
Revert "drm/i915: Allow parsing of variable size child device entries from VBT"
drm/atmel-hlcdc: Compile suspend/resume for PM_SLEEP only
drm/i915: Flag the execlists context object as dirty after every use
Dave Airlie [Fri, 21 Aug 2015 00:44:03 +0000 (10:44 +1000)]
Merge tag 'drm-intel-fixes-2015-08-20' of git://anongit.freedesktop.org/drm-intel into drm-fixes
Revert of a VBT parsing commit that should've been queued for drm-next,
not v4.2. The revert unbreaks Braswell among other things.
Also on Braswell removal of DP HBR2/TP3 and intermediate eDP frequency
support. The code was optimistically added based on incorrect
documentation; the platform does not support them. These are cc: stable.
Finally a gpu state fix from Chris, also cc: stable.
* tag 'drm-intel-fixes-2015-08-20' of git://anongit.freedesktop.org/drm-intel:
drm/i915: Avoid TP3 on CHV
drm/i915: remove HBR2 from chv supported list
Revert "drm/i915: Add eDP intermediate frequencies for CHV"
Revert "drm/i915: Allow parsing of variable size child device entries from VBT"
drm/i915: Flag the execlists context object as dirty after every use
Linus Torvalds [Fri, 21 Aug 2015 00:06:11 +0000 (17:06 -0700)]
Merge tag 'pm+acpi-4.2-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management and ACPI fixes from Rafael Wysocki:
"These fix a recent regression in the ACPI backlight code and a memory
leak in the Exynos cpufreq driver.
Specifics:
- Fix a recently introduced issue in the ACPI backlight code which
causes lockdep to complain about a circular lock dependency during
initialization (Hans de Goede).
- Fix a possible memory during initialization in the Exynos cpufreq
driver (Shailendra Verma)"
* tag 'pm+acpi-4.2-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
cpufreq: exynos: Fix for memory leak in case SoC name does not match
ACPI / video: Fix circular lock dependency issue in the video-detect code
Bjorn Helgaas [Thu, 20 Aug 2015 05:08:15 +0000 (00:08 -0500)]
PCI: Don't use 64-bit bus addresses on PA-RISC
Meelis and Helge reported that 3a9ad0b4fdcd ("PCI: Add pci_bus_addr_t")
caused HPMCs on A500 and hangs on rp5470.
PA-RISC does not set ARCH_DMA_ADDR_T_64BIT, even for 64-bit kernels, so
prior to 3a9ad0b4fdcd, we always used 32-bit PCI addresses. After 3a9ad0b4fdcd, we do use 64-bit PCI addresses in 64-bit kernels, and
apparently there's some PA-RISC problem related to them.
Linus Torvalds [Thu, 20 Aug 2015 19:08:38 +0000 (12:08 -0700)]
Merge tag 'sound-4.2' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"Here are a small collecton of sound fix patches.
The most significant one is the disablement of newly introduced
topology API. Its ABI couldn't be stabilized enough, so we decided to
delay for 4.3 in the end. Other than that, all oneliner fixes: a
USB-audio runtime PM fix and a couple of HD-audio quirks"
* tag 'sound-4.2' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ALSA: hda - Add dock support for Thinkpad W541 (17aa:2211)
ALSA: usb-audio: Fix runtime PM unbalance
ASoC: topology: Disable use from userspace
ASoC: topology: Add Kconfig option for topology
ALSA: hda - Fix the white noise on Dell laptop
Pull SCSI target fixes from Nicholas Bellinger:
"This contains a v4.2-rc specific RCU module unload regression bug-fix,
a long-standing iscsi-target bug-fix for duplicate target_xfer_tags
during NOP processing from Alexei, and two more small REPORT_LUNs
emulation related patches to make Solaris FC host LUN scanning happy
from Roland.
There is also one patch not included that allows target-core to limit
the number of fabric driver SGLs per I/O request using residuals, that
is currently required as a work-around for FC hosts which don't honor
EVPD block-limits settings. At this point, it will most likely become
for-next material"
* git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending:
target: Fix handling of small allocation lengths in REPORT LUNS
target: REPORT LUNS should return LUN 0 even for dynamic ACLs
target/iscsi: Fix double free of a TUR followed by a solicited NOPOUT
target: Perform RCU callback barrier before backend/fabric unload
Linus Torvalds [Thu, 20 Aug 2015 18:32:33 +0000 (11:32 -0700)]
Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/evalenti/linux-soc-thermal
Pull thermal fixes from Eduardo Valentin:
"Last minute fixes on the thermal-soc tree. There is a fix of a long
lasting bug in cpu cooling device, thanks for RMK for being pushing
this"
* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/evalenti/linux-soc-thermal:
thermal/cpu_cooling: update policy limits if clipped_freq < policy->max
thermal/cpu_cooling: rename max_freq as clipped_freq in notifier
thermal/cpu_cooling: rename cpufreq_val as clipped_freq
thermal/cpu_cooling: convert 'switch' block to 'if' block in notifier
thermal/cpu_cooling: quit early after updating policy
thermal/cpu_cooling: No need to initialize max_freq to 0
thermal: cpu_cooling: fix lockdep problems in cpu_cooling
thermal: power_allocator: do not use devm* interfaces
Guenter Roeck [Thu, 20 Aug 2015 10:27:21 +0000 (03:27 -0700)]
clocksource/imx: Fix boot with non-DT systems
Commit 6dd747825b20 ("ARM: imx: move timer resources into a structure")
moved initialization parameters into a data structure, but neglected to set
the irq field in that data structure for non-DT boots. This causes the system
to hang if a non-DT boot is attempted.
Fixes: 6dd747825b20 ("ARM: imx: move timer resources into a structure") Signed-off-by: Guenter Roeck <linux@roeck-us.net> Cc: Shawn Guo <shawn.guo@linaro.org> Cc: Daniel Lezcano <daniel.lezcano@linaro.org> Link: http://lkml.kernel.org/r/1440066441-13930-1-git-send-email-linux@roeck-us.net Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
David Vrabel [Thu, 20 Aug 2015 10:33:41 +0000 (11:33 +0100)]
x86/xen: make CONFIG_XEN depend on CONFIG_X86_LOCAL_APIC
Since commit feb44f1f7a4ac299d1ab1c3606860e70b9b89d69 (x86/xen:
Provide a "Xen PV" APIC driver to support >255 VCPUs) Xen guests need
a full APIC driver and thus should depend on X86_LOCAL_APIC.
This fixes an i386 build failure with !SMP && !CONFIG_X86_UP_APIC by
disabling Xen support in this configuration.
Users needing Xen support in a non-SMP i386 kernel will need to enable
CONFIG_X86_UP_APIC.
Signed-off-by: David Vrabel <david.vrabel@citrix.com> Cc: <stable@vger.kernel.org>
Yijing Wang [Mon, 17 Aug 2015 10:47:58 +0000 (18:47 +0800)]
PCI: Tolerate hierarchies with no Root Port
We should not assume any particular hardware topology. Commit d0751b98dfa3
("PCI: Add dev->has_secondary_link to track downstream PCIe links") relied
on the assumption that every PCIe hierarchy is rooted at a Root Port. But
we can't rely on any assumption about what hardware we will find; we just
have to deal with the world as it is.
On some platforms, PCIe devices (endpoints, switch upstream ports, etc.)
appear directly on the root bus, and there is no Root Port in the PCI bus
hierarchy. For example, Meelis observed these top-level devices on a
Sparc V245:
0000:02:00.0 PCI bridge to [bus 03-0d] Switch Upstream Port
0001:02:00.0 PCI bridge to [bus 03] PCIe to PCI/PCI-X Bridge
These devices *look* like they have links going upstream, but there really
are no upstream devices.
In set_pcie_port_type(), we used the parent device to figure out which side
of a switch port has a link, so if the parent device did not exist, we
dereferenced a NULL parent pointer.
Check whether the parent device exists before dereferencing it.
Meelis observed this oops on Sparc V245 and T2000. Ben Herrenschmidt says
this is also possible on IBM PowerVM guests on PowerPC.
[bhelgaas: changelog, comment] Link: http://lkml.kernel.org/r/alpine.LRH.2.20.1508122118210.18637@math.ut.ee Reported-by: Meelis Roos <mroos@linux.ee> Tested-by: Meelis Roos <mroos@linux.ee> Signed-off-by: Yijing Wang <wangyijing@huawei.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: David S. Miller <davem@davemloft.net>
Takashi Iwai [Wed, 19 Aug 2015 16:31:54 +0000 (18:31 +0200)]
Merge tag 'asoc-v4.2-disable-topology' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into for-linus
ASoC: Disable topology support for v4.2
The topology code merged in the v4.2 merge window introduced a new ABI
which was believed to be suitable for use but subsequently additional
work by the developers of this feature have revealed some problems that
need to be addressed. In order to allow this to be done without having
to support the initial ABI add Kconfig to disable the build and also add
some #error statements to the UAPI header so users can't use them.
Takashi Iwai [Wed, 19 Aug 2015 05:20:14 +0000 (07:20 +0200)]
ALSA: usb-audio: Fix runtime PM unbalance
The fix for deadlock in PM in commit [1ee23fe07ee8: ALSA: usb-audio:
Fix deadlocks at resuming] introduced a new check of in_pm flag.
However, the brainless patch author evaluated it in a wrong way
(logical AND instead of logical OR), thus usb_autopm_get_interface()
is wrongly called at probing, leading to unbalance of runtime PM
refcount.
This patch fixes it by correcting the logic.
Reported-by: Hans Yang <hansy@nvidia.com> Fixes: 1ee23fe07ee8 ('ALSA: usb-audio: Fix deadlocks at resuming') Cc: <stable@vger.kernel.org> [v3.15+] Signed-off-by: Takashi Iwai <tiwai@suse.de>
The current code is not mature enough, the API should allow a single
protocol to be specified. Also, the current code contains heuristics
that will depend on module load order.
Signed-off-by: David Härdeman <david@hardeman.nu> Acked-by: Antti Seppälä <a.seppala@gmail.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
The current code is not mature enough, the API should allow a single
protocol to be specified. Also, the current code contains heuristics
that will depend on module load order.
Signed-off-by: David Härdeman <david@hardeman.nu> Acked-by: Antti Seppälä <a.seppala@gmail.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
The current code is not mature enough, the API should allow a single
protocol to be specified. Also, the current code contains heuristics
that will depend on module load order.
Signed-off-by: David Härdeman <david@hardeman.nu> Acked-by: Antti Seppälä <a.seppala@gmail.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
The current code is not mature enough, the API should allow a single
protocol to be specified. Also, the current code contains heuristics
that will depend on module load order.
Signed-off-by: David Härdeman <david@hardeman.nu> Acked-by: Antti Seppälä <a.seppala@gmail.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
The current code is not mature enough, the API should allow a single
protocol to be specified. Also, the current code contains heuristics
that will depend on module load order.
Signed-off-by: David Härdeman <david@hardeman.nu> Acked-by: Antti Seppälä <a.seppala@gmail.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
The current code is not mature enough, the API should allow a single
protocol to be specified. Also, the current code contains heuristics
that will depend on module load order.
Signed-off-by: David Härdeman <david@hardeman.nu> Acked-by: Antti Seppälä <a.seppala@gmail.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
The current code is not mature enough, the API should allow a single
protocol to be specified. Also, the current code contains heuristics
that will depend on module load order.
Signed-off-by: David Härdeman <david@hardeman.nu> Acked-by: Antti Seppälä <a.seppala@gmail.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
drm/i915: Allow parsing of variable size child device entries from VBT
That commit is not valid for v4.2, however it will be valid for v4.3. It
was simply queued too early.
The referenced regressing commit is just fine until the size of struct
common_child_dev_config changes, and that won't happen until
v4.3. Indeed, the expected size checks here rely on the increased size
of the struct, breaking new platforms.
Fixes: 047fe6e6db91 ("drm/i915: Allow parsing of variable size child device entries from VBT") Cc: Daniel Vetter <daniel@ffwll.ch> Cc: David Weinehall <david.weinehall@linux.intel.com> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Roland Dreier [Fri, 14 Aug 2015 04:59:19 +0000 (21:59 -0700)]
target: Fix handling of small allocation lengths in REPORT LUNS
REPORT LUNS should not fail just because the allocation length is less
than 16. The relevant section of SPC-4 is:
4.2.5.6 Allocation length
The ALLOCATION LENGTH field specifies the maximum number of bytes or
blocks that an application client has allocated in the Data-In
Buffer. The ALLOCATION LENGTH field specifies bytes unless a
different requirement is stated in the command definition.
An allocation length of zero specifies that no data shall be
transferred. This condition shall not be considered an error.
So we should just truncate our response rather than return an error.
Signed-off-by: Roland Dreier <roland@purestorage.com> Signed-off-by: Spencer Baugh <sbaugh@catern.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Sven Eckelmann [Tue, 18 Aug 2015 11:37:01 +0000 (13:37 +0200)]
batman-adv: Fix memory leak on tt add with invalid vlan
The object tt_local is allocated with kmalloc and not initialized when the
function batadv_tt_local_add checks for the vlan. But this function can
only cleanup the object when the (not yet initialized) reference counter of
the object is 1. This is unlikely and thus the object would leak when the
vlan could not be found.
Instead the uninitialized object tt_local has to be freed manually and the
pointer has to set to NULL to avoid calling the function which would try to
decrement the reference counter of the not existing object.
CID: 1316518 Fixes: 354136bcc3c4 ("batman-adv: fix kernel crash due to missing NULL checks") Signed-off-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Tue, 18 Aug 2015 19:17:36 +0000 (12:17 -0700)]
Merge tag 'dmaengine-fix-4.2-rc8' of git://git.infradead.org/users/vkoul/slave-dma
Pull dmaengine fix from Vinod Koul:
"We recently found issue with dma_request_slave_channel() API causing
privatecnt value to go bad. This is fixed by balancing the privatecnt"
* tag 'dmaengine-fix-4.2-rc8' of git://git.infradead.org/users/vkoul/slave-dma:
dmaengine: fix balance of privatecnt inc/dec operations
Mark Brown [Tue, 18 Aug 2015 05:59:25 +0000 (22:59 -0700)]
ASoC: topology: Disable use from userspace
Since the topology API is still in sufficient flux for changes to be
identified disable the use of the userspace ABI by adding #error
statements to the code, ensuring that nobody relies on the headers as
currently defined. It is expected that this change will be reverted for
v4.3.
Linus Torvalds [Tue, 18 Aug 2015 14:55:05 +0000 (07:55 -0700)]
Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux
Pull drm fixes from Dave Airlie:
"These came in late last week, I wanted to look over the mst one before
forwarding, but it seems good.
Just three i915 and one MST fix"
* 'drm-fixes' of git://people.freedesktop.org/~airlied/linux:
drm/i915: Commit planes on each crtc separately.
drm/i915: calculate primary visibility changes instead of calling from set_config
drm/i915: Only dither on 6bpc panels
drm/dp/mst: Remove port after removing connector.
Eric Dumazet [Mon, 17 Aug 2015 17:18:48 +0000 (10:18 -0700)]
timer: Write timer->flags atomically
lock_timer_base() cannot prevent the following :
CPU1 ( in __mod_timer()
timer->flags |= TIMER_MIGRATING;
spin_unlock(&base->lock);
base = new_base;
spin_lock(&base->lock);
// The next line clears TIMER_MIGRATING
timer->flags &= ~TIMER_BASEMASK;
CPU2 (in lock_timer_base())
see timer base is cpu0 base
spin_lock_irqsave(&base->lock, *flags);
if (timer->flags == tf)
return base; // oops, wrong base
timer->flags |= base->cpu // too late
We must write timer->flags in one go, otherwise we can fool other cpus.
Fixes: bc7a34b8b9eb ("timer: Reduce timer migration overhead if disabled") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Jon Christopherson <jon@jons.org> Cc: David Miller <davem@davemloft.net> Cc: xen-devel@lists.xen.org Cc: david.vrabel@citrix.com Cc: Sander Eikelenboom <linux@eikelenboom.it> Link: http://lkml.kernel.org/r/1439831928.32680.11.camel@edumazet-glaptop2.roam.corp.google.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Thomas Gleixner <tglx@linutronix.de>
Mark Brown [Sat, 15 Aug 2015 15:24:20 +0000 (08:24 -0700)]
ASoC: topology: Add Kconfig option for topology
Allow the topology code to be compiled out so that users who don't need
topology don't need to havve the code compiled in, saving them some
memory.
Some more configuration could be added to remove some of the hooks into
the core data structures but that is probably best done with some
refactoring to use functions to do the updates of the data structures
rather than ifdefing in the code as we'd need to do at the minute.
Suggested-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Mark Brown <broonie@kernel.org>
Linus Torvalds [Mon, 17 Aug 2015 23:20:45 +0000 (16:20 -0700)]
Merge branch 'for-4.2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata
Pull libata fixes from Tejun Heo:
"Three minor device-specific fixes and revert of NCQ autosense added
during this -rc1.
It turned out that NCQ autosense as currently implemented interferes
with the usual error handling behavior. It will be revisited in the
near future"
* 'for-4.2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata:
ata: ahci_brcmstb: Fix misuse of IS_ENABLED
sata_sx4: Check return code from pdc20621_i2c_read()
Revert "libata: Implement NCQ autosense"
Revert "libata: Implement support for sense data reporting"
Revert "libata-eh: Set 'information' field for autosense"
ata: ahci_brcmstb: Fix warnings with CONFIG_PM_SLEEP=n
Linus Torvalds [Mon, 17 Aug 2015 23:15:26 +0000 (16:15 -0700)]
Merge branch 'for-4.2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
Pull cgroup fix from Tejun Heo:
"A fix for a subtle bug introduced back during 3.17 cycle which
interferes with setting configurations under specific conditions"
* 'for-4.2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
cpuset: use trialcs->mems_allowed as a temp variable
Ivan Vecera [Fri, 14 Aug 2015 20:30:01 +0000 (22:30 +0200)]
be2net: avoid vxlan offloading on multichannel configs
VxLAN offloading is not functional if the NIC is running in multichannel
mode (UMC, FLEX-10, VNIC...). Enabling this additionally kills whole
connectivity through the NIC and the device needs to be down and up to
restore it. The firmware should take care about it and does not allow
the conversion of interface to tunnel type (be_cmd_manage_iface) or should
support VxLAN offloading if multichannel config is enabled.
I have tested this on the latest available firmware (10.6.144.21).
Result:
[root@sm-04 ~]# ip link set enp5s0f0 up[root@sm-04 ~]# ip addr add 172.30.10.50/24 dev enp5s0f0
[root@sm-04 ~]# ping -c 3 172.30.10.254PING 172.30.10.254 (172.30.10.254) 56(84) bytes of data.
64 bytes from 172.30.10.254: icmp_seq=1 ttl=64 time=0.317 ms
64 bytes from 172.30.10.254: icmp_seq=2 ttl=64 time=0.187 ms
64 bytes from 172.30.10.254: icmp_seq=3 ttl=64 time=0.188 ms
--- 172.30.10.254 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2000ms
rtt min/avg/max/mdev = 0.187/0.230/0.317/0.063 ms
[root@sm-04 ~]# ip link add link enp5s0f0 vxlan10 type vxlan id 10 remote 172.30.10.60 dstport 4789
[root@sm-04 ~]# ip link set vxlan10 up
[ 7900.442811] be2net 0000:05:00.0: Enabled VxLAN offloads for UDP port 4789
[ 7900.455722] be2net 0000:05:00.1: Enabled VxLAN offloads for UDP port 4789
[ 7900.468635] be2net 0000:05:00.2: Enabled VxLAN offloads for UDP port 4789
[ 7900.481553] be2net 0000:05:00.3: Enabled VxLAN offloads for UDP port 4789
[root@sm-04 ~]# ping -c 3 172.30.10.254
PING 172.30.10.254 (172.30.10.254) 56(84) bytes of data.
[root@sm-04 ~]# ip link set vxlan10 down
[ 7959.434093] be2net 0000:05:00.0: Disabled VxLAN offloads for UDP port 4789
[ 7959.444792] be2net 0000:05:00.1: Disabled VxLAN offloads for UDP port 4789
[ 7959.455592] be2net 0000:05:00.2: Disabled VxLAN offloads for UDP port 4789
[ 7959.466416] be2net 0000:05:00.3: Disabled VxLAN offloads for UDP port 4789
[root@sm-04 ~]# ip link del vxlan10
[root@sm-04 ~]# ping -c 3 172.30.10.254
PING 172.30.10.254 (172.30.10.254) 56(84) bytes of data.
[root@sm-04 ~]# ip link set enp5s0f0 down
[root@sm-04 ~]# ip link set enp5s0f0 up
[ 8071.019003] be2net 0000:05:00.0 enp5s0f0: Link is Up
[root@sm-04 ~]# ping -c 3 172.30.10.254
PING 172.30.10.254 (172.30.10.254) 56(84) bytes of data.
64 bytes from 172.30.10.254: icmp_seq=1 ttl=64 time=0.318 ms
64 bytes from 172.30.10.254: icmp_seq=2 ttl=64 time=0.196 ms
64 bytes from 172.30.10.254: icmp_seq=3 ttl=64 time=0.194 ms
--- 172.30.10.254 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2000ms
rtt min/avg/max/mdev = 0.194/0.236/0.318/0.057 ms
Cc: Sathya Perla <sathya.perla@avagotech.com> Cc: Ajit Khaparde <ajit.khaparde@avagotech.com> Cc: Padmanabh Ratnakar <padmanabh.ratnakar@avagotech.com> Cc: Sriharsha Basavapatna <sriharsha.basavapatna@avagotech.com> Signed-off-by: Ivan Vecera <ivecera@redhat.com> Acked-by: Ajit Khaparde <ajit.khaparde@avagotech.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Martin KaFai Lau [Fri, 14 Aug 2015 18:05:54 +0000 (11:05 -0700)]
ipv6: Fix a potential deadlock when creating pcpu rt
rt6_make_pcpu_route() is called under read_lock(&table->tb6_lock).
rt6_make_pcpu_route() calls ip6_rt_pcpu_alloc(rt) which then
calls dst_alloc(). dst_alloc() _may_ call ip6_dst_gc() which takes
the write_lock(&tabl->tb6_lock). A visualized version:
Fixes: d52d3997f843 ("pv6: Create percpu rt6_info") Signed-off-by: Martin KaFai Lau <kafai@fb.com> CC: Hannes Frederic Sowa <hannes@stressinduktion.org> Reported-by: Steinar H. Gunderson <sgunderson@bigfoot.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Martin KaFai Lau [Fri, 14 Aug 2015 18:05:53 +0000 (11:05 -0700)]
ipv6: Add rt6_make_pcpu_route()
It is a prep work for fixing a potential deadlock when creating
a pcpu rt.
The current rt6_get_pcpu_route() will also create a pcpu rt if one does not
exist. This patch moves the pcpu rt creation logic into another function,
rt6_make_pcpu_route().
Signed-off-by: Martin KaFai Lau <kafai@fb.com> CC: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Martin KaFai Lau [Fri, 14 Aug 2015 18:05:52 +0000 (11:05 -0700)]
ipv6: Remove un-used argument from ip6_dst_alloc()
After 4b32b5ad31a6 ("ipv6: Stop rt6_info from using inet_peer's metrics"),
ip6_dst_alloc() does not need the 'table' argument. This patch
cleans it up.
Signed-off-by: Martin KaFai Lau <kafai@fb.com> CC: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Igor Plyatov [Fri, 14 Aug 2015 17:11:02 +0000 (20:11 +0300)]
net: phy: workaround for buggy cable detection by LAN8700 after cable plugging
* Due to HW bug, LAN8700 sometimes does not detect presence of energy in the
Ethernet cable in Energy Detect Power-Down mode (e.g while EDPWRDOWN bit is
set, the ENERGYON bit does not asserted sometimes). This is a common bug of
LAN87xx family of PHY chips.
* The lan87xx_read_status() was improved to acquire ENERGYON bit. Its previous
algorythm still not reliable on 100 % and sometimes skip cable plugging.
Signed-off-by: Igor Plyatov <plyatov@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Guillaume Nault [Fri, 14 Aug 2015 08:42:56 +0000 (10:42 +0200)]
ppp: fix device unregistration upon netns deletion
PPP devices may get automatically unregistered when their network
namespace is getting removed. This happens if the ppp control plane
daemon (e.g. pppd) exits while it is the last user of this namespace.
This leads to several races:
* ppp_exit_net() may destroy the per namespace idr (pn->units_idr)
before all file descriptors were released. Successive ppp_release()
calls may then cleanup PPP devices with ppp_shutdown_interface() and
try to use the already destroyed idr.
* Automatic device unregistration may also happen before the
ppp_release() call for that device gets executed. Once called on
the file owning the device, ppp_release() will then clean it up and
try to unregister it a second time.
To fix these issues, operations defined in ppp_shutdown_interface() are
moved to the PPP device's ndo_uninit() callback. This allows PPP
devices to be properly cleaned up by unregister_netdev() and friends.
So checking for ppp->owner is now an accurate test to decide if a PPP
device should be unregistered.
Setting ppp->owner is done in ppp_create_interface(), before device
registration, in order to avoid unprotected modification of this field.
Finally, ppp_exit_net() now starts by unregistering all remaining PPP
devices to ensure that none will get unregistered after the call to
idr_destroy().
Signed-off-by: Guillaume Nault <g.nault@alphalink.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
Shaohui Xie [Fri, 14 Aug 2015 04:23:40 +0000 (12:23 +0800)]
net: phy: fix PHY_RUNNING in phy_state_machine
Currently, if phy state is PHY_RUNNING, we always register a CHANGE
when phy works in polling or interrupt ignored, this will make the
adjust_link being called even the phy link did Not changed.
checking the phy link to make sure the link did changed before we
register a CHANGE, if link did not changed, we do nothing.
Signed-off-by: Shaohui Xie <Shaohui.Xie@freescale.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Calvin Owens [Thu, 13 Aug 2015 21:21:34 +0000 (14:21 -0700)]
Revert "net: limit tcp/udp rmem/wmem to SOCK_{RCV,SND}BUF_MIN"
Commit 8133534c760d4083 ("net: limit tcp/udp rmem/wmem to
SOCK_{RCV,SND}BUF_MIN") modified four sysctls to enforce that the values
written to them are not less than SOCK_MIN_{RCV,SND}BUF.
That change causes 4096 to no longer be accepted as a valid value for
'min' in tcp_wmem and udp_wmem_min. 4096 has been the default for both
of those sysctls for a long time, and unfortunately seems to be an
extremely popular setting. This change breaks a large number of sysctl
configurations at Facebook.
That commit referred to b1cb59cf2efe7971 ("net: sysctl_net_core: check
SNDBUF and RCVBUF for min length"), which choose to use the SOCK_MIN
constants as the lower limits to avoid nasty bugs. But AFAICS, a limit
of SOCK_MIN_SNDBUF isn't necessary to do that: the BUG_ON cited in the
commit message seems to have happened because unix_stream_sendmsg()
expects a minimum of a full page (ie SK_MEM_QUANTUM) and the math broke,
not because it had less than SOCK_MIN_SNDBUF allocated.
This particular issue doesn't seem to affect TCP however: using a
setting of "1 1 1" for tcp_{r,w}mem works, although it's obviously
suboptimal. SK_MEM_QUANTUM would be a nice minimum, but it's 64K on
some archs, so there would still be breakage.
Since a value of one doesn't seem to cause any problems, we can drop the
minimum 8133534c added to fix this.
Fixes: 8133534c760d4083 ("net: limit tcp/udp rmem/wmem to SOCK_MIN...") Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: Sorin Dumitru <sorin@returnze.ro> Signed-off-by: Calvin Owens <calvinowens@fb.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Robert Baldyga [Fri, 7 Aug 2015 10:26:47 +0000 (12:26 +0200)]
dmaengine: fix balance of privatecnt inc/dec operations
This patch increments privatecnt value and set DMA_PRIVATE in device
caps in dma_request_slave_channel() function. This is needed to keep
privatecnt increment/decrement balance.
As function dma_release_channel() decrements privatecnt counter, we need
to increment it when channel is requested. Otherwise privatecnt drops
into negatives after few dma_release_channel() calls.
Reported-by: Krzysztof Kozlowski <k.kozlowski@samsung.com> Signed-off-by: Robert Baldyga <r.baldyga@samsung.com> Signed-off-by: Vinod Koul <vinod.koul@intel.com>
Pull crypto fixes from Herbert Xu:
"This fixes the following issues:
- a regression caused by the conversion of IPsec ESP to the new AEAD
interface: ESN with authencesn no longer works because it relied on
the AD input SG list having a specific layout which is no longer
the case. In linux-next authencesn is fixed properly and no longer
assumes anything about the SG list format. While for this release
a minimal fix is applied to authencesn so that it works with the
new linear layout.
- fix memory corruption caused by bogus index in the caam hash code.
- fix powerpc nx SHA hashing which could cause module load failures
if module signature verification is enabled"
* git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
crypto: caam - fix memory corruption in ahash_final_ctx
crypto: nx - respect sg limit bounds when building sg lists for SHA
crypto: authencesn - Fix breakage with new ESP code
Chris Wilson [Fri, 14 Aug 2015 11:59:19 +0000 (12:59 +0100)]
drm/i915: Flag the execlists context object as dirty after every use
Everytime we use the logical context with execlists it becomes dirty (as
the hardware will write the new register values afterwards, as well as
the GPU state that will be used). We need to then flag the context as
dirty everytime since after a swap-out/swap-in cycle the dirty flag will
be cleared, and a further swap-out cycle will then loose the most recent
GPU state.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: stable@vger.kernel.org Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Linus Torvalds [Sun, 16 Aug 2015 22:44:33 +0000 (15:44 -0700)]
Merge tag 'armsoc-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc
Pull ARM SoC fixes from Olof Johansson:
"A smallish batch of fixes, a little more than expected this late, but
all fixes are contained to their platforms and seem reasonably low
risk:
- a somewhat large SMP fix for ux500 that still seemed warranted to
include here
- OMAP DT fixes for pbias regulator specification that broke due to
some DT reshuffling
- PCIe IRQ routing bugfix for i.MX
- networking fixes for keystone
- runtime PM for OMAP GPMC
- a couple of error path bug fixes for exynos"
* tag 'armsoc-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
ARM: dts: keystone: Fix the mdio bindings by moving it to soc specific file
ARM: dts: keystone: fix the clock node for mdio
memory: omap-gpmc: Don't try to save uninitialized GPMC context
ARM: imx6: correct i.MX6 PCIe interrupt routing
ARM: ux500: add an SMP enablement type and move cpu nodes
ARM: dts: dra7: Fix broken pbias device creation
ARM: dts: OMAP5: Fix broken pbias device creation
ARM: dts: OMAP4: Fix broken pbias device creation
ARM: dts: omap243x: Fix broken pbias device creation
ARM: EXYNOS: fix double of_node_put() on error path
ARM: EXYNOS: Fix potentian kfree() of ro memory
Linus Torvalds [Sun, 16 Aug 2015 22:11:25 +0000 (15:11 -0700)]
Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Merge x86 fixes from Ingo Molnar:
"Two followup fixes related to the previous LDT fix"
Also applied a further FPU emulation fix from Andy Lutomirski to the
branch before actually merging it.
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
x86/ldt: Further fix FPU emulation
x86/ldt: Correct FPU emulation access to LDT
x86/ldt: Correct LDT access in single stepping logic
Olof Johansson [Sun, 16 Aug 2015 19:29:57 +0000 (21:29 +0200)]
Merge tag 'keystone-dts-late-fixes-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/ssantosh/linux-keystone into fixes
ARM: Couple of Keysyone MDIO DTS fixes for 4.2-rc6+
These are necessary to get the NIC card working on all Keystone
EVMs. Couple of boards are broken without these two fixes.
* tag 'keystone-dts-late-fixes-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/ssantosh/linux-keystone:
ARM: dts: keystone: Fix the mdio bindings by moving it to soc specific file
ARM: dts: keystone: fix the clock node for mdio
Markos Chandras [Thu, 13 Aug 2015 07:47:59 +0000 (08:47 +0100)]
MIPS: Fix seccomp syscall argument for MIPS64
Commit 4c21b8fd8f14 ("MIPS: seccomp: Handle indirect system calls (o32)")
fixed indirect system calls on O32 but it also introduced a bug for MIPS64
where it erroneously modified the v0 (syscall) register with the assumption
that the sycall offset hasn't been taken into consideration. This breaks
seccomp on MIPS64 n64 and n32 ABIs. We fix this by replacing the addition
with a move instruction.
Linus Torvalds [Sat, 15 Aug 2015 20:54:53 +0000 (13:54 -0700)]
Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"This has two libfc fixes for bugs causing rare crashes, one iscsi fix
for a potential hang on shutdown, and a fix for an I/O blocksize issue
which caused a regression"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
sd: Fix maximum I/O size for BLOCK_PC requests
libfc: Fix fc_fcp_cleanup_each_cmd()
libfc: Fix fc_exch_recv_req() error path
libiscsi: Fix host busy blocking during connection teardown
Dave Airlie [Sat, 15 Aug 2015 04:51:31 +0000 (14:51 +1000)]
Merge tag 'drm-intel-fixes-2015-08-14' of git://anongit.freedesktop.org/drm-intel into drm-next
three display fixes for Intel.
* tag 'drm-intel-fixes-2015-08-14' of git://anongit.freedesktop.org/drm-intel:
drm/i915: Commit planes on each crtc separately.
drm/i915: calculate primary visibility changes instead of calling from set_config
drm/i915: Only dither on 6bpc panels
thermal/cpu_cooling: update policy limits if clipped_freq < policy->max
policy->max is the maximum allowed frequency defined by user and
clipped_freq is the maximum that thermal constraints allow.
If clipped_freq is lower than policy->max, then we need to readjust
policy->max.
But, if clipped_freq is greater than policy->max, we don't need to do
anything. We used to call cpufreq_verify_within_limits() in this case,
but it doesn't change anything in this case.
Lets skip this unnecessary call and write a comment that explains this.
thermal/cpu_cooling: convert 'switch' block to 'if' block in notifier
We just need to take care of single event here and there is no need to
increase indentation level of most of the code (which causes lines
longer that 80 columns to break).
thermal/cpu_cooling: quit early after updating policy
If a valid cpufreq_dev is found for policy->cpu, we should update the
policy and quit the for loop. There is no need to keep traversing the
list of cpufreq_dev's.
Russell King [Wed, 12 Aug 2015 09:52:16 +0000 (15:22 +0530)]
thermal: cpu_cooling: fix lockdep problems in cpu_cooling
A recent change to the cpu_cooling code introduced a AB-BA deadlock
scenario between the cpufreq_policy_notifier_list rwsem and the
cooling_cpufreq_lock. This is caused by cooling_cpufreq_lock being held
before the registration/removal of the notifier block (an operation
which takes the rwsem), and the notifier code itself which takes the
locks in the reverse order:
======================================================
[ INFO: possible circular locking dependency detected ]
3.18.0+ #1453 Not tainted
-------------------------------------------------------
rc.local/770 is trying to acquire lock:
(cooling_cpufreq_lock){+.+.+.}, at: [<c04abfc4>] cpufreq_thermal_notifier+0x34/0xfc
but task is already holding lock:
((cpufreq_policy_notifier_list).rwsem){++++.+}, at: [<c0042f04>] __blocking_notifier_call_chain+0x34/0x68
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
Solve this by moving to finer grained locking - use one mutex to protect
the cpufreq_dev_list as a whole, and a separate lock to ensure correct
ordering of cpufreq notifier registration and removal.
cooling_list_lock is taken within cooling_cpufreq_lock on
(un)registration to preserve the behavior of the code, i.e. to
atomically add/remove to the list and (un)register the notifier.
Fixes: 2dcd851fe4b4 ("thermal: cpu_cooling: Update always cpufreq policy with Reviewed-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Eduardo Valentin <edubezval@gmail.com>
Linus Torvalds [Sat, 15 Aug 2015 00:27:52 +0000 (17:27 -0700)]
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull KVM fixes from Paolo Bonzini:
"Just two very small & simple patches"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: x86: Use adjustment in guest cycles when handling MSR_IA32_TSC_ADJUST
KVM: x86: zero IDT limit on entry to SMM
Linus Torvalds [Sat, 15 Aug 2015 00:05:26 +0000 (17:05 -0700)]
Merge branch 'akpm' (patches from Andrew)
Merge fixes from Andrew Morton:
"11 fixes"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
Update maintainers for DRM STI driver
mm: cma: mark cma_bitmap_maxno() inline in header
zram: fix pool name truncation
memory-hotplug: fix wrong edge when hot add a new node
.mailmap: Andrey Ryabinin has moved
ipc/sem.c: update/correct memory barriers
mm/hwpoison: fix panic due to split huge zero page
ipc,sem: remove uneeded sem_undo_list lock usage in exit_sem()
ipc,sem: fix use after free on IPC_RMID after a task using same semaphore set exits
mm/hwpoison: fix fail isolate hugetlbfs page w/ refcount held
mm/hwpoison: fix page refcount of unknown non LRU page
Signed-off-by: Benjamin Gaignard <benjamin.gaignard@linaro.org> Cc: Vincent Abriou <vincent.abriou@st.com> Cc: Dave Airlie <airlied@linux.ie> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Gregory Fong [Fri, 14 Aug 2015 22:35:21 +0000 (15:35 -0700)]
mm: cma: mark cma_bitmap_maxno() inline in header
cma_bitmap_maxno() was marked as static and not static inline, which can
cause warnings about this function not being used if this file is included
in a file that does not call that function, and violates the conventions
used elsewhere. The two options are to move the function implementation
back to mm/cma.c or make it inline here, and it's simple enough for the
latter to make sense.
Signed-off-by: Gregory Fong <gregory.0xf0@gmail.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Sasha Levin <sasha.levin@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
However, it defines pool name buffer to be only 8 bytes long (minus
trailing zero), which means that we can have only 1000 pool names: zram0
-- zram999.
With CONFIG_ZSMALLOC_STAT enabled an attempt to create a device zram1000
can fail if device zram100 already exists, because snprintf() will
truncate new pool name to zram100 and pass it debugfs_create_dir(),
causing:
debugfs dir <zram100> creation failed
zram: Error creating memory pool
... and so on.
Fix it by passing zram->disk->disk_name to zram_meta_alloc() instead of
divice_id. We construct zram%d name earlier and keep it as a ->disk_name,
no need to snprintf() it again.
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: Minchan Kim <minchan@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Xishi Qiu [Fri, 14 Aug 2015 22:35:16 +0000 (15:35 -0700)]
memory-hotplug: fix wrong edge when hot add a new node
When we add a new node, the edge of memory may be wrong.
e.g. system has 4 nodes, and node3 is movable, node3 mem:[24G-32G],
1. hotremove the node3,
2. then hotadd node3 with a part of memory, mem:[26G-30G],
3. call hotadd_new_pgdat()
free_area_init_node()
get_pfn_range_for_nid()
4. it will return wrong start_pfn and end_pfn, because we have not
update the memblock.
This patch also fixes a BUG_ON during hot-addition, please see
http://marc.info/?l=linux-kernel&m=142961156129456&w=2
Manfred Spraul [Fri, 14 Aug 2015 22:35:10 +0000 (15:35 -0700)]
ipc/sem.c: update/correct memory barriers
sem_lock() did not properly pair memory barriers:
!spin_is_locked() and spin_unlock_wait() are both only control barriers.
The code needs an acquire barrier, otherwise the cpu might perform read
operations before the lock test.
As no primitive exists inside <include/spinlock.h> and since it seems
noone wants another primitive, the code creates a local primitive within
ipc/sem.c.
With regards to -stable:
The change of sem_wait_array() is a bugfix, the change to sem_lock() is a
nop (just a preprocessor redefinition to improve the readability). The
bugfix is necessary for all kernels that use sem_wait_array() (i.e.:
starting from 3.10).
Huge zero page is allocated if page fault w/o FAULT_FLAG_WRITE flag.
The get_user_pages_fast() which called in madvise_hwpoison() will get
huge zero page if the page is not allocated before. Huge zero page is a
tranparent huge page, however, it is not an anonymous page.
memory_failure will split the huge zero page and trigger
BUG_ON(is_huge_zero_page(page));
After commit 98ed2b0052e6 ("mm/memory-failure: give up error handling
for non-tail-refcounted thp"), memory_failure will not catch non anon
thp from madvise_hwpoison path and this bug occur.
Fix it by catching non anon thp in memory_failure in order to not split
huge zero page in madvise_hwpoison path.
After this patch:
Injecting memory failure for page 0x202800 at 0x7fd8ae800000
MCE: 0x202800: non anonymous thp
[...]
[akpm@linux-foundation.org: remove second split, per Wanpeng] Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com> Acked-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
ipc,sem: remove uneeded sem_undo_list lock usage in exit_sem()
After we acquire the sma->sem_perm lock in exit_sem(), we are protected
against a racing IPC_RMID operation. Also at that point, we are the last
user of sem_undo_list. Therefore it isn't required that we acquire or use
ulp->lock.
Signed-off-by: Herton R. Krzesinski <herton@redhat.com> Acked-by: Manfred Spraul <manfred@colorfullife.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Rafael Aquini <aquini@redhat.com> CC: Aristeu Rozanski <aris@redhat.com> Cc: David Jeffery <djeffery@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
ipc,sem: fix use after free on IPC_RMID after a task using same semaphore set exits
The current semaphore code allows a potential use after free: in
exit_sem we may free the task's sem_undo_list while there is still
another task looping through the same semaphore set and cleaning the
sem_undo list at freeary function (the task called IPC_RMID for the same
semaphore set).
For example, with a test program [1] running which keeps forking a lot
of processes (which then do a semop call with SEM_UNDO flag), and with
the parent right after removing the semaphore set with IPC_RMID, and a
kernel built with CONFIG_SLAB, CONFIG_SLAB_DEBUG and
CONFIG_DEBUG_SPINLOCK, you can easily see something like the following
in the kernel log:
I wasn't able to trigger any badness on a recent kernel without the
proper config debugs enabled, however I have softlockup reports on some
kernel versions, in the semaphore code, which are similar as above (the
scenario is seen on some servers running IBM DB2 which uses semaphore
syscalls).
The patch here fixes the race against freeary, by acquiring or waiting
on the sem_undo_list lock as necessary (exit_sem can race with freeary,
while freeary sets un->semid to -1 and removes the same sem_undo from
list_proc or when it removes the last sem_undo).
After the patch I'm unable to reproduce the problem using the test case
[1].
void create_set()
{
int i, j;
pid_t p;
union {
int val;
struct semid_ds *buf;
unsigned short int *array;
struct seminfo *__buf;
} un;
/* Create and initialize semaphore set */
for (i = 0; i < NSET; i++) {
sid[i] = semget(IPC_PRIVATE , NSEM, 0644 | IPC_CREAT);
if (sid[i] < 0) {
perror("semget");
exit(EXIT_FAILURE);
}
}
un.val = 0;
for (i = 0; i < NSET; i++) {
for (j = 0; j < NSEM; j++) {
if (semctl(sid[i], j, SETVAL, un) < 0)
perror("semctl");
}
}
/* Launch threads that operate on semaphore set */
for (i = 0; i < NSEM * NSET * NSET; i++) {
p = fork();
if (p < 0)
perror("fork");
if (p == 0)
thread();
}
/* Free semaphore set */
for (i = 0; i < NSET; i++) {
if (semctl(sid[i], NSEM, IPC_RMID))
perror("IPC_RMID");
}
/* Wait for forked processes to exit */
while (wait(NULL)) {
if (errno == ECHILD)
break;
};
}
int main(int argc, char **argv)
{
pid_t p;
srand(time(NULL));
while (1) {
p = fork();
if (p < 0) {
perror("fork");
exit(EXIT_FAILURE);
}
if (p == 0) {
create_set();
goto end;
}
/* Wait for forked processes to exit */
while (wait(NULL)) {
if (errno == ECHILD)
break;
};
}
end:
return 0;
}
[akpm@linux-foundation.org: use normal comment layout] Signed-off-by: Herton R. Krzesinski <herton@redhat.com> Acked-by: Manfred Spraul <manfred@colorfullife.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Rafael Aquini <aquini@redhat.com> CC: Aristeu Rozanski <aris@redhat.com> Cc: David Jeffery <djeffery@redhat.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Wanpeng Li [Fri, 14 Aug 2015 22:34:59 +0000 (15:34 -0700)]
mm/hwpoison: fix fail isolate hugetlbfs page w/ refcount held
Hugetlbfs pages will get a refcount in get_any_page() or
madvise_hwpoison() if soft offlining through madvise. The refcount which
is held by the soft offline path should be released if we fail to isolate
hugetlbfs pages.
Fix it by reducing the refcount for both isolation success and failure.
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com> Acked-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: <stable@vger.kernel.org> [3.9+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Wanpeng Li [Fri, 14 Aug 2015 22:34:56 +0000 (15:34 -0700)]
mm/hwpoison: fix page refcount of unknown non LRU page
After trying to drain pages from pagevec/pageset, we try to get reference
count of the page again, however, the reference count of the page is not
reduced if the page is still not on LRU list.
Fix it by adding the put_page() to drop the page reference which is from
__get_any_page().
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com> Acked-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: <stable@vger.kernel.org> [3.9+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Fri, 14 Aug 2015 18:06:43 +0000 (11:06 -0700)]
Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer fix from Ingo Molnar:
"A single clocksource driver suspend/resume fix"
* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
clockevents/drivers/sh_cmt: Only perform clocksource suspend/resume if enabled