]> git.kernelconcepts.de Git - karo-tx-linux.git/log
karo-tx-linux.git
6 years agoperf: qcom_l2: fix column exclusion check
Neil Leeder [Mon, 24 Jul 2017 21:17:02 +0000 (17:17 -0400)]
perf: qcom_l2: fix column exclusion check

The check for column exclusion did not verify that the event being
checked was an L2 event, and not a software event.
Software events should not be checked for column exclusion.
This resulted in a group with both software and L2 events sometimes
incorrectly rejecting the L2 event for column exclusion and
not counting it.

Add a check for PMU type before applying column exclusion logic.

Fixes: 21bdbb7102edeaeb ("perf: add qcom l2 cache perf events driver")
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Neil Leeder <nleeder@codeaurora.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
6 years agopowerpc/Makefile: Fix ld version check with 64-bit LE-only toolchain
Michael Ellerman [Wed, 26 Jul 2017 05:00:42 +0000 (15:00 +1000)]
powerpc/Makefile: Fix ld version check with 64-bit LE-only toolchain

In commit efe0160cfd40 ("powerpc/64: Linker on-demand sfpr functions
for modules"), we added an ld version check early in the powerpc
top-level Makefile.

Because the Makefile runs before the kernel config is setup, the
checks for CONFIG_CPU_LITTLE_ENDIAN etc. all take the default case. So
we end up configuring ld for 32-bit big endian.

That would be OK, except that for historical (or perhaps no) reason,
we use 'override LD' to add the endian flags to the LD variable
itself, rather than the normal approach of adding them to LDFLAGS.

The end result is that when we check the ld version we run it as:

  $(CROSS_COMPILE)ld -EB -m elf32ppc --version

This often works, unless you are using a 64-bit only and/or little
endian only, toolchain. In which case you see something like:

  $ make defconfig
  powerpc64le-linux-ld: unrecognised emulation mode: elf32ppc
  Supported emulations: elf64lppc elf32lppc elf32lppclinux elf32lppcsim
  /bin/sh: 1: [: -ge: unexpected operator

The proper fix is to stop using 'override LD', but that will require a
fair bit of testing. Instead we can fix it for now just by reordering
the Makefile to do the version check earlier.

Fixes: efe0160cfd40 ("powerpc/64: Linker on-demand sfpr functions for modules")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
6 years agopowerpc/pseries: Fix of_node_put() underflow during reconfig remove
Laurent Vivier [Fri, 21 Jul 2017 14:51:39 +0000 (16:51 +0200)]
powerpc/pseries: Fix of_node_put() underflow during reconfig remove

As for commit 68baf692c435 ("powerpc/pseries: Fix of_node_put()
underflow during DLPAR remove"), the call to of_node_put() must be
removed from pSeries_reconfig_remove_node().

dlpar_detach_node() and pSeries_reconfig_remove_node() both call
of_detach_node(), and thus the node should not be released in both
cases.

Fixes: 0829f6d1f69e ("of: device_node kobject lifecycle fixes")
Cc: stable@vger.kernel.org # v3.15+
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
6 years agopowerpc/mm/radix: Workaround prefetch issue with KVM
Benjamin Herrenschmidt [Mon, 24 Jul 2017 04:26:06 +0000 (14:26 +1000)]
powerpc/mm/radix: Workaround prefetch issue with KVM

There's a somewhat architectural issue with Radix MMU and KVM.

When coming out of a guest with AIL (Alternate Interrupt Location, ie,
MMU enabled), we start executing hypervisor code with the PID register
still containing whatever the guest has been using.

The problem is that the CPU can (and will) then start prefetching or
speculatively load from whatever host context has that same PID (if
any), thus bringing translations for that context into the TLB, which
Linux doesn't know about.

This can cause stale translations and subsequent crashes.

Fixing this in a way that is neither racy nor a huge performance
impact is difficult. We could just make the host invalidations always
use broadcast forms but that would hurt single threaded programs for
example.

We chose to fix it instead by partitioning the PID space between guest
and host. This is possible because today Linux only use 19 out of the
20 bits of PID space, so existing guests will work if we make the host
use the top half of the 20 bits space.

We additionally add support for a property to indicate to Linux the
size of the PID register which will be useful if we eventually have
processors with a larger PID space available.

There is still an issue with malicious guests purposefully setting the
PID register to a value in the hosts PID range. Hopefully future HW
can prevent that, but in the meantime, we handle it with a pair of
kludges:

 - On the way out of a guest, before we clear the current VCPU in the
   PACA, we check the PID and if it's outside of the permitted range
   we flush the TLB for that PID.

 - When context switching, if the mm is "new" on that CPU (the
   corresponding bit was set for the first time in the mm cpumask), we
   check if any sibling thread is in KVM (has a non-NULL VCPU pointer
   in the PACA). If that is the case, we also flush the PID for that
   CPU (core).

This second part is needed to handle the case where a process is
migrated (or starts a new pthread) on a sibling thread of the CPU
coming out of KVM, as there's a window where stale translations can
exist before we detect it and flush them out.

A future optimization could be added by keeping track of whether the
PID has ever been used and avoid doing that for completely fresh PIDs.
We could similarily mark PIDs that have been the subject of a global
invalidation as "fresh". But for now this will do.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
[mpe: Rework the asm to build with CONFIG_PPC_RADIX_MMU=n, drop
      unneeded include of kvm_book3s_asm.h]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
6 years agoMerge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Linus Torvalds [Wed, 26 Jul 2017 03:10:10 +0000 (20:10 -0700)]
Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi

Pull SCSI fixes from James Bottomley:
 "Three small fixes.

  The transfer size fixes are actually correcting some performance drops
  on the hpsa and smartpqi cards. The cards actually have an internal
  cache for request speed up but bypass it for transfers > 1MB. Since
  4.3 the efficiency of our merges has rendered the cache mostly unused,
  so limit transfers to under 1MB to recover the cache boost"

* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
  scsi: sg: fix static checker warning in sg_is_valid_dxfer
  scsi: smartpqi: limit transfer length to 1MB
  scsi: hpsa: limit transfer length to 1MB

6 years agoMerge tag 'uuid-for-4.13-2' of git://git.infradead.org/users/hch/uuid
Linus Torvalds [Wed, 26 Jul 2017 02:46:05 +0000 (19:46 -0700)]
Merge tag 'uuid-for-4.13-2' of git://git.infradead.org/users/hch/uuid

Pull uuid fixes from Christoph Hellwig:

 - add a missing "!" in the uuid tests

 - remove the last remaining user of the uuid_be type, and then the type
   and its helpers

* tag 'uuid-for-4.13-2' of git://git.infradead.org/users/hch/uuid:
  uuid: remove uuid_be
  thunderbolt: use uuid_t instead of uuid_be
  uuid: fix incorrect uuid_equal conversion in test_uuid_test

6 years agoMerge tag 'dma-mapping-4.13-2' of git://git.infradead.org/users/hch/dma-mapping
Linus Torvalds [Wed, 26 Jul 2017 00:17:18 +0000 (17:17 -0700)]
Merge tag 'dma-mapping-4.13-2' of git://git.infradead.org/users/hch/dma-mapping

Pull dma mapping fixes from Christoph Hellwig:
 "split the global dma coherent pool from the per-device pool.

  This fixes a regression in the earlier 4.13 pull requests where the
  global pool would override a per-device CMA pool (Vladimir Murzin)"

* tag 'dma-mapping-4.13-2' of git://git.infradead.org/users/hch/dma-mapping:
  ARM: NOMMU: Wire-up default DMA interface
  dma-coherent: introduce interface for default DMA pool

6 years agoMD: fix warnning for UP case
Shaohua Li [Tue, 25 Jul 2017 22:18:13 +0000 (15:18 -0700)]
MD: fix warnning for UP case

spin_is_locked always returns 0 for UP case, so ignores it

Reported-by: Joshua Kinard <kumba@gentoo.org>
Signed-off-by: Shaohua Li <shli@fb.com>
6 years agoparisc: Extend disabled preemption in copy_user_page
John David Anglin [Tue, 25 Jul 2017 21:23:35 +0000 (17:23 -0400)]
parisc: Extend disabled preemption in copy_user_page

It's always bothered me that we only disable preemption in
copy_user_page around the call to flush_dcache_page_asm.
This patch extends this to after the copy.

Signed-off-by: John David Anglin <dave.anglin@bell.net>
Cc: stable@vger.kernel.org # 4.9+
Signed-off-by: Helge Deller <deller@gmx.de>
6 years agoparisc: Prevent TLB speculation on flushed pages on CPUs that only support equivalent...
John David Anglin [Tue, 25 Jul 2017 21:11:26 +0000 (17:11 -0400)]
parisc: Prevent TLB speculation on flushed pages on CPUs that only support equivalent aliases

Helge noticed that we flush the TLB page in flush_cache_page but not in
flush_cache_range or flush_cache_mm.

For a long time, we have had random segmentation faults building
packages on machines with PA8800/8900 processors.  These machines only
support equivalent aliases.  We don't see these faults on machines that
don't require strict coherency.  So, it appears TLB speculation
sometimes leads to cache corruption on machines that require coherency.

This patch adds TLB flushes to flush_cache_range and flush_cache_mm when
coherency is required.  We only flush the TLB in flush_cache_page when
coherency is required.

The patch also optimizes flush_cache_range.  It turns out we always have
the right context to use flush_user_dcache_range_asm and
flush_user_icache_range_asm.

The patch has been tested for some time on rp3440, rp3410 and A500-44.
It's been boot tested on c8000.  No random segmentation faults were
observed during testing.

Signed-off-by: John David Anglin <dave.anglin@bell.net>
Cc: stable@vger.kernel.org # 4.9+
Signed-off-by: Helge Deller <deller@gmx.de>
6 years agoMerge branch 'stable/for-jens-4.13' of git://git.kernel.org/pub/scm/linux/kernel...
Jens Axboe [Tue, 25 Jul 2017 21:30:21 +0000 (15:30 -0600)]
Merge branch 'stable/for-jens-4.13' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen into for-linus

Pull xen-blkfront fixes from Konrad for 4.13.

6 years agoALSA: hda - Add mute led support for HP ProBook 440 G4
Kai-Heng Feng [Tue, 25 Jul 2017 16:23:46 +0000 (00:23 +0800)]
ALSA: hda - Add mute led support for HP ProBook 440 G4

Mic mute led does not work on HP ProBook 440 G4.
We can use CXT_FIXUP_MUTE_LED_GPIO fixup to support it.

BugLink: https://bugs.launchpad.net/bugs/1705586
Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
Cc: <stable@vger.kernel.org> # v4.12+
Signed-off-by: Takashi Iwai <tiwai@suse.de>
6 years agodrm/amd/powerplay: fix AVFS voltage offset for Vega10
Eric Huang [Mon, 17 Jul 2017 21:18:33 +0000 (17:18 -0400)]
drm/amd/powerplay: fix AVFS voltage offset for Vega10

Signed-off-by: Eric Huang <JinHuiEric.Huang@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
6 years agodrm/amdgpu/gfx9: simplify and fix GRBM index selection
Nicolai Hähnle [Fri, 14 Jul 2017 11:00:04 +0000 (13:00 +0200)]
drm/amdgpu/gfx9: simplify and fix GRBM index selection

Copy the approach taken by gfx8, which simplifies the code, and set the
instance index properly. The latter is required for debugging, e.g. for
reading wave status by UMR.

Signed-off-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
6 years agodrm/amdgpu: Fix blocking in RCU critical section(v2)
Alex Xie [Thu, 20 Jul 2017 04:02:08 +0000 (00:02 -0400)]
drm/amdgpu: Fix blocking in RCU critical section(v2)

In RCU read-side critical sections, blocking or sleeping is prohibited.

v2: Unlock RCU for the code path where result==NULL. (David Zhou)
    Update subject

Tested-by and reported by: Dave Airlie <airlied@redhat.com>

[  141.965723] =============================
[  141.965724] WARNING: suspicious RCU usage
[  141.965726] 4.12.0-rc7 #221 Not tainted
[  141.965727] -----------------------------
[  141.965728] /home/airlied/devel/kernel/linux-2.6/include/linux/rcupdate.h:531
Illegal context switch in RCU read-side critical section!
[  141.965730]
               other info that might help us debug this:

[  141.965731]
               rcu_scheduler_active = 2, debug_locks = 0
[  141.965732] 1 lock held by amdgpu_cs:0/1332:
[  141.965733]  #0:  (rcu_read_lock){......}, at: [<ffffffffa01a0d07>]
amdgpu_bo_list_get+0x0/0x109 [amdgpu]
[  141.965774]
               stack backtrace:
[  141.965776] CPU: 6 PID: 1332 Comm: amdgpu_cs:0 Not tainted 4.12.0-rc7 #221
[  141.965777] Hardware name: To be filled by O.E.M. To be filled by
O.E.M./M5A97 R2.0, BIOS 2603 06/26/2015
[  141.965778] Call Trace:
[  141.965782]  dump_stack+0x68/0x92
[  141.965785]  lockdep_rcu_suspicious+0xf7/0x100
[  141.965788]  ___might_sleep+0x56/0x1fc
[  141.965790]  __might_sleep+0x68/0x6f
[  141.965793]  __mutex_lock+0x4e/0x7b5
[  141.965817]  ? amdgpu_bo_list_get+0xa4/0x109 [amdgpu]
[  141.965820]  ? lock_acquire+0x125/0x1b9
[  141.965844]  ? amdgpu_bo_list_set+0x464/0x464 [amdgpu]
[  141.965846]  mutex_lock_nested+0x16/0x18
[  141.965848]  ? mutex_lock_nested+0x16/0x18
[  141.965872]  amdgpu_bo_list_get+0xa4/0x109 [amdgpu]
[  141.965895]  amdgpu_cs_ioctl+0x4a0/0x17dd [amdgpu]
[  141.965898]  ? radix_tree_node_alloc.constprop.11+0x77/0xab
[  141.965916]  drm_ioctl+0x264/0x393 [drm]
[  141.965939]  ? amdgpu_cs_find_mapping+0x83/0x83 [amdgpu]
[  141.965942]  ? trace_hardirqs_on_caller+0x16a/0x186

Signed-off-by: Alex Xie <AlexBin.Xie@amd.com>
Reviewed-by: Chunming Zhou <david1.zhou@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
6 years agonbd: clear disconnected on reconnect
Josef Bacik [Tue, 25 Jul 2017 17:31:19 +0000 (13:31 -0400)]
nbd: clear disconnected on reconnect

If our device loses its connection for longer than the dead timeout we
will set NBD_DISCONNECTED in order to quickly fail any pending IO's that
flood in after the IO's that were waiting during the dead timer.
However if we re-connect at some point in the future we'll still see
this DISCONNECTED flag set if we then lose our connection again after
that, which means we won't get notifications for our newly lost
connections.  Fix this by just clearing the DISCONNECTED flag on
reconnect in order to make sure everything works as it's supposed to.

Reported-by: Dan Melnic <dmm@fb.com>
Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
6 years agoparisc: Suspend lockup detectors before system halt
Helge Deller [Tue, 25 Jul 2017 19:41:41 +0000 (21:41 +0200)]
parisc: Suspend lockup detectors before system halt

Some machines can't power off the machine, so disable the lockup detectors to
avoid this watchdog BUG to show up every few seconds:
watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [systemd-shutdow:1]

Signed-off-by: Helge Deller <deller@gmx.de>
Cc: stable@vger.kernel.org # 4.9+
6 years agoparisc: Show DIMM slot number which holds broken memory module
Helge Deller [Tue, 25 Jul 2017 18:12:25 +0000 (20:12 +0200)]
parisc: Show DIMM slot number which holds broken memory module

The Page Deallocation Table (PDT) holds the physical addresses of all broken
memory addresses. With the physical address we now are able to show which DIMM
slot (e.g. 1a, 3c) actually holds the broken memory module so that users are
able to replace it.

Signed-off-by: Helge Deller <deller@gmx.de>
6 years agodm zoned: remove test for impossible REQ_OP_FLUSH conditions
Mikulas Patocka [Fri, 21 Jul 2017 15:56:46 +0000 (11:56 -0400)]
dm zoned: remove test for impossible REQ_OP_FLUSH conditions

The value REQ_OP_FLUSH is only used by the block code for
request-based devices.

Remove the tests for REQ_OP_FLUSH from the bio-based dm-zoned-target.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
6 years agodm raid: bump target version
Heinz Mauelshagen [Thu, 13 Jul 2017 15:52:18 +0000 (17:52 +0200)]
dm raid: bump target version

Bumo dm-raid target version to 1.12.1 to reflect that commit cc27b0c78c
("md: fix deadlock between mddev_suspend() and md_write_start()") is
available.

This version change allows userspace to detect that MD fix is available.

Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
6 years agodm raid: avoid mddev->suspended access
Heinz Mauelshagen [Thu, 13 Jul 2017 15:34:24 +0000 (17:34 +0200)]
dm raid: avoid mddev->suspended access

Use runtime flag to ensure that an mddev gets suspended/resumed just once.

Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
6 years agodm raid: fix activation check in validate_raid_redundancy()
Heinz Mauelshagen [Thu, 13 Jul 2017 15:36:12 +0000 (17:36 +0200)]
dm raid: fix activation check in validate_raid_redundancy()

During growing reshapes (i.e. stripes being added to a raid set), the
new stripe images are not in-sync and not part of the raid set until
the reshape is started.

LVM2 has to request multiple table reloads involving superblock updates
in order to reflect proper size of SubLVs in the cluster.  Before a stripe
adding reshape starts, validate_raid_redundancy() fails as a result of that
because it checks the total number of devices against the number of rebuild
ones rather than the actual ones in the raid set (as retrieved from the
superblock) thus resulting in failed raid4/5/6/10 redundancy checks.

E.g. convert 3 stripes -> 7 stripes raid5 (which only allows for maximum
1 device to fail) requesting +4 delta disks causing 4 devices to rebuild
during reshaping thus failing activation.

To fix this, move validate_raid_redundancy() to get access to the
current raid_set members.

Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
6 years agodm raid: remove WARN_ON() in raid10_md_layout_to_format()
Heinz Mauelshagen [Thu, 13 Jul 2017 15:33:22 +0000 (17:33 +0200)]
dm raid: remove WARN_ON() in raid10_md_layout_to_format()

Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
6 years agoparisc: Add function to return DIMM slot of physical address
Helge Deller [Tue, 25 Jul 2017 17:26:23 +0000 (19:26 +0200)]
parisc: Add function to return DIMM slot of physical address

Add a firmware wrapper function, which asks PDC firmware for the DIMM slot of a
physical address. This is needed to show users which DIMM module needs
replacement in case a broken DIMM was encountered.

Signed-off-by: Helge Deller <deller@gmx.de>
6 years agoparisc: Fix crash when calling PDC_PAT_MEM PDT firmware function
Helge Deller [Tue, 25 Jul 2017 16:20:54 +0000 (18:20 +0200)]
parisc: Fix crash when calling PDC_PAT_MEM PDT firmware function

Commit c9c2877d08d9 ("parisc: Add Page Deallocation Table (PDT) support")
introduced the pdc_pat_mem_read_pd_pdt() firmware helper function, which
crashed the system because it trashed the stack if the
pdc_pat_mem_read_pd_retinfo struct was located on the stack (and which is
in size less than the required 32 64-bit values).

Fix it by using the pdc_result struct instead when calling firmware and copy
the return values back into the result struct when finished sucessfully.

While debugging this code I noticed that the pdc_type wasn't set correctly
either, so let's fix that too.

Fixes: c9c2877d08d9 ("parisc: Add Page Deallocation Table (PDT) support")
Signed-off-by: Helge Deller <deller@gmx.de>
6 years agonvme-pci: fix HMB size calculation
Christoph Hellwig [Tue, 25 Jul 2017 15:39:07 +0000 (17:39 +0200)]
nvme-pci: fix HMB size calculation

It's possible the preferred HMB size may not be a multiple of the
chunk_size. This patch moves len to function scope and uses that in
the for loop increment so the last iteration doesn't cause the total
size to exceed the allocated HMB size.

Based on an earlier patch from Keith Busch.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Fixes: 87ad72a59a38 ("nvme-pci: implement host memory buffer support")
6 years agonvme-fc: revise TRADDR parsing
James Smart [Mon, 17 Jul 2017 20:59:39 +0000 (13:59 -0700)]
nvme-fc: revise TRADDR parsing

The FC-NVME spec hasn't locked down on the format string for TRADDR.
Currently the spec is lobbying for "nn-<16hexdigits>:pn-<16hexdigits>"
where the wwn's are hex values but not prefixed by 0x.

Most implementations so far expect a string format of
"nn-0x<16hexdigits>:pn-0x<16hexdigits>" to be used. The transport
uses the match_u64 parser which requires a leading 0x prefix to set
the base properly. If it's not there, a match will either fail or return
a base 10 value.

The resolution in T11 is pushing out. Therefore, to fix things now and
to cover any eventuality and any implementations already in the field,
this patch adds support for both formats.

The change consists of replacing the token matching routine with a
routine that validates the fixed string format, and then builds
a local copy of the hex name with a 0x prefix before calling
the system parser.

Note: the same parser routine exists in both the initiator and target
transports. Given this is about the only "shared" item, we chose to
replicate rather than create an interdendency on some shared code.

Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
6 years agonvme-fc: address target disconnect race conditions in fcp io submit
James Smart [Tue, 18 Jul 2017 21:29:34 +0000 (14:29 -0700)]
nvme-fc: address target disconnect race conditions in fcp io submit

There are cases where threads are in the process of submitting new
io when the LLDD calls in to remove the remote port. In some cases,
the next io actually goes to the LLDD, who knows the remoteport isn't
present and rejects it. To properly recovery/restart these i/o's we
don't want to hard fail them, we want to treat them as temporary
resource errors in which a delayed retry will work.

Add a couple more checks on remoteport connectivity and commonize the
busy response handling when it's seen.

Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
6 years agonvme: fabrics commands should use the fctype field for data direction
Jon Derrick [Wed, 12 Jul 2017 16:58:19 +0000 (10:58 -0600)]
nvme: fabrics commands should use the fctype field for data direction

Fabrics commands with opcode 0x7F use the fctype field to indicate data
direction.

Signed-off-by: Jon Derrick <jonathan.derrick@intel.com>
Reviewed-by: Sagi Grimberg <sai@grmberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Fixes: eb793e2c ("nvme.h: add NVMe over Fabrics definitions")
6 years agonvme: also provide a UUID in the WWID sysfs attribute
Johannes Thumshirn [Wed, 12 Jul 2017 13:38:56 +0000 (15:38 +0200)]
nvme: also provide a UUID in the WWID sysfs attribute

The WWID sysfs attribute can provide multiple means of a World Wide ID
for a NVMe device. It can either be a NGUID, a EUI-64 or a concatenation
of VID, Serial Number, Model and the Namespace ID in this order of
preference.

If the target also sends us a UUID use the UUID for identification and
give it the highest priority.

This eases generation of /dev/disk/by-* symlinks.

Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
6 years agoMerge tag 'jfs-4.13' of git://github.com/kleikamp/linux-shaggy
Linus Torvalds [Tue, 25 Jul 2017 15:51:57 +0000 (08:51 -0700)]
Merge tag 'jfs-4.13' of git://github.com/kleikamp/linux-shaggy

Pull JFS fixes from David Kleikamp.

* tag 'jfs-4.13' of git://github.com/kleikamp/linux-shaggy:
  jfs: preserve i_mode if __jfs_set_acl() fails
  jfs: Don't clear SGID when inheriting ACLs
  jfs: atomically read inode size

6 years agoMerge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid
Linus Torvalds [Tue, 25 Jul 2017 15:49:00 +0000 (08:49 -0700)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid

Pull HID fixes from Jiri Kosina:

 - regression fix (missing IRQs) for devices that require 'always poll'
   quirk, from Dmitry Torokhov

 - new device ID addition to Ortek driver, from Benjamin Tissoires

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid:
  HID: ortek: add one more buggy device
  HID: usbhid: fix "always poll" quirk

6 years agoMerge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Linus Torvalds [Tue, 25 Jul 2017 15:44:27 +0000 (08:44 -0700)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux

Pull s390 fixes from Martin Schwidefsky:
 "Three bug fixes"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
  s390/mm: set change and reference bit on lazy key enablement
  s390: chp: handle CRW_ERC_INIT for channel-path status change
  s390/perf: fix problem state detection

6 years agoxfs: check that dir block entries don't off the end of the buffer
Darrick J. Wong [Fri, 21 Jul 2017 18:04:23 +0000 (11:04 -0700)]
xfs: check that dir block entries don't off the end of the buffer

When we're checking the entries in a directory buffer, make sure that
the entry length doesn't push us off the end of the buffer.  Found via
xfs/388 writing ones to the length fields.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
6 years agoxen/blkfront: always allocate grants first from per-queue persistent grants
Dongli Zhang [Wed, 28 Jun 2017 12:57:28 +0000 (20:57 +0800)]
xen/blkfront: always allocate grants first from per-queue persistent grants

This patch partially reverts 3df0e50 ("xen/blkfront: pseudo support for
multi hardware queues/rings"). The xen-blkfront queue/ring might hang due
to grants allocation failure in the situation when gnttab_free_head is
almost empty while many persistent grants are reserved for this queue/ring.

As persistent grants management was per-queue since 73716df ("xen/blkfront:
make persistent grants pool per-queue"), we should always allocate from
persistent grants first.

Acked-by: Roger Pau Monné <roger.pau@citrix.com>
Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
6 years agoxen-blkfront: fix mq start/stop race
Junxiao Bi [Thu, 20 Jul 2017 01:26:21 +0000 (09:26 +0800)]
xen-blkfront: fix mq start/stop race

When ring buf full, hw queue will be stopped. While blkif interrupt consume
request and make free space in ring buf, hw queue will be started again.
But since start queue is protected by spin lock while stop not, that will
cause a race.

interrupt:                                      process:
blkif_interrupt()                               blkif_queue_rq()
 kick_pending_request_queues_locked()
   blk_mq_start_stopped_hw_queues()
      clear_bit(BLK_MQ_S_STOPPED, &hctx->state)
                                             blk_mq_stop_hw_queue(hctx)
  blk_mq_run_hw_queue(hctx, async)

If ring buf is made empty in this case, interrupt will never come, then the
hw queue will be stopped forever, all processes waiting for the pending io
in the queue will hung.

Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
Reviewed-by: Ankur Arora <ankur.a.arora@oracle.com>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
6 years agodm bufio: fix error code in dm_bufio_write_dirty_buffers()
Dan Carpenter [Wed, 12 Jul 2017 07:26:34 +0000 (10:26 +0300)]
dm bufio: fix error code in dm_bufio_write_dirty_buffers()

We should be returning normal negative error codes here.  The "a"
variables comes from &c->async_write_error which is a blk_status_t
converted to a regular error code.

In the current code, the blk_status_t gets propogated back to
pool_create() and eventually results in an Oops.

Fixes: 4e4cbee93d56 ("block: switch bios to blk_status_t")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
6 years agodm integrity: test for corrupted disk format during table load
Mikulas Patocka [Fri, 21 Jul 2017 15:58:38 +0000 (11:58 -0400)]
dm integrity: test for corrupted disk format during table load

If the dm-integrity superblock was corrupted in such a way that the
journal_sections field was zero, the integrity target would deadlock
because it would wait forever for free space in the journal.

Detect this situation and refuse to activate the device.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Cc: stable@vger.kernel.org
Fixes: 7eada909bfd7 ("dm: add integrity target")
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
6 years agodm integrity: WARN_ON if variables representing journal usage get out of sync
Mikulas Patocka [Fri, 21 Jul 2017 17:16:06 +0000 (13:16 -0400)]
dm integrity: WARN_ON if variables representing journal usage get out of sync

If this WARN_ON triggers it speaks to programmer error, and likely
implies corruption, but no released kernel should trigger it.  This
WARN_ON serves to assist DM integrity developers as changes are
made/tested in the future.

BUG_ON is excessive for catching programmer error, if a user or
developer would like warnings to trigger a panic, they can enable that
via /proc/sys/kernel/panic_on_warn

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
6 years agovirtio-net: fix module unloading
Andrew Jones [Mon, 24 Jul 2017 13:38:32 +0000 (15:38 +0200)]
virtio-net: fix module unloading

Unregister the driver before removing multi-instance hotplug
callbacks. This order avoids the warning issued from
__cpuhp_remove_state_cpuslocked when the number of remaining
instances isn't yet zero.

Fixes: 8017c279196a ("net/virtio-net: Convert to hotplug state machine")
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
6 years agovirtio-balloon: coding format cleanup
Wei Wang [Wed, 12 Jul 2017 12:40:15 +0000 (20:40 +0800)]
virtio-balloon: coding format cleanup

Clean up the comment format.

Signed-off-by: Wei Wang <wei.w.wang@intel.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
6 years agovirtio-balloon: deflate via a page list
Liang Li [Wed, 12 Jul 2017 12:40:14 +0000 (20:40 +0800)]
virtio-balloon: deflate via a page list

This patch saves the deflated pages to a list, instead of the PFN array.
Accordingly, the balloon_pfn_to_page() function is removed.

Signed-off-by: Liang Li <liang.z.li@intel.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Wei Wang <wei.w.wang@intel.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
6 years agovirtio_blk: Use sysfs_match_string() helper
Andy Shevchenko [Fri, 9 Jun 2017 12:07:42 +0000 (15:07 +0300)]
virtio_blk: Use sysfs_match_string() helper

Use sysfs_match_string() helper instead of open coded variant.

Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Jason Wang <jasowang@redhat.com>
6 years agoKVM: s390: take srcu lock when getting/setting storage keys
Christian Borntraeger [Mon, 10 Jul 2017 11:35:48 +0000 (13:35 +0200)]
KVM: s390: take srcu lock when getting/setting storage keys

The following warning was triggered by missing srcu locks around
the storage key handling functions.

=============================
WARNING: suspicious RCU usage
4.12.0+ #56 Not tainted
-----------------------------
./include/linux/kvm_host.h:572 suspicious rcu_dereference_check() usage!
rcu_scheduler_active = 2, debug_locks = 1
 1 lock held by live_migration/4936:
  #0:  (&mm->mmap_sem){++++++}, at: [<0000000000141be0>]
kvm_arch_vm_ioctl+0x6b8/0x22d0

 CPU: 8 PID: 4936 Comm: live_migration Not tainted 4.12.0+ #56
 Hardware name: IBM 2964 NC9 704 (LPAR)
 Call Trace:
 ([<000000000011378a>] show_stack+0xea/0xf0)
  [<000000000055cc4c>] dump_stack+0x94/0xd8
  [<000000000012ee70>] gfn_to_memslot+0x1a0/0x1b8
  [<0000000000130b76>] gfn_to_hva+0x2e/0x48
  [<0000000000141c3c>] kvm_arch_vm_ioctl+0x714/0x22d0
  [<000000000013306c>] kvm_vm_ioctl+0x11c/0x7b8
  [<000000000037e2c0>] do_vfs_ioctl+0xa8/0x6c8
  [<000000000037e984>] SyS_ioctl+0xa4/0xb8
  [<00000000008b20a4>] system_call+0xc4/0x27c
 1 lock held by live_migration/4936:
  #0:  (&mm->mmap_sem){++++++}, at: [<0000000000141be0>]
kvm_arch_vm_ioctl+0x6b8/0x22d0

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Pierre Morel<pmorel@linux.vnet.ibm.com>
6 years agox86/efi: Fix reboot_mode when EFI runtime services are disabled
Stefan Assmann [Mon, 24 Jul 2017 12:22:48 +0000 (14:22 +0200)]
x86/efi: Fix reboot_mode when EFI runtime services are disabled

When EFI runtime services are disabled, for example by the "noefi"
kernel cmdline parameter, the reboot_type could still be set to
BOOT_EFI causing reboot to fail.

Fix this by checking if EFI runtime services are enabled.

Signed-off-by: Stefan Assmann <sassmann@kpanic.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20170724122248.24006-1-sassmann@kpanic.de
[ Fixed 'not disabled' double negation. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
6 years agosched/wait: Clean up some documentation warnings
Jonathan Corbet [Mon, 24 Jul 2017 19:58:00 +0000 (13:58 -0600)]
sched/wait: Clean up some documentation warnings

A couple of kerneldoc comments in <linux/wait.h> had incorrect names for
macro parameters, with this unsightly result:

  ./include/linux/wait.h:555: warning: No description found for parameter 'wq'
  ./include/linux/wait.h:555: warning: Excess function parameter 'wq_head' description in 'wait_event_interruptible_hrtimeout'
  ./include/linux/wait.h:759: warning: No description found for parameter 'wq_head'
  ./include/linux/wait.h:759: warning: Excess function parameter 'wq' description in 'wait_event_killable'

Correct the comments and kill the warnings.

Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-doc@vger.kernel.org
Link: http://lkml.kernel.org/r/20170724135800.769c4042@lwn.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
6 years agosched/core: Fix some documentation build warnings
Jonathan Corbet [Mon, 24 Jul 2017 19:56:28 +0000 (13:56 -0600)]
sched/core: Fix some documentation build warnings

The kerneldoc comments for try_to_wake_up_local() were out of date, leading
to these documentation build warnings:

  ./kernel/sched/core.c:2080: warning: No description found for parameter 'rf'
  ./kernel/sched/core.c:2080: warning: Excess function parameter 'cookie' description in 'try_to_wake_up_local'

Update the comment to reflect current reality and give us some peace and
quiet.

Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-doc@vger.kernel.org
Link: http://lkml.kernel.org/r/20170724135628.695cecfc@lwn.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
6 years agox86/boot: #undef memcpy() et al in string.c
Michael Davidson [Mon, 24 Jul 2017 23:51:55 +0000 (16:51 -0700)]
x86/boot: #undef memcpy() et al in string.c

undef memcpy() and friends in boot/string.c so that the functions
defined here will have the correct names, otherwise we end up
up trying to redefine __builtin_memcpy() etc.

Surprisingly, GCC allows this (and, helpfully, discards the
__builtin_ prefix from the function name when compiling it),
but clang does not.

Adding these #undef's appears to preserve what I assume was
the original intent of the code.

Signed-off-by: Michael Davidson <md@google.com>
Signed-off-by: Matthias Kaehlcke <mka@chromium.org>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Bernhard.Rosenkranzer@linaro.org
Cc: Greg Hackmann <ghackmann@google.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20170724235155.79255-1-mka@chromium.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
6 years agoarm64/lib: copy_page: use consistent prefetch stride
Ard Biesheuvel [Wed, 12 Jul 2017 14:44:14 +0000 (15:44 +0100)]
arm64/lib: copy_page: use consistent prefetch stride

The optional prefetch instructions in the copy_page() routine are
inconsistent: at the start of the function, two cachelines are
prefetched beyond the one being loaded in the first iteration, but
in the loop, the prefetch is one more line ahead. This appears to
be unintentional, so let's fix it.

While at it, fix the comment style and white space.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
6 years agoMerge branch 'linux-4.13' of git://github.com/skeggsb/linux into drm-fixes
Dave Airlie [Tue, 25 Jul 2017 05:41:39 +0000 (15:41 +1000)]
Merge branch 'linux-4.13' of git://github.com/skeggsb/linux into drm-fixes

two more fixes for issues nouveau found in fedora 26.

* 'linux-4.13' of git://github.com/skeggsb/linux:
  drm/nouveau/bar/gf100: fix access to upper half of BAR2
  drm/nouveau/disp/nv50-: bump max chans to 21

6 years agodrm/nouveau/bar/gf100: fix access to upper half of BAR2
Ben Skeggs [Tue, 25 Jul 2017 01:06:47 +0000 (11:06 +1000)]
drm/nouveau/bar/gf100: fix access to upper half of BAR2

Bit 30 being set causes the upper half of BAR2 to stay in physical mode,
mapped over the end of VRAM, even when the rest of the BAR has been set
to virtual mode.

We inherited our initial value from RM, but I'm not aware of any reason
we need to keep it that way.

This fixes severe GPU hang/lockup issues revealed by Wayland on F26.

Shout-out to NVIDIA for the quick response with the potential cause!

Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Cc: stable@vger.kernel.org # 4.3+
6 years agodrm/nouveau/disp/nv50-: bump max chans to 21
Ilia Mirkin [Wed, 28 Jun 2017 12:24:45 +0000 (08:24 -0400)]
drm/nouveau/disp/nv50-: bump max chans to 21

GP102's cursors go from chan 17..20. Increase the array size to hold
their data properly.

Fixes: e50fcff15f ("drm/nouveau/disp/gp102: fix cursor/overlay immediate channel indices")
Cc: stable@vger.kernel.org # v4.10+
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
6 years agodrm/i915/gvt: Extend KBL platform support in GVT-g
Jian Jun Chen [Wed, 19 Jul 2017 05:16:56 +0000 (13:16 +0800)]
drm/i915/gvt: Extend KBL platform support in GVT-g

Extend KBL platform support in GVT-g. Validation tests
are done on KBL server and KBL NUC. Both show the same
quality.

Signed-off-by: Jian Jun Chen <jian.jun.chen@intel.com>
Cc: Zhenyu Wang <zhenyuw@linux.intel.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
6 years agoACPI: NUMA: Fix typo in the full name of SRAT
Ross Zwisler [Fri, 21 Jul 2017 22:51:24 +0000 (16:51 -0600)]
ACPI: NUMA: Fix typo in the full name of SRAT

To save someone the time of searching the ACPI spec for
"Static Resource Affinity Table".

Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
6 years agoACPI: NUMA: add missing include in acpi_numa.h
Ross Zwisler [Fri, 21 Jul 2017 22:51:23 +0000 (16:51 -0600)]
ACPI: NUMA: add missing include in acpi_numa.h

Right now if a file includes acpi_numa.h and they don't happen to include
linux/numa.h before it, they get the following warning:

./include/acpi/acpi_numa.h:9:5: warning: "MAX_NUMNODES" is not defined [-Wundef]
 #if MAX_NUMNODES > 256
     ^~~~~~~~~~~~

Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
6 years agoblk-mq: map queues to all present CPUs
Christoph Hellwig [Tue, 18 Jul 2017 15:04:40 +0000 (17:04 +0200)]
blk-mq: map queues to all present CPUs

We already do this for PCI mappings, and the higher level code now
expects that CPU on/offlining doesn't have an affect on the queue
mappings.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
6 years agouuid: remove uuid_be
Christoph Hellwig [Thu, 11 May 2017 07:16:24 +0000 (09:16 +0200)]
uuid: remove uuid_be

Everything uses uuid_t now.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
6 years agothunderbolt: use uuid_t instead of uuid_be
Christoph Hellwig [Tue, 18 Jul 2017 13:30:05 +0000 (15:30 +0200)]
thunderbolt: use uuid_t instead of uuid_be

Switch thunderbolt to the new uuid type.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Acked-by: Mika Westerberg <mika.westerberg@linux.intel.com>
6 years agoHID: ortek: add one more buggy device
Benjamin Tissoires [Tue, 18 Jul 2017 16:28:13 +0000 (18:28 +0200)]
HID: ortek: add one more buggy device

The iHome keypad also requires the same tweak we are doing for other
Ortek devices.

Reported-by: Mairin Duffy <duffy@redhat.com>
Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
6 years agoxfs: fix quotacheck dquot id overflow infinite loop
Brian Foster [Mon, 24 Jul 2017 15:33:25 +0000 (08:33 -0700)]
xfs: fix quotacheck dquot id overflow infinite loop

If a dquot has an id of U32_MAX, the next lookup index increment
overflows the uint32_t back to 0. This starts the lookup sequence
over from the beginning, repeats indefinitely and results in a
livelock.

Update xfs_qm_dquot_walk() to explicitly check for the lookup
overflow and exit the loop.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
6 years agomd/raid5: add thread_group worker async_tx_issue_pending_all
Ofer Heifetz [Mon, 24 Jul 2017 06:17:40 +0000 (09:17 +0300)]
md/raid5: add thread_group worker async_tx_issue_pending_all

Since thread_group worker and raid5d kthread are not in sync, if
worker writes stripe before raid5d then requests will be waiting
for issue_pendig.

Issue observed when building raid5 with ext4, in some build runs
jbd2 would get hung and requests were waiting in the HW engine
waiting to be issued.

Fix this by adding a call to async_tx_issue_pending_all in the
raid5_do_work.

Signed-off-by: Ofer Heifetz <oferh@marvell.com>
Cc: stable@vger.kernel.org
Signed-off-by: Shaohua Li <shli@fb.com>
6 years agoblock: disable runtime-pm for blk-mq
Christoph Hellwig [Fri, 21 Jul 2017 11:46:10 +0000 (13:46 +0200)]
block: disable runtime-pm for blk-mq

The blk-mq code lacks support for looking at the rpm_status field, tracking
active requests and the RQF_PM flag.

Due to the default switch to blk-mq for scsi people start to run into
suspend / resume issue due to this fact, so make sure we disable the runtime
PM functionality until it is properly implemented.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
6 years agoxen-blkfront: Fix handling of non-supported operations
Bart Van Assche [Fri, 21 Jul 2017 17:11:10 +0000 (19:11 +0200)]
xen-blkfront: Fix handling of non-supported operations

This patch fixes the following sparse warnings:

drivers/block/xen-blkfront.c:916:45: warning: incorrect type in argument 2 (different base types)
drivers/block/xen-blkfront.c:916:45:    expected restricted blk_status_t [usertype] error
drivers/block/xen-blkfront.c:916:45:    got int [signed] error
drivers/block/xen-blkfront.c:1599:47: warning: incorrect type in assignment (different base types)
drivers/block/xen-blkfront.c:1599:47:    expected int [signed] error
drivers/block/xen-blkfront.c:1599:47:    got restricted blk_status_t [usertype] <noident>
drivers/block/xen-blkfront.c:1607:55: warning: incorrect type in assignment (different base types)
drivers/block/xen-blkfront.c:1607:55:    expected int [signed] error
drivers/block/xen-blkfront.c:1607:55:    got restricted blk_status_t [usertype] <noident>
drivers/block/xen-blkfront.c:1625:55: warning: incorrect type in assignment (different base types)
drivers/block/xen-blkfront.c:1625:55:    expected int [signed] error
drivers/block/xen-blkfront.c:1625:55:    got restricted blk_status_t [usertype] <noident>
drivers/block/xen-blkfront.c:1628:62: warning: restricted blk_status_t degrades to integer

Compile-tested only.

Fixes: commit 2a842acab109 ("block: introduce new block status code type")
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Roger Pau Monné <roger.pau@citrix.com>
Cc: <xen-devel@lists.xenproject.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
6 years agobtrfs: round down size diff when shrinking/growing device
Nikolay Borisov [Fri, 21 Jul 2017 08:28:24 +0000 (11:28 +0300)]
btrfs: round down size diff when shrinking/growing device

Further testing showed that the fix introduced in 7dfb8be11b5d ("btrfs:
Round down values which are written for total_bytes_size") was
insufficient and it could still lead to discrepancies between the
total_bytes in the super block and the device total bytes. So this patch
also ensures that the difference between old/new sizes when
shrinking/growing is also rounded down. This ensure that we won't be
subtracting/adding a non-sectorsize multiples to the superblock/device
total sizees.

Fixes: 7dfb8be11b5d ("btrfs: Round down values which are written for total_bytes_size")
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
6 years agoBtrfs: fix early ENOSPC due to delalloc
Omar Sandoval [Thu, 20 Jul 2017 22:10:35 +0000 (15:10 -0700)]
Btrfs: fix early ENOSPC due to delalloc

If a lot of metadata is reserved for outstanding delayed allocations, we
rely on shrink_delalloc() to reclaim metadata space in order to fulfill
reservation tickets. However, shrink_delalloc() has a shortcut where if
it determines that space can be overcommitted, it will stop early. This
made sense before the ticketed enospc system, but now it means that
shrink_delalloc() will often not reclaim enough space to fulfill any
tickets, leading to an early ENOSPC. (Reservation tickets don't care
about being able to overcommit, they need every byte accounted for.)

Fix it by getting rid of the shortcut so that shrink_delalloc() reclaims
all of the metadata it is supposed to. This fixes early ENOSPCs we were
seeing when doing a btrfs receive to populate a new filesystem, as well
as early ENOSPCs Christoph saw when doing a big cp -r onto Btrfs.

Fixes: 957780eb2788 ("Btrfs: introduce ticketed enospc infrastructure")
Tested-by: Christoph Anton Mitterer <mail@christoph.anton.mitterer.name>
Cc: stable@vger.kernel.org
Reviewed-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
6 years agobtrfs: fix lockup in find_free_extent with read-only block groups
Jeff Mahoney [Thu, 20 Jul 2017 03:25:51 +0000 (23:25 -0400)]
btrfs: fix lockup in find_free_extent with read-only block groups

If we have a block group that is all of the following:
1) uncached in memory
2) is read-only
3) has a disk cache state that indicates we need to recreate the cache

AND the file system has enough free space fragmentation such that the
request for an extent of a given size can't be honored;

AND have a single CPU core;

AND it's the block group with the highest starting offset such that
there are no opportunities (like reading from disk) for the loop to
yield the CPU;

We can end up with a lockup.

The root cause is simple.  Once we're in the position that we've read in
all of the other block groups directly and none of those block groups
can honor the request, there are no more opportunities to sleep.  We end
up trying to start a caching thread which never gets run if we only have
one core.  This *should* present as a hung task waiting on the caching
thread to make some progress, but it doesn't.  Instead, it degrades into
a busy loop because of the placement of the read-only check.

During the first pass through the loop, block_group->cached will be set
to BTRFS_CACHE_STARTED and have_caching_bg will be set.  Then we hit the
read-only check and short circuit the loop.  We're not yet in
LOOP_CACHING_WAIT, so we skip that loop back before going through the
loop again for other raid groups.

Then we move to LOOP_CACHING_WAIT state.

During the this pass through the loop, ->cached will still be
BTRFS_CACHE_STARTED, which means it's not cached, so we'll enter
cache_block_group, do a lot of nothing, and return, and also set
have_caching_bg again.  Then we hit the read-only check and short circuit
the loop.  The same thing happens as before except now we DO trigger
the LOOP_CACHING_WAIT && have_caching_bg check and loop back up to the
top.  We do this forever.

There are two fixes in this patch since they address the same underlying
bug.

The first is to add a cond_resched to the end of the loop to ensure
that the caching thread always has an opportunity to run.  This will
fix the soft lockup issue, but find_free_extent will still loop doing
nothing until the thread has completed.

The second is to move the read-only check to the top of the loop.  We're
never going to return an allocation within a read-only block group so
we may as well skip it early.  The check for ->cached == BTRFS_CACHE_ERROR
would cause the same problem except that BTRFS_CACHE_ERROR is considered
a "done" state and we won't re-set have_caching_bg again.

Many thanks to Stephan Kulow <coolo@suse.de> for his excellent help in
the testing process.

Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
6 years agoARM: 8687/1: signal: Fix unparseable iwmmxt_sigframe in uc_regspace[]
Dave Martin [Fri, 30 Jun 2017 17:56:59 +0000 (18:56 +0100)]
ARM: 8687/1: signal: Fix unparseable iwmmxt_sigframe in uc_regspace[]

In kernels with CONFIG_IWMMXT=y running on non-iWMMXt hardware, the
signal frame can be left partially uninitialised in such a way
that userspace cannot parse uc_regspace[] safely.  In particular,
this means that the VFP registers cannot be located reliably in the
signal frame when a multi_v7_defconfig kernel is run on the
majority of platforms.

The cause is that the uc_regspace[] is laid out statically based on
the kernel config, but the decision of whether to save/restore the
iWMMXt registers must be a runtime decision.

To minimise breakage of software that may assume a fixed layout,
this patch emits a dummy block of the same size as iwmmxt_sigframe,
for non-iWMMXt threads.  However, the magic and size of this block
are now filled in to help parsers skip over it.  A new DUMMY_MAGIC
is defined for this purpose.

It is probably legitimate (if non-portable) for userspace to
manufacture its own sigframe for sigreturn, and there is no obvious
reason why userspace should be required to insert a DUMMY_MAGIC
block when running on non-iWMMXt hardware, when omitting it has
worked just fine forever in other configurations.  So in this case,
sigreturn does not require this block to be present.

Reported-by: Edmund Grimley-Evans <Edmund.Grimley-Evans@arm.com>
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
6 years agoARM: 8686/1: iwmmxt: Add missing __user annotations to sigframe accessors
Dave Martin [Fri, 30 Jun 2017 17:56:09 +0000 (18:56 +0100)]
ARM: 8686/1: iwmmxt: Add missing __user annotations to sigframe accessors

preserve_iwmmxt_context() and restore_iwmmxt_context() lack __user
accessors on their arguments pointing to the user signal frame.

There does not be appear to be a bug here, but this omission is
inconsistent with the crunch and vfp sigframe access functions.

This patch adds the annotations, for consistency.

Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
6 years agokprobes/x86: Release insn_slot in failure path
Masami Hiramatsu [Fri, 21 Jul 2017 14:45:52 +0000 (23:45 +0900)]
kprobes/x86: Release insn_slot in failure path

The following commit:

  003002e04ed3 ("kprobes: Fix arch_prepare_kprobe to handle copy insn failures")

returns an error if the copying of the instruction, but does not release
the allocated insn_slot.

Clean up correctly.

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Cc: David S . Miller <davem@davemloft.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: 003002e04ed3 ("kprobes: Fix arch_prepare_kprobe to handle copy insn failures")
Link: http://lkml.kernel.org/r/150064834183.6172.11694375818447664416.stgit@devbox
Signed-off-by: Ingo Molnar <mingo@kernel.org>
6 years agoperf/x86/intel/uncore: Fix missing marker for skx_uncore_cha_extra_regs
Stephane Eranian [Thu, 13 Jul 2017 17:35:50 +0000 (10:35 -0700)]
perf/x86/intel/uncore: Fix missing marker for skx_uncore_cha_extra_regs

This skx_uncore_cha_extra_regs array was missing an end-marker.

Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Kan Liang <kan.liang@intel.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Link: http://lkml.kernel.org/r/1499967350-10385-7-git-send-email-kan.liang@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
6 years agoperf/x86/intel/uncore: Fix SKX CHA event extra regs
Stephane Eranian [Thu, 13 Jul 2017 17:35:49 +0000 (10:35 -0700)]
perf/x86/intel/uncore: Fix SKX CHA event extra regs

This patch adds two missing event extra regs for Skylake Server CHA PMU:

 - TOR_INSERTS
 - TOR_OCCUPANCY

Were missing support for all the filters, including opcode matchers.

Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Kan Liang <kan.liang@intel.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Link: http://lkml.kernel.org/r/1499967350-10385-6-git-send-email-kan.liang@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
6 years agoperf/x86/intel/uncore: Remove invalid Skylake server CHA filter field
Kan Liang [Thu, 13 Jul 2017 17:35:48 +0000 (10:35 -0700)]
perf/x86/intel/uncore: Remove invalid Skylake server CHA filter field

There is no field c6 and link for CHA BOX FILTER.

Signed-off-by: Kan Liang <kan.liang@intel.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Link: http://lkml.kernel.org/r/1499967350-10385-5-git-send-email-kan.liang@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
6 years agoperf/x86/intel/uncore: Fix Skylake server CHA LLC_LOOKUP event umask
Kan Liang [Thu, 13 Jul 2017 17:35:47 +0000 (10:35 -0700)]
perf/x86/intel/uncore: Fix Skylake server CHA LLC_LOOKUP event umask

Correct the umask for LLC_LOOKUP.LOCAL and LLC_LOOKUP.REMOTE events

Signed-off-by: Kan Liang <kan.liang@intel.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Link: http://lkml.kernel.org/r/1499967350-10385-4-git-send-email-kan.liang@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
6 years agoperf/x86/intel/uncore: Fix Skylake server PCU PMU event format
Kan Liang [Thu, 13 Jul 2017 17:35:46 +0000 (10:35 -0700)]
perf/x86/intel/uncore: Fix Skylake server PCU PMU event format

PCU event format for SKX are different from snbep. Introduce a new
format group for SKX PCU.

Signed-off-by: Kan Liang <kan.liang@intel.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Link: http://lkml.kernel.org/r/1499967350-10385-3-git-send-email-kan.liang@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
6 years agoperf/x86/intel/uncore: Fix Skylake UPI PMU event masks
Stephane Eranian [Thu, 13 Jul 2017 17:35:45 +0000 (10:35 -0700)]
perf/x86/intel/uncore: Fix Skylake UPI PMU event masks

This patch fixes the event_mask and event_ext_mask for the Intel Skylake
Server UPI PMU. Bit 21 is not used as a filter. The extended umask is
from bit 32 to bit 55. Correct both umasks.

Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Kan Liang <kan.liang@intel.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Link: http://lkml.kernel.org/r/1499967350-10385-2-git-send-email-kan.liang@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
6 years agoKVM: VMX: remove unused field
Paolo Bonzini [Fri, 14 Jul 2017 11:38:28 +0000 (13:38 +0200)]
KVM: VMX: remove unused field

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
6 years agoKVM: PPC: Book3S HV: Fix host crash on changing HPT size
Paul Mackerras [Fri, 21 Jul 2017 05:41:49 +0000 (15:41 +1000)]
KVM: PPC: Book3S HV: Fix host crash on changing HPT size

Commit f98a8bf9ee20 ("KVM: PPC: Book3S HV: Allow KVM_PPC_ALLOCATE_HTAB
ioctl() to change HPT size", 2016-12-20) changed the behaviour of
the KVM_PPC_ALLOCATE_HTAB ioctl so that it now allocates a new HPT
and new revmap array if there was a previously-allocated HPT of a
different size from the size being requested.  In this case, we need
to reset the rmap arrays of the memslots, because the rmap arrays
will contain references to HPTEs which are no longer valid.  Worse,
these references are also references to slots in the new revmap
array (which parallels the HPT), and the new revmap array contains
random contents, since it doesn't get zeroed on allocation.

The effect of having these stale references to slots in the revmap
array that contain random contents is that subsequent calls to
functions such as kvmppc_add_revmap_chain will crash because they
will interpret the non-zero contents of the revmap array as HPTE
indexes and thus index outside of the revmap array.  This leads to
host crashes such as the following.

[ 7072.862122] Unable to handle kernel paging request for data at address 0xd000000c250c00f8
[ 7072.862218] Faulting instruction address: 0xc0000000000e1c78
[ 7072.862233] Oops: Kernel access of bad area, sig: 11 [#1]
[ 7072.862286] SMP NR_CPUS=1024
[ 7072.862286] NUMA
[ 7072.862325] PowerNV
[ 7072.862378] Modules linked in: kvm_hv vhost_net vhost tap xt_CHECKSUM ipt_MASQUERADE nf_nat_masquerade_ipv4 ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 xt_conntrack ip_set nfnetlink ebtable_nat ebtable_broute bridge stp llc ip6table_mangle ip6table_security ip6table_raw iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw ebtable_filter ebtables ip6table_filter ip6_tables rpcrdma ib_isert iscsi_target_mod ib_iser libiscsi scsi_transport_iscsi ib_srpt target_core_mod ib_srp scsi_transport_srp ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm iw_cxgb3 mlx5_ib ib_core ses enclosure scsi_transport_sas ipmi_powernv ipmi_devintf ipmi_msghandler powernv_op_panel i2c_opal nfsd auth_rpcgss oid_registry
[ 7072.863085]  nfs_acl lockd grace sunrpc kvm_pr kvm xfs libcrc32c scsi_dh_alua dm_service_time radeon lpfc nvme_fc nvme_fabrics nvme_core scsi_transport_fc i2c_algo_bit tg3 drm_kms_helper ptp pps_core syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm dm_multipath i2c_core cxgb3 mlx5_core mdio [last unloaded: kvm_hv]
[ 7072.863381] CPU: 72 PID: 56929 Comm: qemu-system-ppc Not tainted 4.12.0-kvm+ #59
[ 7072.863457] task: c000000fe29e7600 task.stack: c000001e3ffec000
[ 7072.863520] NIP: c0000000000e1c78 LR: c0000000000e2e3c CTR: c0000000000e25f0
[ 7072.863596] REGS: c000001e3ffef560 TRAP: 0300   Not tainted  (4.12.0-kvm+)
[ 7072.863658] MSR: 9000000100009033 <SF,HV,EE,ME,IR,DR,RI,LE,TM[E]>
[ 7072.863667]   CR: 44082882  XER: 20000000
[ 7072.863767] CFAR: c0000000000e2e38 DAR: d000000c250c00f8 DSISR: 42000000 SOFTE: 1
GPR00: c0000000000e2e3c c000001e3ffef7e0 c000000001407d00 d000000c250c00f0
GPR04: d00000006509fb70 d00000000b3d2048 0000000003ffdfb7 0000000000000000
GPR08: 00000001007fdfb7 00000000c000000f d0000000250c0000 000000000070f7bf
GPR12: 0000000000000008 c00000000fdad000 0000000010879478 00000000105a0d78
GPR16: 00007ffaf4080000 0000000000001190 0000000000000000 0000000000010000
GPR20: 4001ffffff000415 d00000006509fb70 0000000004091190 0000000ee1881190
GPR24: 0000000003ffdfb7 0000000003ffdfb7 00000000007fdfb7 c000000f5c958000
GPR28: d00000002d09fb70 0000000003ffdfb7 d00000006509fb70 d00000000b3d2048
[ 7072.864439] NIP [c0000000000e1c78] kvmppc_add_revmap_chain+0x88/0x130
[ 7072.864503] LR [c0000000000e2e3c] kvmppc_do_h_enter+0x84c/0x9e0
[ 7072.864566] Call Trace:
[ 7072.864594] [c000001e3ffef7e0] [c000001e3ffef830] 0xc000001e3ffef830 (unreliable)
[ 7072.864671] [c000001e3ffef830] [c0000000000e2e3c] kvmppc_do_h_enter+0x84c/0x9e0
[ 7072.864751] [c000001e3ffef920] [d00000000b38d878] kvmppc_map_vrma+0x168/0x200 [kvm_hv]
[ 7072.864831] [c000001e3ffef9e0] [d00000000b38a684] kvmppc_vcpu_run_hv+0x1284/0x1300 [kvm_hv]
[ 7072.864914] [c000001e3ffefb30] [d00000000f465664] kvmppc_vcpu_run+0x44/0x60 [kvm]
[ 7072.865008] [c000001e3ffefb60] [d00000000f461864] kvm_arch_vcpu_ioctl_run+0x114/0x290 [kvm]
[ 7072.865152] [c000001e3ffefbe0] [d00000000f453c98] kvm_vcpu_ioctl+0x598/0x7a0 [kvm]
[ 7072.865292] [c000001e3ffefd40] [c000000000389328] do_vfs_ioctl+0xd8/0x8c0
[ 7072.865410] [c000001e3ffefde0] [c000000000389be4] SyS_ioctl+0xd4/0x130
[ 7072.865526] [c000001e3ffefe30] [c00000000000b760] system_call+0x58/0x6c
[ 7072.865644] Instruction dump:
[ 7072.865715] e95b2110 793a0020 7b4926e4 7f8a4a14 409e0098 807c000c 786326e4 7c6a1a14
[ 7072.865857] 935e0008 7bbd0020 813c000c 913e000c <93a3000893bc000c 48000038 60000000
[ 7072.866001] ---[ end trace 627b6e4bf8080edc ]---

Note that to trigger this, it is necessary to use a recent upstream
QEMU (or other userspace that resizes the HPT at CAS time), specify
a maximum memory size substantially larger than the current memory
size, and boot a guest kernel that does not support HPT resizing.

This fixes the problem by resetting the rmap arrays when the old HPT
is freed.

Fixes: f98a8bf9ee20 ("KVM: PPC: Book3S HV: Allow KVM_PPC_ALLOCATE_HTAB ioctl() to change HPT size")
Cc: stable@vger.kernel.org # v4.11+
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
6 years agoKVM: PPC: Book3S HV: Enable TM before accessing TM registers
Paul Mackerras [Fri, 21 Jul 2017 03:57:14 +0000 (13:57 +1000)]
KVM: PPC: Book3S HV: Enable TM before accessing TM registers

Commit 46a704f8409f ("KVM: PPC: Book3S HV: Preserve userspace HTM state
properly", 2017-06-15) added code to read transactional memory (TM)
registers but forgot to enable TM before doing so.  The result is
that if userspace does have live values in the TM registers, a KVM_RUN
ioctl will cause a host kernel crash like this:

[  181.328511] Unrecoverable TM Unavailable Exception f60 at d00000001e7d9980
[  181.328605] Oops: Unrecoverable TM Unavailable Exception, sig: 6 [#1]
[  181.328613] SMP NR_CPUS=2048
[  181.328613] NUMA
[  181.328618] PowerNV
[  181.328646] Modules linked in: vhost_net vhost tap nfs_layout_nfsv41_files rpcsec_gss_krb5 nfsv4 dns_resolver nfs
+fscache xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat
+nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT nf_reject_ipv4 tun ebtable_filter ebtables
+ip6table_filter ip6_tables iptable_filter bridge stp llc kvm_hv kvm nfsd ses enclosure scsi_transport_sas ghash_generic
+auth_rpcgss gf128mul xts sg ctr nfs_acl lockd vmx_crypto shpchp ipmi_powernv i2c_opal grace ipmi_devintf i2c_core
+powernv_rng sunrpc ipmi_msghandler ibmpowernv uio_pdrv_genirq uio leds_powernv powernv_op_panel ip_tables xfs sd_mod
+lpfc ipr bnx2x libata mdio ptp pps_core scsi_transport_fc libcrc32c dm_mirror dm_region_hash dm_log dm_mod
[  181.329278] CPU: 40 PID: 9926 Comm: CPU 0/KVM Not tainted 4.12.0+ #1
[  181.329337] task: c000003fc6980000 task.stack: c000003fe4d80000
[  181.329396] NIP: d00000001e7d9980 LR: d00000001e77381c CTR: d00000001e7d98f0
[  181.329465] REGS: c000003fe4d837e0 TRAP: 0f60   Not tainted  (4.12.0+)
[  181.329523] MSR: 9000000000009033 <SF,HV,EE,ME,IR,DR,RI,LE>
[  181.329527]   CR: 24022448  XER: 00000000
[  181.329608] CFAR: d00000001e773818 SOFTE: 1
[  181.329608] GPR00: d00000001e77381c c000003fe4d83a60 d00000001e7ef410 c000003fdcfe0000
[  181.329608] GPR04: c000003fe4f00000 0000000000000000 0000000000000000 c000003fd7954800
[  181.329608] GPR08: 0000000000000001 c000003fc6980000 0000000000000000 d00000001e7e2880
[  181.329608] GPR12: d00000001e7d98f0 c000000007b19000 00000001295220e0 00007fffc0ce2090
[  181.329608] GPR16: 0000010011886608 00007fff8c89f260 0000000000000001 00007fff8c080028
[  181.329608] GPR20: 0000000000000000 00000100118500a6 0000010011850000 0000010011850000
[  181.329608] GPR24: 00007fffc0ce1b48 0000010011850000 00000000d673b901 0000000000000000
[  181.329608] GPR28: 0000000000000000 c000003fdcfe0000 c000003fdcfe0000 c000003fe4f00000
[  181.330199] NIP [d00000001e7d9980] kvmppc_vcpu_run_hv+0x90/0x6b0 [kvm_hv]
[  181.330264] LR [d00000001e77381c] kvmppc_vcpu_run+0x2c/0x40 [kvm]
[  181.330322] Call Trace:
[  181.330351] [c000003fe4d83a60] [d00000001e773478] kvmppc_set_one_reg+0x48/0x340 [kvm] (unreliable)
[  181.330437] [c000003fe4d83b30] [d00000001e77381c] kvmppc_vcpu_run+0x2c/0x40 [kvm]
[  181.330513] [c000003fe4d83b50] [d00000001e7700b4] kvm_arch_vcpu_ioctl_run+0x114/0x2a0 [kvm]
[  181.330586] [c000003fe4d83bd0] [d00000001e7642f8] kvm_vcpu_ioctl+0x598/0x7a0 [kvm]
[  181.330658] [c000003fe4d83d40] [c0000000003451b8] do_vfs_ioctl+0xc8/0x8b0
[  181.330717] [c000003fe4d83de0] [c000000000345a64] SyS_ioctl+0xc4/0x120
[  181.330776] [c000003fe4d83e30] [c00000000000b004] system_call+0x58/0x6c
[  181.330833] Instruction dump:
[  181.330869] e92d0260 e9290b50 e9290108 792807e3 41820058 e92d0260 e9290b50 e9290108
[  181.330941] 792ae8a4 794a1f87 408204f4 e92d0260 <7d4022a6f9490ff0 e92d0260 7d4122a6
[  181.331013] ---[ end trace 6f6ddeb4bfe92a92 ]---

The fix is just to turn on the TM bit in the MSR before accessing the
registers.

Cc: stable@vger.kernel.org # v3.14+
Fixes: 46a704f8409f ("KVM: PPC: Book3S HV: Preserve userspace HTM state properly")
Reported-by: Jan Stancek <jstancek@redhat.com>
Tested-by: Jan Stancek <jstancek@redhat.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
6 years agoMerge branch 'drm-vmwgfx-fixes' of git://people.freedesktop.org/~syeh/repos_linux...
Dave Airlie [Mon, 24 Jul 2017 05:57:28 +0000 (15:57 +1000)]
Merge branch 'drm-vmwgfx-fixes' of git://people.freedesktop.org/~syeh/repos_linux into drm-fixes

misc vmwgfx fixes.

* 'drm-vmwgfx-fixes' of git://people.freedesktop.org/~syeh/repos_linux:
  drm/vmwgfx: constify pci_device_id.
  drm/vmwgfx: Fix gcc-7.1.1 warning
  drm/vmwgfx: Fix cursor hotspot issue with Wayland on Fedora
  drm/vmwgfx: Limit max desktop dimensions to 8Kx8K
  drm/vmwgfx: dma-buf: Constify ttm_place structures.
  drm/vmwgfx: fix comment mistake for vmw_cmd_dx_set_index_buffer()
  drm/vmwgfx: Use dma_pool_zalloc
  drm/vmwgfx: Fix handling of errors returned by 'vmw_cotable_alloc()'
  drm/vmwgfx: Fix NULL pointer comparison

6 years agoMerge branch 'linux-4.13' of git://github.com/skeggsb/linux into drm-fixes
Dave Airlie [Mon, 24 Jul 2017 05:38:53 +0000 (15:38 +1000)]
Merge branch 'linux-4.13' of git://github.com/skeggsb/linux into drm-fixes

nouveau regression fixes.

* 'linux-4.13' of git://github.com/skeggsb/linux:
  drm/nouveau/kms: remove call to drm_crtc_vblank_off() during unload/suspend
  drm/nouveau/kms/nv50: update vblank state in response to modeset actions
  drm/nouveau/disp: add tv encoders to output resource mapping
  drm/nouveau/i2c/gf119-: add support for address-only transactions

6 years agodrm/nouveau/kms: remove call to drm_crtc_vblank_off() during unload/suspend
Ben Skeggs [Tue, 18 Jul 2017 22:47:07 +0000 (08:47 +1000)]
drm/nouveau/kms: remove call to drm_crtc_vblank_off() during unload/suspend

These on()/off() calls should be done as a result of modesetting actions,
and as we shut down all heads already on unload/suspend, it's pointless
to call off() again.

Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
6 years agodrm/nouveau/kms/nv50: update vblank state in response to modeset actions
Ben Skeggs [Mon, 24 Jul 2017 01:01:52 +0000 (11:01 +1000)]
drm/nouveau/kms/nv50: update vblank state in response to modeset actions

Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
6 years agodrm/nouveau/disp: add tv encoders to output resource mapping
Ben Skeggs [Tue, 18 Jul 2017 03:30:29 +0000 (13:30 +1000)]
drm/nouveau/disp: add tv encoders to output resource mapping

We don't support them on G80, but we need to add them to the mapping to
avoid triggering a WARN_ON() on GPUs where the ports are present.

Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
6 years agodrm/nouveau/i2c/gf119-: add support for address-only transactions
Ben Skeggs [Wed, 19 Jul 2017 06:49:59 +0000 (16:49 +1000)]
drm/nouveau/i2c/gf119-: add support for address-only transactions

Since switching the I2C-over-AUX helpers, there have been regressions on
some display combinations due to us not having support for "address only"
transactions.

This commits enables support for them for GF119 and newer.

Earlier GPUs have been reverted to a custom I2C-over-AUX algorithm.

Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
6 years agodrm/rockchip: fix Kconfig dependencies
Arnd Bergmann [Fri, 21 Jul 2017 21:12:06 +0000 (23:12 +0200)]
drm/rockchip: fix Kconfig dependencies

A bug that I had fixed earlier just came back, with CONFIG_EXTCON=m,
the rockchip drm driver will fail to link:

drivers/gpu/drm/rockchip/cdn-dp-core.o: In function `cdn_dp_get_port_lanes':
cdn-dp-core.c:(.text.cdn_dp_get_port_lanes+0x30): undefined reference to `extcon_get_state'
cdn-dp-core.c:(.text.cdn_dp_get_port_lanes+0x6c): undefined reference to `extcon_get_property'
drivers/gpu/drm/rockchip/cdn-dp-core.o: In function `cdn_dp_check_sink_connection':
cdn-dp-core.c:(.text.cdn_dp_check_sink_connection+0x80): undefined reference to `extcon_get_state'
drivers/gpu/drm/rockchip/cdn-dp-core.o: In function `cdn_dp_enable':
cdn-dp-core.c:(.text.cdn_dp_enable+0x748): undefined reference to `extcon_get_property'

The problem is that that the sub-drivers are now all linked into the
main rockchip drm module, which breaks all the Kconfig dependencies
that are specified in the options for those sub-drivers.

This clarifies the dependency to ensure that we can only turn on the DP
driver when EXTCON is reachable. As the 'select' statements can now
cause additional options to become built-in when they should be
loadable modules, I'm moving those into the main driver config option.
The dependency on DRM_ROCKCHIP can be reduced into a single 'if'
statement here for brevity, but this has no functional effect.

Fixes: b6705157b2db ("drm/rockchip: add extcon dependency for DP")
Fixes: 8820b68bd378 ("drm/rockchip: Refactor the component match logic.")
Link: https://patchwork.kernel.org/patch/9648761/
Acked-by: Guenter Roeck <groeck@chromium.org>
Tested-by: Jeffy Chen <jeffy.chen@rock-chips.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Mark Yao <mark.yao@rock-chips.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170721211214.3386387-1-arnd@arndb.de
6 years agoLinux 4.13-rc2 v4.13-rc2
Linus Torvalds [Sun, 23 Jul 2017 23:15:17 +0000 (16:15 -0700)]
Linux 4.13-rc2

6 years agoProperly alphabetize MAINTAINERS file
Linus Torvalds [Sun, 23 Jul 2017 23:06:21 +0000 (16:06 -0700)]
Properly alphabetize MAINTAINERS file

This adds a perl script to actually parse the MAINTAINERS file, clean up
some whitespace in it, warn about errors in it, and then properly sort
the end result.

My perl-fu is atrocious, so the script has basically been created by
randomly putting various characters in a pile, mixing them around, and
then looking it the end result does anything interesting when used as a
perl script.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
6 years agoFix up MAINTAINERS file problems
Linus Torvalds [Sun, 23 Jul 2017 22:08:05 +0000 (15:08 -0700)]
Fix up MAINTAINERS file problems

Prepping for scripting the MAINTAINERS file cleanup (and possible split)
showed a couple of cases where the headers for a couple of entries were
bogus.

There's a few different kinds of bogosities:

 - the X-GENE SOC EDAC case was confused and split over two lines

 - there were four entries for "GREYBUS PROTOCOLS DRIVERS" that were all
   different things.

 - the NOKIA N900 CAMERA SUPPORT" was duplicated

all of which were more obvious when you started doing associative arrays
in perl to track these things by the header (so that we can alphabetize
this thing properly, and so that we might split it up by the data too).

Cc: Joe Perches <joe@perches.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
6 years agoparisc: regenerate defconfig files
Helge Deller [Sun, 23 Jul 2017 20:08:42 +0000 (22:08 +0200)]
parisc: regenerate defconfig files

Signed-off-by: Helge Deller <deller@gmx.de>
6 years agoparisc: pdc_stable: constify attribute_group structures.
Arvind Yadav [Tue, 11 Jul 2017 09:00:34 +0000 (14:30 +0530)]
parisc: pdc_stable: constify attribute_group structures.

attribute_groups are not supposed to change at runtime. All functions
working with attribute_groups provided by <linux/sysfs.h> work
with const attribute_group. So mark the non-const structs as const.

Signed-off-by: Arvind Yadav <arvind.yadav.cs@gmail.com>
Signed-off-by: Helge Deller <deller@gmx.de>
6 years agoparisc: Merge millicode routines via linker script
Helge Deller [Sun, 9 Jul 2017 20:50:40 +0000 (22:50 +0200)]
parisc: Merge millicode routines via linker script

When compiling the 4.13-rc kernel I got those linker errors:
libgcc2.c:(.text+0x110): relocation truncated to fit: R_PARISC_PCREL22F against symbol `$$divU'
defined in .text.div section in /usr/lib/gcc/hppa64-linux-gnu/4.9.2/libgcc.a(_divU.o)
hppa64-linux-gnu-ld: /usr/lib/gcc/hppa64-linux-gnu/4.9.2/libgcc.a(_moddi3.o)(.text+0x174): cannot reach $$divU

Avoid such errors by bundling the millicode routines in the linker script.

Signed-off-by: Helge Deller <deller@gmx.de>
6 years agoparisc: Disable further stack checks when panic occurs during stack check
Helge Deller [Sat, 8 Jul 2017 21:25:14 +0000 (23:25 +0200)]
parisc: Disable further stack checks when panic occurs during stack check

Before the irq handler detects a low stack and then panics the kernel, disable
further stack checks to avoid recursive panics.

Reported-by: John David Anglin <dave.anglin@bell.net>
Signed-off-by: Helge Deller <deller@gmx.de>
6 years agoMerge tag 'for-linus-4.13b-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sun, 23 Jul 2017 18:22:45 +0000 (11:22 -0700)]
Merge tag 'for-linus-4.13b-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip

Pull xen fixes from Juergen Gross:
 "Some fixes and cleanups for running under Xen"

* tag 'for-linus-4.13b-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
  xen/balloon: don't online new memory initially
  xen/x86: fix cpu hotplug
  xen/grant-table: log the lack of grants
  xen/x86: Don't BUG on CPU0 offlining

6 years agoxen/balloon: don't online new memory initially
Juergen Gross [Mon, 10 Jul 2017 08:10:45 +0000 (10:10 +0200)]
xen/balloon: don't online new memory initially

When setting up the Xenstore watch for the memory target size the new
watch will fire at once. Don't try to reach the configured target size
by onlining new memory in this case, as the current memory size will
be smaller in almost all cases due to e.g. BIOS reserved pages.

Onlining new memory will lead to more problems e.g. undesired conflicts
with NVMe devices meant to be operated as block devices.

Instead remember the difference between target size and current size
when the watch fires for the first time and apply it to any further
size changes, too.

In order to avoid races between balloon.c and xen-balloon.c init calls
do the xen-balloon.c initialization from balloon.c.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
6 years agoxen/x86: fix cpu hotplug
Juergen Gross [Wed, 5 Jul 2017 14:05:20 +0000 (16:05 +0200)]
xen/x86: fix cpu hotplug

Commit dc6416f1d711eb4c1726e845d653235dcaae12e1 ("xen/x86: Call
cpu_startup_entry(CPUHP_AP_ONLINE_IDLE) from xen_play_dead()")
introduced an error leading to a stack overflow of the idle task when
a cpu was brought offline/online many times: by calling
cpu_startup_entry() instead of returning at the end of xen_play_dead()
do_idle() would be entered again and again.

Don't use cpu_startup_entry(), but cpuhp_online_idle() instead allowing
to return from xen_play_dead().

Cc: <stable@vger.kernel.org> # 4.12
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
6 years agoxen/grant-table: log the lack of grants
Wengang Wang [Tue, 18 Jul 2017 07:40:35 +0000 (09:40 +0200)]
xen/grant-table: log the lack of grants

log a message when we enter this situation:
1) we already allocated the max number of available grants from hypervisor
and
2) we still need more (but the request fails because of 1)).

Sometimes the lack of grants causes IO hangs in xen_blkfront devices.
Adding this log would help debuging.

Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Junxiao Bi <junxiao.bi@oracle.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
6 years agoxen/x86: Don't BUG on CPU0 offlining
Vitaly Kuznetsov [Mon, 26 Jun 2017 16:39:30 +0000 (18:39 +0200)]
xen/x86: Don't BUG on CPU0 offlining

CONFIG_BOOTPARAM_HOTPLUG_CPU0 allows to offline CPU0 but Xen HVM guests
BUG() in xen_teardown_timer(). Remove the BUG_ON(), this is probably a
leftover from ancient times when CPU0 hotplug was impossible, it works
just fine for HVM.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Acked-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
6 years agonbd: only set sndtimeo if we have a timeout set
Josef Bacik [Fri, 21 Jul 2017 14:48:15 +0000 (10:48 -0400)]
nbd: only set sndtimeo if we have a timeout set

A user reported that he was getting immediate disconnects with my
sndtimeo patch applied.  This is because by default the OSS nbd client
doesn't set a timeout, so we end up setting the sndtimeo to 0, which of
course means we have send errors a lot.  Instead only set our sndtimeo
if the user specified a timeout, otherwise we'll just wait forever like
we did previously.

Fixes: dc88e34d69d8 ("nbd: set sk->sk_sndtimeo for our sockets")
Reported-by: Adam Borowski <kilobyte@angband.pl>
Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
6 years agonbd: take tx_lock before disconnecting
Josef Bacik [Fri, 21 Jul 2017 14:48:14 +0000 (10:48 -0400)]
nbd: take tx_lock before disconnecting

We need to take the tx_lock so we don't interleave our disconnect
request between real data going down the wire.

Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
6 years agonbd: allow multiple disconnects to be sent
Josef Bacik [Fri, 21 Jul 2017 14:48:13 +0000 (10:48 -0400)]
nbd: allow multiple disconnects to be sent

There's no reason to limit ourselves to one disconnect message per
socket.  Sometimes networks do strange things, might as well let
sysadmins hit the panic button as much as they want.

Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>