]> git.kernelconcepts.de Git - karo-tx-linux.git/log
karo-tx-linux.git
6 years agohysdn: fix to a race condition in put_log_buffer
Anton Volkov [Mon, 7 Aug 2017 12:54:14 +0000 (15:54 +0300)]
hysdn: fix to a race condition in put_log_buffer

The synchronization type that was used earlier to guard the loop that
deletes unused log buffers may lead to a situation that prevents any
thread from going through the loop.

The patch deletes previously used synchronization mechanism and moves
the loop under the spin_lock so the similar cases won't be feasible in
the future.

Found by by Linux Driver Verification project (linuxtesting.org).

Signed-off-by: Anton Volkov <avolkov@ispras.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agos390/qeth: fix L3 next-hop in xmit qeth hdr
Julian Wiedmann [Mon, 7 Aug 2017 11:28:39 +0000 (13:28 +0200)]
s390/qeth: fix L3 next-hop in xmit qeth hdr

On L3, the qeth_hdr struct needs to be filled with the next-hop
IP address.
The current code accesses rtable->rt_gateway without checking that
rtable is a valid address. The accidental access to a lowcore area
results in a random next-hop address in the qeth_hdr.
rtable (or more precisely, skb_dst(skb)) can be NULL in rare cases
(for instance together with AF_PACKET sockets).
This patch adds the missing NULL-ptr checks.

Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Fixes: 87e7597b5a3 qeth: Move away from using neighbour entries in qeth_l3_fill_header()
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoMerge tag 'rdma-rc-2017-07-26' of git://git.kernel.org/pub/scm/linux/kernel/git/leon...
Doug Ledford [Mon, 7 Aug 2017 17:30:40 +0000 (13:30 -0400)]
Merge tag 'rdma-rc-2017-07-26' of git://git.kernel.org/pub/scm/linux/kernel/git/leon/linux-rdma into leon-ipoib

IPoIB fixes for 4.13

The patchset provides various fixes for IPoIB. It is combination of
fixes to various issues discovered during verification along with
static checkers cleanup patches.

Most of the patches are from pre-git era and hence lack of Fixes lines.

There is one exception in this IPoIB group - addition of patch revert:
Revert "IB/core: Allow QP state transition from reset to error", but
it followed by proper fix to the annoying print, so I thought it is
appropriate to include it.

Signed-off-by: Doug Ledford <dledford@redhat.com>
6 years agoMerge branch 'asix-Improve-robustness'
David S. Miller [Mon, 7 Aug 2017 17:10:19 +0000 (10:10 -0700)]
Merge branch 'asix-Improve-robustness'

Dean Jenkins says:

====================
asix: Improve robustness

Please consider taking these patches to improve the robustness of the ASIX USB
to Ethernet driver.

Failures prompting an ASIX driver code review
=============================================

On an ARM i.MX6 embedded platform some strange one-off and two-off failures were
observed in and around the ASIX USB to Ethernet driver. This was observed on a
highly modified kernel 3.14 with the ASIX driver containing back-ported changes
from kernel.org up to kernel 4.8 approximately.

a) A one-off failure in asix_rx_fixup_internal():

There was an occurrence of an attempt to write off the end of the netdev buffer
which was trapped by skb_over_panic() in skb_put().

[20030.846440] skbuff: skb_over_panic: text:7f2271c0 len:120 put:60 head:8366ecc0 data:8366ed02 tail:0x8366ed7a end:0x8366ed40 dev:eth0
[20030.863007] Kernel BUG at 8044ce38 [verbose debug info unavailable]

[20031.215345] Backtrace:
[20031.217884] [<8044cde0>] (skb_panic) from [<8044d50c>] (skb_put+0x50/0x5c)
[20031.227408] [<8044d4bc>] (skb_put) from [<7f2271c0>] (asix_rx_fixup_internal+0x1c4/0x23c [asix])
[20031.242024] [<7f226ffc>] (asix_rx_fixup_internal [asix]) from [<7f22724c>] (asix_rx_fixup_common+0x14/0x18 [asix])
[20031.260309] [<7f227238>] (asix_rx_fixup_common [asix]) from [<7f21f7d4>] (usbnet_bh+0x74/0x224 [usbnet])
[20031.269879] [<7f21f760>] (usbnet_bh [usbnet]) from [<8002f834>] (call_timer_fn+0xa4/0x1f0)
[20031.283961] [<8002f790>] (call_timer_fn) from [<80030834>] (run_timer_softirq+0x230/0x2a8)
[20031.302782] [<80030604>] (run_timer_softirq) from [<80028780>] (__do_softirq+0x15c/0x37c)
[20031.321511] [<80028624>] (__do_softirq) from [<80028c38>] (irq_exit+0x8c/0xe8)
[20031.339298] [<80028bac>] (irq_exit) from [<8000e9c8>] (handle_IRQ+0x8c/0xc8)
[20031.350038] [<8000e93c>] (handle_IRQ) from [<800085c8>] (gic_handle_irq+0xb8/0xf8)
[20031.365528] [<80008510>] (gic_handle_irq) from [<8050de80>] (__irq_svc+0x40/0x70)

Analysis of the logic of the ASIX driver (containing backported changes from
kernel.org up to kernel 4.8 approximately) suggested that the software could not
trigger skb_over_panic(). The analysis of the kernel BUG() crash information
suggested that the netdev buffer was written with 2 minimal 60 octet length
Ethernet frames (ASIX hardware drops the 4 octet FCS field) and the 2nd Ethernet
frame attempted to write off the end of the netdev buffer.

Note that the netdev buffer should only contain 1 Ethernet frame so if an
attempt to write 2 Ethernet frames into the buffer is made then that is wrong.
However, the logic of the asix_rx_fixup_internal() only allows 1 Ethernet frame
to be written into the netdev buffer.

Potentially this failure was due to memory corruption because it was only seen
once.

b) Two-off failures in the NAPI layer's backlog queue:

There were 2 crashes in the NAPI layer's backlog queue presumably after
asix_rx_fixup_internal() called usbnet_skb_return().

[24097.273945] Unable to handle kernel NULL pointer dereference at virtual address 00000004

[24097.398944] PC is at process_backlog+0x80/0x16c

[24097.569466] Backtrace:
[24097.572007] [<8045ad98>] (process_backlog) from [<8045b64c>] (net_rx_action+0xcc/0x248)
[24097.591631] [<8045b580>] (net_rx_action) from [<80028780>] (__do_softirq+0x15c/0x37c)
[24097.610022] [<80028624>] (__do_softirq) from [<800289cc>] (run_ksoftirqd+0x2c/0x84)

and

[ 1059.828452] Unable to handle kernel NULL pointer dereference at virtual address 00000000

[ 1059.953715] PC is at process_backlog+0x84/0x16c

[ 1060.140896] Backtrace:
[ 1060.143434] [<8045ad98>] (process_backlog) from [<8045b64c>] (net_rx_action+0xcc/0x248)
[ 1060.163075] [<8045b580>] (net_rx_action) from [<80028780>] (__do_softirq+0x15c/0x37c)
[ 1060.181474] [<80028624>] (__do_softirq) from [<80028c38>] (irq_exit+0x8c/0xe8)
[ 1060.199256] [<80028bac>] (irq_exit) from [<8000e9c8>] (handle_IRQ+0x8c/0xc8)
[ 1060.210006] [<8000e93c>] (handle_IRQ) from [<800085c8>] (gic_handle_irq+0xb8/0xf8)
[ 1060.225492] [<80008510>] (gic_handle_irq) from [<8050de80>] (__irq_svc+0x40/0x70)

The embedded board was only using an ASIX USB to Ethernet adaptor eth0.

Analysis suggested that the doubly-linked list pointers of the backlog queue had
been corrupted because one of the link pointers was NULL.

Potentially this failure was due to memory corruption because it was only seen
twice.

Results of the ASIX driver code review
======================================

During the code review some weaknesses were observed in the ASIX driver and the
following patches have been created to improve the robustness.

Brief overview of the patches
-----------------------------

1. asix: Add rx->ax_skb = NULL after usbnet_skb_return()

The current ASIX driver sends the received Ethernet frame to the NAPI layer of
the network stack via the call to usbnet_skb_return() in
asix_rx_fixup_internal() but retains the rx->ax_skb pointer to the netdev
buffer. The driver no longer needs the rx->ax_skb pointer at this point because
the NAPI layer now has the Ethernet frame.

This means that asix_rx_fixup_internal() must not use rx->ax_skb after the call
to usbnet_skb_return() because it could corrupt the handling of the Ethernet
frame within the network layer.

Therefore, to remove the risk of erroneous usage of rx->ax_skb, set rx->ax_skb
to NULL after the call to usbnet_skb_return(). This avoids potential erroneous
freeing of rx->ax_skb and erroneous writing to the netdev buffer.  If the
software now somehow inappropriately reused rx->ax_skb, then a NULL pointer
dereference of rx->ax_skb would occur which makes investigation easier.

2. asix: Ensure asix_rx_fixup_info members are all reset

This patch creates reset_asix_rx_fixup_info() to allow all the
asix_rx_fixup_info structure members to be consistently reset to initial
conditions.

Call reset_asix_rx_fixup_info() upon each detectable error condition so that the
next URB is processed from a known state.

Otherwise, there is a risk that some members of the asix_rx_fixup_info structure
may be incorrect after an error occurred so potentially leading to a
malfunction.

3. asix: Fix small memory leak in ax88772_unbind()

This patch creates asix_rx_fixup_common_free() to allow the rx->ax_skb to be
freed when necessary.

asix_rx_fixup_common_free() is called from ax88772_unbind() before the parent
private data structure is freed.

Without this patch, there is a risk of a small netdev buffer memory leak each
time ax88772_unbind() is called during the reception of an Ethernet frame that
spans across 2 URBs.

Testing
=======

The patches have been sanity tested on a 64-bit Linux laptop running kernel
4.13-rc2 with the 3 patches applied on top.

The ASIX USB to Adaptor used for testing was (output of lsusb):
ID 0b95:772b ASIX Electronics Corp. AX88772B

Test #1
-------

The test ran a flood ping test script which slowly incremented the ICMP Echo
Request's payload from 0 to 5000 octets. This eventually causes IPv4
fragmentation to occur which causes Ethernet frames to be sent very close to
each other so increases the probability that an Ethernet frame will span 2 URBs.
The test showed that all pings were successful. The test took about 15 minutes
to complete.

Test #2
-------

A script was run on the laptop to periodically run ifdown and ifup every second
so that the ASIX USB to Adaptor was up for 1 second and down for 1 second.

From a Linux PC connected to the laptop, the following ping command was used
ping -f -s 5000 <ip address of laptop>

The large ICMP payload causes IPv4 fragmentation resulting in multiple
Ethernet frames per original IP packet.

Kernel debug within the ASIX driver was enabled to see whether any ASIX errors
were generated. The test was run for about 24 hours and no ASIX errors were
seen.

Patches
=======

The 3 patches have been rebased off the net-next repo master branch with HEAD
fbbeefd net: fec: Allow reception of frames bigger than 1522 bytes
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoasix: Fix small memory leak in ax88772_unbind()
Dean Jenkins [Mon, 7 Aug 2017 08:50:16 +0000 (09:50 +0100)]
asix: Fix small memory leak in ax88772_unbind()

When Ethernet frames span mulitple URBs, the netdev buffer memory
pointed to by the asix_rx_fixup_info structure remains allocated
during the time gap between the 2 executions of asix_rx_fixup_internal().

This means that if ax88772_unbind() is called within this time
gap to free the memory of the parent private data structure then
a memory leak of the part filled netdev buffer memory will occur.

Therefore, create a new function asix_rx_fixup_common_free() to
free the memory of the netdev buffer and add a call to
asix_rx_fixup_common_free() from inside ax88772_unbind().

Consequently when an unbind occurs part way through receiving
an Ethernet frame, the netdev buffer memory that is holding part
of the received Ethernet frame will now be freed.

Signed-off-by: Dean Jenkins <Dean_Jenkins@mentor.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoasix: Ensure asix_rx_fixup_info members are all reset
Dean Jenkins [Mon, 7 Aug 2017 08:50:15 +0000 (09:50 +0100)]
asix: Ensure asix_rx_fixup_info members are all reset

There is a risk that the members of the structure asix_rx_fixup_info
become unsynchronised leading to the possibility of a malfunction.

For example, rx->split_head was not being set to false after an
error was detected so potentially could cause a malformed 32-bit
Data header word to be formed.

Therefore add function reset_asix_rx_fixup_info() to reset all the
members of asix_rx_fixup_info so that future processing will start
with known initial conditions.

Also, if (skb->len != offset) becomes true then call
reset_asix_rx_fixup_info() so that the processing of the next URB
starts with known initial conditions. Without the call, the check
does nothing which potentially could lead to a malfunction
when the next URB is processed.

In addition, for robustness, call reset_asix_rx_fixup_info() before
every error path's "return 0". This ensures that the next URB is
processed from known initial conditions.

Signed-off-by: Dean Jenkins <Dean_Jenkins@mentor.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoasix: Add rx->ax_skb = NULL after usbnet_skb_return()
Dean Jenkins [Mon, 7 Aug 2017 08:50:14 +0000 (09:50 +0100)]
asix: Add rx->ax_skb = NULL after usbnet_skb_return()

In asix_rx_fixup_internal() there is a risk that rx->ax_skb gets
reused after passing the Ethernet frame into the network stack via
usbnet_skb_return().

The risks include:

a) asynchronously freeing rx->ax_skb after passing the netdev buffer
   to the NAPI layer which might corrupt the backlog queue.

b) erroneously reusing rx->ax_skb such as calling skb_put_data() multiple
   times which causes writing off the end of the netdev buffer.

Therefore add a defensive rx->ax_skb = NULL after usbnet_skb_return()
so that it is not possible to free rx->ax_skb or to apply
skb_put_data() too many times.

Signed-off-by: Dean Jenkins <Dean_Jenkins@mentor.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agobpf: fix selftest/bpf/test_pkt_md_access on s390x
Thomas Richter [Mon, 7 Aug 2017 08:16:36 +0000 (10:16 +0200)]
bpf: fix selftest/bpf/test_pkt_md_access on s390x

Commit 18f3d6be6be1 ("selftests/bpf: Add test cases to test narrower ctx field loads")
introduced new eBPF test cases. One of them (test_pkt_md_access.c)
fails on s390x. The BPF verifier error message is:

[root@s8360046 bpf]# ./test_progs
test_pkt_access:PASS:ipv4 349 nsec
test_pkt_access:PASS:ipv6 212 nsec
[....]
libbpf: load bpf program failed: Permission denied
libbpf: -- BEGIN DUMP LOG ---
libbpf:
0: (71) r2 = *(u8 *)(r1 +0)
invalid bpf_context access off=0 size=1

libbpf: -- END LOG --
libbpf: failed to load program 'test1'
libbpf: failed to load object './test_pkt_md_access.o'
Summary: 29 PASSED, 1 FAILED
[root@s8360046 bpf]#

This is caused by a byte endianness issue. S390x is a big endian
architecture.  Pointer access to the lowest byte or halfword of a
four byte value need to add an offset.
On little endian architectures this offset is not needed.

Fix this and use the same approach as the originator used for other files
(for example test_verifier.c) in his original commit.

With this fix the test program test_progs succeeds on s390x:
[root@s8360046 bpf]# ./test_progs
test_pkt_access:PASS:ipv4 236 nsec
test_pkt_access:PASS:ipv6 217 nsec
test_xdp:PASS:ipv4 3624 nsec
test_xdp:PASS:ipv6 1722 nsec
test_l4lb:PASS:ipv4 926 nsec
test_l4lb:PASS:ipv6 1322 nsec
test_tcp_estats:PASS: 0 nsec
test_bpf_obj_id:PASS:get-fd-by-notexist-prog-id 0 nsec
test_bpf_obj_id:PASS:get-fd-by-notexist-map-id 0 nsec
test_bpf_obj_id:PASS:get-prog-info(fd) 0 nsec
test_bpf_obj_id:PASS:get-map-info(fd) 0 nsec
test_bpf_obj_id:PASS:get-prog-info(fd) 0 nsec
test_bpf_obj_id:PASS:get-map-info(fd) 0 nsec
test_bpf_obj_id:PASS:get-prog-fd(next_id) 0 nsec
test_bpf_obj_id:PASS:get-prog-info(next_id->fd) 0 nsec
test_bpf_obj_id:PASS:get-prog-fd(next_id) 0 nsec
test_bpf_obj_id:PASS:get-prog-info(next_id->fd) 0 nsec
test_bpf_obj_id:PASS:check total prog id found by get_next_id 0 nsec
test_bpf_obj_id:PASS:get-map-fd(next_id) 0 nsec
test_bpf_obj_id:PASS:get-map-fd(next_id) 0 nsec
test_bpf_obj_id:PASS:get-map-fd(next_id) 0 nsec
test_bpf_obj_id:PASS:get-map-fd(next_id) 0 nsec
test_bpf_obj_id:PASS:get-map-fd(next_id) 0 nsec
test_bpf_obj_id:PASS:get-map-fd(next_id) 0 nsec
test_bpf_obj_id:PASS:get-map-fd(next_id) 0 nsec
test_bpf_obj_id:PASS:check get-map-info(next_id->fd) 0 nsec
test_bpf_obj_id:PASS:get-map-fd(next_id) 0 nsec
test_bpf_obj_id:PASS:check get-map-info(next_id->fd) 0 nsec
test_bpf_obj_id:PASS:check total map id found by get_next_id 0 nsec
test_pkt_md_access:PASS: 277 nsec
Summary: 30 PASSED, 0 FAILED
[root@s8360046 bpf]#

Fixes: 18f3d6be6be1 ("selftests/bpf: Add test cases to test narrower ctx field loads")
Signed-off-by: Thomas Richter <tmricht@linux.vnet.ibm.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agopinctrl: generic: update references to Documentation/pinctrl.txt
Ludovic Desroches [Sun, 6 Aug 2017 14:00:05 +0000 (16:00 +0200)]
pinctrl: generic: update references to Documentation/pinctrl.txt

Update deprecated references to Documentation/pinctrl.txt since it has been
moved to Documentation/driver-api/pinctl.rst.

Signed-off-by: Ludovic Desroches <ludovic.desroches@o2linux.fr>
Fixes: 5a9b73832e9e ("pinctrl.txt: move it to the driver-api book")
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
6 years agopinctrl: intel: merrifield: Correct UART pin lists
Andy Shevchenko [Fri, 4 Aug 2017 16:26:34 +0000 (19:26 +0300)]
pinctrl: intel: merrifield: Correct UART pin lists

UART pin lists consist GPIO numbers which is simply wrong.
Replace it by pin numbers.

Fixes: 4e80c8f50574 ("pinctrl: intel: Add Intel Merrifield pin controller support")
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Acked-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
6 years agopinctrl: armada-37xx: Fix number of pin in south bridge
Gregory CLEMENT [Tue, 1 Aug 2017 15:57:20 +0000 (17:57 +0200)]
pinctrl: armada-37xx: Fix number of pin in south bridge

On the south bridge we have pin from to 29, so it gives 30 pins (and not
29).

Without this patch the kernel complain with the following traces:
cat /sys/kernel/debug/pinctrl/d0018800.pinctrl/pingroups
[  154.530205] armada-37xx-pinctrl d0018800.pinctrl: failed to get pin(29) name
[  154.537567] ------------[ cut here ]------------
[  154.542348] WARNING: CPU: 1 PID: 1347 at /home/gclement/open/kernel/marvell-mainline-linux/drivers/pinctrl/core.c:1610 pinctrl_groups_show+0x15c/0x1a0
[  154.555918] Modules linked in:
[  154.558890] CPU: 1 PID: 1347 Comm: cat Tainted: G        W       4.13.0-rc1-00001-g19e1b9fa219d #525
[  154.568316] Hardware name: Marvell Armada 3720 Development Board DB-88F3720-DDR3 (DT)
[  154.576311] task: ffff80001d32d100 task.stack: ffff80001bdc0000
[  154.583048] PC is at pinctrl_groups_show+0x15c/0x1a0
[  154.587816] LR is at pinctrl_groups_show+0x148/0x1a0
[  154.592847] pc : [<ffff0000083e3adc>] lr : [<ffff0000083e3ac8>] pstate: 00000145
[  154.600840] sp : ffff80001bdc3c80
[  154.604255] x29: ffff80001bdc3c80 x28: 00000000f7750000
[  154.609825] x27: ffff80001d05d198 x26: 0000000000000009
[  154.615224] x25: ffff0000089ead20 x24: 0000000000000002
[  154.620705] x23: ffff000008c8e1d0 x22: ffff80001be55700
[  154.626187] x21: ffff80001d05d100 x20: 0000000000000005
[  154.631667] x19: 0000000000000006 x18: 0000000000000010
[  154.637238] x17: 0000000000000000 x16: ffff0000081fc4b8
[  154.642726] x15: 0000000000000006 x14: ffff0000899e537f
[  154.648214] x13: ffff0000099e538d x12: 206f742064656c69
[  154.653613] x11: 6166203a6c727463 x10: 0000000005f5e0ff
[  154.659094] x9 : ffff80001bdc38c0 x8 : 286e697020746567
[  154.664576] x7 : ffff000008551870 x6 : 000000000000011b
[  154.670146] x5 : 0000000000000000 x4 : 0000000000000000
[  154.675544] x3 : 0000000000000000 x2 : 0000000000000000
[  154.681025] x1 : ffff000008c8e1d0 x0 : ffff80001be55700
[  154.686507] Call trace:
[  154.688668] Exception stack(0xffff80001bdc3ab0 to 0xffff80001bdc3be0)
[  154.695224] 3aa0:                                   0000000000000006 0001000000000000
[  154.703310] 3ac0: ffff80001bdc3c80 ffff0000083e3adc ffff80001bdc3bb0 00000000ffffffd8
[  154.711304] 3ae0: 4554535953425553 6f6674616c703d4d 4349564544006d72 6674616c702b3d45
[  154.719478] 3b00: 313030643a6d726f 6e69702e30303838 ffff80006c727463 ffff0000089635d8
[  154.727562] 3b20: ffff80001d1ca0cb ffff000008af0fa4 ffff80001bdc3b40 ffff000008c8e1dc
[  154.735648] 3b40: ffff80001bdc3bc0 ffff000008223174 ffff80001be55700 ffff000008c8e1d0
[  154.743731] 3b60: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[  154.752354] 3b80: 000000000000011b ffff000008551870 286e697020746567 ffff80001bdc38c0
[  154.760446] 3ba0: 0000000005f5e0ff 6166203a6c727463 206f742064656c69 ffff0000099e538d
[  154.767910] 3bc0: ffff0000899e537f 0000000000000006 ffff0000081fc4b8 0000000000000000
[  154.776085] [<ffff0000083e3adc>] pinctrl_groups_show+0x15c/0x1a0
[  154.782823] [<ffff000008222abc>] seq_read+0x184/0x460
[  154.787505] [<ffff000008344120>] full_proxy_read+0x60/0xa8
[  154.793431] [<ffff0000081f9bec>] __vfs_read+0x1c/0x110
[  154.799001] [<ffff0000081faff4>] vfs_read+0x84/0x140
[  154.803860] [<ffff0000081fc4fc>] SyS_read+0x44/0xa0
[  154.808983] [<ffff000008082f30>] el0_svc_naked+0x24/0x28
[  154.814459] ---[ end trace 4cbb00a92d616b95 ]---

Cc: stable@vger.kernel.org
Fixes: 87466ccd9401 ("pinctrl: armada-37xx: Add pin controller support
for Armada 37xx")
Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
6 years agopinctrl: armada-37xx: Fix the pin 23 on south bridge
Gregory CLEMENT [Tue, 1 Aug 2017 15:57:19 +0000 (17:57 +0200)]
pinctrl: armada-37xx: Fix the pin 23 on south bridge

Pin 23 on South bridge does not belong to the rgmii group. It belongs to
a separate group which can have 3 functions.

Due to this the fix also have to update the way the functions are
managed. Until now each groups used NB_FUNCS(which was 2) functions. For
the mpp23, 3 functions are available but it is the only group which needs
it, so on the loop involving NB_FUNCS an extra test was added to handle
only the functions added.

The bug was visible with the merge of the commit 07d065abf93d "arm64:
dts: marvell: armada-3720-db: Add vqmmc regulator for SD slot", the gpio
regulator used the gpio 23, due to this the whole rgmii group was setup
to gpio which broke the Ethernet support on the Armada 3720 DB
board. Thanks to this patch, the UHS SD cards (which need the vqmmc)
_and_ the Ethernet work again.

Cc: stable@vger.kernel.org
Fixes: 87466ccd9401 ("pinctrl: armada-37xx: Add pin controller support
for Armada 37xx")
Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
6 years agoRevert "powerpc/64: Avoid restore_math call if possible in syscall exit"
Michael Ellerman [Mon, 7 Aug 2017 11:25:01 +0000 (21:25 +1000)]
Revert "powerpc/64: Avoid restore_math call if possible in syscall exit"

This reverts commit bc4f65e4cf9d6cc43e0e9ba0b8648cf9201cd55f.

As reported by Andreas, this commit is causing unrecoverable SLB misses in the
system call exit path:

  Unrecoverable exception 4100 at c00000000000a1ec
  Oops: Unrecoverable exception, sig: 6 [#1]
  SMP NR_CPUS=2 PowerMac
  ...
  CPU: 0 PID: 18626 Comm: rm Not tainted 4.13.0-rc3 #1
  task: c00000018335e080 task.stack: c000000139e50000
  NIP: c00000000000a1ec LR: c00000000000a118 CTR: 0000000000000000
  REGS: c000000139e53bb0 TRAP: 4100   Not tainted  (4.13.0-rc3)
  MSR: 9000000000001030 <SF,HV,ME,IR,DR> CR: 24000044  XER: 20000000 SOFTE: 1
  GPR00: 0000000000000000 c000000139e53e30 c000000000abb500 fffffffffffffffe
  GPR04: c0000001eb866298 0000000000000000 0000000000000000 c00000018335e080
  GPR08: 900000000000d032 0000000000000000 0000000000000002 fffffffffffff001
  GPR12: c000000139e50000 c00000000ffff000 00003fffa8c0dca0 00003fffa8c0dc88
  GPR16: 0000000010000000 0000000000000001 00003fffa8c0eaa0 0000000000000000
  GPR20: 00003fffa8c27528 00003fffa8c27b00 0000000000000000 0000000000000000
  GPR24: 00003fffa8c0d918 00003ffff1b3efa0 00003fffa8c26d68 0000000000000000
  GPR28: 00003fffa8c249e8 00003fffa8c263d0 00003fffa8c27550 00003ffff1b3ef10
  NIP [c00000000000a1ec] system_call_exit+0xc0/0x21c
  LR [c00000000000a118] system_call+0x58/0x6c
  Call Trace:
  [c000000139e53e30] [c00000000000a118] system_call+0x58/0x6c (unreliable)
  Instruction dump:
  64a51000 7c6300d0 f8a101a0 4bffff9c 3c000000 60000006 780007c6 64000000
  60000000 7c004039 4082001c e8ed0170 <88070b7888c70b79 7c003214 2c200000

This is caused by us trying to load THREAD_LOAD_FP with MSR_RI=0, and taking an
SLB miss on the thread struct.

Reported-by: Andreas Schwab <schwab@linux-m68k.org>
Diagnosed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
6 years agodrm/i915: fix backlight invert for non-zero minimum brightness
Jani Nikula [Wed, 31 May 2017 08:33:55 +0000 (11:33 +0300)]
drm/i915: fix backlight invert for non-zero minimum brightness

When we started following the backlight minimum brightness in
6dda730e55f4 ("drm/i915: respect the VBT minimum backlight brightness")
we overlooked the brightness invert quirk. Even if we invert the
brightness, we need to take the min limit into account. We probably
missed this because the invert has only been required on gen4 for proper
operation.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101127
Fixes: 6dda730e55f4 ("drm/i915: respect the VBT minimum backlight brightness")
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170531083355.7898-1-jani.nikula@intel.com
(cherry picked from commit e9d7486eac949f2a8d121657e536c8abdd4ea088)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
6 years agodrm/i915/shrinker: Wrap need_resched() inside preempt-disable
Chris Wilson [Fri, 4 Aug 2017 10:41:35 +0000 (11:41 +0100)]
drm/i915/shrinker: Wrap need_resched() inside preempt-disable

In order for us to successfully detect the end of a timeslice,
preemption must be disabled. Otherwise, inside the loop we may be
preempted many times without our noticing, and each time our timeslice
will be reset, invalidating need_resched()

Reported-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reported-by: Tomi Sarvela <tomi.p.sarvela@intel.com>
Fixes: 290271de34f6 ("drm/i915: Spin for struct_mutex inside shrinker")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: <drm-intel-fixes@lists.freedesktop.org> # v4.13-rc1+
Link: https://patchwork.freedesktop.org/patch/msgid/20170804104135.26805-1-chris@chris-wilson.co.uk
Tested-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
(cherry picked from commit 6cb0c6ad9e07f2c7971c4e8e0d9b7ceba151a925)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
6 years agodrm/i915/perf: fix flex eu registers programming
Lionel Landwerlin [Thu, 3 Aug 2017 16:58:07 +0000 (17:58 +0100)]
drm/i915/perf: fix flex eu registers programming

We were reserving fewer dwords in the ring than necessary. Indeed
we're always writing all registers once, so discard the actual number
of registers given by the user and just program the whitelisted ones
once.

Fixes: 19f81df2859e ("drm/i915/perf: Add OA unit support for Gen 8+")
Reported-by: Matthew Auld <matthew.william.auld@gmail.com>
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Cc: <stable@vger.kernel.org> # v4.12+
Link: https://patchwork.freedesktop.org/patch/msgid/20170803165812.2373-6-lionel.g.landwerlin@intel.com
(cherry picked from commit 01d928e9a1644eb2e28f684905f888e700c7b9dc)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
6 years agoMerge tag 'gvt-fixes-2017-08-07' of https://github.com/01org/gvt-linux into drm-intel...
Jani Nikula [Mon, 7 Aug 2017 08:32:46 +0000 (11:32 +0300)]
Merge tag 'gvt-fixes-2017-08-07' of https://github.com/01org/gvt-linux into drm-intel-fixes

gvt-fixes-2017-08-07

- two regression fixes for 65f9f6febf12, one is for display MMIO
  initial value (Tina), another for 64bit MMIO access (Xiong)
- two reset fixes from Chuanxiao

Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170807080716.qljcvws6opydnotk@zhen-hp.sh.intel.com
6 years agodrm/i915: Fix out-of-bounds array access in bdw_load_gamma_lut
Maarten Lankhorst [Mon, 24 Jul 2017 09:14:31 +0000 (11:14 +0200)]
drm/i915: Fix out-of-bounds array access in bdw_load_gamma_lut

bdw_load_gamma_lut is writing beyond the array to the maximum value.
The intend of the function is to clamp values > 1 to 1, so write
the intended color to the max register.

This fixes the following KASAN warning:

[  197.020857] [IGT] kms_pipe_color: executing
[  197.063434] [IGT] kms_pipe_color: starting subtest ctm-0-25-pipe0
[  197.078989] ==================================================================
[  197.079127] BUG: KASAN: slab-out-of-bounds in bdw_load_gamma_lut.isra.2+0x3b9/0x570 [i915]
[  197.079188] Read of size 2 at addr ffff8800d38db150 by task kms_pipe_color/1839
[  197.079208] CPU: 2 PID: 1839 Comm: kms_pipe_color Tainted: G     U 4.13.0-rc1-patser+ #5211
[  197.079215] Hardware name: NUC5i7RYB, BIOS RYBDWi35.86A.0246.2015.0309.1355 03/09/2015
[  197.079220] Call Trace:
[  197.079230]  dump_stack+0x68/0x9e
[  197.079239]  print_address_description+0x6f/0x250
[  197.079251]  kasan_report+0x216/0x370
[  197.079374]  ? bdw_load_gamma_lut.isra.2+0x3b9/0x570 [i915]
[  197.079451]  ? gen8_write16+0x4e0/0x4e0 [i915]
[  197.079460]  __asan_report_load2_noabort+0x14/0x20
[  197.079535]  bdw_load_gamma_lut.isra.2+0x3b9/0x570 [i915]
[  197.079612]  broadwell_load_luts+0x1df/0x550 [i915]
[  197.079690]  intel_color_load_luts+0x7b/0x80 [i915]
[  197.079764]  intel_begin_crtc_commit+0x138/0x760 [i915]
[  197.079783]  drm_atomic_helper_commit_planes_on_crtc+0x1a3/0x820 [drm_kms_helper]
[  197.079859]  ? intel_pre_plane_update+0x571/0x580 [i915]
[  197.079937]  intel_update_crtc+0x238/0x330 [i915]
[  197.080016]  intel_update_crtcs+0x10f/0x210 [i915]
[  197.080092]  intel_atomic_commit_tail+0x1552/0x3340 [i915]
[  197.080101]  ? _raw_spin_unlock+0x3c/0x40
[  197.080110]  ? __queue_work+0xb40/0xbf0
[  197.080188]  ? skl_update_crtcs+0xc00/0xc00 [i915]
[  197.080195]  ? trace_hardirqs_on+0xd/0x10
[  197.080269]  ? intel_atomic_commit_ready+0x128/0x13c [i915]
[  197.080329]  ? __i915_sw_fence_complete+0x5b8/0x6d0 [i915]
[  197.080336]  ? debug_object_activate+0x39e/0x580
[  197.080397]  ? i915_sw_fence_await+0x30/0x30 [i915]
[  197.080409]  ? __might_sleep+0x15b/0x180
[  197.080483]  intel_atomic_commit+0x944/0xa70 [i915]
[  197.080490]  ? refcount_dec_and_test+0x11/0x20
[  197.080567]  ? intel_atomic_commit_tail+0x3340/0x3340 [i915]
[  197.080597]  ? drm_atomic_crtc_set_property+0x303/0x580 [drm]
[  197.080674]  ? intel_atomic_commit_tail+0x3340/0x3340 [i915]
[  197.080704]  drm_atomic_commit+0xd7/0xe0 [drm]
[  197.080722]  drm_atomic_helper_crtc_set_property+0xec/0x130 [drm_kms_helper]
[  197.080749]  drm_mode_crtc_set_obj_prop+0x7d/0xb0 [drm]
[  197.080775]  drm_mode_obj_set_property_ioctl+0x50b/0x5d0 [drm]
[  197.080783]  ? __might_fault+0x104/0x180
[  197.080809]  ? drm_mode_obj_find_prop_id+0x160/0x160 [drm]
[  197.080838]  ? drm_mode_obj_find_prop_id+0x160/0x160 [drm]
[  197.080861]  drm_ioctl_kernel+0x154/0x1a0 [drm]
[  197.080885]  drm_ioctl+0x624/0x8f0 [drm]
[  197.080910]  ? drm_mode_obj_find_prop_id+0x160/0x160 [drm]
[  197.080934]  ? drm_getunique+0x210/0x210 [drm]
[  197.080943]  ? __handle_mm_fault+0x1bd0/0x1ce0
[  197.080949]  ? lock_downgrade+0x610/0x610
[  197.080957]  ? __lru_cache_add+0x15a/0x180
[  197.080967]  do_vfs_ioctl+0xd92/0xe40
[  197.080975]  ? ioctl_preallocate+0x1b0/0x1b0
[  197.080982]  ? selinux_capable+0x20/0x20
[  197.080991]  ? __do_page_fault+0x7b7/0x9a0
[  197.080997]  ? lock_downgrade+0x5bb/0x610
[  197.081007]  ? security_file_ioctl+0x57/0x90
[  197.081016]  SyS_ioctl+0x4e/0x80
[  197.081024]  entry_SYSCALL_64_fastpath+0x18/0xad
[  197.081030] RIP: 0033:0x7f61f287a987
[  197.081035] RSP: 002b:00007fff7d44d188 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[  197.081043] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f61f287a987
[  197.081048] RDX: 00007fff7d44d1c0 RSI: 00000000c01864ba RDI: 0000000000000003
[  197.081053] RBP: 00007f61f2b3eb00 R08: 0000000000000059 R09: 0000000000000000
[  197.081058] R10: 0000002ea5c4a290 R11: 0000000000000246 R12: 00007f61f2b3eb58
[  197.081063] R13: 0000000000001010 R14: 00007f61f2b3eb58 R15: 0000000000002702

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101659
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Reported-by: Martin Peres <martin.peres@linux.intel.com>
Cc: Martin Peres <martin.peres@linux.intel.com>
Fixes: 82cf435b3134 ("drm/i915: Implement color management on bdw/skl/bxt/kbl")
Cc: Shashank Sharma <shashank.sharma@intel.com>
Cc: Kiran S Kumar <kiran.s.kumar@intel.com>
Cc: Kausal Malladi <kausalmalladi@gmail.com>
Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: Matt Roper <matthew.d.roper@intel.com>
Cc: Daniel Vetter <daniel.vetter@intel.com>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: intel-gfx@lists.freedesktop.org
Cc: <stable@vger.kernel.org> # v4.7+
Link: https://patchwork.freedesktop.org/patch/msgid/20170724091431.24251-1-maarten.lankhorst@linux.intel.com
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
(cherry picked from commit 09a92bc8773b4314e02b478e003fe5936ce85adb)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
6 years agodrm/i915/gvt: Change the max length of mmio_reg_rw from 4 to 8
Xiong Zhang [Wed, 2 Aug 2017 02:31:01 +0000 (10:31 +0800)]
drm/i915/gvt: Change the max length of mmio_reg_rw from 4 to 8

When linux guest access mmio with __raw_i915_read64 or __raw_i915_write64,
its length is 8 bytes.

This fix the linux guest in xengt couldn't boot up as it fail in
reading pv_info->magic.

Fixes: 65f9f6febf12 ("drm/i915/gvt: Optimize MMIO register handling for some large MMIO blocks")
Signed-off-by: Xiong Zhang <xiong.y.zhang@intel.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
6 years agonetvsc: fix race on sub channel creation
stephen hemminger [Fri, 4 Aug 2017 00:13:54 +0000 (17:13 -0700)]
netvsc: fix race on sub channel creation

The existing sub channel code did not wait for all the sub-channels
to completely initialize. This could lead to race causing crash
in napi_netif_del() from bad list. The existing code would send
an init message, then wait only for the initial response that
the init message was received. It thought it was waiting for
sub channels but really the init response did the wakeup.

The new code keeps track of the number of open channels and
waits until that many are open.

Other issues here were:
  * host might return less sub-channels than was requested.
  * the new init status is not valid until after init was completed.

Fixes: b3e6b82a0099 ("hv_netvsc: Wait for sub-channels to be processed during probe")
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoLinux 4.13-rc4 v4.13-rc4
Linus Torvalds [Mon, 7 Aug 2017 01:44:49 +0000 (18:44 -0700)]
Linux 4.13-rc4

6 years agoMerge tag 'platform-drivers-x86-v4.13-4' of git://git.infradead.org/linux-platform...
Linus Torvalds [Sun, 6 Aug 2017 23:11:34 +0000 (16:11 -0700)]
Merge tag 'platform-drivers-x86-v4.13-4' of git://git.infradead.org/linux-platform-drivers-x86

Pull x86 platform driver fix from Darren Hart:
 "Fix loop preventing some platforms from waking up via the power button
  in s2idle:

   - intel-vbtn: match power button on press rather than release"

* tag 'platform-drivers-x86-v4.13-4' of git://git.infradead.org/linux-platform-drivers-x86:
  platform/x86: intel-vbtn: match power button on press rather than release

6 years agoMerge tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Sun, 6 Aug 2017 19:31:17 +0000 (12:31 -0700)]
Merge tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4

Pull ext4 fixes from Ted Ts'o:
 "A large number of ext4 bug fixes and cleanups for v4.13"

* tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
  ext4: fix copy paste error in ext4_swap_extents()
  ext4: fix overflow caused by missing cast in ext4_resize_fs()
  ext4, project: expand inode extra size if possible
  ext4: cleanup ext4_expand_extra_isize_ea()
  ext4: restructure ext4_expand_extra_isize
  ext4: fix forgetten xattr lock protection in ext4_expand_extra_isize
  ext4: make xattr inode reads faster
  ext4: inplace xattr block update fails to deduplicate blocks
  ext4: remove unused mode parameter
  ext4: fix warning about stack corruption
  ext4: fix dir_nlink behaviour
  ext4: silence array overflow warning
  ext4: fix SEEK_HOLE/SEEK_DATA for blocksize < pagesize
  ext4: release discard bio after sending discard commands
  ext4: convert swap_inode_data() over to use swap() on most of the fields
  ext4: error should be cleared if ea_inode isn't added to the cache
  ext4: Don't clear SGID when inheriting ACLs
  ext4: preserve i_mode if __ext4_set_acl() fails
  ext4: remove unused metadata accounting variables
  ext4: correct comment references to ext4_ext_direct_IO()

6 years agoMerge branch 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus
Linus Torvalds [Sun, 6 Aug 2017 18:52:01 +0000 (11:52 -0700)]
Merge branch 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus

Pull MIPS fixes from Ralf Baechle:
 "This fixes two build issues for ralink platforms, both due to missing
  #includes which used to be included indirectly via other headers"

* 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus:
  MIPS: ralink: mt7620: Add missing header
  MIPS: ralink: Fix build error due to missing header

6 years agoFix compat_sys_sigpending breakage
Dmitry V. Levin [Sat, 5 Aug 2017 20:00:50 +0000 (23:00 +0300)]
Fix compat_sys_sigpending breakage

The latest change of compat_sys_sigpending in commit 8f13621abced
("sigpending(): move compat to native") has broken it in two ways.

First, it tries to write 4 bytes more than userspace expects:
sizeof(old_sigset_t) == sizeof(long) == 8 instead of
sizeof(compat_old_sigset_t) == sizeof(u32) == 4.

Second, on big endian architectures these bytes are being written in the
wrong order.

This bug was found by strace test suite.

Reported-by: Anatoly Pugachev <matorola@gmail.com>
Inspired-by: Eugene Syromyatnikov <evgsyr@gmail.com>
Fixes: 8f13621abced ("sigpending(): move compat to native")
Signed-off-by: Dmitry V. Levin <ldv@altlinux.org>
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
6 years agoext4: fix copy paste error in ext4_swap_extents()
Maninder Singh [Sun, 6 Aug 2017 05:33:07 +0000 (01:33 -0400)]
ext4: fix copy paste error in ext4_swap_extents()

This bug was found by a static code checker tool for copy paste
problems.

Signed-off-by: Maninder Singh <maninder1.s@samsung.com>
Signed-off-by: Vaneet Narang <v.narang@samsung.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
6 years agoext4: fix overflow caused by missing cast in ext4_resize_fs()
Jerry Lee [Sun, 6 Aug 2017 05:18:31 +0000 (01:18 -0400)]
ext4: fix overflow caused by missing cast in ext4_resize_fs()

On a 32-bit platform, the value of n_blcoks_count may be wrong during
the file system is resized to size larger than 2^32 blocks.  This may
caused the superblock being corrupted with zero blocks count.

Fixes: 1c6bd7173d66
Signed-off-by: Jerry Lee <jerrylee@qnap.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@vger.kernel.org # 3.7+
6 years agoext4, project: expand inode extra size if possible
Miao Xie [Sun, 6 Aug 2017 05:00:49 +0000 (01:00 -0400)]
ext4, project: expand inode extra size if possible

When upgrading from old format, try to set project id
to old file first time, it will return EOVERFLOW, but if
that file is dirtied(touch etc), changing project id will
be allowed, this might be confusing for users, we could
try to expand @i_extra_isize here too.

Reported-by: Zhang Yi <yi.zhang@huawei.com>
Signed-off-by: Miao Xie <miaoxie@huawei.com>
Signed-off-by: Wang Shilong <wshilong@ddn.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
6 years agoext4: cleanup ext4_expand_extra_isize_ea()
Miao Xie [Sun, 6 Aug 2017 04:55:48 +0000 (00:55 -0400)]
ext4: cleanup ext4_expand_extra_isize_ea()

Clean up some goto statement, make ext4_expand_extra_isize_ea() clearer.

Signed-off-by: Miao Xie <miaoxie@huawei.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Wang Shilong <wshilong@ddn.com>
6 years agoext4: restructure ext4_expand_extra_isize
Miao Xie [Sun, 6 Aug 2017 04:40:01 +0000 (00:40 -0400)]
ext4: restructure ext4_expand_extra_isize

Current ext4_expand_extra_isize just tries to expand extra isize, if
someone is holding xattr lock or some check fails, it will give up.
So rename its name to ext4_try_to_expand_extra_isize.

Besides that, we clean up unnecessary check and move some relative checks
into it.

Signed-off-by: Miao Xie <miaoxie@huawei.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Wang Shilong <wshilong@ddn.com>
6 years agoext4: fix forgetten xattr lock protection in ext4_expand_extra_isize
Miao Xie [Sun, 6 Aug 2017 04:27:38 +0000 (00:27 -0400)]
ext4: fix forgetten xattr lock protection in ext4_expand_extra_isize

We should avoid the contention between the i_extra_isize update and
the inline data insertion, so move the xattr trylock in front of
i_extra_isize update.

Signed-off-by: Miao Xie <miaoxie@huawei.com>
Reviewed-by: Wang Shilong <wshilong@ddn.com>
6 years agoext4: make xattr inode reads faster
Tahsin Erdogan [Sun, 6 Aug 2017 04:07:01 +0000 (00:07 -0400)]
ext4: make xattr inode reads faster

ext4_xattr_inode_read() currently reads each block sequentially while
waiting for io operation to complete before moving on to the next
block. This prevents request merging in block layer.

Add a ext4_bread_batch() function that starts reads for all blocks
then optionally waits for them to complete. A similar logic is used
in ext4_find_entry(), so update that code to use the new function.

Signed-off-by: Tahsin Erdogan <tahsin@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
6 years agoext4: inplace xattr block update fails to deduplicate blocks
Tahsin Erdogan [Sun, 6 Aug 2017 02:41:42 +0000 (22:41 -0400)]
ext4: inplace xattr block update fails to deduplicate blocks

When an xattr block has a single reference, block is updated inplace
and it is reinserted to the cache. Later, a cache lookup is performed
to see whether an existing block has the same contents. This cache
lookup will most of the time return the just inserted entry so
deduplication is not achieved.

Running the following test script will produce two xattr blocks which
can be observed in "File ACL: " line of debugfs output:

  mke2fs -b 1024 -I 128 -F -O extent /dev/sdb 1G
  mount /dev/sdb /mnt/sdb

  touch /mnt/sdb/{x,y}

  setfattr -n user.1 -v aaa /mnt/sdb/x
  setfattr -n user.2 -v bbb /mnt/sdb/x

  setfattr -n user.1 -v aaa /mnt/sdb/y
  setfattr -n user.2 -v bbb /mnt/sdb/y

  debugfs -R 'stat x' /dev/sdb | cat
  debugfs -R 'stat y' /dev/sdb | cat

This patch defers the reinsertion to the cache so that we can locate
other blocks with the same contents.

Signed-off-by: Tahsin Erdogan <tahsin@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Andreas Dilger <adilger@dilger.ca>
6 years agoext4: remove unused mode parameter
Tahsin Erdogan [Sun, 6 Aug 2017 02:15:45 +0000 (22:15 -0400)]
ext4: remove unused mode parameter

ext4_alloc_file_blocks() does not use its mode parameter. Remove it.

Signed-off-by: Tahsin Erdogan <tahsin@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
6 years agoext4: fix warning about stack corruption
Arnd Bergmann [Sun, 6 Aug 2017 01:57:46 +0000 (21:57 -0400)]
ext4: fix warning about stack corruption

After commit 62d1034f53e3 ("fortify: use WARN instead of BUG for now"),
we get a warning about possible stack overflow from a memcpy that
was not strictly bounded to the size of the local variable:

    inlined from 'ext4_mb_seq_groups_show' at fs/ext4/mballoc.c:2322:2:
include/linux/string.h:309:9: error: '__builtin_memcpy': writing between 161 and 1116 bytes into a region of size 160 overflows the destination [-Werror=stringop-overflow=]

We actually had a bug here that would have been found by the warning,
but it was already fixed last year in commit 30a9d7afe70e ("ext4: fix
stack memory corruption with 64k block size").

This replaces the fixed-length structure on the stack with a variable-length
structure, using the correct upper bound that tells the compiler that
everything is really fine here. I also change the loop count to check
for the same upper bound for consistency, but the existing code is
already correct here.

Note that while clang won't allow certain kinds of variable-length arrays
in structures, this particular instance is fine, as the array is at the
end of the structure, and the size is strictly bounded.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
6 years agoext4: fix dir_nlink behaviour
Andreas Dilger [Sat, 5 Aug 2017 23:47:34 +0000 (19:47 -0400)]
ext4: fix dir_nlink behaviour

The dir_nlink feature has been enabled by default for new ext4
filesystems since e2fsprogs-1.41 in 2008, and was automatically
enabled by the kernel for older ext4 filesystems since the
dir_nlink feature was added with ext4 in kernel 2.6.28+ when
the subdirectory count exceeded EXT4_LINK_MAX-1.

Automatically adding the file system features such as dir_nlink is
generally frowned upon, since it could cause the file system to not be
mountable on older kernel, thus preventing the administrator from
rolling back to an older kernel if necessary.

In this case, the administrator might also want to disable the feature
because glibc's fts_read() function does not correctly optimize
directory traversal for directories that use st_nlinks field of 1 to
indicate that the number of links in the directory are not tracked by
the file system, and could fail to traverse the full directory
hierarchy.  Fortunately, in the past ten years very few users have
complained about incomplete file system traversal by glibc's
fts_read().

This commit also changes ext4_inc_count() to allow i_nlinks to reach
the full EXT4_LINK_MAX links on the parent directory (including "."
and "..") before changing i_links_count to be 1.

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=196405
Signed-off-by: Andreas Dilger <adilger@dilger.ca>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
6 years agoext4: silence array overflow warning
Dan Carpenter [Sat, 5 Aug 2017 23:00:31 +0000 (19:00 -0400)]
ext4: silence array overflow warning

I get a static checker warning:

    fs/ext4/ext4.h:3091 ext4_set_de_type()
    error: buffer overflow 'ext4_type_by_mode' 15 <= 15

It seems unlikely that we would hit this read overflow in real life, but
it's also simple enough to make the array 16 bytes instead of 15.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
6 years agoext4: fix SEEK_HOLE/SEEK_DATA for blocksize < pagesize
Jan Kara [Sat, 5 Aug 2017 21:43:24 +0000 (17:43 -0400)]
ext4: fix SEEK_HOLE/SEEK_DATA for blocksize < pagesize

ext4_find_unwritten_pgoff() does not properly handle a situation when
starting index is in the middle of a page and blocksize < pagesize. The
following command shows the bug on filesystem with 1k blocksize:

  xfs_io -f -c "falloc 0 4k" \
            -c "pwrite 1k 1k" \
            -c "pwrite 3k 1k" \
            -c "seek -a -r 0" foo

In this example, neither lseek(fd, 1024, SEEK_HOLE) nor lseek(fd, 2048,
SEEK_DATA) will return the correct result.

Fix the problem by neglecting buffers in a page before starting offset.

Reported-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Jan Kara <jack@suse.cz>
CC: stable@vger.kernel.org # 3.8+
6 years agoplatform/x86: intel-vbtn: match power button on press rather than release
Mario Limonciello [Fri, 4 Aug 2017 17:00:06 +0000 (12:00 -0500)]
platform/x86: intel-vbtn: match power button on press rather than release

This fixes a problem where the system gets stuck in a loop
unable to wakeup via power button in s2idle.

The problem happens because:
 - press power button:
   - system emits 0xc0 (power press), event ignored
   - system emits 0xc1 (power release), event processed,
     emited as KEY_POWER
   - set wakeup_mode to true
   - system goes to s2idle
 - press power button
   - system emits 0xc0 (power press), wakeup_mode is true,
     system wakes
   - system emits 0xc1 (power release), event processed,
     emited as KEY_POWER
   - system goes to s2idle again

To avoid this situation, process the presses (which matches what
intel-hid does too).

Verified on an Dell XPS 9365

Signed-off-by: Mario Limonciello <mario.limonciello@dell.com>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Darren Hart (VMware) <dvhart@infradead.org>
6 years agoMerge tag 'media/v4.13-2' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab...
Linus Torvalds [Sat, 5 Aug 2017 21:09:26 +0000 (14:09 -0700)]
Merge tag 'media/v4.13-2' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media

Pull media fixes from Mauro Carvalho Chehab:
 "This series is larger than I would like to submit for -rc4. My
  original intent were to sent it to either -rc2 or -rc3. Unfortunately,
  due to my vacations, I got a lot of pending stuff after my return, and
  had to do some biz trips, with prevented me to send this earlier.

  Several fixes:

   - some fixes at atomisp staging driver

   - several gcc 7 warning fixes

   - cleanup media SVG files, in order to fix PDF build on some distros

   - fix random Kconfig build of venus driver

   - some fixes for the venus driver

   - some changes from semaphone to mutex in ngene's driver

   - some locking fixes at dib0700 driver

   - several fixes on ngene's driver and frontends to make it properly
     support some new boards added on Kernel 4.13

   - some fixes to CEC drivers

   - omap_vout: vrfb: convert to dmaengine

   - docs-rst: document EBUSY for VIDIOC_S_FMT

  Please notice that the big diffstat changes here are at the SVG files.

  Visually, the images look the same, but the file size is now a lot
  smaller than before, and they don't use some XML tags that would cause
  them to be badly parsed by some ImageMagick versions, or to require a
  lot of memory by TeTex, with would break PDF output on some
  distributions"

* tag 'media/v4.13-2' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: (68 commits)
  media: atomisp2: array underflow in imx_enum_frame_size()
  media: atomisp2: array underflow in ap1302_enum_frame_size()
  media: atomisp2: Array underflow in atomisp_enum_input()
  media: platform: davinci: drop VPFE_CMD_S_CCDC_RAW_PARAMS
  media: platform: davinci: return -EINVAL for VPFE_CMD_S_CCDC_RAW_PARAMS ioctl
  media: venus: don't abuse dma_alloc for non-DMA allocations
  media: venus: hfi: fix error handling in hfi_sys_init_done()
  media: venus: fix compile-test build on non-qcom ARM platform
  media: venus: mark PM functions as __maybe_unused
  media: cec-notifier: small improvements
  media: pulse8-cec: persistent_config should be off by default
  media: cec: cec_transmit_attempt_done: ignore CEC_TX_STATUS_MAX_RETRIES
  media: staging: atomisp: array underflow in ioctl
  media: lirc: LIRC_GET_REC_RESOLUTION should return microseconds
  media: svg: avoid too long lines
  media: svg files: simplify files
  media: selection.svg: simplify the SVG file
  media: vimc: set id_table for platform drivers
  media: staging: atomisp: disable warnings with cc-disable-warning
  media: davinci: variable 'common' set but not used
  ...

6 years agoext4: release discard bio after sending discard commands
Daeho Jeong [Sat, 5 Aug 2017 17:11:57 +0000 (13:11 -0400)]
ext4: release discard bio after sending discard commands

We've changed the discard command handling into parallel manner.
But, in this change, I forgot decreasing the usage count of the bio
which was used to send discard request. I'm sorry about that.

Fixes: a015434480dc ("ext4: send parallel discards on commit completions")
Signed-off-by: Daeho Jeong <daeho.jeong@samsung.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Jan Kara <jack@suse.cz>
6 years agoMerge tag 'gpio-v4.13-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw...
Linus Torvalds [Sat, 5 Aug 2017 13:55:13 +0000 (06:55 -0700)]
Merge tag 'gpio-v4.13-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio

Pull GPIO fixes from Linus Walleij:

 - LP87565: set the proper output level for direction_output.

 - stm32: fix the kernel build by selecting the hierarchical irqdomain
   symbol properly - this happens to be done in the pin control
   framework but whatever, it had dependencies to GPIO so we need to
   apply it here.

 - Select the hierarchical IRQ domain also for Xgene.

 - Fix wakeups to work on MXC.

 - Fix up the device tree binding on Exar that went astray, also add the
   right bindings.

 - Fix the unwanted events for edges from the library.

 - Fix the unbalanced chanined IRQ on the Tegra.

* tag 'gpio-v4.13-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio:
  gpio: tegra: fix unbalanced chained_irq_enter/exit
  gpiolib: skip unwanted events, don't convert them to opposite edge
  gpio: exar: Use correct property prefix and document bindings
  gpio: gpio-mxc: Fix: higher 16 GPIOs usable as wake source
  gpio: xgene-sb: select IRQ_DOMAIN_HIERARCHY
  pinctrl: stm32: select IRQ_DOMAIN_HIERARCHY instead of depends on
  gpio: lp87565: Set proper output level and direction for direction_output
  MAINTAINERS: Add entry for Whiskey Cove PMIC GPIO driver

6 years agoMerge tag 'nand/fixes-for-4.13-rc4' of git://git.infradead.org/l2-mtd into MTD
Brian Norris [Sat, 5 Aug 2017 01:42:37 +0000 (18:42 -0700)]
Merge tag 'nand/fixes-for-4.13-rc4' of git://git.infradead.org/l2-mtd into MTD

"""
This PR contains both core and drivers fixes for 4.13.

Core fixes:
- Fix data interface setup for ONFI NANDs that do not support the SET
  FEATURES command
- Fix a kernel doc header
- Fix potential integer overflow when retrieving timing information
  from the parameter page
- Fix wrong OOB layout for small page NANDs

Driver fixes:
- Fix potential division-by-zero bug
- Fix backward compat with old atmel-nand DT bindings
- Fix ->setup_data_interface() in the atmel NAND driver
"""

6 years agoMerge tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Fri, 4 Aug 2017 23:45:29 +0000 (16:45 -0700)]
Merge tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux

Pull clk fixes from Stephen Boyd:
 "A handful of critical fixes for changes introduce this merge window.

   - The TI sci_clk_get() API was pretty broken and nobody noticed.

   - There were some CPUfreq crashes on C.H.I.P devices because we
     failed to propagate rates up the clk tree.

   - Also, the Intel Atom PMC clk driver needs to mark a clk critical if
     the firmware has it enabled already so that audio doesn't get
     killed on Baytrail.

   - Gemini devices have a dead serial console because the reset control
     usage in the serial driver assume one method of reset that gemini
     doesn't support (this will be fixed in the next version in the
     reset framework so this is the small fix for -rc series).

   - Finally we have two rate calculation fixes, one for Exynos and one
     for Meson SoCs, that fix rate inconsistencies"

* tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux:
  clk: keystone: sci-clk: Fix sci_clk_get
  clk: meson: mpll: fix mpll0 fractional part ignored
  clk: samsung: exynos5420: The EPLL rate table corrections
  clk: sunxi-ng: sun5i: Add clk_set_rate_parent to the CPU clock
  clk: x86: Do not gate clocks enabled by the firmware
  clk: gemini: Fix reset regression

6 years agobpf: fix byte order test in test_verifier
Daniel Borkmann [Fri, 4 Aug 2017 20:24:41 +0000 (22:24 +0200)]
bpf: fix byte order test in test_verifier

We really must check with #if __BYTE_ORDER == XYZ instead of
just presence of #ifdef __LITTLE_ENDIAN. I noticed that when
actually running this on big endian machine, the latter test
resolves to true for user space, same for #ifdef __BIG_ENDIAN.

E.g., looking at endian.h from libc, both are also defined
there, so we really must test this against __BYTE_ORDER instead
for proper insns selection. For the kernel, such checks are
fine though e.g. see 13da9e200fe4 ("Revert "endian: #define
__BYTE_ORDER"") and 415586c9e6d3 ("UAPI: fix endianness conditionals
in M32R's asm/stat.h") for some more context, but not for
user space. Lets also make sure to properly include endian.h.
After that, suite passes for me:

./test_verifier: ELF 64-bit MSB executable, [...]

Linux foo 4.13.0-rc3+ #4 SMP Fri Aug 4 06:59:30 EDT 2017 s390x s390x s390x GNU/Linux

Before fix: Summary: 505 PASSED, 11 FAILED
After  fix: Summary: 516 PASSED,  0 FAILED

Fixes: 18f3d6be6be1 ("selftests/bpf: Add test cases to test narrower ctx field loads")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Yonghong <yhs@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoMerge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Linus Torvalds [Fri, 4 Aug 2017 22:18:27 +0000 (15:18 -0700)]
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM fixes from Radim Krčmář:
 "ARM:

   - Yet another race with VM destruction plugged

   - A set of small vgic fixes

  x86:

   - Preserve pending INIT

   - RCU fixes in paravirtual async pf, VM teardown, and VMXOFF
     emulation

   - nVMX interrupt injection and dirty tracking fixes

   - initialize to make UBSAN happy"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: arm/arm64: vgic: Use READ_ONCE fo cmpxchg
  KVM: nVMX: Fix interrupt window request with "Acknowledge interrupt on exit"
  KVM: nVMX: mark vmcs12 pages dirty on L2 exit
  kvm: nVMX: don't flush VMCS12 during VMXOFF or VCPU teardown
  KVM: nVMX: do not pin the VMCS12
  KVM: avoid using rcu_dereference_protected
  KVM: X86: init irq->level in kvm_pv_kick_cpu_op
  KVM: X86: Fix loss of pending INIT due to race
  KVM: async_pf: make rcu irq exit if not triggered from idle task
  KVM: nVMX: fixes to nested virt interrupt injection
  KVM: nVMX: do not fill vm_exit_intr_error_code in prepare_vmcs12
  KVM: arm/arm64: Handle hva aging while destroying the vm
  KVM: arm/arm64: PMU: Fix overflow interrupt injection
  KVM: arm/arm64: Fix bug in advertising KVM_CAP_MSI_DEVID capability

6 years agoMerge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Fri, 4 Aug 2017 22:16:09 +0000 (15:16 -0700)]
Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 fix from Thomas Gleixner:
 "The recent irq core changes unearthed API abuse in the HPET code,
  which manifested itself in a suspend/resume regression.

  The fix replaces the cruft with the proper function calls and cures
  the regression"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/hpet: Cure interface abuse in the resume path

6 years agoMerge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Fri, 4 Aug 2017 22:14:09 +0000 (15:14 -0700)]
Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull timer fix from Thomas Gleixner:
 "A single fix for a multiplication overflow in the timer code on 32bit
  systems"

* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  timers: Fix overflow in get_next_timer_interrupt

6 years agoMerge tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc
Linus Torvalds [Fri, 4 Aug 2017 22:12:15 +0000 (15:12 -0700)]
Merge tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc

Pull ARM SoC fixes from Arnd Bergmann:
 "This comes a bit later than I planned, and as a consequence is a
  larger than it should be.

  Most of the changes are devicetree fixes, across lots of platforms:
  Renesas, Samsung Exynos, Marvell EBU, TI OMAP, Rockchips, Amlogic
  Meson, Sigma Desings Tango, Allwinner SUNxi and TI Davinci.

  Also across many platforms, I applied an older series of simple
  randconfig build fixes. This includes making the CONFIG_MTD_XIP option
  compile again, which had been broken for many years and probably has
  not been missed, but it felt wrong to just remove it completely.

  The only other changes are:

   - We enable HWSPINLOCK in defconfig to get some Qualcomm boards to
     work out of the box.

   - A few regression fixes for Texas Instruments OMAP2+.

   - A boot regression fix for the Renesas regulator quirk.

   - A suspend/resume fix for Uniphier SoCs, fixing the resume of the
     system bus"

* tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (43 commits)
  ARM: dts: tango4: Request RGMII RX and TX clock delays
  bus: uniphier-system-bus: set up registers when resuming
  ARM64: dts: marvell: armada-37xx: Fix the number of GPIO on south bridge
  ARM: shmobile: rcar-gen2: Fix deadlock in regulator quirk
  arm64: defconfig: enable missing HWSPINLOCK
  ARM: pxa: select both FB and FB_W100 for eseries
  ARM: ixp4xx: fix ioport_unmap definition
  ARM: ep93xx: use ARM_PATCH_PHYS_VIRT correctly
  ARM: mmp: mark usb_dma_mask as __maybe_unused
  ARM: omap2: mark unused functions as __maybe_unused
  ARM: omap1: avoid unused variable warning
  ARM: sirf: mark sirfsoc_init_late as __maybe_unused
  ARM: ixp4xx: use normal prototype for {read,write}s{b,w,l}
  ARM: omap1/ams-delta: warn about failed regulator enable
  ARM: rpc: rename RAM_SIZE macro
  ARM: w90x900: normalize clk API
  ARM: ep93xx: normalize clk API
  ARM: dts: sun8i: a83t: Switch to CCU device tree binding macros
  arm64: allwinner: sun50i-a64: Correct emac register size
  ARM: dts: sunxi: h3/h5: Correct emac register size
  ...

6 years agoxfs: Fix per-inode DAX flag inheritance
Lukas Czerner [Thu, 3 Aug 2017 20:19:13 +0000 (13:19 -0700)]
xfs: Fix per-inode DAX flag inheritance

According to the commit that implemented per-inode DAX flag:
commit 58f88ca2df72 ("xfs: introduce per-inode DAX enablement")
the flag is supposed to act as "inherit flag".

Currently this only works in the situations where parent directory
already has a flag in di_flags set, otherwise inheritance does not
work. This is because setting the XFS_DIFLAG2_DAX flag is done in a
wrong branch designated for di_flags, not di_flags2.

Fix this by moving the code to branch designated for setting di_flags2,
which does test for flags in di_flags2.

Fixes: 58f88ca2df72 ("xfs: introduce per-inode DAX enablement")
Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
6 years agoxfs: Fix leak of discard bio
Jan Kara [Wed, 2 Aug 2017 19:37:16 +0000 (12:37 -0700)]
xfs: Fix leak of discard bio

The bio describing discard operation is allocated by
__blkdev_issue_discard() which returns us a reference to it. That
reference is never released and thus we leak this bio. Drop the bio
reference once it completes in xlog_discard_endio().

CC: stable@vger.kernel.org
Fixes: 4560e78f40cb55bd2ea8f1ef4001c5baa88531c7
Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
6 years agoMerge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Linus Torvalds [Fri, 4 Aug 2017 19:11:48 +0000 (12:11 -0700)]
Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux

Pull arm64 fixes from Will Deacon:
 "Here are some more arm64 fixes for 4.13. The main one is the PTE race
  with the hardware walker, but there are a couple of other things too.

   - Report correct timer frequency to userspace when trapping
     CNTFRQ_EL0

   - Fix race with hardware page table updates when updating access
     flags

   - Silence clang overflow warning in VA_START and PAGE_OFFSET
     calculations"

* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
  arm64: avoid overflow in VA_START and PAGE_OFFSET
  arm64: Fix potential race with hardware DBM in ptep_set_access_flags()
  arm64: Use arch_timer_get_rate when trapping CNTFRQ_EL0

6 years agoIB/hns: checking for IS_ERR() instead of NULL
Dan Carpenter [Fri, 4 Aug 2017 08:12:08 +0000 (11:12 +0300)]
IB/hns: checking for IS_ERR() instead of NULL

The hns_roce_v1_create_lp_qp() returns NULL on error, not error pointers.

Fixes: bfcc681bd09d ("IB/hns: Fix the bug when free mr")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
6 years agoxgene: Always get clk source, but ignore if it's missing for SGMII ports
Thomas Bogendoerfer [Thu, 3 Aug 2017 13:43:14 +0000 (15:43 +0200)]
xgene: Always get clk source, but ignore if it's missing for SGMII ports

Even the driver doesn't do anything with the clk source for SGMII
ports it needs to be enabled by doing a devm_clk_get(), if there is
a clk source in DT.

Fixes: 0db01097cabd ('xgene: Don't fail probe, if there is no clk resource for SGMII interfaces')
Signed-off-by: Thomas Bogendoerfer <tbogendoerfer@suse.de>
Tested-by: Laura Abbott <labbott@redhat.com>
Acked-by: Iyappan Subramanian <isubramanian@apm.com>
Tested-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoRDMA/mlx5: Fix existence check for extended address vector
Leon Romanovsky [Tue, 1 Aug 2017 06:41:37 +0000 (09:41 +0300)]
RDMA/mlx5: Fix existence check for extended address vector

The extended address vector is the highest bit in be32 variable,
but it was compared with the lowest. This patch fixes the endianness
of that check and removes already declared define.

Fixes: 17d2f88f92ce ("IB/mlx5: Add ODP atomics support")
Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
6 years agoIB/uverbs: Fix device cleanup
Yishai Hadas [Tue, 1 Aug 2017 06:41:36 +0000 (09:41 +0300)]
IB/uverbs: Fix device cleanup

Uverbs device should be cleaned up only when there is no
potential usage of.

As part of ib_uverbs_remove_one which might be triggered upon reset flow
the device reference count is decreased as expected and leave the final
cleanup to the FDs that were opened.

Current code increases reference count upon opening a new command FD and
decreases it upon closing the file. The event FD is opened internally
and rely on the command FD by taking on it a reference count.

In case that the command FD was closed and just later the event FD we
may ensure that the device resources as of srcu are still alive as they
are still in use.

Fixing the above by moving the reference count decreasing to the place
where the command FD is really freed instead of doing that when it was
just closed.

fixes: 036b10635739 ("IB/uverbs: Enable device removal when there are active user space applications")
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Reviewed-by: Matan Barak <matanb@mellanox.com>
Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Tested-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
6 years agoRDMA/uverbs: Prevent leak of reserved field
Leon Romanovsky [Tue, 1 Aug 2017 06:41:35 +0000 (09:41 +0300)]
RDMA/uverbs: Prevent leak of reserved field

initialize to zero the response structure to prevent
the leakage of "resp.reserved" field.

drivers/infiniband/core/uverbs_cmd.c:1178 ib_uverbs_resize_cq() warn:
check that 'resp.reserved' doesn't leak information

Fixes: 33b9b3ee9709 ("IB: Add userspace support for resizing CQs")
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
6 years agoIB/core: Fix race condition in resolving IP to MAC
Parav Pandit [Tue, 1 Aug 2017 06:41:34 +0000 (09:41 +0300)]
IB/core: Fix race condition in resolving IP to MAC

Currently while resolving IP address to MAC address single delayed work
is used for resolving multiple such resolve requests. This singled work
is essentially performs two tasks.
(a) any retry needed to resolve and
(b) it executes the callback function for all completed requests

While work is executing callbacks, any new work scheduled on for this
workqueue is lost because workqueue has completed looking at all pending
requests and now looking at callbacks, but work is still under
execution. Any further retry to look at pending requests in
process_req() after executing callbacks would lead to similar race
condition (may be reduce the probably further but doesn't eliminate it).
Retrying to enqueue work that from queue_req() context is not something
rest of the kernel modules have followed.

Therefore fix in this patch utilizes kernel facility to enqueue multiple
work items to a workqueue. This ensures that no such requests
gets lost in synchronization. Request list is still maintained so that
rdma_cancel_addr() can unlink the request and get the completion with
error sooner. Neighbour update event handling continues to be handled in
same way as before.
Additionally process_req() work entry cancels any pending work for a
request that gets completed while processing those requests.

Originally ib_addr was ST workqueue, but it became MT work queue with
patch of [1]. This patch again makes it similar to ST so that
neighbour update events handler work item doesn't race with
other work items.

In one such below trace, (though on 4.5 based kernel) it can be seen
that process_req() never executed the callback, which is likely for an
event that was schedule by queue_req() when previous callback was
getting executed by workqueue.

 [<ffffffff816b0dde>] schedule+0x3e/0x90
 [<ffffffff816b3c45>] schedule_timeout+0x1b5/0x210
 [<ffffffff81618c37>] ? ip_route_output_flow+0x27/0x70
 [<ffffffffa027f9c9>] ? addr_resolve+0x149/0x1b0 [ib_addr]
 [<ffffffff816b228f>] wait_for_completion+0x10f/0x170
 [<ffffffff810b6140>] ? try_to_wake_up+0x210/0x210
 [<ffffffffa027f220>] ? rdma_copy_addr+0xa0/0xa0 [ib_addr]
 [<ffffffffa0280120>] rdma_addr_find_l2_eth_by_grh+0x1d0/0x278 [ib_addr]
 [<ffffffff81321297>] ? sub_alloc+0x77/0x1c0
 [<ffffffffa02943b7>] ib_init_ah_from_wc+0x3a7/0x5a0 [ib_core]
 [<ffffffffa0457aba>] cm_req_handler+0xea/0x580 [ib_cm]
 [<ffffffff81015982>] ? __switch_to+0x212/0x5e0
 [<ffffffffa04582fd>] cm_work_handler+0x6d/0x150 [ib_cm]
 [<ffffffff810a14c1>] process_one_work+0x151/0x4b0
 [<ffffffff810a1940>] worker_thread+0x120/0x480
 [<ffffffff816b074b>] ? __schedule+0x30b/0x890
 [<ffffffff810a1820>] ? process_one_work+0x4b0/0x4b0
 [<ffffffff810a1820>] ? process_one_work+0x4b0/0x4b0
 [<ffffffff810a6b1e>] kthread+0xce/0xf0
 [<ffffffff810a6a50>] ? kthread_freezable_should_stop+0x70/0x70
 [<ffffffff816b53a2>] ret_from_fork+0x42/0x70
 [<ffffffff810a6a50>] ? kthread_freezable_should_stop+0x70/0x70
INFO: task kworker/u144:1:156520 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this
message.
kworker/u144:1  D ffff883ffe1d7600     0 156520      2 0x00000080
Workqueue: ib_addr process_req [ib_addr]
 ffff883f446fbbd8 0000000000000046 ffff881f95280000 ffff881ff24de200
 ffff883f66120000 ffff883f446f8008 ffff881f95280000 ffff883f6f9208c4
 ffff883f6f9208c8 00000000ffffffff ffff883f446fbbf8 ffffffff816b0dde

[1] http://lkml.iu.edu/hypermail/linux/kernel/1608.1/05834.html

Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
6 years agoMIPS: Add missing file for eBPF JIT.
David Daney [Fri, 4 Aug 2017 00:10:12 +0000 (17:10 -0700)]
MIPS: Add missing file for eBPF JIT.

Inexplicably, commit f381bf6d82f0 ("MIPS: Add support for eBPF JIT.")
lost a file somewhere on its path to Linus' tree.  Add back the
missing ebpf_jit.c so that we can build with CONFIG_BPF_JIT selected.

This version of ebpf_jit.c is identical to the original except for two
minor change need to resolve conflicts with changes merged from the
BPF branch:

A) Set prog->jited_len = image_size;
B) Use BPF_TAIL_CALL instead of BPF_CALL | BPF_X

Fixes: f381bf6d82f0 ("MIPS: Add support for eBPF JIT.")
Signed-off-by: David Daney <david.daney@cavium.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoMerge branch 's390-bpf-jit-fixes'
David S. Miller [Fri, 4 Aug 2017 18:18:01 +0000 (11:18 -0700)]
Merge branch 's390-bpf-jit-fixes'

Daniel Borkmann says:

====================
Two BPF fixes for s390

Found while testing some other work touching JITs.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agobpf, s390: fix build for libbpf and selftest suite
Daniel Borkmann [Fri, 4 Aug 2017 12:20:55 +0000 (14:20 +0200)]
bpf, s390: fix build for libbpf and selftest suite

The BPF feature test as well as libbpf is missing the __NR_bpf
define for s390 and currently refuses to compile (selftest suite
depends on libbpf as well). Similar issue was fixed some time
ago via b0c47807d31d ("bpf: Add sparc support to tools and
samples."), just do the same and add definitions.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agobpf, s390: fix jit branch offset related to ldimm64
Daniel Borkmann [Fri, 4 Aug 2017 12:20:54 +0000 (14:20 +0200)]
bpf, s390: fix jit branch offset related to ldimm64

While testing some other work that required JIT modifications, I
run into test_bpf causing a hang when JIT enabled on s390. The
problematic test case was the one from ddc665a4bb4b (bpf, arm64:
fix jit branch offset related to ldimm64), and turns out that we
do have a similar issue on s390 as well. In bpf_jit_prog() we
update next instruction address after returning from bpf_jit_insn()
with an insn_count. bpf_jit_insn() returns either -1 in case of
error (e.g. unsupported insn), 1 or 2. The latter is only the
case for ldimm64 due to spanning 2 insns, however, next address
is only set to i + 1 not taking actual insn_count into account,
thus fix is to use insn_count instead of 1. bpf_jit_enable in
mode 2 provides also disasm on s390:

Before fix:

  000003ff800349b6a7f40003   brc     15,3ff800349bc                 ; target
  000003ff800349ba: 0000               unknown
  000003ff800349bce3b0f0700024       stg     %r11,112(%r15)
  000003ff800349c2e3e0f0880024       stg     %r14,136(%r15)
  000003ff800349c8: 0db0               basr    %r11,%r0
  000003ff800349cac0ef00000000       llilf   %r14,0
  000003ff800349d0e320b0360004       lg      %r2,54(%r11)
  000003ff800349d6e330b03e0004       lg      %r3,62(%r11)
  000003ff800349dcec23ffeda065       clgrj   %r2,%r3,10,3ff800349b6 ; jmp
  000003ff800349e2e3e0b0460004       lg      %r14,70(%r11)
  000003ff800349e8e3e0b04e0004       lg      %r14,78(%r11)
  000003ff800349eeb904002e   lgr     %r2,%r14
  000003ff800349f2e3b0f0700004       lg      %r11,112(%r15)
  000003ff800349f8e3e0f0880004       lg      %r14,136(%r15)
  000003ff800349fe: 07fe               bcr     15,%r14

After fix:

  000003ff80ef3db4a7f40003   brc     15,3ff80ef3dba
  000003ff80ef3db8: 0000               unknown
  000003ff80ef3dbae3b0f0700024       stg     %r11,112(%r15)
  000003ff80ef3dc0e3e0f0880024       stg     %r14,136(%r15)
  000003ff80ef3dc6: 0db0               basr    %r11,%r0
  000003ff80ef3dc8c0ef00000000       llilf   %r14,0
  000003ff80ef3dcee320b0360004       lg      %r2,54(%r11)
  000003ff80ef3dd4e330b03e0004       lg      %r3,62(%r11)
  000003ff80ef3ddaec230006a065       clgrj   %r2,%r3,10,3ff80ef3de6 ; jmp
  000003ff80ef3de0e3e0b0460004       lg      %r14,70(%r11)
  000003ff80ef3de6e3e0b04e0004       lg      %r14,78(%r11)          ; target
  000003ff80ef3decb904002e   lgr     %r2,%r14
  000003ff80ef3df0e3b0f0700004       lg      %r11,112(%r15)
  000003ff80ef3df6e3e0f0880004       lg      %r14,136(%r15)
  000003ff80ef3dfc: 07fe               bcr     15,%r14

test_bpf.ko suite runs fine after the fix.

Fixes: 054623105728 ("s390/bpf: Add s390x eBPF JIT compiler backend")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoMerge branch 'mlxsw-Couple-of-fixes'
David S. Miller [Fri, 4 Aug 2017 18:15:08 +0000 (11:15 -0700)]
Merge branch 'mlxsw-Couple-of-fixes'

Jiri Pirko says:

====================
mlxsw: Couple of fixes

Ido says:

The first patch prevents us from warning about valid situations that can
happen due to the fact that some operations in switchdev are deferred.

Second patch fixes a long standing problem in which we didn't correctly
free resources upon module removal, resulting in a memory leak.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agomlxsw: spectrum_switchdev: Release multicast groups during fini
Ido Schimmel [Fri, 4 Aug 2017 12:12:30 +0000 (14:12 +0200)]
mlxsw: spectrum_switchdev: Release multicast groups during fini

Each multicast group (MID) stores a bitmap of ports to which a packet
should be forwarded to in case an MDB entry associated with the MID is
hit.

Since the initial introduction of IGMP snooping in commit 3a49b4fde2a1
("mlxsw: Adding layer 2 multicast support") the driver didn't correctly
free these multicast groups upon ungraceful situations such as the
removal of the upper bridge device or module removal.

The correct way to fix this is to associate each MID with the bridge
ports member in it and then drop the reference in case the bridge port
is destroyed, but this will result in a lot more code and will be fixed
in net-next.

For now, upon module removal, traverse the MID list and release each
one.

Fixes: 3a49b4fde2a1 ("mlxsw: Adding layer 2 multicast support")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agomlxsw: spectrum_switchdev: Don't warn about valid situations
Ido Schimmel [Fri, 4 Aug 2017 12:12:29 +0000 (14:12 +0200)]
mlxsw: spectrum_switchdev: Don't warn about valid situations

Some operations in the bridge driver such as MDB deletion are preformed
in an atomic context and thus deferred to a process context by the
switchdev infrastructure.

Therefore, by the time the operation is performed by the underlying
device driver it's possible the bridge port context is already gone.
This is especially true for removal flows, but theoretically can also be
invoked during addition.

Remove the warnings in such situations and return normally.

Fixes: c57529e1d5d8 ("mlxsw: spectrum: Replace vPorts with Port-VLAN")
Fixes: 3922285d96e7 ("net: bridge: Add support for offloading port attributes")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agosparc64: Increase max_phys_bits to 51 and VA bits to 53 for M8.
Vijay Kumar [Sat, 29 Jul 2017 01:29:32 +0000 (19:29 -0600)]
sparc64: Increase max_phys_bits to 51 and VA bits to 53 for M8.

On M8 chips, use a max_phys_bits value of 51.

Also, M8 supports VA bits up to 54 bits. However, for now
restrict VA bits to 53 due to 4-level pagetable limitation.

Signed-off-by: Vijay Kumar <vijay.ac.kumar@oracle.com>
Reviewed-by: Bob Picco <bob.picco@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agosparc64: recognize and support sparc M8 cpu type
Allen Pais [Mon, 24 Jul 2017 06:14:18 +0000 (11:44 +0530)]
sparc64: recognize and support sparc M8 cpu type

Recognize SPARC-M8 cpu type, hardware caps and cpu
distribution map.

Signed-off-by: Allen Pais <allen.pais@oracle.com>
Signed-off-by: David Aldridge <david.j.aldridge@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agosparc64: properly name the cpu constants
Allen Pais [Mon, 24 Jul 2017 06:14:17 +0000 (11:44 +0530)]
sparc64: properly name the cpu constants

Acked-by: Sam Ravnborg <sam@ravnborg.org>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Allen Pais <allen.pais@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc
Linus Torvalds [Fri, 4 Aug 2017 17:17:45 +0000 (10:17 -0700)]
Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc

Pull sparc fixes from David Miller:

 - block interrupts properly across the entire MMU context change (both
   the hw MMU context change and the TSB table change) so that we don't
   get a perf event interrupt in the middle. From Rob Gardner.

 - be sure to register hugepages early enough, from Nitin Gupta.

 - UltraSPARC-III user copy exception handling would return garbage for
   the copied length in some circumstances.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc:
  sparc64: Fix exception handling in UltraSPARC-III memcpy.
  sbus: Convert to using %pOF instead of full_name
  sparc: defconfig: Cleanup from old Kconfig options
  sparc64: Register hugepages during arch init
  sparc64: Prevent perf from running during super critical sections

6 years agoMerge tag 'ceph-for-4.13-rc4' of git://github.com/ceph/ceph-client
Linus Torvalds [Fri, 4 Aug 2017 17:15:11 +0000 (10:15 -0700)]
Merge tag 'ceph-for-4.13-rc4' of git://github.com/ceph/ceph-client

Pull ceph fixes from Ilya Dryomov:
 "A bunch of fixes and follow-ups for -rc1 Luminous patches: issues with
  ->reencode_message() and last minute RADOS semantic changes in
  v12.1.2"

* tag 'ceph-for-4.13-rc4' of git://github.com/ceph/ceph-client:
  libceph: make RECOVERY_DELETES feature create a new interval
  libceph: upmap semantic changes
  crush: assume weight_set != null imples weight_set_size > 0
  libceph: fallback for when there isn't a pool-specific choose_arg
  libceph: don't call ->reencode_message() more than once per message
  libceph: make encode_request_*() work with r_mempool requests

6 years agoMerge tag 'sound-4.13-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai...
Linus Torvalds [Fri, 4 Aug 2017 17:11:13 +0000 (10:11 -0700)]
Merge tag 'sound-4.13-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound

Pull sound fixes from Takashi Iwai:
 "Now we hit the usual ASoC-fix-flood in the middle of release.

  Most of the changes are trivial and device-specific, while one
  significant change is the fix for unbalanced of_graph_*() refcounts.
  This involved a change in the graph API itself that had been a bit
  messy"

* tag 'sound-4.13-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
  ALSA: hda - Fix speaker output from VAIO VPCL14M1R
  device property: Fix usecount for of_graph_get_port_parent()
  ASoC: rt5665: fix wrong register for bclk ratio control
  ASoC: Intel: Use MCLK instead of BLCK as the sysclock for RT5514 codec on kabylake platform
  ASoC: Intel: Enabling ASRC for RT5663 codec on kabylake platform
  ASoC: codecs: msm8916-analog: fix DIG_CLK_CTL_RXD3_CLK_EN define
  ASoC: Intel: Skylake: Fix missing sentinels in sst_acpi_mach
  ASoC: sh: hac: add missing "int ret"
  ASoC: samsung: odroid: Fix EPLL frequency values
  ASoC: sgtl5000: Use snd_soc_kcontrol_codec()
  ASoC: rt5665: fix GPIO6 pin function define
  ASoC: ux500: Restore platform DAI assignments
  ASoC: fix pcm-creation regression
  ASoC: do not close shared backend dailink
  ASoC: pxa: SND_PXA2XX_SOC should depend on HAS_DMA
  ASoC: Intel: Skylake: Fix default dma_buffer_size
  ASoC: rt5663: Update the HW default values based on the shipping version
  ASoC: imx-ssi: add check on platform_get_irq return value

6 years agoMerge tag 'iommu-fixes-v4.13-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Fri, 4 Aug 2017 17:05:29 +0000 (10:05 -0700)]
Merge tag 'iommu-fixes-v4.13-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu

Pull IOMMU fixes from Joerg Roedel:

 - fix a scheduling-while-atomic bug in the AMD IOMMU driver. It was
   found after the checker was enabled earlier.

 - a fix for the virtual APIC code in the AMD IOMMU driver which
   delivers device interrupts directly into KVM guests for assigned
   devices.

 - fixes for the recently merged lock-less page-table code for ARM. The
   redundant TLB syncs got reverted and locks added again around the TLB
   sync code.

 - fix for error handling in arm_smmu_add_device()

 - address sanitization fix for arm io-pgtable code

* tag 'iommu-fixes-v4.13-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
  iommu/amd: Fix schedule-while-atomic BUG in initialization code
  iommu/amd: Enable ga_log_intr when enabling guest_mode
  iommu/io-pgtable: Sanitise map/unmap addresses
  iommu/arm-smmu: Fix the error path in arm_smmu_add_device
  Revert "iommu/io-pgtable: Avoid redundant TLB syncs"
  iommu/mtk: Avoid redundant TLB syncs locally
  iommu/arm-smmu: Reintroduce locking around TLB sync operations

6 years agoMerge tag 'mmc-v4.13-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc
Linus Torvalds [Fri, 4 Aug 2017 17:02:56 +0000 (10:02 -0700)]
Merge tag 'mmc-v4.13-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc

Pull MMC fixes from Ulf Hansson:
 "A couple of mmc fixes intended for v4.13-rc4.

  MMC core:
   - Fix NULL pointer dereference for block I/O during hotplug

  MMC host:
   - sdhci-of-at91: Fix card detect for non-removable cards"

* tag 'mmc-v4.13-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc:
  mmc: block: bypass the queue even if usage is present for hotplug
  mmc: sdhci-of-at91: force card detect value for non removable devices

6 years agoMerge tag 'drm-fixes-for-v4.13-rc4' of git://people.freedesktop.org/~airlied/linux
Linus Torvalds [Fri, 4 Aug 2017 16:59:24 +0000 (09:59 -0700)]
Merge tag 'drm-fixes-for-v4.13-rc4' of git://people.freedesktop.org/~airlied/linux

Pull drm fixes from Dave Airlie:
 "Either my email ate everything or everyone is on holidays, either way
  all I can find is some lonely AMD fixes"

[ Europe might be on vacation, and the Pacific NW is too hot for work. ]

* tag 'drm-fixes-for-v4.13-rc4' of git://people.freedesktop.org/~airlied/linux:
  drm/amdgpu: Use list_del_init in amdgpu_mn_unregister
  drm/amdgpu: Fix undue fallthroughs in golden registers initialization
  drm/amdgpu: fix header on gfx9 clear state

6 years agoMerge tag 'powerpc-4.13-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc...
Linus Torvalds [Fri, 4 Aug 2017 16:56:54 +0000 (09:56 -0700)]
Merge tag 'powerpc-4.13-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc fixes from Michael Ellerman:
 "Fixes for recently merged code:
   - a fix for the _PAGE_DEVMAP support, which was breaking KVM on
     Power9 radix
   - avoid a (harmless) lockdep warning in the early SMP code
   - return failure for some uses of dma_set_mask() rather than falling
     back to 32-bits
   - fix stack setup in watchdog soft_nmi_common() to use emergency
     stack
   - fix of_irq_to_resource() error check in of_fsl_spi_probe()

  Two fixes going to stable:
   - fix saving of Transactional Memory SPRs in core dump
   - fix __check_irq_replay missing decrementer interrupt

  And two misc:
   - fix 64-bit boot wrapper build with non-biarch compiler
   - work around a POWER9 PMU hang after state-loss idle

  Thanks to: Alistair Popple, Aneesh Kumar K.V, Cyril Bur, Gustavo
  Romero, Jose Ricardo Ziviani, Laurent Vivier, Nicholas Piggin, Oliver
  O'Halloran, Sergei Shtylyov, Suraj Jitindar Singh, Thomas Gleixner"

* tag 'powerpc-4.13-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc/64: Fix __check_irq_replay missing decrementer interrupt
  powerpc/perf: POWER9 PMU stops after idle workaround
  powerpc/83xx/mpc832x_rdb: fix of_irq_to_resource() error check
  powerpc/64s: Fix stack setup in watchdog soft_nmi_common()
  powerpc/powernv/pci: Return failure for some uses of dma_set_mask()
  powerpc/boot: Fix 64-bit boot wrapper build with non-biarch compiler
  powerpc/smp: Call smp_ops->setup_cpu() directly on the boot CPU
  powerpc/tm: Fix saving of TM SPRs in core dump
  powerpc/mm: Fix pmd/pte_devmap() on non-leaf entries

6 years agosparc64: Fix exception handling in UltraSPARC-III memcpy.
David S. Miller [Fri, 4 Aug 2017 16:47:52 +0000 (09:47 -0700)]
sparc64: Fix exception handling in UltraSPARC-III memcpy.

Mikael Pettersson reported that some test programs in the strace-4.18
testsuite cause an OOPS.

After some debugging it turns out that garbage values are returned
when an exception occurs, causing the fixup memset() to be run with
bogus arguments.

The problem is that two of the exception handler stubs write the
successfully copied length into the wrong register.

Fixes: ee841d0aff64 ("sparc64: Convert U3copy_{from,to}_user to accurate exception reporting.")
Reported-by: Mikael Pettersson <mikpelinux@gmail.com>
Tested-by: Mikael Pettersson <mikpelinux@gmail.com>
Reviewed-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoarm64: avoid overflow in VA_START and PAGE_OFFSET
Nick Desaulniers [Thu, 3 Aug 2017 18:03:58 +0000 (11:03 -0700)]
arm64: avoid overflow in VA_START and PAGE_OFFSET

The bitmask used to define these values produces overflow, as seen by
this compiler warning:

arch/arm64/kernel/head.S:47:8: warning:
      integer overflow in preprocessor expression
  #elif (PAGE_OFFSET & 0x1fffff) != 0
         ^~~~~~~~~~~
arch/arm64/include/asm/memory.h:52:46: note:
      expanded from macro 'PAGE_OFFSET'
  #define PAGE_OFFSET             (UL(0xffffffffffffffff) << (VA_BITS -
1))
                                      ~~~~~~~~~~~~~~~~~~  ^

It would be preferrable to use GENMASK_ULL() instead, but it's not set
up to be used from assembly (the UL() macro token pastes UL suffixes
when not included in assembly sources).

Suggested-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Suggested-by: Yury Norov <ynorov@caviumnetworks.com>
Suggested-by: Matthias Kaehlcke <mka@chromium.org>
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
6 years agoarm64: Fix potential race with hardware DBM in ptep_set_access_flags()
Catalin Marinas [Tue, 25 Jul 2017 13:53:03 +0000 (14:53 +0100)]
arm64: Fix potential race with hardware DBM in ptep_set_access_flags()

In a system with DBM (dirty bit management) capable agents there is a
possible race between a CPU executing ptep_set_access_flags() (maybe
non-DBM capable) and a hardware update of the dirty state (clearing of
PTE_RDONLY). The scenario:

a) the pte is writable (PTE_WRITE set), clean (PTE_RDONLY set) and old
   (PTE_AF clear)
b) ptep_set_access_flags() is called as a result of a read access and it
   needs to set the pte to writable, clean and young (PTE_AF set)
c) a DBM-capable agent, as a result of a different write access, is
   marking the entry as young (setting PTE_AF) and dirty (clearing
   PTE_RDONLY)

The current ptep_set_access_flags() implementation would set the
PTE_RDONLY bit in the resulting value overriding the DBM update and
losing the dirty state.

This patch fixes such race by setting PTE_RDONLY to the most permissive
(lowest value) of the current entry and the new one.

Fixes: 66dbd6e61a52 ("arm64: Implement ptep_set_access_flags() for hardware AF/DBM")
Cc: Will Deacon <will.deacon@arm.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Steve Capper <steve.capper@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
6 years agoMerge tag 'davinci-fixes-for-v4.13' of git://git.kernel.org/pub/scm/linux/kernel...
Arnd Bergmann [Fri, 4 Aug 2017 11:22:33 +0000 (13:22 +0200)]
Merge tag 'davinci-fixes-for-v4.13' of git://git.kernel.org/pub/scm/linux/kernel/git/nsekhar/linux-davinci into fixes

Pull "DaVinci fixes for v4.13" from Sekhar Nori:

Drop unused VPIF endpoints from device-tree.
They should be used only when an actual
remote-endpoint is connected.

* tag 'davinci-fixes-for-v4.13' of git://git.kernel.org/pub/scm/linux/kernel/git/nsekhar/linux-davinci:
  ARM: dts: da850-lcdk: drop unused VPIF endpoints
  ARM: dts: da850-evm: drop unused VPIF endpoints

6 years agoMerge tag 'sunxi-fixes-for-4.13' of https://git.kernel.org/pub/scm/linux/kernel/git...
Arnd Bergmann [Fri, 4 Aug 2017 11:04:42 +0000 (13:04 +0200)]
Merge tag 'sunxi-fixes-for-4.13' of https://git.kernel.org/pub/scm/linux/kernel/git/sunxi/linux into fixes

Pull "Allwinner fixes for 4.13" from Chen-Yu Tsai:

Two fixes to correct the EMAC blocks memory region size to match the
datasheet. One that converts raw A83T clock indices to macros from the
clk dt-binding header, completing the A83T sunxi-ng clk driver.

* tag 'sunxi-fixes-for-4.13' of https://git.kernel.org/pub/scm/linux/kernel/git/sunxi/linux:
  ARM: dts: sun8i: a83t: Switch to CCU device tree binding macros
  arm64: allwinner: sun50i-a64: Correct emac register size
  ARM: dts: sunxi: h3/h5: Correct emac register size

6 years agoMerge tag 'qcom-arm64-defconfig-fixes-for-4.13-rc2' of git://git.kernel.org/pub/scm...
Arnd Bergmann [Fri, 4 Aug 2017 11:03:24 +0000 (13:03 +0200)]
Merge tag 'qcom-arm64-defconfig-fixes-for-4.13-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/agross/linux into fixes

Pull "Qualcomm ARM64 based defconfig Fixes for v4.13-rc2" from Andy Gross:

* Enable missing HWSPINLOCK

* tag 'qcom-arm64-defconfig-fixes-for-4.13-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/agross/linux:
  arm64: defconfig: enable missing HWSPINLOCK

6 years agoARM: dts: tango4: Request RGMII RX and TX clock delays
Marc Gonzalez [Fri, 28 Jul 2017 13:27:49 +0000 (15:27 +0200)]
ARM: dts: tango4: Request RGMII RX and TX clock delays

RX and TX clock delays are required. Request them explicitly.

Fixes: cad008b8a77e6 ("ARM: dts: tango4: Initial device trees")
Cc: stable@vger.kernel.org
Signed-off-by: Marc Gonzalez <marc_gonzalez@sigmadesigns.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
6 years agobus: uniphier-system-bus: set up registers when resuming
Masahiro Yamada [Mon, 31 Jul 2017 05:49:25 +0000 (14:49 +0900)]
bus: uniphier-system-bus: set up registers when resuming

When resuming, set up registers that have been lost in the sleep state.

Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
6 years agoMerge tag 'renesas-fixes3-for-v4.13' of https://git.kernel.org/pub/scm/linux/kernel...
Arnd Bergmann [Fri, 4 Aug 2017 10:54:41 +0000 (12:54 +0200)]
Merge tag 'renesas-fixes3-for-v4.13' of https://git.kernel.org/pub/scm/linux/kernel/git/horms/renesas into fixes

Pull "Third Round of Renesas ARM Based SoC Fixes for v4.13" from Simon Horman:

Fix deadlock in regulator quirk for R-Car Gen 2 SoCs

The da9063/da9210 regulator quirk for R-Car Gen2 boards uses a bus
notifier, and unregisters the notifier when it is no longer needed.
However, a notifier must not be unregistered from within the call chain.

This bug went unnoticed, as blocking_notifier_chain_unregister() didn't
take the semaphore during early boot. This is no longer the case as of
upstream commit 1c3c5eab171590f8 ("sched/core: Enable might_sleep() and
smp_processor_id() checks early") and a deadlock occurs.

* tag 'renesas-fixes3-for-v4.13' of https://git.kernel.org/pub/scm/linux/kernel/git/horms/renesas:
  ARM: shmobile: rcar-gen2: Fix deadlock in regulator quirk

6 years agoMerge tag 'mvebu-fixes-4.13-2' of git://git.infradead.org/linux-mvebu into fixes
Arnd Bergmann [Fri, 4 Aug 2017 10:53:21 +0000 (12:53 +0200)]
Merge tag 'mvebu-fixes-4.13-2' of git://git.infradead.org/linux-mvebu into fixes

Pull "mvebu fixes for 4.13 (part 2)" from Gregory CLEMENT:

All the fixes are for ARM64 mvebu:

 - Fix the RTC interrupt on A7K/A8K which was missed when switching
   from GIC to ICU
 - Mark the A7K/A8K crypto engine as dma coherent
 - Fix the number of GPIO on south bridge on Armada 3700

* tag 'mvebu-fixes-4.13-2' of git://git.infradead.org/linux-mvebu:
  ARM64: dts: marvell: armada-37xx: Fix the number of GPIO on south bridge
  arm64: dts: marvell: mark the cp110 crypto engine as dma coherent
  arm64: dts: marvell: use ICU for the CP110 slave RTC

6 years agoMerge tag 'amlogic-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/khilman...
Arnd Bergmann [Fri, 4 Aug 2017 10:50:52 +0000 (12:50 +0200)]
Merge tag 'amlogic-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/khilman/linux-amlogic into fixes

Pull "Amlogic fixes for v4.13-rc" from Kevin Hilman:

- 2 minor DT fixes

* tag 'amlogic-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/khilman/linux-amlogic:
  ARM64: dts: meson-gxl-s905x-libretech-cc: fixup board definition
  ARM64: dts: meson-gx: use specific compatible for the AO pwms

6 years agoMerge tag 'v4.13-rockchip-dts32fixes-1' of git://git.kernel.org/pub/scm/linux/kernel...
Arnd Bergmann [Fri, 4 Aug 2017 10:48:46 +0000 (12:48 +0200)]
Merge tag 'v4.13-rockchip-dts32fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mmind/linux-rockchip into fixes

Pull "Rockchip dts32 fixes for 4.13" from Heiko Stübner:

Fix for the recently added mali dt support. The example
showed a wrong value, so fix it before it gets copy-pasted
to much.

* tag 'v4.13-rockchip-dts32fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mmind/linux-rockchip:
  ARM: dts: rockchip: fix mali gpu node on rk3288
  dt-bindings: gpu: drop wrong compatible from midgard binding example

6 years agodrm/i915/gvt: Initialize MMIO Block with HW state
Tina Zhang [Fri, 4 Aug 2017 09:39:41 +0000 (17:39 +0800)]
drm/i915/gvt: Initialize MMIO Block with HW state

MMIO block with tracked mmio, is introduced for the sake of performance
of searching tracked mmio. All the tracked mmio needs to get the initial
value from the HW state during vGPU being created. This patch is to
initialize the tracked registers in MMIO block with the HW state.

v2: Add "Fixes:" line for this patch (Zhenyu)

Fixes: 65f9f6febf12 ("drm/i915/gvt: Optimize MMIO register handling for some large MMIO blocks")
Signed-off-by: Tina Zhang <tina.zhang@intel.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
6 years agodrm/rockchip: vop: report error when check resource error
Mark yao [Mon, 31 Jul 2017 09:49:55 +0000 (17:49 +0800)]
drm/rockchip: vop: report error when check resource error

The user would be confused while facing a error commit without
any error report.

Signed-off-by: Mark Yao <mark.yao@rock-chips.com>
Reviewed-by: Sandy huang <sandy.huang@rock-chips.com>
Link: https://patchwork.freedesktop.org/patch/msgid/1501494596-7090-1-git-send-email-mark.yao@rock-chips.com
6 years agodrm/rockchip: vop: round_up pitches to word align
Mark yao [Mon, 31 Jul 2017 09:49:50 +0000 (17:49 +0800)]
drm/rockchip: vop: round_up pitches to word align

VOP pitch register is word align, need align to word.

VOP_WIN0_VIR:
  bit[31:16] win0_vir_stride_uv
    Number of words of Win0 uv Virtual width
  bit[15:0] win0_vir_width
    Number of words of Win0 yrgb Virtual width
    ARGB888 : win0_vir_width
    RGB888 : (win0_vir_width*3/4) + (win0_vir_width%3)
    RGB565 : ceil(win0_vir_width/2)
    YUV : ceil(win0_vir_width/4)

Signed-off-by: Mark Yao <mark.yao@rock-chips.com>
Reviewed-by: Sandy huang <sandy.huang@rock-chips.com>
Link: https://patchwork.freedesktop.org/patch/msgid/1501494591-7034-1-git-send-email-mark.yao@rock-chips.com
6 years agodrm/rockchip: vop: fix NV12 video display error
Mark yao [Mon, 31 Jul 2017 09:49:46 +0000 (17:49 +0800)]
drm/rockchip: vop: fix NV12 video display error

fixup the scale calculation formula on the case
src_height == (dst_height/2).

Signed-off-by: Mark Yao <mark.yao@rock-chips.com>
Reviewed-by: Sandy huang <sandy.huang@rock-chips.com>
Link: https://patchwork.freedesktop.org/patch/msgid/1501494586-6984-1-git-send-email-mark.yao@rock-chips.com
6 years agodrm/rockchip: vop: fix iommu page fault when resume
Mark yao [Mon, 31 Jul 2017 09:49:42 +0000 (17:49 +0800)]
drm/rockchip: vop: fix iommu page fault when resume

Iommu would get page fault with following path:
   vop_disable:
      1, disable all windows and set vop config done
      2, vop enter to standy, all windows not works, but their registers
         are not clean, when you read window's enable bit, may found the
         window is enable.

   vop_enable:
      1, memcpy(vop->regsbak, vop->regs, len)
         save current vop registers to vop->regsbak, then you can found
         window is enable on regsbak.
      2, VOP_WIN_SET(vop, win, gate, 1);
         force enable window gate, but gate and enable are on same
         hardware register, then window enable bit rewrite to vop hardware.
      3, vop power on, and vop might try to scan destroyed buffer,
         then iommu get page fault.

Move windows disable after vop regsbak restore, then vop regsbak mechanism
would keep tracing the modify, everything would be safe.

Signed-off-by: Mark Yao <mark.yao@rock-chips.com>
Reviewed-by: Sandy huang <sandy.huang@rock-chips.com>
Link: https://patchwork.freedesktop.org/patch/msgid/1501494582-6934-1-git-send-email-mark.yao@rock-chips.com
6 years agopowerpc/64: Fix __check_irq_replay missing decrementer interrupt
Nicholas Piggin [Tue, 1 Aug 2017 13:59:28 +0000 (23:59 +1000)]
powerpc/64: Fix __check_irq_replay missing decrementer interrupt

If the decrementer wraps again and de-asserts the decrementer
exception while hard-disabled, __check_irq_replay() has a test to
notice the wrap when interrupts are re-enabled.

The decrementer check must be done when clearing the PACA_IRQ_HARD_DIS
flag, not when the PACA_IRQ_DEC flag is tested. Previously this worked
because the decrementer interrupt was always the first one checked
after clearing the hard disable flag, but HMI check was moved ahead of
that, which introduced this bug.

This can cause a missed decrementer interrupt if we soft-disable
interrupts then take an HMI which is recorded in irq_happened, then
hard-disable interrupts for > 4s to wrap the decrementer.

Fixes: e0e0d6b7390b ("powerpc/64: Replay hypervisor maintenance interrupt first")
Cc: stable@vger.kernel.org # v4.9+
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
6 years agopowerpc/perf: POWER9 PMU stops after idle workaround
Nicholas Piggin [Thu, 20 Jul 2017 01:53:22 +0000 (11:53 +1000)]
powerpc/perf: POWER9 PMU stops after idle workaround

POWER9 DD2 PMU can stop after a state-loss idle in some conditions.

A solution is to set then clear MMCRA[60] after wake from state-loss
idle. MMCRA[60] is a non-architected bit, see the user manual for
details.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Acked-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Reviewed-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
Acked-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
6 years agoMerge branch 'drm-fixes-4.13' of git://people.freedesktop.org/~agd5f/linux into drm...
Dave Airlie [Fri, 4 Aug 2017 01:43:14 +0000 (11:43 +1000)]
Merge branch 'drm-fixes-4.13' of git://people.freedesktop.org/~agd5f/linux into drm-fixes

Just a few small fixes for 4.13.

* 'drm-fixes-4.13' of git://people.freedesktop.org/~agd5f/linux:
  drm/amdgpu: Use list_del_init in amdgpu_mn_unregister
  drm/amdgpu: Fix undue fallthroughs in golden registers initialization
  drm/amdgpu: fix header on gfx9 clear state

6 years agoMerge branch 'tcp-xmit-timer-rearming'
David S. Miller [Thu, 3 Aug 2017 22:38:31 +0000 (15:38 -0700)]
Merge branch 'tcp-xmit-timer-rearming'

Neal Cardwell says:

====================
tcp: fix xmit timer rearming to avoid stalls

This patch series is a bug fix for a TCP loss recovery performance bug
reported independently in recent netdev threads:

 (i)  July 26, 2017: netdev thread "TCP fast retransmit issues"
 (ii) July 26, 2017: netdev thread:
       "[PATCH V2 net-next] TLP: Don't reschedule PTO when there's one
       outstanding TLP retransmission"

Many thanks to Klavs Klavsen and Mao Wenan for the detailed reports,
traces, and packetdrill test cases, which enabled us to root-cause
this issue and verify the fix.

- v1 -> v2:
 - In patch 2/3, changed an unclear comment in the pre-existing code
   in tcp_schedule_loss_probe() to be more clear (thanks to Eric Dumazet
   for suggesting we improve this).
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agotcp: fix xmit timer to only be reset if data ACKed/SACKed
Neal Cardwell [Thu, 3 Aug 2017 13:19:54 +0000 (09:19 -0400)]
tcp: fix xmit timer to only be reset if data ACKed/SACKed

Fix a TCP loss recovery performance bug raised recently on the netdev
list, in two threads:

(i)  July 26, 2017: netdev thread "TCP fast retransmit issues"
(ii) July 26, 2017: netdev thread:
     "[PATCH V2 net-next] TLP: Don't reschedule PTO when there's one
     outstanding TLP retransmission"

The basic problem is that incoming TCP packets that did not indicate
forward progress could cause the xmit timer (TLP or RTO) to be rearmed
and pushed back in time. In certain corner cases this could result in
the following problems noted in these threads:

 - Repeated ACKs coming in with bogus SACKs corrupted by middleboxes
   could cause TCP to repeatedly schedule TLPs forever. We kept
   sending TLPs after every ~200ms, which elicited bogus SACKs, which
   caused more TLPs, ad infinitum; we never fired an RTO to fill in
   the holes.

 - Incoming data segments could, in some cases, cause us to reschedule
   our RTO or TLP timer further out in time, for no good reason. This
   could cause repeated inbound data to result in stalls in outbound
   data, in the presence of packet loss.

This commit fixes these bugs by changing the TLP and RTO ACK
processing to:

 (a) Only reschedule the xmit timer once per ACK.

 (b) Only reschedule the xmit timer if tcp_clean_rtx_queue() deems the
     ACK indicates sufficient forward progress (a packet was
     cumulatively ACKed, or we got a SACK for a packet that was sent
     before the most recent retransmit of the write queue head).

This brings us back into closer compliance with the RFCs, since, as
the comment for tcp_rearm_rto() notes, we should only restart the RTO
timer after forward progress on the connection. Previously we were
restarting the xmit timer even in these cases where there was no
forward progress.

As a side benefit, this commit simplifies and speeds up the TCP timer
arming logic. We had been calling inet_csk_reset_xmit_timer() three
times on normal ACKs that cumulatively acknowledged some data:

1) Once near the top of tcp_ack() to switch from TLP timer to RTO:
        if (icsk->icsk_pending == ICSK_TIME_LOSS_PROBE)
               tcp_rearm_rto(sk);

2) Once in tcp_clean_rtx_queue(), to update the RTO:
        if (flag & FLAG_ACKED) {
               tcp_rearm_rto(sk);

3) Once in tcp_ack() after tcp_fastretrans_alert() to switch from RTO
   to TLP:
        if (icsk->icsk_pending == ICSK_TIME_RETRANS)
               tcp_schedule_loss_probe(sk);

This commit, by only rescheduling the xmit timer once per ACK,
simplifies the code and reduces CPU overhead.

This commit was tested in an A/B test with Google web server
traffic. SNMP stats and request latency metrics were within noise
levels, substantiating that for normal web traffic patterns this is a
rare issue. This commit was also tested with packetdrill tests to
verify that it fixes the timer behavior in the corner cases discussed
in the netdev threads mentioned above.

This patch is a bug fix patch intended to be queued for -stable
relases.

Fixes: 6ba8a3b19e76 ("tcp: Tail loss probe (TLP)")
Reported-by: Klavs Klavsen <kl@vsen.dk>
Reported-by: Mao Wenan <maowenan@huawei.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Nandita Dukkipati <nanditad@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agotcp: enable xmit timer fix by having TLP use time when RTO should fire
Neal Cardwell [Thu, 3 Aug 2017 13:19:53 +0000 (09:19 -0400)]
tcp: enable xmit timer fix by having TLP use time when RTO should fire

Have tcp_schedule_loss_probe() base the TLP scheduling decision based
on when the RTO *should* fire. This is to enable the upcoming xmit
timer fix in this series, where tcp_schedule_loss_probe() cannot
assume that the last timer installed was an RTO timer (because we are
no longer doing the "rearm RTO, rearm RTO, rearm TLP" dance on every
ACK). So tcp_schedule_loss_probe() must independently figure out when
an RTO would want to fire.

In the new TLP implementation following in this series, we cannot
assume that icsk_timeout was set based on an RTO; after processing a
cumulative ACK the icsk_timeout we see can be from a previous TLP or
RTO. So we need to independently recalculate the RTO time (instead of
reading it out of icsk_timeout). Removing this dependency on the
nature of icsk_timeout makes things a little easier to reason about
anyway.

Note that the old and new code should be equivalent, since they are
both saying: "if the RTO is in the future, but at an earlier time than
the normal TLP time, then set the TLP timer to fire when the RTO would
have fired".

Fixes: 6ba8a3b19e76 ("tcp: Tail loss probe (TLP)")
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Nandita Dukkipati <nanditad@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agotcp: introduce tcp_rto_delta_us() helper for xmit timer fix
Neal Cardwell [Thu, 3 Aug 2017 13:19:52 +0000 (09:19 -0400)]
tcp: introduce tcp_rto_delta_us() helper for xmit timer fix

Pure refactor. This helper will be required in the xmit timer fix
later in the patch series. (Because the TLP logic will want to make
this calculation.)

Fixes: 6ba8a3b19e76 ("tcp: Tail loss probe (TLP)")
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Nandita Dukkipati <nanditad@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoMerge tag 'vfio-v4.13-rc4' of git://github.com/awilliam/linux-vfio
Linus Torvalds [Thu, 3 Aug 2017 22:25:14 +0000 (15:25 -0700)]
Merge tag 'vfio-v4.13-rc4' of git://github.com/awilliam/linux-vfio

Pull VFIO fixes from Alex Williamson:

 - SPAPR/EEH config build fix (Murilo Opsfelder Araujo)

 - Fix possible device lock deadlock (Alex Williamson)

 - Correctly size integrated endpoint PCIe capabilities (Alex
   Williamson)

* tag 'vfio-v4.13-rc4' of git://github.com/awilliam/linux-vfio:
  vfio/pci: Fix handling of RC integrated endpoint PCIe capability size
  vfio/pci: Use pci_try_reset_function() on initial open
  include/linux/vfio.h: Guard powerpc-specific functions with CONFIG_VFIO_SPAPR_EEH