Rick Jones [Fri, 7 Aug 2015 18:10:37 +0000 (11:10 -0700)]
net: add explicit logging and stat for neighbour table overflow
Add an explicit neighbour table overflow message (ratelimited) and
statistic to make diagnosing neighbour table overflows tractable in
the wild.
Diagnosing a neighbour table overflow can be quite difficult in the wild
because there is no explicit dmesg logged. Callers to neighbour code
seem to use net_dbg_ratelimit when the neighbour call fails which means
the "base message" is not emitted and the callback suppressed messages
from the ratelimiting can end-up juxtaposed with unrelated messages.
Further, a forced garbage collection will increment a stat on each call
whether it was successful in freeing-up a table entry or not, so that
statistic is only a hint. So, add a net_info_ratelimited message and
explicit statistic to the neighbour code.
Signed-off-by: Rick Jones <rick.jones2@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
bridge: netlink: add support for vlan_filtering attribute
This patch adds the ability to toggle the vlan filtering support via
netlink. Since we're already running with rtnl in .changelink() we don't
need to take any additional locks.
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
After "62bccb8 net-timestamp: Make the clone operation stand-alone from phy
timestamping" the hwtstamps parameter of skb_complete_tx_timestamp() may no
longer be NULL.
Signed-off-by: Benjamin Poirier <bpoirier@suse.com> Cc: Alexander Duyck <alexander.h.duyck@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 10 Aug 2015 20:34:29 +0000 (13:34 -0700)]
Merge branch 'qlcnic-enhancements'
Shahed Shaikh says:
====================
qlcnic: enhancements
This series adds few enhancements.
o Patch from Harish reorders the sequence of header files inclusion,
keeping kernel's header files on top.
o Firmware introduced a new feature which allows driver to increases
the size of firmware dump of iSCSI function which is being collected
by NIC driver.
o Print buffer address which is holding a firmware dump.
o Use vzalloc() instead kzalloc() for allocating large chunk of memory
which will avoid potential memory allocation failure.
o Add new device ID for 0x8C30 which is a 83xx series based VF function.
Please apply this series to net-next.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Shahed Shaikh [Fri, 7 Aug 2015 11:17:06 +0000 (07:17 -0400)]
qlcnic: Don't use kzalloc unncecessarily for allocating large chunk of memory
Driver allocates a large chunk of temporary buffer using kzalloc
to copy FW image. As there is no real need of this memory to be
physically contiguous, use vzalloc instead.
Signed-off-by: Shahed Shaikh <shahed.shaikh@qlogic.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Shahed Shaikh [Fri, 7 Aug 2015 11:17:03 +0000 (07:17 -0400)]
qlcnic: Add support to enable capability to extend minidump for iSCSI
In some cases it is required to capture minidump for iSCSI functions
as part of default minidump collection process. To enable this, firmware
exports it's capability and driver need to enable that capability
by issuing a mailbox command.
With this feature, firmware can provide additional iSCSI function's
minidump with smaller minidump capture mask (0x1f).
Signed-off-by: Shahed Shaikh <shahed.shaikh@qlogic.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Harish Patil [Fri, 7 Aug 2015 11:17:02 +0000 (07:17 -0400)]
qlcnic: Rearrange ordering of header files inclusion
Include local headers files after kernel's header files.
Signed-off-by: Harish Patil <harish.patil@qlogic.com> Signed-off-by: Shahed Shaikh <shahed.shaikh@qlogic.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Madalin Bucur [Fri, 7 Aug 2015 09:07:50 +0000 (17:07 +0800)]
net: phy: select copper mode when Marvel 88e1111 in SGMII
For the Marvel 88e1111 PHY only two SGMII modes are available, both
allowing only SGMII to copper mode (with or without clock). SGMII
to fiber mode is not supported. Make sure the fiber/copper registers
selector bits are cleared for selecting copper mode.
Signed-off-by: Madalin Bucur <madalin.bucur@freescale.com> Signed-off-by: Shaohui Xie <Shaohui.Xie@freescale.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Kevin Hao [Fri, 7 Aug 2015 05:52:37 +0000 (13:52 +0800)]
net: fec: fix the race between xmit and bdp reclaiming path
When we transmit a fragmented skb, we may run into a race like the
following scenario (assume txq->cur_tx is next to txq->dirty_tx):
cpu 0 cpu 1
fec_enet_txq_submit_skb
reserve a bdp for the first fragment
fec_enet_txq_submit_frag_skb
update the bdp for the other fragment
update txq->cur_tx
fec_enet_tx_queue
bdp = fec_enet_get_nextdesc(txq->dirty_tx, fep, queue_id);
This bdp is the bdp reserved for the first segment. Given
that this bdp BD_ENET_TX_READY bit is not set and txq->cur_tx
is already pointed to a bdp beyond this one. We think this is a
completed bdp and try to reclaim it.
update the bdp for the first segment
update txq->cur_tx
So we shouldn't update the txq->cur_tx until all the update to the
bdps used for fragments are performed. Also add the corresponding
memory barrier to guarantee that the update to the bdps, dirty_tx and
cur_tx performed in the proper order.
Signed-off-by: Kevin Hao <haokexin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
It seems so far plausible that the recursive call into rtnetlink_rcv()
looks suspicious. One way, where this could trigger is that the senders
NETLINK_CB(skb).portid was wrongly 0 (which is rtnetlink socket), so
the rtnl_getlink() request's answer would be sent to the kernel instead
to the actual user process, thus grabbing rtnl_mutex() twice.
One theory would be that netlink_autobind() triggered via netlink_sendmsg()
internally overwrites the -EBUSY error to 0, but where it is wrongly
originating from __netlink_insert() instead. That would reset the
socket's portid to 0, which is then filled into NETLINK_CB(skb).portid
later on. As commit d470e3b483dc ("[NETLINK]: Fix two socket hashing bugs.")
also puts it, -EBUSY should not be propagated from netlink_insert().
It looks like it's very unlikely to reproduce. We need to trigger the
rhashtable_insert_rehash() handler under a situation where rehashing
currently occurs (one /rare/ way would be to hit ht->elasticity limits
while not filled enough to expand the hashtable, but that would rather
require a specifically crafted bind() sequence with knowledge about
destination slots, seems unlikely). It probably makes sense to guard
__netlink_insert() in any case and remap that error. It was suggested
that EOVERFLOW might be better than an already overloaded ENOMEM.
Reference: http://thread.gmane.org/gmane.linux.network/372676 Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Acked-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
Ivan Vecera [Thu, 6 Aug 2015 20:48:23 +0000 (22:48 +0200)]
bna: fix interrupts storm caused by erroneous packets
The commit "e29aa33 bna: Enable Multi Buffer RX" moved packets counter
increment from the beginning of the NAPI processing loop after the check
for erroneous packets so they are never accounted. This counter is used
to inform firmware about number of processed completions (packets).
As these packets are never acked the firmware fires IRQs for them again
and again.
Fixes: e29aa33 ("bna: Enable Multi Buffer RX") Signed-off-by: Ivan Vecera <ivecera@redhat.com> Acked-by: Rasesh Mody <rasesh.mody@qlogic.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 10 Aug 2015 17:57:01 +0000 (10:57 -0700)]
Merge branch 'mvpp2-fixes'
Marcin Wojtas says:
====================
Fixes for the network driver of Marvell Armada 375 SoC
This is a set of three patches that fix long-lasting problems implemented in
the initial support for the Armada 375 network controller.
Due to an inappropriate concept of handling the per-CPU sent packets'
processing on TX path the driver numerous problems occured, such as RCU
stalls. Those have been fixed, of which details you can find in the commit
logs. The patches were intensively tested on top of v4.2-rc5.
I'm looking forward to any comments or remarks.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Marcin Wojtas [Thu, 6 Aug 2015 17:00:30 +0000 (19:00 +0200)]
net: mvpp2: replace TX coalescing interrupts with hrtimer
The PP2 controller is capable of per-CPU TX processing, which means there are
per-CPU banked register sets and queues. Current version of the driver supports
TX packet coalescing - once on given CPU sent packets amount reaches a threshold
value, an IRQ occurs. However, there is a single interrupt line responsible for
CPU0/1 TX and RX events (the latter is not per-CPU, the hardware does not
support RSS).
When the top-half executes the interrupt cause is not known. This is why in
NAPI poll function, along with RX processing, IRQ cause register on both
CPU's is accessed in order to determine on which of them the TX coalescing
threshold might have been reached. Thus the egress processing and releasing the
buffers is able to take place on the corresponding CPU. Hitherto approach lead
to an illegal usage of on_each_cpu function in softirq context.
The problem is solved by resigning from TX coalescing interrupts and separating
egress finalization from NAPI processing. For that purpose a method of using
hrtimer is introduced. In main transmit function (mvpp2_tx) buffers are released
once a software coalescing threshold is reached. In case not all the data is
processed a timer is set on this CPU - in its interrupt context a tasklet is
scheduled in which all queues are processed. At once only one timer per-CPU can
be running, which is controlled by a dedicated flag.
This commit removes TX processing from NAPI polling function, disables hardware
coalescing and enables hrtimer with tasklet, using new per-CPU port structure
(mvpp2_port_pcpu).
Signed-off-by: Marcin Wojtas <mw@semihalf.com> Signed-off-by: David S. Miller <davem@davemloft.net>
mvpp2 driver allows usage of per-CPU TX processing. Once the packets are
prepared independetly on each CPU, the hardware enqueues the descriptors in
common TX queue. After they are sent, the buffers and associated sk_buffs
should be released on the corresponding CPU.
This is why a special index is maintained in order to point to the right data to
be released after transmission takes place. Each per-CPU TX queue comprise an
array of sent sk_buffs, freed in mvpp2_txq_bufs_free function. However, the
index was used there also for obtaining a descriptor (and therefore a buffer to
be DMA-unmapped) from common TX queue, which was wrong, because it was not
referring to the current CPU.
This commit enables proper unmapping of sent data buffers by indexing them in
per-CPU queues using a dedicated array for keeping their physical addresses.
Signed-off-by: Marcin Wojtas <mw@semihalf.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Marcin Wojtas [Thu, 6 Aug 2015 17:00:28 +0000 (19:00 +0200)]
net: mvpp2: remove excessive spinlocks from driver initialization
Using spinlocks protection during one-time driver initialization is not
necessary. Moreover it resulted in invalid GFP_KERNEL allocation under the lock.
This commit removes redundant spinlocks from buffer manager part of mvpp2
initialization.
Signed-off-by: Marcin Wojtas <mw@semihalf.com> Reported-by: Alexandre Fournier <alexandre.fournier@wisp-e.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Mon, 10 Aug 2015 17:48:11 +0000 (10:48 -0700)]
Merge tag 'mfd-fixes-4.2' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd
Pull MFD fixes from Lee Jones:
- fix dependency issues on ChromeOS platforms
- fix runtime PM issues on Arizona
- fix IRQ/suspend race on Arizona
* tag 'mfd-fixes-4.2' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd:
mfd: Remove MFD_CROS_EC_SPI depends on OF
platform/chrome: Don't make CHROME_PLATFORMS depends on X86 || ARM
mfd: arizona: Fix initialisation of the PM runtime
mfd: arizona: Fix race between runtime suspend and IRQs
Linus Torvalds [Mon, 10 Aug 2015 17:38:42 +0000 (10:38 -0700)]
Merge tag 'ntb-4.2-rc7' of git://github.com/jonmason/ntb
Pull NTB bugfixes from Jon Mason:
"NTB bug fixes to address transport receive issues, stats, link
negotiation issues, and string formatting"
* tag 'ntb-4.2-rc7' of git://github.com/jonmason/ntb:
ntb: avoid format string in dev_set_name
NTB: Fix dereference before check
NTB: Fix zero size or integer overflow in ntb_set_mw
NTB: Schedule to receive on QP link up
NTB: Fix oops in debugfs when transport is half-up
NTB: ntb_netdev not covering all receive errors
NTB: Fix transport stats for multiple devices
NTB: Fix ntb_transport out-of-order RX update
Since 906c55579a63 ("timekeeping: Copy the shadow-timekeeper over the
real timekeeper last") it has become possible on arm64 to:
- Obtain a CLOCK_MONOTONIC_COARSE or CLOCK_REALTIME_COARSE timestamp
via syscall.
- Subsequently obtain a timestamp for the same clock ID via VDSO which
predates the first timestamp (by one jiffy).
This is because arm64's update_vsyscall is deriving the coarse time
using the __current_kernel_time interface, when it should really be
using the timekeeper object provided to it by the timekeeping core.
It happened to work before only because __current_kernel_time would
access the same timekeeper object which had been passed to
update_vsyscall. This is no longer the case.
x86/xen: build "Xen PV" APIC driver for domU as well
It turns out that a PV domU also requires the "Xen PV" APIC
driver. Otherwise, the flat driver is used and we get stuck in busy
loops that never exit, such as in this stack trace:
(gdb) target remote localhost:9999
Remote debugging using localhost:9999
__xapic_wait_icr_idle () at ./arch/x86/include/asm/ipi.h:56
56 while (native_apic_mem_read(APIC_ICR) & APIC_ICR_BUSY)
(gdb) bt
#0 __xapic_wait_icr_idle () at ./arch/x86/include/asm/ipi.h:56
#1 __default_send_IPI_shortcut (shortcut=<optimized out>,
dest=<optimized out>, vector=<optimized out>) at
./arch/x86/include/asm/ipi.h:75
#2 apic_send_IPI_self (vector=246) at arch/x86/kernel/apic/probe_64.c:54
#3 0xffffffff81011336 in arch_irq_work_raise () at
arch/x86/kernel/irq_work.c:47
#4 0xffffffff8114990c in irq_work_queue (work=0xffff88000fc0e400) at
kernel/irq_work.c:100
#5 0xffffffff8110c29d in wake_up_klogd () at kernel/printk/printk.c:2633
#6 0xffffffff8110ca60 in vprintk_emit (facility=0, level=<optimized
out>, dict=0x0 <irq_stack_union>, dictlen=<optimized out>,
fmt=<optimized out>, args=<optimized out>)
at kernel/printk/printk.c:1778
#7 0xffffffff816010c8 in printk (fmt=<optimized out>) at
kernel/printk/printk.c:1868
#8 0xffffffffc00013ea in ?? ()
#9 0x0000000000000000 in ?? ()
Mailing-list-thread: https://lkml.org/lkml/2015/8/4/755 Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Cc: <stable@vger.kernel.org> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
A sun7i-a20-olinuxino-micro fails to boot when kernel parameter
vt.global_cursor_default=0. The value is copied to vc->vc_deccm
causing the initialization of ops->cur_blink_jiffies to be skipped.
Unconditionally initialize it.
Reported-and-tested-by: Jonathan Liu <net147@gmail.com> Signed-off-by: Scot Doyle <lkml14@scotdoyle.com> Acked-by: Pavel Machek <pavel@ucw.cz> Signed-off-by: Tomi Valkeinen <tomi.valkeinen@ti.com>
In case videomode_from_timings() fails in function of_get_videomode(), the
allocated display timing data is not freed in the exit path. Make sure that
display_timings_release() is called in any case. Detected by Coverity CID 1309681.
Signed-off-by: Christian Engelmayer <cengelma@gmx.at> Signed-off-by: Tomi Valkeinen <tomi.valkeinen@ti.com>
Phil Sutter [Mon, 27 Jul 2015 22:53:26 +0000 (00:53 +0200)]
netfilter: SYNPROXY: fix sending window update to client
Upon receipt of SYNACK from the server, ipt_SYNPROXY first sends back an ACK to
finish the server handshake, then calls nf_ct_seqadj_init() to initiate
sequence number adjustment of forwarded packets to the client and finally sends
a window update to the client to unblock it's TX queue.
Since synproxy_send_client_ack() does not set synproxy_send_tcp()'s nfct
parameter, no sequence number adjustment happens and the client receives the
window update with incorrect sequence number. Depending on client TCP
implementation, this leads to a significant delay (until a window probe is
being sent).
Signed-off-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
This happens when networking namespaces are enabled.
Suggested-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Phil Sutter <phil@nwl.cc> Acked-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Jyri Sarha [Fri, 7 Aug 2015 11:04:30 +0000 (14:04 +0300)]
OMAPDSS: Fix omap_dss_find_output_by_port_node() port refcount decrement
Fix omap_dss_find_output_by_port_node() port parameter refcount
decrementation. The only user of dss_of_port_get_parent_device()
function is omap_dss_find_output_by_port_node() and it assumes the
refcount of the port parameter is not decremented by the call.
Signed-off-by: Jyri Sarha <jsarha@ti.com> Signed-off-by: Tomi Valkeinen <tomi.valkeinen@ti.com>
fbdev: select versatile helpers for the integrator
Commit 11c32d7b6274cb0f554943d65bd4a126c4a86dcd
"video: move Versatile CLCD helpers" missed the fact
that the Integrator/CP is also using the helper, and
as a result the platform got only stubs and no graphics.
Add this as a default selection to Kconfig so we have
graphics again.
David S. Miller [Mon, 10 Aug 2015 05:54:10 +0000 (22:54 -0700)]
Merge branch 'mlxsw-fixes'
Jiri Pirko says:
====================
mlxsw: Couple of fixes/adjustments
Ido Schimmel (5):
mlxsw: Call free_netdev when removing port
mlxsw: Make system port to local port mapping explicit
mlxsw: Simplify mlxsw_sx_port_xmit function
mlxsw: Use correct skb length when dumping payload
mlxsw: Fix use-after-free bug in mlxsw_sx_port_xmit
Jiri Pirko (2):
mlxsw: Make pci module dependent on HAS_DMA and HAS_IOMEM
mlxsw: Strip FCS from incoming packets
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Thu, 6 Aug 2015 14:41:56 +0000 (16:41 +0200)]
mlxsw: Simplify mlxsw_sx_port_xmit function
Previously we only checked if the transmission queue is not full in the
middle of the xmit function. This lead to complex logic due to the fact
that sometimes we need to reallocate the headroom for our Tx header.
Allow the switch driver to know if the transmission queue is not full
before sending the packet and remove this complex logic.
Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Thu, 6 Aug 2015 14:41:54 +0000 (16:41 +0200)]
mlxsw: Make pci module dependent on HAS_DMA and HAS_IOMEM
This resolves compile errors on um-allyesconfig.
Note that there are many other drivers which have the same issue.
Reported-by: kbuild test robot <fengguang.wu@intel.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Thu, 6 Aug 2015 14:41:53 +0000 (16:41 +0200)]
mlxsw: Make system port to local port mapping explicit
System ports are unique identifiers in a multi-ASIC environment that
represent all the available ports in the system. Local ports on the
other hand, are unique only within the local ASIC.
Since system port to local port mapping is not part of the HW-SW
contract and since only single-ASIC configurations are currently
supported, set an explicit 1:1 mapping by configuring the Switch System
Port Record (SSPR) register.
Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Thu, 6 Aug 2015 14:41:52 +0000 (16:41 +0200)]
mlxsw: Call free_netdev when removing port
When removing a port's netdevice we should also free the memory
allocated by alloc_etherdev(). Do this by calling free_netdev() at the
end of the teardown sequence.
Reported-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Masanari Iida [Thu, 6 Aug 2015 12:27:54 +0000 (21:27 +0900)]
net: ethernet: Fix double word "the the" in eth.c
This patch fix double word "the the" in
Documentation/DocBook/networking/API-eth-get-headlen.html
Documentation/DocBook/networking/netdev.html
Documentation/DocBook/networking.xml
These files are generated from comment in source,
so I have to fix comment in net/ethernet/eth.c.
Signed-off-by: Masanari Iida <standby24x7@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Robert Shearman [Thu, 6 Aug 2015 10:04:56 +0000 (11:04 +0100)]
mpls: Enforce payload type of traffic sent using explicit NULL
RFC 4182 s2 states that if an IPv4 Explicit NULL label is the only
label on the stack, then after popping the resulting packet must be
treated as a IPv4 packet and forwarded based on the IPv4 header. The
same is true for IPv6 Explicit NULL with an IPv6 packet following.
Therefore, when installing the IPv4/IPv6 Explicit NULL label routes,
add an attribute that specifies the expected payload type for use at
forwarding time for determining the type of the encapsulated packet
instead of inspecting the first nibble of the packet.
Signed-off-by: Robert Shearman <rshearma@brocade.com> Signed-off-by: David S. Miller <davem@davemloft.net>
changes in V7:
- rebase the whole patch set to net-next tree(9dc20a64);
- split out the core perf APIs into Patch 1/5;
- change the return value of function perf_event_attrs()
from struct perf_event * to const struct perf_event * in
Patch 1/5;
- rename the function perf_event_read_internal() to perf_event_
read_local() and rewrite it in Patch 1/5;
- rename the function check_func_limit() to check_map_func
_compatibility() and remove the unnecessary pass pointer to
a pointer in Patch 4/5;
changes in V6:
- make the Patch 1/4 commit message more meaning and readable;
- remove the unnecessary comment in Patch 2/4 and make it clean;
- declare the function perf_event_release_kernel() in include/
linux/perf_event.h to fix the build error when CONFIG_PERF_EVENTS
isn't configured in Patch 2/4;
- add function perf_event_attrs() to get the struct perf_event_attr
in Patch 2/4.
- move the related code from kernel/trace/bpf_trace.c to kernel/
events/core.c and add function perf_event_read_internal() to
avoid poking inside of the event outside of perf code in Patch 3/4;
- generial the func & map match-pair with an array in Patch 3/4;
changes in V5:
- move struct fd_array_map_ops* fd_ops to bpf_map;
- move array perf event decrement refcnt function to
map_free;
- fix the NULL ptr of perf_event_get();
- move bpf_perf_event_read() to kernel/bpf/bpf_trace.c;
- get rid of the remaining struct bpf_prog;
- move the unnecessay cast on void *;
changes in V4:
- make the bpf_prog_array_map more generic;
- fix the bug of event refcnt leak;
- use more useful errno in bpf_perf_event_read();
changes in V3:
- collapse V2 patches 1-3 into one;
- drop the function map->ops->map_traverse_elem() and release
the struct perf_event in map_free;
- only allow to access bpf_perf_event_read() from programs;
- update the perf_event_array_map elem via xchg();
- pass index directly to bpf_perf_event_read() instead of
MAP_KEY;
changes in V2:
- put atomic_long_inc_not_zero() between fdget() and fdput();
- limit the event type to PERF_TYPE_RAW and PERF_TYPE_HARDWARE;
- Only read the event counter on current CPU or on current
process;
- add new map type BPF_MAP_TYPE_PERF_EVENT_ARRAY to store the
pointer to the struct perf_event;
- according to the perf_event_map_fd and key, the function
bpf_perf_event_read() can get the Hardware PMU counter value;
Patch 5/5 is a simple example and shows how to use this new eBPF
programs ability. The PMU counter data can be found in
/sys/kernel/debug/tracing/trace(trace_pipe).(the cycles PMU
value when 'kprobe/sys_write' sampling)
Kaixu Xia [Thu, 6 Aug 2015 07:02:35 +0000 (07:02 +0000)]
bpf: Implement function bpf_perf_event_read() that get the selected hardware PMU conuter
According to the perf_event_map_fd and index, the function
bpf_perf_event_read() can convert the corresponding map
value to the pointer to struct perf_event and return the
Hardware PMU counter value.
Signed-off-by: Kaixu Xia <xiakaixu@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Kaixu Xia [Thu, 6 Aug 2015 07:02:34 +0000 (07:02 +0000)]
bpf: Add new bpf map type to store the pointer to struct perf_event
Introduce a new bpf map type 'BPF_MAP_TYPE_PERF_EVENT_ARRAY'.
This map only stores the pointer to struct perf_event. The
user space event FDs from perf_event_open() syscall are converted
to the pointer to struct perf_event and stored in map.
Signed-off-by: Kaixu Xia <xiakaixu@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Wang Nan [Thu, 6 Aug 2015 07:02:33 +0000 (07:02 +0000)]
bpf: Make the bpf_prog_array_map more generic
All the map backends are of generic nature. In order to avoid
adding much special code into the eBPF core, rewrite part of
the bpf_prog_array map code and make it more generic. So the
new perf_event_array map type can reuse most of code with
bpf_prog_array map and add fewer lines of special code.
Signed-off-by: Wang Nan <wangnan0@huawei.com> Signed-off-by: Kaixu Xia <xiakaixu@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Kaixu Xia [Thu, 6 Aug 2015 07:02:32 +0000 (07:02 +0000)]
perf: add the necessary core perf APIs when accessing events counters in eBPF programs
This patch add three core perf APIs:
- perf_event_attrs(): export the struct perf_event_attr from struct
perf_event;
- perf_event_get(): get the struct perf_event from the given fd;
- perf_event_read_local(): read the events counters active on the
current CPU;
These APIs are needed when accessing events counters in eBPF programs.
The API perf_event_read_local() comes from Peter and I add the
corresponding SOB.
Signed-off-by: Kaixu Xia <xiakaixu@huawei.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 10 Aug 2015 05:48:10 +0000 (22:48 -0700)]
Merge branch 'mv88e6xxx-switchdev-fdb'
Vivien Didelot says:
====================
net: dsa: mv88e6xxx: support switchdev FDB objects
This patchset refactors the DSA and mv88e6xxx code to use the switchdev FDB
objects.
The first two patches add minor but necessary changes to switchdev, the third
one implements the switchdev glue in DSA for FDB routines, and the remaining
ones refactor the FDB access functions in the mv88e6xxx code.
Below is an usage example (ports 0-2 belongs to br0, ports 3-4 belongs to br1):
# bridge fdb add 3c:97:0e:11:30:6e dev swp2
# bridge fdb add 3c:97:0e:11:40:78 dev swp3
# bridge fdb add 3c:97:0e:11:50:86 dev swp4
# bridge fdb del 3c:97:0e:11:40:78 dev swp3
# bridge fdb
01:00:5e:00:00:01 dev eth0 self permanent
01:00:5e:00:00:01 dev eth1 self permanent
00:50:d2:10:78:15 dev swp0 master br0 permanent
3c:97:0e:11:30:6e dev swp2 self static
00:50:d2:10:78:15 dev swp3 master br1 permanent
3c:97:0e:11:50:86 dev swp4 self static
# cat /sys/kernel/debug/dsa0/atu
# DB T/P Vec State Addr
# 001 Port 004 e 3c:97:0e:11:30:6e
# 004 Port 010 e 3c:97:0e:11:50:86
For the 88E6xxx switches, FIDs 1 to num_ports will be reserved for non-bridged
ports and bridge groups, and the remaining will be later used by VLANs.
This change is necessary to welcome the support for hardware VLANs (which will
follow soon).
Changes in v2:
- remove ndo_bridge_{get,set,del}link from switchdev/DSA glue code
- use ether_addr_copy instead of memcpy for MAC addresses
- constify MAC address in port_fdb_{add,del}
- split the mv88e6xxx code refactoring into several patches
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Vivien Didelot [Thu, 6 Aug 2015 05:44:07 +0000 (01:44 -0400)]
net: dsa: mv88e6xxx: rework FDB getnext operation
This commit adds a low level _mv88e6xxx_atu_getnext function and helpers
to rewrite the mv88e6xxx_port_fdb_getnext operation.
A mv88e6xxx_atu_entry structure is added for convenient access to the
hardware, and GLOBAL_ATU_FID is defined instead of the raw 0x01 value.
The previous implementation did not handle the eventual trunk mapping.
If the related bit is set, then the ATU data register would contain the
trunk ID, and not the port vector.
Check this in the FDB getnext operation and do not handle it (yet).
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Vivien Didelot [Thu, 6 Aug 2015 05:44:06 +0000 (01:44 -0400)]
net: dsa: mv88e6xxx: rename ATU MAC accessors
Rename the __mv88e6xxx_{read,write}_addr functions to more explicit
_mv88e6xxx_atu_mac_{read,write} functions, which also respect the single
underscore convention used in the file (meaning SMI lock must be held).
In the meantime, define their MAC address parameters as an array of
ETH_ALEN bytes instead of a char pointer.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Vivien Didelot [Thu, 6 Aug 2015 05:44:05 +0000 (01:44 -0400)]
net: dsa: mv88e6xxx: extend fid mask
The driver currently manages one FID per port (or bridge group), with a
mask of DSA_MAX_PORTS bits, where 0 means that the FID is in use.
The Marvell 88E6xxx switches support up to 4094 FIDs (from 1 to 0xfff;
FID 0 means that multiple address databases are not being used).
This patch changes the fid_mask for an fid_bitmap of 4096 bits.
>From now on, FIDs 1 to num_ports are reserved for non-bridged ports and
bridge groups (a bridge group gets the FID of its first member). The
remaining bits will be reserved for VLAN entries.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Vivien Didelot [Thu, 6 Aug 2015 05:44:02 +0000 (01:44 -0400)]
net: switchdev: change fdb addr for a byte array
The address in the switchdev_obj_fdb structure is currently represented
as a pointer. Replacing it for a 6-byte array allows switchdev to carry
addresses directly read from hardware registers, not stored by the
switch chip driver (as in Rocker).
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Dave Jiang [Mon, 13 Jul 2015 12:07:11 +0000 (08:07 -0400)]
NTB: Fix oops in debugfs when transport is half-up
When the remote side is not up, we do not have all the context for the
transport, and that causes NULL ptr access. Have the debugfs reads check
to see if transport is up before we make access.
Signed-off-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Jon Mason <jdmason@kudzu.us>
Dave Jiang [Mon, 13 Jul 2015 12:07:09 +0000 (08:07 -0400)]
NTB: Fix transport stats for multiple devices
Currently the debugfs does not have files for all NTB transport queue
pairs. When there are multiple NTBs present in a system, the QP names
of the last transport clobber the names of previously added transport
QPs. Only the last added QPs can be observed via debugfs.
Create a directory per NTB transport to associate the QPs with that
transport. Name the directory the same as the PCI device.
Signed-off-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Jon Mason <jdmason@kudzu.us>
Allen Hubbe [Mon, 13 Jul 2015 12:07:08 +0000 (08:07 -0400)]
NTB: Fix ntb_transport out-of-order RX update
It was possible for a synchronous update of the RX index in the error
case to get ahead of the asynchronous RX index update in the normal
case. Change the RX processing to preserve an RX completion order.
There were two error cases. First, if a buffer is not present to
receive data, there would be no queue entry to preserve the RX
completion order. Instead of dropping the RX frame, leave the RX frame
in the ring. Schedule RX processing when RX entries are enqueued, in
case there are RX frames waiting in the ring to be received.
Second, if a buffer is too small to receive data, drop the frame in the
ring, mark the RX entry as done, and indicate the error in the RX entry
length. Check for a negative length in the receive callback in
ntb_netdev, and count occurrences as rx_length_errors.
Signed-off-by: Allen Hubbe <Allen.Hubbe@emc.com> Signed-off-by: Jon Mason <jdmason@kudzu.us>
Linus Torvalds [Sun, 9 Aug 2015 07:38:42 +0000 (09:38 +0200)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input
Pull input subsystem fixes from Dmitry Torokhov:
"Just small ALPS and Elan touchpads, and other driver fixups"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
Input: elantech - add special check for fw_version 0x470f01 touchpad
Input: twl4030-vibra - fix ERROR: Bad of_node_put() warning
Input: alps - only Dell laptops have separate button bits for v2 dualpoint sticks
Input: axp20x-pek - add module alias
Input: turbografx - fix potential out of bound access
Linus Torvalds [Sun, 9 Aug 2015 02:59:21 +0000 (05:59 +0300)]
Merge branch 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus
Pull MIPS fixes from Ralf Baechle:
"Another round of MIPS fixes for 4.2. No area does particularly stand
out but we have a two unpleasant ones:
- Kernel ptes are marked with a global bit which allows the kernel to
share kernel TLB entries between all processes. For this to work
both entries of an adjacent even/odd pte pair need to have the
global bit set. There has been a subtle race in setting the other
entry's global bit since ~ 2000 but it take particularly
pathological workloads that essentially do mostly vmalloc/vfree to
trigger this.
This pull request fixes the 64-bit case but leaves the case of 32
bit CPUs with 64 bit ptes unsolved for now. The unfixed cases
affect hardware that is not available in the field yet.
- Instruction emulation requires loading instructions from user space
but the current fast but simplistic approach will fail on pages
that are PROT_EXEC but !PROT_READ. For this reason we temporarily
do not permit this permission and will map pages with PROT_EXEC |
PROT_READ.
The remainder of this pull request is more or less across the field
and the short log explains them well"
* 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus:
MIPS: Make set_pte() SMP safe.
MIPS: Replace add and sub instructions in relocate_kernel.S with addiu
MIPS: Flush RPS on kernel entry with EVA
Revert "MIPS: BCM63xx: Provide a plat_post_dma_flush hook"
MIPS: BMIPS: Delete unused Kconfig symbol
MIPS: Export get_c0_perfcount_int()
MIPS: show_stack: Fix stack trace with EVA
MIPS: do_mcheck: Fix kernel code dump with EVA
MIPS: SMP: Don't increment irq_count multiple times for call function IPIs
MIPS: Partially disable RIXI support.
MIPS: Handle page faults of executable but unreadable pages correctly.
MIPS: Malta: Don't reinitialise RTC
MIPS: unaligned: Fix build error on big endian R6 kernels
MIPS: Fix sched_getaffinity with MT FPAFF enabled
MIPS: Fix build with CONFIG_OF=y for non OF-enabled targets
CPUFREQ: Loongson2: Fix broken build due to incorrect include.
Linus Torvalds [Sun, 9 Aug 2015 02:56:31 +0000 (05:56 +0300)]
Merge branch 'for-linus-4.2' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs
Pull btrfs fix from Chris Mason:
"We have a btrfs quota regression fix.
I merged this one on Thursday and have run it through tests against
current master.
Normally I wouldn't have sent this while you were finalizing rc6, but
I'm feeding mosquitoes in the adirondacks next week, so I wanted to
get this one out before leaving. I'll leave longer tests running and
check on things during the week, but I don't expect any problems"
* 'for-linus-4.2' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
btrfs: qgroup: Fix a regression in qgroup reserved space.
Linus Torvalds [Sun, 9 Aug 2015 02:54:27 +0000 (05:54 +0300)]
Merge branch 'for-rc' of git://git.kernel.org/pub/scm/linux/kernel/git/rzhang/linux
Pull thermal management fixes from Zhang Rui:
"Specifics:
- fix an error that "weight_attr" sysfs attribute is not removed
while unbinding. From: Viresh Kumar.
- fix power allocator governor tracing to return the real request.
From Javi Merino.
- remove redundant owner assignment of hisi platform thermal driver.
From Krzysztof Kozlowski.
- a couple of small fixes of Exynos thermal driver. From Krzysztof
Kozlowski and Chanwoo Choi"
* 'for-rc' of git://git.kernel.org/pub/scm/linux/kernel/git/rzhang/linux:
thermal: Drop owner assignment from platform_driver
thermal: exynos: Remove unused code related to platform_data on probe()
thermal: exynos: Add the dependency of CONFIG_THERMAL_OF instead of CONFIG_OF
thermal: exynos: Disable the regulator on probe failure
thermal: power_allocator: trace the real requested power
thermal: remove dangling 'weight_attr' device file
Linus Torvalds [Sat, 8 Aug 2015 01:38:00 +0000 (04:38 +0300)]
Merge tag 'arc-v4.2-rc6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc
Pull ARC fixes from Vineet Gupta:
"Here's a late pull request for accumulated ARC fixes which came out of
extended testing of the new ARCv2 port with LTP etc. llock/scond
livelock workaround has been reviewed by PeterZ. The changes look a
lot but I've crafted them into finer grained patches for better
tracking later.
I have some more fixes (ARC Futex backend) ready to go but those will
have to wait for tglx to return from vacation.
Summary:
- Enable a reduced config of HS38 (w/o div-rem, ll64...)
- Add software workaround for LLOCK/SCOND livelock
- Fallout of a recent pt_regs update"
* tag 'arc-v4.2-rc6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc:
ARCv2: spinlock/rwlock/atomics: reduce 1 instruction in exponential backoff
ARC: Make pt_regs regs unsigned
ARCv2: spinlock/rwlock: Reset retry delay when starting a new spin-wait cycle
ARCv2: spinlock/rwlock/atomics: Delayed retry of failed SCOND with exponential backoff
ARC: LLOCK/SCOND based rwlock
ARC: LLOCK/SCOND based spin_lock
ARC: refactor atomic inline asm operands with symbolic names
Revert "ARCv2: STAR 9000837815 workaround hardware exclusive transactions livelock"
ARCv2: [axs103_smp] Reduce clk for Quad FPGA configs
ARCv2: Fix the peripheral address space detection
ARCv2: allow selection of page size for MMUv4
ARCv2: lib: memset: Don't assume 64-bit load/stores
ARCv2: lib: memcpy: Missing PREFETCHW
ARCv2: add knob for DIV_REV in Kconfig
ARC/time: Migrate to new 'set-state' interface
Linus Torvalds [Sat, 8 Aug 2015 01:36:40 +0000 (04:36 +0300)]
Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost
Pull virtio fix from Michael Tsirkin:
"A last minute fix for the new virtio input driver. It seems pretty
obvious, and the problem it's fixing would be quite hard to debug"
* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
virtio-input: reset device and detach unused during remove
Linus Torvalds [Sat, 8 Aug 2015 01:35:14 +0000 (04:35 +0300)]
Merge tag 'dm-4.2-fixes-4' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm
Pull device mapper fixes from Mike Snitzer:
- stable fix for a dm_merge_bvec() regression on 32 bit Fedora systems.
- fix for a 4.2 DM thinp discard regression due to inability to
properly delete a range of blocks in a data mapping btree.
* tag 'dm-4.2-fixes-4' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
dm btree remove: fix bug in remove_one()
dm: fix dm_merge_bvec regression on 32 bit systems
Linus Torvalds [Sat, 8 Aug 2015 01:33:35 +0000 (04:33 +0300)]
Merge tag 'sound-4.2-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"The only bulk changes in this request is ABI updates for ASoC topology
API. It's a new API that was introduced in 4.2, and we'd like to
avoid ABI change after the release, so it's taken now. As there is no
real in-tree user for this API, it should be fairly safe.
Other than that, the usual small fixes are found in various drivers:
ASoC cs4265, rt5645, intel-sst, firewire, oxygen and HD-audio"
* tag 'sound-4.2-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ASoC: topology: Add private data type and bump ABI version to 3
ASoC: topology: Add ops support to byte controls UAPI
ASoC: topology: Update TLV support so we can support more TLV types
ASoC: topology: add private data to manifest
ASoC: topology: Add subsequence in topology
ALSA: hda - one Dell machine needs the headphone white noise fixup
ALSA: fireworks/firewire-lib: add support for recent firmware quirk
Revert "ALSA: fireworks: add support for AudioFire2 quirk"
ASoC: topology: fix typo in soc_tplg_kcontrol_bind_io()
ALSA: HDA: Dont check return for snd_hdac_chip_readl
ALSA: HDA: Fix stream assignment for host in decoupled mode
ASoC: rt5645: Fix lost pin setting for DMIC1
ALSA: oxygen: Fix logical-not-parentheses warning
ASoC: Intel: sst_byt: fix initialize 'NULL device *' issue
ASoC: Intel: haswell: fix initialize 'NULL device *' issue
ASoC: cs4265: Fix setting dai format for Left/Right Justified
Linus Torvalds [Sat, 8 Aug 2015 01:30:37 +0000 (04:30 +0300)]
Merge tag 'hwmon-for-linus-v4.2-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging
Pull hwmon fixes from Guenter Roeck:
- Export module alias information in g762 and nct7904 to support
auto-loading.
- Blacklist Dell Studio XPS 8100 in dell-smm to fix fan control
problems.
* tag 'hwmon-for-linus-v4.2-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
hwmon: (g762) Export OF module alias information
hwmon: (nct7904) Export I2C module alias information
hwmon: (dell-smm) Blacklist Dell Studio XPS 8100
Linus Torvalds [Sat, 8 Aug 2015 01:27:51 +0000 (04:27 +0300)]
Merge tag 'usb-4.2-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
Pull USB fixes from Greg KH:
"Here are some USB and PHY fixes for 4.2-rc6 that resolve some reported
issues.
All of these have been in the linux-next tree for a while, full
details on the patches are in the shortlog below"
* tag 'usb-4.2-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
ARM: dts: dra7: Add syscon-pllreset syscon to SATA PHY
drivers/usb: Delete XHCI command timer if necessary
xhci: fix off by one error in TRB DMA address boundary check
usb: udc: core: add device_del() call to error pathway
phy: ti-pipe3: i783 workaround for SATA lockup after dpll unlock/relock
phy-sun4i-usb: Add missing EXPORT_SYMBOL_GPL for sun4i_usb_phy_set_squelch_detect
USB: sierra: add 1199:68AB device ID
usb: gadget: f_printer: actually limit the number of instances
usb: gadget: f_hid: actually limit the number of instances
usb: gadget: f_uac2: fix calculation of uac2->p_interval
usb: gadget: bdc: fix a driver crash on disconnect
usb: chipidea: ehci_init_driver is intended to call one time
USB: qcserial: Add support for Dell Wireless 5809e 4G Modem
USB: qcserial/option: make AT URCs work for Sierra Wireless MC7305/MC7355
Linus Torvalds [Sat, 8 Aug 2015 01:26:31 +0000 (04:26 +0300)]
Merge tag 'staging-4.2-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging
Pull staging driver fixes from Greg KH:
"Here are three bugfixes for some staging driver issues that have been
reported. All have been in the linux-next tree for a while"
* tag 'staging-4.2-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
staging: lustre: Include unaligned.h instead of access_ok.h
staging: vt6655: vnt_bss_info_changed check conf->beacon_rate is not NULL
staging: comedi: das1800: add missing break in switch
Linus Torvalds [Sat, 8 Aug 2015 01:25:10 +0000 (04:25 +0300)]
Merge tag 'char-misc-4.2-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc
Pull char/misc fixes from Greg KH:
"Here are some extcon fixes for 4.2-rc6 that resolve some reported
problems.
All have been in linux-next for a while"
* tag 'char-misc-4.2-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
extcon: Fix extcon_cable_get_state() from getting old state after notification
extcon: Fix hang and extcon_get/set_cable_state().
extcon: palmas: Fix NULL pointer error
Linus Torvalds [Sat, 8 Aug 2015 01:18:14 +0000 (04:18 +0300)]
Merge tag 'drm-intel-fixes-2015-08-07' of git://anongit.freedesktop.org/drm-intel
Pull drm fixes from Daniel Vetter:
"One i915 regression fix and a drm core one since Dave's not around,
both introduced in 4.2 so not cc: stable.
The fix for the warning Ted reported isn't in here yet since he didn't
yet supply a tested-by and I can't repro this one myself (it's in
fixup code that needs firmware doing something i915 wouldn't do)"
* tag 'drm-intel-fixes-2015-08-07' of git://anongit.freedesktop.org/drm-intel:
drm/vblank: Use u32 consistently for vblank counters
drm/i915: Allow parsing of variable size child device entries from VBT
Tom Herbert [Wed, 5 Aug 2015 16:39:27 +0000 (09:39 -0700)]
net: Fix race condition in store_rps_map
There is a race condition in store_rps_map that allows jump label
count in rps_needed to go below zero. This can happen when
concurrently attempting to set and a clear map.
Scenario:
1. rps_needed count is zero
2. New map is assigned by setting thread, but rps_needed count _not_ yet
incremented (rps_needed count still zero)
2. Map is cleared by second thread, old_map set to that just assigned
3. Second thread performs static_key_slow_dec, rps_needed count now goes
negative
Fix is to increment or decrement rps_needed under the spinlock.
Signed-off-by: Tom Herbert <tom@herbertland.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Carol L Soto [Wed, 5 Aug 2015 16:05:32 +0000 (11:05 -0500)]
net/mlx5_core: Set log_uar_page_sz for non 4K page size architecture
failed to configure the page size for architectures with page size
different than 4K.
Fixes: 938fe83 ("net/mlx5_core: New device capabilities handling") Signed-off-by: Carol L Soto <clsoto@linux.vnet.ibm.com> Acked-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 7 Aug 2015 22:51:10 +0000 (15:51 -0700)]
Merge tag 'batman-adv-fix-for-davem' of git://git.open-mesh.org/linux-merge
Antonio Quartulli says:
====================
Included changes:
- prevent DAT from replying on behalf of local clients and confuse L2
bridges
- fix crash on double list removal of TT objects (tt_local_entry)
- fix crash due to missing NULL checks
- initialize bw values for new GWs objects to prevent memory leak
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Wenyu Zhang [Wed, 5 Aug 2015 07:30:47 +0000 (00:30 -0700)]
openvswitch: Make 100 percents packets sampled when sampling rate is 1.
When sampling rate is 1, the sampling probability is UINT32_MAX. The packet
should be sampled even the prandom32() generate the number of UINT32_MAX.
And none packet need be sampled when the probability is 0.
Signed-off-by: Wenyu Zhang <wenyuz@vmware.com> Acked-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 7 Aug 2015 18:53:06 +0000 (11:53 -0700)]
Merge branch 'be2net-fixes'
Sathya Perla says:
====================
be2net: patch set
This patch set contains 2 driver fixes to a Lancer HW issue and a fix
to a double free bug. Pls apply to the "net" tree. Thanks!
Patch 1 now enables filters only after creating RXQs. This is done as
HW issues were observed on Lancer adapters if filters
(flags, mac addrs etc) are enabled *before* creating RXQs. This patch
changes the driver design by enabling filters in be_open() --
instead of be_setup() -- after RXQs are created and buffers posted.
Patch 2 fixes an RX stall issue that was seen on Lancer adapters when
RXQs are destroyed while they are in an "out of buffer" state.
This patch fixes this issue by posting 64 buffers to each RXQ before
destroying them in the close path. This is done after ensuring that no
more new packets are selected for transfer to the RXQs by disabling
interface filters.
Patch 3 protects eqo->affinity_mask variable from being freed twice and
resulting in a crash. It's now freed only when EQs haven't yet been
destroyed.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Kalesh AP [Wed, 5 Aug 2015 07:27:50 +0000 (03:27 -0400)]
be2net: protect eqo->affinity_mask from getting freed twice
There are paths in the driver such as an unrecoverable error (UE) detection
followed by a driver unload wherein be_clear() is invoked twice.
Individual data structures are reset so that they are not cleaned/freed
twice. This patch does the same for eqo->affinity_mask. It is freed only
if EQs haven't yet been destroyed. This fixes a possible crash when
affinity_mask is freed twice.
Signed-off-by: Kalesh AP <kalesh.purayil@avagotech.com> Signed-off-by: Sathya Perla <sathya.perla@avagotech.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Kalesh AP [Wed, 5 Aug 2015 07:27:49 +0000 (03:27 -0400)]
be2net: post buffers before destroying RXQs in Lancer
An RX stall issue was seen on Lancer adapters, when RXQs are destroyed
while they are in an "out of buffer" state. This patch fixes this issue
by posting 64 buffers to each RXQ before destroying them in the close path.
This is done after ensuring that no more new packets are selected for
transfer to the RXQs by disabling interface filters.
Signed-off-by: Kalesh AP <kalesh.purayil@avagotech.com> Signed-off-by: Sathya Perla <sathya.perla@avagotech.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Kalesh AP [Wed, 5 Aug 2015 07:27:48 +0000 (03:27 -0400)]
be2net: enable IFACE filters only after creating RXQs
HW issues were observed on Lancer adapters if IFACE filters
(flags, mac addrs etc) are enabled *before* creating RXQs. This patch
changes the driver design by enabling filters in be_open() --
instead of be_setup() -- after RXQs are created and buffers posted.
Two new wrapper functions, be_enable_if_filters() and
be_disable_if_filters() are introduced to enable/disable IFACE filters in
be_open()/be_close() respectively. In be_setup() the IFACE is now created
only with the RSS flag.
Signed-off-by: Kalesh AP <kalesh.purayil@avagotech.com> Signed-off-by: Sathya Perla <sathya.perla@avagotech.com> Signed-off-by: David S. Miller <davem@davemloft.net>
vxlan: combine VXLAN_FLOWBASED into VXLAN_COLLECT_METADATA
IFLA_VXLAN_FLOWBASED is useless without IFLA_VXLAN_COLLECT_METADATA,
so combine them into single IFLA_VXLAN_COLLECT_METADATA flag.
'flowbased' doesn't convey real meaning of the vxlan tunnel mode.
This mode can be used by routing, tc+bpf and ovs.
Only ovs is strictly flow based, so 'collect metadata' is a better
name for this tunnel mode.
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Acked-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 7 Aug 2015 18:29:58 +0000 (11:29 -0700)]
Merge branch 'rds-tcp-netns'
Sowmini Varadhan says:
====================
RDS-TCP: Network namespace support
This patch series contains the set of changes to correctly set up
the infra for PF_RDS sockets that use TCP as the transport in multiple
network namespaces.
Patch 1 in the series is the minimal set of changes to allow
a single instance of RDS-TCP to run in any (i.e init_net or other) net
namespace. The changes in this patch set ensure that the execution of
'modprobe [-r] rds_tcp' sets up the kernel TCP sockets
relative to the current netns, so that RDS applications can send/recv
packets from that netns, and the netns can later be deleted cleanly.
Patch 2 of the series further allows multiple RDS-TCP instances,
one per network namespace. The changes in this patch allows dynamic
creation/tear-down of RDS-TCP client and server sockets across all
current and future namespaces.
v2 changes from RFC sent out earlier:
David Ahern comments in patch 1, net_device notifier in patch 2,
patch 3 broken off and submitted separately.
v3: Cong Wang review comments.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
RDS-TCP: Support multiple RDS-TCP listen endpoints, one per netns.
Register pernet subsys init/stop functions that will set up
and tear down per-net RDS-TCP listen endpoints. Unregister
pernet subusys functions on 'modprobe -r' to clean up these
end points.
Enable keepalive on both accept and connect socket endpoints.
The keepalive timer expiration will ensure that client socket
endpoints will be removed as appropriate from the netns when
an interface is removed from a namespace.
Register a device notifier callback that will clean up all
sockets (and thus avoid the need to wait for keepalive timeout)
when the loopback device is unregistered from the netns indicating
that the netns is getting deleted.
Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
RDS-TCP: Make RDS-TCP work correctly when it is set up in a netns other than init_net
Open the sockets calling sock_create_kern() with the correct struct net
pointer, and use that struct net pointer when verifying the
address passed to rds_bind().
Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Joe Thornber [Fri, 7 Aug 2015 15:33:01 +0000 (16:33 +0100)]
dm btree remove: fix bug in remove_one()
remove_one() was not incrementing the key for the beginning of the
range, so not all entries were being removed. This resulted in
discards that were not unmapping all blocks.
Fixes: 4ec331c3ea ("dm btree: add dm_btree_remove_leaves()") Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
drm/vblank: Fixup and document timestamp update/read barriers
I've switched vblank->count from atomic_t to unsigned long and
accidentally created an integer comparison bug in
drm_vblank_count_and_time since vblanke->count might overflow the u32
local copy and hence the retry loop never succeed.
Fix this by consistently using u32.
Cc: Michel Dänzer <michel@daenzer.net> Reported-by: Michel Dänzer <michel@daenzer.net> Reviewed-by: Thierry Reding <treding@nvidia.com> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Takashi Iwai [Fri, 7 Aug 2015 11:53:41 +0000 (13:53 +0200)]
Merge tag 'asoc-fix-v4.2-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into for-linus
ASoC: Fixes for v4.2
There are a couple of small driver specific fixes here but the
overwhelming bulk of these changes are fixes to the topology ABI that
has been newly introduced in v4.2. Once this makes it into a release we
will have to firm this up but for now getting enhancements in before
they've made it into a release is the most expedient thing.
David S. Miller [Fri, 7 Aug 2015 07:19:09 +0000 (00:19 -0700)]
Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue
Jeff Kirsher says:
====================
Intel Wired LAN Driver Updates 2015-08-05
This series contains updates to i40e, i40evf and e1000e.
Anjali adds support for x772 devices to i40e and i40evf. With the added
support, x772 supports offloading of the outer UDP transmit and receive
checksum for tunneled packets. Also supports evicting ATR filters in the
hardware, so update the driver with this new feature set.
Raanan provides several fixes for e1000e, first rectifies the Energy
Efficient Ethernet in Sx code so that it only applies to parts that
actually support EEE in Sx. Fix whitespace and moved ICH8 related define
to the proper context. Fixed the ASPM locking which was reported by
Bjorn Helgaas. Fix a workaround implementation for systime which could
experience a large non-linear increment of the systime value when
checking for overflow.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Jason Wang [Wed, 5 Aug 2015 02:34:04 +0000 (10:34 +0800)]
virtio-net: drop NETIF_F_FRAGLIST
virtio declares support for NETIF_F_FRAGLIST, but assumes
that there are at most MAX_SKB_FRAGS + 2 fragments which isn't
always true with a fraglist.
A longer fraglist in the skb will make the call to skb_to_sgvec overflow
the sg array, leading to memory corruption.
Drop NETIF_F_FRAGLIST so we only get what we can handle.
Cc: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>