]> git.kernelconcepts.de Git - karo-tx-linux.git/log
karo-tx-linux.git
11 years agosparc: bpf_jit_comp: add VLAN instructions for BPF JIT
Daniel Borkmann [Sun, 4 Nov 2012 16:59:30 +0000 (16:59 +0000)]
sparc: bpf_jit_comp: add VLAN instructions for BPF JIT

This patch is a follow-up for patch "net: filter: add vlan tag access"
to support the new VLAN_TAG/VLAN_TAG_PRESENT accessors in BPF JIT.

Signed-off-by: Daniel Borkmann <daniel.borkmann@tik.ee.ethz.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agor8169: enable internal ASPM and clock request settings
hayeswang [Thu, 1 Nov 2012 16:46:28 +0000 (16:46 +0000)]
r8169: enable internal ASPM and clock request settings

The following chips need to enable internal settings to let ASPM
and clock request work.

RTL8111E-VL, RTL8111F, RTL8411, RTL8111G
RTL8105, RTL8402, RTL8106

Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agobridge: Avoid 'statement with no effect' compiler warnings
Lee Jones [Sat, 3 Nov 2012 22:02:30 +0000 (23:02 +0100)]
bridge: Avoid 'statement with no effect' compiler warnings

Instead of issuing (0) statements when !CONFIG_SYSFS which will cause
'warning: ', we'll use inline statements instead. This will effectively
do the same thing, but suppress any unnecessary warnings.

Cc: Stephen Hemminger <shemminger@vyatta.com>
Cc: bridge@lists.linux-foundation.org
Cc: netdev@vger.kernel.org
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agodrivers/net/ethernet/ibm/emac/mal.c: use WARN
Julia Lawall [Sat, 3 Nov 2012 00:58:31 +0000 (00:58 +0000)]
drivers/net/ethernet/ibm/emac/mal.c: use WARN

Use WARN rather than printk followed by WARN_ON(1), for conciseness.

A simplified version of the semantic patch that makes this transformation
is as follows: (http://coccinelle.lip6.fr/)

// <smpl>
@@
expression list es;
@@

-printk(
+WARN(1,
  es);
-WARN_ON(1);
// </smpl>

Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agoatp: remove set_rx_mode_8012()
Paul Bolle [Fri, 2 Nov 2012 23:53:15 +0000 (23:53 +0000)]
atp: remove set_rx_mode_8012()

Building atp.o triggers this GCC warning:
    drivers/net/ethernet/realtek/atp.c: In function ‘set_rx_mode’:
    drivers/net/ethernet/realtek/atp.c:871:26: warning: ‘mc_filter[0]’ may be used uninitialized in this function [-Wuninitialized]

GCC is correct. In promiscuous mode 'mc_filter' will be used
uninitialized in set_rx_mode_8012(), which is apparently inlined into
set_rx_mode().

But it turns out set_rx_mode_8012() will never be called, since
net_local.chip_type will always be RTL8002. So we can just remove
set_rx_mode_8012() and do some related cleanups.

Signed-off-by: Paul Bolle <pebolle@tiscali.nl>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agocpsw: fix leaking IO mappings
Richard Cochran [Fri, 2 Nov 2012 22:25:30 +0000 (22:25 +0000)]
cpsw: fix leaking IO mappings

The CPSW driver remaps two different IO regions, but fails to unmap them
both. This patch fixes the issue by calling iounmap in the appropriate
places.

Signed-off-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agocpsw: rename register banks to match the reference manual, part 2
Richard Cochran [Fri, 2 Nov 2012 22:25:29 +0000 (22:25 +0000)]
cpsw: rename register banks to match the reference manual, part 2

The code mixes up the CPSW_SS and the CPSW_WR register naming. This patch
changes the names to conform to the published Technical Reference Manual
from TI, in order to make working on the code less confusing.

Signed-off-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agonet: fix bridge notify hook to manage flags correctly
John Fastabend [Fri, 2 Nov 2012 16:32:36 +0000 (16:32 +0000)]
net: fix bridge notify hook to manage flags correctly

The bridge notify hook rtnl_bridge_notify() was not handling the
case where the master flags was set or with both flags set. First
flags are not being passed correctly and second the logic to parse
them is broken.

This patch passes the original flags value and fixes the
logic.

Reported-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agomacb: Keep driver's speed/duplex in sync with actual NCFGR
Vitalii Demianets [Fri, 2 Nov 2012 07:09:24 +0000 (07:09 +0000)]
macb: Keep driver's speed/duplex in sync with actual NCFGR

When underlying phy driver restores its state very fast after being brought
down and up so that macb driver function macb_handle_link_change() was never
called with link state "down", driver's internal representation of phy speed
and duplex (bp->speed and bp->duplex) didn't change. So, macb driver sees no
reason to perform actual write to the NCFGR register, although the speed and
duplex settings in that register were reset when interface was brought down
and up. In that case actual phy speed and duplex differ from NCFGR settings.
The patch fixes that by keeping internal driver representation of speed and
duplex in sync with actual content of NCFGR.

Signed-off-by: Vitalii Demianets <vitas@nppfactor.kiev.ua>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agonet: neterion: Do not break word unregister.
YOSHIFUJI Hideaki / 吉藤英明 [Fri, 2 Nov 2012 04:45:24 +0000 (04:45 +0000)]
net: neterion: Do not break word unregister.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agonet: sh_eth: Fix a typo - replace regist with register.
YOSHIFUJI Hideaki / 吉藤英明 [Fri, 2 Nov 2012 04:45:07 +0000 (04:45 +0000)]
net: sh_eth: Fix a typo - replace regist with register.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agoptp: fixup Kconfig for two PHC drivers.
Richard Cochran [Thu, 1 Nov 2012 23:57:38 +0000 (23:57 +0000)]
ptp: fixup Kconfig for two PHC drivers.

Ben Hutchings recently came up with a better way to handle the kconfig
dependencies for the PTP hardware clocks. This patch converts one new and
one older driver to the new scheme.

Signed-off-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agohtb: improved accuracy at high rates
Vimalkumar [Wed, 31 Oct 2012 06:04:11 +0000 (06:04 +0000)]
htb: improved accuracy at high rates

Current HTB (and TBF) uses rate table computed by the "tc"
userspace program, which has the following issue:

The rate table has 256 entries to map packet lengths
to token (time units).  With TSO sized packets, the
256 entry granularity leads to loss/gain of rate,
making the token bucket inaccurate.

Thus, instead of relying on rate table, this patch
explicitly computes the time and accounts for packet
transmission times with nanosecond granularity.

This greatly improves accuracy of HTB with a wide
range of packet sizes.

Example:

tc qdisc add dev $dev root handle 1: \
        htb default 1

tc class add dev $dev classid 1:1 parent 1: \
        rate 5Gbit mtu 64k

Here is an example of inaccuracy:

$ iperf -c host -t 10 -i 1

With old htb:
eth4:   34.76 Mb/s In  5827.98 Mb/s Out -  65836.0 p/s In  481273.0 p/s Out
[SUM]  9.0-10.0 sec   669 MBytes  5.61 Gbits/sec
[SUM]  0.0-10.0 sec  6.50 GBytes  5.58 Gbits/sec

With new htb:
eth4:   28.36 Mb/s In  5208.06 Mb/s Out -  53704.0 p/s In  430076.0 p/s Out
[SUM]  9.0-10.0 sec   594 MBytes  4.98 Gbits/sec
[SUM]  0.0-10.0 sec  5.80 GBytes  4.98 Gbits/sec

The bits per second on the wire is still 5200Mb/s with new HTB
because qdisc accounts for packet length using skb->len, which
is smaller than total bytes on the wire if GSO is used.  But
that is for another patch regardless of how time is accounted.

Many thanks to Eric Dumazet for review and feedback.

Signed-off-by: Vimalkumar <j.vimal@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agovxlan: allow a user to set TTL value
Vincent Bernat [Tue, 30 Oct 2012 10:27:16 +0000 (10:27 +0000)]
vxlan: allow a user to set TTL value

"ip link add ... type vxlan ... ttl X" allows a user to set the TTL
used by a VXLAN for encapsulation. The provided value was ignored by
vxlan module and the default value of 1 was used when encapsulating
multicast packets.

Signed-off-by: Vincent Bernat <bernat@luffy.cx>
Acked-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agosmsc75xx: add wol support for more frame types
Steve Glendinning [Tue, 30 Oct 2012 07:46:32 +0000 (07:46 +0000)]
smsc75xx: add wol support for more frame types

This patch adds support for wol wakeup on unicast, broadcast,
multicast and arp frames.

Signed-off-by: Steve Glendinning <steve.glendinning@shawell.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agoif_ether.h: add B.A.T.M.A.N.-Advanced Ethertype
Antonio Quartulli [Tue, 30 Oct 2012 04:08:41 +0000 (04:08 +0000)]
if_ether.h: add B.A.T.M.A.N.-Advanced Ethertype

Add Ethertype 0x4305 (not an officially registered id).
This Ethertype is used by every frame generated by B.A.T.M.A.N.-Advanced. Its
definition is currently batman-adv local only and since it is not officially
registered it is better to make its definition kernel-wide so that we avoid
collisions given by future unofficial uses of the same Ethertype.

Signed-off-by: Antonio Quartulli <ordex@autistici.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agor8169: Kill SafeMtu macro
Kirill Smelkov [Mon, 29 Oct 2012 07:55:12 +0000 (07:55 +0000)]
r8169: Kill SafeMtu macro

After d58d46b5 (r8169: jumbo fixes.) max frame len is stored in
rtl_chip_infos[].jumbo_max for each chip and SafeMtu should be gone.

Signed-off-by: Kirill Smelkov <kirr@mns.spb.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agoipv6: introduce ip6_rt_put()
Amerigo Wang [Mon, 29 Oct 2012 00:13:19 +0000 (00:13 +0000)]
ipv6: introduce ip6_rt_put()

As suggested by Eric, we could introduce a helper function
for ipv6 too, to avoid checking if rt is NULL before
dst_release().

Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agoipv4: avoid a test in ip_rt_put()
Eric Dumazet [Sun, 28 Oct 2012 22:33:23 +0000 (22:33 +0000)]
ipv4: avoid a test in ip_rt_put()

We can save a test in ip_rt_put(), considering dst_release() accepts
a NULL parameter, and dst is first element in rtable.

Add a BUILD_BUG_ON() to catch any change that could break this
assertion.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Cong Wang <amwang@redhat.com>
Acked-by: Cong Wang <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agosctp: Clean up type-punning in sctp_cmd_t union
Neil Horman [Mon, 29 Oct 2012 08:32:13 +0000 (08:32 +0000)]
sctp: Clean up type-punning in sctp_cmd_t union

Lots of points in the sctp_cmd_interpreter function treat the sctp_cmd_t arg as
a void pointer, even though they are written as various other types.  Theres no
need for this as doing so just leads to possible type-punning issues that could
cause crashes, and if we remain type-consistent we can actually just remove the
void * member of the union entirely.

Change Notes:

v2)
* Dropped chunk that modified SCTP_NULL to create a marker pattern
 should anyone try to use a SCTP_NULL() assigned sctp_arg_t, Assigning
 to .zero provides the same effect and should be faster, per Vlad Y.

v3)
* Reverted part of V2, opting to use memset instead of .zero, so that
 the entire union is initalized thus avoiding the i164 speculative load
 problems previously encountered, per Dave M..  Also rewrote
 SCTP_[NO]FORCE so as to use common infrastructure a little more

Signed-off-by: Neil Horman <nhorman@tuxdriver.com
CC: Vlad Yasevich <vyasevich@gmail.com>
CC: "David S. Miller" <davem@davemloft.net>
CC: linux-sctp@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agoipv6: remove a useless NULL check
Amerigo Wang [Sun, 28 Oct 2012 17:43:53 +0000 (17:43 +0000)]
ipv6: remove a useless NULL check

In dev_forward_change(), it is useless to check if idev->dev
is NULL, it is always non-NULL here.

Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agopktgen: clean up ktime_t helpers
Daniel Borkmann [Sun, 28 Oct 2012 08:27:19 +0000 (08:27 +0000)]
pktgen: clean up ktime_t helpers

Some years ago, the ktime_t helper functions ktime_now() and ktime_lt()
have been introduced. Instead of defining them inside pktgen.c, they
should either use ktime_t library functions or, if not available, they
should be defined in ktime.h, so that also others can benefit from them.
ktime_compare() is introduced with a similar notion as in timespec_compare().

Signed-off-by: Daniel Borkmann <daniel.borkmann@tik.ee.ethz.ch>
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Cc: Stephen Hemminger <shemminger@vyatta.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agotcp: better retrans tracking for defer-accept
Eric Dumazet [Sat, 27 Oct 2012 23:16:46 +0000 (23:16 +0000)]
tcp: better retrans tracking for defer-accept

For passive TCP connections using TCP_DEFER_ACCEPT facility,
we incorrectly increment req->retrans each time timeout triggers
while no SYNACK is sent.

SYNACK are not sent for TCP_DEFER_ACCEPT that were established (for
which we received the ACK from client). Only the last SYNACK is sent
so that we can receive again an ACK from client, to move the req into
accept queue. We plan to change this later to avoid the useless
retransmit (and potential problem as this SYNACK could be lost)

TCP_INFO later gives wrong information to user, claiming imaginary
retransmits.

Decouple req->retrans field into two independent fields :

num_retrans : number of retransmit
num_timeout : number of timeouts

num_timeout is the counter that is incremented at each timeout,
regardless of actual SYNACK being sent or not, and used to
compute the exponential timeout.

Introduce inet_rtx_syn_ack() helper to increment num_retrans
only if ->rtx_syn_ack() succeeded.

Use inet_rtx_syn_ack() from tcp_check_req() to increment num_retrans
when we re-send a SYNACK in answer to a (retransmitted) SYN.
Prior to this patch, we were not counting these retransmits.

Change tcp_v[46]_rtx_synack() to increment TCP_MIB_RETRANSSEGS
only if a synack packet was successfully queued.

Reported-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Julian Anastasov <ja@ssi.bg>
Cc: Vijay Subramanian <subramanian.vijay@gmail.com>
Cc: Elliott Hughes <enh@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agonet: Fix continued iteration in rtnl_bridge_getlink()
Ben Hutchings [Fri, 2 Nov 2012 12:56:52 +0000 (12:56 +0000)]
net: Fix continued iteration in rtnl_bridge_getlink()

Commit e5a55a898720096f43bc24938f8875c0a1b34cd7 ('net: create generic
bridge ops') broke the handling of a non-zero starting index in
rtnl_bridge_getlink() (based on the old br_dump_ifinfo()).

When the starting index is non-zero, we need to increment the current
index for each entry that we are skipping.  Also, we need to check the
index before both cases, since we may previously have stopped
iteration between getting information about a device from its master
and from itself.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Tested-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agoipv6/multipath: remove flag NLM_F_EXCL after the first nexthop
Nicolas Dichtel [Thu, 1 Nov 2012 22:58:22 +0000 (22:58 +0000)]
ipv6/multipath: remove flag NLM_F_EXCL after the first nexthop

fib6_add_rt2node() will reject the nexthop if this flag is set, so
we perform the check only for the first nexthop.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agoeth: Rename and properly align br_reserved_address array
Ben Hutchings [Thu, 1 Nov 2012 09:12:02 +0000 (09:12 +0000)]
eth: Rename and properly align br_reserved_address array

Since this array is no longer part of the bridge driver, it should
have an 'eth' prefix not 'br'.

We also assume that either it's 16-bit-aligned or the architecture has
efficient unaligned access.  Ensure the first of these is true by
explicitly aligning it.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Acked-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agoeth: Make is_link_local() consistent with other address tests
Ben Hutchings [Thu, 1 Nov 2012 09:11:11 +0000 (09:11 +0000)]
eth: Make is_link_local() consistent with other address tests

Function name should include '_ether_addr'.
Return type should be bool.
Parameter name should be 'addr' not 'dest' (also matching kernel-doc).

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Acked-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agobridge: Use is_link_local() in store_group_addr()
Ben Hutchings [Thu, 1 Nov 2012 09:10:04 +0000 (09:10 +0000)]
bridge: Use is_link_local() in store_group_addr()

Parse the string into an array of bytes rather than ints, so we can
use is_link_local() rather than reimplementing it.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agosfc: Select PTP_1588_CLOCK
Ben Hutchings [Thu, 1 Nov 2012 11:22:22 +0000 (11:22 +0000)]
sfc: Select PTP_1588_CLOCK

This was missed in commit a24006ed12616bde1bbdb26868495906a212d8dc
('ptp: Enable clock drivers along with associated net/PHY drivers')
which enabled sfc's clock driver unconditionally.

Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agovhost-net: reduce vq polling on tx zerocopy
Michael S. Tsirkin [Thu, 1 Nov 2012 09:16:55 +0000 (09:16 +0000)]
vhost-net: reduce vq polling on tx zerocopy

It seems that to avoid deadlocks it is enough to poll vq before
 we are going to use the last buffer.  This is faster than
c70aa540c7a9f67add11ad3161096fb95233aa2e.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agovhost-net: select tx zero copy dynamically
Michael S. Tsirkin [Thu, 1 Nov 2012 09:16:51 +0000 (09:16 +0000)]
vhost-net: select tx zero copy dynamically

Even when vhost-net is in zero-copy transmit mode,
net core might still decide to copy the skb later
which is somewhat slower than a copy in user
context: data copy overhead is added to the cost of
page pin/unpin. The result is that enabling tx zero copy
option leads to higher CPU utilization for guest to guest
and guest to host traffic.

To fix this, suppress zero copy tx after a given number of
packets triggered late data copy. Re-enable periodically
to detect workload changes.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agovhost: move -net specific code out
Michael S. Tsirkin [Thu, 1 Nov 2012 09:16:46 +0000 (09:16 +0000)]
vhost: move -net specific code out

Zerocopy handling code is vhost-net specific.
Move it from vhost.c/vhost.h out to net.c

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agovhost: track zero copy failures using DMA length
Michael S. Tsirkin [Thu, 1 Nov 2012 09:16:42 +0000 (09:16 +0000)]
vhost: track zero copy failures using DMA length

This will be used to disable zerocopy when error rate
is high.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agovhost-net: cleanup macros for DMA status tracking
Michael S. Tsirkin [Thu, 1 Nov 2012 09:16:37 +0000 (09:16 +0000)]
vhost-net: cleanup macros for DMA status tracking

Better document macros for DMA tracking. Add an
explicit one for DMA in progress instead of
relying on user supplying len != 1.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agotun: report orphan frags errors to zero copy callback
Michael S. Tsirkin [Thu, 1 Nov 2012 09:16:32 +0000 (09:16 +0000)]
tun: report orphan frags errors to zero copy callback

When tun transmits a zero copy skb, it orphans the frags
which might need to allocate extra memory, in atomic context.
If that fails, notify ubufs callback before freeing the skb
as a hint that device should disable zerocopy mode.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agoskb: api to report errors for zero copy skbs
Michael S. Tsirkin [Thu, 1 Nov 2012 09:16:28 +0000 (09:16 +0000)]
skb: api to report errors for zero copy skbs

Orphaning frags for zero copy skbs needs to allocate data in atomic
context so is has a chance to fail. If it does we currently discard
the skb which is safe, but we don't report anything to the caller,
so it can not recover by e.g. disabling zero copy.

Add an API to free skb reporting such errors: this is used
by tun in case orphaning frags fails.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agoskb: report completion status for zero copy skbs
Michael S. Tsirkin [Thu, 1 Nov 2012 09:16:22 +0000 (09:16 +0000)]
skb: report completion status for zero copy skbs

Even if skb is marked for zero copy, net core might still decide
to copy it later which is somewhat slower than a copy in user context:
besides copying the data we need to pin/unpin the pages.

Add a parameter reporting such cases through zero copy callback:
if this happens a lot, device can take this into account
and switch to copying in user context.

This patch updates all users but ignores the passed value for now:
it will be used by follow-up patches.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agoMerge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net...
David S. Miller [Fri, 2 Nov 2012 22:45:35 +0000 (18:45 -0400)]
Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-next

Jeff Kirsher says:

====================
This series contains updates to igb, ixgbe and e1000.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agovlan: use IS_ENABLED()
Amerigo Wang [Mon, 29 Oct 2012 17:22:28 +0000 (17:22 +0000)]
vlan: use IS_ENABLED()

#if defined(CONFIG_FOO) || defined(CONFIG_FOO_MODULE)

can be replaced by

#if IS_ENABLED(CONFIG_FOO)

Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agoipv6: use IS_ENABLED()
Amerigo Wang [Mon, 29 Oct 2012 16:23:10 +0000 (16:23 +0000)]
ipv6: use IS_ENABLED()

#if defined(CONFIG_FOO) || defined(CONFIG_FOO_MODULE)

can be replaced by

#if IS_ENABLED(CONFIG_FOO)

Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agortnl/ipv4: use netconf msg to advertise rp_filter status
Nicolas Dichtel [Mon, 29 Oct 2012 04:53:27 +0000 (04:53 +0000)]
rtnl/ipv4: use netconf msg to advertise rp_filter status

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agoppp: make ppp_get_stats64 static
stephen hemminger [Mon, 29 Oct 2012 08:34:02 +0000 (08:34 +0000)]
ppp: make ppp_get_stats64 static

This was picked up by sparse.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agoFEC: Add time stamping code and a PTP hardware clock
Frank Li [Tue, 30 Oct 2012 18:25:31 +0000 (18:25 +0000)]
FEC: Add time stamping code and a PTP hardware clock

This patch adds a driver for the FEC(MX6) that offers time
stamping and a PTP haderware clock. Because FEC\ENET(MX6)
hardware frequency adjustment is complex, we have implemented
this in software by changing the multiplication factor of the
timecounter.

Signed-off-by: Frank Li <Frank.Li@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agoARM: imx6q: Set enet tx reference clk from anatop to support 1588
Frank Li [Tue, 30 Oct 2012 18:25:22 +0000 (18:25 +0000)]
ARM: imx6q: Set enet tx reference clk from anatop to support 1588

Set GRP1 BIT21 ENET_CLK_SEL:
  Enet tx reference clk from internal clock from anatop
  (loopback through pad), this clock also sent out to external PHY

Signed-off-by: Frank Li <Frank.Li@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agoARM: dts: imx6q: Add ENET PTP clock pin and clock source
Frank Li [Tue, 30 Oct 2012 18:24:57 +0000 (18:24 +0000)]
ARM: dts: imx6q: Add ENET PTP clock pin and clock source

Add ENET 1588 clock input pin
MX6Q_PAD_GPIO_16__ENET_ANATOP_ETHERNET_REF_OUT
and anatop PLL8 clock source for ENET

Signed-off-by: Frank Li <Frank.Li@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agonet: fec: move fec_enet_private to header file
Frank Li [Tue, 30 Oct 2012 18:24:49 +0000 (18:24 +0000)]
net: fec: move fec_enet_private to header file

A new file fec_ptp.c will use fec_enet_private to support 1588 PTP
move such structure to common header file fec.h

Signed-off-by: Frank Li <Frank.Li@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agoveth: allow changing the mac address while interface is up
Hannes Frederic Sowa [Tue, 30 Oct 2012 16:22:01 +0000 (16:22 +0000)]
veth: allow changing the mac address while interface is up

Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agocpsw: support the HWTSTAMP ioctl and the CPTS
Richard Cochran [Mon, 29 Oct 2012 08:45:20 +0000 (08:45 +0000)]
cpsw: support the HWTSTAMP ioctl and the CPTS

This patch hooks into the CPTS code and adds support for the HWTSTAMP
ioctl. The patch includes code for the CPSW version found in the dm814x
even though the background device tree support for this board is still
missing.

Signed-off-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agocpts: specify the input clock frequency via DT
Richard Cochran [Mon, 29 Oct 2012 08:45:19 +0000 (08:45 +0000)]
cpts: specify the input clock frequency via DT

This patch adds a way to configure the CPTS input clock scaling factors
via the device tree.

Signed-off-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agocpsw: add a DT field for the active time stamping port
Richard Cochran [Mon, 29 Oct 2012 08:45:18 +0000 (08:45 +0000)]
cpsw: add a DT field for the active time stamping port

Because time stamping on both external ports of the switch simultaneously
is positively useless from the application's point of view, this patch
provides a DT configuration method to choose the active port.

Signed-off-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agocpsw: add a DT field for the cpts offset
Richard Cochran [Mon, 29 Oct 2012 08:45:17 +0000 (08:45 +0000)]
cpsw: add a DT field for the cpts offset

Signed-off-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agocpts: introduce time stamping code and a PTP hardware clock.
Richard Cochran [Mon, 29 Oct 2012 08:45:16 +0000 (08:45 +0000)]
cpts: introduce time stamping code and a PTP hardware clock.

This patch adds a driver for the CPTS that offers time
stamping and a PTP hardware clock. Because some of the
CPTS hardware variants (like the am335x) do not support
frequency adjustment, we have implemented this in software
by changing the multiplication factor of the timecounter.

Signed-off-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agocpsw: support both silicon versions
Richard Cochran [Mon, 29 Oct 2012 08:45:15 +0000 (08:45 +0000)]
cpsw: support both silicon versions

This patch fixes the cpsw driver to operate correctly with both the
dm814x and the am335x versions of the switch hardware.

Signed-off-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agocpsw: remember the silicon version
Richard Cochran [Mon, 29 Oct 2012 08:45:14 +0000 (08:45 +0000)]
cpsw: remember the silicon version

This patch lets the CPSW driver remember the version number in order to
support the two different variants already in the wild.

Signed-off-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agocpsw: add missing fields to the CPSW_SS register bank.
Richard Cochran [Mon, 29 Oct 2012 08:45:13 +0000 (08:45 +0000)]
cpsw: add missing fields to the CPSW_SS register bank.

Signed-off-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agocpsw: rename register banks to match the reference manual
Richard Cochran [Mon, 29 Oct 2012 08:45:12 +0000 (08:45 +0000)]
cpsw: rename register banks to match the reference manual

The code mixes up the CPSW_SS and the CPSW_WR register naming. This patch
changes the names to conform to the published Technical Reference Manual
from TI, in order to make working on the code less confusing.

Signed-off-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agodrivers: net: ethernet: cpsw: add multicast address to ALE table
Mugunthan V N [Mon, 29 Oct 2012 08:45:11 +0000 (08:45 +0000)]
drivers: net: ethernet: cpsw: add multicast address to ALE table

Adding multicast address to ALE table via netdev ops to subscribe, transmit
or receive multicast frames to and from the network

Signed-off-by: Mugunthan V N <mugunthanvnm@ti.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agonet/macb: add pinctrl consumer support
Jean-Christophe PLAGNIOL-VILLARD [Wed, 31 Oct 2012 06:04:59 +0000 (06:04 +0000)]
net/macb: add pinctrl consumer support

If no pinctrl available just report a warning as some architecture may not
need to do anything.

Signed-off-by: Jean-Christophe PLAGNIOL-VILLARD <plagnioj@jcrosoft.com>
[nicolas.ferre@atmel.com: adapt the error path, remove unneeded headers]
Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com>
Tested-by: Joachim Eastwood <manabian@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agonet/macb: Offset first RX buffer by two bytes
Havard Skinnemoen [Wed, 31 Oct 2012 06:04:58 +0000 (06:04 +0000)]
net/macb: Offset first RX buffer by two bytes

Make the ethernet frame payload word-aligned, possibly making the
memcpy into the skb a bit faster. This will be even more important
after we eliminate the copy altogether.

Also eliminate the redundant RX_OFFSET constant -- it has the same
definition and purpose as NET_IP_ALIGN.

Signed-off-by: Havard Skinnemoen <havard@skinnemoen.net>
[nicolas.ferre@atmel.com: adapt to newer kernel]
Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com>
Tested-by: Joachim Eastwood <manabian@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agonet/macb: better manage tx errors
Nicolas Ferre [Wed, 31 Oct 2012 06:04:57 +0000 (06:04 +0000)]
net/macb: better manage tx errors

Handle all TX errors, not only underruns. TX error management is
deferred to a dedicated workqueue.
Reinitialize the TX ring after treating all remaining frames, and
restart the controller when everything has been cleaned up properly.
Napi is not stopped during this task as the driver only handles
napi for RX for now.
With this sequence, we do not need a special check during the xmit
method as the packets will be caught by TX disable during workqueue
execution.

Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com>
Tested-by: Joachim Eastwood <manabian@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agonet/macb: ethtool interface: add register dump feature
Nicolas Ferre [Wed, 31 Oct 2012 06:04:56 +0000 (06:04 +0000)]
net/macb: ethtool interface: add register dump feature

Add macb_get_regs() ethtool function and its helper function:
macb_get_regs_len().
The version field is deduced from the IP revision which gives the
"MACB or GEM" information. An additional version field is reserved.

Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com>
Reviewed-by: Ben Hutchings <bhutchings@solarflare.com>
Tested-by: Joachim Eastwood <manabian@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agonet/macb: clean up ring buffer logic
Havard Skinnemoen [Wed, 31 Oct 2012 06:04:55 +0000 (06:04 +0000)]
net/macb: clean up ring buffer logic

Instead of masking head and tail every time we increment them, just let them
wrap through UINT_MAX and mask them when subscripting. Add simple accessor
functions to do the subscripting properly to minimize the chances of messing
this up.

This makes the code slightly smaller, and hopefully faster as well.  Also,
doing the ring buffer management this way will simplify things a lot when
making the ring sizes configurable in the future.

Available number of descriptors in ring buffer function by David Laight.

Signed-off-by: Havard Skinnemoen <havard@skinnemoen.net>
[nicolas.ferre@atmel.com: split patch in topics, adapt to newer kernel]
Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com>
Tested-by: Joachim Eastwood <manabian@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agonet/macb: tx status is more than 8 bits now
Nicolas Ferre [Wed, 31 Oct 2012 06:04:54 +0000 (06:04 +0000)]
net/macb: tx status is more than 8 bits now

On some revision of GEM, TSR status register has more information.

Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com>
Tested-by: Joachim Eastwood <manabian@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agonet/macb: remove macb_get_drvinfo()
Nicolas Ferre [Wed, 31 Oct 2012 06:04:53 +0000 (06:04 +0000)]
net/macb: remove macb_get_drvinfo()

This function has little meaning so remove it altogether and
let ethtool core fill in the fields automatically.

Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com>
Reviewed-by: Ben Hutchings <bhutchings@solarflare.com>
Tested-by: Joachim Eastwood <manabian@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agonet/macb: change debugging messages
Havard Skinnemoen [Wed, 31 Oct 2012 06:04:52 +0000 (06:04 +0000)]
net/macb: change debugging messages

Convert some noisy netdev_dbg() statements to netdev_vdbg(). Defining
DEBUG will no longer fill up the logs; VERBOSE_DEBUG still does.
Add one more verbose debug for ISR status.

Signed-off-by: Havard Skinnemoen <havard@skinnemoen.net>
[nicolas.ferre@atmel.com: split patch in topics, add ISR status]
Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com>
Tested-by: Joachim Eastwood <manabian@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agonet/macb: memory barriers cleanup
Havard Skinnemoen [Wed, 31 Oct 2012 06:04:51 +0000 (06:04 +0000)]
net/macb: memory barriers cleanup

Remove a couple of unneeded barriers and document the remaining ones.

Signed-off-by: Havard Skinnemoen <havard@skinnemoen.net>
[nicolas.ferre@atmel.com: split patch in topics]
Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com>
Tested-by: Joachim Eastwood <manabian@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agonet/macb: Add support for Gigabit Ethernet mode
Patrice Vilchez [Wed, 31 Oct 2012 06:04:50 +0000 (06:04 +0000)]
net/macb: Add support for Gigabit Ethernet mode

Add Gigabit Ethernet mode to GEM cadence IP and enable RGMII connection.

Signed-off-by: Patrice Vilchez <patrice.vilchez@atmel.com>
Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com>
Tested-by: Joachim Eastwood <manabian@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agotime: remove the timecompare code.
Richard Cochran [Wed, 31 Oct 2012 06:27:25 +0000 (06:27 +0000)]
time: remove the timecompare code.

This patch removes the timecompare code from the kernel. The top five
reasons to do this are:

1. There are no more users of this code.
2. The original idea was a bit weak.
3. The original author has disappeared.
4. The code was not general purpose but tuned to a particular hardware,
5. There are better ways to accomplish clock synchronization.

Signed-off-by: Richard Cochran <richardcochran@gmail.com>
Acked-by: John Stultz <john.stultz@linaro.org>
Tested-by: Bob Liu <lliubbo@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agobfin_mac: offer a PTP Hardware Clock.
Richard Cochran [Wed, 31 Oct 2012 06:27:24 +0000 (06:27 +0000)]
bfin_mac: offer a PTP Hardware Clock.

The BF518 has a PTP time unit that works in a similar way to other MAC
based clocks, like gianfar, ixp46x, and igb. This patch adds support for
using the blackfin as a PHC. Although the blackfin hardware does offer a
few ancillary features, this patch implements only the basic operations.

Compile tested only.

Signed-off-by: Richard Cochran <richardcochran@gmail.com>
Tested-by: Bob Liu <lliubbo@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agobfin_mac: replace sys time stamps with raw ones instead.
Richard Cochran [Wed, 31 Oct 2012 06:27:23 +0000 (06:27 +0000)]
bfin_mac: replace sys time stamps with raw ones instead.

This patch replaces the sys time stamps and timecompare code with simple
raw hardware time stamps in nanosecond resolution. The only tricky bit is
to find a PTP Hardware Clock period slower than the input clock period
and a power of two.

Compile tested only.

Signed-off-by: Richard Cochran <richardcochran@gmail.com>
Tested-by: Bob Liu <lliubbo@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agobfin_mac: only advertise hardware time stamped when enabled.
Richard Cochran [Wed, 31 Oct 2012 06:27:22 +0000 (06:27 +0000)]
bfin_mac: only advertise hardware time stamped when enabled.

The hardware time stamping code is a compile time option for the blackfin.
When it is not enabled, the driver should fall back to the standard
ethtool reply to the get_ts_info query.

Compile tested only.

Signed-off-by: Richard Cochran <richardcochran@gmail.com>
Tested-by: Bob Liu <lliubbo@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agoptp: add an ioctl to compare PHC time with system time
Richard Cochran [Wed, 31 Oct 2012 06:19:07 +0000 (06:19 +0000)]
ptp: add an ioctl to compare PHC time with system time

This patch adds an ioctl for PTP Hardware Clock (PHC) devices that allows
user space to measure the time offset between the PHC and the system
clock. Rather than hard coding any kind of estimation algorithm into the
kernel, this patch takes the more flexible approach of just delivering
an array of raw clock readings. In that way, the user space clock servo
may be adapted to new and different hardware clocks.

Signed-off-by: Richard Cochran <richardcochran@gmail.com>
Acked-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agoptp: Enable clock drivers along with associated net/PHY drivers
Ben Hutchings [Wed, 31 Oct 2012 15:33:52 +0000 (15:33 +0000)]
ptp: Enable clock drivers along with associated net/PHY drivers

Where a PTP clock driver is associated with a net or PHY driver, it
should be enabled automatically whenever that driver is enabled.
Therefore:

- Make PTP clock drivers select rather than depending on PTP_1588_CLOCK
- Remove separate boolean options for PTP clock drivers that are built
  as part of net driver modules.  (This also fixes cases where the PTP
  subsystem is wrongly forced to be built-in.)
- Set 'default y' for PTP clock drivers that depend on specific net
  drivers but are built separately

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agoptp: Make PTP_1588_CLOCK select rather than depend on PPS
Ben Hutchings [Wed, 31 Oct 2012 15:32:44 +0000 (15:32 +0000)]
ptp: Make PTP_1588_CLOCK select rather than depend on PPS

PTP hardware clock drivers that select PTP_1588_CLOCK must currently
also select PPS.  For those drivers that don't, the user must enable
PPS, then enable PTP_1588_CLOCK, then the driver.  Simplify things for
developers and users by putting this selection in one place.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agopps, ptp: Remove dependencies on EXPERIMENTAL
Ben Hutchings [Wed, 31 Oct 2012 15:31:29 +0000 (15:31 +0000)]
pps, ptp: Remove dependencies on EXPERIMENTAL

These are now established subsystems, and we want drivers to be able
to select PPS and PTP_1588_CLOCK without depending on EXPERIMENTAL.
Further, the use of EXPERIMENTAL is now deprecated in general.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agosk-filter: Add ability to get socket filter program (v2)
Pavel Emelyanov [Thu, 1 Nov 2012 02:01:48 +0000 (02:01 +0000)]
sk-filter: Add ability to get socket filter program (v2)

The SO_ATTACH_FILTER option is set only. I propose to add the get
ability by using SO_ATTACH_FILTER in getsockopt. To be less
irritating to eyes the SO_GET_FILTER alias to it is declared. This
ability is required by checkpoint-restore project to be able to
save full state of a socket.

There are two issues with getting filter back.

First, kernel modifies the sock_filter->code on filter load, thus in
order to return the filter element back to user we have to decode it
into user-visible constants. Fortunately the modification in question
is interconvertible.

Second, the BPF_S_ALU_DIV_K code modifies the command argument k to
speed up the run-time division by doing kernel_k = reciprocal(user_k).
Bad news is that different user_k may result in same kernel_k, so we
can't get the original user_k back. Good news is that we don't have
to do it. What we need to is calculate a user2_k so, that

  reciprocal(user2_k) == reciprocal(user_k) == kernel_k

i.e. if it's re-loaded back the compiled again value will be exactly
the same as it was. That said, the user2_k can be calculated like this

  user2_k = reciprocal(kernel_k)

with an exception, that if kernel_k == 0, then user2_k == 1.

The optlen argument is treated like this -- when zero, kernel returns
the amount of sock_fprog elements in filter, otherwise it should be
large enough for the sock_fprog array.

changes since v1:
* Declared SO_GET_FILTER in all arch headers
* Added decode of vlan-tag codes

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agotuntap: choose the txq based on rxq
Jason Wang [Wed, 31 Oct 2012 19:46:02 +0000 (19:46 +0000)]
tuntap: choose the txq based on rxq

This patch implements a simple multiqueue flow steering policy - tx follows rx
for tun/tap. The idea is simple, it just choose the txq based on which rxq it
comes. The flow were identified through the rxhash of a skb, and the hash to
queue mapping were recorded in a hlist with an ageing timer to retire the
mapping. The mapping were created when tun receives packet from userspace, and
was quired in .ndo_select_queue().

I run co-current TCP_CRR test and didn't see any mapping manipulation helpers in
perf top, so the overhead could be negelected.

Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agotuntap: add ioctl to attach or detach a file form tuntap device
Jason Wang [Wed, 31 Oct 2012 19:46:01 +0000 (19:46 +0000)]
tuntap: add ioctl to attach or detach a file form tuntap device

Sometimes usespace may need to active/deactive a queue, this could be done by
detaching and attaching a file from tuntap device.

This patch introduces a new ioctls - TUNSETQUEUE which could be used to do
this. Flag IFF_ATTACH_QUEUE were introduced to do attaching while
IFF_DETACH_QUEUE were introduced to do the detaching.

Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agotuntap: multiqueue support
Jason Wang [Wed, 31 Oct 2012 19:46:00 +0000 (19:46 +0000)]
tuntap: multiqueue support

This patch converts tun/tap to a multiqueue devices and expose the multiqueue
queues as multiple file descriptors to userspace. Internally, each tun_file were
abstracted as a queue, and an array of pointers to tun_file structurs were
stored in tun_structure device, so multiple tun_files were allowed to be
attached to the device as multiple queues.

When choosing txq, we first try to identify a flow through its rxhash, if it
does not have such one, we could try recorded rxq and then use them to choose
the transmit queue. This policy may be changed in the future.

Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agotuntap: introduce multiqueue flags
Jason Wang [Wed, 31 Oct 2012 19:45:59 +0000 (19:45 +0000)]
tuntap: introduce multiqueue flags

Add flags to be used by creating multiqueue tuntap device.

Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agotuntap: RCUify dereferencing between tun_struct and tun_file
Jason Wang [Wed, 31 Oct 2012 19:45:58 +0000 (19:45 +0000)]
tuntap: RCUify dereferencing between tun_struct and tun_file

RCU were introduced in this patch to synchronize the dereferences between
tun_struct and tun_file. All tun_{get|put} were replaced with RCU, the
dereference from one to other must be done under rtnl lock or rcu read critical
region.

This is needed for the following patches since the one of the goal of multiqueue
tuntap is to allow adding or removing queues during workload. Without RCU,
control path would hold tx locks when adding or removing queues (which may cause
sme delay) and it's hard to change the number of queues without stopping the net
device. With the help of rcu, there's also no need for tun_file hold an refcnt
to tun_struct.

Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agotuntap: move socket to tun_file
Jason Wang [Wed, 31 Oct 2012 19:45:57 +0000 (19:45 +0000)]
tuntap: move socket to tun_file

Current tuntap makes use of the socket receive queue as its tx queue. To
implement multiple tx queues for tuntap and enable the ability of adding and
removing queues during workload, the first step is to move the socket related
structures to tun_file. Then we could let multiple fds/sockets to be attached to
the tuntap.

This patch removes tun_sock and moves socket related structures from tun_sock or
tun_struct to tun_file. Two exceptions are tap_filter and sock_fprog, they are
still kept in tun_structure since they are used to filter packets for the net
device instead of per transmit queue (at least I see no requirements for
them). After those changes, socket were created and destroyed during file open
and close (instead of device creation and destroy), the socket structures could
be dereferenced from tun_file instead of the file of tun_struct structure
itself.

For persisent device, since we purge during datching and wouldn't queue any
packets when no interface were attached, there's no behaviod changes before and
after this patch, so the changes were transparent to the userspace. To keep the
attributes such as sndbuf, socket filter and vnet header, those would be
re-initialize after a new interface were attached to an persist device.

Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agotuntap: log the unsigned informaiton with %u
Jason Wang [Wed, 31 Oct 2012 19:45:56 +0000 (19:45 +0000)]
tuntap: log the unsigned informaiton with %u

Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agoe1000: fix concurrent accesses to PHY from watchdog and ethtool
Maxime Bizon [Sat, 20 Oct 2012 14:53:40 +0000 (14:53 +0000)]
e1000: fix concurrent accesses to PHY from watchdog and ethtool

The e1000 driver currently does not protect concurrent accesses to the PHY
from both the ethtool callbacks, and from the e1000_watchdog function. This
patchs adds a new spinlock which is used by e1000_{read,write}_phy_reg in
order to serialize concurrent accesses to the PHY.

Signed-off-by: Maxime Bizon <mbizon@freebox.fr>
Signed-off-by: Florian Fainelli <ffainelli@freebox.fr>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
11 years agoigb: Fix EEPROM writes via ethtool on i210
Carolyn Wyborny [Wed, 24 Oct 2012 03:56:21 +0000 (03:56 +0000)]
igb: Fix EEPROM writes via ethtool on i210

This patch fixes a problem where the driver would crash when trying to
write a word to the EEPROM on i210 devices.

Reported-by: Ekman Tsang <Ekman.Tsang@riverbed.com>
Signed-off-by: Carolyn Wyborny <carolyn.wyborny@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
11 years agoigb: Add function to read i211's invm version
Carolyn Wyborny [Tue, 23 Oct 2012 13:04:37 +0000 (13:04 +0000)]
igb: Add function to read i211's invm version

The i211's one-time programmable (invm) version field is different than the
other fields contained in it.  This patch adds a function to get the invm version
of it and store it for output from ethtool.

Signed-off-by: Carolyn Wyborny <carolyn.wyborny@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
11 years agoigb: Remove workaround for EEE configuration on i210/I211
Carolyn Wyborny [Fri, 19 Oct 2012 05:31:43 +0000 (05:31 +0000)]
igb: Remove workaround for EEE configuration on i210/I211

This patch removes a workaround that was needed on pre-release hardware.
Released hardware should not have this setting, but any devices that do
will get a warning message instead.

Signed-off-by: Carolyn Wyborny <carolyn.wyborny@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
11 years agoixgbe: fix default setting of TXDCTL.WTHRESH
Emil Tantilov [Wed, 24 Oct 2012 08:12:10 +0000 (08:12 +0000)]
ixgbe: fix default setting of TXDCTL.WTHRESH

The q_vector->itr check in ixgbe_configure_tx_ring() was done prior to it
being set, which resulted in TXDCTL.WTHRESH always being set to 1 on driver
load, while consequent resets would set it to 8.

This patch moves the setting of q_vector->itr in ixgbe_alloc_q_vector() to
make sure that TXDCTL.WTHRESH is set to 8 by default.

Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
11 years agoixgbe: fix uninitialized event.type in ixgbe_ptp_check_pps_event
Jacob Keller [Sat, 13 Oct 2012 05:00:06 +0000 (05:00 +0000)]
ixgbe: fix uninitialized event.type in ixgbe_ptp_check_pps_event

This patch fixes a bug in ixgbe_ptp_check_pps_event where the type was
uninitialized and could cause unknown event outcomes.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
11 years agoMerge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net...
David S. Miller [Wed, 31 Oct 2012 18:25:33 +0000 (14:25 -0400)]
Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-next

Jeff Kirsher says:

====================
This series contains updates to ixgbe, ixgbevf, igbvf, igb and
networking core (bridge).  Most notably is the addition of support
for local link multicast addresses in SR-IOV mode to the networking
core.

Also note, the ixgbe patch "ixgbe: Add support for pipeline reset" and
"ixgbe: Fix return value from macvlan filter function" is revised based
on community feedback.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agoethernet: Convert dev_printk(KERN_<LEVEL> to dev_<level>(
Joe Perches [Sat, 27 Oct 2012 22:05:48 +0000 (22:05 +0000)]
ethernet: Convert dev_printk(KERN_<LEVEL> to dev_<level>(

dev_<level> calls take less code than dev_printk(KERN_<LEVEL>
and reducing object size is good.
Coalesce formats for easier grep.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agox86: bpf_jit_comp: add vlan tag support
Eric Dumazet [Sat, 27 Oct 2012 02:26:22 +0000 (02:26 +0000)]
x86: bpf_jit_comp: add vlan tag support

This patch is a follow-up for patch "net: filter: add vlan tag access"
to support the new VLAN_TAG/VLAN_TAG_PRESENT accessors in BPF JIT.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Ani Sinha <ani@aristanetworks.com>
Cc: Daniel Borkmann <danborkmann@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agonet: filter: add vlan tag access
Eric Dumazet [Sat, 27 Oct 2012 02:26:17 +0000 (02:26 +0000)]
net: filter: add vlan tag access

BPF filters lack ability to access skb->vlan_tci

This patch adds two new ancillary accessors :

SKF_AD_VLAN_TAG         (44) mapped to vlan_tx_tag_get(skb)

SKF_AD_VLAN_TAG_PRESENT (48) mapped to vlan_tx_tag_present(skb)

This allows libpcap/tcpdump to use a kernel filter instead of
having to fallback to accept all packets, then filter them in
user space.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Suggested-by: Ani Sinha <ani@aristanetworks.com>
Suggested-by: Daniel Borkmann <danborkmann@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agonet/cadence: depend on HAS_IOMEM
Joachim Eastwood [Sat, 27 Oct 2012 02:10:23 +0000 (02:10 +0000)]
net/cadence: depend on HAS_IOMEM

Fixes the following build failure on S390:
  In file included from drivers/net/ethernet/cadence/at91_ether.c:35:0:
   drivers/net/ethernet/cadence/macb.h: In function 'macb_is_gem':
   drivers/net/ethernet/cadence/macb.h:563:2: error: implicit declaration of function '__raw_readl' [-Werror=implicit-function-declaration]
   drivers/net/ethernet/cadence/at91_ether.c: In function 'update_mac_address':
   drivers/net/ethernet/cadence/at91_ether.c:119:2: error: implicit declaration of function '__raw_writel' [-Werror=implicit-function-declaration]
   cc1: some warnings being treated as errors

Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Joachim Eastwood <manabian@gmail.com>
Acked-by: Nicolas Ferre <nicolas.ferre@atmel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agonetxen: explicity handle pause autoneg parameter
Flavio Leitner [Fri, 26 Oct 2012 14:39:53 +0000 (14:39 +0000)]
netxen: explicity handle pause autoneg parameter

The hardware doesn't support controlling pause frames autoneg, so
report that back correctly to userspace.

Signed-off-by: Flavio Leitner <fbl@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agotcp: make tcp_clear_md5_list static
stephen hemminger [Fri, 26 Oct 2012 14:31:40 +0000 (14:31 +0000)]
tcp: make tcp_clear_md5_list static

Trivial. Only used in one file.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agonet: compute skb->rxhash if nic hash may be 3-tuple
Willem de Bruijn [Fri, 26 Oct 2012 11:52:08 +0000 (11:52 +0000)]
net: compute skb->rxhash if nic hash may be 3-tuple

Network device drivers can communicate a Toeplitz hash in skb->rxhash,
but devices differ in their hashing capabilities. All compute a 5-tuple
hash for TCP over IPv4, but for other connection-oriented protocols,
they may compute only a 3-tuple. This breaks RPS load balancing, e.g.,
for TCP over IPv6 flows. Additionally, for GRE and other tunnels,
the kernel computes a 5-tuple hash over the inner packet if possible,
but devices do not.

This patch recomputes the rxhash in software in all cases where it
cannot be certain that a 5-tuple was computed. Device drivers can avoid
recomputation by setting the skb->l4_rxhash flag.

Recomputing adds cycles to each packet when RPS is enabled or the
packet arrives over a tunnel. A comparison of 200x TCP_STREAM between
two servers running unmodified netnext with rxhash computation
in hardware vs software (using ethtool -K eth0 rxhash [on|off]) shows
how much time is spent in __skb_get_rxhash in this worst case:

     0.03%          swapper  [kernel.kallsyms]     [k] __skb_get_rxhash
     0.03%          swapper  [kernel.kallsyms]     [k] __skb_get_rxhash
     0.05%          swapper  [kernel.kallsyms]     [k] __skb_get_rxhash

With 200x TCP_RR it increases to

     0.10%          netperf  [kernel.kallsyms]     [k] __skb_get_rxhash
     0.10%          netperf  [kernel.kallsyms]     [k] __skb_get_rxhash
     0.10%          netperf  [kernel.kallsyms]     [k] __skb_get_rxhash

I considered having the patch explicitly skips recomputation when it knows
that it will not improve the hash (TCP over IPv4), but that conditional
complicates code without saving many cycles in practice, because it has
to take place after flow dissector.

Signed-off-by: Willem de Bruijn <willemb@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agodlink: dl2k: use the module_pci_driver macro
Devendra Naga [Fri, 26 Oct 2012 09:29:00 +0000 (09:29 +0000)]
dlink: dl2k: use the module_pci_driver macro

use the module_pci_driver macro to make the code simpler
by eliminating module_init and module_exit calls.

Signed-off-by: Devendra Naga <devendra.aaru@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agorealtek: r8169: use module_pci_driver macro
Devendra Naga [Fri, 26 Oct 2012 09:27:42 +0000 (09:27 +0000)]
realtek: r8169: use module_pci_driver macro

use the module_pci_driver macro to make the code simpler
by eliminating the module_init and module_exit calls

Signed-off-by: Devendra Naga <devendra.aaru@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agoMerge tag 'batman-adv-for-davem' of git://git.open-mesh.org/linux-merge
David S. Miller [Wed, 31 Oct 2012 17:52:52 +0000 (13:52 -0400)]
Merge tag 'batman-adv-for-davem' of git://git.open-mesh.org/linux-merge

included changes:
- some code cleanups and minor fixes (3 of them were reported by Coverity)
- 'struct hard_iface' re-shaping to improve multi-protocol support
- ECTP packets silent drop
- transfer the WIFI flag on clients in case of roaming