git.kernelconcepts.de Git - karo-tx-linux.git/log

Linux 2.6.16.54-rc1

TCP: Fix TCP handling of SACK in bidirectional flows

It's possible that new SACK blocks that should trigger new LOST
markings arrive with new data (which previously made is_dupack
false). In addition, I think this fixes a case where we get
a cumulative ACK with enough SACK blocks to trigger the fast
recovery (is_dupack would be false there too).

I'm not completely pleased with this solution because readability
of the code is somewhat questionable as 'is_dupack' in SACK case
is no longer about dupacks only but would mean something like
'lost_marker_work_todo' too... But because of Eifel stuff done
in CA_Recovery, the FLAG_DATA_SACKED check cannot be placed to
the if statement which seems attractive solution. Nevertheless,
I didn't like adding another variable just for that either... :-)

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Adrian Bunk <bunk@kernel.org>

[PPP]: Fix output buffer size in ppp_decompress_frame().

This patch addresses the issue with "osize too small" errors in mppe
encryption. The patch fixes the issue with wrong output buffer size
being passed to ppp decompression routine.

--------------------
As pointed out by Suresh Mahalingam, the issue addressed by
ppp-fix-osize-too-small-errors-when-decoding patch is not fully resolved yet.
The size of allocated output buffer is correct, however it size passed to
ppp->rcomp->decompress in ppp_generic.c if wrong. The patch fixes that.
--------------------

Signed-off-by: Konstantin Sharlaimov <konstantin.sharlaimov@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Adrian Bunk <bunk@kernel.org>

[PPP]: Fix osize too small errors when decoding mppe.

The mppe_decompress() function required a buffer that is 1 byte too
small when receiving a message of mru size. This fixes buffer
allocation to prevent this from occurring.

Signed-off-by: Konstantin Sharlaimov <konstantin.sharlaimov@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Adrian Bunk <bunk@kernel.org>

[MATH-EMU]: Fix underflow exception reporting.

The underflow exception cases were wrong.

This is one weird area of ieee1754 handling in that the underflow
behavior changes based upon whether underflow is enabled in the trap
enable mask of the FPU control register.  As a specific case the Sparc
V9 manual gives us the following description:

--------------------
If UFM = 0:     Underflow occurs if a nonzero result is tiny and a
                loss of accuracy occurs.  Tininess may be detected
                before or after rounding.  Loss of accuracy may be
                either a denormalization loss or an inexact result.

If UFM = 1:     Underflow occurs if a nonzero result is tiny.
                Tininess may be detected before or after rounding.
--------------------

What this amounts to in the packing case is if we go subnormal,
we set underflow if any of the following are true:

1) rounding sets inexact
2) we ended up rounding back up to normal (this is the case where
   we set the exponent to 1 and set the fraction to zero), this
   should set inexact too
3) underflow is set in FPU control register trap-enable mask

The initially discovered example was "DBL_MIN / 16.0" which
incorrectly generated an underflow.  It should not, unless underflow
is set in the trap-enable mask of the FPU csr.

Another example, "0x0.0000000000001p-1022 / 16.0", should signal both
inexact and underflow.  The cpu implementations and ieee1754
literature is very clear about this.  This is case #2 above.

However, if underflow is set in the trap enable mask, only underflow
should be set and reported as a trap.  That is handled properly by the
prioritization logic in

arch/sparc{,64}/math-emu/math.c:record_exception().

Based upon a report and test case from Jakub Jelinek.

Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Adrian Bunk <bunk@kernel.org>

[SPARC32]: Fix rounding errors in ndelay/udelay implementation.

__ndelay and __udelay have not been delayung >= specified time.
The problem with __ndelay has been tacked down to the rounding of the
multiplier constant. By changing this, delays > app 18us are correctly
calculated.
The problem with __udelay has also been tracked down to rounding issues.
Changing the multiplier constant (to match that used in sparc64) corrects
for large delays and adding in a rounding constant corrects for trunctaion
errors in the claculations.
Many short delays will return without looping. This is not an error as there
is the fixed delay of doing all the maths to calculate the loop count.

Signed-off-by: Mark Fortescue <mark@mtfhpc.demon.co.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Adrian Bunk <bunk@kernel.org>

[SPARC32]: Fix bug in sparc optimized memset.

Sparc optimized memset (arch/sparc/lib/memset.S) does not fill last
byte of the memory area, if area size is less than 8 bytes and start
address is not word (4-bytes) aligned.

Here is code chunk where bug located:
/* %o0 - memory address, %o1 - size, %g3 - value */
8:
     add    %o0, 1, %o0
    subcc    %o1, 1, %o1
    bne,a    8b
     stb %g3, [%o0 - 1]

This code should write byte every loop iteration, but last time delay
instruction stb is not executed because branch instruction sets
"annul" bit.

Patch replaces bne,a by bne instruction.

Error can be reproduced by simple kernel module:

--------------------
#include <linux/module.h>
#include <linux/config.h>
#include <linux/kernel.h>
#include <linux/errno.h>
#include <string.h>

static void do_memset(void **p, int size)
{
        memset(p, 0x00, size);
}

static int __init memset_test_init(void)
{
    char fooc[8];
    int *fooi;
    memset(fooc, 0xba, sizeof(fooc));

    do_memset((void**)(fooc + 3), 1);

    fooi = (int*) fooc;
    printk("%08X %08X\n", fooi[0], fooi[1]);

    return -1;
}

static void __exit memset_test_cleanup(void)
{
    return;
}

module_init(memset_test_init);
module_exit(memset_test_cleanup);

MODULE_LICENSE("GPL");
EXPORT_NO_SYMBOLS;
--------------------

Signed-off-by: Alexander Shmelev <ashmelev@task.sun.mcst.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Adrian Bunk <bunk@kernel.org>

md: avoid possible BUG_ON in md bitmap handling

md/bitmap tracks how many active write requests are pending on blocks
associated with each bit in the bitmap, so that it knows when it can clear
the bit (when count hits zero).

The counter has 14 bits of space, so if there are ever more than 16383, we
cannot cope.

Currently the code just calles BUG_ON as "all" drivers have request queue
limits much smaller than this.

However is seems that some don't. Apparently some multipath configurations
can allow more than 16383 concurrent write requests.

So, in this unlikely situation, instead of calling BUG_ON we now wait
for the count to drop down a bit. This requires a new wait_queue_head,
some waiting code, and a wakeup call.

Tested by limiting the counter to 20 instead of 16383 (writes go a lot slower
in that case...).

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Adrian Bunk <bunk@kernel.org>

md: fix a few problems with the interface (sysfs and ioctl) to md

While developing more functionality in mdadm I found some bugs in md...

- When we remove a device from an inactive array (write 'remove' to
  the 'state' sysfs file - see 'state_store') would should not
  update the superblock information - as we may not have
  read and processed it all properly yet.

- initialise all raid_disk entries to '-1' else the 'slot sysfs file
  will claim '0' for all devices in an array before the array is
  started.

- all '\n' not to be present at the end of words written to
  sysfs files

- when we use SET_ARRAY_INFO to set the md metadata version,
  set the flag to say that there is persistant metadata.

- allow GET_BITMAP_FILE to be called on an array that hasn't
  been started yet.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Adrian Bunk <bunk@kernel.org>

md: assorted md and raid1 one-liners

Fix few bugs that meant that:
  - superblocks weren't alway written at exactly the right time (this
    could show up if the array was not written to - writting to the array
    causes lots of superblock updates and so hides these errors).

  - restarting device recovery after a clean shutdown (version-1 metadata
    only) didn't work as intended (or at all).

1/ Ensure superblock is updated when a new device is added.
2/ Remove an inappropriate test on MD_RECOVERY_SYNC in md_do_sync.
   The body of this if takes one of two branches depending on whether
   MD_RECOVERY_SYNC is set, so testing it in the clause of the if
   is wrong.
3/ Flag superblock for updating after a resync/recovery finishes.
4/ If we find the neeed to restart a recovery in the middle (version-1
   metadata only) make sure a full recovery (not just as guided by
   bitmaps) does get done.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Adrian Bunk <bunk@kernel.org>

md: allow SET_BITMAP_FILE to work on 64bit kernel with 32bit userspace

.. so that you can use bitmaps with 32bit userspace on a 64 bit kernel.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Adrian Bunk <bunk@kernel.org>

md: fix some small races in bitmap plugging in raid5

The comment gives more details, but I didn't quite have the sequencing write,
so there was room for races to leave bits unset in the on-disk bitmap for
short periods of time.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Adrian Bunk <bunk@kernel.org>

md: fix a plug/unplug race in raid5

When a device is unplugged, requests are moved from one or two (depending on
whether a bitmap is in use) queues to the main request queue.

So whenever requests are put on either of those queues, we should make sure
the raid5 array is 'plugged'.  However we don't.  We currently plug the raid5
queue just before putting requests on queues, so there is room for a race.  If
something unplugs the queue at just the wrong time, requests will be left on
the queue and nothing will want to unplug them.  Normally something else will
plug and unplug the queue fairly soon, but there is a risk that nothing will.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Adrian Bunk <bunk@kernel.org>

md: fix resync speed calculation for restarted resyncs

We introduced 'io_sectors' recently so we could count the sectors that causes
io during resync separate from sectors which didn't cause IO - there can be a
difference if a bitmap is being used to accelerate resync.

However when a speed is reported, we find the number of sectors processed
recently by subtracting an oldish io_sectors count from a current
'curr_resync' count. This is wrong because curr_resync counts all sectors,
not just io sectors.

So, add a field to mddev to store the curren io_sectors separately from
curr_resync, and use that in the calculations.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Adrian Bunk <bunk@kernel.org>

md: Allow re-add to work on array without bitmaps

When an array has a bitmap, a device can be removed and re-added and only
blocks changes since the removal (as recorded in the bitmap) will be resynced.

It should be possible to do a similar thing to arrays without bitmaps.  i.e.
if a device is removed and re-added and *no* changes have been made in the
interim, then the add should not require a resync.

This patch allows that option.  This means that when assembling an array one
device at a time (e.g.  during device discovery) the array can be enabled
read-only as soon as enough devices are available, but extra devices can still
be added without causing a resync.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Adrian Bunk <bunk@kernel.org>

md/bitmap: tidy up i_writecount handling in md/bitmap

md/bitmap modifies i_writecount of a bitmap file to make sure that no-one else
writes to it. The reverting of the change is sometimes done twice, and there
is one error path where it is omitted.

This patch tidies that up.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Adrian Bunk <bunk@kernel.org>

md/bitmap: remove dead code from md/bitmap

bitmap_active is never called, and the BITMAP_ACTIVE flag is never users or
tested, so discard them both.

Also remove some out-of-date 'todo' comments.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Adrian Bunk <bunk@kernel.org>

md/bitmap: remove unnecessary page reference manipulations from md/bitmap code

md/bitmap gets a collection of pages representing the bitmap when it
initialises the bitmap, and puts all the references when discarding the
bitmap.

It also occasionally takes extra references without any good reason, and
sometimes drops them ... though it doesn't always drop them, which can result
in a memory leak.

This patch removes the unnecessary 'get_page' calls, and the corresponding
'put_page' calls.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Adrian Bunk <bunk@kernel.org>

md/bitmap: use set_bit etc for bitmap page attributes

In particular, this means that we use 4 bits per page instead of a whole
unsigned long.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Adrian Bunk <bunk@kernel.org>

md/bitmap: cleaner separation of page attribute handlers in md/bitmap

md/bitmap has some attributes per-page.  Handling of these attributes in
largely abstracted in set_page_attr and clear_page_attr.  However
get_page_attr exposes the format used to store them.  So prior to changing
that format, introduce test_page_attr instead of get_page_attr, and make
appropriate usage changes.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Adrian Bunk <bunk@kernel.org>

md/bitmap: fix online removal of file-backed bitmaps

When "mdadm --grow /dev/mdX --bitmap=none" is used to remove a filebacked
bitmap, the bitmap was disconnected from the array, but the file wasn't closed
(until the array was stopped).

The file also wasn't closed if adding the bitmap file failed.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Adrian Bunk <bunk@kernel.org>

md: Don't clear bits in bitmap when writing to one device fails during recovery

Currently a device failure during recovery leaves bits set in the bitmap.
This normally isn't a problem as the offending device will be rejected because
of errors. However if device re-adding is being used with non-persistent
bitmaps, this can be a problem.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Adrian Bunk <bunk@kernel.org>

md: Add '4' to the list of levels for which bitmaps are supported

I really should make this a function of the personality....

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Adrian Bunk <bunk@kernel.org>

[MCA] fix bus matching

There's a bug in the MCA bus matching algorithm in that it promotes from
signed short to int before comparing with the actual id and does sign
extension on anything > 0x7fff (which means that pos ids > 0x7fff never
get correctly matched).

Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Signed-off-by: Adrian Bunk <bunk@kernel.org>

Linux 2.6.16.53

Linux 2.6.16.53-rc1

[IPV6]: MSG_ERRQUEUE messages do not pass to connected raw sockets

Taken from http://bugzilla.kernel.org/show_bug.cgi?id=8747

Problem Description:

It is related to the possibility to obtain MSG_ERRQUEUE messages from the udp
and raw sockets, both connected and unconnected.

There is a little typo in net/ipv6/icmp.c code, which prevents such messages
to be delivered to the errqueue of the correspond raw socket, when the socket
is CONNECTED.  The typo is due to swap of local/remote addresses.

Consider __raw_v6_lookup() function from net/ipv6/raw.c. When a raw socket is
looked up usual way, it is something like:

sk = __raw_v6_lookup(sk, nexthdr, daddr, saddr, IP6CB(skb)->iif);

where "daddr" is a destination address of the incoming packet (IOW our local
address), "saddr" is a source address of the incoming packet (the remote end).

But when the raw socket is looked up for some icmp error report, in
net/ipv6/icmp.c:icmpv6_notify() , daddr/saddr are obtained from the echoed
fragment of the "bad" packet, i.e.  "daddr" is the original destination
address of that packet, "saddr" is our local address.  Hence, for
icmpv6_notify() must use "saddr, daddr" in its arguments, not "daddr, saddr"
...

Steps to reproduce:

Create some raw socket, connect it to an address, and cause some error
situation: f.e. set ttl=1 where the remote address is more than 1 hop to reach.
Set IPV6_RECVERR .
Then send something and wait for the error (f.e. poll() with POLLERR|POLLIN).
You should receive "time exceeded" icmp message (because of "ttl=1"), but the
socket do not receive it.

If you do not connect your raw socket, you will receive MSG_ERRQUEUE
successfully.  (The reason is that for unconnected socket there are no actual
checks for local/remote addresses).

Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Adrian Bunk <bunk@stusta.de>

[NET]: Fix gen_estimator timer removal race

As noticed by Jarek Poplawski <jarkao2@o2.pl>, the timer removal in
gen_kill_estimator races with the timer function rearming the timer.

Check whether the timer list is empty before rearming the timer
in the timer function to fix this.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Acked-by: Jarek Poplawski <jarkao2@o2.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Adrian Bunk <bunk@stusta.de>

SCTP: Add scope_id validation for link-local binds

SCTP currently permits users to bind to link-local addresses,
but doesn't verify that the scope id specified at bind matches
the interface that the address is configured on. It was report
that this can hang a system.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: Adrian Bunk <bunk@stusta.de>

[NET] skbuff: remove export of static symbol

skb_clone_fraglist is static so it shouldn't be exported.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Adrian Bunk <bunk@stusta.de>

[NETFILTER]: nf_conntrack: don't track locally generated special ICMP error

The conntrack assigned to locally generated ICMP error is usually the one
assigned to the original packet which has caused the error. But if
the original packet is handled as invalid by nf_conntrack, no conntrack
is assigned to the original packet. Then nf_ct_attach() cannot assign
any conntrack to the ICMP error packet. In that case the current
nf_conntrack_icmp assigns appropriate conntrack to it. But the current
code mistakes the direction of the packet. As a result, NAT code mistakes
the address to be mangled.

To fix the bug, this changes nf_conntrack_icmp not to assign conntrack
to such ICMP error. Actually no address is necessary to be mangled
in this case.

Spotted by Jordan Russell.

Upstream commit ID: 130e7a83d7ec8c5c673225e0fa8ea37b1ed507a5

Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Adrian Bunk <bunk@stusta.de>

ide: clear bmdma status in ide_intr() for ICHx controllers (revised #4)

patch 1/2 (revised):
- Fix drive->waiting_for_dma to work with CDB-intr devices.
- Do the dma status clearing in ide_intr() and add a new
hwif->ide_dma_clear_irq for Intel ICHx controllers.

Revised per Alan, Sergei and Bart's advice.

Patch against 2.6.20-rc6. Tested ok on my ICH4 and pdc20275 adapters.
Please review/apply, thanks.

Signed-off-by: Albert Lee <albertcc@tw.ibm.com>
Signed-off-by: Adrian Bunk <bunk@stusta.de>

8139too.c: fix netpoll deadlock

fix deadlock in the 8139too driver: poll handlers should never forcibly
enable local interrupts, because they might be used by netpoll/printk
from IRQ context.

  =================================
  [ INFO: inconsistent lock state ]
  2.6.19 #11
  ---------------------------------
  inconsistent {softirq-on-W} -> {in-softirq-W} usage.
  swapper/1 [HC0[0]:SC1[1]:HE1:SE0] takes:
   (&npinfo->poll_lock){-+..}, at: [<c0350a41>] net_rx_action+0x64/0x1de
  {softirq-on-W} state was registered at:
    [<c0134c86>] mark_lock+0x5b/0x39c
    [<c0135012>] mark_held_locks+0x4b/0x68
    [<c01351e9>] trace_hardirqs_on+0x115/0x139
    [<c02879e6>] rtl8139_poll+0x3d7/0x3f4
    [<c035c85d>] netpoll_poll+0x82/0x32f
    [<c035c775>] netpoll_send_skb+0xc9/0x12f
    [<c035cdcc>] netpoll_send_udp+0x253/0x25b
    [<c0288463>] write_msg+0x40/0x65
    [<c011cead>] __call_console_drivers+0x45/0x51
    [<c011cf16>] _call_console_drivers+0x5d/0x61
    [<c011d4fb>] release_console_sem+0x11f/0x1d8
    [<c011d7d7>] register_console+0x1ac/0x1b3
    [<c02883f8>] init_netconsole+0x55/0x67
    [<c010040c>] init+0x9a/0x24e
    [<c01049cf>] kernel_thread_helper+0x7/0x10
    [<ffffffff>] 0xffffffff
  irq event stamp: 819992
  hardirqs last  enabled at (819992): [<c0350a16>] net_rx_action+0x39/0x1de
  hardirqs last disabled at (819991): [<c0350b1e>] net_rx_action+0x141/0x1de
  softirqs last  enabled at (817552): [<c01214e4>] __do_softirq+0xa3/0xa8
  softirqs last disabled at (819987): [<c0106051>] do_softirq+0x5b/0xc9

  other info that might help us debug this:
  no locks held by swapper/1.

  stack backtrace:
   [<c0104d88>] dump_trace+0x63/0x1e8
   [<c0104f26>] show_trace_log_lvl+0x19/0x2e
   [<c010532d>] show_trace+0x12/0x14
   [<c0105343>] dump_stack+0x14/0x16
   [<c0134980>] print_usage_bug+0x23c/0x246
   [<c0134d33>] mark_lock+0x108/0x39c
   [<c01356a7>] __lock_acquire+0x361/0x9ed
   [<c0136018>] lock_acquire+0x56/0x72
   [<c03aff1f>] _spin_lock+0x35/0x42
   [<c0350a41>] net_rx_action+0x64/0x1de
   [<c0121493>] __do_softirq+0x52/0xa8
   [<c0106051>] do_softirq+0x5b/0xc9
   [<c0121338>] irq_exit+0x3c/0x48
   [<c0106163>] do_IRQ+0xa4/0xbd
   [<c01047c6>] common_interrupt+0x2e/0x34
   [<c011db92>] vprintk+0x2c0/0x309
   [<c011dbf6>] printk+0x1b/0x1d
   [<c01003f2>] init+0x80/0x24e
   [<c01049cf>] kernel_thread_helper+0x7/0x10
   =======================

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Adrian Bunk <bunk@stusta.de>

ieee1394: forgotten dereference...

Going through the string and waiting for _pointer_ to become '\0'
is not what the authors meant...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: Ben Collins <ben.collins@ubuntu.com>
Signed-off-by: Adrian Bunk <bunk@stusta.de>

coda: wrong order of arguments of ->readdir()

Shows how many people are testing coda - the bug had been there for 5 years
and results of stepping on it are not subtle.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Adrian Bunk <bunk@stusta.de>

[TCP]: Use default 32768-61000 outgoing port range in all cases.

This diff changes the default port range used for outgoing connections,
from "use 32768-61000 in most cases, but use N-4999 on small boxes
(where N is a multiple of 1024, depending on just *how* small the box
is)" to just "use 32768-61000 in all cases".

I don't believe there are any drawbacks to this change, and it keeps
outgoing connection ports farther away from the mess of
IANA-registered ports.

Signed-off-by: Mark Glines <mark@glines.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Adrian Bunk <bunk@stusta.de>

[NET]: "wrong timeout value" in sk_wait_data() v2

sys_setsockopt() do not check properly timeout values for
SO_RCVTIMEO/SO_SNDTIMEO, for example it's possible to set negative timeout
values. POSIX do not defines behaviour for sys_setsockopt in case negative
timeouts, but requires that setsockopt() shall fail with -EDOM if the send and
receive timeout values are too big to fit into the timeout fields in the socket
structure.
In current implementation negative timeout can lead to error messages like
"schedule_timeout: wrong timeout value".

Proposed patch:
- checks tv_usec and returns -EDOM if it is wrong
- do not allows to set negative timeout values (sets 0 instead) and outputs
ratelimited information message about such attempts.

Signed-off-By: Vasily Averin <vvs@sw.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Adrian Bunk <bunk@stusta.de>

[SPARC]: Linux always started with 9600 8N1

The Linux kernel ignored the PROM's serial settings (115200,n,8,1 in
my case). This was because mode_prop remained "ttyX-mode" (expected:
"ttya-mode") due to the constness of string literals when used with
"char *". Since there is no "ttyX-mode" property in the PROM, Linux
always used the default 9600.

[ Investigation of the suncore.s assembler reveals that gcc optimizied
  away the stores, yet did not emit a warning, which is a pretty
  anti-social thing to do and is the only reason this bug lived for
  so long -DaveM ]

Signed-off-by: Jan Engelhardt <jengelh@gmx.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Adrian Bunk <bunk@stusta.de>

[IPV4]: Correct rp_filter help text.

As mentioned in http://bugzilla.kernel.org/show_bug.cgi?id=5015
The helptext implies that this is on by default.
This may be true on some distros (Fedora/RHEL have it enabled
in /etc/sysctl.conf), but the kernel defaults to it off.

Signed-off-by: Dave Jones <davej@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Adrian Bunk <bunk@stusta.de>

[NET]: Fix BMSR_100{HALF,FULL}2 defines in linux/mii.h

Noticed by Matvejchikov Ilya.

Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Adrian Bunk <bunk@stusta.de>

[NETFILTER]: {ip,nf}_conntrack_sctp: fix remotely triggerable NULL ptr dereference (CVE-2007-2876)

When creating a new connection by sending an unknown chunk type, we don't
transition to a valid state, causing a NULL pointer dereference in
sctp_packet when accessing sctp_timeouts[SCTP_CONNTRACK_NONE].

Fix by don't creating new conntrack entry if initial state is invalid.

Noticed by Vilmos Nebehaj <vilmos.nebehaj@ramsys.hu>

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Adrian Bunk <bunk@stusta.de>

ntfs_init_locked_inode(): fix array indexing

Local variable `i' is a byte-counter. Don't use it as an index into an array
of le32's.

Reported-by: "young dave" <hidave.darkstar@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Adrian Bunk <bunk@stusta.de>

Linux 2.6.16.52

Linux 2.6.16.52-rc1

[NET_SCHED]: prio qdisc boundary condition

This fixes an out-of-boundary condition when the classified
band equals q->bands. Caught by Alexey

Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Adrian Bunk <bunk@stusta.de>

[IPV6]: Reverse sense of promisc tests in ip6_mc_input

Reverse the sense of the promiscuous-mode tests in ip6_mc_input().

Signed-off-by: Corey Mutter <crm-netdev@mutternet.com>
Signed-off-by: David L Stevens <dlstevens@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Adrian Bunk <bunk@stusta.de>

[IPV6]: Send ICMPv6 error on scope violations.

When an IPv6 router is forwarding a packet with a link-local scope source
address off-link, RFC 4007 requires it to send an ICMPv6 destination
unreachable with code 2 ("not neighbor"), but Linux doesn't. Fix below.

Signed-off-by: David L Stevens <dlstevens@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Adrian Bunk <bunk@stusta.de>

[TCP]: zero out rx_opt in tcp_disconnect()

When the server drops its connection, NFS client reconnects using the
same socket after disconnecting. If the new connection's SYN,ACK
doesn't contain the TCP timestamp option and the old connection's did,
tp->tcp_header_len is recomputed assuming no timestamp header but
tp->rx_opt.tstamp_ok remains set. Then tcp_build_and_update_options()
adds in a timestamp option past the end of the allocated TCP header,
overwriting TCP data, or when the data is in skb_shinfo(skb)->frags[],
overwriting skb_shinfo(skb) causing a crash soon after. (The issue was
debugged from such a crash.)

Similarly, wscale_ok and sack_ok also get set based on the SYN,ACK
packet but not reset on disconnect, since they are zeroed out at
initialization. The patch zeroes out the entire tp->rx_opt struct in
tcp_disconnect() to avoid this sort of problem.

Signed-off-by: Srinivas Aji <Aji_Srinivas@emc.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Adrian Bunk <bunk@stusta.de>

[NETPOLL]: Remove CONFIG_NETPOLL_RX

Get rid of the CONFIG_NETPOLL_RX option completely since all the
dependencies have been removed long ago...

Signed-off-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Acked-by: Jeff Garzik <jgarzik@pobox.com>
Acked-by: Matt Mackall <mpm@selenic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Adrian Bunk <bunk@stusta.de>

[NETPOLL]: Fix TX queue overflow in trapped mode.

CONFIG_NETPOLL_TRAP causes the TX queue controls to be completely bypassed in
the netpoll's "trapped" mode which easily causes overflows in the drivers with
short TX queues (most notably, in 8139too with its 4-deep queue). So, make
this option more sensible by making it only bypass the TX softirq wakeup.

Signed-off-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Acked-by: Jeff Garzik <jgarzik@pobox.com>
Acked-by: Tom Rini <trini@kernel.crashing.org>
Acked-by: Matt Mackall <mpm@selenic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Adrian Bunk <bunk@stusta.de>

[IPV6]: Track device renames in snmp6.

When network device's are renamed, the IPV6 snmp6 code
gets confused. It doesn't track name changes so it will OOPS
when network device's are removed.

The fix is trivial, just unregister/re-register in notify handler.

Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Adrian Bunk <bunk@stusta.de>

[IPV6]: Fix slab corruption running ip6sic

Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Adrian Bunk <bunk@stusta.de>

gcc-4.1.0 is bust

Keith says

Compiling 2.6.19-rc6 with gcc version 4.1.0 (SUSE Linux), wait_hpet_tick is
optimized away to a never ending loop and the kernel hangs on boot in timer
setup.

0000001a <wait_hpet_tick>:
  1a:   55                      push   %ebp
  1b:   89 e5                   mov    %esp,%ebp
  1d:   eb fe                   jmp    1d <wait_hpet_tick+0x3>

This is not a problem with gcc 3.3.5.  Adding barrier() calls to
wait_hpet_tick does not help, making the variables volatile does.

And the consensus is that gcc-4.1.0 is busted.

Adrian Bunk:
Changed from a #warning to an #error for 2.6.16.

Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Adrian Bunk <bunk@stusta.de>

Linux 2.6.16.51

Linux 2.6.16.51-rc1

[X.25]: Add missing sock_put in x25_receive_data

__x25_find_socket does a sock_hold.
This adds a missing sock_put in x25_receive_data.

Signed-off-by: Andrew Hendry <andrew.hendry@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Adrian Bunk <bunk@stusta.de>

[NETFILTER]: ipt_CLUSTERIP: fix oops in checkentry function

The clusterip_config_find_get() already increases entries reference
counter, so there is no reason to do it twice in checkentry() callback.

This causes the config to be freed before it is removed from the list,
resulting in a crash when adding the next rule.

Signed-off-by: Jaroslav Kysela <perex@suse.cz>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Adrian Bunk <bunk@stusta.de>

x86_64: ACPI_CPU_FREQ must select CPU_FREQ_TABLE

Fix a compile error reported by Michel Lespinasse.

Signed-off-by: Adrian Bunk <bunk@stusta.de>

hwmon/w83627ehf: Don't redefine REGION_OFFSET

On ia64, kernel headers define REGION_OFFSET so we can't use that.
Reported by Andrew Morton.

Signed-off-by: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Adrian Bunk <bunk@stusta.de>

[NETFILTER]: ip_nat_proto_gre: do not modify/corrupt GREv0 packets through NAT

While porting some changes of the 2.6.21-rc7 pptp/proto_gre conntrack
and nat modules to a 2.4.32 kernel I noticed that the gre_key function
returns a wrong pointer to the GRE key of a version 0 packet thus
corrupting the packet payload.

The intended behaviour for GREv0 packets is to act like
ip_conntrack_proto_generic/ip_nat_proto_unknown so I have ripped the
offending functions (not used anymore) and modified the
ip_nat_proto_gre modules to not touch version 0 (non PPTP) packets.

Signed-off-by: Jorge Boncompte <jorge@dti2.net>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Adrian Bunk <bunk@stusta.de>

holepunch: fix mmap_sem i_mutex deadlock

sys_madvise has down_write of mmap_sem, then madvise_remove calls
vmtruncate_range which takes i_mutex and i_alloc_sem: no, we can
easily devise deadlocks from that ordering.

madvise_remove drop mmap_sem while calling vmtruncate_range: luckily,
since madvise_remove doesn't split or merge vmas, it's easy to handle
this case with a NULL prev, without restructuring sys_madvise. (Though
sad to retake mmap_sem when it's unlikely to be needed, and certainly
down_read is sufficient for MADV_REMOVE, unlike the other madvices.)

Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Adrian Bunk <bunk@stusta.de>

holepunch: fix disconnected pages after second truncate

shmem_truncate_range has its own truncate_inode_pages_range, to free any
pages racily instantiated while it was in progress: a SHMEM_PAGEIN flag
is set when this might have happened. But holepunching gets no chance
to clear that flag at the start of vmtruncate_range, so it's always set
(unless a truncate came just before), so holepunch almost always does
this second truncate_inode_pages_range.

shmem holepunch has unlikely swap<->file races hereabouts whatever we do
(without a fuller rework than is fit for this release): I was going to
skip the second truncate in the punch_hole case, but Miklos points out
that would make holepunch correctness more vulnerable to swapoff. So
keep the second truncate, but follow it by an unmap_mapping_range to
eliminate the disconnected pages (freed from pagecache while still
mapped in userspace) that it might have left behind.

Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Adrian Bunk <bunk@stusta.de>

holepunch: fix shmem_truncate_range punch locking

Miklos Szeredi observes that during truncation of shmem page directories,
info->lock is released to improve latency (after lowering i_size and
next_index to exclude races); but this is quite wrong for holepunching,
which receives no such protection from i_size or next_index, and is left
vulnerable to races with shmem_unuse, shmem_getpage and shmem_writepage.

Hold info->lock throughout when holepunching?  No, any user could prevent
rescheduling for far too long.  Instead take info->lock just when needed:
in shmem_free_swp when removing the swap entries, and whenever removing
a directory page from the level above.  But so long as we remove before
scanning, we can safely skip taking the lock at the lower levels, except
at misaligned start and end of the hole.

Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Adrian Bunk <bunk@stusta.de>

holepunch: fix shmem_truncate_range punching too far

Miklos Szeredi observes BUG_ON(!entry) in shmem_writepage() triggered
in rare circumstances, because shmem_truncate_range() erroneously
removes partially truncated directory pages at the end of the range:
later reclaim on pages pointing to these removed directories triggers
the BUG. Indeed, and it can also cause data loss beyond the hole.

Fix this as in the patch proposed by Miklos, but distinguish between
"limit" (how far we need to search: ignore truncation's next_index
optimization in the holepunch case - if there are races it's more
consistent to act on the whole range specified) and "upper_limit"
(how far we can free directory pages: generally we must be careful
to keep partially punched pages, but can relax at end of file -
i_size being held stable by i_mutex).

Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Adrian Bunk <bunk@stusta.de>

Linux 2.6.16.50

Linux 2.6.16.50-rc1

[IPV6]: Disallow RH0 by default (CVE-2007-2242)

A security issue is emerging. Disallow Routing Header Type 0 by default
as we have been doing for IPv4.

This version already includes a fix for the original patch.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: Adrian Bunk <bunk@stusta.de>

[NETLINK]: Infinite recursion in netlink (CVE-2007-1861)

Reply to NETLINK_FIB_LOOKUP messages were misrouted back to kernel,
which resulted in infinite recursion and stack overflow.

The bug is present in all kernel versions since the feature appeared.

The patch also makes some minimal cleanup:

1. Return something consistent (-ENOENT) when fib table is missing
2. Do not crash when queue is empty (does not happen, but yet)
3. Put result of lookup

Sergey Vlasov:
Oops fix

Signed-off-by: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Signed-off-by: Sergey Vlasov <vsu@altlinux.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Adrian Bunk <bunk@stusta.de>

Char: icom, mark __init as __devinit

Two functions are called from __devinit context, but they are marked as
__init. Fix this.

Signed-off-by: Jiri Slaby <jirislaby@gmail.com>
Signed-off-by: Adrian Bunk <bunk@stusta.de>

aio: remove bare user-triggerable error printk

The user can generate console output if they cause do_mmap() to fail
during sys_io_setup(). This was seen in a regression test that does
exactly that by spinning calling mmap() until it gets -ENOMEM before
calling io_setup().

We don't need this printk at all, just remove it.

Signed-off-by: Zach Brown <zach.brown@oracle.com>
Signed-off-by: Adrian Bunk <bunk@stusta.de>

mca_nmi_hook() can be called at any point

... and having it __init is a bad idea.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Adrian Bunk <bunk@stusta.de>

IrDA: irttp_dup spin_lock initialisation

Without this initialization one gets

kernel BUG at kernel/rtmutex_common.h:80!

Signed-off-by: G. Liakhovetski <gl@dsa-ac.de>
Signed-off-by: Samuel Ortiz <samuel@sortiz.org>
Acked-by: David Miller <davem@davemloft.net>
Signed-off-by: Adrian Bunk <bunk@stusta.de>

IrDA: Incorrect TTP header reservation

We must reserve SAR + MAX_HEADER bytes for IrLMP to fit in.
This fixes an oops reported (and fixed) by Jeet Chaudhuri, when max_sdu_size
is greater than 0.

Signed-off-by: Samuel Ortiz <samuel@sortiz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Adrian Bunk <bunk@stusta.de>

x86 microcode: don't check the size

IA32 manual says if micorcode update's size is 0, then the size is
default size (2048 bytes). But this doesn't suggest all microcode
update's size should be above 2048 bytes to me. We actually had a
microcode update whose size is 1024 bytes. The patch just removed the
check.

Backported by Daniel Drake.

Signed-off-by: Daniel Drake <dsd@gentoo.org>
Signed-off-by: Adrian Bunk <bunk@stusta.de>

Linux 2.6.16.49

Linux 2.6.16.49-rc1

tty_io: fix race in master pty close/slave pty close path

This patch fixes a possible race that leads to double freeing an idr index.
When the master begin to close, release_dev() is called and then
pty_close() is called:

        if (tty->driver->close)
                tty->driver->close(tty, filp);

This is done without helding any locks other than BKL.  Inside pty_close(),
being a master close, the devpts entry will be removed:

#ifdef CONFIG_UNIX98_PTYS
                if (tty->driver == ptm_driver)
                        devpts_pty_kill(tty->index);
#endif

But devpts_pty_kill() will call get_node() that may sleep while waiting for
&devpts_root->d_inode->i_sem.  When this happens and the slave is being
opened, tty_open() just found the driver and index:

        driver = get_tty_driver(device, &index);
        if (!driver) {
                mutex_unlock(&tty_mutex);
                return -ENODEV;
        }

This part of the code is already protected under tty_mute.  The problem is
that the slave close already got an index.  Then init_dev() is called and
blocks waiting for the same &devpts_root->d_inode->i_sem.

When the master close resumes, it removes the devpts entry, and the
relation between idr index and the tty is gone.  The master then sleeps
waiting for the tty_mutex on release_dev().

Slave open resumes and found no tty for that index.  As result, a NULL tty
is returned and init_dev() doesn't flow to fast_track:

        /* check whether we're reopening an existing tty */
        if (driver->flags & TTY_DRIVER_DEVPTS_MEM) {
                tty = devpts_get_tty(idx);
                if (tty && driver->subtype == PTY_TYPE_MASTER)
                        tty = tty->link;
        } else {
                tty = driver->ttys[idx];
        }
        if (tty) goto fast_track;

The result of this, is that a new tty will be created and init_dev() returns
sucessfull. After returning, tty_mutex is dropped and master close may resume.

Master close finds it's the only use and both sides are closing, then releases
the tty and the index. At this point, the idr index is free, but slave still
has it.

Slave open then calls pty_open() and finds that tty->link->count is 0,
because there's no master and returns error.  Then tty_open() calls
release_dev() which executes without any warning, as it was a case of last
slave close when the master is already closed (master->count == 0,
slave->count == 1).  The tty is then released with the already released idr
index.

This normally would only issue a warning on idr_remove() but in case of a
customer's critical application, it's never too simple:

thread1: opens master, gets index X
thread1: begin closing master
thread2: begin opening slave with index X
thread1: finishes closing master, index X released
thread3: opens master, gets index X, just released
thread2: fails opening slave, releases index X         <----
thread4: opens master, gets index X, init_dev() then find an already in use
         and healthy tty and fails

If no more indexes are released, ptmx_open() will keep failing, as the
first free index available is X, and it will make init_dev() fail because
you're trying to "reopen a master" which isn't valid.

The patch notices when this race happens and make init_dev() fail
imediately.  The init_dev() function is called with tty_mutex held, so it's
safe to continue with tty till the end of function because release_dev()
won't make any further changes without grabbing the tty_mutex.

Without the patch, on some machines it's possible get easily idr warnings
like this one:

idr_remove called for id=15 which is not allocated.
[<c02555b9>] idr_remove+0x139/0x170
[<c02a1b62>] release_mem+0x182/0x230
[<c02a28e7>] release_dev+0x4b7/0x700
[<c02a0ea7>] tty_ldisc_enable+0x27/0x30
[<c02a1e64>] init_dev+0x254/0x580
[<c02a0d64>] check_tty_count+0x14/0xb0
[<c02a4f05>] tty_open+0x1c5/0x340
[<c02a4d40>] tty_open+0x0/0x340
[<c017388f>] chrdev_open+0xaf/0x180
[<c017c2ac>] open_namei+0x8c/0x760
[<c01737e0>] chrdev_open+0x0/0x180
[<c0167bc9>] __dentry_open+0xc9/0x210
[<c0167e2c>] do_filp_open+0x5c/0x70
[<c0167a91>] get_unused_fd+0x61/0xd0
[<c0167e93>] do_sys_open+0x53/0x100
[<c0167f97>] sys_open+0x27/0x30
[<c010303b>] syscall_call+0x7/0xb

using this test application available on:
http://www.ruivo.org/~aris/pty_sodomizer.c

Signed-off-by: Aristeu Sergio Rozanski Filho <aris@ruivo.org>
Signed-off-by: Adrian Bunk <bunk@stusta.de>

elevator: move clearing of unplug flag earlier

A flag was recently added to the elevator code to avoid
performing an unplug when reuests are being re-queued.
The goal of this flag was to avoid a deep recursion that
can occur when re-queueing requests after a SCSI device/host
reset. See http://lkml.org/lkml/2006/5/17/254

However, that fix added the flag near the bottom of a case
statement, where an earlier break (in an if statement) could
transport one out of the case, without setting the flag.
This patch sets the flag earlier in the case statement.

I re-discovered the deep recursion recently during testing;
I was told that it was a known problem, and the fix to it was
in the kernel I was testing. Indeed it was ... but it didn't
fix the bug. With the patch below, I no longer see the bug.

Signed-off by: Linas Vepstas <linas@austin.ibm.com>
Signed-off-by: Jens Axboe <axboe@suse.de>
Signed-off-by: Adrian Bunk <bunk@stusta.de>

start_kernel: test if irq's got enabled early, barf, and disable them again

The calls made by parse_parms to other initialization code might enable
interrupts again way too early.

Having interrupts on this early can make systems PANIC when they initialize
the IRQ controllers (which happens later in the code). This patch detects
that irq's are enabled again, barfs about it and disables them again as a
safety net.

[akpm@osdl.org: cleanups]
Signed-off-by: Ard van Breemen <ard@telegraafnet.nl>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Adrian Bunk <bunk@stusta.de>

[IrDA]: Correctly handling socket error

This patch fixes an oops first reported in mid 2006 - see
http://lkml.org/lkml/2006/8/29/358 The cause of this bug report is that
when an error is signalled on the socket, irda_recvmsg_stream returns
without removing a local wait_queue variable from the socket's sk_sleep
queue. This causes havoc further down the road.

In response to this problem, a patch was made that invoked sock_orphan on
the socket when receiving a disconnect indication. This is not a good fix,
as this sets sk_sleep to NULL, causing applications sleeping in recvmsg
(and other places) to oops.

Signed-off-by: Olaf Kirch <olaf.kirch@oracle.com>
Signed-off-by: Samuel Ortiz <samuel@sortiz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Adrian Bunk <bunk@stusta.de>

hwmon/w83627ehf: Fix the fan5 clock divider write

Users have been complaining about the w83627ehf driver flooding their logs
with debug messages like:

w83627ehf 9191-0a10: Increasing fan 4 clock divider from 64 to 128

or:

w83627ehf 9191-0290: Increasing fan 4 clock divider from 4 to 8

The reason is that we failed to actually write the LSB of the encoded clock
divider value for that fan, causing the next read to report the same old value
again and again.

Additionally, the fan number was improperly reported, making the bug harder to
find.

Signed-off-by: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Adrian Bunk <bunk@stusta.de>

[NET]: Fix UDP checksum issue in net poll mode.

In net poll mode, the current checksum function doesn't consider the
kind of packet which is padded to reach a specific minimum length. I
believe that's the problem causing my test case failed. The following
patch fixed this issue.

Signed-off-by: Aubrey Li <aubreylee@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Adrian Bunk <bunk@stusta.de>

[SPARC64]: Fix inline directive in pci_iommu.c

While building a test kernel for the new esp driver (against
git-current), I hit this bug. Trivial fix, put the inline declaration
in the right place. :)

Signed-off-by: Tom Callaway <tcallawa@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Adrian Bunk <bunk@stusta.de>

[SPARC64]: Fix arg passing to compat_sys_ipc().

Do not sign extend args using the sys32_ipc stub, that is
buggy and unnecessary.

Based upon an excellent report by Mikael Pettersson.

Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Adrian Bunk <bunk@stusta.de>

[SPARC64]: Fix SBUS IOMMU allocation code.

There are several IOMMU allocator bugs. Instead of trying to fix this
overly complicated code, just mirror the PCI IOMMU arena allocator
which is very stable and well stress tested.

I tried to make the code as identical as possible so we can switch
sun4u PCI and SBUS over to a common piece of IOMMU code. All that
will be need are two callbacks, one to do a full IOMMU flush and one
to do a streaming buffer flush.

This patch gets rid of a lot of hangs and mysterious crashes on SBUS
sparc64 systems, at least for me.

Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Adrian Bunk <bunk@stusta.de>

[SCSI] QLOGICPTI: Do not unmap DMA unless we actually mapped something.

We only map DMA when cmd->request_bufflen is non-zero for non-sg
buffers, we thus should make the same check when unmapping.

Based upon a report from Pasi Pirhonen.

Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Adrian Bunk <bunk@stusta.de>

Linux 2.6.16.48

Linux 2.6.16.48-rc1

[NET_SCHED]: cls_tcindex: fix compatibility breakage

Userspace uses an integer for TCA_TCINDEX_SHIFT, the kernel was changed
to expect and use a u16 value in 2.6.11, which broke compatibility on
big endian machines. Change back to use int.

Reported by Ole Reinartz <ole.reinartz@gmx.de>

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Adrian Bunk <bunk@stusta.de>

[IPSEC]: Reject packets within replay window but outside the bit mask

Up until this point we've accepted replay window settings greater than
32 but our bit mask can only accomodate 32 packets. Thus any packet
with a sequence number within the window but outside the bit mask would
be accepted.

This patch causes those packets to be rejected instead.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Adrian Bunk <bunk@stusta.de>

[TCP]: Do receiver-side SWS avoidance for rcvbuf < MSS.

Signed-off-by: John Heffner <jheffner@psc.edu>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Adrian Bunk <bunk@stusta.de>

[IPv6]: Fix incorrect length check in rawv6_sendmsg()

In article <20070329.142644.70222545.davem@davemloft.net> (at Thu, 29 Mar 2007 14:26:44 -0700 (PDT)), David Miller <davem@davemloft.net> says:

> From: Sridhar Samudrala <sri@us.ibm.com>
> Date: Thu, 29 Mar 2007 14:17:28 -0700
>
> > The check for length in rawv6_sendmsg() is incorrect.
> > As len is an unsigned int, (len < 0) will never be TRUE.
> > I think checking for IPV6_MAXPLEN(65535) is better.
> >
> > Is it possible to send ipv6 jumbo packets using raw
> > sockets? If so, we can remove this check.
>
> I don't see why such a limitation against jumbo would exist,
> does anyone else?
>
> Thanks for catching this Sridhar. A good compiler should simply
> fail to compile "if (x < 0)" when 'x' is an unsigned type, don't
> you think :-)

Dave, we use "int" for returning value,
so we should fix this anyway, IMHO;
we should not allow len > INT_MAX.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Acked-by: Sridhar Samudrala <sri@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Adrian Bunk <bunk@stusta.de>

[NET_SCHED]: cls_basic: fix memory leak in basic_destroy

tp->root is not freed on destruction.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Adrian Bunk <bunk@stusta.de>

V4L/DVB: Pluto2: fix incorrect TSCR register setting

The ADEF bits in the TSCR register have different meanings in read and
write mode. For this reason ADEF has to be reset on every
read-modify-write operation.
This patch introduces a special write function for this register, which
takes care of it.

Thanks to Holger Magnussen for pointing my nose at this problem.

Signed-off-by: Andreas Oberritter <obi@linuxtv.org>
Signed-off-by: Adrian Bunk <bunk@stusta.de>

V4L: saa7146: Fix allocation of clipping memory

Olaf Hering pointed out that SAA7146_CLIPPING_MEM would become
very large for PAGE_SIZE > 4K.

In fact, the number of clipping windows is limited to 16,
and calculate_clipping_registers_rect() does not use more
than 256 bytes. SAA7146_CLIPPING_MEM adjusted accordingly.

(cherry picked from commit 7a7cd1920969dd9da4e0d99aab573b3eba24c799)

Thanks-to: Olaf Hering <olaf@aepfle.de>
Signed-off-by: Oliver Endriss <o.endriss@gmx.de>
Signed-off-by: Michael Krufky <mkrufky@linuxtv.org>
Signed-off-by: Adrian Bunk <bunk@stusta.de>

V4L: radio: Fix error in Kbuild file

All the radio drivers need video_dev, but they were depending on
VIDEO_DEV!=n. That meant that one could try to compile the driver into
the kernel when VIDEO_DEV=m, which will not work. If video_dev is a
module, then the radio drivers must be modules too.

(cherry picked from commit b10fece583fdfdb3d2f29b0da3896ec58b8fe122)

Signed-off-by: Trent Piepho <xyzzy@speakeasy.org>
Signed-off-by: Michael Krufky <mkrufky@linuxtv.org>
Signed-off-by: Adrian Bunk <bunk@stusta.de>

V4L: tveeprom: autodetect LG TAPC G701D as tuner type 37

Autodetect LG TAPC G701D as tuner type 37, fixing
mis-detected tuners in some Hauppauge tv tuner cards.

Thanks to Adonis Papas, for pointing this out.

(cherry picked from commit 1323fbda1343f50f198bc8bd6d1d59c8b7fc45bf)

Signed-off-by: Michael Krufky <mkrufky@linuxtv.org>
Signed-off-by: Adrian Bunk <bunk@stusta.de>

sky2: turn on clocks when doing resume

Some of these chips are disabled until clock is enabled.
This fixes:
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=404107

Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
Signed-off-by: Adrian Bunk <bunk@stusta.de>

sky2: turn carrier off when down

Driver needs to turn off carrier when down.

Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
Signed-off-by: Adrian Bunk <bunk@stusta.de>

skge: turn carrier off when down

Driver needs to turn off carrier when down, otherwise it can
confuse bonding and bridging and looks like carrier is on immediately
when it is brought back up.

Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
Signed-off-by: Adrian Bunk <bunk@stusta.de>