]> git.kernelconcepts.de Git - karo-tx-linux.git/log
karo-tx-linux.git
15 years agoWATCHDOG: rc32434_wdt: fix watchdog driver
Phil Sutter [Sun, 8 Feb 2009 15:44:42 +0000 (16:44 +0100)]
WATCHDOG: rc32434_wdt: fix watchdog driver

commit 0af98d37e85e6958eb84987b1f60da3b54008317 upstream.

The existing driver code wasn't working. Neither the timeout was set
correctly, nor system reset was being triggered, as the driver seemed
to keep the WDT alive himself. There was also some unnecessary code.

Signed-off-by: Phil Sutter <n0-1@freewrt.org>
Signed-off-by: Wim Van Sebroeck <wim@iguana.be>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
15 years agoWATCHDOG: ks8695_wdt.c: 'CLOCK_TICK_RATE' undeclared
Alexey Dobriyan [Thu, 12 Feb 2009 10:42:41 +0000 (13:42 +0300)]
WATCHDOG: ks8695_wdt.c: 'CLOCK_TICK_RATE' undeclared

commit b02c387892fc6b3cc59c78ab2f79413d55f50190 upstream.

On arm-acs5k_tiny:

drivers/watchdog/ks8695_wdt.c:68: error: 'CLOCK_TICK_RATE' undeclared
(first use in this function)

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Wim Van Sebroeck <wim@iguana.be>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
15 years agortl8187: New USB ID's for RTL8187L
Larry Finger [Tue, 17 Feb 2009 20:31:12 +0000 (14:31 -0600)]
rtl8187: New USB ID's for RTL8187L

commit 046ee5d26ac91316a8ac0a29c0b33139dc9da20d upstream.

Add new USB ID codes. These come from two postings on forums and
mailing lists, and four are derived from the .inf that accompanies
the latest Realtek Windows driver for the RTL8187L.

Thanks to Viktor Ilijašić <viktor.ilijasic@gmail.com> and Xose Vazquez
Perez <xose.vazquez@gmail.com> for reporting these new ID's.

Signed-off-by: Larry Finger <Larry.Finger@lwfinger.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
15 years agoUSB: cdc-acm: add usb id for motomagx phones
Dmitriy Taychenachev [Wed, 25 Feb 2009 04:36:51 +0000 (12:36 +0800)]
USB: cdc-acm: add usb id for motomagx phones

commit 155df65ae11dfc322214c6f887185929c809df1b upstream.

The Motorola MOTOMAGX phones (Z6, E8, Zn5 so far) are providing
combined ACM/BLAN USB configuration. Since it has Vendor Specific
class, the corresponding drivers (cdc-acm, zaurus) can't find it just
by interface info. This patch adds usb id so the cdc-acm driver can
properly handle this combined device.

Signed-off-by: Dmitriy Taychenachev <dimichxp@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
15 years agoUSB: usb-storage: add IGNORE_RESIDUE flag for Genesys Logic adapters
Alan Stern [Mon, 23 Feb 2009 17:02:05 +0000 (12:02 -0500)]
USB: usb-storage: add IGNORE_RESIDUE flag for Genesys Logic adapters

commit 5126a2674ddac0804450f59da25a058cca629d38 upstream.

This patch (as1219) adds the IGNORE_RESIDUE flag to the unusual_devs
entries for Genesys Logic's USB-IDE adapter.  Although this device
usually gets the residue correct, there is one command crucial to the
operation of CD and DVD drives which it messes up.

Tested-by: Mike Lampard <mike@mtgambier.net>
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
15 years agoUSB: usb_get_string should check the descriptor type
Alan Stern [Fri, 20 Feb 2009 21:33:08 +0000 (16:33 -0500)]
USB: usb_get_string should check the descriptor type

commit 67f5a4ba9741fcef3f4db3509ad03565d9e33af2 upstream.

This patch (as1218) fixes a problem with a radio-control joystick used
in the "walkera 4#3" helicopter.  This device responds to the initial
Get-String-Descriptor request for string 0 (which is really the list
of supported languages) by sending its config descriptor!  The
usb_get_string() routine needs to check whether it got the right
type of descriptor.

Oddly enough, this sort of check is already present in
usb_get_descriptor().  The patch changes the error code from -EPROTO
to -ENODATA, because -EPROTO shows up in so many other contexts to
indicate a hardware failure rather than a firmware error.

Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Tested-by: Guillermo Jarabo <williamjap@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
15 years agoSCSI: sd: revive sd_index_lock
Tejun Heo [Sat, 21 Feb 2009 02:04:45 +0000 (11:04 +0900)]
SCSI: sd: revive sd_index_lock

commit 4034cc68157bfa0b6622efe368488d3d3e20f4e6 upstream.

Commit f27bac2761cab5a2e212dea602d22457a9aa6943 which converted sd to
use ida instead of idr incorrectly removed sd_index_lock around id
allocation and free.  idr/ida do have internal locks but they protect
their free object lists not the allocation itself.  The caller is
responsible for that.  This missing synchronization led to the same id
being assigned to multiple devices leading to oops.

Reported and tracked down by Stuart Hayes of Dell.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
15 years agoJFFS2: fix mount crash caused by removed nodes
Thomas Gleixner [Mon, 16 Feb 2009 20:29:31 +0000 (21:29 +0100)]
JFFS2: fix mount crash caused by removed nodes

commit 4c41bd0ec953954158f92bed5d3062645062b98e upstream.

At scan time we observed following scenario:

   node A inserted
   node B inserted
   node C inserted -> sets overlapped flag on node B

   node A is removed due to CRC failure -> overlapped flag on node B remains

   while (tn->overlapped)
     tn = tn_prev(tn);

   ==> crash, when tn_prev(B) is referenced.

When the ultimate node is removed at scan time and the overlapped flag
is set on the penultimate node, then nothing updates the overlapped
flag of that node. The overlapped iterators blindly expect that the
ultimate node does not have the overlapped flag set, which causes the
scan code to crash.

It would be a huge overhead to go through the node chain on node
removal and fix up the overlapped flags, so detecting such a case on
the fly in the overlapped iterators is a simpler and reliable
solution.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
15 years agoSCSI: hptiop: Add new PCI device ID
HighPoint Linux Team [Thu, 12 Feb 2009 03:28:31 +0000 (11:28 +0800)]
SCSI: hptiop: Add new PCI device ID

commit b73a77494292b930642fbf87de3e3196593f7593 upstream.

Signed-off-by: HighPoint Linux Team <linux@highpoint-tech.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
15 years agoPCI quirk: enable MSI on 8132
Yinghai Lu [Wed, 18 Feb 2009 04:40:09 +0000 (20:40 -0800)]
PCI quirk: enable MSI on 8132

commit e0ae4f5503235ba4449ffb5bcb4189edcef4d584 upstream.

David reported that LSI SAS doesn't work with MSI.  It turns out that
his BIOS doesn't enable it, but the HT MSI 8132 does support HT MSI.
Add quirk to enable it

Reported-by: David Lang <david@lang.hm>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
15 years agomm: vmap fix overflow
Nick Piggin [Fri, 27 Feb 2009 22:03:03 +0000 (14:03 -0800)]
mm: vmap fix overflow

commit 7766970cc13e9071b356b1f2a48a9eb8675bfcce upstream.

The new vmap allocator can wrap the address and get confused in the case
of large allocations or VMALLOC_END near the end of address space.

Problem reported by Christoph Hellwig on a 32-bit XFS workload.

Signed-off-by: Nick Piggin <npiggin@suse.de>
Reported-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
15 years agomm: fix lazy vmap purging (use-after-free error)
Vegard Nossum [Fri, 27 Feb 2009 22:03:04 +0000 (14:03 -0800)]
mm: fix lazy vmap purging (use-after-free error)

commit cbb766766f3f2f6d9326c561b1020590642c6e39 upstream.

I just got this new warning from kmemcheck:

    WARNING: kmemcheck: Caught 32-bit read from freed memory (c7806a60)
    a06a80c7ecde70c1a04080c700000000a06709c1000000000000000000000000
     f f f f f f f f f f f f f f f f f f f f f f f f f f f f f f f f
     ^

    Pid: 0, comm: swapper Not tainted (2.6.29-rc4 #230)
    EIP: 0060:[<c1096df7>] EFLAGS: 00000286 CPU: 0
    EIP is at __purge_vmap_area_lazy+0x117/0x140
    EAX: 00070f43 EBX: c7806a40 ECX: c1677080 EDX: 00027b66
    ESI: 00002001 EDI: c170df0c EBP: c170df00 ESP: c178830c
     DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
    CR0: 80050033 CR2: c7806b14 CR3: 01775000 CR4: 00000690
    DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
    DR6: 00004000 DR7: 00000000
     [<c1096f3e>] free_unmap_vmap_area_noflush+0x6e/0x70
     [<c1096f6a>] remove_vm_area+0x2a/0x70
     [<c1097025>] __vunmap+0x45/0xe0
     [<c10970de>] vunmap+0x1e/0x30
     [<c1008ba5>] text_poke+0x95/0x150
     [<c1008ca9>] alternatives_smp_unlock+0x49/0x60
     [<c171ef47>] alternative_instructions+0x11b/0x124
     [<c171f991>] check_bugs+0xbd/0xdc
     [<c17148c5>] start_kernel+0x2ed/0x360
     [<c171409e>] __init_begin+0x9e/0xa9
     [<ffffffff>] 0xffffffff

It happened here:

    $ addr2line -e vmlinux -i c1096df7
    mm/vmalloc.c:540

Code:

list_for_each_entry(va, &valist, purge_list)
__free_vmap_area(va);

It's this instruction:

    mov    0x20(%ebx),%edx

Which corresponds to a dereference of va->purge_list.next:

    (gdb) p ((struct vmap_area *) 0)->purge_list.next
    Cannot access memory at address 0x20

It seems that we should use "safe" list traversal here, as the element
is freed inside the loop. Please verify that this is the right fix.

Acked-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Vegard Nossum <vegard.nossum@gmail.com>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
15 years agoFix oops in cifs_strfromUCS_le mounting to servers which do not specify their OS
Steve French [Tue, 17 Feb 2009 01:29:40 +0000 (01:29 +0000)]
Fix oops in cifs_strfromUCS_le mounting to servers which do not specify their OS

commit 69765529d701c838df19ea1f5ad2f33a528261ae upstream.

Fixes kernel bug #10451 http://bugzilla.kernel.org/show_bug.cgi?id=10451

Certain NAS appliances do not set the operating system or network operating system
fields in the session setup response on the wire.  cifs was oopsing on the unexpected
zero length response fields (when trying to null terminate a zero length field).

This fixes the oops.

Acked-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
15 years agomm: fix memmap init for handling memory hole
KAMEZAWA Hiroyuki [Wed, 18 Feb 2009 22:48:33 +0000 (14:48 -0800)]
mm: fix memmap init for handling memory hole

commit cc2559bccc72767cb446f79b071d96c30c26439b upstream.

Now, early_pfn_in_nid(PFN, NID) may returns false if PFN is a hole.
and memmap initialization was not done. This was a trouble for
sparc boot.

To fix this, the PFN should be initialized and marked as PG_reserved.
This patch changes early_pfn_in_nid() return true if PFN is a hole.

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Reported-by: David Miller <davem@davemlloft.net>
Tested-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
15 years agomm: clean up for early_pfn_to_nid()
KAMEZAWA Hiroyuki [Wed, 18 Feb 2009 22:48:32 +0000 (14:48 -0800)]
mm: clean up for early_pfn_to_nid()

commit f2dbcfa738368c8a40d4a5f0b65dc9879577cb21 upstream.

What's happening is that the assertion in mm/page_alloc.c:move_freepages()
is triggering:

BUG_ON(page_zone(start_page) != page_zone(end_page));

Once I knew this is what was happening, I added some annotations:

if (unlikely(page_zone(start_page) != page_zone(end_page))) {
printk(KERN_ERR "move_freepages: Bogus zones: "
       "start_page[%p] end_page[%p] zone[%p]\n",
       start_page, end_page, zone);
printk(KERN_ERR "move_freepages: "
       "start_zone[%p] end_zone[%p]\n",
       page_zone(start_page), page_zone(end_page));
printk(KERN_ERR "move_freepages: "
       "start_pfn[0x%lx] end_pfn[0x%lx]\n",
       page_to_pfn(start_page), page_to_pfn(end_page));
printk(KERN_ERR "move_freepages: "
       "start_nid[%d] end_nid[%d]\n",
       page_to_nid(start_page), page_to_nid(end_page));
 ...

And here's what I got:

move_freepages: Bogus zones: start_page[2207d0000] end_page[2207dffc0] zone[fffff8103effcb00]
move_freepages: start_zone[fffff8103effcb00] end_zone[fffff8003fffeb00]
move_freepages: start_pfn[0x81f600] end_pfn[0x81f7ff]
move_freepages: start_nid[1] end_nid[0]

My memory layout on this box is:

[    0.000000] Zone PFN ranges:
[    0.000000]   Normal   0x00000000 -> 0x0081ff5d
[    0.000000] Movable zone start PFN for each node
[    0.000000] early_node_map[8] active PFN ranges
[    0.000000]     0: 0x00000000 -> 0x00020000
[    0.000000]     1: 0x00800000 -> 0x0081f7ff
[    0.000000]     1: 0x0081f800 -> 0x0081fe50
[    0.000000]     1: 0x0081fed1 -> 0x0081fed8
[    0.000000]     1: 0x0081feda -> 0x0081fedb
[    0.000000]     1: 0x0081fedd -> 0x0081fee5
[    0.000000]     1: 0x0081fee7 -> 0x0081ff51
[    0.000000]     1: 0x0081ff59 -> 0x0081ff5d

So it's a block move in that 0x81f600-->0x81f7ff region which triggers
the problem.

This patch:

Declaration of early_pfn_to_nid() is scattered over per-arch include
files, and it seems it's complicated to know when the declaration is used.
 I think it makes fix-for-memmap-init not easy.

This patch moves all declaration to include/linux/mm.h

After this,
  if !CONFIG_NODES_POPULATES_NODE_MAP && !CONFIG_HAVE_ARCH_EARLY_PFN_TO_NID
     -> Use static definition in include/linux/mm.h
  else if !CONFIG_HAVE_ARCH_EARLY_PFN_TO_NID
     -> Use generic definition in mm/page_alloc.c
  else
     -> per-arch back end function will be called.

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Tested-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Reported-by: David Miller <davem@davemlloft.net>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
15 years agoaoe: ignore vendor extension AoE responses
Ed Cashin [Wed, 18 Feb 2009 22:48:13 +0000 (14:48 -0800)]
aoe: ignore vendor extension AoE responses

commit b6d6c5175809934e04a606d9193ef04924a7a7d9 upstream.

The Welland ME-747K-SI AoE target generates unsolicited AoE responses that
are marked as vendor extensions.  Instead of ignoring these packets, the
aoe driver was generating kernel messages for each unrecognized response
received.  This patch corrects the behavior.

Signed-off-by: Ed Cashin <ecashin@coraid.com>
Reported-by: <karaluh@karaluh.pl>
Tested-by: <karaluh@karaluh.pl>
Cc: Alex Buell <alex.buell@munted.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
15 years agotimerfd: add flags check
Davide Libenzi [Wed, 18 Feb 2009 22:48:18 +0000 (14:48 -0800)]
timerfd: add flags check

commit 610d18f4128ebbd88845d0fc60cce67b49af881e upstream.

As requested by Michael, add a missing check for valid flags in
timerfd_settime(), and make it return EINVAL in case some extra bits are
set.

Michael said:
If this is to be any use to userland apps that want to check flag
support (perhaps it is too late already), then the sooner we get it
into the kernel the better: 2.6.29 would be good; earlier stables as
well would be even better.

[akpm@linux-foundation.org: remove unused TFD_FLAGS_SET]
Acked-by: Michael Kerrisk <mtk.manpages@gmail.com>
Signed-off-by: Davide Libenzi <davidel@xmailserver.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
15 years agovt: Declare PIO_CMAP/GIO_CMAP as compatbile ioctls.
Bill Nottingham [Wed, 18 Feb 2009 22:48:39 +0000 (14:48 -0800)]
vt: Declare PIO_CMAP/GIO_CMAP as compatbile ioctls.

commit 2db69a9340da12a4db44edb7506dd68799aeff55 upstream.

Otherwise, these don't work when called from 32-bit userspace on 64-bit
kernels.

Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
15 years agoseq_file: properly cope with pread
Eric Biederman [Wed, 18 Feb 2009 22:48:16 +0000 (14:48 -0800)]
seq_file: properly cope with pread

commit 8f19d472935c83d823fa4cf02bcc0a7b9952db30 upstream.

Currently seq_read assumes that the offset passed to it is always the
offset it passed to user space.  In the case pread this assumption is
broken and we do the wrong thing when presented with pread.

To solve this I introduce an offset cache inside of struct seq_file so we
know where our logical file position is.  Then in seq_read if we try to
read from another offset we reset our data structures and attempt to go to
the offset user space wanted.

[akpm@linux-foundation.org: restore FMODE_PWRITE]
[pjt@google.com: seq_open needs its fmode opened up to take advantage of this]
Signed-off-by: Eric Biederman <ebiederm@xmission.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Paul Turner <pjt@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
15 years agovfs: separate FMODE_PREAD/FMODE_PWRITE into separate flags
Paul Turner [Wed, 18 Feb 2009 22:48:15 +0000 (14:48 -0800)]
vfs: separate FMODE_PREAD/FMODE_PWRITE into separate flags

commit 55ec82176eca52e4e0530a82a0eb59160a1a95a1 upstream.

Separate FMODE_PREAD and FMODE_PWRITE into separate flags to reflect the
reality that the read and write paths may have independent restrictions.

A git grep verifies that these flags are always cleared together so this
new behavior will only apply to interfaces that change to clear flags
individually.

This is required for "seq_file: properly cope with pread", a post-2.6.25
regression fix.

[akpm@linux-foundation.org: add comment]
Signed-off-by: Paul Turner <pjt@google.com>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
15 years agosparc64: Fix DAX handling via userspace access from kernel.
David S. Miller [Tue, 20 Jan 2009 06:56:51 +0000 (22:56 -0800)]
sparc64: Fix DAX handling via userspace access from kernel.

[ Upstream commit fcd26f7ae2ea5889134e8b3d60a42ce8b993c95f ]

If we do a userspace access from kernel mode, and get a
data access exception, we need to check the exception
table just like a normal fault does.

The spitfire DAX handler was doing this, but such logic
was missing from the sun4v DAX code.

Reported-by: Dennis Gilmore <dgilmore@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
15 years agosparc64: Fix crashes in jbusmc_print_dimm()
David S. Miller [Wed, 11 Feb 2009 08:54:07 +0000 (00:54 -0800)]
sparc64: Fix crashes in jbusmc_print_dimm()

[ Upstream commit 1b0e235cc9bfae4bc0f5cd0cba929206fb0f6a64 ]

Return was missing for the case where there is no dimm
info match.

Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
15 years agonet: Kill skb_truesize_check(), it only catches false-positives.
David S. Miller [Wed, 18 Feb 2009 05:24:05 +0000 (21:24 -0800)]
net: Kill skb_truesize_check(), it only catches false-positives.

[ Upstream commit 92a0acce186cde8ead56c6915d9479773673ea1a ]

A long time ago we had bugs, primarily in TCP, where we would modify
skb->truesize (for TSO queue collapsing) in ways which would corrupt
the socket memory accounting.

skb_truesize_check() was added in order to try and catch this error
more systematically.

However this debugging check has morphed into a Frankenstein of sorts
and these days it does nothing other than catch false-positives.

Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
15 years agonet: amend the fix for SO_BSDCOMPAT gsopt infoleak
Eugene Teo [Mon, 23 Feb 2009 23:38:41 +0000 (15:38 -0800)]
net: amend the fix for SO_BSDCOMPAT gsopt infoleak

[ Upstream commit 50fee1dec5d71b8a14c1b82f2f42e16adc227f8b ]

The fix for CVE-2009-0676 (upstream commit df0bca04) is incomplete. Note
that the same problem of leaking kernel memory will reappear if someone
on some architecture uses struct timeval with some internal padding (for
example tv_sec 64-bit and tv_usec 32-bit) --- then, you are going to
leak the padded bytes to userspace.

Signed-off-by: Eugene Teo <eugeneteo@kernel.sg>
Reported-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
15 years agoLinux 2.6.28.7 v2.6.28.7
Greg Kroah-Hartman [Fri, 20 Feb 2009 22:41:27 +0000 (14:41 -0800)]
Linux 2.6.28.7

15 years agoFix longstanding "error: storage size of '__mod_dmi_device_table' isn't known"
Alexey Dobriyan [Wed, 21 Jan 2009 22:58:36 +0000 (01:58 +0300)]
Fix longstanding "error: storage size of '__mod_dmi_device_table' isn't known"

commit 40413dcb7b273bda681dca38e6ff0bbb3728ef11 upstream.

gcc 3.4.6 doesn't like MODULE_DEVICE_TABLE(dmi, x) expansion enough to
error out.  Shut it up in a most simple way.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
15 years agoext4: Initialize the new group descriptor when resizing the filesystem
Theodore Ts'o [Tue, 17 Feb 2009 15:32:42 +0000 (10:32 -0500)]
ext4: Initialize the new group descriptor when resizing the filesystem

(cherry picked from commit fdff73f094e7220602cc3f8959c7230517976412)

Make sure all of the fields of the group descriptor are properly
initialized.  Previously, we allowed bg_flags field to be contain
random garbage, which could trigger non-deterministic behavior,
including a kernel OOPS.

http://bugzilla.kernel.org/show_bug.cgi?id=12433

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
15 years agojbd2: On a __journal_expect() assertion failure printk "JBD2", not "EXT3-fs"
Theodore Ts'o [Tue, 17 Feb 2009 15:32:41 +0000 (10:32 -0500)]
jbd2: On a __journal_expect() assertion failure printk "JBD2", not "EXT3-fs"

(cherry picked from commit 08ec8c3878cea0bf91f2ba3c0badf44b383752d0)

Otherwise it can be very confusing to find a "EXT3-fs: " failure in
the middle of EXT4-fs failures, and it makes it harder to track the
source of the failure.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
15 years agoext4: Add sanity check to make_indexed_dir
Theodore Ts'o [Tue, 17 Feb 2009 15:32:40 +0000 (10:32 -0500)]
ext4: Add sanity check to make_indexed_dir

(cherry picked from commit e6b8bc09ba2075cd91fbffefcd2778b1a00bd76f)

Make sure the rec_len field in the '..' entry is sane, lest we overrun
the directory block and cause a kernel oops on a purposefully
corrupted filesystem.

Thanks to Sami Liedes for reporting this bug.

http://bugzilla.kernel.org/show_bug.cgi?id=12430

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
15 years agoext4: only use i_size_high for regular files
Theodore Ts'o [Tue, 17 Feb 2009 15:32:39 +0000 (10:32 -0500)]
ext4: only use i_size_high for regular files

(cherry picked from commit 06a279d636734da32bb62dd2f7b0ade666f65d7c)

Directories are not allowed to be bigger than 2GB, so don't use
i_size_high for anything other than regular files.  E2fsck should
complain about these inodes, but the simplest thing to do for the
kernel is to only use i_size_high for regular files.

This prevents an intentially corrupted filesystem from causing the
kernel to burn a huge amount of CPU and issuing error messages such
as:

EXT4-fs warning (device loop0): ext4_block_to_path: block 135090028 > max

Thanks to David Maciejak from Fortinet's FortiGuard Global Security
Research Team for reporting this issue.

http://bugzilla.kernel.org/show_bug.cgi?id=12375

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
15 years agoext4: Add sanity checks for the superblock before mounting the filesystem
Theodore Ts'o [Tue, 17 Feb 2009 15:32:38 +0000 (10:32 -0500)]
ext4: Add sanity checks for the superblock before mounting the filesystem

(cherry picked from commit 4ec110281379826c5cf6ed14735e47027c3c5765)

This avoids insane superblock configurations that could lead to kernel
oops due to null pointer derefences.

http://bugzilla.kernel.org/show_bug.cgi?id=12371

Thanks to David Maciejak at Fortinet's FortiGuard Global Security
Research Team who discovered this bug independently (but at
approximately the same time) as Thiemo Nagel, who submitted the patch.

Signed-off-by: Thiemo Nagel <thiemo.nagel@ph.tum.de>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
15 years agoext4: Fix s_dirty_blocks_counter if block allocation failed with nodelalloc
Aneesh Kumar K.V [Tue, 17 Feb 2009 15:32:37 +0000 (10:32 -0500)]
ext4: Fix s_dirty_blocks_counter if block allocation failed with nodelalloc

(cherry picked from commit 0087d9fb3f29f59e8d42c8b058376d80e5adde4c)

With nodelalloc option we need to update the dirty block counter on
block allocation failure. This is needed because we increment the
dirty block counter early in the block allocation phase. Without
the patch s_dirty_blocks_counter goes wrong so that filesystem's
free blocks decreases incorrectly.

Tested-by: Akira Fujita <a-fujita@rs.jp.nec.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
15 years agoext4: Init the complete page while building buddy cache
Aneesh Kumar K.V [Tue, 17 Feb 2009 15:32:36 +0000 (10:32 -0500)]
ext4: Init the complete page while building buddy cache

(cherry picked from commit 29eaf024980e07cc01f31ae4ea5d68c917f4b7da)

We need to init the complete page during buddy cache init
by setting the contents to '1'.  Otherwise we can see the
following errors after doing an online resize of the
filesystem:

EXT4-fs error (device sdb1): ext4_mb_mark_diskspace_used:
Allocating block 1040385 in system zone of 127 group

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
15 years agoext4: Don't allow new groups to be added during block allocation
Aneesh Kumar K.V [Tue, 17 Feb 2009 15:32:35 +0000 (10:32 -0500)]
ext4: Don't allow new groups to be added during block allocation

(cherry picked from commit 8556e8f3b6c4c11601ce1e9ea8090a6d8bd5daae)

After we mark the blocks in the buddy cache as allocated,
we need to ensure that we don't reinit the buddy cache until
the block bitmap is updated.  This commit achieves this by holding
the group_info alloc_semaphore till ext4_mb_release_context

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
15 years agoext4: mark the blocks/inode bitmap beyond end of group as used
Aneesh Kumar K.V [Tue, 17 Feb 2009 15:32:34 +0000 (10:32 -0500)]
ext4: mark the blocks/inode bitmap beyond end of group as used

(cherry picked from commit 648f5879f5892dddd3ba71cd0d285599f40f2512)

We need to mark the block/inode bitmap beyond the end of the group
with '1'.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
15 years agoext4: Use new buffer_head flag to check uninit group bitmaps initialization
Aneesh Kumar K.V [Tue, 17 Feb 2009 15:32:33 +0000 (10:32 -0500)]
ext4: Use new buffer_head flag to check uninit group bitmaps initialization

(cherry picked from commit 2ccb5fb9f113dae969d1ae9b6c10e80fa34f8cd3)

For uninit block group, the on-disk bitmap is not initialized. That
implies we cannot depend on the uptodate flag on the bitmap
buffer_head to find bitmap validity.  Use a new buffer_head flag which
would be set after we properly initialize the bitmap.  This also
prevents (re-)initializing the uninit group bitmap every time we call
ext4_read_block_bitmap().

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
15 years agojbd2: Add BH_JBDPrivateStart
Mark Fasheh [Tue, 17 Feb 2009 15:32:32 +0000 (10:32 -0500)]
jbd2: Add BH_JBDPrivateStart

(cherry picked from commit e97fcd95a4778a8caf1980c6c72fdf68185a0838)

Add this so that file systems using JBD2 can safely allocate unused b_state
bits.

In this case, we add it so that Ocfs2 can define a single bit for tracking
the validation state of a buffer.

Acked-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
15 years agoext4: Fix the race between read_inode_bitmap() and ext4_new_inode()
Aneesh Kumar K.V [Tue, 17 Feb 2009 15:32:31 +0000 (10:32 -0500)]
ext4: Fix the race between read_inode_bitmap() and ext4_new_inode()

(cherry picked from commit 393418676a7602e1d7d3f6e560159c65c8cbd50e)

We need to make sure we update the inode bitmap and clear
EXT4_BG_INODE_UNINIT flag with sb_bgl_lock held, since
ext4_read_inode_bitmap() looks at EXT4_BG_INODE_UNINIT to decide
whether to initialize the inode bitmap each time it is called.
(introduced by commit c806e68f.)

ext4_read_inode_bitmap does:

spin_lock(sb_bgl_lock(EXT4_SB(sb), block_group));
if (desc->bg_flags & cpu_to_le16(EXT4_BG_INODE_UNINIT)) {
ext4_init_inode_bitmap(sb, bh, block_group, desc);

and ext4_new_inode does
if (!ext4_set_bit_atomic(sb_bgl_lock(sbi, group),
                   ino, inode_bitmap_bh->b_data))
   ......
   ...
spin_lock(sb_bgl_lock(sbi, group));

gdp->bg_flags &= cpu_to_le16(~EXT4_BG_INODE_UNINIT);
i.e., on allocation we update the bitmap then we take the sb_bgl_lock
and clear the EXT4_BG_INODE_UNINIT flag. What can happen is a
parallel ext4_read_inode_bitmap can zero out the bitmap in between
the above ext4_set_bit_atomic and spin_lock(sb_bg_lock..)

The race results in below user visible errors
EXT4-fs error (device sdb1): ext4_free_inode: bit already cleared for inode 168449
EXT4-fs warning (device sdb1): ext4_unlink: Deleting nonexistent file ...
EXT4-fs warning (device sdb1): ext4_rmdir: empty directory has too many links ...
ls: /mnt/tmp/f/p369/d3/d6/d39/db2/dee/d10f/d3f/l71: Stale NFS file handle

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
15 years agoext4: Fix race between read_block_bitmap() and mark_diskspace_used()
Aneesh Kumar K.V [Tue, 17 Feb 2009 15:32:30 +0000 (10:32 -0500)]
ext4: Fix race between read_block_bitmap() and mark_diskspace_used()

(cherry picked from commit e8134b27e351e813414da3b95aa8eac6d3908088)

We need to make sure we update the block bitmap and clear
EXT4_BG_BLOCK_UNINIT flag with sb_bgl_lock held, since
ext4_read_block_bitmap() looks at EXT4_BG_BLOCK_UNINIT to decide
whether to initialize the block bitmap each time it is called
(introduced by commit c806e68f), and this can race with block
allocations in ext4_mb_mark_diskspace_used().

ext4_read_block_bitmap does:

spin_lock(sb_bgl_lock(EXT4_SB(sb), block_group));
if (desc->bg_flags & cpu_to_le16(EXT4_BG_BLOCK_UNINIT)) {
ext4_init_block_bitmap(sb, bh, block_group, desc);

Now on the block allocation side we do

mb_set_bits(sb_bgl_lock(sbi, ac->ac_b_ex.fe_group), bitmap_bh->b_data,
ac->ac_b_ex.fe_start, ac->ac_b_ex.fe_len);
....
spin_lock(sb_bgl_lock(sbi, ac->ac_b_ex.fe_group));
if (gdp->bg_flags & cpu_to_le16(EXT4_BG_BLOCK_UNINIT)) {
gdp->bg_flags &= cpu_to_le16(~EXT4_BG_BLOCK_UNINIT);

ie on allocation we update the bitmap then we take the sb_bgl_lock
and clear the EXT4_BG_BLOCK_UNINIT flag. What can happen is a
parallel ext4_read_block_bitmap can zero out the bitmap in between
the above mb_set_bits and spin_lock(sb_bg_lock..)

The race results in below user visible errors
EXT4-fs error (device sdb1): ext4_mb_release_inode_pa: free 100, pa_free 105
EXT4-fs error (device sdb1): mb_free_blocks: double-free of inode 0's block ..

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
15 years agoext4: don't use blocks freed but not yet committed in buddy cache init
Aneesh Kumar K.V [Tue, 17 Feb 2009 15:32:29 +0000 (10:32 -0500)]
ext4: don't use blocks freed but not yet committed in buddy cache init

(cherry picked from commit 7a2fcbf7f85737735fd44eb34b62315bccf6d6e4)

When we generate buddy cache (especially during resize) we need to
make sure we don't use the blocks freed but not yet comitted.  This
makes sure we have the right value of free blocks count in the group
info and also in the bitmap.  This also ensures the ordered mode
consistency

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
15 years agoext4: cleanup mballoc header files
Aneesh Kumar K.V [Tue, 17 Feb 2009 15:32:28 +0000 (10:32 -0500)]
ext4: cleanup mballoc header files

(cherry picked from commit c3a326a657562dab81acf05aee106dc1fe345eb4)

Move some of the forward declaration of the static functions
to mballoc.c where they are used. This enables us to include
mballoc.h in other .c files. Also correct the buddy cache
documentation.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
15 years agoext4: Use EXT4_GROUP_INFO_NEED_INIT_BIT during resize
Aneesh Kumar K.V [Tue, 17 Feb 2009 15:32:27 +0000 (10:32 -0500)]
ext4: Use EXT4_GROUP_INFO_NEED_INIT_BIT during resize

(cherry picked from commit 920313a726e04fef0f2c0bcb04ad8229c0e700d8)

The new groups added during resize are flagged as
need_init group. Make sure we properly initialize these
groups. When we have block size < page size and we are adding
new groups the page may still be marked uptodate even though
we haven't initialized the group. While forcing the init
of buddy cache we need to make sure other groups part of the
same page of buddy cache is not using the cache.
group_info->alloc_sem is added to ensure the same.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
15 years agoext4: Add blocks added during resize to bitmap
Aneesh Kumar K.V [Tue, 17 Feb 2009 15:32:26 +0000 (10:32 -0500)]
ext4: Add blocks added during resize to bitmap

(cherry picked from commit e21675d4b63975d09eb75c443c48ebe663d23e18)

With this change new blocks added during resize
are marked as free in the block bitmap and the
group is flagged with EXT4_GROUP_INFO_NEED_INIT_BIT
flag.  This makes sure when mballoc tries to allocate
blocks from the new group we would reload the
buddy information using the bitmap present in the disk.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
15 years agoext4: Don't overwrite allocation_context ac_status
Aneesh Kumar K.V [Tue, 17 Feb 2009 15:32:25 +0000 (10:32 -0500)]
ext4: Don't overwrite allocation_context ac_status

(cherry picked from commit 032115fcef837a00336ddf7bda584e89789ea498)

We can call ext4_mb_check_limits even after successfully allocating
the requested blocks.  In that case, make sure we don't overwrite
ac_status if it already has the status AC_STATUS_FOUND.  This fixes
the lockdep warning:

=============================================
[ INFO: possible recursive locking detected ]
2.6.28-rc6-autokern1 #1
---------------------------------------------
fsstress/11948 is trying to acquire lock:
 (&meta_group_info[i]->alloc_sem){----}, at: [<c04d9a49>] ext4_mb_load_buddy+0x9f/0x278
.....

stack backtrace:
.....
 [<c04db974>] ext4_mb_regular_allocator+0xbb5/0xd44
.....

but task is already holding lock:
 (&meta_group_info[i]->alloc_sem){----}, at: [<c04d9a49>] ext4_mb_load_buddy+0x9f/0x278

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
15 years agojbd2: Add barrier not supported test to journal_wait_on_commit_record
Theodore Ts'o [Tue, 17 Feb 2009 15:32:24 +0000 (10:32 -0500)]
jbd2: Add barrier not supported test to journal_wait_on_commit_record

(cherry picked from commit fd98496f467b3d26d05ab1498f41718b5ef13de5)

Xen doesn't report that barriers are not supported until buffer I/O is
reported as completed, instead of when the buffer I/O is submitted.
Add a check and a fallback codepath to journal_wait_on_commit_record()
to detect this case, so that attempts to mount ext4 filesystems on
LVM/devicemapper devices on Xen guests don't blow up with an "Aborting
journal on device XXX"; "Remounting filesystem read-only" error.

Thanks to Andreas Sundstrom for reporting this issue.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
15 years agoext4: Widen type of ext4_sb_info.s_mb_maxs[]
Yasunori Goto [Tue, 17 Feb 2009 15:32:23 +0000 (10:32 -0500)]
ext4: Widen type of ext4_sb_info.s_mb_maxs[]

(cherry picked from commit ff7ef329b268b603ea4a2303241ef1c3829fd574)

I chased the cause of following ext4 oops report which is tested on
ia64 box.

http://bugzilla.kernel.org/show_bug.cgi?id=12018

The cause is the size of s_mb_maxs array that is defined as "unsigned
short" in ext4_sb_info structure.  If the file system's block size is
8k or greater, an unsigned short is not wide enough to contain the
value fs->blocksize << 3.

Signed-off-by: Yasunori Goto <y-goto@jp.fujitsu.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Cc: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
15 years agoext4: avoid ext4_error when mounting a fs with a single bg
Aneesh Kumar K.V [Fri, 20 Feb 2009 22:40:23 +0000 (14:40 -0800)]
ext4: avoid ext4_error when mounting a fs with a single bg

(cherry picked from commit 565a9617b2151e21b22700e97a8b04e70e103153)

Remove some completely unneeded code which which caused an ext4_error
to be generated when mounting a file system with only a single block
group.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
15 years agoext4: Fix the delalloc writepages to allocate blocks at the right offset.
Aneesh Kumar K.V [Tue, 17 Feb 2009 15:32:21 +0000 (10:32 -0500)]
ext4: Fix the delalloc writepages to allocate blocks at the right offset.

(cherry picked from commit 791b7f08954869d7b8ff438f3dac3cfb39778297)

When iterating through the pages which have mapped buffer_heads, we
failed to update the b_state value. This results in allocating blocks
at logical offset 0.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
15 years agoext4: tone down ext4_da_writepages warnings
Theodore Ts'o [Tue, 17 Feb 2009 15:32:20 +0000 (10:32 -0500)]
ext4: tone down ext4_da_writepages warnings

(cherry picked from commit 2a21e37e48b94388f2cc8c0392f104f5443d4bb8)

If the filesystem has errors, ext4_da_writepages() will return a *lot*
of errors, including lots and lots of stack dumps.  While it's true
that we are dropping user data on the floor, which is unfortunate, the
stack dumps aren't helpful, and they tend to obscure the true original
root cause of the problem.  So in the case where the filesystem has
aborted, return an EROFS right away.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
15 years agoext4: Add support for non-native signed/unsigned htree hash algorithms
Theodore Ts'o [Tue, 17 Feb 2009 15:32:19 +0000 (10:32 -0500)]
ext4: Add support for non-native signed/unsigned htree hash algorithms

(cherry picked from commit f99b25897a86fcfff9140396a97261ae65fed872)

The original ext3 hash algorithms assumed that variables of type char
were signed, as God and K&R intended.  Unfortunately, this assumption
is not true on some architectures.  Userspace support for marking
filesystems with non-native signed/unsigned chars was added two years
ago, but the kernel-side support was never added (until now).

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
15 years agoBluetooth: Fix TX error path in btsdio driver
Tomas Winkler [Sun, 30 Nov 2008 11:17:18 +0000 (12:17 +0100)]
Bluetooth: Fix TX error path in btsdio driver

commit 7644d63d1348ec044ccd8f775fefe5eb7cbcac69 upstream.

This patch fixes accumulating of the header in case packet was requeued
in the error path.

Signed-off-by: Tomas Winkler <tomas.winkler@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
15 years ago3c505: do not set pcb->data.raw beyond its size
Roel Kluin [Fri, 13 Feb 2009 00:52:31 +0000 (16:52 -0800)]
3c505: do not set pcb->data.raw beyond its size

commit 501aa061bd68169a5b54c123641f8dfa9ad31545 upstream.

Ensure that we do not set pcb->data.raw beyond its size, print an error message
and return false if we attempt to. A timout message was printed one too early.

Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
15 years agoext2/xip: refuse to change xip flag during remount with busy inodes
Carsten Otte [Wed, 11 Feb 2009 21:04:37 +0000 (13:04 -0800)]
ext2/xip: refuse to change xip flag during remount with busy inodes

commit 0e4a9b59282914fe057ab17027f55123964bc2e2 upstream.

For a reason that I was unable to understand in three months of debugging,
mount ext2 -o remount stopped working properly when remounting from
regular operation to xip, or the other way around.  According to a git
bisect search, the problem was introduced with the VM_MIXEDMAP/PTE_SPECIAL
rework in the vm:

commit 70688e4dd1647f0ceb502bbd5964fa344c5eb411
Author: Nick Piggin <npiggin@suse.de>
Date:   Mon Apr 28 02:13:02 2008 -0700

    xip: support non-struct page backed memory

In the failing scenario, the filesystem is mounted read only via root=
kernel parameter on s390x.  During remount (in rc.sysinit), the inodes of
the bash binary and its libraries are busy and cannot be invalidated (the
bash which is running rc.sysinit resides on subject filesystem).
Afterwards, another bash process (running ifup-eth) recurses into a
subshell, runs dup_mm (via fork).  Some of the mappings in this bash
process were created from inodes that could not be invalidated during
remount.

Both parent and child process crash some time later due to inconsistencies
in their address spaces.  The issue seems to be timing sensitive, various
attempts to recreate it have failed.

This patch refuses to change the xip flag during remount in case some
inodes cannot be invalidated.  This patch keeps users from running into
that issue.

[akpm@linux-foundation.org: cleanup]
Signed-off-by: Carsten Otte <cotte@de.ibm.com>
Cc: Nick Piggin <npiggin@suse.de>
Cc: Jared Hulbert <jaredeh@gmail.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
15 years agoAdd support for VT6415 PCIE PATA IDE Host Controller
Zlatko Calusic [Wed, 18 Feb 2009 00:33:34 +0000 (01:33 +0100)]
Add support for VT6415 PCIE PATA IDE Host Controller

commit 5955c7a2cfb6a35429adea5dc480002b15ca8cfc upstream.

Signed-off-by: Zlatko Calusic <zlatko.calusic@iskon.hr>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
15 years agox86, vm86: fix preemption bug
Thomas Gleixner [Tue, 13 Jan 2009 22:36:34 +0000 (23:36 +0100)]
x86, vm86: fix preemption bug

commit be716615fe596ee117292dc615e95f707fb67fd1 upstream.

Commit 3d2a71a596bd9c761c8487a2178e95f8a61da083 ("x86, traps: converge
do_debug handlers") changed the preemption disable logic of do_debug()
so vm86_handle_trap() is called with preemption disabled resulting in:

 BUG: sleeping function called from invalid context at include/linux/kernel.h:155
 in_atomic(): 1, irqs_disabled(): 0, pid: 3005, name: dosemu.bin
 Pid: 3005, comm: dosemu.bin Tainted: G        W  2.6.29-rc1 #51
 Call Trace:
  [<c050d669>] copy_to_user+0x33/0x108
  [<c04181f4>] save_v86_state+0x65/0x149
  [<c0418531>] handle_vm86_trap+0x20/0x8f
  [<c064e345>] do_debug+0x15b/0x1a4
  [<c064df1f>] debug_stack_correct+0x27/0x2c
  [<c040365b>] sysenter_do_call+0x12/0x2f
 BUG: scheduling while atomic: dosemu.bin/3005/0x10000001

Restore the original calling convention and reenable preemption before
calling handle_vm86_trap().

Reported-by: Michal Suchanek <hramrach@centrum.cz>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
15 years agosched: SCHED_OTHER vs SCHED_IDLE isolation
Peter Zijlstra [Thu, 15 Jan 2009 13:53:38 +0000 (14:53 +0100)]
sched: SCHED_OTHER vs SCHED_IDLE isolation

commit 6bc912b71b6f33b041cfde93ca3f019cbaa852bc upstream.

Stronger SCHED_IDLE isolation:

 - no SCHED_IDLE buddies
 - never let SCHED_IDLE preempt on wakeup
 - always preempt SCHED_IDLE on wakeup
 - limit SLEEPER fairness for SCHED_IDLE.

Signed-off-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
15 years agox86/cpa: make sure cpa is safe to call in lazy mmu mode
Jeremy Fitzhardinge [Wed, 11 Feb 2009 17:32:19 +0000 (09:32 -0800)]
x86/cpa: make sure cpa is safe to call in lazy mmu mode

commit 4f06b0436b2ddbd3b67b10e77098a6862787b3eb upstream.

Impact: fix race leading to crash under KVM and Xen

The CPA code may be called while we're in lazy mmu update mode - for
example, when using DEBUG_PAGE_ALLOC and doing a slab allocation
in an interrupt handler which interrupted a lazy mmu update.  In this
case, the in-memory pagetable state may be out of date due to pending
queued updates.  We need to flush any pending updates before inspecting
the page table.  Similarly, we must explicitly flush any modifications
CPA may have made (which comes down to flushing queued operations when
flushing the TLB).

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Acked-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
15 years agoSCSI: libiscsi: fix iscsi pool leak
Mike Christie [Fri, 16 Jan 2009 18:36:51 +0000 (12:36 -0600)]
SCSI: libiscsi: fix iscsi pool leak

commit 2f5899a39dcffb404c9a3d06ad438aff3e03bf04 upstream.

I am not sure what happened. It looks like we have always leaked
the q->queue that is allocated from the kfifo_init call. nab finally
noticed that we were leaking and this patch fixes it by adding a
kfree call to iscsi_pool_free. kfifo_free is not used per kfifo_init's
instructions to use kfree.

Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: Jean Delvare <jdelvare@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
15 years agoFix Intel IOMMU write-buffer flushing
David Woodhouse [Fri, 13 Feb 2009 23:18:03 +0000 (23:18 +0000)]
Fix Intel IOMMU write-buffer flushing

commit ca77fde8e62cecb2c0769052228d15b901367af8 upstream.

This is the cause of the DMA faults and disk corruption that people have
been seeing. Some chipsets neglect to report the RWBF "capability" --
the flag which says that we need to flush the chipset write-buffer when
changing the DMA page tables, to ensure that the change is visible to
the IOMMU.

Override that bit on the affected chipsets, and everything is happy
again.

Thanks to Chris and Bhavesh and others for helping to debug.

Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Tested-by: Chris Wright <chrisw@sous-sol.org>
Reviewed-by: Bhavesh Davda <bhavesh@vmware.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
15 years agosata_nv: give up hardreset on nf2
Tejun Heo [Thu, 12 Feb 2009 01:34:32 +0000 (10:34 +0900)]
sata_nv: give up hardreset on nf2

commit 7dac745b8e367c99175b8f0d014d996f0e5ed9e5 upstream.

Kernel bz#12176 reports that nf2 hardreset simply doesn't work.  Give
up.  Argh...

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Robert Hancock <hancockr@shaw.ca>
Reported-by: Saro <saro_v@hotmail.it>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
15 years agopowerpc/vsx: Fix VSX alignment handler for regs 32-63
Michael Neuling [Thu, 12 Feb 2009 19:08:58 +0000 (19:08 +0000)]
powerpc/vsx: Fix VSX alignment handler for regs 32-63

commit 26456dcfb8d8e43b1b64b2a14710694cf7a72f05 upstream.

Fix the VSX alignment handler for VSX registers > 32.  32-63 are stored
in the VMX part of the thread_struct not the FPR part.

Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
15 years agoWATCHDOG: iTCO_wdt: fix SMI_EN regression 2
Wim Van Sebroeck [Wed, 28 Jan 2009 20:51:04 +0000 (20:51 +0000)]
WATCHDOG: iTCO_wdt: fix SMI_EN regression 2

commit 12d60e28bed3f593aac5385acbdbb089eb8ae21e upstream.

bugzilla: #12363
commit 7cd5b08be3c489df11b559fef210b81133764ad4 added a second regression:
some Dell's and Compaq's lockup on boot. So we revert most of the code.
The ICH9 reboot issue remains in place and will need some more fixing... :-(

Signed-off-by: Wim Van Sebroeck <wim@iguana.be>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
15 years agomqueue: fix si_pid value in mqueue do_notify()
Sukadev Bhattiprolu [Thu, 8 Jan 2009 02:08:50 +0000 (18:08 -0800)]
mqueue: fix si_pid value in mqueue do_notify()

commit a6684999f7c6bddd75cf9755ad7ff44435f72fff upstream.

If a process registers for asynchronous notification on a POSIX message
queue, it gets a signal and a siginfo_t structure when a message arrives
on the message queue.  The si_pid in the siginfo_t structure is set to the
PID of the process that sent the message to the message queue.

The principle is the following:
. when mq_notify(SIGEV_SIGNAL) is called, the caller registers for
  notification when a msg arrives. The associated pid structure is stroed into
  inode_info->notify_owner. Let's call this process P1.
. when mq_send() is called by say P2, P2 sends a signal to P1 to notify
  him about msg arrival.

The way .si_pid is set today is not correct, since it doesn't take into account
the fact that the process that is sending the message might not be in the
same namespace as the notified one.

This patch proposes to set si_pid to the sender's pid into the notify_owner
namespace.

Signed-off-by: Nadia Derbey <Nadia.Derbey@bull.net>
Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Cc: Roland McGrath <roland@redhat.com>
Cc: Bastian Blank <bastian@waldi.eu.org>
Cc: Pavel Emelyanov <xemul@openvz.org>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
15 years agopid: implement ns_of_pid
Eric W. Biederman [Thu, 8 Jan 2009 02:08:46 +0000 (18:08 -0800)]
pid: implement ns_of_pid

commit f9fb860f67b9542cd78d1558dec7058092b57d8e upstream.

A current problem with the pid namespace is that it is easy to do pid
related work after exit_task_namespaces which drops the nsproxy pointer.

However if we are doing pid namespace related work we are always operating
on some struct pid which retains the pid_namespace pointer of the pid
namespace it was allocated in.

So provide ns_of_pid which allows us to find the pid namespace a pid was
allocated in.

Using this we have the needed infrastructure to do pid namespace related
work at anytime we have a struct pid, removing the chance of accidentally
having a NULL pointer dereference when accessing current->nsproxy.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Roland McGrath <roland@redhat.com>
Cc: Bastian Blank <bastian@waldi.eu.org>
Cc: Pavel Emelyanov <xemul@openvz.org>
Cc: Nadia Derbey <Nadia.Derbey@bull.net>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
15 years agoLinux 2.6.28.6 v2.6.28.6
Greg Kroah-Hartman [Tue, 17 Feb 2009 17:29:27 +0000 (09:29 -0800)]
Linux 2.6.28.6

15 years agonet: Fix data corruption when splicing from sockets.
Jarek Poplawski [Tue, 20 Jan 2009 01:03:56 +0000 (17:03 -0800)]
net: Fix data corruption when splicing from sockets.

[ Upstream commit 8b9d3728977760f6bd1317c4420890f73695354e ]

The trick in socket splicing where we try to convert the skb->data
into a page based reference using virt_to_page() does not work so
well.

The idea is to pass the virt_to_page() reference via the pipe
buffer, and refcount the buffer using a SKB reference.

But if we are splicing from a socket to a socket (via sendpage)
this doesn't work.

The from side processing will grab the page (and SKB) references.
The sendpage() calls will grab page references only, return, and
then the from side processing completes and drops the SKB ref.

The page based reference to skb->data is not enough to keep the
kmalloc() buffer backing it from being reused.  Yet, that is
all that the socket send side has at this point.

This leads to data corruption if the skb->data buffer is reused
by SLAB before the send side socket actually gets the TX packet
out to the device.

The fix employed here is to simply allocate a page and copy the
skb->data bytes into that page.

This will hurt performance, but there is no clear way to fix this
properly without a copy at the present time, and it is important
to get rid of the data corruption.

With fixes from Herbert Xu.

Tested-by: Willy Tarreau <w@1wt.eu>
Foreseen-by: Changli Gao <xiaosuo@gmail.com>
Diagnosed-by: Willy Tarreau <w@1wt.eu>
Reported-by: Willy Tarreau <w@1wt.eu>
Fixed-by: Jens Axboe <jens.axboe@oracle.com>
Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
15 years agoide-cd: fix DMA for non bio-backed requests
Borislav Petkov [Mon, 2 Feb 2009 19:12:21 +0000 (20:12 +0100)]
ide-cd: fix DMA for non bio-backed requests

commit 9e772d0135a5b5f8355320be429efa339700d52d upstream.

This one fixes http://bugzilla.kernel.org/show_bug.cgi?id=12320.

Signed-off-by: Borislav Petkov <petkovbb@gmail.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
15 years agox86: microcode_amd: fix wrong handling of equivalent CPU id
Andreas Herrmann [Tue, 16 Dec 2008 18:07:47 +0000 (19:07 +0100)]
x86: microcode_amd: fix wrong handling of equivalent CPU id

commit 3c763fd77e66e55d029052da31df0abd9920cb1e upstream.

Impact: fix bug resulting in non-loaded AMD microcode

mc_header->processor_rev_id is a 2 byte value. Similar is true for
equiv_cpu in an equiv_cpu_entry -- only 2 bytes are of interest.

Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
15 years agonetfilter: xt_sctp: sctp chunk mapping doesn't work
Qu Haoran [Thu, 12 Feb 2009 07:07:38 +0000 (08:07 +0100)]
netfilter: xt_sctp: sctp chunk mapping doesn't work

netfilter: xt_sctp: sctp chunk mapping doesn't work

Upstream commit: d4e2675a

When user tries to map all chunks given in argument, kernel
works on a copy of the chunkmap, but at the end it doesn't
check the copy, but the orginal one.

Signed-off-by: Qu Haoran <haoran.qu@6wind.com>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
15 years agonetfilter: fix tuple inversion for Node information request
Eric Leblond [Thu, 12 Feb 2009 07:07:37 +0000 (08:07 +0100)]
netfilter: fix tuple inversion for Node information request

netfilter: fix tuple inversion for Node information request

Upstream commit: a51f42f3c

The patch fixes a typo in the inverse mapping of Node Information
request. Following draft-ietf-ipngwg-icmp-name-lookups-09, "Querier"
sends a type 139 (ICMPV6_NI_QUERY) packet to "Responder" which answer
with a type 140 (ICMPV6_NI_REPLY) packet.

Signed-off-by: Eric Leblond <eric@inl.fr>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
15 years agolibata: fix EH device failure handling
Tejun Heo [Thu, 29 Jan 2009 11:31:29 +0000 (20:31 +0900)]
libata: fix EH device failure handling

commit d89293abd95bfd7dd9229087d6c30c1464c5ac83 upstream.

The dev->pio_mode > XFER_PIO_0 test is there to avoid unnecessary
speed down warning messages but it accidentally disabled SATA link spd
down during configuration phase after reset where PIO mode is always
zero.

This patch fixes the problem by moving the test where it belongs.
This makes libata probing sequence behave better when the connection
is flaky at higher link speeds which isn't too uncommon for eSATA
devices.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Cc: Chuck Ebbert <cebbert@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
15 years agoide/libata: fix ata_id_is_cfa() (take 4)
Sergei Shtylyov [Sun, 1 Feb 2009 16:46:39 +0000 (20:46 +0400)]
ide/libata: fix ata_id_is_cfa() (take 4)

commit 2999b58b795ad81f10e34bdbbfd2742172f247e4 upstream.

When checking for the CFA feature set support, ata_id_is_cfa() tests bit 2 in
word 82 of the identify data instead the word 83;  it also checks the ATA/PI
version support in the word 80 (which the CompactFlash specifications have as
reserved), this having no slightest chance to work on the modern CF cards that
don't have 0x848A in the word 0...

Signed-off-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Cc: Chuck Ebbert <cebbert@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
15 years agoHID: adjust report descriptor fixup for MS 1028 receiver
Jiri Kosina [Wed, 14 Jan 2009 02:03:21 +0000 (03:03 +0100)]
HID: adjust report descriptor fixup for MS 1028 receiver

commit 0fb21de0799a985d2da3da14ae5625d724256638 upstream.

Report descriptor fixup for MS 1028 receiver changes also values for
Keyboard and Consumer, which incorrectly trims the range, causing correct
events being thrown away before passing to userspace.

We need to keep the GenDesk usage fixup though, as it reports totally bogus
values about axis.

Reported-by: Lucas Gadani <lgadani@gmail.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
15 years agoALSA: mtpav - Fix initial value for input hwport
Takashi Iwai [Wed, 11 Feb 2009 23:06:42 +0000 (00:06 +0100)]
ALSA: mtpav - Fix initial value for input hwport

commit 32cf9a16f4af01573ddec1eb073111fc20a9d7d4 upstream.

Fix the initial value for input hwport.  The old value (-1) may cause
Oops when an realtime MIDI byte is received before the input port is
explicitly given.
Instead, now it's set to the broadcasting as default.

Tested-by: Holger Dehnhardt <dehnhardt@ahdehnhardt.de>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
15 years agoALSA: hda - Add missing terminator in slave dig-out array
Takashi Iwai [Fri, 13 Feb 2009 10:37:08 +0000 (11:37 +0100)]
ALSA: hda - Add missing terminator in slave dig-out array

commit 3a08e30de2facffe8e1a25bf4fa62cbc920fbaf6 upstream.

Added the missing terminator for ad1989b_slave_dig_outs[].

Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
15 years agosparc64: Annotate sparc64 specific syscalls with SYSCALL_DEFINEx()
David S. Miller [Fri, 13 Feb 2009 08:26:00 +0000 (00:26 -0800)]
sparc64: Annotate sparc64 specific syscalls with SYSCALL_DEFINEx()

[ Upstream commit e42650196df34789c825fa83f8bb37a5d5e52c14 ]

Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
15 years agosparc: Enable syscall wrappers for 64-bit (CVE-2009-0029)
Christian Borntraeger [Fri, 13 Feb 2009 08:25:10 +0000 (00:25 -0800)]
sparc: Enable syscall wrappers for 64-bit (CVE-2009-0029)

[ Upstream commit 67605d6812691bbd2158d2f60259e0407611bc1b ]

sparc64 needs sign-extended function parameters. We have to enable
the system call wrappers.

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
15 years agotcp: Fix length tcp_splice_data_recv passes to skb_splice_bits.
Dimitris Michailidis [Tue, 27 Jan 2009 06:15:31 +0000 (22:15 -0800)]
tcp: Fix length tcp_splice_data_recv passes to skb_splice_bits.

[ Upstream commit 9fa5fdf291c9b58b1cb8b4bb2a0ee57efa21d635 ]

tcp_splice_data_recv has two lengths to consider: the len parameter it
gets from tcp_read_sock, which specifies the amount of data in the skb,
and rd_desc->count, which is the amount of data the splice caller still
wants.  Currently it passes just the latter to skb_splice_bits, which then
splices min(rd_desc->count, skb->len - offset) bytes.

Most of the time this is fine, except when the skb contains urgent data.
In that case len goes only up to the urgent byte and is less than
skb->len - offset.  By ignoring len tcp_splice_data_recv may a) splice
data tcp_read_sock told it not to, b) return to tcp_read_sock a value > len.

Now, tcp_read_sock doesn't handle used > len and leaves the socket in a
bad state (both sk_receive_queue and copied_seq are bad at that point)
resulting in duplicated data and corruption.

Fix by passing min(rd_desc->count, len) to skb_splice_bits.

Signed-off-by: Dimitris Michailidis <dm@chelsio.com>
Acked-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
15 years agotcp: splice as many packets as possible at once
Willy Tarreau [Wed, 14 Jan 2009 00:04:36 +0000 (16:04 -0800)]
tcp: splice as many packets as possible at once

[ Upstream commit 33966dd0e2f68f26943cd9ee93ec6abbc6547a8e ]

As spotted by Willy Tarreau, current splice() from tcp socket to pipe is not
optimal. It processes at most one segment per call.
This results in low performance and very high overhead due to syscall rate
when splicing from interfaces which do not support LRO.

Willy provided a patch inside tcp_splice_read(), but a better fix
is to let tcp_read_sock() process as many segments as possible, so
that tcp_rcv_space_adjust() and tcp_cleanup_rbuf() are called less
often.

With this change, splice() behaves like tcp_recvmsg(), being able
to consume many skbs in one system call. With typical 1460 bytes
of payload per frame, that means splice(SPLICE_F_NONBLOCK) can return
16*1460 = 23360 bytes.

Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
15 years agovirtio_net: Fix MAX_PACKET_LEN to support 802.1Q VLANs
Alex Williamson [Fri, 13 Feb 2009 08:06:29 +0000 (00:06 -0800)]
virtio_net: Fix MAX_PACKET_LEN to support 802.1Q VLANs

[ Upstream commit e918085aaff34086e265f825dd469926b1aec4a4 ]

802.1Q expanded the maximum ethernet frame size by 4 bytes for the
VLAN tag.  We're not taking this into account in virtio_net, which
means the buffers we provide to the backend in the virtqueue RX ring
aren't big enough to hold a full MTU VLAN packet.  For QEMU/KVM,
this results in the backend exiting with a packet truncation error.

Signed-off-by: Alex Williamson <alex.williamson@hp.com>
Acked-by: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
15 years agotun: Fix unicast filter overflow
Alex Williamson [Mon, 9 Feb 2009 01:49:17 +0000 (17:49 -0800)]
tun: Fix unicast filter overflow

[ Upstream commit cfbf84fcbcda98bb91ada683a8dc8e6901a83ebd ]

Tap devices can make use of a small MAC filter set via the
TUNSETTXFILTER ioctl.  The filter has a set of exact matches
plus a hash for imperfect filtering of additional multicast
addresses.  The current code is unbalanced, adding unicast
addresses to the multicast hash, but only checking the hash
against multicast addresses.  This results in the filter
dropping unicast addresses that overflow the exact filter.
The fix is simply to disable the filter by leaving count set
to zero if we find non-multicast addresses after the exact
match table is filled.

Signed-off-by: Alex Williamson <alex.williamson@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
15 years agotun: Add some missing TUN compat ioctl translations.
David S. Miller [Fri, 30 Jan 2009 00:53:35 +0000 (16:53 -0800)]
tun: Add some missing TUN compat ioctl translations.

[ Upstream commit df1c46b2b6876d0a1b1b4740f009fa69d95ebbc9 ]

Based upon a report from Michael Tokarev <mjt@tls.msk.ru>:

Just saw in dmesg:

ioctl32(kvm:4408): Unknown cmd fd(9) cmd(800454cf){t:'T';sz:4} arg(ffc668e4) on /dev/net/tun

Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
15 years agosky2: fix hard hang with netconsoling and iface going up
Alexey Dobriyan [Fri, 30 Jan 2009 21:45:31 +0000 (13:45 -0800)]
sky2: fix hard hang with netconsoling and iface going up

[ Upstream commit a11da890e4c9850411303efcf6514f048ca880ee ]

Printing anything over netconsole before hw is up and running is,
of course, not going to work.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Acked-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
15 years agonet: 4 bytes kernel memory disclosure in SO_BSDCOMPAT gsopt try #2
Clément Lecigne [Fri, 13 Feb 2009 00:59:09 +0000 (16:59 -0800)]
net: 4 bytes kernel memory disclosure in SO_BSDCOMPAT gsopt try #2

[ Upstream commit df0bca049d01c0ee94afb7cd5dfd959541e6c8da ]

In function sock_getsockopt() located in net/core/sock.c, optval v.val
is not correctly initialized and directly returned in userland in case
we have SO_BSDCOMPAT option set.

This dummy code should trigger the bug:

int main(void)
{
unsigned char buf[4] = { 0, 0, 0, 0 };
int len;
int sock;
sock = socket(33, 2, 2);
getsockopt(sock, 1, SO_BSDCOMPAT, &buf, &len);
printf("%x%x%x%x\n", buf[0], buf[1], buf[2], buf[3]);
close(sock);
}

Here is a patch that fix this bug by initalizing v.val just after its
declaration.

Signed-off-by: Clément Lecigne <clement.lecigne@netasq.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
15 years agoipv6: Copy cork options in ip6_append_data
Herbert Xu [Thu, 5 Feb 2009 23:15:50 +0000 (15:15 -0800)]
ipv6: Copy cork options in ip6_append_data

[ Upstream commit 0178b695fd6b40a62a215cbeb03dd51ada3bb5e0 ]

As the options passed to ip6_append_data may be ephemeral, we need
to duplicate it for corking.  This patch applies the simplest fix
which is to memdup all the relevant bits.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
15 years agoipv6: Disallow rediculious flowlabel option sizes.
David S. Miller [Fri, 6 Feb 2009 08:49:55 +0000 (00:49 -0800)]
ipv6: Disallow rediculious flowlabel option sizes.

[ Upstream commit 684de409acff8b1fe8bf188d75ff2f99c624387d ]

Just like PKTINFO, limit the options area to 64K.

Based upon report by Eric Sesterhenn and analysis by
Roland Dreier.

Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
15 years agoudp: increments sk_drops in __udp_queue_rcv_skb()
Eric Dumazet [Mon, 2 Feb 2009 21:41:57 +0000 (13:41 -0800)]
udp: increments sk_drops in __udp_queue_rcv_skb()

[ Upstream commit e408b8dcb5ce42243a902205005208e590f28454 ]

Commit 93821778def10ec1e69aa3ac10adee975dad4ff3 (udp: Fix rcv socket
locking) accidentally removed sk_drops increments for UDP IPV4
sockets.

This field can be used to detect incorrect sizing of socket receive
buffers.

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
15 years agoudp: Fix UDP short packet false positive
Jesper Dangaard Brouer [Thu, 5 Feb 2009 23:05:45 +0000 (15:05 -0800)]
udp: Fix UDP short packet false positive

[ Upstream commit 7b5e56f9d635643ad54f2f42e69ad16b80a2cff1 ]

The UDP header pointer assignment must happen after calling
pskb_may_pull().  As pskb_may_pull() can potentially alter the SKB
buffer.

This was exposted by running multicast traffic through the NIU driver,
as it won't prepull the protocol headers into the linear area on
receive.

Signed-off-by: Jesper Dangaard Brouer <hawk@comx.dk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
15 years agosungem: Soft lockup in sungem on Netra AC200 when switching interface up
Ilkka Virta [Sat, 7 Feb 2009 06:00:36 +0000 (22:00 -0800)]
sungem: Soft lockup in sungem on Netra AC200 when switching interface up

[ Upstream commit 71822faa3bc0af5dbf5e333a2d085f1ed7cd809f ]

From: Ilkka Virta <itvirta@iki.fi>

In the lockup situation the driver seems to go off in an eternal storm
of interrupts right after calling request_irq(). It doesn't actually
do anything interesting in the interrupt handler. Since connecting the link
afterwards works, something later in initialization must fix this.

Looking at gem_do_start() and gem_open(), it seems that the only thing
done while opening the device after the request_irq(), is a call to
napi_enable().

I don't know what the ordering requirements are for the
initialization, but I boldly tried to move the napi_enable() call
inside gem_do_start() before the link state is checked and interrupts
subsequently enabled, and it seems to work for me. Doesn't even break
anything too obvious...

Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
15 years agopacket: Avoid lock_sock in mmap handler
Herbert Xu [Fri, 30 Jan 2009 22:12:06 +0000 (14:12 -0800)]
packet: Avoid lock_sock in mmap handler

[ Upstream commit 905db44087855e3c1709f538ecdc22fd149cadd8 ]

As the mmap handler gets called under mmap_sem, and we may grab
mmap_sem elsewhere under the socket lock to access user data, we
should avoid grabbing the socket lock in the mmap handler.

Since the only thing we care about in the mmap handler is for
pg_vec* to be invariant, i.e., to exclude packet_set_ring, we
can achieve this by simply using a new mutex.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Tested-by: Martin MOKREJŠ <mmokrejs@ribosome.natur.cuni.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
15 years agonet: packet socket packet_lookup_frame fix
Sebastiano Di Paola [Fri, 30 Jan 2009 23:37:17 +0000 (23:37 +0000)]
net: packet socket packet_lookup_frame fix

[ Upstream commit f9e6934502e46c363100245f137ddf0f4b1cb574 ]

packet_lookup_frames() fails to get user frame if current frame header
status contains extra flags.
This is due to the wrong assumption on the operators precedence during
frame status tests.
Fixed by forcing the right operators precedence order with explicit brackets.

Signed-off-by: Paolo Abeni <paolo.abeni@gmail.com>
Signed-off-by: Sebastiano Di Paola <sebastiano.dipaola@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
15 years agonet: Fix userland breakage wrt. linux/if_tunnel.h
David S. Miller [Mon, 2 Feb 2009 21:27:44 +0000 (13:27 -0800)]
net: Fix userland breakage wrt. linux/if_tunnel.h

[ Upstream commit 0afd4a21ba7d75e93fa79cf05d7a21774e149c0f ]

Reported by Andrew Walrond <andrew@walrond.org>

Changeset c19e654ddbe3831252f61e76a74d661e1a755530
("gre: Add netlink interface") added an include
of linux/ip.h to linux/if_tunnel.h

We can't really let that get exposed to userspace
because this conflicts with types defined in netinet/ip.h
which userland is almost certainly going to have included
either explicitly or implicitly.

So guard this include with a __KERNEL__ ifdef.

Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
15 years agoipv4: fix infinite retry loop in IP-Config
Benjamin Zores [Fri, 30 Jan 2009 00:19:13 +0000 (16:19 -0800)]
ipv4: fix infinite retry loop in IP-Config

[ Upstream commit 9d8dba6c979fa99c96938c869611b9a23b73efa9 ]

Signed-off-by: Benjamin Zores <benjamin.zores@alcatel-lucent.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
15 years agodrivers/net/skfp: if !capable(CAP_NET_ADMIN): inverted logic
Roel Kluin [Fri, 30 Jan 2009 01:32:20 +0000 (17:32 -0800)]
drivers/net/skfp: if !capable(CAP_NET_ADMIN): inverted logic

[ Upstream commit c25b9abbc2c2c0da88e180c3933d6e773245815a ]

Fix inverted logic

Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
15 years agonet: Fix OOPS in skb_seq_read().
Shyam Iyer [Fri, 30 Jan 2009 00:12:42 +0000 (16:12 -0800)]
net: Fix OOPS in skb_seq_read().

[ Upstream commit 71b3346d182355f19509fadb8fe45114a35cc499 ]

It oopsd for me in skb_seq_read. addr2line said it was
linux-2.6/net/core/skbuff.c:2228, which is this line:

while (st->frag_idx < skb_shinfo(st->cur_skb)->nr_frags) {

I added some printks in there and it looks like we hit this:

        } else if (st->root_skb == st->cur_skb &&
                   skb_shinfo(st->root_skb)->frag_list) {
                 st->cur_skb = skb_shinfo(st->root_skb)->frag_list;
                 st->frag_idx = 0;
                 goto next_skb;
        }

Actually I did some testing and added a few printks and found that the
st->cur_skb->data was 0 and hence the ptr used by iscsi_tcp was null.
This caused the kernel panic.

  if (abs_offset < block_limit) {
- *data = st->cur_skb->data + abs_offset;
+ *data = st->cur_skb->data + (abs_offset - st->stepped_offset);

I enabled the debug_tcp and with a few printks found that the code did
not go to the next_skb label and could find that the sequence being
followed was this -

It hit this if condition -

        if (st->cur_skb->next) {
                st->cur_skb = st->cur_skb->next;
                st->frag_idx = 0;
                goto next_skb;

And so, now the st pointer is shifted to the next skb whereas actually
it should have hit the second else if first since the data is in the
frag_list.

        else if (st->root_skb == st->cur_skb &&
                 skb_shinfo(st->root_skb)->frag_list) {
                st->cur_skb = skb_shinfo(st->root_skb)->frag_list;
                goto next_skb;
        }

Reversing the two conditions the attached patch fixes the issue for me
on top of Herbert's patches.

Signed-off-by: Shyam Iyer <shyam_iyer@dell.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
15 years agonet: Fix frag_list handling in skb_seq_read
Herbert Xu [Fri, 30 Jan 2009 00:07:52 +0000 (16:07 -0800)]
net: Fix frag_list handling in skb_seq_read

[ Upstream commit 95e3b24cfb4ec0479d2c42f7a1780d68063a542a ]

The frag_list handling was broken in skb_seq_read:

1) We didn't add the stepped offset when looking at the head
are of fragments other than the first.

2) We didn't take the stepped offset away when setting the data
pointer in the head area.

3) The frag index wasn't reset.

This patch fixes both issues.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
15 years agosctp: Properly timestamp outgoing data chunks for rtx purposes
Vlad Yasevich [Thu, 22 Jan 2009 22:53:01 +0000 (14:53 -0800)]
sctp: Properly timestamp outgoing data chunks for rtx purposes

[ Upstream commit 759af00ebef858015eb68876ac1f383bcb6a1774 ]

Recent changes to the retransmit code exposed a long standing
bug where it was possible for a chunk to be time stamped
after the retransmit timer was reset.  This caused a rare
situation where the retrnamist timer has expired, but
nothing was marked for retrnasmission because all of
timesamps on data were less then 1 rto ago.  As result,
the timer was never restarted since nothing was retransmitted,
and this resulted in a hung association that did couldn't
complete the data transfer.  The solution is to timestamp
the chunk when it's added to the packet for transmission
purposes.  After the packet is trsnmitted the rtx timer
is restarted.  This guarantees that when the timer expires,
there will be data to retransmit.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
15 years agosctp: Correctly start rtx timer on new packet transmissions.
Vlad Yasevich [Thu, 22 Jan 2009 22:52:43 +0000 (14:52 -0800)]
sctp: Correctly start rtx timer on new packet transmissions.

[ Upstream commit 6574df9a89f9f7da3a4e5cee7633d430319d3350 ]

Commit 62aeaff5ccd96462b7077046357a6d7886175a57
(sctp: Start T3-RTX timer when fast retransmitting lowest TSN)
introduced a regression where it was possible to forcibly
restart the sctp retransmit timer at the transmission of any
new chunk.  This resulted in much longer timeout times and
sometimes hung sctp connections.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
15 years agosctp: Fix crc32c calculations on big-endian arhes.
Vlad Yasevich [Thu, 22 Jan 2009 22:52:23 +0000 (14:52 -0800)]
sctp: Fix crc32c calculations on big-endian arhes.

[ Upstream commit 9c5ff5f75d0d0a1c7928ecfae3f38418b51a88e3 ]

crc32c algorithm provides a byteswaped result.  On little-endian
arches, the result ends up in big-endian/network byte order.
On big-endinan arches, the result ends up in little-endian
order and needs to be byte swapped again.  Thus calling cpu_to_le32
gives the right output.

Tested-by: Jukka Taimisto <jukka.taimisto@mail.suomi.net>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
15 years agolockd: fix regression in lockd's handling of blocked locks
J. Bruce Fields [Wed, 4 Feb 2009 22:35:38 +0000 (17:35 -0500)]
lockd: fix regression in lockd's handling of blocked locks

commit 9d9b87c1218be78ddecbc85ec3bb91c79c1d56ab upstream.

If a client requests a blocking lock, is denied, then requests it again,
then here in nlmsvc_lock() we will call vfs_lock_file() without FL_SLEEP
set, because we've already queued a block and don't need the locks code
to do it again.

But that means vfs_lock_file() will return -EAGAIN instead of
FILE_LOCK_DENIED.  So we still need to translate that -EAGAIN return
into a nlm_lck_blocked error in this case, and put ourselves back on
lockd's block list.

The bug was introduced by bde74e4bc64415b1 "locks: add special return
value for asynchronous locks".

Thanks to Frank van Maarseveen for the report; his original test
case was essentially

for i in `seq 30`; do flock /nfsmount/foo sleep 10 & done

Tested-by: Frank van Maarseveen <frankvm@frankvm.com>
Reported-by: Frank van Maarseveen <frankvm@frankvm.com>
Cc: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>