TCP connection tracking in netfilter did not handle TCP reopening
properly: active close was taken into account for one side only and
not for any side, which is fixed now. The patch includes more comments
to explain the logic how the different cases are handled.
The bug was discovered by Jeff Chua.
Some devices report medium error locations incorrectly. Add guards to
make sure the reported bad lba is actually in the request that caused
it. Additionally remove the large case statment for sector sizes and
replace it with the proper u64 divisions.
Tested-by: Mike Snitzer <snitzer@gmail.com> Cc: Stable Tree <stable@kernel.org> Cc: Tony Battersby <tonyb@cybernetics.com> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
So I spent a while pounding my head against my monitor trying to figure
out the vmsplice() vulnerability - how could a failure to check for
*read* access turn into a root exploit? It turns out that it's a buffer
overflow problem which is made easy by the way get_user_pages() is
coded.
In particular, "len" is a signed int, and it is only checked at the
*end* of a do {} while() loop. So, if it is passed in as zero, the loop
will execute once and decrement len to -1. At that point, the loop will
proceed until the next invalid address is found; in the process, it will
likely overflow the pages array passed in to get_user_pages().
I think that, if get_user_pages() has been asked to grab zero pages,
that's what it should do. Thus this patch; it is, among other things,
enough to block the (already fixed) root exploit and any others which
might be lurking in similar code. I also think that the number of pages
should be unsigned, but changing the prototype of this function probably
requires some more careful review.
Signed-off-by: Jonathan Corbet <corbet@lwn.net> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> CC: Oliver Pinter <oliver.pntr@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Quicklists calculates the size of the quicklists based on the number of
free pages. This must be the number of free pages that can be allocated
with GFP_KERNEL. node_page_state() includes the pages in ZONE_HIGHMEM and
ZONE_MOVABLE which may lead the quicklists to become too large causing OOM.
Signed-off-by: Christoph Lameter <clameter@sgi.com> Tested-by: Dhaval Giani <dhaval@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> Signed-off-by: Oliver Pinter <oliver.pntr@gmail.com>
Without this we always return 2^32-1 as the the maximum namelength.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Signed-off-by: Andreas Gruenbacher <agruen@suse.de> CC: Oliver Pinter <oliver.pntr@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
It doesn't look as if the NFS file name limit is being initialised correctly
in the struct nfs_server. Make sure that we limit whatever is being set in
nfs_probe_fsinfo() and nfs_init_server().
Also ensure that readdirplus and nfs4_path_walk respect our file name
limits.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Acked-by: Neil Brown <neilb@suse.de> CC: Oliver Pinter <oliver.pntr@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Neil Brown said:
> Hi Trond,
>
> We found that a machine which made moderately heavy use of
> 'automount' was leaking some nfs data structures - particularly the
> 4K allocated by rpc_alloc_iostats.
> It turns out that this only happens with filesystems with -onolock
> set.
> The problem is that if NFS_MOUNT_NONLM is set, nfs_start_lockd doesn't
> set server->destroy, so when the filesystem is unmounted, the
> ->client_acl is not shutdown, and so several resources are still
> held. Multiple mount/umount cycles will slowly eat away memory
> several pages at a time.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Acked-by: Neil Brown <neilb@suse.de> Signed-off-by: Neil Brown <neilb@suse.de> CC: Oliver Pinter <oliver.pntr@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The v2/v3 acl code in nfsd is translating any return from fh_verify() to
nfserr_inval. This is particularly unfortunate in the case of an
nfserr_dropit return, which is an internal error meant to indicate to
callers that this request has been deferred and should just be dropped
pending the results of an upcall to mountd.
Thanks to Roland <devzero@web.de> for bug report and data collection.
Cc: Roland <devzero@web.de> Acked-by: Andreas Gruenbacher <agruen@suse.de> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Reviewed-By: NeilBrown <neilb@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> CC: Oliver Pinter <oliver.pntr@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The VIA veloicty driver needs the following to allow changing MTU when down.
The buffer size needs to be computed when device is brought up, not when
device is initialized. This also fixes a bug where the buffer size was
computed differently on change_mtu versus initial setting.
Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: Jeff Garzik <jeff@garzik.org> Acked-by: Jeff Mahoney <jeffm@suse.com> CC: Oliver Pinter <oliver.pntr@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Simple mtu change when device is down.
Fix http://bugzilla.kernel.org/show_bug.cgi?id=9382.
Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net> Acked-by: Jeff Mahoney <jeffm@suse.com> CC: Oliver Pinter <oliver.pntr@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
sonypi_compat uses a kfifo that needs to be present before _SRS is
called to be able to cope with the IRQs triggered when setting
resources.
Signed-off-by: Mattia Dongili <malattia@linux.it> Signed-off-by: Len Brown <len.brown@intel.com> Acked-by: Jeff Mahoney <jeffm@suse.com> CC: Oliver Pinter <oliver.pntr@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Fix a couple drivers that do not correctly terminate their pci_device_id
lists. This results in garbage being spewed into modules.pcimap when the
module happens to not have 28 NULL bytes following the table, and/or the
last PCI ID is actually truncated from the table when calculating the
modules.alias PCI aliases, cause those unfortunate device IDs to not
auto-load.
Signed-off-by: Kees Cook <kees@ubuntu.com> Acked-by: Corey Minyard <minyard@acm.org> Cc: David Woodhouse <dwmw2@infradead.org> Acked-by: Jeff Garzik <jeff@garzik.org> Cc: Greg KH <greg@kroah.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Acked-by: Jeff Mahoney <jeffm@suse.com> CC: Oliver Pinter <oliver.pntr@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The TRACE_IRQS_ON function in iret_exc: calls a C function without
ensuring that the segments are set properly. Move the trace function and
the enabling of interrupt into the C stub.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Acked-by: Jeff Mahoney <jeffm@suse.com> CC: Oliver Pinter <oliver.pntr@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
If the inode is flagged as having an invalid mapping, then we can't rely on
the PageUptodate() flag. Ensure that we don't use the "anti-fragmentation"
write optimisation in nfs_updatepage(), since that will cause NFS to write
out areas of the page that are no longer guaranteed to be up to date.
A potential corruption could occur in the following scenario:
fd=open("f",O_WRONLY|O_APPEND);
write(fd,"bar\n",4);
close(fd);
-----
The bug may lead to the file "f" reading 'fubar\n\0\0\0\nbar\n' because
client 2 does not update the cached page after re-opening the file for
write. Instead it keeps it marked as PageUptodate() until someone calls
invalidate_inode_pages2() (typically by calling read()).
Ian Abbott [Mon, 4 Feb 2008 13:56:36 +0000 (13:56 +0000)]
PCI: Fix fakephp deadlock
This patch works around a problem in the fakephp driver when a process
writing "0" to a "power" sysfs file to fake removal of a PCI device ends
up deadlocking itself in the sysfs code.
The patch is functionally identical to the one in Linus' tree post 2.6.24:
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=5c796ae7a7ebe56967ed9b9963d7c16d733635ff
I have tested it on a 2.6.22 kernel.
Signed-off-by: Ian Abbott <abbotti@mev.co.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Second-generation Promise SATA controllers have an ASIC bug
which can trigger if the last PRD entry is larger than 164 bytes,
resulting in intermittent errors and possible data corruption.
Work around this by replacing calls to ata_qc_prep() with a
private version that fills the PRD, checks the size of the
last entry, and if necessary splits it to avoid the bug.
Also reduce sg_tablesize by 1 to accommodate the new entry.
Tested on the second-generation SATA300 TX4 and SATA300 TX2plus,
and the first-generation PDC20378.
Thanks to Alexander Sabourenkov for verifying the bug by
studying the vendor driver, and for writing the initial patch
upon which this one is based.
Signed-off-by: Mikael Pettersson <mikpe@it.uu.se> Cc: Jeff Garzik <jeff@garzik.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This patch corrects sata_promise to classify FastTrack TX4200
(DID 3515/3519) as a second-generation chip. Promise's partial-
source FT TX4200 driver confirms this classification.
Treating it as a first-generation chip causes several problems:
1. Detection failures. This is a recent regression triggered by
the hotplug-enabling changes in 2.6.23-rc1.
2. Various "failed to resume link for reset" warnings.
This patch fixes <http://bugzilla.kernel.org/show_bug.cgi?id=8936>.
Thanks to Stephen Ziemba for reporting the bug and for testing the fix.
Signed-off-by: Mikael Pettersson <mikpe@it.uu.se> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Commit 8811930dc74a503415b35c4a79d14fb0b408a361 ("splice: missing user
pointer access verification") added the proper access_ok() calls to
copy_from_user_mmap_sem() which ensures we can copy the struct iovecs
from userspace to the kernel.
But we also must check whether we can access the actual memory region
pointed to by the struct iovec to fix the access checks properly.
Signed-off-by: Bastian Blank <waldi@debian.org> Acked-by: Oliver Pinter <oliver.pntr@gmail.com> Cc: Jens Axboe <jens.axboe@oracle.com> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Nick Piggin [Sat, 2 Feb 2008 02:08:53 +0000 (03:08 +0100)]
vm audit: add VM_DONTEXPAND to mmap for drivers that need it (CVE-2008-0007)
Drivers that register a ->fault handler, but do not range-check the
offset argument, must set VM_DONTEXPAND in the vm_flags in order to
prevent an expanding mremap from overflowing the resource.
I've audited the tree and attempted to fix these problems (usually by
adding VM_DONTEXPAND where it is not obvious).
[POWERPC] Fix invalid semicolon after if statement
A similar fix to netfilter from Eric Dumazet inspired me to
look around a bit by using some grep/sed stuff as looking for
this kind of bugs seemed easy to automate. This is one of them
I found where it looks like this semicolon is not valid.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Changed resolution of named references in packages
Fixed a problem with the Package operator where all named
references were created as object references and left otherwise
unresolved. According to the ACPI specification, a Package can
only contain Data Objects or references to control methods. The
implication is that named references to Data Objects (Integer,
Buffer, String, Package, BufferField, Field) should be resolved
immediately upon package creation. This is the approach taken
with this change. References to all other named objects (Methods,
Devices, Scopes, etc.) are all now properly created as reference objects.
Here's proposed fix for RX checksum handling in cassini; it affects
little-endian working with half-duplex gigabit, but obviously needs
testing on big-endian too.
The problem is, we need to convert checksum to fixed-endian *before*
correcting for (unstripped) FCS. On big-endian it won't matter
(conversion is no-op), on little-endian it will, but only if FCS is
not stripped by hardware; i.e. in half-duplex gigabit mode when
->crc_size is set.
cassini.c part is that fix, cassini.h one consists of trivial
endianness annotations. With that applied the sucker is endian-clean,
according to sparse.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Al went through the ip_fast_csum callers and found this piece of code
that did not validate the IP header. While root crashing the machine
by sending bogus packets through raw or AF_PACKET sockets isn't that
serious, it is still nice to react gracefully.
This patch ensures that the skb has enough data for an IP header and
that the header length field is valid.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Chas Williams <chas@cmf.nrl.navy.mil> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
When re-naming an interface, the previous secondary address
labels get lost e.g.
$> brctl addbr foo
$> ip addr add 192.168.0.1 dev foo
$> ip addr add 192.168.0.2 dev foo label foo:00
$> ip addr show dev foo | grep inet
inet 192.168.0.1/32 scope global foo
inet 192.168.0.2/32 scope global foo:00
$> ip link set foo name bar
$> ip addr show dev bar | grep inet
inet 192.168.0.1/32 scope global bar
inet 192.168.0.2/32 scope global bar:2
Turns out to be a simple thinko in inetdev_changename() - clearly we
want to look at the address label, rather than the device name, for
a suffix to retain.
Signed-off-by: Mark McLoughlin <markmc@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The aalgos/ealgos fields are only 32 bits wide. However, af_key tries
to test them with the expression 1 << id where id can be as large as
253. This produces different behaviour on different architectures.
The following patch explicitly checks whether ID is greater than 31
and fails the check if that's the case.
We cannot easily extend the mask to be longer than 32 bits due to
exposure to user-space. Besides, this whole interface is obsolete
anyway in favour of the xfrm_user interface which doesn't use this
bit mask in templates (well not within the kernel anyway).
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
If we get an error during the actual policy lookup we don't free the
original dst while the caller expects us to always free the original
dst in case of error.
This patch fixes that.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The following patch corrects this performance/latency problem,
removing quadratic behavior.
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Since we setup the 256M/4M bitmap table after taking over the trap
table, it's possible for some 4M mapping to get loaded in the TLB
beforhand which later will be 256M mappings.
This can cause illegal TLB multiple-match conditions. Fix this by
setting up the bitmap before we take over the trap table.
Next, __flush_tlb_all() was not doing anything on hypervisor
platforms. Fix by adding sun4v_mmu_demap_all() and calling it.
Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The function x25_get_neigh increments a reference count. At the point of
the second goto out, the result of calling x25_get_neigh is only stored in
a local variable, and thus no one outside the function will be able to
decrease the reference count. Thus, x25_neigh_put should be called before
the return in this case.
The problem was found using the following semantic match.
(http://www.emn.fr/x-info/coccinelle/)
// <smpl>
@@
type T,T1,T2;
identifier E;
statement S;
expression x1,x2,x3;
int ret;
@@
T E;
...
* if ((E = x25_get_neigh(...)) == NULL)
S
... when != x25_neigh_put(...,(T1)E,...)
when != if (E != NULL) { ... x25_neigh_put(...,(T1)E,...); ...}
when != x1 = (T1)E
when != E = x3;
when any
if (...) {
... when != x25_neigh_put(...,(T2)E,...)
when != if (E != NULL) { ... x25_neigh_put(...,(T2)E,...); ...}
when != x2 = (T2)E
(
* return;
|
* return ret;
)
}
// </smpl>
Signed-off-by: Julia Lawall <julia@diku.dk> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Way back when (in commit 834f2a4a1554dc5b2598038b3fe8703defcbe467, aka
"VFS: Allow the filesystem to return a full file pointer on open intent"
to be exact), Trond changed the open logic to keep track of the original
flags to a file open, in order to pass down the the intent of a dentry
lookup to the low-level filesystem.
However, when doing that reorganization, it changed the meaning of
namei_flags, and thus inadvertently changed the test of access mode for
directories (and RO filesystem) to use the wrong flag. So fix those
test back to use access mode ("acc_mode") rather than the open flag
("flag").
Issue noticed by Bill Roman at Datalight.
Reported-and-tested-by: Bill Roman <bill.roman@datalight.com> Acked-by: Trond Myklebust <Trond.Myklebust@netapp.com> Acked-by: Al Viro <viro@ZenIV.linux.org.uk> Cc: Christoph Hellwig <hch@lst.de> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Spurious NCQ completion detection implemented in ahci was incorrect.
On AHCI receving and processing FISes and raising interrupts are not
interlocked and spurious interrupts are expected.
For example, if an interrupt occurs while interrupt handler is running
and the running interrupt handler handles the event the new IRQ
indicated, after IRQ handler finishes, it will be executed again
because IRQ pending bit is set by the new interrupt but there won't be
anything to process.
Please read the following message for more information.
http://article.gmane.org/gmane.linux.ide/26012
This patch...
* Removes all spurious IRQ whining from ahci. Spurious NCQ completion
detection was completely wrong. Spurious D2H Register FIS taught us
that some early drives send spurious D2H Register FIS with I bit set
while NCQ commands are in progress but none of recent drives does
that and even the ones which show such behavior can do NCQ fine.
* Kills all NCQ blacklist entries which were added because of spurious
NCQ completions. I tracked down each commit and verified all
removed ones are actually added because of spurious completions.
WD740ADFD-00NLR1 wasn't deleted but moved upward because the drive
not only had spurious NCQ completions but also is slow on sequential
data transfers if NCQ is enabled.
Maxtor 7V300F0 was added by 0e3dbc01d53940fe10e5a5cfec15ede3e929c918
from Alan Cox. I can only find evidences that the drive only had
troubles with spuruious completions by searching the mailing list.
This entry needs to be verified and removed if it doesn't have other
NCQ related problems.
Signed-off-by: Tejun Heo <htejun@gmail.com> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Signed-off-by: Jeff Garzik <jeff@garzik.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The xfrm_timer calls __xfrm_state_delete, which drops the final reference
manually without triggering destruction of the state. Change it to use
xfrm_state_put to add the state to the gc list when we're dropping the
last reference. The timer function may still continue to use the state
safely since the final destruction does a del_timer_sync().
Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
We need to disable all CPUs other than the boot CPU (usually 0) before
attempting to power-off modern SMP machines. This fixes the
hang-on-poweroff issue on my MythTV SMP box, and also on Thomas Gleixner's
new toybox.
Signed-off-by: Mark Lord <mlord@pobox.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
There still is a remaining shutdown problem in 2.6.22 with old APM based
systems, but this fix is not the correct one
fsid_source decided where to get the 'fsid' number to
return for a GETATTR based on the type of filehandle.
It can be from the device, from the fsid, or from the
UUID.
It is possible for the filehandle to be inconsistent
with the export information, so make sure the export information
actually has the info implied by the value returned by
fsid_source.
Signed-off-by: Neil Brown <neilb@suse.de> Cc: "Luiz Fernando N. Capitulino" <lcapitulino@gmail.com> Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oliver Pintr <oliver.pntr@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
In case the br_netfilter_init() (or any subsequent call)
fails, the br_fdb_fini() must be called to free the allocated
in br_fdb_init() br_fdb_cache kmem cache.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
As far as I see from the err variable initialization
the dn_nl_deladdr() routine was designed to report errors
like "EADDRNOTAVAIL" and probaby "ENODEV".
But the code sets this err to 0 after the first nlmsg_parse
and goes on, returning this 0 in any case.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Acked-by: Steven Whitehouse <swhiteho@redhat.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Avaid provided test application, so bug got fixed.
IPv6 addrconf removes ipv6 inner device from netdev each time cmu
changes and new value is less than IPV6_MIN_MTU (1280 bytes).
When mtu is changed and new value is greater than IPV6_MIN_MTU,
it does not add ipv6 addresses and inner device bac.
This patch fixes that.
Tested with Avaid's application, which works ok now.
Lachlan Andrew observed that my TCP-Illinois implementation uses the
beta value incorrectly:
The parameter beta in the paper specifies the amount to decrease
*by*: that is, on loss,
W <- W - beta*W
but in tcp_illinois_ssthresh() uses beta as the amount
to decrease *to*: W <- beta*W
This bug makes the Linux TCP-Illinois get less-aggressive on uncongested network,
hurting performance. Note: since the base beta value is .5, it has no
impact on a congested network.
Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
I am not absolutely sure whether this actually is a bug (as in: I've got
no clue what the standards say or what other implementations do), but at
least I was pretty surprised when I noticed that a recv() on a
non-blocking unix domain socket of type SOCK_SEQPACKET (which is connection
oriented, after all) where the remote end has closed the connection
returned -1 (EAGAIN) rather than 0 to indicate end of file.
if you are lucky (unlucky?) enough to have shared interrupts, the
interrupt handler can be called before the tasklet and lock are ready
for use.
Signed-off-by: chas williams <chas@cmf.nrl.navy.mil> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Cc: David Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
As it is crypto_remove_spawn may try to unregister an instance which is
yet to be registered. This patch fixes this by checking whether the
instance has been registered before attempting to remove it.
It also removes a bogus cra_destroy check in crypto_register_instance as
1) it's outside the mutex;
2) we have a check in __crypto_register_alg already.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Cc: David Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The #ifdef's in arp_process() were not only a mess, they were also wrong
in the CONFIG_NET_ETHERNET=n and (CONFIG_NETDEV_1000=y or
CONFIG_NETDEV_10000=y) cases.
Since they are not required this patch removes them.
Also removed are some #ifdef's around #include's that caused compile
errors after this change.
Signed-off-by: Adrian Bunk <bunk@kernel.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Cc: David Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
It seems that stats of cpu 0 are counted twice, since
for_each_possible_cpu() is looping on all possible cpus, including 0
Before percpu conversion of ip_rt_acct, we should also remove the
assumption that CPU 0 is online (or even possible)
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
When the abstraction functions got added, conversion here was
made incorrectly. As a result, the skb may end up pointing
to skb which got included to the probe skb and then was freed.
For it to trigger, however, skb_transmit must fail sending as
well.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net> Cc: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
sysctl_tcp_congestion_control seems to have a bug that prevents it
from actually calling the tcp_set_default_congestion_control
function. This is not so apparent because it does not return an error
and generally the /proc interface is used to configure the default TCP
congestion control algorithm. This is present in 2.6.18 onwards and
probably earlier, though I have not inspected 2.6.15--2.6.17.
sysctl_tcp_congestion_control calls sysctl_string and expects a successful
return code of 0. In such a case it actually sets the congestion control
algorithm with tcp_set_default_congestion_control. Otherwise, it returns the
value returned by sysctl_string. This was correct in 2.6.14, as sysctl_string
returned 0 on success. However, sysctl_string was updated to return 1 on
success around about 2.6.15 and sysctl_tcp_congestion_control was not updated.
Even though sysctl_tcp_congestion_control returns 1, do_sysctl_strategy
converts this return code to '0', so the caller never notices the error.
Signed-off-by: David S. Miller <davem@davemloft.net> Cc: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The code in fb_ddc_read() is said to be based on the implementation of the
radeon driver:
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=fc5891c8a3ba284f13994d7bc1f1bfa8283982de
However, comparing the old radeon driver code with the new fb_ddc code
reveals some differences. Most notably, the I2C bus lines are held at the
end of the function, while the original code was releasing them (as the
comment above correctly says.)
There are a few other differences, which appear to be responsible for read
failures on my system. While tracing low-level I2C code in i2c-algo-bit, I
noticed that the initial attempt to read the EDID always failed. It takes
one retry for the read to succeed. As we are about to remove this
automatic retry property from i2c-algo-bit, reading the EDID would really
fail.
As a summary, the I2C lines quirk which is supposedly needed to read EDID
on some older monitors is currently breaking the (first) read on all other
monitors (and might not even work with older ones - did anyone try since
October 2006?)
After applying the patch below, which makes the code in fb_ddc_read()
really similar to what the radeon driver used to have, the first EDID read
succeeds again.
On top of that, as it appears that this code has been broken for one year
now and nobody seems to have complained, I'm curious if it makes sense to
keep this quirk in place. It makes the code more complex and slower just
for the sake of monitors which I guess nobody uses anymore. Can't we just
get rid of it?
Signed-off-by: Jean Delvare <khali@linux-fr.org> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Tested-by: Roger Leigh <rleigh@whinlatter.ukfsn.org> Tested-by: Michael Buesch <mb@bu3sch.de> Cc: "Antonino A. Daplas" <adaplas@pol.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Fix a long boot delay in the forcedeth driver. During initialization, the
timeout for the handshake between mgmt unit and driver can be very long.
The patch reduces the timeout by eliminating a extra loop around the
timeout logic.
Signed-off-by: Ayaz Abdulla <aabdulla@nvidia.com> Cc: Alex Howells <astinus@gentoo.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Jeff Garzik <jeff@garzik.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This patch adds new device ids and features for mcp79 devices into the
forcedeth driver.
Signed-off-by: Ayaz Abdulla <aabdulla@nvidia.com> Signed-off-by: Jeff Garzik <jgarzik@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
index 92ce2e3..f9ba0ac 100644
David Holmes found a bug in the -rt tree with respect to
pthread_cond_timedwait. After trying his test program on the latest git
from mainline, I found the bug was there too. The bug he was seeing
that his test program showed, was that if one were to do a "Ctrl-Z" on a
process that was in the pthread_cond_timedwait, and then did a "bg" on
that process, it would return with a "-ETIMEDOUT" but early. That is,
the timer would go off early.
Looking into this, I found the source of the problem. And it is a rather
nasty bug at that.
Here's the relevant code from kernel/futex.c: (not in order in the file)
[...]
smlinkage long sys_futex(u32 __user *uaddr, int op, u32 val,
struct timespec __user *utime, u32 __user *uaddr2,
u32 val3)
{
struct timespec ts;
ktime_t t, *tp = NULL;
u32 val2 = 0;
int cmd = op & FUTEX_CMD_MASK;
if (utime && (cmd == FUTEX_WAIT || cmd == FUTEX_LOCK_PI)) {
if (copy_from_user(&ts, utime, sizeof(ts)) != 0)
return -EFAULT;
if (!timespec_valid(&ts))
return -EINVAL;
t = timespec_to_ktime(ts);
if (cmd == FUTEX_WAIT)
t = ktime_add(ktime_get(), t);
tp = &t;
}
[...]
return do_futex(uaddr, op, val, tp, uaddr2, val2, val3);
}
[...]
long do_futex(u32 __user *uaddr, int op, u32 val, ktime_t *timeout,
u32 __user *uaddr2, u32 val2, u32 val3)
{
int ret;
int cmd = op & FUTEX_CMD_MASK;
struct rw_semaphore *fshared = NULL;
if (!(op & FUTEX_PRIVATE_FLAG))
fshared = ¤t->mm->mmap_sem;
switch (cmd) {
case FUTEX_WAIT:
ret = futex_wait(uaddr, fshared, val, timeout);
So when the futex_wait is interrupt by a signal we break out of the
hrtimer code and set up or return from signal. This code does not return
back to userspace, so we set up a RESTARTBLOCK. The bug here is that we
save the "abs_time" which is a pointer to the stack variable "ktime_t t"
from sys_futex.
This returns and unwinds the stack before we get to call our signal. On
return from the signal we go to futex_wait_restart, where we update all
the parameters for futex_wait and call it. But here we have a problem
where abs_time is no longer valid.
I verified this with print statements, and sure enough, what abs_time
was set to ends up being garbage when we get to futex_wait_restart.
The solution I did to solve this (with input from Linus Torvalds)
was to add unions to the restart_block to allow system calls to
use the restart with specific parameters. This way the futex code now
saves the time in a 64bit value in the restart block instead of storing
it on the stack.
Note: I'm a bit nervious to add "linux/types.h" and use u32 and u64
in thread_info.h, when there's a #ifdef __KERNEL__ just below that.
Not sure what that is there for. If this turns out to be a problem, I've
tested this with using "unsigned int" for u32 and "unsigned long long" for
u64 and it worked just the same. I'm using u32 and u64 just to be
consistent with what the futex code uses.
Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Relative hrtimers with a large timeout value might end up as negative
timer values, when the current time is added in hrtimer_start().
This in turn is causing the clockevents_set_next() function to set an
huge timeout and sleep for quite a long time when we have a clock
source which is capable of long sleeps like HPET. With PIT this almost
goes unnoticed as the maximum delta is ~27ms. The non-hrt/nohz code
sorts this out in the next timer interrupt, so we never noticed that
problem which has been there since the first day of hrtimers.
This bug became more apparent in 2.6.24 which activates HPET on more
hardware.
[LIB] crc32c: Keep intermediate crc state in cpu order
crypto/crc32.c:chksum_final() is computing the digest as
*(__le32 *)out = ~cpu_to_le32(mctx->crc);
so the low-level crc32c_le routines should just keep
the crc in cpu order, otherwise it is getting swabbed
one too many times on big-endian machines.
Li Zefan [Wed, 28 Nov 2007 08:56:27 +0000 (09:56 +0100)]
nf_nat: fix memset error
This patch fixes an incorrect memset in the NAT code, causing
misbehaviour when unloading and reloading the NAT module.
Applies to stable-2.6.22 and stable-2.6.23.
The size passing to memset is the size of a pointer. Fixes
misbehaviour when unloading and reloading the NAT module.
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
tmpfs was misconverted to __GFP_ZERO in 2.6.11. There's an unusual case in
which shmem_getpage receives the page from its caller instead of allocating.
We must cover this case by clear_highpage before SetPageUptodate, as before.
A recent patch added software synchronization during EHCI startup,
so ports aren't switched away from the companion controllers after
resets have started. This patch adds a short delay letting hardware
finish that port switching before any new resets begin ... so both
ends of that hardware race window are closed.
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net> Cc: Dave Miller <davem@davemloft.net> Cc: Dely Sy <dely.l.sy@intel.com> Cc: Alan Stern <stern@rowland.harvard.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
In wait_task_stopped() exit_code already contains the right value for the
si_status member of siginfo, and this is simply set in the non WNOWAIT
case.
If you call waitid() with a stopped or traced process, you'll get the signal
in siginfo.si_status as expected -- however if you call waitid(WNOWAIT) at the
same time, you'll get the signal << 8 | 0x7f
Pass it unchanged to wait_noreap_copyout(); we would only need to shift it
and add 0x7f if we were returning it in the user status field and that
isn't used for any function that permits WNOWAIT.
Signed-off-by: Scott James Remnant <scott@ubuntu.com> Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: Roland McGrath <roland@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
We have seen ramdisk based install systems, where some pages of mapped
libraries and programs were suddendly zeroed under memory pressure. This
should not happen, as the ramdisk avoids freeing its pages by keeping
them dirty all the time.
It turns out that there is a case, where the VM makes a ramdisk page
clean, without telling the ramdisk driver. On memory pressure
shrink_zone runs and it starts to run shrink_active_list. There is a
check for buffer_heads_over_limit, and if true, pagevec_strip is called.
pagevec_strip calls try_to_release_page. If the mapping has no
releasepage callback, try_to_free_buffers is called. try_to_free_buffers
has now a special logic for some file systems to make a dirty page
clean, if all buffers are clean. Thats what happened in our test case.
The simplest solution is to provide a noop-releasepage callback for the
ramdisk driver. This avoids try_to_free_buffers for ramdisk pages.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Jan Kara <jack@suse.cz> Acked-by: Nick Piggin <npiggin@suse.de> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The L1 network chip can DMA to 64-bit addresses, but multiple descriptor
rings share a single register for the high 32 bits of their address, so
only a single, aligned, 4 GB physical address range can be used at a time.
As a result, we need to confine the driver to a 32-bit DMA mask, otherwise
we see occasional data corruption errors in systems containing 4 or more
gigabytes of RAM.
Signed-off-by: Luca Tettamanti <kronos.it@gmail.com> Signed-off-by: Jay Cliburn <jacliburn@bellsouth.net> Acked-by: Chris Snook <csnook@redhat.com> Signed-off-by: Jeff Garzik <jeff@garzik.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Recent (i.e. 2005 and later) Sony Vaio laptops have names beginning
with VGN rather than PCG. Update the eeprom driver so that it
recognizes these.
Why this matters: the eeprom driver hides private data from the
EEPROMs it recognizes as Vaio EEPROMs (passwords, serial number...) so
if the driver fails to recognize a Vaio EEPROM as such, the private
data is exposed to the world.
Signed-off-by: Jean Delvare <khali@linux-fr.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The sysfs interface to DMI data takes care to not make the system
serial number and UUID world-readable, presumably due to privacy
concerns. For consistency, we should not let the eeprom driver
export these same strings to the world on Sony Vaio laptops.
Instead, only make them readable by root, as we already do for BIOS
passwords.
Signed-off-by: Jean Delvare <khali@linux-fr.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
On file systems which don't support sparse files, Ocfs2_map_page_blocks()
was reading blocks on appending writes. This caused write performance to
suffer dramatically. Fix this by detecting an appending write on a nonsparse
fs and skipping the read.
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The patch
- Includes the call to capilib_data_b3_req in the spinlock. This routine
in turn calls the offending mq_enqueue routine that triggered the
freeze if not locked. This should also fix other indicators of
incosistent capilib_msgidqueue list, that trigger messages like:
Oct 5 03:05:57 BERL0 kernel: kcapi: msgid 3019 ncci 0x30301 not on queue
that we saw several times a day (usually several in a row).
- Fixes all occurrences of c4_dispatch_tx to be called with active
spinlock, there were some instances where no lock was active. Mostly
these are in very infrequently called routines, so the additional
performance penalty is minimal.
This patch (as999) fixes a problem that sometimes shows up when host
controller driver modules are loaded in the wrong order. If ehci-hcd
happens to initialize an EHCI controller while the companion OHCI or
UHCI controller is in the middle of a port reset, the reset can fail
and the companion may get very confused. The patch adds an
rw-semaphore and uses it to keep EHCI initialization and port resets
mutually exclusive.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Acked-by: David Brownell <david-b@pacbell.net> Cc: David Miller <davem@davemloft.net> Cc: Dely L Sy <dely.l.sy@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>