git.kernelconcepts.de Git - karo-tx-linux.git/log
karo-tx-linux.git
14 years ago Linux 2.6.29.3 v2.6.29.3
Greg Kroah-Hartman [Fri, 8 May 2009 22:47:21 +0000 (15:47 -0700)]
Linux 2.6.29.3

14 years ago ath9k: Fix FIF_BCN_PRBRESP_PROMISC handling
Luis R. Rodriguez [Wed, 6 May 2009 00:04:11 +0000 (17:04 -0700)]
ath9k: Fix FIF_BCN_PRBRESP_PROMISC handling

This is a port of commit
91ed19f5f66a7fe544f0ec385e981f43491d1d5a
for 2.6.29.

Without this, after scanning, your device will set
the association ID to something bogus, and the reported
symptom is that multicast/broadcast frames are not
being received. For details see this bug report:

https://bugzilla.redhat.com/show_bug.cgi?id=498502

From the original commit:

So that a newly created IBSS network
doesn't break on the first scan.

It seems to Sujith and me that this
stupid code is unnecessary, too.

So remove it...

Reported-by: David Woodhouse <dwmw2@infradead.org>
Tested-by: David Woodhouse <dwmw2@infradead.org>
Signed-off-by: Alina Friedrichsen <x-alina@gmx.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Jouni Malinen <Jouni.Malinen@atheros.com>
Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
14 years ago tracing: x86, mmiotrace: fix range test
Stuart Bennett [Tue, 28 Apr 2009 19:17:48 +0000 (20:17 +0100)]
tracing: x86, mmiotrace: fix range test

commit 33015c85995716d03f6293346cf05a1908b0fb9a upstream.

Matching on (addr == (p->addr + p->len)) causes problems when mappings
are adjacent.

[ Impact: fix mmiotrace confusion on adjacent iomaps ]

Signed-off-by: Stuart Bennett <stuart@freedesktop.org>
Acked-by: Pekka Paalanen <pq@iki.fi>
Cc: Steven Rostedt <rostedt@goodmis.org>
LKML-Reference: <1240946271-7083-2-git-send-email-stuart@freedesktop.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
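A minimal sketch of the range test the mmiotrace commit above describes, not the kernel's own code. `struct probe_sketch` and both function names are hypothetical; the only thing taken from the commit is that matching on `addr == p->addr + p->len` is wrong when mappings are adjacent.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a traced mapping: start address and length. */
struct probe_sketch {
    uintptr_t addr;
    uintptr_t len;
};

/* Half-open test: addr belongs to [p->addr, p->addr + p->len). */
static bool probe_contains(const struct probe_sketch *p, uintptr_t addr)
{
    return addr >= p->addr && addr < p->addr + p->len;
}

/* Closed test, shown only for contrast: it also claims the first byte
 * of whatever mapping starts right after this one. */
static bool probe_contains_closed(const struct probe_sketch *p, uintptr_t addr)
{
    return addr >= p->addr && addr <= p->addr + p->len;
}
```

With two adjacent mappings at 0x1000 and 0x2000, each 0x1000 long, the closed test attributes address 0x2000 to both of them, while the half-open test attributes it only to the second.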
14 years ago sched: account system time properly
Eric Dumazet [Wed, 29 Apr 2009 12:44:49 +0000 (14:44 +0200)]
sched: account system time properly

commit f5f293a4e3d0a0c52cec31de6762c95050156516 upstream.

Andrew Gallatin reported that IRQ and SOFTIRQ times were
sometimes not reported correctly on recent kernels, and even
bisected this to commit 457533a7d3402d1d91fbc125c8bd1bd16dcd3cd4
([PATCH] fix scaled & unscaled cputime accounting) as the first
bad commit.

Further analysis showed that commit
79741dd35713ff4f6fd0eafd59fa94e8a4ba922d ([PATCH] idle cputime
accounting) was the real cause of the problem.

account_process_tick() was not taking into account a timer IRQ
interrupting the idle task while it was servicing a hard or soft irq.

On a mostly idle cpu, irqs were thus not accounted, and top or
mpstat could tell the user/admin that the cpu was 100 % idle, 0.00 %
irq, 0.00 % softirq, while it was not.

[ Impact: fix occasionally incorrect CPU statistics in top/mpstat ]

Reported-by: Andrew Gallatin <gallatin@myri.com>
Re-reported-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: rick.jones2@hp.com
Cc: brice@myri.com
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
LKML-Reference: <49F84BC1.7080602@cosmosbay.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
14 years ago rndis_wlan: fix initialization order for workqueue&workers
Jussi Kivilinna [Wed, 22 Apr 2009 07:59:37 +0000 (10:59 +0300)]
rndis_wlan: fix initialization order for workqueue&workers

commit e805e4d0b53506dff4255a2792483f094e7fcd2c upstream.

rndis_wext_link_change() might be called from rndis_command() at
initialization stage and priv->workqueue/priv->work have not been
initialized yet. This causes an invalid opcode at rndis_wext_bind on
some brands of bcm4320.

Fix by initializing workqueue/workers in rndis_wext_bind() before
rndis_command is used.

This bug has existed since 2.6.25, reported at:
http://bugzilla.kernel.org/show_bug.cgi?id=12794

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
14 years ago mm: fix Committed_AS underflow on large NR_CPUS environment
KOSAKI Motohiro [Tue, 28 Apr 2009 20:48:11 +0000 (22:48 +0200)]
mm: fix Committed_AS underflow on large NR_CPUS environment

commit 00a62ce91e554198ef28234c91c36f850f5a3bc9 upstream

The Committed_AS field can underflow in certain situations:

>         # while true; do cat /proc/meminfo  | grep _AS; sleep 1; done | uniq -c
>               1 Committed_AS: 18446744073709323392 kB
>              11 Committed_AS: 18446744073709455488 kB
>               6 Committed_AS:    35136 kB
>               5 Committed_AS: 18446744073709454400 kB
>               7 Committed_AS:    35904 kB
>               3 Committed_AS: 18446744073709453248 kB
>               2 Committed_AS:    34752 kB
>               9 Committed_AS: 18446744073709453248 kB
>               8 Committed_AS:    34752 kB
>               3 Committed_AS: 18446744073709320960 kB
>               7 Committed_AS: 18446744073709454080 kB
>               3 Committed_AS: 18446744073709320960 kB
>               5 Committed_AS: 18446744073709454080 kB
>               6 Committed_AS: 18446744073709320960 kB

This happens because NR_CPUS can be greater than 1000 and
meminfo_proc_show() does not check for underflow.

But scaling in proportion to NR_CPUS is not a good calculation.  In
general, the likelihood of lock contention is proportional to the number
of online cpus, not the theoretical maximum number of cpus (NR_CPUS).

The current kernel has a generic percpu-counter facility.  Using it is
the right approach: it simplifies the code, and
percpu_counter_read_positive() does not have the underflow issue.

Reported-by: Dave Hansen <dave@linux.vnet.ibm.com>
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Eric B Munson <ebmunson@us.ibm.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Christoph Lameter <cl@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
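A small sketch of why the Committed_AS output quoted in the commit above shows values near 2^64, and what clamping at zero changes. It is not the kernel's code: both function names are hypothetical, a 4 kB page size is assumed, and the page count -57056 is chosen only because, under that assumption, it reproduces the first value in the quoted output.

```c
#include <stdint.h>

/* Assumed for illustration: 4 kB pages, so pages -> kB is a shift by 2. */
#define PAGES_TO_KB_SHIFT 2

/* Printing a transiently negative page count as unsigned wraps around. */
static uint64_t committed_kb_unchecked(int64_t pages)
{
    return (uint64_t)pages << PAGES_TO_KB_SHIFT;
}

/* Clamp at zero before converting, in the spirit of what the commit
 * says percpu_counter_read_positive() provides. */
static uint64_t committed_kb_clamped(int64_t pages)
{
    if (pages < 0)
        pages = 0;
    return (uint64_t)pages << PAGES_TO_KB_SHIFT;
}
```

Under these assumptions, `committed_kb_unchecked(-57056)` yields 18446744073709323392, while the clamped variant yields 0 for the same input and 35136 for 8784 pages.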
14 years ago Ignore madvise(MADV_WILLNEED) for hugetlbfs-backed regions
Mel Gorman [Tue, 5 May 2009 15:37:17 +0000 (16:37 +0100)]
Ignore madvise(MADV_WILLNEED) for hugetlbfs-backed regions

commit a425a638c858fd10370b573bde81df3ba500e271 upstream.

madvise(MADV_WILLNEED) forces page cache readahead on a range of memory
backed by a file.  The assumption is made that the page required is
order-0 and "normal" page cache.

On hugetlbfs, this assumption is not true and order-0 pages are
allocated and inserted into the hugetlbfs page cache.  This leaks
hugetlbfs page reservations and can cause BUGs to trigger related to
corrupted page tables.

This patch causes MADV_WILLNEED to be ignored for hugetlbfs-backed
regions.

Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
14 years ago clockevents: prevent endless loop in tick_handle_periodic()
john stultz [Fri, 1 May 2009 20:10:25 +0000 (13:10 -0700)]
clockevents: prevent endless loop in tick_handle_periodic()

commit 74a03b69d1b5ce00a568e142ca97e76b7f5239c6 upstream.

tick_handle_periodic() can lock up hard when a one shot clock event
device is used in combination with jiffies clocksource.

Avoid an endless loop issue by requiring that a highres valid
clocksource be installed before we call tick_periodic() in a loop when
using ONESHOT mode. The result is we will only increment jiffies once
per interrupt until a continuous hardware clocksource is available.

Without this, we can run into an endless loop: on each cycle through
the loop, jiffies is updated, which increments time by tick_period or
more (due to clock steering).  The event programming can then conclude
that the next event lies before the newly incremented time and fail,
causing tick_periodic() to be called again, so the whole process loops
forever.

[ Impact: prevent hard lock up ]

Signed-off-by: John Stultz <johnstul@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
14 years ago intel-iommu: Avoid panic() for DRHD at address zero.
David Woodhouse [Tue, 5 May 2009 08:25:28 +0000 (09:25 +0100)]
intel-iommu: Avoid panic() for DRHD at address zero.

(cherry picked from commit e523b38e2f568af58baa13120a994cbf24e6dee0)

If the BIOS does something obviously stupid, like claiming that the
registers for the IOMMU are at physical address zero, then print a nasty
message and abort, rather than trying to set up the IOMMU and then later
panicking.

It's becoming more and more obvious that trusting this stuff to the BIOS
was a mistake.

Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
14 years ago intel-iommu: Fix oops in device_to_iommu() when devices not found.
David Woodhouse [Tue, 5 May 2009 08:25:26 +0000 (09:25 +0100)]
intel-iommu: Fix oops in device_to_iommu() when devices not found.

(cherry picked from commit 4958c5dc7bcb2e42d985cd26aeafd8a7eca9ab1e)

It's possible for a device in the drhd->devices[] array to be NULL if
it wasn't found at boot time, which means we have to check for that
case.

Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
14 years ago intel-iommu: Fix device-to-iommu mapping for PCI-PCI bridges.
David Woodhouse [Tue, 5 May 2009 08:25:23 +0000 (09:25 +0100)]
intel-iommu: Fix device-to-iommu mapping for PCI-PCI bridges.

(cherry picked from commit 924b6231edfaf1e764ffb4f97ea382bf4facff58)

When the DMAR table identifies that a PCI-PCI bridge belongs to a given
IOMMU, that means that the bridge and all devices behind it should be
associated with the IOMMU. Not just the bridge itself.

This fixes the device_to_iommu() function accordingly.

(It's broken if you have the same PCI bus numbers in multiple domains,
but this function was always broken in that way; I'll be dealing with
that later).

Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
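A rough sketch of the matching rule the PCI-PCI bridge commit above describes: a device belongs to an IOMMU's scope if it is listed directly or sits on a bus behind a listed bridge. This is an illustration under assumptions, not the driver's code. `struct dev_sketch`, its fields, and `scope_covers()` are hypothetical, and PCI domains are ignored just as the commit message says the original function ignores them.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified device record (not struct pci_dev). */
struct dev_sketch {
    unsigned char bus, devfn;
    bool is_bridge;
    unsigned char secondary, subordinate; /* bus range behind a bridge */
};

/* Does the scope list cover the device at (bus, devfn)? */
static bool scope_covers(struct dev_sketch **devices, size_t n,
                         unsigned char bus, unsigned char devfn)
{
    for (size_t i = 0; i < n; i++) {
        const struct dev_sketch *d = devices[i];

        if (!d)                 /* entry that was not found at boot */
            continue;
        if (d->bus == bus && d->devfn == devfn)
            return true;        /* listed directly */
        if (d->is_bridge && d->secondary <= bus && bus <= d->subordinate)
            return true;        /* behind a listed bridge */
    }
    return false;
}
```

The `NULL` check mirrors the case from the device_to_iommu() oops fix listed just before this commit, where an array entry may be NULL if the device was not found at boot time.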
14 years ago cs5536: define dma_sff_read_status() method
Sergei Shtylyov [Tue, 5 May 2009 11:34:34 +0000 (15:34 +0400)]
cs5536: define dma_sff_read_status() method

commit 15da90b516e9da92cc1d90001e640fd6707d0e27 upstream.

The driver somehow got merged with the initializer for the dma_sff_read_status()
method missing, which caused a kernel panic on bootup.

This should fix the kernel.org bug #13026...

Signed-off-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Reported-by: Arnd Hannemann <hannemann@nets.rwth-aachen.de>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
14 years ago proc: avoid information leaks to non-privileged processes
Jake Edge [Mon, 4 May 2009 18:51:14 +0000 (12:51 -0600)]
proc: avoid information leaks to non-privileged processes

commit f83ce3e6b02d5e48b3a43b001390e2b58820389d upstream.

By using the same test as is used for /proc/pid/maps and /proc/pid/smaps,
only allow processes that can ptrace() a given process to see information
that might be used to bypass address space layout randomization (ASLR).
These include eip, esp, wchan, and start_stack in /proc/pid/stat as well
as the non-symbolic output from /proc/pid/wchan.

ASLR can be bypassed by sampling eip as shown by the proof-of-concept
code at http://code.google.com/p/fuzzyaslr/.  As part of a presentation
(http://www.cr0.org/paper/to-jt-linux-alsr-leak.pdf) esp and wchan were
also noted as possibly usable information leaks.  The
start_stack address also leaks potentially useful information.

Cc: Stable Team <stable@kernel.org>
Signed-off-by: Jake Edge <jake@lwn.net>
Acked-by: Arjan van de Ven <arjan@linux.intel.com>
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
14 years ago ath5k: fix buffer overrun in rate debug code
Bob Copeland [Tue, 28 Apr 2009 02:12:43 +0000 (22:12 -0400)]
ath5k: fix buffer overrun in rate debug code

commit b7fcb5c4a4c27da2f6d86cb03d18687e537442cf upstream.

char bname[5] is too small for the string "X GHz" when the null
terminator is taken into account.  Thus, turning on rate debugging
can crash unless we have lucky stack alignment.

Cc: stable@kernel.org
Reported-by: Paride Legovini <legovini@spiro.fisica.unipd.it>
Signed-off-by: Bob Copeland <me@bobcopeland.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
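A short sketch of the size arithmetic behind the ath5k commit above: the string "X GHz" has five visible characters, so it needs six bytes once the terminating NUL is counted. `BAND_NAME_LEN` and `format_band()` are hypothetical names used only for illustration, and the use of `snprintf()` is an assumption of this sketch, not a statement about what the driver calls.

```c
#include <stdio.h>

/* "X GHz" is five visible characters plus the terminating NUL. */
#define BAND_NAME_LEN sizeof("X GHz")

/* Formats a band name into buf; returns what snprintf() returns,
 * i.e. the length the full string would have had. */
static int format_band(char *buf, size_t len, int ghz)
{
    return snprintf(buf, len, "%d GHz", ghz);
}
```

A buffer declared as `char bname[BAND_NAME_LEN]` holds the whole string. With a five-byte buffer a bounded formatter truncates the text, whereas an unbounded write would run one byte past the end, which is the overrun the commit describes.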
14 years ago mv643xx_eth: OOM handling fixes
Lennert Buytenhek [Wed, 29 Apr 2009 11:57:34 +0000 (11:57 +0000)]
mv643xx_eth: OOM handling fixes

commit 1319ebadf185933e6b7ff95211d3cef9004e9754 upstream.

Currently, when OOM occurs during rx ring refill, mv643xx_eth will get
into an infinite loop, due to the refill function setting the OOM bit
but not clearing the 'rx refill needed' bit for this queue, while the
calling function (the NAPI poll handler) will call the refill function
in a loop until the 'rx refill needed' bit goes off, without checking
the OOM bit.

This patch fixes this by checking the OOM bit in the NAPI poll handler
before attempting to do rx refill.  This means that once OOM occurs,
we won't try to do any memory allocations again until the next invocation
of the poll handler.

While we're at it, change the OOM flag to be a single bit instead of
one bit per receive queue since OOM is a system state rather than a
per-queue state, and cancel the OOM timer on entry to the NAPI poll
handler if it's running to prevent it from firing when we've already
come out of OOM.

Signed-off-by: Lennert Buytenhek <buytenh@marvell.com>
Cc: stable@kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
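A toy model of the loop described in the mv643xx_eth OOM commit above, assuming a much simplified port structure. None of these names come from the driver: `struct port_sketch`, `rx_refill()`, and `poll_sketch()` are hypothetical, and the allocator is replaced by a plain counter. Clearing the flag on entry to the poll is also an assumption of this sketch.

```c
#include <stdbool.h>

struct port_sketch {
    unsigned int refill_needed; /* one bit per rx queue */
    bool oom;                   /* single flag, not one per queue */
    int free_buffers;           /* stand-in for the memory allocator */
};

/* Try to refill one queue; on allocation failure set the OOM flag and
 * leave the queue's 'refill needed' bit set. */
static void rx_refill(struct port_sketch *p, int queue)
{
    if (p->free_buffers == 0) {
        p->oom = true;
        return;
    }
    p->free_buffers--;
    p->refill_needed &= ~(1u << queue);
}

/* Returns the number of refill attempts made during this poll. */
static int poll_sketch(struct port_sketch *p)
{
    int attempts = 0;

    p->oom = false;             /* assumed: each poll starts afresh */
    /* Checking the OOM flag here is what ends the loop; without it the
     * loop would spin for as long as allocation keeps failing. */
    while (p->refill_needed && !p->oom) {
        int queue = 0;

        while (!(p->refill_needed & (1u << queue)))
            queue++;
        rx_refill(p, queue);
        attempts++;
    }
    return attempts;
}
```

With two queues needing a refill and only one buffer available, the poll makes two attempts and returns with the second queue's bit still set and the OOM flag raised, leaving the retry to a later poll.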
14 years ago mv643xx_eth: 64bit mib counter read fix
Lennert Buytenhek [Wed, 29 Apr 2009 11:58:18 +0000 (11:58 +0000)]
mv643xx_eth: 64bit mib counter read fix

commit 93af7aca44f0e82e67bda10a0fb73d383edcc8bd upstream.

On several mv643xx_eth hardware versions, the two 64bit mib counters
for 'good octets received' and 'good octets sent' are actually 32bit
counters, and reading from the upper half of the register has the same
effect as reading from the lower half of the register: an atomic
read-and-clear of the entire 32bit counter value.  This can under heavy
traffic occasionally lead to small numbers being added to the upper
half of the 64bit mib counter even though no 32bit wrap has occurred.

Since we poll the mib counters at least every 30 seconds anyway, we
might as well just skip the reads of the upper halves of the hardware
counters without breaking the stats, which this patch does.

Signed-off-by: Lennert Buytenhek <buytenh@marvell.com>
Cc: stable@kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
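A minimal sketch of the accumulation scheme the mib counter commit above relies on: read only the 32-bit read-and-clear value and add it to a 64-bit software total. The register is simulated by a plain variable, and `hw_octets_lo`, `mib_read_and_clear()`, and `mib_update()` are hypothetical names, not the driver's.

```c
#include <stdint.h>

/* Hypothetical stand-in for a 32-bit hardware counter that is cleared
 * by reading it. */
static uint32_t hw_octets_lo;

static uint32_t mib_read_and_clear(void)
{
    uint32_t v = hw_octets_lo;

    hw_octets_lo = 0;
    return v;
}

/* Add the latest 32-bit sample to the 64-bit software counter; the
 * upper half of the hardware register is never read. */
static void mib_update(uint64_t *total)
{
    *total += mib_read_and_clear();
}
```

As long as the counter is polled before it can wrap, which the commit says is covered by the polling interval of at most 30 seconds, the software total carries into its upper half on its own.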
14 years ago check_unsafe_exec: s/lock_task_sighand/rcu_read_lock/
Oleg Nesterov [Thu, 23 Apr 2009 23:02:45 +0000 (01:02 +0200)]
check_unsafe_exec: s/lock_task_sighand/rcu_read_lock/

commit 437f7fdb607f32b737e4da9f14bebcfdac2c90c3 upstream.

write_lock(&current->fs->lock) guarantees we can't wrongly miss
LSM_UNSAFE_SHARE, this is what we care about. Use rcu_read_lock()
instead of ->siglock to iterate over the sub-threads. We must see
all CLONE_THREAD|CLONE_FS threads which didn't pass exit_fs(), it
takes fs->lock too.

With or without this patch we can miss the freshly cloned thread
and set LSM_UNSAFE_SHARE, we don't care.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Roland McGrath <roland@redhat.com>
[ Fixed lock/unlock typo  - Hugh ]
Acked-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
14 years ago do_execve() must not clear fs->in_exec if it was set by another thread
Oleg Nesterov [Thu, 23 Apr 2009 23:01:56 +0000 (01:01 +0200)]
do_execve() must not clear fs->in_exec if it was set by another thread

commit 8c652f96d3852b97a49c331cd0bb02d22f3cb31b upstream.

If do_execve() fails after check_unsafe_exec(), it clears fs->in_exec
unconditionally. This is wrong if we race with our sub-thread which
also does do_execve:

Two threads T1 and T2 and another process P, all share the same
->fs.

T1 starts do_execve(BAD_FILE). It calls check_unsafe_exec(), since
->fs is shared, we set LSM_UNSAFE but not ->in_exec.

P exits and decrements fs->users.

T2 starts do_execve(), calls check_unsafe_exec(), now ->fs is not
shared, we set fs->in_exec.

T1 continues, open_exec(BAD_FILE) fails, we clear ->in_exec and
return to the user-space.

T1 does clone(CLONE_FS /* without CLONE_THREAD */).

T2 continues without LSM_UNSAFE_SHARE while ->fs is shared with
another process.

Change check_unsafe_exec() to return res = 1 if we set ->in_exec, and change
do_execve() to clear ->in_exec depending on res.

When do_execve() succeeds, it is safe to clear ->in_exec unconditionally.
It can be set only if we don't share ->fs with another process, and since
we already killed all sub-threads either ->in_exec == 0 or we are the
only user of this ->fs.

Also, we do not need fs->lock to clear fs->in_exec.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Roland McGrath <roland@redhat.com>
Acked-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
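A simplified sketch of the rule from the do_execve() commit above: only the caller that set ->in_exec may clear it on failure. This is a single-threaded model without any locking, not the kernel's code. `struct fs_sketch`, the `_sketch` functions, and `UNSAFE_SHARE_SKETCH` are hypothetical stand-ins; `n_fs` is assumed to be the number of sharers that belong to the caller's own thread group.

```c
#include <errno.h>

#define UNSAFE_SHARE_SKETCH 0x1u /* stand-in for LSM_UNSAFE_SHARE */

struct fs_sketch {
    int users;    /* tasks sharing this fs_struct */
    int in_exec;
};

/* Returns 1 if this call set in_exec, 0 if the fs is shared outside
 * the thread group (unsafe flag set instead), or -EAGAIN if another
 * exec is already in progress. */
static int check_unsafe_exec_sketch(struct fs_sketch *fs, int n_fs,
                                    unsigned int *unsafe)
{
    if (fs->users > n_fs) {
        *unsafe |= UNSAFE_SHARE_SKETCH;
        return 0;
    }
    if (fs->in_exec)
        return -EAGAIN;
    fs->in_exec = 1;
    return 1;
}

/* Failure path: undo only what this caller did. */
static void exec_failed_sketch(struct fs_sketch *fs, int res)
{
    if (res == 1)
        fs->in_exec = 0;
}
```

Replaying the T1/T2/P scenario from the commit message: T1's check returns 0, P exits, T2's check returns 1 and sets `in_exec`, and T1's failure path then leaves `in_exec` untouched because T1 never set it.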
14 years ago check_unsafe_exec() doesn't care about signal handlers sharing
Al Viro [Mon, 30 Mar 2009 11:35:18 +0000 (07:35 -0400)]
check_unsafe_exec() doesn't care about signal handlers sharing

commit f1191b50ec11c8e2ca766d6d99eb5bb9d2c084a3 upstream.

... since we'll unshare sighand anyway

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
14 years ago New locking/refcounting for fs_struct
Al Viro [Mon, 30 Mar 2009 11:20:30 +0000 (07:20 -0400)]
New locking/refcounting for fs_struct

commit 498052bba55ecaff58db6a1436b0e25bfd75a7ff upstream.

* all changes of current->fs are done under task_lock and write_lock of
  old fs->lock
* refcount is not atomic anymore (same protection)
* its decrements are done when removing reference from current; at the
  same time we decide whether to free it.
* put_fs_struct() is gone
* new field - ->in_exec.  Set by check_unsafe_exec() if we are trying to do
  execve() and only subthreads share fs_struct.  Cleared when finishing exec
  (success and failure alike).  Makes CLONE_FS fail with -EAGAIN if set.
* check_unsafe_exec() may fail with -EAGAIN if another execve() from subthread
  is in progress.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
14 years ago Take fs_struct handling to new file (fs/fs_struct.c)
Al Viro [Sun, 29 Mar 2009 23:00:13 +0000 (19:00 -0400)]
Take fs_struct handling to new file (fs/fs_struct.c)

commit 3e93cd671813e204c258f1e6c797959920cf7772 upstream.

Pure code move; two new helper functions for nfsd and daemonize
(unshare_fs_struct() and daemonize_fs_struct() resp.; for now -
the same code as used to be in callers).  unshare_fs_struct()
exported (for nfsd, as copy_fs_struct()/exit_fs() used to be),
copy_fs_struct() and exit_fs() don't need exports anymore.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
14 years ago Get rid of bumping fs_struct refcount in pivot_root(2)
Al Viro [Tue, 31 Mar 2009 00:36:33 +0000 (20:36 -0400)]
Get rid of bumping fs_struct refcount in pivot_root(2)

commit f8ef3ed2bebd2c4cb9ece92efa185d7aead8831a upstream.

Not because execve races with _that_ are serious - we really
need a situation when final drop of fs_struct refcount is
done by something that used to have it as current->fs.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
14 years ago Kill unsharing fs_struct in __set_personality()
Al Viro [Mon, 30 Mar 2009 09:45:36 +0000 (05:45 -0400)]
Kill unsharing fs_struct in __set_personality()

commit 11d06b2a1e5658f448a308aa3beb97bacd64a940 upstream.

That's a rudiment of altroot support.  I.e. it should've been buried
a long time ago.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
14 years ago Annotate struct fs_struct's usage count restriction
David Howells [Sat, 28 Mar 2009 23:23:01 +0000 (23:23 +0000)]
Annotate struct fs_struct's usage count restriction

commit 795e2fe0a3b69dbc040d7efcf517e0cbad6901d0 upstream.

Annotate struct fs_struct's usage count to indicate the restrictions upon it.
It may not be incremented, except by clone(CLONE_FS), as this affects the
check in check_unsafe_exec() in fs/exec.c.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
14 years ago fix setuid sometimes wouldn't
Hugh Dickins [Sat, 28 Mar 2009 23:21:27 +0000 (23:21 +0000)]
fix setuid sometimes wouldn't

commit 7c2c7d993044cddc5010f6f429b100c63bc7dffb upstream.

check_unsafe_exec() also notes whether the fs_struct is being
shared by more threads than will get killed by the exec, and if so
sets LSM_UNSAFE_SHARE to make bprm_set_creds() careful about euid.
But /proc/<pid>/cwd and /proc/<pid>/root lookups make transient
use of get_fs_struct(), which also raises that sharing count.

This might occasionally cause a setuid program not to change euid,
in the same way as happened with files->count (check_unsafe_exec
also looks at sighand->count, but /proc doesn't raise that one).

We'd prefer exec not to unshare fs_struct: so fix this in procfs,
replacing get_fs_struct() by get_fs_path(), which does path_get
while still holding task_lock, instead of raising fs->count.

Signed-off-by: Hugh Dickins <hugh@veritas.com>
Cc: stable@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
14 years ago fix setuid sometimes doesn't
Hugh Dickins [Sat, 28 Mar 2009 23:20:19 +0000 (23:20 +0000)]
fix setuid sometimes doesn't

commit e426b64c412aaa3e9eb3e4b261dc5be0d5a83e78 upstream.

Joe Malicki reports that setuid sometimes doesn't: very rarely,
a setuid root program does not get root euid; and, by the way,
they have a health check running lsof every few minutes.

Right, check_unsafe_exec() notes whether the files_struct is being
shared by more threads than will get killed by the exec, and if so
sets LSM_UNSAFE_SHARE to make bprm_set_creds() careful about euid.
But /proc/<pid>/fd and /proc/<pid>/fdinfo lookups make transient
use of get_files_struct(), which also raises that sharing count.

There's a rather simple fix for this: exec's check on files->count
has been redundant ever since 2.6.1 made it unshare_files() (except
while compat_do_execve() omitted to do so) - just remove that check.

[Note to -stable: this patch will not apply before 2.6.29: earlier
releases should just remove the files->count line from unsafe_exec().]

Reported-by: Joe Malicki <jmalicki@metacarta.com>
Narrowed-down-by: Michael Itz <mitz@metacarta.com>
Tested-by: Joe Malicki <jmalicki@metacarta.com>
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
14 years ago compat_do_execve should unshare_files
Hugh Dickins [Sat, 28 Mar 2009 23:16:03 +0000 (23:16 +0000)]
compat_do_execve should unshare_files

commit 53e9309e01277ec99c38e84e0ca16921287cf470 upstream.

2.6.26's commit fd8328be874f4190a811c58cd4778ec2c74d2c05
"sanitize handling of shared descriptor tables in failing execve()"
moved the unshare_files() from flush_old_exec() and several binfmts
to the head of do_execve(); but forgot to make the same change to
compat_do_execve(), leaving a CLONE_FILES files_struct shared across
exec from a 32-bit process on a 64-bit kernel.

It's arguable whether the files_struct really ought to be unshared
across exec; but 2.6.1 made that so to stop the loading binary's fd
leaking into other threads, and a 32-bit process on a 64-bit kernel
ought to behave in the same way as 32 on 32 and 64 on 64.

Signed-off-by: Hugh Dickins <hugh@veritas.com>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
14 years ago powerpc: Sanitize stack pointer in signal handling code
Josh Boyer [Tue, 28 Apr 2009 15:15:59 +0000 (11:15 -0400)]
powerpc: Sanitize stack pointer in signal handling code

This has been backported to 2.6.29.x from commit efbda86098 in Linus' tree

On powerpc64 machines running 32-bit userspace, we can get garbage bits in the
stack pointer passed into the kernel.  Most places handle this correctly, but
the signal handling code uses the passed value directly for allocating signal
stack frames.

This fixes the issue by introducing a get_clean_sp function that returns a
sanitized stack pointer.  For 32-bit tasks on a 64-bit kernel, the stack
pointer is masked correctly.  In all other cases, the stack pointer is simply
returned.

Additionally, we pass an 'is_32' parameter to get_sigframe now in order to
get the properly sanitized stack.  The callers are known to be 32 or 64-bit
statically.

Signed-off-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
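A minimal sketch of the masking the powerpc commit above describes, written as a plain function over integers rather than over the register structure the kernel uses. `get_clean_sp_sketch()` and its parameters are hypothetical. It only shows that a 32-bit task's stack pointer is reduced to its low 32 bits and everything else is returned unchanged.

```c
#include <stdint.h>

/* For a 32-bit task on a 64-bit kernel, discard the upper 32 bits of
 * the stack pointer; otherwise return it as-is. */
static uint64_t get_clean_sp_sketch(uint64_t sp, int is_32)
{
    if (is_32)
        return sp & 0xffffffffULL;
    return sp;
}
```

Because the callers know statically whether they handle 32-bit or 64-bit frames, the `is_32` argument can be a constant at each call site.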
14 years ago ACPI: Revert conflicting workaround for BIOS w/ mangled PRT entries
Zhang Rui [Fri, 1 May 2009 15:12:41 +0000 (11:12 -0400)]
ACPI: Revert conflicting workaround for BIOS w/ mangled PRT entries

upstream 82babbb3887e234c995626e4121d411ea9070ca5
backported to 2.6.29.2

2f894ef9c8b36a35d80709bedca276d2fc691941
in Linux-2.6.21 worked around BIOS with mangled _PRT entries:
http://bugzilla.kernel.org/show_bug.cgi?id=6859

d0e184abc5983281ef189db2c759d65d56eb1b80
worked around the same issue via ACPICA, and shipped in 2.6.27.

Unfortunately the two workarounds conflict:
http://bugzilla.kernel.org/show_bug.cgi?id=12270

So revert the Linux specific one.

Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
14 years ago USB: serial: fix lifetime and locking problems
Alan Stern [Mon, 4 May 2009 15:30:32 +0000 (11:30 -0400)]
USB: serial: fix lifetime and locking problems

This is commit 2d93148ab6988cad872e65d694c95e8944e1b626 back-ported to
2.6.29.

This patch (as1229-3) fixes a few lifetime and locking problems in the
usb-serial driver.  The main symptom is that an invalid kevent is
created when the serial device is unplugged while a connection is
active.

Ports should be unregistered when device is disconnected,
not when the parent usb_serial structure is deallocated.

Each open file should hold a reference to the corresponding
port structure, and the reference should be released when
the file is closed.

serial->disc_mutex should be acquired in serial_open(), to
resolve the classic race between open and disconnect.

serial_close() doesn't need to hold both serial->disc_mutex
and port->mutex at the same time.

Release the subdriver's module reference only after releasing
all the other references, in case one of the release routines
needs to invoke some code in the subdriver module.

Replace a call to flush_scheduled_work() (which is prone to
deadlocks) with cancel_work_sync().  Also, add a call to
cancel_work_sync() in the disconnect routine.

Reduce the scope of serial->disc_mutex in serial_disconnect().
The only place it really needs to protect is where the
"disconnected" flag is set.

Call the shutdown method from within serial_disconnect()
instead of destroy_serial(), because some subdrivers expect
the port data structures still to be in existence when
their shutdown method runs.

This fixes the bug reported in

http://bugs.freedesktop.org/show_bug.cgi?id=20703

Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
14 years ago ptrace: ptrace_attach: fix the usage of ->cred_exec_mutex
Oleg Nesterov [Sun, 26 Apr 2009 23:41:34 +0000 (01:41 +0200)]
ptrace: ptrace_attach: fix the usage of ->cred_exec_mutex

commit cad81bc2529ab8c62b6fdc83a1c0c7f4a87209eb upstream.

ptrace_attach() needs task->cred_exec_mutex, not current->cred_exec_mutex.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Roland McGrath <roland@redhat.com>
Acked-by: David Howells <dhowells@redhat.com>
Signed-off-by: James Morris <jmorris@namei.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
14 years ago kbuild: fix Module.markers permission error under cygwin
Cedric Hombourger [Sat, 25 Apr 2009 07:38:21 +0000 (09:38 +0200)]
kbuild: fix Module.markers permission error under cygwin

commit 99e3a1eb3c22bb671c6f3d22d8244bfc9fad8185 upstream.

While building the kernel, we end up calling modpost with -K and -M
options for the same file (Module.markers).  This results in
modpost's main function calling read_markers() and then write_markers() on
the same file.

We then have read_markers() mmap'ing the file, and write_markers()
opening that same file for writing.

The issue is that read_markers() exits without munmap'ing the file and is,
as a matter of fact, holding a reference on Module.markers.  When write_markers()
is opening that very same file for writing, we still have a reference on
it and cygwin (Windows?) is then making fopen() fail with EPERM.

Calling release_file() before exiting read_markers() clears that reference
(and memory leak) and fopen() then succeeds.

Tested on both cygwin (1.3.22) and Linux.  Also ran modpost within
valgrind on Linux to make sure that the munmap'ed file was not accessed
after read_markers()

Signed-off-by: Cedric Hombourger <chombourger@gmail.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
14 years ago pagemap: require aligned-length, non-null reads of /proc/pid/pagemap
Vitaly Mayatskikh [Thu, 30 Apr 2009 22:08:18 +0000 (15:08 -0700)]
pagemap: require aligned-length, non-null reads of /proc/pid/pagemap

commit 0816178638c15ce5472d39d771a96860dff4141a upstream.

The intention of commit aae8679b0ebcaa92f99c1c3cb0cd651594a43915
("pagemap: fix bug in add_to_pagemap, require aligned-length reads of
/proc/pid/pagemap") was to force reads of /proc/pid/pagemap to be a
multiple of 8 bytes, but it now allows a read of 0 bytes, which actually
puts some data into the user's buffer.  According to POSIX, if count is zero,
read() should return zero and have no other results.

Signed-off-by: Vitaly Mayatskikh <v.mayatskih@gmail.com>
Cc: Thomas Tuttle <ttuttle@google.com>
Acked-by: Matt Mackall <mpm@selenic.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
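A small sketch of the length checks the pagemap commit above asks for: reject lengths that are not a multiple of 8 bytes, and treat a zero-length read as a successful no-op. `pagemap_precheck()` and `ENTRY_BYTES` are hypothetical names, and returning `-EINVAL` for a misaligned length is an assumption of this sketch since the commit text does not name the error code.

```c
#include <errno.h>
#include <stddef.h>

#define ENTRY_BYTES 8 /* reads must be a multiple of 8 bytes */

/* Returns a negative errno for a bad length, 0 when the read should
 * return 0 without touching the user's buffer, and 1 when the read
 * may proceed. */
static int pagemap_precheck(size_t count)
{
    if (count % ENTRY_BYTES)
        return -EINVAL;
    if (count == 0)
        return 0;
    return 1;
}
```

The zero case has to be handled explicitly because 0 is itself a multiple of 8 and therefore passes the alignment test on its own.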
14 years ago drm/i915: allow tiled front buffers on 965+
Jesse Barnes [Tue, 14 Apr 2009 21:17:47 +0000 (14:17 -0700)]
drm/i915: allow tiled front buffers on 965+

commit f544847fbaf099278343f875987a983f2b913134 upstream.

This patch corrects a pretty big oversight in the KMS code for 965+
chips.  The current code is missing tiled surface register programming,
so userland can allocate a tiled surface and use it for mode setting,
resulting in corruption.  This patch fixes that, allowing for tiled
front buffers on 965+.

Cc: stable@kernel.org
Tested-by: Arkadiusz Miskiewicz <arekm@maven.pl>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
14 years ago bio: fix memcpy corruption in bio_copy_user_iov()
FUJITA Tomonori [Tue, 28 Apr 2009 18:24:29 +0000 (20:24 +0200)]
bio: fix memcpy corruption in bio_copy_user_iov()

commit 69838727bcd819a8fd73a88447801221788b0c6d upstream.

st driver uses blk_rq_map_user() in order to just build a request out
of page frames. In this case, map_data->offset is a non zero value and
iov[0].iov_base is NULL. We need to increase nr_pages for that.

Cc: stable@kernel.org
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
14 years ago PCI quirk: disable MSI on VIA VT3364 chipsets
Thomas Renninger [Fri, 3 Apr 2009 13:34:00 +0000 (06:34 -0700)]
PCI quirk: disable MSI on VIA VT3364 chipsets

commit 162dedd39dcc6eca3fc0d29cf19658c6c13b840e upstream.

Without this patch, Broadcom BCM5906 Ethernet controllers set up via MSI
cause the machine to hang.  Tejun agreed that the best approach is to
blacklist the whole chipset. Given that the other VIA quirks also disable
MSI, this looks very much like the right way.

Cc: <stable@kernel.org>
Signed-off-by: Thomas Renninger <trenn@suse.de>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
14 years ago ASoC: Fix offset of freqmode in WM8580 PLL configuration
Mark Brown [Tue, 21 Apr 2009 11:35:15 +0000 (12:35 +0100)]
ASoC: Fix offset of freqmode in WM8580 PLL configuration

commit ce88168f5b5eca7f40394fa6b05ae29f4b685569 upstream.

Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
14 years ago x86/PCI: don't call e820_all_mapped with -1 in the mmconfig case
Yinghai Lu [Sat, 18 Apr 2009 08:43:46 +0000 (01:43 -0700)]
x86/PCI: don't call e820_all_mapped with -1 in the mmconfig case

commit 044cd80942e47b9de0915b627902adf05c52377f upstream.

e820_all_mapped() needs end to be (addr + size) instead of (addr + size - 1).

Cc: stable@kernel.org
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
14 years ago x86-64: fix FPU corruption with signals and preemption
Suresh Siddha [Thu, 9 Apr 2009 22:24:34 +0000 (15:24 -0700)]
x86-64: fix FPU corruption with signals and preemption

commit 06c38d5e36b12d040839ff224e805146c0368556 upstream.

In 64bit signal delivery path, clear_used_math() was happening before saving
the current active FPU state on to the user stack for signal handling. Between
clear_used_math() and the state store on to the user stack, potentially we
can get a page fault for the user address and can block. In fact, while testing
we were hitting the might_fault() in __clear_user() which can do a schedule().

At a later point in time, we will schedule back into this process and
resume the save state (using "xsave/fxsave" instruction) which can lead
to DNA fault. And as used_math was cleared before, we will reinit the FP state
in the DNA fault and continue. This reinit will result in losing the
FPU state of the process.

Move clear_used_math() to a point after the FPU state has been stored
onto the user stack.

This issue has been present for a long time (even before the xsave changes
and the x86 merge). But it can easily be exposed in 2.6.28.x and 2.6.29.x
series because of the __clear_user() in this path, which has an explicit
__cond_resched() leading to a context switch with CONFIG_PREEMPT_VOLUNTARY.

[ Impact: fix FPU state corruption ]

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
14 years ago drm/i915: add support for G41 chipset
Zhenyu Wang [Mon, 17 Nov 2008 05:58:11 +0000 (13:58 +0800)]
drm/i915: add support for G41 chipset

commit 72021788678523047161e97b3dfed695e802a5fd upstream.

This had been delayed for some time due to failure to work on the one piece
of G41 hardware we had, and lack of success reports from anybody else.
Current hardware appears to be OK.

Signed-off-by: Zhenyu Wang <zhenyu.z.wang@intel.com>
[anholt: hand-applied due to conflicts with IGD patches]
Signed-off-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
14 years ago unreached code in selinux_ip_postroute_iptables_compat() (CVE-2009-1184)
Eugene Teo [Mon, 13 Apr 2009 02:04:41 +0000 (10:04 +0800)]
unreached code in selinux_ip_postroute_iptables_compat() (CVE-2009-1184)

Not upstream in 2.6.30, as the function was removed there, making this a
non-issue.

Node and port send checks can be skipped in the compat_net=1 case. This bug
was introduced in commit effad8d.

Signed-off-by: Eugene Teo <eugeneteo@kernel.sg>
Reported-by: Dan Carpenter <error27@gmail.com>
Acked-by: James Morris <jmorris@namei.org>
Acked-by: Paul Moore <paul.moore@hp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
14 years ago PCI: fix incorrect mask of PM No_Soft_Reset bit
Yu Zhao [Wed, 25 Feb 2009 05:15:52 +0000 (13:15 +0800)]
PCI: fix incorrect mask of PM No_Soft_Reset bit

commit 998dd7c719f62dcfa91d7bf7f4eb9c160e03d817 upstream.

Reviewed-by: Matthew Wilcox <matthew@wil.cx>
Signed-off-by: Yu Zhao <yu.zhao@intel.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
14 years ago exit_notify: kill the wrong capable(CAP_KILL) check (CVE-2009-1337)
Oleg Nesterov [Mon, 6 Apr 2009 14:16:02 +0000 (16:16 +0200)]
exit_notify: kill the wrong capable(CAP_KILL) check (CVE-2009-1337)

CVE-2009-1337

commit 432870dab85a2f69dc417022646cb9a70acf7f94 upstream.

The CAP_KILL check in exit_notify() looks just wrong, kill it.

Whatever logic we have to reset ->exit_signal, the malicious user
can bypass it if it execs the setuid application before exiting.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Acked-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
14 years ago crypto: ixp4xx - Fix handling of chained sg buffers
Christian Hohnstaedt [Fri, 27 Mar 2009 07:09:05 +0000 (15:09 +0800)]
crypto: ixp4xx - Fix handling of chained sg buffers

commit 0d44dc59b2b434b29aafeae581d06f81efac7c83 upstream.

 - keep dma functions away from chained scatterlists.
   Use the existing scatterlist iteration inside the driver
   to call dma_map_single() for each chunk and avoid dma_map_sg().

Signed-off-by: Christian Hohnstaedt <chohnstaedt@innominate.com>
Tested-By: Karl Hiramoto <karl@hiramoto.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
14 years ago block: include empty disks in /proc/diskstats
Tejun Heo [Fri, 17 Apr 2009 06:34:48 +0000 (08:34 +0200)]
block: include empty disks in /proc/diskstats

commit 71982a409f12c50d011325a4471aa20666bb908d upstream.

/proc/diskstats used to show stats for all disks whether they're
zero-sized or not and their non-zero partitions.  Commit
074a7aca7afa6f230104e8e65eba3420263714a5 accidentally changed the
behavior such that it doesn't print out zero sized disks.  This patch
implements DISK_PITER_INCL_EMPTY_PART0 flag to partition iterator and
uses it in diskstats_show() such that empty part0 is shown in
/proc/diskstats.

Reported and bisected by Daniel Collins.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Daniel Collins <solemnwarning@solemnwarning.no-ip.org>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
14 years ago b44: Use kernel DMA addresses for the kernel DMA API
Michael Buesch [Mon, 6 Apr 2009 09:52:27 +0000 (09:52 +0000)]
b44: Use kernel DMA addresses for the kernel DMA API

commit 37efa239901493694a48f1d6f59f8de17c2c4509 upstream.

We must not use the device DMA addresses for the kernel DMA API, because
device DMA addresses have an additional offset added for the SSB translation.

Use the original dma_addr_t for the sync operation.

Cc: stable@kernel.org
Signed-off-by: Michael Buesch <mb@bu3sch.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
14 years ago virtio-rng: Remove false BUG for spurious callbacks
Christian Borntraeger [Fri, 24 Apr 2009 22:35:03 +0000 (22:35 +0000)]
virtio-rng: Remove false BUG for spurious callbacks

upstream commit: e5b89542ea18020961882228c26db3ba87f6e608

The virtio-rng driver checks for spurious callbacks. Since
callbacks can be implemented via shared interrupts (e.g. PCI) this
could lead to guest kernel oopses with lots of virtio devices.

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
14 years ago USB: Unusual Device support for Gold MP3 Player Energy
Chuck Short [Fri, 24 Apr 2009 16:05:04 +0000 (16:05 +0000)]
USB: Unusual Device support for Gold MP3 Player Energy

upstream commit: 46c6e93faa85d1362e1d127dc28cf9d0b304a6f1

Reported by Alessio Treglia on
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/125250

User was getting the following errors in dmesg:

[ 2158.139386] sd 5:0:0:1: ioctl_internal_command return code = 8000002
[ 2158.139390] : Current: sense key: No Sense
[ 2158.139393] Additional sense: No additional sense information

Adds unusual device support.

modified:   drivers/usb/storage/unusual_devs.h

Signed-off-by: Chuck Short <zulcss@ubuntu.com>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
Cc: stable <stable@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
14 years ago KVM: x86: release time_page on vcpu destruction
Joerg Roedel [Fri, 24 Apr 2009 16:05:07 +0000 (16:05 +0000)]
KVM: x86: release time_page on vcpu destruction

upstream commit: 7f1ea208968f021943d4103ba59e06bb6d8239cb

Not releasing the time_page causes a leak of that page or the compound
page it is situated in.

Cc: stable@kernel.org
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
14 years ago KVM: Fix overlapping check for memory slots
Jan Kiszka [Fri, 24 Apr 2009 16:05:09 +0000 (16:05 +0000)]
KVM: Fix overlapping check for memory slots

upstream commit: 4cd481f68dde99ac416003b825c835f71e364393

When checking for overlapping slots on registration of a new one, kvm
currently also considers zero-length (ie. deleted) slots and rejects
requests incorrectly. This ultimately prevents user space from joining slots.
Fix the check by skipping deleted slots and advertise this via
KVM_CAP_JOIN_MEMORY_REGIONS_WORKS.
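
A minimal sketch of an overlap check that ignores deleted slots is shown
below. It is not the KVM implementation: struct mem_slot, slot_overlaps()
and the 'self' index convention are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified memory slot; npages == 0 marks a deleted slot. */
struct mem_slot {
    uint64_t base_gfn;
    uint64_t npages;
};

/*
 * Returns true if new_slot overlaps any live slot other than slots[self].
 * Pass self == nslots when new_slot does not replace an existing entry.
 */
bool slot_overlaps(const struct mem_slot *slots, size_t nslots,
                   size_t self, const struct mem_slot *new_slot)
{
    assert(new_slot != NULL);
    if (new_slot->npages == 0)
        return false; /* a deletion request cannot overlap anything */

    for (size_t i = 0; i < nslots; i++) {
        const struct mem_slot *s = &slots[i];

        if (i == self || s->npages == 0)
            continue; /* skip the slot being replaced and deleted slots */
        if (new_slot->base_gfn < s->base_gfn + s->npages &&
            s->base_gfn < new_slot->base_gfn + new_slot->npages)
            return true;
    }
    return false;
}
```

Without the npages == 0 test, a deleted slot whose stale base_gfn falls
inside the new range would be reported as a conflict.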

Cc: stable@kernel.org
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
14 years ago KVM: MMU: disable global page optimization
Marcelo Tosatti [Fri, 24 Apr 2009 21:18:27 +0000 (18:18 -0300)]
KVM: MMU: disable global page optimization

upstream commit: bf47a760f66add7870fba33ab50f58b550d6bbd1

The complexity of fixing it is not worth the gains, as discussed
in http://article.gmane.org/gmane.comp.emulators.kvm.devel/28649.

Cc: stable@kernel.org
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
[mtosatti: backport to 2.6.29]
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
14 years ago KVM: MMU: Fix off-by-one calculating large page count
Avi Kivity [Fri, 24 Apr 2009 16:05:14 +0000 (16:05 +0000)]
KVM: MMU: Fix off-by-one calculating large page count

upstream commit: 99894a799f09cf9e28296bb16e75bd5830fd2c4e

The large page initialization code concludes there are two large pages spanned
by a slot covering 1 (small) page starting at gfn 1.  This is incorrect, and
also results in incorrect write_count initialization in some cases (base = 1,
npages = 513 for example).
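
The count of large pages spanned by a slot can be worked through with a
small sketch. The function name largepages_spanned() and the ratio of 512
small pages per large page are assumptions for this example (they match
4 KiB pages with 2 MiB large pages); the code is not taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed ratio of small pages per large page. */
#define PAGES_PER_HPAGE 512u

/*
 * Number of large-page frames touched by the gfn range
 * [base_gfn, base_gfn + npages). Requires npages > 0.
 */
uint64_t largepages_spanned(uint64_t base_gfn, uint64_t npages)
{
    assert(npages > 0);

    uint64_t first = base_gfn / PAGES_PER_HPAGE;
    uint64_t last = (base_gfn + npages - 1) / PAGES_PER_HPAGE;

    return last - first + 1;
}
```

Computing the index of the first and last large page avoids counting a
partial page twice: a one-page slot at gfn 1 spans one large page, and
base = 1 with npages = 513 spans two.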

Cc: stable@kernel.org
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
14 years ago mac80211: fix basic rate bitmap calculation
Johannes Berg [Fri, 24 Apr 2009 16:05:16 +0000 (16:05 +0000)]
mac80211: fix basic rate bitmap calculation

upstream commit: 7e0986c17f695952ce5d61ed793ce048ba90a661

"mac80211: fix basic rates setting from association response"
introduced a copy/paste error.

Unfortunately, this not only leads to wrong data being passed
to the driver but is also remotely exploitable for some hardware or
driver combinations.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
14 years ago ALSA: us122l: add snd_us122l_free()
Karsten Wiese [Fri, 24 Apr 2009 16:05:19 +0000 (16:05 +0000)]
ALSA: us122l: add snd_us122l_free()

upstream commit: 5d4af1be06affa2b42cdf59cd376752be1f934b3

Use it to clean up snd_us122l_card_used[].

Without this patch, unplugging a US122L soundcard did not reset the
corresponding element of snd_us122l_card_used[] to 0.
As a result, the (SNDRV_CARDS + 1)th plug-in no longer created the soundcard
device, and index values supplied on the modprobe command line were no longer
used correctly after the first unplugging of a US122L.

Signed-off-by: Karsten Wiese <fzu@wemgehoertderstaat.de>
Cc: stable@kernel.org
Signed-off-by: Takashi Iwai <tiwai@suse.de>
[chrisw: backport to 2.6.29]
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
14 years ago thinkpad-acpi: fix LED blinking through timer trigger
Henrique de Moraes Holschuh [Fri, 24 Apr 2009 16:05:21 +0000 (16:05 +0000)]
thinkpad-acpi: fix LED blinking through timer trigger

upstream commit: 75bd3bf2ade9d548be0d2bde60b5ee0fdce0b127

The set_blink hook code in the LED subdriver would never manage to get
a LED to blink, and instead it would just turn it on.  The consequence
of this is that the "timer" trigger would not cause the LED to blink
if given default parameters.

This problem exists since 2.6.26-rc1.

To fix it, switch the deferred LED work handling to use the
thinkpad-acpi-specific LED status (off/on/blink) directly.

This also makes the code easier to read, and to extend later.

Signed-off-by: Henrique de Moraes Holschuh <hmh@hmh.eng.br>
Cc: stable@kernel.org
Signed-off-by: Len Brown <len.brown@intel.com>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
14 years ago b43: Refresh RX poison on buffer recycling
Michael Buesch [Fri, 24 Apr 2009 16:05:29 +0000 (16:05 +0000)]
b43: Refresh RX poison on buffer recycling

upstream commit: cf68636a9773aa97915497fe54fa4a51e3f08f3a

The RX buffer poison needs to be refreshed, if we recycle an RX buffer,
because it might be (partially) overwritten by some DMA operations.

Cc: stable@kernel.org
Cc: Francesco Gringoli <francesco.gringoli@ing.unibs.it>
Signed-off-by: Michael Buesch <mb@bu3sch.de>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
14 years ago b43: Poison RX buffers
Michael Buesch [Fri, 24 Apr 2009 16:05:31 +0000 (16:05 +0000)]
b43: Poison RX buffers

upstream commit: ec9a1d8c13e36440eda0f3c79b8149080e3ab5ba

This patch adds poisoning and sanity checking to the RX DMA buffers.
This is used for protection against buggy hardware/firmware that raises
RX interrupts without doing an actual DMA transfer.

This mechanism protects against rare "bad packets" (due to uninitialized skb data)
and rare kernel crashes due to uninitialized RX headers.

The poison is selected to not match on valid frames and to be cheap for checking.

The poison check mechanism _might_ trigger incorrectly, if we are voluntarily
receiving frames with bad PLCP headers. However, this is nonfatal, because the
chance of such a match is basically zero and in case it happens it just results
in dropping the packet.
Bad-PLCP RX defaults to off, and you should leave it off unless you want to listen
to the latest news broadcasted by your microwave oven.

This patch also moves the initialization of the RX-header "length" field in front of
the mapping of the DMA buffer. The CPU should not touch the buffer after we mapped it.
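
The poisoning idea can be sketched in a few lines. This is only an
illustration under stated assumptions: RX_POISON_BYTE, RX_HEADER_LEN and
both function names are hypothetical, and the real driver's pattern and
header layout may differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical values chosen for the example only. */
#define RX_POISON_BYTE 0xFF
#define RX_HEADER_LEN  8

/* Fill the header area with the poison pattern before the device may write to it. */
void rx_poison(uint8_t *buf)
{
    assert(buf != NULL);
    memset(buf, RX_POISON_BYTE, RX_HEADER_LEN);
}

/* Returns true if the header is still fully poisoned, i.e. no data arrived. */
bool rx_is_poisoned(const uint8_t *buf)
{
    assert(buf != NULL);
    for (size_t i = 0; i < RX_HEADER_LEN; i++)
        if (buf[i] != RX_POISON_BYTE)
            return false;
    return true;
}
```

A receive path built this way would drop a buffer that is still poisoned
when the interrupt fires, and would call rx_poison() again whenever a
buffer is recycled.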

Cc: stable@kernel.org
Reported-by: Francesco Gringoli <francesco.gringoli@ing.unibs.it>
Signed-off-by: Michael Buesch <mb@bu3sch.de>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
14 years ago mac80211: Fix bug in getting rx status for frames pending in reorder buffer
Vasanthakumar Thiagarajan [Fri, 24 Apr 2009 16:05:33 +0000 (16:05 +0000)]
mac80211: Fix bug in getting rx status for frames pending in reorder buffer

upstream commit: b3631286aca3f54427ca0eb950981e9753866f6c

Currently rx status for frames which are completed from reorder buffer
is taken from its cb area, which is not always right: cb does not hold
the rx status when the driver uses mac80211's non-irq rx handler to pass its
received frames. This results in dropping almost all frames from reorder
buffer when security is enabled by doing double decryption (first in hw,
second in sw because of wrong rx status). This patch copies rx status into
cb area before the frame is put into reorder buffer. After this patch,
there is a significant improvement in throughput with ath9k + WPA2(AES).

Signed-off-by: Vasanthakumar Thiagarajan <vasanth@atheros.com>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Cc: stable@kernel.org
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
14 years ago forcedeth: Fix resume from hibernation regression.
Ed Swierk [Tue, 7 Apr 2009 00:49:12 +0000 (17:49 -0700)]
forcedeth: Fix resume from hibernation regression.

upstream commit: 35a7433c789ba6df6d96b70fa745ae9e6cac0038

Reset phy state on resume, fixing a regression caused by powering down
the phy on hibernate.

Signed-off-by: Ed Swierk <eswierk@aristanetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Cc: Tvrtko Ursulin <tvrtko.ursulin@sophos.com>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
15 years ago Linux 2.6.29.2 v2.6.29.2
Chris Wright [Mon, 27 Apr 2009 17:37:11 +0000 (10:37 -0700)]
Linux 2.6.29.2

15 years ago Bonding: fix zero address hole bug in arp_ip_target list
Brian Haley [Mon, 13 Apr 2009 07:11:30 +0000 (00:11 -0700)]
Bonding: fix zero address hole bug in arp_ip_target list

upstream commit: 5a31bec014449dc9ca994e4c1dbf2802b7ca458a

Fix a zero address hole bug in the bonding arp_ip_target list
that was causing the bond to ignore ARP replies (bugz 13006).
Instead of just setting the array entry to zero, we now
copy any additional entries down one slot, putting the
zero entry at the end.  With this change we can now have
all the loops that walk the array stop when they hit a zero
since there will be no addresses after it.
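
The shift-down removal described above can be sketched as follows. The
names remove_target() and MAX_TARGETS, and the table size of 16, are
assumptions for this example and are not taken from the bonding driver.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical table size; the commit message does not state it. */
#define MAX_TARGETS 16

/*
 * Remove addr from an array whose used entries are followed only by zeros.
 * Later entries are copied down one slot and the zero is written at the end,
 * so no zero "hole" is left in the middle.
 * Returns 0 on success, -1 if addr is 0 or not present.
 */
int remove_target(uint32_t targets[MAX_TARGETS], uint32_t addr)
{
    size_t i;

    assert(targets != NULL);
    if (addr == 0)
        return -1;

    /* Find the entry; stop at the first zero since nothing follows it. */
    for (i = 0; i < MAX_TARGETS && targets[i] != 0; i++)
        if (targets[i] == addr)
            break;
    if (i == MAX_TARGETS || targets[i] == 0)
        return -1;

    /* Shift the remaining entries down and terminate with a zero. */
    for (; i + 1 < MAX_TARGETS && targets[i + 1] != 0; i++)
        targets[i] = targets[i + 1];
    targets[i] = 0;
    return 0;
}
```

Because the zero always ends up after the last valid address, every loop
that walks the array can stop at the first zero entry.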

Changes are based in part on a code fragment provided in kernel
bugzilla 13006:

http://bugzilla.kernel.org/show_bug.cgi?id=13006

by Steve Howard <steve@astutenetworks.com>

Signed-off-by: Brian Haley <brian.haley@hp.com>
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
15 years ago skge: fix occasional BUG during MTU change
Michal Schmidt [Tue, 14 Apr 2009 22:16:55 +0000 (15:16 -0700)]
skge: fix occasional BUG during MTU change

upstream commit: d119b3927994e3d620d6adb0dd1ea6bf24427875

The BUG_ON(skge->tx_ring.to_use != skge->tx_ring.to_clean) in skge_up()
was sometimes observed when setting MTU.

skge_down() disables the TX queue, but then reenables it by mistake via
skge_tx_clean().
Fix it by moving the waking of the queue from skge_tx_clean() to the
other caller. And to make sure start_xmit is not in progress on another
CPU, skge_down() should call netif_tx_disable().

The bug was reported to me by Jiri Jilek whose Debian system sometimes
failed to boot. He tested the patch and the bug did not happen anymore.

Signed-off-by: Michal Schmidt <mschmidt@redhat.com>
Acked-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
15 years ago scsi: mpt: suppress debugobjects warning
Eric Paris [Tue, 21 Apr 2009 21:20:02 +0000 (21:20 +0000)]
scsi: mpt: suppress debugobjects warning

upstream commit: b298cecb3deddf76d60022473a57f1cb776cbdcd

Addresses http://bugzilla.kernel.org/show_bug.cgi?id=13133

ODEBUG: object is on stack, but not annotated
------------[ cut here ]------------
WARNING: at lib/debugobjects.c:253 __debug_object_init+0x1f3/0x276()
Hardware name: VMware Virtual Platform
Modules linked in: mptspi(+) mptscsih mptbase scsi_transport_spi ext3 jbd mbcache
Pid: 540, comm: insmod Not tainted 2.6.28-mm1 #2
Call Trace:
 [<c042c51c>] warn_slowpath+0x74/0x8a
 [<c0469600>] ? start_critical_timing+0x96/0xb7
 [<c060c8ea>] ? _spin_unlock_irqrestore+0x2f/0x3c
 [<c0446fad>] ? trace_hardirqs_off_caller+0x18/0xaf
 [<c044704f>] ? trace_hardirqs_off+0xb/0xd
 [<c060c8ea>] ? _spin_unlock_irqrestore+0x2f/0x3c
 [<c042cb84>] ? release_console_sem+0x1a5/0x1ad
 [<c05013e6>] __debug_object_init+0x1f3/0x276
 [<c0501494>] debug_object_init+0x13/0x17
 [<c0433c56>] init_timer+0x10/0x1a
 [<e08e5b54>] mpt_config+0x1c1/0x2b7 [mptbase]
 [<e08e3b82>] ? kmalloc+0x8/0xa [mptbase]
 [<e08e3b82>] ? kmalloc+0x8/0xa [mptbase]
 [<e08e6fa2>] mpt_do_ioc_recovery+0x950/0x1212 [mptbase]
 [<c04496c2>] ? __lock_acquire+0xa69/0xacc
 [<c060c8f1>] ? _spin_unlock_irqrestore+0x36/0x3c
 [<c060c3af>] ? _spin_unlock_irq+0x22/0x26
 [<c04f2d8b>] ? string+0x2b/0x76
 [<c04f310e>] ? vsnprintf+0x338/0x7b3
 [<c04496c2>] ? __lock_acquire+0xa69/0xacc
 [<c060c8ea>] ? _spin_unlock_irqrestore+0x2f/0x3c
 [<c04496c2>] ? __lock_acquire+0xa69/0xacc
 [<c044897d>] ? debug_check_no_locks_freed+0xeb/0x105
 [<c060c8f1>] ? _spin_unlock_irqrestore+0x36/0x3c
 [<c04488bc>] ? debug_check_no_locks_freed+0x2a/0x105
 [<c0446b8c>] ? lock_release_holdtime+0x43/0x48
 [<c043f742>] ? up_read+0x16/0x29
 [<c05076f8>] ? pci_get_slot+0x66/0x72
 [<e08e89ca>] mpt_attach+0x881/0x9b1 [mptbase]
 [<e091c8e5>] mptspi_probe+0x11/0x354 [mptspi]

Noticing that every caller of mpt_config has its CONFIGPARMS struct
declared on the stack and thus the &pCfg->timer is always on the stack I
changed init_timer() to init_timer_on_stack() and it seems to have shut
up.....

Cc: "Moore, Eric Dean" <Eric.Moore@lsil.com>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Acked-by: "Desai, Kashyap" <Kashyap.Desai@lsi.com>
Cc: <stable@kernel.org> [2.6.29.x]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
15 years ago hugetlbfs: return negative error code for bad mount option
Akinobu Mita [Tue, 21 Apr 2009 21:20:04 +0000 (21:20 +0000)]
hugetlbfs: return negative error code for bad mount option

upstream commit: c12ddba09394c60e1120e6997794fa6ed52da884

This fixes the following BUG:

  # mount -o size=MM -t hugetlbfs none /huge
  hugetlbfs: Bad value 'MM' for mount option 'size=MM'
  ------------[ cut here ]------------
  kernel BUG at fs/super.c:996!

Due to

BUG_ON(!mnt->mnt_sb);

in vfs_kern_mount().

Also, remove unused #include <linux/quotaops.h>

Cc: William Irwin <wli@holomorphy.com>
Cc: <stable@kernel.org>
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
15 years ago NFS: Fix the XDR iovec calculation in nfs3_xdr_setaclargs
Trond Myklebust [Tue, 21 Apr 2009 21:20:08 +0000 (21:20 +0000)]
NFS: Fix the XDR iovec calculation in nfs3_xdr_setaclargs

upstream commit: 8340437210390676f687633a80e3748c40885dc8

Commit ae46141ff08f1965b17c531b571953c39ce8b9e2 (NFSv3: Fix posix ACL code)
introduces a bug in the calculation of the XDR header iovec. In the case
where we are inlining the acls, we need to adjust the length of the iovec
req->rq_svec, in addition to adjusting the total buffer length.

Tested-by: Leonardo Chiquitto <leonardo.lists@gmail.com>
Tested-by: Suresh Jayaraman <sjayaraman@suse.de>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
15 years ago gso: Fix support for linear packets
Herbert Xu [Tue, 21 Apr 2009 11:31:50 +0000 (04:31 -0700)]
gso: Fix support for linear packets

upstream commit: 2f181855a0b3c2b39314944add7b41c15647cf86

When GRO/frag_list support was added to GSO, I made an error
which broke the support for segmenting linear GSO packets (GSO
packets are normally non-linear in the payload).

These days most of these packets are constructed by the tun
driver, which prefers to allocate linear memory if possible.
This is fixed in the latest kernel, but for 2.6.29 and earlier
it is still the norm.

Therefore this bug causes failures with GSO when used with tun
in 2.6.29.

Reported-by: James Huang <jamesclhuang@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
15 years ago agp: zero pages before sending to userspace
Shaohua Li [Mon, 20 Apr 2009 00:08:35 +0000 (10:08 +1000)]
agp: zero pages before sending to userspace

upstream commit: 59de2bebabc5027f93df999d59cc65df591c3e6e

CVE-2009-1192

AGP pages may eventually be mapped into userspace, so the pages should be
zeroed before userspace can use them. Otherwise there is a potential
information leak.

Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
15 years ago virtio: fix suspend when using virtio_balloon
Marcelo Tosatti [Sun, 19 Apr 2009 18:05:04 +0000 (18:05 +0000)]
virtio: fix suspend when using virtio_balloon

upstream commit: 84a139a985300901dfad99bd93c7345d180af860

Break out of wait_event_interruptible() if freezing has been requested,
in the vballoon thread. Without this change vballoon refuses to stop and
the system can't suspend.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: stable@kernel.org
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
15 years ago Revert "console ASCII glyph 1:1 mapping"
Samuel Thibault [Sun, 19 Apr 2009 18:05:02 +0000 (18:05 +0000)]
Revert "console ASCII glyph 1:1 mapping"

upstream commit: c0b7988200a82290287c6f4cd49585007f73175a

This reverts commit 1c55f18717304100a5f624c923f7cb6511b4116d.

Ingo Brueckl was assuming that reverting to 1:1 mapping for chars >= 128
was not useful, but it happens to be: due to the limitations of the
Linux console, when a blind user wants to read BIG5 on it, he has no
other way than loading a font without SFM and letting the 1:1 mapping permit
the screen reader to get the BIG5 encoding.

Signed-off-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
15 years ago Input: gameport - fix attach driver code
Dmitry Torokhov [Mon, 13 Apr 2009 22:27:49 +0000 (15:27 -0700)]
Input: gameport - fix attach driver code

upstream commit: 4ced8e7cb990a2c3bbf0ac7f27b35c890e7ce895

The commit 6902c0bead4ce266226fc0c5b3828b850bdc884a that moved
driver registration out of kgameportd thread was incomplete and
did not add the code necessary to actually attach driver to
already registered devices, rectify that.

Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
15 years ago x86, PAT: Remove page granularity tracking for vm_insert_pfn maps
Pallipadi, Venkatesh [Sat, 18 Apr 2009 09:08:04 +0000 (11:08 +0200)]
x86, PAT: Remove page granularity tracking for vm_insert_pfn maps

upstream commit: 4b065046273afa01ec8e3de7da407e8d3599251d

This change resolves the problem of too many single page entries
in pat_memtype_list and "freeing invalid memtype" errors with i915,
reported here:

  http://marc.info/?l=linux-kernel&m=123845244713183&w=2

Remove page level granularity track and untrack of vm_insert_pfn.
memtype tracking at page granularity does not scale, and a cleaner
approach would be for the driver to request a type for a bigger
IO address range or PCI io memory range for that device, either at
mmap time or driver init time and just use that type during
vm_insert_pfn.

This patch just removes the track/untrack of vm_insert_pfn. That
means we will be in same state as 2.6.28, with respect to these APIs.

Newer APIs for the drivers to request a memtype for a bigger region
are coming soon.

[ Impact: fix Xorg startup warnings and hangs ]

Reported-by: Arkadiusz Miskiewicz <a.miskiewicz@gmail.com>
Tested-by: Arkadiusz Miskiewicz <a.miskiewicz@gmail.com>
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
LKML-Reference: <20090408223716.GC3493@linux-os.sc.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
15 years ago KVM: is_long_mode() should check for EFER.LMA
Amit Shah [Fri, 17 Apr 2009 22:40:13 +0000 (19:40 -0300)]
KVM: is_long_mode() should check for EFER.LMA

upstream commit: 41d6af119206e98764b4ae6d264d63acefcf851e

is_long_mode currently checks the LongModeEnable bit in
EFER instead of the LongModeActive bit. This is wrong, but
we survived this till now since it wasn't triggered. This
breaks guests that go from long mode to compatibility mode.

This was noticed on a Solaris guest and fixes bug #1842160.
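
The difference between the two bits can be shown with a simplified
stand-in. The bit positions (LME is bit 8, LMA is bit 10) come from the
x86-64 architecture; taking the EFER value as a plain argument instead of
a vcpu pointer is an assumption made to keep the example self-contained.

```c
#include <stdbool.h>
#include <stdint.h>

/* EFER bit positions as defined by the x86-64 architecture. */
#define EFER_LME (UINT64_C(1) << 8)  /* Long Mode Enable */
#define EFER_LMA (UINT64_C(1) << 10) /* Long Mode Active */

/* Simplified stand-in: long mode is in effect only when LMA is set. */
bool is_long_mode(uint64_t efer)
{
    return (efer & EFER_LMA) != 0;
}
```

A guest that has set LME but is not yet (or no longer) running with LMA
set is therefore not reported as being in long mode.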

Signed-off-by: Amit Shah <amit.shah@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
15 years ago KVM: VMX: Update necessary state when guest enters long mode
Amit Shah [Fri, 17 Apr 2009 22:40:12 +0000 (19:40 -0300)]
KVM: VMX: Update necessary state when guest enters long mode

upstream commit: 401d10dee083bda281f2fdcdf654080313ba30ec

setup_msrs() should be called when entering long mode to save the
shadow state for the 64-bit guest state.

Using vmx_set_efer() in enter_lmode() removes some duplicated code
and also ensures we call setup_msrs(). We can safely pass the value
of shadow_efer to vmx_set_efer() as no other bits in the efer change
while enabling long mode (guest first sets EFER.LME, then sets CR0.PG
which causes a vmexit where we activate long mode).

With this fix, is_long_mode() can check for EFER.LMA set instead of
EFER.LME and 5e23049e86dd298b72e206b420513dbc3a240cd9 can be reverted.

Signed-off-by: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
15 years ago KVM: fix kvm_vm_ioctl_deassign_device
Weidong Han [Fri, 17 Apr 2009 22:40:11 +0000 (19:40 -0300)]
KVM: fix kvm_vm_ioctl_deassign_device

upstream commit: 4a906e49f103c2e544148a209ba1db316510799f

Only assigned_dev_id needs to be set for deassignment; use
match->flags to judge and deassign it.

Acked-by: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Weidong Han <weidong.han@intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
15 years ago KVM: MMU: handle compound pages in kvm_is_mmio_pfn
Joerg Roedel [Fri, 17 Apr 2009 22:40:10 +0000 (19:40 -0300)]
KVM: MMU: handle compound pages in kvm_is_mmio_pfn

upstream commit: fc5659c8c6b6c4e02ac354b369017c1bf231f347

The function kvm_is_mmio_pfn is called before put_page is called on a
page by KVM. This is a problem when this function is called on some
struct page which is part of a compound page. It does not test the
reserved flag of the compound page but of the struct page within the
compound page. This is a problem when KVM works with hugepages allocated
at boot time. These pages have the reserved bit set in all tail pages.
Only the flag in the compound head is cleared. KVM would not put such a
page which results in a memory leak.
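
A minimal sketch of the head-versus-tail distinction, assuming a toy
page structure. struct toy_page and the toy_* functions are
hypothetical stand-ins and do not reflect the kernel's struct page
layout or helpers.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct page. */
struct toy_page {
    bool reserved;
    struct toy_page *head;  /* NULL for a head or non-compound page */
};

const struct toy_page *toy_compound_head(const struct toy_page *p)
{
    return p->head ? p->head : p;
}

/* Naive test: looks at the page itself, so a tail page of a boot-time
 * hugepage (reserved bit set) is mistaken for MMIO. */
bool toy_is_mmio_naive(const struct toy_page *p)
{
    return p->reserved;
}

/* Head-aware test: consults the flag of the compound head instead. */
bool toy_is_mmio(const struct toy_page *p)
{
    return toy_compound_head(p)->reserved;
}
```

With a head page whose flag is clear and a tail page whose flag is
set, the naive test classifies the tail as MMIO while the head-aware
test does not.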

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Acked-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
15 years agoKVM: Reset PIT irq injection logic when the PIT IRQ is unmasked
Avi Kivity [Fri, 17 Apr 2009 22:40:09 +0000 (19:40 -0300)]
KVM: Reset PIT irq injection logic when the PIT IRQ is unmasked

upstream commit: 4780c65904f0fc4e312ee2da9383eacbe04e61ea

While the PIT is masked the guest cannot ack the irq, so the reinject logic
will never allow the interrupt to be injected.

Fix by resetting the reinjection counters on unmask.

Unbreaks Xen.
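
The stall and the fix can be modelled roughly as follows. This is a
sketch under assumptions: struct toy_pit and the toy_pit_* functions
are invented for illustration and only mimic the idea of a pending
counter gated by an ack flag.

```c
#include <stdbool.h>

/* Hypothetical reinjection state. */
struct toy_pit {
    int pending;   /* ticks not yet delivered */
    bool irq_ack;  /* previous irq was acked by the guest */
};

/* Timer tick: inject only if the previous irq was acked. */
bool toy_pit_tick(struct toy_pit *pit)
{
    pit->pending++;
    if (!pit->irq_ack)
        return false;      /* nothing injected, tick stays pending */
    pit->irq_ack = false;
    pit->pending--;
    return true;
}

void toy_pit_ack(struct toy_pit *pit)
{
    pit->irq_ack = true;
}

/* Mask notifier: on unmask, forget the backlog and allow injection. */
void toy_pit_mask_changed(struct toy_pit *pit, bool masked)
{
    if (!masked) {
        pit->pending = 0;
        pit->irq_ack = true;
    }
}
```

Without the reset in toy_pit_mask_changed(), a guest that masked the
line before acking would never see another tick in this model.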

Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
15 years agoKVM: Interrupt mask notifiers for ioapic
Avi Kivity [Fri, 17 Apr 2009 22:40:08 +0000 (19:40 -0300)]
KVM: Interrupt mask notifiers for ioapic

upstream commit: 75858a84a6207f5e60196f6bbd18fde4250e5759

Allow clients to request notifications when the guest masks or unmasks a
particular irq line.  This complements irq ack notifications, as the guest
will not ack an irq line that is masked.

Currently implemented for the ioapic only.
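
As an illustration of the notifier idea, here is a small callback
registry keyed by irq line. All names (toy_ioapic, toy_mask_fn,
toy_register_mask_notifier, ...) are hypothetical and the fixed-size
array is a simplification; the actual KVM interface may look quite
different.

```c
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_NOTIFIERS 4

typedef void (*toy_mask_fn)(void *opaque, bool masked);

struct toy_mask_notifier {
    int irq;
    toy_mask_fn fn;
    void *opaque;
};

struct toy_ioapic {
    struct toy_mask_notifier notifiers[TOY_MAX_NOTIFIERS];
    size_t count;
};

/* Register a callback for one irq line; fails when the table is full. */
bool toy_register_mask_notifier(struct toy_ioapic *io, int irq,
                                toy_mask_fn fn, void *opaque)
{
    if (io->count == TOY_MAX_NOTIFIERS)
        return false;
    io->notifiers[io->count++] = (struct toy_mask_notifier){ irq, fn, opaque };
    return true;
}

/* Called when the guest masks or unmasks a line. */
void toy_fire_mask_notifiers(const struct toy_ioapic *io, int irq, bool masked)
{
    for (size_t i = 0; i < io->count; i++)
        if (io->notifiers[i].irq == irq)
            io->notifiers[i].fn(io->notifiers[i].opaque, masked);
}

/* Example callback: records the last mask state it was told about. */
void toy_record_mask(void *opaque, bool masked)
{
    *(int *)opaque = masked ? 1 : 0;
}
```

A client registers once and is then told about every mask change on
its line, which covers the window in which no ack can arrive.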

Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
15 years agoKVM: Add CONFIG_HAVE_KVM_IRQCHIP
Avi Kivity [Fri, 17 Apr 2009 22:40:07 +0000 (19:40 -0300)]
KVM: Add CONFIG_HAVE_KVM_IRQCHIP

upstream commit: 5d9b8e30f543a9f21a968a4cda71e8f6d1c66a61

Two KVM archs support irqchips and two don't.  Add a Kconfig item to
make selecting between the two models easier.

Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
15 years agoKVM: Fix missing smp tlb flush in invlpg
Andrea Arcangeli [Fri, 17 Apr 2009 22:40:06 +0000 (19:40 -0300)]
KVM: Fix missing smp tlb flush in invlpg

upstream commit: 4539b35881ae9664b0e2953438dd83f5ee02c0b4

When kvm emulates an invlpg instruction, it can drop a shadow pte, but
leaves the guest tlbs intact.  This can cause memory corruption when
swapping out.

Without this, another cpu can still write to a freed host physical page.
Whenever rmap_remove is called, the smp tlb flush must happen before
mmu_lock is released, because the VM takes the mmu_lock before it can
finally add the page to the freelist after swapout. The mmu notifier makes
it safe to flush the tlb after freeing the page (otherwise it would never
be safe), so we can do a single flush for multiple invalidated sptes.

Cc: stable@kernel.org
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Acked-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
[mtosatti: backport to 2.6.29]
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
15 years agoUSB: usb-storage: augment unusual_devs entry for Simple Tech/Datafab
Alan Stern [Fri, 17 Apr 2009 21:20:03 +0000 (21:20 +0000)]
USB: usb-storage: augment unusual_devs entry for Simple Tech/Datafab

upstream commit: e4813eec8d47c8299d968bd5349dc881fa481c26

This patch (as1227) adds the MAX_SECTORS_64 flag to the unusual_devs
entry for the Simple Tech/Datafab controller.  This fixes Bugzilla
#12882.

Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Reported-and-tested-by: binbin <binbinsh@gmail.com>
Cc: stable <stable@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
15 years agoUSB: fix oops in cdc-wdm in case of malformed descriptors
Oliver Neukum [Fri, 17 Apr 2009 21:20:06 +0000 (21:20 +0000)]
USB: fix oops in cdc-wdm in case of malformed descriptors

upstream commit: e13c594f3a1fc2c78e7a20d1a07974f71e4b448f

cdc-wdm needs to ignore extremely malformed descriptors.
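
For illustration, a defensive walk over a buffer of class-specific
descriptors might look like the sketch below. It relies only on the
general USB convention that each descriptor starts with a length byte
followed by a type byte; toy_count_descriptors is a hypothetical
helper, and the exact check used in the upstream patch may differ.

```c
#include <stddef.h>
#include <stdint.h>

/* Count well-formed descriptors of type `wanted`; stop at the first
 * malformed entry instead of trusting its length byte. */
int toy_count_descriptors(const uint8_t *buf, size_t len, uint8_t wanted)
{
    int found = 0;

    while (len >= 2) {
        uint8_t dlen = buf[0];

        /* A length below 2 would never advance (or underflow);
         * a length beyond the buffer would overrun it. */
        if (dlen < 2 || dlen > len)
            break;
        if (buf[1] == wanted)
            found++;
        buf += dlen;
        len -= dlen;
    }
    return found;
}
```

The walk simply gives up on the remainder of the buffer once a length
byte cannot be trusted.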

Signed-off-by: Oliver Neukum <oliver@neukum.org>
Cc: stable <stable@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
15 years agoUSB: ftdi_sio: add vendor/project id for JETI specbos 1201 spectrometer
Peter Korsgaard [Fri, 17 Apr 2009 21:20:07 +0000 (21:20 +0000)]
USB: ftdi_sio: add vendor/project id for JETI specbos 1201 spectrometer

upstream commit: ae27d84351f1f3568118318a8c40ff3a154bd629

Signed-off-by: Peter Korsgaard <jacmet@sunsite.dk>
Cc: stable <stable@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
15 years agousb gadget: fix ethernet link reports to ethtool
Jonathan McDowell [Fri, 17 Apr 2009 21:20:10 +0000 (21:20 +0000)]
usb gadget: fix ethernet link reports to ethtool

upstream commit: 237e75bf1e558f7330f8deb167fa3116405bef2c

The g_ether USB gadget driver currently decides whether or not there's a
link to report back for eth_get_link based on if the USB link speed is
set. The USB gadget speed is however often set even before the device is
enumerated. It seems more sensible to only report a "link" if we're
actually connected to a host that wants to talk to us. The patch below
does this for me - tested with the PXA27x UDC driver.

Signed-off-by: Jonathan McDowell <noodles@earth.li>
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Cc: stable <stable@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
15 years agox86: disable X86_PTRACE_BTS for now
Ingo Molnar [Wed, 15 Apr 2009 21:15:14 +0000 (23:15 +0200)]
x86: disable X86_PTRACE_BTS for now

upstream commit: d45b41ae8da0f54aec0eebcc6f893ba5f22a1e8e

Oleg Nesterov found a couple of races in the ptrace-bts code
and fixes are queued up for it but they did not get ready in time
for the merge window. We'll merge them in v2.6.31 - until then
mark the feature as CONFIG_BROKEN. There's no user-space yet
making use of this so it's not a big issue.

Cc: <stable@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
[chrisw: trivial 2.6.29 backport]
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
15 years agoSCSI: sg: fix q->queue_lock on scsi_error_handler path
FUJITA Tomonori [Mon, 6 Apr 2009 20:55:06 +0000 (20:55 +0000)]
SCSI: sg: fix q->queue_lock on scsi_error_handler path

upstream commit: 015640edb1f346e0b2eda703587c4cd1c310ec1d

sg_rq_end_io() is called via rq->end_io. In some rare cases,
sg_rq_end_io calls blk_put_request/blk_rq_unmap_user (when a program
issuing a command has gone before the command completion; e.g. by
interrupting a program issuing a command before the command
completes).

We can't call blk_put_request/blk_rq_unmap_user in interrupt so the
commit c96952ed7031e7c576ecf90cf95b8ec099d5295a uses
execute_in_process_context().

The problem is that scsi_error_handler() calls rq->end_io too. We
can't call blk_put_request/blk_rq_unmap_user in this path either (we
hold q->queue_lock).

To avoid the above problem, in these rare cases, this patch always
uses schedule_work() instead of execute_in_process_context().

Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Acked-by: Douglas Gilbert <dgilbert@interlog.com>
Cc: Stable Tree <stable@kernel.org>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
15 years agoSCSI: sg: avoid blk_put_request/blk_rq_unmap_user in interrupt
FUJITA Tomonori [Wed, 4 Feb 2009 02:36:27 +0000 (11:36 +0900)]
SCSI: sg: avoid blk_put_request/blk_rq_unmap_user in interrupt

upstream commit: c96952ed7031e7c576ecf90cf95b8ec099d5295a

This fixes the following oops:

http://marc.info/?l=linux-kernel&m=123316111415677&w=2

You can reproduce this bug by interrupting a program before a sg
response completes. This leads to the special sg state (the orphan
state), then sg calls blk_put_request in interrupt (rq->end_io).

The above bug report shows the recursive lock problem because sg calls
blk_put_request in interrupt. We could call __blk_put_request here
instead; however, we also need to handle blk_rq_unmap_user here, which
can't be called in interrupt either.

In the orphan state, we don't need to care about the data transfer
(the program revoked the command) so adding 'just free the resource'
mode to blk_rq_unmap_user is a possible option.

I prefer to avoid complicating the blk mapping API when possible. I
change the orphan state to call sg_finish_rem_req via
execute_in_process_context. We hold sg_fd->kref so sg_fd doesn't go
away until keventd_wq finishes our work. copy_from_user/to_user fails
so blk_rq_unmap_user just frees the resource without the data
transfer.

Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Acked-by: Douglas Gilbert <dgilbert@interlog.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
15 years agoSCSI: sg: fix races with ioctl(SG_IO)
Tony Battersby [Tue, 20 Jan 2009 22:00:09 +0000 (17:00 -0500)]
SCSI: sg: fix races with ioctl(SG_IO)

upstream commit: a2dd3b4cea335713b58996bb07b3abcde1175f47

sg_io_owned needs to be set before the command is sent to the midlevel;
otherwise, a quickly-completing command may cause a different CPU
to see "srp->done == 1 && !srp->sg_io_owned", which would lead to
incorrect behavior.

Check srp->done and set srp->orphan while holding rq_list_lock to
prevent races with sg_rq_end_io().

There is no need to check sfp->closed from read/write/ioctl/poll/etc.
since the kernel guarantees that this won't happen.

The usefulness of sg_srp_done() was questionable before; now it is
definitely not needed.

Signed-off-by: Tony Battersby <tonyb@cybernetics.com>
Acked-by: Douglas Gilbert <dgilbert@interlog.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
15 years agoSCSI: sg: fix races during device removal
Tony Battersby [Wed, 21 Jan 2009 19:45:50 +0000 (14:45 -0500)]
SCSI: sg: fix races during device removal

upstream commit: c6517b7942fad663cc1cf3235cbe4207cf769332

sg has the following problems related to device removal:

* opening a sg fd races with removing a device
* closing a sg fd races with removing a device
* /proc/scsi/sg/* access races with removing a device
* command completion races with removing a device
* command completion races with closing a sg fd
* can rmmod sg with active commands

These problems can cause kernel oopses, memory-use-after-free, or
double-free errors.  This patch fixes these problems by using krefs
to manage the lifetime of sg_device and sg_fd.

Each command submitted to the midlevel holds a reference to sg_fd
until the completion callback.  This ensures that sg_fd doesn't go
away if the fd is closed with commands still outstanding.

sg_fd gets the reference of sg_device (with scsi_device) and also
makes sure that the sg module doesn't go away.

/proc/scsi/sg/* functions don't play nicely with krefs because they
give information about sg_fds which have been closed but not yet
freed due to still having outstanding commands and sg_devices which
have been removed but not yet freed due to still being referenced
by one or more sg_fds.  To deal with this safely without removing
functionality, /proc functions now access sg_device and sg_fd while
holding a lock instead of using kref_get()/kref_put().
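
The lifetime rule for sg_fd can be sketched with a toy reference
counter. This is a single-threaded illustration with hypothetical
names (toy_kref, toy_sg_fd); the kernel's kref is atomic and its API
is not reproduced here.

```c
#include <stdbool.h>
#include <stddef.h>

struct toy_kref {
    int count;
};

/* Hypothetical per-open-file state with an embedded refcount. */
struct toy_sg_fd {
    struct toy_kref ref;
    bool freed;
};

void toy_kref_init(struct toy_kref *k) { k->count = 1; }
void toy_kref_get(struct toy_kref *k)  { k->count++; }

/* Drop one reference; run `release` when the last one goes away.
 * Returns true if the object was released. */
bool toy_kref_put(struct toy_kref *k, void (*release)(struct toy_kref *))
{
    if (--k->count == 0) {
        release(k);
        return true;
    }
    return false;
}

/* Recover the containing object from the embedded refcount. */
void toy_sg_fd_release(struct toy_kref *k)
{
    struct toy_sg_fd *fd =
        (struct toy_sg_fd *)((char *)k - offsetof(struct toy_sg_fd, ref));
    fd->freed = true;  /* a real release would free the memory here */
}
```

Taking a reference per submitted command and dropping it in the
completion path means a close() with commands outstanding only drops
the opener's reference; the object goes away on the last put.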

Signed-off-by: Tony Battersby <tonyb@cybernetics.com>
Acked-by: Douglas Gilbert <dgilbert@interlog.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
[chrisw: big for -stable, helps fix real bug, and made it through rc2 upstream]
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
15 years agomm: pass correct mm when growing stack
Hugh Dickins [Thu, 16 Apr 2009 21:45:05 +0000 (21:45 +0000)]
mm: pass correct mm when growing stack

upstream commit: 05fa199d45c54a9bda7aa3ae6537253d6f097aa9

Tetsuo Handa reports seeing the WARN_ON(current->mm == NULL) in
security_vm_enough_memory(), when do_execve() is touching the
target mm's stack, to set up its args and environment.

Yes, a UMH_NO_WAIT or UMH_WAIT_PROC call_usermodehelper() spawns
an mm-less kernel thread to do the exec.  And in any case, that
vm_enough_memory check when growing stack ought to be done on the
target mm, not on the execer's mm (though apart from the warning,
it only makes a slight tweak to OVERCOMMIT_NEVER behaviour).

Reported-by: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
15 years agopata_hpt37x: fix HPT370 DMA timeouts
Sergei Shtylyov [Tue, 14 Apr 2009 14:39:14 +0000 (18:39 +0400)]
pata_hpt37x: fix HPT370 DMA timeouts

upstream commit: 265b7215aed36941620b65ecfff516200fb190c1

The libata driver has copied the code from the IDE driver which caused a post
2.4.18 regression on many HPT370[A] chips -- DMA stopped working completely,
only causing timeouts.  Now remove hpt370_bmdma_start() for good...

Signed-off-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
15 years agohpt366: fix HPT370 DMA timeouts
Sergei Shtylyov [Sat, 18 Apr 2009 15:42:19 +0000 (17:42 +0200)]
hpt366: fix HPT370 DMA timeouts

upstream commit: c018f1ee5cf81e58b93d9e93a2ee39cad13dc1ac

The big driver change in 2.4.19-rc1 introduced a regression for many HPT370[A]
chips -- DMA stopped working completely, only causing endless timeouts...

The culprit has been identified (at last!): it turned out to be the code
resetting the DMA state machine before each transfer. Stop doing it now as
this countermeasure has clearly caused more harm than good.

This should fix the kernel.org bug #7703.

Signed-off-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
15 years agopowerpc: Fix data-corrupting bug in __futex_atomic_op
Paul Mackerras [Wed, 15 Apr 2009 17:25:05 +0000 (17:25 +0000)]
powerpc: Fix data-corrupting bug in __futex_atomic_op

upstream commit: 306a82881b14d950d59e0b59a55093a07d82aa9a

Richard Henderson pointed out that the powerpc __futex_atomic_op has a
bug: it will write the wrong value if the stwcx. fails and it has to
retry the lwarx/stwcx. loop, since 'oparg' will have been overwritten
by the result from the first time around the loop.  This happens
because it uses the same register for 'oparg' (an input) as it uses
for the result.

This fixes it by using separate registers for 'oparg' and 'ret'.
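
The failure mode can be reproduced in portable C by simulating a
store-conditional that fails once. This is only a model of the
lwarx/stwcx. retry loop: struct toy_word, toy_store_conditional and
the two toy_atomic_add_* functions are hypothetical, and plain
variables stand in for registers.

```c
#include <stdbool.h>

/* A word whose conditional store fails for the first `fail_count` tries. */
struct toy_word {
    int value;
    int fail_count;
};

bool toy_store_conditional(struct toy_word *w, int newval)
{
    if (w->fail_count > 0) {
        w->fail_count--;
        return false;
    }
    w->value = newval;
    return true;
}

/* Buggy shape: the result is written over the input operand, so a
 * retry adds the already-modified value again. */
int toy_atomic_add_buggy(struct toy_word *w, int oparg)
{
    int old;

    do {
        old = w->value;        /* load-and-reserve */
        oparg = old + oparg;   /* result clobbers the input */
    } while (!toy_store_conditional(w, oparg));
    return old;
}

/* Fixed shape: input and result live in separate variables. */
int toy_atomic_add_fixed(struct toy_word *w, int oparg)
{
    int old, ret;

    do {
        old = w->value;
        ret = old + oparg;
    } while (!toy_store_conditional(w, ret));
    return old;
}
```

Starting from 10 and adding 5 with one failed store, the buggy
variant ends at 25 while the fixed variant ends at 15.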

Cc: stable@kernel.org
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
15 years agoALSA: hda - Fix the cmd cache keys for amp verbs
Takashi Iwai [Wed, 15 Apr 2009 17:25:03 +0000 (17:25 +0000)]
ALSA: hda - Fix the cmd cache keys for amp verbs

upstream commit: fcad94a4c71c36a05f4d5c6dcb174534b4e0b136

Fix the key value generation for get/set amp verbs.  The upper bits of
the parameter have to be combined with the verb value to be unique for
each direction/index of amp access.
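
A sketch of the idea, assuming a made-up key layout. The verb value
0x300 and the placement of direction, channel and index in the upper
byte of the 16-bit amp payload follow the HD-audio verb format; the
key packing itself and the name toy_cmd_cache_key are hypothetical
and not taken from the driver.

```c
#include <stdint.h>

#define TOY_VERB_SET_AMP_GAIN_MUTE 0x300u  /* "set amplifier gain/mute" */

/* Hypothetical key layout: nid in bits 0-7, verb in bits 8-19, and for
 * amp verbs the upper payload byte in bits 24-31. */
uint32_t toy_cmd_cache_key(uint8_t nid, uint32_t verb, uint16_t parm)
{
    uint32_t key = ((verb & 0xfffu) << 8) | nid;

    /* For amp verbs the upper payload bits select output/input,
     * left/right and the amp index, so they must be part of the key. */
    if (verb == TOY_VERB_SET_AMP_GAIN_MUTE)
        key |= (uint32_t)(parm >> 8) << 24;
    return key;
}
```

Two amp writes that differ only in gain share a key, while writes to
different directions or indices get distinct cache slots.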

This fixes the resume problem on some hardware, such as MacBooks, after
the channel mode is changed.

Tested-by: Johannes Berg <johannes@sipsolutions.net>
Cc: <stable@kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
15 years agosfc: Match calls to netif_napi_add() and netif_napi_del()
Ben Hutchings [Wed, 15 Apr 2009 00:39:03 +0000 (01:39 +0100)]
sfc: Match calls to netif_napi_add() and netif_napi_del()

upstream commit: 718cff1eec595ce6ab0635b8160a51ee37d9268d

sfc could call netif_napi_add() multiple times for the same
napi_struct, corrupting the list of napi_structs for the associated
device and leading to a busy-loop on device removal.  Move the call to
netif_napi_add() and add a call to netif_napi_del() in the obvious
places.

[bhutchings: backport to 2.6.29]
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
15 years agotty: Fix leak in ti-usb
Alan Cox [Tue, 14 Apr 2009 13:58:11 +0000 (14:58 +0100)]
tty: Fix leak in ti-usb

upstream commit: cf5450930db0ae308584e5361f3345e0ff73e643

If the ti-usb adapter returns a zero-length data frame (which happens)
then we leak a kref.  Found by Christoph Mair <christoph.mair@gmail.com>
who proposed a patch.  The patch here is different as Christoph's patch
didn't work for the case where tty = NULL and data arrived, but Christoph
did all the hard work chasing it down.

Signed-off-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
15 years agospi: spi_write_then_read() bugfixes
David Brownell [Mon, 13 Apr 2009 22:35:03 +0000 (22:35 +0000)]
spi: spi_write_then_read() bugfixes

upstream commit: bdff549ebeff92b1a6952e5501caf16a6f8898c8

The "simplify spi_write_then_read()" patch included two regressions from
the 2.6.27 behaviors:

 - The data it wrote out during the (full duplex) read side
   of the transfer was not zeroed.

 - It fails completely on half duplex hardware, such as
   Microwire and most "3-wire" SPI variants.

So, revert that patch.  A revised version should be submitted at some
point, which can get the speedup on standard hardware (full duplex)
without breaking on less-capable half-duplex stuff.

Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Cc: <stable@kernel.org> [2.6.28.x, 2.6.29.x]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
15 years agoadd some long-missing capabilities to fs_mask
Serge E. Hallyn [Mon, 13 Apr 2009 17:25:03 +0000 (17:25 +0000)]
add some long-missing capabilities to fs_mask

upstream commit: 0ad30b8fd5fe798aae80df6344b415d8309342cc

When POSIX capabilities were introduced during the 2.1 Linux
cycle, the fs mask, which represents the capabilities which having
fsuid==0 is supposed to grant, did not include CAP_MKNOD and
CAP_LINUX_IMMUTABLE.  However, before capabilities the privilege
to call these did in fact depend upon fsuid==0.

This patch introduces those capabilities into the fsmask,
restoring the old behavior.
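
Expressed as bit masks, the change amounts to two extra bits in the
low capability word. The capability numbers below are the standard
Linux values; the TOY_* macro names are local to this sketch, and the
set of capabilities assumed to be in the old mask is a recollection
rather than something stated in this message.

```c
#include <stdint.h>

/* Capability numbers as defined by Linux. */
#define TOY_CAP_CHOWN            0
#define TOY_CAP_DAC_OVERRIDE     1
#define TOY_CAP_DAC_READ_SEARCH  2
#define TOY_CAP_FOWNER           3
#define TOY_CAP_FSETID           4
#define TOY_CAP_LINUX_IMMUTABLE  9
#define TOY_CAP_MKNOD           27

#define TOY_CAP_TO_MASK(c) (UINT32_C(1) << (c))

/* Assumed contents of the fs mask (low 32 bits) before the change. */
uint32_t toy_fs_mask_old(void)
{
    return TOY_CAP_TO_MASK(TOY_CAP_CHOWN)
         | TOY_CAP_TO_MASK(TOY_CAP_DAC_OVERRIDE)
         | TOY_CAP_TO_MASK(TOY_CAP_DAC_READ_SEARCH)
         | TOY_CAP_TO_MASK(TOY_CAP_FOWNER)
         | TOY_CAP_TO_MASK(TOY_CAP_FSETID);
}

/* After the change: the two long-missing capabilities are included. */
uint32_t toy_fs_mask_new(void)
{
    return toy_fs_mask_old()
         | TOY_CAP_TO_MASK(TOY_CAP_MKNOD)
         | TOY_CAP_TO_MASK(TOY_CAP_LINUX_IMMUTABLE);
}
```

Under these assumptions the low word of the mask changes from
0x0000001f to 0x0800021f.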

See the thread starting at http://lkml.org/lkml/2009/3/11/157 for
reference.

Note that if this fix is deemed valid, then earlier kernel versions (2.4
and 2.2) ought to be fixed too.

Changelog:
[Mar 23] Actually delete old CAP_FS_SET definition...
[Mar 20] Updated against J. Bruce Fields's patch

Reported-by: Igor Zhbanov <izh1979@gmail.com>
Signed-off-by: Serge E. Hallyn <serue@us.ibm.com>
Cc: stable@kernel.org
Cc: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
15 years agohrtimer: fix rq->lock inversion (again)
Peter Zijlstra [Fri, 13 Mar 2009 11:21:27 +0000 (12:21 +0100)]
hrtimer: fix rq->lock inversion (again)

upstream commit: 7f1e2ca9f04b02794597f60e7b1d43f0a1317939

It appears I inadvertently introduced rq->lock recursion to the
hrtimer_start() path when I delegated running already expired
timers to softirq context.

This patch fixes it by introducing a __hrtimer_start_range_ns()
method that will not use raise_softirq_irqoff() but
__raise_softirq_irqoff() which avoids the wakeup.
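
The difference between the two raise variants can be modelled as
below. This is a toy: struct toy_cpu and the toy_* functions are
hypothetical, the softirq number is arbitrary, and a counter stands
in for waking the softirq thread (the step that would take rq->lock).

```c
#include <stdbool.h>

#define TOY_HRTIMER_SOFTIRQ 3  /* arbitrary index for this sketch */

struct toy_cpu {
    unsigned int pending;  /* bitmask of raised softirqs */
    int wakeups;           /* how often the softirq thread was woken */
    bool in_interrupt;
};

/* Only marks the softirq pending; never wakes anything. */
void toy_raise_nowake(struct toy_cpu *cpu, unsigned int nr)
{
    cpu->pending |= 1u << nr;
}

/* Marks pending and, outside interrupt context, wakes the thread. */
void toy_raise(struct toy_cpu *cpu, unsigned int nr)
{
    toy_raise_nowake(cpu, nr);
    if (!cpu->in_interrupt)
        cpu->wakeups++;
}

/* Deferred check, done later from a context where waking is safe. */
bool toy_schedule_check(struct toy_cpu *cpu)
{
    if (!cpu->pending)
        return false;
    cpu->wakeups++;
    return true;
}
```

The no-wake variant leaves the bit set so that a later check can
perform the wakeup outside the locked region.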

It then also changes schedule() to check for pending softirqs and
do the wakeup then. I'm not quite sure I like this last bit, nor
am I convinced it's really needed.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: paulus@samba.org
LKML-Reference: <20090313112301.096138802@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Tested-by: Mikael Pettersson <mikpe@it.uu.se>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
15 years agox86: fix broken irq migration logic while cleaning up multiple vectors
Suresh Siddha [Thu, 9 Apr 2009 22:49:41 +0000 (15:49 -0700)]
x86: fix broken irq migration logic while cleaning up multiple vectors

upstream commit: 68a8ca593fac82e336a792226272455901fa83df

Impact: fix spurious IRQs

During irq migration, we send a low priority interrupt to the previous
irq destination. In the non interrupt-remapping case this happens after
the interrupt starts arriving at the new destination, and in the
interrupt-remapping case after modifying and flushing the
interrupt-remapping table entry caches.

This low priority irq cleanup handler can clean up multiple vectors, as
multiple irqs can be migrated at almost the same time. While
there will be multiple invocations of the irq cleanup handler (one cleanup
IPI for each irq migration), the first invocation of the cleanup handler
can potentially clean up more than one vector (as the first invocation can
see the requests for more than one vector cleanup). When we clean up multiple
vectors during the first invocation of smp_irq_move_cleanup_interrupt(),
other vectors that are to be cleaned up can still be pending in the local
cpu's IRR (as smp_irq_move_cleanup_interrupt() runs with interrupts disabled).

When we are ready to unhook a vector corresponding to an irq, check if that
vector is registered in the local cpu's IRR. If so, skip that cleanup and
do a self IPI with the cleanup vector, so that we give a chance to
service the pending vector interrupt and then clean up that vector
allocation once we execute the lowest priority handler.
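
The IRR check can be sketched as a bit test over eight 32-bit words
(the IRR holds one bit per vector, 256 in total). struct toy_apic and
the toy_* functions are hypothetical; a counter stands in for sending
the self IPI.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical local APIC snapshot. */
struct toy_apic {
    uint32_t irr[8];  /* 256 request bits, 32 per word */
    int self_ipis;    /* cleanup IPIs re-sent to ourselves */
};

bool toy_irr_pending(const struct toy_apic *a, uint8_t vector)
{
    return (a->irr[vector / 32] >> (vector % 32)) & 1u;
}

/* Try to unhook `vector`. If it is still pending in the IRR, leave it
 * hooked and re-arm the cleanup so the pending interrupt is serviced
 * first. Returns true if the vector was unhooked. */
bool toy_try_cleanup_vector(struct toy_apic *a, uint8_t vector, bool *hooked)
{
    if (toy_irr_pending(a, vector)) {
        a->self_ipis++;
        return false;
    }
    *hooked = false;
    return true;
}
```

A vector that is still pending survives the first cleanup pass and is
released on a later pass, after its interrupt has been handled.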

This fixes spurious interrupts seen when migrating multiple vectors
at the same time.

[ This is apparently possible even on conventional xapic, although to
  the best of our knowledge it has never been seen.  The stable
  maintainers may wish to consider this one for -stable. ]

[suresh: backport to 2.6.29]
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: stable@kernel.org
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
15 years agosched: do not count frozen tasks toward load
Nathan Lynch [Thu, 9 Apr 2009 18:20:02 +0000 (18:20 +0000)]
sched: do not count frozen tasks toward load

upstream commit: e3c8ca8336707062f3f7cb1cd7e6b3c753baccdd

Freezing tasks via the cgroup freezer causes the load average to climb
because the freezer's current implementation puts frozen tasks in
uninterruptible sleep (D state).

Some applications which perform job-scheduling functions consult the
load average when making decisions.  If a cgroup is frozen, the load
average does not provide a useful measure of the system's utilization
to such applications.  This is especially inconvenient if the job
scheduler employs the cgroup freezer as a mechanism for preempting low
priority jobs.  Contrast this with using SIGSTOP for the same purpose:
the stopped tasks do not count toward system load.

Change task_contributes_to_load() to return false if the task is
frozen.  This results in /proc/loadavg behavior that better meets
users' expectations.
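
In code, the rule reads roughly as follows. The flag values and the
names struct toy_task and toy_contributes_to_load are illustrative
only and should not be taken as the kernel's definitions.

```c
#include <stdbool.h>

/* Illustrative flag values for this sketch. */
#define TOY_TASK_UNINTERRUPTIBLE 0x2u
#define TOY_PF_FROZEN            0x10000u

/* Hypothetical subset of a task's fields. */
struct toy_task {
    unsigned int state;
    unsigned int flags;
};

/* A task in uninterruptible sleep counts toward the load average,
 * unless it is only sleeping because the freezer put it there. */
bool toy_contributes_to_load(const struct toy_task *t)
{
    return (t->state & TOY_TASK_UNINTERRUPTIBLE) != 0 &&
           (t->flags & TOY_PF_FROZEN) == 0;
}
```

A task blocked on I/O still counts, while a frozen task in the same
sleep state does not.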

Signed-off-by: Nathan Lynch <ntl@pobox.com>
Acked-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Nigel Cunningham <nigel@tuxonice.net>
Tested-by: Nigel Cunningham <nigel@tuxonice.net>
Cc: <stable@kernel.org>
Cc: containers@lists.linux-foundation.org
Cc: linux-pm@lists.linux-foundation.org
Cc: Matt Helsley <matthltc@us.ibm.com>
LKML-Reference: <20090408194512.47a99b95@manatee.lan>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>