The PKT_CTRL_CMD_STATUS device ioctl retrieves a pointer to a
pktcdvd_device from the global pkt_devs array. The index into this
array is provided directly by the user and is a signed integer, so the
comparison to ensure that it falls within the bounds of this array will
fail when provided with a negative index.
This can be used to read arbitrary kernel memory or cause a crash due to
an invalid pointer dereference. This can be exploited by users with
permission to open /dev/pktcdvd/control (on many distributions, this is
readable by group "cdrom").
Signed-off-by: Dan Rosenberg <dan.j.rosenberg@gmail.com>
[ Rather than add a cast, just make the function take the right type -Linus ] Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Andi Kleen <ak@linux.intel.com>
The other pipe functions do not need to use the 'careful' version, since
they are only ever called for things that are already known to be pipes.
The normal read/write/ioctl functions are called through the file
operations structures, so if a file isn't a pipe, they'd never get
called. But pipe_fcntl() is special, and called directly from the
generic fcntl code, and needs to use the same careful function that the
splice code is using.
Cc: Jens Axboe <jaxboe@fusionio.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Dave Jones <davej@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> Signed-off-by: Andi Kleen <ak@linux.intel.com>
.. and change it to take the 'file' pointer instead of an inode, since
that's what all users want anyway.
The renaming is preparatory to exporting it to other users. The old
'pipe_info()' name was too generic and is already used elsewhere, so
before making the function public we need to use a more specific name.
Cc: Jens Axboe <jaxboe@fusionio.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Dave Jones <davej@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> Signed-off-by: Andi Kleen <ak@linux.intel.com>
On each machine check all registers are revalidated. The save area for
the clock comparator however only contains the upper most seven bytes
of the former contents, if valid.
Therefore the machine check handler uses a store clock instruction to
get the current time and writes that to the clock comparator register
which in turn will generate an immediate timer interrupt.
However within the lowcore the expected time of the next timer
interrupt is stored. If the interrupt happens before that time the
handler won't be called. In turn the clock comparator won't be
reprogrammed and therefore the interrupt condition stays pending which
causes an interrupt loop until the expected time is reached.
On NOHZ machines this can result in unresponsive machines since the
time of the next expected interrupted can be a couple of days in the
future.
To fix this just revalidate the clock comparator register with the
expected value.
In addition the special handling for udelay must be changed as well.
If r8196 received packets with invalid sctp/igmp(not tcp, udp) checksum, r8196 set skb->ip_summed
wit CHECKSUM_UNNECESSARY. This cause that upper protocol don't check checksum field.
I am not family with r8196 driver. I try to guess the meaning of RxProtoIP and IPFail.
RxProtoIP stands for received IPv4 packet that upper protocol is not tcp and udp.
!(opts1 & IPFail) is true means that driver correctly to check checksum in IPv4 header.
If it's right, I think we should not set ip_summed wit CHECKSUM_UNNECESSARY for my sctp packets
with invalid checksum.
If it's not right, please tell me.
Signed-off-by: Shan Wei <shanwei@cn.fujitsu.com> Acked-by: Francois Romieu <romieu@fr.zoreil.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> Signed-off-by: Andi Kleen <ak@linux.intel.com>
While porting GRO to r8169, I found this driver has a bug in its rx
path.
All skbs given to network stack had their ip_summed set to
CHECKSUM_NONE, while hardware said they had correct TCP/UDP checksums.
The reason is driver sets skb->ip_summed on the original skb before the
copy eventually done by copybreak. The fresh skb gets the ip_summed =
CHECKSUM_NONE value, forcing network stack to recompute checksum, and
preventing my GRO patch to work.
Fix is to make the ip_summed setting after skb copy.
Note : rx_copybreak current value is 16383, so all frames are copied...
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Acked-by: Francois Romieu <romieu@fr.zoreil.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> Signed-off-by: Andi Kleen <ak@linux.intel.com>
When operating in a mode that initiates communication and using
HT40 we should fail if we cannot use both primary and secondary
channels to initiate communication. Our current ht40 allowmap
only covers STA mode of operation, for beaconing modes we need
a check on the fly as the mode of operation is dynamic and
there other flags other than disable which we should read
to check if we can initiate communication.
Do not allow for initiating communication if our secondary HT40
channel has is either disabled, has a passive scan flag, a
no-ibss flag or is a radar channel. Userspace now has similar
checks but this is also needed in-kernel.
Reported-by: Jouni Malinen <jouni.malinen@atheros.com> Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com> Signed-off-by: John W. Linville <linville@tuxdriver.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> Signed-off-by: Andi Kleen <ak@linux.intel.com>
In rds_cmsg_rdma_args(), the user-provided args->nr_local value is
restricted to less than UINT_MAX. This seems to need a tighter upper
bound, since the calculation of total iov_size can overflow, resulting
in a small sock_kmalloc() allocation. This would probably just result
in walking off the heap and crashing when calling rds_rdma_pages() with
a high count value. If it somehow doesn't crash here, then memory
corruption could occur soon after.
Signed-off-by: Dan Rosenberg <drosenberg@vsecurity.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> Signed-off-by: Andi Kleen <ak@linux.intel.com>
Don't declare variable sized array of iovecs on the stack since this
could cause stack overflow if msg->msgiovlen is large. Instead, coalesce
the user-supplied data into a new buffer and use a single iovec for it.
Signed-off-by: Phil Blundell <philb@gnu.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> Signed-off-by: Andi Kleen <ak@linux.intel.com>
Add missing check for capable(CAP_NET_ADMIN) in SIOCSIFADDR operation.
Signed-off-by: Phil Blundell <philb@gnu.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> Signed-off-by: Andi Kleen <ak@linux.intel.com>
Later parts of econet_sendmsg() rely on saddr != NULL, so return early
with EINVAL if NULL was passed otherwise an oops may occur.
Signed-off-by: Phil Blundell <philb@gnu.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> Signed-off-by: Andi Kleen <ak@linux.intel.com>
On certain VIA chipsets AES-CBC requires the input/output to be
a multiple of 64 bytes. We had a workaround for this but it was
buggy as it sent the whole input for processing when it is meant
to only send the initial number of blocks which makes the rest
a multiple of 64 bytes.
As expected this causes memory corruption whenever the workaround
kicks in.
Reported-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> Signed-off-by: Andi Kleen <ak@linux.intel.com>
On parsing malformed X.25 facilities, decrementing the remaining length
may cause it to underflow. Since the length is an unsigned integer,
this will result in the loop continuing until the kernel crashes.
This patch adds checks to ensure decrementing the remaining length does
not cause it to wrap around.
Signed-off-by: Dan Rosenberg <drosenberg@vsecurity.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> Signed-off-by: Andi Kleen <ak@linux.intel.com>
Reported-by: Dan Rosenberg <drosenberg@vsecurity.com> Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net> Signed-off-by: Andi Kleen <ak@linux.intel.com> CC: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
There is a possibility malicious users can get limited information about
uninitialized stack mem array. Even if sk_run_filter() result is bound
to packet length (0 .. 65535), we could imagine this can be used by
hostile user.
Initializing mem[] array, like Dan Rosenberg suggested in his patch is
expensive since most filters dont even use this array.
Its hard to make the filter validation in sk_chk_filter(), because of
the jumps. This might be done later.
In this patch, I use a bitmap (a single long var) so that only filters
using mem[] loads/stores pay the price of added security checks.
For other filters, additional cost is a single instruction.
[ Since we access fentry->k a lot now, cache it in a local variable
and mark filter entry pointer as const. -DaveM ]
Reported-by: Dan Rosenberg <drosenberg@vsecurity.com> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> Signed-off-by: Andi Kleen <ak@linux.intel.com>
Gcc is currenlty not in the ability to optimize the switch statement in
sk_run_filter() because of dense case labels. This patch replace the
OR'd labels with ordered sequenced case labels. The sk_chk_filter()
function is modified to patch/replace the original OPCODES in a
ordered but equivalent form. gcc is now in the ability to transform the
switch statement in sk_run_filter into a jump table of complexity O(1).
Until this patch gcc generates a sequence of conditional branches (O(n) of 567
byte .text segment size (arch x86_64):
Furthermore, I reordered the instructions to reduce cache line misses by
order the most common instruction to the start.
[AK: Added as dependency on next patch] Signed-off-by: Hagen Paul Pfeifer <hagen@jauu.net> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-of-by: Andrew Hendry <andrew.hendry@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> Signed-off-by: Andi Kleen <ak@linux.intel.com>
Implement the suggested workaround for OMAP3 regarding to sDMA draining
issue, when the channel is disabled on the fly.
This errata affects the following configuration:
sDMA transfer is source synchronized
Buffering is enabled
SmartStandby is selected.
The issue can be easily reproduced by creating overrun situation while
recording audio.
Either introduce load to the CPU:
nice -19 arecord -D hw:0 -M -B 10000 -F 5000 -f dat > /dev/null & \
dd if=/dev/urandom of=/dev/null
or suspending the arecord, and resuming it:
arecord -D hw:0 -M -B 10000 -F 5000 -f dat > /dev/null
CTRL+Z; fg; CTRL+Z; fg; ...
In case of overrun audio stops DMA, and restarts it (without reseting
the sDMA channel). When we hit this errata in stop case (sDMA drain did
not complete), at the coming start the sDMA will not going to be
operational (it is still draining).
This leads to DMA stall condition.
On OMAP3 we can recover with sDMA channel reset, it has been observed
that by introducing unrelated sDMA activity might also help (reading
from MMC for example).
The same errata exists for OMAP2, where the suggestion is to disable the
buffering to avoid this type of error.
On OMAP3 the suggestion is to set sDMA to NoStandby before disabling
the channel, and wait for the drain to finish, than configure sDMA to
SmartStandby again.
Signed-off-by: Peter Ujfalusi <peter.ujfalusi@nokia.com> Acked-by: Jarkko Nikula <jhnikula@gmail.com> Signed-off-by: Andi Kleen <ak@linux.intel.com>
Acked-by : Santosh Shilimkar <santosh.shilimkar@ti.com>
Acked-by : Manjunath Kondaiah G <manjugk@ti.com> Signed-off-by: Tony Lindgren <tony@atomide.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
An errata workaround for omap24xx is not setting the buffering disable bit
25 what is the purpose but channel enable bit 7 instead.
Background for this fix is the DMA stalling issue with ASoC omap-mcbsp
driver. Peter Ujfalusi <peter.ujfalusi@nokia.com> has found an issue in
recording that the DMA stall could happen if there were a buffer overrun
detected by ALSA and the DMA was stopped and restarted due that. This
problem is known to occur on both OMAP2420 and OMAP3. It can recover on
OMAP3 after dma free, dma request and reconfiguration cycle. However, on
OMAP2420 it seems that only way to recover is a reset.
Problem was not visible before the commit c12abc0. That commit changed that
the McBSP transmitter/receiver is released from reset only when needed. That
is, only enabled McBSP transmitter without transmission was able to prevent
this DMA stall problem in receiving side and underlying problem did not show
up until now. McBSP transmitter itself seems to no be reason since DMA
stall does not recover by enabling the transmission after stall.
Debugging showed that there were a DMA write active during DMA stop time and
it never completed even when restarting the DMA. Experimenting showed that
the DMA buffering disable bit could be used to avoid stalling when using
source synchronized transfers. However that could have performance hit and
OMAP3 TRM states that buffering disable is not allowed for destination
synchronized transfers so subsequent patch will implement a method to
complete DMA writes when stopping.
This patch is based on assumtion that complete lock-up on OMAP2420 is
different but related problem. I don't have access to OMAP2420 errata but
I believe this old workaround here is put for a reason but unfortunately
a wrong bit was typed and problem showed up only now.
Signed-off-by: Jarkko Nikula <jhnikula@gmail.com> Signed-off-by: Peter Ujfalusi <peter.ujfalusi@nokia.com> Acked-by: Manjunath Kondaiah G <manjugk@ti.com> Signed-off-by: Tony Lindgren <tony@atomide.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> Signed-off-by: Andi Kleen <ak@linux.intel.com>
Dmitry Torokhov [Thu, 4 Nov 2010 16:12:44 +0000 (09:12 -0700)]
Input: i8042 - add Sony VAIO VPCZ122GX to nomux list
[Note that the mainline will not have this particular fix but rather
will blacklist entire VAIO line based off DMI board name. For stable
I am being a bit more cautious and blacklist one particular product.]
Trying to query/activate active multiplexing mode on this VAIO makes
both keyboard and touchpad inoperable. Futher kernels will blacklist
entire VAIO line, however here we blacklist just one particular model.
This helps protect us from overflow issues down in the
individual protocol sendmsg/recvmsg handlers. Once
we hit INT_MAX we truncate out the rest of the iovec
by setting the iov_len members to zero.
This works because:
1) For SOCK_STREAM and SOCK_SEQPACKET sockets, partial
writes are allowed and the application will just continue
with another write to send the rest of the data.
2) For datagram oriented sockets, where there must be a
one-to-one correspondance between write() calls and
packets on the wire, INT_MAX is going to be far larger
than the packet size limit the protocol is going to
check for and signal with -EMSGSIZE.
Based upon a patch by Linus Torvalds.
Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> Signed-off-by: Andi Kleen <ak@linux.intel.com>
Since commit a1afb637(switch /proc/irq/*/spurious to seq_file) all
/proc/irq/XX/spurious files show the information of irq 0.
Current irq_spurious_proc_open() passes on NULL as the 3rd argument,
which is used as an IRQ number in irq_spurious_proc_show(), to the
single_open(). Because of this, all the /proc/irq/XX/spurious file
shows IRQ 0 information regardless of the IRQ number.
To fix the problem, irq_spurious_proc_open() must pass on the
appropreate data (IRQ number) to single_open().
This fixes the same problem as described in the patch "nohz: fix
printk_needs_cpu() return value on offline cpus" for the arch_needs_cpu()
primitive:
arch_needs_cpu() may return 1 if called on offline cpus. When a cpu gets
offlined it schedules the idle process which, before killing its own cpu,
will call tick_nohz_stop_sched_tick().
That function in turn will call arch_needs_cpu() in order to check if the
local tick can be disabled. On offline cpus this function should naturally
return 0 since regardless if the tick gets disabled or not the cpu will be
dead short after. That is besides the fact that __cpu_disable() should already
have made sure that no interrupts on the offlined cpu will be delivered anyway.
In this case it prevents tick_nohz_stop_sched_tick() to call
select_nohz_load_balancer(). No idea if that really is a problem. However what
made me debug this is that on 2.6.32 the function get_nohz_load_balancer() is
used within __mod_timer() to select a cpu on which a timer gets enqueued.
If arch_needs_cpu() returns 1 then the nohz_load_balancer cpu doesn't get
updated when a cpu gets offlined. It may contain the cpu number of an offline
cpu. In turn timers get enqueued on an offline cpu and not very surprisingly
they never expire and cause system hangs.
This has been observed 2.6.32 kernels. On current kernels __mod_timer() uses
get_nohz_timer_target() which doesn't have that problem. However there might
be other problems because of the too early exit tick_nohz_stop_sched_tick()
in case a cpu goes offline.
This specific bug was indrocuded with 3c5d92a0 "nohz: Introduce
arch_needs_cpu".
In this case a cpu hotplug notifier is used to fix the issue in order to keep
the normal/fast path small. All we need to do is to clear the condition that
makes arch_needs_cpu() return 1 since it is just a performance improvement
which is supposed to keep the local tick running for a short period if a cpu
goes idle. Nothing special needs to be done except for clearing the condition.
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> Signed-off-by: Andi Kleen <ak@linux.intel.com>
While looking for the duplicates in /sys/class/wmi/, I couldn't find
them. The code that looks for duplicates uses strncmp in a binary GUID,
which may contain zero bytes. The right function is memcmp, which is
also used in another section of wmi code.
It was finding 49142400-C6A3-40FA-BADB-8A2652834100 as a duplicate of 39142400-C6A3-40FA-BADB-8A2652834100. Since the first byte is the fourth
printed, they were found as equal by strncmp.
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@holoscopio.com> Signed-off-by: Matthew Garrett <mjg@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> Signed-off-by: Andi Kleen <ak@linux.intel.com>
There is a problem that swap pages allocated before the creation of
a hibernation image can be released and used for storing the contents
of different memory pages while the image is being saved. Since the
kernel stored in the image doesn't know of that, it causes memory
corruption to occur after resume from hibernation, especially on
systems with relatively small RAM that need to swap often.
This issue can be addressed by keeping the GFP_IOFS bits clear
in gfp_allowed_mask during the entire hibernation, including the
saving of the image, until the system is finally turned off or
the hibernation is aborted. Unfortunately, for this purpose
it's necessary to rework the way in which the hibernate and
suspend code manipulates gfp_allowed_mask.
This change is based on an earlier patch from Hugh Dickins.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Reported-by: Ondrej Zary <linux@rainbow-software.org> Acked-by: Hugh Dickins <hughd@google.com> Signed-off-by: Andi Kleen <ak@linux.intel.com> Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This patch fixes a compilation issue when compiling PCMCIA SA1100
support as a module with PCMCIA_DEBUG enabled. The symbol
soc_pcmcia_debug was not beeing exported.
ARM: pcmcia: Fix for building DEBUG with sa11xx_base.c as a module.
This patch fixes a compilation issue when compiling PCMCIA SA1100
support as a module with PCMCIA_DEBUG enabled. The symbol
soc_pcmcia_debug was not beeing exported.
Signed-off-by: Marcelo Roberto Jimenez <mroberto@cpti.cetuc.puc-rio.br> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> Signed-off-by: Andi Kleen <ak@linux.intel.com>
It was found that sometimes children of tasks with inherited events had
one extra event. Eventually it turned out to be due to the list rotation
no being exclusive with the list iteration in the inheritance code.
Cure this by temporarily disabling the rotation while we inherit the events.
eth_type_trans tries to pull data with the length of the ethernet header
from the skb. We only ensured that enough data for the first ethernet
header and the batman header is available in non-paged memory of the skb
and not for the ethernet after the batman header.
eth_type_trans would fail sometimes with drivers which don't ensure that
all there data is perfectly linearised.
The failure was noticed through a kernel bug Oops generated by the
skb_pull inside eth_type_trans.
Reported-by: Rafal Lesniak <lesniak@eresi-project.org> Signed-off-by: Marek Lindner <lindner_marek@yahoo.de> Signed-off-by: Sven Eckelmann <sven.eckelmann@gmx.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> Signed-off-by: Andi Kleen <ak@linux.intel.com>
They should be writable by root, not readable.
Doh, stupid me with the wrong flags.
Reported-by: Jonathan Cameron <jic23@cam.ac.uk> Acked-by: Jonathan Cameron <jic23@cam.ac.uk> Cc: Barry Song <Barry.Song@analog.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> Signed-off-by: Andi Kleen <ak@linux.intel.com>
Found that the nas_led_whitelist dmi_system_id structure array had no
NULL end delimiter, causing the dmi_check_system() loop to read an
undefined entry.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Acked-by: Dave Hansen <dave@sr71.net> Acked-by: Richard Purdie <rpurdie@linux.intel.com> Acked-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> Signed-off-by: Andi Kleen <ak@linux.intel.com>
The find_next_bit, find_first_bit, find_next_zero_bit
and find_first_zero_bit functions were not properly
clamping to the maxbit argument at the bit level. They
were instead only checking maxbit at the byte level.
To fix this, add a compare and a conditional move
instruction to the end of the common bit-within-the-
byte code used by all the functions and be sure not to
clobber the maxbit argument before it is used.
Reviewed-by: Nicolas Pitre <nicolas.pitre@linaro.org> Tested-by: Stephen Warren <swarren@nvidia.com> Signed-off-by: James Jones <jajones@nvidia.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> Signed-off-by: Andi Kleen <ak@linux.intel.com>
Commit 8b592783 added a Thumb-2 variant of usracc which, when it is
called with \rept=2, calls usraccoff once with an offset of 0 and
secondly with a hard-coded offset of 4 in order to avoid incrementing
the pointer again. If \inc != 4 then we will store the data to the wrong
offset from \ptr. Luckily, the only caller that passes \rept=2 to this
function is __clear_user so we haven't been actively corrupting user data.
This patch fixes usracc to pass \inc instead of #4 to usraccoff
when it is called a second time.
Reported-by: Tony Thompson <tony.thompson@arm.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> Signed-off-by: Andi Kleen <ak@linux.intel.com>
This comes from the fact that when USE_SPLIT_PTLOCKS is not defined,
the only lock protecting the page tables is mm->page_table_lock
which is already locked before update_mmu_cache() is called.
Signed-off-by: Mika Westerberg <mika.westerberg@iki.fi> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> Signed-off-by: Andi Kleen <ak@linux.intel.com>
As pointed out by Linus, commit dab5855 ("perf_counter: Add mmap event hooks to
mprotect()") is fundamentally wrong as mprotect_fixup() can free 'vma' due to
merging. Fix the problem by moving perf_event_mmap() hook to
mprotect_fixup().
Note: there's another successful return path from mprotect_fixup() if old
flags equal to new flags. We don't, however, need to call
perf_event_mmap() there because 'perf' already knows the VMA is
executable.
Reported-by: Dave Jones <davej@redhat.com> Signed-off-by: Andi Kleen <ak@linux.intel.com> Analyzed-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Ingo Molnar <mingo@elte.hu> Reviewed-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Pekka Enberg <penberg@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
A single uninitialized padding byte is leaked to userspace.
Signed-off-by: Dan Rosenberg <drosenberg@vsecurity.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> Signed-off-by: Andi Kleen <ak@linux.intel.com>
MMC hosts that poll for card detection by defining the MMC_CAP_NEEDS_POLL
flag have a race on rmmod, where the delayed work is cancelled without
waiting for completed polling. To prevent this a _sync version of the work
cancellation has to be used.
When a single step exception fires, the trap bits, used to
signal hardware breakpoints, are in a random state.
These trap bits might be set if another exception will follow,
like a breakpoint in the next instruction, or a watchpoint in the
previous one. Or there can be any junk there.
So if we handle these trap bits during the single step exception,
we are going to handle an exception twice, or we are going to
handle junk.
Just ignore them in this case.
This fixes https://bugzilla.kernel.org/show_bug.cgi?id=21332
Reported-by: Michael Stefaniuc <mstefani@redhat.com> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Rafael J. Wysocki <rjw@sisk.pl> Cc: Maciej Rutecki <maciej.rutecki@gmail.com> Cc: Alexandre Julliard <julliard@winehq.org> Cc: Jason Wessel <jason.wessel@windriver.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> Signed-off-by: Andi Kleen <ak@linux.intel.com>
Depending on processor speed, page size, and the amount of memory a
process is allowed to amass, cleanup of a large VM may freeze the system
for many seconds. This can result in a watchdog timeout.
Make sure other tasks receive some service when cleaning up large VMs.
Signed-off-by: Steven J. Magnani <steve@digidescorp.com> Cc: Greg Ungerer <gerg@snapgear.com> Signed-off-by: Andi Kleen <ak@linux.intel.com> Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
According to the comment describing ops_lock in the definition of struct
backlight_device and when comparing with other functions in backlight.c
the mutex must be hold when checking ops to be non-NULL.
Fixes a problem added by c835ee7f4154992e6 ("backlight: Add suspend/resume
support to the backlight core") in Jan 2009.
Disable the winch irq early to make sure we don't take an interrupt part
way through the freeing of the handler data, resulting in a crash on
shutdown:
Signed-off-by: Will Newton <will.newton@gmail.com> Acked-by: WANG Cong <xiyou.wangcong@gmail.com> Cc: Jeff Dike <jdike@addtoit.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> Signed-off-by: Andi Kleen <ak@linux.intel.com>
If a user manages to trigger an oops with fs set to KERNEL_DS, fs is not
otherwise reset before do_exit(). do_exit may later (via mm_release in
fork.c) do a put_user to a user-controlled address, potentially allowing
a user to leverage an oops into a controlled write into kernel memory.
This is only triggerable in the presence of another bug, but this
potentially turns a lot of DoS bugs into privilege escalations, so it's
worth fixing. I have proof-of-concept code which uses this bug along
with CVE-2010-3849 to write a zero to an arbitrary kernel address, so
I've tested that this is not theoretical.
A more logical place to put this fix might be when we know an oops has
occurred, before we call do_exit(), but that would involve changing
every architecture, in multiple places.
The attribute cache for a file was not being cleared when a file is opened
with O_TRUNC.
If the filesystem's open operation truncates the file ("atomic_o_trunc"
feature flag is set) then the kernel should invalidate the cached st_mtime
and st_ctime attributes.
Also i_size should be explicitly be set to zero as it is used sometimes
without refreshing the cache.
The entries for those cards are after the generic entries,
so they don't work, in practice. Moving them to happen before the
generic entres fix the issue.
If primary ID (HID) is invalid try locating first valid ID on compatible
ID list before giving up.
This helps, for example, to recognize i8042 AUX port on Sony Vaio VPCZ1
which uses SNYSYN0003 as HID. Without the patch users are forced to
boot with i8042.nopnp to make use of their touchpads.
Tested-by: Jan-Hendrik Zab <jan@jhz.name> Signed-off-by: Dmitry Torokhov <dtor@mail.ru> Signed-off-by: Len Brown <len.brown@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Dave Jones <davej@redhat.com> Signed-off-by: Len Brown <len.brown@intel.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> Signed-off-by: Andi Kleen <ak@linux.intel.com>
According to the ACPI spec, some kinds of primary battery can
report percentage battery remaining capacity directly to OS.
In this case, it reports the LastFullChargedCapacity == 100,
BatteryPresentRate = 0xFFFFFFFF, and BatteryRemaingCapacity a
percentage value, which actually means RemainingBatteryPercentage.
Now we found some battery follows this rule even if it's a rechargeable.
https://bugzilla.kernel.org/show_bug.cgi?id=15979
Handle these batteries correctly in ACPI battery driver
so that they won't break userspace.
Signed-off-by: Zhang Rui <rui.zhang@intel.com> Tested-by: Sitsofe Wheeler <sitsofe@yahoo.com> Signed-off-by: Len Brown <len.brown@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> Signed-off-by: Andi Kleen <ak@linux.intel.com>
VMWare reports that the e1000 driver has a bug when bringing down the
interface, such that interrupts are not disabled in the hardware but the
driver stops reporting that it consumed the interrupt.
The fix is to set the driver's "down" flag later in the routine,
after all the timers and such have exited, preventing the interrupt
handler from being called and exiting early without handling the
interrupt.
CC: Anupam Chanda <anupamc@vmware.com> Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> Signed-off-by: Andi Kleen <ak@linux.intel.com>
This patch (as1437) fixes a bug in the usb-serial autosuspend
handling. Since the usb-serial core now has autosuspend support, it
must set the .supports_autosuspend member in every serial driver it
registers. Otherwise the usb_autopm_get_interface() call won't work.
This fixes Bugzilla #23012.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Reported-by: Kevin Smith <thirdwiggin@gmail.com> Reported-and-tested-by: Simon Gerber <gesimu@gmail.com> Reported-and-tested-by: Matteo Croce <matteo@openwrt.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> Signed-off-by: Andi Kleen <ak@linux.intel.com>
This patch (as1435) fixes an obscure and unlikely race in ehci-hcd.
When an async URB is unlinked, the corresponding QH is removed from
the async list. If the QH's endpoint is then disabled while the URB
is being given back, ehci_endpoint_disable() won't find the QH on the
async list, causing it to believe that the QH has been lost. This
will lead to a memory leak at best and quite possibly to an oops.
The solution is to trust usbcore not to lose track of endpoints. If
the QH isn't on the async list then it doesn't need to be taken off
the list, but the driver should still wait for the QH to become IDLE
before disabling it.
In theory this fixes Bugzilla #20182. In fact the race is so rare
that it's not possible to tell whether the bug is still present.
However, adding delays and making other changes to force the race
seems to show that the patch works.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Reported-by: Stefan Richter <stefanr@s5r6.in-berlin.de> Signed-off-by: Andi Kleen <ak@linux.intel.com> CC: David Brownell <david-b@pacbell.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Structure usbdevfs_connectinfo is copied to userland with padding byted
after "slow" field uninitialized. It leads to leaking of contents of
kernel stack memory.
Structure iowarrior_info is copied to userland with padding byted
between "serial" and "revision" fields uninitialized. It leads to
leaking of contents of kernel stack memory.
When huawei datacard with PID 0x14AC is insterted into Linux system, the
present kernel will load the "option" driver to all the interfaces. But
actually, some interfaces run as other function and do not need "option"
driver.
In this path, we modify the id_tables, when the PID is 0x14ac ,VID is
0x12d1, Only when the interface's Class is 0xff,Subclass is 0xff, Pro is
0xff, it does need "option" driver.
Add the USB IDs for the Milkymist One FTDI-based JTAG/serial adapter
(http://projects.qi-hardware.com/index.php/p/mmone-jtag-serial-cable/)
to the ftdi_sio driver and disable the first serial channel (used as
JTAG from userspace).
musb driver still may write MUSB_DEVCTL register after clock is disabled
in musb_platform_exit, which may cause the kernel oops[1] when musb_hdrc
module is loaded for the 2nd time.
The patch fixes the kernel oops in this case.
[1] kernel oops when loading musb_hdrc module for the 2nd time
Disabling SuperSpeed ports is a Very Bad Thing (TM). It disables
SuperSpeed terminations, which means that devices will never connect at
SuperSpeed on that port. For USB 2.0/1.1 ports, disabling the port meant
that the USB core could always get a connect status change later. That's
not true with USB 3.0 ports.
Do not let the USB core disable SuperSpeed ports. We can't rely on the
device speed in the port status registers, since that isn't valid until
there's a USB device connected to the port. Instead, we use the port
speed array that's created from the Extended Capabilities registers.
Signed-off-by: Sarah Sharp <sarah.a.sharp@linux.intel.com> Tested-by: Don Zickus <dzickus@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> Signed-off-by: Andi Kleen <ak@linux.intel.com>
An xHCI host controller contains USB 2.0 and USB 3.0 ports, which can
occur in any order in the PORTSC registers. We cannot read the port speed
bits in the PORTSC registers at init time to determine the port speed,
since those bits are only valid when a USB device is plugged into the
port.
Instead, we read the "Supported Protocol Capability" registers in the xHC
Extended Capabilities space. Those describe the protocol, port offset in
the PORTSC registers, and port count. We use those registers to create
two arrays of pointers to the PORTSC registers, one for USB 3.0 ports, and
another for USB 2.0 ports. A third array keeps track of the port protocol
major revision, and is indexed with the internal xHCI port number.
This commit is a bit big, but it should be queued for stable because the "Don't
let the USB core disable SuperSpeed ports" patch depends on it. There is no
other way to determine which ports are SuperSpeed ports without this patch.
Signed-off-by: Sarah Sharp <sarah.a.sharp@linux.intel.com> Tested-by: Don Zickus <dzickus@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> Signed-off-by: Andi Kleen <ak@linux.intel.com>
We have been having problems with the USB-IF Gold Tree tests when plugging
and unplugging devices from the tree. I have seen that the reset-device
and configure-endpoint commands, which are invoked from
xhci_discover_or_reset_device() and xhci_configure_endpoint(), will sometimes
time out.
After much debugging, I determined that the commands themselves do not actually
time out, but rather their completion events do not get delivered to the right
place.
This happens when the command ring has just wrapped around, and it's enqueue
pointer is left pointing to the link TRB. xhci_discover_or_reset_device() and
xhci_configure_endpoint() use the enqueue pointer directly as their command
TRB pointer, without checking whether it's pointing to the link TRB.
When the completion event arrives, if the command TRB is pointing to the link
TRB, the check against the command ring dequeue pointer in
handle_cmd_in_cmd_wait_list() fails, so the completion inside the command does
not get signaled.
The patch below fixes the timeout problem for me.
This should be queued for the 2.6.35 and 2.6.36 stable trees.
Signed-off-by: Paul Zimmerman <paulz@synopsys.com> Signed-off-by: Sarah Sharp <sarah.a.sharp@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> Signed-off-by: Andi Kleen <ak@linux.intel.com>
USB2.0 spec 9.6.6 says: For all endpoints, bit 10..0 specify the maximum
packet size(in bytes).
So the wMaxPacketSize mask should be 0x7ff rather than 0x3ff.
This patch should be queued for the stable tree. The bug in
xhci_endpoint_init() was present as far back as 2.6.31, and the bug in
xhci_get_max_esit_payload() was present when the function was introduced
in 2.6.34.
This code seems to be asking for a shared read/write mapping of 16MB worth of
BAR0 starting at file offset 0, and letting the kernel assign a starting
address. Unfortunately, this -EINVAL causes X not to start. Looking into
dmesg, there's a complaint like so:
process "Xorg" tried to map 0x01000000 bytes at page 0x00000000 on 0000:07:00.0 BAR 0 (start 0x 96000000, size 0x 1000000)
It looks like the logic here is set up such that when the mmap call comes via
sysfs, the check in pci_mmap_fits wants vma->vm_pgoff to be between the
resource's start and end address, and the end of the vma to be no farther than
the end. However, the sysfs PCI resource files always start at offset zero,
which means that this test always fails for programs that mmap the sysfs files.
Given the comment in the original commit 3b519e4ea618b6943a82931630872907f9ac2c2b, I _think_ the old procfs files
require that the file offset be equal to the resource's base address when
mmapping.
I think what we want here is for pci_start to be 0 when mmap_api ==
PCI_MMAP_PROCFS. The following patch makes that change, after which the Matrox
and Mach64 X drivers work again.
Acked-by: Martin Wilck <martin.wilck@ts.fujitsu.com> Signed-off-by: Darrick J. Wong <djwong@us.ibm.com> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> Signed-off-by: Andi Kleen <ak@linux.intel.com>
The checks for valid mmaps of PCI resources made through /proc/bus/pci files
that were introduced in 9eff02e2042f96fb2aedd02e032eca1c5333d767 have several
problems:
1. mmap() calls on /proc/bus/pci files are made with real file offsets > 0,
whereas under /sys/bus/pci/devices, the start of the resource corresponds
to offset 0. This may lead to false negatives in pci_mmap_fits(), which
implicitly assumes the /sys/bus/pci/devices layout.
2. The loop in proc_bus_pci_mmap doesn't skip empty resouces. This leads
to false positives, because pci_mmap_fits() doesn't treat empty resources
correctly (the calculated size is 1 << (8*sizeof(resource_size_t)-PAGE_SHIFT)
in this case!).
3. If a user maps resources with BAR > 0, pci_mmap_fits will emit bogus
WARNINGS for the first resources that don't fit until the correct one is found.
On many controllers the first 2-4 BARs are used, and the others are empty.
In this case, an mmap attempt will first fail on the non-empty BARs
(including the "right" BAR because of 1.) and emit bogus WARNINGS because
of 3., and finally succeed on the first empty BAR because of 2.
This is certainly not the intended behaviour.
This patch addresses all 3 issues.
Updated with an enum type for the additional parameter for pci_mmap_fits().
SCSI commands may be issued between __scsi_add_device() and dev->sdev
assignment, so it's unsafe for ata_qc_complete() to dereference
dev->sdev->locked without checking whether it's NULL or not. Fix it.
[0.051203] CPU0: AMD QEMU Virtual CPU version 0.12.4 stepping 03
[0.052999] lockdep: fixing up alternatives.
[0.054105]
[0.054106] ===================================================
[0.054999] [ INFO: suspicious rcu_dereference_check() usage. ]
[0.054999] ---------------------------------------------------
[0.054999] kernel/sched.c:616 invoked rcu_dereference_check() without protection!
[0.054999]
[0.054999] other info that might help us debug this:
[0.054999]
[0.054999]
[0.054999] rcu_scheduler_active = 1, debug_locks = 1
[0.054999] 3 locks held by swapper/1:
[0.054999] #0: (cpu_add_remove_lock){+.+.+.}, at: [<ffffffff814be933>] cpu_up+0x42/0x6a
[0.054999] #1: (cpu_hotplug.lock){+.+.+.}, at: [<ffffffff810400d8>] cpu_hotplug_begin+0x2a/0x51
[0.054999] #2: (&rq->lock){-.-...}, at: [<ffffffff814be2f7>] init_idle+0x2f/0x113
[0.054999]
[0.054999] stack backtrace:
[0.054999] Pid: 1, comm: swapper Not tainted 2.6.35 #1
[0.054999] Call Trace:
[0.054999] [<ffffffff81068054>] lockdep_rcu_dereference+0x9b/0xa3
[0.054999] [<ffffffff810325c3>] task_group+0x7b/0x8a
[0.054999] [<ffffffff810325e5>] set_task_rq+0x13/0x40
[0.054999] [<ffffffff814be39a>] init_idle+0xd2/0x113
[0.054999] [<ffffffff814be78a>] fork_idle+0xb8/0xc7
[0.054999] [<ffffffff81068717>] ? mark_held_locks+0x4d/0x6b
[0.054999] [<ffffffff814bcebd>] do_fork_idle+0x17/0x2b
[0.054999] [<ffffffff814bc89b>] native_cpu_up+0x1c1/0x724
[0.054999] [<ffffffff814bcea6>] ? do_fork_idle+0x0/0x2b
[0.054999] [<ffffffff814be876>] _cpu_up+0xac/0x127
[0.054999] [<ffffffff814be946>] cpu_up+0x55/0x6a
[0.054999] [<ffffffff81ab562a>] kernel_init+0xe1/0x1ff
[0.054999] [<ffffffff81003854>] kernel_thread_helper+0x4/0x10
[0.054999] [<ffffffff814c353c>] ? restore_args+0x0/0x30
[0.054999] [<ffffffff81ab5549>] ? kernel_init+0x0/0x1ff
[0.054999] [<ffffffff81003850>] ? kernel_thread_helper+0x0/0x10
[0.056074] Booting Node 0, Processors #1lockdep: fixing up alternatives.
[0.130045] #2lockdep: fixing up alternatives.
[0.203089] #3 Ok.
[0.275286] Brought up 4 CPUs
[0.276005] Total of 4 processors activated (16017.17 BogoMIPS).
The cgroup_subsys_state structures referenced by idle tasks are never
freed, because the idle tasks should be part of the root cgroup,
which is not removable.
The problem is that while we do in-fact hold rq->lock, the newly spawned
idle thread's cpu is not yet set to the correct cpu so the lockdep check
in task_group():
lockdep_is_held(&task_rq(p)->lock)
will fail.
But this is a chicken and egg problem. Setting the CPU's runqueue requires
that the CPU's runqueue already be set. ;-)
So insert an RCU read-side critical section to avoid the complaint.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> Signed-off-by: Andi Kleen <ak@linux.intel.com>
In commit f1befe71 Chris Wilson added some code to clear the full gtt
on g33/pineview instead of just the mappable part. The code looks like
it was copy-pasted from agp/intel-gtt.c, at least an identical piece
of code is still there (in intel_i830_init_gtt_entries). This lead to
a regression in 2.6.35 which was supposedly fixed in commit e7b96f28
Now this commit makes absolutely no sense to me. It seems to be
slightly confused about chipset generations - it references docs for
4th gen but the regression concerns 3rd gen g33. Luckily the the g33
gmch docs are available with the GMCH Graphics Control pci config
register definitions. The other (bigger problem) is that the new
check in there uses the i830 stolen mem bits (.5M, 1M or 8M of stolen
mem). They are different since the i855GM.
The most likely case is that it hits the 512M fallback, which was
probably the right thing for the boxes this was tested on.
So the original approach by Chris Wilson seems to be wrong and the
current code is definitely wrong. There is a third approach by Jesse
Barnes from his RFC patch "Who wants a bigger GTT mapping range?"
where he simply shoves g33 in the same clause like later chipset
generations.
I've asked him and Jesse confirmed that this should work. So implement
it.
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=16891$ Tested-by: Anisse Astier <anisse@astier.eu> Signed-off-by: Anisse Astier <anisse@astier.eu> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> Signed-off-by: Andi Kleen <ak@linux.intel.com>
Commit d09c23de intended to add a 30ms delay to give the ADD time to
detect any TVs connected. However, it used the sdvo->is_tv flag to do so
which is dependent upon the previous detection result and not whether the
output supports TVs.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> Signed-off-by: Andi Kleen <ak@linux.intel.com>
Note: this patch targets 2.6.37 and tries to be as simple as possible.
That is why it adds more copy-and-paste horror into fs/compat.c and
uglifies fs/exec.c, this will be cleanuped later.
compat_copy_strings() plays with bprm->vma/mm directly and thus has
two problems: it lacks the RLIMIT_STACK check and argv/envp memory
is not visible to oom killer.
Export acct_arg_size() and get_arg_page(), change compat_copy_strings()
to use get_arg_page(), change compat_do_execve() to do acct_arg_size(0)
as do_execve() does.
Add the fatal_signal_pending/cond_resched checks into compat_count() and
compat_copy_strings(), this matches the code in fs/exec.c and certainly
makes sense.
Brad Spengler published a local memory-allocation DoS that
evades the OOM-killer (though not the virtual memory RLIMIT):
http://www.grsecurity.net/~spender/64bit_dos.c
execve()->copy_strings() can allocate a lot of memory, but
this is not visible to oom-killer, nobody can see the nascent
bprm->mm and take it into account.
With this patch get_arg_page() increments current's MM_ANONPAGES
counter every time we allocate the new page for argv/envp. When
do_execve() succeds or fails, we change this counter back.
Technically this is not 100% correct, we can't know if the new
page is swapped out and turn MM_ANONPAGES into MM_SWAPENTS, but
I don't think this really matters and everything becomes correct
once exec changes ->mm or fails.
Compared to upstream:
before 2.6.36 kernel, oom-killer's badness() takes
mm->total_vm into account and nothing else. So
acct_arg_size() has to play with this counter too.
If there was no connector mapped to the encoder, atombios_get_encoder_mode()
returned 0 which is the id for DP. Return something sane instead based on
the encoder id. This avoids hitting the DP paths on non-DP encoders.
Signed-off-by: Alex Deucher <alexdeucher@gmail.com> Signed-off-by: Dave Airlie <airlied@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> Signed-off-by: Andi Kleen <ak@linux.intel.com>
If the iovec is being set up in a way that causes uaddr + PAGE_SIZE
to overflow, we could end up attempting to map a huge number of
pages. Check for this invalid input type.
70 hours into some stress tests of a 2.6.32-based enterprise kernel, we
ran into a NULL dereference in here:
int block_is_partially_uptodate(struct page *page, read_descriptor_t *desc,
unsigned long from)
{
----> struct inode *inode = page->mapping->host;
It looks like page->mapping was the culprit. (xmon trace is below).
After closer examination, I realized that do_generic_file_read() does a
find_get_page(), and eventually locks the page before calling
block_is_partially_uptodate(). However, it doesn't revalidate the
page->mapping after the page is locked. So, there's a small window
between the find_get_page() and ->is_partially_uptodate() where the page
could get truncated and page->mapping cleared.
We _have_ a reference, so it can't get reclaimed, but it certainly
can be truncated.
I think the correct thing is to check page->mapping after the
trylock_page(), and jump out if it got truncated. This patch has been
running in the test environment for a month or so now, and we have not
seen this bug pop up again.
Per task latencytop accumulator prematurely terminates due to erroneous
placement of latency_record_count. It should be incremented whenever a
new record is allocated instead of increment on every latencytop event.
Also fix search iterator to only search known record events instead of
blindly searching all pre-allocated space.
Signed-off-by: Ken Chen <kenchen@google.com> Signed-off-by: Andi Kleen <ak@linux.intel.com> Reviewed-by: Arjan van de Ven <arjan@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Salman Qazi describes the following radix-tree bug:
In the following case, we get can get a deadlock:
0. The radix tree contains two items, one has the index 0.
1. The reader (in this case find_get_pages) takes the rcu_read_lock.
2. The reader acquires slot(s) for item(s) including the index 0 item.
3. The non-zero index item is deleted, and as a consequence the other item is
moved to the root of the tree. The place where it used to be is queued for
deletion after the readers finish.
3b. The zero item is deleted, removing it from the direct slot, it remains in
the rcu-delayed indirect node.
4. The reader looks at the index 0 slot, and finds that the page has 0 ref
count
5. The reader looks at it again, hoping that the item will either be freed or
the ref count will increase. This never happens, as the slot it is looking
at will never be updated. Also, this slot can never be reclaimed because
the reader is holding rcu_read_lock and is in an infinite loop.
The fix is to re-use the same "indirect" pointer case that requires a slot
lookup retry into a general "retry the lookup" bit.
The NF_HOOK_COND returns 0 when it shouldn't due to what I believe to be an
error in the code as the order of operations is not what was intended. C will
evalutate == before =. Which means ret is getting set to the bool result,
rather than the return value of the function call. The code says
if (ret = function() == 1)
when it meant to say:
if ((ret = function()) == 1)
Normally the compiler would warn, but it doesn't notice it because its
a actually complex conditional and so the wrong code is wrapped in an explict
set of () [exactly what the compiler wants you to do if this was intentional].
Fixing this means that errors when netfilter denies a packet get propagated
back up the stack rather than lost.
Problem introduced by commit 2249065f (netfilter: get rid of the grossness
in netfilter.h).
Signed-off-by: Eric Paris <eparis@redhat.com> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> Signed-off-by: Andi Kleen <ak@linux.intel.com>