Chen Gang [Fri, 11 Jan 2013 22:32:17 +0000 (14:32 -0800)]
MAINTAINERS: Omar had moved
Signed-off-by: Chen Gang <gang.chen@asianux.com> Cc: Omar Ramirez Luna <omar.ramirez@ti.com> Cc: Omar Ramirez Luna <omar.ramirez@copitl.com> Cc: David Miller <davem@davemloft.net> Cc: Greg KH <gregkh@linuxfoundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Mel Gorman [Fri, 11 Jan 2013 22:32:16 +0000 (14:32 -0800)]
mm: compaction: partially revert capture of suitable high-order page
Eric Wong reported on 3.7 and 3.8-rc2 that ppoll() got stuck when
waiting for POLLIN on a local TCP socket. It was easier to trigger if
there was disk IO and dirty pages at the same time and he bisected it to
commit 1fb3f8ca0e92 ("mm: compaction: capture a suitable high-order page
immediately when it is made available").
The intention of that patch was to improve high-order allocations under
memory pressure after changes made to reclaim in 3.6 drastically hurt
THP allocations but the approach was flawed. For Eric, the problem was
that page->pfmemalloc was not being cleared for captured pages leading
to a poor interaction with swap-over-NFS support causing the packets to
be dropped. However, I identified a few more problems with the patch
including the fact that it can increase contention on zone->lock in some
cases which could result in async direct compaction being aborted early.
In retrospect the capture patch took the wrong approach. What it should
have done is mark the pageblock being migrated as MIGRATE_ISOLATE if it
was allocating for THP and avoided races that way. While the patch was
showing to improve allocation success rates at the time, the benefit is
marginal given the relative complexity and it should be revisited from
scratch in the context of the other reclaim-related changes that have
taken place since the patch was first written and tested. This patch
partially reverts commit 1fb3f8ca0e92 ("mm: compaction: capture a
suitable high-order page immediately when it is made available").
Reported-and-tested-by: Eric Wong <normalperson@yhbt.net> Tested-by: Eric Dumazet <eric.dumazet@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Mel Gorman <mgorman@suse.de> Cc: David Miller <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Mike Frysinger [Fri, 11 Jan 2013 22:32:13 +0000 (14:32 -0800)]
linux/audit.h: move ptrace.h include to kernel header
While the kernel internals want pt_regs (and so it includes
linux/ptrace.h), the user version of audit.h does not need it. So move
the include out of the uapi version.
This avoids issues where people want the audit defines and userland
ptrace api. Including both the kernel ptrace and the userland ptrace
headers can easily lead to failure.
Signed-off-by: Mike Frysinger <vapier@gentoo.org> Cc: Eric Paris <eparis@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Andrew Morton [Fri, 11 Jan 2013 22:32:11 +0000 (14:32 -0800)]
kernel/audit.c: avoid negative sleep durations
audit_log_start() performs the same jiffies comparison in two places.
If sufficient time has elapsed between the two comparisons, the second
one produces a negative sleep duration:
schedule_timeout: wrong timeout value fffffffffffffff0
Pid: 6606, comm: trinity-child1 Not tainted 3.8.0-rc1+ #43
Call Trace:
schedule_timeout+0x305/0x340
audit_log_start+0x311/0x470
audit_log_exit+0x4b/0xfb0
__audit_syscall_exit+0x25f/0x2c0
sysret_audit+0x17/0x21
Fix it by performing the comparison a single time.
Reported-by: Dave Jones <davej@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Eric Paris <eparis@redhat.com> Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Kees Cook [Fri, 11 Jan 2013 22:32:07 +0000 (14:32 -0800)]
audit: catch possible NULL audit buffers
It's possible for audit_log_start() to return NULL. Handle it in the
various callers.
Signed-off-by: Kees Cook <keescook@chromium.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Eric Paris <eparis@redhat.com> Cc: Jeff Layton <jlayton@redhat.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Julien Tinnes <jln@google.com> Cc: Will Drewry <wad@google.com> Cc: Steve Grubb <sgrubb@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Kees Cook [Fri, 11 Jan 2013 22:32:05 +0000 (14:32 -0800)]
audit: create explicit AUDIT_SECCOMP event type
The seccomp path was using AUDIT_ANOM_ABEND from when seccomp mode 1
could only kill a process. While we still want to make sure an audit
record is forced on a kill, this should use a separate record type since
seccomp mode 2 introduces other behaviors.
In the case of "handled" behaviors (process wasn't killed), only emit a
record if the process is under inspection. This change also fixes
userspace examination of seccomp audit events, since it was considered
malformed due to missing fields of the AUDIT_ANOM_ABEND event type.
Signed-off-by: Kees Cook <keescook@chromium.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Eric Paris <eparis@redhat.com> Cc: Jeff Layton <jlayton@redhat.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Julien Tinnes <jln@google.com> Acked-by: Will Drewry <wad@chromium.org> Acked-by: Steve Grubb <sgrubb@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Alexander Beregalov and Alex Xu reported similar bugs and Hillf Danton
identified that commit 5a505085f043 ("mm/rmap: Convert the struct
anon_vma::mutex to an rwsem") and commit 4fc3f1d66b1e ("mm/rmap,
migration: Make rmap_walk_anon() and try_to_unmap_anon() more scalable")
were likely the problem. Reverting these commits was reported to solve
the problem for Alexander.
Despite the reason for these commits, NUMA balancing is not the direct
source of the problem. split_huge_page() expects the anon_vma lock to
be exclusive to serialise the whole split operation. Ordinarily it is
expected that the anon_vma lock would only be required when updating the
avcs but THP also uses the anon_vma rwsem for collapse and split
operations where the page lock or compound lock cannot be used (as the
page is changing from base to THP or vice versa) and the page table
locks are insufficient.
This patch takes the anon_vma lock for write to serialise against parallel
split_huge_page as THP expected before the conversion to rwsem.
Reported-and-tested-by: Zhouping Liu <zliu@redhat.com> Reported-by: Alexander Beregalov <a.beregalov@gmail.com> Reported-by: Alex Xu <alex_y_xu@yahoo.ca> Signed-off-by: Mel Gorman <mgorman@suse.de> Cc: Andrea Arcangeli <aarcange@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
which is incomplete, and causes the false positive report from lockdep
below.
Annotate the fact that mmap_sem is used as an outter lock to serialize
taking of all the anon_vma rwsems at once no matter the order, using the
down_write_nest_lock() primitive.
This patch fixes this lockdep report:
=============================================
[ INFO: possible recursive locking detected ] 3.8.0-rc2-00036-g5f73896 #171 Not tainted
---------------------------------------------
qemu-kvm/2315 is trying to acquire lock:
(&anon_vma->rwsem){+.+...}, at: mm_take_all_locks+0x149/0x1b0
but task is already holding lock:
(&anon_vma->rwsem){+.+...}, at: mm_take_all_locks+0x149/0x1b0
other info that might help us debug this:
Possible unsafe locking scenario:
Jiri Kosina [Fri, 11 Jan 2013 22:31:56 +0000 (14:31 -0800)]
lockdep, rwsem: provide down_write_nest_lock()
down_write_nest_lock() provides a means to annotate locking scenario
where an outer lock is guaranteed to serialize the order nested locks
are being acquired.
This is analogoue to already existing mutex_lock_nest_lock() and
spin_lock_nest_lock().
Max Filippov [Fri, 11 Jan 2013 22:31:52 +0000 (14:31 -0800)]
mm: bootmem: fix free_all_bootmem_core() with odd bitmap alignment
Currently free_all_bootmem_core ignores that node_min_pfn may be not
multiple of BITS_PER_LONG. Eg commit 6dccdcbe2c3e ("mm: bootmem: fix
checking the bitmap when finally freeing bootmem") shifts vec by lower
bits of start instead of lower bits of idx. Also
if (IS_ALIGNED(start, BITS_PER_LONG) && vec == ~0UL)
assumes that vec bit 0 corresponds to start pfn, which is only true when
node_min_pfn is a multiple of BITS_PER_LONG. Also loop in the else
clause can double-free pages (e.g. with node_min_pfn == start == 1,
map[0] == ~0 on 32-bit machine page 32 will be double-freed).
This bug causes the following message during xtensa kernel boot:
bootmem::free_all_bootmem_core nid=0 start=1 end=8000
BUG: Bad page state in process swapper pfn:00001
page:d04bd020 count:0 mapcount:-127 mapping: (null) index:0x2
page flags: 0x0()
Call Trace:
bad_page+0x8c/0x9c
free_pages_prepare+0x5e/0x88
free_hot_cold_page+0xc/0xa0
__free_pages+0x24/0x38
__free_pages_bootmem+0x54/0x56
free_all_bootmem_core$part$11+0xeb/0x138
free_all_bootmem+0x46/0x58
mem_init+0x25/0xa4
start_kernel+0x11e/0x25c
should_never_return+0x0/0x3be7
The fix is the following:
- always align vec so that its bit 0 corresponds to start
- provide BITS_PER_LONG bits in vec, if those bits are available in the
map
- don't free pages past next start position in the else clause.
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com> Cc: Gavin Shan <shangw@linux.vnet.ibm.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Tejun Heo <tj@kernel.org> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Joonsoo Kim <js1304@gmail.com> Cc: Prasad Koya <prasad.koya@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Laura Abbott [Fri, 11 Jan 2013 22:31:51 +0000 (14:31 -0800)]
mm: use aligned zone start for pfn_to_bitidx calculation
The current calculation in pfn_to_bitidx assumes that (pfn -
zone->zone_start_pfn) >> pageblock_order will return the same bit for
all pfn in a pageblock. If zone_start_pfn is not aligned to
pageblock_nr_pages, this may not always be correct.
Consider the following with pageblock order = 10, zone start 2MB:
This means that calling {get,set}_pageblock_migratetype on a single page
will not set the migratetype for the full block. Fix this by rounding
down zone_start_pfn when doing the bitidx calculation.
For our use case, the effects of this bug were mostly tied to the fact
that CMA allocations would either take a long time or fail to happen.
Depending on the driver using CMA, this could result in anything from
visual glitches to application failures.
Signed-off-by: Laura Abbott <lauraa@codeaurora.org> Acked-by: Mel Gorman <mgorman@suse.de> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Xi Wang [Fri, 11 Jan 2013 22:31:48 +0000 (14:31 -0800)]
fs/exec.c: work around icc miscompilation
The tricky problem is this check:
if (i++ >= max)
icc (mis)optimizes this check as:
if (++i > max)
The check now becomes a no-op since max is MAX_ARG_STRINGS (0x7FFFFFFF).
This is "allowed" by the C standard, assuming i++ never overflows,
because signed integer overflow is undefined behavior. This
optimization effectively reverts the previous commit 362e6663ef23
("exec.c, compat.c: fix count(), compat_count() bounds checking") that
tries to fix the check.
This patch simply moves ++ after the check.
Signed-off-by: Xi Wang <xi.wang@gmail.com> Cc: Jason Baron <jbaron@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Lin Feng [Fri, 11 Jan 2013 22:31:44 +0000 (14:31 -0800)]
mm: memblock: fix wrong memmove size in memblock_merge_regions()
The memmove span covers from (next+1) to the end of the array, and the
index of next is (i+1), so the index of (next+1) is (i+2). So the size
of remaining array elements is (type->cnt - (i + 2)).
Since the remaining elements of the memblock array are move forward by
one element and there is only one additional element caused by this bug.
So there won't be any write overflow here but read overflow. It may
read one more element out of the array address if the array happens to
be full. Commonly it doesn't matter at all but if the array happens to
be located at the end a memblock, it may cause a invalid read operation
for the physical address doesn't exist.
There are 2 *happens to be* here, so I think the probability is quite
low, I don't know if any guy is haunted by this bug before.
Mostly I think it's user-invisible.
Signed-off-by: Lin Feng <linfeng@cn.fujitsu.com> Acked-by: Tejun Heo <tj@kernel.org> Reviewed-by: Wanpeng Li <liwanp@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Maxime Ripard [Fri, 11 Jan 2013 22:31:43 +0000 (14:31 -0800)]
drivers/video/ssd1307fb.c: fix bit order bug in the byte translation function
This was leading to a strange behaviour when using the fbcon driver on
top of this one: the letters were in the right order, but each letter
had a vertical symmetry.
This was because the addressing was right for the byte, but the
addressing of each individual bit was inverted.
Signed-off-by: Maxime Ripard <maxime.ripard@free-electrons.com> Cc: Brian Lilly <brian@crystalfontz.com> Cc: Greg Kroah-Hartman <gregkh@linux-foundation.org> Cc: Florian Tobias Schandinat <FlorianSchandinat@gmx.de> Cc: Thomas Petazzoni <thomas@free-electrons.com> Cc: Tomi Valkeinen <tomi.valkeinen@ti.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Mel Gorman [Fri, 11 Jan 2013 22:31:40 +0000 (14:31 -0800)]
mm: migrate: check page_count of THP before migrating
Hugh Dickins pointed out that migrate_misplaced_transhuge_page() does
not check page_count before migrating like base page migration and
khugepage. He could not see why this was safe and he is right.
The potential impact of the bug is avoided due to the limitations of
NUMA balancing. The page_mapcount() check ensures that only a single
address space is using this page and as THPs are typically private it
should not be possible for another address space to fault it in
parallel. If the address space has one associated task then it's
difficult to have both a GUP pin and be referencing the page at the same
time. If there are multiple tasks then a buggy scenario requires that
another thread be accessing the page while the direct IO is in flight.
This is dodgy behaviour as there is a possibility of corruption with or
without THP migration. It would be
While we happen to be safe for the most part it is shoddy to depend on
such "safety" so this patch checks the page count similar to anonymous
pages. Note that this does not mean that the page_mapcount() check can
go away. If we were to remove the page_mapcount() check the the THP
would have to be unmapped from all referencing PTEs, replaced with
migration PTEs and restored properly afterwards.
WARNING: drivers/rtc/rtc-da9055.o(.text+0xa71): Section mismatch in reference from the function da9055_rtc_probe() to the function .init.text:da9055_rtc_device_init()
Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
David Decotigny [Fri, 11 Jan 2013 22:31:36 +0000 (14:31 -0800)]
lib: cpu_rmap: avoid flushing all workqueues
In some cases, free_irq_cpu_rmap() is called while holding a lock (eg
rtnl). This can lead to deadlocks, because it invokes
flush_scheduled_work() which ends up waiting for whole system workqueue
to flush, but some pending works might try to acquire the lock we are
already holding.
This commit uses reference-counting to replace
irq_run_affinity_notifiers(). It also removes
irq_run_affinity_notifiers() altogether.
[akpm@linux-foundation.org: eliminate free_cpu_rmap, rename cpu_rmap_reclaim() to cpu_rmap_release(), propagate kref_put() retval from cpu_rmap_put()] Signed-off-by: David Decotigny <decot@googlers.com> Reviewed-by: Ben Hutchings <bhutchings@solarflare.com> Acked-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Or Gerlitz <ogerlitz@mellanox.com> Acked-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jesse Barnes [Wed, 14 Nov 2012 20:43:31 +0000 (20:43 +0000)]
x86/Sandy Bridge: reserve pages when integrated graphics is present
SNB graphics devices have a bug that prevent them from accessing certain
memory ranges, namely anything below 1M and in the pages listed in the
table. So reserve those at boot if set detect a SNB gfx device on the
CPU to avoid GPU hangs.
Stephane Marchesin had a similar patch to the page allocator awhile
back, but rather than reserving pages up front, it leaked them at
allocation time.
[ hpa: made a number of stylistic changes, marked arrays as static
const, and made less verbose; use "memblock=debug" for full
verbosity. ]
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Krzysztof Mazur [Fri, 11 Jan 2013 22:20:09 +0000 (23:20 +0100)]
cpuidle: fix number of initialized/destroyed states
Commit bf4d1b5ddb78f86078ac6ae0415802d5f0c68f92 (cpuidle: support
multiple drivers) changed the number of initialized state kobjects
in cpuidle_add_state_sysfs() from device->state_count to
drv->state_count, but left device->state_count in
cpuidle_remove_state_sysfs(). The values of these two fields may be
different, in which case a NULL pointer dereference may happen in
cpuidle_remove_state_sysfs(), for example. Fix this problem by making
cpuidle_add_state_sysfs() use device->state_count too (which restores
the original behavior of it).
[rjw: Changelog] Signed-off-by: Krzysztof Mazur <krzysiek@podlesie.net> Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Steven Rostedt [Fri, 11 Jan 2013 21:14:10 +0000 (16:14 -0500)]
tracing: Fix regression with irqsoff tracer and tracing_on file
Commit 02404baf1b47 "tracing: Remove deprecated tracing_enabled file"
removed the tracing_enabled file as it never worked properly and
the tracing_on file should be used instead. But the tracing_on file
didn't call into the tracers start/stop routines like the
tracing_enabled file did. This caused trace-cmd to break when it
enabled the irqsoff tracer.
If you just did "echo irqsoff > current_tracer" then it would work
properly. But the tool trace-cmd disables tracing first by writing
"0" into the tracing_on file. Then it writes "irqsoff" into
current_tracer and then writes "1" into tracing_on. Unfortunately,
the above commit changed the irqsoff tracer to check the tracing_on
status instead of the tracing_enabled status. If it's disabled then
it does not start the tracer internals.
The problem is that writing "1" into tracing_on does not call the
tracers "start" routine like writing "1" into tracing_enabled did.
This makes the irqsoff tracer not start when using the trace-cmd
tool, and is a regression for userspace.
Simple fix is to have the tracing_on file call the tracers start()
method when being enabled (and the stop() method when disabled).
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Oliver Neukum [Thu, 29 Nov 2012 14:05:57 +0000 (15:05 +0100)]
USB: hub: handle claim of enabled remote wakeup after reset
Some touchscreens have buggy firmware which claims
remote wakeup to be enabled after a reset. They nevertheless
crash if the feature is cleared by the host.
Add a check for reset resume before checking for
an enabled remote wakeup feature. On compliant
devices the feature must be cleared after a reset anyway.
Signed-off-by: Oliver Neukum <oneukum@suse.de> Acked-by: Alan Stern <stern@rowland.harvard.edu> Cc: stable <stable@vger.kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Linus Torvalds [Fri, 11 Jan 2013 20:09:04 +0000 (12:09 -0800)]
Merge tag 'nfs-for-3.8-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs
Pull NFS client bugfix from Trond Myklebust:
- Fix a socket lock leak in net/sunrpc/xprt.c
* tag 'nfs-for-3.8-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
SUNRPC: Ensure we release the socket write lock if the rpc_task exits early
Fabio Estevam [Mon, 10 Dec 2012 02:58:36 +0000 (00:58 -0200)]
usb: imx21-hcd: Include missing linux/module.h
Include <linux/module.h>, so that the following errors are fixed:
drivers/usb/host/imx21-hcd.c:1929:20: error: expected declaration specifiers or '...' before string constant
drivers/usb/host/imx21-hcd.c:1930:15: error: expected declaration specifiers or '...' before string constant
drivers/usb/host/imx21-hcd.c:1931:16: error: expected declaration specifiers or '...' before string constant
drivers/usb/host/imx21-hcd.c:1932:14: error: expected declaration specifiers or '...' before string constant
Linus Torvalds [Fri, 11 Jan 2013 17:05:28 +0000 (09:05 -0800)]
Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux
Pull intel DRM fixes from Dave Airlie:
"Just intel fixes, including getting the Ironlake systems back to the
state they were in for 3.6."
* 'drm-fixes' of git://people.freedesktop.org/~airlied/linux:
drm/i915: Revert shrinker changes from "Track unbound pages"
drm/i915: Use pixel size for computing linear offsets into a sprite
drm/i915: Add DEBUG messages to all intel_create_user_framebuffer error paths
drm/i915: The sprite scaler on Ironlake also support YUV planes
drm: Only evict the blocks required to create the requested hole
drm/i915: Treat crtc->mode.clock == 0 as disabled
Revert "drm/i915: no lvds quirk for Zotac ZDBOX SD ID12/ID13"
drm/i915; Only increment the user-pin-count after successfully pinning the bo
Mel Gorman [Fri, 11 Jan 2013 09:27:01 +0000 (09:27 +0000)]
mm: compaction: Partially revert capture of suitable high-order page
Eric Wong reported on 3.7 and 3.8-rc2 that ppoll() got stuck when
waiting for POLLIN on a local TCP socket. It was easier to trigger if
there was disk IO and dirty pages at the same time and he bisected it to
commit 1fb3f8ca0e92 ("mm: compaction: capture a suitable high-order page
immediately when it is made available").
The intention of that patch was to improve high-order allocations under
memory pressure after changes made to reclaim in 3.6 drastically hurt
THP allocations but the approach was flawed. For Eric, the problem was
that page->pfmemalloc was not being cleared for captured pages leading
to a poor interaction with swap-over-NFS support causing the packets to
be dropped. However, I identified a few more problems with the patch
including the fact that it can increase contention on zone->lock in some
cases which could result in async direct compaction being aborted early.
In retrospect the capture patch took the wrong approach. What it should
have done is mark the pageblock being migrated as MIGRATE_ISOLATE if it
was allocating for THP and avoided races that way. While the patch was
showing to improve allocation success rates at the time, the benefit is
marginal given the relative complexity and it should be revisited from
scratch in the context of the other reclaim-related changes that have
taken place since the patch was first written and tested. This patch
partially reverts commit 1fb3f8ca "mm: compaction: capture a suitable
high-order page immediately when it is made available".
Reported-and-tested-by: Eric Wong <normalperson@yhbt.net> Tested-by: Eric Dumazet <eric.dumazet@gmail.com> Cc: stable@vger.kernel.org Signed-off-by: Mel Gorman <mgorman@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Dave Reisner [Wed, 2 Jan 2013 13:54:37 +0000 (08:54 -0500)]
debugfs: convert gid= argument from decimal, not octal
This patch technically breaks userspace, but I suspect that anyone who
actually used this flag would have encountered this brokenness, declared
it lunacy, and already sent a patch.
Thomas Schwinge [Fri, 16 Nov 2012 09:46:20 +0000 (10:46 +0100)]
sh: Fix FDPIC binary loader
Ensure that the aux table is properly initialized, even when optional features
are missing. Without this, the FDPIC loader did not work. This was meant to
be included in commit d5ab780305bb6d60a7b5a74f18cf84eb6ad153b1.
Signed-off-by: Thomas Schwinge <thomas@codesourcery.com> Cc: stable@vger.kernel.org Signed-off-by: Paul Mundt <lethal@linux-sh.org>
sh: clkfwk: bugfix: sh_clk_div_enable() care sh_clk_div_set_rate() if div6
764f4e4e33d18cde4dcaf8a0d860b749c6d6d08b
(sh: clkfwk: Use shared sh_clk_div_enable/disable())
shared enable/disable funcions for div4/div6.
But new sh_clk_div_enable() didn't care sh_clk_div_set_rate()
which is required on div6 clock.
This patch fixes it.
Signed-off-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com> Signed-off-by: Paul Mundt <lethal@linux-sh.org>
sh: define TASK_UNMAPPED_BASE as a page aligned constant
b4265f12340f809447b9a48055e88c444b480c89
(mm: use vm_unmapped_area() on sh architecture)
broke sh boot. This patch define TASK_UNMAPPED_BASE
as a page aligned constant to solve this issue.
Special thanks to Michel
Acked-by: Michel Lespinasse <walken@google.com> Signed-off-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com> Signed-off-by: Paul Mundt <lethal@linux-sh.org>
Takashi Iwai [Thu, 10 Jan 2013 13:06:38 +0000 (14:06 +0100)]
ALSA: usb-audio: Fix NULL dereference by access to non-existing substream
The commit [0d9741c0: ALSA: usb-audio: sync ep init fix for
audioformat mismatch] introduced the correction of parameters to be
set for sync EP. But since the new code assumes that the sync EP is
always paired with the data EP of another direction, it triggers Oops
when a device only with a single direction is used.
This patch adds a proper check of sync EP type and the presence of the
paired substream for avoiding the crash.
Eric Dumazet [Thu, 10 Jan 2013 16:18:47 +0000 (16:18 +0000)]
tcp: accept RST without ACK flag
commit c3ae62af8e755 (tcp: should drop incoming frames without ACK flag
set) added a regression on the handling of RST messages.
RST should be allowed to come even without ACK bit set. We validate
the RST by checking the exact sequence, as requested by RFC 793 and
5961 3.2, in tcp_validate_incoming()
Reported-by: Eric Wong <normalperson@yhbt.net> Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Tested-by: Eric Wong <normalperson@yhbt.net> Signed-off-by: David S. Miller <davem@davemloft.net>
Roland Dreier [Mon, 7 Jan 2013 19:45:16 +0000 (11:45 -0800)]
iscsi-target: Fix CmdSN comparison (use cmd->cmd_sn instead of cmd->stat_sn)
Commit 64c13330a389 ("iscsi-target: Fix bug in handling of ExpStatSN
ACK during u32 wrap-around") introduced a bug where we compare the
wrong SN against our ExpCmdSN.
Reported-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: Roland Dreier <roland@purestorage.com> Cc: stable@vger.kernel.org Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Roland Dreier [Wed, 2 Jan 2013 20:47:59 +0000 (12:47 -0800)]
target: Release se_cmd when LUN lookup fails for TMR
When transport_lookup_tmr_lun() fails and we return a task management
response from target_complete_tmr_failure(), we need to call
transport_cmd_check_stop_to_fabric() to release the last ref to the
cmd after calling se_tfo->queue_tm_rsp(), or else we will never remove
the failed TMR from the session command list (and we'll end up waiting
forever when trying to tear down the session).
(nab: Fix minor compile breakage)
Signed-off-by: Roland Dreier <roland@purestorage.com> Cc: stable@vger.kernel.org Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Roland Dreier [Wed, 2 Jan 2013 20:47:58 +0000 (12:47 -0800)]
target: Fix use-after-free in LUN RESET handling
If a backend IO takes a really long then an initiator might abort a
command, and then when it gives up on the abort, send a LUN reset too,
all before we process any of the original command or the abort. (The
abort will wait for the backend IO to complete too)
When the backend IO final completes (or fails), the abort handling
will proceed and queue up a "return aborted status" operation. Then,
while that's still pending, the LUN reset might find the original
command still on the LUN's list of commands and try to return aborted
status again, which leads to a use-after free when the first
se_tfo->queue_status call frees the command and then the second
se_tfo->queue_status call runs.
Fix this by removing a command from the LUN state_list when we first
are about to queue aborted status; we shouldn't do anything
LUN-related after we've started returning status, so this seems like
the correct thing to do.
Signed-off-by: Roland Dreier <roland@purestorage.com> Cc: stable@vger.kernel.org Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Roland Dreier [Wed, 2 Jan 2013 20:47:57 +0000 (12:47 -0800)]
target: Fix missing CMD_T_ACTIVE bit regression for pending WRITEs
This patch fixes a regression bug introduced during v3.6.x code with
the following commit to drop transport_add_cmd_to_queue(), which
originally re-set CMD_T_ACTIVE during pending WRITE I/O submission:
target: replace the processing thread with a TMR work queue
The following sequence happens for write commands (or any other
commands with a data out phase):
- The transport calls target_submit_cmd(), which sets CMD_T_ACTIVE in
cmd->transport_state and sets cmd->t_state to TRANSPORT_NEW_CMD.
- Things go on transport_generic_new_cmd(), which notices that the
command needs to transfer data, so it sets cmd->t_state to
TRANSPORT_WRITE_PENDING and calls transport_cmd_check_stop().
- transport_cmd_check_stop() clears CMD_T_ACTIVE in cmd->transport_state
and returns in the normal case.
- Then we continue on to call ->se_tfo->write_pending().
- The data comes back from the initiator, and the transport calls
target_execute_cmd(), which sets cmd->t_state to TRANSPORT_PROCESSING
and calls into the backend to actually write the data.
At this point, the backend might take a long time to complete the
command, since it has to do real IO. If an abort request comes in for
this command at this point, it will not wait for the command to finish
since CMD_T_ACTIVE is not set. Then when the command does finally
finish, we blow up with use-after-free.
Avoid this by setting CMD_T_ACTIVE in target_execute_cmd() so that
transport_wait_for_tasks() waits for the command to finish executing.
This matches the behavior from before commit 1389533ef944 ("target:
remove transport_generic_handle_data"), when data was signaled via
transport_generic_handle_data(), which set CMD_T_ACTIVE because it
called transport_add_cmd_to_queue().
Signed-off-by: Roland Dreier <roland@purestorage.com> Reported-by: Martin Svec <martin.svec@zoner.cz> Cc: Christoph Hellwig <hch@lst.de> Cc: stable@vger.kernel.org Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Mark Rustad [Fri, 21 Dec 2012 18:58:19 +0000 (10:58 -0800)]
tcm_fc: Do not report target role when target is not defined
Clear the target role when no target is provided for
the node performing a PRLI.
Signed-off-by: Mark Rustad <mark.d.rustad@intel.com> Reviewed-by: Bhanu Prakash Gollapudi <bprakash@broadcom.com>
Acked by Robert Love <robert.w.love@intel.com> Cc: stable@vger.kernel.org Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Mark Rustad [Fri, 21 Dec 2012 18:58:14 +0000 (10:58 -0800)]
tcm_fc: Do not indicate retry capability to initiators
When generating a PRLI response to an initiator, clear the
FCP_SPPF_RETRY bit in the response.
Signed-off-by: Mark Rustad <mark.d.rustad@intel.com> Reviewed-by: Bhanu Prakash Gollapudi <bprakash@broadcom.com>
Acked by Robert Love <robert.w.love@intel.com> Cc: stable@vger.kernel.org Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Hannes Reinecke [Mon, 17 Dec 2012 08:53:34 +0000 (09:53 +0100)]
target: Use TCM_NO_SENSE for initialisation
The compiler complained about uninitialized variables, so
use TCM_NO_SENSE here.
Signed-off-by: Hannes Reinecke <hare@suse.de> Cc: Nicholas Bellinger <nab@risingtidesystems.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Hannes Reinecke [Mon, 17 Dec 2012 08:53:33 +0000 (09:53 +0100)]
target: Introduce TCM_NO_SENSE
Introduce TCM_NO_SENSE, mapping to sense code
'Not ready, no additional sense information'.
Signed-off-by: Hannes Reinecke <hare@suse.de> Cc: Nicholas Bellinger <nab@risingtidesystems.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Randy Dunlap [Thu, 10 Jan 2013 01:13:00 +0000 (17:13 -0800)]
seq_file: fix new kernel-doc warnings
Fix kernel-doc warnings in fs/seq_file.c:
Warning(fs/seq_file.c:304): No description found for parameter 'whence'
Warning(fs/seq_file.c:304): Excess function parameter 'origin' description in 'seq_lseek'
Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Randy Dunlap [Thu, 10 Jan 2013 01:12:45 +0000 (17:12 -0800)]
audit: fix auditfilter.c kernel-doc warnings
Fix new kernel-doc warning in auditfilter.c:
Warning(kernel/auditfilter.c:1157): Excess function parameter 'uid' description in 'audit_receive_filter'
Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Eric Paris <eparis@redhat.com> Cc: linux-audit@redhat.com (subscribers-only) Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Randy Dunlap [Thu, 10 Jan 2013 01:12:35 +0000 (17:12 -0800)]
nfs: fix sunrpc/clnt.c kernel-doc warnings
Fix new kernel-doc warnings in clnt.c:
Warning(net/sunrpc/clnt.c:561): No description found for parameter 'flavor'
Warning(net/sunrpc/clnt.c:561): Excess function parameter 'auth' description in 'rpc_clone_client_set_auth'
Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: "J. Bruce Fields" <bfields@fieldses.org> Cc: linux-nfs@vger.kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Michal Simek [Thu, 10 Jan 2013 06:58:43 +0000 (06:58 +0000)]
net: ethernet: xilinx: Do not use NO_IRQ in axienet
This driver is used on Microblaze and will be used
on Arm Zynq.
Microblaze doesn't define NO_IRQ and no IRQ is 0.
Arm still uses NO_IRQ as -1 and there is no option
to connect IRQ to irq 0.
That's why <= 0 is only one option how to find out
undefined IRQ.
Signed-off-by: Michal Simek <michal.simek@xilinx.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Barak Witkowski [Thu, 10 Jan 2013 04:53:40 +0000 (04:53 +0000)]
bnx2x: Allow management traffic after boot from SAN
As part of the previous driver unload flow, whenever bnx2x is
loaded after the UNDI driver it closes all Rx traffic.
However, this leads to management traffic also being stopped until
the network interface associated with one of its functions gets loaded.
To remedy this, management traffic is re-opened once the 'cleaning'
after the previous driver ends.
Signed-off-by: Barak Witkowski <barak@broadcom.com> Signed-off-by: Yuval Mintz <yuvalmin@broadcom.com> Signed-off-by: Ariel Elior <ariele@broadcom.com> Signed-off-by: Eilon Greenstein <eilong@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Yuval Mintz [Thu, 10 Jan 2013 04:53:39 +0000 (04:53 +0000)]
bnx2x: Fix fastpath structures when memory allocation fails
When allocating Tx queues, if for some reason
(e.g., lack of memory) allocation fails, driver will incorrectly
calculate the pointers of the various queues.
This patch repositions all pointers in such a case to point at
sequential structures in memory, allowing the bnx2x macros to
be used correctly when accessing them.
Signed-off-by: Yuval Mintz <yuvalmin@broadcom.com> Signed-off-by: Ariel Elior <ariele@broadcom.com> Signed-off-by: Eilon Greenstein <eilong@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
bfin_mac: Restore hardware time-stamping dependency on BF518
Commit 70ac618c07 ("ptp: fixup Kconfig for two PHC drivers.") removed all
dependencies for the blackfin hardware time-stamping Kconfig entry. Hardware
time-stamping is only available on BF518 though. Since the Kconfig entry is
'default y', just updateing your kernel source and running `make defconfig` will
result in the the following build errors:
drivers/net/ethernet/adi/bfin_mac.c:694: error: implicit declaration of function ‘bfin_read_EMAC_PTP_CTL’
drivers/net/ethernet/adi/bfin_mac.c:702: error: implicit declaration of function ‘bfin_write_EMAC_PTP_FV3’
drivers/net/ethernet/adi/bfin_mac.c:712: error: implicit declaration of function ‘bfin_write_EMAC_PTP_CTL’
drivers/net/ethernet/adi/bfin_mac.c:717: error: implicit declaration of function ‘bfin_write_EMAC_PTP_FOFF’
...
This patch adds back the dependency on BF518, and since it does not make sense
to expose this config option when the blackfin MAC driver is not enabled also
restore the dependency on BFIN_MAC.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de> Acked-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
At the moment, we check owner when we enable queue in tun.
This seems redundant and will break some valid uses
where fd is passed around: I think TUNSETOWNER is there
to prevent others from attaching to a persistent device not
owned by them. Here the fd is already attached,
enabling/disabling queue is more like read/write.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Dan Carpenter [Wed, 9 Jan 2013 22:58:07 +0000 (22:58 +0000)]
bnx2x: move debugging code before the return
I move the return down a line after the debugging printk.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Ariel Elior <ariele@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Stefan Hajnoczi [Wed, 9 Jan 2013 21:59:48 +0000 (21:59 +0000)]
tuntap: refuse to re-attach to different tun_struct
Multiqueue tun devices support detaching a tun_file from its tun_struct
and re-attaching at a later point in time. This allows users to disable
a specific queue temporarily.
ioctl(TUNSETIFF) allows the user to specify the network interface to
attach by name. This means the user can attempt to attach to interface
"B" after detaching from interface "A".
The driver is not designed to support this so check we are re-attaching
to the right tun_struct. Failure to do so may lead to oops.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Romain Kuntz [Wed, 9 Jan 2013 21:06:03 +0000 (21:06 +0000)]
ipv6: use addrconf_get_prefix_route for prefix route lookup [v2]
Replace ip6_route_lookup() with addrconf_get_prefix_route() when
looking up for a prefix route. This ensures that the connected prefix
is looked up in the main table, and avoids the selection of other
matching routes located in different tables as well as blackhole
or prohibited entries.
In addition, this fixes an Opps introduced by commit 64c6d08e (ipv6:
del unreachable route when an addr is deleted on lo), that would occur
when a blackhole or prohibited entry is selected by ip6_route_lookup().
Such entries have a NULL rt6i_table argument, which is accessed by
__ip6_del_rt() when trying to lock rt6i_table->tb6_lock.
The function addrconf_is_prefix_route() is not used anymore and is
removed.
[v2] Minor indentation cleanup and log updates.
Signed-off-by: Romain Kuntz <r.kuntz@ipflavors.com> Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Romain Kuntz [Wed, 9 Jan 2013 14:02:26 +0000 (15:02 +0100)]
ipv6: fix the noflags test in addrconf_get_prefix_route
The tests on the flags in addrconf_get_prefix_route() does no make
much sense: the 'noflags' parameter contains the set of flags that
must not match with the route flags, so the test must be done
against 'noflags', and not against 'flags'.
Signed-off-by: Romain Kuntz <r.kuntz@ipflavors.com> Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
To fix this problem, we must eat skbs in tcp_recv_skb().
Remove the inline keyword from tcp_recv_skb() definition since
it has three call sites.
Reported-by: Christian Becker <c.becker@traviangames.com> Cc: Willy Tarreau <w@1wt.eu> Signed-off-by: Eric Dumazet <edumazet@google.com> Tested-by: Willy Tarreau <w@1wt.eu> Signed-off-by: David S. Miller <davem@davemloft.net>
Jerome Glisse [Wed, 9 Jan 2013 21:40:42 +0000 (16:40 -0500)]
radeon/kms: fix dma relocation checking
We were checking the index against the size of the relocation buffer
instead of against the last index. This fix kernel segfault when
userspace submit ill formated command stream/relocation buffer pair.
Signed-off-by: Jerome Glisse <jglisse@redhat.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Jerome Glisse [Tue, 8 Jan 2013 23:41:01 +0000 (18:41 -0500)]
radeon/kms: force rn50 chip to always report connected on analog output
Those rn50 chip are often connected to console remoting hw and load
detection often fails with those. Just don't try to load detect and
report connect.
Signed-off-by: Jerome Glisse <jglisse@redhat.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
Dave Airlie [Thu, 10 Jan 2013 21:47:25 +0000 (07:47 +1000)]
Merge branch 'drm-intel-fixes' of git://people.freedesktop.org/~danvet/drm-intel
Daniel writes:
"Pretty much all just major fixes:
- 2 pieces of duct-tape for the ilk bug.
- Sprite regression fixes from Chris.
- OOPS fix for a div-by-zero from Chris, regression due to the modeset
rework in 3.7, now brought to light by a benign change in 3.8.
- Fix interrupted bo pinning, used to work around CS coherency issues on
i830/i845 (kernel also has a w/a newly in 3.8, but pinning is more efficient if
possible)."
also removed the irq_disable/enable around kvm guest switch, which
is correct in itself. Unfortunately, there is a BUG ON that (correctly)
checks for preemptible to cover the call to rcu later on.
(Introduced with commit 8fa2206821953a50a3a02ea33fcfb3ced2fd9997
KVM: make guest mode entry to be rcu quiescent state)
This check might trigger depending on the kernel config.
Lets make sure that no preemption happens during kvm_guest_enter.
We can enable preemption again after the call to
rcu_virt_note_context_switch returns.
Please note that we continue to run s390 guests with interrupts
enabled.
Linus Torvalds [Thu, 10 Jan 2013 17:09:41 +0000 (09:09 -0800)]
Merge branch 'for_linus' of git://cavan.codon.org.uk/platform-drivers-x86
Pull x86 platform driver bugfixes from Matthew Garrett.
* 'for_linus' of git://cavan.codon.org.uk/platform-drivers-x86:
asus-laptop: Fix potential invalid pointer dereference
Update MAINTAINERS entry
asus-laptop: Do not call HWRS on init
sony-laptop: fix SNC buffer calls when SN06 returns Integers
samsung-laptop: Add quirk for broken acpi_video backlight on N250P
acer-wmi: add Aspire 5741G touchpad toggle key
acer-wmi: change to emit touchpad on off key
acer-wmi: fix obj is NULL but dereferenced
MAINTAINERS: change the mail address of acer-wmi/msi-laptop maintainer
Linus Torvalds [Thu, 10 Jan 2013 17:05:18 +0000 (09:05 -0800)]
Merge git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull KVM bugfixes from Marcelo Tosatti.
* git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: x86: use dynamic percpu allocations for shared msrs area
KVM: PPC: Book3S HV: Fix compilation without CONFIG_PPC_POWERNV
powerpc: Corrected include header path in kvm_para.h
Add rcu user eqs exception hooks for async page fault
Linus Torvalds [Thu, 10 Jan 2013 17:03:16 +0000 (09:03 -0800)]
Merge tag 'trace-3.8-rc2-regression-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing regression fix from Steven Rostedt:
"A change that came in this merge window broke the writing to the
trace_options file. It causes garbage to be read during the compare
of option names, and breaks setting options via the trace_options
file, although options can still be set via the options/<option>
files."
* tag 'trace-3.8-rc2-regression-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
tracing: Fix regression of trace_options file setting
Closer inspection of that patch revealed a bunch of unrelated changes
in the shrinker:
- The shrinker count is now in pages instead of objects.
- For counting the shrinkable objects the old code only looked at the
inactive list, the new code looks at all bounds objects (including
pinned ones). That is obviously in addition to the new unbound list.
- The shrinker cound is no longer scaled with
sysctl_vfs_cache_pressure. Note though that with the default tuning
value of vfs_cache_pressue = 100 this doesn't affect the shrinker
behaviour.
- When actually shrinking objects, the old code first dropped
purgeable objects, then normal (inactive) objects. Only then did it,
in a last-ditch effort idle the gpu and evict everything. The new
code omits the intermediate step of evicting normal inactive
objects.
Safe for the first change, which seems benign, and the shrinker count
scaling, which is a bit a different story, the endresult of all these
changes is that the shrinker is _much_ more likely to fall back to the
last-ditch resort of idling the gpu and evicting everything. The old
code could only do that if something else evicted lots of objects
meanwhile (since without any other changes the nr_to_scan will be
smaller than the object count).
Reverting the vfs_cache_pressure behaviour itself is a bit bogus: Only
dentry/inode object caches should scale their shrinker counts with
vfs_cache_pressure. Originally I've had that change reverted, too. But
Chris Wilson insisted that it's too bogus and shouldn't again see the
light of day.
Hence revert all these other changes and restore the old shrinker
behaviour, with the minor adjustment that we now first scan the
unbound list, then the inactive list for each object category
(purgeable or normal).
A similar patch has been tested by a few people affected by the gen4/5
hangs which started to appear in 3.7, which some people bisected to
the "drm/i915: Track unbound pages" commit. But just disabling the
unbound logic alone didn't change things at all.
Note that this patch doesn't fix the referenced bugs, it only hides
the underlying bug(s) well enough to restore pre-3.7 behaviour. The
key to achieve that is to massively reduce the likelyhood of going
into a full gpu stall and evicting everything.
v2: Reword commit message a bit, taking Chris Wilson's comment into
account.
v3: On Chris Wilson's insistency, do not reinstate the rather bogus
vfs_cache_pressure change.
Tested-by: Greg KH <gregkh@linuxfoundation.org> Tested-by: Dave Kleikamp <dave.kleikamp@oracle.com>
References: https://bugs.freedesktop.org/show_bug.cgi?id=55984
References: https://bugs.freedesktop.org/show_bug.cgi?id=57122
References: https://bugs.freedesktop.org/show_bug.cgi?id=56916
References: https://bugs.freedesktop.org/show_bug.cgi?id=57136 Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: stable@vger.kernel.org Acked-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Linus Torvalds [Thu, 10 Jan 2013 16:20:15 +0000 (08:20 -0800)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 patches from Martin Schwidefsky:
"Add the finit_module system call, fix the irq statistics in
/proc/stat, fix a s390dbf lockdep problem, a patch revert for a
problem that is not 100% understood yet, and a few patches to
fix warnings."
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390/pci: define read*_relaxed functions
s390/topology: export cpu_topology
s390/pm: export pm_power_off
s390/pci: define isa_dma_bridge_buggy
s390/3215: partially revert tty close handling fix
s390/irq: count cpu restart events
s390/irq: remove split irq fields from /proc/stat
s390/irq: enable irq sum accounting for /proc/stat again
s390/syscalls: wire up finit_module syscall
s390/pci: remove dead code
s390/smp: fix section mismatch for smp_add_present_cpu()
s390/debug: Fix s390dbf lockdep problem in debug_(un)register_view()
Will Deacon [Tue, 18 Dec 2012 14:15:15 +0000 (14:15 +0000)]
arm64: mm: introduce present, faulting entries for PAGE_NONE
This is mostly a port of dbf62d50067e ("ARM: mm: introduce L_PTE_VALID
for page table entries") and 26ffd0d43b18 ("ARM: mm: introduce present,
faulting entries for PAGE_NONE") from ARM, which makes use of present,
faulting page table entries for page table entries mapped as PROT_NONE.
The main difference with this implementation is that we can make use of
the two pte type bits in order to avoid allocating a software bit for
identifying PROT_NONE pages, instead reserving the 10b suffix for these
types of mappings.
This is required to prevent users from accessing such pages via syscalls
such as read/write over a pipe.
Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Will Deacon [Mon, 7 Jan 2013 16:12:31 +0000 (16:12 +0000)]
arm64: vdso: remove broken, redundant sequence counting for timezones
This patch is an arm64 version of ce73ec6db47a ("powerpc/vdso: Remove
redundant locking in update_vsyscall_tz()").
Timezone data is not protected, so the sequence counter is not required
to ensure consistency. Furthermore, having multiple paths updating the
counter leads to a race between update_vsyscall and update_vsyscall_tz,
so remove the timezone sequence counting from both the kernel and the
vdso.
Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Guenter Roeck [Mon, 31 Dec 2012 08:04:59 +0000 (00:04 -0800)]
hwmon: (vexpress) Fix build error seen if CONFIG_OF_DEVICE is not set
Fix:
vexpress.c: In function ‘vexpress_hwmon_name_show’:
vexpress.c:34:2: error: implicit declaration of function
‘of_get_property’ [-Werror=implicit-function-declaration]
vexpress.c:34:27: warning: initialization makes pointer from integer
without a cast [enabled by default]
vexpress.c: In function ‘vexpress_hwmon_label_show’:
vexpress.c:43:22: warning: initialization makes pointer from integer
without a cast [enabled by default]
Seen if CONFIG_OF_DEVICE is not defined. of_get_property is declared in
of.h which is only included by of_device.h if CONFIG_OF_DEVICE is defined.
of.h needs to be included directly.
Steven Rostedt [Thu, 10 Jan 2013 01:54:17 +0000 (20:54 -0500)]
tracing: Fix regression of trace_options file setting
The latest change to allow trace options to be set on the command
line also broke the trace_options file.
The zeroing of the last byte of the option name that is echoed into
the trace_option file was removed with the consolidation of some
of the code. The compare between the option and what was written to
the trace_options file fails because the string holding the data
written doesn't terminate with a null character.
A zero needs to be added to the end of the string copied from
user space.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Linus Torvalds [Wed, 9 Jan 2013 16:58:57 +0000 (08:58 -0800)]
Merge branch 'fixes' of git://git.linaro.org/people/rmk/linux-arm
Pull ARM fixes from Russell King.
* 'fixes' of git://git.linaro.org/people/rmk/linux-arm:
ARM: 7616/1: cache-l2x0: aurora: Use writel_relaxed instead of writel
ARM: 7615/1: cache-l2x0: aurora: Invalidate during clean operation with WT enable
ARM: 7614/1: mm: fix wrong branch from Cortex-A9 to PJ4b
ARM: 7612/1: imx: Do not select some errata that depends on !ARCH_MULTIPLATFORM
ARM: 7611/1: VIC: fix bug in VIC irqdomain code
ARM: 7610/1: versatile: bump IRQ numbers
ARM: 7609/1: disable errata work-arounds which access secure registers
ARM: 7608/1: l2x0: Only set .set_debug on PL310 r3p0 and earlier