Ingo Molnar [Wed, 23 Oct 2013 07:45:50 +0000 (09:45 +0200)]
Merge tag 'perf-core-for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core
Pull perf/core improvements and fixes from Arnaldo Carvalho de Melo:
* Convert callchain children list to rbtree, greatly reducing the time
taken for callchain processing, from Namhyung Kim.
* Add --max-stack option to limit callchain stack scan in 'top' and 'report',
improving callchain processing when reducing the stack depth is an option,
from Waiman Long.
* Compare dso's also when comparing symbols, to avoid grouping together
symbols with the same name but on different DSOs, fix from Namhyung Kim.
* 'perf trace' now can can use a 'perf probe' wannabe tracepoint to hook into
the userspace -> kernel pathname copy so that it can map fds to pathnames
without reading /proc/pid/fd/ symlinks. From Arnaldo Carvalho de Melo.
* 'perf trace' now emits hints as to why tracing is not possible, helping the
user to setup the system to allow tracing in the desired permission
granularity, telling if the problem is due to debugfs not being mounted or
with not enough permission for !root, /proc/sys/kernel/perf_event_paranoit
value, etc. From Arnaldo Carvalho de Melo.
* Add missing 'mmap2' in evsel debug print, from Adrian Hunter.
* Add missing decrement in id sample parsing, not a fix per se, just to
avoid a problem whem somebody adds another field, from Adrian Hunter.
* Improve write_output error message in 'perf record', from Adrian Hunter.
* Add missing sample flush for piped events, fix from Adrian Hunter.
* Add missing members to perf_event__attr_swap(), fix from Adrian Hunter.
* Assorted fixes for 32-bit build, from Adrian Hunter
* Print addr by default for BTS in 'perf script', from Adrian Juntmer
* Separating data file properties from session, code reorganization from
Jiri Olsa.
* Show error in 'perf list' if tracepoints not available, from Pekka Enberg.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Ingo Molnar <mingo@kernel.org>
Waiman Long [Fri, 18 Oct 2013 14:38:49 +0000 (10:38 -0400)]
perf top: Add --max-stack option to limit callchain stack scan
When the callgraph function is enabled (-G), it may take a long time to
scan all the stack data and merge them accordingly.
This patch adds a new --max-stack option to perf-top to limit the depth
of callchain stack data to look at to reduce the time it takes for
perf-top to finish its processing. It reduces the amount of information
provided to the user in exchange for faster speed.
Signed-off-by: Waiman Long <Waiman.Long@hp.com> Acked-by: David Ahern <dsahern@gmail.com> Tested-by: Davidlohr Bueso <davidlohr@hp.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Aswin Chandramouleeswaran <aswin@hp.com> Cc: David Ahern <dsahern@gmail.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Scott J Norton <scott.norton@hp.com> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/r/1382107129-2010-5-git-send-email-Waiman.Long@hp.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Waiman Long [Fri, 18 Oct 2013 14:38:48 +0000 (10:38 -0400)]
perf report: Add --max-stack option to limit callchain stack scan
When callgraph data was included in the perf data file, it may take a
long time to scan all those data and merge them together especially if
the stored callchains are long and the perf data file itself is large,
like a Gbyte or so.
The callchain stack is currently limited to PERF_MAX_STACK_DEPTH (127).
This is a large value. Usually the callgraph data that developers are
most interested in are the first few levels, the rests are usually not
looked at.
This patch adds a new --max-stack option to perf-report to limit the
depth of callchain stack data to look at to reduce the time it takes for
perf-report to finish its processing. It trades the presence of trailing
stack information with faster speed.
The following table shows the elapsed time of doing perf-report on a
perf.data file of size 985,531,828 bytes.
--max_stack Elapsed Time Output data size
----------- ------------ ----------------
not set 88.0s 124,422,651
64 87.5s 116,303,213
32 87.2s 112,023,804
16 86.6s 94,326,380
8 59.9s 33,697,248
4 40.7s 10,116,637
-g none 27.1s 2,555,810
Signed-off-by: Waiman Long <Waiman.Long@hp.com> Acked-by: David Ahern <dsahern@gmail.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Aswin Chandramouleeswaran <aswin@hp.com> Cc: David Ahern <dsahern@gmail.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Scott J Norton <scott.norton@hp.com> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/r/1382107129-2010-4-git-send-email-Waiman.Long@hp.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Jiri Olsa [Tue, 15 Oct 2013 14:27:32 +0000 (16:27 +0200)]
perf tools: Add data object to handle perf data file
This patch is adding 'struct perf_data_file' object as a placeholder for
all attributes regarding perf.data file handling. Changing
perf_session__new to take it as an argument.
The rest of the functionality will be added later to keep this change
simple enough, because all the places using perf_session are changed
now.
Signed-off-by: Jiri Olsa <jolsa@redhat.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com> Cc: David Ahern <dsahern@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1381847254-28809-2-git-send-email-jolsa@redhat.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Namhyung Kim [Tue, 15 Oct 2013 02:01:56 +0000 (11:01 +0900)]
perf tools: Compare dso's also when comparing symbols
Linus reported that sometimes 'perf report -s symbol' exits without any
message on TUI. David and Jiri found that it's because it failed to add
a hist entry due to an invalid symbol length.
It turns out that sorting by symbol (address) was broken since it only
compares symbol addresses. The symbol address is a relative address
within a dso thus just checking its address can result in merging
unrelated symbols together. Fix it by checking dso before comparing
symbol address.
Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: David Ahern <dsahern@gmail.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1381802517-18812-1-git-send-email-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Pekka Enberg [Tue, 15 Oct 2013 20:07:27 +0000 (23:07 +0300)]
perf list: Show error if tracepoints not available
Tracepoints are not visible in "perf list" on Fedora 19 because regular
users have no permission to /sys/kernel/debug by default. Show an error
message so that the user knows about it instead of assuming that
tracepoints are not supported on the system.
Adrian Hunter [Fri, 18 Oct 2013 12:29:14 +0000 (15:29 +0300)]
perf script: Print addr by default for BTS
The addr field is not displayed by default for hardware events, however
for branch events it is the target of the branch so for BTS display it
by default if it was recorded.
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Acked-by: David Ahern <dsahern@gmail.com> Cc: David Ahern <dsahern@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Namhyung Kim <namhyung@gmail.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/r/1382099356-4918-18-git-send-email-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Adrian Hunter [Fri, 18 Oct 2013 12:29:09 +0000 (15:29 +0300)]
perf tools: Fix bench/numa.c for 32-bit build
bench/numa.c: In function 'worker_thread':
bench/numa.c:1123:20: error: comparison between signed and unsigned integer expressions [-Werror=sign-compare]
bench/numa.c:1171:6: error: format '%lx' expects argument of type 'long unsigned int', but argument 5 has type 'u64' [-Werror=format]
cc1: all warnings being treated as errors
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Namhyung Kim <namhyung@gmail.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/r/1382099356-4918-13-git-send-email-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Adrian Hunter [Fri, 18 Oct 2013 12:29:08 +0000 (15:29 +0300)]
perf tools: Fix test_on_exit for 32-bit build
builtin-record.c:42:12: error: static declaration of 'on_exit' follows non-static declaration
In file included from util/util.h:51:0,
from builtin.h:4,
from builtin-record.c:8:
/usr/include/stdlib.h:536:12: note: previous declaration of 'on_exit' was here
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Namhyung Kim <namhyung@gmail.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/r/1382099356-4918-12-git-send-email-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Adrian Hunter [Fri, 18 Oct 2013 12:29:07 +0000 (15:29 +0300)]
perf evlist: Fix 32-bit build error
util/evlist.c: In function 'perf_evlist__mmap':
util/evlist.c:772:2: error: format '%lu' expects argument of type 'long unsigned int', but argument 3 has type 'size_t' [-Werror=format]
cc1: all warnings being treated as errors
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Namhyung Kim <namhyung@gmail.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/r/1382099356-4918-11-git-send-email-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Linus Torvalds [Sun, 20 Oct 2013 22:20:45 +0000 (23:20 +0100)]
Merge branch 'parisc-3.12' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux
Pull parsic fixes from Helge Deller:
"There are just two small fixes in here:
- Revert a commit which exported the flush_cache_page function. This
was noticed by Christoph Hellwig.
- Enable the DEVTMPFS, DEVTMPFS_MOUNT and BLK_DEV_INITRD config
options in the parisc defconfigs so that latest udev/initrd finds
the root disk at boot"
* 'parisc-3.12' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
parisc: enable DEVTMPFS, DEVTMPFS_MOUNT and BLK_DEV_INITRD in defconfigs
Revert "parisc: Export flush_cache_page() (needed by lustre)"
Helge Deller [Mon, 14 Oct 2013 20:55:36 +0000 (22:55 +0200)]
parisc: enable DEVTMPFS, DEVTMPFS_MOUNT and BLK_DEV_INITRD in defconfigs
Latest udev requires that DEVTMPFS and DEVTMPFS_MOUNT are enabled, else
initrd will fail to find root filesystem. Enable missing BLK_DEV_INITRD
for B180 and C3000 machines.
Christoph Hellwig <hch@infradead.org> commented:
This one shouldn't go in - Geert sent it a bit prematurely, as Lustre
shouldn't use it just to reimplement core VM functionality (which it
shouldn't use either, but that's a separate story).
Linus Torvalds [Fri, 18 Oct 2013 23:46:21 +0000 (16:46 -0700)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs
Pull btrfs fix from Chris Mason:
"Sage hit a deadlock with ceph on btrfs, and Josef tracked it down to a
regression in our initial rc1 pull. When doing nocow writes we were
sometimes starting a transaction with locks held"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
Btrfs: release path before starting transaction in can_nocow_extent
Linus Torvalds [Fri, 18 Oct 2013 21:26:51 +0000 (14:26 -0700)]
Merge tag 'pm+acpi-3.12-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull ACPI and power management fixes from Rafael Wysocki:
- intel_pstate fix for misbehavior after system resume if sysfs
attributes are set in a specific way before the corresponding suspend
from Dirk Brandewie.
- A recent intel_pstate fix has no effect if unsigned long is 32-bit,
so fix it up to cover that case as well.
- The s3c64xx cpufreq driver was not updated when the index field of
struct cpufreq_frequency_table was replaced with driver_data, so
update it now. From Charles Keepax.
- The Kconfig help text for ACPI_BUTTON still refers to
/proc/acpi/event that has been dropped recently, so modify it to
remove that reference. From Krzysztof Mazur.
- A Lan Tianyu's change adds a missing mutex unlock to an error code
path in acpi_resume_power_resources().
- Some code related to ACPI power resources, whose very purpose is
questionable to put it lightly, turns out to cause problems to happen
during testing on real systems, so remove it completely (we may
revisit that in the future if there's a compelling enough reason).
From Rafael J Wysocki and Aaron Lu.
* tag 'pm+acpi-3.12-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPI / PM: Drop two functions that are not used any more
ATA / ACPI: remove power dependent device handling
cpufreq: s3c64xx: Rename index to driver_data
ACPI / power: Drop automaitc resume of power resource dependent devices
intel_pstate: Fix type mismatch warning
cpufreq / intel_pstate: Fix max_perf_pct on resume
ACPI: remove /proc/acpi/event from ACPI_BUTTON help
ACPI / power: Release resource_lock after acpi_power_get_state() return error
Tetsuo Handa [Thu, 17 Oct 2013 10:45:29 +0000 (19:45 +0900)]
mutex: Avoid gcc version dependent __builtin_constant_p() usage
Commit 040a0a37 ("mutex: Add support for wound/wait style locks")
used "!__builtin_constant_p(p == NULL)" but gcc 3.x cannot
handle such expression correctly, leading to boot failure when
built with CONFIG_DEBUG_MUTEXES=y.
Fix it by explicitly passing a bool which tells whether p != NULL
or not.
[ PeterZ: This is a sad patch, but provided it actually generates
similar code I suppose its the best we can do bar whole
sale deprecating gcc-3. ]
* acpi-fixes:
ACPI / PM: Drop two functions that are not used any more
ATA / ACPI: remove power dependent device handling
ACPI / power: Drop automaitc resume of power resource dependent devices
ACPI: remove /proc/acpi/event from ACPI_BUTTON help
ACPI / power: Release resource_lock after acpi_power_get_state() return error
Ingo Molnar [Fri, 18 Oct 2013 10:46:14 +0000 (12:46 +0200)]
Merge branch 'rcu/next' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into core/rcu
Pull RCU updates from Paul E. McKenney.
Major changes:
" 1. Update RCU documentation. These were posted to LKML at
http://article.gmane.org/gmane.linux.kernel/1566994.
2. Miscellaneous fixes. These were posted to LKML at
http://article.gmane.org/gmane.linux.kernel/1567027.
3. Grace-period-related changes, primarily to aid in debugging,
inspired by a -rt debugging session. These were posted to
LKML at http://article.gmane.org/gmane.linux.kernel/1567076.
4. Idle entry/exit changes, primarily to address issues located
by Tibor Billes. These were posted to LKML at
http://article.gmane.org/gmane.linux.kernel/1567096.
5. Code reorganization moving RCU's source files from kernel
to kernel/rcu. This was posted to LKML at
http://article.gmane.org/gmane.linux.kernel/1577344."
Kees Cook [Wed, 16 Oct 2013 06:43:14 +0000 (23:43 -0700)]
x86/relocs: Add percpu fixup for GNU ld 2.23
The GNU linker tries to put __per_cpu_load into the percpu area,
resulting in a lack of its relocation. Force this symbol to be
relocated. Seen starting with GNU ld 2.23 and later.
Linus Torvalds [Fri, 18 Oct 2013 01:49:21 +0000 (18:49 -0700)]
Merge branch 'for-linus' of git://git.samba.org/sfrench/cifs-2.6
Pull CIFS fixes from Steve French:
"Five small cifs fixes (includes fixes for: unmount hang, 2 security
related, symlink, large file writes)"
* 'for-linus' of git://git.samba.org/sfrench/cifs-2.6:
cifs: ntstatus_to_dos_map[] is not terminated
cifs: Allow LANMAN auth method for servers supporting unencapsulated authentication methods
cifs: Fix inability to write files >2GB to SMB2/3 shares
cifs: Avoid umount hangs with smb2 when server is unresponsive
do not treat non-symlink reparse points as valid symlinks
David Cohen [Thu, 17 Oct 2013 22:35:36 +0000 (15:35 -0700)]
intel_mid: Move platform device setups to their own platform_<device>.* files
As Intel rolling out more SoC's after Moorestown, we need to
re-structure the code in a way that is backward compatible and easy to
expand. This patch implements a flexible way to support multiple boards
and devices.
This patch does not add any new functional support. It just refactors
the existing code to increase the modularity and decrease the code
duplication for supporting multiple soc's and boards.
Currently intel-mid.c has both board and soc related code in one file.
This patch moves the board related code to new files and let linker
script to create SFI devite table following this:
1. Move the SFI device specific code to
arch/x86/platform/intel-mid/device-libs/platform_<device>.*
A new device file is added for every supported device. This code will
get conditionally compiled by using corresponding device driver
CONFIG option.
2. Move the device_ids location to .x86_intel_mid_dev.init section by
using new sfi_device() macro.
This patch was based on previous code from Sathyanarayanan Kuppuswamy.
Moved SFI specific parsing/handling code to sfi.c. This will enable us
to reuse our intel-mid code for platforms that supports firmware
interfaces other than SFI (like ACPI).
This patch provides a means to add custom handler for
SFI devices. If you set device_handler as NULL in
device_id table standard SFI device handler will be used.
If its not NULL custom handler will be called.
SFI device_id[] table parsing code is duplicated in every SFI
device handler. This patch removes this code duplication, by
adding a seperate function get_device_id() to parse through the
device table. Also this patch moves the SPI, I2C, IPC info code from
sfi_parse_devs() to respective device handlers.
mrst is used as common name to represent all intel_mid type
soc's. But moorsetwon is just one of the intel_mid soc. So
renamed them to use intel_mid.
This patch mainly renames the variables and related
functions that uses *mrst* prefix with *intel_mid*.
To ensure that there are no functional changes, I have compared
the objdump of related files before and after rename and found
the only difference is symbol and name changes.
Fengguang Wu [Thu, 17 Oct 2013 22:35:28 +0000 (15:35 -0700)]
pci: intel_mid: Return true/false in function returning bool
Function 'type1_access_ok' should return bool value, not 0/1.
This patch changes 'return 0/1' to 'return false/true'.
Cc: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com> Cc: H. Peter Anvin <hpa@linux.intel.com> Cc: David Cohen <david.a.cohen@linux.intel.com> Signed-off-by: Fengguang Wu <fengguang.wu@intel.com> Link: http://lkml.kernel.org/r/1382049336-21316-5-git-send-email-david.a.cohen@linux.intel.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Also, renamed the corresponding header files and made changes
to the driver files that included these header files.
To ensure that there are no functional changes, I have compared
the objdump of renamed files before and after rename and found
that the only difference is file name change.
perf trace: Improve messages related to /proc/sys/kernel/perf_event_paranoid
kernel/events/core.c has:
/*
* perf event paranoia level:
* -1 - not paranoid at all
* 0 - disallow raw tracepoint access for unpriv
* 1 - disallow cpu events for unpriv
* 2 - disallow kernel profiling for unpriv
*/
int sysctl_perf_event_paranoid __read_mostly = 1;
So, with the default being 1, a non-root user can trace his stuff:
[acme@zoo ~]$ trace
Error: Operation not permitted.
Hint: Check /proc/sys/kernel/perf_event_paranoid setting.
Hint: For system wide tracing it needs to be set to -1.
Hint: The current value is 1.
[acme@zoo ~]$
[acme@zoo ~]$ trace --cpu 0
Error: Operation not permitted.
Hint: Check /proc/sys/kernel/perf_event_paranoid setting.
Hint: For system wide tracing it needs to be set to -1.
Hint: The current value is 1.
[acme@zoo ~]$
If the paranoid level is >= 2, i.e. turn this perf stuff off for !root users:
[acme@zoo ~]$ sudo sh -c 'echo 2 > /proc/sys/kernel/perf_event_paranoid'
[acme@zoo ~]$ cat /proc/sys/kernel/perf_event_paranoid
2
[acme@zoo ~]$
[acme@zoo ~]$ trace usleep 1
Error: Permission denied.
Hint: Check /proc/sys/kernel/perf_event_paranoid setting.
Hint: For your workloads it needs to be <= 1
Hint: For system wide tracing it needs to be set to -1.
Hint: The current value is 2.
[acme@zoo ~]$
[acme@zoo ~]$ trace
Error: Permission denied.
Hint: Check /proc/sys/kernel/perf_event_paranoid setting.
Hint: For your workloads it needs to be <= 1
Hint: For system wide tracing it needs to be set to -1.
Hint: The current value is 2.
[acme@zoo ~]$
[acme@zoo ~]$ trace --cpu 1
Error: Permission denied.
Hint: Check /proc/sys/kernel/perf_event_paranoid setting.
Hint: For your workloads it needs to be <= 1
Hint: For system wide tracing it needs to be set to -1.
Hint: The current value is 2.
[acme@zoo ~]$
If the user manages to get what he/she wants, convincing root not
to be paranoid at all...
Stephane Eranian [Thu, 17 Oct 2013 17:32:15 +0000 (19:32 +0200)]
perf: Disable PERF_RECORD_MMAP2 support
For now, we disable the extended MMAP record support (MMAP2).
We have identified cases where it would not report the correct mapping
information, clone(VM_CLONE) but with separate pids. We will revisit
the support once we find a solution for this case.
The patch changes the kernel to return EINVAL if attr->mmap2 is set. The
patch also modifies the perf tool to use regular PERF_RECORD_MMAP for
synthetic events and it also prevents the tool from requesting
attr->mmap2 mode because the kernel would reject it.
The support will be revisited once the kenrel interface is updated.
In V2, we reduce the patch to the strict minimum.
In V3, we avoid calling perf_event_open() with mmap2 set because we know
it will fail and require fallback retry.
Signed-off-by: Stephane Eranian <eranian@google.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20131017173215.GA8820@quad Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cast __u64 to u64 to silence this warning on older distros, such as
Fedora 12:
CC /tmp/build/perf/util/scripting-engines/trace-event-perl.o
cc1: warnings being treated as errors
util/scripting-engines/trace-event-perl.c: In function ‘perl_process_tracepoint’:
util/scripting-engines/trace-event-perl.c:285: error: format ‘%lu’ expects type ‘long unsigned int’, but argument 2 has type ‘__u64’
make[1]: *** [/tmp/build/perf/util/scripting-engines/trace-event-perl.o] Error 1
make: *** [install] Error 2
make: Leaving directory `/home/acme/git/linux/tools/perf'
[acme@fedora12 linux]$
Reported-by: Waiman Long <Waiman.Long@hp.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Tom Zanussi <tom.zanussi@linux.intel.com> Cc: Waiman Long <Waiman.Long@hp.com> Link: http://lkml.kernel.org/n/tip-nlxofdqcdjfm0w9o6bgq4kqv@git.kernel.org Link: http://lkml.kernel.org/r/1381265120-58532-1-git-send-email-Waiman.Long@hp.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Linus Torvalds [Thu, 17 Oct 2013 17:39:01 +0000 (10:39 -0700)]
Merge tag 'driver-core-3.12-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
Pull driver core fix from Greg KH:
"Here is one fix for the hotplug memory path that resolves a regression
when removing memory that showed up in 3.12-rc1"
* tag 'driver-core-3.12-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
driver core: Release device_hotplug_lock when store_mem_state returns EINVAL
Linus Torvalds [Thu, 17 Oct 2013 17:38:18 +0000 (10:38 -0700)]
Merge tag 'usb-3.12-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
Pull USB fixes from Greg KH:
"Here are some USB fixes and new device ids for 3.12-rc6
The largest change here is a bunch of new device ids for the option
USB serial driver for new Huawei devices. Other than that, just some
small bug fixes for issues that people have reported (run-time and
build-time), nothing major"
* tag 'usb-3.12-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
usb: usb_phy_gen: refine conditional declaration of usb_nop_xceiv_register
usb: misc: usb3503: Fix compile error due to incorrect regmap depedency
usb/chipidea: fix oops on memory allocation failure
usb-storage: add quirk for mandatory READ_CAPACITY_16
usb: serial: option: blacklist Olivetti Olicard200
USB: quirks: add touchscreen that is dazzeled by remote wakeup
Revert "usb: musb: gadget: fix otg active status flag"
USB: quirks.c: add one device that cannot deal with suspension
USB: serial: option: add support for Inovia SEW858 device
USB: serial: ti_usb_3410_5052: add Abbott strip port ID to combined table as well.
USB: support new huawei devices in option.c
usb: musb: start musb on the udc side, too
xhci: Fix spurious wakeups after S5 on Haswell
xhci: fix write to USB3_PSSEN and XUSB2PRM pci config registers
xhci: quirk for extra long delay for S4
xhci: Don't enable/disable RWE on bus suspend/resume.
Linus Torvalds [Thu, 17 Oct 2013 17:37:42 +0000 (10:37 -0700)]
Merge tag 'tty-3.12-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
Pull serial driver fixes from Greg KH:
"Here are two serial driver fixes for your tree. One is a revert of a
patch that causes a build error, the other is a fix to provide the
correct brace placement which resolves a bug where the driver was not
working properly"
* tag 'tty-3.12-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
serial: vt8500: add missing braces
Revert "serial: i.MX: evaluate linux,stdout-path property"
Linus Torvalds [Thu, 17 Oct 2013 17:36:57 +0000 (10:36 -0700)]
Merge tag 'char-misc-3.12-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc
Pull char/misc driver fixes from Greg KH:
"Here are some small iio and w1 driver fixes for 3.12-rc6.
There is also a hyper-v fix in here, which turned out to be incorrect,
so it was reverted. That will probably have to wait unto 3.13-rc1 to
get accepted as it's still being discussed"
* tag 'char-misc-3.12-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
Revert "Drivers: hv: vmbus: Fix a bug in channel rescind code"
Drivers: hv: vmbus: Fix a bug in channel rescind code
iio:buffer: Free active scan mask in iio_disable_all_buffers()
iio: frequency: adf4350: add missing clk_disable_unprepare() on error in adf4350_probe()
w1 - call request_module with w1 master mutex unlocked
w1 - fix fops in w1_bus_notify
Linus Torvalds [Thu, 17 Oct 2013 17:17:25 +0000 (10:17 -0700)]
Merge tag 'sound-3.12' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"All reasonably small fixes as rc6: a HD-audio mic fix, a us122l mmap
regression fix, and kernel memory leak fix in hdsp driver. Hopefully
this will be the last pull request for 3.12..."
* tag 'sound-3.12' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ALSA: hdsp - info leak in snd_hdsp_hwdep_ioctl()
ALSA: us122l: Fix pcm_usb_stream mmapping regression
ALSA: hda - Fix inverted internal mic not indicated on some machines
Linus Torvalds [Thu, 17 Oct 2013 17:16:45 +0000 (10:16 -0700)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security
Pull apparmor fixes from James Morris:
"A couple more regressions fixed"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security:
apparmor: fix bad lock balance when introspecting policy
apparmor: fix memleak of the profile hash
Merge tag 'iio-fixes-for-3.12c' of git://git.kernel.org/pub/scm/linux/kernel/git/jic23/iio into char-misc-linus
Jonathan writes:
Third set of IIO fixes for the 3.12 cycle.
Two little ones this time:
1) A missing clk_unprepare in adf4350.
2) A missing free of the active_scan_mask when iio_disable_all_buffers is
called during an unexpected device removal. This leak was introduced by
the fix a87c82e454f184a9473f8cdfd4d304205f585f65 iio: Stop sampling when the device is removed
and hence is a regression fix.
Guenter Roeck [Thu, 17 Oct 2013 02:18:41 +0000 (19:18 -0700)]
usb: usb_phy_gen: refine conditional declaration of usb_nop_xceiv_register
Commit 3fa4d734 (usb: phy: rename nop_usb_xceiv => usb_phy_gen_xceiv)
changed the conditional around the declaration of usb_nop_xceiv_register
from
#if defined(CONFIG_NOP_USB_XCEIV) ||
(defined(CONFIG_NOP_USB_XCEIV_MODULE) && defined(MODULE))
to
#if IS_ENABLED(CONFIG_NOP_USB_XCEIV)
While that looks the same, it is semantically different. The first expression
is true if CONFIG_NOP_USB_XCEIV is built as module and if the including
code is built as module. The second expression is true if code depending on
CONFIG_NOP_USB_XCEIV if built as module or into the kernel.
As a result, the arm:allmodconfig build fails with
arch/arm/mach-omap2/built-in.o: In function `omap3_evm_init':
arch/arm/mach-omap2/board-omap3evm.c:703: undefined reference to
`usb_nop_xceiv_register'
Fix the problem by reverting to the old conditional.
Revert "Drivers: hv: vmbus: Fix a bug in channel rescind code"
This reverts commit 90d33f3ec519db19d785216299a4ee85ef58ec97 as it's not
the correct fix for this issue, and it causes a build warning to be
added to the kernel tree.
Cc: K. Y. Srinivasan <kys@microsoft.com> Cc: <stable@vger.kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
ACPI / PM: Drop two functions that are not used any more
Two functions defined in device_pm.c, acpi_dev_pm_add_dependent()
and acpi_dev_pm_remove_dependent(), have no callers and may be
dropped, so drop them.
Moreover, they are the only functions adding entries to and removing
entries from the power_dependent list in struct acpi_device, so drop
that list too.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Aaron Lu [Thu, 17 Oct 2013 13:38:53 +0000 (15:38 +0200)]
ATA / ACPI: remove power dependent device handling
Previously, we wanted SCSI devices corrsponding to ATA devices to
be runtime resumed when the power resource for those ATA device was
turned on by some other device, so we added the SCSI device to the
dependent device list of the ATA device's ACPI node. However, this
code has no effect after commit 41863fc (ACPI / power: Drop automaitc
resume of power resource dependent devices) and the mechanism it was
supposed to implement is regarded as a bad idea now, so drop it.
[rjw: Changelog] Signed-off-by: Aaron Lu <aaron.lu@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Linus Torvalds [Thu, 17 Oct 2013 04:36:03 +0000 (21:36 -0700)]
Merge branch 'akpm' (fixes from Andrew Morton)
Merge misc fixes from Andrew Morton.
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (21 commits)
mm: revert mremap pud_free anti-fix
mm: fix BUG in __split_huge_page_pmd
swap: fix set_blocksize race during swapon/swapoff
procfs: call default get_unmapped_area on MMU-present architectures
procfs: fix unintended truncation of returned mapped address
writeback: fix negative bdi max pause
percpu_refcount: export symbols
fs: buffer: move allocation failure loop into the allocator
mm: memcg: handle non-error OOM situations more gracefully
tools/testing/selftests: fix uninitialized variable
block/partitions/efi.c: treat size mismatch as a warning, not an error
mm: hugetlb: initialize PG_reserved for tail pages of gigantic compound pages
mm/zswap: bugfix: memory leak when re-swapon
mm: /proc/pid/pagemap: inspect _PAGE_SOFT_DIRTY only on present pages
mm: migration: do not lose soft dirty bit if page is in migration state
gcov: MAINTAINERS: Add an entry for gcov
mm/hugetlb.c: correct missing private flag clearing
mm/vmscan.c: don't forget to free shrinker->nr_deferred
ipc/sem.c: synchronize semop and semctl with IPC_RMID
ipc: update locking scheme comments
...
Hugh Dickins [Wed, 16 Oct 2013 20:47:09 +0000 (13:47 -0700)]
mm: revert mremap pud_free anti-fix
Revert commit 1ecfd533f4c5 ("mm/mremap.c: call pud_free() after fail
calling pmd_alloc()").
The original code was correct: pud_alloc(), pmd_alloc(), pte_alloc_map()
ensure that the pud, pmd, pt is already allocated, and seldom do they
need to allocate; on failure, upper levels are freed if appropriate by
the subsequent do_munmap(). Whereas commit 1ecfd533f4c5 did an
unconditional pud_free() of a most-likely still-in-use pud: saved only
by the near-impossiblity of pmd_alloc() failing.
Signed-off-by: Hugh Dickins <hughd@google.com> Cc: Chen Gang <gang.chen@asianux.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Hugh Dickins [Wed, 16 Oct 2013 20:47:08 +0000 (13:47 -0700)]
mm: fix BUG in __split_huge_page_pmd
Occasionally we hit the BUG_ON(pmd_trans_huge(*pmd)) at the end of
__split_huge_page_pmd(): seen when doing madvise(,,MADV_DONTNEED).
It's invalid: we don't always have down_write of mmap_sem there: a racing
do_huge_pmd_wp_page() might have copied-on-write to another huge page
before our split_huge_page() got the anon_vma lock.
Forget the BUG_ON, just go back and try again if this happens.
Signed-off-by: Hugh Dickins <hughd@google.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: David Rientjes <rientjes@google.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
swap: fix set_blocksize race during swapon/swapoff
Fix race between swapoff and swapon. Swapoff used old_block_size from
swap_info outside of swapon_mutex so it could be overwritten by
concurrent swapon.
The race has visible effect only if more than one swap block device
exists with different block sizes (e.g. /dev/sda1 with block size 4096
and /dev/sdb1 with 512). In such case it leads to setting the blocksize
of swapped off device with wrong blocksize.
The bug can be triggered with multiple concurrent swapoff and swapon:
0. Swap for some device is on.
1. swapoff:
First the swapoff is called on this device and "struct swap_info_struct
*p" is assigned. This is done under swap_lock however this lock is
released for the call try_to_unuse().
2. swapon:
After the assignment above (and before acquiring swapon_mutex &
swap_lock by swapoff) the swapon is called on the same device.
The p->old_block_size is assigned to the value of block_size the device.
This block size should be the same as previous but sometimes it is not.
The swapon ends successfully.
3. swapoff:
Swapoff resumes, grabs the locks and mutex and continues to disable this
swap device. Now it sets the block size to value taken from swap_info
which was overwritten by swapon in 2.
Signed-off-by: Krzysztof Kozlowski <k.kozlowski@samsung.com> Reported-by: Weijie Yang <weijie.yang.kh@gmail.com> Cc: Bob Liu <bob.liu@oracle.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Shaohua Li <shli@fusionio.com> Cc: Minchan Kim <minchan@kernel.org> Acked-by: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
HATAYAMA Daisuke [Wed, 16 Oct 2013 20:47:05 +0000 (13:47 -0700)]
procfs: call default get_unmapped_area on MMU-present architectures
Commit c4fe24485729 ("sparc: fix PCI device proc file mmap(2)") added
proc_reg_get_unmapped_area in proc_reg_file_ops and
proc_reg_file_ops_no_compat, by which now mmap always returns EIO if
get_unmapped_area method is not defined for the target procfs file,
which causes regression of mmap on /proc/vmcore.
To address this issue, like get_unmapped_area(), call default
current->mm->get_unmapped_area on MMU-present architectures if
pde->proc_fops->get_unmapped_area, i.e. the one in actual file
operation in the procfs file, is not defined.
Reported-by: Michael Holzheu <holzheu@linux.vnet.ibm.com> Signed-off-by: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: David S. Miller <davem@davemloft.net> Tested-by: Michael Holzheu <holzheu@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
HATAYAMA Daisuke [Wed, 16 Oct 2013 20:47:04 +0000 (13:47 -0700)]
procfs: fix unintended truncation of returned mapped address
Currently, proc_reg_get_unmapped_area truncates upper 32-bit of the
mapped virtual address returned from get_unmapped_area method in
pde->proc_fops due to the variable rv of signed integer on x86_64. This
is too small to have vitual address of unsigned long on x86_64 since on
x86_64, signed integer is of 4 bytes while unsigned long is of 8 bytes.
To fix this issue, use unsigned long instead.
Fixes a regression added in commit c4fe24485729 ("sparc: fix PCI device
proc file mmap(2)").
Signed-off-by: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: David S. Miller <davem@davemloft.net> Tested-by: Michael Holzheu <holzheu@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Since pause is bounded by [min_pause, max_pause] where min_pause is also
bounded by max_pause. It's suspected and demonstrated that the
max_pause calculation goes wrong:
The problem lies in the two "long = unsigned long" assignments in
bdi_max_pause() which might go negative if the highest bit is 1, and the
min_t(long, ...) check failed to protect it falling under 0. Fix all of
them by using "unsigned long" throughout the function.
Signed-off-by: Fengguang Wu <fengguang.wu@intel.com> Reported-by: Toralf Förster <toralf.foerster@gmx.de> Tested-by: Toralf Förster <toralf.foerster@gmx.de> Reviewed-by: Jan Kara <jack@suse.cz> Cc: Richard Weinberger <richard@nod.at> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Johannes Weiner [Wed, 16 Oct 2013 20:47:00 +0000 (13:47 -0700)]
fs: buffer: move allocation failure loop into the allocator
Buffer allocation has a very crude indefinite loop around waking the
flusher threads and performing global NOFS direct reclaim because it can
not handle allocation failures.
The most immediate problem with this is that the allocation may fail due
to a memory cgroup limit, where flushers + direct reclaim might not make
any progress towards resolving the situation at all. Because unlike the
global case, a memory cgroup may not have any cache at all, only
anonymous pages but no swap. This situation will lead to a reclaim
livelock with insane IO from waking the flushers and thrashing unrelated
filesystem cache in a tight loop.
Use __GFP_NOFAIL allocations for buffers for now. This makes sure that
any looping happens in the page allocator, which knows how to
orchestrate kswapd, direct reclaim, and the flushers sensibly. It also
allows memory cgroups to detect allocations that can't handle failure
and will allow them to ultimately bypass the limit if reclaim can not
make progress.
Reported-by: azurIt <azurit@pobox.sk> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@suse.cz> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Johannes Weiner [Wed, 16 Oct 2013 20:46:59 +0000 (13:46 -0700)]
mm: memcg: handle non-error OOM situations more gracefully
Commit 3812c8c8f395 ("mm: memcg: do not trap chargers with full
callstack on OOM") assumed that only a few places that can trigger a
memcg OOM situation do not return VM_FAULT_OOM, like optional page cache
readahead. But there are many more and it's impractical to annotate
them all.
First of all, we don't want to invoke the OOM killer when the failed
allocation is gracefully handled, so defer the actual kill to the end of
the fault handling as well. This simplifies the code quite a bit for
added bonus.
Second, since a failed allocation might not be the abrupt end of the
fault, the memcg OOM handler needs to be re-entrant until the fault
finishes for subsequent allocation attempts. If an allocation is
attempted after the task already OOMed, allow it to bypass the limit so
that it can quickly finish the fault and invoke the OOM killer.
Reported-by: azurIt <azurit@pobox.sk> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@suse.cz> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Doug Anderson [Wed, 16 Oct 2013 20:46:57 +0000 (13:46 -0700)]
block/partitions/efi.c: treat size mismatch as a warning, not an error
In commit 27a7c642174e ("partitions/efi: account for pmbr size in lba")
we started treating bad sizes in lba field of the partition that has the
0xEE (GPT protective) as errors.
However, we may run into these "bad sizes" in the real world if someone
uses dd to copy an image from a smaller disk to a bigger disk. Since
this case used to work (even without using force_gpt), keep it working
and treat the size mismatch as a warning instead of an error.
Reported-by: Josh Triplett <josh@joshtriplett.org> Reported-by: Sean Paul <seanpaul@chromium.org> Signed-off-by: Doug Anderson <dianders@chromium.org> Reviewed-by: Josh Triplett <josh@joshtriplett.org> Acked-by: Davidlohr Bueso <davidlohr@hp.com> Tested-by: Artem Bityutskiy <dedekind1@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Andrea Arcangeli [Wed, 16 Oct 2013 20:46:56 +0000 (13:46 -0700)]
mm: hugetlb: initialize PG_reserved for tail pages of gigantic compound pages
Commit 11feeb498086 ("kvm: optimize away THP checks in
kvm_is_mmio_pfn()") introduced a memory leak when KVM is run on gigantic
compound pages.
That commit depends on the assumption that PG_reserved is identical for
all head and tail pages of a compound page. So that if get_user_pages
returns a tail page, we don't need to check the head page in order to
know if we deal with a reserved page that requires different
refcounting.
The assumption that PG_reserved is the same for head and tail pages is
certainly correct for THP and regular hugepages, but gigantic hugepages
allocated through bootmem don't clear the PG_reserved on the tail pages
(the clearing of PG_reserved is done later only if the gigantic hugepage
is freed).
This patch corrects the gigantic compound page initialization so that we
can retain the optimization in 11feeb498086. The cacheline was already
modified in order to set PG_tail so this won't affect the boot time of
large memory systems.
[akpm@linux-foundation.org: tweak comment layout and grammar] Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Reported-by: andy123 <ajs124.ajs124@gmail.com> Acked-by: Rik van Riel <riel@redhat.com> Cc: Gleb Natapov <gleb@redhat.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Hugh Dickins <hughd@google.com> Acked-by: Rafael Aquini <aquini@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Weijie Yang [Wed, 16 Oct 2013 20:46:54 +0000 (13:46 -0700)]
mm/zswap: bugfix: memory leak when re-swapon
zswap_tree is not freed when swapoff, and it got re-kmalloced in swapon,
so a memory leak occurs.
Free the memory of zswap_tree in zswap_frontswap_invalidate_area().
Signed-off-by: Weijie Yang <weijie.yang@samsung.com> Reviewed-by: Bob Liu <bob.liu@oracle.com> Cc: Minchan Kim <minchan@kernel.org> Reviewed-by: Minchan Kim <minchan@kernel.org> Cc: <stable@vger.kernel.org>
From: Weijie Yang <weijie.yang@samsung.com>
Subject: mm/zswap: bugfix: memory leak when invalidate and reclaim occur concurrently
Consider the following scenario:
thread 0: reclaim entry x (get refcount, but not call zswap_get_swap_cache_page)
thread 1: call zswap_frontswap_invalidate_page to invalidate entry x.
finished, entry x and its zbud is not freed as its refcount != 0
now, the swap_map[x] = 0
thread 0: now call zswap_get_swap_cache_page
swapcache_prepare return -ENOENT because entry x is not used any more
zswap_get_swap_cache_page return ZSWAP_SWAPCACHE_NOMEM
zswap_writeback_entry do nothing except put refcount
Now, the memory of zswap_entry x and its zpage leak.
Modify:
- check the refcount in fail path, free memory if it is not referenced.
- use ZSWAP_SWAPCACHE_FAIL instead of ZSWAP_SWAPCACHE_NOMEM as the fail path
can be not only caused by nomem but also by invalidate.
[akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Weijie Yang <weijie.yang@samsung.com> Reviewed-by: Bob Liu <bob.liu@oracle.com> Cc: Minchan Kim <minchan@kernel.org> Cc: <stable@vger.kernel.org> Acked-by: Seth Jennings <sjenning@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Cyrill Gorcunov [Wed, 16 Oct 2013 20:46:53 +0000 (13:46 -0700)]
mm: /proc/pid/pagemap: inspect _PAGE_SOFT_DIRTY only on present pages
If a page we are inspecting is in swap we may occasionally report it as
having soft dirty bit (even if it is clean). The pte_soft_dirty helper
should be called on present pte only.
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Pavel Emelyanov <xemul@parallels.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Matt Mackall <mpm@selenic.com> Cc: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Peter Zijlstra <peterz@infradead.org> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Cyrill Gorcunov [Wed, 16 Oct 2013 20:46:51 +0000 (13:46 -0700)]
mm: migration: do not lose soft dirty bit if page is in migration state
If page migration is turned on in config and the page is migrating, we
may lose the soft dirty bit. If fork and mprotect are called on
migrating pages (once migration is complete) pages do not obtain the
soft dirty bit in the correspond pte entries. Fix it adding an
appropriate test on swap entries.
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Pavel Emelyanov <xemul@parallels.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Matt Mackall <mpm@selenic.com> Cc: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Peter Zijlstra <peterz@infradead.org> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>