However the feature can be useful for other relatively slow or untrusted
BDIs like USB flash drives and DVD+RW. The patch adds a knob to enable
the feature:
echo 1 > /sys/class/bdi/X:Y/strictlimit
When enabled, the feature enforces the bdi max_ratio limit even if the global
(10%) dirty limit is not reached. Of course, the effect is not visible
until /sys/class/bdi/X:Y/max_ratio is decreased to some reasonable value.
Signed-off-by: Maxim Patlasov <MPatlasov@parallels.com> Cc: Henrique de Moraes Holschuh <hmh@hmh.eng.br> Cc: Theodore Ts'o <tytso@mit.edu> Cc: "Artem S. Tashkinov" <t.artem@lycos.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Jan Kara <jack@suse.cz> Cc: Wu Fengguang <fengguang.wu@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Minchan Kim [Wed, 15 Jan 2014 05:56:13 +0000 (16:56 +1100)]
zram: remove zram->lock in read path and change it with mutex
Finally, we have separated the 32bit stat/table handling from its
dependency on zram->lock, so there is no longer any reason to use an
rw_semaphore between the read and write paths. This patch removes the lock
from the read path entirely and replaces the rw_semaphore with a mutex.
The allowed concurrency changes as follows:
Old:
read-read: OK
read-write: NO
write-write: NO
Now:
read-read: OK
read-write: OK
write-write: NO
The benchmark data show that the mixed workload performs about 11 times
better. The write-write path also improves, because the current
rw_semaphore doesn't support SPIN_ON_OWNER.
That is a side effect, but a welcome one.
Write-related tests perform better (from 61% to 1058%), while the
read path shows both gains and losses (from -2.22% to 1.45%), all of them
marginal and within stddev.
Minchan Kim [Wed, 15 Jan 2014 05:56:12 +0000 (16:56 +1100)]
zram: remove workqueue for freeing removed pending slot
a0c516c ("zram: don't grab mutex in zram_slot_free_noity") introduced the
pending free request code to avoid sleeping on a mutex under a spinlock,
but it was a mess that made the code lengthy and increased overhead.
Now we no longer need zram->lock to free a slot, so this patch reverts
that commit; tb_lock protects the slot free instead.
Signed-off-by: Minchan Kim <minchan@kernel.org> Cc: Nitin Gupta <ngupta@vflare.org> Tested-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: Jerome Marchand <jmarchan@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Minchan Kim [Wed, 15 Jan 2014 05:56:12 +0000 (16:56 +1100)]
zram: introduce zram->tb_lock
Currently, the table is protected by zram->lock, but that is a rather
coarse-grained lock and it hurts scalability.
Let's use a dedicated rwlock instead of depending on zram->lock. This patch
adds new locking, so it obviously makes things slower, but it is just
preparation for removing the coarse-grained rw_semaphore (i.e. zram->lock),
which is the hurdle for zram scalability.
The final patch in this series will remove the lock from the read path
and replace the rw_semaphore with a mutex in the write path. As a bonus, we
can drop the pending slot free mess in the next patch.
Signed-off-by: Minchan Kim <minchan@kernel.org> Cc: Nitin Gupta <ngupta@vflare.org> Tested-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: Jerome Marchand <jmarchan@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Minchan Kim [Wed, 15 Jan 2014 05:56:12 +0000 (16:56 +1100)]
zram: use atomic operation for stat
Some of the fields in zram->stats are protected by zram->lock, which
is rather coarse-grained, so let's use atomic operations without
explicit locking.
This patch prepares for removing the dependency on zram->lock in the
read path; that lock is a very coarse-grained rw_semaphore.
Of course, this patch adds new atomic operations, so it might slow things
down, but my 12-CPU test couldn't spot any regression.
All gains and losses are marginal and within stddev.
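As an illustration of the idea only, here is a minimal userspace sketch of counters updated with atomic operations instead of under a lock. It uses C11 <stdatomic.h> rather than the kernel's atomic API, and struct toy_stats with its fields and helpers is hypothetical, only loosely modelled on zram->stats.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical counters, loosely modelled on zram->stats. */
struct toy_stats {
    atomic_uint_fast64_t num_reads;
    atomic_uint_fast64_t failed_reads;
};

static void stats_init(struct toy_stats *s)
{
    atomic_init(&s->num_reads, 0);
    atomic_init(&s->failed_reads, 0);
}

/*
 * Each counter is bumped with a single atomic read-modify-write, so no
 * lock is needed around the update. Relaxed ordering is enough here
 * because the counters are independent statistics.
 */
static void stats_record_read(struct toy_stats *s, int failed)
{
    atomic_fetch_add_explicit(&s->num_reads, 1, memory_order_relaxed);
    if (failed)
        atomic_fetch_add_explicit(&s->failed_reads, 1, memory_order_relaxed);
}

static uint64_t stats_num_reads(struct toy_stats *s)
{
    return atomic_load_explicit(&s->num_reads, memory_order_relaxed);
}

static uint64_t stats_failed_reads(struct toy_stats *s)
{
    return atomic_load_explicit(&s->failed_reads, memory_order_relaxed);
}
```

The trade-off matches the one described above: every update now costs an atomic instruction, but readers and writers of the statistics no longer serialize on a shared lock.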
Minchan Kim [Wed, 15 Jan 2014 05:56:12 +0000 (16:56 +1100)]
zram: remove unnecessary free
a0c516cbfc ("zram: don't grab mutex in zram_slot_free_noity") introduced
a pending zram slot free in zram's write path, to cover a slot free missed
because of a memory allocation failure in zram_slot_free_notify. It is not
necessary, because we already free the slot right before overwriting it.
Signed-off-by: Minchan Kim <minchan@kernel.org> Cc: Nitin Gupta <ngupta@vflare.org> Cc: Jerome Marchand <jmarchan@redhat.com> Tested-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Minchan Kim [Wed, 15 Jan 2014 05:56:11 +0000 (16:56 +1100)]
zram: delay pending free request in read path
Sergey reported that we don't need to handle pending free requests on every
I/O, so this patch removes the handling from the read path while keeping it
in the write path.
Consider the example below.
The swap subsystem asks zram to free block "A" via swap_slot_free_notify, but
zram leaves the request pending without really freeing the block. The swap
subsystem then allocates block "A" for new data, and the long-pending request
is finally handled, so zram blindly frees the new data on block "A". :(
That's why we can't remove the pending free request handling right before
zram-write.
Minchan Kim [Wed, 15 Jan 2014 05:56:11 +0000 (16:56 +1100)]
zram: fix race between reset and flushing pending work
Dan and Sergey reported a race between reset and the flushing of
pending work: it could cause an oops, because reset frees zram->meta
while zram_slot_free can still access zram->meta if a new request is added
during the race window.
This patch moves the flush to after init_lock is taken, which blocks new
requests and so closes the race.
Signed-off-by: Minchan Kim <minchan@kernel.org> Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Cc: Nitin Gupta <ngupta@vflare.org> Cc: Jerome Marchand <jmarchan@redhat.com> Tested-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Vladimir Murzin [Wed, 15 Jan 2014 05:56:11 +0000 (16:56 +1100)]
arm: move arm_dma_limit to setup_dma_zone
Since 4dcfa600 ("ARM: DMA-API: better handing of DMA masks for coherent
allocations") arm_dma_limit_pfn has almost substituted the arm_dma_limit.
The remaining user is dma_contiguous_reserve(). It is also referenced in
setup_dma_zone() to calculate arm_dma_limit_pfn.
Kill the global arm_dma_limit and equip setup_dma_zone() with a local one.
Signed-off-by: Vladimir Murzin <murzin.v@gmail.com> Reported-by: Vassili Karpov <av1474@comtv.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Roman Gushchin [Wed, 15 Jan 2014 05:56:11 +0000 (16:56 +1100)]
kernel/smp.c: remove cpumask_ipi
After 9a46ad6 ("smp: make smp_call_function_many() use logic similar to
smp_call_function_single()"), cfd->cpumask is accessed only in
smp_call_function_many(). So there is no more need to copy it into
cfd->cpumask_ipi before putting csd into the list. The cpumask_ipi field
is obsolete and can be removed.
Signed-off-by: Roman Gushchin <klamm@yandex-team.ru> Cc: Ingo Molnar <mingo@kernel.org> Cc: Christoph Hellwig <hch@lst.de> Cc: Wang YanQing <udknight@gmail.com> Cc: Xie XiuQi <xiexiuqi@huawei.com> Cc: Shaohua Li <shli@fusionio.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
kernel: use lockless list for smp_call_function_single
Make smp_call_function_single and friends more efficient by using
a lockless list.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Andrew Morton [Wed, 15 Jan 2014 05:56:10 +0000 (16:56 +1100)]
block/blk-mq-cpu.c: use hotcpu_notifier()
Cc: Christoph Hellwig <hch@lst.de> Cc: Jan Kara <jack@suse.cz> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Levente Kurusa [Wed, 15 Jan 2014 05:56:08 +0000 (16:56 +1100)]
drivers/net/phy/mdio_bus.c: call put_device on device_register() failure
It is required to call put_device() if device_register() fails, so that we
give up the last reference to the device. Calling put_device allows for
mdiobus_release to be executed, kfreeing the bus.
Signed-off-by: Levente Kurusa <levex@linux.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Grant Likely <grant.likely@secretlab.ca> Cc: David Daney <david.daney@cavium.com> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
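The reference-counting rule behind this fix (and the two similar put_device fixes in this log) can be sketched in userspace. This is only an illustration under assumptions: every toy_* name is hypothetical and merely stands in for device_register(), put_device() and the release callback; it is not the driver-core implementation.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy refcounted object; all names here are hypothetical, not kernel APIs. */
struct toy_device {
    int refcount;
    void (*release)(struct toy_device *dev);
};

static int released_count;              /* lets a caller observe releases */

static void toy_release(struct toy_device *dev)
{
    released_count++;
    free(dev);                          /* release() owns the final free */
}

/* Stand-in for put_device(): drop a reference, release on the last one. */
static void toy_put(struct toy_device *dev)
{
    assert(dev->refcount > 0);
    if (--dev->refcount == 0)
        dev->release(dev);
}

/*
 * Stand-in for device_register(): the refcount is initialized to 1
 * before anything can fail, so the caller holds a reference even when
 * registration returns an error.
 */
static int toy_register(struct toy_device *dev, int should_fail)
{
    dev->refcount = 1;
    dev->release = toy_release;
    return should_fail ? -1 : 0;
}

/* Returns the registered device, or NULL after cleaning up correctly. */
static struct toy_device *toy_create(int should_fail)
{
    struct toy_device *dev = calloc(1, sizeof(*dev));

    if (!dev)
        return NULL;
    if (toy_register(dev, should_fail)) {
        toy_put(dev);                   /* not free(dev): give up the reference */
        return NULL;
    }
    return dev;
}
```

Calling toy_create(1) exercises the failure path: the last reference is dropped through toy_put(), so the release callback runs exactly once and does the freeing.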
Levente Kurusa [Wed, 15 Jan 2014 05:56:08 +0000 (16:56 +1100)]
drivers/video/backlight/lcd.c: call put_device if device_register fails
Currently we kfree the container of the device which failed to register.
This is wrong as the last reference is not given up with a put_device
call. Also, now that put_device() is called, we no longer need the
kfree as the new_ld->dev.release function will take care of kfreeing the
associated memory.
Signed-off-by: Levente Kurusa <levex@linux.com> Acked-by: Jingoo Han <jg1.han@samsung.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Levente Kurusa [Wed, 15 Jan 2014 05:56:08 +0000 (16:56 +1100)]
drivers/w1/w1_int.c: call put_device if device_register fails
Currently the device is memset and kfreed on failure, which is bad
behaviour: the device still has a reference count of 1, so freeing it
directly can cause trouble. The proper way to handle a failed
device_register is to call put_device right after it fails.
Minchan Kim [Wed, 15 Jan 2014 05:56:08 +0000 (16:56 +1100)]
zram: promote zram from staging
Zram has lived in staging for a LONG LONG time and has been
fixed/improved by many contributors, so the code is clean and stable now. Of
course, there are lots of products using zram in real practice.
The major TV companies have used zram as swap for two years, and
recently our production team released an Android smartphone with zram
used as swap, too. Recently Android KitKat started to use zram for
low-memory smartphones. There was also a report that Google released
ChromeOS with zram, and CyanogenMod has used zram for a long time.
I have heard that some distros use a zram block device for tmpfs. In
addition, I have seen many reports from many other people. For example,
Lubuntu has started to use it.
The benefit of zram is very clear. In my experience, one benefit
was removing jitter in video applications under background memory pressure.
That is partly an effect of efficient memory usage through compression, but
the bigger issue is whether the system has swap at all. Recent mobile
platforms use Java, so there are many anonymous pages. But embedded systems
are normally reluctant to use eMMC or an SD card as swap because of
wear-leveling and latency issues. Without swap, we
can't reclaim anonymous pages and in the end we can hit an OOM kill. :(
Even with real storage as swap, there is a problem, too: it
sometimes ends up making the system very unresponsive because of slow swap
storage performance.
Quote from Luigi at Google
"
Since Chrome OS was mentioned: the main reason why we don't use swap
to a disk (rotating or SSD) is because it doesn't degrade gracefully
and leads to a bad interactive experience. Generally we prefer to
manage RAM at a higher level, by transparently killing and restarting
processes. But we noticed that zram is fast enough to be competitive
with the latter, and it lets us make more efficient use of the
available RAM.
"
His announcement: http://www.spinics.net/lists/linux-mm/msg57717.html
Another use case is to use zram as a block device. Zram is a block device, so
anyone can format it and mount it; some people on the
internet have started using zram as /var/tmp.
http://forums.gentoo.org/viewtopic-t-838198-start-0.html
Let's promote zram and enhance/maintain it instead of removing.
Signed-off-by: Minchan Kim <minchan@kernel.org> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Acked-by: Nitin Gupta <ngupta@vflare.org> Acked-by: Pekka Enberg <penberg@kernel.org> Cc: Bob Liu <bob.liu@oracle.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Hugh Dickins <hughd@google.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Luigi Semenzato <semenzato@google.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Rik van Riel <riel@redhat.com> Cc: Seth Jennings <sjenning@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Minchan Kim [Wed, 15 Jan 2014 05:56:07 +0000 (16:56 +1100)]
zsmalloc: move it under mm
This patch moves zsmalloc under the mm directory.
Before that, this description explains why we need a custom allocator.
Zsmalloc is a new slab-based memory allocator for storing compressed
pages. It is designed for low fragmentation and a high allocation success
rate on large objects that are still <= PAGE_SIZE.
zsmalloc differs from the kernel slab allocator in two primary ways to
achieve these design goals.
zsmalloc never requires high order page allocations to back slabs, or
"size classes" in zsmalloc terms. Instead it allows multiple single-order
pages to be stitched together into a "zspage" which backs the slab. This
allows for higher allocation success rate under memory pressure.
Also, zsmalloc allows objects to span page boundaries within the zspage.
This allows for lower fragmentation than could be had with the kernel slab
allocator for objects between PAGE_SIZE/2 and PAGE_SIZE. With the kernel
slab allocator, if a page compresses to 60% of its original size, the
memory savings gained through compression are lost to fragmentation because
another object of the same size can't be stored in the leftover space.
This ability to span pages results in zsmalloc allocations not being
directly addressable by the user. The user is given a non-dereferenceable
handle in response to an allocation request. That handle must be mapped,
using zs_map_object(), which returns a pointer to the mapped region that
can be used. The mapping is necessary since the object data may reside in
two different noncontiguous pages.
Zsmalloc fulfills the allocation needs of zram perfectly.
[sjenning@linux.vnet.ibm.com: borrow Seth's quote] Signed-off-by: Minchan Kim <minchan@kernel.org> Acked-by: Nitin Gupta <ngupta@vflare.org> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Bob Liu <bob.liu@oracle.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Hugh Dickins <hughd@google.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Luigi Semenzato <semenzato@google.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Pekka Enberg <penberg@kernel.org> Cc: Rik van Riel <riel@redhat.com> Cc: Seth Jennings <sjenning@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
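The fragmentation argument in the description above can be checked with a little arithmetic. The sketch below assumes a 4096-byte page and a three-page zspage purely for illustration. The helper names are hypothetical, and the real zsmalloc picks the number of pages per zspage for each size class, which this ignores.

```c
#include <assert.h>

#define ASSUMED_PAGE_SIZE 4096u         /* assumption for the example */

/* Objects that fit when each object must lie within a single page. */
static unsigned objs_no_spanning(unsigned pages, unsigned obj_size)
{
    assert(obj_size > 0 && obj_size <= ASSUMED_PAGE_SIZE);
    return pages * (ASSUMED_PAGE_SIZE / obj_size);
}

/* Objects that fit when objects may cross page boundaries in a zspage. */
static unsigned objs_spanning(unsigned pages, unsigned obj_size)
{
    assert(obj_size > 0 && obj_size <= ASSUMED_PAGE_SIZE);
    return (pages * ASSUMED_PAGE_SIZE) / obj_size;
}

/* Bytes left unused in the backing pages for a given object count. */
static unsigned wasted_bytes(unsigned pages, unsigned obj_size,
                             unsigned nr_objs)
{
    assert(nr_objs * obj_size <= pages * ASSUMED_PAGE_SIZE);
    return pages * ASSUMED_PAGE_SIZE - nr_objs * obj_size;
}
```

With objects of 2457 bytes (60% of 4096, rounded down), three pages hold 3 objects without spanning and waste 4917 bytes, while the same three pages treated as one zspage hold 5 objects and waste only 3 bytes.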
Wanpeng Li [Wed, 15 Jan 2014 05:56:07 +0000 (16:56 +1100)]
mm/migrate.c: fix setting of cpupid on page migration twice against normal page
7851a45cd3 ("mm: numa: Copy cpupid on page migration") copies over the
cpupid at page migration time. It is therefore unnecessary to set it again in
alloc_misplaced_dst_page().
Signed-off-by: Wanpeng Li <liwanp@linux.vnet.ibm.com> Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Acked-by: Mel Gorman <mgorman@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Chen Gang [Wed, 15 Jan 2014 05:56:06 +0000 (16:56 +1100)]
kernel/kexec.c: use vscnprintf() instead of vsnprintf() in vmcoreinfo_append_str()
vsnprintf() may return a value 'r' larger than sizeof(buf). In that case, if
'r' is still less than "vmcoreinfo_max_size - vmcoreinfo_size" (the space left
in the destination buffer), the following memcpy() reads past the end of buf.
Signed-off-by: Chen Gang <gang.chen@asianux.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
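The difference between the two return conventions can be shown with a userspace sketch. bounded_vsnprintf() below is a hypothetical stand-in that mimics the documented behaviour of the kernel's vscnprintf() (return the number of characters actually stored), and append_str() only mirrors the general shape of vmcoreinfo_append_str(); neither is the kernel code.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/*
 * Returns the number of characters actually stored in buf, excluding the
 * trailing NUL, never the length the output "would have had".
 */
static size_t bounded_vsnprintf(char *buf, size_t size, const char *fmt,
                                va_list args)
{
    int r;

    if (size == 0)
        return 0;
    r = vsnprintf(buf, size, fmt, args);
    if (r < 0)
        return 0;                       /* encoding error: nothing usable */
    if ((size_t)r >= size)
        return size - 1;                /* output was truncated */
    return (size_t)r;
}

/* Appends formatted text to dst and returns the new used length. */
static size_t append_str(char *dst, size_t dst_size, size_t dst_used,
                         const char *fmt, ...)
{
    char buf[16];                       /* deliberately small scratch buffer */
    va_list args;
    size_t r;

    assert(dst_used <= dst_size);

    va_start(args, fmt);
    r = bounded_vsnprintf(buf, sizeof(buf), fmt, args);
    va_end(args);

    if (r > dst_size - dst_used)
        r = dst_size - dst_used;        /* clamp to the space left in dst */
    memcpy(dst + dst_used, buf, r);     /* r < sizeof(buf): no overread */
    return dst_used + r;
}
```

With plain vsnprintf(), formatting a 31-character string into the 16-byte scratch buffer would return 31, and a memcpy() of 31 bytes would read beyond buf. With the bounded variant the copy length is capped at 15.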