Linus Torvalds [Thu, 24 Oct 2013 06:45:34 +0000 (07:45 +0100)]
Merge tag 'md/3.12-fixes' of git://neil.brown.name/md
Pull md bugfixes from Neil Brown:
"Assorted md bug-fixes for 3.12.
All tagged for -stable releases too"
* tag 'md/3.12-fixes' of git://neil.brown.name/md:
raid5: avoid finding "discard" stripe
raid5: set bio bi_vcnt 0 for discard request
md: avoid deadlock when md_set_badblocks.
md: Fix skipping recovery for read-only arrays.
Linus Torvalds [Thu, 24 Oct 2013 06:44:47 +0000 (07:44 +0100)]
Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"This is a set of two fixes which cause oopses (Buslogic, qla2xxx) and
one fix which may cause a hang because of request miscounting (sd)"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
[SCSI] sd: call blk_pm_runtime_init before add_disk
[SCSI] qla2xxx: Fix request queue null dereference.
[SCSI] BusLogic: Fix an oops when intializing multimaster adapter
majianpeng [Wed, 28 Aug 2013 11:40:33 +0000 (19:40 +0800)]
raid1: Relace raise_barrier/lower_barrrier with freeze_array/unfreeze_array for reconfigure the array.
We used raise_barrier to suspend nornal IO while we reconfigure the
array. Now we replace it with freeze_array.
Raise_barrier only suspends normal IO. But freeze_array not only susend
nornal io,but it can suspend resync io. For the place where call
raise_barrier for reconfigure, it don't has a problem.
Signed-off-by: Jianpeng Ma <majianpeng@gmail.com> Signed-off-by: NeilBrown <neilb@suse.de>
majianpeng [Wed, 28 Aug 2013 11:40:14 +0000 (19:40 +0800)]
raid1: Add a field array_frozen to indicate whether raid in freeze state.
Because the following patch will rewrite the content between normal IO
and resync IO. So we used a parameter to indicate whether raid is in freeze
array.
Signed-off-by: Jianpeng Ma <majianpeng@gmail.com> Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Tue, 22 Oct 2013 00:50:35 +0000 (11:50 +1100)]
md/raid5: avoid deadlock when raid5 array has unack badblocks during md_stop_writes.
When raid5 recovery hits a fresh badblock, this badblock will flagged as unack
badblock until md_update_sb() is called.
But md_stop will take reconfig lock which means raid5d can't call
md_update_sb() in md_check_recovery(), the badblock will always
be unack, so raid5d thread enters an infinite loop and md_stop_write()
can never stop sync_thread. This causes deadlock.
To solve this, when STOP_ARRAY ioctl is issued and sync_thread is
running, we need set md->recovery FROZEN and INTR flags and wait for
sync_thread to stop before we (re)take reconfig lock.
raid5: Retry R5_ReadNoMerge flag when hit a read error.
Because of block layer merge, one bio fails will cause other bios
which belongs to the same request fails, so raid5_end_read_request
will record all these bios as badblocks.
If retry request with R5_ReadNoMerge flag to avoid bios merge,
badblocks can only record sector which is bad exactly.
Shaohua Li [Tue, 10 Sep 2013 07:37:56 +0000 (15:37 +0800)]
raid5: relieve lock contention in get_active_stripe()
get_active_stripe() is the last place we have lock contention. It has two
paths. One is stripe isn't found and new stripe is allocated, the other is
stripe is found.
The first path basically calls __find_stripe and init_stripe. It accesses
conf->generation, conf->previous_raid_disks, conf->raid_disks,
conf->prev_chunk_sectors, conf->chunk_sectors, conf->max_degraded,
conf->prev_algo, conf->algorithm, the stripe_hashtbl and inactive_list. Except
stripe_hashtbl and inactive_list, other fields are changed very rarely.
With this patch, we split inactive_list and add new hash locks. Each free
stripe belongs to a specific inactive list. Which inactive list is determined
by stripe's lock_hash. Note, even a stripe hasn't a sector assigned, it has a
lock_hash assigned. Stripe's inactive list is protected by a hash lock, which
is determined by it's lock_hash too. The lock_hash is derivied from current
stripe_hashtbl hash, which guarantees any stripe_hashtbl list will be assigned
to a specific lock_hash, so we can use new hash lock to protect stripe_hashtbl
list too. The goal of the new hash locks introduced is we can only use the new
locks in the first path of get_active_stripe(). Since we have several hash
locks, lock contention is relieved significantly.
The first path of get_active_stripe() accesses other fields, since they are
changed rarely, changing them now need take conf->device_lock and all hash
locks. For a slow path, this isn't a problem.
If we need lock device_lock and hash lock, we always lock hash lock first. The
tricky part is release_stripe and friends. We need take device_lock first.
Neil's suggestion is we put inactive stripes to a temporary list and readd it
to inactive_list after device_lock is released. In this way, we add stripes to
temporary list with device_lock hold and remove stripes from the list with hash
lock hold. So we don't allow concurrent access to the temporary list, which
means we need allocate temporary list for all participants of release_stripe.
One downside is free stripes are maintained in their inactive list, they can't
across between the lists. By default, we have total 256 stripes and 8 lists, so
each list will have 32 stripes. It's possible one list has free stripe but
other list hasn't. The chance should be rare because stripes allocation are
even distributed. And we can always allocate more stripes for cache, several
mega bytes memory isn't a big deal.
This completely removes the lock contention of the first path of
get_active_stripe(). It slows down the second code path a little bit though
because we now need takes two locks, but since the hash lock isn't contended,
the overhead should be quite small (several atomic instructions). The second
path of get_active_stripe() (basically sequential write or big request size
randwrite) still has lock contentions.
Signed-off-by: Shaohua Li <shli@fusionio.com> Signed-off-by: NeilBrown <neilb@suse.de>
md/raid5.c: add proper locking to error path of raid5_start_reshape.
If raid5_start_reshape errors out, we need to reset all the fields
that were updated (not just some), and need to use the seq_counter
to ensure make_request() doesn't use an inconsitent state.
Shaohua Li [Sat, 19 Oct 2013 06:51:42 +0000 (14:51 +0800)]
raid5: avoid finding "discard" stripe
SCSI discard will damage discard stripe bio setting, eg, some fields are
changed. If the stripe is reused very soon, we have wrong bios setting. We
remove discard stripe from hash list, so next time the strip will be fully
initialized.
Suitable for backport to 3.7+.
Cc: <stable@vger.kernel.org> (3.7+) Signed-off-by: Shaohua Li <shli@fusionio.com> Signed-off-by: NeilBrown <neilb@suse.de>
Shaohua Li [Sat, 19 Oct 2013 06:50:28 +0000 (14:50 +0800)]
raid5: set bio bi_vcnt 0 for discard request
SCSI layer will add new payload for discard request. If two bios are merged
to one, the second bio has bi_vcnt 1 which is set in raid5. This will confuse
SCSI and cause oops.
Suitable for backport to 3.7+
Cc: stable@vger.kernel.org (v3.7+) Reported-by: Jes Sorensen <Jes.Sorensen@redhat.com> Signed-off-by: Shaohua Li <shli@fusionio.com> Signed-off-by: NeilBrown <neilb@suse.de> Acked-by: Martin K. Petersen <martin.petersen@oracle.com>
Bian Yu [Sat, 12 Oct 2013 05:10:03 +0000 (01:10 -0400)]
md: avoid deadlock when md_set_badblocks.
When operate harddisk and hit errors, md_set_badblocks is called after
scsi_restart_operations which already disabled the irq. but md_set_badblocks
will call write_sequnlock_irq and enable irq. so softirq can preempt the
current thread and that may cause a deadlock. I think this situation should
use write_sequnlock_irqsave/irqrestore instead.
This bug was introduce in commit 2e8ac30312973dd20e68073653
(the first time rdev_set_badblock was call from interrupt context),
so this patch is appropriate for 3.5 and subsequent kernels.
spares are activated on a read-only array. In case of raid1 and raid10
personalities it causes that not-in-sync devices are marked in-sync
without checking if recovery has been finished.
If a read-only array is degraded and one of its devices is not in-sync
(because the array has been only partially recovered) recovery will be skipped.
This patch adds checking if recovery has been finished before marking a device
in-sync for raid1 and raid10 personalities. In case of raid5 personality
such condition is already present (at raid5.c:6029).
Bug was introduced in 3.10 and causes data corruption.
Huang Shijie [Tue, 27 Aug 2013 09:29:07 +0000 (17:29 +0800)]
mtd: gpmi: imx6: fix the wrong method for checking ready/busy
In the imx6, all the ready/busy pins are binding togeter.
So we should always check the ready/busy pin of the chip 0.
In the other word, when the CS1 is enabled, we should also check the
ready/busy of chip 0; if we check the ready/busy of chip 1,
we will get the wrong result.
Signed-off-by: Huang Shijie <b32955@freescale.com> Signed-off-by: Brian Norris <computersforpeace@gmail.com>
Dave Chinner [Mon, 14 Oct 2013 22:17:56 +0000 (09:17 +1100)]
xfs: split xfs_rtalloc.c for userspace sanity
xfs_rtalloc.c is partially shared with userspace. Split the file up
into two parts - one that is kernel private and the other which is
wholly shared with userspace.
Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ben Myers <bpm@sgi.com>