git.kernelconcepts.de Git - karo-tx-linux.git/log

mtd: nand: mtk: add ->setup_data_interface() hook

Currently, we use the fixed ACC timing 0x10804211. This is not the best
setting for each case. Actually, MTK NAND controller can adapt ACC timings
dynamically according to nfi clock frequence.
Implement the ->setup_data_interface() hook to optimize driver performance.

Signed-off-by: Xiaolei Li <xiaolei.li@mediatek.com>
Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>

mtd: nand: mtk: remove unneeded mtk_ecc_hw_init from mtk_ecc_resume

There is no need to add mtk_ecc_hw_init during ecc resume, because there
always takes mtk_ecc_wait_idle in the function mtk_ecc_enable.

Signed-off-by: Xiaolei Li <xiaolei.li@mediatek.com>
Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>

mtd: nand: mtk: remove unneeded mtk_nfc_hw_init from mtk_nfc_resume

chip->select_chip will do nfc runtime configuration. There is no need to
do mtk_nfc_hw_init before it.

Signed-off-by: Xiaolei Li <xiaolei.li@mediatek.com>
Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>

mtd: nand: mtk: disable ecc irq when writing page with hwecc

Currently, ecc encode irq is enabled when writing page with hwecc, but
we actually do not wait for this irq done. Because NFI and ECC work in
parallel, nfi irq and ecc irq almost come together.

Now, there are two steps to check whether page data are totally written.
First, wait for nfi irq INTR_AHB_DONE. This is to ensure all data
in RAM are received by NFI.
Second, polling the register NFI_ADDRCNTR till all data include ecc
parity data runtime generated by ECC are sent to NAND device.

So, it is redunant to enable ecc irq without waiting for it.

Signed-off-by: Xiaolei Li <xiaolei.li@mediatek.com>
Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>

mtd: nand: mtk: fix incorrect register setting order about ecc irq

Currently, we trigger ECC HW before setting ecc irq. It is incorrect.
Because ECC starts working once the register ECC_CTL_REG is set as
ECC_OP_ENABLE. And this may lead an abnormal behavior of ecc irq.
So, should enable ecc irq at first, then trigger ECC.

Fixes: 1d6b1e464950 ("mtd: mediatek: driver for MTK Smart Device")
Signed-off-by: Xiaolei Li <xiaolei.li@mediatek.com>
Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>

MAINTAINERS: add entry for Denali NAND controller driver

The Denali NAND controller driver (drivers/mtd/nand/denali*) has been
largely reworked by me. Add myself as its maintainer now.

Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>

dt-bindings: gpmc: Correct location of generic gpmc binding

The binding bus/ti-gpmc.txt has been moved to
memory-controllers/omap-gpmc.txt. Update all references to this in
order to make reading and understanding a given binding easier.

Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Brian Norris <computersforpeace@gmail.com>
Cc:Boris Brezillon <boris.brezillon@free-electrons.com>
Cc: Marek Vasut <marek.vasut@gmail.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: Cyrille Pitchen <cyrille.pitchen@wedev4u.fr>
Cc: Rob Herring <robh+dt@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Tom Rini <trini@konsulko.com>
Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>

dt-bindings: mtd: elm: Correct compatible string requirement

The binding says that the compatible string must be "ti,am33xx-elm"
but the code checks only for, and all functioning users set, this as
"ti,am3352-elm" so correct the binding.

Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Brian Norris <computersforpeace@gmail.com>
Cc: Boris Brezillon <boris.brezillon@free-electrons.com>
Cc: Marek Vasut <marek.vasut@gmail.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: Cyrille Pitchen <cyrille.pitchen@wedev4u.fr>
Cc: Rob Herring <robh+dt@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Tom Rini <trini@konsulko.com>
Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>

mtd: nand: ifc: Initialize SRAM for all version >= 1.0

All IFC version >= 1.0 use 28nm technology for SRAM. Here SRAM has
a requirement to initialize before any read operation performed for
avoiding ECC Error.

So update condition check to initialize SRAM for all IFC version >= 1.0.0

Signed-off-by: Prabhakar Kushwaha <prabhakar.kushwaha@nxp.com>
Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>

mtd: nand: denali: avoid magic numbers and rename for clarification

Introduce some macros and helpers to avoid magic numbers and
rename macros/functions for clarification.

- We see '| 2' in several places.  This means Data Cycle in MAP11 mode.
  The Denali User's Guide says bit[1:0] of MAP11 is like follows:

  b'00 = Command Cycle
  b'01 = Address Cycle
  b'10 = Data Cycle

  So, this commit added DENALI_MAP11_{CMD,ADDR,DATA} macros.

- We see 'denali->flash_mem + 0x10' in several places, but 0x10 is a
  magic number.  Actually, this accesses the data port of the Host
  Data/Command Interface.  So, this commit added DENALI_HOST_DATA.
  On the other hand, 'denali->flash_mem' gets access to the address
  port, so DENALI_HOST_ADDR was also added.

- We see 'index_addr(denali, cmd, 0x1)' in denali_erase(), but 0x1
  is a magic number.  0x1 means the erase operation.  Replace 0x1
  with DENALI_ERASE.

- Rename index_addr() to denali_host_write() for clarification

- Denali User's Guide says MAP{00,01,10,11} for access mode.  Match
  the macros with terminology in the IP document.

- Rename struct members as follows:
  flash_bank   -> active_bank    (currently selected bank)
  flash_reg    -> reg            (base address of registers)
  flash_mem    -> host           (base address of host interface)
  devnum       -> devs_per_cs    (devices connected in parallel)
  bbtskipbytes -> oob_skip_bytes (number of bytes to skip in OOB)

Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>

mtd: nand: denali: enable bad block table scan

Now this driver is ready to remove NAND_SKIP_BBTSCAN.

The BBT descriptors in denali.c are equivalent to the ones in
nand_bbt.c. There is no need to duplicate the equivalent structures.
The with-oob decriptors do not work for this driver anyway.

The bbt_pattern (offs = 8) and the version (veroffs = 12) area
overlaps the ECC area. Set NAND_BBT_NO_OOB flag to use the no_oob
variant of the BBT descriptors.

Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>

mtd: nand: denali: use non-managed kmalloc() for DMA buffer

As Russell and Lars stated in the discussion [1], using
devm_k*alloc() with DMA is not a good idea.

Let's use kmalloc (not kzalloc because no need for zero-out).
Also, allocate the buffer as late as possible because it must be
freed for any error that follows.

[1] https://lkml.org/lkml/2017/3/8/693

Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Cc: Russell King <rmk+kernel@armlinux.org.uk>
Cc: Lars-Peter Clausen <lars@metafoo.de>
Acked-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>

mtd: nand: denali: skip driver internal bounce buffer when possible

For ecc->read_page() and ecc->write_page(), it is possible to call
dma_map_single() against the given buffer. This bypasses the driver
internal bounce buffer and save the memcpy().

Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>

mtd: nand: denali: support hardware-assisted erased page detection

Recent versions of this IP support automatic erased page detection.
If an erased page is detected on reads, the controller does not set
INTR__ECC_UNCOR_ERR, but INTR__ERASED_PAGE.

The detection of erased pages is based on the number of zeros in a
page; if the number of zeros is less than the value in the field
ERASED_THRESHOLD, the page is assumed as erased.

Please note ERASED_THRESHOLD specifies the number of zeros in a _page_
instead of an ECC chunk.  Moreover, the controller does not provide a
way to know the actual number of bitflips.

Actually, an erased page (all 0xff) is not an ECC correctable pattern
on the Denali ECC engine.  In other words, there may be overlap between
the following two:

[1] a bit pattern reachable from a valid payload + ECC pattern within
    ecc.strength bitflips
[2] a bit pattern reachable from an erased state (all 0xff) within
    ecc.strength bitflips

So, this feature may intercept ECC correctable patterns, then replace
[1] with [2].

After all, this feature can work safely only when ECC_THRESHOLD == 1,
i.e. detect erased pages without any bitflips.  This should be the
case most of the time.  If there is a bitflip or more, the driver will
fallback to the software method by using nand_check_erased_ecc_chunk().

Strangely enough, the driver still has to fill the buffer with 0xff
in case of INTR__ERASED_PAGE because the ECC correction engine has
already manipulated the data in the buffer before it judges erased
pages.

Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>

mtd: nand: denali: fix raw and oob accessors for syndrome page layout

The Denali IP adopts the syndrome page layout; payload and ECC are
interleaved, with BBM area always placed at the beginning of OOB.

The figure below shows the page organization for ecc->steps == 2:

  |----------------|    |-----------|
  |                |    |           |
  |                |    |           |
  |    Payload0    |    |           |
  |                |    |           |
  |                |    |           |
  |                |    |           |
  |----------------|    |  in-band  |
  |      ECC0      |    |   area    |
  |----------------|    |           |
  |                |    |           |
  |                |    |           |
  |    Payload1    |    |           |
  |                |    |           |
  |                |    |           |
  |----------------|    |-----------|
  |      BBM       |    |           |
  |----------------|    |           |
  |Payload1 (cont.)|    |           |
  |----------------|    |out-of-band|
  |      ECC1      |    |    area   |
  |----------------|    |           |
  |    OOB free    |    |           |
  |----------------|    |-----------|

The current raw / oob accessors do not take that into consideration,
so in-band and out-of-band data are transferred as stored in the
device.  In the case above,

  in-band:      Payload0 + ECC0 + Payload1(partial)
  out-of-band:  BBM + Payload1(cont.) + ECC1 + OOB-free

This is wrong.  As the comment block of struct nand_ecc_ctrl says,
driver callbacks must hide the specific layout used by the hardware
and always return contiguous in-band and out-of-band data.

The current implementation is completely screwed-up, so read/write
callbacks must be re-worked.

Also, it is reasonable to support PIO transfer in case DMA may not
work for some reasons.  Actually, the Data DMA may not be equipped
depending on the configuration of the RTL.  This can be checked by
reading the bit 4 of the FEATURES register.  Even if the controller
has the DMA support, dma_set_mask() and dma_map_single() could fail.
In either case, the driver can fall back to the PIO transfer.  Slower
access would be better than giving up.

Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>

mtd: nand: denali: use flag instead of register macro for direction

It is not a good idea to re-use macros that represent a specific
register bit field for the transfer direction.

It is true that bit 8 indicates the direction for the MAP10 pipeline
operation and the data DMA operation, but this is not valid across
the IP.

Use a simple flag (write: 1, read: 0) for the direction.

Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>

mtd: nand: denali: merge struct nand_buf into struct denali_nand_info

Now struct nand_buf has only two members, so I see no reason for the
separation.

Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>

mtd: nand: denali: propagate page to helpers via function argument

This driver stores the currently addressed page into denali->page,
which is later read out by helper functions. While I am tackling on
this driver, I often missed to insert "denali->page = page;" where
needed. This makes page_read/write callbacks to get access to a
wrong page, which is a bug hard to figure out.

Instead, I'd rather pass the page via function argument because the
compiler's prototype checks will help to detect bugs.

For the same reason, propagate dma_addr to the DMA helpers instead
of denali->buf.dma_buf .

Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>

mtd: nand: denali: use interrupt instead of polling for bank reset

The current bank reset implementation polls the INTR_STATUS register
until interested bits are set.  This is not good because:

- polling simply wastes time-slice of the thread

- The while() loop may continue eternally if no bit is set, for
  example, due to the controller problem.  The denali_wait_for_irq()
  uses wait_for_completion_timeout(), which is safer.

We can use interrupt by moving the denali_reset_bank() call below
the interrupt setup.

Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>

mtd: nand: denali: fix bank reset function to detect the number of chips

The nand_scan_ident() iterates over maxchips, and calls nand_reset()
for each.  This driver currently passes the maximum number of banks
(=chip selects) supported by the controller as maxchips.  So, maxchips
is typically 4 or 8.  Usually, less number of NAND chips are connected
to the controller.

This can be a problem for ONFi devices.  Now, this driver implements
->setup_data_interface() hook, so nand_setup_data_interface() issues
Set Features (0xEF) command, which waits until the chip returns R/B#
response.  If no chip there, we know it never happens, but the driver
still ends up with waiting for a long time.  It will finally bail-out
with timeout error and the driver will work with existing chips, but
unnecessary wait will give a bad user experience.

The denali_nand_reset() polls the INTR__RST_COMP and INTR__TIME_OUT
bits, but they are always set even if not NAND chip is connected to
that bank.  To know the chip existence, INTR__INT_ACT bit must be
checked; this flag is set only when R/B# is toggled.  Since the Reset
(0xFF) command toggles the R/B# pin, this can be used to know the
actual number of chips, and update denali->max_banks.

Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>

mtd: nand: denali: switch over to cmd_ctrl instead of cmdfunc

The NAND_CMD_SET_FEATURES support is missing from denali_cmdfunc().
We also see /* TODO: Read OOB data */ comment.

It would be possible to add more commands along with the current
implementation, but having ->cmd_ctrl() seems a better approach from
the discussion with Boris [1].

Rely on the default ->cmdfunc() from the framework and implement the
driver's own ->cmd_ctrl().

This transition also fixes NAND_CMD_STATUS and NAND_CMD_PARAM handling.
NAND_CMD_STATUS was just faked by the register read, so the only valid
bit was the WP bit. NAND_CMD_PARAM was completely broken; not only the
command sent on the bus was NAND_CMD_STATUS instead of NAND_CMD_PARAM,
but also the driver was only reading 8 bytes, while the parameter page
contains several hundreds of bytes.

Also add ->write_byte(), which is needed for write direction commands,
->read/write_buf(16), which will be used some commits later.
->read_word() is not used for now, but the core may call it in the
future.

Now, this driver can drop nand_onfi_get_set_features_notsupp().

[1] https://lkml.org/lkml/2017/3/15/97

Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>

mtd: nand: denali: rework interrupt handling

Simplify the interrupt handling and fix issues:

- The register field view of INTR_EN / INTR_STATUS is different
  among IP versions.  The global macro DENALI_IRQ_ALL is hard-coded
  for Intel platforms.  The interrupt mask should be determined at
  run-time depending on the running platform.

- wait_for_irq() loops do {} while() until interested flags are
  asserted.  The logic can be simplified.

- The spin_lock() guard seems too complex (and suspicious in a race
  condition if wait_for_completion_timeout() bails out by timeout).

- denali->complete is reused again and again, but reinit_completion()
  is missing.  Add it.

Re-work the code to make it more robust and easier to handle.

While we are here, also rename the jump label "failed_req_irq" to
more appropriate "disable_irq".

Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>

mtd: nand: denali: handle timing parameters by setup_data_interface()

Handling timing parameters in a driver's own way should be avoided
because it duplicates efforts of drivers/mtd/nand/nand_timings.c
Besides, this driver hard-codes Intel specific parameters such as
CLK_X=5, CLK_MULTI=4.  Taking a certain device (Samsung K9WAG08U1A)
into account by get_samsung_nand_para() is weird as well.

Now, the core framework provides .setup_data_interface() hook, which
handles timing parameters in a generic manner.

While I am working on this, I found even more issues in the current
code, so fixed the following as well:

- In recent IP versions, WE_2_RE and TWHR2 share the same register.
  Likewise for ADDR_2_DATA and TCWAW, CS_SETUP_CNT and TWB.  When
  updating one, the other must be masked.  Otherwise, the other will
  be set to 0, then timing settings will be broken.

- The recent IP release expanded the ADDR_2_DATA to 7-bit wide.
  This register is related to tADL.  As commit 74a332e78e8f ("mtd:
  nand: timings: Fix tADL_min for ONFI 4.0 chips") addressed, the
  ONFi 4.0 increased the minimum of tADL to 400 nsec.  This may not
  fit in the 6-bit ADDR_2_DATA in older versions.  Check the IP
  revision and handle this correctly, otherwise the register value
  would wrap around.

Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>

mtd: nand: denali: remove unneeded find_valid_banks()

The function find_valid_banks() issues the Read ID (0x90) command,
then compares the first byte (Manufacturer ID) of each bank with
the one of bank0.

This is equivalent to what nand_scan_ident() does. The number of
chips is detected there, so this is unneeded.

What is worse for find_valid_banks() is that, if multiple chips are
connected to INTEL_CE4100 platform, it crashes the kernel by BUG().
This is what we should avoid. This function is just harmful and
unneeded.

Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>

mtd: nand: denali: set NAND_ECC_CUSTOM_PAGE_ACCESS

The denali_cmdfunc() actually does nothing valuable for
NAND_CMD_{PAGEPROG,READ0,SEQIN}.

For NAND_CMD_{READ0,SEQIN}, it copies "page" to "denali->page", then
denali_read_page(_raw) compares them just for the sanity check.
(Inconsistently, this check is missing from denali_write_page(_raw).)

The Denali controller is equipped with high level read/write interface,
so let's skip unneeded call of cmdfunc().

If NAND_ECC_CUSTOM_PAGE_ACCESS is set, nand_write_page() will not
call ->waitfunc hook. So, ->write_page(_raw) hooks should directly
return -EIO on failure. The error handling of page writes will be
much simpler.

Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>

mtd: nand: atmel: drop unused include

The Atmel NAND driver doesn't used anything from
linux/platform_data/atmel.h, stop including it.

Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>

mtd: nand: denali_dt: add compatible strings for UniPhier SoC variants

Add two compatible strings for UniPhier SoC family.

"socionext,uniphier-denali-nand-v5a" is used on UniPhier sLD3, LD4,
Pro4, sLD8.

"socionext,uniphier-denali-nand-v5b" is used on UniPhier Pro5, PXs2,
LD6b, LD11, LD20.

Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Acked-by: Rob Herring <robh@kernel.org>
Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>

mtd: nand: denali: remove Toshiba and Hynix specific fixup code

The Denali IP can automatically detect device parameters such as
page size, oob size, device width, etc. and this driver currently
relies on it.  However, this hardware function is known to be
problematic.

[1] Due to a hardware bug, various misdetected cases were reported.
    That is why get_toshiba_nand_para() and get_hynix_nand_para()
    exist to fix-up the misdetected parameters.  It is not realistic
    to add a new NAND device to the *black list* every time we are
    hit by a misdetected case.  We would never be able to guarantee
    that all cases are covered.

[2] Because this feature is unreliable, it is disabled on some
    platforms.

The nand_scan_ident() detects device parameters in a more tested
way.  The hardware should not set the device parameter registers in
a different, unreliable way.  Instead, set the parameters from the
nand_scan_ident() back to the registers.

Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>

mtd: nand: denali: avoid hard-coding ECC step, strength, bytes

This driver was originally written for the Intel MRST platform with
several platform-specific parameters hard-coded.

Currently, the ECC settings are hard-coded as follows:

  #define ECC_SECTOR_SIZE 512
  #define ECC_8BITS       14
  #define ECC_15BITS      26

Therefore, the driver can only support two cases.
- ecc.size = 512, ecc.strength = 8    --> ecc.bytes = 14
- ecc.size = 512, ecc.strength = 15   --> ecc.bytes = 26

However, these are actually customizable parameters, for example,
UniPhier platform supports the following:

- ecc.size = 1024, ecc.strength = 8   --> ecc.bytes = 14
- ecc.size = 1024, ecc.strength = 16  --> ecc.bytes = 28
- ecc.size = 1024, ecc.strength = 24  --> ecc.bytes = 42

So, we need to handle the ECC parameters in a more generic manner.
Fortunately, the Denali User's Guide explains how to calculate the
ecc.bytes.  The formula is:

  ecc.bytes = 2 * CEIL(13 * ecc.strength / 16)  (for ecc.size = 512)
  ecc.bytes = 2 * CEIL(14 * ecc.strength / 16)  (for ecc.size = 1024)

For DT platforms, it would be reasonable to allow DT to specify ECC
strength by either "nand-ecc-strength" or "nand-ecc-maximize".  If
none of them is specified, the driver will try to meet the chip's ECC
requirement.

For PCI platforms, the max ECC strength is used to keep the original
behavior.

Newer versions of this IP need ecc.size and ecc.steps explicitly
set up via the following registers:
  CFG_DATA_BLOCK_SIZE       (0x6b0)
  CFG_LAST_DATA_BLOCK_SIZE  (0x6c0)
  CFG_NUM_DATA_BLOCKS       (0x6d0)

For older IP versions, write accesses to these registers are just
ignored.

Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Acked-by: Rob Herring <robh@kernel.org>
Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>

mtd: nand: add a shorthand to generate nand_ecc_caps structure

struct nand_ecc_caps was designed as flexible as possible to support
multiple stepsizes (like sunxi_nand.c).

So, we need to write multiple arrays even for the simplest case.
I guess many controllers support a single stepsize, so here is a
shorthand macro for the case.

It allows to describe like ...

NAND_ECC_CAPS_SINGLE(denali_pci_ecc_caps, denali_calc_ecc_bytes, 512, 8, 15);

... instead of

static const int denali_pci_ecc_strengths[] = {8, 15};
static const struct nand_ecc_step_info denali_pci_ecc_stepinfo = {
        .stepsize = 512,
        .strengths = denali_pci_ecc_strengths,
        .nstrengths = ARRAY_SIZE(denali_pci_ecc_strengths),
};
static const struct nand_ecc_caps denali_pci_ecc_caps = {
        .stepinfos = &denali_pci_ecc_stepinfo,
        .nstepinfos = 1,
        .calc_ecc_bytes = denali_calc_ecc_bytes,
};

Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>

mtd: nand: add generic helpers to check, match, maximize ECC settings

Driver are responsible for setting up ECC parameters correctly.
Those include:
  - Check if ECC parameters specified (usually by DT) are valid
  - Meet the chip's ECC requirement
  - Maximize ECC strength if NAND_ECC_MAXIMIZE flag is set

The logic can be generalized by factoring out common code.

This commit adds 3 helpers to the NAND framework:
nand_check_ecc_caps - Check if preset step_size and strength are valid
nand_match_ecc_req - Match the chip's requirement
nand_maximize_ecc - Maximize the ECC strength

To use the helpers above, a driver needs to provide:
  - Data array of supported ECC step size and strength
  - A hook that calculates ECC bytes from the combination of
    step_size and strength.

By using those helpers, code duplication among drivers will be
reduced.

Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>

mtd: nand: denali: use BIT() and GENMASK() for register macros

Use BIT() and GENMASK() for register field macros. This will make
it easier to compare the macros with the register description in the
Denali User's Guide.

Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>

mtd: nand: denali_dt: clean up resource ioremap

No need to use two struct resource pointers. Just reuse one.

Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>

mtd: nand: gpmi: fix typo in comment

Signed-off-by: Matthias Lange <matthias.lange@kernkonzept.com>
Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>

mtd: nand: gpmi: Fix typo in data structure name

This makes it easier to grep.

Signed-off-by: Matthias Lange <matthias.lange@kernkonzept.com>
Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>

mtd: nand: orion: Handle return value of clk_prepare_enable

clk_prepare_enable() can fail here and we must check its return value.

Signed-off-by: Arvind Yadav <arvind.yadav.cs@gmail.com>
Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>

mtd: nand: fsl_ifc: fix handing of bit flips in erased pages

If we see unrecoverable ECC error, we need to count number of bitflips
from all-ones and report correctable/uncorrectable according to
that. Otherwise we report ECC failed on erased flash with single bit error.

Signed-off-by: Pavel Machek <pavel@denx.de>
Reported-by: Darwin Dingel <Darwin.Dingel@alliedtelesis.co.nz>
Acked-by: Darwin Dingel <Darwin.Dingel@alliedtelesis.co.nz>
Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>

mtd: nand: mediatek: add support for MT2712 NAND FLASH Controller

MT2712 NAND FLASH Controller is similar to MT2701 except those following:
(1) MT2712 supports up to 148B spare size per 1KB size sector (the same
    with 74B spare size per 512B size sector). There are three new spare
    format: 61, 67, 74.
(2) MT2712 supports up to 80 bit ecc strength. There are three new ecc
    strength level: 68, 72, 80.
(3) MT2712 ECC encode parity data register's start offset is 0x300, and
    different with 0x10 of MT2701.
(4) MT2712 improves ecc irq function. When ECC works in ECC_NFI_MODE,
    MT2701 will generate ecc irq number the same with ecc steps during
    page read. However, MT2712 can only generate one ecc irq.

Changes of this patch are:
(1) add two new variables named pg_irq_sel, encode_parity_reg0 in struct
    mtk_ecc_caps.
(2) add new bitfield ECC_PG_IRQ_SEL for register ECC_IRQ_REG.
(3) add ecc strength array of mt2712.
(4) add spare size array of mt2712.
(5) add mt2712 nfc and ecc device compatiable and data.

Signed-off-by: Xiaolei Li <xiaolei.li@mediatek.com>
Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>

mtd: nand: mediatek: add support for different MTK NAND FLASH Controller IP

ECC strength and spare size supported may be different among MTK NAND
FLASH Controller IPs.

This patch contains changes as following:
(1) add new struct mtk_nfc_caps to support different spare size.
(2) add new struct mtk_ecc_caps to support different ecc strength.
(3) remove ECC_CNFG_xBIT define, use a for loop to do ecc strength config.
(4) remove PAGEFMT_SPARE_ define, use a for loop to do spare format config.
(5) malloc ecc->eccdata buffer according to max ecc strength of this IP.

Signed-off-by: Xiaolei Li <xiaolei.li@mediatek.com>
Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>

mtd: nand: mediatek: refine register NFI_PAGEFMT setting

The register NFI_PAGEFMT is always 32 bits length, so it is better to
do register program using writel() compare with writew().

Signed-off-by: Xiaolei Li <xiaolei.li@mediatek.com>
Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>

mtd: nand: mediatek: update DT bindings

Add MT2712 NAND Flash Controller dt bindings documentation.

Signed-off-by: Xiaolei Li <xiaolei.li@mediatek.com>
Reviewed-by: Matthias Brugger <matthias.bgg@gmail.com>
Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>

mtd: nand: atmel: mark resume function __maybe_unused

The newly added suspend/resume support causes a harmless warning:

drivers/mtd/nand/atmel/nand-controller.c:2513:12: error: 'atmel_nand_controller_resume' defined but not used [-Werror=unused-function]

This shuts up the warning with a __maybe_unused annotation.

Fixes: b107007a7114 ("mtd: nand: atmel: Add PM ops")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>

mtd: nand: check ecc->total sanity in nand_scan_tail

Drivers are supposed to set correct ecc->{size,strength,bytes} before
calling nand_scan_tail(), but it does not complain about ecc->total
bigger than oobsize.

In this case, chip->scan_bbt() crashes due to memory corruption, but
it is hard to debug. It would be kind to fail it earlier with a clear
message.

Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>

mtd: nand: Support 'EXIT GET STATUS' command in nand_command[_lp]()

READ0 is sometimes used to exit GET STATUS mode. When this is the case
no address cycles are requested, and we can use this information to
detect that READSTART should not be issued after READ0 or that we
shouldn't wait for the chip to be ready.

Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>
Tested-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>

mtd: nand: Wait for PAGEPROG to finish in drivers setting NAND_ECC_CUSTOM_PAGE_ACCESS

Drivers setting NAND_ECC_CUSTOM_PAGE_ACCESS are supposed to handle the
full read/write page sequence, and waiting for a page to actually be
programmed is part of this write-page sequence.
This is also what is done in ->write_oob_xxx() hooks, so let's do that in
->write_page_xxx() as well to make it consistent.

Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>

mtd: nand: tango: Fix incorrect use of SEQIN command

SEQIN is supposed to be used when one wants to start programming a page.
What we want here is just to change the column within the page, which is
done with the RNDIN command.

Fixes: 6956e2385a16 ("mtd: nand: add tango NAND flash controller support")
Cc: stable@vger.kernel.org
Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>
Acked-by: Marc Gonzalez <marc_gonzalez@sigmadesigns.com>

mtd: nand: sunxi: Remove unneeded ->cmdfunc(NAND_CMD_READ0, 0, page)

The core already sends the NAND_CMD_READ0 for us. Duplicating this call
in the driver is useless and introduces a perf penalty.

Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>

mtd: nand: sunxi: Actually use DMA for subpage reads

ecc->read_subpage is set to sunxi_nfc_hw_ecc_read_subpage_dma when
->dmac != NULL, but is then unconditionally overwritten in the common
init path.

Remove this extra assignment to allow usage of the DMA operation when
possible.

Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>

mtd: nand: Drop the ->errstat() hook

The ->errstat() hook is no longer implemented NAND controller drivers.
Get rid of it before someone starts abusing it.

Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>

mtd: nand: Drop unused cached programming support

Cached programming is always skipped, so drop the associated code until
we decide to really support it.

Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>

mtd: nand: atmel: Add PM ops

Provide a ->resume() hook to make sure the NAND timings are correctly
restored by resetting all chips connected to the controller.

Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>

mtd: nand: atmel: Add ->setup_data_interface() hooks

The NAND controller IP can adapt the NAND controller timings dynamically.
Implement the ->setup_data_interface() hook to support this feature.

Note that it's not supported on at91rm9200 because this SoC has a
completely different SMC block, which is not supported yet.

Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>

mtd: nand: Pass the CS line to ->setup_data_interface()

Some NAND controllers can assign different NAND timings to different
CS lines. Pass the CS line information to ->setup_data_interface() so
that the NAND controller driver knows which CS line is concerned by
the setup_data_interface() request.

Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>

mtd: nand: gpmi: Kill gpmi_nand_exit()

The only user of gpmi_nand_exit() is gpmi_nand_remove(). Move its content
to the caller.

Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>
Reviewed-by: Marek Vasut <marek.vasut@gmail.com>
Acked-by: Han Xu <han.xu@nxp.com>

mtd: nand: gpmi: Fix gpmi_nand_init() error path

The GPMI driver is wrongly assuming that nand_release() can safely be
called on an uninitialized/unregistered NAND device.

Add a new err_nand_cleanup label in the error path and only execute if
nand_scan_tail() succeeded.

Note that we now call nand_cleanup() instead of nand_release()
(nand_release() is actually grouping the mtd_device_unregister() and
nand_cleanup() in one call) because there's no point in trying to
unregister a device that has never been registered.

Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>
Reviewed-by: Marek Vasut <marek.vasut@gmail.com>
Acked-by: Han Xu <han.xu@nxp.com>
Reviewed-by: Marek Vasut <marek.vasut@gmail.com>

mtd: gpmi: document current clock requirements

The clock requirements are completely missing, add the clocks
currently required by the driver.

Signed-off-by: Stefan Agner <stefan@agner.ch>
Acked-by: Rob Herring <robh@kernel.org>
Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>

mtd: nand: gpmi: add i.MX 7 SoC support

Add support for i.MX 7 SoC. The i.MX 7 has a slightly different
clock architecture requiring only two clocks to be referenced.
The IP is slightly different compared to i.MX 6, but currently none
of this differences are in use, therefore reuse GPMI_IS_MX6.

Signed-off-by: Stefan Agner <stefan@agner.ch>
Reviewed-by: Marek Vasut <marek.vasut@gmail.com>
Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>

mtd: nand: gpmi: unify clock handling

Add device specific list of clocks required, and handle all clocks
in a single for loop. This avoids further code duplication when
adding i.MX 7 support.

Signed-off-by: Stefan Agner <stefan@agner.ch>
Reviewed-by: Marek Vasut <marek.vasut@gmail.com>
Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>

mtd: nand: Optimize checking of erased buffers

If we see ~0UL in flash, there's no need for hweight, and no need to
check number of bitflips. So this should be net win.

Signed-off-by: Pavel Machek <pavel@denx.de>
Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>

mtd: nand: fsmc_nand: handle on-die ECC case

This commit adjusts the fsmc_nand driver so that it accepts the
NAND_ECC_ON_DIE case. It simply does nothing in this case, since both
the ECC operations and OOB layout will be defined by the NAND chip code
rather than by the NAND controller code.

Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Reviewed-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>

mtd: nand: add support for Micron on-die ECC

Now that the core NAND subsystem has support for on-die ECC, this commit
brings the necessary code to support on-die ECC on Micron NANDs.

In micron_nand_init(), we detect if the Micron NAND chip supports on-die
ECC mode, by checking a number of conditions:

- It must be an ONFI NAND
- It must be a SLC NAND

- Enabling *and* disabling on-die ECC must work

- The on-die ECC must be correcting 4 bits per 512 bytes of data. Some
   Micron NAND chips have an on-die ECC able to correct 8 bits per 512
   bytes of data, but they work slightly differently and therefore we
   don't support them in this patch.

Then, if the on-die ECC cannot be disabled (some Micron NAND have on-die
ECC forcefully enabled), we bail out, as we don't support such
NANDs. Indeed, the implementation of raw_read()/raw_write() make the
assumption that on-die ECC can be disabled. Support for Micron NANDs
with on-die ECC forcefully enabled can easily be added, but in the
absence of such HW for testing, we preferred to simply bail out.

If the on-die ECC is supported, and requested in the Device Tree, then
it is indeed enabled, by using custom implementations of the
->read_page(), ->read_page_raw(), ->write_page() and ->write_page_raw()
operation to properly handle the on-die ECC.

In the non-raw functions, we need to enable the internal ECC engine
before issuing the NAND_CMD_READ0 or NAND_CMD_SEQIN commands, which is
why we set the NAND_ECC_CUSTOM_PAGE_ACCESS option at initialization
time (it asks the NAND core to let the NAND driver issue those
commands).

Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>

mtd: nand: Make sure drivers not supporting SET/GET_FEATURES return -ENOTSUPP

A lot of drivers are providing their own ->cmdfunc(), and most of the
time this implementation does not support all possible NAND operations.
But since ->cmdfunc() cannot return an error code, the core has no way
to know that the operation it requested is not supported.

This is a problem we cannot address for all kind of operations with the
current design, but we can prevent these silent failures for the
GET/SET FEATURES operation by overloading the default
->onfi_{set,get}_features() methods with one returning -ENOTSUPP.

Reported-by: Chris Packham <Chris.Packham@alliedtelesis.co.nz>
Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>
Tested-by: Chris Packham <Chris.Packham@alliedtelesis.co.nz>

mtd: nand: export nand_{read,write}_page_raw()

The nand_read_page_raw() and nand_write_page_raw() functions might be
re-used by vendor-specific implementations of the read_page/write_page
functions. Instead of having vendor-specific code duplicate this code,
it is much better to export those functions and allow them to be
re-used.

Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Reviewed-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>

mtd: nand: add core support for on-die ECC

A number of NAND flashes have a capability called "on-die ECC" where the
NAND chip itself is capable of detecting and correcting errors.

Linux already has support for using the ECC implementation of the NAND
controller, or a software based ECC implementation, but not for using
the ECC implementation of the NAND controller. However, such an
implementation is sometimes useful in situations where the NAND
controller provides ECC algorithms that are not strong enough for the
NAND chip used on the system. A typical case is a NAND chip that
requires a 4-bit ECC, while the NAND controller only provides a 1-bit
ECC algorithm.

This commit introduces the support for the NAND_ECC_ON_DIE ECC mode:

- Parsing of the "on-die" value for the "nand-ecc-mode" Device Tree
   property

- Handling NAND_ECC_ON_DIE case in nand_scan_tail(). The idea is that
   the vendor specific code for the NAND chip must implement
   ->read_page() and ->write_page(). It may optionally provide its own
   ->read_page_raw() and ->write_page_raw() as well. For OOB operation,
   we assume the standard operations are good enough, but they can be
   overridden by the vendor specific code if needed.

Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Reviewed-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>

dt-bindings: mtd: document new "on-die" nand-ecc-mode

A number of NAND chips support a feature called on-die ECC, where the
NAND chip itself is capable of doing error detection and correction. The
new "on-die" value for nand-ecc-mode indicates that we want this
functionality to be used.

Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Acked-by: Rob Herring <robh@kernel.org>
Reviewed-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>

mtd: nand: fsmc: remove default timings

When timings are no longer provided by the Device Tree, we now use the
SDR timings specified by the NAND flash, and such SDR timings are always
provided. Therefore, it is no longer necessary to keep "default" timings
in the fmsc driver.

Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>

mtd: nand: fsmc: add support for SDR timings

Until now, the fsmc_nand driver was either using controller timings
specified in the Device Tree (through FSMC specific DT properties) or
alternatively default/fallback timings.

This commit implements support to use the timings advertised by the NAND
chip itself, by implementing the ->setup_data_interface() hook. To
preserve backward compatibility, if timings are specified in the Device
Tree, we use the timings from the Device Tree (and don't implement
->setup_data_interface).

Many thanks to Boris Brezillon for coming up with the logic to convert
the NAND chip timings into the timings expected by the FSMC controller.

Also, since the timings are now not only coming from the DT, the message
warning that default timings will be used is removed.

Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>

mtd: nand: fsmc: reduce number of arguments of fsmc_nand_setup()

In preparation for the introduction of support for using SDR timings
exposed by the NAND flash instead of hard-coded timings, this commit
reworks the fsmc_nand_setup() function to take a "struct fsmc_nand_data"
as argument, which already contains the I/O registers base address, bank
and bus width information.

The timings is also currently contained in the "struct fsmc_nand_data",
but we still pass it as a separate argument because the support for
using SDR timings will pass a different value.

Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>

mtd: nand: davinci: set ECC algorithm explicitly for HW based ECC

If ECC strength is 4bits/512bytes the algorithm of the ECC engine is
BCH, otherwise (1bit/512bytes) Hamming is used.

Signed-off-by: Alexander Couzens <lynxis@fe80.eu>
Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>

mtd: nand: jz4780: Use mtd_set_ooblayout() to set the ooblayout

The mtd_set_ooblayout() accesor has been added to hide internals of
mtd_info and ease future refactoring. Call mtd_set_ooblayout() instead of
directly accessing mtd->ooblayout.

Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>
Acked-by: Harvey Hunt <harveyhuntnexus@gmail.com>

mtd: nand: Add Mediatek machine dependency

The Mediatek NAND driver is only needed for a specific
platform, so avoid cluttering the configuration.

Signed-off-by: Ezequiel Garcia <ezequiel@vanguardiasur.com.ar>
Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>

mtd: nand: Add Hisilicon machine dependency

The Hisilicon NAND driver is only needed for a specific
platform, so avoid cluttering the configuration.

Signed-off-by: Ezequiel Garcia <ezequiel@vanguardiasur.com.ar>
Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>

Linux 4.12-rc1

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input

Pull some more input subsystem updates from Dmitry Torokhov:
"An updated xpad driver with a few more recognized device IDs, and a
  new psxpad-spi driver, allowing connecting Playstation 1 and 2 joypads
  via SPI bus"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
  Input: cros_ec_keyb - remove extraneous 'const'
  Input: add support for PlayStation 1/2 joypads connected via SPI
  Input: xpad - add USB IDs for Mad Catz Brawlstick and Razer Sabertooth
  Input: xpad - sync supported devices with xboxdrv
  Input: xpad - sort supported devices by USB ID

Merge tag 'upstream-4.12-rc1' of git://git.infradead.org/linux-ubifs

Pull UBI/UBIFS updates from Richard Weinberger:

- new config option CONFIG_UBIFS_FS_SECURITY

- minor improvements

- random fixes

* tag 'upstream-4.12-rc1' of git://git.infradead.org/linux-ubifs:
  ubi: Add debugfs file for tracking PEB state
  ubifs: Fix a typo in comment of ioctl2ubifs & ubifs2ioctl
  ubifs: Remove unnecessary assignment
  ubifs: Fix cut and paste error on sb type comparisons
  ubi: fastmap: Fix slab corruption
  ubifs: Add CONFIG_UBIFS_FS_SECURITY to disable/enable security labels
  ubi: Make mtd parameter readable
  ubi: Fix section mismatch

Merge branch 'for-linus-4.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml

Pull UML fixes from Richard Weinberger:
"No new stuff, just fixes"

* 'for-linus-4.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml:
  um: Add missing NR_CPUS include
  um: Fix to call read_initrd after init_bootmem
  um: Include kbuild.h instead of duplicating its macros
  um: Fix PTRACE_POKEUSER on x86_64
  um: Set number of CPUs
  um: Fix _print_addr()

Merge branch 'akpm' (patches from Andrew)

Merge misc fixes from Andrew Morton:
"15 fixes"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
  mm, docs: update memory.stat description with workingset* entries
  mm: vmscan: scan until it finds eligible pages
  mm, thp: copying user pages must schedule on collapse
  dax: fix PMD data corruption when fault races with write
  dax: fix data corruption when fault races with write
  ext4: return to starting transaction in ext4_dax_huge_fault()
  mm: fix data corruption due to stale mmap reads
  dax: prevent invalidation of mapped DAX entries
  Tigran has moved
  mm, vmalloc: fix vmalloc users tracking properly
  mm/khugepaged: add missed tracepoint for collapse_huge_page_swapin
  gcov: support GCC 7.1
  mm, vmstat: Remove spurious WARN() during zoneinfo print
  time: delete current_fs_time()
  hwpoison, memcg: forcibly uncharge LRU pages

mm, docs: update memory.stat description with workingset* entries

Commit 4b4cea91691d ("mm: vmscan: fix IO/refault regression in cache
workingset transition") introduced three new entries in memory stat
file:

- workingset_refault
- workingset_activate
- workingset_nodereclaim

This commit adds a corresponding description to the cgroup v2 docs.

Link: http://lkml.kernel.org/r/1494530293-31236-1-git-send-email-guro@fb.com
Signed-off-by: Roman Gushchin <guro@fb.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Li Zefan <lizefan@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

mm: vmscan: scan until it finds eligible pages

Although there are a ton of free swap and anonymous LRU page in elgible
zones, OOM happened.

  balloon invoked oom-killer: gfp_mask=0x17080c0(GFP_KERNEL_ACCOUNT|__GFP_ZERO|__GFP_NOTRACK), nodemask=(null),  order=0, oom_score_adj=0
  CPU: 7 PID: 1138 Comm: balloon Not tainted 4.11.0-rc6-mm1-zram-00289-ge228d67e9677-dirty #17
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
  Call Trace:
   oom_kill_process+0x21d/0x3f0
   out_of_memory+0xd8/0x390
   __alloc_pages_slowpath+0xbc1/0xc50
   __alloc_pages_nodemask+0x1a5/0x1c0
   pte_alloc_one+0x20/0x50
   __pte_alloc+0x1e/0x110
   __handle_mm_fault+0x919/0x960
   handle_mm_fault+0x77/0x120
   __do_page_fault+0x27a/0x550
   trace_do_page_fault+0x43/0x150
   do_async_page_fault+0x2c/0x90
   async_page_fault+0x28/0x30
  Mem-Info:
  active_anon:424716 inactive_anon:65314 isolated_anon:0
   active_file:52 inactive_file:46 isolated_file:0
   unevictable:0 dirty:27 writeback:0 unstable:0
   slab_reclaimable:3967 slab_unreclaimable:4125
   mapped:133 shmem:43 pagetables:1674 bounce:0
   free:4637 free_pcp:225 free_cma:0
  Node 0 active_anon:1698864kB inactive_anon:261256kB active_file:208kB inactive_file:184kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:532kB dirty:108kB writeback:0kB shmem:172kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no
  DMA free:7316kB min:32kB low:44kB high:56kB active_anon:8064kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15992kB managed:15908kB mlocked:0kB slab_reclaimable:464kB slab_unreclaimable:40kB kernel_stack:0kB pagetables:24kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
  lowmem_reserve[]: 0 992 992 1952
  DMA32 free:9088kB min:2048kB low:3064kB high:4080kB active_anon:952176kB inactive_anon:0kB active_file:36kB inactive_file:0kB unevictable:0kB writepending:88kB present:1032192kB managed:1019388kB mlocked:0kB slab_reclaimable:13532kB slab_unreclaimable:16460kB kernel_stack:3552kB pagetables:6672kB bounce:0kB free_pcp:56kB local_pcp:24kB free_cma:0kB
  lowmem_reserve[]: 0 0 0 959
  Movable free:3644kB min:1980kB low:2960kB high:3940kB active_anon:738560kB inactive_anon:261340kB active_file:188kB inactive_file:640kB unevictable:0kB writepending:20kB present:1048444kB managed:1010816kB mlocked:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:832kB local_pcp:60kB free_cma:0kB
  lowmem_reserve[]: 0 0 0 0
  DMA: 1*4kB (E) 0*8kB 18*16kB (E) 10*32kB (E) 10*64kB (E) 9*128kB (ME) 8*256kB (E) 2*512kB (E) 2*1024kB (E) 0*2048kB 0*4096kB = 7524kB
  DMA32: 417*4kB (UMEH) 181*8kB (UMEH) 68*16kB (UMEH) 48*32kB (UMEH) 14*64kB (MH) 3*128kB (M) 1*256kB (H) 1*512kB (M) 2*1024kB (M) 0*2048kB 0*4096kB = 9836kB
  Movable: 1*4kB (M) 1*8kB (M) 1*16kB (M) 1*32kB (M) 0*64kB 1*128kB (M) 2*256kB (M) 4*512kB (M) 1*1024kB (M) 0*2048kB 0*4096kB = 3772kB
  378 total pagecache pages
  17 pages in swap cache
  Swap cache stats: add 17325, delete 17302, find 0/27
  Free swap  = 978940kB
  Total swap = 1048572kB
  524157 pages RAM
  0 pages HighMem/MovableOnly
  12629 pages reserved
  0 pages cma reserved
  0 pages hwpoisoned
  [ pid ]   uid  tgid total_vm      rss nr_ptes nr_pmds swapents oom_score_adj name
  [  433]     0   433     4904        5      14       3       82             0 upstart-udev-br
  [  438]     0   438    12371        5      27       3      191         -1000 systemd-udevd

With investigation, skipping page of isolate_lru_pages makes reclaim
void because it returns zero nr_taken easily so LRU shrinking is
effectively nothing and just increases priority aggressively.  Finally,
OOM happens.

The problem is that get_scan_count determines nr_to_scan with eligible
zones so although priority drops to zero, it couldn't reclaim any pages
if the LRU contains mostly ineligible pages.

get_scan_count:

        size = lruvec_lru_size(lruvec, lru, sc->reclaim_idx);
size = size >> sc->priority;

Assumes sc->priority is 0 and LRU list is as follows.

N-N-N-N-H-H-H-H-H-H-H-H-H-H-H-H-H-H-H-H

(Ie, small eligible pages are in the head of LRU but others are
almost ineligible pages)

In that case, size becomes 4 so VM want to scan 4 pages but 4 pages from
tail of the LRU are not eligible pages.  If get_scan_count counts
skipped pages, it doesn't reclaim any pages remained after scanning 4
pages so it ends up OOM happening.

This patch makes isolate_lru_pages try to scan pages until it encounters
eligible zones's pages.

[akpm@linux-foundation.org: clean up mind-bending `for' statement.  Tweak comment text]
Fixes: 3db65812d688 ("Revert "mm, vmscan: account for skipped pages as a partial scan"")
Link: http://lkml.kernel.org/r/1494457232-27401-1-git-send-email-minchan@kernel.org
Signed-off-by: Minchan Kim <minchan@kernel.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

mm, thp: copying user pages must schedule on collapse

We have encountered need_resched warnings in __collapse_huge_page_copy()
while doing {clear,copy}_user_highpage() over HPAGE_PMD_NR source pages.

mm->mmap_sem is held for write, but the iteration is well bounded.

Reschedule as needed.

Link: http://lkml.kernel.org/r/alpine.DEB.2.10.1705101426380.109808@chino.kir.corp.google.com
Signed-off-by: David Rientjes <rientjes@google.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

dax: fix PMD data corruption when fault races with write

This is based on a patch from Jan Kara that fixed the equivalent race in
the DAX PTE fault path.

Currently DAX PMD read fault can race with write(2) in the following
way:

CPU1 - write(2)                 CPU2 - read fault
                                dax_iomap_pmd_fault()
                                  ->iomap_begin() - sees hole

dax_iomap_rw()
  iomap_apply()
    ->iomap_begin - allocates blocks
    dax_iomap_actor()
      invalidate_inode_pages2_range()
        - there's nothing to invalidate

                                  grab_mapping_entry()
  - we add huge zero page to the radix tree
    and map it to page tables

The result is that hole page is mapped into page tables (and thus zeros
are seen in mmap) while file has data written in that place.

Fix the problem by locking exception entry before mapping blocks for the
fault.  That way we are sure invalidate_inode_pages2_range() call for
racing write will either block on entry lock waiting for the fault to
finish (and unmap stale page tables after that) or read fault will see
already allocated blocks by write(2).

Fixes: 9f141d6ef6258 ("dax: Call ->iomap_begin without entry lock during dax fault")
Link: http://lkml.kernel.org/r/20170510172700.18991-1-ross.zwisler@linux.intel.com
Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

dax: fix data corruption when fault races with write

Currently DAX read fault can race with write(2) in the following way:

CPU1 - write(2) CPU2 - read fault
dax_iomap_pte_fault()
  ->iomap_begin() - sees hole
dax_iomap_rw()
  iomap_apply()
    ->iomap_begin - allocates blocks
    dax_iomap_actor()
      invalidate_inode_pages2_range()
        - there's nothing to invalidate
  grab_mapping_entry()
  - we add zero page in the radix tree
    and map it to page tables

The result is that hole page is mapped into page tables (and thus zeros
are seen in mmap) while file has data written in that place.

Fix the problem by locking exception entry before mapping blocks for the
fault.  That way we are sure invalidate_inode_pages2_range() call for
racing write will either block on entry lock waiting for the fault to
finish (and unmap stale page tables after that) or read fault will see
already allocated blocks by write(2).

Fixes: 9f141d6ef6258a3a37a045842d9ba7e68f368956
Link: http://lkml.kernel.org/r/20170510085419.27601-5-jack@suse.cz
Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

ext4: return to starting transaction in ext4_dax_huge_fault()

DAX will return to locking exceptional entry before mapping blocks for a
page fault to fix possible races with concurrent writes. To avoid lock
inversion between exceptional entry lock and transaction start, start
the transaction already in ext4_dax_huge_fault().

Fixes: 9f141d6ef6258a3a37a045842d9ba7e68f368956
Link: http://lkml.kernel.org/r/20170510085419.27601-4-jack@suse.cz
Signed-off-by: Jan Kara <jack@suse.cz>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

mm: fix data corruption due to stale mmap reads

Currently, we didn't invalidate page tables during invalidate_inode_pages2()
for DAX.  That could result in e.g. 2MiB zero page being mapped into
page tables while there were already underlying blocks allocated and
thus data seen through mmap were different from data seen by read(2).
The following sequence reproduces the problem:

- open an mmap over a 2MiB hole

- read from a 2MiB hole, faulting in a 2MiB zero page

- write to the hole with write(3p). The write succeeds but we
   incorrectly leave the 2MiB zero page mapping intact.

- via the mmap, read the data that was just written. Since the zero
   page mapping is still intact we read back zeroes instead of the new
   data.

Fix the problem by unconditionally calling invalidate_inode_pages2_range()
in dax_iomap_actor() for new block allocations and by properly
invalidating page tables in invalidate_inode_pages2_range() for DAX
mappings.

Fixes: c6dcf52c23d2d3fb5235cec42d7dd3f786b87d55
Link: http://lkml.kernel.org/r/20170510085419.27601-3-jack@suse.cz
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

dax: prevent invalidation of mapped DAX entries

Patch series "mm,dax: Fix data corruption due to mmap inconsistency",
v4.

This series fixes data corruption that can happen for DAX mounts when
page faults race with write(2) and as a result page tables get out of
sync with block mappings in the filesystem and thus data seen through
mmap is different from data seen through read(2).

The series passes testing with t_mmap_stale test program from Ross and
also other mmap related tests on DAX filesystem.

This patch (of 4):

dax_invalidate_mapping_entry() currently removes DAX exceptional entries
only if they are clean and unlocked.  This is done via:

  invalidate_mapping_pages()
    invalidate_exceptional_entry()
      dax_invalidate_mapping_entry()

However, for page cache pages removed in invalidate_mapping_pages()
there is an additional criteria which is that the page must not be
mapped.  This is noted in the comments above invalidate_mapping_pages()
and is checked in invalidate_inode_page().

For DAX entries this means that we can can end up in a situation where a
DAX exceptional entry, either a huge zero page or a regular DAX entry,
could end up mapped but without an associated radix tree entry.  This is
inconsistent with the rest of the DAX code and with what happens in the
page cache case.

We aren't able to unmap the DAX exceptional entry because according to
its comments invalidate_mapping_pages() isn't allowed to block, and
unmap_mapping_range() takes a write lock on the mapping->i_mmap_rwsem.

Since we essentially never have unmapped DAX entries to evict from the
radix tree, just remove dax_invalidate_mapping_entry().

Fixes: c6dcf52c23d2 ("mm: Invalidate DAX radix tree entries only if appropriate")
Link: http://lkml.kernel.org/r/20170510085419.27601-2-jack@suse.cz
Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Reported-by: Jan Kara <jack@suse.cz>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: <stable@vger.kernel.org> [4.10+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Tigran has moved

Cc: Tigran Aivazian <aivazian.tigran@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

mm, vmalloc: fix vmalloc users tracking properly

Commit 1f5307b1e094 ("mm, vmalloc: properly track vmalloc users") has
pulled asm/pgtable.h include dependency to linux/vmalloc.h and that
turned out to be a bad idea for some architectures.  E.g.  m68k fails
with

   In file included from arch/m68k/include/asm/pgtable_mm.h:145:0,
                    from arch/m68k/include/asm/pgtable.h:4,
                    from include/linux/vmalloc.h:9,
                    from arch/m68k/kernel/module.c:9:
   arch/m68k/include/asm/mcf_pgtable.h: In function 'nocache_page':
>> arch/m68k/include/asm/mcf_pgtable.h:339:43: error: 'init_mm' undeclared (first use in this function)
    #define pgd_offset_k(address) pgd_offset(&init_mm, address)

as spotted by kernel build bot. nios2 fails for other reason

  In file included from include/asm-generic/io.h:767:0,
                   from arch/nios2/include/asm/io.h:61,
                   from include/linux/io.h:25,
                   from arch/nios2/include/asm/pgtable.h:18,
                   from include/linux/mm.h:70,
                   from include/linux/pid_namespace.h:6,
                   from include/linux/ptrace.h:9,
                   from arch/nios2/include/uapi/asm/elf.h:23,
                   from arch/nios2/include/asm/elf.h:22,
                   from include/linux/elf.h:4,
                   from include/linux/module.h:15,
                   from init/main.c:16:
  include/linux/vmalloc.h: In function '__vmalloc_node_flags':
  include/linux/vmalloc.h:99:40: error: 'PAGE_KERNEL' undeclared (first use in this function); did you mean 'GFP_KERNEL'?

which is due to the newly added #include <asm/pgtable.h>, which on nios2
includes <linux/io.h> and thus <asm/io.h> and <asm-generic/io.h> which
again includes <linux/vmalloc.h>.

Tweaking that around just turns out a bigger headache than necessary.
This patch reverts 1f5307b1e094 and reimplements the original fix in a
different way.  __vmalloc_node_flags can stay static inline which will
cover vmalloc* functions.  We only have one external user
(kvmalloc_node) and we can export __vmalloc_node_flags_caller and
provide the caller directly.  This is much simpler and it doesn't really
need any games with header files.

[akpm@linux-foundation.org: coding-style fixes]
[mhocko@kernel.org: revert old comment]
Link: http://lkml.kernel.org/r/20170509211054.GB16325@dhcp22.suse.cz
Fixes: 1f5307b1e094 ("mm, vmalloc: properly track vmalloc users")
Link: http://lkml.kernel.org/r/20170509153702.GR6481@dhcp22.suse.cz
Signed-off-by: Michal Hocko <mhocko@suse.com>
Cc: Tobias Klauser <tklauser@distanz.ch>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

mm/khugepaged: add missed tracepoint for collapse_huge_page_swapin

One return case of `__collapse_huge_page_swapin()` does not invoke
tracepoint while every other return case does. This commit adds a
tracepoint invocation for the case.

Link: http://lkml.kernel.org/r/20170507101813.30187-1-sj38.park@gmail.com
Signed-off-by: SeongJae Park <sj38.park@gmail.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

gcov: support GCC 7.1

Starting from GCC 7.1, __gcov_exit is a new symbol expected to be
implemented in a profiling runtime.

[akpm@linux-foundation.org: coding-style fixes]
[mliska@suse.cz: v2]
Link: http://lkml.kernel.org/r/e63a3c59-0149-c97e-4084-20ca8f146b26@suse.cz
Link: http://lkml.kernel.org/r/8c4084fa-3885-29fe-5fc4-0d4ca199c785@suse.cz
Signed-off-by: Martin Liska <mliska@suse.cz>
Acked-by: Peter Oberparleiter <oberpar@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

mm, vmstat: Remove spurious WARN() during zoneinfo print

After commit e2ecc8a79ed4 ("mm, vmstat: print non-populated zones in
zoneinfo"), /proc/zoneinfo will show unpopulated zones.

A memoryless node, having no populated zones at all, was previously
ignored, but will now trigger the WARN() in is_zone_first_populated().

Remove this warning, as its only purpose was to warn of a situation that
has since been enabled.

Aside: The "per-node stats" are still printed under the first populated
zone, but that's not necessarily the first stanza any more. I'm not
sure which criteria is more important with regard to not breaking
parsers, but it looks a little weird to the eye.

Fixes: e2ecc8a79ed4 ("mm, vmstat: print node-based stats in zoneinfo file")
Link: http://lkml.kernel.org/r/1493854905-10918-1-git-send-email-arbab@linux.vnet.ibm.com
Signed-off-by: Reza Arbab <arbab@linux.vnet.ibm.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

time: delete current_fs_time()

All uses of the current_fs_time() function have been replaced by other
time interfaces.

And, its use cases can be fulfilled by current_time() or ktime_get_*
variants.

Link: http://lkml.kernel.org/r/1491613030-11599-13-git-send-email-deepa.kernel@gmail.com
Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

hwpoison, memcg: forcibly uncharge LRU pages

Laurent Dufour has noticed that hwpoinsoned pages are kept charged.  In
his particular case he has hit a bad_page("page still charged to
cgroup") when onlining a hwpoison page.  While this looks like something
that shouldn't happen in the first place because onlining hwpages and
returning them to the page allocator makes only little sense it shows a
real problem.

hwpoison pages do not get freed usually so we do not uncharge them (at
least not since commit 0a31bc97c80c ("mm: memcontrol: rewrite uncharge
API")).  Each charge pins memcg (since e8ea14cc6ead ("mm: memcontrol:
take a css reference for each charged page")) as well and so the
mem_cgroup and the associated state will never go away.  Fix this leak
by forcibly uncharging a LRU hwpoisoned page in delete_from_lru_cache().
We also have to tweak uncharge_list because it cannot rely on zero ref
count for these pages.

[akpm@linux-foundation.org: coding-style fixes]
Fixes: 0a31bc97c80c ("mm: memcontrol: rewrite uncharge API")
Link: http://lkml.kernel.org/r/20170502185507.GB19165@dhcp22.suse.cz
Signed-off-by: Michal Hocko <mhocko@suse.com>
Reported-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
Tested-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
Reviewed-by: Balbir Singh <bsingharora@gmail.com>
Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Merge branch 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm

Pull libnvdimm fixes from Dan Williams:
"Incremental fixes and a small feature addition on top of the main
  libnvdimm 4.12 pull request:

   - Geert noticed that tinyconfig was bloated by BLOCK selecting DAX.
     The size regression is fixed by moving all dax helpers into the
     dax-core and only specifying "select DAX" for FS_DAX and
     dax-capable drivers. He also asked for clarification of the
     NR_DEV_DAX config option which, on closer look, does not need to be
     a config option at all. Mike also throws in a DEV_DAX_PMEM fixup
     for good measure.

   - Ben's attention to detail on -stable patch submissions caught a
     case where the recent fixes to arch_copy_from_iter_pmem() missed a
     condition where we strand dirty data in the cache. This is tagged
     for -stable and will also be included in the rework of the pmem api
     to a proposed {memcpy,copy_user}_flushcache() interface for 4.13.

   - Vishal adds a feature that missed the initial pull due to pending
     review feedback. It allows the kernel to clear media errors when
     initializing a BTT (atomic sector update driver) instance on a pmem
     namespace.

   - Ross noticed that the dax_device + dax_operations conversion broke
     __dax_zero_page_range(). The nvdimm unit tests fail to check this
     path, but xfstests immediately trips over it. No excuse for missing
     this before submitting the 4.12 pull request.

  These all pass the nvdimm unit tests and an xfstests spot check. The
  set has received a build success notification from the kbuild robot"

* 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
  filesystem-dax: fix broken __dax_zero_page_range() conversion
  libnvdimm, btt: ensure that initializing metadata clears poison
  libnvdimm: add an atomic vs process context flag to rw_bytes
  x86, pmem: Fix cache flushing for iovec write < 8 bytes
  device-dax: kill NR_DEV_DAX
  block, dax: move "select DAX" from BLOCK to FS_DAX
  device-dax: Tell kbuild DEV_DAX_PMEM depends on DEV_DAX

Merge tag 'sound-fix-4.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound

Pull sound fixes from Takashi Iwai:
"This contains a one-liner change that has a significant impact:
  disabling the build of OSS. It's been unmaintained for long time, and
  we'd like to drop the stuff. Finally, as the first step, stop the
  build. Let's see whether it works without much complaints.

  Other than that, there are two small fixes for HD-audio"

* tag 'sound-fix-4.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
  sound: Disable the build of OSS drivers
  ALSA: hda: Fix cpu lockup when stopping the cmd dmas
  ALSA: hda - Add mute led support for HP EliteBook 840 G3

Merge tag 'for-v4.12-2' of git://git.kernel.org/pub/scm/linux/kernel/git/sre/linux-power-supply

Pull more power-supply updates from Sebastian Reichel:
"The power-supply subsystem has a few more changes for the v4.12 merge
  window:

   - New battery driver for AXP20X and AXP22X PMICs

   - Improve max17042_battery for usage on x86

   - Misc small cleanups & fixes"

* tag 'for-v4.12-2' of git://git.kernel.org/pub/scm/linux/kernel/git/sre/linux-power-supply: (34 commits)
  power: supply: cpcap-charger: Keep trickle charger bits disabled
  power: supply: cpcap-charger: Fix enable for 3.8V charge setting
  power: supply: cpcap-charger: Fix charge voltage configuration
  power: supply: cpcap-charger: Fix charger name
  power: supply: twl4030-charger: make twl4030_bci_property_is_writeable static
  power: supply: sbs-battery: Add alert callback
  mailmap: add Sebastian Reichel
  power: supply: avoid unused twl4030-madc.h
  power: supply: sbs-battery: Correct supply status with current draw
  power: supply: sbs-battery: Don't ignore the first external power change
  power: supply: pda_power: move from timer to delayed_work
  power: supply: max17042_battery: Add support for the SCOPE property
  power: supply: max17042_battery: Add support for the CHARGE_NOW property
  power: supply: max17042_battery: Add support for the CHARGE_FULL_DESIGN property
  power: supply: max17042_battery: mAh readings depend on r_sns value
  power: supply: max17042_battery: Add support for the VOLT_MIN property
  power: supply: max17042_battery: Add support for the TECHNOLOGY attribute
  power: supply: max17042_battery: Add external_power_changed callback
  power: supply: max17042_battery: Add support for the STATUS property
  power: supply: max17042_battery: Add default platform_data fallback data
  ...

Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/rzhang/linux

Pull thermal management updates from Zhang Rui:

- Fix a problem where orderly_shutdown() is called for multiple times
   due to multiple critical overheating events raised in a short period
   by platform thermal driver. (Keerthy)

- Introduce a backup thermal shutdown mechanism, which invokes
   kernel_power_off()/emergency_restart() directly, after
   orderly_shutdown() being issued for certain amount of time(specified
   via Kconfig). This is useful in certain conditions that userspace may
   be unable to power off the system in a clean manner and leaves the
   system in a critical state, like in the middle of driver probing
   phase. (Keerthy)

- Introduce a new interface in thermal devfreq_cooling code so that the
   driver can provide more precise data regarding actual power to the
   thermal governor every time the power budget is calculated. (Lukasz
   Luba)

- Introduce BCM 2835 soc thermal driver and northstar thermal driver,
   within a new sub-folder. (Rafał Miłecki)

- Introduce DA9062/61 thermal driver. (Steve Twiss)

- Remove non-DT booting on TI-SoC driver. Also add support to fetching
   coefficients from DT. (Keerthy)

- Refactorf RCAR Gen3 thermal driver. (Niklas Söderlund)

- Small fix on MTK and intel-soc-dts thermal driver. (Dawei Chien,
   Brian Bian)

* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/rzhang/linux: (25 commits)
  thermal: core: Add a back up thermal shutdown mechanism
  thermal: core: Allow orderly_poweroff to be called only once
  Thermal: Intel SoC DTS: Change interrupt request behavior
  trace: thermal: add another parameter 'power' to the tracing function
  thermal: devfreq_cooling: add new interface for direct power read
  thermal: devfreq_cooling: refactor code and add get_voltage function
  thermal: mt8173: minor mtk_thermal.c cleanups
  thermal: bcm2835: move to the broadcom subdirectory
  thermal: broadcom: ns: specify myself as MODULE_AUTHOR
  thermal: da9062/61: Thermal junction temperature monitoring driver
  Documentation: devicetree: thermal: da9062/61 TJUNC temperature binding
  thermal: broadcom: add Northstar thermal driver
  dt-bindings: thermal: add support for Broadcom's Northstar thermal
  thermal: bcm2835: add thermal driver for bcm2835 SoC
  dt-bindings: Add thermal zone to bcm2835-thermal example
  thermal: rcar_gen3_thermal: add suspend and resume support
  thermal: rcar_gen3_thermal: store device match data in private structure
  thermal: rcar_gen3_thermal: enable hardware interrupts for trip points
  thermal: rcar_gen3_thermal: record and check number of TSCs found
  thermal: rcar_gen3_thermal: check that TSC exists before memory allocation
  ...

Merge tag 'drm-fixes-for-v4.12-rc1' of git://people.freedesktop.org/~airlied/linux

Pull drm fixes from Dave Airlie:
"AMD, nouveau, one i915, and one EDID fix for v4.12-rc1

  Some fixes that it would be good to have in rc1. It contains the i915
  quiet fix that you reported.

  It also has an amdgpu fixes pull, with lots of ongoing work on Vega10
  which is new in this kernel and is preliminary support so may have a
  fair bit of movement.

  Otherwise a few non-Vega10 AMD fixes, one EDID fix and some nouveau
  regression fixers"

* tag 'drm-fixes-for-v4.12-rc1' of git://people.freedesktop.org/~airlied/linux: (144 commits)
  drm/i915: Make vblank evade warnings optional
  drm/nouveau/therm: remove ineffective workarounds for alarm bugs
  drm/nouveau/tmr: avoid processing completed alarms when adding a new one
  drm/nouveau/tmr: fix corruption of the pending list when rescheduling an alarm
  drm/nouveau/tmr: handle races with hw when updating the next alarm time
  drm/nouveau/tmr: ack interrupt before processing alarms
  drm/nouveau/core: fix static checker warning
  drm/nouveau/fb/ram/gf100-: remove 0x10f200 read
  drm/nouveau/kms/nv50: skip core channel cursor update on position-only changes
  drm/nouveau/kms/nv50: fix source-rect-only plane updates
  drm/nouveau/kms/nv50: remove pointless argument to window atomic_check_acquire()
  drm/amd/powerplay: refine pwm1_enable callback functions for CI.
  drm/amd/powerplay: refine pwm1_enable callback functions for vi.
  drm/amd/powerplay: refine pwm1_enable callback functions for Vega10.
  drm/amdgpu: refine amdgpu pwm1_enable sysfs interface.
  drm/amdgpu: add amd fan ctrl mode enums.
  drm/amd/powerplay: add more smu message on Vega10.
  drm/amdgpu: fix dependency issue
  drm/amd: fix init order of sched job
  drm/amdgpu: add some additional vega10 pci ids
  ...

Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending

Pull SCSI target updates from Nicholas Bellinger:
"Things were a lot more calm than previously expected. It's primarily
  fixes in various areas, with most of the new functionality centering
  around TCMU backend driver work that Xiubo Li has been driving.

  Here's the summary on the feature side:

   - Make T10-PI verify configurable for emulated (FILEIO + RD) backends
    (Dmitry Monakhov)
   - Allow target-core/TCMU pass-through to use in-kernel SPC-PR logic
    (Bryant Ly + MNC)
   - Add TCMU support for growing ring buffer size (Xiubo Li + MNC)
   - Add TCMU support for global block data pool (Xiubo Li + MNC)

  and on the bug-fix side:

   - Fix COMPARE_AND_WRITE non GOOD status handling for READ phase
    failures (Gary Guo + nab)
   - Fix iscsi-target hang with explicitly changing per NodeACL
    CmdSN number depth with concurrent login driven session
    reinstatement.  (Gary Guo + nab)
   - Fix ibmvscsis fabric driver ABORT task handling (Bryant Ly)
   - Fix target-core/FILEIO zero length handling (Bart Van Assche)

  Also, there was an OOPs introduced with the WRITE_VERIFY changes that
  I ended up reverting at the last minute, because as not unusual Bart
  and I could not agree on the fix in time for -rc1. Since it's specific
  to a conformance test, it's been reverted for now.

  There is a separate patch in the queue to address the underlying
  control CDB write overflow regression in >= v4.3 separate from the
  WRITE_VERIFY revert here, that will be pushed post -rc1"

* 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending: (30 commits)
  Revert "target: Fix VERIFY and WRITE VERIFY command parsing"
  IB/srpt: Avoid that aborting a command triggers a kernel warning
  IB/srpt: Fix abort handling
  target/fileio: Fix zero-length READ and WRITE handling
  ibmvscsis: Do not send aborted task response
  tcmu: fix module removal due to stuck thread
  target: Don't force session reset if queue_depth does not change
  iscsi-target: Set session_fall_back_to_erl0 when forcing reinstatement
  target: Fix compare_and_write_callback handling for non GOOD status
  tcmu: Recalculate the tcmu_cmd size to save cmd area memories
  tcmu: Add global data block pool support
  tcmu: Add dynamic growing data area feature support
  target: fixup error message in target_tg_pt_gp_tg_pt_gp_id_store()
  target: fixup error message in target_tg_pt_gp_alua_access_type_store()
  target/user: PGR Support
  target: Add WRITE_VERIFY_16
  Documentation/target: add an example script to configure an iSCSI target
  target: Use kmalloc_array() in transport_kmap_data_sg()
  target: Use kmalloc_array() in compare_and_write_callback()
  target: Improve size determinations in two functions
  ...

Merge branch 'work.sane_pwd' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs

Pull misc vfs updates from Al Viro:
"Making sure that something like a referral point won't end up as pwd
  or root.

  The main part is the last commit (fixing mntns_install()); that one
  fixes a hard-to-hit race. The fchdir() commit is making fchdir(2) a
  bit more robust - it should be impossible to get opened files (even
  O_PATH ones) for referral points in the first place, so the existing
  checks are OK, but checking the same thing as in chdir(2) is just as
  cheap.

  The path_init() commit removes a redundant check that shouldn't have
  been there in the first place"

* 'work.sane_pwd' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  make sure that mntns_install() doesn't end up with referral for root
  path_init(): don't bother with checking MAY_EXEC for LOOKUP_ROOT
  make sure that fchdir() won't accept referral points, etc.

Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull perf updates/fixes from Ingo Molnar:
"Mostly tooling updates, but also two kernel fixes: a call chain
  handling robustness fix and an x86 PMU driver event definition fix"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/callchain: Force USER_DS when invoking perf_callchain_user()
  tools build: Fixup sched_getcpu feature test
  perf tests kmod-path: Don't fail if compressed modules aren't supported
  perf annotate: Fix AArch64 comment char
  perf tools: Fix spelling mistakes
  perf/x86: Fix Broadwell-EP DRAM RAPL events
  perf config: Refactor a duplicated code for obtaining config file name
  perf symbols: Allow user probes on versioned symbols
  perf symbols: Accept symbols starting at address 0
  tools lib string: Adopt prefixcmp() from perf and subcmd
  perf units: Move parse_tag_value() to units.[ch]
  perf ui gtk: Move gtk .so name to the only place where it is used
  perf tools: Move HAS_BOOL define to where perl headers are used
  perf memswap: Split the byteswap memory range wrappers from util.[ch]
  perf tools: Move event prototypes from util.h to event.h
  perf buildid: Move prototypes from util.h to build-id.h