Merge tag 'wireless-drivers-next-for-davem-2016-05-13' of git://git.kernel.org/pub...

diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt

index 3729cbe60e4169340b5bc522951d9e0f40c4cb46..147ae8ec836f85666110634ff5565f4016de1d80 100644
--- a/Documentation/memory-barriers.txt
+++ b/Documentation/memory-barriers.txt
@@ -4,8 +4,40 @@
  
  By: David Howells <dhowells@redhat.com>
      Paul E. McKenney <paulmck@linux.vnet.ibm.com>
+    Will Deacon <will.deacon@arm.com>
+    Peter Zijlstra <peterz@infradead.org>
  
-Contents:
+==========
+DISCLAIMER
+==========
+
+This document is not a specification; it is intentionally (for the sake of
+brevity) and unintentionally (due to being human) incomplete. This document is
+meant as a guide to using the various memory barriers provided by Linux, but
+in case of any doubt (and there are many) please ask.
+
+To repeat, this document is not a specification of what Linux expects from
+hardware.
+
+The purpose of this document is twofold:
+
+ (1) to specify the minimum functionality that one can rely on for any
+     particular barrier, and
+
+ (2) to provide a guide as to how to use the barriers that are available.
+
+Note that an architecture can provide more than the minimum requirement
+for any particular barrier, but if the architecture provides less than
+that, that architecture is incorrect.
+
+Note also that it is possible that a barrier may be a no-op for an
+architecture because the way that arch works renders an explicit barrier
+unnecessary in that case.
+
+
+========
+CONTENTS
+========
  
   (*) Abstract memory access model.
  
@@ -31,15 +63,15 @@ Contents:
  
   (*) Implicit kernel memory barriers.
  
-     - Locking functions.
+     - Lock acquisition functions.
       - Interrupt disabling functions.
       - Sleep and wake-up functions.
       - Miscellaneous functions.
  
- (*) Inter-CPU locking barrier effects.
+ (*) Inter-CPU acquiring barrier effects.
  
-     - Locks vs memory accesses.
-     - Locks vs I/O accesses.
+     - Acquires vs memory accesses.
+     - Acquires vs I/O accesses.
  
   (*) Where are memory barriers needed?
  
@@ -61,6 +93,7 @@ Contents:
   (*) The things CPUs get up to.
  
       - And then there's the Alpha.
+     - Virtual Machine Guests.
  
   (*) Example uses.
  
@@ -148,7 +181,7 @@ As a further example, consider this sequence of events:
  
         CPU 1           CPU 2
         =============== ===============
-       { A == 1, B == 2, C = 3, P == &A, Q == &C }
+       { A == 1, B == 2, C == 3, P == &A, Q == &C }
         B = 4;          Q = P;
         P = &B          D = *Q;
  
@@ -430,8 +463,9 @@ And a couple of implicit varieties:
       This acts as a one-way permeable barrier.  It guarantees that all memory
       operations after the ACQUIRE operation will appear to happen after the
       ACQUIRE operation with respect to the other components of the system.
-     ACQUIRE operations include LOCK operations and smp_load_acquire()
-     operations.
+     ACQUIRE operations include LOCK operations and both smp_load_acquire()
+     and smp_cond_acquire() operations.  The latter builds the necessary
+     ACQUIRE semantics from a control dependency and smp_rmb().
  
       Memory operations that occur before an ACQUIRE operation may appear to
       happen after it completes.
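
As a rough illustration of the one-way ACQUIRE behaviour described in this
hunk, the following userspace sketch pairs a release store with an acquire
load.  It uses C11 <stdatomic.h> as a stand-in for smp_store_release() and
smp_load_acquire(); that mapping, and the names payload, ready, publish() and
try_consume(), are assumptions made for illustration and are not part of the
patch.

```c
#include <stdatomic.h>

/* Hypothetical shared state; the names are illustrative only. */
static int payload;        /* plain data, written before the flag */
static atomic_int ready;   /* flag guarding payload; starts at zero */

/*
 * Writer side: the release store keeps the payload store from being
 * reordered after the flag store.
 */
static void publish(int value)
{
	payload = value;
	atomic_store_explicit(&ready, 1, memory_order_release);
}

/*
 * Reader side: the acquire load keeps the payload load from being
 * reordered before the flag load.  Returns 1 and fills *out if the
 * payload has been published, otherwise returns 0.
 */
static int try_consume(int *out)
{
	if (!atomic_load_explicit(&ready, memory_order_acquire))
		return 0;
	*out = payload;
	return 1;
}
```

In real use publish() and try_consume() would run on different threads or
CPUs; accesses issued before the acquire load are not ordered by it, which is
the "one-way" part of the barrier.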
@@ -464,6 +498,11 @@ And a couple of implicit varieties:
       This means that ACQUIRE acts as a minimal "acquire" operation and
       RELEASE acts as a minimal "release" operation.
  
+A subset of the atomic operations described in atomic_ops.txt have ACQUIRE
+and RELEASE variants in addition to fully-ordered and relaxed (no barrier
+semantics) definitions.  For compound atomics performing both a load and a
+store, ACQUIRE semantics apply only to the load and RELEASE semantics apply
+only to the store portion of the operation.
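
The four flavours of a compound atomic mentioned above can be sketched in
userspace with C11 atomics.  This is only an analogy: the helper names below
are hypothetical, and C11 memory_order_seq_cst is merely the closest C11
counterpart to a fully ordered kernel atomic, not an exact equivalent.

```c
#include <stdatomic.h>

static atomic_int counter;	/* hypothetical shared counter, starts at zero */

/* Fully ordered variant (closest C11 counterpart: seq_cst). */
static int add_return_full(int n)
{
	return atomic_fetch_add_explicit(&counter, n, memory_order_seq_cst) + n;
}

/* ACQUIRE variant: the ordering applies to the load half of the update. */
static int add_return_acquire(int n)
{
	return atomic_fetch_add_explicit(&counter, n, memory_order_acquire) + n;
}

/* RELEASE variant: the ordering applies to the store half of the update. */
static int add_return_release(int n)
{
	return atomic_fetch_add_explicit(&counter, n, memory_order_release) + n;
}

/* Relaxed variant: atomic, but with no barrier semantics. */
static int add_return_relaxed(int n)
{
	return atomic_fetch_add_explicit(&counter, n, memory_order_relaxed) + n;
}
```

All four return the same values; they differ only in which surrounding memory
accesses may be reordered across the update.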
  
  Memory barriers are only required where there's a possibility of interaction
  between two CPUs or between a CPU and a device.  If it can be guaranteed that
@@ -517,7 +556,7 @@ following sequence of events:
  
         CPU 1                 CPU 2
         ===============       ===============
-       { A == 1, B == 2, C = 3, P == &A, Q == &C }
+       { A == 1, B == 2, C == 3, P == &A, Q == &C }
         B = 4;
         <write barrier>
         WRITE_ONCE(P, &B)
@@ -544,7 +583,7 @@ between the address load and the data load:
  
         CPU 1                 CPU 2
         ===============       ===============
-       { A == 1, B == 2, C = 3, P == &A, Q == &C }
+       { A == 1, B == 2, C == 3, P == &A, Q == &C }
         B = 4;
         <write barrier>
         WRITE_ONCE(P, &B);
@@ -813,9 +852,10 @@ In summary:
        the same variable, then those stores must be ordered, either by
        preceding both of them with smp_mb() or by using smp_store_release()
        to carry out the stores.  Please note that it is -not- sufficient
-      to use barrier() at beginning of each leg of the "if" statement,
-      as optimizing compilers do not necessarily respect barrier()
-      in this case.
+      to use barrier() at beginning of each leg of the "if" statement
+      because, as shown by the example above, optimizing compilers can
+      destroy the control dependency while respecting the letter of the
+      barrier() law.
  
    (*) Control dependencies require at least one run-time conditional
        between the prior load and the subsequent store, and this
@@ -1731,15 +1771,15 @@ The Linux kernel has eight basic CPU memory barriers:
  
  
  All memory barriers except the data dependency barriers imply a compiler
-barrier. Data dependencies do not impose any additional compiler ordering.
+barrier.  Data dependencies do not impose any additional compiler ordering.
  
  Aside: In the case of data dependencies, the compiler would be expected
  to issue the loads in the correct order (eg. `a[b]` would have to load
  the value of b before loading a[b]), however there is no guarantee in
  the C specification that the compiler may not speculate the value of b
  (eg. is equal to 1) and load a before b (eg. tmp = a[1]; if (b != 1)
-tmp = a[b]; ). There is also the problem of a compiler reloading b after
-having loaded a[b], thus having a newer copy of b than a[b]. A consensus
+tmp = a[b]; ).  There is also the problem of a compiler reloading b after
+having loaded a[b], thus having a newer copy of b than a[b].  A consensus
  has not yet been reached about these problems, however the READ_ONCE()
  macro is a good place to start looking.
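
The reload problem in the aside can be illustrated with a userspace stand-in
for READ_ONCE().  The macro below is a simplified sketch that relies on the
GNU __typeof__ extension, not the kernel's definition, and the names table,
index_var and lookup() are hypothetical.  It addresses only the case of the
compiler re-reading b; whether it rules out value speculation is, as the text
says, not settled.

```c
#include <stddef.h>

/*
 * Simplified stand-in for the kernel's READ_ONCE(): the volatile access
 * makes the compiler emit exactly one load for each use of the macro.
 */
#define READ_ONCE_SKETCH(x) (*(const volatile __typeof__(x) *)&(x))

static int table[4] = { 10, 20, 30, 40 };	/* plays the role of a[] */
static size_t index_var = 2;			/* plays the role of b */

static int lookup(void)
{
	/* Load the index once into a local copy. */
	size_t i = READ_ONCE_SKETCH(index_var);

	/* Only the local copy is used below, so index_var is not re-read. */
	return table[i];
}
```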
  
@@ -1794,6 +1834,7 @@ There are some more advanced barrier functions:
  
  
   (*) lockless_dereference();
+
       This can be thought of as a pointer-fetch wrapper around the
       smp_read_barrier_depends() data-dependency barrier.
  
@@ -1858,7 +1899,7 @@ This is a variation on the mandatory write barrier that causes writes to weakly
  ordered I/O regions to be partially ordered.  Its effects may go beyond the
  CPU->Hardware interface and actually affect the hardware at some level.
  
-See the subsection "Locks vs I/O accesses" for more information.
+See the subsection "Acquires vs I/O accesses" for more information.
  
  
  ===============================
@@ -1873,8 +1914,8 @@ provide more substantial guarantees, but these may not be relied upon outside
  of arch specific code.
  
  
-ACQUIRING FUNCTIONS
--------------------
+LOCK ACQUISITION FUNCTIONS
+--------------------------
  
  The Linux kernel has a number of locking constructs:
  
@@ -1895,7 +1936,7 @@ for each construct.  These operations all imply certain barriers:
       Memory operations issued before the ACQUIRE may be completed after
       the ACQUIRE operation has completed.  An smp_mb__before_spinlock(),
       combined with a following ACQUIRE, orders prior stores against
-     subsequent loads and stores. Note that this is weaker than smp_mb()!
+     subsequent loads and stores.  Note that this is weaker than smp_mb()!
       The smp_mb__before_spinlock() primitive is free on many architectures.
  
   (2) RELEASE operation implication:
@@ -2090,9 +2131,9 @@ or:
         event_indicated = 1;
         wake_up_process(event_daemon);
  
-A write memory barrier is implied by wake_up() and co. if and only if they wake
-something up.  The barrier occurs before the task state is cleared, and so sits
-between the STORE to indicate the event and the STORE to set TASK_RUNNING:
+A write memory barrier is implied by wake_up() and co. if and only if they
+wake something up.  The barrier occurs before the task state is cleared, and so
+sits between the STORE to indicate the event and the STORE to set TASK_RUNNING:
  
         CPU 1                           CPU 2
         =============================== ===============================
@@ -2206,7 +2247,7 @@ three CPUs; then should the following sequence of events occur:
  
  Then there is no guarantee as to what order CPU 3 will see the accesses to *A
  through *H occur in, other than the constraints imposed by the separate locks
-on the separate CPUs. It might, for example, see:
+on the separate CPUs.  It might, for example, see:
  
         *E, ACQUIRE M, ACQUIRE Q, *G, *C, *F, *A, *B, RELEASE Q, *D, *H, RELEASE M
  
@@ -2486,9 +2527,9 @@ The following operations are special locking primitives:
         clear_bit_unlock();
         __clear_bit_unlock();
  
-These implement ACQUIRE-class and RELEASE-class operations. These should be used in
-preference to other operations when implementing locking primitives, because
-their implementations can be optimised on many architectures.
+These implement ACQUIRE-class and RELEASE-class operations.  These should be
+used in preference to other operations when implementing locking primitives,
+because their implementations can be optimised on many architectures.
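
A minimal userspace sketch of a lock built from ACQUIRE-class and
RELEASE-class operations follows.  It uses the C11 atomic_flag type in place
of the kernel's bit-lock primitives; that substitution and the sketch_*()
names are assumptions for illustration only.

```c
#include <stdatomic.h>
#include <stdbool.h>

static atomic_flag lock_bit = ATOMIC_FLAG_INIT;	/* hypothetical lock word */

/*
 * Try to take the lock.  test-and-set returns the previous value, so the
 * lock was obtained only if that value was clear.  ACQUIRE ordering keeps
 * the critical section from leaking above this point.
 */
static bool sketch_trylock(void)
{
	return !atomic_flag_test_and_set_explicit(&lock_bit,
						  memory_order_acquire);
}

/* Spin until the lock is obtained. */
static void sketch_lock(void)
{
	while (!sketch_trylock())
		;
}

/*
 * Drop the lock.  RELEASE ordering keeps the critical section from
 * leaking below this point.
 */
static void sketch_unlock(void)
{
	atomic_flag_clear_explicit(&lock_bit, memory_order_release);
}
```

Neither operation is a full barrier, which is why such primitives can be
cheaper than fully ordered atomics on some architectures.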
  
  [!] Note that special memory barrier primitives are available for these
  situations because on some CPUs the atomic instructions used imply full memory
@@ -2568,12 +2609,12 @@ explicit barriers are used.
  
  Normally this won't be a problem because the I/O accesses done inside such
  sections will include synchronous load operations on strictly ordered I/O
-registers that form implicit I/O barriers. If this isn't sufficient then an
+registers that form implicit I/O barriers.  If this isn't sufficient then an
  mmiowb() may need to be used explicitly.
  
  
  A similar situation may occur between an interrupt routine and two routines
-running on separate CPUs that communicate with each other. If such a case is
+running on separate CPUs that communicate with each other.  If such a case is
  likely, then interrupt-disabling locks should be used to guarantee ordering.
  
  
@@ -2587,8 +2628,8 @@ functions:
   (*) inX(), outX():
  
       These are intended to talk to I/O space rather than memory space, but
-     that's primarily a CPU-specific concept. The i386 and x86_64 processors do
-     indeed have special I/O space access cycles and instructions, but many
+     that's primarily a CPU-specific concept.  The i386 and x86_64 processors
+     do indeed have special I/O space access cycles and instructions, but many
       CPUs don't have such a concept.
  
       The PCI bus, amongst others, defines an I/O space concept which - on such
@@ -2610,7 +2651,7 @@ functions:
  
       Whether these are guaranteed to be fully ordered and uncombined with
       respect to each other on the issuing CPU depends on the characteristics
-     defined for the memory window through which they're accessing. On later
+     defined for the memory window through which they're accessing.  On later
       i386 architecture machines, for example, this is controlled by way of the
       MTRR registers.
  
@@ -2635,10 +2676,10 @@ functions:
   (*) readX_relaxed(), writeX_relaxed()
  
       These are similar to readX() and writeX(), but provide weaker memory
-     ordering guarantees. Specifically, they do not guarantee ordering with
+     ordering guarantees.  Specifically, they do not guarantee ordering with
       respect to normal memory accesses (e.g. DMA buffers) nor do they guarantee
-     ordering with respect to LOCK or UNLOCK operations. If the latter is
-     required, an mmiowb() barrier can be used. Note that relaxed accesses to
+     ordering with respect to LOCK or UNLOCK operations.  If the latter is
+     required, an mmiowb() barrier can be used.  Note that relaxed accesses to
       the same peripheral are guaranteed to be ordered with respect to each
       other.
  
@@ -3040,8 +3081,9 @@ The Alpha defines the Linux kernel's memory barrier model.
  
  See the subsection on "Cache Coherency" above.
  
+
  VIRTUAL MACHINE GUESTS
--------------------
+----------------------
  
  Guests running within virtual machines might be affected by SMP effects even if
  the guest itself is compiled without SMP support.  This is an artifact of
@@ -3050,7 +3092,7 @@ barriers for this use-case would be possible but is often suboptimal.
  
  To handle this case optimally, low-level virt_mb() etc macros are available.
  These have the same effect as smp_mb() etc when SMP is enabled, but generate
-identical code for SMP and non-SMP systems. For example, virtual machine guests
+identical code for SMP and non-SMP systems.  For example, virtual machine guests
  should use virt_mb() rather than smp_mb() when synchronizing against a
  (possibly SMP) host.
  
@@ -3058,6 +3100,7 @@ These are equivalent to smp_mb() etc counterparts in all other respects,
  in particular, they do not control MMIO effects: to control
  MMIO effects, use mandatory barriers.
  
+
  ============
  EXAMPLE USES
  ============