I saw the proposal for fences in Laszlo's first mail. Please forgive my
ignorance: what is asm("") on x86? A nop? Then how can a nop help as a
processor-level load/store barrier?

On x86, load-load, load-store and store-store ordering is already guaranteed
by the processor. Therefore on x86 the AcquireMemoryFence and
ReleaseMemoryFence are just like CompilerFence: they only have to block
compiler-level reordering. MemoryFence is the only one that blocks
store-load reordering and needs to emit an MFENCE instruction.
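
As a rough sketch of the mapping described above (not the actual patch),
the four fences could be written as follows for GCC or Clang. The empty
asm with a "memory" clobber emits no instruction at all, not even a nop;
it only stops the compiler from moving memory accesses across it. The
C11 fence calls are an assumption standing in for the real
implementation, and Publish/TryConsume are hypothetical helpers that
show the usual release/acquire pairing. On x86 the acquire and release
fences typically compile to no instruction, while the full fence
typically becomes MFENCE or a locked instruction.

```c
#include <stdatomic.h>

/* Compiler-only barrier: empty asm plus "memory" clobber (GCC/Clang syntax).
   No instruction is emitted; the compiler just may not reorder across it. */
static inline void CompilerFence(void) { __asm__ __volatile__("" ::: "memory"); }

/* Assumed stand-ins for the proposed fences, built on C11 thread fences. */
static inline void AcquireMemoryFence(void) { atomic_thread_fence(memory_order_acquire); }
static inline void ReleaseMemoryFence(void) { atomic_thread_fence(memory_order_release); }
static inline void MemoryFence(void)        { atomic_thread_fence(memory_order_seq_cst); }

/* Hypothetical message-passing example: a payload guarded by a ready flag. */
static int        Payload;
static atomic_int Ready;

static void Publish(int Value) {
  Payload = Value;                 /* write the data first */
  ReleaseMemoryFence();            /* earlier accesses may not move below */
  atomic_store_explicit(&Ready, 1, memory_order_relaxed);
}

/* Returns 1 and fills *Out if the payload has been published, else 0. */
static int TryConsume(int *Out) {
  if (atomic_load_explicit(&Ready, memory_order_relaxed) == 0) {
    return 0;
  }
  AcquireMemoryFence();            /* later accesses may not move above */
  *Out = Payload;
  return 1;
}
```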

On ARM (either 32- or 64-bit) the processor-level guarantees are weaker,
and you need to emit a "dmb" instruction for acquire and release fences as
well.


Without calling _ReadWriteBarrier, is it possible that the compiler
generates the assembly in the wrong location? I mean, the compiler may
inline LibWaitForSemaphore and issue the cmpxchg earlier than the
desired location.
The same applies to LibReleaseSemaphore.

So my understanding is the _ReadWriteBarrier in ReleaseSpinLock is

The proposed ReleaseMemoryFence() should already have that effect. All
the proposed fences except CompilerFence() are both compiler
optimization barriers and processor barriers.
InterlockedCompareExchange() is also both a compiler optimization
barrier and a processor barrier.

CompilerFence() is just a better name for _ReadWriteBarrier(); it blocks
optimizations but it has no effect at the processor level. It should
only be used (instead of volatile) in busy-waiting loops that do not
always go through an InterlockedCompareExchange.
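
A minimal sketch of such a busy-waiting loop, assuming GCC or Clang and
a plain (non-volatile) flag. WaitForFlag and its spin limit are
hypothetical names used only for illustration. Without the fence the
compiler would be free to read the flag once and hoist the load out of
the loop.

```c
#include <stdint.h>

/* Compiler-only barrier (GCC/Clang syntax); emits no instruction. */
static inline void CompilerFence(void) { __asm__ __volatile__("" ::: "memory"); }

/* Hypothetical helper: polls *Flag up to MaxSpins times.
   Returns 1 as soon as the flag is seen nonzero, 0 if the limit is hit. */
static int WaitForFlag(uint32_t *Flag, uint32_t MaxSpins) {
  for (uint32_t Spin = 0; Spin < MaxSpins; Spin++) {
    if (*Flag != 0) {
      return 1;
    }
    CompilerFence();  /* forces *Flag to be reloaded on the next iteration */
  }
  return 0;
}
```

This only constrains the compiler. Once the flag has been observed, any
ordering needed against the data it guards still requires one of the
processor-level fences.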


234 _ReadWriteBarrier ();
236 _ReadWriteBarrier ();
238 return SpinLock;
it *is* buggy because it is missing a
(processor) barrier on non-x86 architectures and has a useless barrier
after the store. Instead it should be just this:

ReleaseMemoryFence ();
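
To illustrate why the fence belongs before the store and nothing is
needed after it, here is a self-contained sketch using C11 atomics. The
type, constants and function names are hypothetical and are not the
library's actual code; the compare-exchange plays the role of
InterlockedCompareExchange.

```c
#include <stdatomic.h>
#include <stdint.h>

#define SKETCH_LOCK_RELEASED 1u   /* hypothetical lock states */
#define SKETCH_LOCK_ACQUIRED 2u

typedef _Atomic uint32_t SKETCH_SPIN_LOCK;

/* Assumed stand-in for the proposed release fence. */
static inline void ReleaseMemoryFence(void) { atomic_thread_fence(memory_order_release); }

/* Returns 1 if the lock was taken, 0 if it was already held.
   The compare-exchange is both a compiler and a processor barrier. */
static int SketchTryAcquire(SKETCH_SPIN_LOCK *Lock) {
  uint32_t Expected = SKETCH_LOCK_RELEASED;
  return atomic_compare_exchange_strong(Lock, &Expected, SKETCH_LOCK_ACQUIRED);
}

/* Release: fence first, so every access made inside the critical section
   is ordered before the store that frees the lock. No barrier follows. */
static SKETCH_SPIN_LOCK *SketchRelease(SKETCH_SPIN_LOCK *Lock) {
  ReleaseMemoryFence();
  atomic_store_explicit(Lock, SKETCH_LOCK_RELEASED, memory_order_relaxed);
  return Lock;
}
```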

