On Feb 5, 2021, at 10:11 AM, Ni, Ray <ray.ni@...> wrote:
I saw the proposal of fences in the first mail by Laszlo. Please forgive my ignorance. What is asm("") in x86? A nop? Then how can a nop help as a processor-level load/store barrier?
In C, calling out to assembly is also a barrier/fence operation from the compiler’s point of view. Actually, an indirect procedure call (gBS->*, Protocol->*) is also a barrier. The compiler has no idea what the assembly code is doing across the boundary, so all the operations need to complete prior to calling the assembly code (or making the indirect procedure call). I guess a binary static lib is in this list too. In reality it is anything the C compiler can’t link-time optimize through.
For gcc-flavored compilers any __asm__ call is a compiler read/write barrier. That is why you see __asm__ __volatile__ ("":::"memory"); in the memory fence. That means for gcc/clang any synchronization primitive implemented in inline assembler is also a compiler barrier/fence wrapping that assembly operation.
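As a rough illustration of the empty asm statement with a "memory" clobber, here is a minimal sketch for gcc/clang. The macro name SKETCH_COMPILER_BARRIER, the two variables, and the function are hypothetical and not taken from edk2. The statement emits no instruction; it only tells the compiler that memory may have changed, so memory accesses are not moved across it.

```c
#include <stdint.h>

/* Hypothetical name, for illustration only; gcc/clang syntax. */
#define SKETCH_COMPILER_BARRIER() __asm__ __volatile__ ("" ::: "memory")

static uint32_t mData;
static uint32_t mReady;

/* Publish a value. The barrier keeps the compiler from sinking the store
   to mData below the store to mReady. It generates no machine code, so it
   orders nothing at the processor level. */
uint32_t SketchPublishValue (uint32_t Value)
{
  mData = Value;
  SKETCH_COMPILER_BARRIER ();
  mReady = 1;
  return mReady;
}
```

This answers the "is it a nop?" question in one sense: there is not even a nop in the output, only a constraint on what the optimizer may reorder.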
The VC++ inline assembler seems to be “more integrated” with the compiler, so I’m not sure what the rules are for that. Someone should really investigate that. Something tells me you will not find those answers on the Linux mailing list :).
The reason the MMIO operations are wrapped in _ReadWriteBarrier ()/__asm__ __volatile__ ("":::"memory") has to do with the order of operations.
1) The leading barrier forces all the code before the call to complete its reads/writes prior to the operation.
2) The trailing barrier forces the MMIO operation to complete before unrelated reads/writes that come after the call and that could otherwise get reordered during optimization.
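The two points above can be sketched as follows. This is not the edk2 MmioWrite32() implementation; the function and macro names are hypothetical, the barrier uses the gcc/clang form, and it constrains only the compiler, not the processor.

```c
#include <stdint.h>

/* Hypothetical name, for illustration only; gcc/clang syntax. */
#define SKETCH_COMPILER_BARRIER() __asm__ __volatile__ ("" ::: "memory")

/* Sketch of an MMIO write wrapped in compiler barriers. */
uint32_t SketchMmioWrite32 (volatile uint32_t *Address, uint32_t Value)
{
  /* 1) Reads/writes written before this call are emitted before the store. */
  SKETCH_COMPILER_BARRIER ();

  *Address = Value;

  /* 2) Reads/writes written after this call are not hoisted above the store. */
  SKETCH_COMPILER_BARRIER ();

  return Value;
}
```

The volatile qualifier keeps the store itself from being elided or merged; the barriers are what stop ordinary, non-volatile accesses around the call from migrating across it.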
As I mentioned in this thread I’m wondering if VC++ has extra behaviors around volatile that are compiler dependent and that is why we have not seen a lot of issues to date?
I’m really glad this topic came up. It really seems like something we need to work through …...
From: firstname.lastname@example.org <email@example.com> on behalf of Paolo Bonzini <pbonzini@...>
Sent: Saturday, February 6, 2021 2:01:14 AM
To: Ni, Ray <ray.ni@...>; Laszlo Ersek <lersek@...>; Ard Biesheuvel <ardb@...>
Cc: Andrew Fish <afish@...>; edk2 RFC list <firstname.lastname@example.org>; Kinney, Michael D <michael.d.kinney@...>; Leif Lindholm (Nuvia address) <leif@...>; Dong, Eric <eric.dong@...>; Liming Gao (Byosoft address) <gaoliming@...>; Ankur Arora <ankur.a.arora@...>
Subject: Re: [edk2-rfc] MemoryFence()
On 05/02/21 18:53, Ni, Ray wrote:
Without calling _ReadWriteBarrier, is it possible that compiler generates the assembly in the wrong location? I mean the compiler may in-line the LibWaitForSemaphore and call cmpxchg earlier than the
Similar to LibReleaseSemaphore.
So my understanding is the _ReadWriteBarrier in ReleaseSpinLock is required.
The proposed ReleaseMemoryFence() should already have that effect. All the proposed fences except CompilerFence() are both compiler optimization barriers and processor barriers.
InterlockedCompareExchange() is also both a compiler optimization barrier and a processor barrier.
CompilerFence() is just a better name for _ReadWriteBarrier(); it blocks optimizations but it has no effect at the processor level. It should only be used (instead of volatile) in busy-waiting loops that do not always go through an InterlockedCompareExchange.
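A busy-waiting loop of the kind described might look like the sketch below. SKETCH_COMPILER_FENCE stands in for the proposed CompilerFence() and uses the gcc/clang form; the function name and the bounded spin count are assumptions added so the example terminates, not something from the proposal.

```c
#include <stdint.h>

/* Stand-in for the proposed CompilerFence(); hypothetical name, gcc/clang syntax. */
#define SKETCH_COMPILER_FENCE() __asm__ __volatile__ ("" ::: "memory")

/* Poll *Flag until it is nonzero or MaxSpins polls have been made.
   Returns 1 if the flag was observed set, 0 if the loop gave up. */
int SketchWaitForFlag (const uint32_t *Flag, unsigned MaxSpins)
{
  while (MaxSpins-- > 0) {
    if (*Flag != 0) {
      return 1;
    }
    /* Forces the compiler to reload *Flag on the next iteration instead of
       hoisting the load out of the loop. No processor-level ordering. */
    SKETCH_COMPILER_FENCE ();
  }
  return 0;
}
```

Because the fence is compiler-only, code that goes on to touch data guarded by the flag would still need a processor-level fence or an interlocked operation after the loop exits.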
_ReadWriteBarrier ();
*SpinLock = SPIN_LOCK_RELEASED;
_ReadWriteBarrier ();
return SpinLock;
it *is* buggy because it is missing a (processor) barrier on non-x86 architectures and has a useless barrier after the store. Instead it should be just this:
*SpinLock = SPIN_LOCK_RELEASED;
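For readers who want to see the release pattern in standard C, here is a sketch using C11 atomics. It assumes that the intended sequence is a release fence followed by the store, with atomic_thread_fence (memory_order_release) playing the role of the proposed ReleaseMemoryFence(); that mapping is an assumption, and all names prefixed with Sketch/SKETCH are hypothetical rather than edk2 definitions.

```c
#include <stdatomic.h>

/* Hypothetical values and type, for illustration only. */
#define SKETCH_LOCK_RELEASED 1u
#define SKETCH_LOCK_ACQUIRED 2u

typedef atomic_uint SKETCH_SPIN_LOCK;

SKETCH_SPIN_LOCK *SketchReleaseSpinLock (SKETCH_SPIN_LOCK *SpinLock)
{
  /* Compiler and processor barrier: everything done inside the critical
     section is ordered before the store that releases the lock. */
  atomic_thread_fence (memory_order_release);

  /* The releasing store itself. No barrier is placed after it. */
  atomic_store_explicit (SpinLock, SKETCH_LOCK_RELEASED, memory_order_relaxed);

  return SpinLock;
}
```

Compared with the quoted snippet, the leading compiler-only barrier becomes a fence that also acts at the processor level, and the trailing barrier is dropped.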