On 02/04/21 21:04, Paolo Bonzini wrote:
On Thu, 4 Feb 2021, 20:46, Ard Biesheuvel <ardb@...> wrote:
[*] Acquire fences are barriers between earlier loads and subsequent loads
(1) We should introduce finer-grained fence primitives:
Acquire semantics typically order writes before reads, not /between/
Doesn't look too specific:
I've found this article very relevant:
It is very important to be *aware* of the acquire/release semantics,
I agree as long as the primitives are self-documenting. A single
(1) It provides an example of a store-load pattern (Dekker's algorithm)
where no combination of read-acquire + write-release is sufficient. Thus
it would need an MFENCE (hence we need all four APIs in edk2).
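To make the store-load case concrete, here is a minimal sketch of Dekker's algorithm using C11 <stdatomic.h> and POSIX threads (both assumptions; this is not edk2 code, and all function and variable names are hypothetical). Each thread stores its own flag and then loads the other thread's flag; only the sequentially consistent (full) fence between that store and that load prevents both threads from reading the other's flag as false and entering the critical section together. The acquire and release fences handle only the entry and exit ordering.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical shared state for two threads with ids 0 and 1. */
static atomic_bool dekker_flag[2];
static atomic_int  dekker_turn;
static long        shared_counter;    /* plain variable, protected by the lock */

static void dekker_lock(int self)
{
  int other = 1 - self;

  atomic_store_explicit(&dekker_flag[self], true, memory_order_relaxed);
  /* Full fence: our flag store must be ordered before the load of the
     other thread's flag (store-load). Acquire/release cannot provide this. */
  atomic_thread_fence(memory_order_seq_cst);

  while (atomic_load_explicit(&dekker_flag[other], memory_order_relaxed)) {
    if (atomic_load_explicit(&dekker_turn, memory_order_relaxed) != self) {
      /* Not our turn: withdraw until the other thread hands over. */
      atomic_store_explicit(&dekker_flag[self], false, memory_order_relaxed);
      while (atomic_load_explicit(&dekker_turn, memory_order_relaxed) != self)
        sched_yield();
      atomic_store_explicit(&dekker_flag[self], true, memory_order_relaxed);
      atomic_thread_fence(memory_order_seq_cst);   /* store-load again */
    } else {
      sched_yield();   /* our turn: wait for the other thread to withdraw */
    }
  }
  /* Acquire fence: critical-section accesses stay after the flag loads. */
  atomic_thread_fence(memory_order_acquire);
}

static void dekker_unlock(int self)
{
  /* Release fence: critical-section accesses complete before the handover. */
  atomic_thread_fence(memory_order_release);
  atomic_store_explicit(&dekker_turn, 1 - self, memory_order_relaxed);
  atomic_store_explicit(&dekker_flag[self], false, memory_order_relaxed);
}

struct worker_arg { int id; int iterations; };

static void *dekker_worker(void *p)
{
  const struct worker_arg *arg = p;
  for (int n = 0; n < arg->iterations; n++) {
    dekker_lock(arg->id);
    shared_counter++;                 /* critical section */
    dekker_unlock(arg->id);
  }
  return NULL;
}

/* Runs both threads to completion and returns the final counter value. */
static long run_dekker_demo(int iterations)
{
  pthread_t t[2];
  struct worker_arg args[2] = { { 0, iterations }, { 1, iterations } };

  shared_counter = 0;
  for (int i = 0; i < 2; i++)
    pthread_create(&t[i], NULL, dekker_worker, &args[i]);
  for (int i = 0; i < 2; i++)
    pthread_join(t[i], NULL);
  return shared_counter;
}
```

If either seq_cst fence is weakened to acquire or release, the two threads may both enter the critical section on hardware that reorders a store with a later load (x86 included), and increments can be lost.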
(... If we jump back to the part I marked with [*], we can see that
Paolo's description of read-acquire and store-release covers load-load,
load-store, load-store (again), and store-store. What's not covered is
store-load, which, as Paolo said elsewhere in this thread, is exactly the
reordering that x86 does perform. So in future code, the use of the
MemoryFence() API would be "exceptional", but the API should exist for
supporting patterns like Dekker's algorithm.)
(2) I think the "Recommendations" section highlights the very problem we
have. It recommends

  When doing lockless programming, be sure to use volatile flag

*after* explaining why "volatile" is generally insufficient, and *after*
describing the compiler barriers. So this recommendation should
recommend compiler barriers rather than "volatile".
(3) The article recommends _ReadWriteBarrier, _ReadBarrier and
_WriteBarrier for compiler fences. I think _ReadWriteBarrier should
suffice for edk2's purposes.
However, Microsoft's reference documentation deprecates those
intrinsics, while offering *only* C++ language replacements.
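For comparison, a minimal sketch of how a compiler-only fence is commonly spelled per toolchain. COMPILER_FENCE, publish, g_data and g_ready are hypothetical names used only for illustration. The MSVC branch still uses the deprecated intrinsic. The last branch assumes a toolchain that ships C11 <stdatomic.h>, which MSVC's C compiler historically did not; C11's atomic_signal_fence formally orders a thread only against a signal handler on the same thread, but in practice compilers emit no instruction for it and treat it as a compiler barrier.

```c
#if defined(_MSC_VER) && !defined(__clang__)
  #include <intrin.h>
  /* Deprecated by Microsoft, but still available to C code. */
  #define COMPILER_FENCE()  _ReadWriteBarrier()
#elif defined(__GNUC__) || defined(__clang__)
  /* Empty asm with a "memory" clobber: no instruction is emitted, but the
     compiler may neither cache memory in registers across it nor move
     memory accesses across it. */
  #define COMPILER_FENCE()  __asm__ __volatile__ ("" ::: "memory")
#else
  #include <stdatomic.h>
  #define COMPILER_FENCE()  atomic_signal_fence(memory_order_seq_cst)
#endif

static int g_data;
static int g_ready;

/* The fence keeps the compiler from reordering or merging the two stores.
   It does NOT stop the CPU from reordering them. */
static void publish(int value)
{
  g_data = value;
  COMPILER_FENCE();
  g_ready = 1;
}
```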
Could we implement CompilerFence() for all edk2 architectures as
*non-inline* assembly? The function would consist of a return
instruction only. For x86, we could use a NASM source; for ARM, separate
MS and GNU assembler sources would be needed.
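A minimal sketch of the *non-inline* idea, written as top-level assembly inside a C file so that it stays self-contained; in edk2 the body would live in separate NASM or GNU/MS assembler sources instead. CompilerFenceAsm, g_flag and read_twice are hypothetical names, and the assembly assumes GCC or Clang targeting ELF on x86 or AArch64 (where a bare "ret" is valid). Because the compiler cannot see the body, it must assume the call may read or write any memory the callee could reach, so it cannot cache or move such accesses across the call.

```c
#if defined(__GNUC__) && defined(__ELF__) && \
    (defined(__x86_64__) || defined(__i386__) || defined(__aarch64__))
/* An empty function consisting of a single return instruction. */
__asm__ (
  "  .pushsection .text\n"
  "  .balign 4\n"
  "  .globl CompilerFenceAsm\n"
  "CompilerFenceAsm:\n"
  "  ret\n"
  "  .popsection\n"
);
void CompilerFenceAsm (void);
#else
/* Placeholder for other toolchains in this sketch only; unlike the
   assembly version, it is NOT opaque to the compiler. */
static void CompilerFenceAsm (void) { }
#endif

/* External linkage on purpose: the compiler may prove that an opaque call
   cannot touch a non-address-taken static variable, and then the call
   would not act as a fence for it. */
int g_flag;

static int read_twice (void)
{
  int a = g_flag;
  CompilerFenceAsm ();   /* g_flag must be reloaded after the call */
  int b = g_flag;
  return a + b;
}
```

One limitation worth keeping in mind with this design: an opaque call only constrains accesses to memory the callee could legitimately reach (globals with external linkage, objects whose address has escaped), whereas an inline "memory" clobber constrains all memory accesses.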
I totally want to get rid of "volatile" at least in future code, but
that's only possible if one of the following options can be satisfied:
- we find a supported replacement method for _ReadWriteBarrier when
  using the MSFT toolchain family (such as the *non-inline*, empty
  assembly function suggested above),
- or we accept that CompilerFence() is not possible to implement
portably, and we only offer the heavier-weight acquire / release /
full fences, which *include* a compiler fence too.
In the latter case, the body of a busy-waiting loop would have to use
the heavier read-acquire API.
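A minimal sketch of such a busy-waiting loop without "volatile", assuming C11 <stdatomic.h> and POSIX threads (not edk2 code; all names are hypothetical). Each iteration performs a read-acquire, so the flag is re-read every time and the payload read cannot be hoisted above the flag check; the producer pairs it with a write-release. In edk2 the loop body would more likely call CpuPause() than sched_yield().

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>

static int         mailbox_payload;   /* plain data, no volatile */
static atomic_bool mailbox_ready;

static void *producer (void *unused)
{
  (void)unused;
  mailbox_payload = 42;                          /* write the data first */
  /* Write-release: the payload store is ordered before the flag store. */
  atomic_store_explicit (&mailbox_ready, true, memory_order_release);
  return NULL;
}

static int consume (void)
{
  /* Busy-wait with read-acquire: the flag is reloaded on every iteration,
     and the payload read below cannot move above the successful load. */
  while (!atomic_load_explicit (&mailbox_ready, memory_order_acquire))
    sched_yield ();
  return mailbox_payload;
}

static int run_mailbox_demo (void)
{
  pthread_t t;
  int       value;

  mailbox_payload = 0;
  atomic_store_explicit (&mailbox_ready, false, memory_order_relaxed);
  pthread_create (&t, NULL, producer, NULL);
  value = consume ();
  pthread_join (t, NULL);
  return value;
}
```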
So the structure of the solution we're looking for is:
- exactly *one* of:
  - compiler fence
  - acquire fence used as a heavy substitute for a compiler fence
- and *all* of:
  - acquire fence (load-load, load-store)
  - release fence (load-store, store-store)
  - full fence (load-load, load-store, store-store, store-load)
The implementation of each fence would have to be *at least* as safe as
required; it could be stronger.
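One way to picture these contracts is a minimal sketch mapped onto C11 <stdatomic.h>, assuming such a header is available (it is not how edk2 implements anything today). CompilerFence() is the name proposed above; AcquireMemoryFence() and ReleaseMemoryFence() are hypothetical names; MemoryFence() only mirrors the name of the existing BaseLib API and is defined standalone here. The ordering descriptions in the comments are the practical hardware-level reading of the C11 fences.

```c
#include <stdatomic.h>

/* Compiler fence: constrains compiler reordering only; in practice no CPU
   instruction is emitted. */
static inline void CompilerFence (void)
{
  atomic_signal_fence (memory_order_seq_cst);
}

/* Acquire fence: earlier loads are ordered before later loads and stores
   (load-load, load-store). Includes a compiler fence. */
static inline void AcquireMemoryFence (void)
{
  atomic_thread_fence (memory_order_acquire);
}

/* Release fence: earlier loads and stores are ordered before later stores
   (load-store, store-store). Includes a compiler fence. */
static inline void ReleaseMemoryFence (void)
{
  atomic_thread_fence (memory_order_release);
}

/* Full fence: additionally orders earlier stores before later loads
   (store-load), as Dekker's algorithm needs. Includes a compiler fence. */
static inline void MemoryFence (void)
{
  atomic_thread_fence (memory_order_seq_cst);
}
```

This mapping also shows how an implementation may be stronger than required. On x86-64 (TSO), compilers typically emit no instruction for the acquire and release fences and an MFENCE or a locked instruction for the full fence. On AArch64 they typically emit "dmb ishld" for acquire but the full "dmb ish" for both release and full fences.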
I feel that we have to reach an agreement on the "exactly one of" part;
after that, maybe I can try an RFC patch for <BaseLib.h> (just the
interface contracts, at first).