On 02/04/21 22:31, Andrew Fish wrote:
On Feb 4, 2021, at 10:09 AM, Laszlo Ersek <lersek@...> wrote:
Hi All,
here's the declaration of MemoryFence(), from <BaseLib.h>:
/**
  Used to serialize load and store operations.

  All loads and stores that precede calls to this function are guaranteed to
  be globally visible when this function returns.

**/
VOID
EFIAPI
MemoryFence (
  VOID
  );
The IA32 and X64 implementations of MemoryFence() are not memory
barriers, only compiler barriers, however.
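For reference, the current x86 implementation (X86MemoryFence.c in
MdePkg/Library/BaseLib) amounts to an empty function; only the function
call boundary acts as a compiler barrier:

VOID
EFIAPI
MemoryFence (
  VOID
  )
{
  return;
}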
This is a problem for two reasons:
- code that believes it is adding a memory barrier only adds a compiler
barrier, possibly creating an unsafe situation,
- code that knowingly calls MemoryFence() for the purposes of a compiler
barrier -- while it will hopefully work safely, relying on tacit
knowledge / assumptions about the x86 architecture -- results in a
source reading experience that mismatches known lockless patterns.
This topic has repeatedly come up in the past; most recently in the
review of Ankur's v6 series for
<https://bugzilla.tianocore.org/show_bug.cgi?id=3132>, where some
inter-processor communication needs to happen.
I asked Paolo for help with understanding some of the synchronization
patterns in PiSmmCpuDxeSmm, the reason being that Ankur's series is
going to rely on, and/or imitate, some of those patterns. From this
discussion with Paolo, the following have emerged (I'm going to
liberally steal / paraphrase Paolo's words):
(1) We should introduce finer-grained fence primitives:
                         ARM        AARCH64     i386
CompilerFence()          asm("")    asm("")     asm("")
AcquireMemoryFence()     dmb ish    dmb ishld   asm("")
ReleaseMemoryFence()     dmb ish    dmb ish     asm("")
MemoryFence()            dmb ish    dmb ish     mfence
"where AcquireMemoryFence() is used on the read side (i.e. between
reads) and ReleaseMemoryFence() is used on the write side (i.e. between
writes)".
(Note that this step would make MemoryFence() heavier-weight on x86 than
it currently is.)
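As a sketch, a GCC-flavored AcquireMemoryFence() matching the table
could look like this (the name is proposed above, not an existing
BaseLib API):

VOID
EFIAPI
AcquireMemoryFence (
  VOID
  )
{
#if defined (MDE_CPU_AARCH64)
  __asm__ __volatile__ ("dmb ishld" ::: "memory");
#elif defined (MDE_CPU_ARM)
  __asm__ __volatile__ ("dmb ish" ::: "memory");
#else
  //
  // x86 does not reorder loads against other loads, so a pure compiler
  // barrier suffices on the read side.
  //
  __asm__ __volatile__ ("" ::: "memory");
#endif
}

ReleaseMemoryFence() would be analogous ("dmb ish" on both ARM and
AARCH64, again just a compiler barrier on x86).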
(2) We should audit and rework existing uses (combinations) of
MemoryFence() into matched acquire / release pairs.
Less importantly, this would restore the x86 behavior (performance) to
the one seen with the current MemoryFence() implementation.
More importantly, it would fix what the code *means*: "it's very
important to stick to known lockless patterns, and having matching
acquire/release fences makes it much more intuitive to understand what's
going on".
--*--
I'm not proposing this as a pre-requisite to merging Ankur's series
(i.e., I don't expect the series to use acquire/release -- it can be
converted later with the rest of the audit).
However, I'd really like us to stop wasting more time on MemoryFence()
doubts such as:
- Hey it's not a memory fence at all, is it safe?
- But on x86 it kinda is, *dependent on* the spinlocks / lock prefixes
we use for synchronization, between the other accesses.
- Is it interchangeable with volatile? Yes? No? which one should we use?
In the longer term, multi-processing code like MpInitLib (CpuDxe,
CpuMpPei) and PiSmmCpuDxeSmm should likely abandon all use of
"volatile", and use the explicit barriers instead, for *clarity*.
Step (2) would take quite a bit of thinking (if not much code).
Comments?
Laszlo,
Thanks for looking into this. I noticed most of the MemoryFence() usage[1] is in the virtual platforms so changes seem manageable. I guess 3rd party code that defines its own low level primitive could be impacted, but they chose to reinvent the wheel.
It looks like the generic platform performance risk is BaseIoLibIntrinsic. I know from debugging crashes a long time ago that the MemoryFence() for x86 is really a CompilerFence() in the new scheme. If the new AcquireMemoryFence()/ReleaseMemoryFence()/CompilerFence() primitives replace MemoryFence(), this likely just works. If not, then we might need an architecture-specific pattern to maintain current performance; we could always make a function internal to the lib that does the right thing for the given architecture. Also, making it more precise might actually help ARM/AARCH64 performance?
So if I understand correctly
1) volatile basically tells C to always read the memory. So it impacts the C memory model.
2) CompilerFence() tells the compiler to serialize and complete everything before the barrier, and to read memory again the first time after the barrier.
3) MemoryFence() is really dealing with speculative fetches and maybe coherency. Basically the CPU reordering things.
So for example in BaseIoLibIntrinsic the volatile is used to force the memory read and the CompilerFence() is used to enforce the order of that read relative to other operations. Thus wrapping the volatile access in CompilerFence() calls is in some ways redundant, as long as there is only a single access between the fences.
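The pattern in question, roughly as it appears for MmioRead8() in
MdePkg/Library/BaseIoLibIntrinsic/IoLib.c:

UINT8
EFIAPI
MmioRead8 (
  IN UINTN  Address
  )
{
  UINT8  Value;

  MemoryFence ();                       // order vs. preceding accesses
  Value = *(volatile UINT8 *)Address;   // volatile forces the actual read
  MemoryFence ();                       // order vs. subsequent accesses
  return Value;
}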
Right, I agree with everything you said thus far. (I don't claim to be
an expert on this, of course.) CompilerFence() is a "better volatile"
(for RAM only, not for MMIO), where we can allow the compiler to
optimize, up to a certain location.
I like the idea of properly annotating the code. Given x86 defaults to so much cache coherency for compatibility with older, simpler times, it can tend to hide issues. I'm not sure removing the volatile keyword is required, and it does imply this variable has special rules. I guess we could add a REQUIRE_FENCE or some such? But it feels like defining the variables in some way to imply they require special handling has some value?
Objects that are shared between processors, or between processor and
some device, should always be highlighted in some way (naming, comments,
...).
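Hypothetically, a naming plus comment convention could already be
enough (illustrative only, not existing code):

//
// Shared between the BSP and the APs; the name prefix and this comment
// flag the special access rules (explicit fences only, no plain
// unsynchronized use).
//
UINT32  mSharedApSyncFlag;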
The only reason you would need a volatile is if you had more than a single read inside of the fences. Something like:
UINT8  *Address;
UINT8  Value;

MemoryFence ();
do {
  Value = *Address;
} while (Value == 0xFF);
MemoryFence ();
The compiler could optimize the above code (current MemoryFence() model) to a single read and an infinite loop if the value is 0xFF. Yes, I realize you could move the MemoryFence () into the loop, but what if you did not care about that? Mostly just trying to think this through with some examples….
I think it depends on what Address refers to. If it refers to an MMIO
register, then volatile is needed, IMO (and the MemoryFence()s are
unnecessary). If Address points to a byte in RAM that's manipulated by
multiple CPUs, then I think:
- volatile is not needed
- the two MemoryFence() calls should be removed
- an AcquireMemoryFence() call should be inserted into the loop, after
the read access. Because, presumably, once the flag assumes the
appropriate value (it has been "released"), we'll proceed to reading
some memory resource that was protected by the flag until then.
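Concretely, the RAM-flag variant of the loop above might become (again
assuming the proposed AcquireMemoryFence()):

UINT8  Value;

do {
  Value = *Address;         // plain read; no volatile
  AcquireMemoryFence ();    // also a compiler barrier, so *Address is
                            // genuinely re-read on every iteration
} while (Value == 0xFF);
//
// Reads of the data guarded by the flag can safely follow here.
//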
Regarding the many MemoryFence() calls in the virt platforms, most of
them do not exist for inter-CPU communication, but for communication
with devices through RAM. I admit I'm unsure what we should do with them.
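For instance, the fences on VirtioLib's flush path order ring updates
in RAM against the available-index update and the device kick;
paraphrased here with simplified, hypothetical names:

AvailRing[Idx % QueueSize] = DescIdx;   // expose the descriptor chain
MemoryFence ();                         // ring entry before index bump
*AvailIdx = Idx + 1;                    // publish new work to the device
MemoryFence ();                         // index bump before the kick
NotifyDevice ();                        // register write / "kick"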
Thanks
Laszlo
[1] /Volumes/Case/edk2-github(master)>git grep MemoryFence | grep ';'
ArmVirtPkg/Library/QemuFwCfgLib/QemuFwCfgLib.c:320: MemoryFence ();
ArmVirtPkg/Library/QemuFwCfgLib/QemuFwCfgLib.c:334: MemoryFence ();
ArmVirtPkg/Library/QemuFwCfgLib/QemuFwCfgLib.c:344: MemoryFence ();
MdeModulePkg/Universal/EbcDxe/EbcExecute.c:1493: MemoryFence ();
MdeModulePkg/Universal/EbcDxe/EbcExecute.c:1497: MemoryFence ();
MdeModulePkg/Universal/EbcDxe/EbcExecute.c:4702: MemoryFence ();
MdeModulePkg/Universal/EbcDxe/EbcExecute.c:4707: MemoryFence ();
MdeModulePkg/Universal/EbcDxe/EbcExecute.c:4712: MemoryFence ();
MdeModulePkg/Universal/EbcDxe/EbcExecute.c:4765: MemoryFence ();
MdeModulePkg/Universal/EbcDxe/EbcExecute.c:4770: MemoryFence ();
MdeModulePkg/Universal/EbcDxe/EbcExecute.c:4775: MemoryFence ();
MdeModulePkg/Universal/EbcDxe/EbcExecute.c:4828: MemoryFence ();
MdeModulePkg/Universal/EbcDxe/EbcExecute.c:4833: MemoryFence ();
MdeModulePkg/Universal/EbcDxe/EbcExecute.c:4838: MemoryFence ();
MdeModulePkg/Universal/EbcDxe/EbcExecute.c:4892: MemoryFence ();
MdeModulePkg/Universal/EbcDxe/EbcExecute.c:4894: MemoryFence ();
MdePkg/Library/BaseIoLibIntrinsic/IoLib.c:86: MemoryFence ();
MdePkg/Library/BaseIoLibIntrinsic/IoLib.c:88: MemoryFence ();
MdePkg/Library/BaseIoLibIntrinsic/IoLib.c:115: MemoryFence ();
MdePkg/Library/BaseIoLibIntrinsic/IoLib.c:117: MemoryFence ();
MdePkg/Library/BaseIoLibIntrinsic/IoLib.c:147: MemoryFence ();
MdePkg/Library/BaseIoLibIntrinsic/IoLib.c:149: MemoryFence ();
MdePkg/Library/BaseIoLibIntrinsic/IoLib.c:179: MemoryFence ();
MdePkg/Library/BaseIoLibIntrinsic/IoLib.c:181: MemoryFence ();
MdePkg/Library/BaseIoLibIntrinsic/IoLib.c:211: MemoryFence ();
MdePkg/Library/BaseIoLibIntrinsic/IoLib.c:213: MemoryFence ();
MdePkg/Library/BaseIoLibIntrinsic/IoLib.c:243: MemoryFence ();
MdePkg/Library/BaseIoLibIntrinsic/IoLib.c:245: MemoryFence ();
MdePkg/Library/BaseIoLibIntrinsic/IoLib.c:275: MemoryFence ();
MdePkg/Library/BaseIoLibIntrinsic/IoLib.c:277: MemoryFence ();
MdePkg/Library/BaseIoLibIntrinsic/IoLib.c:305: MemoryFence ();
MdePkg/Library/BaseIoLibIntrinsic/IoLib.c:307: MemoryFence ();
MdePkg/Library/BaseLib/AArch64/MemoryFence.asm:3:; MemoryFence() for AArch64
MdePkg/Library/BaseLib/AArch64/MemoryFence.asm:23:;MemoryFence (
MdePkg/Library/BaseLib/Arm/MemoryFence.asm:3:; MemoryFence() for AArch64
MdePkg/Library/BaseLib/Arm/MemoryFence.asm:24:;MemoryFence (
OvmfPkg/Library/QemuFwCfgLib/QemuFwCfgDxe.c:412: MemoryFence ();
OvmfPkg/Library/QemuFwCfgLib/QemuFwCfgDxe.c:425: MemoryFence ();
OvmfPkg/Library/QemuFwCfgLib/QemuFwCfgDxe.c:438: MemoryFence ();
OvmfPkg/Library/QemuFwCfgLib/QemuFwCfgPei.c:173: MemoryFence ();
OvmfPkg/Library/QemuFwCfgLib/QemuFwCfgPei.c:186: MemoryFence ();
OvmfPkg/Library/QemuFwCfgLib/QemuFwCfgPei.c:199: MemoryFence ();
OvmfPkg/Library/VirtioLib/VirtioLib.c:305: MemoryFence();
OvmfPkg/Library/VirtioLib/VirtioLib.c:312: MemoryFence();
OvmfPkg/Library/VirtioLib/VirtioLib.c:327: MemoryFence();
OvmfPkg/Library/VirtioLib/VirtioLib.c:334: MemoryFence();
OvmfPkg/Library/VirtioLib/VirtioLib.c:337: MemoryFence();
OvmfPkg/Library/VmgExitLib/VmgExitLib.c:122: MemoryFence ();
OvmfPkg/Library/VmgExitLib/VmgExitLib.c:124: MemoryFence ();
OvmfPkg/Library/XenConsoleSerialPortLib/XenConsoleSerialPortLib.c:117: MemoryFence ();
OvmfPkg/Library/XenConsoleSerialPortLib/XenConsoleSerialPortLib.c:122: MemoryFence ();
OvmfPkg/Library/XenConsoleSerialPortLib/XenConsoleSerialPortLib.c:171: MemoryFence ();
OvmfPkg/Library/XenConsoleSerialPortLib/XenConsoleSerialPortLib.c:177: MemoryFence ();
OvmfPkg/PvScsiDxe/PvScsi.c:595: MemoryFence ();
OvmfPkg/PvScsiDxe/PvScsi.c:625: MemoryFence ();
OvmfPkg/VirtioNetDxe/Events.c:56: MemoryFence ();
OvmfPkg/VirtioNetDxe/Events.c:58: MemoryFence ();
OvmfPkg/VirtioNetDxe/SnpGetStatus.c:99: MemoryFence ();
OvmfPkg/VirtioNetDxe/SnpGetStatus.c:102: MemoryFence ();
OvmfPkg/VirtioNetDxe/SnpInitialize.c:257: MemoryFence ();
OvmfPkg/VirtioNetDxe/SnpInitialize.c:384: MemoryFence ();
OvmfPkg/VirtioNetDxe/SnpInitialize.c:426: MemoryFence ();
OvmfPkg/VirtioNetDxe/SnpInitialize.c:437: MemoryFence ();
OvmfPkg/VirtioNetDxe/SnpReceive.c:101: MemoryFence ();
OvmfPkg/VirtioNetDxe/SnpReceive.c:103: MemoryFence ();
OvmfPkg/VirtioNetDxe/SnpReceive.c:173: MemoryFence ();
OvmfPkg/VirtioNetDxe/SnpReceive.c:176: MemoryFence ();
OvmfPkg/VirtioNetDxe/SnpTransmit.c:169: MemoryFence ();
OvmfPkg/VirtioNetDxe/SnpTransmit.c:172: MemoryFence ();
OvmfPkg/XenBusDxe/GrantTable.c:86: MemoryFence ();
OvmfPkg/XenBusDxe/XenStore.c:496: MemoryFence ();
OvmfPkg/XenBusDxe/XenStore.c:503: MemoryFence ();
OvmfPkg/XenBusDxe/XenStore.c:569: MemoryFence ();
OvmfPkg/XenBusDxe/XenStore.c:580: MemoryFence ();
OvmfPkg/XenBusDxe/XenStore.c:587: MemoryFence ();
OvmfPkg/XenPvBlkDxe/BlockFront.c:473: MemoryFence ();
OvmfPkg/XenPvBlkDxe/BlockFront.c:528: MemoryFence ();
OvmfPkg/XenPvBlkDxe/BlockFront.c:578: MemoryFence ();
UefiCpuPkg/Library/BaseXApicX2ApicLib/BaseXApicX2ApicLib.c:204: MemoryFence ();
Thanks,
Andrew Fish
Would there be interest in reviewing such work?
Thanks
Laszlo