MemoryFence()
Laszlo Ersek
Hi All,
here's the declaration of MemoryFence(), from <BaseLib.h>: /**The IA32 and X64 implementations of MemoryFence() are not memory barriers, only compiler barriers, however. This is a problem for two reasons: - code that believes to add a memory barrier only adds a compiler barrier, possibly creating an unsafe situation, - code that knowingly calls MemoryFence() for the purposes of a compiler barrier -- while it will hopefully work safely, relying on tacit knowledge / assumptions about the x86 architecture -- results in a source reading experience that mismatches known lockless patterns. This topic has repeatedly come up in the past; most recently in the review of Ankur's v6 series for <https://bugzilla.tianocore.org/show_bug.cgi?id=3132>, where some inter-processor communication needs to happen. I asked Paolo for help with understanding some of the synchronziation patterns in PiSmmCpuDxeSmm, the reason being that Ankur's series is going to rely on, and/or imitate, some of those patterns. From this discussion with Paolo, the following have emerged (I'm going to liberally steal / paraphrase Paolo's words): (1) We should introduce finer-grained fence primitives: ARM AARCH64 i386 CompilerFence() asm("") asm("") asm("") AcquireMemoryFence() dmb ish dmb ishld asm("") ReleaseMemoryFence() dmb ish dmb ish asm("") MemoryFence() dmb ish dmb ish mfence "where AcquireMemoryFence() is used on the read side (i.e. between reads) and ReleaseMemoryFence() is used on the write side (i.e. between writes)". (Note that this step would make MemoryFence() heavier-weight on x86 than it currently is.) (2) We should audit and rework existent uses (combinations) of MemoryFence() into matched acquire / release pairs. Less importantly, this would restore the x86 behavior (performance) to the one seen with the current MemoryFence() implementation. More importantly, it would fix what the code *means*: "it's very important to stick to known lockless patterns, and having matching acquire/release fences makes it much more intuitive to understand what's going on". --*-- I'm not proposing this as a pre-requisite to merging Ankur's series (i.e., I don't expect the series to use acquire/release -- it can be converted later with the rest of the audit). However, I'd really like us to stop wasting more time on MemoryFence() doubts such as: - Hey it's not a memory fence at all, is it safe? - But on x86 it kinda is, *dependent on* the spinlocks / lock prefixes we use for synchronization, between the other accesses. - Is it interchangeable with volatile? Yes? No? which one should we use? In the longer term, multi-processing code like MpInitLib (CpuDxe, CpuMpPei) and PiSmmCpuDxeSmm should likely abandon all use of "volatile", and use the explicit barriers instead, for *clarity*. Step (2) would take quite a bit of thinking (if not much code). Comments? Would there be interest in reviewing such work? Thanks Laszlo |
|