Re: MemoryFence()

Laszlo Ersek

On 02/04/21 21:04, Paolo Bonzini wrote:
> On Thu, Feb 4, 2021, 20:46 Ard Biesheuvel <ardb@...> wrote:
>> It is very important to be *aware* of the acquire/release semantics,
>> but I don't think it is necessary to use all the fine grained barrier
>> types in EDK2.
> I agree as long as the primitives are self-documenting. A single
> MemoryFence() does not make it clear in which direction the data is
> flowing (whether from other threads to this one, or vice versa).
I can say about the MemoryFence() calls I've added, and about the ones
Ankur is about to add, that we're aware of the data flow directions. We
could certainly annotate every such location with either "we're
publishing information here" or "we're consuming information here".

So for example:

(a) publish:

- prepare and expose the data
- "release fence"
- release the spinlock with InterlockedCompareExchange, or push the MMIO
doorbell register

(b1) consume:

- spin on a spinlock with InterlockedCompareExchange,
- "acquire fence"
- gather / consume the data

(b2) consume:

- spin on an MMIO latch, using "volatile"
- "acquire fence"
- gather / consume the data

(b3) consume:

- spin on a normal memory location with an "acquire fence" at the end of
the loop body -- still inside the loop body,
- gather / consume the data, after the loop

I believe these general patterns / semantics are "more or less" clear to
us all; I personally have not had to do anything fancier than the above,
programming fw_cfg, virtio, or even some inter-processor communication.

*However*, how those natural-language "fence" statements map to compiler
fences, CPU/memory fences, especially with regard to architecture
specifics -- I honestly admit that's where my eyes glaze over.

The MemoryFence() API has been very convenient for *both* of those
directions, but:

- it does not inherently express the intended data flow direction, as
Paolo notes

- it's unclear whether "volatile" (or a compiler barrier) is needed *in
addition* to MemoryFence() or not -- after all, if the compiler didn't
produce the necessary store/load instructions yet, "fencing" the effects
of those instructions makes no sense (paraphrasing Ard),

- the x86 implementation of MemoryFence() openly admits to breaking the
interface contract ("this is a compiler fence only"), without explaining
why -- i.e., for what architectural reasons -- it might *still* suffice
for upholding the contract.

So in the end one just goes for: "whenever in doubt, use volatile *PLUS*
MemoryFence()".
Here's what I'd prefer:

- someone more versed in the *implementation* of these primitives than
myself should please add the "directional" barriers,

- if possible, I would prefer if a naked compiler barrier API didn't
exist at all (because (a) the above-noted directional barriers should
*guarantee* in the contracts to imply the necessary compiler barrier(s),
and (b) I wouldn't know where I'd be content with *just* a compiler barrier)

Then I could go over the MemoryFence() calls I've added, and replace
each with the proper directional variant, while simultaneously killing
*all* related "volatile" qualifiers. (I'd only keep volatile where it
referred to a known MMIO register, for example for pressing or reading
an MMIO doorbell / latch.)

Then I think someone from Intel should do the same for PiSmmCpuDxeSmm
and MpInitLib -- because every time I need to look at the swaths of
"volatile" in those drivers, I must start reasoning from square#1. And
for the life of me, I cannot squeeze a bunch of the patterns I see there
into any one of the patterns (a), (b1), (b2), and (b3) that I outlined
above. In every such case, I'm left wondering: "does it work by chance
only, or by silently exploiting architecture guarantees that I don't
know about?"

In summary, I'm mostly sure what I'd do about the volatile's and
MemoryFence()s that *I* have written, provided the directional barrier
variants existed, but I don't have the first clue about cleaning up
PiSmmCpuDxeSmm and MpInitLib. And ultimately, it would be awesome if,
when discussing a patch series like Ankur's v6, we didn't have to go
look for examples and tacit intentions in PiSmmCpuDxeSmm. :/
