
Re: MemoryFence()

Andrew Fish <afish@...>
 

Ray,

Yes, I was writing a mail when I wrote the bugzilla….

Function calls are not compiler fences. The compiler just optimized the code away as dead code. For non-VC++ toolchains we have a real compiler fence/barrier in MemoryFence(). I filed a bugzilla [1] so we can investigate this some more.

I hit a real bug with clang using the MMIO functions a long time ago, and I think that might be why the GCC version of MemoryFence() got the compiler fence/barrier.

If I had to guess, VC++ is probably more conservative about volatiles, and that is why we have not seen issues with VC++; but that depends on non-standard behavior.

Well, this probably explains why we don't have many MemoryFence() calls in the x86 CPU code, given they don't do anything on VC++. I have noticed from debugging MP code that was developed using VC++ that we seem to hit more issues when compiled with clang. I wonder if the behavior difference with volatile would explain some of those?

VOID
EFIAPI
MemoryFence (
  VOID
  )
{
  return;
}

And:

VOID
EFIAPI
MemoryFence (
  VOID
  )
{
  //
  // This is a little bit of overkill; it is more about the compiler than it is
  // about actual processor synchronization. This is like the _ReadWriteBarrier
  // Microsoft-specific intrinsic.
  //
  __asm__ __volatile__ ("":::"memory");
}
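To make the dead-code point above concrete, here is a minimal sketch (GCC/clang syntax, since it uses the asm clobber). EmptyMemoryFence(), CompilerMemoryFence(), and PollFlag() are illustrative names for this sketch only, not the real BaseLib APIs:

```c
/*
 * Illustrative stubs mirroring the two edk2 MemoryFence()
 * implementations quoted above; NOT the real BaseLib functions.
 */

static void EmptyMemoryFence (void)
{
  /* The VC++ build: an empty function, so no compiler barrier at all. */
  return;
}

static void CompilerMemoryFence (void)
{
  /* The GCC build: the "memory" clobber tells the optimizer that any
     memory may have changed, so cached reads must be discarded and
     pending writes completed before this point. */
  __asm__ __volatile__ ("":::"memory");
}

/* Poll a flag until it becomes nonzero. With only EmptyMemoryFence()
   and no volatile, the compiler may legally hoist the load out of the
   loop and spin forever on a stale value; the clobber forbids that. */
static int PollFlag (int *Flag)
{
  while (*Flag == 0) {
    CompilerMemoryFence ();
  }
  return *Flag;
}
```

At -O2, swapping CompilerMemoryFence() for EmptyMemoryFence() in PollFlag() may let the optimizer collapse the loop to a single load, which is the failure mode described above.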

So I think for VC++ it should be…

VOID
EFIAPI
MemoryFence (
  VOID
  )
{
  _ReadWriteBarrier ();
  return;
}

[1] https://bugzilla.tianocore.org/show_bug.cgi?id=3213

Thanks,

Andrew Fish

On Feb 5, 2021, at 9:53 AM, Ni, Ray <ray.ni@...> wrote:

Without calling _ReadWriteBarrier, is it possible that the compiler generates the assembly in the wrong location? I mean, the compiler may inline LibWaitForSemaphore and call cmpxchg earlier than the desired location.
Similar to LibReleaseSemaphore.

So my understanding is the _ReadWriteBarrier in ReleaseSpinLock is required.

I think Andrew also has the same thoughts.

thanks,
ray
From: Paolo Bonzini <pbonzini@...>
Sent: Saturday, February 6, 2021 1:34:15 AM
To: Laszlo Ersek <lersek@...>; Ni, Ray <ray.ni@...>; Ard Biesheuvel <ardb@...>
Cc: Andrew Fish <afish@...>; edk2 RFC list <rfc@edk2.groups.io>; Kinney, Michael D <michael.d.kinney@...>; Leif Lindholm (Nuvia address) <leif@...>; Dong, Eric <eric.dong@...>; Liming Gao (Byosoft address) <gaoliming@...>; Ankur Arora <ankur.a.arora@...>
Subject: Re: MemoryFence()

On 05/02/21 18:31, Laszlo Ersek wrote:
On 02/05/21 18:21, Paolo Bonzini wrote:
On 05/02/21 17:56, Laszlo Ersek wrote:
221 SPIN_LOCK *
222 EFIAPI
223 ReleaseSpinLock (
224 IN OUT SPIN_LOCK *SpinLock
225 )
226 {
227 SPIN_LOCK LockValue;
228
229 ASSERT (SpinLock != NULL);
230
231 LockValue = *SpinLock;
232 ASSERT (LockValue == SPIN_LOCK_ACQUIRED || LockValue ==
SPIN_LOCK_RELEASED);
233
234 _ReadWriteBarrier ();
235 *SpinLock = SPIN_LOCK_RELEASED;
236 _ReadWriteBarrier ();
237
238 return SpinLock;
239 }

Fishy. I would have implemented it with another
InterlockedCompareExchangePointer(), and maybe ASSERT()ed on the
original value returned by the InterlockedCompareExchangePointer().
That would be overkill. However, it *is* buggy because it is missing a
(processor) barrier on non-x86 architectures and has a useless barrier
after the store. Instead it should be just this:

ReleaseMemoryFence ();
*SpinLock = SPIN_LOCK_RELEASED;
No concern that the store might not be atomic?
Not as long as it's a pointer or smaller.

Paolo


Re: MemoryFence()

Ni, Ray
 

I saw the proposal of fences in the first mail by Laszlo. Please forgive my ignorance. What is asm("") in x86? A nop? Then how can a nop help as a processor-level load/store barrier?


thanks,
ray
________________________________
From: rfc@edk2.groups.io <rfc@edk2.groups.io> on behalf of Paolo Bonzini <pbonzini@...>
Sent: Saturday, February 6, 2021 2:01:14 AM
To: Ni, Ray <ray.ni@...>; Laszlo Ersek <lersek@...>; Ard Biesheuvel <ardb@...>
Cc: Andrew Fish <afish@...>; edk2 RFC list <rfc@edk2.groups.io>; Kinney, Michael D <michael.d.kinney@...>; Leif Lindholm (Nuvia address) <leif@...>; Dong, Eric <eric.dong@...>; Liming Gao (Byosoft address) <gaoliming@...>; Ankur Arora <ankur.a.arora@...>
Subject: Re: [edk2-rfc] MemoryFence()

On 05/02/21 18:53, Ni, Ray wrote:
Without calling _ReadWriteBarrier, is it possible that the compiler
generates the assembly in the wrong location? I mean, the compiler may
inline LibWaitForSemaphore and call cmpxchg earlier than the
desired location.
Similar to LibReleaseSemaphore.

So my understanding is the _ReadWriteBarrier in ReleaseSpinLock is required.
The proposed ReleaseMemoryFence() should already have that effect. All
the proposed fences except CompilerFence() are both compiler
optimization barriers and processor barriers.
InterlockedCompareExchange() is also both a compiler optimization
barrier and a processor barrier.

CompilerFence() is just a better name for _ReadWriteBarrier(), it blocks
optimizations but it has no effect at the processor level. It should
only be used (instead of volatile) in busy-waiting loops that do not
always go through an InterlockedCompareExchange.

Paolo

234 _ReadWriteBarrier ();
235 *SpinLock = SPIN_LOCK_RELEASED;
236 _ReadWriteBarrier ();
237
238 return SpinLock;
it *is* buggy because it is missing a
(processor) barrier on non-x86 architectures and has a useless barrier
after the store. Instead it should be just this:

ReleaseMemoryFence ();
*SpinLock = SPIN_LOCK_RELEASED;


Re: MemoryFence()

Ni, Ray
 

Andrew,
I just read the link you provided. It recommends using mutex/semaphore/spin lock primitives over volatile.
But the funny thing here is that we are talking about how to implement the mutex/semaphore/spin lock in edk2.

Maybe we need to rely on the built-in atomic functions, but I am not sure whether these are GCC-only?
https://gcc.gnu.org/onlinedocs/gcc/_005f_005fatomic-Builtins.html


thanks,
ray
________________________________
From: Andrew Fish <afish@...>
Sent: Saturday, February 6, 2021 1:48:50 AM
To: Ni, Ray <ray.ni@...>
Cc: Laszlo Ersek <lersek@...>; edk2 RFC list <rfc@edk2.groups.io>; Kinney, Michael D <michael.d.kinney@...>; Leif Lindholm (Nuvia address) <leif@...>; Ard Biesheuvel (kernel.org address) <ardb@...>; Dong, Eric <eric.dong@...>; Liming Gao (Byosoft address) <gaoliming@...>; Ankur Arora <ankur.a.arora@...>; Paolo Bonzini <pbonzini@...>
Subject: Re: MemoryFence()



On Feb 5, 2021, at 12:25 AM, Ni, Ray <ray.ni@...> wrote:

So if I understand correctly:
1) volatile basically tells the compiler to always read the memory. So it impacts the C memory model.
2) CompilerFence() tells the compiler to serialize and complete everything before the barrier, and to read memory again the first time after the barrier.
3) MemoryFence() is really dealing with speculative fetches and maybe coherency. Basically, the CPU reordering things.
I agree with this summary.
Volatile used in MpInitLib or the PiSmmCpu driver cannot be omitted, because different CPU threads change the same memory content to synchronize with each other.
I don't quite understand: if we remove "volatile", what mechanism can be used to force the compiler to generate code that always reads from memory?


Ray,

I agree the C definition of volatile is not thread safe.

I was doing some research on volatile and I realized the Linux kernel guides recommend against using volatile [1]. I think it has to do with people confusing volatile for being thread safe. So my guess is it is more of a coding standard to encourage people to use synchronization primitives. The primitives imply compiler fences, which indirectly enforce the same behavior as volatile.

[1] https://www.kernel.org/doc/html/latest/process/volatile-considered-harmful.html

Thanks,

Andrew Fish


Thanks,
Ray


Re: MemoryFence()

Paolo Bonzini <pbonzini@...>
 

On 05/02/21 18:53, Ni, Ray wrote:
Without calling _ReadWriteBarrier, is it possible that the compiler generates the assembly in the wrong location? I mean, the compiler may inline LibWaitForSemaphore and call cmpxchg earlier than the desired location.
Similar to LibReleaseSemaphore.
So my understanding is the _ReadWriteBarrier in ReleaseSpinLock is required.
The proposed ReleaseMemoryFence() should already have that effect. All the proposed fences except CompilerFence() are both compiler optimization barriers and processor barriers. InterlockedCompareExchange() is also both a compiler optimization barrier and a processor barrier.

CompilerFence() is just a better name for _ReadWriteBarrier(), it blocks optimizations but it has no effect at the processor level. It should only be used (instead of volatile) in busy-waiting loops that do not always go through an InterlockedCompareExchange.

Paolo

234 _ReadWriteBarrier ();
235 *SpinLock = SPIN_LOCK_RELEASED;
236 _ReadWriteBarrier ();
237
238 return SpinLock;
it *is* buggy because it is missing a
(processor) barrier on non-x86 architectures and has a useless barrier
after the store. Instead it should be just this:

ReleaseMemoryFence ();
*SpinLock = SPIN_LOCK_RELEASED;


Re: MemoryFence()

Ni, Ray
 

Without calling _ReadWriteBarrier, is it possible that the compiler generates the assembly in the wrong location? I mean, the compiler may inline LibWaitForSemaphore and call cmpxchg earlier than the desired location.
Similar to LibReleaseSemaphore.

So my understanding is the _ReadWriteBarrier in ReleaseSpinLock is required.

I think Andrew also has the same thoughts.

thanks,
ray
________________________________
From: Paolo Bonzini <pbonzini@...>
Sent: Saturday, February 6, 2021 1:34:15 AM
To: Laszlo Ersek <lersek@...>; Ni, Ray <ray.ni@...>; Ard Biesheuvel <ardb@...>
Cc: Andrew Fish <afish@...>; edk2 RFC list <rfc@edk2.groups.io>; Kinney, Michael D <michael.d.kinney@...>; Leif Lindholm (Nuvia address) <leif@...>; Dong, Eric <eric.dong@...>; Liming Gao (Byosoft address) <gaoliming@...>; Ankur Arora <ankur.a.arora@...>
Subject: Re: MemoryFence()

On 05/02/21 18:31, Laszlo Ersek wrote:
On 02/05/21 18:21, Paolo Bonzini wrote:
On 05/02/21 17:56, Laszlo Ersek wrote:
221 SPIN_LOCK *
222 EFIAPI
223 ReleaseSpinLock (
224 IN OUT SPIN_LOCK *SpinLock
225 )
226 {
227 SPIN_LOCK LockValue;
228
229 ASSERT (SpinLock != NULL);
230
231 LockValue = *SpinLock;
232 ASSERT (LockValue == SPIN_LOCK_ACQUIRED || LockValue ==
SPIN_LOCK_RELEASED);
233
234 _ReadWriteBarrier ();
235 *SpinLock = SPIN_LOCK_RELEASED;
236 _ReadWriteBarrier ();
237
238 return SpinLock;
239 }

Fishy. I would have implemented it with another
InterlockedCompareExchangePointer(), and maybe ASSERT()ed on the
original value returned by the InterlockedCompareExchangePointer().
That would be overkill. However, it *is* buggy because it is missing a
(processor) barrier on non-x86 architectures and has a useless barrier
after the store. Instead it should be just this:

ReleaseMemoryFence ();
*SpinLock = SPIN_LOCK_RELEASED;
No concern that the store might not be atomic?
Not as long as it's a pointer or smaller.

Paolo


Re: MemoryFence()

Andrew Fish <afish@...>
 

On Feb 5, 2021, at 7:38 AM, Paolo Bonzini <pbonzini@...> wrote:

On 05/02/21 16:34, Laszlo Ersek wrote:
And your code is wrong because the *original*, namely the MSFT
implementation of MemoryFence() in edk2, is bugged:
* MdePkg/Library/BaseLib/X86MemoryFence.c:
/**
Used to serialize load and store operations.

All loads and stores that proceed calls to this function are guaranteed to be
globally visible when this function returns.

**/
VOID
EFIAPI
MemoryFence (
VOID
)
{
return;
}
I think it's okay-ish (it works as long as you don't do link-time optimization, but it's less efficient than a compiler intrinsic) because X86MemoryFence.c is a separate file.
We do link-time optimization on VC++ and clang.

Thanks,

Andrew Fish

Paolo


Re: MemoryFence()

Andrew Fish <afish@...>
 

On Feb 5, 2021, at 12:25 AM, Ni, Ray <ray.ni@...> wrote:

So if I understand correctly:
1) volatile basically tells the compiler to always read the memory. So it impacts the C memory model.
2) CompilerFence() tells the compiler to serialize and complete everything before the barrier, and to read memory again the first time after the barrier.
3) MemoryFence() is really dealing with speculative fetches and maybe coherency. Basically, the CPU reordering things.
I agree with this summary.
Volatile used in MpInitLib or the PiSmmCpu driver cannot be omitted, because different CPU threads change the same memory content to synchronize with each other.
I don't quite understand: if we remove "volatile", what mechanism can be used to force the compiler to generate code that always reads from memory?
Ray,

I agree the C definition of volatile is not thread safe.

I was doing some research on volatile and I realized the Linux kernel guides recommend against using volatile [1]. I think it has to do with people confusing volatile for being thread safe. So my guess is it is more of a coding standard to encourage people to use synchronization primitives. The primitives imply compiler fences, which indirectly enforce the same behavior as volatile.

[1] https://www.kernel.org/doc/html/latest/process/volatile-considered-harmful.html

Thanks,

Andrew Fish


Thanks,
Ray


Re: MemoryFence()

Paolo Bonzini <pbonzini@...>
 

On 05/02/21 18:31, Laszlo Ersek wrote:
On 02/05/21 18:21, Paolo Bonzini wrote:
On 05/02/21 17:56, Laszlo Ersek wrote:
    221  SPIN_LOCK *
    222  EFIAPI
    223  ReleaseSpinLock (
    224    IN OUT  SPIN_LOCK                 *SpinLock
    225    )
    226  {
    227    SPIN_LOCK    LockValue;
    228
    229    ASSERT (SpinLock != NULL);
    230
    231    LockValue = *SpinLock;
    232    ASSERT (LockValue == SPIN_LOCK_ACQUIRED || LockValue ==
SPIN_LOCK_RELEASED);
    233
    234    _ReadWriteBarrier ();
    235    *SpinLock = SPIN_LOCK_RELEASED;
    236    _ReadWriteBarrier ();
    237
    238    return SpinLock;
    239  }

Fishy. I would have implemented it with another
InterlockedCompareExchangePointer(), and maybe ASSERT()ed on the
original value returned by the InterlockedCompareExchangePointer().
That would be overkill.  However, it *is* buggy because it is missing a
(processor) barrier on non-x86 architectures and has a useless barrier
after the store.  Instead it should be just this:

    ReleaseMemoryFence ();
    *SpinLock = SPIN_LOCK_RELEASED;
No concern that the store might not be atomic?
Not as long as it's a pointer or smaller.

Paolo


Re: MemoryFence()

Andrew Fish <afish@...>
 

On Feb 4, 2021, at 11:48 PM, Laszlo Ersek <lersek@...> wrote:

On 02/04/21 22:31, Andrew Fish wrote:


On Feb 4, 2021, at 10:09 AM, Laszlo Ersek <lersek@...> wrote:

Hi All,

here's the declaration of MemoryFence(), from <BaseLib.h>:

/**
Used to serialize load and store operations.

All loads and stores that proceed calls to this function are guaranteed to be
globally visible when this function returns.

**/
VOID
EFIAPI
MemoryFence (
VOID
);
The IA32 and X64 implementations of MemoryFence() are not memory
barriers, only compiler barriers, however.

This is a problem for two reasons:

- code that believes to add a memory barrier only adds a compiler
barrier, possibly creating an unsafe situation,

- code that knowingly calls MemoryFence() for the purposes of a compiler
barrier -- while it will hopefully work safely, relying on tacit
knowledge / assumptions about the x86 architecture -- results in a
source reading experience that mismatches known lockless patterns.


This topic has repeatedly come up in the past; most recently in the
review of Ankur's v6 series for
<https://bugzilla.tianocore.org/show_bug.cgi?id=3132>, where some
inter-processor communication needs to happen.

I asked Paolo for help with understanding some of the synchronization
patterns in PiSmmCpuDxeSmm, the reason being that Ankur's series is
going to rely on, and/or imitate, some of those patterns. From this
discussion with Paolo, the following have emerged (I'm going to
liberally steal / paraphrase Paolo's words):


(1) We should introduce finer-grained fence primitives:

                          ARM        AARCH64    i386

    CompilerFence()       asm("")    asm("")    asm("")
    AcquireMemoryFence()  dmb ish    dmb ishld  asm("")
    ReleaseMemoryFence()  dmb ish    dmb ish    asm("")
    MemoryFence()         dmb ish    dmb ish    mfence

"where AcquireMemoryFence() is used on the read side (i.e. between
reads) and ReleaseMemoryFence() is used on the write side (i.e. between
writes)".

(Note that this step would make MemoryFence() heavier-weight on x86 than
it currently is.)
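The acquire/release pairing above can be sketched as follows. This is a minimal illustration, assuming GCC/clang builtins as stand-ins for the proposed BaseLib primitives; Produce(), Consume(), Payload, and Ready are hypothetical names used only for this sketch:

```c
/* GCC/clang builtins standing in for the proposed fence primitives. */
#define CompilerFence()       __asm__ __volatile__ ("":::"memory")
#define AcquireMemoryFence()  __atomic_thread_fence (__ATOMIC_ACQUIRE)
#define ReleaseMemoryFence()  __atomic_thread_fence (__ATOMIC_RELEASE)

static int  Payload;
static int  Ready;

/* Writer side: publish the data, then set the flag. The release fence
   keeps the Payload store from sinking below the Ready store. */
static void Produce (int Value)
{
  Payload = Value;
  ReleaseMemoryFence ();
  Ready = 1;
}

/* Reader side: spin until the flag is set, then consume. The acquire
   fence keeps the Payload load from rising above the Ready load; the
   compiler fence forces a fresh load of Ready on every iteration. */
static int Consume (void)
{
  do {
    CompilerFence ();
  } while (Ready == 0);
  AcquireMemoryFence ();
  return Payload;
}
```

The point of the matched pair is exactly the quoted rule: the release fence sits between the writes on the producer side, and the acquire fence sits between the reads on the consumer side.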


(2) We should audit and rework existent uses (combinations) of
MemoryFence() into matched acquire / release pairs.

Less importantly, this would restore the x86 behavior (performance) to
the one seen with the current MemoryFence() implementation.

More importantly, it would fix what the code *means*: "it's very
important to stick to known lockless patterns, and having matching
acquire/release fences makes it much more intuitive to understand what's
going on".

--*--

I'm not proposing this as a pre-requisite to merging Ankur's series
(i.e., I don't expect the series to use acquire/release -- it can be
converted later with the rest of the audit).

However, I'd really like us to stop wasting more time on MemoryFence()
doubts such as:

- Hey it's not a memory fence at all, is it safe?

- But on x86 it kinda is, *dependent on* the spinlocks / lock prefixes
we use for synchronization, between the other accesses.

- Is it interchangeable with volatile? Yes? No? which one should we use?

In the longer term, multi-processing code like MpInitLib (CpuDxe,
CpuMpPei) and PiSmmCpuDxeSmm should likely abandon all use of
"volatile", and use the explicit barriers instead, for *clarity*.


Step (2) would take quite a bit of thinking (if not much code).

Comments?
Laszlo,

Thanks for looking into this. I noticed most of the MemoryFence() usage[1] is in the virtual platforms so changes seem manageable. I guess 3rd party code that defines its own low level primitive could be impacted, but they chose to reinvent the wheel.

It looks like the generic platform performance risk is BaseIoLibIntrinsic. I know from debugging crashes a long time ago that the MemoryFence() for x86 is really a CompilerFence() in the new scheme. If the new AcquireMemoryFence()/ReleaseMemoryFence()/CompilerFence() replace MemoryFence(), this likely just works. If not, then we might need an architecture-specific pattern to maintain current performance; we could always make a function internal to the lib that does the right thing for the given architecture. Also, making it more precise might actually help ARM/AARCH64 performance?

So if I understand correctly:
1) volatile basically tells the compiler to always read the memory. So it impacts the C memory model.
2) CompilerFence() tells the compiler to serialize and complete everything before the barrier, and to read memory again the first time after the barrier.
3) MemoryFence() is really dealing with speculative fetches and maybe coherency. Basically, the CPU reordering things.

So for example in BaseIoLibIntrinsic() the volatile is used to force the memory read and the CompilerFence() is used to enforce the order of that read relative to other operations. Thus wrapping the volatile access in CompilerFence() is in some ways redundant as long as there is only a single access inside the CompilerFence() wrapper.
Right, I agree with everything you said thus far. (I don't claim to be
an expert on this, of course.) CompilerFence() is a "better volatile"
(for RAM only, not for MMIO), where we can allow the compiler to
optimize, up to a certain location.

I like the idea of properly annotating the code. Given x86 defaults to so much cache coherency for compatibility with older, simpler times, it can tend to hide issues. I'm not sure removing the volatile keyword is required, and it does imply this variable has special rules. I guess we could add a REQUIRE_FENCE or some such? But it feels like defining the variables in some way to imply they require special handling has some value?
Objects that are shared between processors, or between processor and
some device, should always be highlighted in some way (naming, comments,
...).


The only reason you would need a volatile is if you had more than a single read inside of the fences. Something like:

UINT8  *Address;
UINT8  Value;

MemoryFence ();
do {
  Value = *Address;
} while (Value == 0xFF);
MemoryFence ();

The compiler could optimize the above code (current MemoryFence() model) to a single read and an infinite loop if the value is 0xFF. Yes, I realize you could move the MemoryFence() into the loop, but what if you did not care about that? Mostly just trying to think this through with some examples….
I think it depends on what Address refers to. If it refers to an MMIO
register, then volatile is needed, IMO (and the MemoryFence()s are
unnecessary). If Address points to a byte in RAM that's manipulated by
multiple CPUs, then I think:

- volatile is not needed
OK I see Linux recommends not using volatile [1], so I think this makes sense to me. It is not that volatile is not useful, but it is not thread safe, so not using volatile encourages people to use primitives to make things thread safe. So not using volatile is more of a coding convention.

- the two MemoryFence() calls should be removed
They may need to turn into CompilerFence().

- an AcquireMemoryFence() call should be inserted into the loop, after
the read access. Because, presumably, once the flag assumes the
appropriate value (it has been "released")), we'll proceed to reading
some memory resource that was protected by the flag until then.
CompilerFence ();
do {
  Value = *Address;
  AcquireMemoryFence ();
} while (Value == 0xFF);



Regarding the many MemoryFence() calls in the virt platforms, most of
them do not exist for inter-CPU communication, but for communication
with devices through RAM. I admit I'm unsure what we should do with them.
I think this is where x86 makes things confusing with its coherency-all-the-time motto. On x86 I think they are really CompilerFence()s to force C memory-model order of operations. They got added to fix real bugs on x86. Remember, the compiler can still reorder a volatile access with surrounding code as an optimization.

So for example, if MmioWrite8() has MemoryFence() replaced with CompilerFence(), these operations happen in order. If there is no CompilerFence(), the order is implementation dependent.

UINT8 *A, *B, *C;

*A = 1;
MmioWrite8 (B, 2);
*C = 3;

So that is why we want the CompilerFence() + volatile for the MMIO. For example, you could end up writing to a common DMA buffer and then doing an MMIO write. You might want those to happen in sequence, and for that you need the CompilerFence().

At least the CPU code is CPU specific so the rules should be more well defined :).

[1] https://www.kernel.org/doc/html/latest/process/volatile-considered-harmful.html

Thanks,

Andrew Fish


Thanks
Laszlo


[1] /Volumes/Case/edk2-github(master)>git grep MemoryFence | grep ';'
ArmVirtPkg/Library/QemuFwCfgLib/QemuFwCfgLib.c:320: MemoryFence ();
ArmVirtPkg/Library/QemuFwCfgLib/QemuFwCfgLib.c:334: MemoryFence ();
ArmVirtPkg/Library/QemuFwCfgLib/QemuFwCfgLib.c:344: MemoryFence ();
MdeModulePkg/Universal/EbcDxe/EbcExecute.c:1493: MemoryFence ();
MdeModulePkg/Universal/EbcDxe/EbcExecute.c:1497: MemoryFence ();
MdeModulePkg/Universal/EbcDxe/EbcExecute.c:4702: MemoryFence ();
MdeModulePkg/Universal/EbcDxe/EbcExecute.c:4707: MemoryFence ();
MdeModulePkg/Universal/EbcDxe/EbcExecute.c:4712: MemoryFence ();
MdeModulePkg/Universal/EbcDxe/EbcExecute.c:4765: MemoryFence ();
MdeModulePkg/Universal/EbcDxe/EbcExecute.c:4770: MemoryFence ();
MdeModulePkg/Universal/EbcDxe/EbcExecute.c:4775: MemoryFence ();
MdeModulePkg/Universal/EbcDxe/EbcExecute.c:4828: MemoryFence ();
MdeModulePkg/Universal/EbcDxe/EbcExecute.c:4833: MemoryFence ();
MdeModulePkg/Universal/EbcDxe/EbcExecute.c:4838: MemoryFence ();
MdeModulePkg/Universal/EbcDxe/EbcExecute.c:4892: MemoryFence ();
MdeModulePkg/Universal/EbcDxe/EbcExecute.c:4894: MemoryFence ();
MdePkg/Library/BaseIoLibIntrinsic/IoLib.c:86: MemoryFence ();
MdePkg/Library/BaseIoLibIntrinsic/IoLib.c:88: MemoryFence ();
MdePkg/Library/BaseIoLibIntrinsic/IoLib.c:115: MemoryFence ();
MdePkg/Library/BaseIoLibIntrinsic/IoLib.c:117: MemoryFence ();
MdePkg/Library/BaseIoLibIntrinsic/IoLib.c:147: MemoryFence ();
MdePkg/Library/BaseIoLibIntrinsic/IoLib.c:149: MemoryFence ();
MdePkg/Library/BaseIoLibIntrinsic/IoLib.c:179: MemoryFence ();
MdePkg/Library/BaseIoLibIntrinsic/IoLib.c:181: MemoryFence ();
MdePkg/Library/BaseIoLibIntrinsic/IoLib.c:211: MemoryFence ();
MdePkg/Library/BaseIoLibIntrinsic/IoLib.c:213: MemoryFence ();
MdePkg/Library/BaseIoLibIntrinsic/IoLib.c:243: MemoryFence ();
MdePkg/Library/BaseIoLibIntrinsic/IoLib.c:245: MemoryFence ();
MdePkg/Library/BaseIoLibIntrinsic/IoLib.c:275: MemoryFence ();
MdePkg/Library/BaseIoLibIntrinsic/IoLib.c:277: MemoryFence ();
MdePkg/Library/BaseIoLibIntrinsic/IoLib.c:305: MemoryFence ();
MdePkg/Library/BaseIoLibIntrinsic/IoLib.c:307: MemoryFence ();
MdePkg/Library/BaseLib/AArch64/MemoryFence.asm:3:; MemoryFence() for AArch64
MdePkg/Library/BaseLib/AArch64/MemoryFence.asm:23:;MemoryFence (
MdePkg/Library/BaseLib/Arm/MemoryFence.asm:3:; MemoryFence() for AArch64
MdePkg/Library/BaseLib/Arm/MemoryFence.asm:24:;MemoryFence (
OvmfPkg/Library/QemuFwCfgLib/QemuFwCfgDxe.c:412: MemoryFence ();
OvmfPkg/Library/QemuFwCfgLib/QemuFwCfgDxe.c:425: MemoryFence ();
OvmfPkg/Library/QemuFwCfgLib/QemuFwCfgDxe.c:438: MemoryFence ();
OvmfPkg/Library/QemuFwCfgLib/QemuFwCfgPei.c:173: MemoryFence ();
OvmfPkg/Library/QemuFwCfgLib/QemuFwCfgPei.c:186: MemoryFence ();
OvmfPkg/Library/QemuFwCfgLib/QemuFwCfgPei.c:199: MemoryFence ();
OvmfPkg/Library/VirtioLib/VirtioLib.c:305: MemoryFence();
OvmfPkg/Library/VirtioLib/VirtioLib.c:312: MemoryFence();
OvmfPkg/Library/VirtioLib/VirtioLib.c:327: MemoryFence();
OvmfPkg/Library/VirtioLib/VirtioLib.c:334: MemoryFence();
OvmfPkg/Library/VirtioLib/VirtioLib.c:337: MemoryFence();
OvmfPkg/Library/VmgExitLib/VmgExitLib.c:122: MemoryFence ();
OvmfPkg/Library/VmgExitLib/VmgExitLib.c:124: MemoryFence ();
OvmfPkg/Library/XenConsoleSerialPortLib/XenConsoleSerialPortLib.c:117: MemoryFence ();
OvmfPkg/Library/XenConsoleSerialPortLib/XenConsoleSerialPortLib.c:122: MemoryFence ();
OvmfPkg/Library/XenConsoleSerialPortLib/XenConsoleSerialPortLib.c:171: MemoryFence ();
OvmfPkg/Library/XenConsoleSerialPortLib/XenConsoleSerialPortLib.c:177: MemoryFence ();
OvmfPkg/PvScsiDxe/PvScsi.c:595: MemoryFence ();
OvmfPkg/PvScsiDxe/PvScsi.c:625: MemoryFence ();
OvmfPkg/VirtioNetDxe/Events.c:56: MemoryFence ();
OvmfPkg/VirtioNetDxe/Events.c:58: MemoryFence ();
OvmfPkg/VirtioNetDxe/SnpGetStatus.c:99: MemoryFence ();
OvmfPkg/VirtioNetDxe/SnpGetStatus.c:102: MemoryFence ();
OvmfPkg/VirtioNetDxe/SnpInitialize.c:257: MemoryFence ();
OvmfPkg/VirtioNetDxe/SnpInitialize.c:384: MemoryFence ();
OvmfPkg/VirtioNetDxe/SnpInitialize.c:426: MemoryFence ();
OvmfPkg/VirtioNetDxe/SnpInitialize.c:437: MemoryFence ();
OvmfPkg/VirtioNetDxe/SnpReceive.c:101: MemoryFence ();
OvmfPkg/VirtioNetDxe/SnpReceive.c:103: MemoryFence ();
OvmfPkg/VirtioNetDxe/SnpReceive.c:173: MemoryFence ();
OvmfPkg/VirtioNetDxe/SnpReceive.c:176: MemoryFence ();
OvmfPkg/VirtioNetDxe/SnpTransmit.c:169: MemoryFence ();
OvmfPkg/VirtioNetDxe/SnpTransmit.c:172: MemoryFence ();
OvmfPkg/XenBusDxe/GrantTable.c:86: MemoryFence ();
OvmfPkg/XenBusDxe/XenStore.c:496: MemoryFence ();
OvmfPkg/XenBusDxe/XenStore.c:503: MemoryFence ();
OvmfPkg/XenBusDxe/XenStore.c:569: MemoryFence ();
OvmfPkg/XenBusDxe/XenStore.c:580: MemoryFence ();
OvmfPkg/XenBusDxe/XenStore.c:587: MemoryFence ();
OvmfPkg/XenPvBlkDxe/BlockFront.c:473: MemoryFence ();
OvmfPkg/XenPvBlkDxe/BlockFront.c:528: MemoryFence ();
OvmfPkg/XenPvBlkDxe/BlockFront.c:578: MemoryFence ();
UefiCpuPkg/Library/BaseXApicX2ApicLib/BaseXApicX2ApicLib.c:204: MemoryFence ();


Thanks,

Andrew Fish

Would there be interest in reviewing such work?

Thanks
Laszlo


Re: MemoryFence()

Laszlo Ersek
 

On 02/05/21 18:21, Paolo Bonzini wrote:
On 05/02/21 17:56, Laszlo Ersek wrote:
    221  SPIN_LOCK *
    222  EFIAPI
    223  ReleaseSpinLock (
    224    IN OUT  SPIN_LOCK                 *SpinLock
    225    )
    226  {
    227    SPIN_LOCK    LockValue;
    228
    229    ASSERT (SpinLock != NULL);
    230
    231    LockValue = *SpinLock;
    232    ASSERT (LockValue == SPIN_LOCK_ACQUIRED || LockValue ==
SPIN_LOCK_RELEASED);
    233
    234    _ReadWriteBarrier ();
    235    *SpinLock = SPIN_LOCK_RELEASED;
    236    _ReadWriteBarrier ();
    237
    238    return SpinLock;
    239  }

Fishy. I would have implemented it with another
InterlockedCompareExchangePointer(), and maybe ASSERT()ed on the
original value returned by the InterlockedCompareExchangePointer().
That would be overkill.  However, it *is* buggy because it is missing a
(processor) barrier on non-x86 architectures and has a useless barrier
after the store.  Instead it should be just this:

    ReleaseMemoryFence ();
    *SpinLock = SPIN_LOCK_RELEASED;
No concern that the store might not be atomic?

Thanks
Laszlo


Re: MemoryFence()

Paolo Bonzini <pbonzini@...>
 

On 05/02/21 17:56, Laszlo Ersek wrote:
On 02/05/21 17:04, Ni, Ray wrote:
798 **/
799 VOID
800 LibWaitForSemaphore (
801 IN OUT volatile UINT32 *Sem
802 )
I think LibWaitForSemaphore() is safe. (Which is not identical to saying
"it's in the best style".)
Agreed. I would prefer an explicit compiler fence before the "Value = *Sem" assignment and no volatile anywhere, but it is okay.

193 LockValue = *SpinLock;
194 ASSERT (LockValue == SPIN_LOCK_ACQUIRED || LockValue == SPIN_LOCK_RELEASED);
195
196 _ReadWriteBarrier ();
197 Result = InterlockedCompareExchangePointer (
198 (VOID**)SpinLock,
199 (VOID*)SPIN_LOCK_RELEASED,
200 (VOID*)SPIN_LOCK_ACQUIRED
201 );
202
203 _ReadWriteBarrier ();
204 return (BOOLEAN) (Result == (VOID*) SPIN_LOCK_RELEASED);
205 }
Having the LockValue / ASSERT() stuff seems a bit pointless to me.
The _ReadWriteBarrier (aka compiler fence) is pointless too; it is implicit in the InterlockedCompareExchangePointer.

781 VOID
782 LibReleaseSemaphore (
783 IN OUT volatile UINT32 *Sem
784 )
785 {
786 InterlockedIncrement (Sem);
787 }
This seems OK to me.
Yep.

221 SPIN_LOCK *
222 EFIAPI
223 ReleaseSpinLock (
224 IN OUT SPIN_LOCK *SpinLock
225 )
226 {
227 SPIN_LOCK LockValue;
228
229 ASSERT (SpinLock != NULL);
230
231 LockValue = *SpinLock;
232 ASSERT (LockValue == SPIN_LOCK_ACQUIRED || LockValue == SPIN_LOCK_RELEASED);
233
234 _ReadWriteBarrier ();
235 *SpinLock = SPIN_LOCK_RELEASED;
236 _ReadWriteBarrier ();
237
238 return SpinLock;
239 }
Fishy. I would have implemented it with another
InterlockedCompareExchangePointer(), and maybe ASSERT()ed on the
original value returned by the InterlockedCompareExchangePointer().
That would be overkill. However, it *is* buggy because it is missing a (processor) barrier on non-x86 architectures and has a useless barrier after the store. Instead it should be just this:

ReleaseMemoryFence ();
*SpinLock = SPIN_LOCK_RELEASED;

Paolo


Re: MemoryFence()

Laszlo Ersek
 

On 02/05/21 17:04, Ni, Ray wrote:
Maybe that explains why the MemoryFence was implemented so
confidently.
I trust Mike:)

In UefiCpuPkg/Library/RegisterCpuFeaturesLib/CpuFeaturesInitialize.c,
LibWaitForSemaphore and LibReleaseSemaphore are coded without the
read/write barrier.
Let's focus on the "down" operations first.

At commit c6be6dab9c4b:

789 /**
790 Decrement the semaphore by 1 if it is not zero.
791
792 Performs an atomic decrement operation for semaphore.
793 The compare exchange operation must be performed using
794 MP safe mechanisms.
795
796 @param Sem IN: 32-bit unsigned integer
797
798 **/
799 VOID
800 LibWaitForSemaphore (
801 IN OUT volatile UINT32 *Sem
802 )
803 {
804 UINT32 Value;
805
806 do {
807 Value = *Sem;
808 } while (Value == 0 ||
809 InterlockedCompareExchange32 (
810 Sem,
811 Value,
812 Value - 1
813 ) != Value);
814 }

But the AcquireSpinLockOrFail and ReleaseSpinLock in
MdePkg/Library/BaseSynchronizationLib/SynchronizationMsc.c contain
read write barrier.
Is the former implementation wrong because without the barrier the
cmpxchg can happen earlier or later chosen by the compiler?

Does volatile keyword help here?
I think LibWaitForSemaphore() is safe. (Which is not identical to saying
"it's in the best style".)

The volatile ensures there is a load instruction, and that the compiler
generates the load instruction in the right place.

The volatile does not ensure that the data fetched is neither corrupt
nor stale -- the load could fetch a partially updated value or a stale
value, into "Value". But in this particular instance that should be no
problem:

- if we see a corrupt or stale zero value, we'll just continue spinning;
the value will stabilize eventually

- if we see a corrupt or stale *nonzero* value, then when we attempt to
decrement it with a safe InterlockedCompareExchange32(), we'll just
fail.

So the volatile fetch can be considered an "initial guess" that works
most of the time.

(BTW, PiSmmCpuDxeSmm has a more recent variant that is less
power-hungry; see commit 9001b750df64 ("UefiCpuPkg/PiSmmCpuDxeSmm: pause
in WaitForSemaphore() before re-fetch", 2020-07-31).)
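For reference, that pause-before-re-fetch pattern looks roughly like this in portable C11. The names are illustrative, and CpuPause() is reduced to the x86 pause hint, so treat it as a sketch of the pattern from commit 9001b750df64, not as the edk2 code itself.

```c
#include <stdatomic.h>

/* The x86 "pause" hint; compiles to nothing on other architectures here. */
static inline void
CpuPause (void)
{
#if defined(__x86_64__) || defined(__i386__)
  __builtin_ia32_pause ();
#endif
}

/* "Down" operation: spin until the semaphore is nonzero, then decrement it
   with a CAS; pause before every re-fetch to be easier on the sibling
   hyperthread and the memory bus. */
void
LibWaitForSemaphore (atomic_uint *Sem)
{
  unsigned int Value;

  for (;;) {
    Value = atomic_load_explicit (Sem, memory_order_relaxed);
    if ((Value != 0) &&
        atomic_compare_exchange_weak_explicit (
          Sem, &Value, Value - 1,
          memory_order_acquire, memory_order_relaxed)) {
      break;
    }
    CpuPause ();
  }
}
```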

The implementation seen in
"MdePkg/Library/BaseSynchronizationLib/SynchronizationMsc.c" is:

165 /**
166 Attempts to place a spin lock in the acquired state.
167
168 This function checks the state of the spin lock specified by SpinLock. If
169 SpinLock is in the released state, then this function places SpinLock in the
170 acquired state and returns TRUE. Otherwise, FALSE is returned. All state
171 transitions of SpinLock must be performed using MP safe mechanisms.
172
173 If SpinLock is NULL, then ASSERT().
174 If SpinLock was not initialized with InitializeSpinLock(), then ASSERT().
175
176 @param SpinLock A pointer to the spin lock to place in the acquired state.
177
178 @retval TRUE SpinLock was placed in the acquired state.
179 @retval FALSE SpinLock could not be acquired.
180
181 **/
182 BOOLEAN
183 EFIAPI
184 AcquireSpinLockOrFail (
185 IN OUT SPIN_LOCK *SpinLock
186 )
187 {
188 SPIN_LOCK LockValue;
189 VOID *Result;
190
191 ASSERT (SpinLock != NULL);
192
193 LockValue = *SpinLock;
194 ASSERT (LockValue == SPIN_LOCK_ACQUIRED || LockValue == SPIN_LOCK_RELEASED);
195
196 _ReadWriteBarrier ();
197 Result = InterlockedCompareExchangePointer (
198 (VOID**)SpinLock,
199 (VOID*)SPIN_LOCK_RELEASED,
200 (VOID*)SPIN_LOCK_ACQUIRED
201 );
202
203 _ReadWriteBarrier ();
204 return (BOOLEAN) (Result == (VOID*) SPIN_LOCK_RELEASED);
205 }

I think the explicit barriers are better style (more helpful for the
programmer, and in general, possibly better performance).

On the other hand (independently of what you asked), the
AcquireSpinLockOrFail() logic seems a bit fishy to me.

First, the ASSERT() deserves a comment, at the bare minimum. ("Even if
the data is stale or partially updated, we can't conjure up a completely
bogus value", or some such.)

Second, the rest of the function doesn't even depend on LockValue.
Having the LockValue / ASSERT() stuff seems a bit pointless to me.

--*--

Now, regarding the "up" operations:

[UefiCpuPkg/Library/RegisterCpuFeaturesLib/CpuFeaturesInitialize.c]

775 /**
776 Increment semaphore by 1.
777
778 @param Sem IN: 32-bit unsigned integer
779
780 **/
781 VOID
782 LibReleaseSemaphore (
783 IN OUT volatile UINT32 *Sem
784 )
785 {
786 InterlockedIncrement (Sem);
787 }

This seems OK to me.

(In an ideal world, InterlockedIncrement() should internally enforce a
compiler barrier too, so the volatile should not be needed.)
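A portable sketch of such an "up" operation (illustrative name, not the edk2 API): a fetch-add with release ordering both increments atomically and publishes the writer's prior stores, and it is also a compiler barrier, which is exactly the property wished for above.

```c
#include <stdatomic.h>

/* Increment the semaphore by 1 and return the new value. The release
   ordering makes all earlier writes visible before the increment lands. */
unsigned int
LibReleaseSemaphore (atomic_uint *Sem)
{
  return atomic_fetch_add_explicit (Sem, 1, memory_order_release) + 1;
}
```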

[MdePkg/Library/BaseSynchronizationLib/SynchronizationMsc.c]

207 /**
208 Releases a spin lock.
209
210 This function places the spin lock specified by SpinLock in the release state
211 and returns SpinLock.
212
213 If SpinLock is NULL, then ASSERT().
214 If SpinLock was not initialized with InitializeSpinLock(), then ASSERT().
215
216 @param SpinLock A pointer to the spin lock to release.
217
218 @return SpinLock released the lock.
219
220 **/
221 SPIN_LOCK *
222 EFIAPI
223 ReleaseSpinLock (
224 IN OUT SPIN_LOCK *SpinLock
225 )
226 {
227 SPIN_LOCK LockValue;
228
229 ASSERT (SpinLock != NULL);
230
231 LockValue = *SpinLock;
232 ASSERT (LockValue == SPIN_LOCK_ACQUIRED || LockValue == SPIN_LOCK_RELEASED);
233
234 _ReadWriteBarrier ();
235 *SpinLock = SPIN_LOCK_RELEASED;
236 _ReadWriteBarrier ();
237
238 return SpinLock;
239 }

Fishy. I would have implemented it with another
InterlockedCompareExchangePointer(), and maybe ASSERT()ed on the
original value returned by the InterlockedCompareExchangePointer().

Laszlo


Re: MemoryFence()

Ni, Ray
 

Maybe that explains why the MemoryFence was implemented so confidently.
I trust Mike:)

In UefiCpuPkg/Library/RegisterCpuFeaturesLib/CpuFeaturesInitialize.c, the LibWaitForSemaphore and LibReleaseSemaphore are coded without the read-write barrier.
But the AcquireSpinLockOrFail and ReleaseSpinLock in MdePkg/Library/BaseSynchronizationLib/SynchronizationMsc.c contain a read-write barrier.
Is the former implementation wrong because, without the barrier, the cmpxchg can happen earlier or later, as chosen by the compiler?

Does volatile keyword help here?

thanks,
ray
________________________________
From: Paolo Bonzini <pbonzini@...>
Sent: Friday, February 5, 2021 11:42:36 PM
To: Ard Biesheuvel <ardb@...>
Cc: Laszlo Ersek <lersek@...>; Ni, Ray <ray.ni@...>; Andrew Fish <afish@...>; edk2 RFC list <rfc@edk2.groups.io>; Kinney, Michael D <michael.d.kinney@...>; Leif Lindholm (Nuvia address) <leif@...>; Dong, Eric <eric.dong@...>; Liming Gao (Byosoft address) <gaoliming@...>; Ankur Arora <ankur.a.arora@...>
Subject: Re: MemoryFence()

On 05/02/21 16:39, Ard Biesheuvel wrote:
On Fri, 5 Feb 2021 at 16:38, Paolo Bonzini <pbonzini@...> wrote:

On 05/02/21 16:34, Laszlo Ersek wrote:
And your code is wrong because the *original*, namely the MSFT
implementation of MemoryFence() in edk2, is bugged:

* MdePkg/Library/BaseLib/X86MemoryFence.c:

/**
Used to serialize load and store operations.

All loads and stores that proceed calls to this function are guaranteed to be
globally visible when this function returns.

**/
VOID
EFIAPI
MemoryFence (
VOID
)
{
return;
}
I think it's okay-ish (it works as long as you don't do link-time
optimization, but it's less efficient than a compiler intrinsic) because
X86MemoryFence.c is a separate file.
We rely heavily on LTO for both Visual Studio and GNU/Clang toolchains.
Oops... Well, it *was* okay at the time MemoryFence() was first
implemented. :(

Paolo


Re: MemoryFence()

Paolo Bonzini <pbonzini@...>
 

On 05/02/21 16:39, Ard Biesheuvel wrote:
On Fri, 5 Feb 2021 at 16:38, Paolo Bonzini <pbonzini@...> wrote:

On 05/02/21 16:34, Laszlo Ersek wrote:
And your code is wrong because the *original*, namely the MSFT
implementation of MemoryFence() in edk2, is bugged:

* MdePkg/Library/BaseLib/X86MemoryFence.c:

/**
Used to serialize load and store operations.

All loads and stores that proceed calls to this function are guaranteed to be
globally visible when this function returns.

**/
VOID
EFIAPI
MemoryFence (
VOID
)
{
return;
}
I think it's okay-ish (it works as long as you don't do link-time
optimization, but it's less efficient than a compiler intrinsic) because
X86MemoryFence.c is a separate file.
We rely heavily on LTO for both Visual Studio and GNU/Clang toolchains.
Oops... Well, it *was* okay at the time MemoryFence() was first implemented. :(

Paolo


Re: MemoryFence()

Ard Biesheuvel <ardb@...>
 

On Fri, 5 Feb 2021 at 16:38, Paolo Bonzini <pbonzini@...> wrote:

On 05/02/21 16:34, Laszlo Ersek wrote:
And your code is wrong because the *original*, namely the MSFT
implementation of MemoryFence() in edk2, is bugged:

* MdePkg/Library/BaseLib/X86MemoryFence.c:

/**
Used to serialize load and store operations.

All loads and stores that proceed calls to this function are guaranteed to be
globally visible when this function returns.

**/
VOID
EFIAPI
MemoryFence (
VOID
)
{
return;
}
I think it's okay-ish (it works as long as you don't do link-time
optimization, but it's less efficient than a compiler intrinsic) because
X86MemoryFence.c is a separate file.
We rely heavily on LTO for both Visual Studio and GNU/Clang toolchains.


Re: MemoryFence()

Paolo Bonzini <pbonzini@...>
 

On 05/02/21 16:34, Laszlo Ersek wrote:
And your code is wrong because the *original*, namely the MSFT
implementation of MemoryFence() in edk2, is bugged:
* MdePkg/Library/BaseLib/X86MemoryFence.c:

/**
Used to serialize load and store operations.

All loads and stores that proceed calls to this function are guaranteed to be
globally visible when this function returns.

**/
VOID
EFIAPI
MemoryFence (
VOID
)
{
return;
}
I think it's okay-ish (it works as long as you don't do link-time optimization, but it's less efficient than a compiler intrinsic) because X86MemoryFence.c is a separate file.

Paolo


Re: MemoryFence()

Laszlo Ersek
 

On 02/05/21 14:15, Ni, Ray wrote:
This is very interesting. I am joining the discussion to learn.
I did an experiment.
----a.c----
// Copy EDKII MemoryFence() and as Laszlo said it's just a compiler fence.
void CompilerFence () {
return;
}

void main () {
// I copied Paolo's code in below.
int *Address = &main;
int Value;

do {
// Force re-reading *Address on every iteration
CompilerFence ();
Value = *Address;
} while (Value == 0xFF);
// As you explained above.
__asm mfence;
}
----END of a.c----

When I use "Cl.exe /O2 /FAcs a.c" to compile it.
I got a.cod as below
-----a.cod----
; Listing generated by Microsoft (R) Optimizing Compiler Version 19.27.29112.0

TITLE E:\Work\edk2\a.c
.686P
.XMM
include listing.inc
.model flat

INCLUDELIB LIBCMT
INCLUDELIB OLDNAMES

PUBLIC _CompilerFence
PUBLIC _main
; Function compile flags: /Ogtpy
; File E:\Work\edk2\a.c
; COMDAT _main
_TEXT SEGMENT
_main PROC ; COMDAT

; 6 : int *Address = &main;

00000 b8 00 00 00 00 mov eax, OFFSET _main
00005 8b 00 mov eax, DWORD PTR [eax]
$LL4@main:

; 7 : int Value;
; 8 :
; 9 : do {
; 10 : // Force re-reading *Address on every iteration
; 11 : CompilerFence ();
; 12 : Value = *Address;
; 13 : } while (Value == 0xFF);

00007 3d ff 00 00 00 cmp eax, 255 ; 000000ffH
0000c 74 f9 je SHORT $LL4@main

; 14 : // As you explained above.
; 15 : __asm mfence;

0000e 0f ae f0 mfence
00011 c3 ret 0
_main ENDP
_TEXT ENDS
; Function compile flags: /Ogtpy
; File E:\Work\edk2\a.c
; COMDAT _CompilerFence
_TEXT SEGMENT
_CompilerFence PROC ; COMDAT

; 2 : return;
; 3 : }

00000 c2 00 00 ret 0
_CompilerFence ENDP
_TEXT ENDS
END
-----END of a.cod-----

Check the assembly:
00007 3d ff 00 00 00 cmp eax, 255 ; 000000ffH

The CompilerFence() call doesn't force the compiler to generate code
that always reads the memory pointed to by Address.
That's because your CompilerFence() function is not correct. You did:

void CompilerFence () {
return;
}
which is wrong. It enforces a C-language sequence point, but it does not
implement a compiler barrier.
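A working compiler barrier, in contrast to the empty function, can be sketched like this. The GCC/clang spelling is an empty asm statement with a "memory" clobber (it emits no instructions but forbids caching memory values across it); on MSVC the _ReadWriteBarrier() intrinsic plays the same role. SpinUntilNotFF() is an illustrative name, not edk2 code.

```c
#if defined(_MSC_VER)
  #include <intrin.h>
  #define COMPILER_FENCE()  _ReadWriteBarrier ()
#else
  #define COMPILER_FENCE()  __asm__ __volatile__ ("" ::: "memory")
#endif

/* With the fence inside the loop, the compiler must reload *Address on
   every iteration, even though Address is not volatile; without it, the
   load can be hoisted out of the loop, exactly as in Ray's a.cod listing. */
int
SpinUntilNotFF (int *Address)
{
  int Value;

  do {
    COMPILER_FENCE ();
    Value = *Address;
  } while (Value == 0xFF);

  return Value;
}
```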

And your code is wrong because the *original*, namely the MSFT
implementation of MemoryFence() in edk2, is bugged:

* MdePkg/Library/BaseLib/X86MemoryFence.c:

/**
Used to serialize load and store operations.

All loads and stores that proceed calls to this function are guaranteed to be
globally visible when this function returns.

**/
VOID
EFIAPI
MemoryFence (
VOID
)
{
return;
}
This code comes from historical commit e1f414b6a7d8 ("Import some basic
libraries instances for Mde Packages.", 2007-06-22).

* MdePkg/Library/BaseLib/BaseLib.inf:

[Sources.Ia32]
X86MemoryFence.c | MSFT

[Sources.X64]
X86MemoryFence.c | MSFT
Back to your email:

If I replace CompilerFence() with _ReadBarrier(), the compiler
generates assembly that always reads the memory pointed to by Address.

It seems the EDKII MemoryFence() cannot even be used as a read
barrier.
Does it mean that the MSVC version of the EDKII MemoryFence() is
implemented wrongly?
That's precisely the case.

This actually explains why MpInitLib and PiSmmCpuDxeSmm use "volatile",
rather than MemoryFence(). (This is one of those info bits that I've
been missing.) The reason is that, when built with Visual Studio,
MemoryFence() does absolutely nothing. So if MpInitLib and
PiSmmCpuDxeSmm used MemoryFence() rather than "volatile", things would
break immediately.

Thank you Ray for highlighting this, I couldn't have imagined.

The current status means that, considering all toolchains and all
architectures that edk2 supports, together, it's been *impossible* to
use MemoryFence() correctly.

And so my defensive thinking "whenever in doubt, use 'volatile' PLUS
MemoryFence()" apparently paid off, because when built with Visual
Studio, OVMF is left with 'volatile' *only*; in effect.

It also means that in Ankur's upcoming v7 series, relying *just* on
MemoryFence() will not suffice. We need 'volatile' (given the current
state of primitives); otherwise we'll have no barriers at all.

... I don't even know where to start changing the code. :(

Laszlo


Re: MemoryFence()

Laszlo Ersek
 

On 02/05/21 09:25, Ni, Ray wrote:
So if I understand correctly:
1) volatile basically tells the compiler to always read the memory, so it impacts the C memory model.
2) CompilerFence() tells the compiler to serialize and complete everything before the barrier, and to read memory again the first time it is accessed after the barrier.
3) MemoryFence() really deals with speculative fetches and maybe coherency; basically, the CPU reordering things.
I agree with this summary.
The volatile used in MpInitLib or the PiSmmCpu driver cannot be omitted, because different CPU threads change the same memory content to synchronize with each other.
I don’t quite understand: if “volatile” is removed, what mechanism can be used to force the compiler to generate code that always reads from memory?
A compiler barrier (in case that's sufficient), or even a memory fence
-- seems more appropriate for inter-CPU synchronization -- which I think
should be implemented to automatically force a compiler barrier too.
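The fence family alluded to here can be sketched with C11 atomics (the names are illustrative; edk2 would supply its own spellings). Each atomic_thread_fence is automatically a compiler barrier as well, which is precisely the property asked for, and atomic_signal_fence gives a pure compiler barrier.

```c
#include <stdatomic.h>

/* Inter-CPU fences: each one also constrains the compiler. */
static inline void AcquireMemoryFence (void) { atomic_thread_fence (memory_order_acquire); }
static inline void ReleaseMemoryFence (void) { atomic_thread_fence (memory_order_release); }
static inline void FullMemoryFence    (void) { atomic_thread_fence (memory_order_seq_cst); }

/* Compiler-only fence: no CPU ordering, no instructions emitted. */
static inline void CompilerFence      (void) { atomic_signal_fence (memory_order_seq_cst); }
```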

Laszlo


Re: MemoryFence()

Laszlo Ersek
 

On 02/05/21 10:12, Paolo Bonzini wrote:
On 05/02/21 08:48, Laszlo Ersek wrote:
   MemoryFence ();
   do {
     Value = *Address
   } while (Value == 0xFF) ;
   MemoryFence ();
If Address points to a byte in RAM that's manipulated by
multiple CPUs, then I think:

- volatile is not needed

- the two MemoryFence() calls should be removed

- an AcquireMemoryFence() call should be inserted into the loop, after
the read access. Because, presumably, once the flag assumes the
appropriate value (it has been "released")), we'll proceed to reading
some memory resource that was protected by the flag until then.
This is correct, alternatively you could have this:

    do {
      // Force re-reading *Address on every iteration
      CompilerFence ();
      Value = *Address;
    } while (Value == 0xFF);
    // As you explained above.
    AcquireMemoryFence ();
Ah, thanks; also good example where a CompilerFence() would be justified.

Laszlo


Re: MemoryFence()

Paolo Bonzini <pbonzini@...>
 

On 05/02/21 14:15, Ni, Ray wrote:
I did an experiment.
----a.c----
// Copy EDKII MemoryFence() and as Laszlo said it's just a compiler fence.
void CompilerFence () {
return;
This should be

#define CompilerFence _ReadWriteBarrier

Paolo

}
void main () {
// I copied Paolo's code in below.
int *Address = &main;
int Value;
do {
// Force re-reading *Address on every iteration
CompilerFence ();
Value = *Address;
} while (Value == 0xFF);
// As you explained above.
__asm mfence;
}
----END of a.c----