Re: MemoryFence()
Andrew Fish <afish@...>
On Feb 8, 2021, at 9:40 AM, Laszlo Ersek <lersek@...> wrote:“inline assembly is not supported on the ARM and x64 processors. “ [1]. Kind of looks like MSVC replaced inline assembly with intrinsics [2]. Looks like _ReadBarrier, _ReadWriteBarrier [3] are cross arch. But as you point out when you look them up [4]: The _ReadBarrier, _WriteBarrier, and _ReadWriteBarrier compiler intrinsics and the MemoryBarrier macro are all deprecated and should not be used. For inter-thread communication, use mechanisms such as atomic_thread_fence <https://docs.microsoft.com/en-us/cpp/standard-library/atomic-functions?view=msvc-160#atomic_thread_fence> and std::atomic<T> <https://docs.microsoft.com/en-us/cpp/standard-library/atomic?view=msvc-160> that are defined in the C++ Standard Library <https://docs.microsoft.com/en-us/cpp/standard-library/cpp-standard-library-reference?view=msvc-160>. For hardware access, use the /volatile:iso <https://docs.microsoft.com/en-us/cpp/build/reference/volatile-volatile-keyword-interpretation?view=msvc-160> compiler option together with the volatile <https://docs.microsoft.com/en-us/cpp/cpp/volatile-cpp?view=msvc-160> keyword. The scary statement there is /volatile:iso compiler option + volatile keyword for hardware access. [1] https://docs.microsoft.com/en-us/cpp/assembler/inline/inline-assembler?view=msvc-160 [2] https://docs.microsoft.com/en-us/cpp/intrinsics/x64-amd64-intrinsics-list?view=msvc-160 [3] https://docs.microsoft.com/en-us/cpp/intrinsics/intrinsics-available-on-all-architectures?view=msvc-160 [4] https://docs.microsoft.com/en-us/cpp/intrinsics/readbarrier?view=msvc-160 I think we have a good mapping for GCC/clang on x86, but I’m still not 100% clear on what to do MSVC++.I've found this article very relevant:It is very important to be *aware* of the acquire/release semantics,I agree as long as the primitives are self-documenting. A single The VC++ docs seem to point you toward: 1) volatile + /volatile:iso compiler flag for MMIO. 3) For synchronization: inline void atomic_thread_fence(memory_order Order) noexcept; It looks like memory_order maps into the primitives you are proposing? memory_order_relaxed The fence has no effect. memory_order_consume The fence is an acquire fence. memory_order_acquire The fence is an acquire fence. memory_order_release The fence is a release fence. memory_order_acq_rel The fence is both an acquire fence and a release fence. memory_order_seq_cst The fence is both an acquire fence and a release fence, and is sequentially consistent. But it kind of seems like we may be falling down a C++ rabbit hole….. As I pointed out my experience with clang is volatile is not enough and we needed the compiler fence. Thanks, Andrew Fish Thanks
|
|
Re: MemoryFence()
Paolo Bonzini <pbonzini@...>
On 08/02/21 18:40, Laszlo Ersek wrote:
(3) The article recommends _ReadWriteBarrier, _ReadBarrier andIn theory, atomic_thread_fence (and in this case, more specifically, atomic_signal_fence) is available also on C11. But if it is possible to assume that _RWB is present on Microsoft toolchains (even though it's deprecated) I would go for it as it seems to be the best match for CompilerFence. In fact, even Microsoft's own STL does #define _Compiler_barrier() \ _STL_DISABLE_DEPRECATED_WARNING _ReadWriteBarrier() \ _STL_RESTORE_DEPRECATED_WARNING extern "C" inline void atomic_signal_fence(const memory_order _Order) noexcept { if (_Order != memory_order_relaxed) { _Compiler_barrier(); } } (https://github.com/microsoft/STL/blob/master/stl/inc/atomic). For fences that require an assembly instruction, it is possible to use intrinsics too: _mm_mfence() for x86, __dmb(_ARM_BARRIER_SY) for ARM. Paolo
|
|
Re: MemoryFence()
Laszlo Ersek
On 02/04/21 21:04, Paolo Bonzini wrote:
Il gio 4 feb 2021, 20:46 Ard Biesheuvel <ardb@...> ha scritto:[*]Acquire fences are barriers between earlier loads and subsequent loads(1) We should introduce finer-grained fence primitives:Acquire semantics typically order writes before reads, not /between/ Doesn't look too specific: https://docs.microsoft.com/en-us/cpp/assembler/inline/optimizing-inline-assembly?view=msvc-160 I've found this article very relevant:It is very important to be *aware* of the acquire/release semantics,I agree as long as the primitives are self-documenting. A single https://docs.microsoft.com/en-us/windows/win32/dxtecharts/lockless-programming (1) It provides an example for a store-load (Dekker's algorithm) where any combination of read-acquire + write-release is insufficient. Thus it would need an MFENCE (hence we need all four APIs in edk2). (... If we jump back to the part I marked with [*], then we can see Paolo's description of read-acquire and store-release covers load-load, load-store, load-store (again), and store-store. What's not covered is store-load, which Paolo said elsewhere in this thread is exactly what x86 does reorder. So the MemoryFence() API's use would be "exceptional", in future code, but it should exist for supporting patterns like Dekker's algorithm.) (2) I think the "Recommendations" section: https://docs.microsoft.com/en-us/windows/win32/dxtecharts/lockless-programming#recommendations highlights the very problem we have. It recommends When doing lockless programming, be sure to use volatile flag*after* explaining why "volatile" is generally insufficient: https://docs.microsoft.com/en-us/windows/win32/dxtecharts/lockless-programming#volatile-variables-and-reordering and *after* describing the compiler barriers. So this recommendation should recommend compiler barriers rather than volatile. :/ (3) The article recommends _ReadWriteBarrier, _ReadBarrier and _WriteBarrier, for compiler fences. I think _ReadWriteBarrier should suffice for edk2's purposes. However, the following reference deprecates those intrinsics: https://docs.microsoft.com/en-us/cpp/intrinsics/readbarrier?view=msvc-160 while offering *only* C++ language replacements. Could we implement CompilerFence() for all edk2 architectures as *non-inline* assembly? The function would consist of a return instruction only. For x86, we could use a NASM source; for ARM, separate MS and GNU assembler sources would be needed. I totally want to get rid of "volatile" at least in future code, but that's only possible if one of the following options can be satisfied: - we find a supported replacement method for _ReadWriteBarrier when using the MSFT toolchain family (such as the *non-inline*, empty assembly function), - or we accept that CompilerFence() is not possible to implement portably, and we only offer the heavier-weight acquire / release / full fences, which *include* a compiler fence too. In the latter case, the body of a busy-waiting loop would have to use the heavier read-acquire API. --*-- So the structure of the solution we're looking for is: - exactly *one* of: - volatile - compiler fence - acquire fence used as a heavy substitute for compiler fence, - and *all* of - acquire fence (load-load, load-store) - release fence (load-store, store-store) - full fence (load-load, load-store, store-store, store-load) The implementation of each fence would have to be *at least* as safe as required; it could be stronger. I feel that we have to reach an agreement on the "exactly one of" part; subsequent to that, maybe I can try an RFC patch for <BaseLib.h> (just the interface contracts, at first). Thanks Laszlo
|
|
Re: MemoryFence()
Ard Biesheuvel <ardb@...>
On Mon, 8 Feb 2021 at 17:11, Laszlo Ersek <lersek@...> wrote:
Generally, yes. But I already mentioned non-cache coherent DMA devices, and we also have some ARM platforms that use non-shareable mappings for SRAM, which is used as temporary PEI memory in some cases. So the safe option is really to just use system-wide barriers - if anyone has a use case where this is a performance bottleneck, we can always revisit this later.
|
|
Re: MemoryFence()
Laszlo Ersek
On 02/06/21 09:37, Ard Biesheuvel wrote:
On ARM, 'inner shareable' is [generally] the subset of nodes andThis seems consistent with the fact that Paolo recommended variants of "dmb ish" for all the memory (not compiler) fences -- I guess we don't care about actual RAM content as long as all CPUs and cache-coherent DMA masters see the same data. Thanks Laszlo
|
|
Re: MemoryFence()
Ard Biesheuvel <ardb@...>
On Fri, 5 Feb 2021 at 20:35, Laszlo Ersek <lersek@...> wrote:
<snip> I don't think the message passing paradigm is 100% accurate but it is very close. You should realize that a modern system is a network of CPUs, caches, DRAM controllers and other bus masters, which is being traversed by packets that represent the memory transactions (request/response pairs for reads and non-posted writes). There is no natural 'grand scheme' ordering of things, and when a certain memory ordering is required, additional work is needed, which translates into additional packets, packet types and silicon gates. (Disclaimer: I'm not a HW engineer) A barrier is a packet that is broadcast on this network to instruct all nodes that all transactions of a certain type need to be completed before servicing or initiating new ones. On ARM, 'inner shareable' is [generally] the subset of nodes and edges that make up the CPUs, caches and cache-coherent DMA masters (but not main memory) This is why it is more difficult for x86 to scale to 100s of cores: there are simply more of these broadcasts being sent for ordinary memory accesses, which gets out of hand quite quickly when scaling out.
|
|
Re: MemoryFence()
Ni, Ray
I am sorry for redirecting to the C++11 share memory model. It has nothing to do with EDK2 here because EDK2 is C based. The good news is EFI does not have threads and the PI MpServices protocol is indirectly messaging based as you wait for aIt's not that TRUE today. If you look into UefiCpuPkg, MpInitLib, RegisterCpuFeaturesLib, PiSmmCpu, these modules heavily depend on the sync primitives. I don't like the today's situation that: 1. no semaphore standard library 2. no mutex standard library (there is a BaseSynchronization library that provides spinlock, cmpxchg, lock inc/dec primitives. I prefer a mutex lib over this BaseSync lib.) 2. directly using volatile over SPIN_BLOCK 3. using volatile which can be avoided as concluded by the discussion. If with the above wheels invented, even with MP service, one can write a complex long-run daemon procedure and sync with BSP properly.
|
|
Re: MemoryFence()
Andrew Fish <afish@...>
On Feb 5, 2021, at 11:35 AM, Laszlo Ersek <lersek@...> wrote:Looks like we are heading for a crash too. So good thing we are thinking about it. The good news is EFI does not have threads and the PI MpServices protocol is indirectly messaging based as you wait for a function to return or an event. Like C it can be abused since the memory is shared. Ironically the complexity of writing thread safe code in C and the fact that back in the day a big pool of the x86 firmware writers were hardware folks who wrote in x86 assembler is one of the reasons EFI ended up with. No threads an a cooperative event model….. Thanks, Andrew Fish Laszlo
|
|
Re: MemoryFence()
Laszlo Ersek
rant warning
On 02/05/21 19:29, Ni, Ray wrote: I read a bit about c memory model.Thanks. I think what happened here is a catastrophic impedance mismatch. The underlying, physical architecture is a message-passing one. Yet the mental abstraction that is built atop, for programmers to deal with, is "shared memory". Which suggests a global shared state. This is *suicidal*. There has forever been a debate of tastes; there's a camp that prefers the message passing API, and another camp that prefers the shared memory / locking API. There are pros and cons; usually it is pointed out that message passing is safer but less performant, and also that the message passing APIs are not difficult to implement on top of shared memory / locking. What people (IME) don't spend many words on however is an attempt to implement shared memory / locking on top of message passing. "Why would you want to do that, in an *application*?", is how the thinking would go, I guess. But that's *exactly* what's being sold to us with the shared memory / locking API *itself*. The hardware underneath is message-passing! All this complexity about "non-commutativity" in the release/acquire pattern -- see: "There has been no synchronization between threads 1 and 3" -- *only* makes sense if the programmer thinks about *messages in transit* between the threads (well, the processing units that execute them). Whereas, the concepts that the programmer deals with *in the source code* are called "x" and "y" -- the same names in each thread, the same addresses, the same storage. So you've got a message-passing architecture at the bottom, then build a shared memory abstraction on top, which leaks like a sieve, and then you ask the programmer to mentally deconstruct that abstraction, while he or she still has to stare at code that uses the shared memory terminology. It's *insane*. It's self-defeating. If an alien came by and you claimed this was all invented to deceive programmers on purpose, they'd have no problem believing it. Sequentially Consistent is the *only* model that halfway makes sense (whatever the execution cost), as long as we have a shared-anything programming model. If we replaced "x" and "y" in the "Overall Summary" example, with the following objects: - thread1_thread2_channel_a - thread1_thread2_channel_b - thread2_thread3_channel_c - thread2_thread3_channel_d - thread3_thread1_channel_e - thread3_thread1_channel_f in other words, if we explicitly instantiated both "x" and "y" in every particular inter-thread relationship, and we considered "store" as "send", and "load" as "receive", and each such channel were bi-directional, then folks like me might actually get a fleeting chance at scratching the surface. The shared memory model is a *wreck*, given how the hardware operates underneath. Laszlo
|
|
Re: MemoryFence()
Paolo Bonzini <pbonzini@...>
Il ven 5 feb 2021, 19:47 Laszlo Ersek <lersek@...> ha scritto:
On 02/05/21 19:32, Paolo Bonzini wrote:Got it, it's a very good question. All edk2 porting targets shouldUnaligned pointers are already undefined behavior so you can ignore howMy question was unclearly asked, sorry. Let's say we have a UINT32 at an naturally align all types not larger than pointers. I can only think of old m68k that had 2-byte aligned 32-bit pointers. Another example; we may have a pointer to a packed structure, and we might want to poke at a UINT32 field in that structure. Not through aPacked types indeed should not be considered atomic. Paolo Thanks
|
|
Re: MemoryFence()
Andrew Fish <afish@...>
On Feb 5, 2021, at 10:47 AM, Laszlo Ersek <lersek@...> wrote:Laszlo, Clang treats unaligned pointers as undefined behavior (UB). I’m not sure if this is due to C11? What I know is clang choses to NOT optimize away UB from alignment errors, but if you run the address sanitizer you get errors. For fun one day I turned on the clang address sanitizer with our edk2 firmware and had it emit UD2 on faults so it did not require a runtime. It turned out our debugger stub had lots of alignment issues :( so I kind of gave up at that point :). Thanks, Andrew Fish Another example; we may have a pointer to a packed structure, and we
|
|
Re: MemoryFence()
Andrew Fish <afish@...>
On Feb 5, 2021, at 10:11 AM, Ni, Ray <ray.ni@...> wrote:Ray, In C calling out to assembly is also a barrier/fence operation from the compilers point of view. Actually calling an indirect procedure call (gBS->*, Protoco->*) is also a barrier. The compiler has no idea what the assemble code is doing across the boundary so all the operations need to complete prior to calling the assembly code (indirect procedure call). I guess a binary static lib is in this list too. In reality it is anything the C compiler can’t link time optimize through. For gcc flavored compliers any __asm__ call is a compiler read/write barrier. That is why you see __asm__ __volatile__ ("":::"memory”); in the memory fence. That means for gcc/clang any synchronization primitive implemented in inline assembler are also a compiler barrier/fence wrapping that assembly operation. The VC++ inline assemble seems to be “more integrated” with the compiler, so I’m not sure what the rules are for that. Some one should really investigate that. Something tells me you will not find those answer on the Linux mailing list :). The reason the MMIO operations are wrapped in _ReadWriteBarrier ()/__asm__ __volatile__ ("":::"memory”) has to do with the order of operations. 1) The leading barrier forces all the code before the call to complete the read/writes prior to the operation. 2) The trailing barrier forces the MMIO operation to complete before unrelated read writes that come after the call that could get reordered in optimization. As I mentioned in this thread I’m wondering if VC++ has extra behaviors around volatile that are compiler dependent and that is why we have not seen a lot of issues to date? I’m really glad this topic came up. It really seems like something we need to work through …... Thanks, Andrew Fish
|
|
Re: MemoryFence()
Laszlo Ersek
On 02/05/21 19:17, Paolo Bonzini wrote:
On x86 load-load, load-store and store-store ordering is already guaranteedThis is exactly what the fence API implementations should have in their comments, preferably with Intel / AMD manual references... On ARM (either 32- or 64-bit) the processor-level guarantees are weaker,Same, just with the ARM ARM. Thanks! Laszlo
|
|
Re: MemoryFence()
Laszlo Ersek
On 02/05/21 19:32, Paolo Bonzini wrote:
Unaligned pointers are already undefined behavior so you can ignore howMy question was unclearly asked, sorry. Let's say we have a UINT32 at an address that's not a multiple of 4, but a multiple of 2. A pointer to that UINT32 is "acceptably aligned" on x86, but not "naturally aligned". Dereferencing the pointer is not undefined (my reading of C99 suggests that alignment requirements are implementation-defined), but I don't know if the atomicity guarantee holds. Another example; we may have a pointer to a packed structure, and we might want to poke at a UINT32 field in that structure. Not through a naked pointer-to-UINT32 of course, which would throw away the packed-ness, but really through the pointer-to-the-whole-packed-struct. Thanks Laszlo
|
|
Re: MemoryFence()
Paolo Bonzini <pbonzini@...>
Il ven 5 feb 2021, 19:21 Laszlo Ersek <lersek@...> ha scritto:
To expand: it is an assumption on the implementation of the C compiler; itThis is why I'd like someone else to write the primitives :)Not as long as it's a pointer or smaller.ReleaseMemoryFence ();No concern that the store might not be atomic? admittedly goes beyond the C standard but it is a reasonable assumption, compilers may *merge* loads and stores but have no reason to split aligned loads and stores. Likewise for processors (the pointer limitation is because sizes larger than the pointer's may not have an instruction that reads or writes them atomically). The standard would require you to use C11 atomic loads and stores, but as far as I understand you cannot assume that they exist on all edk2 compilers. - Is it architecture-dependent? No. - Is it alignment-dependent? Unaligned pointers are already undefined behavior so you can ignore how they are dealt with at the processor level. Paolo etc... :)
|
|
Re: MemoryFence()
Ni, Ray
Wow! Thank you Paolo for explaining everything!
toggle quoted messageShow quoted text
I read a bit about c memory model. https://gcc.gnu.org/wiki/Atomic/GCCMM/AtomicSync Do you have any documentation that can explain how the different models are implemented (using what instructions)? I would like to know if there is a better primitive api other than the fences because MSVC deprecates the barrier apis. https://docs.microsoft.com/en-us/cpp/intrinsics/readwritebarrier?view=msvc-160 thanks, ray ________________________________ 发件人: Paolo Bonzini <pbonzini@...> 发送时间: Saturday, February 6, 2021 2:17:53 AM 收件人: Ni, Ray <ray.ni@...> 抄送: edk2-rfc-groups-io <rfc@edk2.groups.io>; Laszlo Ersek <lersek@...>; Ard Biesheuvel <ardb@...>; Andrew Fish <afish@...>; Kinney, Michael D <michael.d.kinney@...>; Leif Lindholm (Nuvia address) <leif@...>; Dong, Eric <eric.dong@...>; Liming Gao (Byosoft address) <gaoliming@...>; Ankur Arora <ankur.a.arora@...> 主题: Re: [edk2-rfc] MemoryFence()
Il ven 5 feb 2021, 19:12 Ni, Ray <ray.ni@...<mailto:ray.ni@...>> ha scritto:
I saw the proposal of fences in first mail by Laszlo. Please forgive my ignorance. What is asm(“”) in x86? A nop? The how a nop can help as a processor level load store barrier? On x86 load-load, load-store and store-store ordering is already guaranteed by the processor. Therefore on x86 the AcquireMemoryFence and ReleaseMemoryFence are just like CompilerFence: they only have to block compiler-level reordering. MemoryFence is the only one that blocks store-load reordering and needs to emit an MFENCE instruction. On ARM (either 32- or 64-bit) the processor-level guarantees are weaker, and you need to emit a "dmb" instruction for acquire and release fences as well. Paolo thanks, ray ________________________________ 发件人: rfc@edk2.groups.io<mailto:rfc@edk2.groups.io> <rfc@edk2.groups.io<mailto:rfc@edk2.groups.io>> 代表 Paolo Bonzini <pbonzini@...<mailto:pbonzini@...>> 发送时间: Saturday, February 6, 2021 2:01:14 AM 收件人: Ni, Ray <ray.ni@...<mailto:ray.ni@...>>; Laszlo Ersek <lersek@...<mailto:lersek@...>>; Ard Biesheuvel <ardb@...<mailto:ardb@...>> 抄送: Andrew Fish <afish@...<mailto:afish@...>>; edk2 RFC list <rfc@edk2.groups.io<mailto:rfc@edk2.groups.io>>; Kinney, Michael D <michael.d.kinney@...<mailto:michael.d.kinney@...>>; Leif Lindholm (Nuvia address) <leif@...<mailto:leif@...>>; Dong, Eric <eric.dong@...<mailto:eric.dong@...>>; Liming Gao (Byosoft address) <gaoliming@...<mailto:gaoliming@...>>; Ankur Arora <ankur.a.arora@...<mailto:ankur.a.arora@...>> 主题: Re: [edk2-rfc] MemoryFence() On 05/02/21 18:53, Ni, Ray wrote: Without calling _ReadWriteBarrier, is it possible that compilerThe proposed ReleaseMemoryFence() should already have that effect. All the proposed fences except CompilerFence() are both compiler optimization barriers and processor barriers. InterlockedCompareExchange() is also both a compiler optimization barrier and a processor barrier CompilerFence() is just a better name for _ReadWriteBarrier(), it blocks optimizations but it has no effect at the processor level. It should only be used (instead of volatile) in busy-waiting loops that do not always go through an InterlockedCompareExchange. Paolo it *is* buggy because it is missing a234 _ReadWriteBarrier ();
|
|
Re: MemoryFence()
Laszlo Ersek
On 02/05/21 18:34, Paolo Bonzini wrote:
On 05/02/21 18:31, Laszlo Ersek wrote:This is why I'd like someone else to write the primitives :)On 02/05/21 18:21, Paolo Bonzini wrote:Not as long as it's a pointer or smaller.On 05/02/21 17:56, Laszlo Ersek wrote:No concern that the store might not be atomic?221 SPIN_LOCK *That would be overkill. However, it *is* buggy because it is missing a - Where is this documented? - Is it architecture-dependent? - Is it alignment-dependent? etc... :) (I'm not challenging your statement of course; I know I should have known, just not sure where from...) Thanks, Laszlo
|
|
Re: MemoryFence()
Paolo Bonzini <pbonzini@...>
Il ven 5 feb 2021, 19:12 Ni, Ray <ray.ni@...> ha scritto:
I saw the proposal of fences in first mail by Laszlo. Please forgive myOn x86 load-load, load-store and store-store ordering is already guaranteed by the processor. Therefore on x86 the AcquireMemoryFence and ReleaseMemoryFence are just like CompilerFence: they only have to block compiler-level reordering. MemoryFence is the only one that blocks store-load reordering and needs to emit an MFENCE instruction. On ARM (either 32- or 64-bit) the processor-level guarantees are weaker, and you need to emit a "dmb" instruction for acquire and release fences as well. Paolo
|
|
Re: MemoryFence()
Andrew Fish <afish@...>
Ray,
toggle quoted messageShow quoted text
Yes I was writing a mail when I wrote the bugzilla…. Ray, Function calls are not compiler fences. The compiler just optimized the code way as dead code. For non VC++ we have a real compiler fence/barrier for MemoryFence(). I filed a bugzilla [1] so we can investigate this some more. I hit a real bug with clang using the MMIO functions a long time ago and I think that might be why the GCC version of the MemoryFence() got the compiler fence/barrier. If I had to guess VC++ is probably more conservative about volatiles and that is why we have not seen issue with VC++, but that is depending on non standard behavior. Well this probably explains why we don’t have many MemoryFence() calls in the x86 CPU code, given they don’t do anything on VC++. I have noticed from debugging MP code that was developed using VC++, that we seem to hit more issues when compiled via clang. I wonder if the behavior difference with volatile would explain some of those? VOID EFIAPI MemoryFence ( VOID ) { return; } And: VOID EFIAPI MemoryFence ( VOID ) { // This is a little bit of overkill and it is more about the compiler that it is // actually processor synchronization. This is like the _ReadWriteBarrier // Microsoft specific intrinsic __asm__ __volatile__ ("":::"memory"); } So I think for VC++ it should be… VOID EFIAPI MemoryFence ( VOID ) { _ReadWriteBarrier(); return; } [1] https://bugzilla.tianocore.org/show_bug.cgi?id=3213 Thanks, Andrew Fish
On Feb 5, 2021, at 9:53 AM, Ni, Ray <ray.ni@...> wrote:
|
|
Re: MemoryFence()
Ni, Ray
I saw the proposal of fences in first mail by Laszlo. Please forgive my ignorance. What is asm(“”) in x86? A nop? The how a nop can help as a processor level load store barrier?
thanks, ray ________________________________ 发件人: rfc@edk2.groups.io <rfc@edk2.groups.io> 代表 Paolo Bonzini <pbonzini@...> 发送时间: Saturday, February 6, 2021 2:01:14 AM 收件人: Ni, Ray <ray.ni@...>; Laszlo Ersek <lersek@...>; Ard Biesheuvel <ardb@...> 抄送: Andrew Fish <afish@...>; edk2 RFC list <rfc@edk2.groups.io>; Kinney, Michael D <michael.d.kinney@...>; Leif Lindholm (Nuvia address) <leif@...>; Dong, Eric <eric.dong@...>; Liming Gao (Byosoft address) <gaoliming@...>; Ankur Arora <ankur.a.arora@...> 主题: Re: [edk2-rfc] MemoryFence() On 05/02/21 18:53, Ni, Ray wrote: Without calling _ReadWriteBarrier, is it possible that compilerThe proposed ReleaseMemoryFence() should already have that effect. All the proposed fences except CompilerFence() are both compiler optimization barriers and processor barriers. InterlockedCompareExchange() is also both a compiler optimization barrier and a processor barrier CompilerFence() is just a better name for _ReadWriteBarrier(), it blocks optimizations but it has no effect at the processor level. It should only be used (instead of volatile) in busy-waiting loops that do not always go through an InterlockedCompareExchange. Paolo it *is* buggy because it is missing a234 _ReadWriteBarrier ();
|
|