MemoryFence()
Laszlo Ersek
On 02/05/21 17:04, Ni, Ray wrote:
> Maybe that explains why the MemoryFence was implemented so

Let's focus on the "down" operations first. At commit c6be6dab9c4b:

  789  /**
  790    Decrement the semaphore by 1 if it is not zero.
  791
  792    Performs an atomic decrement operation for semaphore.
  793    The compare exchange operation must be performed using
  794    MP safe mechanisms.
  795
  796    @param  Sem  IN:  32-bit unsigned integer
  797
  798  **/
  799  VOID
  800  LibWaitForSemaphore (
  801    IN OUT  volatile UINT32  *Sem
  802    )
  803  {
  804    UINT32  Value;
  805
  806    do {
  807      Value = *Sem;
  808    } while (Value == 0 ||
  809             InterlockedCompareExchange32 (
  810               Sem,
  811               Value,
  812               Value - 1
  813               ) != Value);
  814  }

> But the AcquireSpinLockOrFail and ReleaseSpinLock in

I think LibWaitForSemaphore() is safe. (Which is not identical to saying
"it's in the best style".)

The volatile ensures there is a load instruction, and that the compiler
generates the load instruction in the right place. The volatile does not
ensure that the data fetched is neither corrupt nor stale -- the load
could fetch a partially updated value, or a stale value, into "Value".
But in this particular instance, that should be no problem:

- if we see a corrupt or stale zero value, we'll just continue spinning;
  the value will stabilize eventually,

- if we see a corrupt or stale *nonzero* value, then when we attempt to
  decrement it with a safe InterlockedCompareExchange32(), we'll just
  fail.

So the volatile fetch can be considered an "initial guess" that works
most of the time.

(BTW, PiSmmCpuDxeSmm has a more recent variant that is less
power-hungry; see commit 9001b750df64 ("UefiCpuPkg/PiSmmCpuDxeSmm: pause
in WaitForSemaphore() before re-fetch", 2020-07-31).)

The implementation seen in
"MdePkg/Library/BaseSynchronizationLib/SynchronizationMsc.c" is:

  165  /**
  166    Attempts to place a spin lock in the acquired state.
  167
  168    This function checks the state of the spin lock specified by SpinLock. If
  169    SpinLock is in the released state, then this function places SpinLock in the
  170    acquired state and returns TRUE. Otherwise, FALSE is returned. All state
  171    transitions of SpinLock must be performed using MP safe mechanisms.
  172
  173    If SpinLock is NULL, then ASSERT().
  174    If SpinLock was not initialized with InitializeSpinLock(), then ASSERT().
  175
  176    @param  SpinLock  A pointer to the spin lock to place in the acquired state.
  177
  178    @retval TRUE   SpinLock was placed in the acquired state.
  179    @retval FALSE  SpinLock could not be acquired.
  180
  181  **/
  182  BOOLEAN
  183  EFIAPI
  184  AcquireSpinLockOrFail (
  185    IN OUT  SPIN_LOCK  *SpinLock
  186    )
  187  {
  188    SPIN_LOCK  LockValue;
  189    VOID       *Result;
  190
  191    ASSERT (SpinLock != NULL);
  192
  193    LockValue = *SpinLock;
  194    ASSERT (LockValue == SPIN_LOCK_ACQUIRED || LockValue == SPIN_LOCK_RELEASED);
  195
  196    _ReadWriteBarrier ();
  197    Result = InterlockedCompareExchangePointer (
  198               (VOID**)SpinLock,
  199               (VOID*)SPIN_LOCK_RELEASED,
  200               (VOID*)SPIN_LOCK_ACQUIRED
  201               );
  202
  203    _ReadWriteBarrier ();
  204    return (BOOLEAN) (Result == (VOID*) SPIN_LOCK_RELEASED);
  205  }

I think the explicit barriers are better style (more helpful for the
programmer, and in general, possibly better performance).

On the other hand (independently of what you asked), the
AcquireSpinLockOrFail() logic seems a bit fishy to me.

First, the ASSERT() deserves a comment, at the bare minimum. ("Even if
the data is stale or partially updated, we can't conjure up a completely
bogus value", or some such.)

Second, the rest of the function doesn't even depend on LockValue.
Having the LockValue / ASSERT() stuff seems a bit pointless to me.

--*--

Now, regarding the "up" operations:

[UefiCpuPkg/Library/RegisterCpuFeaturesLib/CpuFeaturesInitialize.c]

  775  /**
  776    Increment semaphore by 1.
  777
  778    @param  Sem  IN:  32-bit unsigned integer
  779
  780  **/
  781  VOID
  782  LibReleaseSemaphore (
  783    IN OUT  volatile UINT32  *Sem
  784    )
  785  {
  786    InterlockedIncrement (Sem);
  787  }

This seems OK to me. (In an ideal world, InterlockedIncrement() should
internally enforce a compiler barrier too, so the volatile should not be
needed.)

[MdePkg/Library/BaseSynchronizationLib/SynchronizationMsc.c]

  207  /**
  208    Releases a spin lock.
  209
  210    This function places the spin lock specified by SpinLock in the release state
  211    and returns SpinLock.
  212
  213    If SpinLock is NULL, then ASSERT().
  214    If SpinLock was not initialized with InitializeSpinLock(), then ASSERT().
  215
  216    @param  SpinLock  A pointer to the spin lock to release.
  217
  218    @return SpinLock released the lock.
  219
  220  **/
  221  SPIN_LOCK *
  222  EFIAPI
  223  ReleaseSpinLock (
  224    IN OUT  SPIN_LOCK  *SpinLock
  225    )
  226  {
  227    SPIN_LOCK  LockValue;
  228
  229    ASSERT (SpinLock != NULL);
  230
  231    LockValue = *SpinLock;
  232    ASSERT (LockValue == SPIN_LOCK_ACQUIRED || LockValue == SPIN_LOCK_RELEASED);
  233
  234    _ReadWriteBarrier ();
  235    *SpinLock = SPIN_LOCK_RELEASED;
  236    _ReadWriteBarrier ();
  237
  238    return SpinLock;
  239  }

Fishy. I would have implemented it with another
InterlockedCompareExchangePointer(), and maybe ASSERT()ed on the
original value returned by the InterlockedCompareExchangePointer().

Laszlo
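The pattern of the less power-hungry variant mentioned above (commit 9001b750df64) looks roughly like this as a self-contained sketch. The names here are made up, and GCC's __sync_val_compare_and_swap builtin stands in for edk2's InterlockedCompareExchange32(); treat it as an illustration of the pattern, not the actual commit:

```c
#include <stdint.h>

/*
 * Illustration only: pause between re-fetches of the semaphore so the
 * spinning CPU burns less power and puts less pressure on the cache line.
 * "Pause" and "WaitForSemaphoreSketch" are hypothetical names;
 * __sync_val_compare_and_swap (GCC/clang) stands in for edk2's
 * InterlockedCompareExchange32().
 */
static void
Pause (void)
{
#if defined (__i386__) || defined (__x86_64__)
  __asm__ __volatile__ ("pause");
#endif
}

uint32_t
WaitForSemaphoreSketch (volatile uint32_t *Sem)
{
  uint32_t  Value;

  for ( ; ; ) {
    Value = *Sem;
    if (Value != 0 &&
        __sync_val_compare_and_swap (Sem, Value, Value - 1) == Value) {
      break;  /* we observed a nonzero value and decremented it atomically */
    }
    Pause ();  /* back off before re-fetching */
  }
  return Value;
}
```

The compare-and-swap both validates the "initial guess" fetched from *Sem and performs the decrement, exactly as in LibWaitForSemaphore() above; the only addition is the pause on the failure path.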
Paolo Bonzini <pbonzini@...>
On 05/02/21 17:56, Laszlo Ersek wrote:
> On 02/05/21 17:04, Ni, Ray wrote:

Agreed. I would prefer an explicit compiler fence before the "Value =
*Sem" assignment and no volatile anywhere, but it is okay.

>   193    LockValue = *SpinLock;

The _ReadWriteBarrier (aka compiler fence) is pointless too; it is
implicit in the InterlockedCompareExchangePointer.

>   781  VOID

Yep.

>   221  SPIN_LOCK *

That would be overkill. However, it *is* buggy because it is missing a
(processor) barrier on non-x86 architectures and has a useless barrier
after the store. Instead it should be just this:

    ReleaseMemoryFence ();
    *SpinLock = SPIN_LOCK_RELEASED;

Paolo
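As a self-contained sketch, the corrected ReleaseSpinLock() could look as follows. The SPIN_LOCK type and the two state values mirror edk2's BaseSynchronizationLib definitions, and a GCC release fence stands in for the proposed ReleaseMemoryFence() primitive; this is an illustration of the suggestion, not the actual edk2 code:

```c
#include <stdint.h>

/* Mirrors edk2's definitions: SPIN_LOCK is a UINTN-sized value with two
 * legal states. */
typedef uintptr_t SPIN_LOCK;
#define SPIN_LOCK_RELEASED  ((uintptr_t) 1)
#define SPIN_LOCK_ACQUIRED  ((uintptr_t) 2)

/* Stand-in for the proposed ReleaseMemoryFence(): a compiler barrier on
 * x86, a real processor barrier (e.g. "dmb") on weaker architectures. */
static void
ReleaseMemoryFence (void)
{
  __atomic_thread_fence (__ATOMIC_RELEASE);
}

SPIN_LOCK *
ReleaseSpinLock (SPIN_LOCK *SpinLock)
{
  /* Writes done inside the critical section must be visible before the
   * lock is observed as released; no barrier is needed after the store. */
  ReleaseMemoryFence ();
  *SpinLock = SPIN_LOCK_RELEASED;
  return SpinLock;
}
```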
Laszlo Ersek
On 02/05/21 18:21, Paolo Bonzini wrote:
> On 05/02/21 17:56, Laszlo Ersek wrote:
>>>   221  SPIN_LOCK *
>>
>> That would be overkill. However, it *is* buggy because it is missing a
>> (processor) barrier on non-x86 architectures

No concern that the store might not be atomic?

Thanks
Laszlo
Andrew Fish <afish@...>
> On Feb 4, 2021, at 11:48 PM, Laszlo Ersek <lersek@...> wrote:

OK, I see Linux recommends not using volatile [1], so I think this makes
sense to me. It is not that volatile is not useful, but it is not thread
safe, so not using volatile encourages people to use primitives to make
things thread safe. So not using volatile is more of a coding
convention.

> - the two MemoryFence() calls should be removed

They may need to turn into CompilerFence().

> - an AcquireMemoryFence() call should be inserted into the loop, after

    CompilerFence ();
    do {
      Value = *Address;
      AcquireMemoryFence ();
    } while (Value == 0xFF);

I think this is where x86 makes things confusing with its
coherency-all-the-time motto. On x86 I think they are really
CompilerFences() to force C memory model order of operations. They got
added to fix real bugs on x86.

Remember the compiler can still reorder a volatile access with
surrounding code as an optimization. So for example, if MmioWrite8() has
MemoryFence() replaced with CompilerFence(), these operations happen in
order. If there is no CompilerFence(), the order is implementation
dependent:

    UINT8 *A, *B, *C;
    *A = 1;
    MmioWrite8 (B, 2);
    *C = 3;

So that is why we want the CompilerFence() + volatile for the MMIO. For
example, you could end up writing to a common DMA buffer and then doing
an MMIO write. You might want those to happen in sequence, and you need
the CompilerFence().

At least the CPU code is CPU specific, so the rules should be more well
defined :).

[1] https://www.kernel.org/doc/html/latest/process/volatile-considered-harmful.html

Thanks,

Andrew Fish
Paolo Bonzini <pbonzini@...>
On 05/02/21 18:31, Laszlo Ersek wrote:
> On 02/05/21 18:21, Paolo Bonzini wrote:
>>>>   221  SPIN_LOCK *
>>>
>>> That would be overkill. However, it *is* buggy because it is missing a
>>
>> No concern that the store might not be atomic?

Not as long as it's a pointer or smaller.

Paolo
Andrew Fish <afish@...>
> On Feb 5, 2021, at 12:25 AM, Ni, Ray <ray.ni@...> wrote:
>
> So if I understand correctly

I agree with this summary.

> Volatile usage in MpInitLib or the PiSmmCpu driver cannot be omitted,
> because different CPU threads change the same memory content to
> synchronize with each other. I don't quite understand: if we remove
> "volatile", what mechanism can be used to force the compiler to
> generate code that always reads from memory?

Ray,

I agree the C definition of volatile is not thread safe. I was doing
some research on volatile and I realized the Linux kernel guides
recommend against using volatile [1]. I think it has to do with people
confusing volatile for being thread safe. So my guess is it is more of a
coding standard to encourage people to use synchronization primitives.
The primitives imply compiler fences, which indirectly enforces the same
behavior as volatile.

[1] https://www.kernel.org/doc/html/latest/process/volatile-considered-harmful.html

Thanks,

Andrew Fish
Andrew Fish <afish@...>
> On Feb 5, 2021, at 7:38 AM, Paolo Bonzini <pbonzini@...> wrote:

We do link-time optimization on VC++ and clang.

Thanks,

Andrew Fish
Ni, Ray
Without calling _ReadWriteBarrier, is it possible that the compiler
generates the assembly in the wrong location? I mean, the compiler may
inline LibWaitForSemaphore and call cmpxchg earlier than the desired
location. Similarly for LibReleaseSemaphore.

So my understanding is that the _ReadWriteBarrier in ReleaseSpinLock is
required. I think Andrew also has the same thoughts.

Thanks,
Ray
Paolo Bonzini <pbonzini@...>
On 05/02/21 18:53, Ni, Ray wrote:
> Without calling _ReadWriteBarrier, is it possible that the compiler
> generates the assembly in the wrong location? I mean, the compiler may
> inline LibWaitForSemaphore and call cmpxchg earlier than the desired
> location.

The proposed ReleaseMemoryFence() should already have that effect. All
the proposed fences except CompilerFence() are both compiler
optimization barriers and processor barriers.
InterlockedCompareExchange() is also both a compiler optimization
barrier and a processor barrier.

CompilerFence() is just a better name for _ReadWriteBarrier(); it blocks
optimizations, but it has no effect at the processor level. It should
only be used (instead of volatile) in busy-waiting loops that do not
always go through an InterlockedCompareExchange.

Paolo
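A minimal sketch of such a busy-wait loop, with a compiler fence per iteration in place of volatile. The names are hypothetical and the fence is written in GCC/clang syntax; on MSVC the same role would be played by _ReadWriteBarrier():

```c
#include <stdint.h>

/* Stand-in for the proposed CompilerFence(): blocks compiler reordering
 * and value caching across this point; emits no instruction. */
static void
CompilerFence (void)
{
  __asm__ __volatile__ ("" ::: "memory");
}

/* Busy-wait until *Address becomes nonzero. No volatile is needed: the
 * fence forces the compiler to re-load *Address on every iteration
 * instead of hoisting the load out of the loop. */
uint32_t
WaitUntilNonZero (uint32_t *Address)
{
  uint32_t  Value;

  do {
    CompilerFence ();   /* discard any cached copy; reload from memory */
    Value = *Address;
  } while (Value == 0);

  return Value;
}
```

In a loop that always goes through an interlocked operation, this extra fence would be redundant, since the interlocked operation is itself a compiler barrier, which is exactly the rule stated above.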
Ni, Ray
Andrew,
I just read the link you provided. It recommends using a mutex,
semaphore, or spin lock over volatile. But the funny thing here is that
we are talking about how to implement the mutex/semaphore/spin lock in
edk2. Maybe we need to rely on the built-in atomic functions, but I am
not sure whether these are GCC only:
https://gcc.gnu.org/onlinedocs/gcc/_005f_005fatomic-Builtins.html

Thanks,
Ray
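For illustration, the semaphore operations discussed in this thread could be written with those builtins roughly as follows. This is GCC/clang only (MSVC would need its Interlocked* intrinsics instead), and the function names are made up:

```c
#include <stdint.h>

/* "Down": try to decrement the semaphore if it is nonzero.
 * Returns 1 on success, 0 if the semaphore was zero. */
int
SemTryDown (uint32_t *Sem)
{
  uint32_t  Value = __atomic_load_n (Sem, __ATOMIC_RELAXED);

  while (Value != 0) {
    /* On success this is an acquire operation; on failure the builtin
     * refreshes Value with the current contents of *Sem and we retry. */
    if (__atomic_compare_exchange_n (Sem, &Value, Value - 1,
                                     0 /* strong */,
                                     __ATOMIC_ACQUIRE, __ATOMIC_RELAXED)) {
      return 1;
    }
  }
  return 0;
}

/* "Up": increment the semaphore; release ordering publishes all writes
 * made before the increment. */
void
SemUp (uint32_t *Sem)
{
  __atomic_fetch_add (Sem, 1, __ATOMIC_RELEASE);
}
```

Note that no volatile appears anywhere: the atomic builtins themselves guarantee that each access really happens, and the memory-order arguments replace the separate fences.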
Ni, Ray
I saw the proposal of fences in the first mail by Laszlo. Please forgive
my ignorance: what is asm("") on x86? A nop? Then how can a nop help as
a processor-level load/store barrier?
Thanks,
Ray
Andrew Fish <afish@...>
Ray,
Yes, I was writing a mail when I wrote the bugzilla....

Function calls are not compiler fences. The compiler just optimized the
code away as dead code. For non-VC++ we have a real compiler
fence/barrier for MemoryFence(). I filed a bugzilla [1] so we can
investigate this some more.

I hit a real bug with clang using the MMIO functions a long time ago,
and I think that might be why the GCC version of MemoryFence() got the
compiler fence/barrier. If I had to guess, VC++ is probably more
conservative about volatiles, and that is why we have not seen issues
with VC++, but that is depending on non-standard behavior.

Well, this probably explains why we don't have many MemoryFence() calls
in the x86 CPU code, given they don't do anything on VC++. I have
noticed from debugging MP code that was developed using VC++ that we
seem to hit more issues when compiled via clang. I wonder if the
behavior difference with volatile would explain some of those?

    VOID
    EFIAPI
    MemoryFence (
      VOID
      )
    {
      return;
    }

And:

    VOID
    EFIAPI
    MemoryFence (
      VOID
      )
    {
      // This is a little bit of overkill and it is more about the compiler that it is
      // actually processor synchronization. This is like the _ReadWriteBarrier
      // Microsoft specific intrinsic
      __asm__ __volatile__ ("":::"memory");
    }

So I think for VC++ it should be:

    VOID
    EFIAPI
    MemoryFence (
      VOID
      )
    {
      _ReadWriteBarrier ();
      return;
    }

[1] https://bugzilla.tianocore.org/show_bug.cgi?id=3213

Thanks,

Andrew Fish
Paolo Bonzini <pbonzini@...>
On Fri, Feb 5, 2021, 19:12 Ni, Ray <ray.ni@...> wrote:
> I saw the proposal of fences in the first mail by Laszlo. Please
> forgive my ignorance: what is asm("") on x86? A nop? Then how can a nop
> help as a processor-level load/store barrier?

On x86, load-load, load-store and store-store ordering is already
guaranteed by the processor. Therefore on x86 the AcquireMemoryFence and
ReleaseMemoryFence are just like CompilerFence: they only have to block
compiler-level reordering. MemoryFence is the only one that blocks
store-load reordering and needs to emit an MFENCE instruction.

On ARM (either 32- or 64-bit) the processor-level guarantees are weaker,
and you need to emit a "dmb" instruction for acquire and release fences
as well.

Paolo
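A hedged sketch of what the per-architecture implementations could look like under that description. The names follow the proposal in this thread; the instruction choices are assumptions derived from the explanation above, written in GCC/clang syntax, with a generic C11-builtin fallback for other architectures:

```c
/* Optimization barrier only: no instruction is emitted anywhere. */
static void
CompilerFence (void)
{
  __asm__ __volatile__ ("" ::: "memory");
}

static void
AcquireMemoryFence (void)
{
#if defined (__i386__) || defined (__x86_64__)
  /* x86 never reorders a load with a later load or store, so a compiler
   * barrier suffices. */
  __asm__ __volatile__ ("" ::: "memory");
#elif defined (__aarch64__) || defined (__arm__)
  __asm__ __volatile__ ("dmb ish" ::: "memory");
#else
  __atomic_thread_fence (__ATOMIC_ACQUIRE);
#endif
}

static void
ReleaseMemoryFence (void)
{
#if defined (__i386__) || defined (__x86_64__)
  /* x86 never reorders a store with an earlier load or store. */
  __asm__ __volatile__ ("" ::: "memory");
#elif defined (__aarch64__) || defined (__arm__)
  __asm__ __volatile__ ("dmb ish" ::: "memory");
#else
  __atomic_thread_fence (__ATOMIC_RELEASE);
#endif
}

static void
MemoryFence (void)
{
#if defined (__i386__) || defined (__x86_64__)
  /* The one case x86 can reorder: a store followed by a load from a
   * different location. MFENCE blocks it. */
  __asm__ __volatile__ ("mfence" ::: "memory");
#elif defined (__aarch64__) || defined (__arm__)
  __asm__ __volatile__ ("dmb ish" ::: "memory");
#else
  __atomic_thread_fence (__ATOMIC_SEQ_CST);
#endif
}
```

So the asm("") is not a nop instruction at all: it is an empty assembly fragment whose "memory" clobber tells the compiler it may not cache or reorder memory accesses across that point.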
Laszlo Ersek
On 02/05/21 18:34, Paolo Bonzini wrote:
> On 05/02/21 18:31, Laszlo Ersek wrote:
>> No concern that the store might not be atomic?
>
> Not as long as it's a pointer or smaller.

This is why I'd like someone else to write the primitives :)

- Where is this documented?
- Is it architecture-dependent?
- Is it alignment-dependent?

etc... :)

(I'm not challenging your statement of course; I know I should have
known, just not sure where from...)

Thanks,
Laszlo
Ni, Ray
Wow! Thank you Paolo for explaining everything!
I read a bit about the C memory model:
https://gcc.gnu.org/wiki/Atomic/GCCMM/AtomicSync

Do you have any documentation that explains how the different models are
implemented (using what instructions)?

I would like to know if there is a better primitive API other than the
fences, because MSVC deprecates the barrier APIs:
https://docs.microsoft.com/en-us/cpp/intrinsics/readwritebarrier?view=msvc-160

Thanks,
Ray
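For illustration, the C11 <stdatomic.h> flavor of these primitives looks roughly as follows; whether it is available on every toolchain edk2 supports is exactly the open question in this thread, and the function names here are made up:

```c
#include <stdatomic.h>
#include <stdint.h>

/* With C11 atomics, the memory-order argument attached to each access
 * replaces both the volatile qualifier and the separate fence calls:
 * the compiler picks the right instruction per architecture. */

void
StoreFlagRelease (atomic_uint *Flag)
{
  /* Release store: all writes before this call become visible to any
   * thread that observes the flag as set with an acquire load. */
  atomic_store_explicit (Flag, 1, memory_order_release);
}

uint32_t
LoadFlagAcquire (atomic_uint *Flag)
{
  /* Acquire load: reads after this call cannot be moved before it. */
  return atomic_load_explicit (Flag, memory_order_acquire);
}
```

On x86 both of these compile to plain mov instructions (plus compiler-level ordering), while on ARM they become load-acquire/store-release or dmb-fenced sequences, matching Paolo's per-architecture description above.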
Paolo Bonzini <pbonzini@...>
On Fri, Feb 5, 2021, 19:21 Laszlo Ersek <lersek@...> wrote:
>>>> ReleaseMemoryFence ();
>>>> *SpinLock = SPIN_LOCK_RELEASED;
>>>
>>> No concern that the store might not be atomic?
>>
>> Not as long as it's a pointer or smaller.
>
> This is why I'd like someone else to write the primitives :)
>
> - Where is this documented?

To expand: it is an assumption on the implementation of the C compiler;
it admittedly goes beyond the C standard, but it is a reasonable
assumption. Compilers may *merge* loads and stores, but have no reason
to split aligned loads and stores. Likewise for processors (the pointer
limitation is because sizes larger than the pointer's may not have an
instruction that reads or writes them atomically).

The standard would require you to use C11 atomic loads and stores, but
as far as I understand you cannot assume that they exist on all edk2
compilers.

> - Is it architecture-dependent?

No.

> - Is it alignment-dependent?

Unaligned pointers are already undefined behavior, so you can ignore how
they are dealt with at the processor level.

Paolo
Laszlo Ersek
On 02/05/21 19:32, Paolo Bonzini wrote:
> Unaligned pointers are already undefined behavior, so you can ignore
> how they are dealt with at the processor level.

My question was unclearly asked, sorry. Let's say we have a UINT32 at an
address that's not a multiple of 4, but a multiple of 2. A pointer to
that UINT32 is "acceptably aligned" on x86, but not "naturally aligned".
Dereferencing the pointer is not undefined (my reading of C99 suggests
that alignment requirements are implementation-defined), but I don't
know if the atomicity guarantee holds.

Another example: we may have a pointer to a packed structure, and we
might want to poke at a UINT32 field in that structure. Not through a
naked pointer-to-UINT32 of course, which would throw away the
packed-ness, but really through the pointer-to-the-whole-packed-struct.

Thanks
Laszlo
Laszlo Ersek
On 02/05/21 19:17, Paolo Bonzini wrote:
> On x86, load-load, load-store and store-store ordering is already
> guaranteed by the processor.

This is exactly what the fence API implementations should have in their
comments, preferably with Intel / AMD manual references...

> On ARM (either 32- or 64-bit) the processor-level guarantees are
> weaker,

Same, just with the ARM ARM.

Thanks!
Laszlo
Andrew Fish <afish@...>
> On Feb 5, 2021, at 10:11 AM, Ni, Ray <ray.ni@...> wrote:

Ray,

In C, calling out to assembly is also a barrier/fence operation from the
compiler's point of view. Actually, calling an indirect procedure call
(gBS->*, Protocol->*) is also a barrier. The compiler has no idea what
the assembly code is doing across the boundary, so all the operations
need to complete prior to calling the assembly code (indirect procedure
call). I guess a binary static lib is in this list too. In reality it is
anything the C compiler can't link-time optimize through.

For gcc-flavored compilers, any __asm__ call is a compiler read/write
barrier. That is why you see __asm__ __volatile__ ("":::"memory"); in
the memory fence. That means for gcc/clang, any synchronization
primitive implemented in inline assembler is also a compiler
barrier/fence wrapping that assembly operation. The VC++ inline assembly
seems to be "more integrated" with the compiler, so I'm not sure what
the rules are for that. Someone should really investigate that.
Something tells me you will not find those answers on the Linux mailing
list :).

The reason the MMIO operations are wrapped in _ReadWriteBarrier () /
__asm__ __volatile__ ("":::"memory") has to do with the order of
operations:

1) The leading barrier forces all the code before the call to complete
   its reads/writes prior to the operation.

2) The trailing barrier forces the MMIO operation to complete before
   unrelated reads/writes that come after the call, which could
   otherwise get reordered during optimization.

As I mentioned in this thread, I'm wondering if VC++ has extra behaviors
around volatile that are compiler dependent, and that is why we have not
seen a lot of issues to date?

I'm really glad this topic came up. It really seems like something we
need to work through...

Thanks,

Andrew Fish
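That barrier-wrapped MMIO pattern can be sketched self-containedly as follows. The names are made up (edk2's real MmioWrite8() lives in IoLib), and the barrier is written in GCC/clang syntax; on MSVC it would be _ReadWriteBarrier():

```c
#include <stdint.h>

/* Stand-in compiler barrier (no instruction emitted). */
static void
SketchCompilerFence (void)
{
  __asm__ __volatile__ ("" ::: "memory");
}

void
SketchMmioWrite8 (volatile uint8_t *Address, uint8_t Value)
{
  SketchCompilerFence ();   /* (1) earlier reads/writes (e.g. to a DMA
                             *     buffer) complete before the MMIO op  */
  *Address = Value;         /* volatile access: a real store, emitted
                             *     right here, exactly once            */
  SketchCompilerFence ();   /* (2) later reads/writes cannot be hoisted
                             *     above the MMIO op                   */
}
```

The volatile qualifier guarantees the store itself exists and is not merged or eliminated; the two fences pin its position relative to the surrounding non-volatile accesses, which volatile alone does not do.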
Andrew Fish <afish@...>
> On Feb 5, 2021, at 10:47 AM, Laszlo Ersek <lersek@...> wrote:
>
> Another example: we may have a pointer to a packed structure, and we
> might want to poke at a UINT32 field in that structure.

Laszlo,

Clang treats unaligned pointers as undefined behavior (UB). I'm not sure
if this is due to C11? What I know is that clang chooses NOT to optimize
away UB from alignment errors, but if you run the sanitizer you get
errors.

For fun one day I turned on the clang sanitizer with our edk2 firmware
and had it emit UD2 on faults so it did not require a runtime. It turned
out our debugger stub had lots of alignment issues :( so I kind of gave
up at that point :).

Thanks,

Andrew Fish