Re: MemoryFence()


Laszlo Ersek

On 02/05/21 17:04, Ni, Ray wrote:
> Maybe that explains why the MemoryFence was implemented so
> confidently. I trust Mike :)
>
> In UefiCpuPkg/Library/RegisterCpuFeaturesLib/CpuFeaturesInitialize.c,
> LibWaitForSemaphore and LibReleaseSemaphore are coded without the
> read/write barriers.
> Let's focus on the "down" operations first.
>
> At commit c6be6dab9c4b:
>
> /**
>   Decrement the semaphore by 1 if it is not zero.
>
>   Performs an atomic decrement operation for semaphore.
>   The compare exchange operation must be performed using
>   MP safe mechanisms.
>
>   @param  Sem  IN:  32-bit unsigned integer
>
> **/
> VOID
> LibWaitForSemaphore (
>   IN OUT volatile UINT32  *Sem
>   )
> {
>   UINT32  Value;
>
>   do {
>     Value = *Sem;
>   } while (Value == 0 ||
>            InterlockedCompareExchange32 (
>              Sem,
>              Value,
>              Value - 1
>              ) != Value);
> }
>
> But AcquireSpinLockOrFail and ReleaseSpinLock in
> MdePkg/Library/BaseSynchronizationLib/SynchronizationMsc.c do contain
> read/write barriers.
> Is the former implementation wrong because, without the barriers, the
> compiler is free to move the cmpxchg earlier or later?
>
> Does the volatile keyword help here?
I think LibWaitForSemaphore() is safe. (Which is not identical to saying
"it's in the best style".)

The volatile ensures there is a load instruction, and that the compiler
generates the load instruction in the right place.

The volatile does not ensure that the fetched data is neither corrupt
nor stale -- the load could deposit a partially updated or outdated
value into "Value". But in this particular instance, that should be no
problem:

- if we see a corrupt or stale zero value, we'll just continue spinning;
the value will stabilize eventually

- if we see a corrupt or stale *nonzero* value, then our attempt to
decrement it with the MP-safe InterlockedCompareExchange32() will
simply fail, and the loop retries.

So the volatile fetch can be considered an "initial guess" that works
most of the time.
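
(To illustrate the hoisting risk that the volatile guards against:
the following hypothetical, volatile-free variant is *not* edk2 code.
The compiler would be entitled to read *Sem once and spin on the
cached value forever:)

VOID
BrokenWait (
  IN UINT32  *Sem    // hypothetical: no volatile qualifier
  )
{
  //
  // With no volatile (and no barrier), the compiler may hoist the
  // load out of the loop, effectively compiling this as
  // "if (*Sem == 0) { for (;;) ; }" -- never observing another
  // processor's increment.
  //
  while (*Sem == 0) {
  }
}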

(BTW, PiSmmCpuDxeSmm has a more recent variant that is less
power-hungry; see commit 9001b750df64 ("UefiCpuPkg/PiSmmCpuDxeSmm: pause
in WaitForSemaphore() before re-fetch", 2020-07-31).)
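
From memory, the shape of that variant is roughly the following;
treat it as a sketch and check the commit for the authoritative
version:

UINT32
WaitForSemaphore (
  IN OUT volatile UINT32  *Sem
  )
{
  UINT32  Value;

  for ( ; ; ) {
    Value = *Sem;
    if (Value != 0 &&
        InterlockedCompareExchange32 (
          (UINT32 *)Sem,
          Value,
          Value - 1
          ) == Value) {
      break;
    }

    //
    // Be polite to the other hyperthread on the core while spinning.
    //
    CpuPause ();
  }

  return Value - 1;
}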

The implementation seen in
"MdePkg/Library/BaseSynchronizationLib/SynchronizationMsc.c" is:

/**
  Attempts to place a spin lock in the acquired state.

  This function checks the state of the spin lock specified by SpinLock. If
  SpinLock is in the released state, then this function places SpinLock in the
  acquired state and returns TRUE. Otherwise, FALSE is returned. All state
  transitions of SpinLock must be performed using MP safe mechanisms.

  If SpinLock is NULL, then ASSERT().
  If SpinLock was not initialized with InitializeSpinLock(), then ASSERT().

  @param  SpinLock  A pointer to the spin lock to place in the acquired state.

  @retval TRUE   SpinLock was placed in the acquired state.
  @retval FALSE  SpinLock could not be acquired.

**/
BOOLEAN
EFIAPI
AcquireSpinLockOrFail (
  IN OUT SPIN_LOCK  *SpinLock
  )
{
  SPIN_LOCK  LockValue;
  VOID       *Result;

  ASSERT (SpinLock != NULL);

  LockValue = *SpinLock;
  ASSERT (LockValue == SPIN_LOCK_ACQUIRED || LockValue == SPIN_LOCK_RELEASED);

  _ReadWriteBarrier ();
  Result = InterlockedCompareExchangePointer (
             (VOID **)SpinLock,
             (VOID *)SPIN_LOCK_RELEASED,
             (VOID *)SPIN_LOCK_ACQUIRED
             );

  _ReadWriteBarrier ();
  return (BOOLEAN)(Result == (VOID *)SPIN_LOCK_RELEASED);
}

I think the explicit barriers are better style (more helpful for the
programmer, and in general, possibly better performance).

On the other hand (independently of what you asked), the
AcquireSpinLockOrFail() logic seems a bit fishy to me.

First, the ASSERT() deserves a comment, at the bare minimum. ("Even if
the data is stale or partially updated, we can't conjure up a completely
bogus value", or some such.)

Second, the rest of the function doesn't even depend on LockValue.
Having the LockValue / ASSERT() stuff seems a bit pointless to me.
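
(If the LockValue / ASSERT() business were dropped, the function
could shrink to something like this -- just a sketch of the
direction, not a tested patch:)

BOOLEAN
EFIAPI
AcquireSpinLockOrFail (
  IN OUT SPIN_LOCK  *SpinLock
  )
{
  VOID  *Result;

  ASSERT (SpinLock != NULL);

  //
  // The barriers keep the compiler from migrating unrelated accesses
  // into or out of the critical section being opened.
  //
  _ReadWriteBarrier ();
  Result = InterlockedCompareExchangePointer (
             (VOID **)SpinLock,
             (VOID *)SPIN_LOCK_RELEASED,
             (VOID *)SPIN_LOCK_ACQUIRED
             );
  _ReadWriteBarrier ();

  return (BOOLEAN)(Result == (VOID *)SPIN_LOCK_RELEASED);
}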

--*--

Now, regarding the "up" operations:

[UefiCpuPkg/Library/RegisterCpuFeaturesLib/CpuFeaturesInitialize.c]

/**
  Increment semaphore by 1.

  @param  Sem  IN:  32-bit unsigned integer

**/
VOID
LibReleaseSemaphore (
  IN OUT volatile UINT32  *Sem
  )
{
  InterlockedIncrement (Sem);
}

This seems OK to me.

(In an ideal world, InterlockedIncrement() should internally enforce a
compiler barrier too, so the volatile should not be needed.)
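
(Until that guarantee is spelled out, a belt-and-braces variant could
bracket the increment explicitly, in the same style as the MSVC spin
lock code above -- again only a sketch:)

VOID
LibReleaseSemaphore (
  IN OUT volatile UINT32  *Sem
  )
{
  //
  // The interlocked instruction is MP-safe by itself; the explicit
  // barriers additionally pin down the compiler's ordering of
  // surrounding accesses.
  //
  _ReadWriteBarrier ();
  InterlockedIncrement (Sem);
  _ReadWriteBarrier ();
}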

[MdePkg/Library/BaseSynchronizationLib/SynchronizationMsc.c]

/**
  Releases a spin lock.

  This function places the spin lock specified by SpinLock in the release state
  and returns SpinLock.

  If SpinLock is NULL, then ASSERT().
  If SpinLock was not initialized with InitializeSpinLock(), then ASSERT().

  @param  SpinLock  A pointer to the spin lock to release.

  @return SpinLock released the lock.

**/
SPIN_LOCK *
EFIAPI
ReleaseSpinLock (
  IN OUT SPIN_LOCK  *SpinLock
  )
{
  SPIN_LOCK  LockValue;

  ASSERT (SpinLock != NULL);

  LockValue = *SpinLock;
  ASSERT (LockValue == SPIN_LOCK_ACQUIRED || LockValue == SPIN_LOCK_RELEASED);

  _ReadWriteBarrier ();
  *SpinLock = SPIN_LOCK_RELEASED;
  _ReadWriteBarrier ();

  return SpinLock;
}

Fishy. I would have implemented it with another
InterlockedCompareExchangePointer(), and maybe ASSERT()ed on the
original value it returned.
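
(Concretely, something along these lines -- once more just a sketch,
reusing the interlocked primitive from the acquire path:)

SPIN_LOCK *
EFIAPI
ReleaseSpinLock (
  IN OUT SPIN_LOCK  *SpinLock
  )
{
  VOID  *Original;

  ASSERT (SpinLock != NULL);

  //
  // Atomically flip ACQUIRED -> RELEASED; the interlocked primitive
  // also acts as a compiler barrier around the store.
  //
  Original = InterlockedCompareExchangePointer (
               (VOID **)SpinLock,
               (VOID *)SPIN_LOCK_ACQUIRED,  // expected current value
               (VOID *)SPIN_LOCK_RELEASED   // new value
               );

  //
  // Releasing a lock we don't hold would be a caller bug; catch it in
  // DEBUG builds.
  //
  ASSERT (Original == (VOID *)SPIN_LOCK_ACQUIRED);

  return SpinLock;
}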

Laszlo
